What is Lean

Lean is a functional programming language that makes it easy to write correct and maintainable code. You can also use Lean as an interactive theorem prover.

Lean programming primarily involves defining types and functions. This allows your focus to remain on the problem domain and manipulating its data, rather than the details of programming.

-- Defines a function that takes a name and produces a greeting.
def getGreeting (name : String) := s!"Hello, {name}! Isn't Lean great?"

-- The `main` function is the entry point of your program.
-- Its type is `IO Unit` because it can perform `IO` operations (side effects).
def main : IO Unit :=
  -- Define a list of names
  let names := ["Sebastian", "Leo", "Daniel"]

  -- Map each name to a greeting
  let greetings := names.map getGreeting

  -- Print the list of greetings
  for greeting in greetings do
    IO.println greeting

Lean has numerous features, including:

Tour of Lean

The best way to learn about Lean is to read and write Lean code. This article will act as a tour through some of the key features of the Lean language and give you some code snippets that you can execute on your machine. To learn about setting up a development environment, check out Setting Up Lean.

There are two primary concepts in Lean: functions and types. This tour will emphasize features of the language which fall into these two concepts.

Functions and Namespaces

The most fundamental pieces of any Lean program are functions organized into namespaces. Functions perform work on inputs to produce outputs, and they are organized under namespaces, which are the primary way you group things in Lean. They are defined using the def command, which give the function a name and define its arguments.

namespace BasicFunctions

-- The `#eval` command evaluates an expression on the fly and prints the result.
#eval 2+2

-- You use 'def' to define a function. This one accepts a natural number
-- and returns a natural number.
-- Parentheses are optional for function arguments, except for when
-- you use an explicit type annotation.
-- Lean can often infer the type of the function's arguments.
def sampleFunction1 x := x*x + 3

-- Apply the function, naming the function return result using 'def'.
-- The variable type is inferred from the function return type.
def result1 := sampleFunction1 4573

-- This line uses an interpolated string to print the result. Expressions inside
-- braces `{}` are converted into strings using the polymorphic method `toString`
#eval println! "The result of squaring the integer 4573 and adding 3 is {result1}"

-- When needed, annotate the type of a parameter name using '(argument : type)'.
def sampleFunction2 (x : Nat) := 2*x*x - x + 3

def result2 := sampleFunction2 (7 + 4)

#eval println! "The result of applying the 2nd sample function to (7 + 4) is {result2}"

-- Conditionals use if/then/else
def sampleFunction3 (x : Int) :=
  if x > 100 then
    2*x*x - x + 3
  else
    2*x*x + x - 37

#eval println! "The result of applying sampleFunction3 to 2 is {sampleFunction3 2}"

end BasicFunctions
-- Lean has first-class functions.
-- `twice` takes two arguments `f` and `a` where
-- `f` is a function from natural numbers to natural numbers, and
-- `a` is a natural number.
def twice (f : Nat → Nat) (a : Nat) :=
  f (f a)

-- `fun` is used to declare anonymous functions
#eval twice (fun x => x + 2) 10

-- You can prove theorems about your functions.
-- The following theorem states that for any natural number `a`,
-- adding 2 twice produces a value equal to `a + 4`.
theorem twiceAdd2 (a : Nat) : twice (fun x => x + 2) a = a + 4 :=
  -- The proof is by reflexivity. Lean "symbolically" reduces both sides of the equality
  -- until they are identical.
  rfl

-- `(· + 2)` is syntax sugar for `(fun x => x + 2)`. The parentheses + `·` notation
-- is useful for defining simple anonymous functions.
#eval twice (· + 2) 10

-- Enumerated types are a special case of inductive types in Lean,
-- which we will learn about later.
-- The following command creates a new type `Weekday`.
inductive Weekday where
  | sunday    : Weekday
  | monday    : Weekday
  | tuesday   : Weekday
  | wednesday : Weekday
  | thursday  : Weekday
  | friday    : Weekday
  | saturday  : Weekday

-- `Weekday` has 7 constructors/elements.
-- The constructors live in the `Weekday` namespace.
-- Think of `sunday`, `monday`, …, `saturday` as being distinct elements of `Weekday`,
-- with no other distinguishing properties.
-- The command `#check` prints the type of a term in Lean.
#check Weekday.sunday
#check Weekday.monday

-- The `open` command opens a namespace, making all declarations in it accessible without
-- qualification.
open Weekday
#check sunday
#check tuesday

-- You can define functions by pattern matching.
-- The following function converts a `Weekday` into a natural number.
def natOfWeekday (d : Weekday) : Nat :=
  match d with
  | sunday    => 1
  | monday    => 2
  | tuesday   => 3
  | wednesday => 4
  | thursday  => 5
  | friday    => 6
  | saturday  => 7

#eval natOfWeekday tuesday

def isMonday : Weekday → Bool :=
  -- `fun` + `match`  is a common idiom.
  -- The following expression is syntax sugar for
  -- `fun d => match d with | monday => true | _ => false`.
  fun
    | monday => true
    | _      => false

#eval isMonday monday
#eval isMonday sunday

-- Lean has support for type classes and polymorphic methods.
-- The `toString` method converts a value into a `String`.
#eval toString 10
#eval toString (10, 20)

-- The method `toString` converts values of any type that implements
-- the class `ToString`.
-- You can implement instances of `ToString` for your own types.
instance : ToString Weekday where
  toString (d : Weekday) : String :=
    match d with
    | sunday    => "Sunday"
    | monday    => "Monday"
    | tuesday   => "Tuesday"
    | wednesday => "Wednesday"
    | thursday  => "Thursday"
    | friday    => "Friday"
    | saturday  => "Saturday"

#eval toString (sunday, 10)

def Weekday.next (d : Weekday) : Weekday :=
  match d with
  | sunday    => monday
  | monday    => tuesday
  | tuesday   => wednesday
  | wednesday => thursday
  | thursday  => friday
  | friday    => saturday
  | saturday  => sunday

#eval Weekday.next Weekday.wednesday
-- Since the `Weekday` namespace has already been opened, you can also write
#eval next wednesday

-- Matching on a parameter like in the previous definition
-- is so common that Lean provides syntax sugar for it. The following
-- function uses it.
def Weekday.previous : Weekday -> Weekday
  | sunday    => saturday
  | monday    => sunday
  | tuesday   => monday
  | wednesday => tuesday
  | thursday  => wednesday
  | friday    => thursday
  | saturday  => friday

#eval next (previous wednesday)

-- We can prove that for any `Weekday` `d`, `next (previous d) = d`
theorem Weekday.nextOfPrevious (d : Weekday) : next (previous d) = d :=
  match d with
  | sunday    => rfl
  | monday    => rfl
  | tuesday   => rfl
  | wednesday => rfl
  | thursday  => rfl
  | friday    => rfl
  | saturday  => rfl

-- You can automate definitions such as `Weekday.nextOfPrevious`
-- using metaprogramming (or "tactics").
theorem Weekday.nextOfPrevious' (d : Weekday) : next (previous d) = d := by
  cases d       -- A proof by case distinction
  all_goals rfl  -- Each case is solved using `rfl`

Quickstart

These instructions will walk you through setting up Lean 4 together with VS Code as an editor for Lean 4. See Setup for supported platforms and other ways to set up Lean 4.

  1. Install VS Code.

  2. Launch VS Code and install the Lean 4 extension by clicking on the 'Extensions' sidebar entry and searching for 'Lean 4'.

    installing the vscode-lean4 extension

  3. Open the Lean 4 setup guide by creating a new text file using 'File > New Text File' (Ctrl+N / Cmd+N), clicking on the ∀-symbol in the top right and selecting 'Documentation… > Docs: Show Setup Guide'.

    show setup guide

  4. Follow the Lean 4 setup guide. It will:

    • walk you through learning resources for Lean,
    • teach you how to set up Lean's dependencies on your platform,
    • install Lean 4 for you at the click of a button,
    • help you set up your first project.

    setup guide

Supported Platforms

Tier 1

Platforms built & tested by our CI, available as binary releases via elan (see below)

  • x86-64 Linux with glibc 2.27+
  • x86-64 macOS 10.15+
  • aarch64 (Apple Silicon) macOS 10.15+
  • x86-64 Windows 11 (any version), Windows 10 (version 1903 or higher), Windows Server 2022

Tier 2

Platforms cross-compiled but not tested by our CI, available as binary releases

Releases may be silently broken due to the lack of automated testing. Issue reports and fixes are welcome.

  • aarch64 Linux with glibc 2.27+
  • x86 (32-bit) Linux
  • Emscripten Web Assembly

Setting Up Lean

See also the quickstart instructions for a standard setup with VS Code as the editor.

Release builds for all supported platforms are available at https://github.com/leanprover/lean4/releases. Instead of downloading these and setting up the paths manually, however, it is recommended to use the Lean version manager elan instead:

$ elan self update  # in case you haven't updated elan in a while
# download & activate latest Lean 4 stable release (https://github.com/leanprover/lean4/releases)
$ elan default leanprover/lean4:stable

lake

Lean 4 comes with a package manager named lake. Use lake init foo to initialize a Lean package foo in the current directory, and lake build to typecheck and build it as well as all its dependencies. Use lake help to learn about further commands. The general directory structure of a package foo is

lakefile.lean  # package configuration
lean-toolchain # specifies the lean version to use
Foo.lean       # main file, import via `import Foo`
Foo/
  A.lean       # further files, import via e.g. `import Foo.A`
  A/...        # further nesting
.lake/         # `lake` build output directory

After running lake build you will see a binary named ./.lake/build/bin/foo and when you run it you should see the output:

Hello, world!

Editing

Lean implements the Language Server Protocol that can be used for interactive development in Emacs, VS Code, and possibly other editors.

Changes must be saved to be visible in other files, which must then be invalidated using an editor command (see links above).

Theorem Proving in Lean

We strongly encourage you to read the book Theorem Proving in Lean. Many Lean users consider it to be the Lean Bible.

Functional Programming in Lean

The goal of this book is to be an accessible introduction to using Lean 4 as a programming language. It should be useful both to people who want to use Lean as a general-purpose programming language and to mathematicians who want to develop larger-scale proof automation but do not have a background in functional programming. It does not assume any background with functional programming, though it's probably not a good first book on programming in general. New content will be added once per month until it's done.

Examples

Palindromes

Palindromes are lists that read the same from left to right and from right to left. For example, [a, b, b, a] and [a, h, a] are palindromes.

We use an inductive predicate to specify whether a list is a palindrome or not. Recall that inductive predicates, or inductively defined propositions, are a convenient way to specify functions of type ... → Prop.

This example is a based on an example from the book "The Hitchhiker's Guide to Logical Verification".

inductive 
Palindrome: {α : Type u_1} → List α → Prop
Palindrome
:
List: Type u_1 → Type u_1
List
α: Type u_1
α
Prop: Type
Prop
where |
nil: ∀ {α : Type u_1}, Palindrome []
nil
:
Palindrome: {α : Type u_1} → List α → Prop
Palindrome
[]: List ?m.17
[]
|
single: ∀ {α : Type u_1} (a : α), Palindrome [a]
single
: (
a: α
a
:
α: Type u_1
α
)
Palindrome: {α : Type u_1} → List α → Prop
Palindrome
[
a: α
a
] |
sandwich: ∀ {α : Type u_1} {as : List α} (a : α), Palindrome as → Palindrome ([a] ++ as ++ [a])
sandwich
: (
a: α
a
:
α: Type u_1
α
)
Palindrome: {α : Type u_1} → List α → Prop
Palindrome
as: List α
as
Palindrome: {α : Type u_1} → List α → Prop
Palindrome
([
a: α
a
] ++
as: List α
as
++ [
a: α
a
])

The definition distinguishes three cases: (1) [] is a palindrome; (2) for any element a, the singleton list [a] is a palindrome; (3) for any element a and any palindrome [b₁, . . ., bₙ], the list [a, b₁, . . ., bₙ, a] is a palindrome.

We now prove that the reverse of a palindrome is a palindrome using induction on the inductive predicate h : Palindrome as.

theorem 
palindrome_reverse: ∀ {α : Type u_1} {as : List α}, Palindrome as → Palindrome as.reverse
palindrome_reverse
(
h: Palindrome as
h
:
Palindrome: {α : Type u_1} → List α → Prop
Palindrome
as: List ?m.421
as
) :
Palindrome: {α : Type u_1} → List α → Prop
Palindrome
as: List ?m.421
as
.
reverse: {α : Type u_1} → List α → List α
reverse
:=

Goals accomplished! 🐙
α✝: Type u_1
as: List α
h: Palindrome as

Palindrome as.reverse

Goals accomplished! 🐙

Goals accomplished! 🐙

Goals accomplished! 🐙

Goals accomplished! 🐙

Goals accomplished! 🐙
α✝: Type u_1
as, as✝: List α
a: α
h: Palindrome as
ih: Palindrome as✝.reverse

sandwich
Palindrome (a :: (as✝.reverse ++ [a]))
;

Goals accomplished! 🐙

If a list as is a palindrome, then the reverse of as is equal to itself.

theorem 
reverse_eq_of_palindrome: ∀ {α : Type u_1} {as : List α}, Palindrome as → as.reverse = as
reverse_eq_of_palindrome
(
h: Palindrome as
h
:
Palindrome: {α : Type u_1} → List α → Prop
Palindrome
as: List ?m.696
as
) :
as: List ?m.696
as
.
reverse: {α : Type u_1} → List α → List α
reverse
=
as: List ?m.696
as
:=

Goals accomplished! 🐙
α✝: Type u_1
as: List α
h: Palindrome as

as.reverse = as

Goals accomplished! 🐙

Goals accomplished! 🐙

Goals accomplished! 🐙

Goals accomplished! 🐙

Goals accomplished! 🐙

Goals accomplished! 🐙

Note that you can also easily prove palindrome_reverse using reverse_eq_of_palindrome.

example: ∀ {α : Type u_1} {as : List α}, Palindrome as → Palindrome as.reverse
example
(
h: Palindrome as
h
:
Palindrome: {α : Type u_1} → List α → Prop
Palindrome
as: List ?m.942
as
) :
Palindrome: {α : Type u_1} → List α → Prop
Palindrome
as: List ?m.942
as
.
reverse: {α : Type u_1} → List α → List α
reverse
:=

Goals accomplished! 🐙

Goals accomplished! 🐙

Given a nonempty list, the function List.last returns its element. Note that we use (by simp) to prove that a₂ :: as ≠ [] in the recursive application.

def 
List.last: {α : Type u_1} → (as : List α) → as ≠ [] → α
List.last
: (
as: List α
as
:
List: Type u_1 → Type u_1
List
α: Type u_1
α
)
as: List α
as
[]: List α
[]
α: Type u_1
α
| [
a: α
a
], _ =>
a: α
a
| _::
a₂: α
a₂
::
as: List α
as
, _ => (
a₂: α
a₂
::
as: List α
as
).
last: {α : Type u_1} → (as : List α) → as ≠ [] → α
last
(

Goals accomplished! 🐙

Goals accomplished! 🐙
)

We use the function List.last to prove the following theorem that says that if a list as is not empty, then removing the last element from as and appending it back is equal to as. We use the attribute @[simp] to instruct the simp tactic to use this theorem as a simplification rule.

@[simp] theorem 
List.dropLast_append_last: ∀ {α : Type u_1} {as : List α} (h : as ≠ []), as.dropLast ++ [as.last h] = as
List.dropLast_append_last
(
h: as ≠ []
h
:
as: List ?m.1481
as
[]: List ?m.1481
[]
) :
as: List ?m.1481
as
.
dropLast: {α : Type u_1} → List α → List α
dropLast
++ [
as: List ?m.1481
as
.
last: {α : Type u_1} → (as : List α) → as ≠ [] → α
last
h: as ≠ []
h
] =
as: List ?m.1481
as
:=

Goals accomplished! 🐙
α✝: Type u_1
as: List α
h: as []

as.dropLast ++ [as.last h] = as
α✝: Type u_1
as: List α
h: [] []

[].dropLast ++ [[].last h] = []

Goals accomplished! 🐙
α✝: Type u_1
as: List α
a: α
h: [a] []

[a].dropLast ++ [[a].last h] = [a]

Goals accomplished! 🐙
α✝: Type u_1
as✝: List α
a₁, a₂: α
as: List α
h: a₁ :: a₂ :: as []

(a₁ :: a₂ :: as).dropLast ++ [(a₁ :: a₂ :: as).last h] = a₁ :: a₂ :: as
α✝: Type u_1
as✝: List α
a₁, a₂: α
as: List α
h: a₁ :: a₂ :: as []

(a₂ :: as).dropLast ++ [(a₂ :: as).last ] = a₂ :: as
α✝: Type u_1
as✝: List α
a₁, a₂: α
as: List α
h: a₁ :: a₂ :: as []

(a₂ :: as).dropLast ++ [(a₂ :: as).last ] = a₂ :: as

Goals accomplished! 🐙

Goals accomplished! 🐙

Goals accomplished! 🐙

We now define the following auxiliary induction principle for lists using well-founded recursion on as.length. We can read it as follows, to prove motive as, it suffices to show that: (1) motive []; (2) motive [a] for any a; (3) if motive as holds, then motive ([a] ++ as ++ [b]) also holds for any a, b, and as. Note that the structure of this induction principle is very similar to the Palindrome inductive predicate.

theorem 
List.palindrome_ind: ∀ {α : Type u_1} (motive : List α → Prop), motive [] → (∀ (a : α), motive [a]) → (∀ (a b : α) (as : List α), motive as → motive ([a] ++ as ++ [b])) → ∀ (as : List α), motive as
List.palindrome_ind
(
motive: List α → Prop
motive
:
List: Type u_1 → Type u_1
List
α: Type u_1
α
Prop: Type
Prop
) (
h₁: motive []
h₁
:
motive: List α → Prop
motive
[]: List α
[]
) (
h₂: ∀ (a : α), motive [a]
h₂
: (
a: α
a
:
α: Type u_1
α
)
motive: List α → Prop
motive
[
a: α
a
]) (
h₃: ∀ (a b : α) (as : List α), motive as → motive ([a] ++ as ++ [b])
h₃
: (
a: α
a
b: α
b
:
α: Type u_1
α
) (
as: List α
as
:
List: Type u_1 → Type u_1
List
α: Type u_1
α
)
motive: List α → Prop
motive
as: List α
as
motive: List α → Prop
motive
([
a: α
a
] ++
as: List α
as
++ [
b: α
b
])) (
as: List α
as
:
List: Type u_1 → Type u_1
List
α: Type u_1
α
) :
motive: List α → Prop
motive
as: List α
as
:= match
as: List α
as
with | [] =>
h₁: motive []
h₁
| [
a: α
a
] =>
h₂: ∀ (a : α), motive [a]
h₂
a: α
a
|
a₁: α
a₁
::
a₂: α
a₂
::
as': List α
as'
=> have
ih: motive (a₂ :: as').dropLast
ih
:=
palindrome_ind: ∀ {α : Type u_1} (motive : List α → Prop), motive [] → (∀ (a : α), motive [a]) → (∀ (a b : α) (as : List α), motive as → motive ([a] ++ as ++ [b])) → ∀ (as : List α), motive as
palindrome_ind
motive: List α → Prop
motive
h₁: motive []
h₁
h₂: ∀ (a : α), motive [a]
h₂
h₃: ∀ (a b : α) (as : List α), motive as → motive ([a] ++ as ++ [b])
h₃
(
a₂: α
a₂
::
as': List α
as'
).
dropLast: {α : Type u_1} → List α → List α
dropLast
have : [
a₁: α
a₁
] ++ (
a₂: α
a₂
::
as': List α
as'
).
dropLast: {α : Type u_1} → List α → List α
dropLast
++ [(
a₂: α
a₂
::
as': List α
as'
).
last: {α : Type u_1} → (as : List α) → as ≠ [] → α
last
(

Goals accomplished! 🐙

Goals accomplished! 🐙
)] =
a₁: α
a₁
::
a₂: α
a₂
::
as': List α
as'
:=

Goals accomplished! 🐙

Goals accomplished! 🐙
this: [a₁] ++ (a₂ :: as').dropLast ++ [(a₂ :: as').last ⋯] = a₁ :: a₂ :: as'
this
h₃: ∀ (a b : α) (as : List α), motive as → motive ([a] ++ as ++ [b])
h₃
_: α
_
_: α
_
_: List α
_
ih: motive (a₂ :: as').dropLast
ih
termination_by
as: List α
as
.
length: {α : Type u_1} → List α → Nat
length

We use our new induction principle to prove that if as.reverse = as, then Palindrome as holds. Note that we use the using modifier to instruct the induction tactic to use this induction principle instead of the default one for lists.

theorem 
List.palindrome_of_eq_reverse: ∀ {α : Type u_1} {as : List α}, as.reverse = as → Palindrome as
List.palindrome_of_eq_reverse
(
h: as.reverse = as
h
:
as: List ?m.5792
as
.
reverse: {α : Type u_1} → List α → List α
reverse
=
as: List ?m.5792
as
) :
Palindrome: {α : Type u_1} → List α → Prop
Palindrome
as: List ?m.5792
as
:=

Goals accomplished! 🐙
α✝: Type u_1
h: [].reverse = []

h₁
Palindrome []
α✝: Type u_1
a✝: α
h: [a].reverse = [a]
Palindrome [a]
α✝: Type u_1
a✝¹, b✝: α
as✝: List α
a✝: as✝.reverse = as Palindrome as
h: ([a✝¹] ++ as ++ [b]).reverse = [a✝¹] ++ as ++ [b]
Palindrome ([a✝¹] ++ as ++ [b])
α✝: Type u_1
h: [].reverse = []

h₁
Palindrome []
α✝: Type u_1
a✝: α
h: [a].reverse = [a]
Palindrome [a]
α✝: Type u_1
a✝¹, b✝: α
as✝: List α
a✝: as✝.reverse = as Palindrome as
h: ([a✝¹] ++ as ++ [b]).reverse = [a✝¹] ++ as ++ [b]
Palindrome ([a✝¹] ++ as ++ [b])

Goals accomplished! 🐙
α✝: Type u_1
a✝: α
h: [a].reverse = [a]

h₂
Palindrome [a]
α✝: Type u_1
a✝¹, b✝: α
as✝: List α
a✝: as✝.reverse = as Palindrome as
h: ([a✝¹] ++ as ++ [b]).reverse = [a✝¹] ++ as ++ [b]
Palindrome ([a✝¹] ++ as ++ [b])

Goals accomplished! 🐙
α✝: Type u_1
a✝¹, b✝: α
as✝: List α
a✝: as✝.reverse = as Palindrome as
h: ([a✝¹] ++ as ++ [b]).reverse = [a✝¹] ++ as ++ [b]

h₃
Palindrome ([a✝¹] ++ as ++ [b])
α✝: Type u_1
a, b: α
as: List α
ih: as.reverse = as Palindrome as
h: ([a] ++ as ++ [b]).reverse = [a] ++ as ++ [b]

Palindrome ([a] ++ as ++ [b])

Goals accomplished! 🐙

Goals accomplished! 🐙
α✝: Type u_1
a: α
as: List α
ih: as.reverse = as Palindrome as
h: ([a] ++ as ++ [a]).reverse = [a] ++ as ++ [a]

Palindrome ([a] ++ as ++ [a])
α✝: Type u_1
a: α
as: List α
ih: as.reverse = as Palindrome as
h: ([a] ++ as ++ [a]).reverse = [a] ++ as ++ [a]

Palindrome ([a] ++ as ++ [a])

Goals accomplished! 🐙

Goals accomplished! 🐙

Goals accomplished! 🐙

We now define a function that returns true iff as is a palindrome. The function assumes that the type α has decidable equality. We need this assumption because we need to compare the list elements.

def 
List.isPalindrome: {α : Type u_1} → [inst : DecidableEq α] → List α → Bool
List.isPalindrome
[
DecidableEq: Type u_1 → Type (max 0 u_1)
DecidableEq
α: Type u_1
α
] (
as: List α
as
:
List: Type u_1 → Type u_1
List
α: Type u_1
α
) :
Bool: Type
Bool
:=
as: List α
as
.
reverse: {α : Type u_1} → List α → List α
reverse
=
as: List α
as

It is straightforward to prove that isPalindrome is correct using the previously proved theorems.

theorem 
List.isPalindrome_correct: ∀ {α : Type u_1} [inst : DecidableEq α] (as : List α), as.isPalindrome = true ↔ Palindrome as
List.isPalindrome_correct
[
DecidableEq: Type u_1 → Type (max 0 u_1)
DecidableEq
α: Type u_1
α
] (
as: List α
as
:
List: Type u_1 → Type u_1
List
α: Type u_1
α
) :
as: List α
as
.
isPalindrome: {α : Type u_1} → [inst : DecidableEq α] → List α → Bool
isPalindrome
Palindrome: {α : Type u_1} → List α → Prop
Palindrome
as: List α
as
:=

Goals accomplished! 🐙
α: Type u_1
inst✝: DecidableEq α
as: List α

as.reverse = as Palindrome as

Goals accomplished! 🐙
true
[
1: Nat
1
,
2: Nat
2
,
1: Nat
1
].
isPalindrome: {α : Type} → [inst : DecidableEq α] → List α → Bool
isPalindrome
false
[
1: Nat
1
,
2: Nat
2
,
3: Nat
3
,
1: Nat
1
].
isPalindrome: {α : Type} → [inst : DecidableEq α] → List α → Bool
isPalindrome
example: [1, 2, 1].isPalindrome = true
example
: [
1: Nat
1
,
2: Nat
2
,
1: Nat
1
].
isPalindrome: {α : Type} → [inst : DecidableEq α] → List α → Bool
isPalindrome
:=
rfl: ∀ {α : Type} {a : α}, a = a
rfl
example: [1, 2, 2, 1].isPalindrome = true
example
: [
1: Nat
1
,
2: Nat
2
,
2: Nat
2
,
1: Nat
1
].
isPalindrome: {α : Type} → [inst : DecidableEq α] → List α → Bool
isPalindrome
:=
rfl: ∀ {α : Type} {a : α}, a = a
rfl
example: (![1, 2, 3, 1].isPalindrome) = true
example
: ![
1: Nat
1
,
2: Nat
2
,
3: Nat
3
,
1: Nat
1
].
isPalindrome: {α : Type} → [inst : DecidableEq α] → List α → Bool
isPalindrome
:=
rfl: ∀ {α : Type} {a : α}, a = a
rfl

Binary Search Trees

If the type of keys can be totally ordered -- that is, it supports a well-behaved comparison -- then maps can be implemented with binary search trees (BSTs). Insert and lookup operations on BSTs take time proportional to the height of the tree. If the tree is balanced, the operations therefore take logarithmic time.

This example is based on a similar example found in the "Software Foundations" book (volume 3).

We use Nat as the key type in our implementation of BSTs, since it has a convenient total order with lots of theorems and automation available. We leave as an exercise to the reader the generalization to arbitrary types.

inductive 
Tree: Type v → Type v
Tree
(
β: Type v
β
:
Type v: Type (v + 1)
Type v
) where |
leaf: {β : Type v} → Tree β
leaf
|
node: {β : Type v} → Tree β → Nat → β → Tree β → Tree β
node
(
left: Tree β
left
:
Tree: Type v → Type v
Tree
β: Type v
β
) (
key: Nat
key
:
Nat: Type
Nat
) (
value: β
value
:
β: Type v
β
) (
right: Tree β
right
:
Tree: Type v → Type v
Tree
β: Type v
β
) deriving
Repr: Type u → Type u
Repr

The function contains returns true iff the given tree contains the key k.

def 
Tree.contains: {β : Type u_1} → Tree β → Nat → Bool
Tree.contains
(
t: Tree β
t
:
Tree: Type u_1 → Type u_1
Tree
β: Type u_1
β
) (
k: Nat
k
:
Nat: Type
Nat
) :
Bool: Type
Bool
:= match
t: Tree β
t
with |
leaf: {β : Type ?u.1354} → Tree β
leaf
=>
false: Bool
false
|
node: {β : Type ?u.1363} → Tree β → Nat → β → Tree β → Tree β
node
left: Tree β
left
key: Nat
key
Warning: unused variable `value` note: this linter can be disabled with `set_option linter.unusedVariables false`
right: Tree β
right
=> if
k: Nat
k
<
key: Nat
key
then
left: Tree β
left
.
contains: {β : Type u_1} → Tree β → Nat → Bool
contains
k: Nat
k
else if
key: Nat
key
<
k: Nat
k
then
right: Tree β
right
.
contains: {β : Type u_1} → Tree β → Nat → Bool
contains
k: Nat
k
else
true: Bool
true

t.find? k returns some v if v is the value bound to key k in the tree t. It returns none otherwise.

def 
Tree.find?: {β : Type u_1} → Tree β → Nat → Option β
Tree.find?
(
t: Tree β
t
:
Tree: Type u_1 → Type u_1
Tree
β: Type u_1
β
) (
k: Nat
k
:
Nat: Type
Nat
) :
Option: Type u_1 → Type u_1
Option
β: Type u_1
β
:= match
t: Tree β
t
with |
leaf: {β : Type ?u.1655} → Tree β
leaf
=>
none: {α : Type u_1} → Option α
none
|
node: {β : Type ?u.1667} → Tree β → Nat → β → Tree β → Tree β
node
left: Tree β
left
key: Nat
key
value: β
value
right: Tree β
right
=> if
k: Nat
k
<
key: Nat
key
then
left: Tree β
left
.
find?: {β : Type u_1} → Tree β → Nat → Option β
find?
k: Nat
k
else if
key: Nat
key
<
k: Nat
k
then
right: Tree β
right
.
find?: {β : Type u_1} → Tree β → Nat → Option β
find?
k: Nat
k
else
some: {α : Type u_1} → α → Option α
some
value: β
value

t.insert k v is the map containing all the bindings of t along with a binding of k to v.

def 
Tree.insert: {β : Type u_1} → Tree β → Nat → β → Tree β
Tree.insert
(
t: Tree β
t
:
Tree: Type u_1 → Type u_1
Tree
β: Type u_1
β
) (
k: Nat
k
:
Nat: Type
Nat
) (
v: β
v
:
β: Type u_1
β
) :
Tree: Type u_1 → Type u_1
Tree
β: Type u_1
β
:= match
t: Tree β
t
with |
leaf: {β : Type ?u.1964} → Tree β
leaf
=>
node: {β : Type u_1} → Tree β → Nat → β → Tree β → Tree β
node
leaf: {β : Type u_1} → Tree β
leaf
k: Nat
k
v: β
v
leaf: {β : Type u_1} → Tree β
leaf
|
node: {β : Type ?u.1979} → Tree β → Nat → β → Tree β → Tree β
node
left: Tree β
left
key: Nat
key
value: β
value
right: Tree β
right
=> if
k: Nat
k
<
key: Nat
key
then
node: {β : Type u_1} → Tree β → Nat → β → Tree β → Tree β
node
(
left: Tree β
left
.
insert: {β : Type u_1} → Tree β → Nat → β → Tree β
insert
k: Nat
k
v: β
v
)
key: Nat
key
value: β
value
right: Tree β
right
else if
key: Nat
key
<
k: Nat
k
then
node: {β : Type u_1} → Tree β → Nat → β → Tree β → Tree β
node
left: Tree β
left
key: Nat
key
value: β
value
(
right: Tree β
right
.
insert: {β : Type u_1} → Tree β → Nat → β → Tree β
insert
k: Nat
k
v: β
v
) else
node: {β : Type u_1} → Tree β → Nat → β → Tree β → Tree β
node
left: Tree β
left
k: Nat
k
v: β
v
right: Tree β
right

Let's add a new operation to our tree: converting it to an association list that contains the key--value bindings from the tree stored as pairs. If that list is sorted by the keys, then any two trees that represent the same map would be converted to the same list. Here's a function that does so with an in-order traversal of the tree.

def 
Tree.toList: {β : Type u_1} → Tree β → List (Nat × β)
Tree.toList
(
t: Tree β
t
:
Tree: Type u_1 → Type u_1
Tree
β: Type u_1
β
) :
List: Type u_1 → Type u_1
List
(
Nat: Type
Nat
×
β: Type u_1
β
) := match
t: Tree β
t
with |
leaf: {β : Type ?u.2298} → Tree β
leaf
=>
[]: List (Nat × β)
[]
|
node: {β : Type ?u.2310} → Tree β → Nat → β → Tree β → Tree β
node
l: Tree β
l
k: Nat
k
v: β
v
r: Tree β
r
=>
l: Tree β
l
.
toList: {β : Type u_1} → Tree β → List (Nat × β)
toList
++ [(
k: Nat
k
,
v: β
v
)] ++
r: Tree β
r
.
toList: {β : Type u_1} → Tree β → List (Nat × β)
toList
Tree.node (Tree.node (Tree.leaf) 1 "one" (Tree.leaf)) 2 "two" (Tree.node (Tree.leaf) 3 "three" (Tree.leaf))
Tree.leaf: {β : Type} → Tree β
Tree.leaf
.
insert: {β : Type} → Tree β → Nat → β → Tree β
insert
2: Nat
2
"two": String
"two"
|>.
insert: {β : Type} → Tree β → Nat → β → Tree β
insert
3: Nat
3
"three": String
"three"
|>.
insert: {β : Type} → Tree β → Nat → β → Tree β
insert
1: Nat
1
"one": String
"one"
[(1, "one"), (2, "two"), (3, "three")]
Tree.leaf: {β : Type} → Tree β
Tree.leaf
.
insert: {β : Type} → Tree β → Nat → β → Tree β
insert
2: Nat
2
"two": String
"two"
|>.
insert: {β : Type} → Tree β → Nat → β → Tree β
insert
3: Nat
3
"three": String
"three"
|>.
insert: {β : Type} → Tree β → Nat → β → Tree β
insert
1: Nat
1
"one": String
"one"
|>.
toList: {β : Type} → Tree β → List (Nat × β)
toList

The implementation of Tree.toList is inefficient because of how it uses the ++ operator. On a balanced tree its running time is linearithmic, because it does a linear number of concatenations at each level of the tree. On an unbalanced tree it's quadratic time. Here's a tail-recursive implementation than runs in linear time, regardless of whether the tree is balanced:

def 
Tree.toListTR: {β : Type u_1} → Tree β → List (Nat × β)
Tree.toListTR
(
t: Tree β
t
:
Tree: Type u_1 → Type u_1
Tree
β: Type u_1
β
) :
List: Type u_1 → Type u_1
List
(
Nat: Type
Nat
×
β: Type u_1
β
) :=
go: Tree β → List (Nat × β) → List (Nat × β)
go
t: Tree β
t
[]: List (Nat × β)
[]
where
go: {β : Type u_1} → Tree β → List (Nat × β) → List (Nat × β)
go
(
t: Tree β
t
:
Tree: Type u_1 → Type u_1
Tree
β: Type u_1
β
) (
acc: List (Nat × β)
acc
:
List: Type u_1 → Type u_1
List
(
Nat: Type
Nat
×
β: Type u_1
β
)) :
List: Type u_1 → Type u_1
List
(
Nat: Type
Nat
×
β: Type u_1
β
) := match
t: Tree β
t
with |
leaf: {β : Type ?u.3168} → Tree β
leaf
=>
acc: List (Nat × β)
acc
|
node: {β : Type ?u.3178} → Tree β → Nat → β → Tree β → Tree β
node
l: Tree β
l
k: Nat
k
v: β
v
r: Tree β
r
=>
go: Tree β → List (Nat × β) → List (Nat × β)
go
l: Tree β
l
((
k: Nat
k
,
v: β
v
) ::
go: Tree β → List (Nat × β) → List (Nat × β)
go
r: Tree β
r
acc: List (Nat × β)
acc
)

We now prove that t.toList and t.toListTR return the same list. The proof is on induction, and as we used the auxiliary function go to define Tree.toListTR, we use the auxiliary theorem go to prove the theorem.

The proof of the auxiliary theorem is by induction on t. The generalizing acc modifier instructs Lean to revert acc, apply the induction theorem for Trees, and then reintroduce acc in each case. By using generalizing, we obtain the more general induction hypotheses

  • left_ih : ∀ acc, toListTR.go left acc = toList left ++ acc

  • right_ih : ∀ acc, toListTR.go right acc = toList right ++ acc

Recall that the combinator tac <;> tac' runs tac on the main goal and tac' on each produced goal, concatenating all goals produced by tac'. In this theorem, we use it to apply simp and close each subgoal produced by the induction tactic.

The simp parameters toListTR.go and toList instruct the simplifier to try to reduce and/or apply auto generated equation theorems for these two functions. The parameter * instructs the simplifier to use any equation in a goal as rewriting rules. In this particular case, simp uses the induction hypotheses as rewriting rules. Finally, the parameter List.append_assoc instructs the simplifier to use the List.append_assoc theorem as a rewriting rule.

theorem 
Tree.toList_eq_toListTR: ∀ {β : Type u_1} (t : Tree β), t.toList = t.toListTR
Tree.toList_eq_toListTR
(
t: Tree β
t
:
Tree: Type u_1 → Type u_1
Tree
β: Type u_1
β
) :
t: Tree β
t
.
toList: {β : Type u_1} → Tree β → List (Nat × β)
toList
=
t: Tree β
t
.
toListTR: {β : Type u_1} → Tree β → List (Nat × β)
toListTR
:=

Goals accomplished! 🐙

Goals accomplished! 🐙
where
go: ∀ {β : Type u_1} (t : Tree β) (acc : List (Nat × β)), toListTR.go t acc = t.toList ++ acc
go
(
t: Tree β
t
:
Tree: Type u_1 → Type u_1
Tree
β: Type u_1
β
) (
acc: List (Nat × β)
acc
:
List: Type u_1 → Type u_1
List
(
Nat: Type
Nat
×
β: Type u_1
β
)) :
toListTR.go: {β : Type u_1} → Tree β → List (Nat × β) → List (Nat × β)
toListTR.go
t: Tree β
t
acc: List (Nat × β)
acc
=
t: Tree β
t
.
toList: {β : Type u_1} → Tree β → List (Nat × β)
toList
++
acc: List (Nat × β)
acc
:=

Goals accomplished! 🐙
β: Type u_1
t: Tree β
acc: List (Nat × β)

leaf
toListTR.go leaf acc = leaf.toList ++ acc
β: Type u_1
t, left✝: Tree β
key✝: Nat
value✝: β
right✝: Tree β
left_ih✝: (acc : List (Nat × β)), toListTR.go left acc = left✝.toList ++ acc
right_ih✝: (acc : List (Nat × β)), toListTR.go right acc = right✝.toList ++ acc
acc: List (Nat × β)
toListTR.go (left✝.node key value right) acc = (left✝.node key value right).toList ++ acc
β: Type u_1
t: Tree β
acc: List (Nat × β)

leaf
toListTR.go leaf acc = leaf.toList ++ acc
β: Type u_1
t, left✝: Tree β
key✝: Nat
value✝: β
right✝: Tree β
left_ih✝: (acc : List (Nat × β)), toListTR.go left acc = left✝.toList ++ acc
right_ih✝: (acc : List (Nat × β)), toListTR.go right acc = right✝.toList ++ acc
acc: List (Nat × β)
toListTR.go (left✝.node key value right) acc = (left✝.node key value right).toList ++ acc

Goals accomplished! 🐙

The [csimp] annotation instructs the Lean code generator to replace any Tree.toList with Tree.toListTR when generating code.

@[csimp] theorem 
Tree.toList_eq_toListTR_csimp: @toList = @toListTR
Tree.toList_eq_toListTR_csimp
: @
Tree.toList: {β : Type u_1} → Tree β → List (Nat × β)
Tree.toList
= @
Tree.toListTR: {β : Type u_1} → Tree β → List (Nat × β)
Tree.toListTR
:=

Goals accomplished! 🐙
β: Type u_1
t: Tree β

h.h
t.toList = t.toListTR

Goals accomplished! 🐙

The implementations of Tree.find? and Tree.insert assume that values of type tree obey the BST invariant: for any non-empty node with key k, all the values of the left subtree are less than k and all the values of the right subtree are greater than k. But that invariant is not part of the definition of tree.

So, let's formalize the BST invariant. Here's one way to do so. First, we define a helper ForallTree to express that idea that a predicate holds at every node of a tree:

inductive 
ForallTree: {β : Type u_1} → (Nat → β → Prop) → Tree β → Prop
ForallTree
(
p: Nat → β → Prop
p
:
Nat: Type
Nat
β: Type u_1
β
Prop: Type
Prop
) :
Tree: Type u_1 → Type u_1
Tree
β: Type u_1
β
Prop: Type
Prop
|
leaf: ∀ {β : Type u_1} {p : Nat → β → Prop}, ForallTree p Tree.leaf
leaf
:
ForallTree: {β : Type u_1} → (Nat → β → Prop) → Tree β → Prop
ForallTree
p: Nat → β → Prop
p
.leaf: {β : Type u_1} → Tree β
.leaf
|
node: ∀ {β : Type u_1} {p : Nat → β → Prop} {left : Tree β} {key : Nat} {value : β} {right : Tree β}, ForallTree p left → p key value → ForallTree p right → ForallTree p (left.node key value right)
node
:
ForallTree: {β : Type u_1} → (Nat → β → Prop) → Tree β → Prop
ForallTree
p: Nat → β → Prop
p
left: Tree β
left
p: Nat → β → Prop
p
key: Nat
key
value: β
value
ForallTree: {β : Type u_1} → (Nat → β → Prop) → Tree β → Prop
ForallTree
p: Nat → β → Prop
p
right: Tree β
right
ForallTree: {β : Type u_1} → (Nat → β → Prop) → Tree β → Prop
ForallTree
p: Nat → β → Prop
p
(
.node: {β : Type u_1} → Tree β → Nat → β → Tree β → Tree β
.node
left: Tree β
left
key: Nat
key
value: β
value
right: Tree β
right
)

Second, we define the BST invariant: An empty tree is a BST. A non-empty tree is a BST if all its left nodes have a lesser key, its right nodes have a greater key, and the left and right subtrees are themselves BSTs.

inductive 
BST: {β : Type u_1} → Tree β → Prop
BST
:
Tree: Type u_1 → Type u_1
Tree
β: Type u_1
β
Prop: Type
Prop
|
leaf: ∀ {β : Type u_1}, BST Tree.leaf
leaf
:
BST: {β : Type u_1} → Tree β → Prop
BST
.leaf: {β : Type u_1} → Tree β
.leaf
|
node: ∀ {β : Type u_1} {key : Nat} {left right : Tree β} {value : β}, ForallTree (fun k v => k < key) left → ForallTree (fun k v => key < k) right → BST left → BST right → BST (left.node key value right)
node
:
ForallTree: {β : Type u_1} → (Nat → β → Prop) → Tree β → Prop
ForallTree
(fun
k: Nat
k
Warning: unused variable `v` note: this linter can be disabled with `set_option linter.unusedVariables false`
=>
k: Nat
k
<
key: Nat
key
)
left: Tree ?m.4990
left
ForallTree: {β : Type u_1} → (Nat → β → Prop) → Tree β → Prop
ForallTree
(fun
k: Nat
k
Warning: unused variable `v` note: this linter can be disabled with `set_option linter.unusedVariables false`
=>
key: Nat
key
<
k: Nat
k
)
right: Tree ?m.4990
right
BST: {β : Type u_1} → Tree β → Prop
BST
left: Tree ?m.4990
left
BST: {β : Type u_1} → Tree β → Prop
BST
right: Tree ?m.4990
right
BST: {β : Type u_1} → Tree β → Prop
BST
(
.node: {β : Type u_1} → Tree β → Nat → β → Tree β → Tree β
.node
left: Tree ?m.4990
left
key: Nat
key
value: ?m.4990
value
right: Tree ?m.4990
right
)

We can use the macro command to create helper tactics for organizing our proofs. The macro have_eq x y tries to prove x = y using linear arithmetic, and then immediately uses the new equality to substitute x with y everywhere in the goal.

The modifier local specifies the scope of the macro.

/-- The `have_eq lhs rhs` tactic (tries to) prove that `lhs = rhs`,
    and then replaces `lhs` with `rhs`. -/
local macro "have_eq " 
lhs: Lean.TSyntax `term
lhs
:
term: Lean.Parser.Category
term
:max
rhs: Lean.TSyntax `term
rhs
:
term: Lean.Parser.Category
term
:max :
tactic: Lean.Parser.Category
tactic
=> `(tactic| (have h : $
lhs: Lean.TSyntax `term
lhs
= $
rhs: Lean.TSyntax `term
rhs
:= -- TODO: replace with linarith by simp_arith at *; apply Nat.le_antisymm <;> assumption try subst $
lhs: Lean.TSyntax `term
lhs
))

The by_cases' e is just the regular by_cases followed by simp using all hypotheses in the current goal as rewriting rules. Recall that the by_cases tactic creates two goals. One where we have h : e and another one containing h : ¬ e. The simplifier uses the h to rewrite e to True in the first subgoal, and e to False in the second. This is particularly useful if e is the condition of an if-statement.

/-- `by_cases' e` is a shorthand form `by_cases e <;> simp[*]` -/
local macro "by_cases' " 
e: Lean.TSyntax `term
e
:
term: Lean.Parser.Category
term
:
tactic: Lean.Parser.Category
tactic
=> `(tactic| by_cases $
e: Lean.TSyntax `term
e
<;> simp [*])

We can use the attribute [simp] to instruct the simplifier to reduce given definitions or apply rewrite theorems. The local modifier limits the scope of this modification to this file.

attribute [local simp] 
Tree.insert: {β : Type u_1} → Tree β → Nat → β → Tree β
Tree.insert

We now prove that Tree.insert preserves the BST invariant using induction and case analysis. Recall that the tactic . tac focuses on the main goal and tries to solve it using tac, or else fails. It is used to structure proofs in Lean. The notation ‹e› is just syntax sugar for (by assumption : e). That is, it tries to find a hypothesis h : e. It is useful to access hypothesis that have auto generated names (aka "inaccessible") names.

theorem 
Tree.forall_insert_of_forall: ∀ {β : Type u_1} {p : Nat → β → Prop} {t : Tree β} {key : Nat} {value : β}, ForallTree p t → p key value → ForallTree p (t.insert key value)
Tree.forall_insert_of_forall
(
h₁: ForallTree p t
h₁
:
ForallTree: {β : Type u_1} → (Nat → β → Prop) → Tree β → Prop
ForallTree
p: Nat → ?m.6770 → Prop
p
t: Tree ?m.6770
t
) (
h₂: p key value
h₂
:
p: Nat → ?m.6770 → Prop
p
key: Nat
key
value: ?m.6770
value
) :
ForallTree: {β : Type u_1} → (Nat → β → Prop) → Tree β → Prop
ForallTree
p: Nat → ?m.6770 → Prop
p
(
t: Tree ?m.6770
t
.
insert: {β : Type u_1} → Tree β → Nat → β → Tree β
insert
key: Nat
key
value: ?m.6770
value
) :=

Goals accomplished! 🐙
β✝: Type u_1
p: Nat β Prop
t: Tree β
key: Nat
value: β
h₁: ForallTree p t
h₂: p key value

ForallTree p (t.insert key value)

Goals accomplished! 🐙

Goals accomplished! 🐙

Goals accomplished! 🐙
β✝: Type u_1
p: Nat β Prop
t: Tree β
key: Nat
value: β
h₂: p key value
left✝: Tree β
k: Nat
value✝: β
right✝: Tree β
hl: ForallTree p left
hp: p k value
hr: ForallTree p right
ihl: ForallTree p (left✝.insert key value)
ihr: ForallTree p (right✝.insert key value)

node
ForallTree p ((left✝.node k value right).insert key value)
β✝: Type u_1
p: Nat β Prop
t: Tree β
key: Nat
value: β
h₂: p key value
left✝: Tree β
k: Nat
value✝: β
right✝: Tree β
hl: ForallTree p left
hp: p k value
hr: ForallTree p right
ihl: ForallTree p (left✝.insert key value)
ihr: ForallTree p (right✝.insert key value)
h✝: key < k

pos
ForallTree p ((left✝.insert key value).node k value right)
β✝: Type u_1
p: Nat β Prop
t: Tree β
key: Nat
value: β
h₂: p key value
left✝: Tree β
k: Nat
value✝: β
right✝: Tree β
hl: ForallTree p left
hp: p k value
hr: ForallTree p right
ihl: ForallTree p (left✝.insert key value)
ihr: ForallTree p (right✝.insert key value)
h✝: ¬key < k
ForallTree p (if k < key then left✝.node k value (right✝.insert key value) else left✝.node key value right)
β✝: Type u_1
p: Nat β Prop
t: Tree β
key: Nat
value: β
h₂: p key value
left✝: Tree β
k: Nat
value✝: β
right✝: Tree β
hl: ForallTree p left
hp: p k value
hr: ForallTree p right
ihl: ForallTree p (left✝.insert key value)
ihr: ForallTree p (right✝.insert key value)
h✝: key < k

pos
ForallTree p ((left✝.insert key value).node k value right)

Goals accomplished! 🐙
β✝: Type u_1
p: Nat β Prop
t: Tree β
key: Nat
value: β
h₂: p key value
left✝: Tree β
k: Nat
value✝: β
right✝: Tree β
hl: ForallTree p left
hp: p k value
hr: ForallTree p right
ihl: ForallTree p (left✝.insert key value)
ihr: ForallTree p (right✝.insert key value)
h✝: ¬key < k

neg
ForallTree p (if k < key then left✝.node k value (right✝.insert key value) else left✝.node key value right)
β✝: Type u_1
p: Nat β Prop
t: Tree β
key: Nat
value: β
h₂: p key value
left✝: Tree β
k: Nat
value✝: β
right✝: Tree β
hl: ForallTree p left
hp: p k value
hr: ForallTree p right
ihl: ForallTree p (left✝.insert key value)
ihr: ForallTree p (right✝.insert key value)
h✝¹: ¬key < k
h✝: k < key

pos
ForallTree p (left✝.node k value (right✝.insert key value))
β✝: Type u_1
p: Nat β Prop
t: Tree β
key: Nat
value: β
h₂: p key value
left✝: Tree β
k: Nat
value✝: β
right✝: Tree β
hl: ForallTree p left
hp: p k value
hr: ForallTree p right
ihl: ForallTree p (left✝.insert key value)
ihr: ForallTree p (right✝.insert key value)
h✝¹: ¬key < k
h✝: ¬k < key
ForallTree p (left✝.node key value right)
β✝: Type u_1
p: Nat β Prop
t: Tree β
key: Nat
value: β
h₂: p key value
left✝: Tree β
k: Nat
value✝: β
right✝: Tree β
hl: ForallTree p left
hp: p k value
hr: ForallTree p right
ihl: ForallTree p (left✝.insert key value)
ihr: ForallTree p (right✝.insert key value)
h✝¹: ¬key < k
h✝: k < key

pos
ForallTree p (left✝.node k value (right✝.insert key value))

Goals accomplished! 🐙
β✝: Type u_1
p: Nat β Prop
t: Tree β
key: Nat
value: β
h₂: p key value
left✝: Tree β
k: Nat
value✝: β
right✝: Tree β
hl: ForallTree p left
hp: p k value
hr: ForallTree p right
ihl: ForallTree p (left✝.insert key value)
ihr: ForallTree p (right✝.insert key value)
h✝¹: ¬key < k
h✝: ¬k < key

neg
ForallTree p (left✝.node key value right)
β✝: Type u_1
p: Nat β Prop
t: Tree β
value: β
left✝: Tree β
k: Nat
value✝: β
right✝: Tree β
hl: ForallTree p left
hp: p k value
hr: ForallTree p right
h₂: p k value
ihl: ForallTree p (left✝.insert k value)
ihr: ForallTree p (right✝.insert k value)
h✝¹, h✝: ¬k < k

neg
ForallTree p (left✝.node k value right)

Goals accomplished! 🐙
theorem
Tree.bst_insert_of_bst: ∀ {β : Type u_1} {t : Tree β}, BST t → ∀ (key : Nat) (value : β), BST (t.insert key value)
Tree.bst_insert_of_bst
{
t: Tree β
t
:
Tree: Type u_1 → Type u_1
Tree
β: Type u_1
β
} (
h: BST t
h
:
BST: {β : Type u_1} → Tree β → Prop
BST
t: Tree β
t
) (
key: Nat
key
:
Nat: Type
Nat
) (
value: β
value
:
β: Type u_1
β
) :
BST: {β : Type u_1} → Tree β → Prop
BST
(
t: Tree β
t
.
insert: {β : Type u_1} → Tree β → Nat → β → Tree β
insert
key: Nat
key
value: β
value
) :=

Goals accomplished! 🐙
β: Type u_1
t: Tree β
h: BST t
key: Nat
value: β

BST (t.insert key value)

Goals accomplished! 🐙

Goals accomplished! 🐙

Goals accomplished! 🐙
β: Type u_1
t: Tree β
key: Nat
value: β
k: Nat
left✝, right✝: Tree β
value✝: β
h₁: ForallTree (fun k_1 v => k_1 < k) left
h₂: ForallTree (fun k_1 v => k < k_1) right
b₁: BST left
b₂: BST right
ih₁: BST (left✝.insert key value)
ih₂: BST (right✝.insert key value)

node
BST ((left✝.node k value right).insert key value)
β: Type u_1
t: Tree β
key: Nat
value: β
k: Nat
left✝, right✝: Tree β
value✝: β
h₁: ForallTree (fun k_1 v => k_1 < k) left
h₂: ForallTree (fun k_1 v => k < k_1) right
b₁: BST left
b₂: BST right
ih₁: BST (left✝.insert key value)
ih₂: BST (right✝.insert key value)

node
BST (if key < k then (left✝.insert key value).node k value right else if k < key then left✝.node k value (right✝.insert key value) else left✝.node key value right)
β: Type u_1
t: Tree β
key: Nat
value: β
k: Nat
left✝, right✝: Tree β
value✝: β
h₁: ForallTree (fun k_1 v => k_1 < k) left
h₂: ForallTree (fun k_1 v => k < k_1) right
b₁: BST left
b₂: BST right
ih₁: BST (left✝.insert key value)
ih₂: BST (right✝.insert key value)
h✝: key < k

pos
BST ((left✝.insert key value).node k value right)
β: Type u_1
t: Tree β
key: Nat
value: β
k: Nat
left✝, right✝: Tree β
value✝: β
h₁: ForallTree (fun k_1 v => k_1 < k) left
h₂: ForallTree (fun k_1 v => k < k_1) right
b₁: BST left
b₂: BST right
ih₁: BST (left✝.insert key value)
ih₂: BST (right✝.insert key value)
h✝: ¬key < k
BST (if k < key then left✝.node k value (right✝.insert key value) else left✝.node key value right)
β: Type u_1
t: Tree β
key: Nat
value: β
k: Nat
left✝, right✝: Tree β
value✝: β
h₁: ForallTree (fun k_1 v => k_1 < k) left
h₂: ForallTree (fun k_1 v => k < k_1) right
b₁: BST left
b₂: BST right
ih₁: BST (left✝.insert key value)
ih₂: BST (right✝.insert key value)
h✝: key < k

pos
BST ((left✝.insert key value).node k value right)

Goals accomplished! 🐙
β: Type u_1
t: Tree β
key: Nat
value: β
k: Nat
left✝, right✝: Tree β
value✝: β
h₁: ForallTree (fun k_1 v => k_1 < k) left
h₂: ForallTree (fun k_1 v => k < k_1) right
b₁: BST left
b₂: BST right
ih₁: BST (left✝.insert key value)
ih₂: BST (right✝.insert key value)
h✝: ¬key < k

neg
BST (if k < key then left✝.node k value (right✝.insert key value) else left✝.node key value right)
β: Type u_1
t: Tree β
key: Nat
value: β
k: Nat
left✝, right✝: Tree β
value✝: β
h₁: ForallTree (fun k_1 v => k_1 < k) left
h₂: ForallTree (fun k_1 v => k < k_1) right
b₁: BST left
b₂: BST right
ih₁: BST (left✝.insert key value)
ih₂: BST (right✝.insert key value)
h✝¹: ¬key < k
h✝: k < key

pos
BST (left✝.node k value (right✝.insert key value))
β: Type u_1
t: Tree β
key: Nat
value: β
k: Nat
left✝, right✝: Tree β
value✝: β
h₁: ForallTree (fun k_1 v => k_1 < k) left
h₂: ForallTree (fun k_1 v => k < k_1) right
b₁: BST left
b₂: BST right
ih₁: BST (left✝.insert key value)
ih₂: BST (right✝.insert key value)
h✝¹: ¬key < k
h✝: ¬k < key
BST (left✝.node key value right)
β: Type u_1
t: Tree β
key: Nat
value: β
k: Nat
left✝, right✝: Tree β
value✝: β
h₁: ForallTree (fun k_1 v => k_1 < k) left
h₂: ForallTree (fun k_1 v => k < k_1) right
b₁: BST left
b₂: BST right
ih₁: BST (left✝.insert key value)
ih₂: BST (right✝.insert key value)
h✝¹: ¬key < k
h✝: k < key

pos
BST (left✝.node k value (right✝.insert key value))

Goals accomplished! 🐙
β: Type u_1
t: Tree β
key: Nat
value: β
k: Nat
left✝, right✝: Tree β
value✝: β
h₁: ForallTree (fun k_1 v => k_1 < k) left
h₂: ForallTree (fun k_1 v => k < k_1) right
b₁: BST left
b₂: BST right
ih₁: BST (left✝.insert key value)
ih₂: BST (right✝.insert key value)
h✝¹: ¬key < k
h✝: ¬k < key

neg
BST (left✝.node key value right)
β: Type u_1
t: Tree β
value: β
k: Nat
left✝, right✝: Tree β
value✝: β
h₁: ForallTree (fun k_1 v => k_1 < k) left
h₂: ForallTree (fun k_1 v => k < k_1) right
b₁: BST left
b₂: BST right
ih₁: BST (left✝.insert k value)
ih₂: BST (right✝.insert k value)
h✝¹, h✝: ¬k < k

neg
BST (left✝.node k value right)

Goals accomplished! 🐙

Now, we define the type BinTree using a Subtype that states that only trees satisfying the BST invariant are BinTrees.

def 
BinTree: Type u → Type u
BinTree
(
β: Type u
β
:
Type u: Type (u + 1)
Type u
) := {
t: Tree β
t
:
Tree: Type u → Type u
Tree
β: Type u
β
//
BST: {β : Type u} → Tree β → Prop
BST
t: Tree β
t
} def
BinTree.mk: {β : Type u_1} → BinTree β
BinTree.mk
:
BinTree: Type u_1 → Type u_1
BinTree
β: Type u_1
β
:= ⟨
.leaf: {β : Type u_1} → Tree β
.leaf
,
.leaf: ∀ {β : Type u_1}, BST Tree.leaf
.leaf
def
BinTree.contains: {β : Type u_1} → BinTree β → Nat → Bool
BinTree.contains
(
b: BinTree β
b
:
BinTree: Type u_1 → Type u_1
BinTree
β: Type u_1
β
) (
k: Nat
k
:
Nat: Type
Nat
) :
Bool: Type
Bool
:=
b: BinTree β
b
.
val: {α : Type u_1} → {p : α → Prop} → Subtype p → α
val
.
contains: {β : Type u_1} → Tree β → Nat → Bool
contains
k: Nat
k
def
BinTree.find?: {β : Type u_1} → BinTree β → Nat → Option β
BinTree.find?
(
b: BinTree β
b
:
BinTree: Type u_1 → Type u_1
BinTree
β: Type u_1
β
) (
k: Nat
k
:
Nat: Type
Nat
) :
Option: Type u_1 → Type u_1
Option
β: Type u_1
β
:=
b: BinTree β
b
.
val: {α : Type u_1} → {p : α → Prop} → Subtype p → α
val
.
find?: {β : Type u_1} → Tree β → Nat → Option β
find?
k: Nat
k
def
BinTree.insert: {β : Type u_1} → BinTree β → Nat → β → BinTree β
BinTree.insert
(
b: BinTree β
b
:
BinTree: Type u_1 → Type u_1
BinTree
β: Type u_1
β
) (
k: Nat
k
:
Nat: Type
Nat
) (
v: β
v
:
β: Type u_1
β
) :
BinTree: Type u_1 → Type u_1
BinTree
β: Type u_1
β
:= ⟨
b: BinTree β
b
.
val: {α : Type u_1} → {p : α → Prop} → Subtype p → α
val
.
insert: {β : Type u_1} → Tree β → Nat → β → Tree β
insert
k: Nat
k
v: β
v
,
b: BinTree β
b
.
val: {α : Type u_1} → {p : α → Prop} → Subtype p → α
val
.
bst_insert_of_bst: ∀ {β : Type u_1} {t : Tree β}, BST t → ∀ (key : Nat) (value : β), BST (t.insert key value)
bst_insert_of_bst
b: BinTree β
b
.
property: ∀ {α : Type u_1} {p : α → Prop} (self : Subtype p), p self.val
property
k: Nat
k
v: β
v

Finally, we prove that BinTree.find? and BinTree.insert satisfy the map properties.

attribute [local simp]
  
BinTree.mk: {β : Type u_1} → BinTree β
BinTree.mk
BinTree.contains: {β : Type u_1} → BinTree β → Nat → Bool
BinTree.contains
BinTree.find?: {β : Type u_1} → BinTree β → Nat → Option β
BinTree.find?
BinTree.insert: {β : Type u_1} → BinTree β → Nat → β → BinTree β
BinTree.insert
Tree.find?: {β : Type u_1} → Tree β → Nat → Option β
Tree.find?
Tree.contains: {β : Type u_1} → Tree β → Nat → Bool
Tree.contains
Tree.insert: {β : Type u_1} → Tree β → Nat → β → Tree β
Tree.insert
theorem
BinTree.find_mk: ∀ {β : Type u_1} (k : Nat), mk.find? k = none
BinTree.find_mk
(
k: Nat
k
:
Nat: Type
Nat
) :
BinTree.mk: {β : Type u_1} → BinTree β
BinTree.mk
.
find?: {β : Type u_1} → BinTree β → Nat → Option β
find?
k: Nat
k
= (
none: {α : Type u_1} → Option α
none
:
Option: Type u_1 → Type u_1
Option
β: Type u_1
β
) :=

Goals accomplished! 🐙

Goals accomplished! 🐙
theorem
BinTree.find_insert: ∀ {β : Type u_1} (b : BinTree β) (k : Nat) (v : β), (b.insert k v).find? k = some v
BinTree.find_insert
(
b: BinTree β
b
:
BinTree: Type u_1 → Type u_1
BinTree
β: Type u_1
β
) (
k: Nat
k
:
Nat: Type
Nat
) (
v: β
v
:
β: Type u_1
β
) : (
b: BinTree β
b
.
insert: {β : Type u_1} → BinTree β → Nat → β → BinTree β
insert
k: Nat
k
v: β
v
).
find?: {β : Type u_1} → BinTree β → Nat → Option β
find?
k: Nat
k
=
some: {α : Type u_1} → α → Option α
some
v: β
v
:=

Goals accomplished! 🐙
β: Type u_1
b: BinTree β
k: Nat
v: β
t: Tree β
h: BST t

(insert ⟨t, h⟩ k v).find? k = some v
;
β: Type u_1
b: BinTree β
k: Nat
v: β
t: Tree β
h: BST t

(t.insert k v).find? k = some v
β: Type u_1
b: BinTree β
k: Nat
v: β
t: Tree β
h: BST t

(t.insert k v).find? k = some v
β: Type u_1
b: BinTree β
k: Nat
v: β
left: Tree β
key: Nat
value: β
right: Tree β
ihl: BST left (left.insert k v).find? k = some v
ihr: BST right (right.insert k v).find? k = some v
h: BST (left.node key value right)

node
(if k < key then (left.insert k v).node key value right else if key < k then left.node key value (right.insert k v) else left.node k v right).find? k = some v

Goals accomplished! 🐙
β: Type u_1
b: BinTree β
k: Nat
v: β
left: Tree β
key: Nat
value: β
right: Tree β
ihl: BST left (left.insert k v).find? k = some v
ihr: BST right (right.insert k v).find? k = some v
h: BST (left.node key value right)
h✝: k < key

pos
(left.insert k v).find? k = some v
β: Type u_1
b: BinTree β
k: Nat
v: β
left: Tree β
key: Nat
value: β
right: Tree β
ihl: BST left (left.insert k v).find? k = some v
ihr: BST right (right.insert k v).find? k = some v
h: BST (left.node key value right)
h✝: ¬k < key
(if key < k then left.node key value (right.insert k v) else left.node k v right).find? k = some v
β: Type u_1
b: BinTree β
k: Nat
v: β
left: Tree β
key: Nat
value: β
right: Tree β
ihl: BST left (left.insert k v).find? k = some v
ihr: BST right (right.insert k v).find? k = some v
h: BST (left.node key value right)
h✝: k < key

pos
(left.insert k v).find? k = some v
β: Type u_1
b: BinTree β
k: Nat
v: β
left: Tree β
key: Nat
value: β
right: Tree β
ihl: BST left (left.insert k v).find? k = some v
ihr: BST right (right.insert k v).find? k = some v
h✝: k < key
a✝³: BST left
a✝²: ForallTree (fun k v => k < key) left
a✝¹: BST right
a✝: ForallTree (fun k v => key < k) right

pos.node
(left.insert k v).find? k = some v
;
β: Type u_1
b: BinTree β
k: Nat
v: β
left: Tree β
key: Nat
value: β
right: Tree β
ihl: BST left (left.insert k v).find? k = some v
ihr: BST right (right.insert k v).find? k = some v
h✝: k < key
a✝³: BST left
a✝²: ForallTree (fun k v => k < key) left
a✝¹: BST right
a✝: ForallTree (fun k v => key < k) right

pos.node
BST left
;

Goals accomplished! 🐙
β: Type u_1
b: BinTree β
k: Nat
v: β
left: Tree β
key: Nat
value: β
right: Tree β
ihl: BST left (left.insert k v).find? k = some v
ihr: BST right (right.insert k v).find? k = some v
h: BST (left.node key value right)
h✝: ¬k < key

neg
(if key < k then left.node key value (right.insert k v) else left.node k v right).find? k = some v
β: Type u_1
b: BinTree β
k: Nat
v: β
left: Tree β
key: Nat
value: β
right: Tree β
ihl: BST left (left.insert k v).find? k = some v
ihr: BST right (right.insert k v).find? k = some v
h: BST (left.node key value right)
h✝¹: ¬k < key
h✝: key < k

pos
(right.insert k v).find? k = some v
β: Type u_1
b: BinTree β
k: Nat
v: β
left: Tree β
key: Nat
value: β
right: Tree β
ihl: BST left (left.insert k v).find? k = some v
ihr: BST right (right.insert k v).find? k = some v
h✝¹: ¬k < key
h✝: key < k
a✝³: BST left
a✝²: ForallTree (fun k v => k < key) left
a✝¹: BST right
a✝: ForallTree (fun k v => key < k) right

pos.node
(right.insert k v).find? k = some v
;
β: Type u_1
b: BinTree β
k: Nat
v: β
left: Tree β
key: Nat
value: β
right: Tree β
ihl: BST left (left.insert k v).find? k = some v
ihr: BST right (right.insert k v).find? k = some v
h✝¹: ¬k < key
h✝: key < k
a✝³: BST left
a✝²: ForallTree (fun k v => k < key) left
a✝¹: BST right
a✝: ForallTree (fun k v => key < k) right

pos.node
BST right
;

Goals accomplished! 🐙
theorem
BinTree.find_insert_of_ne: ∀ {β : Type u_1} {k k' : Nat} (b : BinTree β), k ≠ k' → ∀ (v : β), (b.insert k v).find? k' = b.find? k'
BinTree.find_insert_of_ne
(
b: BinTree β
b
:
BinTree: Type u_1 → Type u_1
BinTree
β: Type u_1
β
) (
ne: k ≠ k'
ne
:
k: Nat
k
k': Nat
k'
) (
v: β
v
:
β: Type u_1
β
) : (
b: BinTree β
b
.
insert: {β : Type u_1} → BinTree β → Nat → β → BinTree β
insert
k: Nat
k
v: β
v
).
find?: {β : Type u_1} → BinTree β → Nat → Option β
find?
k': Nat
k'
=
b: BinTree β
b
.
find?: {β : Type u_1} → BinTree β → Nat → Option β
find?
k': Nat
k'
:=

Goals accomplished! 🐙
β: Type u_1
k, k': Nat
b: BinTree β
ne: k k'
v: β
t: Tree β
h: BST t

(insert ⟨t, h⟩ k v).find? k' = find? ⟨t, h⟩ k'
;
β: Type u_1
k, k': Nat
b: BinTree β
ne: k k'
v: β
t: Tree β
h: BST t

(t.insert k v).find? k' = t.find? k'
β: Type u_1
k, k': Nat
b: BinTree β
ne: k k'
v: β
t: Tree β
h: BST t

(t.insert k v).find? k' = t.find? k'
β: Type u_1
k, k': Nat
b: BinTree β
ne: k k'
v: β
left: Tree β
key: Nat
value: β
right: Tree β
ihl: BST left (left.insert k v).find? k' = left.find? k'
ihr: BST right (right.insert k v).find? k' = right.find? k'
h: BST (left.node key value right)

node
(if k < key then (left.insert k v).node key value right else if key < k then left.node key value (right.insert k v) else left.node k v right).find? k' = if k' < key then left.find? k' else if key < k' then right.find? k' else some value

Goals accomplished! 🐙
β: Type u_1
k, k': Nat
b: BinTree β
ne: k k'
v: β
h: BST Tree.leaf
le: k k'

leaf
k < k'

Goals accomplished! 🐙

Goals accomplished! 🐙
β: Type u_1
k, k': Nat
b: BinTree β
ne: k k'
v: β
left: Tree β
key: Nat
value: β
right: Tree β
ihl: BST left (left.insert k v).find? k' = left.find? k'
ihr: BST right (right.insert k v).find? k' = right.find? k'
h: BST (left.node key value right)
hl: ForallTree (fun k v => k < key) left
hr: ForallTree (fun k v => key < k) right
bl: BST left
br: BST right

node
(if k < key then (left.insert k v).node key value right else if key < k then left.node key value (right.insert k v) else left.node k v right).find? k' = if k' < key then left.find? k' else if key < k' then right.find? k' else some value
β: Type u_1
k, k': Nat
b: BinTree β
ne: k k'
v: β
left: Tree β
key: Nat
value: β
right: Tree β
ihr: BST right (right.insert k v).find? k' = right.find? k'
h: BST (left.node key value right)
hl: ForallTree (fun k v => k < key) left
hr: ForallTree (fun k v => key < k) right
bl: BST left
br: BST right
ihl: (left.insert k v).find? k' = left.find? k'

node
(if k < key then (left.insert k v).node key value right else if key < k then left.node key value (right.insert k v) else left.node k v right).find? k' = if k' < key then left.find? k' else if key < k' then right.find? k' else some value
β: Type u_1
k, k': Nat
b: BinTree β
ne: k k'
v: β
left: Tree β
key: Nat
value: β
right: Tree β
h: BST (left.node key value right)
hl: ForallTree (fun k v => k < key) left
hr: ForallTree (fun k v => key < k) right
bl: BST left
br: BST right
ihl: (left.insert k v).find? k' = left.find? k'
ihr: (right.insert k v).find? k' = right.find? k'

node
(if k < key then (left.insert k v).node key value right else if key < k then left.node key value (right.insert k v) else left.node k v right).find? k' = if k' < key then left.find? k' else if key < k' then right.find? k' else some value
β: Type u_1
k, k': Nat
b: BinTree β
ne: k k'
v: β
left: Tree β
key: Nat
value: β
right: Tree β
h: BST (left.node key value right)
hl: ForallTree (fun k v => k < key) left
hr: ForallTree (fun k v => key < k) right
bl: BST left
br: BST right
ihl: (left.insert k v).find? k' = left.find? k'
ihr: (right.insert k v).find? k' = right.find? k'
h✝: ¬k < key

neg
(if key < k then left.node key value (right.insert k v) else left.node k v right).find? k' = if k' < key then left.find? k' else if key < k' then right.find? k' else some value
;
β: Type u_1
k, k': Nat
b: BinTree β
ne: k k'
v: β
left: Tree β
key: Nat
value: β
right: Tree β
h: BST (left.node key value right)
hl: ForallTree (fun k v => k < key) left
hr: ForallTree (fun k v => key < k) right
bl: BST left
br: BST right
ihl: (left.insert k v).find? k' = left.find? k'
ihr: (right.insert k v).find? k' = right.find? k'
h✝¹: ¬k < key
h✝: ¬key < k

neg
(if k' < k then left.find? k' else if k < k' then right.find? k' else some v) = if k' < key then left.find? k' else if key < k' then right.find? k' else some value
β: Type u_1
k, k': Nat
b: BinTree β
ne: k k'
v: β
left: Tree β
value: β
right: Tree β
bl: BST left
br: BST right
ihl: (left.insert k v).find? k' = left.find? k'
ihr: (right.insert k v).find? k' = right.find? k'
h: BST (left.node k value right)
hl: ForallTree (fun k_1 v => k_1 < k) left
hr: ForallTree (fun k_1 v => k < k_1) right
h✝¹, h✝: ¬k < k

neg
(if k' < k then left.find? k' else if k < k' then right.find? k' else some v) = if k' < k then left.find? k' else if k < k' then right.find? k' else some value
β: Type u_1
k, k': Nat
b: BinTree β
ne: k k'
v: β
left: Tree β
value: β
right: Tree β
bl: BST left
br: BST right
ihl: (left.insert k v).find? k' = left.find? k'
ihr: (right.insert k v).find? k' = right.find? k'
h: BST (left.node k value right)
hl: ForallTree (fun k_1 v => k_1 < k) left
hr: ForallTree (fun k_1 v => k < k_1) right
h✝², h✝¹: ¬k < k
h✝: ¬k' < k

neg
(if k < k' then right.find? k' else some v) = if k < k' then right.find? k' else some value
;
β: Type u_1
k, k': Nat
b: BinTree β
ne: k k'
v: β
left: Tree β
value: β
right: Tree β
bl: BST left
br: BST right
ihl: (left.insert k v).find? k' = left.find? k'
ihr: (right.insert k v).find? k' = right.find? k'
h: BST (left.node k value right)
hl: ForallTree (fun k_1 v => k_1 < k) left
hr: ForallTree (fun k_1 v => k < k_1) right
h✝³, h✝²: ¬k < k
h✝¹: ¬k' < k
h✝: ¬k < k'

neg
v = value
β: Type u_1
k': Nat
b: BinTree β
v: β
left: Tree β
value: β
right: Tree β
bl: BST left
br: BST right
ne: k' k'
ihl: (left.insert k' v).find? k' = left.find? k'
ihr: (right.insert k' v).find? k' = right.find? k'
h: BST (left.node k' value right)
hl: ForallTree (fun k v => k < k') left
hr: ForallTree (fun k v => k' < k) right
h✝³, h✝², h✝¹, h✝: ¬k' < k'

neg
v = value

Goals accomplished! 🐙

A Certified Type Checker

In this example, we build a certified type checker for a simple expression language.

Remark: this example is based on an example in the book Certified Programming with Dependent Types by Adam Chlipala.

inductive 
Expr: Type
Expr
where |
nat: Nat → Expr
nat
:
Nat: Type
Nat
Expr: Type
Expr
|
plus: Expr → Expr → Expr
plus
:
Expr: Type
Expr
Expr: Type
Expr
Expr: Type
Expr
|
bool: Bool → Expr
bool
:
Bool: Type
Bool
Expr: Type
Expr
|
and: Expr → Expr → Expr
and
:
Expr: Type
Expr
Expr: Type
Expr
Expr: Type
Expr

We define a simple language of types using the inductive datatype Ty, and its typing rules using the inductive predicate HasType.

inductive 
Ty: Type
Ty
where |
nat: Ty
nat
|
bool: Ty
bool
deriving
DecidableEq: Sort u → Sort (max 1 u)
DecidableEq
inductive
HasType: Expr → Ty → Prop
HasType
:
Expr: Type
Expr
Ty: Type
Ty
Prop: Type
Prop
|
nat: ∀ {v : Nat}, HasType (Expr.nat v) Ty.nat
nat
:
HasType: Expr → Ty → Prop
HasType
(
.nat: Nat → Expr
.nat
v: Nat
v
)
.nat: Ty
.nat
|
plus: ∀ {a b : Expr}, HasType a Ty.nat → HasType b Ty.nat → HasType (a.plus b) Ty.nat
plus
:
HasType: Expr → Ty → Prop
HasType
a: Expr
a
.nat: Ty
.nat
HasType: Expr → Ty → Prop
HasType
b: Expr
b
.nat: Ty
.nat
HasType: Expr → Ty → Prop
HasType
(
.plus: Expr → Expr → Expr
.plus
a: Expr
a
b: Expr
b
)
.nat: Ty
.nat
|
bool: ∀ {v : Bool}, HasType (Expr.bool v) Ty.bool
bool
:
HasType: Expr → Ty → Prop
HasType
(
.bool: Bool → Expr
.bool
v: Bool
v
)
.bool: Ty
.bool
|
and: ∀ {a b : Expr}, HasType a Ty.bool → HasType b Ty.bool → HasType (a.and b) Ty.bool
and
:
HasType: Expr → Ty → Prop
HasType
a: Expr
a
.bool: Ty
.bool
HasType: Expr → Ty → Prop
HasType
b: Expr
b
.bool: Ty
.bool
HasType: Expr → Ty → Prop
HasType
(
.and: Expr → Expr → Expr
.and
a: Expr
a
b: Expr
b
)
.bool: Ty
.bool

We can easily show that if e has type t₁ and type t₂, then t₁ and t₂ must be equal by using the cases tactic. This tactic creates a new subgoal for every constructor, and automatically discharges unreachable cases. The tactic combinator tac₁ <;> tac₂ applies tac₂ to each subgoal produced by tac₁. Then, the tactic rfl is used to close all produced goals using reflexivity.

theorem 
HasType.det: ∀ {e : Expr} {t₁ t₂ : Ty}, HasType e t₁ → HasType e t₂ → t₁ = t₂
HasType.det
(
h₁: HasType e t₁
h₁
:
HasType: Expr → Ty → Prop
HasType
e: Expr
e
t₁: Ty
t₁
) (
h₂: HasType e t₂
h₂
:
HasType: Expr → Ty → Prop
HasType
e: Expr
e
t₂: Ty
t₂
) :
t₁: Ty
t₁
=
t₂: Ty
t₂
:=

Goals accomplished! 🐙
t₂: Ty
v✝: Nat
h₂: HasType (Expr.nat v) t₂

nat
Ty.nat = t₂
t₂: Ty
a✝², b✝: Expr
a✝¹: HasType a✝² Ty.nat
a✝: HasType b Ty.nat
h₂: HasType (a✝².plus b) t₂
Ty.nat = t₂
t₂: Ty
v✝: Bool
h₂: HasType (Expr.bool v) t₂
Ty.bool = t₂
t₂: Ty
a✝², b✝: Expr
a✝¹: HasType a✝² Ty.bool
a✝: HasType b Ty.bool
h₂: HasType (a✝².and b) t₂
Ty.bool = t₂
t₂: Ty
v✝: Nat
h₂: HasType (Expr.nat v) t₂

nat
Ty.nat = t₂
t₂: Ty
a✝², b✝: Expr
a✝¹: HasType a✝² Ty.nat
a✝: HasType b Ty.nat
h₂: HasType (a✝².plus b) t₂
Ty.nat = t₂
t₂: Ty
v✝: Bool
h₂: HasType (Expr.bool v) t₂
Ty.bool = t₂
t₂: Ty
a✝², b✝: Expr
a✝¹: HasType a✝² Ty.bool
a✝: HasType b Ty.bool
h₂: HasType (a✝².and b) t₂
Ty.bool = t₂
v✝: Nat

nat.nat
Ty.nat = Ty.nat
v✝: Nat

nat.nat
Ty.nat = Ty.nat
a✝⁴, b✝: Expr
a✝³: HasType a✝⁴ Ty.nat
a✝²: HasType b Ty.nat
a✝¹: HasType a✝⁴ Ty.nat
a✝: HasType b Ty.nat
Ty.nat = Ty.nat
v✝: Bool
Ty.bool = Ty.bool
a✝⁴, b✝: Expr
a✝³: HasType a✝⁴ Ty.bool
a✝²: HasType b Ty.bool
a✝¹: HasType a✝⁴ Ty.bool
a✝: HasType b Ty.bool
Ty.bool = Ty.bool

Goals accomplished! 🐙

The inductive type Maybe p has two constructors: found a h and unknown. The former contains an element a : α and a proof that a satisfies the predicate p. The constructor unknown is used to encode "failure".

inductive 
Maybe: {α : Sort u_1} → (α → Prop) → Sort (max 1 u_1)
Maybe
(
p: α → Prop
p
:
α: Sort u_1
α
Prop: Type
Prop
) where |
found: {α : Sort u_1} → {p : α → Prop} → (a : α) → p a → Maybe p
found
: (
a: α
a
:
α: Sort u_1
α
)
p: α → Prop
p
a: α
a
Maybe: {α : Sort u_1} → (α → Prop) → Sort (max 1 u_1)
Maybe
p: α → Prop
p
|
unknown: {α : Sort u_1} → {p : α → Prop} → Maybe p
unknown

We define a notation for Maybe that is similar to the builtin notation for the Lean builtin type Subtype.

notation "{{ " 
x: Lean.TSyntax `term
x
" | "
p: Lean.TSyntax `term
p
" }}" =>
Maybe: {α : Sort u_1} → (α → Prop) → Sort (max 1 u_1)
Maybe
(fun
x: Lean.TSyntax `term
x
=>
p: Lean.TSyntax `term
p
)

The function Expr.typeCheck e returns a type ty and a proof that e has type ty, or unknown. Recall that, def Expr.typeCheck ... in Lean is notation for namespace Expr def typeCheck ... end Expr. The term .found .nat .nat is sugar for Maybe.found Ty.nat HasType.nat. Lean can infer the namespaces using the expected types.

def 
Expr.typeCheck: (e : Expr) → {{ ty | HasType e ty }}
Expr.typeCheck
(
e: Expr
e
:
Expr: Type
Expr
) : {{
ty: Ty
ty
|
HasType: Expr → Ty → Prop
HasType
e: Expr
e
ty: Ty
ty
}} := match
e: Expr
e
with |
nat: Nat → Expr
nat
.. =>
.found: {α : Type} → {p : α → Prop} → (a : α) → p a → Maybe p
.found
.nat: Ty
.nat
.nat: ∀ {v : Nat}, HasType (nat v) Ty.nat
.nat
|
bool: Bool → Expr
bool
.. =>
.found: {α : Type} → {p : α → Prop} → (a : α) → p a → Maybe p
.found
.bool: Ty
.bool
.bool: ∀ {v : Bool}, HasType (bool v) Ty.bool
.bool
|
plus: Expr → Expr → Expr
plus
a: Expr
a
b: Expr
b
=> match
a: Expr
a
.
typeCheck: (e : Expr) → {{ ty | HasType e ty }}
typeCheck
,
b: Expr
b
.
typeCheck: (e : Expr) → {{ ty | HasType e ty }}
typeCheck
with |
.found: {α : Type} → {p : α → Prop} → (a : α) → p a → Maybe p
.found
.nat: Ty
.nat
h₁: HasType a Ty.nat
h₁
,
.found: {α : Type} → {p : α → Prop} → (a : α) → p a → Maybe p
.found
.nat: Ty
.nat
h₂: HasType b Ty.nat
h₂
=>
.found: {α : Type} → {p : α → Prop} → (a : α) → p a → Maybe p
.found
.nat: Ty
.nat
(
.plus: ∀ {a b : Expr}, HasType a Ty.nat → HasType b Ty.nat → HasType (a.plus b) Ty.nat
.plus
h₁: HasType a Ty.nat
h₁
h₂: HasType b Ty.nat
h₂
) | _, _ =>
.unknown: {α : Type} → {p : α → Prop} → Maybe p
.unknown
|
and: Expr → Expr → Expr
and
a: Expr
a
b: Expr
b
=> match
a: Expr
a
.
typeCheck: (e : Expr) → {{ ty | HasType e ty }}
typeCheck
,
b: Expr
b
.
typeCheck: (e : Expr) → {{ ty | HasType e ty }}
typeCheck
with |
.found: {α : Type} → {p : α → Prop} → (a : α) → p a → Maybe p
.found
.bool: Ty
.bool
h₁: HasType a Ty.bool
h₁
,
.found: {α : Type} → {p : α → Prop} → (a : α) → p a → Maybe p
.found
.bool: Ty
.bool
h₂: HasType b Ty.bool
h₂
=>
.found: {α : Type} → {p : α → Prop} → (a : α) → p a → Maybe p
.found
.bool: Ty
.bool
(
.and: ∀ {a b : Expr}, HasType a Ty.bool → HasType b Ty.bool → HasType (a.and b) Ty.bool
.and
h₁: HasType a Ty.bool
h₁
h₂: HasType b Ty.bool
h₂
) | _, _ =>
.unknown: {α : Type} → {p : α → Prop} → Maybe p
.unknown
theorem
Expr.typeCheck_correct: ∀ {e : Expr} {ty : Ty} {h : HasType e ty}, HasType e ty → e.typeCheck ≠ Maybe.unknown → e.typeCheck = Maybe.found ty h
Expr.typeCheck_correct
(
h₁: HasType e ty
h₁
:
HasType: Expr → Ty → Prop
HasType
e: Expr
e
ty: Ty
ty
) (
h₂: e.typeCheck ≠ Maybe.unknown
h₂
:
e: Expr
e
.
typeCheck: (e : Expr) → {{ ty | HasType e ty }}
typeCheck
.unknown: {α : Type} → {p : α → Prop} → Maybe p
.unknown
) :
e: Expr
e
.
typeCheck: (e : Expr) → {{ ty | HasType e ty }}
typeCheck
=
.found: {α : Type} → {p : α → Prop} → (a : α) → p a → Maybe p
.found
ty: Ty
ty
h: HasType e ty
h
:=

Goals accomplished! 🐙
e: Expr
ty: Ty
h, h₁: HasType e ty

e.typeCheck Maybe.unknown e.typeCheck = Maybe.found ty h
e: Expr
ty: Ty
h, h₁: HasType e ty
x✝: {{ ty | HasType e ty }}

x Maybe.unknown x = Maybe.found ty h

Goals accomplished! 🐙
e: Expr
ty: Ty
h, h₁: HasType e ty
ty': Ty
h': HasType e ty'
h₂✝: Maybe.found ty' h' Maybe.unknown

found
Maybe.found ty' h' = Maybe.found ty h
;
e: Expr
ty: Ty
h, h₁: HasType e ty
ty': Ty
h': HasType e ty'
h₂✝: Maybe.found ty' h' Maybe.unknown
this: ty = ty'

found
Maybe.found ty' h' = Maybe.found ty h
;
e: Expr
ty: Ty
h, h₁, h': HasType e ty
h₂✝: Maybe.found ty h' Maybe.unknown

found
Maybe.found ty h' = Maybe.found ty h
;

Goals accomplished! 🐙

Goals accomplished! 🐙
e: Expr
ty: Ty
h, h₁: HasType e ty
h₂✝: Maybe.unknown Maybe.unknown

unknown
Maybe.unknown = Maybe.found ty h
;

Goals accomplished! 🐙

Now, we prove that if Expr.typeCheck e returns Maybe.unknown, then forall ty, HasType e ty does not hold. The notation e.typeCheck is sugar for Expr.typeCheck e. Lean can infer this because we explicitly said that e has type Expr. The proof is by induction on e and case analysis. The tactic rename_i is used to rename "inaccessible" variables. We say a variable is inaccessible if it is introduced by a tactic (e.g., cases) or has been shadowed by another variable introduced by the user. Note that the tactic simp [typeCheck] is applied to all goal generated by the induction tactic, and closes the cases corresponding to the constructors Expr.nat and Expr.bool.

theorem 
Expr.typeCheck_complete: ∀ {ty : Ty} {e : Expr}, e.typeCheck = Maybe.unknown → ¬HasType e ty
Expr.typeCheck_complete
{
e: Expr
e
:
Expr: Type
Expr
} :
e: Expr
e
.
typeCheck: (e : Expr) → {{ ty | HasType e ty }}
typeCheck
=
.unknown: {α : Type} → {p : α → Prop} → Maybe p
.unknown
¬
HasType: Expr → Ty → Prop
HasType
e: Expr
e
ty: Ty
ty
:=

Goals accomplished! 🐙
ty: Ty
e: Expr

e.typeCheck = Maybe.unknown ¬HasType e ty

Goals accomplished! 🐙

Goals accomplished! 🐙
ty: Ty
a, b: Expr
iha: a.typeCheck = Maybe.unknown ¬HasType a ty
ihb: b.typeCheck = Maybe.unknown ¬HasType b ty
x✝¹: {{ ty | HasType a ty }}
x✝: {{ ty | HasType b ty }}
h₁✝: HasType a Ty.nat
h₂✝: HasType b Ty.nat
heq✝¹: a.typeCheck = Maybe.found Ty.nat h₁
heq✝: b.typeCheck = Maybe.found Ty.nat h₂

plus.h_1
Maybe.found Ty.nat = Maybe.unknown ¬HasType (a.plus b) ty
ty: Ty
a, b: Expr
iha: a.typeCheck = Maybe.unknown ¬HasType a ty
ihb: b.typeCheck = Maybe.unknown ¬HasType b ty
x✝²: {{ ty | HasType a ty }}
x✝¹: {{ ty | HasType b ty }}
x✝: (h₁ : HasType a Ty.nat) (h₂ : HasType b Ty.nat), a.typeCheck = Maybe.found Ty.nat h₁ b.typeCheck = Maybe.found Ty.nat h₂ False
Maybe.unknown = Maybe.unknown ¬HasType (a.plus b) ty
ty: Ty
a, b: Expr
iha: a.typeCheck = Maybe.unknown ¬HasType a ty
ihb: b.typeCheck = Maybe.unknown ¬HasType b ty
x✝¹: {{ ty | HasType a ty }}
x✝: {{ ty | HasType b ty }}
h₁✝: HasType a Ty.nat
h₂✝: HasType b Ty.nat
heq✝¹: a.typeCheck = Maybe.found Ty.nat h₁
heq✝: b.typeCheck = Maybe.found Ty.nat h₂

plus.h_1
Maybe.found Ty.nat = Maybe.unknown ¬HasType (a.plus b) ty
ty: Ty
a, b: Expr
iha: a.typeCheck = Maybe.unknown ¬HasType a ty
ihb: b.typeCheck = Maybe.unknown ¬HasType b ty
x✝²: {{ ty | HasType a ty }}
x✝¹: {{ ty | HasType b ty }}
x✝: (h₁ : HasType a Ty.nat) (h₂ : HasType b Ty.nat), a.typeCheck = Maybe.found Ty.nat h₁ b.typeCheck = Maybe.found Ty.nat h₂ False
Maybe.unknown = Maybe.unknown ¬HasType (a.plus b) ty
ty: Ty
a, b: Expr
iha: a.typeCheck = Maybe.unknown ¬HasType a ty
ihb: b.typeCheck = Maybe.unknown ¬HasType b ty
x✝¹: {{ ty | HasType a ty }}
x✝: {{ ty | HasType b ty }}
h₁✝: HasType a Ty.nat
h₂✝: HasType b Ty.nat
heq✝¹: a.typeCheck = Maybe.found Ty.nat h₁
heq✝: b.typeCheck = Maybe.found Ty.nat h₂
a✝: Maybe.found Ty.nat = Maybe.unknown

¬HasType (a.plus b) ty
;

Goals accomplished! 🐙
ty: Ty
a, b: Expr
iha: a.typeCheck = Maybe.unknown ¬HasType a ty
ihb: b.typeCheck = Maybe.unknown ¬HasType b ty
x✝²: {{ ty | HasType a ty }}
x✝¹: {{ ty | HasType b ty }}
x✝: (h₁ : HasType a Ty.nat) (h₂ : HasType b Ty.nat), a.typeCheck = Maybe.found Ty.nat h₁ b.typeCheck = Maybe.found Ty.nat h₂ False

plus.h_2
Maybe.unknown = Maybe.unknown ¬HasType (a.plus b) ty
ty: Ty
a, b: Expr
iha: a.typeCheck = Maybe.unknown ¬HasType a ty
ihb: b.typeCheck = Maybe.unknown ¬HasType b ty
ra: {{ ty | HasType a ty }}
rb: {{ ty | HasType b ty }}
hnp: (h₁ : HasType a Ty.nat) (h₂ : HasType b Ty.nat), a.typeCheck = Maybe.found Ty.nat h₁ b.typeCheck = Maybe.found Ty.nat h₂ False
h: Maybe.unknown = Maybe.unknown
ht: HasType (a.plus b) ty

False
ty: Ty
a, b: Expr
iha: a.typeCheck = Maybe.unknown ¬HasType a ty
ihb: b.typeCheck = Maybe.unknown ¬HasType b ty
ra: {{ ty | HasType a ty }}
rb: {{ ty | HasType b ty }}
hnp: (h₁ : HasType a Ty.nat) (h₂ : HasType b Ty.nat), a.typeCheck = Maybe.found Ty.nat h₁ b.typeCheck = Maybe.found Ty.nat h₂ False
h: Maybe.unknown = Maybe.unknown
ht: HasType (a.plus b) ty

False

Goals accomplished! 🐙

Goals accomplished! 🐙

Goals accomplished! 🐙
ty: Ty
a, b: Expr
iha: a.typeCheck = Maybe.unknown ¬HasType a ty
ihb: b.typeCheck = Maybe.unknown ¬HasType b ty
x✝¹: {{ ty | HasType a ty }}
x✝: {{ ty | HasType b ty }}
h₁✝: HasType a Ty.bool
h₂✝: HasType b Ty.bool
heq✝¹: a.typeCheck = Maybe.found Ty.bool h₁
heq✝: b.typeCheck = Maybe.found Ty.bool h₂

and.h_1
Maybe.found Ty.bool = Maybe.unknown ¬HasType (a.and b) ty
ty: Ty
a, b: Expr
iha: a.typeCheck = Maybe.unknown ¬HasType a ty
ihb: b.typeCheck = Maybe.unknown ¬HasType b ty
x✝²: {{ ty | HasType a ty }}
x✝¹: {{ ty | HasType b ty }}
x✝: (h₁ : HasType a Ty.bool) (h₂ : HasType b Ty.bool), a.typeCheck = Maybe.found Ty.bool h₁ b.typeCheck = Maybe.found Ty.bool h₂ False
Maybe.unknown = Maybe.unknown ¬HasType (a.and b) ty
ty: Ty
a, b: Expr
iha: a.typeCheck = Maybe.unknown ¬HasType a ty
ihb: b.typeCheck = Maybe.unknown ¬HasType b ty
x✝¹: {{ ty | HasType a ty }}
x✝: {{ ty | HasType b ty }}
h₁✝: HasType a Ty.bool
h₂✝: HasType b Ty.bool
heq✝¹: a.typeCheck = Maybe.found Ty.bool h₁
heq✝: b.typeCheck = Maybe.found Ty.bool h₂

and.h_1
Maybe.found Ty.bool = Maybe.unknown ¬HasType (a.and b) ty
ty: Ty
a, b: Expr
iha: a.typeCheck = Maybe.unknown ¬HasType a ty
ihb: b.typeCheck = Maybe.unknown ¬HasType b ty
x✝²: {{ ty | HasType a ty }}
x✝¹: {{ ty | HasType b ty }}
x✝: (h₁ : HasType a Ty.bool) (h₂ : HasType b Ty.bool), a.typeCheck = Maybe.found Ty.bool h₁ b.typeCheck = Maybe.found Ty.bool h₂ False
Maybe.unknown = Maybe.unknown ¬HasType (a.and b) ty
ty: Ty
a, b: Expr
iha: a.typeCheck = Maybe.unknown ¬HasType a ty
ihb: b.typeCheck = Maybe.unknown ¬HasType b ty
x✝¹: {{ ty | HasType a ty }}
x✝: {{ ty | HasType b ty }}
h₁✝: HasType a Ty.bool
h₂✝: HasType b Ty.bool
heq✝¹: a.typeCheck = Maybe.found Ty.bool h₁
heq✝: b.typeCheck = Maybe.found Ty.bool h₂
a✝: Maybe.found Ty.bool = Maybe.unknown

¬HasType (a.and b) ty
;

Goals accomplished! 🐙
ty: Ty
a, b: Expr
iha: a.typeCheck = Maybe.unknown ¬HasType a ty
ihb: b.typeCheck = Maybe.unknown ¬HasType b ty
x✝²: {{ ty | HasType a ty }}
x✝¹: {{ ty | HasType b ty }}
x✝: (h₁ : HasType a Ty.bool) (h₂ : HasType b Ty.bool), a.typeCheck = Maybe.found Ty.bool h₁ b.typeCheck = Maybe.found Ty.bool h₂ False

and.h_2
Maybe.unknown = Maybe.unknown ¬HasType (a.and b) ty
ty: Ty
a, b: Expr
iha: a.typeCheck = Maybe.unknown ¬HasType a ty
ihb: b.typeCheck = Maybe.unknown ¬HasType b ty
ra: {{ ty | HasType a ty }}
rb: {{ ty | HasType b ty }}
hnp: (h₁ : HasType a Ty.bool) (h₂ : HasType b Ty.bool), a.typeCheck = Maybe.found Ty.bool h₁ b.typeCheck = Maybe.found Ty.bool h₂ False
h: Maybe.unknown = Maybe.unknown
ht: HasType (a.and b) ty

False
ty: Ty
a, b: Expr
iha: a.typeCheck = Maybe.unknown ¬HasType a ty
ihb: b.typeCheck = Maybe.unknown ¬HasType b ty
ra: {{ ty | HasType a ty }}
rb: {{ ty | HasType b ty }}
hnp: (h₁ : HasType a Ty.bool) (h₂ : HasType b Ty.bool), a.typeCheck = Maybe.found Ty.bool h₁ b.typeCheck = Maybe.found Ty.bool h₂ False
h: Maybe.unknown = Maybe.unknown
ht: HasType (a.and b) ty

False

Goals accomplished! 🐙

Goals accomplished! 🐙

Finally, we show that type checking for e can be decided using Expr.typeCheck.

instance: (e : Expr) → (t : Ty) → Decidable (HasType e t)
instance
(
e: Expr
e
:
Expr: Type
Expr
) (
t: Ty
t
:
Ty: Type
Ty
) :
Decidable: Prop → Type
Decidable
(
HasType: Expr → Ty → Prop
HasType
e: Expr
e
t: Ty
t
) := match
h': e.typeCheck = Maybe.unknown
h'
:
e: Expr
e
.
typeCheck: (e : Expr) → {{ ty | HasType e ty }}
typeCheck
with |
.found: {α : Type} → {p : α → Prop} → (a : α) → p a → Maybe p
.found
t': Ty
t'
ht': HasType e t'
ht'
=> if
heq: ¬t = t'
heq
:
t: Ty
t
=
t': Ty
t'
then
isTrue: {p : Prop} → p → Decidable p
isTrue
(
heq: t = t'
heq
ht': HasType e t'
ht'
) else
isFalse: {p : Prop} → ¬p → Decidable p
isFalse
fun
ht: HasType e t
ht
=>
heq: ¬t = t'
heq
(
HasType.det: ∀ {e : Expr} {t₁ t₂ : Ty}, HasType e t₁ → HasType e t₂ → t₁ = t₂
HasType.det
ht: HasType e t
ht
ht': HasType e t'
ht'
) |
.unknown: {{ ty | HasType e ty }}
.unknown
=>
isFalse: {p : Prop} → ¬p → Decidable p
isFalse
(
Expr.typeCheck_complete: ∀ {ty : Ty} {e : Expr}, e.typeCheck = Maybe.unknown → ¬HasType e ty
Expr.typeCheck_complete
h': e.typeCheck = Maybe.unknown
h'
)

The Well-Typed Interpreter

In this example, we build an interpreter for a simple functional programming language, with variables, function application, binary operators and an if...then...else construct. We will use the dependent type system to ensure that any programs which can be represented are well-typed.

Remark: this example is based on an example found in the Idris manual.

Vectors

A Vector is a list of size n whose elements belong to a type α.

inductive 
Vector: Type u → Nat → Type u
Vector
(
α: Type u
α
:
Type u: Type (u + 1)
Type u
) :
Nat: Type
Nat
Type u: Type (u + 1)
Type u
|
nil: {α : Type u} → Vector α 0
nil
:
Vector: Type u → Nat → Type u
Vector
α: Type u
α
0: Nat
0
|
cons: {α : Type u} → {n : Nat} → α → Vector α n → Vector α (n + 1)
cons
:
α: Type u
α
Vector: Type u → Nat → Type u
Vector
α: Type u
α
n: Nat
n
Vector: Type u → Nat → Type u
Vector
α: Type u
α
(
n: Nat
n
+
1: Nat
1
)

We can overload the List.cons notation :: and use it to create Vectors.

infix:67 " :: " => 
Vector.cons: {α : Type u} → {n : Nat} → α → Vector α n → Vector α (n + 1)
Vector.cons

Now, we define the types of our simple functional language. We have integers, booleans, and functions, represented by Ty.

inductive 
Ty: Type
Ty
where |
int: Ty
int
|
bool: Ty
bool
|
fn: Ty → Ty → Ty
fn
(
a: Ty
a
r: Ty
r
:
Ty: Type
Ty
)

We can write a function to translate Ty values to a Lean type — remember that types are first class, so can be calculated just like any other value. We mark Ty.interp as [reducible] to make sure the typeclass resolution procedure can unfold/reduce it. For example, suppose Lean is trying to synthesize a value for the instance Add (Ty.interp Ty.int). Since Ty.interp is marked as [reducible], the typeclass resolution procedure can reduce Ty.interp Ty.int to Int, and use the builtin instance for Add Int as the solution.

@[reducible] def 
Ty.interp: Ty → Type
Ty.interp
:
Ty: Type
Ty
Type: Type 1
Type
|
int: Ty
int
=>
Int: Type
Int
|
bool: Ty
bool
=>
Bool: Type
Bool
|
fn: Ty → Ty → Ty
fn
a: Ty
a
r: Ty
r
=>
a: Ty
a
.
interp: Ty → Type
interp
r: Ty
r
.
interp: Ty → Type
interp

Expressions are indexed by the types of the local variables, and the type of the expression itself.

inductive 
HasType: {n : Nat} → Fin n → Vector Ty n → Ty → Type
HasType
:
Fin: Nat → Type
Fin
n: Nat
n
Vector: Type → Nat → Type
Vector
Ty: Type
Ty
n: Nat
n
Ty: Type
Ty
Type: Type 1
Type
where |
stop: {ty : Ty} → {x : Nat} → {ctx : Vector Ty x} → HasType 0 (ty :: ctx) ty
stop
:
HasType: {n : Nat} → Fin n → Vector Ty n → Ty → Type
HasType
0: Fin (?m.2856 + 1)
0
(
ty: Ty
ty
::
ctx: Vector Ty ?m.2856
ctx
)
ty: Ty
ty
|
pop: {x : Nat} → {k : Fin x} → {ctx : Vector Ty x} → {ty u : Ty} → HasType k ctx ty → HasType k.succ (u :: ctx) ty
pop
:
HasType: {n : Nat} → Fin n → Vector Ty n → Ty → Type
HasType
k: Fin ?m.3046
k
ctx: Vector Ty ?m.3046
ctx
ty: Ty
ty
HasType: {n : Nat} → Fin n → Vector Ty n → Ty → Type
HasType
k: Fin ?m.3046
k
.
succ: {n : Nat} → Fin n → Fin (n + 1)
succ
(
u: Ty
u
::
ctx: Vector Ty ?m.3046
ctx
)
ty: Ty
ty
inductive
Expr: {n : Nat} → Vector Ty n → Ty → Type
Expr
:
Vector: Type → Nat → Type
Vector
Ty: Type
Ty
n: Nat
n
Ty: Type
Ty
Type: Type 1
Type
where |
var: {n : Nat} → {i : Fin n} → {ctx : Vector Ty n} → {ty : Ty} → HasType i ctx ty → Expr ctx ty
var
:
HasType: {n : Nat} → Fin n → Vector Ty n → Ty → Type
HasType
i: Fin ?m.3929
i
ctx: Vector Ty ?m.3929
ctx
ty: Ty
ty
Expr: {n : Nat} → Vector Ty n → Ty → Type
Expr
ctx: Vector Ty ?m.3929
ctx
ty: Ty
ty
|
val: {n : Nat} → {ctx : Vector Ty n} → Int → Expr ctx Ty.int
val
:
Int: Type
Int
Expr: {n : Nat} → Vector Ty n → Ty → Type
Expr
ctx: Vector Ty ?m.3950
ctx
Ty.int: Ty
Ty.int
|
lam: {n : Nat} → {a : Ty} → {ctx : Vector Ty n} → {ty : Ty} → Expr (a :: ctx) ty → Expr ctx (a.fn ty)
lam
:
Expr: {n : Nat} → Vector Ty n → Ty → Type
Expr
(
a: Ty
a
::
ctx: Vector Ty ?m.4114
ctx
)
ty: Ty
ty
Expr: {n : Nat} → Vector Ty n → Ty → Type
Expr
ctx: Vector Ty ?m.4114
ctx
(
Ty.fn: Ty → Ty → Ty
Ty.fn
a: Ty
a
ty: Ty
ty
) |
app: {n : Nat} → {ctx : Vector Ty n} → {a ty : Ty} → Expr ctx (a.fn ty) → Expr ctx a → Expr ctx ty
app
:
Expr: {n : Nat} → Vector Ty n → Ty → Type
Expr
ctx: Vector Ty ?m.4269
ctx
(
Ty.fn: Ty → Ty → Ty
Ty.fn
a: Ty
a
ty: Ty
ty
)
Expr: {n : Nat} → Vector Ty n → Ty → Type
Expr
ctx: Vector Ty ?m.4269
ctx
a: Ty
a
Expr: {n : Nat} → Vector Ty n → Ty → Type
Expr
ctx: Vector Ty ?m.4269
ctx
ty: Ty
ty
|
op: {n : Nat} → {ctx : Vector Ty n} → {a b c : Ty} → (a.interp → b.interp → c.interp) → Expr ctx a → Expr ctx b → Expr ctx c
op
: (
a: Ty
a
.
interp: Ty → Type
interp
b: Ty
b
.
interp: Ty → Type
interp
c: Ty
c
.
interp: Ty → Type
interp
)
Expr: {n : Nat} → Vector Ty n → Ty → Type
Expr
ctx: Vector Ty ?m.4359
ctx
a: Ty
a
Expr: {n : Nat} → Vector Ty n → Ty → Type
Expr
ctx: Vector Ty ?m.4359
ctx
b: Ty
b
Expr: {n : Nat} → Vector Ty n → Ty → Type
Expr
ctx: Vector Ty ?m.4359
ctx
c: Ty
c
|
ife: {n : Nat} → {ctx : Vector Ty n} → {a : Ty} → Expr ctx Ty.bool → Expr ctx a → Expr ctx a → Expr ctx a
ife
:
Expr: {n : Nat} → Vector Ty n → Ty → Type
Expr
ctx: Vector Ty ?m.4480
ctx
Ty.bool: Ty
Ty.bool
Expr: {n : Nat} → Vector Ty n → Ty → Type
Expr
ctx: Vector Ty ?m.4480
ctx
a: Ty
a
Expr: {n : Nat} → Vector Ty n → Ty → Type
Expr
ctx: Vector Ty ?m.4480
ctx
a: Ty
a
Expr: {n : Nat} → Vector Ty n → Ty → Type
Expr
ctx: Vector Ty ?m.4480
ctx
a: Ty
a
|
delay: {n : Nat} → {ctx : Vector Ty n} → {a : Ty} → (Unit → Expr ctx a) → Expr ctx a
delay
: (
Unit: Type
Unit
Expr: {n : Nat} → Vector Ty n → Ty → Type
Expr
ctx: Vector Ty ?m.4523
ctx
a: Ty
a
)
Expr: {n : Nat} → Vector Ty n → Ty → Type
Expr
ctx: Vector Ty ?m.4523
ctx
a: Ty
a

We use the command open to create the aliases stop and pop for HasType.stop and HasType.pop respectively.

open HasType (
stop: {ty : Ty} → {x : Nat} → {ctx : Vector Ty x} → HasType 0 (ty :: ctx) ty
stop
pop: {x : Nat} → {k : Fin x} → {ctx : Vector Ty x} → {ty u : Ty} → HasType k ctx ty → HasType k.succ (u :: ctx) ty
pop
)

Since expressions are indexed by their type, we can read the typing rules of the language from the definitions of the constructors. Let us look at each constructor in turn.

We use a nameless representation for variables — they are de Bruijn indexed. Variables are represented by a proof of their membership in the context, HasType i ctx ty, which is a proof that variable i in context ctx has type ty.

We can treat stop as a proof that the most recently defined variable is well-typed, and pop n as a proof that, if the nth most recently defined variable is well-typed, so is the n+1th. In practice, this means we use stop to refer to the most recently defined variable, pop stop to refer to the next, and so on, via the Expr.var constructor.

A value Expr.val carries a concrete representation of an integer.

A lambda Expr.lam creates a function. In the scope of a function of type Ty.fn a ty, there is a new local variable of type a.

A function application Expr.app produces a value of type ty given a function from a to ty and a value of type a.

The constructor Expr.op allows us to use arbitrary binary operators, where the type of the operator informs what the types of the arguments must be.

Finally, the constructor Exp.ife represents a if-then-else expression. The condition is a Boolean, and each branch must have the same type.

The auxiliary constructor Expr.delay is used to delay evaluation.

When we evaluate an Expr, we’ll need to know the values in scope, as well as their types. Env is an environment, indexed over the types in scope. Since an environment is just another form of list, albeit with a strongly specified connection to the vector of local variable types, we overload again the notation :: so that we can use the usual list syntax. Given a proof that a variable is defined in the context, we can then produce a value from the environment.

inductive 
Env: {n : Nat} → Vector Ty n → Type
Env
:
Vector: Type → Nat → Type
Vector
Ty: Type
Ty
n: Nat
n
Type: Type 1
Type
where |
nil: Env Vector.nil
nil
:
Env: {n : Nat} → Vector Ty n → Type
Env
Vector.nil: {α : Type} → Vector α 0
Vector.nil
|
cons: {a : Ty} → {x : Nat} → {ctx : Vector Ty x} → a.interp → Env ctx → Env (a :: ctx)
cons
:
Ty.interp: Ty → Type
Ty.interp
a: Ty
a
Env: {n : Nat} → Vector Ty n → Type
Env
ctx: Vector Ty ?m.9340
ctx
Env: {n : Nat} → Vector Ty n → Type
Env
(
a: Ty
a
::
ctx: Vector Ty ?m.9340
ctx
) infix:67 " :: " =>
Env.cons: {a : Ty} → {x : Nat} → {ctx : Vector Ty x} → a.interp → Env ctx → Env (a :: ctx)
Env.cons
def
Env.lookup: {n : Nat} → {i : Fin n} → {ctx : Vector Ty n} → {ty : Ty} → HasType i ctx ty → Env ctx → ty.interp
Env.lookup
:
HasType: {n : Nat} → Fin n → Vector Ty n → Ty → Type
HasType
i: Fin ?m.11969
i
ctx: Vector Ty ?m.11969
ctx
ty: Ty
ty
Env: {n : Nat} → Vector Ty n → Type
Env
ctx: Vector Ty ?m.11969
ctx
ty: Ty
ty
.
interp: Ty → Type
interp
|
stop: {ty : Ty} → {x : Nat} → {ctx : Vector Ty x} → HasType 0 (ty :: ctx) ty
stop
,
x: ty.interp
x
::
xs: Env ctx✝
xs
=>
x: ty.interp
x
|
pop: {x : Nat} → {k : Fin x} → {ctx : Vector Ty x} → {ty u : Ty} → HasType k ctx ty → HasType k.succ (u :: ctx) ty
pop
k: HasType k✝ ctx✝ ty
k
,
x: u✝.interp
x
::
xs: Env ctx✝
xs
=>
lookup: {n : Nat} → {i : Fin n} → {ctx : Vector Ty n} → {ty : Ty} → HasType i ctx ty → Env ctx → ty.interp
lookup
k: HasType k✝ ctx✝ ty
k
xs: Env ctx✝
xs

Given this, an interpreter is a function which translates an Expr into a Lean value with respect to a specific environment.

def 
Expr.interp: {a : Nat} → {ctx : Vector Ty a} → {ty : Ty} → Env ctx → Expr ctx ty → ty.interp
Expr.interp
(
env: Env ctx
env
:
Env: {n : Nat} → Vector Ty n → Type
Env
ctx: Vector Ty ?m.14834
ctx
) :
Expr: {n : Nat} → Vector Ty n → Ty → Type
Expr
ctx: Vector Ty ?m.14834
ctx
ty: Ty
ty
ty: Ty
ty
.
interp: Ty → Type
interp
|
var: {n : Nat} → {i : Fin n} → {ctx : Vector Ty n} → {ty : Ty} → HasType i ctx ty → Expr ctx ty
var
i: HasType i✝ ctx ty
i
=>
env: Env ctx
env
.
lookup: {n : Nat} → {i : Fin n} → {ctx : Vector Ty n} → {ty : Ty} → HasType i ctx ty → Env ctx → ty.interp
lookup
i: HasType i✝ ctx ty
i
|
val: {n : Nat} → {ctx : Vector Ty n} → Int → Expr ctx Ty.int
val
x: Int
x
=>
x: Int
x
|
lam: {n : Nat} → {a : Ty} → {ctx : Vector Ty n} → {ty : Ty} → Expr (a :: ctx) ty → Expr ctx (a.fn ty)
lam
b: Expr (a✝ :: ctx) ty✝
b
=> fun
x: a✝.interp
x
=>
b: Expr (a✝ :: ctx) ty✝
b
.
interp: {a : Nat} → {ctx : Vector Ty a} → {ty : Ty} → Env ctx → Expr ctx ty → ty.interp
interp
(
Env.cons: {a : Ty} → {x : Nat} → {ctx : Vector Ty x} → a.interp → Env ctx → Env (a :: ctx)
Env.cons
x: a✝.interp
x
env: Env ctx
env
) |
app: {n : Nat} → {ctx : Vector Ty n} → {a ty : Ty} → Expr ctx (a.fn ty) → Expr ctx a → Expr ctx ty
app
f: Expr ctx (a✝.fn ty)
f
a: Expr ctx a✝
a
=>
f: Expr ctx (a✝.fn ty)
f
.
interp: {a : Nat} → {ctx : Vector Ty a} → {ty : Ty} → Env ctx → Expr ctx ty → ty.interp
interp
env: Env ctx
env
(
a: Expr ctx a✝
a
.
interp: {a : Nat} → {ctx : Vector Ty a} → {ty : Ty} → Env ctx → Expr ctx ty → ty.interp
interp
env: Env ctx
env
) |
op: {n : Nat} → {ctx : Vector Ty n} → {a b c : Ty} → (a.interp → b.interp → c.interp) → Expr ctx a → Expr ctx b → Expr ctx c
op
o: a✝.interp → b✝.interp → ty.interp
o
x: Expr ctx a✝
x
y: Expr ctx b✝
y
=>
o: a✝.interp → b✝.interp → ty.interp
o
(
x: Expr ctx a✝
x
.
interp: {a : Nat} → {ctx : Vector Ty a} → {ty : Ty} → Env ctx → Expr ctx ty → ty.interp
interp
env: Env ctx
env
) (
y: Expr ctx b✝
y
.
interp: {a : Nat} → {ctx : Vector Ty a} → {ty : Ty} → Env ctx → Expr ctx ty → ty.interp
interp
env: Env ctx
env
) |
ife: {n : Nat} → {ctx : Vector Ty n} → {a : Ty} → Expr ctx Ty.bool → Expr ctx a → Expr ctx a → Expr ctx a
ife
c: Expr ctx Ty.bool
c
t: Expr ctx ty
t
e: Expr ctx ty
e
=> if
c: Expr ctx Ty.bool
c
.
interp: {a : Nat} → {ctx : Vector Ty a} → {ty : Ty} → Env ctx → Expr ctx ty → ty.interp
interp
env: Env ctx
env
then
t: Expr ctx ty
t
.
interp: {a : Nat} → {ctx : Vector Ty a} → {ty : Ty} → Env ctx → Expr ctx ty → ty.interp
interp
env: Env ctx
env
else
e: Expr ctx ty
e
.
interp: {a : Nat} → {ctx : Vector Ty a} → {ty : Ty} → Env ctx → Expr ctx ty → ty.interp
interp
env: Env ctx
env
|
delay: {n : Nat} → {ctx : Vector Ty n} → {a : Ty} → (Unit → Expr ctx a) → Expr ctx a
delay
a: Unit → Expr ctx ty
a
=> (
a: Unit → Expr ctx ty
a
(): Unit
()
).
interp: {a : Nat} → {ctx : Vector Ty a} → {ty : Ty} → Env ctx → Expr ctx ty → ty.interp
interp
env: Env ctx
env
open Expr

We can make some simple test functions. Firstly, adding two inputs fun x y => y + x is written as follows.

def 
add: {a : Nat} → {ctx : Vector Ty a} → Expr ctx (Ty.int.fn (Ty.int.fn Ty.int))
add
:
Expr: {n : Nat} → Vector Ty n → Ty → Type
Expr
ctx: Vector Ty ?m.18844
ctx
(
Ty.fn: Ty → Ty → Ty
Ty.fn
Ty.int: Ty
Ty.int
(
Ty.fn: Ty → Ty → Ty
Ty.fn
Ty.int: Ty
Ty.int
Ty.int: Ty
Ty.int
)) :=
lam: {n : Nat} → {a : Ty} → {ctx : Vector Ty n} → {ty : Ty} → Expr (a :: ctx) ty → Expr ctx (a.fn ty)
lam
(
lam: {n : Nat} → {a : Ty} → {ctx : Vector Ty n} → {ty : Ty} → Expr (a :: ctx) ty → Expr ctx (a.fn ty)
lam
(
op: {n : Nat} → {ctx : Vector Ty n} → {a b c : Ty} → (a.interp → b.interp → c.interp) → Expr ctx a → Expr ctx b → Expr ctx c
op
(·+·): Ty.int.interp → Ty.int.interp → Ty.int.interp
(·+·)
(
var: {n : Nat} → {i : Fin n} → {ctx : Vector Ty n} → {ty : Ty} → HasType i ctx ty → Expr ctx ty
var
stop: {ty : Ty} → {x : Nat} → {ctx : Vector Ty x} → HasType 0 (ty :: ctx) ty
stop
) (
var: {n : Nat} → {i : Fin n} → {ctx : Vector Ty n} → {ty : Ty} → HasType i ctx ty → Expr ctx ty
var
(
pop: {x : Nat} → {k : Fin x} → {ctx : Vector Ty x} → {ty u : Ty} → HasType k ctx ty → HasType k.succ (u :: ctx) ty
pop
stop: {ty : Ty} → {x : Nat} → {ctx : Vector Ty x} → HasType 0 (ty :: ctx) ty
stop
))))
30
add: {a : Nat} → {ctx : Vector Ty a} → Expr ctx (Ty.int.fn (Ty.int.fn Ty.int))
add
.
interp: {a : Nat} → {ctx : Vector Ty a} → {ty : Ty} → Env ctx → Expr ctx ty → ty.interp
interp
Env.nil: Env Vector.nil
Env.nil
10: Ty.int.interp
10
20: Ty.int.interp
20

More interestingly, a factorial function fact (e.g. fun x => if (x == 0) then 1 else (fact (x-1) * x)), can be written as. Note that this is a recursive (non-terminating) definition. For every input value, the interpreter terminates, but the definition itself is non-terminating. We use two tricks to make sure Lean accepts it. First, we use the auxiliary constructor Expr.delay to delay its unfolding. Second, we add the annotation decreasing_by sorry which can be viewed as "trust me, this recursive definition makes sense". Recall that sorry is an unsound axiom in Lean.

def 
Warning: declaration uses 'sorry'
:
Expr: {n : Nat} → Vector Ty n → Ty → Type
Expr
ctx: Vector Ty ?m.19227
ctx
(
Ty.fn: Ty → Ty → Ty
Ty.fn
Ty.int: Ty
Ty.int
Ty.int: Ty
Ty.int
) :=
lam: {n : Nat} → {a : Ty} → {ctx : Vector Ty n} → {ty : Ty} → Expr (a :: ctx) ty → Expr ctx (a.fn ty)
lam
(
ife: {n : Nat} → {ctx : Vector Ty n} → {a : Ty} → Expr ctx Ty.bool → Expr ctx a → Expr ctx a → Expr ctx a
ife
(
op: {n : Nat} → {ctx : Vector Ty n} → {a b c : Ty} → (a.interp → b.interp → c.interp) → Expr ctx a → Expr ctx b → Expr ctx c
op
(·==·): Ty.int.interp → Ty.int.interp → Bool
(·==·)
(
var: {n : Nat} → {i : Fin n} → {ctx : Vector Ty n} → {ty : Ty} → HasType i ctx ty → Expr ctx ty
var
stop: {ty : Ty} → {x : Nat} → {ctx : Vector Ty x} → HasType 0 (ty :: ctx) ty
stop
) (
val: {n : Nat} → {ctx : Vector Ty n} → Int → Expr ctx Ty.int
val
0: Int
0
)) (
val: {n : Nat} → {ctx : Vector Ty n} → Int → Expr ctx Ty.int
val
1: Int
1
) (
op: {n : Nat} → {ctx : Vector Ty n} → {a b c : Ty} → (a.interp → b.interp → c.interp) → Expr ctx a → Expr ctx b → Expr ctx c
op
(·*·): Ty.int.interp → Ty.int.interp → Ty.int.interp
(·*·)
(
delay: {n : Nat} → {ctx : Vector Ty n} → {a : Ty} → (Unit → Expr ctx a) → Expr ctx a
delay
fun
_: Unit
_
=>
app: {n : Nat} → {ctx : Vector Ty n} → {a ty : Ty} → Expr ctx (a.fn ty) → Expr ctx a → Expr ctx ty
app
fact: {a : Nat} → {ctx : Vector Ty a} → Expr ctx (Ty.int.fn Ty.int)
fact
(
op: {n : Nat} → {ctx : Vector Ty n} → {a b c : Ty} → (a.interp → b.interp → c.interp) → Expr ctx a → Expr ctx b → Expr ctx c
op
(·-·): Ty.int.interp → Ty.int.interp → Ty.int.interp
(·-·)
(
var: {n : Nat} → {i : Fin n} → {ctx : Vector Ty n} → {ty : Ty} → HasType i ctx ty → Expr ctx ty
var
stop: {ty : Ty} → {x : Nat} → {ctx : Vector Ty x} → HasType 0 (ty :: ctx) ty
stop
) (
val: {n : Nat} → {ctx : Vector Ty n} → Int → Expr ctx Ty.int
val
1: Int
1
))) (
var: {n : Nat} → {i : Fin n} → {ctx : Vector Ty n} → {ty : Ty} → HasType i ctx ty → Expr ctx ty
var
stop: {ty : Ty} → {x : Nat} → {ctx : Vector Ty x} → HasType 0 (ty :: ctx) ty
stop
)))
a✝: Nat
ctx: Vector Ty a

a + 1 < a

Goals accomplished! 🐙
3628800
fact: {a : Nat} → {ctx : Vector Ty a} → Expr ctx (Ty.int.fn Ty.int)
fact
.
interp: {a : Nat} → {ctx : Vector Ty a} → {ty : Ty} → Env ctx → Expr ctx ty → ty.interp
interp
Env.nil: Env Vector.nil
Env.nil
10: Ty.int.interp
10

Dependent de Bruijn Indices

In this example, we represent program syntax terms in a type family parameterized by a list of types, representing the typing context, or information on which free variables are in scope and what their types are.

Remark: this example is based on an example in the book Certified Programming with Dependent Types by Adam Chlipala.

Programmers who move to statically typed functional languages from scripting languages often complain about the requirement that every element of a list have the same type. With fancy type systems, we can partially lift this requirement. We can index a list type with a “type-level” list that explains what type each element of the list should have. This has been done in a variety of ways in Haskell using type classes, and we can do it much more cleanly and directly in Lean.

We parameterize our heterogeneous lists by at type α and an α-indexed type β.

inductive 
HList: {α : Type v} → (α → Type u) → List α → Type (max u v)
HList
{
α: Type v
α
:
Type v: Type (v + 1)
Type v
} (
β: α → Type u
β
:
α: Type v
α
Type u: Type (u + 1)
Type u
) :
List: Type v → Type v
List
α: Type v
α
Type (max u v): Type ((max u v) + 1)
Type (max u v)
|
nil: {α : Type v} → {β : α → Type u} → HList β []
nil
:
HList: {α : Type v} → (α → Type u) → List α → Type (max u v)
HList
β: α → Type u
β
[]: List α
[]
|
cons: {α : Type v} → {β : α → Type u} → {i : α} → {is : List α} → β i → HList β is → HList β (i :: is)
cons
:
β: α → Type u
β
i: α
i
HList: {α : Type v} → (α → Type u) → List α → Type (max u v)
HList
β: α → Type u
β
is: List α
is
HList: {α : Type v} → (α → Type u) → List α → Type (max u v)
HList
β: α → Type u
β
(
i: α
i
::
is: List α
is
)

We overload the List.cons notation :: so we can also use it to create heterogeneous lists.

infix:67 " :: " => 
HList.cons: {α : Type v} → {β : α → Type u} → {i : α} → {is : List α} → β i → HList β is → HList β (i :: is)
HList.cons

We similarly overload the List notation [] for the empty heterogeneous list.

notation "[" "]" => 
HList.nil: {α : Type v} → {β : α → Type u} → HList β []
HList.nil

Variables are represented in a way isomorphic to the natural numbers, where number 0 represents the first element in the context, number 1 the second element, and so on. Actually, instead of numbers, we use the Member inductive family.

The value of type Member a as can be viewed as a certificate that a is an element of the list as. The constructor Member.head says that a is in the list if the list begins with it. The constructor Member.tail says that if a is in the list bs, it is also in the list b::bs.

inductive 
Member: {α : Type} → α → List α → Type
Member
:
α: Type
α
List: Type → Type
List
α: Type
α
Type: Type 1
Type
|
head: {α : Type} → {a : α} → {as : List α} → Member a (a :: as)
head
:
Member: {α : Type} → α → List α → Type
Member
a: ?m.3852
a
(
a: ?m.3852
a
::
as: List ?m.3852
as
) |
tail: {α : Type} → {a : α} → {bs : List α} → {b : α} → Member a bs → Member a (b :: bs)
tail
:
Member: {α : Type} → α → List α → Type
Member
a: ?m.4024
a
bs: List ?m.4024
bs
Member: {α : Type} → α → List α → Type
Member
a: ?m.4024
a
(
b: ?m.4024
b
::
bs: List ?m.4024
bs
)

Given a heterogeneous list HList β is and value of type Member i is, HList.get retrieves an element of type β i from the list. The pattern .head and .tail h are sugar for Member.head and Member.tail h respectively. Lean can infer the namespace using the expected type.

def 
HList.get: {α : Type} → {β : α → Type u_1} → {is : List α} → {i : α} → HList β is → Member i is → β i
HList.get
:
HList: {α : Type} → (α → Type u_1) → List α → Type u_1
HList
β: ?m.4631 → Type u_1
β
is: List ?m.4631
is
Member: {α : Type} → α → List α → Type
Member
i: ?m.4631
i
is: List ?m.4631
is
β: ?m.4631 → Type u_1
β
i: ?m.4631
i
|
a: β i
a
::
as: HList β is✝
as
,
.head: {α : Type} → {a : α} → {as : List α} → Member a (a :: as)
.head
=>
a: β i
a
|
Warning: unused variable `a` note: this linter can be disabled with `set_option linter.unusedVariables false`
::
as: HList β is✝
as
,
.tail: {α : Type} → {a : α} → {bs : List α} → {b : α} → Member a bs → Member a (b :: bs)
.tail
h: Member i is✝
h
=>
as: HList β is✝
as
.
get: {α : Type} → {β : α → Type u_1} → {is : List α} → {i : α} → HList β is → Member i is → β i
get
h: Member i is✝
h

Here is the definition of the simple type system for our programming language, a simply typed lambda calculus with natural numbers as the base type.

inductive 
Ty: Type
Ty
where |
nat: Ty
nat
|
fn: Ty → Ty → Ty
fn
:
Ty: Type
Ty
Ty: Type
Ty
Ty: Type
Ty

We can write a function to translate Ty values to a Lean type — remember that types are first class, so can be calculated just like any other value. We mark Ty.denote as [reducible] to make sure the typeclass resolution procedure can unfold/reduce it. For example, suppose Lean is trying to synthesize a value for the instance Add (Ty.denote Ty.nat). Since Ty.denote is marked as [reducible], the typeclass resolution procedure can reduce Ty.denote Ty.nat to Nat, and use the builtin instance for Add Nat as the solution.

Recall that the term a.denote is sugar for denote a where denote is the function being defined. We call it the "dot notation".

@[reducible] def 
Ty.denote: Ty → Type
Ty.denote
:
Ty: Type
Ty
Type: Type 1
Type
|
nat: Ty
nat
=>
Nat: Type
Nat
|
fn: Ty → Ty → Ty
fn
a: Ty
a
b: Ty
b
=>
a: Ty
a
.
denote: Ty → Type
denote
b: Ty
b
.
denote: Ty → Type
denote

Here is the definition of the Term type, including variables, constants, addition, function application and abstraction, and let binding of local variables. Since let is a keyword in Lean, we use the "escaped identifier" «let». You can input the unicode (French double quotes) using \f<< (for «) and \f>> (for »). The term Term ctx .nat is sugar for Term ctx Ty.nat, Lean infers the namespace using the expected type.

inductive 
Term: List Ty → Ty → Type
Term
:
List: Type → Type
List
Ty: Type
Ty
Ty: Type
Ty
Type: Type 1
Type
|
var: {ty : Ty} → {ctx : List Ty} → Member ty ctx → Term ctx ty
var
:
Member: {α : Type} → α → List α → Type
Member
ty: Ty
ty
ctx: List Ty
ctx
Term: List Ty → Ty → Type
Term
ctx: List Ty
ctx
ty: Ty
ty
|
const: {ctx : List Ty} → Nat → Term ctx Ty.nat
const
:
Nat: Type
Nat
Term: List Ty → Ty → Type
Term
ctx: List Ty
ctx
.nat: Ty
.nat
|
plus: {ctx : List Ty} → Term ctx Ty.nat → Term ctx Ty.nat → Term ctx Ty.nat
plus
:
Term: List Ty → Ty → Type
Term
ctx: List Ty
ctx
.nat: Ty
.nat
Term: List Ty → Ty → Type
Term
ctx: List Ty
ctx
.nat: Ty
.nat
Term: List Ty → Ty → Type
Term
ctx: List Ty
ctx
.nat: Ty
.nat
|
app: {ctx : List Ty} → {dom ran : Ty} → Term ctx (dom.fn ran) → Term ctx dom → Term ctx ran
app
:
Term: List Ty → Ty → Type
Term
ctx: List Ty
ctx
(
.fn: Ty → Ty → Ty
.fn
dom: Ty
dom
ran: Ty
ran
)
Term: List Ty → Ty → Type
Term
ctx: List Ty
ctx
dom: Ty
dom
Term: List Ty → Ty → Type
Term
ctx: List Ty
ctx
ran: Ty
ran
|
lam: {dom : Ty} → {ctx : List Ty} → {ran : Ty} → Term (dom :: ctx) ran → Term ctx (dom.fn ran)
lam
:
Term: List Ty → Ty → Type
Term
(
dom: Ty
dom
::
ctx: List Ty
ctx
)
ran: Ty
ran
Term: List Ty → Ty → Type
Term
ctx: List Ty
ctx
(
.fn: Ty → Ty → Ty
.fn
dom: Ty
dom
ran: Ty
ran
) |
«let»: {ctx : List Ty} → {ty₁ ty₂ : Ty} → Term ctx ty₁ → Term (ty₁ :: ctx) ty₂ → Term ctx ty₂
«let»
:
Term: List Ty → Ty → Type
Term
ctx: List Ty
ctx
ty₁: Ty
ty₁
Term: List Ty → Ty → Type
Term
(
ty₁: Ty
ty₁
::
ctx: List Ty
ctx
)
ty₂: Ty
ty₂
Term: List Ty → Ty → Type
Term
ctx: List Ty
ctx
ty₂: Ty
ty₂

Here are two example terms encoding, the first addition packaged as a two-argument curried function, and the second of a sample application of addition to constants.

The command open Ty Term Member opens the namespaces Ty, Term, and Member. Thus, you can write lam instead of Term.lam.

open Ty Term Member
def 
add: Term [] (nat.fn (nat.fn nat))
add
:
Term: List Ty → Ty → Type
Term
[]: List Ty
[]
(
fn: Ty → Ty → Ty
fn
nat: Ty
nat
(
fn: Ty → Ty → Ty
fn
nat: Ty
nat
nat: Ty
nat
)) :=
lam: {dom : Ty} → {ctx : List Ty} → {ran : Ty} → Term (dom :: ctx) ran → Term ctx (dom.fn ran)
lam
(
lam: {dom : Ty} → {ctx : List Ty} → {ran : Ty} → Term (dom :: ctx) ran → Term ctx (dom.fn ran)
lam
(
plus: {ctx : List Ty} → Term ctx nat → Term ctx nat → Term ctx nat
plus
(
var: {ty : Ty} → {ctx : List Ty} → Member ty ctx → Term ctx ty
var
(
tail: {α : Type} → {a : α} → {bs : List α} → {b : α} → Member a bs → Member a (b :: bs)
tail
head: {α : Type} → {a : α} → {as : List α} → Member a (a :: as)
head
)) (
var: {ty : Ty} → {ctx : List Ty} → Member ty ctx → Term ctx ty
var
head: {α : Type} → {a : α} → {as : List α} → Member a (a :: as)
head
))) def
three_the_hard_way: Term [] nat
three_the_hard_way
:
Term: List Ty → Ty → Type
Term
[]: List Ty
[]
nat: Ty
nat
:=
app: {ctx : List Ty} → {dom ran : Ty} → Term ctx (dom.fn ran) → Term ctx dom → Term ctx ran
app
(
app: {ctx : List Ty} → {dom ran : Ty} → Term ctx (dom.fn ran) → Term ctx dom → Term ctx ran
app
add: Term [] (nat.fn (nat.fn nat))
add
(
const: {ctx : List Ty} → Nat → Term ctx nat
const
1: Nat
1
)) (
const: {ctx : List Ty} → Nat → Term ctx nat
const
2: Nat
2
)

Since dependent typing ensures that any term is well-formed in its context and has a particular type, it is easy to translate syntactic terms into Lean values.

The attribute [simp] instructs Lean to always try to unfold Term.denote applications when one applies the simp tactic. We also say this is a hint for the Lean term simplifier.

@[simp] def 
Term.denote: {ctx : List Ty} → {ty : Ty} → Term ctx ty → HList Ty.denote ctx → ty.denote
Term.denote
:
Term: List Ty → Ty → Type
Term
ctx: List Ty
ctx
ty: Ty
ty
HList: {α : Type} → (α → Type) → List α → Type
HList
Ty.denote: Ty → Type
Ty.denote
ctx: List Ty
ctx
ty: Ty
ty
.
denote: Ty → Type
denote
|
var: {ty : Ty} → {ctx : List Ty} → Member ty ctx → Term ctx ty
var
h: Member ty ctx
h
,
env: HList Ty.denote ctx
env
=>
env: HList Ty.denote ctx
env
.
get: {α : Type} → {β : α → Type} → {is : List α} → {i : α} → HList β is → Member i is → β i
get
h: Member ty ctx
h
|
const: {ctx : List Ty} → Nat → Term ctx nat
const
n: Nat
n
, _ =>
n: Nat
n
|
plus: {ctx : List Ty} → Term ctx nat → Term ctx nat → Term ctx nat
plus
a: Term ctx nat
a
b: Term ctx nat
b
,
env: HList Ty.denote ctx
env
=>
a: Term ctx nat
a
.
denote: {ctx : List Ty} → {ty : Ty} → Term ctx ty → HList Ty.denote ctx → ty.denote
denote
env: HList Ty.denote ctx
env
+
b: Term ctx nat
b
.
denote: {ctx : List Ty} → {ty : Ty} → Term ctx ty → HList Ty.denote ctx → ty.denote
denote
env: HList Ty.denote ctx
env
|
app: {ctx : List Ty} → {dom ran : Ty} → Term ctx (dom.fn ran) → Term ctx dom → Term ctx ran
app
f: Term ctx (dom✝.fn ty)
f
a: Term ctx dom✝
a
,
env: HList Ty.denote ctx
env
=>
f: Term ctx (dom✝.fn ty)
f
.
denote: {ctx : List Ty} → {ty : Ty} → Term ctx ty → HList Ty.denote ctx → ty.denote
denote
env: HList Ty.denote ctx
env
(
a: Term ctx dom✝
a
.
denote: {ctx : List Ty} → {ty : Ty} → Term ctx ty → HList Ty.denote ctx → ty.denote
denote
env: HList Ty.denote ctx
env
) |
lam: {dom : Ty} → {ctx : List Ty} → {ran : Ty} → Term (dom :: ctx) ran → Term ctx (dom.fn ran)
lam
b: Term (dom✝ :: ctx) ran✝
b
,
env: HList Ty.denote ctx
env
=> fun
x: dom✝.denote
x
=>
b: Term (dom✝ :: ctx) ran✝
b
.
denote: {ctx : List Ty} → {ty : Ty} → Term ctx ty → HList Ty.denote ctx → ty.denote
denote
(
x: dom✝.denote
x
::
env: HList Ty.denote ctx
env
) |
«let»: {ctx : List Ty} → {ty₁ ty₂ : Ty} → Term ctx ty₁ → Term (ty₁ :: ctx) ty₂ → Term ctx ty₂
«let»
a: Term ctx ty₁✝
a
b: Term (ty₁✝ :: ctx) ty
b
,
env: HList Ty.denote ctx
env
=>
b: Term (ty₁✝ :: ctx) ty
b
.
denote: {ctx : List Ty} → {ty : Ty} → Term ctx ty → HList Ty.denote ctx → ty.denote
denote
(
a: Term ctx ty₁✝
a
.
denote: {ctx : List Ty} → {ty : Ty} → Term ctx ty → HList Ty.denote ctx → ty.denote
denote
env: HList Ty.denote ctx
env
::
env: HList Ty.denote ctx
env
)

You can show that the denotation of three_the_hard_way is indeed 3 using reflexivity.

example: three_the_hard_way.denote [] = 3
example
:
three_the_hard_way: Term [] nat
three_the_hard_way
.
denote: {ctx : List Ty} → {ty : Ty} → Term ctx ty → HList Ty.denote ctx → ty.denote
denote
[]: HList Ty.denote []
[]
=
3: nat.denote
3
:=
rfl: ∀ {α : Type} {a : α}, a = a
rfl

We now define the constant folding optimization that traverses a term if replaces subterms such as plus (const m) (const n) with const (n+m).

@[simp] def 
Term.constFold: {ctx : List Ty} → {ty : Ty} → Term ctx ty → Term ctx ty
Term.constFold
:
Term: List Ty → Ty → Type
Term
ctx: List Ty
ctx
ty: Ty
ty
Term: List Ty → Ty → Type
Term
ctx: List Ty
ctx
ty: Ty
ty
|
const: {ctx : List Ty} → Nat → Term ctx nat
const
n: Nat
n
=>
const: {ctx : List Ty} → Nat → Term ctx nat
const
n: Nat
n
|
var: {ty : Ty} → {ctx : List Ty} → Member ty ctx → Term ctx ty
var
h: Member ty ctx
h
=>
var: {ty : Ty} → {ctx : List Ty} → Member ty ctx → Term ctx ty
var
h: Member ty ctx
h
|
app: {ctx : List Ty} → {dom ran : Ty} → Term ctx (dom.fn ran) → Term ctx dom → Term ctx ran
app
f: Term ctx (dom✝.fn ty)
f
a: Term ctx dom✝
a
=>
app: {ctx : List Ty} → {dom ran : Ty} → Term ctx (dom.fn ran) → Term ctx dom → Term ctx ran
app
f: Term ctx (dom✝.fn ty)
f
.
constFold: {ctx : List Ty} → {ty : Ty} → Term ctx ty → Term ctx ty
constFold
a: Term ctx dom✝
a
.
constFold: {ctx : List Ty} → {ty : Ty} → Term ctx ty → Term ctx ty
constFold
|
lam: {dom : Ty} → {ctx : List Ty} → {ran : Ty} → Term (dom :: ctx) ran → Term ctx (dom.fn ran)
lam
b: Term (dom✝ :: ctx) ran✝
b
=>
lam: {dom : Ty} → {ctx : List Ty} → {ran : Ty} → Term (dom :: ctx) ran → Term ctx (dom.fn ran)
lam
b: Term (dom✝ :: ctx) ran✝
b
.
constFold: {ctx : List Ty} → {ty : Ty} → Term ctx ty → Term ctx ty
constFold
|
«let»: {ctx : List Ty} → {ty₁ ty₂ : Ty} → Term ctx ty₁ → Term (ty₁ :: ctx) ty₂ → Term ctx ty₂
«let»
a: Term ctx ty₁✝
a
b: Term (ty₁✝ :: ctx) ty
b
=>
«let»: {ctx : List Ty} → {ty₁ ty₂ : Ty} → Term ctx ty₁ → Term (ty₁ :: ctx) ty₂ → Term ctx ty₂
«let»
a: Term ctx ty₁✝
a
.
constFold: {ctx : List Ty} → {ty : Ty} → Term ctx ty → Term ctx ty
constFold
b: Term (ty₁✝ :: ctx) ty
b
.
constFold: {ctx : List Ty} → {ty : Ty} → Term ctx ty → Term ctx ty
constFold
|
plus: {ctx : List Ty} → Term ctx nat → Term ctx nat → Term ctx nat
plus
a: Term ctx nat
a
b: Term ctx nat
b
=> match
a: Term ctx nat
a
.
constFold: {ctx : List Ty} → {ty : Ty} → Term ctx ty → Term ctx ty
constFold
,
b: Term ctx nat
b
.
constFold: {ctx : List Ty} → {ty : Ty} → Term ctx ty → Term ctx ty
constFold
with |
const: {ctx : List Ty} → Nat → Term ctx nat
const
n: Nat
n
,
const: {ctx : List Ty} → Nat → Term ctx nat
const
m: Nat
m
=>
const: {ctx : List Ty} → Nat → Term ctx nat
const
(
n: Nat
n
+
m: Nat
m
) |
a': Term ctx nat
a'
,
b': Term ctx nat
b'
=>
plus: {ctx : List Ty} → Term ctx nat → Term ctx nat → Term ctx nat
plus
a': Term ctx nat
a'
b': Term ctx nat
b'

The correctness of the Term.constFold is proved using induction, case-analysis, and the term simplifier. We prove all cases but the one for plus using simp [*]. This tactic instructs the term simplifier to use hypotheses such as a = b as rewriting/simplifications rules. We use the split to break the nested match expression in the plus case into two cases. The local variables iha and ihb are the induction hypotheses for a and b. The modifier in a term simplifier argument instructs the term simplifier to use the equation as a rewriting rule in the "reverse direction". That is, given h : a = b, ← h instructs the term simplifier to rewrite b subterms to a.

theorem 
Term.constFold_sound: ∀ {ctx : List Ty} {ty : Ty} {env : HList Ty.denote ctx} (e : Term ctx ty), e.constFold.denote env = e.denote env
Term.constFold_sound
(
e: Term ctx ty
e
:
Term: List Ty → Ty → Type
Term
ctx: List Ty
ctx
ty: Ty
ty
) :
e: Term ctx ty
e
.
constFold: {ctx : List Ty} → {ty : Ty} → Term ctx ty → Term ctx ty
constFold
.
denote: {ctx : List Ty} → {ty : Ty} → Term ctx ty → HList Ty.denote ctx → ty.denote
denote
env: HList Ty.denote ctx
env
=
e: Term ctx ty
e
.
denote: {ctx : List Ty} → {ty : Ty} → Term ctx ty → HList Ty.denote ctx → ty.denote
denote
env: HList Ty.denote ctx
env
:=

Goals accomplished! 🐙
ctx: List Ty
ty: Ty
env: HList Ty.denote ctx
e: Term ctx ty

e.constFold.denote env = e.denote env
ctx: List Ty
ty: Ty
ctx✝: List Ty
a, b: Term ctx nat
iha: {env : HList Ty.denote ctx}, a.constFold.denote env = a.denote env
ihb: {env : HList Ty.denote ctx}, b.constFold.denote env = b.denote env
env: HList Ty.denote ctx

plus
(match a.constFold, b.constFold with | const n, const m => const (n + m) | a', b' => a'.plus b').denote env = a.denote env + b.denote env

Goals accomplished! 🐙
ctx: List Ty
ty: Ty
ctx✝: List Ty
a, b: Term ctx nat
iha: {env : HList Ty.denote ctx}, a.constFold.denote env = a.denote env
ihb: {env : HList Ty.denote ctx}, b.constFold.denote env = b.denote env
env: HList Ty.denote ctx
x✝¹, x✝: Term ctx nat
n✝, m✝: Nat
heq✝¹: a.constFold = const n
heq✝: b.constFold = const m

plus.h_1
(const (n + m)).denote env = a.denote env + b.denote env
ctx: List Ty
ty: Ty
ctx✝: List Ty
a, b: Term ctx nat
iha: {env : HList Ty.denote ctx}, a.constFold.denote env = a.denote env
ihb: {env : HList Ty.denote ctx}, b.constFold.denote env = b.denote env
env: HList Ty.denote ctx
x✝², x✝¹: Term ctx nat
x✝: (n m : Nat), a.constFold = const n b.constFold = const m False
(a.constFold.plus b.constFold).denote env = a.denote env + b.denote env
ctx: List Ty
ty: Ty
ctx✝: List Ty
a, b: Term ctx nat
iha: {env : HList Ty.denote ctx}, a.constFold.denote env = a.denote env
ihb: {env : HList Ty.denote ctx}, b.constFold.denote env = b.denote env
env: HList Ty.denote ctx
x✝¹, x✝: Term ctx nat
n✝, m✝: Nat
heq✝¹: a.constFold = const n
heq✝: b.constFold = const m

plus.h_1
(const (n + m)).denote env = a.denote env + b.denote env
ctx: List Ty
ty: Ty
ctx✝: List Ty
a, b: Term ctx nat
iha: {env : HList Ty.denote ctx}, a.constFold.denote env = a.denote env
ihb: {env : HList Ty.denote ctx}, b.constFold.denote env = b.denote env
env: HList Ty.denote ctx
x✝², x✝¹: Term ctx nat
x✝: (n m : Nat), a.constFold = const n b.constFold = const m False
(a.constFold.plus b.constFold).denote env = a.denote env + b.denote env

Goals accomplished! 🐙
ctx: List Ty
ty: Ty
ctx✝: List Ty
a, b: Term ctx nat
iha: {env : HList Ty.denote ctx}, a.constFold.denote env = a.denote env
ihb: {env : HList Ty.denote ctx}, b.constFold.denote env = b.denote env
env: HList Ty.denote ctx
x✝², x✝¹: Term ctx nat
x✝: (n m : Nat), a.constFold = const n b.constFold = const m False

plus.h_2
(a.constFold.plus b.constFold).denote env = a.denote env + b.denote env

Goals accomplished! 🐙

Parametric Higher-Order Abstract Syntax

In contrast to first-order encodings, higher-order encodings avoid explicit modeling of variable identity. Instead, the binding constructs of an object language (the language being formalized) can be represented using the binding constructs of the meta language (the language in which the formalization is done). The best known higher-order encoding is called higher-order abstract syntax (HOAS), and we can start by attempting to apply it directly in Lean.

Remark: this example is based on an example in the book Certified Programming with Dependent Types by Adam Chlipala.

Here is the definition of the simple type system for our programming language, a simply typed lambda calculus with natural numbers as the base type.

inductive 
Ty: Type
Ty
where |
nat: Ty
nat
|
fn: Ty → Ty → Ty
fn
:
Ty: Type
Ty
Ty: Type
Ty
Ty: Type
Ty

We can write a function to translate Ty values to a Lean type — remember that types are first class, so can be calculated just like any other value. We mark Ty.denote as [reducible] to make sure the typeclass resolution procedure can unfold/reduce it. For example, suppose Lean is trying to synthesize a value for the instance Add (Ty.denote Ty.nat). Since Ty.denote is marked as [reducible], the typeclass resolution procedure can reduce Ty.denote Ty.nat to Nat, and use the builtin instance for Add Nat as the solution.

Recall that the term a.denote is sugar for denote a where denote is the function being defined. We call it the "dot notation".

@[reducible] def 
Ty.denote: Ty → Type
Ty.denote
:
Ty: Type
Ty
Type: Type 1
Type
|
nat: Ty
nat
=>
Nat: Type
Nat
|
fn: Ty → Ty → Ty
fn
a: Ty
a
b: Ty
b
=>
a: Ty
a
.
denote: Ty → Type
denote
b: Ty
b
.
denote: Ty → Type
denote

With HOAS, each object language binding construct is represented with a function of the meta language. Here is what we get if we apply that idea within an inductive definition of term syntax. However a naive encondig in Lean fails to meet the strict positivity restrictions imposed by the Lean kernel. An alternate higher-order encoding is parametric HOAS, as introduced by Washburn and Weirich for Haskell and tweaked by Adam Chlipala for use in Coq. The key idea is to parameterize the declaration by a type family rep standing for a "representation of variables."

inductive 
Term': (Ty → Type) → Ty → Type
Term'
(
rep: Ty → Type
rep
:
Ty: Type
Ty
Type: Type 1
Type
) :
Ty: Type
Ty
Type: Type 1
Type
|
var: {rep : Ty → Type} → {ty : Ty} → rep ty → Term' rep ty
var
:
rep: Ty → Type
rep
ty: Ty
ty
Term': (Ty → Type) → Ty → Type
Term'
rep: Ty → Type
rep
ty: Ty
ty
|
const: {rep : Ty → Type} → Nat → Term' rep Ty.nat
const
:
Nat: Type
Nat
Term': (Ty → Type) → Ty → Type
Term'
rep: Ty → Type
rep
.nat: Ty
.nat
|
plus: {rep : Ty → Type} → Term' rep Ty.nat → Term' rep Ty.nat → Term' rep Ty.nat
plus
:
Term': (Ty → Type) → Ty → Type
Term'
rep: Ty → Type
rep
.nat: Ty
.nat
Term': (Ty → Type) → Ty → Type
Term'
rep: Ty → Type
rep
.nat: Ty
.nat
Term': (Ty → Type) → Ty → Type
Term'
rep: Ty → Type
rep
.nat: Ty
.nat
|
lam: {rep : Ty → Type} → {dom ran : Ty} → (rep dom → Term' rep ran) → Term' rep (dom.fn ran)
lam
: (
rep: Ty → Type
rep
dom: Ty
dom
Term': (Ty → Type) → Ty → Type
Term'
rep: Ty → Type
rep
ran: Ty
ran
)
Term': (Ty → Type) → Ty → Type
Term'
rep: Ty → Type
rep
(
.fn: Ty → Ty → Ty
.fn
dom: Ty
dom
ran: Ty
ran
) |
app: {rep : Ty → Type} → {dom ran : Ty} → Term' rep (dom.fn ran) → Term' rep dom → Term' rep ran
app
:
Term': (Ty → Type) → Ty → Type
Term'
rep: Ty → Type
rep
(
.fn: Ty → Ty → Ty
.fn
dom: Ty
dom
ran: Ty
ran
)
Term': (Ty → Type) → Ty → Type
Term'
rep: Ty → Type
rep
dom: Ty
dom
Term': (Ty → Type) → Ty → Type
Term'
rep: Ty → Type
rep
ran: Ty
ran
|
let: {rep : Ty → Type} → {ty₁ ty₂ : Ty} → Term' rep ty₁ → (rep ty₁ → Term' rep ty₂) → Term' rep ty₂
let
:
Term': (Ty → Type) → Ty → Type
Term'
rep: Ty → Type
rep
ty₁: Ty
ty₁
(
rep: Ty → Type
rep
ty₁: Ty
ty₁
Term': (Ty → Type) → Ty → Type
Term'
rep: Ty → Type
rep
ty₂: Ty
ty₂
)
Term': (Ty → Type) → Ty → Type
Term'
rep: Ty → Type
rep
ty₂: Ty
ty₂

Lean accepts this definition because our embedded functions now merely take variables as arguments, instead of arbitrary terms. One might wonder whether there is an easy loophole to exploit here, instantiating the parameter rep as term itself. However, to do that, we would need to choose a variable representation for this nested mention of term, and so on through an infinite descent into term arguments.

We write the final type of a closed term using polymorphic quantification over all possible choices of rep type family

open Ty (
nat: Ty
nat
fn: Ty → Ty → Ty
fn
) namespace FirstTry def
Term: Ty → Type 1
Term
(
ty: Ty
ty
:
Ty: Type
Ty
) := (
rep: Ty → Type
rep
:
Ty: Type
Ty
Type: Type 1
Type
)
Term': (Ty → Type) → Ty → Type
Term'
rep: Ty → Type
rep
ty: Ty
ty

In the next two example, note how each is written as a function over a rep choice, such that the specific choice has no impact on the structure of the term.

def 
add: Term (nat.fn (nat.fn nat))
add
:
Term: Ty → Type 1
Term
(
fn: Ty → Ty → Ty
fn
nat: Ty
nat
(
fn: Ty → Ty → Ty
fn
nat: Ty
nat
nat: Ty
nat
)) := fun
_rep: Ty → Type
_rep
=>
.lam: {rep : Ty → Type} → {dom ran : Ty} → (rep dom → Term' rep ran) → Term' rep (dom.fn ran)
.lam
fun
x: _rep nat
x
=>
.lam: {rep : Ty → Type} → {dom ran : Ty} → (rep dom → Term' rep ran) → Term' rep (dom.fn ran)
.lam
fun
y: _rep nat
y
=>
.plus: {rep : Ty → Type} → Term' rep nat → Term' rep nat → Term' rep nat
.plus
(
.var: {rep : Ty → Type} → {ty : Ty} → rep ty → Term' rep ty
.var
x: _rep nat
x
) (
.var: {rep : Ty → Type} → {ty : Ty} → rep ty → Term' rep ty
.var
y: _rep nat
y
) def
three_the_hard_way: Term nat
three_the_hard_way
:
Term: Ty → Type 1
Term
nat: Ty
nat
:= fun
rep: Ty → Type
rep
=>
.app: {rep : Ty → Type} → {dom ran : Ty} → Term' rep (dom.fn ran) → Term' rep dom → Term' rep ran
.app
(
.app: {rep : Ty → Type} → {dom ran : Ty} → Term' rep (dom.fn ran) → Term' rep dom → Term' rep ran
.app
(
add: Term (nat.fn (nat.fn nat))
add
rep: Ty → Type
rep
) (
.const: {rep : Ty → Type} → Nat → Term' rep nat
.const
1: Nat
1
)) (
.const: {rep : Ty → Type} → Nat → Term' rep nat
.const
2: Nat
2
) end FirstTry

The argument rep does not even appear in the function body for add. How can that be? By giving our terms expressive types, we allow Lean to infer many arguments for us. In fact, we do not even need to name the rep argument! By using Lean implicit arguments and lambdas, we can completely hide rep in these examples.

def 
Term: Ty → Type 1
Term
(
ty: Ty
ty
:
Ty: Type
Ty
) := {
rep: Ty → Type
rep
:
Ty: Type
Ty
Type: Type 1
Type
}
Term': (Ty → Type) → Ty → Type
Term'
rep: Ty → Type
rep
ty: Ty
ty
def
add: Term (nat.fn (nat.fn nat))
add
:
Term: Ty → Type 1
Term
(
fn: Ty → Ty → Ty
fn
nat: Ty
nat
(
fn: Ty → Ty → Ty
fn
nat: Ty
nat
nat: Ty
nat
)) :=
.lam: {rep : Ty → Type} → {dom ran : Ty} → (rep dom → Term' rep ran) → Term' rep (dom.fn ran)
.lam
fun
x: rep✝ nat
x
=>
.lam: {rep : Ty → Type} → {dom ran : Ty} → (rep dom → Term' rep ran) → Term' rep (dom.fn ran)
.lam
fun
y: rep✝ nat
y
=>
.plus: {rep : Ty → Type} → Term' rep nat → Term' rep nat → Term' rep nat
.plus
(
.var: {rep : Ty → Type} → {ty : Ty} → rep ty → Term' rep ty
.var
x: rep✝ nat
x
) (
.var: {rep : Ty → Type} → {ty : Ty} → rep ty → Term' rep ty
.var
y: rep✝ nat
y
) def
three_the_hard_way: Term nat
three_the_hard_way
:
Term: Ty → Type 1
Term
nat: Ty
nat
:=
.app: {rep : Ty → Type} → {dom ran : Ty} → Term' rep (dom.fn ran) → Term' rep dom → Term' rep ran
.app
(
.app: {rep : Ty → Type} → {dom ran : Ty} → Term' rep (dom.fn ran) → Term' rep dom → Term' rep ran
.app
add: Term (nat.fn (nat.fn nat))
add
(
.const: {rep : Ty → Type} → Nat → Term' rep nat
.const
1: Nat
1
)) (
.const: {rep : Ty → Type} → Nat → Term' rep nat
.const
2: Nat
2
)

It may not be at all obvious that the PHOAS representation admits the crucial computable operations. The key to effective deconstruction of PHOAS terms is one principle: treat the rep parameter as an unconstrained choice of which data should be annotated on each variable. We will begin with a simple example, that of counting how many variable nodes appear in a PHOAS term. This operation requires no data annotated on variables, so we simply annotate variables with Unit values. Note that, when we go under binders in the cases for lam and let, we must provide the data value to annotate on the new variable we pass beneath. For our current choice of Unit data, we always pass ().

def 
countVars: {ty : Ty} → Term' (fun x => Unit) ty → Nat
countVars
:
Term': (Ty → Type) → Ty → Type
Term'
(fun
_: Ty
_
=>
Unit: Type
Unit
)
ty: Ty
ty
Nat: Type
Nat
|
.var: {rep : Ty → Type} → {ty : Ty} → rep ty → Term' rep ty
.var
_ =>
1: Nat
1
|
.const: {rep : Ty → Type} → Nat → Term' rep nat
.const
_ =>
0: Nat
0
|
.plus: {rep : Ty → Type} → Term' rep nat → Term' rep nat → Term' rep nat
.plus
a: Term' (fun x => Unit) nat
a
b: Term' (fun x => Unit) nat
b
=>
countVars: {ty : Ty} → Term' (fun x => Unit) ty → Nat
countVars
a: Term' (fun x => Unit) nat
a
+
countVars: {ty : Ty} → Term' (fun x => Unit) ty → Nat
countVars
b: Term' (fun x => Unit) nat
b
|
.app: {rep : Ty → Type} → {dom ran : Ty} → Term' rep (dom.fn ran) → Term' rep dom → Term' rep ran
.app
f: Term' (fun x => Unit) (dom✝.fn ty)
f
a: Term' (fun x => Unit) dom✝
a
=>
countVars: {ty : Ty} → Term' (fun x => Unit) ty → Nat
countVars
f: Term' (fun x => Unit) (dom✝.fn ty)
f
+
countVars: {ty : Ty} → Term' (fun x => Unit) ty → Nat
countVars
a: Term' (fun x => Unit) dom✝
a
|
.lam: {rep : Ty → Type} → {dom ran : Ty} → (rep dom → Term' rep ran) → Term' rep (dom.fn ran)
.lam
b: Unit → Term' (fun x => Unit) ran✝
b
=>
countVars: {ty : Ty} → Term' (fun x => Unit) ty → Nat
countVars
(
b: Unit → Term' (fun x => Unit) ran✝
b
(): Unit
()
) |
.let: {rep : Ty → Type} → {ty₁ ty₂ : Ty} → Term' rep ty₁ → (rep ty₁ → Term' rep ty₂) → Term' rep ty₂
.let
a: Term' (fun x => Unit) ty₁✝
a
b: Unit → Term' (fun x => Unit) ty
b
=>
countVars: {ty : Ty} → Term' (fun x => Unit) ty → Nat
countVars
a: Term' (fun x => Unit) ty₁✝
a
+
countVars: {ty : Ty} → Term' (fun x => Unit) ty → Nat
countVars
(
b: Unit → Term' (fun x => Unit) ty
b
(): Unit
()
)

We can now easily prove that add has two variables by using reflexivity

example: countVars add = 2
example
:
countVars: {ty : Ty} → Term' (fun x => Unit) ty → Nat
countVars
add: Term (nat.fn (nat.fn nat))
add
=
2: Nat
2
:=
rfl: ∀ {α : Type} {a : α}, a = a
rfl

Here is another example, translating PHOAS terms into strings giving a first-order rendering. To implement this translation, the key insight is to tag variables with strings, giving their names. The function takes as an additional input i which is used to create variable names for binders. We also use the string interpolation available in Lean. For example, s!"x_{i}" is expanded to "x_" ++ toString i.

def 
pretty: {ty : Ty} → Term' (fun x => String) ty → optParam Nat 1 → String
pretty
(
e: Term' (fun x => String) ty
e
:
Term': (Ty → Type) → Ty → Type
Term'
(fun
_: Ty
_
=>
String: Type
String
)
ty: Ty
ty
) (
i: optParam Nat 1
i
:
Nat: Type
Nat
:=
1: Nat
1
) :
String: Type
String
:= match
e: Term' (fun x => String) ty
e
with |
.var: {rep : Ty → Type} → {ty : Ty} → rep ty → Term' rep ty
.var
s: String
s
=>
s: String
s
|
.const: {rep : Ty → Type} → Nat → Term' rep nat
.const
n: Nat
n
=>
toString: {α : Type} → [self : ToString α] → α → String
toString
n: Nat
n
|
.app: {rep : Ty → Type} → {dom ran : Ty} → Term' rep (dom.fn ran) → Term' rep dom → Term' rep ran
.app
f: Term' (fun x => String) (dom✝.fn ty)
f
a: Term' (fun x => String) dom✝
a
=> s!"({
pretty: {ty : Ty} → Term' (fun x => String) ty → optParam Nat 1 → String
pretty
f: Term' (fun x => String) (dom✝.fn ty)
f
i: optParam Nat 1
i
} {
pretty: {ty : Ty} → Term' (fun x => String) ty → optParam Nat 1 → String
pretty
a: Term' (fun x => String) dom✝
a
i: optParam Nat 1
i
})" |
.plus: {rep : Ty → Type} → Term' rep nat → Term' rep nat → Term' rep nat
.plus
a: Term' (fun x => String) nat
a
b: Term' (fun x => String) nat
b
=> s!"({
pretty: {ty : Ty} → Term' (fun x => String) ty → optParam Nat 1 → String
pretty
a: Term' (fun x => String) nat
a
i: optParam Nat 1
i
} + {
pretty: {ty : Ty} → Term' (fun x => String) ty → optParam Nat 1 → String
pretty
b: Term' (fun x => String) nat
b
i: optParam Nat 1
i
})" |
.lam: {rep : Ty → Type} → {dom ran : Ty} → (rep dom → Term' rep ran) → Term' rep (dom.fn ran)
.lam
f: String → Term' (fun x => String) ran✝
f
=> let
x: String
x
:= s!"x_{
i: optParam Nat 1
i
}" s!"(fun {
x: String
x
} => {
pretty: {ty : Ty} → Term' (fun x => String) ty → optParam Nat 1 → String
pretty
(
f: String → Term' (fun x => String) ran✝
f
x: String
x
) (
i: optParam Nat 1
i
+
1: Nat
1
)})" |
.let: {rep : Ty → Type} → {ty₁ ty₂ : Ty} → Term' rep ty₁ → (rep ty₁ → Term' rep ty₂) → Term' rep ty₂
.let
a: Term' (fun x => String) ty₁✝
a
b: String → Term' (fun x => String) ty
b
=> let
x: String
x
:= s!"x_{
i: optParam Nat 1
i
}" s!"(let {
x: String
x
} := {
pretty: {ty : Ty} → Term' (fun x => String) ty → optParam Nat 1 → String
pretty
a: Term' (fun x => String) ty₁✝
a
i: optParam Nat 1
i
}; => {
pretty: {ty : Ty} → Term' (fun x => String) ty → optParam Nat 1 → String
pretty
(
b: String → Term' (fun x => String) ty
b
x: String
x
) (
i: optParam Nat 1
i
+
1: Nat
1
)}"
"(((fun x_1 => (fun x_2 => (x_1 + x_2))) 1) 2)"
pretty: {ty : Ty} → Term' (fun x => String) ty → optParam Nat 1 → String
pretty
three_the_hard_way: Term nat
three_the_hard_way

It is not necessary to convert to a different representation to support many common operations on terms. For instance, we can implement substitution of terms for variables. The key insight here is to tag variables with terms, so that, on encountering a variable, we can simply replace it by the term in its tag. We will call this function initially on a term with exactly one free variable, tagged with the appropriate substitute. During recursion, new variables are added, but they are only tagged with their own term equivalents. Note that this function squash is parameterized over a specific rep choice.

def 
squash: {rep : Ty → Type} → {ty : Ty} → Term' (Term' rep) ty → Term' rep ty
squash
:
Term': (Ty → Type) → Ty → Type
Term'
(
Term': (Ty → Type) → Ty → Type
Term'
rep: Ty → Type
rep
)
ty: Ty
ty
Term': (Ty → Type) → Ty → Type
Term'
rep: Ty → Type
rep
ty: Ty
ty
|
.var: {rep : Ty → Type} → {ty : Ty} → rep ty → Term' rep ty
.var
e: Term' rep ty
e
=>
e: Term' rep ty
e
|
.const: {rep : Ty → Type} → Nat → Term' rep nat
.const
n: Nat
n
=>
.const: {rep : Ty → Type} → Nat → Term' rep nat
.const
n: Nat
n
|
.plus: {rep : Ty → Type} → Term' rep nat → Term' rep nat → Term' rep nat
.plus
a: Term' (Term' rep) nat
a
b: Term' (Term' rep) nat
b
=>
.plus: {rep : Ty → Type} → Term' rep nat → Term' rep nat → Term' rep nat
.plus
(
squash: {rep : Ty → Type} → {ty : Ty} → Term' (Term' rep) ty → Term' rep ty
squash
a: Term' (Term' rep) nat
a
) (
squash: {rep : Ty → Type} → {ty : Ty} → Term' (Term' rep) ty → Term' rep ty
squash
b: Term' (Term' rep) nat
b
) |
.lam: {rep : Ty → Type} → {dom ran : Ty} → (rep dom → Term' rep ran) → Term' rep (dom.fn ran)
.lam
f: Term' rep dom✝ → Term' (Term' rep) ran✝
f
=>
.lam: {rep : Ty → Type} → {dom ran : Ty} → (rep dom → Term' rep ran) → Term' rep (dom.fn ran)
.lam
fun
x: rep dom✝
x
=>
squash: {rep : Ty → Type} → {ty : Ty} → Term' (Term' rep) ty → Term' rep ty
squash
(
f: Term' rep dom✝ → Term' (Term' rep) ran✝
f
(
.var: {rep : Ty → Type} → {ty : Ty} → rep ty → Term' rep ty
.var
x: rep dom✝
x
)) |
.app: {rep : Ty → Type} → {dom ran : Ty} → Term' rep (dom.fn ran) → Term' rep dom → Term' rep ran
.app
f: Term' (Term' rep) (dom✝.fn ty)
f
a: Term' (Term' rep) dom✝
a
=>
.app: {rep : Ty → Type} → {dom ran : Ty} → Term' rep (dom.fn ran) → Term' rep dom → Term' rep ran
.app
(
squash: {rep : Ty → Type} → {ty : Ty} → Term' (Term' rep) ty → Term' rep ty
squash
f: Term' (Term' rep) (dom✝.fn ty)
f
) (
squash: {rep : Ty → Type} → {ty : Ty} → Term' (Term' rep) ty → Term' rep ty
squash
a: Term' (Term' rep) dom✝
a
) |
.let: {rep : Ty → Type} → {ty₁ ty₂ : Ty} → Term' rep ty₁ → (rep ty₁ → Term' rep ty₂) → Term' rep ty₂
.let
a: Term' (Term' rep) ty₁✝
a
b: Term' rep ty₁✝ → Term' (Term' rep) ty
b
=>
.let: {rep : Ty → Type} → {ty₁ ty₂ : Ty} → Term' rep ty₁ → (rep ty₁ → Term' rep ty₂) → Term' rep ty₂
.let
(
squash: {rep : Ty → Type} → {ty : Ty} → Term' (Term' rep) ty → Term' rep ty
squash
a: Term' (Term' rep) ty₁✝
a
) fun
x: rep ty₁✝
x
=>
squash: {rep : Ty → Type} → {ty : Ty} → Term' (Term' rep) ty → Term' rep ty
squash
(
b: Term' rep ty₁✝ → Term' (Term' rep) ty
b
(
.var: {rep : Ty → Type} → {ty : Ty} → rep ty → Term' rep ty
.var
x: rep ty₁✝
x
))

To define the final substitution function over terms with single free variables, we define Term1, an analogue to Term that we defined before for closed terms.

def 
Term1: Ty → Ty → Type 1
Term1
(
ty1: Ty
ty1
ty2: Ty
ty2
:
Ty: Type
Ty
) := {
rep: Ty → Type
rep
:
Ty: Type
Ty
Type: Type 1
Type
}
rep: Ty → Type
rep
ty1: Ty
ty1
Term': (Ty → Type) → Ty → Type
Term'
rep: Ty → Type
rep
ty2: Ty
ty2

Substitution is defined by (1) instantiating a Term1 to tag variables with terms and (2) applying the result to a specific term to be substituted. Note how the parameter rep of squash is instantiated: the body of subst is itself a polymorphic quantification over rep, standing for a variable tag choice in the output term; and we use that input to compute a tag choice for the input term.

def 
subst: {ty1 ty2 : Ty} → Term1 ty1 ty2 → Term ty1 → Term ty2
subst
(
e: Term1 ty1 ty2
e
:
Term1: Ty → Ty → Type 1
Term1
ty1: Ty
ty1
ty2: Ty
ty2
) (
e': Term ty1
e'
:
Term: Ty → Type 1
Term
ty1: Ty
ty1
) :
Term: Ty → Type 1
Term
ty2: Ty
ty2
:=
squash: {rep : Ty → Type} → {ty : Ty} → Term' (Term' rep) ty → Term' rep ty
squash
(
e: Term1 ty1 ty2
e
e': Term ty1
e'
)

We can view Term1 as a term with hole. In the following example, (fun x => plus (var x) (const 5)) can be viewed as the term plus _ (const 5) where the hole _ is instantiated by subst with three_the_hard_way

"((((fun x_1 => (fun x_2 => (x_1 + x_2))) 1) 2) + 5)"
pretty: {ty : Ty} → Term' (fun x => String) ty → optParam Nat 1 → String
pretty
<|
subst: {ty1 ty2 : Ty} → Term1 ty1 ty2 → Term ty1 → Term ty2
subst
(fun
x: rep✝ nat
x
=>
.plus: {rep : Ty → Type} → Term' rep nat → Term' rep nat → Term' rep nat
.plus
(
.var: {rep : Ty → Type} → {ty : Ty} → rep ty → Term' rep ty
.var
x: rep✝ nat
x
) (
.const: {rep : Ty → Type} → Nat → Term' rep nat
.const
5: Nat
5
))
three_the_hard_way: Term nat
three_the_hard_way

One further development, which may seem surprising at first, is that we can also implement a usual term denotation function, when we tag variables with their denotations.

The attribute [simp] instructs Lean to always try to unfold denote applications when one applies the simp tactic. We also say this is a hint for the Lean term simplifier.

@[simp] def 
denote: {ty : Ty} → Term' Ty.denote ty → ty.denote
denote
:
Term': (Ty → Type) → Ty → Type
Term'
Ty.denote: Ty → Type
Ty.denote
ty: Ty
ty
ty: Ty
ty
.
denote: Ty → Type
denote
|
.var: {rep : Ty → Type} → {ty : Ty} → rep ty → Term' rep ty
.var
x: ty.denote
x
=>
x: ty.denote
x
|
.const: {rep : Ty → Type} → Nat → Term' rep nat
.const
n: Nat
n
=>
n: Nat
n
|
.plus: {rep : Ty → Type} → Term' rep nat → Term' rep nat → Term' rep nat
.plus
a: Term' Ty.denote nat
a
b: Term' Ty.denote nat
b
=>
denote: {ty : Ty} → Term' Ty.denote ty → ty.denote
denote
a: Term' Ty.denote nat
a
+
denote: {ty : Ty} → Term' Ty.denote ty → ty.denote
denote
b: Term' Ty.denote nat
b
|
.app: {rep : Ty → Type} → {dom ran : Ty} → Term' rep (dom.fn ran) → Term' rep dom → Term' rep ran
.app
f: Term' Ty.denote (dom✝.fn ty)
f
a: Term' Ty.denote dom✝
a
=>
denote: {ty : Ty} → Term' Ty.denote ty → ty.denote
denote
f: Term' Ty.denote (dom✝.fn ty)
f
(
denote: {ty : Ty} → Term' Ty.denote ty → ty.denote
denote
a: Term' Ty.denote dom✝
a
) |
.lam: {rep : Ty → Type} → {dom ran : Ty} → (rep dom → Term' rep ran) → Term' rep (dom.fn ran)
.lam
f: dom✝.denote → Term' Ty.denote ran✝
f
=> fun
x: dom✝.denote
x
=>
denote: {ty : Ty} → Term' Ty.denote ty → ty.denote
denote
(
f: dom✝.denote → Term' Ty.denote ran✝
f
x: dom✝.denote
x
) |
.let: {rep : Ty → Type} → {ty₁ ty₂ : Ty} → Term' rep ty₁ → (rep ty₁ → Term' rep ty₂) → Term' rep ty₂
.let
a: Term' Ty.denote ty₁✝
a
b: ty₁✝.denote → Term' Ty.denote ty
b
=>
denote: {ty : Ty} → Term' Ty.denote ty → ty.denote
denote
(
b: ty₁✝.denote → Term' Ty.denote ty
b
(
denote: {ty : Ty} → Term' Ty.denote ty → ty.denote
denote
a: Term' Ty.denote ty₁✝
a
))
example: denote three_the_hard_way = 3
example
:
denote: {ty : Ty} → Term' Ty.denote ty → ty.denote
denote
three_the_hard_way: Term nat
three_the_hard_way
=
3: nat.denote
3
:=
rfl: ∀ {α : Type} {a : α}, a = a
rfl

To summarize, the PHOAS representation has all the expressive power of more standard encodings (e.g., using de Bruijn indices), and a variety of translations are actually much more pleasant to implement than usual, thanks to the novel ability to tag variables with data.

We now define the constant folding optimization that traverses a term if replaces subterms such as plus (const m) (const n) with const (n+m).

@[simp] def 
constFold: {rep : Ty → Type} → {ty : Ty} → Term' rep ty → Term' rep ty
constFold
:
Term': (Ty → Type) → Ty → Type
Term'
rep: Ty → Type
rep
ty: Ty
ty
Term': (Ty → Type) → Ty → Type
Term'
rep: Ty → Type
rep
ty: Ty
ty
|
.var: {rep : Ty → Type} → {ty : Ty} → rep ty → Term' rep ty
.var
x: rep ty
x
=>
.var: {rep : Ty → Type} → {ty : Ty} → rep ty → Term' rep ty
.var
x: rep ty
x
|
.const: {rep : Ty → Type} → Nat → Term' rep nat
.const
n: Nat
n
=>
.const: {rep : Ty → Type} → Nat → Term' rep nat
.const
n: Nat
n
|
.app: {rep : Ty → Type} → {dom ran : Ty} → Term' rep (dom.fn ran) → Term' rep dom → Term' rep ran
.app
f: Term' rep (dom✝.fn ty)
f
a: Term' rep dom✝
a
=>
.app: {rep : Ty → Type} → {dom ran : Ty} → Term' rep (dom.fn ran) → Term' rep dom → Term' rep ran
.app
(
constFold: {rep : Ty → Type} → {ty : Ty} → Term' rep ty → Term' rep ty
constFold
f: Term' rep (dom✝.fn ty)
f
) (
constFold: {rep : Ty → Type} → {ty : Ty} → Term' rep ty → Term' rep ty
constFold
a: Term' rep dom✝
a
) |
.lam: {rep : Ty → Type} → {dom ran : Ty} → (rep dom → Term' rep ran) → Term' rep (dom.fn ran)
.lam
f: rep dom✝ → Term' rep ran✝
f
=>
.lam: {rep : Ty → Type} → {dom ran : Ty} → (rep dom → Term' rep ran) → Term' rep (dom.fn ran)
.lam
fun
x: rep dom✝
x
=>
constFold: {rep : Ty → Type} → {ty : Ty} → Term' rep ty → Term' rep ty
constFold
(
f: rep dom✝ → Term' rep ran✝
f
x: rep dom✝
x
) |
.let: {rep : Ty → Type} → {ty₁ ty₂ : Ty} → Term' rep ty₁ → (rep ty₁ → Term' rep ty₂) → Term' rep ty₂
.let
a: Term' rep ty₁✝
a
b: rep ty₁✝ → Term' rep ty
b
=>
.let: {rep : Ty → Type} → {ty₁ ty₂ : Ty} → Term' rep ty₁ → (rep ty₁ → Term' rep ty₂) → Term' rep ty₂
.let
(
constFold: {rep : Ty → Type} → {ty : Ty} → Term' rep ty → Term' rep ty
constFold
a: Term' rep ty₁✝
a
) fun
x: rep ty₁✝
x
=>
constFold: {rep : Ty → Type} → {ty : Ty} → Term' rep ty → Term' rep ty
constFold
(
b: rep ty₁✝ → Term' rep ty
b
x: rep ty₁✝
x
) |
.plus: {rep : Ty → Type} → Term' rep nat → Term' rep nat → Term' rep nat
.plus
a: Term' rep nat
a
b: Term' rep nat
b
=> match
constFold: {rep : Ty → Type} → {ty : Ty} → Term' rep ty → Term' rep ty
constFold
a: Term' rep nat
a
,
constFold: {rep : Ty → Type} → {ty : Ty} → Term' rep ty → Term' rep ty
constFold
b: Term' rep nat
b
with |
.const: {rep : Ty → Type} → Nat → Term' rep nat
.const
n: Nat
n
,
.const: {rep : Ty → Type} → Nat → Term' rep nat
.const
m: Nat
m
=>
.const: {rep : Ty → Type} → Nat → Term' rep nat
.const
(
n: Nat
n
+
m: Nat
m
) |
a': Term' rep nat
a'
,
b': Term' rep nat
b'
=>
.plus: {rep : Ty → Type} → Term' rep nat → Term' rep nat → Term' rep nat
.plus
a': Term' rep nat
a'
b': Term' rep nat
b'

The correctness of the constFold is proved using induction, case-analysis, and the term simplifier. We prove all cases but the one for plus using simp [*]. This tactic instructs the term simplifier to use hypotheses such as a = b as rewriting/simplifications rules. We use the split to break the nested match expression in the plus case into two cases. The local variables iha and ihb are the induction hypotheses for a and b. The modifier in a term simplifier argument instructs the term simplifier to use the equation as a rewriting rule in the "reverse direction. That is, given h : a = b, ← h instructs the term simplifier to rewrite b subterms to a.

theorem 
constFold_sound: ∀ {ty : Ty} (e : Term' Ty.denote ty), denote (constFold e) = denote e
constFold_sound
(
e: Term' Ty.denote ty
e
:
Term': (Ty → Type) → Ty → Type
Term'
Ty.denote: Ty → Type
Ty.denote
ty: Ty
ty
) :
denote: {ty : Ty} → Term' Ty.denote ty → ty.denote
denote
(
constFold: {rep : Ty → Type} → {ty : Ty} → Term' rep ty → Term' rep ty
constFold
e: Term' Ty.denote ty
e
) =
denote: {ty : Ty} → Term' Ty.denote ty → ty.denote
denote
e: Term' Ty.denote ty
e
:=

Goals accomplished! 🐙
ty: Ty
e: Term' Ty.denote ty

denote (constFold e) = denote e

Goals accomplished! 🐙

Goals accomplished! 🐙
ty: Ty
a, b: Term' Ty.denote nat
iha: denote (constFold a) = denote a
ihb: denote (constFold b) = denote b
x✝¹, x✝: Term' Ty.denote nat
n✝, m✝: Nat
heq✝¹: constFold a = Term'.const n
heq✝: constFold b = Term'.const m

plus.h_1
denote (Term'.const (n + m)) = denote a + denote b
ty: Ty
a, b: Term' Ty.denote nat
iha: denote (constFold a) = denote a
ihb: denote (constFold b) = denote b
x✝², x✝¹: Term' Ty.denote nat
x✝: (n m : Nat), constFold a = Term'.const n constFold b = Term'.const m False
denote ((constFold a).plus (constFold b)) = denote a + denote b
ty: Ty
a, b: Term' Ty.denote nat
iha: denote (constFold a) = denote a
ihb: denote (constFold b) = denote b
x✝¹, x✝: Term' Ty.denote nat
n✝, m✝: Nat
heq✝¹: constFold a = Term'.const n
heq✝: constFold b = Term'.const m

plus.h_1
denote (Term'.const (n + m)) = denote a + denote b
ty: Ty
a, b: Term' Ty.denote nat
iha: denote (constFold a) = denote a
ihb: denote (constFold b) = denote b
x✝², x✝¹: Term' Ty.denote nat
x✝: (n m : Nat), constFold a = Term'.const n constFold b = Term'.const m False
denote ((constFold a).plus (constFold b)) = denote a + denote b

Goals accomplished! 🐙
ty: Ty
a, b: Term' Ty.denote nat
iha: denote (constFold a) = denote a
ihb: denote (constFold b) = denote b
x✝², x✝¹: Term' Ty.denote nat
x✝: (n m : Nat), constFold a = Term'.const n constFold b = Term'.const m False

plus.h_2
denote ((constFold a).plus (constFold b)) = denote a + denote b

Goals accomplished! 🐙

Organizational features

In this section we introduce some organizational features of Lean that are not a part of its kernel per se, but make it possible to work in the framework more efficiently.

Variables and Sections

Consider the following three function definitions:

def compose (α β γ : Type) (g : β → γ) (f : α → β) (x : α) : γ :=
  g (f x)

def doTwice (α : Type) (h : α → α) (x : α) : α :=
  h (h x)

def doThrice (α : Type) (h : α → α) (x : α) : α :=
  h (h (h x))

Lean provides us with the variable command to make such declarations look more compact:

variable (α β γ : Type)

def compose (g : β → γ) (f : α → β) (x : α) : γ :=
  g (f x)

def doTwice (h : α → α) (x : α) : α :=
  h (h x)

def doThrice (h : α → α) (x : α) : α :=
  h (h (h x))

We can declare variables of any type, not just Type itself:

variable (α β γ : Type)
variable (g : β → γ) (f : α → β) (h : α → α)
variable (x : α)

def compose := g (f x)
def doTwice := h (h x)
def doThrice := h (h (h x))

#print compose
#print doTwice
#print doThrice

Printing them out shows that all three groups of definitions have exactly the same effect.

The variable command instructs Lean to insert the declared variables as bound variables in definitions that refer to them. Lean is smart enough to figure out which variables are used explicitly or implicitly in a definition. We can therefore proceed as though α, β, γ, g, f, h, and x are fixed objects when we write our definitions, and let Lean abstract the definitions for us automatically.

When declared in this way, a variable stays in scope until the end of the file we are working on. Sometimes, however, it is useful to limit the scope of a variable. For that purpose, Lean provides the notion of a section:

section useful
  variable (α β γ : Type)
  variable (g : β → γ) (f : α → β) (h : α → α)
  variable (x : α)

  def compose := g (f x)
  def doTwice := h (h x)
  def doThrice := h (h (h x))
end useful

When the section is closed, the variables go out of scope, and become nothing more than a distant memory.

You do not have to indent the lines within a section. Nor do you have to name a section, which is to say, you can use an anonymous section / end pair. If you do name a section, however, you have to close it using the same name. Sections can also be nested, which allows you to declare new variables incrementally.

Namespaces

Lean provides us with the ability to group definitions into nested, hierarchical namespaces:

namespace Foo
  def a : Nat := 5
  def f (x : Nat) : Nat := x + 7

  def fa : Nat := f a
  def ffa : Nat := f (f a)

  #check a
  #check f
  #check fa
  #check ffa
  #check Foo.fa
end Foo

-- #check a  -- error
-- #check f  -- error
#check Foo.a
#check Foo.f
#check Foo.fa
#check Foo.ffa

open Foo

#check a
#check f
#check fa
#check Foo.fa

When we declare that we are working in the namespace Foo, every identifier we declare has a full name with prefix "Foo." Within the namespace, we can refer to identifiers by their shorter names, but once we end the namespace, we have to use the longer names.

The open command brings the shorter names into the current context. Often, when we import a module, we will want to open one or more of the namespaces it contains, to have access to the short identifiers. But sometimes we will want to leave this information hidden, for example, when they conflict with identifiers in another namespace we want to use. Thus namespaces give us a way to manage our working environment.

For example, Lean groups definitions and theorems involving lists into a namespace List.

#check List.nil
#check List.cons
#check List.map

We will discuss their types, below. The command open List allows us to use the shorter names:

open List

#check nil
#check cons
#check map

Like sections, namespaces can be nested:

namespace Foo
  def a : Nat := 5
  def f (x : Nat) : Nat := x + 7

  def fa : Nat := f a

  namespace Bar
    def ffa : Nat := f (f a)

    #check fa
    #check ffa
  end Bar

  #check fa
  #check Bar.ffa
end Foo

#check Foo.fa
#check Foo.Bar.ffa

open Foo

#check fa
#check Bar.ffa

Namespaces that have been closed can later be reopened, even in another file:

namespace Foo
  def a : Nat := 5
  def f (x : Nat) : Nat := x + 7

  def fa : Nat := f a
end Foo

#check Foo.a
#check Foo.f

namespace Foo
  def ffa : Nat := f (f a)
end Foo

Like sections, nested namespaces have to be closed in the order they are opened. Namespaces and sections serve different purposes: namespaces organize data and sections declare variables for insertion in definitions. Sections are also useful for delimiting the scope of commands such as set_option and open.

In many respects, however, a namespace ... end block behaves the same as a section ... end block. In particular, if you use the variable command within a namespace, its scope is limited to the namespace. Similarly, if you use an open command within a namespace, its effects disappear when the namespace is closed.

Implicit Arguments

Suppose we define the compose function as.

def compose (α β γ : Type) (g : β → γ) (f : α → β) (x : α) : γ :=
  g (f x)

The function compose takes three types, α, β, and γ, and two functions, g : β → γ and f : α → β, a value x : α, and returns g (f x), the composition of g and f. We say compose is polymorphic over types α, β, and γ. Now, let's use compose:

def compose (α β γ : Type) (g : β → γ) (f : α → β) (x : α) : γ :=
  g (f x)
def double (x : Nat) := 2*x
def triple (x : Nat) := 3*x

#check compose Nat Nat Nat double triple 10 -- Nat
#eval  compose Nat Nat Nat double triple 10 -- 60

def appendWorld (s : String) := s ++ "world"
#check String.length -- String → Nat

#check compose String String Nat String.length appendWorld "hello" -- Nat
#eval  compose String String Nat String.length appendWorld "hello" -- 10

Because compose is polymorphic over types α, β, and γ, we have to provide them in the examples above. But this information is redundant: one can infer the types from the arguments g and f. This is a central feature of dependent type theory: terms carry a lot of information, and often some of that information can be inferred from the context. In Lean, one uses an underscore, _, to specify that the system should fill in the information automatically.

def compose (α β γ : Type) (g : β → γ) (f : α → β) (x : α) : γ :=
 g (f x)
def double (x : Nat) := 2*x
def triple (x : Nat) := 3*x
#check compose _ _ _ double triple 10 -- Nat
#eval  compose Nat Nat Nat double triple 10 -- 60
def appendWorld (s : String) := s ++ "world"
#check String.length -- String → Nat
#check compose _ _ _ String.length appendWorld "hello" -- Nat
#eval  compose _ _ _ String.length appendWorld "hello" -- 10

It is still tedious, however, to type all these underscores. When a function takes an argument that can generally be inferred from context, Lean allows us to specify that this argument should, by default, be left implicit. This is done by putting the arguments in curly braces, as follows:

def compose {α β γ : Type} (g : β → γ) (f : α → β) (x : α) : γ :=
  g (f x)
def double (x : Nat) := 2*x
def triple (x : Nat) := 3*x
#check compose double triple 10 -- Nat
#eval  compose double triple 10 -- 60
def appendWorld (s : String) := s ++ "world"
#check String.length -- String → Nat
#check compose String.length appendWorld "hello" -- Nat
#eval  compose String.length appendWorld "hello" -- 10

All that has changed are the braces around α β γ: Type. It makes these three arguments implicit. Notationally, this hides the specification of the type, making it look as though compose simply takes 3 arguments.

Variables can also be specified as implicit when they are declared with the variable command:

universe u

section
  variable {α : Type u}
  variable (x : α)
  def ident := x
end

variable (α β : Type u)
variable (a : α) (b : β)

#check ident
#check ident a
#check ident b

This definition of ident here has the same effect as the one above.

Lean has very complex mechanisms for instantiating implicit arguments, and we will see that they can be used to infer function types, predicates, and even proofs. The process of instantiating these "holes," or "placeholders," in a term is part of a bigger process called elaboration. The presence of implicit arguments means that at times there may be insufficient information to fix the meaning of an expression precisely. An expression like ident is said to be polymorphic, because it can take on different meanings in different contexts.

One can always specify the type T of an expression e by writing (e : T). This instructs Lean's elaborator to use the value T as the type of e when trying to elaborate it. In the following example, this mechanism is used to specify the desired types of the expressions ident.

def ident {α : Type u} (a : α) : α := a

#check (ident : Nat → Nat) -- Nat → Nat

Numerals are overloaded in Lean, but when the type of a numeral cannot be inferred, Lean assumes, by default, that it is a natural number. So the expressions in the first two #check commands below are elaborated in the same way, whereas the third #check command interprets 2 as an integer.

#check 2         -- Nat
#check (2 : Nat) -- Nat
#check (2 : Int) -- Int

Sometimes, however, we may find ourselves in a situation where we have declared an argument to a function to be implicit, but now want to provide the argument explicitly. If foo is such a function, the notation @foo denotes the same function with all the arguments made explicit.

def ident {α : Type u} (a : α) : α := a
variable (α β : Type)

#check @ident           -- {α : Type u} → α → α
#check @ident α         -- α → α
#check @ident β         -- β → β
#check @ident Nat       -- Nat → Nat
#check @ident Bool true -- Bool

Notice that now the first #check command gives the type of the identifier, ident, without inserting any placeholders. Moreover, the output indicates that the first argument is implicit.

Named arguments enable you to specify an argument for a parameter by matching the argument with its name rather than with its position in the parameter list. You can use them to specify explicit and implicit arguments. If you don't remember the order of the parameters but know their names, you can send the arguments in any order. You may also provide the value for an implicit parameter when Lean failed to infer it. Named arguments also improve the readability of your code by identifying what each argument represents.

def ident {α : Type u} (a : α) : α := a

#check ident (α := Nat)  -- Nat → Nat
#check ident (α := Bool) -- Bool → Bool

Auto Bound Implicit Arguments

In the previous section, we have shown how implicit arguments make functions more convenient to use. However, functions such as compose are still quite verbose to define. Note that the universe polymorphic compose is even more verbose than the one previously defined.

universe u v w
def compose {α : Type u} {β : Type v} {γ : Type w}
            (g : β → γ) (f : α → β) (x : α) : γ :=
  g (f x)

You can avoid the universe command by providing the universe parameters when defining compose.

def compose.{u, v, w}
            {α : Type u} {β : Type v} {γ : Type w}
            (g : β → γ) (f : α → β) (x : α) : γ :=
  g (f x)

Lean 4 supports a new feature called auto bound implicit arguments. It makes functions such as compose much more convenient to write. When Lean processes the header of a declaration, any unbound identifier is automatically added as an implicit argument if it is a single lower case or greek letter. With this feature, we can write compose as

def compose (g : β → γ) (f : α → β) (x : α) : γ :=
  g (f x)

#check @compose
-- {β : Sort u_1} → {γ : Sort u_2} → {α : Sort u_3} → (β → γ) → (α → β) → α → γ

Note that, Lean inferred a more general type using Sort instead of Type.

Although we love this feature and use it extensively when implementing Lean, we realize some users may feel uncomfortable with it. Thus, you can disable it using the command set_option autoImplicit false.

set_option autoImplicit false
/- The following definition produces `unknown identifier` errors -/
-- def compose (g : β → γ) (f : α → β) (x : α) : γ :=
--   g (f x)

The Lean language server provides semantic highlighting information to editors, and it provides visual feedback whether an identifier has been interpreted as an auto bound implicit argument.

Syntax Extensions

Lean's syntax can be extended and customized by users at every level, ranging from basic "mixfix" notations over macro transformers to type-aware elaborators. In fact, all builtin syntax is parsed and processed using the same mechanisms and APIs open to users. In this section, we will describe and explain the various extension points. Significant syntax extensions already builtin into Lean such as the do notation are described in subsections.

While introducing new syntax is a relatively rare feature in programming languages and sometimes even frowned upon because of its potential to obscure code, it is an invaluable tool in formalization for expressing established conventions and notations of the respective field succinctly in code. Going beyond basic notations, Lean's ability to factor out common boilerplate code into (well-behaved) macros and to embed entire custom domain specific languages (DSLs) to textually encode subproblems efficiently and readably can be of great benefit to both programmers and proof engineers alike.

The do notation

Lean is a pure functional programming language, but you can write effectful code using the do embedded domain specific language (DSL). The following simple program prints two strings "hello" and "world" in the standard output and terminates with exit code 0. Note that the type of the program is IO UInt32. You can read this type as the type of values that perform input-output effects and produce a value of type UInt32.

def main : IO UInt32 := do
  IO.println "hello"
  IO.println "world"
  return 0

The type of IO.println is String → IO Unit. That is, it is a function from String to IO Unit which indicates it may perform input-output effects and produce a value of type Unit. We often say that functions that may perform effects are methods. We also say a method application, such as IO.println "hello" is an action. Note that the examples above also demonstrates that braceless do blocks are whitespace sensitive. If you like ;s and curly braces, you can write the example above as

def main : IO UInt32 := do {
  IO.println "hello";
  IO.println "world";
  return 0;
}

Semicolons can be used even when curly braces are not used. They are particularly useful when you want to "pack" more than one action in a single line.

def main : IO UInt32 := do
  IO.println "hello"; IO.println "world"
  return 0

Whitespace sensitivity in programming languages is a controversial topic among programmers. You should use your own style. We, the Lean developers, love the braceless and semicolon-free style. We believe it is clean and beautiful.

The do DSL expands into the core Lean language. Let's inspect the different components using the commands #print and #check.

def main : IO UInt32 := do
 IO.println "hello"
 IO.println "world"
 return 0

#check IO.println "hello"
-- IO Unit
#print main
-- Output contains the infix operator `>>=` and `pure`
-- The following `set_option` disables notation such as `>>=` in the output
set_option pp.notation false in
#print main
-- Output contains `bind` and `pure`
#print bind
-- bind : {m : Type u → Type v} → [self : Bind m] → {α β : Type u} →
--        m α → (α → m β) → m β
#print pure
-- pure : {m : Type u → Type v} → [self : Pure m] → {α : Type u} →
--        α → m α

-- IO implements the type classes `Bind` and `Pure`.
#check (inferInstance : Bind IO)
#check (inferInstance : Pure IO)

The types of bind and pure may look daunting at first sight. They both have many implicit arguments. Let's focus first on the explicit arguments. bind has two explicit arguments m α and α → m β. The first one should be viewed as an action with effects m and producing a value of type α. The second is a function that takes a value of type α and produces an action with effects m and a value of type β. The result is m β. The method bind is composing these two actions. We often say bind is an abstract semicolon. The method pure converts a value α into an action that produces an action m α.

Here is the same function being defined using bind and pure without the do DSL.

def main : IO UInt32 :=
  bind (IO.println "hello") fun _ =>
  bind (IO.println "world") fun _ =>
  pure 0

The notations let x <- action1; action2 and let x ← action1; action2 are just syntax sugar for bind action1 fun x => action2. Here is a small example using it.

def isGreaterThan0 (x : Nat) : IO Bool := do
  IO.println s!"value: {x}"
  return x > 0

def f (x : Nat) : IO Unit := do
  let c <- isGreaterThan0 x
  if c then
    IO.println s!"{x} is greater than 0"
  else
    pure ()

#eval f 10
-- value: 10
-- 10 is greater than 0

Nested actions

Note that we cannot write if isGreaterThan0 x then ... else ... because the condition in a if-then-else is a pure value without effects, but isGreaterThan0 x has type IO Bool. You can use the nested action notation to avoid this annoyance. Here is an equivalent definition for f using a nested action.

def isGreaterThan0 (x : Nat) : IO Bool := do
 IO.println s!"x: {x}"
 return x > 0

def f (x : Nat) : IO Unit := do
  if (<- isGreaterThan0 x) then
    IO.println s!"{x} is greater than 0"
  else
    pure ()

#print f

Lean "lifts" the nested actions and introduces the bind for us. Here is an example with two nested actions. Note that both actions are executed even if x = 0.

def isGreaterThan0 (x : Nat) : IO Bool := do
 IO.println s!"x: {x}"
 return x > 0

def f (x y : Nat) : IO Unit := do
  if (<- isGreaterThan0 x) && (<- isGreaterThan0 y) then
    IO.println s!"{x} and {y} are greater than 0"
  else
    pure ()

#eval f 0 10
-- value: 0
-- value: 10

-- The function `f` above is equivalent to
def g (x y : Nat) : IO Unit := do
  let c1 <- isGreaterThan0 x
  let c2 <- isGreaterThan0 y
  if c1 && c2 then
    IO.println s!"{x} and {y} are greater than 0"
  else
    pure ()

theorem fgEqual : f = g :=
  rfl -- proof by reflexivity

Here are two ways to achieve the short-circuit semantics in the example above

def isGreaterThan0 (x : Nat) : IO Bool := do
 IO.println s!"x: {x}"
 return x > 0

def f1 (x y : Nat) : IO Unit := do
  if (<- isGreaterThan0 x <&&> isGreaterThan0 y) then
    IO.println s!"{x} and {y} are greater than 0"
  else
    pure ()

-- `<&&>` is the effectful version of `&&`
-- Given `x y : IO Bool`, `x <&&> y` : m Bool`
-- It only executes `y` if `x` returns `true`.

#eval f1 0 10
-- value: 0
#eval f1 1 10
-- value: 1
-- value: 10
-- 1 and 10 are greater than 0

def f2 (x y : Nat) : IO Unit := do
  if (<- isGreaterThan0 x) then
    if (<- isGreaterThan0 y) then
      IO.println s!"{x} and {y} are greater than 0"
    else
      pure ()
  else
    pure ()

if-then notation

In the do DSL, we can write if c then action as a shorthand for if c then action else pure (). Here is the method f2 using this shorthand.

def isGreaterThan0 (x : Nat) : IO Bool := do
 IO.println s!"x: {x}"
 return x > 0

def f2 (x y : Nat) : IO Unit := do
  if (<- isGreaterThan0 x) then
    if (<- isGreaterThan0 y) then
      IO.println s!"{x} and {y} are greater than 0"

Reassignments

When writing effectful code, it is natural to think imperatively. For example, suppose we want to create an empty array xs, add 0 if some condition holds, add 1 if another condition holds, and then print it. In the following example, we use variable "shadowing" to simulate this kind of "update".

def f (b1 b2 : Bool) : IO Unit := do
  let xs := #[]
  let xs := if b1 then xs.push 0 else xs
  let xs := if b2 then xs.push 1 else xs
  IO.println xs

#eval f true true
-- #[0, 1]
#eval f false true
-- #[1]
#eval f true false
-- #[0]
#eval f false false
-- #[]

We can use tuples to simulate updates on multiple variables.

def f (b1 b2 : Bool) : IO Unit := do
  let xs := #[]
  let ys := #[]
  let (xs, ys) := if b1 then (xs.push 0, ys) else (xs, ys.push 0)
  let (xs, ys) := if b2 then (xs.push 1, ys) else (xs, ys.push 1)
  IO.println s!"xs: {xs}, ys: {ys}"

#eval f true false
-- xs: #[0], ys: #[1]

We can also simulate the control-flow above using join-points. A join-point is a let that is always tail called and fully applied. The Lean compiler implements them using gotos. Here is the same example using join-points.

def f (b1 b2 : Bool) : IO Unit := do
  let jp1 xs ys := IO.println s!"xs: {xs}, ys: {ys}"
  let jp2 xs ys := if b2 then jp1 (xs.push 1) ys else jp1 xs (ys.push 1)
  let xs := #[]
  let ys := #[]
  if b1 then jp2 (xs.push 0) ys else jp2 xs (ys.push 0)

#eval f true false
-- xs: #[0], ys: #[1]

You can capture complex control-flow using join-points. The do DSL offers the variable reassignment feature to make this kind of code more comfortable to write. In the following example, the mut modifier at let mut xs := #[] indicates that variable xs can be reassigned. The example contains two reassignments xs := xs.push 0 and xs := xs.push 1. The reassignments are compiled using join-points. There is no hidden state being updated.

def f (b1 b2 : Bool) : IO Unit := do
  let mut xs := #[]
  if b1 then xs := xs.push 0
  if b2 then xs := xs.push 1
  IO.println xs

#eval f true true
-- #[0, 1]

The notation x <- action reassigns x with the value produced by the action. It is equivalent to x := (<- action)

Iteration

The do DSL provides a unified notation for iterating over datastructures. Here are a few examples.

def sum (xs : Array Nat) : IO Nat := do
  let mut s := 0
  for x in xs do
    IO.println s!"x: {x}"
    s := s + x
  return s

#eval sum #[1, 2, 3]
-- x: 1
-- x: 2
-- x: 3
-- 6

-- We can write pure code using the `Id.run <| do` DSL too.
def sum' (xs : Array Nat) : Nat := Id.run <| do
  let mut s := 0
  for x in xs do
    s := s + x
  return s

#eval sum' #[1, 2, 3]
-- 6

def sumEven (xs : Array Nat) : IO Nat := do
  let mut s := 0
  for x in xs do
    if x % 2 == 0 then
      IO.println s!"x: {x}"
      s := s + x
  return s

#eval sumEven #[1, 2, 3, 6]
-- x: 2
-- x: 6
-- 8

def splitEvenOdd (xs : List Nat) : IO Unit := do
  let mut evens := #[]
  let mut odds  := #[]
  for x in xs do
    if x % 2 == 0 then
      evens := evens.push x
    else
      odds := odds.push x
  IO.println s!"evens: {evens}, odds: {odds}"

#eval splitEvenOdd [1, 2, 3, 4]
-- evens: #[2, 4], odds: #[1, 3]

def findNatLessThan (x : Nat) (p : Nat → Bool) : IO Nat := do
  -- [:x] is notation for the range [0, x)
  for i in [:x] do
    if p i then
      return i -- `return` from the `do` block
  throw (IO.userError "value not found")

#eval findNatLessThan 10 (fun x => x > 5 && x % 4 == 0)
-- 8

def sumOddUpTo (xs : List Nat) (threshold : Nat) : IO Nat := do
  let mut s := 0
  for x in xs do
    if x % 2 == 0 then
      continue -- it behaves like the `continue` statement in imperative languages
    IO.println s!"x: {x}"
    s := s + x
    if s > threshold then
      break -- it behaves like the `break` statement in imperative languages
  IO.println s!"result: {s}"
  return s

#eval sumOddUpTo [2, 3, 4, 11, 20, 31, 41, 51, 107] 40
-- x: 3
-- x: 11
-- x: 31
-- result: 45
-- 45

TODO: describe forIn

Try-catch

TODO

Returning early from a failed match

Inside a do block, the pattern let _ ← <success> | <fail> will continue with the rest of the block if the match on the left hand side succeeds, but will execute the right hand side and exit the block on failure:

def showUserInfo (getUsername getFavoriteColor : IO (Option String)) : IO Unit := do
  let some n ← getUsername | IO.println "no username!"
  IO.println s!"username: {n}"
  let some c ← getFavoriteColor | IO.println "user didn't provide a favorite color!"
  IO.println s!"favorite color: {c}"

-- username: JohnDoe
-- favorite color: red
#eval showUserInfo (pure <| some "JohnDoe") (pure <| some "red")

-- no username
#eval showUserInfo (pure none) (pure <| some "purple")

-- username: JaneDoe
-- user didn't provide a favorite color
#eval showUserInfo (pure <| some "JaneDoe") (pure none)

If-let

Inside a do block, users can employ the if let pattern to destructure actions:

def tryIncrement (getInput : IO (Option Nat)) : IO (Except String Nat) := do
  if let some n ← getInput
  then return Except.ok n.succ
  else return Except.error "argument was `none`"

-- Except.ok 2
#eval tryIncrement (pure <| some 1)

-- Except.error "argument was `none`"
#eval tryIncrement (pure <| none)

Pattern matching

TODO

Monads

TODO

ReaderT

TODO

StateT

TODO

StateRefT

TODO

ExceptT

TODO

MonadLift and automatic lifting

TODO

String interpolation

The s! prefix identifies a string literal as an interpolated string. An interpolated string is a string literal that might contain interpolation expressions. When an interpolated string is resolved to a result string, items with interpolation expressions are replaced by the string representations of the expression results. The polymorphic method toString is used to convert the value into a string.

String interpolation provides a more readable and convenient syntax to create formatted strings than a string composite formatting feature. The following example uses both features to produce the same output:

def name := "John"
def age  := 28

#eval IO.println s!"Hello, {name}! Are you {age} years old?"

#eval IO.println ("Hello, " ++ name ++ "! Are you " ++ toString age ++ " years old?")

-- `println! <interpolated-string>` is a macro for `IO.println s!<interpolated-string>`
#eval println! "Hello, {name}! Are you {age} years old?"

Structure of an interpolated string

To identify a string literal as an interpolated string, prepend it with s!. Terms inside braces {} are ordinary expressions whose type implements the type class ToString. To include a curly brace { in your interpolated string, you must escape it using \{. You can nest interpolated strings inside interpolated strings.

def vals := [1, 2, 3]

#eval IO.println s!"\{ vals := {vals} }"

#eval IO.println s!"variables: {vals.map (fun i => s!"x_{i}")}"

ToString instances

You can define a ToString instance for your own datatypes.

structure Person where
  name : String
  age  : Nat

instance : ToString Person where
  toString : Person -> String
    | { name := n, age := v } => s!"\{ name := {n}, age := {v} }"

def person1 : Person := {
  name := "John"
  age  := 28
}

#eval println! "person1: {person1}"

Notations and Precedence

The most basic syntax extension commands allow introducing new (or overloading existing) prefix, infix, and postfix operators.

infixl:65   " + " => HAdd.hAdd  -- left-associative
infix:50    " = " => Eq         -- non-associative
infixr:80   " ^ " => HPow.hPow  -- right-associative
prefix:75  "-"   => Neg.neg
set_option quotPrecheck false
postfix:max "⁻¹"  => Inv.inv

After the initial command name describing the operator kind (its "fixity"), we give the parsing precedence of the operator preceded by a colon :, then a new or existing token surrounded by double quotes (the whitespace is used for pretty printing), then the function this operator should be translated to after the arrow =>.

The precedence is a natural number describing how "tightly" an operator binds to its arguments, encoding the order of operations. We can make this more precise by looking at what the commands above unfold to:

notation:65 lhs:65 " + " rhs:66 => HAdd.hAdd lhs rhs
notation:50 lhs:51 " = " rhs:51 => Eq lhs rhs
notation:80 lhs:81 " ^ " rhs:80 => HPow.hPow lhs rhs
notation:75 "-" arg:75 => Neg.neg arg
set_option quotPrecheck false
notation:1024 arg:1024 "⁻¹" => Inv.inv arg  -- `max` is a shorthand for precedence 1024

It turns out that all commands from the first code block are in fact command macros translating to the more general notation command. We will learn about writing such macros below. Instead of a single token, the notation command accepts a mixed sequence of tokens and named term placeholders with precedences, which can be referenced on the right-hand side of => and will be replaced by the respective term parsed at that position. A placeholder with precedence p accepts only notations with precedence at least p in that place. Thus the string a + b + c cannot be parsed as the equivalent of a + (b + c) because the right-hand side operand of an infixl notation has precedence one greater than the notation itself. In contrast, infixr reuses the notation's precedence for the right-hand side operand, so a ^ b ^ c can be parsed as a ^ (b ^ c). Note that if we used notation directly to introduce an infix notation like

set_option quotPrecheck false
notation:65 lhs:65 " ~ " rhs:65 => wobble lhs rhs

where the precedences do not sufficiently determine associativity, Lean's parser will default to right associativity. More precisely, Lean's parser follows a local longest parse rule in the presence of ambiguous grammars: when parsing the right-hand side of a ~ in a ~ b ~ c, it will continue parsing as long as possible (as the current precedence allows), not stopping after b but parsing ~ c as well. Thus the term is equivalent to a ~ (b ~ c).

As mentioned above, the notation command allows us to define arbitrary mixfix syntax freely mixing tokens and placeholders.

set_option quotPrecheck false
notation:max "(" e ")" => e
notation:10 Γ " ⊢ " e " : " τ => Typing Γ e τ

Placeholders without precedence default to 0, i.e. they accept notations of any precedence in their place. If two notations overlap, we again apply the longest parse rule:

notation:65 a " + " b:66 " + " c:66 => a + b - c
#eval 1 + 2 + 3  -- 0

The new notation is preferred to the binary notation since the latter, before chaining, would stop parsing after 1 + 2. If there are multiple notations accepting the same longest parse, the choice will be delayed until elaboration, which will fail unless exactly one overload is type correct.

Macro Overview

The official paper describing the mechanics behind Lean 4's macro system can be found in Beyond Notations: Hygienic Macro Expansion for Theorem Proving Languages by Sebastian Ullrich and Leonardo de Moura, and the accompanying repo with example code can be found in the paper's code supplement. The supplement also includes a working implementation of the macro expander, so it's a good case study for people interested in the details.

What is a macro in Lean?

A macro is a function that takes in a syntax tree and produces a new syntax tree. Macros are useful for many reasons, but two of the big ones are a) allowing users to extend the language with new syntactic constructs without having to actually expand the core language, and b) allowing users to automate tasks that would otherwise be extremely repetitive, time-consuming, and/or error-prone.

A motivating example is set builder notation. We would like to be able to write the set of natural numbers 0, 1, and 2 as just {0, 1, 2}. However, Lean does not natively support this syntax, and the actual definition of a set in Mathlib does not let us just declare sets in this manner; naively using the set API would force us to write Set.insert 1 (Set.insert 2 (Set.singleton 3)). Instead, we can teach Lean's macro system to recognize {0, 1, 2} as a shorthand for a composition of existing methods and let it do the repetitive work of creating the Set.insert... invocation for us. In this way, we can have our more readable and more convenient syntax without having to extend Lean itself, and while retaining the simple insert/singleton API.

How macros are handled

The general procedure is as follows:

  1. Lean parses a command, creating a Lean syntax tree which contains any unexpanded macros.

  2. Lean repeats the cycle (elaboration ~> (macro hygiene and expansion) ~> elaboration...)

The cycle in step 2 repeats until there are no more macros which need to be expanded, and elaboration can finish normally. This repetition is required since macros can expand to other macros, and may expand to code that needs information from the elaborator. As you can see, the process of macro parsing and expansion is interleaved with the parsing and elaboration of non-macro code.

By default, macros in Lean are hygienic, which means the system avoids accidental name capture when reusing the same name inside and outside the macro. Users may occasionally want to disable hygiene, which can be accomplished with the command set_option hygiene false. More in-depth information about hygiene and how it's implemented in the official paper and supplement linked at the top of this guide.

Elements of "a" macro (important types)

In the big picture, a macro has two components that must be implemented by the user, parsers and syntax transformers, where the latter is a function that says what the input syntax should expand to. There is a third component, syntax categories, such as term, tactic, and command, but declaring a new syntax category is not always necessary. When we say "parser" in the context of a macro, we refer to the core type Lean.ParserDescr, which parses elements of type Lean.Syntax, where Lean.Syntax represents elements of a Lean syntax tree. Syntax transformers are functions of type Syntax -> MacroM Syntax. Lean has a synonym for this type, which is simply Macro. MacroM is a monad that carries state needed for macro expansion to work nicely, including the info needed to implement hygiene.

As an example, we again refer to Mathlib's set builder notation:

/- Declares a parser -/
syntax (priority := high) "{" term,+ "}" : term

/- Declares two expansions/syntax transformers -/
macro_rules
  | `({$x}) => `(Set.singleton $x)
  | `({$x, $xs:term,*}) => `(Set.insert $x {$xs,*})

/- Provided `Set` has been imported (from Mathlib4), these are all we need for `{1, 2, 3}` to be valid notation to create a literal set -/

This example should also make clear the reason why macros (and pretty much all of Lean 4's metaprogramming facilities) are functions that take an argument of type Syntax e.g. Syntax -> MacroM Syntax; the leading syntax element is the thing that actually triggers the macro expansion by matching with the declared parser, and as a user, you will almost always be interested in inspecting and transforming that initial syntax element (though there are cases in which it can just be ignored, as in the parameter-less exfalso tactic).

Returning briefly to the API provided by Lean, Lean.Syntax, is pretty much what you would expect a basic syntax tree type to look like. Below is a slightly simplified representation which omits details in the atom and ident constructors; users can create atoms and idents which comport with this simplified representation using the mkAtom and mkIdent methods provided in the Lean namespace.

open Lean
inductive Syntax where
  | missing : Syntax
  | node (kind : SyntaxNodeKind) (args : Array Syntax) : Syntax
  | atom : String -> Syntax
  | ident : Name -> Syntax

For those interested, MacroM is a ReaderT:

open Lean
abbrev MacroM := ReaderT Macro.Context (EStateM Macro.Exception Macro.State)

The other relevant components are defined as follows:

open Lean
structure Context where
  methods        : MethodsRef
  mainModule     : Name
  currMacroScope : MacroScope
  currRecDepth   : Nat := 0
  maxRecDepth    : Nat := defaultMaxRecDepth
  ref            : Syntax

inductive Exception where
  | error             : Syntax → String → Exception
  | unsupportedSyntax : Exception

structure State where
  macroScope : MacroScope
  traceMsgs  : List (Prod Name String) := List.nil
  deriving Inhabited

As a review/checklist, the three (sometimes only two depending on whether you need a new syntax category) components users need to be concerned with are:

  1. You may or may not need to declare a new syntax category using declare_syntax_cat
  2. Declare a parser with either syntax or macro
  3. Declare an expansion/syntax transformer with either macro_rules or macro

Parsers and syntax transformers can be declared manually, but use of the pattern language and syntax, macro_rules, and macro is recommended.

syntax categories with declare_syntax_cat

declare_syntax_cat declares a new syntax category, like command, tactic, or mathlib4's binderterm. These are the different categories of things that can be referred to in a quote/antiquote. declare_syntax_cat results in a call to registerParserCategory and produces a new parser descriptor:

set_option trace.Elab.definition true in
declare_syntax_cat binderterm

/-
Output:

[Elab.definition.body] binderterm.quot : Lean.ParserDescr :=
Lean.ParserDescr.node `Lean.Parser.Term.quot 1024
  (Lean.ParserDescr.binary `andthen (Lean.ParserDescr.symbol "`(binderterm|")
    (Lean.ParserDescr.binary `andthen (Lean.ParserDescr.cat `binderterm 0)
      (Lean.ParserDescr.symbol ")")))
-/

Declaring a new syntax category like this one automatically declares a quotation operator `(binderterm| ...). These pipe prefixes <thing>| are used in syntax quotations to say what category a given quotation is expected to be an element of. The pipe prefixes are not used for elements in the term and command categories (since they're considered the default), but need to be used for everything else.

Parsers and the syntax keyword

Internally, elements of type Lean.ParserDescr are implemented as parser combinators. However, Lean offers the ability to write parsers using the macro/pattern language by way of the syntax keyword. This is the recommended means of writing parsers. As an example, the parser for the rwa (rewrite, then use assumption) tactic is:

open Lean.Parser.Tactic
set_option trace.Elab.definition true in
syntax "rwa " rwRuleSeq (location)? : tactic

/-
which expands to:
[Elab.definition.body] tacticRwa__ : Lean.ParserDescr :=
Lean.ParserDescr.node `tacticRwa__ 1022
  (Lean.ParserDescr.binary `andthen
    (Lean.ParserDescr.binary `andthen (Lean.ParserDescr.nonReservedSymbol "rwa " false) Lean.Parser.Tactic.rwRuleSeq)
    (Lean.ParserDescr.unary `optional Lean.Parser.Tactic.location))

-/

Literals are written as double-quoted strings ("rwa " expects the literal sequence of characters rwa, while the trailing space provides a hint to the formatter that it should add a space after rwa when pretty printing this syntax); rwRuleSeq and location are themselves ParserDescrs, and we finish with : tactic specifying that the preceding parser is for an element in the tactic syntax category. The parentheses around (location)? are necessary (rather than location?) because Lean 4 allows question marks to be used in identifiers, so location? is one single identifier that ends with a question mark, which is not what we want.

The name tacticRwa__ is automatically generated. You can name parser descriptors declared with the syntax keyword like so:

set_option trace.Elab.definition true in
syntax (name := introv) "introv " (colGt ident)* : tactic

/-
[Elab.definition.body] introv : Lean.ParserDescr :=
Lean.ParserDescr.node `introv 1022
  (Lean.ParserDescr.binary `andthen (Lean.ParserDescr.nonReservedSymbol "introv " false)
    (Lean.ParserDescr.unary `many
      (Lean.ParserDescr.binary `andthen (Lean.ParserDescr.const `colGt) (Lean.ParserDescr.const `ident))))
-/

The pattern language

Available quantifiers are ? (one or zero occurrences, see note below), * (zero or more occurrences), and + (one or more occurrences).

Keep in mind that Lean makes ? available for use in identifiers, so if we want a parser to look for an optional location, we would need to write (location)? with parenthesis acting as a separator, since location? would look for something under the identifier location? (where the ? is part of the identifier).

Parentheses can be used as delimiters.

Separated lists can be constructed like so: $ts,* for a comma separated list.

"extended splices" can be constructed as $[..]. See the official paper (p. 12) for more details.

Literals are written as double-quoted strings. A literal may use trailing whitespace (see e.g. the rwa or introv tactics) to tell the pretty-printer how it should be displayed, but such whitespace will not prevent a literal with no trailing whitespace from matching. The spaces are relevant, but not interpreted literally. When the ParserDescr is turned into a Parser, the actual token matcher uses the .trim of the provided string, but the generated formatter uses the spaces as specified, that is, turning the atom "rwa" in the syntax into the string rwa as part of the pretty printed output.

Syntax expansions with macro_rules, and how it desugars.

macro_rules lets you declare expansions for a given Syntax element using a syntax similar to a match statement. The left-hand side of a match arm is a quotation (with a leading <cat>| for categories other than term and command) in which users can specify the pattern they'd like to write an expansion for. The right-hand side returns a syntax quotation which is the output the user wants to expand to.

A feature of Lean's macro system is that if there are multiple expansions for a particular match, Lean will try the most recently declared expansion first, and will retry with other matching expansions if the previous attempt failed. This is particularly useful for extending existing tactics.

The following example shows both the retry behavior, and the fact that macros declared using the shorthand macro syntax can still have additional expansions declared with macro_rules. This transitivity tactic is implemented such that it will work for either Nat.le or Nat.lt. The Nat.lt version was declared "most recently", so it will be tried first, but if it fails (for example, if the actual term in question is Nat.le) the next potential expansion will be tried:

macro "transitivity" e:(colGt term) : tactic => `(tactic| apply Nat.le_trans (m := $e))
macro_rules
  | `(tactic| transitivity $e) => `(tactic| apply Nat.lt_trans (m := $e))

example (a b c : Nat) (h0 : a < b) (h1 : b < c) : a < c := by
  transitivity b <;>
  assumption

example (a b c : Nat) (h0 : a <= b) (h1 : b <= c) : a <= c := by
  transitivity b <;>
  assumption

/- This will fail, but is interesting in that it exposes the "most-recent first" behavior, since the
  error message complains about being unable to unify mvar1 <= mvar2, rather than mvar1 < mvar2. -/
/-
example (a b c : Nat) (h0 : a <= b) (h1 : b <= c) : False := by
  transitivity b <;>
  assumption
-/

To see the desugared definition of the actual expansion, we can again use set_option trace.Elab.definition true in and observe the output of the humble exfalso tactic defined in Mathlib4:


set_option trace.Elab.definition true in
macro "exfalso" : tactic => `(tactic| apply False.elim)

/-
Results in the expansion:

[Elab.definition.body] _aux___macroRules_tacticExfalso_1 : Lean.Macro :=
fun x =>
  let discr := x;
  /- This is where Lean tries to actually identify that it's an invocation of the exfalso tactic -/
  if Lean.Syntax.isOfKind discr `tacticExfalso = true then
    let discr := Lean.Syntax.getArg discr 0;
    let x := discr;
    do
      /- Lean getting scope/meta info from the macro monad -/
      let info ← Lean.MonadRef.mkInfoFromRefPos
      let scp ← Lean.getCurrMacroScope
      let mainModule ← Lean.getMainModule
      pure
          (Lean.Syntax.node Lean.SourceInfo.none `Lean.Parser.Tactic.seq1
            #[Lean.Syntax.node Lean.SourceInfo.none `null
                #[Lean.Syntax.node Lean.SourceInfo.none `Lean.Parser.Tactic.apply
                    #[Lean.Syntax.atom info "apply",
                      Lean.Syntax.ident info (String.toSubstring "False.elim")
                        (Lean.addMacroScope mainModule `False.elim scp) [(`False.elim, [])]]]])
  else
    /- If this wasn't actually an invocation of the exfalso tactic, throw the "unsupportedSyntax" error -/
    let discr := x;
    throw Lean.Macro.Exception.unsupportedSyntax
-/

We can also create the syntax transformer declaration ourselves instead of using macro_rules. We'll need to name our parser and use the attribute @[macro myExFalsoParser] to associate our declaration with the parser:

open Lean
syntax (name := myExfalsoParser) "myExfalso" : tactic

-- remember that `Macro` is a synonym for `Syntax -> TacticM Unit`
@[macro myExfalsoParser] def implMyExfalso : Macro :=
fun stx => `(tactic| apply False.elim)

example (p : Prop) (h : p) (f : p -> False) : 3 = 2 := by
  myExfalso
  exact f h

In the above example, we're still using the sugar Lean provides for creating quotations, as it feels more intuitive and saves us some work. It is possible to forego the sugar altogether:

syntax (name := myExfalsoParser) "myExfalso" : tactic

@[macro myExfalsoParser] def implMyExfalso : Lean.Macro :=
  fun stx => pure (Lean.mkNode `Lean.Parser.Tactic.apply
    #[Lean.mkAtomFrom stx "apply", Lean.mkCIdentFrom stx ``False.elim])

example (p : Prop) (h : p) (f : p -> False) : 3 = 2 := by
  myExfalso
  exact f h

The macro keyword

macro is a shortcut which allows users to declare both a parser and an expansion at the same time as a matter of convenience. Additional expansions for the parser generated by the macro invocation can be added with a separate macro_rules block (see the example in the macro_rules section).

Unexpanders

TODO; for now, see the unexpander in Mathlib.Set for an example.

More illustrative examples:

The Tactic.Basic file in Mathlib4 contains many good examples to learn from.

Practical tips:

You can observe the output of commands and functions that in some way use the macro system by setting this option to true : set_option trace.Elab.definition true

Lean also offers the option of limiting the region in which option is set with the syntax set_option ... in):

Hygiene can be disabled with the command option set_option hygiene false

Elaborators

TODO. See Lean Together 2021: Metaprogramming in Lean 4 for an overview as well the continuation about tactic programming. For more information on antiquotations, see also §4.1 of Beyond Notations: Hygienic Macro Expansion for Theorem Proving Languages.

Syntax Metaprogramming Examples

Balanced Parentheses as an Embedded Domain Specific Language

Let's look at how to use macros to extend the Lean 4 parser and embed a language for building balanced parentheses. This language accepts strings given by the BNF grammar

Dyck ::=
  "(" Dyck ")"
  | "{" Dyck "}"
  | end

We begin by defining an inductive data type of the grammar we wish to parse:

inductive Dyck : Type where
  | round : Dyck → Dyck  -- ( <inner> )
  | curly : Dyck → Dyck  -- { <inner> }
  | leaf : Dyck

We begin by declaring a syntax category using the declare_syntax_cat <category> command. This names our grammar and allows us to specify parsing rules associated with our grammar.

declare_syntax_cat brack

Next, we specify the grammar using the syntax <parse rule> command:

syntax "end" : brack

The above means that the token "end" lives in syntax category brack.

Similarly, we declare the rules "(" Dyck ")" and "{" Dyck "}" using the rules:

syntax "(" brack ")" : brack
syntax "{" brack "}" : brack

Finally, we need a way to build Lean 4 terms from this grammar -- that is, we must translate out of this grammar into a Dyck value, which is a Lean 4 term. For this, we create a new kind of "quotation" that consumes syntax in brack and produces a term.

syntax "`[Dyck| " brack "]" : term

To specify the transformation rules, we use macro_rules to declare how the syntax `[Dyck| <brack>] produces terms. This is written using a pattern-matching style syntax, where the left-hand side declares the syntax pattern to be matched, and the right-hand side declares the production. Syntax placeholders (antiquotations) are introduced via the $<var-name> syntax. The right-hand side is an arbitrary Lean term that we are producing.

macro_rules
  | `(`[Dyck| end])    => `(Dyck.leaf)
  | `(`[Dyck| ($b)]) => `(Dyck.round `[Dyck| $b])  -- recurse
  | `(`[Dyck| {$b}]) => `(Dyck.curly `[Dyck| $b])  -- recurse
#check `[Dyck| end]      -- Dyck.leaf
#check `[Dyck| {(end)}]  -- Dyck.curl (Dyck.round Dyck.leaf)

In summary, we've seen:

  • How to declare a syntax category for the Dyck grammar.
  • How to specify parse trees of this grammar using syntax
  • How to translate out of this grammar into Lean 4 terms using macro_rules.

The full program listing is given below:

inductive Dyck : Type where
  | round : Dyck → Dyck  -- ( <inner> )
  | curly : Dyck → Dyck  -- { <inner> }
  | leaf : Dyck

-- declare Dyck grammar parse trees
declare_syntax_cat brack
syntax "(" brack ")" : brack
syntax "{" brack "}" : brack
syntax "end" : brack

-- notation for translating `brack` into `term`
syntax "`[Dyck| " brack "]" : term

-- rules to translate Dyck grammar into inductive value of type Dyck
macro_rules
  | `(`[Dyck| end])    => `(Dyck.leaf)
  | `(`[Dyck| ($b)]) => `(Dyck.round `[Dyck| $b])  -- recurse
  | `(`[Dyck| {$b}]) => `(Dyck.curly `[Dyck| $b])  -- recurse

-- tests
#check `[Dyck| end]      -- Dyck.leaf
#check `[Dyck| {(end)}]  -- Dyck.curl (Dyck.round Dyck.leaf)

Arithmetic as an embedded domain-specific language

Let's parse another classic grammar, the grammar of arithmetic expressions with addition, multiplication, integers, and variables. In the process, we'll learn how to:

  • Convert identifiers such as x into strings within a macro.
  • add the ability to "escape" the macro context from within the macro. This is useful to interpret identifiers with their original meaning (predefined values) instead of their new meaning within a macro (treat as a symbol).

Let's begin with the simplest thing possible. We'll define an AST, and use operators + and * to denote building an arithmetic AST.

Here's the AST that we will be parsing:

inductive Arith : Type
  | add : Arith → Arith → Arith  -- e + f
  | mul : Arith → Arith → Arith  -- e * f
  | int : Int → Arith  -- constant
  | symbol : String → Arith  -- variable

We declare a syntax category to describe the grammar that we will be parsing. See that we control the precedence of + and * by writing syntax:50 for addition and syntax:60 for multiplication, indicating that multiplication binds tighter than addition (higher the number, tighter the binding). This allows us to declare precedence when defining new syntax.

declare_syntax_cat arith

syntax num : arith  -- int for Arith.int
syntax str : arith  -- strings for Arith.symbol
syntax:60 arith:60 "+" arith:61 : arith  -- Arith.add
syntax:70 arith:70 "*" arith:71 : arith  -- Arith.mul
syntax "(" arith ")" : arith  -- parenthesized expressions

Further, if we look at syntax:60 arith:60 "+" arith:61 : arith, the precedence declarations at arith:60 "+" arith:61 conveys that the left argument must have precedence at least 60 or greater, and the right argument must have precedence at least61 or greater. Note that this forces left associativity. To understand this, let's compare two hypothetical parses:

-- syntax:60  arith:60 "+" arith:61 : arith -- Arith.add
-- a + b + c
(a:60 + b:61):60 + c
a + (b:60 + c:61):60

In the parse tree of a + (b:60 + c:61):60, we see that the right argument (b + c) is given the precedence 60. However, the rule for addition expects the right argument to have a precedence of at least 61, as witnessed by the arith:61 at the right-hand-side of syntax:60 arith:60 "+" arith:61 : arith. Thus, the rule syntax:60 arith:60 "+" arith:61 : arith ensures that addition is left associative.

Since addition is declared arguments of precedence 60/61 and multiplication with 70/71, this causes multiplication to bind tighter than addition. Once again, let's compare two hypothetical parses:

-- syntax:60  arith:60 "+" arith:61 : arith -- Arith.add
-- syntax:70 arith:70 "*" arith:71 : arith -- Arith.mul
-- a * b + c
a * (b:60 + c:61):60
(a:70 * b:71):70 + c

While parsing a * (b + c), (b + c) is assigned a precedence 60 by the addition rule. However, multiplication expects the right argument to have precedence at least 71. Thus, this parse is invalid. In contrast, (a * b) + c assigns a precedence of 70 to (a * b). This is compatible with addition which expects the left argument to have precedence **at least 60 ** (70 is greater than 60). Thus, the string a * b + c is parsed as (a * b) + c. For more details, please look at the Lean manual on syntax extensions.

To go from strings into Arith, we define a macro to translate the syntax category arith into an Arith inductive value that lives in term:

-- auxiliary notation for translating `arith` into `term`
syntax "`[Arith| " arith "]" : term

Our macro rules perform the "obvious" translation:

macro_rules
  | `(`[Arith| $s:str]) => `(Arith.symbol $s)
  | `(`[Arith| $num:num]) => `(Arith.int $num)
  | `(`[Arith| $x + $y]) => `(Arith.add `[Arith| $x] `[Arith| $y])
  | `(`[Arith| $x * $y]) => `(Arith.mul `[Arith| $x] `[Arith| $y])
  | `(`[Arith| ($x)]) => `(`[Arith| $x])

And some examples:

#check `[Arith| "x" * "y"]  -- mul
-- Arith.mul (Arith.symbol "x") (Arith.symbol "y")

#check `[Arith| "x" + "y"]  -- add
-- Arith.add (Arith.symbol "x") (Arith.symbol "y")

#check `[Arith| "x" + 20]  -- symbol + int
-- Arith.add (Arith.symbol "x") (Arith.int 20)

#check `[Arith| "x" + "y" * "z"]  -- precedence
-- Arith.add (Arith.symbol "x") (Arith.mul (Arith.symbol "y") (Arith.symbol "z"))

#check `[Arith| "x" * "y" + "z"]  -- precedence
-- Arith.add (Arith.mul (Arith.symbol "x") (Arith.symbol "y")) (Arith.symbol "z")

#check `[Arith| ("x" + "y") * "z"]  -- parentheses
-- Arith.mul (Arith.add (Arith.symbol "x") (Arith.symbol "y")) (Arith.symbol "z")

Writing variables as strings, such as "x" gets old; wouldn't it be so much prettier if we could write x * y, and have the macro translate this into Arith.mul (Arith.Symbol "x") (Arith.mul "y")? We can do this, and this will be our first taste of manipulating macro variables --- we'll use x.getId instead of directly evaluating $x. We also write a macro rule for Arith| that translates an identifier into a string, using $(Lean.quote (toString x.getId)):

syntax ident : arith

macro_rules
  | `(`[Arith| $x:ident]) => `(Arith.symbol $(Lean.quote (toString x.getId)))

Let's test and see that we can now write expressions such as x * y directly instead of having to write "x" * "y":

#check `[Arith| x]  -- Arith.symbol "x"

def xPlusY := `[Arith| x + y]
#print xPlusY  -- def xPlusY : Arith := Arith.add (Arith.symbol "x") (Arith.symbol "y")

We now show an unfortunate consequence of the above definitions. Suppose we want to build (x + y) + z. Since we already have defined xPlusY as x + y, perhaps we should reuse it! Let's try:

#check `[Arith| xPlusY + z]  -- Arith.add (Arith.symbol "xPlusY") (Arith.symbol "z")

Whoops, that didn't work! What happened? Lean treats xPlusY itself as an identifier! So we need to add some syntax to be able to "escape" the Arith| context. Let's use the syntax <[ $e:term ]> to mean: evaluate $e as a real term, not an identifier. The macro looks like follows:

syntax "<[" term "]>" : arith  -- escape for embedding terms into `Arith`

macro_rules
  | `(`[Arith| <[ $e:term ]>]) => pure e

Let's try our previous example:

#check `[Arith| <[ xPlusY ]> + z]  -- Arith.add xPlusY (Arith.symbol "z")

Perfect!

In this tutorial, we expanded on the previous tutorial to parse a more realistic grammar with multiple levels of precedence, how to parse identifiers directly within a macro, and how to provide an escape from within the macro context.

Full code listing

inductive Arith : Type
  | add : Arith → Arith → Arith  -- e + f
  | mul : Arith → Arith → Arith  -- e * f
  | int : Int → Arith  -- constant
  | symbol : String → Arith  -- variable

declare_syntax_cat arith

syntax num : arith  -- int for Arith.int
syntax str : arith  -- strings for Arith.symbol
syntax:60 arith:60 "+" arith:61 : arith  -- Arith.add
syntax:70 arith:70 "*" arith:71 : arith  -- Arith.mul
syntax "(" arith ")" : arith  -- parenthesized expressions

-- auxiliary notation for translating `arith` into `term`
syntax "`[Arith| " arith "]" : term

macro_rules
  | `(`[Arith| $s:str]) => `(Arith.symbol $s)
  | `(`[Arith| $num:num]) => `(Arith.int $num)
  | `(`[Arith| $x + $y]) => `(Arith.add `[Arith| $x] `[Arith| $y])
  | `(`[Arith| $x * $y]) => `(Arith.mul `[Arith| $x] `[Arith| $y])
  | `(`[Arith| ($x)]) => `(`[Arith| $x])

#check `[Arith| "x" * "y"]  -- mul
-- Arith.mul (Arith.symbol "x") (Arith.symbol "y")

#check `[Arith| "x" + "y"]  -- add
-- Arith.add (Arith.symbol "x") (Arith.symbol "y")

#check `[Arith| "x" + 20]  -- symbol + int
-- Arith.add (Arith.symbol "x") (Arith.int 20)

#check `[Arith| "x" + "y" * "z"]  -- precedence
-- Arith.add (Arith.symbol "x") (Arith.mul (Arith.symbol "y") (Arith.symbol "z"))

#check `[Arith| "x" * "y" + "z"]  -- precedence
-- Arith.add (Arith.mul (Arith.symbol "x") (Arith.symbol "y")) (Arith.symbol "z")

#check `[Arith| ("x" + "y") * "z"]  -- parentheses
-- Arith.mul (Arith.add (Arith.symbol "x") (Arith.symbol "y")) (Arith.symbol "z")

syntax ident : arith

macro_rules
  | `(`[Arith| $x:ident]) => `(Arith.symbol $(Lean.quote (toString x.getId)))

#check `[Arith| x]  -- Arith.symbol "x"

def xPlusY := `[Arith| x + y]
#print xPlusY  -- def xPlusY : Arith := Arith.add (Arith.symbol "x") (Arith.symbol "y")

syntax "<[" term "]>" : arith  -- escape for embedding terms into `Arith`

macro_rules
  | `(`[Arith| <[ $e:term ]>]) => pure e

#check `[Arith| <[ xPlusY ]> + z]  -- Arith.add xPlusY (Arith.symbol "z")

Declaring New Types

In Lean's library, every concrete type other than the universes and every type constructor other than the dependent function type is an instance of a general family of type constructions known as inductive types. It is remarkable that it is possible to develop complex programs and formalize mathematics based on nothing more than the type universes, dependent function types, and inductive types; everything else follows from those.

Intuitively, an inductive type is built up from a specified list of constructors. In Lean, the basic syntax for specifying such a type is as follows:

inductive NewType where
  | constructor_1 : ... → NewType
  | constructor_2 : ... → NewType
  ...
  | constructor_n : ... → NewType

The intuition is that each constructor specifies a way of building new objects of NewType, possibly from previously constructed values. The type NewType consists of nothing more than the objects that are constructed in this way.

We will see below that the arguments to the constructors can include objects of type NewType, subject to a certain "positivity" constraint, which guarantees that elements of NewType are built from the bottom up. Roughly speaking, each ... can be any function type constructed from NewType and previously defined types, in which NewType appears, if at all, only as the "target" of the function type.

We will provide a number of examples of inductive types. We will also consider slight generalizations of the scheme above, to mutually defined inductive types, and so-called inductive families.

Every inductive type comes with constructors, which show how to construct an element of the type, and elimination rules, which show how to "use" an element of the type in another construction.

Enumerated Types

The simplest kind of inductive type is simply a type with a finite, enumerated list of elements. The following command declares the enumerated type Weekday.

inductive Weekday where
  | sunday    : Weekday
  | monday    : Weekday
  | tuesday   : Weekday
  | wednesday : Weekday
  | thursday  : Weekday
  | friday    : Weekday
  | saturday  : Weekday

The Weekday type has 7 constructors/elements. The constructors live in the Weekday namespace Think of sunday, monday, …, saturday as being distinct elements of Weekday, with no other distinguishing properties.

inductive Weekday where
 | sunday    : Weekday
 | monday    : Weekday
 | tuesday   : Weekday
 | wednesday : Weekday
 | thursday  : Weekday
 | friday    : Weekday
 | saturday  : Weekday
#check Weekday.sunday   -- Weekday
#check Weekday.monday   -- Weekday

You can define functions by pattern matching. The following function converts a Weekday into a natural number.

inductive Weekday where
 | sunday    : Weekday
 | monday    : Weekday
 | tuesday   : Weekday
 | wednesday : Weekday
 | thursday  : Weekday
 | friday    : Weekday
 | saturday  : Weekday
def natOfWeekday (d : Weekday) : Nat :=
  match d with
  | Weekday.sunday    => 1
  | Weekday.monday    => 2
  | Weekday.tuesday   => 3
  | Weekday.wednesday => 4
  | Weekday.thursday  => 5
  | Weekday.friday    => 6
  | Weekday.saturday  => 7

#eval natOfWeekday Weekday.tuesday -- 3

It is often useful to group definitions related to a type in a namespace with the same name. For example, we can put the function above into the Weekday namespace. We are then allowed to use the shorter name when we open the namespace.

In the following example, we define functions from Weekday to Weekday in the namespace Weekday.

inductive Weekday where
 | sunday    : Weekday
 | monday    : Weekday
 | tuesday   : Weekday
 | wednesday : Weekday
 | thursday  : Weekday
 | friday    : Weekday
 | saturday  : Weekday
namespace Weekday

def next (d : Weekday) : Weekday :=
  match d with
  | sunday    => monday
  | monday    => tuesday
  | tuesday   => wednesday
  | wednesday => thursday
  | thursday  => friday
  | friday    => saturday
  | saturday  => sunday

end Weekday

It is so common to start a definition with a match in Lean, that Lean provides a syntax sugar for it.

inductive Weekday where
 | sunday    : Weekday
 | monday    : Weekday
 | tuesday   : Weekday
 | wednesday : Weekday
 | thursday  : Weekday
 | friday    : Weekday
 | saturday  : Weekday
namespace Weekday
def previous : Weekday -> Weekday
  | sunday    => saturday
  | monday    => sunday
  | tuesday   => monday
  | wednesday => tuesday
  | thursday  => wednesday
  | friday    => thursday
  | saturday  => friday
end Weekday

We can use the command #eval to test our definitions.

inductive Weekday where
 | sunday    : Weekday
 | monday    : Weekday
 | tuesday   : Weekday
 | wednesday : Weekday
 | thursday  : Weekday
 | friday    : Weekday
 | saturday  : Weekday
namespace Weekday
def next (d : Weekday) : Weekday :=
 match d with
 | sunday    => monday
 | monday    => tuesday
 | tuesday   => wednesday
 | wednesday => thursday
 | thursday  => friday
 | friday    => saturday
 | saturday  => sunday
def previous : Weekday -> Weekday
 | sunday    => saturday
 | monday    => sunday
 | tuesday   => monday
 | wednesday => tuesday
 | thursday  => wednesday
 | friday    => thursday
 | saturday  => friday
def toString : Weekday -> String
  | sunday    => "Sunday"
  | monday    => "Monday"
  | tuesday   => "Tuesday"
  | wednesday => "Wednesday"
  | thursday  => "Thursday"
  | friday    => "Friday"
  | saturday  => "Saturday"

#eval toString (next sunday)             -- "Monday"
#eval toString (next tuesday)            -- "Wednesday"
#eval toString (previous wednesday)      -- "Tuesday"
#eval toString (next (previous sunday))  -- "Sunday"
#eval toString (next (previous monday))  -- "Monday"
-- ..
end Weekday

We can now prove the general theorem that next (previous d) = d for any weekday d. The idea is to perform a proof by cases using match, and rely on the fact for each constructor both sides of the equality reduce to the same term.

inductive Weekday where
 | sunday    : Weekday
 | monday    : Weekday
 | tuesday   : Weekday
 | wednesday : Weekday
 | thursday  : Weekday
 | friday    : Weekday
 | saturday  : Weekday
namespace Weekday
def next (d : Weekday) : Weekday :=
 match d with
 | sunday    => monday
 | monday    => tuesday
 | tuesday   => wednesday
 | wednesday => thursday
 | thursday  => friday
 | friday    => saturday
 | saturday  => sunday
def previous : Weekday -> Weekday
 | sunday    => saturday
 | monday    => sunday
 | tuesday   => monday
 | wednesday => tuesday
 | thursday  => wednesday
 | friday    => thursday
 | saturday  => friday
theorem nextOfPrevious (d : Weekday) : next (previous d) = d :=
  match d with
  | sunday    => rfl
  | monday    => rfl
  | tuesday   => rfl
  | wednesday => rfl
  | thursday  => rfl
  | friday    => rfl
  | saturday  => rfl
end Weekday

Inductive Types

Theorem Proving in Lean has a chapter about inductive datatypes.

Structures

Structure is a special case of inductive datatype. It has only one constructor and is not recursive. Similar to the inductive command, the structure command introduces a namespace with the same name. The general form is as follows:

structure <name> <parameters> <parent-structures> where
  <constructor-name> :: <fields>

Most parts are optional. Here is our first example.

structure Point (α : Type u) where
  x : α
  y : α

In the example above, the constructor name is not provided. So, the constructor is named mk by Lean. Values of type Point are created using Point.mk a b or { x := a, y := b : Point α }. The latter can be written as { x := a, y := b } when the expected type is known. The fields of a point p are accessed using Point.x p and Point.y p. You can also the more compact notation p.x and p.y as a shorthand for Point.x p and Point.y p.

structure Point (α : Type u) where
 x : α
 y : α
#check Point
#check Point       -- Type u -> Type u
#check @Point.mk   -- {α : Type u} → α → α → Point α
#check @Point.x    -- {α : Type u} → Point α → α
#check @Point.y    -- {α : Type u} → Point α → α

#check Point.mk 10 20                   -- Point Nat
#check { x := 10, y := 20 : Point Nat } -- Point Nat

def mkPoint (a : Nat) : Point Nat :=
  { x := a, y := a }

#eval (Point.mk 10 20).x                   -- 10
#eval (Point.mk 10 20).y                   -- 20
#eval { x := 10, y := 20 : Point Nat }.x   -- 10
#eval { x := 10, y := 20 : Point Nat }.y   -- 20

def addXY (p : Point Nat) : Nat :=
  p.x + p.y

#eval addXY { x := 10, y := 20 }    -- 30

In the notation { ... }, if the fields are in different lines, the , is optional.

structure Point (α : Type u) where
 x : α
 y : α
def mkPoint (a : Nat) : Point Nat := {
  x := a
  y := a
}

You can also use where instead of := { ... }.

structure Point (α : Type u) where
 x : α
 y : α
def mkPoint (a : Nat) : Point Nat where
  x := a
  y := a

Here are some simple theorems about our Point type.

structure Point (α : Type u) where
 x : α
 y : α
theorem ex1 (a b : α) : (Point.mk a b).x = a :=
  rfl

theorem ex2 (a b : α) : (Point.mk a b).y = b :=
  rfl

theorem ex3 (a b : α) : Point.mk a b = { x := a, y := b } :=
  rfl

The dot notation is convenient not just for accessing the projections of a structure, but also for applying functions defined in a namespace with the same name. If p has type Point, the expression p.foo is interpreted as Point.foo p, assuming that the first argument to foo has type Point. The expression p.add q is therefore shorthand for Point.add p q in the example below.

structure Point (α : Type u) where
  x : α
  y : α

def Point.add (p q : Point Nat) : Point Nat :=
  { x := p.x + q.x, y := p.y + q.y }

def p : Point Nat := Point.mk 1 2
def q : Point Nat := Point.mk 3 4

#eval (p.add q).x  -- 4
#eval (p.add q).y  -- 6

After we introduce type classes, we show how to define a function like add so that it works generically for elements of Point α rather than just Point Nat, assuming α has an associated addition operation.

More generally, given an expression p.foo x y z, Lean will insert p at the first argument to foo of type Point. For example, with the definition of scalar multiplication below, p.smul 3 is interpreted as Point.smul 3 p.

structure Point (α : Type u) where
  x : α
  y : α

def Point.smul (n : Nat) (p : Point Nat) :=
  Point.mk (n * p.x) (n * p.y)

def p : Point Nat :=
  Point.mk 1 2

#eval (p.smul 3).x -- 3
#eval (p.smul 3).y -- 6

Inheritance

We can extend existing structures by adding new fields. This feature allows us to simulate a form of inheritance.

structure Point (α : Type u) where
  x : α
  y : α

inductive Color where
  | red
  | green
  | blue

structure ColorPoint (α : Type u) extends Point α where
  color : Color

#check { x := 10, y := 20, color := Color.red : ColorPoint Nat }
-- { toPoint := { x := 10, y := 20 }, color := Color.red }

The output for the check command above suggests how Lean encoded inheritance and multiple inheritance. Lean uses fields to each parent structure.

structure Foo where
  x : Nat
  y : Nat

structure Boo where
  w : Nat
  z : Nat

structure Bla extends Foo, Boo where
  bit : Bool

#check Bla.mk -- Foo → Boo → Bool → Bla
#check Bla.mk { x := 10, y := 20 } { w := 30, z := 40 } true
#check { x := 10, y := 20, w := 30, z := 40, bit := true : Bla }
#check { toFoo := { x := 10, y := 20 },
         toBoo := { w := 30, z := 40 },
         bit := true : Bla }

theorem ex :
    Bla.mk { x := x, y := y } { w := w, z := z } b
    =
    { x := x, y := y, w := w, z := z, bit := b } :=
  rfl

Default field values

You can assign default value to fields when declaring a new structure.

inductive MessageSeverity
  | error | warning

structure Message where
  fileName : String
  pos      : Option Nat      := none
  severity : MessageSeverity := MessageSeverity.error
  caption  : String          := ""
  data     : String

def msg1 : Message :=
  { fileName := "foo.lean", data := "failed to import file" }

#eval msg1.pos      -- none
#eval msg1.fileName -- "foo.lean"
#eval msg1.caption  -- ""

When extending a structure, you can not only add new fields, but provide new default values for existing fields.

inductive MessageSeverity
 | error | warning
structure Message where
 fileName : String
 pos      : Option Nat      := none
 severity : MessageSeverity := MessageSeverity.error
 caption  : String          := ""
 data     : String
structure MessageExt extends Message where
  timestamp : Nat
  caption   := "extended" -- new default value for field `caption`

def msg2 : MessageExt where
  fileName  := "bar.lean"
  data      := "error at initialization"
  timestamp := 10

#eval msg2.fileName  -- "bar.lean"
#eval msg2.timestamp -- 10
#eval msg2.caption   -- "extended"

Updating structure fields

Structure fields can be updated using { <struct-val> with <field> := <new-value>, ... }:

structure Point (α : Type u) where
 x : α
 y : α
def incrementX (p : Point Nat) : Point Nat := { p with x := p.x + 1 }

Type classes

Typeclasses were introduced as a principled way of enabling ad-hoc polymorphism in functional programming languages. We first observe that it would be easy to implement an ad-hoc polymorphic function (such as addition) if the function simply took the type-specific implementation of addition as an argument and then called that implementation on the remaining arguments. For example, suppose we declare a structure in Lean to hold implementations of addition

namespace Ex
structure Add (a : Type) where
  add : a -> a -> a

#check @Add.add
-- Add.add : {a : Type} → Add a → a → a → a
end Ex

In the above Lean code, the field add has type Add.add : {α : Type} → Add α → α → α → α where the curly braces around the type a mean that it is an implicit argument. We could implement double by

namespace Ex
structure Add (a : Type) where
 add : a -> a -> a
def double (s : Add a) (x : a) : a :=
  s.add x x

#eval double { add := Nat.add } 10
-- 20

#eval double { add := Nat.mul } 10
-- 100

#eval double { add := Int.add } 10
-- 20

end Ex

Note that you can double a natural number n by double { add := Nat.add } n. Of course, it would be highly cumbersome for users to manually pass the implementations around in this way. Indeed, it would defeat most of the potential benefits of ad-hoc polymorphism.

The main idea behind typeclasses is to make arguments such as Add a implicit, and to use a database of user-defined instances to synthesize the desired instances automatically through a process known as typeclass resolution. In Lean, by changing structure to class in the example above, the type of Add.add becomes

namespace Ex
class Add (a : Type) where
  add : a -> a -> a

#check @Add.add
-- Add.add : {a : Type} → [self : Add a] → a → a → a
end Ex

where the square brackets indicate that the argument of type Add a is instance implicit, i.e. that it should be synthesized using typeclass resolution. This version of add is the Lean analogue of the Haskell term add :: Add a => a -> a -> a. Similarly, we can register an instance by

namespace Ex
class Add (a : Type) where
 add : a -> a -> a
instance : Add Nat where
  add := Nat.add

end Ex

Then for n : Nat and m : Nat, the term Add.add n m triggers typeclass resolution with the goal of Add Nat, and typeclass resolution will synthesize the instance above. In general, instances may depend on other instances in complicated ways. For example, you can declare an (anonymous) instance stating that if a has addition, then Array a has addition:

instance [Add a] : Add (Array a) where
  add x y := Array.zipWith x y (· + ·)

#eval Add.add #[1, 2] #[3, 4]
-- #[4, 6]

#eval #[1, 2] + #[3, 4]
-- #[4, 6]

Note that x + y is notation for Add.add x y in Lean.

The example above demonstrates how type classes are used to overload notation. Now, we explore another application. We often need an arbitrary element of a given type. Recall that types may not have any elements in Lean. It often happens that we would like a definition to return an arbitrary element in a "corner case." For example, we may like the expression head xs to be of type a when xs is of type List a. Similarly, many theorems hold under the additional assumption that a type is not empty. For example, if a is a type, exists x : a, x = x is true only if a is not empty. The standard library defines a type class Inhabited to enable type class inference to infer a "default" or "arbitrary" element of an inhabited type. Let us start with the first step of the program above, declaring an appropriate class:

namespace Ex
class Inhabited (a : Sort u) where
  default : a

#check @Inhabited.default
-- Inhabited.default : {a : Sort u} → [self : Inhabited a] → a
end Ex

Note Inhabited.default doesn't have any explicit argument.

An element of the class Inhabited a is simply an expression of the form Inhabited.mk x, for some element x : a. The projection Inhabited.default will allow us to "extract" such an element of a from an element of Inhabited a. Now we populate the class with some instances:

namespace Ex
class Inhabited (a : Sort _) where
 default : a
instance : Inhabited Bool where
  default := true

instance : Inhabited Nat where
  default := 0

instance : Inhabited Unit where
  default := ()

instance : Inhabited Prop where
  default := True

#eval (Inhabited.default : Nat)
-- 0

#eval (Inhabited.default : Bool)
-- true
end Ex

You can use the command export to create the alias default for Inhabited.default

namespace Ex
class Inhabited (a : Sort _) where
 default : a
instance : Inhabited Bool where
 default := true
instance : Inhabited Nat where
 default := 0
instance : Inhabited Unit where
 default := ()
instance : Inhabited Prop where
 default := True
export Inhabited (default)

#eval (default : Nat)
-- 0

#eval (default : Bool)
-- true
end Ex

Chaining Instances

If that were the extent of type class inference, it would not be all that impressive; it would be simply a mechanism of storing a list of instances for the elaborator to find in a lookup table. What makes type class inference powerful is that one can chain instances. That is, an instance declaration can in turn depend on an implicit instance of a type class. This causes class inference to chain through instances recursively, backtracking when necessary, in a Prolog-like search.

For example, the following definition shows that if two types a and b are inhabited, then so is their product:

instance [Inhabited a] [Inhabited b] : Inhabited (a × b) where
  default := (default, default)

With this added to the earlier instance declarations, type class instance can infer, for example, a default element of Nat × Bool:

namespace Ex
class Inhabited (a : Sort u) where
 default : a
instance : Inhabited Bool where
 default := true
instance : Inhabited Nat where
 default := 0
opaque default [Inhabited a] : a :=
 Inhabited.default
instance [Inhabited a] [Inhabited b] : Inhabited (a × b) where
  default := (default, default)

#eval (default : Nat × Bool)
-- (0, true)
end Ex

Similarly, we can inhabit type function with suitable constant functions:

namespace Ex
class Inhabited (a : Sort u) where
 default : a
opaque default [Inhabited a] : a :=
 Inhabited.default
instance [Inhabited b] : Inhabited (a -> b) where
  default := fun _ => default
end Ex

As an exercise, try defining default instances for other types, such as List and Sum types.

The Lean standard library contains the definition inferInstance. It has type {α : Sort u} → [i : α] → α, and is useful for triggering the type class resolution procedure when the expected type is an instance.

#check (inferInstance : Inhabited Nat) -- Inhabited Nat

def foo : Inhabited (Nat × Nat) :=
  inferInstance

theorem ex : foo.default = (default, default) :=
  rfl

You can use the command #print to inspect how simple inferInstance is.

#print inferInstance

ToString

The polymorphic method toString has type {α : Type u} → [ToString α] → α → String. You implement the instance for your own types and use chaining to convert complex values into strings. Lean comes with ToString instances for most builtin types.

structure Person where
  name : String
  age  : Nat

instance : ToString Person where
  toString p := p.name ++ "@" ++ toString p.age

#eval toString { name := "Leo", age := 542 : Person }
#eval toString ({ name := "Daniel", age := 18 : Person }, "hello")

Numerals

Numerals are polymorphic in Lean. You can use a numeral (e.g., 2) to denote an element of any type that implements the type class OfNat.

structure Rational where
  num : Int
  den : Nat
  inv : den ≠ 0

instance : OfNat Rational n where
  ofNat := { num := n, den := 1, inv := by decide }

instance : ToString Rational where
  toString r := s!"{r.num}/{r.den}"

#eval (2 : Rational) -- 2/1

#check (2 : Rational) -- Rational
#check (2 : Nat)      -- Nat

Lean elaborate the terms (2 : Nat) and (2 : Rational) as OfNat.ofNat Nat 2 (instOfNatNat 2) and OfNat.ofNat Rational 2 (instOfNatRational 2) respectively. We say the numerals 2 occurring in the elaborated terms are raw natural numbers. You can input the raw natural number 2 using the macro nat_lit 2.

#check nat_lit 2  -- Nat

Raw natural numbers are not polymorphic.

The OfNat instance is parametric on the numeral. So, you can define instances for particular numerals. The second argument is often a variable as in the example above, or a raw natural number.

class Monoid (α : Type u) where
  unit : α
  op   : α → α → α

instance [s : Monoid α] : OfNat α (nat_lit 1) where
  ofNat := s.unit

def getUnit [Monoid α] : α :=
  1

Because many users were forgetting the nat_lit when defining OfNat instances, Lean also accepts OfNat instance declarations not using nat_lit. Thus, the following is also accepted.

class Monoid (α : Type u) where
  unit : α
  op   : α → α → α

instance [s : Monoid α] : OfNat α 1 where
  ofNat := s.unit

def getUnit [Monoid α] : α :=
  1

Output parameters

By default, Lean only tries to synthesize an instance Inhabited T when the term T is known and does not contain missing parts. The following command produces the error "failed to create type class instance for Inhabited (Nat × ?m.1499)" because the type has a missing part (i.e., the _).

-- FIXME: should fail
#check (inferInstance : Inhabited (Nat × _))

You can view the parameter of the type class Inhabited as an input value for the type class synthesizer. When a type class has multiple parameters, you can mark some of them as output parameters. Lean will start type class synthesizer even when these parameters have missing parts. In the following example, we use output parameters to define a heterogeneous polymorphic multiplication.

namespace Ex
class HMul (α : Type u) (β : Type v) (γ : outParam (Type w)) where
  hMul : α → β → γ

export HMul (hMul)

instance : HMul Nat Nat Nat where
  hMul := Nat.mul

instance : HMul Nat (Array Nat) (Array Nat) where
  hMul a bs := bs.map (fun b => hMul a b)

#eval hMul 4 3           -- 12
#eval hMul 4 #[2, 3, 4]  -- #[8, 12, 16]
end Ex

The parameters α and β are considered input parameters and γ an output one. Given an application hMul a b, after types of a and b are known, the type class synthesizer is invoked, and the resulting type is obtained from the output parameter γ. In the example above, we defined two instances. The first one is the homogeneous multiplication for natural numbers. The second is the scalar multiplication for arrays. Note that, you chain instances and generalize the second instance.

namespace Ex
class HMul (α : Type u) (β : Type v) (γ : outParam (Type w)) where
  hMul : α → β → γ

export HMul (hMul)

instance : HMul Nat Nat Nat where
  hMul := Nat.mul

instance : HMul Int Int Int where
  hMul := Int.mul

instance [HMul α β γ] : HMul α (Array β) (Array γ) where
  hMul a bs := bs.map (fun b => hMul a b)

#eval hMul 4 3                    -- 12
#eval hMul 4 #[2, 3, 4]           -- #[8, 12, 16]
#eval hMul (-2) #[3, -1, 4]       -- #[-6, 2, -8]
#eval hMul 2 #[#[2, 3], #[0, 4]]  -- #[#[4, 6], #[0, 8]]
end Ex

You can use our new scalar array multiplication instance on arrays of type Array β with a scalar of type α whenever you have an instance HMul α β γ. In the last #eval, note that the instance was used twice on an array of arrays.

Default instances

In the class HMul, the parameters α and β are treated as input values. Thus, type class synthesis only starts after these two types are known. This may often be too restrictive.

namespace Ex
class HMul (α : Type u) (β : Type v) (γ : outParam (Type w)) where
  hMul : α → β → γ

export HMul (hMul)

instance : HMul Int Int Int where
  hMul := Int.mul

def xs : List Int := [1, 2, 3]

-- TODO: fix error message
-- Error "failed to create type class instance for HMul Int ?m.1767 (?m.1797 x)"
-- #check fun y => xs.map (fun x => hMul x y)
end Ex

The instance HMul is not synthesized by Lean because the type of y has not been provided. However, it is natural to assume that the type of y and x should be the same in this kind of situation. We can achieve exactly that using default instances.

namespace Ex
class HMul (α : Type u) (β : Type v) (γ : outParam (Type w)) where
  hMul : α → β → γ

export HMul (hMul)

@[default_instance]
instance : HMul Int Int Int where
  hMul := Int.mul

def xs : List Int := [1, 2, 3]

#check fun y => xs.map (fun x => hMul x y)  -- Int -> List Int
end Ex

By tagging the instance above with the attribute default_instance, we are instructing Lean to use this instance on pending type class synthesis problems. The actual Lean implementation defines homogeneous and heterogeneous classes for arithmetical operators. Moreover, a+b, a*b, a-b, a/b, and a%b are notations for the heterogeneous versions. The instance OfNat Nat n is the default instance (with priority 100) for the OfNat class. This is why the numeral 2 has type Nat when the expected type is not known. You can define default instances with higher priority to override the builtin ones.

structure Rational where
  num : Int
  den : Nat
  inv : den ≠ 0

@[default_instance 200]
instance : OfNat Rational n where
  ofNat := { num := n, den := 1, inv := by decide }

instance : ToString Rational where
  toString r := s!"{r.num}/{r.den}"

#check 2 -- Rational

Priorities are also useful to control the interaction between different default instances. For example, suppose xs has type α, when elaboration xs.map (fun x => 2 * x), we want the homogeneous instance for multiplication to have higher priority than the default instance for OfNat. This is particularly important when we have implemented only the instance HMul α α α, and did not implement HMul Nat α α. Now, we reveal how the notation a*b is defined in Lean.

namespace Ex
class OfNat (α : Type u) (n : Nat) where
  ofNat : α

@[default_instance]
instance (n : Nat) : OfNat Nat n where
  ofNat := n

class HMul (α : Type u) (β : Type v) (γ : outParam (Type w)) where
  hMul : α → β → γ

class Mul (α : Type u) where
  mul : α → α → α

@[default_instance 10]
instance [Mul α] : HMul α α α where
  hMul a b := Mul.mul a b

infixl:70 " * "  => HMul.hMul
end Ex

The Mul class is convenient for types that only implement the homogeneous multiplication.

Scoped Instances

TODO

Local Instances

TODO

Unification Hints

TODO

Builtin Types

Numeric Operations

Lean supports the basic mathematical operations you’d expect for all of the number types: addition, subtraction, multiplication, division, and remainder. The following code shows how you’d use each one in a def commands:

-- addition
def sum := 5 + 10

-- subtraction
def difference := 95.5 - 4.3

-- multiplication
def product := 4 * 30

-- division
def quotient := 53.7 / 32.2

-- remainder/modulo
def modulo := 43 % 5

Each expression in these statements uses a mathematical operator and evaluates to a single value.

Natural numbers

The Nat type represents the natural numbers, i.e., arbitrary-precision unsigned integers. There are no overflows.

#eval 100000000000000000 * 200000000000000000000 * 1000000000000000000000

A numeral is considered to be a Nat if there are no typing constraints.

#check 10    -- Nat
#check id 10 -- Nat

def f (x : Int) : Int :=
  x - 1

#eval f (3 - 5) -- 3 and 5 are `Int` since `f` expects an `Int`.
-- -3

The operator - for Nat implements truncated subtraction.

#eval 10 - 5 -- 5
#eval 5 - 10 -- 0

theorem ex : 5 - 10 = 0 :=
  rfl

#eval (5:Int) - 10 -- -5

The operator / for Nat implements Euclidean division.

#eval 10 / 4 -- 2

#check 10.0 / 4.0 -- Float
#eval 10.0 / 4.0  -- 2.5

As we described in the previous sections, we define the Nat type as an inductive datatype.

namespace hidden
inductive Nat where
  | zero : Nat
  | succ : Nat → Nat
end hidden

However, the internal representation of Nat is optimized. Small natural numbers (i.e., < 2^63 in a 64-bit machine) are represented by a single machine word. Big numbers are implemented using GMP numbers. We recommend you use fixed precision numeric types only in performance critical code.

The Lean kernel has builtin support for the Nat type too, and can efficiently reduce Nat expressions during type checking.

#reduce 100000000000000000 * 200000000000000000000 * 1000000000000000000000

theorem ex
      : 1000000000000000 * 2000000000000000000 = 2000000000000000000000000000000000 :=
  rfl

The sharp-eyed reader will notice that GMP is part of the Lean kernel trusted code base. We believe this is not a problem because you can use external type checkers to double-check your developments, and we consider GMP very trustworthy. Existing external type checkers for Lean 3 (e.g., Trepplein and TC) can be easily adapted to Lean 4. If you are still concerned after checking your development with multiple different external checkers because they may all rely on buggy arbitrary-precision libraries, you can develop your own certified arbitrary-precision library and use it to implement your own type checker for Lean.

Integers

The Int type represents the arbitrary-precision integers. There are no overflows.

#eval (100000000000000000 : Int) * 200000000000000000000 * 1000000000000000000000

Recall that nonnegative numerals are considered to be a Nat if there are no typing constraints.

#check 1 -- Nat
#check -1 -- Int
#check (1:Int) -- Int

The operator / for Int implements integer division.

#eval -10 / 4 -- -3

Similar to Nat, the internal representation of Int is optimized. Small integers are represented by a single machine word. Big integers are implemented using GMP numbers. We recommend you use fixed precision numeric types only in performance critical code.

The Lean kernel does not have special support for reducing Int during type checking. However, since Int is defined as

namespace hidden
inductive Int : Type where
  | ofNat   : Nat → Int
  | negSucc : Nat → Int
end hidden

the type checker will be able reduce Int expressions efficiently by relying on the special support for Nat.

theorem ex : -2000000000 * 1000000000 = -2000000000000000000 :=
 rfl

Fixed precision unsigned integers

Float

Arrays

The Array type implements a dynamic (aka growable) array. It is defined as

namespace hidden
structure Array (α : Type u) where
  data : List α
end hidden

but its execution time representation is optimized, and it is similar to C++ std::vector<T> and Rust Vec<T>. The Lean type checker has no special support for reducing Arrays.

You can create arrays in several ways. You can create a small array by listing consecutive values between #[ and ] and separated by commas, as shown in the following examples.

#check #[1, 2, 3] -- Array Nat

#check #[] -- Array ?m

The type of the array elements is inferred from the literals used and must be consistent.

#check #["hello", "world"] -- Array String

-- The following is not valid
#check_failure #[10, "hello"]

Recall that the command #check_failure <term> only succeeds when the given term is not type correct.

To create an array of size n in which all the elements are initialized to some value a, use mkArray.

#eval mkArray 5 'a'
-- #['a', 'a', 'a', 'a', 'a']

Accessing elements

You can access array elements by using brackets ([ and ]).

def f (a : Array Nat) (i : Fin a.size) :=
  a[i] + a[i]

Note that the index i has type Fin a.size, i.e., it is natural number less than a.size. You can also write

def f (a : Array Nat) (i : Nat) (h  : i < a.size) :=
  a[i] + a[i]

The bracket operator is whitespace sensitive.

def f (xs : List Nat) : List Nat :=
  xs ++ xs

def as : Array Nat :=
  #[1, 2, 3, 4]

def idx : Fin 4 :=
  2

#eval f [1, 2, 3] -- This is a function application

#eval as[idx] -- This is an array access

The notation a[i] has two variants: a[i]! and a[i]?. In both cases, i has type Nat. The first one produces a panic error message if the index i is out of bounds. The latter returns an Option type.

#eval #['a', 'b', 'c'][1]?
-- some 'b'
#eval #['a', 'b', 'c'][5]?
-- none
#eval #['a', 'b', 'c'][1]!
-- 'b!

List

Characters

A value of type Char, also known as a character, is a Unicode scalar value. It is represented using an unsigned 32-bit integer and is statically guaranteed to be a valid Unicode scalar value.

Syntactically, character literals are enclosed in single quotes.

#eval 'a' -- 'a'
#eval '∀' -- '∀'

Characters are ordered and can be decidably compared using the relational operators =, <, , >, .

Strings

Option

Thunks, Tasks, and Threads

A Thunk is defined as

namespace Ex
universe u
structure Thunk (α : Type u) : Type u where
  fn : Unit → α
end Ex

A Thunk encapsulates a computation without evaluation. That is, a Thunk stores the way of how the value would be computed. The Lean runtime has special support for Thunks. It caches their values after they are computed for the first time. This feature is useful for implementing data structures such as lazy lists. Here is a small example using a Thunk.

def fib : Nat → Nat
  | 0   => 0
  | 1   => 1
  | x+2 => fib (x+1) + fib x

def f (c : Bool) (x : Thunk Nat) : Nat :=
  if c then
    x.get
  else
    0

def g (c : Bool) (x : Nat) : Nat :=
  f c (Thunk.mk (fun _ => fib x))

#eval g false 1000

The function f above uses x.get to evaluate the Thunk x. The expression Thunk.mk (fun _ => fib x) creates a Thunk for computing fib x. Note that fib is a very naive function for computing the Fibonacci numbers, and it would an unreasonable amount of time to compute fib 1000. However, our test terminates instantaneously because the Thunk is not evaluated when c is false. Lean has a builtin coercion from any type a to Thunk a. You write the function g above as

def fib : Nat → Nat
 | 0   => 0
 | 1   => 1
 | x+2 => fib (x+1) + fib x
def f (c : Bool) (x : Thunk Nat) : Nat :=
 if c then
   x.get
 else
   0
def g (c : Bool) (x : Nat) : Nat :=
  f c (fib x)

#eval g false 1000

In the following example, we use the macro dbg_trace to demonstrate that the Lean runtime caches the value computed by a Thunk. We remark that the macro dbg_trace should be used for debugging purposes only.

def add1 (x : Nat) : Nat :=
  dbg_trace "add1: {x}"
  x + 1

def double (x : Thunk Nat) : Nat :=
  x.get + x.get

def triple (x : Thunk Nat) : Nat :=
  double x + x.get

def test (x : Nat) : Nat :=
  triple (add1 x)

#eval test 2
-- add1: 2
-- 9

Note that the message add1: 2 is printed only once. Now, consider the same example using Unit -> Nat instead of Thunk Nat.

def add1 (x : Nat) : Nat :=
  dbg_trace "add1: {x}"
  x + 1

def double (x : Unit -> Nat) : Nat :=
  x () + x ()

def triple (x : Unit -> Nat) : Nat :=
  double x + x ()

def test (x : Nat) : Nat :=
  triple (fun _ => add1 x)

#eval test 2
-- add1: 2
-- add1: 2
-- 9

Now, the message add1: 2 is printed twice. It may come as a surprise that it was printed twice instead of three times. As we pointed out, dbg_trace is a macro used for debugging purposes only, and add1 is still considered to be a pure function. The Lean compiler performs common subexpression elimination when compiling double, and the produced code for double executes x () only once instead of twice. This transformation is safe because x : Unit -> Nat is pure.

Task

Functions

Functions are the fundamental unit of program execution in any programming language. As in other languages, a Lean function has a name, can have parameters and take arguments, and has a body. Lean also supports functional programming constructs such as treating functions as values, using unnamed functions in expressions, composition of functions to form new functions, curried functions, and the implicit definition of functions by way of the partial application of function arguments.

You define functions by using the def keyword followed by its name, a parameter list, return type and its body. The parameter list consists of successive parameters that are separated by spaces. You can specify an explicit type for each parameter. If you do not specify a specific argument type, the compiler tries to infer the type from the function body. An error is returned when it cannot be inferred. The expression that makes up the function body is typically a compound expression consisting of a number of expressions that culminate in a final expression that is the return value. The return type is a colon followed by a type and is optional. If you do not specify the type of the return value explicitly, the compiler tries to determine the return type from the final expression.

def f x := x + 1

In the previous example, the function name is f, the argument is x, which has type Nat, the function body is x + 1, and the return value is of type Nat. The following example defines the factorial recursive function using pattern matching.

def fact x :=
  match x with
  | 0   => 1
  | n+1 => (n+1) * fact n

#eval fact 100

By default, Lean only accepts total functions. The partial keyword may be used to define a recursive function without a termination proof; partial functions compute in compiled programs, but are opaque in proofs and during type checking.

partial def g (x : Nat) (p : Nat -> Bool) : Nat :=
  if p x then
    x
  else
    g (x+1) p

#eval g 0 (fun x => x > 10)

In the previous example, g x p only terminates if there is a y >= x such that p y returns true. Of course, g 0 (fun x => false) never terminates.

However, the use of partial is restricted to functions whose return type is not empty so the soundness of the system is not compromised.

partial def loop? : α := -- failed to compile partial definition 'loop?', failed to
  loop?                  -- show that type is inhabited and non empty

partial def loop [Inhabited α] : α := -- compiles
  loop

example : True := -- accepted
  loop

example : False :=
  loop -- failed to synthesize instance Inhabited False

If we were able to partially define loop?, we could prove False with it.

Lambda expressions

A lambda expression is an unnamed function. You define lambda expressions by using the fun keyword. A lambda expression resembles a function definition, except that instead of the := token, the => token is used to separate the argument list from the function body. As in a regular function definition, the argument types can be inferred or specified explicitly, and the return type of the lambda expression is inferred from the type of the last expression in the body.

def twice (f : Nat -> Nat) (x : Nat) : Nat :=
  f (f x)

#eval twice (fun x => x + 1) 3
#eval twice (fun (x : Nat) => x * 2) 3

#eval List.map (fun x => x + 1) [1, 2, 3]
-- [2, 3, 4]

#eval List.map (fun (x, y) => x + y) [(1, 2), (3, 4)]
-- [3, 7]

Syntax sugar for simple lambda expressions

Simple functions can be defined using parentheses and · as a placeholder.

#check (· + 1)
-- fun a => a + 1
#check (2 - ·)
-- fun a => 2 - a
#eval [1, 2, 3, 4, 5].foldl (· * ·) 1
-- 120

def h (x y z : Nat) :=
  x + y + z

#check (h · 1 ·)
-- fun a b => h a 1 b

#eval [(1, 2), (3, 4), (5, 6)].map (·.1)
-- [1, 3, 5]

In the previous example, the term (·.1) is syntax sugar for fun x => x.1.

Pipelining

Pipelining enables function calls to be chained together as successive operations. Pipelining works as follows:

def add1 x := x + 1
def times2 x := x * 2

#eval times2 (add1 100)
#eval 100 |> add1 |> times2
#eval times2 <| add1 <| 100

The result of the previous #eval commands is 202. The forward pipeline |> operator takes a function and an argument and return a value. In contrast, the backward pipeline <| operator takes an argument and a function and returns a value. These operators are useful for minimizing the number of parentheses.

def add1Times3FilterEven (xs : List Nat) :=
  List.filter (· % 2 == 0) (List.map (· * 3) (List.map (· + 1) xs))

#eval add1Times3FilterEven [1, 2, 3, 4]
-- [6, 12]

-- Define the same function using pipes
def add1Times3FilterEven' (xs : List Nat) :=
  xs |> List.map (· + 1) |> List.map (· * 3) |> List.filter (· % 2 == 0)

#eval add1Times3FilterEven' [1, 2, 3, 4]
-- [6, 12]

Lean also supports the operator |>. which combines forward pipeline |> operator with the . field notation.

-- Define the same function using pipes
def add1Times3FilterEven'' (xs : List Nat) :=
  xs.map (· + 1) |>.map (· * 3) |>.filter (· % 2 == 0)

#eval add1Times3FilterEven'' [1, 2, 3, 4]
-- [6, 12]

For users familiar with the Haskell programming language, Lean also supports the notation f $ a for the backward pipeline f <| a.

Monads

Monads are used heavily in Lean, as they are also in Haskell. Monads come from the wonderful world of Category Theory.

Monads in Lean are so similar to Haskell that this introduction to monads is heavily based on the similar chapter of the Monday Morning Haskell. Many thanks to the authors of that material for allowing us to reuse it here.

Monads build on the following fundamental type classes which you will need to understand first before fully understanding monads. Shown in light blue are some concrete functors and monads that will also be covered in this chapter:

image

This chapter is organized to give you a bottom up introduction to monads, starting with functors and applicative functors, you'll get an intuition for how these abstract structures work in Lean. Then you'll dive into monads and learn how to use some of the most useful built-in ones.

Functor

A functor is a type class that provides a map function and the map function is something many people are already familiar with so this should be easy to follow. Here you will see some concrete examples in action with List and Option.

Applicative Functors

Applicatives are a little more difficult to understand than functors, but their functionality can still be summed up in a couple simple functions. Here you will learn how to create an Applicative List and a completely custom Applicative type.

Monads Tutorial

Now that you have an intuition for how abstract structures work, you'll examine some of the problems that functors and applicative functors don't help you solve. Then you'll learn the specifics of how to actually use monads with some examples using the Option monad and the all important IO monad.

Reader Monad

Now that you understand the details of what makes a monadic structure work, in this section, you'll learn about one of the most useful built in monads ReaderM, which gives your programs a global read-only context.

State Monad

This section introduces the StateM monad. This monad allows you to access a particular type that you can both read from and write to. It opens the door to fully stateful programming, allowing you to do many of the things a function programming language supposedly "can't" do.

Except Monad

Similar to the Option monad the Except monad allows you to change the signature of a function so that it can return an ok value or an error and it provides the classic exception handling operations throw/try/catch so that your programs can do monad-based exception handling.

Monad Transformers

Now that you are familiar with all the above monads it is time to answer the question - how you can make them work together? After all, there are definitely times when you need multiple kinds of monadic behavior. This section introduces the concept of monad transformers, which allow you to combine multiple monads into one.

Monad Laws

This section examines what makes a monad a legal monad. You could just implement your monadic type classes any way you want and write "monad" instances, but starting back with functors and applicative functors, you'll learn that all these structures have "laws" that they are expected to obey with respect to their behavior. You can make instances that don't follow these laws. But you do so at your peril, as other programmers will be very confused when they try to use them.

Functor

A Functor is any type that can act as a generic container that allows you to transform the underlying values inside the container using a function, so that the values are all updated, but the structure of the container is the same. This is called "mapping".

A List is one of the most basic examples of a Functor.

A list contains zero or more elements of the same, underlying type. When you map a function over a list, you create a new list with the same number of elements, where each has been transformed by the function:

["1", "2", "3"]
List.map: {α β : Type} → (α → β) → List α → List β
List.map
(λ
x: Nat
x
=>
toString: {α : Type} → [self : ToString α] → α → String
toString
x: Nat
x
) [
1: Nat
1
,
2: Nat
2
,
3: Nat
3
] -- ["1", "2", "3"] -- you can also write this using dot notation on the List object
["1", "2", "3"]
[
1: Nat
1
,
2: Nat
2
,
3: Nat
3
].
map: {α β : Type} → (α → β) → List α → List β
map
(λ
x: Nat
x
=>
toString: {α : Type} → [self : ToString α] → α → String
toString
x: Nat
x
) -- ["1", "2", "3"]

Here we converted a list of natural numbers (Nat) to a list of strings where the lambda function here used toString to do the transformation of each element. Notice that when you apply map the "structure" of the object remains the same, in this case the result is always a List of the same size.

Note that in Lean a lambda function can be written using fun keyword or the unicode symbol λ which you can type in VS code using \la .

List has a specialized version of map defined as follows:

def 
map: {α : Type u_1} → {β : Type u_2} → (α → β) → List α → List β
map
(
f: α → β
f
:
α: Type u_1
α
β: Type u_2
β
) :
List: Type u_1 → Type u_1
List
α: Type u_1
α
List: Type u_2 → Type u_2
List
β: Type u_2
β
| [] =>
[]: List β
[]
|
a: α
a
::
as: List α
as
=>
f: α → β
f
a: α
a
::
map: {α : Type u_1} → {β : Type u_2} → (α → β) → List α → List β
map
f: α → β
f
as: List α
as

This is a very generic map function that can take any function that converts (α → β) and use it to convert List α → List β. Notice the function call f a above, this application of f is producing the converted items for the new list.

Let's look at some more examples:

-- List String → List Nat
[8, 5, 7]
[
"elephant": String
"elephant"
,
"tiger": String
"tiger"
,
"giraffe": String
"giraffe"
].
map: {α β : Type} → (α → β) → List α → List β
map
(fun
s: String
s
=>
s: String
s
.
length: String → Nat
length
) -- [8, 5, 7] -- List Nat → List Float
[1.000000, 8.000000, 27.000000, 64.000000, 125.000000]
[
1: Nat
1
,
2: Nat
2
,
3: Nat
3
,
4: Nat
4
,
5: Nat
5
].
map: {α β : Type} → (α → β) → List α → List β
map
(fun
s: Nat
s
=> (
s: Nat
s
.
toFloat: Nat → Float
toFloat
) ^
3.0: Float
3.0
) -- [1.000000, 8.000000, 27.000000, 64.000000, 125.000000] --- List String → List String
["Chris", "David", "Mark"]
[
"chris": String
"chris"
,
"david": String
"david"
,
"mark": String
"mark"
].
map: {α β : Type} → (α → β) → List α → List β
map
(fun
s: String
s
=>
s: String
s
.
capitalize: String → String
capitalize
) -- ["Chris", "David", "Mark"]

Another example of a functor is the Option type. Option contains a value or nothing and is handy for code that has to deal with optional values, like optional command line arguments.

Remember you can construct an Option using the type constructors some or none:

some 5 : Option Nat
some: {α : Type} → α → Option α
some
5: Nat
5
-- Option Nat
some 5
some: {α : Type} → α → Option α
some
5: Nat
5
-- some 5
some 6
(
some: {α : Type} → α → Option α
some
5: Nat
5
).
map: {α β : Type} → (α → β) → Option α → Option β
map
(fun
x: Nat
x
=>
x: Nat
x
+
1: Nat
1
) -- some 6
some "5"
(
some: {α : Type} → α → Option α
some
5: Nat
5
).
map: {α β : Type} → (α → β) → Option α → Option β
map
(fun
x: Nat
x
=>
toString: {α : Type} → [self : ToString α] → α → String
toString
x: Nat
x
) -- some "5"

Lean also provides a convenient short hand syntax for (fun x => x + 1), namely (· + 1) using the middle dot unicode character which you can type in VS code using \. .

some 20
(
some: {α : Type} → α → Option α
some
4: Nat
4
).
map: {α β : Type} → (α → β) → Option α → Option β
map
(· * 5): Nat → Nat
(· * 5)
-- some 20

The map function preserves the none state of the Option, so again map preserves the structure of the object.

def 
x: Option Nat
x
:
Option: Type → Type
Option
Nat: Type
Nat
:=
none: {α : Type} → Option α
none
none
x: Option Nat
x
.
map: {α β : Type} → (α → β) → Option α → Option β
map
(fun
x: Nat
x
=>
toString: {α : Type} → [self : ToString α] → α → String
toString
x: Nat
x
) -- none
Option.map (fun x => toString x) x : Option String
x: Option Nat
x
.
map: {α β : Type} → (α → β) → Option α → Option β
map
(fun
x: Nat
x
=>
toString: {α : Type} → [self : ToString α] → α → String
toString
x: Nat
x
) -- Option String

Notice that even in the none case it has transformed Option Nat into Option String as you see in the #check command.

How to make a Functor Instance?

The List type is made an official Functor by the following type class instance:

instance: Functor List
instance
:
Functor: (Type u_1 → Type u_1) → Type (u_1 + 1)
Functor
List: Type u_1 → Type u_1
List
where map :=
List.map: {α β : Type u_1} → (α → β) → List α → List β
List.map

Notice all you need to do is provide the map function implementation. For a quick example, let's supposed you create a new type describing the measurements of a home or apartment:

structure 
LivingSpace: Type → Type
LivingSpace
(
α: Type
α
:
Type: Type 1
Type
) where
totalSize: {α : Type} → LivingSpace α → α
totalSize
:
α: Type
α
numBedrooms: {α : Type} → LivingSpace α → Nat
numBedrooms
:
Nat: Type
Nat
masterBedroomSize: {α : Type} → LivingSpace α → α
masterBedroomSize
:
α: Type
α
livingRoomSize: {α : Type} → LivingSpace α → α
livingRoomSize
:
α: Type
α
kitchenSize: {α : Type} → LivingSpace α → α
kitchenSize
:
α: Type
α
deriving
Repr: Type u → Type u
Repr
,
BEq: Type u → Type u
BEq

Now you can construct a LivingSpace in square feet using floating point values:

abbrev 
SquareFeet: Type
SquareFeet
:=
Float: Type
Float
def
mySpace: LivingSpace SquareFeet
mySpace
:
LivingSpace: Type → Type
LivingSpace
SquareFeet: Type
SquareFeet
:= { totalSize :=
1800: SquareFeet
1800
, numBedrooms :=
4: Nat
4
, masterBedroomSize :=
500: SquareFeet
500
, livingRoomSize :=
900: SquareFeet
900
, kitchenSize :=
400: SquareFeet
400
}

Now, suppose you want anyone to be able to map a LivingSpace from one type of measurement unit to another. Then you would provide a Functor instance as follows:

def 
LivingSpace.map: {α β : Type} → (α → β) → LivingSpace α → LivingSpace β
LivingSpace.map
(
f: α → β
f
:
α: Type
α
β: Type
β
) (
s: LivingSpace α
s
:
LivingSpace: Type → Type
LivingSpace
α: Type
α
) :
LivingSpace: Type → Type
LivingSpace
β: Type
β
:= { totalSize :=
f: α → β
f
s: LivingSpace α
s
.
totalSize: {α : Type} → LivingSpace α → α
totalSize
numBedrooms :=
s: LivingSpace α
s
.
numBedrooms: {α : Type} → LivingSpace α → Nat
numBedrooms
masterBedroomSize :=
f: α → β
f
s: LivingSpace α
s
.
masterBedroomSize: {α : Type} → LivingSpace α → α
masterBedroomSize
livingRoomSize :=
f: α → β
f
s: LivingSpace α
s
.
livingRoomSize: {α : Type} → LivingSpace α → α
livingRoomSize
kitchenSize :=
f: α → β
f
s: LivingSpace α
s
.
kitchenSize: {α : Type} → LivingSpace α → α
kitchenSize
}
instance: Functor LivingSpace
instance
:
Functor: (Type → Type) → Type 1
Functor
LivingSpace: Type → Type
LivingSpace
where map :=
LivingSpace.map: {α β : Type} → (α → β) → LivingSpace α → LivingSpace β
LivingSpace.map

Notice this functor instance takes LivingSpace and not the fully qualified type LivingSpace SquareFeet. Notice below that LivingSpace is a function from Type to Type. For example, if you give it type SquareFeet it gives you back the fully qualified type LivingSpace SquareFeet.

LivingSpace (α : Type) : Type
LivingSpace: Type → Type
LivingSpace
-- Type → Type

So the instance : Functor then is operating on the more abstract, or generic LivingSpace saying for the whole family of types LivingSpace α you can map to LivingSpace β using the generic LivingSpace.map map function by simply providing a function that does the more primitive mapping from (f : α → β). So LivingSpace.map is a sort of function applicator. This is called a "higher order function" because it takes a function as input (α → β) and returns another function as output F α → F β.

Notice that LivingSpace.map applies a function f to convert the units of all the LivingSpace fields, except for numBedrooms which is a count (and therefore is not a measurement that needs converting).

So now you can define a simple conversion function, let's say you want square meters instead:

abbrev 
SquareMeters: Type
SquareMeters
:=
Float: Type
Float
def
squareFeetToMeters: SquareFeet → SquareMeters
squareFeetToMeters
(
ft: SquareFeet
ft
:
SquareFeet: Type
SquareFeet
) :
SquareMeters: Type
SquareMeters
:= (
ft: SquareFeet
ft
/
10.7639104: SquareMeters
10.7639104
)

and now bringing it all together you can use the simple function squareFeetToMeters to map mySpace to square meters:

{ totalSize := 167.225472, numBedrooms := 4, masterBedroomSize := 46.451520, livingRoomSize := 83.612736, kitchenSize := 37.161216 }
mySpace: LivingSpace SquareFeet
mySpace
.
map: {α β : Type} → (α → β) → LivingSpace α → LivingSpace β
map
squareFeetToMeters: SquareFeet → SquareMeters
squareFeetToMeters
/- { totalSize := 167.225472, numBedrooms := 4, masterBedroomSize := 46.451520, livingRoomSize := 83.612736, kitchenSize := 37.161216 } -/

Lean also defines custom infix operator <$> for Functor.map which allows you to write this:

[8, 5, 7]
(fun
s: String
s
=>
s: String
s
.
length: String → Nat
length
) <$> [
"elephant": String
"elephant"
,
"tiger": String
"tiger"
,
"giraffe": String
"giraffe"
] -- [8, 5, 7]
some 6
(fun
x: Nat
x
=>
x: Nat
x
+
1: Nat
1
) <$> (
some: {α : Type} → α → Option α
some
5: Nat
5
) -- some 6

Note that the infix operator is left associative which means it binds more tightly to the function on the left than to the expression on the right, this means you can often drop the parentheses on the right like this:

some 6
(fun
x: Nat
x
=>
x: Nat
x
+
1: Nat
1
) <$>
some: {α : Type} → α → Option α
some
5: Nat
5
-- some 6

Note that Lean lets you define your own syntax, so <$> is nothing special. You can define your own infix operator like this:

infixr:100 " doodle " => 
Functor.map: {f : Type u → Type v} → [self : Functor f] → {α β : Type u} → (α → β) → f α → f β
Functor.map
[5, 10, 15]
(· * 5): Nat → Nat
(· * 5)
doodle [
1: Nat
1
,
2: Nat
2
,
3: Nat
3
] -- [5, 10, 15]

Wow, this is pretty powerful. By providing a functor instance on LivingSpace with an implementation of the map function it is now super easy for anyone to come along and transform the units of a LivingSpace using very simple functions like squareFeetToMeters. Notice that squareFeetToMeters knows nothing about LivingSpace.

How do Functors help with Monads ?

Functors are an abstract mathematical structure that is represented in Lean with a type class. The Lean functor defines both map and a special case for working on constants more efficiently called mapConst:

class Functor (f : Type u → Type v) : Type (max (u+1) v) where
  map : {α β : Type u} → (α → β) → f α → f β
  mapConst : {α β : Type u} → α → f β → f α

Note that mapConst has a default implementation, namely: mapConst : {α β : Type u} → α → f β → f α := Function.comp map (Function.const _) in the Functor type class. So you can use this default implementation and you only need to replace it if your functor has a more specialized variant than this (usually the custom version is more performant).

In general then, a functor is a function on types F : Type u → Type v equipped with an operator called map such that if you have a function f of type α → β then map f will convert your container type from F α → F β. This corresponds to the category-theory notion of functor in the special case where the category is the category of types and functions between them.

Understanding abstract mathematical structures is a little tricky for most people. So it helps to start with a simpler idea like functors before you try to understand monads. Building on functors is the next abstraction called Applicatives.

Applicative Functors

Building on Functors is the Applicative Functor. For simplicity, you can refer to these simply as "Applicatives". These are a little tricker than functors, but still simpler than monads. Let's see how they work!

What is an Applicative Functor?

An applicative functor defines a default or "base" construction for an object and allows function application to be chained across multiple instances of the structure. All applicative functors are functors, meaning they must also support the "map" operation.

How are Applicatives represented in Lean?

An applicative functor is an intermediate structure between Functor and Monad. It mainly consists of two operations:

  • pure : α → F α
  • seq : F (α → β) → F α → F β (written as <*>)

The pure operator specifies how you can wrap a normal object α into an instance of this structure F α. This is the "default" mechanism mentioned above.

The seq operator allows you to chain operations by wrapping a function in a structure. The name "applicative" comes from the fact that you "apply" functions from within the structure, rather than simply from outside the structure, as was the case with Functor.map.

Applicative in Lean is built on some helper type classes, Functor, Pure and Seq:

namespace hidden -- hidden
class 
Applicative: (Type u → Type v) → Type (max (u + 1) v)
Applicative
(
f: Type u → Type v
f
:
Type u: Type (u + 1)
Type u
Type v: Type (v + 1)
Type v
) extends
Functor: (Type u → Type v) → Type (max (u + 1) v)
Functor
f: Type u → Type v
f
,
Pure: (Type u → Type v) → Type (max (u + 1) v)
Pure
f: Type u → Type v
f
,
Seq: (Type u → Type v) → Type (max (u + 1) v)
Seq
f: Type u → Type v
f
,
SeqLeft: (Type u → Type v) → Type (max (u + 1) v)
SeqLeft
f: Type u → Type v
f
,
SeqRight: (Type u → Type v) → Type (max (u + 1) v)
SeqRight
f: Type u → Type v
f
where map := fun
x: α✝ → β✝
x
y: f α✝
y
=>
Seq.seq: {f : Type u → Type v} → [self : Seq f] → {α β : Type u} → f (α → β) → (Unit → f α) → f β
Seq.seq
(
pure: {α : Type u} → α → f α
pure
x: α✝ → β✝
x
) fun
_: Unit
_
=>
y: f α✝
y
seqLeft := fun
a: f α✝
a
b: Unit → f β✝
b
=>
Seq.seq: {f : Type u → Type v} → [self : Seq f] → {α β : Type u} → f (α → β) → (Unit → f α) → f β
Seq.seq
(
Functor.map: {f : Type u → Type v} → [self : Functor f] → {α β : Type u} → (α → β) → f α → f β
Functor.map
(
Function.const: {α : Type u} → (β : Type u) → α → β → α
Function.const
_: Type u
_
)
a: f α✝
a
)
b: Unit → f β✝
b
seqRight := fun
a: f α✝
a
b: Unit → f β✝
b
=>
Seq.seq: {f : Type u → Type v} → [self : Seq f] → {α β : Type u} → f (α → β) → (Unit → f α) → f β
Seq.seq
(
Functor.map: {f : Type u → Type v} → [self : Functor f] → {α β : Type u} → (α → β) → f α → f β
Functor.map
(
Function.const: {α : Type u} → (β : Type u) → α → β → α
Function.const
_: Type u
_
id: {α : Type u} → α → α
id
)
a: f α✝
a
)
b: Unit → f β✝
b
end hidden -- hidden

Notice that as with Functor it is also a type transformer (f : Type u → Type v) and notice the extends Functor f is ensuring the base Functor also performs that same type transformation.

As stated above, all applicatives are then functors. This means you can assume that map already exists for all these types.

The Pure base type class is a very simple type class that supplies the pure function.

namespace hidden -- hidden
class 
Pure: (Type u → Type v) → Type (max (u + 1) v)
Pure
(
f: Type u → Type v
f
:
Type u: Type (u + 1)
Type u
Type v: Type (v + 1)
Type v
) where
pure: {f : Type u → Type v} → [self : Pure f] → {α : Type u} → α → f α
pure
{
α: Type u
α
:
Type u: Type (u + 1)
Type u
} :
α: Type u
α
f: Type u → Type v
f
α: Type u
α
end hidden -- hidden

You can think of it as lifting the result of a pure value to some monadic type. The simplest example of pure is the Option type:

some 10
(
pure: {f : Type → Type} → [self : Pure f] → {α : Type} → α → f α
pure
10: Nat
10
:
Option: Type → Type
Option
Nat: Type
Nat
) -- some 10

Here we used the Option implementation of pure to wrap the Nat 10 value in an Option Nat type resulting in the value some 10, and in fact if you look at the Monad instance of Option , you will see that pure is indeed implemented using Option.some:

instance: Monad Option
instance
:
Monad: (Type u_1 → Type u_1) → Type (u_1 + 1)
Monad
Option: Type u_1 → Type u_1
Option
where pure :=
Option.some: {α : Type u_1} → α → Option α
Option.some

The Seq type class is also a simple type class that provides the seq operator which can also be written using the special syntax <*>.

namespace hidden -- hidden
class 
Seq: (Type u → Type v) → Type (max (u + 1) v)
Seq
(
f: Type u → Type v
f
:
Type u: Type (u + 1)
Type u
Type v: Type (v + 1)
Type v
) :
Type (max (u+1) v): Type ((max (u + 1) v) + 1)
Type (max (u+1) v)
where
seq: {f : Type u → Type v} → [self : Seq f] → {α β : Type u} → f (α → β) → (Unit → f α) → f β
seq
: {
α: Type u
α
β: Type u
β
:
Type u: Type (u + 1)
Type u
}
f: Type u → Type v
f
(
α: Type u
α
β: Type u
β
) (
Unit: Type
Unit
f: Type u → Type v
f
α: Type u
α
)
f: Type u → Type v
f
β: Type u
β
end hidden -- hidden

Basic Applicative Examples

Many of the basic functors also have instances of Applicative. For example, Option is also Applicative.

So let's take a look and what the seq operator can do. Suppose you want to multiply two Option Nat objects. Your first attempt might be this:

some 4 * some 5 : ?m.866
some 4 * some 5 : ?m.866
Warning: failed to synthesize HMul (Option Nat) (Option Nat) ?m.866 Additional diagnostic information may be available using the `set_option diagnostics true` command.
-- failed to synthesize instance

You then might wonder how to use the Functor.map to solve this since you could do these before:

some 20
(
some: {α : Type} → α → Option α
some
4: Nat
4
).
map: {α β : Type} → (α → β) → Option α → Option β
map
(fun
x: Nat
x
=>
x: Nat
x
*
5: Nat
5
) -- some 20
some 20
(
some: {α : Type} → α → Option α
some
4: Nat
4
).
map: {α β : Type} → (α → β) → Option α → Option β
map
(· * 5): Nat → Nat
(· * 5)
-- some 20
some 20
(· * 5): Nat → Nat
(· * 5)
<$> (
some: {α : Type} → α → Option α
some
4: Nat
4
) -- some 20

Remember that <$> is the infix notation for Functor.map.

The functor map operation can apply a multiplication to the value in the Option and then lift the result back up to become a new Option , but this isn't what you need here.

The Seq.seq operator <*> can help since it can apply a function to the items inside a container and then lift the result back up to the desired type, namely Option .

There are two ways to do this:

some 20
pure: {f : Type → Type} → [self : Pure f] → {α : Type} → α → f α
pure
(.*.): Nat → Nat → Nat
(.*.)
<*>
some: {α : Type} → α → Option α
some
4: Nat
4
<*>
some: {α : Type} → α → Option α
some
5: Nat
5
-- some 20
some 20
(.*.): Nat → Nat → Nat
(.*.)
<$>
some: {α : Type} → α → Option α
some
4: Nat
4
<*>
some: {α : Type} → α → Option α
some
5: Nat
5
-- some 20

In the first way, we start off by wrapping the function in an applicative using pure. Then we apply this to the first Option , and again to the second Option in a chain of operations. So you can see how Seq.seq can be chained in fact, Seq.seq is really all about chaining of operations.

But in this case there is a simpler way. In the second way, you can see that "applying" a single function to a container is the same as using Functor.map. So you use <$> to "transform" the first option into an Option containing a function, and then apply this function over the second value.

Now if either side is none, the result is none, as expected, and in this case the seq operator was able to eliminate the multiplication:

none
(.*.): Nat → Nat → Nat
(.*.)
<$>
none: {α : Type} → Option α
none
<*>
some: {α : Type} → α → Option α
some
5: Nat
5
-- none
none
(.*.): Nat → Nat → Nat
(.*.)
<$>
some: {α : Type} → α → Option α
some
4: Nat
4
<*>
none: {α : Type} → Option α
none
-- none

For a more interesting example, let's make List an applicative by adding the following definition:

instance: Applicative List
instance
:
Applicative: (Type u_1 → Type u_1) → Type (u_1 + 1)
Applicative
List: Type u_1 → Type u_1
List
where pure :=
List.singleton: {α : Type u_1} → α → List α
List.singleton
seq
f: List (α✝ → β✝)
f
x: Unit → List α✝
x
:=
List.flatMap: {α β : Type u_1} → List α → (α → List β) → List β
List.flatMap
f: List (α✝ → β✝)
f
fun
y: α✝ → β✝
y
=>
Functor.map: {f : Type u_1 → Type u_1} → [self : Functor f] → {α β : Type u_1} → (α → β) → f α → f β
Functor.map
y: α✝ → β✝
y
(
x: Unit → List α✝
x
(): Unit
()
)

Notice you can now sequence a list of functions and a list of items. The trivial case of sequencing a singleton list is in fact the same as map, as you saw earlier with the Option examples:

[6, 8]
[
(·+2): Nat → Nat
(·+2)
] <*> [
4: Nat
4
,
6: Nat
6
] -- [6, 8]
[6, 8]
(·+2): Nat → Nat
(·+2)
<$> [
4: Nat
4
,
6: Nat
6
] -- [6, 8]

But now with list it is easier to show the difference when you do this:

[6, 8, 12, 18]
[
(·+2): Nat → Nat
(·+2)
,
(· *3): Nat → Nat
(· *3)
] <*> [
4: Nat
4
,
6: Nat
6
] -- [6, 8, 12, 18]

Why did this produce 4 values? The reason is because <*> applies every function to every value in a pairwise manner. This makes sequence really convenient for solving certain problems. For example, how do you get the pairwise combinations of all values from two lists?

[(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6)]
Prod.mk: {α β : Type} → α → β → α × β
Prod.mk
<$> [
1: Nat
1
,
2: Nat
2
,
3: Nat
3
] <*> [
4: Nat
4
,
5: Nat
5
,
6: Nat
6
] -- [(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6)]

How do you get the sum of these pairwise values?

[5, 6, 7, 6, 7, 8, 7, 8, 9]
(·+·): Nat → Nat → Nat
(·+·)
<$> [
1: Nat
1
,
2: Nat
2
,
3: Nat
3
] <*> [
4: Nat
4
,
5: Nat
5
,
6: Nat
6
] -- [5, 6, 7, 6, 7, 8, 7, 8, 9]

Here you can use <$> to "transform" each element of the first list into a function, and then apply these functions over the second list.

If you have 3 lists, and want to find all combinations of 3 values across those lists you would need helper function that can create a tuple out of 3 values, and Lean provides a very convenient syntax for that (·,·,·):

[(1, 3, 5), (1, 3, 6), (1, 4, 5), (1, 4, 6), (2, 3, 5), (2, 3, 6), (2, 4, 5), (2, 4, 6)]
(·,·,·): Nat → Nat → Nat → Nat × Nat × Nat
(·,·,·)
<$> [
1: Nat
1
,
2: Nat
2
] <*> [
3: Nat
3
,
4: Nat
4
] <*> [
5: Nat
5
,
6: Nat
6
] -- [(1, 3, 5), (1, 3, 6), (1, 4, 5), (1, 4, 6), (2, 3, 5), (2, 3, 6), (2, 4, 5), (2, 4, 6)]

And you could sum these combinations if you first define a sum function that takes three inputs and then you could chain apply this over the three lists. Again lean can create such a function with the expression (·+·+·):

[9, 10, 10, 11, 10, 11, 11, 12]
(·+·+·): Nat → Nat → Nat → Nat
(·+·+·)
<$> [
1: Nat
1
,
2: Nat
2
] <*> [
3: Nat
3
,
4: Nat
4
] <*> [
5: Nat
5
,
6: Nat
6
] -- [9, 10, 10, 11, 10, 11, 11, 12]

And indeed each sum here matches the expected values if you manually sum the triples we show above.

Side note: there is another way to combine lists with a function that does not do the pairwise combinatorics, it is called List.zipWith:

[5, 7, 9]
List.zipWith: {α β γ : Type} → (α → β → γ) → List α → List β → List γ
List.zipWith
(·+·): Nat → Nat → Nat
(·+·)
[
1: Nat
1
,
2: Nat
2
,
3: Nat
3
] [
4: Nat
4
,
5: Nat
5
,
6: Nat
6
] -- [5, 7, 9]

And there is a helper function named List.zip that calls zipWith using the function Prod.mk so you get a nice zipped list like this:

[(1, 4), (2, 5), (3, 6)]
List.zip: {α β : Type} → List α → List β → List (α × β)
List.zip
[
1: Nat
1
,
2: Nat
2
,
3: Nat
3
] [
4: Nat
4
,
5: Nat
5
,
6: Nat
6
] -- [(1, 4), (2, 5), (3, 6)]

And of course, as you would expect, there is an unzip also:

([1, 2, 3], [4, 5, 6])
List.unzip: {α β : Type} → List (α × β) → List α × List β
List.unzip
(
List.zip: {α β : Type} → List α → List β → List (α × β)
List.zip
[
1: Nat
1
,
2: Nat
2
,
3: Nat
3
] [
4: Nat
4
,
5: Nat
5
,
6: Nat
6
]) -- ([1, 2, 3], [4, 5, 6])

Example: A Functor that is not Applicative

From the chapter on functors you might remember this example of LivingSpace that had a Functor instance:

structure 
LivingSpace: Type → Type
LivingSpace
(
α: Type
α
:
Type: Type 1
Type
) where
totalSize: {α : Type} → LivingSpace α → α
totalSize
:
α: Type
α
numBedrooms: {α : Type} → LivingSpace α → Nat
numBedrooms
:
Nat: Type
Nat
masterBedroomSize: {α : Type} → LivingSpace α → α
masterBedroomSize
:
α: Type
α
livingRoomSize: {α : Type} → LivingSpace α → α
livingRoomSize
:
α: Type
α
kitchenSize: {α : Type} → LivingSpace α → α
kitchenSize
:
α: Type
α
deriving
Repr: Type u → Type u
Repr
,
BEq: Type u → Type u
BEq
def
LivingSpace.map: {α β : Type} → (α → β) → LivingSpace α → LivingSpace β
LivingSpace.map
(
f: α → β
f
:
α: Type
α
β: Type
β
) (
s: LivingSpace α
s
:
LivingSpace: Type → Type
LivingSpace
α: Type
α
) :
LivingSpace: Type → Type
LivingSpace
β: Type
β
:= { totalSize :=
f: α → β
f
s: LivingSpace α
s
.
totalSize: {α : Type} → LivingSpace α → α
totalSize
numBedrooms :=
s: LivingSpace α
s
.
numBedrooms: {α : Type} → LivingSpace α → Nat
numBedrooms
masterBedroomSize :=
f: α → β
f
s: LivingSpace α
s
.
masterBedroomSize: {α : Type} → LivingSpace α → α
masterBedroomSize
livingRoomSize :=
f: α → β
f
s: LivingSpace α
s
.
livingRoomSize: {α : Type} → LivingSpace α → α
livingRoomSize
kitchenSize :=
f: α → β
f
s: LivingSpace α
s
.
kitchenSize: {α : Type} → LivingSpace α → α
kitchenSize
}
instance: Functor LivingSpace
instance
:
Functor: (Type → Type) → Type 1
Functor
LivingSpace: Type → Type
LivingSpace
where map :=
LivingSpace.map: {α β : Type} → (α → β) → LivingSpace α → LivingSpace β
LivingSpace.map

It wouldn't really make sense to make an Applicative instance here. How would you write pure in the Applicative instance? By taking a single value and plugging it in for total size and the master bedroom size and the living room size? That wouldn't really make sense. And what would the numBedrooms value be for the default? What would it mean to "chain" two of these objects together?

If you can't answer these questions very well, then it suggests this type isn't really an Applicative functor.

SeqLeft and SeqRight

You may remember seeing the SeqLeft and SeqRight base types on class Applicative earlier. These provide the seqLeft and seqRight operations which also have some handy notation shorthands <* and *> respectively. Where: x <* y evaluates x, then y, and returns the result of x and x *> y evaluates x, then y, and returns the result of y.

To make it easier to remember, notice that it returns that value that the <* or *> notation is pointing at. For example:

some 2
(
some: {α : Type} → α → Option α
some
1: Nat
1
) *> (
some: {α : Type} → α → Option α
some
2: Nat
2
) -- Some 2
some 1
(
some: {α : Type} → α → Option α
some
1: Nat
1
) <* (
some: {α : Type} → α → Option α
some
2: Nat
2
) -- Some 1

So these are a kind of "discard" operation. Run all the actions, but only return the values that you care about. It will be easier to see these in action when you get to full Monads, but they are used heavily in the Lean Parsec parser combinator library where you will find parsing functions like this one which parses the XML declaration <?xml version="1.0" encoding='utf-8' standalone="yes">:

def XMLdecl : Parsec Unit := do
  skipString "<?xml"
  VersionInfo
  optional EncodingDecl *> optional SDDecl *> optional S *> skipString "?>"

But you will need to understand full Monads before this will make sense.

Lazy Evaluation

Diving a bit deeper, (you can skip this and jump to the Applicative Laws if don't want to dive into this implementation detail right now). But, if you write a simple Option example (.*.) <$> some 4 <*> some 5 that produces some 20 using Seq.seq you will see something interesting:

some 20
Seq.seq: {f : Type → Type} → [self : Seq f] → {α β : Type} → f (α → β) → (Unit → f α) → f β
Seq.seq
(
(.*.): Nat → Nat → Nat
(.*.)
<$>
some: {α : Type} → α → Option α
some
4: Nat
4
) (fun (
_: Unit
_
:
Unit: Type
Unit
) =>
some: {α : Type} → α → Option α
some
5: Nat
5
) -- some 20

This may look a bit cumbersome, specifically, why did we need to invent this funny looking function fun (_ : Unit) => (some 5)?

Well if you take a close look at the type class definition:

class Seq (f : Type u → Type v) where
  seq : {α β : Type u} → f (α → β) → (Unit → f α) → f β

You will see this function defined here: (Unit → f α), this is a function that takes Unit as input and produces the output of type f α where f is the container type Type u -> Type v, in this example Option and α is the element type Nat, so fun (_ : Unit) => some 5 matches this definition because it is taking an input of type Unit and producing some 5 which is type Option Nat.

The that seq is defined this way is because Lean is an eagerly evaluated language (call-by-value), you have to use this kind of Unit function whenever you want to explicitly delay evaluation and seq wants that so it can eliminate unnecessary function evaluations whenever possible.

Fortunately the <*> infix notation hides this from you by creating this wrapper function for you. If you look up the notation using F12 in VS Code you will find it contains (fun _ : Unit => b).

Now to complete this picture you will find the default implementation of seq on the Lean Monad type class:

class Monad (m : Type u → Type v) extends Applicative m, Bind m where
  seq      f x := bind f fun y => Functor.map y (x ())

Notice here that x is the (Unit → f α) function, and it is calling that function by passing the Unit value (), which is the Unit value (Unit.unit). All this just to ensure delayed evaluation.

How do Applicatives help with Monads?

Applicatives are helpful for the same reasons as functors. They're a relatively simple abstract structure that has practical applications in your code. Now that you understand how chaining operations can fit into a structure definition, you're in a good position to start learning about Monads!

Monads

Building on Functors and Applicatives we can now introduce monads.

A monad is another type of abstract, functional structure. Let's explore what makes it different from the first two structures.

What is a Monad?

A monad is a computational context. It provides a structure that allows you to chain together operations that have some kind of shared state or similar effect. Whereas pure functional code can only operate on explicit input parameters and affect the program through explicit return values, operations in a monad can affect other computations in the chain implicitly through side effects, especially modification of an implicitly shared value.

How are monads represented in Lean?

Like functors and applicatives, monads are represented with a type class in Lean:

class Monad (m : Type u → Type v) extends Applicative m, Bind m where

Just as every applicative is a functor, every monad is also an applicative and there's one more new base type class used here that you need to understand, namely, Bind.

class Bind (f : Type u → Type v) where
  bind : {α β : Type u} → f α → (α → f β) → f β

The bind operator also has infix notation >>= where x >>= g represents the result of executing x to get a value of type f α then unwrapping the value α from that and passing it to function g of type α → f β returning the result of type f β where f is the target structure type (like Option or List)

This bind operation looks similar to the other ones you've seen so far, if you put them all together Monad has the following operations:

class Monad (f : Type u → Type v) extends Applicative f, Bind f where
  pure {α : Type u} : α → f α
  map : {α β : Type u} → (α → β) → f α → f β
  seq : {α β : Type u} → f (α → β) → (Unit → f α) → f β
  bind : {α β : Type u} → f α → (α → f β) → f β
  ...

Notice Monad also contains pure it must also have a "default" way to wrap a value in the structure.

The bind operator is similar to the applicative seq operator in that it chains two operations, with one of them being function related. Notice that bind, seq and map all take a function of some kind. Let's examine those function types:

  • map: (α → β)
  • seq: f (α → β)
  • bind: (α → f β)

So map is a pure function, seq is a pure function wrapped in the structure, and bind takes a pure input but produces an output wrapped in the structure.

Note: we are ignoring the (Unit → f α) function used by seq here since that has a special purpose explained in Applicatives Lazy Evaluation.

Basic Monad Example

Just as Option is a functor and an applicative functor, it is also a monad! Let's start with how Option implements the Monad type class.

instance: Monad Option
instance
:
Monad: (Type u_1 → Type u_1) → Type (u_1 + 1)
Monad
Option: Type u_1 → Type u_1
Option
where pure :=
Option.some: {α : Type u_1} → α → Option α
Option.some
bind :=
Option.bind: {α β : Type u_1} → Option α → (α → Option β) → Option β
Option.bind

where:

def Option.bind : Option α → (α → Option β) → Option β
  | none,   _ => none
  | some a, f => f a

Side note: this function definition is using a special shorthand syntax in Lean where the := match a, b with code can be collapsed away. To make this more clear consider the following simpler example, where Option.bind is using the second form like bar:

def 
foo: Option Nat → Nat → Option Nat
foo
(
x: Option Nat
x
:
Option: Type → Type
Option
Nat: Type
Nat
) (
y: Nat
y
:
Nat: Type
Nat
) :
Option: Type → Type
Option
Nat: Type
Nat
:= match
x: Option Nat
x
,
y: Nat
y
with |
none: {α : Type ?u.143} → Option α
none
, _ =>
none: {α : Type} → Option α
none
|
some: {α : Type ?u.161} → α → Option α
some
x: Nat
x
,
y: Nat
y
=>
some: {α : Type} → α → Option α
some
(
x: Nat
x
+
y: Nat
y
) def
bar: Option Nat → Nat → Option Nat
bar
:
Option: Type → Type
Option
Nat: Type
Nat
Nat: Type
Nat
Option: Type → Type
Option
Nat: Type
Nat
|
none: {α : Type ?u.309} → Option α
none
, _ =>
none: {α : Type} → Option α
none
|
some: {α : Type ?u.327} → α → Option α
some
x: Nat
x
,
y: Nat
y
=>
some: {α : Type} → α → Option α
some
(
x: Nat
x
+
y: Nat
y
)
some 3
foo: Option Nat → Nat → Option Nat
foo
(
some: {α : Type} → α → Option α
some
1: Nat
1
)
2: Nat
2
-- some 3
some 3
bar: Option Nat → Nat → Option Nat
bar
(
some: {α : Type} → α → Option α
some
1: Nat
1
)
2: Nat
2
-- some 3

What is important is that Option.bind is using a match statement to unwrap the input value Option α, if it is none then it does nothing and returns none, if it has a value of type α then it applies the function in the second argument (α → Option β) to this value, which is the expression f a that you see in the line | some a, f => f a above. The function returns a result of type Option β which then becomes the return value for bind. So there is no structure wrapping required on the return value since the input function already did that.

But let's bring in the definition of a monad. What does it mean to describe Option as a computational context?

The Option monad encapsulates the context of failure. Essentially, the Option monad lets us abort a series of operations whenever one of them fails. This allows future operations to assume that all previous operations have succeeded. Here's some code to motivate this idea:

def 
optionFunc1: String → Option Nat
optionFunc1
:
String: Type
String
->
Option: Type → Type
Option
Nat: Type
Nat
|
"": String
""
=>
none: {α : Type} → Option α
none
|
str: String
str
=>
some: {α : Type} → α → Option α
some
str: String
str
.
length: String → Nat
length
def
optionFunc2: Nat → Option Float
optionFunc2
(
i: Nat
i
:
Nat: Type
Nat
) :
Option: Type → Type
Option
Float: Type
Float
:= if
i: Nat
i
%
2: Nat
2
==
0: Nat
0
then
none: {α : Type} → Option α
none
else
some: {α : Type} → α → Option α
some
(
i: Nat
i
.
toFloat: Nat → Float
toFloat
*
3.14159: Float
3.14159
) def
optionFunc3: Float → Option (List Nat)
optionFunc3
(
f: Float
f
:
Float: Type
Float
) :
Option: Type → Type
Option
(
List: Type → Type
List
Nat: Type
Nat
) := if
f: Float
f
>
15.0: Float
15.0
then
none: {α : Type} → Option α
none
else
some: {α : Type} → α → Option α
some
[
f: Float
f
.
floor: Float → Float
floor
.
toUInt32: Float → UInt32
toUInt32
.
toNat: UInt32 → Nat
toNat
,
f: Float
f
.
ceil: Float → Float
ceil
.
toUInt32: Float → UInt32
toUInt32
.
toNat: UInt32 → Nat
toNat
] def
runOptionFuncs: String → Option (List Nat)
runOptionFuncs
(
input: String
input
:
String: Type
String
) :
Option: Type → Type
Option
(
List: Type → Type
List
Nat: Type
Nat
) := match
optionFunc1: String → Option Nat
optionFunc1
input: String
input
with |
none: {α : Type ?u.1335} → Option α
none
=>
none: {α : Type} → Option α
none
|
some: {α : Type ?u.1346} → α → Option α
some
i: Nat
i
=> match
optionFunc2: Nat → Option Float
optionFunc2
i: Nat
i
with |
none: {α : Type ?u.1362} → Option α
none
=>
none: {α : Type} → Option α
none
|
some: {α : Type ?u.1372} → α → Option α
some
f: Float
f
=>
optionFunc3: Float → Option (List Nat)
optionFunc3
f: Float
f
some [9, 10]
runOptionFuncs: String → Option (List Nat)
runOptionFuncs
"big": String
"big"
-- some [9, 10]

Here you see three different functions that could fail. These are then combined in runOptionFuncs. But then you have to use nested match expressions to check if the previous result succeeded. It would be very tedious to continue this pattern much longer.

The Option monad helps you fix this. Here's what this function looks like using the bind operator.

def 
runOptionFuncsBind: String → Option (List Nat)
runOptionFuncsBind
(
input: String
input
:
String: Type
String
) :
Option: Type → Type
Option
(
List: Type → Type
List
Nat: Type
Nat
) :=
optionFunc1: String → Option Nat
optionFunc1
input: String
input
>>=
optionFunc2: Nat → Option Float
optionFunc2
>>=
optionFunc3: Float → Option (List Nat)
optionFunc3
some [9, 10]
runOptionFuncsBind: String → Option (List Nat)
runOptionFuncsBind
"big": String
"big"
-- some [9, 10]

It's much cleaner now! You take the first result and pass it into the second and third functions using the bind operation. The monad instance handles all the failure cases so you don't have to!

Let's see why the types work out. The result of optionFunc1 input is simply Option Nat. Then the bind operator allows you to take this Option Nat value and combine it with optionFunc2, whose type is Nat → Option Float The bind operator resolves these to an Option Float. Then you pass this similarly through the bind operator to optionFunc3, resulting in the final type, Option (List Nat).

Your functions will not always combine so cleanly though. This is where do notation comes into play. This notation allows you to write monadic operations one after another, line-by-line. It almost makes your code look like imperative programming. You can rewrite the above as:

def 
runOptionFuncsDo: String → Option (List Nat)
runOptionFuncsDo
(
input: String
input
:
String: Type
String
) :
Option: Type → Type
Option
(
List: Type → Type
List
Nat: Type
Nat
) := do let
i: Nat
i
optionFunc1: String → Option Nat
optionFunc1
input: String
input
let
f: Float
f
optionFunc2: Nat → Option Float
optionFunc2
i: Nat
i
optionFunc3: Float → Option (List Nat)
optionFunc3
f: Float
f
some [9, 10]
runOptionFuncsDo: String → Option (List Nat)
runOptionFuncsDo
"big": String
"big"
-- some [9, 10]

The operator used here is special. It effectively unwraps the value on the right-hand side from the monad. This means the value i has type Nat, even though the result of optionFunc1 is Option Nat. This is done using a bind operation under the hood.

Note you can use <- or the nice unicode symbol which you can type into VS code by typing these characters \l . When you type the final space, \l is replaced with .

Observe that we do not unwrap the final line of the computation. The function result is Option (List Nat) which matches what optionFunc3 returns. At first glance, this may look more complicated than the bind example. However, it gives you a lot more flexibility, like mixing monadic and non-monadic statements, using if then/else structures with their own local do blocks and so on. It is particularly helpful when one monadic function depends on multiple previous functions.

Example using List

You can easily make List into a monad with the following, since List already provides an implementation of pure and bind.

instance: Monad List
instance
:
Monad: (Type u_1 → Type u_1) → Type (u_1 + 1)
Monad
List: Type u_1 → Type u_1
List
where pure :=
List.singleton: {α : Type u_1} → α → List α
List.singleton
bind :=
List.flatMap: {α β : Type u_1} → List α → (α → List β) → List β
List.flatMap

Like you saw with the applicative seq operator, the bind operator applies the given function to every element of the list. It is useful to look at the bind implementation for List:

open List
def 
bind: {α : Type u_1} → {β : Type u_2} → List α → (α → List β) → List β
bind
(
a: List α
a
:
List: Type u_1 → Type u_1
List
α: Type u_1
α
) (
b: α → List β
b
:
α: Type u_1
α
List: Type u_2 → Type u_2
List
β: Type u_2
β
) :
List: Type u_2 → Type u_2
List
β: Type u_2
β
:=
Warning: `List.join` has been deprecated, use `List.flatten` instead
(
map: {α : Type u_1} → {β : Type u_2} → (α → β) → List α → List β
map
b: α → List β
b
a: List α
a
)

So Functor.map is used to apply the function b to every element of a but this would return a whole bunch of little lists, so join is used to turn those back into a single list.

Here's an example where you use bind to convert a list of strings into a combined list of chars:

['a', 'p', 'p', 'l', 'e']
"apple": String
"apple"
.
toList: String → List Char
toList
-- ['a', 'p', 'p', 'l', 'e']
['a', 'p', 'p', 'l', 'e', 'o', 'r', 'a', 'n', 'g', 'e']
[
"apple": String
"apple"
,
"orange": String
"orange"
] >>=
String.toList: String → List Char
String.toList
-- ['a', 'p', 'p', 'l', 'e', 'o', 'r', 'a', 'n', 'g', 'e']

The IO Monad

The IO Monad is perhaps the most important monad in Lean. It is also one of the hardest monads to understand starting out. Its actual implementation is too intricate to discuss when first learning monads. So it is best to learn by example.

What is the computational context that describes the IO monad? IO operations can read information from or write information to the terminal, file system, operating system, and/or network. They interact with systems outside of your program. If you want to get user input, print a message to the user, read information from a file, or make a network call, you'll need to do so within the IO Monad.

The state of the world outside your program can change at virtually any moment, and so this IO context is particularly special. So these IO operations are "side effects" which means you cannot perform them from "pure" Lean functions.

Now, the most important job of pretty much any computer program is precisely to perform this interaction with the outside world. For this reason, the root of all executable Lean code is a function called main, with the type IO Unit. So every program starts in the IO monad!

When your function is IO monadic, you can get any input you need, call into "pure" code with the inputs, and then output the result in some way. The reverse does not work. You cannot call into IO code from pure code like you can call into a function that takes Option as input. Another way to say this is you cannot invent an IO context out of thin air, it has to be given to you in your main function.

Let's look at a simple program showing a few of the basic IO functions. It also uses do notation to make the code read nicely:

def 
main: IO Unit
main
:
IO: Type → Type
IO
Unit: Type
Unit
:= do
IO.println: {α : Type} → [inst : ToString α] → α → IO Unit
IO.println
"enter a line of text:": String
"enter a line of text:"
let
stdin: IO.FS.Stream
stdin
IO.getStdin: BaseIO IO.FS.Stream
IO.getStdin
-- IO IO.FS.Stream (monadic) let
input: String
input
stdin: IO.FS.Stream
stdin
.
getLine: IO.FS.Stream → IO String
getLine
-- IO.FS.Stream → IO String (monadic) let
uppercased: String
uppercased
:=
input: String
input
.
toUpper: String → String
toUpper
-- String → String (pure)
IO.println: {α : Type} → [inst : ToString α] → α → IO Unit
IO.println
uppercased: String
uppercased
-- IO Unit (monadic)

So, once again you can see that the do notation lets you chain a series of monadic actions. IO.getStdin is of type IO IO.FS.Stream and stdin.getLine is of type IO String and IO.println is of type IO Unit.

In between you see a non-monadic expression let uppercased := input.toUpper which is fine too. A let statement can occur in any monad. Just as you could unwrap i from Option Nat to get the inner Nat, you can use to unwrap the result of getLine to get a String. You can then manipulate this value using normal pure string functions like toUpper, and then you can pass the result to the IO.println function.

This is a simple echo program. It reads a line from the terminal, and then prints the line back out capitalized to the terminal. Hopefully it gives you a basic understanding of how IO works.

You can test this program using lean --run as follows:

> lean --run Main.lean
enter a line of text:
the quick brown fox
THE QUICK BROWN FOX

Here the user entered the string the quick brown fox and got back the uppercase result.

What separates Monads from Applicatives?

The key that separates these is context. You cannot really determine the structure of "future" operations without knowing the results of "past" operations, because the past can alter the context in which the future operations work. With applicatives, you can't get the final function result without evaluating everything, but you can determine the structure of how the operation will take place. This allows some degree of parallelism with applicatives that is not generally possible with monads.

Conclusion

Hopefully you now have a basic level understanding of what a monad is. But perhaps some more examples of what a "computational context" means would be useful to you. The Reader, State and Except monads each provide a concrete and easily understood context that can be compared easily to function parameters. You can learn more about those in Reader monads, State monads, and the Except monad.

Readers

In the previous section you learned about the conceptual idea of monads. You learned what they are, and saw how some common types like IO and Option work as monads. Now in this section, you will be looking at some other useful monads. In particular, the ReaderM monad.

How to do Global Variables in Lean?

In Lean, your code is generally "pure", meaning functions can only interact with the arguments passed to them. This effectively means you cannot have global variables. You can have global definitions, but these are fixed at compile time. If some user behavior might change them, you would have to wrap them in the IO monad, which means they can't be used from pure code.

Consider this example. Here, you want to have an Environment containing different parameters as a global variable. However, you want to load these parameters from the process environment variables, which requires the IO monad.

structure 
Environment: Type
Environment
where
path: Environment → String
path
:
String: Type
String
home: Environment → String
home
:
String: Type
String
user: Environment → String
user
:
String: Type
String
deriving
Repr: Type u → Type u
Repr
def
getEnvDefault: String → IO String
getEnvDefault
(
name: String
name
:
String: Type
String
):
IO: Type → Type
IO
String: Type
String
:= do let
val?: Option String
val?
IO.getEnv: String → BaseIO (Option String)
IO.getEnv
name: String
name
pure: {f : Type → Type} → [self : Pure f] → {α : Type} → α → f α
pure
<| match
val?: Option String
val?
with |
none: {α : Type ?u.1777} → Option α
none
=>
"": String
""
|
some: {α : Type ?u.1786} → α → Option α
some
s: String
s
=>
s: String
s
def
loadEnv: IO Environment
loadEnv
:
IO: Type → Type
IO
Environment: Type
Environment
:= do let
path: String
path
getEnvDefault: String → IO String
getEnvDefault
"PATH": String
"PATH"
let
home: String
home
getEnvDefault: String → IO String
getEnvDefault
"HOME": String
"HOME"
let
user: String
user
getEnvDefault: String → IO String
getEnvDefault
"USER": String
"USER"
pure: {f : Type → Type} → [self : Pure f] → {α : Type} → α → f α
pure
{
path: String
path
,
home: String
home
,
user: String
user
} def
func1: Environment → Float
func1
(
e: Environment
e
:
Environment: Type
Environment
) :
Float: Type
Float
:= let
l1: Nat
l1
:=
e: Environment
e
.
path: Environment → String
path
.
length: String → Nat
length
let
l2: Nat
l2
:=
e: Environment
e
.
home: Environment → String
home
.
length: String → Nat
length
*
2: Nat
2
let
l3: Nat
l3
:=
e: Environment
e
.
user: Environment → String
user
.
length: String → Nat
length
*
3: Nat
3
(
l1: Nat
l1
+
l2: Nat
l2
+
l3: Nat
l3
).
toFloat: Nat → Float
toFloat
*
2.1: Float
2.1
def
func2: Environment → Nat
func2
(
env: Environment
env
:
Environment: Type
Environment
) :
Nat: Type
Nat
:=
2: Nat
2
+ (
func1: Environment → Float
func1
env: Environment
env
).
floor: Float → Float
floor
.
toUInt32: Float → UInt32
toUInt32
.
toNat: UInt32 → Nat
toNat
def
func3: Environment → String
func3
(
env: Environment
env
:
Environment: Type
Environment
) :
String: Type
String
:=
"Result: ": String
"Result: "
++ (
toString: {α : Type} → [self : ToString α] → α → String
toString
(
func2: Environment → Nat
func2
env: Environment
env
)) def
main: IO Unit
main
:
IO: Type → Type
IO
Unit: Type
Unit
:= do let
env: Environment
env
loadEnv: IO Environment
loadEnv
let
str: String
str
:=
func3: Environment → String
func3
env: Environment
env
IO.println: {α : Type} → [inst : ToString α] → α → IO Unit
IO.println
str: String
str
Result: 1308
main: IO Unit
main
-- Result: 7538

The only function actually using the environment is func1. However func1 is a pure function. This means it cannot directly call loadEnv, an impure function in the IO monad. This means the environment has to be passed through as a variable to the other functions, just so they can ultimately pass it to func1. In a language with global variables, you could save env as a global value in main. Then func1 could access it directly. There would be no need to have it as a parameter to func1, func2 and func3. In larger programs, these "pass-through" variables can cause a lot of headaches.

The Reader Solution

The ReaderM monad solves this problem. It effectively creates a global read-only value of a specified type. All functions within the monad can "read" the type. Let's look at how the ReaderM monad changes the shape of this code. Now the functions no longer need to be given the Environment as an explicit parameter, as they can access it through the monad.

def 
readerFunc1: ReaderM Environment Float
readerFunc1
:
ReaderM: Type → Type → Type
ReaderM
Environment: Type
Environment
Float: Type
Float
:= do let
env: Environment
env
read: {ρ : outParam Type} → {m : Type → Type} → [self : MonadReader ρ m] → m ρ
read
let
l1: Nat
l1
:=
env: Environment
env
.
path: Environment → String
path
.
length: String → Nat
length
let
l2: Nat
l2
:=
env: Environment
env
.
home: Environment → String
home
.
length: String → Nat
length
*
2: Nat
2
let
l3: Nat
l3
:=
env: Environment
env
.
user: Environment → String
user
.
length: String → Nat
length
*
3: Nat
3
return (
l1: Nat
l1
+
l2: Nat
l2
+
l3: Nat
l3
).
toFloat: Nat → Float
toFloat
*
2.1: Float
2.1
def
readerFunc2: ReaderM Environment Nat
readerFunc2
:
ReaderM: Type → Type → Type
ReaderM
Environment: Type
Environment
Nat: Type
Nat
:=
readerFunc1: ReaderM Environment Float
readerFunc1
>>= (fun
x: Float
x
=> return
2: Nat
2
+ (
x: Float
x
.
floor: Float → Float
floor
.
toUInt32: Float → UInt32
toUInt32
.
toNat: UInt32 → Nat
toNat
)) def
readerFunc3: ReaderM Environment String
readerFunc3
:
ReaderM: Type → Type → Type
ReaderM
Environment: Type
Environment
String: Type
String
:= do let
x: Nat
x
readerFunc2: ReaderM Environment Nat
readerFunc2
return
"Result: ": String
"Result: "
++
toString: {α : Type} → [self : ToString α] → α → String
toString
x: Nat
x
def
main2: IO Unit
main2
:
IO: Type → Type
IO
Unit: Type
Unit
:= do let
env: Environment
env
loadEnv: IO Environment
loadEnv
let
str: Id String
str
:=
readerFunc3: ReaderM Environment String
readerFunc3
.
run: {ρ : Type} → {m : Type → Type} → {α : Type} → ReaderT ρ m α → ρ → m α
run
env: Environment
env
IO.println: {α : Type} → [inst : ToString α] → α → IO Unit
IO.println
str: Id String
str
Result: 1308
main2: IO Unit
main2
-- Result: 7538

The ReaderM monad provides a run method and it is the ReaderM run method that takes the initial Environment context. So here you see main2 loads the environment as before, and establishes the ReaderM context by passing env to the run method.

Side note 1: The return statement used above also needs some explanation. The return statement in Lean is closely related to pure, but a little different. First the similarity is that return and pure both lift a pure value up to the Monad type. But return is a keyword so you do not need to parenthesize the expression like you do when using pure. (Note: you can avoid parentheses when using pure by using the <| operator like we did above in the initial getEnvDefault function). Furthermore, return can also cause an early return in a monadic function similar to how it can in an imperative language while pure cannot.

So technically if return is the last statement in a function it could be replaced with pure <|, but one could argue that return is still a little easier for most folks to read, just so long as you understand that return is doing more than other languages, it is also wrapping pure values in the monadic container type.

Side note 2: If the function readerFunc3 also took some explicit arguments then you would have to write (readerFunc3 args).run env and this is a bit ugly, so Lean provides an infix operator |> that eliminates those parentheses so you can write readerFunc3 args |>.run env and then you can chain multiple monadic actions like this m1 args1 |>.run args2 |>.run args3 and this is the recommended style. You will see this pattern used heavily in Lean code.

The let env ← read expression in readerFunc1 unwraps the environment from the ReaderM so we can use it. Each type of monad might provide one or more extra functions like this, functions that become available only when you are in the context of that monad.

Here the readerFunc2 function uses the bind operator >>= just to show you that there are bind operations happening here. The readerFunc3 function uses the do notation you learned about in Monads which hides that bind operation and can make the code look cleaner. So the expression let x ← readerFunc2 is also calling the bind function under the covers, so that you can access the unwrapped value x needed for the toString x conversion.

The important difference here to the earlier code is that readerFunc3 and readerFunc2 no longer have an explicit Environment input parameter that needs to be passed along all the way to readerFunc1. Instead, the ReaderM monad is taking care of that for you, which gives you the illusion of something like global context where the context is now available to all functions that use the ReaderM monad.

The above code also introduces an important idea. Whenever you learn about a monad "X", there's often (but not always) a run function to execute that monad, and sometimes some additional functions like read that interact with the monad context.

You might be wondering, how does the context actually move through the ReaderM monad? How can you add an input argument to a function by modifying its return type? There is a special command in Lean that will show you the reduced types:

ReaderM Environment String
ReaderM: Type → Type → Type
ReaderM
Environment: Type
Environment
String: Type
String
-- Environment → String

And you can see here that this type is actually a function! It's a function that takes an Environment as input and returns a String.

Now, remember in Lean that a function that takes an argument of type Nat and returns a String like def f (a : Nat) : String is the same as this function def f : Nat → String. These are exactly equal as types. Well this is being used by the ReaderM Monad to add an input argument to all the functions that use the ReaderM monad and this is why main is able to start things off by simply passing that new input argument in readerFunc3.run env. So now that you know the implementation details of the ReaderM monad you can see that what it is doing looks very much like the original code we wrote at the beginning of this section, only it's taking a lot of the tedious work off your plate and it is creating a nice clean separation between what your pure functions are doing, and the global context idea that the ReaderM adds.

withReader

One ReaderM function can call another with a modified version of the ReaderM context. You can use the withReader function from the MonadWithReader type class to do this:

def 
readerFunc3WithReader: ReaderM Environment String
readerFunc3WithReader
:
ReaderM: Type → Type → Type
ReaderM
Environment: Type
Environment
String: Type
String
:= do let
x: Nat
x
withReader: {ρ : outParam Type} → {m : Type → Type} → [self : MonadWithReader ρ m] → {α : Type} → (ρ → ρ) → m α → m α
withReader
(λ
env: Environment
env
=> {
env: Environment
env
with user :=
"new user": String
"new user"
})
readerFunc2: ReaderM Environment Nat
readerFunc2
return
"Result: ": String
"Result: "
++
toString: {α : Type} → [self : ToString α] → α → String
toString
x: Nat
x

Here we changed the user in the Environment context to "new user" and then we passed that modified context to readerFunc2.

So withReader f m executes monad m in the ReaderM context modified by f.

Handy shortcut with (← e)

If you use the operator in a let expression and the variable is only used once you can eliminate the let expression and place the operator in parentheses like this call to loadEnv:

def 
main3: IO Unit
main3
:
IO: Type → Type
IO
Unit: Type
Unit
:= do let
str: Id String
str
:=
readerFunc3: ReaderM Environment String
readerFunc3
(← loadEnv): Environment
(
loadEnv: IO Environment
loadEnv
(← loadEnv): Environment
)
IO.println: {α : Type} → [inst : ToString α] → α → IO Unit
IO.println
str: Id String
str

Conclusion

It might not seem like much has been accomplished with this ReaderM Environment monad, but you will find that in larger code bases, with many different types of monads all composed together this greatly cleans up the code. Monads provide a beautiful functional way of managing cross-cutting concerns that would otherwise make your code very messy.

Having this control over the inherited ReaderM context via withReader is actually very useful and something that is quite messy if you try and do this sort of thing with global variables, saving the old value, setting the new one, calling the function, then restoring the old value, making sure you do that in a try/finally block and so on. The ReaderM design pattern avoids that mess entirely.

Now it's time to move on to StateM Monad which is like a ReaderM that is also updatable.

import Lean.Data.HashMap

State

In the previous section, you learned about the ReaderM monad. Hopefully this gave you a new perspective on Lean. It showed that, in fact, you can have global variables of some sort; you just need to encode them in the type signature somehow, and this is what monads are for! In this part, you will explore the StateM monad, which is like a ReaderM only the state can also be updated.

Motivating example: Tic Tac Toe

For this section, let's build a simple model for a Tic Tace Toe game. The main object is the GameState data type containing several important pieces of information. First and foremost, it has the "board", a map from 2D tile indices to the "Tile State" (X, O or empty). Then it also knows the current player, and it has a random generator.

open 
Error: unknown namespace 'Batteries'
(HashMap) abbrev
TileIndex: Type
TileIndex
:=
Nat: Type
Nat
×
Nat: Type
Nat
-- a 2D index inductive
TileState: Type
TileState
where |
TileEmpty: TileState
TileEmpty
|
TileX: TileState
TileX
|
TileO: TileState
TileO
deriving
Repr: Type u → Type u
Repr
,
BEq: Type u → Type u
BEq
inductive
Player: Type
Player
where |
XPlayer: Player
XPlayer
|
OPlayer: Player
OPlayer
deriving
Repr: Type u → Type u
Repr
,
BEq: Type u → Type u
BEq
abbrev
Board: ?m.688
Board
:=
Error: unknown identifier 'HashMap'
TileIndex TileState: ?m.688
TileIndex TileState
structure
GameState: Type u_1
GameState
where
board: GameState → {Board : Sort u_1} → Board
board
:
Board: Sort u_1
Board
currentPlayer: GameState → Player
currentPlayer
:
Player: Type
Player
generator: GameState → StdGen
generator
:
StdGen: Type
StdGen

Let's think at a high level about how some of the game functions would work. You could, for instance, have a function for selecting a random move. This would output a TileIndex to play and alter the game's number generator. You would then make a move based on the selected move and the current player. This would change the board state as well as swap the current player. In other words, you have operations that depend on the current state of the game, but also need to update that state.

The StateM Monad to the Rescue

This is exactly the situation the StateM monad deals with. The StateM monad wraps computations in the context of reading and modifying a global state object.

It is parameterized by a single type parameter s, the state type in use. So just like the ReaderM has a single type you read from, the StateM has a single type you can both read from and write to. There are three primary actions you can take within the StateMmonad:

  • get - retrieves the state, like Reader.read
  • set - updates the state
  • modifyGet - retrieves the state, then updates it

There is also a run function, similar to run on ReaderM. Like the ReaderM monad, you must provide an initial state, in addition to the computation to run. StateM then produces two outputs: the result of the computation combined with the final updated state.

If you wish to discard the final state and just get the computation's result, you can use run' method instead. Yes in Lean, the apostrophe can be part of a name, you read this "run prime", and the general naming convention is that the prime method discards something.

So for your Tic Tac Toe game, many of your functions will have a signature like State GameState a.

Stateful Functions

Now you can examine some of the different functions mentioned above and determine their types. You can, for instance, pick a random move:

open TileState

def 
findOpen: StateM GameState (List TileIndex)
findOpen
:
StateM: Type → Type → Type
StateM
GameState: Type
GameState
(
List: Type → Type
List
TileIndex: Type
TileIndex
) := do let
game: GameState
game
get: {σ : outParam Type} → {m : Type → Type} → [self : MonadState σ m] → m σ
get
return
Error: invalid field notation, type is not of the form (C ...) where C is a constant game.board has type ?m.1381
def
chooseRandomMove: StateM GameState TileIndex
chooseRandomMove
:
StateM: Type → Type → Type
StateM
GameState: Type
GameState
TileIndex: Type
TileIndex
:= do let
game: GameState
game
get: {σ : outParam Type} → {m : Type → Type} → [self : MonadState σ m] → m σ
get
let
openSpots: List TileIndex
openSpots
findOpen: StateM GameState (List TileIndex)
findOpen
let
gen: StdGen
gen
:=
game: GameState
game
.
generator: GameState → StdGen
generator
let (
i: Nat
i
,
gen': StdGen
gen'
) :=
randNat: {gen : Type} → [inst : RandomGen gen] → gen → Nat → Nat → Nat × gen
randNat
gen: StdGen
gen
0: Nat
0
(
openSpots: List TileIndex
openSpots
.
length: {α : Type} → List α → Nat
length
-
1: Nat
1
)
set: {σ : semiOutParam Type} → {m : Type → Type} → [self : MonadStateOf σ m] → σ → m PUnit
set
{
game: GameState
game
with generator :=
gen': StdGen
gen'
} return
openSpots: List TileIndex
openSpots
[
i: Nat
i
]!

This returns a TileIndex and modifies the random number generator stored in the GameState! Notice you have a fun little use of the Applicative.seqRight operator *> in findOpen as described in Applicatives.

Now you can create the function that can make a move:

open Player

def 
tileStateForPlayer: Player → TileState
tileStateForPlayer
:
Player: Type
Player
TileState: Type
TileState
|
XPlayer: Player
XPlayer
=>
TileX: TileState
TileX
|
OPlayer: Player
OPlayer
=>
TileO: TileState
TileO
def
nextPlayer: Player → Player
nextPlayer
:
Player: Type
Player
Player: Type
Player
|
XPlayer: Player
XPlayer
=>
OPlayer: Player
OPlayer
|
OPlayer: Player
OPlayer
=>
XPlayer: Player
XPlayer
def
applyMove: TileIndex → StateM GameState Unit
applyMove
(
i: TileIndex
i
:
TileIndex: Type
TileIndex
):
StateM: Type → Type → Type
StateM
GameState: Type
GameState
Unit: Type
Unit
:= do let
game: GameState
game
get: {σ : outParam Type} → {m : Type → Type} → [self : MonadState σ m] → m σ
get
let
p: Player
p
:=
game: GameState
game
.
currentPlayer: GameState → Player
currentPlayer
let
newBoard: ∀ {Board : Prop}, Board
newBoard
:=
Error: invalid field notation, type is not of the form (C ...) where C is a constant game.board has type ?m.2582
set: {σ : semiOutParam Type} → {m : Type → Type} → [self : MonadStateOf σ m] → σ → m PUnit
set
{
game: GameState
game
with currentPlayer :=
nextPlayer: Player → Player
nextPlayer
p: Player
p
, board :=
newBoard: ∀ {Board : Prop}, Board
newBoard
}

This updates the board in the GameState with the new tile, and then changes the current player, providing no output (Unit return type).

So finally, you can combine these functions together with do notation, and it actually looks quite clean! You don't need to worry about the side effects. The different monadic functions handle them. Here's a sample of what your function might look like to play one turn of the game. At the end, it returns a boolean determining if all the spaces have been filled.

Notice in isGameDone and nextTurn we have stopped providing the full return type StateM GameState Unit. This is because Lean is able to infer the correct monadic return type from the context and as a result the code is now looking really clean.

def 
isGameDone: StateM GameState Bool
isGameDone
:= do return
(← findOpen): List TileIndex
(
findOpen: StateM GameState (List TileIndex)
findOpen
(← findOpen): List TileIndex
)
.
isEmpty: {α : Type} → List α → Bool
isEmpty
def
nextTurn: StateM GameState Bool
nextTurn
:= do let
i: TileIndex
i
chooseRandomMove: StateM GameState TileIndex
chooseRandomMove
applyMove: TileIndex → StateM GameState Unit
applyMove
i: TileIndex
i
isGameDone: StateM GameState Bool
isGameDone

To give you a quick test harness that runs all moves for both players you can run this:

def 
initBoard: {Board : Type u_1} → Board
initBoard
:
Board: Type u_1
Board
:=
Id.run: {α : Type u_1} → Id α → α
Id.run
do let mut
board: Id Board
board
:=
Error: unknown identifier 'HashMap.empty'
for
i: Nat
i
in [
0: Nat
0
:
3: Nat
3
] do for
j: Nat
j
in [
0: Nat
0
:
3: Nat
3
] do let
t: TileIndex
t
:
TileIndex: Type
TileIndex
:= (
i: Nat
i
,
j: Nat
j
)
board: Id Board
board
:=
Error: invalid field notation, type is not of the form (C ...) where C is a constant board has type Board
Error: invalid field 'insert', the environment does not contain 'Id.insert' board has type Id Board
board: Id Board
board
def
printBoard: {Board : Sort u_1} → Board → IO Unit
printBoard
(
board: Board
board
:
Board: Sort u_1
Board
) :
IO: Type → Type
IO
Unit: Type
Unit
:= do let mut
row: List String
row
:
List: Type → Type
List
String: Type
String
:=
[]: List String
[]
Error: failed to construct 'ForIn' instance for collection ?m.3750 and monad EIO IO.Error
Error: failed to construct 'ForIn' instance for collection ?m.3750 and monad EIO IO.Error
Error: invalid field notation, type is not of the form (C ...) where C is a constant board has type Board
Error: failed to construct 'ForIn' instance for collection ?m.3750 and monad EIO IO.Error
def
playGame: StateM GameState PUnit
playGame
:= do while
true: Bool
true
do let
finished: Bool
finished
nextTurn: StateM GameState Bool
nextTurn
if
finished: Bool
finished
then
return: StateM GameState (ForInStep (MProd (Option PUnit) PUnit))
return
def
main: IO Unit
main
:
IO: Type → Type
IO
Unit: Type
Unit
:= do let
gen: StdGen
gen
IO.stdGenRef: IO.Ref StdGen
IO.stdGenRef
.
get: {σ : Type} → {m : Type → Type} → [inst : MonadLiftT (ST σ) m] → {α : Type} → ST.Ref σ α → m α
get
let (
x: Nat
x
,
gen': StdGen
gen'
) :=
randNat: {gen : Type} → [inst : RandomGen gen] → gen → Nat → Nat → Nat × gen
randNat
gen: StdGen
gen
0: Nat
0
1: Nat
1
let
gs: GameState
gs
:= { board :=
Error: type mismatch initBoard has type ?m.4893 : Type ?u.4892 but is expected to have type Board : Prop
, currentPlayer := if
x: Nat
x
=
0: Nat
0
then
XPlayer: Player
XPlayer
else
OPlayer: Player
OPlayer
, generator :=
gen': StdGen
gen'
} let (_,
g: GameState
g
) :=
playGame: StateM GameState PUnit
playGame
|>.
run: {σ : Type} → {m : Type → Type} → {α : Type} → StateT σ m α → σ → m (α × σ)
run
gs: GameState
gs
printBoard: {Board : Prop} → Board → IO Unit
printBoard
g: GameState
g
.
board: GameState → ∀ {Board : Prop}, Board
board
Error: aborting evaluation since the expression depends on the 'sorry' axiom, which can lead to runtime instability and crashes. To attempt to evaluate anyway despite the risks, use the '#eval!' command.
main: IO Unit
main
-- [X, X, O] -- [X, O, O] -- [O, O, X]

Note that when you run the above code interactively the random number generator always starts in the same place. But if you run lean --run states.lean then you will see randomness in the result.

Implementation

It may be helpful to see how the StateM monad adds the input state and output state. If you look at the reduced Type for nextTurn:

StateM GameState Bool
StateM: Type → Type → Type
StateM
GameState: Type
GameState
Bool: Type
Bool
-- GameState → Bool × GameState

So a function like nextTurn that might have just returned a Bool has been modified by the StateM monad such that the initial GameState is passed in as a new input argument, and the output value has been changed to the pair Bool × GameState so that it can return the pure Bool and the updated GameState. So playGame then is automatically saving that updated game state so that each time around the while loop it is acting on the new state, otherwise that would be an infinite loop!

It is also interesting to see how much work the do and notation are doing for you. To implement the nextTurn function without these you would have to write this, manually plumbing the state all the way through:

def 
nextTurnManually: StateM GameState Bool
nextTurnManually
:
StateM: Type → Type → Type
StateM
GameState: Type
GameState
Bool: Type
Bool
|
state: GameState
state
=> let (
i: TileIndex
i
,
gs: GameState
gs
) :=
chooseRandomMove: StateM GameState TileIndex
chooseRandomMove
|>.
run: {σ : Type} → {m : Type → Type} → {α : Type} → StateT σ m α → σ → m (α × σ)
run
state: GameState
state
let (_,
gs': GameState
gs'
) :=
applyMove: TileIndex → StateM GameState Unit
applyMove
i: TileIndex
i
|>.
run: {σ : Type} → {m : Type → Type} → {α : Type} → StateT σ m α → σ → m (α × σ)
run
gs: GameState
gs
let (
result: Bool
result
,
gs'': GameState
gs''
) :=
isGameDone: StateM GameState Bool
isGameDone
|>.
run: {σ : Type} → {m : Type → Type} → {α : Type} → StateT σ m α → σ → m (α × σ)
run
gs': GameState
gs'
(
result: Bool
result
,
gs'': GameState
gs''
)

This expression let (i, gs) conveniently breaks a returned pair up into 2 variables. In the expression let (_, gs') we didn't care what the first value was so we used underscore. Notice that nextTurn is capturing the updated game state from chooseRandomMove in the variable gs, which it is then passing to applyMove which returns gs' which is passed to isGameDone and that function returns gs'' which we then return from nextTurnManually. Phew, what a lot of work you don't have to do when you use do notation!

StateM vs ReaderM

While ReaderM functions can use withReader to modify the context before calling another function, StateM functions are a little more powerful, let's look at this function again:

def nextTurn : StateM GameState Bool := do
  let i ← chooseRandomMove
  applyMove i
  isGameDone

In this function chooseRandomMove is modifying the state that applyMove is getting and chooseRandomMove knows nothing about applyMove. So StateM functions can have this kind of downstream effect outside their own scope, whereas, withReader cannot do that.

So there is no equivalent to withReader for StateM, besides you can always use the StateM set function to modify the state before calling the next function anyway. You could however, manually call a StateM function like you see in nextTurnManually and completely override the state at any point that way.

State, IO and other languages

When thinking about Lean, it is often seen as a restriction that you can't have global variables or static variables like you can with other languages like Python or C++. However, hopefully you see now this isn't true. You can have a data type with exactly the same functionality as a Python class. You would simply have many functions that can modify some global state using the StateM monad.

The difference is in Lean you simply put a label on these types of functions. You don't allow it to happen for free anywhere in an uncontrolled fashion because that results in too many sleepless nights debugging nasty code. You want to know when side effects can potentially happen, because knowing when they can happen makes your code easier to reason about. In a Python class, many of the methods won't actually need to modify the global state. But they could, which makes it harder to debug them. In Lean you can simply make these pure functions, and the compiler will ensure they stay pure and cannot modify any global state.

IO is the same way. It's not like you can't perform IO in Lean. Instead, you want to label the areas where you can, to increase your certainty about the areas where you don't need to. When you know part of your code cannot communicate with the outside world, you can be far more certain of its behavior.

The StateM monad is also a more disciplined way of managing side effects. Top level code could call a StateM function multiple times with different independent initial states, even doing that across multiple tasks in parallel and each of these cannot clobber the state belonging to other tasks. Monadic code is more predictable and reusable than code that uses global variables.

Summary

That wraps it up for the StateM monad! There is one more very useful monad that can be used to do exception handling which will be covered in the next section.

Except

The Except Monad adds exception handling behavior to your functions. Exception handling in other languages like Python or Java is done with a built in throw method that you can use anywhere. In Lean you can only throw an exception when your function is executing in the context of an Except monad.

def 
divide: Float → Float → Except String Float
divide
(
x: Float
x
y: Float
y
:
Float: Type
Float
):
Except: Type → Type → Type
Except
String: Type
String
Float: Type
Float
:= if
y: Float
y
==
0: Float
0
then
throw: {ε : outParam Type} → {m : Type → Type} → [self : MonadExcept ε m] → {α : Type} → ε → m α
throw
"can't divide by zero": String
"can't divide by zero"
else
pure: {f : Type → Type} → [self : Pure f] → {α : Type} → α → f α
pure
(
x: Float
x
/
y: Float
y
)
Except.ok 2.500000
divide: Float → Float → Except String Float
divide
5: Float
5
2: Float
2
-- Except.ok 2.500000
Except.error "can't divide by zero"
divide: Float → Float → Except String Float
divide
5: Float
5
0: Float
0
-- Except.error "can't divide by zero"

Just as the read operation was available from the ReaderM monad and the get and set operations came with the StateM monad, here you can see a throw operation is provided by the Except monad.

So in Lean, throw is not available everywhere like it is in most imperative programming languages. You have to declare your function can throw by changing the type signature to Except String Float. This creates a function that might return an error of type String or it might return a value of type Float in the non-error case.

Once your function is monadic you also need to use the pure constructor of the Except monad to convert the pure non-monadic value x / y into the required Except object. See Applicatives for details on pure.

Now this return typing would get tedious if you had to include it everywhere that you call this function, however, Lean type inference can clean this up. For example, you can define a test function that calls the divide function and you don't need to say anything here about the fact that it might throw an error, because that is inferred:

def 
test: Except String Float
test
:=
divide: Float → Float → Except String Float
divide
5: Float
5
0: Float
0
test : Except String Float
test: Except String Float
test
-- Except String Float

Notice the Lean compiler infers the required Except String Float type information for you. And now you can run this test and get the expected exception:

Except.error "can't divide by zero"
test: Except String Float
test
-- Except.error "can't divide by zero"

Chaining

Now as before you can build a chain of monadic actions that can be composed together using bind (>>=):

def 
square: Float → Except String Float
square
(
x: Float
x
:
Float: Type
Float
) :
Except: Type → Type → Type
Except
String: Type
String
Float: Type
Float
:= if
x: Float
x
>=
100: Float
100
then
throw: {ε : outParam Type} → {m : Type → Type} → [self : MonadExcept ε m] → {α : Type} → ε → m α
throw
"it's absolutely huge": String
"it's absolutely huge"
else
pure: {f : Type → Type} → [self : Pure f] → {α : Type} → α → f α
pure
(
x: Float
x
*
x: Float
x
)
Except.ok 9.000000
divide: Float → Float → Except String Float
divide
6: Float
6
2: Float
2
>>=
square: Float → Except String Float
square
-- Except.ok 9.000000
Except.error "can't divide by zero"
divide: Float → Float → Except String Float
divide
6: Float
6
0: Float
0
>>=
square: Float → Except String Float
square
-- Except.error "can't divide by zero"
Except.error "it's absolutely huge"
divide: Float → Float → Except String Float
divide
100: Float
100
1: Float
1
>>=
square: Float → Except String Float
square
-- Except.error "it's absolutely huge" def
chainUsingDoNotation: Except String Float
chainUsingDoNotation
:= do let
r: Float
r
divide: Float → Float → Except String Float
divide
6: Float
6
0: Float
0
square: Float → Except String Float
square
r: Float
r
Except.error "can't divide by zero"
chainUsingDoNotation: Except String Float
chainUsingDoNotation
-- Except.error "can't divide by zero"

Notice in the second divide 6 0 the exception from that division was nicely propagated along to the final result and the square function was ignored in that case. You can see why the square function was ignored if you look at the implementation of Except.bind:

def 
bind: {ε : Type u_1} → {α : Type u_2} → {β : Type u_3} → Except ε α → (α → Except ε β) → Except ε β
bind
(
ma: Except ε α
ma
:
Except: Type u_1 → Type u_2 → Type (max u_1 u_2)
Except
ε: Type u_1
ε
α: Type u_2
α
) (
f: α → Except ε β
f
:
α: Type u_2
α
Except: Type u_1 → Type u_3 → Type (max u_1 u_3)
Except
ε: Type u_1
ε
β: Type u_3
β
) :
Except: Type u_1 → Type u_3 → Type (max u_1 u_3)
Except
ε: Type u_1
ε
β: Type u_3
β
:= match
ma: Except ε α
ma
with |
Except.error: {ε : Type ?u.2428} → {α : Type ?u.2427} → ε → Except ε α
Except.error
err: ε
err
=>
Except.error: {ε : Type u_1} → {α : Type u_3} → ε → Except ε α
Except.error
err: ε
err
|
Except.ok: {ε : Type ?u.2454} → {α : Type ?u.2453} → α → Except ε α
Except.ok
v: α
v
=>
f: α → Except ε β
f
v: α
v

Specifically notice that it only calls the next function f v in the Except.ok, and in the error case it simply passes the same error along.

Remember also that you can chain the actions with implicit binding by using the do notation as you see in the chainUsingDoNotation function above.

Try/Catch

Now with all good exception handling you also want to be able to catch exceptions so your program can continue on or do some error recovery task, which you can do like this:

def 
testCatch: Except String String
testCatch
:= try let
r: Float
r
divide: Float → Float → Except String Float
divide
8: Float
8
0: Float
0
-- 'r' is type Float
pure: {f : Type → Type} → [self : Pure f] → {α : Type} → α → f α
pure
(
toString: {α : Type} → [self : ToString α] → α → String
toString
r: Float
r
) catch
e: String
e
=>
pure: {f : Type → Type} → [self : Pure f] → {α : Type} → α → f α
pure
s!"Caught exception: {
e: String
e
}"
testCatch : Except String String
testCatch: Except String String
testCatch
-- Except String String

Note that the type inferred by Lean for this function is Except String String so unlike the test function earlier, this time Lean type inference has figured out that since the pure value (toString r) is of type String, then this function must have type Except String String so you don't have to explicitly state this. You can always hover your mouse over testCatch or use #check testCatch to query Lean interactively to figure out what type inference has decided. Lean type inference makes life easy for you, so it's good to use it when you can.

You can now see the try/catch working in this eval:

Except.ok "Caught exception: can't divide by zero"
testCatch: Except String String
testCatch
-- Except.ok "Caught exception: can't divide by zero"

Notice the Caught exception: wrapped message is returned, and that it is returned as an Except.ok value, meaning testCatch eliminated the error result as expected.

So you've interleaved a new concept into your functions (exception handling) and the compiler is still able to type check everything just as well as it does for pure functions and it's been able to infer some things along the way to make it even easier to manage.

Now you might be wondering why testCatch doesn't infer the return type String? Lean does this as a convenience since you could have a rethrow in or after the catch block. If you really want to stop the Except type from bubbling up you can unwrap it like this:

def 
testUnwrap: String
testUnwrap
:
String: Type
String
:=
Id.run: {α : Type} → Id α → α
Id.run
do let
r: Except String Float
r
divide: Float → Float → Except String Float
divide
8: Float
8
0: Float
0
-- r is type Except String Float match
r: Except String Float
r
with |
.ok: {ε α : Type} → α → Except ε α
.ok
a: Float
a
=>
toString: {α : Type} → [self : ToString α] → α → String
toString
a: Float
a
-- 'a' is type Float |
.error: {ε α : Type} → ε → Except ε α
.error
e: String
e
=> s!"Caught exception: {
e: String
e
}"
testUnwrap : String
testUnwrap: String
testUnwrap
-- String
"Caught exception: can't divide by zero"
testUnwrap: String
testUnwrap
-- "Caught exception: can't divide by zero"

The Id.run function is a helper function that executes the do block and returns the result where Id is the identity monad. So Id.run do is a pattern you can use to execute monads in a function that is not itself monadic. This works for all monads except IO which, as stated earlier, you cannot invent out of thin air, you must use the IO monad given to your main function.

Monadic functions

You can also write functions that are designed to operate in the context of a monad. These functions typically end in upper case M like List.forM used below:

def 
validateList: List Nat → Nat → Except String Unit
validateList
(
x: List Nat
x
:
List: Type → Type
List
Nat: Type
Nat
) (
max: Nat
max
:
Nat: Type
Nat
):
Except: Type → Type → Type
Except
String: Type
String
Unit: Type
Unit
:= do
x: List Nat
x
.
forM: {m : Type → Type} → [inst : Monad m] → {α : Type} → List α → (α → m PUnit) → m PUnit
forM
fun
a: Nat
a
=> do if
a: Nat
a
>
max: Nat
max
then
throw: {ε : outParam Type} → {m : Type → Type} → [self : MonadExcept ε m] → {α : Type} → ε → m α
throw
"illegal value found in list": String
"illegal value found in list"
Except.ok ()
validateList: List Nat → Nat → Except String Unit
validateList
[
1: Nat
1
,
2: Nat
2
,
5: Nat
5
,
3: Nat
3
,
8: Nat
8
]
10: Nat
10
-- Except.ok ()
Except.error "illegal value found in list"
validateList: List Nat → Nat → Except String Unit
validateList
[
1: Nat
1
,
2: Nat
2
,
5: Nat
5
,
3: Nat
3
,
8: Nat
8
]
5: Nat
5
-- Except.error "illegal value found in list"

Notice here that the List.forM function passes the monadic context through to the inner function so it can use the throw function from the Except monad.

The List.forM function is defined like this where [Monad m] means "in the context of a monad m":

def 
forM: {m : Type u_1 → Type u_2} → {α : Type u_3} → [inst : Monad m] → List α → (α → m PUnit) → m PUnit
forM
[
Monad: (Type u_1 → Type u_2) → Type (max (u_1 + 1) u_2)
Monad
m: Type u_1 → Type u_2
m
] (
as: List α
as
:
List: Type u_3 → Type u_3
List
α: Type u_3
α
) (
f: α → m PUnit
f
:
α: Type u_3
α
m: Type u_1 → Type u_2
m
PUnit: Type u_1
PUnit
) :
m: Type u_1 → Type u_2
m
PUnit: Type u_1
PUnit
:= match
as: List α
as
with | [] =>
pure: {f : Type u_1 → Type u_2} → [self : Pure f] → {α : Type u_1} → α → f α
pure
⟨⟩: PUnit
⟨⟩
|
a: α
a
::
as: List α
as
=> do
f: α → m PUnit
f
a: α
a
;
List.forM: {m : Type u_1 → Type u_2} → [inst : Monad m] → {α : Type u_3} → List α → (α → m PUnit) → m PUnit
List.forM
as: List α
as
f: α → m PUnit
f

Summary

Now that you know all these different monad constructs, you might be wondering how you can combine them. What if there was some part of your state that you wanted to be able to modify (using the State monad), but you also needed exception handling. How can you get multiple monadic capabilities in the same function. To learn the answer, head to Monad Transformers.

Monad Transformers

In the previous sections you learned about some handy monads Option, IO, Reader, State and Except, and you now know how to make your function use one of these, but what you do not yet know is how to make your function use multiple monads at once.

For example, suppose you need a function that wants to access some Reader context and optionally throw an exception? This would require composition of two monads ReaderM and Except and this is what monad transformers are for.

A monad transformer is fundamentally a wrapper type. It is generally parameterized by another monadic type. You can then run actions from the inner monad, while adding your own customized behavior for combining actions in this new monad. The common transformers add T to the end of an existing monad name. You will find OptionT, ExceptT, ReaderT, StateT but there is no transformer for IO. So generally if you need IO it becomes the innermost wrapped monad.

In the following example we use ReaderT to provide some read only context to a function and this ReaderT transformer will wrap an Except monad. If all goes well the requiredArgument returns the value of a required argument and optionalSwitch returns true if the optional argument is present.

abbrev 
Arguments: Type
Arguments
:=
List: Type → Type
List
String: Type
String
def
indexOf?: {α : Type u_1} → [inst : BEq α] → List α → α → optParam Nat 0 → Option Nat
indexOf?
[
BEq: Type u_1 → Type u_1
BEq
α: Type u_1
α
] (
xs: List α
xs
:
List: Type u_1 → Type u_1
List
α: Type u_1
α
) (
s: α
s
:
α: Type u_1
α
) (
start: optParam Nat 0
start
:=
0: Nat
0
):
Option: Type → Type
Option
Nat: Type
Nat
:= match
xs: List α
xs
with | [] =>
none: {α : Type} → Option α
none
|
a: α
a
::
tail: List α
tail
=> if
a: α
a
==
s: α
s
then
some: {α : Type} → α → Option α
some
start: optParam Nat 0
start
else
indexOf?: {α : Type u_1} → [inst : BEq α] → List α → α → optParam Nat 0 → Option Nat
indexOf?
tail: List α
tail
s: α
s
(
start: optParam Nat 0
start
+
1: Nat
1
) def
requiredArgument: String → ReaderT Arguments (Except String) String
requiredArgument
(
name: String
name
:
String: Type
String
) :
ReaderT: Type → (Type → Type) → Type → Type
ReaderT
Arguments: Type
Arguments
(
Except: Type → Type → Type
Except
String: Type
String
)
String: Type
String
:= do let
args: Arguments
args
read: {ρ : outParam Type} → {m : Type → Type} → [self : MonadReader ρ m] → m ρ
read
let
value: String
value
:= match
indexOf?: {α : Type} → [inst : BEq α] → List α → α → optParam Nat 0 → Option Nat
indexOf?
args: Arguments
args
name: String
name
with |
some: {α : Type ?u.835} → α → Option α
some
i: Nat
i
=> if
i: Nat
i
+
1: Nat
1
<
args: Arguments
args
.
length: {α : Type} → List α → Nat
length
then
args: Arguments
args
[
i: Nat
i
+
1: Nat
1
]! else
"": String
""
|
none: {α : Type ?u.1106} → Option α
none
=>
"": String
""
if
value: String
value
==
"": String
""
then
throw: {ε : outParam Type} → {m : Type → Type} → [self : MonadExcept ε m] → {α : Type} → ε → m α
throw
throw s!"Command line argument {name} missing": ReaderT Arguments (Except String) String
throw s!"Command line argument {name} missing": ReaderT Arguments (Except String) String
s!
throw s!"Command line argument {name} missing": ReaderT Arguments (Except String) String
"Command line argument {
name: String
name
throw s!"Command line argument {name} missing": ReaderT Arguments (Except String) String
} missing"
return
value: String
value
def
optionalSwitch: String → ReaderT Arguments (Except String) Bool
optionalSwitch
(
name: String
name
:
String: Type
String
) :
ReaderT: Type → (Type → Type) → Type → Type
ReaderT
Arguments: Type
Arguments
(
Except: Type → Type → Type
Except
String: Type
String
)
Bool: Type
Bool
:= do let
args: Arguments
args
read: {ρ : outParam Type} → {m : Type → Type} → [self : MonadReader ρ m] → m ρ
read
return match (
indexOf?: {α : Type} → [inst : BEq α] → List α → α → optParam Nat 0 → Option Nat
indexOf?
args: Arguments
args
name: String
name
) with |
some: {α : Type ?u.1664} → α → Option α
some
_ =>
true: Bool
true
|
none: {α : Type ?u.1680} → Option α
none
=>
false: Bool
false
Except.ok "foo"
requiredArgument: String → ReaderT Arguments (Except String) String
requiredArgument
"--input": String
"--input"
|>.
run: {ρ : Type} → {m : Type → Type} → {α : Type} → ReaderT ρ m α → ρ → m α
run
[
"--input": String
"--input"
,
"foo": String
"foo"
] -- Except.ok "foo"
Except.error "Command line argument --input missing"
requiredArgument: String → ReaderT Arguments (Except String) String
requiredArgument
"--input": String
"--input"
|>.
run: {ρ : Type} → {m : Type → Type} → {α : Type} → ReaderT ρ m α → ρ → m α
run
[
"foo": String
"foo"
,
"bar": String
"bar"
] -- Except.error "Command line argument --input missing"
Except.ok true
optionalSwitch: String → ReaderT Arguments (Except String) Bool
optionalSwitch
"--help": String
"--help"
|>.
run: {ρ : Type} → {m : Type → Type} → {α : Type} → ReaderT ρ m α → ρ → m α
run
[
"--help": String
"--help"
] -- Except.ok true
Except.ok false
optionalSwitch: String → ReaderT Arguments (Except String) Bool
optionalSwitch
"--help": String
"--help"
|>.
run: {ρ : Type} → {m : Type → Type} → {α : Type} → ReaderT ρ m α → ρ → m α
run
[]: List String
[]
-- Except.ok false

Notice that throw was available from the inner Except monad. The cool thing is you can switch this around and get the exact same result using ExceptT as the outer monad transformer and ReaderM as the wrapped monad. Try changing requiredArgument to ExceptT String (ReaderM Arguments) Bool.

Note: the |>. notation is described in Readers.

Adding more layers

Here's the best part about monad transformers. Since the result of a monad transformer is itself a monad, you can wrap it inside another transformer! Suppose you need to pass in some read only context like the command line arguments, update some read-write state (like program Config) and optionally throw an exception, then you could write this:

structure 
Config: Type
Config
where
help: Config → Bool
help
:
Bool: Type
Bool
:=
false: Bool
false
verbose: Config → Bool
verbose
:
Bool: Type
Bool
:=
false: Bool
false
input: Config → String
input
:
String: Type
String
:=
"": String
""
deriving
Repr: Type u → Type u
Repr
abbrev
CliConfigM: Type → Type
CliConfigM
:=
StateT: Type → (Type → Type) → Type → Type
StateT
Config: Type
Config
(
ReaderT: Type → (Type → Type) → Type → Type
ReaderT
Arguments: Type
Arguments
(
Except: Type → Type → Type
Except
String: Type
String
)) def
parseArguments: CliConfigM Bool
parseArguments
:
CliConfigM: Type → Type
CliConfigM
Bool: Type
Bool
:= do let mut
config: Config
config
get: {σ : outParam Type} → {m : Type → Type} → [self : MonadState σ m] → m σ
get
if
(← optionalSwitch "--help"): Bool
(
optionalSwitch: String → ReaderT Arguments (Except String) Bool
optionalSwitch
(← optionalSwitch "--help"): Bool
"--help": String
"--help"
(← optionalSwitch "--help"): Bool
)
then
throw: {ε : outParam Type} → {m : Type → Type} → [self : MonadExcept ε m] → {α : Type} → ε → m α
throw
"Usage: example [--help] [--verbose] [--input <input file>]": String
"Usage: example [--help] [--verbose] [--input <input file>]"
config: Config
config
:= {
config: Config
config
with verbose :=
(← optionalSwitch "--verbose"): Bool
(
optionalSwitch: String → ReaderT Arguments (Except String) Bool
optionalSwitch
(← optionalSwitch "--verbose"): Bool
"--verbose": String
"--verbose"
(← optionalSwitch "--verbose"): Bool
)
, input :=
(← requiredArgument "--input"): String
(
requiredArgument: String → ReaderT Arguments (Except String) String
requiredArgument
(← requiredArgument "--input"): String
"--input": String
"--input"
(← requiredArgument "--input"): String
)
}
set: {σ : semiOutParam Type} → {m : Type → Type} → [self : MonadStateOf σ m] → σ → m PUnit
set
config: Config
config
return
true: Bool
true
def
main: List String → IO Unit
main
(
args: List String
args
:
List: Type → Type
List
String: Type
String
) :
IO: Type → Type
IO
Unit: Type
Unit
:= do let
config: Config
config
:
Config: Type
Config
:= { input :=
"default": String
"default"
} match
parseArguments: CliConfigM Bool
parseArguments
|>.
run: {σ : Type} → {m : Type → Type} → {α : Type} → StateT σ m α → σ → m (α × σ)
run
config: Config
config
|>.
run: {ρ : Type} → {m : Type → Type} → {α : Type} → ReaderT ρ m α → ρ → m α
run
args: List String
args
with |
Except.ok: {ε : Type ?u.5351} → {α : Type ?u.5350} → α → Except ε α
Except.ok
(_,
c: Config
c
) => do
IO.println: {α : Type} → [inst : ToString α] → α → IO Unit
IO.println
s!"Processing input '{
c: Config
c
.
input: Config → String
input
}' with verbose={
c: Config
c
.
verbose: Config → Bool
verbose
}" |
Except.error: {ε : Type ?u.5590} → {α : Type ?u.5589} → ε → Except ε α
Except.error
s: String
s
=>
IO.println: {α : Type} → [inst : ToString α] → α → IO Unit
IO.println
s: String
s
Usage: example [--help] [--verbose] [--input <input file>]
main: List String → IO Unit
main
[
"--help": String
"--help"
] -- Usage: example [--help] [--verbose] [--input <input file>]
Processing input 'foo' with verbose=false
main: List String → IO Unit
main
[
"--input": String
"--input"
,
"foo": String
"foo"
] -- Processing input file 'foo' with verbose=false
Processing input 'bar' with verbose=true
main: List String → IO Unit
main
[
"--verbose": String
"--verbose"
,
"--input": String
"--input"
,
"bar": String
"bar"
] -- Processing input 'bar' with verbose=true

In this example parseArguments is actually three stacked monads, StateM, ReaderM, Except. Notice the convention of abbreviating long monadic types with an alias like CliConfigM.

Monad Lifting

Lean makes it easy to compose functions that use different monads using a concept of automatic monad lifting. You already used lifting in the above code, because you were able to compose optionalSwitch which has type ReaderT Arguments (Except String) Bool and call it from parseArguments which has a bigger type StateT Config (ReaderT Arguments (Except String)). This "just worked" because Lean did some magic with monad lifting.

To give you a simpler example of this, suppose you have the following function:

def 
divide: Float → Float → ExceptT String Id Float
divide
(
x: Float
x
:
Float: Type
Float
) (
y: Float
y
:
Float: Type
Float
):
ExceptT: Type → (Type → Type) → Type → Type
ExceptT
String: Type
String
Id: Type → Type
Id
Float: Type
Float
:= if
y: Float
y
==
0: Float
0
then
throw: {ε : outParam Type} → {m : Type → Type} → [self : MonadExcept ε m] → {α : Type} → ε → m α
throw
"can't divide by zero": String
"can't divide by zero"
else
pure: {f : Type → Type} → [self : Pure f] → {α : Type} → α → f α
pure
(
x: Float
x
/
y: Float
y
)
Except.ok 2.000000
divide: Float → Float → ExceptT String Id Float
divide
6: Float
6
3: Float
3
-- Except.ok 2.000000
Except.error "can't divide by zero"
divide: Float → Float → ExceptT String Id Float
divide
1: Float
1
0: Float
0
-- Except.error "can't divide by zero"

Notice here we used the ExceptT transformer, but we composed it with the Id identity monad. This is then the same as writing Except String Float since the identity monad does nothing.

Now suppose you want to count the number of times divide is called and store the result in some global state:

def 
divideCounter: Float → Float → StateT Nat (ExceptT String Id) Float
divideCounter
(
x: Float
x
:
Float: Type
Float
) (
y: Float
y
:
Float: Type
Float
) :
StateT: Type → (Type → Type) → Type → Type
StateT
Nat: Type
Nat
(
ExceptT: Type → (Type → Type) → Type → Type
ExceptT
String: Type
String
Id: Type → Type
Id
)
Float: Type
Float
:= do
modify: {σ : Type} → {m : Type → Type} → [inst : MonadState σ m] → (σ → σ) → m PUnit
modify
fun
s: Nat
s
=>
s: Nat
s
+
1: Nat
1
divide: Float → Float → ExceptT String Id Float
divide
x: Float
x
y: Float
y
Except.ok (2.000000, 1)
divideCounter: Float → Float → StateT Nat (ExceptT String Id) Float
divideCounter
6: Float
6
3: Float
3
|>.
run: {σ : Type} → {m : Type → Type} → {α : Type} → StateT σ m α → σ → m (α × σ)
run
0: Nat
0
-- Except.ok (2.000000, 1)
Except.error "can't divide by zero"
divideCounter: Float → Float → StateT Nat (ExceptT String Id) Float
divideCounter
1: Float
1
0: Float
0
|>.
run: {σ : Type} → {m : Type → Type} → {α : Type} → StateT σ m α → σ → m (α × σ)
run
0: Nat
0
-- Except.error "can't divide by zero"

The modify function is a helper which makes it easier to use modifyGet from the StateM monad. But something interesting is happening here, divideCounter is returning the value of divide, but the types don't match, yet it works? This is monad lifting in action.

You can see this more clearly with the following test:

def 
liftTest: Except String Float → StateT Nat (Except String) Float
liftTest
(
x: Except String Float
x
:
Except: Type → Type → Type
Except
String: Type
String
Float: Type
Float
) :
StateT: Type → (Type → Type) → Type → Type
StateT
Nat: Type
Nat
(
Except: Type → Type → Type
Except
String: Type
String
)
Float: Type
Float
:=
x: Except String Float
x
Except.ok (5.000000, 3)
liftTest: Except String Float → StateT Nat (Except String) Float
liftTest
(
divide: Float → Float → ExceptT String Id Float
divide
5: Float
5
1: Float
1
) |>.
run: {σ : Type} → {m : Type → Type} → {α : Type} → StateT σ m α → σ → m (α × σ)
run
3: Nat
3
-- Except.ok (5.000000, 3)

Notice that liftTest returned x without doing anything to it, yet that matched the return type StateT Nat (Except String) Float. Monad lifting is provided by monad transformers. if you #print liftTest you will see that Lean is implementing this using a call to a function named monadLift from the MonadLift type class:

class MonadLift (m : Type u → Type v) (n : Type u → Type w) where
  monadLift : {α : Type u} → m α → n α

So monadLift is a function for lifting a computation from an inner Monad m α to an outer Monad n α. You could replace x in liftTest with monadLift x if you want to be explicit about it.

The StateT monad transformer defines an instance of MonadLift like this:

@[inline] protected def lift {α : Type u} (t : m α) : StateT σ m α :=
  fun s => do let a ← t; pure (a, s)

instance : MonadLift m (StateT σ m) := ⟨StateT.lift⟩

This means that any monad m can be wrapped in a StateT monad by using the function fun s => do let a ← t; pure (a, s) that takes state s, runs the inner monad action t, and returns the result and the new state in a pair (a, s) without making any changes to s.

Because MonadLift is a type class, Lean can automatically find the required monadLift instances in order to make your code compile and in this way it was able to find the StateT.lift function and use it to wrap the result of divide so that the correct type is returned from divideCounter.

If you have an instance MonadLift m n that means there is a way to turn a computation that happens inside of m into one that happens inside of n and (this is the key part) usually without the instance itself creating any additional data that feeds into the computation. This means you can in principle declare lifting instances from any monad to any other monad, it does not, however, mean that you should do this in all cases. You can get a very nice report on how all this was done by adding the line set_option trace.Meta.synthInstance true in before divideCounter and moving you cursor to the end of the first line after do.

This was a lot of detail, but it is very important to understand how monad lifting works because it is used heavily in Lean programs.

Transitive lifting

There is also a transitive version of MonadLift called MonadLiftT which can lift multiple monad layers at once. In the following example we added another monad layer with ReaderT String ... and notice that x is also automatically lifted to match.

def 
liftTest2: Except String Float → ReaderT String (StateT Nat (Except String)) Float
liftTest2
(
x: Except String Float
x
:
Except: Type → Type → Type
Except
String: Type
String
Float: Type
Float
) :
ReaderT: Type → (Type → Type) → Type → Type
ReaderT
String: Type
String
(
StateT: Type → (Type → Type) → Type → Type
StateT
Nat: Type
Nat
(
Except: Type → Type → Type
Except
String: Type
String
))
Float: Type
Float
:=
x: Except String Float
x
Except.ok (5.000000, 3)
liftTest2: Except String Float → ReaderT String (StateT Nat (Except String)) Float
liftTest2
(
divide: Float → Float → ExceptT String Id Float
divide
5: Float
5
1: Float
1
) |>.
run: {ρ : Type} → {m : Type → Type} → {α : Type} → ReaderT ρ m α → ρ → m α
run
"": String
""
|>.
run: {σ : Type} → {m : Type → Type} → {α : Type} → StateT σ m α → σ → m (α × σ)
run
3: Nat
3
-- Except.ok (5.000000, 3)

The ReaderT monadLift is even simpler than the one for StateT:

instance  : MonadLift m (ReaderT ρ m) where
  monadLift x := fun _ => x

This lift operation creates a function that defines the required ReaderT input argument, but the inner monad doesn't know or care about ReaderT so the monadLift function throws it away with the _ then calls the inner monad action x. This is a perfectly legal implementation of the ReaderM monad.

Add your own Custom MonadLift

This does not compile:

def 
main2: IO Unit
main2
:
IO: Type → Type
IO
Unit: Type
Unit
:= do try let
ret: ?m.8397
ret
Error: type mismatch (divideCounter 5 2).run 0 has type ExceptT String Id (Float × Nat) : Type but is expected to have type IO ?m.8428 : Type
IO.println: {α : Type} → [inst : ToString α] → α → IO Unit
IO.println
(
toString: {α : Type} → [self : ToString α] → α → String
toString
ret: ?m.8397
ret
) catch
e: IO.Error
e
=>
IO.println: {α : Type} → [inst : ToString α] → α → IO Unit
IO.println
e: IO.Error
e

saying:

typeclass instance problem is stuck, it is often due to metavariables
  ToString ?m.4786

The reason is divideCounter returns the big StateT Nat (ExceptT String Id) Float and that type cannot be automatically lifted into the main return type of IO Unit unless you give it some help.

The following custom MonadLift solves this problem:

def 
liftIO: {α : Type} → ExceptT String Id α → IO α
liftIO
(
t: ExceptT String Id α
t
:
ExceptT: Type → (Type → Type) → Type → Type
ExceptT
String: Type
String
Id: Type → Type
Id
α: Type
α
) :
IO: Type → Type
IO
α: Type
α
:= do match
t: ExceptT String Id α
t
with |
.ok: {ε α : Type} → α → Except ε α
.ok
r: α
r
=>
EStateM.Result.ok: {ε σ α : Type} → α → σ → EStateM.Result ε σ α
EStateM.Result.ok
r: α
r
|
.error: {ε α : Type} → ε → Except ε α
.error
s: String
s
=>
EStateM.Result.error: {ε σ α : Type} → ε → σ → EStateM.Result ε σ α
EStateM.Result.error
s: String
s
instance: MonadLift (ExceptT String Id) IO
instance
:
MonadLift: semiOutParam (Type → Type) → (Type → Type) → Type 1
MonadLift
(
ExceptT: Type → (Type → Type) → Type → Type
ExceptT
String: Type
String
Id: Type → Type
Id
)
IO: Type → Type
IO
where monadLift :=
liftIO: {α : Type} → ExceptT String Id α → IO α
liftIO
def
main3: IO Unit
main3
:
IO: Type → Type
IO
Unit: Type
Unit
:= do try let
ret: Float × Nat
ret
divideCounter: Float → Float → StateT Nat (ExceptT String Id) Float
divideCounter
5: Float
5
2: Float
2
|>.
run: {σ : Type} → {m : Type → Type} → {α : Type} → StateT σ m α → σ → m (α × σ)
run
0: Nat
0
IO.println: {α : Type} → [inst : ToString α] → α → IO Unit
IO.println
(
toString: {α : Type} → [self : ToString α] → α → String
toString
ret: Float × Nat
ret
) catch
e: IO.Error
e
=>
IO.println: {α : Type} → [inst : ToString α] → α → IO Unit
IO.println
e: IO.Error
e
(2.500000, 1)
main3: IO Unit
main3
-- (2.500000, 1)

It turns out that the IO monad you see in your main function is based on the EStateM.Result type which is similar to the Except type but it has an additional return value. The liftIO function converts any Except String α into IO α by simply mapping the ok case of the Except to the Result.ok and the error case to the Result.error.

Lifting ExceptT

In the previous Except section you saw functions that throw Except values. When you get all the way back up to your main function which has type IO Unit you have the same problem you had above, because Except String Float doesn't match even if you use a try/catch.

def 
main4: IO Unit
main4
:
IO: Type → Type
IO
Unit: Type
Unit
:= do try let
ret: Float
ret
divide: Float → Float → ExceptT String Id Float
divide
5: Float
5
0: Float
0
IO.println: {α : Type} → [inst : ToString α] → α → IO Unit
IO.println
(
toString: {α : Type} → [self : ToString α] → α → String
toString
ret: Float
ret
) -- lifting happens here. catch
e: IO.Error
e
=>
IO.println: {α : Type} → [inst : ToString α] → α → IO Unit
IO.println
s!"Unhandled exception: {
e: IO.Error
e
}"
Unhandled exception: can't divide by zero
main4: IO Unit
main4
-- Unhandled exception: can't divide by zero

Without the liftIO the (toString ret) expression would not compile with a similar error:

typeclass instance problem is stuck, it is often due to metavariables
  ToString ?m.6007

So the general lesson is that if you see an error like this when using monads, check for a missing MonadLift.

Summary

Now that you know how to combine your monads together, you're almost done with understanding the key concepts of monads! You could probably go out now and start writing some pretty nice code! But to truly master monads, you should know how to make your own, and there's one final concept that you should understand for that. This is the idea of type "laws". Each of the structures you've learned so far has a series of laws associated with it. And for your instances of these classes to make sense, they should follow the laws! Check out Monad Laws.

Monad Laws

In the previous sections you learned how to use Functors, Applicatives, and Monads, and you played with some useful instances including Option, IO, Reader, State and Except and you learned about composition using Monad Transformers.

So far, you've learned the concrete details you need in order to use monads in your Lean programs. But there's still one more important concept you need if you want to create new functors, applicatives and monads. Namely, the notion of structural "laws" -- rules that these type classes should follow in order to meet other programmers' expectations about your code.

Life without Laws

Remember Lean represents each of these abstract structures by a type class. Each of these type classes has one or two main functions. So, as long as you implement those functions and it type checks, you have a new functor, applicative, or monad, right?

Well not quite. Yes, your program will compile and you'll be able to use the instances. But this doesn't mean your instances follow the mathematical constructs. If they don't, your instances won't fulfill other programmers' expectations. Each type class has its own "laws". For instance, suppose you have the following Point Functor:

structure 
Point: Type → Type
Point
(
α: Type
α
:
Type: Type 1
Type
) where
x: {α : Type} → Point α → α
x
:
α: Type
α
y: {α : Type} → Point α → α
y
:
α: Type
α
deriving
Repr: Type u → Type u
Repr
,
BEq: Type u → Type u
BEq
def
Point.map: {α β : Type} → (α → β) → Point α → Point β
Point.map
(
f: α → β
f
:
α: Type
α
β: Type
β
) (
s: Point α
s
:
Point: Type → Type
Point
α: Type
α
) :
Point: Type → Type
Point
β: Type
β
:= { x :=
f: α → β
f
s: Point α
s
.
y: {α : Type} → Point α → α
y
, -- an example of something weird y :=
f: α → β
f
s: Point α
s
.
x: {α : Type} → Point α → α
x
}
instance: Functor Point
instance
:
Functor: (Type → Type) → Type 1
Functor
Point: Type → Type
Point
where map :=
Point.map: {α β : Type} → (α → β) → Point α → Point β
Point.map
{ x := 4, y := 3 }
(·+2): Nat → Nat
(·+2)
<$> (
Point.mk: {α : Type} → α → α → Point α
Point.mk
1: Nat
1
2: Nat
2
) -- { x := 4, y := 3 }

This Point does something weird, when you map it because it transposes the x and y coordinates which is not what other people would expect from a map function. In fact, it breaks the rules as you will see below.

What are the Functor laws?

Functors have two laws: the identity law, and the composition law. These laws express behaviors that your functor instances should follow. If they don't, other programmers will be very confused at the effect your instances have on their program.

The identity law says that if you "map" the identity function (id) over your functor, the resulting functor should be the same. A succinct way of showing this on a List functor is:

def 
list1: List Nat
list1
:= [
1: Nat
1
,
2: Nat
2
,
3: Nat
3
]
true
id: {α : Type} → α → α
id
<$>
list1: List Nat
list1
==
list1: List Nat
list1
-- true

Now let's try the same test on the Point functor:

def 
p1: Point Nat
p1
:
Point: Type → Type
Point
Nat: Type
Nat
:= (
Point.mk: {α : Type} → α → α → Point α
Point.mk
1: Nat
1
2: Nat
2
)
false
id: {α : Type} → α → α
id
<$>
p1: Point Nat
p1
==
p1: Point Nat
p1
-- false

Oh, and look while the List is behaving well, the Point functor fails this identity test.

The composition law says that if you "map" two functions in succession over a functor, this should be the same as "composing" the functions and simply mapping that one super-function over the functor. In Lean you can compose two functions using Function.comp f g (or the syntax f ∘ g, which you can type in VS code using \o ) and you will get the same results from both of these showing that the composition law holds for List Nat:

def 
double: Nat → Nat
double
(
x: Nat
x
:
Nat: Type
Nat
) :=
x: Nat
x
+
x: Nat
x
def
square: Nat → Nat
square
(
x: Nat
x
:
Nat: Type
Nat
) :=
x: Nat
x
*
x: Nat
x
[2, 8, 18]
double: Nat → Nat
double
<$> (
square: Nat → Nat
square
<$>
list1: List Nat
list1
) -- [2, 8, 18]
true
(
double: Nat → Nat
double
<$> (
square: Nat → Nat
square
<$>
list1: List Nat
list1
)) == ((
double: Nat → Nat
double
square: Nat → Nat
square
) <$>
list1: List Nat
list1
) -- true -- ok, what about the Point class?
{ x := 2, y := 8 }
double: Nat → Nat
double
<$> (
square: Nat → Nat
square
<$>
p1: Point Nat
p1
) -- { x := 2, y := 8 }
{ x := 8, y := 2 }
(
double: Nat → Nat
double
square: Nat → Nat
square
) <$>
p1: Point Nat
p1
-- { x := 8, y := 2 }
false
double: Nat → Nat
double
<$> (
square: Nat → Nat
square
<$>
p1: Point Nat
p1
) == (
double: Nat → Nat
double
square: Nat → Nat
square
) <$>
p1: Point Nat
p1
-- false

Note that composition also fails on the bad Point because the x/y transpose.

As you can see this bad Point implementation violates both of the functor laws. In this case it would not be a true functor. Its behavior would confuse any other programmers trying to use it. You should take care to make sure that your instances make sense. Once you get a feel for these type classes, the likelihood is that the instances you'll create will follow the laws.

You can also write a bad functor that passes one law but not the other like this:

def 
bad_option_map: {α β : Type u} → (α → β) → Option α → Option β
bad_option_map
{
α: Type u
α
β: Type u
β
:
Type u: Type (u + 1)
Type u
} : (
α: Type u
α
β: Type u
β
)
Option: Type u → Type u
Option
α: Type u
α
Option: Type u → Type u
Option
β: Type u
β
| _, _ =>
none: {α : Type u} → Option α
none
instance: Functor Option
instance
:
Functor: (Type u_1 → Type u_1) → Type (u_1 + 1)
Functor
Option: Type u_1 → Type u_1
Option
where map :=
bad_option_map: {α β : Type u_1} → (α → β) → Option α → Option β
bad_option_map
def
t1: Option Nat
t1
:
Option: Type → Type
Option
Nat: Type
Nat
:=
some: {α : Type} → α → Option α
some
10: Nat
10
false
id: {α : Type} → α → α
id
<$>
t1: Option Nat
t1
==
t1: Option Nat
t1
-- false
true
double: Nat → Nat
double
<$> (
square: Nat → Nat
square
<$>
t1: Option Nat
t1
) == (
double: Nat → Nat
double
square: Nat → Nat
square
) <$>
t1: Option Nat
t1
-- true

This fails the id law but obeys the composition law. Hopefully this explains the value of these laws, and you don't need to see any more bad examples!

What are the Applicative Laws?

While functors have two laws, applicatives have four laws:

  • Identity
  • Homomorphism
  • Interchange
  • Composition

Identity

pure id <*> v = v

Applying the identity function through an applicative structure should not change the underlying values or structure. For example:

instance: Applicative List
instance
:
Applicative: (Type u_1 → Type u_1) → Type (u_1 + 1)
Applicative
List: Type u_1 → Type u_1
List
where pure :=
List.singleton: {α : Type u_1} → α → List α
List.singleton
seq
f: List (α✝ → β✝)
f
x: Unit → List α✝
x
:=
List.flatMap: {α β : Type u_1} → List α → (α → List β) → List β
List.flatMap
f: List (α✝ → β✝)
f
fun
y: α✝ → β✝
y
=>
Functor.map: {f : Type u_1 → Type u_1} → [self : Functor f] → {α β : Type u_1} → (α → β) → f α → f β
Functor.map
y: α✝ → β✝
y
(
x: Unit → List α✝
x
(): Unit
()
)
[1, 2, 3]
pure: {f : Type → Type} → [self : Pure f] → {α : Type} → α → f α
pure
id: {α : Type} → α → α
id
<*> [
1: Nat
1
,
2: Nat
2
,
3: Nat
3
] -- [1, 2, 3]

The pure id statement here is wrapping the identity function in an applicative structure so that you can apply that over the container [1, 2, 3] using the Applicative seq operation which has the notation <*>.

To prove this for all values v and any applicative m you can write this theorem:

example: ∀ {m : Type u_1 → Type u_2} {α : Type u_1} [inst : Applicative m] [inst_1 : LawfulApplicative m] (v : m α), pure id <*> v = v
example
[
Applicative: (Type u_1 → Type u_2) → Type (max (u_1 + 1) u_2)
Applicative
m: Type u_1 → Type u_2
m
] [
LawfulApplicative: (f : Type u_1 → Type u_2) → [inst : Applicative f] → Prop
LawfulApplicative
m: Type u_1 → Type u_2
m
] (
v: m α
v
:
m: Type u_1 → Type u_2
m
α: Type u_1
α
) :
pure: {f : Type u_1 → Type u_2} → [self : Pure f] → {α : Type u_1} → α → f α
pure
id: {α : Type u_1} → α → α
id
<*>
v: m α
v
=
v: m α
v
:=

Goals accomplished! 🐙

Goals accomplished! 🐙
-- Goals accomplished 🎉

Homomorphism

pure f <*> pure x = pure (f x)

Suppose you wrap a function and an object in pure. You can then apply the wrapped function over the wrapped object. Of course, you could also apply the normal function over the normal object, and then wrap it in pure. The homomorphism law states these results should be the same.

For example:

def x := 1
def f := (· + 2)

true
pure: {f : Type → Type} → [self : Pure f] → {α : Type} → α → f α
pure
f: Nat → Nat
f
<*>
pure: {f : Type → Type} → [self : Pure f] → {α : Type} → α → f α
pure
x: Nat
x
= (
pure: {f : Type → Type} → [self : Pure f] → {α : Type} → α → f α
pure
(
f: Nat → Nat
f
x: Nat
x
) :
List: Type → Type
List
Nat: Type
Nat
) -- true

You should see a distinct pattern here. The overriding theme of almost all these laws is that these Applicative types should behave like normal containers. The Applicative functions should not have any side effects. All they should do is facilitate the wrapping, unwrapping, and transformation of data contained in the container resulting in a new container that has the same structure.

Interchange

u <*> pure y = pure (. y) <*> u.

This law is a little more complicated, so don't sweat it too much. It states that the order that you wrap things shouldn't matter. One the left, you apply any applicative u over a pure wrapped object. On the right, you first wrap a function applying the object as an argument. Note that (· y) is short hand for: fun f => f y. Then you apply this to the first applicative u. These should be the same.

For example:

def 
y: Nat
y
:=
4: Nat
4
def
g: List (Nat → Nat)
g
:
List: Type → Type
List
(
Nat: Type
Nat
Nat: Type
Nat
) := [
(· + 2): Nat → Nat
(· + 2)
]
true
g: List (Nat → Nat)
g
<*>
pure: {f : Type → Type} → [self : Pure f] → {α : Type} → α → f α
pure
y: Nat
y
=
pure: {f : Type → Type} → [self : Pure f] → {α : Type} → α → f α
pure
(·
y: Nat
y
) <*>
g: List (Nat → Nat)
g
-- true

You can prove this with the following theorem:

example: ∀ {m : Type u_1 → Type u_2} {α β : Type u_1} [inst : Applicative m] [inst_1 : LawfulApplicative m] (u : m (α → β)) (y : α), u <*> pure y = (pure fun x => x y) <*> u
example
[
Applicative: (Type u_1 → Type u_2) → Type (max (u_1 + 1) u_2)
Applicative
m: Type u_1 → Type u_2
m
] [
LawfulApplicative: (f : Type u_1 → Type u_2) → [inst : Applicative f] → Prop
LawfulApplicative
m: Type u_1 → Type u_2
m
] (
u: m (α → β)
u
:
m: Type u_1 → Type u_2
m
(
α: Type u_1
α
β: Type u_1
β
)) (
y: α
y
:
α: Type u_1
α
) :
u: m (α → β)
u
<*>
pure: {f : Type u_1 → Type u_2} → [self : Pure f] → {α : Type u_1} → α → f α
pure
y: α
y
=
pure: {f : Type u_1 → Type u_2} → [self : Pure f] → {α : Type u_1} → α → f α
pure
(·
y: α
y
) <*>
u: m (α → β)
u
:=

Goals accomplished! 🐙

Goals accomplished! 🐙
-- Goals accomplished 🎉

Composition:

u <*> v <*> w = u <*> (v <*> w)

This final applicative law mimics the second functor law. It is a composition law. It states that function composition holds across applications within the applicative:

For example:

def u := [1, 2]
def v := [3, 4]
def w := [5, 6]

[9, 10, 10, 11, 10, 11, 11, 12]
pure: {f : Type → Type} → [self : Pure f] → {α : Type} → α → f α
pure
(·+·+·): Nat → Nat → Nat → Nat
(·+·+·)
<*>
u: List Nat
u
<*>
v: List Nat
v
<*>
w: List Nat
w
-- [9, 10, 10, 11, 10, 11, 11, 12]
[9, 10, 10, 11, 10, 11, 11, 12]
let
grouping: List Nat
grouping
:=
pure: {f : Type → Type} → [self : Pure f] → {α : Type} → α → f α
pure
(·+·): Nat → Nat → Nat
(·+·)
<*>
v: List Nat
v
<*>
w: List Nat
w
pure: {f : Type → Type} → [self : Pure f] → {α : Type} → α → f α
pure
(·+·): Nat → Nat → Nat
(·+·)
<*>
u: List Nat
u
<*>
grouping: List Nat
grouping
-- [9, 10, 10, 11, 10, 11, 11, 12]

To test composition you see the separate grouping (v <*> w) then that can be used in the outer sequence u <*> grouping to get the same final result [9, 10, 10, 11, 10, 11, 11, 12].

What are the Monad Laws?

Monads have three laws:

  • Left Identity
  • Right Identity
  • Associativity

Left Identity

Identity laws for monads specify that pure by itself shouldn't really change anything about the structure or its values.

Left identity is x >>= pure = x and is demonstrated by the following examples on a monadic List:

instance: Monad List
instance
:
Monad: (Type u_1 → Type u_1) → Type (u_1 + 1)
Monad
List: Type u_1 → Type u_1
List
where pure :=
List.singleton: {α : Type u_1} → α → List α
List.singleton
bind :=
List.flatMap: {α β : Type u_1} → List α → (α → List β) → List β
List.flatMap
def
a: List String
a
:= [
"apple": String
"apple"
,
"orange": String
"orange"
]
["apple", "orange"]
a: List String
a
>>=
pure: {f : Type → Type} → [self : Pure f] → {α : Type} → α → f α
pure
-- ["apple", "orange"]
true
a: List String
a
>>=
pure: {f : Type → Type} → [self : Pure f] → {α : Type} → α → f α
pure
=
a: List String
a
-- true

Right Identity

Right identity is pure x >>= f = f x and is demonstrated by the following example:

def 
h: Nat → Option Nat
h
(
x: Nat
x
:
Nat: Type
Nat
) :
Option: Type → Type
Option
Nat: Type
Nat
:=
some: {α : Type} → α → Option α
some
(
x: Nat
x
+
1: Nat
1
) def
z: Nat
z
:=
5: Nat
5
some 6
pure: {f : Type → Type} → [self : Pure f] → {α : Type} → α → f α
pure
z: Nat
z
>>=
h: Nat → Option Nat
h
-- some 6
some 6
h: Nat → Option Nat
h
z: Nat
z
-- some 6
true
pure: {f : Type → Type} → [self : Pure f] → {α : Type} → α → f α
pure
z: Nat
z
>>=
h: Nat → Option Nat
h
=
h: Nat → Option Nat
h
z: Nat
z
-- true

So in this example, with this specific z and h, you see that the rule holds true.

Associativity

The associativity law is written as:

  x >>= f >>= g = x >>= (λ x => f x >>= g)

where (x : m α) and (f : α → m β) and (g : β → m γ).

The associativity law is difficult to parse like some of the applicative laws, but what it is saying is that if you change the grouping of bind operations, you should still get the same result. This law has a parallel structure to the other composition laws.

You can see this in action in the following rewrite of runOptionFuncsBind from monads:

def 
optionFunc1: String → Option Nat
optionFunc1
:
String: Type
String
->
Option: Type → Type
Option
Nat: Type
Nat
|
"": String
""
=>
none: {α : Type} → Option α
none
|
str: String
str
=>
some: {α : Type} → α → Option α
some
str: String
str
.
length: String → Nat
length
def
optionFunc2: Nat → Option Float
optionFunc2
(
i: Nat
i
:
Nat: Type
Nat
) :
Option: Type → Type
Option
Float: Type
Float
:= if
i: Nat
i
%
2: Nat
2
==
0: Nat
0
then
none: {α : Type} → Option α
none
else
some: {α : Type} → α → Option α
some
(
i: Nat
i
.
toFloat: Nat → Float
toFloat
*
3.14159: Float
3.14159
) def
optionFunc3: Float → Option (List Nat)
optionFunc3
(
f: Float
f
:
Float: Type
Float
) :
Option: Type → Type
Option
(
List: Type → Type
List
Nat: Type
Nat
) := if
f: Float
f
>
15.0: Float
15.0
then
none: {α : Type} → Option α
none
else
some: {α : Type} → α → Option α
some
[
f: Float
f
.
floor: Float → Float
floor
.
toUInt32: Float → UInt32
toUInt32
.
toNat: UInt32 → Nat
toNat
,
f: Float
f
.
ceil: Float → Float
ceil
.
toUInt32: Float → UInt32
toUInt32
.
toNat: UInt32 → Nat
toNat
] def
runOptionFuncsBind: String → Option (List Nat)
runOptionFuncsBind
(
input: String
input
:
String: Type
String
) :
Option: Type → Type
Option
(
List: Type → Type
List
Nat: Type
Nat
) :=
optionFunc1: String → Option Nat
optionFunc1
input: String
input
>>=
optionFunc2: Nat → Option Float
optionFunc2
>>=
optionFunc3: Float → Option (List Nat)
optionFunc3
def
runOptionFuncsBindGrouped: String → Option (List Nat)
runOptionFuncsBindGrouped
(
input: String
input
:
String: Type
String
) :
Option: Type → Type
Option
(
List: Type → Type
List
Nat: Type
Nat
) :=
optionFunc1: String → Option Nat
optionFunc1
input: String
input
>>= (λ
x: Nat
x
=>
optionFunc2: Nat → Option Float
optionFunc2
x: Nat
x
>>=
optionFunc3: Float → Option (List Nat)
optionFunc3
)
some [9, 10]
runOptionFuncsBind: String → Option (List Nat)
runOptionFuncsBind
"big": String
"big"
-- some [9, 10]
some [9, 10]
runOptionFuncsBindGrouped: String → Option (List Nat)
runOptionFuncsBindGrouped
"big": String
"big"
-- some [9, 10]

Notice here we had to insert a λ function just like the definition says: (λ x => f x >>= g). This is because unlike applicatives, you can't resolve the structure of later operations without the results of earlier operations quite as well because of the extra context monads provide. But you can still group their later operations into composite functions taking their inputs from earlier on, and the result should be the same.

Summary

While these laws may be a bit difficult to understand just by looking at them, the good news is that most of the instances you'll make will naturally follow the laws so long as you keep it simple, so you shouldn't have to worry about them too much.

There are two main ideas from all the laws:

  1. Applying the identity or pure function should not change the underlying values or structure.
  2. It should not matter what order you group operations in. Another way to state this is function composition should hold across your structures.

Following these laws will ensure other programmers are not confused by the behavior of your new functors, applicatives and monads.

Frequently Asked Questions

What is Lean?

Lean is a new open source theorem prover being developed at Microsoft Research. It is a research project that aims to bridge the gap between interactive and automated theorem proving. Lean can be also used as a programming language. Actually, some Lean features are implemented in Lean itself.

Should I use Lean?

Lean is under heavy development, and we are constantly trying new ideas and tweaking the system. It is a research project and not a product. Things change rapidly, and we constantly break backward compatibility. Lean comes "as is", you should not expect we will fix bugs and/or add new features for your project. We have our own priorities, and will not change them to accommodate your needs. Even if you implement a new feature or fix a bug, we may not want to merge it because it may conflict with our plans for Lean, it may not be performant, we may not want to maintain it, we may be busy, etc. If you really need this new feature or bug fix, we suggest you create your own fork and maintain it yourself.

Where is the documentation?

This is the Lean 4 manual. It is a work in progress, but it will eventually cover the whole language. A public and very active chat room dedicated to Lean is open on Zulip. It is a good place to interact with other Lean users.

Should I use Lean to teach a course?

Lean has been used to teach courses on logic, type theory and programming languages at CMU and the University of Washington. The lecture notes for the CMU course Logic and Proof are available online, but they are for Lean 3. If you decide to teach a course using Lean, we suggest you prepare all material before the beginning of the course, and make sure that Lean attends all your needs. You should not expect we will fix bugs and/or add features needed for your course.

Are there IDEs for Lean?

Yes, see Setting Up Lean.

Is Lean sound? How big is the kernel? Should I trust it?

Lean has a relatively small kernel. Several independent checkers have been implemented for Lean 3. Two of them are tc and trepplein. We expect similar independent checkers will be built for Lean 4.

Should I open a new issue?

We use GitHub to track bugs and new features. Bug reports are always welcome, but nitpicking issues are not (e.g., the error message is confusing). See also our contribution guidelines.

Is it Lean, LEAN, or L∃∀N?

We always use "Lean" in writing. When specifying a major version number, we append it together with a single space: Lean 4.

Significant changes from Lean 3

Lean 4 is not backward compatible with Lean 3. We have rewritten most of the system, and took the opportunity to cleanup the syntax, metaprogramming framework, and elaborator. In this section, we go over the most significant changes.

Lambda expressions

We do not use , anymore to separate the binders from the lambda expression body. The Lean 3 syntax for lambda expressions was unconventional, and , has been overused in Lean 3. For example, we believe a list of lambda expressions is quite confusing in Lean 3, since , is used to separate the elements of a list, and in the lambda expression itself. We now use => as the separator, as an example, fun x => x is the identity function. One may still use the symbol λ as a shorthand for fun. The lambda expression notation has many new features that are not supported in Lean 3.

Pattern matching

In Lean 4, one can easily create new notation that abbreviates commonly used idioms. One of them is a fun followed by a match. In the following examples, we define a few functions using fun+match notation.

namespace ex1
def Prod.str : Nat × Nat → String :=
  fun (a, b) => "(" ++ toString a ++ ", " ++ toString b ++ ")"

structure Point where
  x : Nat
  y : Nat
  z : Nat

def Point.addX : Point → Point → Nat :=
  fun { x := a, .. } { x := b, .. } =>  a+b

def Sum.str : Option Nat → String :=
  fun
    | some a => "some " ++ toString a
    | none   => "none"
end ex1

Implicit lambdas

In Lean 3 stdlib, we find many instances of the dreadful @+_ idiom. It is often used when the expected type is a function type with implicit arguments, and we have a constant (reader_t.pure in the example) which also takes implicit arguments. In Lean 4, the elaborator automatically introduces lambdas for consuming implicit arguments. We are still exploring this feature and analyzing its impact, but the experience so far has been very positive. As an example, here is the example in the link above using Lean 4 implicit lambdas.

variable (ρ : Type) (m : Type → Type) [Monad m]
instance : Monad (ReaderT ρ m) where
  pure := ReaderT.pure
  bind := ReaderT.bind

Users can disable the implicit lambda feature by using @ or writing a lambda expression with {} or [] binder annotations. Here are few examples

namespace ex2
def id1 : {α : Type} → α → α :=
  fun x => x

def listId : List ({α : Type} → α → α) :=
  (fun x => x) :: []

-- In this example, implicit lambda introduction has been disabled because
-- we use `@` before `fun`
def id2 : {α : Type} → α → α :=
  @fun α (x : α) => id1 x

def id3 : {α : Type} → α → α :=
  @fun α x => id1 x

def id4 : {α : Type} → α → α :=
  fun x => id1 x

-- In this example, implicit lambda introduction has been disabled
-- because we used the binder annotation `{...}`
def id5 : {α : Type} → α → α :=
  fun {α} x => id1 x
end ex2

Sugar for simple functions

In Lean 3, we can create simple functions from infix operators by using parentheses. For example, (+1) is sugar for fun x, x + 1. In Lean 4, we generalize this notation using · as a placeholder. Here are a few examples:

namespace ex3
#check (· + 1)
-- fun a => a + 1
#check (2 - ·)
-- fun a => 2 - a
#eval [1, 2, 3, 4, 5].foldl (·*·) 1
-- 120

def f (x y z : Nat) :=
  x + y + z

#check (f · 1 ·)
-- fun a b => f a 1 b

#eval [(1, 2), (3, 4), (5, 6)].map (·.1)
-- [1, 3, 5]
end ex3

As in Lean 3, the notation is activated using parentheses, and the lambda abstraction is created by collecting the nested ·s. The collection is interrupted by nested parentheses. In the following example, two different lambda expressions are created.

#check (Prod.mk · (· + 1))
-- fun a => (a, fun b => b + 1)

Function applications

In Lean 4, we have support for named arguments. Named arguments enable you to specify an argument for a parameter by matching the argument with its name rather than with its position in the parameter list. If you don't remember the order of the parameters but know their names, you can send the arguments in any order. You may also provide the value for an implicit parameter when Lean failed to infer it. Named arguments also improve the readability of your code by identifying what each argument represents.

def sum (xs : List Nat) :=
  xs.foldl (init := 0) (·+·)

#eval sum [1, 2, 3, 4]
-- 10

example {a b : Nat} {p : Nat → Nat → Nat → Prop} (h₁ : p a b b) (h₂ : b = a)
    : p a a b :=
  Eq.subst (motive := fun x => p a x b) h₂ h₁

In the following examples, we illustrate the interaction between named and default arguments.

def f (x : Nat) (y : Nat := 1) (w : Nat := 2) (z : Nat) :=
  x + y + w - z

example (x z : Nat) : f (z := z) x = x + 1 + 2 - z := rfl

example (x z : Nat) : f x (z := z) = x + 1 + 2 - z := rfl

example (x y : Nat) : f x y = fun z => x + y + 2 - z := rfl

example : f = (fun x z => x + 1 + 2 - z) := rfl

example (x : Nat) : f x = fun z => x + 1 + 2 - z := rfl

example (y : Nat) : f (y := 5) = fun x z => x + 5 + 2 - z := rfl

def g {α} [Add α] (a : α) (b? : Option α := none) (c : α) : α :=
  match b? with
  | none   => a + c
  | some b => a + b + c

variable {α} [Add α]

example : g = fun (a c : α) => a + c := rfl

example (x : α) : g (c := x) = fun (a : α) => a + x := rfl

example (x : α) : g (b? := some x) = fun (a c : α) => a + x + c := rfl

example (x : α) : g x = fun (c : α) => x + c := rfl

example (x y : α) : g x y = fun (c : α) => x + y + c := rfl

In Lean 4, we can use .. to provide missing explicit arguments as _. This feature combined with named arguments is useful for writing patterns. Here is an example:

inductive Term where
  | var    (name : String)
  | num    (val : Nat)
  | add    (fn : Term) (arg : Term)
  | lambda (name : String) (type : Term) (body : Term)

def getBinderName : Term → Option String
  | Term.lambda (name := n) .. => some n
  | _ => none

def getBinderType : Term → Option Term
  | Term.lambda (type := t) .. => some t
  | _ => none

Ellipsis are also useful when explicit argument can be automatically inferred by Lean, and we want to avoid a sequence of _s.

example (f : Nat → Nat) (a b c : Nat) : f (a + b + c) = f (a + (b + c)) :=
  congrArg f (Nat.add_assoc ..)

In Lean 4, writing f(x) in place of f x is no longer allowed, you must use whitespace between the function and its arguments (e.g., f (x)).

Dependent function types

Given α : Type and β : α → Type, (x : α) → β x denotes the type of functions f with the property that, for each a : α, f a is an element of β a. In other words, the type of the value returned by f depends on its input. We say (x : α) → β x is a dependent function type. In Lean 3, we write the dependent function type (x : α) → β x using one of the following three equivalent notations: forall x : α, β x or ∀ x : α, β x or Π x : α, β x. The first two were intended to be used for writing propositions, and the latter for writing code. Although the notation Π x : α, β x has historical significance, we have removed it from Lean 4 because it is awkward to use and often confuses new users. We can still write forall x : α, β x and ∀ x : α, β x.

#check forall (α : Type), α → α
#check ∀ (α : Type), α → α
#check ∀ α : Type, α → α
#check ∀ α, α → α
#check (α : Type) → α → α
#check {α : Type} → (a : Array α) → (i : Nat) → i < a.size → α
#check {α : Type} → [ToString α] → α → String
#check forall {α : Type} (a : Array α) (i : Nat), i < a.size → α
#check {α β : Type} → α → β → α × β

The meta keyword

In Lean 3, the keyword meta is used to mark definitions that can use primitives implemented in C/C++. These metadefinitions can also call themselves recursively, relaxing the termination restriction imposed by ordinary type theory. Metadefinitions may also use unsafe primitives such as eval_expr (α : Type u) [reflected α] : expr → tactic α, or primitives that break referential transparency tactic.unsafe_run_io.

The keyword meta has been currently removed from Lean 4. However, we may re-introduce it in the future, but with a much more limited purpose: marking meta code that should not be included in the executables produced by Lean.

The keyword constant has been deleted in Lean 4, and axiom should be used instead. In Lean 4, the new command opaque is used to define an opaque definition. Here are two simple examples:

namespace meta1
opaque x : Nat := 1
-- The following example will not type check since `x` is opaque
-- example : x = 1 := rfl

-- We can evaluate `x`
#eval x
-- 1

-- When no value is provided, the elaborator tries to build one automatically for us
-- using the `Inhabited` type class
opaque y : Nat
end meta1

We can instruct Lean to use a foreign function as the implementation for any definition using the attribute @[extern "foreign_function"]. It is the user's responsibility to ensure the foreign implementation is correct. However, a user mistake here will only impact the code generated by Lean, and it will not compromise the logical soundness of the system. That is, you cannot prove False using the @[extern] attribute. We use @[extern] with definitions when we want to provide a reference implementation in Lean that can be used for reasoning. When we write a definition such as

@[extern "lean_nat_add"]
def add : Nat → Nat → Nat
  | a, Nat.zero   => a
  | a, Nat.succ b => Nat.succ (add a b)

Lean assumes that the foreign function lean_nat_add implements the reference implementation above.

The unsafe keyword allows us to define functions using unsafe features such as general recursion, and arbitrary type casting. Regular (safe) functions cannot directly use unsafe ones since it would compromise the logical soundness of the system. As in regular programming languages, programs written using unsafe features may crash at runtime. Here are a few unsafe examples:

unsafe def unsound : False :=
  unsound

#check @unsafeCast
-- {α : Type _} → {β : Type _} → α → β

unsafe def nat2String (x : Nat) : String :=
  unsafeCast x

-- The following definition doesn't type check because it is not marked as `unsafe`
-- def nat2StringSafe (x : Nat) : String :=
--   unsafeCast x

The unsafe keyword is particularly useful when we want to take advantage of an implementation detail of the Lean execution runtime. For example, we cannot prove in Lean that arrays have a maximum size, but the runtime used to execute Lean programs guarantees that an array cannot have more than 2^64 (2^32) elements in a 64-bit (32-bit) machine. We can take advantage of this fact to provide a more efficient implementation for array functions. However, the efficient version would not be very useful if it can only be used in unsafe code. Thus, Lean 4 provides the attribute @[implemented_by functionName]. The idea is to provide an unsafe (and potentially more efficient) version of a safe definition or constant. The function f at the attribute @[implemented_by f] is very similar to an extern/foreign function, the key difference is that it is implemented in Lean itself. Again, the logical soundness of the system cannot be compromised by using the attribute implemented_by, but if the implementation is incorrect your program may crash at runtime. In the following example, we define withPtrUnsafe a k h which executes k using the memory address where a is stored in memory. The argument h is proof that k is a constant function. Then, we "seal" this unsafe implementation at withPtr. The proof h ensures the reference implementation k 0 is correct. For more information, see the article "Sealing Pointer-Based Optimizations Behind Pure Functions".

unsafe
def withPtrUnsafe {α β : Type} (a : α) (k : USize → β) (h : ∀ u, k u = k 0) : β :=
  k (ptrAddrUnsafe a)

@[implemented_by withPtrUnsafe]
def withPtr {α β : Type} (a : α) (k : USize → β) (h : ∀ u, k u = k 0) : β :=
  k 0

General recursion is very useful in practice, and it would be impossible to implement Lean 4 without it. The keyword partial implements a very simple and efficient approach for supporting general recursion. Simplicity was key here because of the bootstrapping problem. That is, we had to implement Lean in Lean before many of its features were implemented (e.g., the tactic framework or support for wellfounded recursion). Another requirement for us was performance. Functions tagged with partial should be as efficient as the ones implemented in mainstream functional programming languages such as OCaml. When the partial keyword is used, Lean generates an auxiliary unsafe definition that uses general recursion, and then defines an opaque constant that is implemented by this auxiliary definition. This is very simple, efficient, and is sufficient for users that want to use Lean as a regular programming language. A partial definition cannot use unsafe features such as unsafeCast and ptrAddrUnsafe, and it can only be used to implement types we already known to be inhabited. Finally, since we "seal" the auxiliary definition using an opaque constant, we cannot reason about partial definitions.

We are aware that proof assistants such as Isabelle provide a framework for defining partial functions that does not prevent users from proving properties about them. This kind of framework can be implemented in Lean 4. Actually, it can be implemented by users since Lean 4 is an extensible system. The developers current have no plans to implement this kind of support for Lean 4. However, we remark that users can implement it using a function that traverses the auxiliary unsafe definition generated by Lean, and produces a safe one using an approach similar to the one used in Isabelle.

namespace partial1
partial def f (x : Nat) : IO Unit := do
  IO.println x
  if x < 100 then
     f (x+1)

#eval f 98
end partial1

Library changes

These are changes to the library which may trip up Lean 3 users:

  • List is no longer a monad.

Style changes

Coding style changes have also been made:

  • Term constants and variables are now lowerCamelCase rather than snake_case
  • Type constants are now UpperCamelCase, eg Nat, List. Type variables are still lower case greek letters. Functors are still lower case latin (m : Type → Type) [Monad m].
  • When defining typeclasses, prefer not to use "has". Eg ToString or Add instead of HasToString or HasAdd.
  • Prefer return to pure in monad expressions.
  • Pipes <| are preferred to dollars $ for function application.
  • Declaration bodies should always be indented:
    inductive Hello where
      | foo
      | bar
    
    structure Point where
      x : Nat
      y : Nat
    
    def Point.addX : Point → Point → Nat :=
      fun { x := a, .. } { x := b, .. } => a + b
    
  • In structures and typeclass definitions, prefer where to := and don't surround fields with parentheses. (Shown in Point above)

You can copy highlighted code straight from VS Code to any rich text editor supporting HTML input. For highlighting code in LaTeX, there are two options:

  • listings, which is a common package and simple to set up, but you may run into some restrictions of it and LaTeX around Unicode
  • minted, a LaTeX package wrapping the Pygments syntax highlighting library. It needs a few more steps to set up, but provides unrestricted support for Unicode when combined with XeLaTeX or LuaLaTex.

Example with listings

Save lstlean.tex into the same directory, or anywhere in your TEXINPUTS path, as the following test file:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{listings}
\usepackage{amssymb}

\usepackage{color}
\definecolor{keywordcolor}{rgb}{0.7, 0.1, 0.1}   % red
\definecolor{tacticcolor}{rgb}{0.0, 0.1, 0.6}    % blue
\definecolor{commentcolor}{rgb}{0.4, 0.4, 0.4}   % grey
\definecolor{symbolcolor}{rgb}{0.0, 0.1, 0.6}    % blue
\definecolor{sortcolor}{rgb}{0.1, 0.5, 0.1}      % green
\definecolor{attributecolor}{rgb}{0.7, 0.1, 0.1} % red

\def\lstlanguagefiles{lstlean.tex}
% set default language
\lstset{language=lean}

\begin{document}
\begin{lstlisting}
theorem funext {f₁ f₂ : ∀ (x : α), β x} (h : ∀ x, f₁ x = f₂ x) : f₁ = f₂ := by
  show extfunApp (Quotient.mk f₁) = extfunApp (Quotient.mk f₂)
  apply congrArg
  apply Quotient.sound
  exact h
\end{lstlisting}
\end{document}

Compile the file via

$ pdflatex test.tex
  • for older LaTeX versions, you might need to use [utf8x] instead of [utf8] with inputenc

Example with minted

First install Pygments (version 2.18 or newer). Then save the following sample LaTeX file test.tex into the same directory:

\documentclass{article}
\usepackage{fontspec}
% switch to a monospace font supporting more Unicode characters
\setmonofont{FreeMono}
\usepackage{minted}
\newmintinline[lean]{lean4}{bgcolor=white}
\newminted[leancode]{lean4}{fontsize=\footnotesize}
\usemintedstyle{tango}  % a nice, colorful theme

\begin{document}
\begin{leancode}
theorem funext {f₁ f₂ : ∀ (x : α), β x} (h : ∀ x, f₁ x = f₂ x) : f₁ = f₂ := by
  show extfunApp (Quotient.mk' f₁) = extfunApp (Quotient.mk' f₂)
  apply congrArg
  apply Quotient.sound
  exact h
\end{leancode}
\end{document}

You can then compile test.tex by executing the following command:

xelatex --shell-escape test.tex

Some remarks:

  • either xelatex or lualatex is required to handle Unicode characters in the code.
  • --shell-escape is needed to allow xelatex to execute pygmentize in a shell.
  • If the chosen monospace font is missing some Unicode symbols, you can direct them to be displayed using a fallback font or other replacement LaTeX code.
    \usepackage{newunicodechar}
    \newfontfamily{\freeserif}{DejaVu Sans}
    \newunicodechar{✝}{\freeserif{✝}}
    \newunicodechar{𝓞}{\ensuremath{\mathcal{O}}}
    
  • If you are using an old version of Pygments, you can copy lean.py into your working directory, and use lean4.py:Lean4Lexer -x instead of lean4 above. If your version of minted is v2.7 or newer, but before v3.0, you will additionally need to follow the workaround described in https://github.com/gpoore/minted/issues/360.
import Lean
open Lean Widget

The user-widgets system

Proving and programming are inherently interactive tasks. Lots of mathematical objects and data structures are visual in nature. User widgets let you associate custom interactive UIs with sections of a Lean document. User widgets are rendered in the Lean infoview.

Rubik's cube

Trying it out

To try it out, type in the following code and place your cursor over the #widget command. You can also view this manual entry in the online editor.

@[widget_module]
def 
helloWidget: Widget.Module
helloWidget
:
Widget.Module: Type
Widget.Module
where javascript :=
" import * as React from 'react'; export default function(props) { const name = props.name || 'world' return React.createElement('p', {}, 'Hello ' + name + '!') }": String
" import * as React from 'react'; export default function(props) { const name = props.name || 'world' return React.createElement('p', {}, 'Hello ' + name + '!') }"
#widget
helloWidget: Widget.Module
helloWidget

If you want to dive into a full sample right away, check out Rubiks. This sample uses higher-level widget components from the ProofWidgets library.

Below, we'll explain the system piece by piece.

⚠️ WARNING: All of the user widget APIs are unstable and subject to breaking changes.

Widget modules and instances

A widget module is a valid JavaScript ESModule that can execute in the Lean infoview. Most widget modules export a React component as the piece of user interface to be rendered. To access React, the module can use import * as React from 'react'. Our first example of a widget module is helloWidget above. Widget modules must be registered with the @[widget_module] attribute.

A widget instance is then the identifier of a widget module (e.g. `helloWidget) bundled with a value for its props. This value is passed as the argument to the React component. In our first invocation of #widget, we set it to .null. Try out what happens when you type in:

structure 
HelloWidgetProps: Type
HelloWidgetProps
where
name?: HelloWidgetProps → Option String
name?
:
Option: Type → Type
Option
String: Type
String
:=
none: {α : Type} → Option α
none
deriving
Server.RpcEncodable: Type → Type
Server.RpcEncodable
#widget
helloWidget: Widget.Module
helloWidget
with { name? :=
"<your name here>": String
"<your name here>"
:
HelloWidgetProps: Type
HelloWidgetProps
}

Under the hood, widget instances are associated with a range of positions in the source file. Widget instances are stored in the infotree in the same manner as other information about the source file such as the type of every expression. In our example, the #widget command stores a widget instance with the entire line as its range. One can think of the infotree entry as an instruction for the infoview: "when the user places their cursor here, please render the following widget".

Querying the Lean server

💡 NOTE: The RPC system presented below does not depend on JavaScript. However, the primary use case is the web-based infoview in VSCode.

Besides enabling us to create cool client-side visualizations, user widgets have the ability to communicate with the Lean server. Thanks to this, they have the same metaprogramming capabilities as custom elaborators or the tactic framework. To see this in action, let's implement a #check command as a web input form. This example assumes some familiarity with React.

The first thing we'll need is to create an RPC method. Meaning "Remote Procedure Call",this is a Lean function callable from widget code (possibly remotely over the internet). Our method will take in the name : Name of a constant in the environment and return its type. By convention, we represent the input data as a structure. Since it will be sent over from JavaScript, we need FromJson and ToJson instance. We'll see why the position field is needed later.

structure 
GetTypeParams: Type
GetTypeParams
where /-- Name of a constant to get the type of. -/
name: GetTypeParams → Name
name
:
Name: Type
Name
/-- Position of our widget instance in the Lean file. -/
pos: GetTypeParams → Lsp.Position
pos
:
Lsp.Position: Type
Lsp.Position
deriving
FromJson: Type u → Type u
FromJson
,
ToJson: Type u → Type u
ToJson

After its argument structure, we define the getType method. RPCs method execute in the RequestM monad and must return a RequestTask α where α is the "actual" return type. The Task is so that requests can be handled concurrently. As a first guess, we'd use Expr as α. However, expressions in general can be large objects which depend on an Environment and LocalContext. Thus we cannot directly serialize an Expr and send it to JavaScript. Instead, there are two options:

  • One is to send a reference which points to an object residing on the server. From JavaScript's point of view, references are entirely opaque, but they can be sent back to other RPC methods for further processing.
  • The other is to pretty-print the expression and send its textual representation called CodeWithInfos. This representation contains extra data which the infoview uses for interactivity. We take this strategy here.

RPC methods execute in the context of a file, but not of any particular Environment, so they don't know about the available definitions and theorems. Thus, we need to pass in a position at which we want to use the local Environment. This is why we store it in GetTypeParams. The withWaitFindSnapAtPos method launches a concurrent computation whose job is to find such an Environment for us, in the form of a snap : Snapshot. With this in hand, we can call MetaM procedures to find out the type of name and pretty-print it.

open Server RequestM in
@[server_rpc_method]
def 
getType: GetTypeParams → RequestM (RequestTask CodeWithInfos)
getType
(
params: GetTypeParams
params
:
GetTypeParams: Type
GetTypeParams
) :
RequestM: Type → Type
RequestM
(
RequestTask: Type → Type
RequestTask
CodeWithInfos: Type
CodeWithInfos
) :=
withWaitFindSnapAtPos: {α : Type} → Lsp.Position → (Snapshots.Snapshot → RequestM α) → RequestM (RequestTask α)
withWaitFindSnapAtPos
params: GetTypeParams
params
.
pos: GetTypeParams → Lsp.Position
pos
fun
snap: Snapshots.Snapshot
snap
=> do
runTermElabM: {α : Type} → Snapshots.Snapshot → RequestT Elab.TermElabM α → RequestM α
runTermElabM
snap: Snapshots.Snapshot
snap
do let
name: Name
name
resolveGlobalConstNoOverloadCore: {m : Type → Type} → [inst : Monad m] → [inst : MonadResolveName m] → [inst : MonadEnv m] → [inst : MonadError m] → Name → m Name
resolveGlobalConstNoOverloadCore
params: GetTypeParams
params
.
name: GetTypeParams → Name
name
let
c: ConstantInfo
c
try
getConstInfo: {m : Type → Type} → [inst : Monad m] → [inst : MonadEnv m] → [inst : MonadError m] → Name → m ConstantInfo
getConstInfo
name: Name
name
catch
_: Exception
_
=>
throwThe: (ε : Type) → {m : Type → Type} → [inst : MonadExceptOf ε m] → {α : Type} → ε → m α
throwThe
RequestError: Type
RequestError
.invalidParams: JsonRpc.ErrorCode
.invalidParams
, s!"no constant named '{
name: Name
name
}'"⟩
Widget.ppExprTagged: Expr → optParam Bool false → MetaM CodeWithInfos
Widget.ppExprTagged
c: ConstantInfo
c
.
type: ConstantInfo → Expr
type

Using infoview components

Now that we have all we need on the server side, let's write the widget module. By importing @leanprover/infoview, widgets can render UI components used to implement the infoview itself. For example, the <InteractiveCode> component displays expressions with term : type tooltips as seen in the goal view. We will use it to implement our custom #check display.

⚠️ WARNING: Like the other widget APIs, the infoview JS API is unstable and subject to breaking changes.

The code below demonstrates useful parts of the API. To make RPC method calls, we invoke the useRpcSession hook. The useAsync helper packs the results of an RPC call into an AsyncState structure which indicates whether the call has resolved successfully, has returned an error, or is still in-flight. Based on this we either display an InteractiveCode component with the result, mapRpcError the error in order to turn it into a readable message, or show a Loading.. message, respectively.

@[widget_module]
def 
checkWidget: Widget.Module
checkWidget
:
Widget.Module: Type
Widget.Module
where javascript :=
" import * as React from 'react'; const e = React.createElement; import { useRpcSession, InteractiveCode, useAsync, mapRpcError } from '@leanprover/infoview'; export default function(props) { const rs = useRpcSession() const [name, setName] = React.useState('getType') const st = useAsync(() => rs.call('getType', { name, pos: props.pos }), [name, rs, props.pos]) const type = st.state === 'resolved' ? st.value && e(InteractiveCode, {fmt: st.value}) : st.state === 'rejected' ? e('p', null, mapRpcError(st.error).message) : e('p', null, 'Loading..') const onChange = (event) => { setName(event.target.value) } return e('div', null, e('input', { value: name, onChange }), ' : ', type) } ": String
" import * as React from 'react'; const e = React.createElement; import { useRpcSession, InteractiveCode, useAsync, mapRpcError } from '@leanprover/infoview'; export default function(props) { const rs = useRpcSession() const [name, setName] = React.useState('getType') const st = useAsync(() => rs.call('getType', { name, pos: props.pos }), [name, rs, props.pos]) const type = st.state === 'resolved' ? st.value && e(InteractiveCode, {fmt: st.value}) : st.state === 'rejected' ? e('p', null, mapRpcError(st.error).message) : e('p', null, 'Loading..') const onChange = (event) => { setName(event.target.value) } return e('div', null, e('input', { value: name, onChange }), ' : ', type) } "

We can now try out the widget.

#widget 
checkWidget: Widget.Module
checkWidget

#check as a service

Building widget sources

While typing JavaScript inline is fine for a simple example, for real developments we want to use packages from NPM, a proper build system, and JSX. Thus, most actual widget sources are built with Lake and NPM. They consist of multiple files and may import libraries which don't work as ESModules by default. On the other hand a widget module must be a single, self-contained ESModule in the form of a string. Readers familiar with web development may already have guessed that to obtain such a string, we need a bundler. Two popular choices are rollup.js and esbuild. If we go with rollup.js, to make a widget work with the infoview we need to:

  • Set output.format to 'es'.
  • Externalize react, react-dom, @leanprover/infoview. These libraries are already loaded by the infoview so they should not be bundled.

ProofWidgets provides a working rollup.js build configuration in rollup.config.js.

Inserting text

Besides making RPC calls, widgets can instruct the editor to carry out certain actions. We can insert text, copy text to the clipboard, or highlight a certain location in the document. To do this, use the EditorContext React context. This will return an EditorConnection whose api field contains a number of methods that interact with the editor.

The full API can be viewed here.

@[widget_module]
def 
insertTextWidget: Widget.Module
insertTextWidget
:
Widget.Module: Type
Widget.Module
where javascript :=
" import * as React from 'react'; const e = React.createElement; import { EditorContext } from '@leanprover/infoview'; export default function(props) { const editorConnection = React.useContext(EditorContext) function onClick() { editorConnection.api.insertText('-- hello!!!', 'above') } return e('div', null, e('button', { value: name, onClick }, 'insert')) } ": String
" import * as React from 'react'; const e = React.createElement; import { EditorContext } from '@leanprover/infoview'; export default function(props) { const editorConnection = React.useContext(EditorContext) function onClick() { editorConnection.api.insertText('-- hello!!!', 'above') } return e('div', null, e('button', { value: name, onClick }, 'insert')) } "
#widget
insertTextWidget: Widget.Module
insertTextWidget

Semantic Highlighting

The Lean language server provides semantic highlighting information to editors. In order to benefit from this in VSCode, you may need to activate the "Editor > Semantic Highlighting" option in the preferences (this is translates to "editor.semanticHighlighting.enabled": true, in settings.json). The default option here is to let your color theme decides whether it activates semantic highlighting (the default themes Dark+ and Light+ do activate it for instance).

However this may be insufficient if your color theme does not distinguish enough syntax categories or distinguishes them very subtly. For instance the default Light+ theme uses color #001080 for variables. This is awfully close to #000000 that is used as the default text color. This makes it very easy to miss an accidental use of auto bound implicit arguments. For instance in

def my_id (n : nat) := n

maybe nat is a typo and Nat was intended. If your color theme is good enough then you should see that n and nat have the same color since they are both marked as variables by semantic highlighting. If you rather write (n : Nat) then n keeps its variable color but Nat gets the default text color.

If you use such a bad theme, you can fix things by modifying the Semantic Token Color Customizations configuration. This cannot be done directly in the preferences dialog but you can click on "Edit in settings.json" to directly edit the settings file. Beware that you must save this file (in the same way you save any file opened in VSCode) before seeing any effect in other tabs or VSCode windows.

In the main config object, you can add something like

"editor.semanticTokenColorCustomizations": {
        "[Default Light+]": {"rules": {"function": "#ff0000", "property": "#00ff00", "variable": "#ff00ff"}}
    },

The colors in this example are not meant to be nice but to be easy to spot in your file when testing. Of course you need to replace Default Light+ with the name of your theme, and you can customize several themes if you use several themes. VSCode will display small colored boxes next to the HTML color specifications. Hovering on top of a color specification opens a convenient color picker dialog.

In order to understand what function, property and variable mean in the above example, the easiest path is to open a Lean file and ask VSCode about its classification of various bits of your file. Open the command palette with Ctrl-shift-p (or ⌘-shift-p on a Mac) and search for "Inspect Editor Tokens and Scopes" (typing the word "tokens" should be enough to see it). You can then click on any word in your file and look if there is a "semantic token type" line in the displayed information.

Development Workflow

If you want to make changes to Lean itself, start by building Lean from a clean checkout to make sure that everything is set up correctly. After that, read on below to find out how to set up your editor for changing the Lean source code, followed by further sections of the development manual where applicable such as on the test suite and commit convention.

If you are planning to make any changes that may affect the compilation of Lean itself, e.g. changes to the parser, elaborator, or compiler, you should first read about the bootstrapping pipeline. You should not edit the stage0 directory except using the commands described in that section when necessary.

Development Setup

You can use any of the supported editors for editing the Lean source code. If you set up elan as below, opening src/ as a workspace folder should ensure that stage 0 (i.e. the stage that first compiles src/) will be used for files in that directory.

Dev setup using elan

You can use elan to easily switch between stages and build configurations based on the current directory, both for the lean, leanc, and leanmake binaries in your shell's PATH and inside your editor.

To install elan, you can do so, without installing a default version of Lean, using (Unix)

curl https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh -sSf | sh -s -- --default-toolchain none

or (Windows)

curl -O --location https://raw.githubusercontent.com/leanprover/elan/master/elan-init.ps1
powershell -f elan-init.ps1 --default-toolchain none
del elan-init.ps1

The lean-toolchain files in the Lean 4 repository are set up to use the lean4-stage0 toolchain for editing files in src and the lean4 toolchain for editing files in tests.

Run the following commands to make lean4 point at stage1 and lean4-stage0 point at stage0:

# in the Lean rootdir
elan toolchain link lean4 build/release/stage1
elan toolchain link lean4-stage0 build/release/stage0

You can also use the +toolchain shorthand (e.g. lean +lean4-debug) to switch toolchains on the spot. lean4-mode will automatically use the lean executable associated with the directory of the current file as long as lean4-rootdir is unset and ~/.elan/bin is in your exec-path. Where Emacs sources the exec-path from can be a bit unclear depending on your configuration, so alternatively you can also set lean4-rootdir to "~/.elan" explicitly.

You might find that debugging through elan, e.g. via gdb lean, disables some things like symbol autocompletion because at first only the elan proxy binary is loaded. You can instead pass the explicit path to bin/lean in your build folder to gdb, or use gdb $(elan which lean).

It is also possible to generate releases that others can use, simply by pushing a tag to your fork of the Lean 4 github repository (and waiting about an hour; check the Actions tab for completion). If you push my-tag to a fork in your github account my_name, you can then put my_name/lean4:my-tag in your lean-toolchain file in a project using lake. (You must use a tag name that does not start with a numeral, or contain _).

VS Code

There is a lean.code-workspace file that correctly sets up VS Code with workspace roots for the stage0/stage1 setup described above as well as with other settings. You should always load it when working on Lean, such as by invoking

code lean.code-workspace

on the command line.

ccache

Lean's build process uses ccache if it is installed to speed up recompilation of the generated C code. Without ccache, you'll likely spend more time than necessary waiting on rebuilds - it's a good idea to make sure it's installed.

prelude

Unlike most Lean projects, all submodules of the Lean module begin with the prelude keyword. This disables the automated import of Init, meaning that developers need to figure out their own subset of Init to import. This is done such that changing files in Init doesn't force a full rebuild of Lean.

These are instructions to set up a working development environment for those who wish to make changes to Lean itself. It is part of the Development Guide.

We strongly suggest that new users instead follow the Quickstart to get started using Lean, since this sets up an environment that can automatically manage multiple Lean toolchain versions, which is necessary when working within the Lean ecosystem.

Requirements

Platform-Specific Setup

Generic Build Instructions

Setting up a basic parallelized release build:

git clone https://github.com/leanprover/lean4
cd lean4
cmake --preset release
make -C build/release -j$(nproc || sysctl -n hw.logicalcpu)

You can replace $(nproc || sysctl -n hw.logicalcpu) with the desired parallelism amount.

The above commands will compile the Lean library and binaries into the stage1 subfolder; see below for details.

You should not usually run cmake --install after a successful build. See Dev setup using elan on how to properly set up your editor to use the correct stage depending on the source directory.

Useful CMake Configuration Settings

Pass these along with the cmake --preset release command. There are also two alternative presets that combine some of these options you can use instead of release: debug and sandebug (sanitize + debug).

  • -D CMAKE_BUILD_TYPE=
    Select the build type. Valid values are RELEASE (default), DEBUG, RELWITHDEBINFO, and MINSIZEREL.

  • -D CMAKE_C_COMPILER=
    -D CMAKE_CXX_COMPILER=
    Select the C/C++ compilers to use. Official Lean releases currently use Clang; see also .github/workflows/ci.yml for the CI config.

Lean will automatically use CCache if available to avoid redundant builds, especially after stage 0 has been updated.

Troubleshooting

  • Call make with an additional VERBOSE=1 argument to print executed commands.

Installing Lean on Ubuntu

Build Dependencies

Please ensure you have the following build tools available and then follow the generic build instructions.

Basic packages

sudo apt-get install git libgmp-dev libuv1-dev cmake ccache clang

Install Packages on OS X 14.5

We assume that you are using homebrew as a package manager.

Compilers

You need a C++11-compatible compiler to build Lean. As of November 2014, you have three options:

  • clang++-3.5 (shipped with OSX, Apple LLVM version 6.0)
  • gcc-4.9.1 (homebrew)
  • clang++-3.5 (homebrew)

We recommend to use Apple's clang++ because it is pre-shipped with OS X and requires no further installation.

To install gcc-4.9.1 via homebrew, please execute:

brew install gcc

To install clang++-3.5 via homebrew, please execute:

brew install llvm

To use compilers other than the default one (Apple's clang++), you need to use -DCMAKE_CXX_COMPILER option to specify the compiler that you want to use when you run cmake. For example, do the following to use g++.

cmake -DCMAKE_CXX_COMPILER=g++ ...

Required Packages: CMake, GMP, libuv

brew install cmake
brew install gmp
brew install libuv
brew install ccache

Lean for Windows

A native Lean binary for Windows can be generated using MSYS2. It is easy to install all dependencies, it produces native 64/32-binaries, and supports a C++14 compiler.

An alternative to MSYS2 is to use Lean in Windows WSL.

While not necessary for pure building, you should first activate Developer Mode (Settings > Update & Security > For developers > Developer Mode), which will allow Lean to create symlinks that e.g. enable go-to-definition in the stdlib.

Installing the Windows SDK

Install the Windows SDK from Microsoft. The oldest supported version is 10.0.18362.0. If you installed the Windows SDK to the default location, then there should be a directory with the version number at C:\Program Files (x86)\Windows Kits\10\Include. If there are multiple directories, only the highest version number matters.

Installing dependencies

The official webpage of MSYS2 provides one-click installers. Once installed, you should run the "MSYS2 CLANG64" shell from the start menu (the one that runs clang64.exe). Do not run "MSYS2 MSYS" or "MSYS2 MINGW64" instead! MSYS2 has a package management system, pacman.

Here are the commands to install all dependencies needed to compile Lean on your machine.

pacman -S make python mingw-w64-clang-x86_64-cmake mingw-w64-clang-x86_64-clang mingw-w64-clang-x86_64-ccache mingw-w64-clang-x86_64-libuv mingw-w64-clang-x86_64-gmp git unzip diffutils binutils

You should now be able to run these commands:

clang --version
cmake --version

Then follow the generic build instructions in the MSYS2 MinGW shell, using:

cmake --preset release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++

instead of cmake --preset release. This will use the clang compiler instead of gcc, which is required with msys2.

Install lean

Follow the steps in Dev setup using elan regarding installation of the bits you just built. Note that in an msys2 environment elan-init.sh reports you need to add %USERPROFILE%\.elan\bin to your path, but of course in msys2 that needs to be a valid linux style path, like this:

export PATH="$PATH:/c/users/$USERNAME/.elan/bin"

Running

You can run lean --version to see if your binaries work.

If you want a version that can run independently of your MSYS install then you need to copy the following dependent DLL's from where ever they are installed in your MSYS setup:

  • libc++.dll
  • libgmp-10.dll
  • libuv-1.dll
  • libwinpthread-1.dll

The following linux command will do that:

cp $(ldd lean.exe | cut -f3 -d' ' | grep mingw) .

However, if you plan to use this build to compile lean programs to executable binaries using lake build in normal Windows command prompt outside of msys2 environment you will also need to add a windows version clang to your path.

Trouble shooting

-bash: gcc: command not found

Make sure /clang64/bin is in your PATH environment. If it is not then check you launched the MSYS2 CLANG64 shell from the start menu. (The one that runs clang64.exe).

Lean in Windows WSL

As an alternative to the MSYS2 setup you can also use the Windows Subsystem for Linux to build Lean there, but edit and debug using Visual Studio Code in Windows.

For the most part setup in WSL is the same as Ubuntu. This document provides additional information on how to setup Windows Visual Studio Code remote debugging into your WSL environment using the lean extension running in WSL.

It is recommended that you setup Ubuntu in WSL 2. Then follow the Dev setup using elan.

Visual Studio Code setup on Windows

Install Visual Studio Code on Windows. Install the VS Code Remote Development extension from Microsoft. This extension includes the Remote - WSL extension. Install the lean4 extension but into the WSL using: Install in WSL: Ubuntu

Type Ctrl+Shift+P and select Remote-WSL: Open Folder in WSL... to open a folder containing your hello world lean package.

When everything is working you should see something like this with a functioning infoview, syntax coloring and tooltips:

screenshot

Troubleshooting

lean4: Could not find Lean version by running 'lean --version'.

Check that the lean program is available in your PATH in your WSL environment.

Logs are showing up with a windows file path

Check that you have not set a windows path in your lean4.serverLogging.path Visual Studio Code setting. it is best if this setting is set as follows:

  "lean4.serverLogging.path": "logs"

This will result in a logs folder being created inside your lean package folder in the WSL file system.

Lean Build Bootstrapping

Since version 4, Lean is a partially bootstrapped program: most parts of the frontend and compiler are written in Lean itself and thus need to be built before building Lean itself - which is needed to again build those parts. This cycle is broken by using pre-built C files checked into the repository (which ultimately go back to a point where the Lean compiler was not written in Lean) in place of these Lean inputs and then compiling everything in multiple stages up to a fixed point. The build directory is organized in these stages:

stage0/
  # Bootstrap binary built from stage0/src/.
  # We do not use any other files from this directory in further stages.
  bin/lean
stage1/
  include/
    config.h  # config variables used to build `lean` such as used allocator
    runtime/lean.h  # runtime header, used by extracted C code, uses `config.h`
  share/lean/
    lean.mk  # used by `leanmake`
  lib/
    lean/**/*.olean  # the Lean library (incl. the compiler) compiled by the previous stage's `lean`
    temp/**/*.{c,o}  # the library extracted to C and compiled by `leanc`
    libInit.a libLean.a  # static libraries of the Lean library
    libleancpp.a  # a static library of the C++ sources of Lean
    libleanshared.so  # a dynamic library including the static libraries above
  bin/
    lean  # the Lean compiler & server, a small executable that calls directly into libleanshared.so
    leanc  # a wrapper around a C compiler supplying search paths etc
    leanmake  # a wrapper around `make` supplying the Makefile above
stage2/...
stage3/...

Stage 0 can be viewed as a blackbox since it does not depend on any local changes and is equivalent to downloading a bootstrapping binary as done in other compilers. The build for any other stage starts by building the runtime and standard library from src/, using the lean binary from the previous stage in the latter case, which are then assembled into a new bin/lean binary.

Each stage can be built by calling make stageN in the root build folder. Running just make will default to stage 1, which is usually sufficient for testing changes on the test suite or other files outside of the stdlib. However, it might happen that the stage 1 compiler is not able to load its own stdlib, e.g. when changing the .olean format: the stage 1 stdlib will use the format generated by the stage 0 compiler, but the stage 1 compiler will expect the new format. In this case, we should continue with building and testing stage 2 instead, which will both build and expect the new format. Note that this is only possible because when building a stage's stdlib, we use the previous compiler but never load the previous stdlib (since everything is prelude). We can also use stage 2 to test changes in the compiler or other "meta" parts, i.e. changes that affect the produced (.olean or .c) code, on the stdlib and compiler itself. We are not aware of any "meta-meta" parts that influence more than two stages of compilation, so stage 3 should always be identical to stage 2 and only exists as a sanity check.

In summary, doing a standard build via make internally involves these steps:

  1. compile the stage0/src archived sources into stage0/bin/lean
  2. use it to compile the current library (including your changes) into stage1/lib
  3. link that and the current C++ code from src/ into stage1/bin/lean

You now have a Lean binary and library that include your changes, though their own compilation was not influenced by them, that you can use to test your changes on test programs whose compilation will be influenced by the changes.

Updating stage0

Finally, when we want to use new language features in the library, we need to update the archived C source code of the stage 0 compiler in stage0/src.

The github repository will automatically update stage0 on master once src/stdlib_flags.h and stage0/src/stdlib_flags.h are out of sync.

If you have write access to the lean4 repository, you can also manually trigger that process, for example to be able to use new features in the compiler itself. You can do that on https://github.com/leanprover/lean4/actions/workflows/update-stage0.yml or using Github CLI with

gh workflow run update-stage0.yml

Leaving stage0 updates to the CI automation is preferable, but should you need to do it locally, you can use make update-stage0-commit in build/release to update stage0 from stage1 or make -C stageN update-stage0-commit to update from another stage. This command will automatically stage the updated files and introduce a commit,so make sure to commit your work before that.

If you rebased the branch (either onto a newer version of master, or fixing up some commits prior to the stage0 update, recreate the stage0 update commits. The script script/rebase-stage0.sh can be used for that.

The CI should prevent PRs with changes to stage0 (besides stdlib_flags.h) from entering master through the (squashing!) merge queue, and label such PRs with the changes-stage0 label. Such PRs should have a cleaned up history, with separate stage0 update commits; then coordinate with the admins to merge your PR using rebase merge, bypassing the merge queue.

Further Bootstrapping Complications

As written above, changes in meta code in the current stage usually will only affect later stages. This is an issue in two specific cases.

  • For non-builtin meta code such as notations or macros in Notation.lean, we expect changes to affect the current file and all later files of the same stage immediately, just like outside the stdlib. To ensure this, we need to build the stage using -Dinterpreter.prefer_native=false - otherwise, when executing a macro, the interpreter would notice that there is already a native symbol available for this function and run it instead of the new IR, but the symbol is from the previous stage!

    To make matters more complicated, while false is a reasonable default incurring only minor overhead (ParserDescrs and simple macros are cheap to interpret), there are situations where we need to set the option to true: when the interpreter is executed from the native code of the previous stage, the type of the value it computes must be identical to/ABI-compatible with the type in the previous stage. For example, if we add a new parameter to Macro or reorder constructors in ParserDescr, building the stage with the interpreter will likely fail. Thus we need to set interpreter.prefer_native to true in such cases to "freeze" meta code at their versions in the previous stage; no new meta code should be introduced in this stage. Any further stages (e.g. after an update-stage0) will then need to be compiled with the flag set to false again since they will expect the new signature.

    For an example, see https://github.com/leanprover/lean4/commit/da4c46370d85add64ef7ca5e7cc4638b62823fbb.

  • For the special case of quotations, it is desirable to have changes in built-in parsers affect them immediately: when the changes in the parser become active in the next stage, macros implemented via quotations should generate syntax trees compatible with the new parser, and quotation patterns in macro and elaborators should be able to match syntax created by the new parser and macros. Since quotations capture the syntax tree structure during execution of the current stage and turn it into code for the next stage, we need to run the current stage's built-in parsers in quotation via the interpreter for this to work. Caveats:

    • Since interpreting full parsers is not nearly as cheap and we rarely change built-in syntax, this needs to be opted in using -Dinternal.parseQuotWithCurrentStage=true.
    • The parser needs to be reachable via an import statement, otherwise the version of the previous stage will silently be used.
    • Only the parser code (Parser.fn) is affected; all metadata such as leading tokens is taken from the previous stage.

    For an example, see https://github.com/leanprover/lean4/commit/f9dcbbddc48ccab22c7674ba20c5f409823b4cc1#diff-371387aed38bb02bf7761084fd9460e4168ae16d1ffe5de041b47d3ad2d22422 (from before the flag defaulted to false).

To modify either of these flags both for building and editing the stdlib, adjust the code in stage0/src/stdlib_flags.h. The flags will automatically be reset on the next update-stage0 when the file is overwritten with the original version in src/.

Test Suite

After building Lean you can run all the tests using

cd build/release
make test ARGS=-j4

Change the 4 to the maximum number of parallel tests you want to allow. The best choice is the number of CPU cores on your machine as the tests are mostly CPU bound. You can find the number of processors on linux using nproc and on Windows it is the NUMBER_OF_PROCESSORS environment variable.

You can run tests after building a specific stage by adding the -C stageN argument. The default when run as above is stage 1. The Lean tests will automatically use that stage's corresponding Lean executables

Running make test will not pick up new test files; run

cmake build/release/stage1

to update the list of tests.

You can also use ctest directly if you are in the right folder. So to run stage1 tests with a 300 second timeout run this:

cd build/release/stage1
ctest -j 4 --output-on-failure --timeout 300

Useful ctest flags are -R <name of test> to run a single test, and --rerun-failed to run all tests that failed during the last run. You can also pass ctest flags via make test ARGS="--rerun-failed".

To get verbose output from ctest pass the --verbose command line option. Test output is normally suppressed and only summary information is displayed. This option will show all test output.

Test Suite Organization

All these tests are included by src/shell/CMakeLists.txt:

  • tests/lean: contains tests that come equipped with a .lean.expected.out file. The driver script test_single.sh runs each test and checks the actual output (*.produced.out) with the checked in expected output.

  • tests/lean/run: contains tests that are run through the lean command line one file at a time. These tests only look for error codes and do not check the expected output even though output is produced, it is ignored.

  • tests/lean/interactive: are designed to test server requests at a given position in the input file. Each .lean file contains comments that indicate how to simulate a client request at that position. using a --^ point to the line position. Example:

    open Foo in
    theorem tst2 (h : a ≤ b) : a + 2 ≤ b + 2 :=
    Bla.
      --^ textDocument/completion
    

    In this example, the test driver test_single.sh will simulate an auto-completion request at Bla.. The expected output is stored in a .lean.expected.out in the json format that is part of the Language Server Protocol.

    This can also be used to test the following additional requests:

    --^ textDocument/hover
    --^ textDocument/typeDefinition
    --^ textDocument/definition
    --^ $/lean/plainGoal
    --^ $/lean/plainTermGoal
    --^ insert: ...
    --^ collectDiagnostics
    
  • tests/lean/server: Tests more of the Lean --server protocol. There are just a few of them, and it uses .log files containing JSON.

  • tests/compiler: contains tests that will run the Lean compiler and build an executable that is executed and the output is compared to the .lean.expected.out file. This test also contains a subfolder foreign which shows how to extend Lean using C++.

  • tests/lean/trust0: tests that run Lean in a mode that Lean doesn't even trust the .olean files (i.e., trust 0).

  • tests/bench: contains performance tests.

  • tests/plugin: tests that compiled Lean code can be loaded into lean via the --plugin command line option.

Writing Good Tests

Every test file should contain:

  • an initial /-! -/ module docstring summarizing the test's purpose
  • a module docstring for each test section that describes what is tested and, if not 100% clear, why that is the desirable behavior

At the time of writing, most tests do not follow these new guidelines yet. For an example of a conforming test, see tests/lean/1971.lean.

Fixing Tests

When the Lean source code or the standard library are modified, some of the tests break because the produced output is slightly different, and we have to reflect the changes in the .lean.expected.out files. We should not blindly copy the new produced output since we may accidentally miss a bug introduced by recent changes. The test suite contains commands that allow us to see what changed in a convenient way. First, we must install meld. On Ubuntu, we can do it by simply executing

sudo apt-get install meld

Now, suppose bad_class.lean test is broken. We can see the problem by going to tests/lean directory and executing

./test_single.sh -i bad_class.lean

When the -i option is provided, meld is automatically invoked whenever there is discrepancy between the produced and expected outputs. meld can also be used to repair the problems.

In Emacs, we can also execute M-x lean4-diff-test-file to check/diff the file of the current buffer. To mass-copy all .produced.out files to the respective .expected.out file, use tests/lean/copy-produced.

Debugging

Some notes on how to debug Lean, which may also be applicable to debugging Lean programs in general.

Tracing

In CoreM and derived monads, we use trace[traceCls] "msg with {interpolations}" to fill the structured trace viewable with set_option trace.traceCls true. New trace classes have to be registered using registerTraceClass first.

Notable trace classes:

  • Elab.command/Elab.step: command/term macro expansion/elaboration steps

    Useful options modifying these traces for debugging syntax trees:

    set_option pp.raw true
    set_option pp.raw.maxDepth 10
    
  • Meta.synthInstance: typeclass resolution

  • Meta.isDefEq: unification

  • interpreter: full execution trace of the interpreter. Only available in debug builds.

In pure contexts or when execution is aborted before the messages are finally printed, one can instead use the term dbg_trace "msg with {interpolations}"; val (; can also be replaced by a newline), which will print the message to stderr before evaluating val. dbgTraceVal val can be used as a shorthand for dbg_trace "{val}"; val. Note that if the return value is not actually used, the trace code is silently dropped as well.

By default, such stderr output is buffered and shown as messages after a command has been elaborated, which is necessary to ensure deterministic ordering of messages under parallelism. If Lean aborts the process before it can finish the command or takes too long to do that, using -DstderrAsMessages=false avoids this buffering and shows dbg_trace output (but not traces or other diagnostics) immediately.

Debuggers

gdb/lldb can be used to inspect stack traces of compiled Lean code, though they cannot print values of Lean variables and terms in any legible way yet. For example, b lean_panic_fn can be used to look at the stack trace of a panic.

The rr reverse debugger is an amazing tool for investigating e.g. segfaults from reference counting errors, though better hope you will never need it...

Git Commit Convention

We are using the following convention for writing git commit messages. For pull requests, make sure the pull request title and description follow this convention, as the squash-merge commit will inherit title and body from the pull request.

This convention is based on the one from the AngularJS project (doc, commits).

Format of the commit message

<type>: <subject>
<NEWLINE>
<body>
<NEWLINE>
<footer>

<type> is:

  • feat (feature)
  • fix (bug fix)
  • doc (documentation)
  • style (formatting, missing semicolons, ...)
  • refactor
  • test (when adding missing tests)
  • chore (maintain, ex: travis-ci)
  • perf (performance improvement, optimization, ...)

<subject> has the following constraints:

  • use imperative, present tense: "change" not "changed" nor "changes"
  • do not capitalize the first letter
  • no dot(.) at the end

<body> has the following constraints:

  • just as in <subject>, use imperative, present tense
  • includes motivation for the change and contrasts with previous behavior

<footer> is optional and may contain two items:

  • Breaking changes: All breaking changes have to be mentioned in footer with the description of the change, justification and migration notes

  • Referencing issues: Closed bugs should be listed on a separate line in the footer prefixed with "Closes" keyword like this:

    Closes #123, #456

Examples

fix: add declarations for operator<<(std::ostream&, expr const&) and operator<<(std::ostream&, context const&) in the kernel

The actual implementation of these two operators is outside of the kernel. They are implemented in the file 'library/printer.cpp'. We declare them in the kernel to prevent the following problem. Suppose there is a file 'foo.cpp' that does not include 'library/printer.h', but contains

expr a;
...
std::cout << a << "\n";
...

The compiler does not generate an error message. It silently uses the operator bool() to coerce the expression into a Boolean. This produces counter-intuitive behavior, and may confuse developers.

Releasing a stable version

This checklist walks you through releasing a stable version. See below for the checklist for release candidates.

We'll use v4.6.0 as the intended release version as a running example.

  • One week before the planned release, ensure that (1) someone has written the release notes and (2) someone has written the first draft of the release blog post. If there is any material in ./releases_drafts/ on the releases/v4.6.0 branch, then the release notes are not done. (See the section "Writing the release notes".)
  • git checkout releases/v4.6.0 (This branch should already exist, from the release candidates.)
  • git pull
  • In src/CMakeLists.txt, verify you see
    • set(LEAN_VERSION_MINOR 6) (for whichever 6 is appropriate)
    • set(LEAN_VERSION_IS_RELEASE 1)
    • (both of these should already be in place from the release candidates)
  • git tag v4.6.0
  • git push $REMOTE v4.6.0, where $REMOTE is the upstream Lean repository (e.g., origin, upstream)
  • Now wait, while CI runs.
    • You can monitor this at https://github.com/leanprover/lean4/actions/workflows/ci.yml, looking for the v4.6.0 tag.
    • This step can take up to an hour.
    • If you are intending to cut the next release candidate on the same day, you may want to start on the release candidate checklist now.
  • Go to https://github.com/leanprover/lean4/releases and verify that the v4.6.0 release appears.
    • Edit the release notes on Github to select the "Set as the latest release".
    • Follow the instructions in creating a release candidate for the "GitHub release notes" step, now that we have a written RELEASES.md section. Do a quick sanity check.
  • Next, we will move a curated list of downstream repos to the latest stable release.
    • For each of the repositories listed below:
      • Make a PR to master/main changing the toolchain to v4.6.0
        • Update the toolchain file
        • In the Lakefile, if there are dependencies on specific version tags of dependencies that you've already pushed as part of this process, update them to the new tag. If they depend on main or master, don't change this; you've just updated the dependency, so it will work and be saved in the manifest
        • Run lake update
        • The PR title should be "chore: bump toolchain to v4.6.0".
      • Merge the PR once CI completes.
      • Create the tag v4.6.0 from master/main and push it.
      • Merge the tag v4.6.0 into the stable branch and push it.
    • We do this for the repositories:
      • lean4checker
        • No dependencies
        • Toolchain bump PR
        • Create and push the tag
        • Merge the tag into stable
      • Batteries
        • No dependencies
        • Toolchain bump PR
        • Create and push the tag
        • Merge the tag into stable
      • ProofWidgets4
        • Dependencies: Batteries
        • Note on versions and branches:
          • ProofWidgets uses a sequential version tagging scheme, e.g. v0.0.29, which does not refer to the toolchain being used.
          • Make a new release in this sequence after merging the toolchain bump PR.
          • ProofWidgets does not maintain a stable branch.
        • Toolchain bump PR
        • Create and push the tag, following the version convention of the repository
      • Aesop
        • Dependencies: Batteries
        • Toolchain bump PR including updated Lake manifest
        • Create and push the tag
        • Merge the tag into stable
      • doc-gen4
        • Dependencies: exist, but they're not part of the release workflow
        • Toolchain bump PR including updated Lake manifest
        • Create and push the tag
        • There is no stable branch; skip this step
      • Verso
        • Dependencies: exist, but they're not part of the release workflow
        • The SubVerso dependency should be compatible with every Lean release simultaneously, rather than following this workflow
        • Toolchain bump PR including updated Lake manifest
        • Create and push the tag
        • There is no stable branch; skip this step
      • import-graph
        • Toolchain bump PR including updated Lake manifest
        • Create and push the tag
        • There is no stable branch; skip this step
      • Mathlib
        • Dependencies: Aesop, ProofWidgets4, lean4checker, Batteries, doc-gen4, import-graph
        • Toolchain bump PR notes:
          • In addition to updating the lean-toolchain and lakefile.lean, in .github/workflows/lean4checker.yml update the line git checkout v4.6.0 to the appropriate tag.
          • Push the PR branch to the main Mathlib repository rather than a fork, or CI may not work reliably
          • Create and push the tag
          • Create a new branch from the tag, push it, and open a pull request against stable. Coordinate with a Mathlib maintainer to get this merged.
      • REPL
        • Dependencies: Mathlib (for test code)
        • Note that there are two copies of lean-toolchain/lakefile.lean: in the root, and in test/Mathlib/. Edit both, and run lake update in both directories.
        • Toolchain bump PR including updated Lake manifest
        • Create and push the tag
        • Merge the tag into stable
  • The v4.6.0 section of RELEASES.md is out of sync between releases/v4.6.0 and master. This should be reconciled:
    • Replace the v4.6.0 section on master with the v4.6.0 section on releases/v4.6.0 and commit this to master.
  • Merge the release announcement PR for the Lean website - it will be deployed automatically
  • Finally, make an announcement! This should go in https://leanprover.zulipchat.com/#narrow/stream/113486-announce, with topic v4.6.0. Please see previous announcements for suggested language. You will want a few bullet points for main topics from the release notes. Link to the blog post from the Zulip announcement.
  • Make sure that whoever is handling social media knows the release is out.

Optimistic(?) time estimates:

  • Initial checks and push the tag: 30 minutes.
  • Waiting for the release: 60 minutes.
  • Fixing release notes: 10 minutes.
  • Bumping toolchains in downstream repositories, up to creating the Mathlib PR: 30 minutes.
  • Waiting for Mathlib CI and bors: 120 minutes.
  • Finalizing Mathlib tags and stable branch, and updating REPL: 15 minutes.
  • Posting announcement and/or blog post: 20 minutes.

Creating a release candidate.

This checklist walks you through creating the first release candidate for a version of Lean.

We'll use v4.7.0-rc1 as the intended release version in this example.

  • Decide which nightly release you want to turn into a release candidate. We will use nightly-2024-02-29 in this example.
  • It is essential that Batteries and Mathlib already have reviewed branches compatible with this nightly.
    • Check that both Batteries and Mathlib's bump/v4.7.0 branch contain nightly-2024-02-29 in their lean-toolchain.
    • The steps required to reach that state are beyond the scope of this checklist, but see below!
  • Create the release branch from this nightly tag:
    git remote add nightly https://github.com/leanprover/lean4-nightly.git
    git fetch nightly tag nightly-2024-02-29
    git checkout nightly-2024-02-29
    git checkout -b releases/v4.7.0
    
  • In RELEASES.md replace Development in progress in the v4.7.0 section with Release notes to be written.
  • We will rely on automatically generated release notes for release candidates, and the written release notes will be used for stable versions only. It is essential to choose the nightly that will become the release candidate as early as possible, to avoid confusion.
  • In src/CMakeLists.txt,
    • verify that you see set(LEAN_VERSION_MINOR 7) (for whichever 7 is appropriate); this should already have been updated when the development cycle began.
    • set(LEAN_VERSION_IS_RELEASE 1) (this should be a change; on master and nightly releases it is always 0).
    • Commit your changes to src/CMakeLists.txt, and push.
  • git tag v4.7.0-rc1
  • git push origin v4.7.0-rc1
  • Ping the FRO Zulip that release notes need to be written. The release notes do not block completing the rest of this checklist.
  • Now wait, while CI runs.
    • You can monitor this at https://github.com/leanprover/lean4/actions/workflows/ci.yml, looking for the v4.7.0-rc1 tag.
    • This step can take up to an hour.
  • (GitHub release notes) Once the release appears at https://github.com/leanprover/lean4/releases/
    • Verify that the release is marked as a prerelease (this should have been done automatically by the CI release job).
    • In the "previous tag" dropdown, select v4.6.0, and click "Generate release notes". This will add a list of all the commits since the last stable version.
      • Delete "update stage0" commits, and anything with a completely inscrutable commit message.
  • Next, we will move a curated list of downstream repos to the release candidate.
    • This assumes that for each repository either:
      • There is already a reviewed branch bump/v4.7.0 containing the required adaptations. The preparation of this branch is beyond the scope of this document.
      • The repository does not need any changes to move to the new version.
    • For each of the target repositories:
      • If the repository does not need any changes (i.e. bump/v4.7.0 does not exist) then create a new PR updating lean-toolchain to leanprover/lean4:v4.7.0-rc1 and running lake update.
      • Otherwise:
        • Checkout the bump/v4.7.0 branch.
        • Verify that the lean-toolchain is set to the nightly from which the release candidate was created.
        • git merge origin/master
        • Change the lean-toolchain to leanprover/lean4:v4.7.0-rc1
        • In lakefile.lean, change any dependencies which were using nightly-testing or bump/v4.7.0 branches back to master or main, and run lake update for those dependencies.
        • Run lake build to ensure that dependencies are found (but it's okay to stop it after a moment).
        • git commit
        • git push
        • Open a PR from bump/v4.7.0 to master, and either merge it yourself after CI, if appropriate, or notify the maintainers that it is ready to go.
      • Once the PR has been merged, tag master with v4.7.0-rc1 and push this tag.
    • We do this for the same list of repositories as for stable releases, see above. As above, there are dependencies between these, and so the process above is iterative. It greatly helps if you can merge the bump/v4.7.0 PRs yourself! It is essential for Mathlib CI that you then create the next bump/v4.8.0 branch for the next development cycle. Set the lean-toolchain file on this branch to same nightly you used for this release.
    • For Batteries/Aesop/Mathlib, which maintain a nightly-testing branch, make sure there is a tag nightly-testing-2024-02-29 with date corresponding to the nightly used for the release (create it if not), and then on the nightly-testing branch git reset --hard master, and force push.
  • Make an announcement! This should go in https://leanprover.zulipchat.com/#narrow/stream/113486-announce, with topic v4.7.0-rc1. Please see previous announcements for suggested language. You will want a few bullet points for main topics from the release notes. Please also make sure that whoever is handling social media knows the release is out.
  • Begin the next development cycle (i.e. for v4.8.0) on the Lean repository, by making a PR that:
    • Updates src/CMakeLists.txt to say set(LEAN_VERSION_MINOR 8)
    • Replaces the "release notes will be copied" text in the v4.6.0 section of RELEASES.md with the finalized release notes from the releases/v4.6.0 branch.
    • Replaces the "development in progress" in the v4.7.0 section of RELEASES.md with
      Release candidate, release notes will be copied from the branch `releases/v4.7.0` once completed.
      
      and inserts the following section before that section:
      v4.8.0
      ----------
      Development in progress.
      
    • Removes all the entries from the ./releases_drafts/ folder.
    • Titled "chore: begin development cycle for v4.8.0"

Time estimates:

Slightly longer than the corresponding steps for a stable release. Similar process, but more things go wrong. In particular, updating the downstream repositories is significantly more work (because we need to merge existing bump/v4.7.0 branches, not just update a toolchain).

Preparing bump/v4.7.0 branches

While not part of the release process per se, this is a brief summary of the work that goes into updating Batteries/Aesop/Mathlib to new versions.

Please read https://leanprover-community.github.io/contribute/tags_and_branches.html

  • Each repo has an unreviewed nightly-testing branch that receives commits automatically from master, and has its toolchain updated automatically for every nightly. (Note: the aesop branch is not automated, and is updated on an as needed basis.) As a consequence this branch is often broken. A bot posts in the (private!) "Mathlib reviewers" stream on Zulip about the status of these branches.
  • We fix the breakages by committing directly to nightly-testing: there is no PR process.
    • This can either be done by the person managing this process directly, or by soliciting assistance from authors of files, or generally helpful people on Zulip!
  • Each repo has a bump/v4.7.0 which accumulates reviewed changes adapting to new versions.
  • Once nightly-testing is working on a given nightly, say nightly-2024-02-15, we will create a PR to bump/v4.7.0.
  • For Mathlib, there is a script in scripts/create-adaptation-pr.sh that automates this process.
  • For Batteries and Aesop it is currently manual.
  • For all of these repositories, the process is the same:
    • Make sure bump/v4.7.0 is up to date with master (by merging master, no PR necessary)
    • Create from bump/v4.7.0 a bump/nightly-2024-02-15 branch.
    • In that branch, git merge nightly-testing to bring across changes from nightly-testing.
    • Sanity check changes, commit, and make a PR to bump/v4.7.0 from the bump/nightly-2024-02-15 branch.
    • Solicit review, merge the PR into bump/v4.7.0.
  • It is always okay to merge in the following directions: master -> bump/v4.7.0 -> bump/nightly-2024-02-15 -> nightly-testing. Please remember to push any merges you make to intermediate steps!

Writing the release notes

We are currently trying a system where release notes are compiled all at once from someone looking through the commit history. The exact steps are a work in progress. Here is the general idea:

  • The work is done right on the releases/v4.6.0 branch sometime after it is created but before the stable release is made. The release notes for v4.6.0 will later be copied to master when we begin a new development cycle.
  • There can be material for release notes entries in commit messages.
  • There can also be pre-written entries in ./releases_drafts, which should be all incorporated in the release notes and then deleted from the branch. See ./releases_drafts/README.md for more information.
  • The release notes should be written from a downstream expert user's point of view.

This section will be updated when the next release notes are written (for v4.10.0).

Documentation

The Lean doc folder contains the Lean Manual and is authored in a combination of markdown (*.md) files and literate Lean files. The .lean files are preprocessed using a tool called LeanInk and Alectryon which produces a generated markdown file. We then run mdbook on the result to generate the html pages.

Settings

We are using the following settings while editing the markdown docs.

{
    "files.insertFinalNewline": true,
    "files.trimTrailingWhitespace": true,
    "[markdown]": {
        "rewrap.wrappingColumn": 70
    }
}

Build

Using Nix

Building the manual using Nix (which is what the CI does) is as easy as

$ nix build --update-input lean ./doc

You can also open a shell with mdbook for running the commands mentioned below with nix develop ./doc#book. Otherwise, read on.

Manually

To build and test the book you have to preprocess the .lean files with Alectryon then use our own fork of the Rust tool named mdbook. We have our own fork of mdBook with the following additional features:

  • Add support for hiding lines in other languages #1339
  • Make mdbook test call the lean compiler to test the snippets.
  • Ability to test a single chapter at a time which is handy when you are working on that chapter. See the --chapter option.

So you need to setup these tools before you can run mdBook.

  1. install Rust which provides you with the cargo tool for building rust packages. Then run the following:

    cargo install --git https://github.com/leanprover/mdBook mdbook
    
  2. Clone https://github.com/leanprover/LeanInk.git and run lake build then make the resulting binary available to Alectryon using e.g.

    # make `leanInk` available in the current shell
    export PATH=$PWD/build/bin:$PATH
    
  3. Create a Python 3.10 environment.

  4. Install Alectryon:

    python3 -m pip install git+https://github.com/Kha/alectryon.git@typeid
    
  5. Now you are ready to process the *.lean files using Alectryon as follows:

    cd lean4/doc
    alectryon --frontend lean4+markup examples/palindromes.lean --backend webpage -o palindromes.lean.md
    

    Repeat this for the other .lean files you care about or write a script to process them all.

  6. Now you can build the book using:

    cd lean4/doc
    mdbook build
    

This will put the HTML in a out folder so you can load out/index.html in your web browser and it should look like https://lean-lang.org/lean4/doc/.

  1. It is also handy to use e.g. mdbook watch in the doc/ folder so that it keeps the html up to date while you are editing.

    mdbook watch --open  # opens the output in `out/` in your default browser
    

Testing Lean Snippets

You can run the following in the doc/ folder to test all the lean code snippets.

```bash
mdbook test
```

and you can use the --chapter option to test a specific chapter that you are working on:

```bash
mdbook test --chapter Array
```

Use chapter name ? to get a list of all the chapter names.

Foreign Function Interface

NOTE: The current interface was designed for internal use in Lean and should be considered unstable. It will be refined and extended in the future.

As Lean is written partially in Lean itself and partially in C++, it offers efficient interoperability between the two languages (or rather, between Lean and any language supporting C interfaces). This support is however currently limited to transferring Lean data types; in particular, it is not possible yet to pass or return compound data structures such as C structs by value from or to Lean.

There are two primary attributes for interoperating with other languages:

  • @[extern "sym"] constant leanSym : ... binds a Lean declaration to the external symbol sym. It can also be used with def to provide an internal definition, but ensuring consistency of both definitions is up to the user.
  • @[export sym] def leanSym : ... exports leanSym under the unmangled symbol name sym.

For simple examples of how to call foreign code from Lean and vice versa, see https://github.com/leanprover/lean4/blob/master/src/lake/examples/ffi and https://github.com/leanprover/lean4/blob/master/src/lake/examples/reverse-ffi, respectively.

The Lean ABI

The Lean Application Binary Interface (ABI) describes how the signature of a Lean declaration is encoded as a native calling convention. It is based on the standard C ABI and calling convention of the target platform. For a Lean declaration marked with either @[extern "sym"] or @[export sym] for some symbol name sym, let α₁ → ... → αₙ → β be the normalized declaration's type. If n is 0, the corresponding C declaration is

extern s sym;

where s is the C translation of β as specified in the next section. In the case of an @[extern] definition, the symbol's value is guaranteed to be initialized only after calling the Lean module's initializer or that of an importing module; see Initialization.

If n is greater than 0, the corresponding C declaration is

s sym(t₁, ..., tₘ);

where the parameter types tᵢ are the C translation of the αᵢ as in the next section. In the case of @[extern] all irrelevant types are removed first; see next section.

Translating Types from Lean to C

  • The integer types UInt8, ..., UInt64, USize are represented by the C types uint8_t, ..., uint64_t, size_t, respectively

  • Char is represented by uint32_t

  • Float is represented by double

  • An enum inductive type of at least 2 and at most 2^32 constructors, each of which with no parameters, is represented by the first type of uint8_t, uint16_t, uint32_t that is sufficient to represent all constructor indices.

    For example, the type Bool is represented as uint8_t with values 0 for false and 1 for true.

  • Decidable α is represented the same way as Bool

  • An inductive type with a trivial structure, that is,

    • it is none of the types described above
    • it is not marked unsafe
    • it has a single constructor with a single parameter of relevant type

    is represented by the representation of that parameter's type.

    For example, { x : α // p }, the Subtype structure of a value of type α and an irrelevant proof, is represented by the representation of α.

  • Nat is represented by lean_object *. Its runtime value is either a pointer to an opaque bignum object or, if the lowest bit of the "pointer" is 1 (lean_is_scalar), an encoded unboxed natural number (lean_box/lean_unbox).

  • A universe Sort u, type constructor ... → Sort u, or proposition p : Prop is irrelevant and is either statically erased (see above) or represented as a lean_object * with the runtime value lean_box(0)

  • Any other type is represented by lean_object *. Its runtime value is a pointer to an object of a subtype of lean_object (see the "Inductive types" section below) or the unboxed value lean_box(cidx) for the cidxth constructor of an inductive type if this constructor does not have any relevant parameters.

    Example: the runtime value of u : Unit is always lean_box(0).

Inductive types

For inductive types which are in the fallback lean_object * case above and not trivial constructors, the type is stored as a lean_ctor_object, and lean_is_ctor will return true. A lean_ctor_object stores the constructor index in the header, and the fields are stored in the m_objs portion of the object.

The memory order of the fields is derived from the types and order of the fields in the declaration. They are ordered as follows:

  • Non-scalar fields stored as lean_object *
  • Fields of type USize
  • Other scalar fields, in decreasing order by size

Within each group the fields are ordered in declaration order. Warning: Trivial wrapper types still count toward a field being treated as non-scalar for this purpose.

  • To access fields of the first kind, use lean_ctor_get(val, i) to get the ith non-scalar field.
  • To access USize fields, use lean_ctor_get_usize(val, n+i) to get the ith usize field and n is the total number of fields of the first kind.
  • To access other scalar fields, use lean_ctor_get_uintN(val, off) or lean_ctor_get_usize(val, off) as appropriate. Here off is the byte offset of the field in the structure, starting at n*sizeof(void*) where n is the number of fields of the first two kinds.

For example, a structure such as

structure S where
  ptr_1 : Array Nat
  usize_1 : USize
  sc64_1 : UInt64
  ptr_2 : { x : UInt64 // x > 0 } -- wrappers don't count as scalars
  sc64_2 : Float -- `Float` is 64 bit
  sc8_1 : Bool
  sc16_1 : UInt16
  sc8_2 : UInt8
  sc64_3 : UInt64
  usize_2 : USize
  ptr_3 : Char -- trivial wrapper around `UInt32`
  sc32_1 : UInt32
  sc16_2 : UInt16

would get re-sorted into the following memory order:

  • S.ptr_1 - lean_ctor_get(val, 0)
  • S.ptr_2 - lean_ctor_get(val, 1)
  • S.ptr_3 - lean_ctor_get(val, 2)
  • S.usize_1 - lean_ctor_get_usize(val, 3)
  • S.usize_2 - lean_ctor_get_usize(val, 4)
  • S.sc64_1 - lean_ctor_get_uint64(val, sizeof(void*)*5)
  • S.sc64_2 - lean_ctor_get_float(val, sizeof(void*)*5 + 8)
  • S.sc64_3 - lean_ctor_get_uint64(val, sizeof(void*)*5 + 16)
  • S.sc32_1 - lean_ctor_get_uint32(val, sizeof(void*)*5 + 24)
  • S.sc16_1 - lean_ctor_get_uint16(val, sizeof(void*)*5 + 28)
  • S.sc16_2 - lean_ctor_get_uint16(val, sizeof(void*)*5 + 30)
  • S.sc8_1 - lean_ctor_get_uint8(val, sizeof(void*)*5 + 32)
  • S.sc8_2 - lean_ctor_get_uint8(val, sizeof(void*)*5 + 33)

Borrowing

By default, all lean_object * parameters of an @[extern] function are considered owned, i.e. the external code is passed a "virtual RC token" and is responsible for passing this token along to another consuming function (exactly once) or freeing it via lean_dec. To reduce reference counting overhead, parameters can be marked as borrowed by prefixing their type with @&. Borrowed objects must only be passed to other non-consuming functions (arbitrarily often) or converted to owned values using lean_inc. In lean.h, the lean_object * aliases lean_obj_arg and b_lean_obj_arg are used to mark this difference on the C side.

Return values and @[export] parameters are always owned at the moment.

Initialization

When including Lean code as part of a larger program, modules must be initialized before accessing any of their declarations. Module initialization entails

  • initialization of all "constants" (nullary functions), including closed terms lifted out of other functions
  • execution of all [init] functions
  • execution of all [builtin_init] functions, if the builtin parameter of the module initializer has been set

The module initializer is automatically run with the builtin flag for executables compiled from Lean code and for "plugins" loaded with lean --plugin. For all other modules imported by lean, the initializer is run without builtin. Thus [init] functions are run iff their module is imported, regardless of whether they have native code available or not, while [builtin_init] functions are only run for native executable or plugins, regardless of whether their module is imported or not. lean uses built-in initializers for e.g. registering basic parsers that should be available even without importing their module (which is necessary for bootstrapping).

The initializer for module A.B is called initialize_A_B and will automatically initialize any imported modules. Module initializers are idempotent (when run with the same builtin flag), but not thread-safe. Together with initialization of the Lean runtime, you should execute code like the following exactly once before accessing any Lean declarations:

void lean_initialize_runtime_module();
void lean_initialize();
lean_object * initialize_A_B(uint8_t builtin, lean_object *);
lean_object * initialize_C(uint8_t builtin, lean_object *);
...

lean_initialize_runtime_module();
//lean_initialize();  // necessary if you (indirectly) access the `Lean` package

lean_object * res;
// use same default as for Lean executables
uint8_t builtin = 1;
res = initialize_A_B(builtin, lean_io_mk_world());
if (lean_io_result_is_ok(res)) {
    lean_dec_ref(res);
} else {
    lean_io_result_show_error(res);
    lean_dec(res);
    return ...;  // do not access Lean declarations if initialization failed
}
res = initialize_C(builtin, lean_io_mk_world());
if (lean_io_result_is_ok(res)) {
...

//lean_init_task_manager();  // necessary if you (indirectly) use `Task`
lean_io_mark_end_initialization();

In addition, any other thread not spawned by the Lean runtime itself must be initialized for Lean use by calling

void lean_initialize_thread();

and should be finalized in order to free all thread-local resources by calling

void lean_finalize_thread();

@[extern] in the Interpreter

The interpreter can run Lean declarations for which symbols are available in loaded shared libraries, which includes @[extern] declarations. Thus to e.g. run #eval on such a declaration, you need to

  1. compile (at least) the module containing the declaration and its dependencies into a shared library, and then
  2. pass this library to lean --load-dynlib= to run code importing this module.

Note that it is not sufficient to load the foreign library containing the external symbol because the interpreter depends on code that is emitted for each @[extern] declaration. Thus it is not possible to interpret an @[extern] declaration in the same file.

See tests/compiler/foreign for an example.