Arithmetic as an embedded domain-specific language
Let's parse another classic grammar, the grammar of arithmetic expressions with addition, multiplication, integers, and variables. In the process, we'll learn how to:
- Convert identifiers such as
x
into strings within a macro. - add the ability to "escape" the macro context from within the macro. This is useful to interpret identifiers with their original meaning (predefined values) instead of their new meaning within a macro (treat as a symbol).
Let's begin with the simplest thing possible. We'll define an AST, and use operators +
and *
to denote
building an arithmetic AST.
Here's the AST that we will be parsing:
inductive Arith : Type
| add : Arith → Arith → Arith -- e + f
| mul : Arith → Arith → Arith -- e * f
| int : Int → Arith -- constant
| symbol : String → Arith -- variable
We declare a syntax category to describe the grammar that we will be parsing.
See that we control the precedence of +
and *
by writing syntax:50
for addition and syntax:60
for multiplication,
indicating that multiplication binds tighter than addition (higher the number, tighter the binding).
This allows us to declare precedence when defining new syntax.
declare_syntax_cat arith
syntax num : arith -- int for Arith.int
syntax str : arith -- strings for Arith.symbol
syntax:60 arith:60 "+" arith:61 : arith -- Arith.add
syntax:70 arith:70 "*" arith:71 : arith -- Arith.mul
syntax "(" arith ")" : arith -- parenthesized expressions
Further, if we look at syntax:60 arith:60 "+" arith:61 : arith
, the
precedence declarations at arith:60 "+" arith:61
conveys that the left
argument must have precedence at least 60
or greater, and the right argument
must have precedence at least61
or greater. Note that this forces left
associativity. To understand this, let's compare two hypothetical parses:
-- syntax:60 arith:60 "+" arith:61 : arith -- Arith.add
-- a + b + c
(a:60 + b:61):60 + c
a + (b:60 + c:61):60
In the parse tree of a + (b:60 + c:61):60
, we see that the right argument (b + c)
is given the precedence 60
. However,
the rule for addition expects the right argument to have a precedence of at least 61, as witnessed by the arith:61
at
the right-hand-side of syntax:60 arith:60 "+" arith:61 : arith
. Thus, the rule syntax:60 arith:60 "+" arith:61 : arith
ensures that addition is left associative.
Since addition is declared arguments of precedence 60/61
and multiplication with 70/71
, this causes multiplication to bind
tighter than addition. Once again, let's compare two hypothetical parses:
-- syntax:60 arith:60 "+" arith:61 : arith -- Arith.add
-- syntax:70 arith:70 "*" arith:71 : arith -- Arith.mul
-- a * b + c
a * (b:60 + c:61):60
(a:70 * b:71):70 + c
While parsing a * (b + c)
, (b + c)
is assigned a precedence 60
by the addition rule. However, multiplication expects
the right argument to have precedence at least 71. Thus, this parse is invalid. In contrast, (a * b) + c
assigns
a precedence of 70
to (a * b)
. This is compatible with addition which expects the left argument to have precedence
**at least 60
** (70
is greater than 60
). Thus, the string a * b + c
is parsed as (a * b) + c
.
For more details, please look at the Lean manual on syntax extensions.
To go from strings into Arith
, we define a macro to
translate the syntax category arith
into an Arith
inductive value that
lives in term
:
-- auxiliary notation for translating `arith` into `term`
syntax "`[Arith| " arith "]" : term
Our macro rules perform the "obvious" translation:
macro_rules
| `(`[Arith| $s:str]) => `(Arith.symbol $s)
| `(`[Arith| $num:num]) => `(Arith.int $num)
| `(`[Arith| $x + $y]) => `(Arith.add `[Arith| $x] `[Arith| $y])
| `(`[Arith| $x * $y]) => `(Arith.mul `[Arith| $x] `[Arith| $y])
| `(`[Arith| ($x)]) => `(`[Arith| $x])
And some examples:
#check `[Arith| "x" * "y"] -- mul
-- Arith.mul (Arith.symbol "x") (Arith.symbol "y")
#check `[Arith| "x" + "y"] -- add
-- Arith.add (Arith.symbol "x") (Arith.symbol "y")
#check `[Arith| "x" + 20] -- symbol + int
-- Arith.add (Arith.symbol "x") (Arith.int 20)
#check `[Arith| "x" + "y" * "z"] -- precedence
-- Arith.add (Arith.symbol "x") (Arith.mul (Arith.symbol "y") (Arith.symbol "z"))
#check `[Arith| "x" * "y" + "z"] -- precedence
-- Arith.add (Arith.mul (Arith.symbol "x") (Arith.symbol "y")) (Arith.symbol "z")
#check `[Arith| ("x" + "y") * "z"] -- parentheses
-- Arith.mul (Arith.add (Arith.symbol "x") (Arith.symbol "y")) (Arith.symbol "z")
Writing variables as strings, such as "x"
gets old; wouldn't it be so much
prettier if we could write x * y
, and have the macro translate this into Arith.mul (Arith.Symbol "x") (Arith.mul "y")
?
We can do this, and this will be our first taste of manipulating macro variables --- we'll use x.getId
instead of directly evaluating $x
.
We also write a macro rule for Arith|
that translates an identifier into
a string, using $(Lean.quote (toString x.getId))
:
syntax ident : arith
macro_rules
| `(`[Arith| $x:ident]) => `(Arith.symbol $(Lean.quote (toString x.getId)))
Let's test and see that we can now write expressions such as x * y
directly instead of having to write "x" * "y"
:
#check `[Arith| x] -- Arith.symbol "x"
def xPlusY := `[Arith| x + y]
#print xPlusY -- def xPlusY : Arith := Arith.add (Arith.symbol "x") (Arith.symbol "y")
We now show an unfortunate consequence of the above definitions. Suppose we want to build (x + y) + z
.
Since we already have defined xPlusY
as x + y
, perhaps we should reuse it! Let's try:
#check `[Arith| xPlusY + z] -- Arith.add (Arith.symbol "xPlusY") (Arith.symbol "z")
Whoops, that didn't work! What happened? Lean treats xPlusY
itself as an identifier! So we need to add some syntax
to be able to "escape" the Arith|
context. Let's use the syntax <[ $e:term ]>
to mean: evaluate $e
as a real term,
not an identifier. The macro looks like follows:
syntax "<[" term "]>" : arith -- escape for embedding terms into `Arith`
macro_rules
| `(`[Arith| <[ $e:term ]>]) => pure e
Let's try our previous example:
#check `[Arith| <[ xPlusY ]> + z] -- Arith.add xPlusY (Arith.symbol "z")
Perfect!
In this tutorial, we expanded on the previous tutorial to parse a more realistic grammar with multiple levels of precedence, how to parse identifiers directly within a macro, and how to provide an escape from within the macro context.
Full code listing
inductive Arith : Type
| add : Arith → Arith → Arith -- e + f
| mul : Arith → Arith → Arith -- e * f
| int : Int → Arith -- constant
| symbol : String → Arith -- variable
declare_syntax_cat arith
syntax num : arith -- int for Arith.int
syntax str : arith -- strings for Arith.symbol
syntax:60 arith:60 "+" arith:61 : arith -- Arith.add
syntax:70 arith:70 "*" arith:71 : arith -- Arith.mul
syntax "(" arith ")" : arith -- parenthesized expressions
-- auxiliary notation for translating `arith` into `term`
syntax "`[Arith| " arith "]" : term
macro_rules
| `(`[Arith| $s:str]) => `(Arith.symbol $s)
| `(`[Arith| $num:num]) => `(Arith.int $num)
| `(`[Arith| $x + $y]) => `(Arith.add `[Arith| $x] `[Arith| $y])
| `(`[Arith| $x * $y]) => `(Arith.mul `[Arith| $x] `[Arith| $y])
| `(`[Arith| ($x)]) => `(`[Arith| $x])
#check `[Arith| "x" * "y"] -- mul
-- Arith.mul (Arith.symbol "x") (Arith.symbol "y")
#check `[Arith| "x" + "y"] -- add
-- Arith.add (Arith.symbol "x") (Arith.symbol "y")
#check `[Arith| "x" + 20] -- symbol + int
-- Arith.add (Arith.symbol "x") (Arith.int 20)
#check `[Arith| "x" + "y" * "z"] -- precedence
-- Arith.add (Arith.symbol "x") (Arith.mul (Arith.symbol "y") (Arith.symbol "z"))
#check `[Arith| "x" * "y" + "z"] -- precedence
-- Arith.add (Arith.mul (Arith.symbol "x") (Arith.symbol "y")) (Arith.symbol "z")
#check `[Arith| ("x" + "y") * "z"] -- parentheses
-- Arith.mul (Arith.add (Arith.symbol "x") (Arith.symbol "y")) (Arith.symbol "z")
syntax ident : arith
macro_rules
| `(`[Arith| $x:ident]) => `(Arith.symbol $(Lean.quote (toString x.getId)))
#check `[Arith| x] -- Arith.symbol "x"
def xPlusY := `[Arith| x + y]
#print xPlusY -- def xPlusY : Arith := Arith.add (Arith.symbol "x") (Arith.symbol "y")
syntax "<[" term "]>" : arith -- escape for embedding terms into `Arith`
macro_rules
| `(`[Arith| <[ $e:term ]>]) => pure e
#check `[Arith| <[ xPlusY ]> + z] -- Arith.add xPlusY (Arith.symbol "z")