Balanced Parentheses as an Embedded Language

Let's look at how to use macros to extend the Lean 4 parser and embed a language for building balanced parentheses. This language accepts strings given by the BNF grammar

Dyck ::=
    "(" Dyck ")"
  | "{" Dyck "}"
  | "end"

We begin by defining an inductive data type of the grammar we wish to parse:

inductive Dyck : Type where | round : Dyck Dyck -- ( <inner> ) | curly : Dyck Dyck -- { <inner> } | leaf : Dyck

We begin by declaring a syntax category using the declare_syntax_cat <category> command. This names our grammar and allows us to specify parsing rules associated with our grammar.

/-- Dyck grammar parse trees -/ declare_syntax_cat brack

Next, we specify the grammar using the syntax <parse rule> command:

syntax "end" : brack

The above means that the token end lives in syntax category brack.

Similarly, we declare the rules "(" Dyck ")" and "{" Dyck "}" using the rules:

syntax "(" brack ")" : brack syntax "{" brack "}" : brack

Finally, we need a way to build Lean 4 terms from this grammar -- that is, we must translate out of this grammar into a Dyck value, which is a Lean 4 term. For this, we create a new kind of "quotation" that consumes syntax in brack and produces a term.

/-- Notation for translating `brack` into `term` -/ syntax "`[Dyck| " brack "]" : term

To specify the transformation rules, we use macro_rules to declare how the syntax `[Dyck| <brack>] produces terms. This is written using a pattern-matching style syntax, where the left-hand side declares the syntax pattern to be matched, and the right-hand side declares the production. Syntax placeholders (antiquotations) are introduced via the $<var-name> syntax. The right-hand side is an arbitrary Lean term that we are producing.

/-- rules to translate Dyck grammar into inductive value of type Dyck -/ macro_rules | `(`[Dyck| end]) => `(Dyck.leaf) | `(`[Dyck| ($b)]) => `(Dyck.round `[Dyck| $b]) -- recurse | `(`[Dyck| {$b}]) => `(Dyck.curly `[Dyck| $b]) -- recurse Dyck.leaf : Dyck#check `[Dyck| end] -- Dyck.leaf Dyck.leaf.round.curly : Dyck#check `[Dyck| {(end)}] -- Dyck.leaf.round.curly

In summary, we've seen:

The full program listing is given below:

inductive Dyck : Type where | round : Dyck Dyck -- ( <inner> ) | curly : Dyck Dyck -- { <inner> } | leaf : Dyck /-- Dyck grammar parse trees -/ declare_syntax_cat brack syntax "(" brack ")" : brack syntax "{" brack "}" : brack syntax "end" : brack /-- Notation for translating `brack` into `term` -/ syntax "`[Dyck| " brack "]" : term /-- rules to translate Dyck grammar into inductive value of type Dyck -/ macro_rules | `(`[Dyck| end]) => `(Dyck.leaf) | `(`[Dyck| ($b)]) => `(Dyck.round `[Dyck| $b]) -- recurse | `(`[Dyck| {$b}]) => `(Dyck.curly `[Dyck| $b]) -- recurse Dyck.leaf : Dyck#check `[Dyck| end] -- Dyck.leaf Dyck.leaf.round.curly : Dyck#check `[Dyck| {(end)}] -- Dyck.leaf.round.curly