Lean's concrete syntax is extensible.
In a language like Lean, it's not possible to completely describe the syntax once and for all, because libraries may define syntax in addition to new constants or inductive types.
Rather than completely describing the language here, the overall framework is described, while the syntax of each language construct is documented in the section to which it belongs.
Comments are stretches of the file that, despite not being whitespace, are treated as such.
Lean has two syntaxes for comments:
- Line comments
A --
that does not occur as part of a token begins a line comment. All characters from the initial -
to the newline are treated as whitespace.
- Block comments
A /-
that does not occur as part of a token and is not immediately followed by a -
character begins a block comment.
The block comment continues until a terminating -/
is found.
Block comments may be nested; a -/
only terminates the comment if prior nested block comment openers /-
have been terminated by a matching -/
.
/--
and /-!
begin documentation rather than comments, which are also terminated with -/
and may contain nested block comments.
Even though documentation resembles comments, they are their own syntactic category; their valid placement is determined by Lean's grammar.
4.1.1.2.3. Keywords and Identifiers🔗
An identifier consists of one or more identifier components, separated by '.'
.
Identifier components consist of a letter or letter-like character or an underscore ('_'
), followed by zero or more identifier continuation characters.
Letters are English letters, upper- or lowercase, and the letter-like characters include a range of non-English alphabetic scripts, including the Greek script which is widely used in Lean, as well as the members of the Unicode letter-like symbol block, which contains a number of double-struck characters (including ℕ
and ℤ
) and abbreviations.
Identifier continuation characters consist of letters, letter-like characters, underscores ('_'
), exclamation marks (!
), question marks (?
), subscripts, and single quotes ('
).
As an exception, underscore alone is not a valid identifier.
Identifiers components may also be surrounded by double guillemets ('«'
and '»'
).
Such identifier components may contain any character at all aside from '»'
, even '«'
, '.'
, and newlines.
The guillemets are not part of the resulting identifier component, so «x»
and x
denote the same identifier.
«Nat.add»
, on the other hand, is an identifier with a single component, while Nat.add
has two.
Some potential identifier components may be reserved keywords.
The specific set of reserved keywords depends on the set of active syntax extensions, which may depend on the set of imported modules and the currently-opened namespaces; it is impossible to enumerate for Lean as a whole.
These keywords must also be quoted with guillemets to be used as identifier components in most syntactic contexts.
Contexts in which keywords may be used as identifiers without guillemets, such as constructor names in inductive types, are raw identifier contexts.
Identifiers that contain one or more '.'
characters, and thus consist of more than one identifier component, are called hierarchical identifiers.
Hierarchical identifiers are used to represent both module names and names in a namespace.