Lean AI formalization leaderboard

lean-eval

Lean AI formalization leaderboard

Public results on a benchmark of hard Lean formalization problems. Expand any row to inspect solved theorems, extracted statements, and links to public proofs when available.

10
2
35

Leaderboard

Model rankings

Ranked by solved problems, with main benchmark problems weighted first.

1Aristotle (Harmonic)6 solved3 main3 test
Catalan generating function via compositional inversion
substInv_X_sub_X_sq_eq_catalan

Verso theorem preview

theorem declaration uses `sorry`substInv_X_sub_X_sq_eq_catalan (n : ) : haveI : Invertible (coeff 1 ((X : ⟦X⟧) - X ^ 2)) := n:Invertible ((coeff 1) (X - X ^ 2)) n:Invertible 1; All goals completed! 🐙 coeff (n + 1) (substInv ((X : ⟦X⟧) - X ^ 2)) = (Nat.choose (2 * n) n : ) / (n + 1) := n:(coeff (n + 1)) (X - X ^ 2).substInv = ((2 * n).choose n) / (n + 1) All goals completed! 🐙
#1proof
How produced

Solved by the Aristotle automated theorem prover (Harmonic) via the `aristotle` CLI v1.0.1, with submission orchestration by Kim Morrison. The proof was generated against Aristotle's default Lean v4.28.0 / Mathlib v4.28.0 target; submitted here against the benchmark's pinned Lean v4.30.0-rc2 + pinned Mathlib commit. Comparator's verdict applies — proofs that exploit features added to Mathlib after v4.28.0 will compile; those that rely on tactic behaviour that changed are at the mercy of forward compatibility.

CI regenerate-main check
ci_regenerate_main_check

Verso theorem preview

theorem declaration uses `sorry`ci_regenerate_main_check : True := True All goals completed! 🐙
#2proof
How produced

Solved by the Aristotle automated theorem prover (Harmonic) via the `aristotle` CLI v1.0.1, with submission orchestration by Kim Morrison. The proof was generated against Aristotle's default Lean v4.28.0 / Mathlib v4.28.0 target; submitted here against the benchmark's pinned Lean v4.30.0-rc2 + pinned Mathlib commit. Comparator's verdict applies — proofs that exploit features added to Mathlib after v4.28.0 will compile; those that rely on tactic behaviour that changed are at the mercy of forward compatibility.

Cayley graph connected iff generators generate the group
mulCayley_connected_iff_closure_eq_top

Verso theorem preview

theorem declaration uses `sorry`mulCayley_connected_iff_closure_eq_top {G : Type*} [Group G] (S : Set G) : (SimpleGraph.mulCayley S).Connected Subgroup.closure S = := G:Type u_1inst✝:Group GS:Set G(SimpleGraph.mulCayley S).Connected Subgroup.closure S = All goals completed! 🐙
#3proof
How produced

Solved by the Aristotle automated theorem prover (Harmonic) via the `aristotle` CLI v1.0.1, with submission orchestration by Kim Morrison. The proof was generated against Aristotle's default Lean v4.28.0 / Mathlib v4.28.0 target; submitted here against the benchmark's pinned Lean v4.30.0-rc2 + pinned Mathlib commit. Comparator's verdict applies — proofs that exploit features added to Mathlib after v4.28.0 will compile; those that rely on tactic behaviour that changed are at the mercy of forward compatibility.

Finite Ramsey theorem for graphs
finite_graph_ramsey_theorem

Verso theorem preview

theorem declaration uses `sorry`finite_graph_ramsey_theorem : r s : , 2 r 2 s n : , G : SimpleGraph (Fin n), ¬ G.CliqueFree r ¬ G.CliqueFree s := (r s : ), 2 r 2 s n, (G : SimpleGraph (Fin n)), ¬G.CliqueFree r ¬G.CliqueFree s All goals completed! 🐙
#4proof
How produced

Solved by the Aristotle automated theorem prover (Harmonic) via the `aristotle` CLI v1.0.1, with submission orchestration by Kim Morrison. The proof was generated against Aristotle's default Lean v4.28.0 / Mathlib v4.28.0 target; submitted here against the benchmark's pinned Lean v4.30.0-rc2 + pinned Mathlib commit. Comparator's verdict applies — proofs that exploit features added to Mathlib after v4.28.0 will compile; those that rely on tactic behaviour that changed are at the mercy of forward compatibility.

Appending a singleton increases the list length
list_append_singleton_length

Verso theorem preview

theorem declaration uses `sorry`list_append_singleton_length : (([1, 2] : List Nat).append [3]).length = 3 := ([1, 2].append [3]).length = 3 All goals completed! 🐙
#5proof
How produced

Solved by the Aristotle automated theorem prover (Harmonic) via the `aristotle` CLI v1.0.1, with submission orchestration by Kim Morrison. The proof was generated against Aristotle's default Lean v4.28.0 / Mathlib v4.28.0 target; submitted here against the benchmark's pinned Lean v4.30.0-rc2 + pinned Mathlib commit. Comparator's verdict applies — proofs that exploit features added to Mathlib after v4.28.0 will compile; those that rely on tactic behaviour that changed are at the mercy of forward compatibility.

2 + 2 = 4
two_plus_two

Verso theorem preview

theorem declaration uses `sorry`two_plus_two_eq_four : (2 : Nat) + 2 = 4 := 2 + 2 = 4 All goals completed! 🐙
#6proof
How produced

Solved by the Aristotle automated theorem prover (Harmonic) via the `aristotle` CLI v1.0.1, with submission orchestration by Kim Morrison. The proof was generated against Aristotle's default Lean v4.28.0 / Mathlib v4.28.0 target; submitted here against the benchmark's pinned Lean v4.30.0-rc2 + pinned Mathlib commit. Comparator's verdict applies — proofs that exploit features added to Mathlib after v4.28.0 will compile; those that rely on tactic behaviour that changed are at the mercy of forward compatibility.

First submissionMay 1, 2026
Last submissionMay 1, 2026
kim-em6
2Claude Opus 4.73 solved1 main2 test
Finite Ramsey theorem for graphs
finite_graph_ramsey_theorem

Verso theorem preview

theorem declaration uses `sorry`finite_graph_ramsey_theorem : r s : , 2 r 2 s n : , G : SimpleGraph (Fin n), ¬ G.CliqueFree r ¬ G.CliqueFree s := (r s : ), 2 r 2 s n, (G : SimpleGraph (Fin n)), ¬G.CliqueFree r ¬G.CliqueFree s All goals completed! 🐙
#1proof
Appending a singleton increases the list length
list_append_singleton_length

Verso theorem preview

theorem declaration uses `sorry`list_append_singleton_length : (([1, 2] : List Nat).append [3]).length = 3 := ([1, 2].append [3]).length = 3 All goals completed! 🐙
#2proof
2 + 2 = 4
two_plus_two

Verso theorem preview

theorem declaration uses `sorry`two_plus_two_eq_four : (2 : Nat) + 2 = 4 := 2 + 2 = 4 All goals completed! 🐙
#3proof
First submissionApr 30, 2026
Last submissionApr 30, 2026
rkirov3kim-em1
3GPT-5.51 solved0 main1 test
2 + 2 = 4
two_plus_two

Verso theorem preview

theorem declaration uses `sorry`two_plus_two_eq_four : (2 : Nat) + 2 = 4 := 2 + 2 = 4 All goals completed! 🐙
#1proof
How produced

Seed submission to populate the leaderboard. Solved by GPT-5.5 via the Pi coding agent (badlogic/pi-mono) on OpenRouter, driven by Kim Morrison. Test submission, not affiliated with the model creators.

First submissionMay 1, 2026
Last submissionMay 1, 2026
kim-em1
4Kimi K2.61 solved0 main1 test
2 + 2 = 4
two_plus_two

Verso theorem preview

theorem declaration uses `sorry`two_plus_two_eq_four : (2 : Nat) + 2 = 4 := 2 + 2 = 4 All goals completed! 🐙
#1proof
How produced

Seed submission to populate the leaderboard. Solved by Kimi K2.6 via the Pi coding agent (badlogic/pi-mono) on OpenRouter, driven by Kim Morrison. Test submission, not affiliated with the model creators.

First submissionMay 1, 2026
Last submissionMay 1, 2026
kim-em1
5Mistral Large 31 solved0 main1 test
2 + 2 = 4
two_plus_two

Verso theorem preview

theorem declaration uses `sorry`two_plus_two_eq_four : (2 : Nat) + 2 = 4 := 2 + 2 = 4 All goals completed! 🐙
#1proof
How produced

Seed submission to populate the leaderboard. Solved by Mistral Large 3 via the Pi coding agent (badlogic/pi-mono) on OpenRouter, driven by Kim Morrison. Test submission, not affiliated with the model creators.

First submissionMay 1, 2026
Last submissionMay 1, 2026
kim-em1
6Gemini 3.1 Pro1 solved0 main1 test
2 + 2 = 4
two_plus_two

Verso theorem preview

theorem declaration uses `sorry`two_plus_two_eq_four : (2 : Nat) + 2 = 4 := 2 + 2 = 4 All goals completed! 🐙
#1proof
How produced

Seed submission to populate the leaderboard. Solved by Gemini 3.1 Pro via the Pi coding agent (badlogic/pi-mono) on OpenRouter, driven by Kim Morrison. Test submission, not affiliated with the model creators.

First submissionMay 1, 2026
Last submissionMay 1, 2026
kim-em1
7DeepSeek V4 Pro1 solved0 main1 test
2 + 2 = 4
two_plus_two

Verso theorem preview

theorem declaration uses `sorry`two_plus_two_eq_four : (2 : Nat) + 2 = 4 := 2 + 2 = 4 All goals completed! 🐙
#1proof
How produced

Seed submission to populate the leaderboard. Solved by DeepSeek V4 Pro via the Pi coding agent (badlogic/pi-mono) on OpenRouter, driven by Kim Morrison. Test submission, not affiliated with the model creators.

First submissionMay 1, 2026
Last submissionMay 1, 2026
kim-em1
8Qwen3.6 Max1 solved0 main1 test
2 + 2 = 4
two_plus_two

Verso theorem preview

theorem declaration uses `sorry`two_plus_two_eq_four : (2 : Nat) + 2 = 4 := 2 + 2 = 4 All goals completed! 🐙
#1proof
How produced

Seed submission to populate the leaderboard. Solved by Qwen3.6 Max via the Pi coding agent (badlogic/pi-mono) on OpenRouter, driven by Kim Morrison. Test submission, not affiliated with the model creators.

First submissionMay 1, 2026
Last submissionMay 1, 2026
kim-em1
9Grok 4.31 solved0 main1 test
2 + 2 = 4
two_plus_two

Verso theorem preview

theorem declaration uses `sorry`two_plus_two_eq_four : (2 : Nat) + 2 = 4 := 2 + 2 = 4 All goals completed! 🐙
#1proof
How produced

Seed submission to populate the leaderboard. Solved by Grok 4.3 via the Pi coding agent (badlogic/pi-mono) on OpenRouter, driven by Kim Morrison. Test submission, not affiliated with the model creators.

First submissionMay 1, 2026
Last submissionMay 1, 2026
kim-em1
10Claude Sonnet 4.61 solved0 main1 test
2 + 2 = 4
two_plus_two

Verso theorem preview

theorem declaration uses `sorry`two_plus_two_eq_four : (2 : Nat) + 2 = 4 := 2 + 2 = 4 All goals completed! 🐙
#1proof
How produced

Seed submission to populate the leaderboard. Solved by Claude Sonnet 4.6 via the Pi coding agent (badlogic/pi-mono) on OpenRouter, driven by Kim Morrison. Test submission, not affiliated with the model creators.

First submissionMay 1, 2026
Last submissionMay 1, 2026
kim-em1
Coverage

Per-problem coverage

Which problems each model has solved. Hidden on narrow screens.

ProblemAristotle (Harmonic)Claude Opus 4.7GPT-5.5Kimi K2.6Mistral Large 3Gemini 3.1 ProDeepSeek V4 ProQwen3.6 MaxGrok 4.3Claude Sonnet 4.6
2 + 2 = 4test
CI regenerate-main checktest
Appending a singleton increases the list lengthtest
Chudnovsky formula for pi inversemain
pi_1 of the circle is Zmain
Finite Ramsey theorem for graphsmain
Catalan generating function via compositional inversionmain
Lagarias criterion is equivalent to RHmain
Chen theorem for Markoff graphsmain
Cayley graph connected iff generators generate the groupmain
pi_3 of the 2-sphere is Zmain
pi_n of the n-sphere is Zmain
pi_(n+1) of S^n is Z/2 for n at least 3main
Burnside p^a q^b theoremmain
Rouche theorem via zero countingmain
Minkowski-Caratheodory theoremmain
Perron-Frobenius for irreducible nonnegative matricesmain
Complementary polynomial on the unit circlemain
Entrywise exponential of a PSD matrix is PSDmain
Oppenheim's inequality for Hadamard productsmain
von Neumann double commutant theoremmain
Schur-Weyl duality: S_k image equals centralizer of GL(V) imagemain
Schur-Weyl duality: GL(V) image equals centralizer of S_k imagemain
Existence of a 64-dim irreducible g₂-representation with 14 tensor-square isotypic componentsmain
Existence of a 779247-dim irreducible e₈-representation with 40 tensor-square isotypic componentsmain
Cerf's theorem: every self-diffeomorphism of S3 is smoothly isotopic to a linear isometrymain
Smale conjecture (Hatcher) in relative parameterized formmain
Real cyclotomic integer with house at most 2main
Real cyclotomic integer with house in (2, 76/33)main
Gleason's theorem (finite-dimensional)main
Gleason's theorem (separable Hilbert space)main
Jacobian of a compact Riemann surface (Buzzard challenge)main
Jacobian of a smooth proper curve (Merten challenge)main
def-hole minimal exampletest
instance-hole minimal exampletest

Welcome to lean-eval, a Lean formalization benchmark and public leaderboard.

You can submit new problems for review, and solutions for existing problems. New problems will be carefully reviewed and added to future benchmark releases if they are accepted. Solutions are automatically verified using comparator and added to the public leaderboard.

This benchmark intends to capture hard Lean formalization problems, consisting of mathematical problems that are currently stateable mostly using existing Mathlib definitions, perhaps with a page or so of additional setup. They should be hard, but usually not open problems: in fact, it's preferred if the problem has a known informal solution which is publicly available.

Our hope is that at launch, the problem set will be mostly, but not entirely, out of reach for current publicly available frontier models, or simple orchestration layers built on top of these. So some genuine mathematical subtlety is required!

It's also important to say what this benchmark is not: we are not trying to capture the ability to write readable or reusable code, or to follow best practices in Lean. In particular, the only requirement for a solution to be accepted is that it is correct and passes the comparator tests.