Neural Theorem Proving

Automated theorem proving once meant using a symbolic algorithm to search for formal proofs, but now it also includes using neural networks, reinforcement learning, and other machine learning methods to find formal or informal mathematical proofs. Machine learning is also used for autoformalization, that is, the automatic translation of informal mathematical definitions and proofs to formal ones that can be checked in a proof assistant.

In 2020, Daniel Selsam and colleagues posed the IMO Grand Challenge to design AI systems to problems in the International Mathematical Olympiad at the level of a gold medalist. Specifically, it called for finding formal solutions in Lean. At the time, the challenge made the popular press, but it seemed far out of reach. However, in 2024, Google DeepMind announced that its provers, AlphaProof and AlphaGeometry, solved problems in that year’s IMO at the level of a silver medalist. In 2025, four companies claimed gold medal performance: Google DeepMind and OpenAI used systems that produce informal proofs that human referees can check, and Harmonic and ByteDance used systems that produce formal proofs that can be checked in Lean. Since then, three systems, Seed-Prover, Hilbert, and Alpha have reported impressive performance on Putnam Competition problems as well.

Several startup companies, including Harmonic, Math Inc, Axiom, Axiomatic AI, and Logical Intelligence, are developing systems that produce formal proofs in Lean. Numina, a nonprofit, maintains open source provers and benchmarks. Commercial models like Google’s Gemini DeepThink have gotten good at producing informal mathematical arguments, giving rise to a practice that Emily Riehl has called “vibe proving.” Models often hallucinate, but one can address that problem by asking for formalized proofs as well, as described by Terence Tao in a Mathstadon post and again, in another one. See also his blog post, which discusses the solution to an Erdős problem that was obtained using several types of technology, including AlphaEvolve, Harmonic’s Aristotle prover, and social media.

At the Institute for Computer-Aided Reasoning in Mathematics, we aim to help the mathematics community make sense of these developments by evaluating and reporting on tools, and by serving as mediators between developers and the mathematics community.