Aarhus mathematician co-authors Google DeepMind breakthrough on machine-verified proofs of open problems

19. juni 2026 af Marina Behrend

AI for research mathematics is currently developing along two main lines: formal systems, which write proofs in formal languages such as Lean that are checked rigorously by a compiler, and informal systems, which write natural language arguments and must be checked by human experts. The strongest current systems (both formal and informal) are typically agentic harnesses built on frontier base LLMs such as GPT Pro and Gemini Pro models.

AlphaProof Nexus, presented in a paper released by Google DeepMind in May (arXiv:2605.22763), is one of the latest formal systems developed for research mathematics. It takes the formal route: AI agents write their proofs in the formal proof language Lean, where a compiler verifies every logical step. Associate Professor Gergely Bérczi from the Department of Mathematics, Aarhus University, is a co-author: he was the first research mathematician to work with the system, identifying suitable problems, validating the system’s solutions, and co-writing the natural-language versions of its proofs.

Working autonomously over hours or days, AlphaProof Nexus resolved 9 of 353 formalized open problems from the famous problem collection of Paul Erdős – including two questions, posed in 1970, that had remained open for 56 years — and proved 44 open conjectures from the Online Encyclopedia of Integer Sequences, at a typical cost of a few hundred dollars per problem. The system has since been deployed in live research projects, contributing machine-verified results in optimization, algebraic geometry, additive combinatorics, graph theory and quantum optics.

Bérczi has also been an early tester of Google DeepMind’s AI Co-Mathematician system (arXiv:2605.06651), which represents a complementary, informal collaborator system in AI for mathematics. Although he is not a co-author of the paper, Section 5.2 discusses his use of the system in a case study on Stirling coefficients for symmetric power representations, including positivity and log-concavity conjectures.

In parallel, Bérczi has recently co-developed ProofCouncil, an agentic modular system for mathematical research, with the IMProofBench team as part of a project led by Johannes Schmitt at ETH Zürich. In June, the independent First Proof initiative – a foundation of leading mathematicians that benchmarks AI on unpublished research-level problems – published the results of its Second Batch: ten new problems, attempted by four AI systems and graded blind by some thirty expert referees, as in journal peer review. ProofCouncil turned out to be the strongest performer: it solved 6 out of 10 problems.

Together, these projects mark a shift in how artificial intelligence meets research mathematics: agentic systems are becoming strong collaborators, whose outputs can be independently checked — whether by a compiler, by a panel of expert referees, or through close human-AI collaboration.

Aarhus mathematician co-authors Google DeepMind breakthrough on machine-verified proofs of open problems

Links