On mathematics and reputation
From ‘Formalizing the stability of the two Higgs doublet model potential into Lean: identifying an error in the literature’, uploaded to arXiv on March 9, 2026:
Firstly, we believe this to be the first time a non-trivial error in a research level physics paper has been identified through the process of formal verification. … Secondly, this was one of the first research-level papers where formalization was attempted, and it was not chosen with the intention of finding an error, but rather because we thought the process of formalization would be easy and the likelihood of an error was low. From this one could make the worrying extrapolation that there are many such errors in the physics literature. It is also a strong motive for making formal verification the gold standard for physics papers.
This is interesting because I haven’t heard of published theoretical physics papers being flawed in the way I’ve read about such papers in, say, psychology or behavioural economics. I am of course extrapolating from an admittedly small knowledge base: theoretical physicists may find this surprising, although I haven’t seen signs that they do. To the contrary, in fact.
I recently found out more about Lean and formalisation (for my piece in The Hindu about mathematicians’ efforts to formalise and then verify Maryna Viazovska’s work that won her the Fields Medal in 2022). Formalisation is the task of translating ‘human’ proofs of maths problems, in great detail, into a language a machine can check; the language here is Lean. Conlon’s point is that if Lean found errors in a relatively simple calculation, it will be in for worse when it confronts path integrals, which are the foundation of almost all modern quantum theory but are also quite messy.
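To give a flavour of what formalisation involves, here is a minimal, illustrative Lean 4 sketch, entirely my own and not from the paper, of how an everyday ‘human’ fact, that the sum of two even numbers is even, becomes a machine-checked proof:

```lean
import Mathlib

-- Illustrative only: formalising the claim “the sum of two even
-- numbers is even”. In Mathlib, `Even n` means ∃ r, n = r + r.
theorem even_add_even (m n : ℕ) (hm : Even m) (hn : Even n) :
    Even (m + n) := by
  obtain ⟨a, ha⟩ := hm  -- unpack hm: m = a + a for some a
  obtain ⟨b, hb⟩ := hn  -- unpack hn: n = b + b for some b
  -- supply the witness a + b, then check m + n = (a + b) + (a + b)
  exact ⟨a + b, by rw [ha, hb]; ring⟩
```

Every step, even one a human reader would wave through as obvious, has to be justified all the way down before Lean accepts the proof; there is no ‘clearly’ that the machine will take on trust.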
In the course of reading more about this, I came across a curious December 2024 post on the Xena Project blog (of “mathematicians learning Lean by doing”) by Kevin Buzzard, professor of pure mathematics at Imperial College London. He wrote that when mathematicians were using Lean to check some advanced proofs, Lean found that a particular step in a 1965 paper by a mathematician named Norbert Roby appeared to be wrong. It mattered because a branch of mathematics called crystalline cohomology, which had been built on Roby’s work and used by many mathematicians since the 1970s, technically had a gap in its foundations.
But nobody thought the maths was actually wrong, only that the written proof had a hole in it. However, in formal mathematics, a result being ‘probably fine’ isn’t good enough; as Buzzard put it, “you have to actually fix it” rather than rest on the idea that it’s fixable. At some point, the noted American mathematician Brian Conrad caught wind of the mistake and, after looking into it, found a fix: he knew that a different proof of the same result existed in the appendix of a book by Pierre Berthelot and Arthur Ogus. So a crisis was averted. But then came the twist: Buzzard later had lunch with Ogus and gleefully told him how his work had saved the day. Ogus’s response: “Oh, that appendix has several errors in it. But I think I know how to fix them.”
Buzzard also mentioned one defence of mathematical ideas with problems at their foundations, one that struck me too: that “crystalline cohomology has been used so much since the 1970s that if there were a problem with it, it would have come to light a long time ago”. But David Loeffler, professor of mathematics at UniDistance Suisse, countered it in the comments: the risk is that mathematicians might be placing too much stock in the idea that something can be fixed when it may not be fixable at all, and could even be “collectively wrong in their estimation”. How could this be possible?
I’m learning the answers have to do with how mathematicians work together. For instance, when they productively use some framework, like crystalline cohomology, for decades to generate papers and careers, they also create enormous social pressure rooted in the idea that the framework works. They also assume, reasonably, that the framework has already survived the kinds of questions that could have revealed flaws in its foundations. But as Buzzard’s exchange with Loeffler shows, while such questions have probably come along, there is no guarantee they have. Moreover, even if there is a flaw, it might show itself only in particular applications; the rest of the time, the framework can appear to be ‘functioning’ as expected.
(This to me also speaks to the unsuitability of buildings or similar real-world structures as metaphors for such frameworks, at least without also conveying that, say, most applications simply don’t stress a framework’s specific weak points.)
Then of course there’s the problem with how mathematicians build on each other’s work, which is a problem with peer review as well. The norm is to verify that argument B follows validly from argument A, not to check whether argument A is itself valid, nor to derive argument B from scratch. This way an error in one old paper can spread through citations for many years, with each subsequent mathematician reasoning correctly from a flawed starting point.
This is reminiscent of the Schön scandal of the 2000s. The German physicist Jan Hendrik Schön fabricated data on organic superconductors and field-effect transistors, but by the time his fraud came to light, other groups had begun building on his results, including designing experiments premised on his findings being real. So when Schön was ultimately exposed, a not insubstantial chunk of the field had to be unwound. For an even more dramatic example from history: the HeLa cells from Henrietta Lacks are extraordinarily robust and, it turned out, had silently contaminated a large fraction of cell cultures in labs worldwide from the 1950s onward. So researchers who believed they were studying prostate cancer cells, breast cancer cells, or other lines were in many cases actually studying HeLa cells. But unlike with zombie citations, in both these cases the ‘secondary’ scientists had no idea they were working with the wrong thing.
I used these examples because the ways in which mathematicians fail seem to be the same ways in which scientists more broadly fail: common blindspots; foundational assumptions that nobody thinks to double-check (or which they assume others have checked); social structures that reward ‘progress’ over processes like auditing; and, overall, an underestimation of how much social forces shape the way scientists and mathematicians organise and share their work. Lean, and formalisation more broadly, is thus forcing mathematicians to look past these assumptions. And as Joseph Tooby-Smith, one of the arXiv paper’s authors, wrote, the fact that they’re already finding gaps in fundamental ideas suggests the foundations of rational inquiry may be somewhat less certain than a discipline’s reputation alone might imply.