I’ve just uploaded to the arXiv my paper “Products of consecutive integers with unusual anatomy”. This paper answers some questions of Erdős and Graham which were initially motivated by the study of the Diophantine factorial equation

\displaystyle  a_1! a_2! a_3! = m^2

where {a_1 < a_2 < a_3} and {m} are positive integers. Writing {(a,N,N+H) = (a_1,a_2,a_3)}, one can rewrite this equation as

\displaystyle  s( (N+1) \dots (N+H) ) = s(a!) \ \ \ \ \ (1)

where {s(n)} denotes the squarefree part of {n} (the smallest factor of {n} that can be obtained by dividing {n} by a perfect square). For instance, we have

\displaystyle s(8 \times 9 \times 10) = 5 = s(6!)

which corresponds to the solution {6! 7! 10! = (6!\times 7!)^2} to the original equation.
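For readers who want to experiment numerically, here is a small Python sketch (my own illustration, not taken from the paper) that computes squarefree parts by trial division and checks the identity above:

```python
from math import factorial, isqrt

def squarefree_part(n):
    """Return s(n): n divided by its largest perfect-square divisor."""
    s, p = 1, 2
    while p * p <= n:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        if e % 2 == 1:
            s *= p
        p += 1
    return s * n  # any leftover n > 1 is a prime appearing to the first power

def is_square(n):
    return isqrt(n) ** 2 == n

# s(8*9*10) = 5 = s(6!)
print(squarefree_part(8 * 9 * 10), squarefree_part(factorial(6)))   # 5 5

# ...which corresponds to 6! * 7! * 10! being a perfect square
print(is_square(factorial(6) * factorial(7) * factorial(10)))       # True
```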

The equation (1) ties into the general question of what the anatomy (prime factorization) of the product {(N+1) \dots (N+H)} looks like. This is a venerable topic, with the first major result being the Sylvester-Schur theorem from 1892 that the largest prime factor of {(N+1) \dots (N+H)} is greater than {H} whenever {N \geq H}. Another notable result is the Erdős-Selfridge theorem that the product {(N+1) \dots (N+H)} is never a perfect power for {H > 1}.

Erdős and Graham were able to show that solutions to (1) were somewhat rare, in that the set of possible values of {N+H} had density zero. For them, the hardest case to treat was when the interval {\{N+1,\dots,N+H\}} was what they called bad, in the sense that {(N+1) \dots (N+H)} was divisible by the square of its largest prime factor. They were able, with some effort, to show that the union of all bad intervals also had density zero, which was a key ingredient in proving the previous result about solutions to (1). They isolated a subcase of the bad intervals, which they called the very bad intervals, in which the product {(N+1) \dots (N+H)} was a powerful number (divisible by the square of every prime factor).

A later paper of Luca, Saradha, and Shorey made the bounds more quantitative, showing that both the set of values of {N+H} and the union of bad intervals had density {O( \exp(-c \log^{1/4} x (\log\log x)^{3/4}))} for some absolute constant {c>0}. In the other direction, just by considering the case {H=1}, one can show that the number of possible values of {N+H} up to {x} is {\gtrsim c_3^1 \sqrt{x}}, where {c_3^1} is the constant

\displaystyle  c_3^1 = 1 + \sum_{a \geq 2: a \neq n^2 \forall n} \frac{1}{s(a!)^{1/2}} = 3.70951\dots.
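As a quick sanity check on this formula (assuming I have transcribed it correctly), one can compute partial sums of the series numerically, using Legendre's formula for the exponent of each prime in {a!}; a rough sketch:

```python
from math import isqrt
from sympy import primerange

def squarefree_part_factorial(a):
    """Squarefree part of a!, computed via Legendre's formula for prime exponents."""
    s = 1
    for p in primerange(2, a + 1):
        e, q = 0, p
        while q <= a:
            e += a // q
            q *= p
        if e % 2 == 1:
            s *= p
    return s

total = 1.0
for a in range(2, 200):
    if isqrt(a) ** 2 == a:
        continue  # the sum excludes perfect squares
    total += 1 / squarefree_part_factorial(a) ** 0.5
print(total)  # partial sums slowly approach the stated value 3.70951...
```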

As for the bad intervals, by again considering the case {H=1}, it is possible to show that the number of bad points up to {x} is

\displaystyle  x / \exp((\sqrt{2}+o(1)) \sqrt{\log x \log\log x});

see for instance this paper of Ivic. Similarly, the union of the very bad intervals contains the set of powerful numbers (again coming from the case {H=1}); Golomb worked out that the number of powerful numbers up to {x} is {\sim \frac{\zeta(3/2)}{\zeta(3)} \sqrt{x}}.
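For those who want to see the Golomb asymptotic in action, here is a quick count (a toy illustration of my own; the convergence is fairly slow, as there is a known lower order correction of size comparable to {x^{1/3}}):

```python
from mpmath import zeta

def count_powerful(x):
    """Count powerful numbers up to x; every powerful number has the form a^2 * b^3."""
    vals = set()
    b = 1
    while b ** 3 <= x:
        a = 1
        while a * a * b ** 3 <= x:
            vals.add(a * a * b ** 3)
            a += 1
        b += 1
    return len(vals)

C = float(zeta(1.5) / zeta(3))   # approximately 2.173
for x in [10**4, 10**6, 10**8]:
    print(x, count_powerful(x), round(C * x ** 0.5))
```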

It was conjectured by Erdős and Graham that all of these lower bounds are in fact sharp (up to {1+o(1)} multiplicative factors); this is Erdős Problem 380 (and a portion of Erdős Problem 374). The main result of this paper is to confirm this conjecture in two cases and come close in the third:

Theorem 1
  • The number of numbers up to {x} that lie in a bad interval of length {H>1} is {O(1/\log^{1-o(1)} x)} times the number of bad points up to {x}.
  • The number of numbers up to {x} that lie in a very bad interval of length {H>1} is {O(x^{2/5+o(1)})}.
  • The number of numbers up to {x} of the form {N+H} for a solution to (1) is {O(x^{1/2+o(1)})}.

Not surprisingly, the methods of proof involve many standard tools in analytic number theory, such as the prime number theorem (and its variants in short intervals), zero density estimates, Vinogradov’s bounds on exponential sums, asymptotics for smooth numbers, the large sieve, the fundamental lemma of sieve theory, and the Burgess bound for character sums. There was one point where I needed a small amount of algebraic number theory (the classification of solutions to a generalized Pell equation), which was the one place where I turned to AI for assistance (though I ended up rewriting the AI argument myself). One amusing point is that I specifically needed the recent zero density theorem of Guth and Maynard (as converted to a bound on exceptions to the prime number theorem in short intervals by Gafni and myself); previous zero density theorems fell just short of being strong enough to close the arguments.

A few more details on the methods of proof. It turns out that very bad intervals, or intervals solving (1), are both rather short, in that the bound {H \leq \exp(\log^{2/3+o(1)} N)} holds. The reason for this is that the primes {p} that are larger than {H} (in the very bad case) or {C H \log N} for a large constant {C} (in the (1) case) cannot actually divide any of the {N+1,\dots,N+H} unless they divide it at least twice. This creates a constraint on the fractional parts of {N/p} and {N/p^2} that turns out to be inconsistent with the equidistribution results on those fractional parts coming from Vinogradov’s bounds on exponential sums unless {H} is small. In the very bad case, this forces a linear relation between two powerful numbers; expressing powerful numbers as the product of a square and a cube, matters then boil down to counting solutions to an equation such as

\displaystyle  n_1^2 n_2^3 + 1 = m_1^2 m_2^3

with say {n_1,n_2,m_1,m_2 \leq x^{1/5}}. The number of solutions here turns out to be {O(x^{2/5})} by work of Aktas-Murty and of Chan; we generalize the arguments in the former to handle a slightly more general equation. A similar argument handles solutions to (1), except in one regime where the parameter {a} is somewhat large (comparable to {H \log N}), in which case one instead collects some congruence conditions on {N} and applies the large sieve.

The situation with bad intervals is more delicate, because there is no obvious way to make {H} small in all cases. However, by the large sieve (as well as the Guth–Maynard theorem), one can show that the contribution of large {H} is negligible, and from bounds on smooth numbers one can show that the interval {\{N+1,\dots,N+H\}} contains a number with a particularly specific anatomy, of the form {p_0^2 p_1 \dots p_{1000} m'} where {p_0, \dots, p_{1000}} are all primes of roughly the same size, and {m'} is a smoother factor involving smaller primes. The rest of the bad interval creates some congruence conditions on the product {p_1 \dots p_{1000}}. Using some character sum estimates coming from the Burgess bounds, we find that the residue of {p_1\dots p_{1000}} becomes fairly equidistributed amongst the primitive congruence classes to a given modulus when one perturbs the primes {p_1,\dots,p_{1000}} randomly (there are some complications from exceptional characters of Siegel zero type, but we can use a large values estimate to keep their total contribution under control). This allows us to show that the congruence conditions coming from the bad interval are restrictive enough to make non-trivial bad intervals quite rare compared to bad points. One innovation in this regard is to set up an “anti-sieve”: the elements of a bad interval tend to have an elevated chance of being divisible by small primes, and one can use moment methods to show that an excessive number of small prime divisors is somewhat rare. This can be compared to standard sieve arguments, which often seek to limit the event that a number has an unexpectedly deficient number of small prime divisors.

Tanya Klowden and I have uploaded to the arXiv our preprint “Mathematical methods and human thought in the age of AI”. This is an unabridged version of a solicited article for a forthcoming Blackwell Companion to the Philosophy of Mathematics. I rarely write article-length essays of a philosophical nature (perhaps the last one was in 2007), but given the topical interest in AI and formalization for mathematics, which has begun to raise increasingly fundamental questions about what the nature, purpose and practice of mathematics actually is (or ought to be), it seemed a timely opportunity to write about these matters. Other mathematicians seem to have recently come to this conclusion also; see for instance this paper of Avigad, or this paper of Commelin, Jamnik, Ochigame, Taelman, and Venkatesh, both of which have come out in the last few weeks.

Our piece took over a year to write – which means, at the current pace of development in the field, that some of it is already slightly out of date. Nevertheless, it was an instructive exercise for both of us to try to look beyond the immediate technical issues presented by current AI and formalization tools and try to point out the philosophical questions that we will have to grapple with as these tools become increasingly capable and integrated into our profession, using prior examples of technological advancement as a guide. We don’t pretend to have definitive answers to most of these questions, but as with mathematics itself, the first step is to pose the questions and then try to make partial progress on them (or at least identify some negative results and eliminate some failed approaches). One point we particularly felt worth stressing is that AI tools and applications (in mathematics or elsewhere) should not be viewed purely through the technical lens of what microscale problems they solve and how effective or efficient they are at solving them, but also through the macroscopic humanitarian lens of how our society, our shared body of knowledge and understanding, and our species benefits (or is harmed) as a whole from these technologies.

Our initial draft ended up significantly exceeding the page limits for the solicited article and ventured beyond the philosophy of mathematics into broader philosophical and ethical questions about AI in general. A streamlined version of the paper will appear in the forthcoming companion, but we have decided to make the original longer version available on the arXiv.

I’ve just uploaded to the arXiv my paper “Local Bernstein theory, and lower bounds for Lebesgue constants”. This paper was initially motivated by a problem of Erdős on Lagrange interpolation, but in the course of solving that problem, I ended up modifying some very classical arguments of Bernstein and his contemporaries (Boas, Duffin, Schaeffer, Riesz, etc.) to obtain “local” versions of these classical “Bernstein-type inequalities” that may be of independent interest.

Bernstein proved many estimates concerning the derivatives of polynomials, trigonometric polynomials, and entire functions of exponential type, but perhaps his most famous inequality in this direction is:

Lemma 1 (Bernstein’s inequality for trigonometric polynomials) Let {P: {\bf R} \rightarrow {\bf C}} be a trigonometric polynomial of degree at most {n}, with {|P(x)| \leq A} for all {x}. Then {|P'(x)| \leq n A} for all {x}.
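Here is a quick numerical illustration of the inequality (a toy check of my own, not needed for anything that follows): sample a random trigonometric polynomial on a fine grid and compare {\sup |P'|} with {n \sup |P|}.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12
x = np.linspace(0, 2 * np.pi, 20000, endpoint=False)

a = rng.normal(size=n + 1)   # cosine coefficients a_0, ..., a_n
b = rng.normal(size=n + 1)   # sine coefficients b_0, ..., b_n (b_0 is irrelevant)
k = np.arange(n + 1)

# P(x) = sum_k a_k cos(kx) + b_k sin(kx), and its derivative
P = a @ np.cos(np.outer(k, x)) + b @ np.sin(np.outer(k, x))
dP = (-a * k) @ np.sin(np.outer(k, x)) + (b * k) @ np.cos(np.outer(k, x))

print(np.max(np.abs(dP)) / np.max(np.abs(P)), "<=", n)  # Bernstein: the ratio is at most n
```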

Similar inequalities concerning {L^p} norms of derivatives of Littlewood-Paley components of functions are now ubiquitous in the modern theory of nonlinear dispersive PDE (where they are also called Bernstein estimates), but this will not be the focus of the current post.

A trigonometric polynomial {P} of degree {n} is of exponential type {n} in the sense that {P(z) = O(\exp(n|z|))} for complex {z}. Bernstein in fact proved a more general result:

Lemma 2 (Bernstein’s inequality for functions of exponential type) Let {f: {\bf C} \rightarrow {\bf C}} be an entire function of exponential type at most {\lambda}, with {|f(x)| \leq A} for all {x \in {\bf R}}. Then {|f'(x)| \leq \lambda A} for all {x \in {\bf R}}.

There are several proofs of this lemma – see for instance this survey of Queffélec and Zarouf. In the case that {f} is real-valued on {{\bf R}}, there is a nice proof by Duffin and Schaeffer, which we sketch as follows. Suppose we normalize {A=\lambda=1}, and adjust {f} by a suitable damping factor so that {f(z)} actually grows strictly slower than {\exp(|z|)} as {z \rightarrow \infty}. Then, for any {0 < \alpha < 1} and {x_0 \in {\bf R}}, one can use Rouche’s theorem to show that the function {\cos(x-x_0) - \alpha f(x)} has the same number of zeroes as {\cos(x-x_0)} in a suitable large rectangle; but on the other hand one can use the intermediate value theorem to show that {\cos(x-x_0) - \alpha f(x)} has at least as many zeroes as {\cos(x-x_0)} in the same rectangle. Among other things, this prevents double zeroes from occurring, which turns out to give the desired claim {|f'(x)| \leq 1} after some routine calculations (in fact one obtains the stronger bound {|f(x)|^2 + |f'(x)|^2 \leq 1} for all real {x}).

The first main result of the paper is to obtain localized versions of Lemma 2 (as well as some related estimates). Roughly speaking, these estimates assert that if {f} is holomorphic on a wide thin rectangle passing through the real axis, is bounded by {A} on the intersection of the real axis with this rectangle, and is “locally of exponential type” in the sense that it is bounded by {O(\exp( \lambda |\mathrm{Im} z|))} on the upper and lower edges of this rectangle (and obeys some very mild growth conditions on the remaining sides of this rectangle), then {|f'(x)|} can be bounded by {\lambda A} plus small errors on the real line, with some additional estimates away from the real line also available. The proof proceeds by a modification of the Duffin–Schaeffer argument, together with the two-constant theorem of Nevanlinna (and some standard estimates of harmonic measures on rectangles) to deal with the effect of the localization. (As a side note, this latter argument was provided to me by ChatGPT, as I was not previously aware of the Nevanlinna two-constant theorem.)

Once one localizes this “Bernstein theory”, it becomes suitable for the analysis of (real-rooted, monic) polynomials {P} of a high degree {n}, which are not bounded globally on {{\bf R}} (and grow polynomially rather than exponentially at infinity), but which can exhibit “local exponential type” behavior on various intervals, particularly in regions where the logarithmic potential

\displaystyle  U_\mu(z) := \frac{1}{n} \log \frac{1}{|P(z)|} = \int \log \frac{1}{|z-x|}\ d\mu(x)

behaves like a smooth function (here {\mu = \frac{1}{n} \sum_{j=1}^n \delta_{x_j}} is the empirical measure of the roots {x_1,\dots,x_n} of {P}). A key example is given by the (monic) Chebyshev polynomials {2^{1-n} T_n(x)}, which locally behave like sinusoids on the interval {[-1,1]} (and are locally of exponential type above and below this interval).

This becomes relevant in the theory of Lagrange interpolation. Recall that if {x_1 < \dots < x_n} are real numbers and {Q} is a polynomial of degree less than {n} then one has the interpolation formula

\displaystyle  Q(z) = \sum_{k=1}^n Q(x_k) \ell_k(z)

where the Lagrange basis functions {\ell_k(z)} are defined by the formula

\displaystyle  \ell_k(z) := \prod_{i \neq k} \frac{z - x_i}{x_k - x_i}.

In terms of the monic polynomial {P(z) := \prod_{j=1}^n (z-x_j)}, we can write

\displaystyle  \ell_k(z) = \frac{P(z)}{(z-x_k) P'(x_k)}.

The stability and convergence properties of Lagrange interpolation are closely related to the Lebesgue function

\displaystyle  \Lambda(z) := \sum_{k=1}^n |\ell_k(z)|,

and for a given interval {I}, the quantity {\sup_{x \in I} \Lambda(x)} is known as the Lebesgue constant for that interval.
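As a concrete illustration (my own, with the suprema approximated on a uniform grid), here is a short computation of the Lebesgue constant for Chebyshev nodes and for equispaced nodes, which can be compared with the {\frac{2}{\pi} \log n} behaviour discussed below:

```python
import numpy as np

def lebesgue_constant(nodes, grid):
    """Maximum over the grid of Lambda(x) = sum_k |ell_k(x)| for the given nodes."""
    Lam = np.zeros_like(grid)
    for k, xk in enumerate(nodes):
        others = np.delete(nodes, k)
        ell_k = np.prod((grid[:, None] - others) / (xk - others), axis=1)
        Lam += np.abs(ell_k)
    return Lam.max()

n = 50
grid = np.linspace(-1, 1, 5000)
cheb = np.cos((2 * np.arange(1, n + 1) - 1) * np.pi / (2 * n))   # Chebyshev nodes
equi = np.linspace(-1, 1, n)                                     # equispaced nodes

print("Chebyshev:", lebesgue_constant(cheb, grid), " (2/pi) log n =", 2 / np.pi * np.log(n))
print("Equispaced:", lebesgue_constant(equi, grid))  # grows exponentially in n
```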

If one chooses the interpolation points {x_1,\dots,x_n} poorly, then the Lebesgue constant can be extremely large. However, if one selects these points to be the roots of the aforementioned monic Chebyshev polynomials, then it is known that {\sup_{x \in I} \Lambda(x) = \frac{2}{\pi} \log n + O(1)} for all fixed intervals {I} in {[-1,1]}. In the case {I = [-1,1]}, it was shown by Erdős that this is the best possible value of the Lebesgue constant up to {O(1)} errors for interpolation on {[-1,1]}, thus

\displaystyle  \sup_{x \in [-1,1]} \Lambda(x) \geq \frac{2}{\pi} \log n - O(1)

whenever {-1 \leq x_1 < \dots < x_n \leq 1} (a more precise bound was later shown by Vertesi). Erdős and Turán then asked if the same lower bound

\displaystyle  \sup_{x \in I} \Lambda(x) \geq \frac{2}{\pi} \log n - O(1) \ \ \ \ \ (1)

held for more general intervals {I}. This is shown in the current paper; a variant integral bound

\displaystyle  \int_I \Lambda(x)\ dx \geq \frac{4}{\pi^2} |I| \log n - o(\log n) \ \ \ \ \ (2)

is also established, answering a separate question of Erdős. These lower bounds had previously been obtained up to constant factors by Erdős and Szabados; the main issue was to obtain the sharp constant in the main term.

In terms of the monic polynomial {P}, these two estimates can be written as

\displaystyle  \sup_{x \in I} \sum_{k=1}^n \frac{|P(x)|}{|x-x_k||P'(x_k)|} \geq \frac{2}{\pi} \log n - O(1)

and

\displaystyle  \int_I \sum_{k=1}^n \frac{|P(x)|}{|x-x_k||P'(x_k)|}\ dx \geq \frac{4}{\pi^2} |I| \log n - o(\log n).

Using the intuition that {P} should behave locally like a trigonometric polynomial, and performing some renormalizations, one can extract the following toy problem to work with first:

Problem 3 Let {P: {\bf R} \rightarrow {\bf R}} be a trigonometric polynomial of degree {n} with {2n} roots {x_1 < \dots < x_{2n}} in {[0, 2\pi)}.
  • (i) Show that

    \displaystyle  \sup_{x \in [0,2\pi)} \sum_{k=1}^{2n} \frac{|P(x)|}{|P'(x_k)|} \geq 2. \ \ \ \ \ (3)

  • (ii) Show that

    \displaystyle  \int_0^{2\pi} \sum_{k=1}^{2n} \frac{|P(x)|}{|P'(x_k)|}\ dx \geq 8. \ \ \ \ \ (4)

It is easy to check that the lower bounds of {2} and {8} are sharp by considering the case when {P} is a sinusoid {P(x) = A \sin(n(x-x_0))}.
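As a quick numerical sanity check of (3) and (4) (my own illustration; it uses the standard factorization {P(x) = c \prod_{k=1}^{2n} \sin((x-x_k)/2)} of a real trigonometric polynomial of degree {n} in terms of its {2n} roots, the constant {c} being irrelevant since both sides of each inequality scale the same way):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
roots = np.sort(rng.uniform(0, 2 * np.pi, size=2 * n))   # 2n random roots in [0, 2*pi)
x = np.linspace(0, 2 * np.pi, 40000, endpoint=False)

# P(x) = prod_k sin((x - x_k)/2), a real trigonometric polynomial of degree n
P = np.prod(np.sin((x[:, None] - roots) / 2), axis=1)

# Exact derivative at a root: P'(x_k) = (1/2) * prod_{j != k} sin((x_k - x_j)/2)
dP_at_roots = np.array([
    0.5 * np.prod(np.sin((roots[k] - np.delete(roots, k)) / 2)) for k in range(2 * n)
])

S = np.sum(1.0 / np.abs(dP_at_roots))                 # sum_k 1/|P'(x_k)|
print("check (3):", np.abs(P).max() * S, ">= 2")
print("check (4):", np.abs(P).mean() * 2 * np.pi * S, ">= 8")
```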

The bound (3) is immediate from Bernstein’s inequality (Lemma 1). By applying a local version of this inequality, I was able to get a weak version of the claim (1) in which {O(1)} was replaced with {o(\log n)}; see this early version of the paper, which was developed through conversations with Nat Sothanaphan and Aron Bhalla. By combining this argument with ideas from the older work of Erdős, I was able to establish (1).

The bound (2) took me longer to establish, and involved a non-trivial amount of playing around with AI tools, the story of which I would like to share here. I had discovered the toy problem (4), but initially was not able to establish this inequality; AlphaEvolve seemed to confirm it numerically (with sinusoids appearing to be the extremizer), but did not offer direct clues on how to prove this rigorously. At some point I realized that the left-hand side factorized into the expressions {\int_0^{2\pi} |P(x)|\ dx} and {\sum_{k=1}^{2n} \frac{1}{|P'(x_k)|}}, and tried to bound these expressions separately. Perturbing around a sinusoid {A \sin(n(x-x_0))}, I was able to show that the {L^1} norm {\int_0^{2\pi} |P(x)|\ dx} was locally minimized as long as one only perturbed by lower order Fourier modes, keeping the frequency {n} coefficients unchanged. Guessing that this local minimum was actually a global minimum, I was led to conjecture the general lower bound

\displaystyle  \int_0^{2\pi} |P(x)|\ dx \geq 4 |a_n+ib_n|

whenever {P} was a degree {n} trigonometric polynomial with highest frequency component {a_n \cos(nx) + b_n \sin(nx)}. AlphaEvolve also indicated numerically that this inequality was likely to be true. I still did not see how to prove this inequality, but I decided to try my luck giving it to ChatGPT Pro, which recognized it as an {L^1} approximation problem and gave me a duality-based proof (based ultimately on the Fourier expansion of the square wave). With some further discussion, I was able to adapt this proof to functions of global exponential type (replacing the Fourier manipulations with contour shifting arguments, in the spirit of the Paley-Wiener theorem), which roughly speaking gave me half of what I needed to establish (2). However, I still needed the matching lower bound

\displaystyle  \sum_{k=1}^{2n} \frac{1}{|P'(x_k)|} \geq \frac{2}{|a_n+ib_n|}

on the other factor in the toy problem. Again, AlphaEvolve could numerically confirm that this inequality was likely to be true, but now the quantity I was trying to control did not look convex or linear in {P}, and so the previous duality method did not seem to apply. At this point I switched to pen and paper; eventually I realized that the expression almost looked like a sum of residues, and eventually after playing around with contour integrals of {\frac{e^{inz}}{P(z)}} using the residue theorem I was able to establish (4), and then with a bit more (human) effort I could move from the toy problem back to the original problem to obtain (2). Quite possibly AI tools would also have been able to assist with these steps, but they were not necessary here; their main value for me was in quickly confirming that the approach I had in mind was numerically plausible, and in recognizing the right technique to solve one part of the toy problem I had isolated. (I also used AI tools for several other secondary tasks, such as literature review, proofreading, and generating pictures, but these applications of AI have matured to the point where using them for this purpose is almost mundane.)

Mathematical research traditionally involves a small number of professional mathematicians working closely on difficult problems. However, I have long believed that there is a complementary way to do mathematics, in which one works with a broad community of mathematically minded people on problems which may not be as deep as the problems one traditionally works on, but still are of mathematical interest; and that modern technologies, including AI, are particularly suitable for contributing to the latter type of workflow. The “Polymath projects” were one example of this broad type of collaboration, where internet platforms such as blogs and wikis were used to facilitate such collaboration. Some years later, collaborative formalization projects (such as the one to formalize the Polynomial Freiman–Ruzsa conjecture of Marton, discussed previously on this blog here) became popular in some circles. And in 2024, I launched the Equational Theories Project (ETP) (discussed on this blog here and here), combining the rigor of Lean formalization with “good old fashioned AI” (in the form of automated theorem provers) to settle (with formal verification) over 22 million true-false problems in universal algebra.

Continuing in this spirit, Damek Davis and I are launching a new project, in the form of an experimental competitive challenge hosted by the SAIR Foundation (where I serve as a board member, and which is supplying technical support and compute). The idea of this challenge, motivated in part by this recent paper of Honda, Murakami, and Zhang, is to measure the extent to which the 22 million universal algebra true-false results obtained by the ETP can be “distilled” into a short, human-readable “cheat sheet”, similar to how a student in an undergraduate math class might distill the knowledge learned from that class into a single sheet of paper that the student is permitted to bring into an exam.

Here is a typical problem in universal algebra that the ETP was able to answer:

Problem 1 Suppose that {*: M \times M \rightarrow M} is a binary operation such that {x * (y * z) = (z * w) * w} for all {x,y,z,w}. Is it true that {x * (y * x) = (x * y) * z} for all {x,y,z}?

Such a problem can be settled either by algebraically manipulating the initial equation to deduce the target equation, or by finding a counterexample to the target equation that still satisfies the initial equation. There are a variety of techniques to achieve either of these, but this sort of problem is difficult, and even undecidable in some cases; see this paper of the ETP collaborators for more discussion. Nevertheless, many of these problems can be settled with some effort by humans, by automated theorem provers, or by frontier AI systems; here for instance is an AI-generated solution to the above problem.
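For instance, the counterexample route lends itself to a mechanical brute-force search over small finite magmas. Here is a minimal sketch of that idea (purely illustrative; it only looks at carrier sets of size at most three, and I make no claim that such a small search settles this particular problem):

```python
from itertools import product

def satisfies_hypothesis(op, M):
    # x * (y * z) == (z * w) * w for all x, y, z, w
    return all(op[x][op[y][z]] == op[op[z][w]][w]
               for x, y, z, w in product(M, repeat=4))

def satisfies_conclusion(op, M):
    # x * (y * x) == (x * y) * z for all x, y, z
    return all(op[x][op[y][x]] == op[op[x][y]][z]
               for x, y, z in product(M, repeat=3))

def search(max_size=3):
    """Look for a finite magma satisfying the hypothesis but not the conclusion."""
    for size in range(1, max_size + 1):
        M = range(size)
        for table in product(M, repeat=size * size):
            op = [list(table[i * size:(i + 1) * size]) for i in range(size)]
            if satisfies_hypothesis(op, M) and not satisfies_conclusion(op, M):
                return op
    return None

print(search())   # prints a multiplication table if a small counterexample exists, else None
```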

However, these AI models are expensive, and do not reveal much insight as to where their answers come from. If one instead tries a smaller and cheaper model, such as one of the many open-source models available, it turns out that these models basically perform no better than random chance, in that when asked to say whether the answer to a question such as the above is true or false, they only answer correctly about 50% of the time.

But, similarly to how a student struggling with the material for a math class can perform better on an exam when provided the right guidance, it turns out that such cheap models can perform at least modestly better on this task (with success rates increasing to about 55%-60%) if given the right prompt or “cheat sheet”.

“Stage 1” of the distillation challenge, which we launched today, asks contestants to design a cheat sheet (of at most 10 kilobytes in size) that can increase the performance of these models on the above true-false problems to as high a level as possible. We have provided a “playground” with which to test one’s cheat sheet (or a small number of example cheat sheets) on some cheap models against a public set of 1200 problems (1000 of which were randomly selected, and rather easy, together with 200 “hard” problems that were selected to resist the more obvious strategies for resolving these questions); a brief video explaining how to use the playground can be found here.

The submission stage will end on April 20, after which we will evaluate the submissions against a private subset of test questions. The top 1000 submissions will advance to a second stage, which we are currently in the process of designing; it will involve more advanced models, but also the more difficult task of providing not just a true-false answer, but also a proof or counterexample for each problem.

The competition will be coordinated on this Zulip channel, where I hope there will be a lively and informative discussion.

My hope is that the winning submissions will capture the most productive techniques for solving these problems, and/or provide general problem-solving techniques that would also be applicable to other types of mathematical problems. We started with the Equational Theories Project data set for this pilot competition due to its availability and spectrum of difficulty levels, but if this type of distillation process leads to interesting results, one could certainly run it on many other classes of mathematical problems to get some empirical data on how readily they can be solved, particularly after we learn from this pilot competition how to encourage participation and the sharing of best practices.

SAIR will also launch some other mathematical challenges in the coming months that will be of a more cooperative nature than this particular competitive challenge; stay tuned for further announcements.

[This post is written in my capacity as Vice-Chair of the Board of Trustees of SLMath. -T.]

SLMath, formerly MSRI, has launched the search for the next Deputy Director. This key position is a close advisor to the Director and shares in the internal management of the scientific team and programs at SLMath. This position is ideal for an experienced professional with a PhD in mathematical sciences seeking a new opportunity to leverage their strengths in program and grant management, financial management, and people management.

Just a brief announcement that I have been working with Quanta Books to publish a short book in popular mathematics entitled “Six Math Essentials”, which will cover six of the fundamental concepts in mathematics — numbers, algebra, geometry, probability, analysis, and dynamics — and how they connect with our real-world intuition, with the history of math and science, and with the modern practice of mathematics, both in theory and in applications. The scheduled publication date is Oct 27, but it is currently available for preorder.

(Sharing this in my capacity as director of special projects at IPAM.)

IPAM is holding an Industrial Short Course on Generative AI Algorithms on March 5-6, 2026.

The short course is aimed at people from industry or government who want to get started in deep learning, apply deep learning to their projects, learn how to code deep learning algorithms, and upgrade their skills to the latest AI algorithms, including generative AI. The course will be taught by Professor Xavier Bresson from the Department of Computer Science at the National University of Singapore (NUS).

The course will be limited to 25 participants. The fee for this course is $2,500 per participant. Registration closes soon (Feb 5); we still have a few spots available.

Thomas Bloom’s Erdős problem site has become a real hotbed of activity in recent months, particularly as some of the easiest of the outstanding open problems have turned out to be amenable to various AI-assisted approaches; there is now a lively community in which human contributions, AI contributions, and hybrid contributions are presented, discussed, and in some cases approved as updates to the site.

One of the lessons I draw from this is that once a well curated database of precise mathematical problems is maintained, it becomes possible for other parties to build upon it in many ways (including both AI-based and human-based approaches), to systematically make progress on some fraction of the problems.

This makes me wonder what other mathematical databases could be created to stimulate similar activity. One candidate that came to mind is “optimization constants” – constants {C} that arise from some mathematical optimization problem of interest, for instance finding the best constant {C} for which a certain functional inequality is satisfied.

I am therefore proposing to create a crowdsourced repository for such constants, to record the best upper and lower bounds known for any given such constant, in order to help encourage efforts (whether they be by professional mathematicians, amateur mathematicians, or research groups at a tech company) to try to improve upon the state of the art.

There are of course thousands of such constants one could consider, but just to set the discussion going, I set up a very minimal proof-of-concept Github repository holding over 20 constants including:

  1. {C_{1a}}, the constant in a certain autocorrelation quantity relating to Sidon sets. (This constant seems to have a surprisingly nasty optimizer; see this tweet thread of Damek Davis.)
  2. {C_{1b}}, the constant in Erdős’ minimum overlap problem.

Here, I am taking inspiration from the Erdős problem web site and arbitrarily assigning a number to each constant, for ease of reference.

Even in this minimal state I think the repository is ready to start accepting more contributions, in the form of pull requests that add new constants, or improve the known bounds on existing constants. (I am particularly interested in constants that have an extensive literature of incremental improvements in the lower and upper bounds, and which look at least somewhat amenable to computational or AI-assisted approaches.) But I would be interested to hear feedback on how to improve the repository in other ways.

Update: Paata Ivanisvili and Damek Davis have kindly agreed to help run and expand this repository.

A basic problem in sieve theory is to understand what happens when we start with the integers {{\bf Z}} (or some subinterval of the integers) and remove some congruence classes {a_i \pmod{q_i}} for various moduli {q_i}. Here we shall concern ourselves with the simple setting where we are sieving the entire integers rather than an interval, and are only removing a finite number of congruence classes {a_1 \pmod{q_1}, \ldots, a_k \pmod{q_k}}. In this case, the set of integers that remain after the sieving is periodic with period {Q = \mathrm{lcm}(q_1,\dots,q_k)}, so one can work without loss of generality in the cyclic group {{\bf Z}/Q{\bf Z}}. One can then ask: what is the density of the sieved set

\displaystyle  \{ n \in {\bf Z}/Q{\bf Z}: n \neq a_i \hbox{ mod } q_i \hbox{ for all } i=1,\ldots,k \}? \ \ \ \ \ (1)

If the {q_i} were all coprime, then it is easy to see from the Chinese remainder theorem that the density is given by the product

\displaystyle  \prod_{i=1}^k \left(1 - \frac{1}{q_i}\right).

However, when the {q_i} are not coprime, the situation is more complicated. One can use the inclusion-exclusion formula to get a complicated expression for the density, but it is not easy to work with. Sieve theory also supplies one with various useful upper and lower bounds (starting with the classical Bonferroni inequalities), but these do not give exact formulae.

In this blog post I would like to note one simple fact, due to Rogers, that one can say about this problem:

Theorem 1 (Rogers’ theorem) For fixed {q_1,\dots,q_k}, the density of the sieved set is maximized when all the {a_i} vanish. Thus,

\displaystyle  |\{ n \in {\bf Z}/Q{\bf Z}: n \neq a_i \hbox{ mod } q_i \hbox{ for all } i=1,\ldots,k \}|

\displaystyle \leq |\{ n \in {\bf Z}/Q{\bf Z}: n \neq 0 \hbox{ mod } q_i \hbox{ for all } i=1,\ldots,k \}|.

Example 2 If one sieves out {1 \pmod{2}}, {1 \pmod{3}}, and {2 \pmod{6}}, then only {0 \pmod{6}} remains, giving a density of {1/6}. On the other hand, if one sieves out {0 \pmod{2}}, {0 \pmod{3}}, and {0 \pmod{6}}, then the remaining elements are {1} and {5 \pmod{6}}, giving the larger density of {2/6}.
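Both the Chinese remainder theorem heuristic and Rogers’ theorem are easy to play with computationally; here is a small sketch of my own (purely illustrative) that computes the exact sieved density and reproduces Example 2:

```python
import random
from math import lcm

def sieved_density(classes):
    """classes: list of (a, q) pairs; count n mod Q with n != a (mod q) for every pair."""
    Q = lcm(*[q for _, q in classes])
    survivors = [n for n in range(Q) if all(n % q != a % q for a, q in classes)]
    return len(survivors), Q

# Example 2 from the post
print(sieved_density([(1, 2), (1, 3), (2, 6)]))   # (1, 6): density 1/6
print(sieved_density([(0, 2), (0, 3), (0, 6)]))   # (2, 6): density 2/6

# Rogers' theorem: zeroing out all of the shifts can only increase (or keep) the count
moduli = [random.randint(2, 12) for _ in range(5)]
shifted = [(random.randrange(q), q) for q in moduli]
zeroed = [(0, q) for q in moduli]
print(sieved_density(shifted)[0], "<=", sieved_density(zeroed)[0])
```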

This theorem is somewhat obscure: its only appearance in print is on pages 242-244 of this 1966 text of Halberstam and Roth, where the authors write in a footnote that the result is “unpublished; communicated to the authors by Professor Rogers”. I have only been able to find it cited in three places in the literature: in this 1996 paper of Lewis, in this 2007 paper of Filaseta, Ford, Konyagin, Pomerance, and Yu (where they credit Tenenbaum for bringing the reference to their attention), and it is also briefly mentioned in this 2008 paper of Ford. As far as I can tell, the result is not available online, which could explain why it is rarely cited (and also not known to AI tools). This became relevant recently with regards to Erdős problem 281, posed by Erdős and Graham in 1980, which was recently solved by Neel Somani via an AI query, with an elegant ergodic theory argument. However, shortly after this solution was located, it was discovered by KoishiChan that Rogers’ theorem reduced this problem immediately to a very old result of Davenport and Erdős from 1936. Apparently, Rogers’ theorem was so obscure that even Erdős was unaware of it when posing the problem!

Modern readers may see some similarities between Rogers’ theorem and various rearrangement or monotonicity inequalities, suggesting that the result may be proven by some sort of “symmetrization” or “compression” method. This is indeed the case, and is basically Rogers’ original proof. We can modernize it a bit as follows. Firstly, we can abstract {{\bf Z}/Q{\bf Z}} into a finite cyclic abelian group {G}, with residue classes now becoming cosets of various subgroups of {G}. We can take complements and restate Rogers’ theorem as follows:

Theorem 3 (Rogers’ theorem, again) Let {a_1+H_1, \dots, a_k+H_k} be cosets of a finite cyclic abelian group {G}. Then

\displaystyle  |\bigcup_{j=1}^k a_j + H_j| \geq |\bigcup_{j=1}^k H_j|.

Example 4 Take {G = {\bf Z}/6{\bf Z}}, {H_1 = 2{\bf Z}/6{\bf Z}}, {H_2 = 3{\bf Z}/6{\bf Z}}, and {H_3 = 6{\bf Z}/6{\bf Z}}. Then the cosets {1 + H_1}, {1 + H_2}, and {2 + H_3} cover the residues {\{1,2,3,4,5\}}, with a cardinality of {5}; but the subgroups {H_1,H_2,H_3} cover the residues {\{0,2,3,4\}}, having the smaller cardinality of {4}.

Intuitively: “sliding” the cosets {a_i+H_i} together reduces the total amount of space that these cosets occupy. As pointed out in comments, the requirement of cyclicity is crucial; four lines in a finite affine plane already suffice to be a counterexample otherwise.

By factoring the cyclic group into p-groups, Rogers’ theorem is an immediate consequence of two observations:

Theorem 5 (Rogers’ theorem for cyclic groups of prime power order) Rogers’ theorem holds when {G = {\bf Z}/p^n {\bf Z}} for some prime power {p^n}.

Theorem 6 (Rogers’ theorem preserved under products) If Rogers’ theorem holds for two finite abelian groups {G_1, G_2} of coprime orders, then it also holds for the product {G_1 \times G_2}.

The case of cyclic groups of prime power order is trivial, because the subgroups of {G} are totally ordered. In this case {\bigcup_{j=1}^k H_j} is simply the largest of the {H_j}, which has the same cardinality as the corresponding coset {a_j + H_j}, and thus has cardinality at most that of {\bigcup_{j=1}^k (a_j + H_j)}.

The preservation of Rogers’ theorem under products is also routine to verify. By the coprime orders of {G_1,G_2} and standard group theoretic arguments (e.g., Goursat’s lemma, the Schur–Zassenhaus theorem, or the classification of finite abelian groups), one can see that any subgroup {H_j} of {G_1 \times G_2} splits as a direct product {H_j = H_{j,1} \times H_{j,2}} of subgroups of {G_1,G_2} respectively, so the cosets {a_j + H_j} also split as

\displaystyle  a_j + H_j = (a_{j,1} + H_{j,1}) \times (a_{j,2} + H_{j,2}).

Applying Rogers’ theorem for {G_2} to each “vertical slice” of {G_1 \times G_2} and summing, we see that

\displaystyle  |\bigcup_{j=1}^k (a_{j,1} + H_{j,1}) \times (a_{j,2} + H_{j,2})| \geq |\bigcup_{j=1}^k (a_{j,1} + H_{j,1}) \times H_{j,2}|

and then applying Rogers’ theorem for {G_1} to each “horizontal slice” of {G_1 \times G_2} and summing, we obtain

\displaystyle  |\bigcup_{j=1}^k (a_{j,1} + H_{j,1}) \times H_{j,2}| \geq |\bigcup_{j=1}^k H_{j,1} \times H_{j,2}|.

Combining the two inequalities, we obtain the claim.

Like many other areas of modern analysis, analytic number theory often relies on the convenient device of asymptotic notation to express its results. It is common to use notation such as {X = O(Y)} or {X \ll Y}, for instance, to indicate a bound of the form {|X| \leq C Y} for some unspecified constant {C}. Such implied constants {C} vary from line to line, and in most papers, one does not bother to compute them explicitly. This makes the papers easier both to write and to read (for instance, one can use asymptotic notation to conceal a large number of lower order terms from view), and also means that minor numerical errors (for instance, forgetting a factor of two in an inequality) typically have no major impact on the final results. However, the price one pays for this is that many results in analytic number theory are only true in an asymptotic sense; a typical example is Vinogradov’s theorem that every sufficiently large odd integer can be expressed as the sum of three primes. In the first few proofs of this theorem, the threshold for “sufficiently large” was not made explicit.

There is however a small portion of analytic number theory devoted to explicit estimates, in which all constants are made completely explicit (and many lower order terms are retained). For instance, whereas the prime number theorem asserts that the prime counting function {\pi(x)} is asymptotic to the logarithmic integral {\mathrm{Li}(x) = \int_2^x \frac{dt}{\log t}}, in this recent paper of Fiori, Kadiri, and Swidinsky the explicit estimate

\displaystyle  |\pi(x) - \mathrm{Li}(x)| \leq 9.2211 x \sqrt{\log x} \exp(- 0.8476 \sqrt{\log x} )

is proven for all {x \geq 2}.
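To get a concrete feel for such a bound, one can compare it numerically against the true error at moderate heights (a rough sketch of my own; it uses sympy's prime counting function and mpmath's offset logarithmic integral, and of course says nothing about the large-{x} regime where such bounds are actually needed):

```python
from sympy import primepi
from mpmath import li, log, sqrt, exp, mp

mp.dps = 30

def fks_bound(x):
    """The Fiori-Kadiri-Swidinsky bound quoted above."""
    return 9.2211 * x * sqrt(log(x)) * exp(-0.8476 * sqrt(log(x)))

for x in [10**6, 10**7, 10**8]:
    Li = li(x, offset=True)          # Li(x) = integral from 2 to x of dt / log t
    err = abs(int(primepi(x)) - Li)
    # the explicit bound is valid for all x >= 2, but is far from tight at these heights
    print(x, float(err), float(fks_bound(x)))
```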

Such explicit results follow broadly similar strategies of proof to their non-explicit counterparts, but require a significant amount of careful book-keeping and numerical optimization; furthermore, any given explicit analytic number theory paper is likely to rely on the numerical results obtained in previous explicit analytic number theory papers. While the authors make their best efforts to avoid errors and to build some redundancy into their work, there have unfortunately been a few cases in which an explicit result stated in the published literature contained numerical errors that placed into doubt the numerical constants in several downstream applications of these papers.

Because of this propensity for error, updating any given explicit analytic number theory result to take into account computational improvements in other explicit results (such as zero-free regions for the Riemann zeta function) is not done lightly; such updates occur on the timescale of decades, and only by a small number of specialists in such careful computations. As a consequence, authors who need such explicit results are often forced to rely on papers that are a decade or more out of date, with constants that they know can in principle be improved by inserting more recent explicit inputs, but without the domain expertise to confidently update all the numerical coefficients.

To me, this situation sounds like an appropriate application of modern AI and formalization tools – not to replace the most enjoyable aspects of human mathematical research, but rather to allow extremely tedious and time-consuming, but still necessary, mathematical tasks to be offloaded to semi-automated or fully automated tools.

Because of this, I (acting in my capacity as Director of Special Projects at IPAM) have just launched the integrated explicit analytic number theory network, a project partially hosted within the existing “Prime Number Theorem And More” (PNT+) formalization project. This project will consist of two components. The first is a crowdsourced formalization project to formalize a number of inter-related explicit analytic number theory results in Lean, such as the explicit prime number theorem of Fiori, Kadiri, and Swidinsky mentioned above; already some smaller results have been largely formalized, and we are making good progress (especially with the aid of modern AI-powered autoformalization tools) on several of the larger papers. The second, which will be run at IPAM with the financial and technical support of Math Inc., will be to extract from this network of formalized results an interactive “spreadsheet” of a large number of types of such estimates, with the ability to add or remove estimates from the network and have the numerical impact of these changes automatically propagate to other estimates in the network, similar to how changing one cell in a spreadsheet will automatically update other cells that depend on it. For instance, one could increase or decrease the numerical threshold to which the Riemann hypothesis is verified, and see the impact of this change on the explicit error terms in the prime number theorem; or one could “roll back” all the literature to a given date, and see what the best estimates on various analytic number theory expressions could still be derived from the literature available at that date. Initially, this spreadsheet will be drawn from direct adaptations of the various arguments from papers formalized within the network, but in a more ambitious second stage of the project we plan to use AI tools to modify these arguments to find more efficient relationships between the various numerical parameters than were provided in the source literature.
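To illustrate the “spreadsheet” metaphor (this is purely a toy sketch of my own; the node names and formulas below are invented placeholders, not the actual design of the project), one can think of the estimates as a directed acyclic graph of cells, recomputed in topological order whenever an upstream input changes:

```python
# Toy dependency graph: each node has a recomputation rule and a list of inputs.
# All names and formulas below are invented placeholders, not real estimates.
from graphlib import TopologicalSorter

rules = {
    "rh_verification_height": (lambda: 3e12, []),                              # an input "cell"
    "zero_free_constant":     (lambda h: 1.0 / (60.0 + 0.05 * h ** 0.05), ["rh_verification_height"]),
    "pnt_error_constant":     (lambda c: 9.0 / c, ["zero_free_constant"]),
}

def recompute(rules, overrides=None):
    values, overrides = {}, overrides or {}
    order = TopologicalSorter({k: set(deps) for k, (_, deps) in rules.items()}).static_order()
    for name in order:
        fn, deps = rules[name]
        values[name] = overrides.get(name, fn(*[values[d] for d in deps]))
    return values

print(recompute(rules))
print(recompute(rules, overrides={"rh_verification_height": 1e13}))  # propagate a change downstream
```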

These more ambitious outcomes will likely take several months before a working proof of concept can be demonstrated; but in the near term I will be grateful for any contributions to the formalization side of the project, which is being coordinated on the PNT+ Zulip channel and on the github repository. We are using a github issues based system to coordinate the project, similar to how it was done for the Equational Theories Project. Any volunteer can select one of the outstanding formalization tasks on the Github issues page and “claim” it as a task to work on, eventually submitting a pull request (PR) to the repository when proposing a solution (or to “disclaim” the task if for whatever reason you are unable to complete it). As with other large formalization projects, an informal “blueprint” is currently under construction that breaks up the proofs of the main results of several explicit analytic number theory papers into bite-sized lemmas and sublemmas, most of which can be formalized independently without requiring broader knowledge of the arguments from the paper that the lemma was taken from. (A graphical display of the current formalization status of this blueprint can be found here. At the current time of writing, many portions of the blueprint are disconnected from each other, but as the formalization progresses, more linkages should be created.)

One minor innovation implemented in this project is to label each task by a “size” (ranging from XS (extra small) to XL (extra large)) that is a subjective assessment of the task difficulty, with the tasks near the XS side of the spectrum particularly suitable for beginners to Lean.

We are permitting AI use in completing the proof formalization tasks, though we require the AI use to be disclosed, and that the code is edited by humans to remove excessive bloat. (We expect some of the AI-generated code to be rather inelegant; but no proof of these explicit analytic number theory estimates, whether human-generated or AI-generated, is likely to be all that pretty or worth reading for its own sake, so the downsides of using AI-generated proofs here are lower than in other use cases.) We of course require all submissions to typecheck correctly in Lean through Github’s Continuous Integration (CI) system, so that any incorrect AI-generated code will be rejected. We are also cautiously experimenting with ways in which AI can also automatically or semi-automatically generate the formalized statements of lemmas and theorems, though here one has to be significantly more alert to the dangers of misformalizing an informally stated result, as this type of error cannot be automatically detected by a proof assistant.

We also welcome suggestions for additional papers or results in explicit analytic number theory to add to the network, and will have some blueprinting tasks in addition to the formalization tasks to convert such papers into a blueprinted sequence of small lemmas suitable for individual formalization.

Update: a personal log documenting the project may be found here.
