Two CS-Inspired Proofs for the Infinitude of Primes
It’s unfortunate that there aren’t many good open-source fonts designed specifically for dyslexic readers. However, there’s a helpful Chrome extension that can change the font of the text you read online, making it easier to follow.
Prime numbers are among the most fascinating objects in mathematics, and their study has led to advancements in several areas of math. One of the oldest known facts about primes is that the set of primes is not finite. The earliest existing proof for this statement is perhaps the one due to Euclid and can be found in Elements, Book IX, Proposition 20. Many different proofs, however, have emerged since then, with a comprehensive list compiled by Meštrović [Mešt23].
In this blog post, I will present two computer science-inspired proofs for the infinitude of primes. The first proof is due to Gregory Chaitin [Chai] and it uses the notion of Shannon entropy, which is a fundamental concept in information theory. The second proof is due to Aalok Thakkar [Thak18], which was recently featured in American Mathematical Monthly. The latter uses some tools from automata theory.
A word of acknowledgment: The first time I came across Chaitin’s proof was in the final exam of a course I had on information theory taught by Thomas Debris-Alazard. Thanks to Thomas for his excellent teaching and evaluation methods!
1 Chaitin’s Proof (Using Properties of Shannon Entropy)
Let \(n\) be a natural number, and define \(N\) as a uniform random variable whose support is the set \(\{ 1,2, \dots, n \}\). The number of primes less than or equal to \(n\) is denoted by \(\pi(n)\), known as the prime-counting function. Let \(p_1 < p_2 < \cdots < p_{\pi(n)}\) represent the sequence of primes less than or equal to \(n\). For \(1 \leq i \leq \pi(n)\), define a random variable \(X_i\) as the exponent of \(p_i\) in the prime factorization of \(N\).
By the fundamental theorem of arithmetic, \((X_1, \dots , X_{\pi(n)})\) is a function of \(N\) and vice versa. Therefore, we have \[ H(X_1, \dots, X_{\pi(n)}) = H(N), \] where \(H(\cdot)\) denotes Shannon entropy. Since \(N\) is a uniform random variable on a support of size \(n\), \(H(N) = \log n\). On the other hand, we can bound \(H(X_1, \dots, X_{\pi(n)})\), using the inequality \[ H(X_1, \dots, X_{\pi(n)}) \leq \sum_{i=1}^{\pi(n)} H(X_i). \] The cardinality of the support of \(X_i\) is \(\lbrack \log_{p_i} n \rbrack + 1 \leq \log n + 1\), thus, \[ H(X_i) \leq \log (\log n + 1), \] which leads to the inequality \[ \pi(n) \geq \frac{\log n}{\log (\log n + 1)}. \] As \(n \to \infty\), the right-hand side of this inequality tends to infinity, implying that \(\pi(n)\) becomes arbitrarily large.
It is worth noting that this approach not only proves the infinitude of primes, but also provides a lower bound on the prime-counting function, which is of great interest to many mathematicians.
Exercise 1 (A Tighter Bound) It is possible to improve the bound obtained above to \[ \pi(n) \geq \frac{ \log n}{2 \log 2}, \]
using a different factorization of \(n\). Can you prove it?
2 Thakkar’s Proof (Using Properties of Regular Languages)
Here, I will only outline Thakkar’s proof and leave working out the details for the reader. The proof is based on the fact that the class of regular languages is closed under finite union. Consider the following family of languages over the alphabet \(\{ 0,1 \}\):
\[ \mathcal{L}_n = \{ w \in \{ 0,1 \}^* : \#_0(w) - \#_1(w) \equiv 0 \pmod{n} \}, \]
where \(\#_0(w)\) and \(\#_1(w)\) denote the number of occurrences of \(0\) and \(1\) in the string \(w\), respectively. You can easily verify that \(\mathcal{L}_n\) is a regular language for any \(n \in \mathbb{N}\). In particular, \(\mathcal{L}_p\) is regular for any prime \(p\). For the sake of contradiction, assume that the set of primes is finite. This implies that
\[ \mathcal{L} = \bigcup_{p \in \mathcal{P}} \mathcal{L}_p, \]
where \(\mathcal{P}\) is the set of all primes, is a finite union of regular languages, hence regular. To arrive at a contradiction, it is easier to use another fact about regular languages: the class of regular languages is closed under complementation. Consider the complement of \(\mathcal{L}\), and observe that it is
\[ \overline{\mathcal{L}} = \{ w \in \{ 0,1 \}^* : \#_0(w) - \#_1(w) = \pm 1 \}. \]
Using the pumping Lemma, or the Myhill-Nerode theorem, one can show that \(\overline{\mathcal{L}}\) is not regular, which is a contradiction. Therefore, the set of primes is infinite.