The main task for day 2 is to find the sum of all “invalid IDs” in a given range \([a,b]\), which are numbers made of some sequence of digits repeated at least twice. For example, 22, 343434, 52475247, 11111 are all invalid IDs. For the full description of the problem, read at https://adventofcode.com/2025/day/2
To be honest, part 2 is quite difficult and it took me one day to find a good solution. Then, surprisingly I found that the problem is related to a concept in Number Theory: Möbius Inversion Formula. In this post, I will explain how we can apply this powerful theory to derive a closed form formula to compute the result. First, to describe the problem in terms of mathematics, we need some notations.
Some notations and preliminaries
A number is called primitive if it is not made of some sequence of digits (called primitive block) repeated at least twice. For example, 4, 52, 3299 are example of primitive numbers. Let \(x\) be an integer of \(n\) digits, we call its (minimal) period \(p=p(x)\) to be the smallest number \(p\) such that \(x\) starts repeating after \(p\) digits
| \(x\) | 222 | 343434 | 52475247 | 56799 |
| \(n\) | 3 | 6 | 8 | 5 |
| \(p\) = period of \(x\) | 1 | 2 | 4 | 5 |
| primitive block | ‘2’ | ’34’ | ‘5247’ | ‘56799’ |
Note that for primitive numbers, \(p(x)=n\) (no repeated blocks of length \(<n\)) and “invalid” numbers are just non-primitive numbers.
We denote \(|x|=\) the number of digits of \(x\) . By writing \(x=w^d\), we mean that \(x\) is formed by a block \(w\) repeated \(d\) times. With this notation, any invalid number takes the form \(w^d\) for some \(d \geq 2\).
| Lemma 1 (repetition factor). Let \(x\) be a \(n\)-digit integer and \(x=w^d\) and \(|w|=k\) , we have \[x=w \frac{10^n-1}{10^k-1}\] |
Proof. By assumption, \(x\) is obtained by repeating \(k\)-digits block \(d\) times
\[x=\underbrace{ww\dots w}_{d \; \text{times}}\]
Since \(w\) has \(k\) digits, the last ‘\(w\)’ has value \(w\), the second last ‘\(w\)’ has value \(10^{k}w\), …, the first one has value \(10^{(d-1)k}w\)
\[x=w \big( 1+10^k+10^{2k}+\dots+10^{(d-1)k} \big)\]
This is just a geometric series \(1+r+r^2+\dots+r^m=\frac{r^{m+1} -1}{m-1}\) with \(r=10^k\) and \(m=d-1\). Applying this, we have
\[x= w \frac{10^n-1}{10^k-1}\]
Remark. By Lemma 1, we can determine an invalid number by its primitive block \(w\). For example, take \(n=6\), \(w=25\), \(k=2\), then the repetition factor is
\[R= \frac{10^6-1}{10^2-1}=10101 \;\; \text{and} \;\; 252525=25\cdot 10101\]
Now, suppose we know that a “big task”, say \(g(6)\), can be split into sum of smaller task \(f\) on divisors of \(6\). More precisely, \(g(6)=f(1)+f(2)+f(3)+f(6)\). Can we compute \(f(6)\) via \(g\) only, without computing \(f(1),\; f(2),\; f(3)\)? The answer is Yes, and it lies in Möbius Inversion Formula.
Möbius function \(\mu(n)\) is a multiplicative function that takes every natural number \(n\) to value in \(\{1,0,-1\}\) :
- \(\mu(n)=1\) if \(n=1\)
- \(\mu(n)=(-1)^k\) if \(n\) is product of \(k\) distinct primes
- \(\mu(n)=0\) if \(n\) is divisible by a square \(>1\)
The main point is that \(\mu(n)\) is easily pre-computed and for day 2 problem, it’s enough to compute up to \(n=10\). The main tool we use to solve our problem is the following theorem.
Theorem. (Möbius Inversion Formula) 1
If \(f\) and \(g\) are arithmetic functions satisfying
\[g(n)=\sum_{d|n}f(d)\]
then we have the inverse formula:
\[f(n)=\sum_{d|n} \mu(d)g \left( \frac{n}{d} \right) \]
Our main problem: find the sum of all \(n\)-digit invalid numbers in the range \([a,b] \) .
First, we can assume that \(a, b\) both have \(n\) digits since otherwise we can split into sub-ranges with the same numbers of digits. Let’s define the following function:
\[F(d):=\sum_{a \leq x \leq b \\ p(x)=d}x \quad \text{and} \;\; G(d):=\sum_{a \leq x \leq b \\ p(x) | d}x\]
Note the difference between these two functions: \(G(d)\) sums all numbers in \([a, b]\) whose period divide \(d\), while \(F(d)\) sums all numbers whose periods are exactly \(d\). As an illustration, consider \(n=6\). Then \(F(3)\) is the sum of numbers of the form \(xyzxyz\), whereas \(G(3)\) sums all numbers of the form \(xyzxyz\) or \(xxxxxx\) (period is 3 or 1). There are several worthy facts about these functions when \(d = n\):
- \(G(n)=\sum_{a \leq x \leq b}x \) (that is, the sum of all numbers in the range) since any number \([a, b]\) has period dividing \(n\). This is not true for \(d \neq n\).
- \(F(n)= \text{sum of all primitive}\;x \in [a, b]\ \). This is because \(p(x)=n\) means that \(x\) is primitive.
From this observations, we can compute as
\[\sum_{a \leq x \leq b\\ x \; invalid}x =\left( \sum_{a \leq x \leq b}x \right) -F(n)\]
The task now becomes computing \(F(n)\). From the definition of \(F\) and \(G\) we have the forward relation
\[G(n)=\sum_{d|n}F(d)\]
Thus, we can invert \(F(n)\) by Möbius Inversion
\[F(n)=\sum_{d|n}\mu(d) G\left( \frac{n}{d}\right)\]
The good thing about \(G\) is that we can compute it in constant time.
Lemma 2. For each \(d | n\), we have \[G\left( \frac{n}{d} \right)=R \cdot \left( \sum_{L \leq x \leq U}x \right), \;\;\;\text{where} \] \[R=\frac{10^n-1}{10^{n/d}-1}, \; L=\max \left( \left\lceil \frac{a}{R} \right\rceil, 10^{n/d-1}\right), \; U=\min \left( \left\lfloor \frac{b}{R} \right\rfloor, 10^{n/d-1}\right)\] |
Proof. Denote \(k=n/d\). Any number whose period divides \(n/d\) must be obtained by repeating a \(k\)-digit block \(d=n/k\) times. In other words, any such number \(x\) has the form \(x=w^d\) where \(w\) has \(k\) digits. Note that \(w\) is not necessarily primitive. By Lemma 1, we have
\[x=wR \quad \text{where}\; R=\frac{10^n-1}{10^k-1} \]
We require \(a\leq x \leq b\), thus
\[ \left\lceil \frac{a}{R} \right\rceil \leq w \leq \left\lfloor\frac{b}{R} \right\rfloor\]
Here \(\lceil \cdot \rceil\) and \(\lfloor \cdot \rfloor\) stand for the ceiling and floor functions respectively 2. On the other hand, since \(w\) has \(k\) digits, we also have
\[10^{k-1} \leq w \leq 10^k -1\]
Combining these, we have the lower bound and upper bound for \(w\)
\[ L=\max \left( \left\lceil \frac{a}{R} \right\rceil, 10^{k-1}\right) \leq w \leq U=\min \left( \left\lfloor\frac{b}{R} \right\rfloor, 10^k-1 \right) \]
With this bounds, we can sum all \(x=w^d\) as
\[G(k)=G\left( \frac{n}{d} \right)=\left( \sum_{L \leq x \leq U}xR \right)=R \cdot \left( \sum_{L \leq x \leq U}x \right), \]
which is what we want to prove.
Now, we are ready to state the final formula:
Theorem. Let \(a \leq b\) are \(n\)-digit positive integers. Then the sum of all invalid numbers in \([a, b]\) is given by \[\sum_{a \leq x \leq b\\ x \; invalid}x =\left( \sum_{a \leq x \leq b}x \right) -\sum_{d|n}\mu(d) G\left( \frac{n}{d}\right)\] |
Remark. Let \(A(m,M)=\sum_{m \leq x \leq M}x\), that is, the arithmetic sum of all numbers from \(m\) to \(M\), then \(A(m,M)\) is equal to
\[A(m,M)=\frac{(M+m)(M-m+1)}{2}\]
The Pseudocode
Input: positive integers \(a, b\) with \(a \leq b\)
Output: sum of all invalid number \(x\) with \(a \leq x \leq b\)
def invalid_sum(a, b): n, m= len(str(a)), len(str(b)) total=0 if n==m: Fn=0 #compute F(n) for d in DIVS[n]: # DIVS[n] list of all divisors of n k=n//d R= (10**n-1)//(10**k-1) low=max(math.ceil(a/R), 10**(k-1)) high= min( math.floor(b/R), 10**k-1) Gk=R*A(low,high) # A= arithemtic sum from low to high Fn+=Mobius[d]*Gk # Mobius[d]= Mobius function at d return A(a,b)-Fn elif n<m: return invalid_sum(a,10**n-1)+invalid_sum(10**n,b) return total
Remark. If we want the sum of all invalid numbers repeated twice (part 1), we only care if \(d=2\) is a divisor of \(n\) (or \(2 | n\)). In this case the time complexity is O(1) and the result is 0 if \(n\) odd and \(G(n/2)\) if \(n\) even by Lemma 2.
Theorem. Let \(a \leq b\) are \(n\)-digit positive integers with \(n\) even . Then the sum of all numbers in \([a, b]\) obtained by repeating some block twice is equal to \[\sum_{a \leq x \leq b\\ x=w^2}x=G\left( \frac{n}{2} \right)=R \cdot \left( \sum_{L \leq x \leq U}x \right) \] where \[R=10^{n/2}+1, \; L=\max \left( \left\lceil \frac{a}{R} \right\rceil, 10^{n/2-1}\right), \; U=\min \left( \left\lfloor \frac{b}{R} \right\rfloor, 10^{n/2-1}\right)\] |
Actually, we don’t need the fancy Möbius Inversion to get O(1) solution for this case.
Conclusions
The key take away is that, if we have \(g(n)=\sum_{d|n}f(d)\), we can recover \(f(n)\) via Möbius Inversion:
\[f(n)=\sum_{d|n} \mu(d)g \left( \frac{n}{d} \right) \]
Summing over divisors of \(n\) is much less expensive then looping through \(n\) itself. In this problem, \(n\) being the number of digits (at most \(10\)) makes it feasible to pre-compute the Möbius function and divisor sets 3. It is also worth to note that Möbius Inversion involves the computation of \(g\) and in this case, we can compute \(g\) in constant time using only geometric and arithmetic series.
