From Here to Hensel, Daniel Mathews

Here’s a nice maths problem, which I thought it would be fun to discuss. The question doesn’t involve any advanced concepts, but it leads on to a very nice result called Hensel’s lemma, and can be taken even further.

The problem comes from somewhere in Mathematical Olympiad folklore… which is another way of saying “I don’t know the source”. If you know a source let me know by email or in the comments. It’s discussed in Problem Solving Tactics, the book I co-authored with my colleagues Angelo Di Pasquale and Norman Do.

PROBLEM. Let \(n\) be a positive integer. Prove that there are infinitely many perfect squares of the form \(a \cdot 2^n – 7 \), where \(a\) is a positive integer.

As with most Olympiad type problems, and especially as with problems in our book, it’s a problem which can be solved using “elementary” methods — in this case just modular arithmetic — but applied in a clever way. For the purposes of discussion here, I’m going to assume you know what modular arithmetic is. If you don’t know modular arithmetic, it’s a fun topic, and I’d encourage you to read about it.

As we’ll see, although his problem is a bit of a curly one, the ideas are quite general. They lead to a very nice result and a very nice technique.

Spoiler alert: I’m about to discuss an approach to this problem and give a solution. Now generally with mathematical problems it’s not good form to just read straight to a solution. So if you can, have a go at the problem yourself first. Otherwise you’ll just be burning the problem; you won’t get as much out of the discussion. Go on, get a pen and paper and give it a go. You’ve been here looking at a screen for too long anyway.

…

OK, on to discussing the problem.

It’s all modular arithmetic!

The first thing to note is that the problem is really about modular arithmetic, and we can rewrite it as such. To say that a number is of the form \( a \cdot 2^n – 7 \) with \(a\) an integer is to say that that number is congruent to \( -7 \) mod \( 2^n \). The problem is asking us to show that, for any positive integer \(n\), there are infinitely many perfect squares congruent to \(-7\) mod \(2^n\). In other words, it’s asking us to show that there exist infinitely many integers (x) such that
\[
x^2 + 7 \equiv 0 \text{ mod } 2^n.
\]
In fact, given \(n\), you only need to find one such \(x\). For once you have one such \(x\), any other number \(y\) congruent to \(x\) modulo \( 2^n \) will be a solution as well: since \( x \equiv y \) mod \(2^n\), then \( y^2 \equiv x^2 \equiv -7 \) mod ( 2^n ). So, we can simplify the problem somewhat.

To celebrate this first bit of progress, let’s fully write out our simplified form of the problem.

SIMPLIFIED VERSION OF PROBLEM. Let \(n\) be a positive integer. Show that there exists an integer \(x\) such that
\[
x^2 \equiv -7 \text{ mod } 2^n.
\]

Finding solutions, two by two by two…

To build up some understanding of how to find the required \(x\), let’s look at some small cases — as we often do, and which is a good way to approach problem-solving in general.

So, let’s start with \(n=1\). Here, we are looking for an \(x\) such that \( x^2 \equiv -7 \equiv 1 \) mod \(2\). In other words, we’re looking for \(x\) such that \(x^2\) is odd.

Try to guess a solution. Well, if the first \(x\) you guess doesn’t work, the second one probably will. Because in fact any odd \(x\) provides a solution. We may as well just take \( x = 1 \).

Great! What about \(n=2\)? Now we’re looking for \(x\) such that \(x^2 + 7 \) is a multiple of \(4\). This simplifies to \( x^2 \equiv -7 \equiv 1 \) mod \(4\).

Again, guessing a solution is not too hard. Since we’re working mod 4, there are only 4 possible guesses for \(x\) anyway. Guessing \(x = 1\) works, as does \(x=3\). This means that, again, any odd number works.

Well, this is all looking pretty good so far!

Flushed with success, we turn to \(n=3\). Now we want to solve \( x^2 \equiv -7 \) mod \(8\), or equivalently, \( x^2 \equiv 1 \) mod \(8\).

Again, not too difficult; and in any case there are only 8 possible guesses for \(x\) mod 8. And you can see that if \(x\) is even then \(x^2\) will also be even, and hence can’t possibly be \( \equiv 1 \) mod \(8\). So really there are only 4 reasonable guesses for \(x\).

And it turns out all guesses work. We can take \(x \equiv 1, 3, 5, \) or \(7\) mod \(8\). Again, any odd number works.

This all seems a bit too easy!

But at \(n=4\) we can’t just take any odd \(x\). We want to solve \(x^2 \equiv -7 \) mod \(16\), which is equivalent to \( x^2 \equiv 9 \) mod \(16\). We can take \(x=3\); in fact any \(x \equiv 3, 5, 11 \) or \(13 \) mod \(16\) works. So out of the 16 guesses mod 16, or 8 odd guesses, 4 of them work.

Now we go to \(n=5\) and try to solve \( x^2 \equiv -7 \) mod \(32\), or equivalently \(x^2 \equiv 25 \) mod \(32\). Noticing that 25 is actually a perfect square, we can just take \( x = 5 \). But again there are several more solutions, and you can check that \(x \equiv 5, 11, 21\) and \(27\) mod \(32\) are the solutions.

But now the numbers are getting bigger. In fact, they’re growing exponentially. As we proceed through \(n = 1, 2, 3, \ldots \), we are proceeding from considering numbers mod \(2\), to mod \(4 = 2^2\), to mod \(8 = 2^3\), and so on. The modulus is increasing by a factor of two at each step.

So, rather than get bogged down in arithmetic modulo 128, let’s step back and think a bit about how we might proceed.

Reducing from one modulus to another

You might have noticed, as we were passing through calculations mod 2, then 4, then 8, and so on, that at each stage we were “refining” our calculations.

And, on a related note, when we were considering calculations at each stage, we could apply the calculations at a previous stage.

Let’s be a bit more precise, and a bit more general.

If you know that a number \(x\) satisfies \(x \equiv 3\) mod \(4\), then you can deduce that \(x\) must be odd; we used this fact above. Similarly, if you know \(x\) satisfies \(x \equiv 7 \) mod \(15\), then you can deduce that \(x \equiv 1 \) mod \(3\). Indeed, if \(x\) is 7 more than a multiple of 15, then it’s also 7 more than a multiple of 3; any multiple of 15 is also a multiple of 3. So \(x \equiv 7 \equiv 1 \) mod \(3\).

More generally, if we know \(x\) mod \(m\), then we can deduce \(x\) mod \(k\), for any factor \(k\) of \(m\).

Let’s write \(m = kl\), and let’s assume \(k,l\) are both integers greater than 1. If you know \(x\) mod \(kl\) then you can know \(x\) mod \(k\). For if \(x \equiv c \) mod \(kl\), then \(x\) is \(c\) more than a multiple of \(kl\). But any multiple of \(kl\) is also a multiple of \(k\), so then \( x \equiv c \) mod \(kl\).

The upshot is that there is a well-defined natural way to reduce from an integer modulo \(kl\), to an integer modulo \(k\). In other words, there’s a natural function
\[
R^{kl}_k \colon { \text{Integers modulo } kl } \rightarrow { \text{Integers modulo } k }
\]
which does this reduction. If you’ve studied some of this stuff, a fancy way of writing this is to say that there’s a homomorphism of groups or rings
\[
R^{kl}_k \colon \frac{\mathbb{Z}}{kl \mathbb{Z}} \rightarrow \frac{\mathbb{Z}}{k \mathbb{Z}}.
\]
If you haven’t heard those words before, don’t worry; it isn’t necessary for what follows.

Now this idea of reducing from one modulus to another is very relevant in the present problem.

For instance, when we considered the case \(n=5\), we were looking for \(x\) such that \(x^2 \equiv -7 \) mod \(32\). But reducing to mod 16 (i.e. applying \( R^{32}_{16} \) ), such an \(x\) must also satisfy \(x^2 \equiv -7 \) mod \(16\). And we found the solution for such \(x\) mod \(16\) in the previous case \(n=4\): namely, \(x \equiv 3, 5, 11 \) or \(13\) mod \(16\).

Thus, the only possible solutions for \(x\) in the case \(n=5\), solving the equation \(x^2 \equiv -7\) mod \(32\), must be \(x\) mod 32 which reduce to \(3, 5, 11\) or \(13\) mod \(16\). There are 8 such \(x\), namely
\[
x \equiv 3, 5, 11, 13, 19, 21, 27, 29 \text{ mod } 32.
\]
So rather than checking all 32 possibilities for \(x\) mod 32, or (slightly more efficient) checking the 16 odd possibilities for \(x\) mod 32, by paying attention to the previous case we could get down to a mere 8 cases to check. And as we mentioned, among these 8 possibilities, 4 of them provide solutions:
\[
x \equiv 5, 11, 21, 27 \text{ mod } 32.
\]

Lifting from one modulus to another

What we’re doing here is often described as “lifting”: lifting integers mod 16 to mod 32.

For instance, if you know \( x \equiv 3 \) mod \(16\), then you can’t say what \(x\) is mod \(32\), but you can narrow it down to two possibilities: \( x \equiv 3 \) or \(19 \) mod \(32\). So \(x\) has two lifts from mod 16 to mod 32.

A more abstract way of saying the same thing is that, when you consider the function
\[
R^{32}_{16} \colon { \text{Integers mod 32} } \rightarrow { \text{Integers mod 16} },
\]
the integer 3 mod 16 has two preimages. These preimages are its “lifts” to modulo 32. Indeed, every integer mod 16 has two preimages under \( R^{32}_{16} \). Explicitly, the two preimages of a number \(c\) mod \(16\) under \( R^{32}_{16} \) are \(c\) and \(c+16\) mod \(32\).

As you might suspect, there’s nothing special about 16 and 32 here. Let’s again consider general positive integers \(k,l\) greater than 1. Then the function
\[
R^{kl}_k \colon { \text{Integers modulo } kl } \rightarrow { \text{Integers modulo } k }
\]
is surjective, and it’s “\(l\)-to-1”, meaning that every integer modulo \(k\) has precisely \(l\) preimages. Explicitly, the preimages of an integer \(c\) are just given by \(c\) and adding multiples of \(k\) to it some number of times; so the preimages of \(c\) mod \(k\) are
\[
c, c+k, c+2k, …, c+(l-1)k.
\]
These \(l\) preimages are just given by adding \(k\) until we arrive back where we started: from the last number listed here \(c+(l-1)k\), if we added another \(k\), we would arrive at \(c+lk\), which is congruent to \(c\) mod \(kl\) and back where we started.

I haven’t formally proved the statements in the above paragraph, but hopefully they should make sense and the above discussion gives an indication why they’re true. If you’re not entirely convinced, I’d encourage you to try and prove it yourself!

Lifting our Solutions

Let’s now return to the problem at hand and use these ideas to find more solutions. We left off at \(n=5\), finding the solutions \(x \equiv 5, 11, 21, 27 \) mod \( 32 \).

Proceeding to the case \(n=6\), we are trying to solve
\[
x^2 \equiv -7 \text{ mod } 64.
\]
Based on our previous discussion, let’s try to find a solution by lifting the solutions from the previous case from mod 32 to mod 64.

Let’s start from one of our solutions mod 32, \(x \equiv 5 \) mod 32.

It has two lifts, \(5\) and \(5+32 = 37 \) mod \(64\). Do either of these work? We can compute \(5^2 \equiv 25 \) and \(37^2 = 1369 \equiv 25 \) mod \(64\). So neither of them works.

Next, let’s try lifting \(x \equiv 11 \) mod \(32\). It has two lifts \(11, 43\) mod \(64\). We find that \( 11^2 \) and \(43^2 \) are both \( \equiv -7 \) mod \(64\). So both are solutions!

For the sake of completeness, we can check that lifting \(x \equiv 21 \) mod \(32\) gives the possibilities \(x \equiv 21, 53\) mod \(64\) — and both of these are also solutions. And lifting \(x \equiv 27 \) mod \(32\) gives the possibilities \(x \equiv 27, 59 \) mod \(64\), neither of which are solutions.

So, it’s not always the case that a solution mod 32 lifts directly to a solution mod 64. But it’s often the case.

If we can guarantee that we can always go from a solution mod \(2^n\) to a solution mod \(2^{n+1}\), then we’ve solved the problem. We would then be able to take our solutions mod \(2^n\) for small \(n\), and use this guarantee to successively obtain solutions mod higher powers of \(2\). By induction on \(n\) then we can guarantee a solution mod \(2^n\) for all \(n\), which is precisely what the question asks for.

An Unorthodox Lifting

So, let’s work generally. Suppose we have a solution \(x \equiv c\) mod \(2^n\), i.e. we have found \(c\) such that \(c^2 \equiv -7 \) mod \(2^n\).

Then when we lift \(c\) to mod \(2^{n+1}\), there are two possibilities: \(x \equiv c\) or \(c+2^n\) mod \(2^{n+1}\). If either of these satisfies \(x^2 \equiv -7\) mod \(2^{n+1}\), we are finished.

Moreover, when we do this, we know we have to be “pretty close” to a solution, in some sense. We can see this by lifting, not just the \(x\), but also the \(-7\). Indeed, if \(x^2 \equiv -7 \) mod \(2^n\), then \(x^2 \) considered mod \(2^{n+1}\) is a lift of \(-7\) from mod \(2^n\) to mod \(2^{n+1}\). The two lifts of \(-7\) from mod \(2^n \) to mod \(2^{n+1}\) are \(-7\) and \(-7+2^n\). So either
\[
x^2 \equiv -7 \text{ or } -7+2^n \text{ mod } 2^{n+1}.
\]

Anyway, let’s take the lifts \(c\) and \(c+2^n\) of our solution from mod \(2^n\) to mod \(2^{n+1}\), and see how they compare. We can calculate
\[
(c + 2^n)^2 = c^2 + 2^{n+1} c + 2^{2n} \equiv c^2 \text{ mod } 2^{n+1}.
\]
Observe what happened: upon expanding out \( (c+2^n)^2 \), all of the terms except \(c^2\) contained high powers of 2, which modulo \(2^{n+1}\) are just zero and disappear.

This is bad news, because it means that \(c\) and \(c+2^n\) give the same result mod \( 2^{n+1} \) when we square them. So if \(x \equiv c\) is a solution, i.e. \(c^2 \equiv -7 \) mod \(2^{n+1} \), then \(x \equiv c + 2^n \) is also a solution. That’s great, but if \(x \equiv c \) is not a solution, then neither is \(x \equiv c+2^n \).

That’s just as we found above. Some solutions mod \(32\) lift to give a solution mod \(64\), and when they lift they lift doubly well, with both lifts being solutions. But other solutions do not lift to solutions at all.

So the problem becomes a bit harder, on account of the fact that
\[
(x + 2^n)^2 \equiv x^2 \text{ mod } 2^{n+1}.
\]
If only that weren’t true… but we’ll return to that point later.

For now, let’s deal with the reality at hand, where that unfortunate equation holds.

Back to the drawing board…

Our approach at this point has now become an attempt at a proof by induction. Given a solution modulo \(2^n\), we want to obtain from it a solution modulo \(2^{n+1}\). We need to do this in a way that always works. But we now know that we can’t just directly lift a solution each time.

Let’s go back to our equation modulo \( 32 \) and try to lift the solutions to modulo \( 64 \) in a way that “always works”. In other words, let’s try to find a method that, no matter which solution modulo \(32\) we start with, we can end up with a solution modulo \(64\).

We had the solution \(x \equiv 5 \) mod \(32\), which did not lift to a solution mod \(64\). But since \(x^2 \equiv -7\) mod \(32\), we could have known, lifting the \(-7\), that
\[
x^2 \equiv -7 \text{ or } 25 \text{ mod } 64.
\]
However, we found that both lifts \( x \equiv 5, 37 \) mod \(64\) failed to be solutions: both squared to \( 25 \) mod \(64\). And as we know now, if one of them fails, the other is doomed to fail too.

The reason for this failure, we found, was because
\[
37^2 \equiv (5 + 32)^2 \equiv 5^2 + 2 \cdot 5 \cdot 32 + 32^2 \equiv 5^2 \text{ mod } 64.
\]
As you can see, when we add the 32 in the term to be squared, we have added such a high power of 2 that all the terms other than \(5^2\) disappear modulo \(64\). In particular, the cross term \(2 \cdot 5 \cdot 32 \) becomes a multiple of 64.

But what if we added a just slightly lower power of 2? Rather than adding 32, we could add 16. This would yield \(x = 21\), which we found above is actually a solution! Indeed, applying the same algebra as above but with a 16 rather than a 32, we can see that
\[
21^2 \equiv (5+16)^2 \equiv 5^2 + 2 \cdot 5 \cdot 16 + 16^2
\equiv 5^2 + 32 \cdot 5 \text{ mod } 64.
\]
Now rather than evaluating \(32 \cdot 5\), we could just notice that it’s an odd multiple of 32. A multiple of 32 is always going to be congruent to 0 or 32 mod 64, depending on whether we multiply it by an even of an odd number. Here we have an odd multiple, so \( 32 \cdot 5 \equiv 32 \) mod \(64\). Thus,
\[
21^2 \equiv 5^2 + 32 \text{ mod } 64.
\]

This is great news! It says that \(5^2\) and \(21^2\) are not the same, modulo 64: they differ by \(32\). If it turned out that \(5^2 \equiv -7\) mod \(64\), we’d be finished. But it isn’t: it takes the other value, \(5^2 \equiv 25 \) mod \(64\). And in that case, the above equation tells us that \(21^2\) must give us the right value, \(21^2 \equiv 5^2 + 32 \equiv 25+32 \equiv -7 \) mod \(64\).

This argument doesn’t rely at all on the particular powers of 2 involved. It’s going to give us a way to go from a solution mod \(2^n\) to a solution \(2^{n+1}\). It will solve our problem.

… and a proof

Let’s reiterate the above in more generality, which in fact gives a solution to the problem.

Suppose we have an integer \(x\) such that \(x^2 \equiv -7 \) mod \(2^n\). Proceeding as above, we’ll find a \(y\) such that \(y^2 \equiv -7 \) mod \(2^n\).

Well, lifting \(-7\) from mod \(2^n\) to mod \(2^{n+1}\), we must have that either
\[
x^2 \equiv -7 \text{or} -7+2^n \text{ mod } 2^{n+1}.
\]
In the first case, we are finished: we can just take \(y = x\). In the second case, let \(y = x + 2^{n-1} \). Then we have
\[
y^2 = (x + 2^{n-1})^2
= x^2 + 2^n x + 2^{2n-2}
\equiv x^2 + 2^n x \text{ mod } 2^{n+1}.
\]
Now as \(x^2 \equiv -7 \) mod \(2^n\), \(x\) is definitely odd. So \( 2^n x \) is an odd multiple of \(2^n\). Modulo \(2^{n+1}\), all multiples of \(2^n\) are congruent to \(0\) or \(2^n\): even multiples are congruent to \(0\), odd multiples are congruent to \(2^n\). Thus \(2^n x \equiv 2^n \) mod \(2^{n+1}\). We then conclude that
\[
y^2 \equiv x^2 + 2^n \text{ mod } 2^{n+1}.
\]
Since we assumed in this case that \( x^2 \equiv -7 + 2^n \) mod \( 2^{n+1} \), we therefore have
\[
y^2 \equiv -7 + 2^n + 2^n \equiv -7 \text{ mod } 2^{n+1},
\]
which is precisely what we were looking for.

We have shown that, given \(x\) satisfying \(x^2 \equiv -7 \) mod \(2^n\), there exists \(y\) satisfying \(y^2 \equiv -7\) mod \(2^{n+1}\).

So, since we found solutions modulo \(2^n\) for small \(n\), we now have proved, by induction, that for all positive integers \(n\), there exists \(x\) such that \(x^2 \equiv -7 \) mod \(2^n\).

We have solved the problem!

What if life was easy?

In this problem we had to solve a specific equation in modular arithmetic. We did it by repeatedly “lifting” solutions. Given a solution modulo \(2^n\), we used it to find a solution modulo \(2^{n+1}\).

It would have been easier if we could directly lift the solutions. In other words, it would have been easier if whenever \(x\) was a solution modulo \(2^n\), it was also a solution modulo \(2^{n+1}\). That wasn’t quite true in this problem, and we have to adjust our \(x\) by \(2^{n-1}\) if necessary in passing from arithmetic modulo \(2^n\) to arithmetic modulo \(2^{n+1}\).

Sometimes, however, you can lift your solutions directly.

Sometimes, life doesn’t have to be difficult.

Let’s step back a bit and think about the type of problem we were trying to solve. We had a polynomial equation modulo \(2^n\):
\[
P(x) \equiv 0 \text{ mod } 2^n.
\]
In this problem, we had \(P(x) = x^2 + 7 \).

Let’s change the problem to a different polynomial: let \( Q(x) = x^3 + 7 \), and we’ll consider the similar-looking polynomial equation
\[
Q(x) \equiv 0 \text{ mod } 2^n,
\text{ i.e. }
x^3 + 7 \equiv 0 \text{ mod } 2^n.
\]

Now, mod \(2,4,8\), there’s the straightforward solution \(x=1\).

Mod \(16\), \(x=1\) is not a solution, but the other lift of \(1\) from mod 8 to mod 16 is a solution, i.e. \(x=9\).

Mod \(32\), \(x=9\) is still a solution.

Mod \(64\), \(x=9\) is not a solution, but the other lift of \(9\) from mod 32 to mod 64 is a solution, namely \(x=41\).

And so on.

So it appears that when we’re solving \(x^3 + 7 \equiv 0 \) mod \(2^n\), we can just lift solutions directly. Going from a solution modulo \(2^n\) to \(2^{n+1}\) there are two lifts, and one of them always seems to work.

Let’s see if we can prove this, using a similar method as for the initial, harder, problem. Suppose we have a solution \(x\) to the equation \(x^3 + 7 \equiv 0 \) mod \( 2^n \); we’ll try to find a \(y\) such that \(x^3 + 7 \equiv 0 \) mod \(2^{n+1}\).

Well, since \(x^3 + 7 \equiv 0 \) mod \(2^n \), then there are two possibilities for \(x^3 + 7 \) modulo \(2^{n+1}\):
\[
x^3 + 7 \equiv 0 \text{ or } 2^n \text{ mod } 2^{n+1}.
\]
In the first case, we have our solution: we may simply take \(y=x\) and then \(y^3 + 7 \equiv 0 \) mod \(2^{n+1}\) as desired. In the second case, consider letting \(y\) to be the other lift of \(x\) from mod \(2^n\) to mod \(2^{n+1}\), namely \(y = x+2^n\).

We can then compute \(y^3 + 7 \) mod \(2^{n+1}\), using similar algebra as in the original question:
\[
y^3 + 7
= (x+2^n)^3 + 7
= x^3 + 3x^2 2^n + 3x 2^{2n} + 2^{3n} + 7.
\]
Now by assumption \(x^3 + 7 \equiv 2^n \) mod \(2^{n+1}\). Moreover, any term containing a factor with \(2^a\) with \(a \geq n+1\) reduces to zero modulo \(2^{n+1} \), since it’s a multiple of \(2^{n+1}\). Thus the terms with a \(2^{2n}\) and a \(2^{3n}\) disappear. Thus the above expression simplifies, and we obtain
\[
y^3 + 7 \equiv 2^n + 3x^2 2^n
\equiv 2^n (1 + 3x^2) \text{ mod } 2^{n+1}.
\]

Here we have a multiple of \(2^n\). And as we discussed earlier, there aren’t many possibilities for multiples of \(2^n \) modulo \(2^{n+1}\): even multiples of \(2^n\) are in fact multiplies of \(2^{n+1}\), hence congruent to \(0\) mod \(2^{n+1}\); while odd multiples of \(2^{n}\) must be congruent to \(2^n\) mod \(2^{n+1}\).

Since \(x\) satisfies \(x^3 + 7 \equiv 0 \) mod \(2^n\), we see that \(x\) must be odd. So \(3x^2 \) is odd and \(1+3x^2 \) is even. Thus \(2^n (1+3x^2) \) is an even multiple of \(2^n\), hence congruent to 0 mod \(2^{n+1}\), and we obtain
\[
y^3 + 7 \equiv 0 \text{ mod } 2^{n+1}
\]
as desired.

Thus, we have shown that the solution \(x\) of \(x^3 + 7 \equiv 0 \) mod \(2^n\) lifts directly to a solution \(y \) of \(y^3 + 7 \equiv 0 \) mod \(2^{n+1}\). It may be that \(y\) is one lift \(x\) or the other lift \(x+2^n\), but one of those two lifts works.

This is much more straightforward behaviour than in the original problem! Life is easy now!

Why is life sometimes easy and sometimes hard?

So, what’s the difference between these two cases?

With the polynomial \(P(x) = x^2 + 7 \), a solution of \( P(x) \equiv 0 \) mod \(2^n\) did not lift directly to a solution mod \(2^{n+1}\), although we were able to find solutions by using a trick.

With the polynomial \(Q(x) = x^3 + 7 \), a solution of \( Q(x) \equiv 0 \) mod \(2^n\) did lift directly to a solution mod \(2^{n+1}\).

What is so different about \(P(x)\) and \(Q(x)\)?

At one level, the difference derived from a technicality in trying to lift solutions. From a solution \(x\) of \(P(x) \equiv 0 \) mod \(2^n\), we considered the lifts \(x\) and \(x+2^n\) mod \(2^{n+1}\). In expanding out \( P(x+2^n) \), too many terms cancelled, leaving us with just the terms of \( P(x) \) :
\[
P(x+2^n) = (x+2^n)^2 + 7
\equiv x^2 + 7 \equiv P(x) \text{ mod } 2^{n+1}.
\]
But doing the same for \( Q(x) \), there wasn’t quite as much cancellation:
\[
Q(x+2^n) = (x+2^n)^3 + 7
\equiv x^3 + 3x^2 2^n + 7
\equiv Q(x) + 3x^2 2^n \text{ mod } 2^{n+1},
\]
and in fact we were able to show that the extra term \( 3x^2 2^n \) was congruent to \( 2^n \) mod \(2^{n+1} \), so that
\[
Q(x+2^n) \equiv Q(x) + 2^n \text{ mod } 2^{n+1}.
\]
This last equation effectively says that \( Q(x+2^n) \) and \( Q(x) \) are the two lifts of \( Q(x) \) from mod \( 2^n \) to mod \( 2^{n+1} \); so one or the other lift of \(x\), i.e. either \(x\) or \(x+2^n\) will be the solution we’re looking for.

What’s happening, in some sense, is that the solution \(x\) to \( P(x) \equiv 0 \) mod \(2^n\) was “too good”: it was so good that \( P(x+2^n) \) cancelled too much!

In fact, there is a sense in which the solution \(x\) of \( P(x) \equiv 0 \) mod \(2^n\) was a “repeated root”, just like \(x=2\) is a repeated root of the polynomial \( (x-2)^2 \) over the real numbers.

Repeated roots in modular arithmetic

Now we have to be very careful with saying things like “repeated roots” for polynomials in modular arithmetic. Take arithmetic modulo 8 for instance. Then for one thing, there is no unique factorisation of polynomials, even if we regard integers purely as mod 8: for instance,
\[
(x+4)^2 = x^2 + 8x + 16 \equiv x^2 \text{ mod } 8,
\]
so that we have two distinct factorisations of the same polynomial
\[
(x+4)^2 \equiv x^2 \text{ mod } 8.
\]
Thus it doesn’t really make sense to talk about the “roots” of a polynomial mod 8, in the same way that we talk about the roots of a polynomial over the real or complex numbers.

(However, as it turns out, modulo a prime, there is unique factorisation of polynomials, and you can talk about the roots of a polynomial in a very similar way as over the real or complex numbers.)

Nonetheless, you can definitely talk about the solutions of a polynomial equation modulo 8; and indeed we’ve been doing that for a while now.

Perhaps the best way to understand what “repeated roots” might mean modulo 8 is to forget about factorising polynomials and instead differentiate them.

Differentiate? As in calculus? As in the subject all about real numbers and limits and tangent lines and so on, having very little to do with modular arithmetic?

Yes. That sort of differentiation. Let’s see why.

Simple roots and derivatives

Let’s forget about modular arithmetic for the minute, and think about polynomials in general, with (say) real numbers for coefficients. So, let \( F(x) \) be a polynomial.

Recall that a root of \(F(x)\) is a number \(r\) such that \(F(r) = 0\). This is equivalent to \( x-r \) being a factor of \(F(x)\).

We say a root is a repeated root or a non-simple root if in fact \( (x-r)^2 \) is a factor of \(F(x)\).

Otherwise, we say a root is a simple root. In this case, \( x-r \) is only a factor of \(F(x) \) once. In other words, \( F(x) = (x-r)H(x) \) where \( x-r \) is not a factor of \(H(x)\), or equivalently, where \( H(r) \neq 0 \).

One interesting fact about polynomials is that you can detect whether root is simple via differentiation.

FACT. Let \(r\) be a root of \(F(x)\). Then \(r\) is a non-simple (i.e. repeated) root if and only if \( F'(r) = 0 \).

In other words, a root of \( F(x) \) is a repeated root if and only if it’s also a root of the derivative \( F'(x) \).

The proof comes from applying the product rule.

PROOF. If \(r\) is a repeated root then \( F(x) = (x-r)^2 G(x) \) for some polynomial \( G(x) \). By the product rule then
\[
F'(x) = 2(x-r)G(x) + (x-r)^2 G'(x)
\]
and since both terms have a factor of \(x-r\), when we substitute \(x=r\) we obtain \( F'(r) = 0 \). If \(r\) is not repeated, then \( F(x) = (x-r) H(x) \) where \( H(r) \neq 0 \). In this case the product rule gives
\[
F'(x) = H(x) + (x-r)H'(x),
\]
and substituting \(x=r\) gives \( F'(r) = H(r) \neq 0 \).

Well, that’s a nice little fact. And we can try to apply it over the integers… or in modular arithmetic.

Given a polynomial \(F(x)\) with integer coefficients, its derivative \(F'(x) \) is still a polynomial with integer coefficients. And even if you consider the integer coefficients modulo 8 (or any other modulus), the derivative is still well defined modulo 8 (or any other modulus).

We can use this idea to define a notion of simple or repeated roots, even when working with modular arithmetic. When we have a root \(r\) of a polynomial \(F(x)\), we look at \(F'(r)\). As we’ll see, a similar idea applies, but there is a slight twist.

Let’s return to our examples discussed above.

For \( P(x) = x^2 + 7 \), modulo 8, we have the root \(1\), since \(P(1) \equiv 0 \) mod 8. Its derivative is \( P'(x) = 2x \), and \(P'(1) \equiv 2 \) mod 8.

For \( Q(x) = x^3 + 7 \), modulo 8, we have the root \(1\), since \( Q(1) \equiv 0 \) mod 8. Its derivative is \( Q'(x) = 3x^2 \), and \( Q'(1) \equiv 3 \) mod 8.

Now as it turns out, \( 1 \) is a “repeated root” of \( P(x) \), but it is a simple root of \( Q(x) \).

As we can see, it’s not true that \( P'(1) = 0 \), as we might have expected for a repeated root. However, it is even: that is, \( P'(1) \equiv 0 \) mod 2. The key difference is that 2 is odd, but 3 is not!

As it turns out, the way to define simple and repeated roots of a polynomial \(F(x)\), when we are working modulo \(2^n\), is to look at the derivative \(F'(x)\), not mod \(2^n\), but simply mod \(2\). And more generally, the same ideas apply when we work modulo powers of any prime.

In this way, we can make sense of simple and non-simple roots, even when the idea of unique factorisation of polynomials doesn’t otherwise work. (We’ll call them “non-simple” rather than “repeated” since the whole idea of unique factorisation breaks down.)

DEFINITION. Let \( F(x) \) be a polynomial whose coefficients are integers mod \(p^k\), where \(p\) is a prime and \(k\) is a positive integer.

We say an integer \(r\) mod \(p^k\) is a root of \(F(x)\) if \(F(r) \equiv 0 \) mod \(p^k\).
If in addition \( F'(r) \equiv 0 \) mod \(p\), we say \(r\) is a non-simple root.
Otherwise, i.e. if \( F'(r) \neq 0 \) mod \(p\), we say \(r\) is a simple root.

With this definition, \(1\) is a non-simple root of our original polynomial \(P(x) = x^2 + 7 \) mod \(8\), since \(P'(x) = 2x\) and \(P'(1) \equiv 0 \) mod \(2\).

On the other hand, \(1\) is a simple root of the polynomial \(Q(x) = x^3 + 7 \) mod \(8\), since \(Q'(x) = 3x^2 \) and \(Q'(1) = 3 \neq 1 \) mod \(2\).

Hensel’s Lemma

We’ve seen that in certain circumstances, solution of polynomial equations modulo \(2^n\) lift to solutions modulo \(2^{n+1}\). Or put another way, we’ve seen that roots of polynomials over the integers modulo \(2^n\) lift to roots of polynomials over the integers modulo \(2^{n+1}\).

We found that for \(P(x) = x^2 + 7\), roots of \(P(x) \equiv 0 \) mod \(2^n\) did not lift to roots mod \(2^{n+1}\). But we also found non-simple roots in this case.

On the other hand, we found that for \(Q(x) = x^3 + 7 \), roots of \(Q(x) \equiv 0 \) mod \(2^n\) did lift to roots mod \(2^{n+1}\). And in this case the roots were simple.

As it turns out, this is true in complete generality. When we have simple roots mod \(2^n\), they always lift to roots mod \(2^{n+1}\). And this is true not just mod \(2^n\), but mod \(p^n\) for any prime \(p\).

This result is called Hensel’s lemma. Let’s state it.

HENSEL’S LEMMA. Let \(F(x)\) be a polynomial with integer coefficients. Let \(p\) be a prime and \(k\) a positive integer. Suppose \(r\) mod \(p^k\) is a simple root of \(F(x)\) modulo \(p^k\), i.e.
\[
F(r) \equiv 0 \text{ mod } p^k
\text{ and }
F'(r) \neq 0 \text{ mod } p.
\]
Then there is a lift \(s\) of \(r\) mod \(p^{k+1}\) which is a root of \(F(x)\) modulo \(p^{k+1}\), i.e.
\[
s \equiv r \text{ mod } p^k
\text{ and }
F(s) \equiv 0 \text{ mod } p^{k+1}.
\]

Hensel’s lemma, as we’ve seen, doesn’t apply directly to our original problem. But it does apply to the easier problem with the polynomial \( Q(x) = x^3 + 7 \). Let’s see how quickly the solution goes now when we revisit the problem.

PROBLEM. Show that for any positive integer \(n\), there is an integer \(x\) such that \(Q(x) = x^3 + 7\) is divisible by \(2^n\).

SOLUTION. Proof by induction on \(n\). When \(n=1\) we observe that \(x=1\) is a solution, since \(Q(1) = 8 \equiv 0 \) mod \(2\). Now suppose we have a root \(r\) of \(Q(x)\) modulo \(2^n\). Then as \(r^3 + 7 \equiv 0 \) mod \(2^n\), \(r\) must be odd. Thus \( Q'(r) = 3r^2 \) must be odd, so \( Q'(r) \neq 0 \) mod \(2\), and \(r\) is a simple root. By Hensel’s lemma then \(r\) lifts to a root \(s\) of \(Q(x)\) mod \(2^{n+1}\). So any root of \(Q(x) \) mod \(2^n\) lifts to a root mod \(2^{n+1}\), and by induction then \(Q(x) \) has a root modulo \(2^n\) for all positive integers \(n\), as desired.

And there you have it.

From Here to Hensel

Tagged on: maths number theory