Talk at Knots in Washington 49.75

On 22 April 2022 I gave a (virtual) talk at the 49.75’th (!) Knots in Washington conference, an international conference on knot theory held regularly since 1995.

Title: A symplectic basis for 3-manifold triangulations

Abstract: Neumann and Zagier in the 1980s introduced a symplectic vector space associated to an ideal triangulation of a cusped 3-manifold, such as a knot complement. We will discuss an interpretation for this symplectic structure in terms of the topology of the 3-manifold. This joint work with Jessica Purcell involves train tracks, Heegaard splittings, and is related to Ptolemy varieties, geometric quantisation, and the A-polynomial.

Slides from the talk are below.

washington_talk

A Symplectic Basis for 3-manifold Triangulations, AustMS 2021

On 8 December 2021 I gave a (virtual) talk in the Topology session of the 2021 meeting of the Australian Mathematical Society.

Title: A symplectic basis for 3-manifold triangulations

Abstract: Walter Neumann and Don Zagier in the 1980s introduced a symplectic vector space associated to an ideal triangulation of a cusped 3-manifold. We will discuss an interpretation for this symplectic structure in terms of the topology of the 3-manifold. This work, joint with Jessica Purcell, involves train tracks, Heegaard splittings, and is related to Ptolemy varieties, geometric quantisation, and the
A-polynomial.

Slides from the talk are below.

austms_talk_v2

Five-minute surrealist antiwar exposition of topological data analysis

On Remembrance Day 2021 (11 November) I gave a Lightning Talk at a session on “Mathematics for Data Analysis, AI & Machine Learning” organised by the Monash Data Futures Institute.

This was a “Lightning Talk”, 5 minutes only, in which I attempted to explain what topological data analysis is and how it works.

It had to be impressionistic, but it turns out surrealism is better for this kind of thing. For what is topological data analysis — or more explicitly, one of its main tools, persistent homology — if not the Persistence of Memory of Topological Contortion?

Remembrance Day being a day for ending war, and topological data analysis having been funded for military applications, there seemed no better time to mention the campaign against lethal autonomous weapons systems.

Five minutes.

The slides from my talk are available below (2mb pptx).

General tips for studying mathematics

I don’t know that I would have anything to say that’s not a platitude, but here are some thoughts.

It’s a hard topic. Recognise that it’s hard. Don’t expect to learn everything in one go. Don’t expect to solve every problem on the first attempt. Take the time to let ideas sink in.

Don’t feel intimidated by others who do things faster, nor superior to those who do things slower. Don’t let anyone put anyone else down for finding the subject difficult, comprehending things more slowly, etc. Try to include people who feel excluded. It’s difficult for everybody, and everybody has different difficulties. We don’t need to make them worse.

Ask questions of your teachers. Chances are if you’re thinking something in a lecture was unclear, many other people are too, but not all of them are brave enough to raise their hand. And ask outside of class too if you want. For most lecturers, most of the questions they have to answer are about homework extensions, repeating for the hundredth time some administrative detail, etc. They love answering questions about their actual subject. Ask them.

If you can, find other students to work with. People vary, but for many people, working together productively with others on problems makes the whole experience much more pleasant, and it’s better for solving problems and bouncing around ideas too. Learn from others, and they can learn from you.

Be curious. Ask how and why things work. Think about the big picture and how everything fits together. Ask why we study the things we study.

Enjoy the subject. Mathematics is full of patterns, curiosities, profound truths, breathtaking theorems, hard-fought triumphs, ancient mysteries. Its roots in culture stretch back as far as humans have walked the earth. Play with the patterns, play with as much of it as you can get your hands on; it’s much better if you can understand it that way. Understand it your way, and tell stories about it.

Wonder at the profound beauty and elegance of deep theories. Take the time to properly understand and appreciate what theorems say. They surprise, they flabbergast, they connect oceans of thought. They reduce the incomprehensible to the world of the living. Their proofs are often even more instructive than their statements. These things are among the greatest achievements of humanity. They are the eternal harmonies of this tragic universe.

Also, savour the enjoyment when you solve a difficult problem. It’s a hard subject and we need to take all the wins we can get.

Summarise your maths research in one slide, Dan

I’ve never much been one for concision. To say that the contemporary attention economy, not to mention professional communication, not to mention mainstream politics, trades on absurdly short, necessarily incomplete, if not wrong, attention-grabbing quips and barbs and slogans and pitches, effectively giving most social discourse a lobotomy, is a cliche. That doesn’t make it less true or less important. Equally, to insinuate that you have much more to say, and have such nuanced and complex ideas, that you’re above that sort of thing, that you’re so sophisticated as to be impossible to summarise or condense to that form, is not only irritating and self-important, it’s probably false most of the time too.

Anyway, as part of an upcoming workshop, participants were asked to introduce themselves with a one-page slide. I took it as an extreme form of concision: summarise your maths research in one slide, Dan. In practice it turned into a word cloud with a bunch of pictures taken from my papers.

Anyway, I thought the pictures were, if not pretty to look at, a bit fun. Well, of course I would say that, I do this stuff for a living. (Or, at least, I’m supposed to. What it seems I actually do for a living is drown in email and admin and committees and so on.)

So why not post the one slide here too?

slide summarising my maths research

One line Euler line

Well, I probably saw it as a student years ago, and apparently it’s standard, but I came across this today through Olympiad wanderings and thought it was wonderful enough to share.

It’s a fun fact from Euclidean geometry — the sort of gem which in a decent world would be taught in an advanced secondary school maths class. But of course it is not, at least nowhere near me — and indeed even the basic ideas on which it relies (including almost all geometry and indeed the very notion of proof) have more or less been ruthlessly exterminated from the high school curriculum. As is happening now, with the content of the Australian school mathematics curriculum being shifted further back and, so it seems, further watered down, in the name of teaching abstract problem solving skills… skills so abstract that there is less and less actual concrete knowledge to apply them to.

And while not taught in schools, this sort of thing is also too “elementary” to appear in most university courses — unless the university has a specialised “elementary” mathematics course, perhaps aimed at future mathematics teachers, that might cover it. Even then the subject matter can’t be taught by those future teachers, as the content lies in a different universe from that where the curriculum lives, a superlunary realm inaccessible from such a small base of material.

It is just triangle geometry, in some sense no more than a clever elaboration of similar triangles and similar ideas. So it falls through the cracks, only to be learned by those precocious enough to be taught it in an Olympiad programme, or already interested enough to read Euclidean geometry for their own interest.

Anyway, the fact here is a fact about triangles. I’ll try not to get too snarky about the curriculum.

Triangles in the plane. Just the nice, flat, Euclidean plane.

It’s a fact about centres of triangles. Triangles have centres.

But not just one, oh no. They have many centres. There is a whole world of triangle centres out there, none of which you are likely ever to see in any curriculum any time soon, being too far from mundane everyday applications. (Imagine if just one of these was included in a curriculum! Ah, we can dream. Imagine the problems you could solve, the proofs and abstractions and explorations you could do, if you even knew what these things were. Which you don’t.)

So yes, as it turns out there are many ways in which you might define the “centre” of a triangle. Here are three of the most standard ones. Let the triangle’s vertices be \( A,B,C \), so its sides are \( AB, BC \) and \( CA \).

  • The centroid. It’s often called G. This is what you get when you join each vertex to the midpoint of the opposite side. Those three lines, called medians, meet at this point G, called the centroid. It’s also the centre of mass of the triangle. (Oh what, a practical application? Who would have thought.)

  • The circumcentre. It’s often called O. As it turns out, there is a unique circle which passes through the three vertices A,B,C. A basic fact, again, which is not in any curriculum anywhere near me. This circle is called the circumcircle of ABC. (Well, it circumnavigates the vertices, doesn’t it?) Its centre O is called the circumcentre.

Circumcircle and circumcentre of a triangle

  • The orthocentre. It’s often called H. This is what you get when you drop a perpendicular from each vertex to the opposite side. Those three lines, called altitudes (because they measure the height of the vertex above each side), also meet at a point H. That point H is called the orthocentre.

(All diagrams public domain from wikipedia.)

Now, it’s not even clear that any of these points exist. (No, the pictures do not constitute proofs.) If you take three lines in the plane, they may or may not meet at a point. Usually they won’t. But in special cases they will. As it turns out, the three medians of a triangle are sufficiently special that they do meet at a point. As are the three altitudes.

Most likely any reader here will either know about all three of these centres, or about none of them. If you know all three, congratulations my friend. If not, let me simply say this about them:

  • The fact a centre of mass of a triangle exists shouldn’t be too surprising. And a median splits a triangle into two halves of equal area (same height, half of the base each), so it shouldn’t be too surprising that the centre of mass should lie on a median, hence on all the medians, so that it seems they at least ought to meet at a point. In fact, if you know vectors, then if we take an origin \( O \) and position vectors \( \vec{OA}, \vec{OB}, \vec{OC} \) for A,B,C, then the position vector of the centroid G is their average, \[ \vec{OG} = \frac{ \vec{OA} + \vec{OB} + \vec{OC} }{3} \].
  • The fact that you can put a circle through three points shouldn’t be too surprising. Given two points, you can draw many, many circles through them, in fact so many that they can pass through any other point as well. Or, for a more misanthropic take on the matter: have you ever tried to keep equally far away from 2 people? Well, I suppose it depends on how obnoxious or irritating the people near you (or wherever you happen to go) are, but in any case it’s not that hard: you just have to stay on the line bisecting those two people (it’s called a perpendicular bisector). Now take it a step further and suppose there are 3 people. It can be done. Stay equally far from one pair until you cross the perpendicular bisector of another pair; at that intersection, in fact, all three perpendicular bisectors have to meet, and that point is the same distance from all three people. You are now at the circumcentre, and you can put a circle through all those who repel you. (Well, you’d likely feel trapped between all 3, and you’d probably end up running away, but then you won’t be equally far away from the repellent ones. Anyway…)
  • The fact that the orthocentre exists is much less obvious, at least to me, without using some other mathematical theorems. It is a bit surprising, if you don’t already know the mathematics. But no matter, because I’ll prove it as part of the gem I’m about to explain.
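For the sceptical, the perpendicular bisector story can also be checked numerically. Here is a minimal Python sketch (my own, with arbitrarily chosen sample vertices, not part of the original discussion): it finds the circumcentre by intersecting two perpendicular bisectors, then checks that it really is equidistant from all three vertices.

```python
# Sketch only: a sample triangle with vertices chosen arbitrarily.
A, B, C = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)

def circumcentre(P, Q, R):
    """Intersect the perpendicular bisectors of PQ and PR.

    |X-P|^2 = |X-Q|^2 expands to the linear equation
    2(q-p).X = |q|^2 - |p|^2, and similarly for R.
    """
    (px, py), (qx, qy), (rx, ry) = P, Q, R
    a1, b1, c1 = 2*(qx - px), 2*(qy - py), qx**2 + qy**2 - px**2 - py**2
    a2, b2, c2 = 2*(rx - px), 2*(ry - py), rx**2 + ry**2 - px**2 - py**2
    det = a1*b2 - a2*b1  # nonzero as long as P, Q, R are not collinear
    return ((c1*b2 - c2*b1) / det, (a1*c2 - a2*c1) / det)

def dist(P, Q):
    return ((P[0] - Q[0])**2 + (P[1] - Q[1])**2) ** 0.5

O = circumcentre(A, B, C)
# All three distances agree: O is equidistant from A, B, C, so the
# circle centred at O through A also passes through B and C.
print(O, dist(O, A), dist(O, B), dist(O, C))
```

For this sample triangle the circumcentre comes out at \( (2, 1) \), at distance \( \sqrt{5} \) from each vertex.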

It’s not just the orthocentre. Mathematics is full of surprising, counterintuitive, even apparently ridiculous ideas and statements that turn out to be true. And the process of turning them from apparently ridiculous to understood — then even intuitive — ideas, is the idea of proof. Which you will never learn in a mathematics curriculum any time soon, except in the very uppermost advanced stream in the last year or two, if you’re lucky. Without the language of proof, there will be no words in which to express this process.

So, the centroid \( G \), the circumcentre \(O \), the orthocentre \( H \). Three very different but equally reasonable claimants to the title of *centre* of a triangle. They are not the same. But as it turns out, amazingly, they always lie on a line. This was proved by Leonhard Euler in 1767, and the line is called the Euler line of the triangle. Drawing everything on one diagram results in a suitably impressive picture of this line, and hopefully gives some impression of how surprising it is. (Source: wikipedia.)

Euler line of triangle

There are many proofs of this fact. Many of them are very nice, using some combination of similar triangles, or transformation geometry. And as with all proofs of facts that are initially surprising, it converts a possibly surprising, possibly ridiculous statement into an incontrovertible, natural, and true one.

The proof I’m going to share uses vectors. So to understand it, you need to know something about vectors: including position vectors of points; addition, subtraction and scalar multiplication of vectors; and the dot product, and the fact that the dot product of two nonzero vectors is zero if and only if they’re perpendicular. If you don’t, oh well, go and learn it. You likely won’t be able to derive it yourself, even with your mad 21st century skillz.

Remember, \(G\) is the centroid, \(O\) is the circumcentre and \(H\) is the orthocentre. But I’m not going to assume we even know the orthocentre exists yet.

Let’s set up some position vectors. We’ll take the circumcentre \(O\) as the origin. (After all, we often use \(O\) for the origin and \(O\) for the circumcentre. Why not both? Quite a convenient elision.) As \( O \) is the centre of a circle passing through \( A,B,C \) we have \( | \vec{OA} | = | \vec{OB} | = | \vec{OC} | \). And, as I mentioned above, as \(G\) is the centre of mass of the triangle, \( \vec{OG} = ( \vec{OA}+\vec{OB}+\vec{OC})/3 \).

We’re now ready to state a theorem.

Theorem: Let \(H\) be the point with position vector \( \vec{OH} = \vec{OA}+\vec{OB}+\vec{OC} \). Then the orthocentre of \( ABC \) exists and is equal to \(H\), and \( \vec{OH} = 3 \vec{OG} \).

This theorem tells us a lot. Firstly, that the orthocentre exists, i.e. that the three altitudes of ABC intersect. Secondly, that they intersect at this point \(H\) defined by \( \vec{OH} = \vec{OA}+\vec{OB}+\vec{OC} \). Thirdly, from \( \vec{OH} = 3 \vec{OG} \), recalling how scalar multiplication works, it tells us that \( O,H,G \) all lie on a line, with \( O, G, H \) in order along it, and with \( H \) 3 times as far away from \( O \) as \( G \).

In fact, the conclusion that \( \vec{OH} = 3 \vec{OG} \) isn’t really much of a conclusion. It is immediate from the way we defined \( H \). We defined \( H \) by \( \vec{OH} = \vec{OA}+\vec{OB}+\vec{OC} \). But since the centroid G satisfies \( \vec{OG}=( \vec{OA}+\vec{OB}+\vec{OC})/3 \), we observe that \( \vec{OH} \) is just 3 times \( \vec{OG} \). Of course 1 is 3 times 1/3…

So the thing to prove is that the orthocentre exists and is equal to \(H\), defined in this way. The proof is very short. And that’s what I mean when I say “one line Euler line”.

Proof: We show \(H\) lies on the altitude from \(A\) to \(BC\), i.e. that \(AH\) is perpendicular to \(BC\): \[\vec{AH}.\vec{BC} = (\vec{OH}-\vec{OA}).(\vec{OC}-\vec{OB}) = (\vec{OB}+\vec{OC}).(\vec{OC}-\vec{OB}) = |\vec{OC}|^2 - |\vec{OB}|^2 = 0. \] A similar argument shows that \(H\) lies on the other altitudes of \(ABC\), hence the altitudes intersect at \(H\).
QED

By “one line” I mean the one line calculation above. (If indeed it fit on one line on your screen…) Let’s explain each step. First, \( \vec{AH} = \vec{OH}-\vec{OA} \); that’s how vector subtraction works. Second, we substituted \( \vec{OH} = \vec{OA}+\vec{OB}+\vec{OC} \), so that \( \vec{OH} - \vec{OA} = \vec{OB} + \vec{OC} \). Third, we expanded a dot product “difference of perfect squares”, and used the fact that the dot product of a vector with itself is the length squared of that vector. Finally, we used the fact that \(O\) is the circumcentre, so that \( |\vec{OB}|=|\vec{OC}| \).

If you’d like to make sure you understand this, try to do the “similar” arguments that \(H\) lies on the other altitudes of \(ABC\). In other words, as an exercise, try to show that \( \vec{BH}.\vec{AC} = 0 \) and \( \vec{CH}.\vec{AB} = 0 \).
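If you like, the whole theorem can also be sanity-checked numerically. Here is a short Python sketch (my own, not part of the proof): taking the vertices on the unit circle makes the origin the circumcentre automatically, and then the point with position vector \( \vec{OA}+\vec{OB}+\vec{OC} \) should lie on all three altitudes.

```python
import math
import random

# Sketch only: put A, B, C on the unit circle, so the origin O is
# automatically the circumcentre (|OA| = |OB| = |OC| = 1).
ts = [random.uniform(0, 2 * math.pi) for _ in range(3)]
A, B, C = [(math.cos(t), math.sin(t)) for t in ts]

add = lambda P, Q: (P[0] + Q[0], P[1] + Q[1])
sub = lambda P, Q: (P[0] - Q[0], P[1] - Q[1])
dot = lambda P, Q: P[0] * Q[0] + P[1] * Q[1]

H = add(add(A, B), C)        # OH = OA + OB + OC
G = (H[0] / 3, H[1] / 3)     # centroid: OG = OH / 3, so O, G, H are collinear

# The three altitude conditions AH.BC = 0, BH.AC = 0, CH.AB = 0.
# All three print as (essentially) zero, up to floating-point error,
# so H is indeed the orthocentre.
print(dot(sub(H, A), sub(C, B)))
print(dot(sub(H, B), sub(C, A)))
print(dot(sub(H, C), sub(B, A)))
```

The collinearity of \( O, G, H \) is built into the second line: \( G \) is by construction a third of the way along \( \vec{OH} \).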

Well, as I say, this is all in some sense “standard knowledge”, just stuff which is inaccessible to pretty much every normal person, since nothing remotely in the same universe is ever taught at school (and nor will it be any time soon anywhere near me). As I say, I just came across it and (re-)discovered it, and thought it was worth sharing.

The fact that \( \vec{OH} =\vec{OA}+\vec{OB}+\vec{OC} \) is apparently known as Sylvester’s triangle problem, and what I’ve shown here is very similar to it; in essence it’s the same thing. This vector proof isn’t new; it appears in many places online and off. And you can find many other more standard proofs, using similar triangles and such things, online and off. As for proving the existence of the orthocentre, Cut the Knot has a nice page of different proofs, including one along the lines above.

And there you have it. A one line Euler line. More or less. So to speak.

I would dearly love it if students could solve problems related to these ideas — or even much simpler versions thereof — without knowing any geometry or the idea of proof beyond the pale shadow of them that exists in our curriculum, simply by the application of “21st century problem solving skills”. Futuristic indeed those skills must be, to derive everything from a few flawed lessons on proof and some drilling of ways to prove triangles congruent. But as it turns out, to do this sort of problem solving, the average person needs to learn some geometry first. You cannot expect the average student to derive hundreds of years of mathematical progress and geometric ingenuity by themselves. With the words to say and the musical notes to play, you can sing your song and create your mathematical world. Once you have the ideas sorted out, you open up a whole universe of creativity, problem solving, a route to the stars, and you have a basis for learning 21st century problem solving skills of creativity, abstraction, precision, rationality, logic, powered by passion, insight, and intuition. But without them, what can you do but flail about? “Every year fewer and fewer words, and the range of consciousness always a little smaller.”

Wouldn’t it be nice if ideas like these, or an attenuated version of them, or at least something approximating them — or just something, something? — could at least make an appearance in our schools?

Ptolemy vs Thurston in Hyperbolic Geometry and Topology, AustMS 2020

On 9 December 2020 I gave a (virtual) talk in the Topology session of the 2020 meeting of the Australian Mathematical Society.

Title: Ptolemy vs Thurston in Hyperbolic Geometry and Topology.

Abstract:

Bill Thurston (1946-2012 CE) developed a system of great simplicity and power for understanding hyperbolic 3-manifolds. In particular, he introduced equations whose variables encode the shapes of ideal hyperbolic tetrahedra and whose solutions describe hyperbolic structures on 3-manifolds.

Claudius Ptolemy (c.100-170 CE), better known for developing a rather different system, proved in his Almagest an equation about the lengths of sides and diagonals in a cyclic quadrilateral. Such Ptolemy equations arise in numerous places across mathematics, including in 3-dimensional hyperbolic geometry and representation theory.

In this talk I’ll discuss how Ptolemy and Thurston equations provide complementary perspectives on hyperbolic geometry and topology.

20-12_AustMS_talk

From Here to Hensel

Here’s a nice maths problem, which I thought it would be fun to discuss. The question doesn’t involve any advanced concepts, but it leads on to a very nice result called Hensel’s lemma, and can be taken even further.

The problem comes from somewhere in Mathematical Olympiad folklore… which is another way of saying “I don’t know the source”. If you know a source let me know by email or in the comments. It’s discussed in Problem Solving Tactics, the book I co-authored with my colleagues Angelo Di Pasquale and Norman Do.

PROBLEM. Let \(n\) be a positive integer. Prove that there are infinitely many perfect squares of the form \(a \cdot 2^n - 7 \), where \(a\) is a positive integer.

As with most Olympiad type problems, and especially as with problems in our book, it’s a problem which can be solved using “elementary” methods — in this case just modular arithmetic — but applied in a clever way. For the purposes of discussion here, I’m going to assume you know what modular arithmetic is. If you don’t know modular arithmetic, it’s a fun topic, and I’d encourage you to read about it.

As we’ll see, although this problem is a bit of a curly one, the ideas are quite general. They lead to a very nice result and a very nice technique.

Spoiler alert: I’m about to discuss an approach to this problem and give a solution. Now generally with mathematical problems it’s not good form to just read straight to a solution. So if you can, have a go at the problem yourself first. Otherwise you’ll just be burning the problem; you won’t get as much out of the discussion. Go on, get a pen and paper and give it a go. You’ve been here looking at a screen for too long anyway.

OK, on to discussing the problem.

It’s all modular arithmetic!

The first thing to note is that the problem is really about modular arithmetic, and we can rewrite it as such. To say that a number is of the form \( a \cdot 2^n - 7 \) with \(a\) an integer is to say that that number is congruent to \( -7 \) mod \( 2^n \). The problem is asking us to show that, for any positive integer \(n\), there are infinitely many perfect squares congruent to \(-7\) mod \(2^n\). In other words, it’s asking us to show that there exist infinitely many integers \(x\) such that
\[
x^2 + 7 \equiv 0 \text{ mod } 2^n.
\]
In fact, given \(n\), you only need to find one such \(x\). For once you have one such \(x\), any other number \(y\) congruent to \(x\) modulo \( 2^n \) will be a solution as well: since \( x \equiv y \) mod \(2^n\), then \( y^2 \equiv x^2 \equiv -7 \) mod \( 2^n \). So, we can simplify the problem somewhat.

To celebrate this first bit of progress, let’s fully write out our simplified form of the problem.

SIMPLIFIED VERSION OF PROBLEM. Let \(n\) be a positive integer. Show that there exists an integer \(x\) such that
\[
x^2 \equiv -7 \text{ mod } 2^n.
\]

Finding solutions, two by two by two…

To build up some understanding of how to find the required \(x\), let’s look at some small cases — as we often do, and which is a good way to approach problem-solving in general.

So, let’s start with \(n=1\). Here, we are looking for an \(x\) such that \( x^2 \equiv -7 \equiv 1 \) mod \(2\). In other words, we’re looking for \(x\) such that \(x^2\) is odd.

Try to guess a solution. Well, if the first \(x\) you guess doesn’t work, the second one probably will. Because in fact any odd \(x\) provides a solution. We may as well just take \( x = 1 \).

Great! What about \(n=2\)? Now we’re looking for \(x\) such that \(x^2 + 7 \) is a multiple of \(4\). This simplifies to \( x^2 \equiv -7 \equiv 1 \) mod \(4\).

Again, guessing a solution is not too hard. Since we’re working mod 4, there are only 4 possible guesses for \(x\) anyway. Guessing \(x = 1\) works, as does \(x=3\). This means that, again, any odd number works.

Well, this is all looking pretty good so far!

Flushed with success, we turn to \(n=3\). Now we want to solve \( x^2 \equiv -7 \) mod \(8\), or equivalently, \( x^2 \equiv 1 \) mod \(8\).

Again, not too difficult; and in any case there are only 8 possible guesses for \(x\) mod 8. And you can see that if \(x\) is even then \(x^2\) will also be even, and hence can’t possibly be \( \equiv 1 \) mod \(8\). So really there are only 4 reasonable guesses for \(x\).

And it turns out all guesses work. We can take \(x \equiv 1, 3, 5, \) or \(7\) mod \(8\). Again, any odd number works.

This all seems a bit too easy!

But at \(n=4\) we can’t just take any odd \(x\). We want to solve \(x^2 \equiv -7 \) mod \(16\), which is equivalent to \( x^2 \equiv 9 \) mod \(16\). We can take \(x=3\); in fact any \(x \equiv 3, 5, 11 \) or \(13 \) mod \(16\) works. So out of the 16 guesses mod 16, or 8 odd guesses, 4 of them work.

Now we go to \(n=5\) and try to solve \( x^2 \equiv -7 \) mod \(32\), or equivalently \(x^2 \equiv 25 \) mod \(32\). Noticing that 25 is actually a perfect square, we can just take \( x = 5 \). But again there are several more solutions, and you can check that \(x \equiv 5, 11, 21\) and \(27\) mod \(32\) are the solutions.
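The small cases above are easy to confirm by brute force. Here is a short Python sketch (my own, not part of the original problem) that simply tries every residue:

```python
# Sketch only: brute-force search for all x mod 2^n with
# x^2 ≡ -7 (mod 2^n), i.e. with x^2 + 7 divisible by 2^n.
def solutions(n):
    m = 2 ** n
    return [x for x in range(m) if (x * x + 7) % m == 0]

for n in range(1, 6):
    print(n, solutions(n))
# 1 [1]
# 2 [1, 3]
# 3 [1, 3, 5, 7]
# 4 [3, 5, 11, 13]
# 5 [5, 11, 21, 27]
```

This reproduces exactly the solutions found by hand for \(n = 1, \ldots, 5\); of course, the exponential growth of \(2^n\) is precisely why we want something cleverer than brute force.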

But now the numbers are getting bigger. In fact, they’re growing exponentially. As we proceed through \(n = 1, 2, 3, \ldots \), we are proceeding from considering numbers mod \(2\), to mod \(4 = 2^2\), to mod \(8 = 2^3\), and so on. The modulus is increasing by a factor of two at each step.

So, rather than get bogged down in arithmetic modulo 128, let’s step back and think a bit about how we might proceed.

Reducing from one modulus to another

You might have noticed, as we were passing through calculations mod 2, then 4, then 8, and so on, that at each stage we were “refining” our calculations.

And, on a related note, when we were considering calculations at each stage, we could apply the calculations at a previous stage.

Let’s be a bit more precise, and a bit more general.

If you know that a number \(x\) satisfies \(x \equiv 3\) mod \(4\), then you can deduce that \(x\) must be odd; we used this fact above. Similarly, if you know \(x\) satisfies \(x \equiv 7 \) mod \(15\), then you can deduce that \(x \equiv 1 \) mod \(3\). Indeed, if \(x\) is 7 more than a multiple of 15, then it’s also 7 more than a multiple of 3; any multiple of 15 is also a multiple of 3. So \(x \equiv 7 \equiv 1 \) mod \(3\).

More generally, if we know \(x\) mod \(m\), then we can deduce \(x\) mod \(k\), for any factor \(k\) of \(m\).

Let’s write \(m = kl\), and let’s assume \(k,l\) are both integers greater than 1. If you know \(x\) mod \(kl\) then you know \(x\) mod \(k\). For if \(x \equiv c \) mod \(kl\), then \(x\) is \(c\) more than a multiple of \(kl\). But any multiple of \(kl\) is also a multiple of \(k\), so \( x \equiv c \) mod \(k\).

The upshot is that there is a well-defined natural way to reduce from an integer modulo \(kl\), to an integer modulo \(k\). In other words, there’s a natural function
\[
R^{kl}_k \colon { \text{Integers modulo } kl } \rightarrow { \text{Integers modulo } k }
\]
which does this reduction. If you’ve studied some of this stuff, a fancy way of writing this is to say that there’s a homomorphism of groups or rings
\[
R^{kl}_k \colon \frac{\mathbb{Z}}{kl \mathbb{Z}} \rightarrow \frac{\mathbb{Z}}{k \mathbb{Z}}.
\]
If you haven’t heard those words before, don’t worry; it isn’t necessary for what follows.

Now this idea of reducing from one modulus to another is very relevant in the present problem.

For instance, when we considered the case \(n=5\), we were looking for \(x\) such that \(x^2 \equiv -7 \) mod \(32\). But reducing to mod 16 (i.e. applying \( R^{32}_{16} \) ), such an \(x\) must also satisfy \(x^2 \equiv -7 \) mod \(16\). And we found the solution for such \(x\) mod \(16\) in the previous case \(n=4\): namely, \(x \equiv 3, 5, 11 \) or \(13\) mod \(16\).

Thus, the only possible solutions for \(x\) in the case \(n=5\), solving the equation \(x^2 \equiv -7\) mod \(32\), must be \(x\) mod 32 which reduce to \(3, 5, 11\) or \(13\) mod \(16\). There are 8 such \(x\), namely
\[
x \equiv 3, 5, 11, 13, 19, 21, 27, 29 \text{ mod } 32.
\]
So rather than checking all 32 possibilities for \(x\) mod 32, or (slightly more efficient) checking the 16 odd possibilities for \(x\) mod 32, by paying attention to the previous case we could get down to a mere 8 cases to check. And as we mentioned, among these 8 possibilities, 4 of them provide solutions:
\[
x \equiv 5, 11, 21, 27 \text{ mod } 32.
\]

Lifting from one modulus to another

What we’re doing here is often described as “lifting”: lifting integers mod 16 to mod 32.

For instance, if you know \( x \equiv 3 \) mod \(16\), then you can’t say what \(x\) is mod \(32\), but you can narrow it down to two possibilities: \( x \equiv 3 \) or \(19 \) mod \(32\). So \(x\) has two lifts from mod 16 to mod 32.

A more abstract way of saying the same thing is that, when you consider the function
\[
R^{32}_{16} \colon { \text{Integers mod 32} } \rightarrow { \text{Integers mod 16} },
\]
the integer 3 mod 16 has two preimages. These preimages are its “lifts” to modulo 32. Indeed, every integer mod 16 has two preimages under \( R^{32}_{16} \). Explicitly, the two preimages of a number \(c\) mod \(16\) under \( R^{32}_{16} \) are \(c\) and \(c+16\) mod \(32\).

As you might suspect, there’s nothing special about 16 and 32 here. Let’s again consider general positive integers \(k,l\) greater than 1. Then the function
\[
R^{kl}_k \colon { \text{Integers modulo } kl } \rightarrow { \text{Integers modulo } k }
\]
is surjective, and it’s “\(l\)-to-1”, meaning that every integer modulo \(k\) has precisely \(l\) preimages. Explicitly, the preimages of an integer \(c\) are just given by \(c\) and adding multiples of \(k\) to it some number of times; so the preimages of \(c\) mod \(k\) are
\[
c, c+k, c+2k, \ldots, c+(l-1)k.
\]
These \(l\) preimages are just given by adding \(k\) until we arrive back where we started: from the last number listed here \(c+(l-1)k\), if we added another \(k\), we would arrive at \(c+lk\), which is congruent to \(c\) mod \(kl\) and back where we started.

I haven’t formally proved the statements in the above paragraph, but hopefully they should make sense and the above discussion gives an indication why they’re true. If you’re not entirely convinced, I’d encourage you to try and prove it yourself!
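For concreteness, here is a small Python sketch of this preimage bookkeeping (the function name is my own invention, not standard notation):

```python
# Sketch only: the l preimages of c mod k under the reduction map
# R^{kl}_k are c, c+k, c+2k, ..., c+(l-1)k, taken mod kl.
def preimages(c, k, l):
    return [(c + i * k) % (k * l) for i in range(l)]

print(preimages(3, 16, 2))   # [3, 19]: the two lifts of 3 mod 16 to mod 32
print(preimages(7, 15, 3))   # [7, 22, 37]: the three lifts of 7 mod 15 to mod 45

# Reducing any lift back mod k recovers c, as it should.
assert all(x % 16 == 3 for x in preimages(3, 16, 2))
```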

Lifting our Solutions

Let’s now return to the problem at hand and use these ideas to find more solutions. We left off at \(n=5\), finding the solutions \(x \equiv 5, 11, 21, 27 \) mod \( 32 \).

Proceeding to the case \(n=6\), we are trying to solve
\[
x^2 \equiv -7 \text{ mod } 64.
\]
Based on our previous discussion, let’s try to find a solution by lifting the solutions from the previous case from mod 32 to mod 64.

Let’s start from one of our solutions mod 32, \(x \equiv 5 \) mod 32.

It has two lifts, \(5\) and \(5+32 = 37 \) mod \(64\). Do either of these work? We can compute \(5^2 \equiv 25 \) and \(37^2 = 1369 \equiv 25 \) mod \(64\). So neither of them works.

Next, let’s try lifting \(x \equiv 11 \) mod \(32\). It has two lifts \(11, 43\) mod \(64\). We find that \( 11^2 \) and \(43^2 \) are both \( \equiv -7 \) mod \(64\). So both are solutions!

For the sake of completeness, we can check that lifting \(x \equiv 21 \) mod \(32\) gives the possibilities \(x \equiv 21, 53\) mod \(64\) — and both of these are also solutions. And lifting \(x \equiv 27 \) mod \(32\) gives the possibilities \(x \equiv 27, 59 \) mod \(64\), neither of which are solutions.

So, it’s not always the case that a solution mod 32 lifts directly to a solution mod 64. But it’s often the case.
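Here is the computation above as a short Python check, squaring every lift of our mod-32 solutions:

```python
# Which lifts of the solutions mod 32 solve x^2 ≡ -7 mod 64?
solutions_mod_32 = [5, 11, 21, 27]
lifts = [x + t for x in solutions_mod_32 for t in (0, 32)]
good = [x for x in lifts if (x * x + 7) % 64 == 0]
print(sorted(good))  # → [11, 21, 43, 53]
```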

If we can guarantee that we can always go from a solution mod \(2^n\) to a solution mod \(2^{n+1}\), then we’ve solved the problem. We would then be able to take our solutions mod \(2^n\) for small \(n\), and use this guarantee to successively obtain solutions mod higher powers of \(2\). By induction on \(n\) then we can guarantee a solution mod \(2^n\) for all \(n\), which is precisely what the question asks for.

An Unorthodox Lifting

So, let’s work generally. Suppose we have a solution \(x \equiv c\) mod \(2^n\), i.e. we have found \(c\) such that \(c^2 \equiv -7 \) mod \(2^n\).

Then when we lift \(c\) to mod \(2^{n+1}\), there are two possibilities: \(x \equiv c\) or \(c+2^n\) mod \(2^{n+1}\). If either of these satisfies \(x^2 \equiv -7\) mod \(2^{n+1}\), we are finished.

Moreover, when we do this, we know we have to be “pretty close” to a solution, in some sense. We can see this by lifting, not just the \(x\), but also the \(-7\). Indeed, if \(x^2 \equiv -7 \) mod \(2^n\), then \(x^2 \) considered mod \(2^{n+1}\) is a lift of \(-7\) from mod \(2^n\) to mod \(2^{n+1}\). The two lifts of \(-7\) from mod \(2^n \) to mod \(2^{n+1}\) are \(-7\) and \(-7+2^n\). So either
\[
x^2 \equiv -7 \text{ or } -7+2^n \text{ mod } 2^{n+1}.
\]

Anyway, let’s take the lifts \(c\) and \(c+2^n\) of our solution from mod \(2^n\) to mod \(2^{n+1}\), and see how they compare. We can calculate
\[
(c + 2^n)^2 = c^2 + 2^{n+1} c + 2^{2n} \equiv c^2 \text{ mod } 2^{n+1}.
\]
Observe what happened: upon expanding out \( (c+2^n)^2 \), all of the terms except \(c^2\) contained high powers of 2, which modulo \(2^{n+1}\) are just zero and disappear.

This is bad news, because it means that \(c\) and \(c+2^n\) give the same result mod \( 2^{n+1} \) when we square them. So if \(x \equiv c\) is a solution, i.e. \(c^2 \equiv -7 \) mod \(2^{n+1} \), then \(x \equiv c + 2^n \) is also a solution. That’s great, but if \(x \equiv c \) is not a solution, then neither is \(x \equiv c+2^n \).

That’s just as we found above. Some solutions mod \(32\) lift to give a solution mod \(64\), and when they lift they lift doubly well, with both lifts being solutions. But other solutions do not lift to solutions at all.

So the problem becomes a bit harder, on account of the fact that
\[
(x + 2^n)^2 \equiv x^2 \text{ mod } 2^{n+1}.
\]
If only that weren’t true… but we’ll return to that point later.
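A brute-force Python check of that unfortunate identity, for small \(n\):

```python
# Verify (x + 2^n)^2 ≡ x^2 mod 2^(n+1) for all x, for n up to 10.
for n in range(1, 11):
    for x in range(2 ** (n + 1)):
        assert ((x + 2 ** n) ** 2 - x ** 2) % 2 ** (n + 1) == 0
print("identity verified for n = 1, ..., 10")
```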

For now, let’s deal with the reality at hand, where that unfortunate equation holds.

Back to the drawing board…

Our approach at this point has now become an attempt at a proof by induction. Given a solution modulo \(2^n\), we want to obtain from it a solution modulo \(2^{n+1}\). We need to do this in a way that always works. But we now know that we can’t just directly lift a solution each time.

Let’s go back to our equation modulo \( 32 \) and try to lift the solutions to modulo \( 64 \) in a way that “always works”. In other words, let’s try to find a method that, no matter which solution modulo \(32\) we start with, we can end up with a solution modulo \(64\).

We had the solution \(x \equiv 5 \) mod \(32\), which did not lift to a solution mod \(64\). But since \(x^2 \equiv -7\) mod \(32\), we could have known, lifting the \(-7\), that
\[
x^2 \equiv -7 \text{ or } 25 \text{ mod } 64.
\]
However, we found that both lifts \( x \equiv 5, 37 \) mod \(64\) failed to be solutions: both squared to \( 25 \) mod \(64\). And as we know now, if one of them fails, the other is doomed to fail too.

The reason for this failure, we found, was because
\[
37^2 \equiv (5 + 32)^2 \equiv 5^2 + 2 \cdot 5 \cdot 32 + 32^2 \equiv 5^2 \text{ mod } 64.
\]
As you can see, when we add the 32 in the term to be squared, we have added such a high power of 2 that all the terms other than \(5^2\) disappear modulo \(64\). In particular, the cross term \(2 \cdot 5 \cdot 32 \) becomes a multiple of 64.

But what if we added a just slightly lower power of 2? Rather than adding 32, we could add 16. This would yield \(x = 21\), which we found above is actually a solution! Indeed, applying the same algebra as above but with a 16 rather than a 32, we can see that
\[
21^2 \equiv (5+16)^2 \equiv 5^2 + 2 \cdot 5 \cdot 16 + 16^2
\equiv 5^2 + 32 \cdot 5 \text{ mod } 64.
\]
Now rather than evaluating \(32 \cdot 5\), we could just notice that it’s an odd multiple of 32. A multiple of 32 is always going to be congruent to 0 or 32 mod 64, depending on whether we multiply it by an even or an odd number. Here we have an odd multiple, so \( 32 \cdot 5 \equiv 32 \) mod \(64\). Thus,
\[
21^2 \equiv 5^2 + 32 \text{ mod } 64.
\]

This is great news! It says that \(5^2\) and \(21^2\) are not the same, modulo 64: they differ by \(32\). If it turned out that \(5^2 \equiv -7\) mod \(64\), we’d be finished. But it isn’t: it takes the other value, \(5^2 \equiv 25 \) mod \(64\). And in that case, the above equation tells us that \(21^2\) must give us the right value, \(21^2 \equiv 5^2 + 32 \equiv 25+32 \equiv -7 \) mod \(64\).

This argument doesn’t rely at all on the particular powers of 2 involved. It’s going to give us a way to go from a solution mod \(2^n\) to a solution mod \(2^{n+1}\). It will solve our problem.

… and a proof

Let’s reiterate the above in more generality, which in fact gives a solution to the problem.

Suppose we have an integer \(x\) such that \(x^2 \equiv -7 \) mod \(2^n\). Proceeding as above, we’ll find a \(y\) such that \(y^2 \equiv -7 \) mod \(2^{n+1}\).

Well, lifting \(-7\) from mod \(2^n\) to mod \(2^{n+1}\), we must have that either
\[
x^2 \equiv -7 \text{ or } -7+2^n \text{ mod } 2^{n+1}.
\]
In the first case, we are finished: we can just take \(y = x\). In the second case, let \(y = x + 2^{n-1} \). Then we have
\[
y^2 = (x + 2^{n-1})^2
= x^2 + 2^n x + 2^{2n-2}
\equiv x^2 + 2^n x \text{ mod } 2^{n+1}.
\]
Now as \(x^2 \equiv -7 \) mod \(2^n\), \(x\) is definitely odd. So \( 2^n x \) is an odd multiple of \(2^n\). Modulo \(2^{n+1}\), all multiples of \(2^n\) are congruent to \(0\) or \(2^n\): even multiples are congruent to \(0\), odd multiples are congruent to \(2^n\). Thus \(2^n x \equiv 2^n \) mod \(2^{n+1}\). We then conclude that
\[
y^2 \equiv x^2 + 2^n \text{ mod } 2^{n+1}.
\]
Since we assumed in this case that \( x^2 \equiv -7 + 2^n \) mod \( 2^{n+1} \), we therefore have
\[
y^2 \equiv -7 + 2^n + 2^n \equiv -7 \text{ mod } 2^{n+1},
\]
which is precisely what we were looking for.

We have shown that, given \(x\) satisfying \(x^2 \equiv -7 \) mod \(2^n\), there exists \(y\) satisfying \(y^2 \equiv -7\) mod \(2^{n+1}\).

So, since we found solutions modulo \(2^n\) for small \(n\), we now have proved, by induction, that for all positive integers \(n\), there exists \(x\) such that \(x^2 \equiv -7 \) mod \(2^n\).

We have solved the problem!
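The inductive step can even be run on a computer. Here is a Python sketch of the argument above (the function name `lift_solution` is my own):

```python
def lift_solution(x, n):
    """Given x^2 ≡ -7 mod 2^n, return y with y^2 ≡ -7 mod 2^(n+1):
    either x itself already works, or we adjust by 2^(n-1)."""
    if (x * x + 7) % 2 ** (n + 1) == 0:
        return x
    return x + 2 ** (n - 1)

x = 1  # x = 1 satisfies x^2 ≡ -7 mod 8
for n in range(3, 30):
    assert (x * x + 7) % 2 ** n == 0
    x = lift_solution(x, n)
print(x, (x * x + 7) % 2 ** 30)  # the second number printed is 0
```

Starting from \(x=1\) mod 8, this reproduces the chain \(1, 5, 5, 21, \ldots\) we found by hand.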

What if life was easy?

In this problem we had to solve a specific equation in modular arithmetic. We did it by repeatedly “lifting” solutions. Given a solution modulo \(2^n\), we used it to find a solution modulo \(2^{n+1}\).

It would have been easier if we could directly lift the solutions. In other words, it would have been easier if whenever \(x\) was a solution modulo \(2^n\), it was also a solution modulo \(2^{n+1}\). That wasn’t quite true in this problem, and we had to adjust our \(x\) by \(2^{n-1}\) if necessary in passing from arithmetic modulo \(2^n\) to arithmetic modulo \(2^{n+1}\).

Sometimes, however, you can lift your solutions directly.

Sometimes, life doesn’t have to be difficult.

Let’s step back a bit and think about the type of problem we were trying to solve. We had a polynomial equation modulo \(2^n\):
\[
P(x) \equiv 0 \text{ mod } 2^n.
\]
In this problem, we had \(P(x) = x^2 + 7 \).

Let’s change the problem to a different polynomial: let \( Q(x) = x^3 + 7 \), and we’ll consider the similar-looking polynomial equation
\[
Q(x) \equiv 0 \text{ mod } 2^n,
\text{ i.e. }
x^3 + 7 \equiv 0 \text{ mod } 2^n.
\]

Now, mod \(2,4,8\), there’s the straightforward solution \(x=1\).

Mod \(16\), \(x=1\) is not a solution, but the other lift of \(1\) from mod 8 to mod 16 is a solution, i.e. \(x=9\).

Mod \(32\), \(x=9\) is still a solution.

Mod \(64\), \(x=9\) is not a solution, but the other lift of \(9\) from mod 32 to mod 64 is a solution, namely \(x=41\).

And so on.

So it appears that when we’re solving \(x^3 + 7 \equiv 0 \) mod \(2^n\), we can just lift solutions directly. Going from a solution modulo \(2^n\) to \(2^{n+1}\) there are two lifts, and one of them always seems to work.
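We can let Python run this lifting chain for us (a sketch; the helper name `direct_lift` is mine):

```python
def direct_lift(x, n):
    """Given x^3 + 7 ≡ 0 mod 2^n, return whichever lift of x
    solves y^3 + 7 ≡ 0 mod 2^(n+1)."""
    for y in (x, x + 2 ** n):
        if (y ** 3 + 7) % 2 ** (n + 1) == 0:
            return y
    raise ValueError("neither lift works")

x, chain = 1, [1]
for n in range(1, 7):
    x = direct_lift(x, n)
    chain.append(x)
print(chain)  # → [1, 1, 1, 9, 9, 41, 105]
```

The chain reproduces the solutions \(1, 1, 1, 9, 9, 41\) found above for mod \(2, 4, \ldots, 64\), and continues with \(105\) mod \(128\).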

Let’s see if we can prove this, using a similar method as for the initial, harder, problem. Suppose we have a solution \(x\) to the equation \(x^3 + 7 \equiv 0 \) mod \( 2^n \); we’ll try to find a \(y\) such that \(y^3 + 7 \equiv 0 \) mod \(2^{n+1}\).

Well, since \(x^3 + 7 \equiv 0 \) mod \(2^n \), then there are two possibilities for \(x^3 + 7 \) modulo \(2^{n+1}\):
\[
x^3 + 7 \equiv 0 \text{ or } 2^n \text{ mod } 2^{n+1}.
\]
In the first case, we have our solution: we may simply take \(y=x\) and then \(y^3 + 7 \equiv 0 \) mod \(2^{n+1}\) as desired. In the second case, consider letting \(y\) to be the other lift of \(x\) from mod \(2^n\) to mod \(2^{n+1}\), namely \(y = x+2^n\).

We can then compute \(y^3 + 7 \) mod \(2^{n+1}\), using similar algebra as in the original question:
\[
y^3 + 7
= (x+2^n)^3 + 7
= x^3 + 3x^2 2^n + 3x 2^{2n} + 2^{3n} + 7.
\]
Now by assumption \(x^3 + 7 \equiv 2^n \) mod \(2^{n+1}\). Moreover, any term containing a factor with \(2^a\) with \(a \geq n+1\) reduces to zero modulo \(2^{n+1} \), since it’s a multiple of \(2^{n+1}\). Thus the terms with a \(2^{2n}\) and a \(2^{3n}\) disappear. Thus the above expression simplifies, and we obtain
\[
y^3 + 7 \equiv 2^n + 3x^2 2^n
\equiv 2^n (1 + 3x^2) \text{ mod } 2^{n+1}.
\]

Here we have a multiple of \(2^n\). And as we discussed earlier, there aren’t many possibilities for multiples of \(2^n \) modulo \(2^{n+1}\): even multiples of \(2^n\) are in fact multiples of \(2^{n+1}\), hence congruent to \(0\) mod \(2^{n+1}\); while odd multiples of \(2^{n}\) must be congruent to \(2^n\) mod \(2^{n+1}\).

Since \(x\) satisfies \(x^3 + 7 \equiv 0 \) mod \(2^n\), we see that \(x\) must be odd. So \(3x^2 \) is odd and \(1+3x^2 \) is even. Thus \(2^n (1+3x^2) \) is an even multiple of \(2^n\), hence congruent to 0 mod \(2^{n+1}\), and we obtain
\[
y^3 + 7 \equiv 0 \text{ mod } 2^{n+1}
\]
as desired.

Thus, we have shown that the solution \(x\) of \(x^3 + 7 \equiv 0 \) mod \(2^n\) lifts directly to a solution \(y \) of \(y^3 + 7 \equiv 0 \) mod \(2^{n+1}\). It may be that \(y\) is one lift, \(x\), or the other, \(x+2^n\), but one of those two lifts works.

This is much more straightforward behaviour than in the original problem! Life is easy now!

Why is life sometimes easy and sometimes hard?

So, what’s the difference between these two cases?

With the polynomial \(P(x) = x^2 + 7 \), a solution of \( P(x) \equiv 0 \) mod \(2^n\) did not lift directly to a solution mod \(2^{n+1}\), although we were able to find solutions by using a trick.

With the polynomial \(Q(x) = x^3 + 7 \), a solution of \( Q(x) \equiv 0 \) mod \(2^n\) did lift directly to a solution mod \(2^{n+1}\).

What is so different about \(P(x)\) and \(Q(x)\)?

At one level, the difference derived from a technicality in trying to lift solutions. From a solution \(x\) of \(P(x) \equiv 0 \) mod \(2^n\), we considered the lifts \(x\) and \(x+2^n\) mod \(2^{n+1}\). In expanding out \( P(x+2^n) \), too many terms cancelled, leaving us with just the terms of \( P(x) \) :
\[
P(x+2^n) = (x+2^n)^2 + 7
\equiv x^2 + 7 \equiv P(x) \text{ mod } 2^{n+1}.
\]
But doing the same for \( Q(x) \), there wasn’t quite as much cancellation:
\[
Q(x+2^n) = (x+2^n)^3 + 7
\equiv x^3 + 3x^2 2^n + 7
\equiv Q(x) + 3x^2 2^n \text{ mod } 2^{n+1},
\]
and in fact we were able to show that the extra term \( 3x^2 2^n \) was congruent to \( 2^n \) mod \(2^{n+1} \), so that
\[
Q(x+2^n) \equiv Q(x) + 2^n \text{ mod } 2^{n+1}.
\]
This last equation effectively says that \( Q(x+2^n) \) and \( Q(x) \) are the two lifts of \( Q(x) \) from mod \( 2^n \) to mod \( 2^{n+1} \); so one or the other lift of \(x\), i.e. either \(x\) or \(x+2^n\) will be the solution we’re looking for.

What’s happening, in some sense, is that the solution \(x\) to \( P(x) \equiv 0 \) mod \(2^n\) was “too good”: it was so good that \( P(x+2^n) \) cancelled too much!

In fact, there is a sense in which the solution \(x\) of \( P(x) \equiv 0 \) mod \(2^n\) was a “repeated root”, just like \(x=2\) is a repeated root of the polynomial \( (x-2)^2 \) over the real numbers.

Repeated roots in modular arithmetic

Now we have to be very careful with saying things like “repeated roots” for polynomials in modular arithmetic. Take arithmetic modulo 8, for instance. Then for one thing, there is no unique factorisation of polynomials, even when we regard the coefficients purely as integers mod 8: for instance,
\[
(x+4)^2 = x^2 + 8x + 16 \equiv x^2 \text{ mod } 8,
\]
so that we have two distinct factorisations of the same polynomial
\[
(x+4)^2 \equiv x^2 \text{ mod } 8.
\]
Thus it doesn’t really make sense to talk about the “roots” of a polynomial mod 8, in the same way that we talk about the roots of a polynomial over the real or complex numbers.

(However, as it turns out, modulo a prime, there is unique factorisation of polynomials, and you can talk about the roots of a polynomial in a very similar way as over the real or complex numbers.)

Nonetheless, you can definitely talk about the solutions of a polynomial equation modulo 8; and indeed we’ve been doing that for a while now.

Perhaps the best way to understand what “repeated roots” might mean modulo 8 is to forget about factorising polynomials and instead differentiate them.

Differentiate? As in calculus? As in the subject all about real numbers and limits and tangent lines and so on, having very little to do with modular arithmetic?

Yes. That sort of differentiation. Let’s see why.

Simple roots and derivatives

Let’s forget about modular arithmetic for the moment, and think about polynomials in general, with (say) real numbers for coefficients. So, let \( F(x) \) be a polynomial.

Recall that a root of \(F(x)\) is a number \(r\) such that \(F(r) = 0\). This is equivalent to \( x-r \) being a factor of \(F(x)\).

We say a root is a repeated root or a non-simple root if in fact \( (x-r)^2 \) is a factor of \(F(x)\).

Otherwise, we say a root is a simple root. In this case, \( x-r \) is only a factor of \(F(x) \) once. In other words, \( F(x) = (x-r)H(x) \) where \( x-r \) is not a factor of \(H(x)\), or equivalently, where \( H(r) \neq 0 \).

One interesting fact about polynomials is that you can detect whether a root is simple via differentiation.

FACT. Let \(r\) be a root of \(F(x)\). Then \(r\) is a non-simple (i.e. repeated) root if and only if \( F'(r) = 0 \).

In other words, a root of \( F(x) \) is a repeated root if and only if it’s also a root of the derivative \( F'(x) \).

The proof comes from applying the product rule.

PROOF. If \(r\) is a repeated root then \( F(x) = (x-r)^2 G(x) \) for some polynomial \( G(x) \). By the product rule then
\[
F'(x) = 2(x-r)G(x) + (x-r)^2 G'(x)
\]
and since both terms have a factor of \(x-r\), when we substitute \(x=r\) we obtain \( F'(r) = 0 \). If \(r\) is not repeated, then \( F(x) = (x-r) H(x) \) where \( H(r) \neq 0 \). In this case the product rule gives
\[
F'(x) = H(x) + (x-r)H'(x),
\]
and substituting \(x=r\) gives \( F'(r) = H(r) \neq 0 \).
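To see the FACT in action on a concrete example, take \(F(x) = (x-2)^2(x+1) = x^3 - 3x^2 + 4\) (my own choice of polynomial, just for illustration):

```python
def F(x):
    return (x - 2) ** 2 * (x + 1)   # expands to x^3 - 3x^2 + 4

def F_prime(x):
    return 3 * x ** 2 - 6 * x       # derivative of x^3 - 3x^2 + 4

# At the repeated root x = 2, both F and F' vanish;
# at the simple root x = -1, F vanishes but F' does not.
print(F(2), F_prime(2))    # → 0 0
print(F(-1), F_prime(-1))  # → 0 9
```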

Well, that’s a nice little fact. And we can try to apply it over the integers… or in modular arithmetic.

Given a polynomial \(F(x)\) with integer coefficients, its derivative \(F'(x) \) is still a polynomial with integer coefficients. And even if you consider the integer coefficients modulo 8 (or any other modulus), the derivative is still well defined modulo 8 (or any other modulus).

We can use this idea to define a notion of simple or repeated roots, even when working with modular arithmetic. When we have a root \(r\) of a polynomial \(F(x)\), we look at \(F'(r)\). As we’ll see, a similar idea applies, but there is a slight twist.

Let’s return to our examples discussed above.

For \( P(x) = x^2 + 7 \), modulo 8, we have the root \(1\), since \(P(1) \equiv 0 \) mod 8. Its derivative is \( P'(x) = 2x \), and \(P'(1) \equiv 2 \) mod 8.

For \( Q(x) = x^3 + 7 \), modulo 8, we have the root \(1\), since \( Q(1) \equiv 0 \) mod 8. Its derivative is \( Q'(x) = 3x^2 \), and \( Q'(1) \equiv 3 \) mod 8.

Now as it turns out, \( 1 \) is a “repeated root” of \( P(x) \), but it is a simple root of \( Q(x) \).

As we can see, it’s not true that \( P'(1) = 0 \), as we might have expected for a repeated root. However, it is even: that is, \( P'(1) \equiv 0 \) mod 2. The key difference is that 2 is even, but 3 is not!

As it turns out, the way to define simple and repeated roots of a polynomial \(F(x)\), when we are working modulo \(2^n\), is to look at the derivative \(F'(x)\), not mod \(2^n\), but simply mod \(2\). And more generally, the same ideas apply when we work modulo powers of any prime.

In this way, we can make sense of simple and non-simple roots, even when the idea of unique factorisation of polynomials doesn’t otherwise work. (We’ll call them “non-simple” rather than “repeated” since the whole idea of unique factorisation breaks down.)

DEFINITION. Let \( F(x) \) be a polynomial whose coefficients are integers mod \(p^k\), where \(p\) is a prime and \(k\) is a positive integer.

  • We say an integer \(r\) mod \(p^k\) is a root of \(F(x)\) if \(F(r) \equiv 0 \) mod \(p^k\).
  • If in addition \( F'(r) \equiv 0 \) mod \(p\), we say \(r\) is a non-simple root.
  • Otherwise, i.e. if \( F'(r) \neq 0 \) mod \(p\), we say \(r\) is a simple root.

With this definition, \(1\) is a non-simple root of our original polynomial \(P(x) = x^2 + 7 \) mod \(8\), since \(P'(x) = 2x\) and \(P'(1) \equiv 0 \) mod \(2\).

On the other hand, \(1\) is a simple root of the polynomial \(Q(x) = x^3 + 7 \) mod \(8\), since \(Q'(x) = 3x^2 \) and \(Q'(1) = 3 \neq 0 \) mod \(2\).
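The definition is easy to test in Python (the function name `is_simple_root` is mine, and the derivatives are supplied by hand):

```python
def is_simple_root(f, f_prime, r, p, k):
    """Is r a simple root of f mod p^k, per the definition above?"""
    return f(r) % p ** k == 0 and f_prime(r) % p != 0

P, P_prime = (lambda x: x ** 2 + 7), (lambda x: 2 * x)
Q, Q_prime = (lambda x: x ** 3 + 7), (lambda x: 3 * x ** 2)

# The root 1 mod 8 (p = 2, k = 3) of each polynomial:
print(is_simple_root(P, P_prime, 1, 2, 3))  # → False (non-simple)
print(is_simple_root(Q, Q_prime, 1, 2, 3))  # → True (simple)
```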

Hensel’s Lemma

We’ve seen that in certain circumstances, solutions of polynomial equations modulo \(2^n\) lift to solutions modulo \(2^{n+1}\). Or put another way, we’ve seen that roots of polynomials over the integers modulo \(2^n\) lift to roots of polynomials over the integers modulo \(2^{n+1}\).

We found that for \(P(x) = x^2 + 7\), roots of \(P(x) \equiv 0 \) mod \(2^n\) did not lift to roots mod \(2^{n+1}\). But we also found non-simple roots in this case.

On the other hand, we found that for \(Q(x) = x^3 + 7 \), roots of \(Q(x) \equiv 0 \) mod \(2^n\) did lift to roots mod \(2^{n+1}\). And in this case the roots were simple.

As it turns out, this is true in complete generality. When we have simple roots mod \(2^n\), they always lift to roots mod \(2^{n+1}\). And this is true not just mod \(2^n\), but mod \(p^n\) for any prime \(p\).

This result is called Hensel’s lemma. Let’s state it.

HENSEL’S LEMMA. Let \(F(x)\) be a polynomial with integer coefficients. Let \(p\) be a prime and \(k\) a positive integer. Suppose \(r\) mod \(p^k\) is a simple root of \(F(x)\) modulo \(p^k\), i.e.
\[
F(r) \equiv 0 \text{ mod } p^k
\text{ and }
F'(r) \neq 0 \text{ mod } p.
\]
Then there is a lift \(s\) of \(r\) mod \(p^{k+1}\) which is a root of \(F(x)\) modulo \(p^{k+1}\), i.e.
\[
s \equiv r \text{ mod } p^k
\text{ and }
F(s) \equiv 0 \text{ mod } p^{k+1}.
\]

Hensel’s lemma, as we’ve seen, doesn’t apply directly to our original problem. But it does apply to the easier problem with the polynomial \( Q(x) = x^3 + 7 \). Let’s see how quickly the solution goes now when we revisit the problem.

PROBLEM. Show that for any positive integer \(n\), there is an integer \(x\) such that \(Q(x) = x^3 + 7\) is divisible by \(2^n\).

SOLUTION. Proof by induction on \(n\). When \(n=1\) we observe that \(x=1\) is a solution, since \(Q(1) = 8 \equiv 0 \) mod \(2\). Now suppose we have a root \(r\) of \(Q(x)\) modulo \(2^n\). Then as \(r^3 + 7 \equiv 0 \) mod \(2^n\), \(r\) must be odd. Thus \( Q'(r) = 3r^2 \) must be odd, so \( Q'(r) \neq 0 \) mod \(2\), and \(r\) is a simple root. By Hensel’s lemma then \(r\) lifts to a root \(s\) of \(Q(x)\) mod \(2^{n+1}\). So any root of \(Q(x) \) mod \(2^n\) lifts to a root mod \(2^{n+1}\), and by induction then \(Q(x) \) has a root modulo \(2^n\) for all positive integers \(n\), as desired.

And there you have it.
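The whole inductive solution, Hensel step included, fits comfortably in a few lines of Python (a sketch, with my own function name; the step just tries each of the \(p\) lifts):

```python
def hensel_step(f, r, p, k):
    """Lift a simple root r of f mod p^k to a root mod p^(k+1).
    One of the p lifts r, r + p^k, ..., r + (p-1)p^k must work."""
    for i in range(p):
        s = r + i * p ** k
        if f(s) % p ** (k + 1) == 0:
            return s
    raise ValueError("no lift: the root is not simple")

Q = lambda x: x ** 3 + 7
r = 1  # root of Q mod 2
for k in range(1, 20):
    assert Q(r) % 2 ** k == 0
    r = hensel_step(Q, r, 2, k)
print(Q(r) % 2 ** 20)  # → 0
```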

Return of the Euler-Fermat theorem

A long long time ago, in a galaxy far away, I wrote up an account of the Euler-Fermat theorem for school students.

The Euler-Fermat theorem is the wonderful result of elementary number theory, which says that in modular arithmetic, you can often multiply a number by itself a certain number of times to get back to 1.

For instance, consider the integers modulo 14, i.e. remainders upon division by 14. Starting from the number 3, if we repeatedly multiply it by itself, i.e. we consider powers of 3, we obtain
\[
3, 3^2 = 9, 3^3 = 27, 3^4 = 81, 3^5 = 243, 3^6 = 729, …
\]
which reduce modulo 14 to (i.e. their remainders upon division by 14 are)
\[
3, 9, 13, 11, 5, 1, …
\]
Since, as we have just calculated, \( 3^6 \equiv 1 \) mod 14, this means that subsequent powers of 3 will reduce mod 14 to precisely the same numbers; they will continue to cycle around. For instance,
\[
3^7 = 3^6 \times 3 \equiv 1 \times 3 \equiv 3 \text{ mod } 14, \\
3^8 = 3^6 \times 3^2 \equiv 1 \times 9 \equiv 9 \text{ mod } 14, \\
3^9 = 3^6 \times 3^3 \equiv 1 \times 13 \equiv 13 \text{ mod } 14,
\]
and so on. The powers of 3 cycle around modulo 14, repeating the same 6 numbers.

So, the fact that \( 3^6 \equiv 1 \) mod 14 gives a very nice structure to powers of 3, modulo 14.
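In Python, the built-in `pow` with three arguments computes these modular powers directly:

```python
# The first six powers of 3 modulo 14.
powers = [pow(3, i, 14) for i in range(1, 7)]
print(powers)  # → [3, 9, 13, 11, 5, 1]

# Subsequent powers cycle through the same six numbers:
print(pow(3, 7, 14), pow(3, 8, 14), pow(3, 9, 14))  # → 3 9 13
```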

We can try the same thing starting from some different numbers — still working modulo 14, i.e. taking remainders upon division by 14.

If we do the same for powers of 5 modulo 14, we find the same thing: again \( 5^6 \equiv 1 \) mod 14, and the powers of 5 also repeat through 6 numbers.

With powers of 9 modulo 14, we find something similar, but not quite the same: here \( 9^3 \equiv 1 \) mod 14, so the powers of 9 form a cycle of length 3, repeating through 3 numbers.

With powers of 4 modulo 14, however, something goes wrong. Taking powers of 4, we never arrive back at 1 modulo 14. And if you think about it you can see why: 4 is even, hence so are all its powers. But a number congruent to 1 mod 14 must be odd, so this can never happen.

The same is true starting from 7. The powers of 7 are all divisible by 7, but a number congruent to 1 mod 14 must be 1 more than a multiple of 7.

In order to have a chance of arriving back at 1, with the powers cycling around, the number we start from must have no factors in common with 14. In other words, it must be relatively prime to 14.

The numbers 3, 5, 9 are all relatively prime to 14. And we found \( 3^6 \equiv 1 \) mod 14, \( 5^6 \equiv 1 \) mod 14, and \( 9^3 \equiv 1 \) mod 14. The power required is 6 in the first two cases and 3 in the other. But notice that 3 is a factor of 6, so in fact
\[
9^6 = (9^3)^2 \equiv 1 \text{ mod } 14
\]
as well. So starting from any of these numbers, its sixth power leaves a remainder of 1 modulo 14.

So now we arrive at a conjecture: if \( a \) is an integer relatively prime to 14, then \( a^6 \equiv 1 \) mod 14.

This conjecture turns out to be true. And in fact, something like it is true not just when we look modulo 14, but modulo any positive integer \( n \).

But as you might suspect, you don’t always raise numbers to the power 6. When you replace 14 by a general number \( n \), what do you replace the exponent 6 with?

The answer is, the Euler phi function \( \phi(n) \). Given a positive integer \( n \), we define \( \phi(n) \) as the number of positive integers less than \( n \) which are relatively prime to \( n \).

So, for instance, the positive integers less than 14 which are relatively prime to 14 are
\[
1, 3, 5, 9, 11, 13.
\]
There are six of these, so \( \phi(14)=6 \).
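Computing \(\phi\) straight from the definition in Python:

```python
from math import gcd

def phi(n):
    """Count the positive integers below n that are relatively prime to n."""
    return sum(1 for a in range(1, n) if gcd(a, n) == 1)

print(phi(14))  # → 6
```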

And so the Euler-Fermat theorem can be stated as follows.

THEOREM. For any positive integer \(n\) and any integer \( a \) relatively prime to \( n \),
\[
a^{\phi(n)} \equiv 1 \text{ mod } n.
\]
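And we can spot-check the theorem for every modulus up to 50 (computing \(\phi\) from its definition):

```python
from math import gcd

def phi(n):
    return sum(1 for a in range(1, n) if gcd(a, n) == 1)

# Verify a^phi(n) ≡ 1 mod n for all a relatively prime to n, for n ≤ 50.
for n in range(2, 51):
    for a in range(1, n):
        if gcd(a, n) == 1:
            assert pow(a, phi(n), n) == 1
print("Euler-Fermat verified for all n up to 50")
```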

To really understand why this theorem is true, I think one should understand a little group theory. Just the first little bit of group theory.

And so, a long long time ago, in a galaxy far far away, I thought I would write up such a proof, introducing just as much group theory as one needs for it. The result is intended for talented high school students or interested members of the public.

Today, I rediscovered it. You can read it below!

Euler-Fermat-theorem