One line Euler line

Well, I probably saw it as a student years ago, and apparently it’s standard, but I came across this today through Olympiad wanderings and thought it was wonderful enough to share.

It’s a fun fact from Euclidean geometry — the sort of gem which in a decent world would be taught in an advanced secondary school maths class. But of course it is not, at least nowhere near me — and indeed even the basic ideas on which it relies (including almost all geometry and indeed the very notion of proof) have more or less been ruthlessly exterminated from the high school curriculum. As is happening now, with the content of the Australian school mathematics curriculum being shifted further back and, so it seems, further watered down, in the name of teaching abstract problem solving skills… skills so abstract that there is less and less actual concrete knowledge to apply them to.

And while not taught in schools, this sort of thing is also too “elementary” to appear in most university courses — unless the university has a specialised “elementary” mathematics course, perhaps aimed at future mathematics teachers, that might cover it. Even then the subject matter can’t be taught by those future teachers, as the content lies in a different universe from that where the curriculum lives, a superlunary realm inaccessible from such a small base of material.

It is just triangle geometry, in some sense no more than a clever elaboration of similar triangles and similar ideas. So it falls through the cracks, only to be learned by those precocious enough to be taught it in an Olympiad programme, or already interested enough to read Euclidean geometry for their own interest.

Anyway, the fact here is a fact about triangles. I’ll try not to get too snarky about the curriculum.

Triangles in the plane. Just the nice, flat, Euclidean, plane.

It’s a fact about centres of triangles. Triangles have centres.

But not one, oh no, no one. They have many centres. There is a whole world of triangle centres out there, none of which you are likely ever to see in any curriculum any time soon, being too far from mundane everyday applications. (Imagine if just one of these was included in a curriculum! Ah, we can dream. Imagine the problems you could solve, the proofs and abstractions and explorations you could do, if you even knew what these things were. Which you don’t.)

So yes, as it turns out there are many ways in which you might define the “centre” of a triangle. Here are three of the most standard ones. Let the triangle’s vertices be \( A,B,C \), so its sides are \( AB, BC \) and \( CA \).

  • The centroid. It’s often called G. This is what you get when you join each vertex to the midpoint of the opposite side. Those three lines, called medians, meet at this point G, called the centroid. It’s also the centre of mass of the triangle. (Oh what, a practical application? Who would have thought.)

  • The circumcentre. It’s often called O. As it turns out, there is a unique circle which passes through the three vertices A,B,C. A basic fact, again, which is not in any curriculum anywhere near me. This circle is called the circumcircle of ABC. (Well, it circumnavigates the vertices, doesn’t it?) Its centre O is called the circumcentre.

Circumcircle and circumcentre of a triangle

  • The orthocentre. It’s often called H. This is what you get when you drop a perpendicular from each vertex to the opposite side. Those three lines, called altitudes (because they measure the height of the vertex above each side), also meet at a point H. That point H is called the orthocentre.

(All diagrams public domain from wikipedia.)

Now, it’s not even clear that any of these points exist. (No, the pictures do not constitute proofs.) If you take three lines in the plane, they may or may not meet at a point. Usually they won’t. But in special cases they will. As it turns out, the three medians of a triangle are sufficiently special that they do meet at a point. As are the three altitudes.

Most likely any reader here will either know about all three of these centres, or about none of them. If you know all three, congratulations my friend. If not, let me simply say this about them:

  • The fact a centre of mass of a triangle exists shouldn’t be too surprising. And a median splits a triangle into two halves of equal area (same height, half of the base each), so it shouldn’t be too surprising that the centre of mass should lie on a median, hence on all the medians, so that it seems they at least ought to meet at a point. In fact, if you know vectors, then if we take an origin \( O \) and position vectors \( \vec{OA}, \vec{OB}, \vec{OC} \) for A,B,C, then the position vector of the centroid G is their average, \[ \vec{OG} = \frac{ \vec{OA} + \vec{OB} + \vec{OC} }{3} \].
  • The fact that you can put a circle through three points shouldn’t be too surprising. Given two points, you can draw many many circles through them, in fact so many that they can pass through any other point as well. Or, for a more misanthropic take on the matter, have you ever tried to keep equally away from 2 people? Well, I suppose it depends on how obnoxious or irritating are the people near you or wherever you happen to go, but in any case it’s not that hard, you just have to stay on a line bisecting those two people (it’s called a perpendicular bisector). Well, now take it a step further and suppose there are 3 people. It can be done. When you do that you intersect another perpendicular bisector and at their intersection, in fact all three perpendiculars have to meet, and that point is the same distance from all the people. You are now at the circumcentre and you can put a circle through all those who repel you. (Well, you’d likely feel trapped between all 3, and you’d probably end up running away, but then you won’t be equally far away from the repellent ones. Anyway…)
  • The fact that the orthocentre is much less obvious, at least to me, without using some other mathematical theorems. It is a bit surprising, without knowing mathematics. But no matter, because I’ll prove it as part of the gem I’m about to explain.

It’s not just the orthocentre. Mathematics is full of surprising, counterintuitive, even apparently ridiculous ideas and statements that turn out to be true. And the process of turning them from apparently ridiculous to understood — then even intuitive — ideas, is the idea of proof. Which you will never learn in a mathematics curriculum any time soon, except in the very uppermost advanced stream in the last year or two, if you’re lucky. Without the language of proof, there will be no words in which to express this process.

So, the centroid \( G \), the circumcentre \(O \), the orthocentre \( H \). Three very different but equally reasonable claimants to the title of *centre* of a circle. They are not the same. But as it turns out, amazingly, they always lie on a line. This was proved by Leonhard Euler in 1767, and the line is called the Euler line of the triangle. Drawing everything on one diagram results in a suitably impressive picture of this line, and hopefully gives some impression of how surprising it is. (Source: wikipedia.)

Euler line of triangle

There are many proofs of this fact. Many of them are very nice, using some combination of similar triangles, or transformation geometry. And as with all proofs of facts that are initially surprising, it converts a possibly surprising, possibly ridiculous statement into an incontrovertible, natural, and true one.

The proof I’m going to share uses vectors. So to understand it, you need to know something about vectors: including position vectors of points; addition, subtraction and scalar multiplication of vectors; and the dot product, and the fact that the dot product of two nonzero vectors is zero if and only if they’re perpendicular. If you don’t, oh well, go and learn it. You likely won’t be able to derive it yourself, even with your mad 21st century skillz.

Remember, \(G\) is the centroid, \(O\) is the circumcentre and \(H\) is the orthocentre. But I’m not even going to assume we even know the orthocentre exists yet.

Let’s set up some position vectors. We’ll take the circumcentre \(O\) as the origin. (After all, we often use \(O\) for the origin and \(O\) for the circumcentre. Why not both? Quite a convenient elision.) As \( O \) is the centre of a circle passing through \( A,B,C \) we have \( | \vec{OA} | = | \vec{OB} | = | \vec{OC} | \). And, as I mentioned above, as \(G\) is the centre of mass of the triangle, \( \vec{OG} = ( \vec{OA}+\vec{OB}+\vec{OC})/3 \).

We’re now ready to state a theorem.

Theorem: Let \(H\) be the point with position vector \( \vec{OH} = \vec{OA}+\vec{OB}+\vec{OC} \). Then the orthocentre of \( ABC \) exists and is equal to \(H\), and \( \vec{OH} = 3 \vec{OG} \).

This theorem tells us a lot. Firstly, that the orthocentre exists, i.e. that the three altitudes of ABC intersect. Secondly, that they intersect at this point \(H\) defined by \( \vec{OH} = \vec{OA}+\vec{OB}+\vec{OC} \). Thirdly, from \( \vec{OH} = 3 \vec{OG} \), recalling how scalar multiplication works, it tells us that \( O,H,G \) all lie on a line, with \( O, G, H \) in order along it, and with \( H \) 3 times as far away from \( O \) as \( G \).

In fact, the conclusion that \( \vec{OH} = 3 \vec{OG} \) isn’t really much of a conclusion. It is immediate from the way we defined \( H \). We defined \( H \) by \( \vec{OH} = \vec{OA}+\vec{OB}+\vec{OC} \). But since the centroid G satisfies \( \vec{OG}=( \vec{OA}+\vec{OB}+\vec{OC})/3 \), we observe that \( \vec{OH} \) is just 3 times \( \vec{OG} \). Of course 1 is 3 times 1/3…

So the thing to prove is that the orthocentre exists and is equal to \(H\), defined in this way. The proof is very short. And that’s what I mean when I say “one line Euler line”.

Proof: We show \(H\) lies on the altitude from \(A\) to \(BC\), i.e. that \(AH\) is perpendicular to \(BC\): \[\vec{AH}.\vec{BC} = (\vec{OH}-\vec{OA}).(\vec{OB}-\vec{OC}) = (\vec{OB}+\vec{OC}).(\vec{OB}-\vec{OC}) = |\vec{OB}|^2 – |\vec{OC}|^2 = 0. \] A similar argument shows that \(H\) lies on the other altitudes of \(ABC\), hence the altitudes intersect at \(H\).
QED

By “one line” I mean the one line calculation above. (If indeed it fit on one line on your screen…) Let’s explain each step. First, \( \vec{AH} = \vec{OH}-\vec{OA} \); that’s how vector subtraction works. Second, we substituted \( \vec{OH} = \vec{OA}+\vec{OB}+\vec{OC} \), so that \( \vec{OH} – \vec{OA} = \vec{OB} + \vec{OC} \). Third, we expanded a dot product “difference of perfect squares”, and used the fact that the dot product of a vector with itself is the length squared of that vector. Finally, we used the fact that \(O\) is the circumcentre, so that \( |\vec{OB}|=|\vec{OC}| \).

If you’d like to make sure you understand this, try to do to the “similar” arguments that \(H\) lies on the other altitudes of \(ABC\). In other words, as an exercise, try to show that \( \vec{BH}.\vec{AC} = 0 \) and \( \vec{CH}.\vec{AB} = 0 \).

Well, as I say, this is all in some sense “standard knowledge”, just stuff which is inaccessible to pretty much every normal person since nothing remotely in the same universe is ever taught at school (and nor will it be any time soon anywhere near me). As I say I just came across and (re-)discovered it, and thought of it was worth sharing.

The fact that \( \vec{OH} =\vec{OA}+\vec{OB}+\vec{OC} \) is apparently known as Sylvester’s triangle problem, and what I’ve shown here is very similar to it; in essence it’s the same thing. This vector proof isn’t new, it appears in many places online and off. And you can find many other more standard type proofs, using similar triangles and such things, online and off. As for proving the existence of the orthocentre, cut the knot has a nice page of different proofs, including one along the lines above.

And there you have it. A one line Euler line. More or less. So to speak.

I would dearly love it if students could solve problems related to these ideas — or even much simpler versions thereof —  without knowing any geometry or the idea of proof beyond the pale shadow of which exists in our curriculum, simply by the application of “21st century problem solving skills”. Futuristic indeed they must be to derive everything from a few flawed lessons on proof and some drilling of ways to prove triangles congruent. But as it turns out, to do this sort of problem solving, the average person needs to learn some geometry first. You cannot expect the average student derive hundreds of years of mathematical progress and geometric ingenuity by themselves. With the words to say and the musical notes to play, you can sing your song and create your mathematical world. Once you have the ideas sorted out, you open up a whole universe of creativity, problem solving, a route to the stars, and you have a basis for learning 21st century problem solving skills of creativity, abstraction, precision, rationality, logic, powered by passion, insight, and intuition. But without them, what can you do but flail about. “Every year fewer and fewer words, and the range of consciousness always a little smaller.

Wouldn’t it be nice if ideas like these, or at an attenuated version, or at least something approximating them — or just something, something? — could at least make an appearance in our schools?

Ptolemy vs Thurston in Hyperbolic Geometry and Topology, AustMS 2020

On 9 December 2020 I gave a (virtual) talk in the Topology session of the 2020 meeting of the Australian Mathematical Society.

Title: Ptolemy vs Thurston in Hyperbolic Geometry and Topology.

Abstract:

Bill Thurston (1946-2012 CE) developed a system of great simplicity and power for understanding hyperbolic 3-manifolds. In particular, he introduced equations whose variables encode the shapes of ideal hyperbolic tetrahedra and whose solutions describe hyperbolic structures on 3-manifolds.

Claudius Ptolemy (c.100-170 CE), better known for developing a rather different system, proved in his Almagest an equation about the lengths of sides and diagonals in a cyclic quadrilateral. Such Ptolemy equations arise in numerous places across mathematics, including in 3-dimensional hyperbolic geometry and representation theory.

In this talk I’ll discuss how Ptolemy and Thurston equations provide complementary perspectives on hyperbolic geometry and topology.

20-12_AustMS_talk

From Here to Hensel

Here’s a nice maths problem, which I thought it would be fun to discuss. The question doesn’t involve any advanced concepts, but it leads on to a very nice result called Hensel’s lemma, and can be taken even further.

The problem comes from somewhere in Mathematical Olympiad folklore… which is another way of saying “I don’t know the source”. If you know a source let me know by email or in the comments. It’s discussed in Problem Solving Tactics, the book I co-authored with my colleagues Angelo Di Pasquale and Norman Do.

PROBLEM. Let \(n\) be a positive integer. Prove that there are infinitely many perfect squares of the form \(a \cdot 2^n – 7 \), where \(a\) is a positive integer.

As with most Olympiad type problems, and especially as with problems in our book, it’s a problem which can be solved using “elementary” methods — in this case just modular arithmetic — but applied in a clever way. For the purposes of discussion here, I’m going to assume you know what modular arithmetic is. If you don’t know modular arithmetic, it’s a fun topic, and I’d encourage you to read about it.

As we’ll see, although his problem is a bit of a curly one, the ideas are quite general. They lead to a very nice result and a very nice technique.

Spoiler alert: I’m about to discuss an approach to this problem and give a solution. Now generally with mathematical problems it’s not good form to just read straight to a solution. So if you can, have a go at the problem yourself first. Otherwise you’ll just be burning the problem; you won’t get as much out of the discussion. Go on, get a pen and paper and give it a go. You’ve been here looking at a screen for too long anyway.

OK, on to discussing the problem.

It’s all modular arithmetic!

The first thing to note is that the problem is really about modular arithmetic, and we can rewrite it as such. To say that a number is of the form \( a \cdot 2^n – 7 \) with \(a\) an integer is to say that that number is congruent to \( -7 \) mod \( 2^n \). The problem is asking us to show that, for any positive integer \(n\), there are infinitely many perfect squares congruent to \(-7\) mod \(2^n\). In other words, it’s asking us to show that there exist infinitely many integers (x) such that
\[
x^2 + 7 \equiv 0 \text{ mod } 2^n.
\]
In fact, given \(n\), you only need to find one such \(x\). For once you have one such \(x\), any other number \(y\) congruent to \(x\) modulo \( 2^n \) will be a solution as well: since \( x \equiv y \) mod \(2^n\), then \( y^2 \equiv x^2 \equiv -7 \) mod ( 2^n ). So, we can simplify the problem somewhat.

To celebrate this first bit of progress, let’s fully write out our simplified form of the problem.

SIMPLIFIED VERSION OF PROBLEM. Let \(n\) be a positive integer. Show that there exists an integer \(x\) such that
\[
x^2 \equiv -7 \text{ mod } 2^n.
\]

Finding solutions, two by two by two…

To build up some understanding of how to find the required \(x\), let’s look at some small cases — as we often do, and which is a good way to approach problem-solving in general.

So, let’s start with \(n=1\). Here, we are looking for an \(x\) such that \( x^2 \equiv -7 \equiv 1 \) mod \(2\). In other words, we’re looking for \(x\) such that \(x^2\) is odd.

Try to guess a solution. Well, if the first \(x\) you guess doesn’t work, the second one probably will. Because in fact any odd \(x\) provides a solution. We may as well just take \( x = 1 \).

Great! What about \(n=2\)? Now we’re looking for \(x\) such that \(x^2 + 7 \) is a multiple of \(4\). This simplifies to \( x^2 \equiv -7 \equiv 1 \) mod \(4\).

Again, guessing a solution is not too hard. Since we’re working mod 4, there are only 4 possible guesses for \(x\) anyway. Guessing \(x = 1\) works, as does \(x=3\). This means that, again, any odd number works.

Well, this is all looking pretty good so far!

Flushed with success, we turn to \(n=3\). Now we want to solve \( x^2 \equiv -7 \) mod \(8\), or equivalently, \( x^2 \equiv 1 \) mod \(8\).

Again, not too difficult; and in any case there are only 8 possible guesses for \(x\) mod 8. And you can see that if \(x\) is even then \(x^2\) will also be even, and hence can’t possibly be \( \equiv 1 \) mod \(8\). So really there are only 4 reasonable guesses for \(x\).

And it turns out all guesses work. We can take \(x \equiv 1, 3, 5, \) or \(7\) mod \(8\). Again, any odd number works.

This all seems a bit too easy!

But at \(n=4\) we can’t just take any odd \(x\). We want to solve \(x^2 \equiv -7 \) mod \(16\), which is equivalent to \( x^2 \equiv 9 \) mod \(16\). We can take \(x=3\); in fact any \(x \equiv 3, 5, 11 \) or \(13 \) mod \(16\) works. So out of the 16 guesses mod 16, or 8 odd guesses, 4 of them work.

Now we go to \(n=5\) and try to solve \( x^2 \equiv -7 \) mod \(32\), or equivalently \(x^2 \equiv 25 \) mod \(32\). Noticing that 25 is actually a perfect square, we can just take \( x = 5 \). But again there are several more solutions, and you can check that \(x \equiv 5, 11, 21\) and \(27\) mod \(32\) are the solutions.

But now the numbers are getting bigger. In fact, they’re growing exponentially. As we proceed through \(n = 1, 2, 3, \ldots \), we are proceeding from considering numbers mod \(2\), to mod \(4 = 2^2\), to mod \(8 = 2^3\), and so on. The modulus is increasing by a factor of two at each step.

So, rather than get bogged down in arithmetic modulo 128, let’s step back and think a bit about how we might proceed.

Reducing from one modulus to another

You might have noticed, as we were passing through calculations mod 2, then 4, then 8, and so on, that at each stage we were “refining” our calculations.

And, on a related note, when we were considering calculations at each stage, we could apply the calculations at a previous stage.

Let’s be a bit more precise, and a bit more general.

If you know that a number \(x\) satisfies \(x \equiv 3\) mod \(4\), then you can deduce that \(x\) must be odd; we used this fact above. Similarly, if you know \(x\) satisfies \(x \equiv 7 \) mod \(15\), then you can deduce that \(x \equiv 1 \) mod \(3\). Indeed, if \(x\) is 7 more than a multiple of 15, then it’s also 7 more than a multiple of 3; any multiple of 15 is also a multiple of 3. So \(x \equiv 7 \equiv 1 \) mod \(3\).

More generally, if we know \(x\) mod \(m\), then we can deduce \(x\) mod \(k\), for any factor \(k\) of \(m\).

Let’s write \(m = kl\), and let’s assume \(k,l\) are both integers greater than 1. If you know \(x\) mod \(kl\) then you can know \(x\) mod \(k\). For if \(x \equiv c \) mod \(kl\), then \(x\) is \(c\) more than a multiple of \(kl\). But any multiple of \(kl\) is also a multiple of \(k\), so then \( x \equiv c \) mod \(kl\).

The upshot is that there is a well-defined natural way to reduce from an integer modulo \(kl\), to an integer modulo \(k\). In other words, there’s a natural function
\[
R^{kl}_k \colon { \text{Integers modulo } kl } \rightarrow { \text{Integers modulo } k }
\]
which does this reduction. If you’ve studied some of this stuff, a fancy way of writing this is to say that there’s a homomorphism of groups or rings
\[
R^{kl}_k \colon \frac{\mathbb{Z}}{kl \mathbb{Z}} \rightarrow \frac{\mathbb{Z}}{k \mathbb{Z}}.
\]
If you haven’t heard those words before, don’t worry; it isn’t necessary for what follows.

Now this idea of reducing from one modulus to another is very relevant in the present problem.

For instance, when we considered the case \(n=5\), we were looking for \(x\) such that \(x^2 \equiv -7 \) mod \(32\). But reducing to mod 16 (i.e. applying \( R^{32}_{16} \) ), such an \(x\) must also satisfy \(x^2 \equiv -7 \) mod \(16\). And we found the solution for such \(x\) mod \(16\) in the previous case \(n=4\): namely, \(x \equiv 3, 5, 11 \) or \(13\) mod \(16\).

Thus, the only possible solutions for \(x\) in the case \(n=5\), solving the equation \(x^2 \equiv -7\) mod \(32\), must be \(x\) mod 32 which reduce to \(3, 5, 11\) or \(13\) mod \(16\). There are 8 such \(x\), namely
\[
x \equiv 3, 5, 11, 13, 19, 21, 27, 29 \text{ mod } 32.
\]
So rather than checking all 32 possibilities for \(x\) mod 32, or (slightly more efficient) checking the 16 odd possibilities for \(x\) mod 32, by paying attention to the previous case we could get down to a mere 8 cases to check. And as we mentioned, among these 8 possibilities, 4 of them provide solutions:
\[
x \equiv 5, 11, 21, 27 \text{ mod } 32.
\]

Lifting from one modulus to another

What we’re doing here is often described as “lifting”: lifting integers mod 16 to mod 32.

For instance, if you know \( x \equiv 3 \) mod \(16\), then you can’t say what \(x\) is mod \(32\), but you can narrow it down to two possibilities: \( x \equiv 3 \) or \(19 \) mod \(32\). So \(x\) has two lifts from mod 16 to mod 32.

A more abstract way of saying the same thing is that, when you consider the function
\[
R^{32}_{16} \colon { \text{Integers mod 32} } \rightarrow { \text{Integers mod 16} },
\]
the integer 3 mod 16 has two preimages. These preimages are its “lifts” to modulo 32. Indeed, every integer mod 16 has two preimages under \( R^{32}_{16} \). Explicitly, the two preimages of a number \(c\) mod \(16\) under \( R^{32}_{16} \) are \(c\) and \(c+16\) mod \(32\).

As you might suspect, there’s nothing special about 16 and 32 here. Let’s again consider general positive integers \(k,l\) greater than 1. Then the function
\[
R^{kl}_k \colon { \text{Integers modulo } kl } \rightarrow { \text{Integers modulo } k }
\]
is surjective, and it’s “\(l\)-to-1”, meaning that every integer modulo \(k\) has precisely \(l\) preimages. Explicitly, the preimages of an integer \(c\) are just given by \(c\) and adding multiples of \(k\) to it some number of times; so the preimages of \(c\) mod \(k\) are
\[
c, c+k, c+2k, …, c+(l-1)k.
\]
These \(l\) preimages are just given by adding \(k\) until we arrive back where we started: from the last number listed here \(c+(l-1)k\), if we added another \(k\), we would arrive at \(c+lk\), which is congruent to \(c\) mod \(kl\) and back where we started.

I haven’t formally proved the statements in the above paragraph, but hopefully they should make sense and the above discussion gives an indication why they’re true. If you’re not entirely convinced, I’d encourage you to try and prove it yourself!

Lifting our Solutions

Let’s now return to the problem at hand and use these ideas to find more solutions. We left off at \(n=5\), finding the solutions \(x \equiv 5, 11, 21, 27 \) mod \( 32 \).

Proceeding to the case \(n=6\), we are trying to solve
\[
x^2 \equiv -7 \text{ mod } 64.
\]
Based on our previous discussion, let’s try to find a solution by lifting the solutions from the previous case from mod 32 to mod 64.

Let’s start from one of our solutions mod 32, \(x \equiv 5 \) mod 32.

It has two lifts, \(5\) and \(5+32 = 37 \) mod \(64\). Do either of these work? We can compute \(5^2 \equiv 25 \) and \(37^2 = 1369 \equiv 25 \) mod \(64\). So neither of them works.

Next, let’s try lifting \(x \equiv 11 \) mod \(32\). It has two lifts \(11, 43\) mod \(64\). We find that \( 11^2 \) and \(43^2 \) are both \( \equiv -7 \) mod \(64\). So both are solutions!

For the sake of completeness, we can check that lifting \(x \equiv 21 \) mod \(32\) gives the possibilities \(x \equiv 21, 53\) mod \(64\) — and both of these are also solutions. And lifting \(x \equiv 27 \) mod \(32\) gives the possibilities \(x \equiv 27, 59 \) mod \(64\), neither of which are solutions.

So, it’s not always the case that a solution mod 32 lifts directly to a solution mod 64. But it’s often the case.

If we can guarantee that we can always go from a solution mod \(2^n\) to a solution mod \(2^{n+1}\), then we’ve solved the problem. We would then be able to take our solutions mod \(2^n\) for small \(n\), and use this guarantee to successively obtain solutions mod higher powers of \(2\). By induction on \(n\) then we can guarantee a solution mod \(2^n\) for all \(n\), which is precisely what the question asks for.

An Unorthodox Lifting

So, let’s work generally. Suppose we have a solution \(x \equiv c\) mod \(2^n\), i.e. we have found \(c\) such that \(c^2 \equiv -7 \) mod \(2^n\).

Then when we lift \(c\) to mod \(2^{n+1}\), there are two possibilities: \(x \equiv c\) or \(c+2^n\) mod \(2^{n+1}\). If either of these satisfies \(x^2 \equiv -7\) mod \(2^{n+1}\), we are finished.

Moreover, when we do this, we know we have to be “pretty close” to a solution, in some sense. We can see this by lifting, not just the \(x\), but also the \(-7\). Indeed, if \(x^2 \equiv -7 \) mod \(2^n\), then \(x^2 \) considered mod \(2^{n+1}\) is a lift of \(-7\) from mod \(2^n\) to mod \(2^{n+1}\). The two lifts of \(-7\) from mod \(2^n \) to mod \(2^{n+1}\) are \(-7\) and \(-7+2^n\). So either
\[
x^2 \equiv -7 \text{ or } -7+2^n \text{ mod } 2^{n+1}.
\]

Anyway, let’s take the lifts \(c\) and \(c+2^n\) of our solution from mod \(2^n\) to mod \(2^{n+1}\), and see how they compare. We can calculate
\[
(c + 2^n)^2 = c^2 + 2^{n+1} c + 2^{2n} \equiv c^2 \text{ mod } 2^{n+1}.
\]
Observe what happened: upon expanding out \( (c+2^n)^2 \), all of the terms except \(c^2\) contained high powers of 2, which modulo \(2^{n+1}\) are just zero and disappear.

This is bad news, because it means that \(c\) and \(c+2^n\) give the same result mod \( 2^{n+1} \) when we square them. So if \(x \equiv c\) is a solution, i.e. \(c^2 \equiv -7 \) mod \(2^{n+1} \), then \(x \equiv c + 2^n \) is also a solution. That’s great, but if \(x \equiv c \) is not a solution, then neither is \(x \equiv c+2^n \).

That’s just as we found above. Some solutions mod \(32\) lift to give a solution mod \(64\), and when they lift they lift doubly well, with both lifts being solutions. But other solutions do not lift to solutions at all.

So the problem becomes a bit harder, on account of the fact that
\[
(x + 2^n)^2 \equiv x^2 \text{ mod } 2^{n+1}.
\]
If only that weren’t true… but we’ll return to that point later.

For now, let’s deal with the reality at hand, where that unfortunate equation holds.

Back to the drawing board…

Our approach at this point has now become an attempt at a proof by induction. Given a solution modulo \(2^n\), we want to obtain from it a solution modulo \(2^{n+1}\). We need to do this in a way that always works. But we now know that we can’t just directly lift a solution each time.

Let’s go back to our equation modulo \( 32 \) and try to lift the solutions to modulo \( 64 \) in a way that “always works”. In other words, let’s try to find a method that, no matter which solution modulo \(32\) we start with, we can end up with a solution modulo \(64\).

We had the solution \(x \equiv 5 \) mod \(32\), which did not lift to a solution mod \(64\). But since \(x^2 \equiv -7\) mod \(32\), we could have known, lifting the \(-7\), that
\[
x^2 \equiv -7 \text{ or } 25 \text{ mod } 64.
\]
However, we found that both lifts \( x \equiv 5, 37 \) mod \(64\) failed to be solutions: both squared to \( 25 \) mod \(64\). And as we know now, if one of them fails, the other is doomed to fail too.

The reason for this failure, we found, was because
\[
37^2 \equiv (5 + 32)^2 \equiv 5^2 + 2 \cdot 5 \cdot 32 + 32^2 \equiv 5^2 \text{ mod } 64.
\]
As you can see, when we add the 32 in the term to be squared, we have added such a high power of 2 that all the terms other than \(5^2\) disappear modulo \(64\). In particular, the cross term \(2 \cdot 5 \cdot 32 \) becomes a multiple of 64.

But what if we added a just slightly lower power of 2? Rather than adding 32, we could add 16. This would yield \(x = 21\), which we found above is actually a solution! Indeed, applying the same algebra as above but with a 16 rather than a 32, we can see that
\[
21^2 \equiv (5+16)^2 \equiv 5^2 + 2 \cdot 5 \cdot 16 + 16^2
\equiv 5^2 + 32 \cdot 5 \text{ mod } 64.
\]
Now rather than evaluating \(32 \cdot 5\), we could just notice that it’s an odd multiple of 32. A multiple of 32 is always going to be congruent to 0 or 32 mod 64, depending on whether we multiply it by an even of an odd number. Here we have an odd multiple, so \( 32 \cdot 5 \equiv 32 \) mod \(64\). Thus,
\[
21^2 \equiv 5^2 + 32 \text{ mod } 64.
\]

This is great news! It says that \(5^2\) and \(21^2\) are not the same, modulo 64: they differ by \(32\). If it turned out that \(5^2 \equiv -7\) mod \(64\), we’d be finished. But it isn’t: it takes the other value, \(5^2 \equiv 25 \) mod \(64\). And in that case, the above equation tells us that \(21^2\) must give us the right value, \(21^2 \equiv 5^2 + 32 \equiv 25+32 \equiv -7 \) mod \(64\).

This argument doesn’t rely at all on the particular powers of 2 involved. It’s going to give us a way to go from a solution mod \(2^n\) to a solution \(2^{n+1}\). It will solve our problem.

… and a proof

Let’s reiterate the above in more generality, which in fact gives a solution to the problem.

Suppose we have an integer \(x\) such that \(x^2 \equiv -7 \) mod \(2^n\). Proceeding as above, we’ll find a \(y\) such that \(y^2 \equiv -7 \) mod \(2^n\).

Well, lifting \(-7\) from mod \(2^n\) to mod \(2^{n+1}\), we must have that either
\[
x^2 \equiv -7 \text{or} -7+2^n \text{ mod } 2^{n+1}.
\]
In the first case, we are finished: we can just take \(y = x\). In the second case, let \(y = x + 2^{n-1} \). Then we have
\[
y^2 = (x + 2^{n-1})^2
= x^2 + 2^n x + 2^{2n-2}
\equiv x^2 + 2^n x \text{ mod } 2^{n+1}.
\]
Now as \(x^2 \equiv -7 \) mod \(2^n\), \(x\) is definitely odd. So \( 2^n x \) is an odd multiple of \(2^n\). Modulo \(2^{n+1}\), all multiples of \(2^n\) are congruent to \(0\) or \(2^n\): even multiples are congruent to \(0\), odd multiples are congruent to \(2^n\). Thus \(2^n x \equiv 2^n \) mod \(2^{n+1}\). We then conclude that
\[
y^2 \equiv x^2 + 2^n \text{ mod } 2^{n+1}.
\]
Since we assumed in this case that \( x^2 \equiv -7 + 2^n \) mod \( 2^{n+1} \), we therefore have
\[
y^2 \equiv -7 + 2^n + 2^n \equiv -7 \text{ mod } 2^{n+1},
\]
which is precisely what we were looking for.

We have shown that, given \(x\) satisfying \(x^2 \equiv -7 \) mod \(2^n\), there exists \(y\) satisfying \(y^2 \equiv -7\) mod \(2^{n+1}\).

So, since we found solutions modulo \(2^n\) for small \(n\), we now have proved, by induction, that for all positive integers \(n\), there exists \(x\) such that \(x^2 \equiv -7 \) mod \(2^n\).

We have solved the problem!

What if life was easy?

In this problem we had to solve a specific equation in modular arithmetic. We did it by repeatedly “lifting” solutions. Given a solution modulo \(2^n\), we used it to find a solution modulo \(2^{n+1}\).

It would have been easier if we could directly lift the solutions. In other words, it would have been easier if whenever \(x\) was a solution modulo \(2^n\), it was also a solution modulo \(2^{n+1}\). That wasn’t quite true in this problem, and we have to adjust our \(x\) by \(2^{n-1}\) if necessary in passing from arithmetic modulo \(2^n\) to arithmetic modulo \(2^{n+1}\).

Sometimes, however, you can lift your solutions directly.

Sometimes, life doesn’t have to be difficult.

Let’s step back a bit and think about the type of problem we were trying to solve. We had a polynomial equation modulo \(2^n\):
\[
P(x) \equiv 0 \text{ mod } 2^n.
\]
In this problem, we had \(P(x) = x^2 + 7 \).

Let’s change the problem to a different polynomial: let \( Q(x) = x^3 + 7 \), and we’ll consider the similar-looking polynomial equation
\[
Q(x) \equiv 0 \text{ mod } 2^n,
\text{ i.e. }
x^3 + 7 \equiv 0 \text{ mod } 2^n.
\]

Now, mod \(2,4,8\), there’s the straightforward solution \(x=1\).

Mod \(16\), \(x=1\) is not a solution, but the other lift of \(1\) from mod 8 to mod 16 is a solution, i.e. \(x=9\).

Mod \(32\), \(x=9\) is still a solution.

Mod \(64\), \(x=9\) is not a solution, but the other lift of \(9\) from mod 32 to mod 64 is a solution, namely \(x=41\).

And so on.

So it appears that when we’re solving \(x^3 + 7 \equiv 0 \) mod \(2^n\), we can just lift solutions directly. Going from a solution modulo \(2^n\) to \(2^{n+1}\) there are two lifts, and one of them always seems to work.

Let’s see if we can prove this, using a similar method as for the initial, harder, problem. Suppose we have a solution \(x\) to the equation \(x^3 + 7 \equiv 0 \) mod \( 2^n \); we’ll try to find a \(y\) such that \(x^3 + 7 \equiv 0 \) mod \(2^{n+1}\).

Well, since \(x^3 + 7 \equiv 0 \) mod \(2^n \), then there are two possibilities for \(x^3 + 7 \) modulo \(2^{n+1}\):
\[
x^3 + 7 \equiv 0 \text{ or } 2^n \text{ mod } 2^{n+1}.
\]
In the first case, we have our solution: we may simply take \(y=x\) and then \(y^3 + 7 \equiv 0 \) mod \(2^{n+1}\) as desired. In the second case, consider letting \(y\) to be the other lift of \(x\) from mod \(2^n\) to mod \(2^{n+1}\), namely \(y = x+2^n\).

We can then compute \(y^3 + 7 \) mod \(2^{n+1}\), using similar algebra as in the original question:
\[
y^3 + 7
= (x+2^n)^3 + 7
= x^3 + 3x^2 2^n + 3x 2^{2n} + 2^{3n} + 7.
\]
Now by assumption \(x^3 + 7 \equiv 2^n \) mod \(2^{n+1}\). Moreover, any term containing a factor with \(2^a\) with \(a \geq n+1\) reduces to zero modulo \(2^{n+1} \), since it’s a multiple of \(2^{n+1}\). Thus the terms with a \(2^{2n}\) and a \(2^{3n}\) disappear. Thus the above expression simplifies, and we obtain
\[
y^3 + 7 \equiv 2^n + 3x^2 2^n
\equiv 2^n (1 + 3x^2) \text{ mod } 2^{n+1}.
\]

Here we have a multiple of \(2^n\). And as we discussed earlier, there aren’t many possibilities for multiples of \(2^n \) modulo \(2^{n+1}\): even multiples of \(2^n\) are in fact multiplies of \(2^{n+1}\), hence congruent to \(0\) mod \(2^{n+1}\); while odd multiples of \(2^{n}\) must be congruent to \(2^n\) mod \(2^{n+1}\).

Since \(x\) satisfies \(x^3 + 7 \equiv 0 \) mod \(2^n\), we see that \(x\) must be odd. So \(3x^2 \) is odd and \(1+3x^2 \) is even. Thus \(2^n (1+3x^2) \) is an even multiple of \(2^n\), hence congruent to 0 mod \(2^{n+1}\), and we obtain
\[
y^3 + 7 \equiv 0 \text{ mod } 2^{n+1}
\]
as desired.

Thus, we have shown that the solution \(x\) of \(x^3 + 7 \equiv 0 \) mod \(2^n\) lifts directly to a solution \(y \) of \(y^3 + 7 \equiv 0 \) mod \(2^{n+1}\). It may be that \(y\) is one lift \(x\) or the other lift \(x+2^n\), but one of those two lifts works.

This is much more straightforward behaviour than in the original problem! Life is easy now!

Why is life sometimes easy and sometimes hard?

So, what’s the difference between these two cases?

With the polynomial \(P(x) = x^2 + 7 \), a solution of \( P(x) \equiv 0 \) mod \(2^n\) did not lift directly to a solution mod \(2^{n+1}\), although we were able to find solutions by using a trick.

With the polynomial \(Q(x) = x^3 + 7 \), a solution of \( Q(x) \equiv 0 \) mod \(2^n\) did lift directly to a solution mod \(2^{n+1}\).

What is so different about \(P(x)\) and \(Q(x)\)?

At one level, the difference derived from a technicality in trying to lift solutions. From a solution \(x\) of \(P(x) \equiv 0 \) mod \(2^n\), we considered the lifts \(x\) and \(x+2^n\) mod \(2^{n+1}\). In expanding out \( P(x+2^n) \), too many terms cancelled, leaving us with just the terms of \( P(x) \) :
\[
P(x+2^n) = (x+2^n)^2 + 7
\equiv x^2 + 7 \equiv P(x) \text{ mod } 2^{n+1}.
\]
But doing the same for \( Q(x) \), there wasn’t quite as much cancellation:
\[
Q(x+2^n) = (x+2^n)^3 + 7
\equiv x^3 + 3x^2 2^n + 7
\equiv Q(x) + 3x^2 2^n \text{ mod } 2^{n+1},
\]
and in fact we were able to show that the extra term \( 3x^2 2^n \) was congruent to \( 2^n \) mod \(2^{n+1} \), so that
\[
Q(x+2^n) \equiv Q(x) + 2^n \text{ mod } 2^{n+1}.
\]
This last equation effectively says that \( Q(x+2^n) \) and \( Q(x) \) are the two lifts of \( Q(x) \) from mod \( 2^n \) to mod \( 2^{n+1} \); so one or the other lift of \(x\), i.e. either \(x\) or \(x+2^n\) will be the solution we’re looking for.

What’s happening, in some sense, is that the solution \(x\) to \( P(x) \equiv 0 \) mod \(2^n\) was “too good”: it was so good that \( P(x+2^n) \) cancelled too much!

In fact, there is a sense in which the solution \(x\) of \( P(x) \equiv 0 \) mod \(2^n\) was a “repeated root”, just like \(x=2\) is a repeated root of the polynomial \( (x-2)^2 \) over the real numbers.

Repeated roots in modular arithmetic

Now we have to be very careful with saying things like “repeated roots” for polynomials in modular arithmetic. Take arithmetic modulo 8 for instance. Then for one thing, there is no unique factorisation of polynomials, even if we regard integers purely as mod 8: for instance,
\[
(x+4)^2 = x^2 + 8x + 16 \equiv x^2 \text{ mod } 8,
\]
so that we have two distinct factorisations of the same polynomial
\[
(x+4)^2 \equiv x^2 \text{ mod } 8.
\]
Thus it doesn’t really make sense to talk about the “roots” of a polynomial mod 8, in the same way that we talk about the roots of a polynomial over the real or complex numbers.

(However, as it turns out, modulo a prime, there is unique factorisation of polynomials, and you can talk about the roots of a polynomial in a very similar way as over the real or complex numbers.)

Nonetheless, you can definitely talk about the solutions of a polynomial equation modulo 8; and indeed we’ve been doing that for a while now.

Perhaps the best way to understand what “repeated roots” might mean modulo 8 is to forget about factorising polynomials and instead differentiate them.

Differentiate? As in calculus? As in the subject all about real numbers and limits and tangent lines and so on, having very little to do with modular arithmetic?

Yes. That sort of differentiation. Let’s see why.

Simple roots and derivatives

Let’s forget about modular arithmetic for the minute, and think about polynomials in general, with (say) real numbers for coefficients. So, let \( F(x) \) be a polynomial.

Recall that a root of \(F(x)\) is a number \(r\) such that \(F(r) = 0\). This is equivalent to \( x-r \) being a factor of \(F(x)\).

We say a root is a repeated root or a non-simple root if in fact \( (x-r)^2 \) is a factor of \(F(x)\).

Otherwise, we say a root is a simple root. In this case, \( x-r \) is only a factor of \(F(x) \) once. In other words, \( F(x) = (x-r)H(x) \) where \( x-r \) is not a factor of \(H(x)\), or equivalently, where \( H(r) \neq 0 \).

One interesting fact about polynomials is that you can detect whether root is simple via differentiation.

FACT. Let \(r\) be a root of \(F(x)\). Then \(r\) is a non-simple (i.e. repeated) root if and only if \( F'(r) = 0 \).

In other words, a root of \( F(x) \) is a repeated root if and only if it’s also a root of the derivative \( F'(x) \).

The proof comes from applying the product rule.

PROOF. If \(r\) is a repeated root then \( F(x) = (x-r)^2 G(x) \) for some polynomial \( G(x) \). By the product rule then
\[
F'(x) = 2(x-r)G(x) + (x-r)^2 G'(x)
\]
and since both terms have a factor of \(x-r\), when we substitute \(x=r\) we obtain \( F'(r) = 0 \). If \(r\) is not repeated, then \( F(x) = (x-r) H(x) \) where \( H(r) \neq 0 \). In this case the product rule gives
\[
F'(x) = H(x) + (x-r)H'(x),
\]
and substituting \(x=r\) gives \( F'(r) = H(r) \neq 0 \).

Well, that’s a nice little fact. And we can try to apply it over the integers… or in modular arithmetic.

Given a polynomial \(F(x)\) with integer coefficients, its derivative \(F'(x) \) is still a polynomial with integer coefficients. And even if you consider the integer coefficients modulo 8 (or any other modulus), the derivative is still well defined modulo 8 (or any other modulus).

We can use this idea to define a notion of simple or repeated roots, even when working with modular arithmetic. When we have a root \(r\) of a polynomial \(F(x)\), we look at \(F'(r)\). As we’ll see, a similar idea applies, but there is a slight twist.

Let’s return to our examples discussed above.

For \( P(x) = x^2 + 7 \), modulo 8, we have the root \(1\), since \(P(1) \equiv 0 \) mod 8. Its derivative is \( P'(x) = 2x \), and \(P'(1) \equiv 2 \) mod 8.

For \( Q(x) = x^3 + 7 \), modulo 8, we have the root \(1\), since \( Q(1) \equiv 0 \) mod 8. Its derivative is \( Q'(x) = 3x^2 \), and \( Q'(1) \equiv 3 \) mod 8.

Now as it turns out, \( 1 \) is a “repeated root” of \( P(x) \), but it is a simple root of \( Q(x) \).

As we can see, it’s not true that \( P'(1) = 0 \), as we might have expected for a repeated root. However, it is even: that is, \( P'(1) \equiv 0 \) mod 2. The key difference is that 2 is odd, but 3 is not!

As it turns out, the way to define simple and repeated roots of a polynomial \(F(x)\), when we are working modulo \(2^n\), is to look at the derivative \(F'(x)\), not mod \(2^n\), but simply mod \(2\). And more generally, the same ideas apply when we work modulo powers of any prime.

In this way, we can make sense of simple and non-simple roots, even when the idea of unique factorisation of polynomials doesn’t otherwise work. (We’ll call them “non-simple” rather than “repeated” since the whole idea of unique factorisation breaks down.)

DEFINITION. Let \( F(x) \) be a polynomial whose coefficients are integers mod \(p^k\), where \(p\) is a prime and \(k\) is a positive integer.

  • We say an integer \(r\) mod \(p^k\) is a root of \(F(x)\) if \(F(r) \equiv 0 \) mod \(p^k\).
  • If in addition \( F'(r) \equiv 0 \) mod \(p\), we say \(r\) is a non-simple root.
  • Otherwise, i.e. if \( F'(r) \neq 0 \) mod \(p\), we say \(r\) is a simple root.

With this definition, \(1\) is a non-simple root of our original polynomial \(P(x) = x^2 + 7 \) mod \(8\), since \(P'(x) = 2x\) and \(P'(1) \equiv 0 \) mod \(2\).

On the other hand, \(1\) is a simple root of the polynomial \(Q(x) = x^3 + 7 \) mod \(8\), since \(Q'(x) = 3x^2 \) and \(Q'(1) = 3 \neq 1 \) mod \(2\).

Hensel’s Lemma

We’ve seen that in certain circumstances, solution of polynomial equations modulo \(2^n\) lift to solutions modulo \(2^{n+1}\). Or put another way, we’ve seen that roots of polynomials over the integers modulo \(2^n\) lift to roots of polynomials over the integers modulo \(2^{n+1}\).

We found that for \(P(x) = x^2 + 7\), roots of \(P(x) \equiv 0 \) mod \(2^n\) did not lift to roots mod \(2^{n+1}\). But we also found non-simple roots in this case.

On the other hand, we found that for \(Q(x) = x^3 + 7 \), roots of \(Q(x) \equiv 0 \) mod \(2^n\) did lift to roots mod \(2^{n+1}\). And in this case the roots were simple.

As it turns out, this is true in complete generality. When we have simple roots mod \(2^n\), they always lift to roots mod \(2^{n+1}\). And this is true not just mod \(2^n\), but mod \(p^n\) for any prime \(p\).

This result is called Hensel’s lemma. Let’s state it.

HENSEL’S LEMMA. Let \(F(x)\) be a polynomial with integer coefficients. Let \(p\) be a prime and \(k\) a positive integer. Suppose \(r\) mod \(p^k\) is a simple root of \(F(x)\) modulo \(p^k\), i.e.
\[
F(r) \equiv 0 \text{ mod } p^k
\text{ and }
F'(r) \neq 0 \text{ mod } p.
\]
Then there is a lift \(s\) of \(r\) mod \(p^{k+1}\) which is a root of \(F(x)\) modulo \(p^{k+1}\), i.e.
\[
s \equiv r \text{ mod } p^k
\text{ and }
F(s) \equiv 0 \text{ mod } p^{k+1}.
\]

Hensel’s lemma, as we’ve seen, doesn’t apply directly to our original problem. But it does apply to the easier problem with the polynomial \( Q(x) = x^3 + 7 \). Let’s see how quickly the solution goes now when we revisit the problem.

PROBLEM. Show that for any positive integer \(n\), there is an integer \(x\) such that \(Q(x) = x^3 + 7\) is divisible by \(2^n\).

SOLUTION. Proof by induction on \(n\). When \(n=1\) we observe that \(x=1\) is a solution, since \(Q(1) = 8 \equiv 0 \) mod \(2\). Now suppose we have a root \(r\) of \(Q(x)\) modulo \(2^n\). Then as \(r^3 + 7 \equiv 0 \) mod \(2^n\), \(r\) must be odd. Thus \( Q'(r) = 3r^2 \) must be odd, so \( Q'(r) \neq 0 \) mod \(2\), and \(r\) is a simple root. By Hensel’s lemma then \(r\) lifts to a root \(s\) of \(Q(x)\) mod \(2^{n+1}\). So any root of \(Q(x) \) mod \(2^n\) lifts to a root mod \(2^{n+1}\), and by induction then \(Q(x) \) has a root modulo \(2^n\) for all positive integers \(n\), as desired.

And there you have it.

Return of the Euler-Fermat theorem

A long long time ago, in a galaxy far away, I wrote up an account of the Euler-Fermat theorem for school students.

The Euler-Fermat theorem is the wonderful result of elementary number theory, which says that in modular arithmetic, you can often multiply a number by itself a certain number of times to get back to 1.

For instance, consider the integers modulo 14, i.e. remainders upon division by 14. Starting from the number 3, if we repeatedly multiply it by itself, i.e. we consider powers of 3, we obtain
\[
3, 3^2 = 9, 3^3 = 27, 3^4 = 81, 3^5 = 243, 3^6 = 729, …
\]
which reduce modulo 14 to (i.e. their remainders upon division by 14 are)
\[
3, 9, 13, 11, 5, 1, …
\]
Since, as we have just calculated, \( 3^6 \equiv 1 \) mod 14, this means that subsequent powers of 3 will reduce mod 14 to precisely the same numbers; they will continue to cycle around. For instance,
\[
3^7 = 3^6 \times 3 \equiv 1 \times 3 \equiv 3 \text{ mod } 14, \\
3^8 = 3^6 \times 3^2 \equiv 1 \times 9 \equiv 9 \text{ mod } 14, \\
3^9 = 3^6 \times 3^3 \equiv 1 \times 13 \equiv 13 \text{ mod } 14,
\]
and so on. The powers of 3 cycle around modulo 14, repeating the same 6 numbers.

So, the fact that \( 3^6 \equiv 1 \) mod 14 gives a very nice structure to powers of 3, modulo 14.

We can try the same thing starting from some different numbers — still working modulo 14, i.e. taking remainders upon division by 14.

If we do the same for powers of 5 modulo 14, we find the same thing: again \( 5^6 \equiv 1 \) mod 14, and the powers of 5 also repeat through 6 numbers.

With powers of 9 modulo 14, we find something similar, but not quite the same: here \( 9^3 \equiv 1 \) mod 14, so the powers of 9 form a cycle of length 3, repeating through 3 numbers.

With powers of 4 modulo 14, however, something goes wrong. Taking powers of 4, we never arrive back at 1 modulo 14. And if you think about it you can see why: 4 is even, hence so are all its powers. But a number congruent to 1 mod 14 must be odd, so this can never happen.

The same is true starting from 7. The powers of 7 are all divisible by 7, but a number congruent to 1 mod 14 must be 1 more than a multiple of 7.

In order to have a chance of arriving back at 1, with the powers cycling around, the number we start from must have no factors in common with 14. In other words, it must be relatively prime to 14.

The numbers 3, 5, 9 are all relatively prime to 14. And we found \( 3^6 \equiv 1 \) mod 14, \( 5^6 \equiv 1 \) mod 14, and \( 9^3 \equiv 1 \) mod 14. The power required is 6 in the first two cases and 3 in the other. But notice that 3 is a factor of 6, so in fact
\[
9^6 = (9^3)^2 \equiv 1 \text{ mod } 14
\]
as well. So starting from any of these numbers, its sixth power leaves a remainder of 1 modulo 14.

So now we arrive at a conjecture: if \( a \) is an integer relatively prime to 14, then \( a^6 \equiv 1 \) mod 14.

This conjecture turns out to be true. And in fact, something like it is true not just when we look modulo 14, but modulo any positive integer \( n \).

But as you might suspect, you don’t always raise numbers to the power 6. When you replace 14 by a general number \( n \), what do you replace the exponent 6 with?

The answer is, the Euler phi function \( \phi(n) \). Given a positive integer \( n \), we define \( \phi(n) \) as the number of positive integers less than \( n \) which are relatively prime to \( n \).

So, for instance, the positive integers less than 14 which are relatively prime to 14 are
\[
1, 3, 5, 9, 11, 13.
\]
There are six of these, so \( \phi(14)=6 \).

And so the Euler-Fermat theorem can be stated as follows.

THEOREM. For any positive integer \(n\) and any integer \( a \) relatively prime to \( n \),
\[
a^{\phi(n)} \equiv 1 \text{ mod } n.
\]

To really understand why this theorem is true, I think one should understand a little group theory. Just the first little bit of group theory.

And so, a long long time ago, in a galaxy far far away, I thought I would write up such a proof, introducing just as much group theory as one needs for it. The result is intended for talented high school students or interested members of the public.

Today, I rediscovered it. You can read it below!

Euler-Fermat-theorem

Topology: The shape of space

Monash Open Day in 2020 was a purely online affair, thanks to COVID. I recorded a video talking about Topology: The Shape of Space. It’s a first attempt at making videos like this, and some of the audio is not great, but it’s available here.

,,, In which I talk coffee cups and donuts, play Pac Man, solve  a toroidal maze, fly through dodecahedral space, and more.

University forever

I don’t have anything against people who want to stay at university as long as possible: this is, after all, my life. Well, if you are failing everything, than perhaps there is a reasonable argument you shouldn’t be there.

I think long term economic trends point in this direction though.

With improving technology and automation and so on, it may well be that there aren’t enough jobs to go around for everybody. I don’t think we’re anywhere near a science-fiction type situation where robots can do all the work and humans can live in paradise (would it be paradise?), but it’s certainly the case that we don’t need all adults working 40 hour weeks to produce our current level of wealth. The pandemic has conclusively demonstrated that we can essentially shut down vast portions of the economy, without shortages of food, without shortages of essentials, without losses of basic amenities or utilities, and in fact without shortage of most luxury consumer products either.

There are of course many types of hardship going on, great sadness and loneliness and alienation, and many of the failings of our society have been exacerbated.

But the economic point remains: our society really can produce more than everybody needs, with only a relatively small proportion of the working population.

In that case, pandemics aside, we don’t have to “work”, in the usual sense of the word, very hard to produce all our society’s requirements. Rather than sitting around unemployed, a civilized society would spread the work around, shorten the working week, and provide more and more opportunities for self-development, through art, literature, science, and education — and whatever else, for that matter.

There are plenty of platitudes spoken by politicians about “lifelong learning” — why not make it formally possible to continue learning throughout your entire life?

Capitalism cannot do this. A civilized society could.

Sitting out the math wars

I recently had the misfortune to revisit some episodes of the “math wars” — the ongoing struggle over mathematics education and curriculum, mostly at the school level. It’s an important struggle, with great ramifications for our education system, not to mention one in which I have a personal stake, being a mathematician and all. But very few professional mathematicians have been involved in them, and when they have, they have not always inspired confidence. The very worst behaviour in these “wars” has come from professional mathematicians.

Putting aside the worst behaviour (but only temporarily), I wondered why. Indeed, I’ve never gotten myself involved in it. I think the reasons are worth examining.


Immediate disclaimers are necessary regarding (1) “math” and (2) “wars”.

I say “math” rather than “maths”, because, as with many of struggles over human culture in the West, the focal point of the struggles have been in the US. Of course, those struggles have their resonances and reverberations in Australia — the turbulence splitting off an Oceanic vortex across the Pacific.

And I detest the “war” analogy. It’s not a war in any meaningful sense. It’s a policy debate with some important political-economic and ideological undercurrents. And war analogies are overused and our society is over-militarised.

So, this is a bad usage, but it’s a standard usage. Hence the scare quotes. It’s “math wars” all the way through.


So why would mathematicians sit out the “math wars”?

For myself, the “math wars” are not a debate in which I’ve often felt I have much to contribute — despite having been involved in mathematics teaching, in one way or another, for over two decades. There are several reasons.

Firstly, my experience of learning mathematics is far from average, in any sense, and so I am naturally skeptical of any attempt to generalise to others those learning or teaching techniques that have worked for me. Arguably, I look like the best but am in fact the worst qualified person to opine on the topic.

Secondly, at the school level, my teaching experience is almost entirely limited to the teaching talented or advanced students. This type of teaching presents its own challenges, but it does not reflect the average teacher’s experience in the slightest. The challenges of extending precocious geniuses are essentially completely irrelevant to the challenges of getting the average class interested in and proficient at mathematics.

Thirdly — of which I am reminded every time I actually teach — I’ve never felt like a natural teacher: a natural explainer, perhaps, but teaching is much more than that. At most, I’ve become an experienced lecturer, I know what works for me and think I do a good job of it, but I recoil from the idea that this gives me any authority to speak on what others should do.

Because, finally, all my scientific and political commitments militate against pronouncements on the topic. The mathematician’s allergy to over-generalisation, the scientist’s skepticism to the applicability of findings beyond their domain, the anarchist’s aversion to telling others what to do, the socialist’s solidarity with a class of underpaid and underappreciated workers, and the general democratic injunction against thinking you know what is best for others in their own profession — all these, when I have some idea about classroom teaching, scream in unison that in fact I don’t know the first thing about teaching the average maths class in Australian schools today, and that greatly limits what I could say about the curriculum that should be taught.

Essentially, I have enormous respect for primary and secondary teachers — they do a thankless but crucially important job, are underpaid and overworked — and for all that, suffer the disrespect of parents and students every day. Telling them what to do, what curriculum is best for their class seems not only distasteful and uninformed, but also, in a certain sense, wrong.

On a deeper level, I’ve always felt that battles over curriculum and teaching are not the real issue, and that it is really simply the policy end of a much deeper cultural problem.

Australian culture, in general, despises mathematics. It is the only essential school subject where an Australian can declare its uselessness, and generally be assured sympathy. It is the only school subject on which, generally speaking, one can confess their ignorance or incompetence and be assured of understanding rather than embarrassment, without a hint of guilt. One can confess to not being much good at geography, or history, or other sciences — and even if it is more a celebration of ignorance than charming self-deprecation, it is still a confession, with a hint of remorse. Not so with simultaneous equations and the like, upon which opinion may vary from collective recollection of terrible classroom experiences, through to vengeful abuse of the educational system.

Given that cultural status, is it any surprise that students might be uninterested in mathematics? That there might not be many students studying mathematics to a high or advanced level? That there might not be many people studying to be mathematics teachers — and hence, a shortage of qualified mathematics teachers? And hence, the reproduction of a populace which despises mathematics?

On cultural matters I tend to throw up my hands, if not in despair, then because of my inability to do anything to change it. I can explain mathematics to people, why it is interesting and fascinating and beautiful and so on. But, in news which will come as a surprise to no one, I don’t really like my chances of making it cool.

Of course, all these difficulties — general derision for the subject, declining enrolments, teacher shortages — run together. An improvement in one can lift all the rest, and a failure in one can lower all the others. But being so total makes the problem a formidable one.

And that leads to a great sense of disconnection. I feel an enormous chasm between my experience and interests, and those of the general population. Why wouldn’t everyone want to try to figure out what on earth (on earth!) this universe is, and what it is, and the mechanisms that underlie it? How can it be that not everyone is captivated by mathematical truth? Antipathy, or even indifference, to mathematics is to me an alien concept.

Of course I understand that people have different interests, and diversity of human interests is a good thing. And this chasm does not of course prevent me from trying to convince others that these topics are fascinating, nor from trying to promote understanding of these topics. And of course I’m aware that the secondary mathematics curriculum can be boring and lifeless. (At school I much preferred history classes to my official school maths classes.)

But all this does give me pause as to how to address such a radically different audience. The reality is that I am the radically different one, not everyone else.


So that is my navel-gazing examination of my reticence on the subject. And much of it will clearly generalise to the mathematical community, though much will not. Not every mathematician shares my politics or my cultural background, but I think many would feel a long way removed from classroom teaching of their subject, a sense of being cultural outsiders in their embrace of mathematics, and despite their expertise in the subject, a general lack of knowledge about what would actually work in classrooms.

But this reticence, though perhaps justified in each individual case, may collectively be counterproductive. Arguably mathematics curriculum and assessment have descended into various domains of stupidity. Arguably the whole debate would benefit from more involvement by knowledgeable mathematicians.

And as per Bertrand Russell, “The trouble with the world is that the stupid are cocksure and the intelligent are full of doubt.” Not to suggest I am at all intelligent on this matter, or that anyone else involved is not — but I am certainly full of doubt. (It’s about the only thing about which I have no doubt!) Perhaps Russell’s observation is a reason for some of us who are knowledgeable on the subject matter, at least, to put aside some doubts occasionally, as to what we can and should say.

And even though mathematicians may not know the answer to every question, one does not need to know the answer to every question. It is sufficient to be able to say something. Or even, simply to say something is bad.

This is nothing new. I grew up learning mathematics in a curriculum that was essentially universally understood by all those who taught me, and all those who I respect, to be heading in a ridiculous direction. Everyone decried the increasing reliance on calculators as a substitute for mathematical understanding. That has continued, with whole exams now written as a test of calculator usage. (Who even uses calculators any more? Phones are more powerful devices.)

Trends continue. My recent years of teaching first year university students have been marked by the steadily decreasing knowledge of incoming students. There is always some need to remove topics from the secondary curriculum, and never any room to strengthen it or include anything new. It can always be hollowed out and weakened, but never filled out or strengthened. I then have the task, in teaching university classes, of filling in the gaps and bringing students up o speed.


But these are not the main reasons I think mathematicians should say more about mathematics curriculum and teaching. Rather astonishingly though it sounds to say it, I think the most important reason why more mathematicians should say something on these matters is to make for greater civility.

Now, civility is one of the worst topics in political discourse.

One terribly overused, and bad, contemporary political argument is that extreme voices dominate political debate, and we need more civility. In one sense this is true: more and more right wing and even fascist voices have become prominent and even dominant in politics across the world in recent years. And debate increasingly happens on social media platforms which algorithmically promote extreme content because it generates “engagement”. But more usually the argument is made by centrists who decry this alongisde some supposed left-wing equivalent, such as the political movements behind Jeremy Corbyn or Bernie Sanders. There is just no comparison between their mild social democratic politics, and the resurgent reactionaries on the right. To the extent politeness or civility towards such reactionary forces normalises or strengthens them, we need less politeness or civility. Civility is not the problem here.

Even worse, the charge of incivility is often brought up against those who are genuinely and justifiably angry at the system. It can combine with racist and sexist attitudes to depict people of colour as irrationally angry, women as shrill, and so on. This sort of incivility, an expression of justifiable rage, is necessary for social change, and criticising it for its incivility is beside the point.

But the “math wars” are another matter. They arouse great passion and controversy in some mathematicians, educators and teachers. And this is a legitimate controversy, with deep political and ideological divides. But there is nothing to justify incivility, in the way that there is to justify rage against police brutality, misogyny, or fascism.

In my experience, the incivility is largely between mathematicians and mathematics educators. And the incivility is completely in one direction: by the mathematicians, towards the maths educators.

It’s not uncommon for mathematicians to express contempt for mathematics educators. It is not the only attitude, and in my experience in recent times not the most common one, but in my experience it is a standard attitude.

It’s disturbing how this contempt fits gender stereotypes. Professional mathematicians are overwhelmingly male, and mathematics education researchers are overwhelmingly female — and even if the contempt of the former for the latter is purely based on intellectual or political differences, it can easily look like misogyny.

Incredibly, the “math wars” have their own “cancel culture”. In the worst incident, though now over a decade ago, a group of prominent mathematicians tried to discredit a leading maths education researcher, and get her fired. Their accusations of misconduct which were found to be baseless, but not before succeeding in driving her out of a university position.

Legitimate criticism of the views or research of others is one thing. This was another.

This sort of incivility has to stop. If mathematicians on one side of the debate feel that they have been “losing” the “math wars”, this kind of incivility is part of the reason for it. More civil voices need to be heard. Mathematicians need to reject such incivility, and contribute in a democratic spirit.

Not human, but inhabited by humans: writing mathematics

Mathematics can be written in many ways. One approach, very popular with professional pure mathematicians, is to write as little as possible. Often the best proof of a mathematical theorem is the shortest and most elegant. This fact, combined with some of the history and culture of mathematics, leads to the classic terse mathematical style: theorem-proof, theorem-proof, lemma-theorem-proof, definition-prposition-theorem-proof, and so on.

(The fact that most mathematicians dislike writing may also have something to do with this!)

I think that, on every mathematical subject, there ought to be texts which are written in this way: short, crisp, elegant, minimalist. But there should also be others.

The standard terse style is, however, imperfect for learning mathematics — especially anyone below PhD level. Perhaps this style is tolerable for learning how the proofs go. It’s useful for understanding the exact steps in rigorous proofs of theorems. And it often works well with a highly motivated or sophisticated reader — one who understands that reading such books is not actually about reading, but about knowing when and how to ask oneself questions, filling in the details which have been omitted. The standard style is hard, both in the sense of “not easy” to read, and in the sense of “not soft”, with no surrounding story or context.

Such an approach is fine for insiders: those who already understand the culture and the conventions of mathematical literature. But for learners — particularly those with a weak background, as is increasingly the case — it is a different matter.

What tends to be left out in the standard terse style? Everything that makes mathematics human: history and context; motivation; commentary; connections within and beyond mathematics. And even a mathematician may not appreciate every book being so hard to read.

Other mathematicians may disagree, but given a choice between terse text, and a gentle version which is twice as long but twice as easy to read — and full of interesting details and tidbits — my preference is clear: genial is better than brutal and terse.

Why are we interested in the topic we are talking about? What are its implications and connections? Why do we cover the parts of the subject that we do? Why do we use the arguments we do, and why not others? Where did this proof come from? How could we use similar ideas to prove other results? These questions are often as important as the mathematical content itself.

More generally, contemporary curriculum and culture, at least in Australia, leads to the situation that students may know little about the background of their subject — even when they are studying at an advanced level.

Further, the classic terse approach can descend into a combination of intimidation and disrespect. Proofs and arguments are routinely omitted as “obvious” or “trivial”. Steps are skipped. Some readers may be fine with one or two gaps to fill in themselves, though no harm would have been done had the author included it; but every skipped step is a potential hazard, and a successful reader must navigate them all.

In extreme cases, authors leave a trail of breadcrumbs which the reader may be able to pick up and follow along, if they have enough knowledge or curiosity or insight or gumption or tenacity or luck. Mathematical writing then becomes a set of puzzles, where every sentence must be solved by the reader to progress to the next. Mathematicians in certain fields will know certain “classic” texts in the mathematical literature are precisely of this type. All this in the supposed pursuit of communicating mathematics as fast and efficiently as possible!

Such an approach makes reading mathematics, in its terse classic style, a completely different affair from reading almost any other subject.

Why erect walls of unexplained argumentation, and browbeat those who cannot scale them with cries of “obvious” and “trivial”?

For students first arriving upon the abstract world of pure mathematics, it can seem a harsh, even brutal subject. That is because it is a harsh, brutal subject. Mathematics does not forgive your one mistaken observation: your proof will come crashing down despite your pleadings. Most of your thoughts on mathematics will be wrong — to the extent they are even precise enough to be wrong. To do mathematics is to work through all the wrong thoughts to make them right.

Mathematical arguments are true independent of what humans think of them: in this sense, the truths of mathematics live in their own world, a world that has no feelings and is not human. The independence of mathematics from the human world is the source of an austere beauty, but it can also make the subject seem cold and desolate.

It is a cold world, it is a harsh world, but it is a beautiful world, and its statements are pure, honest, and beautiful. And while it is not human, it is a world inhabited by humans. It also provides the language of science and the universe.

Some can brave entry to this world themselves. But why should we not provide some guidance as to the nature of this world, as we enter it?

A-polynomials, Ptolemy varieties, and Dehn filling, Melbourne June 2020

On 15 June 2020 I gave a talk in the topology seminar at the University of Melbourne.

Title: A-polynomials, Ptolemy varieties, and Dehn filling, Melbourne June 2020

Abstract: The A-polynomial is a 2-variable knot polynomial which encodes topological and hyperbolic geometric information about a knot complement. In recent times it has been shown that the A-polynomial can be calculated from Ptolemy equations. Historically reaching back to antiquity, Ptolemy equations arise all across mathematics, often alongside cluster algebras.

In recent work with Howie and Purcell, we showed how to compute A-polynomials by starting with a triangulation of a manifold, then using symplectic properties of the Neumann-Zagier matrix to change basis, eventually arriving at a set of Ptolemy equations. This work refines methods of Dimofte, and the result is similar to certain varieties studied by Zickert and others. Applying this method to families of manifolds obtained by Dehn filling, we find relations between their A-polynomials and the cluster algebra of the cusp torus.

20-06_unimelb_talk_web