Liouville structures and convex surfaces

(This post is the third in a series on geometry. (A geometric series, har har har.) They all assume you know about differential forms and such things. The first was on Liouville geometry, also known as exact symplectic geometry, on surfaces. The second went from them to contact geometry. So I’m assuming you know what those are.)

We’ve seen that if you take a Liouville 1-form \(\beta\) on a surface \(S\) (i.e. such that \(d\beta\) is nondegenerate, hence a symplectic form), then the 1-form \(\alpha = \beta + dt\) on the 3-manifold \(M = S \times [0,1]\) obtained by thickening \(S\) is a contact form. (Here \(t\) is the coordinate on \([0,1]\).)

Moreover, we’ve seen that on each slice \(S \times \{t\}\) of this thickening, the characteristic foliation (i.e. the pattern of how the slice intersects the contact planes) \(\mathcal{F}\) coincides with \(\ker \beta\).

We’ve also noted that this contact form \(\alpha\) is a vertically invariant contact form on \(M\): it has no dependence on \(t\). Indeed, the flow of the vertical vector field \(\partial_t\) preserves \(\alpha\), so \(\partial_t\) is a contact vector field. Thus each slice \(S \times \{t\}\) is transverse to a contact vector field, and hence is a convex surface.

Thus, starting from the simple but elegant structure of a Liouville 1-form on a surface, we have been led to 3-dimensional contact geometry, and convex surfaces.

What we’re going to do now is go in the other direction, and start from a convex surface.

We’re going to make a clear distinction now between a contact structure and a contact form. A contact form is a 1-form \(\alpha\) such that \(\alpha \wedge d\alpha\) is nowhere zero, i.e. so that \(\ker \alpha\) is a non-integrable plane field. A contact structure \(\xi\) is a non-integrable plane field. So any contact form \(\alpha\) defines a contact structure \(\xi\) by \(\xi = \ker \alpha\), but a contact structure \(\xi\) has many 1-forms defining it (at least locally). Given any contact form \(\alpha\) such that \(\ker \alpha = \xi\), we can multiply \(\alpha\) by any smooth nowhere-zero real-valued function \(f\), and \(f\alpha\) is then another contact 1-form, with \(\ker(f\alpha) = \ker \alpha = \xi\).

Well, let’s return to the definition of a convex surface: it’s an embedded surface \(S\) in a contact 3-manifold for which there is a contact vector field \(X\) transverse to \(S\). Said tersely, a convex surface is a surface with a transverse contact vector field.

Now, given a convex surface, we can introduce coordinates as we please. Let us define a coordinate \(t\) using the transverse contact vector field \(X\), so that \(X = \partial_t\). We let \(t=0\) on the surface \(S\); flowing along \(X = \partial_t\), we obtain a coordinate \(t\) which measures how long we have flowed along \(X\) from \(S\). Using this coordinate, we can describe a neighbourhood of \(S\) as \(S \times [-\varepsilon, \varepsilon]\), for some sufficiently small \(\varepsilon\), where \(S\) appears as \(S \times \{0\}\) and the coordinate on the \([-\varepsilon, \varepsilon]\) factor is precisely \(t\). For simplicity, we can take \(\varepsilon = 1\): by slowing down the vector field \(X\) (replacing it with \(\varepsilon X\), which is still a contact vector field) we can in fact fit this \(S \times [-1,1]\) inside the previous \(S \times [-\varepsilon, \varepsilon]\).

So now we have a neighbourhood of \(S\) given as \(S \times [-1,1]\), and the transverse contact vector field is \(X = \partial_t\).

If we further denote by \(x,y\) some local coordinates on \(S\), then \(x,y,t\) form some local coordinates on \(S \times [-1,1]\). So the contact form \(\alpha\) (or indeed any 1-form) can be written in the form
\[ \alpha = f \; dx + g \; dy + u \; dt, \]
where \(f,g,u\) are real-valued functions on \(S \times [-1,1]\). Now the functions \(f,g,u : S \times [-1,1] \rightarrow \mathbb{R}\) might in general depend on \(x,y,t\). But as \(X = \partial_t\) is a contact vector field, its flow preserves the contact planes given by \(\ker \alpha\), so they don’t depend on the \(t\) coordinate at all. And hence we can take the contact form \(\alpha\) not to depend on \(t\) either. (A given \(\alpha\) might depend on \(t\), since multiplying \(\alpha\) by any nonzero real-valued function produces a 1-form with the same kernel; but for such an \(\alpha\), we can “normalise” it, multiplying by a nonzero function, to make it independent of \(t\). Or indeed replacing \(f(x,y,t), g(x,y,t), u(x,y,t)\) with \(f(x,y,0), g(x,y,0), u(x,y,0)\) would have the same effect.)

In other words, since \(S\) is a convex surface, there is a contact form \(\alpha\) where \(f,g,u\) only depend on \(x,y\), and not \(t\). We can write
\[ \alpha = f(x,y) \; dx + g(x,y) \; dy + u(x,y) \; dt. \]
Written in this way, the first two terms \(f(x,y) \; dx + g(x,y) \; dy\) denote a 1-form purely on the surface \(S\). Indeed, any 1-form on \(S\) can be written this way. So let’s call it \(\beta\). In a similar way, \(u(x,y)\) can be regarded as a function purely on the surface \(S\). We then have
\[ \alpha = \beta + u \; dt \]
where \(\beta\) is a 1-form on \(S\), and \(u\) is a real-valued function on \(S\).

When \(u\) is nowhere zero, we can do even better. We can then divide the whole 1-form \(\alpha\) by \(u\); remember, multiplying the contact form by a nowhere-zero function results in another contact form defining the same contact structure. So this allows us effectively to assume that \(u=1\), and that \(\alpha\) is of the form \(\beta + dt\), where now \(\beta\) stands for the 1-form \(\beta/u\) on \(S\).

Now if \(\alpha = \beta + dt\) is a contact form, then it must satisfy the contact condition of being non-integrable, i.e. \(\alpha \wedge d\alpha\) must be nowhere zero. Not every 1-form \(\beta\) on \(S\) makes \(\beta + dt\) a contact form. Which 1-forms \(\beta\) do? We can compute \(\alpha \wedge d\alpha\) to find out:
\[ \alpha \wedge d\alpha = (\beta + dt) \wedge d\beta = \beta \wedge d\beta + dt \wedge d\beta = dt \wedge d\beta. \]
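Here \(d\alpha = d\beta\) because \(d(dt) = 0\), and \(\beta \wedge d\beta = 0\) because it is a 3-form on the 2-dimensional surface \(S\). So \(\alpha\) is a contact form if and only if \(d\beta\) is a non-degenerate 2-form on \(S\), that is, if and only if \(\beta\) is a Liouville 1-form.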
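A quick numerical sanity check of this criterion may help. In the sketch below, which is illustrative and not from the original post, we work in local coordinates \((x,y)\) on \(S\), write \(\beta = f \; dx + g \; dy\), and allow a general function \(u\), anticipating the form \(\beta + u \; dt\) in the proposition below. A direct expansion gives \(\alpha \wedge d\alpha = \rho \; dx \wedge dy \wedge dt\) with \(\rho = u(g_x - f_y) + f u_y - g u_x\), and with \(u = 1\) this reduces to the coefficient of \(d\beta\). The partial derivatives are approximated by central differences. The helper names `partial` and `contact_density` are our own.

```python
def partial(func, p, i, h=1e-5):
    """Central-difference approximation of the i-th partial derivative of func at p = (x, y)."""
    plus, minus = list(p), list(p)
    plus[i] += h
    minus[i] -= h
    return (func(*plus) - func(*minus)) / (2 * h)


def contact_density(f, g, u, p):
    """Coefficient rho in alpha ^ d(alpha) = rho dx^dy^dt for the t-independent
    1-form alpha = f dx + g dy + u dt, i.e. alpha = beta + u dt with beta = f dx + g dy.

    rho = u (g_x - f_y) + f u_y - g u_x; alpha is contact exactly where rho != 0.
    """
    g_x, f_y = partial(g, p, 0), partial(f, p, 1)
    u_x, u_y = partial(u, p, 0), partial(u, p, 1)
    return u(*p) * (g_x - f_y) + f(*p) * u_y - g(*p) * u_x


def zero(x, y):
    return 0.0


def one(x, y):
    return 1.0


def x_coord(x, y):
    return x


# u = 1 and beta = x dy (Liouville, d beta = dx^dy): rho should be (approximately) 1.
print(contact_density(zero, x_coord, one, (0.3, -1.2)))
# u = 1 and beta = x dx (closed, d beta = 0): rho should be 0, so beta + dt is not contact.
print(contact_density(x_coord, zero, one, (0.3, -1.2)))
```

Because central differences are exact, up to rounding, for the linear coefficients used here, the printed values agree with the hand computation to many decimal places.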

Thus we have proved the following.

PROPOSITION. Let \(S\) be a convex surface in a 3-manifold with a contact structure \(\xi\). Defining a transverse coordinate \(t\) via the transverse contact vector field, \(S\) has a neighbourhood on which \(\xi\) has a contact form \(\beta + u \; dt\), where \(\beta\) is a 1-form and \(u\) is a real-valued function on \(S\).

If further \(u\) is nowhere zero, then \(S\) has a neighbourhood on which \(\xi\) has a contact form \(\beta + dt\), where \(\beta\) is a Liouville 1-form on \(S\).

In our previous episode, starting from Liouville structures on surfaces, we were led to convex surfaces in contact 3-manifolds. And now, we have gone back, from convex surfaces to Liouville structures.

Now we know that not every surface has a Liouville structure: we saw previously that there can’t be one if \(S\) is compact without boundary. And so a convex surface also can’t have a local contact form \(\beta + dt\) if \(S\) is compact without boundary.

But, amazingly enough, Giroux proved that, in a certain sense, almost any embedded surface in a contact manifold is convex — including almost any embedded compact surface without boundary.

Such a convex surface \(S\), compact without boundary, has a local contact form of the type \(\beta + u \; dt\), as we’ve discussed. And remember we said that if \(u\) is nowhere zero, then we could divide out by \(u\) and obtain a local contact form of the type \(\beta + dt\). But, as we just saw, such a surface can’t have a local contact form of the type \(\beta + dt\). Hence for any convex surface \(S\), compact without boundary, the function \(u\) in the contact form \(\beta + u \; dt\) must have some zeroes.

And as it turns out, the zeroes of \(u\) are very interesting and important.

What happens at the zeroes of \(u\)? They are precisely where the contact planes are vertical, i.e. where \(\partial_t\), or equivalently the contact vector field \(X\), lies in \(\xi = \ker \alpha\). Indeed,
\[ \alpha(X) = \alpha(\partial_t) = \beta(\partial_t) + u dt (\partial_t) = u. \]
Here we used the fact that \(\beta(\partial_t) = 0\), since \(\beta\) is a 1-form on \(S\): it has no \(dt\) component. So \(\alpha(X) = 0\) precisely when \(u=0\).

The set of points where \(u=0\) is called the dividing set (or découpage, in the original French). It’s a collection of curves on \(S\), and it splits \(S\) into the pieces where \(u>0\) and where \(u<0\).
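Why is the dividing set made of genuine smooth curves? Here is a short check, added for completeness, which uses only the facts above: \(\alpha = \beta + u \; dt\) is a contact form, and \(\beta\) and \(u\) live on \(S\). We have \(d\alpha = d\beta + du \wedge dt\). Both \(\beta \wedge d\beta\) (a 3-form on \(S\)) and \(dt \wedge du \wedge dt\) vanish, so
\[ \alpha \wedge d\alpha = (\beta + u \; dt) \wedge (d\beta + du \wedge dt) = \beta \wedge du \wedge dt + u \; dt \wedge d\beta. \]
Along \(\Gamma\) the second term vanishes, so there \(\alpha \wedge d\alpha = \beta \wedge du \wedge dt\). As \(\alpha \wedge d\alpha\) is nowhere zero, \(du \neq 0\) at every point of \(\Gamma\). By the implicit function theorem, \(\Gamma = u^{-1}(0)\) is therefore a smooth 1-dimensional submanifold of \(S\). Moreover \(\beta \wedge du \neq 0\) on \(\Gamma\), so the characteristic foliation \(\ker \beta\) is transverse to \(\Gamma\).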

Note that when \(u>0\), we have \(\alpha(X)>0\); and when \(u<0\), we have \(\alpha(X)<0\). Suppose we paint one side of the contact planes white, and the other side black. We think of the black side as “positive”, and the white side as “negative”, in the following sense. Given any vector \(V\), we will have \(\alpha(V) > 0\) when \(V\) points out of the black side, \(\alpha(V) = 0\) when \(V\) points along the plane, and \(\alpha(V) < 0\) when \(V\) points out of the white side.

Thus, the contact planes are white side up when \(u<0\), they become vertical along the dividing set \(u=0\), where they flip over to be black side up when \(u>0\).

A convex disc. The dividing set is drawn in red. The dividing set is usually drawn in red.

The standard notation is that the dividing set (i.e. where \(u=0\)) is denoted \(\Gamma\); the region of \(S\) where \(u>0\) is denoted \(R_+\); and the region of \(S\) where \(u<0\) is denoted \(R_-\).

The best thing is that, if you just consider the part of \(S \times [-1,1]\) where \(u>0\) (say), i.e. \(R_+ \times [-1,1]\), you can divide \(\alpha\) by \(u\) and obtain a contact form of the type \(\beta' + dt\), where \(\beta' = \beta/u\) is Liouville. So the characteristic foliation on \(R_+\) is a Liouville foliation, and there is a flow tangent to it which exponentially expands an area form. The same applies to \(R_-\).
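Here is a short verification, added for completeness, that \(\beta/u\) is Liouville wherever \(u \neq 0\). Expanding \(\alpha \wedge d\alpha\) for \(\alpha = \beta + u \; dt\) uses \(d\alpha = d\beta + du \wedge dt\), \(\beta \wedge d\beta = 0\) on the surface \(S\), and \(dt \wedge dt = 0\). This gives \(\alpha \wedge d\alpha = u \; dt \wedge d\beta + \beta \wedge du \wedge dt\). Where \(u \neq 0\),
\[ d\left(\frac{\beta}{u}\right) = \frac{u \; d\beta - du \wedge \beta}{u^2}, \qquad u^2 \; dt \wedge d\left(\frac{\beta}{u}\right) = u \; dt \wedge d\beta - dt \wedge du \wedge \beta = \alpha \wedge d\alpha, \]
where the last step uses \(- dt \wedge du \wedge \beta = \beta \wedge du \wedge dt\). Since \(\alpha \wedge d\alpha\) is nowhere zero, neither is \(dt \wedge d(\beta/u)\). So \(d(\beta/u)\) is non-degenerate on \(R_+\) and on \(R_-\), i.e. \(\beta/u\) is a Liouville 1-form on each.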

So in fact a convex surface can be regarded as made up of two Liouville structures pieced together along a dividing set, where the contact planes flip over.

Another, different, convex disc.

From Liouville geometry to contact geometry

(This post is a continuation of my previous one on Liouville structures, and hence is quite technical. It’s aimed at students of geometry, not the general public. It assumes you know some differential geometry. If you know what a Liouville structure is, read on.)

We’re going to take Liouville structures and move them into 3 dimensions, to obtain contact structures.

As we’ve seen, a Liouville structure on a surface \(S\) (which necessarily has boundary, or is non-compact) is given by a 1-form \(\beta\) such that \(d\beta\) is non-degenerate. Then \(d\beta\) is a symplectic form on \(S\), and \(\beta\) is dual to a vector field \(X\) via the symplectic form, i.e. \(\iota_X d\beta = \beta\). This structure has the nice property that \(X\) points along \(\ker \beta\), and the flow of \(X\) expands \(\beta\) and \(d\beta\) exponentially. In equations, \(\beta(X) = 0\), \(L_X \beta = \beta\), and \(L_X d\beta = d\beta\).

Let’s now go into the next dimension and consider a 3-dimensional space (or 3-manifold) \(M = S \times [0,1]\). We can use \(x,y\) as coordinates on \(S\), and \(t\) as a coordinate on \([0,1]\). So this is a thickening of \(S\); you can think of \(S\) as horizontal, and \([0,1]\) as vertical, with the coordinate \(t \in [0,1]\) measuring height. (However \(M\) certainly has boundary; and if \(S\) is non-compact, then the same is true for \(M\).)

On this 3-manifold \(M\), let’s consider a 1-form. The 1-form \(\beta\) is no more interesting here than it is on \(S\), but let’s add to it a form using the third dimension. We’ll just add to \(\beta\) the simplest possible such form, \(dt\).

So define a 1-form \(\alpha\) on \(M\) by \(\alpha = \beta + dt\).

Previously, we considered the example of a Liouville structure on \(S = \mathbb{R}^2\), given by \(\beta = x \; dy\). We will continue this running example. We obtain \(M = \mathbb{R}^2 \times [0,1]\), with the 1-form \(\alpha = x \; dy + dt\).

From the prequel.

It’s harder to draw pictures of 1-forms in 3 dimensions, but it is possible! Just as in the 2-dimensional case, we can draw the kernel of the 1-form. But whereas the kernel of a (nowhere zero) 1-form on a surface is a line field, the kernel of a (nowhere zero) 1-form on a 3-manifold is a plane field.

Now it’s not difficult to see that, in our example, no matter where you are, the \(x\)-direction always lies in \(\ker \alpha\). For \(\alpha\) only contains a \(dy\) and a \(dt\) term; if you feed it a vector in the \(x\)-direction, you get zero. But to get a second linearly independent vector in the kernel, you need to take a more carefully chosen combination of the \(y\) and \(t\) directions. We can check that the combination \(\partial_y - x \partial_t\) lies in the kernel; here \(\partial_y, \partial_t\) denote unit vectors in the \(y\) and \(t\) directions.
\[ \alpha(\partial_y - x \partial_t) = (x \; dy + dt)(\partial_y - x \partial_t) = x \; dy(\partial_y) - dt(x \partial_t) = x-x=0. \]
Thus, at each point \((x,y,t)\) in \(M\), \(\ker \alpha\) is spanned by \(\partial_x\) and \(\partial_y - x \partial_t\). We can write
\[ \ker \alpha = \langle \partial_x, \; \partial_y - x \partial_t \rangle. \]

Let’s try to visualise \(\ker \alpha\) geometrically. It’s a plane field: there is a plane at each point in \(\mathbb{R}^2 \times [0,1]\). The plane at \((x,y,t)\) is spanned by \(\partial_x\) and \(\partial_y - x \partial_t\). When \(x=0\), the plane is spanned by \(\partial_x\) and \(\partial_y\): it’s horizontal. As \(x\) varies, \(\partial_x\) always points along the plane, and so the planes “spin” around each line in the \(x\)-direction. As \(x\) increases from \(0\), the plane tilts: rather than \(\partial_y\), the plane contains \(\partial_y - x \partial_t\), and this negative \(\partial_t\) component becomes larger as \(x\) increases; as \(x\) becomes large, the plane becomes close to vertical. And similarly as \(x\) decreases from \(0\): the plane gains a positive \(\partial_t\) component.
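To quantify “close to vertical”, we can measure the angle between the plane \(\ker(x \; dy + dt)\) and the horizontal. This requires a notion of angle, so the sketch below assumes the Euclidean metric in the \((x,y,t)\) coordinates; the post itself does not use a metric. With that assumption, the coefficient vector \((0, x, 1)\) of \(\alpha\) is normal to \(\ker \alpha\), and the tilt works out to \(\arctan |x|\). The function name `tilt_degrees` is our own.

```python
import math


def tilt_degrees(x):
    """Angle between the contact plane ker(x dy + dt) and the horizontal plane,
    measured with the Euclidean metric on (x, y, t)-space (an assumption).

    The coefficients (0, x, 1) of alpha = 0 dx + x dy + 1 dt form a Euclidean
    normal vector to ker(alpha), so the tilt equals the angle between that
    normal and the vertical direction (0, 0, 1). It depends only on x.
    """
    normal = (0.0, x, 1.0)
    length = math.sqrt(sum(c * c for c in normal))
    return math.degrees(math.acos(normal[2] / length))


for x in (0.0, 0.5, 1.0, 10.0, 1000.0):
    print(f"x = {x:7.1f}   tilt = {tilt_degrees(x):6.2f} degrees")
```

The tilt is \(0\) on the \(y\)-axis, \(45^\circ\) at \(x = \pm 1\), and approaches \(90^\circ\) as \(|x|\) grows. This function returns only the magnitude of the tilt; the sign of \(x\) determines which way the plane leans.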

This is in fact quite a standard geometric object, at least if you’re into the field of contact geometry. This plane field given by the kernel of \(x \; dy + dt\) is also known (possibly after some relabellings of variables, and some sign changes) as the standard contact structure on \(\mathbb{R}^3\).

The standard contact structure on \(\mathbb{R}^3\) (with slightly different coordinates: replace \(x,y,z\) here with \(y,x,t\)). Public domain, Wikipedia.

This plane field has some great properties. Try to find a surface which runs tangent to it! That is, try to find a surface in \(\mathbb{R}^2 \times [0,1]\), which at every point is tangent to this plane field. Even if you try to find a tiny surface, you will not succeed — because this plane field has a property called non-integrability.

Frobenius’ theorem in differential geometry tells you when a plane field is integrable (i.e. tangent to a surface) or not. (In fact, it says this not just about 2-dimensional plane fields in 3 dimensions, but in general about any-dimensional plane fields in any number of dimensions.) There is a vector-field version, and a differential form version.

The vector-field version of Frobenius’ theorem says to take the Lie bracket of two vector fields spanning the plane field: if the Lie bracket points along the planes of the plane field, it’s integrable; otherwise, it is not. In our example we obtain
\[ [ \partial_x, \; \partial_y - x \partial_t ]
= [\partial_x, \; \partial_y] - [\partial_x, \; x \partial_t]
= 0 - \partial_x (x \partial_t) + x \partial_t \partial_x \\
= - \partial_t - x \partial_x \partial_t + x \partial_t \partial_x
= - \partial_t. \]
Here we repeatedly used the fact that the partial derivatives commute, e.g. \(\partial_x \partial_y = \partial_y \partial_x\) or \([\partial_x, \partial_y] = 0\), and the product rule, in the slightly cryptic form of \(\partial_x x = 1 + x \partial_x\). The result is \(- \partial_t\), which is definitely not a linear combination of \(\partial_x\) and \(\partial_y - x \partial_t\). Any such combination with no \(\partial_y\) component must be a multiple of \(\partial_x\), which has no \(\partial_t\) component.
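The same bracket can be checked numerically, using the component formula \([X,Y]^i = X^j \partial_j Y^i - Y^j \partial_j X^i\), which matches the operator convention \([X,Y] = XY - YX\) used above. In the sketch below, directional derivatives are approximated by central differences. A nonzero \(3 \times 3\) determinant of \(\partial_x\), \(\partial_y - x \partial_t\) and their bracket confirms that the bracket leaves the plane. The function names are our own, chosen for illustration.

```python
def directional_derivative(field, p, v, h=1e-5):
    """(v . grad) field at p, for a vector field given as a function of (x, y, t),
    approximated by a central difference along the direction v."""
    plus = [pi + h * vi for pi, vi in zip(p, v)]
    minus = [pi - h * vi for pi, vi in zip(p, v)]
    return [(a - b) / (2 * h) for a, b in zip(field(*plus), field(*minus))]


def lie_bracket(X, Y, p):
    """Components of [X, Y] = (X . grad) Y - (Y . grad) X at the point p."""
    XY = directional_derivative(Y, p, X(*p))
    YX = directional_derivative(X, p, Y(*p))
    return [a - b for a, b in zip(XY, YX)]


def det3(u, v, w):
    """Determinant of the 3x3 matrix with rows u, v, w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


def d_x(x, y, t):
    return (1.0, 0.0, 0.0)


def d_y_minus_x_d_t(x, y, t):
    return (0.0, 1.0, -x)


p = (0.7, 2.0, 0.3)
bracket = lie_bracket(d_x, d_y_minus_x_d_t, p)
print(bracket)  # approximately [0, 0, -1], i.e. -d/dt
# A nonzero determinant means the bracket is not in the span of the two fields,
# so by Frobenius the plane field is not integrable at p.
print(det3(d_x(*p), d_y_minus_x_d_t(*p), bracket))
```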

The differential form version of Frobenius’ theorem says to take the wedge product \(\alpha \wedge d\alpha\), which will be a 3-form: if it is zero, then the plane field is integrable; otherwise, it is not. In our example we obtain
\[ \alpha \wedge d\alpha = (x \; dy + dt) \wedge (dx \wedge dy) \\
= dt \wedge dx \wedge dy \neq 0, \]
which is the standard Euclidean volume form and is definitely nowhere zero.
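For a general 1-form \(\alpha = a \; dx + b \; dy + c \; dt\) on \((x,y,t)\)-space, expanding \(\alpha \wedge d\alpha\) gives the coefficient \(a(c_y - b_t) + b(a_t - c_x) + c(b_x - a_y)\) of \(dx \wedge dy \wedge dt\). This is the same expression as the dot product of \((a,b,c)\) with its curl. The sketch below evaluates it numerically with central differences, as an illustrative check rather than anything from the original post. It confirms the value \(1\) for \(x \; dy + dt\) and the value \(0\) for the integrable example \(dt\), whose kernel consists of horizontal planes.

```python
def partial(func, p, i, h=1e-5):
    """Central-difference approximation of the i-th partial derivative of func at p."""
    plus, minus = list(p), list(p)
    plus[i] += h
    minus[i] -= h
    return (func(*plus) - func(*minus)) / (2 * h)


def alpha_wedge_dalpha(a, b, c, p):
    """Coefficient of alpha ^ d(alpha) relative to dx^dy^dt, for alpha = a dx + b dy + c dt.

    Coordinates are ordered (x, y, t), so indices 0, 1, 2 give the x-, y- and
    t-derivatives respectively.
    """
    return (a(*p) * (partial(c, p, 1) - partial(b, p, 2))
            + b(*p) * (partial(a, p, 2) - partial(c, p, 0))
            + c(*p) * (partial(b, p, 0) - partial(a, p, 1)))


def zero(x, y, t):
    return 0.0


def one(x, y, t):
    return 1.0


def x_coord(x, y, t):
    return x


# alpha = x dy + dt: expect (approximately) 1 at every point, so it is a contact form.
print(alpha_wedge_dalpha(zero, x_coord, one, (1.5, -0.4, 0.2)))
# alpha = dt: its kernel is the horizontal planes, which are integrable; expect 0.
print(alpha_wedge_dalpha(zero, zero, one, (1.5, -0.4, 0.2)))
```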

So, if you believe Frobenius’ theorem, or at least one or the other of these two variants, then we’ve shown that the plane field is nowhere integrable. This is exactly the definition of a contact structure: a plane field which is nowhere integrable. (This is the definition in 3 dimensions, at least.) So our plane field is a contact structure; in fact, it’s essentially the standard contact structure on \(\mathbb{R}^3\). A 1-form whose kernel is a contact structure is called a contact form. So in this example, the 1-form \(\alpha = \beta + dt = x \; dy + dt\) is a contact form.

Another nice feature comes from considering the surface \(S = S \times \{0\}\), sitting inside \(S \times [0,1]\), and how it intersects the contact planes. At every point of \(S\) we have two planes: the tangent plane to the horizontal surface \(S\), and the contact plane \(\ker \alpha\). How do these planes intersect? Well, we saw that the planes of \(\ker \alpha\) were spanned by \(\partial_x\) and \(\partial_y - x \partial_t\). The first of these actually points along \(S\), while the second does not, except when \(x = 0\), where the contact planes are tangent to \(S\). So, away from \(x = 0\), the contact planes intersect \(S\) along the \(x\) direction.

Now the lines in the \(x\) directions along \(S\) have come up in our previous discussion — they are nothing but the kernel of the Liouville form \(\beta\)! So the planes of \(\ker \alpha\) cut \(S\) along the same line as \(\ker \beta\).

Thus, the lines at each point of \(S\), given by looking at how it intersects the contact plane \(\ker \alpha\), form a line field on \(S\); and if you integrate it, you get a foliation. We saw last time that \(\ker \beta\) also forms a line field on \(S\), which also integrates to a foliation. What we’ve found here in our example is that the foliations on \(S \times \{0\}\) given by \(\ker \beta\), and by intersection with \(\ker \alpha\), coincide. (The foliations coincide when \(x \neq 0\). When \(x = 0\), both foliations are singular, so they are in fact equal as singular foliations.)

When you have a contact form \(\alpha\) on a 3-manifold, and a surface \(S\) in that 3-manifold, you can always consider, in a similar way, how the contact planes intersect the tangent plane to \(S\), at each point of \(S\). The result is a (singular) line field on \(S\), which integrates to a (singular) foliation: this is called the characteristic foliation \(\mathcal{F}\) of \(S\). In fancy language, at a point \(p \in S\), the characteristic foliation is given by the intersection of the tangent plane \(T_p S\) to \(S\), with the contact plane \(\ker \alpha_p\) there:
\[ \mathcal{F}_p = T_p S \cap \ker \alpha_p. \]
In simple language, the characteristic foliation of a surface is the pattern of how it intersects the contact planes.
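For a horizontal slice \(S \times \{t\}\), the intersection \(T_p S \cap \ker \alpha_p\) can be computed directly. A tangent vector \((v_x, v_y, 0)\) lies in \(\ker(a \; dx + b \; dy + c \; dt)\) exactly when \(a v_x + b v_y = 0\), so the characteristic foliation is spanned by \((-b, a)\). It is singular where \(a = b = 0\). The minimal sketch below is our own illustration (the name `characteristic_direction` is hypothetical) and is only valid for horizontal slices.

```python
def characteristic_direction(a, b, p, tol=1e-12):
    """Direction (in the (x, y) plane) of the characteristic foliation of the
    horizontal slice through p = (x, y, t), for alpha = a dx + b dy + c dt.

    A vector (v_x, v_y, 0) tangent to the slice lies in ker(alpha) iff
    a v_x + b v_y = 0, so the intersection is spanned by (-b, a), whatever c is.
    Returns None at a singular point, where a = b = 0 and the contact plane
    coincides with the tangent plane of the slice.
    """
    ap, bp = a(*p), b(*p)
    if abs(ap) < tol and abs(bp) < tol:
        return None
    return (-bp, ap)


def zero(x, y, t):
    return 0.0


def x_coord(x, y, t):
    return x


# alpha = x dy + dt: the direction is a multiple of d/dx away from x = 0 ...
print(characteristic_direction(zero, x_coord, (2.0, 5.0, 0.3)))
# ... and the foliation is singular along x = 0.
print(characteristic_direction(zero, x_coord, (0.0, 5.0, 0.3)))
```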

What we’ve found, in our example, is that the foliation \(\ker \beta\) coincides with the characteristic foliation on \(S \times \{0\}\). And in fact the same occurs not just on \(S \times \{0\}\), but on any slice \(S \times \{t\}\).

Let’s now consider our second example from last time: \(\beta = \frac{1}{2} (x \; dy - y \; dx)\) on the surface \(S = \mathbb{R}^2\). This was the “radial” example where \(X\) and \(\ker \beta\) both pointed directly out along lines from the origin.

See the prequel for further discussion of this picture.

In this example, we obtain on \(M = \mathbb{R}^2 \times [0,1]\) the 1-form \(\alpha = \beta + dt = \frac{1}{2} (x \; dy - y \; dx) + dt\). You can check that along the \(t\)-axis \(x=y=0\), \(\ker \alpha\) is a horizontal plane, spanned by \(\partial_x\) and \(\partial_y\). At other points, the kernel is spanned by \(x \partial_x + y \partial_y\), which points horizontally radially outward from the \(t\)-axis, and \(y \partial_x - x \partial_y + \frac{x^2+y^2}{2} \partial_t\). This second vector has an angular component \(y \partial_x - x \partial_y\) (which is in the \(\theta\) direction in cylindrical coordinates), and a \(\partial_t\) component. As \(x^2 + y^2\) becomes larger, i.e. further from the \(t\)-axis, the \(\partial_t\) component becomes larger relative to the angular one, and the planes become more vertical. The result is a plane field known as the standard cylindrically symmetric contact structure, because it’s cylindrically symmetric, and it’s a contact structure.

From John Etnyre’s lecture notes on open book decompositions and contact structures. The picture is by Stephan Schonenberger.
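The kernel claim for the cylindrically symmetric example can also be checked numerically at a few sample points off the \(t\)-axis. On the axis itself the radial vector vanishes, and the kernel is the horizontal plane instead. The sketch below simply evaluates \(\alpha\) on the two vectors described in the text. The function names are our own.

```python
def alpha(p, v):
    """Evaluate alpha = (1/2)(x dy - y dx) + dt at the point p = (x, y, t) on the vector v."""
    x, y, _ = p
    vx, vy, vt = v
    return 0.5 * (x * vy - y * vx) + vt


def kernel_spanning_vectors(p):
    """The two vectors from the text at p = (x, y, t):
    radial  = x d/dx + y d/dy,
    twisted = y d/dx - x d/dy + (x^2 + y^2)/2 d/dt.
    Off the t-axis they are linearly independent, so they span ker(alpha)."""
    x, y, _ = p
    radial = (x, y, 0.0)
    twisted = (y, -x, (x * x + y * y) / 2)
    return radial, twisted


for p in [(1.0, 0.0, 0.0), (0.3, -2.0, 0.5), (5.0, 5.0, 0.9)]:
    radial, twisted = kernel_spanning_vectors(p)
    # Both evaluations should be zero (up to floating-point rounding).
    print(p, alpha(p, radial), alpha(p, twisted))
```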

It’s perhaps not surprising that if you take something on the plane which is radially symmetric, and then just make a 3-d thing out of it, which doesn’t change in the new (\(t\)) dimension, then you get something which is cylindrically symmetric.

Let’s check that we actually have a contact form, by applying the differential form version of Frobenius’ theorem:
\[ \alpha \wedge d\alpha =
\left( \frac{1}{2} x \; dy - \frac{1}{2} y \; dx + dt \right)
\wedge \left( dx \wedge dy \right) \\
= dt \wedge dx \wedge dy. \]
Most of the terms immediately cancel out in the wedge product because of the anti-symmetry: all that’s left is the standard 3-dimensional Euclidean volume form, which is definitely nowhere zero.

You can also check that on \(S \times \{0\}\), or in fact on any slice \(S \times \{t\}\), the characteristic foliation points radially outward from the origin. (Indeed, we just saw that the radial vector field \(x \partial_x + y \partial_y\) lies in \(\ker \alpha\).) This again coincides with the foliation \(\ker \beta\) from the Liouville 1-form.

In fact, this argument works generally, not just on this example. The differential form version of Frobenius’ theorem can always easily be applied to show \(\alpha\) is a contact form. Moreover, it’s not difficult to show that the characteristic foliation coincides with the foliation from the Liouville 1-form.

PROPOSITION. Starting from any Liouville 1-form \(\beta\) on any surface \(S\), the 1-form \(\alpha = \beta + dt\) is a contact form on \(M = S \times [0,1]\).

Moreover, for each \(t \in [0,1]\), the characteristic foliation on \(S \times \{t\}\) coincides with the foliation of \(\ker \beta\).

In other words, at each \((p,t) \in S \times [0,1]\), the intersection of \(\ker \alpha\) with the tangent space to \(S \times \{t\}\) is equal to \(\ker \beta\).
\[ \ker \alpha_{(p,t)} \cap T_{(p,t)} (S \times \{t\}) = \ker \beta_{(p,t)} \]

PROOF. Consider \(\alpha \wedge d\alpha\); we’ll show it is nowhere zero.
\[ \alpha \wedge d\alpha = (\beta + dt) \wedge d(\beta + dt) = (\beta + dt) \wedge d\beta
= \beta \wedge d\beta + dt \wedge d\beta. \]
Here we used \(d(dt) = 0\).
Now \(\beta \wedge d\beta\) is a 3-form, but \(\beta\) is a 1-form on the surface \(S\) only: it has no \(t\)-component. So \(\beta \wedge d\beta\) is a 3-form… on the 2-dimensional space \(S\)! Consequently it must be zero.

Thus \(\alpha \wedge d\alpha = dt \wedge d\beta\). Now we use the fact that \(\beta\) is a Liouville 1-form: this means that \(d\beta\) is a non-degenerate 2-form on \(S\). When we wedge it with \(dt\) then we must get a non-degenerate 3-form on \(S \times [0,1]\). This means \(\alpha\) is a contact form.

Now consider a point \((p,t) \in S \times [0,1]\), and a tangent vector \(V\) to \(S \times \{t\}\) there. So \(V\) is horizontal: it has no \(t\) component, and \(dt(V) = 0\). So \(\beta(V)= 0\) if and only if \(\alpha(V) = \beta(V) + dt(V) = 0\). Consequently, a vector \(V \in T_{(p,t)} (S \times \{t\})\) lies in \(\ker \alpha\) iff it lies in \(\ker \beta\). In other words, \(\ker \alpha \cap T_{(p,t)} (S \times \{t\}) = \ker \beta\). QED

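Furthermore, this proof works in higher dimensions too. As long as you have a Liouville structure on a manifold \(S\) of any dimension \(2n\) (Liouville structures only exist in even dimension; this was discussed in the prequel), you’ll get a contact form on \(S \times [0,1]\) this way. (The only change is that the contact condition becomes \(\alpha \wedge (d\alpha)^n \neq 0\). The term \(\beta \wedge (d\beta)^n\) vanishes because it is a \((2n+1)\)-form on a \(2n\)-manifold, and what remains is \(dt \wedge (d\beta)^n\), which is nowhere zero because \(d\beta\) is symplectic.)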
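As an illustrative 4-dimensional example (our choice, not one from the original post), take \(S = \mathbb{R}^4\) with coordinates \(x_1, y_1, x_2, y_2\) and \(\beta = x_1 \; dy_1 + x_2 \; dy_2\). Then \(d\beta = dx_1 \wedge dy_1 + dx_2 \wedge dy_2\) is symplectic. With \(\alpha = \beta + dt\) on \(S \times [0,1]\), the contact condition is \(\alpha \wedge (d\alpha)^2 \neq 0\), and since \(d\alpha = d\beta\):
\[ (d\alpha)^2 = (dx_1 \wedge dy_1 + dx_2 \wedge dy_2) \wedge (dx_1 \wedge dy_1 + dx_2 \wedge dy_2) = 2 \, dx_1 \wedge dy_1 \wedge dx_2 \wedge dy_2, \]
\[ \alpha \wedge (d\alpha)^2 = \beta \wedge (d\beta)^2 + dt \wedge (d\beta)^2 = 2 \, dt \wedge dx_1 \wedge dy_1 \wedge dx_2 \wedge dy_2 \neq 0. \]
The term \(\beta \wedge (d\beta)^2\) drops out because it is a 5-form on the 4-dimensional \(S\). The two cross terms in \((d\beta)^2\) add up because 2-forms commute, while \((dx_i \wedge dy_i) \wedge (dx_i \wedge dy_i) = 0\).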

In any case, this means that Liouville geometry leads naturally to contact geometry: we just add another dimension!

Indeed, sometimes this construction is called a contactisation!

(See, for example, section 2.3 of this 1998 paper of Yasha Eliashberg, Helmut Hofer and Dietmar Salamon.)

Liouville geometry is another name for symplectic geometry where the symplectic form is exact, i.e. the symplectic form is \(d\beta\). So contactisation upgrades symplectic geometry — the mathematics of classical Hamiltonian mechanics — to an odd-dimensional counterpart.

Another interesting thing to note is that in this “contactisation” construction, the 1-form \(\alpha = \beta + dt\) doesn’t actually depend on \(t\); it’s invariant under translations in the \(t\) direction. We saw in the second example that a radially symmetric Liouville form gave rise to a cylindrically symmetric contact structure.

And indeed, when you take \(\alpha = \beta + dt\), the 1-form \(\beta\) on the surface \(S\) doesn’t depend on \(t\); and \(dt\) is just the same \(dt\) regardless of what \(t\) is!

Such a contact form is sometimes called “vertically invariant”.

In fancy language, it means that the flow of the vertical vector field \(\partial_t\) preserves the contact form. In even fancier language, it means that the Lie derivative of the contact form \(\alpha\) in the direction of \(\partial_t\) is zero:
\[ L_{\partial_t} \alpha = 0. \]
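This can be checked in one line with Cartan’s formula \(L_V = d \, \iota_V + \iota_V d\), a standard identity applied here as an added check:
\[ L_{\partial_t} \alpha = d\big(\alpha(\partial_t)\big) + \iota_{\partial_t} d\alpha = d(1) + \iota_{\partial_t} d\beta = 0. \]
Here \(\alpha(\partial_t) = \beta(\partial_t) + dt(\partial_t) = 0 + 1\) is constant, and \(d\alpha = d\beta\). Since the coefficients of \(\beta\) do not depend on \(t\), \(d\beta\) is (in local coordinates) a multiple of \(dx \wedge dy\), so inserting \(\partial_t\) into it gives zero.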

A vector field whose flow preserves a contact structure is often called a contact vector field. So \(\partial_t\) is a contact vector field.

Our 3-manifold \(M\), the thickened surface \(S \times [0,1]\), can be considered as a family of surfaces \(S_t = S \times \{t\}\), for \(t \in [0,1]\). And there is a vector field \(\partial_t\), transverse to all of these surfaces.

Emmanuel Giroux discovered in the 1990s that, when you have a surface in a contact 3-manifold with a contact vector field transverse to it, very nice stuff happens. He called such a surface convex.

So Liouville geometry leads naturally not just to contact geometry, but to convex surfaces in contact geometry. More on that, another day.

Lovely Liouville geometry

(Note: This post is more technical than most stuff I write here. The intended audience here is not the general public, or even the general educated public: it’s students of geometry, broadly understood. In any case, if you don’t know what a differential form is, you’re probably not going to get much out of this.)

I’d like to show you some very nice geometry, involving some vector fields and differential forms.

Consider a surface. In fact, consider the plane, \(\mathbb{R}^2\). That’s just the standard Euclidean plane, with coordinates \( x\) and \( y\).

Now let’s consider a differential 1-form on the plane; call it \( \beta\). We’ll impose one condition on \( \beta\): its exterior derivative \( d\beta\) should be everywhere nonzero.

For instance, we can take \( \beta = x \; dy\). In fact, we will take this as a running example. Its exterior derivative is \( d\beta = dx \wedge dy\), which is just the usual Euclidean area form on the plane, and which is nowhere zero.

Now, saying that \(d\beta\) is everywhere nonzero is the same as saying that \( d\beta\) is an area form (although in general it might be different from the Euclidean area form \(dx \wedge dy\)); and this is also the same as saying that \( d\beta\) is a non-degenerate 2-form. In fact, being exact, \(d\beta\) is also closed: and hence \( d\beta\) is a closed non-degenerate 2-form, also known as a symplectic form.

Non-degenerate 2-forms are great. When you insert a vector into one, you get a 1-form; and because of the non-degeneracy, if the vector is nonzero, then the resulting 1-form is nonzero. So you get a bijective correspondence, or duality, between 1-forms and vectors.

This means that, at each point \(p \in \mathbb{R}^2\), the non-degenerate 2-form \(d\beta\) provides a linear map of 2-dimensional vector spaces
\[ T_p \mathbb{R}^2 \rightarrow T_p^* \mathbb{R}^2, \]
or in other words
\[ \{ \text{Vectors at $p$} \} \rightarrow \{ \text{1-forms at $p$} \}, \]
which sends a vector \(v\) to the 1-form \(\iota_v d\beta\) (i.e. the 1-form \(d\beta(v, \cdot)\), where you have fed \( d\beta\) one vector, but it eats two courses of vectors, and after its entree it remains a 1-form on its remaining main course). It’s a linear map and, by the non-degeneracy of \(d\beta\), its kernel/nullspace consists solely of the zero vector. Thus it’s injective and, both vector spaces being 2-dimensional, it’s an isomorphism.
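To make this concrete (a small worked computation added here, using only the definitions above): at a point \(p\), any 2-form on \(\mathbb{R}^2\) is \(w \, dx \wedge dy\) for some number \(w\), and non-degeneracy means \(w \neq 0\). For \(v = v^x \partial_x + v^y \partial_y\),
\[ \iota_v (w \, dx \wedge dy) = w \big( dx(v) \, dy - dy(v) \, dx \big) = -w \, v^y \, dx + w \, v^x \, dy. \]
So in the bases \((\partial_x, \partial_y)\) and \((dx, dy)\), the map \(v \mapsto \iota_v (w \, dx \wedge dy)\) has matrix
\[ \begin{pmatrix} 0 & -w \\ w & 0 \end{pmatrix}, \qquad \det = w^2 \neq 0, \]
which is invertible precisely because \(w \neq 0\).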

If we consider (smooth) vector fields rather than just single vectors, then we can simultaneously do this at each point of \(\mathbb{R}^2\), and we get a map
\[ \{ \text{Vector fields on $\mathbb{R}^2$} \} \rightarrow \{ \text{1-forms on $\mathbb{R}^2$} \}. \]

So \( d\beta\), being a non-degenerate 2-form, gives us a way to go from 1-forms to vectors and back again. We can think of this as a duality: for each 1-form, this correspondence gives us a dual vector field, and vice versa.

So far, we only have \(\beta\) and \( d\beta\). But \( \beta\) is a 1-form! So some vector field must correspond to it. Let’s call it \( X\). As it turns out, the 1-form \( \beta\), the 2-form \( d\beta\), and the vector field \( X\), form a very nice structure.

The name of Joseph Liouville is often associated with this stuff. Often the 1-form \(\beta\) is called a Liouville form, often the surface (or manifold in general) is called a Liouville manifold, and the whole thing is often called a Liouville structure.

In fact, we can draw pictures of such a structure.

The easiest thing to draw is the vector field \( X\): a vector, drawn as an arrow, at each point.

How do we draw \( \beta\)? It’s generally hard to draw a picture of a differential form! However for a 1-form, we can draw its kernel \( \ker \beta\). At a point \( p\) where \( \beta \neq 0\), \( \beta\) is a nontrivial linear map from the tangent space \( T_p \mathbb{R}^2\), which is a 2-dimensional vector space, to \( \mathbb{R}\). Being a nontrivial linear map from a 2-dimensional vector space to a 1-dimensional vector space, it has rank 1 and nullity 1, so the kernel is a 1-dimensional subspace of \( T_p \mathbb{R}^2\). When \( \beta_p = 0\), \( \beta_p\) is the zero map \( T_p \mathbb{R}^2 \rightarrow \mathbb{R}\) and hence the kernel is the whole 2-dimensional tangent space \( T_p \mathbb{R}^2\). But where \( \beta \neq 0\), we have a 1-dimensional tangent subspace at each point; in other words, \( \ker \beta\) is a line field on \( \mathbb{R}^2\). We can even join up the lines (i.e. integrate them) to obtain a collection of curves on \( \mathbb{R}^2\), which become the leaves of a foliation. In this way \( \beta\) can be drawn as a collection of curves, or singular foliation, on \( \mathbb{R}^2\): the foliation is singular at the points where \( \beta = 0\). True, drawing it this way only shows the kernel of \( \beta\), i.e. in which direction you will get zero if you feed a vector into \( \beta\), and you will not see what you get if you feed vectors in other directions. So you don’t see how “strong” \( \beta\) is at each point. But it’s a useful way to represent \( \beta\) nonetheless.

As for the 2-form \(d\beta\)? It’s just an area form; we won’t attempt to draw anything to represent that.

So, let’s draw what we get in our example. In our example, the 1-form is \( \beta = x \; dy\), the 2-form is \( d\beta = dx \wedge dy\), and the vector field \( X\) must satisfy
\[ \iota_X d\beta = \beta,
\quad \text{i.e.} \quad
\iota_X ( dx \wedge dy ) = x \; dy. \]
It’s not too difficult to calculate that \( X = x \partial_x\) (here \( \partial_x\) is a unit vector in the \( x\) direction; if we’re using \( (x,y)\) to denote coordinates, then \( \partial_x = (1,0)\)).
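To make the duality condition \( \iota_X d\beta = \beta\) concrete, here is a minimal numerical sketch in Python, not part of the original argument. It assumes \( \beta = P \, dx + Q \, dy\) is given by two Python functions, approximates \( d\beta = c \, dx \wedge dy\) with \( c = \partial_x Q - \partial_y P\) by central differences, and solves for \( X\). The function names are hypothetical. It is checked on \( \beta = x \; dy\) and on the radial form \( \frac{1}{2}(x \; dy - y \; dx)\) that appears further below.

```python
# Sketch: the vector field X dual to a 1-form beta = P dx + Q dy,
# defined by iota_X d(beta) = beta. Function names are illustrative only.

def d_beta_coefficient(P, Q, x, y, h=1e-5):
    """Return c with d(beta) = c dx ^ dy, i.e. c = dQ/dx - dP/dy (central differences)."""
    dQ_dx = (Q(x + h, y) - Q(x - h, y)) / (2 * h)
    dP_dy = (P(x, y + h) - P(x, y - h)) / (2 * h)
    return dQ_dx - dP_dy


def dual_vector_field(P, Q, x, y):
    """Return the components (X^x, X^y) of X at (x, y).

    Since iota_X (c dx ^ dy) = c (X^x dy - X^y dx), matching this with
    P dx + Q dy gives X^x = Q / c and X^y = -P / c. This needs c != 0,
    i.e. d(beta) non-degenerate at the point.
    """
    c = d_beta_coefficient(P, Q, x, y)
    if abs(c) < 1e-12:
        raise ValueError("d(beta) is degenerate at this point")
    return (Q(x, y) / c, -P(x, y) / c)


# beta = x dy: we expect X = x d/dx, i.e. roughly (2.0, 0.0) at the point (2, 5).
print(dual_vector_field(lambda x, y: 0.0, lambda x, y: x, 2.0, 5.0))

# beta = (x dy - y dx) / 2: we expect X = (x d/dx + y d/dy) / 2, roughly (1.5, -2.0) at (3, -4).
print(dual_vector_field(lambda x, y: -y / 2, lambda x, y: x / 2, 3.0, -4.0))
```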

Being a multiple of \( \partial_x\), \( X\) always points in the \( x\) direction, i.e. horizontally. When \( x\) is positive it points to the right, when \( x\) is negative it points to the left; and when \( x=0\), i.e. along the \( y\)-axis, \( X\) is zero. So \( X\) is actually a singular vector field, in the sense that it has zeroes. And it’s zero along a whole line. (So it’s not generic.)

As for \( \beta = x \; dy\), it is zero when \( x=0\), so the foliation \( \ker \beta\) is singular along the \( y\)-axis. When \( x \neq 0\), the kernel consists of anything pointing in the \( x\)-direction. Since \( \beta\) has only a \( dy\), but no \( dx\) term, if you feed it a \( \partial_x\), or any multiple thereof, you’ll get zero. So the line field you draw is horizontal, as is the foliation. In other words, \( \ker \beta\) is the singular foliation consisting of horizontal lines, with singularities along the \( y\)-axis.

The line field \(\ker \beta = \langle \partial_x \rangle\) is shown in orange; the vector field \(X = x \partial_x\) is shown in blue.

Note that, although we might have expected the line field of \( \ker \beta\) and the arrows of \( X\) to point all over the place, in fact the arrows and lines point in the same direction, i.e. horizontal! The vectors of \( X\) point along the lines of \( \ker \beta\). This means that, at each point, the vector \( X\) lies in the kernel of \( \beta\), and hence \( \beta(X) = 0\).

Now in this example, the vector field \(X = x \partial_x\) has a very nice property. If you flow it, then points move out horizontally, and exponentially. Indeed, if you interpret \( x \partial_x\) as a velocity vector field, it is telling a point with horizontal coordinate \( x\) to move to the right, with velocity \( x\). Telling something to move as fast as where it already is, is a hallmark of exponential movement.

Denoting the flow of \( X\) for time \( t\) by \( \phi_t\), we have a map \( \phi_t : \mathbb{R}^2 \rightarrow \mathbb{R}^2\). It’s an exponential function:
\[ \phi_t (x,y) = (x e^{t}, y). \]
Indeed, you can check that \( \frac{\partial}{\partial t} \phi_t (x,y) = (x e^t, 0)\), which at time \( t=0\) is
\[ \frac{\partial}{\partial t} \Big|_{t=0} \phi_t (x,y) = (x,0) = X. \]
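As a sanity check of the closed form, the following sketch integrates \( X = x \partial_x\) numerically with a classical fourth-order Runge-Kutta scheme (a choice made for this illustration; any ODE solver would do) and compares the result with \( \phi_t(x,y) = (x e^t, y)\). The starting point and step count are arbitrary.

```python
import math


def X(x, y):
    """The vector field X = x d/dx from the example."""
    return (x, 0.0)


def flow(x, y, t, steps=1000):
    """Approximate phi_t(x, y) by integrating X with classical RK4."""
    dt = t / steps
    for _ in range(steps):
        k1 = X(x, y)
        k2 = X(x + dt / 2 * k1[0], y + dt / 2 * k1[1])
        k3 = X(x + dt / 2 * k2[0], y + dt / 2 * k2[1])
        k4 = X(x + dt * k3[0], y + dt * k3[1])
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return (x, y)


for t in (0.5, 1.0, 2.0):
    numeric = flow(1.5, -3.0, t)
    exact = (1.5 * math.exp(t), -3.0)  # the closed form phi_t(x, y) = (x e^t, y)
    print(t, numeric, exact)
```

The \( y\)-coordinate never changes, and the \( x\)-coordinate agrees with \( x e^t\) to many decimal places.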

Now the vector field \(X\) (or its flow \(\phi_t\)) expands in the horizontal direction exponentially, and does nothing in the \(y\) direction: from this it follows that \(X\) also expands area exponentially. An infinitesimal volume \(V\), after flowing under \(\phi_t\) for time \(t\), expands so that \(\frac{\partial V}{\partial t} = V\), and hence grows exponentially: at time \(t\), the volume has expanded from \(V\) to \(V e^t\). Rephrased in terms of differential forms, the Lie derivative of the area form \(d\beta = dx \wedge dy\) under the flow of \(X\) is the area form itself:
\[ L_X d\beta = d\beta. \]

To see this, we use the Cartan formula \(L = d\iota + \iota d\), which yields
\[ L_X d\beta = d \iota_X (d\beta) + \iota_X d (d\beta). \]
From the definition of \(X\), being dual to \(\beta\), we have \(\iota_X (d\beta) = \beta\); using this, and the fact that \(d^2 = 0\), the first term becomes \(d\beta\) and the second term is zero.
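The area statement can also be checked directly on the explicit flow. In the sketch below (the square and the times are arbitrary choices for illustration), \( \phi_t\) is applied to the corners of a square. Since \( \phi_t\) is linear, it maps the square exactly onto a rectangle, so the shoelace formula gives the exact area ratio, which should be \( e^t\).

```python
import math


def phi(t, x, y):
    """Exact time-t flow of X = x d/dx: (x, y) -> (x e^t, y)."""
    return (x * math.exp(t), y)


def shoelace_area(vertices):
    """Area of a simple polygon whose vertices are listed in order."""
    total = 0.0
    n = len(vertices)
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# phi_t is linear, so it maps this square exactly onto a rectangle,
# and the shoelace formula gives the exact area of the image.
square = [(0.2, 0.1), (0.3, 0.1), (0.3, 0.2), (0.2, 0.2)]
for t in (0.5, 1.0, 2.0):
    image = [phi(t, x, y) for x, y in square]
    ratio = shoelace_area(image) / shoelace_area(square)
    print(t, ratio, math.exp(t))  # the last two columns should agree
```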

In fact, \( X\) doesn’t just expand the area \( d\beta\) exponentially; it also expands the Liouville 1-form \( \beta\) exponentially. In other words,
\[ L_X \beta = \beta. \]

To see this, we just apply the Cartan formula,
\[ L_X \beta = d \iota_X \beta + \iota_X d\beta = 0 + \beta. \]
The first term is zero because, as we saw, \( X\) points along \( \ker \beta\), so \( \iota_X \beta = \beta(X) = 0\); the second term is \( \beta\) because of the definition of \( X\) as dual to \( \beta\).
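Both facts can be checked numerically without the Cartan formula, using the coordinate expression \( (L_X \beta)_j = X^i \partial_i \beta_j + \beta_i \partial_j X^i\) (summing over \( i\)) with central differences. This is an illustrative sketch, not part of the original argument. The helper names are hypothetical, and 1-forms and vector fields are represented as pairs of Python functions.

```python
def partial(f, i, p, h=1e-5):
    """Central-difference derivative of f(x, y) in coordinate i (0 for x, 1 for y) at p."""
    plus, minus = list(p), list(p)
    plus[i] += h
    minus[i] -= h
    return (f(*plus) - f(*minus)) / (2 * h)


def lie_derivative_1form(beta, X, p):
    """Components of L_X beta at p, using (L_X beta)_j = X^i d_i beta_j + beta_i d_j X^i.

    beta and X are each a pair of functions of (x, y): (x-component, y-component).
    """
    result = []
    for j in range(2):
        value = 0.0
        for i in range(2):
            value += X[i](*p) * partial(beta[j], i, p)
            value += beta[i](*p) * partial(X[i], j, p)
        result.append(value)
    return tuple(result)


def pair(beta, X, p):
    """The number beta(X) at p."""
    return beta[0](*p) * X[0](*p) + beta[1](*p) * X[1](*p)


beta = (lambda x, y: 0.0, lambda x, y: x)  # beta = x dy
X = (lambda x, y: x, lambda x, y: 0.0)     # X = x d/dx
p = (1.7, -0.4)
print(lie_derivative_1form(beta, X, p))  # roughly (0.0, 1.7): the components of beta at p
print(pair(beta, X, p))                  # beta(X) = 0
```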

To summarise our example so far: we started with a 1-form \( \beta\) whose exterior derivative \( d\beta\) was a non-degenerate 2-form. We took a vector field \( X\) dual to \( \beta\), using the non-degeneracy of \( d\beta\). We have found that:

  • The vector field \( X\) points along the foliation \( \ker \beta\); in other words, \( \beta(X) = 0\).
  • Flowing \( X\) expands area exponentially: \( L_X d\beta = d\beta\).
  • Flowing \( X\) in fact expands the 1-form \( \beta\) exponentially: \( L_X \beta = \beta\).

We proved some of these in sort-of generality, but not everything. Let’s prove it all at once in general, now.

PROPOSITION. Let \( \beta\) be a 1-form on a surface such that \( d\beta\) is non-degenerate. Let \( X\) be dual to \( \beta\), i.e. \( \iota_X d\beta = \beta\). Then:
(i) \(\beta(X) = 0\)
(ii) \( L_X d\beta = d\beta\)
(iii) \( L_X \beta = \beta.\)

PROOF. For (i), we note \( \beta(X) = \iota_X \beta\), and then use \( \beta = \iota_X d\beta\) and the fact that differential forms are antisymmetric:
\[ \beta(X) = \iota_X \beta = \iota_X \iota_X d\beta = d\beta(X,X) = 0. \]

For (ii) and (iii), we can then follow the arguments above. For (ii), we use the Cartan formula \( L_X = d \iota_X + \iota_X d\), the fact that \( d^2 = 0\), and the fact that \( \iota_X d\beta = \beta\).
\[ L_X d\beta = d \iota_X d\beta + \iota_X d d\beta = d\beta + 0 \]

For (iii), we use the Cartan formula, part (i) that \(\iota_X \beta = 0\), and the fact that \(\iota_X d\beta =\beta\).
\[ L_X \beta = d \iota_X \beta + \iota_X d \beta = d 0 + \beta = \beta. \]
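As a worked check of the Proposition in a case where \( d\beta\) is not the standard area form, take the illustrative example \( \beta = (x + \tfrac{1}{3}x^3) \; dy\), which is not from the discussion above. Then \( d\beta = (1+x^2) \, dx \wedge dy\) is non-degenerate everywhere. Writing \( c = 1 + x^2\) and using the coordinate formula \( L_X(c \, dx \wedge dy) = (X(c) + c \operatorname{div} X) \, dx \wedge dy\), we can verify (i) to (iii) directly, without the Cartan formula:
\[
\begin{aligned}
X &= \frac{x + x^3/3}{1 + x^2} \, \partial_x,
\qquad \beta(X) = 0 \quad (\beta \text{ has no } dx \text{ term and } X \text{ has no } \partial_y \text{ term}), \\
X(c) &= \frac{2x \, (x + x^3/3)}{1 + x^2},
\qquad c \operatorname{div} X = (1 + x^2) - \frac{2x \, (x + x^3/3)}{1 + x^2}, \\
L_X d\beta &= \big( X(c) + c \operatorname{div} X \big) \, dx \wedge dy = (1 + x^2) \, dx \wedge dy = d\beta, \\
L_X \beta &= X^x \, \partial_x \!\left( x + \tfrac{1}{3}x^3 \right) dy = \frac{x + x^3/3}{1 + x^2} \, (1 + x^2) \, dy = \beta.
\end{aligned}
\]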

So Liouville structures have some very nice properties.

And, this is all in fact classical physics. We can think of the plane as a phase space, with position \( q = y\) and momentum \( p = x\), and \( \beta\) as the action form \( x \; dy = p \; dq\). Then \( d\beta\) is the symplectic form on phase space, and the equation \( L_X d\beta = d\beta\) shows that this symplectic form, the fundamental structure on the phase space, is expanded by the flow of \( X\). (There is something in classical mechanics, or symplectic geometry, known as Liouville’s theorem, which says that the flow of a Hamiltonian vector field preserves the symplectic form, and hence phase-space volume; contrast this with the expansion here.)

Anyway, above we saw one example of a Liouville structure on the plane. Here’s another one, which is more “radial”.

Take \(\beta = \frac{1}{2}(x \; dy - y \; dx)\). Then \(d\beta = \frac{1}{2}(dx \wedge dy - dy \wedge dx) = dx \wedge dy\). The vector field \(X\) dual to \(\beta\) is then \(\frac{1}{2}(x \, \partial_x + y \, \partial_y)\), which is a radial vector field. The kernel of \(\beta\) is also the radial direction: \(\beta(X) = \frac{1}{4}(x \; dy - y \; dx)(x \partial_x + y \partial_y) = \frac{1}{4}(xy - yx) = 0\). The flow of \(X\) then looks like it should expand area exponentially, and it does.
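For the radial example, these claims can be checked by hand. The following worked computation, an illustration not found in the original text, confirms the duality, the vanishing of \( \beta(X)\), and the area expansion, using \( \iota_X(dx \wedge dy) = dx(X) \, dy - dy(X) \, dx\):
\[
\begin{aligned}
\iota_X (dx \wedge dy) &= \tfrac{1}{2}x \; dy - \tfrac{1}{2}y \; dx = \beta
\qquad \text{for } X = \tfrac{1}{2}(x \, \partial_x + y \, \partial_y), \\
\beta(X) &= \tfrac{1}{2}\left( x \cdot \tfrac{1}{2}y - y \cdot \tfrac{1}{2}x \right) = 0, \\
\phi_t(x,y) &= \left( e^{t/2}x, \; e^{t/2}y \right),
\qquad \det D\phi_t = e^{t/2} \cdot e^{t/2} = e^{t}.
\end{aligned}
\]
So the radial flow stretches each direction by only \( e^{t/2}\), yet area still grows like \( e^t\), just as \( L_X d\beta = d\beta\) predicts.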

Liouville structures can exist in other places too, and not just on surfaces: they exist in higher dimensions too. Notice that the fact that \(\beta\) was on a surface was never actually used in the proof above: it could have been any manifold, with any 1-form \( \beta\) such that \( d\beta\) is non-degenerate.

However, not every manifold has Liouville structures. In fact, there are many surfaces on which no Liouville structures exist. Any compact surface (without boundary) has no Liouville structure.

Why is this? The idea is pretty simple. If you have a closed and bounded surface, it’s pretty hard to have a smooth vector field \( X\) which expands the area! Your surface has a given finite area, and then you flow it along \( X\) — moving the points of the surface around by a diffeomorphism — and now it has exponentially larger area! This is pretty paradoxical, and indeed it’s a contradiction.

The plane escapes this paradox, because it’s not bounded. You can indeed expand the plane by any factor you like, and it’s still the same plane. Surfaces with boundary also escape the paradox, because the flow of \( X\) will not be defined for all time: eventually points will be pushed off the edge.

But a sphere does not escape the paradox. Nor does a torus, or any higher genus compact surface without boundary.

Further, Liouville structures can only exist in even dimensions. So you can’t have one on a 3-dimensional space. But you can have one on a 4-dimensional space. Why is this? It can be seen from linear algebra. You simply can’t have non-degenerate 2-forms in odd dimensions. (The easiest way I know to see this is as follows. Let \( \omega\) be a non-degenerate 2-form on an \( n\)-dimensional vector space. Choose a basis \( e_1, \ldots, e_n\) and write \( \omega(e_i, e_j)\) as the \( (i,j)\) term of the \( n \times n\) matrix \( A\) for \( \omega\). The facts that \( \omega\) is antisymmetric and non-degenerate mean that \( A^T= -A\) and \( \det A \neq 0\) respectively. But then \( \det A = \det A^T = \det (-A) = (-1)^n \det A\); since \( \det A \neq 0\), this forces \( (-1)^n = 1\), so \(n\) is even.)
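The determinant argument can be sanity-checked numerically. The sketch below (an illustration with arbitrary choices: random integer entries, and exact rational arithmetic from Python's standard `fractions` module) builds random antisymmetric matrices and computes their determinants exactly. In odd sizes they always vanish, while in size 2 the matrix of \( dx \wedge dy\) has determinant 1.

```python
from fractions import Fraction
import random


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot in this column: the matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign of the determinant
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def random_antisymmetric(n, rng):
    """Random integer n x n matrix A with A^T = -A (so the diagonal is zero)."""
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            v = rng.randint(-9, 9)
            a[i][j], a[j][i] = v, -v
    return a


rng = random.Random(0)
for n in (3, 5, 7):
    # every antisymmetric matrix of odd size is singular
    assert all(det(random_antisymmetric(n, rng)) == 0 for _ in range(100))

# in even dimensions non-degenerate 2-forms exist, e.g. dx ^ dy on R^2
print(det([[0, 1], [-1, 0]]))  # 1
```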

Just as a Liouville structure cannot exist on a compact surface (without boundary), it can’t exist on any compact manifold (without boundary). The argument is similar: because \( X\) expands the 2-form \( d\beta\), it also expands all its exterior powers, and hence the volume form of the manifold, whatever (even) dimension it may be.

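So, we’ve seen that a Liouville structure can only exist on a manifold with even dimension, and which is not compact, or has boundary. (If you know about de Rham cohomology, there is another way to see the compactness obstruction: on a compact manifold without boundary, a symplectic form always represents a nonzero class in \( H^2\), whereas \( d\beta\) is exact, so its class is zero.)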

But when it does, we have a wonderful little geometric triplet \( \beta, d\beta\) and \( X\).

Emmy had a theorem (mathematical nursery rhyme #2)

In the spirit of previous work in abstract algebra, I have, erm, adapted another nursery rhyme.

After all, the songs are so common and commonly known; why not update them with some definite content?

To the tune of “Mary had a little lamb” (with no disrespect to the original, which seems to be an endearing story of an actual lamb), a discussion of Noether’s theorem.

If you haven’t heard of Noether’s theorem, it is very nice. (It should be distinguished from several other theorems of Emmy Noether, and indeed other mathematical Noethers.)

Roughly speaking, Noether’s theorem states that whenever a physical system has a nice symmetry, there is always some numerical quantity which is conserved along with it.

For instance, if a physical system is invariant under translation, then there is a conserved quantity associated to it, known as momentum. (And there are translations in three independent directions in space, so there are three components of momentum which are conserved. In other words, momentum as a vector quantity is conserved.) Similarly, if it’s invariant under rotations, then there is a conserved quantity known as angular momentum. Invariant under moving forward and backward in time — a conserved quantity known as energy. And so on.

This is not very precise, and there are different ways of formulating it, and of course physicists and mathematicians have different perspectives about it — as well as the level of mathematical precision and rigour with which it should be stated and understood.

The wikipedia page, at least at the time of writing, has a very physics-oriented discussion, which would offend many mathematicians’ sensibilities — certainly including my own. The nicest mathematical formulation uses symplectic geometry, and hence some fairly serious prerequisite knowledge, well beyond the Australian undergraduate curriculum. (Unless you take an undergraduate research unit with me at Monash, perhaps!)

A good discussion may be found in the lecture notes of Ana Cannas da Silva, available online here. Once enough machinery is developed to state the principle cleanly (um, in section 24 on page 147…), the theorem is proved in a leisurely half a dozen lines.

Anyway, less talk, more nursery rhymes!

Emmy had a theorem,
theorem, theorem
Emmy had a theorem
Its proof was clear as day.

Everywhere a symmetry,
symmetry, symmetry
Everywhere a symmetry
A conserved quantity.

Golay Golay Golay (Top of the autocorrelation world)

In 1949, Marcel Golay was thinking about spectrometry.

As he described it some time later, the situation was as follows.

You have a spectrometer. The point of spectrometry is to find the frequency of light (or electromagnetic radiation more generally — but for convenience I’ll just say “light” from now on). Given a light source, spectrometry aims to find which frequencies (or colours) of light occur in it, and how they are distributed across the optical spectrum.

The spectrometer Golay had in mind was a cleverly designed “multislit” one. As the name suggests, it had many slits. Each slit could be open or closed. Light would come in on one side, pass through the contraption, and then exit on the other side, where detectors would be placed to record the output.

Both the entrance side and the exit side had many slits — the same number \(4N\) on either side. (Why a multiple of 4? It’s all part of the clever design, read on…)

Moreover, each entrance slit had a natural pathway through to an exit slit. The slits were designed so that light entering a particular entrance slit would pass through to a specific exit slit. The entrance and exit slits were thus matched up in a one-to-one fashion. This “matching up” in fact “inverted” the light: light coming in through the top slit on the left would exit through the bottom slit on the right; light entering through the second-from-top slit on the left would exit through the second-from-bottom slit on the right; and so on.

At least, that’s what would happen for one particular colour of light, i.e. one particular frequency — let’s say pure crimson red. The point of the spectrometer is to pick out distinct frequencies, and so this contraption is “tuned” to perfectly align the slits for crimson red light.

What about other frequencies? They get shifted. When light of another frequency, let’s say green, passed through an entrance slit, it did not end up in the same place as crimson red light, opposite to where it came in; rather, it ended up shifted across by some number \(j\) of slits.

In other words, if red light and green light enter through the same slit, they exit through slits which are \(j\) spots apart from each other.

Golay’s idea was to arrange the slits and detectors in a clever way, so as to eliminate all the light of other frequencies, and isolate the preferred (red) light. By an ingenious arrangement of detectors and open and closed slits, the red light would be greatly enhanced, with other colours (frequencies) completely filtered out.

How did this arrangement go? In a slightly complicated way. The entrance slits would be split into four equal length sections, each of length \(N\), as would the exit slits. Light entering through a slit in a particular section would go out a slit in the corresponding (opposite) section of exit slits.

These sections were separated from each other. In particular, non-red light could be shifted across slits within one section, but it could not cross over to another section.

Golay imagined there to be two detectors. The first detector \(D_1 \) would cover the bottom two exit sections, measuring the total amount of light exiting the bottom half of the slits, i.e. the bottom \( 2N \) exit slits. The other detector \(D_2\) would cover the top two exit sections, i.e. the top \( 2N \) exit slits. The detectors \( D_1 \) and \( D_2 \) simply capture the amount of light coming out of the bottom and top \( 2N \) slits respectively, or equivalently for our purposes, the number of those slits through which light emerges.

So in effect the whole contraption is in four separated parts, and there are two detectors, each detecting the output from two of the parts.

From Golay’s 1961 paper “Complementary Sequences”, IRE Transactions on Information Theory. What do the a’s and b’s mean? Read on…

Now, how to arrange the open and closed slits? Let’s denote open slits by a \( +1 \) (or just \( + \) for short), and closed slits by a \( -1 \) (or just \( - \) for short). So a sequence of open and closed slits can be denoted by a sequence of \( + \) and \( - \) symbols.

(You might think \( 1 \)s and \( 0 \)s are more appropriate for open and closed slits than \( +1\)s and \( -1 \)s. You could indeed use \(1\)s and \(0\)s; in that case I’ll leave it up to you to adjust the mathematics below.)

Now Golay suggested taking two sequences \(a\) and \(b\) of \(+\)s and \(-\)s, each of length \(N\). They would be used to configure the slits. Let’s write \( a = (a_1, \ldots, a_N) \) and \(b = (b_1, \ldots, b_N)\), where every \(a_i \) or \( b_i \) is either a \( +1 \) or a \( -1 \).

Now, sequences \(a \) and \( b \) each have length \( N\), but there are \( 4N \) entrance slits and \( 4N \) exit slits.

What to do? Golay said what to do. He also said to take the negatives of \( a \) and \( b\). The negative of a sequence is given by multiplying all its terms by \(-1 \) (just like how you take the negative of a number). In other words, to take the negative of the sequence \( a\), you replace each \( + \) with a \( - \), and each \( - \) with \(+\). We can write \(-a\) for the negative of \(a\), and \(-b\) for the negative of \(b\).

Golay suggested, very cleverly, that the \(4N \) entrance slits, from top to bottom, should be arranged using \(a \) (for the top \( N \) slits), then \( -a \) (for the next \( N\)), then \( b \) (for the next \(N\)), and finally \(-b\) (for the bottom \(N\) slits). So as we read down the slits we read the sequences \( a,-a,b,-b\).

On the exit side, because the light is “inverted”, we now read bottom to top. Golay suggested that, as we read up the slits, we use the sequences \( a,-a,-b,b\). That’s not quite the same as what we did on the entrance side. The top \(N\) entrance slits, set according to the sequence \(a\), correspond to the bottom \(N \) exit slits, also set according to the sequence \(a\). The next \(N\) entrance slits are set according to \(-a\), as are the next \( N \) exit slits. But after that, the entrance slits set according to \( b \) correspond to the exit slits set to \( -b \) ; and the final \( N \) entrance slits are set to \( -b\), with corresponding exit slits set to \( b\). So the \(a \) and \( -a \) slits “match”, but the \( b \) and \( -b \) “anti-match”.

We can now see what the a’s and b’s mean in Golay’s diagram. (Golay writes \( a’ \) and \( b’ \) rather than \(-a\) and \(-b\).)

From Golay’s 1961 paper “Complementary Sequences”, IRE Transactions on Information Theory, again.

One final twist: the output of the contraption is measured by the two detectors \( D_1 \) and \( D_2\). But Golay proposed not to add their results, but to subtract them. So the final number we want to look at is not \(D_1 + D_2\), but \( D_1 - D_2\).

Anyway, that was Golay’s prescription.

So what happens to light going through this spectroscopic contraption, now with its numerous slits configured in this intricate way?

First let’s consider red light — which, recall, means the light goes straight from entrance slit to opposite exit slit. We’ll take the four sections separately, which, we recall, are labelled \(a,-a,b,-b \) at the entrance, and \( a,-a,-b,b \) at the exit.

  • For light hitting one of the top \( N \) entrance slits, one encoded by \( a_i\), it is blocked if \( a_i = -1\). But if \(a_i = 1 \) then the light sails through the open slit, out to the corresponding exit slit, which is also labelled \( a_i = 1\), and through to the detector \( D_1\).
  • Similarly, consider one of the entrance slits in the next section, encoded by some \( -a_i\). Light is blocked if \(-a_i = -1\), but if \( -a_i = 1 \) then the light sails over to the corresponding exit slit, also labelled \( -a_i = 1\), through to the detector \( D_1\).
  • Now consider the third section, where entrance slits are encoded by \( b \) but exit slits are encoded by \( -b\). Light hits a slit encoded by some \(b_i\). If \(b_i = -1\), the entrance slit is closed, and the light is blocked there. If \( b_i = 1\), the entrance slit is open, and the light enters, but then the exit slit is encoded by \( -b_i = -1\), so is closed, and the light is blocked here. Either way, the light is blocked.
  • The final section is similar. The entrance slit is labelled by some \( -b_i\), and the exit slit by \(b_i\). If \( -b_i = -1\), the entrance slit is closed and light is blocked; if \( -b_i = 1\), then the entrance slit is open, but as \( b_i = -1\), the exit slit is blocked. Either way the light is blocked.

Now detector \(D_1 \) counts the number of slits in the first two sections from which light emerges. In the first section, those slits are the ones encoded by \( a_i \) such that \( a_i = 1\). In the second section, those slits are the ones encoded by \( -a_i \) such that \( -a_i = 1 \) , i.e. \( a_i = -1\). On the other hand, \(D_2 \) detects nothing, as everything is blocked. So we have

\[ \begin{aligned}
D_1 &= ( \# i \text{ such that } a_i = 1) + ( \# i \text{ such that } a_i = -1), \\
D_2 &= 0.
\end{aligned} \]

The expression for \( D_1 \) simplifies rather dramatically, because every \( a_i \) is either \( +1 \) or \( -1\). If you add up the number of \(+1\)s and the number of \(-1\)s, you simply get the number of terms in the sequence, which is \(N\). Thus in fact
\[ D_1 = N, \quad D_2 = 0, \]
and the final result (remember we subtract the results of the two detectors) is
\[ D_1 - D_2 = N. \]

So, we end up with a nice result, when we feed Golay’s spectroscope light of the colour it’s designed to detect (i.e. red).

Now, what happens with other colours? Let’s now feed Golay’s spectroscope some other colour (i.e. frequency, i.e. wavelength) of light, which means that the light gets shifted across \( j \) slots. Let’s say the light is green.

  • Consider green light hitting one of the top \( N \) entrance slits, encoded by \( a_i\). The light is blocked if \( a_i = -1\). But if \(a_i = 1 \) then the light sails through the open slit, over to the corresponding exit slits, which are also encoded by the sequence \( a\). The light is shifted across \(j \) slots in the process, and so arrives at the exit slit encoded by \( a_{i+j}\). If \(a_{i+j} = 1\), the light proceeds to detector \( D_1\); otherwise, the light is blocked. In other words, the green light gets to the detector if and only if \( a_i = a_{i+j} = 1\).
  • (Note also that if \( i+j > N \) or \( i+j < 1\), then the light beam gets shifted so far across that it hits the end of the section of the machine; and the sections are separated from each other. So we only need to consider those \( i \) (which are between \( 1 \) and \( N \)) such that \( i+j \) is also between \( 1 \) and \( N\). In other words, assuming \( j \) is positive, \( i \) only goes from \( 1 \) up to \( N-j\).)
  • Now consider green light hitting the second section, where entrance and exit slits are labelled by \( -a_i\). If \( -a_i = -1\), then light is blocked at the entrance. If \( -a_i = 1\), light enters, and proceeds with a shift over to an exit slit encoded by \( -a_{i+j}\). If \( -a_{i+j} = -1\), light is blocked at the exit, but if \( -a_{i+j} = 1\), then the light proceeds to detector \( D_1\). In other words, light gets to the detector \( D_1 \) if and only if \( -a_i = -a_{i+j} = 1\), or equivalently, \( a_i = a_{i+j} = -1\).
  • In the third section, entrance slits are encoded by \( b \) and exit slits by \( -b\), and the light goes to detector \( D_2\). For light to get through, we must have \( b_i = 1 \) and \( -b_{i+j} = 1\).
  • Finally, in the fourth section, entrance slits are encoded by \( -b \) and exit slits by \( b\). Light gets through when \( -b_i = 1 \) and \( b_{i+j} = 1 \).

Putting these together, we have
\[ \begin{aligned}
D_1 &= ( \# i \text{ such that } a_i = 1 \text{ and } a_{i+j} = 1) + ( \# i \text{ such that } a_i = -1 \text{ and } a_{i+j} = -1), \\
D_2 &= ( \# i \text{ such that } b_i = 1 \text{ and } b_{i+j} = -1) + ( \# i \text{ such that } b_i = -1 \text{ and } b_{i+j} = 1).
\end{aligned} \]

Now let’s manipulate these sums a little. Note that, for any \(i\), \( a_i = \pm 1 \) and \( a_{i+j} = \pm 1\). Thus the product \( a_i a_{i+j} = \pm 1\). But note that \( a_i a_{i+j} = 1 \) precisely when \( a_i = a_{i+j} = 1\), or \( a_i = a_{i+j} = -1\), i.e. when \( a_i \) and \( a_{i+j} \) are equal. These are precisely the cases counted in the sum for \( D_1 \) above. When \( a_i \) and \( a_{i+j} \) are not equal, they multiply to \( -1 \) instead.

Similarly, consider \( b_i \) and \( b_{i+j}\). The product \( b_i b_{i+j} \) is equal to \( -1 \) precisely when \( b_i = 1 \) and \( b_{i+j} = -1 \) , or when \( b_i = -1 \) and \( b_{i+j} = 1 \) . And these are precisely the cases counted above for \( D_2 \) .

So we have
\[ \begin{aligned}
D_1 &= ( \# i \text{ such that } a_i a_{i+j} = 1 ), \\
D_2 &= ( \# i \text{ such that } b_i b_{i+j} = -1).
\end{aligned} \]
Now, as we’ve said, for each \(i\), \( a_i a_{i+j} \) is \( 1 \) or \( -1\). For how many \( i \) do we get \( +1\)? Precisely \(D_1 \) times! Because that’s exactly what the equation above for \( D_1 \) says. All the other terms must be \( -1\). And we said above that \( i \) goes from \( 1 \) up to \( N-j\). So there are \( N-j-D_1 \) times that \( a_i a_{i+j} = -1\).

Let’s now just add up all the terms \( a_i a_{i+j}\), all the way from \( i=1\), i.e. the term \( a_1 a_{1+j}\), to \( i=N-j\), i.e. the term \(a_{N-j} a_{N-j+j}\). We get \(+1 \) sometimes (precisely \( D_1 \) times) and \( -1 \) sometimes (precisely \( N-j-D_1 \) times). It follows that
\[ a_1 a_{1+j} + \cdots + a_{N-j} a_{N-j+j} = 1 \cdot D_1 + (-1) \cdot (N-j-D_1) \]
or if we tidy up,
\[ \sum_{i=1}^{N-j} a_i a_{i+j} = 2D_1 - N + j. \]
We can do the same for the terms \( b_i b_{i+j}\). We get \( -1 \) precisely \( D_2 \) times, as the equation for \( D_2 \) says above. And we get \( +1 \) all the other times, but there are \( N-j \) times overall, so we get \( +1 \) precisely \( N-j-D_2 \) times. Hence
\[ b_1 b_{1+j} + \cdots + b_{N-j} b_{N-j+j} = 1 \cdot (N-j-D_2) + (-1) \cdot D_2, \]
or equivalently,
\[ \sum_{i=1}^{N-j} b_i b_{i+j} = -2D_2 + N - j. \]

We want to get the final result of the detectors, which is \( D_1 - D_2\). So let’s rearrange the equations above to obtain \( D_1 \) and \( D_2\),
\[ \begin{aligned}
D_1 &= \frac{N - j}{2} + \frac{1}{2} \sum_{i=1}^{N-j} a_i a_{i+j}, \\
D_2 &= \frac{N - j}{2} - \frac{1}{2} \sum_{i=1}^{N-j} b_i b_{i+j},
\end{aligned} \]
and subtract. When we do so, things simplify considerably!
\[ D_1 - D_2 = \frac{1}{2} \sum_{i=1}^{N-j} \left( a_i a_{i+j} + b_i b_{i+j} \right). \]
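Because this derivation has several sign-sensitive steps, it may help to check it by brute force. The sketch below (illustrative only; the function names are hypothetical) traces light through an idealised model of the four sections. It uses entrance/exit patterns \( a \to a\), \( -a \to -a\), \( b \to -b\), \( -b \to b\), with a shift of \( j \geq 0\) slits within each section, and has \( D_1\) counting the first two sections and \( D_2\) the last two. It then compares \( D_1 - D_2\) with \( \frac{1}{2}\sum_{i=1}^{N-j}(a_i a_{i+j} + b_i b_{i+j})\) for random sequences. Indices in the code are 0-based.

```python
import random


def detector_difference(a, b, j):
    """Trace light of shift j >= 0 through the slit model and return D_1 - D_2.

    Entrance and exit patterns per section: a -> a, -a -> -a (detector D_1)
    and b -> -b, -b -> b (detector D_2). Light entering slit i of a section
    leaves at slit i + j of the same section; it reaches the detector only if
    both slits are open (+1) and i + j is still inside the section.
    """
    n = len(a)
    neg_a = [-v for v in a]
    neg_b = [-v for v in b]
    sections = [(a, a, +1), (neg_a, neg_a, +1), (b, neg_b, -1), (neg_b, b, -1)]
    total = 0
    for entrance, exit_slits, sign in sections:
        for i in range(n):
            k = i + j
            if k < n and entrance[i] == 1 and exit_slits[k] == 1:
                total += sign  # +1 for a slit seen by D_1, -1 for one seen by D_2
    return total


def half_autocorrelation_sum(a, b, j):
    """(1/2) * sum_{i=1}^{N-j} (a_i a_{i+j} + b_i b_{i+j}), with 0-based indices here."""
    n = len(a)
    return sum(a[i] * a[i + j] + b[i] * b[i + j] for i in range(n - j)) / 2


rng = random.Random(1)
N = 12
a = [rng.choice((1, -1)) for _ in range(N)]
b = [rng.choice((1, -1)) for _ in range(N)]
for j in range(N):
    # the two columns agree for every shift; j = 0 (red light) gives N
    print(j, detector_difference(a, b, j), half_autocorrelation_sum(a, b, j))
```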

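This is a very nice result. And it reduces what Golay wanted to a very interesting maths problem. Two sequences \( a = (a_1, \ldots, a_N) \) and \( b = (b_1, \ldots, b_N) \) of \( \pm 1\)s are called a complementary pair or a Golay pair if, for all \( j \neq 0\), this sum is zero (it suffices to check \( 1 \leq j \leq N-1\): shifts in the other direction give the same sums after reindexing, and for \( |j| \geq N \) there are no terms):
\[ \sum_{i=1}^{N-j} \left( a_i a_{i+j} + b_i b_{i+j} \right) = 0. \]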
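The definition translates directly into a short checker. This is an illustrative sketch, not part of the original argument. The example pairs of lengths 2 and 4 below are small complementary pairs chosen for illustration and verified by hand, not taken from Golay's paper.

```python
def autocorrelation(s, j):
    """Aperiodic autocorrelation sum_{i=1}^{N-j} s_i s_{i+j} (0-based indices in code)."""
    return sum(s[i] * s[i + j] for i in range(len(s) - j))


def is_golay_pair(a, b):
    """True if a, b are +/-1 sequences of equal length whose autocorrelations cancel for 1 <= j <= N-1."""
    if len(a) != len(b) or any(v not in (1, -1) for v in list(a) + list(b)):
        raise ValueError("expected two +/-1 sequences of the same length")
    return all(autocorrelation(a, j) + autocorrelation(b, j) == 0 for j in range(1, len(a)))


print(is_golay_pair([1, 1], [1, -1]))               # True
print(is_golay_pair([1, 1, 1, -1], [1, 1, -1, 1]))  # True
print(is_golay_pair([1, 1, 1, 1], [1, 1, 1, 1]))    # False
```

For instance, with \( a = (+,+,+,-)\), \( b = (+,+,-,+)\) and shift \( j = 1\), the \( a\)-sum is \( 1 + 1 - 1 = 1\) and the \( b\)-sum is \( 1 - 1 - 1 = -1\), which cancel.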
Sums like these are often called autocorrelations. So the property we are looking for is a property of autocorrelations. Golay pairs are all about autocorrelations. Hence the title of this post.

If you can find a pair of Golay complementary sequences, then you can configure all the slits in the multislit spectrometer according to the sequences, and for any colour except the one you are looking for (red), the detectors will perfectly cancel out that colour! So your spectrometry will be greatly enhanced.

Now you might wonder, do any such pairs exist?

Yes, they do. Oh yes, they do. And the question of when they exist is also a very interesting one, not yet completely solved, with lots of ongoing research.

Stay tuned for more.

P.S. Yes, the title of this blog post is based on a song by Chumbawamba. It’s a very excellent song.

The “Australia day” category error

Australia’s national holiday commemorates not some heroic act, but the arrival of settler colonists who occupied and settled the land, dispossessing the original and rightful inhabitants of the continent. Aboriginal sovereignty was never ceded; no treaty has ever been signed. Historic dispossession and violence, involving frontier wars and genocidal campaigns, decimated the Indigenous nations. There is struggle and heroism here, but mainly in the capacity of Indigenous peoples to resist and to survive.

Suppose I came to your home, invited myself in, made it my home, took your possessions, evicted or kidnapped or infected or murdered your family, and then celebrated the anniversary of my arrival each year — what would be the appropriate response?

And the answer is the same in the excruciatingly mind-numbing debate each year in Australia about whether the national holiday is appropriate.

(To avoid maximum excruciation, let us state the obvious. Clearly this analogy is not literal; no individual living today bears direct moral culpability for tragedies which unfolded in historical time. But it is precisely the symbolism, and national commemorations are pure symbolism, by design.)

The question in this mind-numbing debate may be an easy one, but even to ask it — of non-Indigenous Australians — contains a category error.

If I took over your home and then held a celebration there each year, it is not for me to say whether that celebration is appropriate. It is for you to say. I may well say it is not appropriate, but even if I think it is, your view counts for more; you have suffered the injustice. The correct answer is not just “no”, but also “it’s not for me to say”.

And so, to answer the question of the appropriateness of “Australia day”, the answers of Indigenous people are the most important. Everybody is entitled to their opinion, but an opinion on the question which does not take into account the views of Indigenous people cannot be taken seriously.

Views of Indigenous Australians can easily be found. The broadest data I’m aware of are poll results from 2017, a survey of 1,156 Indigenous Australians about “Australia day”. (If you know a better or more recent poll I would be happy to update.) It found that:

  • 54% of Indigenous Australians were in favour of a change of date. This may suggest that only a slim majority are against the event, but further results make it clear that the other 46% are far from being uniformly enthusiastic. For instance:
  • The survey asked participants to associate three words with Australia day. The most chosen words by Indigenous Australians were “invasion”, “survival” and “murder”.
  • A majority of Indigenous Australians said that the name “Australia day” should change.
  • 23% of Indigenous participants felt positive about Australia day, 30% had mixed feelings, and 31% had negative feelings.

Despite the above poll results, in January 2018 the Indigenous Affairs minister (who is not Indigenous) claimed that “no Indigenous Australian has told him the date of Australia Day should be changed other than a single government adviser”. This says more about a politician being out of touch, than it does about the distribution of opinion among Indigenous Australians.

In contrast, Jack Latimore, editor of IndigenousX, the prominent online platform for Indigenous voices, comes to a rather different conclusion. His conclusion, based on his extensive experience and engagement with Indigenous Australians from across the social and political spectrum, is worth repeating:

When it comes to the subject of 26 January, the overwhelming sentiment among First Nations people is an uneasy blend of melancholy approaching outright grief, of profound despair, of opposition and antipathy, and always of staunch defiance.

The day and date is steeped in the blood of violent dispossession, of attempted genocide, of enduring trauma. And there is a shared understanding that there has been no conclusion of the white colonial project when it comes to the commonwealth’s approach to Indigenous people. We need only express our sentiments regarding any issue that affects us to be quickly reminded of the contempt in which our continued presence and rising voices are held.

Nor is our sentiment in regards to 26 January a recent phenomenon. I have witnessed it throughout my life in varied intensities. Evidence of it is even present in the recorded histories of White Australia.

Indeed, the long history of Indigenous protest against a January 26 celebration goes back at least to boycotts in 1888, and numerous actions on the 1938 sesquicentenary.

Returning to the present, numerous community leaders and representative bodies have also given their views, many of which are available online. Below are links to some such views; of course, plenty more are easily found.

Changing the date is an obvious, minimal, easy next step on the road to justice for Indigenous Australia. At the very least, maintaining the celebration in its current form is untenable. A minimal step towards respect for Indigenous Australia is to stop dancing on their ancestors’ graves.

Nor is it particularly opposed by the general Australian public. According to a December 2017 poll, most Australians are ignorant of the history of Australia Day, can’t guess what historical event happened on that day, and don’t really mind on what date it is celebrated. Half also think that the national holiday should not be held on a date offensive to Indigenous Australians (even though a plurality wrongly believes that January 26 is not offensive to Indigenous Australians).

As of a January 2017 poll, only 15% of Australians wanted to change the date. That number may well have increased by now, with the momentum of the movement to change the date.

And the survey apparently did not have “it’s not for me to say” as an option for non-Indigenous respondents — reinforcing the standard, annual category error.

I don’t believe in any patriotic holidays. But a patriotic holiday on such a terrible date needs to be moved, rebuilt, or abolished.

Topological entropy: information in the limit of perfect eyesight

Entropy is a notoriously tricky subject. There is a famous anecdote of John von Neumann telling Claude Shannon, the father of information theory, to use the word “entropy” for the concept he had just invented, because “nobody knows what entropy really is, so in a debate you will always have the advantage”.

Entropy means many different things in different contexts, but there is a wonderful notion of entropy which is purely topological. It only requires a space, and a map on it. It is independent of geometry, or any other arbitrary features — it is a purely intrinsic concept. This notion, not surprisingly, is known as topological entropy.

There are a few equivalent definitions; we’ll just discuss one, which is not the most general. As we’ll see, it can be described as the rate of information you gain about the space by applying the function, when you have poor eyesight — in the limit where your eyesight becomes perfect.

Let \(X\) be a metric space. It could be a surface, it could be a manifold, it could be a Riemannian manifold: just some space with an idea of distance on it. We’ll write \(d(x,y)\) for the distance between \(x\) and \(y\). So, for instance, \(d(x,x) = 0\); the distance from a point to itself is zero. Additionally, \(d(x,y) = d(y,x)\); the distance from \(x\) to \(y\) is the same as the distance from \(y\) to \(x\). The triangle inequality applies as well: \(d(x,z) \leq d(x,y) + d(y,z)\), so a detour via \(y\) is never shorter than going directly. And if \(x \neq y\) then \(d(x,y) > 0\); to get from one point to a different point you have to travel over more than zero distance!
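Here is a minimal Python sketch of one such space. It assumes, purely as an illustration and not as something the definition needs, that \(X\) is a circle of circumference 1: the interval \([0,1)\) with its ends glued together, with distance measured the short way round. The function name `circle_dist` is a hypothetical choice. The sketch uses exact `Fraction` arithmetic so that rounding errors don’t spoil the checks of the axioms above.

```python
from fractions import Fraction
from itertools import product


def circle_dist(x, y):
    """Distance between x and y on the circle of circumference 1,
    going the short way round (a hypothetical example metric)."""
    d = (x - y) % 1          # distance travelled going one way round
    return min(d, 1 - d)     # ...or the other way, whichever is shorter


# Sample points 0, 1/12, 2/12, ..., 11/12 on the circle.
points = [Fraction(k, 12) for k in range(12)]

for x, y, z in product(points, repeat=3):
    assert circle_dist(x, x) == 0                                      # d(x,x) = 0
    assert circle_dist(x, y) == circle_dist(y, x)                      # symmetry
    assert circle_dist(x, z) <= circle_dist(x, y) + circle_dist(y, z)  # triangle inequality
    if x != y:
        assert circle_dist(x, y) > 0                                   # distinct points are apart

print(circle_dist(Fraction(1, 10), Fraction(9, 10)))  # 1/5: it is quicker to go via 0
```

The circle is compact, so it is a legitimate choice of \(X\) for everything that follows.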

We assume \(X\) is compact, so roughly speaking, it has no holes, it doesn’t go off to infinity, its volume (if it has a volume) is finite.

Now, we will think of \(X\) as a space we are looking at, but we can’t see precisely. We have myopia. Our eyes are not that good, and we can only tell if two points are different if they are sufficiently far apart. We can only resolve points which have a certain degree of separation. Let this resolution be \(\varepsilon\). So if two points \(x,y\) are distance less than \(\varepsilon \) apart, then our eyes can’t tell them apart.

Rather than thinking of this situation as poor vision, you can alternatively suppose that \(X\) is quantum mechanical: there is uncertainty in the position of points, so if \(x\) and \(y\) are sufficiently close, your measurement can’t be guaranteed to distinguish between them. Only when \(x\) and \(y\) are sufficiently far apart can your measurement definitely tell them apart.

We suppose that we have a function \(f \colon X \rightarrow X\). So \(f\) sends points of \(X\) to points of \(X\). We assume \(f\) is continuous, but nothing more. So, roughly, if \(x\) and \(y\) are close then \(f(x)\) and \(f(y)\) are close. (Making that rough statement precise is what the beginning of analysis is about.) We do not assume that \(f\) is injective; it could send many points to the same point. Nor do we assume \(f\) is surjective; it might send all the points of \(X\) to a small region of \(X\). All we know about \(f\) is that it jumbles up the points of \(X\), moving them around, in a continuous fashion.

We are going to define the topological entropy of \(f\) as a measure of the rate of information we can get out of \(f\), under the constraints of our poor eyesight (or our quantum uncertainty). The topological entropy of \(f\) is just a number associated to \(f\), denoted \(h_{top}(f)\). In fact it’s a non-negative number. It could be as low as zero, and it can be infinite; and it can be any real number in between.

We ask: what is the maximum number of points we can distinguish, despite our poor eyesight / quantum uncertainty? If the answer is \(N\), then there exist \(N\) points \(x_1, \ldots, x_N\) in \(X\), such that any two of them are separated by a distance of at least \(\varepsilon\). In other words, for any two points \(x_i, x_j\) (with \(i \neq j\)) among these \(N\) points, we have \(d(x_i, x_j) \geq \varepsilon\). And if the answer is \(N\), then this is the maximum number; so there do not exist \(N+1\) points which are all separated by a distance of at least \(\varepsilon\).

Call this number \(N(\varepsilon)\). So \(N(\varepsilon)\) is the maximum number of points of \(X\) our poor eyes can tell apart.

(Note that the number of points you can distinguish is necessarily finite, since they all lie in the compact space \(X\). There’s no way your shoddy eyesight can tell apart infinitely many points in a space of finite volume! So \(N(\varepsilon)\) is always finite.)

Clearly, if our eyesight deteriorates, then we see less, and we can distinguish fewer points. Similarly, if our eyes improve, then we see more, so we can distinguish more points. Eyesight deterioration means \(\varepsilon\) increases: we can only distinguish points if they are further apart. Similarly, eyesight improvement means \(\varepsilon\) decreases: we can tell apart points that are closer together.

Therefore, \(N(\varepsilon)\) is a decreasing function of \(\varepsilon\). As \(\varepsilon\) increases, our eyesight deteriorates, and we can distinguish fewer points.
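Here is a hedged sketch of how one might estimate \(N(\varepsilon)\) numerically. It again assumes the circle of circumference 1 as an example of \(X\). To keep the arithmetic exact, it only looks at the \(M\) evenly spaced points \(k/M\), stored as integers \(k\), with distances measured in units of \(1/M\).

A greedy scan keeps each point that is at least \(\varepsilon\) away from every point already kept. The result is always a set of points our eyes can tell apart, so its size is a lower bound for \(N(\varepsilon)\). For the circle, with \(\varepsilon\) a multiple of \(1/M\) and at most \(1/2\), it happens to reach the true maximum, \(\lfloor 1/\varepsilon \rfloor\). The helper names are hypothetical.

```python
def grid_dist(a, b, M):
    """Circle distance between grid points a/M and b/M, in units of 1/M."""
    d = (a - b) % M
    return min(d, M - d)


def greedy_separated_count(M, eps_units):
    """Size of a greedily chosen set of grid points that are pairwise at
    least eps_units apart. Any such set is eps-separated, so this is a
    lower bound for N(eps) with eps = eps_units / M."""
    chosen = []
    for k in range(M):
        if all(grid_dist(k, p, M) >= eps_units for p in chosen):
            chosen.append(k)
    return len(chosen)


M = 1000
for eps_units in (500, 300, 100, 10):
    print(eps_units / M, greedy_separated_count(M, eps_units))
```

The counts come out as 2, 3, 10 and 100 for \(\varepsilon = 0.5, 0.3, 0.1, 0.01\). As \(\varepsilon\) shrinks, our eyesight improves and we can tell more points apart.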

Now, we haven’t yet used the function \(f\). Time to bring it into the picture.

So far, we’ve thought of our eyesight as being limited by space — by the spatial resolution it can distinguish. But our eyesight also applies over time.

We can think of the function \(f\) as describing a “time step”. After each second, say, each point \(x\) of \(X\) moves to \(f(x)\). So a point \(x\) moves to \(f(x)\) after 1 second, to \(f(f(x))\) after 2 seconds, to \(f(f(f(x)))\) after 3 seconds, and so on. In other words, we iterate the function \(f\). If \(f\) is applied \(n\) times to \(x\), we denote this by \(f^{(n)}(x)\). So, for instance, \(f^{(3)}(x) = f(f(f(x)))\); and \(f^{(0)}(x) = x\), since applying \(f\) zero times leaves \(x\) where it started.
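Here is a minimal Python sketch of iteration. For illustration, it assumes that \(X\) is the circle of circumference 1 and that \(f\) is the doubling map \(f(x) = 2x \bmod 1\), which is continuous as a map from the circle to itself. The names `iterate` and `doubling` are hypothetical.

```python
from fractions import Fraction


def iterate(f, n, x):
    """Return f^(n)(x): apply f to x, n times. f^(0)(x) is just x."""
    for _ in range(n):
        x = f(x)
    return x


def doubling(x):
    """A hypothetical example map on the circle [0, 1): x -> 2x mod 1."""
    return (2 * x) % 1


# Positions of 1/10 at times 0, 1, 2, 3, 4: 1/10, 1/5, 2/5, 4/5, 3/5
orbit = [iterate(doubling, n, Fraction(1, 10)) for n in range(5)]
print(orbit)
```

The doubling map is not injective: \(x\) and \(x + 1/2\) land on the same point. That is allowed, since we only assumed \(f\) is continuous.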

The idea is that, if you stare at two moving points for long enough, you might not be able to distinguish them at first, but eventually you may be able to. If they move apart at some point, then you may be able to distinguish them.

So while your eyes are encumbered by space, they are assisted by time. Your shoddy eyes have a finite spatial resolution they can distinguish, but over time points may move apart enough for you to resolve them.

(You can also think about this in a “quantum” way. The uncertainty principle says that uncertainties in space and time are complementary. If you look over a longer time period, you allow a greater uncertainty in time, which allows for smaller uncertainty in position. But from now on I’ll stick to my non-quantum myopia analogy.)

We can then ask a similar question: what is the maximum number of points we can distinguish, despite our myopia, while viewing the system for \(T\) seconds? If the answer is \(N\), then there exist \(N\) points \(x_1, \ldots, x_N\) in \(X\), such that at some point over \(T\) seconds, i.e. \(T\) iterations of the function \(f\), any two of them become separated by a distance of at least \(\varepsilon\). In other words, for any two points \(x_i, x_j\) (with \(i \neq j\)) among these \(N\) points, there exists some time \(t\), where \(0 \leq t \leq T\), such that \(d(f^{(t)}(x_i), f^{(t)}(x_j)) \geq \varepsilon\). And if the answer is \(N\), then this is again the maximal number, so there do not exist \(N+1\) points which all become separated at some instant over \(T\) seconds.

Call this number \(N(f, \varepsilon, T)\). So \(N(f, \varepsilon, T)\) is the maximum number of points of \(X\) our decrepit eyes can distinguish over \(T\) seconds, i.e. \(T\) iterations of the function \(f\).
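The following Python sketch estimates \(N(f, \varepsilon, T)\) in the same hedged way as before. It assumes the circle of circumference 1 sampled at \(M = 256\) evenly spaced points, the doubling map \(x \mapsto 2x \bmod 1\) as \(f\) (on the grid, \(k \mapsto 2k \bmod M\)), and \(\varepsilon = 1/8\).

Two points count as distinguishable if, at some time \(t\) with \(0 \leq t \leq T\), their images are at least \(\varepsilon\) apart, as in the definition above. The greedy scan produces a set of mutually distinguishable points, hence a lower bound for \(N(f, \varepsilon, T)\). All function names are hypothetical.

```python
M = 256  # grid of 256 points k/M on the circle; a power of 2, so doubling stays on the grid


def grid_dist(a, b, M):
    """Circle distance between grid points a/M and b/M, in units of 1/M."""
    d = (a - b) % M
    return min(d, M - d)


def double(k):
    """Doubling map x -> 2x mod 1, written on the grid as k -> 2k mod M."""
    return (2 * k) % M


def separated_over_time_count(M, f, eps_units, T):
    """Greedy lower bound for N(f, eps, T) on the grid points k/M.

    Two points count as distinguishable if, at some time t with
    0 <= t <= T, their images f^(t) are at least eps_units apart.
    """
    # orbits[k] = [k, f(k), f(f(k)), ..., f^(T)(k)]
    orbits = []
    for k in range(M):
        orbit = [k]
        for _ in range(T):
            orbit.append(f(orbit[-1]))
        orbits.append(orbit)

    def distinguishable(a, b):
        return any(grid_dist(x, y, M) >= eps_units
                   for x, y in zip(orbits[a], orbits[b]))

    chosen = []
    for k in range(M):
        if all(distinguishable(k, p) for p in chosen):
            chosen.append(k)
    return len(chosen)


for T in range(6):
    print(T, separated_over_time_count(M, double, eps_units=M // 8, T=T))
```

For \(T = 0, 1, \ldots, 5\) the counts come out as 8, 16, 32, 64, 128, 256: each extra second of viewing doubles the number of points we can tell apart. The grid is too coarse to go further, since at \(T = 5\) every one of the 256 grid points is already distinguishable from every other.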

Now if we allow ourselves more time, then we have a better chance to see points separating. As long as there is one instant of time at which two points separate, we can distinguish them. So as \(T\) increases, we can distinguish more points. In other words, \(N(f, \varepsilon, T)\) is an increasing function of \(T\).

And by our previous argument about \(\varepsilon\), \(N(f, \varepsilon, T)\) is a decreasing function of \(\varepsilon\).

So we’ve deduced that the number of points we can distinguish over time, \(N(f, \varepsilon, T)\), is a decreasing function of \(\varepsilon\), and an increasing function of \(T\).

We can think of the number \(N(f, \varepsilon, T)\) as an amount of information: the number of points we can tell apart is surely some interesting data!

But rather than think about a single instant in time, we want to think of the rate of information we obtain, as time passes. How much more information do we get each time we iterate \(f\)?

As we iterate \(f\), and we look at our space \(X\) over a longer time interval, we know that we can distinguish more points: \(N(f, \varepsilon, T)\) is an increasing function of \(T\). But how fast is it increasing?

To pick one possibility out of thin air, it might be the case that every time we iterate \(f\), i.e. when we increase \(T\) by \(1\), we can distinguish twice as many points. In that case, \(N(f, \varepsilon, T)\) doubles every time we increment \(T\) by 1, and we will have something like \(N(f, \varepsilon, T) = 2^T\). In this case, \(N\) is increasing exponentially, and the (exponential) growth rate is given by the base 2.

(Note that doubling the number of points you can distinguish is just like having 1 extra bit of information: with 3 bits you can describe \(2^3 = 8\) different things, but with 4 bits you can describe \(2^4 = 16\) things — twice as many!)

Similarly, to pick another possibility out of thin air, if it were the case that \(N(f, \varepsilon, T)\) tripled every time we incremented \(T\) by \(1\), then we would have something like \(N(f, \varepsilon, T) = 3^T\), and the growth rate would be 3.

But in general, \(N(f, \varepsilon, T)\) will not increase in such a simple way. However, there is a standard way to describe the growth rate: look at the logarithm of \(N(f, \varepsilon, T)\), and divide by \(T\). For instance, if \(N(f, \varepsilon, T) \sim 2^T\), then we have \(\frac{1}{T} \log N(f, \varepsilon, T) \sim \log 2\). And then see what happens as \(T\) becomes larger and larger. As \(T\) becomes very large, you’ll get an asymptotic rate of information gain from each iteration of \(f\).
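To see why dividing the logarithm by \(T\) is the right move, here is a short worked example. It assumes, hypothetically, that \(N(f, \varepsilon, T) \approx C \cdot 2^T\) for some constant \(C > 0\) that does not depend on \(T\). The constant factor contributes only \(\frac{\log C}{T}\), which fades away as \(T\) grows:

```latex
\frac{1}{T} \log N(f, \varepsilon, T)
  \approx \frac{1}{T} \left( \log C + T \log 2 \right)
  = \log 2 + \frac{\log C}{T}
  \longrightarrow \log 2 \quad \text{as } T \rightarrow \infty.
```

In base 2, \(\log_2 2 = 1\): one extra bit of information per iteration.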

(In describing a logarithm, we should technically specify what the base of the logarithm is. It could be anything; I don’t care. Pick your favourite base. Since we’re talking about information, I’d pick base 2.)

This leads us to think that we should consider the limit
\[ \lim_{T \rightarrow \infty} \frac{1}{T} \log N(f, \varepsilon, T). \]
This is a great idea, except that if \(N(f, \varepsilon, T)\) grows in an irregular fashion, this limit might not exist! But that’s OK: there’s a standard analysis trick to get around these kinds of situations. Rather than taking a limit, we’ll take a lim inf, which always exists (though it may be infinite).
\[ \liminf_{T \rightarrow \infty} \frac{1}{T} \log N(f, \varepsilon, T). \]

(The astute reader might ask, why lim inf and not lim sup? We could actually use either: they both give the same result. In our analogy, we might want to know the rate of information we’re guaranteed to get out of \(f\), so we’ll take the lower bound.)

And this is almost the definition of topological entropy! By taking a limit (or rather, a lim inf), we have eliminated the dependence on \(T\). But this limit still depends on \(\varepsilon\), the resolution of our eyesight.

Although our eyesight is shoddy, mathematics is not! So in fact, to obtain the ideal rate of information gain, we will take a limit as our eyesight becomes perfect! That is, we take a limit as \(\varepsilon\) approaches zero.

And this is the definition of the topological entropy of \(f\):
\[ h_{top}(f) = \lim_{\varepsilon \rightarrow 0} \liminf_{T \rightarrow \infty} \frac{1}{T} \log N(f, \varepsilon, T). \]
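As a sanity check on the definition, here is a sketch of the computation for one standard example. It assumes, for illustration only, that \(X\) is the circle of circumference 1 with the short-way-round distance, and that \(f(x) = 2x \bmod 1\) is the doubling map. Take \(\varepsilon \leq 1/4\).

While two points are less than \(1/4\) apart, applying \(f\) exactly doubles the distance between them. So two points become at least \(\varepsilon\) apart at some time \(t \leq T\) precisely when they start at least \(\varepsilon / 2^T\) apart. The largest number of points on the circle that are pairwise at least \(\varepsilon / 2^T\) apart is \(\lfloor 2^T / \varepsilon \rfloor\). That gives:

```latex
N(f, \varepsilon, T) = \left\lfloor \frac{2^T}{\varepsilon} \right\rfloor,
\qquad \text{so} \qquad
\frac{2^{T-1}}{\varepsilon} \leq N(f, \varepsilon, T) \leq \frac{2^T}{\varepsilon},

\log 2 + \frac{\log(1/\varepsilon) - \log 2}{T}
  \leq \frac{1}{T} \log N(f, \varepsilon, T)
  \leq \log 2 + \frac{\log(1/\varepsilon)}{T},

\liminf_{T \rightarrow \infty} \frac{1}{T} \log N(f, \varepsilon, T) = \log 2
  \quad \text{for every } \varepsilon \leq 1/4,
  \qquad \text{hence} \qquad h_{top}(f) = \log 2.
```

In base 2 this is exactly 1 bit per iteration, matching the doubling of distinguishable points with each second of viewing.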
So the topological entropy is, as we said in the beginning, the asymptotic rate of information we gain in our ability to distinguish points in \(X\) as we iterate \(f\), in the limit of perfect eyesight!

As it turns out, even though we heavily relied on distances in \(X\) throughout this definition, \(h_{top}(f)\) is completely independent of our notion of distance! If we replace our metric, or distance function \(d(x,y)\), with a different one that gives the same topology on \(X\) (the same notion of which points are near which), we will obtain the same result for \(h_{top}\). So the topological entropy really is topological — it has nothing to do with any particular notion of distance at all.

This is just one of several ways to define topological entropy. There are many others, just as wonderful and surprising, and even together they only reveal the tip of an iceberg.


Abstract algebra nursery rhyme

In the spirit of hilariously advanced baby books like Chris Ferrie’s Quantum Physics for Babies, I have taken to incorporating absurdly sophisticated concepts into nursery rhymes.

To the tune of the ABC song (or, equivalently, Twinkle Twinkle Little Star):

The axioms of a group go 1, 2, 3
Identity, inverse, associativity!
The identity times any element g is g,
Inverse of g times g is identity,
Associativity says ab times c
is equal to a times bc.

The last resort of scoundrels

Samuel Johnson said it was “the last resort of scoundrels”; Emma Goldman, a menace to liberty. Leo Tolstoy said it “as a feeling is bad and harmful, and as a doctrine is stupid”. Patriotism, at least in its usual sense of love of one’s country over others, veneration of the virtue of its people over others, and adoration of its flag, is awful, irrational nonsense.

How on earth one can deduce moral values, or even a positive emotional response, from a geographic entity — indeed, such powerful emotions as to move men to war (yes, usually men) — has always eluded me.

There may be various administrative reasons to divide a geographical area (like the earth, or a continent) into official or legal sub-regions (like countries, or states).

More importantly, it may be that, for one born in a land oppressed by a colonist, an occupier, or other oppressor, the natural solidarity among those oppressed peoples in their legitimate resistance may be expressed in the language of patriotism.

And it may be that there can be good, even uniquely good, things about a nation’s culture, and that it is worth recalling them occasionally — though there will equally be bad, even uniquely bad aspects also. One must never forget that people everywhere are roughly equally good and equally bad.

It may also be that countries may have sporting teams, or the like, and it can be fun to barrack for them.

Beyond that, there is nothing positive to say about patriotism.

Even if a country is physically beautiful, others are too. Even if a country’s culture or people are wonderful, others are too. There are wonderful people and wonderful ideas everywhere, just as there are horrible people everywhere. Venerating only those nearby, to the exclusion of others, is insular, narcissistic, and leads naturally to racism, chauvinism, and xenophobia.

Even if the highly dubious conceit of orthodox patriotism is true for a country — that this nation is great and to be preferred over others, despite all the other ones believing the same — it does not follow that one ought to venerate this nation: if one wants to venerate something, one should venerate good things and good people, whether here, there or anywhere.

(Incredibly, orthodox patriotism means that vast numbers of people in every land can believe precisely this, despite those elsewhere thinking the same. They cannot all be right, but they can all be wrong — living “in a gross and harmful delusion”. It is the same with all religions claiming to be the one true religion, of course. It discloses something deep, and deeply worrisome, about the human condition, that vast numbers of people are capable of this conceit.)

What matters are universal moral values, equity, justice, freedom, and so on; not the country in which they are expressed. One’s specific birthplace or homeland or nation is irrelevant.

This is kindergarten level morals; except that the corresponding kindergarten situation, of a group of children each boasting they are the best, will be resolved by a game or by a distraction, rather than by oppression, detention archipelagos, or war.

Perhaps the worst aspect of patriotism is in the cultural realm. It creates mythologies, with deep and powerful emotions latent within its manufactured communities. These emotions, fueled also by resentment of outsiders, can be manipulated by regressive political forces to reinforce inequalities, persecute outsiders, and stoke wars.

These mythologies are created when a nation’s history is recounted as virtuous, dramatic and heroic. But it is the same with other nations; and if retelling the story of one nation excludes other peoples and nations (or worse, disparages or invokes hatred of them), then it leads in the direction of, at best, insularity and stagnation, and at worst, militarism, oppression and war.

Then there is Australia.

Here, the magnitude of the artifice required to tell the nation’s history as a virtuous story is itself heroic. The result is an increasingly viciously enforced cultural orthodoxy, together with a crushing cultural cringe.

An island continent, home to hundreds of Indigenous nations, until colonised by an imperial power to create an antipodean jail; the original inhabitants and rightful owners dispossessed by the accumulation of property and capital and microbes, by genocidal policy, and by over a century of smouldering frontier war; no galvanizing wars fought for independence, only complicity in the motherland’s imperial ambitions, and a standard role in humanity’s propensity for worldwide violence; with all the bravery, heroism, obedience, murder and atrocity that entails. The overall arc of post-settlement history must be twisted beyond recognition to confect an orthodox patriotic mythology.

There are plenty of heroic Australians, to be sure; just as there are plenty of villains, and everything in between. And there are plenty of legitimate sources of pride in that nation’s achievements, just as there are plenty of horrific sources of shame.

Nothing more and nothing less; special in some ways and not in others; which is precisely the negation of every orthodox patriotic myth.

Limitless as that space too narrow for its inspirations

On 22 February, 1877, James Joseph Sylvester gave an “Address on Commemoration day at Johns Hopkins University”.

Sylvester, the very excellent English mathematician, worked in areas of what we would today call algebra, number theory, and combinatorics. He is known for his algebraic work in invariant theory; he is known for his work in combinatorics, such as Sylvester’s Problem in discrete geometry; and for much else. He invented several terms which are commonplace in mathematics today — “matrix”, “graph” (in the sense of graph theory) and “discriminant”. He was also well known for his love of poetry, and indeed his poetic style. (He in fact published a book, The Laws of Verse, attempting to reduce “versification” to a set of axioms.)

I came across this address of Sylvester, not through mathematical investigations or in the references of a mathematical book, but rather in the footnotes of the book “Awakenings”, in which the late neurologist Oliver Sacks discusses, in affectionate and literary detail, the case histories of a number of survivors of the 1920s encephalitis lethargica (“sleeping sickness”) epidemic — an interesting and mysterious event in itself — as those patients are treated in the 1960s with the then-new drug L-DOPA and experience wondrous “awakenings”, often after decades of catatonia, although often followed by severe tribulations. (These awakenings were the subject of the 1990 Oscar-nominated movie of the same name.) These tribulations, in each patient, form an odyssey through the depths of human ontology, in which the effects of personality, character, physiology, environment, and social context are all present and deeply intertwined.

Sacks comes to the conclusion that a reductionist approach to medicine, focusing on the cellular and the chemical, is wholly deficient:

What we do see, first and last, is the utter inadequacy of mechanical medicine, the utter inadequacy of a mechanical world-view. These patients are living disproofs of mechanical thinking, as they are living exemplars of biological thinking. Expressed in their sickness, their health, their reactions, is the living imagination of Nature itself, the imagination we must match in our picturing of Nature. They show us that Nature is everywhere real and alive and that our thinking about Nature must be real and alive. They remind us that we are over-developed in mechanical awareness; and that it is this, above all, that we need to regain, not only in medicine, but in all science.

Indeed, Sacks quotes from W H Auden’s “The Art of Healing”:

‘Healing,’
Papa would tell me,
‘is not a science,
but the intuitive art
of wooing Nature.

In an accompanying footnote, Sacks notes that mathematical thinking is real and alive, in just the same way. He quotes the aforementioned address of Sylvester.

Mathematics is not a book confined within a cover and bound between brazen clasps, whose contents it needs only patience to ransack; it is not a mine, whose treasures may take long to reduce into possession, but which fill only a limited number of veins and lodes; it is not a soil, whose fertility can be exhausted by the yield of successive harvests; it is not a continent or an ocean, whose area can be mapped out and its contour defined: it is limitless as that space which it finds too narrow for its aspirations; its possibilities are as infinite as the worlds which are forever crowding in and multiplying upon the astronomer’s gaze; it is as incapable of being restricted within assigned boundaries or being reduced to definitions of permanent validity, as the consciousness, the life, which seems to slumber in each monad, in every atom of matter, in each leaf and bud and cell, and is forever ready to burst forth into new forms of vegetable and animal existence.

Sylvester is right, and if anything his argument is not forceful enough. Mathematics has always been limitless — and even more limitless than the seemingly (to Sylvester, at least) infinite possibilities of astronomy and biology — for, unlike the experimental or observational sciences, it requires no substrate in reality beyond the imagination of those who think it. Liberated from the necessity to study only this world, mathematics studies all the worlds it can imagine, which include our own but go far beyond it. (It is perhaps surprising, and even “unreasonable”, as Wigner argued, that we can count our own world as among those which are mathematical; but it is not surprising that its worlds transcend ours.)

The progress of science has displayed, in an absolute sense, how mathematics outstrips the limitlessness of other sciences.

However many may be the worlds of the astronomer — now teeming also with exoplanets and gravitational waves — they are still finite; the observable universe has a finite radius.

Sylvester’s panpsychism (everything has consciousness) is now out of fashion, but seems focused on biology — and we now know that biological life is constrained by genetics, and at the molecular level by DNA and related biochemistry. Mathematics knows no such constraint.

Taking panpsychism more generally, there is an argument — and a strong one, in my view — that understanding consciousness will eventually require a radical revision of our understanding of physics. But even then, I very much doubt any such radical revision would completely transcend mathematics — and I very much doubt that mathematics would not encompass infinitely more.

It is worth noting, though, that mathematics is, in a certain sense, reductionism par excellence. Even accepting what we know about incompleteness theorems and the like, mathematics, theoretically at least, can be reduced to sets of axioms and logical arguments, in the end consisting only of formal logic, modus ponens and the like. That is not how mathematicians do mathematics in practice, but that is the orthodox view on what mathematics formally is. Even the standard theorems that mathematics “knows no bounds” — the Gödel incompleteness theorems, the Cantor diagonalisation argument, the set-theoretic paradoxes like Russell’s, for instance — can themselves be expressed, reductionistically, in this formal way.

All the infinite possibilities, the unboundedness, of mathematics, then, can be expressed in a very finite, very discrete, very reductionistic way. This is not surprising — even with finitely many letters one can construct an infinity of sentences, one can burst all brazen clasps, one can empty all veins and lodes, one can exhaust all soils, there is no end to the harvest, however dizzying and rarefied the altitude at which it is sown.

And as for definitions of permanent validity? At least in terms of the experience of learning, doing and discovering mathematics, I cannot go past Ada Lovelace’s definition of the “poetical science” as “the language of the unseen relations between things”.

There is much else of interest — and not just historical interest — in Sylvester’s address. Mathematics impedes public speaking; university study and research ought to avoid monetary reward and public recognition; students should avoid “disorder or levity”; all researchers should simultaneously engage in teaching; anecdotes of arithmetic in the French revolution; every science improves as it becomes more mathematical; and the taste for mathematics is much broader than one might think. So argues Sylvester, poet, mathematician; perhaps I will return to these arguments one day.