Lovely Liouville geometry

(Note: This post is more technical than most stuff I write here. The intended audience here is not the general public, or even the general educated public: it’s students of geometry, broadly understood. In any case, if you don’t know what a differential form is, you’re probably not going to get much out of this.)

I’d like to show you some very nice geometry, involving some vector fields and differential forms.

Consider a surface. In fact, consider the plane, \(\mathbb{R}^2\). That’s just the standard Euclidean plane, with coordinates \( x\) and \( y\).

Now let’s consider a differential 1-form on the plane; call it \( \beta\). We’ll impose one condition on \( \beta\): its exterior derivative \( d\beta\) should be everywhere nonzero.

For instance, we can take \( \beta = x \; dy\). In fact, we will take this as a running example. Its exterior derivative is \( d\beta = dx \wedge dy\), which is just the usual Euclidean area form on the plane, and which is nowhere zero.

Now, saying that \(d\beta\) is everywhere nonzero is the same as saying that \( d\beta\) is an area form (although in general it might be different from the Euclidean area form \(dx \wedge dy\)); and this is also the same as saying that \( d\beta\) is a non-degenerate 2-form. In fact, being exact, \(d\beta\) is also closed: and hence \( d\beta\) is a closed non-degenerate 2-form, also known as a symplectic form.

Non-degenerate 2-forms are great. When you insert a vector into one, you get a 1-form; and because of the non-degeneracy, if the vector is nonzero, then the resulting 1-form is nonzero. So you get a bijective correspondence, or duality, between 1-forms and vectors.

This means that, at each point \(p \in \mathbb{R}^2\), the non-degenerate 2-form \(d\beta\) provides a linear map of 2-dimensional vector spaces
\[ T_p \mathbb{R}^2 \rightarrow T_p^* \mathbb{R}^2, \]
or in other words
\[ \{ \text{Vectors at $p$} \} \rightarrow \{ \text{1-forms at $p$} \}, \]
which sends a vector \(v\) to the 1-form \(\iota_v d\beta\) (i.e. the 1-form \(d\beta(v, \cdot)\), where you have fed \( d\beta\) one vector, but it eats two courses of vectors, and after its entree it remains a 1-form on its remaining main course). It’s a linear map and, by the non-degeneracy of \(d\beta\), its kernel/nullspace consists solely of the zero vector. Thus it’s injective and, both vector spaces being 2-dimensional, it’s an isomorphism.

If we consider (smooth) vector fields rather than just single vectors, then we can simultaneously do this at each point of \(\mathbb{R}^2\), and we get a map
\[ \{ \text{Vector fields on $\mathbb{R}^2$} \} \rightarrow \{ \text{1-forms on $\mathbb{R}^2$} \}. \]

So \( d\beta\), being a non-degenerate 2-form, gives us a way to go from 1-forms to vectors and back again. We can think of this as a duality: for each 1-form, this correspondence gives us a dual vector field, and vice versa.

So far, we only have \(\beta\) and \( d\beta\). But \( \beta\) is a 1-form! So some vector field must correspond to it. Let’s call it \( X\). As it turns out, the 1-form \( \beta\), the 2-form \( d\beta\), and the vector field \( X\), form a very nice structure.

The name of Joseph Liouville is often associated with this structure. The 1-form \(\beta\) is often called a Liouville form, the surface (or manifold in general) a Liouville manifold, and the whole package a Liouville structure.

In fact, we can draw pictures of such a structure.

The easiest thing to draw is the vector field \( X\): a vector, drawn as an arrow, at each point.

How do we draw \(\beta\)? It’s generally hard to draw a picture of a differential form! However for a 1-form, we can draw its kernel \(\ker \beta\). At any point \(p\), \(\beta_p\) is a linear map from the tangent space \(T_p \mathbb{R}^2\), which is a 2-dimensional vector space, to \(\mathbb{R}\). When \(\beta_p = 0\), \(\beta_p\) is the zero map \(T_p \mathbb{R}^2 \rightarrow \mathbb{R}\) and hence the kernel is the whole 2-dimensional tangent space \(T_p \mathbb{R}^2\). But where \(\beta_p \neq 0\), we have a nontrivial linear map from a 2-dimensional vector space to a 1-dimensional vector space. Hence \(\beta_p\) has rank 1 and nullity 1, so the kernel is a 1-dimensional subspace of \(T_p \mathbb{R}^2\). We then have a 1-dimensional tangent subspace at each point; in other words, \(\ker \beta\) is a line field on \(\mathbb{R}^2\). We can even join up the lines (i.e. integrate them) to obtain a collection of curves on \(\mathbb{R}^2\), which become the leaves of a foliation. In this way \(\beta\) can be drawn as a collection of curves, or singular foliation, on \(\mathbb{R}^2\): the foliation is singular at the points where \(\beta = 0\). True, drawing it this way only shows the kernel of \(\beta\), i.e. in which direction you will get zero if you feed a vector into \(\beta\), and you will not see what you get if you feed vectors in other directions. So you don’t see how “strong” \(\beta\) is at each point. But it’s a useful way to represent \(\beta\) nonetheless.

As for the 2-form \(d\beta\)? It’s just an area form; we won’t attempt to draw anything to represent that.

So, let’s draw what we get in our example. In our example, the 1-form is \( \beta = x \; dy\), the 2-form is \( d\beta = dx \wedge dy\), and the vector field \( X\) must satisfy
\[ \iota_X d\beta = \beta,
\quad \text{i.e.} \quad
\iota_X ( dx \wedge dy ) = x \; dy. \]
It’s not too difficult to calculate that \( X = x \partial_x\) (here \( \partial_x\) is a unit vector in the \( x\) direction; if we’re using \( (x,y)\) to denote coordinates, then \( \partial_x = (1,0)\)).
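As a quick sanity check, here is a minimal symbolic sketch (plain sympy, with forms and vector fields tracked by their components; the bookkeeping is mine, not any library’s differential-geometry API):

```python
# Minimal sympy sketch: solve iota_X (dx ^ dy) = x dy for X = A d/dx + B d/dy.
# Contracting X into dx ^ dy gives A dy - B dx, so we need the
# dx-component -B = 0 and the dy-component A = x.
import sympy as sp

x, y = sp.symbols('x y')
A, B = sp.symbols('A B')

sol = sp.solve([sp.Eq(-B, 0), sp.Eq(A, x)], [A, B], dict=True)[0]
print(sol)  # {A: x, B: 0}, i.e. X = x d/dx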

Being a multiple of \( \partial_x\), \( X\) always points in the \( x\) direction, i.e. horizontally. When \( x\) is positive it points to the right, when \( x\) is negative it points to the left; and when \( x=0\), i.e. along the \( y\)-axis, \( X\) is zero. So \( X\) is actually a singular vector field, in the sense that it has zeroes. And it’s zero along a whole line. (So it’s not generic.)

As for \( \beta = x \; dy\), it is zero when \( x=0\), so the foliation \( \ker \beta\) is singular along the \( y\)-axis. When \( x \neq 0\), the kernel consists of anything pointing in the \( x\)-direction. Since \( \beta\) has only a \( dy\) term, but no \( dx\) term, if you feed it a \( \partial_x\), or any multiple thereof, you’ll get zero. So the line field you draw is horizontal, as is the foliation. In other words, \( \ker \beta\) is the singular foliation consisting of horizontal lines, with singularities along the \( y\)-axis.

The line field \(\ker \beta = \langle \partial_x \rangle\) is shown in orange; the vector field \(X = x \partial_x\) is shown in blue.

Note that, although we might have expected the line field of \( \ker \beta\) and the arrows of \( X\) to point all over the place, in fact the arrows and lines point in the same direction, i.e. horizontal! The vectors of \( X\) point along the lines of \( \ker \beta\). This means that, at each point, the vector \( X\) lies in the kernel of \( \beta\), and hence \( \beta(X) = 0\).

Now in this example, the vector field \(X = x \partial_x\) has a very nice property. If you flow along it, then points move out horizontally, and exponentially. Indeed, if you interpret \( x \partial_x\) as a velocity vector field, it is telling a point with horizontal coordinate \( x\) to move to the right with velocity \( x\). Telling something to move at a speed equal to its current position is a hallmark of exponential growth.

Denoting the flow of \( X\) for time \( t\) by \( \phi_t\), we have a map \( \phi_t : \mathbb{R}^2 \rightarrow \mathbb{R}^2\). It’s an exponential function:
\[ \phi_t (x,y) = (x e^{t}, y). \]
Indeed, you can check that \( \frac{\partial}{\partial t} \phi_t (x,y) = (x e^t, 0)\), which at time \( t=0\) is
\[ \frac{\partial}{\partial t} \Big|_{t=0} \phi_t (x,y) = (x,0) = X. \]
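Here is a small symbolic check of this (a sympy sketch using the coordinate formulas above):

```python
# Sketch: verify phi_t(x,y) = (x e^t, y) has velocity X = x d/dx at t = 0,
# and that it scales area by e^t.
import sympy as sp

x, y, t = sp.symbols('x y t')
phi = sp.Matrix([x * sp.exp(t), y])

print(phi.diff(t).subs(t, 0))        # Matrix([[x], [0]]): the field X
print(phi.jacobian([x, y]).det())    # exp(t): area expands exponentially
```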

Now the vector field \(X\) (or its flow \(\phi_t\)) expands in the horizontal direction exponentially, and does nothing in the \(y\) direction: from this it follows that \(X\) also expands area exponentially. An infinitesimal volume \(V\), after flowing under \(\phi_t\) for time \(t\), expands so that \(\frac{\partial V}{\partial t} = V\), and hence grows exponentially: at time \(t\), the volume has expanded from \(V\) to \(V e^t\). Rephrased in terms of differential forms, the Lie derivative of the area form \(d\beta = dx \wedge dy\) under the flow of \(X\) is the area form itself:
\[ L_X d\beta = d\beta. \]

To see this, we use the Cartan formula \(L = d\iota + \iota d\), which yields
\[ L_X d\beta = d \iota_X (d\beta) + \iota_X d (d\beta). \]
From the definition of \(X\), being dual to \(\beta\), we have \(\iota_X (d\beta) = \beta\); using this, and the fact that \(d^2 = 0\), the first term becomes \(d\beta\) and the second term is zero.

In fact, \( X\) doesn’t just expand the area \( d\beta\) exponentially; it also expands the Liouville 1-form \( \beta\) exponentially. In other words,
\[ L_X \beta = \beta. \]

To see this, we just apply the Cartan formula,
\[ L_X \beta = d \iota_X \beta + \iota_X d\beta = 0 + \beta. \]
The first term is zero because, as we saw, \( X\) points along \( \ker \beta\), so \( \iota_X \beta = \beta(X) = 0\); the second term is \( \beta\) because of the definition of \( X\) as dual to \( \beta\).

To summarise our example so far: we started with a 1-form \( \beta\) whose exterior derivative \( d\beta\) was a non-degenerate 2-form. We took a vector field \( X\) dual to \( \beta\), using the non-degeneracy of \( d\beta\). We have found that:

  • The vector field \( X\) points along the foliation \( \ker \beta\); in other words, \( \beta(X) = 0\).
  • Flowing \( X\) expands area exponentially: \( L_X d\beta = d\beta\).
  • Flowing \( X\) in fact expands the 1-form \( \beta\) exponentially: \( L_X \beta = \beta\).

We proved some of these in our example, and some in a sort-of generality, but not everything. Let’s now prove it all at once, in general.

PROPOSITION. Let \( \beta\) be a 1-form on a surface such that \( d\beta\) is non-degenerate. Let \( X\) be dual to \( \beta\), i.e. \( \iota_X d\beta = \beta\). Then:
(i) \(\beta(X) = 0\)
(ii) \( L_X d\beta = d\beta\)
(iii) \( L_X \beta = \beta.\)

PROOF. For (i), we note \( \beta(X) = \iota_X \beta\), and then use \( \beta = \iota_X d\beta\) and the fact that differential forms are antisymmetric:
\[ \beta(X) = \iota_X \beta = \iota_X \iota_X d\beta = d\beta(X,X) = 0. \]

For (ii) and (iii), we can then follow the arguments above. For (ii), we use the Cartan formula \( L_X = d \iota_X + \iota_X d\), the fact that \( d^2 = 0\), and the fact that \( \iota_X d\beta = \beta\).
\[ L_X d\beta = d \iota_X d\beta + \iota_X d d\beta = d\beta + 0 = d\beta. \]

For (iii), we use the Cartan formula, part (i) that \(\iota_X \beta = 0\), and the fact that \(\iota_X d\beta =\beta\).
\[ L_X \beta = d \iota_X \beta + \iota_X d \beta = d 0 + \beta = \beta. \]

So Liouville structures have some very nice properties.

And, this is all in fact classical physics. We can think of the plane as a phase space, and \( \beta\) as an action \( y \; dx = p \; dq\). Then \( d\beta\) is the symplectic form on phase space, and the equation \( L_X d\beta = d\beta\) shows that this symplectic form, the fundamental structure on the phase space, is expanded by the flow of \( X\). (There is a result in classical mechanics, or symplectic geometry, known as Liouville’s theorem, which also says something about the effect of the flow of a vector field on the symplectic form.)

Anyway, above we saw one example of a Liouville structure on the plane. Here’s another one, which is more “radial”.

Take \(\beta = \frac{1}{2}(x \, dy - y \, dx)\). Then \(d\beta = \frac{1}{2}(dx \wedge dy - dy \wedge dx) = dx \wedge dy\). The vector field \(X\) dual to \(\beta\) is then \(\frac{1}{2}(x \, \partial_x + y \, \partial_y)\), which is a radial vector field. The kernel of \(\beta\) is also the radial direction: \(\beta(X) = \frac{1}{4}(x \, dy - y \, dx)(x \, \partial_x + y \, \partial_y) = \frac{1}{4}(xy - yx) = 0\). The flow of \(X\) then looks like it should expand area exponentially, and it does.
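Here is a slightly more general version of the earlier sanity check (again a sketch, with \( \beta = P \, dx + Q \, dy \) tracked by components), verifying the recipe and the proposition for both examples:

```python
# Sketch: for beta = P dx + Q dy with d(beta) = f dx ^ dy non-degenerate,
# compute the dual field X = A d/dx + B d/dy and check the proposition.
import sympy as sp

x, y = sp.symbols('x y')

def liouville_check(P, Q):
    f = sp.diff(Q, x) - sp.diff(P, y)   # d(beta) = f dx ^ dy
    assert sp.simplify(f) != 0          # non-degeneracy (f constant here)
    A, B = Q / f, -P / f                # iota_X d(beta) = f A dy - f B dx = beta
    assert sp.simplify(P * A + Q * B) == 0        # (i) beta(X) = 0
    # (iii) via Cartan: L_X beta = d(beta(X)) + iota_X d(beta) = 0 + beta;
    # componentwise, the dx- and dy-components of L_X beta are:
    LP = sp.diff(P * A + Q * B, x) - f * B
    LQ = sp.diff(P * A + Q * B, y) + f * A
    assert sp.simplify(LP - P) == 0 and sp.simplify(LQ - Q) == 0
    # (ii) L_X d(beta) = d(L_X beta) = d(beta), since L_X commutes with d.
    return sp.Matrix([A, B])

print(liouville_check(sp.Integer(0), x).T)   # X = x d/dx
print(liouville_check(-y / 2, x / 2).T)      # X = (x d/dx + y d/dy)/2
```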

Liouville structures can exist in other places too, and not just on surfaces: they exist in higher dimensions too. Notice that the fact that \(\beta\) was on a surface was never actually used in the proof above: it could have been any manifold, with any 1-form \( \beta\) such that \( d\beta\) is non-degenerate.

However, not every manifold has Liouville structures. In fact, there are many surfaces on which no Liouville structures exist. Any compact surface (without boundary) has no Liouville structure.

Why is this? The idea is pretty simple. If you have a closed and bounded surface, it’s pretty hard to have a smooth vector field \( X\) which expands the area! Your surface has a given finite area, and then you flow it along \( X\) — moving the points of the surface around by a diffeomorphism — and now it has exponentially larger area! This is pretty paradoxical, and indeed it’s a contradiction.

The plane escapes this paradox, because it’s not bounded. You can indeed expand the plane by any factor you like, and it’s still the same plane. Surfaces with boundary also escape the paradox, because the flow of \( X\) will not be defined for all time: eventually points will be pushed off the edge.

But a sphere does not escape the paradox. Nor does a torus, or any higher genus compact surface without boundary.

Further, Liouville structures can only exist in even dimensions. So you can’t have one on a 3-dimensional space. But you can have one on a 4-dimensional space. Why is this? It can be seen from linear algebra. You simply can’t have non-degenerate 2-forms in odd dimensions. (The easiest way I know to see this is as follows. Let \( \omega\) be a non-degenerate 2-form on an \( n\)-dimensional vector space. Choose a basis \( e_1, \ldots, e_n\) and write \( \omega(e_i, e_j)\) as the \( (i,j)\) term of the \( n \times n\) matrix \( A\) for \( \omega\). The facts that \( \omega\) is antisymmetric and non-degenerate mean that \( A^T= -A\) and \( \det A \neq 0\) respectively. But then \( \det A = \det A^T = \det (-A) = (-1)^n \det A\), so \( (-1)^n = 1\), and \(n\) is even.)
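Here is a quick numerical illustration of this parity argument (a toy numpy check):

```python
# Toy check: random antisymmetric matrices in odd dimensions are singular.
import numpy as np

rng = np.random.default_rng(0)
for n in (2, 3, 4, 5):
    M = rng.standard_normal((n, n))
    A = M - M.T                            # antisymmetric: A.T == -A
    print(n, round(np.linalg.det(A), 12))  # zero (up to rounding) iff n is odd
```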

Just as a Liouville structure cannot exist on a compact surface (without boundary), it can’t exist on any compact manifold (without boundary). The argument is similar: because \( X\) expands the 2-form \( d\beta\), it also expands all its exterior powers, and hence the volume form of the manifold, whatever (even) dimension it may be.

So, we’ve seen that a Liouville structure can only exist on a manifold with even dimension, and which is not compact, or has boundary. (If you know about de Rham cohomology, it’s not difficult to see why: on a compact manifold without boundary, a symplectic form must represent a nonzero class in \( H^2\), but \( d\beta\) is exact.)

But when it does, we have a wonderful little geometric triplet \( \beta, d\beta\) and \( X\).

Emmy had a theorem (mathematical nursery rhyme #2)

In the spirit of previous work in abstract algebra, I have, erm, adapted another nursery rhyme.

After all, the songs are so common and commonly known; why not update them with some definite content?

To the tune of “Mary had a little lamb” (with no disrespect to the original, which seems to be an endearing story of an actual lamb), a discussion of Noether’s theorem.

If you haven’t heard of Noether’s theorem, it is very nice. (It should be distinguished from several other theorems of Emmy Noether, and indeed other mathematical Noethers.)

Roughly speaking, Noether’s theorem states that whenever a physical system has a nice symmetry, there is some corresponding numerical quantity which is conserved.

For instance, if a physical system is invariant under translation, then there is a conserved quantity associated to it, known as momentum. (And there are translations in three independent directions in space, so there are three components of momentum which are conserved. In other words, momentum as a vector quantity is conserved.) Similarly, if it’s invariant under rotations, then there is a conserved quantity known as angular momentum. Invariant under moving forward and backward in time — a conserved quantity known as energy. And so on.

This is not very precise, and there are different ways of formulating it, and of course physicists and mathematicians have different perspectives about it — as well as the level of mathematical precision and rigour with which it should be stated and understood.

The wikipedia page, at least at the time of writing, has a very physics-oriented discussion, which would offend many mathematicians’ sensibilities — certainly including my own. The nicest mathematical formulation uses symplectic geometry, and hence some fairly serious prerequisite knowledge, well beyond the Australian undergraduate curriculum. (Unless you take an undergraduate research unit with me at Monash, perhaps!)

A good discussion may be found in the lecture notes of Ana Cannas da Silva, available online here. Once enough machinery is developed to state the principle cleanly (um, in section 24 on page 147…), the theorem is proved in a leisurely half a dozen lines.

Anyway, less talk, more nursery rhymes!

Emmy had a theorem,
theorem, theorem
Emmy had a theorem
Its proof was clear as day.

Everywhere a symmetry,
symmetry, symmetry
Everywhere a symmetry
A conserved quantity.

Golay Golay Golay (Top of the autocorrelation world)

In 1949, Marcel Golay was thinking about spectrometry.

As he described it some time later, the situation was as follows.

You have a spectrometer. The point of spectrometry is to find the frequency of light (or electromagnetic radiation more generally — but for convenience I’ll just say “light” from now on). Given a light source, spectrometry aims to find which frequencies (or colours) of light occur in it, and how they are distributed across the optical spectrum.

The spectrometer Golay had in mind was a cleverly designed “multislit” one. As the name suggests, it had many slits. Each slit could be open or closed. Light would come in on one side, pass through the contraption, and then exit on the other side, where detectors would be placed to record the output.

Both the entrance side and the exit side had many slits — the same number \(4N\) on either side. (Why a multiple of 4? It’s all part of the clever design, read on…)

Moreover, each entrance slit had a natural pathway through to an exit slit. The slits were designed so that light entering a particular entrance slit would pass through to a specific exit slit. The entrance and exit slits were thus matched up in a one-to-one fashion. This “matching up” in fact “inverted” the light: light coming in through the top slit on the left, would exit through the bottom slit on the right; light entering through the second-from-top slit on the left would exit through the second-from-bottom slit on the right; and so on.

At least, that’s what would happen for one particular colour of light, i.e. one particular frequency — let’s say pure crimson red. The point of the spectrometer is to pick out distinct frequencies, and so this contraption is “tuned” to perfectly align the slits for crimson red light.

What about other frequencies? They get shifted. When light of another frequency, let’s say green, passed through an entrance slit, it did not end up in the same place as crimson red light, opposite to where it came in; rather, it ended up shifted across by some number \(j\) of slits.

In other words, if red light and green light enter through the same slit, they exit through slits which are \(j\) spots apart from each other.

Golay’s idea was to arrange the slits and detectors in a clever way, so as to eliminate all the light of other frequencies, and isolate the preferred (red) light. By an ingenious arrangement of detectors and open and closed slits, the red light would be greatly enhanced, with other colours (frequencies) completely filtered out.

How did this arrangement go? In a slightly complicated way. The entrance slits would be split into four equal length sections, each of length \(N\), as would the exit slits. Light entering through a slit in a particular section would go out a slit in the corresponding (opposite) section of exit slits.

These sections were separated from each other. In particular, non-red light could be shifted across slits within one section, but it could not cross over to another section.

Golay imagined there to be two detectors. The first detector \(D_1 \) would cover the bottom half of the exit slits, i.e. the bottom two sections, measuring the total amount of light exiting the bottom \( 2N \) exit slits. The other detector \(D_2\) would cover the top half of the exit slits, i.e. the top two sections, the top \( 2N \) exit slits. The detectors \( D_1 \) and \( D_2 \) simply capture the amount of light coming out of the bottom and top \( 2N \) slits respectively, or equivalently for our purposes, the number of those slits through which light emerges.

So in effect the whole contraption is in four separated parts, and there are two detectors, each detecting the output from two of the parts.

From Golay’s 1961 paper “Complementary Sequences”, IRE Transactions on Information Theory. What do the a’s and b’s mean? Read on…

Now, how to arrange the open and closed slits? Let’s denote open slits by a \( +1 \) (or just \( + \) for short), and closed slits by a \( -1 \) (or just \( - \) for short). So a sequence of open and closed slits can be denoted by a sequence of \( + \) and \( - \) symbols.

(You might think \( 1 \)s and \( 0 \)s are more appropriate for open and closed slits than \( +1\)s and \( -1 \)s. You could indeed use \(1\)s and \(0\)s; in that case I’ll leave it up to you to adjust the mathematics below.)

Now Golay suggested taking two sequences \(a\) and \(b\) of \(+\)s and \(-\)s, each of length \(N\). They would be used to configure the slits. Let’s write \( a = (a_1, \ldots, a_N) \) and \(b = (b_1, \ldots, b_N)\), where every \(a_i \) or \( b_i \) is either a \( +1 \) or a \( -1 \).

Now, sequences \(a \) and \( b \) each have length \( N\), but there are \( 4N \) entrance slits and \( 4N \) exit slits.

What to do? Golay said what to do: take also the negatives of \( a \) and \( b\). The negative of a sequence is given by multiplying all its terms by \( -1 \) (just like how you take the negative of a number). In other words, to take the negative of the sequence \( a\), you replace each \( + \) with a \( - \), and each \( - \) with a \( +\). We can write \(-a\) for the negative of \(a\), and \(-b\) for the negative of \(b\).

Golay suggested, very cleverly, that the \(4N \) entrance slits, from top to bottom, should be arranged using \(a \) (for the top \( N \) slits), then \( -a \) (for the next \( N\)), then \( b \) (for the next \(N\)), and finally \(-b\) (for the bottom \(N\) slits). So as we read down the slits we read the sequences \( a,-a,b,-b\).

On the exit side, because the light is “inverted”, we now read bottom to top. Golay suggested that, as we read up the slits, we use the sequences \( a,-a,-b,b\). That’s not quite the same as what we did on the entrance side. The top \(N\) entrance slits, set according to the sequence \(a\), correspond to the bottom \(N \) exit slits, also set according to the sequence \(a\). The next \(N\) entrance slits are set according to \(-a\), as are the next \( N \) exit slits. But after that, the entrance slits set according to \( b \) correspond to the exit slits set to \( -b \) ; and the final \( N \) entrance slits are set to \( -b\), with corresponding exit slits set to \( b\). So the \(a \) and \( -a \) slits “match”, but the \( b \) and \( -b \) “anti-match”.

We can now see what the a’s and b’s mean in Golay’s diagram. (Golay writes \( a’ \) and \( b’ \) rather than \(-a\) and \(-b\).)

From Golay’s 1961 paper “Complementary Sequences”, IRE Transactions on Information Theory, again.

One final twist: the output of the contraption is measured by the two detectors \( D_1 \) and \( D_2\). But Golay proposed not to add their results, but to subtract them. So the final number we want to look at is not \(D_1 + D_2\), but \( D_1 - D_2\).

Anyway, that was Golay’s prescription.

So what happens to light going through this spectroscopic contraption, now with its numerous slits configured in this intricate way?

First let’s consider red light — which, recall, means the light goes straight from entrance slit to opposite exit slit. We’ll take the four sections separately, which, we recall, are labelled \(a,-a,b,-b \) at the entrance, and \( a,-a,-b,b \) at the exit.

  • For light hitting one of the top \( N \) entrance slits, one encoded by \( a_i\), it is blocked if \( a_i = -1\). But if \(a_i = 1 \) then the light sails through the open slit, out to the corresponding exit slit, which is also labelled \( a_i = 1\), and through to the detector \( D_1\).
  • Similarly, consider one of the entrance slits in the next section, encoded by some \( -a_i\). Light is blocked if \(-a_i = -1 \), but if \( -a_i = 1 \) then the light sails over to the corresponding exit slit, also labelled \( -a_i = 1 \), through to the detector \( D_1\).
  • Now consider the third section, where entrance slits are encoded by \( b \) but exit slits are encoded by \( -b\). Light hits a slit encoded by some \(b_i\). If \(b_i = -1\), the entrance slit is closed, and the light is blocked there. If \( b_i = 1\), the entrance slit is open, and the light enters, but then the exit slit is encoded by \( -b_i = -1\), so is closed, and the light is blocked here. Either way, the light is blocked.
  • The final section is similar. The entrance slit is labelled by some \( -b_i\), and the exit slit by \(b_i\). If \( -b_i = -1\), the entrance slit is closed and light is blocked; if \( -b_i = 1\), then the entrance slit is open, but as \( b_i = -1\), the exit slit is blocked. Either way the light is blocked.

Now detector \(D_1 \) counts the number of slits in the first two sections from which light emerges. In the first section, those slits are the ones encoded by \( a_i \) such that \( a_i = 1\). In the second section, those slits are the ones encoded by \( -a_i \) such that \( -a_i = 1 \) , i.e. \( a_i = -1\). On the other hand, \(D_2 \) detects nothing, as everything is blocked. So we have

\[
D_1 = ( \# i \text{ such that } a_i = 1) + ( \# i \text{ such that } a_i = -1), \\
D_2 = 0.
\]

The expression for \( D_1 \) simplifies rather dramatically, because every \( a_i \) is either \( +1 \) or \( -1\). If you add up the number of \(+1\)s and the number of \(-1\)s, you simply get the number of terms in the sequence, which is \(N\). Thus in fact
\[
D_1 = N, \quad D_2 = 0,
\]
and the final result (remember we subtract the results of the two detectors) is
\[
D_1 - D_2 = N.
\]

So, we end up with a nice result, when we feed Golay’s spectroscope light of the colour it’s designed to detect (i.e. red).

Now, what happens with other colours? Let’s now feed Golay’s spectroscope some other colour (i.e. frequency, i.e. wavelength) of light, which means that the light gets shifted across \( j \) slots. Let’s say the light is green.

  • Consider green light hitting one of the top \( N \) entrance slits, encoded by \( a_i\). The light is blocked if \( a_i = -1\). But if \(a_i = 1 \) then the light sails through the open slit, over to the corresponding exit slits, which are also encoded by the sequence \( a\). The light is shifted across \(j \) slots in the process, and so arrives at the exit slit encoded by \( a_{i+j}\). If \(a_{i+j} = 1\), the light proceeds to detector \( D_1\); otherwise, the light is blocked. In other words, the green light gets to the detector if and only if \( a_i = a_{i+j} = 1\).
  • (Note also that if \( i+j > N \) or \( i+j < 1\), then the light beam gets shifted so far across that it hits the end of the section of the machine; and the sections are separated from each other. So we only need to consider those \( i \) (which are between \( 1 \) and \( N \)) such that \( i+j \) is also between \( 1 \) and \( N\). In other words (assuming \( j \) is positive), \( i \) only goes from \( 1 \) up to \( N-j\).)
  • Now consider green light hitting the second section, where entrance and exit slits are labelled by \( -a_i\). If \( -a_i = -1\), then light is blocked at the entrance. If \( -a_i = 1\), light enters, and proceeds with a shift over to an exit slit encoded by \( -a_{i+j}\). If \( -a_{i+j} = -1\), light is blocked at the exit, but if \( -a_{i+j} = 1\), then the light proceeds to detector \( D_1\). In other words, light gets to the detector \( D_1 \) if and only if \( -a_i = -a_{i+j} = 1\), or equivalently, \( a_i = a_{i+j} = -1\).
  • In the third section, entrance slits are encoded by \( b \) and exit slits by \( -b\). For light to get through (to detector \( D_2\)), we must have \( b_i = 1 \) and \( -b_{i+j} = 1\).
  • Finally, in the fourth section, entrance slits are encoded by \( -b \) and exit slits by \( b\). Light gets through when \( -b_i = 1 \) and \( b_{i+j} = 1 \).

Putting these together, we have
\[
D_1 = ( \# i \text{ such that } a_i = 1 \text{ and } a_{i+j} = 1) + ( \# i \text{ such that } a_i = -1 \text{ and } a_{i+j} = -1), \\
D_2 = ( \# i \text{ such that } b_i = 1 \text{ and } b_{i+j} = -1) + ( \# i \text{ such that } b_i = -1 \text{ and } b_{i+j} = 1).
\]

Now let’s manipulate these sums a little. Note that, for any \(i\), \( a_i = \pm 1 \) and \( a_{i+j} = \pm 1\). Thus the product \( a_i a_{i+j} = \pm 1\). But note that \( a_i a_{i+j} = 1 \) precisely when \( a_i = a_{i+j} = 1\), or \( a_i = a_{i+j} = -1\), i.e. when \( a_i \) and \( a_{i+j} \) are equal. These are precisely the cases counted in the sum for \( D_1 \) above. When \( a_i \) and \( a_{i+j} \) are not equal, they multiply to \( -1 \) instead.

Similarly, consider \( b_i \) and \( b_{i+j}\). The product \( b_i b_{i+j} \) is equal to \( -1 \) precisely when \( b_i = 1 \) and \( b_{i+j} = -1 \) , or when \( b_i = -1 \) and \( b_{i+j} = 1 \) . And these are precisely the cases counted above for \( D_2 \) .

So we have
\[
D_1 = ( \# i \text{ such that } a_i a_{i+j} = 1 ),\\
D_2 = ( \# i \text{ such that } b_i b_{i+j} = -1).
\]
Now, as we’ve said, for each \(i\), \( a_i a_{i+j} \) is \( 1 \) or \( -1\). For how many \( i \) do we get \( +1\)? Precisely \(D_1 \) times! Because that’s exactly what the equation above for \( D_1 \) says. All the other terms must be \( -1\). And we said above that \( i \) goes from \( 1 \) up to \( N-j\). So there are \( N-j-D_1 \) times that \( a_i a_{i+j} = -1\).

Let’s now just add up all the terms \( a_i a_{i+j}\), all the way from \( i=1\), i.e. the term \( a_1 a_{1+j}\), to \( i=N-j\), i.e. the term \(a_{N-j} a_{N-j+j}\). We get \(+1 \) sometimes — precisely \( D_1 \) times — and \( -1 \) sometimes — precisely \( N-j-D_1 \) times. It follows that
\[
a_1 a_{1+j} + \cdots + a_{N-j} a_{N-j+j} = 1 \cdot D_1 + (-1) \cdot (N-j-D_1)
\]
or if we tidy up,
\[
\sum_{i=1}^{N-j} a_i a_{i+j} = 2D_1 - N + j.
\]
We can do the same for the terms \( b_i b_{i+j}\). We get \( -1 \) precisely \( D_2 \) times, as the equation for \( D_2 \) says above. And we get \( +1 \) all the other times, but there are \( N-j \) times overall, so we get \( +1 \) precisely \( N-j-D_2 \) times. Hence
\[
b_1 b_{1+j} + \cdots + b_{N-j} b_{N-j+j} = 1 \cdot (N-j-D_2) + (-1) \cdot D_2,
\]
or equivalently,
\[
\sum_{i=1}^{N-j} b_i b_{i+j} = -2D_2 + N - j.
\]

We want to get the final result of the detectors, which is \( D_1 - D_2\). So let’s rearrange the equations above to obtain \( D_1 \) and \( D_2\),
\[
D_1 = \frac{N-j}{2} + \frac{1}{2} \sum_{i=1}^{N-j} a_i a_{i+j}, \\
D_2 = \frac{N-j}{2} - \frac{1}{2} \sum_{i=1}^{N-j} b_i b_{i+j},
\]
and subtract. When we do so, things simplify considerably!
\[
D_1 - D_2 = \frac{1}{2} \sum_{i=1}^{N-j} \left( a_i a_{i+j} + b_i b_{i+j} \right).
\]

This is a very nice result. And it reduces what Golay wanted to a very interesting maths problem. Two sequences \( a = (a_1, \ldots, a_N) \) and \( b = (b_1, \ldots, b_N) \) of \( \pm 1\)s are called a complementary pair or a Golay pair if, for all \( j \neq 0\), this sum is zero:
\[
\sum_{i=1}^{N-j} \left( a_i a_{i+j} + b_i b_{i+j} \right) = 0.
\]
Sums like these are often called autocorrelations. So the property we are looking for is a property of autocorrelations. Golay pairs are all about autocorrelations. Hence the title of this post.
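As a quick sanity check of this formula, here is a toy brute-force simulation of the slit arrangement described above (a sketch; the helper functions and their names are mine), comparing the detector counts with the autocorrelation formula for \( D_1 - D_2 \) over all short \( \pm 1 \) sequences:

```python
# Toy simulation of Golay's arrangement: entrances read a, -a, b, -b
# (top to bottom), exits read a, -a, -b, b (bottom to top); sections
# 1-2 feed detector D1, sections 3-4 feed D2.  We take shifts j >= 0.
import itertools

def detectors(a, b, j):
    neg = lambda s: [-v for v in s]
    sections = [(a, a, 1), (neg(a), neg(a), 1),
                (b, neg(b), 2), (neg(b), b, 2)]
    D1 = D2 = 0
    for ent, ext, det in sections:
        for i in range(len(a) - j):
            if ent[i] == 1 and ext[i + j] == 1:   # both slits open
                if det == 1:
                    D1 += 1
                else:
                    D2 += 1
    return D1, D2

def formula(a, b, j):
    # (1/2) sum_i (a_i a_{i+j} + b_i b_{i+j}); the sum is always even.
    return sum(a[i] * a[i + j] + b[i] * b[i + j]
               for i in range(len(a) - j)) // 2

# Exhaustively check all +-1 sequences of length 4, and all shifts j:
for a in itertools.product((1, -1), repeat=4):
    for b in itertools.product((1, -1), repeat=4):
        for j in range(4):
            D1, D2 = detectors(list(a), list(b), j)
            assert D1 - D2 == formula(a, b, j)
print("formula verified")
```

Since the check ranges over all sequences, not just Golay pairs, it confirms the counting argument itself (and the case \( j = 0 \) recovers \( D_1 - D_2 = N \) for red light).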

If you can find a pair of Golay complementary sequences, then you can configure all the slits in the multislit spectrometer according to the sequences, and for any colour except the one you are looking for (red), the detectors will perfectly cancel out that colour! So your spectrometry will be greatly enhanced.

Now you might wonder, do any such pairs exist?

Yes, they do. Oh yes, they do. And that is also a very interesting question — not yet completely solved, with lots of ongoing research.
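For instance, here is a small sketch exhibiting pairs of length \( 2, 4, 8, 16, \ldots \): it starts from the length-2 pair \( (+\,+), (+\,-) \) and applies the classical concatenation trick, under which a Golay pair \( (a,b) \) of length \( N \) yields a Golay pair \( (ab, a(-b)) \) of length \( 2N \) (juxtaposition meaning concatenation). The code simply verifies the Golay condition at each step:

```python
# Sketch: start from the length-2 Golay pair (+ +), (+ -) and repeatedly
# apply the doubling construction (a, b) -> (a + b, a + (-b)).
def is_golay_pair(a, b):
    return all(sum(a[i] * a[i + j] + b[i] * b[i + j]
                   for i in range(len(a) - j)) == 0
               for j in range(1, len(a)))

a, b = [1, 1], [1, -1]
while len(a) <= 16:
    assert is_golay_pair(a, b)
    print(len(a), a, b)
    a, b = a + b, a + [-v for v in b]
```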

Stay tuned for more.

P.S. Yes, the title of this blog post is based on a song by Chumbawamba. It’s a very excellent song.

The “Australia day” category error

Australia’s national holiday commemorates not some heroic act, but the arrival of settler colonists who occupied and settled that land, dispossessing the original and rightful inhabitants of the continent. Aboriginal sovereignty was never ceded; no treaty has ever been signed. Historic dispossession and violence, involving frontier wars and genocidal campaigns, decimated the Indigenous nations. There is struggle and heroism here, but mainly in the capacity of Indigenous peoples to resist and to survive.

Suppose I came to your home, invited myself in, made it my home, took your possessions, evicted or kidnapped or infected or murdered your family, and then celebrated the anniversary of my arrival each year — what would be the appropriate response?

And the answer is the same in the excruciatingly mind-numbing debate each year in Australia about whether the national holiday is appropriate.

(To avoid maximum excruciation, let us state the obvious. Clearly this analogy is not literal; no individual living today bears direct moral culpability for tragedies which unfolded in historical time. But it is precisely the symbolism that matters here; and national commemorations are pure symbolism, by design.)

The question in this mind-numbing debate may be an easy one, but even to ask it — of non-Indigenous Australians — contains a category error.

If I took over your home and then held a celebration there each year, it is not for me to say whether that celebration is appropriate. It is for you to say. I may well say it is not appropriate, but even if I think it is, your view counts for more; you have suffered the injustice. The correct answer is not just “no”, but also “it’s not for me to say”.

And so, to answer the question of the appropriateness of “Australia day”, the answers of Indigenous people are the most important. Everybody is entitled to their opinion, but an opinion on the question which does not take into account the views of Indigenous people cannot be taken seriously.

Views of Indigenous Australians can easily be found. The broadest data I’m aware of are poll results from 2017, a survey of 1,156 Indigenous Australians about “Australia day”. (If you know a better or more recent poll I would be happy to update.) It found that:

  • 54% of Indigenous Australians were in favour of a change of date. This may suggest that only a slim majority are against the event, but further results make it clear that the other 46% are far from being uniformly enthusiastic. For instance:
  • The survey asked participants to associate three words with Australia day. The most chosen words by Indigenous Australians were “invasion”, “survival” and “murder”.
  • A majority of Indigenous Australians said that the name “Australia day” should change.
  • 23% of Indigenous participants felt positive about Australia day, 30% had mixed feelings, and 31% had negative feelings.

Despite the above poll results, in January 2018 the Indigenous Affairs minister (who is not Indigenous) claimed that “no Indigenous Australian has told him the date of Australia Day should be changed other than a single government adviser”. This says more about a politician being out of touch, than it does about the distribution of opinion among Indigenous Australians.

In contrast, Jack Latimore, editor of IndigenousX, the prominent online platform for Indigenous voices, comes to a rather different conclusion. Based on his extensive experience and engagement with Indigenous Australians from across the social and political spectrum, his view is worth quoting at length:

When it comes to the subject of 26 January, the overwhelming sentiment among First Nations people is an uneasy blend of melancholy approaching outright grief, of profound despair, of opposition and antipathy, and always of staunch defiance.

The day and date is steeped in the blood of violent dispossession, of attempted genocide, of enduring trauma. And there is a shared understanding that there has been no conclusion of the white colonial project when it comes to the commonwealth’s approach to Indigenous people. We need only express our sentiments regarding any issue that affects us to be quickly reminded of the contempt in which our continued presence and rising voices are held.

Nor is our sentiment in regards to 26 January a recent phenomenon. I have witnessed it throughout my life in varied intensities. Evidence of it is even present in the recorded histories of White Australia.

Indeed, the long history of Indigenous protest against a January 26 celebration goes back at least to boycotts in 1888, and numerous actions on the 1938 sesquicentenary.

Returning to the present, numerous community leaders and representative bodies have also given their views, many of which are available online. Below are links to some such views; of course plenty more are easily found.

Changing the date is an obvious, minimal, easy next step on the road to justice for Indigenous Australia. At the very least, maintaining the celebration in its current form is untenable. A minimal step towards respect for Indigenous Australia is to stop dancing on their ancestors’ graves.

Nor is it particularly opposed by the general Australian public. According to a December 2017 poll, most Australians are ignorant of the history of Australia Day, can’t guess what historical event happened on that day, and don’t really mind on what date it is celebrated. Half also think that the national holiday should not be held on a date offensive to Indigenous Australians (even though a plurality wrongly believes that January 26 is not offensive to Indigenous Australians).

As of a January 2017 poll, only 15% of Australians wanted to change the date. That number may well have increased by now, with the momentum of the movement to change the date.

And the survey apparently did not have “it’s not for me to say” as an option for non-Indigenous respondents — reinforcing the standard, annual category error.

I don’t believe in any patriotic holidays. But a patriotic holiday on such a terrible date needs to be moved, rebuilt, or abolished.

Topological entropy: information in the limit of perfect eyesight

Entropy is a notoriously tricky subject. There is a famous anecdote of John von Neumann telling Claude Shannon, the father of information theory, to use the word “entropy” for the concept he had just invented, because “nobody knows what entropy really is, so in a debate you will always have the advantage“.

Entropy means many different things in different contexts, but there is a wonderful notion of entropy which is purely topological. It only requires a space, and a map on it. It is independent of geometry, or any other arbitrary features — it is a purely intrinsic concept. This notion, not surprisingly, is known as topological entropy.

There are a few equivalent definitions; we’ll just discuss one, which is not the most general. As we’ll see, it can be described as the rate of information you gain about the space by applying the function, when you have poor eyesight — in the limit where your eyesight becomes perfect.

Let \(X\) be a metric space. It could be a surface, it could be a manifold, it could be a Riemannian manifold. Just some space with an idea of distance on it. We’ll write \(d(x,y)\) for the distance between \(x\) and \(y\). So, for instance, \(d(x,x) = 0\); the distance from a point to itself is zero. Additionally, \(d(x,y) = d(y,x)\); the distance from \(x\) to \(y\) is the same as the distance from \(y\) to \(x\); the triangle inequality applies as well. And if \(x \neq y\) then \(d(x,y) > 0\); to get from one point to a different point you have to travel over more than zero distance!

We assume \(X\) is compact: roughly speaking, it doesn’t go off to infinity, it has no missing points, and its volume (if it has a volume) is finite.

Now, we will think of \(X\) as a space we are looking at, but we can’t see precisely. We have myopia. Our eyes are not that good, and we can only tell if two points are different if they are sufficiently far apart. We can only resolve points which have a certain degree of separation. Let this resolution be \(\varepsilon\). So if two points \(x,y\) are distance less than \(\varepsilon \) apart, then our eyes can’t tell them apart.

Rather than thinking of this situation as poor vision, you can alternatively suppose that \(X\) is quantum mechanical: there is uncertainty in the position of points, so if \(x\) and \(y\) are sufficiently close, your measurement can’t be guaranteed to distinguish between them. Only when \(x\) and \(y\) are sufficiently far apart can your measurement definitely tell them apart.

We suppose that we have a function \(f \colon X \rightarrow X\). So \(f\) sends points of \(X\) to points of \(X\). We assume \(f\) is continuous, but nothing more. So, roughly, if \(x\) and \(y\) are close then \(f(x)\) and \(f(y)\) are close. (Making that rough statement precise is what the beginning of analysis is about.) We do not assume that \(f\) is injective; it could send many points to the same point. Nor do we assume \(f\) is surjective; it might send all the points of \(X\) to a small region of \(X\). All we know about \(f\) is that it jumbles up the points of \(X\), moving them around, in a continuous fashion.

We are going to define the topological entropy of \(f\), as a measure of the rate of information we can get out of \(f\), under the constraints of our poor eyesight (or our quantum uncertainty). The topological entropy of \(f\) is just a real number associated to \(f\), denoted \(h_{top}(f)\). In fact it’s a non-negative number. It could be as low as zero, and it can be infinite; and it can be any real number in between.

We ask: what is the maximum number of points can we distinguish, despite our poor eyesight / quantum uncertainty? If the answer is \(N\), then there exist \(N\) points \(x_1, \ldots, x_N\) in \(X\), such that any two of them are separated by a distance of at least \(\varepsilon\). In other words, for any two points \(x_i, x_j\) (with \(i \neq j\)) among these \(N\) points, we have \(d(x_i, x_j) \geq \varepsilon\). And if the answer is \(N\), then this is the maximum number; so there do not exist \(N+1\) points which are all separated by a distance of at least \(\varepsilon\).

Call this number \(N(\varepsilon)\). So \(N(\varepsilon)\) is the maximum number of points of \(X\) our poor eyes can tell apart.

(Note that the number of points you can distinguish is necessarily finite, since they all lie in the compact space \(X\). There’s no way your shoddy eyesight can tell apart infinitely many points in a space of finite volume! So \(N(\varepsilon)\) is always finite.)
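As a toy example, take \(X\) to be the circle \(\mathbb{R}/\mathbb{Z}\). Here is a quick computational sketch (the greedy search below builds a maximal \(\varepsilon\)-separated set from grid samples, which for the circle realises the maximum):

```python
# Sketch: N(eps) for the circle R/Z with d(x,y) = min(|x-y|, 1-|x-y|).
def d(x, y):
    e = abs(x - y)
    return min(e, 1 - e)

def N(eps, grid=10000):
    kept = []
    for k in range(grid):
        x = k / grid
        if all(d(x, y) >= eps for y in kept):
            kept.append(x)
    return len(kept)

print(N(0.1), N(0.05), N(0.01))   # 10 20 100: myopic eyes see ~1/eps points
```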

Clearly, if our eyesight deteriorates, then we see less, and we can distinguish fewer points. Similarly, if our eyes improve, then we see more, so we can distinguish more points. Eyesight deterioration means \(\varepsilon\) increases: we can only distinguish points if they are further apart. Similarly, eyesight improvement means \(\varepsilon\) decreases: we can tell apart points that are closer together.

Therefore, \(N(\varepsilon)\) is a decreasing function of \(\varepsilon\). As \(\varepsilon\) increases, our eyesight deteriorates, and we can distinguish fewer points.

Now, we haven’t yet used the function \(f\). Time to bring it into the picture.

So far, we’ve thought of our eyesight as being limited by space — by the spatial resolution it can distinguish. But our eyesight also applies over time.

We can think of the function \(f\) as describing a “time step”. After each second, say, each point \(x\) of \(X\) moves to \(f(x)\). So a point \(x\) moves to \(f(x)\) after 1 second, to \(f(f(x))\) after 2 seconds, to \(f(f(f(x)))\) after 3 seconds, and so on. In other words, we iterate the function \(f\). If \(f\) is applied \(n\) times to \(x\), we denote this by \(f^{(n)}(x)\). So, for instance, \(f^{(3)}(x) = f(f(f(x)))\).

The idea is that, if you stare at two moving points for long enough, you might not be able to distinguish them at first, but eventually you may be able to. If they move apart at some point, then you may be able to distinguish them.

So while your eyes are encumbered by space, they are assisted by time. Your shoddy eyes have a finite spatial resolution they can distinguish, but over time points may move apart enough for you to resolve them.

(You can also think about this in a “quantum” way. The uncertainty principle says that uncertainties in space and time are complementary. If you look over a longer time period, you allow a greater uncertainty in time, which allows for smaller uncertainty in position. But from now on I’ll stick to my non-quantum myopia analogy.)

We can then ask a similar question: what is the maximum number of points we can distinguish, despite our myopia, while viewing the system for \(T\) seconds? If the answer is \(N\), then there exist \(N\) points \(x_1, \ldots, x_N\) in \(X\), such that at some point over \(T\) seconds, i.e. \(T\) iterations of the function \(f\), any two of them become separated by a distance of at least \(\varepsilon\). In other words, for any two points \(x_i, x_j\) (with \(i \neq j\)) among these \(N\) points, there exists some time \(t\), where \(0 \leq t \leq T\), such that \(d(f^{(t)}(x_i), f^{(t)}(x_j)) \geq \varepsilon\). And if the answer is \(N\), then this is again the maximal number, so there do not exist \(N+1\) points which all become separated at some instant over \(T\) seconds.

Call this number \(N(f, \varepsilon, T)\). So \(N(f, \varepsilon, T)\) is the maximum number of points of \(X\) our decrepit eyes can distinguish over \(T\) seconds, i.e. \(T\) iterations of the function \(f\).

Now if we allow ourselves more time, then we have a better chance to see points separating. As long as there is one instant of time at which two points separate, we can distinguish them. So as \(T\) increases, we can distinguish more points. In other words, \(N(f, \varepsilon, T)\) is an increasing function of \(T\).

And by our previous argument about \(\varepsilon\), \(N(f, \varepsilon, T)\) is a decreasing function of \(\varepsilon\).

So we’ve deduced that the number of points we can distinguish over time, \(N(f, \varepsilon, T)\), is a decreasing function of \(\varepsilon\), and an increasing function of \(T\).

We can think of the number \(N(f, \varepsilon, T)\) as an amount of information: the number of points we can tell apart is surely some interesting data!

But rather than think about a single instant in time, we want to think of the rate of information we obtain, as time passes. How much more information do we get each time we iterate \(f\)?

As we iterate \(f\), and we look at our space \(X\) over a longer time interval, we know that we can distinguish more points: \(N(f, \varepsilon, T)\) is an increasing function of \(T\). But how fast is it increasing?

To pick one possibility out of thin air, it might be the case, that every time we iterate \(f\), i.e. when we increase \(T\) by \(1\), that we can distinguish twice as many points. In that case, \(N(f, \varepsilon, T)\) doubles every time we increment \(T\) by 1, and we will have something like \(N(f, \varepsilon, T) = 2^T\). In this case, \(N\) is increasing exponentially, and the (exponential) growth rate is given by the base 2.

(Note that doubling the number of points you can distinguish is just like having 1 extra bit of information: with 3 bits you can describe \(2^3 = 8\) different things, but with 4 bits you can describe \(2^4 = 16\) things — twice as many!)

Similarly, to pick another possibility out of thin air, if it were the case that \(N(f, \varepsilon, T)\) tripled every time we incremented \(T\) by \(1\), then we would have something like \(N(f, \varepsilon, T) = 3^T\), and the growth rate would be 3.

But in general, \(N(f, \varepsilon, T)\) will not increase in such a simple way. However, there is a standard way to describe the growth rate: look at the logarithm of \(N(f, \varepsilon, T)\), and divide by \(T\). For instance, if \(N(f, \varepsilon, T) \sim 2^T\), then we have \(\frac{1}{T} \log N(f, \varepsilon, T) \sim \log 2\). And then see what happens as \(T\) becomes larger and larger. As \(T\) becomes very large, you’ll get an asymptotic rate of information gain from each iteration of \(f\).

(In describing a logarithm, we should technically specify what the base of the logarithm is. It could be anything; I don’t care. Pick your favourite base. Since we’re talking about information, I’d pick base 2.)

This leads us to think that we should consider the limit
\[
\lim_{T \rightarrow \infty} \frac{1}{T} \log N (f, \varepsilon, T).
\]
This is a great idea, except that if \(N (f, \varepsilon, T)\) grows in an irregular fashion, this limit might not exist! But that’s OK, there’s a standard analysis trick to get around these kinds of situations. Rather than taking a limit, we’ll take a lim inf, which always exists.
\[
\liminf_{T \rightarrow \infty} \frac{1}{T} \log N (f, \varepsilon, T).
\]

(The astute reader might ask, why lim inf and not lim sup? We could actually use either: they both give the same result. In our analogy, we might want to know the rate of information we’re guaranteed to get out of \(f\), so we’ll take the lower bound.)

And this is almost the definition of topological entropy! By taking a limit (or rather, a lim inf), we have eliminated the dependence on \(T\). But this limit still depends on \(\varepsilon\), the resolution of our eyesight.

Although our eyesight is shoddy, mathematics is not! So in fact, to obtain the ideal rate of information gain, we will take a limit as our eyesight becomes perfect! That is, we take a limit as \(\varepsilon\) approaches zero.

And this is the definition of the topological entropy of \(f\):
\[
h_{top}(f) = \lim_{\varepsilon \rightarrow 0} \liminf_{T \rightarrow \infty} \frac{1}{T} \log N(f, \varepsilon, T).
\]
So the topological entropy is, as we said in the beginning, the asymptotic rate of information we gain in our ability to distinguish points in \(X\) as we iterate \(f\), in the limit of perfect eyesight!
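To make this concrete, here is a rough numerical sketch (a toy experiment; the choice of map, grid and greedy search are mine) estimating the entropy of the doubling map \( f(x) = 2x \bmod 1 \) on the circle, whose topological entropy is known to be \( \log 2 \):

```python
# Sketch: estimate h_top of the doubling map f(x) = 2x mod 1 on the
# circle, whose topological entropy is log 2 (i.e. 1 in base-2 logs).
import math

def f(x):
    return (2 * x) % 1.0

def d(x, y):                      # distance on the circle R/Z
    e = abs(x - y)
    return min(e, 1 - e)

def d_T(x, y, T):                 # max distance along orbits up to time T
    m = 0.0
    for _ in range(T + 1):
        m = max(m, d(x, y))
        x, y = f(x), f(y)
    return m

def N(eps, T, grid=2048):
    # Greedy (T, eps)-separated set from grid samples: maximal, not
    # necessarily maximum, but enough to see the growth rate.
    kept = []
    for k in range(grid):
        x = k / grid
        if all(d_T(x, y, T) >= eps for y in kept):
            kept.append(x)
    return len(kept)

eps = 0.1
for T in range(1, 7):
    print(T, math.log2(N(eps, T)) / T)
# The rate drifts down towards 1 = log2(2): the leftover log(1/eps)/T
# term dies off only slowly as T grows.
```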

As it turns out, even though we heavily relied on distances in \(X\) throughout this definition, \(h_{top}(f)\) is completely independent of our notion of distance! If we replace our metric, or distance function \(d(x,y)\), with a different one (inducing the same topology), we will obtain the same result for \(h_{top}\). So the topological entropy really is topological — it has nothing to do with any notion of distance at all.

This is just one of several ways to define topological entropy. There are many others, just as wonderful and surprising; and all of them merely scratch the tip of an iceberg.


Abstract algebra nursery rhyme

In the spirit of hilariously advanced baby books like Chris Ferrie’s Quantum Physics for Babies, I have taken to incorporating absurdly sophisticated concepts into nursery rhymes.

To the tune of the ABC song (or, equivalently, Twinkle Twinkle Little Star):

The axioms of a group go 1, 2, 3
Identity, inverse, associativity!
The identity times any element g is g,
Inverse of g times g is identity,
Associativity says ab times c
is equal to a times bc.

The last resort of scoundrels

Samuel Johnson said it was “the last resort of scoundrels“; Emma Goldman, a menace to liberty. Leo Tolstoy said it “as a feeling is bad and harmful, and as a doctrine is stupid“. Patriotism, at least in its usual sense of love of one’s country over others, veneration of the virtue of its people over others, and adoration of its flag, is awful, irrational nonsense.

How on earth one can deduce moral values, or even a positive emotional response, from a geographic entity — indeed, such powerful emotions as to move men to war (yes, usually men) — has always eluded me.

It may be that there are various administrative reasons to divide a geographical area (like the earth, or a continent) into official or legal sub-regions (like countries, or states).

More importantly, it may be that, for one born in a land oppressed by a colonist, an occupier, or other oppressor, the natural solidarity among those oppressed peoples in their legitimate resistance may be expressed in the language of patriotism.

And it may be that there can be good, even uniquely good, things about a nation’s culture, and that it is worth recalling them occasionally — though there will equally be bad, even uniquely bad, aspects. One must never forget that people everywhere are roughly equally good and equally bad.

It may also be that countries may have sporting teams, or the like, and it can be fun to barrack for them.

Beyond that, there is nothing positive to say about patriotism.

Even if a country is physically beautiful, others are too. Even if a country’s culture or people are wonderful, others are too. There are wonderful people and wonderful ideas everywhere, just as there are horrible people everywhere. Venerating only those nearby, to the exclusion of others, is insular, narcissistic, and leads naturally to racism, chauvinism, and xenophobia.

Even if the highly dubious conceit of orthodox patriotism is true for a country — that this nation is great and to be preferred over others, despite all the other ones believing the same — it does not follow that one ought to venerate this nation: if one wants to venerate something, one should venerate good things and good people, whether here, there or anywhere.

(Incredibly, orthodox patriotism means that vast numbers of people in every land can believe precisely this, despite those elsewhere thinking the same. They cannot all be right, but they can all be wrong — living “in a gross and harmful delusion”. It is the same with all religions claiming to be the one true religion, of course. It discloses something deep, and deeply worrisome, about the human condition, that vast numbers of people are capable of this conceit.)

What matters are universal moral values, equity, justice, freedom, and so on; not the country in which they are expressed. One’s specific birthplace or homeland or nation is irrelevant.

This is kindergarten level morals; except that the corresponding kindergarten situation, of a group of children each boasting they are the best, will be resolved by a game or by a distraction, rather than by oppression, detention archipelagos, or war.

Perhaps the worst aspect of patriotism is in the cultural realm. It creates mythologies, with deep and powerful emotions latent within its manufactured communities. These emotions, fueled also by resentment of outsiders, can be manipulated by regressive political forces to reinforce inequalities, persecute outsiders, and stoke wars.

These mythologies are created when a nation’s history is recounted as virtuous, dramatic and heroic. But it is the same with other nations; and if retelling the story of one nation excludes other peoples and nations (or worse, disparages or invokes hatred of them), then it leads in the direction of, at best, insularity and stagnation, and at worst, militarism, oppression and war.

Then there is Australia.

Here, the magnitude of the artifice required to tell the nation’s history as a virtuous story is itself heroic. The result is an increasingly viciously enforced cultural orthodoxy, together with a crushing cultural cringe.

An island continent, home to hundreds of Indigenous nations, until colonised by an imperial power to create an antipodean jail; the original inhabitants and rightful owners dispossessed by the accumulation of property and capital and microbes, by genocidal policy, and by over a century of smouldering frontier war; no galvanizing wars fought for independence, only complicity in the motherland’s imperial ambitions, and a standard role in humanity’s propensity for worldwide violence; with all the bravery, heroism, obedience, murder and atrocity that entails. The overall arc of post-settlement history must be twisted beyond recognition to confect an orthodox patriotic mythology.

There are plenty of heroic Australians, to be sure; just as there are plenty of villains, and everything in between. And there are plenty of legitimate sources of pride in that nation’s achievements, just as there are plenty of horrific sources of shame.

Nothing more and nothing less; special in some ways and not in others; which is precisely the negation of every orthodox patriotic myth.

Limitless as that space too narrow for its aspirations

On 22 February, 1877, James Joseph Sylvester gave an “Address on Commemoration day at Johns Hopkins University”.

Sylvester, the very excellent English mathematician, worked in areas of what we would today call algebra, number theory, and combinatorics. He is known for his algebraic work in invariant theory, for his work in combinatorics such as Sylvester’s Problem in discrete geometry, and for much else. He coined several terms which are commonplace in mathematics today: “matrix”, “graph” (in the sense of graph theory) and “discriminant”. He was also well known for his love of poetry, and indeed his poetic style. (He in fact published a book, The Laws of Verse, attempting to reduce “versification” to a set of axioms.)

I came across this address of Sylvester, not through mathematical investigations or in the references of a mathematical book, but rather in the footnotes of the book “Awakenings”, in which the late neurologist Oliver Sacks discusses, in affectionate and literary detail, the case histories of a number of survivors of the 1920s encephalitis lethargica (“sleeping sickness”) epidemic — an interesting and mysterious event in itself — as those patients are treated in the 1960s with the then-new drug L-DOPA and experience wondrous “awakenings”, often after decades of catatonia, although often followed by severe tribulations. (These awakenings were the subject of the 1990 Oscar-nominated movie of the same name.) These tribulations, in each patient, form an odyssey through the depths of human ontology, in which the effects of personality, character, physiology, environment, and social context are all present and deeply intertwined.

Sacks comes to the conclusion that a reductionist approach to medicine, focusing on the cellular and the chemical, is wholly deficient:

What we do see, first and last, is the utter inadequacy of mechanical medicine, the utter inadequacy of a mechanical world-view. These patients are living disproofs of mechanical thinking, as they are living exemplars of biological thinking. Expressed in their sickness, their health, their reactions, is the living imagination of Nature itself, the imagination we must match in our picturing of Nature. They show us that Nature is everywhere real and alive and that our thinking about Nature must be real and alive. They remind us that we are over-developed in mechanical awareness; and that it is this, above all, that we need to regain, not only in medicine, but in all science.

Indeed, Sacks quotes from W H Auden’s “The Art of Healing”:

‘Healing,’
Papa would tell me,
‘is not a science,
but the intuitive art
of wooing Nature.

In an accompanying footnote, Sacks notes that mathematical thinking is real and alive, in just the same way. He quotes the aforementioned address of Sylvester.

Mathematics is not a book confined within a cover and bound between brazen clasps, whose contents it needs only patience to ransack; it is not a mine, whose treasures may take long to reduce into possession, but which fill only a limited number of veins and lodes; it is not a soil, whose fertility can be exhausted by the yield of successive harvests; it is not a continent or an ocean, whose area can be mapped out and its contour defined: it is limitless as that space which it finds too narrow for its aspirations; its possibilities are as infinite as the worlds which are forever crowding in and multiplying upon the astronomer’s gaze; it is as incapable of being restricted within assigned boundaries or being reduced to definitions of permanent validity, as the consciousness, the life, which seems to slumber in each monad, in every atom of matter, in each leaf and bud and cell, and is forever ready to burst forth into new forms of vegetable and animal existence.

Sylvester is right, and if anything his argument is not forceful enough. Mathematics has always been limitless — and even more limitless than the seemingly (to Sylvester, at least) infinite possibilities of astronomy and biology — for, unlike the experimental or observational sciences, it requires no substrate in reality beyond the imagination of those who think it. Liberated from the necessity to study only this world, mathematics studies all the worlds it can imagine, which include our own but go far beyond it. (It is perhaps surprising, and even “unreasonable”, as Wigner argued, that we can count our own world as among those which are mathematical; but it is not surprising that its worlds transcend ours.)

The progress of science has displayed, in an absolute sense, how mathematics outstrips the limitlessness of other sciences.

However many may be the worlds of the astronomer — now teeming also with exoplanets and gravitational waves — they are still finite; the observable universe has a finite radius.

Sylvester’s panpsychism (the idea that everything has consciousness) is now out of fashion; in any case, his version seems focused on biology. And we now know that biological life is constrained by genetics, and at the molecular level by DNA and related biochemistry. Mathematics knows no such constraint.

Taking panpsychism more generally, there is an argument — and a strong one, in my view — that understanding consciousness will eventually require a radical revision of our understanding of physics. But even then, I very much doubt any such radical revision would completely transcend mathematics — and I very much doubt that mathematics would not encompass infinitely more.

It is worth noting, though, that mathematics is, in a certain sense, reductionism par excellence. Even accepting what we know about incompleteness theorems, mathematics, theoretically at least, can be reduced to sets of axioms and logical arguments, in the end consisting only of formal logic, modus ponens and the like. That is not how mathematicians do mathematics in practice, but it is the orthodox view of what mathematics formally is. Even the standard theorems that mathematics “knows no bounds” (the Gödel incompleteness theorems, the Cantor diagonalisation argument, set-theoretic paradoxes like Russell’s, for instance) can themselves be expressed, reductionistically, in this formal way.
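As a tiny illustration (my example, not Sylvester’s or Sacks’s): even Russell’s paradox, for all its conceptual vertigo, compresses into a line of symbols. Assuming unrestricted comprehension, one may form the set
\[ R = \{ x : x \notin x \}, \]
from which a few formal steps yield
\[ R \in R \iff R \notin R, \]
a contradiction. The whole “paradox” is a short formal derivation.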

All the infinite possibilities, the unboundedness, of mathematics, then, can be expressed in a very finite, very discrete, very reductionistic way. This is not surprising — even with finitely many letters one can construct an infinity of sentences, one can burst all brazen clasps, one can empty all veins and lodes, one can exhaust all soils, there is no end to the harvest, however dizzying and rarefied the altitude at which it is sown.

And as for definitions of permanent validity? At least in terms of the experience of learning, doing and discovering mathematics, I cannot go past Ada Lovelace’s definition of the “poetical science” as “the language of the unseen relations between things”.

There is much else of interest — and not just historical interest — in Sylvester’s address. Mathematics impedes public speaking; university study and research ought to avoid monetary reward and public recognition; students should avoid “disorder or levity”; all researchers should simultaneously engage in teaching; anecdotes of arithmetic in the French Revolution; every science improves as it becomes more mathematical; and the taste for mathematics is much broader than one might think. So argues Sylvester, poet, mathematician; perhaps I will return to these arguments one day.

The Doors of Crime Perception

Crime is uniquely susceptible to the manipulation of perceptions.

It is common, it is bad, it is fascinating.

A wide spectrum of this common, bad, fascinating activity exists, and the fixation of fascinated attention on certain narrow portions of this spectrum serves numerous powerful political interests. Those numerous, already-aligned, authoritarian political interests — tabloid media, conservative politicians — are only too happy to indulge the public’s fascination. No similar political interest is usually served by attempting to understand other portions of the spectrum. Attempting to understand the spectrum as a whole, or the overall picture and causes of crime, might serve the purpose of building a better society, but that purpose is one which, all entrenched political powers agree, must remain unthinkable.

Which types of crime are they, on which power so fixates attention? Preferably those which are sensational, preferably involving violence and fear, preferably with perpetrators who are suitably villainous and “not like us”, where “we” means the “good folk” who are normalised within society. Powerless, marginalised groups form perfect villains: immigrants, ethnic minorities, racial minorities, Indigenous people, and in general, “others”.

In Australia at present, that means asylum seekers and refugees, it means African Australians, it means Aboriginal Australians.

Accordingly, the fixated attention of society on this narrow portion of crime — and its villains — blows it out of all proportion. Perceptions of crime in society can warp radically, tending towards fear and paranoia of the fixated type of crime and the fixated villains, and generalising to a fear of society at large.

The propaganda powers of media campaigns, their political protagonists, and their guerrilla online counterparts are substantial. The far right delights in them.

A fearful populace is one that is easier to control. It is one which will more easily submit to existing oppression as justified or necessary, and accept further devolution towards a surveillance or police state. Fearful people will tend to look out only for themselves, diminishing the bonds of social solidarity, and furthering capitalist atomisation. And as the public holds a paranoid, distorted idea of reality, the desire to understand society, and in particular the root causes of crime, diminishes, or becomes unthinkable. Hysterical overreaction to the villains becomes the urgent goal; anything else is wasting time against this menace. The already marginalised will be oppressed further.

* * *

What is the situation in Victoria?

Crime statistics are freely available in Victoria.

What do they say?

(Let us put aside broader questions, such as whether existing laws are good laws, whether the criminal justice system is a good one, what better systems might exist, and so on.)

We can, for the moment, put aside subtle questions of methodology. (Do people report more crimes now, especially domestic violence? Should we refer to the number of criminal incidents, recorded offences, or offenders?) In any case, the statistics tell a fairly clear story.

To a first approximation, in Victoria, crime rates have decreased since 2016. They were roughly level from 2009 to 2015, at just under 6,000 incidents per 100,000 population, with a jump in 2016 to over 6,600. The rate has since decreased, and the current crime rate is similar to that of 2009-15, and roughly comparable to those of other Australian states.

Some categories of crime, however, have not decreased from 2016-18. Assaults have remained steady at around 610 incidents per 100,000 population, and sexual offences have increased from about 110 to 132 incidents per 100,000 population. On the other hand, theft and burglary have decreased dramatically (from about 2,500 to 2,100, and from about 840 to 620 incidents per 100,000 population, respectively).
(More detail can be found from the Age here or in the statistics themselves.)

These numbers are too high. They mean that thousands of sexual offences, tens of thousands of assaults and burglaries, and hundreds of thousands of thefts happen each year in Victoria. Each such crime is potentially a source of outrage.
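As a rough check on these orders of magnitude (assuming a Victorian population of about 6.4 million, approximately its 2018 figure), the rates above convert to absolute annual numbers as
\[ 6{,}400{,}000 \times \tfrac{132}{100{,}000} \approx 8{,}400 \text{ sexual offences}, \qquad 6{,}400{,}000 \times \tfrac{610}{100{,}000} \approx 39{,}000 \text{ assaults}, \]
\[ 6{,}400{,}000 \times \tfrac{620}{100{,}000} \approx 40{,}000 \text{ burglaries}, \qquad 6{,}400{,}000 \times \tfrac{2{,}100}{100{,}000} \approx 134{,}000 \text{ thefts}. \]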

Society ought to work so that these numbers, in the long run, tend to zero. It is not at all clear that more draconian laws or policing will help that goal. It requires addressing the root causes, which include, among others, poverty, misogyny, racism, authoritarianism, capitalism, and a culture which glorifies greed and violence.

But nonetheless, the point about perception remains. If one felt, despite the continual rate of ongoing crime, that Victoria was a generally safe place to live in 2015, and one is consistent, then (putting aside local variations) one must feel the same at the beginning of 2019.

Indeed, Melbourne ranked in the top 10 safest cities in the world in a 2017 Economist study.

If one feels that “African gangs” are a menace to society, as right-wing politicians and tabloid media continue to claim, despite the protestations even of the police to the contrary, then one is living in an alternate reality — a reality that at least has provided some social media entertainment, but whose racism is profoundly damaging to African communities in Melbourne.