Breakthroughs in primary school arithmetic

Humans have known how to multiply natural numbers for a long time. In primary school you learn to multiply numbers using an algorithm often called long multiplication, which was known to the ancient Babylonians. But it’s called “long” for a reason — you have to write a lot of lines! If you’re multiplying two numbers which both have length n, then you have to multiply every digit of the first number by every digit of the second number, so there are \(n^2\) multiplication operations. Then there are several additions. But addition is much easier than multiplication, as you also learn in primary school: you can just go down column by column and work it out, and adding two numbers of length n only takes roughly n operations.
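The digit-by-digit bookkeeping can be sketched in a few lines of code. This is purely a toy illustration (the function names are my own, not from any source), with a counter to make the \(n^2\) cost visible:

```python
def digits(n):
    # Digits of n, least significant first: 1234 -> [4, 3, 2, 1]
    return [int(c) for c in str(n)][::-1]

def from_digits(ds):
    # Inverse of digits()
    return int("".join(map(str, ds[::-1])))

def long_multiply(a, b):
    """Schoolbook long multiplication on digit lists.

    Every digit of a is multiplied by every digit of b, so two n-digit
    numbers cost n * n single-digit multiplications; the carries are
    then propagated with roughly linear extra work.
    """
    result = [0] * (len(a) + len(b))
    single_digit_mults = 0
    for i, da in enumerate(a):
        for j, db in enumerate(b):
            result[i + j] += da * db
            single_digit_mults += 1
    for k in range(len(result) - 1):   # the "easy" additions: carrying
        result[k + 1] += result[k] // 10
        result[k] %= 10
    return result, single_digit_mults
```

For two 4-digit numbers this performs exactly \(4^2 = 16\) single-digit multiplications.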

In 1960, the great mathematician Andrey Kolmogorov was teaching a seminar in Soviet Russia. He conjectured that the ancient Babylonian method was best possible, in the sense that any algorithm to multiply natural numbers of length n must involve at least \(n^2\) single-digit multiplication operations. One of the students in that seminar was Anatoly Karatsuba. One week later, Karatsuba came back with an improved method, which requires only about \(n^{\log_2 3} \sim n^{1.58}\) multiplications. (Strictly speaking it’s \(O(n^{\log_2 3})\), if you know “big-O notation”.)
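Where does this exponent come from? Splitting each n-digit number into halves, Karatsuba’s method (described in the next paragraph) replaces four half-size multiplications with three, so the number \(M(n)\) of single-digit multiplications roughly satisfies the recursion
\[
M(n) = 3 \, M(n/2), \quad M(1) = 1,
\]
which gives
\[
M(n) = 3^{\log_2 n} = n^{\log_2 3} \approx n^{1.58}.
\]
(The middle equality follows by taking logarithms of both sides.)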

Karatsuba’s method to multiply two 2-digit numbers involves multiplying the units digits, multiplying the tens digits, and then multiplying the sum of the digits of one number by the sum of the digits of the other number. With some judicious addition, subtraction and placement of extra zeroes, the required product can be found. Karatsuba’s method in general repeats this method, in a recursive fashion, on larger numbers.
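Here is a minimal recursive sketch of the method (my own illustrative code, with made-up identifiers), which also counts the single-digit multiplications performed:

```python
def karatsuba(x, y, counter=None):
    """Karatsuba multiplication of non-negative integers.

    Splitting x = hi1 * 10^m + lo1 and y = hi2 * 10^m + lo2, we use
        x * y = a * 10^(2m) + (c - a - b) * 10^m + b,
    where a = hi1*hi2, b = lo1*lo2 and c = (hi1 + lo1)*(hi2 + lo2),
    so only three recursive multiplications are needed, not four.
    Returns (product, number of single-digit multiplications used so far).
    """
    if counter is None:
        counter = [0]
    if x < 10 and y < 10:              # base case: one single-digit multiply
        counter[0] += 1
        return x * y, counter[0]
    m = max(len(str(x)), len(str(y))) // 2
    hi1, lo1 = divmod(x, 10 ** m)
    hi2, lo2 = divmod(y, 10 ** m)
    a, _ = karatsuba(hi1, hi2, counter)               # high halves
    b, _ = karatsuba(lo1, lo2, counter)               # low halves
    c, _ = karatsuba(hi1 + lo1, hi2 + lo2, counter)   # sums of halves
    return a * 10 ** (2 * m) + (c - a - b) * 10 ** m + b, counter[0]
```

For two 2-digit numbers this does three single-digit multiplications instead of four: for instance \(12 \times 34\) uses only the products \(1 \times 3\), \(2 \times 4\) and \(3 \times 7\).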

This was all in the news recently. Since Karatsuba’s breakthrough, there have been several further advances in the multiplication of natural numbers. But in the last few weeks, a paper was posted by two mathematicians, including David Harvey, an Australian number theorist at UNSW. It purports to give an algorithm to multiply natural numbers in time \(O(n \log n)\). A good article about all this recently appeared in Quanta Magazine, which is worth a read: here’s a link.

Uniqueness of contact structures and tomography

This article is the fifth in a series on Liouville and contact geometry, on convex surfaces and characteristic foliations.

In our previous episode, we saw that when you have a product neighbourhood \(S \times [-1,1]\) of a surface \(S\) in a contact 3-manifold, you get a family, or “movie”, of characteristic foliations \(\mathcal{F}_t\) on the surfaces \(S_t = S \times \{t\}\). When \(S\) is convex and the neighbourhood \(S \times [-1,1]\) is defined from a transverse contact vector field, the foliations \(\mathcal{F}_t\) are all the same, \(\mathcal{F}_t = \mathcal{F}\).

We then asked the reverse question: if you have a family of foliations \(\mathcal{F}_t\) on the surface \(S\), do they arise as the movie of characteristic foliations of a contact structure on \(S \times [-1,1]\), i.e. with \(\mathcal{F}_t\) being the characteristic foliation on \(S_t\)? And we saw a couple of answers. Under certain circumstances, a movie of foliations \(\mathcal{F}_t\) is the movie of a contact structure \(\xi\) — and depending on what you know about the \(\mathcal{F}_t\), you might know something about \(\xi\).

In this episode, we ask the question of how unique these contact structures are. If you have two contact structures \(\xi, \xi'\) with the same movie of foliations \(\mathcal{F}_t\), are they the same, or equivalent in any sense? And is it possible to have two movies of foliations \(\mathcal{F}_t, \mathcal{F}'_t\) which are the movies of equivalent contact structures?

Before attacking these questions, let’s recall what we saw previously, and let’s figure out what we might mean by “equivalence of contact structures”.

Recall we’ve said that a foliation \(\mathcal{F}\) on a surface \(S\) divides \(S\) if there is a curve (dividing set) \(\Gamma\) which cuts \(S\) into two pieces \(S_+, S_-\) on which \(\mathcal{F}\) is directed by a vector field which expands an area form. In this case you get a Liouville structure on each of \(S_+\) and \(S_-\). This is precisely the sort of foliation we see on a convex surface in a contact 3-manifold.

We saw that if all foliations \(\mathcal{F}_t\) are the same, and divide \(S\), then they are the movie of a contact structure on \(S \times [-1,1]\) — and in fact the contact structure is invariant in the \(t\) direction, and makes \(S\) convex.

We also saw Giroux’s realisation lemma, which says that if each foliation \(\mathcal{F}_t\) divides \(S\), then again, the foliations form the movie of a contact structure on \(S \times [-1,1]\).

This is all very nice. Our agenda, however, is to understand to what extent these contact structures are unique, or equivalent. So let’s examine two possible versions of equivalence of contact structures.

One standard way to consider two contact structures “equivalent” is if they are isotopic. A contact structure on a manifold \(M\) is a type of plane field on \(M\). If you can just continuously deform one into the other, they should be equivalent — but we need to be a little bit careful, because contact structures are plane fields with a special condition, that of non-integrability.

A homotopy of plane fields is a continuously varying family of plane fields, so two plane fields are homotopic if you can turn one into the other by a continuous deformation. An isotopy of contact structures is a homotopy of plane fields, where the plane field at each instant of time is in fact a contact structure.

So, two contact structures are isotopic if you can turn one into the other via a continuous deformation of the contact planes, but the planes must always retain the contact property of non-integrability.

Another way to consider two contact structures “equivalent” is if they are related by a diffeomorphism of the manifold \(M\). A diffeomorphism \(\phi : M \rightarrow M\) has a derivative \(\phi_*\) which sends tangent vectors to tangent vectors, tangent planes to tangent planes, and contact structures to contact structures. Two contact structures \(\xi, \eta\) are related by a diffeomorphism \(\phi\) if \(\eta = \phi_* \xi\).

For now, I’m more interested in continuous deformations of contact structures, rather than diffeomorphisms. But you can also obtain continuous deformations of contact structures from such diffeomorphisms!

Rather than considering a single diffeomorphism \(\phi\) of \(M\), we can have a 1-parameter family of diffeomorphisms \(\phi_t\), where each \(\phi_t\) is a diffeomorphism \(M \rightarrow M\). The \(\phi_t\) vary continuously in \(t\), say for \(t \in [0,1]\). And since we want to start from where we’re at, we usually require \(\phi_0\) to be the identity. This idea is sometimes called a diffeotopy. We can think of a diffeotopy as an ambient isotopy — the points of the whole 3-dimensional space \(M\) are moved about!

Such ambient isotopies naturally arise as the flows of vector fields. When you have a vector field \(X\) on a manifold \(M\), if you flow along \(X\) for time \(t\) you obtain a diffeomorphism \(\phi_t : M \rightarrow M\). The family of diffeomorphisms \(\phi_t\) is a continuously varying family of diffeomorphisms of \(M\), starting from \(\phi_0\), which is the identity.

So this gives us two distinct notions of “continuous deformation” of a contact structure.

  • Isotopy of contact structures: A family of contact structures \(\xi_t\) on \(M\) which varies continuously. In other words, the contact planes move from one contact structure to another, through contact structures.
  • Ambient isotopy of contact structures: Given a family of diffeomorphisms \(\phi_t : M \rightarrow M\), varying continuously with \(\phi_0 = \text{Identity}\), and starting from the contact structure \(\xi = \xi_0\), we obtain a family of contact structures \(\xi_t = \phi_{t*} \xi\). In other words, the whole space moves, and carries the contact plane field along with it!

Now it’s hopefully clear that an ambient isotopy induces an isotopy of contact structures. But it’s not at all clear that an isotopy of contact structures should arise from an ambient isotopy.

But although it’s not at all clear, it’s true! These two notions of “continuous deformation of contact structures” are essentially the same! This is known as Gray’s theorem.

GRAY’S THEOREM: Let \(\xi_t\) for \(t \in [0,1]\) be an isotopy of contact structures on a compact 3-manifold \(M\) without boundary. Then there exists a family of diffeomorphisms \(\phi_t : M \rightarrow M\), for \(t \in [0,1]\), such that \(\phi_{t*} \xi_0 = \xi_t\).

Gray’s theorem is even more amazing, because the proof is explicit! It tells you how to find the diffeomorphisms \(\phi_t\). The method of this proof, sometimes known as Moser’s method, constructs \(\phi_t\) as the flow of a vector field \(X\), and uses the properties of symplectic and contact structures to find that vector field.

This, however, leads to a problem when \(M\) has boundary. The statement of the theorem applies only when \(M\) has no boundary. The method works in general, but the problem is that when \(M\) has boundary, the vector field \(X\) will in general point in or out of the boundary. Thus you can’t necessarily define the flow \(\phi_t\), as you might flow out of the manifold — there be dragons!

And, if we are just considering a neighbourhood \(S \times [-1,1]\) of a surface \(S\), then this is an issue, because \(S \times [-1,1]\) very definitely has boundary, namely at \(S \times \{-1,1\}\)!

Anyway, let’s return to our first question: if you have two contact structures \(\xi, \xi'\) with the same movie of foliations \(\mathcal{F}_t\), are they the same, or equivalent in any sense? Giroux showed that they are: they are then isotopic. He called this his “reconstruction lemma”; it’s Lemma 2.1 of “Structures de contact en dimension trois et bifurcations des feuilletages de surfaces”.

RECONSTRUCTION LEMMA: If two contact structures on \(S \times [-1,1]\) have the same characteristic foliations \(\mathcal{F}_t\) on each surface \(S_t = S \times \{t\}\), then they are isotopic.

In other words, if two contact structures have the same movie, they are isotopic.

(Giroux says further that the two contact structures are isotopic relative to the boundary, but I don’t believe it, at least not in the sense of what I understand it to mean. The two contact structures could be quite different on \(S \times \{-1,1\}\), and hence the isotopy connecting them must be nontrivial on the boundary. But perhaps Giroux means something else.)

Geometrically, if two contact structures \(\xi, \xi'\) have the same movie, then on each surface \(S_t\), they draw the same characteristic foliation \(\mathcal{F}_t\), so they are both tangent to the lines of \(\mathcal{F}_t\). The contact planes of \(\xi, \xi'\) just spin around those lines differently!

The proof of the reconstruction lemma is not very difficult in the end. It relies upon a calculation we saw before. Namely, a contact form can be written as \(\alpha = \beta_t + u_t \; dt\), where \(\beta_t\) is a 1-form whose kernel on \(S_t\) yields \(\mathcal{F}_t\), and \(u_t\) is a real-valued function on \(S_t\). We saw that the condition for \(\alpha\) to be a contact form is that
\[
u_t \; d\beta_t + \beta_t \wedge ( du_t - \dot{\beta}_t )
\]
be an area form on each \(S_t\). Fixing an orientation on \(S_t\), we can write this requirement as the inequality
\[
u_t \; d\beta_t + \beta_t \wedge ( du_t - \dot{\beta}_t ) > 0.
\]

Now, our two contact structures \(\xi, \xi'\) draw the same movies, which are the characteristic foliations \(\mathcal{F}_t\), which are given by the kernels of \(\beta_t\). So we can take contact forms \(\alpha, \alpha'\) which have the same \(\beta_t\) terms.

The key idea is to take the inequality above, with fixed \(\beta_t\), and consider the set of all \(u_t\) which would satisfy the inequality. The key observation is that this is a convex set. For if \(u_t, v_t\) are two functions which satisfy the inequality, then so too does any convex linear combination \((1 - \lambda) u_t + \lambda v_t\), for any \(\lambda \in [0,1]\).

Explicitly, if we have
\[
u_t \; d\beta_t + \beta_t \wedge ( du_t - \dot{\beta}_t ) > 0
\quad \text{and} \quad
v_t \; d\beta_t + \beta_t \wedge ( dv_t - \dot{\beta}_t ) > 0,
\]
then taking \((1 - \lambda)\) times the first inequality plus \(\lambda\) times the second, since both \(\lambda, 1 - \lambda \geq 0\), yields
\[
[ (1 - \lambda) u_t + \lambda v_t ] \; d\beta_t + \beta_t \wedge ( d [ (1 - \lambda) u_t + \lambda v_t ] - \dot{\beta}_t ) > 0,
\]
so that replacing \(u_t\) with \((1 - \lambda) u_t + \lambda v_t\) in the original inequality, the inequality still holds.

With this observation in hand, it’s not difficult to prove the lemma.

PROOF OF LEMMA. Let the two contact structures be \(\xi_0, \xi_1\). As they have the same movie, these two contact structures have contact forms \(\alpha_0 = \beta_t + u_t \; dt\), \(\alpha_1 = \beta_t + v_t \; dt\), where \(\beta_t\) is a 1-form and \(u_t, v_t\) are real-valued functions. We can take the same \(\beta_t\) in both contact forms precisely because they have the same movie of foliations.

Now the contact conditions for \(\alpha_0, \alpha_1\) are precisely given by the two inequalities above for \(u_t\) and \(v_t\). For \(\lambda \in [0,1]\), define a 1-form \(\alpha_\lambda\) as a convex linear combination of \(\alpha_0\) and \(\alpha_1\):
\[
\alpha_\lambda = (1-\lambda) \alpha_0 + \lambda \alpha_1
= \beta_t + [ (1-\lambda) u_t + \lambda v_t ] \; dt
\]
Now as discussed above, \(u_t, v_t\) satisfy the desired inequalities, and hence so too does \((1-\lambda) u_t + \lambda v_t\). And this convex linear combination satisfying the inequality means that \(\alpha_\lambda\) is a contact form. So we have a continuously varying family of contact forms \(\alpha_\lambda\), from \(\alpha_0\) to \(\alpha_1\). This gives an isotopy of contact structures from \(\xi_0\) to \(\xi_1\). QED

This lemma gives a very nice answer to our first question. Yes, it says, if two contact structures give the same movie of foliations, then they are equivalent — they are isotopic.

But what about ambient isotopy? Well, as mentioned above, Gray’s theorem constructs a vector field whose flow will give an ambient isotopy — the problem is that this vector field might point in or out of the boundary of \(S \times [-1,1]\). If we’re happy to have our diffeomorphisms going beyond \(S \times [-1,1]\), there’s no problem. But if we want to stay with everything happening in \(S \times [-1,1]\), we may have a problem.

In any case, let’s now turn to our second question; and in fact we will be able to give an answer involving ambient isotopy. The question we asked was: is it possible to have two movies of foliations, which are movies of equivalent contact structures?

It’s certainly possible. In fact, we can construct such a situation to involve an ambient isotopy.

Let’s start from our old, classic, convex surface situation. Let’s consider a contact structure \(\xi\) near a convex surface \(S\), with a neighbourhood \(S \times [-1,1]\) defined by a transverse contact vector field, so that all the foliations \(\mathcal{F}_t\) are the same, i.e. the movie of foliations is all just the same frame. In this case the contact structure is “vertically invariant” and we have a contact form \(\alpha = \beta + u \; dt\), where \(\beta, u\) are a 1-form and a real-valued function on \(S\), with no dependence on \(t\).

Now, let’s consider a diffeomorphism \(\phi\) of \(S\). In fact, let’s consider a smooth family of diffeomorphisms \(\phi_t\) of \(S\), starting from \(\phi_0\) being the identity, through to \(\phi_1\) being our diffeomorphism \(\phi\). So \(\phi\) is a diffeomorphism which is isotopic to the identity, and \(\phi_t\) is an isotopy of diffeomorphisms, or diffeotopy. So for each \(t \in [0,1]\), we have a diffeomorphism \(\phi_t : S \rightarrow S\), and these vary smoothly in \(t\), with \(\phi_0\) the identity, and \(\phi_1\) being our original diffeomorphism \(\phi\).

Let’s now use the diffeomorphisms \(\phi_t\), over all \(t \in [0,1]\), to construct a diffeomorphism \(\Phi\) of \(S \times [0,1]\). We’ll define \(\Phi\) by applying \(\phi_t\) to each surface \(S_t = S \times \{t\}\). In other words,
\[
\Phi(x,t) = (\phi_t (x), t).
\]
Note that we’ve only taken \(t \in [0,1]\) here, but we have a larger interval \([-1,1]\) in our thickened surface \(S \times [-1,1]\). But since \(\phi_0\) is the identity on \(S\), we can extend \(\Phi\) to a diffeomorphism of the whole \(S \times [-1,1]\) by letting it be the identity on \(S \times [-1,0]\).

Thus we have a diffeomorphism \(\Phi : S \times [-1,1] \rightarrow S \times [-1,1]\). It preserves each slice \(S_t\). It’s the identity on each \(S_t\), for \(t \leq 0\). But for \(t \geq 0\) it moves the slices about in a smooth fashion, starting from the identity on \(S_0\), through to applying \(\phi\) to \(S_1\).

Applying the diffeomorphism \(\Phi\) (or rather, its derivative) to the nice original vertically invariant contact structure \(\xi\), we obtain another contact structure. Let \(\eta = \Phi_* \xi\).

So \(\eta\) is another contact structure on \(S \times [-1,1]\). It’s related to \(\xi\) by the diffeomorphism \(\Phi\). Now since \(\Phi\) preserves each surface \(S_t\) and \(\Phi\) also sends the contact planes of \(\xi\) to \(\eta\), it must send the characteristic foliations of \(\xi\) to the characteristic foliations of \(\eta\). If we define \(\mathcal{G}_t\) to be the characteristic foliation of \(\eta\) on \(S_t\), then \(\Phi (\mathcal{F}_t) = \mathcal{G}_t\). Indeed, thinking purely about the individual slice \(S_t\), we have \(\phi_t (\mathcal{F}_t) = \mathcal{G}_t\). So \(\mathcal{F}_t\) is the movie of foliations of \(\xi\), and \(\mathcal{G}_t\) is the movie of foliations of \(\eta\).

Now recall the original contact structure \(\xi\) was vertically invariant, so all the foliations \(\mathcal{F}_t\) are the same, say \(\mathcal{F}_t = \mathcal{F}\). But \(\phi\), on the other hand, can be any diffeomorphism of \(S\) isotopic to the identity — deforming the points of \(S\) around in some fashion. So the movie of foliations \(\mathcal{G}_t = \phi_t (\mathcal{F}_t) = \phi_t (\mathcal{F})\) will in general be very different from \(\mathcal{F}\). In other words, \(\xi\) and \(\eta\) will in general have very different movies of foliations.

And yet, despite \(\xi\) and \(\eta\) having very different movies, they are related to each other by the diffeomorphism \(\Phi\). We claim that they are in fact isotopic — in fact, ambient isotopic.

To show \(\xi, \eta\) are ambient isotopic, we just need to show that the diffeomorphism \(\Phi\) of \(S \times [-1,1]\) is isotopic to the identity. This is not so difficult, since \(\Phi\) is constructed out of the diffeomorphisms \(\phi_t\) of \(S\), which themselves form an isotopy from the identity! We just need to straighten out what we mean.

To define the isotopy from \(\Phi\) to the identity on \(S \times [-1,1]\), we need a new time variable! We’ve already used \(t\) for the coordinate on \([-1,1]\). So let’s use a new variable \(s\). We’ll define a family of diffeomorphisms \(\Phi_s : S \times [-1,1] \rightarrow S \times [-1,1]\), for \(s \in [0,1]\), varying smoothly in \(s\), with \(\Phi_0\) being the identity and \(\Phi_1 = \Phi\).

On \(S \times [0,1]\), we defined \(\Phi (x,t) = (\phi_t (x), t)\). We can now define \(\Phi_s\) on \(S \times [0,1]\) by
\[
\Phi_s (x,t) = (\phi_{st} (x), t).
\]
Here \(st\) just means \(s\) times \(t\)! At time \(s\), \(\Phi_s\) acts on the slice \(S_t\) via the diffeomorphism \(\phi_{st}\).

From this definition we clearly have that each \(\Phi_s\) is a diffeomorphism of \(S \times [0,1]\), and that these diffeomorphisms vary smoothly in \(s\). When \(s = 0\), we have \(\Phi_0 (x,t) = (\phi_0 (x), t)\), and since \(\phi_0\) is the identity on \(S\), this means \(\Phi_0\) is the identity on \(S \times [0,1]\). When \(s=1\), we have \(\Phi_1 (x,t) = (\phi_t (x), t) = \Phi(x,t)\), so \(\Phi_1 = \Phi\). So indeed \(\Phi_s\) is an isotopy of diffeomorphisms of \(S \times [0,1]\) from the identity to \(\Phi\).
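As a toy sanity check of this reparametrisation trick, here is a numerical sketch in which \(S\) is taken to be a circle and \(\phi_t\) a rotation (entirely my own illustrative choice, not from the text):

```python
import math

THETA = 1.3  # total rotation angle; arbitrary illustrative constant

def phi(t, x):
    """The diffeotopy phi_t of S = circle: rotation by angle t * THETA.

    phi_0 is the identity, and phi_1 is rotation by THETA.
    """
    return (x + t * THETA) % (2 * math.pi)

def Phi(x, t):
    """Phi(x, t) = (phi_t(x), t): apply phi_t on each slice S_t."""
    return (phi(t, x), t)

def Phi_s(s, x, t):
    """Phi_s(x, t) = (phi_{st}(x), t), where st is s times t."""
    return (phi(s * t, x), t)

x, t = 0.7, 0.5
assert Phi_s(0, x, t) == (x, t)     # s = 0: the identity
assert Phi_s(1, x, t) == Phi(x, t)  # s = 1: Phi itself
```

At time \(s\), the slice \(S_t\) is rotated by angle \(st \cdot \Theta\); sweeping \(s\) from 0 to 1 interpolates from the identity to \(\Phi\), exactly as in the text.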

Now on \(S \times [-1,0]\), \(\Phi\) is the identity. So there is a clear isotopy from the identity to \(\Phi\) — namely the isotopy just consisting of the identity! So we can extend \(\Phi_s\) to be defined on all of \(S \times [-1,1]\), by letting \(\Phi_s\) be the identity on \(S \times [-1,0]\).

Thus we obtain an isotopy of diffeomorphisms \(\Phi_s\) of \(S \times [-1,1]\) from the identity to \(\Phi\). So if we let \(\xi_s = \Phi_{s*} \xi\), then each \(\xi_s\) is a contact structure, with \(\xi_0 = \xi\) and \(\xi_1 = \eta\). So \(\xi, \eta\) are isotopic, and the diffeotopy \(\Phi_s\) shows that they are in fact ambient isotopic.

Thus, we have constructed examples of contact structures on \(S \times [-1,1]\) which are isotopic, indeed ambient isotopic, but which induce different movies of foliations on the surfaces \(S_t\).

We can summarise this construction in the following proposition.

PROPOSITION: Let \(S\) be a closed surface, and \(\xi\) a vertically invariant contact structure on \(S \times [-1,1]\). Let \(\phi\) be a diffeomorphism of \(S\) isotopic to the identity via an isotopy \(\phi_t\), with \(\phi_1 = \phi\) and \(\phi_t\) the identity for \(t \leq 0\). Then the contact structure \(\eta\) defined by
\[
\eta = \Phi_* \xi,
\quad \text{where} \quad
\Phi(x,t) = (\phi_t (x), t)
\]
is ambient isotopic to \(\xi\), via the diffeotopy \(\Phi_s\) of \(S \times [-1,1]\) defined by \(\Phi_s (x,t) = ( \phi_{st} (x), t)\).

This actually has a nice application when we consider adding bypasses.

Bypasses are objects which are attached to the boundary of a contact 3-manifold along a type of arc called an attaching arc. An attaching arc is a special type of arc on a convex surface \(S\). Specifically, an attaching arc must (i) run along the characteristic foliation of \(S\), (ii) begin and end on the dividing set, and (iii) intersect the dividing set at a single point of its interior. Thus an attaching arc intersects the dividing set in precisely three points.

We’ll also consider arcs which satisfy only (ii) and (iii). That is, they intersect the dividing set in the same pattern, but they might not run along the characteristic foliation. We’ll call these combinatorial attaching arcs.

We’ll let \(\gamma\) be an attaching arc on the convex surface \(S\) in the contact manifold \(S \times [-1,1]\) with vertically invariant contact structure \(\xi\), and we’ll let \(\delta\) be a combinatorial attaching arc. Moreover, we’ll suppose that there is an isotopy of combinatorial attaching arcs from \(\gamma\) to \(\delta\). In other words, you can slide \(\gamma\) to \(\delta\) along \(S\) through combinatorial attaching arcs.

PROPOSITION: \(\xi\) is ambient isotopic to a contact structure \(\eta\) on \(S \times [-1,1]\) for which \(\delta \times \{1\}\) is an attaching arc.

Obviously \(\delta \times \{1\}\) is a combinatorial attaching arc: the point is that we can adjust the contact structure to make it a bona fide attaching arc, running along the characteristic foliation.

PROOF: The isotopy of arcs from \(\gamma\) to \(\delta\) extends to an isotopy of diffeomorphisms of \(S\), which preserves the dividing set \(\Gamma\). In other words, we can obtain a family of diffeomorphisms \(\phi_t : S \rightarrow S\), varying smoothly in \(t\), with \(\phi_0\) being the identity, \(\phi_1 (\gamma) = \delta\), and for all \(t\), \(\phi_t (\Gamma) = \Gamma\).

(This \(\phi_t\) can be constructed by first extending the isotopy from \(\gamma\) to \(\delta\), to an isotopy from \(\gamma \cup \Gamma\) to \(\delta \cup \Gamma\). Then the remaining pieces of the surface can be carried to each other.)

The construction in the previous proposition then provides a diffeomorphism \(\Phi : S \times [-1,1] \rightarrow S \times [-1,1]\), isotopic to the identity via a diffeotopy \(\Phi_s\), and sending \(\xi\) to an ambient isotopic structure \(\eta\). Now \(\Phi(x,1) = \Phi_1 (x,1) = (\phi_1 (x),1)\), so \(\Phi\) acts on \(S_1 = S \times \{1\}\) via \(\phi_1\). Since \(\Phi\) takes \(\xi\) to \(\eta\), \(\Phi\) sends the characteristic foliation of \(\xi\) to the characteristic foliation of \(\eta\). Since \(\phi_1 (\gamma) = \delta\), this means that \(\delta\) runs along the characteristic foliation of \(\eta\). So \(\delta \times \{1\}\) is a bona fide attaching arc in the contact structure \(\eta\). QED

This is a form of flexibility of contact structures. We can slide an attaching arc around, and still realise it as an attaching arc, via an ambient isotopy of the contact structure.

It’s known, moreover, that once you give the dividing set, the contact structure nearby is determined, in a certain sense. So what this proposition tells us is that the dividing set and attaching arcs can be considered purely combinatorially, or topologically, on the surface, rather than having to worry too much about characteristic foliations and the detailed geometry of contact planes!

Convex surfaces and tomography

(This article is the fourth in a series on Liouville and contact geometry. The first was on Liouville (exact symplectic) geometry on surfaces. The second went from them to convex surfaces in 3-dimensional contact geometry. The third went back, from convex surfaces in 3-dimensional contact geometry, to 2-dimensional Liouville geometry, and showed how convex surfaces can be regarded as two Liouville structures, pieced together along a dividing set.)

We’ve seen that there are excellent things called convex surfaces in 3-dimensional contact geometry, closely related to Liouville geometry. Indeed, on convex surfaces we have wonderful foliations. So when you slice a contact 3-manifold along a convex surface, you get wonderful foliations. We’re now going to consider the relationship between these foliations on surfaces, and contact structures.

Recall a convex surface is a surface \(S\) in a 3-dimensional contact manifold \((M, \xi)\) which has a transverse contact vector field \(X\). We’ve seen how you can use the vector field \(X\) to define a transverse coordinate \(t\) and hence describe a neighbourhood of \(S\) as \(S \times [-1,1]\), where \(S = S \times \{0\}\), with \(t\) giving the latter coordinate. With respect to these coordinates, \(\xi\) has a contact form \(\alpha\) of the form
\[
\alpha = \beta + u \; dt,
\]
where \(\beta\) is a 1-form on \(S\), and \(u\) is a real-valued function on \(S\). (We’re assuming everything is smooth here.)

The neighbourhood \(S \times [-1,1]\) of \(S\) comes as a family of surfaces \(S_t = S \times \{t\}\), with \(S = S_0\). The contact vector field \(X\) flows \(S_0\) to each \(S_t\), and hence each surface \(S_t\) in this family looks the same with respect to contact geometry. The contact planes are preserved by the flow of \(X\).

Thus, on each surface \(S_t\), we have the same characteristic foliation — where by “same”, we mean the surfaces and their foliations are related by the flow of \(X\). When you map one surface \(S_t\) in this family to another by a flow of \(X\), you’ll also map the characteristic foliation on one surface to the characteristic foliation on the other. Thinking of \(X\) as being “vertical” and \(S\) as “horizontal”, it means that the contact structure is “vertically invariant” — it doesn’t change as you move in the vertical direction, from one surface \(S_t\) of the family to another.

This is a really nice structure to have. Rather than having to think about all the surfaces \(S_t\) near \(S\), you really only have to think about one, because they are all the “same”, in this sense.

The most amazing thing about convex surfaces is Giroux’s proof that almost any embedded surface in a contact 3-manifold is convex. So, for “almost any” embedded surface \(S\), there is a transverse contact vector field, and then you know that you can take \(S\) as part of a family of surfaces \(S_t\) foliating a neighbourhood of \(S\), which are “all the same” in the above sense, and hence you only need to think about one of these surfaces!

What does “almost any” surface mean here? Giroux describes it in terms of a property of the characteristic foliation: if the characteristic foliation is “almost Morse-Smale”, then the surface is convex. And “almost Morse-Smale” is a generic property. In particular, given any embedded surface, after a \(C^\infty\) small perturbation, the surface becomes convex.

(For surfaces with boundary, we often want to preserve a little more structure, and so some further details are required. But we will not pursue them here.)

However, let’s now imagine we took a surface \(S\), and we didn’t know about any transverse contact vector field — i.e. we didn’t know if \(S\) was convex or not. Then, we could still take a neighbourhood of \(S\), of the form \(S \times [-1,1]\), and define \(t\) as the coordinate on the latter factor. Then we would again obtain a family of surfaces \(S_t = S \times \{t\}\) foliating a neighbourhood of \(S\), with \(S = S_0\). But now the surfaces \(S_t\) could all have different characteristic foliations \(\mathcal{F}_t\).

It would be a mess! That is why knowing \(S\) is convex, or equivalently, having a transverse contact vector field, really helps.

Thinking of \(t\) as time, you can think of the family of foliations \(\mathcal{F}_t\) as a “movie” of foliations. Or, since you are probing the contact structure by considering how it cuts the slices \(S_t\), you can think of it as a form of “tomography” — and this is what Giroux calls it.

In a 2000 paper, Giroux studied these sorts of “movies” or “tomography”. The paper is called “Structures de contact en dimension trois et bifurcations des feuilletages de surfaces”, which translates as “Contact structures in dimension three and bifurcations of foliations of surfaces”. The French word “feuilletage” is much nicer than the English word “foliation”.

Given a surface \(S\) in a contact manifold \((M, \xi)\) with a product neighbourhood \(S \times [-1,1]\), you obtain a movie of foliations \(\mathcal{F}_t\) on the surfaces \(S_t\). If \(S\) is convex then, by choosing the product neighbourhood right (as above), all \(\mathcal{F}_t\) are the “same” (as discussed above) — the “movie” is just one frame, repeated!

But if all you have are the foliations \(\mathcal{F}_t\), you can’t say much at all. In fact, given a family of foliations \(\mathcal{F}_t\) on \(S_t\), it’s not even clear that they come from a contact structure at all!

So a first question is: which families of foliations arise from contact structures? Equivalently: what movies of foliations describe contact structures? Or: what tomography can you get from slicing a contact structure? This question is not easy.

Giroux however gives some answers. I want to consider one of his results, which he calls the “realisation lemma”.

Here is one possible line of reasoning. We have seen that certain characteristic foliations arise from convex surfaces. A characteristic foliation \(\mathcal{F}\) on a convex surface has a dividing set \(\Gamma\), splitting \(S\) into two pieces \(S_+\) and \(S_-\), and on each piece \(\mathcal{F}\) can be directed by a vector field which expands an area form. We’ll say that such a foliation divides \(S\). It would be very pleasing to see this lovely Liouville geometry on each slice. The nicest case would be if we had the same Liouville geometry on each slice.

Does it follow, in these circumstances, that the foliations come from a contact structure?

In other words, suppose we have \(S \times [-1,1]\), and let \(\mathcal{F}\) be a foliation which divides \(S\). Suppose we have the foliation \(\mathcal{F}_t = \mathcal{F}\) on each slice \(S_t = S \times \{t\}\). Is this the movie of foliations of a contact structure on \(S \times [-1,1]\)?

Giroux proved that the answer is yes. (This is proposition I.3.4 of his 1991 paper “Convexité en topologie de contact”.) In fact we have already seen the basic idea of the proof. Since \(S_+\) and \(S_-\) have Liouville structures, we may take a 1-form \(\beta\) on \(S_+ \cup S_-\), such that \(d\beta\) is an area form on \(S_+ \cup S_-\), and the dual vector field \(X\) with respect to \(d\beta\) directs \(\mathcal{F}\) on \(S_+ \cup S_-\). As we saw previously, \(\beta\) being Liouville is equivalent to \(\beta + dt\) being a contact form on \((S_+ \cup S_-) \times [-1,1]\). So we have a contact structure on \((S_+ \cup S_-) \times [-1,1]\). It remains to extend this over \(\Gamma \times [-1,1]\). And Giroux shows that this can be done. He shows that you can patch them together, to obtain a contact form on \(S \times [-1,1]\) which takes the form \(\beta + u \; dt\), where \(\beta\) is a 1-form on \(S\), and \(u\) is a real-valued function on \(S\).

Indeed, having a contact form of this type, the contact structure obtained on \(S \times [-1,1]\) is not just any old contact structure: it’s invariant in the \([-1,1]\) direction. It’s “vertically invariant”, and so we have a contact vector field in the \([-1,1]\) direction. This direction is of course transverse to \(S\), and so \(S\) is convex.

Great. So a “constant” movie of foliations, where the foliation \(\mathcal{F}\) divides each slice (i.e. has a dividing set which cuts each slice into pieces on which there is a Liouville structure) is always the movie of a contact structure — and indeed a vertically invariant contact structure exhibiting \(S\) as convex.

But let’s suppose we have a slightly worse situation. Suppose we have different foliations \(\mathcal{F}_t\) appearing on the slices \(S_t\), but each individual foliation \(\mathcal{F}_t\) still divides \(S_t\). In other words, each foliation \(\mathcal{F}_t\) has a dividing set \(\Gamma\) which splits \(S\) into an \(S_+\) and \(S_-\), and \(\mathcal{F}_t\) can be directed by a vector field which expands an area form on \(S_+\) and \(S_-\). Here \(\Gamma\), \(S_+\) and \(S_-\) might all vary with \(t\), so we should really write something like \(\Gamma_t\), \(S_{t,+}\) and \(S_{t,-}\) to indicate the dependence on \(t\).

In these circumstances, are the foliations \(\mathcal{F}_t\) the movie of a contact structure?

It’s not quite so clear. When you have a foliation which can vary with \(t\), the contact condition becomes more complicated.

When you just have a single foliation, with a dividing set and Liouville structures on either side, then you get a contact form of the type \(\alpha = \beta + u \; dt\), where \(\beta\) is a 1-form and \(u\) a real-valued function on \(S\). The condition for a 1-form to be a contact form is that \(\alpha \wedge d\alpha\) be a volume form, i.e. a non-degenerate 3-form. When \(\alpha = \beta + u \; dt\) we have
\[
\alpha \wedge d\alpha
= (\beta + u \; dt) \wedge (d\beta + du \wedge dt)
= (u \; d\beta + \beta \wedge du ) \wedge dt .
\]
So given that \(\beta\) and \(u\) are purely on S, i.e. have no \(t\)-dependence, the condition for \(\alpha = \beta + u \; dt\) to be a contact form is precisely that \(u \; d\beta + \beta \wedge du\) be an area form on \(S\).
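If you want to check this algebra, here’s a quick SymPy sketch. It’s my own check, not anything from Giroux: it uses the classical identity that for \(\alpha = f \; dx + g \; dy + u \; dt\), the coefficient of \(\alpha \wedge d\alpha\) against \(dx \wedge dy \wedge dt\) is \(a \cdot (\nabla \times a)\), where \(a = (f,g,u)\) and the curl is taken in the coordinates \((x,y,t)\).

```python
# Sketch (my own check, not from the post): write alpha = f dx + g dy + u dt,
# with f, g, u functions of x, y only (no t-dependence), as the "vector field"
# a = (f, g, u).  Then alpha ^ d(alpha) = (a . curl a) dx ^ dy ^ dt, and we
# check this matches the coefficient of the area form u d(beta) + beta ^ du.
import sympy as sp

x, y, t = sp.symbols('x y t')
f, g, u = [sp.Function(n)(x, y) for n in 'fgu']   # t-independent

# a . curl a, with the curl taken in the coordinates (x, y, t)
curl = sp.Matrix([sp.diff(u, y) - sp.diff(g, t),
                  sp.diff(f, t) - sp.diff(u, x),
                  sp.diff(g, x) - sp.diff(f, y)])
contact_coeff = sp.Matrix([f, g, u]).dot(curl)

# u d(beta) + beta ^ du, as a multiple of dx ^ dy on S
area_coeff = (u * (sp.diff(g, x) - sp.diff(f, y))
              + f * sp.diff(u, y) - g * sp.diff(u, x))

assert sp.simplify(contact_coeff - area_coeff) == 0
```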

But when you have a family of foliations, even if they each divide \(S\), you would be looking for a contact form of the type \(\alpha = \beta + u \; dt\) again — but now \(\beta\) and \(u\) can depend on \(t\). Perhaps it’s better to write, as Giroux does, \(\beta_t\) and \(u_t\), to indicate the dependence on \(t\). You can think of \(\beta_t\) as a 1-form on \(S_t\), and \(u_t\) as a real-valued function on \(S_t\). The contact condition then becomes more complicated, because then \(d\beta = d\beta_t + dt \wedge \frac{\partial \beta_t}{\partial t}\). Here we write \(d\beta_t\) for the 2-form on \(S_t\) which arises by taking the differential of a 1-form on \(S_t\); but \(\beta\) also has a \(t\)-dependence, and so we also obtain a derivative with respect to \(t\). Let us write \(\dot{\beta}_t\) for \(\frac{\partial \beta_t}{\partial t}\). Then the contact condition is
\[
\alpha \wedge d\alpha
= \left( u_t \; d\beta_t + \beta_t \wedge ( du_t - \dot{\beta}_t ) \right) \wedge dt .
\]
So in this more general case, with \(\beta\) and \(u\) depending on \(t\), the condition for \(\alpha\) to be a contact form is that \(u_t \; d\beta_t + \beta_t \wedge (du_t - \dot{\beta}_t )\) be an area form on \(S\).

So the answer to the question may not be clear. When \(\alpha\) takes the form \(\beta_t + u_t \; dt\), even if each surface \(S_t\) has a dividing set, with Liouville structures on either side, this only means that on each slice we have the first condition above, that \(u_t \; d\beta_t + \beta_t \wedge du_t\) is an area form on \(S_t\). To show that we have a contact form, we need to show the latter condition, that \(u_t \; d\beta_t + \beta_t \wedge (du_t - \dot{\beta}_t)\) is an area form on \(S_t\). The term with a \(\dot{\beta}_t\), taking a derivative in the \(t\)-direction, makes a difference.
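Here is a hedged SymPy sketch of my own verifying this more general condition, using the identity that for \(\alpha = f \; dx + g \; dy + u \; dt\), the coefficient of \(\alpha \wedge d\alpha\) against \(dx \wedge dy \wedge dt\) is \(a \cdot (\nabla \times a)\) with \(a = (f,g,u)\). This time \(f, g, u\) depend on \(t\) as well, and the \(\dot{\beta}_t\) terms appear automatically.

```python
# Sketch (mine, hedged): now f, g, u depend on x, y and t, so that
# alpha = beta_t + u_t dt with beta_t = f dx + g dy.  The identity
# alpha ^ d(alpha) = (a . curl a) dx ^ dy ^ dt recovers Giroux's condition,
# including the -beta_t ^ beta_t_dot correction.
import sympy as sp

x, y, t = sp.symbols('x y t')
f, g, u = [sp.Function(n)(x, y, t) for n in 'fgu']

curl = sp.Matrix([sp.diff(u, y) - sp.diff(g, t),
                  sp.diff(f, t) - sp.diff(u, x),
                  sp.diff(g, x) - sp.diff(f, y)])
contact_coeff = sp.Matrix([f, g, u]).dot(curl)

# u_t d(beta_t) + beta_t ^ (du_t - beta_t_dot), as a multiple of dx ^ dy,
# where d(beta_t) and du_t are taken on the slice (x and y only)
condition = (u * (sp.diff(g, x) - sp.diff(f, y))
             + f * (sp.diff(u, y) - sp.diff(g, t))
             - g * (sp.diff(u, x) - sp.diff(f, t)))

assert sp.simplify(contact_coeff - condition) == 0
```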

But in any case, Giroux shows the answer is yes. This is his “realisation lemma”, which is lemma 2.4 of the 2000 paper. And the proof is not too difficult. Let’s state the result and prove it.

REALISATION LEMMA (Giroux): Let \(\beta_t\) be a family of 1-forms on \(S\), and \(v_t\) a family of functions \(S \rightarrow \mathbb{R}\), such that for all \(t\),
\[
v_t \; d\beta_t + \beta_t \wedge dv_t
\]
is an area form on \(S\). Then there exists a contact structure on \(S \times [-1,1]\) with a contact form of the form \(\beta_t + u_t \; dt\), where each \(u_t\) is a real-valued function on \(S\).

In other words, if each \(S_t\) is divided by the foliation \(\mathcal{F}_t\), then there exists a contact structure on \(S \times [-1,1]\) with \(\mathcal{F}_t\) as its movie of characteristic foliations.

PROOF:
Choose an orientation on \(S\) which agrees with \(v_t \; d\beta_t + \beta_t \wedge dv_t\), so that we can write
\[
v_t \; d\beta_t + \beta_t \wedge dv_t > 0.
\]
Now we need to find functions \(u_t\) such that
\[
u_t \; d\beta_t + \beta_t \wedge ( du_t - \dot{\beta}_t ) > 0.
\]
Clearly, the only difference between these two inequalities is the term \(\beta_t \wedge \dot{\beta}_t\). But the \(\beta_t\) are fixed — it’s the \(u_t\) we get to choose. And \(S \times [-1,1]\) is a compact set. So \(\beta_t \wedge \dot{\beta}_t\) only gets so large.

Similarly, by compactness, \(v_t \; d\beta_t + \beta_t \wedge dv_t\), as a positive area form, only gets so small. If we multiply \(v_t\) by a large constant \(K\), then \(v_t \; d\beta_t + \beta_t \wedge dv_t\) also gets multiplied by \(K\) — and thus can be made arbitrarily large everywhere. Indeed, we can make it so large that it overwhelms the term \(\beta_t \wedge \dot{\beta}_t\).

And that is what we do. Let \(u_t = K v_t\) for sufficiently large \(K>0\). This is all we need to do. QED.

(This proof assumes the \(v_t\) vary smoothly in \(t\). But even if the \(v_t\) don’t vary smoothly, or even continuously, in \(t\), one can use a partition of unity to construct the desired \(u_t\).)
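The scaling trick can be seen concretely in a toy example. Everything here is made up by me for illustration (it is not Giroux’s example): on the square \(|x|, |y| \le 1\), with \(t \in [-1,1]\), take \(\beta_t = -\frac{t}{2} y \; dx + x \; dy\) and \(v_t = 1\). Then the contact coefficient for \(u_t = K v_t\) works out to \(K(1 + t/2) - xy/2\), which is positive everywhere once \(K\) is big enough.

```python
# Toy instance of the scaling argument (my own example, not Giroux's):
# beta_t = -(t/2) y dx + x dy and v_t = 1 on the square |x|, |y| <= 1,
# with t in [-1, 1].  Setting u_t = K v_t, the contact coefficient
# u d(beta) + beta ^ (du - beta_dot) works out to K (1 + t/2) - x y / 2.
import sympy as sp

x, y, t, K = sp.symbols('x y t K')
f, g = -(t / 2) * y, x                    # beta_t = f dx + g dy
u = K * 1                                 # u_t = K v_t, with v_t = 1

dbeta = sp.diff(g, x) - sp.diff(f, y)     # d(beta_t) as a multiple of dx ^ dy
contact = (u * dbeta
           + f * (sp.diff(u, y) - sp.diff(g, t))
           - g * (sp.diff(u, x) - sp.diff(f, t)))

# The coefficient is affine in each of x, y, t separately, so its extrema
# over the cube occur at the corners.
corners = [{x: a, y: b, t: c}
           for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]
min_at_K2 = min(contact.subs(K, 2).subs(pt) for pt in corners)
assert min_at_K2 > 0                      # K = 2 already works here
```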

Liouville structures and convex surfaces

(This post is the third in a series on geometry. (A geometric series, har har har.) They all assume you know about differential forms and such things. The first was on Liouville geometry, also known as exact symplectic geometry, on surfaces. The second went from them to contact geometry. So I’m assuming you know what those are.)

We’ve seen that if you take a Liouville 1-form \(\beta\) on a surface \(S\) (i.e. such that \(d\beta\) is nondegenerate, hence a symplectic form), then the 1-form \(\alpha = \beta + dt\) on the 3-manifold \(M = S \times [0,1]\) obtained by thickening \(S\) is a contact form. (Here \(t\) is the coordinate on \([0,1]\).)

Moreover, we’ve seen that on each slice \(S \times \{t\}\) of this thickening, the characteristic foliation (i.e. the pattern of how the slice intersects the contact planes) \(\mathcal{F}\) coincides with \(\ker \beta\).

We’ve also noted that this contact form \(\alpha\) is a vertically invariant contact form on \(M\): it has no dependence on \(t\). Indeed, the flow of the vertical vector field \(\partial_t\) preserves \(\alpha\), and hence is a contact vector field. Thus each slice \(S \times \{t\}\) is transverse to a contact vector field, and hence is a convex surface.

Thus, starting from the simple but elegant structure of a Liouville 1-form on a surface, we have been led to 3-dimensional contact geometry, and convex surfaces.

What we’re going to do now is go in the other direction, and start from a convex surface.

We’re going to make a clear distinction now between a contact structure and a contact form. A contact form is a 1-form \(\alpha\) such that \(\alpha \wedge d\alpha\) is non-degenerate, i.e. so that \(\ker \alpha\) is a non-integrable plane field. A contact structure \(\xi\) is a non-integrable plane field. So any contact form \(\alpha\) defines a contact structure \(\xi\) by \(\xi = \ker \alpha\), but a contact structure \(\xi\) has many 1-forms defining it (at least locally). Given any contact form \(\alpha\) such that \(\ker \alpha = \xi\), we can multiply \(\alpha\) by any smooth nonzero real-valued function \(f\), and \(f\alpha\) is then another contact 1-form, with \(\ker(f\alpha) = \ker \alpha = \xi\).

Well, let’s return to the definition of a convex surface: it’s an embedded surface \(S\) in a contact 3-manifold for which there is a contact vector field \(X\) transverse to \(S\). Said tersely, a convex surface is a surface with a transverse contact vector field.

Now, given a convex surface, we can introduce coordinates as we please. Let us define a coordinate \(t\) by the transverse vector field \(X\). So let \(X = \partial_t\). We can then let \(t=0\) on the surface \(S\), and flowing along \(X = \partial_t\), we obtain a coordinate \(t\) which measures how far from \(S\) we have flowed along \(X\). Using this coordinate, we can describe a neighbourhood of \(S\) as \(S \times [-\varepsilon, \varepsilon]\), for some sufficiently small \(\varepsilon\), where \(S\) appears as \(S \times \{0\}\) and the coordinate on the \([-\varepsilon, \varepsilon]\) factor is precisely \(t\). For simplicity, we can take \(\varepsilon = 1\); by slowing down the vector field \(X\) we can in fact fit this \(S \times [-1,1]\) inside the previous \(S \times [-\varepsilon, \varepsilon]\).

So now we have a neighbourhood of \(S\) given as \(S \times [-1,1]\), and the transverse contact vector field is \(X = \partial_t\).

If we further denote by \(x,y\) some local coordinates on \(S\), then \(x,y,t\) form some local coordinates on \(S \times [-1,1]\). So the contact form \(\alpha\) (or indeed any 1-form) can be written in the form
\[ \alpha = f \; dx + g \; dy + u \; dt, \]
where \(f,g,u\) are real-valued functions on \(S \times [-1,1]\). Now the functions \(f,g,u : S \times [-1,1] \rightarrow \mathbb{R}\) might in general depend on \(x,y,t\). But as \(X = \partial_t\) is a contact vector field, the contact planes given by \(\ker \alpha\) don’t depend on the \(t\) coordinate at all. And hence we can take the contact form \(\alpha\) not to depend on \(t\) either. (Possibly \(\alpha\) might depend on \(t\), since multiplying \(\alpha\) by any nonzero real-valued function produces a 1-form with the same kernel; but for such an \(\alpha\), we can “normalise” it, multiplying by a nonzero function, to make it independent of \(t\). Or indeed replacing \(f(x,y,t), g(x,y,t), u(x,y,t)\) with \(f(x,y,0), g(x,y,0), u(x,y,0)\) would have the same effect.)

In other words, since \(S\) is a convex surface, there is a contact form \(\alpha\) where \(f,g,u\) only depend on \(x,y\), and not \(t\). We can write
\[ \alpha = f(x,y) \; dx + g(x,y) \; dy + u(x,y) \; dt. \]
Written in this way, the first two terms \(f(x,y) \; dx + g(x,y) \; dy\) denote a 1-form purely on the surface \(S\). Indeed, any 1-form on \(S\) can be written this way. So let’s call it \(\beta\). In a similar way, \(u(x,y)\) can be regarded as a function purely on the surface \(S\). We then have
\[ \alpha = \beta + u \; dt \]
where \(\beta\) is a 1-form on \(S\), and \(u\) is a real-valued function on \(S\).

When \(u \neq 0\), we can do even better. We can then divide the whole 1-form \(\alpha\) by \(u\) — and remember, multiplying the contact form by a nonzero function results in another contact form defining the same contact structure. So this allows us effectively to assume that \(u=1\), and that \(\alpha\) is of the form \(\beta + dt\), where again \(\beta\) is a 1-form on \(S\).

Now if \(\alpha = \beta + dt\) is a contact form, then it must satisfy the contact condition of being non-integrable, i.e. \(\alpha \wedge d\alpha\) must be non-degenerate. Not every possible 1-form \(\beta\) on \(S\) will make \(\beta + dt\) a contact form. Which possible 1-forms \(\beta\) make a contact form? We can compute \(\alpha \wedge d\alpha\) to find out:
\[ \alpha \wedge d\alpha = (\beta + dt) \wedge d\beta = \beta \wedge d\beta + dt \wedge d\beta = dt \wedge d\beta. \]
This is a contact form if and only if \(d\beta\) is a non-degenerate 2-form on \(S\) — that is, if \(\beta\) is a Liouville 1-form.
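Here’s a small SymPy check of this computation (my own sketch, not from the post), writing \(\beta = f \; dx + g \; dy\) with \(f,g\) functions of \(x,y\) only, and using the identity that \(\alpha \wedge d\alpha = (a \cdot \nabla \times a) \; dx \wedge dy \wedge dt\) for \(a = (f,g,1)\):

```python
# Check (mine): for alpha = beta + dt with beta = f dx + g dy, f, g functions
# of x, y only, the coefficient of alpha ^ d(alpha) against dx ^ dy ^ dt is
# exactly that of d(beta) = (g_x - f_y) dx ^ dy.  So alpha is a contact form
# precisely when d(beta) is non-degenerate, i.e. beta is Liouville.
import sympy as sp

x, y, t = sp.symbols('x y t')
f, g = [sp.Function(n)(x, y) for n in 'fg']

a = sp.Matrix([f, g, 1])                  # alpha = f dx + g dy + 1 dt
curl = sp.Matrix([sp.diff(a[2], y) - sp.diff(a[1], t),
                  sp.diff(a[0], t) - sp.diff(a[2], x),
                  sp.diff(a[1], x) - sp.diff(a[0], y)])
contact_coeff = sp.simplify(a.dot(curl))

dbeta_coeff = sp.diff(g, x) - sp.diff(f, y)
assert sp.simplify(contact_coeff - dbeta_coeff) == 0
```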

Thus we have proved the following.

PROPOSITION. Let \(S\) be a convex surface in a 3-manifold with a contact structure \(\xi\). Defining a transverse coordinate \(t\) via the transverse contact vector field, \(S\) has a neighbourhood on which \(\xi\) has a contact form \(\beta + u \; dt\), where \(\beta\) is a 1-form and \(u\) is a real-valued function on \(S\).

If further \(u\) is nowhere zero, then \(S\) has a neighbourhood on which \(\xi\) has a contact form \(\beta + dt\), where \(\beta\) is a Liouville 1-form on \(S\).

In our previous episode, starting from Liouville structures on surfaces, we were led to convex surfaces in contact 3-manifolds. And now, we have gone back, from convex surfaces to Liouville structures.

Now we know that not every surface has a Liouville structure: we saw previously that there can’t be one if \(S\) is compact without boundary. And so a convex surface also can’t have a local contact form \(\beta + dt\) if \(S\) is compact without boundary.

But, amazingly enough, Giroux proved that, in a certain sense, almost any embedded surface in a contact manifold is convex — including almost any embedded compact surface without boundary.

Such a convex surface \(S\), compact without boundary, has a local contact form of the type \(\beta + u \; dt\), as we’ve discussed. And remember we said that if \(u\) is nowhere zero, then we could divide out by \(u\) and obtain a local contact form of the type \(\beta + dt\). But we’ve just seen that such a surface can’t have a local contact form of the type \(\beta + dt\). Hence for any convex surface \(S\), compact without boundary, the contact form \(\beta + u \; dt\) must have some zeroes of \(u\).

And as it turns out, the zeroes of \(u\) are very interesting and important.

What happens at the zeroes of \(u\)? They are precisely where the contact planes are vertical, i.e. where \(\partial_t\), or the contact vector field \(X\), lies in \(\xi = \ker \alpha\). Indeed,
\[ \alpha(X) = \alpha(\partial_t) = \beta(\partial_t) + u dt (\partial_t) = u. \]
Here we used the fact that \(\beta(\partial_t) = 0\), since \(\beta\) is a 1-form on \(S\), with no \(dt\) component. So \(\alpha(X) = 0\) precisely when \(u=0\).

The set of points where \(u=0\) is called the dividing set (or découpage, in the original French). It turns out that it’s a curve on \(S\) and it splits \(S\) into pieces where \(u>0\) and where \(u<0\). (This was proved by Giroux.)

Note that when \(u>0\), we have \(\alpha(X)>0\); and when \(u<0\), we have \(\alpha(X)<0\). Suppose we paint one side of the contact planes white, and the other side black. We think of the black side as “positive”, and the white side as “negative”, in the following sense. Given any vector \(V\), we will have \(\alpha(V) > 0\) when \(V\) points out of the black side, \(\alpha(V) = 0\) when \(V\) points along the plane, and \(\alpha(V) < 0\) when \(V\) points out of the white side.

Thus, the contact planes are white side up when \(u<0\); they become vertical along the dividing set \(u=0\), and flip over to be black side up when \(u>0\).

A convex disc. The dividing set is drawn in red, as is conventional.

The standard notation is that the dividing set (i.e. where \(u=0\)) is denoted \(\Gamma\); the region of \(S\) where \(u>0\) is denoted \(R_+\); and the region of \(S\) where \(u<0\) is denoted \(R_-\).

The best thing is that, if you just consider the subset of \(S \times [-1,1]\) where \(u>0\) (say), i.e. \(R_+ \times [-1,1]\), you can divide \(\alpha\) out by \(u\), and obtain a contact form of the type \(\beta + dt\), where \(\beta\) is Liouville. So the characteristic foliation on \(R_+\) is a Liouville foliation, and there is a flow tangent to it which exponentially expands an area form. The same applies to \(R_-\).

So in fact a convex surface can be regarded as made up of two Liouville structures pieced together along a dividing set, where the contact planes flip over.

Another, different, convex disc.

From Liouville geometry to contact geometry

(This post is a continuation of my previous one on Liouville structures, and hence is quite technical. It’s aimed at students of geometry, not the general public. It assumes you know some differential geometry. If you know what a Liouville structure is, read on.)

We’re going to take Liouville structures and move them into 3 dimensions, to obtain contact structures.

As we’ve seen, a Liouville structure on a surface \(S\) (which necessarily has boundary, or is non-compact) is given by a 1-form \(\beta\) such that \(d\beta\) is non-degenerate. Then \(d\beta\) is a symplectic form on \(S\), and \(\beta\) is dual to a vector field \(X\) via the symplectic form, i.e. \(\iota_X d\beta = \beta\). This structure has the nice property that \(X\) points along \(\ker \beta\), and the flow of \(X\) expands \(\beta\) and \(d\beta\) exponentially. In equations, \(\beta(X) = 0\), \(L_X \beta = \beta\), and \(L_X d\beta = d\beta\).

Let’s now go into the next dimension and consider a 3-dimensional space (or 3-manifold) \(M = S \times [0,1]\). We can use \(x,y\) as coordinates on \(S\), and \(t\) as a coordinate on \([0,1]\). So this is a thickening of \(S\); you can think of \(S\) as horizontal, and \([0,1]\) as vertical, with the coordinate \(t \in [0,1]\) measuring height. (However \(M\) certainly has boundary; and if \(S\) is non-compact, then the same is true for \(M\).)

On this 3-manifold \(M\), let’s consider a 1-form. The 1-form \(\beta\) is no more interesting here than it is on \(S\), but let’s add to it a form using the third dimension. We’ll just add to \(\beta\) the simplest possible such form, \(dt\).

So define a 1-form \(\alpha\) on \(M\) by \(\alpha = \beta + dt\).

Previously, we considered the example of a Liouville structure on \(S = \mathbb{R}^2\), given by \(\beta = x \; dy\). We will continue this running example. We obtain \(M = \mathbb{R}^2 \times [0,1]\), with the 1-form \(\alpha = x \; dy + dt\).

From the prequel.

It’s harder to draw pictures of 1-forms in 3 dimensions, but it is possible! Just as in the 2-dimensional case, we can draw the kernel of the 1-form. But whereas the kernel of a (nowhere zero) 1-form on a surface is a line field, the kernel of a (nowhere zero) 1-form on a 3-manifold is a plane field.

Now it’s not difficult to see that, in our example, no matter where you are, the \(x\)-direction always lies in \(\ker \alpha\). For \(\alpha\) only contains a \(dy\) and a \(dt\) term; if you feed it a vector in the \(x\)-direction, you get zero. But to get a second linearly independent vector in the kernel, you need to take a more carefully chosen combination of the \(y\) and \(t\) directions. We can check that the combination \(\partial_y - x \partial_t\) lies in the kernel; here \(\partial_y, \partial_t\) denote unit vectors in the \(y\) and \(t\) directions.
\[ \alpha(\partial_y - x \partial_t) = (x \; dy + dt)(\partial_y - x \partial_t) = x \; dy(\partial_y) - dt(x \partial_t) = x - x = 0. \]
Thus, at each point \((x,y,t)\) in \(M\), \(\ker \alpha\) is spanned by \(\partial_x\) and \(\partial_y - x \partial_t\). We can write
\[ \ker \alpha = \langle \partial_x, \; \partial_y - x \partial_t \rangle. \]
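As a sanity check, here’s a tiny SymPy computation (mine, not from the post) pairing \(\alpha\) with the two spanning vectors:

```python
# Quick check (my own) that the claimed spanning vectors really lie in
# ker(alpha) for alpha = x dy + dt: represent alpha by its coefficients
# against (dx, dy, dt), and vectors by components against (d/dx, d/dy, d/dt).
import sympy as sp

x, y, t = sp.symbols('x y t')
alpha = sp.Matrix([0, x, 1])      # alpha = 0 dx + x dy + 1 dt

v1 = sp.Matrix([1, 0, 0])         # partial_x
v2 = sp.Matrix([0, 1, -x])        # partial_y - x partial_t

assert alpha.dot(v1) == 0
assert sp.simplify(alpha.dot(v2)) == 0
```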

Let’s try to visualise \(\ker \alpha\) geometrically. It’s a plane field: there is a plane at each point in \(\mathbb{R}^2 \times [0,1]\). The plane at \((x,y,t)\) is spanned by \(\partial_x\) and \(\partial_y - x \partial_t\). When \(x=0\), the plane is spanned by \(\partial_x\) and \(\partial_y\): it’s horizontal. As \(x\) varies, \(\partial_x\) always points along the plane, and so the planes “spin” around each line in the \(x\)-direction. As \(x\) increases from \(0\), the plane tilts: rather than \(\partial_y\), the plane contains \(\partial_y - x \partial_t\), and this negative \(\partial_t\) component becomes larger as \(x\) increases; as \(x\) becomes large, the plane becomes close to vertical. And similarly as \(x\) decreases from \(0\): the plane gains a positive \(\partial_t\) component.

This is in fact quite a standard geometric object, at least if you’re into the field of contact geometry. This plane field given by the kernel of \(x \; dy + dt\) is also known (possibly after some relabellings of variables, and some sign changes) as the standard contact structure on \(\mathbb{R}^3\).

The standard contact structure on \(\mathbb{R}^3\) (with slightly different coordinates: replace \(x,y,z\) here with \(y,x,t\)). Public domain, Wikipedia.

This plane field has some great properties. Try to find a surface which runs tangent to it! That is, try to find a surface in \(\mathbb{R}^2 \times [0,1]\), which at every point is tangent to this plane field. Even if you try to find a tiny surface, you will not succeed — because this plane field has a property called non-integrability.

Frobenius’ theorem in differential geometry tells you when a plane field is integrable (i.e. tangent to a surface) or not. (In fact, it says this not just about 2-dimensional plane fields in 3 dimensions, but in general about any-dimensional plane fields in any number of dimensions.) There is a vector-field version, and a differential form version.

The vector-field version of Frobenius’ theorem says to take the Lie bracket of two vector fields spanning the plane field: if the Lie bracket points along the planes of the plane field, it’s integrable; otherwise, it is not. In our example we obtain
\[ [ \partial_x, \; \partial_y - x \partial_t ]
= [\partial_x, \; \partial_y] - [\partial_x, \; x \partial_t]
= 0 - \partial_x (x \partial_t) + x \partial_t \partial_x \\
= - \partial_t - x \partial_x \partial_t + x \partial_t \partial_x
= - \partial_t.
\]
Here we repeatedly used the fact that the partial derivatives commute, e.g. \(\partial_x \partial_y = \partial_y \partial_x\) or \([\partial_x, \partial_y] = 0\), and the product rule, in the slightly cryptic form of \(\partial_x x = 1 + x \partial_x\). The result is \(-\partial_t\), which is definitely not a linear combination of \(\partial_x\) and \(\partial_y - x \partial_t\).
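If you’d rather outsource this bookkeeping, here’s a SymPy version of the computation (my own sketch): apply both vector fields to a generic function \(h(x,y,t)\) and subtract.

```python
# Hedged SymPy version (mine) of the bracket computation: apply
# [X1, X2] = X1 X2 - X2 X1 to a generic function h(x, y, t), where
# X1 = d/dx and X2 = d/dy - x d/dt.
import sympy as sp

x, y, t = sp.symbols('x y t')
h = sp.Function('h')(x, y, t)

X1 = lambda F: sp.diff(F, x)
X2 = lambda F: sp.diff(F, y) - x * sp.diff(F, t)

bracket = sp.expand(X1(X2(h)) - X2(X1(h)))
# the bracket acts as -d/dt, confirming [X1, X2] = -partial_t
assert sp.simplify(bracket + sp.diff(h, t)) == 0
```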

The differential form version of Frobenius’ theorem says to take the wedge product \(\alpha \wedge d\alpha\), which will be a 3-form: if it is zero, then the plane field is integrable; otherwise, it is not. In our example we obtain
\[ \alpha \wedge d\alpha = (x \; dy + dt) \wedge (dx \wedge dy) \\
= dt \wedge dx \wedge dy \neq 0, \]
which is the standard Euclidean volume form and is definitely nowhere zero.

So, if you believe Frobenius’ theorem, or at least one or the other of these two variants, then we’ve shown that the plane field is nowhere integrable. This is exactly the definition of a contact structure: a plane field which is nowhere integrable. (This is the definition in 3 dimensions, at least.) So our plane field is a contact structure; in fact, it’s essentially the standard contact structure on \(\mathbb{R}^3\). A 1-form whose kernel is a contact structure is called a contact form. So in this example, the 1-form \(\alpha = \beta + dt = x \; dy + dt\) is a contact form.

Another nice feature comes from considering the surface \(S = S \times \{0\}\), sitting inside \(S \times [0,1]\), and how it intersects the contact planes. At every point of \(S\) there are two planes: the tangent plane to the horizontal surface \(S\), and the contact plane \(\ker \alpha\). How do these planes intersect? Well, we saw that the planes of \(\ker \alpha\) were spanned by \(\partial_x\) and \(\partial_y - x \partial_t\). The first of these actually points along \(S\), while the second does not — except when \(x = 0\), when the planes are tangent to \(S\). So the contact planes intersect \(S\) along the \(x\) direction.

Now the lines in the \(x\) directions along \(S\) have come up in our previous discussion — they are nothing but the kernel of the Liouville form \(\beta\)! So the planes of \(\ker \alpha\) cut \(S\) along the same line as \(\ker \beta\).

Thus, the lines at each point of \(S\), given by looking at how it intersects the contact plane \(\ker \alpha\), form a line field on \(S\); and if you integrate it, you get a foliation. We saw last time that \(\ker \beta\) also forms a line field on \(S\), which also integrates to a foliation. What we’ve found here in our example is that the foliations on \(S \times \{0\}\) given by \(\ker \beta\), and by intersection with \(\ker \alpha\), coincide. (The foliations coincide when \(x \neq 0\). When \(x = 0\), both foliations are singular, so they are in fact equal as singular foliations.)

When you have a contact form \(\alpha\) on a 3-manifold, and a surface \(S\) in that 3-manifold, you can always consider, in a similar way, how the contact planes intersect the tangent plane to \(S\), at each point of \(S\). The result is a (singular) line field on \(S\), which integrates to a (singular) foliation: this is called the characteristic foliation \(\mathcal{F}\) of \(S\). In fancy language, at a point \(p \in S\), the characteristic foliation is given by the intersection of the tangent plane \(T_p S\) to \(S\), with the contact plane \(\ker \alpha_p\) there:
\[ \mathcal{F}_p = T_p S \cap \ker \alpha_p. \]
In simple language, the characteristic foliation of a surface is the pattern of how it intersects the contact planes.

What we’ve found, in our example, is that the foliation \(\ker \beta\) coincides with the characteristic foliation on \(S \times \{0\}\). And in fact the same occurs not just on \(S \times \{0\}\), but on any slice \(S \times \{t\}\).

Let’s now consider our second example from last time: \(\beta = \frac{1}{2} (x \; dy - y \; dx)\) on the surface \(S = \mathbb{R}^2\). This was the “radial” example, where \(X\) and \(\ker \beta\) both pointed directly out along lines from the origin.

See the prequel for further discussion of this picture.

In this example, we obtain on \(M = \mathbb{R}^2 \times [0,1]\) the 1-form \(\alpha = \beta + dt = \frac{1}{2} (x \; dy - y \; dx) + dt\). You can check that along the \(t\)-axis \(x=y=0\), \(\ker \alpha\) is a horizontal plane, spanned by \(\partial_x\) and \(\partial_y\). At other points, the kernel is spanned by \(x \partial_x + y \partial_y\), which points horizontally radially outward from the \(t\)-axis, and \(y \partial_x - x \partial_y + \frac{x^2+y^2}{2} \; \partial_t\), which has an angular component \(y \partial_x - x \partial_y\) (along the \(\theta\) direction in cylindrical coordinates), and a \(\partial_t\) component. As \(x^2 + y^2\) becomes larger, i.e. further from the \(t\)-axis, the \(\partial_t\) component becomes larger and the planes become more vertical. The result is a plane field known as the standard cylindrically symmetric contact structure, because it’s cylindrically symmetric, and it’s a contact structure.


From John Etnyre’s lecture notes on open book decompositions and contact structures. The picture is by Stephan Schonenberger.

It’s perhaps not surprising that if you take something on the plane which is radially symmetric, and then just make a 3-d thing out of it, which doesn’t change in the new (\(t\)) dimension, then you get something which is cylindrically symmetric.

Let’s check that we actually have a contact form, by applying the differential form version of Frobenius’ theorem:
\[ \alpha \wedge d\alpha =
\left( \frac{1}{2} x \; dy - \frac{1}{2} y \; dx + dt \right)
\wedge \left( dx \wedge dy \right) \\
= dt \wedge dx \wedge dy. \]
Most of the terms immediately cancel out in the wedge product because of the anti-symmetry: all that’s left is the standard 3-dimensional Euclidean volume form, which is definitely nowhere zero.

You can also check that on \(S \times \{0\}\), or in fact on any slice \(S \times \{t\}\), the characteristic foliation points radially outward from the origin. (Indeed, we just saw that the radial vector field \(x \partial_x + y \partial_y\) lies in \(\ker \alpha\).) This again coincides with the foliation \(\ker \beta\) from the Liouville 1-form.
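Here’s a small SymPy check (mine, not from the post) of both claims for this example: the kernel vectors, and the contact condition.

```python
# Checks (my own) for the cylindrically symmetric example
# alpha = (1/2)(x dy - y dx) + dt: the two claimed kernel vectors, and the
# non-degeneracy of d(beta), using coefficients against (dx, dy, dt).
import sympy as sp

x, y, t = sp.symbols('x y t')
f, g, u = -y / 2, x / 2, 1
alpha = sp.Matrix([f, g, u])

radial = sp.Matrix([x, y, 0])                      # x d/dx + y d/dy
angular = sp.Matrix([y, -x, (x**2 + y**2) / 2])    # y d/dx - x d/dy + ((x^2+y^2)/2) d/dt
assert sp.simplify(alpha.dot(radial)) == 0
assert sp.simplify(alpha.dot(angular)) == 0

# d(beta) = (g_x - f_y) dx ^ dy: here the coefficient is 1, matching the
# computation alpha ^ d(alpha) = dt ^ dx ^ dy above.
dbeta_coeff = sp.diff(g, x) - sp.diff(f, y)
assert dbeta_coeff == 1
```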

In fact, this argument works generally, not just on this example. The differential form version of Frobenius’ theorem can always easily be applied to show \(\alpha\) is a contact form. Moreover, it’s not difficult to show that the characteristic foliation coincides with the foliation from the Liouville 1-form.

PROPOSITION. Starting from any Liouville 1-form \(\beta\) on any surface \(S\), the 1-form \(\alpha = \beta + dt\) is a contact form on \(M = S \times [0,1]\).

Moreover, for each \(t \in [0,1]\), the characteristic foliation on \(S \times \{t\}\) coincides with the foliation of \(\ker \beta\).

In other words, at each \((p,t) \in S \times [0,1]\), the intersection of \(\ker \alpha\) with the tangent space to \(S \times \{t\}\) is equal to \(\ker \beta\).
\[ \ker \alpha_{(p,t)} \cap T_{(p,t)} (S \times \{t\}) = \ker \beta_{(p,t)} \]

PROOF. Consider \(\alpha \wedge d\alpha\); we’ll show it is nowhere zero.
\[ \alpha \wedge d\alpha = (\beta + dt) \wedge d\beta
= \beta \wedge d\beta + dt \wedge d\beta
\]
Now \(\beta \wedge d\beta\) is a 3-form, but \(\beta\) is a 1-form on the surface \(S\) only: it has no \(t\)-component. So \(\beta \wedge d\beta\) is a 3-form… on the 2-dimensional space \(S\)! Consequently it must be zero.

Thus \(\alpha \wedge d\alpha = dt \wedge d\beta\). Now we use the fact that \(\beta\) is a Liouville 1-form: this means that \(d\beta\) is a non-degenerate 2-form on \(S\). When we wedge it with \(dt\) then we must get a non-degenerate 3-form on \(S \times [0,1]\). This means \(\alpha\) is a contact form.

Now consider a point \((p,t) \in S \times [0,1]\), and a tangent vector \(V\) to \(S \times \{t\}\) there. So \(V\) is horizontal: it has no \(t\) component, and \(dt(V) = 0\). So \(\beta(V)= 0\) if and only if \(\alpha(V) = \beta(V) + dt(V) = 0\). Consequently for a vector \(V \in T_{(p,t)} S \times \{t\}\), we have \(V\) lies in \(\ker \alpha\) iff it lies in \(\ker \beta\). In other words, \(\ker \alpha \cap T_{(p,t)} S \times \{t\} = \ker \beta\). QED

Furthermore, this proof works in higher dimensions too. As long as you have a Liouville structure on a manifold \(S\) of any dimension \(2n\) (Liouville structures only exist in even dimension; this was discussed in the prequel), you’ll get a contact form on \(S \times [0,1]\) this way. (The only change is that \(\beta \wedge d\beta\) is a \((2n+1)\)-form on a \((2n)\)-manifold.)

In any case, this means that Liouville geometry leads naturally to contact geometry: we just add another dimension!

Indeed, sometimes this construction is called a contactisation!

(See, for example, section 2.3 of this 1998 paper of Yasha Eliashberg, Helmut Hofer and Dietmar Salamon.)

Liouville geometry is another name for symplectic geometry where the symplectic form is exact, i.e. the symplectic form is \(d\beta\). So contactisation upgrades symplectic geometry — the mathematics of classical Hamiltonian mechanics — to an odd-dimensional counterpart.

Another interesting thing to note is that in this “contactisation” construction, the 1-form \(\alpha = \beta + dt\) doesn’t actually depend on \(t\); it’s invariant under translations in the \(t\) direction. We saw in the second example that a radially symmetric Liouville form gave rise to a cylindrically symmetric contact structure.

And indeed, when you take \(\alpha = \beta + dt\), the 1-form \(\beta\) on the surface \(S\) doesn’t depend on \(t\); and \(dt\) is just the same \(dt\) regardless of what \(t\) is!

Such a contact form is sometimes called “vertically invariant”.

In fancy language, it means that the flow of the vertical vector field \(\partial_t\) preserves the contact form. In even fancier language, it means that the Lie derivative of the contact form \(\alpha\) in the direction of \(\partial_t\) is zero:
\[ L_{\partial_t} \alpha = 0. \]

A vector field whose flow preserves a contact structure is often called a contact vector field. So \(\partial_t\) is a contact vector field.

Our 3-manifold \(M\), the thickened surface \(S \times [0,1]\), can be considered as a family of surfaces \(S_t = S \times \{t\}\), for \(t \in [0,1]\). And there is a vector field \(\partial_t\), transverse to all of these surfaces.

Emmanuel Giroux discovered in the 1990s that, when you have a surface in a contact 3-manifold with a contact vector field transverse to it, very nice stuff happens. He called such a surface convex.

So Liouville geometry leads naturally not just to contact geometry, but to convex surfaces in contact geometry. More on that, another day.

Lovely Liouville geometry

(Note: This post is more technical than most stuff I write here. The intended audience here is not the general public, or even the general educated public: it’s students of geometry, broadly understood. In any case, if you don’t know what a differential form is, you’re probably not going to get much out of this.)

I’d like to show you some very nice geometry, involving some vector fields and differential forms.

Consider a surface. In fact, consider the plane, \(\mathbb{R}^2\). That’s just the standard Euclidean plane, with coordinates \( x\) and \( y\).

Now let’s consider a differential 1-form on the plane; call it \( \beta\). We’ll impose one condition on \( \beta\): its exterior derivative \( d\beta\) should be everywhere nonzero.

For instance, we can take \( \beta = x \; dy\). In fact, we will take this as a running example. Its exterior derivative is \( d\beta = dx \wedge dy\), which is just the usual Euclidean area form on the plane, and which is nowhere zero.

Now, saying that \(d\beta\) is everywhere nonzero is the same as saying that \( d\beta\) is an area form (although in general it might be different from the Euclidean area form \(dx \wedge dy\)); and this is also the same as saying that \( d\beta\) is a non-degenerate 2-form. In fact, being exact, \(d\beta\) is also closed: and hence \( d\beta\) is a closed non-degenerate 2-form, also known as a symplectic form.

Non-degenerate 2-forms are great. When you insert a vector into one, you get a 1-form; and because of the non-degeneracy, if the vector is nonzero, then the resulting 1-form is nonzero. So you get a bijective correspondence, or duality, between 1-forms and vectors.

This means that, at each point \(p \in \mathbb{R}^2\), the non-degenerate 2-form \(d\beta\) provides a linear map of 2-dimensional vector spaces
\[ T_p \mathbb{R}^2 \rightarrow T_p^* \mathbb{R}^2, \]
or in other words
\[ \{ \text{Vectors at $p$} \} \rightarrow \{ \text{1-forms at $p$} \}, \]
which sends a vector \(v\) to the 1-form \(\iota_v d\beta\) (i.e. the 1-form \(d\beta(v, \cdot)\), where you have fed \( d\beta\) one vector, but it eats two courses of vectors, and after its entree it remains a 1-form on its remaining main course). It’s a linear map and, by the non-degeneracy of \(d\beta\), its kernel/nullspace consists solely of the zero vector. Thus it’s injective and, both vector spaces being 2-dimensional, it’s an isomorphism.

If we consider (smooth) vector fields rather than just single vectors, then we can simultaneously do this at each point of \(\mathbb{R}^2\), and we get a map
\[ \{ \text{Vector fields on $\mathbb{R}^2$} \} \rightarrow \{ \text{1-forms on $\mathbb{R}^2$} \}. \]

So \( d\beta\), being a non-degenerate 2-form, gives us a way to go from 1-forms to vectors and back again. We can think of this as a duality: for each 1-form, this correspondence gives us a dual vector field, and for each vector field, a dual 1-form.

So far, we only have \(\beta\) and \( d\beta\). But \( \beta\) is a 1-form! So some vector field must correspond to it. Let’s call it \( X\). As it turns out, the 1-form \( \beta\), the 2-form \( d\beta\), and the vector field \( X\), form a very nice structure.

The name of Joseph Liouville is often associated with this stuff. Often the 1-form \(\beta\) is called a Liouville form, often the surface (or manifold in general) is called a Liouville manifold, and the whole thing is often called a Liouville structure.

In fact, we can draw pictures of such a structure.

The easiest thing to draw is the vector field \( X\): a vector, drawn as an arrow, at each point.

How do we draw \(\beta\)? It’s generally hard to draw a picture of a differential form! However for a 1-form, we can draw its kernel \(\ker \beta\). At any point \(p\), \(\beta_p\) is a linear map from the tangent space \(T_p \mathbb{R}^2\), which is a 2-dimensional vector space, to \(\mathbb{R}\). Where \(\beta_p = 0\), \(\beta_p\) is the zero map \(T_p \mathbb{R}^2 \rightarrow \mathbb{R}\) and hence the kernel is the whole 2-dimensional tangent space \(T_p \mathbb{R}^2\). But where \(\beta_p \neq 0\), we have a nontrivial linear map from a 2-dimensional vector space to a 1-dimensional vector space. Hence \(\beta_p\) has rank 1 and nullity 1, so the kernel is a 1-dimensional subspace of \(T_p \mathbb{R}^2\).

We then have a 1-dimensional tangent subspace at each point; in other words, \(\ker \beta\) is a line field on \(\mathbb{R}^2\). We can even join up the lines (i.e. integrate them) to obtain a collection of curves on \(\mathbb{R}^2\), which become the leaves of a foliation. In this way \(\beta\) can be drawn as a collection of curves, or singular foliation, on \(\mathbb{R}^2\): the foliation is singular at the points where \(\beta = 0\).

True, drawing it this way only shows the kernel of \(\beta\), i.e. in which direction you will get zero if you feed a vector into \(\beta\); you will not see what you get if you feed it vectors in other directions. So you don’t see how “strong” \(\beta\) is at each point. But it’s a useful way to represent \(\beta\) nonetheless.

As for the 2-form \(d\beta\)? It’s just an area form; we won’t attempt to draw anything to represent that.

So, let’s draw what we get in our example. In our example, the 1-form is \( \beta = x \; dy\), the 2-form is \( d\beta = dx \wedge dy\), and the vector field \( X\) must satisfy
\[ \iota_X d\beta = \beta,
\quad \text{i.e.} \quad
\iota_X ( dx \wedge dy ) = x \; dy. \]
It’s not too difficult to calculate that \( X = x \partial_x\) (here \( \partial_x\) is a unit vector in the \( x\) direction; if we’re using \( (x,y)\) to denote coordinates, then \( \partial_x = (1,0)\)).

Being a multiple of \( \partial_x\), \( X\) always points in the \( x\) direction, i.e. horizontally. When \( x\) is positive it points to the right, when \( x\) is negative it points to the left; and when \( x=0\), i.e. along the \( y\)-axis, \( X\) is zero. So \( X\) is actually a singular vector field, in the sense that it has zeroes. And it’s zero along a whole line. (So it’s not generic.)

As for \( \beta = x \; dy\), it is zero when \( x=0\), so the foliation \( \ker \beta\) is singular along the \( y\)-axis. When \( x \neq 0\), the kernel consists of anything pointing in the \( x\)-direction. Since \( \beta\) has only a \( dy\), but no \( dx\) term, if you feed it a \( \partial_x\), or any multiple thereof, you’ll get zero. So the line field you draw is horizontal, as is the foliation. In other words, \( \ker \beta\) is the singular foliation consisting of horizontal lines, with singularities along the \( y\)-axis.

The line field \(\ker \beta = \langle \partial_x \rangle\) is shown in orange; the vector field \(X = x \partial_x\) is shown in blue.

Note that, although we might have expected the line field of \( \ker \beta\) and the arrows of \( X\) to point all over the place, in fact the arrows and lines point in the same direction, i.e. horizontal! The vectors of \( X\) point along the lines of \( \ker \beta\). This means that, at each point, the vector \( X\) lies in the kernel of \( \beta\), and hence \( \beta(X) = 0\).

Now in this example, the vector field \(X = x \partial_x\) has a very nice property. If you flow it, then points move out horizontally, and exponentially. Indeed, if you interpret \( x \partial_x\) as a velocity vector field, it is telling a point with horizontal coordinate \( x\) to move to the right, with velocity \( x\). Telling something to move as fast as where it already is, is a hallmark of exponential movement.

Denoting the flow of \( X\) for time \( t\) by \( \phi_t\), we have a map \( \phi_t : \mathbb{R}^2 \rightarrow \mathbb{R}^2\). It’s an exponential function:
\[ \phi_t (x,y) = (x e^{t}, y). \]
Indeed, you can check that \( \frac{\partial}{\partial t} \phi_t = (x e^t, 0)\), which at time \( t=0\) is
\[ \frac{\partial}{\partial t} \Big|_{t=0} \phi_t = (x,0) = X. \]

Now the vector field \(X\) (or its flow \(\phi_t\)) expands in the horizontal direction exponentially, and does nothing in the \(y\) direction: from this it follows that \(X\) also expands area exponentially. An infinitesimal volume \(V\), after flowing under \(\phi_t\) for time \(t\), expands so that \(\frac{\partial V}{\partial t} = V\), and hence grows exponentially: at time \(t\), the volume has expanded from \(V\) to \(V e^t\). Rephrased in terms of differential forms, the Lie derivative of the area form \(d\beta = dx \wedge dy\) under the flow of \(X\) is the area form itself:
\[ L_X d\beta = d\beta. \]
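The area expansion can also be confirmed directly from the flow: the Jacobian determinant of \(\phi_t\) measures how areas scale. A quick symbolic check (a sketch using sympy):

```python
import sympy as sp

x, y, t = sp.symbols('x y t')

# The flow of X = x d/dx for time t, as computed above: (x, y) -> (x e^t, y).
phi = sp.Matrix([x * sp.exp(t), y])

# The Jacobian determinant of phi_t gives the factor by which area expands.
J = phi.jacobian([x, y])
area_factor = sp.simplify(J.det())
print(area_factor)   # exp(t): area grows exponentially, as claimed
```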

To see this, we use the Cartan formula \(L = d\iota + \iota d\), which yields
\[ L_X d\beta = d \iota_X (d\beta) + \iota_X d (d\beta). \]
From the definition of \(X\), being dual to \(\beta\), we have \(\iota_X (d\beta) = \beta\); using this, and the fact that \(d^2 = 0\), the first term becomes \(d\beta\) and the second term is zero.

In fact, \( X\) doesn’t just expand the area \( d\beta\) exponentially; it also expands the Liouville 1-form \( \beta\) exponentially. In other words,
\[ L_X \beta = \beta. \]

To see this, we just apply the Cartan formula,
\[ L_X \beta = d \iota_X \beta + \iota_X d\beta = 0 + \beta. \]
The first term is zero because, as we saw, \( X\) points along \( \ker \beta\), so \( \iota_X \beta = \beta(X) = 0\); the second term is \( \beta\) because of the definition of \( X\) as dual to \( \beta\).

To summarise our example so far: we started with a 1-form \( \beta\) whose exterior derivative \( d\beta\) was a non-degenerate 2-form. We took a vector field \( X\) dual to \( \beta\), using the non-degeneracy of \( d\beta\). We have found that:

  • The vector field \( X\) points along the foliation \( \ker \beta\); in other words, \( \beta(X) = 0\).
  • Flowing \( X\) expands area exponentially: \( L_X d\beta = d\beta\).
  • Flowing \( X\) in fact expands the 1-form \( \beta\) exponentially: \( L_X \beta = \beta\).

We proved some of these in sort-of generality, but not everything. Let’s prove it all at once in general, now.

PROPOSITION. Let \( \beta\) be a 1-form on a surface such that \( d\beta\) is non-degenerate. Let \( X\) be dual to \( \beta\), i.e. \( \iota_X d\beta = \beta\). Then:
(i) \(\beta(X) = 0\)
(ii) \( L_X d\beta = d\beta\)
(iii) \( L_X \beta = \beta.\)

PROOF. For (i), we note \( \beta(X) = \iota_X \beta\), and then use \( \beta = \iota_X d\beta\) and the fact that differential forms are antisymmetric:
\[ \beta(X) = \iota_X \beta = \iota_X \iota_X d\beta = d\beta(X,X) = 0. \]

For (ii) and (iii), we can then follow the arguments above. For (ii), we use the Cartan formula \( L_X = d \iota_X + \iota_X d\), the fact that \( d^2 = 0\), and the fact that \( \iota_X d\beta = \beta\).
\[ L_X d\beta = d \iota_X d\beta + \iota_X d d\beta = d\beta + 0 = d\beta. \]

For (iii), we use the Cartan formula, part (i) that \(\iota_X \beta = 0\), and the fact that \(\iota_X d\beta =\beta\).
\[ L_X \beta = d \iota_X \beta + \iota_X d \beta = d 0 + \beta = \beta. \]

So Liouville structures have some very nice properties.

And, this is all in fact classical physics. We can think of the plane as a phase space, and \( \beta\) as an action \( y \; dx = p \; dq\). Then \( d\beta\) is the symplectic form on phase space, and the equation \( L_X d\beta = d\beta\) shows that this symplectic form, the fundamental structure on the phase space, is expanded by the flow of \( X\). (There is something in classical mechanics, or symplectic geometry, known as Liouville’s theorem, which also says something about the effect of a flow of a vector field on the symplectic form.)

Anyway, above we saw one example of a Liouville structure on the plane. Here’s another one, which is more “radial”.

Take \(\beta = \frac{1}{2}(x \, dy - y \, dx)\). Then \(d\beta = \frac{1}{2}(dx \wedge dy - dy \wedge dx) = dx \wedge dy\). The vector field \(X\) dual to \(\beta\) is then \(\frac{1}{2}(x \, \partial_x + y \, \partial_y)\), which is a radial vector field. The kernel of \(\beta\) is also the radial direction: \(\beta(X) = \frac{1}{4}(x \, dy - y \, dx)(x \, \partial_x + y \, \partial_y) = \frac{1}{4}(xy - yx) = 0\). The flow of \(X\) then looks like it should expand area exponentially, and it does.
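The duality and the properties in the proposition above can be checked symbolically for both examples at once. The following Python sketch (using sympy, with the duality computation written out by hand in coordinates; the helper name `dual_field` is my own) verifies \(\beta(X) = 0\) and \(L_X d\beta = d\beta\) for \(\beta = x \, dy\) and \(\beta = \frac{1}{2}(x \, dy - y \, dx)\):

```python
import sympy as sp

x, y = sp.symbols('x y')

def dual_field(P, Q):
    """For beta = P dx + Q dy with d(beta) = f dx^dy, solve
    iota_X d(beta) = beta for X = A d/dx + B d/dy."""
    f = sp.diff(Q, x) - sp.diff(P, y)          # d(beta) = f dx^dy
    # iota_X (f dx^dy) = f*(A dy - B dx), so f*A = Q and -f*B = P
    return f, Q / f, -P / f

# beta = x dy, and beta = (x dy - y dx)/2
for P, Q in [(sp.Integer(0), x), (-y/2, x/2)]:
    f, A, B = dual_field(P, Q)
    # (i) beta(X) = P*A + Q*B = 0
    assert sp.simplify(P * A + Q * B) == 0
    # (ii) L_X d(beta) = d(beta): the Lie derivative of f dx^dy along X
    #      is (d/dx(f*A) + d/dy(f*B)) dx^dy, which should equal f dx^dy
    assert sp.simplify(sp.diff(f * A, x) + sp.diff(f * B, y) - f) == 0
    print(f"X = ({A}) d/dx + ({B}) d/dy passes checks")
```

Property (iii), \(L_X \beta = \beta\), then follows from (i) and the duality by the Cartan formula, exactly as in the proof above.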

Liouville structures can exist in other places too, and not just on surfaces: they exist in higher dimensions too. Notice that the fact that \(\beta\) was on a surface was never actually used in the proof above: it could have been any manifold, with any 1-form \( \beta\) such that \( d\beta\) is non-degenerate.

However, not every manifold has Liouville structures. In fact, there are many surfaces on which no Liouville structures exist. Any compact surface (without boundary) has no Liouville structure.

Why is this? The idea is pretty simple. If you have a closed and bounded surface, it’s pretty hard to have a smooth vector field \( X\) which expands the area! Your surface has a given finite area, and then you flow it along \( X\) — moving the points of the surface around by a diffeomorphism — and now it has exponentially larger area! This is pretty paradoxical, and indeed it’s a contradiction.

The plane escapes this paradox, because it’s not bounded. You can indeed expand the plane by any factor you like, and it’s still the same plane. Surfaces with boundary also escape the paradox, because the flow of \( X\) will not be defined for all time: eventually points will be pushed off the edge.

But a sphere does not escape the paradox. Nor does a torus, or any higher genus compact surface without boundary.

Further, Liouville structures can only exist in even dimensions. So you can’t have one on a 3-dimensional space. But you can have one on a 4-dimensional space. Why is this? It can be seen from linear algebra. You simply can’t have non-degenerate 2-forms in odd dimensions. (The easiest way I know to see this is as follows. Let \( \omega\) be a non-degenerate 2-form on an \( n\)-dimensional vector space. Choose a basis \( e_1, \ldots, e_n\) and write \( \omega(e_i, e_j)\) as the \( (i,j)\) term of the \( n \times n\) matrix \( A\) for \( \omega\). The facts that \( \omega\) is antisymmetric and non-degenerate mean that \( A^T= -A\) and \( \det A \neq 0\) respectively. But then \( \det A = \det A^T = \det (-A) = (-1)^n \det A\), so \( (-1)^n = 1\), and \(n\) is even.)
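The determinant argument is easy to test numerically: any real antisymmetric matrix of odd size is singular, while in even dimensions non-degenerate antisymmetric matrices exist. A quick check with numpy (a sketch; the random matrices are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (3, 5, 7):
    M = rng.standard_normal((n, n))
    A = M - M.T                          # a random antisymmetric n x n matrix
    # det A = det A^T = det(-A) = (-1)^n det A, forcing det A = 0 for odd n
    assert abs(np.linalg.det(A)) < 1e-9

for n in (2, 4):
    # in even dimensions, the standard symplectic matrix [[0, I], [-I, 0]]
    # is antisymmetric and non-degenerate
    h = n // 2
    J = np.block([[np.zeros((h, h)), np.eye(h)],
                  [-np.eye(h), np.zeros((h, h))]])
    assert abs(abs(np.linalg.det(J)) - 1.0) < 1e-9

print("odd antisymmetric: singular; even: can be non-degenerate")
```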

Just as a Liouville structure cannot exist on a compact surface (without boundary), it can’t exist on any compact manifold (without boundary). The argument is similar: because \( X\) expands the 2-form \( d\beta\), it also expands all its exterior powers, and hence the volume form of the manifold, whatever (even) dimension it may be.

So, we’ve seen that a Liouville structure can only exist on a manifold with even dimension, and which is not compact, or has boundary. (If you know about de Rham cohomology, it’s not difficult to see why a closed symplectic manifold must have \( H^2 \neq 0\): a Liouville structure makes the symplectic form exact, so its cohomology class vanishes, which is impossible on a closed manifold.)

But when it does, we have a wonderful little geometric triplet \( \beta, d\beta\) and \( X\).

Emmy had a theorem (mathematical nursery rhyme #2)

In the spirit of previous work in abstract algebra, I have, erm, adapted another nursery rhyme.

After all, the songs are so common and commonly known; why not update them with some definite content?

To the tune of “Mary had a little lamb” (with no disrespect to the original, which seems to be an endearing story of an actual lamb), a discussion of Noether’s theorem.

If you haven’t heard of Noether’s theorem, it is very nice. (It should be distinguished from several other theorems of Emmy Noether, and indeed other mathematical Noethers.)

Roughly speaking, Noether’s theorem states that whenever a physical system has a nice symmetry, there is always some numerical quantity which is conserved along with it.

For instance, if a physical system is invariant under translation, then there is a conserved quantity associated to it, known as momentum. (And there are translations in three independent directions in space, so there are three components of momentum which are conserved. In other words, momentum as a vector quantity is conserved.) Similarly, if it’s invariant under rotations, then there is a conserved quantity known as angular momentum. Invariant under moving forward and backward in time — a conserved quantity known as energy. And so on.

This is not very precise, and there are different ways of formulating it, and of course physicists and mathematicians have different perspectives about it — as well as the level of mathematical precision and rigour with which it should be stated and understood.

The wikipedia page, at least at the time of writing, has a very physics-oriented discussion, which would offend many mathematicians’ sensibilities — certainly including my own. The nicest mathematical formulation uses symplectic geometry, and hence some fairly serious prerequisite knowledge, well beyond the Australian undergraduate curriculum. (Unless you take an undergraduate research unit with me at Monash, perhaps!)

A good discussion may be found in the lecture notes of Ana Cannas da Silva, available online here. Once enough machinery is developed to state the principle cleanly (um, in section 24 on page 147…), the theorem is proved in a leisurely half a dozen lines.

Anyway, less talk, more nursery rhymes!

Emmy had a theorem,
theorem, theorem
Emmy had a theorem
Its proof was clear as day.

Everywhere a symmetry,
symmetry, symmetry
Everywhere a symmetry
A conserved quantity.

Golay Golay Golay (Top of the autocorrelation world)

In 1949, Marcel Golay was thinking about spectrometry.

As he described it some time later, the situation was as follows.

You have a spectrometer. The point of spectrometry is to find the frequency of light (or electromagnetic radiation more generally — but for convenience I’ll just say “light” from now on). Given a light source, spectrometry aims to find which frequencies (or colours) of light occur in it, and how they are distributed across the optical spectrum.

The spectrometer Golay had in mind was a cleverly designed “multislit” one. As the name suggests, it had many slits. Each slit could be open or closed. Light would come in on one side, pass through the contraption, and then exit on the other side, where detectors would be placed to record the output.

Both the entrance side and the exit side had many slits — the same number \(4N\) on either side. (Why a multiple of 4? It’s all part of the clever design, read on…)

Moreover, each entrance slit had a natural pathway through to an exit slit. The slits were designed so that light entering a particular entrance slit would pass through to a specific exit slit. The entrance and exit slits were thus matched up in a one-to-one fashion. This “matching up” in fact “inverted” the light: light coming in through the top slit on the left, would exit through the bottom slit on the right; light entering through the second-from-top slit on the left would exit through the second-from-bottom slit on the right; and so on.

At least, that’s what would happen for one particular colour of light, i.e. one particular frequency — let’s say pure crimson red. The point of the spectrometer is to pick out distinct frequencies, and so this contraption is “tuned” to perfectly align the slits for crimson red light.

What about other frequencies? They get shifted. When light of another frequency, let’s say green, passed through an entrance slit, it did not end up in the same place as crimson red light, opposite to where it came in; rather, it ended up shifted across by some number \(j\) of slits.

In other words, if red light and green light enter through the same slit, they exit through slits which are \(j\) spots apart from each other.

Golay’s idea was to arrange the slits and detectors in a clever way, so as to eliminate all the light of other frequencies, and isolate the preferred (red) light. By an ingenious arrangement of detectors and open and closed slits, the red light would be greatly enhanced, with other colours (frequencies) completely filtered out.

How did this arrangement go? In a slightly complicated way. The entrance slits would be split into four equal length sections, each of length \(N\), as would the exit slits. Light entering through a slit in a particular section would go out a slit in the corresponding (opposite) section of exit slits.

These sections were separated from each other. In particular, non-red light could be shifted across slits within one section, but it could not cross over to another section.

Golay imagined there to be two detectors. The first detector \(D_1 \) would cover the bottom two exit sections, measuring the total amount of light exiting the bottom half of the slits, i.e. the bottom \( 2N \) exit slits. The other detector \(D_2\) would cover the top half of the exit slits, i.e. the top two sections, the top \( 2N \) exit slits. The detectors \( D_1 \) and \( D_2 \) simply capture the amount of light coming out of the bottom and top \( 2N \) slits respectively, or equivalently for our purposes, the number of those slits through which light emerges.

So in effect the whole contraption is in four separated parts, and there are two detectors, each detecting the output from two of the parts.

From Golay’s 1961 paper “Complementary Sequences”, IRE Transactions on Information Theory. What do the a’s and b’s mean? Read on…

Now, how to arrange the open and closed slits? Let’s denote open slits by a \( +1 \) (or just \( + \) for short), and closed slits by a \( -1 \) (or just \( - \) for short). So a sequence of open and closed slits can be denoted by a sequence of \( + \) and \( - \) symbols.

(You might think \( 1 \)s and \( 0 \)s are more appropriate for open and closed slits than \( +1\)s and \( -1 \)s. You could indeed use \(1\)s and \(0\)s; in that case I’ll leave it up to you to adjust the mathematics below.)

Now Golay suggested taking two sequences \(a\) and \(b\) of \(+\)s and \(-\)s, each of length \(N\). They would be used to configure the slits. Let’s write \( a = (a_1, \ldots, a_N) \) and \(b = (b_1, \ldots, b_N)\), where every \(a_i \) or \( b_i \) is either a \( +1 \) or a \( -1 \).

Now, sequences \(a \) and \( b \) each have length \( N\), but there are \( 4N \) entrance slits and \( 4N \) exit slits.

What to do? Golay said what to do. Golay said also to take the negatives of \( a \) and \( b\). The negative of a sequence is given by multiplying all its terms by \(-1 \) (just like how you take the negative of a number). In other words, to take the negative of the sequence \( a\), you replace each \( + \) with a \( - \), and each \( - \) with \( + \). We can write \(-a\) for the negative of \(a\), and \(-b\) for the negative of \(b\).

Golay suggested, very cleverly, that the \(4N \) entrance slits, from top to bottom, should be arranged using \(a \) (for the top \( N \) slits), then \( -a \) (for the next \( N\)), then \( b \) (for the next \(N\)), and finally \(-b\) (for the bottom \(N\) slits). So as we read down the slits we read the sequences \( a,-a,b,-b\).

On the exit side, because the light is “inverted”, we now read bottom to top. Golay suggested that, as we read up the slits, we use the sequences \( a,-a,-b,b\). That’s not quite the same as what we did on the entrance side. The top \(N\) entrance slits, set according to the sequence \(a\), correspond to the bottom \(N \) exit slits, also set according to the sequence \(a\). The next \(N\) entrance slits are set according to \(-a\), as are the next \( N \) exit slits. But after that, the entrance slits set according to \( b \) correspond to the exit slits set to \( -b \) ; and the final \( N \) entrance slits are set to \( -b\), with corresponding exit slits set to \( b\). So the \(a \) and \( -a \) slits “match”, but the \( b \) and \( -b \) “anti-match”.

We can now see what the a’s and b’s mean in Golay’s diagram. (Golay writes \( a’ \) and \( b’ \) rather than \(-a\) and \(-b\).)

From Golay’s 1961 paper “Complementary Sequences”, IRE Transactions on Information Theory, again.

One final twist: the output of the contraption is measured by the two detectors \( D_1 \) and \( D_2\). But Golay proposed not to add their results, but to subtract them. So the final number we want to look at is not \(D_1 + D_2\), but \( D_1 - D_2\).

Anyway, that was Golay’s prescription.

So what happens to light going through this spectroscopic contraption, now with its numerous slits configured in this intricate way?

First let’s consider red light — which, recall, means the light goes straight from entrance slit to opposite exit slit. We’ll take the four sections separately, which, we recall, are labelled \(a,-a,b,-b \) at the entrance, and \( a,-a,-b,b \) at the exit.

  • For light hitting one of the top \( N \) entrance slits, one encoded by \( a_i\), it is blocked if \( a_i = -1\). But if \(a_i = 1 \) then the light sails through the open slit, out to the corresponding exit slit, which is also labelled \( a_i = 1\), and through to the detector \( D_1\).
  • Similarly, consider one of the entrance slits in the next section, encoded by some \( -a_i\). Light is blocked if \(-a_i = -1 \), but if \( -a_i = 1 \) then the light sails over to the corresponding exit slit, also labelled \( -a_i = 1 \), through to the detector \( D_1\).
  • Now consider the third section, where entrance slits are encoded by \( b \) but exit slits are encoded by \( -b\). Light hits a slit encoded by some \(b_i\). If \(b_i = -1\), the entrance slit is closed, and the light is blocked there. If \( b_i = 1\), the entrance slit is open, and the light enters, but then the exit slit is encoded by \( -b_i = -1\), so is closed, and the light is blocked here. Either way, the light is blocked.
  • The final section is similar. The entrance slit is labelled by some \( -b_i\), and the exit slit by \(b_i\). If \( -b_i = -1\), the entrance slit is closed and light is blocked; if \( -b_i = 1\), then the entrance slit is open, but as \( b_i = -1\), the exit slit is blocked. Either way the light is blocked.

Now detector \(D_1 \) counts the number of slits in the first two sections from which light emerges. In the first section, those slits are the ones encoded by \( a_i \) such that \( a_i = 1\). In the second section, those slits are the ones encoded by \( -a_i \) such that \( -a_i = 1 \) , i.e. \( a_i = -1\). On the other hand, \(D_2 \) detects nothing, as everything is blocked. So we have

\[
D_1 = ( \# i \text{ such that } a_i = 1) + ( \#i \text{ such that } a_i = -1), \\
D_2 = 0.
\]

The expression for \( D_1 \) simplifies rather dramatically, because every \( a_i \) is either \( +1 \) or \( -1\). If you add up the number of \(+1\)s and the number of \(-1\)s, you simply get the number of terms in the sequence, which is \(N\). Thus in fact
\[
D_1 = N, \quad D_2 = 0,
\]
and the final result (remember we subtract the results of the two detectors) is
\[
D_1 - D_2 = N.
\]

So, we end up with a nice result, when we feed Golay’s spectroscope light of the colour it’s designed to detect (i.e. red).

Now, what happens with other colours? Let’s now feed Golay’s spectroscope some other colour (i.e. frequency, i.e. wavelength) of light, which means that the light gets shifted across \( j \) slots. Let’s say the light is green.

  • Consider green light hitting one of the top \( N \) entrance slits, encoded by \( a_i\). The light is blocked if \( a_i = -1\). But if \(a_i = 1 \) then the light sails through the open slit, over to the corresponding exit slits, which are also encoded by the sequence \( a\). The light is shifted across \(j \) slots in the process, and so arrives at the exit slit encoded by \( a_{i+j}\). If \(a_{i+j} = 1\), the light proceeds to detector \( D_1\); otherwise, the light is blocked. In other words, the green light gets to the detector if and only if \( a_i = a_{i+j} = 1\).
  • (Note also that if \( i+j > N \) or \( i+j < 1\), then the light beam gets shifted so far across that it hits the end of the section of the machine; and the sections are separated from each other. So we only need to consider those \( i \) (which are between \( 1 \) and \( N \)) such that \( i+j \) is also between \( 1 \) and \( N\). In other words, (assuming \( j \) is positive) \( i \) only goes from \( 1 \) up to \( N-j\).)
  • Now consider green light hitting the second section, where entrance and exit slits are labelled by \( -a_i\). If \( -a_i = -1\), then light is blocked at the entrance. If \( -a_i = 1\), light enters, and proceeds with a shift over to an exit slit encoded by \( -a_{i+j}\). If \( -a_{i+j} = -1\), light is blocked at the exit, but if \( -a_{i+j} = 1\), then the light proceeds to detector \( D_1\). In other words, light gets to the detector \( D_1 \) if and only if \( -a_i = -a_{i+j} = 1\), or equivalently, \( a_i = a_{i+j} = -1\).
  • In the third section, entrance slits are encoded by \( b \) and exit slits by \( -b\). For light to get through to detector \( D_2\), we must have \( b_i = 1 \) and \( -b_{i+j} = 1\).
  • Finally, in the fourth section, entrance slits are encoded by \( -b \) and exit slits by \( b\). Light gets through when \( -b_i = 1 \) and \( b_{i+j} = 1 \).

Putting these together, we have
\[
D_1 = ( \# i \text{ such that } a_i = 1 \text{ and } a_{i+j} = 1) + ( \# i \text{ such that } a_i = -1 \text{ and } a_{i+j} = -1), \\
D_2 = ( \# i \text{ such that } b_i = 1 \text{ and } b_{i+j} = -1) + ( \# i \text{ such that } b_i = -1 \text{ and } b_{i+j} = 1).
\]

Now let’s manipulate these sums a little. Note that, for any \(i\), \( a_i = \pm 1 \) and \( a_{i+j} = \pm 1\). Thus the product \( a_i a_{i+j} = \pm 1\). But note that \( a_i a_{i+j} = 1 \) precisely when \( a_i = a_{i+j} = 1\) or \( a_i = a_{i+j} = -1\), i.e. when \( a_i \) and \( a_{i+j} \) are equal. These are precisely the cases counted in the sum for \( D_1 \) above. When \( a_i \) and \( a_{i+j} \) are not equal, they multiply to \( -1 \) instead.

Similarly, consider \( b_i \) and \( b_{i+j}\). The product \( b_i b_{i+j} \) is equal to \( -1 \) precisely when \( b_i = 1 \) and \( b_{i+j} = -1 \) , or when \( b_i = -1 \) and \( b_{i+j} = 1 \) . And these are precisely the cases counted above for \( D_2 \) .

So we have
\[
D_1 = ( \# i \text{ such that } a_i a_{i+j} = 1 ),\\
D_2 = ( \# i \text{ such that } b_i b_{i+j} = -1).
\]
Now, as we’ve said, for each \(i\), \( a_i a_{i+j} \) is \( 1 \) or \( -1\). For how many \( i \) do we get \( +1\)? Precisely \(D_1 \) times! Because that’s exactly what the equation above for \( D_1 \) says. All the other terms must be \( -1\). And we said above that \( i \) goes from \( 1 \) up to \( N-j\). So there are \( N-j-D_1 \) times that \( a_i a_{i+j} = -1\).

Let’s now just add up all the terms \( a_i a_{i+j}\), all the way from \( i=1\), i.e. the term \( a_1 a_{1+j}\), to \( i=N-j\), i.e. the term \(a_{N-j} a_{N-j+j}\). We get \(+1 \) sometimes — precisely \( D_1 \) times — and \( -1 \) sometimes — precisely \( N-j-D_1 \) times. It follows that
\[
a_1 a_{1+j} + \cdots + a_{N-j} a_{N-j+j} = 1 \cdot D_1 + (-1) \cdot (N-j-D_1)
\]
or if we tidy up,
\[
\sum_{i=1}^{N-j} a_i a_{i+j} = 2D_1 - N + j.
\]
We can do the same for the terms \( b_i b_{i+j}\). We get \( -1 \) precisely \( D_2 \) times, as the equation for \( D_2 \) says above. And we get \( +1 \) all the other times; but there are \( N-j \) terms overall, so we get \( +1 \) precisely \( N-j-D_2 \) times. Hence
\[
b_1 b_{1+j} + \cdots + b_{N-j} b_{N-j+j} = 1 \cdot (N-j-D_2) + (-1) \cdot D_2,
\]
or equivalently,
\[
\sum_{i=1}^{N-j} b_i b_{i+j} = -2D_2 + N - j.
\]

We want to get the final result of the detectors, which is \( D_1 - D_2\). So let’s rearrange the equations above to obtain \( D_1 \) and \( D_2\),
\[
D_1 = \frac{N-j}{2} + \frac{1}{2} \sum_{i=1}^{N-j} a_i a_{i+j}, \\
D_2 = \frac{N-j}{2} - \frac{1}{2} \sum_{i=1}^{N-j} b_i b_{i+j},
\]
and subtract. When we do so, things simplify considerably!
\[
D_1 - D_2 = \frac{1}{2} \sum_{i=1}^{N-j} \left( a_i a_{i+j} + b_i b_{i+j} \right)
\]
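If you distrust the algebra, this identity can be brute-forced over every pair of \( \pm 1 \) sequences of a given short length. Here is a small Python sketch doing so — just a sanity check, not part of Golay’s argument:

```python
from itertools import product

def check_identity(N):
    """Check D1 - D2 = (1/2) * sum_i (a_i a_{i+j} + b_i b_{i+j}) for every
    pair of +/-1 sequences of length N and every shift 1 <= j < N."""
    for a in product([1, -1], repeat=N):
        for b in product([1, -1], repeat=N):
            for j in range(1, N):
                # D1 counts equal pairs in a; D2 counts unequal pairs in b.
                D1 = sum(1 for i in range(N - j) if a[i] == a[i + j])
                D2 = sum(1 for i in range(N - j) if b[i] != b[i + j])
                total = sum(a[i] * a[i + j] + b[i] * b[i + j]
                            for i in range(N - j))
                if 2 * (D1 - D2) != total:
                    return False
    return True

print(check_identity(4))  # prints True: the identity holds in every case
```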

This is a very nice result. And it reduces what Golay wanted to a very interesting maths problem. Two sequences \( a = (a_1, \ldots, a_N) \) and \( b = (b_1, \ldots, b_N) \) of \( \pm 1\)s are called a complementary pair or a Golay pair if, for all \( j \neq 0\), this sum is zero:
\[
\sum_{i=1}^{N-j} \left( a_i a_{i+j} + b_i b_{i+j} \right) = 0.
\]
Sums like these are often called autocorrelations. So the property we are looking for is a property of autocorrelations. Golay pairs are all about autocorrelations. Hence the title of this post.
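As a concrete check, here is a short Python sketch of the defining condition. For instance, \( (1,1,1,-1) \) and \( (1,1,-1,1) \) form a Golay pair of length 4, as you can verify by hand or with the code below.

```python
def autocorrelation(s, j):
    """Aperiodic autocorrelation of s at shift j: the sum of s_i * s_{i+j}."""
    return sum(s[i] * s[i + j] for i in range(len(s) - j))

def is_golay_pair(a, b):
    """Check that the autocorrelations of a and b cancel at every shift j != 0."""
    return len(a) == len(b) and all(
        autocorrelation(a, j) + autocorrelation(b, j) == 0
        for j in range(1, len(a)))

print(is_golay_pair([1, 1, 1, -1], [1, 1, -1, 1]))  # prints True
print(is_golay_pair([1, 1], [1, 1]))                # prints False
```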

If you can find a pair of Golay complementary sequences, then you can configure all the slits in the multislit spectrometer according to those sequences, and for any colour except the one you are looking for (red), the detectors will perfectly cancel out that colour! So your spectrometry will be greatly enhanced.

Now you might wonder, do any such pairs exist?

Yes, they do. Oh yes, they do. And exactly which pairs exist is also a very interesting question — not yet completely solved, with lots of ongoing research.

Stay tuned for more.

P.S. Yes, the title of this blog post is based on a song by Chumbawumba. It’s a very excellent song.

The “Australia day” category error

Australia’s national holiday commemorates not some heroic act, but the arrival of settler colonists who occupied and settled that land, dispossessing the original and rightful inhabitants of the continent. Aboriginal sovereignty was never ceded; no treaty has ever been signed. Historic dispossession and violence, involving frontier wars and genocidal campaigns, decimated the Indigenous nations. There is struggle and heroism here, but mainly in the capacity of Indigenous peoples to resist and to survive.

Suppose I came to your home, invited myself in, made it my home, took your possessions, evicted or kidnapped or infected or murdered your family, and then celebrated the anniversary of my arrival each year — what would be the appropriate response?

The answer to that question is also the answer in the excruciatingly mind-numbing debate each year in Australia about whether the national holiday is appropriate.

(To avoid maximum excruciation, let us state the obvious. Clearly this analogy is not literal; no individual living today bears direct moral culpability for tragedies which unfolded in historical time. But the point is precisely the symbolism: national commemorations are pure symbolism, by design.)

The question in this mind-numbing debate may be an easy one, but even to ask it — of non-Indigenous Australians — contains a category error.

If I took over your home and then held a celebration there each year, it is not for me to say whether that celebration is appropriate. It is for you to say. I may well say it is not appropriate, but even if I think it is, your view counts for more; you have suffered the injustice. The correct answer is not just “no”, but also “it’s not for me to say”.

And so, to answer the question of the appropriateness of “Australia day”, the answers of Indigenous people are the most important. Everybody is entitled to their opinion, but an opinion on the question which does not take into account the views of Indigenous people cannot be taken seriously.

Views of Indigenous Australians can easily be found. The broadest data I’m aware of are poll results from 2017, a survey of 1,156 Indigenous Australians about “Australia day”. (If you know a better or more recent poll I would be happy to update.) It found that:

  • 54% of Indigenous Australians were in favour of a change of date. This may suggest that only a slim majority are against the event, but further results make it clear that the other 46% are far from being uniformly enthusiastic. For instance:
  • The survey asked participants to associate three words with Australia day. The most chosen words by Indigenous Australians were “invasion”, “survival” and “murder”.
  • A majority of Indigenous Australians said that the name “Australia day” should change.
  • 23% of Indigenous participants felt positive about Australia day, 30% had mixed feelings, and 31% had negative feelings.

Despite the above poll results, in January 2018 the Indigenous Affairs minister (who is not Indigenous) claimed that “no Indigenous Australian has told him the date of Australia Day should be changed other than a single government adviser”. This says more about a politician being out of touch, than it does about the distribution of opinion among Indigenous Australians.

In contrast, Jack Latimore, editor of IndigenousX, the prominent online platform for Indigenous voices, comes to a rather different conclusion.
Based on his extensive experience and engagement with Indigenous Australians from across the social and political spectrum, his conclusion is worth repeating:

When it comes to the subject of 26 January, the overwhelming sentiment among First Nations people is an uneasy blend of melancholy approaching outright grief, of profound despair, of opposition and antipathy, and always of staunch defiance.

The day and date is steeped in the blood of violent dispossession, of attempted genocide, of enduring trauma. And there is a shared understanding that there has been no conclusion of the white colonial project when it comes to the commonwealth’s approach to Indigenous people. We need only express our sentiments regarding any issue that affects us to be quickly reminded of the contempt in which our continued presence and rising voices are held.

Nor is our sentiment in regards to 26 January a recent phenomenon. I have witnessed it throughout my life in varied intensities. Evidence of it is even present in the recorded histories of White Australia.

Indeed, the long history of Indigenous protest against a January 26 celebration goes back at least to boycotts in 1888, and numerous actions on the 1938 sesquicentenary.

Returning to the present, numerous community leaders and representative bodies have also given their views, many of which are available online. Below are links to some such views; of course plenty more are easily found.

Changing the date is an obvious, minimal, easy next step on the road to justice for Indigenous Australia. At the very least, maintaining the celebration in its current form is untenable. A minimal step towards respect for Indigenous Australia is to stop dancing on their ancestors’ graves.

Nor is changing the date particularly opposed by the general Australian public. According to a December 2017 poll, most Australians are ignorant of the history of Australia Day, can’t guess what historical event happened on that day, and don’t really mind on what date it is celebrated. Half also think that the national holiday should not be held on a date offensive to Indigenous Australians (even though a plurality wrongly believes that January 26 is not offensive to Indigenous Australians).

As of a January 2017 poll, only 15% of Australians wanted to change the date. That number may well have increased by now, with the momentum of the movement to change the date.

And the survey apparently did not have “it’s not for me to say” as an option for non-Indigenous respondents — reinforcing the standard, annual category error.

I don’t believe in any patriotic holidays. But a patriotic holiday on such a terrible date needs to be moved, rebuilt, or abolished.

Topological entropy: information in the limit of perfect eyesight

Entropy is a notoriously tricky subject. There is a famous anecdote of John von Neumann telling Claude Shannon, the father of information theory, to use the word “entropy” for the concept he had just invented, because “nobody knows what entropy really is, so in a debate you will always have the advantage”.

Entropy means many different things in different contexts, but there is a wonderful notion of entropy which is purely topological. It only requires a space, and a map on it. It is independent of geometry, or any other arbitrary features — it is a purely intrinsic concept. This notion, not surprisingly, is known as topological entropy.

There are a few equivalent definitions; we’ll just discuss one, which is not the most general. As we’ll see, it can be described as the rate of information you gain about the space by applying the function, when you have poor eyesight — in the limit where your eyesight becomes perfect.

Let \(X\) be a metric space. It could be a surface, it could be a manifold, it could be a Riemannian manifold. Just some space with an idea of distance on it. We’ll write \(d(x,y)\) for the distance between \(x\) and \(y\). So, for instance, \(d(x,x) = 0\); the distance from a point to itself is zero. Additionally, \(d(x,y) = d(y,x)\); the distance from \(x\) to \(y\) is the same as the distance from \(y\) to \(x\); the triangle inequality applies as well. And if \(x \neq y\) then \(d(x,y) > 0\); to get from one point to a different point you have to travel over more than zero distance!

We assume \(X\) is compact, so roughly speaking, it has no missing points, it doesn’t go off to infinity, and its volume (if it has a volume) is finite.

Now, we will think of \(X\) as a space we are looking at, but we can’t see precisely. We have myopia. Our eyes are not that good, and we can only tell if two points are different if they are sufficiently far apart. We can only resolve points which have a certain degree of separation. Let this resolution be \(\varepsilon\). So if two points \(x,y\) are distance less than \(\varepsilon \) apart, then our eyes can’t tell them apart.

Rather than thinking of this situation as poor vision, you can alternatively suppose that \(X\) is quantum mechanical: there is uncertainty in the position of points, so if \(x\) and \(y\) are sufficiently close, your measurement can’t be guaranteed to distinguish between them. Only when \(x\) and \(y\) are sufficiently far apart can your measurement definitely tell them apart.

We suppose that we have a function \(f \colon X \rightarrow X\). So \(f\) sends points of \(X\) to points of \(X\). We assume \(f\) is continuous, but nothing more. So, roughly, if \(x\) and \(y\) are close then \(f(x)\) and \(f(y)\) are close. (Making that rough statement precise is what the beginning of analysis is about.) We do not assume that \(f\) is injective; it could send many points to the same point. Nor do we assume \(f\) is surjective; it might send all the points of \(X\) to a small region of \(X\). All we know about \(f\) is that it jumbles up the points of \(X\), moving them around, in a continuous fashion.

We are going to define the topological entropy of \(f\), as a measure of the rate of information we can get out of \(f\), under the constraints of our poor eyesight (or our quantum uncertainty). The topological entropy of \(f\) is just a real number associated to \(f\), denoted \(h_{top}(f)\). In fact it’s a non-negative number. It can be as low as zero, it can be infinite, and it can be any real number in between.

We ask: what is the maximum number of points we can distinguish, despite our poor eyesight / quantum uncertainty? If the answer is \(N\), then there exist \(N\) points \(x_1, \ldots, x_N\) in \(X\), such that any two of them are separated by a distance of at least \(\varepsilon\). In other words, for any two points \(x_i, x_j\) (with \(i \neq j\)) among these \(N\) points, we have \(d(x_i, x_j) \geq \varepsilon\). And if the answer is \(N\), then this is the maximum number; so there do not exist \(N+1\) points which are all separated by a distance of at least \(\varepsilon\).

Call this number \(N(\varepsilon)\). So \(N(\varepsilon)\) is the maximum number of points of \(X\) our poor eyes can tell apart.

(Note that the number of points you can distinguish is necessarily finite, since they all lie in the compact space \(X\). There’s no way your shoddy eyesight can tell apart infinitely many points in a space of finite volume! So \(N(\varepsilon)\) is always finite.)
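In code, one can approximate \( N(\varepsilon) \) for a finite sample of points by greedily collecting points that are pairwise at least \( \varepsilon \) apart. A word of caution: greedy selection only guarantees a maximal separated set (one that cannot be extended), which in general is just a lower bound for the true maximum \( N(\varepsilon) \). The following Python sketch illustrates the idea.

```python
def separated_subset(points, dist, eps):
    """Greedily collect points that are pairwise at distance >= eps.

    Returns a maximal eps-separated subset of the sample: its size is a
    lower bound for N(eps), the maximum number of distinguishable points.
    """
    chosen = []
    for p in points:
        if all(dist(p, q) >= eps for q in chosen):
            chosen.append(p)
    return chosen

# Sample the unit interval [0, 1] with the usual distance |x - y|.
points = [i / 100 for i in range(101)]
eyesight = 0.25  # we can't tell apart points closer than this
print(len(separated_subset(points, lambda x, y: abs(x - y), eyesight)))  # prints 5
```

(On the interval, the left-to-right greedy scan happens to find a best possible set: the five points \(0, 0.25, 0.5, 0.75, 1\).)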

Clearly, if our eyesight deteriorates, then we see less, and we can distinguish fewer points. Similarly, if our eyes improve, then we see more, so we can distinguish more points. Eyesight deterioration means \(\varepsilon\) increases: we can only distinguish points if they are further apart. Similarly, eyesight improvement means \(\varepsilon\) decreases: we can tell apart points that are closer together.

Therefore, \(N(\varepsilon)\) is a decreasing function of \(\varepsilon\). As \(\varepsilon\) increases, our eyesight deteriorates, and we can distinguish fewer points.

Now, we haven’t yet used the function \(f\). Time to bring it into the picture.

So far, we’ve thought of our eyesight as being limited by space — by the spatial resolution it can distinguish. But our eyesight also applies over time.

We can think of the function \(f\) as describing a “time step”. After each second, say, each point \(x\) of \(X\) moves to \(f(x)\). So a point \(x\) moves to \(f(x)\) after 1 second, to \(f(f(x))\) after 2 seconds, to \(f(f(f(x)))\) after 3 seconds, and so on. In other words, we iterate the function \(f\). If \(f\) is applied \(n\) times to \(x\), we denote this by \(f^{(n)}(x)\). So, for instance, \(f^{(3)}(x) = f(f(f(x)))\).

The idea is that, if you stare at two moving points for long enough, you might not be able to distinguish them at first, but eventually you may be able to. If they move apart at some point, then you may be able to distinguish them.

So while your eyes are encumbered by space, they are assisted by time. Your shoddy eyes have a finite spatial resolution they can distinguish, but over time points may move apart enough for you to resolve them.

(You can also think about this in a “quantum” way. The uncertainty principle says that uncertainties in space and time are complementary. If you look over a longer time period, you allow a greater uncertainty in time, which allows for smaller uncertainty in position. But from now on I’ll stick to my non-quantum myopia analogy.)

We can then ask a similar question: what is the maximum number of points we can distinguish, despite our myopia, while viewing the system for \(T\) seconds? If the answer is \(N\), then there exist \(N\) points \(x_1, \ldots, x_N\) in \(X\), such that at some point over \(T\) seconds, i.e. \(T\) iterations of the function \(f\), any two of them become separated by a distance of at least \(\varepsilon\). In other words, for any two points \(x_i, x_j\) (with \(i \neq j\)) among these \(N\) points, there exists some time \(t\), where \(0 \leq t \leq T\), such that \(d(f^{(t)}(x_i), f^{(t)}(x_j)) \geq \varepsilon\). And if the answer is \(N\), then this is again the maximal number, so there do not exist \(N+1\) points which all become separated at some instant over \(T\) seconds.

Call this number \(N(f, \varepsilon, T)\). So \(N(f, \varepsilon, T)\) is the maximum number of points of \(X\) our decrepit eyes can distinguish over \(T\) seconds, i.e. \(T\) iterations of the function \(f\).

Now if we allow ourselves more time, then we have a better chance to see points separating. As long as there is one instant of time at which two points separate, we can distinguish them. So as \(T\) increases, we can distinguish more points. In other words, \(N(f, \varepsilon, T)\) is an increasing function of \(T\).

And by our previous argument about \(\varepsilon\), \(N(f, \varepsilon, T)\) is a decreasing function of \(\varepsilon\).

So we’ve deduced that the number of points we can distinguish over time, \(N(f, \varepsilon, T)\), is a decreasing function of \(\varepsilon\), and an increasing function of \(T\).

We can think of the number \(N(f, \varepsilon, T)\) as an amount of information: the number of points we can tell apart is surely some interesting data!

But rather than think about a single instant in time, we want to think of the rate of information we obtain, as time passes. How much more information do we get each time we iterate \(f\)?

As we iterate \(f\), and we look at our space \(X\) over a longer time interval, we know that we can distinguish more points: \(N(f, \varepsilon, T)\) is an increasing function of \(T\). But how fast is it increasing?

To pick one possibility out of thin air, it might be the case that every time we iterate \(f\), i.e. when we increase \(T\) by \(1\), we can distinguish twice as many points. In that case, \(N(f, \varepsilon, T)\) doubles every time we increment \(T\) by 1, and we will have something like \(N(f, \varepsilon, T) = 2^T\). In this case, \(N\) is increasing exponentially, and the (exponential) growth rate is given by the base 2.

(Note that doubling the number of points you can distinguish is just like having 1 extra bit of information: with 3 bits you can describe \(2^3 = 8\) different things, but with 4 bits you can describe \(2^4 = 16\) things — twice as many!)

Similarly, to pick another possibility out of thin air, if it were the case that \(N(f, \varepsilon, T)\) tripled every time we incremented \(T\) by \(1\), then we would have something like \(N(f, \varepsilon, T) = 3^T\), and the growth rate would be 3.

But in general, \(N(f, \varepsilon, T)\) will not increase in such a simple way. However, there is a standard way to describe the growth rate: look at the logarithm of \(N(f, \varepsilon, T)\), and divide by \(T\). For instance, if \(N(f, \varepsilon, T) \sim 2^T\), then we have \(\frac{1}{T} \log N(f, \varepsilon, T) \sim \log 2\), which is exactly \(1\) if we use logarithms base \(2\). And then see what happens as \(T\) becomes larger and larger. As \(T\) becomes very large, you’ll get an asymptotic rate of information gain from each iteration of \(f\).

(In describing a logarithm, we should technically specify what the base of the logarithm is. It could be anything; I don’t care. Pick your favourite base. Since we’re talking about information, I’d pick base 2.)

This leads us to think that we should consider the limit
\[
\lim_{T \rightarrow \infty} \frac{1}{T} \log N (f, \varepsilon, T).
\]
This is a great idea, except that if \(N (f, \varepsilon, T)\) grows in an irregular fashion, this limit might not exist! But that’s OK, there’s a standard analysis trick to get around these kinds of situations. Rather than taking a limit, we’ll take a lim inf, which always exists.
\[
\liminf_{T \rightarrow \infty} \frac{1}{T} \log N (f, \varepsilon, T).
\]

(The astute reader might ask, why lim inf and not lim sup? We could actually use either: they both give the same result. In our analogy, we might want to know the rate of information we’re guaranteed to get out of \(f\), so we’ll take the lower bound.)

And this is almost the definition of topological entropy! By taking a limit (or rather, a lim inf), we have eliminated the dependence on \(T\). But this limit still depends on \(\varepsilon\), the resolution of our eyesight.

Although our eyesight is shoddy, mathematics is not! So in fact, to obtain the ideal rate of information gain, we will take a limit as our eyesight becomes perfect! That is, we take a limit as \(\varepsilon\) approaches zero.

And this is the definition of the topological entropy of \(f\):
\[
h_{top}(f) = \lim_{\varepsilon \rightarrow 0} \liminf_{T \rightarrow \infty} \frac{1}{T} \log N(f, \varepsilon, T).
\]
So the topological entropy is, as we said in the beginning, the asymptotic rate of information we gain in our ability to distinguish points in \(X\) as we iterate \(f\), in the limit of perfect eyesight!
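To make this concrete, here is a rough numerical experiment in Python. It uses the doubling map \( x \mapsto 2x \bmod 1 \) on the circle, a classic example whose topological entropy is known to equal \( \log 2 \) (one bit per iteration). The code greedily builds \( (\varepsilon, T) \)-separated sets of orbits from a finite grid of starting points, so it only produces lower bounds for \( N(f, \varepsilon, T) \), and the final number is a toy estimate, not a rigorous computation.

```python
import math

def doubling_orbit(x, T):
    """The orbit x, f(x), ..., f^(T)(x) under the doubling map f(x) = 2x mod 1."""
    orbit = []
    for _ in range(T + 1):
        orbit.append(x)
        x = (2 * x) % 1.0
    return orbit

def circle_dist(x, y):
    """Distance between x and y on the circle R/Z."""
    d = abs(x - y) % 1.0
    return min(d, 1.0 - d)

def separated_orbits(eps, T, grid=2000):
    """Greedily count grid points whose orbits are pairwise (eps, T)-separated:
    at some time t <= T they are at least eps apart. The count is a lower
    bound for N(f, eps, T)."""
    chosen = []
    for k in range(grid):
        orb = doubling_orbit(k / grid, T)
        if all(any(circle_dist(p, q) >= eps for p, q in zip(orb, c))
               for c in chosen):
            chosen.append(orb)
    return len(chosen)

# Estimate the exponential growth rate of N(f, eps, T) in T by comparing
# two values of T; the eps-dependent constant cancels in the slope.
eps, T1, T2 = 0.2, 3, 6
n1, n2 = separated_orbits(eps, T1), separated_orbits(eps, T2)
rate = (math.log2(n2) - math.log2(n1)) / (T2 - T1)
print(round(rate, 2))  # an estimate in bits; the true entropy of the doubling map is 1
```

Taking the slope between two values of \(T\), rather than \( \frac{1}{T} \log N \) at a single \(T\), cancels the \( \varepsilon \)-dependent constant; with a finer grid, larger \(T\) and smaller \( \varepsilon \), the estimate should approach \(1\).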

As it turns out, even though we heavily relied on distances in \(X\) throughout this definition, \(h_{top}(f)\) is completely independent of our notion of distance! If we replace our metric, or distance function \(d(x,y)\) with a different one, we will obtain the same result for \(h_{top}\). So the topological entropy really is topological — it has nothing to do with any notion of distance at all.

This is just one of several ways to define topological entropy. There are many others, just as wonderful and surprising, and which only scratch the tip of the iceberg.

References: