Integration is hard.

When we learn calculus, we learn to differentiate before we can integrate. This is despite the fact that, arguably, integration is an “easier” concept. To my mind at least, when I am given a curve in the plane, the notion of an area bounded by this curve is a very straightforward, intuitive thing; while the notion of its “gradient” or “slope” at a point is a much more subtle, or at least less intuitive idea.

But whether these ideas are natural or not, one is certainly *mathematically* and *technically* more difficult than the other. Integration is much more subtle and difficult.

These difficulties highlight the extent to which integration is less a science and more an art form. And in my experience, those difficulties are seen very rarely in high school or undergraduate mathematics, even as students take course after course about calculus and integration. So it is high time we shed some light on this lost art.

### Really existing differentiation

In order to see just how hard integration is, let’s first consider how we learn, and apply, the ideas of differentiation.

When we learn differentiation, we first learn a definition that involves limits and difference quotients — the old chestnut \( \lim_{h \rightarrow 0} \frac{ f(x+h) - f(x) }{ h } \). We pass through a discussion of chords and tangents — perhaps even supplemented with some physical intuition about average and instantaneous velocity. From this we have a “first principles” approach to calculus, using the formula \( f'(x) = \lim_{h \rightarrow 0} \frac{ f(x+h) - f(x) }{ h } \).

This formula, and the whole “first principles” approach, is then promptly forgotten. After we learn the “first principles” of calculus, we then learn a series of rules, techniques and tricks, such as the *product rule*, *quotient rule* and *chain rule*. Using these, combined with a few other “basic” derivatives, most students will never need the “first principles” again.

More specifically, once we know how to differentiate basic functions like polynomials, trig functions and exponentials,

$$

\text{e.g.} \quad \frac{d}{dx} (x^n) = nx^{n-1}, \quad \frac{d}{dx} e^x = e^x, \quad \frac{d}{dx} \sin x = \cos x

$$

and we know the rules for how to differentiate their products, quotients, and compositions

$$

\text{e.g.} \quad

\frac{d}{dx} f(x)g(x) = f'(x) g(x) + f(x) g'(x), \quad

\frac{d}{dx} \frac{f(x)}{g(x)} = \frac{ f'(x) g(x) - f(x) g'(x) }{ g(x)^2 }, \quad

\frac{d}{dx} f(g(x)) = f'(g(x)) \; g'(x)

$$

we can forget all about “first principles” and mechanically apply these formulae. With some basics down, and armed with the trident of product, quotient, and chain rules, then, we can differentiate most functions we’re likely to come up against.

It turns out, then, that in a certain sense, differentiation is “easy”. You don’t need to know the theory so much as a few basic rules and techniques. And although these rules can be a bit technically demanding, you can use them in a fairly straightforward way. In fact, their use is *algorithmic*: with the techniques sufficiently down, differentiation becomes a purely mechanical process.
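To see just how mechanical it is, here is a little Python sketch (my own illustration, not from any standard library) in which the product, quotient and chain rules are implemented directly, so that derivatives come out by blind rule-following:

```python
import math

# A minimal sketch of why differentiation is mechanical: "dual numbers"
# carry a value together with a derivative, and the product, quotient
# and chain rules become ordinary arithmetic on them.
class Dual:
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, o):  # sum rule: (f + g)' = f' + g'
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__

    def __mul__(self, o):  # product rule: (fg)' = f'g + fg'
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
    __rmul__ = __mul__

    def __truediv__(self, o):  # quotient rule: (f/g)' = (f'g - fg')/g^2
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val / o.val,
                    (self.der * o.val - self.val * o.der) / o.val**2)

def sin(d):  # chain rule: (sin g)' = cos(g) * g'
    return Dual(math.sin(d.val), math.cos(d.val) * d.der)

def exp(d):  # chain rule: (e^g)' = e^g * g'
    return Dual(math.exp(d.val), math.exp(d.val) * d.der)

def derivative(f, x):
    # seed with derivative 1 (the derivative of x itself) and rule-follow
    return f(Dual(x, 1.0)).der

# d/dx [x sin x] = sin x + x cos x, evaluated at x = 1
print(derivative(lambda x: x * sin(x), 1.0))
```

Because each rule is purely local, the same mechanism differentiates arbitrarily nested expressions; this is exactly the sense in which differentiation is algorithmic.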

Let’s make this a little more precise. What do we mean by “most functions we’re likely to come across”? What are these functions? We mean the *elementary functions*. We can define these as follows. We start from some “basic” functions: polynomials, rational functions, trigonometric functions and their inverses, exponential functions and logarithms.

$$

\text{E.g.} \quad 31 x^4 - 159 x^2 + 65, \quad \frac{ 2x^5 - 3x + 1 }{ x^3 + 8x^2 - 1 }, \quad \sin x, \quad \cos x, \quad \tan x, \quad \arcsin x, \quad \arccos x, \quad \arctan x, \quad a^x, \quad \log_a x.

$$

We then think of all the functions that you can get by repeatedly adding, subtracting, multiplying, dividing, taking \(n\)’th roots (i.e. square roots, cube roots, etc) and composing these functions. These functions are the elementary ones. They include functions like the following:

$$

\log_2 \left( \frac{ \sqrt[4]{3x^4 - 1} + 2\sin e^x }{ \arcsin (x^{\tan \log_3 x} + \sqrt[7]{ \pi^x - \cos (x^2) } ) - x^x } \right).

$$

(*Aside*: There’s actually a technicality here. Instead of saying that we can take \(n\)’th roots of a function, we should actually say that we can take any function which is a solution of a polynomial equation whose coefficients are existing functions. The \(n\)’th root of a function \( f(x) \), i.e. \( \sqrt[n]{f(x)} \), is a solution \(y\) of the polynomial equation \( y^n - f(x) = 0 \), whose coefficients involve the existing function \( f(x) \). That is, you can take an algebraic extension of the function field. Having done this, you can find the derivative of the new function using implicit differentiation. But we will not worry too much about these technicalities.)

Actually, the above definition is not really a very efficient one. If you start from just the constant real functions and the function \(x\), then you can build a lot just from them! By repeatedly adding and multiplying \(x\)s and constants, you can build any polynomial; and then by dividing polynomials you can build any rational function. If you throw in \( e^x \) and \( \ln x = \log_e x \), then you also have all the other exponential and logarithmic functions, because for any (positive real) constant \(a\),

$$

a^x = e^{x \log_e a}

\quad \text{and} \quad

\log_a x = \frac{ \log_e x }{ \log_e a},

$$

and \(\log_e a\) is a constant! If you allow yourself to also use complex number constant functions, then you can build the trig functions out of exponentials,

$$

\sin x = \frac{ e^{ix} - e^{-ix} }{ 2i }, \quad

\cos x = \frac{ e^{ix} + e^{-ix} }{ 2 },

$$

and then you have \( \tan x = \frac{\sin x }{ \cos x } \). You can also build hyperbolic trigonometric functions if you wish, since \( \sinh x = \frac{ e^x - e^{-x} }{2} \), \( \cosh x = \frac{ e^x + e^{-x} }{2} \), and \( \tanh x = \frac{ \sinh x }{ \cosh x } \).
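If you want to convince yourself of these identities without any trigonometric manipulation, a quick numerical check in Python (my own snippet, standard library only) does the job at a sample point:

```python
import cmath
import math

# Spot-checking the exponential formulas for sin, cos and tanh at one
# sample point. cmath handles the complex arithmetic.
x = 1.234
sin_via_exp = (cmath.exp(1j * x) - cmath.exp(-1j * x)) / (2 * 1j)
cos_via_exp = (cmath.exp(1j * x) + cmath.exp(-1j * x)) / 2
tanh_via_exp = (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

# the imaginary parts of the complex expressions vanish (up to rounding)
print(sin_via_exp.real, math.sin(x))
print(cos_via_exp.real, math.cos(x))
```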

The formulas above for \(\sin x\) and \(\cos x\) are relatively well known if you’ve studied complex numbers; a little less well-known are the formulas that allow us to express *inverse trigonometric* functions in terms of complex numbers, together with logarithms and square roots:

$$

\arcsin x = -i \; \ln \left( ix + \sqrt{1-x^2} \right), \quad

\arccos x = i \; \ln \left( x - i \sqrt{1-x^2} \right), \quad

\arctan x = \frac{i}{2} \; \left( \ln (1-ix) - \ln (1+ix) \right).

$$

(If you haven’t seen these before, try to prove them! There are also logarithmic functions for inverse hyperbolic trigonometric functions, which are probably slightly more well known as they don’t have complex numbers in them.)
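Before proving them, you can at least check them numerically. Here is a quick Python snippet (my own, standard library only) verifying all three at a sample point, where the principal branches of `log` and `sqrt` suffice:

```python
import cmath
import math

# Numerically checking the logarithmic formulas for the inverse
# trigonometric functions at x = 0.3 (principal branches).
x = 0.3
arcsin_f = -1j * cmath.log(1j * x + cmath.sqrt(1 - x**2))
arccos_f = 1j * cmath.log(x - 1j * cmath.sqrt(1 - x**2))
arctan_f = (1j / 2) * (cmath.log(1 - 1j * x) - cmath.log(1 + 1j * x))

# each result is real up to rounding, and matches the usual function
print(arcsin_f.real, math.asin(x))
print(arccos_f.real, math.acos(x))
print(arctan_f.real, math.atan(x))
```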

Thus, we can define an elementary function as *a function which can be built from the functions \( \{ \text{complex constants}, x, e^x, \ln x \} \) using a finite number of additions, subtractions, multiplications, divisions, compositions, and \(n\)’th roots* (or really, solving polynomial equations in existing functions but don’t worry about this bit in parentheses).

The point is, that if you are good enough at the product, chain, and quotient rules, **you can differentiate any elementary function**. You don’t need any more tricks, though you might need to apply the rules very carefully and many times over! A further point is that when you find the answer, you find that **the derivative of an elementary function is another elementary function**.

### Not so elementary, my dear Watson

When we come to integration, though, everything becomes much more difficult. I’m only going to discuss *indefinite integration*, i.e. *antidifferentiation*. Definite integration with terminals just ends up giving you a number, but indefinite integration is essentially the inverse problem to differentiation. If we’re asked to find the indefinite integral \( \int f(x) \; dx \), we’re asked to find a function \(g(x)\) whose derivative is \(f(x)\), i.e. such that \( g'(x) = f(x)\). There are many such functions: if you have one such function \(g(x)\), then you can add any constant \(c\) to it, and the resulting function \(g(x)+c\) also has derivative \(f(x)\); that is why we tend to write \(+c\) at the end of the answer to any indefinite integration question. But it will suffice for us, here, to be able to find one — for the sake of simplicity, I will not write \(+c\) in the answers to indefinite integrals. In doing so I lose 1 mark for every integral I solve, but I don’t care!

We start with some basic functions like polynomials and trigonometric functions, exponentials and logarithms. Some integrals are standard:

$$

\int x^n \; dx = \frac{1}{n+1} x^{n+1} \quad (n \neq -1), \quad

\int \sin x \; dx = -\cos x, \quad

\int \cos x \; dx = \sin x, \quad

\int e^x \; dx = e^x.

$$

Some are slightly less standard:

$$

\int \tan x \; dx = -\ln \cos x, \quad

\int \ln x \; dx = x \ln x - x.

$$

(You might complain that the integral of \(\tan x\) should actually be \( -\ln | \cos x | \). You’d be right, and I am totally sweeping that technicality under the carpet!)

Some inverse trigonometric integrals, perhaps, are less standard again:

$$

\int \arcsin x \; dx = x \arcsin x + \sqrt{1-x^2}, \quad

\int \arccos x \; dx = x \arccos x - \sqrt{1-x^2}, \quad

\int \arctan x \; dx = x \arctan x - \frac{1}{2} \ln (1+x^2).

$$
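None of these formulas needs to be taken on faith: since indefinite integration is just antidifferentiation, each one can be sanity-checked by differentiating numerically. A quick Python sketch (my own, standard library only):

```python
import math

# If G is an antiderivative of f, the central difference quotient of G
# should approximate f. Checking two of the formulas above this way.
def num_deriv(G, x, h=1e-6):
    return (G(x + h) - G(x - h)) / (2 * h)

G_arctan = lambda x: x * math.atan(x) - 0.5 * math.log(1 + x**2)
G_arcsin = lambda x: x * math.asin(x) + math.sqrt(1 - x**2)

print(num_deriv(G_arctan, 0.7), math.atan(0.7))
print(num_deriv(G_arcsin, 0.5), math.asin(0.5))
```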

So far, so good — although perhaps not always obvious! But now, in general, what if we start to combine these functions? The problem is that if you know how to integrate \(f(x)\) and you know how to integrate \(g(x)\), it does *not* follow that you know how to integrate their product \(f(x) g(x) \). This is in contrast to differentiation: if you know how to *differentiate* \(f(x)\) and \(g(x)\), then you can use the product rule to differentiate \(f(x)g(x)\). There is no product rule for integration!

The product rule for differentiation, rather, translates into the *integration by parts* formula for integration:

$$

\int f(x) g'(x) \; dx = f(x) g(x) - \int f'(x) g(x) \; dx.

$$

This is not a formula for \( \int f(x) g(x) \; dx \)! A product rule for integration would say to you “if you can integrate both of my factors, you can integrate me!” But this integration by parts formula says something more along the lines of “if you can integrate one of my factors and differentiate the other, then you can express me in terms of the integral obtained by integrating and differentiating those two factors”. That is a much more subtle statement. A product rule would be a hammer you could use to crack integrals; but the integration formula is a much more subtle card up your sleeve.

Essentially, integration by parts supplies you with a trick which, if you are clever enough, and the integral is conducive to it, you can use to rewrite the integral in terms of a different integral which is hopefully easier. Hopefully. While the product rule for differentiation is an all-purpose tool of the trade — a machine used to calculate derivatives — integration by parts is a subtle trick which, when wielded with enough sophistication and skill, can simplify (rather than calculate) integrals.
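To see integration by parts earn its keep, here is a small numerical illustration in Python (my own sketch, using a crude midpoint rule): taking \( f(x) = x \) and \( g(x) = e^x \) on \([0,1]\), both sides of the formula agree.

```python
import math

# Integration by parts on [0, 1] with f(x) = x, g(x) = e^x:
#   int f g' dx  =  [f g]  -  int f' g dx.
def midpoint(F, a, b, n=200000):
    # crude midpoint-rule approximation to a definite integral
    h = (b - a) / n
    return sum(F(a + (k + 0.5) * h) for k in range(n)) * h

lhs = midpoint(lambda x: x * math.exp(x), 0.0, 1.0)
boundary = 1.0 * math.exp(1.0) - 0.0 * math.exp(0.0)
rhs = boundary - midpoint(lambda x: math.exp(x), 0.0, 1.0)

print(lhs, rhs)  # both should be close to 1
```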

Similarly, *there is no chain rule* for integration. The chain rule for differentiation translates into the *integration by substitution* formula for integration:

$$

\int f'(g(x)) \; g'(x) \; dx = f(g(x)).

$$

A chain rule for integration would say to you “if I am a composition of two functions, and you can integrate both of them, then you can integrate me”. But integration by substitution says, instead, “if I am a composition of two functions, multiplied by the derivative of the inner function, then you can integrate me”. In a certain sense it’s easier than integration by parts, because it *calculates* the integral and *gives you an answer*, rather than merely reducing to a different (hopefully simpler) integral. But still, it remains an art form: it requires the skill to see how to regard the integrand as an expression of the form \( f'(g(x)) \; g'(x) \). Finally, *there is no quotient rule* for integration either.
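Substitution, too, can be watched in action numerically. In this little Python check (my own snippet), the integrand \( \cos(x^2) \cdot 2x \) has the form \( f'(g(x)) \, g'(x) \) with \( f = \sin \) and \( g(x) = x^2 \), so its definite integral over \([0,1]\) should be \( \sin(1) - \sin(0) \):

```python
import math

# Integration by substitution: the integrand cos(x^2) * 2x is
# f'(g(x)) g'(x) with f = sin, g(x) = x^2, so the integral over
# [0, 1] should equal sin(1^2) - sin(0^2) = sin(1).
def midpoint(F, a, b, n=100000):
    h = (b - a) / n
    return sum(F(a + (k + 0.5) * h) for k in range(n)) * h

approx = midpoint(lambda x: math.cos(x**2) * 2 * x, 0.0, 1.0)
print(approx, math.sin(1.0))
```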

So, while differentiation is a skill which can be learned and applied, integration is an art form for which we learn tricks and strategies, and develop our skills and intuition in applying them. Now, actually there are tables of standard integrals, far far beyond the small examples above. There are theorems about how functions of certain types can be integrated. There are algorithms which can be used to integrate certain, often very complicated, families of functions.

But the question remains: how far can we go? If we see an integral which we can’t immediately solve, do we just need to think a little harder, and apply something from our bag of tricks in a clever new way? Do we just need more skill, or is the integral *impossible*? How would we tell the difference between a “hard” and an “impossible” integral — and what does that even mean?

In a certain sense, no integrals are “impossible”. If you’ve got a continuous function \(f : \mathbb{R} \rightarrow \mathbb{R}\), then its integral certainly exists *as a function*, defined using Riemann sums — this is a theorem. Even if \(f\) is not continuous, it’s possible that the Riemann sum approach can give a well-defined function as the integral. For more exotic functions \(f\), there is the more advanced method of *Lebesgue integration*.

But this is not what we have in mind when we say an “integral is impossible”. What we really mean is that we can’t write a nice formula for the integral. This would happen if the result were *not an elementary function*.

As we discussed above, if you take an elementary function and differentiate it, you can always calculate the derivative with a sufficiently careful application of product/chain/quotient rules, and the result is *another elementary function*.

So, we might ask: given an elementary function, even though there might not be any straightforward way to calculate its integral, is the result always *another elementary function?*

### Indomitable impossible integrals

It turns out, the answer is **no**. There are elementary functions such that, when you take their integral, it is not an elementary function. When you try to integrate such a function, although the integral exists, you can’t write a nice formula for it. And it’s not because you’re not skillful enough. It’s not because you’re not smart enough. The reason you can’t write a nice formula for the integral is because *no such formula exists*: the integral is not an elementary function.

What is an example of such a function? The simplest example is one that high school students come up against all the time: the *Gaussian function*

$$

e^{-x^2}.

$$

It’s clearly an elementary function, constructed by composition of a polynomial \(-x^2\) and the exponential function. But its integral is not elementary.

You might recall that the graph of \(y = e^{-x^2} \) is a bell curve. Suitably dilated (normalised), it is the probability density function for a normal distribution. When you calculate probabilities involving normally distributed random variables, you often integrate this function.

You may recall painful time spent in high school looking up a table to find out probabilities for the normal distribution. That table is essentially a table of (definite) integrals for the function \(e^{-x^2}\) (or a closely related function). And the reason that it’s a table you have to look up, rather than a formula, is because *there is no formula* for the integral \( \int e^{-x^2} \; dx \). You need a table because the integral of the elementary function \(e^{-x^2} \) is not elementary.
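In fact the modern version of that table is one function call away. Here is a small Python illustration (my own, standard library only): `math.erf` computes \( \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} \, dt \), and since there is no elementary formula, we can only reproduce it by brute-force numerical integration.

```python
import math

# The normal-distribution "table" is essentially numerical integration
# of e^{-x^2}. We reproduce math.erf with a crude midpoint Riemann sum,
# which is all one can do absent an elementary formula.
def erf_by_riemann(x, n=100000):
    h = x / n
    return (2 / math.sqrt(math.pi)) * h * sum(
        math.exp(-((k + 0.5) * h) ** 2) for k in range(n))

print(erf_by_riemann(1.0), math.erf(1.0))
```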

There’s no formula for normal distribution probabilities because integration is an art form, rather than algorithmic. And so we are sometimes reduced to the quite non-artistic process of looking up a table to find the integral.

Now, when I say that \( \int e^{-x^2} \; dx \) is not elementary, I mean that it’s known as a theorem. That is, it has been proved mathematically that \( \int e^{-x^2} \; dx \) is not elementary, and so doesn’t have a nice formula. But what could this mean? How could you *prove* that an integral doesn’t have a nice formula, isn’t an elementary function, *can’t* be written in a nice way? The proof is a bit complicated, too complicated to recall in complete detail here. But there are some nice ideas involved, and it’s worth recounting some of them here.

### Proving the impossible

The fact that \( \int e^{-x^2} \; dx \) is not elementary was proved by the French mathematician Joseph Liouville in the mid-19th century. In fact, he proved a good deal more. Suppose you have an elementary function \( f(x) \), and you are trying to find its integral \( g(x) = \int f(x) \; dx \). Now as the integrand \(f(x)\) is continuous, the integral \(g(x)\) certainly exists as a continuous function; the question is whether \(g(x)\) is elementary or not, i.e. whether there is a formula for \(g(x)\) involving only complex numbers, powers of \(x\), rational functions, \(\exp\) and \(\log\), and \(n\)’th roots (and their generalisations).

Liouville’s theorem, amazingly, tells you that *if* the function \(g(x)\) you’re looking for *is* elementary, then it must have a very specific form. Very roughly, Liouville says, \(g(x)\) can have *more logarithms* than \(f(x)\), but *no more exponentials*. You can see the germ of this idea in some of the integrals above:

$$

\int \tan x \; dx = -\ln \cos x, \quad

\int \arctan x \; dx = x \arctan x - \frac{1}{2} \ln (1+x^2).

$$

In these integrals, a new logarithm appears, that did not appear in the integrand. Never does a new exponential appear. If an exponential appears in the integral, then it appeared in the integrand, as in examples like

$$

\int e^x \; dx = e^x.

$$

To state Liouville’s theorem more precisely, we need the idea of a *field of functions*. For our purposes, we can think of a field of functions as a collection of functions \(f(x)\) which is closed under addition, subtraction, multiplication, and division. The polynomials in \(x\) do *not* form a field of functions, because when you divide two polynomials you do not always get a polynomial! However, the **rational functions** in \(x\) *do* form a field of functions. A rational function in \(x\) is the quotient of two polynomials (with complex coefficients) in \(x\), i.e. a function like

$$

\frac{ 3x^2 - 7 }{ x^{10} - x^9 + x^3 + 1}

\quad \text{ or } \quad

\frac{ 4x+1}{2x-3}

\quad \text{or} \quad

x^2 - 3x + \pi

\quad \text{or} \quad

3.

$$

The first example is the quotient of a quadratic by a 10’th degree polynomial; the second example is the quotient of two linear polynomials. The third example illustrates the notion that any polynomial is also a rational function, because you can think of it as itself divided by \(1\), and \(1\) is a polynomial: \( x^2 - 3x + \pi = \frac{ x^2 - 3x + \pi }{ 1 } \). The final example illustrates the notion that any constant is also a rational function.

The field of rational functions (with complex coefficients) in \(x\) is denoted \( \mathbb{C}(x) \). You can make bigger fields of rational functions by including new elements! For instance, you could throw in the exponential function \( e^x \), and then you can obtain the larger field of functions \( \mathbb{C}(x, e^x) \). The functions in this field are those made up of adding, subtracting, multiplying, and dividing powers of \(x\) and the function \( e^x\). So this includes functions like

$$

\frac{ x^2 + x e^x - e^x }{ x + 1 }

\quad \text{ or } \quad

(e^x) \cdot (e^x) = e^{2x}

\quad \text{ or } \quad

x^7 e^{4x} - 3 x^2 e^x + \pi.

$$

Note however that a function like \( e^{e^x} \) does *not* lie in \( \mathbb{C} (x, e^x) \). This function field is made up out of adding, subtracting, multiplying and dividing \(x\) and \( e^x \), but *not* by composing these functions.

We can see, then, that this second function field is bigger than the first one: \( \mathbb{C}(x) \subset \mathbb{C}(x, e^x) \). In technical language, we say that \( \mathbb{C}(x) \subset \mathbb{C}(x, e^x) \) is a *field extension*. Moreover, both these fields have the nice property that they are *closed under differentiation*. That is, if you take a rational function and differentiate it, you get another rational function. And if you take a function in \( \mathbb{C}(x, e^x) \), involving \(x\)’s and \(e^x\)’s, and differentiate it, you get another function in \( \mathbb{C}(x, e^x) \). In technical language, we say that \( \mathbb{C}(x) \) and \( \mathbb{C}(x, e^x) \) are *differential fields of functions*.
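As a tiny worked example of closure under differentiation, take the element \( \frac{e^x}{x} \) of \( \mathbb{C}(x, e^x) \). The quotient rule gives

$$

\frac{d}{dx} \left( \frac{e^x}{x} \right) = \frac{ x e^x - e^x }{ x^2 },

$$

which is again built from \(x\) and \(e^x\) by adding, subtracting, multiplying and dividing, so it lies in \( \mathbb{C}(x, e^x) \) too.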

A differential field obtained in this way, by starting from rational functions and then throwing in an exponential, is an example of a *field of elementary functions*. In general, a field of elementary functions is obtained from the field of rational functions \( \mathbb{C}(x) \) by successively throwing in extra functions, some finite number of times. Each time you add a function it must be either

- an exponential of a function already in the field, or
- a logarithm of a function already in the field, or
- an \(n\)’th root of a function already in the field (or more generally the root of a polynomial equation with coefficients in the field but as I keep saying don’t worry too much about this!).

Note that, by *definition*, any function in a field of elementary functions is made up by adding, subtracting, multiplying, and dividing \(x\)’s, and exponentials, and logarithms, and \(n\)’th roots (or generalisations thereof). That is, a function in a field of elementary functions is an elementary function! So our definitions of “elementary function” and “field of elementary functions” agree — it would be bad if we used the word “elementary” to mean two different things!

We can now state Liouville’s theorem precisely.

**Liouville’s theorem**: Let \(K\) be a field of elementary functions, and let \(f(x)\) be a function in \(K\). (Hence \(f(x)\) is an elementary function.) If the integral \( \int f(x) \; dx \) is elementary, then

$$

\int f(x) \; dx = h(x) + \sum_{j=1}^n c_j \log g_j (x),

$$

where \(n\) is a non-negative integer, each of \(c_1, c_2, \ldots, c_n\) is a constant, and the functions \(h(x), g_1(x), g_2(x), \ldots, g_n(x) \) all lie in \(K\).

That is, Liouville’s theorem says that the integral of an elementary function \(f(x)\) must be a sum of a function \(h(x)\) that lies in the same field as \(f\), and a constant linear combination of some logarithms of functions \( g_j(x)\) in the same field as \(f\). The fact that \(h(x) \) and each \(g_j(x)\) lies in the same field \(K\) as \(f(x)\) means that they cannot be much more complicated than \(f(x)\): they must be made up by adding, subtracting, multiplying and dividing the same bunch of functions that you can use to define \(f(x)\).

So Liouville’s theorem says, in a precise way, that when you integrate an elementary function \(f(x)\), *if* the result is elementary, *then* it can’t be much more complicated than \(f(x)\), *and* the only way in which it can be more complicated is that it can have some logarithms in it. This is what we meant when we gave the very rough description “Liouville says \(g(x)\) can have *more logarithms* than \(f(x)\), but *no more exponentials*”.

Let’s now return to our specific example of the Gaussian function \(f(x) = e^{-x^2}\). What does Liouville’s theorem mean for this function? Well, this function lies in the field of elementary functions \(K\) where we start from rational functions and then throw in, not \(e^x\), but \(e^{-x^2}\). That is, we can take \(K = \mathbb{C}(x, e^{-x^2})\).

The theorem says that *if* the integral

$$

\int e^{-x^2} \; dx

$$

is elementary, *then* it is given by

$$

\int e^{-x^2} \; dx = h(x) + \sum_{j=1}^n c_j \log g_j (x),

$$

where \(n\) is a non-negative integer, each of \( c_1, c_2, \ldots, c_n\) is a constant, and the functions \( h(x), g_1(x), g_2(x), \ldots, g_n(x) \) all lie in \(\mathbb{C}(x, e^{-x^2})\). That is, \(h(x), g_1(x), \ldots, g_n(x)\) are “no more complicated” than \(e^{-x^2}\); they are all made by adding, subtracting, multiplying and dividing \(x\)’s and \(e^{-x^2}\)’s.

If we differentiate the above equation, we obtain

$$

e^{-x^2} = h'(x) + \sum_{j=1}^n c_j \frac{ g_j'(x) }{ g_j(x) }.

$$

On the left hand side is the function we started with, \(e^{-x^2}\). On the right hand side is an expression involving several functions. However, all the functions \(g_j(x)\) and \(h(x)\) lie in \(K\); they are “no more complicated” than \(e^{-x^2}\). Now as \(K\) is a *differential field*, their derivatives \(g_j'(x)\) and \(h'(x)\) also lie in \(K\); they are *also* “no more complicated”. So in fact the right hand side is an expression involving functions no more complicated than \(e^{-x^2}\): they are all just rational functions, with \(e^{-x^2}\)’s thrown in.

If you think about what you will get for each \(g_j'(x) / g_j(x)\), you might find it hard to avoid having a big denominator; you likely won’t be able to cancel the fraction. So you might find, then, that none of the \(g_j(x)\) can make this equality work, and all the logarithmic terms have to vanish; or in other words, \(e^{-x^2} = h'(x)\).

But recall that \(h(x)\) is, like everything else here, made up by adding, subtracting, multiplying and dividing \(x\)’s and \(e^{-x^2}\)’s. You might find, when you differentiate such a function, that it’s very hard to get a lone \(e^{-x^2}\). Every time you differentiate an \(e^{-x^2}\) you get a \(-2xe^{-x^2}\), which has a pesky extra factor of \(-2x\). And even if it appears together with other terms, as something like \(x^3 e^{-x^2}\), when you differentiate it you get something like \(-2x^4 e^{-x^2} + 3x^2 e^{-x^2}\), which still has no isolated \(e^{-x^2}\) term. And so, in conclusion, you might find it very difficult to find any functions that make the right hand side equal to \(e^{-x^2}\).
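That last computation is easy to verify. Here is a quick numerical confirmation in Python (my own snippet, standard library only):

```python
import math

# Checking the derivative computed in the text: for h(x) = x^3 e^{-x^2},
# h'(x) = (3x^2 - 2x^4) e^{-x^2}. Every term still carries a power of x
# in front of e^{-x^2}; no lone e^{-x^2} appears.
def h(x):
    return x**3 * math.exp(-x**2)

def h_prime(x):
    return (3 * x**2 - 2 * x**4) * math.exp(-x**2)

x0 = 0.9
numeric = (h(x0 + 1e-6) - h(x0 - 1e-6)) / 2e-6  # central difference
print(numeric, h_prime(x0))
```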

Of course, this is not a proof at all; it’s a mere plausibility argument. To prove the integral is not elementary does take a bit more work. But it has been done, and can be found in standard references.

Hopefully, though, this should at least give you some idea why it might be true, and how you might prove, that an integral is “impossible”, and can’t be written with any nice formula.

Mathematics is an amazing place.

### References

Brian Conrad, *Impossibility theorems for elementary integration*, available at http://www2.maths.ox.ac.uk/cmi/library/academy/LectureNotes05/Conrad.pdf.

Keith O. Geddes, Stephen R. Czapor, George Labahn, *Algorithms for Computer Algebra*, Kluwer (1992).

Andy R. Magid, *Lectures on Differential Galois Theory*, AMS (1994).
