From the outside it might seem like physics and mathematics are a match made in heaven. In practice, it feels more like physicists are given a very short blanket made of math, and when they stretch it to cover their heads, their feet are freezing, and vice versa.

Physicists turn reality into numbers. They process these numbers using mathematics, and turn them into predictions about other numbers. The mapping between physical reality and mathematical models is not at all straightforward. It involves a lot of arbitrary choices. When we perform an experiment, we take the readings of our instruments and create one particular parameterization of nature. There usually are many equivalent parameterizations of the same process and this is one of the sources of redundancy in our description of nature. The Universe doesn’t care about our choice of units or coordinate systems.

This indifference, after we plug the numbers into our models, is reflected in symmetries of our models. A change in the parameters of our measuring apparatus must be compensated by a transformation of our model, so that the results of calculations still match the outcome of the experiment.

But there is an even deeper source of symmetries in physics. The model itself may introduce additional redundancy in order to simplify the calculations or, sometimes, make them possible. It is often necessary to use parameter spaces that allow the description of non-physical states–states that could never occur in reality.

Computer programmers are familiar with such situations. For instance, we often use integers to access arrays. But an integer can be negative, or it can be larger than the size of the array. We could say that an integer can describe “non-physical” states of the array. We also have freedom of parameterization of our input data: we can encode true as one, and false as zero; or the other way around. If we change our parameterization, we must modify the code that deals with it. As programmers we are very well aware of the arbitrariness of the choice of representation, but it’s even more true in physics. In physics, these reparameterizations are much more extensive and they have their mathematical description as groups of transformations.
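To make the programming analogy concrete, here is a minimal sketch. All the names in it (`ENCODING_A`, `count_true`, `reencode`, `safe_get`) are made up for illustration. It shows that the code has to know which encoding is in use, while the answer we care about does not depend on that choice.

```python
# Two equivalent encodings of the same data: which integer stands for "true"
# is an arbitrary choice (hypothetical names, for illustration only).
ENCODING_A = {True: 1, False: 0}
ENCODING_B = {True: 0, False: 1}


def count_true(encoded, encoding):
    """Count the true entries; the code must know which encoding is in use."""
    return sum(1 for value in encoded if value == encoding[True])


def reencode(encoded, old, new):
    """The 'reparameterization': switch the data from one encoding to another."""
    decode = {code: flag for flag, code in old.items()}
    return [new[decode[value]] for value in encoded]


def safe_get(array, index):
    """Reject the 'non-physical' indices that an integer is able to express."""
    if not 0 <= index < len(array):
        raise IndexError(f"index {index} does not describe an element")
    return array[index]


readings = [True, False, True, True]
data_a = [ENCODING_A[r] for r in readings]
data_b = reencode(data_a, ENCODING_A, ENCODING_B)

# The "observable" is the same in both parameterizations.
assert count_true(data_a, ENCODING_A) == count_true(data_b, ENCODING_B)
```

The stored numbers change completely under `reencode`, but nothing we can ask about the readings does.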

But what we see in physics is very strange: the non-physical degrees of freedom introduced through redundant parameterizations turn out to have some measurable consequences.

Symmetries

If you ask physicists what the foundations of physics are, they will probably say: symmetry. Depending on their area of research, they will start talking about various symmetry groups, like SU(3), U(1), SO(3,1), general diffeomorphisms, etc. The foundations of physics are built upon fields and their symmetries. For physicists this is such an obvious observation that they assume that the goal of physics is to discover the symmetries of nature. But are symmetries the property of nature, or are they the artifact of our tools? This is a difficult question, because the only way we can study nature is through the prism of mathematics. Mathematical models of reality definitely exhibit lots of symmetries, and it’s easy to confuse this with the statement that nature itself is symmetric.

But why would models exhibit symmetry? One explanation is that symmetries are the effect of redundant descriptions.

I’ll use the example of electromagnetism because of its relative simplicity (some of the notation is explained in the Appendix), but the redundant degrees of freedom and the symmetries they generate show up everywhere in physics. The Standard Model is one big gauge theory, and Einstein’s General Relativity is built on the principle of invariance with respect to local coordinate transformations.

Electromagnetic field

Maxwell’s equations are a mess, until you rewrite them using 4-dimensional spacetime. The two vector fields, the electric field and the magnetic field, are combined into one 4-dimensional antisymmetric tensor F^{\mu \nu}:

F^{\mu\nu} = \begin{bmatrix} 0 & -E_x & -E_y & -E_z \\ E_x & 0 & -B_z & B_y \\ E_y & B_z & 0 & -B_x \\ E_z & -B_y & B_x & 0 \end{bmatrix}

Because of antisymmetry, F^{\mu \nu} has only six independent components. The components of F^{\mu \nu} are physical fields that can be measured using test charges and magnetic needles.
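As a small illustration, the packing of the two 3-vectors into the tensor can be written out directly. This is only a sketch: the function names `field_tensor` and `split_fields` and the sample field values are my own choices, and the layout simply copies the matrix above.

```python
def field_tensor(E, B):
    """F^{mu nu} from the 3-vectors E and B, laid out like the matrix above."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [
        [0.0, -Ex, -Ey, -Ez],
        [Ex, 0.0, -Bz, By],
        [Ey, Bz, 0.0, -Bx],
        [Ez, -By, Bx, 0.0],
    ]


def is_antisymmetric(F):
    return all(F[m][n] == -F[n][m] for m in range(4) for n in range(4))


def independent_components(F):
    """The entries above the diagonal determine the whole tensor."""
    return [F[m][n] for m in range(4) for n in range(m + 1, 4)]


def split_fields(F):
    """Recover (E, B) from the entries above the diagonal."""
    E = (-F[0][1], -F[0][2], -F[0][3])
    B = (-F[2][3], F[1][3], -F[1][2])
    return E, B


F = field_tensor(E=(1.0, 2.0, 3.0), B=(4.0, 5.0, 6.0))
assert is_antisymmetric(F)
assert len(independent_components(F)) == 6
```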

The derivatives of these fields satisfy two sets of Maxwell’s equations. The first set of four describes the dependence of fields on sources—electric charges and currents:

\partial_{\mu} F^{\mu \nu} = j^{\nu}

The second set of four equations describes constraints imposed on these fields:

\partial_{[\rho} F_{\mu \nu ]} = 0

For a particular set of sources and an initial configuration, we could try to solve these equations numerically. A brute force approach would be to divide space into little cubes, distribute our charges and currents between them, replace differential equations with difference equations, and turn on the crank.

First, we would check if the initial field configuration satisfied the constraints. Then we would calculate time derivatives of the fields. We would turn time derivatives into time differences by multiplying them by a small time period, get the next configuration, and so on. With the size of the cubes and the quantum of time small enough, we could get a reasonable approximation of reality. A program to perform these calculations isn’t much harder to write than a lot of modern 3-d computer games.

Notice that this procedure has an important property. To calculate the value of a field in a particular cube, it’s enough to know the values at its nearest neighbors and its value at the previous moment of time. The nearest-neighbor property is called locality and the dependence on the past, as opposed to the future, is called causality. The famous Conway’s Game of Life, like any cellular automaton, is local and causal.
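Here is a stripped-down sketch of such a brute-force calculation, reduced to one space dimension and to empty space (no charges or currents). It assumes a plane wave with only E_y and B_z components, units with c = 1, and a staggered leapfrog update; the grid sizes and the function names `step` and `simulate` are arbitrary choices of mine.

```python
def step(E, B, ratio):
    """Advance the fields by one time step; ratio = dt / dx (stable for ratio <= 1)."""
    n = len(E)
    # dB/dt = -dE/dx: each new B value uses its old value and the two adjacent E values.
    new_B = [B[i] - ratio * (E[i + 1] - E[i]) for i in range(n - 1)]
    # dE/dt = -dB/dx: likewise for E; the two end points are held at zero.
    new_E = [0.0] * n
    for i in range(1, n - 1):
        new_E[i] = E[i] - ratio * (new_B[i] - new_B[i - 1])
    return new_E, new_B


def simulate(n_cells, n_steps, ratio=0.5):
    """Start from a disturbance in a single cell and evolve it."""
    center = n_cells // 2
    E = [0.0] * n_cells
    B = [0.0] * (n_cells - 1)  # B lives half-way between neighboring E points
    E[center] = 1.0
    for _ in range(n_steps):
        E, B = step(E, B, ratio)
    return E, B


E, B = simulate(n_cells=101, n_steps=10)
# How far from the center (cell 50) has the disturbance spread?
reach = max(abs(i - 50) for i, value in enumerate(E) if value != 0.0)
```

Each update reads only adjacent cells and the previous time slice, so after ten steps nothing farther than ten cells from the starting point has changed. That is locality and causality in their most literal form.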

We were very lucky to be able to formulate a model that pretty well approximates reality and has these properties. Without such models, it would be extremely hard to calculate anything. Essentially all classical physics is written in the language of differential equations, which means it’s local, and its time dependence is carefully crafted to be causal. But it should be stressed that locality and causality are properties of particular models. And locality, in particular, cannot be taken for granted.

Electromagnetic Potential

The second set of Maxwell’s equations can be solved by introducing a new field, a 4-vector A_{\mu} called the vector potential. The field tensor can be expressed as its anti-symmetrized derivative

F_{\mu \nu} = \partial_{[ \mu} A_{\nu ]}

Indeed, if we take its partial derivative and antisymmetrize the three indices, we get:

\partial_{[\rho} F_{\mu \nu ]} = \partial_{[\rho} \partial_{ \mu} A_{\nu ]} = 0

which vanishes because derivatives are symmetric, \partial_{\mu} \partial_{\nu} = \partial_{\nu} \partial_{\mu}.

Note for mathematicians: Think of A_{\mu} as a connection in the U(1) fiber bundle, and F_{\mu \nu} as its curvature. The second Maxwell equation is the Bianchi identity for this connection.

This field A_{\mu} is not physical. We cannot measure it. We can measure its derivatives in the form of F_{\mu \nu}, but not the field itself. In fact we cannot distinguish between A_{\mu} and the transformed field:

A'_{\mu} = A_{\mu} + \partial_{\mu} \Lambda

Here, \Lambda(x) is a completely arbitrary scalar field that may depend on both position and time. The two potentials are indistinguishable, again, because of the symmetry of partial derivatives:

F_{\mu \nu}' = \partial_{[ \mu} A'_{\nu ]} = \partial_{[ \mu} A_{\nu ]} + \partial_{[ \mu} \partial_{\nu ]} \Lambda = \partial_{[ \mu} A_{\nu ]} = F_{\mu \nu}
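The same cancellation can be watched numerically. The sketch below makes several assumptions of its own: a small periodic lattice in one time and one space dimension, forward differences standing in for partial derivatives, and arbitrary made-up functions for the potential and for \Lambda. Forward differences along different axes commute, just like partial derivatives, so F changes only by rounding error.

```python
import math

N_T, N_X = 8, 8  # size of a small periodic (t, x) lattice with unit spacing


def diff(f, axis):
    """Forward difference of f[t][x] along axis 0 (t) or 1 (x), with wrap-around."""
    if axis == 0:
        return [[f[(t + 1) % N_T][x] - f[t][x] for x in range(N_X)] for t in range(N_T)]
    return [[f[t][(x + 1) % N_X] - f[t][x] for x in range(N_X)] for t in range(N_T)]


def add(f, g):
    return [[f[t][x] + g[t][x] for x in range(N_X)] for t in range(N_T)]


def field_strength(A_t, A_x):
    """The only independent component in 1+1 dimensions: F_{tx} = d_t A_x - d_x A_t."""
    dt_Ax, dx_At = diff(A_x, 0), diff(A_t, 1)
    return [[dt_Ax[t][x] - dx_At[t][x] for x in range(N_X)] for t in range(N_T)]


def grid(fn):
    return [[fn(t, x) for x in range(N_X)] for t in range(N_T)]


def max_abs_difference(f, g):
    return max(abs(f[t][x] - g[t][x]) for t in range(N_T) for x in range(N_X))


# Arbitrary sample potential and an arbitrary Lambda (illustrative choices).
A_t = grid(lambda t, x: math.sin(0.7 * t) * x)
A_x = grid(lambda t, x: math.cos(0.3 * x) + 0.5 * t * t)
Lam = grid(lambda t, x: math.exp(0.1 * t) * math.sin(1.3 * x + t))

# A' = A + d(Lambda), component by component.
transformed_A_t = add(A_t, diff(Lam, 0))
transformed_A_x = add(A_x, diff(Lam, 1))

F = field_strength(A_t, A_x)
F_transformed = field_strength(transformed_A_t, transformed_A_x)

change_in_potential = max_abs_difference(transformed_A_x, A_x)
change_in_field = max_abs_difference(F_transformed, F)
```

The potential changes by an amount of order one, while the field changes only at the level of floating-point noise.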

Adding a derivative of \Lambda is called a gauge transformation, and we can formulate a new law: Physics is invariant under gauge transformations. There is a beautiful symmetry we have discovered in nature.

But wait a moment: didn’t we just introduce this symmetry to simplify the math?

Well, it’s a bit more complicated. To explain that, we have to dive even deeper into technicalities.

The Action Principle

You cannot change the past and you cannot immediately influence far away events. These are the reasons why differential equations are so useful in physics. But there are some types of phenomena that are easier to explain by global rather than local reasoning. For instance, if you stretch an elastic rubber band between two points in space, it will trace a straight line. In this case, instead of diligently solving differential equations that describe the movements of the rubber band, we can guess its final state by calculating the shortest path between two points.

Surprisingly, just like the shape of the rubber band can be calculated by minimizing the length of the curve it follows, so the evolution of all classical systems can be calculated by minimizing (or, more precisely, finding a stationary point of) a quantity called the action. For mechanical systems the action is the integral of the Lagrangian along the trajectory, and the Lagrangian is given by the difference between kinetic and potential energy.

Consider the simple example of an object thrown into the air and falling down due to gravity. Instead of solving the differential equations that relate acceleration to force, we can reformulate the problem in terms of minimizing the action. There is a tradeoff: we want to minimize the kinetic energy while maximizing the potential energy. Potential energy is larger at higher altitudes, so the object wants to get as high as possible in the shortest time and stay there as long as possible before returning to earth. But the faster it tries to get there, the higher its kinetic energy. So it performs a balancing act resulting in a perfect parabola (at least if we ignore air resistance).
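The balancing act can be checked with a few lines of code. This is a rough sketch under my own assumptions: the flight is cut into short time slices, the action is approximated by a sum over slices, and the competing paths are the parabola plus a sine-shaped bump that leaves the end points fixed. The constants and function names are illustrative.

```python
import math

G = 9.81       # gravitational acceleration, m/s^2
MASS = 1.0     # kg
T_TOTAL = 2.0  # total flight time, s
N = 200        # number of time slices


def action(path):
    """Discretized action: sum over slices of (kinetic - potential energy) * dt."""
    dt = T_TOTAL / N
    total = 0.0
    for k in range(N):
        velocity = (path[k + 1] - path[k]) / dt
        height = 0.5 * (path[k] + path[k + 1])
        total += (0.5 * MASS * velocity**2 - MASS * G * height) * dt
    return total


def parabola(t):
    """Height of the thrown object: leaves the ground at t = 0, returns at t = T_TOTAL."""
    return 0.5 * G * t * (T_TOTAL - t)


def perturbed_path(epsilon, mode=1):
    """The parabola plus a bump of size epsilon that vanishes at both end points."""
    dt = T_TOTAL / N
    return [
        parabola(k * dt) + epsilon * math.sin(mode * math.pi * k / N)
        for k in range(N + 1)
    ]


classical_action = action(perturbed_path(0.0))
detours = [action(perturbed_path(eps)) for eps in (-0.5, -0.1, 0.1, 0.5)]
assert all(s > classical_action for s in detours)
```

Whether the detour goes higher or lower than the parabola, the action goes up, which is what it means for the parabola to be the minimum.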

The same principle can be applied to fields, except that the action is now given by a 4-dimensional integral over spacetime of something called the Lagrangian density which, at every point, depends only on fields and their derivatives. This is the classical Lagrangian density that describes the electromagnetic field:

L = - \frac{1}{4} F^{\mu \nu} F_{\mu \nu} = \frac{1}{2}(\vec{E}^2 - \vec{B}^2)

and the action is:

S = \int L(x)\, d^4 x

However, if you want to derive Maxwell’s equations using the action principle, you have to express it in terms of the potential A_{\mu} and its derivatives.
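The equality between the two forms of the Lagrangian density given above is easy to verify by brute force. In this sketch the indices are lowered with the Minkowski metric from the Appendix; the function names and the sample field values are my own.

```python
ETA = [1.0, -1.0, -1.0, -1.0]  # diagonal of the Minkowski metric, signature (+, -, -, -)


def field_tensor(E, B):
    """F^{mu nu} built from the 3-vectors E and B."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [
        [0.0, -Ex, -Ey, -Ez],
        [Ex, 0.0, -Bz, By],
        [Ey, Bz, 0.0, -Bx],
        [Ez, -By, Bx, 0.0],
    ]


def lower_indices(F_up):
    """F_{mu nu} = eta_{mu a} eta_{nu b} F^{a b}; simple because eta is diagonal."""
    return [[ETA[m] * ETA[n] * F_up[m][n] for n in range(4)] for m in range(4)]


def lagrangian_density(E, B):
    """-1/4 F^{mu nu} F_{mu nu}, with both indices summed over."""
    F_up = field_tensor(E, B)
    F_down = lower_indices(F_up)
    return -0.25 * sum(F_up[m][n] * F_down[m][n] for m in range(4) for n in range(4))


E, B = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)
tensor_form = lagrangian_density(E, B)
vector_form = 0.5 * (sum(e * e for e in E) - sum(b * b for b in B))
assert abs(tensor_form - vector_form) < 1e-12
```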

Noether’s Theorem

The first set of Maxwell’s equations describes the relationship between electromagnetic fields and the rest of the world:

\partial_{\mu} F^{\mu \nu} = j^{\nu}

Here “the rest of the world” is summarized in a 4-dimensional current density j^{\nu}. This is all the information about matter that the fields need to know. In fact, this equation imposes additional constraints on the matter. If you differentiate it once more, you get:

\partial_{\nu}\partial_{\mu} F^{\mu \nu} = \partial_{\nu} j^{\nu} = 0

Again, this follows from the antisymmetry of F^{\mu \nu} and the symmetry of partial derivatives.
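The algebraic fact used here is that contracting something symmetric in two indices with something antisymmetric in the same indices always gives zero. A quick sketch with random integer matrices (the helper names are mine, and the matrices are stand-ins, not actual derivatives):

```python
import random

random.seed(1)  # any seed works; fixed only for reproducibility


def random_symmetric(n=4):
    M = [[0] * n for _ in range(n)]
    for m in range(n):
        for k in range(m, n):
            M[m][k] = M[k][m] = random.randint(-9, 9)
    return M


def random_antisymmetric(n=4):
    M = [[0] * n for _ in range(n)]
    for m in range(n):
        for k in range(m + 1, n):
            M[m][k] = random.randint(-9, 9)
            M[k][m] = -M[m][k]
    return M


def contract(S, F):
    """Sum of S_{mu nu} F^{mu nu} over both indices."""
    size = len(S)
    return sum(S[m][k] * F[m][k] for m in range(size) for k in range(size))


S = random_symmetric()      # plays the role of the symmetric pair of derivatives
F = random_antisymmetric()  # plays the role of F^{mu nu}
assert contract(S, F) == 0
```

Every term with indices (m, k) is cancelled by the term with indices (k, m), so the result is exactly zero no matter which numbers are drawn.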

The equation:

\partial_{\nu} j^{\nu} = 0

is called the conservation of electric charge. In terms of 3-d components it reads:

\dot{\rho} = - \vec{\nabla} \cdot \vec{J}

or, in words, the rate of change of charge density is equal to minus the divergence of the electric current: the charge in a small volume decreases only when current flows out of it. Globally, it means that charge cannot appear or disappear. If your isolated system starts with a certain charge, it will end up with the same charge.
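The step from the local equation to the global statement can be illustrated on a one-dimensional lattice. The sketch assumes charge stored in cells and current defined on the faces between cells; the current itself is an arbitrary made-up function, and the only physical input is that no current crosses the outer boundary.

```python
import math


def evolve_charge(rho, current_at_faces, dt, dx):
    """One step of d(rho)/dt = -dJ/dx; current_at_faces[i] flows from cell i-1 into cell i."""
    return [
        rho[i] - dt / dx * (current_at_faces[i + 1] - current_at_faces[i])
        for i in range(len(rho))
    ]


n_cells = 50
initial_rho = [math.exp(-0.05 * (i - 25) ** 2) for i in range(n_cells)]
rho = list(initial_rho)

for step in range(100):
    # An arbitrary current; it vanishes on the two outer faces (isolated system).
    J = [0.0] + [math.sin(0.3 * i + 0.1 * step) for i in range(1, n_cells)] + [0.0]
    rho = evolve_charge(rho, J, dt=0.01, dx=1.0)

initial_charge = sum(initial_rho)
final_charge = sum(rho)
largest_local_change = max(abs(a - b) for a, b in zip(rho, initial_rho))
```

Whatever leaves one cell enters its neighbor, so the charge distribution gets reshuffled while the total stays put (up to rounding).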

Why would the presence of electromagnetic fields impose conditions on the behavior of matter? Surprisingly, this too follows from gauge invariance. Electromagnetic fields must interact with matter in a way that makes it impossible to detect the non-physical vector potentials. In other words, the interaction must be gauge invariant. Which makes the whole action, which combines the pure-field Lagrangian and the interaction Lagrangian, gauge invariant.

It turns out that any time you have such an invariance of the action, you automatically get a conserved quantity. This is called Noether’s theorem and, in the case of electromagnetic theory, it justifies the conservation of charge. So, even though the potentials are not physical, their symmetry has a very physical consequence: the conservation of charge.

Quantum Electrodynamics

The original idea of quantum field theory (QFT) was that it should extend the classical theory. It should be able to explain all the classical behavior plus quantum deviations from it.

This is no longer true. We don’t insist on extending classical behavior any more. We use QFT to, for instance, describe quarks, for which there is no classical theory.

The starting point of any QFT is still the good old Lagrangian density. But in quantum theory, instead of only looking for the stationary points of the action, we also consider quantum fluctuations around them. In fact, we consider all possible paths. It just so happens that the contributions from those paths that are far away from the classical solutions tend to cancel each other. This is the reason why classical physics works so well: classical trajectories are the most probable ones.

In quantum theory, we calculate probabilities of transitions from the initial state to the final state. These probabilities are given by summing up complex amplitudes for every possible path and then taking the squared absolute value of the result. The amplitudes are given by the exponential of the action:

e^{i S / \hbar }

Far away from the stationary point of the action, the amplitudes corresponding to adjacent paths vary very quickly in phase and they cancel each other. The summation effectively acts like a low-pass filter for these amplitudes. We are observing the Universe through a low-pass filter.
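A toy version of this cancellation fits in a few lines. The sketch assumes a one-parameter family of paths labeled by a number a, with a made-up quadratic action whose stationary point sits at a = 0, and a value of \hbar that is small on the scale of that action. It compares the summed amplitude from a window of paths around the stationary point with that from an equally wide window far away from it.

```python
import cmath

HBAR = 0.01  # small compared with the scale of the toy action (arbitrary units)


def toy_action(a):
    """Action of a one-parameter family of paths; a = 0 is the classical path."""
    return 0.5 * a * a


def summed_amplitude(a_min, a_max, samples=10_000):
    """Riemann sum of exp(i S / hbar) over the paths with a_min <= a <= a_max."""
    da = (a_max - a_min) / samples
    total = 0j
    for k in range(samples):
        a = a_min + (k + 0.5) * da
        total += cmath.exp(1j * toy_action(a) / HBAR) * da
    return total


near = abs(summed_amplitude(-0.5, 0.5))  # window around the stationary point
far = abs(summed_amplitude(2.0, 3.0))    # window of the same width, far from it
assert near > 10 * far
```

Both windows contain the same number of paths, each contributing an amplitude of modulus one. Near the stationary point they add up; far from it the rapidly rotating phases wipe each other out.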

In quantum electrodynamics things are a little tricky. We would like to consider all possible paths in terms of the vector potential A_{\mu}(x). The problem is that two such paths that differ only by a gauge transformation result in exactly the same action, since the Lagrangian is written in terms of gauge invariant fields F^{\mu \nu}. The action is therefore constant along gauge transformations and the sum over all such paths would result in infinity. Once again, the non-physical nature of the potential raises its ugly head.

Another way of describing the same problem is that we expect the quantization of the electromagnetic field to describe the quanta of that field, namely photons. But a photon has only two degrees of freedom corresponding to two polarizations, whereas a vector potential has four components. Besides the two physical ones, it also introduces longitudinal and time-like polarizations, which are not present in the real world.

To eliminate the non-physical degrees of freedom, physicists came up with lots of clever tricks. These tricks are relatively mild in the case of QED, but when it comes to non-Abelian gauge fields, the details are quite gory and involve the introduction of even more non-physical fields called ghosts.

Still, there is no way of getting away from vector potentials. Moreover, the interaction of the electromagnetic field with charged particles can only be described using potentials. For instance, the Lagrangian for the electron field \psi in the electromagnetic field is:

\bar{\psi}(i \gamma^{\mu}D_{\mu} - m) \psi

The potential A_{\mu} is hidden inside the covariant derivative

D_{\mu} = \partial_{\mu} - i e A_{\mu}

where e is the electron charge.

Note for mathematicians: The covariant derivative locally describes parallel transport in the U(1) bundle.

The electron is described by a complex Dirac spinor field \psi. Just as the electromagnetic potential is non-physical, so are the components of the electron field. You can conceptualize it as a “square root” of a physical field. Square roots of numbers come in pairs, positive and negative; the Dirac field describes both negative electrons and positive positrons. In general, square roots are complex, and so are Dirac fields. Even the field equation they satisfy behaves like a square root of the conventional Klein-Gordon equation. Most importantly, the Dirac field is only defined up to a complex phase. You can multiply it by a complex number of modulus one, e^{i e \Lambda} (the e in the exponent is the charge of the electron). Because the Lagrangian pairs the field \psi with its complex conjugate \bar{\psi}, the phases cancel, which shows that the Lagrangian does not depend on the choice of the phase.

In fact, the phase can vary from point to point (and time to time) as long as the phase change is compensated by the corresponding gauge transformation of the electromagnetic potential. The whole Lagrangian is invariant under the following simultaneous gauge transformations of all fields:

\psi' = e^{i e \Lambda} \psi

\bar{\psi}' = \bar{\psi} e^{-i e \Lambda}

A_{\mu}' = A_{\mu} + \partial_{\mu} \Lambda

The important part is the cancellation between the derivative of the transformed field and the gauge transformation of the potential:

(\partial_{\mu} - i e A'_{\mu}) \psi' = e^{i e \Lambda}( \partial_{\mu} + i e \partial_{\mu} \Lambda - i e A_{\mu} - i e \partial_{\mu} \Lambda) \psi = e^{i e \Lambda} D_{\mu} \psi
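This cancellation can also be checked numerically, with one caveat: a naive finite difference would only reproduce it approximately. The sketch below therefore swaps in a standard lattice trick, which is my assumption and not part of the derivation above: the potential enters through the phase factor exp(-i e a A) attached to the link between neighboring sites (a is the lattice spacing). For small a this reduces to \partial - i e A, and it transforms exactly like the continuum covariant derivative. The field is a single complex number per site rather than a spinor, and all values are arbitrary.

```python
import cmath
import math

CHARGE = 0.3   # the coupling e, arbitrary value
SPACING = 0.1  # lattice spacing a
N = 16         # number of sites on a periodic 1-d lattice


def covariant_difference(psi, A):
    """Lattice stand-in for D psi = (d - i e A) psi, using the link factor exp(-i e a A)."""
    return [
        (cmath.exp(-1j * CHARGE * SPACING * A[n]) * psi[(n + 1) % N] - psi[n]) / SPACING
        for n in range(N)
    ]


def gauge_transform(psi, A, Lam):
    """psi' = exp(i e Lambda) psi,  A' = A + (difference of Lambda) / a."""
    new_psi = [cmath.exp(1j * CHARGE * Lam[n]) * psi[n] for n in range(N)]
    new_A = [A[n] + (Lam[(n + 1) % N] - Lam[n]) / SPACING for n in range(N)]
    return new_psi, new_A


# Arbitrary sample fields and an arbitrary Lambda (illustrative choices).
psi = [cmath.exp(0.4j * n) * (1 + 0.1 * n) for n in range(N)]
A = [math.sin(0.5 * n) for n in range(N)]
Lam = [math.cos(1.1 * n) + 0.2 * n for n in range(N)]

psi_new, A_new = gauge_transform(psi, A, Lam)
D_old = covariant_difference(psi, A)
D_new = covariant_difference(psi_new, A_new)

# D' psi' should equal exp(i e Lambda) D psi at every site.
mismatch = max(
    abs(D_new[n] - cmath.exp(1j * CHARGE * Lam[n]) * D_old[n]) for n in range(N)
)
# The combination conj(psi) * D psi, the analog of the Lagrangian term, is invariant.
density_change = max(
    abs(psi_new[n].conjugate() * D_new[n] - psi[n].conjugate() * D_old[n])
    for n in range(N)
)
```

Both `mismatch` and `density_change` come out at the level of rounding error, even though \psi and A have each changed substantially.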

Note for mathematicians: The Dirac field forms a representation of the U(1) group.

Since the electron field is coupled to the potential, does it mean that an electron can be used to detect the potential? But the potential is non-physical: it’s only defined up to a gauge transformation.

The answer is really strange. Locally, the potential is not measurable, but it may have some very interesting global effects. This is one of those situations where quantum mechanics defies locality. We may have a region of space where the electromagnetic field is zero but the potential is not. Such a potential must, at least locally, be of the form A_{\mu} = \partial_{\mu} \phi. Such a potential is called pure gauge, because it can be “gauged away” using \Lambda = -\phi.

But in a topologically nontrivial space, it may be possible to define a pure-gauge potential that cannot be gauged away by a continuous function. For instance, if we remove a narrow infinite cylinder from a 3-d space, the rest has a non-trivial topology (there are loops that cannot be shrunk to a point). We could define a 3-d vector potential that circulates around the cylinder. For any fixed radius around the cylinder, the field would consist of constant-length vectors that are tangent to the circle. A constant function is a derivative of a linear function, so this potential could be gauged away using a function \Lambda that linearly increases with the angle around the cylinder, like a spiral staircase. But once we make a full circle, we end up on a different floor. There is no continuous \Lambda that would eliminate this potential.

This is not just a theoretical possibility. The field around a very long thin solenoid has this property. It’s all concentrated inside the solenoid and (almost) zero outside, yet its vector potential cannot be eliminated using a continuous gauge transformation.

Classically, there is no way to detect this kind of potential. But if you look at it from the perspective of an electron trying to pass by, the potential is higher on one side of the solenoid and lower on the other, and that means the phase of the electron field will be different, depending on whether it passes on the left or on the right of it. The phase itself is not measurable but, in quantum theory, the same electron can take both paths simultaneously and interfere with itself. The phase difference is translated into a shift in the interference pattern. This is called the Aharonov-Bohm effect and it has been confirmed experimentally.
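The quantity that controls the phase difference between the two routes is the line integral of the vector potential around the closed loop they form. Here is a rough numerical sketch, assuming an idealized, infinitely thin solenoid along the z axis and the textbook form of the potential outside it (tangential, with magnitude equal to the flux divided by 2\pi r). The flux value and the function names are made up.

```python
import math

FLUX = 2.5  # magnetic flux inside the solenoid, arbitrary units


def vector_potential(x, y):
    """A outside an ideal thin solenoid on the z axis: tangential, magnitude FLUX / (2 pi r)."""
    r2 = x * x + y * y
    return (-FLUX * y / (2 * math.pi * r2), FLUX * x / (2 * math.pi * r2))


def circulation(center, radius, segments=20_000):
    """Line integral of A around a circle, approximated by midpoint sums over short chords."""
    cx, cy = center
    total = 0.0
    for k in range(segments):
        t0 = 2 * math.pi * k / segments
        t1 = 2 * math.pi * (k + 1) / segments
        x0, y0 = cx + radius * math.cos(t0), cy + radius * math.sin(t0)
        x1, y1 = cx + radius * math.cos(t1), cy + radius * math.sin(t1)
        ax, ay = vector_potential(0.5 * (x0 + x1), 0.5 * (y0 + y1))
        total += ax * (x1 - x0) + ay * (y1 - y0)
    return total


around_solenoid = circulation(center=(0.0, 0.0), radius=1.0)
beside_solenoid = circulation(center=(3.0, 0.0), radius=1.0)
```

A loop that encloses the solenoid picks up the full flux, even though the magnetic field vanishes everywhere along it, while a loop that passes beside the solenoid picks up nothing. The relative phase of the two electron paths is proportional to the first number, which is why the interference pattern shifts.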

Note for mathematicians: Here, the base space of the fiber bundle has non-trivial homotopy. There may be non-trivial connections that have zero curvature.

[Figure: Aharonov-Bohm experiment]

Space Pasta

I went into some detail to describe the role redundant degrees of freedom and their associated symmetries play in the theory of electromagnetic fields.

We know that the vector potentials are not physical: we have no way of measuring them directly. We know that in quantum mechanics they describe non-existent particles like longitudinal and time-like photons. Since we use redundant parameterization of fields, we introduce seemingly artificial symmetries.

And yet, these “bogus symmetries” have some physical consequences: they explain the conservation of charge; and the “bogus degrees of freedom” explain the results of the Aharonov-Bohm experiment. There are some parts of reality that they capture. What are these parts?

One possible answer is that we introduce redundant parameterizations in order to describe, locally, the phenomena of global or topological nature. This is pretty obvious in the case of the Aharonov-Bohm experiment where we create a topologically nontrivial space in which some paths are not shrinkable. The charge conservation case is subtler.

Consider the path a charged particle carves in space-time. If you remove this path, you get a topologically non-trivial space. Charge conservation makes this path unbreakable, so you can view it as defining a topological invariant of the surrounding space. I would even argue that charge quantization (all charges are multiples of 1/3 of the charge of the electron) can be explained this way. We know that topological invariants, like the Euler characteristic that describes the genus of a manifold, take whole-number values.

We’d like physics to describe the whole Universe but we know that current theories fail in some areas. For instance, they cannot tell us what happens at the center of a black hole or at the Big Bang singularity. These places are far away, either in space or in time, so we don’t worry about them too much. There’s still a lot of Universe left for physicists to explore.

Except that there are some unexplorable places right under our noses. Every elementary particle is surrounded by a very tiny bubble that’s unavailable to physics. When we try to extrapolate our current theories to smaller and smaller distances, we eventually hit the wall. Our calculations result in infinities. Some of these infinities can be swept under the rug using clever tricks like renormalization. But when we get close to the Planck length, the effects of gravity take over, and renormalization breaks down.

So if we wanted to define “physical space” as the place where physics is applicable, we’d have to exclude all the tiny volumes around the paths of elementary particles. Removing the spaghetti of all such paths leaves us with a topological mess. This is the mess on which we define all our theories. The redundant descriptions and symmetries are our way of probing the excluded spaces.

Appendix

A point in Minkowski spacetime is characterized by four coordinates x^{\mu}, \mu = 0, 1, 2, 3, where x^0 is the time coordinate, and the rest are space coordinates. We use the system of units in which the speed of light c is one.

Repeated indices are, by Einstein convention, summed over (contracted). Indices between square brackets are antisymmetrized (that is, summed over all permutations, with the minus sign for odd permutations). For instance:

F_{0 1} = \partial_{[0} A_{1]} = \partial_{0} A_{1} - \partial_{1} A_{0} = \partial_{t} A_{x} - \partial_{x} A_{t}

Indices are raised and lowered by contracting them with the Minkowski metric tensor:

\eta_{\mu\nu} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix}

Partial derivatives with respect to these coordinates are written as:

\partial_{\mu} = \frac{\partial}{\partial x^{\mu}}

The 4-dimensional antisymmetric tensor F^{\mu \nu} is a 4 \times 4 matrix, but because of antisymmetry, it reduces to just 6 independent entries, which can be rearranged into two 3-d vector fields. The vector \vec E is the electric field, and the vector \vec B is the magnetic field.

F^{\mu\nu} = \begin{bmatrix} 0 & -E_x & -E_y & -E_z \\ E_x & 0 & -B_z & B_y \\ E_y & B_z & 0 & -B_x \\ E_z & -B_y & B_x & 0 \end{bmatrix}

The sources of these fields are described by a 4-dimensional vector j^{\mu}. Its zeroth component describes the distribution of electric charges, and the rest describes electric current density.

The second set of Maxwell’s equations can also be written using the completely antisymmetric Levi-Civita tensor with entries equal to 1 or -1 depending on the parity of the permutation of the indices:

\epsilon^{\mu \nu \rho \sigma} \partial_{\nu} F_{\rho \sigma} = 0