This is part 11 of Categories for Programmers. Previously: Natural Transformations. See the Table of Contents.

## Introduction to Part II

In the first part of the book I argued that both category theory and programming are about composability. In programming, you keep decomposing a problem until you reach the level of detail that you can deal with, solve each subproblem in turn, and re-compose the solutions bottom-up. There are, roughly speaking, two ways of doing it: by telling the computer what to do, or by telling it how to do it. One is called declarative and the other imperative.

You can see this even at the most basic level. Composition itself may be defined declaratively; as in, `h` is a composite of `g` after `f`:

```haskell
h = g . f
```

or imperatively; as in, call `f` first, remember the result of that call, then call `g` with the result:

```haskell
h x = let y = f x in g y
```
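The two definitions can be put side by side and checked for agreement. Here is a minimal sketch, with hypothetical `f` and `g` standing in for arbitrary composable functions:

```haskell
f :: Int -> Int
f = (+ 1)

g :: Int -> Int
g = (* 2)

-- Declarative: h is the composite of g after f.
hDecl :: Int -> Int
hDecl = g . f

-- Imperative flavor: call f, name the intermediate result, then call g.
hImp :: Int -> Int
hImp x = let y = f x in g y
```

Both compute the same function; only the way we describe it differs.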

The imperative version of a program is usually described as a sequence of actions ordered in time. In particular, the call to `g` cannot happen before the execution of `f` completes. At least, that’s the conceptual picture — in a lazy language, with *call-by-need* argument passing, the actual execution may proceed differently.

In fact, depending on the cleverness of the compiler, there may be little or no difference between how declarative and imperative code is executed. But the two methodologies differ, sometimes drastically, in the way we approach problem solving and in the maintainability and testability of the resulting code.

The main question is: when faced with a problem, do we always have a choice between a declarative and an imperative approach to solving it? And, if there is a declarative solution, can it always be translated into computer code? The answer to this question is far from obvious and, if we could find it, we would probably revolutionize our understanding of the universe.

Let me elaborate. There is a similar duality in physics, which either points at some deep underlying principle, or tells us something about how our minds work. Richard Feynman mentions this duality as an inspiration in his own work on quantum electrodynamics.

There are two forms of expressing most laws of physics. One uses local, or infinitesimal, considerations. We look at the state of a system around a small neighborhood, and predict how it will evolve within the next instant of time. This is usually expressed using differential equations that have to be integrated, or summed up, over a period of time.

Notice how this approach resembles imperative thinking: we reach the final solution by following a sequence of small steps, each depending on the result of the previous one. In fact, computer simulations of physical systems are routinely implemented by turning differential equations into difference equations and iterating them. This is how spaceships are animated in the Asteroids game. At each time step, the position of a spaceship is changed by adding a small increment, which is calculated by multiplying its velocity by the time delta. The velocity, in turn, is changed by a small increment proportional to acceleration, which is given by force divided by mass.

These are the direct encodings of the differential equations corresponding to Newton’s laws of motion:

F = m dv/dt
v = dx/dt
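The iterative update described above can be sketched as a single Euler step. The force, mass, and time delta here are hypothetical parameters, not anything taken from the original game:

```haskell
-- One Euler step of the difference equations: x += v·dt, then v += (F/m)·dt.
data Ship = Ship { pos :: Double, vel :: Double } deriving (Eq, Show)

step :: Double  -- force F
     -> Double  -- mass m
     -> Double  -- time delta dt
     -> Ship -> Ship
step force mass dt (Ship x v) =
  Ship (x + v * dt)               -- position changes by velocity times dt
       (v + (force / mass) * dt)  -- velocity changes by acceleration times dt
```

Iterating `step` over many small time steps approximates the integral of the equations of motion.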

Similar methods may be applied to more complex problems, like the propagation of electromagnetic fields using Maxwell’s equations, or even the behavior of quarks and gluons inside a proton using lattice QCD (quantum chromodynamics).

This local thinking, combined with the discretization of space and time encouraged by the use of digital computers, found its extreme expression in Stephen Wolfram’s heroic attempt to reduce the complexity of the whole universe to a system of cellular automata.

The other approach is global. We look at the initial and the final state of the system, and calculate a trajectory that connects them by minimizing a certain functional. The simplest example is Fermat’s principle of least time. It states that light rays propagate along paths that minimize their flight time. In particular, in the absence of reflecting or refracting objects, a light ray from point A to point B will take the shortest path, which is a straight line. But light propagates slower in dense (transparent) materials, like water or glass. So if you pick the starting point in the air, and the ending point under water, it’s more advantageous for light to travel longer in the air and then take a shortcut through water. The path of minimum time makes the ray refract at the boundary of air and water, resulting in Snell’s law of refraction:

sin θ₁ / sin θ₂ = v₁ / v₂

where `v₁` is the speed of light in the air and `v₂` is the speed of light in the water.
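Solving the law above for the refraction angle gives a one-liner; this is just an illustration, with angles in radians:

```haskell
-- From sin θ₁ / sin θ₂ = v₁ / v₂, solve for θ₂.
refractionAngle :: Double  -- θ₁, angle of incidence
                -> Double  -- v₁, speed of light in the first medium
                -> Double  -- v₂, speed of light in the second medium
                -> Double  -- θ₂, angle of refraction
refractionAngle theta1 v1 v2 = asin (sin theta1 * v2 / v1)
```

When `v2 < v1` (entering water from air), the ray bends toward the normal: θ₂ < θ₁.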

All of classical mechanics can be derived from the principle of least action. The action can be calculated for any trajectory by integrating the Lagrangian, which is the difference between kinetic and potential energy (notice: it’s the difference, not the sum — the sum would be the total energy). When you fire a mortar to hit a given target, the projectile will first go up, where the potential energy due to gravity is higher, and spend some time there racking up a negative contribution to the action. It will also slow down at the top of the parabola, to minimize kinetic energy. Then it will speed up to go quickly through the area of low potential energy.
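The action for a discretized trajectory can be computed directly. This sketch assumes a unit-mass particle in uniform gravity, with height samples at uniform time steps — a simplified setup chosen for illustration:

```haskell
-- Discretized action: sum over steps of (kinetic − potential) · dt.
action :: Double    -- time step dt
       -> [Double]  -- sampled heights along the trajectory
       -> Double
action dt ys = sum [ (kinetic y y' - potential ((y + y') / 2)) * dt
                   | (y, y') <- zip ys (tail ys) ]
  where
    g = 9.81
    kinetic y y' = 0.5 * ((y' - y) / dt) ^ 2  -- T = ½ v²  (unit mass)
    potential h  = g * h                      -- V = g·h   (unit mass)
```

Among all trajectories sharing the same endpoints, the parabola of actual projectile motion is the one that minimizes this sum.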

Feynman’s greatest contribution was to realize that the principle of least action can be generalized to quantum mechanics. There, again, the problem is formulated in terms of initial state and final state. The Feynman path integral between those states is used to calculate the probability of transition.

The point is that there is a curious unexplained duality in the way we can describe the laws of physics. We can use the local picture, in which things happen sequentially and in small increments. Or we can use the global picture, where we declare the initial and final conditions, and everything in between just follows.

The global approach can also be used in programming, for instance when implementing ray tracing. We declare the position of the eye and the positions of light sources, and figure out the paths that the light rays may take to connect them. We don’t explicitly minimize the time of flight for each ray, but we do use Snell’s law and the geometry of reflection to the same effect.

The biggest difference between the local and the global approach is in their treatment of space and, more importantly, time. The local approach embraces the immediate gratification of here and now, whereas the global approach takes a long-term static view, as if the future had been preordained, and we were only analyzing the properties of some eternal universe.

Nowhere is it better illustrated than in the Functional Reactive Programming approach to user interaction. Instead of writing separate handlers for every possible user action, all having access to some shared mutable state, FRP treats external events as an infinite list, and applies a series of transformations to it. Conceptually, the list of all our future actions is there, available as the input data to our program. From a program’s perspective there’s no difference between the list of digits of π, a list of pseudo-random numbers, or a list of mouse positions coming through computer hardware. In each case, if you want to get the nth item, you have to first go through the first n-1 items. When applied to temporal events, we call this property *causality*.
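Conceptually, the stream transformation looks like ordinary list processing. The event type and the particular transformation below are made up for illustration:

```haskell
-- Treat the (potentially infinite) stream of mouse positions as a list
-- and transform it wholesale into a stream of movement deltas.
type MousePos = (Int, Int)

deltas :: [MousePos] -> [(Int, Int)]
deltas ps = zipWith delta ps (tail ps)
  where delta (x1, y1) (x2, y2) = (x2 - x1, y2 - y1)
```

Laziness makes this work on an unbounded stream: to produce the nth delta, only the first n+1 positions need to exist, which is exactly the causality property described above.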

So what does it have to do with category theory? I will argue that category theory encourages a global approach and therefore supports declarative programming. First of all, unlike calculus, it has no built-in notion of distance, or neighborhood, or time. All we have is abstract objects and abstract connections between them. If you can get from A to B through a series of steps, you can also get there in one leap. Moreover, the major tool of category theory is the universal construction, which is the epitome of a global approach. We’ve seen it in action, for instance, in the definition of the categorical product. It was done by specifying its properties — a very declarative approach. It’s an object equipped with two projections, and it’s the best such object — it optimizes a certain property: the property of factorizing the projections of other such objects.

Compare this with Fermat’s principle of minimum time, or the principle of least action.

Conversely, contrast this with the traditional definition of a cartesian product, which is much more imperative. You describe how to create an element of the product by picking one element from one set and another element from another set. It’s a recipe for creating a pair. And there’s another for disassembling a pair.
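The two styles can be put side by side in Haskell. The `factorizer` below witnesses the universal property of the product — any candidate object `c` with two projections factors through the pair — while `makePair` is the element-by-element recipe:

```haskell
-- Declarative/universal: given any c with projections p and q,
-- produce the unique morphism from c to the product (a, b).
factorizer :: (c -> a) -> (c -> b) -> (c -> (a, b))
factorizer p q = \x -> (p x, q x)

-- Imperative-style recipe, by contrast: assemble a pair from its parts.
makePair :: a -> b -> (a, b)
makePair x y = (x, y)
```

Composing the result of `factorizer p q` with `fst` recovers `p`, and with `snd` recovers `q` — that is the factorization property the universal construction demands.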

In almost every programming language, including functional languages like Haskell, product types, coproduct types, and function types are built in, rather than being defined by universal constructions; although there have been attempts at creating categorical programming languages (see, e.g., Tatsuya Hagino’s thesis).

Whether used directly or not, categorical definitions justify pre-existing programming constructs, and give rise to new ones. Most importantly, category theory provides a meta-language for reasoning about computer programs at a declarative level. It also encourages reasoning about problem specification before it is cast into code.

Next: Limits and Colimits.

## Acknowledgments

I’d like to thank Gershom Bazerman for checking my math and logic, and André van Meulebrouck, who has been volunteering his editing help.


April 15, 2015 at 2:15 pm

You, Sir, are doing an AWESOME job!

April 16, 2015 at 5:20 am

Said deep underlying principle might be something which generalizes Noether’s theorem, along with the fact that symmetries abound in our explanations of processes (be it due to there being a deeper reason for symmetries to almost surely exist, or for our minds to be wired to recognise them as “principal components” of sorts, i.e. for us to be able to almost surely recognize processes as being near-symmetries).

July 23, 2015 at 12:11 pm

Great analogy about local and global reasoning! It was just at the fingertips, but I didn’t notice that before.

September 8, 2015 at 1:03 am

@suhail shergill

according to this http://bentnib.org/conservation-laws.pdf the Noether Theorem appears to essentially be a special case of the notion of theorems “for free”

@Bartosz Milewski you forgot to add a link to Limits and Colimits – also, very nice analogy!

January 11, 2016 at 6:05 am

My intuitive guess is that the top-down approach has the potential to be more expressive than Turing Machines. That is, there are ‘larger’ abstract formal systems we can’t instantiate in physical reality. What do you think of Wolfram’s rule 110? That seems to mess with my intuition…

January 11, 2016 at 1:19 pm

So far the top-down (global) and local approaches have been shown to be equivalent (at least in physics). Wolfram’s automata seem to be too simplistic to describe reality — good try, though. They are Turing complete, so they can simulate, digitally, more complex systems. But if we believe that space-time is continuous, they can only provide an approximation. But who knows, maybe reality is discrete below the Planck scale.

January 11, 2016 at 11:19 pm

Doesn’t the existence of the Planck length itself strongly suggest that?

January 12, 2016 at 2:14 am

I see here an analogy between strategy and operations

January 12, 2016 at 7:27 am

Do you know if the expressiveness of category theory and/or set theory is beyond Turing Machines? I’ve kinda assumed that they were both broader formal systems, but I am just guessing. Software like Maple, Mathematica and Coq might actually shed some light on that.

Another intuitive view that I’ve been headed towards is that if we do really live within a spacetime object, it is essentially discrete. That is, notions of continuums and infinity are really just our own abstract symbolic creations. I could be way wrong about that, but funny things like IP most likely being in a different complexity class than LP seem to keep tilting me in that direction.

What disrupted me about rule 110 is not that I expect automata to be able to model the world, but rather I expected them to be contained and not burst through into being Turing-complete. In a sense, if they can flow upwards like that, then other formal systems can as well (which I sort of suspect, but it’s hazy).

January 12, 2016 at 3:08 pm

Hi Bartosz, excellent post.

Two questions:

1) what method do you use for writing specifications of your software?

2) what are your thoughts on the Hagino thesis?

January 12, 2016 at 3:09 pm

Citing from Wikipedia: “There is currently no proven physical significance of the Planck length.”

We have a “natural” or “universal” unit of speed, the speed of light. We have a natural quantum unit, the Planck constant. And we have a natural unit of gravitational interaction G. A particular combination of these natural units can be used to construct a unit of length — the Planck length. However, because if mixes quantum theory with gravitational theory — two theories that refuse to be unified with each other — its physical significance is purely speculative. There’s nothing to show that there’s “no physics” below Planck length.