In my category theory blog posts, I stated many theorems, but I didn’t provide many proofs. In most cases, it’s enough to know that the proof exists. We trust mathematicians to do their job. Granted, when you’re a professional mathematician, you have to study proofs, because one day you’ll have to prove something, and it helps to know the tricks that other people used before you. But programmers are engineers, and are therefore used to taking advantage of existing solutions, trusting that somebody else made sure they were correct. So it would seem that mathematical proofs are irrelevant to programming. Or so it may seem, until you learn about the Curry-Howard isomorphism–or propositions as types, as it is sometimes called–which says that there is a one to one correspondence between logic and programs, and that every function can be seen as a proof of a theorem. And indeed, I found that a lot of proofs in category theory turn out to be recipes for implementing functions. In most cases the problem can be reduced to this: Given some morphisms, implement another morphism, usually using simple composition. This is very much like using point-free notation to define functions in Haskell. The other ingredient in categorical proofs is diagram chasing, which is very much like equational resoning in Haskell. Of course, mathematicians use different notation, and they use lots of different categories, but the principles are the same.

I want to illustrate these points with the example from Emily Riehl’s excellent book Category Theory in Context. It’s a book for mathematicians, so it’s not an easy read. I’m going to concentrate on theorem 6.2.1, which derives a formula for left Kan extensions using colimits. I picked this theorem because it has calculational content: it tells you how to calculate a particular functor.

It’s not a short proof, and I have made it even longer by unpacking every single step. These steps are not too hard, it’s mostly a matter of understanding and using definitions of functoriality, naturality, and universality.

There is a bonus at the end of this post for Haskell programmers.

## Kan Extensions

I wrote about Kan extensions before, so here I’ll only recap the definition using the notation from Emily’s book. Here’s the setup: We want to extend a functor $F$, which goes from category $C$ to $E$, along another functor $K$, which goes from $C$ to $D$. This extension is a new functor from $D$ to $E$.

To give you some intuition, imagine that the functor $F$ is the Rosetta Stone. It’s a functor that maps the Ancient Egyptian text of a royal decree to the same text written in Ancient Greek. The functor $K$ embeds the Rosetta Stone hieroglyphics into the know corpus of Egyptian texts from various papyri and inscriptions on the walls of temples. We want to extend the functor $F$ to the whole corpus. In other words, we want to translate new texts from Egyptian to Greek (or whatever other language that’s isomorphic to it).

In the ideal case, we would just want $F$ to be isomorphic to the composition of the new functor after $K$. That’s usually not possible, so we’ll settle for less. A Kan extension is a functor which, when composed with $K$ produces something that is related to $F$ through a natural transformation. In particular, the left Kan extension, $Lan_K F$, is equipped with a natural transformation $\eta$ from $F$ to $Lan_K F \circ K$.

(The right Kan extension has this natural transformation reversed.)

There are usually many such functors, so there is the standard trick of universal construction to pick the best one.

In our analogy, we would ideally like the new functor, when applied to the hieroglyphs from the Rosetta stone, to exactly reproduce the original translation, but we’ll settle for something that has the same meaning. We’ll try to translate new hieroglyphs by looking at their relationship with known hieroglyphs. That means we should look closely at morphism in $D$.

## Comma Category

The key to understanding how Kan extensions work is to realize that, in general, the functor $K$ embeds $C$ in $D$ in a lossy way.

There may be objects (and morphisms) in $D$ that are not in the image of $K$. We have to somehow define the action of $Lan_K F$ on those objects. What do we know about such objects?

We know from the Yoneda lemma that all the information about an object is encoded in the totality of morphisms incoming to or outgoing from that object. We can summarize this information in the form of a functor, the hom-functor. Or we can define a category based on this information. This category is called the slice category. Its objects are morphisms from the original category. Notice that this is different from Yoneda, where we talked about sets of morphisms — the hom-sets. Here we treat individual morphisms as objects.

This is the definition: Given a category $C$ and a fixed object $c$ in it, the slice category $C/c$ has as objects pairs $(x, f)$, where $x$ is an object of $C$ and $f$ is a morphism from $x$ to $c$. In other words, all the arrows whose codomain is $c$ become objects in $C/c$.

A morphism in $C/c$ between two objects, $(x, f)$ and $(y, g)$ is a morphism $h : x \to y$ in $C$ that makes the following triangle commute:

In our case, we are interested in an object $d$ in $D$, and the slice category $D/d$ describes it in terms of morphisms. Think of this category as a holographic picture of $d$.

But what we are really interested in, is how to transfer this information about $d$ to $E$. We do have a functor $F$, which goes from $C$ to $E$. We need to somehow back-propagate the information about $d$ to $C$ along $K$, and then use $F$ to move it to $E$.

So let’s try again. Instead of using all morphisms impinging on $d$, let’s only pick the ones that originate in the image of $K$, because only those can be back-propagated to $C$.

This gives us limited information about $d$, but it’ll have to do. We’ll just use a partial hologram of $d$. Instead of the slice category, we’ll use the comma category $K \downarrow d$.

Here’s the definition: Given a functor $K : C \to D$ and an object $d$ of $D$, the comma category $K \downarrow d$ has as objects pairs $(c, f)$, where $c$ is an object of $C$ and $f$ is a morphism from $K c$ to $d$. So, indeed, this category describes the totality of morphisms coming to $d$ from the image of $K$.

A morphism in the comma category from $(c, f)$ to $(c', g)$ is a morphism $h : c \to c'$ such that the following triangle commutes:

So how does back-propagation work? There is a projection functor $\Pi_d : K \downarrow d \to C$ that maps an object $(c, f)$ to $c$ (with obvious action on morphisms). This functor loses a lot of information about objects (completely forgets the $f$ part), but it keeps a lot of the information in morphisms — it only picks those morphisms in $C$ that preserve the structure of the comma category. And it lets us “upload” the hologram of $d$ into $C$

Next, we transport all this information to $E$ using $F$. We get a mapping

$F \circ \Pi_d : K \downarrow d \to E$

Here’s the main observation: We can look at this functor $F \circ \Pi_d$ as a diagram in $E$ (remember, when constructing limits, diagrams are defined through functors). It’s just a bunch of objects and morphisms in $E$ that somehow approximate the image of $d$. This holographic information was extracted by looking at morphisms impinging on $d$. In our search for $(Lan_K F) d$ we should therefore look for an object in $E$ together with morphisms impinging on it from the diagram we’ve just constructed. In particular, we could look at cones under that diagram (or co-cones, as they are sometimes called). The best such cone defines a colimit. If that colimit exists, it is indeed the left Kan extension $(Lan_K F) d$. That’s the theorem we are going to prove.

To prove it, we’ll have to go through several steps:

1. Show that the mapping we have just defined on objects is indeed a functor, that is, we’ll have to define its action on morphisms.
2. Construct the transformation $\eta$ from $F$ to $Lan_K F \circ K$ and show the naturality condition.
3. Prove universality: For any other functor $G$ together with a natural transformation $\gamma$, show that $\gamma$ uniquely factorizes through $\eta$.

All of these things can be shown using clever manipulations of cones and the universality of the colimit. Let’s get to work.

## Functoriality

We have defined the action of $Lan_K F$ on objects of $D$. Let’s pick a morphism $g : d \to d'$. Just like $d$, the object $d'$ defines its own comma category $K \downarrow d'$, its own projection $\Pi_{d'}$, its own diagram $F \circ \Pi_{d'}$, and its own limiting cone. This new cone, however, can be reinterpreted as a cone for the old object $d$. That’s because, surprisingly, the diagram $F \circ \Pi_{d'}$ is the same as the diagram $F \circ \Pi_{d}$.

Take, for instance, an object $(c, f) : K \downarrow d$, with $f: K c \to d$. There is a corresponding object $(c, g \circ f)$ in $K \downarrow d'$. Both get projected down to the same $c$. That shows that objects that form the two diagrams are indeed the same.

Now take a morphism $h$ from $(c, f)$ to $(c', f')$ in $K \downarrow d$. It is also a morphism in $K \downarrow d'$ between $(c, g \circ f)$ and $(c', g \circ f')$. The commuting triangle condition in $K \downarrow d$

$f = f' \circ K h$

ensures the commuting condition in $K \downarrow d'$

$g \circ f = g \circ f' \circ K h$

All this shows that the new cone that defines the colimit of $F \circ \Pi_{d'}$ happens to also be a cone under $F \circ \Pi_{d}$.

But that diagram has its own colimit $(Lan_K F) d$. Because that colimit is universal, there must be a unique morphism from $(Lan_K F) d$ to $(Lan_K F) d'$, which makes all triangles between the two cones commute. We pick this morphism as the lifting of our $g$, which ensures the functoriality of $Lan_K F$.

The commuting condition will come in handy later, so let’s spell it out. For every object $(c, f:Kc \to d)$ we have a leg of the cone, a morphism $V_d^{(c, f)}$ from $F c$ to $(Lan_K F) d$; and another leg of the cone $V_{d'}^{(c, g \circ f)}$ from $F c$ to $(Lan_K F) d'$. If we denote the lifting of $g$ as $(Lan_K F) g$ then the commuting triangle is:

$(Lan_K F) g \circ V_{d}^{(c, f)} = V_{d'}^{(c, g \circ f)}$

In principle, we should also check that this newly defined functor preserves composition and identity, but this pretty much follows automatically whenever lifting is defined using composition of morphisms, which is indeed the case here.

## Natural Transformation

We want to show that there is a natural transformation $\eta$ from $F$ to $Lan_K F \circ K$. As usual, we’ll define the component of this natural transformation at some arbitrary object $c$. It’s a morphism between $F c$ and $(Lan_K F) (K c)$. We have lots of morphisms at our disposal, with all those cones lying around, so it shouldn’t be a problem.

First, observe that, because of the pre-composition with $K$, we are only looking at the action of $Lan_K F$ on objects which are inside the image of $K$.

The objects of the corresponding comma category $K \downarrow K c'$ have the form $(c, f)$, where $f : K c \to K c'$. In particular, we can pick $c' = c$, and $f = id_{K c}$, which will give us one particular vertex of the diagram $F \circ \Pi_{K c}$. The object at this vertex is $F c$ — exactly what we need as a starting point for our natural transformation. The leg of the colimiting cone we are interested in is:

$V_{K c}^{(c, id)} : F c \to (Lan_K F) (K c)$

We’ll pick this leg as the component of our natural transformation $\eta_c$.

What remains is to show the naturality condition. We pick a morphism $h : c \to c'$. We know how to lift this morphism using $F$ and using $Lan_K F \circ K$. The other two sides of the naturality square are $\eta_c$ and $\eta_{c'}$.

The bottom left composition is $(Lan_K F) (K h) \circ V_{K c}^{(c, id)}$. Let’s take the commuting triangle that we used to show the functoriality of $Lan_K F$:

$(Lan_K F) g \circ V_{d}^{(c, f)} = V_{d'}^{(c, g \circ f)}$

and replace $f$ by $id_{K c}$, $g$ by $K h$, $d$ by $K c$, and $d'$ by $K c'$, to get:

$(Lan_K F) (K h) \circ V_{K c}^{(c, id)} = V_{K c'}^{(c, K h)}$

Let’s draw this as the diagonal in the naturality square, and examine the upper right composition:

$V_{K c'}^{(c', id)} \circ F h$.

This is also equal to the diagonal $V_{K c'}^{(c, K h)}$. That’s because these are two legs of the same colimiting cone corresponding to $(c', id_{K c'})$ and $(c, K h)$. These vertices are connected by $h$ in $K \downarrow K c'$.

But how do we know that $h$ is a morphism in $K \downarrow K c'$? Not every morphism in $C$ is a morphism in the comma category. In this case, however, the triangle identity is automatic, because one of its sides is an identity $id_{K c'}$.

We have shown that our naturality square is composed of two commuting triangles, with $V_{K c'}^{(c, K h)}$ as its diagonal, and therefore commutes as a whole.

## Universality

Kan extension is defined using a universal construction: it’s the best of all functors that extend $F$ along $K$. It means that any other extension will factor through ours. More precisely, if there is another functor $G$ and another natural transformation:

$\gamma : F \to G \circ K$

then there is a unique natural transformation $\alpha$, such that

$(\alpha \cdot K) \circ \eta = \gamma$

(where we have a horizontal composition of a natural transformation $\alpha$ with the functor $K$)

As always, we start by clearly stating the goals. The proof proceeds in these steps:

1. Implement $\alpha$.
2. Prove that it’s natural.
3. Show that it factorizes $\gamma$ through $\eta$.
4. Show that it’s unique.

We are defining a natural transformation $\alpha$ between two functors: $Lan_K F$, which we have defined as a colimit, and $G$. Both are functors from $D$ to $E$. Let’s implement the component of $\alpha$ at some $d$. It must be a morphism:

$\alpha_d : (Lan_K F) d \to G d$

Notice that $d$ varies over all of $D$, not just the image of $K$. We are comparing the two extensions where it really matters.

We have at our disposal the natural transformation:

$\gamma : F \to G \circ K$

or, in components:

$\gamma_c : F c \to G (K c)$

More importantly, though, we have the universal property of the colimit. If we can construct a cone with the nadir at $G d$ then we can use its factorizing morphism to define $\alpha_d$.

This cone has to be built under the same diagram as $(Lan_K F) d$. So let’s start with some $(c, f : K c \to d)$. We want to construct a leg of our new cone going from $F c$ to $G d$. We can use $\gamma_c$ to get to $G (K c)$ and then hop to $G d$ using $G f$. Thus we can define the new cone’s leg as:

$W_c = G f \circ \gamma_c$

Let’s make sure that this is really a cone, meaning, its sides must be commuting triangles.

Let’s pick a morphism $h$ in $K \downarrow d$ from $(c, f)$ to $(c', f')$. A morphism in $K \downarrow d$ must satisfy the triangle condition, $f = f' \circ K h$:

We can lift this triangle using $G$:

$G f = G f' \circ G (K h)$

Naturality condition for $\gamma$ tells us that:

$\gamma_{c'} \circ F h = G (K h) \circ \gamma_c$

By combining the two, we get the pentagon:

whose outline forms a commuting triangle:

$G f \circ \gamma_c = G f' \circ \gamma_{c'} \circ F h$

Now that we have a cone with the nadir at $G d$, universality of the colimit tells us that there is a unique morphism from the colimiting cone to it that factorizes all triangles between the two cones. We make this morphism our $\alpha_d$. The commuting triangles are between the legs of the colimiting cone $V_d^{(c, f)}$ and the legs of our new cone $W_c$:

$\alpha_d \circ V_d^{(c, f)} = G f \circ \gamma_c$

Now we have to show that so defined $\alpha$ is a natural transformation. Let’s pick a morphism $g : d \to d'$. We can lift it using $Lan_K F$ or using $G$ in order to construct the naturality square:

$G g \circ \alpha_d = \alpha_{d'} \circ (Lan_K F) g$

Remember that the lifting of a morphism by $Lan_K F$ satisfies the following commuting condition:

$(Lan_K F) g \circ V_{d}^{(c, f)} = V_{d'}^{(c, g \circ f)}$

We can combine the two diagrams:

The two arms of the large triangle can be replaced using the diagram that defines $\alpha$, and we get:

$G (g \circ f) \circ \gamma_c = G g \circ G f \circ \gamma_c$

which commutes due to functoriality of $G$.

Now we have to show that $\alpha$ factorizes $\gamma$ through $\eta$. Both $\eta$ and $\gamma$ are natural transformations between functors going from $C$ to $E$, whereas $\alpha$ connects functors from $D$ to $E$. To extend $\alpha$, we horizontally compose it with the identity natural transformation from $K$ to $K$. This is called whiskering and is written as $\alpha \cdot K$. This becomes clear when expressed in components. Let’s pick an object $c$ in $C$. We want to show that:

$\gamma_c = \alpha_{K c} \circ \eta_c$

Let’s go back all the way to the definition of $\eta$:

$\eta_c = V_{K c}^{(c, id)}$

where $id$ is the identity morphism at $K c$. Compare this with the definition of $\alpha$:

$\alpha_d \circ V_d^{(c, f)} = G f \circ \gamma_c$

If we replace $f$ with $id$ and $d$ with $K c$, we get:

$\alpha_{K c} \circ V_{K c}^{(c, id)} = G id \circ \gamma_c$

which, due to functoriality of $G$ is exactly what we need:

$\alpha_{K c} \circ \eta_c = \gamma_c$

Finally, we have to prove the uniqueness of $\alpha$. Here’s the trick: assume that there is another natural transformation $\alpha'$ that factorizes $\gamma$. Let’s rewrite the naturality condition for $\alpha'$:

$G g \circ \alpha'_d = \alpha'_{d'} \circ (Lan_K F) g$

Replacing $g : d \to d'$ with $f : K c \to d$, we get:

$G f \circ \alpha'_{K c} = \alpha'_d \circ (Lan_K F) f$

The lifiting of $f$ by $Lan_K F$ satisfies the triangle identity:

$V_d^{(c, f)} = (Lan_K F) f \circ V_{K c}^{(c, id)}$

where we recognize $V_{K c}^{(c, id)}$ as $\eta_c$.

We said that $\alpha'$ factorizes $\gamma$ through $\eta$:

$\gamma_c = \alpha'_{K c} \circ \eta_c$

which let us straighten the left side of the pentagon.

This shows that $\alpha'$ is another factorization of the cone with the nadir at $G d$ through the colimit cone with the nadir $(Lan_K F) d$. But that would contradict the universality of the colimit, therefore $\alpha'$ must be the same as $\alpha$.

This completes the proof.

This post would not be complete if I didn’t mention a Haskell implementation of Kan extensions by Edward Kmett, which you can find in the library Data.Functor.Kan.Lan. At first look you might not recognize the definition given there:

data Lan g h a where
Lan :: (g b -> a) -> h b -> Lan g h a

To make it more in line with the previous discussion, let’s rename some variables:

data Lan k f d where
Lan :: (k c -> d) -> f c -> Lan k f d

This is an example of GADT syntax, which is a Haskell way of implementing existential types. It’s equivalent to the following pseudo-Haskell:

Lan k f d = exists c. (k c -> d, f c)

This is more like it: you may recognize (k c -> d) as an object of the comma category $K \downarrow d$, and f c as the mapping of $c$ (which is the projection of this object back to $C$) under the functor $F$. In fact, the Haskell representation is based on the encoding of the colimit using a coend:

$(Lan_K F) d = \int^{c \in C} C(K c, d) \times F c$

The Haskell library also contains the proof that Kan extension is a functor:

instance Functor (Lan k f) where
fmap g (Lan kcd fc) = Lan (g . kcd) fc

The natural transformation $\eta$ that is part of the definition of the Kan extension can be extracted from the Haskell definition:

eta :: f c -> Lan k f (k c)
eta = Lan id

In Haskell, we don’t have to prove naturality, as it is a consequence of parametricity.

The universality of the Kan extension is witnessed by the following function:

toLan :: Functor g => (forall c. f c -> g (k c)) -> Lan k f d -> g d
toLan gamma (Lan kcd fc) = fmap kcd (gamma fc)

It takes a natural transformation $\gamma$ from $F$ to $G \circ K$, and produces a natural transformation we called $\alpha$ from $Lan_K F$ to $G$.

This is $\gamma$ expressed as a composition of $\alpha$ and $\eta$:

fromLan :: (forall d. Lan k f d -> g d) -> f c -> g (k c)
fromLan alpha = alpha . eta

As an example of equational reasoning, let’s prove that $\alpha$ defined by toLan indeed factorizes $\gamma$. In other words, let’s prove that:

fromLan (toLan gamma) = gamma

Let’s plug the definition of toLan into the left hand side:

fromLan (\(Lan kcd fc) -> fmap kcd (gamma fc))

then use the definition of fromLan:

(\(Lan kcd fc) -> fmap kcd (gamma fc)) . eta

Let’s apply this to some arbitrary function g and expand eta:

(\(Lan kcd fc) -> fmap kcd (gamma fc)) (Lan id g)

Beta-reduction gives us:

fmap id (gamma g)

which is indeed equal to the right hand side acting on g:

gamma g

The proof of toLan . fromLan = id is left as an exercise to the reader (hint: you’ll have to use naturality).

## Acknowledgments

I’m grateful to Emily Riehl for reviewing the draft of this post and for writing the book from which I borrowed this proof.

La filosofia è scritta in questo grandissimo libro che continuamente ci sta aperto innanzi a gli occhi (io dico l’universo), ma non si può intendere se prima non s’impara a intender la lingua, e conoscer i caratteri, ne’ quali è scritto. Egli è scritto in lingua matematica, e i caratteri son triangoli, cerchi, ed altre figure geometriche, senza i quali mezi è impossibile a intenderne umanamente parola; senza questi è un aggirarsi vanamente per un oscuro laberinto.

― Galileo Galilei, Il Saggiatore (The Assayer)

Joan was quizzical; studied pataphysical science in the home. Late nights all alone with a test tube.

— The Beatles, Maxwell’s Silver Hammer

Unless you’re a member of the Flat Earth Society, I bet you’re pretty confident that the Earth is round. In fact, you’re so confident that you don’t even ask yourself the question why you are so confident. After all, there is overwhelming scientific evidence for the round-Earth hypothesis. There is the old “ships disappearing behind the horizon” proof, there are satellites circling the Earth, there are even photos of the Earth seen from the Moon, the list goes on and on. I picked this particular theory because it seems so obviously true. So if I try to convince you that the Earth is flat, I’ll have to dig very deep into the foundation of your belief systems. Here’s what I’ve found: We believe that the Earth is round not because it’s the truth, but because we are lazy and stingy (or, to give it a more positive spin, efficient and parsimonious). Let me explain…

## The New Flat Earth Theory

Let’s begin by stressing how useful the flat-Earth model is in everyday life. I use it all the time. When I want to find the nearest ATM or a gas station, I take out my cell phone and look it up on its flat screen. I’m not carrying a special spherical gadget in my pocket. The screen on my phone is not bulging in the slightest when it’s displaying a map of my surroundings. So, at least within the limits of my city, or even the state, flat-Earth theory works just fine, thank you!

I’d like to make parallels with another widely accepted theory, Einstein’s special relativity. We believe that it’s true, but we never use it in everyday life. The vast majority of objects around us move much slower than the speed of light, so traditional Newtonian mechanics works just fine for us. When was the last time you had to reset your watch after driving from one city to another to account for the effects of time dilation?

The point is that every physical theory is only valid within a certain range of parameters. Physicists have always been looking for the Holy Grail of theories — the theory of everything that would be valid for all values of parameters with no exceptions. They haven’t found one yet.

But, obviously, special relativity is better than Newtonian mechanics because it’s more general. You can derive Newtonian mechanics as a low velocity approximation to special relativity. And, sure enough, the flat-Earth theory is an approximation to the round-Earth theory for small distances. Or, equivalently, it’s the limit as the radius of the Earth goes to infinity.

But suppose that we were prohibited (for instance, by a religion or a government) from ever considering the curvature of the Earth. As explorers travel farther and farther, they discover that the “naive” flat-Earth theory gives incorrect answers. Unlike present-day flat-earthers, who are not scientifically sophisticated, they would actually put some effort to refine their calculations to account for the “anomalies.” For instance, they could postulate that, as you get away from the North Pole, which is the center of the flat Earth, something funny keeps happening to measuring rods. They get elongated when positioned along the parallels (the circles centered at the North Pole). The further away you get from the North Pole, the more they elongate, until at a certain distance they become infinite. Which means that the distances (measured using those measuring rods) along the big circles get smaller and smaller until they shrink to zero.

I know this theory sounds weird at first, but so does special and, even more so, general relativity. In special relativity, weird things happen when your speed is close to the speed of light. Time slows down, distances shrink in the direction of flight (but not perpendicular to it!), and masses increase. In general relativity, similar things happen when you get closer to a black hole’s event horizon. In both theories things diverge as you hit the limit — the speed of light, or the event horizon, respectively.

Back to flat Earth — our explorers conquer space. They have to extend their weird geometry to three dimensions. They find out that horizontally positioned measuring rods shrink as you go higher (they un-shrink when you point them vertically). The intrepid explorers also dig into the ground, and probe the depths with seismographs. They find another singularity at a particular depth, where the horizontal dilation of measuring rods reaches infinity (round-Earthers call this the center of the Earth).

This generalized flat-Earth theory actually works. I know that, because I have just described the spherical coordinate system. We use it when we talk about degrees of longitude and latitude. We just never think of measuring distances using spherical coordinates — it’s too much work, and we are lazy. But it’s possible to express the metric tensor in those coordinates. It’s not constant — it varies with position — and it’s not isotropic — distances vary with direction. In fact, because of that, flat Earthers would be better equipped to understand general relativity than we are.

So is the Earth flat or spherical? Actually it’s neither. Both theories are just approximations. In cartesian coordinates, the Earth is the shape of a flattened ellipsoid, but as you increase the resolution, you discover more and more anomalies (we call them mountains, canyons, etc.). In spherical coordinates, the Earth is flat, but again, only approximately. The biggest difference is that the math is harder in spherical coordinates.

Have I confused you enough? On one level, unless you’re an astronaut, your senses tell you that the Earth is flat. On the other level, unless you’re a conspiracy theorist who believes that NASA is involved in a scam of enormous proportions, you believe that the Earth is pretty much spherical. Now I’m telling you that there is a perfectly consistent mathematical model in which the Earth is flat. It’s not a cult, it’s science! So why do you feel that the round Earth theory is closer to the truth?

## The Occam’s Razor

The round Earth theory is just simpler. And for some reason we cling to the belief that nature abhors complexity (I know, isn’t it crazy?). We even express this belief as a principle called the Occam’s razor. In a nutshell, it says that:

Among competing hypotheses, the one with the fewest assumptions should be selected.

Notice that this is not a law of nature. It’s not even scientific: there is no way to falsify it. You can argue for the Occam’s razor on the grounds of theology (William of Ockham was a Franciscan friar) or esthetics (we like elegant theories), but ultimately it boils down to pragmatism: A simpler theory is easier to understand and use.

It’s a mistake to think that Occam’s razor tells us anything about the nature of things, whatever that means. It simply describes the limitations of our mind. It’s not nature that abhors complexity — it’s our brains that prefer simplicity.

Unless you believe that physical laws have an independent existence of their own.

## The Layered Cake Hypothesis

Scientists since Galileo have a picture of the Universe that consists of three layers. The top layer is nature that we observe and interact with. Below are laws of physics — the mechanisms that drive nature and make it predictable. Still below is mathematics — the language of physics (that’s what Galileo’s quote at the top of this post is about). According to this view, physics and mathematics are the hidden components of the Universe. They are the invisible cogwheels and pulleys whose existence we can only deduce indirectly. According to this view, we discover the laws of physics. We also discover mathematics.

Notice that this is very different from art. We don’t say that Beethoven discovered the Fifth Symphony (although Igor Stravinsky called it “inevitable”) or that Leonardo da Vinci discovered the Mona Lisa. The difference is that, had not Beethoven composed his symphony, nobody would; but if Cardano hadn’t discovered complex numbers, somebody else probably would. In fact there were many cases of the same mathematical idea being discovered independently by more than one person. Does this prove that mathematical ideas exist the same way as, say, the moons of Jupiter?

Physical discoveries have a very different character than mathematical discoveries. Laws of physics are testable against physical reality. We perform experiments in the real world and if the results contradict a theory, we discard the theory. A mathematical theory, on the other hand, can only be tested against itself. We discard a theory when it leads to internal contradictions.

The belief that mathematics is discovered rather than invented has its roots in Platonism. When we say that the Earth is spherical, we are talking about the idea of a sphere. According to Plato, these ideas do exist independently of the observer — in this case, a mathematician who studies them. Most mathematicians are Platonists, whether they admit it or not.

Being able to formulate laws of physics in terms of simple mathematical equations is a thing of beauty and elegance. But you have to realize that history of physics is littered with carcasses of elegant theories. There was a very elegant theory, which postulated that all matter was made of just four elements: fire, air, water, and earth. The firmament was a collection of celestial spheres (spheres are so Platonic). Then the orbits of planets were supposed to be perfect circles — they weren’t. They aren’t even elliptical, if you study them close enough.

Celestial spheres. An elegant theory, slightly complicated by the need to introduce epicycles to describe the movements of planets

## The Impass

But maybe at the level of elementary particles and quantum fields some of this presumed elegance of the Universe shines through? Well, not really. If the Universe obeyed the Occam’s razor, it would have stopped at two quarks, up and down. Nobody needs the strange and the charmed quarks, not to mention the bottom and the top quarks. The Standard Model of particle physics looks like a kitchen sink filled with dirty dishes. And then there is gravity that resists all attempts at grand unification. Strings were supposed to help but they turned out to be as messy as the rest of it.

Of course the current state of impasse in physics might be temporary. After all we’ve been making tremendous progress up until about the second half of the twentieth century (the most recent major theoretical breakthroughs were the discovery of the Higgs mechanism in 1964 and the proof or renormalizability of the Standard Model in 1971).

On the other hand, it’s possible that we might be reaching the limits of human capacity to understand the Universe. After all, there is no reason to believe that the structure of the Universe is simple enough for the human brain to analyze. There is no guarantee that it can be translated into the language of physics and mathematics.

## Is the Universe Knowable?

In fact, if you think about it, our expectation that the Universe is knowable is quite arbitrary. On the one hand you have the vast complex Universe, on the other hand you have slightly evolved monkey brains that have only recently figured out how to use tools and communicate using speech. The idea that these brains could produce and store a model of the Universe is preposterous. Granted, our monkey brains are a product of evolution, and our survival depends on those brains being able to come up with workable models of our environment. These models, however, do not include the microcosm or the macrocosm — just the narrow band of phenomena in between. Our senses can perceive space and time scales within about 8 orders of magnitude. For comparison, the Universe is about 40 orders of magnitude larger than the size of the atomic nucleus (not to mention another 20 orders of magnitude down to Planck length).

The evolution came up with an ingenious scheme to deal with the complexities of our environment. Since it is impossible to store all information about the Universe in the very limited amount of memory at our disposal, and it’s impossible to run the simulation in real time, we have settled for the next best thing: creating simplified partial models that are composable.

The idea is that, in order to predict the trajectory of a spear thrown at a mammoth, it’s enough to roughly estimate the influence of a constant downward pull of gravity and the atmospheric drag on the idealized projectile. It is perfectly safe to ignore a lot of subtle effects: the non-uniformity of the gravitational field, air-density fluctuations, imperfections of the spear, not to mention relativistic effects or quantum corrections.

And this is the key to understanding our strategy: we build a simple model and then calculate corrections to it. The idea is that corrections are small enough as not to destroy the premise of the model.

## Celestial Mechanics

A great example of this is celestial mechanics. To the lowest approximation, the planets revolve around the Sun along elliptical orbits. The ellipse is a solution of the one body problem in a central gravitational field of the Sun; or a two body problem, if you also take into account the tiny orbit of the Sun. But planets also interact with each other — in particular the heaviest one, Jupiter, influences the orbits of other planets. We can treat these interactions as corrections to the original solution. The more corrections we add, the better predictions we can make. Astronomers came up with some ingenious numerical methods to make such calculations possible. And yet it’s known that, in the long run, this procedure fails miserably. That’s because even the tiniest of corrections may lead to a complete change of behavior in the far future. This is the property of chaotic systems, our Solar System being just one example of such. You must have heard of the butterfly effect — the Universe is filled with this kind of butterflies.

Ephemerides: Tables showing positions of planets on the firmament.

## The Microcosm

Anyone who is not shocked by quantum
theory has not understood a single word.

— Niels Bohr

At the other end of the spectrum we have atoms and elementary particles. We call them particles because, to the lowest approximation, they behave like particles. You might have seen traces made by particles in a bubble chamber.

Elementary particles might, at first sight, exhibit some properties of macroscopic objects. They follow paths through the bubble chamber. A rock thrown in the air also follows a path — so elementary particles can’t be much different from little rocks. This kind of thinking led to the first model of the atom as a miniature planetary system. As it turned out, elementary particles are nothing like little rocks. So maybe they are like waves on a lake? But waves are continuous and particles can be counted by Geiger counters. We would like elementary particles to either behave like particles or like waves but, despite our best efforts, they refuse to nicely fall into one of the categories.

There is a good reason why we favor particle and wave explanations: they are composable. A two-particle system is a composition of two one-particle systems. A complex wave can be decomposed into a superposition of simpler waves. A quantum system is neither. We might try to separate a two-particle system into its individual constituents, but then we have to introduce spooky action at a distance to explain quantum entanglement. A quantum system is an alien entity that does not fit our preconceived notions, and the main characteristic that distinguishes it from classical phenomena is that it’s not composable. If quantum phenomena were composable in some other way, different from particles or waves, we could probably internalize it. But non-composable phenomena are totally alien to our way of thinking. You might think that physicists have some deeper insight into quantum mechanics, but they don’t. Richard Feynman, who was a no-nonsense physicist, famously said, “If you think you understand quantum mechanics, you don’t understand quantum mechanics.” The problem with understanding quantum mechanics is not that it’s too complex. The problem is that our brains can only deal with concepts that are composable.

It’s interesting to notice that by accepting quantum mechanics we gave up on composability on one level in order to decompose something at another level. The periodic table of elements was the big challenge at the beginning of the 20th century. We already knew that earth, water, air, and fire were not enough. We understood that chemical compounds were combinations of atoms; but there were just too many kinds of atoms, and they could be grouped into families that shared similar properties. Atom was supposed to be indivisible (the Greek word ἄτομος [átomos] means indivisible), but we could not explain the periodic table without assuming that there was some underlying structure. And indeed, there is structure there, but the way the nucleus and the electrons compose in order to form an atom is far from trivial. Electrons are not like planets orbiting the nucleus. They form shells and orbitals. We had to wait for quantum mechanics and the Fermi exclusion principle to describe the structure of an atom.

Every time we explain one level of complexity by decomposing it in terms of simpler constituents we seem to trade off some of the simplicity of the composition itself. This happened again in the sixties, when physicists were faced with a confusing zoo of elementary particles. It seemed like there were hundreds of strongly interacting particles, hadrons, and every year was bringing new discoveries. This mess was finally cleaned up by the introduction of quarks. It was possible to categorize all hadrons as composed of just six types of quarks. This simplification didn’t come without a price, though. When we say an atom is composed of the nucleus and electrons, we can prove it by knocking off a few electrons and studying them as independent particles. We can even split the nucleus into protons and neutrons, although the neutrons outside of a nucleus are short lived. But no matter how hard we try, we cannot split a proton into its constituent quarks. In fact we know that quarks cannot exist outside of hadrons. This is called quark- or color-confinement. Quarks are supposed to come in three “colors,” but the only composites we can observe are colorless. We have stretched the idea of composition by accepting the fact that a composite structure can never be decomposed into its constituents.

## I’m Slightly Perturbed

How do physicists deal with quantum mechanics? They use mathematics. Richard Feynman came up with ingenious ways to perform calculations in quantum electrodynamics using perturbation theory. The idea of perturbation theory is that you start with the simple approximation and keep adding corrections to it, just like with celestial mechanics. The terms in the expansion can be visualized as Feynman diagrams. For instance, the lowest term in the interaction between two electrons corresponds to a diagram in which the electrons exchange a virtual photon.

This terms gives the classical repulsive force between two charged particles. The first quantum correction to it involves the exchange of two virtual photons. And here’s the kicker: this correction is not only larger than the original term — it’s infinite! So much for small corrections. Yes, there are tricks to shove this infinity under the carpet, but everybody who’s not fooling themselves understands that the so called renormalization is an ugly hack. We don’t understand what the world looks like at very small scales and we try to ignore it using tricks that make mathematicians faint.

Physicists are very pragmatic. As long as there is a recipe for obtaining results that can be compared with the experiment, they are happy with a theory. In this respect, the Standard Model is the most successful theory in the Universe. It’s a unified quantum field theory of electromagnetism, strong, and weak interactions that produces results that are in perfect agreement with all high-energy experiments we were able to perform to this day. Unfortunately, the Standard Model does not give us the understanding of what’s happening. It’s as if physicists were given an alien cell phone and figured out how to use various applications on it but have no idea about the internal workings of the gadget. And that’s even before we try to involve gravity in the model.

The “periodic table” of elementary particles.

The prevailing wisdom is that these are just little setbacks on the way toward the ultimate theory of everything. We just have to figure out the correct math. It may take us twenty years, or two hundred years, but we’ll get there. The hope that math is the answer led theoretical physicists to study more and more esoteric corners of mathematics and to contribute to its development. One of the most prominent theoretical physicists, Edward Witten, the father of M-theory that unified a number of string theories, was awarded the prestigious Fields Medal for his contribution to mathematics (Nobel prizes are only awarded when a theory is confirmed by experiment which, in the case of string theory, may be a be long way off, if ever).

If mathematics is discoverable, then we might indeed be able to find the right combination of math and physics to unlock the secrets of the Universe. That would be extremely lucky, though.

There is one property of all of mathematics that is really striking, and it’s most clearly visible in foundational theories, such as logic, category theory, and lambda calculus. All these theories are about composability. They all describe how to construct more complex things from simpler elements. Logic is about combining simple predicates using conjunctions, disjunctions, and implications. Category theory starts by defining a composition of arrows. It then introduces ways of combining objects using products, coproducts, and exponentials. Typed lambda calculus, the foundation of computer languages, shows us how to define new types using product types, sum types, and functions. In fact it can be shown that constructive logic, cartesian closed categories, and typed lambda calculus are three different formulations of the same theory. This is known as the Curry Howard Lambek isomorphism. We’ve been discovering the same thing over and over again.

It turns out that most mathematical theories have a skeleton that can be captured by category theory. This should not be a surprise considering how the biggest revolutions in mathematics were the result of realization that two or more disciplines were closely related to each other. The latest such breakthrough was the proof of the Fermat’s last theorem. This proof was based on the Taniyama-Shimura conjecture that related the study of elliptic curves to modular forms — two radically different branches of mathematics.

Earlier, geometry was turned upside down when it became obvious that one can define shapes using algebraic equations in cartesian coordinates. This retooling of geometry turned out to be very advantageous, because algebra has better compositional qualities than Euclidean-style geometry.

Finally, any mathematical theory starts with a set of axioms, which are combined using proof systems to produce theorems. Proof systems are compositional which, again, supports the view that mathematics is all about composition. But even there we hit a snag when we tried to decompose the space of all statements into true and false. Gödel has shown that, in any non-trivial theory, we can formulate a statement that can neither be proved to be right or wrong, and thus the Hilbert’s great project of defining one grand mathematical theory fell apart. It’s as if we have discovered that the Lego blocks we were playing with were not part of a giant Lego spaceship.

## Where Does Composability Come From?

It’s possible that composability is the fundamental property of the Universe, which would make it comprehensible to us humans, and it would validate our physics and mathematics. Personally, I’m very reluctant to accept this point of view, because it would give intelligent life a special place in the grand scheme of things. It’s as if the laws of the Universe were created in such a way as to be accessible to the brains of the evolved monkeys that we are.

It’s much more likely that mathematics describes the ways our brains are capable of composing simpler things into more complex systems. Anything that we can comprehend using our brains must, by necessity, be decomposable — and there are only so many ways of putting things together. Discovering mathematics means discovering the structure of our brains. Platonic ideals exist only as patterns of connections between neurons.

The amazing scientific progress that humanity has been able to make to this day was possible because there were so many decomposable phenomena available to us. Granted, as we progressed, we had to come up with more elaborate composition schemes. We have discovered differential equations, Hilbert spaces, path integrals, Lie groups, tensor calculus, fiber bundles, etc. With the combination of physics and mathematics we have tapped into a gold vein of composable phenomena. But research takes more and more resources as we progress, and it’s possible that we have reached the bedrock that may be resistant to our tools.

We have to seriously consider the possibility that there is a major incompatibility between the complexity of the Universe and the simplicity of our brains. We are not without recourse, though. We have at our disposal tools that multiply the power of our brains. The first such tool is language, which helps us combine brain powers of large groups of people. The invention of the printing press and then the internet helped us record and gain access to vast stores of information that’s been gathered by the combined forces of teams of researchers over long periods of time. But even though this is quantitative improvement, the processing of this information still relies on composition because it has to be presented to human brains. The fact that work can be divided among members of larger teams is proof of its decomposability. This is also why we sometimes need a genius to make a major breakthrough, when a task cannot be easily decomposed into smaller, easier, subtasks. But even genius has to start somewhere, and the ability to stand on the shoulders of giants is predicated on decomposability.

## Can Computers Help?

The role of computers in doing science is steadily increasing. To begin with, once we have a scientific theory, we can write computer programs to perform calculations. Nobody calculates the orbits of planets by hand any more — computers can do it much faster and error free. We are also beginning to use computers to prove mathematical theorems. The four-color problem is an example of a proof that would be impossible without the help of computers. It was decomposable, but the number of special cases was well over a thousand (it was later reduced to 633 — still too many, even for a dedicated team of graduate students).

Every planar map can be colored using only four colors.

Computer programs that are used in theorem proving provide a level of indirection between the mind of a scientist and formal manipulations necessary to prove a theorem. A programmer is still in control, and the problem is decomposable, but the number of components may be much larger, often too large for a human to go over one by one. The combined forces of humans and computers can stretch the limits of composability.

But how can we tackle problems that cannot be decomposed? First, let’s observe that in real life we rarely bother to go through the process of detailed analysis. In fact the survival of our ancestors depended on the ability to react quickly to changing circumstances, to make instantaneous decisions. When you see a tiger, you don’t decompose the image into individual parts, analyze them, and put together a model of a tiger. Image recognition is one of these areas where the analytic approach fails miserably. People tried to write programs that would recognize faces using separate subroutines to detect eyes, noses, lips, ears, etc., and composing them together, but they failed. And yet we instinctively recognize faces of familiar people at a glance.

## Neural Networks and the AI

We are now able to teach computers to classify images and recognize faces. We do it not by designing dedicated algorithms; we do it by training artificial neural networks. A neural network doesn’t start with a subsystem for recognizing eyes or noses. It’s possible that, in the process of training, it will develop the notions of lines, shadows, maybe even eyes and noses. But by no means is this necessary. Those abstractions, if they evolve, would be encoded in the connections between its neurons. We might even help the AI develop some general abstractions by tweaking its architecture. It’s common, for instance, to include convolutional layers to pre-process the input. Such a layer can be taught to recognize local features and compress the input to a more manageable size. This is very similar to how our own vision works: the retina in our eye does this kind of pre-processing before sending compressed signals through the optic nerve.

Compression is the key to matching the complexity of the task at hand to the simplicity of the system that is processing it. Just like our sensory organs and brains compress the inputs, so do neural networks. There are two kinds of compression: the kind that doesn’t lose any information, just removing the redundancy in the original signal; and the lossy kind that throws away irrelevant information. The task of deciding what information is irrelevant is in itself a process of discovery. The difference between the Earth and a sphere is the size of the Himalayas, but we ignore it when when we look at the globe. When calculating orbits around the Sun, we shrink all planets to points. That’s compression by elimination of details that we deem less important for the problem we are solving. In science, this kind of compression is called abstraction.

We are still way ahead of neural networks in our capacity to create abstractions. But it’s possible that, at some point, they’ll catch up with us. The problem is: Will we be able to understand machine-generated abstractions? We are already at the limits of understanding human-generated abstractions. You may count yourself a member of a very small club if you understand the statement “monad is a monoid in the category of endofunctors” that is chock full of mathematical abstractions. If neural networks come up with new abstractions/compression schemes, we might not be able to reverse engineer them. Unlike a human scientist, an AI is unlikely to be able to explain to us how it came up with a particular abstraction.

I’m not scared about a future AI trying to eliminate human kind (unless that’s what its design goals are). I’m afraid of the scenario in which we ask the AI a question like, “Can quantum mechanics be unified with gravity?” and it will answer, “Yes, but I can’t explain it to you, because you don’t have the brain capacity to understand the explanation.”

And this is the optimistic scenario. It assumes that such questions can be answered within the decomposition/re-composition framework. That the Universe can be decomposed into particles, waves, fields, strings, branes, and maybe some new abstractions that we haven’t even though about. We would at least get the satisfaction that we were on the right path but that the number of moving parts was simply too large for us to assimilate — just like with the proof of the four-color theorem.

But it’s possible that this reductionist scenario has its limits. That the complexity of the Universe is, at some level, irreducible and cannot be captured by human brains or even the most sophisticated AIs.

There are people who believe that we live in a computer simulation. But if the Universe is irreducible, it would mean that the smallest computer on which such a simulation could be run is the Universe itself, in which case it doesn’t make sense to call it a simulation.

## Conclusion

The scientific method has been tremendously successful in explaining the workings of our world. It led to exponential expansion of science and technology that started in the 19th century and continues to this day. We are so used to its successes that we are betting the future of humanity on it. Usually when somebody attacks the scientific method, they are coming from the background of obscurantism. Such attacks are easily rebuffed or dismissed. What I’m arguing is that science is not a property of the Universe, but rather a construct of our limited brains. We have developed some very sophisticated tools to create models of the Universe based on the principle of composition. Mathematics is the study of various ways of composing things and physics is applied composition. There is no guarantee, however, that the Universe is decomposable. Assuming that would be tantamount to postulating that its structure revolves around human brains, just like we used to believe that the Universe revolves around Earth.

You can also watch my talk on this subject.

Trying to improve my Haskell coding skills, I decided to test myself at solving the 2017 Advent of Code problems. It’s been a lot of fun and a great learning experience. One problem in particular stood out for me because, for the first time, it let me apply, in anger, the ideas I learned from category theory. But I’m not going to talk about category theory this time, just about programming.

The problem is really about dominoes. You get a set of dominoes, or pairs of numbers (the twist is that the numbers are not capped at 6), and you are supposed to build a chain, in which the numbers are matched between consecutive pieces. For instance, the chain [(0, 5), (5, 12), (12, 12), (12, 1)] is admissible. Like in the real game, the pieces can be turned around, so (1, 3) can be also used as (3, 1). The goal is to build a chain that starts from zero and maximizes the score, which is the sum of numbers on the pieces used.

The algorithm is pretty straightforward. You put all dominoes in a data structure that lets you quickly pull the pieces you need, you recursively build all possible chains, evaluate their sums, and pick the winner.

type Piece = (Int, Int)
type Chain = [Piece]

At each step in the procedure, we will be looking for a domino with a number that matches the end of the current chain. If the chain is [(0, 5), (5, 12)], we will be looking for pieces with 12 on one end. It’s best to organize pieces in a map indexed by these numbers. To allow for turning the dominoes, we’ll add each piece twice. For instance, the piece (12, 1) will be added as (12, 1) and (1, 12). We can make a small optimization for symmetric dominoes, like (12, 12), by adding them only once.

We’ll use the Map from the Prelude:

import qualified Data.Map as Map

The Map we’ll be using is:

type Pool = Map.Map Int [Int]

The key is an integer, the value is a list of integers corresponding to the other ends of pieces. This is made clear by the way we insert each piece in the map:

addPiece :: Piece -> Pool -> Pool
addPiece (m, n) = if m /= n
where
case Map.lookup m pool of
Nothing  -> Map.insert m [n] pool
Just lst -> Map.insert m (n : lst) pool

I used point-free notation. If that’s confusing, here’s the translation:

addPiece :: Piece -> Pool -> Pool
addPiece (m, n) pool = if m /= n
else add m n pool

As I said, each piece is added twice, except for the symmetric ones.

After using a piece in a chain, we’ll have to remove it from the pool:

removePiece :: Piece -> Pool -> Pool
removePiece (m, n) = if m /= n
then rem m n . rem n m
else rem m n
where
rem :: Int -> Int -> Pool -> Pool
rem m n pool =
case fromJust $Map.lookup m pool of [] -> Map.delete m pool lst -> Map.insert m (delete n lst) pool You might be wondering why I’m using a partial function fromJust. In industrial-strength code I would pattern match on the Maybe and issue a diagnostic if the piece were not found. Here I’m fine with a fatal exception if there’s a bug in my reasoning. It’s worth mentioning that, like all data structures in Haskell, Map is a persistent data structure. It means that it’s never modified in place, and its previous versions persist. This is invaluable in this kind of recursive algorithms, where we use backtracking to explore multiple paths. The input of the puzzle is a list of pieces. We’ll start by inserting them into our map. In functional programming we never think in terms of loops: we transform data. A list of pieces is a (recursive) data structure. We want to traverse it and accumulate the information stored in it into a Map. This kind of transformation is, in general, called a catamorphism. A list catamorphism is called a fold. It is specified by two things: (1) its action on the empty list (here, it turns it into Map.empty), and (2) its action on the head of the current list and the accumulated result of processing the tail. The head of the current list is a piece, and the accumulator is the Map. The function addPiece has just the right signature: presort :: [Piece] -> Pool presort = foldr addPiece Map.empty I’m using a right fold, but a left fold would work fine, too. Again, this is point free notation. Now that the preliminaries are over, let’s think about the algorithm. My first approach was to define a bunch of mutually recursive functions that would build all possible chains, score them, and then pick the best one. After a few tries, I got hopelessly bogged down in details. I took a break and started thinking. Functional programming is all about functions, right? Using a recursive function is the correct approach. Or is it? The more you program in Haskell, the more you realize that you get the most power by considering wholesale transformations of data structures. When creating a Map of pieces, I didn’t write a recursive function over a list — I used a fold instead. Of course, behind the scenes, fold is implemented using recursion (which, thanks to tail recursion, is usually transformed into a loop). But the idea of applying transformations to data structures is what lets us soar above the sea of details and into the higher levels of abstraction. So here’s the new idea: let’s create one gigantic data structure that contains all admissible chains built from the domino pieces at our disposal. The obvious choice is a tree. At the root we’ll have the starting number: zero, as specified in the description of the problem. All pool pieces that have a zero at one end will start a new branch. Instead of storing the whole piece at the node, we can just store the second number — the first being determined by the parent. So a piece (0, 5) starts a branch with a 5 node right below the 0 node. Next we’d look for pieces with a 5. Suppose that one of them is (5, 12), so we create a node with a 12, and so on. A tree with a variable list of branches is called a rose tree: data Rose = NodeR Int [Rose] deriving Show It’s always instructive to keep in mind at least one special boundary case. Consider what would happen if (0, 5) were the only piece in the pool. We’d end up with the following tree: NodeR 0 [NodeR 5 []] We’ll come back to this example later. The next question is, how do we build such a tree? We start with a set of dominoes gathered in a Map. At every step in the algorithm we pick a matching domino, remove it from the pool, and start a new subtree. To start a subtree we need a number and a pool of remaining pieces. Let’s call this combination a seed. The process of building a recursive data structure from a seed is called anamorphism. It’s a well studied and well understood process, so let’s try to apply it in our case. The key is to separate the big picture from the small picture. The big picture is the recursive data structure — the rose tree, in our case. The small picture is what happens at a single node. Let’s start with the small picture. We are given a seed of the type (Int, Pool). We use the number as a key to retrieve a list of matching pieces from the Pool (strictly speaking, just a list of numbers corresponding to the other ends of the pieces). Each piece will start a new subtree. The seed for such a subtree consists of the number at the other end of the piece and a new Pool with the piece removed. A function that produces seeds from a given seed looks like this: grow (n, pool) = case Map.lookup n pool of Nothing -> [] Just ms -> [(m, removePiece (m, n) pool) | m <- ms] Now we have to translate this to a procedure that recreates a complete tree. The trick is to split the definition of the tree into local and global pictures. The local picture is captured by this data structure: data TreeF a = NodeF Int [a] deriving Functor Here, the recursion of the original rose tree is replaced by the type parameter a. This data structure, which describes a single node, or a very shallow tree, is a functor with respect to a (the compiler is able to automatically figure out the implementation of fmap, but you can also do it by hand). It’s important to realize that the recursive definition of a rose tree can be recovered as a fixed point of this functor. We define the fixed point as the data structure X that results from replacing a in the definition of TreeF with X. Symbolically: X = TreeF X In fact, this procedure of finding the fixed point can be written in all generality for any functor f. If we call the fixed point Fix f, we can define it by replacing the type argument to f with Fix f, as in: newtype Fix f = Fix { unFix :: f (Fix f) } Our rose tree is the fixed point of the functor TreeF: type Tree = Fix TreeF This splitting of the recursive part from the functor part is very convenient because it lets us use non-recursive functions to generate or traverse recursive data structures. In particular, the procedure of unfolding a data structure from a seed is captured by a non-recursive function of the following signature: type Coalgebra f a = a -> f a Here, a serves as the seed that generates a single node populated with new seeds. We have already seen a function that generates seeds, we only have to cast it in the form of a coalgebra: coalg :: Coalgebra TreeF (Int, Pool) coalg (n, pool) = case Map.lookup n pool of Nothing -> NodeF n [] Just ms -> NodeF n [(m, removePiece (m, n) pool) | m <- ms] The pièce de résistance is the formula that uses a given coalgebra to unfold a recursive date structure. It’s called the anamorphism: ana :: Functor f => Coalgebra f a -> a -> Fix f ana coalg = Fix . fmap (ana coalg) . coalg Here’s the play-by-play: The anamorphism takes a seed and applies the coalgebra to it. That generates a single node with new seeds in place of children. Then it fmaps the whole anamorphism over this node, thus unfolding the seeds into full-blown trees. Finally, it applies the constructor Fix to produce the final tree. Notice that this is a recursive definition. We are now in a position to build a tree that contains all admissible chains of dominoes. We do it by applying the anamorphism to our coalgebra: tree = ana coalg Once we have this tree, we could traverse it, or fold it, to retrieve all the chains and find the best one. But once we have our tree in the form of a fixed point, we can be smart about folds as well. The procedure is essentially the same, except that now we are collecting information from the nodes of a tree. To do this, we define a non-recursive function called the algebra: type Algebra f a = f a -> a The type a is called the carrier of the algebra. It plays the role of the accumulator of data. We are interested in the algebra that would help us collect chains of dominoes from our rose tree. Suppose that we have already applied this algebra to all children of a particular node. Each child tree would produce its own list of chains. Our goal is to extend those chains by adding one more piece that is determined by the current node. Let’s start with our earlier trivial case of a tree that contains a single piece (0, 5): NodeR 0 [Node 5 []] We replace the leaf node with some value x of the still unspecified carrier type. We get: NodeR 0 x Obviously, x must contain the number 5, to let us recover the original piece (0, 5). The result of applying the algebra to the top node must produce the chain [(0, 5)]. These two pieces of information suggest the carrier type to be a combination of a number and a list of chains. The leaf node is turned to (5, []), and the top node produces (0, [[(0, 5)]]). With this choice of the carrier type, the algebra is easy to implement: chainAlg :: Algebra TreeF (Int, [Chain]) chainAlg (NodeF n []) = (n, []) chainAlg (NodeF n lst) = (n, concat [push (n, m) bs | (m, bs) <- lst]) where push :: (Int, Int) -> [Chain] -> [Chain] push (n, m) [] = [[(n, m)]] push (n, m) bs = [(n, m) : br | br <- bs]] For the leaf (a node with no children), we return the number stored in it together with an empty list. Otherwise, we gather the chains from children. If a child returns an empty list of chains, meaning it was a leaf, we create a single-piece chain. If the list is not empty, we prepend a new piece to all the chains. We then concatenate all lists of chains into one list. All that remains is to apply our algebra recursively to the whole tree. Again, this can be done in full generality using a catamorphism: cata :: Functor f => Algebra f a -> Fix f -> a cata alg = alg . fmap (cata alg) . unFix We start by stripping the fixed point constructor using unFix to expose a node, apply the catamorphism to all its children, and apply the algebra to the node. To summarize: we use an anamorphism to create a tree, then use a catamorphism to convert the tree to a list of chains. Notice that we don’t need the tree itself — we only use it to drive the algorithm. Because Haskell is lazy, the tree is evaluated on demand, node by node, as it is walked by the catamorphism. This combination of an anamorphism followed immediately by a catamorphism comes up often enough to merit its own name. It’s called a hylomorphism, and can be written concisely as: hylo :: Functor f => Algebra f a -> Coalgebra f b -> b -> a hylo f g = f . fmap (hylo f g) . g In our example, we produce a list of chains using a hylomorphism: let (_, chains) = hylo chainAlg coalg (0, pool) The solution of the puzzle is the chain with the maximum score: maximum$ fmap score chains

score :: Chain -> Int
score = sum . fmap score1
where score1 (m, n) = m + n

## Conclusion

The solution that I described in this post was not the first one that came to my mind. I could have persevered with the more obvious approach of implementing a big recursive function or a series of smaller mutually recursive ones. I’m glad I didn’t. I have found out that I’m much more productive when I can reason in terms of applying transformations to data structures.

You might think that a data structure that contains all admissible chains of dominoes would be too large to fit comfortably in memory, and you would probably be right in a strict language. But Haskell is a lazy language, and data structures work more often as control structures than as storage for data.

The use of recursion schemes further simplifies programming. You can design algebras and coalgebras as non-recursive functions, which are much easier to reason about, and then apply them to recursive data structures using catamorphisms and anamorphisms. You can even combine them into hylomorphisms.

It’s worth mentioning that we routinely apply these techniques to lists. I already mentioned that a fold is nothing but a list catamorphism. The functor in question can be written as:

data ListF e a = Nil | Cons e a
deriving Functor

A list is a fixed point of this functor:

type List e = Fix (ListF e)

An algebra for the list functor is implemented by pattern matching on its two constructors:

alg :: ListF e a -> a
alg Nil = z
alg (Cons e a) = f e a

Notice that a list algebra is parameterized by two items: the value z and the function f :: e -> a -> a. These are exactly the parameters to foldr. So, when you are calling foldr, you are defining an algebra and performing a list catamorphism.

Likewise, a list anamorphism takes a coalgebra and a seed and produces a list. Finite lists are produced by the anamorphism called unfoldr:

unfoldr :: (b -> Maybe (a, b)) -> b -> [a]

You can learn more about algebras and coalgebras from the point of view of category theory, in another blog post.

The source code for this post is available on GitHub.

There is no good place to end a book on category theory. There’s always more to learn. Category theory is a vast subject. At the same time, it’s obvious that the same themes, concepts, and patterns keep showing up over and over again. There is a saying that all concepts are Kan extensions and, indeed, you can use Kan extensions to derive limits, colimits, adjunctions, monads, the Yoneda lemma, and much more. The notion of a category itself arises at all levels of abstraction, and so does the concept of a monoid and a monad. Which one is the most basic? As it turns out they are all interrelated, one leading to another in a never-ending cycle of abstractions. I decided that showing these interconnections might be a good way to end this book.

## Bicategories

One of the most difficult aspects of category theory is the constant switching of perspectives. Take the category of sets, for instance. We are used to defining sets in terms of elements. An empty set has no elements. A singleton set has one element. A cartesian product of two sets is a set of pairs, and so on. But when talking about the category Set I asked you to forget about the contents of sets and instead concentrate on morphisms (arrows) between them. You were allowed, from time to time, to peek under the covers to see what a particular universal construction in Set described in terms of elements. The terminal object turned out to be a set with one element, and so on. But these were just sanity checks.

A functor is defined as a mapping of categories. It’s natural to consider a mapping as a morphism in a category. A functor turned out to be a morphism in the category of categories (small categories, if we want to avoid questions about size). By treating a functor as an arrow, we forfeit the information about its action on the internals of a category (its objects and morphisms), just like we forfeit the information about the action of a function on elements of a set when we treat it as an arrow in Set. But functors between any two categories also form a category. This time you are asked to consider something that was an arrow in one category to be an object in another. In a functor category functors are objects and natural transformations are morphisms. We have discovered that the same thing can be an arrow in one category and an object in another. The naive view of objects as nouns and arrows as verbs doesn’t hold.

Instead of switching between two views, we can try to merge them into one. This is how we get the concept of a 2-category, in which objects are called 0-cells, morphisms are 1-cells, and morphisms between morphisms are 2-cells.

0-cells a, b; 1-cells f, g; and a 2-cell α.

The category of categories Cat is an immediate example. We have categories as 0-cells, functors as 1-cells, and natural transformations as 2-cells. The laws of a 2-category tell us that 1-cells between any two 0-cells form a category (in other words, C(a, b) is a hom-category rather than a hom-set). This fits nicely with our earlier assertion that functors between any two categories form a functor category.

In particular, 1-cells from any 0-cell back to itself also form a category, the hom-category C(a, a); but that category has even more structure. Members of C(a, a) can be viewed as arrows in C or as objects in C(a, a). As arrows, they can be composed with each other. But when we look at them as objects, the composition becomes a mapping from a pair of objects to an object. In fact it looks very much like a product — a tensor product to be precise. This tensor product has a unit: the identity 1-cell. It turns out that, in any 2-category, a hom-category C(a, a) is automatically a monoidal category with the tensor product defined as composition of 1-cells. Associativity and unit laws simply fall out from the corresponding category laws.

Let’s see what this means in our canonical example of a 2-category Cat. The hom-category Cat(a, a) is the category of endofunctors on a. Endofunctor composition plays the role of a tensor product in it. The identity functor is the unit with respect to this product. We’ve seen before that endofunctors form a monoidal category (we used this fact in the definition of a monad), but now we see that this is a more general phenomenon: endo-1-cells in any 2-category form a monoidal category. We’ll come back to it later when we generalize monads.

You might recall that, in a general monoidal category, we did not insist on the monoid laws being satisfied on the nose. It was often enough for the unit laws and the associativity laws to be satisfied up to isomorphism. In a 2-category, monoidal laws in C(a, a) follow from composition laws for 1-cells. These laws are strict, so we will always get a strict monoidal category. It is, however, possible to relax these laws as well. We can say, for instance, that a composition of the identity 1-cell ida with another 1-cell, f :: a -> b, is isomorphic, rather than equal, to f. Isomorphism of 1-cells is defined using 2-cells. In other words, there is a 2-cell:

ρ :: f ∘ ida -> f

that has an inverse.

Identity law in a bicategory holds up to isomorphism (an invertible 2-cell ρ).

We can do the same for the left identity and associativity laws. This kind of relaxed 2-category is called a bicategory (there are some additional coherency laws, which I will omit here).

As expected, endo-1-cells in a bicategory form a general monoidal category with non-strict laws.

An interesting example of a bicategory is the category of spans. A span between two objects a and b is an object x and a pair of morphisms:

f :: x -> a
g :: x -> b

You might recall that we used spans in the definition of a categorical product. Here, we want to look at spans as 1-cells in a bicategory. The first step is to define a composition of spans. Suppose that we have an adjoining span:

f':: y -> b
g':: y -> c

The composition would be a third span, with some apex z. The most natural choice for it is the pullback of g along f'. Remember that a pullback is the object z together with two morphisms:

h :: z -> x
h':: z -> y

such that:

g ∘ h = f' ∘ h'

which is universal among all such objects.

For now, let’s concentrate on spans over the category of sets. In that case, the pullback is just a set of pairs (p, q) from the cartesian product x × y such that:

g p = f' q

A morphism between two spans that share the same endpoints is defined as a morphism h between their apices, such that the appropriate triangles commute.

A 2-cell in Span.

To summarize, in the bicategory Span: 0-cells are sets, 1-cells are spans, 2-cells are span morphisms. An identity 1-cell is a degenerate span in which all three objects are the same, and the two morphisms are identities.

We’ve seen another example of a bicategory before: the bicategory Prof of profunctors, where 0-cells are categories, 1-cells are profunctors, and 2-cells are natural transformations. The composition of profunctors was given by a coend.

By now you should be pretty familiar with the definition of a monad as a monoid in the category of endofunctors. Let’s revisit this definition with the new understanding that the category of endofunctors is just one small hom-category of endo-1-cells in the bicategory Cat. We know it’s a monoidal category: the tensor product comes from the composition of endofunctors. A monoid is defined as an object in a monoidal category — here it will be an endofunctor T — together with two morphisms. Morphisms between endofunctors are natural transformations. One morphism maps the monoidal unit — the identity endofunctor — to T:

η :: I -> T

The second morphism maps the tensor product of T ⊗ T to T. The tensor product is given by endofunctor composition, so we get:

μ :: T ∘ T -> T

We recognize these as the two operations defining a monad (they are called return and join in Haskell), and we know that monoid laws turn to monad laws.

Now let’s remove all mention of endofunctors from this definition. We start with a bicategory C and pick a 0-cell a in it. As we’ve seen earlier, the hom-category C(a, a) is a monoidal category. We can therefore define a monoid in C(a, a) by picking a 1-cell, T, and two 2-cells:

η :: I -> T
μ :: T ∘ T -> T

satisfying the monoid laws. We call this a monad.

That’s a much more general definition of a monad using only 0-cells, 1-cells, and 2-cells. It reduces to the usual monad when applied to the bicategory Cat. But let’s see what happens in other bicategories.

Let’s construct a monad in Span. We pick a 0-cell, which is a set that, for reasons that will become clear soon, I will call Ob. Next, we pick an endo-1-cell: a span from Ob back to Ob. It has a set at the apex, which I will call Ar, equipped with two functions:

dom :: Ar -> Ob
cod :: Ar -> Ob

Let’s call the elements of the set Ar “arrows.” If I also tell you to call the elements of Ob “objects,” you might get a hint where this is leading to. The two functions dom and cod assign the domain and the codomain to an “arrow.”

To make our span into a monad, we need two 2-cells, η and μ. The monoidal unit, in this case, is the trivial span from Ob to Ob with the apex at Ob and two identity functions. The 2-cell η is a function between the apices Ob and Arr. In other words, η assigns an “arrow” to every “object.” A 2-cell in Span must satisfy commutation conditions — in this case:

dom ∘ η = id
cod ∘ η = id

In components, this becomes:

dom (η ob) = ob = cod (η ob)

where ob is an “object” in Ob. In other words, η assigns to every “object” and “arrow” whose domain and codomain are that “object.” We’ll call this special “arrow” the “identity arrow.”

The second 2-cell μ acts on the composition of the span Ar with itself. The composition is defined as a pullback, so its elements are pairs of elements from Ar — pairs of “arrows” (a1, a2). The pullback condition is:

cod a1 = dom a2

We say that a2 and a1 are “composable,” because the domain of one is the codomain of the other.

The 2-cell μ is a function that maps a pair of composable arrows (a1, a2) to a single arrow a3 from Ar. In other words μ defines composition of arrows.

It’s easy to check that monad laws correspond to identity and associativity laws for arrows. We have just defined a category (a small category, mind you, in which objects and arrows form sets).

So, all told, a category is just a monad in the bicategory of spans.

What is amazing about this result is that it puts categories on the same footing as other algebraic structures like monads and monoids. There is nothing special about being a category. It’s just two sets and four functions. In fact we don’t even need a separate set for objects, because objects can be identified with identity arrows (they are in one-to-one correspondence). So it’s really just a set and a few functions. Considering the pivotal role that category theory plays in all of mathematics, this is a very humbling realization.

## Challenges

1. Derive unit and associativity laws for the tensor product defined as composition of endo-1-cells in a bicategory.
2. Check that monad laws for a monad in Span correspond to identity and associativity laws in the resulting category.
3. Show that a monad in Prof is an identity-on-objects functor.

## Bibliography

Nowadays you can’t talk about functional programming without mentioning monads. But there is an alternative universe in which, by chance, Eugenio Moggi turned his attention to Lawvere theories rather than monads. Let’s explore that universe.

## Universal Algebra

There are many ways of describing algebras at various levels of abstraction. We try to find a general language to describe things like monoids, groups, or rings. At the simplest level, all these constructions define operations on elements of a set, plus some laws that must be satisfied by these operations. For instance, a monoid can be defined in terms of a binary operation that is associative. We also have a unit element and unit laws. But with a little bit of imagination we can turn the unit element to a nullary operation — an operation that takes no arguments and returns a special element of the set. If we want to talk about groups, we add a unary operator that takes an element and returns its inverse. There are corresponding left and right inverse laws to go with it. A ring defines two binary operators plus some more laws. And so on.

The big picture is that an algebra is defined by a set of n-ary operations for various values of n, and a set of equational identities. These identities are all universally quantified. The associativity equation must be satisfied for all possible combinations of three elements, and so on.

Incidentally, this eliminates fields from consideration, for the simple reason that zero (unit with respect to addition) has no inverse with respect to multiplication. The inverse law for a field can’t be universally quantified.

This definition of a universal algebra can be extended to categories other than Set, if we replace operations (functions) with morphisms. Instead of a set, we select an object a (called a generic object). A unary operation is just an endomorphism of a. But what about other arities (arity is the number of arguments for a given operation)? A binary operation (arity 2) can be defined as a morphism from the product a×a back to a. A general n-ary operation is a morphism from the n-th power of a to a:

αn :: an -> a

A nullary operation is a morphism from the terminal object (the zeroth power of a). So all we need in order to define any algebra is a category whose objects are powers of one special object a. The specific algebra is encoded in the hom-sets of this category. This is a Lawvere theory in a nutshell.

The derivation of Lawvere theories goes through many steps, so here’s the roadmap:

1. Category of finite sets FinSet.
2. Its skeleton F.
3. Its opposite Fop.
4. Lawvere theory L: an object in the category Law.
5. Model M of a Lawvere category: an object in the category Mod(Law, Set).

## Lavwere Theories

All Lawvere theories share a common backbone. All objects in a Lawvere theory are generated from just one object using products (really, just powers). But how do we define these products in a general category? It turns out that we can define products using a mapping from a simpler category. In fact this simpler category may define coproducts instead of products, and we’ll use a contravariant functor to embed them in our target category. A contravariant functor turns coproducts into products and injections to projections.

The natural choice for the backbone of a Lawvere category is the category of finite sets, FinSet. It contains the empty set 0, a singleton set 1, a two-element set 2, and so on. All objects in this category can be generated from the singleton set using coproducts (treating the empty set as a special case of a nullary coproduct). For instance, a two-element set is a sum of two singletons, 2 = 1 + 1, as expressed in Haskell:

type Two = Either () ()

However, even though it’s natural to think that there’s only one empty set, there may be many distinct singleton sets. In particular, the set 1 + 0 is different from the set 0 + 1, and different from 1 — even though they are all isomorphic. The coproduct in the category of sets is not associative. We can remedy that situation by building a category that identifies all isomorphic sets. Such a category is called a skeleton. In other words, the backbone of any Lawvere theory is the skeleton F of FinSet. The objects in this category can be identified with natural numbers (including zero) that correspond to the element count in FinSet. Coproduct plays the role of addition. Morphisms in F correspond to functions between finite sets. For instance, there is a unique morphism from 0 to n (empty set being the initial object), no morphisms from n to 0 (except 0->0), n morphisms from 1 to n (the injections), one morphism from n to 1, and so on. Here, n denotes an object in F corresponding to all n-element sets in FinSet that have been identified through isomorphims.

Using the category F we can formally define a Lawvere theory as a category L equipped with a special functor

IL :: Fop -> L

This functor must be a bijection on objects and it must preserve finite products (products in Fop are the same as coproducts in F):

IL (m × n) = IL m × IL n

You may sometimes see this functor characterized as identity-on-objects, which means that the objects in F and L are the same. We will therefore use the same names for them — we’ll denote them by natural numbers. Keep in mind though that objects in F are not the same as sets (they are classes of isomorphic sets).

The hom-sets in L are, in general, richer than those in Fop. They may contain morphisms other than the ones corresponding to functions in FinSet (the latter are sometimes called basic product operations). Equational laws of a Lawvere theory are encoded in those morphisms.

The key observation is that the singleton set 1 in F is mapped to some object that we also call 1 in L, and all the other objects in L are automatically powers of this object. For instance, the two-element set 2 in F is the coproduct 1+1, so it must be mapped to a product 1×1 (or 12) in L. In this sense, the category F behaves like the logarithm of L.

Among morphisms in L we have those transferred by the functor IL from F. They play structural role in L. In particular coproduct injections ik become product projections pk. A useful intuition is to imagine the projection:

pk :: 1n -> 1

as the prototype for a function of n variables that ignores all but the k’th variable. Conversely, constant morphisms n->1 in F become diagonal morphisms 1->1n in L. They correspond to duplication of variables.

The interesting morphisms in L are the ones that define n-ary operations other than projections. It’s those morphisms that distinguish one Lawvere theory from another. These are the multiplications, the additions, the selections of unit elements, and so on, that define the algebra. But to make L a full category, we also need compound operations n->m (or, equivalently, 1n -> 1m). Because of the simple structure of the category, they turn out to be products of simpler morphisms of the type n->1. This is a generalization of the statement that a function that returns a product is a product of functions (or, as we’ve seen earlier, that the hom-functor is continuous).

Lawvere theory L is based on Fop, from which it inherits the “boring” morphisms that define the products. It adds the “interesting” morphisms that describe the n-ary operations (dotted arrows).

Lavwere theories form a category Law, in which morphisms are functors that preserve finite products and commute with the functors I. Given two such theories, (L, IL) and (L', I'L'), a morphism between them is a functor F :: L -> L' such that:

F (m × n) = F m × F n
F ∘ IL = I'L'

Morphisms between Lawvere theories encapsulate the idea of the interpretation of one theory inside another. For instance, group multiplication may be interpreted as monoid multiplication if we ignore inverses.

The simplest trivial example of a Lawvere category is Fop itself (corresponding to the choice of the identity functor for IL). This Lawvere theory that has no operations or laws happens to be the initial object in Law.

At this point it would be very helpful to present a non-trivial example of a Lawvere theory, but it would be hard to explain it without first understanding what models are.

## Models of Lawvere Theories

The key to understand Lawvere theories is to realize that one such theory generalizes a lot of individual algebras that share the same structure. For instance, the Lawvere theory of monoids describes the essence of being a monoid. It must be valid for all monoids. A particular monoid becomes a model of such a theory. A model is defined as a functor from the Lawvere theory L to the category of sets Set. (There are generalizations of Lawvere theories that use other categories for models but here I’ll just concentrate on Set.) Since the structure of L depends heavily on products, we require that such a functor preserve finite products. A model of L, also called the algebra over the Lawvere theory L, is therefore defined by a functor:

M :: L -> Set
M (a × b) ≅ M a × M b

Notice that we require the preservation of products only up to isomorphism. This is very important, because strict preservation of products would eliminate most interesting theories.

The preservation of products by models means that the image of M in Set is a sequence of sets generated by powers of the set M 1 — the image of the object 1 from L. Let’s call this set a. (This set is sometimes called a sort, and such algebra is called single-sorted. There exist generalizations of Lawvere theories to multi-sorted algebras.) In particular, binary operations from L are mapped to functions:

a × a -> a

As with any functor, it’s possible that multiple morphisms in L are collapsed to the same function in Set.

Incidentally, the fact that all laws are universally quantified equalities means that every Lawvere theory has a trivial model: a constant functor mapping all objects to the singleton set, and all morphisms to the identity function on it.

A general morphism in L of the form m -> n is mapped to a function:

am -> an

If we have two different models, M and N, a natural transformation between them is a family of functions indexed by n:

μn :: M n -> N n

or, equivalently:

μn :: an -> bn

where b = N 1.

Notice that the naturality condition guarantees the preservation of n-ary operations:

N f ∘ μn = μ1 ∘ M f

where f :: n -> 1 is an n-ary operation in L.

The functors that define models form a category of models, Mod(L, Set), with natural transformations as morphisms.

Consider a model for the trivial Lawvere category Fop. Such model is completely determined by its value at 1, M 1. Since M 1 can be any set, there are as many of these models as there are sets in Set. Moreover, every morphism in Mod(Fop, Set) (a natural transformation between functors M and N) is uniquely determined by its component at M 1. Conversely, every function M 1 -> N 1 induces a natural transformation between the two models M and N. Therefore Mod(Fop, Set) is equivalent to Set.

## The Theory of Monoids

The simplest nontrivial example of a Lawvere theory describes the structure of monoids. It is a single theory that distills the structure of all possible monoids, in the sense that the models of this theory span the whole category Mon of monoids. We’ve already seen a universal construction, which showed that every monoid can be obtained from an appropriate free monoid by identifying a subset of morphisms. So a single free monoid already generalizes a whole lot of monoids. There are, however, infinitely many free monoids. The Lawvere theory for monoids LMon combines all of them in one elegant construction.

Every monoid must have a unit, so we have to have a special morphism η in LMon that goes from 0 to 1. Notice that there can be no corresponding morphism in F. Such morphism would go in the opposite direction, from 1 to 0 which, in FinSet, would be a function from the singleton set to the empty set. No such function exists.

Next, consider morphisms 2->1, members of LMon(2, 1), which must contain prototypes of all binary operations. When constructing models in Mod(LMon, Set), these morphisms will be mapped to functions from the cartesian product M 1 × M 1 to M 1. In other words, functions of two arguments.

The question is: how many functions of two arguments can one implement using only the monoidal operator. Let’s call the two arguments a and b. There is one function that ignores both arguments and returns the monoidal unit. Then there are two projections that return a and b, respectively. They are followed by functions that return ab, ba, aa, bb, aab, and so on… In fact there are as many such functions of two arguments as there are elements in the free monoid with generators a and b. Notice that LMon(2, 1) must contain all those morphisms because one of the models is the free monoid. In a free monoid they correspond to distinct functions. Other models may collapse multiple morphisms in LMon(2, 1) down to a single function, but not the free monoid.

If we denote the free monoid with n generators n*, we may identify the hom-set L(2, 1) with the hom-set Mon(1*, 2*) in Mon, the category of monoids. In general, we pick LMon(m, n) to be Mon(n*, m*). In other words, the category LMon is the opposite of the category of free monoids.

The category of models of the Lawvere theory for monoids, Mod(LMon, Set), is equivalent to the category of all monoids, Mon.

As you may remember, algebraic theories can be described using monads — in particular algebras for monads. It should be no surprise then that there is a connection between Lawvere theories and monads.

First, let’s see how a Lawvere theory induces a monad. It does it through an adjunction between a forgetful functor and a free functor. The forgetful functor U assigns a set to each model. This set is given by evaluating the functor M from Mod(L, Set) at the object 1 in L.

Another way of deriving U is by exploiting the fact that Fop is the initial object in Law. It meanst that, for any Lawvere theory L, there is a unique functor Fop -> L. This functor induces the opposite functor on models (since models are functors from theories to sets):

Mod(L, Set) -> Mod(Fop, Set)

But, as we discussed, the category of models of Fop is equivalent to Set, so we get the forgetful functor:

U :: Mod(L, Set) -> Set

It can be shown that so defined U always has a left adjoint, the free functor F.

This is easily seen for finite sets. The free functor F produces free algebras. A free algebra is a particular model in Mod(L, Set) that is generated from a finite set of generators n. We can implement F as the representable functor:

L(n, -) :: L -> Set

To show that it’s indeed free, all we have to do is to prove that it’s a left adjoint to the forgetful functor:

Mod(L(n, -), M) ≅ Set(n, U(M))

Let’s simplify the right hand side:

Set(n, U(M)) ≅ Set(n, M 1) ≅ (M 1)n ≅ M n

(I used the fact that a set of morphisms is isomorphic to the exponential which, in this case, is just the iterated product.) The adjunction is the result of the Yoneda lemma:

[L, Set](L(n, -), M) ≅ M n

Together, the forgetful and the free functor define a monad T = U∘F on Set. Thus every Lawvere theory generates a monad.

It turns out that the category of algebras for this monad is equivalent to the category of models.

You may recall that monad algebras define ways to evaluate expressions that are formed using monads. A Lawvere theory defines n-ary operations that can be used to generate expressions. Models provide means to evaluate these expressions.

The connection between monads and Lawvere theories doesn’t go both ways, though. Only finitary monads lead to Lawvere thories. A finitary monad is based on a finitary functor. A finitary functor on Set is fully determined by its action on finite sets. Its action on an arbitrary set a can be evaluated using the following coend:

F a = ∫ n an × (F n)

Since the coend generalizes a coproduct, or a sum, this formula is a generalization of a power series expansion. Or we can use the intuition that a functor is a generalized container. In that case a finitary container of as can be described as a sum of shapes and contents. Here, F n is a set of shapes for storing n elements, and the contents is an n-tuple of elements, itself an element of an. For instance, a list (as a functor) is finitary, with one shape for every arity. A tree has more shapes per arity, and so on.

First off, all monads that are generated from Lawvere theories are finitary and they can be expressed as coends:

TL a = ∫ n an × L(n, 1)

Conversely, given any finitary monad T on Set, we can construct a Lawvere theory. We start by constructing a Kleisli category for T. As you may remember, a morphism in a Kleisli category from a to b is given by a morphism in the underlying category:

a -> T b

When restricted to finite sets, this becomes:

m -> T n

The category opposite to this Kleisli category, KlTop, restricted to finite sets, is the Lawvere theory in question. In particular, the hom-set L(n, 1) that describes n-ary operations in L is given by the hom-set KlT(1, n).

It turns out that most monads that we encounter in programming are finitary, with the notable exception of the continuation monad. It is possible to to extend the notion of Lawvere theory beyond finitary operations.

Let’s explore the coend formula in more detail.

TL a = ∫ n an × L(n, 1)

To begin with, this coend is taken over a profunctor P in F defined as:

P n m = an × L(m, 1)

This profunctor is contravariant in the first argument, n. Consider how it lifts morphisms. A morphism in FinSet is a mapping of finite sets f :: m -> n. Such a mapping describes a selection of m elements from an n-element set (repetitions are allowed). It can be lifted to the mapping of powers of a, namely (notice the direction):

an -> am

The lifting simply selects m elements from a tuple of n elements (a1, a2,...an) (possibly with repetitions).

For instance, let’s take fk :: 1 -> n — a selection of the kth element from an n-element set. It lifts to a function that takes a n-tuple of elements of a and returns the kth one.

Or let’s take f :: m -> 1 — a constant function that maps all m elements to one. Its lifting is a function that takes a single element of a and duplicates it m times:

λx -> (x, x, ... x)

You might notice that it’s not immediately obvious that the profunctor in question is covariant in the second argument. The hom-functor L(m, 1) is actually contravariant in m. However, we are taking the coend not in the category L but in the category F. The coend variable n goes over finite sets (or the skeletons of such). The category L contains the opposite of F, so a morphism m -> n in F is a member of L(n, m) in L (the embedding is given by the functor IL).

Let’s check the functoriality of L(m, 1) as a functor from F to Set. We want to lift a function f :: m -> n, so our goal is to implement a function from L(m, 1) to L(n, 1). Corresponding to the function f there is a morphism in L from n to m (notice the direction). Precomposing this morphism with L(m, 1) gives us a subset of L(n, 1).

Notice that, by lifting a function 1->n we can go from L(1, 1) to L(n, 1). We’ll use this fact later on.

The product of a contravariant functor an and a covariant functor L(m, 1) is a profunctor Fop×F->Set. Remember that a coend can be defined as a coproduct (disjoint sum) of all the diagonal members of a profunctor, in which some elements are identified. The identifications correspond to cowedge conditions.

Here, the coend starts as the disjoint sum of sets an × L(n, 1) over all ns. The identifications can be generated by expressing the coend as a coequilizer. We start with an off-diagonal term an × L(m, 1). To get to the diagonal, we can apply a morphism f :: m -> n either to the first or the second component of the product. The two results are then identified.

I have shown before that the lifting of f :: 1 -> n results in these two transformations:

an -> a

and:

L(1, 1) -> L(n, 1)

Therefore, starting from an × L(1, 1) we can reach both:

a × L(1, 1)

when we lift <f, id> and:

an × L(n, 1)

when we lift <id, f>. This doesn’t mean, however, that all elements of an × L(n, 1) can be identified with a × L(1, 1). That’s because not all elements of L(n, 1) can be reached from L(1, 1). Remember that we can only lift morphisms from F. A non-trivial n-ary operation in L cannot be constructed by lifting a morphism f :: 1 -> n.

In other words, we can only identify all addends in the coend formula for which L(n, 1) can be reached from L(1, 1) through the application of basic morphisms. They are all equivalent to a × L(1, 1). Basic morphisms are the ones that are images of morphisms in F.

Let’s see how this works in the simplest case of the Lawvere theory, the Fop itself. In such a theory, every L(n, 1) can be reached from L(1, 1). This is because L(1, 1) is a singleton containing just the identity morphism, and L(n, 1) only contains morphisms corresponding to injections 1->n in F, which are basic morphisms. Therefore all the addends in the coproduct are equivalent and we get:

T a = a × L(1, 1) = a

## Lawvere Theory of Side Effects

Since there is such a strong connection between monads and Lawvere theories, it’s natural to ask the question if Lawvere theories could be used in programming as an alternative to monads. The major problem with monads is that they don’t compose nicely. There is no generic recipe for building monad transformers. Lawvere theories have an advantage in this area: they can be composed using coproducts and tensor products. On the other hand, only finitary monads can be easily converted to Lawvere theories. The outlier here is the continuation monad. There is ongoing research in this area (see bibliography).

To give you a taste of how a Lawvere theory can be used to describe side effects, I’ll discuss the simple case of exceptions that are traditionally implemented using the Maybe monad.

The Maybe monad is generated by the Lawvere theory with a single nullary operation 0->1. A model of this theory is a functor that maps 1 to some set a, and maps the nullary operation to a function:

raise :: () -> a

We can recover the Maybe monad using the coend formula. Let’s consider what the addition of the nullary operation does to the hom-sets L(n, 1). Besides creating a new L(0, 1) (which is absent from Fop), it also adds new morphisms to L(n, 1). These are the results of composing morphism of the type n->0 with our 0->1. Such contributions are all identified with a0 × L(0, 1) in the coend formula, because they can be obtained from:

an × L(0, 1)

by lifting 0->n in two different ways.

The coend reduces to:

TL a = a0 + a1

type Maybe a = Either () a

which is equivalent to:

data Maybe a = Nothing | Just a

Notice that this Lawvere theory only supports the raising of exceptions, not their handling.

## Challenges

1. Enumarate all morphisms between 2 and 3 in F (the skeleton of FinSet).
2. Show that the category of models for the Lawvere theory of monoids is equivalent to the category of monad algebras for the list monad.
3. The Lawvere theory of monoids generates the list monad. Show that its binary operations can be generated using the corresponding Kleisli arrows.
4. FinSet is a subcategory of Set and there is a functor that embeds it in Set. Any functor on Set can be restricted to FinSet. Show that a finitary functor is the left Kan extension of its own restriction.

## Acknowledgments

I’m grateful to Gershom Bazerman for many useful comments.

1. Functorial Semantics of Algebraic Theories, F. William Lawvere
2. Notions of computation determine monads, Gordon Plotkin and John Power

I realize that we might be getting away from programming and diving into hard-core math. But you never know what the next big revolution in programming might bring and what kind of math might be necessary to understand it. There are some very interesting ideas going around, like functional reactive programming with its continuous time, the extention of Haskell’s type system with dependent types, or the exploration on homotopy type theory in programming.

So far I’ve been casually identifying types with sets of values. This is not strictly correct, because such approach doesn’t take into account the fact that, in programming, we compute values, and the computation is a process that takes time and, in extreme cases, might not terminate. Divergent computations are part of every Turing-complete language.

There are also foundational reasons why set theory might not be the best fit as the basis for computer science or even math itself. A good analogy is that of set theory being the assembly language that is tied to a particular architecture. If you want to run your math on different architectures, you have to use more general tools.

One possibility is to use spaces in place of sets. Spaces come with more structure, and may be defined without recourse to sets. One thing usually associated with spaces is topology, which is necessary to define things like continuity. And the conventional approach to topology is, you guessed it, through set theory. In particular, the notion of a subset is central to topology. Not surprisingly, category theorists generalized this idea to categories other than Set. The type of category that has just the right properties to serve as a replacement for set theory is called a topos (plural: topoi), and it provides, among other things, a generalized notion of a subset.

## Subobject Classifier

Let’s start by trying to express the idea of a subset using functions rather than elements. Any function f from some set a to b defines a subset of b–that of the image of a under f. But there are many functions that define the same subset. We need to be more specific. To begin with, we might focus on functions that are injective — ones that don’t smush multiple elements into one. Injective functions “inject” one set into another. For finite sets, you may visualize injective functions as parallel arrows connecting elements of one set to elements of another. Of course, the first set cannot be larger than the second set, or the arrows would necessarily converge. There is still some ambiguity left: there may be another set a' and another injective function f' from that set to b that picks the same subset. But you can easily convince yourself that such a set would have to be isomorphic to a. We can use this fact to define a subset as a family of injective functions that are related by isomorphisms of their domains. More precisely, we say that two injective functions:

f :: a -> b
f':: a'-> b

are equivalent if there is an isomorphism:

h :: a -> a'

such that:

f = f' . h

Such a family of equivalent injections defines a subset of b.

This definition can be lifted to an arbitrary category if we replace injective functions with monomorphism. Just to remind you, a monomorphism m from a to b is defined by its universal property. For any object c and any pair of morphisms:

g :: c -> a
g':: c -> a

such that:

m . g = m . g'

it must be that g = g'.

On sets, this definition is easier to understand if we consider what it would mean for a function m not to be a monomorphism. It would map two different elements of a to a single element of b. We could then find two functions g and g' that differ only at those two elements. The postcomposition with m would then mask this difference.

There is another way of defining a subset: using a single function called the characteristic function. It’s a function χ from the set b to a two-element set Ω. One element of this set is designated as “true” and the other as “false.” This function assigns “true” to those elements of b that are members of the subset, and “false” to those that aren’t.

It remains to specify what it means to designate an element of Ω as “true.” We can use the standard trick: use a function from a singleton set to Ω. We’ll call this function true:

true :: 1 -> Ω

These definitions can be combined in such a way that they not only define what a subobject is, but also define the special object Ω without talking about elements. The idea is that we want the morphism true to represent a “generic” subobject. In Set, it picks a single-element subset from a two-element set Ω. This is as generic as it gets. It’s clearly a proper subset, because Ω has one more element that’s not in that subset.

In a more general setting, we define true to be a monomorphism from the terminal object to the classifying object Ω. But we have to define the classifying object. We need a universal property that links this object to the characteristic function. It turns out that, in Set, the pullback of true along the characteristic function χ defines both the subset a and the injective function that embeds it in b. Here’s the pullback diagram:

Let’s analyze this diagram. The pullback equation is:

true . unit = χ . f

The function true . unit maps every element of a to “true.” Therefore f must map all elements of a to those elements of b for which χ is “true.” These are, by definition, the elements of the subset that is specified by the characteristic function χ. So the image of f is indeed the subset in question. The universality of the pullback guarantees that f is injective.

This pullback diagram can be used to define the classifying object in categories other than Set. Such a category must have a terminal object, which will let us define the monomorphism true. It must also have pullbacks — the actual requirement is that it must have all finite limits (a pullback is an example of a finite limit). Under those assumptions, we define the classifying object Ω by the property that, for every monomorphism f there is a unique morphism χ that completes the pullback diagram.

Let’s analyze the last statement. When we construct a pullback, we are given three objects Ω, b and 1; and two morphisms, true and χ. The existence of a pullback means that we can find the best such object a, equipped with two morphisms f and unit (the latter is uniquely determined by the definition of the terminal object), that make the diagram commute.

Here we are solving a different system of equations. We are solving for Ω and true while varying both a and b. For a given a and b there may or may not be a monomorphism f::a->b. But if there is one, we want it to be a pullback of some χ. Moreover, we want this χ to be uniquely determined by f.

We can’t say that there is a one-to-one correspondence between monomorphisms f and characteristic functions χ, because a pullback is only unique up to isomorphism. But remember our earlier definition of a subset as a family of equivalent injections. We can generalize it by defining a subobject of b as a family of equivalent monomorphisms to b. This family of monomorphisms is in one-to-one corrpespondence with the family of equivalent pullbacks of our diagram.

We can thus define a set of subobjects of b, Sub(b), as a family of monomorphisms, and see that it is isomorphic to the set of morphisms from b to Ω:

Sub(b) ≅ C(b, Ω)

This happens to be a natural isomorphism of two functors. In other words, Sub(-) is a representable (contravariant) functor whose representation is the object Ω.

## Topos

A topos is a category that:

1. Is cartesian closed: It has all products, the terminal object, and exponentials (defined as right adjoints to products),
2. Has limits for all finite diagrams,
3. Has a subobject classifier Ω.

This set of properties makes a topos a shoe-in for Set in most applications. It also has additional properties that follow from its definition. For instance, a topos has all finite colimits, including the initial object.

It would be tempting to define the subobject classifier as a coproduct (sum) of two copies of the terminal object –that’s what it is in Set— but we want to be more general than that. Topoi in which this is true are called Boolean.

## Topoi and Logic

In set theory, a characteristic function may be interpreted as defining a property of the elements of a set — a predicate that is true for some elements and false for others. The predicate isEven selects a subset of even numbers from the set of natural numbers. In a topos, we can generalize the idea of a predicate to be a morphism from object a to Ω. This is why Ω is sometimes called the truth object.

Predicates are the building blocks of logic. A topos contains all the necessary instrumentation to study logic. It has products that correspond to logical conjunctions (logical and), coproducts for disjunctions (logical or), and exponentials for implications. All standard axioms of logic hold in a topos except for the law of excluded middle (or, equivalently, double negation elimination). That’s why the logic of a topos corresponds to constructive or intuitionistic logic.

Intuitionistic logic has been steadily gaining ground, finding unexpected support from computer science. The classical notion of excluded middle is based on the belief that there is absolute truth: Any statement is either true or false or, as Ancient Romans would say, tertium non datur (there is no third option). But the only way we can know whether something is true or false is if we can prove or disprove it. A proof is a process, a computation — and we know that computations take time and resources. In some cases, they may never terminate. It doesn’t make sense to claim that a statement is true if we cannot prove it in finite amount of time. A topos with its more nuanced truth object provides a more general framework for modeling interesting logics.

Next: Lawvere Theories.

## Challenges

1. Show that the function f that is the pullback of true along the characteristic function must be injective.

Abstract: I present a uniform derivation of profunctor optics: isos, lenses, prisms, and grates based on the Yoneda lemma in the (enriched) profunctor category. In particular, lenses and prisms correspond to Tambara modules with the cartesian and cocartesian tensor product.

This blog post is the result of a collaboration between many people. The categorical profunctor picture solidified after long discussions with Edward Kmett. A lot of the theory was developed in exchanges on the Lens IRC channel between Russell O’Connor, Edward Kmett and James Deikun. They came up with the idea to use the Pastro functor to freely generate Tambara modules, which was the missing piece that completed the picture.

My interest in lenses started long time ago when I first made the connection between the universal quantification over functors in the van Laarhoven representation of lenses and the Yoneda lemma. Since I was still learning the basics of category theory, it took me a long time to find the right language to make the formal derivation. Unbeknownst to me Mauro Jaskellioff and Russell O’Connor independently had the same idea and they published a paper about it soon after I published my blog. But even though this solved the problem of lenses, prisms still seemed out of reach of the Yoneda lemma. Prisms require a more general formulation using universal quantification over profunctors. I was able to put a dent in it by deriving Isos from profunctor Yoneda, but then I was stuck again. I shared my ideas with Russell, who reached for help on the IRC channel, and a Haskell proof of concept was quickly established. Two years later, after a brainstorm with Edward, I was finally able to gather all these ideas in one place and give them a little categorical polish.

## Yoneda Lemma

The starting point is the Yoneda lemma, which states that the set of natural transformations between the hom-functor C(a, -) in the category C and an arbitrary functor f from C to Set is (naturally) isomorphic with the set f a:

[C, Set](C(a, -), f) ≅ f a

Here, f is a member of the functor category [C, Set], where natural transformation form hom-sets.

The set of natural transformations may be represented as an end, leading to the following formulation of the Yoneda lemma:

∫x Set(C(a, x), f x) ≅ f a

This notation makes the object x explicit, which is often very convenient. It can be easily translated to Haskell, by replacing the end with the universal quantifier. We get:

forall x. (a -> x) -> f x ≅ f a

A special case of the Yoneda lemma replaces the functor f with a hom-functor in C:

f x = C(b, x)

and we get:

∫x Set(C(a, x), C(b, x)) ≅ C(b, a)

This form of the Yoneda lemma is useful in showing the Yoneda embedding, which states that any category C can be fully and faithfully embedded in the functor category [C, Set]. The embedding is a functor, and the above formula defines its action on morphisms.

We will be interested in using the Yoneda lemma in the functor category. We simply replace C with [C, Set] in the previous formula, and do some renaming of variables:

∫f Set([C, Set](g, f), [C, Set](h, f)) ≅ [C, Set](h, g)

The hom-sets in the functor category are sets of natural transformations, which can be rewritten using ends:

∫f Set(∫x Set(g x, f x), ∫x Set(h x, f x))
≅ ∫x Set(h x, g x)

This is a short recap of adjunctions. We start with two functors going between two categories C and D:

L :: C -> D
R :: D -> C

We say that L is left adjoint to R iff there is a natural isomorphism between hom-sets:

D(L x, y) ≅ C(x, R y)

In particular, we can define an adjunction in a functor category [C, Set]. We start with two higher order (endo-) functors:

L :: [C, Set] -> [C, Set]
R :: [C, Set] -> [C, Set]

We say that L is left adjoint to R iff there is a natural isomorphism between two sets of natural transformations:

[C, Set](L f, g) ≅ [C, Set](f, R g)

where f and g are functors from C to Set. We can rewrite natural transformations using ends:

∫x Set((L f) x, g x) ≅ ∫x Set(f x, (R g) x)

In Haskell, you may think of f and g as type constructors (with the corresponding Functor instances), in which case L and R are types that are parameterized by these type constructors (similar to how the monad or functor classes are).

Here’s a little trick. Since the fixed objects in the formula for Yoneda embedding are arbitrary, we can pick them to be images of other objects under some functor L that we know is left adjoint to another functor R:

∫x Set(D(L a, x), D(L b, x)) ≅ D(L b, L a)

Using the adjunction, this is isomorphic to:

∫x Set(C(a, R x), C(b, R x)) ≅ C(b, (R ∘ L) a)

Notice that the composition R ∘ L of adjoint functors is a monad in C. Let’s write this monad as Φ.

The interesting case is the adjunction between a forgetful functor U and a free functor F. We get:

∫x Set(C(a, U x), C(b, U x)) ≅ C(b, Φ a)

The end is taken over x in a category D that has some additional structure (we’ll see examples of that later); but the hom-sets are in the underlying simpler category C, which is the target of the forgetful functor U.

The Yoneda-with-adjunction formula generalizes to the category of functors:

∫f Set(∫x Set((L g) x, f x), ∫x Set((L h) x, f x))
≅ ∫x Set((L h) x, (L g) x)

∫f Set(∫x Set((g x, (R f) x), ∫x Set(h x, (R f) x))
≅ ∫x Set(h x, (Φ g) x)

Here, Φ is the monad R ∘ L in the category of functors.

An interesting special case is when we substitute hom-functors for g and h:

g x = C(a, x)
h x = C(s, x)

We get:

∫f Set(∫x Set((C(a, x), (R f) x), ∫x Set(C(s, x), (R f) x))
≅ ∫x Set(C(s, x), (Φ C(a, -)) x)

We can then use the regular Yoneda lemma to “integrate over x” and reduce it down to:

∫f Set((R f) a, (R f) s)) ≅ (Φ C(a, -)) s

Again, we are particularly interested in the forgetful/free adjunction:

∫f Set((U f) a, (U f) s)) ≅ (Φ C(a, -)) s

Φ = U ∘ F

The simplest application of this identity is when the functors in question are identity functors. We get:

∫f Set(f a, f s)) ≅ C(a, s)

forall f. Functor f => f a -> f s  ≅ a -> s

You may think of this formula as defining the trivial kind of optic that simply turns a to s.

## Profunctors

Profunctors are just functors from a product category Cop×D to Set. All the results from the last section can be directly applied to the profunctor category [Cop×D, Set]. Keep in mind that morphisms in this category are natural transformations between profunctors. Here’s the key formula:

∫p Set((U p)<a, b>, (U p)<s, t>)) ≅ (Φ (Cop×D)(<a, b>, -)) <s, t>

I have replaced a with a pair <a, b> and s with a pair <s, t>. The end is taken over all profunctors that exhibit some structure that U forgets, and F freely creates. Φ is the monad U ∘ F. It’s a monad that acts on profunctors to produce other profunctors.

Notice that a hom-set in the category Cop×D is a set of pairs of morphisms:

<f, g> :: (Cop×D)(<a, b>, <s, t>)
f :: s -> a
g :: b -> t

the first one going in the opposite direction.

The simplest application of this identity is when we don’t impose any constraints on the profunctors, in which case Φ is the identity monad. We get:

∫p Set(p <a, b>, p <s, t>) ≅ (Cop×D)(<a, b>, <s, t>)

Haskell translation of this formula gives the well-known representation of Iso:

forall p. Profunctor p => p a b -> p s t ≅ Iso s t a b

where:

data Iso s t a b = Iso (s -> a) (b -> t)

Interesting things happen when we impose more structure on our profunctors.

## Enriched Categories

First, let’s generalize profunctors to work on enriched categories. We start with some monoidal category V whose objects serve as hom-objects in an enriched category A. The category V will essentially replace Set in our constructions. For instance, we’ll work with profunctors that are enriched functors from the (enriched) product category to V:

p :: Aop ⊗ A -> V

Notice that we use a tensor product of categories. The objects in such a category are pairs of objects, and the hom-objects are tensor products of individual hom-objects. The definition of composition in a product category requires that the tensor product in V be symmetric (up to isomorphism).

For such profunctors, there is a suitable generalization of the end:

∫x p x x

It’s an object in V together with a V-natural family of projections:

pry :: ∫x p x x -> p y y

We can formulate the Yoneda lemma in an enriched setting by considering enriched functors from A to V. We get the following generalization:

∫x [A(a, x), f x] ≅ f a

Notice that A(a, x) is now an object of V — the hom-object from a to x. The notation [v, w] generalizes the internal hom. It is defined as the right adjoint to the tensor product in V:

V(x ⊗ v, w) ≅ V(x, [v, w])

We are assuming that V is closed, so the internal hom is defined for every pair of objects.

Enriched functors, or V-functors, between two enriched categories C and D form a functor category [C, D] that is itself enriched over V. The hom-object between two functors f and g is given by the end:

[C, D](f, g) = ∫x D(f x, g x)

We can therefore use the Yoneda lemma in a category of enriched functors, or in the category of enriched profunctors. Therefore the result of the previous section holds in the enriched setting as well:

∫p [(U p)<a, b>, (U p)<s, t>] ≅ (Φ (Aop⊗A)(<a, b>, -)) <s, t>

with the understanding that:

(Aop⊗A)(<a, b>, -))

is an enriched hom functor mapping pairs of objects in A to objects in V, plus the appropriate action on hom-objects. This hom-functor is the profunctor on which Φ acts.

## Tambara Modules

An enriched category A may have a monoidal structure of its own. We’ll use the same tensor product notation for its structure as we did for the underlying monoidal category V. There is also a tensorial unit object i in A.

A Tambara module is a V-functor p from Aop⊗A to V, which transforms under the tensor action of A according to a family of morphisms, natural in all three arguments:

α a x y :: p x y -> p (a ⊗ x) (a ⊗ y)

Notice that these are morphisms in the underlying category V, which is also the target of the profunctor.

We impose the usual unit law:

α i x y = id

and associativity:

α a⊗b x y = α a b⊗x b⊗y ∘ α b x y

Strictly speaking one can separately define left and right action but, for simplicity, we’ll assume that the product is symmetric (up to isomorphism).

The intuition behind Tambara modules is that some of the profunctor values are not independent of others. Once we have calculated p x y, we can obtain the value of p at any of the points on the path <a⊗x, a⊗y> by applying α.

Tambara modules form a category that’s enriched over V. The construction of this enrichment is non-trivial. The hom-object between two profunctors p and q in a category of profunctors is given by the end:

[Aop⊗A, V](p, q) = ∫<x y> V(p x y, q x y)

This object generalizes the set of natural transformations. Conceptually, not all natural transformation preserve the Tambara structure, so we have to define a subobject of this hom-object that does. The intuition is that the end is a generalized product of its components. It comes equipped with projections. For instance, the projection pr<x,y> picks the component:

V(p x y, q x y)

But there is also a projection pr<a⊗x, a⊗y> that picks:

V(p a⊗x a⊗y, q a⊗x a⊗y)

from the same end. These two objects are not completely independent, because they can both be transformed into the same object. We have:

V(id, αa) :: V(p x y, q x y) -> V(p x y, q a⊗x a⊗y)
V(αa, id) :: V(a⊗x a⊗y, q a⊗x a⊗y) -> V(p x y, q a⊗x a⊗y)

We are using the fact that the mapping:

<v, w> -> V(v, w)

is itself a profunctor Vop×V -> V, so it can be used to lift pairs of morphisms in V.

Now, given any triple a, x, and y, we want the two paths to be equivalent, which means finding the equalizer between each pair of morphisms:

V(id, αa) ∘ pr<x, y>
V(αa, id) ∘ pr<a⊗x, a⊗y>

Since we want our hom-object to satisfy the above condition for any triple, we have to construct it as an intersection of all those equalizers. Here, an intersection means an object of V together with a family of monomorphisms, each embedding it into a particular equalizer.

It’s possible to construct a forgetful functor from the Tambara category to the category of profunctors [Aop⊗A, V]. It forgets the existence of α and it maps hom-objects between the two categories. Composition in the Tambara category is defined is such a way as to be compatible with this forgetful functor.

The fact that Tambara modules form a category is important, because we want to be able to use the Yoneda lemma in that category.

## Tambara Optics

The key observation is that the forgetful functor from the Tambara category has a left adjoint, and that their composition forms a monad in the category of profunctors. We’ll plug this monad into our general formula.

The construction of this monad starts with a comonad that is given by the following end:

(Θ p) s t = ∫c p (c⊗s) (c⊗t)

For a given profunctor p, this comonad builds a new profunctor that is essentially a gigantic product of all values of this profunctor “shifted” by tensoring its arguments with all possible objects c.

The monad we are interested in is the left adjoint to this comonad (calculated using a Kan extension):

(Φ p) s t = ∫ c x y A(s, c⊗x) ⊗ A(c⊗y, t) ⊗ p x y

Notice that we have two separate tensor products in this formula: one in V, between the hom-objects and the profunctor, and one in A, under the hom-objects. This monad takes an arbitrary profunctor p and produces a new profunctor Φ p.

We can now use our earlier formula:

∫p [(U p)<a, b>, (U p)<s, t>)] ≅ (Φ (Aop⊗A)(<a, b>, -)) <s, t>

inside the Tambara category. To calculate the right hand side, let’s evaluate the action of Φ on the hom-profunctor:

(Φ (Aop⊗A)(<a, b>, -)) <s, t>
= ∫ c x y A(s, c⊗x) ⊗ A(c⊗y, t) ⊗ (Aop⊗A)(<a, b>, <x, y>)

We can “integrate over” x and y using the Yoneda lemma to get:

∫ c A(s, c⊗a) ⊗ A(c⊗b, t)

We get the following result:

∫p [(U p)<a, b>, (U p)<s, t>)] ≅ ∫ c A(s, c⊗a) ⊗ A(c⊗b, t)

where the end on the left is taken over all Tambara modules, and U is the forgetful functor from the Tambara category to the category of profunctors.

If the category in question is closed, we can use the adjunction:

A(c⊗b, t) ≅ A(c, [b, t])

and “perform the integration” over c to arrive at the set/get formulation:

∫ c A(s, c⊗a) ⊗ A(c, [b, t]) ≅ A(s, [b, t]⊗a)

It corresponds to the familiar Haskell lens type:

(s -> b -> t, s -> a)

(This final trick doesn’t work for prisms, because there is no right adjoint to Either.)

A Tambara module is parameterized by the choice of the tensor product ten. We can write a general definition:

class (Profunctor p) => TamModule (ten :: * -> * -> *) p where
leftAction  :: p a b -> p (c ten a) (c ten a)
rightAction :: p a b -> p (a ten c) (b ten c)

This can be further specialized for two obvious monoidal structures: product and sum:

type TamProd p = TamModule (,) p
type TamSum p = TamModule Either p

The former is equivalent to what it called a Strong (or Cartesian) profunctor in Haskell, the latter is equivalent to a Choice (or Cocartesian) profunctor.

Replacing ends and coends with universal and existential quantifiers in Haskell, our main formula becomes (pseudocode):

forall p. TamModule ten p => p a b -> p s t
≅ exists c. (s -> c ten a, c ten b -> t)

The two sides of the isomorphism can be defined as the following data structures:

type TamOptic ten s t a b
= forall p. TamModule ten p => p a b -> p s t
data Optic ten s t a b
= forall c. Optic (s -> c ten a) (c ten b -> t)

Chosing product for the tensor, we recover two equivalent definitions of a lens:

type Lens s t a b = forall p. Strong p => p a b -> p s t
data Lens s t a b = forall c. Lens (s -> (c, a)) ((c, b) -> t)

Chosing the coproduct, we get:

type Prism s t a b = forall p. Choice p => p a b -> p s t
data Prism s t a b = forall c. Prism (s -> Either c a) (Either c b -> t)

These are the well-known existential representations of lenses and prisms.

The monad Φ (or, equivalently, the free functor that generates Tambara modules), is known in Haskell under the name Pastro for product, and Copastro for coproduct:

data Pastro p a b where
Pastro :: ((y, z) -> b) -> p x y -> (a -> (x, z))
-> Pastro p a b
data Copastro p a b where
Copastro :: (Either y z -> b) -> p x y -> (a -> Either x z)
-> Copastro p a b

They are the left adjoints of Tambara and Cotambara, respectively:

newtype Tambara p a b = Tambara forall c. p (a, c) (b, c)
newtype Cotambara p a b = Cotambara forall c. p (Either a c) (Either b c)

which are special cases of the comonad Θ.

## Discussion

It’s interesting that the work on Tambara modules has relevance to Haskell optics. It is, however, just one example of an even larger pattern.

The pattern is that we have a family of transformations in some category A. These transformations can be used to select a class of profunctors that have simple transformation laws. Using a tensor product in a monoidal category to transform objects, in essence “multiplying” them, is just one example of such symmetry. A more general pattern involves a family of transformations f that is closed under composition and includes a unit. We specify a transformation law for profunctors:

class Profunctor p => Related p where
α f a b :: forall f. Trans f => p a b -> p (f a) (f b)

This requirement picks a class of profunctors that we call Related.

Why are profunctors relevant as carriers of symmetry? It’s because they generalize a relationship between objects. The profunctor transformation law essentially says that if two objects a and b are related through p then so are the transformed objects; and that there is a function α that relates the proofs of this relationship. This is in the spirit of profunctors as proof-relevant relations.

As an analogy, imagine that we are comparing people, and the transformation we’re interested in is aging. We notice that family relationships remain invariant under aging: if a is a sibling of b, they will remain siblings as they age. This is not true about other relationships, for instance being a boss of another person. But family bonds are not the only ones that survive the test of time. Another such relation is being older or younger than the other person.

Now imagine that you pick four people at random points in time and you find out that any time-invariant relation between two of them, a and b, also holds between s and t. You have to conclude that there is some connection between s and age-adjusted a, and between age-adjusted b and t. In other words there exists a time shift that transforms one pair to another.

Considering all possible relations from the class Related corresponds to taking the end over all profunctors from this class:

type Optic p s t a b = forall p. Related p =>
p a b -> p s t

The end is a generalization of a product, so it’s enough that one of the components is empty for the whole end to be empty. It means that, for a particular choice of the four types a, b, s, and t, we have to be able to construct a whole family of morphisms, one for every p. We have seen that this end exists only if the four types are connected in a very peculiar way — for instance, if a and b are somehow embedded in s and t.

In the simplest case, we may choose the four types to be related by the transformation:

s = f a
t = f b

For these types, we know that the end exists:

forall p. Related p =>
p a b -> p s t

because there is a family of appropriate morphisms: our αf a b. In general, though, we can get away with weaker connection.

Let’s look at an example of a family of transformations generated by pairing with arbitrary type c:

fc a = (c, a)

Profunctors that respect these transformations are Tambara modules over a cartesian product (or, in lens parlance, Strong profunctors). For the choice:

s = (c, a)
t = (c, b)

the end in question trivially exists. As we’ve seen, we can weaken these conditions. It’s enough that one way (lax) transformations exist:

s -> (c, a)
t <- (c, b)

These morphisms assert that s can be split into a pair, and that t can be constructed from a pair (but not the other way around).

## Other Optics

With the understanding that optics may be defined using a family of transformations, we can analyze another optic called the Grate. It’s based on the following family:

type Reader e a = e -> a

Notice that, unlike the case of Tambara modules, this family is parameterized by a contravariant parameter e.

We are interested in profunctors that transform under these transformations:

class Profunctor p => Closed p where
closed :: p a b -> p (x -> a) (x -> b)

They let us form the optic:

type Grate s t a b = forall p. Closed p => p a b -> p s t

It turns out that there is a profunctor functor that freely generates Closed profunctors. We have the obvious comonad:

newtype Closure p a b = Closure forall x. p (x -> a) (x -> b)

data Environment p u v where
Environment :: ((c -> y) -> v) -> p x y -> (u -> (c -> x))
-> Environment p a b

or, in categorical notation:

(Φ p) u v = ∫ c x y A([c, y], v) ⊗ p x y ⊗ A(u, [c, x])

Using our construction, we apply this monad to the hom-profunctor:

(Φ (Aop⊗A)(<a, b>, -)) <s, t>
= ∫ c x y A([c, y], t) ⊗ (Aop⊗A)(<a, b>, <x, y>) ⊗ A(s, [c, x])
≅ ∫ c A([c, b], t) ⊗ A(s, [c, a])

Translating it back to Haskell, we get a representation of Grate as an existential type:

Grate s t a b = forall c. Grate ((c -> b) -> t) (s -> (c -> a))

This is very similar to the existential representation of a lens or a prism. It has the intuitive interpretation that s can be thought of as a container of a‘s indexed by some hidden type c.

We can also “perform the integration” using the Yoneda lemma, internal-hom-adjunction, and the symmetry of the product:

∫ c A([c, b], t) ⊗ A(s, [c, a])
≅ ∫ c A([c, b], t) ⊗ A(s ⊗ c, a)
≅ ∫ c A([c, b], t) ⊗ A(c, [s, a])
≅ A([[s, a], b], t)

to get the more familiar form:

Grate s t a b ≅ ((s -> a) -> b) -> t

## Conclusion

I find it fascinating that constructions that were first discovered in Haskell to make Haskell’s optics composable have their categorical counterparts. This was not at all obvious, if only because some of them use parametricity arguments. Parametricity is the property of the language, not easily translatable to category theory. Now we know that the profunctor formulation of isos, lenses, prisms, and grates follows from the Yoneda lemma. The work is not complete yet. I haven’t been able to derive the same formulation for traversals, which combine two different tensor products plus some monoidal constraints.

## Bibliography

1. Haskell lens library, Edward Kmett
2. Distributors on a tensor category, D. Tambara
3. Doubles for monoidal categories, Craig Pastro, Ross Street
4. Profunctor optics, Modular data accessors,
Matthew Pickering, Jeremy Gibbons, and Nicolas Wu
5. CPS based functional references, Twan van Laarhoven
6. Isomorphism lenses, Twan van Laarhoven
7. Theorem for Second-Order Functionals, Mauro Jaskellioff and Russell O’Connor