May 2014

In my last blog post I talked about a universal construction of a product in an arbitrary category. This kind of construction might seem very abstract to you and me, but not to a mathematician. Every step in that construction may be analyzed under a microscope and further generalized.

Let’s start with the selection of two objects whose product we are defining. What does it mean to select two objects in a category C? In any other branch of mathematics that would be a stupid question, but not in category theory. You select two objects by providing a functor from a two-object category to C.

You might be used to thinking of categories as those big hulking things, like the category of sets or monoids. But there are also dwarf categories that consist of one or two objects and just a handful of arrows between them. They don’t represent anything other than simple graphs. In particular, the simplest non-trivial category, called 1, is just a single object with one identity morphism looping back on itself. We can define a functor from that category to any other category. It will map the single object to a particular object in the target category. (It will also map the only morphism into the identity morphism for that target object.) This functor is the essence of picking an object in a category. Instead of saying “Pick an object in the category C,” you may say “Give me a functor from the singleton category to C.”

The next simplest category is a two-object category, {1, 2}. We have two objects and two identity morphisms acting on them. Assume there are no morphisms between the two objects (so it’s a “discrete” category). A functor F from this category to C is the essence of picking two objects in C.

Fig 1. Functor F from {1,2} to C

Actually, there is an even simpler functor from {1, 2} to C, a degenerate functor that picks just one object in C — it maps both objects to the same object in C. We’ll call this functor ΔX, where X is the target object in C.

Fig 2. Const functor from {1, 2} to C

When there are two functors, there may be a natural transformation between them. Let’s try to define such a transformation between ΔX and F. The first functor, ΔX, maps both objects 1 and 2 to X. The second, F, maps 1 to A and 2 to B. A natural transformation between these two functors has two components: the first is a morphism p1 from X to A, and the second is a morphism p2 from X to B.

Fig 3. Components of the natural transformation from the constant functor Δ to F

But that’s exactly one of the triples (X, p1, p2) that I used in the definition of a product in my previous post. The product was defined as the universal triple with a unique mapping from any other triple to it.

We no longer have to talk about selecting objects and constructing triples. Instead we can talk about natural transformations between the constant functor and a functor from the category {1, 2} to C.

This might not sound like a great simplification, but it’s an essential step toward the next generalization. We are going to replace the simple category {1, 2} with an arbitrary category J (sometimes called an index category). Again, we have the constant functor ΔX from J to C, which maps all objects in J to a single object X. It also maps all morphisms in J to the identity morphism on X. We also have the functor F that embeds J in C, but it will now map morphisms as well as objects. It’s helpful to think of J as a graph with vertices and arrows. F embeds this graph into C. As before, a natural transformation from ΔX to F has as its components morphisms going from X to the vertices of the diagram defined by F. Because of its conical shape, this natural transformation is called a cone.

Fig 5. A cone

But a natural transformation doesn’t just relate the images of objects; it must also respect morphisms. In fact, “naturality” is defined in terms of morphisms.

Fig 6 illustrates the general case. We have two functors, F and G, which map the objects X and Y to FX, FY and to GX, GY, respectively.

A morphism f from X to Y is mapped by F to Ff and by G to Gf.

On top of that, we have the natural transformation Φ from F to G, whose components are ΦX and ΦY. For Φ to be natural these four arrows must form a commutative diagram. In other words:

Gf . ΦX = ΦY . Ff
Fig 6. Naturality square
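In Haskell terms, a polymorphic function between two functors is a natural transformation, and the naturality square holds automatically (by parametricity). Here’s a small sketch using the list and Maybe functors; `safeHead` is my stand-in for Φ:

```haskell
-- A concrete naturality square: safeHead is a natural transformation
-- from the list functor to Maybe. For any function f,
--   fmap f . safeHead  ==  safeHead . fmap f
-- which is exactly the commuting square Gf . ΦX = ΦY . Ff.
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

main :: IO ()
main = print ( fmap length (safeHead ["hello", "world"])
             , safeHead (fmap length ["hello", "world"]) )
```

Both components of the printed pair are the same, as naturality demands.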

Replace F with ΔX and G with our diagram functor, and you’ll see that the naturality condition for the cone translates into the commutativity of all the triangles with the vertex X: since ΔX maps every morphism to the identity on X, each naturality square collapses into a triangle. We didn’t see this condition in the product example, because there the base of the cone had no arrows. So let’s find a better example.


A lot of practical math problems boil down to solving an equation or two. An equation usually has the form: some function of a variable is equal to zero. The set of solutions is called the kernel. But this definition assumes that there is a zero, and that’s too specific. A more general formulation talks about two functions being equal. The set of arguments on which two functions are equal is called the equalizer. So the equalizer is the generalization of the kernel of the difference of two functions. It will work even if we don’t know what zero is or how to subtract values.

Let’s dissect this definition further. We have two functions, f and g, from A to B. The equalizer is the subset of A on which f and g are equal. How can we generalize this definition so that we don’t have to think about elements of A and B? We have two problems: how to define a subset and how to define the equality of elements.

Let’s start with the subset. We can look at a subset as an embedding of one set inside another. Every function, in a manner of speaking, defines an embedding of its domain into its co-domain. For instance, a function from real numbers to 3-d points can be viewed as an embedding of a line in space.

Let’s see how far we can get with this idea for the purpose of defining the equalizer of f and g. We need another set X and a function p from X to A. The image of X in A will serve as our definition of a subset of A. Now let’s apply the function f after p. We get a composite function f . p from X to B. Similarly, we can apply g after p to get g . p. In general, these two composite functions will be different, except when the image of p falls inside the equalizer of f and g. Notice that we can talk about equality of functions without resorting to equality of elements if we look at them as morphisms in a category.

You see where this is heading? The set X with a function p, while not exactly defining the equalizer, provides some sort of a candidate for it. This candidate can be fully described in categorical terms as an object and a morphism, without recourse to sets or their elements. And the equality

f . p = g . p

is just the condition for the following diagram to commute:

Fig 7. The equalizer cone

But this is a cone for a two-object-two-arrow category! The objects A and B and the morphisms f and g can be seen as the image of this category under some functor F. Likewise, the object X is the image of this category under the constant functor ΔX. The two orange arrows are the components of the natural transformation from ΔX to F, and the commutativity of this diagram is nothing but its naturality condition.

Of course, there may be many pairs (X, p) that satisfy our conditions. We are looking for the universal one, with the property that any other pair (Y, q) can be mapped into it.

Fig 8. X is the equalizer if its cone is universal

Just like in the product case, we have to demand that q factorizes through h, or that the appropriate triangles commute — in particular q = p . h. This universal cone is the equalizer.
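To make this concrete, here’s a sketch of the equalizer in the category of finite sets, with sets modeled as Haskell lists (the function names are mine, not standard):

```haskell
-- The equalizer of f and g is the subset of the domain on which the
-- two functions agree. The inclusion p of that subset into A then
-- satisfies f . p = g . p by construction.
equalize :: Eq b => (a -> b) -> (a -> b) -> [a] -> [a]
equalize f g = filter (\x -> f x == g x)

-- Solving x^2 == 3x - 2 over a sample range by equalizing the
-- two sides; the solutions are 1 and 2.
main :: IO ()
main = print (equalize (\x -> x * x) (\x -> 3 * x - 2) [0 .. 10 :: Int])
```

This is exactly the “kernel of the difference” trick with subtraction factored out: we never compute f x - g x, only compare the two results.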

The Moment of Zen

You might ask yourself the question: Why should I care whether some diagrams commute or not? In particular, why are natural transformations better than the “unnatural” ones? Why is it important that naturality diagrams commute? There must be something incredibly deep behind this idea of commutativity to make it pop up in so many diverse branches of mathematics unified by category theory.

Indeed, the very definition of category contains the mother of all commuting diagrams: The condition that, if there is a morphism from A to B and another from B to C, then there is a shortcut morphism from A to C, and it doesn’t matter which way you go. This is the essence of composability: you decompose the path from A to C into its components.

Every programmer is familiar with the idea of functional decomposition, but this pattern goes much deeper than just programming. It’s the essence of our knowledge, of our understanding. We don’t understand anything unless we are able to split it into smaller steps and then put these steps back together — compose them. Without composition we can’t deal with complexity.

The other foundation of understanding is the ability to create models. We create simple models of complex phenomena all the time. By understanding how models work, we gain insight into how complex phenomena work. But for that we need to establish the correspondence, the mapping, between the domain of the model and the domain of the phenomenon that we’re studying. And what’s the most important property of that mapping? Well, our understanding of the model is built on our understanding of its parts and of the ways they compose. So the mapping must preserve composability! Mappings that preserve composability are called functors.

We’ve used some very simple categories as our models. The {1, 2} category modeled the selection of two objects. We used a functor to embed this model into a bigger category C. In that particular case, there wasn’t much structure for the functor to preserve. The model for the equalizer, on the other hand, was a bit more involved. Besides the two objects, it also had two parallel arrows. The functor that embedded this model had to map those arrows as well.

If functors are used for modeling, what are natural transformations? They relate different phenomena described by the same model.

A stick figure is a model for a human being. It can be mapped into Alice, or it can be mapped into Bob. A natural transformation would map Alice into Bob in such a way as to preserve the stick-figure structure. So if the circle is mapped into Alice’s head by one functor, and into Bob’s head by another, a natural transformation would map Alice’s head into Bob’s head.

The model tells us that there is a neck between the head and the torso. So we could use Alice’s neck to go from her head to her torso, and then map her torso to Bob’s torso using the natural transformation. But we could also map Alice’s head to Bob’s head using the natural transformation, and then use Bob’s neck to get to his torso. Naturality tells us that it doesn’t matter which way we go, the result is the same.

We also have the constant functor, which maps the model into a single blob. A natural transformation then maps this blob into Bob; again, with the condition that it doesn’t matter whether we reach Bob’s torso from the blob through his right hand and arm or through his left hand and arm.

Now take into account that, although the stick figure is just made of points and lines, it is mapped into real human beings and real blobs. So a morphism from a blob to Bob’s hand is not trivial, and there may be many different ones. Similarly, the arm that connects the hand to the torso contains veins, arteries, nerves, etc. So naturality is a non-trivial condition, especially if the blob itself is complex.

So what is special about the universal blob? It has to have some interesting internal structure because it not only maps into every part of Bob, but it maps better than any other blob. Any other blob that maps naturally into Bob also maps into the universal blob. And it maps in such a way that it doesn’t matter if you go directly from the said blob to Bob, or if you go through the universal blob. The universal blob contains the stick-figure essence of Bob, and no other blob (that is not isomorphic to it) can take its place.

Universality tells us something about the structure of an object not by dissecting it, but by describing how it relates to all the other objects that model the same pattern.


Having defined a cone as a natural transformation from ΔX to F — two functors that map the category J to category C — we can now define the limit of F as the universal cone. For that, we need a way to compare cones that differ only by their top vertex. If there is a universal cone with the top vertex U, then any other cone with the top vertex V can be uniquely mapped onto it. It must be a mapping that preserves the structure of the cone, so if h maps V into U, h must factorize all the arrows that form the V cone into the arrows that form the U cone. For instance, in Fig 9, we must have q = p . h, etc.

Fig 9. The limit U is the vertex of the universal cone

But there’s a better way of looking at it. Notice that, in the definition of a limit, we establish a one-to-one correspondence between a cone with the vertex V and a morphism h from V to U. The definition talks about there being a unique h for every cone, but the other way around works as well. Given an h we can construct the cone at V by composing the morphisms — as in q = p . h.

A cone is a natural transformation, a member of Nat(ΔV, F), the set of natural transformations between the two functors ΔV and F. A morphism h is a member of the hom set Hom(V, U), the set of morphisms between two objects. So we can say that if U is a limit of F then there is an isomorphism between those two sets:

Nat(ΔV, F) ~ Hom(V, U)

In fact, if we insist that this isomorphism be natural, we’ll get all the commuting triangles for free. Remember? The naturality condition is just the commutativity of certain diagrams. So if there is a natural isomorphism between these two sets, then U is a limit of F. Indeed, we can use this isomorphism as the definition of the limit.

But what does it mean that the isomorphism is natural? What are the two functors that are mapped by it? One functor maps V into Hom(V, U), and the other maps V into Nat(ΔV, F). Both Hom(V, U) and Nat(ΔV, F) are sets in the category of sets. So these are functors from C to Set. You might recognize the first one from the Yoneda lemma.

The second one is a bit more tricky. To understand it, you have to realize that functors themselves form a category. If functors are the objects in the functor category, then natural transformations between them are the morphisms. Natural transformations compose (and their composition is associative), and there is always an identity natural transformation for each functor. So this is a legitimate category. A hom set in that category is the set of all natural transformations between two functors. Nat(ΔV, F) is one such hom set.
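For the product example, where J = {1, 2}, the isomorphism Nat(ΔV, F) ~ Hom(V, U) becomes something very concrete in Haskell: a cone with vertex v is just a pair of functions, and a morphism into the limit is a function into the pair type. A sketch (the function names are mine):

```haskell
-- Nat(ΔV, F) ~ Hom(V, U) specialized to the two-object diagram:
-- a cone over {a, b} with vertex v is a pair of functions, and the
-- limit U is the pair type (a, b).
coneToHom :: (v -> a, v -> b) -> (v -> (a, b))
coneToHom (p1, p2) = \v -> (p1 v, p2 v)

homToCone :: (v -> (a, b)) -> (v -> a, v -> b)
homToCone h = (fst . h, snd . h)

main :: IO ()
main = print (coneToHom (show, (> 0)) (5 :: Int))
```

The two functions are mutually inverse, which is the isomorphism of the two hom sets in this special case.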

Whenever you see a natural isomorphism of hom sets, chances are there is an adjunction between two functors. A functor F is said to be left adjoint to the functor G (or G right adjoint to F) if the following two hom sets are naturally isomorphic:

Hom(FX, Y) ~ Hom(X, GY)

If you compare this with our definition of a limit, you’ll see that the functor that maps F to its limit U is right adjoint to the diagonal functor Δ, which maps an object V to the constant functor ΔV. Of course, this is only true if the limit functor exists, which is not always the case — not all diagrams of shape J have limits in C.

The adjunction between ΔV and Lim F (limit of F) is the natural isomorphism between the corresponding hom sets (arrows).

I hope to talk more about adjoint functors in the future.


Hopefully this blog post will prepare you to watch this excellent series of videos by the Catsters on YouTube:

  1. Cones and limits: Definitions.
  2. Examples of limits: Terminal object, product, pullback, equalizer.
  3. Cones as natural transformations.
  4. Formal definition of limit as natural isomorphism.
  5. Limits and adjunctions.
  6. Colimits.

In category theory, all information about objects is encoded in the arrows (morphisms) between them. You’re not supposed to look inside an object to study its internal structure. You’re supposed to look at the relationships this object has with other objects in the category, including itself.

This takes some getting used to, especially when you’re talking about familiar objects like sets. How do you express the idea that a set is empty, or that it consists of pairs of elements from two other sets, without talking about elements?

On the other hand, if you learn to construct particular types of sets purely in terms of their relationships, you can easily apply those constructions to other categories.

So let’s try to define a product of two sets A and B by looking only at its relationships with other objects — relationships being morphisms or, in the case of sets, functions. A Cartesian product AxB comes equipped with two functions called projections (in Haskell they are called fst and snd). So maybe a product AxB can be defined as a set equipped with two functions — one mapping it into A and the other mapping it into B. Unfortunately, there are infinitely many such sets, most of them looking nothing like a product, so it’s not a very good definition.

It turns out that, in order to characterize a set as being a product of A and B, we have to look at its relationship with every other set that has maps into A and B. Only when our set is surrounded by other similar sets does it stand out. It stands out as the model for its relationship with A and B. It’s the one that has “the best” projections. Its projections are so good that any other pair of projections from any other set has to factor through them. Let me explain what that means.

How can you tell that one candidate for a product is better than another? Suppose you have two sets X and Y and their mappings to A and B. Altogether you have:

p1 :: X -> A
p2 :: X -> B
q1 :: Y -> A
q2 :: Y -> B

We want to somehow relate Y to X, so let’s assume that there is a function h from Y to X. We are interested in functions that also relate the projections, so let’s impose these two conditions on h:

q1 = p1 . h
q2 = p2 . h
Factorization of q1 and q2 through h.

If this were multiplication of natural numbers, we would say that q1 and q2 have a common factor, h. Here, q1 and q2 are functions and the dot is composition, but we still say that q1 and q2 factorize through h.

In most cases there will be no such function h, and we won’t be able to compare candidates. And that’s okay. But sometimes we’ll get really lucky, and there will be one and only one such function h between Y and X. In that case we will say that X is better than Y as the candidate for the product of A and B.

In particular, if we were allowed to look inside the sets, and if X were just a set of pairs (a, b), then we could always construct a unique h :: Y -> X using q1 and q2, like this:

h y = (q1 y, q2 y)

where y is an element of Y. Of course, we are not allowed to look at the components, but I wanted to motivate our preference for X over Y.

For the sake of argument, let’s try some other combinations in Haskell — with sets represented as types. For instance, we could try to pretend that the product of Int and Bool is String, with

p1 :: String -> Int
p1 = length

p2 :: String -> Bool
p2 "" = False
p2 _  = True

This won’t work because there is a triple (Y, q1, q2) that won’t factorize through p1 and p2. The simplest such Y is the actual product (Int, Bool) with the usual projections fst and snd. There is no h that would map (8, False) into a String whose p1 yields 8 and p2 yields False. (There is no empty string of length 8.)

So let’s try something bigger, like (Int, Bool, Char). Could this work as a product of Int and Bool? This time we can easily find factorizable mappings from (Int, Bool) — but they won’t be unique. We can tack any Char onto a pair of Int and Bool and get a factorizable h, as in:

h (i, b) = (i, b, 'a')
h' (i, b) = (i, b, 'b')

Some sets will be too small, others will be too big — like Goldilocks, we have to find the one that’s just right.

What’s important is that we have a way of comparing triples (X, p1, p2) based on unique factorizability. If there exists “the best” such triple — one that any other triple uniquely factorizes into — we’ll call it the product of A and B. Again, it’s not necessary that any two triples be comparable — we don’t need a total order. We just need every triple to be comparable with the one that represents the product.

Putting all this together, X is a product of A and B if and only if:

  1. There is a pair of morphisms:
    p1 :: X -> A
    p2 :: X -> B
  2. For any other object Y with a pair of morphisms
    q1 :: Y -> A
    q2 :: Y -> B

    there is a unique morphism h, such that

    q1 = p1 . h
    q2 = p2 . h

Notice that I used a very general language of objects and morphisms, rather than sets and functions. This lets us define products in an arbitrary category.
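In the category of Haskell types, the unique h from condition 2 can be written down once and for all for the built-in pair type. A sketch — `factorizer` is my name for it, not a standard library function:

```haskell
-- The unique morphism h guaranteed by the universal property of the
-- product, for Haskell's built-in pair: given any candidate triple
-- (Y, q1, q2), it produces the h with q1 = fst . h and q2 = snd . h.
factorizer :: (y -> a) -> (y -> b) -> (y -> (a, b))
factorizer q1 q2 = \y -> (q1 y, q2 y)

-- Example: a String maps into Int (its length) and Bool (emptiness);
-- factorizer combines the two projections into one map into the pair.
main :: IO ()
main = print (factorizer length null "hello")
```

Note that h is fully determined by q1 and q2 — that’s the uniqueness part of the definition.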

There is no guarantee that a product of two arbitrary objects exists in a particular category. And even if it exists, it doesn’t have to be unique. Well, it has to be unique up to an isomorphism. The intuition is this: If you have two objects X and Y that both fulfill our conditions, then there must be a mapping h from X to Y, and a mapping h’ from Y to X, both factorizing the respective projections. One can easily show that they have to be the inverse of each other.

Even in Haskell, the Cartesian product of two types is not unique. The built-in pair is one representation, but here’s another one:

type ChurchPair a b = forall c . (a -> b -> c) -> c

It’s a type of polymorphic functions that’s isomorphic to the pair type. To prove this, here are the two mappings, from (ChurchPair a b) to (a, b) and back (and, of course, one is the inverse of the other):

churchToPair :: ChurchPair a b -> (a, b)
churchToPair f = f (\x y -> (x, y))

pairToChurch :: (a, b) -> ChurchPair a b
pairToChurch (x, y) = \g -> g x y
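To convince ourselves that the two mappings really are mutually inverse, here’s a quick self-contained round-trip check (it needs the RankNTypes extension for the polymorphic type alias):

```haskell
{-# LANGUAGE RankNTypes #-}

type ChurchPair a b = forall c . (a -> b -> c) -> c

churchToPair :: ChurchPair a b -> (a, b)
churchToPair f = f (\x y -> (x, y))

pairToChurch :: (a, b) -> ChurchPair a b
pairToChurch (x, y) = \g -> g x y

-- Going from a pair to its Church encoding and back recovers
-- the original pair.
main :: IO ()
main = print (churchToPair (pairToChurch (8 :: Int, False)))
```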

You might wonder what this universal definition of a product means in other, more exotic categories. Here’s one: Every partially ordered set (poset) is a category with the less-than-or-equal relations playing the role of morphisms. A product of two objects A and B in a poset turns out to be the largest object that’s smaller than or equal to both A and B, a.k.a. the meet, or the infimum, or the greatest lower bound. Here’s another: In the category of logical predicates, the product is the conjunction; and so on.
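As a sanity check of the poset example, here’s a brute-force sketch in Haskell, using positive integers ordered by divisibility — a morphism m → n exists exactly when m divides n, so the product of two numbers is their gcd (all names are mine):

```haskell
-- The divisibility order: m `divides` n plays the role of m <= n.
divides :: Int -> Int -> Bool
divides m n = n `mod` m == 0

-- x is the product of a and b when it is a common lower bound
-- (it divides both) and every other common lower bound y
-- factorizes through it (y divides x) -- uniqueness is automatic
-- in a poset, where there is at most one morphism between objects.
isProduct :: Int -> Int -> Int -> Bool
isProduct a b x =
  x `divides` a && x `divides` b &&
  all (`divides` x) [y | y <- [1 .. max a b], y `divides` a, y `divides` b]

main :: IO ()
main = print (filter (isProduct 12 18) [1 .. 18])
```

The only number passing the test for 12 and 18 is 6, which is indeed gcd 12 18.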

Universal Construction

This kind of characterization of a particular type of object in terms of its relationship with the rest of the universe is called a universal construction and is very common in category theory. We specify a certain property and then establish a hierarchy of objects according to how well they model this property. We then pick the “best” model. Best could mean the simplest, the least constrained, the smallest, or the largest, depending on the context.

I used the same method in my previous blog, Understanding Free Monoids. There, the free monoid was picked as a universal object in the category of monoids. We considered all monoids that shared the same generators, and there was one (up to isomorphism) that could be uniquely mapped to all others.

There’s much more to universal constructions than those few examples suggest. For instance, the statement that “there is a unique something for every something else” suggests some kind of isomorphism. Commuting diagrams hint at naturality — that is, at natural transformations between functors. These ideas generalize to limits, colimits, adjunctions and, ultimately, Kan extensions. I hope to write more on these topics in the future.

Initial Object

Let’s try to use a universal construction to define an empty set without talking about elements (or, rather, lack thereof). What can we say about functions in and out of an empty set? First of all, you can’t have a function going from any non-empty set into an empty set. Just try defining a function from integers to an empty set: What’s its value at zero?!

On the other hand, it’s easy to map an empty set to any other set. It’s a no-brainer: you’ll never be asked to provide any value. So an empty set has this property: there’s a unique mapping from it into any other set. Before we use it as the definition of emptiness, let’s ask: is there any other set with this property? There isn’t, because if there were, there would be a mapping from it to our empty set, and we have just said that that was impossible.

Having defined an empty set in terms of functions, we can generalize this definition to any category. An object that has a unique mapping to any other object in the category is called the initial object. Can there be more than one initial object in a category? Suppose that we have two such objects, I and I’. By definition, there must be a unique mapping from I to I’, because I is initial. By the same token there must be a mapping from I’ to I, because I’ is initial. What’s the composition of these two mappings? It has to be the identity mapping. There can be no other mapping from an initial object to itself (why?). So the two objects I and I’ must be isomorphic. (But, as we’ve seen, in the category of sets there is only one initial object: the empty set.)
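In Haskell, the initial object corresponds to the uninhabited type Void from Data.Void, and the unique morphism out of it is `absurd`. A small sketch:

```haskell
import Data.Void (Void, absurd)

-- Either Void a is isomorphic to a: the Left case can never occur,
-- and absurd supplies the (vacuous) unique function out of Void.
strip :: Either Void a -> a
strip (Left v)  = absurd v
strip (Right x) = x

main :: IO ()
main = print (strip (Right 42 :: Either Void Int))
```

We can never actually call `absurd` at runtime — there is no value to call it with — which is precisely what makes the morphism out of the initial object unique.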

You might recognize initial algebras from my blog Understanding F-Algebras as examples of initial objects in the category of F-algebras.

Terminal Object

Every construction in category theory has its dual: one obtained by inverting the direction of all arrows. We have just defined the initial object as the one with unique morphisms going to any other object. The dual of that would be an object that has a unique morphism coming from any other object. We’ll call such an object, predictably, the terminal object.

What’s the terminal object in the category of sets? It’s a one-element set. The unique morphism (function) from any set to a one-element set simply maps all elements of the set to that one element. It’s called the constant function. So here we have a universal definition of a one-element set as the terminal object in the category of sets.
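Dually to the initial object, in Haskell the terminal object corresponds to the unit type (), with the constant function as the unique morphism into it. A minimal sketch (`toTerminal` is my name for it):

```haskell
-- The unique function from any type into the terminal object ():
-- it ignores its argument and returns the single value ().
toTerminal :: a -> ()
toTerminal _ = ()

main :: IO ()
main = print (toTerminal (42 :: Int), toTerminal "hello", toTerminal True)
```

Whatever the argument type, there is only one total function of this shape — which is what makes () terminal.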