In category theory, all information about objects is encoded in the arrows (morphisms) between them. You’re not supposed to look inside an object to study its internal structure. You’re supposed to look at the relationships this object has with other objects in the category, including itself.

This takes some getting used to, especially when you’re talking about familiar objects like sets. How do you express the idea that a set is empty, or that it consists of pairs of elements from two other sets, without talking about elements?

On the other hand, if you learn to construct particular types of sets purely in terms of their relationships, you can easily apply those constructions to other categories.

So let’s try to define a product of two sets A and B by looking only at its relationships with other objects — relationships being morphisms or, in the case of sets, functions. A Cartesian product AxB comes equipped with two functions called projections (in Haskell they are called fst and snd). So maybe a product AxB can be defined as a set equipped with two functions — one mapping it into A and the other mapping it into B. Unfortunately, there are infinitely many such sets, most of them looking nothing like a product, so it’s not a very good definition.

It turns out that, in order to characterize a set as being a product of A and B, we have to look at its relationship with every other set that has maps into A and B. Only when our set is surrounded by other similar sets, does it stand out. It stands out as the model for its relationship with A and B. It’s the one that has “the best” projections. Its projections are so good that any other pair of projections from any other set have to factor through them. Let me explain what that means.

How can you tell that one candidate for a product is better than another? Suppose you have two sets X and Y and their mappings to A and B. Altogether you have:

p1 :: X -> A
p2 :: X -> B
q1 :: Y -> A
q2 :: Y -> B

We want to somehow relate Y to X, so let’s assume that there is a function h from Y to X. We are interested in functions that also relate the projections, so let’s impose these two conditions on h:

q1 = p1 . h
q2 = p2 . h
Factorization of q1 and q2 through h.

Factorization of q1 and q2 through h.

If this were multiplication of natural numbers, we would say that q1 and q2 have a common factor, h. Here, q1 and q2 are functions and the dot is composition, but we still say that q1 and q2 factorize through h.

In most cases there will be no such function h, and we won’t be able to compare candidates. And that’s okay. But sometimes we’ll get really lucky, and there will be one and only one such function h between Y and X. In that case we will say that X is better than Y as the candidate for the product of A and B.

In particular, if we were allowed to look inside the sets, and if X were just a set of pairs (a, b), then we could always construct a unique h :: Y -> X using q1 and q2, like this:

h y = (q1 y, q2 y)

where y is an element of Y. Of course, we are not allowed to look at the components, but I wanted to motivate our preference for Y over X.

For the sake of an argument, let’s try some other combinations in Haskell — with sets represented as types. For instance, we could try to pretend that the the product of Int and Bool is String, with

p1 = length
p2 "" = False
p2 _ = True

This won’t work because there is a triple (Y, q1, q2) that won’t factorize through p1 and p2. The simplest such Y is the actual product (Int, Bool) with the usual projections fst and snd. There is no h that would map (8, False) into a String whose p1 yields 8 and p2 yields False. (There is no empty string of length 8.)

So let’s try something bigger, like (Int, Bool, Char). Could this work as a product of Int and Bool? This time we can easily find factorizable mappings from (Int, Bool) — but they won’t be unique. We can tuck any Char to a pair of Int and Bool and get a factorizable h, as in:

h (i, b) = (i, b, 'a')
h' (i, b) = (i, b, 'b')

Some sets will be too small, others will be too big — like Goldilocks, we have to find the one that’s just right.

What’s important is that we have a way of comparing triples (X, p1, p2) based on unique factorizability. If there exist “the best” such triple — one that any other triple uniquely factorizes into — we’ll call it the product of A and B. Again, it’s not necessary that any two triples be comparable — we don’t need the total order. We just need any triple to be comparable with the one that represents the product.

Putting all this together, X is a product of A and B if and only if:

  1. There is a pair of morphisms:
    p1 :: X -> A
    p2 :: X -> B
  2. For any other object Y with a pair of morphisms
    q1 :: Y -> A
    q2 :: Y -> B

    there is a unique morphism h, such that

    q1 = p1 . h
    q2 = p2 . h

Notice that I used a very general language of objects and morphisms, rather than sets and functions. This lets us define products in an arbitrary category.

There is no guarantee that a product of two arbitrary objects exists in a particular category. And even if it exists, it doesn’t have to be unique. Well, it has to be unique up to an isomorphism. The intuition is this: If you have two objects X and Y that both fulfill our conditions, then there must be a mapping h from X to Y, and a mapping h’ from Y to X, both factorizing the respective projections. One can easily show that they have to be the inverse of each other.

Even in Haskell, the Cartesian product of two types is not unique. The built-in pair is one representation, but here’s another one:.

type ChurchPair a b = forall c . (a -> b -> c) -> c

It’s a type of polymorphic functions that’s isomorphic to the pair type. To prove this, here are the two mappings, from (ChurchPair a b) to (a, b) and back (and, of course, one is the inverse of the other):

churchToPair :: ChurchPair a b -> (a, b)
churchToPair f = f (\x y -> (x, y))

pairToChurch :: (a, b) -> ChurchPair a b
pairToChurch (x, y) = \g -> g x y

You might wonder what this universal definition of a product means in other, more exotic categories. Here’s one: Every partially ordered set (poset) is a category with the less-than-or-equal relations playing the role of morphisms. A product of two objects A and B in a poset turns out to be the largest object that’s smaller than both A and B, a.k.a., the meet, or the infimum, or the greatest lower bound. Here’s another: In the category of logical predicates, the product is the conjunction; and so on.

Universal Construction

This kind of characterization of a particular type of object in terms of its relationship with the rest of the universe is called a universal construction and is very common in category theory. We specify a certain property and then establish a hierarchy of objects according to how well they model this property. We then pick the “best” model. Best could mean the simplest, the least constrained, the smallest, or the largest, depending on the context.

I used the same method in my previous blog, Understanding Free Monoids. There, the free monoid was picked as a universal object in the category of monoids. We considered all monoids that shared the same generators, and there was one (up to isomorphism) that could be uniquely mapped to all others.

There’s much more to universal constructions that those few examples suggest. For instance, the statement that “there is a unique something for every something else” suggests some kind of isomorphism. Commuting diagrams hint at naturality — that is a natural transformations between functors. These ideas generalize to limits, co-limits, adjunctions and, ultimately, Kan extensions. I hope to write more on these topics in the future.

Initial Object

Let’s try to use a universal construction to define an empty set without talking about elements (or, rather, lack thereof). What can we say about functions in and out of an empty set? First of all, you can’t have a function going from any non-empty set into an empty set. Just try defining a function from integers to an empty set: What’s its value at zero?!

On the other hand, it’s easy to map an empty set to any other set. It’s a no-brainer: You’ll never be asked to provide any value. So an empty set has this property that there’s a unique mapping from it into any other set. Before we use it as a definition of emptiness, let’s ask if there is any other set with this property? There isn’t, because if there were, there would be a mapping from it to our empty set and we have just said that that was impossible.

Having defined an empty set in terms of functions, we can generalize this definition to any category. An object that has a unique mapping to any other object in the category is called the initial object. Can there be more than one initial object in a category? Suppose that we have two such objects, I and I’. By definition, there must be a unique mapping from I to I’, because I is initial. By the same token there must be a mapping from I’ to I, because I’ is initial. What’s the composition of these two mappings? It has to be the identity mapping. There can be no other mapping from an initial object to itself (why?). So the two objects I and I’ must be isomorphic. (But, as we’ve seen, in the category of sets there is only one initial object: the empty set.)

You might recognize initial algebras from my blog Understanding F-Algebras as examples of initial objects in the category of F-algebras.

Terminal Object

Every construction in category theory has its dual: one obtained by inverting the direction of all arrows. We have just defined the initial object as the one with unique morphisms going to any other object. The dual of that would be an object that has a unique morphism coming from any other object. We’ll call such an object, predictably, the terminal object.

What’s the terminal object in the category of sets? It’s a one-element set. The unique morphism (function) from any set to a one-element set simply maps all elements of the set to that one element. It’s called the constant function. So here we have a universal definition of a one-element set as the final object in the category of sets.