Haskell



This is part 8 of Categories for Programmers. Previously: Functors. See the Table of Contents.

Now that you know what a functor is, and have seen a few examples, let’s see how we can build larger functors from smaller ones. In particular it’s interesting to see which type constructors (which correspond to mappings between objects in a category) can be extended to functors (which include mappings between morphisms).

Bifunctors

Since functors are morphisms in Cat (the category of categories), a lot of intuitions about morphisms — and functions in particular — apply to functors as well. For instance, just like you can have a function of two arguments, you can have a functor of two arguments, or a bifunctor. On objects, a bifunctor maps every pair of objects, one from category C, and one from category D, to an object in category E. Notice that this is just saying that it’s a mapping from a cartesian product of categories C×D to E.

Bifunctor

That’s pretty straightforward. But functoriality means that a bifunctor has to map morphisms as well. This time, though, it must map a pair of morphisms, one from C and one from D, to a morphism in E.

Again, a pair of morphisms is just a single morphism in the product category C×D. We define a morphism in a cartesian product of categories as a pair of morphisms which goes from one pair of objects to another pair of objects. These pairs of morphisms can be composed in the obvious way:

(f, g) ∘ (f', g') = (f ∘ f', g ∘ g')

The composition is associative and it has an identity — a pair of identity morphisms (id, id). So a cartesian product of categories is indeed a category.

An easier way to think about bifunctors would be to consider them functors in each argument separately. So instead of translating functorial laws — associativity and identity preservation — from functors to bifunctors, it would be enough to check them separately for each argument. However, in general, separate functoriality is not enough to prove joint functoriality. Categories in which joint functoriality fails are called premonoidal.

Let’s define a bifunctor in Haskell. In this case all three categories are the same: the category of Haskell types. A bifunctor is a type constructor that takes two type arguments. Here’s the definition of the Bifunctor typeclass taken directly from the library Control.Bifunctor:

class Bifunctor f where
    bimap :: (a -> c) -> (b -> d) -> f a b -> f c d
    bimap g h = first g . second h
    first :: (a -> c) -> f a b -> f c b
    first g = bimap g id
    second :: (b -> d) -> f a b -> f a d
    second = bimap id

The type variable f represents the bifunctor. You can see that in all type signatures it’s always applied to two type arguments. The first type signature defines bimap: a mapping of two functions at once. The result is a lifted function, (f a b -> f c d), operating on types generated by the bifunctor’s type constructor. There is a default implementation of bimap in terms of first and second, which shows that it’s enough to have functoriality in each argument separately to be able to define a bifunctor. (As mentioned before, this doesn’t always work, because the two maps may not commute, that is first g . second h may not be the same as second h . first g.)

Bimap

bimap

The two other type signatures, first and second, are the two fmaps witnessing the functoriality of f in the first and the second argument, respectively.

First

first

Second

second

The typeclass definition provides default implementations for both of them in terms of bimap.

When declaring an instance of Bifunctor, you have a choice of either implementing bimap and accepting the defaults for first and second, or implementing both first and second and accepting the default for bimap (of course, you may implement all three of them, but then it’s up to you to make sure they are related to each other in this manner).

Product and Coproduct Bifunctors

An important example of a bifunctor is the categorical product — a product of two objects that is defined by a universal construction. If the product exists for any pair of objects, the mapping from those objects to the product is bifunctorial. This is true in general, and in Haskell in particular. Here’s the Bifunctor instance for a pair constructor — the simplest product type:

instance Bifunctor (,) where
    bimap f g (x, y) = (f x, g y)

There isn’t much choice: bimap simply applies the first function to the first component, and the second function to the second component of a pair. The code pretty much writes itself, given the types:

bimap :: (a -> c) -> (b -> d) -> (a, b) -> (c, d)

The action of the bifunctor here is to make pairs of types, for instance:

(,) a b = (a, b)

By duality, a coproduct, if it’s defined for every pair of objects in a category, is also a bifunctor. In Haskell, this is exemplified by the Either type constructor being an instance of Bifunctor:

instance Bifunctor Either where
    bimap f _ (Left x)  = Left (f x)
    bimap _ g (Right y) = Right (g y)

This code also writes itself.

Now, remember when we talked about monoidal categories? A monoidal category defines a binary operator acting on objects, together with a unit object. I mentioned that Set is a monoidal category with respect to cartesian product, with the singleton set as a unit. And it’s also a monoidal category with respect to disjoint union, with the empty set as a unit. What I haven’t mentioned is that one of the requirements for a monoidal category is that the binary operator be a bifunctor. This is a very important requirement — we want the monoidal product to be compatible with the structure of the category, which is defined by morphisms. We are now one step closer to the full definition of a monoidal category (we still need to learn about naturality, before we can get there).

Functorial Algebraic Data Types

We’ve seen several examples of parameterized data types that turned out to be functors — we were able to define fmap for them. Complex data types are constructed from simpler data types. In particular, algebraic data types (ADTs) are created using sums and products. We have just seen that sums and products are functorial. We also know that functors compose. So if we can show that the basic building blocks of ADTs are functorial, we’ll know that parameterized ADTs are functorial too.

So what are the building blocks of parameterized algebraic data types? First, there are the items that have no dependency on the type parameter of the functor, like Nothing in Maybe, or Nil in List. They are equivalent to the Const functor. Remember, the Const functor ignores its type parameter (really, the second type parameter, which is the one of interest to us, the first one being kept constant).

Then there are the elements that simply encapsulate the type parameter itself, like Just in Maybe. They are equivalent to the identity functor. I mentioned the identity functor previously, as the identity morphism in Cat, but didn’t give its definition in Haskell. Here it is:

data Identity a = Identity a
instance Functor Identity where
    fmap f (Identity x) = Identity (f x)

You can think of Identity as the simplest possible container that always stores just one (immutable) value of type a.

Everything else in algebraic data structures is constructed from these two primitives using products and sums.

With this new knowledge, let’s have a fresh look at the Maybe type constructor:

data Maybe a = Nothing | Just a

It’s a sum of two types, and we now know that the sum is functorial. The first part, Nothing can be represented as a Const () acting on a (the first type parameter of Const is set to unit — later we’ll see more interesting uses of Const). The second part is just a different name for the identity functor. We could have defined Maybe, up to isomorphism, as:

type Maybe a = Either (Const () a) (Identity a)

So Maybe is the composition of the bifunctor Either with two functors, Const () and Identity. (Const is really a bifunctor, but here we always use it partially applied.)

We’ve already seen that a composition of functors is a functor — we can easily convince ourselves that the same is true of bifunctors. All we need is to figure out how a composition of a bifunctor with two functors works on morphisms. Given two morphisms, we simply lift one with one functor and the other with the other functor. We then lift the resulting pair of lifted morphisms with the bifunctor.

We can express this composition in Haskell. Let’s define a data type that is parameterized by a bifunctor bf (it’s a type variable that is a type constructor that takes two types as arguments), two functors fu and gu (type constructors that take one type variable each), and two regular types a and b. We apply fu to a and gu to b, and then apply bf to the resulting two types:

newtype BiComp bf fu gu a b = BiComp (bf (fu a) (gu b))

That’s the composition on objects, or types. Notice how in Haskell we apply type constructors to types, just like we apply functions to arguments. The syntax is the same.

If you’re getting a little lost, try applying BiComp to Either, Const (), Identity, a, and b, in this order. You will recover our bare-bone version of Maybe b (a is ignored).

The new data type BiComp is a bifunctor in a and b, but only if bf is itself a Bifunctor and fu and gu are Functors. The compiler must know that there will be a definition of bimap available for bf, and definitions of fmap for fu and gu. In Haskell, this is expressed as a precondition in the instance declaration: a set of class constraints followed by a double arrow:

instance (Bifunctor bf, Functor fu, Functor gu) =>
  Bifunctor (BiComp bf fu gu) where
    bimap f1 f2 (BiComp x) = BiComp ((bimap (fmap f1) (fmap f2)) x)

The implementation of bimap for BiComp is given in terms of bimap for bf and the two fmaps for fu and gu. The compiler automatically infers all the types and picks the correct overloaded functions whenever BiComp is used.

The x in the definition of bimap has the type:

bf (fu a) (gu b)

which is quite a mouthful. The outer bimap breaks through the outer bf layer, and the two fmaps dig under fu and gu, respectively. If the types of f1 and f2 are:

f1 :: a -> a'
f2 :: b -> b'

then the final result is of the type bf (fu a') (gu b'):

bimapbf :: (fu a -> fu a') -> (gu b -> gu b') 
  -> bf (fu a) (gu b) -> bf (fu a') (gu b')

If you like jigsaw puzzles, these kinds of type manipulations can provide hours of entertainment.

So it turns out that we didn’t have to prove that Maybe was a functor — this fact followed from the way it was constructed as a sum of two functorial primitives.

A perceptive reader might ask the question: If the derivation of the Functor instance for algebraic data types is so mechanical, can’t it be automated and performed by the compiler? Indeed, it can, and it is. You need to enable a particular Haskell extension by including this line at the top of your source file:

{-# LANGUAGE DeriveFunctor #-}

and then add deriving Functor to your data structure:

data Maybe a = Nothing | Just a
  deriving Functor

and the corresponding fmap will be implemented for you.

The regularity of algebraic data structures makes it possible to derive instances not only of Functor but of several other type classes, including the Eq type class I mentioned before. There is also the option of teaching the compiler to derive instances of your own typeclasses, but that’s a bit more advanced. The idea though is the same: You provide the behavior for the basic building blocks and sums and products, and let the compiler figure out the rest.

Functors in C++

If you are a C++ programmer, you obviously are on your own as far as implementing functors goes. However, you should be able to recognize some types of algebraic data structures in C++. If such a data structure is made into a generic template, you should be able to quickly implement fmap for it.

Let’s have a look at a tree data structure, which we would define in Haskell as a recursive sum type:

data Tree a = Leaf a | Node (Tree a) (Tree a)
    deriving Functor

As I mentioned before, one way of implementing sum types in C++ is through class hierarchies. It would be natural, in an object-oriented language, to implement fmap as a virtual function of the base class Functor and then override it in all subclasses. Unfortunately this is impossible because fmap is a template, parameterized not only by the type of the object it’s acting upon (the this pointer) but also by the return type of the function that’s been applied to it. Virtual functions cannot be templatized in C++. We’ll implement fmap as a generic free function, and we’ll replace pattern matching with dynamic_cast.

The base class must define at least one virtual function in order to support dynamic casting, so we’ll make the destructor virtual (which is a good idea in any case):

template<class T>
struct Tree {
    virtual ~Tree() {};
};

The Leaf is just an Identity functor in disguise:

template<class T>
struct Leaf : public Tree<T> {
    T _label;
    Leaf(T l) : _label(l) {}
};

The Node is a product type:

template<class T>
struct Node : public Tree<T> {
    Tree<T> * _left;
    Tree<T> * _right;
    Node(Tree<T> * l, Tree<T> * r) : _left(l), _right(r) {}
};

When implementing fmap we take advantage of dynamic dispatching on the type of the Tree. The Leaf case applies the Identity version of fmap, and the Node case is treated like a bifunctor composed with two copies of the Tree functor. As a C++ programmer, you’re probably not used to analyzing code in these terms, but it’s a good exercise in categorical thinking.

template<class A, class B>
Tree<B> * fmap(std::function<B(A)> f, Tree<A> * t)
{
    Leaf<A> * pl = dynamic_cast <Leaf<A>*>(t);
    if (pl)
        return new Leaf<B>(f (pl->_label));
    Node<A> * pn = dynamic_cast<Node<A>*>(t);
    if (pn)
        return new Node<B>( fmap<A>(f, pn->_left)
                          , fmap<A>(f, pn->_right));
    return nullptr;
}

For simplicity, I decided to ignore memory and resource management issues, but in production code you would probably use smart pointers (unique or shared, depending on your policy).

Compare it with the Haskell implementation of fmap:

instance Functor Tree where
    fmap f (Leaf a) = Leaf (f a)
    fmap f (Node t t') = Node (fmap f t) (fmap f t')

This implementation can also be automatically derived by the compiler.

The Writer Functor

I promised that I would come back to the Kleisli category I described earlier. Morphisms in that category were represented as “embellished” functions returning the Writer data structure.

type Writer a = (a, String)

I said that the embellishment was somehow related to endofunctors. And, indeed, the Writer type constructor is functorial in a. We don’t even have to implement fmap for it, because it’s just a simple product type.

But what’s the relation between a Kleisli category and a functor — in general? A Kleisli category, being a category, defines composition and identity. Let’ me remind you that the composition is given by the fish operator:

(>=>) :: (a -> Writer b) -> (b -> Writer c) -> (a -> Writer c)
m1 >=> m2 = \x -> 
    let (y, s1) = m1 x
        (z, s2) = m2 y
    in (z, s1 ++ s2)

and the identity morphism by a function called return:

return :: a -> Writer a
return x = (x, "")

It turns out that, if you look at the types of these two functions long enough (and I mean, long enough), you can find a way to combine them to produce a function with the right type signature to serve as fmap. Like this:

fmap f = id >=> (\x -> return (f x))

Here, the fish operator combines two functions: one of them is the familiar id, and the other is a lambda that applies return to the result of acting with f on the lambda’s argument. The hardest part to wrap your brain around is probably the use of id. Isn’t the argument to the fish operator supposed to be a function that takes a “normal” type and returns an embellished type? Well, not really. Nobody says that a in a -> Writer b must be a “normal” type. It’s a type variable, so it can be anything, in particular it can be an embellished type, like Writer b.

So id will take Writer a and turn it into Writer a. The fish operator will fish out the value of a and pass it as x to the lambda. There, f will turn it into a b and return will embellish it, making it Writer b. Putting it all together, we end up with a function that takes Writer a and returns Writer b, exactly what fmap is supposed to produce.

Notice that this argument is very general: you can replace Writer with any type constructor. As long as it supports a fish operator and return, you can define fmap as well. So the embellishment in the Kleisli category is always a functor. (Not every functor, though, gives rise to a Kleisli category.)

You might wonder if the fmap we have just defined is the same fmap the compiler would have derived for us with deriving Functor. Interestingly enough, it is. This is due to the way Haskell implements polymorphic functions. It’s called parametric polymorphism, and it’s a source of so called theorems for free. One of those theorems says that, if there is an implementation of fmap for a given type constructor, one that preserves identity, then it must be unique.

Covariant and Contravariant Functors

Now that we’ve reviewed the writer functor, let’s go back to the reader functor. It was based on the partially applied function-arrow type constructor:

(->) r

We can rewrite it as a type synonym:

type Reader r a = r -> a

for which the Functor instance, as we’ve seen before, reads:

instance Functor (Reader r) where
    fmap f g = f . g

But just like the pair type constructor, or the Either type constructor, the function type constructor takes two type arguments. The pair and Either were functorial in both arguments — they were bifunctors. Is the function constructor a bifunctor too?

Let’s try to make it functorial in the first argument. We’ll start with a type synonym — it’s just like the Reader but with the arguments flipped:

type Op r a = a -> r

This time we fix the return type, r, and vary the argument type, a. Let’s see if we can somehow match the types in order to implement fmap, which would have the following type signature:

fmap :: (a -> b) -> (a -> r) -> (b -> r)

With just two functions taking a and returning, respectively, b and r, there is simply no way to build a function taking b and returning r! It would be different if we could somehow invert the first function, so that it took b and returned a instead. We can’t invert an arbitrary function, but we can go to the opposite category.

A short recap: For every category C there is a dual category Cop. It’s a category with the same objects as C, but with all the arrows reversed.

Consider a functor that goes between Cop and some other category D:
F :: Cop → D
Such a functor maps a morphism fop :: a → b in Cop to the morphism F fop :: F a → F b in D. But the morphism fop secretly corresponds to some morphism f :: b → a in the original category C. Notice the inversion.

Now, F is a regular functor, but there is another mapping we can define based on F, which is not a functor — let’s call it G. It’s a mapping from C to D. It maps objects the same way F does, but when it comes to mapping morphisms, it reverses them. It takes a morphism f :: b → a in C, maps it first to the opposite morphism fop :: a → b and then uses the functor F on it, to get F fop :: F a → F b.

Contravariant

Considering that F a is the same as G a and F b is the same as G b, the whole trip can be described as:
G f :: (b → a) → (G a → G b)
It’s a “functor with a twist.” A mapping of categories that inverts the direction of morphisms in this manner is called a contravariant functor. Notice that a contravariant functor is just a regular functor from the opposite category. The regular functors, by the way — the kind we’ve been studying thus far — are called covariant functors.

Here’s the typeclass defining a contravariant functor (really, a contravariant endofunctor) in Haskell:

class Contravariant f where
    contramap :: (b -> a) -> (f a -> f b)

Our type constructor Op is an instance of it:

instance Contravariant (Op r) where
    -- (b -> a) -> Op r a -> Op r b
    contramap f g = g . f

Notice that the function f inserts itself before (that is, to the right of) the contents of Op — the function g.

The definition of contramap for Op may be made even terser, if you notice that it’s just the function composition operator with the arguments flipped. There is a special function for flipping arguments, called flip:

flip :: (a -> b -> c) -> (b -> a -> c)
flip f y x = f x y

With it, we get:

contramap = flip (.)

Profunctors

We’ve seen that the function-arrow operator is contravariant in its first argument and covariant in the second. Is there a name for such a beast? It turns out that, if the target category is Set, such a beast is called a profunctor. Because a contravariant functor is equivalent to a covariant functor from the opposite category, a profunctor is defined as:
Cop × D → Set

Since, to first approximation, Haskell types are sets, we apply the name Profunctor to a type constructor p of two arguments, which is contra-functorial in the first argument and functorial in the second. Here’s the appropriate typeclass taken from the Data.Profunctor library:

class Profunctor p where
  dimap :: (a -> b) -> (c -> d) -> p b c -> p a d
  dimap f g = lmap f . rmap g
  lmap :: (a -> b) -> p b c -> p a c
  lmap f = dimap f id
  rmap :: (b -> c) -> p a b -> p a c
  rmap = dimap id

All three functions come with default implementations. Just like with Bifunctor, when declaring an instance of Profunctor, you have a choice of either implementing dimap and accepting the defaults for lmap and rmap, or implementing both lmap and rmap and accepting the default for dimap.

dimap

dimap

Now we can assert that the function-arrow operator is an instance of a Profunctor:

instance Profunctor (->) where
  dimap ab cd bc = cd . bc . ab
  lmap = flip (.)
  rmap = (.)

Profunctors have their application in the Haskell lens library. We’ll see them again when we talk about ends and coends.

The Hom-Functor

The above examples are the reflection of a more general statement that the mapping that takes a pair of objects a and b and assigns to it the set of morphisms between them, the hom-set C(a, b), is a functor. It is a functor from the product category Cop×C to the category of sets, Set.

Let’s define its action on morphisms. A morphism in Cop×C is a pair of morphisms from C:

f :: a'→ a
g :: b → b'

The lifting of this pair must be a morphism (a function) from the set C(a, b) to the set C(a', b'). Just pick any element h of C(a, b) (it’s a morphism from a to b) and assign to it:

g ∘ h ∘ f

which is an element of C(a', b').

As you can see, the hom-functor is a special case of a profunctor.

Challenges

  1. Show that the data type:
    data Pair a b = Pair a b

    is a bifunctor. For additional credit implement all three methods of Bifunctor and use equational reasoning to show that these definitions are compatible with the default implementations whenever they can be applied.

  2. Show the isomorphism between the standard definition of Maybe and this desugaring:
    type Maybe' a = Either (Const () a) (Identity a)

    Hint: Define two mappings between the two implementations. For additional credit, show that they are the inverse of each other using equational reasoning.

  3. Let’s try another data structure. I call it a PreList because it’s a precursor to a List. It replaces recursion with a type parameter b.
    data PreList a b = Nil | Cons a b

    You could recover our earlier definition of a List by recursively applying PreList to itself (we’ll see how it’s done when we talk about fixed points).

    Show that PreList is an instance of Bifunctor.

  4. Show that the following data types define bifunctors in a and b:
    data K2 c a b = K2 c
    data Fst a b = Fst a
    data Snd a b = Snd b

    For additional credit, check your solutions agains Conor McBride’s paper Clowns to the Left of me, Jokers to the Right.

  5. Define a bifunctor in a language other than Haskell. Implement bimap for a generic pair in that language.
  6. Should std::map be considered a bifunctor or a profunctor in the two template arguments Key and T? How would you redesign this data type to make it so?

Next: Function Types.

Acknowledgment

As usual, big thanks go to Gershom Bazerman for reviewing this article.


This is part of Categories for Programmers. Previously: Simple Algebraic Data Types. See the Table of Contents.

At the risk of sounding like a broken record, I will say this about functors: A functor is a very simple but powerful idea. Category theory is just full of those simple but powerful ideas. A functor is a mapping between categories. Given two categories, C and D, a functor F maps objects in C to objects in D — it’s a function on objects. If a is an object in C, we’ll write its image in D as F a (no parentheses). But a category is not just objects — it’s objects and morphisms that connect them. A functor also maps morphisms — it’s a function on morphisms. But it doesn’t map morphisms willy-nilly — it preserves connections. So if a morphism f in C connects object a to object b,

f :: a -> b

the image of f in D, F f, will connect the image of a to the image of b:

F f :: F a -> F b

(This is a mixture of mathematical and Haskell notation that hopefully makes sense by now. I won’t use parentheses when applying functors to objects or morphisms.) Functor As you can see, a functor preserves the structure of a category: what’s connected in one category will be connected in the other category. But there’s something more to the structure of a category: there’s also the composition of morphisms. If h is a composition of f and g:

h = g . f

we want its image under F to be a composition of the images of f and g:

F h = F g . F f

FunctorCompos Finally, we want all identity morphisms in C to be mapped to identity morphisms in D:

F ida = idF a

Here, ida is the identity at the object a, and idF a the identity at F a. FunctorId Note that these conditions make functors much more restrictive than regular functions. Functors must preserve the structure of a category. If you picture a category as a collection of objects held together by a network of morphisms, a functor is not allowed to introduce any tears into this fabric. It may smash objects together, it may glue multiple morphisms into one, but it may never break things apart. This no-tearing constraint is similar to the continuity condition you might know from calculus. In this sense functors are “continuous” (although there exists an even more restrictive notion of continuity for functors). Just like functions, functors may do both collapsing and embedding. The embedding aspect is more prominent when the source category is much smaller than the target category. In the extreme, the source can be the trivial singleton category — a category with one object and one morphism (the identity). A functor from the singleton category to any other category simply selects an object in that category. This is fully analogous to the property of morphisms from singleton sets selecting elements in target sets. The maximally collapsing functor is called the constant functor Δc. It maps every object in the source category to one selected object c in the target category. It also maps every morphism in the source category to the identity morphism idc. It acts like a black hole, compacting everything into one singularity. We’ll see more of this functor when we discuss limits and colimits.

Functors in Programming

Let’s get down to earth and talk about programming. We have our category of types and functions. We can talk about functors that map this category into itself — such functors are called endofunctors. So what’s an endofunctor in the category of types? First of all, it maps types to types. We’ve seen examples of such mappings, maybe without realizing that they were just that. I’m talking about definitions of types that were parameterized by other types. Let’s see a few examples.

The Maybe Functor

The definition of Maybe is a mapping from type a to type Maybe a:

data Maybe a = Nothing | Just a

Here’s an important subtlety: Maybe itself is not a type, it’s a type constructor. You have to give it a type argument, like Int or Bool, in order to turn it into a type. Maybe without any argument represents a function on types. But can we turn Maybe into a functor? (From now on, when I speak of functors in the context of programming, I will almost always mean endofunctors.) A functor is not only a mapping of objects (here, types) but also a mapping of morphisms (here, functions). For any function from a to b:

f :: a -> b

we would like to produce a function from Maybe a to Maybe b. To define such a function, we’ll have two cases to consider, corresponding to the two constructors of Maybe. The Nothing case is simple: we’ll just return Nothing back. And if the argument is Just, we’ll apply the function f to its contents. So the image of f under Maybe is the function:

f’ :: Maybe a -> Maybe b
f’ Nothing = Nothing
f’ (Just x) = Just (f x)

(By the way, in Haskell you can use apostrophes in variables names, which is very handy in cases like these.) In Haskell, we implement the morphism-mapping part of a functor as a higher order function called fmap. In the case of Maybe, it has the following signature:

fmap :: (a -> b) -> (Maybe a -> Maybe b)

FunctorMaybe We often say that fmap lifts a function. The lifted function acts on Maybe values. As usual, because of currying, this signature may be interpreted in two ways: as a function of one argument — which itself is a function (a->b) — returning a function (Maybe a -> Maybe b); or as a function of two arguments returning Maybe b:

fmap :: (a -> b) -> Maybe a -> Maybe b

Based on our previous discussion, this is how we implement fmap for Maybe:

fmap _ Nothing = Nothing
fmap f (Just x) = Just (f x)

To show that the type constructor Maybe together with the function fmap form a functor, we have to prove that fmap preserves identity and composition. These are called “the functor laws,” but they simply ensure the preservation of the structure of the category.

Equational Reasoning

To prove the functor laws, I will use equational reasoning, which is a common proof technique in Haskell. It takes advantage of the fact that Haskell functions are defined as equalities: the left hand side equals the right hand side. You can always substitute one for another, possibly renaming variables to avoid name conflicts. Think of this as either inlining a function, or the other way around, refactoring an expression into a function. Let’s take the identity function as an example:

id x = x

If you see, for instance, id y in some expression, you can replace it with y (inlining). Further, if you see id applied to an expression, say id (y + 2), you can replace it with the expression itself (y + 2). And this substitution works both ways: you can replace any expression e with id e (refactoring). If a function is defined by pattern matching, you can use each sub-definition independently. For instance, given the above definition of fmap you can replace fmap f Nothing with Nothing, or the other way around. Let’s see how this works in practice. Let’s start with the preservation of identity:

fmap id = id

There are two cases to consider: Nothing and Just. Here’s the first case (I’m using Haskell pseudo-code to transform the left hand side to the right hand side):

  fmap id Nothing 
= { definition of fmap }
  Nothing 
= { definition of id }
  id Nothing

Notice that in the last step I used the definition of id backwards. I replaced the expression Nothing with id Nothing. In practice, you carry out such proofs by “burning the candle at both ends,” until you hit the same expression in the middle — here it was Nothing. The second case is also easy:

  fmap id (Just x) 
= { definition of fmap }
  Just (id x) 
= { definition of id }
  Just x
= { definition of id }
  id (Just x)

Now, lets show that fmap preserves composition:

fmap (g . f) = fmap g . fmap f

First the Nothing case:

  fmap (g . f) Nothing 
= { definition of fmap }
  Nothing 
= { definition of fmap }
  fmap g Nothing
= { definition of fmap }
  fmap g (fmap f Nothing)

And then the Just case:

  fmap (g . f) (Just x)
= { definition of fmap }
  Just ((g . f) x)
= { definition of composition }
  Just (g (f x))
= { definition of fmap }
  fmap g (Just (f x))
= { definition of fmap }
  fmap g (fmap f (Just x))
= { definition of composition }
  (fmap g . fmap f) (Just x)

It’s worth stressing that equational reasoning doesn’t work for C++ style “functions” with side effects. Consider this code:

int square(int x) {
    return x * x;
}

int counter() {
    static int c = 0;
    return c++;
}

double y = square(counter());

Using equational reasoning, you would be able to inline square to get:

double y = counter() * counter();

This is definitely not a valid transformation, and it will not produce the same result. Despite that, the C++ compiler will try to use equational reasoning if you implement square as a macro, with disastrous results.

Optional

Functors are easily expressed in Haskell, but they can be defined in any language that supports generic programming and higher-order functions. Let’s consider the C++ analog of Maybe, the template type optional. Here’s a sketch of the implementation (the actual implementation is much more complex, dealing with various ways the argument may be passed, with copy semantics, and with the resource management issues characteristic of C++):

template<class T>
class optional {
    bool _isValid; // the tag
    T    _v;
public:
    optional()    : _isValid(false) {}         // Nothing
    optional(T x) : _isValid(true) , _v(x) {}  // Just
    bool isValid() const { return _isValid; }
    T val() const { return _v; }
};

This template provides one part of the definition of a functor: the mapping of types. It maps any type T to a new type optional<T>. Let’s define its action on functions:

template<class A, class B>
std::function<optional<B>(optional<A>)> 
fmap(std::function<B(A)> f) 
{
    return [f](optional<A> opt) {
        if (!opt.isValid())
            return optional<B>{};
        else
            return optional<B>{ f(opt.val()) };
    };
}

This is a higher order function, taking a function as an argument and returning a function. Here’s the uncurried version of it:

template<class A, class B>
optional<B> fmap(std::function<B(A)> f, optional<A> opt) {
    if (!opt.isValid())
        return optional<B>{};
    else
        return optional<B>{ f(opt.val()) };
}

There is also an option of making fmap a template method of optional. This embarrassment of choices makes abstracting the functor pattern in C++ a problem. Should functor be an interface to inherit from (unfortunately, you can’t have template virtual functions)? Should it be a curried or an uncurried free template function? Can the C++ compiler correctly infer the missing types, or should they be specified explicitly? Consider a situation where the input function f takes an int to a bool. How will the compiler figure out the type of g:

auto g = fmap(f);

especially if, in the future, there are multiple functors overloading fmap? (We’ll see more functors soon.)

Typeclasses

So how does Haskell deal with abstracting the functor? It uses the typeclass mechanism. A typeclass defines a family of types that support a common interface. For instance, the class of objects that support equality is defined as follows:

class Eq a where
    (==) :: a -> a -> Bool

This definition states that type a is of the class Eq if it supports the operator (==) that takes two arguments of type a and returns a Bool. If you want to tell Haskell that a particular type is Eq, you have to declare it an instance of this class and provide the implementation of (==). For example, given the definition of a 2D Point (a product type of two Floats):

data Point = Pt Float Float

you can define the equality of points:

instance Eq Point where
    (Pt x y) == (Pt x' y') = x == x' && y == y'

Here I used the operator (==) (the one I’m defining) in the infix position between the two patterns (Pt x y) and (Pt x' y'). The body of the function follows the single equal sign. Once Point is declared an instance of Eq, you can directly compare points for equality. Notice that, unlike in C++ or Java, you don’t have to specify the Eq class (or interface) when defining Point — you can do it later in client code. Typeclasses are also Haskell’s only mechanism for overloading functions (and operators). We will need that for overloading fmap for different functors. There is one complication, though: a functor is not defined as a type but as a mapping of types, a type constructor. We need a typeclass that’s not a family of types, as was the case with Eq, but a family of type constructors. Fortunately a Haskell typeclass works with type constructors as well as with types. So here’s the definition of the Functor class:

class Functor f where
    fmap :: (a -> b) -> f a -> f b

It stipulates that f is a Functor if there exists a function fmap with the specified type signature. The lowercase f is a type variable, similar to type variables a and b. The compiler, however, is able to deduce that it represents a type constructor rather than a type by looking at its usage: acting on other types, as in f a and f b. Accordingly, when declaring an instance of Functor, you have to give it a type constructor, as is the case with Maybe:

instance Functor Maybe where
    fmap _ Nothing = Nothing
    fmap f (Just x) = Just (f x)

By the way, the Functor class, as well as its instance definitions for a lot of simple data types, including Maybe, are part of the standard Prelude library.

Functor in C++

Can we try the same approach in C++? A type constructor corresponds to a template class, like optional, so by analogy, we would parameterize fmap with a template template parameter F. This is the syntax for it:

template<template<class> F, class A, class B>
F<B> fmap(std::function<B(A)>, F<A>);

We would like to be able to specialize this template for different functors. Unfortunately, there is a prohibition against partial specialization of template functions in C++. You can’t write:

template<class A, class B>
optional<B> fmap<optional>(std::function<B(A)> f, optional<A> opt)

Instead, we have to fall back on function overloading, which brings us back to the original definition of the uncurried fmap:

template<class A, class B>
optional<B> fmap(std::function<B(A)> f, optional<A> opt) 
{
    if (!opt.isValid())
        return optional<B>{};
    else
        return optional<B>{ f(opt.val()) };
}

This definition works, but only because the second argument of fmap selects the overload. It totally ignores the more generic definition of fmap.

The List Functor

To get some intuition as to the role of functors in programming, we need to look at more examples. Any type that is parameterized by another type is a candidate for a functor. Generic containers are parameterized by the type of the elements they store, so let’s look at a very simple container, the list:

data List a = Nil | Cons a (List a)

We have the type constructor List, which is a mapping from any type a to the type List a. To show that List is a functor we have to define the lifting of functions: Given a function a->b define a function List a -> List b:

fmap :: (a -> b) -> (List a -> List b)

A function acting on List a must consider two cases corresponding to the two list constructors. The Nil case is trivial — just return Nil — there isn’t much you can do with an empty list. The Cons case is a bit tricky, because it involves recursion. So let’s step back for a moment and consider what we are trying to do. We have a list of a, a function f that turns a to b, and we want to generate a list of b. The obvious thing is to use f to turn each element of the list from a to b. How do we do this in practice, given that a (non-empty) list is defined as the Cons of a head and a tail? We apply f to the head and apply the lifted (fmapped) f to the tail. This is a recursive definition, because we are defining lifted f in terms of lifted f:

fmap f (Cons x t) = Cons (f x) (fmap f t)

Notice that, on the right hand side, fmap f is applied to a list that’s shorter than the list for which we are defining it — it’s applied to its tail. We recurse towards shorter and shorter lists, so we are bound to eventually reach the empty list, or Nil. But as we’ve decided earlier, fmap f acting on Nil returns Nil, thus terminating the recursion. To get the final result, we combine the new head (f x) with the new tail (fmap f t) using the Cons constructor. Putting it all together, here’s the instance declaration for the list functor:

instance Functor List where
    fmap _ Nil = Nil
    fmap f (Cons x t) = Cons (f x) (fmap f t)

If you are more comfortable with C++, consider the case of a std::vector, which could be considered the most generic C++ container. The implementation of fmap for std::vector is just a thin encapsulation of std::transform:

template<class A, class B>
std::vector<B> fmap(std::function<B(A)> f, std::vector<A> v)
{
    std::vector<B> w;
    std::transform( std::begin(v)
                  , std::end(v)
                  , std::back_inserter(w)
                  , f);
    return w;
}

We can use it, for instance, to square the elements of a sequence of numbers:

std::vector<int> v{ 1, 2, 3, 4 };
auto w = fmap([](int i) { return i*i; }, v);
std::copy( std::begin(w)
         , std::end(w)
         , std::ostream_iterator(std::cout, ", "));

Most C++ containers are functors by virtue of implementing iterators that can be passed to std::transform, which is the more primitive cousin of fmap. Unfortunately, the simplicity of a functor is lost under the usual clutter of iterators and temporaries (see the implementation of fmap above). I’m happy to say that the new proposed C++ range library makes the functorial nature of ranges much more pronounced.

The Reader Functor

Now that you might have developed some intuitions — for instance, functors being some kind of containers — let me show you an example which at first sight looks very different. Consider a mapping of type a to the type of a function returning a. We haven’t really talked about function types in depth — the full categorical treatment is coming — but we have some understanding of those as programmers. In Haskell, a function type is constructed using the arrow type constructor (->) which takes two types: the argument type and the result type. You’ve already seen it in infix form, a->b, but it can equally well be used in prefix form, when parenthesized:

(->) a b

Just like with regular functions, type functions of more than one argument can be partially applied. So when we provide just one type argument to the arrow, it still expects another one. That’s why:

(->) a

is a type constructor. It needs one more type b to produce a complete type a->b. As it stands, it defines a whole family of type constructors parameterized by a. Let’s see if this is also a family of functors. Dealing with two type parameters can get a bit confusing, so let’s do some renaming. Let’s call the argument type r and the result type a, in line with our previous functor definitions. So our type constructor takes any type a and maps it into the type r->a. To show that it’s a functor, we want to lift a function a->b to a function that takes r->a and returns r->b. These are the types that are formed using the type constructor (->) r acting on, respectively, a and b. Here’s the type signature of fmap applied to this case:

fmap :: (a -> b) -> (r -> a) -> (r -> b)

We have to solve the following puzzle: given a function f::a->b and a function g::r->a, create a function r->b. There is only one way we can compose the two functions, and the result is exactly what we need. So here’s the implementation of our fmap:

instance Functor ((->) r) where
    fmap f g = f . g

It just works! If you like terse notation, this definition can be reduced further by noticing that composition can be rewritten in prefix form:

fmap f g = (.) f g

and the arguments can be omitted to yield a direct equality of two functions:

fmap = (.)

This combination of the type constructor (->) r with the above implementation of fmap is called the reader functor.

Functors as Containers

We’ve seen some examples of functors in programming languages that define general-purpose containers, or at least objects that contain some value of the type they are parameterized over. The reader functor seems to be an outlier, because we don’t think of functions as data. But we’ve seen that pure functions can be memoized, and function execution can be turned into table lookup. Tables are data. Conversely, because of Haskell’s laziness, a traditional container, like a list, may actually be implemented as a function. Consider, for instance, an infinite list of natural numbers, which can be compactly defined as:

nats :: [Integer]
nats = [1..]

In the first line, a pair of square brackets is Haskell’s built-in type constructor for lists. In the second line, square brackets are used to create a list literal. Obviously, an infinite list like this cannot be stored in memory. The compiler implements it as a function that generates Integers on demand. Haskell effectively blurs the distinction between data and code. A list could be considered a function, and a function could be considered a table that maps arguments to results. The latter can even be practical if the domain of the function is finite and not too large. It would not be practical, however, to implement strlen as table lookup, because there are infinitely many different strings. As programmers, we don’t like infinities, but in category theory you learn to eat infinities for breakfast. Whether it’s a set of all strings or a collection of all possible states of the Universe, past, present, and future — we can deal with it! So I like to think of the functor object (an object of the type generated by an endofunctor) as containing a value or values of the type over which it is parameterized, even if these values are not physically present there. One example of a functor is a C++ std::future, which may at some point contain a value, but it’s not guaranteed it will; and if you want to access it, you may block waiting for another thread to finish execution. Another example is a Haskell IO object, which may contain user input, or the future versions of our Universe with “Hello World!” displayed on the monitor. According to this interpretation, a functor object is something that may contain a value or values of the type it’s parameterized upon. Or it may contain a recipe for generating those values. We are not at all concerned about being able to access the values — that’s totally optional, and outside of the scope of the functor. All we are interested in is to be able to manipulate those values using functions. If the values can be accessed, then we should be able to see the results of this manipulation. If they can’t, then all we care about is that the manipulations compose correctly and that the manipulation with an identity function doesn’t change anything. Just to show you how much we don’t care about being able to access the values inside a functor object, here’s a type constructor that ignores completely its argument a:

data Const c a = Const c

The Const type constructor takes two types, c and a. Just like we did with the arrow constructor, we are going to partially apply it to create a functor. The data constructor (also called Const) takes just one value of type c. It has no dependence on a. The type of fmap for this type constructor is:

fmap :: (a -> b) -> Const c a -> Const c b

Because the functor ignores its type argument, the implementation of fmap is free to ignore its function argument — the function has nothing to act upon:

instance Functor (Const c) where
    fmap _ (Const v) = Const v

This might be a little clearer in C++ (I never thought I would utter those words!), where there is a stronger distinction between type arguments — which are compile-time — and values, which are run-time:

template<class C, class A>
struct Const {
    Const(C v) : _v(v) {}
    C _v;
};

The C++ implementation of fmap also ignores the function argument and essentially re-casts the Const argument without changing its value:

template<class C, class A, class B>
Const<C, B> fmap(std::function<B(A)> f, Const<C, A> c) {
    return Const<C, B>{c._v};
}

Despite its weirdness, the Const functor plays an important role in many constructions. In category theory, it’s a special case of the Δc functor I mentioned earlier — the endo-functor case of a black hole. We’ll be seeing more of it it in the future.

Functor Composition

It’s not hard to convince yourself that functors between categories compose, just like functions between sets compose. A composition of two functors, when acting on objects, is just the composition of their respective object mappings; and similarly when acting on morphisms. After jumping through two functors, identity morphisms end up as identity morphisms, and compositions of morphisms finish up as compositions of morphisms. There’s really nothing much to it. In particular, it’s easy to compose endofunctors. Remember the function maybeTail? I’ll rewrite it using Haskell’s built in implementation of lists:

maybeTail :: [a] -> Maybe [a]
maybeTail [] = Nothing
maybeTail (x:xs) = Just xs

(The empty list constructor that we used to call Nil is replaced with the empty pair of square brackets []. The Cons constructor is replaced with the infix operator : (colon).) The result of maybeTail is of a type that’s a composition of two functors, Maybe and [], acting on a. Each of these functors is equipped with its own version of fmap, but what if we want to apply some function f to the contents of the composite: a Maybe list? We have to break through two layers of functors. We can use fmap to break through the outer Maybe. But we can’t just send f inside Maybe because f doesn’t work on lists. We have to send (fmap f) to operate on the inner list. For instance, let’s see how we can square the elements of a Maybe list of integers:

square x = x * x

mis :: Maybe [Int]
mis = Just [1, 2, 3]

mis2 = fmap (fmap square) mis

The compiler, after analyzing the types, will figure out that, for the outer fmap, it should use the implementation from the Maybe instance, and for the inner one, the list functor implementation. It may not be immediately obvious that the above code may be rewritten as:

mis2 = (fmap . fmap) square mis

But remember that fmap may be considered a function of just one argument:

fmap :: (a -> b) -> (f a -> f b)

In our case, the second fmap in (fmap . fmap) takes as its argument:

square :: Int -> Int

and returns a function of the type:

[Int] -> [Int]

The first fmap then takes that function and returns a function:

Maybe [Int] -> Maybe [Int]

Finally, that function is applied to mis. So the composition of two functors is a functor whose fmap is the composition of the corresponding fmaps. Going back to category theory: It’s pretty obvious that functor composition is associative (the mapping of objects is associative, and the mapping of morphisms is associative). And there is also a trivial identity functor in every category: it maps every object to itself, and every morphism to itself. So functors have all the same properties as morphisms in some category. But what category would that be? It would have to be a category in which objects are categories and morphisms are functors. It’s a category of categories. But a category of all categories would have to include itself, and we would get into the same kinds of paradoxes that made the set of all sets impossible. There is, however, a category of all small categories called Cat (which is big, so it can’t be a member of itself). A small category is one in which objects form a set, as opposed to something larger than a set. Mind you, in category theory, even an infinite uncountable set is considered “small.” I thought I’d mention these things because I find it pretty amazing that we can recognize the same structures repeating themselves at many levels of abstraction. We’ll see later that functors form categories as well.

Challenges

  1. Can we turn the Maybe type constructor into a functor by defining:
    fmap _ _ = Nothing

    which ignores both of its arguments? (Hint: Check the functor laws.)

  2. Prove functor laws for the reader functor. Hint: it’s really simple.
  3. Implement the reader functor in your second favorite language (the first being Haskell, of course).
  4. Prove the functor laws for the list functor. Assume that the laws are true for the tail part of the list you’re applying it to (in other words, use induction).

Acknowledgments

Gershom Bazerman is kind enough to keep reviewing these posts. I’m grateful for his patience and insight.

Next: Functoriality


Categories for Programmers. Previously Products and Coproducts. See the Table of Contents.

We’ve seen two basic ways of combining types: using a product and a coproduct. It turns out that a lot of data structures in everyday programming can be built using just these two mechanisms. This fact has important practical consequences. Many properties of data structures are composable. For instance, if you know how to compare values of basic types for equality, and you know how to generalize these comparisons to product and coproduct types, you can automate the derivation of equality operators for composite types. In Haskell you can automatically derive equality, comparison, conversion to and from string, and more, for a large subset of composite types.

Let’s have a closer look at product and sum types as they appear in programming.

Product Types

The canonical implementation of a product of two types in a programming language is a pair. In Haskell, a pair is a primitive type constructor; in C++ it’s a relatively complex template defined in the Standard Library.

Pair

Pairs are not strictly commutative: a pair (Int, Bool) cannot be substituted for a pair (Bool, Int), even though they carry the same information. They are, however, commutative up to isomorphism — the isomorphism being given by the swap function (which is its own inverse):

swap :: (a, b) -> (b, a)
swap (x, y) = (y, x)

You can think of the two pairs as simply using a different format for storing the same data. It’s just like big endian vs. little endian.

You can combine an arbitrary number of types into a product by nesting pairs inside pairs, but there is an easier way: nested pairs are equivalent to tuples. It’s the consequence of the fact that different ways of nesting pairs are isomorphic. If you want to combine three types in a product, a, b, and c, in this order, you can do it in two ways:

((a, b), c)

or

(a, (b, c))

These types are different — you can’t pass one to a function that expects the other — but their elements are in one-to-one correspondence. There is a function that maps one to another:

alpha :: ((a, b), c) -> (a, (b, c))
alpha ((x, y), z) = (x, (y, z))

and this function is invertible:

alpha_inv :: (a, (b, c)) -> ((a, b), c)
alpha_inv  (x, (y, z)) = ((x, y), z)

so it’s an isomorphism. These are just different ways of repackaging the same data.

You can interpret the creation of a product type as a binary operation on types. From that perspective, the above isomorphism looks very much like the associativity law we’ve seen in monoids:

(a * b) * c = a * (b * c)

Except that, in the monoid case, the two ways of composing products were equal, whereas here they are only equal “up to isomorphism.”

If we can live with isomorphisms, and don’t insist on strict equality, we can go even further and show that the unit type, (), is the unit of the product the same way 1 is the unit of multiplication. Indeed, the pairing of a value of some type a with a unit doesn’t add any information. The type:

(a, ())

is isomorphic to a. Here’s the isomorphism:

rho :: (a, ()) -> a
rho (x, ()) = x
rho_inv :: a -> (a, ())
rho_inv x = (x, ())

These observations can be formalized by saying that Set (the category of sets) is a monoidal category. It’s a category that’s also a monoid, in the sense that you can multiply objects (here, take their cartesian product). I’ll talk more about monoidal categories, and give the full definition in the future.

There is a more general way of defining product types in Haskell — especially, as we’ll see soon, when they are combined with sum types. It uses named constructors with multiple arguments. A pair, for instance, can be defined alternatively as:

data Pair a b = P a b

Here, Pair a b is the name of the type paremeterized by two other types, a and b; and P is the name of the data constructor. You define a pair type by passing two types to the Pair type constructor. You construct a pair value by passing two values of appropriate types to the constructor P. For instance, let’s define a value stmt as a pair of String and Bool:

stmt :: Pair String Bool
stmt = P "This statements is" False

The first line is the type declaration. It uses the type constructor Pair, with String and Bool replacing a and the b in the generic definition of Pair. The second line defines the actual value by passing a concrete string and a concrete Boolean to the data constructor P. Type constructors are used to construct types; data constructors, to construct values.

Since the name spaces for type and data constructors are separate in Haskell, you will often see the same name used for both, as in:

data Pair a b = Pair a b

And if you squint hard enough, you may even view the built-in pair type as a variation on this kind of declaration, where the name Pair is replaced with the binary operator (,). In fact you can use (,) just like any other named constructor and create pairs using prefix notation:

stmt = (,) "This statement is" False

Similarly, you can use (,,) to create triples, and so on.

Instead of using generic pairs or tuples, you can also define specific named product types, as in:

data Stmt = Stmt String Bool

which is just a product of String and Bool, but it’s given its own name and constructor. The advantage of this style of declaration is that you may define many types that have the same content but different meaning and functionality, and which cannot be substituted for each other.

Programming with tuples and multi-argument constructors can get messy and error prone — keeping track of which component represents what. It’s often preferable to give names to components. A product type with named fields is called a record in Haskell, and a struct in C.

Records

Let’s have a look at a simple example. We want to describe chemical elements by combining two strings, name and symbol; and an integer, the atomic number; into one data structure. We can use a tuple (String, String, Int) and remember which component represents what. We would extract components by pattern matching, as in this function that checks if the symbol of the element is the prefix of its name (as in He being the prefix of Helium):

startsWithSymbol :: (String, String, Int) -> Bool
startsWithSymbol (name, symbol, _) = isPrefixOf symbol name

This code is error prone, and is hard to read and maintain. It’s much better to define a record:

data Element = Element { name         :: String
                       , symbol       :: String
                       , atomicNumber :: Int }

The two representations are isomorphic, as witnessed by these two conversion functions, which are the inverse of each other:

tupleToElem :: (String, String, Int) -> Element
tupleToElem (n, s, a) = Element { name = n
                                , symbol = s
                                , atomicNumber = a }
elemToTuple :: Element -> (String, String, Int)
elemToTuple e = (name e, symbol e, atomicNumber e)

Notice that the names of record fields also serve as functions to access these fields. For instance, atomicNumber e retrieves the atomicNumber field from e. We use atomicNumber as a function of the type:

atomicNumber :: Element -> Int

With the record syntax for Element, our function startsWithSymbol becomes more readable:

startsWithSymbol :: Element -> Bool
startsWithSymbol e = isPrefixOf (symbol e) (name e)

We could even use the Haskell trick of turning the function isPrefixOf into an infix operator by surrounding it with backquotes, and make it read almost like a sentence:

startsWithSymbol e = symbol e `isPrefixOf` name e

The parentheses could be omitted in this case, because an infix operator has lower precedence than a function call.

Sum Types

Just as the product in the category of sets gives rise to product types, the coproduct gives rise to sum types. The canonical implementation of a sum type in Haskell is:

data Either a b = Left a | Right b

And like pairs, Eithers are commutative (up to isomorphism), can be nested, and the nesting order is irrelevant (up to isomorphism). So we can, for instance, define a sum equivalent of a triple:

data OneOfThree a b c = Sinistral a | Medial b | Dextral c

and so on.

It turns out that Set is also a (symmetric) monoidal category with respect to coproduct. The role of the binary operation is played by the disjoint sum, and the role of the unit element is played by the initial object. In terms of types, we have Either as the monoidal operator and Void, the uninhabited type, as its neutral element. You can think of Either as plus, and Void as zero. Indeed, adding Void to a sum type doesn’t change its content. For instance:

Either a Void

is isomorphic to a. That’s because there is no way to construct a Right version of this type — there isn’t a value of type Void. The only inhabitants of Either a Void are constructed using the Left constructors and they simply encapsulate a value of type a. So, symbolically, a + 0 = a.

Sum types are pretty common in Haskell, but their C++ equivalents, unions or variants, are much less common. There are several reasons for that.

First of all, the simplest sum types are just enumerations and are implemented using enum in C++. The equivalent of the Haskell sum type:

data Color = Red | Green | Blue

is the C++:

enum { Red, Green, Blue };

An even simpler sum type:

data Bool = True | False

is the primitive bool in C++.

Simple sum types that encode the presence or absence of a value are variously implemented in C++ using special tricks and “impossible” values, like empty strings, negative numbers, null pointers, etc. This kind of optionality, if deliberate, is expressed in Haskell using the Maybe type:

data Maybe a = Nothing | Just a

The Maybe type is a sum of two types. You can see this if you separate the two constructors into individual types. The first one would look like this:

data NothingType = Nothing

It’s an enumeration with one value called Nothing. In other words, it’s a singleton, which is equivalent to the unit type (). The second part:

data JustType a = Just a

is just an encapsulation of the type a. We could have encoded Maybe as:

type Maybe a = Either () a

More complex sum types are often faked in C++ using pointers. A pointer can be either null, or point to a value of specific type. For instance, a Haskell list type, which can be defined as a (recursive) sum type:

List a = Nil | Cons a (List a)

can be translated to C++ using the null pointer trick to implement the empty list:

template<class A> 
class List {
    Node<A> * _head;
public:
    List() : _head(nullptr) {}  // Nil
    List(A a, List<A> l)        // Cons
      : _head(new Node<A>(a, l))
    {}
};

Notice that the two Haskell constructors Nil and Cons are translated into two overloaded List constructors with analogous arguments (none, for Nil; and a value and a list for Cons). The List class doesn’t need a tag to distinguish between the two components of the sum type. Instead it uses the special nullptr value for _head to encode Nil.

The main difference, though, between Haskell and C++ types is that Haskell data structures are immutable. If you create an object using one particular constructor, the object will forever remember which constructor was used and what arguments were passed to it. So a Maybe object that was created as Just "energy" will never turn into Nothing. Similarly, an empty list will forever be empty, and a list of three elements will always have the same three elements.

It’s this immutability that makes construction reversible. Given an object, you can always disassemble it down to parts that were used in its construction. This deconstruction is done with pattern matching and it reuses constructors as patterns. Constructor arguments, if any, are replaced with variables (or other patterns).

The List data type has two constructors, so the deconstruction of an arbitrary List uses two patterns corresponding to those constructors. One matches the empty Nil list, and the other a Cons-constructed list. For instance, here’s the definition of a simple function on Lists:

maybeTail :: List a -> Maybe (List a)
maybeTail Nil = Nothing
maybeTail (Cons _ t) = Just t

The first part of the definition of maybeTail uses the Nil constructor as pattern and returns Nothing. The second part uses the Cons constructor as pattern. It replaces the first constructor argument with a wildcard, because we are not interested in it. The second argument to Cons is bound to the variable t (I will call these things variables even though, strictly speaking, they never vary: once bound to an expression, a variable never changes). The return value is Just t. Now, depending on how your List was created, it will match one of the clauses. If it was created using Cons, the two arguments that were passed to it will be retrieved (and the first discarded).

Even more elaborate sum types are implemented in C++ using polymorphic class hierarchies. A family of classes with a common ancestor may be understood as one variant type, in which the vtable serves as a hidden tag. What in Haskell would be done by pattern matching on the constructor, and by calling specialized code, in C++ is accomplished by dispatching a call to a virtual function based on the vtable pointer.

You will rarely see union used as a sum type in C++ because of severe limitations on what can go into a union. You can’t even put a std::string into a union because it has a copy constructor.

Algebra of Types

Taken separately, product and sum types can be used to define a variety of useful data structures, but the real strength comes from combining the two. Once again we are invoking the power of composition.

Let’s summarize what we’ve discovered so far. We’ve seen two commutative monoidal structures underlying the type system: We have the sum types with Void as the neutral element, and the product types with the unit type, (), as the neutral element. We’d like to think of them as analogous to addition and multiplication. In this analogy, Void would correspond to zero, and unit, (), to one.

Let’s see how far we can stretch this analogy. For instance, does multiplication by zero give zero? In other words, is a product type with one component being Void isomorphic to Void? For example, is it possible to create a pair of, say Int and Void?

To create a pair you need two values. Although you can easily come up with an integer, there is no value of type Void. Therefore, for any type a, the type (a, Void) is uninhabited — has no values — and is therefore equivalent to Void. In other words, a*0 = 0.

Another thing that links addition and multiplication is the distributive property:

a * (b + c) = a * b + a * c

Does it also hold for product and sum types? Yes, it does — up to isomorphisms, as usual. The left hand side corresponds to the type:

(a, Either b c)

and the right hand side corresponds to the type:

Either (a, b) (a, c)

Here’s the function that converts them one way:

prodToSum :: (a, Either b c) -> Either (a, b) (a, c)
prodToSum (x, e) = 
    case e of
      Left  y -> Left  (x, y)
      Right z -> Right (x, z)

and here’s one that goes the other way:

sumToProd :: Either (a, b) (a, c) -> (a, Either b c)
sumToProd e = 
    case e of
      Left  (x, y) -> (x, Left  y)
      Right (x, z) -> (x, Right z)

The case of statement is used for pattern matching inside functions. Each pattern is followed by an arrow and the expression to be evaluated when the pattern matches. For instance, if you call prodToSum with the value:

prod1 :: (Int, Either String Float)
prod1 = (2, Left "Hi!")

the e in case e of will be equal to Left "Hi!". It will match the pattern Left y, substituting "Hi!" for y. Since the x has already been matched to 2, the result of the case of clause, and the whole function, will be Left (2, "Hi!"), as expected.

I’m not going to prove that these two functions are the inverse of each other, but if you think about it, they must be! They are just trivially re-packing the contents of the two data structures. It’s the same data, only different format.

Mathematicians have a name for such two intertwined monoids: it’s called a semiring. It’s not a full ring, because we can’t define subtraction of types. That’s why a semiring is sometimes called a rig, which is a pun on “ring without an n” (negative). But barring that, we can get a lot of mileage from translating statements about, say, natural numbers, which form a rig, to statements about types. Here’s a translation table with some entries of interest:

Numbers Types
0 Void
1 ()
a + b Either a b = Left a | Right b
a * b (a, b) or Pair a b = Pair a b
2 = 1 + 1 data Bool = True | False
1 + a data Maybe = Nothing | Just a

The list type is quite interesting, because it’s defined as a solution to an equation. The type we are defining appears on both sides of the equation:

List a = Nil | Cons a (List a)

If we do our usual substitutions, and also replace List a with x, we get the equation:

x = 1 + a * x

We can’t solve it using traditional algebraic methods because we can’t subtract or divide types. But we can try a series of substitutions, where we keep replacing x on the right hand side with (1 + a*x), and use the distributive property. This leads to the following series:

x = 1 + a*x
x = 1 + a*(1 + a*x) = 1 + a + a*a*x
x = 1 + a + a*a*(1 + a*x) = 1 + a + a*a + a*a*a*x
...
x = 1 + a + a*a + a*a*a + a*a*a*a...

We end up with an infinite sum of products (tuples), which can be interpreted as: A list is either empty, 1; or a singleton, a; or a pair, a*a; or a triple, a*a*a; etc… Well, that’s exactly what a list is — a string of as!

There’s much more to lists than that, and we’ll come back to them and other recursive data structures after we learn about functors and fixed points.

Solving equations with symbolic variables — that’s algebra! It’s what gives these types their name: algebraic data types.

Finally, I should mention one very important interpretation of the algebra of types. Notice that a product of two types a and b must contain both a value of type a and a value of type b, which means both types must be inhabited. A sum of two types, on the other hand, contains either a value of type a or a value of type b, so it’s enough if one of them is inhabited. Logical and and or also form a semiring, and it too can be mapped into type theory:

Logic Types
false Void
true ()
a || b Either a b = Left a | Right b
a && b (a, b)

This analogy goes deeper, and is the basis of the Curry-Howard isomorphism between logic and type theory. We’ll come back to it when we talk about function types.

Challenges

  1. Show the isomorphism between Maybe a and Either () a.
  2. Here’s a sum type defined in Haskell:
    data Shape = Circle Float
               | Rect Float Float

    When we want to define a function like area that acts on a Shape, we do it by pattern matching on the two constructors:

    area :: Shape -> Float
    area (Circle r) = pi * r * r
    area (Rect d h) = d * h

    Implement Shape in C++ or Java as an interface and create two classes: Circle and Rect. Implement area as a virtual function.

  3. Continuing with the previous example: We can easily add a new function circ that calculates the circumference of a Shape. We can do it without touching the definition of Shape:
    circ :: Shape -> Float
    circ (Circle r) = 2.0 * pi * r
    circ (Rect d h) = 2.0 * (d + h)

    Add circ to your C++ or Java implementation. What parts of the original code did you have to touch?

  4. Continuing further: Add a new shape, Square, to Shape and make all the necessary updates. What code did you have to touch in Haskell vs. C++ or Java? (Even if you’re not a Haskell programmer, the modifications should be pretty obvious.)
  5. Show that a + a = 2 * a holds for types (up to isomorphism). Remember that 2 corresponds to Bool, according to our translation table.

Next: Functors.

Acknowledments

Thanks go to Gershom Bazerman for reviewing this post and helpful comments.


Categories for Programmers. In the previous installment we discussed how to add logging to pure functions. See the Table of Contents.

Follow the Arrows

The Ancient Greek playwright Euripides once said: “Every man is like the company he is wont to keep.” We are defined by our relationships. Nowhere is this more true than in category theory. If we want to single out a particular object in a category, we can only do this by describing its pattern of relationships with other objects (and itself). These relationships are defined by morphisms.

There is a common construction in category theory called the universal construction for defining objects in terms of their relationships. One way of doing this is to pick a pattern, a particular shape constructed from objects and morphisms, and look for all its occurrences in the category. If it’s a common enough pattern, and the category is large, chances are you’ll have lots and lots of hits. The trick is to establish some kind of ranking among those hits, and pick what could be considered the best fit.

This process is reminiscent of the way we do web searches. A query is like a pattern. A very general query will give you large recall: lots of hits. Some may be relevant, others not. To eliminate irrelevant hits, you refine your query. That increases its precision. Finally, the search engine will rank the hits and, hopefully, the one result that you’re interested in will be at the top of the list.

Initial Object

The simplest shape is a single object. Obviously, there are as many instances of this shape as there are objects in a given category. That’s a lot to choose from. We need to establish some kind of ranking and try to find the object that tops this hierarchy. The only means at our disposal are morphisms. If you think of morphisms as arrows, then it’s possible that there is an overall net flow of arrows from one end of the category to another. This is true in ordered categories, for instance in partial orders. We could generalize that notion of object precedence by saying that object a is “more initial” than object b if there is an arrow (a morphism) going from a to b. We would then define the initial object as one that has arrows going to all other objects. Obviously there is no guarantee that such an object exists, and that’s okay. A bigger problem is that there may be too many such objects: The recall is good, but precision is lacking. The solution is to take a hint from ordered categories — they allow at most one arrow between any two objects: there is only one way of being less-than or equal-to another object. Which leads us to this definition of the initial object:

The initial object is the object that has one and only one morphism going to any object in the category.

Initial

However, even that doesn’t guarantee the uniqueness of the initial object (if one exists). But it guarantees the next best thing: uniqueness up to isomorphism. Isomorphisms are very important in category theory, so I’ll talk about them shortly. For now, let’s just agree that uniqueness up to isomorphism justifies the use of “the” in the definition of the initial object.

Here are some examples: The initial object in a partially ordered set (often called a poset) is its least element. Some posets don’t have an initial object — like the set of all integers, positive and negative, with less-than-or-equal relation for morphisms.

In the category of sets and functions, the initial object is the empty set. Remember, an empty set corresponds to the Haskell type Void (there is no corresponding type in C++) and the unique polymorphic function from Void to any other type is called absurd:

absurd :: Void -> a

It’s this family of morphisms that makes Void the initial object in the category of types.

Terminal Object

Let’s continue with the single-object pattern, but let’s change the way we rank the objects. We’ll say that object a is “more terminal” than object b if there is a morphism going from b to a (notice the reversal of direction). We’ll be looking for an object that’s more terminal than any other object in the category. Again, we will insist on uniqueness:

The terminal object is the object with one and only one morphism coming to it from any object in the category.

Final

And again, the terminal object is unique, up to isomorphism, which I will show shortly. But first let’s look at some examples. In a poset, the terminal object, if it exists, is the biggest object. In the category of sets, the terminal object is a singleton. We’ve already talked about singletons — they correspond to the void type in C++ and the unit type () in Haskell. It’s a type that has only one value — implicit in C++ and explicit in Haskell, denoted by (). We’ve also established that there is one and only one pure function from any type to the unit type:

unit :: a -> ()
unit _ = ()

so all the conditions for the terminal object are satisfied.

Notice that in this example the uniqueness condition is crucial, because there are other sets (actually, all of them, except for the empty set) that have incoming morphisms from every set. For instance, there is a Boolean-valued function (a predicate) defined for every type:

yes :: a -> Bool
yes _ = True

But Bool is not a terminal object. There is at least one more Bool-valued function from every type:

no :: a -> Bool
no _ = False

Insisting on uniqueness gives us just the right precision to narrow down the definition of the terminal object to just one type.

Duality

You can’t help but to notice the symmetry between the way we defined the initial object and the terminal object. The only difference between the two was the direction of morphisms. It turns out that for any category C we can define the opposite category Cop just by reversing all the arrows. The opposite category automatically satisfies all the requirements of a category, as long as we simultaneously redefine composition. If original morphisms f::a->b and g::b->c composed to h::a->c with h=g∘f, then the reversed morphisms fop::b->a and gop::c->b will compose to hop::c->a with hop=fop∘gop. And reversing the identity arrows is a (pun alert!) no-op.

Duality is a very important property of categories because it doubles the productivity of every mathematician working in category theory. For every construction you come up with, there is its opposite; and for every theorem you prove, you get one for free. The constructions in the opposite category are often prefixed with “co”, so you have products and coproducts, monads and comonads, cones and cocones, limits and colimits, and so on. There are no cocomonads though, because reversing the arrows twice gets us back to the original state.

It follows then that a terminal object is the initial object in the opposite category.

Isomorphisms

As programmers, we are well aware that defining equality is a nontrivial task. What does it mean for two objects to be equal? Do they have to occupy the same location in memory (pointer equality)? Or is it enough that the values of all their components are equal? Are two complex numbers equal if one is expressed as the real and imaginary part, and the other as modulus and angle? You’d think that mathematicians would have figured out the meaning of equality, but they haven’t. They have the same problem of multiple competing definitions for equality. There is the propositional equality, intensional equality, extensional equality, and equality as a path in homotopy type theory. And then there are the weaker notions of isomorphism, and even weaker of equivalence.

The intuition is that isomorphic objects look the same — they have the same shape. It means that every part of one object corresponds to some part of another object in a one-to-one mapping. As far as our instruments can tell, the two objects are a perfect copy of each other. Mathematically it means that there is a mapping from object a to object b, and there is a mapping from object b back to object a, and they are the inverse of each other. In category theory we replace mappings with morphisms. An isomorphism is an invertible morphism; or a pair of morphisms, one being the inverse of the other.

We understand the inverse in terms of composition and identity: Morphism g is the inverse of morphism f if their composition is the identity morphism. These are actually two equations because there are two ways of composing two morphisms:

f . g = id
g . f = id

When I said that the initial (terminal) object was unique up to isomorphism, I meant that any two initial (terminal) objects are isomorphic. That’s actually easy to see. Let’s suppose that we have two initial objects i1 and i2. Since i1 is initial, there is a unique morphism f from i1 to i2. By the same token, since i2 is initial, there is a unique morphism g from i2 to i1. What’s the composition of these two morphisms?

All morphisms in this diagram are unique

All morphisms in this diagram are unique

The composition g∘f must be a morphism from i1 to i1. But i1 is initial so there can only be one morphism going from i1 to i1. Since we are in a category, we know that there is an identity morphism from i1 to i1, and since there is room for only one, that must be it. Therefore g∘f is equal to identity. Similarly, f∘g must be equal to identity, because there can be only one morphism from i2 back to i2. This proves that f and g must be the inverse of each other. Therefore any two initial objects are isomorphic.

Notice that in this proof we used the uniqueness of the morphism from the initial object to itself. Without that we couldn’t prove the “up to isomorphism” part. But why do we need the uniqueness of f and g? Because not only is the initial object unique up to isomorphism, it is unique up to unique isomorphism. In principle, there could be more than one isomorphism between two objects, but that’s not the case here. This “uniqueness up to unique isomorphism” is the important property of all universal constructions.

Products

The next universal construction is that of a product. We know what a cartesian product of two sets is: it’s a set of pairs. But what’s the pattern that connects the product set with its constituent sets? If we can figure that out, we’ll be able to generalize it to other categories.

All we can say is that there are two functions, the projections, from the product to each of the constituents. In Haskell, these two functions are called fst and snd and they pick, respectively, the first and the second component of a pair:

fst :: (a, b) -> a
fst (x, y) = x
snd :: (a, b) -> b
snd (x, y) = y

Here, the functions are defined by pattern matching their arguments: the pattern that matches any pair is (x, y), and it extracts its components into variables x and y.

These definitions can be simplified even further with the use of wildcards:

fst (x, _) = x
snd (_, y) = y

In C++, we would use template functions, for instance:

template<class A, class B>
A fst(pair<A, B> const & p) {
    return p.first;
}

Equipped with this seemingly very limited knowledge, let’s try to define a pattern of objects and morphisms in the category of sets that will lead us to the construction of a product of two sets, a and b. This pattern consists of an object c and two morphisms p and q connecting it to a and b, respectively:

p :: c -> a
q :: c -> b

ProductPattern

All cs that fit this pattern will be considered candidates for the product. There may be lots of them.

ProductCandidates

For instance, let’s pick, as our constituents, two Haskell types, Int and Bool, and get a sampling of candidates for their product.

Here’s one: Int. Can Int be considered a candidate for the product of Int and Bool? Yes, it can — and here are its projections:

p :: Int -> Int
p x = x

q :: Int -> Bool
q _ = True

That’s pretty lame, but it matches the criteria.

Here’s another one: (Int, Int, Bool). It’s a tuple of three elements, or a triple. Here are two morphisms that make it a legitimate candidate (we are using pattern matching on triples):

p :: (Int, Int, Bool) -> Int
p (x, _, _) = x

q :: (Int, Int, Bool) -> Bool
q (_, _, b) = b

You may have noticed that while our first candidate was too small — it only covered the Int dimension of the product; the second was too big — it spuriously duplicated the Int dimension.

But we haven’t explored yet the other part of the universal construction: the ranking. We want to be able to compare two instances of our pattern. We want to compare one candidate object c and its two projections p and q with another candidate object c’ and its two projections p’ and q’. We would like to say that c is “better” than c’ if there is a morphism m from c’ to c — but that’s too weak. We also want its projections to be “better,” or “more universal,” than the projections of c’. What it means is that the projections p’ and q’ can be reconstructed from p and q using m:

p’ = p . m
q’ = q . m

ProductRanking

Another way of looking at these equation is that m factorizes p’ and q’. Just pretend that these equations are in natural numbers, and the dot is multiplication: m is a common factor shared by p’ and q’.

Just to build some intuitions, let me show you that the pair (Int, Bool) with the two canonical projections, fst and snd is indeed better than the two candidates I presented before.

Not a product

The mapping m for the first candidate is:

m :: Int -> (Int, Bool)
m x = (x, True)

Indeed, the two projections, p and q can be reconstructed as:

p x = fst (m x) = x
q x = snd (m x) = True

The m for the second example is similarly uniquely determined:

m (x, _, b) = (x, b)

We were able to show that (Int, Bool) is better than either of the two candidates. Let’s see why the opposite is not true. Could we find some m' that would help us reconstruct fst and snd from p and q?

fst = p . m’
snd = q . m’

In our first example, q always returned True and we know that there are pairs whose second component is False. We can’t reconstruct snd from q.

The second example is different: we retain enough information after running either p or q, but there is more than one way to factorize fst and snd. Because both p and q ignore the second component of the triple, our m’ can put anything in it. We can have:

m’ (x, b) = (x, x, b)

or

m’ (x, b) = (x, 42, b)

and so on.

Putting it all together, given any type c with two projections p and q, there is a unique m from c to the cartesian product (a, b) that factorizes them. In fact, it just combines p and q into a pair.

m :: c -> (a, b)
m x = (p x, q x)

That makes the cartesian product (a, b) our best match, which means that this universal construction works in the category of sets. It picks the product of any two sets.

Now let’s forget about sets and define a product of two objects in any category using the same universal construction. Such product doesn’t always exist, but when it does, it is unique up to a unique isomorphism.

A product of two objects a and b is the object c equipped with two projections such that for any other object c’ equipped with two projections there is a unique morphism m from c’ to c that factorizes those projections.

A (higher order) function that produces the factorizing function m from two candidates is sometimes called the factorizer. In our case, it would be the function:

factorizer :: (c -> a) -> (c -> b) -> (c -> (a, b))
factorizer p q = \x -> (p x, q x)

Coproduct

Like every construction in category theory, the product has a dual, which is called the coproduct. When we reverse the arrows in the product pattern, we end up with an object c equipped with two injections, i and j: morphisms from a and b to c.

i :: a -> c
j :: b -> c

CoproductPattern

The ranking is also inverted: object c is “better” than object c’ that is equipped with the injections i’ and j’ if there is a morphism m from c to c’ that factorizes the injections:

i' = m . i
j' = m . j

CoproductRanking

The “best” such object, one with a unique morphism connecting it to any other pattern, is called a coproduct and, if it exists, is unique up to unique isomorphism.

A coproduct of two objects a and b is the object c equipped with two injections such that for any other object c’ equipped with two injections there is a unique morphism m from c to c’ that factorizes those injections.

In the category of sets, the coproduct is the disjoint union of two sets. An element of the disjoint union of a and b is either an element of a or an element of b. If the two sets overlap, the disjoint union contains two copies of the common part. You can think of an element of a disjoint union as being tagged with an identifier that specifies its origin.

For a programmer, it’s easier to understand a coproduct in terms of types: it’s a tagged union of two types. C++ supports unions, but they are not tagged. It means that in your program you have to somehow keep track which member of the union is valid. To create a tagged union, you have to define a tag — an enumeration — and combine it with the union. For instance, a tagged union of an int and a char const * could be implemented as:

struct Contact {
    enum { isPhone, isEmail } tag;
    union { int phoneNum; char const * emailAddr; };
};

The two injections can either be implemented as constructors or as functions. For instance, here’s the first injection as a function PhoneNum:

Contact PhoneNum(int n) {
    Contact c;
    c.tag = isPhone;
    c.phoneNum = n;
    return c;
}

It injects an integer into Contact.

A tagged union is also called a variant, and there is a very general implementation of a variant in the boost library, boost::variant.

In Haskell, you can combine any data types into a tagged union by separating data constructors with a vertical bar. The Contact example translates into the declaration:

data Contact = PhoneNum Int | EmailAddr String

Here, PhoneNum and EmailAddr serve both as constructors (injections), and as tags for pattern matching (more about this later). For instance, this is how you would construct a contact using a phone number:

helpdesk :: Contact;
helpdesk = PhoneNum 2222222

Unlike the canonical implementation of the product that is built into Haskell as the primitive pair, the canonical implementation of the coproduct is a data type called Either, which is defined in the standard Prelude as:

Either a b = Left a | Right b

It is parameterized by two types, a and b and has two constructors: Left that takes a value of type a, and Right that takes a value of type b.

Just as we’ve defined the factorizer for a product, we can define one for the coproduct. Given a candidate type c and two candidate injections i and j, the factorizer for Either produces the factoring function:

factorizer :: (a -> c) -> (b -> c) -> Either a b -> c
factorizer i j (Left a)  = i a
factorizer i j (Right b) = j b

Asymmetry

We’ve seen two set of dual definitions: The definition of a terminal object can be obtained from the definition of the initial object by reversing the direction of arrows; in a similar way, the definition of the coproduct can be obtained from that of the product. Yet in the category of sets the initial object is very different from the final object, and coproduct is very different from product. We’ll see later that product behaves like multiplication, with the terminal object playing the role of one; whereas coproduct behaves more like the sum, with the initial object playing the role of zero. In particular, for finite sets, the size of the product is the product of the sizes of individual sets, and the size of the coproduct is the sum of the sizes.

This shows that the category of sets is not symmetric with respect to the inversion of arrows.

Notice that while the empty set has a unique morphism to any set (the absurd function), it has no morphisms coming back. The singleton set has a unique morphism coming to it from any set, but it also has outgoing morphisms to every set (except for the empty one). As we’ve seen before, these outgoing morphisms from the terminal object play a very important role of picking elements of other sets (the empty set has no elements, so there’s nothing to pick).

It’s the relationship of the singleton set to the product that sets it apart from the coproduct. Consider using the singleton set, represented by the unit type (), as yet another — vastly inferior — candidate for the product pattern. Equip it with two projections p and q: functions from the singleton to each of the constituent sets. Each selects a concrete element from either set. Because the product is universal, there is also a (unique) morphism m from our candidate, the singleton, to the product. This morphism selects an element from the product set — it selects a concrete pair. It also factorizes the two projections:

p = fst . m
q = snd . m

When acting on the singleton value (), the only element of the singleton set, these two equations become:

p () = fst (m ())
q () = snd (m ())

Since m () is the element of the product picked by m, these equations tell us that the element picked by p from the first set, p (), is the first component of the pair picked by m. Similarly, q () is equal to the second component. This is in total agreement with our understanding that elements of the product are pairs of elements from the constituent sets.

There is no such simple interpretation of the coproduct. We could try the singleton set as a candidate for a coproduct, in an attempt to extract the elements from it, but there we would have two injections going into it rather than two projections coming out of it. They’d tell us nothing about their sources (in fact, we’ve seen that they ignore the input parameter). Neither would the unique morphism from the coproduct to our singleton. The category of sets just looks very different when seen from the direction of the initial object than it does when seen from the terminal end.

This is not an intrinsic property of sets, it’s a property of functions, which we use as morphisms in Set. Functions are, in general, asymmetric. Let me explain.

A function must be defined for every element of its domain set (in programming, we call it a total function), but it doesn’t have to cover the whole codomain. We’ve seen some extreme cases of it: functions from a singleton set — functions that select just a single element in the codomain. (Actually, functions from an empty set are the real extremes.) When the size of the domain is much smaller than the size of the codomain, we often think of such functions as embedding the domain in the codomain. For instance, we can think of a function from a singleton set as embedding its single element in the codomain. I call them embedding functions, but mathematicians prefer to give a name to the opposite: functions that tightly fill their codomains are called surjective or onto.

The other source of asymmetry is that functions are allowed to map many elements of the domain set into one element of the codomain. They can collapse them. The extreme case are functions that map whole sets into a singleton. You’ve seen the polymorphic unit function that does just that. The collapsing can only be compounded by composition. A composition of two collapsing functions is even more collapsing than the individual functions. Mathematicians have a name for non-collapsing functions: they call them injective or one-to-one

Of course there are some functions that are neither embedding nor collapsing. They are called bijections and they are truly symmetric, because they are invertible. In the category of sets, an isomorphism is the same as a bijection.

Challenges

  1. Show that the terminal object is unique up to unique isomorphism.
  2. What is a product of two objects in a poset? Hint: Use the universal construction.
  3. What is a coproduct of two objects in a poset?
  4. Implement the equivalent of Haskell Either as a generic type in your favorite language (other than Haskell).
  5. Show that Either is a “better” coproduct than int equipped with two injections:
    int i(int n) { return n; }
    int j(bool b) { return b? 0: 1; }

    Hint: Define a function

    int m(Either const & e);

    that factorizes i and j.

  6. Continuing the previous problem: How would you argue that int with the two injections i and j cannot be “better” than Either?
  7. Still continuing: What about these injections?
    int i(int n) { 
        if (n < 0) return n; 
        return n + 2;
    }
    int j(bool b) { return b? 0: 1; }
  8. Come up with an inferior candidate for a coproduct of int and bool that cannot be better than Either because it allows multiple acceptable morphisms from it to Either.

Next: Simple Algebraic Data Types.

Bibliography

  1. The Catsters, Products and Coproducts video.

Acknowledments

I’m grateful to Gershom Bazerman for reviewing this post before publication and for stimulating discussions.


In the previous installment of Categories for Programmers, Categories Great and Small, I gave a few examples of simple categories. In this installment we’ll work through a more advanced example. If you’re new to the series, here’s the Table of Contents.

Composition of Logs

You’ve seen how to model types and pure functions as a category. I also mentioned that there is a way to model side effects, or non-pure functions, in category theory. Let’s have a look at one such example: functions that log or trace their execution. Something that, in an imperative language, would likely be implemented by mutating some global state, as in:

string logger;

bool negate(bool b) {
     logger += "Not so! ";
     return !b;
}

You know that this is not a pure function, because its memoized version would fail to produce a log. This function has side effects.

In modern programming, we try to stay away from global mutable state as much as possible — if only because of the complications of concurrency. And you would never put code like this in a library.

Fortunately for us, it’s possible to make this function pure. You just have to pass the log explicitly, in and out. Let’s do that by adding a string argument, and pairing regular output with a string that contains the updated log:

pair<bool, string> negate(bool b, string logger) {
     return make_pair(!b, logger + "Not so! ");
}

This function is pure, it has no side effects, it returns the same pair every time it’s called with the same arguments, and it can be memoized if necessary. However, considering the cumulative nature of the log, you’d have to memoize all possible histories that can lead to a given call. There would be a separate memo entry for:

negate(true, "It was the best of times. ");

and

negate(true, "It was the worst of times. ");

and so on.

It’s also not a very good interface for a library function. The callers are free to ignore the string in the return type, so that’s not a huge burden; but they are forced to pass a string as input, which might be inconvenient.

Is there a way to do the same thing less intrusively? Is there a way to separate concerns? In this simple example, the main purpose of the function negate is to turn one Boolean into another. The logging is secondary. Granted, the message that is logged is specific to the function, but the task of aggregating the messages into one continuous log is a separate concern. We still want the function to produce a string, but we’d like to unburden it from producing a log. So here’s the compromise solution:

pair<bool, string> negate(bool b) {
     return make_pair(!b, "Not so! ");
}

The idea is that the log will be aggregated between function calls.

To see how this can be done, let’s switch to a slightly more realistic example. We have one function from string to string that turns lower case characters to upper case:

string toUpper(string s) {
    string result;
    int (*toupperp)(int) = &toupper; // toupper is overloaded
    transform(begin(s), end(s), back_inserter(result), toupperp);
    return result;
}

and another that splits a string into a vector of strings, breaking it on whitespace boundaries:

vector<string> toWords(string s) {
    return words(s);
}

The actual work is done in the auxiliary function words:

vector<string> words(string s) {
    vector<string> result{""};
    for (auto i = begin(s); i != end(s); ++i)
    {
        if (isspace(*i))
            result.push_back("");
        else
            result.back() += *i;
    }
    return result;
}

PiggyBack

We want to modify the functions toUpper and toWords so that they piggyback a message string on top of their regular return values.

We will “embellish” the return values of these functions. Let’s do it in a generic way by defining a template Writer that encapsulates a pair whose first component is a value of arbitrary type A and the second component is a string:

template<class A>
using Writer = pair<A, string>;

Here are the embellished functions:

Writer<string> toUpper(string s) {
    string result;
    int (*toupperp)(int) = &toupper;
    transform(begin(s), end(s), back_inserter(result), toupperp);
    return make_pair(result, "toUpper ");
}

Writer<vector<string>> toWords(string s) {
    return make_pair(words(s), "toWords ");
}

We want to compose these two functions into another embellished function that uppercases a string and splits it into words, all the while producing a log of those actions. Here’s how we may do it:

Writer<vector<string>> process(string s) {
    auto p1 = toUpper(s);
    auto p2 = toWords(p1.first);
    return make_pair(p2.first, p1.second + p2.second);
}

We have accomplished our goal: The aggregation of the log is no longer the concern of the individual functions. They produce their own messages, which are then, externally, concatenated into a larger log.

Now imagine a whole program written in this style. It’s a nightmare of repetitive, error-prone code. But we are programmers. We know how to deal with repetitive code: we abstract it! This is, however, not your run of the mill abstraction — we have to abstract function composition itself. But composition is the essence of category theory, so before we write more code, let’s analyze the problem from the categorical point of view.

The Writer Category

The idea of embellishing the return types of a bunch of functions in order to piggyback some additional functionality turns out to be very fruitful. We’ll see many more examples of it. The starting point is our regular category of types and functions. We’ll leave the types as objects, but redefine our morphisms to be the embellished functions.

For instance, suppose that we want to embellish the function isEven that goes from int to bool. We turn it into a morphism that is represented by an embellished function. The important point is that this morphism is still considered an arrow between the objects int and bool, even though the embellished function returns a pair:

pair<bool, string> isEven(int n) {
     return make_pair(n % 2 == 0, "isEven ");
}

By the laws of a category, we should be able to compose this morphism with another morphism that goes from the object bool to whatever. In particular, we should be able to compose it with our earlier negate:

pair<bool, string> negate(bool b) {
     return make_pair(!b, "Not so! ");
}

Obviously, we cannot compose these two morphisms the same way we compose regular functions, because of the input/output mismatch. Their composition should look more like this:

pair<bool, string> isOdd(int n) {
    pair<bool, string> p1 = isEven(n);
    pair<bool, string> p2 = negate(p1.first);
    return make_pair(p2.first, p1.second + p2.second);
}

So here’s the recipe for the composition of two morphisms in this new category we are constructing:

  1. Execute the embellished function corresponding to the first morphism
  2. Extract the first component of the result pair and pass it to the embellished function corresponding to the second morphism
  3. Concatenate the second component (the string) of of the first result and the second component (the string) of the second result
  4. Return a new pair combining the first component of the final result with the concatenated string.

If we want to abstract this composition as a higher order function in C++, we have to use a template parameterized by three types corresponding to three objects in our category. It should take two embellished functions that are composable according to our rules, and return a third embellished function:

template<class A, class B, class C>
function<Writer<C>(A)> compose(function<Writer<B>(A)> m1, 
                               function<Writer<C>(B)> m2)
{
    return [m1, m2](A x) {
        auto p1 = m1(x);
        auto p2 = m2(p1.first);
        return make_pair(p2.first, p1.second + p2.second);
    };
}

Now we can go back to our earlier example and implement the composition of toUpper and toWords using this new template:

Writer<vector<string>> process(string s) {
   return compose<string, string, vector<string>>(toUpper, toWords)(s);
}

There is still a lot of noise with the passing of types to the compose template. This can be avoided as long as you have a C++14-compliant compiler that supports generalized lambda functions with return type deduction (credit for this code goes to Eric Niebler):

auto const compose = [](auto m1, auto m2) {
    return [m1, m2](auto x) {
        auto p1 = m1(x);
        auto p2 = m2(p1.first);
        return make_pair(p2.first, p1.second + p2.second);
    };
};

In this new definition, the implementation of process simplifies to:

Writer<vector<string>> process(string s){
   return compose(toUpper, toWords)(s);
}

But we are not finished yet. We have defined composition in our new category, but what are the identity morphisms? These are not our regular identity functions! They have to be morphisms from type A back to type A, which means they are embellished functions of the form:

Writer<A> identity(A);

They have to behave like units with respect to composition. If you look at our definition of composition, you’ll see that an identity morphism should pass its argument without change, and only contribute an empty string to the log:

template<class A>
Writer<A> identity(A x) {
    return make_pair(x, "");
}

You can easily convince yourself that the category we have just defined is indeed a legitimate category. In particular, our composition is trivially associative. If you follow what’s happening with the first component of each pair, it’s just a regular function composition, which is associative. The second components are being concatenated, and concatenation is also associative.

An astute reader may notice that it would be easy to generalize this construction to any monoid, not just the string monoid. We would use mappend inside compose and mempty inside identity (in place of + and ""). There really is no reason to limit ourselves to logging just strings. A good library writer should be able to identify the bare minimum of constraints that make the library work — here the logging library’s only requirement is that the log have monoidal properties.

Writer in Haskell

The same thing in Haskell is a little more terse, and we also get a lot more help from the compiler. Let’s start by defining the Writer type:

type Writer a = (a, String)

Here I’m just defining a type alias, an equivalent of a typedef (or using) in C++. The type Writer is parameterized by a type variable a and is equivalent to a pair of a and String. The syntax for pairs is minimal: just two items in parentheses, separated by a comma.

Our morphisms are functions from an arbitrary type to some Writer type:

a -> Writer b

We’ll declare the composition as a funny infix operator, sometimes called the “fish”:

(>=>) :: (a -> Writer b) -> (b -> Writer c) -> (a -> Writer c)

It’s a function of two arguments, each being a function on its own, and returning a function. The first argument is of the type (a->Writer b), the second is (b->Writer c), and the result is (a->Writer c).

Here’s the definition of this infix operator — the two arguments m1 and m2 appearing on either side of the fishy symbol:

m1 >=> m2 = \x -> 
    let (y, s1) = m1 x
        (z, s2) = m2 y
    in (z, s1 ++ s2)

The result is a lambda function of one argument x. The lambda is written as a backslash — think of it as the Greek letter λ with an amputated leg.

The let expression lets you declare auxiliary variables. Here the result of the call to m1 is pattern matched to a pair of variables (y, s1); and the result of the call to m2, with the argument y from the first pattern, is matched to (z, s2).

It is common in Haskell to pattern match pairs, rather than use accessors, as we did in C++. Other than that there is a pretty straightforward correspondence between the two implementations.

The overall value of the let expression is specified in its in clause: here it’s a pair whose first component is z and the second component is the concatenation of two strings, s1++s2.

I will also define the identity morphism for our category, but for reasons that will become clear much later, I will call it return.

return :: a -> Writer a
return x = (x, "")

For completeness, let’s have the Haskell versions of the embellished functions upCase and toWords:

upCase :: String -> Writer String
upCase s = (map toUpper s, "upCase ")
toWords :: String -> Writer [String]
toWords s = (words s, "toWords ")

The function map corresponds to the C++ transform. It applies the character function toUpper to the string s. The auxiliary function words is defined in the standard Prelude library.

Finally, the composition of the two functions is accomplished with the help of the fish operator:

process :: String -> Writer [String]
process = upCase >=> toWords

Kleisli Categories

You might have guessed that I haven’t invented this category on the spot. It’s an example of the so called Kleisli category — a category based on a monad. We are not ready to discuss monads yet, but I wanted to give you a taste of what they can do. For our limited purposes, a Kleisli category has, as objects, the types of the underlying programming language. Morphisms from type A to type B are functions that go from A to a type derived from B using the particular embellishment. Each Kleisli category defines its own way of composing such morphisms, as well as the identity morphisms with respect to that composition. (Later we’ll see that the imprecise term “embellishment” corresponds to the notion of an endofunctor in a category.)

The particular monad that I used as the basis of the category in this post is called the writer monad and it’s used for logging or tracing the execution of functions. It’s also an example of a more general mechanism for embedding effects in pure computations. You’ve seen previously that we could model programming-language types and functions in the category of sets (disregarding bottoms, as usual). Here we have extended this model to a slightly different category, a category where morphisms are represented by embellished functions, and their composition does more than just pass the output of one function to the input of another. We have one more degree of freedom to play with: the composition itself. It turns out that this is exactly the degree of freedom which makes it possible to give simple denotational semantics to programs that in imperative languages are traditionally implemented using side effects.

Challenge

A function that is not defined for all possible values of its argument is called a partial function. It’s not really a function in the mathematical sense, so it doesn’t fit the standard categorical mold. It can, however, be represented by a function that returns an embellished type optional:

template<class A> class optional {
    bool _isValid;
    A    _value;
public:
    optional()    : _isValid(false) {}
    optional(A v) : _isValid(true), _value(v) {}
    bool isValid() const { return _isValid; }
    A value() const { return _value; }
};

As an example, here’s the implementation of the embellished function safe_root:

optional<double> safe_root(double x) {
    if (x >= 0) return optional<double>{sqrt(x)};
    else return optional<double>{};
}

Here’s the challenge:

  1. Construct the Kleisli category for partial functions (define composition and identity).
  2. Implement the embellished function safe_reciprocal that returns a valid reciprocal of its argument, if it’s different from zero.
  3. Compose safe_root and safe_reciprocal to implement safe_root_reciprocal that calculates sqrt(1/x) whenever possible.

Acknowledgments

I’m grateful to Eric Niebler for reading the draft and providing the clever implementation of compose that uses advanced features of C++14 to drive type inference. I was able to cut the whole section of old fashioned template magic that did the same thing using type traits. Good riddance! I’m also grateful to Gershom Bazerman for useful comments that helped me clarify some important points.

Next: Products and Coproducts.


In the previous instalment of Category Theory for Programmers we talked about the category of types and functions. If you’re new to the series, here’s the Table of Contents.

You can get real appreciation for categories by studying a variety of examples. Categories come in all shapes and sizes and often pop up in unexpected places. We’ll start with something really simple.

No Objects

The most trivial category is one with zero objects and, consequently, zero morphisms. It’s a very sad category by itself, but it may be important in the context of other categories, for instance, in the category of all categories (yes, there is one). If you think that an empty set makes sense, then why not an empty category?

Simple Graphs

You can build categories just by connecting objects with arrows. You can imagine starting with any directed graph and making it into a category by simply adding more arrows. First, add an identity arrow at each node. Then, for any two arrows such that the end of one coincides with the beginning of the other (in other words, any two composable arrows), add a new arrow to serve as their composition. Every time you add a new arrow, you have to also consider its composition with any other arrow (except for the identity arrows) and itself. You usually end up with infinitely many arrows, but that’s okay.

Another way of looking at this process is that you’re creating a category, which has an object for every node in the graph, and all possible chains of composable graph edges as morphisms. (You may even consider identity morphisms as special cases of chains of length zero.)

Such a category is called a free category generated by a given graph. It’s an example of a free construction, a process of completing a given structure by extending it with a minimum number of items to satisfy its laws (here, the laws of a category). We’ll see more examples of it in the future.

Orders

And now for something completely different! A category where a morphism is a relation between objects: the relation of being less than or equal. Let’s check if it indeed is a category. Do we have identity morphisms? Every object is less than or equal to itself: check! Do we have composition? If a <= b and b <= c then a <= c: check! Is composition associative? Check! A set with a relation like this is called a preorder, so a preorder is indeed a category.

You can also have a stronger relation, that satisfies an additional condition that, if a <= b and b <= a then a must be the same as b. That’s called a partial order.

Finally, you can impose the condition that any two objects are in a relation with each other, one way or another; and that gives you a linear order or total order.

Let’s characterize these ordered sets as categories. A preorder is a category where there is at most one morphism going from any object a to any object b. Another name for such a category is “thin.” A preorder is a thin category.

A set of morphisms from object a to object b in a category C is called a hom-set and is written as C(a, b) (or, sometimes, HomC(a, b)). So every hom-set in a preorder is either empty or a singleton. That includes the hom-set C(a, a), the set of morphisms from a to a, which must be a singleton, containing only the identity, in any preorder. You may, however, have cycles in a preorder. Cycles are forbidden in a partial order.

It’s very important to be able to recognize preorders, partial orders, and total orders because of sorting. Sorting algorithms, such as quicksort, bubble sort, merge sort, etc., can only work correctly on total orders. Partial orders can be sorted using topological sort.

Monoid as Set

Monoid is an embarrassingly simple but amazingly powerful concept. It’s the concept behind basic arithmetics: Both addition and multiplication form a monoid. Monoids are ubiquitous in programming. They show up as strings, lists, foldable data structures, futures in concurrent programming, events in functional reactive programming, and so on.

Traditionally, a monoid is defined as a set with a binary operation. All that’s required from this operation is that it’s associative, and that there is one special element that behaves like a unit with respect to it.

For instance, natural numbers with zero form a monoid under addition. Associativity means that:

(a + b) + c = a + (b + c)

(In other words, we can skip parentheses when adding numbers.)

The neutral element is zero, because:

0 + a = a

and

a + 0 = a

The second equation is redundant, because addition is commutative (a + b = b + a), but commutativity is not part of the definition of a monoid. For instance, string concatenation is not commutative and yet it forms a monoid. The neutral element for string concatenation, by the way, is an empty string, which can be attached to either side of a string without changing it.

In Haskell we can define a type class for monoids — a type for which there is a neutral element called mempty and a binary operation called mappend:

class Monoid m where
    mempty  :: m
    mappend :: m -> m -> m

The type signature for a two-argument function, m->m->m, might look strange at first, but it will make perfect sense after we talk about currying. You may interpret a signature with multiple arrows in two basic ways: as a function of multiple arguments, with the rightmost type being the return type; or as a function of one argument (the leftmost one), returning a function. The latter interpretation may be emphasized by adding parentheses (which are redundant, because the arrow is right-associative), as in: m->(m->m). We’ll come back to this interpretation in a moment.

Notice that, in Haskell, there is no way to express the monoidal properties of mempty and mappend (i.e., the fact that mempty is neutral and that mappend is associative). It’s the responsibility of the programmer to make sure they are satisfied.

Haskell classes are not as intrusive as C++ classes. When you’re defining a new type, you don’t have to specify its class up front. You are free to procrastinate and declare a given type to be an instance of some class much later. As an example, let’s declare String to be a monoid by providing the implementation of mempty and mappend (this is, in fact, done for you in the standard Prelude):

instance Monoid String where
    mempty = ""
    mappend = (++)

Here, we have reused the list concatenation operator (++), because a String is just a list of characters.

A word about Haskell syntax: Any infix operator can be turned into a two-argument function by surrounding it with parentheses. Given two strings, you can concatenate them by inserting ++ between them:

"Hello " ++ "world!"

or by passing them as two arguments to the parenthesized (++):

(++) "Hello " "world!"

Notice that arguments to a function are not separated by commas or surrounded by parentheses. (This is probably the hardest thing to get used to when learning Haskell.)

It’s worth emphasizing that Haskell lets you express equality of functions, as in:

mappend = (++)

Conceptually, this is different than expressing the equality of values produced by functions, as in:

mappend s1 s2 = (++) s1 s2

The former translates into equality of morphisms in the category Hask (or Set, if we ignore bottoms, which is the name for never-ending calculations). Such equations are not only more succinct, but can often be generalized to other categories. The latter is called extensional equality, and states the fact that for any two input strings, the outputs of mappend and (++) are the same. Since the values of arguments are sometimes called points (as in: the value of f at point x), this is called point-wise equality. Function equality without specifying the arguments is described as point-free. (Incidentally, point-free equations often involve composition of functions, which is symbolized by a point, so this might be a little confusing to the beginner.)

The closest one can get to declaring a monoid in C++ would be to use the (proposed) syntax for concepts.

template<class T>
  T mempty = delete;

template<class T>
  T mappend(T, T) = delete;

template<class M>
  concept bool Monoid = requires (M m) {
    { mempty<M> } -> M;
    { mappend(m, m); } -> M;
  };

The first definition uses a value template (also proposed). A polymorphic value is a family of values — a different value for every type.

The keyword delete means that there is no default value defined: It will have to be specified on a case-by-case basis. Similarly, there is no default for mappend.

The concept Monoid is a predicate (hence the bool type) that tests whether there exist appropriate definitions of mempty and mappend for a given type M.

An instantiation of the Monoid concept can be accomplished by providing appropriate specializations and overloads:

template<>
std::string mempty<std::string> = {""};

std::string mappend(std::string s1, std::string s2) {
    return s1 + s2;
}

Monoid as Category

That was the “familiar” definition of the monoid in terms of elements of a set. But as you know, in category theory we try to get away from sets and their elements, and instead talk about objects and morphisms. So let’s change our perspective a bit and think of the application of the binary operator as “moving” or “shifting” things around the set.

For instance, there is the operation of adding 5 to every natural number. It maps 0 to 5, 1 to 6, 2 to 7, and so on. That’s a function defined on the set of natural numbers. That’s good: we have a function and a set. In general, for any number n there is a function of adding n — the “adder” of n.

How do adders compose? The composition of the function that adds 5 with the function that adds 7 is a function that adds 12. So the composition of adders can be made equivalent to the rules of addition. That’s good too: we can replace addition with function composition.

But wait, there’s more: There is also the adder for the neutral element, zero. Adding zero doesn’t move things around, so it’s the identity function in the set of natural numbers.

Instead of giving you the traditional rules of addition, I could as well give you the rules of composing adders, without any loss of information. Notice that the composition of adders is associative, because the composition of functions is associative; and we have the zero adder corresponding to the identity function.

An astute reader might have noticed that the mapping from integers to adders follows from the second interpretation of the type signature of mappend as m->(m->m). It tells us that mappend maps an element of a monoid set to a function acting on that set.

Now I want you to forget that you are dealing with the set of natural numbers and just think of it as a single object, a blob with a bunch of morphisms — the adders. A monoid is a single object category. In fact the name monoid comes from Greek mono, which means single. Every monoid can be described as a single object category with a set of morphisms that follow appropriate rules of composition.

Monoid

String concatenation is an interesting case, because we have a choice of defining right appenders and left appenders (or prependers, if you will). The composition tables of the two models are a mirror reverse of each other. You can easily convince yourself that appending “bar” after “foo” corresponds to prepending “foo” after prepending “bar”.

You might ask the question whether every categorical monoid — a one-object category — defines a unique set-with-binary-operator monoid. It turns out that we can always extract a set from a single-object category. This set is the set of morphisms — the adders in our example. In other words, we have the hom-set M(m, m) of the single object m in the category M. We can easily define a binary operator in this set: The monoidal product of two set-elements is the element corresponding to the composition of the corresponding morphisms. If you give me two elements of M(m, m) corresponding to f and g, their product will correspond to the composition g∘f. The composition always exists, because the source and the target for these morphisms are the same object. And it’s associative by the rules of category. The identity morphism is the neutral element of this product. So we can always recover a set monoid from a category monoid. For all intents and purposes they are one and the same.

Monoid hom-set seen as morphisms and as points in a set

Monoid hom-set seen as morphisms and as points in a set

There is just one little nit for mathematicians to pick: morphisms don’t have to form a set. In the world of categories there are things larger than sets. A category in which morphisms between any two objects form a set is called locally small. As promised, I will be mostly ignoring such subtleties, but I thought I should mention them for the record.

A lot of interesting phenomena in category theory have their root in the fact that elements of a hom-set can be seen both as morphisms, which follow the rules of composition, and as points in a set. Here, composition of morphisms in M translates into monoidal product in the set M(m, m).

Acknowledgments

I’d like to thank Andrew Sutton for rewriting my C++ monoid concept code according to his and Bjarne Stroustrup’s latest proposal.

Challenges

  1. Generate a free category from:
    1. A graph with one node and no edges
    2. A graph with one node and one (directed) edge (hint: this edge can be composed with itself)
    3. A graph with two nodes and a single arrow between them
    4. A graph with a single node and 26 arrows marked with the letters of the alphabet: a, b, c … z.
  2. What kind of order is this?
    1. A set of sets with the inclusion relation: A is included in B if every element of A is also an element of B.
    2. C++ types with the following subtyping relation: T1 is a subtype of T2 if a pointer to T1 can be passed to a function that expects a pointer to T2 without triggering a compilation error.
  3. Considering that Bool is a set of two values True and False, show that it forms two (set-theoretical) monoids with respect to, respectively, operator && (AND) and || (OR).
  4. Represent the Bool monoid with the AND operator as a category: List the morphisms and their rules of composition.
  5. Represent addition modulo 3 as a monoid category.

Next: A programming example of pure functions that do logging using Kleisli categories.


Ferrari museum in Maranello

Ferrari museum in Maranello

I was recently visiting the Ferrari museum in Maranello, Italy, where I saw this display of telemetry data from racing cars.

Telemetry data from a racing car. The contour of the racing track is shown in the upper left corner and various data channels are displayed below.

Telemetry data from a racing car. The racing track is displayed in the upper left corner and various data channels are displayed below.

The processing and the display of telemetry data is an interesting programming challenge. It has application in space exploration (as in, when you land a probe on a surface of a comet), medicine, and the military. The same techniques are used in financial systems where streams carry information about stock prices, commodity prices, and currency exchange rates.

It’s also a problem that lends itself particularly well to functional programming. If you are one of these shops working with telemetry, and you have to maintain legacy code written in imperative style, you might be interested in an alternative approach, especially if you are facing constant pressure to provide more sophisticated analysis tools and introduce concurrency to make the system faster and more responsive.

What all these applications have in common is that they deal with multiple channels generating streams of data. The data has to be either displayed in real time or stored for later analysis and processing. It’s pretty obvious to a functional programmer that channels are functors, and that they should be composed using combinators. In fact this observation can drive the whole architecture. The clincher is the issue of concurrency: retrofitting non-functional code to run in parallel is a lost battle — it’s almost easier to start from scratch. But treating channels as immutable entities makes concurrency almost an after-thought.

Everything is a Number

The most basic (and totally wrong) approach is to look at telemetry as streams of numbers. This is the assembly language of data processing. When everything is a number and you can apply your math any way you wish. The problem is that you are throwing away a lot of useful information. You want to use types as soon as possible to encode additional information and to prevent nonsensical operations like adding temperature to velocity.

In an engineering application, the least you can do is to keep track of units of measurement. You also want to distinguish between channels that produce floating-point numbers and ones that produce integers, or Booleans, or strings. This immediately tells you that a channel should be a polymorphic data structure. You should be able to stream any type of data, be it bytes, complex numbers, or vectors.

Everything is an Object

To an object-oriented mind it looks very much like a channel should be an object that is parameterized by the type of data it carries. And as an object it should have some methods. We need the get method to access the current value, and the next method to increment the position in the stream. As an imperative programmer you might also be tempted to provide a mutator, set. If you ever want your program to be concurrent, don’t even think about it!

If you’re a C++ programmer, you may overload some operators, and use * and ++ instead. That would make a channel look more like a forward iterator. But whatever you call it, a functional programmer will recognize it as a list, with the head and tail functionality.

Everything is a List

Let’s talk about lists, because there is a lot of misunderstanding around them. When people think of lists in imperative languages they think about storage. A list is probably the worst data type for storing data. Imperative programmers naturally assume that functional programmers, who use lists a lot, must be crazy. They are not! A Haskell list is almost never used for storing bulk data. A list is either an interface to data that is stored elsewhere, or a generator of data. Haskell is a lazy functional language, so what looks like a data structure is really a bunch of functions that provide data on demand.

That’s why I wouldn’t hesitate to implement channels as lists in Haskell. As an added bonus, lists can provide a pull interface to data that is being pushed. Reactive programs that process streams of data may be written as if all the data were already there — the event handler logic can be hidden inside the objects that generate the data. And this is just what’s needed for live telemetry data.

Obviously, functional programming is easier in Haskell than in C++, C#, or Java. But given how much legacy software there is, it could be a lost cause to ask management to (a) throw away existing code and start from scratch, (b) retrain the team to learn a new language, and (c) deal with completely new performance characteristics, e.g., lazy evaluation and garbage collection. So, realistically, the best we can do is to keep introducing functional methods into imperative languages, at least for the time being. It doesn’t mean that Haskell should’t play an important role in it. Over and over again I find myself prototyping solutions in Haskell before translating them into C++. The added effort pays back handsomely through faster prototyping, better code quality, and fewer bugs to chase. So I would highly recommend to every imperative programmer to spend, say, an hour a day learning and playing with Haskell. You’d be amazed how it helps in developing your programming skills.

Everything is a Functor

So, if you’re an object oriented programmer, you’ll probably implement a channel as something like this:

template <class T> Channel {
    virtual T get();
    virtual bool next();
};

and then get stuck. With this kind of interface, the rest of your program is bound to degenerate into a complex system of loops that extract data from streams and process them, possibly stuffing it back into other streams.

Instead, I propose to try the functional way. I will show you some prototype code in Haskell, but mostly explain how things work, so a non-Haskell programmer can gain some insight.

Here’s the definition of a polymorphic channel type, Chan:

data Chan a = Chan [a]

where a plays the role of a type variable, analogous to T in the C++ code above. The right hand side of the equal sign defines the constructor Chan that takes a list as an argument. Constructors are used both for constructing and for pattern matching. The notation [a] means a list of a.

The details don’t really matter, as long as you understand that the channel is implemented as a list. Also, I’m making things a little bit more explicit for didactic purposes. A Haskell programmer would implement the channel as a type alias, type, rather than a separate type.

Rule number one of dealing with lists is: try not to access their elements in a loop (or, using the functional equivalent of a loop — recursively). Operate on lists holistically. For instance, one of the most common operations on lists is to apply a function to every element. That means we want our Chan to be a functor.

A functor is a polymorphic data type that supports operating on its contents with a function. In the case of Chan that’s easy, since a list itself is a functor. I’ll be explicit here, again for didactic reasons. This is how you make Chan an instance of the Functor class by defining how to fmap a function f over it:

instance Functor Chan where
    fmap f (Chan xs) = Chan (map f xs)

Here, map is a library function that applies f to every element of the list. This is very much like applying C++ std::transform to a container, except that in Haskell everything is evaluated lazily, so you can apply fmap to an infinite list, or to a list that is not there yet because, for instance, it’s being generated in real time from incoming telemetry.

Everything is a Combinator

Let’s see how far we can get with this channel idea. The next step is to be able to combine multiple channels to generate streams of derived data. For instance, suppose that you have one channel from a pressure gauge, and another providing volume data, and you want to calculate instantaneous temperature using the ideal gas equation.

Let’s start with defining some types. We want separate types for quantities that are measured using different units. Once more, I’m being didactic here, because there are ready-made Haskell libraries that use so called phantom types to encode measurement units. Here I’ll do it naively:

data Pressure = Pascal Float
data Volume   = Meter3 Float
data Temp     = Kelvin Float

I’ll also define the ideal gas constant:

constR = 8.314472 -- J/(mol·K)

Here’s the function that calculates the temperature of ideal gas:

getT :: Float -> Pressure -> Volume -> Temp
getT n (Pascal p) (Meter3 v) = Kelvin (p * v / (n * constR))

The question is, how can we apply this function to the pressure and volume channels to get the temperature channel? We know how to apply a function to a single channel using fmap, but here we have to work with two channels. Fortunately, a channel is not just a functor — it’s an applicative functor. It defines the action of multi-argument functions on multiple channels. I’ll give you a Haskell implementation, but you should be able to do the same in C++ by overloading fmap or transform.

instance Applicative Chan where
    pure x = Chan (repeat x)
    (Chan fs) <*> (Chan xs) = Chan (zipWith ($) fs xs)

The Applicative class defines two functions. One is called pure, and it creates a constant channel from a value by repeatedly returning the same value. The other is a binary operator <*> that applies a channel of functions (yes, you can treat functions the same way you treat any other data) to a channel of values. The function zipWith applies, pairwise, functions to arguments using the function application operator ($).

Again, the details are not essential. The bottom line is that this allows us to apply our function getT to two channels (actually, three channels, since we also need to provide the amount of gas in moles — here I’m assuming 0.1 moles).

chT :: Chan Pressure -> Chan Volume -> Chan Temp
chT chP chV = getT <$> pure 0.1 <*> chP <*> chV

Such functions that combine channels into new channels are called combinators, and an applicative functor makes the creation of new combinators very easy.

The combinators are not limited to producing physical quantities. They may as well produce channels of alerts, channels of pixels for display, or channels of visual widgets. You can construct the whole architecture around channels. And since we’ve been only considering functional data structures, the resulting architecture can be easily subject to parallelization.

Moving Average

But don’t some computations require mutable state? For instance, don’t you need some kind of accumulators in order to calculate, let’s say, moving averages? Let’s see how this can be done functionally.

The idea is to keep a running sum of list elements within a fixed window of size n. When advancing through the list, we will add the new incoming element to the running sum and subtract the old outgoing element. The average is just this sum divided by n.

We can use the old trick of delaying the list by n positions. We’ll pad the beginning of the delayed list with n zeros. Here’s the Haskell code:

delay :: Num a => Int -> [a] -> [a]
delay n lst = replicate n 0 ++ lst

The first line is the (optional, but very useful) type signature. The second line defines the function delay that takes the delay counter n and the list. The function returns a list that is obtained by concatenating (operator ++) the zero-filled list (replicate n 0) in front of the original list. For instance, if you start with the list [1, 2, 3, 4] and delay it by 2, you’ll get [0, 0, 1, 2, 3, 4].

The next step is to create a stream of deltas — the differences between elements separated by n positions. We do it by zipping two lists: the original and the delayed one.

zip lst (delay n lst)

The function zip pairs elements from the first list with the elements from the second list.

Continuing with our example, the zipping will produce the pairs [(1, 0), (2, 0), (3, 1), (4, 2)]. Notice that the left number in each pair is the incoming element that is to be added to the running sum, while the right number is the outgoing one, to be subtracted from the running sum.

Now if we subtract the two numbers in each pair we’ll get exactly the delta that has to be added to the running sum at each step. We do the subtraction by mapping the operator (-) over the list. (To make the subtraction operator (-) operate on pairs we have to uncurry it. (If you don’t know what currying is, don’t worry.)

deltas :: Num a => Int -> [a] -> [a]
deltas n lst = map (uncurry (-)) (zip lst (delay n lst))

Continuing with the example, we will get [1, 2, 2, 2]. These are the amounts by which the running sum should change at every step. (Incidentally, for n equal to one, the deltas are proportional to the derivative of the sampled stream.)

Finally, we have to start accumulating the deltas. There is a library function scanl1 that can be used to produce a list of partial sums when called with the summation operator (+).

slidingSums :: Num a => Int -> [a] -> [a]
slidingSums n lst =  scanl1 (+) (deltas n lst)

At each step, scanl1 will add the delta to the previous running sum. The “1” in its name means that it will start with the first element of the list as the accumulator. The result, in our little example, is [1, 3, 5, 7]. What remains is to divide each sum by n and we’re done:

movingAverage :: Fractional a => Int -> [a] -> [a]
movingAverage n list = map (/ (fromIntegral n)) (slidingSums n list)

Since n is an integer, it has to be explicitly converted to a fractional number before being passed to the division operator. This is done using fromIntegral. The slightly cryptic notation (/ (fromIntegral n)) is called operator section. It just means “divide by n.”

As expected, the final result for the two-element running average of [1, 2, 3, 4] is [0.5, 1.5, 2.5, 3.5]. Notice that we haven’t used any mutable state to achieve this result, which makes this code automatically thread safe. Also, because the calculation is lazy, we can calculate the moving average of an infinite list as long as we only extract a finite number of data points. Here, we are printing the first 10 points of the 5-element moving average of the list of integers from 1 to infinity.

print (take 10 (movingAverage 5 [1..]))

The result is:

[0.2, 0.6, 1.2, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

Conclusion

The functional approach is applicable to designing software not only in the small but, more importantly, in the large. It captures the patterns of interaction between components and the ways they compose. The patterns I mentioned in this post, the functor and the applicative functor, are probably the most common, but functional programmers have at their disposal a large variety of patterns borrowed from various branches of mathematics. These patterns can be used by imperative programmers as well, resulting in cleaner and more maintainable software that is, by construction, multithread-ready.


Table of Contents

Part One

  1. Category: The Essence of Composition
  2. Types and Functions
  3. Categories Great and Small
  4. Kleisli Categories
  5. Products and Coproducts
  6. Simple Algebraic Data Types
  7. Functors
  8. Functoriality
  9. Function Types
  10. Natural Transformations

Part Two

  1. Declarative Programming
  2. Limits and Colimits
  3. Free Monoids
  4. Representable Functors
  5. The Yoneda Lemma
  6. Yoneda Embedding

Part Three

  1. It’s All About Morphisms
  2. Adjunctions
  3. Free/Forgetful Adjunctions
  4. Monads: Programmer’s Definition
  5. Monads and Effects
  6. Monads Categorically
  7. Comonads
  8. F-Algebras
  9. Algebras for Monads
  10. Ends and Coends
  11. Kan Extensions
  12. Enriched Categories
  13. Topoi
  14. Lawvere Theories
  15. Monads, Monoids, and Categories

There is a free pdf version of this book with nicer typesetting available for download. You may order a hard-cover version with color illustrations at Blurb. Or you may watch me teaching this material to a live audience.

Preface

For some time now I’ve been floating the idea of writing a book about category theory that would be targeted at programmers. Mind you, not computer scientists but programmers — engineers rather than scientists. I know this sounds crazy and I am properly scared. I can’t deny that there is a huge gap between science and engineering because I have worked on both sides of the divide. But I’ve always felt a very strong compulsion to explain things. I have tremendous admiration for Richard Feynman who was the master of simple explanations. I know I’m no Feynman, but I will try my best. I’m starting by publishing this preface — which is supposed to motivate the reader to learn category theory — in hopes of starting a discussion and soliciting feedback.

I will attempt, in the space of a few paragraphs, to convince you that this book is written for you, and whatever objections you might have to learning one of the most abstract branches of mathematics in your “copious spare time” are totally unfounded.

My optimism is based on several observations. First, category theory is a treasure trove of extremely useful programming ideas. Haskell programmers have been tapping this resource for a long time, and the ideas are slowly percolating into other languages, but this process is too slow. We need to speed it up.

Second, there are many different kinds of math, and they appeal to different audiences. You might be allergic to calculus or algebra, but it doesn’t mean you won’t enjoy category theory. I would go as far as to argue that category theory is the kind of math that is particularly well suited for the minds of programmers. That’s because category theory — rather than dealing with particulars — deals with structure. It deals with the kind of structure that makes programs composable.

Composition is at the very root of category theory — it’s part of the definition of the category itself. And I will argue strongly that composition is the essence of programming. We’ve been composing things forever, long before some great engineer came up with the idea of a subroutine. Some time ago the principles of structured programming revolutionized programming because they made blocks of code composable. Then came object oriented programming, which is all about composing objects. Functional programming is not only about composing functions and algebraic data structures — it makes concurrency composable — something that’s virtually impossible with other programming paradigms.

Third, I have a secret weapon, a butcher’s knife, with which I will butcher math to make it more palatable to programmers. When you’re a professional mathematician, you have to be very careful to get all your assumptions straight, qualify every statement properly, and construct all your proofs rigorously. This makes mathematical papers and books extremely hard to read for an outsider. I’m a physicist by training, and in physics we made amazing advances using informal reasoning. Mathematicians laughed at the Dirac delta function, which was made up on the spot by the great physicist P. A. M. Dirac to solve some differential equations. They stopped laughing when they discovered a completely new branch of calculus called distribution theory that formalized Dirac’s insights.

Of course when using hand-waving arguments you run the risk of saying something blatantly wrong, so I will try to make sure that there is solid mathematical theory behind informal arguments in this book. I do have a worn-out copy of Saunders Mac Lane’s Category Theory for the Working Mathematician on my nightstand.

Since this is category theory for programmers I will illustrate all major concepts using computer code. You are probably aware that functional languages are closer to math than the more popular imperative languages. They also offer more abstracting power. So a natural temptation would be to say: You must learn Haskell before the bounty of category theory becomes available to you. But that would imply that category theory has no application outside of functional programming and that’s simply not true. So I will provide a lot of C++ examples. Granted, you’ll have to overcome some ugly syntax, the patterns might not stand out from the background of verbosity, and you might be forced to do some copy and paste in lieu of higher abstraction, but that’s just the lot of a C++ programmer.

But you’re not off the hook as far as Haskell is concerned. You don’t have to become a Haskell programmer, but you need it as a language for sketching and documenting ideas to be implemented in C++. That’s exactly how I got started with Haskell. I found its terse syntax and powerful type system a great help in understanding and implementing C++ templates, data structures, and algorithms. But since I can’t expect the readers to already know Haskell, I will introduce it slowly and explain everything as I go.

If you’re an experienced programmer, you might be asking yourself: I’ve been coding for so long without worrying about category theory or functional methods, so what’s changed? Surely you can’t help but notice that there’s been a steady stream of new functional features invading imperative languages. Even Java, the bastion of object-oriented programming, let the lambdas in C++ has recently been evolving at a frantic pace — a new standard every few years — trying to catch up with the changing world. All this activity is in preparation for a disruptive change or, as we physicist call it, a phase transition. If you keep heating water, it will eventually start boiling. We are now in the position of a frog that must decide if it should continue swimming in increasingly hot water, or start looking for some alternatives.

IMG_1299

One of the forces that are driving the big change is the multicore revolution. The prevailing programming paradigm, object oriented programming, doesn’t buy you anything in the realm of concurrency and parallelism, and instead encourages dangerous and buggy design. Data hiding, the basic premise of object orientation, when combined with sharing and mutation, becomes a recipe for data races. The idea of combining a mutex with the data it protects is nice but, unfortunately, locks don’t compose, and lock hiding makes deadlocks more likely and harder to debug.

But even in the absence of concurrency, the growing complexity of software systems is testing the limits of scalability of the imperative paradigm. To put it simply, side effects are getting out of hand. Granted, functions that have side effects are often convenient and easy to write. Their effects can in principle be encoded in their names and in the comments. A function called SetPassword or WriteFile is obviously mutating some state and generating side effects, and we are used to dealing with that. It’s only when we start composing functions that have side effects on top of other functions that have side effects, and so on, that things start getting hairy. It’s not that side effects are inherently bad — it’s the fact that they are hidden from view that makes them impossible to manage at larger scales. Side effects don’t scale, and imperative programming is all about side effects.

Changes in hardware and the growing complexity of software are forcing us to rethink the foundations of programming. Just like the builders of Europe’s great gothic cathedrals we’ve been honing our craft to the limits of material and structure. There is an unfinished gothic cathedral in Beauvais, France, that stands witness to this deeply human struggle with limitations. It was intended to beat all previous records of height and lightness, but it suffered a series of collapses. Ad hoc measures like iron rods and wooden supports keep it from disintegrating, but obviously a lot of things went wrong. From a modern perspective, it’s a miracle that so many gothic structures had been successfully completed without the help of modern material science, computer modelling, finite element analysis, and general math and physics. I hope future generations will be as admiring of the programming skills we’ve been displaying in building complex operating systems, web servers, and the internet infrastructure. And, frankly, they should, because we’ve done all this based on very flimsy theoretical foundations. We have to fix those foundations if we want to move forward.

Ad hoc measures preventing the Beauvais cathedral from collapsing

Ad hoc measures preventing the Beauvais cathedral from collapsing

Next: Category: The Essence of Composition.


We have this friendly competition going on between Eric Niebler and myself. He writes some clever C++ template code, and I feel the compulsion to explain it to him in functional terms. Then I write a blog about Haskell or category theory and Eric feels a compulsion to translate it into C++.

Eric is now working on his proposal to rewrite the C++ STL in terms of ranges and I keep reinterpreting his work in terms familiar to functional programmers. Eric’s range comprehensions are a result of some of this back and forth.

Lazy ranges are such an excellent example of functional programming that it would be foolish for me to pass this opportunity to dissect them. To any functional programmer worth their salt they just scream “monad!” A monad is a higher order pattern that can be built step by step, so that’s what I’m going to do. I’ll start with the functor pattern, then add some functionality that will make it a pointed functor, then add some more to make it an applicative functor, and finally add some more to make it a monad. This gradual buildup of functionality is reminiscent of building a class hierarchy, and indeed it can be modelled as such in Haskell (although Haskell type classes are slightly different than C++ classes). This hierarchy would look something like this:

  • A monad is-an applicative functor
  • An applicative functor is-a pointed functor
  • A pointed functor is-a functor

So let’s start with a functor.

Functor

I have a pet peeve about the use of the word “functor” in C++. People keep calling function objects functors. It’s like calling Luciano Pavarotti an “operator,” because he sings operas. The word functor has a very precise meaning in mathematics — moreover, it’s the branch of mathematics that’s extremely relevant to programming. So hijacking this term to mean a function-like object causes unnecessary confusion.

A functor in functional programming is a generic template, which allows the “lifting” of functions. Let me explain what it means. A generic template takes an arbitrary type as a template argument. So a range (whether lazy or eager) is a generic template because it can be instantiated for any type. You can have a range of integers, a range of vectors, a range of ranges, and so on. (We’ll come back to ranges of ranges later when we talk about monads.)

The “lifting” of functions means this: Give me any function from some type T to some other type U and I can apply this function to a range of T and produce a range of U. You may recognize this kind of lifting in the STL algorithm std::transform, which can be used to apply a function to a container. STL containers are indeed functors. Unfortunately, their functorial nature is buried under the noise of iterators. In Eric’s range library, the lifting is done much more cleanly using view::transform. Have a look at this example:

 int total = accumulate(view::iota(1) |
                        view::transform([](int x){return x*x;}) |
                        view::take(10), 0);

Here, view::transform takes an anonymous function that squares its argument, and lifts this function to the level of ranges. The range created by view::iota(1) is piped into it from the left, and the resulting rage of squares emerges from it on the right. The (infinite) range is then truncated by take‘ing the first 10 elements.

The function view::iota(1) is a factory that produces an infinite range of consecutive integers starting from 1. (We’ll come back to range factories later.)

In this form, view::transform plays the role of a higher-order function: one that takes a function and returns a function. It almost reaches the level of terseness and elegance of Haskell, where this example would look something like this:

total = sum $ take 10 $ fmap (\x->x*x) [1..]

(Traditionally, the flow of data in Haskell is from right to left.) The (higher-order) function fmap can be thought of as a “method” of the class Functor that does the lifting in Haskell. In C++ there is no overall functor abstraction, so each functor names its lifting function differently — for ranges, it’s view::transform.

The intuition behind a functor is that it generates a family of objects that somehow encapsulate values of arbitrary types. This encapsulation can be very concrete or very abstract. For instance, a container simply contains values of a given type. A range provides access to values that may be stored in some other container. A lazy range generates values on demand. A future, which is also a functor (or will be, in C++17), describes a value that might not be currently available because it’s being evaluated in a separate thread.

All these objects have one thing in common: they provide means to manipulate the encapsulated values with functions. That’s the only requirement for a functor. It’s not required that a functor provide access to encapsulated values (which may not even exist), although most do. In fact there is a functor (really, a monad), in Haskell, that provides no way of accessing its values other than outputting them to external devices.

Pointed Functor

A pointed functor is a functor with one additional ability: it lets you lift individual values. Give me a value of any type and I will encapsulate it. In Haskell, the encapsulating function is called pure although, as we will see later, in the context of a monad it’s called return.

All containers are pointed, because you can always create a singleton container — one that contains only one value. Ranges are more interesting. You can obviously create a range from a singleton container. But you can also create a lazy range from a value using a (generic) function called view::single, which doesn’t have a backing container behind it.

There is, however, an alternative way of lifting a value to a range, and that is by repeating it indefinitely. The function that creates such infinite (lazy) ranges is called view::repeat. For instance, view::repeat(1) creates an infinite series of ones. You might be wondering what use could there be of such a repetitive range. Not much, unless you combine it with other ranges. In general, pointed functors are not very interesting other than as stepping stones towards applicative functors and monads. So let’s move on.

Applicative Functor

An applicative functor is a pointed functor with one additional ability: it lets you lift multi-argument functions. We already know how to lift a single-argument function using fmap (or transform, or whatever it’s called for a particular functor).

With multi-argument functions acting on ranges we have two different options corresponding to the two choices for pure I mentioned before: view::single and view::repeat.

The idea, taken from functional languages, is to consider what happens when you provide the first argument to a function of multiple arguments (it’s called partial application). You don’t get back a value. Instead you get something that expects one or more remaining arguments. A thing that expects arguments is called a function (or a function object), so you get back a function of one fewer arguments. In C++ you can’t just call a function with fewer arguments than expected, or you get a compilation error, but there is a (higher-order) function in the Standard Library called std::bind that implements partial application.

This kind of transformation from a function of multiple arguments to a function of one argument that returns a function is called currying.

Let’s consider a simple example. We want to apply std::make_pair to two ranges: view::ints(10, 11) and view::ints(1, 3). To this end, let’s replace std::make_pair with the corresponding curried function of one argument returning a function of one argument:

[](int i) { return [i](int j) { return std::make_pair(i, j); };}

First, we want to apply this function to the first range. We know how to apply a function to a range: we use view::transform.

auto partial_app = view::ints(10, 11) 
                 | view::transform([](int i) { 
                      return [i](int j) { return std::make_pair(i, j); }
                   });

What’s the result of this application? Can you guess? Our curried function will be applied to each integer in the range, returning a function that pairs that integer with its argument. So we end up with a range of functions of the form:

[i](int j) { return std::make_pair(i, j); }

So far so good — we have just used the functorial property of the range. But now we have to decide how to apply a range of functions to the second range of values. And that’s the essence of the definition of an applicative functor. In Haskell the operation of applying encapsulated functions to encapsulated arguments is denoted by an infix operator <*>.

With ranges, there are two basic strategies:

  1. We can enumerate all possible combinations — in other words create the cartesian product of the range of functions with the range of values — or
  2. Match corresponding functions to corresponding values — in other words, “zip” the two ranges.

The former, when applied to view::ints(1, 3), will yield:

{(10,1),(10,2),(10,3),(11,1),(11,2),(11,3)}

and the latter will yield:

{(10, 1),(11, 2)}

(when the ranges are not equal length, you stop zipping when the shorter one is exhausted).

To see that those two choices correspond to the two choices for pure, we have to look at some consistency conditions. One of them is that if you encapsulate a single-argument function in a range using pure and then apply it to a range of arguments, you should get the same result as if you simply fmapped this function directly over the range of arguments. For the record, I’ll write it here in Haskell:

pure f <*> xs == fmap f xs

This is sort of an obvious requirement: You have two different ways of applying a single-argument function to a range, they better give the same result.

Let’s try it with the view::single version of pure. When acting on a function, it will create a one-element range containing this function. The “all possible combinations” application will just apply this function to all elements of the argument range, which is exactly what view::transform would do.

Conversely, if we apply view::repeat to the function, we’ll get an infinite range that repeats this function at every position. We have to zip this range with the range of arguments in order to get the same result as view::transform. So this implementation of pure works with the zippy applicative. Notice that if the argument range is finite the result will also be finite. But this kind of application will also work with infinite ranges thanks to laziness.

So there are two legitimate implementations of the applicative functor for ranges. One uses view::single to lift values and uses the all possible combinations strategy to apply a range of functions to a range of arguments. The other uses view::repeat to lift values and the zipping application for ranges of functions and arguments. They are both acceptable and have their uses.

Now let’s go back to our original problem of applying a function of multiple arguments to a bunch of ranges. Since we are not doing it in Haskell, currying is not really a practical option.

As it turns out, the second version of applicative has been implemented by Eric as a (higher-order) function view::zip_with. This function takes a multi-argument callable object as its first argument, followed by a variadic list of ranges.

There is no corresponding implementation for the combinatorial applicative. I think the most natural interface would be an overload of view::transform (or maybe view::fmap) with the same signature as zip_with. Our example would then look like this:

view::transform(std::make_pair, view::ints(10, 11), view::ints(1, 3));

The need for this kind of interface is not as acute because, as we’ll learn next, the combinatorial applicative is supplanted by a more general monadic interface.

Monad

Monads are applicative functors with one additional functionality. There are two equivalent ways of describing this functionality. But let me first explain why this functionality is needed.

The range library comes with a bunch of range factories, such as view::iota, view::ints, or view::repeat. It’s also very likely that users of the library will want to create their own range factories. The problem is: How do you compose existing range factories to obtain new range factories?

Let me give you an example that generated a blog post exchange between me and Eric. The challenge was to generate a list of Pythagorean triples. The idea is to take a cross product of three infinite ranges of integers and select those triples that satisfy the equation x2 + y2 = z2. The cross product of ranges is reminiscent of the “all possible combinations” applicative, and indeed that’s the applicative that can be extended to a monad (the zippy one can’t).

To make this algorithm feasible, we have to organize these ranges so we can (lazily) traverse them. Let’s start with a factory that produces all integers from 1 to infinity. That’s the view::ints(1) factory. Then, for each z produced by that factory, let’s create another factory view::ints(1, z). This range will provide our xs — and it makes no sense to try xs that are bigger than zs. These values, in turn, will be used in the creation of the third factory, view::ints(x, z) that will generate our ys. At the end we’ll filter out the triples that don’t satisfy the Pythagorean equation.

Notice how we are feeding the output of one range factory into another range factory. Do you see the problem? We can’t just iterate over an infinite range. We need a way to glue the output side of one range factory to the input side of another range factory without unpacking the range. And that’s what monads are for.

Remember also that there are functors that provide no way of extracting values, or for which extraction is expensive or blocking (as is the case with futures). Being able to compose those kinds of functor factories is often essential, and again, the monad is the answer.

Now let’s pinpoint the type of functionality that would allow us to glue range factories end-to-end. Since ranges are functorial, we can use view::transform to apply a factory to a range. After all a factory is just a function. The only problem is that the result of such application is a range of ranges. So, really, all that’s needed is a lazy way of flattening nested ranges. And that’s exactly what Eric’s view::flatten does.

With this new flattening power at our disposal, here’s a possible beginning of the solution to the Pythagorean triple problem:

view::ints(1) | view::transform([](int z) { 
                view::ints(1, z) | ... } | view::flatten

However, this combination of view::transform and view::flatten is so useful that it deserves its own function. In Haskell, this function is called “bind” and is written as an infix operator >>=. (And, while we’re at it, flatten is called join.)

And guess what the combination of view::transform and view::flatten is called in the range library. This discovery struck me as I was looking at one of Eric’s examples. It’s called view::for_each. Here’s the solution to the Pythagorean triple problem using view::for_each to bind range factories:

auto triples =
  for_each(ints(1), [](int z) {
    return for_each(ints(1, z), [=](int x) {
      return for_each(ints(x, z), [=](int y) {
        return yield_if(x*x + y*y == z*z, std::make_tuple(x, y, z));
      });
    });
  });

And here’s the same code in Haskell:

triples = 
  (>>=) [1..] $ \z -> 
     (>>=) [1..z] $ \x -> 
        (>>=) [x..z] $ \y -> 
           guard (x^2 + y^2 == z^2) >> return (x, y, z)

I have purposefully re-formatted Haskell code to match C++ (A more realistic rendition of it is in my post Getting Lazy with C++). Bind operators >>= are normally used in infix position but here I wanted to match them against for_each. Haskell’s return is the same as view::single, which Eric renamed to yield inside for_each. In this particular case, yield is conditional, which in Haskell is expressed using guard. The syntax for lambdas is also different, but otherwise the code matches almost perfectly.

This is an amazing and somewhat unexpected convergence. In our tweeter exchange, Eric sheepishly called his for_each code imperative. We are used to thinking of for_each as synonymous with looping, which is such an iconic imperative construct. But here, for_each is the monadic bind — the epitome of functional programming. This puppy is purely functional. It’s an expression that returns a value (a range) and has no side effects.

But what about those loops that do have side effects and don’t return a value? In Haskell, side effects are always encapsulated using monads. The equivalent of a for_each loop with side effects would return a monadic object. What we consider side effects would be encapsulated in that object. It’s not the loop that performs side effects, its that object. It’s an executable object. In the simplest case, this object contains a function that may be called with the state that is to be modified. For side effects that involve the external world, there is a special monad called the IO monad. You can produce IO objects, you can compose them using monadic bind, but you can’t execute them. Instead you return one such object that combines all the IO of your program from main and let the runtime execute it. (At least that’s the theory.)

Is this in any way relevant to an imperative programmer? After all, in C++ you can perform side effects anywhere in your code. The problem is that there are some parts of your code where side effects can kill you. In concurrent programs uncontrolled side effects lead to data races. In Software Transactional Memory (STM, which at some point may become part of C++) side effects may be re-run multiple times when a transaction is retried. There is an urgent need to control side effects and to separate pure functions from their impure brethren. Encapsulating side effects inside monads could be the ticket to extend the usefulness of pure functions inside an imperative language like C++.

To summarize: A monad is an applicative functor with an additional ability, which can be expressed either as a way of flattening a doubly encapsulated object, or as a way of applying a functor factory to an encapsulated object.

In the range library, the first method is implemented through view::flatten, and the second through view::for_each. Being an applicative functor means that a range can be manipulated using view::transform and that any value may be encapsulated using view::single or, inside for_each, using yield.

The ability to apply a range of functions to a range of arguments that is characteristic of an applicative functor falls out of the monadic functionality. For instance, the example from the previous section can be rewritten as:

for_each(ints(10, 11), [](int i) {
  return for_each(ints(1, 3), [i](int j) {
    return yield(std::make_pair(i, j));
  });
});

The Mess We’re In

I don’t think the ideas I presented here are particularly difficult. What might be confusing though is the many names that are used to describe the same thing. There is a tendency in imperative (and some functional) languages to come up with cute names for identical patterns specialized to different applications. It is also believed that programmers would be scared by terms taken from mathematics. Personally, I think that’s silly. A monad by any other name would smell as sweet, but we wouldn’t be able to communicate about them as easily. Here’s a sampling of various names used in relation to concepts I talked about:

  1. Functor: fmap, transform, Select (LINQ)
  2. Pointed functor: pure, return, single, repeat, make_ready_future, yield, await
  3. Applicative functor: <*>, zip_with
  4. Monad: >>=, bind, mbind, for_each, next, then, SelectMany (LINQ)

Part of the problem is the lack of expressive power in C++ to unite such diverse phenomena as ranges and futures. Unfortunately, the absence of unifying ideas adds to the already overwhelming complexity of the language and its libraries. The functional paradigm could be a force capable of building connections between seemingly distant application areas.

Acknowledments

I’m grateful to Eric Niebler for reviewing the draft of this blog and correcting several mistakes. The remaining mistakes are all mine.


I’m not fond of arguments based on lack of imagination. “There’s no way this code may fail!” might be a sign of great confidence or the result of ignorance. The inability to come up with a counterexample doesn’t prove a theorem. And yet there is one area of programming where such arguments work, and are quite useful. These are parametricity arguments: free theorems about polymorphic functions. Fortunately, there is solid theory behind parametricity. Free theorems are not based on ignorance. So I decided to read the relevant papers (see bibliography at the end of this post) and write a blog about it. How hard could it be? A few months and several failed attempts later I realized how naive I was. But I think I finally understand the basics enough to explain them in relatively simple terms.

Motivation

Here’s a classic example — a function that takes a list of arbitrary type a and returns a list of the same type:

r :: [a] -> [a]

What can this function do? Since it has to work with any type of list element, it can’t do anything type-specific. It can’t modify the elements or invent new ones. So all it can do is rearrange them, duplicate, or remove. Can you think of anything else?

The questions it a little tricky because it all depends on the kind of polymorphism your language supports. In Haskell, where we have parametric polymorphism, the above statement is for the most part true (modulo termination worries). In C++, which supports ad-hoc polymorphism, a generic function like:

template<class T> 
list<T> r(list<T>);

can do all kinds of weird things.

Parametric polymorphism means that a function will act on all types uniformly, so the above declaration of r indeed drastically narrows down the possibilities.

For instance, consider what happens when you map any function of the type:

f :: a -> b

over a list of a. You can either apply map before or after acting on it with r. It shouldn’t matter whether you first modify the elements of the list and then rearrange them, or first rearrange and then modify them. The result should be the same:

r (map f as) = map f (r as)

But is it true just because we can’t imagine how it may fail, or can we make a formal argument to prove it?

Let’s Argue (Denotational) Semantics

One way to understand polymorphism is to have a good model for types. At first approximation types can be modeled as sets of values (strictly speaking, as shown by Reynolds, the set-theoretical model fails in the case of polymorphic lambda calculus, but there are ways around it).

The type Bool is a two-element set of True and False, Integer is a set of integers, and so on. Composite types can also be defined set-theoretically. For instance, a pair type is a cartesian product of two sets. A list of a is a set of lists with elements from the set a. A function type a->b is a set of functions between two sets.

For parametric polymorphism you need to first be able to define functions on types: functions that take a type and produce a new type. In other words, you should be able to define a family of types that is parametrized by another type. In Haskell, we call such things type constructors.

For instance, given some type a, produce a type of pairs: (a, a). This can be formally written (not in Haskell) as:

Λa . (a, a)

Notice the capital lambda for defining functions on types (sets), as opposed to the lowercase lambda used for functions on values (set elements).

To turn a family of types into a family of values — a polymorphic value — you put the universal quantifier forall in front of it. Don’t read too much into the quantifier aspect of it — it makes sense in the Curry-Howard isomorphism, but here it’s just a piece of syntax. It means that you use the type constructor to pick a type, and then you pick a specific value of that type.

You may recall the Axiom of Choice (AoC) from set theory. This axiom says that if you have a set of sets then there always exists a set of samples created by picking one element from each set. It’s like going to a chocolate store and ordering one of each. It’s a controversial axiom, and mathematicians are very careful in either using or avoiding it. The controversy is that, for infinite sets of sets, there may be no constructive way of picking elements. And in computer science we are not so much interested in proofs of existence, as in actual algorithms that produce tangible results.

Here’s an example:

forall a . (a, a)

This is a valid type signature, but you’d be hard pressed to implement it. You’d have to provide a pair of concrete values for every possible type. You can’t do it uniformly across all types. (Especially that some types are uninhabited, as Gershom Bazerman kindly pointed out to me.)

Interestingly enough, you can sometimes define polymorphic values if you constrain polymorphism to certain typeclasses. For instance, when you define a numeric constant in Haskell:

x = 1

its type is polymorphic:

x :: forall a. Num a => a

(using the language extension ExplicitForAll). Here x represents a whole family of values, including:

1.0 :: Float
1 :: Int
1 :: Integer

with potentially different representations.

But there are some types of values that can be specified wholesale. These are function values. Functions are first class values in Haskell (although you can’t compare them for equality). And with one formula you can define a whole family of functions. The following signature, for instance, is perfectly implementable:

forall a . a -> a

Let’s analyze it. It consists of a type function, or a type constructor:

Λa . a -> a

which, for any type a, returns a function type a->a. When universally quantified with forall, it becomes a family of concrete functions, one per each type. This is possible because all these functions can be defined with one closed term (see Appendix 2). Here’s this term:

\x -> x

In this case we actually have a constructive way of picking one element — a function — for each type a. For instance, if a is a String, we pick a function that takes any String and returns the same string. It’s a particular String->String function, one of many possible String->String functions. And it’s different from the Int->Int function that takes an Int and returns the same Int. But all these identity functions are encoded using the same lambda expression. It’s that generic formula that allows us to chose a representative function from each set of functions a->a: one from the set String->String, one from the set Int->Int, etc.

In Haskell, we usually omit the forall quantifier when there’s no danger of confusion. Any signature that contains a type variable is automatically universally quantified over it. (You’ll have to use explicit forall, however, with higher-order polymorphism, where a polymorphic function can be passed as an argument to another function.)

So what’s the set-theoretic model for polymorphism? You simply replace types with sets. A function on types becomes a function on sets. Notice that this is not the same as a function between sets. The latter assigns elements of one set to elements of another. The former assigns sets to sets — you could call it a set constructor. As in: Take any set a and return a cartesian product of this set with itself.

Or take any set a and return the set of functions from this set to itself. We have just seen that for this one we can easily build a polymorphic function — one which for every type a produces an actual function whose type is (a->a). Now, with ad-hoc polymorphism it’s okay to code the String function separately from the Int function; but in parametric polymorphism, you’ll have to use the same code for all types.

This uniformity — one formula for all types — dramatically restricts the set of polymorphic functions, and is the source of free theorems.

Any language that provides some kind of pattern-matching on types (e.g., template specialization in C++) automatically introduces ad-hoc polymorphism. Ad-hoc polymorphism is also possible in Haskell through the use of type classes and type families.

Preservation of Relations

Let’s go to our original example and rewrite it using the explicit universal quantifier:

r :: forall a. [a] -> [a]

It defines a family of functions parametrized by the type a. When used in Haskell code, a particular member of this family will be picked automatically by the type inference system, depending on the context. In what follows, I’ll use explicit subscripting for the same purpose. The free theorem I mentioned before can be rewritten as:

rb (map f as) = map f (ra as)

with the function:

f :: a -> b

serving as a bridge between the types a and b. Specifically, f relates values of type a to values of type b. This relation happens to be functional, which means that there is only one value of type b corresponding to any given value of type a.

But the correspondence between elements of two lists may, in principle, be more general. What’s more general than a function? A relation. A relation between two sets a and b is defined as a set of pairs — a subset of the cartesian product of a and b. A function is a special case of a relation, one that can be represented as a set of pairs of the form (x, f x), or in relational notation x <=> f x. This relation is often called the graph of the function, since it can be interpreted as coordinates of points on a 2-d plane that form the plot the function.

The key insight of Reynolds was that you can abstract the shape of a data structure by defining relations between values. For instance, how do we know that two pairs have the same shape — even if one is a pair of integers, say (1, 7), and the other a pair of colors, say (Red, Blue)? Because we can relate 1 to Red and 7 to Blue. This relation may be called: “occupying the same position”.

Notice that the relation doesn’t have to be functional. The pair (2, 2) can be related to the pair (Black, White) using the non-functional relation:

(2 <=> Black),
(2 <=> White)

This is not a function because 2 is not mapped to a single value.

Conversely, given any relation between integers and colors, you can easily test which integer pairs are related to which color pairs. For the above relation, for instance, these are all the pairs that are related:

((2, 2) <=> (Black, Black)),
((2, 2) <=> (Black, White)),
((2, 2) <=> (White, Black)),
((2, 2) <=> (White, White))

Thus a relation between values induces a relation between pairs.

This idea is easily extended to lists. Two lists are related if their corresponding elements are related: the first element of one list must be related to the first element of the second list, etc.; and empty lists are always related.

In particular, if the relationship between elements is established by a function f, it’s easy to convince yourself that the lists as and bs are related if

bs = map f as

With this in mind, our free theorem can be rewritten as:

rb bs = map f (ra as)

In other words, it tells us that the two lists

rb bs

and

ra as

are related through f.

ListFunRelation

Fig 1. Polymorphic function r rearranges lists but preserves relations between elements

So r transforms related lists into related lists. It may change the shape of the list, but it never touches the values in it. When it acts on two related lists, it rearranges them in exactly the same way, without breaking any of the relations between corresponding elements.

Reading Types as Relations

The above examples showed that we can define relations between values of composite types in terms of relations between values of simpler types. We’ve seen this with the pair constructor and with the list constructor. Continuing this trend, we can state that two functions:

f :: a -> b

and

g :: a' -> b'

are related iff, for related x and y, f x is related to g y. In other words, related functions map related arguments to related values.

Notice what we are doing here: We are consistently replacing types with relations in type constructors. This way we can read complex types as relations. The type constructor -> acts on two types, a and b. We extend it to act on relations: The “relation constructor” -> in A->B takes two relations A (between a and a') and B (between b and b') and produces a relation between functions f and g.

But what about primitive types? Let’s consider an example. Two functions from lists to integers that simply calculate the lengths of the lists:

lenStr  :: [Char] -> Int
lenBool :: [Bool] -> Int

What happens when we call them with two related lists? The first requirement for lists to be related is that they are of equal length. So when called with related lists the two functions will return the same integer value . It makes sense for us to consider these two functions related because they don’t inspect the values stored in the lists — just their shapes. (They also look like components of the same parametrically polymorphic function, length.)

It therefore makes sense to read a primitive type, such as Int, as an identity relation: two values are related if they are equal. This way our two functions, lenStr and lenBool are indeed related, because they turn related lists to related (equal) results.

Notice that for non-polymorphic functions the relationship that follows from their type is pretty restrictive. For instance, two functions Int->Int are related if and only if their outputs are equal for equal inputs. In other words, the functions must be (extensionally) equal.

All these relations are pretty trivial until we get to polymorphic functions. The type of a polymorphic function is specified by universally quantifying a function on types (a type constructor).

f :: forall a. φa

The type constructor φ maps types to types. In our set-theoretical model it maps sets to sets, but we want to read it in terms of relations.

Functions on relations

A general relation is a triple: We have to specify three sets, a, a', and a set of pairs — a subset of the cartesian product a × a'. It’s not at all obvious how to define functions that map relations to relations. What Reynolds chose is a definition that naturally factorizes into three mappings of sets, or to use the language of programming, three type constructors.

First of all, a function on relations Φ (or a “relation constructor”) is defined by two type constructors, φ and ψ. When Φ acts on a relation A between sets a and a', it first maps those sets, so that b=φa and b'=ψa'. ΦA then establishes a relation between the sets b and b'. In other words, ΦA is a subset of b × b'.

RelationMap

Fig 2. Φ maps relations to relations. The squarish sets represent cartesian products (think of a square as a cartesian product of two segments). Relations A and ΦA are subsets of these products.

Relations between polymorphic functions

Given that Φ maps relations to relations, a universally quantified version of it:

forall A. ΦA

maps pairs of sets to pairs of values.

Now suppose that you have two polymorphic functions g and g':

g  :: forall a . φa
g' :: forall a'. ψa'

They both map types (sets) to values.

  • We can instantiate g at some type a, and it will return a value ga of the type b=φa.
  • We can instantiate g' at some type a', and it will return a value g'a' of the type b'=ψa'.

We can do this for any relation A between two arbitrary sets a and a'.

We will say that g and g' are related through the relation induced by the type (forall A. ΦA) iff the results ga and g'a' are related by ΦA.

PolyFunRel

Fig 3. Relation between two polymorphic functions. The pair (g a, g' a') falls inside the relation ΦA.

In other words, polymorphic functions are related if they map related types to related values. Notice that in the interesting examples these values are themselves functions.

With these definitions, we can now reinterpret any type signature as a relation between values.

The Parametricity Theorem

Reynolds’ second key insight was that any term is in a relation with itself — the relation being induced by the term’s type. We have indeed defined the mapping of types to relations to make this work. Primitive types turn into identity relations, so obviously a primitive value is in relation with itself. A function between primitive types is in relation with itself because it maps related (equal) arguments into related (equal) results. A list or a pair of primitive types is in relation with itself because each element of it is equal to itself. You can recurse and consider a list of functions, or a pair of lists, etc., building the proof inductively, proceeding from simpler types to more and more complex types. The proof goes over all possible term construction rules and typing rules in a given theory.

Formally, this kind of proof is called “structural induction,” because you’re showing that more complex structures will satisfy the theorem as long as the simpler ones, from which they are constructed, do. The only tricky part is dealing with polymorphic functions, because they are quantified over all types (including polymorphic types). In fact, this is the reason why the naive interpretation of types as sets breaks down (see, however, Pitts’ paper). It is possible, however, to prove the parametricity theorem in a more general setting, for instance, using frames, or in the framework of operational semantics, so we won’t worry about it here.

Wadler’s key insight was to interpret Reynolds’ theorem not only as a way of identifying different implementations of the same type — for instance, cartesian and polar representations of complex numbers — but also as a source of free theorems for polymorphic types.

Let’s try applying parametricity theorem to some simple examples. Take a constant term: an integer like 5. Its type Int can be interpreted as a relation, which we defined to be the identity relation (it’s one of the primitive types). And indeed, 5 is in this relation with 5.

Take a function like:

ord :: Char -> Int

Its type defines a relation between functions: Two functions of the type Char->Int are related if they return equal integers for equal characters. Obviously, ord is in this relation with itself.

Parametricity in Action

Those were trivial examples. The interesting ones involve polymorphic functions. So let’s go back to our starting example. The term now is the polymorphic function r whose type is:

r :: forall a . [a] -> [a]

Parametricity tells us that r is in relation with itself. However, comparing a polymorphic function to itself involves comparing the instantiations of the same function at two arbitrary types, say a and a'. Let’s go through this example step by step.

We are free to pick an arbitrary relation A between elements of two arbitrary input sets a and a'. The type of r induces a mapping Φ on relations. As with every function on relations, we have to first identify the two type constructors φ and ψ, one mapping a and one mapping a'. In our case they are identical, because they are induced by the same polymorphic function. They are equal to:

Λ a. [a]->[a]

It’s a type constructor that maps an arbitrary type a to the function type [a]->[a].

The universal quantifier forall means that r lets us pick a particular value of the type [a]->[a] for each a. This value is a function that we call ra. We don’t care how this function is picked by r, as long as it’s picked uniformly, using a single formula for all a, so that our parametricity theorem holds.

FreeTheorem

Fig 4. Polymorphic function r maps related types to related values, which themselves are functions on lists

Parametricity means that, if a is related to a', then:

ra <=> ra'

This particular relation is induced by the function type [a]->[a]. By our definition, two functions are related if they map related arguments to related results. In this case both the arguments and the results are lists. So if we have two related lists, as and as':

as  :: [a]
as' :: [a']

they must, by parametricity, be mapped to two related lists, bs and bs':

bs  = ra  as
bs' = ra' as'

This must be true for any relation A, so let’s pick a functional relation generated by some function:

f :: a -> a'

This relation induces a relation on lists:

as' = map f as

The results of applying r, therefore, must be related through the same relation:

bs' = map f bs

Combining all these equalities, we get our expected result:

ra' (map f as) = map f (ra as)

Parametricity and Natural Transformations

The free theorem I used as the running example is interesting for another reason: The list constructor is a functor. You may think of functors as generalized containers for storing arbitrary types of values. You can imagine that they have shapes; and for two containers of the same shape you may establish a correspondence between “positions” at which the elements are stored. This is quite easy for traditional containers like lists or trees, and with a leap of faith it can be stretched to non-traditional “containers” like functions. We used the intuition of relations corresponding to the idea of “occupying the same position” within a data structure. This notion can be readily generalized to any polymorphic containers. Two trees, for instance, are related if they are both empty, or if they have the same shape and their corresponding elements are related.

Let’s try another functor: You can also think of Maybe as having two shapes: Nothing and Just. Two Nothings are always related, and two Justs are related if their contents are related.

This observation immediately gives us a free theorem about polymorphic functions of the type:

r :: forall a. [a] -> Maybe a

an example of which is safeHead. The theorem is:

fmap h . safeHead == safeHead . fmap h

Notice that the fmap on the left is defined by the Maybe functor, whereas the one on the right is the list one.

If you accept the premise that an appropriate relation can be defined for any functor, then you can derive a free theorem for all polymorphic functions of the type:

r :: forall a. f a -> g a

where f and g are functors. This type of function is known as a natural transformation between the two functors, and the free theorem:

fmap h . r == r . fmap h

is the naturality condition. That’s how naturality follows from parametricity.

Acknowledgments

I’d like to thank all the people I talked to about parametricity at the ICFP in Gothenburg, and Edward Kmett for reading and commenting on the draft of this blog.

Appendix 1: Other Examples

Here’s a list of other free theorems from Wadler’s paper. You might try proving them using parametricity.

r :: [a] -> a -- for instance, head
f . r == r . fmap f
r :: [a] -> [a] -> [a] -- for instance, (++)
fmap f (r as bs) == r (fmap f as) (fmap f bs)
r :: [[a]] -> [a] -- for instance, concat
fmap f . r == r . fmap (fmap f)
r :: (a, b) -> a -- for instance, fst
f . r == r . mapPair (f, g)
r :: (a, b) -> b -- for instance, snd
g . r == r . mapPair (f, g)
r :: ([a], [b]) -> [(a, b)] -- for instance, uncurry zip
fmap (mapPair (f, g)) . r == r . mapPair (fmap f, fmap g)
r :: (a -> Bool) -> [a] -> [a] -- for instance, filter
fmap f . r (p . f) = r p . fmap f
r :: (a -> a -> Ordering) -> [a] -> [a] -- for instance, sortBy
 -- assuming: f is monotone (preserves order)
fmap f . r cmp == r cmp' . fmap f
r :: (a -> b -> b) -> b -> [a] -> b -- for instance, foldl
-- assuming: g (acc x y) == acc (f x) (g y)
g . foldl acc zero == foldl acc (g zero) . fmap f
r :: a -> a -- id
f . r == r . f
r :: a -> b -> a -- for instance, the K combinator
f (r x y) == r (f x) (g y)

where:

mapPair :: (a -> c, b -> d) -> (a, b) -> (c, d)
mapPair (f, g) (x, y) = (f x, g y)

Appendix 2: Identity Function

Let’s prove that there is only one polymorphic function of the type:

r :: forall a. a -> a

and it’s the identity function:

id x = x

We start by picking a particular relation. It’s a relation between the unit type () and an arbitrary (inhabited) type a. The relation consists of just one pair ((), c), where () is the unit value and c is an element of a. By parametricity, the function

r() :: () -> ()

must be related to the function

ra :: a -> a

There is only one function of the type ()->() and it’s id(). Related functions must map related argument to related values. We know that r() maps unit value () to unit value (). Therefore ra must map c to c. Since c is arbitrary, ra must be an identity for all (inhabited) as.

Bibliography

  1. John C Reynolds, Types, Abstraction and Parametric Polymorphism
  2. Philip Wadler, Theorems for Free!
  3. Claudio Hermida, Uday S. Reddy, Edmund P. Robinson, Logical Relations and Parametricity – A Reynolds Programme for Category Theory and Programming Languages
  4. Derek Dreyer, Paremetricity and Relational Reasoning, Oregon Programming Languages Summer School
  5. Janis Voigtländer, Free Theorems Involving Type Constructor Classes

« Previous PageNext Page »