This is part 4 of the miniseries about solving a simple constraint-satisfaction problem:

  s e n d
+ m o r e
m o n e y

using monads in C++. Previously: The Tale of Two Monads. To start from the beginning, go to Using Monads in C++ to Solve Constraints: 1.

In the previous post I showed some very general programming techniques based on functional data structures and monads. A lot of the code I wrote to solve this particular arithmetic puzzle is easily reusable. In fact the monadic approach is perfect for constructing libraries. I talked about it when discussing C++ futures and ranges. A monad is a pure expression of composability, and composability is the most important feature of any library.

You would be justified to think that rolling out a monad library just to solve a simple arithmetic puzzle is overkill. But I’m sure you can imagine a lot of other, more complex problems that could be solved using the same techniques. You might also fear that such functional methods — especially when implemented in C++, which is not optimized for this kind of programming — would lead to less performant code. This doesn’t matter too much if you are solving a one-shot problem, and the time you spend developing and debugging your code dwarfs the time it takes the program to complete execution. But even if performance were an issue and you were faced with a larger problem, functional code could be parallelized much easier than its imperative counterpart with no danger of concurrency bugs. So you might actually get better performance from functional code by running it in parallel.


In this installment I’d like to talk about something that a lot of functional programmers swear by: Functional programs are amazingly easy to refactor. Anybody who has tried to refactor C++ code can testify to how hard it is, and how long it takes to iron out all the bugs introduced by refactoring. With functional code, it’s a breeze. So let’s have another look at the code from the previous post and do some deep surgery on it. This is our starting point:

StateList<tuple> solve()
    StateList sel = &select;

    return mbind(sel, [=](int s) {
    return mbind(sel, [=](int e) {
    return mbind(sel, [=](int n) {
    return mbind(sel, [=](int d) {
    return mbind(sel, [=](int m) {
    return mbind(sel, [=](int o) {
    return mbind(sel, [=](int r) {
    return mbind(sel, [=](int y) {
        return mthen(guard(s != 0 && m != 0), [=]() {
            int send  = asNumber(vector{s, e, n, d});
            int more  = asNumber(vector{m, o, r, e});
            int money = asNumber(vector{m, o, n, e, y});
            return mthen(guard(send + more == money), [=]() {
                return mreturn(make_tuple(send, more, money));

What immediately stands out is the amount of repetition: eight lines of code look almost exactly the same. Conceptually, these lines correspond to eight nested loops that would be used in the imperative solution. The question is, how can we abstract over control structures, such as loops? But in our monadic implementation the loops are replaced with higher-order functions, and that opens up a path toward even further abstraction.

Reifying Substitutions

The first observation is that we are missing a data structure that should correspond to the central concept we use when describing the solution — the substitution. We are substituting numbers for letters, but we only see those letters as names of variables. A more natural implementation would use a map data structure, mapping characters to integers.

Notice, however, that this mapping has to be dynamically collected and torn down as we are exploring successive branches of the solution space. The imperative approach would demand backtracking. The functional approach makes use of persistent data structures. I have described such data structures previously, in particular a persistent red-black tree. It can be easily extended to a red-black map. You can find the implementation on github.

A red-black map is a generic data structure, which we’ll specialize for our use:

using Map = RBMap<char, int>;

A map lookup may fail, so it should either be implemented to return an optional, or use a default value in case of a failure. In our case we know that we never look up a value that’s not in the map (unless our code is buggy), so we can use a helper function that never fails:

int get(Map const & subst, char c)
    return subst.findWithDefault(-1, c);

Once we have objectified the substitution as a map, we can convert a whole string of characters to a number in one operation:

int toNumber(Map const & subst, string const & str)
    int acc = 0;
    for (char c : str)
        int d = get(subst, c);
        acc = 10 * acc + d;
    return acc;

There is one more thing that we may automate: finding the set of unique characters in the puzzle. Previously, we just figured out that they were s, e, n, d, m, o, r, y and hard-coded them into the solution. Now we can use a function, fashioned after a Haskell utility called nub:

string nub(string const & str)
    string result;
    for (char c : str)
        if (result.find(c) == string::npos)
    return result;

Don’t worry about the quadratic running time of nub — it’s only called once.


With those preliminaries out of the way, let’s concentrate on analyzing the code. We have a series of nested mbind calls. The simplest thing is to turn these nested calls into recursion. This involves setting up the first recursive call, implementing the recursive function, and writing the code to terminate the recursion.

The main observation is that each level of nesting accomplishes one substitution of a character by a number. The recursion should end when we run out of characters to substitute.

Let’s first think about what data has to be passed down in a recursive call. One item is the string of characters that still need substituting. The second is the substitution map. While the first argument keeps shrinking, the second one keeps growing. The third argument is the number to be used for the next substitution.

Before we make the first recursive call, we have to prepare the initial arguments. We get the string of characters to be substituted by running nub over the text of our puzzle:


The second argument should start as an empty map. The third argument, the digit to be used for the next substitution, comes from binding the selection plan select<int> to a lambda that will call our recursive function. In Haskell, it’s common to call the auxiliary recursive function go, so that’s what we’ll do here as well:

StateList<tuple<int, int, int>> solve()
    StateList<int> sel = &select<int>;
    Map subst;
    return mbind(sel, [=](int s) {
        return go(nub("sendmoremoney"), subst, s);

When implementing go, we have to think about terminating the recursion. But first, let’s talk about how to continue it.

We are called with the next digit to be used for substitution, so we should perform this substitution. This means inserting a key/value pair into our substitution map subst. The key is the first character from the string str. The value is the digit i that we were given.

subst.inserted(str[0], i)

Notice that, because the map is immutable, the method inserted returns a new map with one more entry. This is a persistent data structure, so the new map shares most of its data with its previous version. The creation of the new version takes logarithmic time in the number of entries (just as it does with a regular balanced tree that’s used in the Standard Library std::map). The advantage of using a persistent map is that we don’t have to do any backtracking — the unmodified version is still available after the return from the recursive call.

The recursive call to go expects three arguments: (1) the string of characters yet to be substituted — that will be the tail of the string that we were passed, (2) the new substitution map, and (3) the new digit n. We get this new digit by binding the digit selection sel to a lambda that makes the recursive call:

mbind(sel, [=](int n) {
    string tail(&str[1], &str[str.length()]);
    return go(tail, subst.inserted(str[0], i), n);

Now let’s consider the termination condition. Since go was called with the next digit to be substituted, we need at least one character in the string for this last substitution. So the termination is reached when the length of the string reaches one. We have to make the last substitution and then call the final function that prunes the bad substitutions. Here’s the complete implementation of go:

StateList<tuple<int, int, int>> go(string str, Map subst, int i)
    StateList<int> sel = &select<int>;

    assert(str.length() > 0);
    if (str.length() == 1)
        return prune(subst.inserted(str[0], i));
        return mbind(sel, [=](int n) {
            string tail(&str[1], &str[str.length()]);
            return go(tail, subst.inserted(str[0], i), n);

The function prune is lifted almost literally from the original implementation. The only difference is that we are now using a substitution map.

StateList<tuple<int, int, int>> prune(Map subst)
    return mthen(guard(get(subst, 's') != 0 && get(subst, 'm') != 0), 
      [=]() {
        int send  = toNumber(subst, "send");
        int more  = toNumber(subst, "more");
        int money = toNumber(subst, "money");
        return mthen(guard(send + more == money), [=]() {
            return mreturn(make_tuple(send, more, money));

The top-level code that starts the chain of events and displays the solution is left unchanged:

int main()
    List<int> lst{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    cout << evalStateList(solve(), lst);
    return 0;

It’s a matter of opinion whether the refactored code is simpler or not. It’s definitely easier to generalize and it’s less error prone. But the main point is how easy it was to make the refactorization. The reason for that is that, in functional code there are no hidden dependencies, because there are no hidden side effects. What would usually be considered a side effect in imperative code is accomplished using pure functions and monadic binding. There is no way to forget that a given function works with state, because a function that deals with state has the state specified in its signature — it returns a StateList. And it doesn’t itself modify the state. In fact it doesn’t have access to the state. It just composes functions that will modify the state when executed. That’s why we were able to move these functions around so easily.

A Word About Syntax

Monadic code in C++ is artificially complex. That’s because C++ was not designed for functional programming. The same program written in Haskell is much shorter. Here it is, complete with the helper functions prune, get, and toNumber:

solve = StateL select >>= go (nub "sendmoremoney") M.empty
    go [c] subst i = prune (M.insert c i subst)
    go (c:cs) subst i = StateL select >>= go cs (M.insert c i subst)
    prune subst = do
        guard (get 's' /= 0 && get 'm' /= 0)
        let send  = toNumber "send"
            more  = toNumber "more"
            money = toNumber "money"
        guard (send + more == money)
        return (send, more, money)
        get c = fromJust (M.lookup c subst)
        toNumber str = asNumber (map get str)
        asNumber = foldl (\t u -> t*10 + u) 0
main = print $ evalStateL solve [0..9]

Some of the simplification comes from using the operator >>= in place of mbind, and a simpler syntax for lambdas. But there is also some built-in syntactic sugar for chains of monadic bindings in the form of the do notation. You see an example of it in the implementation of prune.

The funny thing is that C++ is on the verge of introducing something equivalent to do notation called resumable functions. There is a very strong proposal for resumable functions in connection with the future monad, dealing with asynchronous functions. There is talk of using it with generators, or lazy ranges, which are a version of the list monad. But there is still a need for a push towards generalizing resumable functions to user-defined monads, like the one I described in this series. It just doesn’t make sense to add separate language extensions for each aspect of the same general pattern. This pattern has been known in functional programming for a long time and it’s called a monad. It’s a very useful and versatile pattern whose importance has been steadily growing as the complexity of software projects increases.


  1. The code for this blog is available on github.
  2. Douglas M. Auclair, MonadPlus: What a Super Monad!, The Monad.Reader Issue 11, August 25, 2008. It contains the Haskell implementation of the solution to this puzzle.
  3. Justin Le, Unique sample drawing & searches with List and StateT — “Send more money”. It was the blog post that inspired this series.

This is part 3 of the miniseries about solving a simple constraint-satisfaction problem:

  s e n d
+ m o r e
m o n e y

using monads in C++. Previously: The State Monad

A constraint satisfaction problem may be solved by brute force, by trying all possible substitutions. This may be done using the imperative approach with loops, or a declarative approach with the list monad. The list monad works by fanning out a list of “alternative universes,” one for each individual substitution. It lets us explore the breadth of the solution space. The universes in which the substitution passes all the constraints are then gathered together to form the solution.

We noticed also that, as we are traversing the depth of the solution space, we could avoid testing unnecessary branches by keeping track of some state. Indeed, we start with a list of 10 possibilities for the first character, but there are only 9 possibilities for the second character, 8 for the third, and so on. It makes sense to start with a list of ten digits, and keep removing digits as we make our choices. At every point, the list of remaining choices is the state we have to keep track of.

Again, there is the imperative approach and the functional approach to state. The imperative approach makes the state mutable and the composition of actions requires backtracking. The functional approach uses persistent, immutable data structures, and the composition is done at the level of functions — plans of action that can be executed once the state is available. These plans of action form the state monad.

The functional solution to our problem involves the combination of the list monad and the state monad. Mashing two monads together is not trivial — in Haskell this is done using monad transformers — but here I’ll show you how to do it manually.

State List

Let’s start with the state, which is the list of digits to choose from:

using State = List<int>;

In the state monad, our basic data structure was a function that took a state and returned a pair: a value and a new state. Now we want our functions to generate, as in quantum mechanics, a whole list of alternative universes, each described by a pair: a value and a state.

Notice that we have to vary both the value and the state within the list. For instance, the function that selects a number from a list should produce a different number and a different list in each of the alternative universes. If the choice is, say, 3, the new state will have 3 removed from the list.


Here’s the convenient generic definition of the list of pairs:

template<class A>
using PairList = List<pair<A, State>>;

And this is our data structure that represents a non-deterministic plan of action:

template<class A> 
using StateList = function<PairList<A>(State)>;

Once we have such a non-deterministic plan of action, we can run it. We give it a state and get back a list of results, each paired with a new state:

template<class A>
PairList<A> runStateList(StateList<A> st, State s)
    return st(s);

Composing Non-Deterministic Plans

As we’ve seen before, the essence of every monad is the composition of individual actions. The higher-order function mbind that does the composition has the same structure for every monad. It takes two arguments. One argument is the result of our previous activity. For the list monad, it was a list of values. For the state monad it was a plan of action to produce a value (paired with the leftover state). For the combination state-list monad it will be a plan of action to produce a list of values (each combined with the leftover state).

The second argument to mbind is a continuation. It takes a single value. It’s supposed to be the value that is encapsulated in the first argument. In the case of the list monad, it was one value from the list. In the case of the state monad, it was the value that would be returned by the execution of the plan. In the combined state-list monad, this will be one of the values produced by the plan.

The continuation is supposed to produce the final result, be it a list of values or a plan. In the state-list monad it will be a plan to produce a list of values, each with its own leftover state.

Finally, the result of mbind is the composite: for the list monad it was a list, for the state monad it was a plan, and for the combination, it will be a plan to produce a list of values and states.

We could write the signature of mbind simply as:

template<class A, class B>
StateList<B> mbind(StateList<A>, function<StateList<B>(A)>)

but that would prevent the C++ compiler from making automatic type deductions. So instead we parameterize it with the type A and a function type F. That requires some additional footwork to extract the return type from the type of F:

template<class A, class F>
auto mbind(StateList<A> g, F k) 
    -> decltype(k(g(State()).front().first))

The implementation of mbind is a combination of what we did for the list monad and for the state monad. To begin with, we are supposed to return a plan, so we have to create a lambda. This lambda takes a state s as an argument. Inside the lambda, given the state, we can run the first argument — the plan. We get a list of pairs. Each pair contains a value and a new state. We want to execute the continuation k for each of these values.

To do that, we run fmap over this list (think std::transform, but less messy). Inside fmap, we disassemble each pair into value and state, and execute the continuation k over the value. The result is a new plan. We run this plan, passing it the new state. The result is a list of pairs: value, state.

Since each item in the list produces a list, the result of fmap is a list of lists. Just like we did in the list monad, we concatenate all these lists into one big list using concatAll.

template<class A, class F>
auto mbind(StateList<A> g, F k) 
    -> decltype(k(g(State()).front().first))
    return [g, k](State s) {
        PairList<A> plst = g(s);
        // List<PairList<B>> 
        auto lst2 = fmap([k](pair<A, State> const & p) {
            A a = p.first;
            State s1 = p.second;
            auto ka = k(a);
            auto result = runStateList(ka, s1);
            return result;
        }, plst);
        return concatAll(lst2);

To make state-list into a full-blown monad we need one more ingredient: the function mreturn that turns any value into a trivial plan to produce a singleton list with that value. The state is just piped through:

template<class A>
StateList<A> mreturn(A a)
    return [a](State s) {
        // singleton list
        return PairList<A>(make_pair(a, s)); 

So that would be all for the state-list monad, except that in C++ we have to special-case mbind for the type of continuation that takes a void argument (there is no value of type void in C++). We’ll call this version mthen:

template<class A, class F>
auto mthen(StateList<A> g, F k) -> decltype(k())
    return [g, k](State s) {
        PairList<A> plst = g(s);
        auto lst2 = fmap([k](pair<A, State> const & p) {
            State s1 = p.second;
            auto ka = k();
            auto result = runStateList(ka, s1);
            return result;
        }, plst);
        return concatAll(lst2);

Notice that, even though the value resulting from running the first argument to mthen is discarded, we still have to run it because we need the modified state. In a sense, we are running it only “for side effects,” although all these functions are pure and have no real side effects. Once again, we can have our purity cake and eat it too, simulating side effects with pure functions.


Every time we generate a candidate substitution, we have to check if it satisfies our constraints. That would be easy if we could access the substitution. But we are in a rather peculiar situation: We are writing code that deals with plans to produce substitutions. How do you test a plan? Obviously we need to write a plan to do the testing (haven’t we heard that before?).

Here’s the trick: Look at the implementation of mthen (or mbind for that matter). When is the continuation not executed? The continuation is not executed if the list passed to fmap is empty. Not executing the continuation is equivalent to aborting the computation. And what does mthen return in that case? An empty list!

So the way to abort a computation is to pass an argument to mthen that produces an empty list. Such a poison pill of a plan is produced by a function traditionally called mzero.

template<class A>
StateList<A> mzero()
    return[](State s) {
        return PairList<A>(); // empty list

In functional programming, a monad that supports this functionality is called a “monad plus” (it also has a function called mplus). It so happens that the list monad and the state-list monad are both plus.

Now we need a function that produces a poison pill conditionally, so we can pass the result of this function to mthen. Remember, mthen ignores the individual results, but is sensitive to the size of the list. The function guard, on success, produces a discardable value — here, a nullptr — in a singleton list. This will trigger the execution of the continuation in mthen, while discarding the nullptr. On failure, guard produces mzero, which will abort the computation.

StateList<void*> guard(bool b)
    if (b) {
        return [](State s) {
            // singleton list with discardable value
            return List<pair<void*, State>>(make_pair(nullptr, s));
        return mzero<void*>(); // empty list

The Client Side

Everything I’ve described so far is reusable code that should be put in a library. Equipped with the state-list monad library, all the client needs to do is to write a few utility functions and combine them into a solution.

Here’s the function that picks a number from a list. Actually, it picks a number from a list in all possible ways. It returns a list of all picks paired with the remainders of the list. This function is our non-deterministic selection plan.

template<class A>
PairList<A> select(List<A> lst)
    if (lst.isEmpty())
        return PairList<A>();

    A       x  = lst.front();
    List<A> xs = lst.popped_front();

    auto result = List<pair<A, State>>();
    forEach(select(xs), [x, &result](pair<A, List<A>> const & p)
        A       y  = p.first;
        List<A> ys = p.second;
        auto y_xys = make_pair(y, ys.pushed_front(x));
        result = result.pushed_front(y_xys);

    return result.pushed_front(make_pair(x, xs));

Here’s a little utility function that converts a list (actually, a vector) of digits to a number:

int asNumber(vector<int> const & v)
    int acc = 0;
    for (auto i : v)
        acc = 10 * acc + i;
    return acc;

And here’s the final solution to our puzzle:

StateList<tuple<int, int, int>> solve()
    StateList<int> sel = &select<int>;

    return mbind(sel, [=](int s) {
    return mbind(sel, [=](int e) {
    return mbind(sel, [=](int n) {
    return mbind(sel, [=](int d) {
    return mbind(sel, [=](int m) {
    return mbind(sel, [=](int o) {
    return mbind(sel, [=](int r) {
    return mbind(sel, [=](int y) {
        return mthen(guard(s != 0 && m != 0), [=]() {
            int send  = asNumber(vector<int>{s, e, n, d});
            int more  = asNumber(vector<int>{m, o, r, e});
            int money = asNumber(vector<int>{m, o, n, e, y});
            return mthen(guard(send + more == money), [=]() {
                return mreturn(make_tuple(send, more, money));

It starts by creating the basic plan called sel for picking numbers from a list. (The list is the state that will be provided later.) It binds this selection to a very large continuation that produces the final result — the plan to produce triples of numbers.

Here, sel is really a plan to produce a list, but the continuation expects just one number, s. If you look at the implementation of mbind, you’ll see that this continuation is called for every single pick, and the results are aggregated into one list.

The large continuation that is passed to the first mbind is itself implemented using mbind. This one, again, takes the plan sel. But we know that the state on which this sel will operate is one element shorter than the one before it. This second selection is passed to the next continuation, and so on.

Then the pruning begins: There is an mthen that takes a guard that aborts all computations where either s or m is zero. Then three numbers are formed from the selected digits. Notice that all those digits are captured by the enclosing lambdas by value (the [=] clauses). Yet another mthen takes a guard that checks the arithmetics and, if that one passes, the final plan to produce a singleton triple (send, more, money) is returned.

In theory, all these singletons would be concatenated on the way up to produce a list of solutions, but in reality this puzzle has only one solution. To get this solution, you run the plan with the list of digits:

List<int> lst{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
cout << evalStateList(solve(), lst);

The function evalStateList works the same way as runStateList except that it discards the leftover state. You can either implement it or use runStateList instead.

What’s Next?

One of the advantages of functional programming (other than thread safety) is that it makes refactoring really easy. Did you notice that 8 lines of code in our solution are almost identical? That’s outrageous! We have to do something about it. And while we’re at it, isn’t it obvious that a substitution should be represented as a map from characters to digits? I’ll show you how to do it in the next installment.

All this code and more is available on github.

This is the second part of the cycle “Using Monads in C++ to Solve Constraints” in which I’m illustrating functional techniques using the example of a simple puzzle:

  s e n d
+ m o r e
m o n e y

Previously, I talked about using the list monad to search the breadth of the solution space.

What would you do if you won a lottery? Would you buy a sports car, drop your current job, and go on a trip across the United States? Maybe you would start your own company, multiply the money, and then buy a private jet?

We all like making plans, but they are often contingent on the state of our finances. Such plans can be described by functions. For instance, the car-buying plan is a function:

pair<Car, Cash> buyCar(Cash cashIn)

The input is some amount of Cash, and the output is a brand new Car and the leftover Cash (not necessarily a positive number!).

In general, a financial plan is a function that takes cash and returns the result paired with the new value of cash. It can be described generically using a template:

template<class A>
using Plan = function<pair<A, Cash>(Cash)>;

You can combine smaller plans to make bigger plans. For instance, you may use the leftover cash from your car purchase to finance your trip, or invest in a business, and so on.

There are some things that you already own, and you can trivially include them in your plans:

template<class A>
Plan<A> got_it(A a)
    return [a](Cash s) { return make_pair(a, s); };

What does all this daydreaming have to do with the solution to our puzzle? I mentioned previously that we needed to keep track of state, and this is how functional programmers deal with state. Instead of relying on side effects to silently modify the state, they write code that generates plans of action.

An imperative programmer, on the other hand, might implement the car-buying procedure by passing it a bank object, and the withdrawal of money would be a side effect of the purchase. Or, the horror!, the bank object could be global.

In functional programming, each individual plan is a function: The state comes in, and the new state goes out, paired with whatever value the function was supposed to produce in the first place. These little plans are aggregated into bigger plans. Finally, the master plan is executed — that’s when the actual state is passed in and the result comes out together with a new state. We can do the same in modern C++ using lambdas.

You might be familiar with a similar technique used with expression templates in C++. Expression templates form the basis of efficient implementation of matrix calculus, where expressions involving matrices and vectors are not evaluated on the spot but rather converted to parse trees. These trees may then be evaluated using more efficient techniques. You can think of an expression template as a plan for evaluating the final result.

The State Monad


To find the solution to our puzzle, we will generate substitutions by picking digits from a list of integers. This list of integers will be our state.

using State = List<int>;

We will be using a persistent list, so we won’t have to worry about backtracking. Persistent lists are never mutated — all their versions persist, and we can go back to them without fearing that they may have changed. We’ll need that when we combine our state calculations with the list monad to get the final solution. For now, we’ll just consider one substitution.

We’ll be making and combining all kinds of plans that rely on, and produce new state:

template<class A>
using Plan = function<pair<A, State>(State)>;

We can always run a plan, provided we have the state available:

template<class A>
pair<A, State> runPlan(Plan<A> pl, State s)
    return pl(s);

You may recall from the previous post that the essence of every monad is the ability to compose smaller items into bigger ones. In the case of the state monad we need the ability to compose plans of action.

For instance, suppose that you know how to generate a plan to travel across the United States on a budget, as long as you have a car. You don’t have a car though. Not to worry, you have a plan to get a car. It should be easy to generate a composite plan that involves buying the car and then continuing with your trip.

Notice the two ingredients: one is the plan to buy a car: Plan<Car>. The second ingredient is a function that takes a car and produces the plan for your trip, Plan<Trip>. This function is the continuation that leads to your final goal: it’s a function of the type Plan<Trip>(Car). And the continuation itself may in turn be a composite of many smaller plans.

So here’s the function mbind that binds a plan pl to a continuation k. The continuation uses the output of the plan pl to generate another plan. The function mbind is supposed to return a new composite plan, so it must return a lambda. Like any other plan, this lambda takes a state and returns a pair: value, state. We’ll implement this lambda for the most general case.

The logic is simple. Inside the lambda we will have the state available, so we can run the plan pl. We get back a pair: the value of type A and the new state. We pass this value to the continuation k and get back a new plan. Finally, we run that plan with the new state and we’re done.

template<class A, class F>
auto mbind(Plan<A> pl, F k) -> decltype(k(pl(State()).first))
    using B = decltype(k(pl(State()).first)(State()).first);
    // this should really be done with concepts
        F, std::function<Plan<B>(A)>> ::value,
        "mbind requires a function type Plan<B>(A)");

    return [pl, k](State s) {
        pair<A, State> ps = runPlan(pl, s);
        Plan<B> plB = k(ps.first);
        return runPlan(plB, ps.second); // new state!

Notice that all this running of plans inside mbind doesn’t happen immediately. It happens when the lambda is executed, which is when the larger plan is run (possibly as part of an even larger plan). So what mbind does is to create a new plan to be executed in the future.

And, as with every monad, there is a function that takes regular input and turns it into a trivial plan. I called it got_it before, but a more common name would be mreturn:

template<class A>
Plan<A> mreturn(A a)
    return [a](State s) { return make_pair(a, s); };

The state monad usually comes with two helper functions. The function getState gives direct access to the state by turning it into the return value:

Plan<State> getState()
    return [](State s) { return make_pair(s, s); };

Using getState you may examine the state when the plan is running, and dynamically select different branches of your plan. This makes monads very flexible, but it also makes the composition of different monads harder. We’ll see this in the next installment, when we compose the state monad with the list monad.

The second helper function is used to modify (completely replace) the state.

Plan<void*> putState(State newState)
    return [=](State s) { return make_pair(nullptr, newState); };

It doesn’t evaluate anything useful, so its return type is void* and the returned value is nullptr. Its only purpose is to encapsulate a side effect. What’s amazing is that you can do it and still preserve functional purity. There’s no better example of having your cake and eating it too.


To demonstrate the state monad in action, let’s implement a tiny example. We start with a small plan that just picks the first number from a list (the list will be our state):

pair<int, List<int>> select(List<int> lst)
    int i = lst.front();
    return make_pair(i, lst.popped_front());

Persistent list method popped_front returns a list with its first element popped off. Typical of persistent data structures, this method doesn’t modify the original list. It doesn’t copy the list either — it just creates a pointer to its tail.

Here’s our first plan:

Plan<int> sel = &select;

Now we can create a more complex plan to produce a pair of integers:

Plan<pair<int, int>> pl = 
    mbind(sel, [=](int i) { return
    mbind(sel, [=](int j) { return
    mreturn(make_pair(i, j));
}); });

Let’s analyze this code. The first mbind takes a plan sel that selects the first element from a list (the list will be provided later, when we execute this plan). It binds it to the continuation (whose body I have grayed out) that takes the selected integer i and produces a plan to make a pair of integers. Here’s this continuation:

[=](int i) { return
    mbind(sel, [=](int j) { return
    mreturn(make_pair(i, j));
}); });

It binds the plan sel to another, smaller continuation that takes the selected element j and produces a plan to make a pair of integers. Here’s this smaller continuation:

[=](int j) { return
    mreturn(make_pair(i, j));

It combines the first integer i that was captured by the lambda, with the second integer j that is the argument to the lambda, and creates a trivial plan that returns a pair:

mreturn(make_pair(i, j));

Notice that we are using the same plan sel twice. But when this plan is executed inside of our final plan, it will return two different elements from the input list. When mbind is run, it first passes the state (the list of integers) to the first sel. It gets back the modified state — the list with the first element popped. It then uses this shorter list to execute the plan produced by the continuation. So the second sel gets the shortened list, and picks its first element, which is the second element of the original list. It, too, shortens the list, which is then passed to mreturn, which doesn’t modify it any more.

We can now run the final plan by giving it a list to pick from:

List<int> st{ 1, 2, 3 };
cout << runPlan(pl, st) << endl;

We are still not ready to solve the puzzle, but we are much closer. All we need is to combine the list monad with the state monad. I’m leaving this task for the next installment.

But here’s another look at the final solution:

StateL<tuple<int, int, int>> solve()
    StateL<int> sel = &select<int>;

    return mbind(sel, [=](int s) {
    return mbind(sel, [=](int e) {
    return mbind(sel, [=](int n) {
    return mbind(sel, [=](int d) {
    return mbind(sel, [=](int m) {
    return mbind(sel, [=](int o) {
    return mbind(sel, [=](int r) {
    return mbind(sel, [=](int y) {
        return mthen(guard(s != 0 && m != 0), [=]() {
            int send  = asNumber(vector<int>{s, e, n, d});
            int more  = asNumber(vector<int>{m, o, r, e});
            int money = asNumber(vector<int>{m, o, n, e, y});
            return mthen(guard(send + more == money), [=]() {
                return mreturn(make_tuple(send, more, money));

This time I didn’t rename mbind to for_each and mreturn to yield.

Now that you’re familiar with the state monad, you may appreciate the fact that the same sel passed as the argument to mbind will yield a different number every time, as long as the state list contains unique digits.

Before You Post a Comment

I know what you’re thinking: Why would I want to complicate my life with monads if there is a much simpler, imperative way of dealing with state? What’s the payoff of the functional approach? The immediate payoff is thread safety. In imperative programming mutable shared state is a never-ending source of concurrency bugs. The state monad and the use of persistent data structures eliminates the possibility of data races. And it does it without any need for synchronization (other than reference counting inside shared pointers).

I will freely admit that C++ is not the best language for functional programming, and that monadic code in C++ looks overly complicated. But, let’s face it, in C++ everything looks overly complicated. What I’m describing here are implementation details of something that should be encapsulated in an easy to use library. I’ll come back to it in the next installment of this mini-series.


  1. Implement select from the example in the text using getState and putState.
  2. Implement evalPlan, a version of runPlan that only returns the final value, without the state.
  3. Implement mthen — a version of mbind where the continuation doesn’t take any arguments. It ignores the result of the Plan, which is the first argument to mthen (but it still runs it, and uses the modified state).
  4. Use the state monad to build a simple RPN calculator. The state is the stack (a List) of items:
    enum ItemType { Plus, Minus, Num };
    struct Item
        ItemType _type;
        int _num;
        Item(int i) : _type(Num), _num(i) {}
        Item(ItemType t) : _type(t), _num(-1) {}

    Implement the function calc() that creates a plan to evaluate an integer. Here’s the test code that should print -1:

    List<int> stack{ Item(Plus)
                   , Item(Minus)
                   , Item(4)
                   , Item(8)
                   , Item(3) };
    cout << evalPlan(calc(), stack) << endl;

    (The solution is available on github.)

I am sometimes asked by C++ programmers to give an example of a problem that can’t be solved without monads. This is the wrong kind of question — it’s like asking if there is a problem that can’t be solved without for loops. Obviously, if your language supports a goto, you can live without for loops. What monads (and for loops) can do for you is to help you structure your code. The use of loops and if statements lets you convert spaghetti code into structured code. Similarly, the use of monads lets you convert imperative code into declarative code. These are the kind of transformations that make code easier to write, understand, maintain, and generalize.

So here’s a problem that you may get as an interview question. It’s a small problem, so the advantages of various approaches might not be immediately obvious, especially if you’ve been trained all your life in imperative programming, and you are seeing monads for the first time.

You’re supposed write a program to solve this puzzle:

  s e n d
+ m o r e
m o n e y

Each letter correspond to a different digit between 0 and 9. Before you continue reading this post, try to think about how you would approach this problem.

The Analysis

It never hurts to impress your interviewer with your general knowledge by correctly classifying the problem. This one belongs to the class of “constraint satisfaction problems.” The obvious constraint is that the numbers obtained by substituting letters with digits have to add up correctly. There are also some less obvious constraints, namely the numbers should not start with zero.

If you were to solve this problem using pencil and paper, you would probably come up with lots of heuristics. For instance, you would deduce that m must stand for 1 because that’s the largest possible carry from the addition of two digits (even if there is a carry from the previous column). Then you’d figure out that s must be either 8 or 9 to produce this carry, and so on. Given enough time, you could probably write an expert system with a large set of rules that could solve this and similar problems. (Mentioning an expert system could earn you extra points with the interviewer.)

However, the small size of the problem suggests that a simple brute force approach is probably best. The interviewer might ask you to estimate the number of possible substitutions, which is 10!/(10 – 8)! or roughly 2 million. That’s not a lot. So, really, the solution boils down to generating all those substitutions and testing the constraints for each.

The Straightforward Solution

The mind of an imperative programmer immediately sees the solution as a set of 8 nested loops (there are 8 unique letters in the problem: s, e, n, d, m, o, r, y). Something like this:

for (int s = 0; s < 10; ++s)
    for (int e = 0; e < 10; ++e)
        for (int n = 0; n < 10; ++n)
            for (int d = 0; d < 10; ++d)

and so on, until y. But then there is the condition that the digits have to be different, so you have to insert a bunch of tests like:

e != s
n != s && n != e
d != s && d != e && d != n

and so on, the last one involving 7 inequalities… Effectively you have replaced the uniqueness condition with 28 new constraints.

This would probably get you through the interview at Microsoft, Google, or Facebook, but really, can’t you do better than that?

The Smart Solution

Before I proceed, I should mention that what follows is almost a direct translation of a Haskell program from the blog post by Justin Le. I strongly encourage everybody to learn some Haskell, but in the meanwhile I’ll be happy to serve as your translator.

The problem with our naive solution is the 28 additional constraints. Well, I guess one could live with that — except that this is just a tiny example of a whole range of constraint satisfaction problems, and it makes sense to figure out a more general approach.

The problem can actually be formulated as a superposition of two separate concerns. One deals with the depth and the other with the breadth of the search for solutions.

Let me touch on the depth issue first. Let’s consider the problem of creating just one substitution of letters with numbers. This could be described as picking 8 digits from a list of 0, 1, …9, one at a time. Once a digit is picked, it’s no longer in the list. We don’t want to hard code the list, so we’ll make it a parameter to our algorithm. Notice that this approach works even if the list contains duplicates, or if the list elements are not easily comparable for equality (for instance, if they are futures). We’ll discuss the list-picking part of the problem in more detail later.

Now let’s talk about breadth: we have to repeat the above process for all possible picks. This is what the 8 nested loops were doing. Except that now we are in trouble because each individual pick is destructive. It removes items from the list — it mutates the list. This is a well known problem when searching through solution spaces, and the standard remedy is called backtracking. Once you have processed a particular candidate, you put the elements back in the list, and try the next one. Which means that you have to keep track of your state, either implicitly on your function’s stack, or in a separate explicit data structure.

Wait a moment! Weren’t we supposed to talk about functional programming? So what’s all this talk about mutation and state? Well, who said you can’t have state in functional programming? Functional programmers have been using the state monad since time immemorial. And mutation is not an issue if you’re using persistent data structures. So fasten your seat belts and make sure your folding trays are in the upright position.

The List Monad

We’ll start with a small refresher in quantum mechanics. As you may remember from school, quantum processes are non-deterministic. You may repeat the same experiment many times and every time get a different result. There is a very interesting view of quantum mechanics called the many-worlds interpretation, in which every experiment gives rise to multiple alternate histories. So if the spin of an electron may be measured as either up or down, there will be one universe in which it’s up, and one in which it’s down.


We’ll use the same approach to solving our puzzle. We’ll create an alternate universe for each digit substitution for a given letter. So we’ll start with 10 universes for the letter s; then we’ll split each of them into ten universes for the letter e, and so on. Of course, most of these universes won’t yield the desired result, so we’ll have to destroy them. I know, it seems kind of wasteful, but in functional programming it’s easy come, easy go. The creation of a new universe is relatively cheap. That’s because new universes are not that different from their parent universes, and they can share almost all of their data. That’s the idea behind persistent data structures. These are the immutable data structures that are “mutated” by cloning. A cloned data structure shares most of its implementation with the parent, except for a small delta. We’ll be using persistent lists described in my earlier post.

Once you internalize the many-worlds approach to programming, the implementation is pretty straightforward. First, we need functions that generate new worlds. Since we are cheap, we’ll only generate the parts that are different. So what’s the difference between all the worlds that we get when selecting the substitution for the letter s? Just the number that we assign to s. There are ten worlds corresponding to the ten possible digits (we’ll deal with the constraints like s being different from zero later). So all we need is a function that generates a list of ten digits. These are our ten universes in a nutshell. They share everything else.

Once you are in an alternate universe, you have to continue with your life. In functional programming, the rest of your life is just a function called a continuation. I know it sounds like a horrible simplification. All your actions, emotions, and hopes reduced to just one function. Well, maybe the continuation just describes one aspect of your life, the computational part, and you can still hold on to our emotions.

So what do our lives look like, and what do they produce? The input is the universe we’re in, in particular the one number that was picked for us. But since we live in a quantum universe, the outcome is a multitude of universes. So a continuation takes a number, and produces a list. It doesn’t have to be a list of numbers, just a list of whatever characterizes the differences between alternate universes. In particular, it could be a list of different solutions to our puzzle — triples of numbers corresponding to “send”, “more”, and “money”. (There is actually only one solution, but that’s beside the point.)

And what’s the very essence of this new approach? It’s the binding of the selection of the universes to the continuation. That’s where the action is. This binding, again, can be expressed as a function. It’s a function that takes a list of universes and a continuation that produces a list of universes. It returns an even bigger list of universes. We’ll call this function for_each, and we’ll make it as generic as possible. We won’t assume anything about the type of the universes that are passed in, or the type of the universes that the continuation k produces. We’ll also make the type of the continuation a template parameter and extract the return type from it using auto and decltype:

template<class A, class F>
auto for_each(List<A> lst, F k) -> decltype(k(lst.front()))
    using B = decltype(k(lst.front()).front());
    // This should really be expressed using concepts
        F, std::function<List<B>(A)>>::value,
        "for_each requires a function type List<B>(A)");

    List<List<B>> lstLst = fmap(k, lst);
    return concatAll(lstLst);

The function fmap is similar to std::transform. It applies the continuation k to every element of the list lst. Because k itself produces a list, the result is a list of lists, lstLst. The function concatAll concatenates all those lists into one big list.

Congratulations! You have just seen a monad. This one is called the list monad and it’s used to model non-deterministic processes. The monad is actually defined by two functions. One of them is for_each, and here’s the other one:

template<class A>
List<A> yield(A a)
    return List<A> (a);

It’s a function that returns a singleton list. We use yield when we are done multiplying universes and we just want to return a single value. We use it to create a single-valued continuation. It represents the lonesome boring life, devoid of any choices.

I will later rename these functions to mbind and mreturn, because they are part of any monad, not just the list monad.

The names like for_each or yield have a very imperative ring to them. That’s because, in functional programming, monadic code plays a role similar to imperative code. But neither for_each nor yield are control structures — they are functions. In particular for_each, which sounds and works like a loop, is just a higher order function; and so is fmap, which is used in its implementation. Of course, at some level the code becomes imperative — fmap can either be implemented recursively or using an actual loop — but the top levels are just declarations of functions. Hence, declarative programming.

There is a slight difference between a loop and a function on lists like for_each: for_each takes a whole list as an argument, while a loop might generate individual items — in this case integers — on the fly. This is not a problem in a lazy functional language like Haskell, where a list is evaluated on demand. The same behavior may be implemented in C++ using streams or lazy ranges. I won’t use it here, since the lists we are dealing with are short, but you can read more about it in my earlier post Getting Lazy with C++.

We are not ready yet to implement the solution to our puzzle, but I’d like to give you a glimpse of what it looks like. For now, think of StateL as just a list. See if it starts making sense (I grayed out the usual C++ noise):

StateL<tuple<int, int, int>> solve()
    StateL<int> sel = &select<int>;

    return for_each(sel, [=](int s) {
    return for_each(sel, [=](int e) {
    return for_each(sel, [=](int n) {
    return for_each(sel, [=](int d) {
    return for_each(sel, [=](int m) {
    return for_each(sel, [=](int o) {
    return for_each(sel, [=](int r) {
    return for_each(sel, [=](int y) {
        return yield_if(s != 0 && m != 0, [=]() {
            int send  = asNumber(vector{s, e, n, d});
            int more  = asNumber(vector{m, o, r, e});
            int money = asNumber(vector{m, o, n, e, y});
            return yield_if(send + more == money, [=]() {
                return yield(make_tuple(send, more, money));

The first for_each takes a selection of integers, sel, (never mind how we deal with uniqueness); and a continuation, a lambda, that takes one integer, s, and produces a list of solutions — tuples of three integers. This continuation, in turn, calls for_each with a selection for the next letter, e, and another continuation that returns a list of solutions, and so on.

The innermost continuation is a conditional version of yield called yield_if. It checks a condition and produces a zero- or one-element list of solutions. Internally, it calls another yield_if, which calls the ultimate yield. If that final yield is called (and it might not be, if one of the previous conditions fails), it will produce a solution — a triple of numbers. If there is more than one solution, these singleton lists will get concatenated inside for_each while percolating to the top.

In the second part of this post I will come back to the problem of picking unique numbers and introduce the state monad. You can also have a peek at the code on github.


  1. Implement for_each and yield for a vector instead of a List. Use the Standard Library transform instead of fmap.
  2. Using the list monad (or your vector monad), write a function that generates all positions on a chessboard as pairs of characters between 'a' and 'h' and numbers between 1 and 8.
  3. Implement a version of for_each (call it repeat) that takes a continuation k of the type function<List<B>()> (notice the void argument). The function repeat calls k for each element of the list lst, but it ignores the element itself.

This is part 11 of Categories for Programmers. Previously: Natural Transformations. See the Table of Contents.

Introduction to Part II

In the first part of the book I argued that both category theory and programming are about composability. In programming, you keep decomposing a problem until you reach the level of detail that you can deal with, solve each subproblem in turn, and re-compose the solutions bottom-up. There are, roughly speaking, two ways of doing it: by telling the computer what to do, or by telling it how to do it. One is called declarative and the other imperative.

You can see this even at the most basic level. Composition itself may be defined declaratively; as in, h is a composite of g after f:

h = g . f

or imperatively; as in, call f first, remember the result of that call, then call g with the result:

h x = let y = f x
      in g y

The imperative version of a program is usually described as a sequence of actions ordered in time. In particular, the call to g cannot happen before the execution of f completes. At least, that’s the conceptual picture — in a lazy language, with call-by-need argument passing, the actual execution may proceed differently.

In fact, depending on the cleverness of the compiler, there may be little or no difference between how declarative and imperative code is executed. But the two methodologies differ, sometimes drastically, in the way we approach problem solving and in the maintainability and testability of the resulting code.

The main question is: when faced with a problem, do we always have the choice between a declarative and imperative approaches to solving it? And, if there is a declarative solution, can it always be translated into computer code? The answer to this question is far from obvious and, if we could find it, we would probably revolutionize our understanding of the universe.

Let me elaborate. There is a similar duality in physics, which either points at some deep underlying principle, or tells us something about how our minds work. Richard Feynman mentions this duality as an inspiration in his own work on quantum electrodynamics.

There are two forms of expressing most laws of physics. One uses local, or infinitesimal, considerations. We look at the state of a system around a small neighborhood, and predict how it will evolve within the next instant of time. This is usually expressed using differential equations that have to be integrated, or summed up, over a period of time.

Notice how this approach resembles imperative thinking: we reach the final solution by following a sequence of small steps, each depending on the result of the previous one. In fact, computer simulations of physical systems are routinely implemented by turning differential equations into difference equations and iterating them. This is how spaceships are animated in the asteroids game. At each time step, the position of a spaceship is changed by adding a small increment, which is calculated by multiplying its velocity by the time delta. The velocity, in turn, is changed by a small increment proportional to acceleration, which is given by force divided by mass.


These are the direct encodings of the differential equations corresponding to Newton’s laws of motion:

F = m dv/dt
v = dx/dt

Similar methods may be applied to more complex problems, like the propagation of electromagnetic fields using Maxwell’s equations, or even the behavior of quarks and gluons inside a proton using lattice QCD (quantum chromodynamics).

This local thinking combined with discretization of space and time that is encouraged by the use of digital computers found its extreme expression in the heroic attempt by Stephen Wolfram to reduce the complexity of the whole universe to a system of cellular automata.

The other approach is global. We look at the initial and the final state of the system, and calculate a trajectory that connects them by minimizing a certain functional. The simplest example is the Fermat’s principle of least time. It states that light rays propagate along paths that minimize their flight time. In particular, in the absence of reflecting or refracting objects, a light ray from point A to point B will take the shortest path, which is a straight line. But light propagates slower in dense (transparent) materials, like water or glass. So if you pick the starting point in the air, and the ending point under water, it’s more advantageous for light to travel longer in the air and then take a shortcut through water. The path of minimum time makes the ray refract at the boundary of air and water, resulting in Snell’s law of refraction:

sin θ1 / sin θ2 =  v1 / v2

where v1 is the speed of light in the air and v2 is the speed of light in the water.


All of classical mechanics can be derived from the principle of least action. The action can be calculated for any trajectory by integrating the Lagrangian, which is the difference between kinetic and potential energy (notice: it’s the difference, not the sum — the sum would be the total energy). When you fire a mortar to hit a given target, the projectile will first go up, where the potential energy due to gravity is higher, and spend some time there racking up negative contribution to the action. It will also slow down at the top of the parabola, to minimize kinetic energy. Then it will speed up to go quickly through the area of low potential energy.


Feynman’s greatest contribution was to realize that the principle of least action can be generalized to quantum mechanics. There, again, the problem is formulated in terms of initial state and final state. The Feynman path integral between those states is used to calculate the probability of transition.


The point is that there is a curious unexplained duality in the way we can describe the laws of physics. We can use the local picture, in which things happen sequentially and in small increments. Or we can use the global picture, where we declare the initial and final conditions, and everything in between just follows.

The global approach can be also used in programming, for instance when implementing ray tracing. We declare the position of the eye and the positions of light sources, and figure out the paths that the light rays may take to connect them. We don’t explicitly minimize the time of flight for each ray, but we do use Snell’s law and the geometry of reflection to the same effect.

The biggest difference between the local and the global approach is in their treatment of space and, more importantly, time. The local approach embraces the immediate gratification of here and now, whereas the global approach takes a long-term static view, as if the future had been preordained, and we were only analyzing the properties of some eternal universe.

Nowhere is it better illustrated than in the Functional Reactive Programming approach to user interaction. Instead of writing separate handlers for every possible user action, all having access to some shared mutable state, FRP treats external events as an infinite list, and applies a series of transformations to it. Conceptually, the list of all our future actions is there, available as the input data to our program. From a program’s perspective there’s no difference between the list of digits of π, a list of pseudo-random numbers, or a list of mouse positions coming through computer hardware. In each case, if you want to get the nth item, you have to first go through the first n-1 items. When applied to temporal events, we call this property causality.

So what does it have to do with category theory? I will argue that category theory encourages a global approach and therefore supports declarative programming. First of all, unlike calculus, it has no built-in notion of distance, or neighborhood, or time. All we have is abstract objects and abstract connections between them. If you can get from A to B through a series of steps, you can also get there in one leap. Moreover, the major tool of category theory is the universal construction, which is the epitome of a global approach. We’ve seen it in action, for instance, in the definition of the categorical product. It was done by specifying its properties — a very declarative approach. It’s an object equipped with two projections, and it’s the best such object — it optimizes a certain property: the property of factorizing the projections of other such objects.

Compare this with Fermat’s principle of minimum time, or the principle of least action.

Conversely, contrast this with the traditional definition of a cartesian product, which is much more imperative. You describe how to create an element of the product by picking one element from one set and another element from another set. It’s a recipe for creating a pair. And there’s another for disassembling a pair.

In almost every programming language, including functional languages like Haskell, product types, coproduct types, and function types are built in, rather than being defined by universal constructions; although there have been attempts at creating categorical programming languages (see, e.g., Tatsuya Hagino’s thesis).

Whether used directly or not, categorical definitions justify pre-existing programming constructs, and give rise to new ones. Most importantly, category theory provides a meta-language for reasoning about computer programs at a declarative level. It also encourages reasoning about problem specification before it is cast into code.


I’d like to thank Gershom Bazerman for checking my math and logic, and André van Meulebrouck, who has been volunteering his editing help.

This is part 11 of Categories for Programmers. Previously: Declarative Programming. See the Table of Contents.

It seems like in category theory everything is related to everything and everything can be viewed from many angles. Take for instance the universal construction of the product. Now that we know more about functors and natural transformations, can we simplify and, possibly, generalize it? Let us try.


The construction of a product starts with the selection of two objects a and b, whose product we want to construct. But what does it mean to select objects? Can we rephrase this action in more categorical terms? Two objects form a pattern — a very simple pattern. We can abstract this pattern into a category — a very simple category, but a category nevertheless. It’s a category that we’ll call 2. It contains just two objects, 1 and 2, and no morphisms other than the two obligatory identities. Now we can rephrase the selection of two objects in C as the act of defining a functor D from the category 2 to C. A functor maps objects to objects, so its image is just two objects (or it could be one, if the functor collapses objects, which is fine too). It also maps morphisms — in this case it simply maps identity morphisms to identity morphisms.


What’s great about this approach is that it builds on categorical notions, eschewing the imprecise descriptions like “selecting objects,” taken straight from the hunter-gatherer lexicon of our ancestors. And, incidentally, it is also easily generalized, because nothing can stop us from using categories more complex than 2 to define our patterns.

But let’s continue. The next step in the definition of a product is the selection of the candidate object c. Here again, we could rephrase the selection in terms of a functor from a singleton category. And indeed, if we were using Kan extensions, that would be the right thing to do. But since we are not ready for Kan extensions yet, there is another trick we can use: a constant functor Δ from the same category 2 to C. The selection of c in C can be done with Δc. Remember, Δc maps all objects into c and all morphisms into idc.


Now we have two functors, Δc and D going between 2 and C so it’s only natural to ask about natural transformations between them. Since there are only two objects in 2, a natural transformation will have two components. Object 1 in 2 is mapped to c by Δc and to a by D. So the component of a natural transformation between Δc and D at 1 is a morphism from c to a. We can call it p. Similarly, the second component is a morphism q from c to b — the image of the object 2 in 2 under D. But these are exactly like the two projections we used in our original definition of the product. So instead of talking about selecting objects and projections, we can just talk about picking functors and natural transformations. It so happens that in this simple case the naturality condition for our transformation is trivially satisfied, because there are no morphisms (other than the identities) in 2.


A generalization of this construction to categories other than 2 — ones that, for instance, contain non-trivial morphisms — will impose naturality conditions on the transformation between Δc and D. We call such transformation a cone, because the image of Δ is the apex of a cone/pyramid whose sides are formed by the components of the natural transformation. The image of D forms the base of the cone.

In general, to build a cone, we start with a category I that defines the pattern. It’s a small, often finite category. We pick a functor D from I to C and call it (or its image) a diagram. We pick some c in C as the apex of our cone. We use it to define the constant functor Δc from I to C. A natural transformation from Δc to D is then our cone. For a finite I it’s just a bunch of morphisms connecting c to the diagram: the image of I under D.


Naturality requires that all triangles (the walls of the pyramid) in this diagram commute. Indeed, take any morphism f in I. The functor D maps it to a morphism D f in C, a morphism that forms the base of some triangle. The constant functor Δc maps f to the identity morphism on c. Δ squishes the two ends of the morphism into one object, and the naturality square becomes a commuting triangle. The two arms of this triangle are the components of the natural transformation.


So that’s one cone. What we are interested in is the universal cone — just like we picked a universal object for our definition of a product.

There are many ways to go about it. For instance, we may define a category of cones based on a given functor D. Objects in that category are cones, which, since D is fixed, are entirely determined by their apexes. Not every object c in C forms a cone, though, because there may be no natural transformation between Δc and D.

To make it a category, we also have to define morphisms between cones. These would be fully determined by morphisms between their apexes. But not just any morphism will do. Remember that, in our construction of the product, we imposed the condition that the morphisms between candidate objects (the apexes) must be common factors for the projections. For instance:

p' = p . m
q' = q . m


This condition translates, in the general case, to the condition that the triangles whose one side is the factorizing morphism all commute.

Cone Commutativity

The commuting triangle connecting two cones, with the factorizing morphism h  (here, the lower cone is the universal one, with Lim D as its apex).

We’ll take those factorizing morphisms as the morphisms in our category of cones. It’s easy to check that those morphisms indeed compose, and that the identity morphism is a factorizing morphism as well. Cones therefore form a category.

Now we can define the universal cone as the terminal object in the category of cones. The definition of the terminal object states that there is a unique morphism from any other object to that object. In our case it means that there is a unique factorizing morphism from the apex of any other cone to the apex of the universal cone. We call this universal cone the limit of the diagram D, Lim D (in the literature, you’ll often see a left arrow pointing towards I under the Lim sign). Often, as a shorthand, we call the apex of this cone the limit (or the limit object).

The intuition is that the limit embodies the properties of the whole diagram in a single object. For instance, the limit of our two-object diagram is the product of two objects. The product (together with the two projections) contains the information about both objects. And being universal means that it has no extraneous junk.

Limit as a Natural Isomorphism

There is still something unsatisfying about this definition of a limit. I mean, it’s workable, but we still have this commutativity condition for the triangles that are linking any two cones. It would be so much more elegant if we could replace it with some naturality condition. But how?

We are no longer dealing with one cone but with a whole collection (in fact, a category) of cones. If the limit exists (and — let’s make it clear — there’s no guarantee of that), one of those cones is the universal cone. For every other cone we have a unique factorizing morphism that maps its apex, let’s call it c, to the apex of the universal cone, which we named Lim D. (In fact, I can skip the word “other,” because the identity morphism maps the universal cone to itself and it trivially factorizes through itself.) Let me repeat the important part: given any cone, there is a unique morphism of a special kind. We have a mapping of cones to special morphisms, and it’s a one-to-one mapping.

This special morphism is a member of the hom-set C(c, Lim D). The other members of this hom-set are less fortunate, in the sense that they don’t factorize the mapping of cones. What we want is to be able to pick, for each c, one morphism from the set C(c, Lim D) — a morphism that satisfies the particular commutativity condition. Does that sound like defining a natural transformation? It most certainly does!

But what are the functors that are related by this transformation?

One functor is the mapping of c to the set C(c, Lim D). It’s a functor from C to Set — it maps objects to sets. In fact it’s a contravariant functor. Here’s how we define its action on morphisms: Let’s take a morphism f from c' to c:

f :: c' -> c

Our functor maps c' to the set C(c', Lim D). To define the action of this functor on f (in other words, to lift f), we have to define the corresponding mapping between C(c, Lim D) and C(c', Lim D). So let’s pick one element u of C(c, Lim D) and see if we can map it to some element of C(c', Lim D). An element of a hom-set is a morphism, so we have:

u :: c -> Lim D

We can precompose f with u to get:

u . f :: c' -> Lim D

And that’s an element of C(c', Lim D)— so indeed, we have found a mapping of morphisms:

contramap :: (c' -> c) -> (c -> Lim D) -> (c' -> Lim D)
contramap f u = u . f

Notice the inversion in the order of c and c' characteristic of a contravariant functor.


To define a natural transformation, we need another functor that’s also a mapping from C to Set. But this time we’ll consider a set of cones. Cones are just natural transformations, so we are looking at the set of natural transformations Nat(Δc, D). The mapping from c to this particular set of natural transformations is a (contravariant) functor. How can we show that? Again, let’s define its action on a morphism:

f :: c' -> c

The lifting of f should be a mapping of natural transformations between two functors that go from I to C:

Nat(Δc, D) -> Nat(Δc', D)

How do we map natural transformations? Every natural transformation is a selection of morphisms — its components — one morphism per element of I. A component of some α (a member of Nat(Δc, D)) at a (an object in I) is a morphism:

αa :: Δca -> D a

or, using the definition of the constant functor Δ,

αa :: c -> D a

Given f and α, we have to construct a β, a member of Nat(Δc', D). Its component at a should be a morphism:

βa :: c' -> D a

We can easily get the latter from the former by precomposing it with f:

βa = αa . f

It’s relatively easy to show that those components indeed add up to a natural transformation.

Given our morphism f, we have thus built a mapping between two natural transformations, component-wise. This mapping defines contramap for the functor:

c -> Nat(Δc, D)

What I have just done is to show you that we have two (contravariant) functors from C to Set. I haven’t made any assumptions — these functors always exist.

Incidentally, the first of these functors plays an important role in category theory, and we’ll see it again when we talk about Yoneda’s lemma. There is a name for contravariant functors from any category C to Set: they are called “presheaves.” This one is called a representable presheaf. The second functor is also a presheaf.

Now that we have two functors, we can talk about natural transformations between them. So without further ado, here’s the conclusion: A functor D from I to C has a limit Lim D if and only if there is a natural isomorphism between the two functors I have just defined:

C(c, Lim D) ≃ Nat(Δc, D)

Let me remind you what a natural isomorphism is. It’s a natural transformation whose every component is an isomorphism, that is to say an invertible morphism.

I’m not going to go through the proof of this statement. The procedure is pretty straightforward if not tedious. When dealing with natural transformations, you usually focus on components, which are morphisms. In this case, since the target of both functors is Set, the components of the natural isomorphism will be functions. These are higher order functions, because they go from the hom-set to the set of natural transformations. Again, you can analyze a function by considering what it does to its argument: here the argument will be a morphism — a member of C(c, Lim D) — and the result will be a natural transformation — a member of Nat(Δc, D), or what we have called a cone. This natural transformation, in turn, has its own components, which are morphisms. So it’s morphisms all the way down, and if you can keep track of them, you can prove the statement.

The most important result is that the naturality condition for this isomorphism is exactly the commutativity condition for the mapping of cones.

As a preview of coming attractions, let me mention that the set Nat(Δc, D) can be thought of as a hom-set in the functor category; so our natural isomorphism relates two hom-sets, which points at an even more general relationship called an adjunction.

Examples of Limits

We’ve seen that the categorical product is a limit of a diagram generated by a simple category we called 2.

There is an even simpler example of a limit: the terminal object. The first impulse would be to think of a singleton category as leading to a terminal object, but the truth is even starker than that: the terminal object is a limit generated by an empty category. A functor from an empty category selects no object, so a cone shrinks to just the apex. The universal cone is the lone apex that has a unique morphism coming to it from any other apex. You will recognize this as the definition of the terminal object.

The next interesting limit is called the equalizer. It’s a limit generated by a two-element category with two parallel morphisms going between them (and, as always, the identity morphisms). This category selects a diagram in C consisting of two objects, a and b, and two morphisms:

f :: a -> b
g :: a -> b

To build a cone over this diagram, we have to add the apex, c and two projections:

p :: c -> a
q :: c -> b


We have two triangles that must commute:

q = f . p
q = g . p

This tells us that q is uniquely determined by one of these equations, say, q = f . p, and we can omit it from the picture. So we are left with just one condition:

f . p = g . p

The way to think about it is that, if we restrict our attention to Set, the image of the function p selects a subset of a. When restricted to this subset, the functions f and g are equal.

For instance, take a to be the two-dimensional plane parameterized by coordinates x and y. Take b to be the real line, and take:

f (x, y) = 2 * y + x
g (x, y) = y - x

The equalizer for these two functions is the set of real numbers (the apex, c) and the function:

p t = (t, (-2) * t)

Notice that (p t) defines a straight line in the two-dimensional plane. Along this line, the two functions are equal.

Of course, there are other sets c' and functions p' that may lead to the equality:

f . p' = g . p'

but they all uniquely factor out through p. For instance, we can take the singleton set () as c' and the function:

p'() = (0, 0)

It’s a good cone, because f (0, 0) = g (0, 0). But it’s not universal, because of the unique factorization through h:

p' = p . h


h () = 0

An equalizer can thus be used to solve equations of the type f x = g x. But it’s much more general, because it’s defined in terms of objects and morphisms rather than algebraically.

An even more general idea of solving an equation is embodied in another limit — the pullback. Here, we still have two morphisms that we want to equate, but this time their domains are different. We start with a three-object category of the shape: 1->2<-3. The diagram corresponding to this category consists of three objects, a, b, and c, and two morphisms:

f :: a -> b
g :: c -> b

This diagram is often called a span.

A cone built on top of this diagram consists of the apex, d, and three morphisms:

p :: d -> a
q :: d -> c
r :: d -> b


Commutativity conditions tell us that r is completely determined by the other morphisms, and can be omitted from the picture. So we are only left with the following condition:

g . q = f . p

A pullback is a universal cone of this shape.


Again, if you narrow your focus down to sets, you can think of the object d as consisting of pairs of elements from a and c for which f acting on the first component is equal to g acting on the second component. If this is still too general, consider the special case in which g is a constant function, say g _ = 42. Then you are really solving the equation:

f x = 42

But pullbacks have more general applications, also in programming. For instance, consider C++ classes as a category in which morphism are arrows that connect subclasses to superclasses. We’ll consider inheritance a transitive property, so if C inherits from B and B inherits from A then we’ll say that C inherits from A (after all, you can pass a pointer to C where a pointer to A is expected). Also, we’ll assume that C inherits from C, so we have the identity arrow for every class. This way subclassing is aligned with subtyping. C++ also supports multiple inheritance, so you can construct a diamond inheritance diagram with two classes B and C inheriting from A, and a fourth class D multiply inheriting from B and C. Normally, D would get two copies of A, which is rarely desirable; but you can use virtual inheritance to have just one copy of A in D.

What would it mean to have D be a pullback in this diagram? It would mean that any class E that multiply inherits from B and C is also a subclass of D. This is not directly expressible in C++, where subtyping is nominal (the C++ compiler wouldn’t infer this kind of class relationship — it would require “duck typing”). But we could go outside of the subtyping relationship and instead ask whether a cast from E to D would be safe or not. This cast would be safe if D were the bare-bone combination of B and C, with no additional data and no overriding of methods. And, of course, there would be no pullback if there is a name conflict between some methods of B and C.


There’s also a more advanced use of a pullback in type inference. There is often a need to unify types of two expressions. For instance, suppose that the compiler wants to infer the type of a function:

twice f x = f (f x)

It will assign preliminary types to all variables and sub-expressions. In particular, it will assign:

f       :: t0
x       :: t1
f x     :: t2
f (f x) :: t3

from which it will deduce that:

twice :: t0 -> t1 -> t3

It will also come up with a set of constraints resulting from the rules of function application:

t0 = t1 -> t2 -- because f is applied to x
t0 = t2 -> t3 -- because f is applied to (f x)

These constraints have to be unified by finding a set of types (or type variables) that, when substituted for the unknown types in both expressions, produce the same type. One such substitution is:

t1 = t2 = t3 = Int
twice :: (Int -> Int) -> Int -> Int

but, obviously, it’s not the most general one. The most general substitution is obtained using a pullback. I won’t go into the details, because they are beyond the scope of this book, but you can convince yourself that the result should be:

twice :: (t -> t) -> t -> t

with t a free type variable.


Just like all constructions in category theory, limits have their dual image in opposite categories. When you invert the direction of all arrows in a cone, you get a co-cone, and the universal one of those is called a colimit. Notice that the inversion also affects the factorizing morphism, which now flows from the universal co-cone to any other co-cone.


Cocone with a factorizing morphism h connecting two apexes.

A typical example of a colimit is a coproduct, which corresponds to the diagram generated by 2, the category we’ve used in the definition of the product.


Both the product and the coproduct embody the essence of a pair of objects, each in a different way.

Just like the terminal object was a limit, so the initial object is a colimit corresponding to the diagram based on an empty category.

The dual of the pullback is called the pushout. It’s based on a diagram called a co-span, generated by the category 13.


I said previously that functors come close to the idea of continuous mappings of categories, in the sense that they never break existing connections (morphisms). The actual definition of a continuous functor F from a category C to C’ includes the requirement that the functor preserve limits. Every diagram D in C can be mapped to a diagram F ∘ D in C’ by simply composing two functors. The continuity condition for F states that, if the diagram D has a limit Lim D, then the diagram F ∘ D also has a limit, and it is equal to F (Lim D).


Notice that, because functors map morphisms to morphisms, and compositions to compositions, an image of a cone is always a cone. A commuting triangle is always mapped to a commuting triangle (functors preserve composition). The same is true for the factorizing morphisms: the image of a factorizing morphism is also a factorizing morphism. So every functor is almost continuous. What may go wrong is the uniqueness condition. The factorizing morphism in C’ might not be unique. There may also be other “better cones” in C’ that were not available in C.

A hom-functor is an example of a continuous functor. Recall that the hom-functor, C(a, b), is contravariant in the first variable and covariant in the second. In other words, it’s a functor:

Cop × C -> Set

When its second argument is fixed, the hom-set functor (which becomes the representable presheaf) maps colimits in C to limits in Set; and when its first argument is fixed, it maps limits to limits.

In Haskell, a hom-functor is the mapping of any two types to a function type, so it’s just a parameterized function type. When we fix the second parameter, let’s say to String, we get the contravariant functor:

newtype ToString a = ToString (a -> String)
instance Contravariant ToString where
    contramap f (ToString g) = ToString (g . f)

Continuity means that when ToString is applied to a colimit, for instance a coproduct Either b c, it will produce a limit; in this case a product of two function types:

ToString (Eiter b c) ~ (b -> String, c -> String)

Indeed, any function of Either b c is implemented as a case statement with the two cases being serviced by a pair of functions.

Similarly, when we fix the first argument of the hom-set, we get the familiar reader functor. Its continuity means that, for instance, any function returning a product is equivalent to a product of functions; in particular:

r -> (a, b) ~ (r -> a, r -> b)

I know what you’re thinking: You don’t need category theory to figure these things out. And you’re right! Still, I find it amazing that such results can be derived from first principles with no recourse to bits and bytes, processor architectures, compiler technologies, or even lambda calculus.

If you’re curious where the names “limit” and “continuity” come from, they are a generalization of the corresponding notions from calculus. In calculus limits and continuity are defined in terms of open neighborhoods. Open sets, which define topology, form a category (a poset).


  1. How would you describe a pushout in the category of C++ classes?
  2. Show that the limit of the identity functor Id :: C -> C is the initial object.
  3. Subsets of a given set form a category. A morphism in that category is defined to be an arrow connecting two sets if the first is the subset of the second. What is a pullback of two sets in such a category? What’s a pushout? What are the initial and terminal objects?
  4. Can you guess what a coequalizer is?
  5. Show that, in a category with a terminal object, a pullback towards the terminal object is a product.
  6. Similarly, show that a pushout from an initial object (if one exists) is the coproduct.


I’d like to thank Gershom Bazerman for checking my math and logic, and André van Meulebrouck, who has been volunteering his editing help.

This is part 10 of Categories for Programmers. Previously: Function Types. See the Table of Contents.

We talked about functors as mappings between categories that preserve their structure. A functor “embeds” one category in another. It may collapse multiple things into one, but it never breaks connections. One way of thinking about it is that with a functor we are modeling one category inside another. The source category serves as a model, a blueprint, for some structure that’s part of the target category.


There may be many ways of embedding one category in another. Sometimes they are equivalent, sometimes very different. One may collapse the whole source category into one object, another may map every object to a different object and every morphism to a different morphism. The same blueprint may be realized in many different ways. Natural transformations help us compare these realizations. They are mappings of functors — special mappings that preserve their functorial nature.

Consider two functors F and G between categories C and D. If you focus on just one object a in C, it is mapped to two objects: F a and G a. A mapping of functors should therefore map F a to G a.


Notice that F a and G a are objects in the same category D. Mappings between objects in the same category should not go against the grain of the category. We don’t want to make artificial connections between objects. So it’s natural to use existing connections, namely morphisms. A natural transformation is a selection of morphisms: for every object a, it picks one morphism from F a to G a. If we call the natural transformation α, this morphism is called the component of α at a, or αa.

αa :: F a -> G a

Keep in mind that a is an object in C while αa is a morphism in D.

If, for some a, there is no morphism between F a and G a in D, there can be no natural transformation between F and G.

Of course that’s only half of the story, because functors not only map objects, they map morphisms as well. So what does a natural transformation do with those mappings? It turns out that the mapping of morphisms is fixed — under any natural transformation between F and G, F f must be transformed into G f. What’s more, the mapping of morphisms by the two functors drastically restricts the choices we have in defining a natural transformation that’s compatible with it. Consider a morphism f between two objects a and b in C. It’s mapped to two morphisms, F f and G f in D:

F f :: F a -> F b
G f :: G a -> G b

The natural transformation α provides two additional morphisms that complete the diagram in D:

αa :: F a -> G a
αb :: F b -> G b


Now we have two ways of getting from F a to G b. To make sure that they are equal, we must impose the naturality condition that holds for any f:

G f ∘ αa = αb ∘ F f

The naturality condition is a pretty stringent requirement. For instance, if the morphism F f is invertible, naturality determines αb in terms of αa. It transports αa along f:

αb = (G f) ∘ αa ∘ (F f)-1


If there is more than one invertible morphism between two objects, all these transports have to agree. In general, though, morphisms are not invertible; but you can see that the existence of natural transformations between two functors is far from guaranteed. So the scarcity or the abundance of functors that are related by natural transformations may tell you a lot about the structure of categories between which they operate. We’ll see some examples of that when we talk about limits and the Yoneda lemma.

Looking at a natural transformation component-wise, one may say that it maps objects to morphisms. Because of the naturality condition, one may also say that it maps morphisms to commuting squares — there is one commuting naturality square in D for every morphism in C.


This property of natural transformations comes in very handy in a lot of categorical constructions, which often include commuting diagrams. With a judicious choice of functors, a lot of these commutativity conditions may be transformed into naturality conditions. We’ll see examples of that when we get to limits, colimits, and adjunctions.

Finally, natural transformations may be used to define isomorphisms of functors. Saying that two functors are naturally isomorphic is almost like saying they are the same. Natural isomorphism is defined as a natural transformation whose components are all isomorphisms (invertible morphisms).

Polymorphic Functions

We talked about the role of functors (or, more specifically, endofunctors) in programming. They correspond to type constructors that map types to types. They also map functions to functions, and this mapping is implemented by a higher order function fmap (or transform, then, and the like in C++).

To construct a natural transformation we start with an object, here a type, a. One functor, F, maps it to the type F a. Another functor, G, maps it to G a. The component of a natural transformation alpha at a is a function from F a to G a. In pseudo-Haskell:

alphaa :: F a -> G a

A natural transformation is a polymorphic function that is defined for all types a:

alpha :: forall a . F a -> G a

The forall a is optional in Haskell (and in fact requires turning on the language extension ExplicitForAll). Normally, you would write it like this:

alpha :: F a -> G a

Keep in mind that it’s really a family of functions parameterized by a. This is another example of the terseness of the Haskell syntax. A similar construct in C++ would be slightly more verbose:

template G<A> alpha(F<A>);

There is a more profound difference between Haskell’s polymorphic functions and C++ generic functions, and it’s reflected in the way these functions are implemented and type-checked. In Haskell, a polymorphic function must be defined uniformly for all types. One formula must work across all types. This is called parametric polymorphism.

C++, on the other hand, supports by default ad hoc polymorphism, which means that a template doesn’t have to be well-defined for all types. Whether a template will work for a given type is decided at instantiation time, where a concrete type is substituted for the type parameter. Type checking is deferred, which unfortunately often leads to incomprehensible error messages.

In C++, there is also a mechanism for function overloading and template specialization, which allows different definitions of the same function for different types. In Haskell this functionality is provided by type classes and type families.

Haskell’s parametric polymorphism has an unexpected consequence: any polymorphic function of the type:

alpha :: F a -> G a

where F and G are functors, automatically satisfies the naturality condition. Here it is in categorical notation (f is a function f::a->b):

G f ∘ αa = αb ∘ F f

In Haskell, the action of a functor G on a morphism f is implemented using fmap. I’ll first write it in pseudo-Haskell, with explicit type annotations:

fmapG f . alphaa = alphab . fmapF f

Because of type inference, these annotations are not necessary, and the following equation holds:

fmap f . alpha = alpha . fmap f

This is still not real Haskell — function equality is not expressible in code — but it’s an identity that can be used by the programmer in equational reasoning; or by the compiler, to implement optimizations.

The reason why the naturality condition is automatic in Haskell has to do with “theorems for free.” Parametric polymorphism, which is used to define natural transformations in Haskell, imposes very strong limitations on the implementation — one formula for all types. These limitations translate into equational theorems about such functions. In the case of functions that transform functors, free theorems are the naturality conditions. [You may read more about free theorems in my blog Parametricity: Money for Nothing and Theorems for Free.]

One way of thinking about functors in Haskell that I mentioned earlier is to consider them generalized containers. We can continue this analogy and consider natural transformations to be recipes for repackaging the contents of one container into another container. We are not touching the items themselves: we don’t modify them, and we don’t create new ones. We are just copying (some of) them, sometimes multiple times, into a new container.

The naturality condition becomes the statement that it doesn’t matter whether we modify the items first, through the application of fmap, and repackage later; or repackage first, and then modify the items in the new container, with its own implementation of fmap. These two actions, repackaging and fmapping, are orthogonal. “One moves the eggs, the other boils them.”

Let’s see a few examples of natural transformations in Haskell. The first is between the list functor, and the Maybe functor. It returns the head of the list, but only if the list is non-empty:

safeHead :: [a] -> Maybe a
safeHead [] = Nothing
safeHead (x:xs) = Just x

It’s a function polymorphic in a. It works for any type a, with no limitations, so it is an example of parametric polymorphism. Therefore it is a natural transformation between the two functors. But just to convince ourselves, let’s verify the naturality condition.

fmap f . safeHead = safeHead . fmap f

We have two cases to consider; an empty list:

fmap f (safeHead []) = fmap f Nothing = Nothing
safeHead (fmap f []) = safeHead [] = Nothing

and a non-empty list:

fmap f (safeHead (x:xs)) = fmap f (Just x) = Just (f x)
safeHead (fmap f (x:xs)) = safeHead (f x : fmap f xs) = Just (f x)

I used the implementation of fmap for lists:

fmap f [] = []
fmap f (x:xs) = f x : fmap f xs

and for Maybe:

fmap f Nothing = Nothing
fmap f (Just x) = Just (f x)

An interesting case is when one of the functors is the trivial Const functor. A natural transformation from or to a Const functor looks just like a function that’s either polymorphic in its return type or in its argument type.

For instance, length can be thought of as a natural transformation from the list functor to the Const Int functor:

length :: [a] -> Const Int a
length [] = Const 0
length (x:xs) = Const (1 + unConst (length xs))

Here, unConst is used to peel off the Const constructor:

unConst :: Const c a -> c
unConst (Const x) = x

Of course, in practice length is defined as:

length :: [a] -> Int

which effectively hides the fact that it’s a natural transformation.

Finding a parametrically polymorphic function from a Const functor is a little harder, since it would require the creation of a value from nothing. The best we can do is:

scam :: Const Int a -> Maybe a
scam (Const x) = Nothing

Another common functor that we’ve seen already, and which will play an important role in the Yoneda lemma, is the Reader functor. I will rewrite its definition as a newtype:

newtype Reader e a = Reader (e -> a)

It is parameterized by two types, but is (covariantly) functorial only in the second one:

instance Functor (Reader e) where  
    fmap f (Reader g) = Reader (\x -> f (g x))

For every type e, you can define a family of natural transformations from Reader e to any other functor f. We’ll see later that the members of this family are always in one to one correspondence with the elements of f e (the Yoneda lemma).

For instance, consider the somewhat trivial unit type () with one element (). The functor Reader () takes any type a and maps it into a function type ()->a. These are just all the functions that pick a single element from the set a. There are as many of these as there are elements in a. Now let’s consider natural transformations from this functor to the Maybe functor:

alpha :: Reader () a -> Maybe a

There are only two of these, dumb and obvious:

dumb (Reader _) = Nothing


obvious (Reader g) = Just (g ())

(The only thing you can do with g is to apply it to the unit value ().)

And, indeed, as predicted by the Yoneda lemma, these correspond to the two elements of the Maybe () type, which are Nothing and Just (). We’ll come back to the Yoneda lemma later — this was just a little teaser.

Beyond Naturality

A parametrically polymorphic function between two functors (including the edge case of the Const functor) is always a natural transformation. Since all standard algebraic data types are functors, any polymorphic function between such types is a natural transformation.

We also have function types at our disposal, and those are functorial in their return type. We can use them to build functors (like the Reader functor) and define natural transformations that are higher-order functions.

However, function types are not covariant in the argument type. They are contravariant. Of course contravariant functors are equivalent to covariant functors from the opposite category. Polymorphic functions between two contravariant functors are still natural transformations in the categorical sense, except that they work on functors from the opposite category to Haskell types.

You might remember the example of a contravariant functor we’ve looked at before:

newtype Op r a = Op (a -> r)

This functor is contravariant in a:

instance Contravariant (Op r) where
    contramap f (Op g) = Op (g . f)

We can write a polymorphic function from, say, Op Bool to Op String:

predToStr (Op f) = Op (\x -> if f x then "T" else "F")

But since the two functors are not covariant, this is not a natural transformation in Hask. However, because they are both contravariant, they satisfy the “opposite” naturality condition:

contramap f . predToStr = predToStr . contramap f

Notice that the function f must go in the opposite direction than what you’d use with fmap, because of the signature of contramap:

contramap :: (b -> a) -> (Op Bool a -> Op Bool b)

Are there any type constructors that are not functors, whether covariant or contravariant? Here’s one example:

a -> a

This is not a functor because the same type a is used both in the negative (contravariant) and positive (covariant) position. You can’t implement fmap or contramap for this type. Therefore a function of the signature:

(a -> a) -> f a

where f is an arbitrary functor, cannot be a natural transformation. Interestingly, there is a generalization of natural transformations, called dinatural transformations, that deals with such cases. We’ll get to them when we discuss ends.

Functor Category

Now that we have mappings between functors — natural transformations — it’s only natural to ask the question whether functors form a category. And indeed they do! There is one category of functors for each pair of categories, C and D. Objects in this category are functors from C to D, and morphisms are natural transformations between those functors.

We have to define composition of two natural transformations, but that’s quite easy. The components of natural transformations are morphisms, and we know how to compose morphisms.

Indeed, let’s take a natural transformation α from functor F to G. Its component at object a is some morphism:

αa :: F a -> G a

We’d like to compose α with β, which is a natural transformation from functor G to H. The component of β at a is a morphism:

βa :: G a -> H a

These morphisms are composable and their composition is another morphism:

βa ∘ αa :: F a -> H a

We will use this morphism as the component of the natural transformation β ⋅ α — the composition of two natural transformations β after α:

(β ⋅ α)a = βa ∘ αa


One (long) look at a diagram convinces us that the result of this composition is indeed a natural transformation from F to H:

H f ∘ (β ⋅ α)a = (β ⋅ α)b ∘ F f


Composition of natural transformations is associative, because their components, which are regular morphisms, are associative with respect to their composition.

Finally, for each functor F there is an identity natural transformation 1F whose components are the identity morphisms:

idF a :: F a -> F a

So, indeed, functors form a category.

A word about notation. Following Saunders Mac Lane I use the dot for the kind of natural transformation composition I have just described. The problem is that there are two ways of composing natural transformations. This one is called the vertical composition, because the functors are usually stacked up vertically in the diagrams that describe it. Vertical composition is important in defining the functor category. I’ll explain horizontal composition shortly.


The functor category between categories C and D is written as [C, D] or DC. This latter notation suggests that a functor category itself might be considered a function object (or an exponential) in some other category. Is this indeed the case?

Let’s have a look at the hierarchy of abstractions that we’ve been building so far. We started with a category, which is a collection of objects and morphisms. Categories themselves (or, strictly speaking small categories, whose objects form sets) are themselves objects in a higher-level category Cat. Morphisms in that category are functors. A Hom-set in Cat is a set of functors. For instance Cat(C, D) is a set of functors between two categories C and D.


A functor category [C, D] is also a set of functors between two categories (plus natural transformations as morphisms). Its objects are the same as the members of Cat(C, D). Moreover, a functor category, being a category, must itself be an object of Cat (it so happens that the functor category between two small categories is itself small). We have a relationship between a Hom-set in a category and an object in the same category. The situation is exactly like the exponential object that we’ve seen in the last section. Let’s see how we can construct the latter in Cat.

As you may remember, in order to construct an exponential, we need to first define a product. In Cat, this turns out to be relatively easy, because small categories are sets of objects, and we know how to define cartesian products of sets. So an object in a product category C × D is just a pair of objects, (c, d), one from C and one from D. Similarly, a morphism between two such pairs, (c, d) and (c', d'), is a pair of morphisms, (f, g), where f :: c -> c' and g :: d -> d'. These pairs of morphisms compose component-wise, and there is always an identity pair that is just a pair of identity morphisms. To make the long story short, Cat is a full-blown cartesian closed category in which there is an exponential object DC for any pair of categories. And by “object” in Cat I mean a category, so DC is a category, which we can identify with the functor category between C and D.


With that out of the way, let’s have a closer look at Cat. By definition, any Hom-set in Cat is a set of functors. But, as we have seen, functors between two objects have a richer structure than just a set. They form a category, with natural transformations acting as morphisms. Since functors are considered morphisms in Cat, natural transformations are morphisms between morphisms.

This richer structure is an example of a 2-category, a generalization of a category where, besides objects and morphisms (which might be called 1-morphisms in this context), there are also 2-morphisms, which are morphisms between morphisms.

In the case of Cat seen as a 2-category we have:

  • Objects: (Small) categories
  • 1-morphisms: Functors between categories
  • 2-morphisms: Natural transformations between functors.

Instead of a Hom-set between two categories C and D, we have a Hom-category — the functor category DC. We have regular functor composition: a functor F from DC composes with a functor G from ED to give G ∘ F from EC. But we also have composition inside each Hom-category — vertical composition of natural transformations, or 2-morphisms, between functors.


With two kinds of composition in a 2-category, the question arises: How do they interact with each other?

Let’s pick two functors, or 1-morphisms, in Cat:

F :: C -> D
G :: D -> E

and their composition:

G ∘ F :: C -> E

Suppose we have two natural transformations, α and β, that act, respectively, on functors F and G:

α :: F -> F'
β :: G -> G'


Notice that we cannot apply vertical composition to this pair, because the target on α is different from the source of β. In fact they are members of two different functor categories: F’ F and G’ G. We can, however, apply composition to the functors F’ and G’, because the target of F’ is the source of G’ — it’s the category D. What’s the relation between the functors G’∘ F’ and G ∘ F?

Having α and β at our disposal, can we define a natural transformation from G ∘ F to G’∘ F’? Let me sketch the construction.


As usual, we start with an object a in C. Its image splits into two objects in D: F a and F'a. There is also a morphism, a component of α, connecting these two objects:

αa :: F a -> F'a

When going from D to E, these two objects split further into four objects:

G (F a), G'(F a), G (F'a), G'(F'a)

We also have four morphisms forming a square. Two of these morphisms are the components of the natural transformation β:

βF a :: G (F a) -> G'(F a)
βF'a :: G (F'a) -> G'(F'a)

The other two are the images of αa under the two functors (functors map morphisms):

G αa :: G (F a) -> G (F'a)
G'αa :: G'(F a) -> G'(F'a)

That’s a lot of morphisms. Our goal is to find a morphism that goes from G (F a) to G'(F'a), a candidate for the component of a natural transformation connecting the two functors G ∘ F and G’∘ F’. In fact there’s not one but two paths we can take from G (F a) to G'(F'a):

G'αa ∘ βF a
βF'a ∘ G αa

Luckily for us, they are equal, because the square we have formed turns out to be the naturality square for β.

We have just defined a component of a natural transformation from G ∘ F to G’∘ F’. The proof of naturality for this transformation is pretty straightforward, provided you have enough patience.

We call this natural transformation the horizontal composition of α and β:

β ∘ α :: G ∘ F -> G'∘ F'

Again, following Mac Lane I use the small circle for horizontal composition, although you may also encounter star in its place.

Here’s a categorical rule of thumb: Every time you have composition, you should look for a category. We have vertical composition of natural transformations, and it’s part of the functor category. But what about the horizontal composition? What category does that live in?

The way to figure this out is to look at Cat sideways. Look at natural transformations not as arrows between functors but as arrows between categories. A natural transformation sits between two categories, the ones that are connected by the functors it transforms. We can think of it as connecting these two categories.


Let’s focus on two objects of Cat — categories C and D. There is a set of natural transformations that go between functors that connect C to D. These natural transformations are our new arrows from C to D. By the same token, there are natural transformations going between functors that connect D to E, which we can treat as new arrows going from D to E. Horizontal composition is the composition of these arrows.

We also have an identity arrow going from C to C. It’s the identity natural transformation that maps the identity functor on C to itself. Notice that the identity for horizontal composition is also the identity for vertical composition, but not vice versa.

Finally, the two compositions satisfy the interchange law:

(β' ⋅ α') ∘ (β ⋅ α) = (β' ∘ β) ⋅ (α' ∘ α)

I will quote Saunders Mac Lane here: The reader may enjoy writing down the evident diagrams needed to prove this fact.

There is one more piece of notation that might come in handy in the future. In this new sideways interpretation of Cat there are two ways of getting from object to object: using a functor or using a natural transformation. We can, however, re-interpret the functor arrow as a special kind of natural transformation: the identity natural transformation acting on this functor. So you’ll often see this notation:

F ∘ α

where F is a functor from D to E, and α is a natural transformation between two functors going from C to D. Since you can’t compose a functor with a natural transformation, this is interpreted as a horizontal composition of the identity natural transformation 1F after α.


α ∘ F

is a horizontal composition of α after 1F.


This concludes the first part of the book. We’ve learned the basic vocabulary of category theory. You may think of objects and categories as nouns; and morphisms, functors, and natural transformations as verbs. Morphisms connect objects, functors connect categories, natural transformations connect functors.

But we’ve also seen that, what appears as an action at one level of abstraction, becomes an object at the next level. A set of morphisms turns into a function object. As an object, it can be a source or a target of another morphism. That’s the idea behind higher order functions.

A functor maps objects to objects, so we can use it as a type constructor, or a parametric type. A functor also maps morphisms, so it is a higher order function — fmap. There are some simple functors, like Const, product, and coproduct, that can be used to generate a large variety of algebraic data types. Function types are also functorial, both covariant and contravariant, and can be used to extend algebraic data types.

Functors may be looked upon as objects in the functor category. As such, they become sources and targets of morphisms: natural transformations. A natural transformation is a special type of polymorphic function.


  1. Define a natural transformation from the Maybe functor to the list functor. Prove the naturality condition for it.
  2. Define at least two different natural transformations between Reader () and the list functor. How many different lists of () are there?
  3. Continue the previous exercise with Reader Bool and Maybe.
  4. Show that horizontal composition of natural transformation satisfies the naturality condition (hint: use components). It’s a good exercise in diagram chasing.
  5. Write a short essay about how you may enjoy writing down the evident diagrams needed to prove the interchange law.
  6. Create a few test cases for the opposite naturality condition of transformations between different Op functors. Here’s one choice:
    op :: Op Bool Int
    op = Op (\x -> x > 0)


    f :: String -> Int
    f x = read x

Next: Declarative Programming.


I’d like to thank Gershom Bazerman for checking my math and logic, and André van Meulebrouck, who has been volunteering his editing help.