Functional Programming



This is part 3 of the miniseries about solving a simple constraint-satisfaction problem:

  s e n d
+ m o r e
---------
m o n e y

using monads in C++. Previously: The State Monad

A constraint satisfaction problem may be solved by brute force, by trying all possible substitutions. This may be done using the imperative approach with loops, or a declarative approach with the list monad. The list monad works by fanning out a list of “alternative universes,” one for each individual substitution. It lets us explore the breadth of the solution space. The universes in which the substitution passes all the constraints are then gathered together to form the solution.

We noticed also that, as we are traversing the depth of the solution space, we could avoid testing unnecessary branches by keeping track of some state. Indeed, we start with a list of 10 possibilities for the first character, but there are only 9 possibilities for the second character, 8 for the third, and so on. It makes sense to start with a list of ten digits, and keep removing digits as we make our choices. At every point, the list of remaining choices is the state we have to keep track of.

Again, there is the imperative approach and the functional approach to state. The imperative approach makes the state mutable and the composition of actions requires backtracking. The functional approach uses persistent, immutable data structures, and the composition is done at the level of functions — plans of action that can be executed once the state is available. These plans of action form the state monad.

The functional solution to our problem involves the combination of the list monad and the state monad. Mashing two monads together is not trivial — in Haskell this is done using monad transformers — but here I’ll show you how to do it manually.

State List

Let’s start with the state, which is the list of digits to choose from:

using State = List<int>;

In the state monad, our basic data structure was a function that took a state and returned a pair: a value and a new state. Now we want our functions to generate, as in quantum mechanics, a whole list of alternative universes, each described by a pair: a value and a state.

Notice that we have to vary both the value and the state within the list. For instance, the function that selects a number from a list should produce a different number and a different list in each of the alternative universes. If the choice is, say, 3, the new state will have 3 removed from the list.

ListSelections

Here’s the convenient generic definition of the list of pairs:

template<class A>
using PairList = List<pair<A, State>>;

And this is our data structure that represents a non-deterministic plan of action:

template<class A> 
using StateList = function<PairList<A>(State)>;

Once we have such a non-deterministic plan of action, we can run it. We give it a state and get back a list of results, each paired with a new state:

template<class A>
PairList<A> runStateList(StateList<A> st, State s)
{
    return st(s);
}

Composing Non-Deterministic Plans

As we’ve seen before, the essence of every monad is the composition of individual actions. The higher-order function mbind that does the composition has the same structure for every monad. It takes two arguments. One argument is the result of our previous activity. For the list monad, it was a list of values. For the state monad it was a plan of action to produce a value (paired with the leftover state). For the combination state-list monad it will be a plan of action to produce a list of values (each combined with the leftover state).

The second argument to mbind is a continuation. It takes a single value. It’s supposed to be the value that is encapsulated in the first argument. In the case of the list monad, it was one value from the list. In the case of the state monad, it was the value that would be returned by the execution of the plan. In the combined state-list monad, this will be one of the values produced by the plan.

The continuation is supposed to produce the final result, be it a list of values or a plan. In the state-list monad it will be a plan to produce a list of values, each with its own leftover state.

Finally, the result of mbind is the composite: for the list monad it was a list, for the state monad it was a plan, and for the combination, it will be a plan to produce a list of values and states.

We could write the signature of mbind simply as:

template<class A, class B>
StateList<B> mbind(StateList<A>, function<StateList<B>(A)>)

but that would prevent the C++ compiler from making automatic type deductions. So instead we parameterize it with the type A and a function type F. That requires some additional footwork to extract the return type from the type of F:

template<class A, class F>
auto mbind(StateList<A> g, F k) 
    -> decltype(k(g(State()).front().first))

The implementation of mbind is a combination of what we did for the list monad and for the state monad. To begin with, we are supposed to return a plan, so we have to create a lambda. This lambda takes a state s as an argument. Inside the lambda, given the state, we can run the first argument — the plan. We get a list of pairs. Each pair contains a value and a new state. We want to execute the continuation k for each of these values.

To do that, we run fmap over this list (think std::transform, but less messy). Inside fmap, we disassemble each pair into value and state, and execute the continuation k over the value. The result is a new plan. We run this plan, passing it the new state. The result is a list of pairs: value, state.

Since each item in the list produces a list, the result of fmap is a list of lists. Just like we did in the list monad, we concatenate all these lists into one big list using concatAll.

template<class A, class F>
auto mbind(StateList<A> g, F k) 
    -> decltype(k(g(State()).front().first))
{
    return [g, k](State s) {
        PairList<A> plst = g(s);
        // List<PairList<B>> 
        auto lst2 = fmap([k](pair<A, State> const & p) {
            A a = p.first;
            State s1 = p.second;
            auto ka = k(a);
            auto result = runStateList(ka, s1);
            return result;
        }, plst);
        return concatAll(lst2);
    };
}

To make state-list into a full-blown monad we need one more ingredient: the function mreturn that turns any value into a trivial plan to produce a singleton list with that value. The state is just piped through:

template<class A>
StateList<A> mreturn(A a)
{
    return [a](State s) {
        // singleton list
        return PairList<A>(make_pair(a, s)); 
    };
}

So that would be all for the state-list monad, except that in C++ we have to special-case mbind for the type of continuation that takes a void argument (there is no value of type void in C++). We’ll call this version mthen:

template<class A, class F>
auto mthen(StateList<A> g, F k) -> decltype(k())
{
    return [g, k](State s) {
        PairList<A> plst = g(s);
        auto lst2 = fmap([k](pair<A, State> const & p) {
            State s1 = p.second;
            auto ka = k();
            auto result = runStateList(ka, s1);
            return result;
        }, plst);
        return concatAll(lst2);
    };
}

Notice that, even though the value resulting from running the first argument to mthen is discarded, we still have to run it because we need the modified state. In a sense, we are running it only “for side effects,” although all these functions are pure and have no real side effects. Once again, we can have our purity cake and eat it too, simulating side effects with pure functions.

Guards

Every time we generate a candidate substitution, we have to check if it satisfies our constraints. That would be easy if we could access the substitution. But we are in a rather peculiar situation: We are writing code that deals with plans to produce substitutions. How do you test a plan? Obviously we need to write a plan to do the testing (haven’t we heard that before?).

Here’s the trick: Look at the implementation of mthen (or mbind for that matter). When is the continuation not executed? The continuation is not executed if the list passed to fmap is empty. Not executing the continuation is equivalent to aborting the computation. And what does mthen return in that case? An empty list!

So the way to abort a computation is to pass an argument to mthen that produces an empty list. Such a poison pill of a plan is produced by a function traditionally called mzero.

template<class A>
StateList<A> mzero()
{
    return[](State s) {
        return PairList<A>(); // empty list
    };
}

In functional programming, a monad that supports this functionality is called a “monad plus” (it also has a function called mplus). It so happens that the list monad and the state-list monad are both plus.

Now we need a function that produces a poison pill conditionally, so we can pass the result of this function to mthen. Remember, mthen ignores the individual results, but is sensitive to the size of the list. The function guard, on success, produces a discardable value — here, a nullptr — in a singleton list. This will trigger the execution of the continuation in mthen, while discarding the nullptr. On failure, guard produces mzero, which will abort the computation.

StateList<void*> guard(bool b)
{
    if (b) {
        return [](State s) {
            // singleton list with discardable value
            return List<pair<void*, State>>(make_pair(nullptr, s));
        };
    }
    else
        return mzero<void*>(); // empty list
}

The Client Side

Everything I’ve described so far is reusable code that should be put in a library. Equipped with the state-list monad library, all the client needs to do is to write a few utility functions and combine them into a solution.

Here’s the function that picks a number from a list. Actually, it picks a number from a list in all possible ways. It returns a list of all picks paired with the remainders of the list. This function is our non-deterministic selection plan.

template<class A>
PairList<A> select(List<A> lst)
{
    if (lst.isEmpty())
        return PairList<A>();

    A       x  = lst.front();
    List<A> xs = lst.popped_front();

    auto result = List<pair<A, State>>();
    forEach(select(xs), [x, &result](pair<A, List<A>> const & p)
    {
        A       y  = p.first;
        List<A> ys = p.second;
        auto y_xys = make_pair(y, ys.pushed_front(x));
        result = result.pushed_front(y_xys);
    });

    return result.pushed_front(make_pair(x, xs));
}

Here’s a little utility function that converts a list (actually, a vector) of digits to a number:

int asNumber(vector<int> const & v)
{
    int acc = 0;
    for (auto i : v)
        acc = 10 * acc + i;
    return acc;
}

And here’s the final solution to our puzzle:

StateList<tuple<int, int, int>> solve()
{
    StateList<int> sel = &select<int>;

    return mbind(sel, [=](int s) {
    return mbind(sel, [=](int e) {
    return mbind(sel, [=](int n) {
    return mbind(sel, [=](int d) {
    return mbind(sel, [=](int m) {
    return mbind(sel, [=](int o) {
    return mbind(sel, [=](int r) {
    return mbind(sel, [=](int y) {
        return mthen(guard(s != 0 && m != 0), [=]() {
            int send  = asNumber(vector<int>{s, e, n, d});
            int more  = asNumber(vector<int>{m, o, r, e});
            int money = asNumber(vector<int>{m, o, n, e, y});
            return mthen(guard(send + more == money), [=]() {
                return mreturn(make_tuple(send, more, money));
            });
        });
    });});});});});});});});
}

It starts by creating the basic plan called sel for picking numbers from a list. (The list is the state that will be provided later.) It binds this selection to a very large continuation that produces the final result — the plan to produce triples of numbers.

Here, sel is really a plan to produce a list, but the continuation expects just one number, s. If you look at the implementation of mbind, you’ll see that this continuation is called for every single pick, and the results are aggregated into one list.

The large continuation that is passed to the first mbind is itself implemented using mbind. This one, again, takes the plan sel. But we know that the state on which this sel will operate is one element shorter than the one before it. This second selection is passed to the next continuation, and so on.

Then the pruning begins: There is an mthen that takes a guard that aborts all computations where either s or m is zero. Then three numbers are formed from the selected digits. Notice that all those digits are captured by the enclosing lambdas by value (the [=] clauses). Yet another mthen takes a guard that checks the arithmetics and, if that one passes, the final plan to produce a singleton triple (send, more, money) is returned.

In theory, all these singletons would be concatenated on the way up to produce a list of solutions, but in reality this puzzle has only one solution. To get this solution, you run the plan with the list of digits:

List<int> lst{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
cout << evalStateList(solve(), lst);

The function evalStateList works the same way as runStateList except that it discards the leftover state. You can either implement it or use runStateList instead.

What’s Next?

One of the advantages of functional programming (other than thread safety) is that it makes refactoring really easy. Did you notice that 8 lines of code in our solution are almost identical? That’s outrageous! We have to do something about it. And while we’re at it, isn’t it obvious that a substitution should be represented as a map from characters to digits? I’ll show you how to do it in the next installment.

All this code and more is available on github.


This is the second part of the cycle “Using Monads in C++ to Solve Constraints” in which I’m illustrating functional techniques using the example of a simple puzzle:

  s e n d
+ m o r e
---------
m o n e y

Previously, I talked about using the list monad to search the breadth of the solution space.

What would you do if you won a lottery? Would you buy a sports car, drop your current job, and go on a trip across the United States? Maybe you would start your own company, multiply the money, and then buy a private jet?

We all like making plans, but they are often contingent on the state of our finances. Such plans can be described by functions. For instance, the car-buying plan is a function:

pair<Car, Cash> buyCar(Cash cashIn)

The input is some amount of Cash, and the output is a brand new Car and the leftover Cash (not necessarily a positive number!).

In general, a financial plan is a function that takes cash and returns the result paired with the new value of cash. It can be described generically using a template:

template<class A>
using Plan = function<pair<A, Cash>(Cash)>;

You can combine smaller plans to make bigger plans. For instance, you may use the leftover cash from your car purchase to finance your trip, or invest in a business, and so on.

There are some things that you already own, and you can trivially include them in your plans:

template<class A>
Plan<A> got_it(A a)
{
    return [a](Cash s) { return make_pair(a, s); };
}

What does all this daydreaming have to do with the solution to our puzzle? I mentioned previously that we needed to keep track of state, and this is how functional programmers deal with state. Instead of relying on side effects to silently modify the state, they write code that generates plans of action.

An imperative programmer, on the other hand, might implement the car-buying procedure by passing it a bank object, and the withdrawal of money would be a side effect of the purchase. Or, the horror!, the bank object could be global.

In functional programming, each individual plan is a function: The state comes in, and the new state goes out, paired with whatever value the function was supposed to produce in the first place. These little plans are aggregated into bigger plans. Finally, the master plan is executed — that’s when the actual state is passed in and the result comes out together with a new state. We can do the same in modern C++ using lambdas.

You might be familiar with a similar technique used with expression templates in C++. Expression templates form the basis of efficient implementation of matrix calculus, where expressions involving matrices and vectors are not evaluated on the spot but rather converted to parse trees. These trees may then be evaluated using more efficient techniques. You can think of an expression template as a plan for evaluating the final result.

The State Monad

SelectionsWithLists

To find the solution to our puzzle, we will generate substitutions by picking digits from a list of integers. This list of integers will be our state.

using State = List<int>;

We will be using a persistent list, so we won’t have to worry about backtracking. Persistent lists are never mutated — all their versions persist, and we can go back to them without fearing that they may have changed. We’ll need that when we combine our state calculations with the list monad to get the final solution. For now, we’ll just consider one substitution.

We’ll be making and combining all kinds of plans that rely on, and produce new state:

template<class A>
using Plan = function<pair<A, State>(State)>;

We can always run a plan, provided we have the state available:

template<class A>
pair<A, State> runPlan(Plan<A> pl, State s)
{
    return pl(s);
}

You may recall from the previous post that the essence of every monad is the ability to compose smaller items into bigger ones. In the case of the state monad we need the ability to compose plans of action.

For instance, suppose that you know how to generate a plan to travel across the United States on a budget, as long as you have a car. You don’t have a car though. Not to worry, you have a plan to get a car. It should be easy to generate a composite plan that involves buying the car and then continuing with your trip.

Notice the two ingredients: one is the plan to buy a car: Plan<Car>. The second ingredient is a function that takes a car and produces the plan for your trip, Plan<Trip>. This function is the continuation that leads to your final goal: it’s a function of the type Plan<Trip>(Car). And the continuation itself may in turn be a composite of many smaller plans.

So here’s the function mbind that binds a plan pl to a continuation k. The continuation uses the output of the plan pl to generate another plan. The function mbind is supposed to return a new composite plan, so it must return a lambda. Like any other plan, this lambda takes a state and returns a pair: value, state. We’ll implement this lambda for the most general case.

The logic is simple. Inside the lambda we will have the state available, so we can run the plan pl. We get back a pair: the value of type A and the new state. We pass this value to the continuation k and get back a new plan. Finally, we run that plan with the new state and we’re done.

template<class A, class F>
auto mbind(Plan<A> pl, F k) -> decltype(k(pl(State()).first))
{
    using B = decltype(k(pl(State()).first)(State()).first);
    // this should really be done with concepts
    static_assert(std::is_convertible<
        F, std::function<Plan<B>(A)>> ::value,
        "mbind requires a function type Plan<B>(A)");

    return [pl, k](State s) {
        pair<A, State> ps = runPlan(pl, s);
        Plan<B> plB = k(ps.first);
        return runPlan(plB, ps.second); // new state!
    };
}

Notice that all this running of plans inside mbind doesn’t happen immediately. It happens when the lambda is executed, which is when the larger plan is run (possibly as part of an even larger plan). So what mbind does is to create a new plan to be executed in the future.

And, as with every monad, there is a function that takes regular input and turns it into a trivial plan. I called it got_it before, but a more common name would be mreturn:

template<class A>
Plan<A> mreturn(A a)
{
    return [a](State s) { return make_pair(a, s); };
}

The state monad usually comes with two helper functions. The function getState gives direct access to the state by turning it into the return value:

Plan<State> getState()
{
    return [](State s) { return make_pair(s, s); };
}

Using getState you may examine the state when the plan is running, and dynamically select different branches of your plan. This makes monads very flexible, but it also makes the composition of different monads harder. We’ll see this in the next installment, when we compose the state monad with the list monad.

The second helper function is used to modify (completely replace) the state.

Plan<void*> putState(State newState)
{
    return [=](State s) { return make_pair(nullptr, newState); };
}

It doesn’t evaluate anything useful, so its return type is void* and the returned value is nullptr. Its only purpose is to encapsulate a side effect. What’s amazing is that you can do it and still preserve functional purity. There’s no better example of having your cake and eating it too.

Example

To demonstrate the state monad in action, let’s implement a tiny example. We start with a small plan that just picks the first number from a list (the list will be our state):

pair<int, List<int>> select(List<int> lst)
{
    int i = lst.front();
    return make_pair(i, lst.popped_front());
}

Persistent list method popped_front returns a list with its first element popped off. Typical of persistent data structures, this method doesn’t modify the original list. It doesn’t copy the list either — it just creates a pointer to its tail.

Here’s our first plan:

Plan<int> sel = &select;

Now we can create a more complex plan to produce a pair of integers:

Plan<pair<int, int>> pl = 
    mbind(sel, [=](int i) { return
    mbind(sel, [=](int j) { return
    mreturn(make_pair(i, j));
}); });

Let’s analyze this code. The first mbind takes a plan sel that selects the first element from a list (the list will be provided later, when we execute this plan). It binds it to the continuation (whose body I have grayed out) that takes the selected integer i and produces a plan to make a pair of integers. Here’s this continuation:

[=](int i) { return
    mbind(sel, [=](int j) { return
    mreturn(make_pair(i, j));
}); });

It binds the plan sel to another, smaller continuation that takes the selected element j and produces a plan to make a pair of integers. Here’s this smaller continuation:

[=](int j) { return
    mreturn(make_pair(i, j));
});

It combines the first integer i that was captured by the lambda, with the second integer j that is the argument to the lambda, and creates a trivial plan that returns a pair:

mreturn(make_pair(i, j));

Notice that we are using the same plan sel twice. But when this plan is executed inside of our final plan, it will return two different elements from the input list. When mbind is run, it first passes the state (the list of integers) to the first sel. It gets back the modified state — the list with the first element popped. It then uses this shorter list to execute the plan produced by the continuation. So the second sel gets the shortened list, and picks its first element, which is the second element of the original list. It, too, shortens the list, which is then passed to mreturn, which doesn’t modify it any more.

We can now run the final plan by giving it a list to pick from:

List<int> st{ 1, 2, 3 };
cout << runPlan(pl, st) << endl;

We are still not ready to solve the puzzle, but we are much closer. All we need is to combine the list monad with the state monad. I’m leaving this task for the next installment.

But here’s another look at the final solution:

StateL<tuple<int, int, int>> solve()
{
    StateL<int> sel = &select<int>;

    return mbind(sel, [=](int s) {
    return mbind(sel, [=](int e) {
    return mbind(sel, [=](int n) {
    return mbind(sel, [=](int d) {
    return mbind(sel, [=](int m) {
    return mbind(sel, [=](int o) {
    return mbind(sel, [=](int r) {
    return mbind(sel, [=](int y) {
        return mthen(guard(s != 0 && m != 0), [=]() {
            int send  = asNumber(vector<int>{s, e, n, d});
            int more  = asNumber(vector<int>{m, o, r, e});
            int money = asNumber(vector<int>{m, o, n, e, y});
            return mthen(guard(send + more == money), [=]() {
                return mreturn(make_tuple(send, more, money));
            });
        });
    });});});});});});});});
}

This time I didn’t rename mbind to for_each and mreturn to yield.

Now that you’re familiar with the state monad, you may appreciate the fact that the same sel passed as the argument to mbind will yield a different number every time, as long as the state list contains unique digits.

Before You Post a Comment

I know what you’re thinking: Why would I want to complicate my life with monads if there is a much simpler, imperative way of dealing with state? What’s the payoff of the functional approach? The immediate payoff is thread safety. In imperative programming mutable shared state is a never-ending source of concurrency bugs. The state monad and the use of persistent data structures eliminates the possibility of data races. And it does it without any need for synchronization (other than reference counting inside shared pointers).

I will freely admit that C++ is not the best language for functional programming, and that monadic code in C++ looks overly complicated. But, let’s face it, in C++ everything looks overly complicated. What I’m describing here are implementation details of something that should be encapsulated in an easy to use library. I’ll come back to it in the next installment of this mini-series.

Challenges

  1. Implement select from the example in the text using getState and putState.
  2. Implement evalPlan, a version of runPlan that only returns the final value, without the state.
  3. Implement mthen — a version of mbind where the continuation doesn’t take any arguments. It ignores the result of the Plan, which is the first argument to mthen (but it still runs it, and uses the modified state).
  4. Use the state monad to build a simple RPN calculator. The state is the stack (a List) of items:
    enum ItemType { Plus, Minus, Num };
    
    struct Item
    {
        ItemType _type;
        int _num;
        Item(int i) : _type(Num), _num(i) {}
        Item(ItemType t) : _type(t), _num(-1) {}
    };

    Implement the function calc() that creates a plan to evaluate an integer. Here’s the test code that should print -1:

    List<int> stack{ Item(Plus)
                   , Item(Minus)
                   , Item(4)
                   , Item(8)
                   , Item(3) };
    cout << evalPlan(calc(), stack) << endl;

    (The solution is available on github.)


I am sometimes asked by C++ programmers to give an example of a problem that can’t be solved without monads. This is the wrong kind of question — it’s like asking if there is a problem that can’t be solved without for loops. Obviously, if your language supports a goto, you can live without for loops. What monads (and for loops) can do for you is to help you structure your code. The use of loops and if statements lets you convert spaghetti code into structured code. Similarly, the use of monads lets you convert imperative code into declarative code. These are the kind of transformations that make code easier to write, understand, maintain, and generalize.

So here’s a problem that you may get as an interview question. It’s a small problem, so the advantages of various approaches might not be immediately obvious, especially if you’ve been trained all your life in imperative programming, and you are seeing monads for the first time.

You’re supposed write a program to solve this puzzle:

  s e n d
+ m o r e
---------
m o n e y

Each letter correspond to a different digit between 0 and 9. Before you continue reading this post, try to think about how you would approach this problem.

The Analysis

It never hurts to impress your interviewer with your general knowledge by correctly classifying the problem. This one belongs to the class of “constraint satisfaction problems.” The obvious constraint is that the numbers obtained by substituting letters with digits have to add up correctly. There are also some less obvious constraints, namely the numbers should not start with zero.

If you were to solve this problem using pencil and paper, you would probably come up with lots of heuristics. For instance, you would deduce that m must stand for 1 because that’s the largest possible carry from the addition of two digits (even if there is a carry from the previous column). Then you’d figure out that s must be either 8 or 9 to produce this carry, and so on. Given enough time, you could probably write an expert system with a large set of rules that could solve this and similar problems. (Mentioning an expert system could earn you extra points with the interviewer.)

However, the small size of the problem suggests that a simple brute force approach is probably best. The interviewer might ask you to estimate the number of possible substitutions, which is 10!/(10 – 8)! or roughly 2 million. That’s not a lot. So, really, the solution boils down to generating all those substitutions and testing the constraints for each.

The Straightforward Solution

The mind of an imperative programmer immediately sees the solution as a set of 8 nested loops (there are 8 unique letters in the problem: s, e, n, d, m, o, r, y). Something like this:

for (int s = 0; s < 10; ++s)
    for (int e = 0; e < 10; ++e)
        for (int n = 0; n < 10; ++n)
            for (int d = 0; d < 10; ++d)
                ...

and so on, until y. But then there is the condition that the digits have to be different, so you have to insert a bunch of tests like:

e != s
n != s && n != e
d != s && d != e && d != n

and so on, the last one involving 7 inequalities… Effectively you have replaced the uniqueness condition with 28 new constraints.

This would probably get you through the interview at Microsoft, Google, or Facebook, but really, can’t you do better than that?

The Smart Solution

Before I proceed, I should mention that what follows is almost a direct translation of a Haskell program from the blog post by Justin Le. I strongly encourage everybody to learn some Haskell, but in the meanwhile I’ll be happy to serve as your translator.

The problem with our naive solution is the 28 additional constraints. Well, I guess one could live with that — except that this is just a tiny example of a whole range of constraint satisfaction problems, and it makes sense to figure out a more general approach.

The problem can actually be formulated as a superposition of two separate concerns. One deals with the depth and the other with the breadth of the search for solutions.

Let me touch on the depth issue first. Let’s consider the problem of creating just one substitution of letters with numbers. This could be described as picking 8 digits from a list of 0, 1, …9, one at a time. Once a digit is picked, it’s no longer in the list. We don’t want to hard code the list, so we’ll make it a parameter to our algorithm. Notice that this approach works even if the list contains duplicates, or if the list elements are not easily comparable for equality (for instance, if they are futures). We’ll discuss the list-picking part of the problem in more detail later.

Now let’s talk about breadth: we have to repeat the above process for all possible picks. This is what the 8 nested loops were doing. Except that now we are in trouble because each individual pick is destructive. It removes items from the list — it mutates the list. This is a well known problem when searching through solution spaces, and the standard remedy is called backtracking. Once you have processed a particular candidate, you put the elements back in the list, and try the next one. Which means that you have to keep track of your state, either implicitly on your function’s stack, or in a separate explicit data structure.

Wait a moment! Weren’t we supposed to talk about functional programming? So what’s all this talk about mutation and state? Well, who said you can’t have state in functional programming? Functional programmers have been using the state monad since time immemorial. And mutation is not an issue if you’re using persistent data structures. So fasten your seat belts and make sure your folding trays are in the upright position.

The List Monad

We’ll start with a small refresher in quantum mechanics. As you may remember from school, quantum processes are non-deterministic. You may repeat the same experiment many times and every time get a different result. There is a very interesting view of quantum mechanics called the many-worlds interpretation, in which every experiment gives rise to multiple alternate histories. So if the spin of an electron may be measured as either up or down, there will be one universe in which it’s up, and one in which it’s down.

Selections

We’ll use the same approach to solving our puzzle. We’ll create an alternate universe for each digit substitution for a given letter. So we’ll start with 10 universes for the letter s; then we’ll split each of them into ten universes for the letter e, and so on. Of course, most of these universes won’t yield the desired result, so we’ll have to destroy them. I know, it seems kind of wasteful, but in functional programming it’s easy come, easy go. The creation of a new universe is relatively cheap. That’s because new universes are not that different from their parent universes, and they can share almost all of their data. That’s the idea behind persistent data structures. These are the immutable data structures that are “mutated” by cloning. A cloned data structure shares most of its implementation with the parent, except for a small delta. We’ll be using persistent lists described in my earlier post.

Once you internalize the many-worlds approach to programming, the implementation is pretty straightforward. First, we need functions that generate new worlds. Since we are cheap, we’ll only generate the parts that are different. So what’s the difference between all the worlds that we get when selecting the substitution for the letter s? Just the number that we assign to s. There are ten worlds corresponding to the ten possible digits (we’ll deal with the constraints like s being different from zero later). So all we need is a function that generates a list of ten digits. These are our ten universes in a nutshell. They share everything else.

Once you are in an alternate universe, you have to continue with your life. In functional programming, the rest of your life is just a function called a continuation. I know it sounds like a horrible simplification. All your actions, emotions, and hopes reduced to just one function. Well, maybe the continuation just describes one aspect of your life, the computational part, and you can still hold on to our emotions.

So what do our lives look like, and what do they produce? The input is the universe we’re in, in particular the one number that was picked for us. But since we live in a quantum universe, the outcome is a multitude of universes. So a continuation takes a number, and produces a list. It doesn’t have to be a list of numbers, just a list of whatever characterizes the differences between alternate universes. In particular, it could be a list of different solutions to our puzzle — triples of numbers corresponding to “send”, “more”, and “money”. (There is actually only one solution, but that’s beside the point.)

And what’s the very essence of this new approach? It’s the binding of the selection of the universes to the continuation. That’s where the action is. This binding, again, can be expressed as a function. It’s a function that takes a list of universes and a continuation that produces a list of universes. It returns an even bigger list of universes. We’ll call this function for_each, and we’ll make it as generic as possible. We won’t assume anything about the type of the universes that are passed in, or the type of the universes that the continuation k produces. We’ll also make the type of the continuation a template parameter and extract the return type from it using auto and decltype:

template<class A, class F>
auto for_each(List<A> lst, F k) -> decltype(k(lst.front()))
{
    using B = decltype(k(lst.front()).front());
    // This should really be expressed using concepts
    static_assert(std::is_convertible<
        F, std::function<List<B>(A)>>::value,
        "for_each requires a function type List<B>(A)");

    List<List<B>> lstLst = fmap(k, lst);
    return concatAll(lstLst);
}

The function fmap is similar to std::transform. It applies the continuation k to every element of the list lst. Because k itself produces a list, the result is a list of lists, lstLst. The function concatAll concatenates all those lists into one big list.

Congratulations! You have just seen a monad. This one is called the list monad and it’s used to model non-deterministic processes. The monad is actually defined by two functions. One of them is for_each, and here’s the other one:

template<class A>
List<A> yield(A a)
{
    return List<A> (a);
}

It’s a function that returns a singleton list. We use yield when we are done multiplying universes and we just want to return a single value. We use it to create a single-valued continuation. It represents the lonesome boring life, devoid of any choices.

I will later rename these functions to mbind and mreturn, because they are part of any monad, not just the list monad.

The names like for_each or yield have a very imperative ring to them. That’s because, in functional programming, monadic code plays a role similar to imperative code. But neither for_each nor yield are control structures — they are functions. In particular for_each, which sounds and works like a loop, is just a higher order function; and so is fmap, which is used in its implementation. Of course, at some level the code becomes imperative — fmap can either be implemented recursively or using an actual loop — but the top levels are just declarations of functions. Hence, declarative programming.

There is a slight difference between a loop and a function on lists like for_each: for_each takes a whole list as an argument, while a loop might generate individual items — in this case integers — on the fly. This is not a problem in a lazy functional language like Haskell, where a list is evaluated on demand. The same behavior may be implemented in C++ using streams or lazy ranges. I won’t use it here, since the lists we are dealing with are short, but you can read more about it in my earlier post Getting Lazy with C++.

We are not ready yet to implement the solution to our puzzle, but I’d like to give you a glimpse of what it looks like. For now, think of StateL as just a list. See if it starts making sense (I grayed out the usual C++ noise):

StateL<tuple<int, int, int>> solve()
{
    StateL<int> sel = &select<int>;

    return for_each(sel, [=](int s) {
    return for_each(sel, [=](int e) {
    return for_each(sel, [=](int n) {
    return for_each(sel, [=](int d) {
    return for_each(sel, [=](int m) {
    return for_each(sel, [=](int o) {
    return for_each(sel, [=](int r) {
    return for_each(sel, [=](int y) {
        return yield_if(s != 0 && m != 0, [=]() {
            int send  = asNumber(vector{s, e, n, d});
            int more  = asNumber(vector{m, o, r, e});
            int money = asNumber(vector{m, o, n, e, y});
            return yield_if(send + more == money, [=]() {
                return yield(make_tuple(send, more, money));
            });
        });
    });});});});});});});});
}

The first for_each takes a selection of integers, sel, (never mind how we deal with uniqueness); and a continuation, a lambda, that takes one integer, s, and produces a list of solutions — tuples of three integers. This continuation, in turn, calls for_each with a selection for the next letter, e, and another continuation that returns a list of solutions, and so on.

The innermost continuation is a conditional version of yield called yield_if. It checks a condition and produces a zero- or one-element list of solutions. Internally, it calls another yield_if, which calls the ultimate yield. If that final yield is called (and it might not be, if one of the previous conditions fails), it will produce a solution — a triple of numbers. If there is more than one solution, these singleton lists will get concatenated inside for_each while percolating to the top.

In the second part of this post I will come back to the problem of picking unique numbers and introduce the state monad. You can also have a peek at the code on github.

Challenges

  1. Implement for_each and yield for a vector instead of a List. Use the Standard Library transform instead of fmap.
  2. Using the list monad (or your vector monad), write a function that generates all positions on a chessboard as pairs of characters between 'a' and 'h' and numbers between 1 and 8.
  3. Implement a version of for_each (call it repeat) that takes a continuation k of the type function<List<B>()> (notice the void argument). The function repeat calls k for each element of the list lst, but it ignores the element itself.

This is part 10 of Categories for Programmers. Previously: Function Types. See the Table of Contents.

We talked about functors as mappings between categories that preserve their structure. A functor “embeds” one category in another. It may collapse multiple things into one, but it never breaks connections. One way of thinking about it is that with a functor we are modeling one category inside another. The source category serves as a model, a blueprint, for some structure that’s part of the target category.

1_Functors

There may be many ways of embedding one category in another. Sometimes they are equivalent, sometimes very different. One may collapse the whole source category into one object, another may map every object to a different object and every morphism to a different morphism. The same blueprint may be realized in many different ways. Natural transformations help us compare these realizations. They are mappings of functors — special mappings that preserve their functorial nature.

Consider two functors F and G between categories C and D. If you focus on just one object a in C, it is mapped to two objects: F a and G a. A mapping of functors should therefore map F a to G a.

2_NatComp

Notice that F a and G a are objects in the same category D. Mappings between objects in the same category should not go against the grain of the category. We don’t want to make artificial connections between objects. So it’s natural to use existing connections, namely morphisms. A natural transformation is a selection of morphisms: for every object a, it picks one morphism from F a to G a. If we call the natural transformation α, this morphism is called the component of α at a, or αa.

αa :: F a -> G a

Keep in mind that a is an object in C while αa is a morphism in D.

If, for some a, there is no morphism between F a and G a in D, there can be no natural transformation between F and G.

Of course that’s only half of the story, because functors not only map objects, they map morphisms as well. So what does a natural transformation do with those mappings? It turns out that the mapping of morphisms is fixed — under any natural transformation between F and G, F f must be transformed into G f. What’s more, the mapping of morphisms by the two functors drastically restricts the choices we have in defining a natural transformation that’s compatible with it. Consider a morphism f between two objects a and b in C. It’s mapped to two morphisms, F f and G f in D:

F f :: F a -> F b
G f :: G a -> G b

The natural transformation α provides two additional morphisms that complete the diagram in D:

αa :: F a -> G a
αb :: F b -> G b

3_Naturality

Now we have two ways of getting from F a to G b. To make sure that they are equal, we must impose the naturality condition that holds for any f:

G f ∘ αa = αb ∘ F f

The naturality condition is a pretty stringent requirement. For instance, if the morphism F f is invertible, naturality determines αb in terms of αa. It transports αa along f:

αb = (G f) ∘ αa ∘ (F f)-1

4_Transport

If there is more than one invertible morphism between two objects, all these transports have to agree. In general, though, morphisms are not invertible; but you can see that the existence of natural transformations between two functors is far from guaranteed. So the scarcity or the abundance of functors that are related by natural transformations may tell you a lot about the structure of categories between which they operate. We’ll see some examples of that when we talk about limits and the Yoneda lemma.

Looking at a natural transformation component-wise, one may say that it maps objects to morphisms. Because of the naturality condition, one may also say that it maps morphisms to commuting squares — there is one commuting naturality square in D for every morphism in C.

Naturality

This property of natural transformations comes in very handy in a lot of categorical constructions, which often include commuting diagrams. With a judicious choice of functors, a lot of these commutativity conditions may be transformed into naturality conditions. We’ll see examples of that when we get to limits, colimits, and adjunctions.

Finally, natural transformations may be used to define isomorphisms of functors. Saying that two functors are naturally isomorphic is almost like saying they are the same. Natural isomorphism is defined as a natural transformation whose components are all isomorphisms (invertible morphisms).

Polymorphic Functions

We talked about the role of functors (or, more specifically, endofunctors) in programming. They correspond to type constructors that map types to types. They also map functions to functions, and this mapping is implemented by a higher order function fmap (or transform, then, and the like in C++).

To construct a natural transformation we start with an object, here a type, a. One functor, F, maps it to the type F a. Another functor, G, maps it to G a. The component of a natural transformation alpha at a is a function from F a to G a. In pseudo-Haskell:

alphaa :: F a -> G a

A natural transformation is a polymorphic function that is defined for all types a:

alpha :: forall a . F a -> G a

The forall a is optional in Haskell (and in fact requires turning on the language extension ExplicitForAll). Normally, you would write it like this:

alpha :: F a -> G a

Keep in mind that it’s really a family of functions parameterized by a. This is another example of the terseness of the Haskell syntax. A similar construct in C++ would be slightly more verbose:

template<class A> G<A> alpha(F<A>);

There is a more profound difference between Haskell’s polymorphic functions and C++ generic functions, and it’s reflected in the way these functions are implemented and type-checked. In Haskell, a polymorphic function must be defined uniformly for all types. One formula must work across all types. This is called parametric polymorphism.

C++, on the other hand, supports by default ad hoc polymorphism, which means that a template doesn’t have to be well-defined for all types. Whether a template will work for a given type is decided at instantiation time, where a concrete type is substituted for the type parameter. Type checking is deferred, which unfortunately often leads to incomprehensible error messages.

In C++, there is also a mechanism for function overloading and template specialization, which allows different definitions of the same function for different types. In Haskell this functionality is provided by type classes and type families.

Haskell’s parametric polymorphism has an unexpected consequence: any polymorphic function of the type:

alpha :: F a -> G a

where F and G are functors, automatically satisfies the naturality condition. Here it is in categorical notation (f is a function f::a->b):

G f ∘ αa = αb ∘ F f

In Haskell, the action of a functor G on a morphism f is implemented using fmap. I’ll first write it in pseudo-Haskell, with explicit type annotations:

fmapG f . alphaa = alphab . fmapF f

Because of type inference, these annotations are not necessary, and the following equation holds:

fmap f . alpha = alpha . fmap f

This is still not real Haskell — function equality is not expressible in code — but it’s an identity that can be used by the programmer in equational reasoning; or by the compiler, to implement optimizations.

The reason why the naturality condition is automatic in Haskell has to do with “theorems for free.” Parametric polymorphism, which is used to define natural transformations in Haskell, imposes very strong limitations on the implementation — one formula for all types. These limitations translate into equational theorems about such functions. In the case of functions that transform functors, free theorems are the naturality conditions. [You may read more about free theorems in my blog Parametricity: Money for Nothing and Theorems for Free.]

One way of thinking about functors in Haskell that I mentioned earlier is to consider them generalized containers. We can continue this analogy and consider natural transformations to be recipes for repackaging the contents of one container into another container. We are not touching the items themselves: we don’t modify them, and we don’t create new ones. We are just copying (some of) them, sometimes multiple times, into a new container.

The naturality condition becomes the statement that it doesn’t matter whether we modify the items first, through the application of fmap, and repackage later; or repackage first, and then modify the items in the new container, with its own implementation of fmap. These two actions, repackaging and fmapping, are orthogonal. “One moves the eggs, the other boils them.”

Let’s see a few examples of natural transformations in Haskell. The first is between the list functor, and the Maybe functor. It returns the head of the list, but only if the list is non-empty:

safeHead :: [a] -> Maybe a
safeHead [] = Nothing
safeHead (x:xs) = Just x

It’s a function polymorphic in a. It works for any type a, with no limitations, so it is an example of parametric polymorphism. Therefore it is a natural transformation between the two functors. But just to convince ourselves, let’s verify the naturality condition.

fmap f . safeHead = safeHead . fmap f

We have two cases to consider; an empty list:

fmap f (safeHead []) = fmap f Nothing = Nothing
safeHead (fmap f []) = safeHead [] = Nothing

and a non-empty list:

fmap f (safeHead (x:xs)) = fmap f (Just x) = Just (f x)
safeHead (fmap f (x:xs)) = safeHead (f x : fmap f xs) = Just (f x)

I used the implementation of fmap for lists:

fmap f [] = []
fmap f (x:xs) = f x : fmap f xs

and for Maybe:

fmap f Nothing = Nothing
fmap f (Just x) = Just (f x)

An interesting case is when one of the functors is the trivial Const functor. A natural transformation from or to a Const functor looks just like a function that’s either polymorphic in its return type or in its argument type.

For instance, length can be thought of as a natural transformation from the list functor to the Const Int functor:

length :: [a] -> Const Int a
length [] = Const 0
length (x:xs) = Const (1 + unConst (length xs))

Here, unConst is used to peel off the Const constructor:

unConst :: Const c a -> c
unConst (Const x) = x

Of course, in practice length is defined as:

length :: [a] -> Int

which effectively hides the fact that it’s a natural transformation.

Finding a parametrically polymorphic function from a Const functor is a little harder, since it would require the creation of a value from nothing. The best we can do is:

scam :: Const Int a -> Maybe a
scam (Const x) = Nothing

Another common functor that we’ve seen already, and which will play an important role in the Yoneda lemma, is the Reader functor. I will rewrite its definition as a newtype:

newtype Reader e a = Reader (e -> a)

It is parameterized by two types, but is (covariantly) functorial only in the second one:

instance Functor (Reader e) where  
    fmap f (Reader g) = Reader (\x -> f (g x))

For every type e, you can define a family of natural transformations from Reader e to any other functor f. We’ll see later that the members of this family are always in one to one correspondence with the elements of f e (the Yoneda lemma).

For instance, consider the somewhat trivial unit type () with one element (). The functor Reader () takes any type a and maps it into a function type ()->a. These are just all the functions that pick a single element from the set a. There are as many of these as there are elements in a. Now let’s consider natural transformations from this functor to the Maybe functor:

alpha :: Reader () a -> Maybe a

There are only two of these, dumb and obvious:

dumb (Reader _) = Nothing

and

obvious (Reader g) = Just (g ())

(The only thing you can do with g is to apply it to the unit value ().)

And, indeed, as predicted by the Yoneda lemma, these correspond to the two elements of the Maybe () type, which are Nothing and Just (). We’ll come back to the Yoneda lemma later — this was just a little teaser.

Beyond Naturality

A parametrically polymorphic function between two functors (including the edge case of the Const functor) is always a natural transformation. Since all standard algebraic data types are functors, any polymorphic function between such types is a natural transformation.

We also have function types at our disposal, and those are functorial in their return type. We can use them to build functors (like the Reader functor) and define natural transformations that are higher-order functions.

However, function types are not covariant in the argument type. They are contravariant. Of course contravariant functors are equivalent to covariant functors from the opposite category. Polymorphic functions between two contravariant functors are still natural transformations in the categorical sense, except that they work on functors from the opposite category to Haskell types.

You might remember the example of a contravariant functor we’ve looked at before:

newtype Op r a = Op (a -> r)

This functor is contravariant in a:

instance Contravariant (Op r) where
    contramap f (Op g) = Op (g . f)

We can write a polymorphic function from, say, Op Bool to Op String:

predToStr (Op f) = Op (\x -> if f x then "T" else "F")

But since the two functors are not covariant, this is not a natural transformation in Hask. However, because they are both contravariant, they satisfy the “opposite” naturality condition:

contramap f . predToStr = predToStr . contramap f

Notice that the function f must go in the opposite direction than what you’d use with fmap, because of the signature of contramap:

contramap :: (b -> a) -> (Op Bool a -> Op Bool b)

Are there any type constructors that are not functors, whether covariant or contravariant? Here’s one example:

a -> a

This is not a functor because the same type a is used both in the negative (contravariant) and positive (covariant) position. You can’t implement fmap or contramap for this type. Therefore a function of the signature:

(a -> a) -> f a

where f is an arbitrary functor, cannot be a natural transformation. Interestingly, there is a generalization of natural transformations, called dinatural transformations, that deals with such cases. We’ll get to them when we discuss ends.

Functor Category

Now that we have mappings between functors — natural transformations — it’s only natural to ask the question whether functors form a category. And indeed they do! There is one category of functors for each pair of categories, C and D. Objects in this category are functors from C to D, and morphisms are natural transformations between those functors.

We have to define composition of two natural transformations, but that’s quite easy. The components of natural transformations are morphisms, and we know how to compose morphisms.

Indeed, let’s take a natural transformation α from functor F to G. Its component at object a is some morphism:

αa :: F a -> G a

We’d like to compose α with β, which is a natural transformation from functor G to H. The component of β at a is a morphism:

βa :: G a -> H a

These morphisms are composable and their composition is another morphism:

βa ∘ αa :: F a -> H a

We will use this morphism as the component of the natural transformation β ⋅ α — the composition of two natural transformations β after α:

(β ⋅ α)a = βa ∘ αa

5_Vertical

One (long) look at a diagram convinces us that the result of this composition is indeed a natural transformation from F to H:

H f ∘ (β ⋅ α)a = (β ⋅ α)b ∘ F f

6_VerticalNaturality

Composition of natural transformations is associative, because their components, which are regular morphisms, are associative with respect to their composition.

Finally, for each functor F there is an identity natural transformation 1F whose components are the identity morphisms:

idF a :: F a -> F a

So, indeed, functors form a category.

A word about notation. Following Saunders Mac Lane I use the dot for the kind of natural transformation composition I have just described. The problem is that there are two ways of composing natural transformations. This one is called the vertical composition, because the functors are usually stacked up vertically in the diagrams that describe it. Vertical composition is important in defining the functor category. I’ll explain horizontal composition shortly.

6a_Vertical

The functor category between categories C and D is written as Fun(C, D), or [C, D], or sometimes as DC. This last notation suggests that a functor category itself might be considered a function object (an exponential) in some other category. Is this indeed the case?

Let’s have a look at the hierarchy of abstractions that we’ve been building so far. We started with a category, which is a collection of objects and morphisms. Categories themselves (or, strictly speaking small categories, whose objects form sets) are themselves objects in a higher-level category Cat. Morphisms in that category are functors. A Hom-set in Cat is a set of functors. For instance Cat(C, D) is a set of functors between two categories C and D.

7_CatHomSet

A functor category [C, D] is also a set of functors between two categories (plus natural transformations as morphisms). Its objects are the same as the members of Cat(C, D). Moreover, a functor category, being a category, must itself be an object of Cat (it so happens that the functor category between two small categories is itself small). We have a relationship between a Hom-set in a category and an object in the same category. The situation is exactly like the exponential object that we’ve seen in the last section. Let’s see how we can construct the latter in Cat.

As you may remember, in order to construct an exponential, we need to first define a product. In Cat, this turns out to be relatively easy, because small categories are sets of objects, and we know how to define cartesian products of sets. So an object in a product category C × D is just a pair of objects, (c, d), one from C and one from D. Similarly, a morphism between two such pairs, (c, d) and (c', d'), is a pair of morphisms, (f, g), where f :: c -> c' and g :: d -> d'. These pairs of morphisms compose component-wise, and there is always an identity pair that is just a pair of identity morphisms. To make the long story short, Cat is a full-blown cartesian closed category in which there is an exponential object DC for any pair of categories. And by “object” in Cat I mean a category, so DC is a category, which we can identify with the functor category between C and D.

2-Categories

With that out of the way, let’s have a closer look at Cat. By definition, any Hom-set in Cat is a set of functors. But, as we have seen, functors between two objects have a richer structure than just a set. They form a category, with natural transformations acting as morphisms. Since functors are considered morphisms in Cat, natural transformations are morphisms between morphisms.

This richer structure is an example of a 2-category, a generalization of a category where, besides objects and morphisms (which might be called 1-morphisms in this context), there are also 2-morphisms, which are morphisms between morphisms.

In the case of Cat seen as a 2-category we have:

  • Objects: (Small) categories
  • 1-morphisms: Functors between categories
  • 2-morphisms: Natural transformations between functors.

Instead of a Hom-set between two categories C and D, we have a Hom-category — the functor category DC. We have regular functor composition: a functor F from DC composes with a functor G from ED to give G ∘ F from EC. But we also have composition inside each Hom-category — vertical composition of natural transformations, or 2-morphisms, between functors.

8_Cat-2-Cat

With two kinds of composition in a 2-category, the question arises: How do they interact with each other?

Let’s pick two functors, or 1-morphisms, in Cat:

F :: C -> D
G :: D -> E

and their composition:

G ∘ F :: C -> E

Suppose we have two natural transformations, α and β, that act, respectively, on functors F and G:

α :: F -> F'
β :: G -> G'

10_Horizontal

Notice that we cannot apply vertical composition to this pair, because the target of α is different from the source of β. In fact they are members of two different functor categories: D C and E D. We can, however, apply composition to the functors F’ and G’, because the target of F’ is the source of G’ — it’s the category D. What’s the relation between the functors G’∘ F’ and G ∘ F?

Having α and β at our disposal, can we define a natural transformation from G ∘ F to G’∘ F’? Let me sketch the construction.

9_Horizontal

As usual, we start with an object a in C. Its image splits into two objects in D: F a and F'a. There is also a morphism, a component of α, connecting these two objects:

αa :: F a -> F'a

When going from D to E, these two objects split further into four objects:

G (F a), G'(F a), G (F'a), G'(F'a)

We also have four morphisms forming a square. Two of these morphisms are the components of the natural transformation β:

βF a :: G (F a) -> G'(F a)
βF'a :: G (F'a) -> G'(F'a)

The other two are the images of αa under the two functors (functors map morphisms):

G αa :: G (F a) -> G (F'a)
G'αa :: G'(F a) -> G'(F'a)

That’s a lot of morphisms. Our goal is to find a morphism that goes from G (F a) to G'(F'a), a candidate for the component of a natural transformation connecting the two functors G ∘ F and G’∘ F’. In fact there’s not one but two paths we can take from G (F a) to G'(F'a):

G'αa ∘ βF a
βF'a ∘ G αa

Luckily for us, they are equal, because the square we have formed turns out to be the naturality square for β.

We have just defined a component of a natural transformation from G ∘ F to G’∘ F’. The proof of naturality for this transformation is pretty straightforward, provided you have enough patience.

We call this natural transformation the horizontal composition of α and β:

β ∘ α :: G ∘ F -> G'∘ F'

Again, following Mac Lane I use the small circle for horizontal composition, although you may also encounter star in its place.

Here’s a categorical rule of thumb: Every time you have composition, you should look for a category. We have vertical composition of natural transformations, and it’s part of the functor category. But what about the horizontal composition? What category does that live in?

The way to figure this out is to look at Cat sideways. Look at natural transformations not as arrows between functors but as arrows between categories. A natural transformation sits between two categories, the ones that are connected by the functors it transforms. We can think of it as connecting these two categories.

Sideways

Let’s focus on two objects of Cat — categories C and D. There is a set of natural transformations that go between functors that connect C to D. These natural transformations are our new arrows from C to D. By the same token, there are natural transformations going between functors that connect D to E, which we can treat as new arrows going from D to E. Horizontal composition is the composition of these arrows.

We also have an identity arrow going from C to C. It’s the identity natural transformation that maps the identity functor on C to itself. Notice that the identity for horizontal composition is also the identity for vertical composition, but not vice versa.

Finally, the two compositions satisfy the interchange law:

(β' ⋅ α') ∘ (β ⋅ α) = (β' ∘ β) ⋅ (α' ∘ α)

I will quote Saunders Mac Lane here: The reader may enjoy writing down the evident diagrams needed to prove this fact.

There is one more piece of notation that might come in handy in the future. In this new sideways interpretation of Cat there are two ways of getting from object to object: using a functor or using a natural transformation. We can, however, re-interpret the functor arrow as a special kind of natural transformation: the identity natural transformation acting on this functor. So you’ll often see this notation:

F ∘ α

where F is a functor from D to E, and α is a natural transformation between two functors going from C to D. Since you can’t compose a functor with a natural transformation, this is interpreted as a horizontal composition of the identity natural transformation 1F after α.

Similarly:

α ∘ F

is a horizontal composition of α after 1F.

Conclusion

This concludes the first part of the book. We’ve learned the basic vocabulary of category theory. You may think of objects and categories as nouns; and morphisms, functors, and natural transformations as verbs. Morphisms connect objects, functors connect categories, natural transformations connect functors.

But we’ve also seen that, what appears as an action at one level of abstraction, becomes an object at the next level. A set of morphisms turns into a function object. As an object, it can be a source or a target of another morphism. That’s the idea behind higher order functions.

A functor maps objects to objects, so we can use it as a type constructor, or a parametric type. A functor also maps morphisms, so it is a higher order function — fmap. There are some simple functors, like Const, product, and coproduct, that can be used to generate a large variety of algebraic data types. Function types are also functorial, both covariant and contravariant, and can be used to extend algebraic data types.

Functors may be looked upon as objects in the functor category. As such, they become sources and targets of morphisms: natural transformations. A natural transformation is a special type of polymorphic function.

Challenges

  1. Define a natural transformation from the Maybe functor to the list functor. Prove the naturality condition for it.
  2. Define at least two different natural transformations between Reader () and the list functor. How many different lists of () are there?
  3. Continue the previous exercise with Reader Bool and Maybe.
  4. Show that horizontal composition of natural transformation satisfies the naturality condition (hint: use components). It’s a good exercise in diagram chasing.
  5. Write a short essay about how you may enjoy writing down the evident diagrams needed to prove the interchange law.
  6. Create a few test cases for the opposite naturality condition of transformations between different Op functors. Here’s one choice:
    op :: Op Bool Int
    op = Op (\x -> x > 0)

    and

    f :: String -> Int
    f x = read x

Next: Declarative Programming.

Acknowledgments

I’d like to thank Gershom Bazerman for checking my math and logic, and André van Meulebrouck, who has been volunteering his editing help.


This is part 9 of Categories for Programmers. Previously: Functoriality. See the Table of Contents.

So far I’ve been glossing over the meaning of function types. A function type is different from other types.

Take Integer, for instance: It’s just a set of integers. Bool is a two element set. But a function type a->b is more than that: it’s a set of morphisms between objects a and b. A set of morphisms between two objects in any category is called a hom-set. It just so happens that in the category Set every hom-set is itself an object in the same category —because it is, after all, a set.

Hom-set in Set is just a set

Hom-set in Set is just a set

The same is not true of other categories where hom-sets are external to a category. They are even called external hom-sets.

Hom-set in category C is an external set

Hom-set in category C is an external set

It’s the self-referential nature of the category Set that makes function types special. But there is a way, at least in some categories, to construct objects that represent hom-sets. Such objects are called internal hom-sets.

Universal Construction

Let’s forget for a moment that function types are sets and try to construct a function type, or more generally, an internal hom-set, from scratch. As usual, we’ll take our cues from the Set category, but carefully avoid using any properties of sets, so that the construction will automatically work for other categories.

A function type may be considered a composite type because of its relationship to the argument type and the result type. We’ve already seen the constructions of composite types — those that involved relationships between objects. We used universal constructions to define a product type and a coproduct types. We can use the same trick to define a function type. We will need a pattern that involves three objects: the function type that we are constructing, the argument type, and the result type.

The obvious pattern that connects these three types is called function application or evaluation. Given a candidate for a function type, let’s call it z (notice that, if we are not in the category Set, this is just an object like any other object), and the argument type a (an object), the application maps this pair to the result type b (an object). We have three objects, two of them fixed (the ones representing the argument type and the result type).

We also have the application, which is a mapping. How do we incorporate this mapping into our pattern? If we were allowed to look inside objects, we could pair a function f (an element of z) with an argument x (an element of a) and map it to f x (the application of f to x, which is an element of b).

In Set we can pick a function f from a set of functions z and we can pick an argument x from the set (type) a. We get an element f x in the set (type) b.

In Set we can pick a function f from a set of functions z and we can pick an argument x from the set (type) a. We get an element f x in the set (type) b.

But instead of dealing with individual pairs (f, x), we can as well talk about the whole product of the function type z and the argument type a. The product z×a is an object, and we can pick, as our application morphism, an arrow g from that object to b. In Set, g would be the function that maps every pair (f, x) to f x.

So that’s the pattern: a product of two objects z and a connected to another object b by a morphism g.

A pattern of objects and morphisms that is the starting point of the universal construction

A pattern of objects and morphisms that is the starting point of the universal construction

Is this pattern specific enough to single out the function type using a universal construction? Not in every category. But in the categories of interest to us it is. And another question: Would it be possible to define a function object without first defining a product? There are categories in which there is no product, or there isn’t a product for all pairs of objects. The answer is no: there is no function type, if there is no product type. We’ll come back to this later when we talk about exponentials.

Let’s review the universal construction. We start with a pattern of objects and morphisms. That’s our imprecise query, and it usually yields lots and lots of hits. In particular, in Set, pretty much everything is connected to everything. We can take any object z, form its product with a, and there’s going to be a function from it to b (except when b is an empty set).

That’s when we apply our secret weapon: ranking. This is usually done by requiring that there be a mapping between candidate objects — a mapping that somehow factorizes our construction. In our case, we’ll decree that z together with the morphism g from z×a to b is better than some other z' with its own application g', if and only if there is a mapping h from z' to z such that the application of g' factors through the application of g. (Hint: Read this sentence while looking at the picture.)

Establishing a ranking between candidates for the function object

Establishing a ranking between candidates for the function object

Now here’s the tricky part, and the main reason I postponed this particular universal construction till now. Given the morphism h :: z'-> z, we want to close the diagram that has both z' and z crossed with a. What we really need, given the mapping h from z' to z, is a mapping from z'×a to z×a. And now, after discussing the functoriality of the product, we know how to do it. Because the product itself is a functor (more precisely an endo-bi-functor), it’s possible to lift pairs of morphisms. In other words, we can define not only products of objects but also products of morphisms.

Since we are not touching the second component of the product z'×a, we will lift the pair of morphisms (h, id), where id is an identity on a.

So, here’s how we can factor one application, g, out of another application g':

g' = g ∘ (h × id)

The key here is the action of the product on morphisms.

The third part of the universal construction is selecting the object that is universally the best. Let’s call this object a⇒b (think of this as a symbolic name for one object, not to be confused with a Haskell typeclass constraint — I’ll discuss different ways of naming it later). This object comes with its own application — a morphism from (a⇒b)×a to b — which we will call eval. The object a⇒b is the best if any other candidate for a function object can be uniquely mapped to it in such a way that its application morphism g factorizes through eval. This object is better than any other object according to our ranking.

The definition of the universal function object

The definition of the universal function object. This is the same diagram as above, but now the object a⇒b is universal.

Formally:

A function object from a to b is an object a⇒b together with the morphism

eval :: ((a⇒b) × a) -> b

such that for any other object z with a morphism

g :: z × a -> b

there is a unique morphism

h :: z -> (a⇒b)

that factors g through eval:

g = eval ∘ (h × id)

Of course, there is no guarantee that such an object a⇒b exists for any pair of objects a and b in a given category. But it always does in Set. Moreover, in Set, this object is isomorphic to the hom-set Set(a, b).

This is why, in Haskell, we interpret the function type a->b as the categorical function object a⇒b.

Currying

Let’s have a second look at all the candidates for the function object. This time, however, let’s think of the morphism g as a function of two variables, z and a.

g :: z × a -> b

Being a morphism from a product comes as close as it gets to being a function of two variables. In particular, in Set, g is a function from pairs of values, one from the set z and one from the set a.

On the other hand, the universal property tells us that for each such g there is a unique morphism h that maps z to a function object a⇒b.

h :: z -> (a⇒b)

In Set, this just means that h is a function that takes one variable of type z and returns a function from a to b. That makes h a higher order function. Therefore the universal construction establishes a one-to-one correspondence between functions of two variables and functions of one variable returning functions. This correspondence is called currying, and h is called the curried version of g.

This correspondence is one-to-one, because given any g there is a unique h, and given any h you can always recreate the two-argument function g using the formula:

g = eval ∘ (h × id)

The function g can be called the uncurried version of h.

Currying is essentially built into the syntax of Haskell. A function returning a function:

a -> (b -> c)

is often thought of as a function of two variables. That’s how we read the un-parenthesized signature:

a -> b -> c

This interpretation is apparent in the way we define multi-argument functions. For instance:

catstr :: String -> String -> String
catstr s s’ = s ++ s’

The same function can be written as a one-argument function returning a function — a lambda:

catstr’ s = \s’ -> s ++ s’

These two definitions are equivalent, and either can be partially applied to just one argument, producing a one-argument function, as in:

greet :: String -> String
greet = catstr “Hello “

Strictly speaking, a function of two variables is one that takes a pair (a product type):

(a, b) -> c

It’s trivial to convert between the two representations, and the two (higher-order) functions that do it are called, unsurprisingly, curry and uncurry:

curry :: ((a, b)->c) -> (a->b->c)
curry f a b = f (a, b)

and

uncurry :: (a->b->c) -> ((a, b)->c)
uncurry f (a, b) = f a b

Notice that curry is the factorizer for the universal construction of the function object. This is especially apparent if it’s rewritten in this form:

factorizer :: ((a, b)->c) -> (a->(b->c))
factorizer g = \a -> (\b -> g (a, b))

(As a reminder: A factorizer produces the factorizing function from a candidate.)

In non-functional languages, like C++, currying is possible but nontrivial. You can think of multi-argument functions in C++ as corresponding to Haskell functions taking tuples (although, to confuse things even more, in C++ you can define functions that take an explicit std::tuple, as well as variadic functions, and functions taking initializer lists).

You can partially apply a C++ function using the template std::bind. For instance, given a function of two strings:

std::string catstr(std::string s1, std::string s2) {
    return s1 + s2;
}

you can define a function of one string:

using namespace std::placeholders;

auto greet = std::bind(catstr, "Hello ", _1);
std::cout << greet("Haskell Curry");

Scala, which is more functional than C++ or Java, falls somewhere in between. If you anticipate that the function you’re defining will be partially applied, you define it with multiple argument lists:

def catstr(s1: String)(s2: String) = s1 + s2

Of course that requires some amount of foresight or prescience on the part of a library writer.

Exponentials

In mathematical literature, the function object, or the internal hom-object between two objects a and b, is often called the exponential and denoted by ba. Notice that the argument type is in the exponent. This notation might seem strange at first, but it makes perfect sense if you think of the relationship between functions and products. We’ve already seen that we have to use the product in the universal construction of the internal hom-object, but the connection goes deeper than that.

This is best seen when you consider functions between finite types — types that have a finite number of values, like Bool, Char, or even Int or Double. Such functions, at least in principle, can be fully memoized or turned into data structures to be looked up. And this is the essence of the equivalence between functions, which are morphisms, and function types, which are objects.

For instance a (pure) function from Bool is completely specified by a pair of values: one corresponding to False, and one corresponding to True. The set of all possible functions from Bool to, say, Int is the set of all pairs of Ints. This is the same as the product Int × Int or, being a little creative with notation, Int2.

For another example, let’s look at the C++ type char, which contains 256 values (Haskell Char is larger, because Haskell uses Unicode). There are several functions in the part of the C++ Standard Library that are usually implemented using lookups. Functions like isupper or isspace are implemented using tables, which are equivalent to tuples of 256 Boolean values. A tuple is a product type, so we are dealing with products of 256 Booleans: bool × bool × bool × ... × bool. We know from arithmetics that an iterated product defines a power. If you “multiply” bool by itself 256 (or char) times, you get bool to the power of char, or boolchar.

How many values are there in the type defined as 256-tuples of bool? Exactly 2256. This is also the number of different functions from char to bool, each function corresponding to a unique 256-tuple. You can similarly calculate that the number of functions from bool to char is 2562, and so on. The exponential notation for function types makes perfect sense in these cases.

We probably wouldn’t want to fully memoize a function from int or double. But the equivalence between functions and data types, if not always practical, is there. There are also infinite types, for instance lists, strings, or trees. Eager memoization of functions from those types would require infinite storage. But Haskell is a lazy language, so the boundary between lazily evaluated (infinite) data structures and functions is fuzzy. This function vs. data duality explains the identification of Haskell’s function type with the categorical exponential object — which corresponds more to our idea of data.

Cartesian Closed Categories

Although I will continue using the category of sets as a model for types and functions, it’s worth mentioning that there is a larger family of categories that can be used for that purpose. These categories are called cartesian closed, and Set is just one example of such a category.

A cartesian closed category must contain:

  1. The terminal object,
  2. A product of any pair of objects, and
  3. An exponential for any pair of objects.

If you consider an exponential as an iterated product (possibly infinitely many times), then you can think of a cartesian closed category as one supporting products of an arbitrary arity. In particular, the terminal object can be thought of as a product of zero objects — or the zero-th power of an object.

What’s interesting about cartesian closed categories from the perspective of computer science is that they provide models for the simply typed lambda calculus, which forms the basis of all typed programming languages.

The terminal object and the product have their duals: the initial object and the coproduct. A cartesian closed category that also supports those two, and in which product can be distributed over coproduct

a × (b + c) = a × b + a × c
(b + c) × a = b × a + c × a

is called a bicartesian closed category. We’ll see in the next section that bicartesian closed categories, of which Set is a prime example, have some interesting properties.

Exponentials and Algebraic Data Types

The interpretation of function types as exponentials fits very well into the scheme of algebraic data types. It turns out that all the basic identities from high-school algebra relating numbers zero and one, sums, products, and exponentials hold pretty much unchanged in any bicartesian closed category theory for, respectively, initial and final objects, coproducts, products, and exponentials. We don’t have the tools yet to prove them (such as adjunctions or the Yoneda lemma), but I’ll list them here nevertheless as a source of valuable intuitions.

Zeroth Power

a0 = 1

In the categorical interpretation, we replace 0 with the initial object, 1 with the final object, and equality with isomorphism. The exponential is the internal hom-object. This particular exponential represents the set of morphisms going from the initial object to an arbitrary object a. By the definition of the initial object, there is exactly one such morphism, so the hom-set C(0, a) is a singleton set. A singleton set is the terminal object in Set, so this identity trivially works in Set. What we are saying is that it works in any bicartesian closed category.

In Haskell, we replace 0 with Void; 1 with the unit type (); and the exponential with function type. The claim is that the set of functions from Void to any type a is equivalent to the unit type — which is a singleton. In other words, there is only one function Void->a. We’ve seen this function before: it’s called absurd.

This is a little bit tricky, for two reasons. One is that in Haskell we don’t really have uninhabited types — every type contains the “result of a never ending calculation,” or the bottom. The second reason is that all implementations of absurd are equivalent because, no matter what they do, nobody can ever execute them. There is no value that can be passed to absurd. (And if you manage to pass it a never ending calculation, it will never return!)

Powers of One

1a = 1

This identity, when interpreted in Set, restates the definition of the terminal object: There is a unique morphism from any object to the terminal object. In general, the internal hom-object from a to the terminal object is isomorphic to the terminal object itself.

In Haskell, there is only one function from any type a to unit. We’ve seen this function before — it’s called unit. You can also think of it as the function const partially applied to ().

First Power

a1 = a

This is a restatement of the observation that morphisms from the terminal object can be used to pick “elements” of the object a. The set of such morphisms is isomorphic to the object itself. In Set, and in Haskell, the isomorphism is between elements of the set a and functions that pick those elements, ()->a.

Exponentials of Sums

ab+c = ab × ac

Categorically, this says that the exponential from a coproduct of two objects is isomorphic to a product of two exponentials. In Haskell, this algebraic identity has a very practical, interpretation. It tells us that a function from a sum of two types is equivalent to a pair of functions from individual types. This is just the case analysis that we use when defining functions on sums. Instead of writing one function definition with a case statement, we usually split it into two (or more) functions dealing with each type constructor separately. For instance, take a function from the sum type (Either Int Double):

f :: Either Int Double -> String

It may be defined as a pair of functions from, respectively, Int and Double:

f (Left n)  = if n < 0 then "Negative int" else "Positive int"
f (Right x) = if x < 0.0 then "Negative double" else "Positive double"

Here, n is an Int and x is a Double.

Exponentials of Exponentials

(ab)c = ab×c

This is just a way of expressing currying purely in terms of exponential objects. A function returning a function is equivalent to a function from a product (a two-argument function).

Exponentials over Products

(a × b)c = ac × bc

In Haskell: A function returning a pair is equivalent to a pair of functions, each producing one element of the pair.

It’s pretty incredible how those simple high-school algebraic identities can be lifted to category theory and have practical application in functional programming.

Curry-Howard Isomorphism

I have already mentioned the correspondence between logic and algebraic data types. The Void type and the unit type () correspond to false and true. Product types and sum types correspond to logical conjunction ∧ (AND) and disjunction ⋁ (OR). In this scheme, the function type we have just defined corresponds to logical implication ⇒. In other words, the type a->b can be read as “if a then b.”

According to the Curry-Howard isomorphism, every type can be interpreted as a proposition — a statement or a judgment that may be true or false. Such a proposition is considered true if the type is inhabited and false if it isn’t. In particular, a logical implication is true if the function type corresponding to it is inhabited, which means that there exists a function of that type. An implementation of a function is therefore a proof of a theorem. Writing programs is equivalent to proving theorems. Let’s see a few examples.

Let’s take the function eval we have introduced in the definition of the function object. Its signature is:

eval :: ((a -> b), a) -> b

It takes a pair consisting of a function and its argument and produces a result of the appropriate type. It’s the Haskell implementation of the morphism:

eval :: (a⇒b) × a -> b

which defines the function type a⇒b (or the exponential object ba). Let’s translate this signature to a logical predicate using the Curry-Howard isomorphism:

((a ⇒ b) ∧ a) ⇒ b

Here’s how you can read this statement: If it’s true that b follows from a, and a is true, then b must be true. This makes perfect intuitive sense and has been known since antiquity as modus ponens. We can prove this theorem by implementing the function:

eval :: ((a -> b), a) -> b
eval (f, x) = f x

If you give me a pair consisting of a function f taking a and returning b, and a concrete value x of type a, I can produce a concrete value of type b by simply applying the function f to x. By implementing this function I have just shown that the type ((a -> b), a) -> b is inhabited. Therefore modus ponens is true in our logic.

How about a predicate that is blatantly false? For instance: if a or b is true then a must be true.

a ⋁ b ⇒ a

This is obviously wrong because you can chose an a that is false and a b that is true, and that’s a counter-example.

Mapping this predicate into a function signature using the Curry-Howard isomorphism, we get:

Either a b -> a

Try as you may, you can’t implement this function — you can’t produce a value of type a if you are called with the Right value. (Remember, we are talking about pure functions.)

Finally, we come to the meaning of the absurd function:

absurd :: Void -> a

Considering that Void translates into false, we get:

 false ⇒ a

Anything follows from falsehood (ex falso quodlibet). Here’s one possible proof (implementation) of this statement (function) in Haskell:

absurd (Void a) = absurd a

where Void is defined as:

newtype Void = Void Void

As always, the type Void is tricky. This definition makes it impossible to construct a value because in order to construct one, you would need to provide one. Therefore, the function absurd can never be called.

These are all interesting examples, but is there a practical side to Curry-Howard isomorphism? Probably not in everyday programming. But there are programming languages like Agda or Coq, which take advantage of the Curry-Howard isomorphism to prove theorems.

Computers are not only helping mathematicians do their work — they are revolutionizing the very foundations of mathematics. The latest hot research topic in that area is called Homotopy Type Theory, and is an outgrowth of type theory. It’s full of Booleans, integers, products and coproducts, function types, and so on. And, as if to dispel any doubts, the theory is being formulated in Coq and Agda. Computers are revolutionizing the world in more than one way.

Bibliography

  1. Ralph Hinze, Daniel W. H. James, Reason Isomorphically!. This paper contains proofs of all those high-school algebraic identities in category theory that I mentioned in this chapter.

Next: Natural Transformations.

Acknowledgments

I’d like to thank Gershom Bazerman for checking my math and logic, and André van Meulebrouck, who has been volunteering his editing help throughout this series of posts.


This is part 8 of Categories for Programmers. Previously: Functors. See the Table of Contents.

Now that you know what a functor is, and have seen a few examples, let’s see how we can build larger functors from smaller ones. In particular it’s interesting to see which type constructors (which correspond to mappings between objects in a category) can be extended to functors (which include mappings between morphisms).

Bifunctors

Since functors are morphisms in Cat (the category of categories), a lot of intuitions about morphisms — and functions in particular — apply to functors as well. For instance, just like you can have a function of two arguments, you can have a functor of two arguments, or a bifunctor. On objects, a bifunctor maps every pair of objects, one from category C, and one from category D, to an object in category E. Notice that this is just saying that it’s a mapping from a cartesian product of categories C×D to E.

Bifunctor

That’s pretty straightforward. But functoriality means that a bifunctor has to map morphisms as well. This time, though, it must map a pair of morphisms, one from C and one from D, to a morphism in E.

Again, a pair of morphisms is just a single morphism in the product category C×D. We define a morphism in a cartesian product of categories as a pair of morphisms which goes from one pair of objects to another pair of objects. These pairs of morphisms can be composed in the obvious way:

(f, g) ∘ (f', g') = (f ∘ f', g ∘ g')

The composition is associative and it has an identity — a pair of identity morphisms (id, id). So a cartesian product of categories is indeed a category.

An easier way to think about bifunctors would be to consider them functors in each argument separately. So instead of translating functorial laws — associativity and identity preservation — from functors to bifunctors, it would be enough to check them separately for each argument. However, in general, separate functoriality is not enough to prove joint functoriality. Categories in which joint functoriality fails are called premonoidal.

Let’s define a bifunctor in Haskell. In this case all three categories are the same: the category of Haskell types. A bifunctor is a type constructor that takes two type arguments. Here’s the definition of the Bifunctor typeclass taken directly from the library Control.Bifunctor:

class Bifunctor f where
    bimap :: (a -> c) -> (b -> d) -> f a b -> f c d
    bimap g h = first g . second h
    first :: (a -> c) -> f a b -> f c b
    first g = bimap g id
    second :: (b -> d) -> f a b -> f a d
    second = bimap id

The type variable f represents the bifunctor. You can see that in all type signatures it’s always applied to two type arguments. The first type signature defines bimap: a mapping of two functions at once. The result is a lifted function, (f a b -> f c d), operating on types generated by the bifunctor’s type constructor. There is a default implementation of bimap in terms of first and second, which shows that it’s enough to have functoriality in each argument separately to be able to define a bifunctor. (As mentioned before, this doesn’t always work, because the two maps may not commute, that is first g . second h may not be the same as second h . first g.)

Bimap

bimap

The two other type signatures, first and second, are the two fmaps witnessing the functoriality of f in the first and the second argument, respectively.

First

first

Second

second

The typeclass definition provides default implementations for both of them in terms of bimap.

When declaring an instance of Bifunctor, you have a choice of either implementing bimap and accepting the defaults for first and second, or implementing both first and second and accepting the default for bimap (of course, you may implement all three of them, but then it’s up to you to make sure they are related to each other in this manner).

Product and Coproduct Bifunctors

An important example of a bifunctor is the categorical product — a product of two objects that is defined by a universal construction. If the product exists for any pair of objects, the mapping from those objects to the product is bifunctorial. This is true in general, and in Haskell in particular. Here’s the Bifunctor instance for a pair constructor — the simplest product type:

instance Bifunctor (,) where
    bimap f g (x, y) = (f x, g y)

There isn’t much choice: bimap simply applies the first function to the first component, and the second function to the second component of a pair. The code pretty much writes itself, given the types:

bimap :: (a -> c) -> (b -> d) -> (a, b) -> (c, d)

The action of the bifunctor here is to make pairs of types, for instance:

(,) a b = (a, b)

By duality, a coproduct, if it’s defined for every pair of objects in a category, is also a bifunctor. In Haskell, this is exemplified by the Either type constructor being an instance of Bifunctor:

instance Bifunctor Either where
    bimap f _ (Left x)  = Left (f x)
    bimap _ g (Right y) = Right (g y)

This code also writes itself.

Now, remember when we talked about monoidal categories? A monoidal category defines a binary operator acting on objects, together with a unit object. I mentioned that Set is a monoidal category with respect to cartesian product, with the singleton set as a unit. And it’s also a monoidal category with respect to disjoint union, with the empty set as a unit. What I haven’t mentioned is that one of the requirements for a monoidal category is that the binary operator be a bifunctor. This is a very important requirement — we want the monoidal product to be compatible with the structure of the category, which is defined by morphisms. We are now one step closer to the full definition of a monoidal category (we still need to learn about naturality, before we can get there).

Functorial Algebraic Data Types

We’ve seen several examples of parameterized data types that turned out to be functors — we were able to define fmap for them. Complex data types are constructed from simpler data types. In particular, algebraic data types (ADTs) are created using sums and products. We have just seen that sums and products are functorial. We also know that functors compose. So if we can show that the basic building blocks of ADTs are functorial, we’ll know that parameterized ADTs are functorial too.

So what are the building blocks of parameterized algebraic data types? First, there are the items that have no dependency on the type parameter of the functor, like Nothing in Maybe, or Nil in List. They are equivalent to the Const functor. Remember, the Const functor ignores its type parameter (really, the second type parameter, which is the one of interest to us, the first one being kept constant).

Then there are the elements that simply encapsulate the type parameter itself, like Just in Maybe. They are equivalent to the identity functor. I mentioned the identity functor previously, as the identity morphism in Cat, but didn’t give its definition in Haskell. Here it is:

data Identity a = Identity a
instance Functor Identity where
    fmap f (Identity x) = Identity (f x)

You can think of Identity as the simplest possible container that always stores just one (immutable) value of type a.

Everything else in algebraic data structures is constructed from these two primitives using products and sums.

With this new knowledge, let’s have a fresh look at the Maybe type constructor:

data Maybe a = Nothing | Just a

It’s a sum of two types, and we now know that the sum is functorial. The first part, Nothing can be represented as a Const () acting on a (the first type parameter of Const is set to unit — later we’ll see more interesting uses of Const). The second part is just a different name for the identity functor. We could have defined Maybe, up to isomorphism, as:

type Maybe a = Either (Const () a) (Identity a)

So Maybe is the composition of the bifunctor Either with two functors, Const () and Identity. (Const is really a bifunctor, but here we always use it partially applied.)

We’ve already seen that a composition of functors is a functor — we can easily convince ourselves that the same is true of bifunctors. All we need is to figure out how a composition of a bifunctor with two functors works on morphisms. Given two morphisms, we simply lift one with one functor and the other with the other functor. We then lift the resulting pair of lifted morphisms with the bifunctor.

We can express this composition in Haskell. Let’s define a data type that is parameterized by a bifunctor bf (it’s a type variable that is a type constructor that takes two types as arguments), two functors fu and gu (type constructors that take one type variable each), and two regular types a and b. We apply fu to a and gu to b, and then apply bf to the resulting two types:

newtype BiComp bf fu gu a b = BiComp (bf (fu a) (gu b))

That’s the composition on objects, or types. Notice how in Haskell we apply type constructors to types, just like we apply functions to arguments. The syntax is the same.

If you’re getting a little lost, try applying BiComp to Either, Const (), Identity, a, and b, in this order. You will recover our bare-bone version of Maybe b (a is ignored).

The new data type BiComp is a bifunctor in a and b, but only if bf is itself a Bifunctor and fu and gu are Functors. The compiler must know that there will be a definition of bimap available for bf, and definitions of fmap for fu and gu. In Haskell, this is expressed as a precondition in the instance declaration: a set of class constraints followed by a double arrow:

instance (Bifunctor bf, Functor fu, Functor gu) =>
  Bifunctor (BiComp bf fu gu) where
    bimap f1 f2 (BiComp x) = BiComp ((bimap (fmap f1) (fmap f2)) x)

The implementation of bimap for BiComp is given in terms of bimap for bf and the two fmaps for fu and gu. The compiler automatically infers all the types and picks the correct overloaded functions whenever BiComp is used.

The x in the definition of bimap has the type:

bf (fu a) (gu b)

which is quite a mouthful. The outer bimap breaks through the outer bf layer, and the two fmaps dig under fu and gu, respectively. If the types of f1 and f2 are:

f1 :: a -> a'
f2 :: b -> b'

then the final result is of the type bf (fu a') (gu b'):

bimapbf :: (fu a -> fu a') -> (gu b -> gu b') 
  -> bf (fu a) (gu b) -> bf (fu a') (gu b')

If you like jigsaw puzzles, these kinds of type manipulations can provide hours of entertainment.

So it turns out that we didn’t have to prove that Maybe was a functor — this fact followed from the way it was constructed as a sum of two functorial primitives.

A perceptive reader might ask the question: If the derivation of the Functor instance for algebraic data types is so mechanical, can’t it be automated and performed by the compiler? Indeed, it can, and it is. You need to enable a particular Haskell extension by including this line at the top of your source file:

{-# LANGUAGE DeriveFunctor #-}

and then add deriving Functor to your data structure:

data Maybe a = Nothing | Just a
  deriving Functor

and the corresponding fmap will be implemented for you.

The regularity of algebraic data structures makes it possible to derive instances not only of Functor but of several other type classes, including the Eq type class I mentioned before. There is also the option of teaching the compiler to derive instances of your own typeclasses, but that’s a bit more advanced. The idea though is the same: You provide the behavior for the basic building blocks and sums and products, and let the compiler figure out the rest.

Functors in C++

If you are a C++ programmer, you obviously are on your own as far as implementing functors goes. However, you should be able to recognize some types of algebraic data structures in C++. If such a data structure is made into a generic template, you should be able to quickly implement fmap for it.

Let’s have a look at a tree data structure, which we would define in Haskell as a recursive sum type:

data Tree a = Leaf a | Node (Tree a) (Tree a)
    deriving Functor

As I mentioned before, one way of implementing sum types in C++ is through class hierarchies. It would be natural, in an object-oriented language, to implement fmap as a virtual function of the base class Functor and then override it in all subclasses. Unfortunately this is impossible because fmap is a template, parameterized not only by the type of the object it’s acting upon (the this pointer) but also by the return type of the function that’s been applied to it. Virtual functions cannot be templatized in C++. We’ll implement fmap as a generic free function, and we’ll replace pattern matching with dynamic_cast.

The base class must define at least one virtual function in order to support dynamic casting, so we’ll make the destructor virtual (which is a good idea in any case):

template<class T>
struct Tree {
    virtual ~Tree() {};
};

The Leaf is just an Identity functor in disguise:

template<class T>
struct Leaf : public Tree<T> {
    T _label;
    Leaf(T l) : _label(l) {}
};

The Node is a product type:

template<class T>
struct Node : public Tree<T> {
    Tree<T> * _left;
    Tree<T> * _right;
    Node(Tree<T> * l, Tree<T> * r) : _left(l), _right(r) {}
};

When implementing fmap we take advantage of dynamic dispatching on the type of the Tree. The Leaf case applies the Identity version of fmap, and the Node case is treated like a bifunctor composed with two copies of the Tree functor. As a C++ programmer, you’re probably not used to analyzing code in these terms, but it’s a good exercise in categorical thinking.

template<class A, class B>
Tree<B> * fmap(std::function<B(A)> f, Tree<A> * t)
{
    Leaf<A> * pl = dynamic_cast <Leaf<A>*>(t);
    if (pl)
        return new Leaf<B>(f (pl->_label));
    Node<A> * pn = dynamic_cast<Node<A>*>(t);
    if (pn)
        return new Node<B>( fmap<A>(f, pn->_left)
                          , fmap<A>(f, pn->_right));
    return nullptr;
}

For simplicity, I decided to ignore memory and resource management issues, but in production code you would probably use smart pointers (unique or shared, depending on your policy).

Compare it with the Haskell implementation of fmap:

instance Functor Tree where
    fmap f (Leaf a) = Leaf (f a)
    fmap f (Node t t') = Node (fmap f t) (fmap f t')

This implementation can also be automatically derived by the compiler.

The Writer Functor

I promised that I would come back to the Kleisli category I described earlier. Morphisms in that category were represented as “embellished” functions returning the Writer data structure.

type Writer a = (a, String)

I said that the embellishment was somehow related to endofunctors. And, indeed, the Writer type constructor is functorial in a. We don’t even have to implement fmap for it, because it’s just a simple product type.

But what’s the relation between a Kleisli category and a functor — in general? A Kleisli category, being a category, defines composition and identity. Let’ me remind you that the composition is given by the fish operator:

(>=>) :: (a -> Writer b) -> (b -> Writer c) -> (a -> Writer c)
m1 >=> m2 = \x -> 
    let (y, s1) = m1 x
        (z, s2) = m2 y
    in (z, s1 ++ s2)

and the identity morphism by a function called return:

return :: a -> Writer a
return x = (x, "")

It turns out that, if you look at the types of these two functions long enough (and I mean, long enough), you can find a way to combine them to produce a function with the right type signature to serve as fmap. Like this:

fmap f = id >=> (\x -> return (f x))

Here, the fish operator combines two functions: one of them is the familiar id, and the other is a lambda that applies return to the result of acting with f on the lambda’s argument. The hardest part to wrap your brain around is probably the use of id. Isn’t the argument to the fish operator supposed to be a function that takes a “normal” type and returns an embellished type? Well, not really. Nobody says that a in a -> Writer b must be a “normal” type. It’s a type variable, so it can be anything, in particular it can be an embellished type, like Writer b.

So id will take Writer a and turn it into Writer a. The fish operator will fish out the value of a and pass it as x to the lambda. There, f will turn it into a b and return will embellish it, making it Writer b. Putting it all together, we end up with a function that takes Writer a and returns Writer b, exactly what fmap is supposed to produce.

Notice that this argument is very general: you can replace Writer with any type constructor. As long as it supports a fish operator and return, you can define fmap as well. So the embellishment in the Kleisli category is always a functor. (Not every functor, though, gives rise to a Kleisli category.)

You might wonder if the fmap we have just defined is the same fmap the compiler would have derived for us with deriving Functor. Interestingly enough, it is. This is due to the way Haskell implements polymorphic functions. It’s called parametric polymorphism, and it’s a source of so called theorems for free. One of those theorems says that, if there is an implementation of fmap for a given type constructor, one that preserves identity, then it must be unique.

Covariant and Contravariant Functors

Now that we’ve reviewed the writer functor, let’s go back to the reader functor. It was based on the partially applied function-arrow type constructor:

(->) r

We can rewrite it as a type synonym:

type Reader r a = r -> a

for which the Functor instance, as we’ve seen before, reads:

instance Functor (Reader r) where
    fmap f g = f . g

But just like the pair type constructor, or the Either type constructor, the function type constructor takes two type arguments. The pair and Either were functorial in both arguments — they were bifunctors. Is the function constructor a bifunctor too?

Let’s try to make it functorial in the first argument. We’ll start with a type synonym — it’s just like the Reader but with the arguments flipped:

type Op r a = a -> r

This time we fix the return type, r, and vary the argument type, a. Let’s see if we can somehow match the types in order to implement fmap, which would have the following type signature:

fmap :: (a -> b) -> (a -> r) -> (b -> r)

With just two functions taking a and returning, respectively, b and r, there is simply no way to build a function taking b and returning r! It would be different if we could somehow invert the first function, so that it took b and returned a instead. We can’t invert an arbitrary function, but we can go to the opposite category.

A short recap: For every category C there is a dual category Cop. It’s a category with the same objects as C, but with all the arrows reversed.

Consider a functor that goes between Cop and some other category D:
F :: Cop → D
Such a functor maps a morphism fop :: a → b in Cop to the morphism F fop :: F a → F b in D. But the morphism fop secretly corresponds to some morphism f :: b → a in the original category C. Notice the inversion.

Now, F is a regular functor, but there is another mapping we can define based on F, which is not a functor — let’s call it G. It’s a mapping from C to D. It maps objects the same way F does, but when it comes to mapping morphisms, it reverses them. It takes a morphism f :: b → a in C, maps it first to the opposite morphism fop :: a → b and then uses the functor F on it, to get F fop :: F a → F b.

Contravariant

Considering that F a is the same as G a and F b is the same as G b, the whole trip can be described as:
G f :: (b → a) → (G a → G b)
It’s a “functor with a twist.” A mapping of categories that inverts the direction of morphisms in this manner is called a contravariant functor. Notice that a contravariant functor is just a regular functor from the opposite category. The regular functors, by the way — the kind we’ve been studying thus far — are called covariant functors.

Here’s the typeclass defining a contravariant functor (really, a contravariant endofunctor) in Haskell:

class Contravariant f where
    contramap :: (b -> a) -> (f a -> f b)

Our type constructor Op is an instance of it:

instance Contravariant (Op r) where
    -- (b -> a) -> Op r a -> Op r b
    contramap f g = g . f

Notice that the function f inserts itself before (that is, to the right of) the contents of Op — the function g.

The definition of contramap for Op may be made even terser, if you notice that it’s just the function composition operator with the arguments flipped. There is a special function for flipping arguments, called flip:

flip :: (a -> b -> c) -> (b -> a -> c)
flip f y x = f x y

With it, we get:

contramap = flip (.)

Profunctors

We’ve seen that the function-arrow operator is contravariant in its first argument and covariant in the second. Is there a name for such a beast? It turns out that, if the target category is Set, such a beast is called a profunctor. Because a contravariant functor is equivalent to a covariant functor from the opposite category, a profunctor is defined as:
Cop × D → Set

Since, to first approximation, Haskell types are sets, we apply the name Profunctor to a type constructor p of two arguments, which is contra-functorial in the first argument and functorial in the second. Here’s the appropriate typeclass taken from the Data.Profunctor library:

class Profunctor p where
  dimap :: (a -> b) -> (c -> d) -> p b c -> p a d
  dimap f g = lmap f . rmap g
  lmap :: (a -> b) -> p b c -> p a c
  lmap f = dimap f id
  rmap :: (b -> c) -> p a b -> p a c
  rmap = dimap id

All three functions come with default implementations. Just like with Bifunctor, when declaring an instance of Profunctor, you have a choice of either implementing dimap and accepting the defaults for lmap and rmap, or implementing both lmap and rmap and accepting the default for dimap.

dimap

dimap

Now we can assert that the function-arrow operator is an instance of a Profunctor:

instance Profunctor (->) where
  dimap ab cd bc = cd . bc . ab
  lmap = flip (.)
  rmap = (.)

Profunctors have their application in the Haskell lens library. We’ll see them again when we talk about ends and coends.

The Hom-Functor

The above examples are the reflection of a more general statement that the mapping that takes a pair of objects a and b and assigns to it the set of morphisms between them, the hom-set C(a, b), is a functor. It is a functor from the product category Cop×C to the category of sets, Set.

Let’s define its action on morphisms. A morphism in Cop×C is a pair of morphisms from C:

f :: a'→ a
g :: b → b'

The lifting of this pair must be a morphism (a function) from the set C(a, b) to the set C(a', b'). Just pick any element h of C(a, b) (it’s a morphism from a to b) and assign to it:

g ∘ h ∘ f

which is an element of C(a', b').

As you can see, the hom-functor is a special case of a profunctor.

Challenges

  1. Show that the data type:
    data Pair a b = Pair a b

    is a bifunctor. For additional credit implement all three methods of Bifunctor and use equational reasoning to show that these definitions are compatible with the default implementations whenever they can be applied.

  2. Show the isomorphism between the standard definition of Maybe and this desugaring:
    type Maybe' a = Either (Const () a) (Identity a)

    Hint: Define two mappings between the two implementations. For additional credit, show that they are the inverse of each other using equational reasoning.

  3. Let’s try another data structure. I call it a PreList because it’s a precursor to a List. It replaces recursion with a type parameter b.
    data PreList a b = Nil | Cons a b

    You could recover our earlier definition of a List by recursively applying PreList to itself (we’ll see how it’s done when we talk about fixed points).

    Show that PreList is an instance of Bifunctor.

  4. Show that the following data types define bifunctors in a and b:
    data K2 c a b = K2 c
    data Fst a b = Fst a
    data Snd a b = Snd b

    For additional credit, check your solutions agains Conor McBride’s paper Clowns to the Left of me, Jokers to the Right.

  5. Define a bifunctor in a language other than Haskell. Implement bimap for a generic pair in that language.
  6. Should std::map be considered a bifunctor or a profunctor in the two template arguments Key and T? How would you redesign this data type to make it so?

Next: Function Types.

Acknowledgment

As usual, big thanks go to Gershom Bazerman for reviewing this article.


Ferrari museum in Maranello

Ferrari museum in Maranello

I was recently visiting the Ferrari museum in Maranello, Italy, where I saw this display of telemetry data from racing cars.

Telemetry data from a racing car. The contour of the racing track is shown in the upper left corner and various data channels are displayed below.

Telemetry data from a racing car. The racing track is displayed in the upper left corner and various data channels are displayed below.

The processing and the display of telemetry data is an interesting programming challenge. It has application in space exploration (as in, when you land a probe on a surface of a comet), medicine, and the military. The same techniques are used in financial systems where streams carry information about stock prices, commodity prices, and currency exchange rates.

It’s also a problem that lends itself particularly well to functional programming. If you are one of these shops working with telemetry, and you have to maintain legacy code written in imperative style, you might be interested in an alternative approach, especially if you are facing constant pressure to provide more sophisticated analysis tools and introduce concurrency to make the system faster and more responsive.

What all these applications have in common is that they deal with multiple channels generating streams of data. The data has to be either displayed in real time or stored for later analysis and processing. It’s pretty obvious to a functional programmer that channels are functors, and that they should be composed using combinators. In fact this observation can drive the whole architecture. The clincher is the issue of concurrency: retrofitting non-functional code to run in parallel is a lost battle — it’s almost easier to start from scratch. But treating channels as immutable entities makes concurrency almost an after-thought.

Everything is a Number

The most basic (and totally wrong) approach is to look at telemetry as streams of numbers. This is the assembly language of data processing. When everything is a number and you can apply your math any way you wish. The problem is that you are throwing away a lot of useful information. You want to use types as soon as possible to encode additional information and to prevent nonsensical operations like adding temperature to velocity.

In an engineering application, the least you can do is to keep track of units of measurement. You also want to distinguish between channels that produce floating-point numbers and ones that produce integers, or Booleans, or strings. This immediately tells you that a channel should be a polymorphic data structure. You should be able to stream any type of data, be it bytes, complex numbers, or vectors.

Everything is an Object

To an object-oriented mind it looks very much like a channel should be an object that is parameterized by the type of data it carries. And as an object it should have some methods. We need the get method to access the current value, and the next method to increment the position in the stream. As an imperative programmer you might also be tempted to provide a mutator, set. If you ever want your program to be concurrent, don’t even think about it!

If you’re a C++ programmer, you may overload some operators, and use * and ++ instead. That would make a channel look more like a forward iterator. But whatever you call it, a functional programmer will recognize it as a list, with the head and tail functionality.

Everything is a List

Let’s talk about lists, because there is a lot of misunderstanding around them. When people think of lists in imperative languages they think about storage. A list is probably the worst data type for storing data. Imperative programmers naturally assume that functional programmers, who use lists a lot, must be crazy. They are not! A Haskell list is almost never used for storing bulk data. A list is either an interface to data that is stored elsewhere, or a generator of data. Haskell is a lazy functional language, so what looks like a data structure is really a bunch of functions that provide data on demand.

That’s why I wouldn’t hesitate to implement channels as lists in Haskell. As an added bonus, lists can provide a pull interface to data that is being pushed. Reactive programs that process streams of data may be written as if all the data were already there — the event handler logic can be hidden inside the objects that generate the data. And this is just what’s needed for live telemetry data.

Obviously, functional programming is easier in Haskell than in C++, C#, or Java. But given how much legacy software there is, it could be a lost cause to ask management to (a) throw away existing code and start from scratch, (b) retrain the team to learn a new language, and (c) deal with completely new performance characteristics, e.g., lazy evaluation and garbage collection. So, realistically, the best we can do is to keep introducing functional methods into imperative languages, at least for the time being. It doesn’t mean that Haskell should’t play an important role in it. Over and over again I find myself prototyping solutions in Haskell before translating them into C++. The added effort pays back handsomely through faster prototyping, better code quality, and fewer bugs to chase. So I would highly recommend to every imperative programmer to spend, say, an hour a day learning and playing with Haskell. You’d be amazed how it helps in developing your programming skills.

Everything is a Functor

So, if you’re an object oriented programmer, you’ll probably implement a channel as something like this:

template <class T> Channel {
    virtual T get();
    virtual bool next();
};

and then get stuck. With this kind of interface, the rest of your program is bound to degenerate into a complex system of loops that extract data from streams and process them, possibly stuffing it back into other streams.

Instead, I propose to try the functional way. I will show you some prototype code in Haskell, but mostly explain how things work, so a non-Haskell programmer can gain some insight.

Here’s the definition of a polymorphic channel type, Chan:

data Chan a = Chan [a]

where a plays the role of a type variable, analogous to T in the C++ code above. The right hand side of the equal sign defines the constructor Chan that takes a list as an argument. Constructors are used both for constructing and for pattern matching. The notation [a] means a list of a.

The details don’t really matter, as long as you understand that the channel is implemented as a list. Also, I’m making things a little bit more explicit for didactic purposes. A Haskell programmer would implement the channel as a type alias, type, rather than a separate type.

Rule number one of dealing with lists is: try not to access their elements in a loop (or, using the functional equivalent of a loop — recursively). Operate on lists holistically. For instance, one of the most common operations on lists is to apply a function to every element. That means we want our Chan to be a functor.

A functor is a polymorphic data type that supports operating on its contents with a function. In the case of Chan that’s easy, since a list itself is a functor. I’ll be explicit here, again for didactic reasons. This is how you make Chan an instance of the Functor class by defining how to fmap a function f over it:

instance Functor Chan where
    fmap f (Chan xs) = Chan (map f xs)

Here, map is a library function that applies f to every element of the list. This is very much like applying C++ std::transform to a container, except that in Haskell everything is evaluated lazily, so you can apply fmap to an infinite list, or to a list that is not there yet because, for instance, it’s being generated in real time from incoming telemetry.

Everything is a Combinator

Let’s see how far we can get with this channel idea. The next step is to be able to combine multiple channels to generate streams of derived data. For instance, suppose that you have one channel from a pressure gauge, and another providing volume data, and you want to calculate instantaneous temperature using the ideal gas equation.

Let’s start with defining some types. We want separate types for quantities that are measured using different units. Once more, I’m being didactic here, because there are ready-made Haskell libraries that use so called phantom types to encode measurement units. Here I’ll do it naively:

data Pressure = Pascal Float
data Volume   = Meter3 Float
data Temp     = Kelvin Float

I’ll also define the ideal gas constant:

constR = 8.314472 -- J/(mol·K)

Here’s the function that calculates the temperature of ideal gas:

getT :: Float -> Pressure -> Volume -> Temp
getT n (Pascal p) (Meter3 v) = Kelvin (p * v / (n * constR))

The question is, how can we apply this function to the pressure and volume channels to get the temperature channel? We know how to apply a function to a single channel using fmap, but here we have to work with two channels. Fortunately, a channel is not just a functor — it’s an applicative functor. It defines the action of multi-argument functions on multiple channels. I’ll give you a Haskell implementation, but you should be able to do the same in C++ by overloading fmap or transform.

instance Applicative Chan where
    pure x = Chan (repeat x)
    (Chan fs) <*> (Chan xs) = Chan (zipWith ($) fs xs)

The Applicative class defines two functions. One is called pure, and it creates a constant channel from a value by repeatedly returning the same value. The other is a binary operator <*> that applies a channel of functions (yes, you can treat functions the same way you treat any other data) to a channel of values. The function zipWith applies, pairwise, functions to arguments using the function application operator ($).

Again, the details are not essential. The bottom line is that this allows us to apply our function getT to two channels (actually, three channels, since we also need to provide the amount of gas in moles — here I’m assuming 0.1 moles).

chT :: Chan Pressure -> Chan Volume -> Chan Temp
chT chP chV = getT <$> pure 0.1 <*> chP <*> chV

Such functions that combine channels into new channels are called combinators, and an applicative functor makes the creation of new combinators very easy.

The combinators are not limited to producing physical quantities. They may as well produce channels of alerts, channels of pixels for display, or channels of visual widgets. You can construct the whole architecture around channels. And since we’ve been only considering functional data structures, the resulting architecture can be easily subject to parallelization.

Moving Average

But don’t some computations require mutable state? For instance, don’t you need some kind of accumulators in order to calculate, let’s say, moving averages? Let’s see how this can be done functionally.

The idea is to keep a running sum of list elements within a fixed window of size n. When advancing through the list, we will add the new incoming element to the running sum and subtract the old outgoing element. The average is just this sum divided by n.

We can use the old trick of delaying the list by n positions. We’ll pad the beginning of the delayed list with n zeros. Here’s the Haskell code:

delay :: Num a => Int -> [a] -> [a]
delay n lst = replicate n 0 ++ lst

The first line is the (optional, but very useful) type signature. The second line defines the function delay that takes the delay counter n and the list. The function returns a list that is obtained by concatenating (operator ++) the zero-filled list (replicate n 0) in front of the original list. For instance, if you start with the list [1, 2, 3, 4] and delay it by 2, you’ll get [0, 0, 1, 2, 3, 4].

The next step is to create a stream of deltas — the differences between elements separated by n positions. We do it by zipping two lists: the original and the delayed one.

zip lst (delay n lst)

The function zip pairs elements from the first list with the elements from the second list.

Continuing with our example, the zipping will produce the pairs [(1, 0), (2, 0), (3, 1), (4, 2)]. Notice that the left number in each pair is the incoming element that is to be added to the running sum, while the right number is the outgoing one, to be subtracted from the running sum.

Now if we subtract the two numbers in each pair we’ll get exactly the delta that has to be added to the running sum at each step. We do the subtraction by mapping the operator (-) over the list. (To make the subtraction operator (-) operate on pairs we have to uncurry it. (If you don’t know what currying is, don’t worry.)

deltas :: Num a => Int -> [a] -> [a]
deltas n lst = map (uncurry (-)) (zip lst (delay n lst))

Continuing with the example, we will get [1, 2, 2, 2]. These are the amounts by which the running sum should change at every step. (Incidentally, for n equal to one, the deltas are proportional to the derivative of the sampled stream.)

Finally, we have to start accumulating the deltas. There is a library function scanl1 that can be used to produce a list of partial sums when called with the summation operator (+).

slidingSums :: Num a => Int -> [a] -> [a]
slidingSums n lst =  scanl1 (+) (deltas n lst)

At each step, scanl1 will add the delta to the previous running sum. The “1” in its name means that it will start with the first element of the list as the accumulator. The result, in our little example, is [1, 3, 5, 7]. What remains is to divide each sum by n and we’re done:

movingAverage :: Fractional a => Int -> [a] -> [a]
movingAverage n list = map (/ (fromIntegral n)) (slidingSums n list)

Since n is an integer, it has to be explicitly converted to a fractional number before being passed to the division operator. This is done using fromIntegral. The slightly cryptic notation (/ (fromIntegral n)) is called operator section. It just means “divide by n.”

As expected, the final result for the two-element running average of [1, 2, 3, 4] is [0.5, 1.5, 2.5, 3.5]. Notice that we haven’t used any mutable state to achieve this result, which makes this code automatically thread safe. Also, because the calculation is lazy, we can calculate the moving average of an infinite list as long as we only extract a finite number of data points. Here, we are printing the first 10 points of the 5-element moving average of the list of integers from 1 to infinity.

print (take 10 (movingAverage 5 [1..]))

The result is:

[0.2, 0.6, 1.2, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

Conclusion

The functional approach is applicable to designing software not only in the small but, more importantly, in the large. It captures the patterns of interaction between components and the ways they compose. The patterns I mentioned in this post, the functor and the applicative functor, are probably the most common, but functional programmers have at their disposal a large variety of patterns borrowed from various branches of mathematics. These patterns can be used by imperative programmers as well, resulting in cleaner and more maintainable software that is, by construction, multithread-ready.


Table of Contents

Part One

  1. Category: The Essence of Composition
  2. Types and Functions
  3. Categories Great and Small
  4. Kleisli Categories
  5. Products and Coproducts
  6. Simple Algebraic Data Types
  7. Functors
  8. Functoriality
  9. Function Types
  10. Natural Transformations

Part Two

  1. Declarative Programming
  2. Limits and Colimits
  3. Free Monoids
  4. Representable Functors
  5. The Yoneda Lemma
  6. Yoneda Embedding

Part Three

  1. It’s All About Morphisms
  2. Adjunctions
  3. Free/Forgetful Adjunctions
  4. Monads: Programmer’s Definition
  5. Monads and Effects
  6. Monads Categorically
  7. Comonads
  8. F-Algebras
  9. Algebras for Monads
  10. Ends and Coends
  11. Kan Extensions
  12. Enriched Categories
  13. Topoi
  14. Lawvere Theories
  15. Monads, Monoids, and Categories

There is a free pdf version of this book with nicer typesetting available for download. You may order a hard-cover version with color illustrations at Blurb. Or you may watch me teaching this material to a live audience.

Preface

For some time now I’ve been floating the idea of writing a book about category theory that would be targeted at programmers. Mind you, not computer scientists but programmers — engineers rather than scientists. I know this sounds crazy and I am properly scared. I can’t deny that there is a huge gap between science and engineering because I have worked on both sides of the divide. But I’ve always felt a very strong compulsion to explain things. I have tremendous admiration for Richard Feynman who was the master of simple explanations. I know I’m no Feynman, but I will try my best. I’m starting by publishing this preface — which is supposed to motivate the reader to learn category theory — in hopes of starting a discussion and soliciting feedback.

I will attempt, in the space of a few paragraphs, to convince you that this book is written for you, and whatever objections you might have to learning one of the most abstract branches of mathematics in your “copious spare time” are totally unfounded.

My optimism is based on several observations. First, category theory is a treasure trove of extremely useful programming ideas. Haskell programmers have been tapping this resource for a long time, and the ideas are slowly percolating into other languages, but this process is too slow. We need to speed it up.

Second, there are many different kinds of math, and they appeal to different audiences. You might be allergic to calculus or algebra, but it doesn’t mean you won’t enjoy category theory. I would go as far as to argue that category theory is the kind of math that is particularly well suited for the minds of programmers. That’s because category theory — rather than dealing with particulars — deals with structure. It deals with the kind of structure that makes programs composable.

Composition is at the very root of category theory — it’s part of the definition of the category itself. And I will argue strongly that composition is the essence of programming. We’ve been composing things forever, long before some great engineer came up with the idea of a subroutine. Some time ago the principles of structured programming revolutionized programming because they made blocks of code composable. Then came object oriented programming, which is all about composing objects. Functional programming is not only about composing functions and algebraic data structures — it makes concurrency composable — something that’s virtually impossible with other programming paradigms.

Third, I have a secret weapon, a butcher’s knife, with which I will butcher math to make it more palatable to programmers. When you’re a professional mathematician, you have to be very careful to get all your assumptions straight, qualify every statement properly, and construct all your proofs rigorously. This makes mathematical papers and books extremely hard to read for an outsider. I’m a physicist by training, and in physics we made amazing advances using informal reasoning. Mathematicians laughed at the Dirac delta function, which was made up on the spot by the great physicist P. A. M. Dirac to solve some differential equations. They stopped laughing when they discovered a completely new branch of calculus called distribution theory that formalized Dirac’s insights.

Of course when using hand-waving arguments you run the risk of saying something blatantly wrong, so I will try to make sure that there is solid mathematical theory behind informal arguments in this book. I do have a worn-out copy of Saunders Mac Lane’s Category Theory for the Working Mathematician on my nightstand.

Since this is category theory for programmers I will illustrate all major concepts using computer code. You are probably aware that functional languages are closer to math than the more popular imperative languages. They also offer more abstracting power. So a natural temptation would be to say: You must learn Haskell before the bounty of category theory becomes available to you. But that would imply that category theory has no application outside of functional programming and that’s simply not true. So I will provide a lot of C++ examples. Granted, you’ll have to overcome some ugly syntax, the patterns might not stand out from the background of verbosity, and you might be forced to do some copy and paste in lieu of higher abstraction, but that’s just the lot of a C++ programmer.

But you’re not off the hook as far as Haskell is concerned. You don’t have to become a Haskell programmer, but you need it as a language for sketching and documenting ideas to be implemented in C++. That’s exactly how I got started with Haskell. I found its terse syntax and powerful type system a great help in understanding and implementing C++ templates, data structures, and algorithms. But since I can’t expect the readers to already know Haskell, I will introduce it slowly and explain everything as I go.

If you’re an experienced programmer, you might be asking yourself: I’ve been coding for so long without worrying about category theory or functional methods, so what’s changed? Surely you can’t help but notice that there’s been a steady stream of new functional features invading imperative languages. Even Java, the bastion of object-oriented programming, let the lambdas in C++ has recently been evolving at a frantic pace — a new standard every few years — trying to catch up with the changing world. All this activity is in preparation for a disruptive change or, as we physicist call it, a phase transition. If you keep heating water, it will eventually start boiling. We are now in the position of a frog that must decide if it should continue swimming in increasingly hot water, or start looking for some alternatives.

IMG_1299

One of the forces that are driving the big change is the multicore revolution. The prevailing programming paradigm, object oriented programming, doesn’t buy you anything in the realm of concurrency and parallelism, and instead encourages dangerous and buggy design. Data hiding, the basic premise of object orientation, when combined with sharing and mutation, becomes a recipe for data races. The idea of combining a mutex with the data it protects is nice but, unfortunately, locks don’t compose, and lock hiding makes deadlocks more likely and harder to debug.

But even in the absence of concurrency, the growing complexity of software systems is testing the limits of scalability of the imperative paradigm. To put it simply, side effects are getting out of hand. Granted, functions that have side effects are often convenient and easy to write. Their effects can in principle be encoded in their names and in the comments. A function called SetPassword or WriteFile is obviously mutating some state and generating side effects, and we are used to dealing with that. It’s only when we start composing functions that have side effects on top of other functions that have side effects, and so on, that things start getting hairy. It’s not that side effects are inherently bad — it’s the fact that they are hidden from view that makes them impossible to manage at larger scales. Side effects don’t scale, and imperative programming is all about side effects.

Changes in hardware and the growing complexity of software are forcing us to rethink the foundations of programming. Just like the builders of Europe’s great gothic cathedrals we’ve been honing our craft to the limits of material and structure. There is an unfinished gothic cathedral in Beauvais, France, that stands witness to this deeply human struggle with limitations. It was intended to beat all previous records of height and lightness, but it suffered a series of collapses. Ad hoc measures like iron rods and wooden supports keep it from disintegrating, but obviously a lot of things went wrong. From a modern perspective, it’s a miracle that so many gothic structures had been successfully completed without the help of modern material science, computer modelling, finite element analysis, and general math and physics. I hope future generations will be as admiring of the programming skills we’ve been displaying in building complex operating systems, web servers, and the internet infrastructure. And, frankly, they should, because we’ve done all this based on very flimsy theoretical foundations. We have to fix those foundations if we want to move forward.

Ad hoc measures preventing the Beauvais cathedral from collapsing

Ad hoc measures preventing the Beauvais cathedral from collapsing

Next: Category: The Essence of Composition.


We have this friendly competition going on between Eric Niebler and myself. He writes some clever C++ template code, and I feel the compulsion to explain it to him in functional terms. Then I write a blog about Haskell or category theory and Eric feels a compulsion to translate it into C++.

Eric is now working on his proposal to rewrite the C++ STL in terms of ranges and I keep reinterpreting his work in terms familiar to functional programmers. Eric’s range comprehensions are a result of some of this back and forth.

Lazy ranges are such an excellent example of functional programming that it would be foolish for me to pass this opportunity to dissect them. To any functional programmer worth their salt they just scream “monad!” A monad is a higher order pattern that can be built step by step, so that’s what I’m going to do. I’ll start with the functor pattern, then add some functionality that will make it a pointed functor, then add some more to make it an applicative functor, and finally add some more to make it a monad. This gradual buildup of functionality is reminiscent of building a class hierarchy, and indeed it can be modelled as such in Haskell (although Haskell type classes are slightly different than C++ classes). This hierarchy would look something like this:

  • A monad is-an applicative functor
  • An applicative functor is-a pointed functor
  • A pointed functor is-a functor

So let’s start with a functor.

Functor

I have a pet peeve about the use of the word “functor” in C++. People keep calling function objects functors. It’s like calling Luciano Pavarotti an “operator,” because he sings operas. The word functor has a very precise meaning in mathematics — moreover, it’s the branch of mathematics that’s extremely relevant to programming. So hijacking this term to mean a function-like object causes unnecessary confusion.

A functor in functional programming is a generic template, which allows the “lifting” of functions. Let me explain what it means. A generic template takes an arbitrary type as a template argument. So a range (whether lazy or eager) is a generic template because it can be instantiated for any type. You can have a range of integers, a range of vectors, a range of ranges, and so on. (We’ll come back to ranges of ranges later when we talk about monads.)

The “lifting” of functions means this: Give me any function from some type T to some other type U and I can apply this function to a range of T and produce a range of U. You may recognize this kind of lifting in the STL algorithm std::transform, which can be used to apply a function to a container. STL containers are indeed functors. Unfortunately, their functorial nature is buried under the noise of iterators. In Eric’s range library, the lifting is done much more cleanly using view::transform. Have a look at this example:

 int total = accumulate(view::iota(1) |
                        view::transform([](int x){return x*x;}) |
                        view::take(10), 0);

Here, view::transform takes an anonymous function that squares its argument, and lifts this function to the level of ranges. The range created by view::iota(1) is piped into it from the left, and the resulting rage of squares emerges from it on the right. The (infinite) range is then truncated by take‘ing the first 10 elements.

The function view::iota(1) is a factory that produces an infinite range of consecutive integers starting from 1. (We’ll come back to range factories later.)

In this form, view::transform plays the role of a higher-order function: one that takes a function and returns a function. It almost reaches the level of terseness and elegance of Haskell, where this example would look something like this:

total = sum $ take 10 $ fmap (\x->x*x) [1..]

(Traditionally, the flow of data in Haskell is from right to left.) The (higher-order) function fmap can be thought of as a “method” of the class Functor that does the lifting in Haskell. In C++ there is no overall functor abstraction, so each functor names its lifting function differently — for ranges, it’s view::transform.

The intuition behind a functor is that it generates a family of objects that somehow encapsulate values of arbitrary types. This encapsulation can be very concrete or very abstract. For instance, a container simply contains values of a given type. A range provides access to values that may be stored in some other container. A lazy range generates values on demand. A future, which is also a functor (or will be, in C++17), describes a value that might not be currently available because it’s being evaluated in a separate thread.

All these objects have one thing in common: they provide means to manipulate the encapsulated values with functions. That’s the only requirement for a functor. It’s not required that a functor provide access to encapsulated values (which may not even exist), although most do. In fact there is a functor (really, a monad), in Haskell, that provides no way of accessing its values other than outputting them to external devices.

Pointed Functor

A pointed functor is a functor with one additional ability: it lets you lift individual values. Give me a value of any type and I will encapsulate it. In Haskell, the encapsulating function is called pure although, as we will see later, in the context of a monad it’s called return.

All containers are pointed, because you can always create a singleton container — one that contains only one value. Ranges are more interesting. You can obviously create a range from a singleton container. But you can also create a lazy range from a value using a (generic) function called view::single, which doesn’t have a backing container behind it.

There is, however, an alternative way of lifting a value to a range, and that is by repeating it indefinitely. The function that creates such infinite (lazy) ranges is called view::repeat. For instance, view::repeat(1) creates an infinite series of ones. You might be wondering what use could there be of such a repetitive range. Not much, unless you combine it with other ranges. In general, pointed functors are not very interesting other than as stepping stones towards applicative functors and monads. So let’s move on.

Applicative Functor

An applicative functor is a pointed functor with one additional ability: it lets you lift multi-argument functions. We already know how to lift a single-argument function using fmap (or transform, or whatever it’s called for a particular functor).

With multi-argument functions acting on ranges we have two different options corresponding to the two choices for pure I mentioned before: view::single and view::repeat.

The idea, taken from functional languages, is to consider what happens when you provide the first argument to a function of multiple arguments (it’s called partial application). You don’t get back a value. Instead you get something that expects one or more remaining arguments. A thing that expects arguments is called a function (or a function object), so you get back a function of one fewer arguments. In C++ you can’t just call a function with fewer arguments than expected, or you get a compilation error, but there is a (higher-order) function in the Standard Library called std::bind that implements partial application.

This kind of transformation from a function of multiple arguments to a function of one argument that returns a function is called currying.

Let’s consider a simple example. We want to apply std::make_pair to two ranges: view::ints(10, 11) and view::ints(1, 3). To this end, let’s replace std::make_pair with the corresponding curried function of one argument returning a function of one argument:

[](int i) { return [i](int j) { return std::make_pair(i, j); };}

First, we want to apply this function to the first range. We know how to apply a function to a range: we use view::transform.

auto partial_app = view::ints(10, 11) 
                 | view::transform([](int i) { 
                      return [i](int j) { return std::make_pair(i, j); }
                   });

What’s the result of this application? Can you guess? Our curried function will be applied to each integer in the range, returning a function that pairs that integer with its argument. So we end up with a range of functions of the form:

[i](int j) { return std::make_pair(i, j); }

So far so good — we have just used the functorial property of the range. But now we have to decide how to apply a range of functions to the second range of values. And that’s the essence of the definition of an applicative functor. In Haskell the operation of applying encapsulated functions to encapsulated arguments is denoted by an infix operator <*>.

With ranges, there are two basic strategies:

  1. We can enumerate all possible combinations — in other words create the cartesian product of the range of functions with the range of values — or
  2. Match corresponding functions to corresponding values — in other words, “zip” the two ranges.

The former, when applied to view::ints(1, 3), will yield:

{(10,1),(10,2),(10,3),(11,1),(11,2),(11,3)}

and the latter will yield:

{(10, 1),(11, 2)}

(when the ranges are not equal length, you stop zipping when the shorter one is exhausted).

To see that those two choices correspond to the two choices for pure, we have to look at some consistency conditions. One of them is that if you encapsulate a single-argument function in a range using pure and then apply it to a range of arguments, you should get the same result as if you simply fmapped this function directly over the range of arguments. For the record, I’ll write it here in Haskell:

pure f <*> xs == fmap f xs

This is sort of an obvious requirement: You have two different ways of applying a single-argument function to a range, they better give the same result.

Let’s try it with the view::single version of pure. When acting on a function, it will create a one-element range containing this function. The “all possible combinations” application will just apply this function to all elements of the argument range, which is exactly what view::transform would do.

Conversely, if we apply view::repeat to the function, we’ll get an infinite range that repeats this function at every position. We have to zip this range with the range of arguments in order to get the same result as view::transform. So this implementation of pure works with the zippy applicative. Notice that if the argument range is finite the result will also be finite. But this kind of application will also work with infinite ranges thanks to laziness.

So there are two legitimate implementations of the applicative functor for ranges. One uses view::single to lift values and uses the all possible combinations strategy to apply a range of functions to a range of arguments. The other uses view::repeat to lift values and the zipping application for ranges of functions and arguments. They are both acceptable and have their uses.

Now let’s go back to our original problem of applying a function of multiple arguments to a bunch of ranges. Since we are not doing it in Haskell, currying is not really a practical option.

As it turns out, the second version of applicative has been implemented by Eric as a (higher-order) function view::zip_with. This function takes a multi-argument callable object as its first argument, followed by a variadic list of ranges.

There is no corresponding implementation for the combinatorial applicative. I think the most natural interface would be an overload of view::transform (or maybe view::fmap) with the same signature as zip_with. Our example would then look like this:

view::transform(std::make_pair, view::ints(10, 11), view::ints(1, 3));

The need for this kind of interface is not as acute because, as we’ll learn next, the combinatorial applicative is supplanted by a more general monadic interface.

Monad

Monads are applicative functors with one additional functionality. There are two equivalent ways of describing this functionality. But let me first explain why this functionality is needed.

The range library comes with a bunch of range factories, such as view::iota, view::ints, or view::repeat. It’s also very likely that users of the library will want to create their own range factories. The problem is: How do you compose existing range factories to obtain new range factories?

Let me give you an example that generated a blog post exchange between me and Eric. The challenge was to generate a list of Pythagorean triples. The idea is to take a cross product of three infinite ranges of integers and select those triples that satisfy the equation x2 + y2 = z2. The cross product of ranges is reminiscent of the “all possible combinations” applicative, and indeed that’s the applicative that can be extended to a monad (the zippy one can’t).

To make this algorithm feasible, we have to organize these ranges so we can (lazily) traverse them. Let’s start with a factory that produces all integers from 1 to infinity. That’s the view::ints(1) factory. Then, for each z produced by that factory, let’s create another factory view::ints(1, z). This range will provide our xs — and it makes no sense to try xs that are bigger than zs. These values, in turn, will be used in the creation of the third factory, view::ints(x, z) that will generate our ys. At the end we’ll filter out the triples that don’t satisfy the Pythagorean equation.

Notice how we are feeding the output of one range factory into another range factory. Do you see the problem? We can’t just iterate over an infinite range. We need a way to glue the output side of one range factory to the input side of another range factory without unpacking the range. And that’s what monads are for.

Remember also that there are functors that provide no way of extracting values, or for which extraction is expensive or blocking (as is the case with futures). Being able to compose those kinds of functor factories is often essential, and again, the monad is the answer.

Now let’s pinpoint the type of functionality that would allow us to glue range factories end-to-end. Since ranges are functorial, we can use view::transform to apply a factory to a range. After all a factory is just a function. The only problem is that the result of such application is a range of ranges. So, really, all that’s needed is a lazy way of flattening nested ranges. And that’s exactly what Eric’s view::flatten does.

With this new flattening power at our disposal, here’s a possible beginning of the solution to the Pythagorean triple problem:

view::ints(1) | view::transform([](int z) { 
                view::ints(1, z) | ... } | view::flatten

However, this combination of view::transform and view::flatten is so useful that it deserves its own function. In Haskell, this function is called “bind” and is written as an infix operator >>=. (And, while we’re at it, flatten is called join.)

And guess what the combination of view::transform and view::flatten is called in the range library. This discovery struck me as I was looking at one of Eric’s examples. It’s called view::for_each. Here’s the solution to the Pythagorean triple problem using view::for_each to bind range factories:

auto triples =
  for_each(ints(1), [](int z) {
    return for_each(ints(1, z), [=](int x) {
      return for_each(ints(x, z), [=](int y) {
        return yield_if(x*x + y*y == z*z, std::make_tuple(x, y, z));
      });
    });
  });

And here’s the same code in Haskell:

triples = 
  (>>=) [1..] $ \z -> 
     (>>=) [1..z] $ \x -> 
        (>>=) [x..z] $ \y -> 
           guard (x^2 + y^2 == z^2) >> return (x, y, z)

I have purposefully re-formatted Haskell code to match C++ (A more realistic rendition of it is in my post Getting Lazy with C++). Bind operators >>= are normally used in infix position but here I wanted to match them against for_each. Haskell’s return is the same as view::single, which Eric renamed to yield inside for_each. In this particular case, yield is conditional, which in Haskell is expressed using guard. The syntax for lambdas is also different, but otherwise the code matches almost perfectly.

This is an amazing and somewhat unexpected convergence. In our tweeter exchange, Eric sheepishly called his for_each code imperative. We are used to thinking of for_each as synonymous with looping, which is such an iconic imperative construct. But here, for_each is the monadic bind — the epitome of functional programming. This puppy is purely functional. It’s an expression that returns a value (a range) and has no side effects.

But what about those loops that do have side effects and don’t return a value? In Haskell, side effects are always encapsulated using monads. The equivalent of a for_each loop with side effects would return a monadic object. What we consider side effects would be encapsulated in that object. It’s not the loop that performs side effects, its that object. It’s an executable object. In the simplest case, this object contains a function that may be called with the state that is to be modified. For side effects that involve the external world, there is a special monad called the IO monad. You can produce IO objects, you can compose them using monadic bind, but you can’t execute them. Instead you return one such object that combines all the IO of your program from main and let the runtime execute it. (At least that’s the theory.)

Is this in any way relevant to an imperative programmer? After all, in C++ you can perform side effects anywhere in your code. The problem is that there are some parts of your code where side effects can kill you. In concurrent programs uncontrolled side effects lead to data races. In Software Transactional Memory (STM, which at some point may become part of C++) side effects may be re-run multiple times when a transaction is retried. There is an urgent need to control side effects and to separate pure functions from their impure brethren. Encapsulating side effects inside monads could be the ticket to extend the usefulness of pure functions inside an imperative language like C++.

To summarize: A monad is an applicative functor with an additional ability, which can be expressed either as a way of flattening a doubly encapsulated object, or as a way of applying a functor factory to an encapsulated object.

In the range library, the first method is implemented through view::flatten, and the second through view::for_each. Being an applicative functor means that a range can be manipulated using view::transform and that any value may be encapsulated using view::single or, inside for_each, using yield.

The ability to apply a range of functions to a range of arguments that is characteristic of an applicative functor falls out of the monadic functionality. For instance, the example from the previous section can be rewritten as:

for_each(ints(10, 11), [](int i) {
  return for_each(ints(1, 3), [i](int j) {
    return yield(std::make_pair(i, j));
  });
});

The Mess We’re In

I don’t think the ideas I presented here are particularly difficult. What might be confusing though is the many names that are used to describe the same thing. There is a tendency in imperative (and some functional) languages to come up with cute names for identical patterns specialized to different applications. It is also believed that programmers would be scared by terms taken from mathematics. Personally, I think that’s silly. A monad by any other name would smell as sweet, but we wouldn’t be able to communicate about them as easily. Here’s a sampling of various names used in relation to concepts I talked about:

  1. Functor: fmap, transform, Select (LINQ)
  2. Pointed functor: pure, return, single, repeat, make_ready_future, yield, await
  3. Applicative functor: <*>, zip_with
  4. Monad: >>=, bind, mbind, for_each, next, then, SelectMany (LINQ)

Part of the problem is the lack of expressive power in C++ to unite such diverse phenomena as ranges and futures. Unfortunately, the absence of unifying ideas adds to the already overwhelming complexity of the language and its libraries. The functional paradigm could be a force capable of building connections between seemingly distant application areas.

Acknowledments

I’m grateful to Eric Niebler for reviewing the draft of this blog and correcting several mistakes. The remaining mistakes are all mine.


I’m not fond of arguments based on lack of imagination. “There’s no way this code may fail!” might be a sign of great confidence or the result of ignorance. The inability to come up with a counterexample doesn’t prove a theorem. And yet there is one area of programming where such arguments work, and are quite useful. These are parametricity arguments: free theorems about polymorphic functions. Fortunately, there is solid theory behind parametricity. Free theorems are not based on ignorance. So I decided to read the relevant papers (see bibliography at the end of this post) and write a blog about it. How hard could it be? A few months and several failed attempts later I realized how naive I was. But I think I finally understand the basics enough to explain them in relatively simple terms.

Motivation

Here’s a classic example — a function that takes a list of arbitrary type a and returns a list of the same type:

r :: [a] -> [a]

What can this function do? Since it has to work with any type of list element, it can’t do anything type-specific. It can’t modify the elements or invent new ones. So all it can do is rearrange them, duplicate, or remove. Can you think of anything else?

The questions it a little tricky because it all depends on the kind of polymorphism your language supports. In Haskell, where we have parametric polymorphism, the above statement is for the most part true (modulo termination worries). In C++, which supports ad-hoc polymorphism, a generic function like:

template<class T> 
list<T> r(list<T>);

can do all kinds of weird things.

Parametric polymorphism means that a function will act on all types uniformly, so the above declaration of r indeed drastically narrows down the possibilities.

For instance, consider what happens when you map any function of the type:

f :: a -> b

over a list of a. You can either apply map before or after acting on it with r. It shouldn’t matter whether you first modify the elements of the list and then rearrange them, or first rearrange and then modify them. The result should be the same:

r (map f as) = map f (r as)

But is it true just because we can’t imagine how it may fail, or can we make a formal argument to prove it?

Let’s Argue (Denotational) Semantics

One way to understand polymorphism is to have a good model for types. At first approximation types can be modeled as sets of values (strictly speaking, as shown by Reynolds, the set-theoretical model fails in the case of polymorphic lambda calculus, but there are ways around it).

The type Bool is a two-element set of True and False, Integer is a set of integers, and so on. Composite types can also be defined set-theoretically. For instance, a pair type is a cartesian product of two sets. A list of a is a set of lists with elements from the set a. A function type a->b is a set of functions between two sets.

For parametric polymorphism you need to first be able to define functions on types: functions that take a type and produce a new type. In other words, you should be able to define a family of types that is parametrized by another type. In Haskell, we call such things type constructors.

For instance, given some type a, produce a type of pairs: (a, a). This can be formally written (not in Haskell) as:

Λa . (a, a)

Notice the capital lambda for defining functions on types (sets), as opposed to the lowercase lambda used for functions on values (set elements).

To turn a family of types into a family of values — a polymorphic value — you put the universal quantifier forall in front of it. Don’t read too much into the quantifier aspect of it — it makes sense in the Curry-Howard isomorphism, but here it’s just a piece of syntax. It means that you use the type constructor to pick a type, and then you pick a specific value of that type.

You may recall the Axiom of Choice (AoC) from set theory. This axiom says that if you have a set of sets then there always exists a set of samples created by picking one element from each set. It’s like going to a chocolate store and ordering one of each. It’s a controversial axiom, and mathematicians are very careful in either using or avoiding it. The controversy is that, for infinite sets of sets, there may be no constructive way of picking elements. And in computer science we are not so much interested in proofs of existence, as in actual algorithms that produce tangible results.

Here’s an example:

forall a . (a, a)

This is a valid type signature, but you’d be hard pressed to implement it. You’d have to provide a pair of concrete values for every possible type. You can’t do it uniformly across all types. (Especially that some types are uninhabited, as Gershom Bazerman kindly pointed out to me.)

Interestingly enough, you can sometimes define polymorphic values if you constrain polymorphism to certain typeclasses. For instance, when you define a numeric constant in Haskell:

x = 1

its type is polymorphic:

x :: forall a. Num a => a

(using the language extension ExplicitForAll). Here x represents a whole family of values, including:

1.0 :: Float
1 :: Int
1 :: Integer

with potentially different representations.

But there are some types of values that can be specified wholesale. These are function values. Functions are first class values in Haskell (although you can’t compare them for equality). And with one formula you can define a whole family of functions. The following signature, for instance, is perfectly implementable:

forall a . a -> a

Let’s analyze it. It consists of a type function, or a type constructor:

Λa . a -> a

which, for any type a, returns a function type a->a. When universally quantified with forall, it becomes a family of concrete functions, one per each type. This is possible because all these functions can be defined with one closed term (see Appendix 2). Here’s this term:

\x -> x

In this case we actually have a constructive way of picking one element — a function — for each type a. For instance, if a is a String, we pick a function that takes any String and returns the same string. It’s a particular String->String function, one of many possible String->String functions. And it’s different from the Int->Int function that takes an Int and returns the same Int. But all these identity functions are encoded using the same lambda expression. It’s that generic formula that allows us to chose a representative function from each set of functions a->a: one from the set String->String, one from the set Int->Int, etc.

In Haskell, we usually omit the forall quantifier when there’s no danger of confusion. Any signature that contains a type variable is automatically universally quantified over it. (You’ll have to use explicit forall, however, with higher-order polymorphism, where a polymorphic function can be passed as an argument to another function.)

So what’s the set-theoretic model for polymorphism? You simply replace types with sets. A function on types becomes a function on sets. Notice that this is not the same as a function between sets. The latter assigns elements of one set to elements of another. The former assigns sets to sets — you could call it a set constructor. As in: Take any set a and return a cartesian product of this set with itself.

Or take any set a and return the set of functions from this set to itself. We have just seen that for this one we can easily build a polymorphic function — one which for every type a produces an actual function whose type is (a->a). Now, with ad-hoc polymorphism it’s okay to code the String function separately from the Int function; but in parametric polymorphism, you’ll have to use the same code for all types.

This uniformity — one formula for all types — dramatically restricts the set of polymorphic functions, and is the source of free theorems.

Any language that provides some kind of pattern-matching on types (e.g., template specialization in C++) automatically introduces ad-hoc polymorphism. Ad-hoc polymorphism is also possible in Haskell through the use of type classes and type families.

Preservation of Relations

Let’s go to our original example and rewrite it using the explicit universal quantifier:

r :: forall a. [a] -> [a]

It defines a family of functions parametrized by the type a. When used in Haskell code, a particular member of this family will be picked automatically by the type inference system, depending on the context. In what follows, I’ll use explicit subscripting for the same purpose. The free theorem I mentioned before can be rewritten as:

rb (map f as) = map f (ra as)

with the function:

f :: a -> b

serving as a bridge between the types a and b. Specifically, f relates values of type a to values of type b. This relation happens to be functional, which means that there is only one value of type b corresponding to any given value of type a.

But the correspondence between elements of two lists may, in principle, be more general. What’s more general than a function? A relation. A relation between two sets a and b is defined as a set of pairs — a subset of the cartesian product of a and b. A function is a special case of a relation, one that can be represented as a set of pairs of the form (x, f x), or in relational notation x <=> f x. This relation is often called the graph of the function, since it can be interpreted as coordinates of points on a 2-d plane that form the plot the function.

The key insight of Reynolds was that you can abstract the shape of a data structure by defining relations between values. For instance, how do we know that two pairs have the same shape — even if one is a pair of integers, say (1, 7), and the other a pair of colors, say (Red, Blue)? Because we can relate 1 to Red and 7 to Blue. This relation may be called: “occupying the same position”.

Notice that the relation doesn’t have to be functional. The pair (2, 2) can be related to the pair (Black, White) using the non-functional relation:

(2 <=> Black),
(2 <=> White)

This is not a function because 2 is not mapped to a single value.

Conversely, given any relation between integers and colors, you can easily test which integer pairs are related to which color pairs. For the above relation, for instance, these are all the pairs that are related:

((2, 2) <=> (Black, Black)),
((2, 2) <=> (Black, White)),
((2, 2) <=> (White, Black)),
((2, 2) <=> (White, White))

Thus a relation between values induces a relation between pairs.

This idea is easily extended to lists. Two lists are related if their corresponding elements are related: the first element of one list must be related to the first element of the second list, etc.; and empty lists are always related.

In particular, if the relationship between elements is established by a function f, it’s easy to convince yourself that the lists as and bs are related if

bs = map f as

With this in mind, our free theorem can be rewritten as:

rb bs = map f (ra as)

In other words, it tells us that the two lists

rb bs

and

ra as

are related through f.

ListFunRelation

Fig 1. Polymorphic function r rearranges lists but preserves relations between elements

So r transforms related lists into related lists. It may change the shape of the list, but it never touches the values in it. When it acts on two related lists, it rearranges them in exactly the same way, without breaking any of the relations between corresponding elements.

Reading Types as Relations

The above examples showed that we can define relations between values of composite types in terms of relations between values of simpler types. We’ve seen this with the pair constructor and with the list constructor. Continuing this trend, we can state that two functions:

f :: a -> b

and

g :: a' -> b'

are related iff, for related x and y, f x is related to g y. In other words, related functions map related arguments to related values.

Notice what we are doing here: We are consistently replacing types with relations in type constructors. This way we can read complex types as relations. The type constructor -> acts on two types, a and b. We extend it to act on relations: The “relation constructor” -> in A->B takes two relations A (between a and a') and B (between b and b') and produces a relation between functions f and g.

But what about primitive types? Let’s consider an example. Two functions from lists to integers that simply calculate the lengths of the lists:

lenStr  :: [Char] -> Int
lenBool :: [Bool] -> Int

What happens when we call them with two related lists? The first requirement for lists to be related is that they are of equal length. So when called with related lists the two functions will return the same integer value . It makes sense for us to consider these two functions related because they don’t inspect the values stored in the lists — just their shapes. (They also look like components of the same parametrically polymorphic function, length.)

It therefore makes sense to read a primitive type, such as Int, as an identity relation: two values are related if they are equal. This way our two functions, lenStr and lenBool are indeed related, because they turn related lists to related (equal) results.

Notice that for non-polymorphic functions the relationship that follows from their type is pretty restrictive. For instance, two functions Int->Int are related if and only if their outputs are equal for equal inputs. In other words, the functions must be (extensionally) equal.

All these relations are pretty trivial until we get to polymorphic functions. The type of a polymorphic function is specified by universally quantifying a function on types (a type constructor).

f :: forall a. φa

The type constructor φ maps types to types. In our set-theoretical model it maps sets to sets, but we want to read it in terms of relations.

Functions on relations

A general relation is a triple: We have to specify three sets, a, a', and a set of pairs — a subset of the cartesian product a × a'. It’s not at all obvious how to define functions that map relations to relations. What Reynolds chose is a definition that naturally factorizes into three mappings of sets, or to use the language of programming, three type constructors.

First of all, a function on relations Φ (or a “relation constructor”) is defined by two type constructors, φ and ψ. When Φ acts on a relation A between sets a and a', it first maps those sets, so that b=φa and b'=ψa'. ΦA then establishes a relation between the sets b and b'. In other words, ΦA is a subset of b × b'.

RelationMap

Fig 2. Φ maps relations to relations. The squarish sets represent cartesian products (think of a square as a cartesian product of two segments). Relations A and ΦA are subsets of these products.

Relations between polymorphic functions

Given that Φ maps relations to relations, a universally quantified version of it:

forall A. ΦA

maps pairs of sets to pairs of values.

Now suppose that you have two polymorphic functions g and g':

g  :: forall a . φa
g' :: forall a'. ψa'

They both map types (sets) to values.

  • We can instantiate g at some type a, and it will return a value ga of the type b=φa.
  • We can instantiate g' at some type a', and it will return a value g'a' of the type b'=ψa'.

We can do this for any relation A between two arbitrary sets a and a'.

We will say that g and g' are related through the relation induced by the type (forall A. ΦA) iff the results ga and g'a' are related by ΦA.

PolyFunRel

Fig 3. Relation between two polymorphic functions. The pair (g a, g' a') falls inside the relation ΦA.

In other words, polymorphic functions are related if they map related types to related values. Notice that in the interesting examples these values are themselves functions.

With these definitions, we can now reinterpret any type signature as a relation between values.

The Parametricity Theorem

Reynolds’ second key insight was that any term is in a relation with itself — the relation being induced by the term’s type. We have indeed defined the mapping of types to relations to make this work. Primitive types turn into identity relations, so obviously a primitive value is in relation with itself. A function between primitive types is in relation with itself because it maps related (equal) arguments into related (equal) results. A list or a pair of primitive types is in relation with itself because each element of it is equal to itself. You can recurse and consider a list of functions, or a pair of lists, etc., building the proof inductively, proceeding from simpler types to more and more complex types. The proof goes over all possible term construction rules and typing rules in a given theory.

Formally, this kind of proof is called “structural induction,” because you’re showing that more complex structures will satisfy the theorem as long as the simpler ones, from which they are constructed, do. The only tricky part is dealing with polymorphic functions, because they are quantified over all types (including polymorphic types). In fact, this is the reason why the naive interpretation of types as sets breaks down (see, however, Pitts’ paper). It is possible, however, to prove the parametricity theorem in a more general setting, for instance, using frames, or in the framework of operational semantics, so we won’t worry about it here.

Wadler’s key insight was to interpret Reynolds’ theorem not only as a way of identifying different implementations of the same type — for instance, cartesian and polar representations of complex numbers — but also as a source of free theorems for polymorphic types.

Let’s try applying parametricity theorem to some simple examples. Take a constant term: an integer like 5. Its type Int can be interpreted as a relation, which we defined to be the identity relation (it’s one of the primitive types). And indeed, 5 is in this relation with 5.

Take a function like:

ord :: Char -> Int

Its type defines a relation between functions: Two functions of the type Char->Int are related if they return equal integers for equal characters. Obviously, ord is in this relation with itself.

Parametricity in Action

Those were trivial examples. The interesting ones involve polymorphic functions. So let’s go back to our starting example. The term now is the polymorphic function r whose type is:

r :: forall a . [a] -> [a]

Parametricity tells us that r is in relation with itself. However, comparing a polymorphic function to itself involves comparing the instantiations of the same function at two arbitrary types, say a and a'. Let’s go through this example step by step.

We are free to pick an arbitrary relation A between elements of two arbitrary input sets a and a'. The type of r induces a mapping Φ on relations. As with every function on relations, we have to first identify the two type constructors φ and ψ, one mapping a and one mapping a'. In our case they are identical, because they are induced by the same polymorphic function. They are equal to:

Λ a. [a]->[a]

It’s a type constructor that maps an arbitrary type a to the function type [a]->[a].

The universal quantifier forall means that r lets us pick a particular value of the type [a]->[a] for each a. This value is a function that we call ra. We don’t care how this function is picked by r, as long as it’s picked uniformly, using a single formula for all a, so that our parametricity theorem holds.

FreeTheorem

Fig 4. Polymorphic function r maps related types to related values, which themselves are functions on lists

Parametricity means that, if a is related to a', then:

ra <=> ra'

This particular relation is induced by the function type [a]->[a]. By our definition, two functions are related if they map related arguments to related results. In this case both the arguments and the results are lists. So if we have two related lists, as and as':

as  :: [a]
as' :: [a']

they must, by parametricity, be mapped to two related lists, bs and bs':

bs  = ra  as
bs' = ra' as'

This must be true for any relation A, so let’s pick a functional relation generated by some function:

f :: a -> a'

This relation induces a relation on lists:

as' = map f as

The results of applying r, therefore, must be related through the same relation:

bs' = map f bs

Combining all these equalities, we get our expected result:

ra' (map f as) = map f (ra as)

Parametricity and Natural Transformations

The free theorem I used as the running example is interesting for another reason: The list constructor is a functor. You may think of functors as generalized containers for storing arbitrary types of values. You can imagine that they have shapes; and for two containers of the same shape you may establish a correspondence between “positions” at which the elements are stored. This is quite easy for traditional containers like lists or trees, and with a leap of faith it can be stretched to non-traditional “containers” like functions. We used the intuition of relations corresponding to the idea of “occupying the same position” within a data structure. This notion can be readily generalized to any polymorphic containers. Two trees, for instance, are related if they are both empty, or if they have the same shape and their corresponding elements are related.

Let’s try another functor: You can also think of Maybe as having two shapes: Nothing and Just. Two Nothings are always related, and two Justs are related if their contents are related.

This observation immediately gives us a free theorem about polymorphic functions of the type:

r :: forall a. [a] -> Maybe a

an example of which is safeHead. The theorem is:

fmap h . safeHead == safeHead . fmap h

Notice that the fmap on the left is defined by the Maybe functor, whereas the one on the right is the list one.

If you accept the premise that an appropriate relation can be defined for any functor, then you can derive a free theorem for all polymorphic functions of the type:

r :: forall a. f a -> g a

where f and g are functors. This type of function is known as a natural transformation between the two functors, and the free theorem:

fmap h . r == r . fmap h

is the naturality condition. That’s how naturality follows from parametricity.

Acknowledgments

I’d like to thank all the people I talked to about parametricity at the ICFP in Gothenburg, and Edward Kmett for reading and commenting on the draft of this blog.

Appendix 1: Other Examples

Here’s a list of other free theorems from Wadler’s paper. You might try proving them using parametricity.

r :: [a] -> a -- for instance, head
f . r == r . fmap f
r :: [a] -> [a] -> [a] -- for instance, (++)
fmap f (r as bs) == r (fmap f as) (fmap f bs)
r :: [[a]] -> [a] -- for instance, concat
fmap f . r == r . fmap (fmap f)
r :: (a, b) -> a -- for instance, fst
f . r == r . mapPair (f, g)
r :: (a, b) -> b -- for instance, snd
g . r == r . mapPair (f, g)
r :: ([a], [b]) -> [(a, b)] -- for instance, uncurry zip
fmap (mapPair (f, g)) . r == r . mapPair (fmap f, fmap g)
r :: (a -> Bool) -> [a] -> [a] -- for instance, filter
fmap f . r (p . f) = r p . fmap f
r :: (a -> a -> Ordering) -> [a] -> [a] -- for instance, sortBy
 -- assuming: f is monotone (preserves order)
fmap f . r cmp == r cmp' . fmap f
r :: (a -> b -> b) -> b -> [a] -> b -- for instance, foldl
-- assuming: g (acc x y) == acc (f x) (g y)
g . foldl acc zero == foldl acc (g zero) . fmap f
r :: a -> a -- id
f . r == r . f
r :: a -> b -> a -- for instance, the K combinator
f (r x y) == r (f x) (g y)

where:

mapPair :: (a -> c, b -> d) -> (a, b) -> (c, d)
mapPair (f, g) (x, y) = (f x, g y)

Appendix 2: Identity Function

Let’s prove that there is only one polymorphic function of the type:

r :: forall a. a -> a

and it’s the identity function:

id x = x

We start by picking a particular relation. It’s a relation between the unit type () and an arbitrary (inhabited) type a. The relation consists of just one pair ((), c), where () is the unit value and c is an element of a. By parametricity, the function

r() :: () -> ()

must be related to the function

ra :: a -> a

There is only one function of the type ()->() and it’s id(). Related functions must map related argument to related values. We know that r() maps unit value () to unit value (). Therefore ra must map c to c. Since c is arbitrary, ra must be an identity for all (inhabited) as.

Bibliography

  1. John C Reynolds, Types, Abstraction and Parametric Polymorphism
  2. Philip Wadler, Theorems for Free!
  3. Claudio Hermida, Uday S. Reddy, Edmund P. Robinson, Logical Relations and Parametricity – A Reynolds Programme for Category Theory and Programming Languages
  4. Derek Dreyer, Paremetricity and Relational Reasoning, Oregon Programming Languages Summer School
  5. Janis Voigtländer, Free Theorems Involving Type Constructor Classes

« Previous PageNext Page »