I am sometimes asked by C++ programmers to give an example of a problem that can’t be solved without monads. This is the wrong kind of question — it’s like asking if there is a problem that can’t be solved without `for` loops. Obviously, if your language supports a `goto`, you can live without `for` loops. What monads (and for loops) can do for you is to help you structure your code. The use of loops and if statements lets you convert spaghetti code into structured code. Similarly, the use of monads lets you convert imperative code into declarative code. These are the kind of transformations that make code easier to write, understand, maintain, and generalize.

So here’s a problem that you may get as an interview question. It’s a small problem, so the advantages of various approaches might not be immediately obvious, especially if you’ve been trained all your life in imperative programming, and you are seeing monads for the first time.

You’re supposed to write a program to solve this puzzle:

```
  s e n d
+ m o r e
---------
m o n e y
```

Each letter corresponds to a different digit between 0 and 9. Before you continue reading this post, try to think about how you would approach this problem.

## The Analysis

It never hurts to impress your interviewer with your general knowledge by correctly classifying the problem. This one belongs to the class of “constraint satisfaction problems.” The obvious constraint is that the numbers obtained by substituting letters with digits have to add up correctly. There is also a less obvious constraint, namely that the numbers must not start with zero.

If you were to solve this problem using pencil and paper, you would probably come up with lots of heuristics. For instance, you would deduce that `m` must stand for 1 because that’s the largest possible carry from the addition of two digits (even if there is a carry from the previous column). Then you’d figure out that `s` must be either 8 or 9 to produce this carry, and so on. Given enough time, you could probably write an expert system with a large set of rules that could solve this and similar problems. (Mentioning an expert system could earn you extra points with the interviewer.)

However, the small size of the problem suggests that a simple brute force approach is probably best. The interviewer might ask you to estimate the number of possible substitutions, which is 10!/(10 – 8)! or roughly 2 million. That’s not a lot. So, really, the solution boils down to generating all those substitutions and testing the constraints for each.

## The Straightforward Solution

The mind of an imperative programmer immediately sees the solution as a set of 8 nested loops (there are 8 unique letters in the problem: s, e, n, d, m, o, r, y). Something like this:

```
for (int s = 0; s < 10; ++s)
    for (int e = 0; e < 10; ++e)
        for (int n = 0; n < 10; ++n)
            for (int d = 0; d < 10; ++d)
                ...
```

and so on, until `y`. But then there is the condition that the digits have to be different, so you have to insert a bunch of tests like:

```
e != s
n != s && n != e
d != s && d != e && d != n
```

and so on, the last one involving 7 inequalities… Effectively you have replaced the uniqueness condition with 28 new constraints.

This would probably get you through the interview at Microsoft, Google, or Facebook, but really, can’t you do better than that?

## The Smart Solution

Before I proceed, I should mention that what follows is almost a direct translation of a Haskell program from the blog post by Justin Le. I strongly encourage everybody to learn some Haskell, but in the meantime I’ll be happy to serve as your translator.

The problem with our naive solution is the 28 additional constraints. Well, I guess one could live with that — except that this is just a tiny example of a whole range of constraint satisfaction problems, and it makes sense to figure out a more general approach.

The problem can actually be formulated as a superposition of two separate concerns. One deals with the depth and the other with the breadth of the search for solutions.

Let me touch on the depth issue first. Let’s consider the problem of creating just one substitution of letters with numbers. This could be described as picking 8 digits from a list of 0, 1, …9, one at a time. Once a digit is picked, it’s no longer in the list. We don’t want to hard code the list, so we’ll make it a parameter to our algorithm. Notice that this approach works even if the list contains duplicates, or if the list elements are not easily comparable for equality (for instance, if they are futures). We’ll discuss the list-picking part of the problem in more detail later.

Now let’s talk about breadth: we have to repeat the above process for all possible picks. This is what the 8 nested loops were doing. Except that now we are in trouble because each individual pick is destructive. It removes items from the list — it mutates the list. This is a well known problem when searching through solution spaces, and the standard remedy is called backtracking. Once you have processed a particular candidate, you put the elements back in the list, and try the next one. Which means that you have to keep track of your state, either implicitly on your function’s stack, or in a separate explicit data structure.

Wait a moment! Weren’t we supposed to talk about functional programming? So what’s all this talk about mutation and state? Well, who said you can’t have state in functional programming? Functional programmers have been using the state monad since time immemorial. And mutation is not an issue if you’re using persistent data structures. So fasten your seat belts and make sure your folding trays are in the upright position.

## The List Monad

We’ll start with a small refresher in quantum mechanics. As you may remember from school, quantum processes are non-deterministic. You may repeat the same experiment many times and every time get a different result. There is a very interesting view of quantum mechanics called the many-worlds interpretation, in which every experiment gives rise to multiple alternate histories. So if the spin of an electron may be measured as either up or down, there will be one universe in which it’s up, and one in which it’s down.

We’ll use the same approach to solving our puzzle. We’ll create an alternate universe for each digit substitution for a given letter. So we’ll start with 10 universes for the letter `s`; then we’ll split each of them into ten universes for the letter `e`, and so on. Of course, most of these universes won’t yield the desired result, so we’ll have to destroy them. I know, it seems kind of wasteful, but in functional programming it’s easy come, easy go. The creation of a new universe is relatively cheap. That’s because new universes are not that different from their parent universes, and they can share almost all of their data. That’s the idea behind persistent data structures. These are the immutable data structures that are “mutated” by cloning. A cloned data structure shares most of its implementation with the parent, except for a small delta. We’ll be using persistent lists described in my earlier post.

Once you internalize the many-worlds approach to programming, the implementation is pretty straightforward. First, we need functions that generate new worlds. Since we are cheap, we’ll only generate the parts that are different. So what’s the difference between all the worlds that we get when selecting the substitution for the letter `s`? Just the number that we assign to `s`. There are ten worlds corresponding to the ten possible digits (we’ll deal with the constraints like `s` being different from zero later). So all we need is a function that generates a list of ten digits. These are our ten universes in a nutshell. They share everything else.

Once you are in an alternate universe, you have to continue with your life. In functional programming, the rest of your life is just a function called a continuation. I know it sounds like a horrible simplification. All your actions, emotions, and hopes reduced to just one function. Well, maybe the continuation just describes one aspect of your life, the computational part, and you can still hold on to your emotions.

So what do our lives look like, and what do they produce? The input is the universe we’re in, in particular the one number that was picked for us. But since we live in a quantum universe, the outcome is a multitude of universes. So a continuation takes a number, and produces a list. It doesn’t have to be a list of numbers, just a list of whatever characterizes the differences between alternate universes. In particular, it could be a list of different solutions to our puzzle — triples of numbers corresponding to “send”, “more”, and “money”. (There is actually only one solution, but that’s beside the point.)

And what’s the very essence of this new approach? It’s the binding of the selection of the universes to the continuation. That’s where the action is. This binding, again, can be expressed as a function. It’s a function that takes a list of universes and a continuation that produces a list of universes. It returns an even bigger list of universes. We’ll call this function `for_each`, and we’ll make it as generic as possible. We won’t assume anything about the type of the universes that are passed in, or the type of the universes that the continuation `k` produces. We’ll also make the type of the continuation a template parameter and extract the return type from it using `auto` and `decltype`:

```
template<class A, class F>
auto for_each(List<A> lst, F k) -> decltype(k(lst.front()))
{
    using B = decltype(k(lst.front()).front());
    // This should really be expressed using concepts
    static_assert(std::is_convertible<
        F, std::function<List<B>(A)>>::value,
        "for_each requires a function type List<B>(A)");

    List<List<B>> lstLst = fmap(k, lst);
    return concatAll(lstLst);
}
```

The function `fmap` is similar to `std::transform`. It applies the continuation `k` to every element of the list `lst`. Because `k` itself produces a list, the result is a list of lists, `lstLst`. The function `concatAll` concatenates all those lists into one big list.

Congratulations! You have just seen a monad. This one is called the list monad and it’s used to model non-deterministic processes. The monad is actually defined by two functions. One of them is `for_each`, and here’s the other one:

```
template<class A>
List<A> yield(A a)
{
    return List<A>(a);
}
```

It’s a function that returns a singleton list. We use `yield` when we are done multiplying universes and we just want to return a single value. We use it to create a single-valued continuation. It represents the lonesome boring life, devoid of any choices.

I will later rename these functions to `mbind` and `mreturn`, because they are part of any monad, not just the list monad.

The names like `for_each` or `yield` have a very imperative ring to them. That’s because, in functional programming, monadic code plays a role similar to imperative code. But neither `for_each` nor `yield` are control structures — they are functions. In particular `for_each`, which sounds and works like a loop, is just a higher order function; and so is `fmap`, which is used in its implementation. Of course, at some level the code becomes imperative — `fmap` can either be implemented recursively or using an actual loop — but the top levels are just declarations of functions. Hence, declarative programming.

There is a slight difference between a loop and a function on lists like `for_each`: `for_each` takes a whole list as an argument, while a loop might generate individual items — in this case integers — on the fly. This is not a problem in a lazy functional language like Haskell, where a list is evaluated on demand. The same behavior may be implemented in C++ using streams or lazy ranges. I won’t use it here, since the lists we are dealing with are short, but you can read more about it in my earlier post Getting Lazy with C++.

We are not ready yet to implement the solution to our puzzle, but I’d like to give you a glimpse of what it looks like. For now, think of `StateL` as just a list. See if it starts making sense (try to see past the usual C++ noise):

```
StateL<tuple<int, int, int>> solve()
{
    StateL<int> sel = &select<int>;

    return for_each(sel, [=](int s) {
    return for_each(sel, [=](int e) {
    return for_each(sel, [=](int n) {
    return for_each(sel, [=](int d) {
    return for_each(sel, [=](int m) {
    return for_each(sel, [=](int o) {
    return for_each(sel, [=](int r) {
    return for_each(sel, [=](int y) {
        return yield_if(s != 0 && m != 0, [=]() {
            int send  = asNumber(vector<int>{s, e, n, d});
            int more  = asNumber(vector<int>{m, o, r, e});
            int money = asNumber(vector<int>{m, o, n, e, y});
            return yield_if(send + more == money, [=]() {
                return yield(make_tuple(send, more, money));
            });
        });
    });});});});});});});});
}
```

The first `for_each` takes a selection of integers, `sel`, (never mind how we deal with uniqueness); and a continuation, a lambda, that takes one integer, `s`, and produces a list of solutions — tuples of three integers. This continuation, in turn, calls `for_each` with a selection for the next letter, `e`, and another continuation that returns a list of solutions, and so on.

The innermost continuation is a conditional version of `yield` called `yield_if`. It checks a condition and produces a zero- or one-element list of solutions. Internally, it calls another `yield_if`, which calls the ultimate `yield`. If that final `yield` is called (and it might not be, if one of the previous conditions fails), it will produce a solution — a triple of numbers. If there is more than one solution, these singleton lists will get concatenated inside `for_each` while percolating to the top.

In the second part of this post I will come back to the problem of picking unique numbers and introduce the state monad. You can also have a peek at the code on GitHub.

## Challenges

1. Implement `for_each` and `yield` for a `vector` instead of a `List`. Use the Standard Library `transform` instead of `fmap`.
2. Using the list monad (or your vector monad), write a function that generates all positions on a chessboard as pairs of characters between `'a'` and `'h'` and numbers between 1 and 8.
3. Implement a version of `for_each` (call it `repeat`) that takes a continuation `k` of the type `function<List<B>()>` (notice the void argument). The function `repeat` calls `k` for each element of the list `lst`, but it ignores the element itself.

We have this friendly competition going on between Eric Niebler and myself. He writes some clever C++ template code, and I feel the compulsion to explain it to him in functional terms. Then I write a blog about Haskell or category theory and Eric feels a compulsion to translate it into C++.

Eric is now working on his proposal to rewrite the C++ STL in terms of ranges and I keep reinterpreting his work in terms familiar to functional programmers. Eric’s range comprehensions are a result of some of this back and forth.

Ranges fit neatly into the standard functional hierarchy:

• A monad is-an applicative functor
• An applicative functor is-a pointed functor
• A pointed functor is-a functor

## Functor

I have a pet peeve about the use of the word “functor” in C++. People keep calling function objects functors. It’s like calling Luciano Pavarotti an “operator,” because he sings operas. The word functor has a very precise meaning in mathematics; moreover, it comes from category theory, a branch of mathematics that’s extremely relevant to programming. So hijacking this term to mean a function-like object causes unnecessary confusion.

A functor in functional programming is a generic template, which allows the “lifting” of functions. Let me explain what it means. A generic template takes an arbitrary type as a template argument. So a range (whether lazy or eager) is a generic template because it can be instantiated for any type. You can have a range of integers, a range of vectors, a range of ranges, and so on. (We’ll come back to ranges of ranges later when we talk about monads.)

The “lifting” of functions means this: Give me any function from some type T to some other type U and I can apply this function to a range of T and produce a range of U. You may recognize this kind of lifting in the STL algorithm `std::transform`, which can be used to apply a function to a container. STL containers are indeed functors. Unfortunately, their functorial nature is buried under the noise of iterators. In Eric’s range library, the lifting is done much more cleanly using `view::transform`. Have a look at this example:

```
int total = accumulate(view::iota(1) |
                       view::transform([](int x){return x*x;}) |
                       view::take(10), 0);
```

Here, `view::transform` takes an anonymous function that squares its argument, and lifts this function to the level of ranges. The range created by `view::iota(1)` is piped into it from the left, and the resulting range of squares emerges from it on the right. The (infinite) range is then truncated by `take`, which keeps only the first 10 elements.

The function `view::iota(1)` is a factory that produces an infinite range of consecutive integers starting from 1. (We’ll come back to range factories later.)

In this form, `view::transform` plays the role of a higher-order function: one that takes a function and returns a function. It almost reaches the level of terseness and elegance of Haskell, where this example would look something like this:

`total = sum $ take 10 $ fmap (\x->x*x) [1..]`

(Traditionally, the flow of data in Haskell is from right to left.) The (higher-order) function `fmap` can be thought of as a “method” of the class `Functor` that does the lifting in Haskell. In C++ there is no overall functor abstraction, so each functor names its lifting function differently — for ranges, it’s `view::transform`.

The intuition behind a functor is that it generates a family of objects that somehow encapsulate values of arbitrary types. This encapsulation can be very concrete or very abstract. For instance, a container simply contains values of a given type. A range provides access to values that may be stored in some other container. A lazy range generates values on demand. A `future`, which is also a functor (or will be, in C++17), describes a value that might not be currently available because it’s being evaluated in a separate thread.

All these objects have one thing in common: they provide means to manipulate the encapsulated values with functions. That’s the only requirement for a functor. It’s not required that a functor provide access to encapsulated values (which may not even exist), although most do. In fact there is a functor (really, a monad), in Haskell, that provides no way of accessing its values other than outputting them to external devices.

## Pointed Functor

A pointed functor is a functor with one additional ability: it lets you lift individual values. Give me a value of any type and I will encapsulate it. In Haskell, the encapsulating function is called `pure` although, as we will see later, in the context of a monad it’s called `return`.

All containers are pointed, because you can always create a singleton container — one that contains only one value. Ranges are more interesting. You can obviously create a range from a singleton container. But you can also create a lazy range from a value using a (generic) function called `view::single`, which doesn’t have a backing container behind it.

There is, however, an alternative way of lifting a value to a range, and that is by repeating it indefinitely. The function that creates such infinite (lazy) ranges is called `view::repeat`. For instance, `view::repeat(1)` creates an infinite series of ones. You might be wondering what use could there be of such a repetitive range. Not much, unless you combine it with other ranges. In general, pointed functors are not very interesting other than as stepping stones towards applicative functors and monads. So let’s move on.

## Applicative Functor

An applicative functor is a pointed functor with one additional ability: it lets you lift multi-argument functions. We already know how to lift a single-argument function using `fmap` (or `transform`, or whatever it’s called for a particular functor).

With multi-argument functions acting on ranges we have two different options corresponding to the two choices for `pure` I mentioned before: `view::single` and `view::repeat`.

The idea, taken from functional languages, is to consider what happens when you provide the first argument to a function of multiple arguments (it’s called partial application). You don’t get back a value. Instead you get something that expects one or more remaining arguments. A thing that expects arguments is called a function (or a function object), so you get back a function of one fewer arguments. In C++ you can’t just call a function with fewer arguments than expected, or you get a compilation error, but there is a (higher-order) function in the Standard Library called `std::bind` that implements partial application.

This kind of transformation from a function of multiple arguments to a function of one argument that returns a function is called currying.

Let’s consider a simple example. We want to apply `std::make_pair` to two ranges: `view::ints(10, 11)` and `view::ints(1, 3)`. To this end, let’s replace `std::make_pair` with the corresponding curried function of one argument returning a function of one argument:

`[](int i) { return [i](int j) { return std::make_pair(i, j); };}`

First, we want to apply this function to the first range. We know how to apply a function to a range: we use `view::transform`.

```
auto partial_app = view::ints(10, 11)
    | view::transform([](int i) {
        return [i](int j) { return std::make_pair(i, j); };
    });
```

What’s the result of this application? Can you guess? Our curried function will be applied to each integer in the range, returning a function that pairs that integer with its argument. So we end up with a range of functions of the form:

`[i](int j) { return std::make_pair(i, j); }`

So far so good — we have just used the functorial property of the range. But now we have to decide how to apply a range of functions to the second range of values. And that’s the essence of the definition of an applicative functor. In Haskell the operation of applying encapsulated functions to encapsulated arguments is denoted by an infix operator `<*>`.

With ranges, there are two basic strategies:

1. We can enumerate all possible combinations — in other words create the cartesian product of the range of functions with the range of values — or
2. Match corresponding functions to corresponding values — in other words, “zip” the two ranges.

The former, when applied to `view::ints(1, 3)`, will yield:

`{(10,1),(10,2),(10,3),(11,1),(11,2),(11,3)}`

and the latter will yield:

`{(10, 1),(11, 2)}`

(when the ranges are not of equal length, zipping stops as soon as the shorter one is exhausted).

To see that those two choices correspond to the two choices for `pure`, we have to look at some consistency conditions. One of them is that if you encapsulate a single-argument function in a range using `pure` and then apply it to a range of arguments, you should get the same result as if you simply `fmap`ped this function directly over the range of arguments. For the record, I’ll write it here in Haskell:

`pure f <*> xs == fmap f xs`

This is sort of an obvious requirement: You have two different ways of applying a single-argument function to a range, they better give the same result.

Let’s try it with the `view::single` version of `pure`. When acting on a function, it will create a one-element range containing this function. The “all possible combinations” application will just apply this function to all elements of the argument range, which is exactly what `view::transform` would do.

Conversely, if we apply `view::repeat` to the function, we’ll get an infinite range that repeats this function at every position. We have to zip this range with the range of arguments in order to get the same result as `view::transform`. So this implementation of `pure` works with the zippy applicative. Notice that if the argument range is finite the result will also be finite. But this kind of application will also work with infinite ranges thanks to laziness.

So there are two legitimate implementations of the applicative functor for ranges. One uses `view::single` to lift values and uses the all possible combinations strategy to apply a range of functions to a range of arguments. The other uses `view::repeat` to lift values and the zipping application for ranges of functions and arguments. They are both acceptable and have their uses.

Now let’s go back to our original problem of applying a function of multiple arguments to a bunch of ranges. Since we are not doing it in Haskell, currying is not really a practical option.

As it turns out, the second version of applicative has been implemented by Eric as a (higher-order) function `view::zip_with`. This function takes a multi-argument callable object as its first argument, followed by a variadic list of ranges.

There is no corresponding implementation for the combinatorial applicative. I think the most natural interface would be an overload of `view::transform` (or maybe `view::fmap`) with the same signature as `zip_with`. Our example would then look like this:

`view::transform(std::make_pair, view::ints(10, 11), view::ints(1, 3));`

The need for this kind of interface is not as acute because, as we’ll learn next, the combinatorial applicative is supplanted by a more general monadic interface.

## Monad

Monads are applicative functors with one additional functionality. There are two equivalent ways of describing this functionality. But let me first explain why this functionality is needed.

The range library comes with a bunch of range factories, such as `view::iota`, `view::ints`, or `view::repeat`. It’s also very likely that users of the library will want to create their own range factories. The problem is: How do you compose existing range factories to obtain new range factories?

Let me give you an example that generated a blog post exchange between me and Eric. The challenge was to generate a list of Pythagorean triples. The idea is to take a cross product of three infinite ranges of integers and select those triples that satisfy the equation x² + y² = z². The cross product of ranges is reminiscent of the “all possible combinations” applicative, and indeed that’s the applicative that can be extended to a monad (the zippy one can’t).

To make this algorithm feasible, we have to organize these ranges so we can (lazily) traverse them. Let’s start with a factory that produces all integers from 1 to infinity. That’s the `view::ints(1)` factory. Then, for each `z` produced by that factory, let’s create another factory `view::ints(1, z)`. This range will provide our `x`s — and it makes no sense to try `x`s that are bigger than `z`s. These values, in turn, will be used in the creation of the third factory, `view::ints(x, z)` that will generate our `y`s. At the end we’ll filter out the triples that don’t satisfy the Pythagorean equation.

Notice how we are feeding the output of one range factory into another range factory. Do you see the problem? We can’t just iterate over an infinite range. We need a way to glue the output side of one range factory to the input side of another range factory without unpacking the range. And that’s what monads are for.

Remember also that there are functors that provide no way of extracting values, or for which extraction is expensive or blocking (as is the case with futures). Being able to compose those kinds of functor factories is often essential, and again, the monad is the answer.

Now let’s pinpoint the type of functionality that would allow us to glue range factories end-to-end. Since ranges are functorial, we can use `view::transform` to apply a factory to a range. After all a factory is just a function. The only problem is that the result of such application is a range of ranges. So, really, all that’s needed is a lazy way of flattening nested ranges. And that’s exactly what Eric’s `view::flatten` does.

With this new flattening power at our disposal, here’s a possible beginning of the solution to the Pythagorean triple problem:

```
view::ints(1) | view::transform([](int z) {
    return view::ints(1, z) | ... ; }) | view::flatten
```

However, this combination of `view::transform` and `view::flatten` is so useful that it deserves its own function. In Haskell, this function is called “bind” and is written as an infix operator `>>=`. (And, while we’re at it, `flatten` is called `join`.)

And guess what the combination of `view::transform` and `view::flatten` is called in the range library. This discovery struck me as I was looking at one of Eric’s examples. It’s called `view::for_each`. Here’s the solution to the Pythagorean triple problem using `view::for_each` to bind range factories:

```
auto triples =
    for_each(ints(1), [](int z) {
        return for_each(ints(1, z), [=](int x) {
            return for_each(ints(x, z), [=](int y) {
                return yield_if(x*x + y*y == z*z, std::make_tuple(x, y, z));
            });
        });
    });
```

And here’s the same code in Haskell:

```
triples =
    (>>=) [1..] $ \z ->
    (>>=) [1..z] $ \x ->
    (>>=) [x..z] $ \y ->
    guard (x^2 + y^2 == z^2) >> return (x, y, z)
```

I have purposefully re-formatted Haskell code to match C++ (A more realistic rendition of it is in my post Getting Lazy with C++). Bind operators `>>=` are normally used in infix position but here I wanted to match them against `for_each`. Haskell’s `return` is the same as `view::single`, which Eric renamed to `yield` inside `for_each`. In this particular case, `yield` is conditional, which in Haskell is expressed using `guard`. The syntax for lambdas is also different, but otherwise the code matches almost perfectly.

This is an amazing and somewhat unexpected convergence. In our Twitter exchange, Eric sheepishly called his `for_each` code imperative. We are used to thinking of `for_each` as synonymous with looping, which is such an iconic imperative construct. But here, `for_each` is the monadic bind — the epitome of functional programming. This puppy is purely functional. It’s an expression that returns a value (a range) and has no side effects.

But what about those loops that do have side effects and don’t return a value? In Haskell, side effects are always encapsulated using monads. The equivalent of a `for_each` loop with side effects would return a monadic object. What we consider side effects would be encapsulated in that object. It’s not the loop that performs side effects, it’s that object. It’s an executable object. In the simplest case, this object contains a function that may be called with the state that is to be modified. For side effects that involve the external world, there is a special monad called the IO monad. You can produce IO objects, you can compose them using monadic bind, but you can’t execute them. Instead you return one such object that combines all the IO of your program from `main` and let the runtime execute it. (At least that’s the theory.)

Is this in any way relevant to an imperative programmer? After all, in C++ you can perform side effects anywhere in your code. The problem is that there are some parts of your code where side effects can kill you. In concurrent programs uncontrolled side effects lead to data races. In Software Transactional Memory (STM, which at some point may become part of C++) side effects may be re-run multiple times when a transaction is retried. There is an urgent need to control side effects and to separate pure functions from their impure brethren. Encapsulating side effects inside monads could be the ticket to extend the usefulness of pure functions inside an imperative language like C++.

To summarize: A monad is an applicative functor with an additional ability, which can be expressed either as a way of flattening a doubly encapsulated object, or as a way of applying a functor factory to an encapsulated object.

In the range library, the first method is implemented through `view::flatten`, and the second through `view::for_each`. Being an applicative functor means that a range can be manipulated using `view::transform` and that any value may be encapsulated using `view::single` or, inside `for_each`, using `yield`.

The ability to apply a range of functions to a range of arguments that is characteristic of an applicative functor falls out of the monadic functionality. For instance, the example from the previous section can be rewritten as:

```for_each(ints(10, 11), [](int i) {
    return for_each(ints(1, 3), [i](int j) {
        return yield(std::make_pair(i, j));
    });
});```

## The Mess We’re In

I don’t think the ideas I presented here are particularly difficult. What might be confusing, though, is the many names that are used to describe the same thing. There is a tendency in imperative (and some functional) languages to come up with cute names for identical patterns specialized to different applications. It is also believed that programmers would be scared by terms taken from mathematics. Personally, I think that’s silly. A monad by any other name would smell as sweet, but we wouldn’t be able to communicate about it as easily. Here’s a sampling of various names used in relation to concepts I talked about:

1. Functor: `fmap`, `transform`, `Select` (LINQ)
2. Pointed functor: `pure`, `return`, `single`, `repeat`, `make_ready_future`, `yield`, `await`
3. Applicative functor: `<*>`, `zip_with`
4. Monad: `>>=`, `bind`, `mbind`, `for_each`, `next`, `then`, `SelectMany` (LINQ)

Part of the problem is the lack of expressive power in C++ to unite such diverse phenomena as ranges and futures. Unfortunately, the absence of unifying ideas adds to the already overwhelming complexity of the language and its libraries. The functional paradigm could be a force capable of building connections between seemingly distant application areas.

## Acknowledgments

I’m grateful to Eric Niebler for reviewing the draft of this blog and correcting several mistakes. The remaining mistakes are all mine.

Lazy evaluation can be a powerful tool for structuring your code. For instance, it can let you turn your code inside out, inverting the flow of control. Many a Haskell program takes advantage of laziness to express algorithms in clear succinct terms, turning them from recipes into declarations.

The question for today’s blog post is: How can we tap the power of lazy evaluation in an inherently eager language like C++? I’ll lead you through a simple coding example and gradually introduce the building blocks of lazy programming: the suspension, the lazy stream, and a whole slew of functional algorithms that let you operate on them. In the process we’ll discover some fundamental functional patterns like functors, monads, and monoids. I have discussed them already in my post about C++ futures. It’s very edifying to see them emerge in a completely different context.

# The Problem

Let’s write a program that prints the first n Pythagorean triples. A Pythagorean triple consists of three integers, x, y, and z, that satisfy the relation x² + y² = z². Let’s not be fancy and just go with the brute force approach. Here’s the program in C:

```void printNTriples(int n)
{
    int i = 0;
    for (int z = 1; ; ++z)
        for (int x = 1; x <= z; ++x)
            for (int y = x; y <= z; ++y)
                if (x*x + y*y == z*z) {
                    printf("%d, %d, %d\n", x, y, z);
                    if (++i == n)
                        return;
                }
}```

Here, a single C function serves three distinct purposes: It

1. Generates Pythagorean triples,
2. Prints them,
3. Counts them; and when the count reaches n, breaks.

This is fine, as long as you don’t have to modify or reuse this code. But what if, for instance, instead of printing, you wanted to draw the triples as triangles? Or if you wanted to stop as soon as one of the numbers reached 100? The problem with this code is that it’s structured inside out: both the test and the sink for data are embedded in the innermost loop of the algorithm. A more natural and flexible approach would be to:

1. Generate the list of Pythagorean triples,
2. Take the first ten of them, and
3. Print them.

And that’s exactly how you’d write this program in Haskell:

```main = print (take 10 triples)

triples = [(x, y, z) | z <- [1..]
                     , x <- [1..z]
                     , y <- [x..z]
                     , x^2 + y^2 == z^2]```

This program reads: take 10 triples and print them. It declares `triples` as a list (square brackets mean a list) of triples `(x, y, z)`, where (the vertical bar reads “where”) `z` is an element of the list of integers from 1 to infinity, `x` is from 1 to `z`, `y` is from `x` to `z`, and the sum of squares of `x` and `y` is equal to the square of `z`. This notation is called “list comprehension” and is characteristic of Haskell terseness.

You see the difference? Haskell lets you abstract the notion of the list of Pythagorean triples so you can operate on it as one entity, whereas in C (or, for that matter, in C++) we were not able to disentangle the different, orthogonal aspects of the program.

The key advantage of Haskell in this case is its ability to deal with infinite lists. And this ability comes from Haskell’s inherent laziness. Things are never evaluated in Haskell until they are absolutely needed. In the program above, it was the call to `print` that forced Haskell to actually do some work: take 10 elements from the list of triples. Since the triples weren’t there yet, it had to calculate them, but only as many as were requested and not one more.

# Suspension

We’ll start with the most basic building block of laziness: a suspended function. Here’s the first naive attempt:

```template<class T>
class Susp {
public:
    explicit Susp(std::function<T()> f)
        : _f(f)
    {}
    T get() { return _f(); }
private:
    std::function<T()> _f;
};```

We often create suspensions using lambda functions, as in:

```int x = 2;
int y = 3;
Susp<int> sum([x, y]() { return x + y; });
...
int z = sum.get();```

Notice that the suspended lambda may capture variables from its environment: here `x` and `y`. A lambda, and therefore a suspension, is a `closure`.

The trouble with this implementation is that the function is re-executed every time we call `get`. There are several problems with that: If the function is not pure, we may get different values each time; if the function has side effects, these may happen multiple times; and if the function is expensive, the performance will suffer. All these problems may be addressed by memoizing the value.

Here’s the idea: The first time the client calls `get` we should execute the function and store the returned value in a member variable. Subsequent calls should go directly to that variable. We could implement this by setting a Boolean flag on the first call and then checking it on every subsequent call, but there’s a better implementation that uses thunks.

A thunk is a pointer to a free function taking a suspension (the `this` pointer) and returning a value (by const reference). The `get` method simply calls this thunk, passing it the `this` pointer.

Initially, the thunk is set to `thunkForce`, which calls the method `setMemo`. This method evaluates the function, stores the result in `_memo`, switches the thunk to `thunkGet`, and returns the memoized value. On subsequent calls, `get` goes through `thunkGet`, which simply returns the memoized value.

```template<class T>
class Susp
{
    // thunk
    static T const & thunkForce(Susp * susp) {
        return susp->setMemo();
    }
    // thunk
    static T const & thunkGet(Susp * susp) {
        return susp->getMemo();
    }
    T const & getMemo() {
        return _memo;
    }
    T const & setMemo() {
        _memo = _f();
        _thunk = &thunkGet;
        return getMemo();
    }
public:
    explicit Susp(std::function<T()> f)
        : _f(f), _thunk(&thunkForce), _memo(T())
    {}
    T const & get() {
        return _thunk(this);
    }
private:
    T const & (*_thunk)(Susp *);
    mutable T _memo;
    std::function<T()> _f;
};```

(By the way, the function pointer declaration of `_thunk` looks pretty scary in C++, doesn’t it?)

[Edit: I decided to remove the discussion of the thread safe implementation since it wasn’t ready for publication. The current implementation is not thread safe.]

You can find a lot more detail about the Haskell implementation of suspended functions in the paper by Tim Harris, Simon Marlow, and Simon Peyton Jones, Haskell on a Shared-Memory Multiprocessor.

# Lazy Stream

The loop we used to produce Pythagorean triples in C worked on the push principle — data was pushed towards the sink. If we want to deal with infinite lists, we have to use the pull principle. It should be up to the client to control the flow of data. That’s the inversion of control I was talking about in the introduction.

We’ll use a lazy list and call it a stream. In C++ a similar idea is sometimes expressed in terms of input and forward iterators, although it is understood that an iterator itself is not the source or the owner of data — just an interface to one. So we’ll stick with the idea of a stream.

We’ll implement the stream in the functional style as a persistent data structure fashioned after persistent lists (see my series of blog posts on persistent data structures). This means that a stream, once constructed, is never modified. To “advance” the stream, we’ll have to create a new one by calling the `const` method `pop_front`.

Let’s start with the definition: A stream is either empty or it contains a suspended cell. This immediately suggests the implementation as a (possibly null) pointer to a cell. Since the whole stream is immutable, the cell will be immutable too, so it’s perfectly safe to share it between copies of the stream. We can therefore use a shared pointer:

```template<class T>
class Stream
{
private:
    std::shared_ptr<Susp<Cell<T>>> _lazyCell;
};```

Of course, because of reference counting and memoization, the stream is only conceptually immutable and, in the current implementation, not thread safe.

So what’s in the `Cell`? Remember, we want to be able to generate infinite sequences, so `Stream` must contain the DNA for not only producing the value of type `T` but also for producing the offspring — another (lazy) `Stream` of values. The `Cell` is just that: A value and a stream.

```template<class T>
class Cell
{
public:
    Cell() {} // need default constructor for memoization
    Cell(T v, Stream<T> const & tail)
        : _v(v), _tail(tail)
    {}
    explicit Cell(T v) : _v(v) {}
    T val() const {
        return _v;
    }
    Stream<T> pop_front() const {
        return _tail;
    }
private:
    T _v;
    Stream<T> _tail;
};```

This mutually recursive pair of data structures works together amazingly well.

```template<class T>
class Stream
{
private:
    std::shared_ptr<Susp<Cell<T>>> _lazyCell;
public:
    Stream() {}
    Stream(std::function<Cell<T>()> f)
        : _lazyCell(std::make_shared<Susp<Cell<T>>>(f))
    {}
    Stream(Stream && stm)
        : _lazyCell(std::move(stm._lazyCell))
    {}
    Stream & operator=(Stream && stm)
    {
        _lazyCell = std::move(stm._lazyCell);
        return *this;
    }
    bool isEmpty() const
    {
        return !_lazyCell;
    }
    T get() const
    {
        return _lazyCell->get().val();
    }
    Stream<T> pop_front() const
    {
        return _lazyCell->get().pop_front();
    }
};```

There are several things worth pointing out. The two constructors follow our formal definition of the `Stream`: one constructs an empty stream, the other constructs a suspended `Cell`. A suspension is created from a function returning `Cell`.

I also added a move constructor and a move assignment operator for efficiency. We’ll see them used in the implementation of `forEach`.

The magic happens when we call `get` for the first time. That’s when the suspended `Cell` comes to life. The value and the new stream are produced and memoized for later use. Or, this may happen if the first call is to `pop_front`. Notice that `pop_front` is a `const` method — the `Stream` itself is immutable. The method returns a new stream that encapsulates the rest of the sequence.

Let’s get our feet wet by constructing a stream of integers from `n` to infinity. The constructor of a `Stream` takes a function that returns a `Cell`. We’ll use a lambda that captures the value of `n`. It creates a `Cell` with that value and a tail, which it obtains by calling `intsFrom` with `n+1`:

```Stream<int> intsFrom(int n)
{
    return Stream<int>([n]()
    {
        return Cell<int>(n, intsFrom(n + 1));
    });
}```

It’s a recursive definition, but without the usual recursive function calls that eat up the stack. The call to the inner `intsFrom` is not made from the outer `intsFrom`. Instead it’s made the first time `get` is called on the emerging `Stream`.

Of course, we can also create finite streams, like this one, which produces integers from `n` to `m`:

```Stream<int> ints(int n, int m)
{
    if (n > m)
        return Stream<int>();
    return Stream<int>([n, m]()
    {
        return Cell<int>(n, ints(n + 1, m));
    });
}```

The trick is to capture the limit `m` as well as the recursion variable `n`. When the limit is reached, we simply return an empty `Stream`.

We’ll also need the method `take`, which creates a `Stream` containing the first `n` elements of the original stream:

```Stream take(int n) const {
    if (n == 0 || isEmpty())
        return Stream();
    auto cell = _lazyCell;
    return Stream([cell, n]()
    {
        auto v = cell->get().val();
        auto t = cell->get().pop_front();
        return Cell<T>(v, t.take(n - 1));
    });
}```

Here we are capturing the suspended cell and using it to lazily generate the elements of the new, truncated `Stream`. Again, the key to understanding why this works is to keep in mind that `Stream`s and `Cell`s are conceptually immutable, and therefore can be shared by the implementation. This has some interesting side effects, which don’t influence the results, but change the performance. For instance, if the caller of `take` forces the evaluation of the first n elements — e.g., by passing them through the consuming `forEach` below — these elements will appear miraculously memoized in the original `Stream`.

Finally, we’ll need some way to iterate through streams. Here’s an implementation of `forEach` that consumes the stream while enumerating it and feeding its elements to a function.

```template<class T, class F>
void forEach(Stream<T> strm, F f)
{
    while (!strm.isEmpty())
    {
        f(strm.get());
        strm = strm.pop_front();
    }
}```

It’s the assignment:

`strm = strm.pop_front();`

which consumes the stream by decreasing the reference count of the head of the `Stream`. In particular, if you pass an rvalue `Stream` to `forEach`, its elements will be generated and deleted in lockstep. The algorithm will use constant memory, independent of the virtual length of the `Stream`. What Haskell accomplishes with garbage collection, we approximate in C++ with reference counting and `shared_ptr`.

# Working with Streams

It’s not immediately obvious how to translate our Pythagorean triple program from nested loops to lazy streams, so we’ll have to take inspiration from the corresponding Haskell program. Let me first rewrite it using a slightly different notation:

```triples = do
    z <- [1..]
    x <- [1..z]
    y <- [x..z]
    guard (x^2 + y^2 == z^2)
    return (x, y, z)```

The general idea is this: Start with the stream of integers from 1 to infinity. For every such integer — call it `z` — create a stream from 1 to `z`. For each of those — call them `x` — create a stream from `x` to `z`. Filter out those which don’t satisfy the Pythagorean constraint. Finally, output a stream of tuples `(x, y, z)`.

So far we’ve learned how to create a stream of integers — we have the function `intsFrom`. But now we’ll have to do something for each of these integers. We can’t just enumerate those integers and apply a function to each, because that would take us eternity. So we need a way to act on each element of a stream lazily.

In functional programming this is called mapping a function over a list. In general, a parameterized data structure that can be mapped over is called a functor. I’m going to show you that our `Stream` is a functor.

# Stream as a Functor

The idea is simple: we want to apply a function to each element of a stream to get a new transformed stream (it’s very similar to the `std::transform` algorithm from STL). The catch is: We want to do it generically and lazily.

To make the algorithm — we’ll call it `fmap` — generic, we have to parameterize it over types. The algorithm starts with a `Stream` of elements of type `T` and a function from `T` to some other type `U`. The result should be a stream of `U`.

We don’t want to make `U` the template argument, because then the client would have to specify it explicitly. We want the compiler to deduce this type from the type of the function. We want, therefore, the function type `F` to be the parameter of our template (this will also allow us to call it uniformly with function pointers, function objects, and lambdas):

```template<class T, class F>
auto fmap(Stream<T> stm, F f)```

Without the use of concepts, we have no way of enforcing, or even specifying, that `F` be a type of a function from `T` to `U`. The best we can do is to statically assert it inside the function:

```static_assert(std::is_convertible<F, std::function<U(T)>>::value,
              "fmap requires a function type U(T)");```

But what is `U`? We can get at it using `decltype`:

`decltype(f(stm.get()));`

Notice that `decltype` takes, as an argument, an expression that can be statically typed. Here, the expression is a function call of `f`. We also need a dummy argument for this function: we use the result of `stm.get()`. The argument to `decltype` is never evaluated, but it is type-checked at compile time.

One final problem is how to specify the return type of `fmap`. It’s supposed to be `Stream<U>`, but we don’t know `U` until we apply `decltype` to the arguments of `fmap`. We have to use the new `auto` function declaration syntax of C++11. So here are all the type-related preliminaries:

```template<class T, class F>
auto fmap(Stream<T> stm, F f) -> Stream<decltype(f(stm.get()))>
{
    using U = decltype(f(stm.get()));
    static_assert(std::is_convertible<F, std::function<U(T)>>::value,
                  "fmap requires a function type U(T)");
    ...
}```

Compared to that, the actual implementation of `fmap` seems rather straightforward:

```if (stm.isEmpty()) return Stream<U>();
return Stream<U>([stm, f]()
{
    return Cell<U>(f(stm.get()), fmap(stm.pop_front(), f));
});
```

In words: If the stream is empty, we’re done — return an empty stream. Otherwise, create a new stream by suspending a lambda function. That function captures the original stream (by value) and the function `f`, and returns a `Cell`. That cell contains the value of `f` acting on the first element of the original stream, and a tail. The tail is created with `fmap` acting on the rest of the original stream.

Equipped with `fmap`, we can now attempt to take the first step towards generating our triples: apply the function `ints(1, z)` to each element of the stream `intsFrom(1)`:

```fmap(intsFrom(1), [](int z)
{
    return ints(1, z);
});```

The result is a `Stream` of `Stream`s of integers of the shape:

```1
1 2
1 2 3
1 2 3 4
1 2 3 4 5
...```

But now we are stuck. We’d like to apply `ints(x, z)` to each element of that sequence, but we don’t know how to get through two levels of `Stream`. Our `fmap` can only get through one layer. We need a way to flatten a `Stream` of `Stream`s. That functionality is part of what functional programmers call a monad. So let me show you that `Stream` is indeed a monad.

# Stream as a Monad

If you think of a `Stream` as a list, the flattening of a list of lists is just concatenation. Suppose for a moment that we know how to lazily concatenate two `Stream`s (we’ll get to it later) and let’s implement a function `mjoin` that concatenates a whole `Stream` of `Stream`s.

You might have noticed a pattern in the implementation of lazy functions on streams. We use some kind of recursion, which starts with “Are we done yet?” If not, we do an operation that involves one element of the stream and the result of a recursive call to the function itself.

The “Are we done yet?” question usually involves testing for an empty stream. But here we are dealing with a `Stream` of `Stream`s, so we have to test two levels deep. This way we’ll ensure that the concatenation of a `Stream` of empty `Stream`s immediately returns an empty `Stream`.

The recursive step in `mjoin` creates a `Cell` whose element is the head of the first stream, and whose tail is the concatenation of the tail of the first stream and the result of `mjoin` of the rest of the streams:

```template<class T>
Stream<T> mjoin(Stream<Stream<T>> stm)
{
    while (!stm.isEmpty() && stm.get().isEmpty())
    {
        stm = stm.pop_front();
    }
    if (stm.isEmpty()) return Stream<T>();
    return Stream<T>([stm]()
    {
        Stream<T> hd = stm.get();
        return Cell<T>( hd.get()
                      , concat(hd.pop_front(), mjoin(stm.pop_front())));
    });
}```

The combination of `fmap` and `mjoin` lets us compose functions like `intsFrom` or `ints` that return `Stream`s. In fact, this combination is so common that it deserves its own function, which we’ll call `mbind`:

```template<class T, class F>
auto mbind(Stream<T> stm, F f) -> decltype(f(stm.get()))
{
    return mjoin(fmap(stm, f));
}```

If we use `mbind` in place of `fmap`:

```mbind(intsFrom(1), [](int z)
{
    return ints(1, z);
});```

we can produce a flattened list:

`1 1 2 1 2 3 1 2 3 4...`

But it’s not just the list: Each element of the list comes with variables that are defined in its environment — here the variable `z`. We can keep chaining calls to `mbind` and capture more variables in the process:

```mbind(intsFrom(1), [](int z)
{
    return mbind(ints(1, z), [z](int x)
    {
        return mbind(ints(x, z), [x, z](int y)
        {
            ...
        });
    });
});```

At this point we have captured the triples `x`, `y`, `z`, and are ready for the Pythagorean testing. But before we do it, let’s define two additional functions that we’ll use later.

The first one is `mthen` which is a version of `mbind` that takes a function of no arguments. The idea is that such a function will be executed for each element of the stream, but it won’t use the value of that element. The important thing is that the function will not be executed when the input stream is empty. In that case, `mthen` will return an empty stream.

We implement `mthen` using a slightly modified version of `fmap` that takes a function `f` of no arguments:

```template<class T, class F>
auto fmapv(Stream<T> stm, F f) -> Stream<decltype(f())>
{
    using U = decltype(f());
    static_assert(std::is_convertible<F, std::function<U()>>::value,
                  "fmapv requires a function type U()");

    if (stm.isEmpty()) return Stream<U>();
    return Stream<U>([stm, f]()
    {
        return Cell<U>(f(), fmapv(stm.pop_front(), f));
    });
}```

We plug it into the definition of `mthen` the same way `fmap` was used in `mbind`:

```template<class T, class F>
auto mthen(Stream<T> stm, F f) -> decltype(f())
{
    return mjoin(fmapv(stm, f));
}```

The second useful function is `mreturn`, which simply turns a value of any type into a one-element `Stream`:

```template<class T>
Stream<T> mreturn(T v)
{
    return Stream<T>([v]() {
        return Cell<T>(v);
    });
}```

We’ll need `mreturn` to turn our triples into `Stream`s.

It so happens that a parameterized type equipped with `mbind` and `mreturn` is called a monad (it must also satisfy some additional monadic laws, which I won’t talk about here). Our lazy `Stream` is indeed a monad.

# Stream as a Monoid and a Monad Plus

When implementing `mjoin` we used the function `concat` to lazily concatenate two `Stream`s. Its implementation follows the same recursive pattern we’ve seen so many times:

```template<class T>
Stream<T> concat( Stream<T> lft
                , Stream<T> rgt)
{
    if (lft.isEmpty())
        return rgt;
    return Stream<T>([=]()
    {
        return Cell<T>(lft.get(), concat<T>(lft.pop_front(), rgt));
    });
}```

What’s interesting is that the concatenation of streams puts them under yet another well known functional pattern: a monoid. A monoid is equipped with a binary operation, just like `concat`, which must be associative and possess a unit element. It’s easy to convince yourself that concatenation of `Stream`s is indeed associative, and that the neutral element is an empty `Stream`. Concatenating an empty `Stream`, whether in front or in the back of any other `Stream`, doesn’t change the original `Stream`.

What’s even more interesting is that being a combination of a monoid and a monad makes `Stream` into a monad plus, and every monad plus defines a `guard` function — exactly what we need for the filtering of our triples. This function takes a Boolean argument and outputs a `Stream`. If the Boolean is false, the `Stream` is empty (the unit element of monad plus!), otherwise it’s a singleton `Stream`. We really don’t care what value sits in this `Stream` — we never use the result of `guard` for anything but the flow of control. In Haskell, there is a special “unit” value `()` — here I use a `nullptr` as its closest C++ analog.

```Stream<void*> guard(bool b)
{
    if (b) return mreturn<void*>(nullptr); // singleton stream
    else return Stream<void*>();           // empty stream
}```

We can now pipe the result of `guard` into `mthen`, which will ignore the content of the `Stream` but won’t fire when the `Stream` is empty. When the `Stream` is not empty, we will call `mreturn` to output a singleton `Stream` with the result tuple:

```Stream<std::tuple<int, int, int>> triples()
{
    return mbind(intsFrom(1), [](int z)
    {
        return mbind(ints(1, z), [z](int x)
        {
            return mbind(ints(x, z), [x, z](int y)
            {
                return mthen(guard(x*x + y*y == z*z), [x, y, z]()
                {
                    return mreturn(std::make_tuple(x, y, z));
                });
            });
        });
    });
}```

These singletons will then be concatenated by the three levels of `mbind` to create one continuous lazy `Stream` of Pythagorean triples.

Compare this function with its Haskell counterpart:

```triples = do
    z <- [1..]
    x <- [1..z]
    y <- [x..z]
    guard (x^2 + y^2 == z^2)
    return (x, y, z)```

Now, the client can `take` 10 of those triples from the `Stream` — and the triples still won’t be evaluated! It’s the consuming `forEach` that finally forces the evaluation:

```void test()
{
    auto strm = triples().take(10);
    forEach(std::move(strm), [](std::tuple<int, int, int> const & t)
    {
        std::cout << std::get<0>(t) << ", "
                  << std::get<1>(t) << ", "
                  << std::get<2>(t) << std::endl;
    });
}```

# Conclusion

The generation of Pythagorean triples is a toy example, but it shows how lazy evaluation can be used to restructure code in order to make it more reusable. You can use the same function `triples` to print the values in one part of your program and draw triangles in another. You can filter the triples or impose different termination conditions. You can use the same trick to generate an infinite set of approximations to the solution of a numerical problem, and then use different strategies to truncate it. Or you can create an infinite set of animation frames, and so on.

The building blocks of laziness are also reusable. I have used them to implement the solution to the eight-queens problem and a conference scheduling program. Once they are made thread safe, the combinators that bind them are thread safe too. This is, in general, the property of persistent data structures.

You might be concerned about the performance of lazy data structures, and rightly so. They use the heap heavily, so memory allocation and deallocation is a serious performance bottleneck. There are many situations, though, where code structure, reusability, maintenance, and correctness (especially in multithreaded code) are more important than performance. And there are some problems that might be extremely hard to express without the additional flexibility gained from laziness.

I made the sources to all code in this post available on GitHub.

[If you prefer, you may watch the video of my talk on this topic (here are the slides).]

If you thought you were safe from functional programming in your cozy C++ niche, think again! First the lambdas and function objects and now the monad camouflaged as `std::future`. But do not despair, it’s all just patterns. You won’t find them in the Gang of Four book, but once you see them, they will become obvious.

Let me give you some background: I was very disappointed with the design of C++11 `std::future`. I described my misgivings in: Broken Promises — C++0x futures. I also made a few suggestions as to how to fix it: Futures Done Right. Five years went by and, lo and behold, a proposal to improve `std::future` and related APIs, N3721, was presented to the Standards Committee for discussion. I thought it would be a no-brainer, since the proposal was fixing obvious holes in the original design. A week ago I attended the meetings of the C++ Standards Committee in Issaquah — since it was within driving distance from me — and was I in for a surprise! Apparently some design patterns that form the foundation of functional programming are not obvious to everybody. So now I find myself on the other side of the discussion and will try to explain why the improved design of `std::future` is right.

Design arguments are not easy. You can’t mathematically prove that one design is better than another, or a certain set of abstractions is better than another — unless you discover some obvious design flaws in one of them. You might have a gut feeling that a particular solution is elegant, but how do you argue about elegance?

Thankfully, when designing a library, there are some well known and accepted criteria. The most important ones, in my mind, are orthogonality, a.k.a., separation of concerns, and composability. It also helps if the solution has been previously implemented and tested, especially in more than one language. I will argue that this is indeed the case with the extended `std::future` design. In the process, I will describe some programming patterns that might be new to C++ programmers but have been tried and tested in functional languages. They tend to pop up more and more in imperative languages, especially in connection with concurrency and parallelism.

# The Problem

In a nutshell, the problem that `std::future` is trying to solve is that of returning the result of a computation that’s being performed in parallel, or returning the result of an asynchronous call. For instance, you start a computation in a separate thread (or a more general execution agent) and you want to, at some point in time, get back the result of that computation. This is one of the simplest models of concurrency: delegating the execution of a function (a closure) to another thread.

To return a value from one thread to another you need some kind of a communication channel. One thread puts a value in the channel, another picks it up. Instead of providing one channel abstraction, as does ML or Haskell, C++11 splits it into two separate abstractions: the promise and the future. The promise is the push end of the channel, the future is the pull end. (In Rust there are similar objects called `Chan` and `Port`.)

The general pattern is for the client to construct a promise, get the future from it using `get_future`, and start a thread, passing it the promise. When the thread is done, it puts the result in the promise using `set_value`. In the meanwhile, the calling thread may do some other work and eventually decide to retrieve the result from the future by calling its method `get`. If the promise has been fulfilled, `get` returns immediately with the value, otherwise it blocks until the value is available.

This pattern involves some boilerplate code dealing with the promise side of things, so the Standard introduced a shortcut called `std::async` to simplify it. You call `std::async` with a plain function (closure) and its result is automatically put into a hidden promise. All the client sees is the `future` side of the channel. (I am simplifying things by ignoring exception handling and various modes of starting `async`.)

# The Functor Pattern

Here’s the first abstraction: A future is an object that encapsulates a value. By itself, this would be a pretty useless abstraction unless the encapsulation came with some other functionality or restriction. For instance, `std::unique_ptr` encapsulates a value, but also manages the lifetime of the memory it occupies. A future encapsulates a value, but you might have to block to get it. Functional languages have a very useful pattern for just this kind of situation: the functor pattern (not to be confused with the C++ misnomer for a function object). A functor encapsulates a value of an arbitrary type, plus it lets you act on it with a function.

Notice that the functor doesn’t necessarily give you access to the value — instead it lets you modify it. The beauty of it is that, in the case of a future, a functor gives you the means to modify the value that potentially isn’t there yet — and it lets you do it without blocking. Of course, behind the scenes, the function (closure) that you provide is stored in the future and only applied when the value is ready and is being accessed using `get`.

The first part of the fix that was proposed to the Committee was to turn `std::future` into a functor. Technically, this is done by adding a new method, `then`:

```template<typename F>
auto future::then(F&& func) -> future<decltype(func(*this))>;```

This method takes a function object `func` to be applied to the future in question. The result is a new future of the type that is returned by the function object, `decltype(func(*this))`.

Things are slightly muddled by the fact that a future not only encapsulates the value to be calculated but also the possibility of an exception. This is why the function passed to `then` takes the whole future, from which it can extract the value using `get`, which at that point is guaranteed not to block, but may rethrow an exception. There is an additional proposal N3865 to introduce another method, `next`, that would deal only with the value, not the exception. The advantage of `next` is that it could be called with a regular function unaware of the existence of futures, with no additional boilerplate. For simplicity, I’ll be using `next` in what follows.

The functor pattern makes perfect sense for composing a regular function on top of an asynchronous function (one returning a future), but it’s more general than that. Any time you have an object that is parameterized by an arbitrary type, you might be dealing with a functor. In C++, that would be a template class that doesn’t impose any restrictions on its template argument. Most containers have this property. In order for a generic class to be a functor it must also support a means to operate on its contents. Most containers in STL provide this functionality through the algorithm `std::transform`. For an imperative programmer it might come as a surprise that such disparate things as futures and containers fall under the same functional pattern — a functor.
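In Haskell the shared pattern is literal: the very same `fmap` acts on a list and on a `Maybe`, the latter standing in here for any value-encapsulating functor, futures included:

```haskell
-- One polymorphic operation, fmap, plays the role of std::transform
-- for containers and of future::then for value-encapsulating types.
double :: Int -> Int
double = (* 2)

onList :: [Int]
onList = fmap double [1, 2, 3]    -- [2,4,6]

onMaybe :: Maybe Int
onMaybe = fmap double (Just 21)   -- Just 42

main :: IO ()
main = print (onList, onMaybe)
```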

Unlike in functional languages, in C++ there is no natural reusable expression for the functor pattern, so the pattern lives mostly in the head of the programmer. For instance, because of memory management considerations, `std::transform` operates on iterators rather than containers — the storage for the target container must be either pre-allocated or allocated on demand through iterator adapters. One could try to provide iterator adapters for futures, so they could be operated on by `std::transform`, but ultimately the transformation has to act on the internals of the future (i.e., store the function object in it) so it either has to be a method or a friend of the future.

# The Monad Pattern

The functor pattern is not enough to provide full composability for futures. The likely scenario is that the user creates a library of future-returning functions, each performing a specific task. He or she then needs the means to combine such functions into more complex tasks. This is, for instance, the case when combining asynchronous operations, such as opening a file and then reading from it. Suppose we have the `async_open` function that returns a file handle future:

`future<HANDLE> async_open(string &);`

and the `async_read` function that takes a file handle and returns a future with the buffer filled with data:

`future<Buffer> async_read(HANDLE fh);`

If you combine the two using `next`, the result will be a future of a future:

`future<future<Buffer>> ffBuf = async_open("foo").next(&async_read);`

In order to continue chaining such calls without blocking — for instance to asynchronously process the buffer — you need a way to collapse the double future to a single future and then call `next` on it.

The collapsing method, `unwrap`, is another part of the extended future proposal. When called on a `future<future<T>>` it returns `future<T>`. It lets you chain asynchronous functions using `next` followed by `unwrap`.

`async_open("foo").next(&async_read).unwrap().next(&async_process);`

In functional programming such a collapsing function is called `join`. The combination `next` followed by `unwrap` (or, in Haskell, `fmap` followed by `join`) is so common that it has its own name, bind (in Haskell it’s the operator `>>=`). It might make sense to make `bind` another method of future (possibly under a different name). [Edit: In fact, the proposal (n3721) is to overload `then` to automatically perform `unwrap` whenever the result is a future of a future. This way `then` would also work as bind.]
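The equivalence of bind with `fmap` followed by `join` can be checked directly in Haskell. A small sketch using the list monad, with `neighbors` as a stand-in for a wrapped-result function like `async_read`:

```haskell
import Control.Monad (join)

-- A function whose result is itself wrapped (here: in a list),
-- analogous to an async function returning a future.
neighbors :: Int -> [Int]
neighbors x = [x - 1, x + 1]

-- fmap alone produces a doubly wrapped result; join collapses it.
viaFmapJoin :: [Int]
viaFmapJoin = join (fmap neighbors [10, 20])

-- Bind does both steps at once.
viaBind :: [Int]
viaBind = [10, 20] >>= neighbors

main :: IO ()
main = print (viaFmapJoin == viaBind)  -- True
```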

There’s one more important usage pattern: a function that may execute asynchronously, but sometimes returns the result immediately. This often happens in recursive algorithms, when the recursion bottoms out. For instance, a parallel tree traversal function may spawn asynchronous tasks for traversing the children of a node, but when it reaches a leaf, it might want to return the result synchronously. Instead of writing complicated conditional code at each level, it’s easier to provide a “fake” future whose contents is immediately available — whose `get` method never blocks. Such a fake future, and the function `make_ready_future` that creates it, are also part of the proposal.

Together, the methods `next` (or `then`) and `unwrap`, and the function `make_ready_future` are easily recognizable by a functional programmer as forming the monad pattern (in Haskell, they would be called, respectively, `fmap`, `join`, and `return`). It’s a very general pattern for composing functions that return encapsulated values. Using a monad you may work with such functions directly, rather than unwrapping their results at every step. In the case of futures, this is an important issue, since the “unwrapping” means making a potentially blocking call to `get` and losing precious opportunities for parallelism. You want to set up as much computation as possible up front and let the system schedule the most advantageous execution.

Combining functions using `next`, `unwrap` (or, equivalently, `bind`), and `make_ready_future` is equivalent to specifying data dependencies between computations and letting the runtime explore opportunities for parallelism between independent computations.

# The Applicative Pattern

The combinators `then` and `next` are designed for linear composition: the output of one computation serves as the input for another. A more general pattern requires the combining of multiple asynchronous sources of data. In functional programming the problem would be described as applying a function to multiple arguments, hence the name “applicative” pattern. A functional programmer would take a multi-argument function and “lift” it to accept futures instead of immediate values.

As expected, in imperative programming things are a little messier. You have to create a barrier for all the input futures, retrieve the values, and then pass them to the multi-argument function or algorithm. The proposal contains a function called `when_all` that implements the first part of the process — the barrier. It takes either a pair of iterators to a container of futures or a variable number of futures, and returns a future that fires when all the arguments are ready. Conceptually, it performs a logical AND of all input futures.

The iterator version of `when_all` returns a future of a vector of futures, while the variadic version returns a future of a tuple of futures. It’s up to the client to `get` the resulting vector or tuple and iterate over it. Because of that, it’s not possible to directly chain the results of `when_all` the way `then` or `next` does it.

If you’re wondering how this kind of chaining is done in a functional language, you have to understand what partial application is. A function of many arguments doesn’t have to be applied to all of the arguments at once. You can imagine that applying it to the first argument doesn’t yield a value but rather a function on n-1 arguments. In C++11, this can be accomplished by calling `std::bind`, which takes a multi-parameter function and a value of the first argument, and returns a function object (a closure) that takes the remaining n-1 arguments (actually, you may pass it more than one argument at a time).

In this spirit, you could bind a multi-parameter function to a single future and get a future of a function of n-1 arguments. Then you are left with the problem of applying a future of a function to a future of an argument, and that’s exactly what the applicative pattern is all about. In Haskell, the `Applicative` class defines the operator `<*>` that applies an encapsulated function to an encapsulated value.
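A Haskell sketch of exactly this: partial application inside a functor produces an encapsulated function, and `<*>` applies it to the next encapsulated argument. Here `Maybe` stands in for the future:

```haskell
-- fmap (+) (Just 2) is a wrapped function awaiting its second argument;
-- <*> applies it to another wrapped value.
liftedSum :: Maybe Int
liftedSum = fmap (+) (Just 2) <*> Just 3   -- Just 5

-- If any input is unavailable, the whole computation is unavailable,
-- just as a combined future cannot fire before all its inputs do:
missing :: Maybe Int
missing = fmap (+) Nothing <*> Just 3      -- Nothing

main :: IO ()
main = print (liftedSum, missing)
```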

# The Monoid Pattern

A very common pattern is to start several computations in parallel and pick the one that finishes first. This is the basis of speculative computation, where you pitch several algorithms against each other. Or you might be waiting for any of a number of asynchronous events, and attend to them as soon as they happen.

At a minimum you would expect a combinator that acts like a logical OR of two futures. A functional programmer would be immediately on the lookout for the monoid pattern. A monoid is equipped with a binary operation and a unit element. If the binary operation on futures picks the one that finishes first, what should the unit future be? A unit combined with any element must give back that same element. Therefore we need a future that would lose the race with any other future. We could call this special future “never.” Calling `get` on such a future would block forever.

In practice, one could slightly relax the definition of the “never” future. It would never return a result, but it could still throw an exception. A future like this could be used to implement a timeout. Pitching it against another future would either let the other future complete, or result in a timeout exception.
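Here is a toy, purely functional model of this monoid. The names `Race`, `race`, and `never` are mine, not part of any proposal: a future is represented by its hypothetical finish time, the binary operation picks the earlier finisher, and the never-completing future is the unit:

```haskell
-- A "future" modeled by the time at which it would deliver its value;
-- Nothing models a future that never completes.
data Race a = Race (Maybe (Int, a)) deriving (Eq, Show)

-- Binary operation: the earlier finisher wins; "never" loses to anything.
race :: Race a -> Race a -> Race a
race (Race Nothing) r = r
race r (Race Nothing) = r
race (Race (Just (t1, x))) (Race (Just (t2, y)))
  | t1 <= t2  = Race (Just (t1, x))
  | otherwise = Race (Just (t2, y))

-- The unit of the monoid: combined with anything, it gives that thing back.
never :: Race a
never = Race Nothing

main :: IO ()
main = do
  let fast = Race (Just (1, "fast"))
      slow = Race (Just (5, "slow"))
  print (race fast slow)   -- the earlier finisher wins
  print (race never fast)  -- unit law: never is the identity
```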

This is not the way the future extension proposal went, though. The proposed combinator is called `when_any` and it takes either a pair of iterators to a container of futures or a variable number of futures. It returns a future of either a vector or a tuple of futures. It’s up to the client to iterate over those futures and find the one (or the ones) that fired by calling `is_ready` on each of them.

The advantage of this approach is that the client may still write code to wait for the remaining futures to finish. The disadvantage is that the client is responsible for writing a lot of boilerplate code, which will obscure the program logic.

# Performance and Programming Considerations

An objection to using futures as the main vehicle for asynchronous programming was raised in N3896: Library Foundations for Asynchronous Operations. The point is that it’s possible for an asynchronous API to have a result ready before the client has had the opportunity to provide the continuation by calling `then` (or `next`). This results in unnecessary synchronization, which may negatively impact performance.

The alternative approach is to pass the continuation (the handler) directly to the asynchronous API. This is how a lot of asynchronous APIs are implemented at the lowest level anyway. The two approaches don’t exclude each other, but supporting both at the same time, as proposed in N3896, adds a lot of complexity to the programming model.

From the programmer’s perspective, the continuation passing model of N3896 is probably the hardest to use. The programming model is that of a state machine, with the client responsible for writing handlers for every transition.

Futures provide a useful abstraction by reifying the anticipated values. The programmer can write code as if the values were there. Futures also provide a common language between concurrent, parallel, and asynchronous worlds. It doesn’t matter if a value is to be evaluated by spawning a thread, creating a lightweight execution agent, or by calling an asynchronous API, as long as it’s encapsulated in a future. The compositional and expressive power of futures is well founded in major patterns of functional programming: the functor, the monad, the applicative, and the monoid.

There is another, even more attractive programming model that’s been proposed for C++, Resumable Functions, which makes asynchronous code look more like sequential code. This is based on a trick that’s well known to Haskell programmers in the form of the “do” notation. In C++, a resumable function would be chopped by the compiler into a series of continuations separated by `await` keywords. Instead of creating a future and calling `then` with a lambda function, the programmer would insert `await` and continue writing code as if the value were available synchronously.

# Acknowledgment

I’d like to thank Artur Laksberg for reading the draft of this blog and providing useful feedback.

In my previous post I worked on stretching the intuition of what a container is. I proposed that, in Haskell, any functor may be interpreted as some kind of container, including the hard cases like the state functor or the IO functor. I also talked about natural transformations between functors as “repackaging” schemes for containers, which work without “breaking the eggs” — not looking inside the elements stored in the container. Continuing with this analogy: Algebras are like recipes for making omelets.

The intuition is that an algebra provides a way to combine elements stored inside a container. This cannot be done for arbitrary types because there is no generic notion of “combining.” So an algebra is always defined for a specific type. For instance, you can define an algebra for numbers because you know how to add or multiply them, or for strings because you can concatenate them, and so on. The way elements are combined by an algebra is in general driven by the structure of the container itself.

For example, think of an expression tree as a container.

```data Expr a = Const a
            | Add (Expr a) (Expr a)
            | Mul (Expr a) (Expr a)```

We could define many algebras for it. An integer algebra would work on an expression tree that stores integers. A complex algebra would work on a tree that stores complex numbers. A Boolean algebra would work on Boolean expressions using, for instance, logical OR to evaluate the `Add` node and logical AND for the `Mul` node. You could even define an algebra of sets with union and intersection for `Add` and `Mul`. In fact, in the absence of any additional requirements, any pair of binary functions acting on a given type will do.
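For concreteness, here is a sketch of two such algebras, written as plain recursive evaluators over the tree defined above:

```haskell
data Expr a = Const a
            | Add (Expr a) (Expr a)
            | Mul (Expr a) (Expr a)

-- The integer algebra: ordinary arithmetic.
evalInt :: Expr Int -> Int
evalInt (Const i) = i
evalInt (Add x y) = evalInt x + evalInt y
evalInt (Mul x y) = evalInt x * evalInt y

-- The Boolean algebra: OR for Add, AND for Mul.
evalBool :: Expr Bool -> Bool
evalBool (Const b) = b
evalBool (Add x y) = evalBool x || evalBool y
evalBool (Mul x y) = evalBool x && evalBool y

main :: IO ()
main = do
  print (evalInt (Mul (Add (Const 2) (Const 3)) (Const 4)))              -- 20
  print (evalBool (Mul (Add (Const True) (Const False)) (Const True)))  -- True
```

Notice that both evaluators share the same recursion scheme; only the pair of binary functions differs. That separation is made explicit next.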

The definition of an algebra for a given functor `f` consists of a type `t` called the carrier type and a function called the action. Any Haskell algebra is therefore of the type:

`newtype Algebra f t = Algebra (f t -> t)`

More abstractly, in category theory, an algebra (or, more precisely, an F-algebra) for an endofunctor F is a pair (A, alg) of an object A and a morphism alg : F A -> A. As always, the standard translation from category theory to Haskell replaces objects with types and morphisms with functions.

Let’s have a look at a simple example of an algebra. Let’s pick the list functor and define an `Int` algebra for it, for instance:

```sumAlg :: Algebra [] Int
sumAlg = Algebra (foldr (+) 0)```

Despite its simplicity, this example leads to some interesting observations.

First, the use of `foldr` tells us that it’s possible to handle recursion separately from evaluation. The evaluation is really parameterized here by the function `(+)` and the value, zero. The algebra is type-specific. On the other hand, `foldr` is fully polymorphic. It turns out that there is another algebra hidden in this example, and it’s determined just by `(+)` and zero. We’ll see that more clearly when we talk about fixed points of functors.

The second observation is that a list is not only a functor but also a monad. Is there something special about algebras for a monad? We’ll see.

# Algebras and Fixed Points

I wrote a whole blog post about F-algebras with a more categorical slant. Here I’ll elaborate on the Haskell aspects of algebras and develop some more intuitions.

A recursive container is not only a functor but it can also be defined as a fixed point of a functor. So, really, we should start with a double functor, parameterized by two types, `a` and `b`:

```data ExprF a b = Const a
               | Add b b
               | Mul b b
               deriving Functor```

We can then find its fixed point: a type that, when substituted for `b`, will give back itself. Think of a functor as a TV camera (sorry for switching metaphors). When you point it at some type `b`, its image appears in all the little monitors where `b` is on the right hand side of the definition. We all know what happens when you point the camera back at the monitors — you get the ever receding image within image within image… That’s your fixed point.

This “pointing of the camera at the monitors” can be abstracted into a Haskell data structure. It is parameterized by a functor `f`, which provides the camera and the monitors. The fixed point is given by the ever receding:

`newtype Fix f = In (f (Fix f))`

Notice that, on the left hand side, `f` appears without an argument. If `f a` is a container of `a` then `f` by itself is a recipe for creating a container from any type. `Fix` takes such a recipe and applies it to itself — to `(Fix f)`.

Later we’ll also need the deconstructor, `unIn`:

```unIn :: Fix f -> f (Fix f)
unIn (In x) = x```

Going back to our earlier functor, we can apply `Fix` to it and get back the recursive version of `Expr`:

`type Expr a = Fix (ExprF a)`

Here, `(ExprF a)` is a recipe for stuffing any type `b` into a simple (non-recursive) container defined by `ExprF`.

Creating actual expressions using the above definition of `Expr` is a little awkward, but possible. Here’s one:

```testExpr :: Expr Int
testExpr = In $ (In $ (In $ Const 2) `Add` (In $ Const 3))
           `Mul` (In $ Const 4)```

Knowing that a recursive data type such as `(Expr a)` is defined in terms of a simpler functor `(ExprF a b)` means that any recursive algebra for it can be defined in terms of a simpler algebra. For instance, we can define a simple algebra for `(ExprF Int)` by picking the carrier type `Double` and the following action:

```alg :: ExprF Int Double -> Double
alg (Const i) = fromIntegral i
alg (Add x y) = x + y
alg (Mul x y) = x * y```

We can extend this algebra to work on arbitrary recursive expressions of type `Expr Int`. We’ll call this new recursive algebra `alg'`. When given an `(Expr Int)` it will do the following:

1. Extract the contents of the outer `Fix` by pattern matching on the constructor `In`. The contents is of the type `ExprF` acting on `(Expr Int)`.
2. Apply `alg'` (the recursive one we are just defining) to this contents. Do this using `fmap`. Here we are taking advantage of the fact that `ExprF` is a functor. This application of `alg'` replaces the children of the expression `ExprF` with `Double`s — the results of their evaluation.
3. Apply `alg` to the result of the previous step, which is of the type `(ExprF Int Double)`.

Here’s the code that implements these steps:

```alg' :: Fix (ExprF Int) -> Double
alg' (In expr) = alg (fmap alg' expr)```

Notice that this code does not depend on the details of the functor. In fact it will work for any functor and any algebra:

```cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg = alg . fmap (cata alg) . unIn```

This generic function is called a catamorphism. It lets you apply an algebra to the contents of a recursively defined container.
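Putting the pieces from this section together (the non-recursive functor, the algebra, and `cata`), the earlier example expression evaluates like this:

```haskell
{-# LANGUAGE DeriveFunctor #-}

data ExprF a b = Const a
               | Add b b
               | Mul b b
               deriving Functor

newtype Fix f = In (f (Fix f))

unIn :: Fix f -> f (Fix f)
unIn (In x) = x

-- The catamorphism: peel one layer, recurse into the children, evaluate.
cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg = alg . fmap (cata alg) . unIn

alg :: ExprF Int Double -> Double
alg (Const i) = fromIntegral i
alg (Add x y) = x + y
alg (Mul x y) = x * y

-- (2 + 3) * 4
testExpr :: Fix (ExprF Int)
testExpr = In ((In (In (Const 2) `Add` In (Const 3))) `Mul` In (Const 4))

main :: IO ()
main = print (cata alg testExpr)  -- 20.0
```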

My first example of an algebra was acting on a list. A list can also be defined as a fixed point of a functor:

```data ListF a b = Empty | Cons a b
  deriving Functor```

If you work out the details, you can convince yourself that the `sumAlg` I defined earlier is nothing else but the catamorphism for the functor `ListF Int` applied to the following simple algebra:

```alg :: ListF Int Int -> Int
alg Empty = 0
alg (Cons a b) = a + b```

Now we understand why any list catamorphism is parameterized by one value and one function of two arguments.
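Working out those details in code: the sketch below embeds an ordinary list into the fixed-point form (via a helper `toFix` of my own naming) and checks that the catamorphism agrees with `foldr (+) 0`:

```haskell
{-# LANGUAGE DeriveFunctor #-}

data ListF a b = Empty | Cons a b
  deriving Functor

newtype Fix f = In (f (Fix f))

unIn :: Fix f -> f (Fix f)
unIn (In x) = x

cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg = alg . fmap (cata alg) . unIn

-- The simple algebra: one value for Empty, one binary function for Cons.
alg :: ListF Int Int -> Int
alg Empty      = 0
alg (Cons a b) = a + b

-- Embed an ordinary list into the fixed-point representation.
toFix :: [Int] -> Fix (ListF Int)
toFix = foldr (\a b -> In (Cons a b)) (In Empty)

main :: IO ()
main = print (cata alg (toFix [1, 2, 3]) == foldr (+) 0 [1, 2, 3])  -- True
```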

# Monads and Algebras

As I said in the beginning, a list is not only a functor but also a monad. A monad adds two special abilities to a functor/container. It lets you create a default container that contains just a given value: The function that does it is called `return`. And it lets you collapse a container of containers into a single container: That function is called `join` (and I explained before how it relates to the more commonly used bind, `>>=`).

When we define an algebra for a functor that happens to be a monad, it would be nice for this algebra to interact sensibly with `return` and `join`. For instance, you can apply `return` to a value of the algebra’s carrier type to obtain a default container of that type. Evaluating such a container should be trivial — it should give you back the same value:

`(1) alg . return == id`

For instance, in the list monad `return` creates a singleton list, so we want the algebra to extract the value from a singleton without modifying it in any way.

```alg [a] =
(alg . return) a =
id a =
a```

Now let’s consider a container of containers of the carrier type. We have two ways of collapsing it: we can `fmap` our algebra over it — in other words, evaluate all the sub-containers — or we can `join` it. Expecting to get the same result in both cases would be asking a lot (but we get something like this in the Kleisli category later). We can demand though that, for an algebra to be compatible with a monad, the two resulting containers at least evaluate to the same thing:

`(2) alg . fmap alg == alg . join`

Let’s see what this condition means for lists, where `join` is concatenation. We start with a list of lists and we apply two evaluation strategies to it: We can evaluate the sub-lists and then evaluate the resulting list of results, or we can concatenate the sub-lists and then evaluate the concatenated list.

Guess what, our condition is equivalent to imposing associativity on the algebra. Think of the action of the algebra on a two-element list as some kind of “multiplication.” Since the concatenation of `[a, [b, c]]` is the same as the concatenation of `[[a, b], c]`, these two must evaluate to the same value. But that’s just associativity of our “multiplication.”
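Both compatibility conditions can be spot-checked for the sum algebra on lists, with `concat` playing the role of the list monad's `join`:

```haskell
-- The sum algebra on lists.
alg :: [Int] -> Int
alg = sum

-- Condition (1): alg . return == id, where return makes a singleton list.
unitLaw :: Int -> Bool
unitLaw a = alg [a] == a

-- Condition (2): alg . fmap alg == alg . join, where join is concat.
compatLaw :: [[Int]] -> Bool
compatLaw xss = alg (fmap alg xss) == alg (concat xss)

main :: IO ()
main = print (unitLaw 7 && compatLaw [[1, 2], [], [3, 4]])  -- True
```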

How much can we extend this analogy with multiplication? Can we actually produce a unit element? Of course: The action of the algebra on an empty list:

`e = alg []`

Let’s check it: Apply our compatibility conditions to the list `[[a], []]`. This is the left hand side:

```(alg . fmap alg) [[a], []] =
alg [alg [a], alg []] =
alg [a, e]```

And this is the right hand side:

```(alg . join) [[a], []] =
alg [a] =
a```

So, indeed, `e` is the right unit of our “multiplication.” You can do the same calculation for `[[], [a]]` to show that it’s also the left unit.

We have an associative operation equipped with a unit — that’s called a monoid. So any list algebra compatible with the list’s monadic structure defines a monoid.

## T-Algebras

An F-algebra that’s compatible with a monad (conditions (1) and (2) above), both built on the same functor, is called a T-algebra. I guess that’s because mathematicians replace F with T when they talk about monads. There may be many T-algebras for a given monad and in fact they form a category of their own.

This is not saying much, because requirements for a category are pretty minimal. You have to define arrows: here it would be homomorphisms of T-algebras. A homomorphism of algebras maps one carrier into another in such a way as to preserve the action.

In Haskell, a homomorphism of algebras would just be a function `h` from one carrier type to another such that:

```h    :: A -> B
alg  :: F A -> A
alg' :: F B -> B

h . alg == alg' . fmap h```

Here, `alg` and `alg'` are the two actions with carrier types `A` and `B`, respectively, and `F` is the functor. What this means is that, if you have a container of As you can evaluate it using `alg` and then apply `h` to it and get a B, or you can apply `h` to the contents of the container using `fmap` and then evaluate the resulting container of Bs using `alg'`. The result should be the same in both cases.
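Here is a concrete instance of that square, with carriers `Int` and `Double`, summation as both actions, and `h = fromIntegral`. This is my choice of example; any action-preserving function would do:

```haskell
h :: Int -> Double
h = fromIntegral

algA :: [Int] -> Int
algA = sum

algB :: [Double] -> Double
algB = sum

-- The homomorphism condition: evaluate then map,
-- or map the contents then evaluate, with the same result.
homomorphic :: [Int] -> Bool
homomorphic xs = h (algA xs) == algB (fmap h xs)

main :: IO ()
main = print (homomorphic [1, 2, 3, 4])  -- True
```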

This is a pretty standard way of defining homomorphisms for any structure, not just an algebra. Homomorphisms behave like functions: they are composable and there always is an identity homomorphism for every algebra, so they indeed turn T-algebras into a category — the so called Eilenberg-Moore category.

Remember what I said about the compatibility between `join` and `alg`? They both take down one layer of containment. Other than that, they are very different: `join` is a polymorphic natural transformation — it operates on the structure of the container, not its contents. An F-algebra operates on the contents and is defined only for a specific type.

And yet we can use `join` to define a T-algebra. Just consider using a container as a carrier type. A container is an image of some type `a` under a functor `m` which, for our purposes also happens to be a monad. Apply `m` to it one more time and you get a container of containers. You can “evaluate” this container of containers down to a single container using `join`.

You have just defined an algebra for the functor `m` whose carrier type is `(m a)` and the action is `join`. In fact, you have defined a whole family of algebras parameterized by the type `a`. Keep in mind that `a` is not the carrier type of this algebra, `(m a)` is. These algebras are called free algebras for the monad `m`. Guess what, they also form a category — the so called Kleisli category — which is a subcategory of the Eilenberg-Moore category.

Why are these two categories important? Well, it’s a topic for another blog post, but here’s the idea: Suppose you have two functors, F and G, one going from category C to D and the other going back. If G were the inverse of F, we would say that C and D are isomorphic. But what if they were “almost” inverse? For instance, what if their composition, instead of being the identity, were somehow mappable to the identity? This kind of relationship between functors can be formalized into an adjunction. It so happens that the composition of two adjoint functors forms a monad (or a comonad, if you compose them the other way around). Not only that — any monad may be decomposed into a pair of adjoint functors. There are many ways to perform this decomposition and there are many choices for the intermediate category — the target of F and the source of G. The Kleisli category is the smallest such category and the Eilenberg-Moore category is the largest one.

For an outsider, Haskell is full of intimidating terms like functor, monad, applicative, monoid… These mathematical abstractions are hard to explain to a newcomer. The internet is full of tutorials that try to simplify them with limited success.

The most common simplification you hear is that a functor or a monad is like a box or a container. Indeed, a list is a container and a functor, `Maybe` is like a box, but what about functions? Functions from a fixed type to an arbitrary type define both a functor and a monad (the reader monad). More complex functions define the state and the continuation monads (all these monads are functors as well). I used to point these out as counterexamples to the simplistic picture of a functor as a container. Then I had an epiphany: These are containers!

So here’s the plan: I will first try to convince you that a functor is the purest expression of containment. I’ll follow with progressively more complex examples. Then I’ll show you what natural transformations really are and how simple the Yoneda lemma is in terms of containers. After functors, I’ll talk about container interpretation of pointed, applicative, and monad. I will end with a new twist on the state monad.

# What’s a Container?

What is a container after all? We all have some intuitions about containers and containment but if you try to formalize them, you get bogged down with tricky cases. For instance, can a container be infinite? In Haskell you can easily define the list of all integers or all Pythagorean triples. In a non-lazy language like C++ you can fake infinite containers by defining input iterators. Obviously, an infinite container doesn’t physically contain all the data: it generates it on demand, just like a function does. We can also memoize functions and tabulate their values. Is the hash table of the values of the `sin` function a container or a function?

The bottom line is that there isn’t that much of a difference between containers and functions.

What characterizes a container is its ability to contain values. In a strongly typed language, these values have types. The type of elements shouldn’t matter, so it’s natural to describe a generic container as a mapping of types — element type to container type. A truly polymorphic container should not impose any constraints on the type of values it contains, so it is a total function from types to types.

It would be nice to be able to generically describe a way to retrieve values stored in a container, but each container provides its own unique retrieval protocol. A retrieval mechanism needs a way to specify the location from which to retrieve the value and a protocol for failure. This is an orthogonal problem and, in Haskell, it is addressed by lenses.

It would also be nice to be able to iterate over, or enumerate the contents of a container, but that cannot be done generically either. You need at least to specify the order of traversal. Even the simplest list can be traversed forwards or backwards, not to mention pre-, in-, and post-order traversals of trees. This problem is addressed, for instance, by Haskell’s `Traversable` functors.

But I think there is a deeper reason why we wouldn’t want to restrict ourselves to enumerable containers, and it has to do with infinity. This might sound like a heresy, but I don’t see any reason why we should limit the semantics of a language to countable infinities. The fact that digital computers can’t represent infinities, even those of the countable kind, doesn’t stop us from defining types that have infinite membership (the usual `Int`s and `Float`s are finite, because of the binary representation, but there are, for instance, infinitely many lists of `Int`s). Being able to enumerate the elements of a container, or convert it to a (possibly infinite) list means that it is countable. There are some operations that require countability: witness the `Foldable` type class with its `toList` function and `Traversable`, which is a subclass of `Foldable`. But maybe there is a subset of functionality that does not require the contents of the container to be countable.

If we restrain ourselves from retrieving or enumerating the contents of a container, how do we know the contents even exists? Because we can operate on it! The most generic operation over the contents of a container is applying a function to it. And that’s what functors let us do.

# Container as Functor

Here’s the translation of terms from category theory to Haskell.

A functor maps all objects in one category to objects in another category. In Haskell the objects are types, so a functor maps types into types (so, strictly speaking, it’s an endofunctor). You can look at it as a function on types, and this is reflected in the notation for the kind of the functor: `* -> *`. But normally, in a definition of a functor, you just see a polymorphic type constructor, which doesn’t really look like a function unless you squint really hard.

A categorical functor also maps morphisms to morphisms. In Haskell, morphisms correspond to functions, so a `Functor` type class defines a mapping of functions:

`fmap :: (a -> b) -> (f a -> f b)`

(Here, `f` is the functor in question acting on types `a` and `b`.)

Now let’s put on our container glasses and have another look at the functor. The type constructor defines a generic container type parameterized by the type of the element. The polymorphic function `fmap`, usually seen in its curried form:

`fmap :: (a -> b) -> f a -> f b`

defines the action of an arbitrary function `(a -> b)` on a container `(f a)` of elements of type `a` resulting in a container full of elements of type `b`.
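For instance (a quick sketch, using two of the standard containers):

```haskell
-- fmap applies a plain function to the contents of a container,
-- leaving the shape of the container alone
ex1 :: Maybe Int
ex1 = fmap (+ 1) (Just 3)

ex2 :: [String]
ex2 = fmap show [1, 2, 3 :: Int]
```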

# Examples

Let’s have a look at a few important functors as containers.

There is the trivial but surprisingly useful container that can hold no elements. It’s called the `Const` functor (parameterized by an unrelated type `b`):

```newtype Const b a = Const { getConst :: b }

instance Functor (Const b) where
    fmap _ (Const x) = Const x```

Notice that `fmap` ignores its function argument because there are no contents for this function to act upon.

A container that can hold one and only one element is defined by the `Identity` functor:

```newtype Identity a = Identity { runIdentity :: a }

instance Functor Identity where
    fmap f (Identity x) = Identity (f x)```

Then there is the familiar `Maybe` container that can hold (maybe) one element and a bunch of regular containers like lists, trees, etc.

The really interesting container is defined by the function application functor, `((->) e)` (which I would really like to write as `(e-> )`). The functor itself is parameterized by the type `e` — the type of the function argument. This is best seen when this functor is re-cast as a type constructor:

`newtype Reader e a = Reader (e -> a)`

This is of course the functor that underlies the `Reader` monad, where the first argument represents some kind of environment. It’s also the functor you’ll see in a moment in the Yoneda lemma.

Here’s the `Functor` instance for `Reader`:

```instance Functor (Reader e) where
    fmap f (Reader g) = Reader (\x -> f (g x))```

or, equivalently, for the function application operator:

```instance Functor ((->) e) where
    fmap = (.)```

This is a strange kind of container where the values that are “stored” are keyed by values of type `e`, the environments. Given a particular environment, you can retrieve the corresponding value by simply calling the function:

```runReader :: Reader e a -> e -> a
runReader (Reader f) env = f env```

You can look at it as a generalization of the key/value store where the environment plays the role of the key.

The reader functor (for lack of a better term) covers a large spectrum of containers depending on the type of the environment you pick. The simplest choice is the unit type `()`, which contains only one element, `()`. A function from unit is just a constant, so such a function provides a container for storing one value (just like the `Identity` functor). A function of `Bool` stores two values. A function of `Integer` is equivalent to an infinite list. If it weren’t for space and time limitations we could in principle memoize any function and turn it into a lookup table.
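A sketch of this spectrum, with `memoBool` and `asList` as hypothetical memoizing helpers:

```haskell
-- a function of Bool is a two-slot container; memoBool makes
-- the two slots explicit as a pair
memoBool :: (Bool -> a) -> (a, a)
memoBool f = (f False, f True)

-- a function of Int (restricted here to naturals) is like an
-- infinite list; asList memoizes it into one
asList :: (Int -> a) -> [a]
asList f = map f [0 ..]

-- the "lookup table" of the squaring function, truncated
squares :: [Int]
squares = take 4 (asList (\n -> n * n))
```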

In type theory you might see the type of functions from A to B written as B^A, where A and B are types seen as sets. That’s because the analogy with exponentiation — taking B to the power of A — is very fruitful. When A is the unit type with just one element, B^A becomes B^1, which is just B: A function from unit is just a constant of type B. A function of `Bool`, which contains just two elements, is like B^2 or B×B: a Cartesian product of Bs, or the set of pairs of Bs. A function from the enumeration of N values is equivalent to an N-tuple of Bs, or an element of B×B×…×B, N-fold. You can kind of see how this generalizes into B to the power of A, for arbitrary A.

So a function from A to B is like a huge tuple of Bs that is indexed by an element of A. Notice however that the values stored in this kind of container can only be enumerated (or traversed) if A itself is enumerable.

The IO functor that is the basis of the IO monad is even more interesting as a container because it offers no way of extracting its contents. An object of the type `IO String`, for instance, may contain all possible answers from a user to a prompt, but we can’t look at any of them in separation. All we can do is to process them in bulk. This is true even when `IO` is looked upon as a monad. All a monad lets you do is to pass your IO container to another monadic function that returns a new container. You’re just passing along containers without ever finding out if the Schrödinger’s cat trapped in them is dead or alive. Yes, parallels with quantum mechanics help a lot!

# Natural Transformations

Now that we’ve got used to viewing functors as containers, let’s figure out what natural transformations are. A natural transformation is a mapping of functors that preserves their functorial nature. If functor F maps object A to X and another functor G maps A to Y, then a natural transformation from F to G must map X to Y. A mapping from X to Y is a morphism. So you can look at a natural transformation as a family of morphisms parameterized by A.

In Haskell, we turn all these objects A, X, and Y into types. We have two functors `f` and `g` acting on type `a`. A natural transformation will be a polymorphic function that maps `f a` to `g a` for any `a`.

`forall a . f a -> g a`

What does it mean in terms of containers? Very simple: A natural transformation is a way of re-packaging containers. It tells you how to take elements from one container and put them into another. It must do it without ever inspecting the elements themselves (it can, however, drop some elements or clone them).

Examples of natural transformations abound, but my favorite is `safeHead`. It takes the head element from a list container and repackages it into a `Maybe` container:

```safeHead :: forall a . [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x```

What about a more ambitious example: Let’s take the reader functor, `Int -> a`, and map it into the list functor `[a]`. The former corresponds to a container of `a` keyed by an integer, so it’s easily repackaged into a finite or an infinite list, for instance:

```genInfList :: forall a . (Int -> a) -> [a]
genInfList f = fmap f [0..]```

I’ll show you soon that all natural transformations from `(Int -> a)` to `[a]` have this form, and differ only by the choice of the list of integers (here, I arbitrarily picked `[0..]`).

A natural transformation, being a mapping of functors, must interact nicely with morphisms as well. The corresponding naturality condition translates easily into our container language. It tells you that it shouldn’t matter whether you first apply a function to the contents of a container (`fmap` over it) and then repackage it, or first repackage and then apply the function. This meshes very well with our intuition that repackaging doesn’t touch the elements of the container — it doesn’t break the eggs in the crate.
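We can check the naturality square for `safeHead` on a concrete example:

```haskell
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

-- naturality: fmap-then-repackage equals repackage-then-fmap
lhs, rhs :: Maybe Int
lhs = fmap (* 2) (safeHead [1, 2, 3])  -- repackage first, then fmap
rhs = safeHead (fmap (* 2) [1, 2, 3])  -- fmap first, then repackage
```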

# The Yoneda Lemma

Now let’s get back to the function application functor (the `Reader` functor). I said it had something to do with the Yoneda lemma. I wrote a whole blog about the Yoneda lemma, so I’m not going to repeat it here — just translate it into the container language.

What Yoneda says is that the reader is a universal container from which stuff can be repackaged into any other container. I just showed you how to repackage the `Int` reader into a list using `fmap` and a list of `Int`. It turns out that you can do the same for any type of reader and an arbitrary container type. You just provide a container full of “environments” and `fmap` the reader function over it. In my example, the environment type was `Int` and the container was a list.

Moreover, Yoneda says that there is a one-to-one correspondence between “repackaging schemes” and containers of environments. Given a container of environments you do the repackaging by `fmap`ping the reader over it, as I did in the example. The inverse is also easy: given a repackaging, call it with an identity reader:

```idReader :: Reader e e
idReader = Reader id```

and you’ll get a container filled with environments.

Let me re-word it in terms of functors and natural transformations. For any functor `f` and any type `e`, all natural transformations of the form:

`forall a . ((e -> a) -> f a)`

are in one-to-one correspondence with values of the type `f e`. This is a pretty powerful equivalence. On the one hand you have a polymorphic function, on the other hand a polymorphic data structure, and they encode the same data. Except that things you do with functions are different than things you do with data structures so, depending on the context, one may be more convenient than the other.
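The two directions of this correspondence can be sketched as follows (with `toContainer`, `fromContainer`, and `genList` as illustrative names of my own; `toContainer` needs the `RankNTypes` extension):

```haskell
{-# LANGUAGE RankNTypes #-}

-- from a repackaging scheme to a container of environments:
-- feed it the identity reader
toContainer :: (forall a. (e -> a) -> f a) -> f e
toContainer nat = nat id

-- from a container of environments back to a repackaging scheme:
-- fmap the reader over it
fromContainer :: Functor f => f e -> (e -> a) -> f a
fromContainer env reader = fmap reader env

-- a repackaging like the one from the text, with a finite index list
genList :: (Int -> a) -> [a]
genList f = fmap f [0, 1, 2]
```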

For instance, if we apply the Yoneda lemma to the reader functor itself, we find out that all repackagings (natural transformations) between readers can be parameterized by functions between their environment types:

`forall a . ((e -> a) -> (e' -> a)) ~ e' -> e`

Or, you can look at this result as the CPS transform: Any function can be encoded in the Continuation Passing Style. The argument `(e -> a)` is the continuation. The `forall` quantifier tells us that the return type of the continuation is up to the caller. The caller might, for instance, decide to print the result, in which case they would call the function with the continuation that returns `IO ()`. Or they might call it with `id`, which is itself polymorphic: `a -> a`.

# Where Do Containers Come From?

A functor is a type constructor — it operates on types — but in a program you want to deal with data. A particular functor might define its data constructors: `List` and `Maybe` have constructors. A function, which we need in order to create an instance of the reader functor, may either be defined globally or through a lambda expression. You can’t construct an `IO` object, but there are some built-in runtime functions, like `getChar` or `putChar`, that return `IO`.

If you have functions that produce containers you may compose them to create more complex containers, as in:

```-- m is the functor
f :: a -> m b
g :: b -> m c
fmap g (f x) :: m (m c)```

But the general ability to construct containers from scratch and to combine them requires special powers that are granted by successively more powerful classes of containers.

## Pointed

The first special power is the ability to create a default container from an element of any type. The function that does that is called `pure` in the context of applicative and `return` in the context of a monad. To confuse things a bit more, there is a type class `Pointed` that defines just this ability, giving it yet another name, `point`:

```class Functor f => Pointed f where
    point :: a -> f a```

`point` is a natural transformation. You might object that there is no functor on the left hand side of the arrow, but just imagine seeing `Identity` there. Naturality just means that you can sneak a function under the functor using `fmap`:

`fmap g (point x) = point (g x)`

The presence of `point` means that there is a default, “trivial,” shape for the container in question. We usually don’t want this container to be empty (although it may be — I’m grateful to Edward Kmett for correcting me on this point). It doesn’t mean that it’s a singleton, though — for `ZipList`, for instance, `pure` generates an infinite list of `a`.
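The `ZipList` case can be checked directly:

```haskell
import Control.Applicative (ZipList (..))

-- for ZipList, pure repeats its argument indefinitely rather
-- than producing a singleton; we truncate to observe it
firstThree :: [Int]
firstThree = take 3 (getZipList (pure 7))
```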

## Applicative

Once you have a container of one type, `fmap` lets you generate a container of another type. But since the function you pass to `fmap` takes only one argument, you can’t create more complex types that take more than one argument in their constructor. You can’t even create a container of (non-diagonal) pairs. For that you need a more general ability: to apply a multi-argument function to multiple containers at once.

Of course, you can curry a multi-argument function and `fmap` it over the first container, but the result will be a container of hungry functions waiting for more arguments.

```h :: a -> b -> c
fmap h ma :: m (b -> c)  -- where ma :: m a```

(Here, `m` stands for the functor, applicative, or the monad in question.)

What you need is the ability to apply a container of functions to a container of arguments. The function that does that is called `<*>` in the context of applicative, and `ap` in the context of monad.

`(<*>) :: m (a -> b) -> m a -> m b`

As I mentioned before, `Applicative` is also `Pointed`, with `point` renamed to `pure`. This lets you wrap any additional arguments to your multi-argument functions.

The intuition is that applicative brings to the table its ability to increase the complexity of objects stored in containers. A functor lets you modify the objects but it’s a one-input one-output transformation. An applicative can combine multiple sources of information. You will often see applicative used with data constructors (which are just functions) to create containers of object from containers of arguments. When the containers also carry state, as you’ll see when we talk about `State`, an applicative will also be able to reflect the state of the arguments in the state of the result.
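A typical sketch of this pattern, with a hypothetical `Person` constructor applied to two `Maybe` containers:

```haskell
data Person = Person String Int deriving (Eq, Show)

-- a two-argument constructor lifted over two containers;
-- <$> is infix fmap, <*> applies the container of hungry functions
justPerson :: Maybe Person
justPerson = Person <$> Just "Ada" <*> Just 36

-- one missing ingredient makes the whole result missing
noPerson :: Maybe Person
noPerson = Person <$> Just "Ada" <*> Nothing
```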

## Monad

The monad has the special power of collapsing containers. The function that does it is called `join` and it turns a container of containers into a single container:

`join :: m (m a) -> m a`

Although it’s not obvious at first sight, `join` is also a natural transformation. The `fmap` for the `m . m` functor is the square of the original `fmap`, so the naturality condition looks like this:

` fmap f . join = join . (fmap . fmap) f `

Every monad is also an applicative with `return` playing the role of `pure` and `ap` implementing `<*>`:

```ap :: m (a -> b) -> m a -> m b
ap mf ma = join $ fmap (\f -> fmap f ma) mf```

When working with the container interpretation, I find this view of a monad as an applicative functor with `join` more intuitive. In practice, however, it’s more convenient to define the monad in terms of bind, which combines application of a function a la `fmap` with the collapsing of the container. This is done using the function `>>=`:

```(>>=) :: m a -> (a -> m b) -> m b
ma >>= k = join (fmap k ma)```

Here, `k` is a function that produces containers. It is applied to a container of `a`, `ma`, using `fmap`. We’ve seen this before, but we had no way to collapse the resulting container of containers — `join` makes this possible.
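For lists, `join` is `concat`, so we can check the equation `ma >>= k = join (fmap k ma)` on a concrete example:

```haskell
-- a function that produces containers
k :: Int -> [Int]
k x = [x, x * 10]

-- bind directly
viaBind :: [Int]
viaBind = [1, 2] >>= k

-- fmap followed by join (concat collapses the list of lists)
viaJoin :: [Int]
viaJoin = concat (fmap k [1, 2])
```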

Imagine a hierarchy of containment. You start with functions that produce containers. They “lift” your data to the level of containers. These are functions like `putChar`, data constructors like `Just`, etc. Then you have the “trivial” lifters of data called `pure` or `return`. You may operate on the data stored in containers by “lifting” a regular function using `fmap`. Applicative lets you lift multi-parameter functions to create containers of more complex data. You may also lift functions that produce containers to climb the rungs of containment: you get containers of containers, and so on. But only the monad provides the ability to collapse this hierarchy.

# State

Let’s have a look at the state functor, the basis of the state monad. It’s very similar to the reader functor, except that it also modifies the environment. We’ll call this modifiable environment “state.” The modified state is paired with the return value of the function that defines the state functor:

`newtype State s a = State (s -> (a, s))`

As a container, the reader functor generalized the key/value store. How should we interpret the state functor in the container language? Part of it is still the key/value mapping, but we have the additional key/key mapping that describes the state transformation. (The state plays the role of the key.) Notice also that the action of `fmap` modifies the values, but doesn’t change the key mappings.

```instance Functor (State s) where
    fmap f (State g) = State (\st -> let (x, st') = g st
                                     in (f x, st'))```

This is even more obvious if we separate the two mappings. Here’s the equivalent definition of the state functor in terms of two functions:

`data State' s a = State' (s -> a) (s -> s)`

The first function maps state to value: that’s our key/value store, identical to that of the reader functor. The second function is the state transition matrix. Their actions are quite independent:

`runState' (State' f tr) s = (f s, tr s)`

In this representation, you can clearly see that `fmap` only acts on the key/value part of the container, and its action on data is identical to that of the reader functor:

```instance Functor (State' s) where
    fmap f (State' g tr) = State' (f . g) tr```

In the container language, we like to separate the contents from the shape of the container. Clearly, in the case of the state functor, the transition matrix, not being influenced by `fmap`, is part of the shape.
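A tiny usage sketch of this representation (with `tick` as a hypothetical example whose key/value part returns the current state and whose shape increments it):

```haskell
-- value function and transition function kept separate
data State' s a = State' (s -> a) (s -> s)

runState' :: State' s a -> s -> (a, s)
runState' (State' f tr) s = (f s, tr s)

-- the value is the current state; the shape advances the state
tick :: State' Int Int
tick = State' id (+ 1)
```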

A look at the `Applicative` instance for this representation of the state functor is also interesting:

```instance Applicative (State' s) where
    pure x = State' (const x) id
    State' fg tr1 <*> State' fx tr2 = State' ff (tr2 . tr1)
      where
        ff st = let g = fg st
                    x = fx (tr1 st)
                in g x```

The default container created by `pure` uses identity as its transition matrix. As expected, the action of `<*>` creates a new “shape” for the container, but it does it in a very regular way by composing the transition matrices. In the language of linear algebra, the transformation of state by the applicative functor would be called “linear.” This will not be true with monads.

You can also see the propagation of side effects: the values for the first and second argument are retrieved using different keys: The key for the retrieval of the function `g` is the original state, `st`; but the argument to the function, `x`, is retrieved using the state transformed by the transition matrix of the first argument `(tr1 st)`. Notice however that the selection of keys is not influenced by the values stored in the containers.

But here’s the monad instance:

```instance Monad (State' s) where
    return x = State' (const x) id
    State' fx tr >>= k = State' ff ttr
      where
        ff st  = let x = fx st
                     st' = tr st
                     State' fy tr' = k x
                 in fy st'
        ttr st = let x = fx st
                     st' = tr st
                     State' fy tr' = k x
                 in tr' st'```

What’s interesting here is that the calculation of the transition matrix requires the evaluation of the function `k` with the argument `x`. It means that the state transition is no longer linear — the decision of which state to choose next may depend on the contents of the container. This is also visible in the implementation of `join` for this monad:

```join :: State' s (State' s a) -> State' s a
join (State' ff ttr) = State' f' tr'
  where
    f' st  = let State' f tr = ff st
                 st'         = ttr st
             in f st'
    tr' st = let State' f tr = ff st
                 st'         = ttr st
             in tr st'```

Here, the outer container stores the inner container as data. Part of the inner container is its transition matrix. So the decision of which transition matrix `tr` to use is intertwined with data in a non-linear way.

This non-linearity means that a monad is strictly more powerful than applicative, but it also makes composing monads together harder.

# Conclusions

The only way to really understand a complex concept is to see it from as many different angles as possible. Looking at functors as containers provides a different perspective and brings new insights. For me it was the realization that functions can be seen as non-enumerable containers of values, and that the state monad can be seen as a key/value store with an accompanying state transition matrix that brought the aha! moments. It was also nice to explicitly see the linearity in the applicative’s propagation of state. It was surprising to discover the simplicity of the Yoneda lemma and natural transformations in the language of containers.

# Bibliography and Acknowledgments

A container is not a very well defined term — an ambiguity I exploited in this blog post — but there is a well defined notion of finitary containers, and they indeed have a strong connection to functors. Russell O’Connor and Mauro Jaskelioff have recently shown that every traversable functor is a finitary container (I’m grateful to the authors for providing me with the preliminary copy of their paper, in which they have also independently and much more rigorously shown the connection between the Yoneda lemma for the functor category and the van Laarhoven formulation of the lens).

In my n’th attempt to tackle monads I tried a different, mystical, approach. In mysticism you often talk about things that you cannot define. Taoism takes this approach to the extreme — any attempt to define Tao must fail: “Tao that can be described is not the eternal Tao.” Hence the slightly tongue-in-cheek approach I took in this tutorial, Pure Functions, Laziness, I/O, and Monads. If you’re new to Haskell, it might help if you read the two previous tutorials in my Basics of Haskell. Notice that these tutorials are in the new format introduced by the School of Haskell, where you can run and edit program snippets directly in your browser.

I must confess that this tutorial got a slightly cold welcome in the Haskell community. That’s why I’m curious what non-Haskellers think of it. Does my approach make monads less or more scary? Take into account that I decided to tackle monads right after explaining the very basics of Haskell syntax — in most courses or books monads are treated as an advanced topic.

The videos and the slides of my Boostcon talk, “Haskell — The Pseudocode Language for C++ Metaprogramming,” are now available online, courtesy of my employer, Corensic. They cover some of the material from the blog series on monads.

“You must be kidding!” would be the expected reaction to “Monads in C++.” Hence my surprise when I was invited to Boostcon 11 to give a three-hour presentation on said topic, a presentation which was met with a totally unexpected level of interest. I imagine there was a small uptick in the sales of Haskell books afterwards.

Before I answer the question: “Why are monads relevant in C++?”, let me first answer the question: “Why is Haskell relevant in C++?” It turns out that Haskell makes C++ template metaprogramming if not easy then at least approachable (see my blog, What Does Haskell Have to Do with C++?). You have to understand that compile-time C++ is a strict functional language operating (mostly) on types. If that is not immediately obvious, it’s because the C++ template syntax is so appalling.

Now, not everybody falls in love with the Haskell syntax (at least not at first sight) but it fits the functional paradigm much better than anything else. What’s more, the correspondence between Haskell code and C++ template code is so direct that it could almost be translated mechanically. It really pays to design and test your template metaprograms in Haskell first, before translating them to C++. Conversely, translating complex template code to Haskell may help you understand it better and enable you to maintain and extend it. Those were my reasons to advocate the use of Haskell as a pseudo-language for C++ template metaprogramming.

Armed with Haskell I could study and analyze some of the most sophisticated C++ metacode. But when I looked at Eric Niebler’s Boost Proto library and couldn’t make heads or tails of it, even after discussing it with him over email and over beers, I was stumped.

# Monads and EDSLs

Boost Proto is a library for implementing domain-specific languages (DSLs). It’s an impressive piece of C++ metaprogramming but it’s hard to comprehend and it doesn’t yield to straightforward translation to Haskell. The problem is that it combines compile-time constructs with runtime code in a non-trivial way. I struggled long with Proto until one day I had an epiphany: I was looking at a monad. But first:

## What Is an EDSL?

Great things come out of abusing C++. One example is the abuse of templates (which were originally designed to support parametrized data types) to express compile-time computations. The result is Template Metaprogramming (TMP). Another example is the abuse of operator overloading, which created a fertile ground for Embedded Domain-Specific Languages (EDSLs). Eric’s Proto combined the two abominations into a library for constructing EDSLs.

In this post I will construct a simple EDSL in C++. This is how it works:

1. You overload the heck out of operators.
2. The overloaded operators build trees from expressions instead of eagerly computing results.
3. The type of the tree encodes the structure of the expression that created it. Since it’s a type, the structure of the expression is available at compile time.
4. You construct an object of that type.
5. You execute that object (because it is a function object) with appropriate arguments, and get the results you wanted.

A more general EDSL will also create a runtime data structure that stores the values of any variables and literals in the expression. Algorithms can walk the tree at compile time, runtime, or both to compute values and perform actions. You can even write a compile time algorithm to compute a runtime algorithm that munges trees in interesting ways.

My moment of Zen was when I realized that an EDSL corresponds to a Haskell reader monad. (If you’re not familiar with Haskell or monads, read my series, Monads for the Curious Programmer.) To my own amazement this analogy worked and led to executable code.

| Haskell monad | C++ EDSL |
|---|---|
| Composition of monadic functions | Compile-time parsing of expressions |
| Execution of a compound action on a state | Application of a function object to runtime values |

To prove the concept, I picked a simple EDSL based on one of Eric’s examples. It’s a two-argument lambda EDSL. It lets you write an expression with two placeholders, `arg1` and `arg2`. Through the magic of templates and operator overloading, this expression is interpreted as an anonymous function of two arguments. Here’s an example which evaluates to 25:

`int x = (arg1 * arg2 + 13)(3, 4);`

It turns out that the most important step in this process is to be able to convert an expression tree into a function object. Let me do this in Haskell first, and then translate it into C++.

A reader monad (a state monad with immutable state) provides a way to express stateful computations in a pure functional language. A stateful computation takes some arguments and returns a value, but unlike a function, it makes use of some external state. This computation can be turned into a function that takes the same arguments but, instead of returning a value, returns another function that takes a state as an argument and calculates the value. (The distinction between a computation and a function is essential here.)

Stateful computation represented as a monadic function returning an action

## State

The state, in the case of our expression monad, is a collection of two values — the arguments to the lambda. They will be used to replace the two placeholders in the expression.

Here’s the definition of our state — a list of integers (we really need only two, but I’m being sloppy… I mean general):

`type Args = [Int]`

## Actions

An action is a function that takes the state and produces a value of arbitrary type `t`:

`Args -> t`

You might remember from my previous post that one element of the monad is a type constructor: a mapping of an arbitrary type into an enriched type. In this case the enriched type is an action type. Because of technicalities related to Haskell’s type inference, I’ll have to thinly encapsulate an action inside a new type, which I will call `Prog`:

`newtype Prog t = PR (Args -> t)`

The idea is that the composition of monadic actions is similar to the compilation of source code: it produces a “program,” which is then “run.” In fact, I’ll define an auxiliary function that does just that, it runs a “program”:

```run :: Prog t -> Args -> t
run (PR act) args = act args```

This is a function that takes a `Prog t` as its first argument and pattern matches it to the constructor `(PR act)`. The second argument is the state, `args` of type `Args`. The result of running the program is a value of type `t`.

The next step is to define some primitive monadic functions. These are functions that produce actions (or “programs”). Here’s a very useful function that produces an action that extracts the n’th value from the list of arguments (the state):

```getArg :: Int -> Prog Int
getArg n = PR (\args -> args !! n)```

`getArg` takes an `Int` (in our case, zero or one) and returns a lambda (encapsulated in a `Prog` using the constructor `PR`). This lambda takes a list of `Int`s, `args`, and extracts its n’th element (`args !! n` means: take the n’th element of the list, `args`).

You’ve seen examples of monadic functions in my previous blog post, but it’s worth repeating the idea: `getArg` is a function that returns an action that is like a promise to extract the n’th argument when the arguments become available.

Just for fun, here’s another monadic function that takes `n` and promises to return twice `n` when the arguments are provided. It doesn’t matter that the arguments are going to be ignored.

`doubleIt n = PR (\args -> 2 * n)`
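Restating the pieces defined so far as a self-contained snippet, the primitive actions can already be run directly:

```haskell
type Args = [Int]

newtype Prog t = PR (Args -> t)

-- "run the program": feed the encapsulated action its arguments
run :: Prog t -> Args -> t
run (PR act) args = act args

-- promise to extract the n'th argument
getArg :: Int -> Prog Int
getArg n = PR (\args -> args !! n)

-- promise to return twice n, ignoring the arguments
doubleIt :: Int -> Prog Int
doubleIt n = PR (\_ -> 2 * n)
```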

I need two monadic functions, `getArg` and `doubleIt`, in order to demonstrate their composition using “bind.” I want to create an action that will get the first argument using `getArg 0` and double it using `doubleIt v`. If you remember how “bind” works, it takes an action, in this case the result of `getArg 0`, and a continuation, which represents “the rest of the computation.”

Combining two monadic functions using bind

In our case, the continuation is a lambda that takes an argument `v` (that’s the future result of the first action) and performs `doubleIt v`.

`\v -> doubleIt v`

This lambda returns an action because `doubleIt v` returns an action.

The signature of `bind` is:

`bind :: (Prog a) -> (a -> (Prog b)) -> (Prog b)`

You might have noticed that I use the words “action” and “program” interchangeably, although, strictly speaking, an action is the contents of a program. However, this distinction is an artifact of a Haskell quirk — a monad can’t be defined using a type alias, so we need the `Prog` type to encapsulate the action. Curiously, we won’t have this problem in C++.

The purpose of bind is to glue together an action with a continuation and return a new action. That means that `bind` has to return a lambda of appropriate type:

```bind (PR act) cont =
    PR (\args -> ... produce value of type b ...)```

This lambda will be executed when arguments are available. At that point, with the arguments handy, we’ll be able to first execute the action, `act`, and pass its result to the continuation. The continuation will return another action, which we can promptly execute too:

```bind (PR act) cont =
    PR (\args ->
        let v = act args
            (PR act') = cont v
        in act' args)```

(In Haskell you can use primed variables, like `act'`. I like that notation.)

This is very much like the state monad bind, except that we don’t have to worry about chaining the state, which is immutable. In fact the above is the definition of the reader monad’s `bind`.

Let’s test our new bind by composing `getArg` with `doubleIt`:

```test0 :: Prog Int
test0 = bind (getArg 0) (\v -> doubleIt v)```

Composing a monadic function, `getArg`, with a continuation that calls another monadic function, `doubleIt`

We can run the program produced by `test0` to see that it actually works:

```> let prog = test0
> run prog [3,4]
6
> run prog [11,0]
22```

For completeness, here’s the full definition of the `Prog` reader monad:

```instance Monad Prog where
    return v = PR (\args -> v)
    (PR act) >>= cont = PR (\args -> let v = act args
                                         (PR act') = cont v
                                     in act' args)```

With that definition, we can rewrite our little example in the `do` notation:

```test1 = do
    v <- getArg 0
    doubleIt v```

## Expression Tree

We have the definition of a program — a thinly encapsulated action. To create an actual program we need some kind of source code and a compiler. Our source code takes the form of an expression tree. A tree in Haskell is defined as a tagged union:

```data Exp = Const Int
         | Plus Exp Exp
         | Times Exp Exp
         | Arg1
         | Arg2```

The first constructor, `Const`, takes an `Int` and creates a leaf node corresponding to this constant value. The next two constructors are recursive: they each take two expressions and produce, correspondingly, a `Plus` and a `Times` node. Finally, there are two placeholders for the two arguments that will be provided at “runtime.”

## Compilation

Compilation is turning source code into a program. This process can be represented as a function with the following signature:

`compile :: Exp -> Prog Int`

It takes an expression tree (the source) and returns a program that evaluates to an integer. Looking from another perspective, `compile` is just a monadic function in our `Prog` monad.

We will define `compile` in little steps driven by the various patterns in the definition of `Exp`. If the expression matches the `Const` node, we define `compile` as:

`compile (Const c) = return c`

Remember, `return` is a function that takes a value and returns an action (program) that will produce this value when executed.

Another easy case is `Arg1` (and `Arg2`): we already have a monadic function `getArg` that we can call:

`compile Arg1 = getArg 0`

The interesting case is the `Plus` (or `Times`) node. Here we have to recursively call `compile` for both children and then compose the results. Well, that’s what monadic function composition is for. Here’s the code using the `do` notation, hiding the calls to `bind`:

```compile (Plus e1 e2) =
    do
        v1 <- compile e1
        v2 <- compile e2
        return (v1 + v2)```

That’s it! We can now compile a little test expression `(Arg1 * Arg2 + 13)`:

```testExp =
    let exp = Plus (Times Arg1 Arg2) (Const 13)
    in compile exp```

and we can run the resulting program with an argument list:

```> let args = [3, 4]
> let prog = testExp
> run prog args
25```

# The Expression Monad in C++

The translation of the Haskell expression monad to C++ is almost embarrassingly easy. I’ll be showing Haskell code side by side with its C++ equivalents.

## Expression Tree

```data Exp = Const Int
         | Plus Exp Exp
         | Times Exp Exp
         | Arg1
         | Arg2```

In C++, the expression tree is a compile-time construct that is expressed as a type (I’ll show you later where this type originates from). We have separate types for all the leaves, and the non-leaf nodes are parametrized by the types of their children.

```template<int n> struct Const {};

template<class E1, class E2>
struct Plus {};

template<class E1, class E2>
struct Times {};

struct Arg1 {};
struct Arg2 {};```

For instance, here’s the type that corresponds to the expression `arg1*arg2+13`:

`Plus<Times<Arg1, Arg2>, Const<13>>`

## State

`type Args = [Int]`

In C++, the state is a runtime object. I implemented it as an array of two integers and, for good measure, added a constructor and an accessor.

```struct Args
{
    Args(int i, int j) {
        _a[0] = i;
        _a[1] = j;
    }
    int operator[](int n) { return _a[n]; }
    int _a[2];
};```

## Action

Here’s the tricky part: How to represent an action? Remember, an action takes state, which is now represented by `Args`, and returns a value of some type. Because `Args` are only available at runtime, an action must be a runtime function or, even better, a function object.

How does that fit in our compile-time/runtime picture? We want our C++ monadic functions to be “executed” at compile time, but they should produce actions that are executed at runtime. All we can do at compile time is to operate on types, and this is exactly what we’ll do. We will create a new type that is a function object. A function object is a `struct` that implements an overloading of a function-call operator.

There’s another way of looking at it: extending the notion of a metafunction. In one of my previous posts I described metafunctions that “return” values, types, or even other metafunctions. Here we have a metafunction that returns a (runtime) function. This view fits the Haskell monadic picture better, where a monadic function returns an action.

## Type Constructor

Unfortunately, not everything can be expressed as neatly in C++ as it is in Haskell. In particular our type constructor:

`newtype Prog t = PR (Args -> t)`

doesn’t have a direct C++ counterpart. It should be a template that, for any type `T`, defines an action returning that type. In this construction, an action is represented by a C++ function object, so we would like something like this:

```template<class T> struct PR {
    T operator()(Args args);
};```

which, for many reasons, is useless. What we really need is for `PR` to be a “concept” that specifies a type with an associated method, `operator()`, of a particular signature. Since concepts are not part of C++11, we’ll have to rely on programming discipline and hope that, if we make a mistake, the compiler error messages will not be totally horrible.

Let’s start by translating the Haskell monadic function `getArg`:

```getArg :: Int -> Prog Int
getArg n = PR (λ args -> args !! n)```

Here it is in C++:

```template<int n>
struct GetArg { // instance of the concept PR
    int operator()(Args args) {
        return args[n];
    }
};```

It is a metafunction that takes a compile-time argument `n` and “returns” an action. Translation: it defines a struct with an overloaded `operator()` that takes `Args` and returns an `int`. Again, ideally this metafunction should be a struct that is constrained by the concept `PR`.

## Bind

Let’s look again at the Haskell’s implementation of monadic `bind`:

```bind (PR prog) cont =
    PR (λ args ->
        let v = prog args
            (PR prog') = cont v
        in
            prog' args)```

It takes two arguments: an action and a continuation. We know what an action is in our C++ construction: it’s a function object with a particular signature. We’ll parameterize our C++ `Bind` with the type, `P1`, of that object. The continuation is supposed to take whatever that action returns and return a new action. The type of that new action, `P2`, will be the second template parameter of `Bind`.

We’ll encode the type of the continuation as the standard function object taking an `int` and returning `P2`:

`std::function<P2(int)>`

Now, the C++ `Bind` is a metafunction of `P1` and `P2` that “returns” an action. The act of “returning” an action translates into defining a `struct` with the appropriate overload of `operator()`. Here’s the skeleton of this `struct`:

```template<class P1, class P2> // compile-time type parameters
struct Bind {
    Bind(P1 prog, std::function<P2(int)> cont)
        : _prog(prog), _cont(cont)
    {}
    ...
    P1 _prog;
    std::function<P2(int)> _cont;
};```

Notice that at runtime we will want to construct an object of this type, `Bind`, and pass it the runtime arguments: the action and the continuation. The role of the constructor requires some explanation. Haskell function `bind` is a monadic function of two arguments. Its C++ counterpart is a metafunction that takes four arguments: `P1` and `P2` at compile time, and `prog` and `cont` at runtime. This is a general pattern: When constructing a monadic metafunction in C++ we try to push as much as possible into compile time, but some of the arguments might not be available until runtime. In that case we shove them into the constructor.

The interesting part is, of course, the function-call operator, which really looks like a one-to-one translation of the Haskell implementation:

```template<class P1, class P2>
struct Bind {
    ...
    int operator()(Args args) {
        int v = _prog(args);
        P2 prog2 = _cont(v);
        return prog2(args);
    }
    ...
};```

Things to observe: A smart compiler should be able to inline all these calls because it knows the types `P1` and `P2`, so it can look up the implementations of their function-call operators. What’s left to the runtime is just the operations on the actual runtime values, like the ones done inside `GetArg::operator()`. However, I have been informed by Eric Niebler that many of these optimizations are thwarted by the choice of `std::function` for the representation of the continuation. This problem can be overcome, but at some loss of clarity of code, so I’ll stick to my original implementation (also, see Appendix 2).

## Return

All that is left to complete the monad is the `return` function. Here it is in Haskell:

```return :: a -> Prog a
return v = PR (λ args -> v)```

And here it is, as a function object, in C++:

```struct Return
{
    Return(int v) : _v(v) {}
    int operator()(Args args)
    {
        return _v;
    }
    int _v;
};```

Of course, in full generality, `Return` should be parameterized by the return type of its `operator()`, but for this example a simple `int` will do. The argument `v` is only available at runtime, so it is passed to the constructor of `Return`.

# The “Compile” Metafunction in C++

Now that we have our monad implemented, we can use it to build complex monadic functions from simple ones (such as `GetArg`). In Haskell we have implemented a monadic function `compile` with the following signature:

`compile :: Exp -> Prog Int`

It takes an expression tree and returns an action that evaluates this tree.

The C++ equivalent is a metafunction that takes a (compile-time) expression tree and defines a struct with the appropriate overload of a function-call operator. Lacking concept support, the latter requirement can’t be directly expressed in C++. We can, and in fact have to, provide a forward declaration of `Compile`:

```template<class Exp>
struct Compile;```

In Haskell, we defined `compile` using pattern matching. We can do the same in C++. Just like we split the Haskel definition of `compile` into multiple sub-definitions corresponding to different argument patterns, we’ll split the C++ definition into multiple specializations of the general template, `Compile`.

Here’s the first specialization in Haskell:

`compile (Const c) = return c`

and in C++:

```template<int c>
struct Compile<Const<c>> : Return
{
    Compile() : Return(c) {}
};```

I could have defined a separate overload of `operator()` for this case, but it’s simpler to reuse the one in `Return`.

Here’s another trivial case:

`compile Arg1 = getArg 0`

translates into:

```template<>
struct Compile<Arg1> : GetArg<0> {};```

The real fun begins with the `Plus` node, because it involves composition of monadic functions. Here’s the de-sugared Haskell version:

```compile (Plus exL exR) =
    bind (compile exL)
        (λ left ->
            bind (compile exR)
                (λ right ->
                    return (left + right)))```

The logic is simple: First we compile the left child and bind the result with the continuation (the lambda) that does the rest. Inside this continuation, we compile the right child and bind the result with the continuation that does the rest. Inside that continuation we add the two results (they are the arguments to the continuation) and encapsulate the sum in an action using `return`.

In C++, the binding is done by creating the appropriate `Bind` object and passing it (in the constructor) a function object and a continuation. The function object is the result of the compilation of the left child (we construct a temporary object of this type on the fly):

`Compile<L>()`

Just like in Haskell, the continuation is a lambda, only now it’s the C++11 lambda. Here’s the code, still with some holes to be filled later:

```template<class L, class R>
struct Compile<Plus<L, R>> {
    int operator()(Args args)
    {
        return Bind<...>(
            Compile<L>(),
            [](int left) {
                return Bind<...>(
                    Compile<R>(),
                    [left](int right) {
                        return Return(left + right);
                    }
                );
            }
        )(args);
    }
};```

Notice that the second lambda must explicitly capture the local variable, `left`. In Haskell, this capture was implicit.

The types for the instantiations of the two `Bind` templates can be easily derived bottom up. Ideally, we would like the compiler to infer them, just like in Haskell, but the C++ compiler is not powerful enough (although, at the cost of muddying the code some more, one can define template functions that return `Bind` objects of appropriate types — in C++ type inference works for template functions).

Here’s the final implementation with the types filled in:

```template<class L, class R>
struct Compile<Plus<L, R>> {
    int operator()(Args args)
    {
        return Bind<Compile<L>, Bind<Compile<R>, Return>>(
            Compile<L>(),
            [](int left) -> Bind<Compile<R>, Return> {
                return Bind<Compile<R>, Return>(
                    Compile<R>(),
                    [left](int right) -> Return {
                        return Return(left + right);
                    }
                );
            }
        )(args);
    }
};```

It’s quite a handful and, frankly speaking, I would have never been able to understand it, much less write it, if it weren’t for Haskell.

# The Test

You can imagine the emotional moment when I finally ran the test and it produced the correct result. I evaluated the simple expression, `Arg1*Arg2+13`, with arguments 3 and 4 and got back 25. The monad worked!

```int main() {
    Args args(3, 4);
    Compile<Plus<Times<Arg1, Arg2>, Const<13>>> act;
    int v = act(args);
    std::cout << v << std::endl;
}```

# The EDSL

Now that we have the monad and a monadic function `Compile`, we can finally build a simple embedded domain-specific language. Our minimalistic goal is to be able to evaluate the following expression:

`int x = (arg1 + arg2 * arg2)(3, 4);`

The trick is to convince the compiler to construct a very special type that represents the particular expression tree. In our case it should be something like this:

`Plus< Arg1, Times<Arg2, Arg2> >`

With this type, call it `E`, the compiler should call a special metafunction we’ll call `Lambda`, which returns a function object of two integral arguments:

```template<class E>
struct Lambda {
    int operator()(int x, int y) {
        Args args(x, y);
        Compile<E> prog;
        return prog(args);
    }
};```

How does one do it? This is really the bread and butter of EDSL writers — pretty elementary stuff — but I’ll explain it anyway. We start by declaring two objects, `arg1` and `arg2`:

```const Lambda<Arg1> arg1;
const Lambda<Arg2> arg2;```

These little objects can infect any expression with their `Lambda`-ness, and spread the infection with the help of appropriately overloaded arithmetic operators.

For instance, when the compiler sees `arg1+arg2`, it will look for the overload of `operator+` that takes two `Lambda`s. And we’ll provide this one:

```template<class E1, class E2>
Lambda<Plus<E1, E2>> operator+(Lambda<E1> e1, Lambda<E2> e2)
{
    return Lambda<Plus<E1, E2>>();
}```

Notice that this operator returns another `Lambda`, albeit of a more complex type, thus spreading the `Lambda`-ness even further, and so on. (In this very simplified example I’m ignoring the arguments `e1` and `e2`. In general they would be used to create a runtime version of the expression tree.)

Let’s see how it works in our example:

`(arg1 + arg2 * arg2)(3, 4)`

The seed types are, respectively,

`Lambda<Arg1>,  Lambda<Arg2>, Lambda<Arg2> `

The application of the inner `operator*` (its definition is left as an exercise to the reader), produces the following type:

`Lambda<Times<Arg2, Arg2>>`

This type is passed, along with `Lambda<Arg1>` to `operator+`, which returns:

`Lambda<Plus<Arg1, Times<Arg2, Arg2>>>`

All this is just type analysis done by the compiler to find out what type of object is returned by the outermost call to `operator+`.

The next question the compiler asks itself is whether this object can be called like a function, with the two integer arguments. Why, yes! A `Lambda` object (see above) defines an overload of the function-call operator. The instantiation of this particular `Lambda` defines the following overload:

```int Lambda<Plus<Arg1, Times<Arg2, Arg2>>>::operator()(int x, int y) {
    Args args(x, y);
    Compile<Plus<Arg1, Times<Arg2, Arg2>>> prog;
    return prog(args);
}```

This function will be called at runtime with the arguments 3 and 4 to produce the expected result, 19.

The code that has to be actually executed at runtime is the call to `prog(args)`, which is mostly a series of `Bind`s, an addition, and a multiplication. Since the implementation of a `Bind`‘s function-call operator contains no flow-of-control statements (`if` statements or loops), it can all be inlined by an optimizing compiler (modulo the glitch with `std::function` I mentioned earlier). So all that’s left is the addition and multiplication. Eric tells me that this is how Proto expressions work, and there is no reason why the monadic version wouldn’t lead to the same kind of performance.

# Conclusions and Future Work

I came up with this C++ monadic construction to help me understand the kind of heavy-duty template metaprogramming that goes on in Proto. Whether I have succeeded is a matter of opinion. I exchanged some of the complexity of template manipulations for a different complexity of Haskell and monads. I would argue that any C++ template metaprogrammer should know at least one functional language, so learning Haskell is a good investment. Monads are just part of the same Haskell package.

Assuming you understand monads, is the monadic construction of an EDSL simpler than the original using Proto? I will argue that it is. The implementation of Proto is pretty much monolithic; the concepts it uses have very little application or meaning outside of Proto. The monadic approach, on the other hand, decomposes into several layers of abstraction. At the bottom you have the construction of the state (or reader) monad. The monad exposes just three primitives: the type constructor, bind, and return. With these primitives you can create and compose monadic (meta-)functions — in my example it was the `compile` metafunction. Finally, you use these metafunctions to build an EDSL.

With this three-layer structure comes a well defined set of customization points. Users may plug in their own type constructors and implement their own `Bind` and `Return`. Or, they might use the default monad and just create their own monadic functions. It’s important that the procedure for building and composing monadic functions is well defined and that it uses straight C++ for implementing the logic. Inside a monadic function you can use regular C++ statements and standard control flow devices.

It’s not yet totally clear how general this approach is — after all what I described is a toy example. But there is a lot of interest in re-thinking or maybe even re-implementing Boost Proto and Phoenix in terms of monads. At Boostcon I started working with Joel Falcou and Hartmut Kaiser on this project, and later Eric Niebler and Thomas Heller joined our group. We believe that having solid theoretical foundations might encourage wider acceptance of some of the more complex template libraries. Especially if strict one-to-one correspondence with Haskell code could be established.

# Acknowledgments

I’ve been given so much feedback on my Boostcon presentation and the drafts of this blog post that I should really list the following people as co-authors: Eric Niebler, Thomas Heller, Joel Falcou, and Hartmut Kaiser. Thank you guys!

This page has been translated into Spanish by Maria Ramos.

# Appendix 1

The picture wouldn’t be complete if I didn’t provide the translation of the EDSL construction back to Haskell. It’s pretty straightforward, except for the last part where an expression becomes a callable function object. Maybe there is a way to bend Haskell’s syntax to do it directly but, since this is just a proof of concept, I took a shortcut and defined a function `toFun` which turns a `Lambda` expression into a function.

```newtype Lambda = L Exp

toFun (L ex) =
    λ x y ->
        run (compile ex) [x, y]

instance Num Lambda where
    (L e1) + (L e2) = L (Plus e1 e2)
    (L e1) * (L e2) = L (Times e1 e2)
    fromInteger n = L (Const (fromInteger n))

test =
    let arg1 = L Arg1
        arg2 = L Arg2
    in
        (toFun (arg1 + 2 * arg2 * arg2)) 2 3```

The overloading of arithmetic operators is done by making `Lambda` an instance of the type class `Num`.

# Appendix 2

When writing code at this level of abstraction it’s easy to bump into compiler bugs. For instance, Visual Studio 2010 won’t instantiate this simpler version of `Bind` that doesn’t use `std::function`:

```template<class P1, class Cont>
struct Bind
{
    Bind(P1 prog, Cont f)
        : _prog(prog), _f(f)
    {}
    int operator()(Args args) {
        int v = _prog(args);
        auto p2 = _f(v);
        return p2(args);
    }
    P1 _prog;
    Cont _f; // store a lambda continuation
};```

# Bibliography

1. Bartosz Milewski, Monads for the Curious Programmer
2. Bartosz Milewski, What Does Haskell Have to Do with C++?
3. Eric Niebler, Expressive C++. A series of blog posts that started all this
4. Brian McNamara and Yannis Smaragdakis, Functional Programming in C++. A Haskell-in-C++ library
5. Brian McNamara and Yannis Smaragdakis, Syntax sugar for C++: lambda, infix, monads, and more. More traditional approach to monads in C++ as a runtime device
6. Zoltán Porkoláb, Ábel Sinkovics, Domain-specific Language Integration with Compile-time Parser Generator Library. A C++ implementation of a compile-time parser that uses a monadic approach (although the word “monad” is never mentioned in the paper)

I only went to one talk, not because the rest was not interesting (quite the contrary), but because I worked with Joel and Hartmut on rewriting Proto. I think we essentially got it. We have the expression monad implemented; my “compile” function turned out to be the equivalent of a Proto transform, but with much more flexibility. Expression extension produced a little Lambda EDSL with placeholders for arguments and even const terminals. It works like a charm. If you don’t know what I’m talking about, I promise to finish my blog series on monads in C++ real soon now.

The talk I went to was Chris Kohlhoff talking more about Asio, the asynchronous IO library. He was showing how the new features of C++11 make his code simpler, safer, and more flexible without too much effort. In particular he found move semantics extremely helpful in reducing (or, in some cases, eliminating) the need for memory allocation in steady state–an important property when running in an embedded system, for instance. But what I liked most was his approach to solving the inversion of control problem by implementing his own coroutines. Sure, he had to abuse C++ macros, but the code was actually much more readable and reflected the way we think about asynchronous calls.

The idea is that, with coroutines, you write your code in a linear way. You say “read socket asynchronously” and then yield. The flow of control exits the coroutine in the middle, and continues with other tasks. The trick is that the rest of the coroutine becomes your completion handler. When the async call completes, the coroutine is called again, but it starts executing right after the last yield. Your flow of control moves back and forth, but your code looks linear, instead of being fragmented into a bunch of handlers. It makes you wish coroutines were part of the language, as they are, for instance, in C#.

By the way, I caught Hans Boehm while he was waiting for the airport shuttle and asked questions about memory_order_relaxed. You know, the problem is: can a relaxed load fetch an “out of thin air” value, a value that has never been written by anybody else? What I’m getting now is that in practice this will never happen, but it’s very hard to describe this requirement formally. In other words, can a malicious chip manufacturer, in cahoots with a compiler writer, come up with a system that fulfills the letter of the C++ memory model and lets you fetch an out-of-thin-air value? I think the answer is yes, because the language of the Standard is purposely vague on this topic:

(29.3.11) [ Note: The requirements do allow r1 == r2 == 42 in the following example, with x and y initially zero:

```// Thread 1: