### Monads

Lazy evaluation can be a powerful tool for structuring your code. For instance, it can let you turn your code inside out, inverting the flow of control. Many a Haskell program takes advantage of laziness to express algorithms in clear, succinct terms, turning them from recipes into declarations.

The question for today’s blog post is: How can we tap the power of lazy evaluation in an inherently eager language like C++? I’ll lead you through a simple coding example and gradually introduce the building blocks of lazy programming: the suspension, the lazy stream, and a whole slew of functional algorithms that let you operate on them. In the process we’ll discover some fundamental functional patterns like functors, monads, and monoids. I have discussed them already in my post about C++ futures. It’s very edifying to see them emerge in a completely different context.

# The Problem

Let’s write a program that prints the first n Pythagorean triples. A Pythagorean triple consists of three integers, x, y, and z, that satisfy the relation x² + y² = z². Let’s not be fancy and just go with the brute force approach. Here’s the program in C:

```c
void printNTriples(int n)
{
    int i = 0;
    for (int z = 1; ; ++z)
        for (int x = 1; x <= z; ++x)
            for (int y = x; y <= z; ++y)
                if (x*x + y*y == z*z) {
                    printf("%d, %d, %d\n", x, y, z);
                    if (++i == n)
                        return;
                }
}
```

Here, a single C function serves three distinct purposes: It

1. Generates Pythagorean triples,
2. Prints them,
3. Counts them; and when the count reaches n, breaks.

This is fine, as long as you don’t have to modify or reuse this code. But what if, for instance, instead of printing, you wanted to draw the triples as triangles? Or if you wanted to stop as soon as one of the numbers reached 100? The problem with this code is that it’s structured inside out: both the test and the sink for data are embedded in the innermost loop of the algorithm. A more natural and flexible approach would be to:

1. Generate the list of Pythagorean triples,
2. Take the first ten of them, and
3. Print them.

And that’s exactly how you’d write this program in Haskell:

```haskell
main = print (take 10 triples)

triples = [(x, y, z) | z <- [1..]
                     , x <- [1..z]
                     , y <- [x..z]
                     , x^2 + y^2 == z^2]
```

This program reads: take 10 triples and print them. It declares `triples` as a list (square brackets mean a list) of triples `(x, y, z)`, where (the vertical bar reads “where”) `z` is an element of the list of integers from 1 to infinity, `x` is from 1 to `z`, `y` is from `x` to `z`, and the sum of squares of `x` and `y` is equal to the square of `z`. This notation is called “list comprehension” and is characteristic of Haskell’s terseness.

You see the difference? Haskell lets you abstract the notion of the list of Pythagorean triples so you can operate on it as one entity, whereas in C (or, for that matter, in C++) we were not able to disentangle the different, orthogonal, aspects of the program.

The key advantage of Haskell in this case is its ability to deal with infinite lists. And this ability comes from Haskell’s inherent laziness. Things are never evaluated in Haskell until they are absolutely needed. In the program above, it was the call to `print` that forced Haskell to actually do some work: take 10 elements from the list of triples. Since the triples weren’t there yet, it had to calculate them, but only as many as were requested and not a number more.

# Suspension

We’ll start with the most basic building block of laziness: a suspended function. Here’s the first naive attempt:

```cpp
template<class T>
class Susp {
public:
    explicit Susp(std::function<T()> f)
        : _f(f)
    {}
    T get() { return _f(); }
private:
    std::function<T()> _f;
};
```

We often create suspensions using lambda functions, as in:

```cpp
int x = 2;
int y = 3;
Susp<int> sum([x, y]() { return x + y; });
...
int z = sum.get();
```

Notice that the suspended lambda may capture variables from its environment: here `x` and `y`. A lambda, and therefore a suspension, is a *closure*.

The trouble with this implementation is that the function is re-executed every time we call `get`. There are several problems with that: If the function is not pure, we may get different values each time; if the function has side effects, these may happen multiple times; and if the function is expensive, the performance will suffer. All these problems may be addressed by memoizing the value.
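To see the re-execution concretely, here is a small sketch that instruments the naive `Susp` with a side-effecting counter. The counter and the helper `countCalls` are mine, added purely for demonstration:

```cpp
#include <functional>

// The naive suspension from above: re-runs the function on every get()
template<class T>
class Susp {
public:
    explicit Susp(std::function<T()> f) : _f(f) {}
    T get() { return _f(); }
private:
    std::function<T()> _f;
};

// Demonstration helper: how many times does the lambda run
// across two calls to get()?
int countCalls()
{
    int calls = 0;
    Susp<int> sum([&calls]() { ++calls; return 2 + 3; });
    sum.get();
    sum.get();
    return calls; // 2 with the naive Susp -- no memoization
}
```
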

Here’s the idea: The first time the client calls `get` we should execute the function and store the returned value in a member variable. Subsequent calls should go directly to that variable. We could implement this by setting a Boolean flag on the first call and then checking it on every subsequent call, but there’s a better implementation that uses thunks.
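For comparison, the Boolean-flag variant mentioned above could be sketched as follows. `SuspFlag` and the demo helper are my names; the article proceeds with the thunk-based design instead:

```cpp
#include <functional>

// Sketch of the Boolean-flag alternative: test a flag on every call,
// evaluate the function at most once.
template<class T>
class SuspFlag {
public:
    explicit SuspFlag(std::function<T()> f)
        : _f(f), _isForced(false), _memo(T())
    {}
    T const & get() {
        if (!_isForced) {   // this branch is re-tested on every call
            _memo = _f();
            _isForced = true;
        }
        return _memo;
    }
private:
    std::function<T()> _f;
    bool _isForced;
    T _memo;
};

// Demonstration helper: with memoization the lambda runs only once
int countCallsFlag()
{
    int calls = 0;
    SuspFlag<int> sum([&calls]() { ++calls; return 2 + 3; });
    sum.get();
    sum.get();
    return calls;
}
```
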

A thunk is a pointer to a free function taking a suspension (the `this` pointer) and returning a value (by const reference). The `get` method simply calls this thunk, passing it the `this` pointer.

Initially, the thunk is set to `thunkForce`, which calls the method `setMemo`. This method evaluates the function, stores the result in `_memo`, switches the thunk to `thunkGet`, and returns the memoized value. On subsequent calls `get` goes through the `thunkGet` thunk, which simply returns the memoized value.

```cpp
template<class T>
class Susp
{
    // thunk for the first call: forces evaluation
    static T const & thunkForce(Susp * susp) {
        return susp->setMemo();
    }
    // thunk for subsequent calls: returns the memoized value
    static T const & thunkGet(Susp * susp) {
        return susp->getMemo();
    }
    T const & getMemo() {
        return _memo;
    }
    T const & setMemo() {
        _memo = _f();
        _thunk = &thunkGet;
        return getMemo();
    }
public:
    explicit Susp(std::function<T()> f)
        : _f(f), _thunk(&thunkForce), _memo(T())
    {}
    T const & get() {
        return _thunk(this);
    }
private:
    T const & (*_thunk)(Susp *);
    mutable T _memo;
    std::function<T()> _f;
};
```

(By the way, the function pointer declaration of `_thunk` looks pretty scary in C++, doesn’t it?)
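If the declaration bothers you, a C++11 alias template can name the thunk type once and for all. The alias `Thunk` and this stripped-down `Susp` are mine, shown only to illustrate the syntax:

```cpp
template<class T> class Susp;

// Name the function-pointer type once...
template<class T>
using Thunk = T const & (*)(Susp<T> *);

// ...and the member declaration becomes readable.
template<class T>
class Susp {
public:
    explicit Susp(T v) : _thunk(&thunkGet), _memo(v) {}
    T const & get() { return _thunk(this); }
private:
    static T const & thunkGet(Susp * susp) { return susp->_memo; }
    Thunk<T> _thunk;   // was: T const & (*_thunk)(Susp *);
    T _memo;
};

// Demonstration helper
int fortyTwo()
{
    Susp<int> s(42);
    return s.get();
}
```
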

But this is not the whole story. We have to make the suspension thread-safe or else we’ll have a data race when switching thunks. If we add a mutex to our data structure, we’ll pretty much turn it into an academic curiosity. So atomics to the rescue:

`mutable std::atomic<T const & (*)(Susp *)> _thunk;`

Fortunately, this is just a simple case of publication safety: the switch from one thunk to another happens only once. We can make it thread safe by using `memory_order_release` when storing the pointer:

```cpp
T const & setMemo()
{
    _memo = _f();
    _thunk.store(&thunkGet, std::memory_order_release);
    return getMemo();
}
```

and `memory_order_acquire` when reading it (which turns into a regular load on the x86):

```cpp
T const & get()
{
    auto th = _thunk.load(std::memory_order_acquire);
    return th(this);
}
```

The release of the thunk also guarantees that all changes to `_memo` (which is not declared `atomic`) become visible to the clients who acquire the new thunk, because of the “happens before” relationship they enforce.

Unfortunately, this lock-free solution cannot guarantee that the suspended function will always be called only once. There is a window of opportunity between the first call to the suspended function and the setting of the thunk during which another thread might start executing the function in parallel. As long as the function is pure, however, this is perfectly safe. But it is the duty of the client to ensure the purity of the function they suspend!

You can find a lot more detail about the Haskell implementation of suspended functions in the paper *Haskell on a Shared-Memory Multiprocessor* by Tim Harris, Simon Marlow, and Simon Peyton Jones.

# Lazy Stream

The loop we used to produce Pythagorean triples in C worked on the push principle — data was pushed towards the sink. If we want to deal with infinite lists, we have to use the pull principle. It should be up to the client to control the flow of data. That’s the inversion of control I was talking about in the introduction.

We’ll use a lazy list and call it a stream. In C++ a similar idea is sometimes expressed in terms of input and forward iterators, although it is understood that an iterator itself is not the source or the owner of data — just an interface to one. So we’ll stick with the idea of a stream.

We’ll implement the stream in the functional style as a persistent data structure fashioned after persistent lists (see my series of blog posts on persistent data structures). It means that a stream, once constructed, is never modified. To “advance” the stream, we’ll have to create a new one by calling the `const` method `pop_front`.

Let’s start with the definition: A stream is either empty or it contains a suspended cell. This immediately suggests the implementation as a (possibly null) pointer to a cell. Since the whole stream is immutable, the cell will be immutable too, so it’s perfectly safe to share it between copies of the stream, including inter-thread sharing. We can therefore use a shared pointer:

```cpp
template<class T>
class Stream
{
private:
    std::shared_ptr<Susp<Cell<T>>> _lazyCell;
};
```

Of course, because of reference counting and memoization, the stream is only conceptually immutable; but we (and the implementers of `shared_ptr`) have made sure that the hidden mutation is thread safe.

So what’s in the `Cell`? Remember, we want to be able to generate infinite sequences, so `Stream` must contain the DNA for not only producing the value of type `T` but also for producing the offspring — another (lazy) `Stream` of values. The `Cell` is just that: A value and a stream.

```cpp
template<class T>
class Cell
{
public:
    Cell() {} // need default constructor for memoization
    Cell(T v, Stream<T> const & tail)
        : _v(v), _tail(tail)
    {}
    explicit Cell(T v) : _v(v) {}
    T val() const {
        return _v;
    }
    Stream<T> pop_front() const {
        return _tail;
    }
private:
    T _v;
    Stream<T> _tail;
};
```

This mutually recursive pair of data structures works together amazingly well.

```cpp
template<class T>
class Stream
{
private:
    std::shared_ptr<Susp<Cell<T>>> _lazyCell;
public:
    Stream() {}
    Stream(std::function<Cell<T>()> f)
        : _lazyCell(std::make_shared<Susp<Cell<T>>>(f))
    {}
    Stream(Stream && stm)
        : _lazyCell(std::move(stm._lazyCell))
    {}
    Stream & operator=(Stream && stm)
    {
        _lazyCell = std::move(stm._lazyCell);
        return *this;
    }
    bool isEmpty() const
    {
        return !_lazyCell;
    }
    T get() const
    {
        return _lazyCell->get().val();
    }
    Stream<T> pop_front() const
    {
        return _lazyCell->get().pop_front();
    }
};
```

There are several things worth pointing out. The two constructors follow our formal definition of the `Stream`: one constructs an empty stream, the other constructs a suspended `Cell`. A suspension is created from a function returning `Cell`.

I also added a move constructor and a move assignment operator for efficiency. We’ll see them used in the implementation of `forEach`.

The magic happens when we call `get` for the first time. That’s when the suspended `Cell` comes to life. The value and the new stream are produced and memoized for later use. Or, this may happen if the first call is to `pop_front`. Notice that `pop_front` is a `const` method — the `Stream` itself is immutable. The method returns a new stream that encapsulates the rest of the sequence.

Let’s get our feet wet by constructing a stream of integers from `n` to infinity. The constructor of a `Stream` takes a function that returns a `Cell`. We’ll use a lambda that captures the value of `n`. It creates a `Cell` with that value and a tail, which it obtains by calling `intsFrom` with `n+1`:

```cpp
Stream<int> intsFrom(int n)
{
    return Stream<int>([n]()
    {
        return Cell<int>(n, intsFrom(n + 1));
    });
}
```

It’s a recursive definition, but without the usual recursive function calls that eat up the stack. The call to the inner `intsFrom` is not made from the outer `intsFrom`. Instead it’s made the first time `get` is called on the emerging `Stream`.
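The deferred recursion can be exercised end to end with a stripped-down version of the machinery. These are minimal, non-memoizing versions of `Susp`, `Cell`, and `Stream` — the class names match the article, but the cut-down bodies are mine, just enough to run `intsFrom`:

```cpp
#include <functional>
#include <memory>

// Naive, non-memoizing suspension (the article's version memoizes)
template<class T>
class Susp {
public:
    explicit Susp(std::function<T()> f) : _f(f) {}
    T get() { return _f(); }
private:
    std::function<T()> _f;
};

template<class T> class Stream;

// A value plus the (lazy) rest of the sequence
template<class T>
class Cell {
public:
    Cell(T v, Stream<T> const & tail) : _v(v), _tail(tail) {}
    T val() const { return _v; }
    Stream<T> pop_front() const { return _tail; }
private:
    T _v;
    Stream<T> _tail;
};

// A (possibly null) shared pointer to a suspended cell
template<class T>
class Stream {
public:
    Stream() {}
    Stream(std::function<Cell<T>()> f)
        : _lazyCell(std::make_shared<Susp<Cell<T>>>(f)) {}
    bool isEmpty() const { return !_lazyCell; }
    T get() const { return _lazyCell->get().val(); }
    Stream<T> pop_front() const { return _lazyCell->get().pop_front(); }
private:
    std::shared_ptr<Susp<Cell<T>>> _lazyCell;
};

Stream<int> intsFrom(int n)
{
    // Nothing runs here; the lambda executes on the first get()
    return Stream<int>([n]() { return Cell<int>(n, intsFrom(n + 1)); });
}
```

Each call to `get` or `pop_front` forces exactly one cell; the rest of the infinite sequence stays suspended.
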

Of course, we can also create finite streams, like this one, which produces integers from `n` to `m`:

```cpp
Stream<int> ints(int n, int m)
{
    if (n > m)
        return Stream<int>();
    return Stream<int>([n, m]()
    {
        return Cell<int>(n, ints(n + 1, m));
    });
}
```

The trick is to capture the limit `m` as well as the recursion variable `n`. When the limit is reached, we simply return an empty `Stream`.

We’ll also need the method `take`, which creates a `Stream` containing the first `n` elements of the original stream:

```cpp
Stream take(int n) const {
    if (n == 0 || isEmpty())
        return Stream();
    auto cell = _lazyCell;
    return Stream([cell, n]()
    {
        auto v = cell->get().val();
        auto t = cell->get().pop_front();
        return Cell<T>(v, t.take(n - 1));
    });
}
```

Here we are capturing the suspended cell and using it to lazily generate the elements of the new, truncated, `Stream`. Again, the key to understanding why this works is to keep in mind that `Stream`s and `Cell`s are conceptually immutable, and therefore can be shared by the implementation. This has some interesting side effects, which don’t influence the results, but change the performance. For instance, if the caller of `take` forces the evaluation of the first n elements — e.g., by passing them through the consuming `forEach` below — these elements will appear miraculously memoized in the original `Stream`.

Finally, we’ll need some way to iterate through streams. Here’s an implementation of `forEach` that consumes the stream while enumerating it and feeding its elements to a function.

```cpp
template<class T, class F>
void forEach(Stream<T> strm, F f)
{
    while (!strm.isEmpty())
    {
        f(strm.get());
        strm = strm.pop_front();
    }
}
```

It’s the assignment:

`strm = strm.pop_front();`

which consumes the stream by decreasing the reference count of the head of the `Stream`. In particular, if you pass an rvalue `Stream` to `forEach`, its elements will be generated and deleted in lockstep. The algorithm will use constant memory, independent of the virtual length of the `Stream`. What Haskell accomplishes with garbage collection, we approximate in C++ with reference counting and `shared_ptr`.
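Here is the loop wired together with the finite generator. As before, these are minimal non-memoizing versions of `Susp`, `Cell`, and `Stream` (the names match the article, the cut-down bodies are mine), and `sumInts` is a hypothetical driver:

```cpp
#include <functional>
#include <memory>

template<class T>
class Susp {
public:
    explicit Susp(std::function<T()> f) : _f(f) {}
    T get() { return _f(); }
private:
    std::function<T()> _f;
};

template<class T> class Stream;

template<class T>
class Cell {
public:
    Cell(T v, Stream<T> const & tail) : _v(v), _tail(tail) {}
    T val() const { return _v; }
    Stream<T> pop_front() const { return _tail; }
private:
    T _v;
    Stream<T> _tail;
};

template<class T>
class Stream {
public:
    Stream() {}
    Stream(std::function<Cell<T>()> f)
        : _lazyCell(std::make_shared<Susp<Cell<T>>>(f)) {}
    bool isEmpty() const { return !_lazyCell; }
    T get() const { return _lazyCell->get().val(); }
    Stream<T> pop_front() const { return _lazyCell->get().pop_front(); }
private:
    std::shared_ptr<Susp<Cell<T>>> _lazyCell;
};

// Finite generator from the article
Stream<int> ints(int n, int m)
{
    if (n > m) return Stream<int>();
    return Stream<int>([n, m]() { return Cell<int>(n, ints(n + 1, m)); });
}

// Consuming enumeration from the article
template<class T, class F>
void forEach(Stream<T> strm, F f)
{
    while (!strm.isEmpty()) {
        f(strm.get());
        strm = strm.pop_front(); // drop the head as we go
    }
}

// Hypothetical driver: sum the elements of ints(n, m)
int sumInts(int n, int m)
{
    int sum = 0;
    forEach(ints(n, m), [&sum](int x) { sum += x; });
    return sum;
}
```
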

# Working with Streams

It’s not immediately obvious how to translate our Pythagorean triple program from nested loops to lazy streams, so we’ll have to take inspiration from the corresponding Haskell program. Let me first rewrite it using a slightly different notation:

```haskell
triples = do
    z <- [1..]
    x <- [1..z]
    y <- [x..z]
    guard (x^2 + y^2 == z^2)
    return (x, y, z)
```

The general idea is this: Start with the stream of integers from 1 to infinity. For every such integer — call it `z` — create a stream from 1 to `z`. For each of those — call them `x` — create a stream from `x` to `z`. Filter out those which don’t satisfy the Pythagorean constraint. Finally, output a stream of tuples `(x, y, z)`.

So far we’ve learned how to create a stream of integers — we have the function `intsFrom`. But now we’ll have to do something for each of these integers. We can’t just enumerate those integers and apply a function to each, because that would take us an eternity. So we need a way to act on each element of a stream lazily.

In functional programming this is called mapping a function over a list. In general, a parameterized data structure that can be mapped over is called a functor. I’m going to show you that our `Stream` is a functor.

# Stream as a Functor

The idea is simple: we want to apply a function to each element of a stream to get a new transformed stream (it’s very similar to the `std::transform` algorithm from STL). The catch is: We want to do it generically and lazily.

To make the algorithm — we’ll call it `fmap` — generic, we have to parameterize it over types. The algorithm starts with a `Stream` of elements of type `T` and a function from `T` to some other type `U`. The result should be a stream of `U`.

We don’t want to make `U` the template argument, because then the client would have to specify it explicitly. We want the compiler to deduce this type from the type of the function. We want, therefore, the function type `F` to be the parameter of our template (this will also allow us to call it uniformly with function pointers, function objects, and lambdas):

```cpp
template<class T, class F>
auto fmap(Stream<T> stm, F f)
```

Without the use of concepts, we have no way of enforcing, or even specifying, that `F` be a type of a function from `T` to `U`. The best we can do is to statically assert it inside the function:

```cpp
static_assert(std::is_convertible<F, std::function<U(T)>>::value,
    "fmap requires a function type U(T)");
```

But what is `U`? We can get at it using `decltype`:

`decltype(f(stm.get()));`

Notice that `decltype` takes, as an argument, an expression that can be statically typed. Here, the expression is a function call of `f`. We also need a dummy argument for this function: we use the result of `stm.get()`. The argument to `decltype` is never evaluated, but it is type-checked at compile time.
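The same idea can be seen in isolation, outside the stream machinery. The function `half` below is a hypothetical stand-in for `f`:

```cpp
#include <type_traits>

// A call expression inside decltype is type-checked but never executed:
double half(int x) { return x * 0.5; }

using U = decltype(half(4));   // U is double; half(4) is never called
static_assert(std::is_same<U, double>::value, "U deduced as double");
```
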

One final problem is how to specify the return type of `fmap`. It’s supposed to be `Stream<U>`, but we don’t know `U` until we apply `decltype` to the arguments of `fmap`. We have to use the new `auto` function declaration syntax of C++11. So here are all the type-related preliminaries:

```cpp
template<class T, class F>
auto fmap(Stream<T> stm, F f) -> Stream<decltype(f(stm.get()))>
{
    using U = decltype(f(stm.get()));
    static_assert(std::is_convertible<F, std::function<U(T)>>::value,
        "fmap requires a function type U(T)");
    ...
}
```

Compared to that, the actual implementation of `fmap` seems rather straightforward:

```cpp
    if (stm.isEmpty()) return Stream<U>();
    return Stream<U>([stm, f]()
    {
        return Cell<U>(f(stm.get()), fmap(stm.pop_front(), f));
    });
```

In words: If the stream is empty, we’re done — return an empty stream. Otherwise, create a new stream by suspending a lambda function. That function captures the original stream (by value) and the function `f`, and returns a `Cell`. That cell contains the value of `f` acting on the first element of the original stream, and a tail. The tail is created with `fmap` acting on the rest of the original stream.

Equipped with `fmap`, we can now attempt to take the first step towards generating our triples: apply the function `ints(1, z)` to each element of the stream `intsFrom(1)`:

```cpp
fmap(intsFrom(1), [](int z)
{
    return ints(1, z);
});
```

The result is a `Stream` of `Stream`s of integers of the shape:

```
1
1 2
1 2 3
1 2 3 4
1 2 3 4 5
...
```

But now we are stuck. We’d like to apply `ints(x, z)` to each element of that sequence, but we don’t know how to get through two levels of `Stream`. Our `fmap` can only get through one layer. We need a way to flatten a `Stream` of `Stream`s. That functionality is part of what functional programmers call a monad. So let me show you that `Stream` is indeed a monad.

# Stream as a Monad

If you think of a `Stream` as a list, the flattening of a list of lists is just concatenation. Suppose for a moment that we know how to lazily concatenate two `Stream`s (we’ll get to it later) and let’s implement a function `mjoin` that concatenates a whole `Stream` of `Stream`s.

You might have noticed a pattern in the implementation of lazy functions on streams. We use some kind of recursion, which starts with “Are we done yet?” If not, we do an operation that involves one element of the stream and the result of a recursive call to the function itself.

The “Are we done yet?” question usually involves testing for an empty stream. But here we are dealing with a `Stream` of `Stream`s, so we have to test two levels deep. This way we’ll ensure that the concatenation of a `Stream` of empty `Stream`s immediately returns an empty `Stream`.

The recursive step in `mjoin` creates a `Cell` whose element is the head of the first stream, and whose tail is the concatenation of the tail of the first stream and the result of `mjoin` of the rest of the streams:

```cpp
template<class T>
Stream<T> mjoin(Stream<Stream<T>> stm)
{
    while (!stm.isEmpty() && stm.get().isEmpty())
    {
        stm = stm.pop_front();
    }
    if (stm.isEmpty()) return Stream<T>();
    return Stream<T>([stm]()
    {
        Stream<T> hd = stm.get();
        return Cell<T>( hd.get()
                      , concat(hd.pop_front(), mjoin(stm.pop_front())));
    });
}
```

The combination of `fmap` and `mjoin` lets us compose functions like `intsFrom` or `ints` that return `Stream`s. In fact, this combination is so common that it deserves its own function, which we’ll call `mbind`:

```cpp
template<class T, class F>
auto mbind(Stream<T> stm, F f) -> decltype(f(stm.get()))
{
    return mjoin(fmap(stm, f));
}
```

If we use `mbind` in place of `fmap`:

```cpp
mbind(intsFrom(1), [](int z)
{
    return ints(1, z);
});
```

we can produce a flattened list:

`1 1 2 1 2 3 1 2 3 4...`

But it’s not just the list: Each element of the list comes with variables that are defined in its environment — here the variable `z`. We can keep chaining calls to `mbind` and capture more variables in the process:

```cpp
mbind(intsFrom(1), [](int z)
{
    return mbind(ints(1, z), [z](int x)
    {
        return mbind(ints(x, z), [x, z](int y)
        {
            ...
        });
    });
});
```

At this point we have captured the triples `x`, `y`, `z`, and are ready for the Pythagorean testing. But before we do it, let’s define two additional functions that we’ll use later.

The first one is `mthen` which is a version of `mbind` that takes a function of no arguments. The idea is that such a function will be executed for each element of the stream, but it won’t use the value of that element. The important thing is that the function will not be executed when the input stream is empty. In that case, `mthen` will return an empty stream.

We implement `mthen` using a slightly modified version of `fmap` that takes a function `f` of no arguments:

```cpp
template<class T, class F>
auto fmapv(Stream<T> stm, F f) -> Stream<decltype(f())>
{
    using U = decltype(f());
    static_assert(std::is_convertible<F, std::function<U()>>::value,
        "fmapv requires a function type U()");

    if (stm.isEmpty()) return Stream<U>();
    return Stream<U>([stm, f]()
    {
        return Cell<U>(f(), fmapv(stm.pop_front(), f));
    });
}
```

We plug it into the definition of `mthen` the same way `fmap` was used in `mbind`:

```cpp
template<class T, class F>
auto mthen(Stream<T> stm, F f) -> decltype(f())
{
    return mjoin(fmapv(stm, f));
}
```

The second useful function is `mreturn`, which simply turns a value of any type into a one-element `Stream`:

```cpp
template<class T>
Stream<T> mreturn(T v)
{
    return Stream<T>([v]() {
        return Cell<T>(v);
    });
}
```

We’ll need `mreturn` to turn our triples into `Stream`s.

It so happens that a parameterized type equipped with `mbind` and `mreturn` is called a monad (it must also satisfy some additional monadic laws, which I won’t talk about here). Our lazy `Stream` is indeed a monad.

# Stream as a Monoid and a Monad Plus

When implementing `mjoin` we used the function `concat` to lazily concatenate two `Stream`s. Its implementation follows the same recursive pattern we’ve seen so many times:

```cpp
template<class T>
Stream<T> concat( Stream<T> lft
                , Stream<T> rgt)
{
    if (lft.isEmpty())
        return rgt;
    return Stream<T>([=]()
    {
        return Cell<T>(lft.get(), concat<T>(lft.pop_front(), rgt));
    });
}
```

What’s interesting is that the concatenation of streams puts them under yet another well known functional pattern: a monoid. A monoid is equipped with a binary operation, just like `concat`, which must be associative and possess a unit element. It’s easy to convince yourself that concatenation of `Stream`s is indeed associative, and that the neutral element is an empty `Stream`. Concatenating an empty `Stream`, whether in front or in the back of any other `Stream`, doesn’t change the original `Stream`.

What’s even more interesting is that being a combination of a monoid and a monad makes `Stream` into a monad plus, and every monad plus defines a `guard` function — exactly what we need for the filtering of our triples. This function takes a Boolean argument and outputs a `Stream`. If the Boolean is false, the `Stream` is empty (the unit element of monad plus!), otherwise it’s a singleton `Stream`. We really don’t care what value sits in this `Stream` — we never use the result of `guard` for anything but the flow of control. In Haskell, there is a special “unit” value `()` — here I use a `nullptr` as its closest C++ analog.

```cpp
Stream<void*> guard(bool b)
{
    if (b) return mreturn<void*>(nullptr); // singleton stream
    else return Stream<void*>();           // empty stream
}
```

We can now pipe the result of `guard` into `mthen`, which will ignore the content of the `Stream` but won’t fire when the `Stream` is empty. When the `Stream` is not empty, we will call `mreturn` to output a singleton `Stream` with the result tuple:

```cpp
Stream<std::tuple<int, int, int>> triples()
{
    return mbind(intsFrom(1), [](int z)
    {
        return mbind(ints(1, z), [z](int x)
        {
            return mbind(ints(x, z), [x, z](int y)
            {
                return mthen(guard(x*x + y*y == z*z), [x, y, z]()
                {
                    return mreturn(std::make_tuple(x, y, z));
                });
            });
        });
    });
}
```

These singletons will then be concatenated by the three levels of `mbind` to create one continuous lazy `Stream` of Pythagorean triples.

Compare this function with its Haskell counterpart:

```haskell
triples = do
    z <- [1..]
    x <- [1..z]
    y <- [x..z]
    guard (x^2 + y^2 == z^2)
    return (x, y, z)
```

Now, the client can `take` 10 of those triples from the `Stream` — and the triples still won’t be evaluated! It’s the consuming `forEach` that finally forces the evaluation:

```cpp
void test()
{
    auto strm = triples().take(10);
    forEach(std::move(strm), [](std::tuple<int, int, int> const & t)
    {
        std::cout << std::get<0>(t) << ", "
                  << std::get<1>(t) << ", "
                  << std::get<2>(t) << std::endl;
    });
}
```

# Conclusion

The generation of Pythagorean triples is a toy example, but it shows how lazy evaluation can be used to restructure code in order to make it more reusable. You can use the same function `triples` to print the values in one part of your program and draw triangles in another. You can filter the triples or impose different termination conditions. You can use the same trick to generate an infinite set of approximations to the solution of a numerical problem, and then use different strategies to truncate it. Or you can create an infinite set of animation frames, and so on.

The building blocks of laziness are also reusable. I have used them to implement the solution to the eight-queens problem and a conference scheduling program. They are thread safe by construction, and so are the combinators that bind them. This is, in general, the property of persistent data structures.

You might be concerned about the performance of lazy data structures, and rightly so. They use the heap heavily, so memory allocation and deallocation is a serious performance bottleneck. There are many situations, though, where code structure, reusability, maintenance, and correctness (especially in multithreaded code) are more important than performance. And there are some problems that might be extremely hard to express without the additional flexibility gained from laziness.

I made the sources to all code in this post available on GitHub.

[If you prefer, you may watch the video of my talk on this topic (here are the slides).]

If you thought you were safe from functional programming in your cozy C++ niche, think again! First the lambdas and function objects and now the monad camouflaged as `std::future`. But do not despair, it’s all just patterns. You won’t find them in the Gang of Four book, but once you see them, they will become obvious.

Let me give you some background: I was very disappointed with the design of C++11 `std::future`. I described my misgivings in: Broken Promises — C++0x futures. I also made a few suggestions as how to fix it: Futures Done Right. Five years went by and, lo and behold, a proposal to improve `std::future` and related APIs, N3721, was presented to the Standards Committee for discussion. I thought it would be a no-brainer, since the proposal was fixing obvious holes in the original design. A week ago I attended the meetings of the C++ Standards Committee in Issaquah — since it was within driving distance from me — and was I in for a surprise! Apparently some design patterns that form the foundation of functional programming are not obvious to everybody. So now I find myself on the other side of the discussion and will try to explain why the improved design of `std::future` is right.

Design arguments are not easy. You can’t mathematically prove that one design is better than another, or a certain set of abstractions is better than another — unless you discover some obvious design flaws in one of them. You might have a gut feeling that a particular solution is elegant, but how do you argue about elegance?

Thankfully, when designing a library, there are some well known and accepted criteria. The most important ones, in my mind, are orthogonality, a.k.a., separation of concerns, and composability. It also helps if the solution has been previously implemented and tested, especially in more than one language. I will argue that this is indeed the case with the extended `std::future` design. In the process, I will describe some programming patterns that might be new to C++ programmers but have been tried and tested in functional languages. They tend to pop up more and more in imperative languages, especially in connection with concurrency and parallelism.

# The Problem

In a nutshell, the problem that `std::future` is trying to solve is that of returning the result of a computation that’s being performed in parallel, or returning the result of an asynchronous call. For instance, you start a computation in a separate thread (or a more general execution agent) and you want to, at some point in time, get back the result of that computation. This is one of the simplest models of concurrency: delegating the execution of a function (a closure) to another thread.

To return a value from one thread to another you need some kind of a communication channel. One thread puts a value in the channel, another picks it up. Instead of providing one channel abstraction, as does ML or Haskell, C++11 splits it into two separate abstractions: the promise and the future. The promise is the push end of the channel, the future is the pull end. (In Rust there are similar objects called `Chan` and `Port`.)

The general pattern is for the client to construct a promise, get the future from it using `get_future`, and start a thread, passing it the promise. When the thread is done, it puts the result in the promise using `set_value`. In the meanwhile, the calling thread may do some other work and eventually decide to retrieve the result from the future by calling its method `get`. If the promise has been fulfilled, `get` returns immediately with the value, otherwise it blocks until the value is available.

This pattern involves some boilerplate code dealing with the promise side of things, so the Standard introduced a shortcut called `std::async` to simplify it. You call `std::async` with a plain function (closure) and its result is automatically put into a hidden promise. All the client sees is the `future` side of the channel. (I am simplifying things by ignoring exception handling and various modes of starting `async`.)
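Stripped to its essentials, the pattern might look like the sketch below. The helper names and the payload computation `6 * 7` are mine, not part of any proposal:

```cpp
#include <future>
#include <thread>

// The long way: an explicit promise/future channel between two threads.
int resultViaPromise()
{
    std::promise<int> prom;
    std::future<int> fut = prom.get_future();  // pull end of the channel
    std::thread worker([&prom]() {
        prom.set_value(6 * 7);                 // push end: fulfill the promise
    });
    int r = fut.get();   // blocks until set_value has been called
    worker.join();
    return r;
}

// The shortcut: std::async hides the promise behind the scenes.
int resultViaAsync()
{
    auto fut = std::async([]() { return 6 * 7; });
    return fut.get();
}
```
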

# The Functor Pattern

Here’s the first abstraction: A future is an object that encapsulates a value. By itself, this would be a pretty useless abstraction unless the encapsulation came with some other functionality or restriction. For instance, `std::unique_ptr` encapsulates a value, but also manages the lifetime of the memory it occupies. A future encapsulates a value, but you might have to block to get it. Functional languages have a very useful pattern for just this kind of situation: the functor pattern (not to be confused with the C++ misnomer for a function object). A functor encapsulates a value of an arbitrary type, plus it lets you act on it with a function.

Notice that the functor doesn’t necessarily give you access to the value — instead it lets you modify it. The beauty of it is that, in the case of a future, a functor gives you the means to modify the value that potentially isn’t there yet — and it lets you do it without blocking. Of course, behind the scenes, the function (closure) that you provide is stored in the future and only applied when the value is ready and is being accessed using `get`.

The first part of the fix that was proposed to the Committee was to turn `std::future` into a functor. Technically, this is done by adding a new method, `then`:

```template<typename F>
auto future::then(F&& func) -> future<decltype(func(*this))>;```

This method takes a function object `func` to be applied to the future in question. The result is a new future of the type that is returned by the function object, `decltype(func(*this))`.

Things are slightly muddled by the fact that a future not only encapsulates the value to be calculated but also the possibility of an exception. This is why the function passed to `then` takes the whole future, from which it can extract the value using `get`, which at that point is guaranteed not to block, but may rethrow an exception. There is an additional proposal N3865 to introduce another method, `next`, that would deal only with the value, not the exception. The advantage of `next` is that it could be called with a regular function unaware of the existence of futures, with no additional boilerplate. For simplicity, I’ll be using `next` in what follows.

The functor pattern makes perfect sense for composing a regular function on top of an asynchronous function (one returning a future), but it’s more general than that. Any time you have an object that is parameterized by an arbitrary type, you might be dealing with a functor. In C++, that would be a template class that doesn’t impose any restrictions on its template argument. Most containers have this property. In order for a generic class to be a functor it must also support a means to operate on its contents. Most containers in STL provide this functionality through the algorithm `std::transform`. For an imperative programmer it might come as a surprise that such disparate things as futures and containers fall under the same functional pattern — a functor.
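To see this generality concretely, here is a small Haskell sketch (my own illustration, not part of any proposal): the same `Functor` interface covers both an ordinary container and a not-yet-forced computation, the latter being a crude stand-in for a future.

```haskell
-- A hypothetical "delayed value": the result is computed only on demand.
newtype Delayed a = Delayed (() -> a)

instance Functor Delayed where
  -- fmap composes the new function with the pending computation;
  -- nothing is forced here, mirroring a non-blocking 'then'.
  fmap f (Delayed thunk) = Delayed (f . thunk)

-- Forcing plays the role of a blocking 'get'.
force :: Delayed a -> a
force (Delayed thunk) = thunk ()
```

Lists are functors in exactly the same sense: `fmap (+1) [1, 2, 3]` rewrites the contents without the caller ever unpacking the container.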

Unlike in functional languages, in C++ there is no natural reusable expression for the functor pattern, so it’s more of a pattern in the head of the programmer. For instance, because of memory management considerations, `std::transform` operates on iterators rather than containers — the storage for the target container must be either pre-allocated or allocated on demand through iterator adapters. One could try to provide iterator adapters for futures, so they could be operated on by `std::transform`, but ultimately the transformation has to act on the internals of the future (i.e., store the function object in it) so it either has to be a method or a friend of the future.

# The Monad Pattern

The functor pattern is not enough to provide full composability for futures. The likely scenario is that the user creates a library of future-returning functions, each performing a specific task. He or she then needs the means to combine such functions into more complex tasks. This is, for instance, the case when combining asynchronous operations, such as opening a file and then reading from it. Suppose we have the `async_open` function that returns a file handle future:

`future<HANDLE> async_open(const string &);`

and the `async_read` function that takes a file handle and returns a future with the buffer filled with data:

`future<Buffer> async_read(HANDLE fh);`

If you combine the two using `next`, the result will be a future of a future:

`future<future<Buffer>> ffBuf = async_open("foo").next(&async_read);`

In order to continue chaining such calls without blocking — for instance to asynchronously process the buffer — you need a way to collapse the double future to a single future and then call `next` on it.

The collapsing method, `unwrap`, is another part of the extended future proposal. When called on a `future<future<T>>` it returns `future<T>`. It lets you chain asynchronous functions using `next` followed by `unwrap`.

`async_open("foo").next(&async_read).unwrap().next(&async_process);`

In functional programming such a collapsing function is called `join`. The combination `next` followed by `unwrap` (or, in Haskell, `fmap` followed by `join`) is so common that it has its own name, bind (in Haskell it’s the operator `>>=`). It might make sense to make `bind` another method of future (possibly under a different name). [Edit: In fact, the proposal (n3721) is to overload `then` to automatically perform `unwrap` whenever the result is a future of a future. This way `then` would also work as bind.]
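The equivalence mentioned above can be spot-checked directly in Haskell, where for any monad `m >>= k` behaves like `join (fmap k m)` (shown here for lists, whose `join` is `concat`):

```haskell
import Control.Monad (join)

-- A monadic function: for each input, produce a list of results.
duplicate :: Int -> [Int]
duplicate x = [x, x]

viaBind, viaJoin :: [Int] -> [Int]
viaBind xs = xs >>= duplicate          -- bind in one step
viaJoin xs = join (fmap duplicate xs)  -- fmap followed by join
```

Both produce the same flattened result, which is exactly why `then` followed by `unwrap` can be fused into a single overloaded `then`.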

There’s one more important usage pattern: a function that may execute asynchronously, but sometimes returns the result immediately. This often happens in recursive algorithms, when the recursion bottoms out. For instance, a parallel tree traversal function may spawn asynchronous tasks for traversing the children of a node, but when it reaches a leaf, it might want to return the result synchronously. Instead of writing complicated conditional code at each level, it’s easier to provide a “fake” future whose contents are immediately available — whose `get` method never blocks. Such a fake future, and the function that creates it, `make_ready_future`, are also part of the proposal.

Together, the methods `next` (or `then`) and `unwrap`, and the function `make_ready_future` are easily recognizable by a functional programmer as forming the monad pattern (in Haskell, they would be called, respectively, `fmap`, `join`, and `return`). It’s a very general pattern for composing functions that return encapsulated values. Using a monad you may work with such functions directly, rather than unwrapping their results at every step. In the case of futures, this is an important issue, since the “unwrapping” means making a potentially blocking call to `get` and losing precious opportunities for parallelism. You want to set up as much computation as possible up front and let the system schedule the most advantageous execution.

Combining functions using `next`, `unwrap` (or, equivalently, `bind`), and `make_ready_future` is equivalent to specifying data dependencies between computations and letting the runtime explore opportunities for parallelism between independent computations.

# The Applicative Pattern

The combinators `then` and `next` are designed for linear composition: the output of one computation serves as the input for another. A more general pattern requires the combining of multiple asynchronous sources of data. In functional programming the problem would be described as applying a function to multiple arguments, hence the name “applicative” pattern. A functional programmer would take a multi-argument function and “lift” it to accept futures instead of immediate values.

As expected, in imperative programming things are a little messier. You have to create a barrier for all the input futures, retrieve the values, and then pass them to the multi-argument function or algorithm. The proposal contains a function called `when_all` that implements the first part of the process — the barrier. It takes either a pair of iterators to a container of futures or a variable number of futures, and returns a future that fires when all the arguments are ready. Conceptually, it performs a logical AND of all input futures.

The iterator version of `when_all` returns a future of a vector of futures, while the variadic version returns a future of a tuple of futures. It’s up to the client to `get` the resulting vector or tuple and iterate over it. Because of that, it’s not possible to directly chain the results of `when_all` the way `then` or `next` does it.

If you’re wondering how this kind of chaining is done in a functional language, you have to understand what partial application is. A function of many arguments doesn’t have to be applied to all of the arguments at once. You can imagine that applying it to the first argument doesn’t yield a value but rather a function on n-1 arguments. In C++11, this can be accomplished by calling `std::bind`, which takes a multi-parameter function and a value of the first argument, and returns a function object (a closure) that takes the remaining n-1 arguments (actually, you may pass it more than one argument at a time).

In this spirit, you could bind a multi-parameter function to a single future and get a future of a function of n-1 arguments. Then you are left with the problem of applying a future of a function to a future of an argument, and that’s exactly what the applicative pattern is all about. In Haskell, the `Applicative` class defines the operator `<*>` that applies an encapsulated function to an encapsulated value.
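Here is that two-step process in Haskell, using lists rather than futures (the pattern is identical): `fmap` with a partially applied function yields a container of functions, and `<*>` applies it to a container of arguments.

```haskell
-- Lift a two-argument function over two containers of arguments.
-- fmap (+) xs :: [Int -> Int] is a container of partially applied
-- functions; <*> then applies each to each second argument.
liftedAdd :: [Int] -> [Int] -> [Int]
liftedAdd xs ys = fmap (+) xs <*> ys
```

For futures, the analogous lifting would run the two argument computations independently and combine their results without an explicit barrier.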

# The Monoid Pattern

A very common pattern is to start several computations in parallel and pick the one that finishes first. This is the basis of speculative computation, where you pit several algorithms against each other. Or you might be waiting for any of a number of asynchronous events, and attend to them as soon as they happen.

At a minimum you would expect a combinator that acts like a logical OR of two futures. A functional programmer would be immediately on the lookout for the monoid pattern. A monoid is equipped with a binary operation and a unit element. If the binary operation on futures picks the one that finishes first, what should the unit future be? A unit combined with any element must give back that same element. Therefore we need a future that would lose the race with any other future. We could call this special future “never.” Calling `get` on such a future would block forever.

In practice, one could slightly relax the definition of the “never” future. It would never return a result, but it could still throw an exception. A future like this could be used to implement a timeout. Pitting it against another future would either let the other future complete, or result in a timeout exception.
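As a loose Haskell analogy of this “first one wins, plus a never element” monoid (my own illustration, not something from the proposal), `Maybe` under a left-biased merge behaves the same way, with `Nothing` playing the role of the never-completing future:

```haskell
-- Left-biased "race": the first completed (Just) value wins.
race :: Maybe a -> Maybe a -> Maybe a
race (Just a) _ = Just a
race Nothing  b = b

-- The unit of the monoid: a "result" that never arrives,
-- so it loses the race against any completed value.
never :: Maybe a
never = Nothing
```

`race never x` and `race x never` both give back `x`, exactly as a monoid unit must; this is the `First` monoid from `Data.Monoid`.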

This is not the way the future extension proposal went, though. The proposed combinator is called `when_any` and it takes either a pair of iterators to a container of futures or a variable number of futures. It returns a future of either a vector or a tuple of futures. It’s up to the client to iterate over those futures and find the one (or the ones) that fired by calling `is_ready` on each of them.

The advantage of this approach is that the client may still write code to wait for the remaining futures to finish. The disadvantage is that the client is responsible for writing a lot of boilerplate code, which will obscure the program logic.

# Performance and Programming Considerations

An objection to using futures as the main vehicle for asynchronous programming was raised in N3896: Library Foundations for Asynchronous Operations. The point is that it’s possible for an asynchronous API to have a result ready before the client has had the opportunity to provide the continuation by calling `then` (or `next`). This results in unnecessary synchronization, which may negatively impact performance.

The alternative approach is to pass the continuation (the handler) directly to the asynchronous API. This is how a lot of asynchronous APIs are implemented at the lowest level anyway. The two approaches don’t exclude each other, but supporting both at the same time, as proposed in N3896, adds a lot of complexity to the programming model.

From the programmer’s perspective, the continuation passing model of N3896 is probably the hardest to use. The programming model is that of a state machine, with the client responsible for writing handlers for every transition.

Futures provide a useful abstraction by reifying the anticipated values. The programmer can write code as if the values were there. Futures also provide a common language between concurrent, parallel, and asynchronous worlds. It doesn’t matter if a value is to be evaluated by spawning a thread, creating a lightweight execution agent, or by calling an asynchronous API, as long as it’s encapsulated in a future. The compositional and expressional power of futures is well founded in major patterns of functional programming: the functor, the monad, the applicative, and the monoid.

There is another, even more attractive programming model that’s been proposed for C++, Resumable Functions, which makes asynchronous code look more like sequential code. This is based on a trick that’s well known to Haskell programmers in the form of the “do” notation. In C++, a resumable function would be chopped by the compiler into a series of continuations separated by `await` keywords. Instead of creating a future and calling `then` with a lambda function, the programmer would insert `await` and continue writing code as if the value were available synchronously.

# Acknowledgment

I’d like to thank Artur Laksberg for reading the draft of this blog and providing useful feedback.

In my previous post I worked on stretching the intuition of what a container is. I proposed that, in Haskell, any functor may be interpreted as some kind of container, including the hard cases like the state functor or the IO functor. I also talked about natural transformations between functors as “repackaging” schemes for containers, which work without “breaking the eggs” — not looking inside the elements stored in the container. Continuing with this analogy: Algebras are like recipes for making omelets.

The intuition is that an algebra provides a way to combine elements stored inside a container. This cannot be done for arbitrary types because there is no generic notion of “combining.” So an algebra is always defined for a specific type. For instance, you can define an algebra for numbers because you know how to add or multiply them, or for strings because you can concatenate them, and so on. The way elements are combined by an algebra is in general driven by the structure of the container itself.

For example, think of an expression tree as a container.

```data Expr a = Const a
| Add (Expr a) (Expr a)
| Mul (Expr a) (Expr a)```

We could define many algebras for it. An integer algebra would work on an expression tree that stores integers. A complex algebra would work on a tree that stores complex numbers. A Boolean algebra would work on Boolean expressions using, for instance, logical OR to evaluate the `Add` node and logical AND for the `Mul` node. You could even define an algebra of sets with union and intersection for `Add` and `Mul`. In fact, in the absence of any additional requirements, any pair of binary functions acting on a given type will do.
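For instance, here is a sketch of the Boolean algebra just described, written directly against the tree (the function name is mine):

```haskell
data Expr a = Const a
            | Add (Expr a) (Expr a)
            | Mul (Expr a) (Expr a)

-- Evaluate a Boolean expression tree:
-- logical OR for Add nodes, logical AND for Mul nodes.
evalBool :: Expr Bool -> Bool
evalBool (Const b) = b
evalBool (Add l r) = evalBool l || evalBool r
evalBool (Mul l r) = evalBool l && evalBool r
```

Swapping in `(+)` and `(*)` over numbers, or union and intersection over sets, gives the other algebras mentioned above without touching the tree type.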

The definition of an algebra for a given functor `f` consists of a type `t` called the carrier type and a function called the action. Any Haskell algebra is therefore of the type:

`newtype Algebra f t = Algebra (f t -> t)`

More abstractly, in category theory, an algebra (or, more precisely, an F-algebra) for an endofunctor F is a pair (A, alg) of an object A and a morphism alg : F A -> A. As always, the standard translation from category theory to Haskell replaces objects with types and morphisms with functions.

Let’s have a look at a simple example of an algebra. Let’s pick the list functor and define an `Int` algebra for it, for instance:

```sumAlg :: Algebra [] Int
sumAlg = Algebra (foldr (+) 0)```

Despite its simplicity, this example leads to some interesting observations.

First, the use of `foldr` tells us that it’s possible to handle recursion separately from evaluation. The evaluation is really parameterized here by the function `(+)` and the value, zero. The algebra is type-specific. On the other hand, `foldr` is fully polymorphic. It turns out that there is another algebra hidden in this example, and it’s determined just by `(+)` and zero. We’ll see that more clearly when we talk about fixed points of functors.

The second observation is that a list is not only a functor but also a monad. Is there something special about algebras for a monad? We’ll see.

# Algebras and Fixed Points

I wrote a whole blog post about F-algebras with a more categorical slant. Here I’ll elaborate on the Haskell aspects of algebras and develop some more intuitions.

A recursive container is not only a functor but it can also be defined as a fixed point of a functor. So, really, we should start with a double functor, parameterized by two types, `a` and `b`:

```data ExprF a b = Const a
| Add b b
| Mul b b
deriving Functor```

We can then find its fixed point: a type that, when substituted for `b`, will give back itself. Think of a functor as a TV camera (sorry for switching metaphors). When you point it at some type `b`, its image appears in all the little monitors where `b` is on the right hand side of the definition. We all know what happens when you point the camera back at the monitors — you get the ever receding image within image within image… That’s your fixed point.

This “pointing of the camera at the monitors” can be abstracted into a Haskell data structure. It is parameterized by a functor `f`, which provides the camera and the monitors. The fixed point is given by the ever receding:

`newtype Fix f = In (f (Fix f))`

Notice that, on the left hand side, `f` appears without an argument. If `f a` is a container of `a` then `f` by itself is a recipe for creating a container from any type. `Fix` takes such a recipe and applies it to itself — to `(Fix f)`.

Later we’ll also need the deconstructor, `unIn`:

```unIn :: Fix f -> f (Fix f)
unIn (In x) = x```

Going back to our earlier functor, we can apply `Fix` to it and get back the recursive version of `Expr`:

`type Expr a = Fix (ExprF a)`

Here, `(ExprF a)` is a recipe for stuffing any type `b` into a simple (non-recursive) container defined by `ExprF`.

Creating actual expressions using the above definition of `Expr` is a little awkward, but possible. Here’s one:

```testExpr :: Expr Int
testExpr = In $ (In $ (In $ Const 2) `Add` (In $ Const 3))
`Mul` (In $ Const 4)```

Knowing that a recursive data type such as `(Expr a)` is defined in terms of a simpler functor `(ExprF a b)` means that any recursive algebra for it can be defined in terms of a simpler algebra. For instance, we can define a simple algebra for `(ExprF Int)` by picking the carrier type `Double` and the following action:

```alg :: ExprF Int Double -> Double
alg (Const i) = fromIntegral i
alg (Add x y) = x + y
alg (Mul x y) = x * y```

We can extend this algebra to work on arbitrary recursive expressions of type `Expr Int`. We’ll call this new recursive algebra `alg'`. When given an `(Expr Int)` it will do the following:

1. Extract the contents of the outer `Fix` by pattern matching on the constructor `In`. The contents are of type `ExprF` acting on `(Expr Int)`.
2. Apply `alg'` (the recursive one we are just defining) to these contents. Do this using `fmap`. Here we are taking advantage of the fact that `ExprF` is a functor. This application of `alg'` replaces the children of the expression `ExprF` with `Double`s — the results of their evaluation.
3. Apply `alg` to the result of the previous step, which is of the type `(ExprF Int Double)`.

Here’s the code that implements these steps:

```alg' :: Fix (ExprF Int) -> Double
alg' (In expr) = alg (fmap alg' expr)```

Notice that this code does not depend on the details of the functor. In fact it will work for any functor and any algebra:

```cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg = alg . fmap (cata alg) . unIn```

This generic function is called a catamorphism. It lets you apply an algebra to the contents of a recursively defined container.
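Putting the pieces together, here is a self-contained version, assembled from the snippets above, that evaluates the sample expression `(2 + 3) * 4` with the generic catamorphism:

```haskell
{-# LANGUAGE DeriveFunctor #-}

newtype Fix f = In (f (Fix f))

unIn :: Fix f -> f (Fix f)
unIn (In x) = x

-- The catamorphism: recurse into the children with fmap,
-- then evaluate one layer with the algebra.
cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg = alg . fmap (cata alg) . unIn

data ExprF a b = Const a
               | Add b b
               | Mul b b
               deriving Functor

type Expr a = Fix (ExprF a)

alg :: ExprF Int Double -> Double
alg (Const i) = fromIntegral i
alg (Add x y) = x + y
alg (Mul x y) = x * y

-- (2 + 3) * 4
testExpr :: Expr Int
testExpr = In $ (In $ (In $ Const 2) `Add` (In $ Const 3))
                `Mul` (In $ Const 4)
```

Evaluating `cata alg testExpr` walks the whole tree and yields `20.0`.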

My first example of an algebra was acting on a list. A list can also be defined as a fixed point of a functor:

```data ListF a b = Empty | Cons a b
deriving Functor```

If you work out the details, you can convince yourself that the `sumAlg` I defined earlier is nothing else but the catamorphism for the functor `ListF Int` applied to the following simple algebra:

```alg :: ListF Int Int -> Int
alg Empty = 0
alg (Cons a b) = a + b```

Now we understand why any list catamorphism is parameterized by one value and one function of two arguments.
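To check this concretely, we can embed an ordinary list into its fixed-point form and confirm that the catamorphism agrees with `foldr` (the conversion helper `toFix` is my own addition):

```haskell
{-# LANGUAGE DeriveFunctor #-}

newtype Fix f = In (f (Fix f))

unIn :: Fix f -> f (Fix f)
unIn (In x) = x

cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg = alg . fmap (cata alg) . unIn

data ListF a b = Empty | Cons a b
               deriving Functor

-- Embed an ordinary list into the fixed point of ListF.
toFix :: [a] -> Fix (ListF a)
toFix = foldr (\x acc -> In (Cons x acc)) (In Empty)

-- The simple algebra: one value for Empty, one binary function for Cons.
sumAlg :: ListF Int Int -> Int
sumAlg Empty      = 0
sumAlg (Cons a b) = a + b
```

`cata sumAlg (toFix xs)` computes the same sum as `foldr (+) 0 xs`, making the "one value plus one binary function" parameterization explicit.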

# Monads and Algebras

As I said in the beginning, a list is not only a functor but also a monad. A monad adds two special abilities to a functor/container. It lets you create a default container that contains just a given value: The function that does it is called `return`. And it lets you collapse a container of containers into a single container: That function is called `join` (and I explained before how it relates to the more commonly used bind, `>>=`).

When we define an algebra for a functor that happens to be a monad, it would be nice for this algebra to interact sensibly with `return` and `join`. For instance, you can apply `return` to a value of the algebra’s carrier type to obtain a default container of that type. Evaluating such a container should be trivial — it should give you back the same value:

`(1) alg . return == id`

For instance, in the list monad `return` creates a singleton list, so we want the algebra to extract the value from a singleton without modifying it in any way.

```alg [a] =
(alg . return) a =
id a =
a```

Now let’s consider a container of containers of the carrier type. We have two ways of collapsing it: we can `fmap` our algebra over it — in other words, evaluate all the sub-containers — or we can `join` it. Expecting to get the same result in both cases would be asking a lot (but we get something like this in the Kleisli category later). We can demand though that, for an algebra to be compatible with a monad, the two resulting containers at least evaluate to the same thing:

`(2) alg . fmap alg == alg . join`

Let’s see what this condition means for lists, where `join` is concatenation. We start with a list of lists and we apply two evaluation strategies to it: We can evaluate the sub-lists and then evaluate the resulting list of results, or we can concatenate the sub-lists and then evaluate the concatenated list.

Guess what, our condition is equivalent to imposing associativity on the algebra. Think of the action of the algebra on a two-element list as some kind of “multiplication.” Since the concatenation of `[a, [b, c]]` is the same as the concatenation of `[[a, b], c]`, these two must evaluate to the same value. But that’s just associativity of our “multiplication.”

How much can we extend this analogy with multiplication? Can we actually produce a unit element? Of course: The action of the algebra on an empty list:

`e = alg []`

Let’s check it: Apply our compatibility conditions to the list `[[a], []]`. This is the left hand side:

```(alg . fmap alg) [[a], []] =
alg [alg [a], alg []] =
alg [a, e]```

And this is the right hand side:

```(alg . join) [[a], []] =
alg [a] =
a```

So, indeed, `e` is the right unit of our “multiplication.” You can do the same calculation for `[[], [a]]` to show that it’s also the left unit.

We have an associative operation equipped with a unit — that’s called a monoid. So any list algebra compatible with the list’s monadic structure defines a monoid.
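Conditions (1) and (2) can be spot-checked for the sum algebra on sample data (a sanity check, not a proof):

```haskell
import Control.Monad (join)

alg :: [Int] -> Int
alg = foldr (+) 0

-- Condition (1): evaluating a singleton gives back the value.
unitLaw :: Int -> Bool
unitLaw a = alg (return a) == a

-- Condition (2): evaluate the sub-lists and then the list of results,
-- or concatenate first and evaluate once.
joinLaw :: [[Int]] -> Bool
joinLaw xss = alg (fmap alg xss) == alg (join xss)
```

Both laws hold here because addition is associative and zero is its unit, which is precisely the monoid just identified.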

## T-Algebras

An F-algebra that’s compatible with a monad (conditions (1) and (2) above), both built on the same functor, is called a T-algebra. I guess that’s because mathematicians replace F with T when they talk about monads. There may be many T-algebras for a given monad and in fact they form a category of their own.

This is not saying much, because requirements for a category are pretty minimal. You have to define arrows: here it would be homomorphisms of T-algebras. A homomorphism of algebras maps one carrier into another in such a way as to preserve the action.

In Haskell, a homomorphism of algebras would just be a function `h` from one carrier type to another such that:

```h    :: A -> B
alg  :: F A -> A
alg' :: F B -> B

h . alg == alg' . fmap h```

Here, `alg` and `alg'` are the two actions with carrier types `A` and `B`, respectively, and `F` is the functor. What this means is that, if you have a container of As you can evaluate it using `alg` and then apply `h` to it and get a B, or you can apply `h` to the contents of the container using `fmap` and then evaluate the resulting container of Bs using `alg'`. The result should be the same in both cases.
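For a concrete example (the choice of carriers is mine), `fromIntegral` is a homomorphism between the integer sum algebra and the floating-point sum algebra on lists:

```haskell
algInt :: [Int] -> Int
algInt = sum

algDouble :: [Double] -> Double
algDouble = sum

h :: Int -> Double
h = fromIntegral

-- The homomorphism condition: evaluate then map,
-- or map the contents then evaluate.
homomorphic :: [Int] -> Bool
homomorphic xs = (h . algInt) xs == (algDouble . fmap h) xs
```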

This is a pretty standard way of defining homomorphisms for any structure, not just an algebra. Homomorphisms behave like functions: they are composable and there always is an identity homomorphism for every algebra, so they indeed turn T-algebras into a category — the so-called Eilenberg-Moore category.

Remember what I said about the compatibility between `join` and `alg`? They both take down one layer of containment. Other than that, they are very different: `join` is a polymorphic natural transformation — it operates on the structure of the container, not its contents. An F-algebra operates on the contents and is defined only for a specific type.

And yet we can use `join` to define a T-algebra. Just consider using a container as a carrier type. A container is an image of some type `a` under a functor `m` which, for our purposes also happens to be a monad. Apply `m` to it one more time and you get a container of containers. You can “evaluate” this container of containers down to a single container using `join`.

You have just defined an algebra for the functor `m` whose carrier type is `(m a)` and the action is `join`. In fact, you have defined a whole family of algebras parameterized by the type `a`. Keep in mind that `a` is not the carrier type of this algebra, `(m a)` is. These algebras are called free algebras for the monad `m`. Guess what, they also form a category — the so-called Kleisli category — which is a subcategory of the Eilenberg-Moore category.

Why are these two categories important? Well, it’s a topic for another blog post, but here’s the idea: Suppose you have two functors, F and G, one going from category C to D and the other going back. If G were the inverse of F, we would say that C and D are isomorphic. But what if they were “almost” inverse? For instance, what if their composition, instead of being the identity, were somehow mappable to the identity? This kind of relationship between functors can be formalized into an adjunction. It so happens that the composition of two adjoint functors forms a monad (or a comonad, if you compose them the other way around). Not only that — any monad may be decomposed into a pair of adjoint functors. There are many ways to perform this decomposition and there are many choices for the intermediate category — the target of F and the source of G. The Kleisli category is the smallest such category and the Eilenberg-Moore category is the largest one.

Anybody who tries to popularize Haskell has to deal with the monad problem. You can’t write the simplest Haskell program without a monad (the IO monad in particular) and yet there is no easy way to explain what a monad is. Mind you, when teaching object oriented programming, nobody starts with a definition of “object.” There’s no need for that because everybody has some intuition about objects — we are surrounded by objects in real life. Which is very convenient because if you tried to define what an object is in C++ or Java, you’d have to use quite a bit of hard-core PL theory; plus you’d have to wave your hands a lot. This trick doesn’t work with monads because, except for category theorists, nobody comes to Haskell with any intuitions about monads. The name monad is scary in itself (this is probably why monads in F# are innocuously called “computation expressions”).

In my n’th attempt to tackle monads I tried a different, mystical, approach. In mysticism you often talk about things that you cannot define. Taoism takes this approach to the extreme — any attempt to define Tao must fail: “Tao that can be described is not the eternal Tao.” Hence the slightly tongue-in-cheek approach I took in this tutorial, Pure Functions, Laziness, I/O, and Monads. If you’re new to Haskell, it might help if you read the two previous tutorials in my Basics of Haskell. Notice that these tutorials are in the new format introduced by the School of Haskell, where you can run and edit program snippets directly in your browser.

I must confess that this tutorial got a slightly cold welcome in the Haskell community. That’s why I’m curious what non-Haskellers think of it. Does my approach make monads less or more scary? Take into account that I decided to tackle monads right after explaining the very basics of Haskell syntax — in most courses or books monads are treated as an advanced topic.

The videos and the slides of my Boostcon talks, “Haskell — The Pseudocode Language for C++ Metaprogramming,” are now available online, courtesy of my employer, Corensic. They cover some of the material from the blog series on monads.

“You must be kidding!” would be the expected reaction to “Monads in C++.” Hence my surprise when I was invited to Boostcon 11 to give a three-hour presentation on said topic, a presentation which was met with a totally unexpected level of interest. I imagine there was a small uptick in the sales of Haskell books afterwards.

Before I answer the question: “Why are monads relevant in C++?”, let me first answer the question: “Why is Haskell relevant in C++?” It turns out that Haskell makes C++ template metaprogramming if not easy then at least approachable (see my blog, What Does Haskell Have to Do with C++?). You have to understand that compile-time C++ is a strict functional language operating (mostly) on types. If that is not immediately obvious, it’s because the C++ template syntax is so appalling.

Now, not everybody falls in love with the Haskell syntax (at least not at first sight) but it fits the functional paradigm much better than anything else. What’s more, the correspondence between Haskell code and C++ template code is so direct that it could almost be translated mechanically. It really pays to design and test your template metaprograms in Haskell first, before translating them to C++. Conversely, translating complex template code to Haskell may help you understand it better and enable you to maintain and extend it. Those were my reasons to advocate the use of Haskell as a pseudo-language for C++ template metaprogramming.

Armed with Haskell I could study and analyze some of the most sophisticated C++ metacode. But when I looked at Eric Niebler’s Boost Proto library and couldn’t make heads or tails of it, even after discussing it with him over email and over beers, I was stumped.

# Monads and EDSLs

Boost Proto is a library for implementing domain-specific languages (DSLs). It’s an impressive piece of C++ metaprogramming but it’s hard to comprehend and it doesn’t yield to straightforward translation to Haskell. The problem is that it combines compile-time constructs with run time code in a non-trivial way. I struggled long with Proto until one day I had an epiphany: I was looking at a monad. But first:

## What Is an EDSL?

Great things come out of abusing C++. One example is the abuse of templates (which were originally designed to support parametrized data types) to express compile-time computations. The result is Template Metaprogramming (TMP). Another example is the abuse of operator overloading, which created a fertile ground for Embedded Domain-Specific Languages (EDSLs). Eric’s Proto combined the two abominations into a library for constructing EDSLs.

In this post I will construct a simple EDSL in C++. This is how it works:

1. You overload the heck out of operators.
2. The overloaded operators build trees from expressions instead of eagerly computing results.
3. The type of the tree encodes the structure of the expression that created it. Since it’s a type, the structure of the expression is available at compile time.
4. You construct an object of that type.
5. You execute that object (because it is a function object) with appropriate arguments, and get the results you wanted.

A more general EDSL will also create a runtime data structure that stores the values of any variables and literals in the expression. Algorithms can walk the tree at compile time, runtime, or both to compute values and perform actions. You can even write a compile time algorithm to compute a runtime algorithm that munges trees in interesting ways.

My moment of Zen was when I realized that an EDSL corresponds to a Haskell reader monad. (If you’re not familiar with Haskell or monads, read my series, Monads for the Curious Programmer.) To my own amazement this analogy worked and led to executable code.

| Haskell Monad | C++ “Monad” |
| --- | --- |
| Composition of monadic functions | Compile-time parsing of expressions |
| Execution of compound action on a state | Application of function object to runtime values |

To prove the concept, I picked a simple EDSL based on one of Eric’s examples. It’s a two-argument lambda EDSL. It lets you write an expression with two placeholders, `arg1` and `arg2`. Through the magic of templates and operator overloading, this expression is interpreted as an anonymous function of two arguments. Here’s an example which evaluates to 25:

`int x = (arg1 * arg2 + 13)(3, 4);`

It turns out that the most important step in this process is to be able to convert an expression tree into a function object. Let me do this in Haskell first, and then translate it into C++.

# The Expression Monad in Haskell

Let’s start with a short:

## Reader Monad Refresher

A reader monad (a state monad with immutable state) provides a way to express stateful computations in a pure functional language. A stateful computation takes some arguments and returns a value, but unlike a function, it makes use of some external state. This computation can be turned into a function that takes the same arguments but, instead of returning a value, returns another function that takes a state as an argument and calculates the value. (The distinction between a computation and a function is essential here.)

Stateful computation represented as a monadic function returning an action

## State

The state, in the case of our expression monad, is a collection of two values — the arguments to the lambda. They will be used to replace the two placeholders in the expression.

Here’s the definition of our state — a list of integers (we really need only two, but I’m being sloppy… I mean general):

`type Args = [Integer]`

## Actions

An action is a function that takes the state and produces a value of arbitrary type `t`:

`Args -> t`

You might remember from my previous post that one element of the monad is a type constructor: a mapping of an arbitrary type into an enriched type. In this case the enriched type is an action type. Because of technicalities related to Haskell’s type inference, I’ll have to thinly encapsulate an action inside a new type, which I will call `Prog`:

`newtype Prog t = PR (Args -> t)`

The idea is that the composition of monadic actions is similar to the compilation of source code: it produces a “program,” which is then “run.” In fact, I’ll define an auxiliary function that does just that, it runs a “program”:

```run :: Prog t -> Args -> t
run (PR act) args = act args```

This is a function that takes a `Prog t` as its first argument and pattern matches it to the constructor `(PR act)`. The second argument is the state, `args` of type `Args`. The result of running the program is a value of type `t`.

## Monadic Functions

The next step is to define some primitive monadic functions. These are functions that produce actions (or “programs”). Here’s a very useful function that produces an action that extracts the n’th value from the list of arguments (the state):

```getArg :: Int -> Prog Integer
getArg n = PR (λ args -> args !! n)```

`getArg` takes an `Int` (in our case, zero or one) and returns a lambda (encapsulated in a `Prog` using the constructor `PR`). This lambda takes a list of `Integer`s, `args`, and extracts its n’th element (`args !! n` means: take the n’th element of the list, `args`).

You’ve seen examples of monadic functions in my previous blog post, but it’s worth repeating the idea: `getArg` is a function that returns an action that is like a promise to extract the n’th argument when the arguments become available.

Just for fun, here’s another monadic function that takes `n` and promises to return twice `n` when the arguments are provided. It doesn’t matter that the arguments are going to be ignored.

`doubleIt n = PR (λ args -> 2 * n)`

## Monadic Bind

I need two monadic functions, `getArg` and `doubleIt`, in order to demonstrate their composition using “bind.” I want to create an action that will get the first argument using `getArg 0` and double it using `doubleIt v`. If you remember how “bind” works, it takes an action, in this case the result of `getArg 0`, and a continuation, which represents “the rest of the computation.”

Combining two monadic functions using bind

In our case, the continuation is a lambda that takes an argument `v` (that’s the future result of the first action) and performs `doubleIt v`.

`λ v -> doubleIt v`

This lambda returns an action because `doubleIt v` returns an action.

The signature of `bind` is:

`bind :: (Prog a) -> (a -> (Prog b)) -> (Prog b)`

You might have noticed that I use the words “action” and “program” interchangeably, although, strictly speaking, an action is the contents of a program. However, this distinction is an artifact of a Haskell quirk — a monad can’t be defined using a type alias, so we need the `Prog` type to encapsulate the action. Curiously, we won’t have this problem in C++.

The purpose of bind is to glue together an action with a continuation and return a new action. That means that `bind` has to return a lambda of appropriate type:

```bind (PR act) cont =
PR (λ args -> ... produce value of type b ...)```

This lambda will be executed when arguments are available. At that point, with the arguments handy, we’ll be able to first execute the action, `act`, and pass its result to the continuation. The continuation will return another action, which we can promptly execute too:

```bind (PR act) cont =
PR (λ args ->
let v = act args
(PR act') = cont v
in
act' args)```

(In Haskell you can use primed variables, like `act'`. I like that notation.)

This is very much like the state monad bind, except that we don’t have to worry about chaining the state, which is immutable. In fact the above is the definition of the reader monad’s `bind`.

Let’s test our new bind by composing `getArg` with `doubleIt`:

```test0 :: Prog Integer
test0 =
bind (getArg 0) (λ v -> doubleIt v)```

Composing the monadic function, `getArg`, with a continuation that calls another monadic function, `doubleIt`

We can run the program produced by `test0` to see that it actually works:

```> let prog = test0
> run prog [3,4]
6
> run prog [11,0]
22```

For completeness, here’s the full definition of the `Prog` reader monad:

```instance Monad Prog where
return v = PR (λ args -> v)
(PR act) >>= cont = PR (λ arg -> let v = act arg
(PR act') = cont v
in act' arg)```

With that definition, we can rewrite our little example in the `do` notation:

```test1 = do
v <- getArg 0
doubleIt v```

## Expression Tree

We have the definition of a program — a thinly encapsulated action. To create an actual program we need some kind of source code and a compiler. Our source code takes the form of an expression tree. A tree in Haskell is defined as a tagged union:

```data Exp = Const Integer
| Plus Exp Exp
| Times Exp Exp
| Arg1
| Arg2```

The first constructor, `Const` takes an `Integer` and creates a leaf node corresponding to this constant value. The next two constructors are recursive, they each take two expressions and produce, correspondingly, a `Plus` and a `Times` node. Finally, there are two placeholders for the two arguments that will be provided at “runtime.”

## Compilation

Compilation is turning source code into a program. This process can be represented as a function with the following signature:

`compile :: Exp -> Prog Int`

It takes an expression tree (the source) and returns a program that evaluates to an integer. Looking from another perspective, `compile` is just a monadic function in our `Prog` monad.

We will define `compile` in little steps driven by the various patterns in the definition of `Exp`. If the expression matches the `Const` node, we define `compile` as:

`compile (Const c) = return c`

Remember, `return` is a function that takes a value and returns an action (program) that will produce this value when executed.

Another easy case is `Arg1` (and `Arg2`): we already have a monadic function `getArg` that we can call:

`compile Arg1 = getArg 0`

The interesting case is the `Plus` (or `Times`) node. Here we have to recursively call `compile` for both children and then compose the results. Well, that’s what monadic function composition is for. Here’s the code using the `do` notation, hiding the calls to `bind`:

```compile (Plus e1 e2) =
do
v1 <- compile e1
v2 <- compile e2
return (v1 + v2)```

That’s it! We can now compile a little test expression `(Arg1 * Arg2 + 13)`:

```testExp =
let exp = (Plus (Times Arg1 Arg2)(Const 13))
in compile exp```

and we can run the resulting program with an argument list:

```> let args = [3, 4]
> let prog = testExp
> run prog args
25```

# The Expression Monad in C++

The translation of the Haskell expression monad to C++ is almost embarrassingly easy. I’ll be showing Haskell code side by side with its C++ equivalents.

## Expression Tree

```data Exp = Const Integer
| Plus Exp Exp
| Times Exp Exp
| Arg1
| Arg2```

In C++, the expression tree is a compile-time construct that is expressed as a type (I’ll show you later where this type originates from). We have separate types for all the leaves, and the non-leaf nodes are parametrized by the types of their children.

```template<int n> struct Const {};

template<class E1, class E2>
struct Plus {};

template<class E1, class E2>
struct Times {};

struct Arg1 {};
struct Arg2 {};```

For instance, here’s the type that corresponds to the expression `arg1*arg2+13`:

`Plus<Times<Arg1, Arg2>, Const<13>>`

## State

Haskell:

`type Args = [Integer]`

C++: State is a runtime object. I implemented it using an array of two integers and, for good measure, I added a constructor and an accessor.

```struct Args
{
Args(int i, int j) {
_a[0] = i;
_a[1] = j;
}
int operator[](int n) { return _a[n]; }
int _a[2];
};```

## Action

Here’s the tricky part: How to represent an action? Remember, an action takes state, which is now represented by `Args`, and returns a value of some type. Because `Args` are only available at runtime, an action must be a runtime function or, even better, a function object.

How does that fit in our compile-time/runtime picture? We want our C++ monadic functions to be “executed” at compile time, but they should produce actions that are executed at runtime. All we can do at compile time is to operate on types, and this is exactly what we’ll do. We will create a new type that is a function object. A function object is a `struct` that implements an overloading of a function-call operator.

There’s another way of looking at it by extending the notion of a metafunction. In one of my previous posts I described metafunctions that “return” values, types, or even other metafunctions. Here we have a metafunction that returns a (runtime) function. This view fits better the Haskell monadic picture where a monadic function returns an action.

## Type Constructor

Unfortunately, not everything can be expressed as neatly in C++ as it is in Haskell. In particular our type constructor:

`newtype Prog t = PR (Args -> t)`

doesn’t have a direct C++ counterpart. It should be a template that, for any type `T`, defines an action returning that type. In this construction, an action is represented by a C++ function object, so we would like something like this:

```template<class T> struct PR {
T operator()(Args args);
};```

which, for many reasons, is useless. What we really need is for `PR` to be a “concept” that specifies a type with an associated method, `operator()`, of a particular signature. Since concepts are not part of C++11, we’ll have to rely on programming discipline and hope that, if we make a mistake, the compiler error messages will not be totally horrible.

## Monadic Metafunction

Let’s start by translating the Haskell monadic function `getArg`:

```getArg :: Int -> Prog Integer
getArg n = PR (λ args -> args !! n)```

Here it is in C++:

```template<int n>
struct GetArg { // instance of the concept PR
int operator()(Args args) {
return args[n];
}
};```

It is a metafunction that takes a compile-time argument `n` and “returns” an action. Translation: it defines a struct with an overloaded `operator()` that takes `Args` and returns an `int`. Again, ideally this metafunction should be a struct that is constrained by the concept `PR`.

## Bind

Let’s look again at the Haskell’s implementation of monadic `bind`:

```bind (PR prog) cont =
PR (λ args ->
let v = prog args
(PR prog') = cont v
in
prog' args)```

It takes two arguments: an action and a continuation. We know what an action is in our C++ construction: it’s a function object with a particular signature. We’ll parameterize our C++ `Bind` with the type, `P1`, of that object. The continuation is supposed to take whatever that action returns and return a new action. The type of that new action, `P2`, will be the second template parameter of `Bind`.

We’ll encode the type of the continuation as the standard function object taking an `int` and returning `P2`:

`std::function<P2(int)>`

Now, the C++ `Bind` is a metafunction of `P1` and `P2` that “returns” an action. The act of “returning” an action translates into defining a `struct` with the appropriate overload of `operator()`. Here’s the skeleton of this `struct`:

```template<class P1, class P2> // compile-time type parameters
struct Bind {
Bind(P1 prog, std::function<P2(int)> cont)
: _prog(prog), _cont(cont)
{}
...
P1 _prog;
std::function<P2(int)> _cont;
};```

Notice that at runtime we will want to construct an object of this type, `Bind`, and pass it the runtime arguments: the action and the continuation. The role of the constructor requires some explanation. Haskell function `bind` is a monadic function of two arguments. Its C++ counterpart is a metafunction that takes four arguments: `P1` and `P2` at compile time, and `prog` and `cont` at runtime. This is a general pattern: When constructing a monadic metafunction in C++ we try to push as much as possible into compile time, but some of the arguments might not be available until runtime. In that case we shove them into the constructor.

The interesting part is, of course, the function-call operator, which really looks like a one-to-one translation of the Haskell implementation:

```template<class P1, class P2>
struct Bind {
...
int operator()(Args args) {
int v = _prog(args);
P2 prog2 = _cont(v);
return prog2(args);
}
...
};```

Things to observe: A smart compiler should be able to inline all these calls because it knows the types `P1` and `P2`, so it can look up the implementations of their function-call operators. What’s left to the runtime is just the operations on the actual runtime values, like the ones done inside `GetArg::operator()`. However, I have been informed by Eric Niebler that many of these optimizations are thwarted by the choice of `std::function` for the representation of the continuation. This problem can be overcome, but at some loss of clarity of code, so I’ll stick to my original implementation (also, see Appendix 2).

## Return

All that is left to complete the monad is the `return` function. Here it is in Haskell:

```return :: a -> Prog a
return v = PR (λ args -> v)```

And here it is, as a function object, in C++:

```struct Return
{
Return(int v) : _v(v) {}
int operator()(Args args)
{
return _v;
}
int _v;
};```

Of course, in full generality, `Return` should be parameterized by the return type of its `operator()`, but for this example a simple `int` will do. The argument `v` is only available at runtime, so it is passed to the constructor of `Return`.

# The “Compile” Metafunction in C++

Now that we have our monad implemented, we can use it to build complex monadic functions from simple ones (such as `GetArg`). In Haskell we have implemented a monadic function `compile` with the following signature:

`compile :: Exp -> Prog Int`

It takes an expression tree and returns an action that evaluates this tree.

The C++ equivalent is a metafunction that takes a (compile-time) expression tree and defines a struct with the appropriate overload of a function-call operator. Lacking concept support, the latter requirement can’t be directly expressed in C++. We can, and in fact have to, provide a forward declaration of `Compile`:

```template<class Exp>
struct Compile;```

In Haskell, we defined `compile` using pattern matching. We can do the same in C++. Just like we split the Haskell definition of `compile` into multiple sub-definitions corresponding to different argument patterns, we’ll split the C++ definition into multiple specializations of the general template, `Compile`.

Here’s the first specialization in Haskell:

`compile (Const c) = return c`

and in C++:

```template<int c>
struct Compile<Const<c>> : Return
{
Compile() : Return(c) {}
};```

I could have defined a separate overload of `operator()` for this case, but it’s simpler to reuse the one in `Return`.

Here’s another trivial case:

`compile Arg1 = getArg 0`

translates into:

```template<>
struct Compile<Arg1> : GetArg<0> {};```

The real fun begins with the `Plus` node, because it involves composition of monadic functions. Here’s the de-sugared Haskell version:

```compile (Plus exL exR) =
bind (compile exL)
(λ left ->
bind (compile exR)
(λ right ->
return (left + right)))```

The logic is simple: First we compile the left child and bind the result with the continuation (the lambda) that does the rest. Inside this continuation, we compile the right child and bind the result with the continuation that does the rest. Inside that continuation we add the two results (they are the arguments to the continuation) and encapsulate the sum in an action using `return`.

In C++, the binding is done by creating the appropriate `Bind` object and passing it (in the constructor) a function object and a continuation. The function object is the result of the compilation of the left child (we construct a temporary object of this type on the fly):

`Compile<L>()`

Just like in Haskell, the continuation is a lambda, only now it’s the C++11 lambda. Here’s the code, still with some holes to be filled later:

```template<class L, class R>
struct Compile<Plus<L, R>> {
int operator()(Args args)
{
return Bind<...> (
Compile<L>(),
[](int left) {
return Bind<...>(
Compile<R>(),
[left](int right) {
return Return(left + right);
}
);
}
)(args);
}
};```

Notice that the second lambda must explicitly capture the local variable, `left`. In Haskell, this capture was implicit.

The types for the instantiations of the two `Bind` templates can be easily derived bottom up. Ideally, we would like the compiler to infer them, just like in Haskell, but the C++ compiler is not powerful enough (although, at the cost of muddying the code some more, one can define template functions that return `Bind` objects of appropriate types — in C++ type inference works for template functions).

Here’s the final implementation with the types filled in:

```template<class L, class R>
struct Compile<Plus<L, R>> {
int operator()(Args args)
{
return Bind<Compile<L>, Bind<Compile<R>, Return>> (
Compile<L>(),
[](int left) -> Bind<Compile<R>, Return> {
return Bind<Compile<R>, Return>(
Compile<R>(),
[left](int right) -> Return {
return Return(left + right);
}
);
}
)(args);
}
};```

It’s quite a handful and, frankly speaking, I would have never been able to understand it, much less write it, if it weren’t for Haskell.

# The Test

You can imagine the emotional moment when I finally ran the test and it produced the correct result. I evaluated the simple expression, `Arg1*Arg2+13`, with arguments 3 and 4 and got back 25. The monad worked!

```int main() {
Args args(3, 4);
Compile<Plus<Times<Arg1, Arg2>, Const<13>>> act;
int v = act(args);
std::cout << v << std::endl;
}```

# The EDSL

Now that we have the monad and a monadic function `Compile`, we can finally build a simple embedded domain-specific language. Our minimalistic goal is to be able to evaluate the following expression:

`int x = (arg1 + arg2 * arg2)(3, 4);`

The trick is to convince the compiler to construct a very special type that represents the particular expression tree. In our case it should be something like this:

`Plus< Arg1, Times<Arg2, Arg2> >`

With this type, call it `E`, the compiler should call a special metafunction we’ll call `Lambda`, which returns a function object of two integral arguments:

```template<class E>
struct Lambda {
int operator()(int x, int y) {
Args args(x, y);
Compile<E> prog;
return prog(args);
}
};```

How does one do it? This is really the bread and butter of EDSL writers — pretty elementary stuff — but I’ll explain it anyway. We start by declaring two objects, `arg1` and `arg2`:

```const Lambda<Arg1> arg1;
const Lambda<Arg2> arg2;```

These little objects can infect any expression with their `Lambda`-ness, and spread the infection with the help of appropriately overloaded arithmetic operators.

For instance, when the compiler sees `arg1+arg2`, it will look for the overload of `operator+` that takes two `Lambda`s. And we’ll provide this one:

```template<class E1, class E2>
Lambda<Plus<E1, E2>> operator+ (Lambda<E1> e1, Lambda<E2> e2)
{
return Lambda<Plus<E1, E2>>();
}```

Notice that this operator returns another `Lambda`, albeit of a more complex type, thus spreading the `Lambda`-ness even further, and so on. (In this very simplified example I’m ignoring the arguments `e1` and `e2`. In general they would be used to create a runtime version of the expression tree.)

Let’s see how it works in our example:

`(arg1 + arg2 * arg2)(3, 4)`

The seed types are, respectively,

`Lambda<Arg1>,  Lambda<Arg2>, Lambda<Arg2> `

The application of the inner `operator*` (its definition is left as an exercise to the reader), produces the following type:

`Lambda<Times<Arg2, Arg2>>`

This type is passed, along with `Lambda<Arg1>` to `operator+`, which returns:

`Lambda<Plus<Arg1, Times<Arg2, Arg2>>>`

All this is just type analysis done by the compiler to find out what type of object is returned by the outermost call to `operator+`.

The next question the compiler asks itself is whether this object can be called like a function, with the two integer arguments. Why, yes! A `Lambda` object (see above) defines an overload of the function-call operator. The instantiation of this particular `Lambda` defines the following overload:

```int Lambda<Plus<Arg1, Times<Arg2, Arg2>>>::operator()(int x, int y) {
Args args(x, y);
Compile<Plus<Arg1, Times<Arg2, Arg2>>> prog;
return prog(args);
}```

This function will be called at runtime with the arguments 3 and 4 to produce the expected result, 19.

The code that has to be actually executed at runtime is the call to `prog(args)`, which is mostly a series of `Bind`s, an addition, and a multiplication. Since the implementation of a `Bind`’s function-call operator contains no control-flow statements (`if` statements or loops), it can all be inlined by an optimizing compiler (modulo the glitch with `std::function` I mentioned earlier). So all that’s left is the addition and multiplication. Eric tells me that this is how Proto expressions work, and there is no reason why the monadic version wouldn’t lead to the same kind of performance.

# Conclusions and Future Work

I came up with this C++ monadic construction to help me understand the kind of heavy-duty template metaprogramming that goes on in Proto. Whether I have succeeded is a matter of opinion. I exchanged some of the complexity of template manipulations for a different complexity of Haskell and monads. I would argue that any C++ template metaprogrammer should know at least one functional language, so learning Haskell is a good investment. Monads are just part of the same Haskell package.

Assuming you understand monads, is the monadic construction of an EDSL simpler than the original using Proto? I will argue that it is. The implementation of Proto is pretty much monolithic, the concepts it’s using have very little application or meaning outside of Proto. The monadic approach, on the other hand, decomposes into several layers of abstraction. At the bottom you have the construction of the state-, or reader-, monad. The monad exposes just three primitives: the type constructor, bind, and return. With these primitives you can create and compose monadic (meta-) functions — in my example it was the `compile` metafunction. Finally, you use these metafunctions to build an EDSL.

With this three-layer structure comes a well defined set of customization points. Users may plug in their own type constructors and implement their own `Bind` and `Return`. Or, they might use the default monad and just create their own monadic functions. It’s important that the procedure for building and composing monadic functions is well defined and that it uses straight C++ for implementing the logic. Inside a monadic function you can use regular C++ statements and standard control flow devices.

It’s not yet totally clear how general this approach is — after all what I described is a toy example. But there is a lot of interest in re-thinking or maybe even re-implementing Boost Proto and Phoenix in terms of monads. At Boostcon I started working with Joel Falcou and Hartmut Kaiser on this project, and later Eric Niebler and Thomas Heller joined our group. We believe that having solid theoretical foundations might encourage wider acceptance of some of the more complex template libraries. Especially if strict one-to-one correspondence with Haskell code could be established.

# Acknowledgments

I’ve been given so much feedback on my Boostcon presentation and the drafts of this blog post that I should really list the following people as co-authors: Eric Niebler, Thomas Heller, Joel Falcou, and Hartmut Kaiser. Thank you guys!

This page has been translated into Spanish by Maria Ramos.

# Appendix 1

The picture wouldn’t be complete if I didn’t provide the translation of the EDSL construction back to Haskell. It’s pretty straightforward, except for the last part where an expression becomes a callable function object. Maybe there is a way to bend Haskell’s syntax to do it directly but, since this is just a proof of concept, I took a shortcut and defined a function `toFun` which turns a `Lambda` expression into a function.

```newtype Lambda = L Exp

toFun (L ex) =
λ x y ->
run (compile ex) [x, y]

instance Num Lambda where
(L e1) + (L e2) = L (Plus e1 e2)
(L e1) * (L e2) = L (Times e1 e2)
fromInteger n = L (Const n)

test =
let arg1 = L Arg1
arg2 = L Arg2
in
(toFun (arg1 + 2 * arg2 * arg2)) 2 3```

The overloading of arithmetic operators is done by making `Lambda` an instance of the type class `Num`.

# Appendix 2

When writing code at this level of abstraction it’s easy to bump into compiler bugs. For instance, Visual Studio 2010 won’t instantiate this simpler version of `Bind` that doesn’t use `std::function`:

```template<class P1, class Cont>
struct Bind
{
Bind(P1 prog, Cont f)
: _prog(prog), _f(f)
{}
int operator()(Args args) {
int v = _prog(args);
auto p2 = _f(v);
return p2(args);
}
P1 _prog;
Cont _f; // store a lambda continuation
};```

# Bibliography

1. Bartosz Milewski, Monads for the Curious Programmer
2. Bartosz Milewski, What Does Haskell Have to Do with C++?
3. Eric Niebler, Expressive C++. A series of blog posts that started all this
4. Brian McNamara and Yannis Smaragdakis, Functional Programming in C++. A Haskell-in-C++ library
5. Brian McNamara and Yannis Smaragdakis, Syntax sugar for C++: lambda, infix, monads, and more. More traditional approach to monads in C++ as a runtime device
6. Zoltán Porkoláb, Ábel Sinkovics, Domain-specific Language Integration with Compile-time Parser Generator Library. A C++ implementation of a compile-time parser that uses monadic approach (although the word “monad” is never mentioned in the paper)

I only went to one talk, not because the rest was not interesting, quite the contrary, but because I worked with Joel and Hartmut on rewriting Proto. I think we essentially got it. We have the expression monad implemented, my “compile” function turned out to be the equivalent of Proto transform, but with much more flexibility, expression extension produced a little Lambda EDSL with placeholders for arguments and even const terminals. It works like a charm. If you don’t know what I’m talking about, I promise to finish my blog series on monads in C++ real soon now.

The talk I went to was Chris Kohlhoff talking more about Asio, the asynchronous IO library. He was showing how the new features of C++11 make his code simpler, safer, and more flexible without too much effort. In particular he found move semantics extremely helpful in reducing (or, in some cases, eliminating) the need for memory allocation in steady state: an important property when running in an embedded system, for instance. But what I liked most was his approach to solving the inversion of control problem by implementing his own coroutines. Sure, he had to abuse C++ macros, but the code was actually much more readable and reflected the way we think about asynchronous calls.

The idea is that, with coroutines, you write your code in a linear way. You say “read socket asynchronously” and then yield. The flow of control exits the coroutine in the middle, and continues with other tasks. The trick is that the rest of the coroutine becomes your completion handler. When the async call completes, the coroutine is called again, but it starts executing right after the last yield. Your flow of control moves back and forth, but your code looks linear, instead of being fragmented into a bunch of handlers. It makes you wish coroutines were part of the language, as they are, for instance, in C#.

By the way, I caught Hans Boehm while he was waiting for the airport shuttle and asked questions about memory_order_relaxed. You know, the problem is, Can a relaxed load fetch an “out of thin air” value: a value that has never been written by anybody else? What I’m getting now is that in practice this will never happen, but it’s very hard to describe this requirement formally. In other words, Can a malicious chip manufacturer in cahoots with a compiler writer come up with a system that fulfills the letter of the C++ memory model and lets you fetch an out-of-thin-air value? I think the answer is yes, because the language of the Standard is purposely vague on this topic:

(29.3.11) [ Note: The requirements do allow r1 == r2 == 42 in the following example, with x and y initially zero:

```// Thread 1:
r1 = x.load(memory_order_relaxed);
if (r1 == 42) y.store(r1, memory_order_relaxed);
// Thread 2:
r2 = y.load(memory_order_relaxed);
if (r2 == 42) x.store(42, memory_order_relaxed);```

However, implementations should not allow such behavior.—end note ]

I’m totally exhausted, so I won’t write much. I spoke to the audience practically non-stop for three hours, and then I was rewriting Proto with Joel Falcou and Hartmut Kaiser over beers till late at night. My talk about the connection between Haskell monads and Boost Proto was very well received, beyond my expectations. It looks like there is some real need to put theoretical foundations under the more advanced areas of template metaprogramming; and monads just fit the bill. There was a monad lurking in Proto all this time: the only thing missing was “bind”.

I’m looking forward to tomorrow’s keynote by Hans Boehm, and the continued hacking of Proto with Joel and Hartmut.

In my previous post I talked about categories, functors, endofunctors, and a little about monads. Why didn’t I just start with a Haskell definition of a monad and skip the math? Because I believe that monads are bigger than that. I don’t want to pigeonhole monads based on some specific language or application. If you know what to look for you’ll find monads everywhere. Some languages support them better, as a single general abstraction, others require you to re-implement them from scratch for every use case. I want you to be able to look at a piece of C++ code and say, “Hey, that looks like a monad.” That’s why I started from category theory, which sits above all programming languages and provides a natural platform to talk about monads.

In this installment I will talk about some specific programming challenges that, on the surface, seem to have very little in common. I will show how they can be solved and how from these disparate solutions a pattern emerges that can be recognized as a monad. I’ll talk in some detail about the `Maybe` monad and the `List` monad, since they are simple and illustrate the basic ideas nicely. I will leave the really interesting examples, like the state monad and its variations, to the next installment.

# Challenges of Functional Programming

You know what a function is: called with the same argument it always returns the same result. A lot of useful computations map directly to functions. Still, a lot don’t. We often have to deal with notions of computation that are not functions. We can describe them in words or implement them as procedures with side effects. In non-functional languages such computations are often called “functions” anyway– I’ll enclose such usage in quotation marks. Eugenio Moggi, the guy who introduced monads to computing, listed a few examples:

• Partial “functions”: For some arguments they never terminate (e.g., go into an infinite loop). These are not really functions in the mathematical sense.
• Nondeterministic “functions”: They don’t return a single result but a choice of results. Non-deterministic parsers are like this. When given an input, they return a set of possible parses. Which of them is right depends on the context. These are not really functions because a function must return a single value.
• “Functions” with side effects: They access and modify some external state and, when called repeatedly, return different results depending on that state.
• “Functions” that throw exceptions: They are not defined for certain arguments (or states) and they throw exceptions instead of returning a result.
• Continuations.
• Interactive input: `getchar` is a good example. Its result depends on what hits the keyboard.
• Interactive output: `putchar` is a good example. It’s not a function because it doesn’t return anything (if it returns an error code, it doesn’t depend on the argument). Still, it can’t be just optimized away if we want any output to appear on the screen.

The amazing thing is that, using some creative thinking, all these problems can be solved using pure functional methods. I won’t discuss all of them, but I’ll show you a series of progressively more interesting examples.

# Error Propagation and Exceptions

Who doesn’t deal with error propagation? Let’s dissect the problem: You are defining a computation that can return a valid result only for a subset of possible arguments. So what happens when the argument is “wrong”? Well, you can return an error.

In the simplest case you might add one extra bit, success or failure, to your result. Sometimes you can spare a bit in the result type: If the correct result is always a non-negative number, you may return negative one as an error. Or, if the result is a pointer, you might return a null pointer instead. But these are just small hacks introduced for the sake of performance. And, like most hacks, they can lead to dangerous code. Consider this (likely incorrect) code:

```size_t off = fileName.find('.');
string ext = fileName.substr(off, fileName.length() - off);```

If `fileName` contains no dot, the result is a special value `npos` signifying an error. The problem is that `npos` is of the same type as a non-error result, so it can be passed quietly to `string::substr` causing undefined behavior.

A more general and safer solution is to change– extend– the type of the result to include an error bit. You can do it for any type of result. In Haskell, you just define a type constructor called `Maybe`:

`data Maybe a = Nothing | Just a`

It takes an arbitrary type `a` and defines a new type that adds the special value `Nothing` to the set of possible values of `a`. In C++ that would be equivalent to a template:

```template<class T>
struct Maybe {
    T just; // valid if 'nothing' is false
    bool nothing;
};```

(This is just an example. I’m not arguing that `Maybe` would be very useful in C++, which has other mechanisms for error propagation.)

So here we have a way to turn a computation that’s not defined for all values into a function that’s defined over the whole domain, but which returns a new richer type.

The next question is, do these new things– functions returning `Maybe`– compose? What should the caller do with the result of such a function when it’s part of a larger computation? One thing is for sure, this result cannot be passed directly to an unsuspecting function–the error cannot be easily ignored– which is good. If we replace `find` by a new function `safe_find` that returns a `Maybe<size_t>`, the client won’t call `substr` with it. The types wouldn’t match. Instead, the result of `safe_find` must be unpacked and (much more likely than before) tested.

```Maybe<size_t> off = safe_find(fileName, '.');
string ext;
if (!off.nothing)
    ext = fileName.substr(off.just, fileName.length() - off.just);```

Notice what happened here: By changing the return type of a function we broke the natural way functions compose– the result of one becoming the input of the next. On the other hand, we have come up with a new way of composing such functions. Let me chain a few of such compositions for better effect:

```Maybe<Foo> foo = f(x);
if (!foo.nothing) {
    Maybe<Bar> bar = g(foo.just);
    if (!bar.nothing) {
        Maybe<Baz> baz = h(bar.just);
        if (!baz.nothing) {
            ...
        }
    }
}```

The repeated tests and unpackings are the “glue” used to compose our new, more error-conscious functions. Notice how this boilerplate glue gets in the way of code clarity. Ideally, we would like to write something like this:

```DO
{
    auto y = f(x);
    auto v = g(y);
    auto z = h(v);
    return z;
}```

where `DO` would magically provide the glue that’s implicit in the chaining of `f`, `g`, and `h`.

It’s not easy to do this– abstract the glue away– in C++. I’m not even going to try. But in Haskell it’s a different story. Let’s start with the almost direct translation of the C++ code into Haskell:

```compose n =
    let m = f n in
    case m of
      Nothing -> Nothing
      Just n1 ->
        let m1 = g n1 in
        case m1 of
          Nothing -> Nothing
          Just n2 ->
            let n3 = h n2 in
            n3```

The if statements are replaced by Haskell’s pattern matching (`case x of`). The `m`s are the `Maybe`s and the `n`s are their contents (if they aren’t `Nothing`s).

Notice the characteristic cascading style of this code– the nested conditionals or pattern matches. Let’s analyze one level of such a cascade. We start with a `Maybe` value (one returned by `f n`). We unpack it and examine the contents. If the result is not an error (`Just n1`), we continue with the rest of the body of `compose`. Everything after the successful match– the calls to `g` and `h`– constitutes the “rest of the body”; the unpacking and testing around it is the “glue.”

What’s also important is that, if the result is an error (the `Nothing` branch of the pattern match) the whole “rest of the body” is skipped. In order to abstract the glue, we have to abstract the “rest of the body” as well. Such an abstraction is called a continuation. Let’s write this continuation as a lambda (lambdas in Haskell are written using the backslash, which is nothing but the Greek letter λ missing one leg):

```\ n1 ->
    let m1 = g n1 in
    case m1 of
      Nothing -> Nothing
      Just n2 ->
        let n3 = h n2 in
        n3```

And here’s the trick: We can abstract the glue as a (higher-order) function that takes a `Maybe` value and a continuation. Let’s call this new function `bind` and rewrite `compose` with its help:

```compose n =
    let m = f n in
    -- the first argument is m, the second is the whole continuation
    bind m (\ n1 ->
        let m1 = g n1 in
        case m1 of
          Nothing -> Nothing
          Just n2 ->
            let n3 = h n2 in
            n3)```

Here’s how `bind` is implemented:

```bind m f =
    case m of
      Nothing -> Nothing -- bypass the continuation
      Just n  -> f n     -- pass n to the continuation```

or, more compactly,

```bind Nothing cont  = Nothing
bind (Just n) cont = cont n```

Figure 1 illustrates the complete code transformation. The result of `f n`, which is a `Maybe`, is passed to `bind`, represented by a blue box. Inside `bind` the `Maybe` is unpacked. If its value is `Nothing`, nothing happens. If its value is `Just n1`, the rest of the code, the continuation, is called with the argument `n1`. The continuation itself is rewritten using `bind`, and so on. The final continuation calls `return`, which I will explain shortly.

Fig 1. Composition of functions that return Maybe results

The position of `bind`, the blue box in Figure 1, between its `Maybe` argument and the continuation suggests that infix notation might be more appropriate. Indeed, in Haskell `bind` is represented by the infix operator, `>>=`:

```Nothing  >>= cont = Nothing
(Just x) >>= cont = cont x```

(The left-hand side of the equal sign is the operator between its two arguments [actually, patterns], and the right-hand side is the body of the function.) We can express the type signature of the bind function as:

`(>>=) :: Maybe a -> (a -> Maybe b) -> Maybe b`

It takes a `Maybe a` and a function from `a` to `Maybe b` (that’s our continuation). It returns a `Maybe b`. When dealing with higher order functions I find type signatures extremely useful; so much so that I often use them as comments in my C++ meta-code.

The complete rewrite of `compose` using `>>=` looks like this:

```compose1 n =
    f n >>= \n1 ->
    g n1 >>= \n2 ->
    h n2 >>= \n3 ->
    return n3```

Now is the time to explain the mysterious `return` at the end of the function. No, it’s not the keyword for returning a result from a function. It’s a function that takes an argument of type `a` and turns it into a `Maybe a`:

`return n = Just n`

We need `return` because the result of `compose` is a `Maybe`. If any of the functions returns `Nothing`, `compose` returns `Nothing` (it’s part of the definition of `>>=`). Only when all functions succeed do we call `return` with the correct result, `n3`. It turns `n3` into `Just n3`, thus announcing the success of the computation and encapsulating the final result.

You might say that using `return` instead of `Just n3` is overkill, but there are good reasons to do that. One is to encapsulate and hide direct access to the implementation of `Maybe`. The other has to do with the ways this pattern is generalized beyond `Maybe`.

## The Maybe Monad

What does `Maybe` have to do with monads? Let’s see what we have done so far from a more general perspective.

We started from a computation which didn’t behave like a function– here, it was not defined for all arguments. We had found a clever way to turn it into a function by enriching its return type. Such general “enriching” of types can be expressed as a type constructor.

From a category-theoretical point of view, defining a type constructor, which is a mapping from types to types, gets us halfway towards defining an endofunctor. The other part is the mapping of functions (morphisms). For any function that takes values of type `a` and returns values of type `b` we need a corresponding function that acts between the enriched types. In Haskell, the mapping of functions to functions can be expressed as a higher-order function. When this mapping is part of an endofunctor, the corresponding higher-order function is called `fmap`.

Let’s call the type constructor `M`. It constructs the enriched type `M a` from any type `a`. The function `fmap` has the following signature:

`fmap :: (a -> b) -> (M a -> M b)`

It maps a function from `a` to `b` to a function from `M a` to `M b`. In the case of `Maybe` we already know the type constructor part, and `fmap` will follow soon.

Let’s go back to our computation turned function returning an enriched type. In general, its signature is:

`f :: a -> M b`

Since we care about composability of computations, we need a way to glue the enriched functions together. In the process of gluing we usually have to inspect the enriched type and make decisions based on its value. One such decision might be whether to continue with the rest of the computation or not. So the general `bind` operation must take two arguments: the result of the previous enriched function and the rest of the computation– a continuation:

`bind :: M a -> (a -> M b) -> M b`

If you squint enough, you can see a similarity between `bind` and `fmap`.

Here’s how you do it: Imagine that you supply `bind` with its second argument, the continuation. There is still one free parameter of the type `M a`. So what’s left is a function that takes `M a` and returns `M b`. (This is not exactly currying, since we are binding the second argument rather than the first, but the idea is the same.) That’s exactly the type of function that occurs on the right hand side of the signature of `fmap`. Seen from that point of view, `bind` maps functions into functions:

`(a -> M b) -> (M a -> M b)`

The function on the left-hand side is our continuation; the one on the right-hand side is `bind` with the continuation plugged in as its second argument.

This signature is really close to that of `fmap`, but not quite. Luckily, we still have a second family of functions at our disposal, the polymorphic `return`. The signature of `return` is:

`return :: a -> M a`

Using both `bind` and `return` we can lift any function `f`:

`f :: a -> b`

to a function `g`:

`g :: M a -> M b`

Here’s the magic formula that defines `g` in terms of `f`, `bind`, and `return` (the dot denotes regular function composition):

`g ma = bind ma (return . f)`

Or, using infix notation:

`g ma = ma >>= (return . f)`

This mapping of `f` into `g` becomes our definition of `fmap`. Of course, `fmap` must obey some axioms: in particular, it must preserve function composition and interact correctly with unit (more on which shortly). These axioms translate back into corresponding conditions on `bind` and `return`, which I’m not going to discuss here.

Now that we have recovered the functor, we need the two other parts of a monad: unit and join. Guess what, `return` is just another name for unit. For any type, it lifts an element of that type to an element of the enriched type in the most natural manner (incidentally, the word natural has a very precise meaning in category theory; both unit and join are natural transformations of the functor (M, fmap)).

To recover join, let’s look at its type signature in Haskell:

`join :: M (M a) -> M a`

It collapses one layer of the type constructor. In the case of `Maybe` it should map a double `Maybe` to a single `Maybe`. It’s easy to figure out that it should map `Just (Just x)` into `Just x` and anything else to `Nothing`. But there is a more general way of defining `join` using `bind` combined with the identity function:

```join :: M (M a) -> M a
join mmx = mmx >>= id```

We are binding a value of type `M (M a)` to a function `id`. This function, when acting on `M a` returns `M a`. You can convince yourself that these signatures match the general signature of `bind` (see Appendix); and that this definition, when applied to `Maybe`, produces the correct result.

To summarize: What we have done here is to show that `Maybe` together with `bind` and `return` is a monad in the category-theory sense. Actually, we have shown something far more general: Any triple that consists of a type constructor and two functions `bind` and `return` that obey certain identities (axioms) define a monad. In category theory this triple is called a Kleisli triple and can be used as an alternative definition of a monad.

## Syntactic Sugar

So far you’ve seen only the `Maybe` example, so it might seem like we are pulling out a monadic cannon to kill a mosquito. But, as you’ll see later, this same pattern pops up in a lot of places in various guises. In fact it’s so common in functional languages that it acquired its own syntactic sugar. In Haskell, this sugar is called the do notation. Let’s go back to the implementation of our function `compose`:

```compose n =
    f n >>= \n1 ->
    g n1 >>= \n2 ->
    h n2 >>= \n3 ->
    return n3```

Here’s exactly the same code in do notation:

```compose n = do
    n1 <- f n
    n2 <- g n1
    n3 <- h n2
    return n3
```

This looks deceptively similar to an imperative program. In fact here’s a C++ version of it (for the sake of the argument let’s assume that `f` takes a pointer to `Foo` and `h` returns an integer):

```int compose(Foo * n)
{
    auto n1 = f(n);
    auto n2 = g(n1);
    auto n3 = h(n2);
    return n3;
}```

Uh, one more thing. This is how you would call it:

```try {
    compose(pFoo);
}
catch (...) {
    // error handling
}```

In C++ you get virtually the same functionality not by modifying the return type, but by throwing an exception. (Now you may utter your first cry, “Hey, that looks like a monad!”, when looking at C++ code.)

Just like in our Haskell example, once any of the functions reports an error (throws an exception), the rest of the body of `compose` is skipped. You might say that C++ exceptions offer more power than the `Maybe` monad and you’ll be right. But that’s because I haven’t shown you the `Error` monad and the `Exception` monad.

Where Haskell’s `Exception` monad beats C++ exceptions is in type checking. Remember the unfortunate attempt at adding exception specifications to the C++ type system? (Java went the same way.) Here’s what the C++ standard says:

An implementation shall not reject an expression merely because when executed it throws or might throw an exception that the containing function does not allow.

In other words, exception specifications are bona fide comments. Not so in Haskell! If a function returns a `Maybe` or `Exception`, that becomes part of its type signature which is checked both at the call site and at the function definition site. No cheating is allowed, period.

But the major difference between Haskell’s and C++’s approach is that the `do` notation generalizes to all monads, whereas C++’s neat `try/catch` syntax applies only to exceptions.

A word of caution when reading Haskell monadic code. Despite similarities in usage, Haskell’s left arrow is not an assignment. The left hand side identifier corresponds to the argument of the continuation (which is the rest of the `do` block). It is the result of unpacking the outcome of the right hand side expression. This unpacking always happens inside `bind`.

# Non-deterministic Computations

In the previous post I introduced the list monad by defining a functor and two families of functions. The type constructor part of the functor mapped any type `a` into the list of `a`, `[a]`. The part that acted on functions (now we know that, in general, it’s called `fmap`; but for lists it’s just `map`) worked by applying the function to each element of the list. The `unit` and the `join` were polymorphic functions. `unit` was defined as:

`unit x = [x]`

and `join` as `concat`.

Now I can tell you that the list monad provides the solution to the problem of implementing non-deterministic computations. Imagine parsing a non-deterministic grammar. A production might offer multiple alternative parse trees for the same input.

These types of computations may be simulated with functions returning lists of possible results. Here’s the same trick again: whatever type is returned by a deterministic parser (e.g., a parse-tree type), it’s now turned into an enriched type (a parse tree list).

We’ve already seen the category-theory construction of the list monad but here we are facing a slightly different problem: how to string together functions that return lists. We know that we have to define `bind`. Its signature, in this case, would be:

`bind :: [a] -> (a -> [b]) -> [b]`

It takes a list (which is the result of a previous function) and a continuation, `(a -> [b])`, and must produce another list. The obvious thing is to apply the continuation to each element of the input list. But since the continuation also produces a list, we’ll end up with a list of lists. To get a single list as the result, we’ll simply concatenate the sublists. Here’s the final implementation:

`xs >>= cont = concat (map cont xs)`

The application of the continuation to each element of the input list is done using `map`.

Is it a coincidence that we were able to define `bind` in terms of `join` and `fmap`? Not at all. The general formula for converting the functor definition of a monad to a Kleisli triple is:

• Take the object-mapping part of the functor (the type constructor)
• Define bind as
`bind x f = join ((fmap f) x)`

where `fmap` is the part of the functor that maps morphisms.

• Define return as unit

Now we know how to move between those two definitions back and forth.

Just as in the case of `Maybe`, we can apply the do notation to functions returning lists:

```toss2Dice = do
    n <- tossDie
    m <- tossDie
    return (n + m)```

Here, `tossDie` returns the list of all possible outcomes of a die toss, and `toss2Dice` returns the list of all possible sums of the outcomes of a two-die toss.

An interesting observation is that the list monad is closely related to list comprehensions, down to the use of left arrows (in Haskell). For instance, the above example is equivalent to:

`toss2Dice = [n + m | n <- tossDie, m <- tossDie]`

# Conclusions

There is a large class of computations that convert input to output in a non-functional way. Many of those can be expressed as functions that return “enriched” output. This enrichment of the output can be expressed in terms of a type constructor. This type constructor defines the first component of a monad.

Computations have to be composable, so we need a way of composing the enriched functions. This introduces the second component of the monad, the `bind` family of higher-order functions.

Finally, we should be able to construct functions that return enriched types, and for that we need the third component of the monad, the `return` family of functions. They convert regular values into their enriched counterparts.

I’ve shown you two examples of Haskell monads in action, but the real treat will come in the third installment of my mini-series. I’ll show you how to define and compose functions that, instead of returning regular values, return functions that calculate those values.

# Appendix: Join From Bind

By popular demand (see comments), I decided to explain in more detail the typing of the formula:

```join :: M (M a) -> M a
join mmx = mmx >>= id```

At first look it seems like `id` doesn’t have the right signature for the second parameter to `bind`:

`bind :: M a' -> (a' -> M b') -> M b'`

(I changed the names of type arguments to `a'` and `b'` to avoid confusion.) However, notice that here we are calling `bind` with the first argument, `mmx`, of type `M (M a)`. It means that `a'` in this particular instantiation of `bind` is `(M a)`. The second argument to `bind` is `id` with the signature:

`id :: c -> c`

Here, `id` will be called with `c` equal to `(M a)`, so its return type will also be `(M a)`. With those substitutions, `bind` will also return type `(M a)`:

`M (M a) -> (M a -> M a) -> M a`

Thus the whole combination `(mmx >>= id)` ends up with the right type signature for `join`.

# Bibliography

1. Eugenio Moggi, Notions of Computation and Monads. This is a hard core research paper that started the whole monad movement in functional languages.
2. Philip Wadler, Monads for Functional Programming. The classic paper introducing monads into Haskell.
3. Tony Morris, What Does Monad Mean?
4. Brian Beckman, Don’t fear the Monad
5. Graham Hutton, Erik Meijer, Monadic Parsing in Haskell.
6. Brian McNamara, Yannis Smaragdakis, Syntax sugar for FC++: lambda, infix, monads, and more. A C++ template library for functional programming that imitates Haskell pretty closely.

The Monad is like a bellows:
it is empty yet infinitely capable.
The more you use it, the more it produces;
the more you talk about it, the less you understand.

–Monad Te Ching

I don’t know if I’m exaggerating but it seems like every programmer who gets monads posts a tutorial about them. (And each post begins with: There’s already a lot of monad tutorials on the Internet, but…) The reason is that getting monads is like a spiritual experience that you want to share with others.

When facing a monad, people often behave like the three blind men describing an elephant. You’ll see monads described as containers and monads described as actions. Some people see them as a cover-up for side effects, others as examples of endofunctors in Category Theory.

Monads are hard to describe because they don’t correspond to anything in our everyday experience. Compare this with Objects in Object-Oriented programming. Even an infant knows what an object is (something you can put in your mouth). What do you do with a monad?

But first, let me answer the pertinent question:

# Why Bother?

Monads enable pure functional programmers to implement mutation, state, I/O, and a plethora of other things that are not functions. Well, you might say, they brought it on themselves. They tied their hands behind their backs and now they’re bragging that they can type with their toes. Why should we pay attention?

The thing is, all those non-functional things that we are so used to doing in imperative programming are also sources of a lot of trouble. Take side effects, for instance. Smart programmers (read: the ones who burnt their fingers too many times) try to minimize the use of global and static variables for fear of side effects. That’s doable if you know what you’re doing. But the real game changer is multithreading. Controlling the sharing of state between threads is not just good programming practice– it’s a survival skill. Extreme programming models are in use that eliminate sharing altogether, like Erlang’s full isolation of processes and its restriction of message passing to values.

Monads stake the ground between the total anarchy of imperative languages and the rigid dictatorship of Erlang-like isolationism. They don’t prohibit sharing or side effects but let you control them. And, since the control is exercised through the type system, a program that uses monads can be checked for correctness by the compiler. Considering how hard it is to test for data races in imperative programs, I think it’s worth investing some time to learn monads.

There is also a completely different motivation: metaprogramming. The template language used for metaprogramming in C++ is a pure functional language (see my blog post, What does Haskell have to do with C++?). If monads are so important in functional programming, they must also pop up in C++ metaprogramming. And indeed they do. I hope to discuss this topic in a future post.

So what’s a monad?

# A Categorical Answer

If you don’t know anything about category theory, don’t get intimidated. This is really simple stuff and it will clarify a lot of things, not to mention earning you some bragging rights. My main goal is to share some intuitions from mathematics that will build foundations for a deeper understanding of monads in programming. In this installment I will explain categories, functors, and endofunctors, leading up to monads. I will give examples taken both from everyday life and from programming. I will really get into monads and their practical applications in the next installment, so be patient.

## Categories

A category is a natural extension of our notion of sets and functions. The generalization of a set in a category is called an object (a pretty neutral term with little semantic ballast), and the generalization of a function is called a morphism. In fact, the standard example of a category is the category of sets and functions called (capital letter) Set.

A morphism (read “function”) goes from one object (read “set”) to another. Mathematical functions like sin or exp usually go from the set of real numbers to the set of real numbers. But you may also define functions like isPrime that go from natural numbers to Booleans, or a function price that goes from a set of goods to the set of numbers.

The only thing a mathematician needs to know about morphisms is that they can be composed. If you have a morphism from A to B, `A->B`, and another going from B to C, `B->C`, then they can be composed to a morphism from A to C,` A->C`. And just like the standard composition of functions, morphism composition must be associative, so we don’t need parentheses when composing more than two of them.

Actually, two things. There must be, for every object, a special morphism called identity that essentially does nothing and when composed with any other morphism reproduces the same morphism.

Just to throw you off the track, a category doesn’t have to be built on sets and functions. You can easily construct simple categories from blobs and arrows. Fig 1 shows such a category that contains two objects and four morphisms: arrows between them (formally, those arrows are ordered pairs of objects so, for instance, f is a pair (A, B)). You can easily check that any two morphisms can be composed and that the two morphisms `iA` and `iB` serve as identities.

Fig 1. A simple category with two objects and four morphisms.

That’s it! Hopefully I have just convinced you that a category is not a big deal. But let’s get down to Earth. The one category that’s really important in programming languages is the category of types and functions, in particular its Haskell version called Hask. There usually is a finite set of basic types like integers or Booleans, and an infinite set of derived types, like lists of integers, functions from integers to Booleans, etc. In Hask, a type is just a set of values. For instance, the type `Char` is a set {‘a’, ‘b’, ‘c’, … }.

So, in the category Hask, types are objects and functions are morphisms. Indeed, a function maps one type into another (forget for a moment functions of multiple arguments– they can be modeled with currying– and polymorphic functions– they are families of functions). And these are functions in the functional-programming sense: called with the same values they return the same values–no side effects allowed.

Function composition is just passing the result of one function as an argument to another. The identity function takes x and immediately returns it back.

This is all fine, but what’s in it for me, you might ask. So here’s the first insight and a moment of Zen. If there is one thing that you can call the essence of programming, it’s composability. In any style of programming you always compose your program from smaller pieces, and those pieces from even smaller pieces, and so on. That’s why categories with their composable morphisms are so important. The essence of Lego blocks is the way they fit together, their composability, not the color or size. The essence of functional programming is how functions work together: how you can build larger functions from smaller ones.

Every category is defined by its choice of objects and morphisms. But is there something that can characterize a given category that’s independent of its choice of particular objects and morphisms? How do you expose the inner structure of a particular category? Mathematicians know exactly how to do that. You have to be able to map categories into other categories while preserving some constraints imposed by the way morphisms are attached to objects and the way they compose. Such maps let you find similarities between categories and catalog different kinds of categories. That’s when things get really interesting.

## Functors

A functor, `F`, is a map from one category to another: it maps objects into objects and morphisms into morphisms. But it can’t do it in a haphazard way because that would destroy the very structures that we are after. So we must impose some “obvious” (mathematicians love that word) constraints.

First of all, if you have a morphism between two objects in the first category then it better be mapped into a morphism between the corresponding objects in the second category. Fig 2 explains this diagrammatically. Object A is mapped into F(A), object B into F(B). A morphism f from A to B is mapped into a morphism F(f) from F(A) to F(B). Mathematicians say that such a diagram must commute, that is, the result must be the same whether you go from A to F(A) and then apply F(f), or first apply f and then go from B to F(B).

Fig 2. Diagram showing the action of a functor F on objects A and B and a morphism f. The bottom part lives in F's domain (source) category, the top part in its codomain (the target).

Moreover, such mapping should preserve the composition property of morphisms. So if morphism `h` is a composition of `f` and `g`, then `F(h)` must be a composition of `F(f)` and `F(g)`. And, of course, the functor must map identity morphisms into identity morphisms.

To get a feel for how constrained functors are by these conditions, consider how you could map the category in Fig 1 into itself (such a functor just rearranges things inside one category). There are two trivial mappings that collapse both objects into one (either A or B), and turn all morphisms into identity. Then there is the identity functor that maps both objects into themselves and all morphisms into themselves. Finally, there is just one “interesting” functor that maps A into B and B into A with f and g switching roles. Now imagine a similar category but with the g arrow removed (yes, it’s still a category). Suddenly there is no functor between Fig 1 and that new category other than the collapsing ones. That’s because the two categories have completely different structure.

Let me now jump into more familiar territory. Since we are mostly interested in one category, Hask, let me define a functor that maps that category into itself (such functors are called endofunctors). An object in Hask is a type, so our functor must map types into types. The way to look at it is that a functor in Hask constructs one type from another– it’s a type constructor. Don’t get confused by the name: a type constructor creates a new type in your program, but that type already exists in Hask.

A classical example is the list type constructor. Given any type it constructs a list of that type. Type `Integer` is mapped into list of integers or, in Haskell notation, `[Integer]`. Notice that this is not a map defined on integer values, like 1, 2, or 3. It also doesn’t add a new type to Hask– the type `[Integer]` is already there. It just maps one type into another. For C++ programmers: think of mapping type T into a container of T; for instance, `std::vector<T>`.

Mapping the types is the easy part; what about functions? We have to find a way to take a particular function and map it into a function on lists. That’s also easy: apply the function to each element of the list in turn. There is a (higher-order) function in Haskell that does it. It’s called `map` and it takes a function and a list and returns a new list (or, because of currying, you may say that it takes a function and returns a function acting on lists). In C++ there is a corresponding template function called `std::transform` (well, it takes a pair of input iterators, an output iterator, and a function object, but the idea is the same).
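Here’s a minimal sketch of that lifting in C++; the names `square` and `liftToVec` are mine, invented just for illustration:

```#include <algorithm>
#include <vector>

// A plain function on ints...
int square(int x) { return x * x; }

// ...lifted to a function on vectors of ints by applying it to each
// element with std::transform. This plays the role of Haskell's map.
std::vector<int> liftToVec(std::vector<int> const& in)
{
    std::vector<int> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(), square);
    return out;
}```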

Mathematicians often use diagrams to illustrate the properties of morphisms and functors (see Fig 2). The arrows for morphisms are usually horizontal, while the arrows for functors are vertical (going up). That’s why the mapping of morphisms under a functor is often called lifting. You can take a function operating on integers and “lift it” (using a functor) to a function operating on lists of integers, and so on.

The list functor obviously preserves function composition and identity (I’ll leave it as an easy but instructive exercise for the reader).
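If you want to work through the composition part of the exercise mechanically, here’s a small sketch; the helper `lift` and the functions `f`, `g`, and `gAfterF` are my own names. Lifting the composition `g . f` should give the same list as lifting `f` and then lifting `g`:

```#include <algorithm>
#include <vector>

// Map an int -> int function over a vector, element by element.
std::vector<int> lift(int (*fn)(int), std::vector<int> const& in)
{
    std::vector<int> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(), fn);
    return out;
}

int f(int x) { return x + 1; }
int g(int x) { return 2 * x; }
int gAfterF(int x) { return g(f(x)); } // the composition g . f```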

And now for another moment of Zen. What’s the second most important property of programming? Reusability! Look what we have just done: We took all the functions we’ve implemented so far and lifted them to the level of lists. We’ve got functions operating on lists essentially for free (well, we’ve got a small but important subset of those functions). And the same trick may be applied to all kinds of containers, arrays, trees, queues, `unique_ptr`s and more.

It’s all beautiful, but you don’t really need category theory to apply functions to lists. Still it’s always good to see patterns in programming, and this one is definitely a keeper. The real revolution starts with monads. And, guess what, the list functor is actually a monad. You just need a few more ingredients.

What’s the intuition behind the statement that mappings expose the structure of the system? Consider the schematic of the London underground in Fig 3. It’s just a bunch of circles and lines. It’s only relevant because there is a mapping between the city of London and this schematic. The circles correspond to tube stations and the lines to train connections. Most importantly, if trains run between two stations, the corresponding circles in the diagram are connected by lines and vice versa: these are the constraints that the mapping preserves. The schematic shows a certain structure that exists in London (mostly hidden underground) which is made apparent by the mapping.

Fig 3. The schematic map of London underground system.

Interestingly, what I’m doing here is also mapping: London and the underground map correspond to two categories. Train stations/circles are objects and train connections/lines are morphisms. How’s that for an example?

## Endofunctors

Mathematicians love mappings that preserve “obvious” constraints. As I explained, such mappings abstract inner structures away from the details of implementation. But you can also learn a lot about structure by studying non-trivial mappings of a category into itself. Functors that map a category into itself are called endofunctors (like endo-scopes, they let you look inside things). If functors expose similarities, endofunctors expose self-similarities. Take one look at the fractal fern, Fig 4, and you’ll understand how powerful self-similarity can be.

Fig 4. This fractal fern was generated using just four endomorphisms.

With a little bit of imagination you can see the list functor exposing fern-like structures inside Hask (Fig 5). Chars fan out into lists of Chars, which then fan out into lists of lists of Chars, and so on, ad infinitum. Horizontal structures described by functions from `Char` to `Bool` are reflected at higher and higher levels as functions on lists, lists of lists, etc.

Fig 5. The action of the list type constructor reveals fractal-like structure inside Hask. The functor lifts things up, the functions act horizontally.

A C++ template that takes a type parameter could be considered a type constructor. How likely is it that it also defines a functor (loosely speaking– C++ is not as mathematized as Haskell)? You have to ask yourself: Is the type parameter constrained in any way? It’s often hard to say, because type constraints are implicit in the body of a template and are tested only during instantiation. For instance, the type parameter for a `std::vector` must be copyable. That eliminates, for instance, classes that have private or `deleted` (in C++0x) copy constructors. This is not a problem though, because copyable types form a subcategory (I’m speaking really loosely now). The important thing is that a vector of copyable is itself copyable, so the “endo-” part of the endofunctor holds. In general you want to be able to feed the type created by the type constructor back to the type constructor, as in `std::vector<std::vector<Foo>>`. And, of course, you have to be able to lift functions in a generic way too, as in `std::transform`.
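The generic lifting can be written once as a template. This is just a sketch, assuming C++11; the name `fmap` is borrowed from Haskell and is not a standard C++ facility:

```#include <vector>

// Lift a T -> U callable to a vector<T> -> vector<U> function.
// Works on nested vectors too, since vector<vector<T>> is itself
// a perfectly good element type.
template<class F, class T>
auto fmap(F fn, std::vector<T> const& in) -> std::vector<decltype(fn(in[0]))>
{
    std::vector<decltype(fn(in[0]))> out;
    out.reserve(in.size());
    for (auto const& x : in)
        out.push_back(fn(x));
    return out;
}```

Because the result of the type constructor can be fed back into it, you can apply `fmap` at any level of nesting: to a `std::vector<int>` or to a `std::vector<std::vector<int>>` alike.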

## Monads

Ooh, Monads!
–Haskell Simpson

It’s time to finally lift the veil. I’ll start with the definition of a monad that builds on the previous sections and is mostly used by mathematicians. There is another one that’s less intuitive but easier to use in programming. I’ll leave that one for later.

A monad is an endofunctor together with two special families of morphisms, both going vertically, one up and one down (for “directions” see Fig 5). The one going up is called unit and the one going down is called join.

Now we are juggling a lot of mappings so let’s slow down to build some intuition. Remember, a functor maps objects: in our case, types, which are sets of values. The functor doesn’t see what’s inside the objects; morphisms, in general, do. In our case, a morphism is a function that maps values of one type into values of another type. Our functors, which are defined by type constructors, usually map poorer types into richer types, in the sense that type Bool is a set that contains just two elements, True and False, but type [Bool] contains infinitely many lists of True and False.

Unit takes a value from the poorer type, then picks one value from the richer type, and pronounces the two roughly equivalent. Such a rough equivalent of True from the Bool object is the one-element list [True] from the [Bool] object. Similarly, unit would map False into [False]. It would also map integer 5 into [5] and so on.

Unit can be thought of as immersing values from a lower level into the higher level in the most natural way possible. By the way, in programming we call a family of functions defined for any type a polymorphic function. In C++, we would express unit as a template, like this:

```template<class T>
std::vector<T> unit(T value) {
    std::vector<T> vec;
    vec.push_back(value);
    return vec;
}```

To explain join, imagine the functor acting twice. For instance, from a given type `T` the list functor will first construct the type `[T]` (list of `T`), and then `[[T]]` (list of list of `T`). Join removes one layer of “listiness” by joining the sub-lists. Plainly speaking, it just concatenates the inner lists. Given, for instance, `[[a, b], [c], [d, e]]`, it produces `[a, b, c, d, e]`. It’s a many-to-one mapping from the richer type to the poorer type and the type-parameterized family of joins also forms a polymorphic function (a template, in C++).
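For vectors, join can be sketched in a few lines; the name `join` is the monadic one, not anything standard in C++:

```#include <vector>

// join: collapse a vector of vectors into a single vector by
// concatenating the inner vectors in order.
template<class T>
std::vector<T> join(std::vector<std::vector<T>> const& vv)
{
    std::vector<T> out;
    for (auto const& v : vv)
        out.insert(out.end(), v.begin(), v.end());
    return out;
}```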

There are a few monadic axioms that define the properties of unit and join (for instance that unit and join cancel each other), but I’m not going to elaborate on them. The important part is that the existence of unit and join imposes new constraints on the endofunctor and thus exposes even more structure.
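To see the cancellation concretely, here’s a self-contained sketch for vectors that restates unit and join and checks that join undoes unit whether unit is applied outside the list or to each element; all the names are mine:

```#include <vector>

template<class T>
std::vector<T> unit(T value) { return std::vector<T>(1, value); }

template<class T>
std::vector<T> join(std::vector<std::vector<T>> const& vv)
{
    std::vector<T> out;
    for (auto const& v : vv)
        out.insert(out.end(), v.begin(), v.end());
    return out;
}

// Map unit over a list, turning each element into a one-element list.
template<class T>
std::vector<std::vector<T>> mapUnit(std::vector<T> const& xs)
{
    std::vector<std::vector<T>> out;
    for (auto const& x : xs)
        out.push_back(unit(x));
    return out;
}```

One axiom says `join(unit(xs))` gives back `xs` (unit wraps the whole list, join unwraps it); the other says `join(mapUnit(xs))` gives back `xs` (unit wraps each element, join concatenates the singletons).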

Mathematicians look at `join` as the grandfather of all multiplication with `unit` being its neutral element. It’s heaven for mathematicians because multiplication leads to algebraic structures and indeed monads are great for constructing algebras and finding their hidden properties.

Unlike mathematicians, we programmers are not that interested in algebraic structures. So there must be something else that makes monads such a hit. As I mentioned in the beginning, in programming we often face problems that don’t naturally translate into the functional paradigm. There are some types of computations that are best expressed in imperative style. It doesn’t mean they can’t be translated into functions, it’s just that the translation is somewhat awkward and tedious. Monads provide an elegant tool to do this translation. Monads made possible the absorption and assimilation of imperative programming into functional programming, so much so that some people claim (tongue in cheek?) that Haskell is the best imperative language. And, like all things functional, monads are bound to turn around and find their place in imperative programming. But that’s material for my next blog post.