Lazy evaluation can be a powerful tool for structuring your code. For instance, it can let you turn your code inside out, inverting the flow of control. Many a Haskell program take advantage of laziness to express algorithms in clear succinct terms, turning them from recipes to declarations.

The question for today’s blog post is: How can we tap the power of lazy evaluation in an inherently eager language like C++? I’ll lead you through a simple coding example and gradually introduce the building blocks of lazy programming: the suspension, the lazy stream, and a whole slew of functional algorithms that let you operate on them. In the process we’ll discover some fundamental functional patterns like functors, monads, and monoids. I have discussed them already in my post about C++ futures. It’s very edifying to see them emerge in a completely different context.

The Problem

Let’s write a program that prints the first n Pythagorean triples. A Pythagorean triple consists of three integers, x, y, and z, that satisfy the relation x2 + y2 = z2. Let’s not be fancy and just go with the brute force approach. Here’s the program in C:

void printNTriples(int n)
    int i = 0;
    for (int z = 1; ; ++z)
        for (int x = 1; x <= z; ++x)
            for (int y = x; y <= z; ++y)
                if (x*x + y*y == z*z) {
                    printf("%d, %d, %d\n", x, y, z);
                    if (++i == n)

Here, a single C function serves three distinct purposes: It

  1. Generates Pythagorean triples,
  2. Prints them,
  3. Counts them; and when the count reaches n, breaks.

This is fine, as long as you don’t have to modify or reuse this code. But what if, for instance, instead of printing, you wanted to draw the triples as triangles? Or if you wanted to stop as soon as one of the numbers reached 100? The problem with this code is that it’s structured inside out: both the test and the sink for data are embedded in the innermost loop of the algorithm. A more natural and flexible approach would be to:

  1. Generate the list of Pythagorean triples,
  2. Take the first ten of them, and
  3. Print them.

And that’s exactly how you’d write this program in Haskell:

main = print (take 10 triples)

triples = [(x, y, z) | z <- [1..]
                     , x <- [1..z]
                     , y <- [x..z]
                     , x^2 + y^2 == z^2]

This program reads: take 10 triples and print them. It declares triples as a list (square brackets mean a list) of triples (x, y, z), where (the vertical bar reads “where”) z is an element of the list of integers from 1 to infinity, x is from 1 to z, y is from x to z, and the sum of squares of x and y is equal to the square of z. This notation is called “list comprehension” and is characteristic of Haskell terseness.

You see the difference? Haskell let’s you abstract the notion of the list of Pythagorean triples so you can operate on it as one entity, whereas in C (or, for that matter, in C++) we were not able to disentangle the different, orthogonal, aspects of the program.

The key advantage of Haskell in this case is its ability to deal with infinite lists. And this ability comes from Haskell’s inherent laziness. Things are never evaluated in Haskell until they are absolutely needed. In the program above, it was the call to print that forced Haskell to actually do some work: take 10 elements from the list of triples. Since the triples weren’t there yet, it had to calculate them, but only as many as were requested and not a number more.


We’ll start with the most basic building block of laziness: a suspended function. Here’s the first naive attempt:

template<class T>
class Susp {
    explicit Susp(std::function<T()> f)
        : _f(f)
    T get() { return _f(); }
    std::function<T()> _f;

We often create suspensions using lambda functions, as in:

int x = 2;
int y = 3;
Susp<int> sum([x, y]() { return x + y; });
int z = sum.get();

Notice that the suspended lambda may capture variables from its environment: here x and y. A lambda, and therefore a suspension, is a closure.

The trouble with this implementation is that the function is re-executed every time we call get. There are several problems with that: If the function is not pure, we may get different values each time; if the function has side effects, these may happen multiple times; and if the function is expensive, the performance will suffer. All these problems may be addressed by memoizing the value.

Here’s the idea: The first time the client calls get we should execute the function and store the returned value in a member variable. Subsequent calls should go directly to that variable. We could implement this by setting a Boolean flag on the first call and then checking it on every subsequent call, but there’s a better implementation that uses thunks.

A thunk is a pointer to a free function taking a suspension (the this pointer) and returning a value (by const reference). The get method simply calls this thunk, passing it the this pointer.

Initially, the thunk is set to thunkForce, which calls the method setMemo. This method evaluates the function, stores the result in _memo, switches the thunk to thunkGet, and returns the memoized value. On subsequent calls get goes through the getMemo thunk which simply returns the memoized value.

template<class T>
class Susp
    // thunk
    static T const & thunkForce(Susp * susp) {
        return susp->setMemo();
    // thunk
    static T const & thunkGet(Susp * susp) {
        return susp->getMemo();
    T const & getMemo() {
        return _memo;
    T const & setMemo() {
        _memo = _f();
        _thunk = &thunkGet;
        return getMemo();
    explicit Susp(std::function<T()> f)
        : _f(f), _thunk(&thunkForce), _memo(T())
    T const & get() {
        return _thunk(this);
    T const & (*_thunk)(Susp *);
    mutable T   _memo;

    std::function<T()> _f;

(By the way, the function pointer declaration of _thunk looks pretty scary in C++, doesn’t it?)

[Edit: I decided to remove the discussion of the thread safe implementation since it wasn't ready for publication. The current implementation is not thread safe.]

You can find a lot more detail about the Haskell implementation of suspended functions in the paper by Tim Harris, Simon Marlow, and Simon Peyton Jones, Haskell on a Shared-Memory Multiprocessor.

Lazy Stream

The loop we used to produce Pythagorean triples in C worked on the push principle — data was pushed towards the sink. If we want to deal with infinite lists, we have to use the pull principle. It should be up to the client to control the flow of data. That’s the inversion of control I was talking about in the introduction.

We’ll use a lazy list and call it a stream. In C++ a similar idea is sometimes expressed in terms of input and forward iterators, although it is understood that an iterator itself is not the source or the owner of data — just an interface to one. So we’ll stick with the idea of a stream.

We’ll implement the stream in the functional style as a persistent data structure fashioned after persistent lists (see my series of blog post on persistent data structures). It means that a stream, once constructed, is never modified. To “advance” the stream, we’ll have to create a new one by calling the const method pop_front.

Let’s start with the definition: A stream is either empty or it contains a suspended cell. This immediately suggests the implementation as a (possibly null) pointer to a cell. Since the whole stream is immutable, the cell will be immutable too, so it’s perfectly safe to share it between copies of the stream. We can therefore use a shared pointer:

template<class T>
class Stream
    std::shared_ptr <Susp<Cell<T>>> _lazyCell;

Of course, because of reference counting and memoization, the stream is only conceptually immutable and, in the current implementation, not thread safe.

So what’s in the Cell? Remember, we want to be able to generate infinite sequences, so Stream must contain the DNA for not only producing the value of type T but also for producing the offspring — another (lazy) Stream of values. The Cell is just that: A value and a stream.

template<class T>
class Cell
    Cell() {} // need default constructor for memoization
    Cell(T v, Stream<T> const & tail)
        : _v(v), _tail(tail)
    explicit Cell(T v) : _v(v) {}
    T val() const {
        return _v;
    Stream<T> pop_front() const {
        return _tail;
    T _v;
    Stream<T> _tail;

This mutually recursive pair of data structures works together amazingly well.

template<class T>
class Stream
    std::shared_ptr <Susp<Cell<T>>> _lazyCell;
    Stream() {}
    Stream(std::function<Cell<T>()> f)
        : _lazyCell(std::make_shared<Susp<Cell<T>>>(f))
    Stream(Stream && stm)
        : _lazyCell(std::move(stm._lazyCell))
    Stream & operator=(Stream && stm)
        _lazyCell = std::move(stm._lazyCell);
        return *this;
    bool isEmpty() const
        return !_lazyCell;
    T get() const
        return _lazyCell->get().val();
    Stream<T> pop_front() const
        return _lazyCell->get().pop_front();

There are several things worth pointing out. The two constructors follow our formal definition of the Stream: one constructs an empty stream, the other constructs a suspended Cell. A suspension is created from a function returning Cell.

I also added a move constructor and a move assignment operator for efficiency. We’ll see it used in the implementation of forEach.

The magic happens when we call get for the first time. That’s when the suspended Cell comes to life. The value and the new stream are produced and memoized for later use. Or, this may happen if the first call is to pop_front. Notice that pop_front is a const method — the Stream itself is immutable. The method returns a new stream that encapsulates the rest of the sequence.

Let’s get our feet wet by constructing a stream of integers from n to infinity. The constructor of a Stream takes a function that returns a Cell. We’ll use a lambda that captures the value of n. It creates a Cell with that value and a tail, which it obtains by calling intsFrom with n+1:

Stream<int> intsFrom(int n)
    return Stream<int>([n]()
        return Cell<int>(n, intsFrom(n + 1)); 

It’s a recursive definition, but without the usual recursive function calls that eat up the stack. The call to the inner intsFrom is not made from the outer intsFrom. Instead it’s made the first time get is called on the emerging Stream.

Of course, we can also create finite streams, like this one, which produces integers from n to m:

Stream<int> ints(int n, int m)
    if (n > m)
        return Stream<int>();
    return Stream<int>([n, m]()
        return Cell<int>(n, ints(n + 1, m));

The trick is to capture the limit m as well as the recursion variable n. When the limit is reached, we simply return an empty Stream.

We’ll also need the method take, which creates a Stream containing the first n elements of the original stream:

Stream take(int n) const {
    if (n == 0 || isEmpty())
        return Stream();
    auto cell = _lazyCell;
    return Stream([cell, n]()
        auto v = cell->get().val();
        auto t = cell->get().pop_front();
        return Cell<T>(v, t.take(n - 1));

Here we are capturing the suspended cell and use it to lazily generate the elements of the new, truncated, Stream. Again, the key to understanding why this works is to keep in mind that Streams and Cells are conceptually immutable, and therefore can be shared by the implementation. This has some interesting side effects, which don’t influence the results, but change the performance. For instance, if the caller of take forces the evaluation of the first n elements — e.g., by passing them through the consuming forEach below — these elements will appear miraculously memoized in the original Stream.

Finally, we’ll need some way to iterate through streams. Here’s an implementation of forEach that consumes the stream while enumerating it and feeding its elements to a function.

template<class T, class F>
void forEach(Stream<T> strm, F f)
    while (!strm.isEmpty())
        strm = strm.pop_front();

It’s the assignment:

strm = strm.pop_front();

which consumes the stream by decreasing the reference count of the head of the Stream. In particular, if you pass an rvalue Stream to forEach, its elements will be generated and deleted in lockstep. The algorithm will use constant memory, independent of the virtual length of the Stream. What Haskell accomplishes with garbage collection, we approximate in C++ with reference counting and shared_ptr.

Working with Streams

It’s not immediately obvious how to translate our Pythagorean triple program from nested loops to lazy streams, so we’ll have to take inspiration from the corresponding Haskell program. Let me first rewrite it using a slightly different notation:

triples = do
    z <- [1..]
    x <- [1..z]
    y <- [x..z]
    guard (x^2 + y^2 == z^2)
    return (x, y, z)

The general idea is this: Start with the stream of integers from 1 to infinity. For every such integer — call it z — create a stream from 1 to z. For each of those — call them x — create a stream from x to z. Filter out those which don’t satisfy the Pythagorean constraint. Finally, output a stream of tuples (x, y, z).

So far we’ve learned how to create a stream of integers — we have the function intsFrom. But now we’ll have to do something for each of these integers. We can’t just enumerate those integers and apply a function to each, because that would take us eternity. So we need a way to act on each element of a stream lazily.

In functional programming this is called mapping a function over a list. In general, a parameterized data structure that can be mapped over is called a functor. I’m going to show you that our Stream is a functor.

Stream as a Functor

The idea is simple: we want to apply a function to each element of a stream to get a new transformed stream (it’s very similar to the std::transform algorithm from STL). The catch is: We want to do it generically and lazily.

To make the algorithm — we’ll call it fmap — generic, we have to parameterize it over types. The algorithm starts with a Stream of elements of type T and a function from T to some other type U. The result should be a stream of U.

We don’t want to make U the template argument, because then the client would have to specify it explicitly. We want the compiler to deduce this type from the type of the function. We want, therefore, the function type F to be the parameter of our template (this will also allow us to call it uniformly with function pointers, function objects, and lambdas):

template<class T, class F>
auto fmap(Stream<T> stm, F f)

Without the use of concepts, we have no way of enforcing, or even specifying, that F be a type of a function from T to U. The best we can do is to statically assert it inside the function:

static_assert(std::is_convertible<F, std::function<U(T)>>::value,
        "fmap requires a function type U(T)");

But what is U? We can get at it using decltype:


Notice that decltype takes, as an argument, an expression that can be statically typed. Here, the expression is a function call of f. We also need a dummy argument for this function: we use the result of stm.get(). The argument to decltype is never evaluated, but it is type-checked at compile time.

One final problem is how to specify the return type of fmap. It’s supposed to be Stream<U>, but we don’t know U until we apply decltype to the arguments of fmap. We have to use the new auto function declaration syntax of C++11. So here are all the type-related preliminaries:

template<class T, class F>
auto fmap(Stream<T> stm, F f)->Stream<decltype(f(stm.get()))>
    using U = decltype(f(stm.get()));
    static_assert(std::is_convertible<F, std::function<U(T)>>::value,
        "fmap requires a function type U(T)");

Compared to that, the actual implementation of fmap seems rather straightforward:

    if (stm.isEmpty()) return Stream<U>();
    return Stream<U>([stm, f]()
        return Cell<U>(f(stm.get()), fmap(stm.pop_front(), f));

In words: If the stream is empty, we’re done — return an empty stream. Otherwise, create a new stream by suspending a lambda function. That function captures the original stream (by value) and the function f, and returns a Cell. That cell contains the value of f acting on the first element of the original stream, and a tail. The tail is created with fmap acting on the rest of the original stream.

Equipped with fmap, we can now attempt to take the first step towards generating our triples: apply the function ints(1, z) to each element of the stream intsFrom(1):

fmap(intsFrom(1), [](int z)
    return ints(1, z);

The result is a Stream of Streams of integers of the shape:

1 2
1 2 3
1 2 3 4
1 2 3 4 5

But now we are stuck. We’d like to apply ints(x, z) to each element of that sequence, but we don’t know how to get through two levels of Stream. Our fmap can only get through one layer. We need a way to flatten a Stream of Streams. That functionality is part of what functional programmers call a monad. So let me show you that Stream is indeed a monad.

Stream as a Monad

If you think of a Stream as a list, the flattening of a list of lists is just concatenation. Suppose for a moment that we know how to lazily concatenate two Streams (we’ll get to it later) and let’s implement a function mjoin that concatenates a whole Stream of Streams.

You might have noticed a pattern in the implementation of lazy functions on streams. We use some kind of recursion, which starts with “Are we done yet?” If not, we do an operation that involves one element of the stream and the result of a recursive call to the function itself.

The “Are we done yet?” question usually involves testing for an empty stream. But here we are dealing with a Stream of Streams, so we have to test two levels deep. This way we’ll ensure that the concatenation of a Stream of empty Streams immediately returns an empty Stream.

The recursive step in mjoin creates a Cell whose element is the head of the first stream, and whose tail is the concatenation of the tail of the first stream and the result of mjoin of the rest of the streams:

template<class T>
Stream<T> mjoin(Stream<Stream<T>> stm)
    while (!stm.isEmpty() && stm.get().isEmpty())
        stm = stm.pop_front();
    if (stm.isEmpty()) return Stream<T>();
    return Stream<T>([stm]()
        Stream<T> hd = stm.get();
        return Cell<T>( hd.get()
                      , concat(hd.pop_front(), mjoin(stm.pop_front())));

The combination of fmap and mjoin lets us compose function like intsFrom or ints that return Streams. In fact, this combination is so common that it deserves its own function, which we’ll call mbind:

template<class T, class F>
auto mbind(Stream<T> stm, F f) -> decltype(f(stm.get()))
    return mjoin(fmap(stm, f));

If we use mbind in place of fmap:

mbind(intsFrom(1), [](int z)
    return ints(1, z);

we can produce a flattened list:

1 1 2 1 2 3 1 2 3 4...

But it’s not just the list: Each element of the list comes with variables that are defined in its environment — here the variable z. We can keep chaining calls to mbind and capture more variables in the process:

mbind(intsFrom(1), [](int z)
    return mbind(ints(1, z), [z](int x)
        return mbind(ints(x, z), [x, z](int y)

At this point we have captured the triples x, y, z, and are ready for the Pythagorean testing. But before we do it, let’s define two additional functions that we’ll use later.

The first one is mthen which is a version of mbind that takes a function of no arguments. The idea is that such a function will be executed for each element of the stream, but it won’t use the value of that element. The important thing is that the function will not be executed when the input stream is empty. In that case, mthen will return an empty stream.

We implement mthen using a slightly modified version of fmap that takes a function f of no arguments:

template<class T, class F>
auto fmapv(Stream<T> stm, F f)->Stream<decltype(f())>
    using U = decltype(f());
    static_assert(std::is_convertible<F, std::function<U()>>::value,
        "fmapv requires a function type U()");

    if (stm.isEmpty()) return Stream<U>();
    return Stream<U>([stm, f]()
        return Cell<U>(f(), fmapv(stm.pop_front(), f));

We plug it into the definition of mthen the same way fmap was used in mbind:

template<class T, class F>
auto mthen(Stream<T> stm, F f) -> decltype(f())
    return mjoin(fmapv(stm, f));

The second useful function is mreturn, which simply turns a value of any type into a one-element Stream:

template<class T>
Stream<T> mreturn(T v)
    return Stream<T>([v]() {
        return Cell<T>(v);

We’ll need mreturn to turn our triples into Streams.

It so happens that a parameterized type equipped with mbind and mreturn is called a monad (it must also satisfy some additional monadic laws, which I won’t talk about here). Our lazy Stream is indeed a monad.

Stream as a Monoid and a Monad Plus

When implementing mjoin we used the function concat to lazily concatenate two Streams. Its implementation follows the same recursive pattern we’ve seen so many times:

template<class T>
Stream<T> concat( Stream<T> lft
                , Stream<T> rgt)
    if (lft.isEmpty())
        return rgt;
    return Stream<T>([=]()
        return Cell<T>(lft.get(), concat<T>(lft.pop_front(), rgt));

What’s interesting is that the concatenation of streams puts them under yet another well known functional pattern: a monoid. A monoid is equipped with a binary operation, just like concat, which must be associative and possess a unit element. It’s easy to convince yourself that concatenation of Streams is indeed associative, and that the neutral element is an empty Stream. Concatenating an empty Stream, whether in front or in the back of any other Stream, doesn’t change the original Stream.

What’s even more interesting is that being a combination of a monoid and a monad makes Stream into a monad plus, and every monad plus defines a guard function — exactly what we need for the filtering of our triples. This function takes a Boolean argument and outputs a Stream. If the Boolean is false, the Stream is empty (the unit element of monad plus!), otherwise it’s a singleton Stream. We really don’t care what value sits in this Stream — we never use the result of guard for anything but the flow of control. In Haskell, there is a special “unit” value () — here I use a nullptr as its closest C++ analog.

Stream<void*> guard(bool b)
    if (b) return Stream<void*>(nullptr);
    else return Stream<void*>();

We can now pipe the result of guard into mthen, which will ignore the content of the Stream but won’t fire when the Stream is empty. When the Stream is not empty, we will call mreturn to output a singleton Stream with the result tuple:

Stream<std::tuple<int, int, int>> triples()
    return mbind(intsFrom(1), [](int z)
        return mbind(ints(1, z), [z](int x)
            return mbind(ints(x, z), [x, z](int y)
                return mthen(guard(x*x + y*y == z*z), [x, y, z]()
                    return mreturn(std::make_tuple(x, y, z));

These singletons will then be concatenated by the three levels of mbind to create one continuous lazy Stream of Pythagorean triples.

Compare this function with its Haskell counterpart:

triples = do
    z <- [1..]
    x <- [1..z]
    y <- [x..z]
    guard (x^2 + y^2 == z^2)
    return (x, y, z)

Now, the client can take 10 of those triples from the Stream — and the triples still won’t be evaluated!. It’s the consuming forEach that finally forces the evaluation:

void test()
    auto strm = triples().take(10);
    forEach(std::move(strm), [](std::tuple<int, int, int> const & t)
        std::cout << std::get<0>(t) << ", " 
                  << std::get<1>(t) << ", " 
                  << std::get<2>(t) << std::endl;


The generation of Pythagorean triples is a toy example, but it shows how lazy evaluation can be used to restructure code in order to make it more reusable. You can use the same function triples to print the values in one part of your program and draw triangles in another. You can filter the triples or impose different termination conditions. You can use the same trick to generate an infinite set of approximation to the solution of a numerical problem, and then use different strategies to truncate it. Or you can create an infinite set of animation frames, and so on.

The building blocks of laziness are also reusable. I have used them to implement the solution to the eight-queen problem and a conference scheduling program. Once they made thread safe, the combinators that bind them are thread safe too. This is, in general, the property of persistent data structures.

You might be concerned about the performance of lazy data structures, and rightly so. They use the heap heavily, so memory allocation and deallocation is a serious performance bottleneck. There are many situation, though, where code structure, reusability, maintenance, and correctness (especially in multithreaded code) are more important than performance. And there are some problems that might be extremely hard to express without the additional flexibility gained from laziness.

I made the sources to all code in this post available on GitHub.

[If you prefer, you may watch the video of my talk on this topic (here are the slides).]

If you thought you were safe from functional programming in your cozy C++ niche, think again! First the lambdas and function objects and now the monad camouflaged as std::future. But do not despair, it’s all just patterns. You won’t find them in the Gang of Four book, but once you see them, they will become obvious.

Let me give you some background: I was very disappointed with the design of C++11 std::future. I described my misgivings in: Broken Promises — C++0x futures. I also made a few suggestions as how to fix it: Futures Done Right. Five years went by and, lo and behold, a proposal to improve std::future and related API, N3721, was presented to the Standards Committee for discussion. I thought it would be a no brainer, since the proposal was fixing obvious holes in the original design. A week ago I attended the meetings of the C++ Standards Committee in Issaquah — since it was within driving distance from me — and was I in for a surprise! Apparently some design patterns that form the foundation of functional programming are not obvious to everybody. So now I find myself on the other side of the discussion and will try to explain why the improved design of std::future is right.

Design arguments are not easy. You can’t mathematically prove that one design is better than another, or a certain set of abstractions is better than another — unless you discover some obvious design flaws in one of them. You might have a gut feeling that a particular solution is elegant, but how do you argue about elegance?

Thankfully, when designing a library, there are some well known and accepted criteria. The most important ones, in my mind, are orthogonality, a.k.a., separation of concerns, and composability. It also helps if the solution has been previously implemented and tested, especially in more than one language. I will argue that this is indeed the case with the extended std::future design. In the process, I will describe some programming patterns that might be new to C++ programmers but have been tried and tested in functional languages. They tend to pop up more and more in imperative languages, especially in connection with concurrency and parallelism.

The Problem

In a nutshell, the problem that std::future is trying to solve is that of returning the result of a computation that’s being performed in parallel, or returning the result of an asynchronous call. For instance, you start a computation in a separate thread (or a more general execution agent) and you want to, at some point in time, get back the result of that computation. This is one of the simplest models of concurrency: delegating the execution of a function (a closure) to another thread.

To return a value from one thread to another you need some kind of a communication channel. One thread puts a value in the channel, another picks it up. Instead of providing one channel abstraction, as does ML or Haskell, C++11 splits it into two separate abstractions: the promise and the future. The promise is the push end of the channel, the future is the pull end. (In Rust there are similar objects called Chan and Port.)

The general pattern is for the client to construct a promise, get the future from it using get_future, and start a thread, passing it the promise. When the thread is done, it puts the result in the promise using set_value. In the meanwhile, the calling thread may do some other work and eventually decide to retrieve the result from the future by calling its method get. If the promise has been fulfilled, get returns immediately with the value, otherwise it blocks until the value is available.

This pattern involves some boilerplate code dealing with the promise side of things, so the Standard introduced a shortcut called std::async to simplify it. You call std::async with a plain function (closure) and its result is automatically put into a hidden promise. All the client sees is the future side of the channel. (I am simplifying things by ignoring exception handling and various modes of starting async.)

The Functor Pattern

Here’s the first abstraction: A future is an object that encapsulates a value. By itself, this would be a pretty useless abstraction unless the encapsulation came with some other functionality or restriction. For instance, std::unique_ptr encapsulates a value, but also manages the lifetime of the memory it occupies. A future encapsulates a value, but you might have to block to get it. Functional languages have a very useful pattern for just this kind of situation: the functor pattern (not to be confused with the C++ misnomer for a function object). A functor encapsulates a value of an arbitrary type, plus it lets you act on it with a function.

Notice that the functor doesn’t necessarily give you access to the value — instead it lets you modify it. The beauty of it is that, in the case of a future, a functor gives you the means to modify the value that potentially isn’t there yet — and it lets you do it without blocking. Of course, behind the scenes, the function (closure) that you provide is stored in the future and only applied when the value is ready and is being accessed using get.

The first part of the fix that was proposed to the Committee was to turn std::future into a functor. Technically, this is done by adding a new method, then:

template<typename F>
auto future::then(F&& func) -> future<decltype(func(*this))>;

This method takes a function object func to be applied to the future in question. The result is a new future of the type that is returned by the function object, decltype(func(*this)).

Things are slightly muddled by the fact that a future not only encapsulates the value to be calculated but also the possibility of an exception. This is why the function passed to then takes the whole future, from which it can extract the value using get, which at that point is guaranteed not to block, but may rethrow an exception. There is an additional proposal N3865 to introduce another method, next, that would deal only with the value, not the exception. The advantage of next is that it could be called with a regular function unaware of the existence of futures, with no additional boilerplate. For simplicity, I’ll be using next in what follows.

The functor pattern makes perfect sense for composing a regular function on top of an asynchronous function (one returning a future), but it’s more general than that. Any time you have an object that is parameterized by an arbitrary type, you might be dealing with a functor. In C++, that would be a template class that doesn’t impose any restrictions on its template argument. Most containers have this property. In order for a generic class to be a functor it must also support a means to operate on its contents. Most containers in STL provide this functionality through the algorithm std::transform. For an imperative programmer it might come as a surprise that such disparate things as futures and containers fall under the same functional pattern — a functor.

Unlike in functional languages, in C++ there is no natural reusable expression for the functor pattern, so it’s more of the pattern in the head of the programmer. For instance, because of memory management considerations, std::transform operates on iterators rather than containers — the storage for the target container must be either pre-allocated or allocated on demand through iterator adapters. One could try to provide iterator adapters for futures, so they could be operated on by std::transform, but ultimately the transformation has to act on the internals of the future (i.e., store the function object in it) so it either has to be a method or a friend of the future.

The Monad Pattern

The functor pattern is not enough to provide full composability for futures. The likely scenario is that the user creates a library of future-returning functions, each performing a specific task. He or she then needs the means to combine such functions into more complex tasks. This is, for instance, the case when combining asynchronous operations, such as opening a file and then reading from it. Suppose we have the async_open function that returns a file handle future:

future<HANDLE> async_open(string &);

and the async_read function that takes a file handle and returns a future with the buffer filled with data:

future<Buffer> async_read(HANDLE fh);

If you combine the two using next, the result will be a future of a future:

future<future<Buffer>> ffBuf = async_open("foo").next(&async_read);

In order to continue chaining such calls without blocking — for instance to asynchronously process the buffer — you need a way to collapse the double future to a single future and then call next on it.

The collapsing method, unwrap, is another part of the extended future proposal. When called on a future<future<T>> it returns future<T>. It lets you chain asynchronous functions using next followed by unwrap.


In functional programming such a collapsing function is called join. The combination next followed by unwrap (or, in Haskell, fmap followed by join) is so common that it has its own name, bind (in Haskell it’s the operator >>=). It might make sense to make bind another method of future (possibly under a different name). [Edit: In fact, the proposal (n3721) is to overload then to automatically perform unwrap whenever the result is a future of a future. This way then would also work as bind.]

There’s one more important usage pattern: a function that may execute asynchronously, but sometimes returns the result immediately. This often happens in recursive algorithms, when the recursion bottoms up. For instance, a parallel tree traversal function may spawn asynchronous tasks for traversing the children of a node, but when it reaches a leaf, it might want to return the result synchronously. Instead of writing complicated conditional code at each level, it’s easier to provide a “fake” future whose contents is immediately available — whose get method never blocks. Such fake future and the function that creates it called make_ready_future are also part of the proposal.

Together, the methods next (or then) and unwrap, and the function make_ready_future are easily recognizable by a functional programmer as forming the monad pattern (in Haskell, they would be called, respectively, fmap, join, and return). It’s a very general pattern for composing functions that return encapsulated values. Using a monad you may work with such functions directly, rather than unwrapping their results at every step. In the case of futures, this is an important issue, since the “unwrapping” means making a potentially blocking call to get and losing precious opportunities for parallelism. You want to set up as much computation up front and let the system schedule the most advantageous execution.

Combining functions using next, unwrap (or, equivalently, bind), and make_ready_future is equivalent to specifying data dependencies between computations and letting the runtime explore opportunities for parallelism between independent computations.

The Applicative Pattern

The combinators then and next are designed for linear composition: the output of one computation serves as the input for another. A more general pattern requires the combining of multiple asynchronous sources of data. In functional programming the problem would be described as applying a function to multiple arguments, hence the name “applicative” pattern. A functional programmer would take a multi-argument function and “lift” it to accept futures instead of immediate values.

As expected, in imperative programming things are a little messier. You have to create a barrier for all the input futures, retrieve the values, and then pass them to the multi-argument function or algorithm. The proposal contains a function called when_all that implements the first part of the process — the barrier. It takes either a pair of iterators to a container of futures or a variable number of futures, and returns a future that fires when all the arguments are ready. Conceptually, it performs a logical AND of all input futures.

The iterator version of when_all returns a future of a vector of futures, while the variadic version returns a future of a tuple of futures. It’s up to the client to get the resulting vector or tuple and iterate over it. Because of that, it’s not possible to directly chain the results of when_all the way then or next does it.

If you’re wondering how this kind of chaining is done in a functional language, you have to understand what partial application is. A function of many arguments doesn’t have to be applied to all of the arguments at once. You can imagine that applying it to the first argument doesn’t yield a value but rather a function on n-1 arguments. In C++11, this can be accomplished by calling std::bind, which takes a multi-parameter function and a value of the first argument, and returns a function object (a closure) that takes the remaining n-1 arguments (actually, you may pass it more than one argument at a time).

In this spirit, you could bind a multi-parameter function to a single future and get a future of a function of n-1 arguments. Then you are left with the problem of applying a future of a function to a future of an argument, and that’s exactly what the applicative pattern is all about. In Haskell, the Applicative class defines the operator <*> that applies an encapsulated function to an encapsulated value.

The Monoid Pattern

A very common pattern is to start several computations in parallel and pick the one that finishes first. This is the basis of speculative computation, where you pitch several algorithms against each other. Or you might be waiting for any of a number of asynchronous events, and attend to them as soon as they happen.

At a minimum you would expect a combinator that acts like a logical OR of two futures. A functional programmer would be immediately on the lookout for the monoid pattern. A monoid is equipped with a binary operation and a unit element. If the binary operation on futures picks the one that finishes first, what should the unit future be? A unit combined with any element must give back that same element. Therefore we need a future that would lose the race with any other future. We could call this special future “never.” Calling get on such a future would block forever.

In practice, one could slightly relax the definition of the “never” future. It would never return a result, but it could still throw an exception. A future like this could be used to implement a timeout. Pitching it against another future would either let the other future complete, or result in a timeout exception.

This is not the way the future extension proposal went, though. The proposed combinator is called when_any and it takes either a pair of iterators to a container of futures or a variable number of futures. It returns a future of either a vector or a tuple of futures. It’s up to the client to iterate over those futures and find the one (or the ones) that fired by calling is_ready on each of them.

The advantage of this approach is that the client may still write code to wait for the remaining futures to finish. The disadvantage is that the client is responsible for writing a lot of boilerplate code, which will obscure the program logic.

Performance and Programming Considerations

An objection to using futures as the main vehicle for asynchronous programming was raised in N3896: Library Foundations for Asynchronous Operations. The point it that it’s possible for an asynchronous API to have a result ready before the client had the opportunity to provide the continuation by calling then (or next). This results in unnecessary synchronization, which may negatively impact performance.

The alternative approach is to pass the continuation (the handler) directly to the asynchronous API. This is how a lot of asynchronous APIs are implemented at the lowest level anyway. The two approaches don’t exclude each other, but supporting both at the same time, as proposed in N3896, adds a lot of complexity to the programming model.

From the programmer’s perspective, the continuation passing model of N3896 is probably the hardest to use. The programming model is that of a state machine, with the client responsible for writing handlers for every transition.

Futures provide a useful abstraction by reifying the anticipated values. The programmer can write code as if the values were there. Futures also provide a common language between concurrent, parallel, and asynchronous worlds. It doesn’t matter if a value is to be evaluated by spawning a thread, creating a lightweight execution agent, or by calling an asynchronous API, as long as it’s encapsulated in a future. The compositional and expressional power of futures is well founded in major patterns of functional programming: the functor, the monad, the applicative, and the monoid.

There is another, even more attractive programming model that’s been proposed for C++, Resumable Functions, which makes asynchronous code look more like sequential code. This is based on a trick that’s well known to Haskell programmers in the form of the “do” notation. In C++, a resumable function would be chopped by the compiler into a series of continuations separated by await keywords. Instead of creating a future and calling then with a lambda function, the programmer would insert await and continue writing code as if the value were available synchronously.


I’d like to thank Artur Laksberg for reading the draft of this blog and providing useful feedback.

In my previous post I worked on stretching the intuition of what a container is. I proposed that, in Haskell, any functor may be interpreted as some kind of container, including the hard cases like the state functor or the IO functor. I also talked about natural transformations between functors as “repackaging” schemes for containers, which work without “breaking the eggs” — not looking inside the elements stored in the container. Continuing with this analogy: Algebras are like recipes for making omelets.

The intuition is that an algebra provides a way to combine elements stored inside a container. This cannot be done for arbitrary types because there is no generic notion of “combining.” So an algebra is always defined for a specific type. For instance, you can define an algebra for numbers because you know how to add or multiply them, or for strings because you can concatenate them, and so on. The way elements are combined by an algebra is in general driven by the structure of the container itself.

For example, think of an expression tree as a container.

data Expr a = Const a 
            | Add (Expr a) (Expr a) 
            | Mul (Expr a) (Expr a)

We could define many algebras for it. An integer algebra would work on an expression tree that stores integers. A complex algebra would work on a tree that stores complex numbers. A Boolean algebra would work on Boolean expressions using, for instance, logical OR to evaluate the Add node and logical AND for the Mul node. You could even define an algebra of sets with union and intersection for Add and Mul. In fact, in the absence of any additional requirements, any pair of binary functions acting on a given type will do.

The definition of an algebra for a given functor f consists of a type t called the carrier type and a function called the action. Any Haskell algebra is therefore of the type:

newtype Algebra f t = Algebra (f t -> t)

More abstractly, in category theory, an algebra (or, more precisely, an F-algebra) for an endofunctor F is a pair (A, alg) of an object A and a morphism alg : F A -> A. As always, the standard translation from category theory to Haskell replaces objects with types and morphisms with functions.

Let’s have a look at a simple example of an algebra. Let’s pick the list functor and define an Int algebra for it, for instance:

sumAlg :: Algebra [] Int
sumAlg = Algebra (foldr (+) 0)

Despite its simplicity, this example leads to some interesting observations.

First, the use of foldr tells us that it’s possible to handle recursion separately from evaluation. The evaluation is really parameterized here by the function (+) and the value, zero. The algebra is type-specific. On the other hand, foldr is fully polymorphic. It turns out that there is another algebra hidden in this example, and it’s determined just by (+) and zero. We’ll see that more clearly when we talk about fixed points of functors.

The second observation is that a list is not only a functor but also a monad. Is there something special about algebras for a monad? We’ll see.

Algebras and Fixed Points

I wrote a whole blog post about F-algebras with a more categorical slant. Here I’ll elaborate on the Haskell aspects of algebras and develop some more intuitions.

A recursive container is not only a functor but it can also be defined as a fixed point of a functor. So, really, we should start with a double functor, parameterized by two types, a and b:

data ExprF a b = Const a
               | Add b b
               | Mul b b
     deriving Functor

We can then find its fixed point: a type that, when substituted for b, will give back itself. Think of a functor as a TV camera (sorry for switching metaphors). When you point it at some type b, its image appears in all the little monitors where b is on the right hand side of the definition. We all know what happens when you point the camera back at the monitors — you get the ever receding image within image within image… That’s your fixed point.

This “pointing of the camera at the monitors” can be abstracted into a Haskell data structure. It is parameterized by a functor f, which provides the camera and the monitors. The fixed point is given by the ever receding:

newtype Fix f = In (f (Fix f))

Notice that, on the left hand side, f appears without an argument. If f a is a container of a then f by itself is a recipe for creating a container from any type. Fix takes such a recipe and applies it to itself — to (Fix f).

Later we’ll also need the deconstructor, unIn:

unIn :: Fix f -> f (Fix f)
unIn (In x) = x

Going back to our earlier functor, we can apply Fix to it and get back the recursive version of Expr:

type Expr a = Fix (ExprF a)

Here, (ExprF a) is a recipe for stuffing any type b into a simple (non-recursive) container defined by ExprF.

Creating actual expressions using the above definition of Expr is a little awkward, but possible. Here’s one:

testExpr :: Expr Int
testExpr = In $ (In $ (In $ Const 2) `Add` (In $ Const 3)) 
                `Mul` (In $ Const 4)

Knowing that a recursive data type such as (Expr a) is defined in terms of a simpler functor (ExprF a b) means that any recursive algebra for it can be defined in terms of a simpler algebra. For instance, we can define a simple algebra for (ExprF Int) by picking the carrier type Double and the following action:

alg :: ExprF Int Double -> Double
alg (Const i) = fromIntegral i
alg (Add x y) = x + y
alg (Mul x y) = x * y

We can extend this algebra to work on arbitrary recursive expressions of type Expr Int. We’ll call this new recursive algebra alg'. When given an (Expr Int) it will do the following:

  1. Extract the contents of the outer Fix by pattern matching on the consturctor In. The contents is of the type ExprF acting on (Expr Int).
  2. Apply alg' (the recursive one we are just defininig) to this contents. Do this using fmap. Here we are taking advantage of the fact that ExprF is a functor. This application of alg' replaces the children of the expression ExprF with Doubles — the results of their evaluation.
  3. Apply alg to the result of the previous step, which is of the type (ExprF Int Double).

Here’s the code that implements these steps:

alg' :: Fix (ExprF Int) -> Double
alg' (In expr) = alg (fmap alg' expr)

Notice that this code does not depend on the details of the functor. In fact it will work for any functor and any algebra:

cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg = alg . fmap (cata alg) . unIn

This generic function is called a catamorphism. It lets you apply an algebra to the contents of a recursively defined container.

My first example of an algebra was acting on a list. A list can also be defined as a fixed point of a functor:

data ListF a b = Empty | Cons a b
     deriving Functor

If you work out the details, you can convince yourself that the sumAlg I defined earlier is nothing else but the catamorphism for the functor ListF Int applied to the following simple algebra:

alg :: ListF Int Int -> Int
alg Empty = 0
alg (Cons a b) = a + b

Now we understand why any list catamorphism is parameterized by one value and one function of two arguments.

Monads and Algebras

As I said in the beginning, a list is not only a functor but also a monad. A monad adds two special abilities to a functor/container. It lets you create a default container that contains just a given value: The function that does it is called return. And it lets you collapse a container of containers into a single container: That function is called join (and I explained before how it relates to the more commonly used bind, >>=).

When we define an algebra for a functor that happens to be a monad, it would be nice for this algebra to interact sensibly with return and join. For instance, you can apply return to a value of the algebra’s carrier type to obtain a default container of that type. Evaluating such a container should be trivial — it should give you back the same value:

(1) alg . return == id

For instance, in the list monad return creates a singleton list, so we want the algebra to extract the value from a singleton without modifying it in any way.

alg [a] =
(alg . return) a =
id a =

Now let’s consider a container of containers of the carrier type. We have two ways of collapsing it: we can fmap our algebra over it — in other words, evaluate all the sub-containers — or we can join it. Expecting to get the same result in both cases would be asking a lot (but we get something like this in the Kleisli category later). We can demand though that, for an algebra to be compatible with a monad, the two resulting containers at least evaluate to the same thing:

(2) alg . fmap alg == alg . join

Let’s see what this condition means for lists, where join is concatenation. We start with a list of lists and we apply two evaluation strategies to it: We can evaluate the sub-lists and then evaluate the resulting list of results, or we can concatenate the sub-lists and then evaluate the concatenated list.

Guess what, our condition is equivalent to imposing associativity on the algebra. Think of the action of the algebra on a two-element list as some kind of “multiplication.” Since the concatenation of [a, [b, c]] is the same as the concatenation of [[a, b], c], these two must evaluate to the same value. But that’s just associativity of our “multiplication.”

How much can we extend this analogy with multiplication? Can we actually produce a unit element? Of course: The action of the algebra on an empty list:

e = alg []

Let’s check it: Apply our compatibility conditions to the list [[a], []]. This is the left hand side:

(alg . fmap alg) [[a], []] = 
alg [alg [a], alg []] = 
alg [a, e]

And this is the right hand side:

(alg . join) [[a], []] = 
alg [a] = 

So, indeed, e is the right unit of our “multiplication.” You can do the same calculation for [[], [a]] to show that it’s also the left unit.

We have an associative operation equipped with a unit — that’s called a monoid. So any list algebra compatible with the list’s monadic structure defines a monoid.


An F-algebra that’s compatible with a monad (conditions (1) and (2) above), both built on the same functor, is called a T-algebra. I guess that’s because mathematicians replace F with T when they talk about monads. There may be many T-algebras for a given monad and in fact they form a category of their own.

This is not saying much, because requirements for a category are pretty minimal. You have to define arrows: here it would be homomorphisms of T-algebras. A homomorphism of algebras maps one carrier into another in such a way as to preserve the action.

In Haskell, a homomorphism of algebras would just be a function h from one carrier type to another such that:

h    :: A -> B
alg  :: F A -> A
alg' :: F B -> B

h . alg == alg' . fmap h

Here, alg and alg' are the two actions with carrier types A and B, respectively, and F is the functor. What this means is that, if you have a container of As you can evaluate it using alg and then apply h to it and get a B, or you can apply h to the contents of the container using fmap and then evaluate the resulting container of Bs using alg'. The result should be the same in both cases.

This is a pretty standard way of defining homomorphisms for any structure, not just an algebra. Homomorphisms behave like functions: they are composable and there always is an identity homomorphism for every algebra, so they indeed turn T-algebras into a category — the so called Eilenberg-Moore category.

Remember what I said about the compatibility between join and alg? They both take down one layer of containment. Other than that, they are very different: join is a polymorphic natural transformation — it operates on the structure of the container, not its contents. An F-algebra operates on the contents and is defined only for a specific type.

And yet we can use join to define a T-algebra. Just consider using a container as a carrier type. A container is an image of some type a under a functor m which, for our purposes also happens to be a monad. Apply m to it one more time and you get a container of containers. You can “evaluate” this container of containers down to a single container using join.

You have just defined an algebra for the functor m whose carrier type is (m a) and the action is join. In fact, you have defined a whole family of algebras parameterized by the type a. Keep in mind that a is not the carrier type of this algebra, (m a) is. These algebras are called free algebras for the monad m. Guess what, they also form a category — the so called Kleisli category — which is a subcategory of the Eilenberg-Moore category.

Why are these two categories important? Well, it’s a topic for another blog post, but here’s the idea: Suppose you have two functors, F and G, one going from category C to D and the other going back. If G were the inverse of F, we would say that C and D are isomorphic. But what if they were “almost” inverse? For instance, their composition instead of being the identity were somehow mappable to identity. This kind of relationship between functors can be formalized into an adjunction. It so happens that the composition of two adjoint functors forms a monad (or a comonad, if you compose them the other way around). Not only that — any monad may be decomposed into a pair of adjoint functors. There are many ways to perform this decomposition and there are many choices for the intermediate category — the target of F and the source of G. The Kleisli category is the smallest such category and the Eilenberg-Moore category is the largest one.

Anybody who tries to popularize Haskell has to deal with the monad problem. You can’t write the simplest Haskell program without a monad (the IO monad in particular) and yet there is no easy way to explain what a monad is. Mind you, when teaching object oriented programming, nobody starts with a definition of “object.” There’s no need for that because everybody has some intuition about objects — we are surrounded by objects in real life. Which is very convenient because if you tried to define what an object is in C++ or Java, you’d have to use quite a bit of hard-core PL theory; plus you’d have to wave your hands a lot. This trick doesn’t work with monads because, except for category theorists, nobody comes to Haskell with any intuitions about monads. The name monad is scary in itself (this is probably why monads in F# are innocuously called “computation expressions”).

In my n’th attempt to tackle monads I tried a different, mystical, approach. In mysticism you often talk about things that you cannot define. Taoism takes this approach to the extreme — any attempt to define Tao must fail: “Tao that can be described is not the eternal Tao.” Hence the slightly tongue-in-cheek approach I took in this tutorial, Pure Functions, Laziness, I/O, and Monads. If you’re new to Haskell, it might help if you read the two previous tutorials in my Basics of Haskell. Notice that these tutorials are in the new format introduced by the School of Haskell, where you can run and edit program snippets directly in your browser.

I must confess that this tutorial got slightly cold welcome in the Haskell community. That’s why I’m curious what non-Haskellers think of it. Does my approach make monads less or more scary? Take into account that I decided to tackle monads right after explaining the very basics of Haskell syntax — in most courses or books monads are treated as an advanced topic.

The videos and the slides of my Boostcon talks, “Haskell — The Pseudocode Language for C++ Metaprogramming” and now available online, courtesy my employer, Corensic. They cover some of the material from the blog series on monads.

“You must be kidding!” would be the expected reaction to “Monads in C++.” Hence my surprise when I was invited to Boostcon 11 to give a three-hour presentation on said topic, a presentation which was met with totally unexpected level of interest. I imagine there was a small uptick in the sales of Haskell books afterwards.

Before I answer the question: “Why are monads relevant in C++?”, let me first answer the question: “Why is Haskell relevant in C++?” It turns out that Haskell makes C++ template metaprogramming if not easy then at least approachable (see my blog, What Does Haskell Have to Do with C++?). You have to understand that compile-time C++ is a strict functional language operating (mostly) on types. If that is not immediately obvious, it’s because the C++ template syntax is so appalling.

Now, not everybody falls in love with the Haskell syntax (at least not at first sight) but it fits the functional paradigm much better than anything else. What’s more, the correspondence between Haskell code and C++ template code is so direct that it could almost be translated mechanically. It really pays to design and test your template metaprograms in Haskell first, before translating them to C++. Conversely, translating complex template code to Haskell may help you understand it better and enable you to maintain and extend it. Those were my reasons to advocate the use of Haskell as a pseudo-language for C++ template metaprogramming.

Armed with Haskell I could study and analyze some of the most sophisticated C++ metacode. But when I looked at Eric Niebler’s Boost Proto library and couldn’t make heads or tails of it, even after discussing it with him over email and over beers, I was stumped.

Monads and EDSLs

Boost Proto is a library for implementing domain-specific languages (DSLs). It’s an impressive piece of C++ metaprogramming but it’s hard to comprehend and it doesn’t yield to straightforward translation to Haskell. The problem is that it combines compile-time constructs with run time code in a non-trivial way. I struggled long with Proto until one day I had an epiphany– I was looking at a monad. But first:

What Is an EDSL?

Great things come out of abusing C++. One example is the abuse of templates (which were originally designed to support parametrized data types) to express compile-time computations. The result is Template Metaprogrammming (TMP). Another example is the abuse of operator overloading, which created a fertile ground for Embedded Domain-Specific Languages (EDSLs). Eric’s Proto combined the two abominations into a library for constructing EDSLs.

In this post I will construct a simple EDSL in C++. This is how it works:

  1. You overload the heck out of operators
  2. The overloaded operators build trees from expressions instead of eagerly computing results
  3. The type of the tree encodes the structure of the expression that created it. Since it’s a type, the structure of the expression is available at compile time
  4. You construct an object of that type
  5. You execute that object (because it is a function object) with appropriate arguments, and get the results you wanted.

A more general EDSL will also create a runtime data structure that stores the values of any variables and literals in the expression. Algorithms can walk the tree at compile time, runtime, or both to compute values and perform actions. You can even write a compile time algorithm to compute a runtime algorithm that munges trees in interesting ways.

My moment of Zen was when I realized that an EDSL corresponds to a Haskell reader monad. (If you’re not familiar with Haskell or monads, read my series, Monads for the Curios Programmer.) To my own amazement this analogy worked and led to executable code.

Haskell Monad C++ “Monad”
Composition of monadic functions Compile-time parsing of expressions
Execution of compound action on astate Application of function object to runtime values

To prove the concept, I picked a simple EDSL based on one of the Eric’s examples. It’s a two-argument lambda EDSL. It let’s you write an expression with two placeholders, arg1 and arg2. Through the magic of templates and operator overloading, this expression is interpreted as an anonymous function of two arguments. Here’s an example which evaluates to 25:

int x = (arg1 * arg2 + 13)(3, 4)

It turns out that the most important step in this process is to be able to convert an expression tree into a function object. Let me do this in Haskell first, and then translate it into C++.

The Expression Monad in Haskell

Let’s start with a short:

Reader Monad Refresher

A reader monad (a state monad with immutable state) provides a way to express stateful computations in a pure functional language. A stateful computation takes some arguments and returns a value, but unlike a function, it makes use of some external state. This computation can be turned into a function that takes the same arguments but, instead of returning a value, returns another function that takes a state as an argument and calculates the value. (The distinction between a computation and a function is essential here.)

Statefull computation represented as a monadic function returning an action


The state, in the case of our expression monad, is a collection of two values — the arguments to the lambda. They will be used to replace the two placeholders in the expression.

Here’s the definition of our state — a list of integers (we really need only two, but I’m being sloppy… I mean general):

type Args = [Integer]


An action is a function that takes the state and produces a value of arbitrary type t:

Args -> t

You might remember from my previous post that one element of the monad is a type constructor: a mapping of an arbitrary type into an enriched type. In this case the enriched type is an action type. Because of technicalities related to Haskell’s type inference, I’ll have to thinly encapsulate an action inside a new type, which I will call Prog:

newtype Prog t = PR (Args -> t)

The idea is that the composition of monadic actions is similar to the compilation of source code: it produces a “program,” which is then “run.” In fact, I’ll define an auxiliary function that does just that, it runs a “program”:

run :: Prog t -> Args -> t
run (PR act) args = act args

This is a function that takes a Prog t as its first argument and pattern matches it to the constructor (PR act). The second argument is the state, args of type Args. The result of running the program is a value of type t.

Monadic Functions

The next step is to define some primitive monadic functions. These are functions that produce actions (or “programs”). Here’s a very useful function that produces an action that extracts the n’th value from the list of arguments (the state):

getArg :: Int -> Prog Integer
getArg n = PR (λ args -> args !! n))

getArg takes an Int (in our case, zero or one) and returns a lambda (encapsulated in a Prog using the constructor PR). This lambda takes a list of Integers, args, and extracts its n’th element (args !! n means: take the n’th element of the list, args).

You’ve seen examples of monadic functions in my previous blog post, but it’s worth repeating the idea: getArg is a function that returns an action that is like a promise to extract the n’th argument when the arguments become available.

Just for fun, here’s another monadic function that takes n and promises to return twice n when the arguments are provided. It doesn’t matter that the arguments are going to be ignored.

doubleIt n = PR (λ args -> 2 * n)

Monadic Bind

I need two monadic functions, getArg and doubleIt, in order to demonstrate their composition using “bind.” I want to create an action that will get the first argument using getArg 0 and double it using doubleIt v. If you remember how “bind” works, it takes an action, in this case the result of getArg 0, and a continuation, which represents “the rest of the computation.”

Combining two monadic functions using bind

In our case, the continuation is a lambda that takes an argument v (that’s the future result of the first action) and performs doubleIt v.

λ v -> doubleIt v

This lambda returns an action because doubleIt v returns an action.

The signature of bind is:

bind :: (Prog a) -> (a -> (Prog b)) -> (Prog b)

You might have noticed that I use the words “action” and “program” interchangeably, although, strictly speaking, an action is the contents of a program. However, this distinction is an artifact or a Haskell quirk — a monad can’t be defined using a type alias, so we need the Prog type to encapsulate the action. Curiously, we won’t have this problem in C++.

The purpose of bind is to glue together an action with a continuation and return a new action. That means that bind has to return a lambda of appropriate type:

bind (PR act) cont =
    PR (λ args -> ... produce value of type b ...)

This lambda will be executed when arguments are available. At that point, with the arguments handy, we’ll be able to first execute the action, act, and pass its result to the continuation. The continuation will return another action, which we can promptly execute too:

bind (PR act) cont =
    PR (λ args ->
        let v = act args
            (PR act') = cont v
            act' args)

(In Haskell you can use primed variables, like act'. I like that notation.)

This is very much like the state monad bind, except that we don’t have to worry about chaining the state, which is immutable. In fact the above is the definition of the reader monad’s bind.

Let’s test our new bind by composing getArg with doubleIt:

test0 :: Prog Integer
test0 =
    bind (getArg 0) (λ v -> doubleIt v)

Composing monadic function, GetArg, with the continuation that calls another monadic function, doubleIt

We can run the program produced by test0 to see that it actually works:

> let prog = test0
> run prog [3,4]
> run prog [11,0]

For completeness, here’s the full definition of the Prog reader monad:

instance Monad Prog where
     return v = PR (λ args -> v)
     (PR act) >>= cont = PR (λ arg -> let v = act arg
                                          (PR act') = cont v
                                      in act' arg)

With that definition, we can rewrite our little example in the do notation:

test1 = do
    v <- getArg 0
    doubleIt v

Expression Tree

We have the definition of a program — a thinly encapsulated action. To create an actual program we need some kind of source code and a compiler. Our source code takes the form of an expression tree. A tree in Haskell is defined as a tagged union:

data Exp = Const Integer
           | Plus Exp Exp
           | Times Exp Exp
           | Arg1 
           | Arg2

The first constructor, Const takes an Integer and creates a leaf node corresponding to this constant value. The next two constructors are recursive, they each take two expressions and produce, correspondingly, a Plus and a Times node. Finally, there are two placeholders for the two arguments that will be provided at “runtime.”


Compilation is turning source code into a program. This process can be represented as a function with the following signature:

compile :: Exp -> Prog Int

It takes an expression tree (the source) and returns a program that evaluates to an integer. Looking from another perspective, compile is just a monadic function in our Prog monad.

We will define compile in little steps driven by the various patterns in the definition of Exp. If the expression matches the Const node, we define compile as:

compile (Const c) = return c

Remember, return is a function that takes a value and returns an action (program) that will produce this value when executed.

Another easy case is Arg1 (and Arg2)–we already have a monadic function getArg that we can call:

compile Arg1 = getArg 0

The interesting case is the Plus (or Times) node. Here we have to recursively call compile for both children and then compose the results. Well, that’s what monadic function composition is for. Here’s the code using the do notation, hiding the calls to bind:

compile (Plus e1 e2) = 
        v1 <- compile e1
        v2 <- compile e2
        return (v1 + v2)

That’s it! We can now compile a little test expression (Arg1 * Arg2 + 13):

testExp =
    let exp = (Plus (Times Arg1 Arg2)(Const 13))
    in compile exp

and we can run the resulting program with an argument list:

> let args = [3, 4]
> let prog = testExp
> run prog args

The Expression Monad in C++

The translation of the Haskell expression monad to C++ is almost embarrassingly easy. I’ll be showing Haskell code side by side with its C++ equivalents.

Expression Tree

data Exp = Const Integer
           | Plus Exp Exp
           | Times Exp Exp
           | Arg1 
           | Arg2

In C++, the expression tree is a compile-time construct that is expressed as a type (I’ll show you later where this type originates from). We have separate types for all the leaves, and the non-leaf nodes are parametrized by the types of their children.

template<int n> struct Const {};

template<class E1, class E2>
struct Plus {};

template<class E1, class E2>
struct Times {};

struct Arg1 {};
struct Arg2 {};

For instance, here’s the type that corresponds to the expression arg1*arg2+13:

Plus<Times<Arg1, Arg2>, Const<13>>



type Args = [Integer]

C++: State is a runtime object. I implemented it using an array of two integers and, for good measure, I added a constructor and an accessor.

struct Args
    Args(int i, int j) {
        _a[0] = i;
        _a[1] = j;
    int operator[](int n) { return _a[n]; }
    int _a[2];


Here’s the tricky part: How to represent an action? Remember, an action takes state, which is now represented by Args, and returns a value of some type. Because Args are only available at runtime, an action must be a runtime function or, even better, a function object.

How does that fit in our compile-time/runtime picture? We want our C++ monadic functions to be “executed” at compile time, but they should produce actions that are executed at runtime. All we can do at compile time is to operate on types, and this is exactly what we’ll do. We will create a new type that is a function object. A function object is a struct that implements an overloading of a function-call operator.

There’s another way of looking at it by extending the notion of a metafunction. In one of my previous posts I described metafunctions that “return” values, types, or even other metafunctions. Here we have a metafunction that returns a (runtime) function. This view fits better the Haskell monadic picture where a monadic function returns an action.

Type Constructor

Unfortunately, not everything can be expressed as neatly in C++ as it is in Haskell. In particular our type constructor:

newtype Prog t = PR (Args -> t)

doesn’t have a direct C++ counterpart. It should be a template that, for any type T, defines an action returning that type. In this construction, an action is represented by a C++ function object, so we would like something like this:

template<class T> struct PR {
    T operator()(Args args);

which, for many reasons, is useless. What we really need is for PR to be a “concept” that specifies a type with an associated method, operator(), of a particular signature. Since concepts are not part of C++11, we’ll have to rely on programming discipline and hope that, if we make a mistake, the compiler error messages will not be totally horrible.

Monadic Metafunction

Let’s start by translating the Haskell monadic function getArg:

getArg :: Int -> Prog Int
getArg n = PR (λ args -> args !! n)

Here it is in C++:

template<int n>
struct GetArg { // instance of the concept PR
    int operator()(Args args) {
        return args[n];

It is a metafunction that takes a compile-time argument n and “returns” an action. Translation: it defines a struct with an overloaded operator() that takes Args and returns an int. Again, ideally this metafunction should be a struct that is constrained by the concept PR.


Let’s look again at the Haskell’s implementation of monadic bind:

bind (PR prog) cont =
    PR (λ args ->
        let v = prog args
            (PR prog') = cont v
            prog' args)

It takes two arguments: an action and a continuation. We know what an action is in our C++ construction: it’s a function object with a particular signature. We’ll parameterize our C++ Bind with the type, P1, of that object. The continuation is supposed to take whatever that action returns and return a new action. The type of that new action, P2, will be the second template parameter of Bind.

We’ll encode the type of the continuation as the standard function object taking an int and returning P2:


Now, the C++ Bind is a metafunction of P1 and P2 that “returns” an action. The act of “returning” an action translates into defining a struct with the appropriate overload of operator(). Here’s the skeleton of this struct:

template<class P1, class P2> // compile-time type parameters
struct Bind {
    Bind(P1 prog, std::function<P2(int)> cont)
        : _prog(prog), _cont(cont)
    P1 _prog;
    std::function<P2(int)> _cont;

Notice that at runtime we will want to construct an object of this type, Bind, and pass it the runtime arguments: the action and the continuation. The role of the constructor requires some explanation. Haskell function bind is a monadic function of two arguments. Its C++ counterpart is a metafunction that takes four arguments: P1 and P2 at compile time, and prog and cont at runtime. This is a general pattern: When constructing a monadic metafunction in C++ we try to push as much as possible into compile time, but some of the arguments might not be available until runtime. In that case we shove them into the constructor.

The interesting part is, of course, the function-call operator, which really looks like a one-to-one translation of the Haskell implementation:

template<class P1, class P2>
struct Bind {
    int operator()(Args args) {
        int v = _prog(args);
        P2 prog2 = _cont(v);
        return prog2(args);

Things to observe: A smart compiler should be able to inline all these calls because it knows the types P1 and P2, so it can look up the implementations of their function-call operators. What’s left to the runtime is just the operations on the actual runtime values, like the ones done inside GetArg::operator(). However, I have been informed by Eric Niebler that many of these optimizations are thwarted by the choice of std::function for the representation of the continuation. This problem can be overcome, but at some loss of clarity of code, so I’ll stick to my original implementation (also, see Appendix 2).


All that is left to complete the monad is the return function. Here it is in Haskell:

return :: a -> Prog a
return v = PR (λ args -> v)

And here it is, as a function object, in C++:

struct Return
    Return(int v) : _v(v) {}
    int operator()(Args args) 
        return _v;
    int _v;

Of course, in full generality, Return should be parameterized by the return type of its operator(), but for this example a simple int will do. The argument v is only available at runtime, so it is passed to the constructor of Return.

The “Compile” Metafunction in C++

Now that we have our monad implemented, we can use it to build complex monadic functions from simple ones (such as GetArg). In Haskell we have implemented a monadic function compile with the following signature:

compile :: Exp -> Prog Int

It takes an expression tree and returns an action that evaluates this tree.

The C++ equivalent is a metafunction that takes a (compile-time) expression tree and defines a struct with the appropriate overload of a function-call operator. Lacking concept support, the latter requirement can’t be directly expressed in C++. We can, and in fact have to, provide a forward declaration of Compile:

template<class Exp>
struct Compile;

In Haskell, we defined compile using pattern matching. We can do the same in C++. Just like we split the Haskel definition of compile into multiple sub-definitions corresponding to different argument patterns, we’ll split the C++ definition into multiple specializations of the general template, Compile.

Here’s the first specialization in Haskell:

compile (Const c) = return c

and in C++:

template<int c>
struct Compile<Const<c>> : Return
    Compile() : Return(c) {}

I could have defined a separate overload of operator() for this case, but it’s simpler to reuse the one in Return.

Here’s another trivial case:

compile Arg1 = getArg 0

translates into:

struct Compile<Arg1> : GetArg<0> {};

The real fun begins with the Plus node, because it involves composition of monadic functions. Here’s the de-sugared Haskell version:

compile (Plus exL exR) = 
  bind compile exL
       λ left -> 
          bind compile exR
               λ right ->
                   return (left + right)

The logic is simple: First we compile the left child and bind the result with the continuation (the lambda) that does the rest. Inside this continuation, we compile the right child and bind the result with the continuation that does the rest. Inside that continuation we add the two results (they are the arguments to the continuation) and encapsulate the sum in an action using return.

In C++, the binding is done by creating the appropriate Bind object and passing it (in the constructor) a function object and a continuation. The function object is the result of the compilation of the left child (we construct a temporary object of this type on the fly):


Just like in Haskell, the continuation is a lambda, only now it’s the C++11 lambda. Here’s the code, still with some holes to be filled later:

template<class L, class R>
struct Compile<Plus<L, R>> {
  int operator()(Args args) 
    return Bind<...> (
      [](int left) {
        return Bind<...>(
          [left](int right) {
             return Return(left + right);

Notice that the second lambda must explicitly capture the local variable, left. In Haskell, this capture was implicit.

The types for the instantiations of the two Bind templates can be easily derived bottom up. Ideally, we would like the compiler to infer them, just like in Haskell, but the C++ compiler is not powerful enough (although, at the cost of muddying the code some more, one can define template functions that return Bind objects of appropriate types — in C++ type inference works for template functions).

Here’s the final implementation with the types filled in:

template<class L, class R>
struct Compile<Plus<L, R>> {
  int operator()(Args args) 
    return Bind<Compile<L>, Bind<Compile<R>, Return>> (
      [](int left) -> Bind<Compile<R>, Return> {
        return Bind<Compile<R>, Return>(
          [left](int right) -> Return {
            return Return(left + right);

It’s quite a handful and, frankly speaking, I would have never been able to understand it, much less write it, if it weren’t for Haskell.

The Test

You can imagine the emotional moment when I finally ran the test and it produced the correct result. I evaluated the simple expression, Arg1*Arg2+13, with arguments 3 and 4 and got back 25. The monad worked!

void main () {
    Args args(3, 4);
    Compile<Plus<Times<Arg1, Arg2>, Const<13>>> act;
    int v = act(args);
    std::cout << v << std::endl;


Now that we have the monad and a monadic function Compile, we can finally build a simple embedded domain-specific language. Our minimalistic goal is to be able to evaluate the following expression:

int x = (arg1 + arg2 * arg2)(3, 4);

The trick is to convince the compiler to construct a very special type that represents the particular expression tree. In our case it should be something like this:

Plus< Arg1, Times<Arg2, Arg2> >

With this type, call it E, the compiler should call a special metafunction we’ll call Lambda, which returns a function object of two integral arguments:

template<class E>
struct Lambda {
    int operator()(int x, int y) {
        Args args(x, y);
        Compile<E> prog;
        return prog(args);

How does one do it? This is really the bread and butter of EDSL writers — pretty elementary stuff — but I’ll explain it anyway. We start by declaring two object, arg1 and arg2:

const Lambda<Arg1> arg1;
const Lambda<Arg2> arg2;

These little objects can infect any expression with their Lambda-ness, and spread the infection with the help of appropriately overloaded arithmetic operators.

For instance, when the compiler sees arg1+arg2, it will look for the overload of operator+ that takes two Lambdas. And we’ll provide this one:

template<class E1, class E2>
Lambda<Plus<E1, E2>> operator+ (Lambda<E1> e1, Lambda<E2> e2)
    return Lambda<Plus<E1, E2>>();

Notice that this operator returns another Lambda, albeit of a more complex type, thus spreading the Lambda-ness even further, and so on. (In this very simplified example I’m ignoring the arguments e1 and e1. In general they would be used to create a runtime version of the expression tree.)

Let’s see how it works in our example:

(arg1 + arg2 * arg2)(3, 4)

The seed types are, respectively,

Lambda<Arg1>,  Lambda<Arg2>, Lambda<Arg2> 

The application of the inner operator* (its definition is left as an exercise to the reader), produces the following type:

Lambda<Times<Arg2, Arg2>>

This type is passed, along with Lambda<Arg1> to operator+, which returns:

Lambda<Plus<Arg1, Times<Arg2, Arg2>>

All this is just type analysis done by the compiler to find out what type of object is returned by the outermost call to operator+.

The next question the compiler asks itself is whether this object can be called like a function, with the two integer arguments. Why, yes! A Lambda object (see above) defines an overload of the function-call operator. The instantiation of this particular Lambda defines the following overload:

int Lambda<Plus<Arg1, Times<Arg2, Arg2>>::operator()(int x, int y) {
     Args args(x, y);
     Compile<Plus<Arg1, Times<Arg2, Arg2>> prog;
     return prog(args);

This function will be called at runtime with the arguments 3 and 4 to produce the expected result, 19.

The code that has to be actually executed at runtime is the call to prog(args), which is mostly a series of Binds, an addition, and a multiplication. Since the implementation of a Bind‘s function-call operators has no flow of control statements (if statements or loops), it can all be inlined by an optimizing compiler (modulo the glitch with std::function I mentioned earlier). So all that’s left is the addition and multiplication. Eric tells me that this is how Proto expressions work, and there is no reason why the monadic version wouldn’t lead to the same kind of performance.

Conculsions and Future Work

I came up with this C++ monadic construction to help me understand the kind of heavy-duty template metaprogramming that goes on in Proto. Whether I have succeeded is a matter of opinion. I exchanged some of the complexity of template manipulations for a different complexity of Haskell and monads. I would argue that any C++ template metaprogrammer should know at least one functional language, so learning Haskell is a good investment. Monads are just part of the same Haskell package.

Assuming you understand monads, is the monadic construction of an EDSL simpler than the original using Proto? I will argue that it is. The implementation of Proto is pretty much monolithic, the concepts it’s using have very little application or meaning outside of Proto. The monadic approach, on the other hand, decomposes into several layers of abstraction. At the bottom you have the construction of the state-, or reader-, monad. The monad exposes just three primitives: the type constructor, bind, and return. With these primitives you can create and compose monadic (meta-) functions — in my example it was the compile metafunction. Finally, you use these metafunctions to build an EDSL.

With this three-layer structure comes a well defined set of customization points. Users may plug in their own type constructors and implement their own Bind and Return. Or, they might use the default monad and just create their own monadic functions. It’s important that the procedure for building and composing monadic functions is well defined and that it uses straight C++ for implementing the logic. Inside a monadic function you can use regular C++ statements and standard control flow devices.

It’s not yet totally clear how general this approach is — after all what I described is a toy example. But there is a lot of interest in re-thinking or maybe even re-implementing Boost Proto and Phoenix in terms of monads. At Boostcon I started working with Joel Falcou and Hartmut Kaiser on this project, and later Eric Niebler and Thomas Heller joined our group. We believe that having solid theoretical foundations might encourage wider acceptance of some of the more complex template libraries. Especially if strict one-to-one correspondence with Haskell code could be established.


I’ve been given so much feedback on my Boostcon presentation and the drafts of this blog post that I should really list the following people as co-authors: Eric Niebler, Thomas Heller, Joel Falcou, and Hartmut Keiser. Thank you guys!

This page has been translated into Spanish language by Maria Ramos.

Appendix 1

The picture wouldn’t be complete if I didn’t provide the translation of the EDSL construction back to Haskell. It’s pretty straightforward, except for the last part where an expression becomes a callable function object. Maybe there is a way to bend Haskell’s syntax to do it directly but, since this is just a proof of concept, I took a shortcut and defined a function toFun which turns a Lambda expression into a function.

newtype Lambda = L Exp

toFun (L ex) =
  λ x y ->
     run (compile ex) [x, y]

instance Num Lambda where
    (L e1) + (L e2) = L (Plus e1 e2)
    (L e1) * (L e2) = L (Times e1 e2)
    fromInteger n = L (Const n)

test =
    let arg1 = L Arg1
        arg2 = L Arg2
        (toFun (arg1 + 2 * arg2 * arg2)) 2 3

The overloading of arithmetic operators is done by making Lambda an instance of the type class Num.

Appendix 2

When writing code at this level of abstraction it’s easy to bump into compiler bugs. For instance, Visual Studio 2010 won’t instantiate this simpler version of Bind that doesn’t use std::function:

template<class P1, class Cont>
struct Bind
    Bind(P1 prog, Cont f)
        : _prog(prog), _f(f)
    int operator()(State args) {
        int v = _prog(args);
        auto p2 = _f(v);
        return p2(args);
    P1 _prog;
    Cont _f; // store a lambda continuation


  1. Bartosz Milewski, Monads for the Curious Programmer:
  2. Bartosz Milewski, What Does Haskell Have to Do with C++?
  3. Eric Niebler, Expressive C++. A series of blog posts that started all this
  4. Brian McNamara and Yannis Smaragdakis, Functional Programming in C++. A Haskell-in-C++ library
  5. Brian McNamara and Yannis Smaragdakis, Syntax sugar for C++: lambda, infix, monads, and more. More traditional approach to monads in C++ as a runtime device
  6. Zoltán Porkoláb, Ábel Sinkovics, Domain-specific Language Integration with Compile-time Parser Generator Library. A C++ implementation of a compile-time parser that uses monadic approach (although the word “monad” is never mentioned in the paper)

I only went to one talk, not because the rest was not interesting, quite the contrary, but because I worked with Joel and Hartmut on rewriting Proto. I think we essentially got it. We have the exrpession monad implemented, my “compile” function turned out to be the equivalent of Proto transform, but with much more flexibility, expression extension produced a little Lambda EDSL with placeholders for arguments and even const terminals. It works like a charm. If you don’t know what I’m talking about, I promise to finish my blog series on monads in C++ real soon now.

The talk I went to was Chris Kohlhoff talking more about Asio, the asynchronous IO library. He was showing how the new features of C++11 make his code simpler, safer, and more flexible without too much effort. In particular he found move semantics extremely helpful in reducing (or, in some cases, eliminating) the need for memory allocation in steady state–an important property when running in an embedded system, for instance. But what I liked most was his approach to solving the inversion of control problem by implementing his own coroutines. Sure, he had to abuse C++ macros, but the code was actually much more readable and reflected the way we think about asynchronous calls.

The idea is that, with coroutines, you write your code in a linear way. You say “read socket asynchronously” and then yield. The flow of control exits the coroutine in the middle, and continues with other tasks. The trick is that the rest of the coroutine becomes your completion handler. When the async call completes, the coroutine is called again, but it starts executing right after the last yield. Your flow of control moves back and forth, but your code looks linear, instead of being fragmented into a bunch of handlers. It makes you wish coroutines were part of the language, as they are, for instance, in C#.

By the way, I caught Hans Boehm while he was waiting for the airport shuttle and asked questions about memory_order_relaxed. You know, the problem is, Can a relaxed load fetch an “out of thin air” value–a value that has never been written by anybody else? What I’m getting now is that in practice this will never happen, but it’s very hard to describe this requirement formally. In other words, Can a malicious chip manufacturer in cahoots with a compiler writer come up with a system that fulfills the letter of the C++ memory model and lets you fetch an out-of-thin-air value? I think the answer is yes, because the language of the Standard is purposely vague on this topic:

(29.3.11) [ Note: The requirements do allow r1 == r2 == 42 in the following example, with x and y initially zero:

// Thread 1:
r1 = x.load(memory_order_relaxed);
if (r1 == 42), memory_order_relaxed);
// Thread 2:
r2 = y.load(memory_order_relaxed);
if (r2 == 42), memory_order_relaxed);

However, implementations should not allow such behavior.—end note ]

Next Page »