[If you prefer, you may watch the video of my talk on this topic (here are the slides).]

If you thought you were safe from functional programming in your cozy C++ niche, think again! First the lambdas and function objects and now the monad camouflaged as std::future. But do not despair, it’s all just patterns. You won’t find them in the Gang of Four book, but once you see them, they will become obvious.

Let me give you some background: I was very disappointed with the design of C++11 std::future. I described my misgivings in: Broken Promises — C++0x futures. I also made a few suggestions as how to fix it: Futures Done Right. Five years went by and, lo and behold, a proposal to improve std::future and related API, N3721, was presented to the Standards Committee for discussion. I thought it would be a no brainer, since the proposal was fixing obvious holes in the original design. A week ago I attended the meetings of the C++ Standards Committee in Issaquah — since it was within driving distance from me — and was I in for a surprise! Apparently some design patterns that form the foundation of functional programming are not obvious to everybody. So now I find myself on the other side of the discussion and will try to explain why the improved design of std::future is right.

Design arguments are not easy. You can’t mathematically prove that one design is better than another, or a certain set of abstractions is better than another — unless you discover some obvious design flaws in one of them. You might have a gut feeling that a particular solution is elegant, but how do you argue about elegance?

Thankfully, when designing a library, there are some well known and accepted criteria. The most important ones, in my mind, are orthogonality, a.k.a., separation of concerns, and composability. It also helps if the solution has been previously implemented and tested, especially in more than one language. I will argue that this is indeed the case with the extended std::future design. In the process, I will describe some programming patterns that might be new to C++ programmers but have been tried and tested in functional languages. They tend to pop up more and more in imperative languages, especially in connection with concurrency and parallelism.

The Problem

In a nutshell, the problem that std::future is trying to solve is that of returning the result of a computation that’s being performed in parallel, or returning the result of an asynchronous call. For instance, you start a computation in a separate thread (or a more general execution agent) and you want to, at some point in time, get back the result of that computation. This is one of the simplest models of concurrency: delegating the execution of a function (a closure) to another thread.

To return a value from one thread to another you need some kind of a communication channel. One thread puts a value in the channel, another picks it up. Instead of providing one channel abstraction, as does ML or Haskell, C++11 splits it into two separate abstractions: the promise and the future. The promise is the push end of the channel, the future is the pull end. (In Rust there are similar objects called Chan and Port.)

The general pattern is for the client to construct a promise, get the future from it using get_future, and start a thread, passing it the promise. When the thread is done, it puts the result in the promise using set_value. In the meanwhile, the calling thread may do some other work and eventually decide to retrieve the result from the future by calling its method get. If the promise has been fulfilled, get returns immediately with the value, otherwise it blocks until the value is available.

This pattern involves some boilerplate code dealing with the promise side of things, so the Standard introduced a shortcut called std::async to simplify it. You call std::async with a plain function (closure) and its result is automatically put into a hidden promise. All the client sees is the future side of the channel. (I am simplifying things by ignoring exception handling and various modes of starting async.)

The Functor Pattern

Here’s the first abstraction: A future is an object that encapsulates a value. By itself, this would be a pretty useless abstraction unless the encapsulation came with some other functionality or restriction. For instance, std::unique_ptr encapsulates a value, but also manages the lifetime of the memory it occupies. A future encapsulates a value, but you might have to block to get it. Functional languages have a very useful pattern for just this kind of situation: the functor pattern (not to be confused with the C++ misnomer for a function object). A functor encapsulates a value of an arbitrary type, plus it lets you act on it with a function.

Notice that the functor doesn’t necessarily give you access to the value — instead it lets you modify it. The beauty of it is that, in the case of a future, a functor gives you the means to modify the value that potentially isn’t there yet — and it lets you do it without blocking. Of course, behind the scenes, the function (closure) that you provide is stored in the future and only applied when the value is ready and is being accessed using get.

The first part of the fix that was proposed to the Committee was to turn std::future into a functor. Technically, this is done by adding a new method, then:

template<typename F>
auto future::then(F&& func) -> future<decltype(func(*this))>;

This method takes a function object func to be applied to the future in question. The result is a new future of the type that is returned by the function object, decltype(func(*this)).

Things are slightly muddled by the fact that a future not only encapsulates the value to be calculated but also the possibility of an exception. This is why the function passed to then takes the whole future, from which it can extract the value using get, which at that point is guaranteed not to block, but may rethrow an exception. There is an additional proposal N3865 to introduce another method, next, that would deal only with the value, not the exception. The advantage of next is that it could be called with a regular function unaware of the existence of futures, with no additional boilerplate. For simplicity, I’ll be using next in what follows.

The functor pattern makes perfect sense for composing a regular function on top of an asynchronous function (one returning a future), but it’s more general than that. Any time you have an object that is parameterized by an arbitrary type, you might be dealing with a functor. In C++, that would be a template class that doesn’t impose any restrictions on its template argument. Most containers have this property. In order for a generic class to be a functor it must also support a means to operate on its contents. Most containers in STL provide this functionality through the algorithm std::transform. For an imperative programmer it might come as a surprise that such disparate things as futures and containers fall under the same functional pattern — a functor.

Unlike in functional languages, in C++ there is no natural reusable expression for the functor pattern, so it’s more of the pattern in the head of the programmer. For instance, because of memory management considerations, std::transform operates on iterators rather than containers — the storage for the target container must be either pre-allocated or allocated on demand through iterator adapters. One could try to provide iterator adapters for futures, so they could be operated on by std::transform, but ultimately the transformation has to act on the internals of the future (i.e., store the function object in it) so it either has to be a method or a friend of the future.

The Monad Pattern

The functor pattern is not enough to provide full composability for futures. The likely scenario is that the user creates a library of future-returning functions, each performing a specific task. He or she then needs the means to combine such functions into more complex tasks. This is, for instance, the case when combining asynchronous operations, such as opening a file and then reading from it. Suppose we have the async_open function that returns a file handle future:

future<HANDLE> async_open(string &);

and the async_read function that takes a file handle and returns a future with the buffer filled with data:

future<Buffer> async_read(HANDLE fh);

If you combine the two using next, the result will be a future of a future:

future<future<Buffer>> ffBuf = async_open("foo").next(&async_read);

In order to continue chaining such calls without blocking — for instance to asynchronously process the buffer — you need a way to collapse the double future to a single future and then call next on it.

The collapsing method, unwrap, is another part of the extended future proposal. When called on a future<future<T>> it returns future<T>. It lets you chain asynchronous functions using next followed by unwrap.


In functional programming such a collapsing function is called join. The combination next followed by unwrap (or, in Haskell, fmap followed by join) is so common that it has its own name, bind (in Haskell it’s the operator >>=). It might make sense to make bind another method of future (possibly under a different name). [Edit: In fact, the proposal (n3721) is to overload then to automatically perform unwrap whenever the result is a future of a future. This way then would also work as bind.]

There’s one more important usage pattern: a function that may execute asynchronously, but sometimes returns the result immediately. This often happens in recursive algorithms, when the recursion bottoms up. For instance, a parallel tree traversal function may spawn asynchronous tasks for traversing the children of a node, but when it reaches a leaf, it might want to return the result synchronously. Instead of writing complicated conditional code at each level, it’s easier to provide a “fake” future whose contents is immediately available — whose get method never blocks. Such fake future and the function that creates it called make_ready_future are also part of the proposal.

Together, the methods next (or then) and unwrap, and the function make_ready_future are easily recognizable by a functional programmer as forming the monad pattern (in Haskell, they would be called, respectively, fmap, join, and return). It’s a very general pattern for composing functions that return encapsulated values. Using a monad you may work with such functions directly, rather than unwrapping their results at every step. In the case of futures, this is an important issue, since the “unwrapping” means making a potentially blocking call to get and losing precious opportunities for parallelism. You want to set up as much computation up front and let the system schedule the most advantageous execution.

Combining functions using next, unwrap (or, equivalently, bind), and make_ready_future is equivalent to specifying data dependencies between computations and letting the runtime explore opportunities for parallelism between independent computations.

The Applicative Pattern

The combinators then and next are designed for linear composition: the output of one computation serves as the input for another. A more general pattern requires the combining of multiple asynchronous sources of data. In functional programming the problem would be described as applying a function to multiple arguments, hence the name “applicative” pattern. A functional programmer would take a multi-argument function and “lift” it to accept futures instead of immediate values.

As expected, in imperative programming things are a little messier. You have to create a barrier for all the input futures, retrieve the values, and then pass them to the multi-argument function or algorithm. The proposal contains a function called when_all that implements the first part of the process — the barrier. It takes either a pair of iterators to a container of futures or a variable number of futures, and returns a future that fires when all the arguments are ready. Conceptually, it performs a logical AND of all input futures.

The iterator version of when_all returns a future of a vector of futures, while the variadic version returns a future of a tuple of futures. It’s up to the client to get the resulting vector or tuple and iterate over it. Because of that, it’s not possible to directly chain the results of when_all the way then or next does it.

If you’re wondering how this kind of chaining is done in a functional language, you have to understand what partial application is. A function of many arguments doesn’t have to be applied to all of the arguments at once. You can imagine that applying it to the first argument doesn’t yield a value but rather a function on n-1 arguments. In C++11, this can be accomplished by calling std::bind, which takes a multi-parameter function and a value of the first argument, and returns a function object (a closure) that takes the remaining n-1 arguments (actually, you may pass it more than one argument at a time).

In this spirit, you could bind a multi-parameter function to a single future and get a future of a function of n-1 arguments. Then you are left with the problem of applying a future of a function to a future of an argument, and that’s exactly what the applicative pattern is all about. In Haskell, the Applicative class defines the operator <*> that applies an encapsulated function to an encapsulated value.

The Monoid Pattern

A very common pattern is to start several computations in parallel and pick the one that finishes first. This is the basis of speculative computation, where you pitch several algorithms against each other. Or you might be waiting for any of a number of asynchronous events, and attend to them as soon as they happen.

At a minimum you would expect a combinator that acts like a logical OR of two futures. A functional programmer would be immediately on the lookout for the monoid pattern. A monoid is equipped with a binary operation and a unit element. If the binary operation on futures picks the one that finishes first, what should the unit future be? A unit combined with any element must give back that same element. Therefore we need a future that would lose the race with any other future. We could call this special future “never.” Calling get on such a future would block forever.

In practice, one could slightly relax the definition of the “never” future. It would never return a result, but it could still throw an exception. A future like this could be used to implement a timeout. Pitching it against another future would either let the other future complete, or result in a timeout exception.

This is not the way the future extension proposal went, though. The proposed combinator is called when_any and it takes either a pair of iterators to a container of futures or a variable number of futures. It returns a future of either a vector or a tuple of futures. It’s up to the client to iterate over those futures and find the one (or the ones) that fired by calling is_ready on each of them.

The advantage of this approach is that the client may still write code to wait for the remaining futures to finish. The disadvantage is that the client is responsible for writing a lot of boilerplate code, which will obscure the program logic.

Performance and Programming Considerations

An objection to using futures as the main vehicle for asynchronous programming was raised in N3896: Library Foundations for Asynchronous Operations. The point it that it’s possible for an asynchronous API to have a result ready before the client had the opportunity to provide the continuation by calling then (or next). This results in unnecessary synchronization, which may negatively impact performance.

The alternative approach is to pass the continuation (the handler) directly to the asynchronous API. This is how a lot of asynchronous APIs are implemented at the lowest level anyway. The two approaches don’t exclude each other, but supporting both at the same time, as proposed in N3896, adds a lot of complexity to the programming model.

From the programmer’s perspective, the continuation passing model of N3896 is probably the hardest to use. The programming model is that of a state machine, with the client responsible for writing handlers for every transition.

Futures provide a useful abstraction by reifying the anticipated values. The programmer can write code as if the values were there. Futures also provide a common language between concurrent, parallel, and asynchronous worlds. It doesn’t matter if a value is to be evaluated by spawning a thread, creating a lightweight execution agent, or by calling an asynchronous API, as long as it’s encapsulated in a future. The compositional and expressional power of futures is well founded in major patterns of functional programming: the functor, the monad, the applicative, and the monoid.

There is another, even more attractive programming model that’s been proposed for C++, Resumable Functions, which makes asynchronous code look more like sequential code. This is based on a trick that’s well known to Haskell programmers in the form of the “do” notation. In C++, a resumable function would be chopped by the compiler into a series of continuations separated by await keywords. Instead of creating a future and calling then with a lambda function, the programmer would insert await and continue writing code as if the value were available synchronously.


I’d like to thank Artur Laksberg for reading the draft of this blog and providing useful feedback.

In my previous post I worked on stretching the intuition of what a container is. I proposed that, in Haskell, any functor may be interpreted as some kind of container, including the hard cases like the state functor or the IO functor. I also talked about natural transformations between functors as “repackaging” schemes for containers, which work without “breaking the eggs” — not looking inside the elements stored in the container. Continuing with this analogy: Algebras are like recipes for making omelets.

The intuition is that an algebra provides a way to combine elements stored inside a container. This cannot be done for arbitrary types because there is no generic notion of “combining.” So an algebra is always defined for a specific type. For instance, you can define an algebra for numbers because you know how to add or multiply them, or for strings because you can concatenate them, and so on. The way elements are combined by an algebra is in general driven by the structure of the container itself.

For example, think of an expression tree as a container.

data Expr a = Const a 
            | Add (Expr a) (Expr a) 
            | Mul (Expr a) (Expr a)

We could define many algebras for it. An integer algebra would work on an expression tree that stores integers. A complex algebra would work on a tree that stores complex numbers. A Boolean algebra would work on Boolean expressions using, for instance, logical OR to evaluate the Add node and logical AND for the Mul node. You could even define an algebra of sets with union and intersection for Add and Mul. In fact, in the absence of any additional requirements, any pair of binary functions acting on a given type will do.

The definition of an algebra for a given functor f consists of a type t called the carrier type and a function called the action. Any Haskell algebra is therefore of the type:

newtype Algebra f t = Algebra (f t -> t)

More abstractly, in category theory, an algebra (or, more precisely, an F-algebra) for an endofunctor F is a pair (A, alg) of an object A and a morphism alg : F A -> A. As always, the standard translation from category theory to Haskell replaces objects with types and morphisms with functions.

Let’s have a look at a simple example of an algebra. Let’s pick the list functor and define an Int algebra for it, for instance:

sumAlg :: Algebra [] Int
sumAlg = Algebra (foldr (+) 0)

Despite its simplicity, this example leads to some interesting observations.

First, the use of foldr tells us that it’s possible to handle recursion separately from evaluation. The evaluation is really parameterized here by the function (+) and the value, zero. The algebra is type-specific. On the other hand, foldr is fully polymorphic. It turns out that there is another algebra hidden in this example, and it’s determined just by (+) and zero. We’ll see that more clearly when we talk about fixed points of functors.

The second observation is that a list is not only a functor but also a monad. Is there something special about algebras for a monad? We’ll see.

Algebras and Fixed Points

I wrote a whole blog post about F-algebras with a more categorical slant. Here I’ll elaborate on the Haskell aspects of algebras and develop some more intuitions.

A recursive container is not only a functor but it can also be defined as a fixed point of a functor. So, really, we should start with a double functor, parameterized by two types, a and b:

data ExprF a b = Const a
               | Add b b
               | Mul b b
     deriving Functor

We can then find its fixed point: a type that, when substituted for b, will give back itself. Think of a functor as a TV camera (sorry for switching metaphors). When you point it at some type b, its image appears in all the little monitors where b is on the right hand side of the definition. We all know what happens when you point the camera back at the monitors — you get the ever receding image within image within image… That’s your fixed point.

This “pointing of the camera at the monitors” can be abstracted into a Haskell data structure. It is parameterized by a functor f, which provides the camera and the monitors. The fixed point is given by the ever receding:

newtype Fix f = In (f (Fix f))

Notice that, on the left hand side, f appears without an argument. If f a is a container of a then f by itself is a recipe for creating a container from any type. Fix takes such a recipe and applies it to itself — to (Fix f).

Later we’ll also need the deconstructor, unIn:

unIn :: Fix f -> f (Fix f)
unIn (In x) = x

Going back to our earlier functor, we can apply Fix to it and get back the recursive version of Expr:

type Expr a = Fix (ExprF a)

Here, (ExprF a) is a recipe for stuffing any type b into a simple (non-recursive) container defined by ExprF.

Creating actual expressions using the above definition of Expr is a little awkward, but possible. Here’s one:

testExpr :: Expr Int
testExpr = In $ (In $ (In $ Const 2) `Add` (In $ Const 3)) 
                `Mul` (In $ Const 4)

Knowing that a recursive data type such as (Expr a) is defined in terms of a simpler functor (ExprF a b) means that any recursive algebra for it can be defined in terms of a simpler algebra. For instance, we can define a simple algebra for (ExprF Int) by picking the carrier type Double and the following action:

alg :: ExprF Int Double -> Double
alg (Const i) = fromIntegral i
alg (Add x y) = x + y
alg (Mul x y) = x * y

We can extend this algebra to work on arbitrary recursive expressions of type Expr Int. We’ll call this new recursive algebra alg'. When given an (Expr Int) it will do the following:

  1. Extract the contents of the outer Fix by pattern matching on the consturctor In. The contents is of the type ExprF acting on (Expr Int).
  2. Apply alg' (the recursive one we are just defininig) to this contents. Do this using fmap. Here we are taking advantage of the fact that ExprF is a functor. This application of alg' replaces the children of the expression ExprF with Doubles — the results of their evaluation.
  3. Apply alg to the result of the previous step, which is of the type (ExprF Int Double).

Here’s the code that implements these steps:

alg' :: Fix (ExprF Int) -> Double
alg' (In expr) = alg (fmap alg' expr)

Notice that this code does not depend on the details of the functor. In fact it will work for any functor and any algebra:

cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg = alg . fmap (cata alg) . unIn

This generic function is called a catamorphism. It lets you apply an algebra to the contents of a recursively defined container.

My first example of an algebra was acting on a list. A list can also be defined as a fixed point of a functor:

data ListF a b = Empty | Cons a b
     deriving Functor

If you work out the details, you can convince yourself that the sumAlg I defined earlier is nothing else but the catamorphism for the functor ListF Int applied to the following simple algebra:

alg :: ListF Int Int -> Int
alg Empty = 0
alg (Cons a b) = a + b

Now we understand why any list catamorphism is parameterized by one value and one function of two arguments.

Monads and Algebras

As I said in the beginning, a list is not only a functor but also a monad. A monad adds two special abilities to a functor/container. It lets you create a default container that contains just a given value: The function that does it is called return. And it lets you collapse a container of containers into a single container: That function is called join (and I explained before how it relates to the more commonly used bind, >>=).

When we define an algebra for a functor that happens to be a monad, it would be nice for this algebra to interact sensibly with return and join. For instance, you can apply return to a value of the algebra’s carrier type to obtain a default container of that type. Evaluating such a container should be trivial — it should give you back the same value:

(1) alg . return == id

For instance, in the list monad return creates a singleton list, so we want the algebra to extract the value from a singleton without modifying it in any way.

alg [a] =
(alg . return) a =
id a =

Now let’s consider a container of containers of the carrier type. We have two ways of collapsing it: we can fmap our algebra over it — in other words, evaluate all the sub-containers — or we can join it. Expecting to get the same result in both cases would be asking a lot (but we get something like this in the Kleisli category later). We can demand though that, for an algebra to be compatible with a monad, the two resulting containers at least evaluate to the same thing:

(2) alg . fmap alg == alg . join

Let’s see what this condition means for lists, where join is concatenation. We start with a list of lists and we apply two evaluation strategies to it: We can evaluate the sub-lists and then evaluate the resulting list of results, or we can concatenate the sub-lists and then evaluate the concatenated list.

Guess what, our condition is equivalent to imposing associativity on the algebra. Think of the action of the algebra on a two-element list as some kind of “multiplication.” Since the concatenation of [a, [b, c]] is the same as the concatenation of [[a, b], c], these two must evaluate to the same value. But that’s just associativity of our “multiplication.”

How much can we extend this analogy with multiplication? Can we actually produce a unit element? Of course: The action of the algebra on an empty list:

e = alg []

Let’s check it: Apply our compatibility conditions to the list [[a], []]. This is the left hand side:

(alg . fmap alg) [[a], []] = 
alg [alg [a], alg []] = 
alg [a, e]

And this is the right hand side:

(alg . join) [[a], []] = 
alg [a] = 

So, indeed, e is the right unit of our “multiplication.” You can do the same calculation for [[], [a]] to show that it’s also the left unit.

We have an associative operation equipped with a unit — that’s called a monoid. So any list algebra compatible with the list’s monadic structure defines a monoid.


An F-algebra that’s compatible with a monad (conditions (1) and (2) above), both built on the same functor, is called a T-algebra. I guess that’s because mathematicians replace F with T when they talk about monads. There may be many T-algebras for a given monad and in fact they form a category of their own.

This is not saying much, because requirements for a category are pretty minimal. You have to define arrows: here it would be homomorphisms of T-algebras. A homomorphism of algebras maps one carrier into another in such a way as to preserve the action.

In Haskell, a homomorphism of algebras would just be a function h from one carrier type to another such that:

h    :: A -> B
alg  :: F A -> A
alg' :: F B -> B

h . alg == alg' . fmap h

Here, alg and alg' are the two actions with carrier types A and B, respectively, and F is the functor. What this means is that, if you have a container of As you can evaluate it using alg and then apply h to it and get a B, or you can apply h to the contents of the container using fmap and then evaluate the resulting container of Bs using alg'. The result should be the same in both cases.

This is a pretty standard way of defining homomorphisms for any structure, not just an algebra. Homomorphisms behave like functions: they are composable and there always is an identity homomorphism for every algebra, so they indeed turn T-algebras into a category — the so called Eilenberg-Moore category.

Remember what I said about the compatibility between join and alg? They both take down one layer of containment. Other than that, they are very different: join is a polymorphic natural transformation — it operates on the structure of the container, not its contents. An F-algebra operates on the contents and is defined only for a specific type.

And yet we can use join to define a T-algebra. Just consider using a container as a carrier type. A container is an image of some type a under a functor m which, for our purposes also happens to be a monad. Apply m to it one more time and you get a container of containers. You can “evaluate” this container of containers down to a single container using join.

You have just defined an algebra for the functor m whose carrier type is (m a) and the action is join. In fact, you have defined a whole family of algebras parameterized by the type a. Keep in mind that a is not the carrier type of this algebra, (m a) is. These algebras are called free algebras for the monad m. Guess what, they also form a category — the so called Kleisli category — which is a subcategory of the Eilenberg-Moore category.

Why are these two categories important? Well, it’s a topic for another blog post, but here’s the idea: Suppose you have two functors, F and G, one going from category C to D and the other going back. If G were the inverse of F, we would say that C and D are isomorphic. But what if they were “almost” inverse? For instance, their composition instead of being the identity were somehow mappable to identity. This kind of relationship between functors can be formalized into an adjunction. It so happens that the composition of two adjoint functors forms a monad (or a comonad, if you compose them the other way around). Not only that — any monad may be decomposed into a pair of adjoint functors. There are many ways to perform this decomposition and there are many choices for the intermediate category — the target of F and the source of G. The Kleisli category is the smallest such category and the Eilenberg-Moore category is the largest one.

A heap is a great data structure for merging and sorting data. It’s implemented as a tree with the special heap property: A parent node is always less or equal than its children nodes, according to some comparison operator. In particular, the top element of the heap is always its smallest element. To guarantee quick retrieval and insertion, the tree doesn’t necessarily have to be well balanced. A leftist heap, for instance, is lopsided, with left branches always larger or equal to right branches.

The invariant of the leftist heap is expressed in terms of its right spines. The right spine of a tree is its rightmost path. Its length is called the rank of the tree. In a leftist heap the rank of the right child is always less or equal to the rank of the left child — the tree is leaning left. Because of that, the rank can grow at most logarithmically with the number of elements.

Leftist heap with ranks and spines. Ranks take into account empty leaf nodes, not shown.

Leftist heap with ranks and spines. Ranks take into account empty leaf nodes, not shown.

You can always merge two heaps by merging their right spines because they are just sorted linked lists. Since the right spines are at most logarithmically long, the merge can be done in logarithmic time. Moreover, it’s always possible to rotate nodes in the merged path to move heavier branches to the left and thus restore the leftist property.

With merging thus figured out, deletion from the top and insertion are trivial. After removing the top, you just merge left and right children. When inserting a new element, you create a singleton heap and merge it with the rest.


The implementation of the functional leftist heap follows the same pattern we’ve seen before. We start with the definition:

A heap can either be empty or consist of a rank, a value, and two children: left and right heaps.

Let’s start with the definition of a non-empty heap as a private structure inside the Heap class:

template<class T>
class Heap
    struct Tree
        Tree(T v) : _rank(1), _v(v) {}
        Tree(int rank
            , T v
            , std::shared_ptr<const Tree> const & left
            , std::shared_ptr<const Tree> const & right)
        : _rank(rank), _v(v), _left(left), _right(right)

        int _rank;
        T   _v;
        std::shared_ptr<const Tree> _left;
        std::shared_ptr<const Tree> _right;
    std::shared_ptr<Tree> _tree;

Heap data is just the shared_ptr<Tree>. An empty shared_ptr encodes and empty heap, otherwise it points to a non-empty Tree.

We’ll make the constructor of a non-empty heap private, because not all combinations of its arguments create a valid heap — see the two assertions:

Heap(T x, Heap const & a, Heap const & b)
    assert(a.isEmpty() || x <= a.front());
    assert(b.isEmpty() || x <= b.front());
    // rank is the length of the right spine
    if (a.rank() >= b.rank())
        _tree = std::make_shared<const Tree>(
                b.rank() + 1, x, a._tree, b._tree);
        _tree = std::make_shared<const Tree>(
                a.rank() + 1, x, b._tree, a._tree);

We’ll make sure these assertions are true whenever we call this constructor from inside Heap code. This constructor guarantees that, as long as the two arguments are leftist heaps, the result is also a leftist heap. It also calculates the rank of the resulting heap by adding one to the rank of its right, shorter, branch. We’ll set the rank of an empty heap to zero (see implementation of rank).

As always with functional data structures, it’s important to point out that the construction takes constant time because the two subtrees are shared rather than copied. The sharing is thread-safe because, once constructed, the heaps are always immutable.

The clients of the heap will need an empty heap constructor:

Heap() {}

A singleton constructor might come in handy too:

explicit Heap(T x) : _tree(std::make_shared(x)) {}

They will need a few accessors as well:

bool isEmpty() const { return !_tree; }
int rank() const {
    if (isEmpty()) return 0;
    else return _tree->_rank;

The top, smallest, element is accessed using front:

T front() const { return _tree->_v; }

As I explained, the removal of the top element is implemented by merging left and right children:

Heap pop_front() const {
    return merge(left(), right()); 

Again, this is a functional data structure, so we don’t mutate the original heap, we just return the new heap with the top removed. Because of the sharing, this is a cheap operation.

The insertion is also done using merging. We merge the original heap with a singleton heap:

Heap insert(T x) {
    return merge(Heap(x), *this);

The workhorse of the heap is the recursive merge algorithm below:

static Heap merge(Heap const & h1, Heap const & h2)
    if (h1.isEmpty())
        return h2;
    if (h2.isEmpty())
        return h1;
    if (h1.front() <= h2.front())
        return Heap(h1.front(), h1.left(), merge(h1.right(), h2));
        return Heap(h2.front(), h2.left(), merge(h1, h2.right()));

If neither heap is empty, we compare the top elements. We create a new heap with the smaller element at the top. Now we have to do something with the two children of the smaller element and the other heap. First we merge the right child with the other heap. This is the step I mentioned before: the merge follows the right spines of the heaps, guaranteeing logarithmic time. The left child is then combined with the result of the merge. Notice that the Heap constructor will automatically rotate the higher-rank tree to the left, thus keeping the leftist property. The code is surprisingly simple.

You might wonder how come we are not worried about the trees degenerating — turning into (left leaning) linked lists. Consider, however, that such a linked list, because of the heap property, would always be sorted. So the retrieval of the smallest element would still be very fast and require no restructuring. Insertion of an element smaller than the existing top would just prepend it to the list — a very cheap operation. Finally, the insertion of a larger element would turn this element into a length-one right spine — the right child of the top of the linked list. The degenerate case is actually our best case.

Turning an unsorted list of elements into a heap could naively be done in O(N*log(N)) time by inserting the elements one by one. But there is a better divide-and-conquer algorithm that does it in O(N) time (the proof that it’s O(N) is non-trivial though):

template<class Iter>
static Heap heapify(Iter b, Iter e)
    if (b == e)
        return Heap();
    if (e - b == 1)
        return Heap(*b);
        Iter mid = b + (e - b) / 2;
        return merge(heapify(b, mid), heapify(mid, e));

This function is at the core of heap sort: you heapify a list and then extract elements from the top one by one. Since the extraction takes O(log(N)) time, you end up with a sort algorithm with the worst case performance O(N*long(N)). On average, heapsort is slower than quicksort, but quicksort’s worst case performance is O(N2), which might be a problem in some scenarios.

For an outsider, Haskell is full of intimidating terms like functor, monad, applicative, monoid… These mathematical abstractions are hard to explain to a newcomer. The internet is full of tutorials that try to simplify them with limited success.

The most common simplification you hear is that a functor or a monad is like a box or a container. Indeed, a list is a container and a functor, Maybe is like a box, but what about functions? Functions from a fixed type to an arbitrary type define both a functor and a monad (the reader monad). More complex functions define the state and the continuation monads (all these monads are functors as well). I used to point these out as counterexamples to the simplistic picture of a functor as a container. Then I had an epiphany: These are containers!

So here’s the plan: I will first try to convince you that a functor is the purest expression of containment. I’ll follow with progressively more complex examples. Then I’ll show you what natural transformations really are and how simple the Yoneda lemma is in terms of containers. After functors, I’ll talk about container interpretation of pointed, applicative, and monad. I will end with a new twist on the state monad.

What’s a Container?

What is a container after all? We all have some intuitions about containers and containment but if you try to formalize them, you get bogged down with tricky cases. For instance, can a container be infinite? In Haskell you can easily define the list of all integers or all Pythagorean triples. In non-lazy language like C++ you can fake infinite containers by defining input iterators. Obviously, an infinite container doesn’t physically contain all the data: it generates it on demand, just like a function does. We can also memoize functions and tabulate their values. Is the hash table of the values of the sin function a container or a function?

The bottom line is that there isn’t that much of a difference between containers and functions.

What characterizes a container is its ability to contain values. In a strongly typed language, these values have types. The type of elements shouldn’t matter, so it’s natural to describe a generic container as a mapping of types — element type to container type. A truly polymorphic container should not impose any constraints on the type of values it contains, so it is a total function from types to types.

It would be nice to be able to generically describe a way to retrieve values stored in a container, but each container provides its own unique retrieval protocol. A retrieval mechanism needs a way to specify the location from which to retrieve the value and a protocol for failure. This is an orthogonal problem and, in Haskell, it is addressed by lenses.

It would also be nice to be able to iterate over, or enumerate the contents of a container, but that cannot be done generically either. You need at least to specify the order of traversal. Even the simplest list can be traversed forwards or backwards, not to mention pre-, in-, and post-order traversals of trees. This problem is addressed, for instance, by Haskell’s Traversable functors.

But I think there is a deeper reason why we wouldn’t want to restrict ourselves to enumerable containers, and it has to do with infinity. This might sound like a heresy, but I don’t see any reason why we should limit the semantics of a language to countable infinities. The fact that digital computers can’t represent infinities, even those of the countable kind, doesn’t stop us from defining types that have infinite membership (the usual Ints and Floats are finite, because of the binary representation, but there are, for instance, infinitely many lists of Ints). Being able to enumerate the elements of a container, or convert it to a (possibly infinite) list means that it is countable. There are some operations that require countability: witness the Foldable type class with its toList function and Traversable, which is a subclass of Foldable. But maybe there is a subset of functionality that does not require the contents of the container to be countable.

If we restrain ourselves from retrieving or enumerating the contents of a container, how do we know the contents even exists? Because we can operate on it! The most generic operation over the contents of a container is applying a function to it. And that’s what functors let us do.

Container as Functor

Here’s the translation of terms from category theory to Haskell.

A functor maps all objects in one category to objects in another category. In Haskell the objects are types, so a functor maps types into types (so, strictly speaking, it’s an endofunctor). You can look at it as a function on types, and this is reflected in the notation for the kind of the functor: * -> *. But normally, in a definition of a functor, you just see a polymorphic type constructor, which doesn’t really look like a function unless you squint really hard.

A categorical functor also maps morphisms to morphisms. In Haskell, morphisms correspond to functions, so a Functor type class defines a mapping of functions:

fmap :: (a -> b) -> (f a -> f b)

(Here, f is the functor in question acting on types a and b.)

Now let’s put on our container glasses and have another look at the functor. The type constructor defines a generic container type parameterized by the type of the element. The polymorphic function fmap, usually seen in its curried form:

fmap :: (a -> b) -> f a -> f b

defines the action of an arbitrary function (a -> b) on a container (f a) of elements of type a resulting in a container full of elements of type b.


Let’s have a look at a few important functors as containers.

There is the trivial but surprisingly useful container that can hold no elements. It’s called the Const functor (parameterized by an unrelated type b):

newtype Const b a = Const { getConst :: b }

instance Functor (Const b) where
    fmap _ (Const x) = Const x

Notice that fmap ignores its function argument because there isn’t any contents this function could act upon.

A container that can hold one and only one element is defined by the Identity functor:

newtype Identity a = Identity { runIdentity :: a }

instance Functor Identity where
    fmap f (Identity x) = Identity (f x)

Then there is the familiar Maybe container that can hold (maybe) one element and a bunch of regular containers like lists, trees, etc.

The really interesting container is defined by the function application functor, ((->) e) (which I would really like to write as (e-> )). The functor itself is parameterized by the type e — the type of the function argument. This is best seen when this functor is re-cast as a type constructor:

newtype Reader e a = Reader (e -> a)

This is of course the functor that underlies the Reader monad, where the first argument represents some kind of environment. It’s also the functor you’ll see in a moment in the Yoneda lemma.

Here’s the Functor instance for Reader:

instance Functor (Reader e) where  
    fmap f (Reader g) = Reader (\x -> f (g x))

or, equivalently, for the function application operator:

instance Functor ((->) e) where
    fmap = (.)

This is a strange kind of container where the values that are “stored” are keyed by values of type e, the environments. Given a particular environment, you can retrieve the corresponding value by simply calling the function:

runReader :: Reader e a -> e -> a
runReader (Reader f) env = f env

You can look at it as a generalization of the key/value store where the environment plays the role of the key.

The reader functor (for the lack of a better term) covers a large spectrum of containers depending of the type of the environment you pick. The simplest choice is the unit type (), which contains only one element, (). A function from unit is just a constant, so such a function provides a container for storing one value (just like the Identity functor). A function of Bool stores two values. A function of Integer is equivalent to an infinite list. If it weren’t for space and time limitations we could in principle memoize any function and turn it into a lookup table.

In type theory you might see the type of functions from A to B written as BA, where A and B are types seen as sets. That’s because the analogy with exponentiation — taking B to the power of A — is very fruitful. When A is the unit type with just one element, BA becomes B1, which is just B: A function from unit is just a constant of type B. A function of Bool, which contains just two elements, is like B2 or BxB: a Cartesian product of Bs, or the set of pairs of Bs. A function from the enumeration of N values is equivalent to an N-tuple of Bs, or an element of BxBxBx…B, N-fold. You can kind of see how this generalizes into B to the power of A, for arbitrary A.

So a function from A to B is like a huge tuple of Bs that is indexed by an element of A. Notice however that the values stored in this kind of container can only be enumerated (or traversed) if A itself is enumerable.

The IO functor that is the basis of the IO monad is even more interesting as a container because it offers no way of extracting its contents. An object of the type IO String, for instance, may contain all possible answers from a user to a prompt, but we can’t look at any of them in separation. All we can do is to process them in bulk. This is true even when IO is looked upon as a monad. All a monad lets you do is to pass your IO container to another monadic function that returns a new container. You’re just passing along containers without ever finding out if the Schrodinger’s cat trapped in them is dead or alive. Yes, parallels with quantum mechanics help a lot!

Natural Transformations

Now that we’ve got used to viewing functors as containers, let’s figure out what natural transformations are. A natural transformation is a mapping of functors that preserves their functorial nature. If functor F maps object A to X and another functor G maps A to Y, then a natural transformation from F to G must map X to Y. A mapping from X to Y is a morphism. So you can look at a natural transformation as a family of morphisms parameterized by A.

In Haskell, we turn all these objects A, X, and Y into types. We have two functors f and g acting on type a. A natural transformation will be a polymorphic function that maps f a to g a for any a.

forall a . f a -> g a

What does it mean in terms of containers? Very simple: A natural transformation is a way of re-packaging containers. It tells you how to take elements from one container and put them into another. It must do it without ever inspecting the elements themselves (it can, however, drop some elements or clone them).

Examples of natural transformations abound, but my favorite is safeHead. It takes the head element from a list container and repackages it into a Maybe container:

safeHead :: forall a . [a] -> Maybe a
safeHead []     = Nothing
safeHead (x:xs) = Just x

What about a more ambitions example: Let’s take a reader functor, Int -> a, and map it into the list functor [a]. The former corresponds to a container of a keyed by an integer, so it’s easily repackaged into a finite or an infinite list, for instance:

genInfList :: forall a . (Int -> a) -> [a]
genInfList f = fmap f [0..]

I’ll show you soon that all natural transformations from (Int -> a) to [a] have this form, and differ only by the choice of the list of integers (here, I arbitrarily picked [0..]).

A natural transformation, being a mapping of functors, must interact nicely with morphisms as well. The corresponding naturality condition translates easily into our container language. It tells you that it shouldn’t matter whether you first apply a function to the contents of a container (fmap over it) and then repackage it, or first repackage and then apply the function. This meshes very well with our intuition that repackaging doesn’t touch the elements of the container — it doesn’t breaks the eggs in the crate.

The Yoneda Lemma

Now let’s get back to the function application functor (the Reader functor). I said it had something to do with the Yoneda lemma. I wrote a whole blog about the Yoneda lemma, so I’m not going to repeat it here — just translate it into the container language.

What Yoneda says is that the reader is a universal container from which stuff can be repackaged into any other container. I just showed you how to repackage the Int reader into a list using fmap and a list of Int. It turns out that you can do the same for any type of reader and an arbitrary container type. You just provide a container full of “environments” and fmap the reader function over it. In my example, the environment type was Int and the container was a list.

Moreover, Yoneda says that there is a one-to-one correspondence between “repackaging schemes” and containers of environments. Given a container of environments you do the repackaging by fmapping the reader over it, as I did in the example. The inverse is also easy: given a repackaging, call it with an identity reader:

idReader :: Reader e e
idReader = Reader id

and you’ll get a container filled with environments.

Let me re-word it in terms of functors and natural transformations. For any functor f and any type e, all natural transformations of the form:

forall a . ((e -> a) -> f a)

are in one-to-one correspondence with values of the type f e. This is a pretty powerful equivalence. On the one hand you have a polymorphic function, on the other hand a polymorphic data structure, and they encode the same data. Except that things you do with functions are different than things you do with data structures so, depending on the context, one may be more convenient than the other.

For instance, if we apply the Yoneda lemma to the reader functor itself, we find out that all repackagings (natural transformations) between readers can be parameterized by functions between their environment types:

forall a . ((e -> a) -> (e' -> a)) ~ e' -> e

Or, you can look at this result as the CPS transform: Any function can be encoded in the Continuation Passing Style. The argument (e -> a) is the continuation. The forall quantifier tells us that the return type of the continuation is up to the caller. The caller might, for instance, decide to print the result, in which case they would call the function with the continuation that returns IO (). Or they might call it with id, which is itself polymorphic: a -> a.

Where Do Containers Come From?

A functor is a type constructor — it operates on types — but in a program you want to deal with data. A particular functor might define its data constructor: List and Maybe have constructors. A function, which we need in order to create an instance of the reader functor, may either be defined globally or through a lambda expression. You can’t construct an IO object, but there are some built-in runtime functions, like getChar or putChar that return IO.

If you have functions that produce containers you may compose them to create more complex containers, as in:

-- m is the functor
f :: a -> m b
g :: b -> m c
fmap g (f x) :: m (m c)

But the general ability to construct containers from scratch and to combine them requires special powers that are granted by successively more powerful classes of containers.


The first special power is the ability to create a default container from an element of any type. The function that does that is called pure in the context of applicative and return in the context of a monad. To confuse things a bit more, there is a type class Pointed that defines just this ability, giving it yet another name, point:

class Functor f => Pointed f where
        point :: a -> f a

point is a natural transformation. You might object that there is no functor on the left hand side of the arrow, but just imagine seeing Identity there. Naturality just means that you can sneak a function under the functor using fmap:

fmap g (point x) = point (g x)

The presence of point means that there is a default, “trivial,” shape for the container in question. We usually don’t want this container to be empty (although it may — I’m grateful to Edward Kmett for correcting me on this point). It doesn’t mean that it’s a singleton, though — for ZipList, for instance, pure generates and infinite list of a.


Once you have a container of one type, fmap lets you generate a container of another type. But since the function you pass to fmap takes only one argument, you can’t create more complex types that take more than one argument in their constructor. You can’t even create a container of (non-diagonal) pairs. For that you need more general ability: to apply a multi-argument function to multiple containers at once.

Of course, you can curry a multi-argument function and fmap it over the first container, but the result will be a container of hungry functions waiting for more arguments.

h :: a -> b -> c
fmap h (m a) :: m (b -> c)

(Here, m stands for the functor, applicative, or the monad in question.)

What you need is the ability to apply a container of functions to a container of arguments. The function that does that is called <*> in the context of applicative, and ap in the context of monad.

(<*>) :: m (a -> b) -> m a -> m b

As I mentioned before, Applicative is also Pointed, with point renamed to pure. This lets you wrap any additional arguments to your multi-argument functions.

The intuition is that applicative brings to the table its ability to increase the complexity of objects stored in containers. A functor lets you modify the objects but it’s a one-input one-output transformation. An applicative can combine multiple sources of information. You will often see applicative used with data constructors (which are just functions) to create containers of object from containers of arguments. When the containers also carry state, as you’ll see when we talk about State, an applicative will also be able to reflect the state of the arguments in the state of the result.


The monad has the special power of collapsing containers. The function that does it is called join and it turns a container of containers into a single container:

join :: m (m a) -> m a

Although it’s not obvious at first sight, join is also a natural transformation. The fmap for the m . m functor is the square of the original fmap, so the naturality condition looks like this:

 fmap f . join = join . (fmap . fmap) f 

Every monad is also an applicative with return playing the role of pure and ap implementing <*>:

ap :: m (a -> b) -> m a -> m b
ap mf ma = join $ fmap (\f -> fmap f ma) mf

When working with the container interpretation, I find this view of a monad as an applicative functor with join more intuitive. In practice, however, it’s more convenient to define the monad in terms of bind, which combines application of a function a la fmap with the collapsing of the container. This is done using the function >>=:

(>>=) :: m a -> (a -> m b) -> m b
ma >>= k = join (fmap k ma)

Here, k is a function producing containers. It is applied to a container of a, ma, using fmap. We’ve seen this before, but we had no way to collapse the resulting container of containers — join makes this possible.

Imagine a hierarchy of containment. You start with functions that produce containers. They “lift” your data to the level of containers. These are functions like putChar, data constructors like Just, etc. Then you have the “trivial” lifters of data called pure or return. You may operate on the data stored in containers by “lifting” a regular function using fmap. Applicative lets you lift multi-parameter functions to create containers of more complex data. You may also lift functions that produce containers to climb the rungs of containment: you get containers of containers, and so on. But only the monad provides the ability to collapse this hierarchy.


Let’s have a look at the state functor, the basis of the state monad. It’s very similar to the reader functor, except that it also modifies the environment. We’ll call this modifiable environment “state.” The modified state is paired with the return value of the function that defines the state functor:

newtype State s a = State (s -> (a, s))

As a container, the reader functor generalized the key/value store. How should we interpret the state functor in the container language? Part of it is still the key/value mapping, but we have the additional key/key mapping that describes the state transformation. (The state plays the role of the key.) Notice also that the action of fmap modifies the values, but doesn’t change the key mappings.

instance Functor (State s) where
  fmap f (State g) = State (\st -> let (x, st') = g st 
                                   in (f x, st'))

This is even more obvious if we separate the two mappings. Here’s the equivalent definition of the state functor in terms of two functions:

data State' s a = State' (s -> a) (s -> s)

The first function maps state to value: that’s our key/value store, identical to that of the reader functor. The second function is the state transition matrix. Their actions are quite independent:

runState' (State' f tr) s = (f s, tr s)

In this representation, you can clearly see that fmap only acts on the key/value part of the container, and its action on data is identical to that of the reader functor:

instance Functor (State' s) where
  fmap f (State' g tr) = State' (f . g) tr

In the container language, we like to separate the contents from the shape of the container. Clearly, in the case of the state functor, the transition matrix, not being influenced by fmap, is part of the shape.

A look at the Applicative instance for this representation of the state functor is also interesting:

instance Applicative (State' s) where
  pure x = State' (const x) id
  State' fg tr1 <*> State' fx tr2 =
      State' ff (tr2 . tr1)
      ff st = let g = fg st
                  x = fx (tr1 st)
              in g x

The default container created by pure uses identity as its transition matrix. As expected, the action of <*> creates a new “shape” for the container, but it does it in a very regular way by composing the transition matrices. In the language of linear algebra, the transformation of state by the applicative functor would be called “linear.” This will not be true with monads.

You can also see the propagation of side effects: the values for the first and second argument are retrieved using different keys: The key for the retrieval of the function g is the original state, st; but the argument to the function, x, is retrieved using the state transformed by the transition matrix of the first argument (tr1 st). Notice however that the selection of keys is not influenced by the values stored in the containers.

But here’s the monad instance:

instance Monad (State' s) where
  return x = State' (const x) id
  State' fx tr >>= k =
      State' ff ttr
      ff st  = let x = fx st
                   st' = tr st
                   State' fy tr' = k x
               in fy st' 
      ttr st = let x = fx st
                   st' = tr st
                   State' fy tr' = k x
               in tr' st'

What’s interesting here is that the calculation of the transition matrix requires the evaluation of the function k with the argument x. It means that the state transition is no longer linear — the decision which state to chose next may depend on the contents of the container. This is also visible in the implementation of join for this monad:

join :: State' s (State' s a) -> State' s a
join (State' ff ttr) = State' f' tr'
    f' st  = let State' f tr = ff st
                 st'         = ttr st
             in f st'
    tr' st = let State' f tr = ff st
                 st'         = ttr st
             in tr st'

Here, the outer container stores the inner container as data. Part of the inner container is its transition matrix. So the decision of which transition matrix tr to use is intertwined with data in a non-linear way.

This non-linearity means that a monad is strictly more powerful than applicative, but it also makes composing monads together harder.


The only way to really understand a complex concept is to see it from as many different angles as possible. Looking at functors as containers provides a different perspective and brings new insights. For me it was the realization that functions can be seen as non-enumerable containers of values, and that the state monad can be seen as a key/value store with an accompanying state transition matrix that brought the aha! moments. It was also nice to explicitly see the linearity in the applicative’s propagation of state. It was surprising to discover the simplicity of the Yoneda lemma and natural transformations in the language of containers.

Bibliography and Acknowledgments

A container is not a very well defined term — an ambiguity I exploited in this blog post — but there is a well defined notion of finitary containers, and they indeed have a strong connection to functors. Russell O’Connor and Mauro Jaskelioff have recently shown that every traversable functor is a finitary container (I’m grateful to the authors for providing me with the preliminary copy of their paper, in which they have also indenpendently and much more rigorously shown the connection between the Yoneda lemma for the functor category and the van Laarhoven formulation of the lens).

The WordPress.com stats helper monkeys prepared a 2013 annual report for this blog.

Here’s an excerpt:

The Louvre Museum has 8.5 million visitors per year. This blog was viewed about 220,000 times in 2013. If it were an exhibit at the Louvre Museum, it would take about 9 days for that many people to see it.

Click here to see the complete report.

There is an engineer and a mathematician in each of us. My blog posts, for instance, alternate between discussing write ordering in the x86 processor, and explaining abstract ideas from category theory. I might be a bit extreme in this respect, but even in everyday programming we often switch between the focused, low-level, practical thinking of an engineer and the holistic, global, abstract thinking of a mathematician. Sometimes we write a for loop; sometimes we holistically apply std::transform to a vector. Sometimes we write imperative step-by-step programs; and sometimes we write declarative ones, letting the compiler figure out the steps.

What I find fascinating is that these two approaches also manifest themselves in mathematics. There are constructive proofs that resemble the work of an engineer; and there are proofs of existence, that resemble the work of a philosopher. An entity may be defined by showing how to construct it, or it might be defined by its universal property — by being the most general or the most specific among its peers. A universal description is favored by category theorists, especially if it reduces their dependency on Set Theory. It’s not that category theorists hate set theory, but they definitely like to limit their reliance on it.

In this post I’ll show you the two ways of defining a free monoid. If the term monoid is not familiar to you, don’t despair; it’s a very simple construct, and you’ve seen it and used it many times. The classic example of a monoid is the set of strings with string concatenation. Another is the set of natural numbers with addition (or multiplication — either will work).

Free Monoid

A monoid is a set with a binary operation — let’s just call it multiplication and denote it by the infix * (you’ll also see it called, in Haskell, mappend, ++, <>…). This binary operation must obey certain laws. The first law is that there is a special element called unit, e, with the property that for any a:

a * e = e * a = a

(You’ll also see it called mempty.) The second law is that of associativity:

a * (b * c) = (a * b) * c

There are many examples of monoids, but there is a special class of them called free monoids. The easiest way to understand what a free monoid is, is to construct one. Just pick a set, any set, and call it the set of generators. Then define multiplication in the laziest, dumbest possible way. First, add one more element to the set and call it a unit. Define multiplication by unit to always return the other multiplicand. Then, for every pair of generators create a new element and call it their product. Define products of products the same way — every time you need to multiply two existing elements, create a new one, call it their product, and add it to the growing set. The only tricky part is to make sure that thusly defined product is associative. So for every triple of elements, a, b, and c, you have to identify a * (b * c) with (a * b) * c.

Here’s a little mnemonic trick that may help you keeping track of associativity. Assign letters of the alphabet to your generators. Say, you have less than 26 generators and you call them a, b, c, etc. Reserve letter z for your unit element. When asked for the product of, say, a and t, call it at. The product of c and at will be cat, and so on. This way, you’ll automatically call the product of ca with t, cat, as you should.

As you can see a free monoid generated by an alphabet is equivalent to the set of strings, with product defined as string concatenation.

What happens when your original set has just one element, say a? The free monoid based on this set would contain the unit z and all the powers of a of the form aaa.... Writing long strings of as is boring, so instead let’s just length-encode them: a as 1, aa as 2, and so on. It’s natural to assign zero to z. We have just reinvented natural numbers with “product” being the addition.

Lists are free monoids too. Take any finite or infinite set and create ordered lists of its elements. Your unit element is the empty list, and “multiplication” is the concatenation of lists. Strings are just a special type of lists based on a finite set of generators. But you can have lists of integers, 3-D points, or even lists of lists.

Not all monoids are free, though. Take for instance addition modulo 4. You start with four elements: 0, 1, 2, and 3. But the sum of 2 and 3 is 1 (5 mod 4). Were it a free monoid, 2+3 would be a totally new element, but here we identified it with the existing element 1.

The question is, can we obtain any monoid by identifying some elements of the corresponding free monoid? The modulo 4 monoid, for instance, could be obtained by identifying all natural numbers that have the same remainder when divided by 4.

The answer is yes, and it’s an example of a very important notion in category theory called universality. A free monoid is universal. Here’s what it means: I have a set of generators and a free monoid built from it. Give me any monoid and select any elements in it to correspond to the generators of my free monoid. I can show you a unique mapping m of my free monoid to your monoid that not only maps my generators to your selected elements, but also preserves multiplication. In other words, if a * b = c in my monoid, then m(a) * m(b) = m(c) in your monoid.

Notice that the mapping might not cover the whole target monoid. It might cover a sub-monoid. But within that sub-monoid, for each element in my monoid there will be a corresponding element in yours. In general, multiple elements in my monoid may be mapped to a single element in yours. This way your monoid has the same structure as mine, except that some elements have been identified.

Any monoid can be obtained from a free monoid by identifying some elements.

Categorical Monoid

You might find the above theorem in a category theory text (see, for instance, Saunders Mac Lane, Categories for the Working Mathematician, Corollary 2 on p. 50) and not recognize it. That’s because category theoreticians don’t like treating monoids as sets of elements. They prefer to see a monoid as a shapeless object whose properties are encoded in morphisms — arrows from a monoid to itself.

How do we reconcile these two views of monoids? On the one hand you have a set of elements with multiplication, on the other hand a monolithic blob with arrows.

Here’s one way: Consider what happens when you pick one element of a monoid and apply it to all elements — say, by left multiplication. You get a function acting on this set (in Haskell we call this function an operator section). That’s your morphism. Notice that you can compose such morphisms just like you compose functions: left multiplication by “a” composed with left multiplication by “b” is the same function as left multiplication by “a * b”. The set of these morphisms/functions — one for each element — together with the usual function composition — tells you everything about the monoid. It gives you the categorical description of it: A monoid is a category with a single object (“mono” means one, single) and a bunch of morphisms acting on it.

There is also a bigger Category of Monoids, where monoids are the objects and morphisms map monoids into monoids preserving their structure (they map products into products). Some of these monoids are free, others are not.

We know how to construct a free monoid starting from a set of generators, but how can we tell which of the abstract blobs with morphisms is free and which isn’t? And how do we even begin talking about generators if we vowed not to view monoids as sets?

Universal Construction

We have to work our way back from abstract monoids to sets. But to do that we need a functor because we’re going between categories. Our source is the category of monoids, and our target is the category of sets, where objects are sets and morphisms are regular functions.

Just a reminder: A functor is a mapping between categories. It maps objects into objects and morphisms into morphisms; but not willy-nilly — it has to preserve composition. So if one morphism is a composition of two others, it’s image under the functor must also be the composition of the two images.

In our case we will be mapping the category of monoids into the category of sets. But once you mapped a monoid into a set (its, so called, underlying set), you have forgotten its structure. You have forgotten that some functions were “special” because they corresponded to the morphisms of the original monoid. All functions are now equal. This kind of functor that forgets structure is aptly called a forgetful functor.

Our goal is to describe a free monoid corresponding to a set of generators X. X is just a regular set, nothing fancy, but we can’t look for it inside a monoid because a monoid is not a set any more. Fortunately we have our forgetful functor U, which maps monoids into sets. So instead of looking for generators inside monoids (which we can’t do), we’ll look for them inside those underlying sets. Given a monoid M, we just pick a function p: X -> U M. The image of X under p will serve as candidate generators.

Now let’s take another monoid L and do the same thing. We map our generators into the set U L (the set obtained from L using the same forgetful functor U). Call this mapping h: X -> U L. Now, U is a functor, so it maps monoids to sets and monoid morphisms to functions. So if there is a morphism m from M to L, there is a corresponding function from U M to U L (we’ll call this function U m). This is just a function mapping the set that’s underlying one monoid into the set that’s underlying another monoid.

The first set, U M, contains the images of our generators under p. Our function U m will map those candidate generators into some set in U L. Now, U L also contains candidate generators — the images of X under h. What are the chances that those two sets are the same? Slim but not zero! If we can pick m so that they overlap, we will have come as close as possible to the idea of the two monoids “sharing the same generators.” (See the following illustration.)

Universal construction of a free monoid generated by X

Universal construction of a free monoid generated by X

So given two monoids M and L and a set of generators X we can create candidate sets of generators in U M and U L. If we’re lucky, we can find a monoid morphism m that maps M to L and whose projection using U maps one candidate set of generators into the other.

Now zoom out to the category of monoids and draw an arrow between any M and L whenever such m exists. What you’ll find out, to your great amazement, is that there is one unique monoid (up to an isomorphism) that has all arrows going out and none coming in. This universal monoid is the free monoid constructed from the set of generators X. The two completely different descriptions converge!


How does it work? What’s the intuition behind it?

A lot of mathematicians secretly (or not so secretly) subscribe to Plato’s philosophy. A monoid is a Platonic object that can’t be observed directly, but which can cast a shadow on the wall of the cave we live in. This shadow is the underlying set: the result of a forgetful functor. The morphisms between monoids preserve their nature; theirs shadows, though, are mere functions.

Functions may be invertible, but often they conflate elements together. So if you have a flow of functions from one set to a bunch of others, they will tend to smooth out the differences, collapse multiple elements into one. So if there is one shadow of a monoid that dominates all others that are sharing the same generators — and there always is one — it must be the shadow of the free one. It is the one that has the largest selection of distinct elements, since every multiplication produces a new element (mitigated only by the requirements of associativity and the need for the unit).

Appendix: A Bonus Free Object

Free monoid is an example of a larger class of free objects. Free object is a powerful notion in category theory that generalizes the idea of a basis. Instead of the category of monoids let’s take any category that supports a faithful functor into the category of sets. A faithful functor is a functor that maps distinct morphism into distinct functions — it doesn’t mash them together. This will be our forgetful functor U. It maps an object A from our category to a set U A; and we can inject our basis X into this set, just like we did with generators. Object A is called universal if, for any other object B and any injection of X into U B, there is a unique morphism from A to B that “preserves” the basis. Again, by preserving the basis we mean that the image of our morphism under U maps one injected basis into the other.

In my previous blog posts I described C++ implementations of two basic functional data structures: a persistent list and a persistent red-black tree. I made an argument that persistent data structures are good for concurrency because of their immutability. In this post I will explain in much more detail the role of immutability in concurrent programming and argue that functional data structures make immutability scalable and composable.

Concurrency in 5 Minutes

To understand the role of functional data structures in concurrent programming we first have to understand concurrent programming. Okay, so maybe one blog post is not enough, but I’ll try my best at mercilessly slashing through the complexities and intricacies of concurrency while brutally ignoring all the details and subtleties.

The archetype for all concurrency is message passing. Without some form of message passing you have no communication between processes, threads, tasks, or whatever your units of execution are. The two parts of “message passing” loosely correspond to data (message) and action (passing). So there is the fiddling with data by one thread, some kind of handover between threads, and then the fiddling with data by another thread. The handover process requires synchronization.

There are two fundamental problems with this picture: Fiddling without proper synchronization leads to data races, and too much synchronization leads to deadlocks.

Communicating Processes

Let’s start with a simpler world and assume that our concurrent participants share no memory — in that case they are called processes. And indeed it might be physically impossible to share memory between isolated units because of distances or hardware protection. In that case messages are just pieces of data that are magically transported between processes. You just put them (serialize, marshall) in a special buffer and tell the system to transmit them to someone else, who then picks them up from the system.

So the problem reduces to the proper synchronization protocols. The theory behind such systems is the good old CSP (Communicating Sequential Processes) from the 1970s. It has subsequently been extended to the Actor Model and has been very successful in Erlang. There are no data races in Erlang because of the isolation of processes, and no traditional deadlocks because there are no locks (although you can have distributed deadlocks when processes are blocked on receiving messages from each other).

The fact that Erlang’s concurrency is process-based doesn’t mean that it’s heavy-weight. The Erlang runtime is quite able to spawn thousands of light-weight user-level processes that, at the implementation level, may share the same address space. Isolation is enforced by the language rather than by the operating system. Banning direct sharing of memory is the key to Erlang’s success as the language for concurrent programming.

So why don’t we stop right there? Because shared memory is so much faster. It’s not a big deal if your messages are integers, but imagine passing a video buffer from one process to another. If you share the same address space (that is, you are passing data between threads rather than processes) why not just pass a pointer to it?

Shared Memory

Shared memory is like a canvas where threads collaborate in painting images, except that they stand on the opposite sides of the canvas and use guns rather than brushes. The only way they can avoid killing each other is if they shout “duck!” before opening fire. This is why I like to think of shared-memory concurrency as the extension of message passing. Even though the “message” is not physically moved, the right to access it must be passed between threads. The message itself can be of arbitrary complexity: it could be a single word of memory or a hash table with thousands of entries.

It’s very important to realize that this transfer of access rights is necessary at every level, starting with a simple write into a single memory location. The writing thread has to send a message “I have written” and the reading thread has to acknowledge it: “I have read.” In standard portable C++ this message exchange might look something like this:

std::atomic x = false;
// thread one
x.store(true, std::memory_order_release);
// thread two

You rarely have to deal with such low level code because it’s abstracted into higher order libraries. You would, for instance, use locks for transferring access. A thread that acquires a lock gains unique access to a data structure that’s protected by it. It can freely modify it knowing that nobody else can see it. It’s the release of the lock variable that makes all those modifications visible to other threads. This release (e.g., mutex::unlock) is then matched with the subsequent acquire (e.g., mutex::lock) by another thread. In reality, the locking protocol is more complicated, but it is at its core based on the same principle as message passing, with unlock corresponding to a message send (or, more general, a broadcast), and lock to a message receive.

The point is, there is no sharing of memory without communication.

Immutable Data

The first rule of synchronization is:

The only time you don’t need synchronization is when the shared data is immutable.

We would like to use as much immutability in implementing concurrency as possible. It’s not only because code that doesn’t require synchronization is faster, but it’s also easier to write, maintain, and reason about. The only problem is that:

No object is born immutable.

Immutable objects never change, but all data, immutable or not, must be initialized before being read. And initialization means mutation. Static global data is initialized before entering main, so we don’t have to worry about it, but everything else goes through a construction phase.

First, we have to answer the question: At what point after initialization is data considered immutable?

Here’s what needs to happen: A thread has to somehow construct the data that it destined to be immutable. Depending on the structure of that data, this could be a very simple or a very complex process. Then the state of that data has to be frozen — no more changes are allowed. But still, before the data can be read by another thread, a synchronization event has to take place. Otherwise the other thread might see partially constructed data. This problem has been extensively discussed in articles about the singleton pattern, so I won’t go into more detail here.

One such synchronization event is the creation of the receiving thread. All data that had been frozen before the new thread was created is seen as immutable by that thread. That’s why it’s okay to pass immutable data as an argument to a thread function.

Another such event is message passing. It is always safe to pass a pointer to immutable data to another thread. The handover always involves the release/acquire protocol (as illustrated in the example above).

All memory writes that happened in the first thread before it released the message become visible to the acquiring thread after it received it.

The act of message passing establishes the “happens-before” relationship for all memory writes prior to it, and all memory reads after it. Again, these low-level details are rarely visible to the programmer, since they are hidden in libraries (channels, mailboxes, message queues, etc.). I’m pointing them out only because there is no protection in the language against the user inadvertently taking affairs into their own hands and messing things up. So creating an immutable object and passing a pointer to it to another thread through whatever message passing mechanism is safe. I also like to think of thread creation as a message passing event — the payload being the arguments to the thread function.

The beauty of this protocol is that, once the handover is done, the second (and the third, and the fourth, and so on…) thread can read the whole immutable data structure over and over again without any need for synchronization. The same is not true for shared mutable data structures! For such structures every read has to be synchronized at a non-trivial performance cost.

However, it can’t be stressed enough that this is just a protocol and any deviation from it may be fatal. There is no language mechanism in C++ that may enforce this protocol.


As I argued before, access rights to shared memory have to be tightly controlled. The problem is that shared memory is not partitioned nicely into separate areas, each with its own army, police, and border controls. Even though we understand that an object is frozen after construction and ready to be examined by other threads without synchronization, we have to ask ourselves the question: Where exactly does this object begin and end in memory? And how do we know that nobody else claims writing privileges to any of its parts? After all, in C++ it’s pointers all the way. This is one of the biggest problems faced by imperative programmers trying to harness concurrency — who’s pointing where?

For instance, what does it mean to get access to an immutable linked list? Obviously, it’s not enough that the head of the list never changes, every single element of the list must be immutable as well. In fact, any memory that can be transitively accessed from the head of the list must be immutable. Only then can you safely forgo synchronization when accessing the list, as you would in a single-threaded program. This transitive closure of memory accessible starting from a given pointer is often called a cluster. So when you’re constructing an immutable object, you have to be able to freeze the whole cluster before you can pass it to other threads.

But that’s not all! You must also guarantee that there are no mutable pointers outside of the cluster pointing to any part of it. Such pointers could be inadvertently used to modify the data other threads believe to be immutable.

That means the construction of an immutable object is a very delicate operation. You not only have to make sure you don’t leak any pointers, but you have to inspect every component you use in building your object for potential leaks — you either have to trust all your subcontractors or inspect their code under the microscope. This clearly is no way to build software! We need something that it scalable and composable. Enter…

Functional Data Structures

Functional data structures let you construct new immutable objects by composing existing immutable objects.

Remember, an immutable object is a complete cluster with no pointers sticking out of it, and no mutable pointers poking into it. A sum of such objects is still an immutable cluster. As long as the constructor of a functional data structure doesn’t violate the immutability of its arguments and does not leak mutable pointers to the memory it is allocating itself, the result will be another immutable object.

Of course, it would be nice if immutability were enforced by the type system, as it is in the D language. In C++ we have to replace the type system with discipline, but still, it helps to know exactly what the terms of the immutability contract are. For instance, make sure you pass only (const) references to other immutable objects to the constructor of an immutable object.

Let’s now review the example of the persistent binary tree from my previous post to see how it follows the principles I described above. In particular, let me show you that every Tree forms an immutable cluster, as long as user data is stored in it by value (or is likewise immutable).

The proof proceeds through structural induction, but it’s easy to understand. An empty tree forms an immutable cluster trivially. A non-empty tree is created by combining two other trees. We can assume by the inductive step that both of them form immutable clusters:

Tree(Tree const & lft, T val, Tree const & rgt)

In particular, there are no external mutating pointers to lft, rgt, or to any of their nodes.

Inside the constructor we allocate a fresh node and pass it the three arguments:

Tree(Tree const & lft, T val, Tree const & rgt)
      : _root(std::make_shared<const Node>(lft._root, val, rgt._root))

Here _root is a private member of the Tree:

std::shared_ptr<const Node> _root;

and Node is a private struct defined inside Tree:

struct Node
   Node(std::shared_ptr<const Node> const & lft
       , T val
       , std::shared_ptr<const Node> const & rgt)
   : _lft(lft), _val(val), _rgt(rgt)

   std::shared_ptr<const Node> _lft;
   T _val;
   std::shared_ptr<const Node> _rgt;

Notice that the only reference to the newly allocated Node is stored in _root through a const pointer and is never leaked. Moreover, there are no methods of the tree that either modify or expose any part of the tree to modification. Therefore the newly constructed Tree forms an immutable cluster. (With the usual caveat that you don’t try to bypass the C++ type system or use other dirty tricks).

As I discussed before, there is some bookkeeping related to reference counting in C++, which is however totally transparent to the user of functional data structures.


Immutable data structures play an important role in concurrency but there’s more to them that meets the eye. In this post I tried to demonstrate how to use them safely and productively. In particular, functional data structures provide a scalable and composable framework for working with immutable objects.

Of course not all problems of concurrency can be solved with immutability and not all immutable object can be easily created from other immutable objects. The classic example is a doubly-linked list: you can’t add a new element to it without modifying pointers in it. But there is a surprising variety of composable immutable data structures that can be used in C++ without breaking the rules. I will continue describing them in my future blog posts.