


Aspen, Monday, May 15: These are just quick notes, and I apologize if they're not very coherent. I had little time to jot them down because the day was filled with talks and discussions. There are two main topics people are excited about: concurrency and template metaprogramming.

In the morning I went to Christopher Kohlhoff's talk about Boost.Asio. Asio stands for Asynchronous IO. The problem is how to structure your program to deal with a lot of asynchronous calls. There are two main difficulties: resource management and inversion of control. An async-driven program is essentially a collection of disjoint callbacks. When control jumps from one callback to another, you can easily lose track of who owns which resources, how to get hold of them, and how to reason about the whole system. Chris showed how to deal with resources, essentially by using shared, reference-counted pointers. When you're making an asynchronous call you usually have to make sure that the completion handler shares some resources with the caller. In Chris's approach these resources are encapsulated in a Connection object and the completion handlers are methods of that object. So when a handler is called, it has access to the "this" pointer and can share data with the caller. A handler must take part in the reference counting of the Connection object, so that the object doesn't suddenly disappear. Considering that a lot of event-driven code I've seen over the years used global shared data to store state, this is progress. The inversion of control is still a problem though.

Hartmut Kaiser talked about the Phoenix library, which essentially implements C++ inside C++. It is built on top of Proto (which is a big hit at the conference–more about it later). From what I gathered, Phoenix is mostly a better lambda. You can write anonymous functions in Phoenix using a very stylized C++: for instance, instead of braces, you use brackets, instead of semicolons, commas, etc. One advantage of Phoenix functions is that they can be statically polymorphic. Unfortunately the main example didn't require polymorphism, and in fact would be much easier on the eyes if it were written using C++ lambdas. The other, more important advantage of Phoenix is that it's highly customizable. Hartmut showed an example of how to extend C++ syntax to support parallelism. Behind the scenes, Phoenix took advantage of OpenMP, which is a system of ugly pragmas supported by many compilers to create parallel loops and other concurrent constructs.

And then there was a Proto marathon by Joel Falcou, after which I had a long discussion with him over dinner. Proto is a metaprogramming tour de force. If I described it as a library for constructing embedded domain-specific languages in C++, I wouldn't do it justice. It's a system that tricks the C++ compiler into parsing expressions into full-blown compile-time abstract syntax trees, which are at the same time function objects that can be executed at runtime. If this is not impressive enough, Proto provides multiple customization mechanisms that allow you to plug in new constructs, give them specific semantics, and even rewrite the ASTs. Joel gave an example of an EDSL for expressing analytical functions, which could calculate derivatives of functions analytically at compile time. Joel is coming to my talk tomorrow and I hope he will be able to explain to me what I'm doing. My talk is essentially about how to get to grips with Proto by using Haskell monads. We'll see how it goes.


In the previous installment I introduced monads using two examples: the Maybe monad and the List monad. Admittedly, those weren’t the most exciting uses of monads, although they served the purpose of explaining the underlying theory. I also mentioned that monads were introduced into functional programming as a general solution for a variety of problems. One of those problems was representing stateful computations as functions.

Functions, State, and Side Effects

Here’s the problem: In functional languages, every time you call a function with the same arguments, it must return the same result. In fact, if the compiler detects such a situation, it is free to memoize the result of the first call and skip subsequent calls altogether. A stateful computation, on the other hand, might return different results every time it’s performed. It may, for instance, access some global or static variables. It may also modify those variables– in other words have side effects. In extreme cases a computation might be performed purely for side effects and not even bother to return any results.

This kind of behavior is often troublesome even in imperative programming. The use of global variables in particular is being discouraged. A better solution is to encapsulate the state and pass it explicitly to functions that use it. As a syntactic shortcut, in object-oriented languages, some of the state is regularly passed to functions (methods) as a hidden “this” or “self” argument. There’s even a syntax for composing such functions, as in this JavaScript snippet:

with(document) {
    var t = title;
    write(t + " and more");
}

Here, title is a property and write a method of the object document. (If you put on your monadic glasses, it almost looks like do notation.)

In functional languages we have one more limitation: we cannot mutate any data. There’s a standard way to overcome this limitation: Instead of modifying the data, you create a modified copy. This doesn’t even have to be expensive, if the language supports smart data structures that silently substitute references for copies whenever possible. Most operations on lists in Haskell are optimized this way, and there’s even a language, Clojure, at whose core are “persistent” data structures that behave as if they were immutable, but do a lot of sharing behind the scenes. Immutability is a very attractive feature when you are doing concurrent programming: access to immutable data requires no synchronization.

Taking all this into account, the way to translate stateful computations into a functional language is to use functions that explicitly take state (encapsulated in some data structure) and return the, possibly modified, state together with the usual return value. For instance, a C++ "function":

int pop() {
    auto v = glob.top();
    glob.pop();
    return v;
}

that operates on a global vector, glob of the type std::vector<int>, would be turned into a Haskell function:

pop (ST lst) = (ST (tail lst), head lst)

operating on the state of type Stack:

newtype Stack = ST [Int]

The constructor ST creates a Stack from a list of integers. This constructor is also used for pattern matching, as in the argument of pop. The function head returns the first element of a list, tail returns the rest.

The signature of pop is characteristic of functions that operate on state:

pop :: Stack -> (Stack, Int)

Such functions are often called “actions.”
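
To see it in action, here's what a quick interpreter session might look like (a sketch, assuming the definitions of Stack and pop above are loaded):

> let stack = ST [3, 2, 1]
> let (stack', x) = pop stack
> x
3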

There are two problems with this scheme: (1) It’s awkward, and (2) It doesn’t fit our monadic approach. In this example, the original computation (as expressed in C++) takes no arguments and returns an integer. Its functional counterpart takes a state and returns an integer combined with the state. It’s not clear how one would bind such functions together and use the convenient do notation.

We are on the right track though. We just need to get even more general: We need to separate the construction of an action from its execution. Our basic building blocks will be functions that return actions– we'll call them "monadic functions." Since an action is itself a function, we'll be dealing with functions returning functions, that is, higher-order functions.

Our goal is to find a way to compose monadic functions into larger monadic functions. A composite monadic function will return a composite action. We will then execute such an action on a state and get our result.

This new description fits the monadic pattern much better. We start with a generic stateful computation that takes an argument of type a and returns a value of type b, and we turn it into a (monadic) function that takes type a and returns an enriched type based on b. This time, though, the enriched type is a function type– an action. In general, an action is a function that takes a state (of some type S) and returns a tuple consisting of the (possibly modified) state and the value of type b.

S -> (S, b)

Here’s the first element of the monad– a type constructor: For any type t it defines a new type: an action to calculate a value of type t given a state. This type constructor is part of the state monad. Before we get to a more formal definition, let’s do some practice exercises.

The Monadic Calculator

There’s a perfect example of a stateful computation: a stack-based calculator. The state in this case is represented by the type Calc:

newtype Calc = Calc [Int]

that hides a list of integers– our calculator’s stack.

First, let's define a monadic function (a function that returns an action) that pops an element off the calculator's stack. It will be a function returning a function, so we need to use lambdas.

popCalc = 
    \(Calc lst) -> (Calc (tail lst), head lst)

The body of the lambda is almost identical to the implementation of pop above. Notice that popCalc takes no arguments. Rather, the function that it produces takes a calculator as an argument and returns the calculator back, paired with the result–the value at the top of the stack. In other words, popCalc returns a promise to calculate the top of the calculator’s stack when the stack is available.

Here’s how you can use popCalc. First, you call it with no arguments and record the returned action. Next, you create a calculator (with a non-empty stack, otherwise the next line of code would bomb). You apply the action to that calculator and record the result– you pattern-match it to a tuple consisting of a changed calculator and a number. Finally you display that number. This is the actual output of a Haskell interpreter session:

> let f = popCalc
> let calc = Calc [3, 2, 1]
> let (calc', x) = f calc
> x
3

While we’re at it, we can similarly implement a pushCalc function:

pushCalc n =
    \(Calc lst) -> (Calc (n:lst), ())

Notice that the lambda produced by pushCalc returns a modified calculator (argument n is prepended to the list) paired with a special value () of the type unit— a Haskell equivalent of void. The imperative equivalent of this function would return void and work only through side effects. Notice also that the lambda is actually a closure: it captures the outer variable n for later use.

Finally, we need a function that performs some calculation; after all we are implementing a calculator:

addCalc =
    \(Calc lst) -> let (a:b:rest) = lst
                   in
                       (Calc ((a + b):rest), ())

Here I’m matching the calculator’s list with the pattern (a:b:rest) to retrieve the top two elements. The modified calculator has the sum of those two elements on the top of its stack.

We can use all these functions in combination to perform more complex operations, like adding two numbers. Here's a piece of code that might rival some of the Rube Goldberg creations:

add x y =
    let pux = pushCalc x -- promise to push x
        puy = pushCalc y -- promise to push y
        axy = addCalc    -- promise to add top numbers
        pp = popCalc     -- promise to pop the result
        calc = Calc []   -- we need a calculator
        (calc1, _) = pux calc  -- actually push x
        (calc2, _) = puy calc1 -- actually push y
        (calc3, _) = axy calc2 -- actually add top numbers
        (_, z) = pp calc3      -- actually pop the result
    in
        z  -- return the result

But what we really want is to be able to combine smaller actions into larger actions. For that we have to define bind. The signature of bind, in this case, should be:

bind :: (Calc -> (Calc, a)) ->         -- action
        (a -> (Calc -> (Calc, b))) ->  -- continuation
        (Calc -> (Calc, b))            -- new action

The parenthesized function types are our enriched types–the action types. This signature looks much more complex than the signature of the Maybe bind, but that's only because the enriched type is itself a function type. Other than that, the structure is the same: bind accepts an action and a continuation and returns a new action. The continuation in this case takes an argument of type a (the value to be calculated by the first action) and returns the composite action.

In fact, if we define Action as a type alias:

type Action a = Calc -> (Calc, a)

the signature of bind can be abbreviated to:

bind :: (Action a) -> (a -> (Action b)) ->  (Action b)

Now for the implementation. Since the result of bind is an action, it has to return a lambda of the appropriate signature.

bind act cont =
    \calc -> ... produce (Calc, b) tuple ...

Bind is supposed to compose the action, act, with the continuation, cont. So it should first apply act to calc.

let (calc', v) = act calc

The result is a tuple (calc', v) with a new calculator and a value v of type a.

This is the v that the continuation expects, so the next step is to apply the continuation:

act' = cont v

The result of the continuation is a new action. This new action can then be executed, that is applied to the new calculator:

act' calc'

to produce the desired result– a tuple of the type (Calc, b).

Here’s the final code:

bind act cont =
    \calc ->
        let (calc', v) = act calc
            act' = cont v
        in
            act' calc'

To complete our construction of the monad, we need to define return. The signature of return is:

return :: a -> Action a

and the implementation is pretty straightforward. It takes a value v and returns a promise to return this value.

return v = \calc -> (calc, v)

An astute reader might notice that nothing in this construction depends on the peculiarities of the type Calc. It will work for any type that is used to represent state. So we have in fact just constructed a generic state monad. The stack-based calculator is just a silly example of that monad.
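
For the curious, here's a sketch of that generic construction (the name StateAction is mine, not part of the original code; libraries such as mtl package a polished version of this idea as the State monad). Notice that the bodies of bind and return are exactly the ones we wrote above; only Calc has been generalized to a type variable s.

type StateAction s a = s -> (s, a)

bind :: StateAction s a -> (a -> StateAction s b) -> StateAction s b
bind act cont =
    \st ->
        let (st', v) = act st
            act' = cont v
        in
            act' st'

return :: a -> StateAction s a
return v = \st -> (st, v)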

It’s not difficult to implement bind as an infix operator, >>=, and turn the calculator into a monad that’s recognizable by the Haskell compiler (see Appendix 1). Then the relevant part of the add function may be rewritten in the do notation:

add x y = do
    pushCalc x
    pushCalc y
    addCalc
    r <- popCalc
    return r

Let me present the same code without any syntactic sugar, using the cascading lambda-within-lambda notation:

add x y =
  bind (pushCalc x) 
       (\() -> bind (pushCalc y)
                    (\() -> bind addCalc
                                 (\() -> bind popCalc
                                              (\z -> return z))))

This is not something you will see often in Haskell programs, but I will eventually want to go beyond Haskell. My goal is to connect back with C++, and this is the form that's most convenient for making such a transition.

So let's painstakingly analyze this code. We are binding the first action, (pushCalc x), to the rest of the code. The rest of the code is expressed as one huge lambda. To make these two parts fit together, their types have to match. The value produced by the action pushCalc is void (a.k.a., "unit")– so its type is Action (). Therefore the lambda to which it binds must also take void, hence the notation:

\() -> ...

The body of that lambda is another bind, and so on, until we get to the interesting part, which is popCalc.

popCalc is an action that calculates a value: its signature is Action Int. This value is passed to the lambda to which popCalc is bound. Therefore this last lambda takes an Int argument, z. Finally, this value is enclosed in an action, and that’s done by the function return.

This unsugared notation elucidates one more aspect of the monadic approach that’s very relevant in the context of Haskell. Haskell is a lazy language: it doesn’t evaluate anything unless it is strictly necessary for achieving some final goal. Also, when it needs to evaluate several independent things, it will do that in some arbitrary, unpredictable order. So if it were somehow possible to implement the imperative versions of push and pop in Haskell, we would have two problems: push would never be evaluated because it produces no result, and even if it were, its evaluation could be swapped with the subsequent pop. Monadic bind forces the ordered evaluation of actions by introducing explicit data dependency. If pop follows push in the chain of binds, pop cannot be evaluated before push because its argument is the calculator that is returned by push. The two are linked by data dependency which, by the way, is not so obvious in the do notation.

Conclusion

The state monad is a very interesting pattern from the programming point of view. Instead of doing something, you create an action that is executed (maybe even multiple times) later. The monadic scaffolding provides the standard amenities like the ability to compose actions, and the do notation makes writing functions that produce functions much more natural. There is an interesting variation of the state monad called the IO monad, which is used for input and output in Haskell. I describe it in Appendix 2.

There are many patterns in imperative languages that have elements, or sometimes just hints, of a state monad. For instance, in the OO world you might encounter a very useful Command Pattern. You can “bind” command objects using the Composite Pattern. In languages that support anonymous functions and closures, like JavaScript, C# and, recently C++, you can return functions from functions directly. This might help, for instance, in dealing with inversion of control, where you return a closure as an event handler (that would be material for another series of blog posts).

But I have in mind a very specific example that I’ve been working on in C++ that fits the monadic pattern perfectly, and I’m going to write about it in the next installment of this series.

I’d like to thank Eric Niebler for valuable comments on the draft of this post.

Appendix 1

The full implementation of a stack-based calculator requires a few more Haskell tricks. First, we have to explicitly define our type constructor. I’ll call the new type Calculation, with the type constructor CL that encapsulates an action:

newtype Calculation a = CL (Calc -> (Calc, a))

Monadic functions have to return this new type, so they all wrap their actions into a Calculation.

pushCalc n =
    CL (\(Calc lst) -> (Calc (n:lst), ()))

topCalc = 
    CL (\(Calc lst) -> (Calc lst, head lst))

popCalc =
    CL (\(Calc lst) -> (Calc (tail lst), head lst))

addCalc =
    CL (\(Calc lst) -> let (a:b:rest) = lst
                       in
                           (Calc ((a + b):rest), ()))

Haskell has a built-in type class for Monads (think of a type class as a C++ concept). We have to tell Haskell that our Calculation is an instance of Monad and provide the definition of the two associated functions: bind, using infix notation, and return.

instance Monad Calculation where
    return x = 
        CL (\calc -> (calc, x))
    CL(c) >>= cont =
        CL (\calc ->
            let (calc', v) = c calc
                CL c' = cont v
            in
                c' calc')

With those definitions, our add function can be written using the do notation.

add x y = do
    pushCalc x
    pushCalc y
    addCalc
    r <- popCalc
    return r

Since we are not expecting any values to be calculated by pushCalc or addCalc, there are no left arrows accompanying them in the do notation.
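
One thing the appendix doesn't show is how to actually execute a Calculation. A small helper will do it (the name runCalc is my own, not part of the original code): it unwraps the action and applies it to a calculator.

runCalc :: Calculation a -> Calc -> (Calc, a)
runCalc (CL act) calc = act calc

-- for example, snd (runCalc (add 2 3) (Calc [])) evaluates to 5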

Appendix 1a: The Applicative

Haskell keeps evolving and, on occasion, the changes in the language break old code. In particular the Monad class has a superclass now, called Applicative. That’s why the Monad instance for Calculation won’t compile any more, unless you explicitly add the instance for Applicative. Fortunately, the Applicative functionality can be fully implemented using the Monad interface, as in:

instance Applicative Calculation where
  pure = return
  mf <*> mx = do f <- mf
                 x <- mx
                 return (f x)

This won't compile either, because Applicative requires Functor as its superclass. So we also have to make Calculation a Functor. The simplest way is to let the compiler derive it, but you have to include this line at the very top of the file:

{-# LANGUAGE DeriveFunctor #-}

and modify the definition of Calculation:

newtype Calculation a = CL (Calc -> (Calc, a))
  deriving Functor

In fact, it’s possible to implement the calculator as an Applicative, without the need for a Monad instance. But that’s a different story.

Appendix 2: The IO Monad

Strictly speaking a lazy purely functional language like Haskell cannot do any input or output. That’s because the compiler is free to memoize the result of the first call to, say, getChar and elide all subsequent calls. Calls to putChar, which don’t return anything, may be ignored altogether. This is why most functional languages cheat when it comes to I/O. But not Haskell. Monads to the rescue!

Let’s think for a moment why getChar may return different characters every time it’s called. It’s because there is a keyboard somewhere out there that changes its state. Why is it changing its state? Because there is a human who pushes the keys. Why is the human pushing the keys? Because he or she got a phone call from China that the price of rice is about to go up, so it makes sense to buy some futures. And so on… In other words there is this external world that is full of state.

What if we encapsulate that whole world in a hidden data structure and write our program as an action that operates on it? This is very similar to the state monad pattern except that here the programmer has no access to the state and cannot execute the action. The action is produced by main and passed to the system for execution. It’s the system, wink wink, that has access to “the world” and may pass it as a state argument to that action. Of course it’s all smoke and mirrors, but it successfully staves off the insanity of the external world from impinging on the purity of Haskell.

How does it work in practice? There is a monad called IO. It’s almost like a state monad, except that its type can’t be expressed in Haskell, because it would have to look something like this:

type IO a = World -> (World, a)

and we don’t know what World is. The main function in a Haskell program is a monadic IO action, usually:

main :: IO ()

with the type parameter a replaced by unit, (), (although any other type will work too).

The simplest main is just a single IO action:

main = putStrLn "Hello World!"

but in general main is a do block.
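
For instance, here's a minimal sketch (not from the original post) of a main written as a do block, mixing IO actions with a call to an ordinary pure function:

main :: IO ()
main = do
    putStrLn "What's your name?"
    name <- getLine                  -- an IO action that produces a String
    putStrLn (greeting name)         -- another IO action

-- an ordinary pure function, called from inside the IO action
greeting :: String -> String
greeting name = "Hello, " ++ name ++ "!"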

You might ask: If a Haskell program is one monadic IO action, then where does the traditional functional code go? The answer is that you can call traditional functions from anywhere in the do block. Even in my highly biased example there were several non-monadic function calls (head, tail, operator (+), etc.). Imagine a Haskell program as a tree: its trunk is monadic IO, and so are all the thick branches that have anything to do with I/O. But the thinner branches and leaves are your run-of-the-mill functions that get (lazily) evaluated only when the main IO action is executed by the system.

Another interesting observation is that all functions that perform I/O have this information encoded in their type signature. Not in the types of their arguments, mind you, but in the return type. This is almost the opposite of what you see in imperative languages, where you'd have to pass some kind of I/O object (file or stream) to a procedure that performs I/O (except when that object is global, like standard input/output in C++). In Haskell, if you want your function to perform I/O, two things must happen: it must return an IO action that it internally constructs; and the action must find its way to the very top, to the main function. On the way up, it might be bound with other such actions either using bind or the do notation.

You might ask: How does Haskell make sure that all IO actions get to the top, so that the system may execute them? It doesn’t! But consider what you would have to do in order not to pass an action to the top. You’d have to explicitly ignore it, as in:

let _ = putStrLn "Don't print me!"

Ignoring things in Haskell never happens by default– you have to do it explicitly.


In my previous post I talked about categories, functors, endofunctors, and a little about monads. Why didn’t I just start with a Haskell definition of a monad and skip the math? Because I believe that monads are bigger than that. I don’t want to pigeonhole monads based on some specific language or application. If you know what to look for you’ll find monads everywhere. Some languages support them better, as a single general abstraction, others require you to re-implement them from scratch for every use case. I want you to be able to look at a piece of C++ code and say, “Hey, that looks like a monad.” That’s why I started from category theory, which sits above all programming languages and provides a natural platform to talk about monads.

In this installment I will talk about some specific programming challenges that, on the surface, seem to have very little in common. I will show how they can be solved and how from these disparate solutions a pattern emerges that can be recognized as a monad. I’ll talk in some detail about the Maybe monad and the List monad, since they are simple and illustrate the basic ideas nicely. I will leave the really interesting examples, like the state monad and its variations, to the next installment.

Challenges of Functional Programming

You know what a function is: called with the same argument it always returns the same result. A lot of useful computations map directly to functions. Still, a lot don’t. We often have to deal with some notions of computations that are not functions. We can describe them in words or implement as procedures with side effects. In non-functional languages such computations are often called “functions” anyway– I’ll enclose such usage with quotation marks. Eugenio Moggi, the guy who introduced monads to computing, listed a few examples:

  • Partial “functions”: For some arguments they never terminate (e.g., go into an infinite loop). These are not really functions in the mathematical sense.
  • Nondeterministic “functions”: They don’t return a single result but a choice of results. Non-deterministic parsers are like this. When given an input, they return a set of possible parses. Which of them is right depends on the context. These are not really functions because a function must return a single value.
  • “Functions” with side effects: They access and modify some external state and, when called repeatedly, return different results depending on that state.
  • “Functions” that throw exceptions: They are not defined for certain arguments (or states) and they throw exceptions instead of returning a result.
  • Continuations.
  • Interactive input: getchar is a good example. Its result depends on what hits the keyboard.
  • Interactive output: putchar is a good example. It’s not a function because it doesn’t return anything (if it returns an error code, it doesn’t depend on the argument). Still, it can’t be just optimized away if we want any output to appear on the screen.

The amazing thing is that, using some creative thinking, all these problems can be solved using pure functional methods. I won’t discuss all of them, but I’ll show you a series of progressively more interesting examples.

Error Propagation and Exceptions

Who doesn’t deal with error propagation? Let’s dissect the problem: You are defining a computation that can return a valid result only for a subset of possible arguments. So what happens when the argument is “wrong”? Well, you can return an error.

In the simplest case you might add one extra bit, success or failure, to your result. Sometimes you can spare a bit in the result type: If the correct result is always a non-negative number, you may return negative one as an error. Or, if the result is a pointer, you might return a null pointer instead. But these are just small hacks introduced for the sake of performance. And, like most hacks, they can lead to dangerous code. Consider this (likely incorrect) code:

size_t off = fileName.find('.');
string ext = fileName.substr(off, fileName.length() - off);

If fileName contains no dot, the result is a special value npos signifying an error. The problem is that npos is of the same type as a non-error result, so it can be passed quietly to string::substr causing undefined behavior.

A more general and safer solution is to change– extend– the type of the result to include an error bit. You can do it for any type of result. In Haskell, you just define a type constructor called Maybe:

data Maybe a = Nothing | Just a

It takes an arbitrary type a and defines a new type that adds the special value Nothing to the set of possible values of a. In C++ that would be equivalent to a template:

template<class T>
struct Maybe {
    T just; // valid if 'nothing' is false
    bool nothing;
};

(This is just an example. I’m not arguing that Maybe would be very useful in C++ which has other mechanisms for error propagation.)

So here we have a way to turn a computation that’s not defined for all values into a function that’s defined over the whole domain, but which returns a new richer type.

The next question is, do these new things– functions returning Maybe— compose? What should the caller do with the result of such a function when it’s part of a larger computation? One thing is for sure, this result cannot be passed directly to an unsuspecting function–the error cannot be easily ignored– which is good. If we replace find by a new function safe_find that returns a Maybe<size_t>, the client won’t call substr with it. The types wouldn’t match. Instead, the result of safe_find must be unpacked and (much more likely than before) tested.

Maybe<size_t> off = safe_find(fileName, '.');
string ext;
if (!off.nothing)
    ext = fileName.substr(off.just, fileName.length() - off.just);

Notice what happened here: By changing the return type of a function we broke the natural way functions compose– the result of one becoming the input of the next. On the other hand, we have come up with a new way of composing such functions. Let me chain a few such compositions for better effect:

Maybe<Foo> foo = f(x);
if (!foo.nothing) {
    Maybe<Bar> bar = g(foo.just);
    if (!bar.nothing) {
        Maybe<Baz> baz = h(bar.just);
        if (!baz.nothing) {
            ...
        }
    }
}

The repeated tests and unpacking of the Maybe values are the "glue" used to compose our new, more error-conscious functions. Notice how this boilerplate glue gets in the way of code clarity. Ideally, we would like to write something like this:

DO
{
    auto y = f(x);
    auto v = g(y);
    auto z = h(v);
    return z;
}

where DO would magically provide the glue that’s implicit in the chaining of f, g, and h.

It’s not easy to do this– abstract the glue away– in C++. I’m not even going to try. But in Haskell it’s a different story. Let’s start with the almost direct translation of the C++ code into Haskell:

compose n =
    let m = f n in
    case m of
    Nothing -> Nothing
    Just n1 ->
        let m1 = g n1 in
        case m1 of
        Nothing -> Nothing
        Just n2 -> 
            let n3 = h n2 in
            n3

The if statements are replaced by Haskell's pattern matching (case x of). The ms are the Maybes and the ns are their contents (if they aren't Nothings).

Notice the characteristic cascading style of this code– the nested conditionals or pattern matches. Let's analyze one level of such a cascade. We start with a Maybe value (one returned by f n). We unpack it and examine the contents. If the result is not an error (Just n1), we continue with the rest of the body of compose. The "rest of the body" is everything that follows the successful match on Just n1; the pattern match itself is part of the "glue".

What’s also important is that, if the result is an error (the Nothing branch of the pattern match) the whole “rest of the body” is skipped. In order to abstract the glue, we have to abstract the “rest of the body” as well. Such an abstraction is called a continuation. Let’s write this continuation as a lambda (lambdas in Haskell are written using the backslash, which is nothing but the Greek letter λ missing one leg):

\ n1 -> 
    let m1 = g n1 in
    case m1 of
    Nothing -> Nothing
    Just n2 -> 
        let n3 = h n2 in
        n3

And here’s the trick: We can abstract the glue as a (higher-order) function that takes a Maybe value and a continuation. Let’s call this new function bind and rewrite compose with its help:

compose n =
    let m = f n in
    -- the first argument is m, the second is the whole continuation
    bind m (\n1 ->
                let m1 = g n1 in
                case m1 of
                Nothing -> Nothing
                Just n2 -> 
                    let n3 = h n2 in
                    n3)

Here’s how bind is implemented:

bind m f = 
    case m of 
    Nothing -> Nothing  -- bypass the continuation
    Just n -> f n       -- pass n to the continuation

or, more compactly,

bind Nothing cont  = Nothing
bind (Just n) cont = cont n

Figure 1 illustrates the complete code transformation. The result of f n, which is a Maybe, is passed to bind, represented by a blue box. Inside bind the Maybe is unpacked. If its value is Nothing, nothing happens. If its value is Just n1, the rest of the code, the continuation, is called with the argument n1. The continuation itself is rewritten using bind, and so on. The final continuation calls return, which I will explain shortly.

Fig 1. Composition of functions that return Maybe results

The position of bind, the blue box in Figure 1, between its Maybe argument and the continuation suggests that infix notation might be more appropriate. Indeed, in Haskell bind is represented by the infix operator, >>=:

Nothing  >>= cont = Nothing
(Just x) >>= cont = cont x

(The left-hand side of the equal sign is the operator between its two arguments [actually, patterns], and the right-hand side is the body of the function.) We can express the type signature of the bind function as:

(>>=) :: Maybe a -> (a -> Maybe b) -> Maybe b

It takes a Maybe a and a function from a to Maybe b (that’s our continuation). It returns a Maybe b. When dealing with higher order functions I find type signatures extremely useful; so much so that I often use them as comments in my C++ meta-code.

The complete rewrite of compose using >>= looks like this:

compose1 n =
    f n >>= \n1 ->
        g n1 >>= \n2 ->
            h n2 >>= \n3 ->
                return n3

Now is the time to explain the mysterious return at the end of the function. No, it’s not the keyword for returning a result from a function. It’s a function that takes an argument of type a and turns it into a Maybe a:

return n = Just n

We need return because the result of compose is a Maybe. If any of the functions returns Nothing, compose returns Nothing (it’s part of the definition of >>=). Only when all functions succeed do we call return with the correct result, n3. It turns n3 into Just n3, thus announcing the success of the computation and encapsulating the final result.

You might say that using return instead of Just n3 is overkill, but there are good reasons to do that. One is to encapsulate and hide direct access to the implementation of Maybe. The other has to do with the ways this pattern is generalized beyond Maybe.

The Maybe Monad

What does Maybe have to do with monads? Let’s see what we have done so far from a more general perspective.

We started from a computation which didn't behave like a function– here, it was not defined for all arguments. We found a clever way to turn it into a function by enriching its return type. Such general "enriching" of types can be expressed as a type constructor.

From a category-theoretical point of view, defining a type constructor, which is a mapping from types to types, gets us half way towards defining an endofunctor. The other part is the mapping of functions (morphisms). For any function that takes values of type a and returns values of type b we need a corresponding function that acts between the enriched types. In Haskell, the mapping of functions to functions can be expressed as a higher-order function. When this mapping is part of an endofunctor, the corresponding higher-order function is called fmap.

Let’s call the type constructor M. It constructs the enriched type M a from any type a. The function fmap has the following signature:

fmap :: (a -> b) -> (M a -> M b)

It maps a function from a to b to a function from M a to M b. In the case of Maybe we already know the type constructor part, and fmap will follow soon.

Let’s go back to our computation turned function returning an enriched type. In general, its signature is:

f :: a -> M b

Since we care about composability of computations, we need a way to glue the enriched functions together. In the process of gluing we usually have to inspect the enriched type and make decisions based on its value. One such decision might be whether to continue with the rest of the computation or not. So the general bind operation must take two arguments: the result of the previous enriched function and the rest of the computation– a continuation:

bind :: M a -> (a -> M b) -> M b

If you squint enough, you can see a similarity between bind and fmap.

Here’s how you do it: Imagine that you supply bind with its second argument, the continuation. There is still one free parameter of the type M a. So what’s left is a function that takes M a and returns M b. (This is not exactly currying, since we are binding the second argument rather than the first, but the idea is the same.) That’s exactly the type of function that occurs on the right hand side of the signature of fmap. Seen from that point of view, bind maps functions into functions:

(a -> M b) -> (M a -> M b)

The function on the left-hand side is our continuation; the one on the right-hand side is bind with its second argument fixed to the continuation.

This signature is really close to that of fmap, but not quite. Luckily, we still have a second family of functions at our disposal, the polymorphic return. The signature of return is:

return :: a -> M a

Using both bind and return we can lift any function f:

f :: a -> b

to a function g:

g :: M a -> M b

Here’s the magic formula that defines g in terms of f, bind, and return (the dot denotes regular function composition):

g ma = bind ma (return . f)

Or, using infix notation:

g ma = ma >>= (return . f)

This mapping of f into g becomes our definition of fmap. Of course, fmap must obey some axioms, in particular it must preserve function composition and interact correctly with unit (which I will discuss shortly). These axioms translate back into corresponding conditions on bind and return, which I’m not going to discuss here.
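
For Maybe, you can check that this formula reproduces the fmap you would have written by hand (a quick sketch; fmapMaybe is my name for the specialized version):

-- fmap for Maybe, written in terms of bind and return
fmapMaybe :: (a -> b) -> Maybe a -> Maybe b
fmapMaybe f ma = ma >>= (return . f)

-- expanding the Maybe definitions of >>= and return:
--   fmapMaybe f Nothing  = Nothing
--   fmapMaybe f (Just x) = Just (f x)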

Now that we have recovered the functor, we need the two other parts of a monad: unit and join. Guess what, return is just another name for unit. For any type, it lifts an element of that type to an element of the enriched type in the most natural manner (incidentally, the word natural has a very precise meaning in category theory; both unit and join are natural transformations of the functor (M, fmap)).

To recover join, let’s look at its type signature in Haskell:

join :: M (M a) -> M a

It collapses one layer of the type constructor. In the case of Maybe it should map a double Maybe to a single Maybe. It’s easy to figure out that it should map Just (Just x) into Just x and anything else to Nothing. But there is a more general way of defining join using bind combined with the identity function:

join :: M (M a) -> M a
join mmx = mmx >>= id

We are binding a value of type M (M a) to a function id. This function, when acting on M a, returns M a. You can convince yourself that these signatures match the general signature of bind (see Appendix); and that this definition, when applied to Maybe, produces the correct result.
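
Here's that check for Maybe spelled out (joinMaybe is my name for the specialized version):

joinMaybe :: Maybe (Maybe a) -> Maybe a
joinMaybe mmx = mmx >>= id

-- Nothing        >>= id  =  Nothing
-- Just Nothing   >>= id  =  id Nothing   =  Nothing
-- Just (Just x)  >>= id  =  id (Just x)  =  Just x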

To summarize: What we have done here is to show that Maybe together with bind and return is a monad in the category-theory sense. Actually, we have shown something far more general: Any triple that consists of a type constructor and two functions bind and return that obey certain identities (axioms) define a monad. In category theory this triple is called a Kleisli triple and can be used as an alternative definition of a monad.

Syntactic Sugar

So far you’ve seen only the Maybe example, so it might seem like we are pulling out a monadic cannon to kill a mosquito. But, as you’ll see later, this same pattern pops up in a lot of places in various guises. In fact it’s so common in functional languages that it acquired its own syntactic sugar. In Haskell, this sugar is called the do notation. Let’s go back to the implementation of our function compose:

compose n =
    f n >>= \n1 ->
        g n1 >>= \n2 ->
            h n2 >>= \n3 ->
                return n3

Here’s exactly the same code in do notation:

compose n = do
    n1 <- f n
    n2 <- g n1
    n3 <- h n2
    return n3

This looks deceptively similar to an imperative program. In fact here's a C++ version of it (for the sake of the argument let's assume that f takes a pointer to Foo and h returns an integer):

int compose(Foo * n)
{
    auto n1 = f(n);
    auto n2 = g(n1);
    auto n3 = h(n2);
    return n3;
}

Uh, one more thing. This is how you would call it:

try {
    compose(pFoo);
}
catch(...) {
    // error handling
}

In C++ you get virtually the same functionality not by modifying the return type, but by throwing an exception. (Now you may utter your first cry, “Hey, that looks like a monad!”, when looking at C++ code.)

Just like in our Haskell example, once any of the functions reports an error (throws an exception), the rest of the body of compose is skipped. You might say that C++ exceptions offer more power than the Maybe monad and you’ll be right. But that’s because I haven’t shown you the Error monad and the Exception monad.

Where Haskell's Exception monad beats C++ exceptions is in type checking. Remember the unfortunate attempt at adding exception specifications to the C++ type system? (Java went the same way.) Here's what the C++ standard says:

An implementation shall not reject an expression merely because when executed it throws or might throw an exception that the containing function does not allow.

In other words, exceptions specifications are bona fide comments. Not so in Haskell! If a function returns a Maybe or Exception, that becomes part of its type signature which is checked both at the call site and at the function definition site. No cheating is allowed, period.

But the major difference between Haskell's and C++'s approach is that the do notation generalizes to all monads, whereas C++'s neat try/catch syntax applies only to exceptions.

A word of caution when reading Haskell monadic code. Despite similarities in usage, Haskell’s left arrow is not an assignment. The left hand side identifier corresponds to the argument of the continuation (which is the rest of the do block). It is the result of unpacking the outcome of the right hand side expression. This unpacking always happens inside bind.

Non-deterministic Computations

In the previous post I introduced the list monad by defining a functor and two families of functions. The type constructor part of the functor mapped any type a into the list of a, [a]. The part that acted on functions (now we know that, in general, it’s called fmap; but for lists it’s just map) worked by applying the function to each element of the list. The unit and the join were polymorphic functions. unit was defined as:

 unit x = [x]

and join as concat.

Now I can tell you that the list monad provides the solution to the problem of implementing non-deterministic computations. Imagine parsing a non-deterministic grammar. A production might offer multiple alternative parse trees for the same input.

These types of computations may be simulated with functions returning lists of possible results. Here’s the same trick again: whatever type is returned by a deterministic parser (e.g., a parse-tree type), it’s now turned into an enriched type (a parse tree list).

We've already seen the category-theory construction of the list monad but here we are facing a slightly different problem: how to string together functions that return lists. We know that we have to define bind. Its signature, in this case, would be:

bind :: [a] -> (a -> [b]) -> [b]

It takes a list (which is the result of a previous function) and a continuation, (a -> [b]), and must produce another list. The obvious thing is to apply the continuation to each element of the input list. But since the continuation also produces a list, we’ll end up with a list of lists. To get a single list as the result, we’ll simply concatenate the sublists. Here’s the final implementation:

xs >>= cont = concat (map cont xs)

The application of the continuation to each element of the input list is done using map.
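
A tiny worked example (mine, not from the original post) shows the concatenation at work:

-- a "non-deterministic" function: every number yields two results
plusMinus :: Int -> [Int]
plusMinus x = [x, -x]

-- [1, 2, 3] >>= plusMinus
--   = concat (map plusMinus [1, 2, 3])
--   = concat [[1, -1], [2, -2], [3, -3]]
--   = [1, -1, 2, -2, 3, -3]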

Is it a coincidence that we were able to define bind in terms of join and fmap? Not at all. The general formula for converting the functor definition of a monad to a Kleisli triple is:

  • Take the object-mapping part of the functor (the type constructor)
  • Define bind as
    bind x f = join ((fmap f) x)

    where fmap is the part of the functor that maps morphisms.

  • Define return as unit

Now we know how to move between those two definitions back and forth.

Just as in the case of Maybe, we can apply the do notation to functions returning lists:

toss2Dice = do
    n <- tossDie
    m <- tossDie
    return (n + m)

Here, tossDie returns the list of all possible outcomes of a die toss, and toss2Dice returns the list of all possible sums of the outcomes of a two-die toss.
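
The post doesn't show tossDie itself, but a plausible definition is just the list of all six outcomes:

tossDie :: [Int]
tossDie = [1 .. 6]

-- toss2Dice then evaluates to all 36 possible sums, in order:
-- [2,3,4,5,6,7, 3,4,5,6,7,8, ..., 7,8,9,10,11,12]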

An interesting observation is that the list monad is closely related to list comprehensions, down to the use of left arrows (in Haskell). For instance, the above example is equivalent to:

toss2Dice = [n + m | n <- tossDie, m <- tossDie]

Conclusions

There is a large class of computations that convert input to output in a non-functional way. Many of those can be expressed as functions that return “enriched” output. This enrichment of the output can be expressed in terms of a type constructor. This type constructor defines the first component of a monad.

Computations have to be composable, so we need a way of composing the enriched functions. This introduces the second component of the monad, the bind family of higher-order functions.

Finally, we should be able to construct functions that return enriched types, and for that we need the third component of the monad, the return family of functions. They convert regular values into their enriched counterparts.

I’ve shown you two examples of Haskell monads in action, but the real treat will come in the third installment of my mini-series. I’ll show you how to define and compose functions that, instead of returning regular values, return functions that calculate those values.

Appendix: Join From Bind

By popular demand (see comments), I decided to explain in more detail the typing of the formula:

join :: M (M a) -> M a
join mmx = mmx >>= id

At first look it seems like id doesn’t have the right signature for the second parameter to bind:

bind :: M a' -> (a' -> M b') -> M b'

(I changed the names of type arguments to a' and b' to avoid confusion.) However, notice that here we are calling bind with the first argument, mmx, of type M (M a). It means that a' in this particular instantiation of bind is (M a). The second argument to bind is id with the signature:

id :: c -> c

Here, id will be called with c equal to (M a), so its return type will also be (M a). With those substitutions, bind will also return type (M a):

M (M a) -> (M a -> M a) -> M a

Thus the whole combination (mmx >>= id) ends up with the right type signature for join.

Bibliography

  1. Eugenio Moggi, Notions of Computation and Monads. This is a hard core research paper that started the whole monad movement in functional languages.
  2. Philip Wadler, Monads for Functional Programming. The classic paper introducing monads into Haskell.
  3. Tony Morris, What Does Monad Mean?
  4. Brian Beckman, Don’t fear the Monad
  5. Graham Hutton, Erik Meijer, Monadic Parsing in Haskell.
  6. Brian McNamara, Yannis Smaragdakis, Syntax sugar for FC++: lambda, infix, monads, and more. A C++ template library for functional programming that imitates Haskell pretty closely.

The Monad is like a bellows:
it is empty yet infinitely capable.
The more you use it, the more it produces;
the more you talk about it, the less you understand.

–Monad Te Ching

I don't know if I'm exaggerating but it seems like every programmer who gets monads posts a tutorial about them. (And each post begins with: There's already a lot of monad tutorials on the Internet, but…) The reason is that getting monads is like a spiritual experience that you want to share with others.

When facing a monad, people often behave like the three blind men describing an elephant. You’ll see monads described as containers and monads described as actions. Some people see them as a cover-up for side effects, others as examples of endofunctors in Category Theory.

Monads are hard to describe because they don’t correspond to anything in our everyday experience. Compare this with Objects in Object-Oriented programming. Even an infant knows what an object is (something you can put in your mouth). What do you do with a monad?

But first, let me answer the pertinent question:

Why Bother?

Monads enable pure functional programmers to implement mutation, state, I/O, and a plethora of other things that are not functions. Well, you might say, they brought it on themselves. They tied their hands behind their backs and now they’re bragging that they can type with their toes. Why should we pay attention?

The thing is, all those non-functional things that we are so used to doing in imperative programming are also sources of a lot of troubles. Take side effects for instance. Smart programmers (read: the ones who burnt their fingers too many times) try to minimize the use of global and static variables for fear of side effects. That’s doable if you know what you’re doing. But the real game changer is multithreading. Controlling the sharing of state between threads is not just good programming practice– it’s a survival skill. Extreme programming models are in use that eliminate sharing altogether, like Erlang’s full isolation of processes and its restriction of message passing to values.

Monads stake the ground between the total anarchy of imperative languages and the rigid dictatorship of Erlang-like isolationism. They don't prohibit sharing or side effects but let you control them. And, since the control is exercised through the type system, a program that uses monads can be checked for correctness by the compiler. Considering how hard it is to test for data races in imperative programs, I think it's worth investing some time to learn monads.

There is also a completely different motivation: metaprogramming. The template language used for metaprogramming in C++ is a pure functional language (see my blog post, What does Haskell have to do with C++?). If monads are so important in functional programming, they must also pop up in C++ metaprogramming. And indeed they do. I hope to discuss this topic in a future post.

So what’s a monad?

A Categorical Answer

If you don’t know anything about category theory, don’t get intimidated. This is really simple stuff and it will clarify a lot of things, not to mention earning you some bragging rights. My main goal is to share some intuitions from mathematics that will build foundations for a deeper understanding of monads in programming. In this installment I will explain categories, functors, and endofunctors, leading up to monads. I will give examples taken both from everyday life and from programming. I will really get into monads and their practical applications in the next installment, so be patient.

Categories

A category is a natural extension of our notion of sets and functions. The generalization of a set in a category is called an object (a pretty neutral term with little semantic ballast), and the generalization of a function is called a morphism. In fact, the standard example of a category is the category of sets and functions called (capital letter) Set.

A morphism (read “function”) goes from one object (read “set”) to another. Mathematical functions like sin or exp usually go from the set of real numbers to the set of real numbers. But you may also define functions like isPrime that go from natural numbers to Booleans, or a function price that goes from a set of goods to the set of numbers.

The only thing a mathematician needs to know about morphisms is that they can be composed. If you have a morphism from A to B, A->B, and another going from B to C, B->C, then they can be composed to a morphism from A to C, A->C. And just like the standard composition of functions, morphism composition must be associative, so we don’t need parentheses when composing more than two of them.

Actually, two things. There must be, for every object, a special morphism called identity that essentially does nothing and when composed with any other morphism reproduces the same morphism.

Just to throw you off the track, a category doesn't have to be built on sets and functions. You can easily construct simple categories from blobs and arrows. Fig 1 shows such a category that contains two objects and four morphisms: arrows between them (formally, those arrows are ordered pairs of objects so, for instance, f is a pair (A, B)). You can easily check that any two morphisms can be composed and that the two morphisms iA and iB serve as identities.

Fig 1. A simple category with two objects and four morphisms.

That’s it! Hopefully I have just convinced you that a category is not a big deal. But let’s get down to Earth. The one category that’s really important in programming languages is the category of types and functions, in particular its Haskell version called Hask. There usually is a finite set of basic types like integers or Booleans, and an infinite set of derived types, like lists of integers, functions from integers to Booleans, etc. In Hask, a type is just a set of values. For instance, the type Char is a set {‘a’, ‘b’, ‘c’, … }.

So, in the category Hask, types are objects and functions are morphisms. Indeed, a function maps one type into another (forget for a moment functions of multiple arguments– they can be modeled with currying– and polymorphic functions– they are families of functions). And these are functions in the functional-programming sense: called with the same values they return the same values–no side effects allowed.

Function composition is just passing the result of one function as an argument to another. The identity function takes x and immediately returns it back.
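
In Haskell notation these two look like this (a trivial sketch; the fresh names avoid clashing with the built-in (.) and id):

-- function composition: feed the result of f into g
-- (the built-in operator for this is (.))
composeFn :: (b -> c) -> (a -> b) -> (a -> c)
composeFn g f = \x -> g (f x)

-- the identity function (the built-in one is called id)
identityFn :: a -> a
identityFn x = x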

This is all fine, but what’s in it for me, you might ask. So here’s the first insight and a moment of Zen. If there is one thing that you can call the essence of programming, it’s composability. In any style of programming you always compose your program from smaller pieces, and those pieces from even smaller pieces, and so on. That’s why categories with their composable morphisms are so important. The essence of Lego blocks is the way they fit together, their composability, not the color or size. The essence of functional programming is how functions work together: how you can build larger functions from smaller ones.

Every category is defined by its choice of objects and morphisms. But is there something that can characterize a given category that’s independent of its choice of particular objects and morphisms? How do you expose the inner structure of a particular category? Mathematicians know exactly how to do that. You have to be able to map categories into other categories while preserving some constraints imposed by the way morphisms are attached to objects and the way they compose. Such maps let you find similarities between categories and catalog different kinds of categories. That’s when things get really interesting.

Functors

A functor, F, is a map from one category to another: it maps objects into objects and morphisms into morphisms. But it can’t do it in a haphazard way because that would destroy the very structures that we are after. So we must impose some “obvious” (mathematicians love that word) constraints.

First of all, if you have a morphism between two objects in the first category then it better be mapped into a morphism between the corresponding objects in the second category. Fig 2 explains this diagrammatically. Object A is mapped into F(A), object B into F(B). A morphism f from A to B is mapped into a morphism F(f) from F(A) to F(B). Mathematicians say that such a diagram must commute, that is the result must be the same whether you go from A to F(A) and then apply F(f), or first apply f and then go from B to F(B).


Fig 2. Diagram showing the action of a functor F on objects A and B and a morphism f. The bottom part lives in F's domain (source) category, the top part in its codomain (the target).

Moreover, such mapping should preserve the composition property of morphisms. So if morphism h is a composition of f and g, then F(h) must be a composition of F(f) and F(g). And, of course, the functor must map identity morphisms into identity morphisms.

To get a feel for how constrained functors are by these conditions, consider how you could map the category in Fig 1 into itself (such a functor just rearranges things inside one category). There are two trivial mappings that collapse both objects into one (either A or B), and turn all morphisms into identity. Then there is the identity functor that maps both objects into themselves and all morphisms into themselves. Finally, there is just one “interesting” functor that maps A into B and B into A with f and g switching roles. Now imagine a similar category but with the g arrow removed (yes, it’s still a category). Suddenly there is no functor other than the collapsing ones between Fig 1 and that new category. That’s because the two categories have completely different structure.

Let me now jump into more familiar territory. Since we are mostly interested in one category, Hask, let me define a functor that maps that category into itself (such functors are called endofunctors). An object in Hask is a type, so our functor must map types into types. The way to look at it is that a functor in Hask constructs one type from another– it’s a type constructor. Don’t get confused by the name: a type constructor creates a new type in your program, but that type already exists in Hask.

A classical example is the list type constructor. Given any type it constructs a list of that type. Type Integer is mapped into a list of integers or, in Haskell notation, [Integer]. Notice that this is not a map defined on integer values, like 1, 2, or 3. It also doesn’t add a new type to Hask— the type [Integer] is already there. It just maps one type into another. For C++ programmers: think of mapping type T into a container of T; for instance, std::vector<T>.

Mapping the types is the easy part; what about functions? We have to find a way to take a particular function and map it into a function on lists. That’s also easy: apply the function to each element of the list in turn. There is a higher-order function in Haskell that does it. It’s called map and it takes a function and a list and returns a new list (or, because of currying, you may say that it takes a function and returns a function acting on lists). In C++ there is a corresponding template function called std::transform (well, it takes a pair of input iterators, an output iterator, and a function object, but the idea is the same).
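
Here’s what this lifting looks like in C++, with std::vector standing in for the list (isPositive is just a made-up example function):

#include <algorithm>
#include <iterator>
#include <vector>

// A plain morphism: int -> bool.
bool isPositive(int n) { return n > 0; }

int main()
{
    std::vector<int> xs;
    xs.push_back(-1); xs.push_back(2); xs.push_back(3);

    // The "lifted" function acts on a vector of ints and yields a vector of bools.
    std::vector<bool> ys;
    std::transform(xs.begin(), xs.end(), std::back_inserter(ys), isPositive);
    // ys now holds {false, true, true}
}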

Mathematicians often use diagrams to illustrate the properties of morphisms and functors (see Fig 2). The arrows for morphisms are usually horizontal, while the arrows for functors are vertical (going up). That’s why the mapping of morphisms under a functor is often called lifting. You can take a function operating on integers and “lift it” (using a functor) to a function operating on lists of integers, and so on.

The list functor obviously preserves function composition and identity (I’ll leave it as an easy but instructive exercise for the reader).

And now for another moment of Zen. What’s the second most important property of programming? Reusability! Look what we have just done: We took all the functions we’ve implemented so far and lifted them to the level of lists. We’ve got functions operating on lists essentially for free (well, we’ve got a small but important subset of those functions). And the same trick may be applied to all kinds of containers, arrays, trees, queues, unique_ptrs and more.

It’s all beautiful, but you don’t really need category theory to apply functions to lists. Still it’s always good to see patterns in programming, and this one is definitely a keeper. The real revolution starts with monads. And, guess what, the list functor is actually a monad. You just need a few more ingredients.

What’s the intuition behind the statement that mappings expose the structure of the system? Consider the schematic of the London underground in Fig 3. It’s just a bunch of circles and lines. It’s only relevant because there is a mapping between the city of London and this schematic. The circles correspond to tube stations and the lines to train connections. Most importantly, if trains run between two stations, the corresponding circles in the diagram are connected by lines and vice versa: these are the constraints that the mapping preserves. The schematic shows a certain structure that exists in London (mostly hidden underground) which is made apparent by the mapping.

Fig 3. The schematic map of London underground system.

Interestingly, what I’m doing here is also a mapping: London and the underground map correspond to two categories. Train stations/circles are objects and train connections/lines are morphisms. How’s that for an example?

Endofunctors

Mathematicians love mappings that preserve “obvious” constraints. As I explained, such mappings abstract inner structures away from the details of implementation. But you can also learn a lot about structure by studying non-trivial mappings of a category into itself. Functors that map a category into itself are called endofunctors (like endoscopes, they let you look inside things). If functors expose similarities, endofunctors expose self-similarities. Take one look at the fractal fern, Fig 4, and you’ll understand how powerful self-similarity can be.


Fig 4. This fractal fern was generated using just four endomorphisms.

With a little bit of imagination you can see the list functor exposing fern-like structures inside Hask (Fig 5). Chars fan out into lists of Chars, which then fan out into lists of lists of Chars, and so on, ad infinitum. Horizontal structures described by functions from Char to Bool are reflected at higher and higher levels as functions on lists, lists of lists, etc.

Fig 5. The action of the list type constructor reveals fractal-like structure inside Hask. The functor lifts things up, the functions act horizontally.

A C++ template that takes a type parameter could be considered a type constructor. How likely is it that it also defines a functor (loosely speaking– C++ is not as mathematized as Haskell)? You have to ask yourself: Is the type parameter constrained in any way? It’s often hard to say, because type constraints are implicit in the body of a template and are tested only during instantiation. For instance, the type parameter for a std::vector must be copyable. That eliminates, for instance, classes that have private or deleted (in C++0x) copy constructors. This is not a problem though, because copyable types form a subcategory (I’m speaking really loosely now). The important thing is that a vector of copyable is itself copyable, so the “endo-” part of the endofunctor holds. In general you want to be able to feed the type created by the type constructor back to the type constructor, as in std::vector<std::vector<Foo>>. And, of course, you have to be able to lift functions in a generic way too, as in std::transform.
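
As a loose illustration, one can package the lifting as a little helper (call it fmap; it’s my own template, not a library function) and feed the constructed type back into itself, which is exactly the “endo-” part at work:

#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// A hypothetical fmap for std::vector: lifts f: A -> B to vector<A> -> vector<B>.
// The result type B is given explicitly to keep the sketch simple.
template<class B, class A, class F>
std::vector<B> fmap(F f, std::vector<A> const & xs)
{
    std::vector<B> ys;
    std::transform(xs.begin(), xs.end(), std::back_inserter(ys), f);
    return ys;
}

int length(std::string const & s) { return (int)s.length(); }

int main()
{
    // The type constructor applied twice: a vector of vectors.
    std::vector<std::vector<std::string>> vvs;

    // Lift length twice: once to vectors of strings, then to vectors of vectors.
    std::vector<std::vector<int>> vvi =
        fmap<std::vector<int>>(
            [](std::vector<std::string> const & vs) { return fmap<int>(length, vs); },
            vvs);
}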

Monads

Ooh, Monads!
–Haskell Simpson

It’s time to finally lift the veil. I’ll start with the definition of a monad that builds on the previous sections and is mostly used by mathematicians. There is another one that’s less intuitive but easier to use in programming. I’ll leave that one for later.

A monad is an endofunctor together with two special families of morphisms, both going vertically, one up and one down (for “directions” see Fig 5). The one going up is called unit and the one going down is called join.

Now we are juggling a lot of mappings so let’s slow down to build some intuition. Remember, a functor maps objects: in our case, types, which are sets of values. The functor doesn’t see what’s inside the objects; morphisms, in general, do. In our case, a morphism is a function that maps values of one type into values of another type. Our functors, which are defined by type constructors, usually map poorer types into richer types, in the sense that type Bool is a set that contains just two elements, True and False, but type [Bool] contains infinitely many lists of True and False.

Unit takes a value from the poorer type, then picks one value from the richer type, and pronounces the two roughly equivalent. Such a rough equivalent of True from the Bool object is the one-element list [True] from the [Bool] object. Similarly, unit would map False into [False]. It would also map integer 5 into [5] and so on.

Unit can be thought of as immersing values from a lower level into the higher level in the most natural way possible. By the way, in programming we call a family of functions defined for any type a polymorphic function. In C++, we would express unit as a template, like this:

template<class T>
std::vector<T> unit(T value) {
    std::vector<T> vec;
    vec.push_back(value);
    return vec;
}

To explain join, imagine the functor acting twice. For instance, from a given type T the list functor will first construct the type [T] (list of T), and then [[T]] (list of list of T). Join removes one layer of “listiness” by joining the sub-lists. Plainly speaking, it just concatenates the inner lists. Given, for instance, [[a, b], [c], [d, e]], it produces [a, b, c, d, e]. It’s a many-to-one mapping from the richer type to the poorer type and the type-parameterized family of joins also forms a polymorphic function (a template, in C++).
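
To complement the unit template above, here’s a minimal C++ sketch of join for vectors (again my own helper, not a standard function):

#include <cstddef>
#include <vector>

// join: flatten a vector of vectors into a single vector by concatenating the inner ones.
template<class T>
std::vector<T> join(std::vector<std::vector<T>> const & vs)
{
    std::vector<T> result;
    for (std::size_t i = 0; i != vs.size(); ++i)
        result.insert(result.end(), vs[i].begin(), vs[i].end());
    return result;
}

// For instance, join applied to {{a, b}, {c}, {d, e}} yields {a, b, c, d, e}.
// One of the monadic axioms mentioned below: join(unit(v)) == v for any vector v.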

There are a few monadic axioms that define the properties of unit and join (for instance that unit and join cancel each other), but I’m not going to elaborate on them. The important part is that the existence of unit and join imposes new constraints on the endofunctor and thus exposes even more structure.

Mathematicians look at join as the grandfather of all multiplication with unit being its neutral element. It’s heaven for mathematicians because multiplication leads to algebraic structures and indeed monads are great for constructing algebras and finding their hidden properties.

Unlike mathematicians, we programmers are not that interested in algebraic structures. So there must be something else that makes monads such a hit. As I mentioned in the beginning, in programming we often face problems that don’t naturally translate into the functional paradigm. There are some types of computations that are best expressed in imperative style. It doesn’t mean they can’t be translated into functions, it’s just that the translation is somewhat awkward and tedious. Monads provide an elegant tool to do this translation. Monads made possible the absorption and assimilation of imperative programming into functional programming, so much so that some people claim (tongue in cheek?) that Haskell is the best imperative language. And, like all things functional, monads are bound to turn around and find their place in imperative programming. But that’s material for my next blog post.


I gave this presentation on Nov 17 at the Northwest C++ Users Group. Here are the links to the two-part video:

  1. Part I
  2. Part II

There was an unresolved discussion about one of the examples–the one about template specialization conflicting with modular type checking. It turns out the example was correct (that is, an error would occur).

template<class T> struct vector {
    vector(int); // has constructor
};

//-- This one typechecks
template<LessThanComparable T>
void f(T x) { vector<T> v(100); ... }

//-- Specialization of vector for int
template<> struct vector<int> {
    // missing constructor!
};

int main() { f(4); } // error

The issue was whether the instantiation of f in main would see the specialization of the vector template for int even though the specialization occurs after the definition of f. Yes, it would, because vector<T> is a dependent name inside f. Dependent names are resolved in the context of template instantiation–here in the context of main, where the specialization is visible. Only non-dependent names are resolved in the context of template definition.


I’ve been planning on writing about Google’s MapReduce algorithm for some time but I couldn’t find a good practical example. Then we had a Northwest C++ Users Group presentation by Steve Yegge and a follow-up discussion over beers, and I had a little epiphany. Steve was talking about, among other things, the build process. And that’s just a bunch of algorithms that are perfect for explaining MapReduce.

MapReduce is usually introduced in the context of distributed systems. It’s a system that spreads processing among multiple machines, often organized in huge server farms. But it can also be used on a single machine, to work with processes, or even in a single process, with threads. The beauty of it is that, if you write a build engine that uses MapReduce, you may scale it from a few threads to thousands of servers. It’s like writing a program that, rather than reading individual disk sectors, uses a file system. Such a program will magically work on a USB stick as well as on a distributed file system.

The other hot trend, besides scalability, is data mining. Huge software projects are like the Internet. Developers spend a lot of time browsing source files. We need tools that are more than just search engines, we need smart tools that understand computer languages. And data mining fits very well into the MapReduce paradigm. There is fan-out of small independent tasks operating on single files, followed by a fan-in, during which data is combined. A perfect example of such an algorithm is mapping inter-file dependencies (mediated by include statements), something every build does to minimize rebuilding after a change.

I organized this blog to follow the old parable of three blind men and an elephant. The elephant in this case is the build. The first man touches the compile-link part of the build process and exclaims: “It’s MapReduce.” He has a well-developed sense of scale. The second one notices that not all files are rebuilt after a change and shouts: “It’s Whole Program Analysis.” He has an excellent data-mining ear. The third man, with a sequential nose, observes that not everything can be done in parallel and cries: “It’s a Topological Sort.”

So what is a build?

It’s MapReduce

Obviously, you can build a project on a single machine by performing a sequence of actions. I choose to treat this as a trivial case of a distributed process. Let’s start with the simplest thing: You have a bunch of C++ files and you want to compile them and link the resulting object files into an executable.

The separate compilation model tells us that the compilation of any source file (a Translation Unit, if you wish) is independent of the compilation of any other source file. Therefore, the compilation can (and I’d say, should) be run in parallel. You will not only make more cores busy, but also overlap I/O-intensive parts of compilation.

So the build engine would fan out a number of compilation tasks and, when they are all finished, combine the outputs (object files) and pass them to the linker. Let’s call the first part “mapping” and the second part “reducing.” That’s your MapReduce in a nutshell.

To be more specific, MapReduce abstracts this process so it may be reused in many different contexts. It’s a framework into which the client plugs in the parts that define what may be done in parallel and how to combine the results.

Here are the general steps:

  1. The client specifies input data. In our example this would be a list of files to be rebuilt.
  2. MapReduce partitions this list among multiple worker threads, processes, or machines.
  3. Each worker executes client-provided code–a function called map. In our example, we’d write map to run the compiler on a single file to produce the object file. This function “maps” a source file into an object file.
  4. MapReduce then waits for all workers to finish (possibly re-distributing the work if a server crashes). This barrier is the main synchronization point in the algorithm.
  5. MapReduce reshuffles the results and, again, distributes them to different workers for the final phase of the algorithm.
  6. Each worker executes client-provided code–a function called reduce. In our example we’d write reduce to execute a linker on a list of object files to produce the executable.

For all this to work, the client must organize data in a particular form that is palatable to MapReduce. The input must be a list of (key, data) pairs.

Since keys and data may be of arbitrary type, why this separation? MapReduce needs keys to be able to partition the work. Each key uniquely identifies a task that may be done in parallel with other tasks. In our case the key would be a source file name (or, in general, a path). The keys are sorted by MapReduce and partitioned equitably among workers (often the client has the option to override the default partitioning algorithm or the compare function).

The map function, written by the client, must conform to a particular (generic) interface. It takes a key and the associated data. In our case map would take a file name (key) and some data to be defined later. It would compile the file to an object. Notice that map is myopic by nature. It concerns itself only with a small fraction of the whole process. Because it doesn’t have to know about the big picture (here, the build), its implementation is relatively easy (here, it just calls the compiler).

The output of map has to conform to certain standards too, so it can be shuffled by MapReduce and passed to the second client-defined function, reduce. The map function must emit new pairs of (key, data). The new keys and the new data may be totally unrelated to the original ones, both in meaning and type.

Back to our example. To make reduce nontrivial, let’s shoot for a slightly more general scenario: we want to build multiple targets at once. In Visual Studio, for instance, building the solution might involve building several projects, each producing a separate executable, library, or DLL. We’d make map emit pairs (target name, object name). (The target name will have to be passed to map as input data.)

Here’s my simplified implementation of map (a more complete toy example follows at the end of this post):

void map(std::string const & file, std::string const & target)
{
    std::string cmd = "cl /c ";
    cmd += file;
    execute(cmd);
    // output new (key, value) pair
    emit(target, ObjFileName(file));
}

When all workers are done, MapReduce gathers the emitted pairs and sorts them by the key so that it can accumulate data for each key into a separate list. It then redistributes the results among workers. As before, the keys define units of concurrency. Here we assume that each target may be linked independently (this is not always true: see “It’s a Topological Sort”).

The second-phase workers execute the client-defined reduce. The first argument to reduce is the new key, and the second is a list of data for this key.

In our example MapReduce would combine the outputs of all maps to build a list of object files corresponding to each build target. And that’s exactly what the linker needs. Here’s my toy implementation of reduce:

void reduce(std::string const & target, 
    std::list<std::string> const & objFiles)
{
    std::string cmd = "link /out ";
    cmd += ExeNames[target];
    std::for_each(objFiles.begin(), objFiles.end(), 
        [&](std::string const & obj) {
            cmd += " ";
            cmd += obj;
        });
    execute(cmd);
}

(I explain my use of lambdas in the Appendix.)

Of course this is a very simplified picture of just one part of the build process. Still, I suspect many a build environment uses the same overall idea, and maybe some of them even use MapReduce to implement it.

There are many distributed build environments in use today. The difference is that they either put distribution on top of non-distributed builds or write one monolithic application. That restricts their reusability and leads to duplication of functionality. Here, on the other hand, distribution is a layer below the build engine. The MapReduce implementation, whether on a single machine or over a server farm, knows nothing about building software projects, so it doesn’t duplicate any functionality and is perfectly reusable. It’s a matter of whether the build engine uses, say, a message-passing API or, at a higher level of abstraction, a MapReduce API.

It’s Whole Program Analysis

The separate compilation model means that the compiler has access to only one source file at a time. The build environment, on the other hand, has access to the whole program. So if there is any need for whole program analysis, that’s the natural place to put it.

Figuring out dependencies between program files is an example of whole program analysis. The build needs it because knowing the dependency graph allows it to recompile only the minimum number of files after one of them (for instance, a header file) changes. There is a program, makedepend, that finds such dependencies by looking at the #include statements.

Inverted Index

Finding dependencies is a lot like creation of an inverted index — a classic example of how to use MapReduce. The map function scans a given document and emits a stream of pairs, (word, document). The reduce function gets a word as the key and a list of documents in which this word occurs (possibly with some positional information). The bulk of the work is done inside MapReduce by sorting the results of map by the word.

As an exercise, let’s see how a simple makedepend could be implemented using MapReduce. Suppose that we have a dedicated parser (or a smart preprocessor) that takes a source file and finds all the includes on which it depends. (In C++, the parser must do it recursively, going into includes of includes, etc. See item 7 in Walter Bright’s post.)

What the build process really needs is the answer to a different question: Which files must be recompiled if a given include file changes?

Here’s how we may answer this question using MapReduce. Our map function should take a source file name as key and run our parser over this file. (We also have to know what constants are predefined for a given build target, because of conditional compilation. We could pass them as data to map.)

What map would emit is a stream of pairs, (include file, current source file). Notice the inversion: For map, the name of the source file was the key; here the name of the include file is the key. Because of that trick, MapReduce will do the bulk of work for us. It will sort and group data by the new key. Thus the input to our reduce function will be the name of an include file and a list of all source files that depend on it. These are the files that will have to be recompiled if that header file changes.
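
Here’s a hedged sketch of those two functions in the style of the toy framework from the Appendix (emit and StringList come from there; ListIncludes, DependentsOfInclude, and the names mapDeps and reduceDeps are hypothetical):

// Hypothetical parser: returns the (recursively expanded) list of include
// files a given source file depends on.
StringList ListIncludes(std::string const & srcFile);

// Hypothetical global map: include file -> sources to recompile when it changes.
std::map<std::string, StringList> DependentsOfInclude;

void mapDeps(std::string const & srcFile, std::string const & /*predefines*/)
{
    StringList incs = ListIncludes(srcFile);
    std::for_each(incs.begin(), incs.end(),
        [&](std::string const & inc) {
            // Note the inversion: the include file becomes the key.
            emit(inc, srcFile);
        });
}

void reduceDeps(std::string const & incFile, StringList const & sources)
{
    DependentsOfInclude[incFile] = sources;
}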

Here again, a higher level of abstraction–writing the dependency engine on top of MapReduce–provides better scalability and avoids code duplication. It also opens up new possibilities. Consider the building of a dependency graph as an example of data mining. In general, a build engine that works on top of MapReduce could easily dispatch all kinds of bots that extract local data from every file and combine them into a global index or database.

In particular, many IDEs attempt, with varying degrees of success, to build call graphs, definition/use graphs, inheritance graphs, etc.

Also, there are programs that can infer const-ness or, in Java, non-null-ness. In dynamic languages, type inference is very useful, and it requires whole-program analysis. Not to mention programs that can find potential data races or deadlocks. They all follow the same pattern and can easily be fit into a MapReduce build engine.

It’s a Topological Sort

The build process must also deal with other types of dependencies: when the result of one operation is a prerequisite for another operation. If you’ve ever built a compiler using flex and bison, you know that before you can run flex you need to run bison to produce the appropriate include files that contain, among others, definitions of tokens. These kinds of dependencies are usually explicitly written into makefiles, as in this example:

lang.tab.hpp lang.tab.cpp: lang.ypp
    bison lang.ypp
lex.yy.c: lang.lex lang.tab.hpp
    flex lang.lex

When you have results of some operations being prerequisites for other operations, you use the good old topological sort to organize your work. It so happens that topological sort may also be done using MapReduce (see, for instance, Ricky Ho’s blog).

I’ll explain it using a classic example: getting dressed in the morning. Our set consists of objects of clothing:

{jeans, lsock, rsock, lshoe, rshoe}

You can’t get dressed by putting on those objects in random order. Traditionally, you put on a sock before you put on a shoe. A sock must be on the list of prerequisites for a shoe.

In the preliminary stage of topological sort, we prepare a list of pairs, (object, prerequisite). In our case, the list consists of:

(lshoe, lsock)
(lshoe, jeans)
(rshoe, rsock)
(rshoe, jeans) 

Our map1 is called once for each (object, prerequisite) pair and emits the inverted pair: the prerequisite becomes the key, and the object becomes the data. In other words, for every prerequisite it emits the objects that depend on it.

void map1(std::string const & object, std::string const & prereq)
{
    emit(prereq, object);
}

MapReduce sorts and reshuffles those pairs and then calls reduce1.

void reduce1(std::string const & object, StringList const & dependents)
{
    TheDependentsOf[object] = dependents;
}

Now we have, for each object, both the original list of prerequisites and the newly created list of dependents. For the sake of this exposition, I’ll store these lists in two global maps:

typedef std::map<std::string, StringList> StringToList; // a name -> list-of-names map

StringToList ThePrerequisitesOf;
StringToList TheDependentsOf;

In our case, the resulting dependents map will contain:

TheDependentsOf[jeans] = {lshoe, rshoe};
TheDependentsOf[lsock] = {lshoe};
TheDependentsOf[rsock] = {rshoe};

The second part of the algorithm is a loop in which MapReduce is called repeatedly until all objects are consumed (I mean, put on).

The map2 function takes an object of clothing as key and a pair of lists, its prerequisites and its dependents, as data. If the list of prerequisites is not empty, it does nothing–we can’t put on that object yet.

Otherwise it performs the action: “Put the object on.” In our case, jeans and socks have no prerequisites, so they are put on immediately. Once the object is put on, our map2 removes that object from the list of objects. It also iterates over the list of its dependents and emits inverted pairs, (dependent, current object). For instance, once the jeans are put on, two pairs are emitted:

(lshoe, jeans),
(rshoe, jeans)

because both shoes depended on the jeans.

Here’s the code:

// TwoLists pairs the prerequisites and the dependents of an object:
typedef std::pair<StringList, StringList> TwoLists;

void map2(std::string const & object, TwoLists const & preqsAndDepds)
{
    if (preqsAndDepds.first.empty())
    {
        // This object has no prerequisites. Put it on.
        std::string cmd = "put on ";
        execute(cmd + object);
        // Remove it from the global list
        TheObjects.erase(
            std::find(TheObjects.begin(), TheObjects.end(), object));
        // Emit pairs (dependent, this completed prerequisite)
        StringList const & depds = preqsAndDepds.second;
        std::for_each(depds.begin(), depds.end(),
            [&](std::string const & dep) {
                emit(dep, object);
            });
    }
}

The reduce2 function is then called with an object and the list of its completed prerequisites. In our case, the arguments will be:

lshoe, {jeans, lsock}
rshoe, {jeans, rsock}

The reduce2 function removes the completed prerequisites from the list of all prerequisites for a given object.

In our case, reduce2 will remove both the jeans and the left sock from ThePrerequisitesOf[lshoe].

It will also remove the jeans and the right sock from ThePrerequisitesOf[rshoe].

void reduce2(std::string const & object, StringList const & complPrereqs)
{
    std::for_each(complPrereqs.begin(), complPrereqs.end(),
        [&](std::string const & completed) {
            StringList & lst = ThePrerequisitesOf[object];
            lst.erase(std::find(lst.begin(), lst.end(), completed));
        });
}

As a result, the shoes end up with no prerequisites, so they will be put on in the next (and last) iteration of MapReduce.
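
For completeness, here’s a rough sketch of that outer loop. It assumes a variant of the toy MapReduce from the Appendix (call it MapReduce2) whose data type is TwoLists rather than a string; TheObjects is the global list of objects that haven’t been put on yet.

void Dress()
{
    while (!TheObjects.empty())
    {
        // One (object, TwoLists) pair per object that hasn't been put on yet.
        std::list<std::pair<std::string, TwoLists>> input;
        std::for_each(TheObjects.begin(), TheObjects.end(),
            [&](std::string const & obj) {
                input.push_back(std::make_pair(obj,
                    TwoLists(ThePrerequisitesOf[obj], TheDependentsOf[obj])));
            });
        // One round: map2 puts on the objects with no prerequisites,
        // reduce2 trims the remaining prerequisite lists.
        MapReduce2(input, &map2, &reduce2);
    }
}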

Limitations of MapReduce

MapReduce is one particular approach to data-driven parallelism. In general it’s not a good fit for problems that exhibit task-driven parallelism. But even within the data-driven domain, MapReduce has competitors, one of them being PGAS (Partitioned Global Address Space; see my post on The Future of Concurrent Programming). Let’s briefly compare the two approaches.

MapReduce is closer to the functional-programming, message-passing, data-copying paradigm. PGAS is a generalization of the shared-memory paradigm: it creates the illusion of a global address space spanning multiple processes or machines. Consequently, MapReduce doesn’t require explicit synchronization while PGAS often does. On the other hand, MapReduce incurs more data copying, even if it runs on threads of the same process (I’ll try to quantify this in my next post).

MapReduce requires a particular form of input — a set of (key, data) pairs. The distribution of work is encoded in the input and is driven by the choice of keys–each key defining a minimum unit of concurrency. PGAS works on shared data structures, typically multi-dimensional arrays. Distribution of work is abstracted from data and from the algorithm– it is defined by distribution maps over data structures (this is also explained in my previous post). A distribution may be modified without changing any other part of the algorithm. With MapReduce that would require either the change of keys, or a client-defined partitioning function.

A lot of MapReduce applications end up sharing data one way or another. Even my little example assumes that files, in particular header files, are accessible from every location. This is usually made possible through some kind of distributed file system. Google, the owner of the patent on MapReduce (I just hope they won’t “do evil” by suing other companies or individuals who use it or, gasp!, blog about it), has its own GFS (Google File System) that’s used in concert with MapReduce.

Finally, it’s not clear what fraction of problems lend themselves to MapReduce. Scientific simulations, for instance, are usually easier to describe in terms of large matrices, and therefore fit the PGAS mold more naturally. I plan to write more on this topic in my next post.

Appendix: Bob the Builder

Just for fun, below is a short C++ program I wrote to illustrate the use of MapReduce in the build process. Since MapReduce has its roots in functional programming, I felt free to use the C++0x lambdas at my convenience. For instance, in this snippet:

[&](Pair const & p) { targetFiles[p.first].push_back(p.second); }

I define an anonymous function that takes a Pair (by const reference) and adds its second component to the list targetFiles[p.first].

Notice that targetFiles is defined outside of the scope of the lambda–it is captured from the environment. A function that captures the environment is called a closure. Here I told the compiler to capture the whole environment, by reference, using the notation [&].

// A toy implementation of MapReduce. Sorry for using globals.

#include <iostream>
#include <string>
#include <list>
#include <map>
#include <algorithm>

std::list<std::pair<std::string, std::string>> Emitted;

void emit(std::string const & key, std::string const & data)
{
   std::cout << "  emit: (" << key.c_str() << ", " 
      << data.c_str() << ")\n";
   Emitted.push_back(std::make_pair(key, data));
}

void execute(std::string const & cmd)
{
   std::cout << cmd.c_str() << std::endl;
}

typedef std::pair<std::string, std::string> Pair;
typedef std::list<Pair> InputList;
typedef std::list<std::string> StringList;

void MapReduce(InputList const & input, 
   void (*map)(std::string const & key, std::string const & data),
   void (*reduce)(std::string const & key, StringList const & lst))
{
   // Distribute the map part to workers (here: do them sequentially)
   std::for_each(input.begin(), input.end(), [&](Pair const & p) {
      map(p.first, p.second);
   });
   // (--Wait for all workers to finish--)
   // Reshuffle emitted key/value pairs
   std::map<std::string, StringList> targetFiles;
   std::for_each(Emitted.begin(), Emitted.end(), 
      [&](Pair const & p) {
         targetFiles[p.first].push_back(p.second);
      });
   // Distribute the reduce part to workers (here: do them sequentially)
   std::for_each(targetFiles.begin(), targetFiles.end(), 
      [&](std::pair<std::string, StringList> const & p) {
         reduce(p.first, p.second);
      });
}

std::string ObjFileName(std::string const & srcFileName)
{
   std::string oFile = srcFileName.substr(0, srcFileName.length() - 3);
   return oFile + "obj";
}

// Maps target names to executable names
std::map<std::string, std::string> ExeNames;

void map(std::string const & file, std::string const & target)
{
   std::string cmd = "cl /c ";
   cmd += file;
   execute(cmd);
   emit(target, ObjFileName(file));
}

void reduce(std::string const & target, StringList const & objFiles)
{
   std::string cmd = "link /out ";
   cmd += ExeNames[target];
   std::for_each(objFiles.begin(), objFiles.end(), 
      [&](std::string const & obj) {
         cmd += " ";
         cmd += obj;
      });
   execute(cmd);
}

int main()
{
   // There are two targets
   ExeNames["MyApp"] = "MyApp.exe";
   ExeNames["YourApp"] = "YourApp.exe";
   // The input is a list of key/value pairs
   // Key: source file name
   // Data: target name
   InputList input;
   input.push_back(std::make_pair("foo.cpp", "MyApp"));
   input.push_back(std::make_pair("bar.cpp", "MyApp"));
   input.push_back(std::make_pair("baz.cpp", "YourApp"));

   MapReduce(input, &map, &reduce);
}

Here’s the output:

cl /c foo.cpp
  emit: (MyApp, foo.obj)
cl /c bar.cpp
  emit: (MyApp, bar.obj)
cl /c baz.cpp
  emit: (YourApp, baz.obj)
link /out MyApp.exe foo.obj bar.obj
link /out YourApp.exe baz.obj

Some of the brightest C++ minds had been working on concepts for years, only to vote them out of the C++0x spec–unanimously, not long after they’ve been voted in. In my naivete I decided to write a short blog post explaining why in my opinion it’s still worth pursuing concepts into the next revision of the Standard. I showed a draft to my friends and was overwhelmed by criticism. The topic turned out to be so controversial that I was forced not only to defend every single point but to discuss alternative approaches and their pluses. That’s why this post grew so large.

What Are Concepts?

In a nutshell, concepts are a type system for types. Below I have tabulated the corresponding features and examples of non-generic programming with types vs. generic programming with concepts. The concept Iterator specifies the “type” of the type parameter, T, which is used in the definition of the template function, Count.

Regular Programming                 | Generic Programming
functions, data structures          | template functions, parametrized types
Example:                            | Example:
int strlen(char const * s)          | template<Iterator T>
                                    | void Count(T first, T last)
parameter: s                        | type parameter: T
type: char const *                  | concept: Iterator



Just to give you a feel for the syntax, here's an example of a concept--a simplified definition of Iterator:

concept Iterator<typename T>
    : EqualityComparable<T>
{ 
    typename value_type;
    T& operator++(T&);
    ...
}

Iterator illustrates both the simplicity and the complexity of concepts. Let me concentrate on the simple aspects first and leave the discussion of complexities for later.

Iterator refines (inherits from) another concept, EqualityComparable (anything that can be compared using '=='). It declares an associated type, value_type; and an associated function, operator++. Associated types and functions are the bread and butter of concepts.

You might have seen something like an "associated type" implemented as a typedef, value_type, in STL iterators. Of course, such typedefs are only a convention. You may learn about such conventions by analyzing somebody else's code or by reading documentation (it so happens that STL is well documented, but that's definitely not true of all libraries), but they are not part of the language.
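
For instance, a hand-rolled iterator typically spells out the convention like this (just a sketch, with the operations declared but not defined):

template<class T>
struct MyIterator
{
    typedef T value_type;              // the "associated type", by convention only
    T & operator*();                   // nothing in the language says these
    MyIterator & operator++();         // declarations are what makes this "an iterator"
    bool operator==(MyIterator const & other) const;
};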

You use concepts when you define template functions or parametrized data structures. For instance, here's a toy function template, Count:

template<Iterator T> int Count(T first, T last) {
    int count = 0;
    while (!(first == last)) { 
        ++first;
        ++count;
    }
    return count;
}

As you can see, this function uses two properties of the concept Iterator: you can compare two iterators for equality and you can increment an iterator.

Ignoring the fine points for now, this seems pretty straightforward. Now, let's see what problems concepts were designed to solve.

Who Ordered Concepts?

Besides being an elegant abstraction over the type system, concepts offer some of the same advantages that types do:

  • Better error reporting
  • Better documentation
  • Overloading

Error Reporting

Why do templates produce such horrible error messages? The main reason is that errors are detected too late.

It's the same phenomenon you have in weakly typed languages, except that there, error detection is postponed even further--till runtime. In weakly typed languages an error is discovered when, for instance, a certain operation cannot be performed on a certain variable. This might happen deep into the calling tree, often inside a library you're not familiar with. You essentially need to examine a stack trace to figure out where the root cause of the problem is and whether it's in your code or inside a library.

Conversely, in statically typed languages, most errors are detected during type-checking. The fact that a certain operation cannot be performed on a certain variable is encoded in the type of that variable. If the error is indeed in your code, it will be detected at the call site and will be much more informative.

In current C++, template errors are detected when a certain type argument does not support a certain operation (often expressed as the inability to convert one type to another or a failure of type deduction). It often happens deep into the instantiation tree. You need an equivalent of the stack trace to figure out where the root cause of the error is. It's called an "instantiation stack" and is dumped by the compiler upon a template error. It often spans several pages and contains unfamiliar names and implementation details of some library code.

I suspect that the main reason a lot of C++ programmers still shun template programming is because they were burnt by such error messages. I know that I take a coffee break before approaching a template error message.

Self Documentation

The only thing better than early error detection is not making the error in the first place. When you're calling a function, you usually look at its signature first. The signature tells you what types are expected and what type is returned. You don't have to study the body of the function and the bodies of functions that it calls (and so on). You don't have to rely on comments or off-line documentation. Signatures are the kind of documentation that is checked by the compiler. Even more importantly, at some point the compiler also checks the body of the function you're calling to see if its arguments are used in accordance with their declared types.

Contrast it with template programming. If the "signature" of a template is of the form template<class T>, you have no information about T. What kind of T's are acceptable can only be divined by studying the template's body and the bodies of templates it is instantiating. (And, of course, the compiler can do nothing to check this "signature" against the template body.)

The Case of STL

Even if you're not defining templates in your own code, you probably use them through the Standard Library--STL in particular. STL is a library of algorithms and containers. Alexander Stepanov's stroke of genius was to separate the two. Using STL you may apply one of the 112 algorithms to 12 standard containers. It's easy to calculate that there are 1344 possible combinations (not counting those algorithms that operate on two or three containers at a time).

Of course not all combinations of algorithms and containers make sense. For instance, you can't apply std::sort to a std::list. The reason is simple: sort (typically implemented as introsort, a quicksort variant) requires random access to the container; a list can only provide sequential access. And this is the crux of the matter: this important restriction on access cannot be expressed in C++. Granted, the code will not compile, because at some point sort will try to perform an operation that the list iterator doesn't support. You'll get a page-long error message. By my last count a leading C++ compiler emits 73 cryptic lines, many of them stretching to 350 characters, and nary a mention of random access being required by sort.
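
For the record, here's a minimal snippet of the offending combination (the failing call is left commented out; list's own member sort, or copying into a vector, are the ways around it):

#include <algorithm>
#include <list>
#include <vector>

int main()
{
    std::list<int> lst;
    lst.push_back(3); lst.push_back(1); lst.push_back(2);

    // std::sort(lst.begin(), lst.end()); // does not compile: list iterators
                                          // are not random-access
    lst.sort();                           // lists provide their own sort

    std::vector<int> vec(lst.begin(), lst.end());
    std::sort(vec.begin(), vec.end());    // fine: vector iterators are random-access
}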

Template error messages would be even worse if not for a system of hacks (I mean, template tricks) and concept-like conventions in the STL. For instance, a dummy tag type like forward_iterator_tag plays the role of a concept name, and iterator traits look a lot like concept maps. With a number of new features in C++0x (especially decltype, which makes SFINAE constructs such as enable_if more powerful), even more "tricks" became possible. If some of this sounds like gibberish to you, you're in good company. That's the mess that the concept proposal was trying to fix. Concepts do introduce a lot of complexity into the language, but they reduce and organize the complexity of programs.

It's worth mentioning that Alex Stepanov collaborated in the development of the concept description language, Tecton, and participated in the C++ concept effort, so he's been acutely aware of the problems with STL.

Competing Options

The idea of limiting the types that can be passed to generic algorithms or data structures has been around for some time. The C++ concept proposal is based on Haskell's notion of type classes. Several other languages opted for simpler constrained templates instead.

Constrained Templates

One way to look at concepts is that they constrain the kind of types that may be passed to a template function or data type. One can drop concepts and instead impose type constraints directly in the definition of a template. For instance, rather than defining the Iterator concept, we could have tried to impose analogous type constraints in the definition of Count (C++ doesn't have special syntax for it but, of course, there are template tricks for everything).

There are various ways of expressing type constraints, so let's look at a few examples. In Java, everything is an object, and objects are classified by classes and interfaces. Java reuses the same class/interface mechanism for constraining template parameters. A generic SortedList, for instance, is parametrized by the type of its elements, T. The constraint imposed on type T is that it implements the interface Comparable.

class SortedList<T extends Comparable<T>>

This approach breaks down for built-in types since, for instance, int cannot be derived from Comparable. (Java's workaround is to use boxed types, like Integer, which are magically derived from Comparable.)

C# has a similar approach, with some restrictions and some extensions. For instance, C# lets you specify that a given type is default constructible, as in this example:

class WidgetFactory<T> where T : Widget, new()

Here type T must be derived from Widget and it must have a default constructor.

In the D programming language, template constraints may be expressed by arbitrary compile-time functions as well as use-patterns, which are the topic of the next section.

These approaches are constrained to structural (as opposed to semantic) matching of types. A type is accepted by a constrained template if the operations supported by it match those specified by the constraints name-by-name. If one library names the same operations differently, they won't match (see how Concept Maps deal with this problem).

Use Patterns vs. Signatures

Use patterns formed the basis of the original Texas proposal for C++ Concepts. The idea was to show the compiler representative examples of what you were planning to do with type arguments. These use patterns look almost like straightforward C++. For instance, an Iterator concept might be defined as follows:

concept Iterator<typename Iter, typename Value> {
    Var<Iter> i;
    Iter j = i; // copyable
    bool b = (i != j); // (non)-equality comparable
    ++i; // supports operator++
    Var<const Value> v;
    *i = v; // connects Value type with Iter type
    ...
}

Notice the subtlety with defining dummy variables i and v. You can't just write:

Iter i;

because that would imply that Iter has a default constructor-- which is not required. Hence special notation Var<T> for introducing dummy variables. In general, the use-pattern language was supposed to be a subset of C++ with some additional syntax on the side. To my knowledge this subset has never been formally defined. Also, there are some useful concepts that are hard or impossible to define in use-pattern language (see the sidebox).

Then there is the problem of concept-checking the body of a template before it is instantiated-- the holy grail of conceptology. It involves some nontrivial analysis and comparison of two parsing trees (the pattern's, and the template body's). As far as I know no formal algorithm has been presented.

Because of the difficulties in formalizing the use-pattern approach, the C++ committee decided to focus on defining concepts using signatures.

We've seen such a signature in the Iterator concept: T& operator++(T&).

How are we to interpret a signature within the context of a concept? Naively, we'd require that there exist a free function called operator++. But what if the type T has a method operator++ instead? Do we have to list it too (as T::operator++())? A uniform call syntax was proposed that would unify function and method calls, so that our operator++ would match either a free function or a method (this feature didn't make it into the final concept proposal).

An operator signature must also match the built-in operator--this way an arbitrary pointer is allowed to match the Iterator concept.

These are some of the complications I mentioned at the beginning of this post. There are also complications related to symbol lookup (it has to take into account the Argument-Dependent-Lookup hack) and overload resolution, which are notoriously difficult in C++.

Who Ordered Concept Maps?

So far we've talked about concepts and about templates that use concepts. Now let's focus on what happens when we instantiate a template with concrete types. How does the compiler check that those types fulfill the constraints of the concepts? One possibility is the "duck-typing" approach--if the type in question defines the same associated types and functions (and possibly some other constraints I haven't talked about), it is a match. This is called structural (same set of functions) and nominal (by-name) matching.

Structural matching is simple and straightforward but it might lead to "false positives." Here's the archetypal example: Syntactically, there is no difference between a ForwardIterator and an InputIterator, yet semantically those two are not equivalent. Some algorithms that work fine on ForwardIterators will fail on InputIterators (the difference is that you can't go twice through an InputIterator).

Semantic matching, on the other hand, requires the client to explicitly match types with concepts (only the clients knows the meaning of a particular concept). This is done through concept maps (in Haskell they are called "instances"). With this approach, the mapping between Iterator and a pointer to T would require the following statement:

template<typename T>
concept_map Iterator<T *> {
    typedef T value_type;
};

Notice that we don't have to provide the mapping for operator++; it's automatically mapped into the built-in ++. In general, if the associated functions have the same names as the functions defined for the particular type (like ++ in this case), the concept map may be empty. It's mostly because of the need to create those empty concept maps that the concept proposal was rejected.

Interestingly, both groups that worked on concepts-- the Texas group with Bjarne Stroustrup and the Indiana group with Doug Gregor-- supported concept maps. The difference was that the Texas group wanted concepts to be matched automatically, unless marked as explicit; whereas the Indiana group required explicit concept maps, unless the concept was marked auto.

On a more general level, concept maps enable cross-matching between different libraries and naming conventions. Semantically, a concept from one library may match a type from another library, but the corresponding functions and types may have different names. For instance, a StackOfInt concept from one library may declare an associated function called push. A vector from the Standard Library, conceptually, implements this interface, but uses a different name for it. A concept map would provide the appropriate mapping:

concept_map StackOfInt<std::vector<int>> {
    void push(std::vector<int> & v, int x) {
        v.push_back(x);
    }
}

Conclusion

In my opinion, concepts would add a lot to C++. They would replace conventions and hacks in the STL with a coherent set of language rules. They would provide verifiable documentation and drastically simplify template error messages.

Concept maps have an important role to play as the glue between concepts and types. With concept maps you'll be able to add, post hoc, a translation layer between disparate libraries. The back-and-forth between explicit and auto concepts has a philosophical aspect: Do we prefer precision or recall in concept matching? It also has a practical aspect: Is writing empty concept maps an abomination? More experience is needed to resolve this issue.

The major complaint I've heard about the concept proposal was that it was too complex. Indeed, it took 30 pages to explain it in the Draft Standard. However, adding a new layer of abstraction to an existing language, especially one with such baggage as C++, cannot be trivial. The spec had to take into account all possible interaction with other language features, such as name resolution, overload resolution, ADL, various syntactic idiosyncrasies, and so on. But programmers rarely read language specs. I'm sure that, if there was a book "Effective Concepts" or some such, programmers would quickly absorb the rules and start using concepts.

The current hope is that concepts will become part of the 1x version of C++. Doug Gregor is working on a new version of the C++ compiler, which is likely to become a testing ground for concepts. He also publishes a blog with Dave Abrahams (see for instance this post and the follow-up).

On the other hand, the concept effort has definitely lost its momentum after it was voted out of C++0x. Corporate support for further research is dwindling. The hope is in the educated programmers saying no to hacks and demanding the introduction of concepts in future versions of the C++ Standard.

Acknowledgments

I'm grateful to the "Seattle Gang", in particular (alphabetically) Andrei Alexandrescu, Walter Bright, Dave Held, Eric Niebler, and Brad Roberts, for reading and commenting on several drafts of this post. I'd like to thank Jeremy Siek for interesting discussions and the inspiration for this post.

Bibliography

  1. Gabriel Dos Reis and Bjarne Stroustrup, Specifying C++ Concepts, the original Texas proposal with use patterns.
  2. Joint paper by the Indiana and Texas groups, Concepts: Linguistic Support for Generic Programming in C++
  3. Google Video of Doug Gregor talking about concepts.
  4. 2009 Draft C++ Standard with concepts still in. See section 14.10.
  5. Bjarne Stroustrup, The concept killer article.
  6. Jeremy Siek, The C++0x Concept Effort presentation slides.
  7. Posts about concepts on C++Next by Dave Abrahams and Doug Gregor.
  8. Dave Abrahams, To Auto or Not?.
  9. Examples of concept-like hacks in C++0x.

I recently gave a presentation at the Northwest C++ Users Group on message passing. The video and the slides are available here:

Here’s my premise: I needed a flexible system of primitives for passing messages between threads in C++. After studying existing solutions in C++ (Boost, MPI) and in other languages (Haskell MVars and Caml channels and events) I created my own C++ mini-library. It’s general (no restrictions on message types or message storage), composable (allows waiting on multiple polymorphic channels), type-safe, and first-class (you may pass channels inside messages). Considering all this, the library turned out to be surprisingly simple. One thing I gave up on, and I think it was the right decision, is location transparency. Message passing between threads is fundamentally different from message passing between processes or network nodes. Performance and first-class-ness require substantially different interfaces, not to mention different implementations.


Wouldn’t it be nice to be able to write sequential programs and let the compiler or the runtime automatically find opportunities for parallel execution? Something like this is already being done on a micro scale inside processors. As much as possible, they try to execute individual instructions in parallel. Of course they have to figure out data dependencies and occasionally stall the pipeline or idle while waiting for a memory fetch. More sophisticated processors are even able to speculate–they guess a value that hasn’t been calculated or fetched yet and run speculative execution. When the value is finally available, they compare it with their guess and either discard or commit the execution.

If processors can do it, why can’t a language runtime do the same on a larger scale? It would solve the problem of effectively using all those cores that keep multiplying like rabbits on a chip.

The truth is, we haven’t figured out yet how to do it. Automatic parallelization is, in general, too complex. But if the programmer is willing to provide just enough hints to the compiler, the runtime might figure things out. Such a programming model is called semi-implicit parallelism; it has been implemented in two very different environments, Haskell and .NET. The two relevant papers I’m going to discuss are Runtime Support for Multicore Haskell and The Design of a Task Parallel Library.

In both cases the idea is to tell the compiler that certain calculations may be done in parallel. It doesn’t necessarily mean that the code will be executed in multiple threads–the runtime makes this decision depending on the number of cores and their availability. The important thing is that, other than providing those hints, the programmer doesn’t have to deal with threads or, at least in Haskell, with synchronization. I will start with Haskell but, if you’re not into functional programming, you may skip to .NET and the Task Parallel Library (and hopefully come back to Haskell later).

In Multicore Haskell

In Haskell, you hint at parallel execution using the par combinator, which you insert between two expressions, like this: e1 `par` e2. The runtime then creates a spark for the left hand side expression (here, e1). A spark is a deferred calculation that may be executed in parallel (in the .NET implementation a spark is called a task). Notice that, in Haskell, which is a lazy programming language, all calculations are, by design, deferred until their results are needed, at which point their evaluation is forced. The same mechanism kicks in when the result of a spark is needed–and it hasn’t been calculated in parallel yet. In such a case the spark is immediately evaluated in the current thread (thus forfeiting the chance for parallel execution). The hope, though, is that enough sparks will be ready before their results are needed, leading to an overall speedup.

To further control when the sparks are evaluated (whether in parallel or not), Haskell provides another combinator, pseq, which enforces sequencing. You insert it between two expressions, e1 `pseq` e2, to make sure that the left hand side, e1, is evaluated before the evaluation of e2 is started.
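If you’re more comfortable in C++, here’s a rough analogy of my own (not taken from either paper): std::async with its default launch policy hands the runtime a piece of work that it may run on another thread or defer until the result is demanded, which is loosely what a spark is, and calling get() plays the role of forcing the value.

// A rough C++ analogy of e1 `par` e2 (my sketch, not the Haskell mechanism).
// With the default policy the runtime may run expensive1 on another thread
// or defer it; get() forces the value, computing it here if it never ran.
#include <future>

int expensive1();   // hypothetical expensive computations
int expensive2();

int combined()
{
    auto e1 = std::async(expensive1);   // "spark" e1
    int  r2 = expensive2();             // keep working in the current thread
    return e1.get() + r2;               // the result is needed: force it
}

The analogy is imperfect (a C++ future is not lazy the way a Haskell thunk is), but it captures the spirit: you declare what may run in parallel and let the runtime decide whether it actually does.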

I’ll show you how to parallelize the standard map function (in C++ it would be called std::transform). Map applies a function, passed to it as the first argument, to each element of a list, which is passed to it as the second argument. As a Haskell refresher, let me walk you through the implementation of map.

map f []     = []
map f (x:xs) = y:ys
    where y  = f x
          ys = map f xs

Map is implemented recursively, so its definition is split into the base case and the recursive case. The base case just states that map applied to an empty list, [], returns an empty list (it ignores the first argument, f).

If, on the other hand, the list is non-empty, it can be split into its head and tail. This is done through pattern matching–the pattern being (x:xs), where x matches the head element and xs the (possibly empty) tail of the list.

In that case, map is defined to return a new list, (y:ys) whose head is y and tail is ys. The where clause defines those two: y is the result of the application of the function f to x, and ys is the result of the recursive application of map to the tail of the list, xs.

The parallel version does the same (it is semantically equivalent to the sequential version), but it gives the runtime the opportunity to perform the function applications in parallel. It also waits for the evaluation to finish.

parMap f []     = []
parMap f (x:xs) = y `par` (ys `pseq` y:ys)
    where y  = f x
          ys = parMap f xs

The important changes are: y, the new head, may be evaluated in parallel with the tail (the use of the par combinator). The result, y:ys, is returned only when the tail part, ys, has been evaluated (the use of the pseq combinator).

The tail calculation is also split into parallel computations through recursive calls to parMap. The net result is that all applications of f to elements of the list are potentially done in parallel. Because of the use of pseq, the entire spine of the result list is constructed (and all the sparks created) before parMap returns; the elements themselves may still be under evaluation in parallel.

It’s instructive to walk through the execution of parMap step-by-step. For simplicity, let’s perform parMap on a two-element list, [a, b].

First we pattern-match this list to x = a and xs = [b]. We create the first spark for the evaluation of (y = f a) and then proceed with the evaluation of the right hand side of par, (ys `pseq` y:ys). Here ys = parMap f [b].

Because of the `pseq`, we must evaluate ys next. To do that, we call (parMap f [b]). Now the list [b] is split into the head, b, and the empty tail, []. We create a spark to evaluate y' = f b and proceed with the right-hand side, (ys' `pseq` y':ys').

Again, the `pseq` waits for the evaluation of ys' = parMap f []. But this one is easy: we apply the base definition of parMap, which returns an empty list.

Now we are ready to retrace our steps. The right hand side of the last `pseq` re-forms the list y':[]. But that’s the ys the previous `pseq` was waiting for. It can now proceed, producing y:(y':[]), which is the same as [y, y'] or [f a, f b], which is what we were expecting.

Notice the complete absence of explicit synchronization in this code. This is due to the functional nature of Haskell: there’s no shared mutable state, so no locking or atomic operations are needed. (More explicit concurrency models are also available in Haskell, using MVars or transactional memory.)
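Since this is, after all, a C++ blog, here’s what a parMap-like construct might look like in C++, as a sketch of my own (assuming C++11 std::async and futures; parTransform is a made-up name). It spawns one task per element and then waits for all the results, which mimics the structure, if not the laziness, of the Haskell version.

// A sketch of a parMap-style parallel transform (my own, not from the paper):
// spark one task per element, then collect the results in order.
#include <future>
#include <vector>

template<class T, class F>
auto parTransform(const std::vector<T>& xs, F f)
    -> std::vector<decltype(f(xs[0]))>
{
    using R = decltype(f(xs[0]));
    std::vector<std::future<R>> sparks;
    sparks.reserve(xs.size());
    for (const T& x : xs)
        sparks.push_back(std::async(f, x));   // like y `par` ...

    std::vector<R> ys;
    ys.reserve(xs.size());
    for (auto& s : sparks)
        ys.push_back(s.get());                // like `pseq`: wait for each result
    return ys;
}

One task per element is fine when f is expensive but wasteful otherwise; batching elements into larger chunks is exactly the kind of decision the .NET library described next tries to make for you.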

Task Parallel Library in .NET

It’s no coincidence that many ideas from Haskell end up in Microsoft languages. Many Haskell programmers work for Microsoft Research, including the ultimate guru, Simon Peyton Jones. The Microsoft Task Parallel Library (TPL) translates the ideas from Multicore Haskell to .NET. One of its authors, Daan Leijen, is a Haskell programmer who, at some point, collaborated with Simon Peyton Jones. Of course, a .NET language like C# presents a different set of obstacles to parallel programming: it operates on mutable state, which needs protection from concurrent access. This protection (which, incidentally, is the hardest part of multithreaded programming) is left to the programmer.

Here’s an example of a C# algorithm with hidden opportunities for parallel execution. MatrixMult multiplies two matrices. It iterates over the columns and rows of the result matrix; the value that goes at their intersection is calculated by the innermost loop.

void MatrixMult(int size, double[,] m1, double[,] m2, double[,] result)
{
   for(int i = 0; i < size; i++){
      // calculate the i'th column
      for(int j = 0; j < size; j++){
         result[i, j] = 0;
         for(int k = 0; k < size; k++){
              result[i, j] += m1[i, k] * m2[k, j];
         }
      }
   }
}

Each column of the result could potentially be evaluated in parallel. The problem is, the size of the array and the number of processor cores might be unknown until the program is run. Creating a large number of threads when there are only a few cores may lead to a considerable slowdown, which is the opposite of what we want. So the goal of TPL is to let the programmer express the potential for parallel execution but leave it to the runtime to create an optimal number of threads.

The programmer splits the calculation into tasks (the equivalent of Haskell sparks) by making appropriate library calls, and the runtime maps those tasks onto OS threads, many-to-one if necessary.

Here’s how the same function looks with parallelization hooks.

void ParMatrixMult(int size, double[,] m1, double[,] m2, double[,] result)
{
   Parallel.For(0, size, delegate(int i)
   {
      for(int j = 0; j < size; j++){
         result[i, j] = 0;
         for(int k = 0; k < size; k++){
              result[i, j] += m1[i, k] * m2[k, j];
         }
      }
   });
}

Because of clever formatting, this version looks very similar to the original. The outer loop is replaced by the call to Parallel.For, which is one of the parallelizing TPL functions. The inner loops are packed into a delegate.

This delegate is assigned to a task (the analog of a Haskell spark) that is potentially run in a separate thread. The delegate is actually a closure: it captures the local variables size, m1, m2, and result. The latter is modified inside the delegate. This is how shared mutable state sneaks into potentially multi-threaded execution. Luckily, in this case, such sharing doesn’t cause races. Consider, however, what would happen if we changed the types of the matrices from double[,] to char[,]. Parallel updates to neighboring byte-sized array elements may inadvertently encroach on each other and lead to incorrect results. Programmer beware! (This is not a problem in Haskell because of the absence of mutable state.)
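For comparison, here is roughly how the same decomposition might look in plain C++ with std::async, as a sketch of my own (not TPL, and with flat vectors standing in for rectangular arrays). The lambda plays the role of the delegate: it captures result by reference, but each task writes only to its own slice of it, so, just as in the C# version, the sharing doesn’t cause races.

// A rough C++ analogue of ParMatrixMult (my sketch, not TPL): one task per
// value of i, each writing a disjoint slice of result, so no locking needed.
#include <future>
#include <vector>

void parMatrixMult(int size, const std::vector<double>& m1,
                   const std::vector<double>& m2, std::vector<double>& result)
{
    std::vector<std::future<void>> tasks;
    for (int i = 0; i < size; ++i) {
        tasks.push_back(std::async([&, i] {    // the lambda is the "delegate"
            for (int j = 0; j < size; ++j) {
                double acc = 0;
                for (int k = 0; k < size; ++k)
                    acc += m1[i * size + k] * m2[k * size + j];
                result[i * size + j] = acc;    // this task's own slice
            }
        }));
    }
    for (auto& t : tasks)
        t.get();                               // wait for all tasks to finish
}

Unlike Parallel.For, std::async makes no promise about using an optimal number of threads, so a production version would chunk the range of i instead of creating one task per iteration.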

But even if the programmer is aware of potential sharing and protects shared variables with locks, it’s not the end of the story. Consider this example:

int sum = 0;
Parallel.For(0, 10000, delegate(int i)
{
   if(isPrime(i)){
      lock(this) { sum += i; }
   }
});

The captured variable, sum, is protected by the lock, so data races are avoided. This lock, however, becomes a performance bottleneck: it is taken for every prime number in the range.

Now consider the fact that, on a 4-core machine, we’ll be running 10000 tasks distributed between about 4 threads. It would be much more efficient to accumulate the sum in four local variables–no locking necessary–and add them together only at the end of the calculation. This recipe can be expressed abstractly as a map/reduce type of algorithm (a generalization of the C++ std::accumulate). The tasks are mapped into separate threads, which work in parallel, and the results are then reduced into the final answer.
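To make the recipe concrete before showing the TPL version, here is a hand-rolled C++ sketch of that idea (my own code, not TPL): one chunk per hardware thread, a local accumulator per chunk, and a single reduction at the end. isPrime is the same hypothetical helper as in the C# snippet.

// Manual map/reduce (my sketch): each task sums the primes in its own chunk
// using a local accumulator, with no locking; the partial sums are then
// reduced into the final answer.
#include <algorithm>
#include <future>
#include <thread>
#include <vector>

bool isPrime(int n);   // assumed to exist, as in the C# example

int sumOfPrimes(int lo, int hi)
{
    int nTasks = std::max(1u, std::thread::hardware_concurrency());
    int chunk  = (hi - lo + nTasks - 1) / nTasks;

    std::vector<std::future<int>> partials;
    for (int t = 0; t < nTasks; ++t) {
        int b = lo + t * chunk;
        int e = std::min(hi, b + chunk);
        partials.push_back(std::async([b, e] {
            int local = 0;                     // per-task accumulator, no lock
            for (int i = b; i < e; ++i)
                if (isPrime(i))
                    local += i;
            return local;
        }));
    }

    int sum = 0;                               // the "reduce" step
    for (auto& p : partials)
        sum += p.get();
    return sum;
}

TPL packages this pattern so that you don’t have to write the chunking yourself, as the next snippet shows.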

Here’s how map/reduce is expressed in TPL:

int sum = Parallel.Aggregate(
  0, 10000, // domain
  0, // initial value
  delegate(int i){ return isPrime(i) ? i : 0; },
  delegate(int x, int y){ return x+y; }
);

The first delegate, which is run by 10000 tasks, does not modify any shared state–it just returns its result, which is internally accumulated in some hidden local variable. The second delegate–the “reduce” part of the algorithm–is called when there’s a need to combine results from two different tasks.

The Place for Functional Programming

Notice that the last example was written in a very functional style. In particular, you don’t see any mutable state: the delegates are pure functions. This is no coincidence; functional programming has many advantages in parallel programming.

I’ve been doing a lot of multi-threaded programming in C++ lately and I’ve noticed how my style is gradually shifting from object-oriented towards functional. This process is accelerating as functional features keep seeping into the C++ standard. Obviously, lambdas are very useful, but so is the move semantics made possible by rvalue references, especially for passing data between threads. It’s becoming more and more obvious that, in order to be a good C++ programmer, one needs to study other languages. I recommend Haskell and Scala in particular. I’ll be blogging about them in the future.

Bibliography

  1. Simon Marlow, Simon Peyton Jones, and Satnam Singh, Runtime Support for Multicore Haskell
  2. Daan Leijen, Wolfram Schulte, and Sebastian Burckhardt, The Design of a Task Parallel Library
  3. A video introduction to Haskell by Simon Peyton Jones: Part I, Part II, together with the slides.

What’s the use of theory if you can’t apply it in practice? I’ve been blogging about concurrency, starting with the intricacies of multicore memory models, all the way through to experimental thread-safe type systems and obscure languages. In real life, however, I do most of my development in C++, which offers precious little support for multithreading. And yet I find that my coding is very strongly influenced by theoretical work.

C++ doesn’t support a type system based on ownership, which is a way to write provably race-free code; in fact it has no notion of immutability and only very limited support for uniqueness. But that doesn’t stop me from considering ownership properties in my design and implementation. A lot of things become much clearer and less error-prone with a better understanding of high-level paradigms.

I own a tiny software company, Reliable Software, which makes a distributed version control system, Code Co-op, written entirely in C++. Recently I had to increase the amount of concurrency in one of its components. I reviewed existing code, written some years ago, and realized how unnecessarily complex it was. No, it didn’t have data races or deadlocks, but it wasn’t clear how it would fare if more concurrency were added. I decided to rewrite it using the new understanding.

As it turned out, I was able to make it much more robust and maintainable. I found many examples of patterns based on ownership. I was able to better separate shared state from the non-shared, immutable from mutable, values from references. I was able to put synchronization exactly where it was needed and remove it from where it wasn’t. I found reasonable trade-offs between message passing and data sharing.

It was a pretty amazing trip. I described it in my other blog, which deals more with actual programming experience. I thought it might be a nice break from all the theory you normally find in this blog.
