Haskell



I gave this presentation on Nov 17 at the Northwest C++ Users Group. Here are the links to the two-part video:

  1. Part I
  2. Part II

There was an unresolved discussion about one of the examples–the one about template specialization conflicting with modular type checking. It turns out the example was correct (that is, an error would occur).

template<class T> struct vector {
    vector(int); // has constructor
};

//-- This one typechecks
template<LessThanComparable T>
void f(T x) { vector<T> v(100); ... }

//-- Specialization of vector for int
template<> struct vector<int> {
    // missing constructor!
};

int main() { f(4); } // error

The issue was whether the instantiation of f in main would see the specialization of the vector template for int even though the specialization occurs after the definition of f. Yes, it would, because vector<T> is a dependent name inside f. Dependent names are resolved in the context of template instantiation–here in the context of main, where the specialization is visible. Only non-dependent names are resolved in the context of template definition.
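
Here's the same two-phase mechanism reduced to a minimal sketch in plain C++, without concepts (the names Wrap and probe are hypothetical):

template<class T> struct Wrap {
    static int size() { return 1; }
};

template<class T>
int probe() {
    // Wrap<T> is a dependent name: looked up at instantiation time
    return Wrap<T>::size();
}

// specialization defined after probe, but before its instantiation
template<> struct Wrap<int> {
    static int size() { return 42; }
};

int main() {
    return probe<int>(); // instantiated here, so it returns 42
}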


Some of the brightest C++ minds had been working on concepts for years, only to vote them out of the C++0x spec–unanimously, and not long after they had been voted in. In my naivete I decided to write a short blog post explaining why, in my opinion, it's still worth pursuing concepts into the next revision of the Standard. I showed a draft to my friends and was overwhelmed by criticism. The topic turned out to be so controversial that I was forced not only to defend every single point but also to discuss alternative approaches and their advantages. That's why this post grew so large.

What Are Concepts?

In a nutshell, concepts are a type system for types. Below I have tabulated the corresponding features and examples of non-generic programming with types vs. generic programming with concepts. The concept Iterator specifies the "type" of the type parameter, T, which is used in the definition of the template function, Count.

Regular Programming              Generic Programming
functions, data structures       template functions, parametrized types

Example:                         Example:
int strlen(char const * s)       template<Iterator T>
                                 void Count(T first, T last)

parameter: s                     type parameter: T
type: char const *               concept: Iterator



Just to give you a feel for the syntax, here's an example of a concept--a simplified definition of Iterator:

concept Iterator<typename T>
    : EqualityComparable<T>
{ 
    typename value_type;
    T& operator++(T&);
    ...
}

Iterator illustrates both the simplicity and the complexity of concepts. Let me concentrate on the simple aspects first and leave the discussion of complexities for later.

Iterator refines (inherits from) another concept, EqualityComparable (anything that can be compared using '=='). It declares an associated type, value_type; and an associated function, operator++. Associated types and functions are the bread and butter of concepts.

You might have seen something like an "associated type" implemented as a typedef, value_type, in STL iterators. Of course, such typedefs are only a convention. You may learn about such conventions by analyzing somebody else's code or by reading documentation (it so happens that STL is well documented, but that's definitely not true of all libraries), but they are not part of the language.
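
Here's what the convention looks like in today's C++ (a sketch; int_iter and deref are hypothetical names):

struct int_iter {
    typedef int value_type;  // the conventional "associated type"
    int operator*() const { return 0; }
    int_iter& operator++() { return *this; }
};

// generic code that relies on the convention
template<class It>
typename It::value_type deref(It it) { return *it; }

// deref works for int_iter but not for a plain pointer (no ::value_type)--
// the gap that iterator_traits was invented to paper over.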

You use concepts when you define template functions or parametrized data structures. For instance, here's a toy function template, Count:

template<Iterator T> int Count(T first, T last) {
    int count = 0;
    while (!(first == last)) { 
        ++first;
        ++count;
    }
    return count;
}

As you can see, this function uses two properties of the concept Iterator: you can compare two iterators for equality and you can increment an iterator.

Ignoring the fine points for now, this seems pretty straightforward. Now let's see what problems concepts were designed to solve.

Who Ordered Concepts?

Besides being an elegant abstraction over the type system, concepts offer some of the same advantages that types do:

  • Better error reporting
  • Better documentation
  • Overloading

Error Reporting

Why do templates produce such horrible error messages? The main reason is that errors are detected too late.

It's the same phenomenon you have in weakly typed languages, except that there error detection is postponed even further--till runtime. In weakly typed languages an error is discovered when, for instance, a certain operation cannot be performed on a certain variable. This might happen deep into the calling tree, often inside a library you're not familiar with. You essentially need to examine a stack trace to figure out where the root cause of the problem is and whether it's in your code or inside a library.

Conversely, in statically typed languages, most errors are detected during type-checking. The fact that a certain operation cannot be performed on a certain variable is encoded in the type of that variable. If the error is indeed in your code, it will be detected at the call site and will be much more informative.

In current C++, template errors are detected when a certain type argument does not support a certain operation (often expressed as the inability to convert one type to another or a failure of type deduction). It often happens deep into the instantiation tree. You need an equivalent of the stack trace to figure out where the root cause of the error is. It's called an "instantiation stack" and is dumped by the compiler upon a template error. It often spans several pages and contains unfamiliar names and implementation details of some library code.

I suspect that the main reason a lot of C++ programmers still shun template programming is because they were burnt by such error messages. I know that I take a coffee break before approaching a template error message.

Self-Documentation

The only thing better than early error detection is not making the error in the first place. When you're calling a function, you usually look at its signature first. The signature tells you what types are expected and what type is returned. You don't have to study the body of the function and the bodies of functions that it calls (and so on). You don't have to rely on comments or off-line documentation. Signatures are the kind of documentation that is checked by the compiler. Even more importantly, at some point the compiler also checks the body of the function you're calling to see if its arguments are used in accordance with their declared types.

Contrast it with template programming. If the "signature" of a template is of the form template<class T>, you have no information about T. What kind of T's are acceptable can only be divined by studying the template's body and the bodies of templates it is instantiating. (And, of course, the compiler can do nothing to check this "signature" against the template body.)

The Case of STL

Even if you're not defining templates in your own code, you probably use them through the Standard Library--STL in particular. STL is a library of algorithms and containers. Alexander Stepanov's stroke of genius was to separate the two. Using STL you may apply one of the 112 algorithms to 12 standard containers. It's easy to calculate that there are 1344 possible combinations (not counting those algorithms that operate on two or three containers at a time).

Of course not all combinations of algorithms and containers make sense. For instance, you can't apply std::sort to a std::list. The reason is simple: sort (which is based on quicksort) requires random access to the container; a list can only provide sequential access. And this is the crux of the matter: this important restriction on access cannot be expressed in C++. Granted, the code will not compile because at some point sort will try to perform some operation that the list iterator doesn't support. You'll get a page-long error message. By my last count a leading C++ compiler emits 73 cryptic lines, many of them stretching to 350 characters, and nary a mention of random access being required by sort.
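
You can see the mismatch for yourself (the failing line is commented out below; std::list supplies its own member sort instead):

#include <algorithm>
#include <list>

int main() {
    std::list<int> l;
    l.push_back(3); l.push_back(1); l.push_back(2);
    // std::sort(l.begin(), l.end()); // error: list iterators are not random access
    l.sort();                         // fine: the member sort provided by list
}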

Template error messages would be even worse if not for a system of hacks (I mean, template tricks) and concept-like conventions in the STL. For instance, a dummy field iterator_tag corresponds to a concept name, and iterator traits look a lot like concept maps. With a number of new features in C++0x (especially decltype, which makes SFINAE constructs such as enable_if more powerful), even more "tricks" became possible. If some of this sounds like gibberish to you, you're in good company. That's the mess that the concept proposal was trying to fix. Concepts do introduce a lot of complexity into the language, but they reduce and organize the complexity of programs.
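
To give you a taste of those tricks, here's tag dispatch in miniature–a sketch of the technique classically used inside std::advance (my_advance and advance_impl are hypothetical names; the tags and iterator_traits are standard):

#include <iterator>

// O(1) version, chosen for random-access iterators
template<class It>
void advance_impl(It& it, int n, std::random_access_iterator_tag) {
    it += n;
}

// O(n) fallback for everything else
template<class It>
void advance_impl(It& it, int n, std::input_iterator_tag) {
    while (n-- > 0) ++it;
}

// dispatch on the iterator_category "tag" via iterator_traits
template<class It>
void my_advance(It& it, int n) {
    advance_impl(it, n,
        typename std::iterator_traits<It>::iterator_category());
}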

It's worth mentioning that Alex Stepanov collaborated in the development of the concept description language, Tecton, and participated in the C++ concept effort, so he's been acutely aware of the problems with STL.

Competing Options

The idea of limiting the types that can be passed to generic algorithms or data structures has been around for some time. The C++ concept proposal is based on Haskell's notion of type classes. Several other languages opted for simpler constrained templates instead.

Constrained Templates

One way to look at concepts is that they constrain the kind of types that may be passed to a template function or data type. One can drop concepts and instead impose type constraints directly in the definition of a template. For instance, rather than defining the Iterator concept, we could have tried to impose analogous type constraints in the definition of Count (C++ doesn't have special syntax for it but, of course, there are template tricks for everything).

There are various ways of expressing type constraints, so let's look at a few examples. In Java, everything is an object, and objects are classified by classes and interfaces. Java reuses the same class/interface mechanism for constraining template parameters. A generic SortedList, for instance, is parametrized by the type of its elements, T. The constraint imposed on type T is that it implements the interface Comparable.

class SortedList<T extends Comparable<T>>

This approach breaks down for built-in types since, for instance, int cannot be derived from Comparable. (Java's workaround is to use boxed types, like Integer, which are magically derived from Comparable.)

C# has similar approach, with some restrictions and some extensions. For instance, C# lets you specify that a given type is default constructible, as in this example:

class WidgetFactory<T> where T : Widget, new()

Here type T must be derived from Widget and it must have a default constructor.

In the D programming language, template constraints may be expressed by arbitrary compile-time functions as well as use-patterns, which are the topic of the next section.

These approaches are constrained to structural (as opposed to semantic) matching of types. A type is accepted by a constrained template if the operations supported by it match those specified by the constraints name-by-name. If one library names the same operations differently, they won't match (see how Concept Maps deal with this problem).

Use Patterns vs. Signatures

Use patterns formed the basis of the original Texas proposal for C++ Concepts. The idea was to show the compiler representative examples of what you were planning to do with type arguments. These use patterns look almost like straightforward C++. For instance, an Iterator concept might be defined as follows:

concept Iterator<typename Iter, typename Value> {
    Var<Iter> i;
    Iter j = i; // copyable
    bool b = (i != j); // (non)-equality comparable
    ++i; // supports operator++
    Var<const Value> v;
    *i = v; // connects Value type with Iter type
    ...
}

Notice the subtlety with defining dummy variables i and v. You can't just write:

Iter i;

because that would imply that Iter has a default constructor–which is not required. Hence the special notation Var<T> for introducing dummy variables. In general, the use-pattern language was supposed to be a subset of C++ with some additional syntax on the side. To my knowledge this subset has never been formally defined. Also, there are some useful concepts that are hard or impossible to define in the use-pattern language.

Then there is the problem of concept-checking the body of a template before it is instantiated-- the holy grail of conceptology. It involves some nontrivial analysis and comparison of two parsing trees (the pattern's, and the template body's). As far as I know no formal algorithm has been presented.

Because of the difficulties in formalizing the use-pattern approach, the C++ committee decided to focus on defining concepts using signatures.

We've seen such a signature in the Iterator concept: T& operator++(T&).

How are we to interpret a signature within the context of a concept? Naively, we'd require that there exist a free function called operator++. But what if the type T has a method operator++ instead? Do we have to list it too (as T::operator++())? A uniform call syntax was proposed that would unify function and method calls, so our operator++ would match either a function or a method (this feature didn't make it into the final concept proposal).
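
Here's a sketch of the two shapes in question (hypothetical types; only declarations are shown):

// member form: ++c would resolve to a method
struct Counter {
    Counter& operator++();
};

// free-function form: ++l resolves to a non-member operator
struct Legacy { };
Legacy& operator++(Legacy&);

// Under the proposed uniform call syntax, both forms would satisfy
// the concept signature T& operator++(T&).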

An operator signature must also match the corresponding built-in operator–this way an arbitrary pointer may match the Iterator concept.

These are some of the complications I mentioned at the beginning of this post. There are also complications related to symbol lookup (it has to take into account the Argument-Dependent-Lookup hack) and overload resolution, which are notoriously difficult in C++.

Who Ordered Concept Maps?

So far we've talked about concepts and about templates that use concepts. Now let's focus on what happens when we instantiate a template with concrete types. How does the compiler check that those types fulfill the constraints of the concepts? One possibility is the "duck-typing" approach--if the type in question defines the same associated types and functions (and possibly some other constraints I haven't talked about), it is a match. This is called structural (same set of functions) and nominal (by-name) matching.

Structural matching is simple and straightforward, but it might lead to "false positives." Here's the archetypal example: Syntactically, there is no difference between a ForwardIterator and an InputIterator, yet semantically the two are not equivalent. Some algorithms that work fine on ForwardIterators will fail on InputIterators (the difference is that you can't go through an InputIterator twice).

Semantic matching, on the other hand, requires the client to explicitly match types with concepts (only the client knows the meaning of a particular concept). This is done through concept maps (in Haskell they are called "instances"). With this approach, the mapping between Iterator and a pointer to T would require the following statement:

template<class T>
concept_map Iterator<T *> {
    typedef T value_type;
};

Notice that we don't have to provide the mapping for operator++; it's automatically mapped into the built-in ++. In general, if the associated functions have the same names as the functions defined for the particular type (like ++ in this case), the concept map may be empty. It's mostly because of the need to create those empty concept maps that the concept proposal was rejected.
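
Given that concept map, our earlier Count template would accept a plain array (hypothetical usage, since concepts never made it into the language):

int a[] = { 1, 2, 3, 4 };
// int* matches Iterator through the concept_map above;
// operator++ and operator== map onto the built-in pointer operators
int n = Count(a, a + 4); // 4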

Interestingly, both groups that worked on concepts–the Texas group with Bjarne Stroustrup and the Indiana group with Doug Gregor–supported concept maps. The difference was that the Texas group wanted concepts to be matched automatically, unless marked as explicit; whereas the Indiana group required explicit concept maps, unless the concept was marked auto.

On a more general level, concept maps enable cross-matching between different libraries and naming conventions. Semantically, a concept from one library may match a type from another library, but the corresponding functions and types may have different names. For instance, a StackOfInt concept from one library may declare an associated function called push. A vector from the Standard Library, conceptually, implements this interface, but uses a different name for it. A concept map would provide the appropriate mapping:

concept_map StackOfInt<std::vector<int>> {
    void push(std::vector<int> & v, int x) {
        v.push_back(x);
    }
};

Conclusion

In my opinion, concepts would add a lot to C++. They would replace conventions and hacks in the STL with a coherent set of language rules. They would provide verifiable documentation and drastically simplify template error messages.

Concept maps have an important role to play as the glue between concepts and types. With concept maps you'll be able to add, post hoc, a translation layer between disparate libraries. The back-and-forth between explicit and auto concepts has a philosophical aspect: Do we prefer precision or recall in concept matching? It also has a practical aspect: Is writing empty concept maps an abomination? More experience is needed to resolve this issue.

The major complaint I've heard about the concept proposal was that it was too complex. Indeed, it took 30 pages to explain it in the Draft Standard. However, adding a new layer of abstraction to an existing language, especially one with as much baggage as C++, cannot be trivial. The spec had to take into account all possible interactions with other language features, such as name resolution, overload resolution, ADL, various syntactic idiosyncrasies, and so on. But programmers rarely read language specs. I'm sure that, if there were a book "Effective Concepts" or some such, programmers would quickly absorb the rules and start using concepts.

The current hope is that concepts will become part of the C++1x version of the Standard. Doug Gregor is working on a new version of the C++ compiler, which is likely to become a testing ground for concepts. He also publishes a blog with Dave Abrahams (see for instance this post and the follow-up).

On the other hand, the concept effort has definitely lost its momentum after it was voted out of C++0x. Corporate support for further research is dwindling. The hope is in the educated programmers saying no to hacks and demanding the introduction of concepts in future versions of the C++ Standard.

Acknowledgments

I'm grateful to the "Seattle Gang", in particular (alphabetically) Andrei Alexandrescu, Walter Bright, Dave Held, Eric Niebler, and Brad Roberts, for reading and commenting on several drafts of this post. I'd like to thank Jeremy Siek for interesting discussions and the inspiration for this post.

Bibliography

  1. Gabriel Dos Reis and Bjarne Stroustrup, Specifying C++ Concepts, the original Texas proposal with use patterns.
  2. Joint paper by the Indiana and Texas groups, Concepts: Linguistic Support for Generic Programming in C++
  3. Google Video of Doug Gregor talking about concepts.
  4. 2009 Draft C++ Standard with concepts still in. See section 14.10.
  5. Bjarne Stroustrup, The concept killer article.
  6. Jeremy Siek, The C++0x Concept Effort presentation slides.
  7. Posts about concepts on C++Next by Dave Abrahams and Doug Gregor.
  8. Dave Abrahams, To Auto or Not?.
  9. Examples of concept-like hacks in C++0x.

Wouldn’t it be nice to be able to write sequential programs and let the compiler or the runtime automatically find opportunities for parallel execution? Something like this is already being done on a micro scale inside processors. As much as possible, they try to execute individual instructions in parallel. Of course they have to figure out data dependencies and occasionally stall the pipeline or idle while waiting for a memory fetch. More sophisticated processors are even able to speculate–they guess a value that hasn’t been calculated or fetched yet and run speculative execution. When the value is finally available, they compare it with their guess and either discard or commit the execution.

If processors can do it, why can’t a language runtime do the same on a larger scale? It would solve the problem of effectively using all those cores that keep multiplying like rabbits on a chip.

The truth is, we haven't figured out yet how to do it. Automatic parallelization is, in general, too complex. But if the programmer is willing to provide just enough hints to the compiler, the runtime might figure things out. Such a programming model is called semi-implicit parallelism and has been implemented in two very different environments: in Haskell and in .NET. The two relevant papers I'm going to discuss are Runtime Support for Multicore Haskell and The Design of a Task Parallel Library.

In both cases the idea is to tell the compiler that certain calculations may be done in parallel. It doesn’t necessarily mean that the code will be executed in multiple threads–the runtime makes this decision depending on the number of cores and their availability. The important thing is that, other than providing those hints, the programmer doesn’t have to deal with threads or, at least in Haskell, with synchronization. I will start with Haskell but, if you’re not into functional programming, you may skip to .NET and the Task Parallel Library (and hopefully come back to Haskell later).

Multicore Haskell

In Haskell, you hint at parallel execution using the par combinator, which you insert between two expressions, like this: e1 `par` e2. The runtime then creates a spark for the left hand side expression (here, e1). A spark is a deferred calculation that may be executed in parallel (in the .NET implementation a spark is called a task). Notice that, in Haskell, which is a lazy programming language, all calculations are, by design, deferred until their results are needed, at which point their evaluation is forced. The same mechanism kicks in when the result of a spark is needed–and it hasn't been calculated in parallel yet. In such a case the spark is immediately evaluated in the current thread (thus forfeiting the chance for parallel execution). The hope, though, is that enough sparks will be ready before their results are needed, leading to an overall speedup.

To further control when the sparks are evaluated (whether in parallel or not), Haskell provides another combinator, pseq, which enforces sequencing. You insert it between two expressions, e1 `pseq` e2, to make sure that the left hand side, e1, is evaluated before the evaluation of e2 is started.

I’ll show you how to parallelize the standard map function (in C++ it would be called std::transform). Map applies a function, passed to it as the first argument, to each element of a list, which is passed to it as the second argument. As a Haskell refresher, let me walk you through the implementation of map.

map f []     = []
map f (x:xs) = y:ys
    where y  = f x
          ys = map f xs

Map is implemented recursively, so its definition is split into the base case and the recursive case. The base case just states that map applied to an empty list, [], returns an empty list (it ignores the first argument, f).

If, on the other hand, the list is non-empty, it can be split into its head and tail. This is done through pattern matching–the pattern being (x:xs), where x matches the head element and xs the (possibly empty) tail of the list.

In that case, map is defined to return a new list, (y:ys) whose head is y and tail is ys. The where clause defines those two: y is the result of the application of the function f to x, and ys is the result of the recursive application of map to the tail of the list, xs.

The parallel version does the same (it is semantically equivalent to the sequential version), but it gives the runtime the opportunity to perform function applications in parallel. It also waits for the evaluation to finish.

parMap f []     = []
parMap f (x:xs) = y `par` (ys `pseq` y:ys)
    where y  = f x
          ys = parMap f xs

The important changes are: y, the new head, may be evaluated in parallel with the tail (the use of the par combinator). The result, y:ys, is returned only when the tail part, ys, has been evaluated (the use of the pseq combinator).

The tail calculation is also split into parallel computations through recursive calls to parMap. The net result is that all applications of f to elements of the list are potentially done in parallel. Because of the use of pseq, all the elements (except for the very first one) are guaranteed to have been evaluated before parMap returns.

It’s instructive to walk through the execution of parMap step-by-step. For simplicity, let’s perform parMap on a two-element list, [a, b].

First we pattern-match this list to x = a and xs = [b]. We create the first spark for the evaluation of (y = f a) and then proceed with the evaluation of the right hand side of par, (ys `pseq` y:ys). Here ys = parMap f [b].

Because of the `pseq`, we must evaluate ys next. To do that, we call (parMap f [b]). Now the list [b] is split into the head, b, and the empty tail, []. We create a spark to evaluate y' = f b and proceed with the right-hand side, (ys' `pseq` y':ys').

Again, the `pseq` waits for the evaluation of ys' = parMap f []. But this one is easy: we apply the base definition of parMap, which returns an empty list.

Now we are ready to retrace our steps. The right hand side of the last `pseq` re-forms the list y':[]. But that’s the ys the previous `pseq` was waiting for. It can now proceed, producing y:(y':[]), which is the same as [y, y'] or [f a, f b], which is what we were expecting.

Notice the complete absence of explicit synchronization in this code. This is due to the functional nature of Haskell. There's no shared mutable state, so no locking or atomic operations are needed. (More explicit concurrency models are also available in Haskell, using MVars or transactional memory.)

Task Parallel Library in .NET

It’s no coincidence that many ideas from Haskell end up in Microsoft languages. Many Haskell programmers work for Microsoft Research, including the ultimate guru, Simon Peyton Jones. The Microsoft Task Parallel Library (TPL) translates the ideas from Multicore Haskell to .NET. One of its authors, Daan Leijen, is a Haskell programmer who, at some point, collaborated with Simon Peyton Jones. Of course, a .NET language like C# presents a different set of obstacles to parallel programming. It operates on mutable state which needs protection from concurrent access. This protection (which, incidentally, is the hardest part of multithreaded programming) is left to the programmer.

Here’s an example of an algorithm in C# with hidden opportunities for parallel execution. MatrixMult multiplies two matrices. It iterates over the columns and rows of the result matrix. The value that goes at their intersection is calculated by the innermost loop.

void MatrixMult(int size, double[,] m1, double[,] m2, double[,] result)
{
   for(int i = 0; i < size; i++){
      // calculate the i'th column
      for(int j = 0; j < size; j++){
         result[i, j] = 0;
         for(int k = 0; k < size; k++){
              result[i, j] += m1[i, k] * m2[k, j];
         }
      }
   }
}

Each column of the result could potentially be evaluated in parallel. The problem is, the size of the array and the number of processor cores might be unknown until the program is run. Creating a large number of threads when there are only a few cores may lead to a considerable slowdown, which is the opposite of what we want. So the goal of TPL is to let the programmer express the potential for parallel execution but leave it to the runtime to create an optimal number of threads.

The programmer splits the calculation into tasks (the equivalent of Haskell sparks) by making appropriate library calls; and the runtime maps those tasks into OS threads–many-to-one, if necessary.

Here’s how the same function looks with parallelization hooks.

void ParMatrixMult(int size, double[,] m1, double[,] m2, double[,] result)
{
   Parallel.For(0, size, delegate(int i)
   {
      for(int j = 0; j < size; j++){
         result[i, j] = 0;
         for(int k = 0; k < size; k++){
              result[i, j] += m1[i, k] * m2[k, j];
         }
      }
   });
}

Because of clever formatting, this version looks very similar to the original. The outer loop is replaced by the call to Parallel.For, which is one of the parallelizing TPL functions. The inner loops are packed into a delegate.

This delegate is assigned to a task (the analog of a Haskell spark) that is potentially run in a separate thread. Here the delegate is actually a closure–it captures the local variables size, m1, m2, and result. The latter is actually modified inside the delegate. This is how shared mutable state sneaks into potentially multi-threaded execution. Luckily, in this case, such sharing doesn’t cause races. Consider, however, what would happen if we changed the type of the matrices from double[,] to char[,]. Parallel updates to neighboring byte-sized array elements may inadvertently encroach on each other and lead to incorrect results. Programmer beware! (This is not a problem in Haskell because of the absence of mutable state.)

But even if the programmer is aware of potential sharing and protects shared variables with locks, it’s not the end of the story. Consider this example:

int sum = 0;
Parallel.For(0, 10000, delegate(int i)
{
   if(isPrime(i)){
      lock(this) { sum += i; }
   }
});

The captured variable, sum, is protected by the lock, so data races are avoided. This lock, however, becomes a performance bottleneck–it is taken for every prime number in the range.

Now consider the fact that, on a 4-core machine, we’ll be running 10000 tasks distributed between about 4 threads. It would be much more efficient to accumulate the sum in four local variables–no locking necessary–and add them together only at the end of the calculation. This recipe can be expressed abstractly as a map/reduce type of algorithm (a generalization of the C++ std::accumulate). The tasks are mapped into separate threads, which work in parallel, and the results are then reduced into the final answer.

Here’s how map/reduce is expressed in TPL:

int sum = Parallel.Aggregate(
  0, 10000, // domain
  0, // initial value
  delegate(int i){ return (isPrime(i) ? i : 0); },
  delegate(int x, int y){ return x+y; }
);

The first delegate, which is run by 10000 tasks, does not modify any shared state–it just returns its result, which is internally accumulated in some hidden local variable. The second delegate–the “reduce” part of the algorithm–is called when there’s a need to combine results from two different tasks.

The Place for Functional Programming

Notice that the last example was written in a very functional style. In particular, you don’t see any mutable state. The delegates are pure functions. This is no coincidence: functional programming has many advantages in parallel programming.

I’ve been doing a lot of multi-threaded programming in C++ lately and I’ve noticed how my style is gradually shifting from object-oriented towards functional. This process is accelerating as functional features keep seeping into the C++ standard. Obviously, lambdas are very useful, but so is move semantics, made possible by rvalue references–especially when passing data between threads. It’s becoming more and more obvious that, in order to be a good C++ programmer, one needs to study other languages. I recommend Haskell and Scala in particular. I’ll be blogging about them in the future.
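
Here’s a small illustration of that last point about move semantics (a sketch assuming C++0x <thread>; the worker function consume is hypothetical):

#include <thread>
#include <utility>
#include <vector>

// hypothetical worker that takes ownership of its input
void consume(std::vector<double> v) { /* process v */ }

void start() {
    std::vector<double> data(1000000, 1.0);
    // the vector is moved, not copied, into the new thread
    std::thread worker(consume, std::move(data));
    worker.join();
}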

Bibliography

  1. Simon Marlow, Simon Peyton Jones, and Satnam Singh, Runtime Support for Multicore Haskell
  2. Daan Leijen, Wolfram Schulte, and Sebastian Burckhardt, The Design of a Task Parallel Library
  3. A video introduction to Haskell by Simon Peyton Jones: Part I, Part II, together with the slides.

The video of my talk, Haskell and C++ Template Metaprogramming, is now available; and so are the slides. I pretty much covered the material from my last blog post, but many people (including me) find a video presentation easier to follow.

This is also a plug for the Northwest C++ Users Group that meets in Redmond every third Wednesday of the month. If you live in Seattle or on the east side of Lake Washington, check it out. You won’t be disappointed.


If you want to understand C++ template metaprogramming (TMP) you have to know functional programming. Seriously. I want you to think of TMP as maximally obfuscated (subset of) Haskell, and I’ll illustrate this point by point. If you don’t know Haskell, don’t worry, I’ll explain the syntax as I go.

The nice thing about single-paradigm languages like Haskell is that they have very simple syntax (think of Lisp that does everything with just a bunch of parentheses). I will start the Haskell-C++TMP mapping with basics, like functions and recursion, but I’ll try to cover a lot more, including higher-order functions, pattern matching, list comprehension (did you know it was expressible in C++?), and more.

Keep in mind that my Haskell examples are runtime functions operating on runtime data, whereas their C++ TMP equivalents are compile-time templates operating mostly on types. Operations on types are essential to providing correct and efficient implementations of parametrized classes and functions.

By necessity the examples are simple, but the same mapping may be applied to much more complex templates from the C++ Standard Library and Boost. As a bonus, I’ll also explain the hot new thing, variadic templates.

Functional Approach to Functions

How do you implement useful functions if you don’t have mutable variables, if statements, or loops? To a C++ programmer that might seem like an impossible task. But that’s the reality of C++ compile-time language that forms the basis of TMP. Functional programming to the rescue!

As a warm-up, let’s see how Haskell implements a simple function, the factorial:

fact 0 = 1
fact n = n * fact (n - 1)

The first line states that the factorial of zero is one. The second line defines the factorial of a non-zero argument, n (strictly speaking, it only works for positive arguments). It does so using recursion: the factorial of n is equal to n times the factorial of n-1. The recursion stops when n is equal to zero, in which case the first definition kicks in. Notice that the definition of the function fact is split into two sub-definitions. So when you call:

fact 4

the first definition is looked up first and, if it doesn’t match the argument (which it doesn’t), the second one comes into play. This is the simplest case of pattern matching: 4 doesn’t match the “pattern” 0, but it matches the pattern n.

Here’s almost exactly the same code expressed in C++ TMP:

template<int n> struct
fact {
    static const int value = n * fact<n - 1>::value;
};

template<> struct
fact<0> { // specialization for n = 0
    static const int value = 1;
};

You might notice how the horrible syntax of C++ TMP obscures the simplicity and elegance of this code. But once you are equipped with the C++/Haskell decoder ring, things become a lot clearer.

Let’s analyze this code. Just like in Haskell, there are two definitions of fact, except that their order is inverted. This is because C++ requires template specialization to follow the template’s general definition (or declaration, as we’ll see later). The pattern matching of arguments in C++ does not follow the order of declarations but rather is based on “best match”. If you instantiate the template with argument zero:

cout << "Factorial of 0 = " << fact<0>::value << endl;

the second pattern, fact<0>, is a better fit. Otherwise the first one, <int n>, is used.

Notice also the weird syntax for “function call”

fact<n>::value

and for the “return statement”

static const int value = n * fact<n - 1>::value;

This all makes sense if you look at templates as definitions of parameterized types, which was their initial purpose in C++. In that interpretation, we are defining a struct called fact, parameterized by an integer n, whose sole member is a static const integer called value. Moreover, this template is specialized for the case of n equal zero.

Now I want you to forget about what I just said and put on the glasses which make the C++ code look like the corresponding Haskell code.

Here’s another example–this time of a predicate (a function returning a Boolean):

is_zero 0 = True
is_zero x = False

Let’s spice it up a little for C++ and define a predicate on types rather than integers. The following compile-time function returns true only when the type T is a pointer:

template<class T> struct
isPtr {
    static const bool value = false;
};

template<class U> struct
isPtr<U*> {
    static const bool value = true;
};

This time the actual argument to isPtr is first matched against the more specialized pattern, U*, and, if that fails, the general pattern is used.

We can add yet another specialization, which will pattern-match a const pointer:

template<class U> struct
isPtr<U * const> {
    static const bool value = true;
};

Type predicates like these may be used, for instance, to select more flexible and efficient implementations of parameterized containers. Think of the differences between a vector of values and a vector of pointers, as sketched below.
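
For instance, a container template might branch on such a predicate (a sketch with the hypothetical name my_vector; the shared pointer implementation is only hinted at):

// the extra bool parameter defaults to the result of the predicate
template<class T, bool isPointer = isPtr<T>::value> struct
my_vector {
    // general version: stores elements of type T by value
};

template<class T> struct
my_vector<T, true> {
    // pointer version: all instantiations could delegate to a single
    // my_vector<void*> implementation and cast on access,
    // cutting down on generated code
};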

Lists

The basic data structure in functional languages is the list. Haskell’s lists are introduced using square brackets. For instance, a list of three numbers, 1, 2, 3, looks like:

[1, 2, 3]

List processing in functional languages follows the standard pattern: a list is split into head and tail, an operation is performed on the head, and the tail is processed using recursion. The splitting is done by pattern matching: the Haskell pattern being (head:tail) (to be precise, the colon in parentheses represents the cons operation–the creation of a list by prepending an element to an existing list).

Here’s a simple function, count, that calculates the length of a list:

count [] = 0
count (head:tail) = 1 + count tail

The first pattern, [], matches an empty list; the second a non-empty one. Notice that a function call in Haskell doesn’t use parentheses around arguments, so count tail is interpreted as a call to count with the argument tail.

Before C++0x, TMP was severely crippled by the lack of a list primitive. People wrote separate definitions for lists of zero, one, two, etc., elements, and even used special macros to generate them. This is no longer true in C++0x, thanks to variadic templates and template parameter packs. Here's our Haskell count translated into C++ TMP:

// Just a declaration
template<class... list> struct
count;

template<> struct
count<> {
    static const int value = 0;
};

template<class head, class... tail> struct
count<head, tail...> {
    static const int value = 1 + count<tail...>::value;
};

First we have a declaration (not a definition) of the template count that takes a variable number of type parameters (the keyword class or typename introduces a type parameter). They are packed into a template parameter pack, list.

Once this general declaration is visible, specializations may follow in any order. I arranged them to follow the Haskell example. The first one matches the empty list and returns zero. The second one uses the pattern <head, tail...>. This pattern will match any non-empty list and split it into the head and the (possibly empty) tail.

To “call” a variadic template, you instantiate it with an arbitrary number of arguments and retrieve its member, value, e.g.,

int n = count<int, char, long>::value; // returns 3

A few words about variadic templates: A variadic template introduces a template parameter pack using the notation class... pack (or int... ipack, etc.). The only thing you may do with a pack is to expand it and pass it to another variadic template. The expansion is done by following the name of the pack with three dots, as in tail.... You'll see more examples later.

Variadic templates have many applications such as type-safe printf, tuples (objects that store an arbitrary number of differently typed arguments), variants, and many more.

Higher-Order Functions and Closures

The real power of functional programming comes from treating functions as first class citizens. It means that you may pass functions to other functions and return functions from functions. Functions operating on functions are called higher-order functions. Surprisingly, it seems like compile-time C++ has better support for higher-order functions than run-time C++.

Let’s start with a Haskell example. I want to define a function that takes two predicate functions and returns another predicate function that combines the two using logical OR. Here it is in Haskell:

or_combinator f1 f2 = 
    \x -> (f1 x) || (f2 x)

The or_combinator returns an anonymous function (the famous “lambda”) that takes one argument, x, calls both f1 and f2 with it, and returns the logical OR of the two results. The return value of or_combinator is this freshly constructed function. I can then call this function with an arbitrary argument. For instance, here I’m checking if 2 is either zero or one (guess what, it isn’t!):

(or_combinator is_zero is_one) 2

I put the parentheses around the function and its arguments for readability, although they are not strictly necessary. 2 is the argument to the function returned by or_combinator.

The lambda that’s returned from or_combinator is actually a closure. It “captures” the two arguments, f1 and f2 passed to or_combinator. They may be used long after the call to or_combinator has returned.

It might take some getting used to before you are comfortable with functions taking functions and returning functions, but it's much easier to learn this stuff in Haskell than in obfuscated C++. Indeed, here's an almost direct translation of this example:

template<template<class> class f1, template<class> class f2> struct
or_combinator {
    template<class T> struct
    lambda {
        static const bool value = f1<T>::value || f2<T>::value;
    };
};

Since in the metalanguage a function is represented by a template, the template or_combinator takes two such templates as arguments. It “calls” these templates using the standard syntax f<T>::value. Actually, the or_combinator doesn’t call these functions. Instead it defines a new template, which I call lambda, that takes the argument T and calls those functions. This template acts like a closure–it captures the two templates that are the arguments to or_combinator.

Here’s how you may use the or_combinator to combine two tests, isPtr and isConst and apply the result to the type const int:

std::cout 
   << "or_combinator<isPtr, isConst>::lambda<const int>::value = "
   << or_combinator<isPtr, isConst>::lambda<const int>::value 
   << std::endl;

Such logical combinators are essential for predicate composability.

Higher-Order Functions Operating on Lists

Once you combine higher-order functions with lists you have a powerful functional language at your disposal. Higher-order functions operating on lists look very much like algorithms. Let me show you some classic examples. Here’s the function (or algorithm), all, that returns true if and only if all elements of a list satisfy a given predicate.

all pred [] = True
all pred (head:tail) = (pred head) && (all pred tail)

By now you should be familiar with all the techniques I used here, like pattern matching or list recursion.

Here’s the same code obfuscated by the C++ syntax:

template<template<class> class predicate, class... list> struct
all;

template<template<class> class predicate> struct
all<predicate> {
    static const bool value = true;
};

template<
    template<class> class predicate, 
    class head, 
    class... tail> struct
all<predicate, head, tail...> {
    static const bool value = 
        predicate<head>::value 
        && all<predicate, tail...>::value;
};

Except for the initial declaration required by C++ there is a one-to-one paradigm match between the two implementations.

Another useful algorithm, a veritable workhorse of functional programming, is "fold right" (together with its dual partner, "fold left"). It folds a list while accumulating the results (that's why in runtime C++ this algorithm is called "accumulate"). Here's the Haskell implementation:

foldr f init [] = init
foldr f init (head:tail) =
    f head (foldr f init tail)

Function f, which is the first argument to foldr, takes two arguments, the current element of the list and the accumulated value. Its purpose is to process the element and incorporate the result in the accumulator. The new accumulated value is then returned. It is totally up to the client to decide what kind of processing to perform, how it is accumulated, and what kind of value is used. The second argument, init, is the initial value for the accumulator.

Here’s how it works: The result of foldr is generated by acting with f on the head of the list and whatever has been accumulated by processing the tail of the list. The algorithm recurses until the tail is empty, in which case it returns the initial value. At runtime this type of algorithm would make N recursive calls before starting to pop the stack and accumulate the results.

For instance, foldr may be used to sum the elements of a list (so_far is the accumulator, which is initialized to zero):

add_it elem so_far = elem + so_far
sum_it lst = foldr add_it 0 lst

The accumulator function is add_it. If, instead, I wanted to calculate the product of all elements, I’d use a function mult_it and the starting value of one. You get the idea.

Here’s the same algorithm in C++ TMP:

template<template<class, int> class, int, class...> struct
fold_right;

template<template<class, int> class f, int init> struct
fold_right<f, init> {
    static const int value = init;
};

template<template<class, int> class f, int init, class head, class...tail> struct
fold_right<f, init, head, tail...> {
    static const int value = f<head, fold_right<f, init, tail...>::value>::value;
};

Once you understand the Haskell version, this complex code suddenly becomes transparent (if it doesn’t, try squinting 😉 ).
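
As a sanity check, here's a compile-time accumulator and a "call" to fold_right that adds up the sizes of a list of types (add_size and total are hypothetical names):

// compile-time analog of add_it: incorporates sizeof(T) into the accumulator
template<class T, int so_far> struct
add_size {
    static const int value = sizeof(T) + so_far;
};

// sizeof(int) + sizeof(char) + sizeof(double)
const int total = fold_right<add_size, 0, int, char, double>::value;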

Lists of Numbers

Let’s now switch to integers for a moment. Haskell defines a function sum that adds all elements of a list:

sum [] = 0
sum (head:tail) = head + (sum tail)

We can do the same in C++ TMP (in five times as many lines of code):

template<int...> struct
sum;

template<> struct
sum<> {
    static const int value = 0;
};

template<int i, int... tail> struct
sum<i, tail...> {
    static const int value = i + sum<tail...>::value;
};

List Comprehension

Haskell has one more trick up its sleeve for operating on lists without explicit recursion. It’s called list comprehension. It’s a way of defining new lists based on existing lists. The nomenclature and notation are borrowed from Set Theory, where you often encounter definitions such as: S is a set of elements, where… Let’s look at a simple Haskell example:

[x * x | x <- [3, 4, 5]]

This is a set (list) of elements x * x, where x is from the list [3, 4, 5].

Remember our recursive definition of count? Using list comprehension it’s reduced to a one-liner:

count lst = sum [1 | x <- lst]

Here, we create a list of ones, one for each element of the list. Our result is the sum of those ones. To make this definition more amenable to translation into C++, let’s define an auxiliary function one that, for any argument x, returns 1.

one x = 1

Here’s the modified definition of count:

count lst = sum [one x | x <- lst]

Now we are ready to convert this code to C++ TMP:

template<class T> struct
one {
    static const int value = 1;
};

template<class... lst> struct
count {
    static const int value = sum<one<lst>::value...>::value;
};

Here our list is stored in a template parameter pack, lst. If we wanted to expand this pack, we'd use the notation lst..., but that's not what's happening here. The ellipsis appears after the pattern containing the pack:

one<lst>::value...

Compare this with the equivalent Haskell:

[one x | x <- lst]

In C++, when the ellipsis follows a pattern that contains a pack, it’s not the pack that’s expanded, but the whole pattern is repeated for each element of the pack. Here, if our list were <int, char, void*>, the pattern would be expanded to:

<one<int>::value, one<char>::value, one<void*>::value>

The subsequent call to sum would be made with those arguments.

Notice that a different positioning of the ellipsis would result in a completely different expansion. This pattern:

one<lst...>::value

would result in the call to one with the list of types, which would be an error.

Here’s another example of pattern expansion: a function that counts the number of pointers in a list of types:

template<class... lst> struct
countPtrs {
    static const int value = sum<isPtr<lst>::value ...>::value;
};

In this case the pattern is:

isPtr<lst>::value ...

and it expands into a list of Booleans. (I’m taking advantage of the fact that false is zero and true is one, when converted to integers.)
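
A quick compile-time check of countPtrs (static_assert is another C++0x addition):

static_assert(countPtrs<int*, char, void*>::value == 2,
              "expected exactly two pointer types");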

You may find a more complex practical example in the Gregor, Järvi, and Powell paper (see bibliography).

Continuations

List comprehension can be used to define some very useful higher-order functions. One such function is map, which takes a list and applies a unary function to each element, resulting in a new list. You might be familiar with the runtime implementation of this algorithm in the C++ STL under the name transform. This is what map looks like in Haskell:

map f lst = [f x | x <- lst]

Here, f is the unary function and lst is the input list. You have to admire the terseness and elegance of this notation.

The first impulse would be to translate it into C++ TMP as:

template<template<class> class f, class... lst> struct
map {
    typedef f<lst>... type;
};

This is surprisingly terse too. The problem is that it doesn’t compile. As far as I know there is no way for a template to “return” a variable list of elements. In my opinion, this is a major language design flaw, but that’s just me.

There are several workarounds, none of them too exciting. One is to define a separate entity called a typelist along the lines of:

template<class... list> struct
typelist;

template<class hd, class... tl> struct
typelist<hd, tl...> {
    typedef hd head;
    typedef typelist<tl...> tail;
};

(As a matter of fact I have implemented typelists and related algorithms both in C++ and D.)

Another approach is to use continuations. Template parameter packs cannot be returned, but they can be passed to variadic templates (after expansion). So how about defining an algorithm like map to take one additional function that would consume the list that is the result of mapping? Such a function is often called a continuation, since it continues the calculation where normally one would return the result. First, let’s do it in Haskell:

map_cont cont f lst = cont [f x | x <- lst]

The function map_cont is just like map except that it takes a continuation, cont, and applies it to the result of mapping. We can test it by defining yet another implementation of count:

count_cont lst = map_cont sum one lst

The continuation here is the function sum that will be applied to the list produced by acting with function one on the list lst. Since this is quite a handful, let me rewrite it in a more familiar notation of runtime C++:

int map_cont(int (*cont)(std::list<int> const &),
             int (*f)(int),
             std::list<int> const & lst)
{
    std::list<int> tmp;
    for (auto it = lst.begin(); it != lst.end(); ++it)
        tmp.push_back(f(*it)); // preserve the original order
    return cont(tmp);
}

Now for the same thing in compile-time C++:

template<template<class...> class cont, 
         template<class> class f,
         class... lst> struct
map_cont {
    static const int value =
        cont<typename f<lst>::type ...>::value;
};

It’s a one-to-one mapping of Haskell code–and it has very little in common with the iterative runtime C++ implementation. Also notice how loose the typing is in the TMP version as compared with the runtime version. The continuation is declared as a variadic template taking types, function f is declared as taking a type, and the list is a variadic list of types. Nothing is said about return types, except for the constraints that f returns a type (because cont consumes a list of types) and cont returns an int. Actually, this last constraint can be relaxed if we turn integers into types–a standard trick (hack?) in TMP:

template<int n> struct
Int {
    static const int value = n;
};

Loose typing–or "kinding," as it is called for types of types–is an essential part of compile-time programming. In fact, the popularity of the above trick shows that C++ kinding might already be too strong.
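
To close the loop, here's count_cont translated into C++ TMP using map_cont: the mapped function wraps its result in Int, and the continuation sums those wrappers (one_t and sum_t are hypothetical names, building on the templates defined above):

// maps any type to Int<1>; note the nested 'type' that map_cont expects
template<class T> struct
one_t {
    typedef Int<1> type;
};

// the continuation: sums a list of Int<> wrappers
template<class... lst> struct
sum_t;

template<> struct
sum_t<> {
    static const int value = 0;
};

template<class head, class... tail> struct
sum_t<head, tail...> {
    static const int value = head::value + sum_t<tail...>::value;
};

// count_cont: sum_t consumes the list produced by applying one_t
const int n = map_cont<sum_t, one_t, int, char, double>::value; // 3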

The D Digression

I’m grateful to Andrei Alexandrescu for reviewing this post. Since he objected to the sentence, “I’m disappointed that the D programming language followed the same path as C++ rather than lead the way,” I feel compelled to support my view with at least one example. Consider various implementations of all.

In Haskell, beside the terse and elegant version I showed before:

all pred [] = True
all pred (head:tail) = (pred head) && (all pred tail)

there is also a slightly more verbose one:

all pred list =
    if null list then True
    else pred (head list) && all pred (tail list)

which translates better into D. The D version (taken from its standard library, Phobos) is not as short as Haskell’s, but follows the same functional paradigm:

template allSatisfy(alias F, T...) {
    static if (T.length == 1)
    {
        alias F!(T[0]) allSatisfy;
    }
    else
    {
        enum bool allSatisfy = F!(T[0]) && allSatisfy!(F, T[1 .. $]);
    }
}

It definitely beats C++, there's no doubt about it. There are some oddities about it, though. Notice that the type tuple, T..., gets the standard list treatment, but the split into head and tail follows the array/slice notation. The head is T[0] and the tail is an array slice, T[1 .. $]. Instead of using value for the return value, D uses the "eponymous hack," as I call it. The template "returns" the value using its own name. It's a hack because it breaks down if you want to define the equivalent of a "local variable" inside a template. For instance, the following code doesn't compile:

template allSatisfy(alias F, T...) {
    static if (T.length == 1)
    {
        alias F!(T[0]) allSatisfy;
    }
    else
    {
        private enum bool tailResult = allSatisfy!(F, T[1..$]);
        enum bool allSatisfy = F!(T[0]) && tailResult;
    }
}

This breaks one of the fundamental properties of any language: decomposability. You want to be able to decompose your calculation into smaller chunks and then combine them together. Of course, you may still use decomposition if you avoid the eponymous hack and just call your return value value. But that means modifying all the calling sites.

By the way, this is how decomposition works in Haskell:

all2 pred [] = True;
all2 pred (head:tail) = (pred head) && tailResult
    where tailResult = all2 pred tail

Anyway, the important point is that there is no reason to force the functional programming paradigm at compile time. The following hypothetical syntax would be much easier for programmers who are used to imperative programming:

template allSatisfy(alias Pred, T...) {
    foreach(t; T)
        if (!Pred!(t))
            return false;
    return true;
}

In fact it’s much closer to the parameterized runtime function proposed by Andrei:

bool all(alias pred, Range)(Range r) {
    foreach (e; r) 
        if (!pred(e)) 
            return false;
    return true;
}

This is why I’m disappointed that the D programming language followed the same path as C++ rather than lead the way.

Conclusions

I have argued that some familiarity with Haskell may be really helpful in understanding and designing templates in C++. You might ask why C++ chose such horrible syntax to do compile-time functional programming. Well, it didn’t. The ability to do compile-time calculations in C++ was discovered rather than built into the language. It was a very fruitful discovery, as the subsequent developments, especially the implementation of the Boost MPL, have shown. However, once the functional paradigm and its weird syntax took root in C++ TMP, it stayed there forever. I’m aware of only one effort to rationalize C++ TMP by Daveed Vandevoorde in Reflective Metaprogramming in C++.

I have tested all code in this blog using the Hugs interpreter for Haskell and the GNU C++ compiler v. 4.4.1 with the special switch -std=c++0x. The names I used in the blog might conflict with the standard definitions. For instance, Hugs defines its own map and foldr.

If you want to learn more about template metaprogramming, I recommend two books:

  1. Andrei Alexandrescu, Modern C++ Design
  2. David Abrahams and Aleksey Gurtvoy, C++ Template Metaprogramming

The reference I used for variadic templates was the paper by Douglas Gregor, Jaakko Järvi, and Gary Powell.


In my last post I criticized the C++0x implementation of futures for their lack of composability. I now feel obliged to show what I consider “the right way” to do futures.

I use the event library of Concurrent Caml (actually, its version distributed with OCaml) as my model. The ideas behind it are described in J. H. Reppy’s Ph.D. thesis, Higher-order concurrency.

Events and Futures

CML channels and events are more general than futures, but here I’ll concentrate on the subset of their functionality that can be used to implement futures. Below is a useful translation table between CML and C++. (I should add that CML channels, as opposed to C++ futures, are synchronous. In practice it doesn’t matter that much, since promise::set_value–corresponding to channel send–is usually called right before the thread exits.)

CML                                   C++
channel                               promise
event                                 future
evt = receive chan                    ftr = prom.get_future()
sync (send chan msg)  (synchronous)   prom.set_value(msg)  (asynchronous)
msg = sync evt                        msg = ftr.get()
choose evt_lst                        ???
evt' = wrap evt fun                   ???

As you can see, C++ has no equivalents of choose and wrap. I will discuss those next.

choose

CML choose is a function that takes a list of events and creates a new composite event. When the program syncs on this event, it blocks until one of the listed events completes. The combination of sync and choose is similar to Unix select or Windows WaitForMultipleObjects.

An example of the usefulness of choose is a situation where there is more than one algorithm to calculate a result, and the program runs them in parallel until the fastest one completes. Linear search, for instance, is faster for small sets than binary search. Instead of figuring out which one to use in each particular case, it might be cheaper (on a multicore) to run both and stop when the first one is done.
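Java’s standard library happens to offer this very pattern: ExecutorService.invokeAny runs a collection of tasks, returns the result of the first one that completes successfully, and cancels the rest. Here’s a minimal sketch of such a two-algorithm race (the demo code and all its names are mine):

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SearchRace {
    public static void main(String[] args) throws Exception {
        final int[] data = {1, 3, 5, 7, 9};
        final int key = 7;
        List<Callable<Integer>> algorithms =
            Arrays.<Callable<Integer>>asList(
                new Callable<Integer>() { // linear search
                    public Integer call() {
                        for (int i = 0; i < data.length; ++i)
                            if (data[i] == key) return i;
                        return -1;
                    }
                },
                new Callable<Integer>() { // binary search (data is sorted)
                    public Integer call() {
                        return Arrays.binarySearch(data, key);
                    }
                });
        ExecutorService exec = Executors.newFixedThreadPool(2);
        // Blocks until the first task completes, then cancels the other.
        int index = exec.invokeAny(algorithms);
        System.out.println("Found at index " + index);
        exec.shutdown();
    }
}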

What would choose look like in C++? One possibility is to implement it as a container object. You would push individual futures into it and then call get_future to obtain a composite future. Let me call this object future_or since it does the equivalent of logical OR for futures (the equivalent of AND is called a thread barrier).

  unique_future<int> ftr1 = promise1.get_future();
  unique_future<int> ftr2 = promise2.get_future();
  future_or<int> orf;
  orf.push_back(std::move(ftr1));
  orf.push_back(std::move(ftr2));
  unique_future<int> compositeFut = orf.get_future();
  ...
  int result = compositeFut.get();

Another possible implementation, in C++0x, would be to define a variadic template function future_choose. It would take a variable number of futures and return the composite future. The advantage of such an approach would be to let the compiler infer the types and the number of arguments automatically.

You probably noticed that future_or can only accept futures of one type (the same would be true about the function future_choose). For instance, to create a composite future that returns a string upon get, you must build it from other futures that also return a string (and not an int, for instance).

In principle, heterogeneous futures could be combined if they all returned their (different) result types through the same variant. Then, after the composite of variant-returning futures delivered a variant, the program logic would fork depending on the actual type stored inside it. This approach bypasses the static type system and is error prone, as the sketch below illustrates. CML provides a more elegant solution using wrap.
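To make the fragility concrete, here is roughly what the variant route looks like when translated into Java (the demo and all its names are mine; Future<Object> plays the role of a variant-returning future):

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class VariantFuture {
    public static void main(String[] args) throws Exception {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        // The future's type says nothing about what's really inside.
        Future<Object> composite = exec.submit(new Callable<Object>() {
            public Object call() { return 42; } // could return anything
        });
        Object result = composite.get();
        // The program logic must fork on the runtime type.
        if (result instanceof Integer)
            System.out.println("got an int: " + result);
        else if (result instanceof String)
            System.out.println("got a string: " + result);
        // A forgotten case is not a compile-time error.
        exec.shutdown();
    }
}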

wrap

The trick is to combine (wrap) a particular future with a function that processes its output. A future that returns an int is combined with a function that takes an int, etc. The result of this combination is a new future usually returning void.

When a wrapped future is synchronized (e.g., by calling wait), it executes the associated function, passing it the result returned by the original future. Just look at this example:

  void handler(Foo const & foo);
  unique_future<Foo> fooFuture = prom.get_future();
  // combine the future with a handler function
  unique_future<void> wrappedFuture = wrap_future(fooFuture, &handler);
  ...
  wrappedFuture.wait();
  // at this point fooFuture has returned a Foo object 
  // and the handler has been executed with that object.

There’s one thing worth mentioning: the compiler will ensure the type correctness of the arguments to wrap_future–the return type of the future must be compatible with the argument type of the function. Mismatch errors will be discovered at compile time. Compare this with the implementation using variants, which does not have this property.

Of course the usefulness of wrap really comes into play when you try to combine heterogeneous futures using future_choose. You just wrap each future with its own post-processing function (or callable object) and pass them all to future_choose. Wrapped futures are homogeneous–they all return void.

Here’s an example that uses C++0x lambdas (one of them being a closure):

  unique_future<Foo> fooFuture = prom1.get_future();
  unique_future<int> intFuture = prom2.get_future();
  int sum = 0;
  unique_future<void> composite = future_choose(
      wrap_future(fooFuture, [](Foo const & foo){ foo.Commit(); }),
      wrap_future(intFuture, [&sum](int i){ sum += i; }));
  ...
  composite.wait();

Wrapped futures would be useful, e.g., in implementing a server that listens to several communication channels.

What about Haskell?

As I described in my earlier post, unlike CML, Haskell implements asynchronous message passing. The original Concurrent Haskell paper argued against choose. Later, however, Avik Chaudhuri was able to show that the CML event library can be implemented on top of asynchronous MVars. So there are no theoretical obstacles to an asynchronous implementation of futures.

Final remarks

The equivalent of choose, called future_select, has been proposed for Boost and C++0x, but it was turned down.

Interestingly enough, Microsoft’s Task Parallel Library for .NET implements higher-order concurrency. They have ContinueWith for wrapping futures and WaitAny for choice. I can only assume that it’s easier to implement those on a virtual machine.

I also wonder whether it would be more productive to implement the equivalent of the CML channels and events in C++, and treat futures as a special case.


Bibliography

  1. J. H. Reppy. Higher-order concurrency. PhD thesis, Cornell
    University, 1992. Technical Report 92-1852.
  2. Avik Chaudhuri. A Concurrent ML Library in Concurrent Haskell.

What is the most basic building block for asynchronous message passing? I’ve found the answer in Concurrent Haskell. Not many people are fluent in Haskell, especially when monads are involved, so MVars may sound foreign to most. Not to worry–I’ll translate them into Java (as I did with synchronous CML channels in my previous post).

An MVar is an object with two methods, put and take. The sender calls put with a message and continues (it doesn’t block–that’s why this message-passing model is called “asynchronous”). The receiver, in another thread, calls take to remove the message from MVar. If there is no message, the receiver blocks until there is one.

An MVar can store a maximum of one message, so it’s an error to call put when there is an unclaimed message inside it. An MVar can thus be only in one of two states: empty or full.

Here’s a simple-minded implementation of MVar written in Java. (A lock-free implementation is possible–and closer to how Haskell implements it–but it’s harder to reason about.)

public class MVar<T> {
    private T _obj;
    private boolean _full;

    public MVar(){
        _full = false;
    }
    // put: asynchronous (non-blocking)
    // Precondition: MVar must be empty
    public synchronized void put(T obj) {
        assert !_full;
        assert _obj == null;
        _obj = obj;
        _full = true;
        notify();
    }
    // take: if empty, blocks until full.
    // Removes the object and switches to empty
    public synchronized T take() throws InterruptedException {
        while (!_full)
            wait(); // may throw!

        T ret = _obj;
        _obj = null;
        _full = false;
        return ret;
    }
}
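
Here’s a minimal usage sketch (the demo is mine): one thread puts a message without blocking, while the main thread blocks in take until the message arrives.

public class MVarDemo {
    public static void main(String[] args) throws InterruptedException {
        final MVar<String> mvar = new MVar<String>();
        new Thread(new Runnable() {
            public void run() {
                mvar.put("hello"); // asynchronous: returns immediately
            }
        }).start();
        // Blocks until the other thread has called put.
        System.out.println(mvar.take());
    }
}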

You might think that it would be difficult to implement anything useful with MVars. After all, it looks like a message queue of length one, which bombs when you try to put a second message in. Yet it is the simplest building block for more sophisticated structures. It is the atom of asynchronous message passing.

We’ve seen before how to implement asynchronous message passing using synchronous building blocks by spawning a thread. Now let’s see how we can implement a simple synchronous channel variable using asynchronous MVars. Remember, in a synchronous channel, if there is no receiver already waiting on the other side, a call to send (or write, in this example) will block.

The basic channel variable, or a channel of length one, is called a CVar in Haskell, and it contains two MVars: one to store the message, and the other for the acknowledgment. The acknowledgment MVar doesn’t really have to store anything–we are only interested in its state: empty or full. The reader acknowledges the receipt of a message by setting it to full. (Note that, with this scheme, a write blocks only until the previous message has been read; the very first write completes immediately.)

public class CVar<T> {
    private MVar<T> _data;
    private MVar<Object> _ack;
    
    public CVar(){
        _data = new MVar<T>();
        _ack = new MVar<Object>();
        _ack.put(null); // make _ack full
    }
    public void write(T obj) throws InterruptedException {
        _ack.take(); // make _ack empty
        _data.put(obj);
    }
    public T read() throws InterruptedException {
        T data = _data.take();
        _ack.put(null); // make _ack full
        return data;
    }
}
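
Here’s a small demo of the blocking behavior (the demo is mine): the second write blocks until the reader has taken the first message.

public class CVarDemo {
    public static void main(String[] args) throws InterruptedException {
        final CVar<Integer> cvar = new CVar<Integer>();
        Thread writer = new Thread(new Runnable() {
            public void run() {
                try {
                    cvar.write(1); // completes at once: _ack starts out full
                    cvar.write(2); // blocks until the reader takes 1
                } catch (InterruptedException e) {}
            }
        });
        writer.start();
        System.out.println(cvar.read()); // prints 1, unblocking the second write
        System.out.println(cvar.read()); // prints 2
        writer.join();
    }
}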

MVars can also be used to build a more useful asynchronous buffered channel that can store more than one message at a time. I will show you the construction, but it’s far from trivial. You might notice that it closely resembles a lock-free FIFO queue (although I chose to use locks to implement the MVar). As with all lock-free data structures, one has to be very careful when reasoning about their correctness. I’ll leave it as an exercise for the reader 😉.

Item and Stream are mutually recursive data structures forming a linked list. An Item points to a Stream–the tail of the list. A Stream is an MVar containing an Item. You may look at a Stream as a sequence of alternating MVars and Items. An empty MVar may serve as a sentinel.

class Item<T> {
    private T _val;
    private Stream<T> _tail;
    
    public Item(T val, Stream<T> tail){
        _val = val;
        _tail = tail;
    }
    T value(){
        return _val;
    }
    Stream<T> tail(){
        return _tail;
    }
}

// It's just a typedef for an MVar that stores an Item
class Stream<T> extends MVar<Item<T>> {}

A Channel contains the Stream linked list stored in an MVar called _read, because the head of the list is where we read messages. The other MVar points to the end of the list (really, to an empty sentinel Stream); this is the write end. The methods put and get (which are not synchronized!) perform some intricate manipulations characteristic of lock-free algorithms, making sure that at every step the queue is in a consistent state for concurrent access. Notice that put will only block if another put is in progress. Otherwise it’s asynchronous–there is no waiting for a receiver.

public class Channel<T> {
    private MVar<Stream<T>> _read;
    private MVar<Stream<T>> _write;
    
    public Channel() {
        _read = new MVar<Stream<T>>();
        _write = new MVar<Stream<T>>();
        Stream<T> hole = new Stream<T>();
        _read.put(hole);
        _write.put(hole);
    }
    public void put(T val)throws InterruptedException {
        Stream<T> newHole = new Stream<T>();
        Stream<T> oldHole = _write.take();
        _write.put(newHole);
        oldHole.put(new Item<T>(val, newHole));
    }
    public T get()throws InterruptedException {
        Stream<T> cts = _read.take();
        Item<T> item = cts.take();
        _read.put(item.tail());
        return item.value();
    }
}
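
Finally, a small producer/consumer sketch (the demo is mine): the producer puts several messages without waiting for anybody to receive them, and the consumer takes them in FIFO order.

public class ChannelDemo {
    public static void main(String[] args) throws InterruptedException {
        final Channel<Integer> chan = new Channel<Integer>();
        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    for (int i = 0; i < 3; ++i)
                        chan.put(i); // asynchronous: no receiver needed
                } catch (InterruptedException e) {}
            }
        });
        producer.start();
        for (int i = 0; i < 3; ++i)
            System.out.println(chan.get()); // prints 0, 1, 2
        producer.join();
    }
}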

In the next few installments I’m planning to talk a little about “choice” and then tackle the actor-based message-passing paradigm (Erlang and Scala).
