March 2009



A type system that prevents data races must not eliminate useful concurrency patterns or force the programmer to maintain multiple copies of almost identical data structures.

In my previous post about Guava, a dialect of Java, I talked about a type system that enforces the separation of thread-local and shared objects at the class level. Unfortunately, such a rigid system forces the programmer (especially a library writer) to provide dual implementations of many generic classes like vectors, queues, etc. Often the only difference between the implementations is the use of synchronized (or, in the case of Guava, the special base class called Monitor).

To solve this problem, Boyapati and Rinard proposed a system where the same generic class may be used in different sharing contexts. For instance, the same parameterized vector class may be used to instantiate a thread-local instance that requires no locking, as well as a shared instance that has a built-in lock.

The paper precedes Generic Java, but employs similar notation. For instance, it lets you define a generic vector class like this:

  class vector<thisOwner> { ... }

and then instantiate it as thread-local (no locking requirements):

  vector<thisThread> localV = new vector<thisThread>;

or as a shared monitor:

  vector<self> sharedV = new vector<self>;

The first template parameter is always interpreted as “the owner” (more about it later). Objects owned by thisThread are thread-local, objects owned by self are monitors.

Even though the notation used in GRFJ (the acronym the authors use for their Java dialect) is different from that of Guava, there are many similarities in the two approaches, since both have to deal with similar issues: explicit sharing, ownership, passing objects between threads, etc.

-Explicit sharing

You might remember that, in Guava, only those classes that inherit from Monitor may be shared. In GRFJ, sharing is defined at the instance level (the instantiation of a template). Every instance declaration must specify the owner of the object. If the owner is not thisThread, the object may be shared between threads. The equivalent of the Guava Monitor is a self-owned object–its owner is declared as self.

-Ownership

Ownership plays an essential role in preventing data races. Every object has an owner. In GRFJ there are three basic types of owners:

  1. thisThread–an object owned by thisThread is never shared.
  2. self–the object is the root of an ownership tree.
  3. Another object–sharing is determined by the root of the ownership tree.

Types of ownership translate naturally into protection patterns. If the owner is thisThread, there is no need for locking. If the owner is self, all methods must be synchronized on the object’s lock. In the third case, the object’s owner is responsible for locking. More precisely, the root of the ownership tree to which the object belongs has to be locked (unless that root is declared thread-local).

There are some interesting combinations of ownership. For instance, you can declare a thread-local vector that stores self-owned (shared) items. Or you can declare a shared Stack that contains (owns) a Vector object. All access to Vector will be protected by the Stack’s lock.

To achieve this level of expressiveness, we need classes that are parameterized by owners. Notice that if the owner of an object x is another object, the owner must already exist when the declaration of x is reached.

A template might be parameterized by multiple owners. The first one on the list is always the owner of this. In the example below, Stack is parameterized by two owners–the first owns the Stack, the second owns the Values. Note that, in this case, all Values will always share the same owner.

class Stack<thisOwner, valueOwner> {
    Node<thisOwner, valueOwner> head = null;

    void push(Value<valueOwner> value) requires (this) {
        Node<thisOwner, valueOwner> newNode = 
            new Node<thisOwner, valueOwner>;
        newNode.init(value, head);
        head = newNode;
    }
    Value<valueOwner> pop() requires (this) {
        if (head == null) return null;
        Value<valueOwner> value = head.value();
        head = head.next();
        return value;
    }
}

The requires clause specifies whose lock must be held when calling a particular method. In this case, the lock on this must be held. But if this is owned by another object, the locking responsibility automatically moves up a level, until it reaches the ownership root.

Here’s the definition of Node:

class Node<thisOwner, valueOwner> {
    Value<valueOwner> value;
    Node<thisOwner, valueOwner> next;

    void init(Value<valueOwner> v, Node<thisOwner, valueOwner> n)
        requires (this) {
        this.value = v;
        this.next = n;
    }
    Value<valueOwner> value() requires (this) {
        return value;
    }
    Node<thisOwner, valueOwner> next() requires (this) {
       return next;
    }
}

And the definition of Value:

class Value<thisOwner> { int x = 0; }

Using the declarations above we are now ready to declare different values and stacks:

Value<thisThread> v1 = new Value<thisThread>;
Value<self> v2 = new Value<self>;

We have created two values from the same template–v1 is thread-local, v2 is a monitor (access to x is protected by its lock).

Stack<thisThread, thisThread> s1 = new Stack<thisThread, thisThread>;
Stack<thisThread, self> s2 =  new Stack<thisThread, self>;
Stack<self, self> s3 = new Stack<self, self>;

Stack s1 is thread-local and can store only thread-local values. No locks or locking code will be created by the compiler. Stack s2 is also thread-local, but it stores shareable values. A thread-local stack will never be visible from other threads, but the self-owned values it stores might be accessed from multiple threads. Finally, s3 is a shared stack containing shared values. Both the stack s3 and the values it stores have their own independent locks.

s1.push(v1);
s2.push(v2);

We may push a thread-local value v1 on the stack s1, but if we tried to push v2 on s1, the compiler would consider it an error. Pushing v2 on s2, on the other hand, is okay.

Since GRFJ is based on Concurrent Java, the locking and threading look a little odd. To illustrate the sharing of s3, we fork a thread, passing it s3 and v2 (both are shareable), and execute s3.push(v2) under the lock of s3.

fork (s3,v2) {synchronized (s3) in {s3.push(v2);}}

Notice that, according to the declaration of s3, it would be an error to push v1 onto it. Indeed, that could result in illegal sharing of a thread-local object. The type system protects us from a hard-to-detect error.
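To get a feel for what the compiler effectively arranges in the self-owned case, here is a rough C++ analogy (my own sketch, not from the paper): the Stack is the ownership root, so it holds the only mutex, and the nodes it owns carry no locks of their own.

#include <memory>
#include <mutex>

// A C++ rendering of Stack<self, self> (sketch only): the single mutex
// at the ownership root guards every Node the stack owns; the nodes
// themselves need no locks.
template<class T>
class SharedStack {
    struct Node {
        T value;
        std::unique_ptr<Node> next;
    };
    std::unique_ptr<Node> head;
    std::mutex lock;  // the root’s lock, the analog of requires (this)
public:
    void push(T v) {
        std::lock_guard<std::mutex> hold(lock);
        head.reset(new Node{std::move(v), std::move(head)});
    }
    bool pop(T& out) {
        std::lock_guard<std::mutex> hold(lock);
        if (!head) return false;
        out = std::move(head->value);
        head = std::move(head->next);
        return true;
    }
};

A thread-local Stack<thisThread, thisThread> would be the same class minus the mutex, which is exactly the duplication the owner parameter eliminates.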

This is hardly a free lunch, though. Annotating every class and every variable might be just too much for a programmer. Fortunately, most owners can be inferred by the compiler by analyzing assignments. Because of that, single-threaded programs in GRFJ require virtually no annotations.

-Foreign owners

In most cases, the ownership tree follows the containment tree: the owner contains the ownee. Although desirable from the architectural point of view, this arrangement is not strictly necessary. An object might declare another, separate object as its owner. This is safe under one condition–the owner object may not be overwritten. Hence the requirement that the owner be final. Here’s the relevant example:

final Foo<self> owner = new Foo<self>;
Bar<owner> ownee = new Bar<owner>;

This becomes important when building new locking protocols from pre-existing parts.

-Object migration

The passing of objects between threads requires either deep copying (like Guava’s Values) or move semantics. In GRFJ, move semantics is implemented by specifying another special type of owner–unique. Unique objects cannot be copied; they have to be moved. The “move” operator is the postfix decrement, just like in Guava. It moves the reference and nulls the source.

Value<unique> v3 = new Value<unique>;
Value<unique> v4 = v3--;

Our Stack class is not prepared to store unique objects. This prohibition may be included in its definition (compare it with C++ “concepts”):

class Stack<thisOwner, valueOwner> where (valueOwner != unique)

Conversely, we might want to redefine Stack to store unique objects. A few code changes would be necessary though. For instance, in push, the value must be moved:

newNode.init(value--, head);

In pop, the return value has to be moved:

return value--;

The class Node requires similar changes.

The authors note that, despite appearances, the moving of objects is multiprocessor safe. Even though the assignment to a new reference is not guaranteed to be immediately visible to other threads, a unique object is always published to another thread via a monitor (for instance, a shared stack). The new thread can only get hold of the object by first acquiring the monitor’s lock, which forces previous stores to commit.
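To see why the hand-off is safe, here is a sketch in C++ terms (my analogy, not from the paper): the unique object reaches the other thread only through a mutex-protected mailbox, and the unlock in put happens-before the lock in take, so all prior writes to the object are visible to the receiver.

#include <memory>
#include <mutex>
#include <queue>

// Sketch of publication through a monitor: ownership of the object is
// moved into the mailbox under its lock and moved out under the same
// lock in another thread. The releasing unlock forces earlier stores
// to be visible to the acquiring thread.
template<class T>
class Mailbox {
    std::queue<std::unique_ptr<T>> q;
    std::mutex m;
public:
    void put(std::unique_ptr<T> p) {   // the sender moves the object in
        std::lock_guard<std::mutex> hold(m);
        q.push(std::move(p));
    }
    std::unique_ptr<T> take() {        // the receiver moves it out
        std::lock_guard<std::mutex> hold(m);
        if (q.empty()) return nullptr;
        std::unique_ptr<T> p = std::move(q.front());
        q.pop();
        return p;
    }
};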

-Alias control

Move semantics requires control over aliasing–a moved object may not leave any aliases, nor may it carry along any references to thread-local objects. GRFJ provides additional annotations to mark non-escaping method parameters. The syntax is !e appended to the parameter type. Here’s an example:

void display(Value<unique>!e val);
Value<unique> v5 = new Value<unique>;
display(v5);

A unique object here is passed by reference only because display guarantees that its argument won’t escape.

-Immutable objects

Immutable objects may be passed between threads by reference. In GRFJ, immutable objects are marked by another special owner type, readonly. Interestingly, the problem of the construction of immutable objects is cleverly avoided. You first create a unique object and then move it to a readonly reference.

Value<unique> v6 = new Value<unique>;
v6.x = 1;
Value<readonly> v7 = v6--;

This is a perfectly safe operation, since there is a guarantee that no writable aliases to the unique object may stay behind. The move to readonly freezes the object forever.
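The same freeze can be mimicked in today’s C++ (a sketch with my own names): build the object through a unique, writable handle, then surrender that handle to a const, freely shareable one. No writable alias can stay behind.

#include <memory>

struct Value { int x = 0; };

// Construct mutably, then freeze forever by moving the unique handle
// into a shared handle to const.
std::shared_ptr<const Value> makeFrozen() {
    std::unique_ptr<Value> v(new Value);
    v->x = 1;                                          // still writable
    return std::shared_ptr<const Value>(std::move(v)); // frozen
}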

Immutable objects can only be passed to methods that promise not to modify their arguments. This is done by appending !w to the type of the parameter. The two suffixes may be combined to form !ew (I am not kidding you!), a parameter that doesn’t escape and is not modified by the method.

-Limitations

– Some concurrent programs use multi-stage access patterns that are not expressible in GRFJ. For instance, a shared array is divided into disjoint sections and each thread operates exclusively on its section without any locking. After all threads synchronize on a barrier, they pick up different sections and continue. The ownership of sections changes with time. (In D this pattern might be implementable using array slices.)

– Dynamic downcasting, which used to be the workhorse of Java before generics, can’t verify the ownership part of the cast, because this information is not available at runtime.

– Static variables may be accessed only when the client holds the class lock. Each class with static variables must therefore have a static lock.

– The authors mention the idea of parameterizing methods that accept poly-owned arguments. This is not as easy as it sounds, since virtual functions cannot be parameterized by a potentially infinite set of types. My guess is that it is possible because detailed ownership information is only needed during type checking. Still, the compiler might have to produce several implementations of a method, depending on whether a parameter is thread-local or not (imagine the parameterized method creating a local variable of the same ownership type–sometimes this variable must contain a lock, sometimes not). That is a finite set of possibilities, though, so vtable slots could be preallocated for all of them.

– The template syntax for ownership won’t work in languages where templates already accept value parameters, and the compiler isn’t always able to distinguish between types and values.


Can a good type system prevent concurrency errors? Or is this a quest for the Holy Grail?

There are two parts to this question, corresponding to two major types of concurrency errors:

  1. Preventing data races
  2. Preventing deadlocks

I’ll start with the first one.

Data races occur only when memory is shared between threads. Disallow sharing and data races are gone! In fact there is a name for threads that don’t share memory: processes. It’s perfectly feasible to have a concurrent language that disallows sharing–Erlang is one (see my post about Erlang). The trick is to always pass data between threads by value. This is especially easy in functional languages.

Non-functional languages like C++, Java, or D (I was told by Walter Bright, the creator of D, to always use the full name, “the D programming language,” so that search engines can index it properly) tend to share data almost by default (see, however, this post).

In Java, all non-primitive types have reference semantics. When you pass a Java object between threads, you’re only passing a handle; the object behind it becomes accessible from both threads.

C++ at least has means to express passing by value and move semantics for user-defined types. Still, it’s up to the programmer to use them.

Who ordered Guava?

For every type-system idea there is at least one dialect of Java that implements it. I’ll start with an older attempt at data-race-free Java called Guava, as it illustrates some of the basic premises.

-Explicit sharing

The most important step–if we don’t want to completely ban the sharing of data–is to regulate it. Let the programmer explicitly mark the data destined for sharing as such. The corollary is that the data that is not marked for sharing cannot be shared. This can be accomplished, for instance, by making all objects thread-local by default, or by using type modifiers that prevent references to such objects from escaping.

In Guava, the data type designed for sharing is called a Monitor. As the name suggests, all access to a Monitor is automatically synchronized by the Monitor’s lock. This, incidentally, eliminates the need for the synchronized keyword, which is absent from Guava.

The non-shared data types are either Objects or Values.

Objects are just like regular Java Objects, except that they don’t have a built-in lock, since they can never be shared between threads.

Values are either primitive values, like integers, or user-defined types with value semantics.

Monitors, Objects, and Values are collectively called instances.

-Value semantics

When you pass a Value, the compiler will make a deep copy of it (well, sort of–the monitors embedded in a Value are not deep copied). Since deep copying might be expensive, Guava defines operator “move”, which nulls the source. The syntax is:

  v2 = v1--;

The value v1 becomes null after the move to v2. This is similar to C++ unique_ptr and std::move.
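For comparison, here is the C++ counterpart of v2 = v1-- (a sketch using std::unique_ptr): the move transfers ownership and leaves the source null, just like Guava’s operator.

#include <cassert>
#include <memory>
#include <utility>

int main() {
    std::unique_ptr<int> v1(new int(42));
    std::unique_ptr<int> v2 = std::move(v1); // the equivalent of v2 = v1--
    assert(!v1);                             // the source is null afterwards
}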

-Ownership

The biggest problem in lock-based concurrency is to make sure that the correct lock(s) are taken when accessing shared data. In Guava, all of a Monitor’s data are protected by that Monitor’s lock. As long as they stay inside that Monitor, nothing bad can happen to them from the point of view of concurrency.

Values stored inside a Monitor are never accessible outside of the Monitor–only their copies may be passed out.

The same is not true about Objects. Since Objects have reference semantics, there is a real danger that Objects’ references might escape the Monitor that protects them. Imagine a situation where two Monitors have references to the same Object. It is possible then that two threads may operate on that Object at the same time–one entering through one Monitor, the other through the other Monitor. We have a data race!

Therefore it is important that every Object have an owner at all times. The Object’s owner is either a Value or a Monitor. (The exception is a fresh Object that has just been allocated–it has no owner until it is assigned to a variable.) Since an Object may be owned by at most one Monitor, it is that Monitor that protects it from simultaneous (racy) access.

-Regions

All Objects that are owned by a particular Monitor or Value form a region. Equivalently, assigning a monitored region to an object specifies what lock must be held when accessing it.

All instances may contain (references to) monitors, but monitors are not “owned” by anybody. References to the same monitor may appear in multiple regions and may be freely passed around. It is thus up to programmers to define an ordering scheme for their monitors in order to avoid deadlocks.

How can we protect Objects from moving between regions and acquiring multiple owners? We need a way to control aliasing.

Here are some Guava rules for passing Objects. A method may declare its Object parameter as either kept or lent. (By default, parameters to Object methods are kept and to Monitor methods are lent.) If the parameter is kept it must belong to the same region as this, and there are no limitations on its use. If, however, the parameter is lent, the method may not store a reference to it in this, nor may it store this inside a lent Object. No cross-aliasing is allowed.

A method may also be marked new if it returns a freshly created object, which has no owner yet. Constructors are considered new unless they accept kept parameters.

Notice that you may have multiple references to the same Object, but they will all be within the same region. The only instances that may be passed between threads are Monitors and Values.

-Constructors

Guava final fields may either be initialized inside a constructor or in a private method that is only called from inside a constructor. (BTW, in D a private method is not private within a module, so the compiler would have to analyze the whole module to establish the latter condition.) [Note to self: The same scheme might be useful in the construction of immutable objects in D.]

Partially constructed Objects must not be visible outside constructors. The compiler must verify that constructors don’t pass this to other methods, and don’t store this inside other instances (no alias cross-contamination).

-Optimizations

Copying Values around may be expensive. I already mentioned one optimization, the use of the “move” operator. The other optimization is related to immutability. If a Value is immutable, it may be safely passed by reference. Guava defines immutable classes as ones that have no update methods. Any method that may modify this must be marked update. The update notation is also extended to method parameters–by default parameters are immutable.

There is a bonus advantage to separating update methods from the rest. In a Monitor, a non-update method may safely use a readers’ lock, which allows multiple readers to access data simultaneously, to increase concurrency.
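In today’s C++ the same split maps onto a readers–writer lock (a sketch, assuming C++17’s std::shared_mutex, which obviously postdates Guava): update methods take the exclusive lock, non-update methods the shared one.

#include <shared_mutex>

class Counter {
    mutable std::shared_mutex m;
    int n = 0;
public:
    void increment() {                 // an “update” method: exclusive lock
        std::unique_lock<std::shared_mutex> hold(m);
        ++n;
    }
    int value() const {                // non-update: readers’ lock
        std::shared_lock<std::shared_mutex> hold(m);
        return n;
    }
};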

-Global and local methods

A method is considered local if its effects cannot be observed or influenced by other threads. All Object and Value methods are local by default. A local method is immune to races, thus allowing single-thread optimizations.

Conversely, all Monitor methods are considered global, since operations on Monitors may be visible or influenced by other threads.

These defaults may be overridden. For instance, an Object may contain a reference to a Monitor. The methods that call this Monitor’s methods must be marked global. Moreover, Object or Value methods that access Monitors that are passed to them as arguments must also be marked global. So touching a Monitor (which is equivalent to using its lock) taints the whole callers’ tree with the global annotation.

This is similar to the way update taints the callers’ tree, except that the update annotation of a method only pertains to this, not to its arguments. However, when global and update are combined, they have the same tainting power as global. In particular, a method that invokes a global update method on its argument becomes tainted with global update.

Methods that are global update cannot be invoked from non-update methods, even if they only global-update their arguments.

Note: This part of the paper is not very clear to me. The authors never explain the importance of global update methods (other than optimization opportunities).

-Limitations

Guava implements a relatively simple and somewhat limited system to eliminate races. It punts on the problems of ownership transfer and lock-free programming. Even with those limitations, the Guava type system is not simple.

The idea that safe multithreading may be achieved with a few simple modifications to the type system seems to be untenable. However, as long as special type annotations are limited to the code that does actual multithreading, I think they’re worth the added complexity.


In my last post I criticized the C++0x futures for their lack of composability. I now feel obliged to show what I consider “the right way” to do futures.

I use the event library of Concurrent Caml (actually, its version distributed with OCaml) as my model. The ideas behind it are described in J. H. Reppy’s Ph.D. thesis, Higher-order concurrency.

Events and Futures

CML channels and events are more general than futures, but here I’ll concentrate on the subset of their functionality that can be used to implement futures. Below is a useful translation table between CML and C++. (I should add that CML channels, as opposed to C++ futures, are synchronous. In practice it doesn’t matter that much, since promise::set_value–corresponding to channel send–is usually called right before the thread exits.)

  CML                                   C++
  channel                               promise
  event                                 future
  evt = receive chan                    ftr = prom.get_future()
  sync (send chan msg)   (synchronous)  prom.set_value(msg)   (asynchronous)
  msg = sync evt                        msg = ftr.get()
  choose evt_lst                        ???
  evt’ = wrap evt fun                   ???

As you can see, C++ has no equivalents of choose and wrap. I will discuss those next.

choose

CML choose is a function that takes a list of events and creates a new composite event. When the program syncs on this event, it blocks until one of the listed events completes. The combination sync and choose is similar to Unix select or Windows WaitForMultipleObjects.

An example of the usefulness of choose is a situation where there is more than one algorithm to calculate a result, and the program runs them in parallel until the fastest one completes. Linear search, for instance, is faster for small sets than binary search. Instead of figuring out which one to use for each particular case, it might be cheaper (on a multicore) to run both and stop when the first one is done.

What would choose look like in C++? One possibility is to implement it as a container object. You would push individual futures into it and then call get_future to obtain a composite future. Let me call this object future_or since it does the equivalent of logical OR for futures (the equivalent of AND is called a thread barrier).

  unique_future<int> ftr1 = promise1.get_future();
  unique_future<int> ftr2 = promise2.get_future();
  future_or<int> orf;
  orf.push_back(std::move(ftr1));
  orf.push_back(std::move(ftr2));
  unique_future<int> compositeFut = orf.get_future();
  ...
  int result = compositeFut.get();
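One way future_or could work under the hood (a sketch of my own, using the C++11 names that eventually shipped; exception propagation is omitted for brevity): every pushed future gets a helper thread, and the first helper to obtain a result fulfills a shared promise. Later finishers find the once_flag already set and discard their results.

#include <future>
#include <memory>
#include <mutex>
#include <thread>

template<class T>
class future_or {
    std::shared_ptr<std::promise<T>> result =
        std::make_shared<std::promise<T>>();
    std::shared_ptr<std::once_flag> first =
        std::make_shared<std::once_flag>();
public:
    void push_back(std::future<T> f) {
        auto r = result;
        auto o = first;
        // Block in a helper thread, not in the client; the first result wins.
        std::thread([r, o](std::future<T> fut) {
            T v = fut.get();
            std::call_once(*o, [&] { r->set_value(std::move(v)); });
        }, std::move(f)).detach();
    }
    std::future<T> get_future() { return result->get_future(); }
};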

Another possible implementation, in C++0x, would be to define a variadic template function future_choose. It would take a variable number of futures and return the composite future. The advantage of such an approach would be to let the compiler infer the types and the number of arguments automatically.
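A possible surface for that function, built on the future_or sketch above (hypothetical–nothing like it made it into the standard):

// All arguments must be futures of the same type T; the pack is
// funneled into a future_or behind the scenes.
template<class T, class... Fs>
std::future<T> future_choose(std::future<T> f, Fs... rest) {
    future_or<T> orf;
    orf.push_back(std::move(f));
    int expand[] = { 0, (orf.push_back(std::move(rest)), 0)... };
    (void)expand;  // silence unused-variable warnings
    return orf.get_future();
}

With this in place, the composite future from the previous example becomes a one-liner: auto compositeFut = future_choose(std::move(ftr1), std::move(ftr2));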

You probably noticed that future_or can only accept futures of one type (the same would be true about the function future_choose). For instance, to create a composite future that returns a string upon get, you must build it from other futures that also return a string (and not an int, for instance).

In principle, heterogeneous futures could be combined if they all returned their different result types wrapped in one common variant. Then, after a composite of variant-returning futures returned a variant, program logic would branch on the actual type inside the variant. This approach bypasses the static type system and is error prone. CML provides a more elegant solution using wrap.

wrap

The trick is to combine (wrap) a particular future with a function that processes its output. A future that returns an int is combined with a function that takes an int, etc. The result of this combination is a new future usually returning void.

When a wrapped future is synchronized (e.g., by calling wait), it executes the associated function, passing it the result returned by the original future. Just look at this example:

  void handler(Foo const & foo);
  unique_future<Foo> fooFuture = prom.get_future();
  // combine the future with a handler function
  unique_future<void> wrappedFuture = wrap_future(fooFuture, &handler);
  ...
  wrappedFuture.wait();
  // at this point fooFuture has returned a Foo object 
  // and the handler has been executed with that object.

There’s one thing worth mentioning: the compiler will ensure type correctness of the arguments to wrap_future–the return type of the future must be compatible with the argument type of the function. Mismatch errors will be discovered at compile time. Compare this with the implementation using variants, which does not have this property.
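For the curious, wrap_future itself needs only a few lines (my guess at one naive implementation, again in C++11 terms): a helper thread waits for the original future, feeds its value to the handler, and then signals the void future. A production version would avoid the extra thread and propagate exceptions.

#include <future>
#include <thread>
#include <utility>

template<class T, class F>
std::future<void> wrap_future(std::future<T> fut, F handler) {
    std::promise<void> done;
    std::future<void> wrapped = done.get_future();
    std::thread([](std::future<T> f, F h, std::promise<void> d) {
        h(f.get());     // type checked: F must be callable with a T
        d.set_value();  // the wrapped future becomes ready
    }, std::move(fut), std::move(handler), std::move(done)).detach();
    return wrapped;
}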

Of course the usefulness of wrap really comes into play when you try to combine heterogeneous futures using future_choose. You just wrap each future with its own post-processing function (or callable object) and pass them all to future_choose. Wrapped futures are homogeneous–they all return void.

Here’s an example that uses C++0x lambdas (one of them being a closure):

  unique_future<Foo> fooFuture = prom1.get_future();
  unique_future<int> intFuture = prom2.get_future();
  int sum = 0;
  unique_future<void> composite = future_choose(
      wrap_future(fooFuture, [](Foo const & foo){ foo.Commit(); }),
      wrap_future(intFuture, [&sum](int i){ sum += i; }));
  ...
  composite.wait();

Wrapped futures would be useful, e.g., in implementing a server that listens to several communication channels.

What about Haskell?

As I described in my earlier post, unlike CML, Haskell implements asynchronous message passing. The original Concurrent Haskell paper argued against choose. Later, however, Avik Chaudhuri showed that the CML event library can be implemented on top of asynchronous MVars. So there are no theoretical obstacles to an asynchronous implementation of futures.

Final remarks

The equivalent of choose called future_select has been proposed for Boost and C++0x, but it was turned down.

Interestingly enough, Microsoft’s Task Parallel Library for .NET implements higher-order concurrency. They have ContinueWith for wrapping futures and WaitAny for choice. I can only assume that it’s easier to implement those on a virtual machine.

I also wonder whether it would be more productive to implement the equivalent of the CML channels and events in C++, and treat futures as a special case.


Bibliography

  1. J. H. Reppy. Higher-Order Concurrency. PhD thesis, Cornell University, 1992. Technical Report 92-1852.
  2. Avik Chaudhuri. A Concurrent ML Library in Concurrent Haskell.

I am frankly disappointed with C++0x futures. They lack a major feature–composability. It’s like having Booleans without OR or AND.

But let me first explain what futures are and how they’re implemented in C++0x.

What is a future?

In a nutshell, a future is an asynchronous function call. You call a function, but don’t wait until it completes–the body of the function executes in a separate thread. Instead you get a future, a promise to deliver the result at some later point in time. Only when your program desperately needs that value do you synchronize on the future. If the calculation hasn’t been completed by then, you block until it has. Otherwise you get the result of the calculation.

C++0x futures

C++ splits the implementation of futures into a set of small blocks–it almost creates an assembly language of futures. Here are the primitives.

-promise

A promise is a vehicle for passing the return value (or an exception) from the thread executing a function to the thread that cashes in on the function’s future. The function (or callable object) that is executed in a separate thread must have access to the promise, and it explicitly sets the return value.

You can think of a promise as a primitive channel; the function in this case is the sender. The equivalent of the “send” method is called promise::set_value (to pass an exception to the caller, the callee calls promise::set_exception).

There is no corresponding “receive” method–the receiving is abstracted into the future object.

Here’s some code that illustrates the use of promise. Notice how the function to be called asynchronously must be aware of the promise.

void asyncFun(promise<int> intPromise)
{
    int result;
    try {
        // calculate the result
        intPromise.set_value(result);
    } catch (MyException const & e) {
        intPromise.set_exception(std::copy_exception(e));
    }
}

The calling thread creates the promise and passes it to the worker thread executing asyncFun (I’ll show the details later).

-future

A future is the synchronization object constructed around the receiving end of the promise channel. The calling thread obtains a future using promise::get_future.

The most important method of the future is get. If the result is not ready, get will block. When get completes, it either returns a value or throws an exception. The return value or the exception has been set by the called function through the promise associated with the future (see above).

get can also be split into its more basic parts: wait, optionally followed by has_value (or has_exception), and then the call to get, which at that point is guaranteed not to block. The advantage of this approach is that one can use the timed versions of wait–wait_for and wait_until–which accept timeouts.
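For instance (a sketch using the names that eventually shipped in C++11), a caller can give up after a timeout instead of blocking indefinitely:

#include <chrono>
#include <future>

int getOrDefault(std::future<int>& f, int dflt) {
    if (f.wait_for(std::chrono::milliseconds(100)) ==
            std::future_status::ready)
        return f.get();  // guaranteed not to block at this point
    return dflt;         // not ready yet; the caller may retry later
}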

There is also a non-blocking method, is_ready, that can be used for polling rather than blocking.

There are two separate implementations of future: the regular future (which used to be called unique_future in the draft standard), which works like a unique_ptr as far as passing values goes, and shared_future, which works more like shared_ptr. For instance, the method get of future cannot be called twice, because it transfers the ownership of the result.

Here’s an example of how a future could be used in the caller’s code:

promise<int> intPromise;
future<int> intFuture = intPromise.get_future();
std::thread t(asyncFun, std::move(intPromise));
// do some other stuff
int result = intFuture.get(); // may throw MyException

-packaged_task

In most cases the use of promises can be hidden from the programmer. There is a template class, packaged_task, that takes any function (or callable object) and instruments it for use with futures. In essence, it creates a promise and calls the function from inside a try/catch block. If the function returns normally, packaged_task stores the value in the promise; if it throws, it stores the exception. A packaged_task is itself a callable object (it has the function call operator() defined) and can be passed directly to a thread.
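In spirit, packaged_task boils down to something like this (a deliberately naive sketch, restricted to plain function pointers–the real template handles arbitrary callables and signatures):

#include <future>
#include <utility>

template<class R, class A>
class naive_task {
    R (*fn)(A);
    std::promise<R> prom;
public:
    explicit naive_task(R (*f)(A)) : fn(f) {}
    std::future<R> get_future() { return prom.get_future(); }
    void operator()(A arg) {
        try {
            prom.set_value(fn(std::move(arg)));           // normal return
        } catch (...) {
            prom.set_exception(std::current_exception()); // or the exception
        }
    }
};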

Here’s a more complete example:

#include <future>
#include <thread>
#include <vector>
using namespace std;

vector<int> primesUpTo(int n);

int main() {
    packaged_task<vector<int>(int)> task(&primesUpTo);
    future<vector<int>> fut = task.get_future();
    thread t(move(task), 100);
    t.detach();
    // do some other work while the primes are computing
    vector<int> primes = fut.get();
    // print primes
    return 0;
}

Composability

So far so good. The creation of futures might be a bit verbose in C++, but it can be easily adapted to concrete environments. The futures standard library should be treated as a set of primitive building blocks from which to build higher abstractions.

Notice how C++ separates the channel (promise) from the synchronization mechanism (future), and thread creation from communication and synchronization. You may, for instance, use futures with thread pools or create threads on the spot.

What was omitted from the standard, however, was the ability to compose futures. Suppose you start several threads to perform calculations or retrieve data in parallel. You want to communicate with those threads using futures. And here’s the problem: you may block on any individual future but not on all of them. While you are blocked on one, other futures might become ready. Instead of spending your precious time servicing those other futures, you might be blocked on the most time-consuming one. The only option to process futures in order of completion is by polling (calling is_ready on consecutive futures in a loop) and burning the processor.
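Here is what that busy-waiting looks like in practice (a sketch; is_ready from the draft corresponds to a zero-timeout wait_for in the final standard):

#include <chrono>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Spin over all futures until one of them is ready–burning CPU all along.
std::size_t firstReady(std::vector<std::future<int>>& futs) {
    for (;;) {
        for (std::size_t i = 0; i < futs.size(); ++i)
            if (futs[i].wait_for(std::chrono::seconds(0)) ==
                    std::future_status::ready)
                return i;
        std::this_thread::yield();  // still polling, just a bit less hot
    }
}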

The need for composability of synchronization events has been widely recognized. Windows has WaitForMultipleObjects, Unix has select, and several higher-level languages have explicit constructs that serve the same purpose. But that’s a topic for my next installment.

Acknowledgments

I’d like to thank Anthony Williams for helpful comments. Among other things, he is the author of one of the futures proposals for Boost. A stripped down version of it will become part of the threadpool library. Unfortunately, the stripping got rid of composability features.