In my last post I criticized the C++0x implementation of futures for their lack of composability. I now feel obliged to show what I consider “the right way” to do futures.
I use the event library of Concurrent Caml (actually, its version distributed with OCaml) as my model. The ideas behind it are described in J. H. Reppy’s Ph.D. thesis, Higher-order concurrency.
Events and Futures
CML channels and events are more general than futures, but here I’ll concentrate on the subset of their functionality that can be used to implement futures. Below is a useful translation table between CML and C++. (I should add that CML channels, as opposed to C++ futures, are synchronous. In practice it doesn’t matter that much, since promise::set_value (corresponding to channel send) is usually called right before the thread exits.)
| CML | C++ |
|---|---|
| channel | promise |
| event | future |
| evt = receive chan | ftr = prom.get_future() |
| sync (send chan msg) (synchronous) | prom.set_value(msg) (asynchronous) |
| msg = sync evt | msg = ftr.get() |
| choose evt_lst | ??? |
| evt’ = wrap evt fun | ??? |
As you can see, C++ has no equivalents of choose and wrap. I will discuss those next.
choose
CML choose is a function that takes a list of events and creates a new composite event. When the program syncs on this event, it blocks until one of the listed events completes. The combination sync and choose is similar to Unix select or Windows WaitForMultipleObjects.
An example of the usefulness of choose is a situation when there is more than one algorithm to calculate a result, and the program runs them in parallel until the fastest one completes. Linear search, for instance, is faster for small sets than binary search. Instead of figuring out which one to use for each particular case, it might be cheaper (on a multicore) to run both and stop when the first one is done.
What would choose look like in C++? One possibility is to implement it as a container object. You would push individual futures into it and then call get_future to obtain a composite future. Let me call this object future_or since it does the equivalent of logical OR for futures (the equivalent of AND is called a thread barrier).
```cpp
unique_future<int> ftr1 = promise1.get_future();
unique_future<int> ftr2 = promise2.get_future();

future_or<int> orf;
orf.push_back(std::move(ftr1));
orf.push_back(std::move(ftr2));

unique_future<int> compositeFut = orf.get_future();
...
int result = compositeFut.get();
```
Another possible implementation, in C++0x, would be to define a variadic template function future_choose. It would take a variable number of futures and return the composite future. The advantage of such an approach would be to let the compiler infer the types and the number of arguments automatically.
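To make the idea concrete, here is a rough sketch of what such a future_choose might look like with today’s std::future. The name and the polling strategy are my assumptions: std::future offers no native when_any, so this version simply polls each future with a zero timeout. For simplicity it takes a vector rather than a variadic pack:

```cpp
#include <chrono>
#include <future>
#include <thread>
#include <vector>

// Hypothetical future_choose: block until the first of several futures
// becomes ready and return its value. Lacking a when_any primitive,
// we poll each future with a zero timeout and sleep briefly between rounds.
template <typename T>
T future_choose(std::vector<std::future<T>>& futures) {
    for (;;) {
        for (auto& f : futures) {
            if (f.valid() &&
                f.wait_for(std::chrono::milliseconds(0)) ==
                    std::future_status::ready)
                return f.get();
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}
```

Note that all the futures passed in must be of the same type T, which is exactly the limitation discussed next.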
You probably noticed that future_or can only accept futures of one type (the same would be true about the function future_choose). For instance, to create a composite future that returns a string upon get, you must build it from other futures that also return a string (and not an int, for instance).
In principle, heterogeneous futures could be combined if they all returned different types through the same variant. Then, after a composite of variant-returning futures returns a variant, program logic would fork depending on the actual type inside the variant. This approach bypasses the static type system and is error prone. CML provides a more elegant solution using wrap.
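For contrast, here is the variant workaround sketched with std::variant (which postdates this post; boost::variant would play the same role). Every computation is forced into one Result type, and the consumer forks on the run-time tag:

```cpp
#include <future>
#include <string>
#include <variant>

// All heterogeneous futures must be squeezed into one variant type.
using Result = std::variant<int, std::string>;

// The consumer dispatches on the run-time tag -- a mismatch the static
// type system would otherwise have caught is now a run-time concern.
std::string consume(std::future<Result>& f) {
    Result r = f.get();
    if (std::holds_alternative<int>(r))
        return "int: " + std::to_string(std::get<int>(r));
    return "string: " + std::get<std::string>(r);
}
```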
wrap
The trick is to combine (wrap) a particular future with a function that processes its output. A future that returns an int is combined with a function that takes an int, etc. The result of this combination is a new future usually returning void.
When a wrapped future is synchronized (e.g., by calling wait), it executes the associated function, passing it the result returned by the original future. Just look at this example:
```cpp
void handler(Foo const & foo);

unique_future<Foo> fooFuture = prom.get_future();
// combine the future with a handler function
unique_future<void> wrappedFuture = wrap_future(fooFuture, &handler);
...
wrappedFuture.wait();
// at this point fooFuture has returned a Foo object
// and the handler has been executed with that object
```
There’s one thing worth mentioning: the compiler will ensure type correctness of the arguments to wrap_future. The return type of the future must be compatible with the argument type of the function, so mismatch errors are discovered at compile time. Compare this with the implementation using variants, which does not have this property.
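As a sketch, wrap_future (a hypothetical name, matching the example above) can be expressed with today’s std::future and a deferred std::async; the init-capture requires C++14. The returned future<void>, when waited on, blocks on the original future and feeds its result to the handler:

```cpp
#include <future>
#include <utility>

// Sketch of wrap_future: move the original future into a deferred task
// that calls the handler with the future's value. Waiting on the returned
// future<void> forces the deferred task to run.
template <typename T, typename F>
std::future<void> wrap_future(std::future<T>& fut, F fn) {
    return std::async(std::launch::deferred,
                      [f = std::move(fut), fn]() mutable { fn(f.get()); });
}
```

The compile-time check mentioned above falls out for free: fn(f.get()) compiles only if the handler accepts the future’s value type.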
Of course the usefulness of wrap really comes into play when you try to combine heterogeneous futures using future_choose. You just wrap each future with its own post-processing function (or callable object) and pass them all to future_choose. Wrapped futures are homogeneous–they all return void.
Here’s an example that uses C++0x lambdas (one of them being a closure):
```cpp
unique_future<Foo> fooFuture = prom1.get_future();
unique_future<int> intFuture = prom2.get_future();

int sum = 0;
unique_future<void> composite = future_choose(
    wrap_future(fooFuture, [](Foo const & foo){ foo.Commit(); }),
    wrap_future(intFuture, [&sum](int i){ sum += i; }));
...
composite.wait();
```
Wrapped futures would be useful, e.g., in implementing a server that listens to several communication channels.
What about Haskell?
As I described in my earlier post, unlike CML, Haskell implements asynchronous message passing. The original Concurrent Haskell paper argued against choose. Later however Avik Chaudhuri was able to show that the CML event library can be implemented on top of asynchronous MVars. So there are no theoretical obstacles to asynchronous implementation of futures.
Final remarks
The equivalent of choose called future_select has been proposed for Boost and C++0x, but it was turned down.
Interestingly enough, Microsoft’s Task Parallel Library for .NET implements higher-order concurrency. They have ContinueWith for wrapping futures and WaitAny for choice. I can only assume that it’s easier to implement those on a virtual machine.
I also wonder whether it would be more productive to implement the equivalent of the CML channels and events in C++, and treat futures as a special case.
Bibliography
- J. H. Reppy. Higher-order concurrency. PhD thesis, Cornell University, 1992. Technical Report 92-1852.
- Avik Chaudhuri. A Concurrent ML Library in Concurrent Haskell.
March 10, 2009 at 9:52 pm
Thank you for the article. I hadn’t considered the use of selects/guards with futures, aside from getting them for free if using a CSP-based framework. However, I don’t think futures should be a special case of a CSP or pi-calculus framework (by the way, I’m not sure why you didn’t choose to do this in one of the CSP libraries for a C-like language). Yes, it’s fast to implement, but futures can and should be featherweight.

Consider x = ftr1.get() + ftr2.get(). You suggested in your last post that waiting for both to complete in parallel (i.e. join(ftr1, ftr2)) is more efficient than waiting serially for one and then the other. However, work stealing is both a more efficient and lower-latency solution for the common case where either ftr1 or ftr2 has yet to begin evaluation. A work-yield is a solution (though I’m not sure it’s the best one) for when both ftr1 and ftr2 are currently being evaluated. Both of these alternatives are impossible with CSP-based futures.

Also, futures should share the same backend as auto-parallelized pure (a.k.a. functional-language) function calls or actor-based programming (at least at the local, non-networked level). Essentially, a channel is designed to be reused and a future is only used once, and so can be optimized (i.e. with a task-based framework).
Also, on that note, alts/alternative/selects/chooses are a more general concept than a simple future: they tend to be evaluated multiple times with various sampling rules (such as round robin) and can take other arguments besides futures/channels like bools. Guards/wraps are similar. Separating these from futures is both more expressive and avoids the need for polymorphic futures.
March 11, 2009 at 1:32 am
“Instead of figuring out which one to use for each particular case, it might be cheaper (on a multicore) to run both and stop when the first one is done.”
Yes, one of the reasons IT will never become “green” 😉
Speculative execution probably has some benefits (current hardware uses it, after all), but running it on the scale you described is doubtful. There is a difference between, say, speculative bit-stream decompression in a specialized HW unit and burning a whole second CPU core on a speculative computation. Can’t we find a better example here, perhaps one which really needs all the results eventually but can continue as soon as a single one is available?
March 11, 2009 at 1:20 pm
Re Joerg:
The classic use case for selecting on channels are GUIs (e.g. Plan9) and other event handlers.
Here’s another one (inspired by a problem my friend ran into the other day). Some file systems have very poor support for multi-threaded file writing. So one might create a bunch of futures to do the work and wraps to output the results to a file. Selecting on the set of wrapped futures multiple times would allow the results to be written as they came in, instead of after the slowest future completed. (This would require wraps that execute only once and re-usable selects, both of which are present in CSP.)
March 12, 2009 at 5:24 pm
[…] Futures done right In my last post I criticized the C++0x implementation of futures for their lack of composability. I now feel obliged to […]
March 12, 2009 at 10:49 pm
Even so, in C++0x I haven’t seen support for verbatim and raw strings, which are useful in cases like regexes and get/set methods.
March 17, 2009 at 11:01 am
My main interest is what set of primitives should be implemented in a compiled (as opposed to interpreted) language. A lot of concurrency idioms work well in interpreted languages in which the virtual machine implements cheap threads, message passing, or choice primitives. Another requirement is efficient use of multicores. This used to be the Achilles heel of Haskell compilers, but some recent developments put Multicore Haskell in the spotlight.
February 27, 2010 at 5:34 pm
This “wrap” is actually nothing other than a “binder” in C++. It’s a pity that futures weren’t treated just like functions, that is, instead of having a ‘get()’ method, they could return the value via operator(). In that case “wrap” could be implemented by “tr1::bind”.
Actually, it would be still possible by just doing something like:
```cpp
auto ift = [&future]() -> int { return atoi( future.get().c_str() ); };
```
May 21, 2010 at 2:13 pm
[…] After studying existing solutions in C++ (Boost, MPI) and in other languages (Haskell MVars and Caml channels and events) I created my own C++ mini-library. It’s general (no restrictions on message types or message […]
May 11, 2012 at 12:48 pm
[…] Futures, in which I lamented the lack of composability of futures. I followed it with another blog, Futures Done Right, proposing a solution. So I was very happy to learn about a new proposal to fix C++ futures. The […]
May 14, 2012 at 8:40 am
Most of Twitter’s Java and Scala services are programmed using composable Futures: https://github.com/twitter/util/blob/master/util-core/src/main/scala/com/twitter/util/Future.scala
Composable futures, very similar to Twitter’s, will also make it into the next version of the Scala language itself: http://docs.scala-lang.org/sips/pending/futures-promises.html
November 21, 2013 at 11:31 am
“I use the event library of Concurrent Caml (actually, its version distributed with OCaml) as my model.”
There’s an event library in the OCaml standard library? Which one? Or are you referring to Lwt or Async (distributed separately)?
February 26, 2014 at 11:59 am
[…] in: Broken Promises — C++0x futures. I also made a few suggestions as how to fix it: Futures Done Right. Five years went by and, lo and behold, a proposal to improve std::future and related API, N3721, […]
May 30, 2014 at 5:46 am
“There’s an event library in the OCaml standard library? Which one? Or are you referring to Lwt or Async (distributed separately)”
See: http://caml.inria.fr/pub/docs/manual-ocaml/libref/Event.html