May 2011

I’m experimenting with new media so I prepared a one hour webinar on concurrency. You can’t really say much in an hour, so I’ll just give a broad overview of the domain without going into too much detail. Join me this Tuesday May 24, 2011 at 12pm PDT. The webinar is free, courtesy my employer, Corensic. You can preview the slides and, if you’re interested, register on the same page (you don’t have to fill out all the details).

I only went to one talk, not because the rest was not interesting, quite the contrary, but because I worked with Joel and Hartmut on rewriting Proto. I think we essentially got it. We have the exrpession monad implemented, my “compile” function turned out to be the equivalent of Proto transform, but with much more flexibility, expression extension produced a little Lambda EDSL with placeholders for arguments and even const terminals. It works like a charm. If you don’t know what I’m talking about, I promise to finish my blog series on monads in C++ real soon now.

The talk I went to was Chris Kohlhoff talking more about Asio, the asynchronous IO library. He was showing how the new features of C++11 make his code simpler, safer, and more flexible without too much effort. In particular he found move semantics extremely helpful in reducing (or, in some cases, eliminating) the need for memory allocation in steady state–an important property when running in an embedded system, for instance. But what I liked most was his approach to solving the inversion of control problem by implementing his own coroutines. Sure, he had to abuse C++ macros, but the code was actually much more readable and reflected the way we think about asynchronous calls.

The idea is that, with coroutines, you write your code in a linear way. You say “read socket asynchronously” and then yield. The flow of control exits the coroutine in the middle, and continues with other tasks. The trick is that the rest of the coroutine becomes your completion handler. When the async call completes, the coroutine is called again, but it starts executing right after the last yield. Your flow of control moves back and forth, but your code looks linear, instead of being fragmented into a bunch of handlers. It makes you wish coroutines were part of the language, as they are, for instance, in C#.

By the way, I caught Hans Boehm while he was waiting for the airport shuttle and asked questions about memory_order_relaxed. You know, the problem is, Can a relaxed load fetch an “out of thin air” value–a value that has never been written by anybody else? What I’m getting now is that in practice this will never happen, but it’s very hard to describe this requirement formally. In other words, Can a malicious chip manufacturer in cahoots with a compiler writer come up with a system that fulfills the letter of the C++ memory model and lets you fetch an out-of-thin-air value? I think the answer is yes, because the language of the Standard is purposely vague on this topic:

(29.3.11) [ Note: The requirements do allow r1 == r2 == 42 in the following example, with x and y initially zero:

// Thread 1:
r1 = x.load(memory_order_relaxed);
if (r1 == 42), memory_order_relaxed);
// Thread 2:
r2 = y.load(memory_order_relaxed);
if (r2 == 42), memory_order_relaxed);

However, implementations should not allow such behavior.—end note ]

Hans Boehm gave a keynote address about C++11’s support for concurrency. It was a nice overview of major features and, of course, the most interesting topic, atomics and weak atomics. The official story is that if you use locks and strong atomics, you get the DRF guarantee: If the program has no data races, it will behave in a sequentially consistent manner. How do you prove that you have no data races? You enumerate all possible interleavings, and if you can’t find one where two conflicting memory accesses happen next to each other, you’re golden. That’s more or less what Java memory model guarantees (and what Posix tried to standardize). However C++ offers the programmer a way to relax sequential consistency constraints without introducing data races. Now, if you spin it this way, it sounds like a really cool thing. Hey, look, my program is data-race free! And, get this, I don’t have to suffer sequential consistency! The natural question is, what does it buy me that the C++ Standard doesn’t treat “memory_order_relaxed” accesses as data races? I would like to hear that programs with weak atomics have well defined semantics, even if the semantics are so complex that proofs of correctness of even the simplest algorithms are non-existent. But as far as I know this is not “really” true (maybe “sort of” true?). I tried to get straight answers from Hans, but he chooses his words very carefuly, like a UN diplomat. I’ll see him again at the HotPar and I’lll press him some more.

Hans’s talk was followed by Tony Van Eerd’s presentation on lock-free programming. I liked Tony’s attitude, which was “Use Locks!” Indeed, you should look at lock-free algorithms as a last resort. He showed a few examples that were hair-raising. Even the simplest lock-free linked list is a challenge. It’s really hard to spot danger areas, like the ABA problem when the node you’re pointing at gets deallocated and reallocated when you’re not looking. Your CAS succeeds, because the addresses match, but your update ends up in the great bucket in the sky. The lock-free circular queue of integers with only one thread pushing and one thread popping turned out to be a mine field. Tony claimed that it should work with weak, relaxed memory order, atomics. But, of course, no formal proof is on the horizon. I stared at the code for minutes and it sort of made sense to me, but who knows? Hans stared at it some more and tentatively murmured that it’s probably okay. The bottom line: This is some really scary stuff.

Then I spent half a day with Hartmut and Joel: Me trying to understand Proto and they trying to understand monads. I think we’ve learned a lot from each other and the new formulation of Proto using monads is getting closer and closer. We have sort of nailed the definition of a monadic “function” in C++. I think we should call these things “hybrid” monads because they blend compile-time and runtime aspects of C++. Fascinating stuff!

I’m totally exhausted, so I won’t write much. I spoke to the audience practically non-stop for three hours, and then I was rewriting Proto with Joel Falcou and Hartmut Kaiser over beers till late at night. My talk about the connection between Haskell monads and Boost Proto was very well received–beyond my expectations. It looks like there is some real need to put theoretical foundations under the more advanced areas of template metaprogramming; and monads just fit the bill. There was a monad lurking in Proto all this time– the only thing missing was “bind”.

I’m looking forward to tomorrow’s keynote by Hans Boehm, and the continued hacking of Proto with Joel and Hartmut.

Aspen, Monday, May 15: These are just quick notes, and I apologize if they might not very coherent. I have little time to jot them down because the day was filled with talks and discussions. There are two main topics people are excited about: concurrency and template metaprogramming.

In the morning I went to Christopher Kohlhoff talk about Boost.Asio. Asio stands for Asynchronous IO. The problem is how to structure your program to deal with a lot of asynchronous calls. There are two main difficulties: resource management and inversion of control. An async-driven program is essentially a collection of disjoint callbacks. When control jumps from one callback to another, you can easily lose track of who is the owner of which resources, how to get hold of them, and how to reason about the whole system. Chris showed how to deal with resources, essentially by using shared, reference-counted, pointers. When you’re making an asynchronous call you usually have to make sure that the completion handler shares some resources with the caller. In Chris’s approach these resources are encapsulated in a Connection object and the completion handlers are methods of that object. So when a handler is called, it has access to the “this” pointer and can share data with the caller. A handler must take part in the reference counting of the Connection object, so that the object doesn’t suddenly disappear. Considering that a lot of event-driven code I’ve seen over the years used global shared data to store state, this is progress. The inversion of control is still a problem though.

Hartmut Kaiser talked about the Phoenix library, which essentially implements C++ inside C++. It is built on top of Proto (which is a big hit at the conference–more about it later). From what I gathered, Phoenix is mostly a better lambda. You can write anonymous functions in Phoenix using a very stylized C++: for instance, instead of braces, you use brackets, instead of semicolons, commas, etc. One advantage of Phoenix functions is that they can be statically polymorphic. Unfortunately the main example didn’t require polymorphism, and in fact would be much easier on the eyes if it were written using C++ lambdas. The other, more important advantage of Proto is that it’s highly customizable. Hartmut showed an example of how to extend C++ syntax to support parallelism. Behind the scenes, Phoenix took advantage of OpenMP, which is a system of ugly pragmas supported by many compilers to create parallel loops and other concurrent constructs.

An then there was a Proto marathon by Joel Falcou, after which I had a long discussion with him over dinner. Proto is a metaprogramming tour de force. If I described it as a library for constructing embedded domain-specific languages in C++, I wouldn’t give it justice. It’s a system that tricks the C++ compiler into parsing expressions into full-blown compile-time abstract syntax trees, which are at the same time function objects that can be executed at runtime. If this is not impressive enough, Proto provides multiple customization mechanism that allow you to plug in new constructs, give them specific semantics, and even rewrite the ASTs. Joel gave an example of an EDSL for expressing analytical functions, which could analytically calculate derivatives of functions at compile time. Joel is coming to my talk tomorrow and I hope he will be able to explain to me what I’m doing. My talk is essentially about how to get to grips with Proto by using Haskell monads. We’ll see how it goes.