See my new blog, Asynchronous API in C++ and the Continuation Monad. It explains why functional programming ideas are important in designing imperative languages.
May 11, 2012
The Future of C++ Concurrency and Parallelism
Posted by Bartosz Milewski under C++, Concurrency, Multicore, Multithreading, Parallelism, Programming, Software Transactional Memory | [16] Comments
It was my first experience working with the C++ Standardization Committee in a subgroup dedicated to concurrency and parallelism. I won’t bore you with details — they will be available at the committee web site. I’ll share my overall impressions and then focus on specific areas where I have strong opinions.
Being an outsider, I considered the C++ Standard the ultimate word. If I had problems interpreting the letter of the Standard, I would ask one of the committee members for an interpretation and assume that I would get the same answer from any of them. Reality turned out to be more complex than that. The C++ Standard is full of controversial topics. Some of those controversies could not be resolved in time, so the wording of the Standard is often intentionally vague. Some features were not ready for inclusion, so little stubs were inserted into the document that sometimes don't make much sense in isolation.
One such example is the intentional vagueness of, and the lack of a definition for, the thread of execution. Not only is a thread undefined, but some of the semantics are expressed using "as if" language. In particular, the thingie started by std::async is supposed to behave "as if" it were run in a separate thread of execution (whatever that means). At some point I had a long email exchange about it with Anthony Williams and Hans Boehm that resulted in a blog post. I thought things were settled until I was alerted to the fact that Microsoft's interpretation of the Standard was slightly different, and their "as if" didn't include thread_local variables, at least not in the beta of the new Visual C++.
Here's the problem: std::async was introduced in the Standard as a compromise between the idea that it's just syntactic sugar over std::thread creation and the idea that it's an opening for task-based parallelism. In fact, when I first tried std::async using Anthony Williams' Just Thread library, I expected it to run on a thread pool complete with work stealing and thread reuse. Not so, argued Anthony and Hans, pointing, among other things, to the problem of managing thread-local variables: are they supposed to be local with respect to the underlying OS thread, or to the smaller units of execution, the tasks? If multiple tasks reuse the same thread, should they see fresh versions of thread_local variables? When should thread-local variables be destroyed if the lifetime of pool threads is theoretically infinite?
Now, Microsoft has its implementation of task-based concurrency in the form of PPL (Parallel Pattern Library). Intel has TBB (Threading Building Blocks), which is a superset of PPL and also runs on Linux. I can understand the eagerness of those companies to bend the (intentionally vague) rules and make these libraries accessible through std::async, especially if they can dramatically improve performance.
I’d be the first to vote for this proposal, except for a few unsolved problems.
First of all, Microsoft wanted to change the semantics of std::async when called with launch_policy::async. I think this was pretty much ruled out in the ensuing discussion. The pure async case should be indistinguishable from the direct creation of a std::thread. Any attempt at using a thread pool behind the scenes could result in deadlocks. Essentially, the programmer must have a guarantee that all the tasks will be allowed to run in parallel, no matter how many there are. Just imagine a bunch of tasks trying to communicate with each other back and forth. If thread creation is throttled down after N of them start, and those N block waiting for responses from the rest, they might block forever. Thread pools usually have the ability to create new threads on demand, but it's never obvious when a new thread must be created. Even if the pool could detect all the threads that are blocked, it couldn't detect those that are busy-spinning. This is why std::async with launch_policy::async must always create, or at least immediately steal, a thread.
The situation is different with std::async called with the default launch policy (the bitwise OR of launch_policy::async and launch_policy::deferred). In that case the runtime does not guarantee that all tasks will be able to run in parallel. In fact, the programmer must be prepared for the possibility that all tasks run serially in the context of the parent thread (more specifically, in the context of the thread that calls future::get). Here the problem with using a thread pool is different. It has to do with the lifetimes of thread_local variables that I mentioned before. This is a serious problem, and the semantics defined by the current Standard are far from natural. As it stands, a task created using the default launch policy must either run on a completely new thread, in which case that thread defines the lifetimes of thread_local variables; or it must be deferred, in which case it shares thread_local variables with its parent (again, strictly speaking, with the caller of future::get, if the future is passed to a different thread). This behavior might seem confusing, but at least it's well defined.
Here’s how Herb Sutter proposed to solve the problem of making tasks run in a thread pool: Disallow non-POD thread_locals altogether. The argument was that nobody has implemented non-POD thread locals anyway, so nobody will suffer. Anthony Williams’ and Boost implementations were dismissed as library-based.
This seems to me like a violation of the spirit of C++, but there is a precedent for it: atomic variables. You can declare a POD (Plain Old Data, including simple structs) as atomic and, if it fits inside a hardware supported atomic word, it will become a lock-free atomic; otherwise a lock will be provided free of charge (well, you’ll pay for it with performance, but that’s a different story). But you can’t define a non-POD as atomic!
A quick straw poll showed that the subcommittee was equally split between those who were willing to discuss this change and those who weren’t. It seems though that Microsoft will go ahead with its PPL implementation ignoring the problems with thread_local (and also with DLL_THREAD_DETACH handlers I mentioned in my blog). So you might want to restrict the use of non-POD thread-local variables for the time being.
This discussion had a larger context: The proposal to introduce thread pools into the language/library as first class objects. Google’s Jeffrey Yaskin described their Executor library, which combines thread pools with work-stealing queues and schedulers. PPL has a similar construct called task group. In this new context, std::async would only provide an interface to a global default thread-pool/executor/task-group. The introduction of first-class thread pools would take away the pressure to modify the semantics of std::async. If you cared about the way your tasks are scheduled, you could spawn them using a dedicated thread-pool object. Having an explicit object representing a set of tasks would also allow collective operations such as wait-for-all or cancel.
Which brings me to another topic: composable futures. I wrote a blog post some time ago, Broken Promises: C++0x Futures, in which I lamented the lack of composability of futures. I followed it with another blog, Futures Done Right, proposing a solution. So I was very happy to learn about a new proposal to fix C++ futures. The proposal came from an unexpected source — C#.
The newest addition to C# is support for asynchronous interfaces (somewhat similar to Boost::ASIO). This is a hot topic at Microsoft because the new Windows 8 runtime is based on asynchronous API — any call that might take more than 50ms is implemented as an asynchronous API. Of course you can program to asynchronous API by writing completion handlers, but it’s a very tedious and error-prone method. Microsoft’s Mads Torgersen described how C# offers several layers of support for asynchronous programming.
But what caught my interest was how C# deals with composition of futures (they call them task objects). They have the analog of an aggregate join called WhenAll and an equivalent of “select” called WhenAny. However these combinators do not block; instead they return new futures. There is another important combinator, ContinueWith. You give it a function (usually a lambda) that will be called when the task completes. And again, ContinueWith doesn’t block — it returns another future, which may be composed with other futures, and so on. This is exactly what makes C# futures composable and, hopefully, C++ will adopt a similar approach.
Of course there is much more to the async proposal, and I wish I had more time to talk about it; but the composable integration of asynchronicity with task-based concurrency is in my eyes a perfect example of thoughtful design.
I noticed that there seems to be a problem with C++'s aversion to generalizations (I might be slightly biased, having studied Haskell with its love for generalizations). Problems are often treated in isolation, and specific solutions are provided for each, sometimes without a serious attempt at generalization. Case in point: cancellation of tasks. A very specialized solution involving cancellation tokens was proposed. You get opaque tokens from a factory, you pass them to tasks (either explicitly or by lambda capture), and the tasks are responsible for polling the tokens and performing appropriate cancellation actions. But this is just an example of an asynchronous Boolean channel. Instead of defining channels, C++ is considering a special-purpose, one-shot solution (unless a volunteer steps up to write a channels proposal). By the way, futures can also be viewed as channels, so this generalization might go a long way.
Another candidate for generalization was the Intel vectorization proposal presented by Robert Geva. Of course it would be great to support the use of vector processors in C++. But you have to see it in the larger context of data-driven parallelism. It doesn’t make sense to have separate solutions for vector processors, multicores running in SIMD mode, and GPGPUs. What’s needed is general support for data parallelism that allows multiple hardware-specific specializations. Hopefully a more general proposal will materialize.
The C++ Standards Committee is doing a great job, considering all the limitations it’s facing. The committee will not add anything to the language unless there are volunteers who will write proposals and demonstrate working implementations. Remember, you too can contribute to the future of C++.
April 9, 2012
The Downfall of Imperative Programming
Posted by Bartosz Milewski under C++, Concurrency, Functional Programming, Multicore, Multithreading, Parallelism, Programming | [7] Comments
April 6, 2012
First off, Microsoft did a great job with this conference, and kudos to Mads Torgersen who organized it. They invited interesting speakers from all over the world. Even some PL professors from the local University of Washington left their ivory tower and came to listen to some talks. A variety of languages and new features were discussed.
The two topics that stood out were: functional programming and DSLs. It seems like every language tries to assimilate some functional elements in order to deal with parallelism. Still, the problem is that in most languages the added safety features for concurrency are not enforceable and don’t compose well. A compiler can potentially check whether a lambda passed to a parallel loop doesn’t modify shared state, but it can’t extend this check to functions called from that lambda. The solution would be an effect system, which includes side effects of a function in its signature, but effect systems are extremely hard to program with. Compare this with Haskell monads, which serve as a simple but safe effect system. Unfortunately many imperative programmers have fits when they hear the M word. (I remember C programmers having the same reaction to the word “object.”)
Martin Odersky, the creator of Scala, described an ingenious macro system for Scala. The macros work on ASTs that can be manipulated programmatically. Kunle Olukotun from Stanford described an EDSL system written in Scala that is used for highly optimized parallel computations.
Robert Griesemer from Google presented the Go language. It's a nice idea to have a simple, minimalistic, general-purpose language (they don't call it a systems language any more). It's harder to keep it simple forever. We've seen many languages that started simple only to cave in later, under demands from users, and explode in a cornucopia of features. I fear the same fate awaits Go. Generics will come first, then the community will realize that Go's concurrency model is not general enough, and so on. Soon we'll have Go++.
As a confirmation of this trend, John Rose from Oracle presented language extensions in Java 8. Not surprisingly, among them was support for lambdas — another sign of functional programming going mainstream.
Speaking of functional programming, Andy Adams-Moran had a presentation about the Haskell work they do at Galois. They specialize in high-assurance software and do a lot of business with the Government. Andy showed amazing feats of Haskell engineering. They were able to port the Haskell compiler and runtime to bare Xen, bypassing Linux. They even wrote their own drivers in Haskell. Low-level and high performance are not epithets normally associated with Haskell. (And yes, they used monads.)
Microsoft had their own talks, quite interesting although more product-oriented. They talked about PowerShell, a scripting language for Windows 8. F# has some interesting extensions (described by Donna Malayeri) that let the programmer, during development, connect to a web site, download its schema, turn it into F# types, and plug them into Visual Studio IntelliSense. There was also a talk by Martyn Lovell about the new Windows Runtime which, in some sense, reverted to COM for its API. I'm not sure if that means that the Frankenstein of C++/CLI can finally be put to rest.
The especially interesting Microsoft talk was about support for asynchronous programming in C# and VB. The speaker was Mads Torgersen, the organizer of the conference. Mads described a coroutine-style programming model where you can block in any part of your code using await and, when an event is triggered, resume execution right where you stopped. In the meanwhile, your program can continue executing other code — on the same thread. (You might recognize here a continuation monad in disguise.) This is a perfect model for GUI applications that call asynchronous operations. I talked to Herb Sutter about it later and he said that a very similar thing is being proposed for C++ (I’m going to be on the C++ sub-committee for parallelism with him, and I’ll report more details).
Herb also had a talk about C++11 and how different it feels from the previous versions. For me the really big deal is move semantics, which tips the scales from passing by reference to passing by value; passing by value is now really passing by reference behind the scenes.
There was some trench warfare between strong typing and weak, dynamic, or no typing. Gilad Bracha from Google presented the Dart language, which they think will replace JavaScript. Dart has optional types, and Gilad was trying to convince the audience that strong typing doesn't decrease the number of bugs; I don't think many programmers bought that. Right after his talk there was a counter-talk about ECMAScript 6, the new version of JavaScript. It seems like they are adding a lot of features to make JavaScript more like Java, including classes and methods. Both Dart and JavaScript are trying to position themselves as large-scale development languages. I'd like to see that, but I'm not holding my breath.
Another split was between native and managed languages, but nobody could agree on the definitions. When does a native language with a big runtime become a managed language, and a managed language with a JIT compiler become a native language? Herb Sutter tried to marry these two approaches, but he chose a curious metaphor — native programmers are from Mars and managed programmers are from Venus. I don’t think a lot of managed programmers were happy with that comparison. From where I sat it looked more like native programmers were distributing assault rifles to the programming public, trusting they won’t shoot themselves in the foot; while managed programmers were trying to enforce some kind of gun laws.
There were many new (at least to me) interesting languages and DSLs. Among them were Grace, Bloom, Julia, R, and others. You should look them up if you're interested.
There were two talks about the D language by Walter Bright and Andrei Alexandrescu. There was some interest in D's pure functions which, together with immutability, could form a platform for safer concurrency (purity propagates in a way not dissimilar to an effect system). D could become a player in that area, if only Andrei weren't so stubborn and listened to me ;-).
Videos of the presentations will be available on the Lang.NEXT web site.
[:Bartosz:]
April 5, 2012
I had the pleasure of meeting this man when I was a physics postdoc at the Max Planck Institute in Munich. He was still able to have lunch with the faculty, and he gave his own talk through a translator who could understand his deteriorating speech.

February 6, 2012
Microsoft’s Going Native 2012 was a great conference for C++ developers. There were talks by the movers and shakers of C++ including Bjarne Stroustrup himself. There were a few new topics, like the experimental work on Concepts by Stroustrup and Sutton.
And a new meme was forged by Andrei Alexandrescu.
Andrei is famous for his sound bites, but this one really struck a nerve. There was a question during the panel discussion from a person who complained that, for some bizarre, nonsensical reasons, the management of his company wouldn't let him use STL. "Call your headhunter!" responded Andrei. The audience cheered and the tweetosphere lit up. It was not only a great comeback but also a serious piece of advice. We C++ programmers have much more power than we think.
So if you are feeling unhappy with your job because the management interferes with your productivity and creativity, you don’t have to take it any more. Good C++ programmers are in great demand, recession notwithstanding. I regularly attend Northwest C++ User’s Group meetings, and every month there is some recruiter buying pizzas for all 20-30 people (if you are living in the Greater Seattle Area, please come and have some pizza with us). If you have any doubts whether you are a good programmer, let me assure you: The fact that you are reading this blog is proof enough.
I’ve had the opportunity to talk to Bjarne Stroustrup about the value of a good programmer. Bjarne said that managers rarely appreciate the orders of magnitude separating good programmers from average programmers in terms of productivity and code quality. The managers don’t see the point in providing top-notch productivity tools nor do they understand the need for constant education (some of my programmer friends had to pay for the conference from their own pockets). In other words, they consider programmers a cheap commodity. If you are stuck working for one of those companies, call your headhunter!
No programming job should be boring. This is the magic of programming: Any boring repetitive task can be automated. It’s this process of automation, of finding meta-solutions, that makes programming exciting.
Chandler Carruth had an interesting talk about the progress of the Clang project. They have open-sourced their C++ front end and made possible the creation of smart tools that can work on very large C++ projects. Large projects sooner or later reach the stage where maintenance is a nightmare. Chandler demonstrated that it doesn't have to be so. Instead of propagating a top-level modification by hand through millions of lines of code, you can write a short program that uses the C++ front end to make exactly the modifications that are needed, more reliably than a human could. This includes understanding and, if necessary, modifying macros.
Chandler’s group was also able to produce tools that not only figure out which includes are strictly necessary in each file, but also introduce forward declarations whenever possible. Java programmers are probably snickering, but for C++ this is a major breakthrough.
So if you are faced with a boring job and are not allowed to search for automated solutions, use STL, or upgrade an ancient compiler or build system — seriously, call your headhunter!
January 23, 2012
The Day the LOLcats Laughed
Posted by Bartosz Milewski under Uncategorized | Tags: internet crowd, recording industry association of america | [3] Comments
I don’t know about you, but I feel like we have just witnessed the beginning of a new era. The Internet stood up to Congress and to the Senate and won. We have defeated SOPA and PIPA. Big corporations represented by MPAA (Motion Picture Association of America) and RIAA (Recording Industry Association of America) sponsored those bills and they spent huge amounts on lobbying in Washington. They were so close to having their wishes granted and then, unexpectedly, a spontaneous grass-root protest movement organized with the help of the Internet defeated them. Of course this is not Libya or Syria — people’s lives are not in danger, our system is based on democracy, and we have our rights. But those rights sometimes need defending.
I don't expect everybody to spend time studying SOPA and PIPA in depth, but we (still) have the Internet to provide intelligent explanations and summaries. I like this short video by the Khan Academy. It explains in simple terms what a disaster SOPA and PIPA would have been had they been written into law.
So what has just happened? A group of people — programmers, content providers, the Internet crowd — suddenly realized how much power they have, and were able to put politicians on notice. The Big Blackout of 1/18/2012, with the participation of major sites like Reddit, Wikipedia, WordPress, and many others, was extremely effective. Those sites took a big risk, and they succeeded because of overwhelming support for their actions.
But as Wikipedia says, the fight isn't over yet. I'm sure the MPAA and RIAA are hard at work trying to sneak in another law that would give them the power to block any site on a whim and without recourse. We should remain vigilant. But, in the meanwhile, why not rectify some mistakes of the past?
Mickey Mouse Laws
Don't get me wrong, content creators deserve copyright protection, just like inventors deserve patent protection. But patents expire after 20 years, while the Mickey Mouse Law provides copyright protection for a century. That is how long we have to wait until copyrighted material enters the public domain.
Walt Disney is long dead but every time the copyright on his works nears expiry, the Congress enacts a new law that extends its reach.
If I drew a Mickey Mouse in my blog I’d be sued by the Disney estate (and, if SOPA passed, WordPress would be taken offline).
Do you know what was invented 100 years ago? Motorized film cameras — the replacement for the hand-cranked ones. If patents were valid as long as copyrights, the motion picture industry would still be paying patent fees to the great-grandchildren of Kazimierz Prószyński.
Copyright laws were originally designed to protect artists but have since degenerated into corporate welfare. Works of art and cultural icons should become part of the public domain in a reasonable time, before we and our children die. So why don't we start writing letters to our representatives asking them to abolish Mickey Mouse laws and settle on something that would actually encourage creativity? It's time to release poor Mickey from his corporate prison.
January 3, 2012
Virtual Machines: The Traps and The Thin Hypervisor
Posted by Bartosz Milewski under hypervisor, Programming, virtualization, x86 | Leave a Comment
I have released the final part of the series on virtual machines, The Thin Hypervisor. It’s a very promising technology that allows you to virtualize a running OS on demand. My previous blog entry is a recommended reading before this one. It explains how the hypervisor interacts with the operating system.
December 31, 2011
The WordPress.com stats helper monkeys prepared a 2011 annual report for this blog.
Here’s an excerpt:
The Louvre Museum has 8.5 million visitors per year. This blog was viewed about 180,000 times in 2011. If it were an exhibit at the Louvre Museum, it would take about 8 days for that many people to see it.
Click here to see the complete report.
December 5, 2011
Virtual Machines: Virtualizing Virtual Memory
Posted by Bartosz Milewski under hypervisor, Programming, virtualization, x86 | Leave a Comment
My new blog about virtual machines is out. It gives a peek at the tricks used by hypervisors to fool the operating system into running in a virtual box. I discuss nested page tables, shadow page tables, tracing, hidden page faults, and other interesting goodies.

