C++ is like an oil tanker — it takes a long time for it to change course. The turbulent reefs towards which C++ has been heading were spotted on the horizon more than ten years ago. I’m talking, of course, about the end of smooth sailing under the Moore’s law and the arrival of the Multicore. It took six years to acknowledge the existence of concurrency in the C++11 Standard, but that’s only the beginning. It’s becoming more and more obvious that a major paradigm shift is needed if C++ is to remain relevant in the new era.
Why do we need a new paradigm to deal with concurrency? Can’t we use object oriented programming with small modifications? The answer to this question goes to the heart of programming: it’s about composability. We humans solve complex problems by splitting them into smaller subproblems. This is a recursive process, we split subproblems into still smaller pieces, and so on. Eventually we reach the size of the problem which can be easily translated into computer code. We then have to compose all these partial solutions into larger programs.
The key to composability is being able to hide complexity at each level. This is why object oriented programming has been so successful. When you’re implementing an object, you have to deal with its internals, with state transitions, intermediate states, etc. But once the object is implemented, all you see is the interface. The interface must be simpler than the implementation for object oriented programming to make sense. You compose larger objects from smaller objects based on their interfaces, not the details of their implementation. That’s how object oriented programming solves the problem of complexity.
Unfortunately, objects don’t compose in the presence of concurrency. They hide the wrong kind of things. They hide sharing and mutation. Let me quote the definition of data race: Two or more threads accessing the same piece of memory at the same time, at least one of them writing. In other words: Sharing + Mutation = Data Race. Nothing in the object’s interface informs you about the possibility of sharing and mutation inside the object’s implementation. Each object in isolation may be data-race-free but their composition may inadvertently introduce data races. And you won’t know about it unless you study the details of their implementation down to every single memory access.
In Java, an attempt had been made to mollify this problem: Every object is equipped with a mutex that can be invoked by declaring the method synchronized. This is not a scalable solution. Even Java’s clever thin lock implementation incurs non-negligible performance overhead, so it is used only when the programmer is well aware of potential races, which requires deep insight into the implementation of all subobjects, exactly the thing we are trying to avoid.
More importantly, locking itself doesn’t compose. There’s a classic example of a locked bank account whose
withdraw methods are synchronized by a lock. The problem occurs when one tries to transfer money from one account to another. Without exposing the locks, it’s impossible to avoid a transient state in which the funds have already left one account but haven’t reached the second. With locks exposed, one may try to hold both locks during the transfer, but that creates a real potential for deadlocks. (Software Transactional Memory provides a composable solution to this problem, but there are no practical implementations of STM outside of Haskell and Clojure.)
Moreover, if we are interested in taking advantage of multicores to improve performance, the use of locks is a non-starter. Eking out parallel performance is hard enough without locks, given all the overheads of thread management and the Amdahl’s law. Parallelism requires a drastically different approach.
Since the central problem of concurrency is the conflict between sharing and mutation, the solution is to control these two aspects of programming. We can do mutation to our heart’s content as long as there’s no sharing. For instance, we can mutate local variables; or we can ensure unique ownership by making deep copies, using move semantics, or by employing
unique_ptrs. Unique ownership plays very important role in message passing, allowing large amounts of data to be passed cheaply between threads.
However, the key to multicore programming is controlling mutation. This is why functional languages have been steadily gaining ground in concurrency and parallelism. In a nutshell, functional programmers have found a way to program using what, to all intents and purposes, looks like immutable data. An imperative programmer, when faced with immutability, is as confused as a barbecue cook in a vegetarian kitchen. And the truth is that virtually all data structures from the C++ standard library are unsuitable for this kind of programming — the standard vector being the worst offender. A continuous slab of memory is perfect for random or sequential access, but the moment mutation is involved, you can’t share it between threads. Of course, you can use a mutex to lock the whole vector every time you access it, but as I explained already, you can forget about performance and composability of such a solution.
The trick with functional data structures is that they appear immutable, and therefore require no synchronization when accessed from multiple threads. Mutation is replaced by construction: you construct a new object that’s a clone of the source object but with the requested modification in place. Obviously, if you tried to do this with a vector, you’d end up with a lot of copying. But functional data structures are designed for maximum sharing of representation. So a clone of a functional object will share most of its data with the original, and only record a small delta. The sharing is totally transparent since the originals are guaranteed to be immutable.
A singly-linked list is a classical, if not somewhat trivial, example of such a data structure. Adding an element to the front of a list requires only the creation of a single node to store the new value and a pointer to the original (immutable) list. There are also many tree-like data structures that are logarithmically cheap to clone-mutate (red-black trees, leftist heaps). Parallel algorithms are easy to implement with functional data structures, since the programmer doesn’t have to worry about synchronization.
Functional data structures, also known as “persistent” data structures, are naturally composable. This follows from the composability of immutable data — you can build larger immutable objects from smaller immutable objects. But there’s more to it: This new way of mutating by construction also composes well. A composite persistent object can be clone-mutated by clone-mutating only the objects on the path to the mutation; everything else can be safely shared.
Concurrency also introduces nonstandard flows of control. In general, things don’t progress sequentially. Programmers have to deal with inversion of control, jumping from handler to handler, keeping track of shared mutable state, etc. Again, in functional programming this is nothing unusual. Functions are first class citizens and they can be composed in many ways. A handler is nothing but a continuation in the continuation passing style. Continuations do compose, albeit in ways that are not familiar to imperative programmers. Functional programmers have a powerful compositional tool called a monad that, among other things, can linearize inverted flow of control. The design of libraries for concurrent programming makes much more sense once you understand that.
A paradigm shift towards functional programming is unavoidable and I’m glad to report that there’s a growing awareness of that new trend among C++ programmers. I used to be the odd guy talking about Haskell and monads at C++ meetings and conferences. This is no longer so. There was a sea change at this year’s C++Now. The cool kids were all talking about functional programming, and the presentation “Functional Data Structures in C++” earned me the most inspiring session award. I take it as a sign that the C++ community is ready for a big change.