Most languages were caught unaware by the multicore revolution. C++, predictably, developed a portable assembly language for multiprocessors (C++0x atomics). Out of sheer desperation some programmers turned to Erlang. Many still try to ignore the elephant in the room, while others try to shoot it with BB guns provided by their favorite languages.

I’ve looked at many languages and decided that none provided the right combination of performance and safety. We are still waiting for the next big one to break into the multicore territory.

Towards this goal, I’ll present a series of posts in which I’m going to develop a
threading model for that hypothetical language. The model is based on several papers that I reviewed in my previous blogs posts. My own contribution is putting the best of those ideas together into one package. I will use syntax similar to that of the D programming language, but C++ and Java programmers shouldn’t have problems following it.

Teaser #1

In one of my previous posts I described a concurrent data structure based on Haskell’s MVar. It’s essentially a one-element message queue. Let’s have a peek at the implementation that uses my proposed multithreaded scheme. Notice the total absence of special annotations–even the synchronized keyword is missing. The only unusual thing is the move operator, :=, introduced in my previous blog. It is needed in case we want to pass unique objects through MVar. You are probably asking yourself, How the heck is this supposed to work in a multithreaded environment? Read on!

class MVar<T> {
private:
    T    _msg;
    bool _full;
public:
    // put: asynchronous (non-blocking)
    // Precondition: MVar must be empty
    void put(T msg) {
        assert (!_full);
        _msg := msg; // move
        _full = true;
        notify();
    }
    // take: If empty, blocks until full.
    // Removes the message and switches state to empty
    T take() {
        while (!_full)
            wait();
        _full = false;
        return := _msg;
    }
}

Let’s instantiate this template as a monitor–that is an object accessible from multiple threads. We do it by specifying the owner as “self” (the this owner is not explicitly listed as a template parameter, but it can be passed to the template during instantiation).

auto mVar = new MVar<owner::self, int>;

In a self-owned object all methods are automatically synchronized. The move operator acting on an int simply copies it.

That was easy! How about instantiating a self-owned MVar for passing unique objects? Piece of cake!

auto q = new MVar<owner::self, unique Foo>;

auto foo = new unique Foo;
q.put(:= foo); // move foo
assert (foo is null);

// another thread
unique Foo f2 = q.get(); // implicit move of rvalue

So far data races have been avoided by implicit synchronization in self-owned objects and by the use of move semantics for unique objects.

Of course you know how to break the MVar, right? Instantiate it with a regular, non-unique class objects and watch aliases spread through threads. Let’s try it:

auto mVar = new MVar<owner::self, Foo>;
auto myFoo = new Foo;
mVar.put(myFoo);
// now let me race through my local alias!
myFoo.method();

Surprise! This code will not compile because a self-owned object cannot have a thread-local member. Because of a richer type system, the compiler can easily detect such violations. I’ll explain the details of the ownership scheme in my future posts–for now let me assure you that, no matter how hard you try, you can’t create a data race in this system (unless you bypass it with explicit casts).

How to making concurrent programming safer?

I propose to do this by extending the type system. (You’ve already seen some examples above.) I realize that there is strong resistance to extending a language’s type system. The main problem is increased complexity. The programmers are reluctant to study and use complex annotation schemes. I will argue that, in the case of multithreading, the complexity argument is reversed. Shared-memory multithreading is hard! Figuring out how to avoid all the pitfalls of sharing is harder than learning a type system that prevents them.

If a programmer is not willing to learn or cannot understand the race-free type system, he or she should not be writing multithreaded programs.

How to limit complexity?

To limit the complexity of the type system, I came up with several simple principles. The most important one is that multithreaded extensions should not require modifications to single-threaded programs. That means coming up with reasonable defaults for all new annotations.

If necessary, the compiler should be able to derive some of the defaults from program analysis. Because modularity is very important, such analysis should not creep across compilation unit boundaries. Whole-program analysis is out of the question.

How to keep goals reasonable?

There are two major challenges in multithreaded programming:

  1. Avoiding races
  2. Preventing deadlocks

The first problem is approachable, whereas the second seems to be a pie in the sky. I will therefore concentrate on designing a race-free system that is based on existing academic research.

Such system would be incomplete without some support for lock-free programming. There are very good reasons for the language to enforce sequential consistency. For comparison, C++ committee, after a long struggle, decided to allow non-SC primitives (weak atomics). Java only enforces SC for volatile variables.

Teaser #2

Speaking of lock-free programming, here’s the implementation of the infamous Double-Checked Locking Pattern:

class Singleton<T>
{
private:
    lockfree T _value;
public:
    T get() lockfree
    {
        if (_value is null)   
        {
            synchronized(this)
            {
                if (_value is null)
                   _value = new T;
            }
            return _value;
        }
    }
}

A lockfree variable is guaranteed to be always atomically accessed in a sequentially consistent way. A lockfree method is not synchronized, but it can only operate on lockfree variables (or use a synchronized section). Even though the use of lockfree doesn’t introduce low-level races, it may create high-level races when accessing more that one lockfree variable at a time. But we all know that lock free programming is only for desperadoes.

Lessons learned from previous work

There are many commonalities in various approaches to race freedom.

  • Ownership should be built into the type system. In OO languages each object must have an owner. In C-like languages each piece of data must have an associated lock.
  • There should be efficient mechanisms for passing data between threads
    • By value
    • By reference. (The object being passed must be a monitor.)
    • As immutable data, by const reference
    • By unique reference using move semantics

Most of those ideas are expressible through type qualifiers.

Higher-level models

Explicit synchronization is hard, no doubt about it. In my proposal, the type system makes it safe. There are however other models of concurrency that, although more restrictive, are often easier to use.

One such model is based on message passing. In my opinion, support for message passing should be library-based. The race-free type system will make it safe and flexible. It will, for instance, support passing messages not only by value but also by reference–without sacrificing safety. In traditional type systems there is no way to express the requirement that a message passed by reference must either be a monitor itself (have it’s own lock) or behave like a C++ unique_ptr (an object that leaves no aliases behind). This requirement should be expressible through the type system, allowing the compiler to check for its violations.

I’ve been paying a lot of attention to software transactional memory (STM). I believe that STM could be built into the language–again, with the support of the type system. It looks like the hardest problem with STM is how it should inter-operate with other types of multithreaded access (both locked and lock-free). The simplest solution is to enforce full isolation–no STM object can ever be accessed outside of a transaction. It’s not clear how practical such approach is, but it definitely simplifies things to the point that STM becomes orthogonal to other types of synchronization. And that means it will be possible to tackle it at a later time, after the race-free multithreading model is firmly established.

In the next installment, I will describe the ownership scheme that is the foundation of the race-free type system.

Please vote for this article on reddit.

Bibliography

C. Flanagan, M. Abadi, Object Types against Races. The seminal paper introducing ownership-based type systems.

See also my previous posts on Guava and GRFJ that discuss race free dialects of Java.

Advertisements