May | 2009 | Bartosz Milewski's Programming Cafe

May 2009

Monthly Archive

May 26, 2009

Race-free Multithreading

Posted by Bartosz Milewski under Programming
[28] Comments

Most languages were caught unaware by the multicore revolution. C++, predictably, developed a portable assembly language for multiprocessors (C++0x atomics). Out of sheer desperation some programmers turned to Erlang. Many still try to ignore the elephant in the room, while others try to shoot it with BB guns provided by their favorite languages.

I’ve looked at many languages and decided that none provided the right combination of performance and safety. We are still waiting for the next big one to break into the multicore territory.

Towards this goal, I’ll present a series of posts in which I’m going to develop a
threading model for that hypothetical language. The model is based on several papers that I reviewed in my previous blogs posts. My own contribution is putting the best of those ideas together into one package. I will use syntax similar to that of the D programming language, but C++ and Java programmers shouldn’t have problems following it.

Teaser #1

In one of my previous posts I described a concurrent data structure based on Haskell’s MVar. It’s essentially a one-element message queue. Let’s have a peek at the implementation that uses my proposed multithreaded scheme. Notice the total absence of special annotations–even the synchronized keyword is missing. The only unusual thing is the move operator, :=, introduced in my previous blog. It is needed in case we want to pass unique objects through MVar. You are probably asking yourself, How the heck is this supposed to work in a multithreaded environment? Read on!

class MVar<T> {
private:
    T    _msg;
    bool _full;
public:
    // put: asynchronous (non-blocking)
    // Precondition: MVar must be empty
    void put(T msg) {
        assert (!_full);
        _msg := msg; // move
        _full = true;
        notify();
    }
    // take: If empty, blocks until full.
    // Removes the message and switches state to empty
    T take() {
        while (!_full)
            wait();
        _full = false;
        return := _msg;
    }
}

Let’s instantiate this template as a monitor–that is an object accessible from multiple threads. We do it by specifying the owner as “self” (the this owner is not explicitly listed as a template parameter, but it can be passed to the template during instantiation).

auto mVar = new MVar<owner::self, int>;

In a self-owned object all methods are automatically synchronized. The move operator acting on an int simply copies it.

That was easy! How about instantiating a self-owned MVar for passing unique objects? Piece of cake!

auto q = new MVar<owner::self, unique Foo>;

auto foo = new unique Foo;
q.put(:= foo); // move foo
assert (foo is null);

// another thread
unique Foo f2 = q.get(); // implicit move of rvalue

So far data races have been avoided by implicit synchronization in self-owned objects and by the use of move semantics for unique objects.

Of course you know how to break the MVar, right? Instantiate it with a regular, non-unique class objects and watch aliases spread through threads. Let’s try it:

auto mVar = new MVar<owner::self, Foo>;
auto myFoo = new Foo;
mVar.put(myFoo);
// now let me race through my local alias!
myFoo.method();

Surprise! This code will not compile because a self-owned object cannot have a thread-local member. Because of a richer type system, the compiler can easily detect such violations. I’ll explain the details of the ownership scheme in my future posts–for now let me assure you that, no matter how hard you try, you can’t create a data race in this system (unless you bypass it with explicit casts).

How to making concurrent programming safer?

I propose to do this by extending the type system. (You’ve already seen some examples above.) I realize that there is strong resistance to extending a language’s type system. The main problem is increased complexity. The programmers are reluctant to study and use complex annotation schemes. I will argue that, in the case of multithreading, the complexity argument is reversed. Shared-memory multithreading is hard! Figuring out how to avoid all the pitfalls of sharing is harder than learning a type system that prevents them.

If a programmer is not willing to learn or cannot understand the race-free type system, he or she should not be writing multithreaded programs.

How to limit complexity?

To limit the complexity of the type system, I came up with several simple principles. The most important one is that multithreaded extensions should not require modifications to single-threaded programs. That means coming up with reasonable defaults for all new annotations.

If necessary, the compiler should be able to derive some of the defaults from program analysis. Because modularity is very important, such analysis should not creep across compilation unit boundaries. Whole-program analysis is out of the question.

How to keep goals reasonable?

There are two major challenges in multithreaded programming:

Avoiding races
Preventing deadlocks

The first problem is approachable, whereas the second seems to be a pie in the sky. I will therefore concentrate on designing a race-free system that is based on existing academic research.

Such system would be incomplete without some support for lock-free programming. There are very good reasons for the language to enforce sequential consistency. For comparison, C++ committee, after a long struggle, decided to allow non-SC primitives (weak atomics). Java only enforces SC for volatile variables.

Teaser #2

Speaking of lock-free programming, here’s the implementation of the infamous Double-Checked Locking Pattern:

class Singleton<T>
{
private:
    lockfree T _value;
public:
    T get() lockfree
    {
        if (_value is null)   
        {
            synchronized(this)
            {
                if (_value is null)
                   _value = new T;
            }
            return _value;
        }
    }
}

A lockfree variable is guaranteed to be always atomically accessed in a sequentially consistent way. A lockfree method is not synchronized, but it can only operate on lockfree variables (or use a synchronized section). Even though the use of lockfree doesn’t introduce low-level races, it may create high-level races when accessing more that one lockfree variable at a time. But we all know that lock free programming is only for desperadoes.

Lessons learned from previous work

There are many commonalities in various approaches to race freedom.

Ownership should be built into the type system. In OO languages each object must have an owner. In C-like languages each piece of data must have an associated lock.
There should be efficient mechanisms for passing data between threads
- By value
- By reference. (The object being passed must be a monitor.)
- As immutable data, by const reference
- By unique reference using move semantics

Most of those ideas are expressible through type qualifiers.

Higher-level models

Explicit synchronization is hard, no doubt about it. In my proposal, the type system makes it safe. There are however other models of concurrency that, although more restrictive, are often easier to use.

One such model is based on message passing. In my opinion, support for message passing should be library-based. The race-free type system will make it safe and flexible. It will, for instance, support passing messages not only by value but also by reference–without sacrificing safety. In traditional type systems there is no way to express the requirement that a message passed by reference must either be a monitor itself (have it’s own lock) or behave like a C++ unique_ptr (an object that leaves no aliases behind). This requirement should be expressible through the type system, allowing the compiler to check for its violations.

I’ve been paying a lot of attention to software transactional memory (STM). I believe that STM could be built into the language–again, with the support of the type system. It looks like the hardest problem with STM is how it should inter-operate with other types of multithreaded access (both locked and lock-free). The simplest solution is to enforce full isolation–no STM object can ever be accessed outside of a transaction. It’s not clear how practical such approach is, but it definitely simplifies things to the point that STM becomes orthogonal to other types of synchronization. And that means it will be possible to tackle it at a later time, after the race-free multithreading model is firmly established.

In the next installment, I will describe the ownership scheme that is the foundation of the race-free type system.

Please vote for this article on reddit.

Bibliography

C. Flanagan, M. Abadi, Object Types against Races. The seminal paper introducing ownership-based type systems.

See also my previous posts on Guava and GRFJ that discuss race free dialects of Java.

May 21, 2009

unique_ptr–How Unique is it?

Posted by Bartosz Milewski under C++, D Programming Language, Multithreading, Programming, Type System
[29] Comments

I’ve been using auto_ptrs before they were available in C++ (I made my own). I wrote several articles about resource management, in which auto_ptr played a prominent role. auto_ptr had always had many flaws, but it was a workhorse of memory management in a language that shuns garbage collection. Many of the flaws have been understood over the years and, in the 0x version of C++, auto_ptr was supplanted by the new improved unique_ptr. Using the latest features of C++, like rvalue references, unique_ptr implements move semantics in a consistent manner. You can now store unique_ptrs in most containers and apply many algorithms to them.

So why am I not jumping for joy? Because I know how much more can be done.

But first let me summarize the idea behind the unique pointer. It is a (smart) pointer that is the only reference to the object it’s pointing to. In particular, you cannot make a copy of a unique pointer–if you could, you’d end up with two references to the same object. You can only move it (hence the move semantics), making the source of the move invalid. With the older auto_ptr the moving was done quietly during assignment or pass-by-value. The problem with that arrangement was that the source auto_ptr would suddenly turn to null, which was sometimes confusing and occasionally led to access violations–as in this example:

void f(auto_ptr<Foo> foo); // pass by value

auto_ptr<Foo> pFoo (new Foo());
f(pFoo); // pass by value nulls the internal pointer
pFoo->method(); // access violation

The improvement provided by unique_ptr is to require an explicit move, to sensitize the programmer to the fact that the source of move becomes invalid. Still, the following code will compile, although the bug is much more prominent:

void f(unique_ptr<Foo> foo);

unique_ptr<Foo> pFoo (new Foo())
f(move(pFoo)); // explicit move
pFoo->method(); // access violation

A bigger problem is that there is no guarantee that a unique_ptr indeed stores the unique reference to an object. To begin with, unique_ptr can be constructed from any pointer. That pointer might have already been aliased, as in:

void f(unique_ptr<Foo> foo) {
    // destructor of foo called at the end of scope
}

Foo * foo = new Foo();
unique_ptr<Foo> upFoo(foo); 
// foo now aliases the contents of upFoo
f(move(upFoo)); // explicit move
foo->method(); // undefined behavior

There is also an obvious backdoor in the form of the method unique_ptr::get, which can be used to spread new aliases. This can be particularly insidious when you have to pass the result of get to a foreign function:

void f(Foo * pf) {
    globalFoo = pf; // creates a global alias
}

unique_ptr<Foo> pFoo(new Foo());
f(pFoo.get()); // leaks an alias

Finally, it’s possible to create aliases to data members of the object stored in unique_ptr, as well as assign aliased references to its data members. Consider this example:

class Foo {
public:
    ~Foo() { delete _bar; }
    Bar * _bar;
};

Bar * pBar = new Bar();
unique_ptr<Foo> upFoo(new Foo());
pFoo->_bar = pBar;
// pBar is now an alias to a member of Foo
upFoo.reset(); // deletes _bar inside foo
pBar->method(); // undefined behavior

All this adds up to quite a number of ways to shoot yourself in the foot! Still, if the only problems were deterministic access violations, I could live with them (I’m a very disciplined programmer). They are reasonably easy to reproduce and can be debugged using standard methods (code coverage). But now there is a new elephant in the room–multithreading.

The beauty of unique objects is that they can be safely passed (moved) between threads and they never require locking. Indeed, since only one thread at a time can reference them, there is no danger of data races. Except when, as in the examples above, aliases are inadvertently leaked from unique_ptr. Accessing an object through such aliases from more than one thread without locking is a very nasty bug, usually very hard to reproduce.

Type system approach

Okay, so what’s the panacea? I’m glad you asked–the type system, of course! Make unique a type qualifier and all your troubles are gone. Notice that the compiler has no idea about the semantics of some library class called unique_ptr, but it can be made aware of the semantics of the unique type qualifier. What’s more, it can enforce this semantics from cradle to grave–from construction to destruction. (I know this proposal has no chance to be incorporated into C++, but it might be useful in, let’s say, the D programming language.)

You don’t construct a unique pointer from an arbitrary pointer. You just construct the object as unique (I’m still using C++-like syntax):

unique Foo * foo = new unique Foo();

Now that the compiler knows the meaning of unique, it can, for instance, detect if a unique pointer is accessed after having been moved. The compiler might prevent such an invalid pointer from being dereferenced or passed to other functions. Such bugs can be detected at compile time, rather than through access violations at runtime.

Do we need separate syntax for move? It might seem that using the assignment syntax (the equal sign) wouldn’t be so bad, as long as the compiler prevents us from dereferencing the null pointer left behind after a move. However, another problem arises when you start instantiating templates with unique pointers. Some containers and algorithms will work out of the box. Some will refuse to be instantiated because internally they try to access unique pointers after the move. But there is a third category of templates that, although move-correct, will generate unexpected results–like vectors with null unique pointers. Unfortunately, there are situations where the compiler has limited vision (i.e., it cannot do whole-program analysis) and can miss a null unique pointer dereference. For instance, it can’t prevent the use of a vector after a random element has been moved out of it.

Let’s assume that we have a special syntax for move, say := . Here’s a simple example of its use:

unique * Foo foo1 = new unique Foo();
unique * Foo foo2 := foo1; // move
foo1->method(); // compile-time error: foo1 invalid after move

In C++0x, a lot of containers and algorithms have been rewritten to use the explicit move instead of the assignment. It wouldn’t be a big deal to use := instead. Notice that the compiler would not allow the use of a regular assignment on unique pointers (unless they are rvalues), so the templates that are not ready for move semantics would refuse to get instantiated with unique template parameters.

Obviously, the move operator applied to a non-unique pointer must also work, otherwise we’d have to write separate templates for unique and non-unique template arguments.

We also have to use our special notation when passing unique arguments to functions, as in:

void f(unique * Foo);

unique * Foo foo = new unique Foo();
f(:= foo); // move
foo->method(); // compile-time error: foo invalid after move

When returning an lvalue unique pointer (for instance a member of a data structure), we use the notation return := foo;. The move operator may be elided when returning a local unique pointer, since the source ceases to exist upon the return (in general, move is implied when the source is a unique rvalue).

The biggest strength of the type-based approach is that the compiler can reliably prevent the aliasing of unique pointers. An assignment (as opposed to move) of an (lvalue) unique pointer to either unique or non-unique pointer is impossible. Taking the address of a unique pointer is forbidden too.

That leaves us with an interesting dilemma: How do you call a (library) function that takes a regular pointer if all you have at your disposal is a unique pointer? C++ unique_ptr let’s you do it through its method get. But we know how dangerous get is as far as aliasing goes.

If you don’t have the source code for the function you are calling, you probably shouldn’t be calling it with a unique pointer, because you have no idea what this function will do with it (store an alias in a global variable?). If you know the implementer of the function and he/she is willing to donate his/her kidney in case the function leaks aliases, you may risk casting away the uniqueness.

There is however a way for a function to guarantee that it’s alias-safe by declaring its parameters lent. The compiler will check the body of the function to make sure it doesn’t leak aliases to any part of the lent parameter, nor does it store non-unique (and possibly aliased) objects inside the lent object. Of course, the function can only pass the lent parameter (or any sub-part of it) to another function that makes the same guarantee.

It’s not obvious which should be the default: lent or it’s opposite, owned. If lent were the default, the compiler would be flagging a lot of valid single-threaded code (although it makes sense to assume that methods of a monitor object take lent parameters by default).

The relationship between unique and lent is very similar to that between immutable and const in D. If you declare data as unique or immutable you can safely pass it to a functions that declares its parameter as lent or const, respectively. lent guarantees that the parameter will not be aliased, const that it won’t be mutated.

Speaking of immutable–there’s always been a problem with constructing non-trivial immutable objects. The tricky part is that during construction we often need to explicitly modify the object, but we don’t want to leak non-const aliases to it or to its sub-parts. And now we have a new tool at hand to control aliasing–unique pointers. Instead of constructing an immutable object, you may construct a unique object, with all the guarantees about aliasing, and then move it to an immutable pointer. Just like this:

unique Foo * pFoo = new unique Foo();
immutable Foo * imFoo := pFoo;

(Of course, C++ doesn’t have immutable types either, but D does.)

By the way, you can always safely move a unique pointer to any of immutable, const, or regular pointers. Notice that it doesn’t mean that unique is a supertype of, for instance, immutable. You can’t pass unique where immutable is expected (you don’t want read-only aliases to escape to another thread!)–you can only move it.

A method that promises not to leak aliases to this is declared as lent. The compiler will detect any violations of this promise, as in:

class Foo {
    Bar * _bar;
public:
    Bar * get() lent {
        return _bar; // error: a lent method returning an alias
    }
};

In general, when you are dealing with a unique object, you may only call its lent methods.

Issues

Strict uniqueness imposes constraints on the internal structure of objects. Imagine creating a unique doubly-linked list. A link in such a list must be accessible from two places: from the previous link and from the next link. The scheme that absolutely prevents the creation of aliases to unique objects will not allow the creation of doubly-linked lists–and rightly so! Imagine moving a link out of the list: you’ll end up with a null pointer in the list, and a possible cross-thread aliasing if the (unique) link migrates to another thread.

There is a solution to this problem based on the ownership scheme (like the one used in Guava or GRFJ), which allows cross-aliasing of objects that share the same owner (e.g., all links are owned by the list). Such aliases cannot be leaked because the compiler won’t allow objects owned by a unique object to be moved. But that’s a topic for another blog post.

The use of iterators on unique containers is also highly restricted, but there are other ways of iterating that are inherently more thread-safe.

Conclusion

Any extension to the type system is usually met with some resistance from the user community. While some appreciate the added safety, others complain about added complexity. So it’s very important to be able to limit this complexity to specific areas.

When are unique and lent qualifiers really needed? In C++, at least in my experience, unique_ptr serves mostly as a poor man’s garbage collector. Since memory management permeates all C++ programs, the switch to using unique qualifiers would require a lot of changes even in single-threaded programs. In contrast, in garbage-collected languages, the main usefulness of unique is for multithreaded programming. A message queue that passes large unique objects between threads is more efficient than the one that deep-copies them. And unique objects never require locking.

When multithreading is involved, a safer type system doesn’t really add complexity. It translates the hard complexity of shared-memory concurrency into something that is more approachable by mere mortals.

As always, the introduction of new type modifiers may lead to code duplication (as in the case of const and non-const accessors). This problem can be dealt with using qualifier polymorphism–a topic for a whole new blog post.

I understand that this post might be hard to follow. That doesn’t mean that the uniqueness scheme is difficult to understand. My task was not only to explain the hows, but also the whys–and that requires giving a lot of background.

Bibliography

Jonathan Aldrich, Valentin Kostadinov, Craig Chambers, Alias Annotations for Program Understanding. Describes unique and lent annotations in detail.
See also my previous blog on Generic Race-Free Java.

Bartosz Milewski's Programming Cafe

May 2009

Race-free Multithreading

Teaser #1

How to making concurrent programming safer?

How to limit complexity?

How to keep goals reasonable?

Teaser #2

Lessons learned from previous work

Higher-level models

Bibliography

unique_ptr–How Unique is it?

Type system approach

Issues

Conclusion

Bibliography

Top Posts

License

Blogroll

Follow Me

Archives