2009



In my last post I talked about the proposal for the ownership scheme for multithreaded programs that provides alias control and eliminates data races. The scheme requires the addition of new type qualifiers to the (hypothetical) language. The standard concern is that new type qualifiers introduce code duplication. The classic example is the duplication of getters required by the introduction of the const modifier:

class Foo {
    private Bar _bar;
public:
    Bar get() {
        return _bar;
    }
    const Bar get() const {
        return _bar;
    }
}

Do ownership annotations lead to the same kind of duplication? Fortunately not. It’s true that, in most cases, two implementations of each public method are needed–with and without synchronization–but this is taken care of by the compiler, not by the programmer. Unlike in Java, we don’t need a different class for shared Vector and thread-local ArrayList. In my scheme, when a vector is instantiated as a monitor (shared), the compiler automatically puts in the necessary synchronization code.
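To make this more concrete, here is a rough C++ sketch of the kind of wrapper the compiler could generate for a shared (monitor) instantiation of Foo. The class SharedFoo, the member _lock, and the use of std::mutex are my own illustration of the idea, not part of the proposal:

#include <mutex>

class Bar {};

class Foo {
    Bar _bar;
public:
    Bar get() { return _bar; }
};

// Hypothetical compiler output for "new shared Foo": every public
// method of the monitor is wrapped in its hidden lock.
class SharedFoo {
    Foo _impl;
    std::mutex _lock;
public:
    Bar get() {
        std::lock_guard<std::mutex> guard(_lock);
        return _impl.get();
    }
};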

Need for generic code

The ownership scheme introduces an element of genericity by letting the programmer specify ownership during the instantiation of a class (just as a template parameter is specified during the instantiation of a template).

I already mentioned that most declarations can be expressed in two ways: one using type qualifiers, another using template notation. For instance, these two forms are equivalent:

auto foo2 = new shared Foo;
auto foo2 = new Foo<owner::self>;

The template form emphasizes the generic nature of ownership annotations.

With the ownership system in place, regular templates parametrized by types also gain an additional degree of genericity since their parameters now include implicit ownership information. This is best seen in objects that don’t own the objects they hold. Most containers have this property (unless they are restricted to storing value types). For instance, a stack object might be thread-local while its elements are either thread-local or shared. Or the stack might be shared, with shared elements, etc. The source code to implement such stacks may be identical.

The polymorphic scheme and the examples are based on the GRFJ paper I discussed in a past post.

An example

– Stack

A parameterized stack might look like this:

class Stack<T> {
    private Node<T> _top;
public:
    void push(T value) {
        auto newNode = new Node<owner::this, T>;
        newNode.init(:=value, _top);
        _top = newNode;
    }
    T pop() {
        if (_top is null) return null;
        auto value := _top.value();
        _top = _top.next();
        return value;
    }
}

This stack is parametrized by the type T. This time, however, T includes ownership information. In particular T could be shared, unique, immutable or–the default–thread-local. The template picks up whatever qualifier is specified during instantiation, as in:

auto stack = new Stack<unique Foo>;

The member _top (the head of the linked list) is of type Node, which is parametrized by the type T. What is implicit in this declaration is that _top is owned by this–the default assignment of ownership for subobjects. If you want, you can make this more explicit:

private Node<owner::this, T> _top;

Notice that, when constructing a new Node, I had to specify the owner explicitly as this. The default would be thread-local and could leak thread-local aliases in the constructor. It is technically possible that owner::this could, at least in this case, be inferred by the compiler through simple flow analysis.

Let’s have a closer look at the method push, where some interesting things are happening. First push creates a new node, which is later assigned to _top. The compiler cannot be sure that neither the Node constructor nor the init method leaks aliases. That looks bad at first glance because, if an alias to newNode were leaked, that would lead to the leakage of _top as well (after a pop).

And here’s the clincher: Because newNode was declared with the correct owner–the stack itself–it can’t leak an alias that has a different owner. So anybody who tries to access the (correctly typed) leaked alias would have to hold the lock on the stack. Which means that, if the stack is shared, unsynchronized access to any of the nodes and their aliases is impossible. Which means no races on Nodes.

I also used the move operator := to move the values stored on the stack. That will make the stack usable with unique types. (For non-unique types, the move operator turns into regular assignment.)

I can now instantiate various stacks with different combinations of ownerships. The simplest one is:

auto localStack = new Stack<Foo>;

which is thread-local and stores thread-local objects of class Foo. There are no restrictions on Foo.

A more interesting combination is:

auto localStackOfMonitors = new Stack<shared Foo>;

This is a thread-local stack which stores monitor objects (the opposite is illegal though, as I’ll explain in a moment).

There is also a primitive multithreaded message queue:

auto msgQueue = new shared Stack<shared Foo>;

Notice that code that would try to push a thread-local object on the localStackOfMonitors or the msgQueue would not compile. We need the rich type system to be able to express such subtleties.

Other interesting combinations are:

auto stackOfImmutable = new shared Stack<immutable Foo>;
auto stackOfUnique = new shared Stack<unique Foo>;

The latter is possible because I used move operators in the body of Stack.

– Node

Now I’ll show you the fully parameterized definition of Node. I made all ownership annotations explicit for explanatory purposes. Later I’ll argue that all of them could be elided.

class Node<T> {
private:
    T _value;
    Node<owner::of_this, T> _next;
public:
    void init(T v, Node<owner::of_this, T> next)
    {
        _value := v;
        _next = next;
    }
    T value() {
        return :=_value;
    }
    Node<owner::of_this, T> next() {
        return _next;
    }
}

Notice the declaration of _next: I specified that it must be owned by the owner of the current object, owner::of_this. In our case, the current object is a node and its owner is an instance of the Stack (let’s assume it’s the self-owned msgQueue).

This is the most logical assignment of ownership: all nodes are owned by the same stack object. That means no ownership conversions need be done, for instance, in the implementation of pop. In this assignment:

_top = _top.next();

the owner of _top is msgQueue, and so is the owner of its _next object. The types match exactly. I drew the ownership tree below. The fat arrows point at owners.

[Figure: Ownership hierarchy using owner::of_this]

But that’s not the only possibility. The default–that is _next being owned by the current node–would work too. The corresponding ownership tree is shown below.

[Figure: Ownership hierarchy using owner::this]

The left-hand side of the assignment

_top = _top.next();

is still owned by msgQueue. But the _next object inside the _top is not. It is owned by the _top node itself. These are two different owners so, during the assignment, the compiler has to do an implicit ownership conversion. Such conversion is only safe if both owners belong to the same ownership tree (sometimes called a “region”). Indeed, ownership is only needed for correct locking, and the locks always reside at the top of the tree (msgQueue in this case). So, after all, we don’t need to annotate _next with the ownership qualifier.

The two other annotations can be inferred by the compiler (there are some elements of type inference even in C++0x and D). The argument next to the method init must be either owned by this or be convertible to owner::this because of the assignment

_next = next;

Similarly, the return from the method next is implicitly owned by this (the node). When it’s used in Stack.pop:

_top = _top.next();

the owner conversion is performed.

With ownership inference, the definition of Node simplifies to the following:

class Node<T> {
private:
    T _value;
    Node<T> _next; // by default owned by this
public:
    void init(T v, Node<T> next)
    {
        _value := v;
        _next = next; // inference: owner of next must be this
    }
    T value() {
        return :=_value;
    }
    Node<T> next() {
        return _next; // inference: returned node is owned by this
    }
}

which has no ownership annotations.

Let me stress again a very important point: If init wanted to leak the alias to next, it would have to assign it to a variable of the type Node<owner::this, T>, where this is the current node. The compiler would make sure that such a variable is accessed only by code that locks the root of the ownership tree, msgQueue. This arrangement ensures the absence of races for the nodes of the list.

Another important point is that Node contains _value of type T as a subobject. The compiler will refuse instantiations where Node’s ownership tree is shared (its root is self-owned) and T is thread-local. Indeed, such an instantiation would lead to races if an alias to _value escaped from Node. Such an alias, being thread-local, would be accessible without locking.

Comment on notation

In general, a template parameter list might contain a mixture of types, type qualifiers, values (and, in D, aliases). Because of this mixture, I’m using special syntax for ownership qualifiers, owner::x, to distinguish them from other kinds of parameters.

As you have seen, a naked ownership qualifier may be specified during instantiation. If it’s the first template argument, it becomes the owner of the object. Class templates don’t specify this parameter, but they have access to it as owner::of_this.

Other uses of qualifier polymorphism

Once qualifier polymorphism is in the language, there is no reason not to allow other qualifiers to take part in polymorphism. For instance, the old problem of having to write separate const versions of accessors can be easily solved:

class Foo {
    private Bar _bar;
    public mut_q Bar get<mutability::mut_q>() mut_q
    {
        return _bar;
    }
}

Here method get is parametrized by the mutability qualifier mut_q. The values it can take are: mutable (the default), const, or immutable. For instance, in

auto immFoo = new immutable Foo;
immutable Bar b = immFoo.get();

the immutable version of get is called. Similarly, in

void f(const Foo foo) {
    const Bar b = foo.get();
}

the const version is called (notice that f may also be called with an immutable object–it will work just fine).

Class methods in Java or D are by default virtual. This is why, in general, non-final class methods cannot be templatized (an infinite number of possible versions of a method would have to be included in the vtable). Type qualifiers are an exception, because there is a finite number of them. It would be okay for the vtable to have three entries for the method get, one for each possible value of the mutability parameter. In this case, however, all three are identical, so the compiler will generate just one entry.
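For comparison, in today’s C++ the const/non-const duplication is usually reduced by hand, for example by funneling the non-const accessor through the const one. The sketch below shows that well-known idiom purely as a contrast to qualifier polymorphism:

class Bar {};

class Foo {
    Bar _bar;
public:
    const Bar& get() const { return _bar; }
    // The non-const overload reuses the const one instead of
    // duplicating the body.
    Bar& get() {
        return const_cast<Bar&>(static_cast<const Foo&>(*this).get());
    }
};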

Conclusion

The hard part–explaining the theory and the details of the ownership scheme–is over. I will now switch to a tutorial-style presentation that is much more programmer friendly. You’ll see how simple the scheme really is in practice.


Since ownership plays a major role in race-free programming, it will be the first topic in my proposal for a race-free system. I presented the bird’s eye view of the system and provided a few teasers in my previous post. The design is based on published papers (see bibliography at the end). My contribution was to integrate several ideas into one package.

When I showed this proposal to my friends they either didn’t believe it could work or considered it too complex, depending on which end they were looking at. From the users’ perspective, the system looks relatively simple, so the natural reaction is: That can’t work. If you get into the details of why it works, and how the compiler knows you are in danger of a data race, you need some theory, and that is complex. So I decided to deal with some theory first, to show that things work. If you’re not into theory, just look at the examples. They are usually simple to understand.

Owners

The ownership relationship is necessary to establish a tree-like structure among objects. This is needed by the compiler to decide which lock, if any, is responsible for the protection of each object, and take it when necessary. Simply speaking, the lock at the root of each tree protects the rest of the tree. If you think that your multithreaded programs don’t follow a tree structure, look at them closely. If they don’t, you either already have data races or are likely to develop them in the future.

-Every object has an owner

The owner may be another object–usually the embedding object. In the example below:

class Foo {
    void doWork() { _bar.doWork(); }
    private Bar _bar;
}

auto foo = new Foo;

the embedded object _bar is owned, at runtime, by the object foo (I repeat, the concrete object, not the class Foo). This is the default ownership relationship for embedded objects, so no special notation is needed to establish it (I’ll show later how to override this default).

There are also special symbolic “owners” that are used for the roots of ownership trees:

  • thread,
  • self,
  • unique, and
  • immutable.

unique and immutable are included in this list for convenience. I’ll discuss them later.

-Trees

Every object has just one owner for life, a condition necessary to create ownership trees that can be checked at compile time. Every tree has a single root and a lock is attached to that root, if needed.

The ownership information is embedded in the type of the object. Using this information, the compiler is able to deduce which lock must be held while accessing that object, and what kind of aliasing is allowed. All races (access to mutable shared variables without locking) are detected at compile time. I’ll sketch a proof later.

-What may be shared

Only immutable objects or objects rooted with a self-owned object may be shared between threads.

Additionally, objects whose direct owner is self (such objects are called monitors) may have multiple aliases while being shared. Monitors may own (and protect) other objects that are not monitors.

-Locking

The compiler will make sure that access to an object can only happen when the root of its ownership tree is locked (symbolic owners other than self are considered locked at all times). Since an object may only have one lock associated with it (at the top of its ownership tree), this condition is enough to ensure freedom from races.

Proof: I have to show that when a (mutable) object is seen by more than one thread, each access to it (read or write) is always protected by the same lock. Indeed, for an object to be shared between threads, the root of its ownership tree must be self, hence the top object must be a monitor. This monitor’s lock is always, automatically or explicitly, taken before accessing any member of the tree. The compiler knows which lock to take because the ownership information is encoded in the type of the object.

Introducing ownership annotations

Ownership is specified at the instance level (although it may be restricted at the class level). The previous example, which relied on default assignment of owners, is equivalent to the more explicit instance-level specification (that you will never see in actual programs):

Foo<owner::thread> foo = new Foo<owner::thread>;

This declares and constructs foo as being owned by the symbolic owner, thread. The embedded object _bar‘s owner is foo.

-Creating a monitor

A self-owned object is a monitor (I will alternate between the notation using the shared type modifier and the explicit owner annotation, <owner::self>). It contains a hidden lock and its methods are, by default, synchronized. Continuing with my example:

auto fooMon = new shared Foo;
// The same as:
// auto fooMon = new Foo<owner::self>;
fooMon.doWork();

The variable fooMon is a monitor and the doWork method is implicitly synchronized. The object _bar is now owned by fooMon. Its type can be expressed (this is rarely needed; see, however, the example of external ownership) as:

Bar<owner::fooMon>

Types parameterized by runtime entities (fooMon is a runtime handle) are known in programming language theory as dependent types.

Notice that I’m using the same class to create thread-local and shared instances. This is usually possible unless there is a specific restriction at the class level.

Note to D programmers: The current semantics of D “shared” is slightly different from my proposal. For instance, it forces all embedded objects to be monitors (their methods must be synchronized by their own lock), requires explicit use of the synchronized keyword, and forces all access in non-synchronized methods to be sequentially consistent. (And it doesn’t guarantee freedom from races.)

Thread-local objects

The special thread owner, which is the owner of all thread-local objects, is conceptually always locked, so thread-local objects don’t require any synchronization. Also, thread is the default owner so, in the absence of any ownership annotations, all objects are thread-local. That’s one of the defaults that makes single-threaded programs work as-is.

Here’s an interesting twist–global and static objects are by default thread-local. This part has been implemented in D, uncovering a number of threading bugs in the process.

Monitors

The special self owner (or the shared type modifier) is used to create monitor objects. A monitor has a built-in lock and all its public methods are by default synchronized.

As always with defaults, the language must provide a (preferably safe) way to bypass them. To prevent locking, a method may be explicitly marked as lockfree. The compiler is obliged to check that the lockfree method doesn’t access the object’s members in an unsafe way (although it can’t prevent high-level races on lockfree variables). That restricts the lockfree constructs to those that don’t require whole-program analysis to prove their safety.

The lockfree annotation is essential for, among others, the double-checked locking pattern (DCLP). I showed its implementation as a teaser in my previous post.

Subobjects

As I explained earlier, data members of an object are by default owned by that object. This way they inherit the root owner from their parent. This is another default that makes single-threaded programs run without additional qualifiers.

Notice that there are two important aspects of ownership, the direct owner and the root owner, which might be different. The direct owner is used in type-checking, the root owner in deciding which synchronization method to use. Both are known or inferred during compilation.

As usual, the defaults may be overridden. For instance, you may embed a monitor in a thread-local object by qualifying it as self-owned/shared:

class Holder {
    private Mon<owner::self> _mon;
}

or, in common notation, as shared:

class Holder {
    private shared Mon _mon;
}

Here, _mon is not owned by Holder (the default has been overridden) so it doesn’t inherit its root owner. Its methods are synchronized by its own lock. As you can see, the ownership tree does not always reflect embedding. An embedded monitor starts a new tree.

Well, the situation is a bit subtler. Objects in Java or D have reference semantics, so there is a hidden pointer, or handle, in the code above. Accessing the handle is not the same as accessing the object proper. Consider this example:

class Holder {
    private shared Mon _mon;
    public void setMon(shared Mon newMon) {
        _mon = newMon;
    }
}

Let’s instantiate a self-owned Holder and a self-owned Mon:

auto holder = new shared Holder;
auto newMon = new shared Mon;
holder.setMon(newMon);

Since holder is itself a monitor, the setMon method is automatically synchronized by its lock (it must be!). Therefore, strictly speaking, the handle part of _mon is owned by holder, whereas the object-proper part is self-owned.

You cannot embed a thread-owned object inside a monitor–the compiler would flag it as an error. This is part of alias control–a thread-local object might possibly have thread-local aliases that may be accessed without locking. Being part of a monitor, it could then migrate to another thread and cause a race.

What if a subobject is accessed directly (not through a method)? This may happen when the subobject is declared public:

class Foo {
    public Bar _bar;
}

In that case not all uses of _bar are allowed. Consider this:

auto foo = new shared Foo;
foo._bar.m(); // error

Access to _bar must happen only when foo is locked. The compiler knows it because the full type of _bar is:

Bar<owner::foo>

Here’s the corrected code:

synchronized(foo) {
    foo._bar.m();
}

An even better solution is to make _bar private and provide appropriate methods to access it. Those methods would be automatically synchronized for a shared foo.

unique and immutable

I discussed unique objects in one of my previous posts. Although not strictly required in the ownership scheme, uniqueness allows for very efficient and safe transmission of large objects between threads. It makes sense to include unique as another symbolic root owner, since its multithreaded semantics differs from that of other types and it doesn’t require locking.

Some languages, including D, define immutable objects, which cannot be modified after creation. Such objects may be freely shared and passed by reference between threads. Again, immutable may be used as a root owner.

Example

With the preliminaries out of the way, I can now explain in more detail the workings of the teaser from my previous post. Here’s the definition of the class MVar:

class MVar<T> {
private:
    T    _msg;
    bool _full;
public:
    void put(T msg) {
        _msg := msg; // move
        _full = true;
        notify();
    }
    T take() {
        while (!_full)
            wait();
        _full = false;
        return := _msg;
    }
}

First, let’s instantiate MVar as a shared (self-owned) monitor that is used to pass unique objects of class Foo as messages:

auto chanUnique = new shared MVar<unique Foo>;

The type of _msg in this instantiation is unique Foo, which is the same as Foo<owner::unique>. The method put takes unique Foo, so the following code is type-correct:

auto foo = new unique Foo;
chanUnique.put(:= foo); // move foo

Notice that unique objects cannot be assigned or passed by value–they have to be moved, hence the use of the move operator, :=. Internally, the method put also uses the move operator (good thinking on the part of the designer–otherwise MVar couldn’t be instantiated with unique). What’s interesting about this example is that messages are not deep-copied between threads. They are safely passed by reference.

Since chanUnique is self-owned (shared), both put and take are automatically synchronized.

Now let’s access chanUnique from another thread:

// another thread
unique Foo f2 = chanUnique.take(); // implicit move of rvalue

The return type of take is unique Foo, so the types check. I could have used the move operator, but since the right-hand side is an rvalue, the compiler lets me use the assignment.

Now for the tricky case: What’s wrong with this code?

auto mVar = new shared MVar<Foo>;
auto myFoo = new Foo;
mVar.put(myFoo);
myFoo.unsyncMethod(); // ouch!

Since myFoo is created as thread-local (that’s the default), its methods are not synchronized. If I were able to pass it to shared MVar, another thread could obtain it through get. It could then call the unsynchronized method unsyncMethod at the moment when I was calling it. A data race would be possible! Or would it?

Guess what–the compiler won’t let you shoot yourself in the foot. It will notice that it would have to instantiate a shared object mVar with a thread-local member _msg. That’s against the rules! (A shared object cannot own a thread-local object.)

External ownership

In the original GRFJ paper the authors showed an example where one object was owned by another object without the former being embedded in the latter. They made an observation that, for the purpose of locking, the ownership relationship must be unchangeable: You can’t switch the owner on the fly. Therefore external ownership is allowed only if the owner is declared final.

final shared Lock lockObj = new shared Lock;
auto foo = new Foo<owner::lockObj>;
auto bar = new Bar<owner::lockObj>;

In this case, the compiler will only allow access to foo under the lock of lockObj, as in:

synchronized(lockObj) {
    foo.method();
    bar.method();
}

This construct is useful in situations where the locking discipline is not easily convertible to an object hierarchy.

Conclusion

You might have noticed my use of dual notation. Most user code would be written with type qualifiers such as shared, unique, or immutable. However, in some cases I used an alternative notation that looked more like the specification of template parameters: <owner::self>, <owner::unique>, <owner::immutable>, or even <owner::thread> (in D they would be surrounded by !( and )). This was not meant to further confuse the reader, but as a gentle introduction to qualifier polymorphism, which I will describe in the next installment. I will show how classes and methods may be parameterized with different types of ownership, cutting down code duplication.

I’d like to thank Andrei Alexandrescu, Walter Bright, Sean Kelly and Jason House for very helpful comments. I’m also indebted to the D community for discussing my previous posts.

Bibliography

  1. C. Boyapati, M. Rinard, A Parameterized Type System for Race-Free Java Programs.
  2. C. Flanagan, M. Abadi, Object Types against Races.

Most languages were caught unaware by the multicore revolution. C++, predictably, developed a portable assembly language for multiprocessors (C++0x atomics). Out of sheer desperation some programmers turned to Erlang. Many still try to ignore the elephant in the room, while others try to shoot it with BB guns provided by their favorite languages.

I’ve looked at many languages and decided that none provided the right combination of performance and safety. We are still waiting for the next big one to break into the multicore territory.

Towards this goal, I’ll present a series of posts in which I’m going to develop a threading model for that hypothetical language. The model is based on several papers that I reviewed in my previous blog posts. My own contribution is putting the best of those ideas together into one package. I will use syntax similar to that of the D programming language, but C++ and Java programmers shouldn’t have problems following it.

Teaser #1

In one of my previous posts I described a concurrent data structure based on Haskell’s MVar. It’s essentially a one-element message queue. Let’s have a peek at the implementation that uses my proposed multithreaded scheme. Notice the total absence of special annotations–even the synchronized keyword is missing. The only unusual thing is the move operator, :=, introduced in my previous blog. It is needed in case we want to pass unique objects through MVar. You are probably asking yourself, How the heck is this supposed to work in a multithreaded environment? Read on!

class MVar<T> {
private:
    T    _msg;
    bool _full;
public:
    // put: asynchronous (non-blocking)
    // Precondition: MVar must be empty
    void put(T msg) {
        assert (!_full);
        _msg := msg; // move
        _full = true;
        notify();
    }
    // take: If empty, blocks until full.
    // Removes the message and switches state to empty
    T take() {
        while (!_full)
            wait();
        _full = false;
        return := _msg;
    }
}

Let’s instantiate this template as a monitor–that is, an object accessible from multiple threads. We do it by specifying the owner as “self” (the owner is not explicitly listed as a template parameter in the definition of MVar, but it can be passed to the template during instantiation).

auto mVar = new MVar<owner::self, int>;

In a self-owned object all methods are automatically synchronized. The move operator acting on an int simply copies it.

That was easy! How about instantiating a self-owned MVar for passing unique objects? Piece of cake!

auto q = new MVar<owner::self, unique Foo>;

auto foo = new unique Foo;
q.put(:= foo); // move foo
assert (foo is null);

// another thread
unique Foo f2 = q.take(); // implicit move of rvalue

So far data races have been avoided by implicit synchronization in self-owned objects and by the use of move semantics for unique objects.
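For comparison, here is a hand-written C++0x approximation of such a self-owned MVar carrying unique objects, using std::unique_ptr, std::mutex, and std::condition_variable to spell out the locking and the moves that the proposed compiler would insert automatically (the names and details are mine, for illustration only):

#include <condition_variable>
#include <memory>
#include <mutex>
#include <utility>

template<class T>
class MVarUnique {
    std::unique_ptr<T>      _msg;
    bool                    _full;
    std::mutex              _mutex;
    std::condition_variable _cond;
public:
    MVarUnique() : _full(false) {}
    // put: asynchronous; precondition (as in the teaser) is that the MVar is empty
    void put(std::unique_ptr<T> msg) {
        std::unique_lock<std::mutex> lock(_mutex);
        _msg = std::move(msg);   // move, not copy
        _full = true;
        _cond.notify_one();
    }
    // take: blocks until full, then moves the message out
    std::unique_ptr<T> take() {
        std::unique_lock<std::mutex> lock(_mutex);
        while (!_full)
            _cond.wait(lock);
        _full = false;
        return std::move(_msg);
    }
};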

Of course you know how to break the MVar, right? Instantiate it with regular, non-unique class objects and watch aliases spread through threads. Let’s try it:

auto mVar = new MVar<owner::self, Foo>;
auto myFoo = new Foo;
mVar.put(myFoo);
// now let me race through my local alias!
myFoo.method();

Surprise! This code will not compile because a self-owned object cannot have a thread-local member. Because of a richer type system, the compiler can easily detect such violations. I’ll explain the details of the ownership scheme in my future posts–for now let me assure you that, no matter how hard you try, you can’t create a data race in this system (unless you bypass it with explicit casts).

How to make concurrent programming safer?

I propose to do this by extending the type system. (You’ve already seen some examples above.) I realize that there is strong resistance to extending a language’s type system. The main problem is increased complexity. Programmers are reluctant to study and use complex annotation schemes. I will argue that, in the case of multithreading, the complexity argument is reversed. Shared-memory multithreading is hard! Figuring out how to avoid all the pitfalls of sharing is harder than learning a type system that prevents them.

If a programmer is not willing to learn or cannot understand the race-free type system, he or she should not be writing multithreaded programs.

How to limit complexity?

To limit the complexity of the type system, I came up with several simple principles. The most important one is that multithreaded extensions should not require modifications to single-threaded programs. That means coming up with reasonable defaults for all new annotations.

If necessary, the compiler should be able to derive some of the defaults from program analysis. Because modularity is very important, such analysis should not creep across compilation unit boundaries. Whole-program analysis is out of the question.

How to keep goals reasonable?

There are two major challenges in multithreaded programming:

  1. Avoiding races
  2. Preventing deadlocks

The first problem is approachable, whereas the second seems to be pie in the sky. I will therefore concentrate on designing a race-free system that is based on existing academic research.

Such a system would be incomplete without some support for lock-free programming. There are very good reasons for the language to enforce sequential consistency. For comparison, the C++ committee, after a long struggle, decided to allow non-SC primitives (weak atomics). Java only enforces SC for volatile variables.
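As a point of reference, with C++0x atomics the default operations are sequentially consistent, while explicitly ordered ones are the “weak” variety. A tiny illustrative snippet (the variable and function names are made up):

#include <atomic>

std::atomic<int> counter(0);

void bump() {
    counter.fetch_add(1);                             // sequentially consistent (the default)
    counter.fetch_add(1, std::memory_order_relaxed);  // weak atomic: no SC guarantee
}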

Teaser #2

Speaking of lock-free programming, here’s the implementation of the infamous Double-Checked Locking Pattern:

class Singleton<T>
{
private:
    lockfree T _value;
public:
    T get() lockfree
    {
        if (_value is null)
        {
            synchronized(this)
            {
                if (_value is null)
                    _value = new T;
            }
        }
        return _value;
    }
}

A lockfree variable is guaranteed to always be accessed atomically and in a sequentially consistent way. A lockfree method is not synchronized, but it can only operate on lockfree variables (or use a synchronized section). Even though the use of lockfree doesn’t introduce low-level races, it may create high-level races when accessing more than one lockfree variable at a time. But we all know that lock-free programming is only for desperadoes.
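For comparison, here is roughly what the same pattern looks like when written by hand with C++0x atomics and a mutex. The sketch is mine, glosses over object destruction, and is only meant to show the two sequentially consistent accesses that the lockfree qualifier would imply:

#include <atomic>
#include <mutex>

template<class T>
class Singleton {
    std::atomic<T*> _value;
    std::mutex      _mutex;
public:
    Singleton() : _value(nullptr) {}
    T* get() {
        T* p = _value.load();              // sequentially consistent read
        if (p == nullptr) {
            std::lock_guard<std::mutex> guard(_mutex);
            p = _value.load();             // re-check under the lock
            if (p == nullptr) {
                p = new T;
                _value.store(p);           // sequentially consistent publish
            }
        }
        return p;
    }
};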

Lessons learned from previous work

There are many commonalities in various approaches to race freedom.

  • Ownership should be built into the type system. In OO languages each object must have an owner. In C-like languages each piece of data must have an associated lock.
  • There should be efficient mechanisms for passing data between threads
    • By value
    • By reference. (The object being passed must be a monitor.)
    • As immutable data, by const reference
    • By unique reference using move semantics

Most of those ideas are expressible through type qualifiers.

Higher-level models

Explicit synchronization is hard, no doubt about it. In my proposal, the type system makes it safe. There are however other models of concurrency that, although more restrictive, are often easier to use.

One such model is based on message passing. In my opinion, support for message passing should be library-based. The race-free type system will make it safe and flexible. It will, for instance, support passing messages not only by value but also by reference–without sacrificing safety. In traditional type systems there is no way to express the requirement that a message passed by reference must either be a monitor itself (have its own lock) or behave like a C++ unique_ptr (an object that leaves no aliases behind). This requirement should be expressible through the type system, allowing the compiler to check for its violations.

I’ve been paying a lot of attention to software transactional memory (STM). I believe that STM could be built into the language–again, with the support of the type system. It looks like the hardest problem with STM is how it should inter-operate with other types of multithreaded access (both locked and lock-free). The simplest solution is to enforce full isolation–no STM object can ever be accessed outside of a transaction. It’s not clear how practical such an approach is, but it definitely simplifies things to the point that STM becomes orthogonal to other types of synchronization. And that means it will be possible to tackle it at a later time, after the race-free multithreading model is firmly established.

In the next installment, I will describe the ownership scheme that is the foundation of the race-free type system.


Bibliography

C. Flanagan, M. Abadi, Object Types against Races. The seminal paper introducing ownership-based type systems.

See also my previous posts on Guava and GRFJ that discuss race free dialects of Java.


I had been using auto_ptrs before they were available in C++ (I made my own). I wrote several articles about resource management, in which auto_ptr played a prominent role. auto_ptr had always had many flaws, but it was a workhorse of memory management in a language that shuns garbage collection. Many of the flaws have been understood over the years and, in the 0x version of C++, auto_ptr was supplanted by the new, improved unique_ptr. Using the latest features of C++, like rvalue references, unique_ptr implements move semantics in a consistent manner. You can now store unique_ptrs in most containers and apply many algorithms to them.

So why am I not jumping for joy? Because I know how much more can be done.

But first let me summarize the idea behind the unique pointer. It is a (smart) pointer that is the only reference to the object it’s pointing to. In particular, you cannot make a copy of a unique pointer–if you could, you’d end up with two references to the same object. You can only move it (hence the move semantics), making the source of the move invalid. With the older auto_ptr the moving was done quietly during assignment or pass-by-value. The problem with that arrangement was that the source auto_ptr would suddenly turn to null, which was sometimes confusing and occasionally led to access violations–as in this example:

void f(auto_ptr<Foo> foo); // pass by value

auto_ptr<Foo> pFoo (new Foo());
f(pFoo); // pass by value nulls the internal pointer
pFoo->method(); // access violation

The improvement provided by unique_ptr is to require an explicit move, to sensitize the programmer to the fact that the source of move becomes invalid. Still, the following code will compile, although the bug is much more prominent:

void f(unique_ptr<Foo> foo);

unique_ptr<Foo> pFoo (new Foo());
f(move(pFoo)); // explicit move
pFoo->method(); // access violation

A bigger problem is that there is no guarantee that a unique_ptr indeed stores the unique reference to an object. To begin with, unique_ptr can be constructed from any pointer. That pointer might have already been aliased, as in:

void f(unique_ptr<Foo> foo) {
    // destructor of foo called at the end of scope
}

Foo * foo = new Foo();
unique_ptr<Foo> upFoo(foo); 
// foo now aliases the contents of upFoo
f(move(upFoo)); // explicit move
foo->method(); // undefined behavior

There is also an obvious backdoor in the form of the method unique_ptr::get, which can be used to spread new aliases. This can be particularly insidious when you have to pass the result of get to a foreign function:

void f(Foo * pf) {
    globalFoo = pf; // creates a global alias
}

unique_ptr<Foo> pFoo(new Foo());
f(pFoo.get()); // leaks an alias

Finally, it’s possible to create aliases to data members of the object stored in unique_ptr, as well as assign aliased references to its data members. Consider this example:

class Foo {
public:
    ~Foo() { delete _bar; }
    Bar * _bar;
};

Bar * pBar = new Bar();
unique_ptr<Foo> upFoo(new Foo());
upFoo->_bar = pBar;
// pBar is now an alias to a member of Foo
upFoo.reset(); // deletes _bar inside foo
pBar->method(); // undefined behavior

All this adds up to quite a number of ways to shoot yourself in the foot! Still, if the only problems were deterministic access violations, I could live with them (I’m a very disciplined programmer). They are reasonably easy to reproduce and can be debugged using standard methods (code coverage). But now there is a new elephant in the room–multithreading.

The beauty of unique objects is that they can be safely passed (moved) between threads and they never require locking. Indeed, since only one thread at a time can reference them, there is no danger of data races. Except when, as in the examples above, aliases are inadvertently leaked from unique_ptr. Accessing an object through such aliases from more than one thread without locking is a very nasty bug, usually very hard to reproduce.

Type system approach

Okay, so what’s the panacea? I’m glad you asked–the type system, of course! Make unique a type qualifier and all your troubles are gone. Notice that the compiler has no idea about the semantics of some library class called unique_ptr, but it can be made aware of the semantics of the unique type qualifier. What’s more, it can enforce this semantics from cradle to grave–from construction to destruction. (I know this proposal has no chance to be incorporated into C++, but it might be useful in, let’s say, the D programming language.)

You don’t construct a unique pointer from an arbitrary pointer. You just construct the object as unique (I’m still using C++-like syntax):

unique Foo * foo = new unique Foo();

Now that the compiler knows the meaning of unique, it can, for instance, detect if a unique pointer is accessed after having been moved. The compiler might prevent such an invalid pointer from being dereferenced or passed to other functions. Such bugs can be detected at compile time, rather than through access violations at runtime.

Do we need separate syntax for move? It might seem that using the assignment syntax (the equal sign) wouldn’t be so bad, as long as the compiler prevents us from dereferencing the null pointer left behind after a move. However, another problem arises when you start instantiating templates with unique pointers. Some containers and algorithms will work out of the box. Some will refuse to be instantiated because internally they try to access unique pointers after the move. But there is a third category of templates that, although move-correct, will generate unexpected results–like vectors with null unique pointers. Unfortunately, there are situations where the compiler has limited vision (i.e., it cannot do whole-program analysis) and can miss a null unique pointer dereference. For instance, it can’t prevent the use of a vector after a random element has been moved out of it.
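Today’s std::unique_ptr illustrates the problem. The following small C++ program compiles cleanly, yet leaves a null element behind in the vector and then dereferences it:

#include <memory>
#include <utility>
#include <vector>

struct Widget { void method() {} };

int main() {
    std::vector<std::unique_ptr<Widget>> v;
    v.push_back(std::unique_ptr<Widget>(new Widget));
    v.push_back(std::unique_ptr<Widget>(new Widget));

    // Move an element out of the vector; v[0] is now null.
    std::unique_ptr<Widget> taken = std::move(v[0]);

    // Compiles without complaint, but dereferences a null pointer:
    // the compiler cannot see that this element has been moved from.
    v[0]->method();
}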

Let’s assume that we have a special syntax for move, say :=. Here’s a simple example of its use:

unique Foo * foo1 = new unique Foo();
unique Foo * foo2 := foo1; // move
foo1->method(); // compile-time error: foo1 invalid after move

In C++0x, a lot of containers and algorithms have been rewritten to use the explicit move instead of the assignment. It wouldn’t be a big deal to use := instead. Notice that the compiler would not allow the use of a regular assignment on unique pointers (unless they are rvalues), so the templates that are not ready for move semantics would refuse to get instantiated with unique template parameters.
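std::unique_ptr in C++0x already behaves this way: move-aware code compiles, while copy-based code is rejected because unique_ptr has no copy operations. A small illustration:

#include <algorithm>
#include <memory>
#include <utility>
#include <vector>

int main() {
    std::vector<std::unique_ptr<int>> src, dst(2);
    src.push_back(std::unique_ptr<int>(new int(1)));

    dst[0] = std::move(src[0]);   // move-aware code compiles

    // Copy-based code refuses to compile, because unique_ptr has no
    // copy assignment; these lines are errors if uncommented:
    // dst[1] = src[0];
    // std::copy(src.begin(), src.end(), dst.begin());
}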

Obviously, the move operator applied to a non-unique pointer must also work, otherwise we’d have to write separate templates for unique and non-unique template arguments.

We also have to use our special notation when passing unique arguments to functions, as in:

void f(unique Foo *);

unique Foo * foo = new unique Foo();
f(:= foo); // move
foo->method(); // compile-time error: foo invalid after move

When returning an lvalue unique pointer (for instance a member of a data structure), we use the notation return := foo;. The move operator may be elided when returning a local unique pointer, since the source ceases to exist upon the return (in general, move is implied when the source is a unique rvalue).

The biggest strength of the type-based approach is that the compiler can reliably prevent the aliasing of unique pointers. An assignment (as opposed to move) of an (lvalue) unique pointer to either unique or non-unique pointer is impossible. Taking the address of a unique pointer is forbidden too.

That leaves us with an interesting dilemma: How do you call a (library) function that takes a regular pointer if all you have at your disposal is a unique pointer? C++ unique_ptr lets you do it through its method get. But we know how dangerous get is as far as aliasing goes.

If you don’t have the source code for the function you are calling, you probably shouldn’t be calling it with a unique pointer, because you have no idea what this function will do with it (store an alias in a global variable?). If you know the implementer of the function and he/she is willing to donate his/her kidney in case the function leaks aliases, you may risk casting away the uniqueness.

There is however a way for a function to guarantee that it’s alias-safe by declaring its parameters lent. The compiler will check the body of the function to make sure it doesn’t leak aliases to any part of the lent parameter, nor does it store non-unique (and possibly aliased) objects inside the lent object. Of course, the function can only pass the lent parameter (or any sub-part of it) to another function that makes the same guarantee.

It’s not obvious which should be the default: lent or its opposite, owned. If lent were the default, the compiler would be flagging a lot of valid single-threaded code (although it makes sense to assume that methods of a monitor object take lent parameters by default).

The relationship between unique and lent is very similar to that between immutable and const in D. If you declare data as unique or immutable you can safely pass it to a function that declares its parameter as lent or const, respectively. lent guarantees that the parameter will not be aliased, const that it won’t be mutated.

Speaking of immutable–there’s always been a problem with constructing non-trivial immutable objects. The tricky part is that during construction we often need to explicitly modify the object, but we don’t want to leak non-const aliases to it or to its sub-parts. And now we have a new tool at hand to control aliasing–unique pointers. Instead of constructing an immutable object, you may construct a unique object, with all the guarantees about aliasing, and then move it to an immutable pointer. Just like this:

unique Foo * pFoo = new unique Foo();
immutable Foo * imFoo := pFoo;

(Of course, C++ doesn’t have immutable types either, but D does.)

By the way, you can always safely move a unique pointer to any of immutable, const, or regular pointers. Notice that it doesn’t mean that unique is a supertype of, for instance, immutable. You can’t pass unique where immutable is expected (you don’t want read-only aliases to escape to another thread!)–you can only move it.
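A rough C++ approximation of this construction pattern builds the object through a unique_ptr and then moves it into a pointer-to-const handle. C++ const is, of course, much weaker than true immutability, so this is only an analogy (the type Foo and the function name are made up):

#include <memory>
#include <utility>

struct Foo { int x; Foo() : x(0) {} };

std::shared_ptr<const Foo> makeFrozenFoo() {
    // Build the object through a unique handle; no other aliases exist.
    std::unique_ptr<Foo> p(new Foo);
    p->x = 42;                       // mutate freely during construction
    // Hand it off as read-only; the unique handle is left null.
    return std::shared_ptr<const Foo>(std::move(p));
}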

A method that promises not to leak aliases to this is declared as lent. The compiler will detect any violations of this promise, as in:

class Foo {
    Bar * _bar;
public:
    Bar * get() lent {
        return _bar; // error: a lent method returning an alias
    }
};

In general, when you are dealing with a unique object, you may only call its lent methods.

Issues

Strict uniqueness imposes constraints on the internal structure of objects. Imagine creating a unique doubly-linked list. A link in such a list must be accessible from two places: from the previous link and from the next link. The scheme that absolutely prevents the creation of aliases to unique objects will not allow the creation of doubly-linked lists–and rightly so! Imagine moving a link out of the list: you’ll end up with a null pointer in the list, and a possible cross-thread aliasing if the (unique) link migrates to another thread.

There is a solution to this problem based on the ownership scheme (like the one used in Guava or GRFJ), which allows cross-aliasing of objects that share the same owner (e.g., all links are owned by the list). Such aliases cannot be leaked because the compiler won’t allow objects owned by a unique object to be moved. But that’s a topic for another blog post.

The use of iterators on unique containers is also highly restricted, but there are other ways of iterating that are inherently more thread-safe.

Conclusion

Any extension to the type system is usually met with some resistance from the user community. While some appreciate the added safety, others complain about added complexity. So it’s very important to be able to limit this complexity to specific areas.

When are unique and lent qualifiers really needed? In C++, at least in my experience, unique_ptr serves mostly as a poor man’s garbage collector. Since memory management permeates all C++ programs, the switch to using unique qualifiers would require a lot of changes even in single-threaded programs. In contrast, in garbage-collected languages, the main usefulness of unique is for multithreaded programming. A message queue that passes large unique objects between threads is more efficient than the one that deep-copies them. And unique objects never require locking.

When multithreading is involved, a safer type system doesn’t really add complexity. It translates the hard complexity of shared-memory concurrency into something that is more approachable by mere mortals.

As always, the introduction of new type modifiers may lead to code duplication (as in the case of const and non-const accessors). This problem can be dealt with using qualifier polymorphism–a topic for a whole new blog post.

I understand that this post might be hard to follow. That doesn’t mean that the uniqueness scheme is difficult to understand. My task was not only to explain the hows, but also the whys–and that requires giving a lot of background.

Bibliography

  1. Jonathan Aldrich, Valentin Kostadinov, Craig Chambers, Alias Annotations for Program Understanding. Describes unique and lent annotations in detail.
  2. See also my previous blog on Generic Race-Free Java.

Making the C language race-free is quite a challenge. I discussed one approach in my previous post. Here’s another one, based on lightweight annotations and whole-program analysis.

SharC is a sharing checker and program transformer for C. It is described in the paper SharC: Checking Data Sharing Strategies for Multithreaded C.

Sharing modes

The goal of SharC is to detect (and possibly eliminate) data races from legacy C programs. This is an incremental process in which the programmer annotates the program with type qualifiers, runs the checker, listens to its complaints, adds more annotations, etc., until SharC is totally satisfied and produces an instrumented version of the program with additional built-in runtime checks.

Some of the annotations that specify sharing modes are similar to those of Cyclone, but there are some new ones.

  • private: Thread-local
  • readonly: Does not change (immutable). Interestingly, it may be cast to another sharing mode that is writable.
  • locked(lock): Shared under specific lock. The lock must be readonly.
  • racy: Intentionally racy. No compile- or runtime checks.
  • dynamic: May change sharing mode at runtime, at which point it’s checked for readonly or single-thread access.

Note that SharC provides ways to move data between various sharing modes. The key to these operations is a very clever reference counting scheme built into the memory allocator. The paper provides the details of that scheme, but I’ll skip them here.

The overall policy of SharC is to consider almost all sharing (even under locks) illegal unless explicitly declared by the user.

Safe cast

Dynamic types can be cast between sharing modes. Here’s an example:

int readonly * y;
...
x = SCAST(int private *, y);

A pointer to a readonly int (which may be shared) is cast to a pointer to private (thread-only) int. Normally such cast would be unsafe. The int might be referenced by multiple readonly pointers (potentially shared between threads). The clients of those pointers don’t expect the value to ever change and may access it without locking. That’s why SCAST not only nulls the original pointer y, but also checks if the data reference count is zero (using the clever reference counting scheme I mentioned before). If it’s not, it reports an error.

Example

The paper offers a more elaborate example, which shows the use of dynamic data types and checked casting. The example is based on a pattern typical for multimedia processing and games.

A data buffer is processed in stages and passed from thread to thread.

We start with a linked list of stages. Each stage is run by a separate thread. The first stage is fed some data in a buffer (for instance a frame of a movie). The thread responsible for that stage starts its own processing. When it’s done, it hands the buffer over to the next stage, and so on. The buffer is like a car on an assembly line.

When the assembly line is created, each stage is assigned a different processing function through the function pointer fun (see code below).

You may also think of a stage as a one-element synchronous channel, with the buffer being the message.

What is important in this example is that we don’t want to copy a potentially very large buffer. We want to pass it between threads by reference without risking data races.

This is a stage structure with all the necessary SharC annotations.

typedef struct stage(q) {
    struct stage dynamic * q next;
    cond racy * q cv;
    mutex racy * readonly mut;
    char locked(mut) * locked(mut) sdata;
    void (*q fun)(char private * private fdata);
} stage_t;

Don’t be put off by the sheer number of annotations. Most of them are defaults or can be easily inferred by SharC.

This is what the programmer must annotate:

  • The buffer sdata is shared and protected by mut.
  • The function pointed to by fun takes a private (thread-local) buffer, fdata.

Notice that, without casting, fun cannot be called with sdata, because sdata is shared.

What SharC infers is that:

  • Structure stage is parametrized by some sharing policy, q (which stands for “qualifier”). This qualifier will be provided when the variable of type stage is defined. (This is called qualifier polymorphism.)
  • All pointers inside stage, except for mut and sdata, inherit the qualifier q.
  • The field next, which is a pointer to the next stage, has two qualifiers. The pointer itself inherits the qualifier q from its parent. The data it points to is marked dynamic, which means that it will be checked during runtime to be either accessed by a single thread or immutable.
  • The pointers cv and fun also inherit the parent qualifier, q. Condition variable cv is declared racy to turn off compiler checks for sharing. cv is shared between threads but does not require locking.
  • The mutex object is also racy. The pointer to it though is readonly (immutable). This is an important detail, since it is unsafe to change the mutex during the lifetime of data it protects (which is sdata in this case).
  • sdata is a pointer to a char buffer. Both the pointer and the buffer it points to are protected by the mutex, mut.
  • Finally, the pointer fun inherits its qualifier from the parent. The function parameter, fdata, is a private pointer to a private buffer.

This is the beginning of the thread function, thrFunc:

void dynamic * private thrFunc(void dynamic * private d) {
    stage_t dynamic * private S = d;
    stage_t dynamic * private nextS = S->next;
    char private * private ldata;
    ...

The function takes and returns a private void pointer to dynamically shared data. In reality this pointer points to a stage structure–this is reflected in the definition of S. Notice that in C a void pointer can be converted to a typed pointer. In SharC, the compiler makes sure that the qualifiers match.

The next field of S inherits its qualifier private from S. The data it points to (the next stage) is dynamic, as per the definition of stage.

Finally there is a private pointer to a private buffer, which will be used to point to the shared buffer. Since in general private and shared don’t mix, a sharing cast will be performed in the body of thrFunc (see below).

Here’s the rest of thrFunc:

    ...
    while (notDone) {
        mutexLock(S->mut);
        while (S->sdata == NULL)
            condWait(S->cv,S->mut);
        ldata = SCAST(char private *, S->sdata);
        condSignal(S->cv);
        mutexUnlock(S->mut);
        S->fun(ldata);
        if (nextS) {
            mutexLock(nextS->mut);
            while (nextS->sdata)
                condWait(nextS->cv,nextS->mut);
            nextS->sdata =
                SCAST(char locked(nextS->mut) *, ldata);
            condSignal(nextS->cv);
            mutexUnlock(nextS->mut);
        }
    }
    return NULL;
}

The stage S that was passed to the thread function contains shared pointer sdata. To access this pointer we need to take the correct lock, S->mut. This is the lock that was specified in the declaration of sdata inside stage.

First we loop waiting for S->sdata to become non-null. Another thread is supposed to initialize that pointer. As long as the pointer is null, we block on the condition variable S->cv.

while (S->sdata == NULL)
    condWait(S->cv,S->mut);

Notice that condWait takes the mutex as its second argument. It has to unlock it while waiting for the signal, otherwise no other thread would have a chance to modify S->sdata (and we’d have a deadlock). (In the channel analogy, we are performing a synchronous receive.)

Now we are ready to call the function S->fun to process the buffer. However, that function is declared to take a private buffer, not a shared buffer. It could, for instance, be a library function that has no provisions for sharing. It could be strlen or a Fast Fourier Transform.

The programmer knows that it is safe to call this function, because the buffer has been handed over between threads and shared access is no longer in the picture. What remains is to convince the compiler that that’s okay, and to add some code that does necessary checks at runtime. All this is done by SCAST.

The line

ldata = SCAST(char private *, S->sdata);

casts S->sdata to a pointer to a private (thread-local) buffer and stores it in the private variable ldata (local data). It also nulls the source and makes sure there are no more references to it (a very clever reference counting scheme, remember?). If this looks to you like move semantics, you get the right idea.

Still under lock, we signal the previous thread that the shared buffer has been nulled so it can re-fill it.

We are now ready to release the lock and do some local processing. We call the function S->fun with the (now private) buffer.

If there is a next stage (pointed to by S->next), we want to pass our processed buffer to it. Before we do that, we have to wait until the buffer in the next stage becomes null. (In the channel analogy, we are now doing a synchronous send.)

while (nextS->sdata)
    condWait(nextS->cv,nextS->mut);

We have to do it under the next stage lock.

We assign our current buffer to the next stage and signal the next thread. Here, again, we have to perform a cast: our buffer is private, and nextS->sdata is shared and locked under nextS->mut. The statement:

nextS->sdata = SCAST(char locked(nextS->mut) *, ldata);

casts ldata (local data) to a pointer to char locked(nextS->mut). This is exactly the opposite of what we did earlier (casting shared data to local data). As before, the runtime has to null the source and check that there are no other references to it. This way thread-local unique data may be moved to another thread. If you followed my previous posts, you’ll notice the pattern: unique data, move semantics.

Conclusion

What is unique in the SharC approach is the presence of dynamically checked casts that are used to change sharing modes at runtime. This is probably the best that can be done on top of C.

When designing a new language, we could use different mechanisms to achieve the same goals. For instance, the buffer could be declared unique. The move semantics would then guarantee the absence of dangerous aliasing, thus eliminating the need for behind-the-scenes reference counting.

The problem of passing a unique object to an unsuspecting library function (without nulling the source) can be solved by declaring the function parameter as lent (which should be the default).
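
Here is a minimal sketch of that idea in standard C++, assuming unique ownership is modeled with std::unique_ptr; the process, stage, and nextStage names are made up for illustration, and "lent" access corresponds to passing a raw pointer that the callee promises not to stash.

#include <cstddef>
#include <cstring>
#include <memory>

// "Lent" access: the library function borrows the buffer for the duration
// of the call and must not store a reference to it anywhere.
std::size_t process(const char* buf) {
    return std::strlen(buf);
}

// The stage owns the buffer uniquely.
void stage(std::unique_ptr<char[]> buf) {
    process(buf.get());        // lend the buffer without giving up ownership
    // To hand the buffer to the next stage we would move it, which nulls
    // the source and leaves no aliases behind:
    // nextStage(std::move(buf));
}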

If you read this space on a semi-regular basis, you might have noticed that recently it turned into a series of “Cliff Notes for CS grad students.” That was because I tried to prepare myself for the task of designing a multithreading system and memory model for the D programming language. My future posts will (hopefully) track progress in that direction.


When I read Dan Grossman’s paper, Type-Safe Multithreading in Cyclone, I imagine Dan on a raft in the middle of a tropical cyclone, his threads soaked by the rain, frantically racing to build a shelter. Maybe a type system can’t protect you from the rain, but it sure can protect you from data races–even if the final product looks a little like a Rube Goldberg contraption.

Cyclone, as it turns out, is a safe dialect of C. Dan’s goal was to extend Cyclone’s type system to make it safe in a multithreaded environment. This is not an easy task in a language that uses pointers and manual memory management. The ideas are similar to those in the dialects of Java in my previous posts, but the type-theoretical concepts are pretty advanced.

-Locks

A Cyclone lock is a special type parameterized by name. Defining such entities requires a major type-system tour de force. Little known notions like “existential types,” “dependent types,” and “effect systems” come into play. The major point of all this magic is that you want to use lock names without knowing what they are.

You create a Cyclone lock using the following pseudo-syntax:

let lk<λ> = newlock();

I’ll use Greek letters for symbolic lock names (except for the special name, loc), since the author didn’t propose any concrete syntax and, admittedly, never finished the implementation.

What’s happening behind the scenes is that newlock returns an existential type, which is unpacked into variable lk of type lock_t<λ>. That variable may, for instance, be passed to a (generic) function taking lock_t<λ>.

The beauty of an existential type is that it won’t let the client “guess” the name of the lock. The name is totally opaque–there is not even a type for it. It can only be generated by the system in the call to newlock.

To synchronize a (possibly compound) statement, you precede it with sync(lk), where lk is a lock variable (or an expression that evaluates to such). Continuing the previous example, we may write:

sync(lk) ++*p;

which increments the value pointed to by p under the lock lk.

-Ownership

In an OO language, ownership is the key to a race-free type system. The owner of a given object has to be locked before the object may be accessed.

In a non-OO language, like Cyclone, instead of specifying the owner of data, you specify the lock that protects it. Since you might not have access to the lock variable when you are declaring data, the symbolic lock name is used instead.

Also, since all references in Cyclone are expressed through pointers, lock annotations are associated with pointers. (This is again parallel to annotating object references with their owner objects.)

We can now complete the example above with the annotated definition of p:

int *λ p = new 0;

The pointer p can only be accessed when the lock named λ is held.
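
For intuition, here’s a rough C++ analogue of that idea (my own sketch, not Cyclone): bundle the data with the mutex that protects it in one type, so the only way to touch the value is under the lock. The guarded class and its with_lock method are hypothetical names.

#include <mutex>
#include <utility>

// Hypothetical helper: pair a value with the mutex that protects it.
template <class T>
class guarded {
    std::mutex mut_;
    T value_;
public:
    explicit guarded(T v) : value_(std::move(v)) {}

    // All access goes through a callback that runs while the lock is held.
    template <class F>
    void with_lock(F f) {
        std::lock_guard<std::mutex> hold(mut_);
        f(value_);
    }
};

guarded<int> p(0);

void increment() {
    p.with_lock([](int& v) { ++v; });   // roughly the shape of: sync(lk) ++*p;
}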

-Local and Shared data

In Cyclone, thread-local data appears as a special case of shared data. A special lock name loc and a (global) variable nonlock are used to turn off the locking of individual instances. (Compare this to the dummy owner thisThread in the OO approach.)

Lock names are also used to parameterize functions and structs. In that case it may be necessary to encode the “sharability” of the type in the lock name’s kind. A lock name of kind LS protects Shared data. A name of kind LU, on the other hand, may be Unsharable, i.e., it could be instantiated with the thread-local loc. LS is a sub-kind of LU.

The main purpose of “kinding” is to ensure that thread-local data can never be shared between threads.

-Example

All this sounds very dry without an actual (however contrived) example. Let’s start with a simple function inc:

void inc<λ:LU>(int *λ p; {λ}) 
{
    *p = *p + 1;
}

The function is parameterized by a lock name parameter λ of the LU kind. It means that it can be instantiated in both shared and thread-local contexts. The lock-name variable λ first appears in the declaration of the pointer parameter p. Data pointed to by p is protected by the lock named λ. That’s not telling us much–every pointer in Cyclone is protected by some lock (or nonlock).

What makes the difference is that the same name λ appears in the effects part of the declaration. The runtime system keeps track of what locks are being held at any point in the program, and passes this information to every function. Here, {λ} is an effect that contains one lock name, λ. In other words, the lock that protects p must be taken before calling inc. (Of course, other locks might be held too.)

Here’s an example of how inc may be called from another function, inc2.

void inc2<λ:LU>(lock_t<λ> plk, int *λ p ;{}) 
{
    sync(plk) inc(p);
}

The effects part of the declaration of inc2 is empty. No locks need be held before calling it. But when inc is called, the current effects include the name of the lock specified in the declaration of plk (the sync section adds λ to the effects). Since the same lock name appears in the declaration of p, the precondition for calling inc is fulfilled.
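
There is no direct way to express effects in standard C++, but the intent can be approximated by making the callee demand proof that a lock is held. The sketch below is my own analogy under that assumption (the "proof" is purely by convention; nothing ties the lock to the data the way Cyclone’s types do):

#include <mutex>

// The callee requires a reference to a lock object as evidence that the
// caller holds the mutex -- a poor man's version of the {λ} effect.
void inc(int& p, std::unique_lock<std::mutex>& proof) {
    (void)proof;       // by convention, proof.owns_lock() is true here
    p = p + 1;
}

// inc2 has an "empty effect": it acquires the lock itself, then calls inc.
void inc2(std::mutex& plk, int& p) {
    std::unique_lock<std::mutex> lock(plk);   // plays the role of sync(plk)
    inc(p, lock);
}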

A struct containing a pointer must also be an existential type. The actual value of λ is assigned to LkInt when the type is “packed” (created from its constituents).

struct LkInt 
{<λ:LS> 
    lock_t<λ> plk;
    int*λ p; 
};

Here, LkInt (locked integer) only makes sense in a sharing context, so λ is of the strictly sharing kind, LS. That allows LkInts to be passed between threads.

Here’s an example of using a pointer to LkInt:

void g<λ:LU>(struct LkInt *λ s ;{λ}) 
{
    let LkInt{<λ'> .plk=lk, .p=ptr} = *s;
    inc2(lk, ptr);
}

Function g is parameterized by some lock name of the LU kind (so it may be loc). This lock protects the LkInt argument s; since λ appears in g’s declared effects, it must be held by the caller (or be the thread-local loc).

Now s itself contains a pointer and a lock (of the LS kind). We get at them by unpacking an existential type–splitting it into constituents and giving them new names, lk and ptr. The name of lk is λ’. A shallow copy of s is performed in the process. We can now call inc2, which doesn’t require any effects, and uses lk to lock and increment ptr.

Finally, all these things come together in the example below:

void f(;{}) {
    let lk<λ> = newlock();
    int *λ p1 = new 0;
    int *loc p2 = new 0;
    struct LkInt *loc s = new LkInt{.plk=lk, .p=p1};
    spawn(g, s, sizeof(struct LkInt));
    inc2(lk, p1);
    inc2(nonlock, p2);
}

Here’s the play-by-play:

  • Function f requires no effects.
  • The programmer creates a lock lk with the name λ (unpacks an existential type returned by newlock).
  • She creates a pointer p1 to an integer protected by the lock named λ.
  • Then she creates an unprotected thread-local pointer p2 (using the dummy lock name loc).
  • She combines the lock lk and the pointer p1 into a struct LkInt. This is an example of “packing” an existential type. Note that both components must share the same lock name. (The pointer to LkInt, s, is thread-local–lock name: loc.)
  • The programmer spawns a thread that executes the function g with the argument s. The argument is passed by a shallow copy, but deep contents (the lock and the pointer) are shared. In fact they must be shared with the kinding LS; otherwise spawn would not compile. This mechanism protects any thread-local data from being shared with a new thread.
  • While the call to g(s) executes in parallel, the programmer increments p1 under the lock lk. Notice that these are the same variables that the other thread operates on. The shared lock enforces mutual exclusion.
  • The increment of thread-local p2 requires no lock; that’s why the dummy nonlock is used.

-Limitations

The biggest limitation of the Cyclone system is that it’s static. The association between data and locks is rigid. That makes several useful idioms impossible to implement. There is no way to “move” data between threads. You can either pass a pointer together with its lock (essentially creating a monitor), or pass data by value.

The other serious problem is the overall complexity of the system–especially the “effects” part. When effects have to be included and manipulated in generic code, things get a little hairy.

Granted, a lot of annotations can be elided–the defaults “do the right thing” in most cases. Still, what remains is asking a lot from an average programmer.

-Conclusion

I would like to thank Dan Grossman for correcting some of my mistakes and explaining a few subtle points.

In my next installment I’ll describe a more recent race-free extension of C, SharC, which supports a mix of static and dynamic checking.


A type system that prevents data races must not eliminate useful concurrency patterns or force the programmer to maintain multiple copies of almost identical data structures.

In my previous post about Guava, a dialect of Java, I talked about a type system that enforces the separation of thread-local and shared objects at the class level. Unfortunately, such a rigid system forces the programmer (especially a library writer) to provide dual implementations of many generic classes like vectors, queues, etc. Often the only difference between implementations is the use of synchronized (or, in the case of Guava, the special base class called Monitor).

To solve this problem, Boyapati and Rinard proposed a system where the same generic class may be used in different sharing contexts. For instance, the same parameterized vector class may be used to instantiate a thread-local instance that requires no locking, as well as a shared instance that has a built-in lock.

The paper precedes Generic Java, but employs similar notation. For instance, it lets you define a generic vector class like this:

  class vector<thisOwner> { ... }

and then instantiate it as thread-local (no locking requirements):

  vector<thisThread> localV = new vector<thisThread>;

or as a shared monitor:

  vector<self> sharedV = new vector<self>;

The first template parameter is always interpreted as “the owner” (more about it later). Objects owned by thisThread are thread-local, objects owned by self are monitors.

Even though the notation used in GRFJ (the acronym the authors use for their Java dialect) is different from that of Guava, there are many similarities in the two approaches, since both have to deal with similar issues: explicit sharing, ownership, passing objects between threads, etc.

-Explicit sharing

You might remember that, in Guava, only those classes that inherit from Monitor may be shared. In GRFJ, sharing is defined at the instance level (the instantiation of a template). Every instance declaration must specify the owner of the object. If the owner is not thisThread, the object may be shared between threads. The equivalent of the Guava Monitor is a self-owned object–its owner is declared as self.

-Ownership

Ownership plays an essential role in preventing data races. Every object has an owner. In GRFJ there are three basic types of owners:

  1. thisThread–the object owned by thisThread is never shared.
  2. self–the object is the root of an ownership tree.
  3. Another object–the sharing is defined by the root of the ownership tree.

Types of ownership translate naturally into protection patterns. If the owner is thisThread there is no need for locking. If the owner is self, all methods must be synchronized by the object’s lock. In the third case, the object’s owner is responsible for locking. More precisely, the root of the ownership tree to which the object belongs has to be locked, if it’s not declared thread-local.

There are some interesting combinations of ownership. For instance, you can declare a thread-local vector that stores self-owned (shared) items. Or you can declare a shared Stack that contains (owns) a Vector object. All access to Vector will be protected by the Stack’s lock.

For this level of expressiveness, we need classes that are parameterized by owners. Notice that, if the owner of the object x is another object, that object must exist when the declaration of x is reached.

A template might be parameterized by multiple owners. The first one on the list is always the owner of this. In the example below, Stack is parameterized by two owners–the first owns the Stack, the second owns the Values. Note that, in this case, all Values will always share the same owner.

class Stack<thisOwner, valueOwner> {
    Node<thisOwner, valueOwner> head = null;

    void push(Value<valueOwner> value) requires (this) {
        Node<thisOwner, valueOwner> newNode = 
            new Node<thisOwner, valueOwner>;
        newNode.init(value, head);
        head = newNode;
    }
    Value<valueOwner> pop() requires (this) {
        if (head == null) return null;
        Value<valueOwner> value = head.value();
        head = head.next();
        return value;
    }
}

The requires clause specifies whose lock must be held when calling a particular method. In this case, the lock on this must be held. But if this is owned by another object, the locking responsibility automatically moves up a level, until it reaches the ownership root.

Here’s the definition of Node:

class Node<thisOwner, valueOwner> {
    Value<valueOwner> value;
    Node<thisOwner, valueOwner> next;

    void init(Value<valueOwner> v, Node<thisOwner, valueOwner> n)
        requires (this) {
        this.value = v;
        this.next = n;
    }
    Value<valueOwner> value() requires (this) {
        return value;
    }
    Node<thisOwner, valueOwner> next() requires (this) {
       return next;
    }
}

And the definition of Value:

class Value<thisOwner> { int x = 0; }

Using the declarations above we are now ready to declare different values and stacks:

Value<thisThread> v1 = new Value<thisThread>;
Value<self> v2 = new Value<self>;

We have created two values from the same template–v1 is thread-local, v2 is a monitor (access to x is protected by its lock).

Stack<thisThread, thisThread> s1 = new Stack<thisThread, thisThread>;
Stack<thisThread, self> s2 =  new Stack<thisThread, self>;
Stack<self, self> s3 = new Stack<self, self>;

Stack s1 is thread-local and can store only thread-local values. No locks or locking code will be created by the compiler. Stack s2 is also thread-local, but it stores shareable values. A thread-local stack will never be visible from other threads. But the self-owned values it stores might be accessed from multiple threads. Finally, s3 is a shared stack containing shared values. Both the stack s3 and the values it stores have their own independent locks.

s1.push(v1);
s2.push(v2);

We may push a thread-local value v1 on the stack s1, but if we tried to push v2 on s1, the compiler would consider it an error. Pushing v2 on s2, on the other hand, is okay.

Since GRFJ is based on Concurrent Java, the locking and threading look a little odd. To illustrate the sharing of s3, we fork a thread, passing it s3 and v2 (both are sharing-ready) and executing the code s3.push(v2) under the lock of s3.

fork (s3,v2) {synchronized (s3) in {s3.push(v2);}}

Notice that, according to the declaration of s3, it would be an error to push v1 onto it. Indeed, that could result in illegal sharing of a thread-local object. The type system protects us from a hard-to-detect error.

This is hardly a free lunch, though. Annotating every class and every variable might be just too much for a programmer. Fortunately, most owners can be inferred by the compiler by analyzing assignments. Because of that, single threaded programs in GRFJ require virtually no annotations.

-Foreign owners

In most cases, the ownership tree follows the containment tree. The owner contains the ownee. Although desirable from the architectural point of view, this arrangement is not strictly necessary. An object might declare another separate object as its owner. This is safe under one condition–the owner object may not be overwritten. Hence the requirement that the owner be final. Here’s the relevant example:

final Foo<self> owner = new Foo<self>;
Bar<owner> ownee = new Bar<owner>;

This becomes important when building new locking protocols from pre-existing parts.

-Object migration

The passing of objects between threads requires either deep copying (like Guava’s Values), or move semantics. In GRFJ, move semantics is implemented by specifying another special type of owner–unique. Unique objects cannot be copied, they have to be moved. The “move” operator is the postfix decrement, just like in Guava. It moves the reference and nulls the source.

Value<unique> v3 = new Value<unique>;
Value<unique> v4 = v3--;

Our Stack class is not prepared to store unique objects. This prohibition may be included in its definition (compare it with C++ “concepts”):

class Stack<thisOwner, valueOwner> where (valueOwner != unique)

Conversely, we might want to redefine Stack to store unique objects. A few code changes would be necessary though. For instance, in push, the value must be moved:

newNode.init(value--, head);

In pop, the return value has to be moved:

return value--;

The class Node requires similar changes.

The authors note that, despite appearances, the moving of objects is multiprocessor safe. Even though the assignment to a new reference is not guaranteed to be immediately visible to other threads, a unique object is always published to another thread via a monitor (for instance, a shared stack). The new thread can only get hold of the object by first acquiring the monitor’s lock, which forces previous stores to commit.
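
The argument is the usual publication guarantee of a lock. A rough sketch of the same reasoning in standard C++ (my analogy, with a made-up Value type and a mutex-protected queue standing in for the shared monitor):

#include <memory>
#include <mutex>
#include <queue>

struct Value { int x = 0; };

std::mutex mut;
std::queue<std::unique_ptr<Value>> channel;   // stands in for the shared stack

void producer() {
    std::unique_ptr<Value> v(new Value);
    v->x = 42;                              // plain store, no atomics needed
    std::lock_guard<std::mutex> hold(mut);
    channel.push(std::move(v));             // publish; the source becomes null
}

std::unique_ptr<Value> consumer() {
    std::lock_guard<std::mutex> hold(mut);  // acquiring the lock makes the
    if (channel.empty()) return nullptr;    // producer's stores visible
    std::unique_ptr<Value> v = std::move(channel.front());
    channel.pop();
    return v;                               // v->x == 42 is guaranteed here
}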

-Alias control

Move semantics requires control over aliasing–a moved object may not leave any aliases, nor may it carry along any references to thread-local objects. GRFJ provides additional annotations to mark non-escaping method parameters. The syntax is !e appended to the parameter type. Here’s an example:

void display(Value<unique>!e val);
Value<unique> v5 = new Value<unique>;
display(v5);

A unique object here is passed by reference only because display guarantees that its argument won’t escape.

-Immutable objects

Immutable objects may be passed between threads by reference. In GRFJ, immutable objects are marked by another special owner type, readonly. Interestingly, the problem of the construction of immutable objects is cleverly avoided. You first create a unique object and then move it to a readonly reference.

Value<unique> v6 = new Value<unique>;
v6.x = 1;
Value<readonly> v7 = v6--;

This is a perfectly safe operation, since there is a guarantee that no writable aliases to the unique object may stay behind. The move to readonly freezes the object forever.
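
The closest everyday C++ analogy I can think of (a sketch, not GRFJ): build the object through a unique handle, then convert it to a pointer-to-const that may be shared freely; no writable alias survives the conversion.

#include <memory>

struct Value { int x = 0; };

// Construct while uniquely owned, then "freeze" by trading the writable
// handle for a shareable pointer-to-const.
std::shared_ptr<const Value> makeFrozen() {
    std::unique_ptr<Value> v(new Value);
    v->x = 1;                                          // still mutable here
    return std::shared_ptr<const Value>(std::move(v)); // no writable alias remains
}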

Immutable objects can only be passed to methods that promise not to modify their arguments. This is done by appending !w to the type of the parameter. The two suffixes may be combined to form !ew (I am not kidding you!), a parameter that doesn’t escape and is not modified by the method.

-Limitations

– Some concurrent programs use multi-stage access patterns that are not expressible in GRFJ. For instance, a shared array is divided into disjoint sections and each thread operates exclusively on its section without any locking. After all threads synchronize on a barrier, they pick up different sections and continue. The ownership of sections changes with time. (In D this pattern might be implementable using array slices.)

– Dynamic downcasting, which used to be the workhorse of Java before generics, can’t verify the ownership part of the cast, because this information is not available at runtime.

– Static variables may be accessed only when the client holds the class lock. Each class with static variables must therefore have a static lock.

– The authors mention the idea of parameterizing methods that accept poly-owned arguments. This is not as easy as it sounds, since virtual functions cannot be parameterized by a potentially infinite set of types. My guess is that this is possible because detailed ownership information is only needed during type checking. Still, the compiler might have to produce additional implementations of a method depending on whether the parameter is thread-local or not (imagine the parameterized method creating a local variable of the same ownership type–sometimes this variable must contain a lock, sometimes not). Even so, this is a finite set of possibilities, so vtable slots may be preallocated for all of them.

– The template syntax for ownership won’t work in languages where templates already accept value parameters, and the compiler isn’t always able to distinguish between types and values.


Can a good type system prevent concurrency errors? Or is this a quest for the Holy Grail?

There are two parts to this question, corresponding to two major types of concurrency errors:

  1. Preventing data races
  2. Preventing deadlocks

I’ll start with the first one.

Data races occur only when memory is shared between threads. Disallow sharing and data races are gone! In fact there is a name for threads that don’t share memory: processes. It’s perfectly feasible to have a concurrent language that disallows sharing–Erlang is one (see my post about Erlang). The trick is to always pass data between threads by value. This is especially easy in functional languages.

Non-functional languages like C++, Java, or D (I was told by Walter Bright, the creator of D, to always use the full name, “the D programming language,” so that search engines can index it properly) tend to share data almost by default (see, however, this post).

In Java, all non-primitive types have reference semantics. When you pass a Java object between threads, you’re only passing a handle; the object behind it becomes accessible from both threads.

C++ at least has means to express passing by value and move semantics for user-defined types. Still, it’s up to the programmer to use them.

Who ordered Guava?

For every type-system idea there are one or more dialects of Java that implement it. I’ll start with an older attempt at data-race free Java called Guava, as it illustrates some of the basic premises.

-Explicit sharing

The most important step–if we don’t want to completely ban the sharing of data–is to regulate it. Let the programmer explicitly mark the data destined for sharing as such. The corollary is that the data that is not marked for sharing cannot be shared. This can be accomplished, for instance, by making all objects thread-local by default, or by using type modifiers that prevent references to such objects from escaping.

In Guava, the data type designed for sharing is called a Monitor. As the name suggests, all access to a Monitor is automatically synchronized by the Monitor’s lock. This, incidentally, eliminates the need for the synchronized keyword, which is absent from Guava.

The non-shared data types are either Objects or Values.

Objects are just like regular Java Objects, except that they don’t have a built-in lock, since they can never be shared between threads.

Values are either primitive values, like integers, or user-defined types with value semantics.

Monitors, Objects, and Values are collectively called instances.

-Value semantics

When you pass a Value, the compiler will make a deep copy of it (well, sort of–the monitors embedded in a Value are not deep copied). Since deep copying might be expensive, Guava defines operator “move”, which nulls the source. The syntax is:

  v2 = v1--;

The value v1 becomes null after the move to v2. This is similar to C++ unique_ptr and std::move.
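
For readers who want the C++ side of that comparison spelled out, here’s a tiny sketch (standard C++, not Guava):

#include <cassert>
#include <memory>

void moveDemo() {
    std::unique_ptr<int> v1(new int(7));
    std::unique_ptr<int> v2 = std::move(v1);   // the analogue of v2 = v1--;
    assert(!v1);                               // the source is now null
    assert(v2 && *v2 == 7);                    // v2 is the sole owner
}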

-Ownership

The biggest problem in lock-based concurrency is to make sure that the correct lock(s) are taken when accessing shared data. In Guava, all of a Monitor’s data are protected by that Monitor’s lock. As long as they stay inside that Monitor, nothing bad can happen to them from the point of view of concurrency.

Values stored inside a Monitor are never accessible outside of the Monitor–only their copies may be passed out.

The same is not true about Objects. Since Objects have reference semantics, there is a real danger that Objects’ references might escape the Monitor that protects them. Imagine a situation where two Monitors have references to the same Object. It is possible then that two threads may operate on that Object at the same time–one entering through one Monitor, the other through the other Monitor. We have a data race!

Therefore it is important that every Object have an owner at all times. The Object’s owner is either a Value or a Monitor. (The exception is a fresh Object that’s been just allocated–it has no owner until it is assigned to a variable.) Since an Object may only be owned by at most one Monitor, it is that Monitor that protects it from simultaneous (racy) access.

-Regions

All Objects that are owned by a particular Monitor or Value form a region. Equivalently, assigning a monitored region to an object specifies what lock must be held when accessing it.

All instances may contain (references to) monitors, but monitors are not “owned” by anybody. References to the same monitor may appear in multiple regions and may be freely passed around. It is thus up to programmers to define an ordering scheme for their monitors in order to avoid deadlocks.

How can we protect Objects from moving between regions and acquiring multiple owners? We need a way to control aliasing.

Here are some Guava rules for passing Objects. A method may declare its Object parameter as either kept or lent. (By default, parameters to Object methods are kept and to Monitor methods are lent.) If the parameter is kept it must belong to the same region as this, and there are no limitations on its use. If, however, the parameter is lent, the method may not store a reference to it in this, nor may it store this inside a lent Object. No cross-aliasing is allowed.

A method may also be marked new if it returns a freshly created object, which has no owner yet. Constructors are considered new unless they accept kept parameters.

Notice that you may have multiple references to the same Object, but they will all be within the same region. The only instances that may be passed between threads are Monitors and Values.

-Constructors

Guava final fields may either be initialized inside a constructor or in a private method that is only called from inside a constructor. (BTW, in D a private method is not private within a module, so the compiler would have to analyze the whole module to establish the latter condition.) [Note to self: The same scheme might be useful in the construction of immutable objects in D.]

Partially constructed Objects must not be visible outside constructors. The compiler must verify that constructors don’t pass this to other methods, and don’t store this inside other instances (no alias cross-contamination).

-Optimizations

Copying Values around may be expensive. I already mentioned one optimization, the use of the “move” operator. The other optimization is related to immutability. If a Value is immutable, it may be safely passed by reference. Guava defines immutable classes as ones that have no update methods. Any method that may modify this must be marked update. The update notation is also extended to method parameters–by default parameters are immutable.

There is a bonus advantage to separating update methods from the rest. In a Monitor, a non-update method may safely use a readers’ lock, which allows multiple readers to access data simultaneously, to increase concurrency.

-Global and local methods

A method is considered local if its effects cannot be observed or influenced by other threads. All Object and Value methods are by default local. A local method is immune to races, thus allowing single-thread optimizations.

Conversely, all Monitor methods are considered global, since operations on Monitors may be visible or influenced by other threads.

These defaults may be overridden. For instance, an Object may contain a reference to a Monitor. The methods that call this Monitor’s methods must be marked global. Moreover, Object or Value methods that access Monitors that are passed to them as arguments must also be marked global. So touching a Monitor (which is equivalent to using its lock) taints the whole callers’ tree with the global annotation.

This is similar to the way update taints the callers’ tree, except that the update annotation of a method only pertains to this, not to its arguments. However, when global and update are combined, they have the same tainting power as global. In particular, a method that invokes a global update method on its argument becomes tainted with global update.

Methods that are global update cannot be invoked from non-update methods, even if they only global-update their arguments.

Note: This part of the paper is not very clear to me. The authors never explain the importance of global update methods (other than optimization opportunities).

-Limitations

Guava implements a relatively simple and somewhat limited system to eliminate races. It punts on the problems of ownership transfer and lock-free programming. Even with those limitations the Guava type system is not simple.

The idea that safe multithreading may be achieved with a few simple modifications to the type system seems to be untenable. However, as long as special type annotations are limited to the code that does actual multithreading, I think they’re worth the added complexity.


In my last post I criticized the C++0x implementation of futures for their lack of composability. I now feel obliged to show what I consider “the right way” to do futures.

I use the event library of Concurrent Caml (actually, its version distributed with OCaml) as my model. The ideas behind it are described in J. H. Reppy’s Ph.D. thesis, Higher-order concurrency.

Events and Futures

CML channels and events are more general than futures, but here I’ll concentrate on the subset of their functionality that can be used to implement futures. Below is a useful translation table between CML and C++. (I should add that CML channels, as opposed to C++ futures, are synchronous. In practice it doesn’t matter that much, since promise::set_value–corresponding to channel send–is usually called right before the thread exits.)

CML                                   C++
channel                               promise
event                                 future
evt = receive chan                    ftr = prom.get_future()
sync (send chan msg)  (synchronous)   prom.set_value(msg)  (asynchronous)
msg = sync evt                        msg = ftr.get()
choose evt_lst                        ???
evt' = wrap evt fun                   ???

As you can see, C++ has no equivalents of choose and wrap. I will discuss those next.

choose

CML choose is a function that takes a list of events and creates a new composite event. When the program syncs on this event, it blocks until one of the listed events completes. The combination sync and choose is similar to Unix select or Windows WaitForMultipleObjects.

An example of the usefulness of choose is a situation where there is more than one algorithm to calculate a result, and the program runs them in parallel until the fastest one completes. Linear search, for instance, is faster for small sets than binary search. Instead of figuring out which one to use for each particular case, it might be cheaper (on a multicore) to run both and stop when the first one is done.

What would choose look like in C++? One possibility is to implement it as a container object. You would push individual futures into it and then call get_future to obtain a composite future. Let me call this object future_or since it does the equivalent of logical OR for futures (the equivalent of AND is called a thread barrier).

  unique_future<int> ftr1 = promise1.get_future();
  unique_future<int> ftr2 = promise2.get_future();
  future_or<int> orf;
  orf.push_back(std::move(ftr1));
  orf.push_back(std::move(ftr2));
  unique_future<int> compositeFut = orf.get_future();
  ...
  int result = compositeFut.get();

Another possible implementation, in C++0x, would be to define a variadic template function future_choose. It would take a variable number of futures and return the composite future. The advantage of such an approach would be to let the compiler infer the types and the number of arguments automatically.
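
As a sketch only (future_choose is hypothetical and not part of any standard), the variadic signature might look like this; all the futures must carry the same result type T, and the composite becomes ready as soon as any one of them does:

// Hypothetical declaration -- not a standard or proposed API.
template <class T, class... Ts>
unique_future<T> future_choose(unique_future<T> first, unique_future<Ts>... rest);

// Hypothetical usage:
//   unique_future<int> composite = future_choose(std::move(ftr1), std::move(ftr2));
//   int result = composite.get();   // value of whichever future finished first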

You probably noticed that future_or can only accept futures of one type (the same would be true about the function future_choose). For instance, to create a composite future that returns a string upon get, you must build it from other futures that also return a string (and not an int, for instance).

In principle, heterogeneous futures could be combined if they all returned different types through the same variant. Then, after a composite of variant-returning futures returns a variant, program logic would fork depending on the actual type inside the variant. This approach bypasses the static type system and is error prone. CML provides a more elegant solution using wrap.

wrap

The trick is to combine (wrap) a particular future with a function that processes its output. A future that returns an int is combined with a function that takes an int, etc. The result of this combination is a new future usually returning void.

When a wrapped future is synchronized (e.g., by calling wait), it executes the associated function, passing it the result returned by the original future. Just look at this example:

  void handler(Foo const & foo);
  unique_future<Foo> fooFuture = prom.get_future();
  // combine the future with a handler function
  unique_future<void> wrappedFuture = wrap_future(fooFuture, &handler);
  ...
  wrappedFuture.wait();
  // at this point fooFuture has returned a Foo object 
  // and the handler has been executed with that object.

There’s one thing worth mentioning: the compiler will assure type correctness of the arguments to wrap_future: the return type of the future must be compatible with the argument type of the function. Mismatch errors will be discovered at compile time. Compare this with the implementation using variants, which does not have this property.

Of course the usefulness of wrap really comes into play when you try to combine heterogeneous futures using future_choose. You just wrap each future with its own post-processing function (or callable object) and pass them all to future_choose. Wrapped futures are homogeneous–they all return void.

Here’s an example that uses C++0x lambdas (one of them being a closure):

  unique_future<Foo> fooFuture = prom1.get_future();
  unique_future<int> intFuture = prom2.get_future();
  int sum = 0;
  unique_future<void> composite = future_choose(
      wrap_future(fooFuture, [](Foo const & foo){ foo.Commit(); }),
      wrap_future(intFuture, [&sum](int i){ sum += i; }));
  ...
  composite.wait();

Wrapped futures would be useful, e.g., in implementing a server that listens to several communication channels.

What about Haskell?

As I described in my earlier post, unlike CML, Haskell implements asynchronous message passing. The original Concurrent Haskell paper argued against choose. Later however Avik Chaudhuri was able to show that the CML event library can be implemented on top of asynchronous MVars. So there are no theoretical obstacles to asynchronous implementation of futures.

Final remarks

The equivalent of choose, called future_select, has been proposed for Boost and C++0x, but it was turned down.

Interestingly enough, Microsoft’s Task Parallel Library for .NET implements higher-order concurrency. They have ContinueWith for wrapping futures and WaitAny for choice. I can only assume that it’s easier to implement those on a virtual machine.

I also wonder whether it would be more productive to implement the equivalent of the CML channels and events in C++, and treat futures as a special case.

Bibliography

  1. J. H. Reppy. Higher-order concurrency. PhD thesis, Cornell
    University, 1992. Technical Report 92-1852.
  2. Avik Chaudhuri. A Concurrent ML Library in Concurrent Haskell.

I am frankly disappointed with C++0x futures. They lack a major feature–composability. It’s like having Booleans without OR or AND.

But let me first explain what futures are and how they’re implemented in C++0x.

What is a future?

In a nutshell, a future is an asynchronous function call. You call a function, but don’t wait until it completes–the body of the function executes in a separate thread. Instead you get a future, a promise to deliver the result at some later point in time. Only when your program desperately needs that value do you synchronize on the future. If the calculation hasn’t been completed by then, you block until it has. Otherwise you get the result of the calculation.

C++0x futures

C++ splits the implementation of futures into a set of small blocks–it almost creates an assembly language of futures. Here are the primitives.

-promise

A promise is a vehicle for passing the return value (or an exception) from the thread executing a function to the thread that cashes in on the function’s future. The function (or a callable object) that is executed in a separate thread must have access to the promise, and it explicitly sets the return value.

You can think of a promise as a primitive channel; the function in this case is the sender. The equivalent of the “send” method is called promise::set_value (to pass an exception to the caller, the callee can call promise::set_exception).

There is no corresponding “receive” method–the receiving is abstracted into the future object.

Here’s some code that illustrates the use of promise. Notice how the function to be called asynchronously must be aware of the promise.

void asyncFun(promise<int> intPromise)
{
    int result;
    try {
        // calculate the result
        intPromise.set_value(result);
    } catch (MyException const & e) {
        intPromise.set_exception(std::copy_exception(e));
    }
}

The calling thread creates the promise and passes it to the worker thread executing asyncFun (I’ll show the details later).

-future

A future is the synchronization object constructed around the receiving end of the promise channel. The calling thread obtains a future using promise::get_future.

The most important method of the future is get. If the result is not ready, get will block. When get completes, it either returns a value or throws an exception. The return value or the exception has been set by the called function through the promise associated with the future (see above).

get can also be split into its more basic parts: wait, optionally followed by has_value (or has_exception), and then a call to get that is guaranteed not to block. The advantage of this approach is that one can use the timed versions of wait, wait_for and wait_until, which accept timeouts.

There is also an asynchronous method, is_ready, that can be used for polling rather than blocking.
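
Here’s a small sketch of the timed and polling styles, written with the names that ended up in the final standard (wait_for reports a future_status; the draft’s is_ready corresponds to a zero-timeout wait_for):

#include <chrono>
#include <future>

void checkFuture(std::future<int>& fut) {
    // Timed wait: give the producer up to 100 ms, then go do something else.
    if (fut.wait_for(std::chrono::milliseconds(100)) == std::future_status::ready) {
        int value = fut.get();   // guaranteed not to block at this point
        (void)value;
    }
    // Pure polling would use wait_for with a zero timeout instead.
}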

There are two separate implementations of future: the regular future (which used to be called unique_future in the draft standard), which works like a unique_ptr as far as passing values goes, and shared_future, which works more like a shared_ptr. For instance, the method get of future cannot be called twice, because it transfers the ownership of the result.

Here’s an example how a future could be used in the caller’s code:

promise<int> intPromise;
future<int> intFuture = intPromise.get_future();
std::thread t(asyncFun, std::move(intPromise));
// do some other stuff
int result = intFuture.get(); // may throw MyException

-packaged_task

In most cases the use of promises can be hidden from the programmer. There is a template class, packaged_task, that takes any function (or callable object) and instruments it for use with futures. In essence, it creates a promise and calls the function from inside a try/catch block. If the function returns normally, packaged_task puts the value in the promise; otherwise it sets the exception. A packaged_task is itself a callable object (it has the function call operator() defined) and can be passed directly to a thread.

Here’s a more complete example:

vector<int> primesUpTo(int n);

int main() {
    packaged_task<vector<int>(int)> task(&primesUpTo);
    future<vector<int>> fut = task.get_future();
    thread t(move(task), 100);
    t.detach();
    // do some other work while the primes are computing
    vector<int> primes = fut.get();
    // print primes
    return 0;
}

Composability

So far so good. The creation of futures might be a bit verbose in C++, but it can be easily adapted to concrete environments. The futures standard library should be treated as a set of primitive building blocks from which to build higher abstractions.

Notice how C++ separated the channel (promise) from the synchronization mechanism (future); and thread creation from communication and synchronization. You may, for instance, use futures with thread pools or create threads on the spot.

What was omitted from the standard, however, was the ability to compose futures. Suppose you start several threads to perform calculations or retrieve data in parallel. You want to communicate with those threads using futures. And here’s the problem: you may block on any individual future but not on all of them. While you are blocked on one, other futures might become ready. Instead of spending your precious time servicing those other futures, you might be blocked on the most time-consuming one. The only option to process futures in order of completion is by polling (calling is_ready on consecutive futures in a loop) and burning the processor.
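
To make the problem concrete, here is roughly what the polling workaround looks like in today’s terms (a sketch using std::future and a zero-timeout wait_for in place of the draft’s is_ready); it works, but it trades CPU time or latency for the composable wait the library doesn’t provide:

#include <chrono>
#include <future>
#include <thread>
#include <vector>

// Return the result of whichever future completes first, by polling.
int waitForAny(std::vector<std::future<int>>& futures) {
    for (;;) {
        for (std::future<int>& f : futures) {
            if (f.wait_for(std::chrono::milliseconds(0)) ==
                std::future_status::ready)
                return f.get();
        }
        // Sleep to avoid burning the processor, at the cost of extra latency.
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}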

The need for composability of synchronization events has been widely recognized. Windows has WaitForMultipleObjects, Unix has select, and several higher order languages have explicit constructs that serve the same purpose. But that’s a topic for my next installment.

Acknowledgments

I’d like to thank Anthony Williams for helpful comments. Among other things, he is the author of one of the futures proposals for Boost. A stripped down version of it will become part of the threadpool library. Unfortunately, the stripping got rid of composability features.
