I’ve blogged before about the C++ unique_ptr not being unique and how true uniqueness can be implemented in an ownership-based type system. But I’ve only been scratching the surface.

The new push toward uniqueness is largely motivated by the demands of multithreaded programming. Unique objects are alias free and, in particular, cannot be accessed from more than one thread at a time. Because of that, they never require locking. They can also be safely passed between threads without the need for deep copying. In other words, they are a perfect vehicle for safe and efficient message passing. But there’s a rub…

The problem is this: How do you create and modify unique objects that have internal pointers? A classic example is a doubly linked list. Consider this Java code:

public class Node {
    public Node _next;
    public Node _prev;
}
public class LinkedList {
    private Node _head;
    public void insert(Node n) {
        n._next = _head;
        if (_head != null)
            _head._prev = n;
        _head = n;
    }
}

Suppose that you have a unique instance of an empty LinkedList and you want to insert a new link into it without compromising its uniqueness.

The first danger is that there might be external aliases to the node you are inserting–the node is not unique, it is shared. In that case, after the node is absorbed:

_head = n;

_head would be pointing to an alias-contaminated object. The list would “catch” aliases and that would break the uniqueness property.

The remedy is to require that the inserted node be unique too, and the ownership of it be transferred from the caller to the insert method. (Notice however that, in the process of being inserted, the node loses its uniqueness, since there are potentially two aliases pointing to it from inside the list–one is _head and the other is _head._prev. Objects inside the list don’t have to be unique–they may be cross-linked.)

The second danger is that the method insert might “leak” aliases. The tricky part is when we let the external node, n, store the reference to our internal _head:

n._next = _head;

We know that this is safe here because the node started unique and it will be absorbed into the list, so this alias will become an internal alias, which is okay. But how do we convince the compiler to certify this code as safe and reject code that isn’t? Type system to the rescue!

Types for Uniqueness

There have been several approaches to uniqueness using the type system. To my knowledge, the most compact and comprehensive one was presented by Haller and Odersky in the paper, Capabilities for External Uniqueness, which I will discuss in this post. The authors not only presented the theory but also implemented the prototype of the system as an extension of Scala. Since not many people are fluent in Scala, I’ll translate their examples into pseudo-Java, hopefully not missing much.

Both in Scala and Java one can use annotations to extend the type system. Uniqueness introduces three such annotations, @unique, @transient, and @exposed; and two additional keywords, expose and localize.

-Objects that are @unique

In the first approximation you may think of a @unique object as a leak-proof version of the C++ unique_ptr. Such an object is guaranteed to be “tracked” by at most one reference–no aliases are allowed. Also, no external references are allowed to point to the object’s internals and, conversely, the object’s internals may not reference any external objects. However, and this is a very important point, the insides of a @unique object may freely alias each other. Such a closed, cross-linked mess is called a cluster.

Consider, for instance, a (non-empty) @unique linked list. Its cluster consists of a cross-linked set of nodes. It’s relatively easy for the compiler to guarantee that no external aliases are created to a @unique list–the tricky part is to allow the manipulation of list internals without breaking its uniqueness (Fig 1 shows our starting point).

Fig 1. The linked list and the node form separate clusters

Look at the definition of insert. Without additional annotations we would be able to call it with a node that is shared between several external aliases. After the node is included in the list, those aliases would be pointing to the internals of the list thus breaking its uniqueness. Because of that, the uniqueness-savvy compiler will flag a call to such un-annotated insert on a @unique list as an error. So how can we annotate insert so that it guarantees the preservation of uniqueness?

-Exposing and Localizing

Here’s the modified definition of insert:

public void insert(@unique Node n) @transient {
    expose (this) { list =>
        var node = localize (n, list);
        node._next = list._head;
        if (list._head != null)
            list._head._prev = node;
        list._head = node;
    }
}

Don’t worry, most of the added code can be inferred by the compiler, but I make it explicit here for the sake of presentation. Let’s go over some of the details.

The node, n, passed to insert is declared as @unique. This guarantees that it forms its own cluster and that n is the only reference to it. Also, @unique parameters to a method are consumed by that method. The caller thus loses her reference to it (the compiler invalidates it), as demonstrated in this example:

@unique LinkedList lst = new @unique LinkedList();
@unique Node nd = new @unique Node();
lst.insert(nd);
nd._next; // error: nd has been consumed!

The method itself is annotated as @transient. It means that the this object is @unique, but not consumed by the method. In general, the @transient annotation may be applied to any parameter, not only this. You might be familiar with a different name for transient–borrowed.

Inside insert, the this parameter is explicitly exposed (actually, since the method is @transient, the compiler would expose this implicitly).

expose (this) { list => ... }

The new name for the exposed this is list.

Once a cluster is exposed, some re-linking of its constituents is allowed. The trick is to disallow any re-linking that would leak aliases. Here’s how it’s done: to guarantee no leakage, the compiler assigns the exposed object a special type–its original type tagged by a unique identifier. This identifier is created for the scope of each expose statement. All members of the exposed cluster are also tagged by the same tag. Since the compiler type-checks every assignment, it automatically makes sure that both sides have the same tag.

Now we need one more ingredient–bringing the @unique node into the cluster. This is done by localizing the parameter n to the same cluster as list.

var node = localize (n, list);

The localize statement does two things. It consumes n and it returns a reference to it that is tagged by the same tag as its second parameter. From that point on, node has the same tagged type as all the exposed nodes inside the list, and all assignments type-check.

Fig 2. The list has been exposed: all references are tagged. The node has been localized (given the same tag as the list). Re-linking is now possible without violating the type system.

Note that, in my pseudo-Java, I didn’t specify the type of node returned by localize. That’s because tagged types are never explicitly mentioned in the program. They are the realm of the compiler.

Functional Decomposition

The last example was somewhat trivial in that the code that worked on exposed objects fit neatly into one method. But a viable type system cannot impose restrictions on structuring the code. The basic requirement for any programming language is to allow functional decomposition–delegating work to separate subroutines, which can be re-used in other contexts. That’s why we have to be able to define functions that operate on exposed and/or localized objects.

Here’s an example from Haller/Odersky that uses recursion within the expose statement. append is a method of a singly-linked list:

void append(@unique SinglyLinkedList other) @transient
{
    expose(this) { list =>
        if (list._next == null)
            list._next = other; // localize and consume
        else
            list._next.append(other);
    }
}

In the first branch of the if statement, a @unique parameter, other, is (implicitly) localized and consumed. In the second branch, it is recursively passed to append. Notice an important detail: the subject of append, list._next, is not @unique–it is exposed. Its type has been tagged by a unique tag. But the append method is declared as @transient. It turns out that both unique and exposed arguments may be safely accepted as transient parameters (including the implicit this parameter).

Because of this rule, it’s perfectly safe to forgo the explicit expose inside a transient method. The append method may thus be simplified to:

void append(@unique SinglyLinkedList other) @transient
{
    // 'this' is implicitly exposed
    if (_next == null)
        _next = other; // localize and consume
    else
        _next.append(other);
}

Things get a little more interesting when you try to reuse append inside another method. Consider the implementation of insert:

void insert(@unique SinglyLinkedList other) @transient
{
    var locOther = localize(other, this);
    if (locOther != null)
    {
        locOther.append(_next);
        _next = locOther;
    }
}

The insert method is transient–it works on unique or exposed lists. It accepts a unique list, other, which is consumed by the localize statement. The this reference is implicitly exposed with the same tag as locOther, so the last statement _next=locOther type-checks. The only thing that doesn’t type-check is the argument to append, which is supposed to be unique, but here it’s exposed instead.

This time there is no safe conversion to help us, so if we want to be able to reuse append, we have to modify its definition. First of all, we’ll mark its parameter as @exposed. An exposed parameter is tagged by the caller. In order for append to work, the this reference must also be tagged by the caller–with the same tag. Otherwise the assignment, _next=other, inside append, wouldn’t type-check. It follows that the append method must also be marked as @exposed (when there is more than one exposed parameter, they all have to be tagged with the same tag).

Here’s the new version of append:

void append(@exposed SinglyLinkedList other) @exposed
{
    if (_next == null)
        _next = other; // both exposed under the same tag
    else
        _next.append(other); // both exposed under the same tag
}

Something interesting happened to append. Since it now operates on exposed objects, it’s the caller’s responsibility to expose and localize unique objects (this is exactly what we did in insert). Interestingly, append will now also operate on non-annotated types. You may, for instance, append one non-unique list to another non-unique list and it will type-check! That’s because non-annotated types are equivalent to exposed types with a null tag–they form a global cluster of their own.

This kind of polymorphism (non-annotated/annotated) means that in many cases you don’t have to define separate classes for use with unique objects. What Haller and Odersky found out is that almost all class methods in Scala’s collection library required only the simplest @exposed annotations, without changing their implementation. That’s why they proposed to use the @exposed annotation on whole classes.

Conclusion

Every time I read a paper about Scala I’m impressed. It’s a language that has very solid theoretical foundations and yet is very practical–on a par with Java, whose runtime it uses. I like Scala’s approach towards concurrency, with its strong emphasis on safe and flexible message passing. Like functional languages, Scala supports immutable messages. With the addition of uniqueness, it will also support safe mutable messages. Neither kind requires synchronization (outside of that provided by the message queue) or deep copying.

There is still a gap in Scala’s concurrency model–it’s possible to share objects between threads without any protection. It’s up to the programmer to declare shared methods as synchronized–just like in Java–but there is no overall guarantee of data-race freedom. So far, only ownership systems have been able to deliver that guarantee, but I wouldn’t be surprised if Martin Odersky had something else up his sleeve for Scala.

I’d like to thank Philip Haller for reading the draft of this post and providing valuable comments. Philip told me that a new version of the prototype is in the works, which will simplify the system further, both for the programmer and the implementer.


Is the Actor model just another name for message passing between threads? In other words, can you consider a Java Thread object with a message queue an Actor? Or is there more to the Actor model? Bartosz investigates.

I’ll start by listing the various properties that define the Actor Model, and then discuss implementation options in several languages.

Concurrency

Actors are objects that execute concurrently. Well, sort of. Erlang, for instance, is not an object-oriented language, so we can’t really talk about “objects”. An actor in Erlang is represented by a thing called a Process ID (Pid). But that’s nitpicking. The second part of the statement is more interesting. Strictly speaking, an actor may execute concurrently but at times it will not. For instance, in Scala, actor code may be executed by the calling thread.

Caveats aside, it’s convenient to think of actors as objects with a thread inside.

Message Passing

Actors communicate through message passing. Actors don’t communicate using shared memory (or at least pretend not to). The only way data may be passed between actors is through messages.

Erlang has a primitive send operation denoted by the exclamation mark. To send a message Msg to the process (actor) Pid you write:

Pid ! Msg

The message is copied to the address space of the receiver, so there is no sharing.

If you were to imitate this mechanism in Java, you would create a Thread object with a mailbox (a concurrent message queue), with no public methods other than put and get for passing messages. Enforcing copy semantics in Java is impossible so, strictly speaking, mailboxes should only store built-in types. Note that passing Java Strings is okay, since strings are immutable.
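
Here is a minimal sketch of what such a mailbox-based actor might look like in plain Java (the class and method names are my own, not part of any library):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// A minimal actor-like thread: the mailbox is the only way to communicate with it.
class StringActor extends Thread {
    private final BlockingQueue<String> _mailbox = new ArrayBlockingQueue<>(16);

    // The only public operation: deposit a message (blocks when the mailbox is full).
    public void put(String msg) throws InterruptedException {
        _mailbox.put(msg);
    }

    @Override
    public void run() {
        try {
            while (true) {
                String msg = _mailbox.take(); // blocks until a message arrives
                if (msg.equals("stop"))
                    return;
                System.out.println("received: " + msg);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}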

-Typed messages

Here’s the first conundrum: in Java, as in any statically typed language, messages have to be typed. If you want to process more than one type of message, it’s not enough to have just one mailbox per actor. In Erlang, which is dynamically typed, one canonical mailbox per actor suffices. In Java, mailboxes have to be abstracted from actors. So an actor may have one mailbox for accepting strings, another for integers, etc. You build actors from those smaller blocks.

But having multiple mailboxes creates another problem: How to block, waiting for messages from more than one mailbox at a time without breaking the encapsulation? And when one of the mailboxes fires, how to retrieve the correct type of a message from the appropriate mailbox? I’ll describe a few approaches.

-Pattern matching

Scala, which is also a statically typed language, uses the power of functional programming to solve the typed messages problem. The receive statement uses pattern matching, which can match different types. It looks like a switch statement whose case labels are patterns. A pattern may specify the type it expects. You may send a string, or an integer, or a more complex data structure to an actor. A single receive statement inside the actor code may match any of those.

receive {
    case s: String => println("string: "+ s)
    case i: Int => println("integer: "+ i)
    case m => println("unknown: "+ m)
}

In Scala the type of a variable is specified after the colon, so s:String declares the variable s of the type String. The last case is a catch-all.

This is a very elegant solution to the difficult problem of marrying object-oriented programming to functional programming–a task at which Scala excels.

-Casting

Of course, we always have the option of escaping the type system. A mailbox could be just a queue of Objects. When a message is received, the actor could try casting it to each of the expected types in turn or use reflection to find out the type of the message. Here’s what Martin Odersky, the creator of Scala, has to say about it:

The most direct (some would say: crudest) form of decomposition uses the type-test and type-cast instructions available in Java and many other languages.

In the paper he co-authored with Emir and Williams (Matching Objects With Patterns) he gives the following evaluation of this method:

Evaluation: Type-tests and type-casts require zero overhead for the class hierarchy. The pattern matching itself is very verbose, for both shallow and deep patterns. In particular, every match appears as both a type-test and a subsequent type-cast. The scheme raises also the issue that type-casts are potentially unsafe because they can raise ClassCastExceptions. Type-tests and type-casts completely expose representation. They have mixed characteristics with respect to extensibility. On the one hand, one can add new variants without changing the framework (because there is nothing to be done in the framework itself). On the other hand, one cannot invent new patterns over existing variants that use the same syntax as the type-tests and type-casts.
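
For comparison, here is roughly what this crudest form of decomposition looks like in Java, mirroring the earlier Scala receive example (a hand-written illustration, not library code):

class CrudeDispatch {
    // A mailbox of Objects and a chain of type-tests followed by type-casts.
    static void dispatch(Object msg) {
        if (msg instanceof String) {
            String s = (String) msg;
            System.out.println("string: " + s);
        } else if (msg instanceof Integer) {
            int i = (Integer) msg;
            System.out.println("integer: " + i);
        } else {
            System.out.println("unknown: " + msg);
        }
    }
}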

The best one could do in C++ or D is to write generic code that hides casting from the client. Such generic code could use continuations to process messages after they’ve been cast. A continuation is a function that you pass to another function to be executed after that function completes (strictly speaking, a real continuation never returns, so I’m using this word loosely). The above example could be rewritten in C++ as:

void onString(std::string const & s) {
    std::cout << "string: " << s << std::endl;
}
void onInt(int i) {
    std::cout << "integer: " << i << std::endl;
}

receive<std::string, int> (&onString, &onInt);

where receive is a variadic template (available in C++0x). It would do the dynamic casting and call the appropriate function to process the result. The syntax is awkward and less flexible than that of Scala, but it works.

The use of lambdas might make things a bit clearer. Here’s an example in D using lambdas (function literals), courtesy of Sean Kelly and Jason House:

receive(
    (string s){ writefln("string: %s", s); },
    (int i){ writefln("integer: %s", i); }
);

Interestingly enough, Scala’s receive is a library function with the pattern matching block playing the role of a continuation. Scala has syntactic sugar to make lambdas look like curly-braced blocks of code. Actually, each case statement is interpreted by Scala as a partial function–a function that is not defined for all values (or types) of arguments. The pattern matching part of case becomes the isDefinedAt method of this partial function object, and the code after that becomes its apply method. Of course, partial functions could also be implemented in C++ or D, but with a lot of superfluous awkwardness–lambda notation doesn’t help when partial functions are involved.

-Isolation

Finally, there is the problem of isolation. A message-passing system must be protected from data sharing. As long as the message is a primitive type and is passed by value (or an immutable type passed by reference), there’s no problem. But when you pass a mutable Object as a message, in reality you are passing a reference (a handle) to it. Suddenly your message is shared and may be accessed by more than one thread at a time. You either need additional synchronization outside of the Actor model or risk data races. Languages that are not strictly functional, including Scala, have to deal with this problem. They usually pass this responsibility, conveniently, to the programmer.
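
Here is the hazard spelled out in plain Java (my own illustration): once a mutable object has been put in a mailbox, both the sender and the receiver hold references to it.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class Counter { int hits; }

class SharingHazard {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Counter> mailbox = new ArrayBlockingQueue<>(1);
        Counter c = new Counter();

        Thread receiver = new Thread(() -> {
            try {
                Counter msg = mailbox.take();
                msg.hits++;          // the receiver mutates the message...
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiver.start();

        mailbox.put(c);              // "send" the message
        c.hits++;                    // ...while the sender still holds it: a potential data race
        receiver.join();
    }
}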

-Kilim

Java is not a good language to implement the Actor model. You can extend Java though, and there is one such extension worth mentioning called Kilim by Sriram Srinivasan and Alan Mycroft from Cambridge, UK. Messages in Kilim are restricted to objects with no internal aliasing, which have move semantics. The pre-processor (weaver) checks the structure of messages and generates appropriate Java code for passing them around. I tried to figure out how Kilim deals with waiting on multiple mailboxes, but there isn’t enough documentation available on the Web. The authors mention using the select statement, but never provide any details or examples.

Correction: Sriram was kind enough to provide an example of the use of select:

int n = Mailbox.select(mb0, mb1, .., timeout);

The return value is the index of the mailbox that fired, or -1 for the timeout. Being able to wait on several mailboxes at once is what makes them composable, an important feature of the message-passing model.

Dynamic Networks

Everything I described so far is common to CSP (Communicating Sequential Processes) and the Actor model. Here’s what makes actors more general:

Connections between actors are dynamic. Unlike processes in CSP, actors may establish communication channels dynamically. They may pass messages containing references to actors (or mailboxes). They can then send messages to those actors. Here’s a Scala example:

receive {
    case (name: String, actor: Actor) =>
        actor ! lookup(name)
}

The original message is a tuple combining a string and an actor object. The receiver sends the result of lookup(name) to the actor it has just learned about. Thus a new communication channel between the receiver and the unknown actor can be established at runtime. (In Kilim the same is possible by passing mailboxes via messages.)
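
In plain Java the same effect can be approximated by embedding a reply mailbox in the message, so that a new channel is established at runtime (an illustrative sketch; this is not Kilim’s actual API):

import java.util.concurrent.BlockingQueue;

// A request carries the mailbox to reply to; the receiver learns about it only at runtime.
class Request {
    final String name;
    final BlockingQueue<String> replyTo;
    Request(String name, BlockingQueue<String> replyTo) {
        this.name = name;
        this.replyTo = replyTo;
    }
}

class LookupService {
    static String lookup(String name) { return "result for " + name; }

    static void serveOne(BlockingQueue<Request> inbox) throws InterruptedException {
        Request r = inbox.take();
        r.replyTo.put(lookup(r.name)); // send the answer over the newly learned channel
    }
}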

Actors in D

The D programming language with my proposed race-free type system could dramatically improve the safety of message passing. A race-free type system distinguishes between various types of sharing and enforces synchronization when necessary. For instance, since an Actor would be shared between threads, it would have to be declared shared. All objects inside a shared actor, including the mailbox, would automatically inherit the shared property. A shared message queue inside the mailbox could only store value types, unique types with move semantics, or reference types that are either immutable or are monitors (provide their own synchronization). These are exactly the types of messages that may be safely passed between actors. Notice that this is more than is allowed in Erlang (value types only) or Kilim (unique types only), but doesn’t include “dangerous” types that even Scala accepts (not to mention Java or C++).

I will discuss message queues in the next installment.


If it weren’t for the multitude of opportunities to shoot yourself in the foot, multithreaded programming would be easy. I’m going to discuss some of these “opportunities” in relation to global variables. I’ll talk about general issues and discuss the ways compilers can detect them. In particular, I’ll show the protections provided by my proposed extensions to the type system.

Global Variables

There are so many ways the sharing of global variables between threads can go wrong, it’s scary.

Let me start with the simplest example: the declaration of a global object of class Foo (in an unspecified language with Java-like syntax).

Foo TheFoo = new Foo;

In C++ or Java, TheFoo would immediately be visible to all threads, even if Foo provided no synchronization whatsoever (strictly speaking Java doesn’t have global variables, but static data members play the same role).

If the programmer doesn’t do anything to protect shared data, the default immediately exposes her to data races.

The D programming language (version 2.0, also known as D2) makes a better choice–global variables are, by default, thread local. That takes away the danger of accidental sharing. If the programmer wants to share a global variable, she has to declare it as such:

shared Foo TheFoo = new shared Foo;

It’s still up to the designer of the class Foo to provide appropriate synchronization.

Currently, the only multithreaded guarantee for shared objects in D2 is the absence of low-level data races on multiprocessors–and even that, only in the safe subset of D. What are low-level data races? Those are the races that break some lock-free algorithms, like the infamous Double-Checked Locking Pattern. If I were to explain this to a Java programmer, I’d say that all data members in a shared object are volatile. This property propagates transitively to all objects the current object has access to.
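
For reference, here is the Java version of the double-checked locking pattern; it is only correct because the handle is declared volatile, which is exactly the kind of low-level guarantee discussed above:

class Singleton {
    private static volatile Singleton _instance; // without volatile, DCLP is broken in Java

    public static Singleton getInstance() {
        Singleton local = _instance;              // first, unsynchronized check
        if (local == null) {
            synchronized (Singleton.class) {
                local = _instance;                // second check, under the lock
                if (local == null) {
                    _instance = local = new Singleton();
                }
            }
        }
        return local;
    }
}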

Still, the following implementation of a shared object in D would most likely be incorrect even in the absence of low-level data races:

class Foo {
    private int[] _arr;
    public void append(int i) {
       _arr ~= i; // array append
    }
}

auto TheFoo = new shared Foo;

The problem is that an array in D has two fields: the length and the pointer to a buffer. In shared Foo, each of them would be updated atomically, but the duo would not. So two threads calling TheFoo.append could interleave their updates in an unpredictable way, possibly leading to loss of data.
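
For readers more at home in Java, here is a rough analogue of the same problem (my own illustration, not how D implements arrays): the two fields form one logical value, yet nothing makes their combined update atomic.

class IntArray {
    private int[] _buf = new int[4];
    private int _len;                  // _len and _buf must change in step

    public void append(int i) {        // racy if two threads call it concurrently
        if (_len == _buf.length)
            _buf = java.util.Arrays.copyOf(_buf, 2 * _len);
        _buf[_len] = i;
        _len = _len + 1;               // another thread may observe _buf and _len out of sync
    }
}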

My race-free type system goes further–it eliminates all data races, both low- and high-level. The same code would work differently in my scheme. When an object is declared shared, all its methods are automatically synchronized. TheFoo.append would take Foo‘s lock and make the whole append operation atomic. (For the advanced programmer who wants to implement lock-free algorithms my scheme offers a special lockfree qualifier, which I’ll describe shortly.)

Now suppose that you were cautious enough to design your Java/D2 class Foo to be thread safe:

class Foo {
    private int [] _arr;
    public synchronized void append(int i) {
       _arr ~= i; // array append
    }
}

Does it mean your global variable, TheFoo, is safe to use? Not in Java. Consider this:

static Foo TheFoo; // static = global
// Thread 1
TheFoo = new Foo();
// Thread 2
while (TheFoo == null)
    continue;
TheFoo.append(1);

You won’t even know what hit you when your program fails. I will direct the reader to one of my older posts that explains the problems of publication safety on a multiprocessor machine. The bottom line is that, in order to make your program work correctly in Java, you have to declare TheFoo as volatile (or final, if you simply want to prevent such usage). Again, it looks like in Java the defaults are stacked against safe multithreading.
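
In Java the minimal fix is a one-word change to the declaration:

static volatile Foo TheFoo; // volatile publication: the write in Thread 1 becomes safely visible to Thread 2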

This is not a problem in D2, since shared implies volatile.

In my scheme, the default behavior of shared is different. It works like Java’s final. The code that tries to rebind the shared object (re-assign to the handle) would not compile. This is to prevent accidental lock-free programming. (If you haven’t noticed, the code that waits on the handle of TheFoo to switch from null to non-null is lock-free. The handle is not protected by any lock.) Unlike D2, I don’t want to make lock-free programming “easy,” because it isn’t easy. It’s almost like D2 is endorsing lock-free programming by giving the programmer a false sense of security.

So what do you do if you really want to spin on the handle? You declare your object lockfree.

lockfree Foo TheFoo;

lockfree implies shared (it doesn’t make sense otherwise), but it also makes the handle “volatile”. All accesses to it will be made sequentially consistent (on the x86, it means all stores will compile to xchg).

Note that lockfree is shallow–data members of TheFoo don’t inherit the lockfree qualification. Instead, they inherit the implied shared property of TheFoo.

It’s not only object handles that can be made lockfree. Other atomic data types like integers, Booleans, etc., can also be lockfree. A lockfree struct is also possible–it is treated as a tuple all of whose elements are lockfree. There is no atomicity guarantee for the whole struct. Methods can be declared lockfree to turn off default synchronization.

Conclusion

Even the simplest case of sharing a global variable between threads is fraught with danger. My proposal unobtrusively eliminates the most common traps. The defaults are carefully chosen to let beginners avoid the pitfalls of multithreaded programming.


In my last post I talked about the proposal for the ownership scheme for multithreaded programs that provides alias control and eliminates data races. The scheme requires the addition of new type qualifiers to the (hypothetical) language. The standard concern is that new type qualifiers introduce code duplication. The classic example is the duplication of getters required by the introduction of the const modifier:

class Foo {
    private Bar _bar;
public:
    Bar get() {
        return _bar;
    }
    const Bar get() const {
        return _bar;
    }
}

Do ownership annotations lead to the same kind of duplication? Fortunately not. It’s true that, in most cases, two implementations of each public method are needed–with and without synchronization–but this is taken care of by the compiler, not by the programmer. Unlike in Java, we don’t need a different class for shared Vector and thread-local ArrayList. In my scheme, when a vector is instantiated as a monitor (shared), the compiler automatically puts in the necessary synchronization code.
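
In today’s Java that choice is made by picking a different artifact for each use, for example:

import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

class TwoVariants {
    List<String> local  = new ArrayList<>(); // thread-local use: no locking anywhere
    List<String> shared = new Vector<>();    // shared use: every method takes the Vector's lock
}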

Need for generic code

The ownership scheme introduces an element of genericity by letting the programmer specify ownership during the instantiation of a class (just as a template parameter is specified during the instantiation of a template).

I already mentioned that most declarations can be expressed in two ways: one using type qualifiers, the other using template notation. For instance, these two forms are equivalent:

auto foo2 = new shared Foo;
auto foo2 = new Foo<owner::self>;

The template form emphasizes the generic nature of ownership annotations.

With the ownership system in place, regular templates parametrized by types also gain an additional degree of genericity since their parameters now include implicit ownership information. This is best seen in objects that don’t own the objects they hold. Most containers have this property (unless they are restricted to storing value types). For instance, a stack object might be thread-local while its elements are either thread-local or shared. Or the stack might be shared, with shared elements, etc. The source code to implement such stacks may be identical.

The polymorphic scheme and the examples are based on the GRFJ paper I discussed in a past post.

An example

– Stack

A parameterized stack might look like this:

class Stack<T> {
    private Node<T> _top;
public:
    void push(T value) {
        auto newNode = new Node<owner::this, T>;
        newNode.init(:=value, _top);
        _top = newNode;
    }
    T pop() {
        if (_top is null) return null;
        auto value := _top.value();
        _top = _top.next();
        return value;
    }
}

This stack is parametrized by the type T. This time, however, T includes ownership information. In particular T could be shared, unique, immutable or–the default–thread-local. The template picks up whatever qualifier is specified during instantiation, as in:

auto stack = new Stack<unique Foo>;

The _top (the head of the linked list) is of the type Node, which is parametrized by the type T. What is implicit in this declaration is that _top is owned by this–the default assignment of ownership for subobjects. If you want, you can make this more explicit:

private Node<owner::this, T> _top;

Notice that, when constructing a new Node, I had to specify the owner explicitly as this. The default would be thread-local and could leak thread-local aliases in the constructor. It is technically possible that the owner::this could, at least in this case, be inferred by the compiler through simple flow analysis.

Let’s have a closer look at the method push, where some interesting things are happening. First push creates a new node, which is later assigned to _top. The compiler cannot be sure that the Node constructor or the init method don’t leak aliases. That looks bad at first glance because, if an alias to newNode were leaked, that would lead to the leakage of _top as well (after a pop).

And here’s the clincher: Because newNode was declared with the correct owner–the stack itself–it can’t leak an alias that has a different owner. So anybody who tries to access the (correctly typed) leaked alias would have to hold the lock on the stack. Which means that, if the stack is shared, unsynchronized access to any of the nodes and their aliases is impossible. Which means no races on Nodes.

I also used the move operator := to move the values stored on the stack. That will make the stack usable with unique types. (For non-unique types, the move operator turns into regular assignment.)

I can now instantiate various stacks with different combinations of ownerships. The simplest one is:

auto localStack = new Stack<Foo>;

which is thread-local and stores thread-local objects of class Foo. There are no restrictions on Foo.

A more interesting combination is:

auto localStackOfMonitors = new Stack<shared Foo>;

This is a thread-local stack which stores monitor objects (the opposite is illegal though, as I’ll explain in a moment).

There is also a primitive multithreaded message queue:

auto msgQueue = new shared Stack<shared Foo>;

Notice that code that would try to push a thread-local object on the localStackOfMonitors or the msgQueue would not compile. We need the rich type system to be able to express such subtleties.

Other interesting combinations are:

auto stackOfImmutable = new shared Stack<immutable Foo>;
auto stackOfUnique = new shared Stack<unique Foo>;

The latter is possible because I used move operators in the body of Stack.

– Node

Now I’ll show you the fully parameterized definition of Node. I made all ownership annotations explicit for explanatory purposes. I’ll argue later that all of them could be elided.

class Node<T> {
private:
    T _value;
    Node<owner::of_this, T> _next;
public:
    void init(T v, Node<owner::of_this, T> next)
    {
        _value := v;
        _next = next;
    }
    T value() {
        return :=_value;
    }
    Node<owner::of_this, T> next() {
        return _next;
    }
}

Notice the declaration of _next: I specified that it must be owned by the owner of the current object, owner::of_this. In our case, the current object is a node and its owner is an instance of the Stack (let’s assume it’s the self-owned msgQueue).

This is the most logical assignment of ownership: all nodes are owned by the same stack object. That means no ownership conversions need be done, for instance, in the implementation of pop. In this assignment:

_top = _top.next();

the owner of _top is msgQueue, and so is the owner of its _next object. The types match exactly. I drew the ownership tree below. The fat arrows point at owners.
Ownership hierarchy using owner::of_this

But that’s not the only possibility. The default–that is _next being owned by the current node–would work too. The corresponding ownership tree is shown below.

Ownership hierarchy using owner::this

The left-hand side of the assignment

_top = _top.next();

is still owned by msgQueue. But the _next object inside _top is not. It is owned by the _top node itself. These are two different owners so, during the assignment, the compiler has to do an implicit ownership conversion. Such a conversion is only safe if both owners belong to the same ownership tree (sometimes called a “region”). Indeed, ownership is only needed for correct locking, and the locks always reside at the top of the tree (msgQueue in this case). So, after all, we don’t need to annotate _next with the ownership qualifier.

The two other annotations can be inferred by the compiler (there are some elements of type inference even in C++0x and D). The argument next to the method init must be either owned by this or be convertible to owner::this because of the assignment

_next = next;

Similarly, the return from the method next is implicitly owned by this (the node). When it’s used in Stack.pop:

_top = _top.next();

the owner conversion is performed.

With ownership inference, the definition of Node simplifies to the following:

class Node<T> {
private:
    T _value;
    Node<T> _next; // by default owned by this
public:
    void init(T v, Node<T> next)
    {
        _value := v;
        _next = next; // inference: owner of next must be this
    }
    T value() {
        return :=_value;
    }
    Node<T> next() {
        return _next; // inference: returned node is owned by this
    }
}

which has no ownership annotations.

Let me stress again a very important point: If init wanted to leak the alias to next, it would have to assign it to a variable of the type Node<owner::this, T>, where this is the current node. The compiler would make sure that such a variable is accessed only by code that locks the root of the ownership tree, msgQueue. This arrangement ensures the absence of races for the nodes of the list.

Another important point is that Node contains _value of type T as a subobject. The compiler will refuse instantiations where Node’s ownership tree is shared (its root is self-owned), and T is thread-local. Indeed, such an instantiation would lead to races if an alias to _value escaped from Node. Such an alias, being thread-local, would be accessible without locking.

Comment on notation

In general, a template parameter list might contain a mixture of types, type qualifiers, values, (and, in D, aliases). Because of this mixture, I’m using special syntax for ownership qualifiers, owner::x to distinguish them from other kinds of parameters.

As you have seen, a naked ownership qualifier may be specified during instantiation. If it’s the first template argument, it becomes the owner of the object. Class templates don’t specify this parameter, but they have access to it as owner::of_this.

Other uses of qualifier polymorphism

Once qualifier polymorphism is in the language, there is no reason not to allow other qualifiers to take part in polymorphism. For instance, the old problem of having to write separate const versions of accessors can be easily solved:

class Foo {
    private Bar _bar;
    public mut_q Bar get<mutability::mut_q>() mut_q
    {
        return _bar;
    }
}

Here method get is parametrized by the mutability qualifier mut_q. The values it can take are: mutable (the default), const, or immutable. For instance, in

auto immFoo = new immutable Foo;
immutable Bar b = immFoo.get();

the immutable version of get is called. Similarly, in

void f(const Foo foo) {
    const Bar b = foo.get();
}

the const version is called (notice that f may also be called with an immutable object–it will work just fine).

Class methods in Java or D are by default virtual. This is why, in general, non-final class methods cannot be templatized (an infinite number of possible versions of a method would have to be included in the vtable). Type qualifiers are an exception, because there is a finite number of them. It would be okay for the vtable to have three entries for the method get, one for each possible value of the mutability parameter. In this case, however, all three are identical, so the compiler will generate just one entry.

Conclusion

The hard part–explaining the theory and the details of the ownership scheme–is over. I will now switch to a tutorial-style presentation that is much more programmer friendly. You’ll see how simple the scheme really is in practice.


Since ownership plays a major role in race-free programming, it will be the first topic in my proposal for a race-free system. I presented the bird’s eye view of the system and provided a few teasers in my previous post. The design is based on published papers (see bibliography at the end). My contribution was to integrate several ideas into one package.

When I showed this proposal to my friends they either didn’t believe it could work or considered it too complex, depending on which end they were looking at. From the users’ perspective, the system looks relatively simple, so the natural reaction is: That can’t work. If you get into the details of why it works, and how the compiler knows you are in danger of a data race, you need some theory, and that is complex. So I decided to deal with some theory first, to show that the thing works. If you’re not into theory, just look at the examples. They are usually simple to understand.

Owners

The ownership relationship is necessary to establish a tree-like structure among objects. This is needed by the compiler to decide which lock, if any, is responsible for the protection of each object, and take it when necessary. Simply speaking, the lock at the root of each tree protects the rest of the tree. If you think that your multithreaded programs don’t follow a tree structure, look at them closely. If they don’t, you either already have data races or are likely to develop them in the future.

-Every object has an owner

The owner may be another object–usually the embedding object. In the example below:

class Foo {
    void doWork() { _bar.doWork(); }
    private Bar _bar;
}

auto foo = new Foo;

the embedded object _bar is owned, at runtime, by the object foo (I repeat, the concrete object, not the class Foo). This is the default ownership relationship for embedded objects, so no special notation is needed to establish it (I’ll show later how to override this default).

There are also special symbolic “owners” that are used for the roots of ownership trees:

  • thread,
  • self,
  • unique, and
  • immutable.

unique and immutable are included in this list for convenience. I’ll discuss them later.

-Trees

Every object has just one owner for life, a condition necessary to create ownership trees that can be checked at compile time. Every tree has a single root and a lock is attached to that root, if needed.

The ownership information is embedded in the type of the object. Using this information, the compiler is able to deduce which lock must be held while accessing that object, and what kind of aliasing is allowed. All races (access to mutable shared variables without locking) are detected at compile time. I’ll sketch a proof later.

-What may be shared

Only immutable objects or objects rooted with a self-owned object may be shared between threads.

Additionally, objects whose direct owner is self (such objects are called monitors) may have multiple aliases while being shared. Monitors may own (and protect) other objects that are not monitors.

-Locking

The compiler will make sure that access to an object can only happen when the root of its ownership tree is locked (symbolic owners other than self are considered locked at all times). Since an object may only have one lock associated with it (at the top of its ownership tree), this condition is enough to ensure freedom from races.

Proof: I have to show that when a (mutable) object is seen by more than one thread, each access to it (read or write) is always protected by the same lock. Indeed, for an object to be shared between threads, the root of its ownership tree must be self, hence the top object must be a monitor. This monitor’s lock is always, automatically or explicitly, taken before accessing any member of the tree. The compiler knows which lock to take because the ownership information is encoded in the type of the object.

Introducing ownership annotations

Ownership is specified at the instance level (although it may be restricted at the class level). The previous example, which relied on default assignment of owners, is equivalent to the more explicit instance-level specification (that you will never see in actual programs):

Foo<owner::thread> foo = new Foo<owner::thread>;

This declares and constructs foo as being owned by the symbolic owner, thread. The embedded object _bar‘s owner is foo.

-Creating a monitor

A self-owned object is a monitor (I will alternate between the notation using shared type modifier or explicit owner annotation, <owner::self>). It contains a hidden lock and its methods are, by default, synchronized. Continuing with my example:

auto fooMon = new shared Foo;
// The same as:
// auto fooMon = new Foo<owner::self>;
fooMon.doWork();

The variable fooMon is a monitor and the doWork method is implicitly synchronized. The object _bar is now owned by fooMon. Its type can be expressed (this is rarely needed; however, see the example of external ownership) as:

Bar<owner::fooMon>

Types parameterized by runtime entities (fooMon is a runtime handle) are known in programming language theory as dependent types.

Notice that I’m using the same class to create thread-local and shared instances. This is usually possible unless there is a specific restriction at the class level.

Note to D programmers: The current semantics of D “shared” is slightly different from my proposal. For instance, it forces all embedded objects to be monitors (their methods must be synchronized by their own lock), requires explicit use of the synchronized keyword, and forces all access in non-synchronized methods to be sequentially consistent. (And it doesn’t guarantee freedom from races.)

Thread-local objects

The special thread owner, which is the owner of all thread-local objects, is conceptually always locked, so thread-local objects don’t require any synchronization. Also, thread is the default owner so, in the absence of any ownership annotations, all objects are thread-local. That’s one of the defaults that makes single-threaded programs work as-is.

Here’s an interesting twist–global and static objects are by default thread-local. This part has been implemented in D, uncovering a number of threading bugs in the process.
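
Java has no thread-local globals by default; the closest analogue is an explicit ThreadLocal wrapper (a sketch of the idea, not how D implements it):

class Globals {
    // Each thread that touches this "global" gets its own independent instance.
    static final ThreadLocal<StringBuilder> buffer =
        ThreadLocal.withInitial(StringBuilder::new);
}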

Monitors

The special self owner (or the shared type modifier) is used to create monitor objects. A monitor has a built-in lock and all its public methods are by default synchronized.

As always with defaults, the language must provide a (preferably safe) way to bypass them. To prevent locking, a method may be explicitly marked as lockfree. The compiler is obliged to check that the lockfree method doesn’t access the object’s members in an unsafe way (although it can’t prevent high-level races on lockfree variables). That restricts the lockfree constructs to those that don’t require whole-program analysis to prove their safety.

The lockfree annotation is essential for, among others, the double-checked locking pattern (DCLP). I showed its implementation as a teaser in my previous post.

Subobjects

As I explained earlier, data members of an object are by default owned by that object. This way they inherit the root owner from their parent. This is another default that makes single-threaded programs run without additional qualifiers.

Notice that there are two important aspects of ownership, the direct owner and the root owner, which might be different. The direct owner is used in type-checking, the root owner in deciding which synchronization method to use. Both are known or inferred during compilation.

As usual, the defaults may be overridden. For instance, you may embed a monitor in a thread-local object by qualifying it as self-owned/shared:

class Holder {
    private Mon<owner::self> _mon;
}

or, in common notation, as shared:

class Holder {
    private shared Mon _mon;
}

Here, _mon is not owned by Holder (the default has been overridden) so it doesn’t inherit its root owner. Its methods are synchronized by its own lock. As you can see, the ownership tree does not always reflect embedding. An embedded monitor starts a new tree.

Well, the situation is a bit subtler. Objects in Java or D have reference semantics, so there is a hidden pointer, or handle, in the code above. Accessing the handle is not the same as accessing the object proper. Consider this example:

class Holder {
    private shared Mon _mon;
    public setMon(shared Mon newMon) {
        _mon = newMon;
    }
}

Let’s instantiate a self-owned Holder and a self-owned Mon:

auto holder = new shared Holder;
auto newMon = new shared Mon;
holder.setMon(newMon);

Since holder is itself a monitor, the setMon method is automatically synchronized by its lock (it must be!). Therefore, strictly speaking, the handle part of _mon is owned by holder, whereas the object-proper part is self-owned.

You cannot embed a thread-owned object inside a monitor–the compiler would flag it as an error. This is part of alias control–a thread-local object might possibly have thread-local aliases that may be accessed without locking. Being part of a monitor, it could then migrate to another thread and cause a race.

What if a subobject is accessed directly (not through a method)? This may happen when the subobject is declared public:

class Foo {
    public Bar _bar;
}

In that case not all uses of _bar are allowed. Consider this:

auto foo = new shared Foo;
foo._bar.m(); // error

Access to _bar must happen only when foo is locked. The compiler knows it because the full type of _bar is:

Bar<owner::foo>

Here’s the corrected code:

synchronized(foo) {
    foo._bar.m();
}

An even better solution is to make _bar private and provide appropriate methods to access it. Those methods would be automatically synchronized for a shared foo.

unique and immutable

I discussed unique objects in one of my previous posts. Although not strictly required in the ownership scheme, uniqueness allows for very efficient and safe transmission of large objects between threads. It makes sense to include unique as another symbolic root owner, since its multithreaded semantics is different from other types and it doesn’t require locking.

Some languages, including D, define immutable objects, which cannot be modified after creation. Such objects may be freely shared and passed by reference between threads. Again, immutable may be used as a root owner.

Example

With the preliminaries out of the way, I can now explain in more detail the workings of the teaser from my previous post. Here’s the definition of the class MVar:

class MVar<T> {
private:
    T    _msg;
    bool _full;
public:
    void put(T msg) {
        _msg := msg; // move
        _full = true;
        notify();
    }
    T take() {
        while (!_full)
            wait();
        _full = false;
        return := _msg;
    }
}

First, let’s instantiate MVar as a shared (self-owned) monitor that is used to pass unique objects of class Foo as messages:

auto chanUnique = new shared MVar<unique Foo>;

The type of _msg in this instantiation is unique Foo, which is the same as Foo<owner::unique>. The method put takes unique Foo, so the following code is type-correct:

auto foo = new unique Foo;
chanUnique.put(:= foo); // move foo

Notice that unique objects cannot be assigned or passed by value–they have to be moved, hence the use of the move operator, :=. Internally, the method put also uses the move operator (good thinking on the part of the designer–otherwise MVar couldn’t be instantiated with unique). What’s interesting about this example is that messages are not deep-copied between threads. They are safely passed by reference.

Since chanUnique is self-owned (shared), both put and take are automatically synchronized.
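
To see what that synchronization amounts to, here is a hand-written Java approximation of the same monitor, with no move semantics and no ownership checking (in the proposed scheme the compiler would generate the equivalent automatically):

class MVarJava<T> {
    private T _msg;
    private boolean _full;

    public synchronized void put(T msg) {
        _msg = msg;            // plain assignment where the pseudo-code moves
        _full = true;
        notifyAll();
    }

    public synchronized T take() throws InterruptedException {
        while (!_full)
            wait();
        _full = false;
        T result = _msg;
        _msg = null;           // drop the reference, the closest Java gets to a move
        return result;
    }
}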

Now let’s access chanUnique from another thread:

// another thread
unique Foo f2 = chanUnique.take(); // implicit move of rvalue

The return type of take is unique Foo, so the types check. I could have used the move operator, but since the right hand side is an rvalue, the compiler lets me use the assignment.

Now for the tricky case: What’s wrong with this code?

auto mVar = new shared MVar<Foo>;
auto myFoo = new Foo;
mVar.put(myFoo);
myFoo.unsyncMethod(); // ouch!

Since myFoo is created as thread-local (that’s the default), its methods are not synchronized. If I were able to pass it to the shared MVar, another thread could obtain it through take. It could then call the unsynchronized method unsyncMethod at the moment when I was calling it. A data race would be possible! Or would it?

Guess what–the compiler won’t let you shoot yourself in the foot. It will notice that it would have to instantiate a shared object mVar with a thread-local member _msg. That’s against the rules! (A shared object cannot own a thread-local object.)

External ownership

In the original GRFJ paper the authors showed an example where one object was owned by another object without the former being embedded in the latter. They made an observation that, for the purpose of locking, the ownership relationship must be unchangeable: You can’t switch the owner on the fly. Therefore external ownership is allowed only if the owner is declared final.

final shared Lock lockObj = new shared Lock;
auto foo = new Foo<owner::lockObj>;
auto bar = new Bar<owner::lockObj>;

In this case, the compiler will only allow access to foo under the lock of lockObj, as in:

synchronized(lockObj) {
    foo.method();
    bar.method();
}

This construct is useful in situations where the locking discipline is not easily convertible to object hierarchy.

Conclusion

You might have noticed my use of dual notation. Most user code would be written with type qualifiers such as shared, unique, or immutable. However, in some cases I used an alternative notation that looked more like the specification of template parameters: <owner::self>, <owner::unique>, <owner::immutable>, or even <owner::thread> (in D they would be surrounded by !( and )). This was not meant to further confuse the reader, but as a gentle introduction to qualifier polymorphism, which I will describe in the next installment. I will show how classes and methods may be parameterized with different types of ownership, cutting down code duplication.

I’d like to thank Andrei Alexandrescu, Walter Bright, Sean Kelly and Jason House for very helpful comments. I’m also indebted to the D community for discussing my previous posts.

Bibliography

  1. C. Boyapati, M. Rinard, A Parameterized Type System for Race-Free Java Programs.
  2. C. Flanagan, M. Abadi, Object Types against Races.

A type system that prevents data races must not eliminate useful concurrency patterns or force the programmer to maintain multiple copies of almost identical data structures.

In my previous post about Guava, a dialect of Java, I talked about a type system that enforces the separation of thread-local and shared objects at the class level. Unfortunately, such a rigid system forces the programmer (especially a library writer) to provide dual implementations of many generic classes like vectors, queues, etc. Often the only difference between implementations is the use of synchronized (or, in the case of Guava, the special base class called Monitor).

To solve this problem, Boyapati and Rinard proposed a system where the same generic class may be used in different sharing contexts. For instance, the same parameterized vector class may be used to instantiate a thread-local instance that requires no locking, as well as a shared instance that has a built-in lock.

The paper precedes Generic Java, but employs similar notation. For instance, it lets you define a generic vector class like this:

  class vector<thisOwner> { ... }

and then instantiate it as thread-local (no locking requirements):

  vector<thisThread> localV = new vector<thisThread>;

or as a shared monitor:

  vector<self> sharedV = new vector<self>;

The first template parameter is always interpreted as “the owner” (more about it later). Objects owned by thisThread are thread-local, objects owned by self are monitors.

Even though the notation used in GRFJ (the acronym the authors use for their Java dialect) is different from that of Guava, there are many similarities in the two approaches, since both have to deal with similar issues: explicit sharing, ownership, passing objects between threads, etc.

-Explicit sharing

You might remember that, in Guava, only those classes that inherit from Monitor may be shared. In GRFJ, sharing is defined at the instance level (the instantiation of a template). Every instance declaration must specify the owner of the object. If the owner is not thisThread, the object may be shared between threads. The equivalent of the Guava Monitor is a self-owned object–its owner is declared as self.

-Ownership

Ownership plays an essential role in preventing data races. Every object has an owner. In GRFJ there are three basic types of owners:

  1. thisThread–the object owned by thisThread is never shared.
  2. self–the object is the root of an ownership tree.
  3. Another object–the sharing is defined by the root of the ownership tree.

Types of ownership translate naturally into protection patterns. If the owner is thisThread there is no need for locking. If the owner is self, all methods must be synchronized by the object’s lock. In the third case, the object’s owner is responsible for locking. More precisely, the root of the ownership tree to which the object belongs has to be locked, if it’s not declared thread-local.

There are some interesting combinations of ownership. For instance, you can declare a thread-local vector that stores self-owned (shared) items. Or you can declare a shared Stack that contains (owns) a Vector object. All access to Vector will be protected by the Stack’s lock.
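
For instance, the first combination might be declared like this (a sketch, assuming a hypothetical two-owner vector template analogous to the Stack defined below):

  vector<thisThread, self> localV2 = new vector<thisThread, self>;
  // the vector itself needs no lock, but every stored item is a monitor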

For this level of expressiveness, we need classes that are parameterized by owners. Notice that, if the owner of the object x is another object, that object must exist when the declaration of x is reached.

A template might be parameterized by multiple owners. The first one on the list is always the owner of this. In the example below, Stack is parameterized by two owners–the first owns the Stack, the second owns the Values. Note that, in this case, all Values will always share the same owner.

class Stack<thisOwner, valueOwner> {
    Node<thisOwner, valueOwner> head = null;

    void push(Value<valueOwner> value) requires (this) {
        Node<thisOwner, valueOwner> newNode = 
            new Node<thisOwner, valueOwner>;
        newNode.init(value, head);
        head = newNode;
    }
    Value<valueOwner> pop() requires (this) {
        if (head == null) return null;
        Value<valueOwner> value = head.value();
        head = head.next();
        return value;
    }
}

The requires clause specifies whose lock must be held when calling a particular method. In this case, the lock on this must be held. But if this is owned by another object, the locking responsibility automatically moves up a level, until it reaches the ownership root.

Here’s the definition of Node:

class Node<thisOwner, valueOwner> {
    Value<valueOwner> value;
    Node<thisOwner, valueOwner> next;

    void init(Value<valueOwner> v, Node<thisOwner, valueOwner> n)
        requires (this) {
        this.value = v;
        this.next = n;
    }
    Value<valueOwner> value() requires (this) {
        return value;
    }
    Node<thisOwner, valueOwner> next() requires (this) {
       return next;
    }
}

And the definition of Value:

class Value<thisOwner> { int x = 0; }

Using the declarations above we are now ready to declare different values and stacks:

Value<thisThread> v1 = new Value<thisThread>;
Value<self> v2 = new Value<self>;

We have created two values from the same template–v1 is thread-local, v2 is a monitor (access to x is protected by its lock).

Stack<thisThread, thisThread> s1 = new Stack<thisThread, thisThread>;
Stack<thisThread, self> s2 = new Stack<thisThread, self>;
Stack<self, self> s3 = new Stack<self, self>;

Stack s1 is thread-local and can store only thread-local values. No locks or locking code will be created by the compiler. Stack s2 is also thread-local, but it stores shareable values. A thread-local stack will never be visible from other threads, but the self-owned values it stores might be accessed from multiple threads. Finally, s3 is a shared stack containing shared values. Both the stack s3 and the values it stores have their own independent locks.

s1.push(v1);
s2.push(v2);

We may push a thread-local value v1 on the stack s1, but if we tried to push v2 on s1, the compiler would consider it an error. Pushing v2 on s2, on the other hand, is okay.

Since GRFJ is based on Concurrent Java, the locking and threading look a little odd. To illustrate the sharing of s3, we fork a thread, passing it s3 and v2 (both are sharing-ready); the new thread executes s3.push(v2) under the lock of s3.

fork (s3,v2) {synchronized (s3) in {s3.push(v2);}}

Notice that, according to the declaration of s3, it would be an error to push v1 onto it. Indeed, that could result in illegal sharing of a thread-local object. The type system protects us from a hard-to-detect error.
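
The rejected call would look something like this:

fork (s3,v1) {synchronized (s3) in {s3.push(v1);}} // error: v1 is owned by thisThread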

This is hardly a free lunch, though. Annotating every class and every variable might be just too much for a programmer. Fortunately, most owners can be inferred by the compiler by analyzing assignments. Because of that, single-threaded programs in GRFJ require virtually no annotations.

-Foreign owners

In most cases, the ownership tree follows the containment tree: the owner contains the ownee. Although desirable from the architectural point of view, this arrangement is not strictly necessary. An object might declare another, separate object as its owner. This is safe under one condition–the owner object may not be overwritten. Hence the requirement that the owner be final. Here’s the relevant example:

final Foo<self> owner = new Foo<self>;
Bar<owner> ownee = new Bar<owner>;

This becomes important when building new locking protocols from pre-existing parts.

-Object migration

The passing of objects between threads requires either deep copying (like Guava’s Values), or move semantics. In GRFJ, move semantics is implemented by specifying another special type of owner–unique. Unique objects cannot be copied, they have to be moved. The “move” operator is the postfix decrement, just like in Guava. It moves the reference and nulls the source.

Value<unique> v3 = new Value<unique>;
Value<unique> v4 = v3--;

Our Stack class is not prepared to store unique objects. This prohibition may be included in its definition (compare it with C++ “concepts”):

class Stack<thisOwner, valueOwner> where (valueOwner != unique)

Conversely, we might want to redefine Stack to store unique objects. A few code changes would be necessary though. For instance, in push, the value must be moved:

newNode.init(value--, head);

In pop, the return value has to be moved:

return value--;

The class Node requires similar changes.
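
Putting the pieces together, a unique-value stack might look roughly like this (my sketch, not taken from the paper; Node would be modified analogously):

class UniqueStack<thisOwner> {
    Node<thisOwner> head = null;

    void push(Value<unique> value) requires (this) {
        Node<thisOwner> newNode = new Node<thisOwner>;
        newNode.init(value--, head);        // the unique value is moved into the node
        head = newNode;
    }
    Value<unique> pop() requires (this) {
        if (head == null) return null;
        Value<unique> value = head.value(); // value() now moves its result out
        head = head.next();
        return value;
    }
}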

The authors note that, despite appearances, the moving of objects is multiprocessor safe. Even though the assignment to a new reference is not guaranteed to be immediately visible to other threads, a unique object is always published to another thread via a monitor (for instance, a shared stack). The new thread can only get hold of the object by first acquiring the monitor’s lock, which forces previous stores to commit.

-Alias control

Move semantics requires control over aliasing–a moved object may not leave any aliases, nor may it carry along any references to thread-local objects. GRFJ provides additional annotations to mark non-escaping method parameters. The syntax is !e appended to the parameter type. Here’s an example:

void display(Value<unique>!e val);
Value<unique> v5 = new Value<unique>;
display(v5);

A unique object here is passed by reference only because display guarantees that its argument won’t escape.

-Immutable objects

Immutable objects may be passed between threads by reference. In GRFJ, immutable objects are marked by another special owner type, readonly. Interestingly, the problem of the construction of immutable objects is cleverly avoided. You first create a unique object and then move it to a readonly reference.

Value<unique> v6 = new Value<unique>;
v6.x = 1;
Value<readonly> v7 = v6--;

This is a perfectly safe operation, since there is a guarantee that no writable aliases to the unique object may stay behind. The move to readonly freezes the object forever.

Immutable objects can only be passed to methods that promise not to modify their arguments. This is done by appending !w to the type of the parameter. The two suffixes may be combined to form !ew (I am not kidding you!), a parameter that doesn’t escape and is not modified by the method.
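
For illustration, the notation might look like this (my guess at the exact form, by analogy with the !e example above):

void print(Value<readonly>!w v);  // promises not to modify its argument
print(v7);                        // okay: v7 is immutable

void show(Value<unique>!ew v);    // neither stores nor modifies its argument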

-Limitations

– Some concurrent programs use multi-stage access patterns that are not expressible in GRFJ. For instance, a shared array is divided into disjoint sections and each thread operates exclusively on its section without any locking. After all threads synchronize on a barrier, they pick up different sections and continue. The ownership of sections changes with time. (In D this pattern might be implementable using array slices.)

– Dynamic downcasting, which used to be the workhorse of Java before generics, can’t verify the ownership part of the cast, because this information is not available at runtime.

– Static variables may be accessed only when the client holds the class lock. Each class with static variables must therefore have a static lock.

– The authors mention the idea of parameterizing methods that accept poly-owned arguments. This is not as easy as it sounds, since virtual functions cannot be parameterized by a potentially infinite set of types. My guess is that this is possible because detailed ownership information is only needed during type checking. Still, the compiler might have to produce additional implementations of a method depending on whether the parameter is thread-local or not (imagine the parameterized method creating a local variable of the same ownership type–sometimes this variable must contain a lock, sometimes not). However, this is a finite set of possibilities, so vtable slots may be preallocated for all of them.

– The template syntax for ownership won’t work in languages where templates already accept value parameters, and the compiler isn’t always able to distinguish between types and values.


Can a good type system prevent concurrency errors? Or is this a quest for the Holy Grail?

There are two parts to this question, corresponding to two major types of concurrency errors:

  1. Preventing data races
  2. Preventing deadlocks

I’ll start with the first one.

Data races occur only when memory is shared between threads. Disallow sharing and data races are gone! In fact there is a name for threads that don’t share memory: processes. It’s perfectly feasible to have a concurrent language that disallows sharing–Erlang is one (see my post about Erlang). The trick is to always pass data between threads by value. This is especially easy in functional languages.

Non-functional languages like C++, Java, or D (I was told by Walter Bright, the creator of D, to always use the full name, “the D programming language,” so that search engines can index it properly) tend to share data almost by default (see, however, this post).

In Java, all non-primitive types have reference semantics. When you pass a Java object between threads, you’re only passing a handle; the object behind it becomes accessible from both threads.

C++ at least has means to express passing by value and move semantics for user-defined types. Still, it’s up to the programmer to use them.

Who ordered Guava?

For every type-system idea there is at least one dialect of Java that implements it. I’ll start with an older attempt at data-race-free Java called Guava, as it illustrates some of the basic premises.

-Explicit sharing

The most important step–if we don’t want to completely ban the sharing of data–is to regulate it. Let the programmer explicitly mark the data destined for sharing as such. The corollary is that the data that is not marked for sharing cannot be shared. This can be accomplished, for instance, by making all objects thread-local by default, or by using type modifiers that prevent references to such objects from escaping.

In Guava, the data type designed for sharing is called a Monitor. As the name suggests, all access to a Monitor is automatically synchronized by the Monitor’s lock. This, incidentally, eliminates the need for the synchronized keyword, which is absent from Guava.
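
A minimal sketch of what a Guava monitor might look like (my pseudo-code; the update annotation discussed later is omitted here):

class Counter extends Monitor {
    private int _count = 0;

    // no synchronized keyword: entering any method of a Monitor
    // implicitly acquires that Monitor's lock
    public void increment() { _count = _count + 1; }
    public int current()    { return _count; }
}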

The non-shared data types are either Objects or Values.

Objects are just like regular Java Objects, except that they don’t have a built-in lock, since they can never be shared between threads.

Values are either primitive values, like integers, or user-defined types with value semantics.

Monitors, Objects, and Values are collectively called instances.

-Value semantics

When you pass a Value, the compiler will make a deep copy of it (well, sort of–the monitors embedded in a Value are not deep copied). Since deep copying might be expensive, Guava defines operator “move”, which nulls the source. The syntax is:

  v2 = v1--;

The value v1 becomes null after the move to v2. This is similar to C++ unique_ptr and std::move.

-Ownership

The biggest problem in lock-based concurrency is to make sure that the correct lock(s) are taken when accessing shared data. In Guava, all of a Monitor’s data is protected by that Monitor’s lock. As long as the data stays inside the Monitor, nothing bad can happen to it from the point of view of concurrency.

Values stored inside a Monitor are never accessible outside of the Monitor–only their copies may be passed out.

The same is not true about Objects. Since Objects have reference semantics, there is a real danger that Objects’ references might escape the Monitor that protects them. Imagine a situation where two Monitors have references to the same Object. It is possible then that two threads may operate on that Object at the same time–one entering through one Monitor, the other through the other Monitor. We have a data race!

Therefore it is important that every Object have an owner at all times. The Object’s owner is either a Value or a Monitor. (The exception is a fresh Object that has just been allocated–it has no owner until it is assigned to a variable.) Since an Object may be owned by at most one Monitor, it is that Monitor that protects it from simultaneous (racy) access.

-Regions

All Objects that are owned by a particular Monitor or Value form a region. Equivalently, placing an Object in a monitored region determines which lock must be held when accessing it.

All instances may contain (references to) monitors, but monitors are not “owned” by anybody. References to the same monitor may appear in multiple regions and may be freely passed around. It is thus up to programmers to define an ordering scheme for their monitors in order to avoid deadlocks.

How can we protect Objects from moving between regions and acquiring multiple owners? We need a way to control aliasing.

Here are some Guava rules for passing Objects. A method may declare its Object parameter as either kept or lent. (By default, parameters to Object methods are kept and to Monitor methods are lent.) If the parameter is kept it must belong to the same region as this, and there are no limitations on its use. If, however, the parameter is lent, the method may not store a reference to it in this, nor may it store this inside a lent Object. No cross-aliasing is allowed.

A method may also be marked new if it returns a freshly created object, which has no owner yet. Constructors are considered new unless they accept kept parameters.
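
Here is a sketch of how these annotations might look (hypothetical classes; I’m guessing at the placement of the keywords):

class Registry extends Monitor {
    private Entry _latest;                 // an Object owned by this Monitor's region

    // lent (the default for Monitor methods): we may use e,
    // but we may not store a reference to it in this
    public void inspect(lent Entry e) {
        // _latest = e;                    // would be rejected: e is lent
    }
    // kept: e must already belong to this region and may be stored
    public void remember(kept Entry e) {
        _latest = e;
    }
    // new: returns a freshly created Object that has no owner yet
    public new Entry makeEntry() {
        return new Entry();
    }
}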

Notice that you may have multiple references to the same Object, but they will all be within the same region. The only instances that may be passed between threads are Monitors and Values.

-Constructors

Guava final fields may either be initialized inside a constructor or in a private method that is only called from inside a constructor. (BTW, in D a private method is not private within a module, so the compiler would have to analyze the whole module to establish the latter condition.) [Note to self: The same scheme might be useful in the construction of immutable objects in D.]

Partially constructed Objects must not be visible outside constructors. The compiler must verify that constructors don’t pass this to other methods, and don’t store this inside other instances (no alias cross-contamination).
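
In plain Java terms, here is the kind of constructor code the Guava compiler must reject (my example; the commented-out lines are the offenders):

import java.util.ArrayList;
import java.util.List;

class Account {
    private static List<Account> _registry = new ArrayList<Account>();
    private final long _id;

    Account(long id) {
        _id = normalize(id);     // okay: value computed by a private helper
        // _registry.add(this);  // rejected: this escapes before construction ends
    }
    private long normalize(long id) {
        return id < 0 ? 0 : id;
    }
}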

-Optimizations

Copying Values around may be expensive. I already mentioned one optimization, the use of the “move” operator. The other optimization is related to immutability. If a Value is immutable, it may be safely passed by reference. Guava defines immutable classes as ones that have no update methods. Any method that may modify this must be marked update. The update notation is also extended to method parameters–by default parameters are immutable.

There is a bonus advantage to separating update methods from the rest. In a Monitor, a non-update method may safely use a readers’ lock, which allows multiple readers to access data simultaneously, to increase concurrency.

-Global and local methods

A method is considered local if its effects cannot be observed or influenced by other threads. All Object and Value methods are local by default. A local method is immune to races, which allows single-thread optimizations.

Conversely, all Monitor methods are considered global, since operations on Monitors may be visible or influenced by other threads.

These defaults may be overridden. For instance, an Object may contain a reference to a Monitor. The methods that call this Monitor’s methods must be marked global. Moreover, Object or Value methods that access Monitors that are passed to them as arguments must also be marked global. So touching a Monitor (which is equivalent to using its lock) taints the whole callers’ tree with the global annotation.
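
A sketch of how the global annotation might propagate (my pseudo-code; I’m guessing at the placement of the keyword):

class Stats extends Monitor {
    private int _count = 0;
    public int count() { return _count; }  // a Monitor method: global by default
}

class Worker {                              // an Object: its methods are local by default
    int sample(Stats stats) global {        // calls a Monitor's method, so it must be global
        return stats.count();
    }
}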

This is similar to the way update taints the callers’ tree, except that the update annotation of a method only pertains to this, not to its arguments. However, when global and update are combined, they have the same tainting power as global. In particular, a method that invokes a global update method on its argument becomes tainted with global update.

Methods that are global update cannot be invoked from non-update methods, even if they only global-update their arguments.

Note: This part of the paper is not very clear to me. The authors never explain the importance of global update methods (other than optimization opportunities).

-Limitations

Guava implements a relatively simple and somewhat limited system to eliminate races. It punts on the problems of ownership passing and lock-free programming. Even with those limitations, the Guava type system is not simple.

The idea that safe multithreading may be achieved with a few simple modifications to the type system seems to be untenable. However, as long as special type annotations are limited to the code that does actual multithreading, I think they’re worth the added complexity.


What is the most basic building block for asynchronous message passing? I’ve found the answer in Concurrent Haskell. Not many people are fluent in Haskell, especially when monads are involved, so MVars may sound foreign to most. Not to worry–I’ll translate them into Java (as I did with synchronous CML channels in my previous post).

An MVar is an object with two methods, put and take. The sender calls put with a message and continues (it doesn’t block–that’s why this message-passing model is called “asynchronous”). The receiver, in another thread, calls take to remove the message from MVar. If there is no message, the receiver blocks until there is one.

An MVar can store a maximum of one message, so it’s an error to call put when there is an unclaimed message inside it. An MVar can thus be only in one of two states: empty or full.

Here’s a simple-minded implementation of MVar written in Java. (A lock-free implementation is possible–and closer to how Haskell implements it–but it’s harder to reason about.)

public class MVar<T> {
    private T _obj;
    private boolean _full;

    public MVar(){
        _full = false;
    }
    // put: asynchronous (non-blocking)
    // Precondition: MVar must be empty
    public synchronized void put(T obj) {
        assert !_full;
        assert _obj == null;
        _obj = obj;
        _full = true;
        notify();
    }
    // take: if empty, blocks until full.
    // Removes the object and switches to empty
    public synchronized T take() throws InterruptedException {
        while (!_full)
            wait(); // may throw!

        T ret = _obj;
        _obj = null;
        _full = false;
        return ret;
    }
}

You might think that it would be difficult to implement anything useful with MVars. After all, it looks like a message queue of length one, which bombs when you try to put a second message in. Yet it is the simplest building block for more sophisticated structures. It is the atom of asynchronous message passing.

We’ve seen before how to implement asynchronous message passing using synchronous building blocks by spawning a thread. Now let’s see how we can implement a simple synchronous channel variable using asynchronous MVars. Remember, in a synchronous channel, if there is no receiver already waiting on the other side, a call to send (or write, in this example) will block.

The basic channel variable, or a channel of length one, is called a CVar in Haskell, and it contains two MVars, one to store the message, and the other for the acknowledgment. The acknowledgment MVar doesn’t really have to store anything–we are only interested in its state: empty or full. The reader will acknowledge the receipt of the message by setting it to full.

public class CVar<T> {
    private MVar<T> _data;
    private MVar<Object> _ack;
    
    public CVar(){
        _data = new MVar<T>();
        _ack = new MVar<Object>();
        _ack.put(null); // make _ack full
    }
    public void write(T obj) throws InterruptedException {
        _ack.take(); // make _ack empty
        _data.put(obj);
    }
    public T read() throws InterruptedException {
        T data = _data.take();
        _ack.put(null); // make _ack full
        return data;
    }
}

MVars can also be used to build a more useful asynchronous buffered channel that can store more than one message at a time. I will show you the construction, but it’s far from trivial. You might notice that it closely resembles a lock-free FIFO queue (although I chose to use locks to implement the MVar). As with all lock-free data structures, one has to be very careful when reasoning about their correctness. I’ll leave it as an exercise to the reader 😉 .

Item and Stream are mutually recursive data structures forming a linked list. An Item points to a Stream–the tail of the list. A Stream is an MVar containing an Item. You may look at a Stream as a sequence of alternating MVars and Items. An empty MVar may serve as a sentinel.

class Item<T> {
    private T _val;
    private Stream<T> _tail;
    
    public Item(T val, Stream<T> tail){
        _val = val;
        _tail = tail;
    }
    T value(){
        return _val;
    }
    Stream<T> tail(){
        return _tail;
    }
}

// It's just a typedef for an MVar that stores an Item
class Stream<T> extends MVar<Item<T>> {}

A Channel contains the Stream linked list stored in an MVar, which is called _read because the head of the list is where we read messages. The other MVar points to the end of the list (really an empty sentinel Stream). This is the write end of the list. The methods put and get (which are not synchronized!) perform some intricate manipulations characteristic of lock-free algorithms, making sure that at every step the queue is in a consistent state for concurrent access. Notice that put will only block if another put is in progress. Otherwise it’s asynchronous–there is no waiting for a receiver.

public class Channel<T> {
    private MVar<Stream<T>> _read;
    private MVar<Stream<T>> _write;
    
    public Channel() {
        _read = new MVar<Stream<T>>();
        _write = new MVar<Stream<T>>();
        Stream<T> hole = new Stream<T>();
        _read.put(hole);
        _write.put(hole);
    }
    public void put(T val) throws InterruptedException {
        Stream<T> newHole = new Stream<T>();     // fresh empty sentinel
        Stream<T> oldHole = _write.take();       // grab the current sentinel (excludes other putters)
        _write.put(newHole);                     // publish the new sentinel as the write end
        oldHole.put(new Item<T>(val, newHole));  // fill the old sentinel, linking it to the new one
    }
    public T get() throws InterruptedException {
        Stream<T> cts = _read.take();            // grab the read end (excludes other getters)
        Item<T> item = cts.take();               // blocks if the channel is empty
        _read.put(item.tail());                  // the item's tail becomes the new read end
        return item.value();
    }
}
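
To see the buffered channel in action, here’s a tiny producer/consumer test in the style of my earlier examples (my code, not part of the channel itself):

public class TestBufferedChannel {
    public static void main(String[] args) throws InterruptedException {
        final Channel<Integer> ch = new Channel<Integer>();
        Thread consumer = new Thread() {
            public void run() {
                try {
                    // get blocks until a message is available
                    System.out.println("Got: " + ch.get());
                    System.out.println("Got: " + ch.get());
                } catch (InterruptedException e) {}
            }
        };
        consumer.start();
        ch.put(1); // put returns immediately; no receiver needed
        ch.put(2);
        consumer.join();
    }
}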

In the next few installments I’m planning to talk a little about “choice” and then tackle the actor-based message-passing paradigm (Erlang and Scala).


What’s the lowest level primitive that can be used to build a concurrent message passing system? As I discussed in my previous post, there are two message passing paradigms, synchronous and asynchronous.

Let’s start with the atom of synchronous message passing. In its purest form it is implemented in Concurrent ML (CML) in the form of a channel. A channel has two methods, send and recv. A thread that calls send blocks until another thread calls recv on the same channel. Similarly, a thread that calls recv will block if there is no blocked send to rendezvous with.

CML channels are typed, i.e., a channel for passing integers has a different type than a channel for passing characters.

Since not everybody is familiar with ML, I decided to implement an equivalent of CML channels in Java. I considered C++ and D, but the Java implementation is the simplest. In C++ I would have to use a C++0x compiler, which I don’t have; in the D programming language, condition variables are not as easy to use as in Java (although that might change in the future). This implementation works only for point-to-point communications–it does not support multiple senders or receivers. [Following readers’ comments I modified the implementation to take into account the so-called spurious wakeups–exits from wait not caused by notify.]

public class Channel<T> {
    private boolean _dataReady;
    private boolean _received;
    T _msg;

    Channel(){
        _dataReady = false;
        _received = false;
    }
    public synchronized void send(T msg) throws InterruptedException {
        while (_dataReady)   // wait until the previous message is consumed
            wait();
        _msg = msg;
        _dataReady = true;
        notify();            // wake up a blocked receiver, if any
        while (!_received)   // rendezvous: block until the receiver picks up the message
            wait();
        _received = false;
    }
    public synchronized T recv() throws InterruptedException {
        while (!_dataReady)
            wait();
        T msg = _msg;
        _dataReady = false;
        _received = true;
        notify();
        return msg;
    }
}

This is probably not the simplest implementation, but it illustrates the principle. Notice that internally the message is passed through shared memory (the data member _msg). This is standard for intra-process, and often inter-process, message passing.

Synchronous channels can be used as building blocks in the implementation of asynchronous message passing. The asynchronous sender simply creates a worker thread that calls send and possibly blocks. The calling thread is free to proceed immediately. The efficiency of this sort of implementation depends heavily on how cheap thread creation is (although worker threads can be cached in thread pools). Any language runtime that uses native threads (including Java) is handicapped in this respect. Interestingly enough, Erlang and Haskell, which use the asynchronous model, have cheap threads; the synchronous CML, paradoxically, uses native threads.
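
Here is a minimal sketch of such an asynchronous wrapper around the synchronous channel (my code; a worker thread is created per message, with no pooling):

public class AsyncSender<T> {
    private final Channel<T> _ch;

    public AsyncSender(Channel<T> ch) {
        _ch = ch;
    }
    // Returns immediately; a worker thread blocks in send
    // until a receiver shows up on the other end.
    public void asyncSend(final T msg) {
        new Thread() {
            public void run() {
                try {
                    _ch.send(msg);
                } catch (InterruptedException e) {}
            }
        }.start();
    }
}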

In the next installment, I’ll describe the atom of asynchronous message passing that can be found in Haskell (don’t worry, I’ll translate it into Java).

For completeness, here’s the program I used to test my channels:

public class RecvThread extends Thread {
    private Channel<String> _ch;
    long _waitTime;

    public RecvThread(Channel<String> ch, long waitTime){
        _ch = ch;
        _waitTime = waitTime;
    }
    public void run()
    {
        try {
            Thread.sleep(_waitTime);
            String msg = _ch.recv();
            System.out.println("Message received: " + msg);
        } catch (InterruptedException e) {}
    }
}

public class TestChannel {
    public static void main(String[] args) {
        try {
            testReceiver();
            testSender();
        } catch (InterruptedException e) {
            System.out.println("Interrupted Exception");
        }
    }
    public static void testReceiver() throws InterruptedException 
    {
        Channel<String> ch = new Channel<String>();
        // send will happen first
        RecvThread t = new RecvThread(ch, 1000);
        t.start();
        ch.send("Hello before rendezvous");
    }
    public static void testSender() throws InterruptedException 
    {
        Channel<String> ch = new Channel<String>();
        RecvThread t = new RecvThread(ch, 0);
        t.start();
        Thread.sleep(1000);
        ch.send("Hello after sleep");
    }
}

In the first test, the sender blocks; in the second, the receiver blocks.


An immutable object never changes. You can bet your program on it. As I explained in my previous post, the same is not true for const objects (or readonly objects, in dialects of Java). They may be changed through mutable aliases. An immutable object has no mutable aliases. Ever!

Small print: this guarantee is predicated on the programmer not overriding the type system with casts and other escape mechanisms.

To my knowledge, immutability is currently available in the D programming language and in a Java dialect called IGJ (Immutability Generic Java). It is the default in functional languages.

The closest you can get to immutability in C++ is by using const as a storage class:

const double pi = 3.141592;
const char ERRORMSG[] = "Your bank went belly up.";

Here, the value of pi or ERRORMSG is guaranteed to never change. Global or static immutable values can be used at compile time (for instance as labels in a switch statement).

Creating Immutable Objects

Defining immutable numbers, arrays, or POD structs is pretty straightforward. But what about more complex objects that don’t have static initializers? How do you create an immutable linked list? In functional languages, lists are treated as built-in types, like arrays in general-purpose languages, and can be statically initialized. But in C++ or Java the creation of a list involves memory allocations and reference manipulation. Since, by definition, we can’t manipulate an immutable list, it seems like we can never create one!

How about relaxing the constraints a little to allow mutation inside the constructor of an immutable object? Here’s a hypothetical example of a list in D (the current D compiler doesn’t fully implement immutability, so my examples are written in pseudo-D):

class IntList
{
public:
  // default constructor
  this() {} // _head is default-initialized to null
  // one element constructor
  this(int i) {
    _head = new IntLink(i); // mutates _head
  }
private:
  IntLink _head;
}
// will this work?
immutable IntList oneList = new immutable IntList(1);

There is a significant problem with this solution. If this is not considered immutable inside the constructor, then there is no guarantee that a mutable reference to this or any of its subobjects won’t escape its scope. Consider this example:

IntLink globalLink; // mutable!

IntList.this(int i) {
  _head = new IntLink(i);
  globalLink = _head; // escape!
}

immutable IntList immList = new immutable IntList(1);
globalLink.setValue(2); // mutates immList!

Here, an immutable object immList has been mutated through an alias globalLink. We can’t allow this!

It’s true that a compiler could perform escape analysis on the constructor of IntList, provided it has access to its source code, which might not always be true when it’s compiling the statement that creates the immutable object. After all, class IntList might be implemented in a third-party library.

In the absence of source code, the only other possibility is to include immutability information in the type signature of the constructor. When an immutable object is created, the compiler would use an immutable constructor, and it would fail if one didn’t exist. Conversely, an immutable constructor would not compile if it allowed a mutable reference to escape. This bad code would not compile:

IntList.this(int i) immutable {
  _head = new IntLink(i);
  globalLink = _head; // error!
}

Of course, no mutable methods may be called from inside an immutable constructor–they couldn’t guarantee the non-escape of mutable aliases.

This solution works, even if it’s not perfect. It often leads to code duplication (the immutable constructor being identical to the mutable one, as in the IntList example). Moreover, it prevents some forms of refactoring. Even though, inside an immutable constructor, you may initialize an object’s fields, you can’t delegate this task to a (perforce immutable) method of the object.

Assignment is not the same as Mutation

The key to immutable construction is the observation that, when constructing an immutable object, it’s okay to assign the object’s fields but not to mutate them. During construction only such “shallow” mutation should be possible.

In my example, the assignment to _head is okay, but the mutation of the IntLink object attached to it should be prohibited. Indeed, I don’t need to mutate the head link once it’s constructed. Of course the construction of an immutable IntLink follows the same rules. Here’s the relevant code:

class IntLink
{
  this(int i) immutable {
    // _next is default-initialized to null
    _val = i; // field assignment
  }
  int _val;
  IntLink _next;
}

With this understanding, it’s actually possible to minimize code duplication. To this end, IGJ introduces a new type modifier, AssignFields. A constructor or a method that performs no other mutation but field assignment may be declared AssignFields. Since AssignFields methods and AssignFields constructors can also be used in mutable contexts, they don’t have to be duplicated. Expanding on the above example:

class IntLink
{
  this(int i) assignfields {
    // _next is default-initialized to null
    SetValue(i); // Ok: it's an assignfields method
  }
  void SetValue(int i) assignfields {
    _val = i; // field assignment
  }
  int _val;
  IntLink _next;
}

As you can see, I was even able to refactor the part of the constructor that does the assignment to _val. I can now use the same constructor in both mutable and immutable contexts. The SetValue method can only be called in a mutable or assignfields context.

immutable IntLink iLink = new immutable IntLink(1);
IntLink mLink = new IntLink(2);
mLink.SetValue(3);

Subtyping relationships

It is okay to pass an immutable object to a function that expects a const object. After all, such a function will not mutate the object. If a reference to the object escapes the function, it can only be a const reference. And again, const reference cannot be used to mutate the object, so we’re fine.

The compiler will allow this subsumption if we establish that immutable is a subtype of const (more precisely, for any type T, immutable T is a subtype of const T). This is very similar to the compiler allowing the passing of a derived-class object to a function that accepts a base-class object–the Liskov substitution principle.

The full subtyping hierarchy between various mutability annotations is as follows:

  • immutable is a subtype of const: You can pass an immutable object to a function taking a const argument.
  • assignfields is a subtype of const: You can call a const method from an assignfields method (subsumption of this).
  • mutable is a subtype of assignfields: You can call an assignfields method on a mutable object (as in mLink.SetValue()).

Because of transitivity, mutable is also a subtype of const. You can pass a mutable object to a function that takes a const argument (you do it in C++ without much thinking).


Conclusion

There is some advantage to introducing yet another type modifier, assignfields, to smooth out the process of constructing immutable objects. On the other hand, is it worth the additional complexity? How often does one construct non-trivial immutable objects? If it’s not a common programming pattern, then maybe we can live with the current restrictions and some code duplication. We still have very little experience in using immutable in D, so we might have to wait and see.

Recent news: A video of my talk to the Vancouver C++ Users group was just posted. The black shadow in front of the bright screen talking about memory fences is yours truly.

