D Programming Language

Since ownership plays a major role in race-free programming, it will be the first topic in my proposal for a race-free system. I presented the bird’s eye view of the system and provided a few teasers in my previous post. The design is based on published papers (see bibliography at the end). My contribution was to integrate several ideas into one package.

When I showed this proposal to my friends they either didn’t believe it could work or considered it too complex, depending which end they were looking at. From users’ perspective, the system looks relatively simple, so the natural reaction is: That can’t work. If you get into the details of why it works, and how the compiler knows you are in danger of a data race, you need some theory, and that is complex. So I decided to deal with some theory first, to show that the things work. If you’re not into theory, just look at the examples. They are usually simple to understand.


The ownership relationship is necessary to establish a tree-like structure among objects. This is needed by the compiler to decide which lock, if any, is responsible for the protection of each object, and take it when necessary. Simply speaking, the lock at the root of each tree protects the rest of the tree. If you think that your multithreaded programs don’t follow a tree structure, look at them closely. If they don’t, you either already have data races or are likely to develop them in the future.

-Every object has an owner

The owner may be another object–usually the embedding object. In the example below:

class Foo {
    void doWork() { _bar.doWork(); }
    private Bar _bar;

auto foo = new Foo;

the embedded object _bar is owned, at runtime, by the object foo (I repeat, the concrete object, not the class Foo). This is the default ownership relationship for embedded objects, so no special notation is needed to establish it (I’ll show later how to override this default).

There are also special symbolic “owners” that are used for the roots of ownership trees:

  • thread,
  • self,
  • unique, and
  • immutable.

unique and immutable are included in this list for convenience. I’ll discuss them later.


Every object has just one owner for life, a condition necessary to create ownership trees that can be checked at compile time. Every tree has a single root and a lock is attached to that root, if needed.

The ownership information is embedded in the type of the object. Using this information, the compiler is able to deduce which lock must be held while accessing that object, and what kind of aliasing is allowed. All races (access to mutable shared variables without locking) are detected at compile time. I’ll sketch a proof later.

-What may be shared

Only immutable objects or objects rooted with a self-owned object may be shared between threads.

Additionally, objects whose direct owner is self (such objects are called monitors) may have multiple aliases while being shared. Monitors may own (and protect) other objects that are not monitors.


The compiler will make sure that access to an object can only happen when the root of its ownership tree is locked (symbolic owners other than self are considered locked at all times). Since an object may only have one lock associated with it (at the top of its ownership tree), this condition is enough to ensure freedom from races.

Proof: I have to show that when a (mutable) object is seen by more than one thread, each access to it (read or write) is always protected by the same lock. Indeed, for an object to be shared between threads, the root of its ownership tree must be self, hence the top object must be a monitor. This monitor’s lock is always, automatically or explicitly, taken before accessing any member of the tree. The compiler knows which lock to take because the ownership information is encoded in the type of the object.

Introducing ownership annotations

Ownership is specified at the instance level (although it may be restricted at the class level). The previous example, which relied on default assignment of owners, is equivalent to the more explicit instance-level specification (that you will never see in actual programs):

Foo<owner::thread> foo = new Foo<owner::thread>;

This declares and constructs foo as being owned by the symbolic owner, thread. The embedded object _bar‘s owner is foo.

-Creating a monitor

A self-owned object is a monitor (I will alternate between the notation using shared type modifier or explicit owner annotation, <owner::self>). It contains a hidden lock and its methods are, by default, synchronized. Continuing with my example:

auto fooMon = new shared Foo;
// The same as:
// auto fooMon = new Foo<owner::self>;

The variable fooMon is a monitor and the doWork method is implicitly synchronized. The object _bar is now owned by fooMon. Its type can be expressed (this is rarely needed, however see the example of external ownership) as:


Types parameterized by runtime entities (fooMon is a runtime handle) are known in programming language theory as dependent types.

Notice that I’m using the same class to create thread-local and shared instances. This is usually possible unless there is a specific restriction at the class level.

Note to D programmers: The current semantics of D “shared” is slightly different from my proposal. For instance, it forces all embedded objects to be monitors (their methods must be synchronized by their own lock), requires explicit use of the synchronized keyword, and forces all access in non-synchronized methods to be sequentially consistent. (And it doesn’t guarantee freedom from races.)

Thread-local objects

The special thread owner, which is the owner of all thread-local objects, is conceptually always locked, so thread-local objects don’t require any synchronization. Also, thread is the default owner so, in the absence of any ownership annotations, all objects are thread-local. That’s one of the defaults that makes single-threaded programs work as-is.

Here’s an interesting twist–global and static objects are by default thread-local. This part has been implemented in D, uncovering a number of threading bugs in the process.


The special self owner (or the shared type modifier) is used to create monitor objects. A monitor has a built-in lock and all its public methods are by default synchronized.

As always with defaults, the language must provide a (preferably safe) way to bypass them. To prevent locking, a method may be explicitly marked as lockfree. The compiler is obliged to check if the lockfree method doesn’t access the object’s members in a non-safe way (although it can’t prevent high-level races on lockfree variables). That restricts the lockfree constructs to those that don’t require whole-program analysis to prove their safety.

The lockfree annotation is essential for, among others, the double-checked locking pattern (DCLP). I showed its implementation as a teaser in my previous post.


As I explained earlier, data members of an object are by default owned by that object. This way they inherit the root owner from their parent. This is another default that makes single-threaded programs run without additional qualifiers.

Notice that there are two important aspects of ownership, the direct owner and the root owner, which might be different. The direct owner is used in type-checking, the root owner in deciding which synchronization method to use. Both are known or inferred during compilation.

As usual, the defaults may be overridden. For instance, you may embed a monitor in a thread-local object by qualifying it as self-owned/shared:

class Holder {
    private Mon<owner::self> _mon;

or, in common notation, as shared:

class Holder {
    private shared Mon _mon;

Here, _mon is not owned by Holder (the default has been overridden) so it doesn’t inherit its root owner. Its methods are synchronized by its own lock. As you can see, ownership tree not always reflects embedding. An embedded monitor starts a new tree.

Well, the situation is a bit subtler. Objects in Java or D have reference semantics, so there is a hidden pointer, or handle, in the code above. Accessing the handle is not the same as accessing the object proper. Consider this example:

class Holder {
    private shared Mon _mon;
    public setMon(shared Mon newMon) {
        _mon = newMon;

Let’s instantiate a self-owned Holder and a self-owned Mon:

auto holder = new shared Holder;
auto newMon = new shared Mon;

Since holder is itself a monitor, the setMon method is automatically synchronized by its lock (it must be!). Therefore, strictly speaking, the handle part of _mon is owned by holderMon, whereas the object-proper part is self-owned.

You cannot embed a thread-owned object inside a monitor–the compiler would flag it as an error. This is part of alias control–a thread-local object might possibly have thread-local aliases that may be accessed without locking. Being part of a monitor, it could then migrate to another thread and cause a race.

What if a subobject is accessed directly (not through a method)? This may happen when the subobject is declared public:

class Foo {
    public Bar _bar;

In that case not all uses of _bar are allowed. Consider this:

auto foo = new shared Foo;
foo._bar.m(); // error

Access to _bar must happen only when foo is locked. The compiler knows it because the full type of _bar is:


Here’s the corrected code:

synchronized(foo) {

An even better solution is to make _bar private and provide appropriate methods to access it. Those methods would be automatically synchronized for a shared foo.

unique and immutable

I discussed unique objects in one of my previous posts. Although not strictly required in the ownership scheme, uniqueness allows for very efficient and safe transmission of large objects between threads. It makes sense to include unique as another symbolic root owner, since its multithreaded semantics is different from other types and it doesn’t require locking.

Some languages, including D, define immutable objects, which cannot be modified after creation. Such objects may be freely shared and passed by reference between threads. Again, immutable may be used as a root owner.


With the preliminaries out of the way, I can now explain in more detail the workings of the teaser from my previous post. Here’s the definition of the class MVar:

class MVar<T> {
    T    _msg;
    bool _full;
    void put(T msg) {
        _msg := msg; // move
        _full = true;
    T take() {
        while (!_full)
        _full = false;
        return := _msg;

First, let’s instantiate MVar as a shared (self-owned) monitor that is used to pass unique objects of class Foo as messages:

auto chanUnique = new shared MVar<unique Foo>;

The type of _msg in this instantiation is unique Foo, which is the same as Foo<owner::unique>. The method put takes unique Foo, so the following code is type-correct:

auto foo = new unique Foo;
chanUnique.put(:= foo); // move foo

Notice that unique objects cannot be assigned or passed by value–they have to be moved, hence the use of the move operator, :=. Internally, the method put also uses the move operator (good thinking on the part of the designer–otherwise MVar couldn’t be instantiated with unique). What’s interesting about this example is that messages are not deep-copied between threads. They are safely passed by reference.

Since chanUnique is self-owned (shared), both put and get are automatically synchronized.

Now let’s access chanUnique from another thread:

// another thread
unique Foo f2 = chanUnique.get(); // implicit move of rvalue

The return type of get is unique Foo, so the types check. I could have used the move operator, but since the right hand side is an rvalue, the compiler lets me use the assignment.

Now for the tricky case: What’s wrong with this code?

auto mVar = new shared MVar<Foo>;
auto myFoo = new Foo;
myFoo.unsyncMethod(); // ouch!

Since myFoo is created as thread-local (that’s the default), its methods are not synchronized. If I were able to pass it to shared MVar, another thread could obtain it through get. It could then call the unsynchronized method unsyncMethod at the moment when I was calling it. A data race would be possible! Or would it?

Guess what–the compiler won’t let you shoot yourself in the foot. It will notice that it would have to instantiate a shared object mVar with a thread-local member _msg. That’s against the rules! (A shared object cannot own a thread-local object.)

External ownership

In the original GRFJ paper the authors showed an example where one object was owned by another object without the former being embedded in the latter. They made an observation that, for the purpose of locking, the ownership relationship must be unchangeable: You can’t switch the owner on the fly. Therefore external ownership is allowed only if the owner is declared final.

final shared Lock lockObj = new shared Lock;
auto foo = new Foo<owner::lockObj>;
auto bar = new Bar<owner::lockObj>;

In this case, the compiler will only allow access to foo under the lock of lockObj, as in:

synchronized(lockObj) {

This construct is useful in situations where the locking discipline is not easily convertible to object hierarchy.


You might have noticed my use of dual notation. Most user code would be written with type qualifiers such as shared, unique, or immutable. However, in some cases I used an alternative notation that looked more like the specification of template parameters: <owner::self>, <owner::unique>, <owner::immutable>, or even <owner::thread> (in D they would be surrounded by !( and )). This was not meant to further confuse the reader, but as a gentle introduction to qualifier polymorphism, which I will describe in the next installment. I will show how classes and methods may be parameterized with different types of ownership, cutting down code duplication.

I’d like to thank Andrei Alexandrescu, Walter Bright, Sean Kelly and Jason House for very helpful comments. I’m also indebted to the D community for discussing my previous posts.


  1. Boyapati, Rinard, A Parameterized Type System for Race-Free Java Programs
  2. C. Flanagan, M. Abadi, Object Types against Races.
    1. Advertisements

I’ve been using auto_ptrs before they were available in C++ (I made my own). I wrote several articles about resource management, in which auto_ptr played a prominent role. auto_ptr had always had many flaws, but it was a workhorse of memory management in a language that shuns garbage collection. Many of the flaws have been understood over the years and, in the 0x version of C++, auto_ptr was supplanted by the new improved unique_ptr. Using the latest features of C++, like rvalue references, unique_ptr implements move semantics in a consistent manner. You can now store unique_ptrs in most containers and apply many algorithms to them.

So why am I not jumping for joy? Because I know how much more can be done.

But first let me summarize the idea behind the unique pointer. It is a (smart) pointer that is the only reference to the object it’s pointing to. In particular, you cannot make a copy of a unique pointer–if you could, you’d end up with two references to the same object. You can only move it (hence the move semantics), making the source of the move invalid. With the older auto_ptr the moving was done quietly during assignment or pass-by-value. The problem with that arrangement was that the source auto_ptr would suddenly turn to null, which was sometimes confusing and occasionally led to access violations–as in this example:

void f(auto_ptr<Foo> foo); // pass by value

auto_ptr<Foo> pFoo (new Foo());
f(pFoo); // pass by value nulls the internal pointer
pFoo->method(); // access violation

The improvement provided by unique_ptr is to require an explicit move, to sensitize the programmer to the fact that the source of move becomes invalid. Still, the following code will compile, although the bug is much more prominent:

void f(unique_ptr<Foo> foo);

unique_ptr<Foo> pFoo (new Foo())
f(move(pFoo)); // explicit move
pFoo->method(); // access violation

A bigger problem is that there is no guarantee that a unique_ptr indeed stores the unique reference to an object. To begin with, unique_ptr can be constructed from any pointer. That pointer might have already been aliased, as in:

void f(unique_ptr<Foo> foo) {
    // destructor of foo called at the end of scope

Foo * foo = new Foo();
unique_ptr<Foo> upFoo(foo); 
// foo now aliases the contents of upFoo
f(move(upFoo)); // explicit move
foo->method(); // undefined behavior

There is also an obvious backdoor in the form of the method unique_ptr::get, which can be used to spread new aliases. This can be particularly insidious when you have to pass the result of get to a foreign function:

void f(Foo * pf) {
    globalFoo = pf; // creates a global alias

unique_ptr<Foo> pFoo(new Foo());
f(pFoo.get()); // leaks an alias

Finally, it’s possible to create aliases to data members of the object stored in unique_ptr, as well as assign aliased references to its data members. Consider this example:

class Foo {
    ~Foo() { delete _bar; }
    Bar * _bar;

Bar * pBar = new Bar();
unique_ptr<Foo> upFoo(new Foo());
pFoo->_bar = pBar;
// pBar is now an alias to a member of Foo
upFoo.reset(); // deletes _bar inside foo
pBar->method(); // undefined behavior

All this adds up to quite a number of ways to shoot yourself in the foot! Still, if the only problems were deterministic access violations, I could live with them (I’m a very disciplined programmer). They are reasonably easy to reproduce and can be debugged using standard methods (code coverage). But now there is a new elephant in the room–multithreading.

The beauty of unique objects is that they can be safely passed (moved) between threads and they never require locking. Indeed, since only one thread at a time can reference them, there is no danger of data races. Except when, as in the examples above, aliases are inadvertently leaked from unique_ptr. Accessing an object through such aliases from more than one thread without locking is a very nasty bug, usually very hard to reproduce.

Type system approach

Okay, so what’s the panacea? I’m glad you asked–the type system, of course! Make unique a type qualifier and all your troubles are gone. Notice that the compiler has no idea about the semantics of some library class called unique_ptr, but it can be made aware of the semantics of the unique type qualifier. What’s more, it can enforce this semantics from cradle to grave–from construction to destruction. (I know this proposal has no chance to be incorporated into C++, but it might be useful in, let’s say, the D programming language.)

You don’t construct a unique pointer from an arbitrary pointer. You just construct the object as unique (I’m still using C++-like syntax):

unique Foo * foo = new unique Foo();

Now that the compiler knows the meaning of unique, it can, for instance, detect if a unique pointer is accessed after having been moved. The compiler might prevent such an invalid pointer from being dereferenced or passed to other functions. Such bugs can be detected at compile time, rather than through access violations at runtime.

Do we need separate syntax for move? It might seem that using the assignment syntax (the equal sign) wouldn’t be so bad, as long as the compiler prevents us from dereferencing the null pointer left behind after a move. However, another problem arises when you start instantiating templates with unique pointers. Some containers and algorithms will work out of the box. Some will refuse to be instantiated because internally they try to access unique pointers after the move. But there is a third category of templates that, although move-correct, will generate unexpected results–like vectors with null unique pointers. Unfortunately, there are situations where the compiler has limited vision (i.e., it cannot do whole-program analysis) and can miss a null unique pointer dereference. For instance, it can’t prevent the use of a vector after a random element has been moved out of it.

Let’s assume that we have a special syntax for move, say := . Here’s a simple example of its use:

unique * Foo foo1 = new unique Foo();
unique * Foo foo2 := foo1; // move
foo1->method(); // compile-time error: foo1 invalid after move

In C++0x, a lot of containers and algorithms have been rewritten to use the explicit move instead of the assignment. It wouldn’t be a big deal to use := instead. Notice that the compiler would not allow the use of a regular assignment on unique pointers (unless they are rvalues), so the templates that are not ready for move semantics would refuse to get instantiated with unique template parameters.

Obviously, the move operator applied to a non-unique pointer must also work, otherwise we’d have to write separate templates for unique and non-unique template arguments.

We also have to use our special notation when passing unique arguments to functions, as in:

void f(unique * Foo);

unique * Foo foo = new unique Foo();
f(:= foo); // move
foo->method(); // compile-time error: foo invalid after move

When returning an lvalue unique pointer (for instance a member of a data structure), we use the notation return := foo;. The move operator may be elided when returning a local unique pointer, since the source ceases to exist upon the return (in general, move is implied when the source is a unique rvalue).

The biggest strength of the type-based approach is that the compiler can reliably prevent the aliasing of unique pointers. An assignment (as opposed to move) of an (lvalue) unique pointer to either unique or non-unique pointer is impossible. Taking the address of a unique pointer is forbidden too.

That leaves us with an interesting dilemma: How do you call a (library) function that takes a regular pointer if all you have at your disposal is a unique pointer? C++ unique_ptr let’s you do it through its method get. But we know how dangerous get is as far as aliasing goes.

If you don’t have the source code for the function you are calling, you probably shouldn’t be calling it with a unique pointer, because you have no idea what this function will do with it (store an alias in a global variable?). If you know the implementer of the function and he/she is willing to donate his/her kidney in case the function leaks aliases, you may risk casting away the uniqueness.

There is however a way for a function to guarantee that it’s alias-safe by declaring its parameters lent. The compiler will check the body of the function to make sure it doesn’t leak aliases to any part of the lent parameter, nor does it store non-unique (and possibly aliased) objects inside the lent object. Of course, the function can only pass the lent parameter (or any sub-part of it) to another function that makes the same guarantee.

It’s not obvious which should be the default: lent or it’s opposite, owned. If lent were the default, the compiler would be flagging a lot of valid single-threaded code (although it makes sense to assume that methods of a monitor object take lent parameters by default).

The relationship between unique and lent is very similar to that between immutable and const in D. If you declare data as unique or immutable you can safely pass it to a functions that declares its parameter as lent or const, respectively. lent guarantees that the parameter will not be aliased, const that it won’t be mutated.

Speaking of immutable–there’s always been a problem with constructing non-trivial immutable objects. The tricky part is that during construction we often need to explicitly modify the object, but we don’t want to leak non-const aliases to it or to its sub-parts. And now we have a new tool at hand to control aliasing–unique pointers. Instead of constructing an immutable object, you may construct a unique object, with all the guarantees about aliasing, and then move it to an immutable pointer. Just like this:

unique Foo * pFoo = new unique Foo();
immutable Foo * imFoo := pFoo;

(Of course, C++ doesn’t have immutable types either, but D does.)

By the way, you can always safely move a unique pointer to any of immutable, const, or regular pointers. Notice that it doesn’t mean that unique is a supertype of, for instance, immutable. You can’t pass unique where immutable is expected (you don’t want read-only aliases to escape to another thread!)–you can only move it.

A method that promises not to leak aliases to this is declared as lent. The compiler will detect any violations of this promise, as in:

class Foo {
    Bar * _bar;
    Bar * get() lent {
        return _bar; // error: a lent method returning an alias

In general, when you are dealing with a unique object, you may only call its lent methods.


Strict uniqueness imposes constraints on the internal structure of objects. Imagine creating a unique doubly-linked list. A link in such a list must be accessible from two places: from the previous link and from the next link. The scheme that absolutely prevents the creation of aliases to unique objects will not allow the creation of doubly-linked lists–and rightly so! Imagine moving a link out of the list: you’ll end up with a null pointer in the list, and a possible cross-thread aliasing if the (unique) link migrates to another thread.

There is a solution to this problem based on the ownership scheme (like the one used in Guava or GRFJ), which allows cross-aliasing of objects that share the same owner (e.g., all links are owned by the list). Such aliases cannot be leaked because the compiler won’t allow objects owned by a unique object to be moved. But that’s a topic for another blog post.

The use of iterators on unique containers is also highly restricted, but there are other ways of iterating that are inherently more thread-safe.


Any extension to the type system is usually met with some resistance from the user community. While some appreciate the added safety, others complain about added complexity. So it’s very important to be able to limit this complexity to specific areas.

When are unique and lent qualifiers really needed? In C++, at least in my experience, unique_ptr serves mostly as a poor man’s garbage collector. Since memory management permeates all C++ programs, the switch to using unique qualifiers would require a lot of changes even in single-threaded programs. In contrast, in garbage-collected languages, the main usefulness of unique is for multithreaded programming. A message queue that passes large unique objects between threads is more efficient than the one that deep-copies them. And unique objects never require locking.

When multithreading is involved, a safer type system doesn’t really add complexity. It translates the hard complexity of shared-memory concurrency into something that is more approachable by mere mortals.

As always, the introduction of new type modifiers may lead to code duplication (as in the case of const and non-const accessors). This problem can be dealt with using qualifier polymorphism–a topic for a whole new blog post.

I understand that this post might be hard to follow. That doesn’t mean that the uniqueness scheme is difficult to understand. My task was not only to explain the hows, but also the whys–and that requires giving a lot of background.


  1. Jonathan Aldrich, Valentin Kostadinov, Craig Chambers, Alias Annotations for Program Understanding. Describes unique and lent annotations in detail.
  2. See also my previous blog on Generic Race-Free Java.

An immutable object never changes. You can bet your program on it. As I explained in my previous post, the same is not true for const objects (or readonly objects, in dialects of Java). They may be changed through mutable aliases. An immutable object has no mutable aliases. Ever!

Small print: this guarantee is predicated on the programmer not overriding the type system with casts and other escape mechanisms.

To my knowledge, immutability is currently available in the D programming language and in a Java dialect called IGJ (Immutability Generic Java). It is the default in functional languages.

The closest you can get to immutability in C++ is by using const as a storage class:

const double pi = 3.141592;
const char ERRORMSG[] = "Your bank went belly up.";

Here, the value of pi or ERRORMSG is guaranteed to never change. Global or static immutable values can be used at compile time (for instance as labels in a switch statement).

Creating Immutable Objects

Defining immutable numbers, arrays, or POD structs is pretty straightforward. But what about more complex objects that don’t have static initializers? How do you create an immutable linked list? In functional languages, lists are treated as built-in types; like arrays in general purpose languages. They can be statically initialized. But in C++ or Java the creation of a list involves memory allocations and reference manipulation. Since, by definition, we can’t manipulate an immutable list, it seems like we can never create one!

How about relaxing the constraints a little to allow mutation inside the constructor of an immutable object. Here’s a hypothetical example of a list in D (the current D compiler doesn’t fully implement immutability, so my examples are written in pseudo-D):

class IntList
  // default constructor
  this() {} // _head is default-initialized to null
  // one element constructor
  this(int i) {
    _head = new IntLink(i); // mutates _head
  IntLink _head;
// will this work?
immutable IntList oneList= new immutable IntList(1);

There is a significant problem with this solution. If this is not considered immutable inside the constructor then there is no guarantee that a mutable reference to this or any of its subobjects won’t escape its scope. Consider this example:

IntLink globalLink; // mutable!

IntList.this(int i) {
  _head = new IntLink(i);
  globalLink = _head; // escape!

immutable IntList immList= new immutable IntList(1);
globalLink.setValue(2); // mutates immList!

Here, an immutable object immList has been mutated through an alias globalLink. We can’t allow this!

It’s true that a compiler could perform escape analysis on the constructor of IntList, provided it has access to its source code; which might not always be true when it’s compiling the statement that creates the immutable object. After all, class IntList might be implemented in a third-party library.

In the absence of source code, the only other possibility is to include immutability information in the type signature of the constructor. When an immutable object is created, the compiler would use an immutable constructor, and it would fail if one didn’t exist. Conversely, an immutable constructor would not compile if it allowed a mutable reference to escape. This bad code would not compile:

IntList.this(int i) immutable {
  _head = new IntLink(i);
  globalLink = _head; // error!

Of course, no mutable methods may be called from inside an immutable constructor–they couldn’t guarantee the non-escape of mutable aliases.

This solution works, even if it’s not perfect. It often leads to code duplication (the immutable constructor being identical to the mutable one, as in the IntList example). Moreover, it prevents some forms of refactoring. Even though, inside an immutable constructor, you may initialize an object’s fields, you can’t delegate this task to a (perforce immutable) method of the object.

Assignment is not the same as Mutation

The key to immutable construction is the observation that, when constructing an immutable object, it’s okay to assign the object’s fields but not to mutate them. During construction only such “shallow” mutation should be possible.

In my example, the assignment to _head is okay, but the mutation of the IntLink object attached to it should be prohibited. Indeed, I don’t need to mutate the head link once it’s constructed. Of course the construction of an immutable IntLink follows the same rules. Here’s the relevant code:

class IntLink
  this(int i) immutable {
    // _next is default-initialized to null
    _val = i; // field assignment
  int _val;
  IntLink _next;

With this understanding, it’s actually possible to minimize code duplication. To this end, IGJ introduces a new type modifier, AssignFields. A constructor or a method that performs no other mutation but field assignment may be declared AssignFields. Since AssignFields methods and AssignFields constructors can also be used in mutable contexts, they don’t have to be duplicated. Expanding on the above example:

class IntLink
  this(int i) assignfields {
    // _next is default-initialized to null
    SetValue(i); // Ok: it's an assignfields method
  void SetValue(int i) assignfields {
    _val = i; // field assignment
  int _val;
  IntLink _next;

As you can see, I was even able to refactor the part of the constructor that does the assignment to _val. I can now use the same constructor in both, mutable and immutable, contexts. The SetValue method can only be called in a mutable or assignfields context.

immutable IntLink iLink = new immutable IntLink(1);
IntLink mLink = new IntLink(2);

Subtyping relationships

It is okay to pass an immutable object to a function that expects a const object. After all, such a function will not mutate the object. If a reference to the object escapes the function, it can only be a const reference. And again, const reference cannot be used to mutate the object, so we’re fine.

The compiler will allow this subsumption if we establish that immutable is a subtype of const (more precisely, for any type T, immutable T is a subtype of const T). This is very similar to the compiler allowing the passing a derived class object to a function that accepts a base class objects–the Liskov substitution principle.

The full subtyping hierarchy between various mutability annotations is as follows:

  • immutable is a subtype of const: You can pass an immutable object to a function taking a const argument.
  • assignfields is a subtype of const: You can call a const method from an assignfields method (subsumption of this).
  • mutable is a subtype of assignfields. You can call an assignfields method on a mutable object (as in mLink.SetValue()).

Because of transitivity, mutable is also a subtype of const. You can pass a mutable object to a function that takes a const argument (you do it in C++ without much thinking).

subtyping relationships


There is some advantage to introducing yet another type modifier, assignfields, to smooth out the process of constructing immutable objects. On the other hand, is it worth additional complexity? How often does one construct non-trivial immutable objects? If it’s not a common programming pattern then maybe we can live with current restrictions and some code duplication. We still have very little experience in using immutable in D, so we might have to wait and see.

Recent news: A video of my talk to the Vancouver C++ Users group was just posted. The black shadow in front of the bright screen talking about memory fences is yours truly.

If you found this post interesting, please register your vote on reddit, so that more people can find it.

Why const?

With judicious use of const you can catch many bugs and improve code documentation and maintenance. For me the classic example is reading from and writing to files. Look at these two interfaces in stylized C++:

bool Read(File & file, char * buffer, unsigned & size)
bool Write(File & file, char const * buffer, unsigned const & size);

The first one tells me that the buffer may be modified by the function. In fact, the Read function will most likely write characters into it. It is also allowed to modify the parameter size, putting the number of characters read into it. The incorrect call

Read("abc", 3);

will not compile.

The second declaration promises that the buffer will not be modified and neither will size. So the following (correct) call will compile:

Write("abc", 3);

What’s more, when compiling the implementation of Write, the compiler will ensure that the arguments are not modified (unless the programmer subverts the type system using const_cast).

What does const guarantee? Not much!

This optimistic picture is marred by a few gotchas. You’d think that if you call a function taking a const reference, the object you are passing won’t change (assume there’s no use of const_cast or mutable members). Think again!

  1. In a multithreaded program, another thread might change the object concurrently. You may avoid this problem by preventing concurrent access using critical sections.
  2. But even in a single threaded program there is no guarantee the const object won’t change. That’s because there might be a non-const alias to the argument you’re passing. This alias might be accessible to the function either through a global variable, or through another argument. The classic example is copying elements between overlapping ranges. The source for the copy is passed as const, the target as non-const, but if they point at overlapping regions, the source may end up modified.

    There is a stronger version of constness that might guarantee immutability. The qualifier is usually called immutable and it’s available in the D programming language and in some dialects of Java. I’ll come back to it later.

  3. On top of those gotchas, even in a single-threaded C++ program with no aliasing of arguments, a const function argument might get mutated. That’s because constness is shallow–it doesn’t protect indirect data members (pointers or references). See next section.

Transitivity or “deep” const

Consider a list that is defined recursively:

class List {
  List * GetNext() const { return _next; }
  void SetNext(List * next) { _next = next; }
  List * _next;
void clip(List const * list) {
  List * next = list.GetNext();
  if (next)

Function clip takes a const list and blatantly modifies it. That’s because the indirect member, _next, of a const List is not const. C++ constness is not transitive! This fact is reflected in GetNext being a const method and returning a non-const pointer to _next. So even if this is const inside GetNext, indirect parts of this are open to modification.

I always found the non-transitive definition of const counter-intuitive, and made sure that in my code constness was always transitive. For instance, I would implement GetNext to return a const pointer:

List const * GetNext() const { return _next; }

and maybe provide another non-const method that returns a non-const pointer. Notice the need for code duplication in this approach.

The depths of immutability

I have an even larger problem with non-transitive immutability (as opposed to constness). I expect a const method of an immutable object to leave the object unchanged. In particular I expect such method to behave like a pure function (not to be confused with pure virtual function). A pure function returns the same value every time it’s called with the same arguments. I expect the length of an immutable string to never change. Every call to size should return the same value. I would even expect the compiler to eliminate redundant calls to size is such a case. Here’s a hypothetical example:

immutable string("Hello!");
doSomething(string.data(), string.size());
doSomethingElse(string.data(), string.size());

A smart compiler could cache the results of data and size from the first set of calls and reuse them when calling doSomethingElse. However, if immutable is not transitive, such an optimization is incorrect. Just imagine a bizarre implementation of string that stores the length indirectly:

class string {
  string(char const * src) {
    _length = new unsigned;
    *_length = strlen(src);
  unsigned size() const {
    return *_length;
  unsigned * _length;

The D-language model

Taking all the above arguments into account, we decided to implement both const and immutable as transitive type modifiers in the D programming language. This enabled a lot of interesting compiler optimizations.

Making const and immutable transitive is essential for D’s support for functional-style programming. It’s important that const methods of an immutable object be pure functions. It’s also important to let the compiler optimize functional-style code, which otherwise tends to be less efficient than its imperative-style equivalent.

Functional programming is one of the safest paradigms for concurrency. Pure functions and immutable objects don’t require locking.

The price of transitivity

Having said that, transitivity of const imposes some constraints on the programming style. It enforces the view that anything reachable from an object is part of the object. When you are given a const linked list, all links in it must necessarily be const. That makes logical sense–the links are part of the list.

The picture gets muddier when the object contains references to some external services. Just by calling them “external,” I’m admitting that they are not part of the object.

Can an object that contains a Windows handle be const? Is it enough that it doesn’t modify the handle itself? Or shouldn’t it also refrain from modifying the window or the file behind the handle? Handles are often implemented as opaque pointers so, strictly speaking, constness should propagate through them. But how do I know which APIs don’t mutate the state of Windows and which do? This is an extreme example, but there are many in-between cases.

In an editor you might have an undo stack that stores editing commands. When you call the undo method of the command object, you may pass it a reference to the document object on which it is to operate. In that case undo may be declared as const. But if you store a reference to the document within each command, the undo method can no longer be const–it modifies the document. The classic Command Pattern uses the latter approach.

It hasn’t been decided yet if and how D will support an escape mechanism from const transitivity (and constness in general). Following C++, D could use the mutable type qualifier for that purpose. This would also solve the problem of memoizing and caching inside a const object.

In the next installment, I’ll describe more complexities of immutability and discuss a dialect of Java that supports immutability constraints.

You can vote for this article on reddit

In the previous two installments I tried to explain the origins of rvalue references in C++. If you were left with the impression that each design step was a reaction to some previously unanticipated problem, you’re right. As it often happens in C++, the design was reactive rather than proactive.

Passing by value

If you think about it long enough, you realize that the process that led to rvalue references started with the way C++ implemented passing by value. In C, passing by value was simple–a shallow copy of data was made. “Shallow” means, no pointers were followed in the process of copying. C++ decided to provide hooks for the programmer to override this behavior. If you define a copy constructor it will be used instead of shallow copy. A user-defined copy constructor may for instance follow pointers or references and clone the pointed-to objects, thus implementing “deep copy”. It may also do bookkeeping–count references, etc.

Whenever the compiler needs to copy an object–and that may happen during return by value or pass by value–it inserts the appropriate call to the copy constructor (if one is not provided by the programmer, the default shallow copy is used).

This is a very general mechanism, which however leads to some awkwardness and inefficiencies. For instance, the writer of a copy constructor must remember to copy all of the shallow data members–something that the default shallow copy does automatically. The inefficiencies show up when copying rvalues.

Return value optimizations

Let’s revisit the example from my previous post:

 auto_ptr<Foo> factory() {
    return auto_ptr<Foo>(new Foo);
 // client code
 auto_ptr<Foo> ap = factory();

Here, a temporary auto_ptr is created inside the function, copied using a copy constructor into the caller’s auto_ptr, and then immediately destroyed.

This is so obviously inefficient that the compiler will often optimize it away by constructing the return auto_ptr directly in the space allocated by the caller (a pointer to which is passed as a hidden argument to the function). This is called return value optimization (RVO) and it eliminates the call to the copy constructor and to the destructor.

But what about more general cases, like this one:

 auto_ptr<Foo> factory() {
    auto_ptr<Foo> src;
    src.reset(new Foo);
    return src;
 // client code
 auto_ptr<Foo> ap = factory();

Some compilers will attempt to optimize this code as well, using the named return value optimization (NRVO).

The C++ Standard allows such optimizations, which elide the calls to a copy constructor and a destructor, but it doesn’t mandate them. Moreover, NRVO cannot be applied to functions that have multiple return statements, with different objects being returned from each.

But notice an interesting thing: If (N)RVO were mandated by the language and applicable in every case, we could almost implement auto_ptr without the rvalue reference copy constructor. We wouldn’t need to reach back at the source auto_ptr from the copy constructor to turn off its destruction. The destructor (together with the copy constructor) would be guaranteed to be elided.

In fact, if we don’t insist on using caller’s space for creating the return value, we could cover the case of multiple return statements. And how would we transfer bits between the two spaces? We’d do a shallow, bitwise, copy!

An alternative return-by-value scheme

Here’s the scheme that was proposed for the D programming language:

  1. The compiler adds a hidden parameter to every function that returns a non-trivial data structure by value. This parameter is the address of the stack space allocated by the caller for the return value. (This is a temporary space for the rvalue returned by the function) This part is common with the C++ implementation
  2. The function allocates its local variables on the stack, as usual, including the ones that are (or might be–in case of multiple return statements) returned. RVO may be used to eliminate some of these allocations. This is still in common with C++.
  3. When executing the return statement, the value to be returned is bit-copied into the caller’s space.
  4. No copy constructor is called. No destructor is called for the variable that’s been returned.

Look ma! No copy constructor!

What is amazing about this scheme, is that it may be applied to all situations where the source of the copy is an rvalue. This is also true when passing an rvalue to a function, as in this example:

  struct S;
  void f(S s); // by value
  // caller's code
  f(S(1, 2)); // passing an rvalue

Suddenly there is no need whatsoever for an rvalue-taking copy constructor because it would never be called. Since an rvalue-taking copy constructor was the main motivation for introducing rvalue references to C++, in D we can get away without them.

It takes some getting used to thinking of passing rvalues without calling a copy constructor. Notice however that, even in C++, there is no guarantee that the copy constructor will be called (remember RVO?).

But what about non-trivial copy constructors, like in ref_ptr (reference-counting smart pointer)? It must be called to keep count of the reference count, right? Not when the source is an rvalue! Remember that we are eliding not only the copy constructor (which increments the reference count) but also the destructor of the source (which decrements the reference count). The net result is no change to the reference count.

This is a very important optimization, especially in multithreaded code, when reference counting incurs expensive synchronization overhead.

Location transparency

There is one gotcha in this scheme–we are assuming that a shallow copy will always result in a valid object. This is not true if the object contains a pointer to itself. A shallow copy will duplicate the pointer, the object itself will move to a new location. but the pointer will point at the old location. This could be a disaster.

For this and several other reasons D assumes location transparency. An object cannot point at itself (or any of its shallow members), because then it couldn’t be transparently moved to a new location.

Unfortunately this requirement cannot be enforced statically. A dynamic check during copying is possible, but it’s not clear that the overhead is always worth paying. This is still an open design area.

Notice that, in D, the scope of this problem is narrowed to structs and fixed arrays that contain pointers. There is no direct way to manipulate the location of class objects (they are always accessed indirectly)–except by the garbage collector. A generational garbage collector that moves objects in memory could definitely profit from location transparency.

The binding table

The binding table in D is slightly different than the one in C++:

  • references bind to lvalues (same as C++)
  • const references bind only to lvalues (different)
  • “values” (arguments passed by value) bind to
    • lvalues (same as C++)
    • rvalues (same as C++, except that no copy constructor is called)

Notice that in D we don’t want a const reference to bind to an rvalue. This is an inherently dangerous feature of C++. Consider the following C++ example:

  struct S;
  const S& f(const S& x) {
     return x;
  // caller's code
  const S& y = f(S(1, 2));
  y.access(); // kaboom!

Instead, if you want to have a version of a function that binds to rvalues, you declare its arguments as passed by value, as in this D example:

  S f(S x) {
    return x;
  // caller's code
  auto x = f(S(1, 2));
  x.access; // fine, it's a copy

In the worst case, the overhead of the D solution is a bitwise copy of a struct (which can often be optimized away) against the indirect access of the C++ solution. But considering that it buys us safety, it’s a trade off well worth it.

I’d like to thank Andrei Alexandrescu for his comments and, in general, for driving the design of pass-by-value in D.

With the multicore explosion in the making, are we going to be running hundreds of thousands of threads in a single program? Erlang programmers would emphatically say, yes! C++ and Java programmers would probably say no.

Why this discrepancy?

Thread model based on heavy-duty OS threads and mutexes has its limitations. You can ask server writers, or Google for “thread per connection” to convince yourself. Servers use thread pools exactly because of that.

Thread pools are an admission of defeat for the thread model. Having to pool threads tells you that:

  • Thread creation is not fast enough
  • Threads’ consumption of resources is substantial, so it makes sense to keep their numbers down

Granted, these are technical problems that might be overcome in future by improvements in operating systems.

The more fundamental problem with threads has its root in memory sharing. It seems like sharing offers great advantage in terms of performance, but sharing requires locking. It’s a well known fact that locking doesn’t scale (or compose). Between races and deadlocks, it’s also extremely hard to get right.

Here’s what Erlang does

Erlang gives up on sharing!

Threads that don’t share memory are called processes. We tend to think of processes as heavyweight beasts implemented by operating systems. That’s because one needs the operating system to strictly enforce the no-sharing policy (the isolation of address spaces). Only the OS can manage separate address spaces and the passing of data between them.

But that’s not the only way. The isolation might instead be enforced by the language. Erlang is a functional language with strict copy semantics and with no pointers or references. Erlang processes communicate by message passing. Of course, behind the scenes, messages are passed through shared memory, thus avoiding a large performance hit of inter-process communication. But that’s invisible to the client.

Erlang rolls out its own threads!

Erlang interpreter provides lightweight processes (so lightweight that there’s a benchmark running 20 million Erlang processes).

And there is a bonus: Erlang code that uses lightweight processes will also work with heavyweight processes and in a distributed environment.

Why don’t we all switch to Erlang?

As far as I know there are two main reasons:

  • It’s a weird language. Don’t get me wrong, I love functional programming for its mathematical beauty, terseness, and elegance. But I had to rewire my brain to be able to write pure functional programs. Functional paradigm is as alien to our everyday experience as quantum mechanics. (CS grad students: you’re not typical programmers.)
  • Messages have to be copied. You can’t deep-copy a large data structure without some performance degradation, and not all copying can be optimized away (it requires behind-the-scenes alias analysis). This is why mainstream languages (and I will even include Scala in this category) don’t abandon sharing. Instead they rely on programmer’s discipline or try to control aliasing.


Native threads are expensive. Interpreted languages, like Erlang, can afford to implement their own lightweight threads and schedulers. I don’t see that happening in compiled languages. The hope is that  operating systems will improve their implementations of threads–I hear Linux’s NPTL already is a big improvement in this area. In the meanwhile, we have to rely on thread pools.

Shared memory concurrency model is the reason why multithreaded programming is so difficult and error-prone. Separating address spaces simplifies programming, but at a performance cost. I believe that some kind of compromise is possible. A message-passing or an actor model can work with sharing, as long as aliasing is under control.

Inspiration for this post

After my last post on thin lock implementation, I was asked the question, why I used such a small number, 1024, for the maximum number of threads. It was actually a number I’ve found in the D compiler runtime. A thin lock could easily work with a much larger number of threads. In fact I’m going to substantially increase the number of bits for thread ID at the expense of recursion count. But this question got me thinking about scalability of threading in general.


Fibers (as well as Java’s green threads) are not an alternative to heavyweight threads. Fibers can’t run in parallel, so they have no way to use multiple processors. They are more of a flow-of-control construct, often used interchangeably with coroutines.

Interesting reading

Previously I have described a thin lock as a very efficient implementation of object monitor. It’s time to flesh out the design. Since most of the thin lock implementation is itself lock free, all kinds of multicore subtleties come into play. I did my best to analyze every step, but I’m only human. The code hasn’t yet been written in D, much less tested.

Object Layout

Every object (instance of a class) in the D programming language inherits from Object. Object contains a hidden header consisting of two fields:

  • pointer to vtable
  • pointer-sized thin lock

I’m not adding any new fields, just replacing an existing pointer field with the thin lock. The pointer used to point to a lazily evaluated Monitor object.

Thin lock is a union of a bit field and a pointer to a FatLock struct. The union is discriminated by the two lowest bits–if they are zero, it’s a pointer. A null pointer has special meaning–the object can never be shared (it was created as non-shared). See sharing and unsharing n D.

Here’s the thin lock bit-field layout (the low 32-bits; the whole field is either 32- or 64-bit wide, depending on native pointer size).

Bits Name Purpose
11 thread index One-based index into the global thread table
19 recursion counter Used for recursive locking by the same thread
1 currently shared Set during the creation of a shared object. It can be turned off and back on with the help of sharing casts.
1 created shared Set only if the object was created as shared

D runtime has a fixed-size global array of threads. An index into this array is used to identify a thread. This is different from the operating-system-defined thread ID.

Sharing policy

To share an object between threads, it must be created as shared (possibly allocated from a separate, shared heap). Any global object that is not declared shared is only accessible through thread-local handles.

A shared object’s thin lock (which hasn’t been inflated to a fat lock, see later) has the two lowest bits set. Sharing can be cast away, at which point the currently-shared bit is cleared, but the created-shared bit is still on. The object can then be cast back to shared; however, if the created-shared bit is not on, such a cast will throw an exception.

This guarantees that an object that was not created for sharing can never be cast to shared.

When sharing is cast away, the casting thread’s identifier is remembered in the thread index field. The object becomes exclusively owned by the current thread (essentially, locked by it). Any attempt by a different thread to lock such an object will result in an exception.

Locking a never-shared object

An object that was created as non-shared (the default) has zero in its thin lock. This state is permanent and not reachable from any other state.

Originally I thought that the comparison (thinlock == 0) didn’t require any synchronization. I totally spaced out on publication safety (thank you, Andrei, for a reality check!). In general, this test has to be preceded by a read fence. Fortunately, on an x86, publication safety is guaranteed without fencing, so at least on that platform, the check is lightning fast. Because the thin lock is co-located with the rest of the object–next to the vtable pointer–even the overhead of fetching it from main memory is rarely incurred.

Therefore the incentive for writing two versions of a class–one with synchronized, and the other with non-synchronized methods–is practically absent in D.

I mentioned before that the D compiler might be able to elide synchronization of non-shared objects altogether. If that happens, the testing of the two sharing bits in the thin lock would be redundant. There is however a code-size/performance trade-off between the two solutions.

Locking algorithm

– Test thin lock for zero (on an x86, without any synchronization, otherwise precede it with a read fence). If zero, return. This is the most common case: the object was not created for sharing and will never be accessed by another thread.

– Fetch current compact thread ID, which is conveniently stored in thread-local memory. Compact thread ID is pre-calculated for each thread using an 11-bit thread index (see above) shifted left by 21 and OR’ed with 3 (the lowest two bits are set). This is an optimization. Notice that an XOR with a compact thread ID flips the two lowest bits of the thin lock, which makes subsequent tests look logically inverted.

– XOR thin lock with this ID. If the result is 2 (remember the inversion), return. This is the case when the object was created shared, but was cast to non-shared by the current thread. This operation doesn’t require any synchronization, because only the current thread could cast the object back to shared.

if (thinlock ^ compactThreadId == 2)

– If we are past this point, we know that the object is shared, or we are attempting (incorrectly) to lock an object that is exclusive to (cast to non-shared by) another thread.

– Perform a CAS (atomic Compare And Swap operation) on thinlock. The value we are expecting is 3 (two lowest bits set and noting else), the replacement value is the compact thread ID of the current thread. This operation will succeed in the next most common case–the object is shared but is not currently locked. The resulting thin lock state has the thread index filled with a non-zero value, and the two lowest bits set (see the description of compact thread ID). This state is interpreted as “locked once by a given thread”.

– If the CAS fails (thinlock didn’t have the expected value and the swap did not occur), we know that the object is locked (or the thin lock has been inflated to a fat lock, see later).

– Try the next most common case–the object is locked by the current thread (recursive locking). First XOR the value of the thin lock with the current compact thread ID to isolate the count field. If the result is less that the maximum count shifted left by 2 and the two lowest bits are zero (meaning, they were set before the XOR), then increment the count. These operations don’t require any synchronization, because they only succeed when the lock is owned by the current thread.

uint tmp = thinlock ^ compactTID;
if (tmp < MAX_COUNT_MASK && (tmp & 3) == 0)
  thinlock += COUNT_INCREMENT;

– Check for the error of trying to lock an object that is owned exclusively by another thread.

if ((thinlock & 3) == 1)
  throw new ExclusiveLockViolation;

– Check if the lock has been inflated (the two lowest bits are zero). If so, interpret the thin lock as a pointer to FatLock and lock it. Fat lock is implemented using the operating system locking primitives and can deal with contention. Return.

– If we reach this point, we know that there is contention for the lock or the count has overflowed. In either case we have to inflate the lock. We have to preserve one invariant–the lock can only be modified by the thread that holds it, otherwise we are open to all kinds of races. Therefore we have to busy-wait for the lock to be released.

while (thinlock != 3 && (thinlock & 3) != 0)

Notice that busy waiting requires that the compiler not optimize this loop–we need a “compiler fence”. However, no processor memory fences are required, because there is no ordering dependence. It’s enough that, when another thread modifies the lock, the new value eventually becomes visible to the spinning thread.

It’s possible to miss the unlocked state and spin longer than necessary. Starvation is theoretically possible if the other thread keeps unlocking and locking without discovering a contention; with the current thread repeatedly missing the unlocked state. (This part of the algorithm may be optimized further by introducing exponential backup.)

– The loop is exited if either the lock has been released (thinlock == 3) or another thread managed to inflate the lock (the two lowest bits are zero, signifying that the value stored is a pointer to FatLock).

– Try to acquire the lock using CAS (the arguments are the same as in the original attempt).

– If it succeeds, allocate the fat lock, lock it, and atomically store the pointer to it in the thin lock. Return. Once the lock has been inflated, it will remain so for the lifetime of the object.

– If we reach this point, we know that: either the lock has been inflated, another thread is in the process of inflating it, or another thread has acquired the lock without contention while we were busy waiting.

– Try again: go back to busy waiting.


When unlocking, we have the guarantee that the current thread owns the lock, so we don’t need any additional synchronization.

  • If the thin locks is zero, return. The object was not created for sharing.
  • XOR thin lock with the compact thread ID.
  • If the result is 2, we own the object exclusively. Return.
  • If the result is zero, we hit the next most common case–the lock has been taken once by the current thread. Store 3 in the thin lock and return.
  • If the result of XOR is non-zero and its lowest two bits are clear, decrement the recursion count.
  • Otherwise, the lock has been inflated. Unlock the fat lock.

FatLock must also contain a field exclusively-owned-by. This field is filled with a compact thread ID when sharing is cast away. When sharing is re-established, this field goes back to zero. Therefore, after locking the fat lock, additional checking is done: If the field is non-zero and the result of XOR with the current compact thread ID is also non-zero, an exception is thrown (attempt to lock an exclusively owned object).

« Previous PageNext Page »