Why const?

With judicious use of const you can catch many bugs and improve code documentation and maintenance. For me the classic example is reading from and writing to files. Look at these two interfaces in stylized C++:

bool Read(File & file, char * buffer, unsigned & size)
bool Write(File & file, char const * buffer, unsigned const & size);

The first one tells me that the buffer may be modified by the function. In fact, the Read function will most likely write characters into it. It is also allowed to modify the parameter size, putting the number of characters read into it. The incorrect call

Read("abc", 3);

will not compile.

The second declaration promises that the buffer will not be modified and neither will size. So the following (correct) call will compile:

Write("abc", 3);

What’s more, when compiling the implementation of Write, the compiler will ensure that the arguments are not modified (unless the programmer subverts the type system using const_cast).

What does const guarantee? Not much!

This optimistic picture is marred by a few gotchas. You’d think that if you call a function taking a const reference, the object you are passing won’t change (assume there’s no use of const_cast or mutable members). Think again!

  1. In a multithreaded program, another thread might change the object concurrently. You may avoid this problem by preventing concurrent access using critical sections.
  2. But even in a single threaded program there is no guarantee the const object won’t change. That’s because there might be a non-const alias to the argument you’re passing. This alias might be accessible to the function either through a global variable, or through another argument. The classic example is copying elements between overlapping ranges. The source for the copy is passed as const, the target as non-const, but if they point at overlapping regions, the source may end up modified.

    There is a stronger version of constness that might guarantee immutability. The qualifier is usually called immutable and it’s available in the D programming language and in some dialects of Java. I’ll come back to it later.

  3. On top of those gotchas, even in a single-threaded C++ program with no aliasing of arguments, a const function argument might get mutated. That’s because constness is shallow–it doesn’t protect indirect data members (pointers or references). See next section.

Transitivity or “deep” const

Consider a list that is defined recursively:

class List {
  List * GetNext() const { return _next; }
  void SetNext(List * next) { _next = next; }
  List * _next;
void clip(List const * list) {
  List * next = list.GetNext();
  if (next)

Function clip takes a const list and blatantly modifies it. That’s because the indirect member, _next, of a const List is not const. C++ constness is not transitive! This fact is reflected in GetNext being a const method and returning a non-const pointer to _next. So even if this is const inside GetNext, indirect parts of this are open to modification.

I always found the non-transitive definition of const counter-intuitive, and made sure that in my code constness was always transitive. For instance, I would implement GetNext to return a const pointer:

List const * GetNext() const { return _next; }

and maybe provide another non-const method that returns a non-const pointer. Notice the need for code duplication in this approach.

The depths of immutability

I have an even larger problem with non-transitive immutability (as opposed to constness). I expect a const method of an immutable object to leave the object unchanged. In particular I expect such method to behave like a pure function (not to be confused with pure virtual function). A pure function returns the same value every time it’s called with the same arguments. I expect the length of an immutable string to never change. Every call to size should return the same value. I would even expect the compiler to eliminate redundant calls to size is such a case. Here’s a hypothetical example:

immutable string("Hello!");
doSomething(, string.size());
doSomethingElse(, string.size());

A smart compiler could cache the results of data and size from the first set of calls and reuse them when calling doSomethingElse. However, if immutable is not transitive, such an optimization is incorrect. Just imagine a bizarre implementation of string that stores the length indirectly:

class string {
  string(char const * src) {
    _length = new unsigned;
    *_length = strlen(src);
  unsigned size() const {
    return *_length;
  unsigned * _length;

The D-language model

Taking all the above arguments into account, we decided to implement both const and immutable as transitive type modifiers in the D programming language. This enabled a lot of interesting compiler optimizations.

Making const and immutable transitive is essential for D’s support for functional-style programming. It’s important that const methods of an immutable object be pure functions. It’s also important to let the compiler optimize functional-style code, which otherwise tends to be less efficient than its imperative-style equivalent.

Functional programming is one of the safest paradigms for concurrency. Pure functions and immutable objects don’t require locking.

The price of transitivity

Having said that, transitivity of const imposes some constraints on the programming style. It enforces the view that anything reachable from an object is part of the object. When you are given a const linked list, all links in it must necessarily be const. That makes logical sense–the links are part of the list.

The picture gets muddier when the object contains references to some external services. Just by calling them “external,” I’m admitting that they are not part of the object.

Can an object that contains a Windows handle be const? Is it enough that it doesn’t modify the handle itself? Or shouldn’t it also refrain from modifying the window or the file behind the handle? Handles are often implemented as opaque pointers so, strictly speaking, constness should propagate through them. But how do I know which APIs don’t mutate the state of Windows and which do? This is an extreme example, but there are many in-between cases.

In an editor you might have an undo stack that stores editing commands. When you call the undo method of the command object, you may pass it a reference to the document object on which it is to operate. In that case undo may be declared as const. But if you store a reference to the document within each command, the undo method can no longer be const–it modifies the document. The classic Command Pattern uses the latter approach.

It hasn’t been decided yet if and how D will support an escape mechanism from const transitivity (and constness in general). Following C++, D could use the mutable type qualifier for that purpose. This would also solve the problem of memoizing and caching inside a const object.

In the next installment, I’ll describe more complexities of immutability and discuss a dialect of Java that supports immutability constraints.

You can vote for this article on reddit

Strong typing–you either love it or hate it! Elaborate type systems catch a lot of programming errors at compile time. On the other hand they impose the burden of type annotation on the programmer, and they sometimes disallow valid programs or force type casting.

In Java there’s been a never-ending discussion about the lack of support for const types. People coming from the C++ background see it as a major flaw. Others would defend Java from constness with their own bodies.

I’m always on the lookout for solutions that let programmers have a cake and eat it too. Pluggable types are just the thing. You want const enforcement? Add a constness plug to your compiler. You want compile time null reference checks? Add another plug.

How is it possible?

Java annotations as type modifiers

Did you know that Java had a system of user-defined annotations (C# has a similar thing)? Annotations start with the @ sign and can be used, for instance, to annotate classes or variables. The Java compiler doesn’t do much with them other then embed them as metadata in class files. They can be retrieved at runtime as part of the reflection mechanism.

I never thought of annotations as more than an ad hoc curiosity until I read Matthew Papi’s Ph.D. thesis, Practical pluggable types for Java. The paper proposes the use annotations to extend the Java type system. This work contributed to the JSR 308 proposal, which allows annotations to appear anywhere types are used. For instance, you could annotate a declaration of an object with the @NonNull qualifier:

@NonNull Object nnObj;

or declare a read-only list of read-only elements:

class UnmodifiableList<T>
  implements @ReadOnly List<@ReadOnly T>
{ ... }

What distinguishes such annotations from bona fide comments is that they can be type checked by user-defined checkers. These checkers can be plugged into the compiler and run during compilation. They get access to abstract syntax trees (ASTs) produced by the language parser, so they can also do semantic checks (e.g., detect method calls through a potentially null pointer).

Examples of type system extensions

A simple NonNull checker detects assignments of potentially null objects to non-null objects, as in:

Object obj;
@NonNull Object nnObj = obj; // warning!

The checker must know that the default qualifier is @Nullable (potentially null) and @NonNull is a “subtype” of @Nullable (strinctly speaking, @NonNull T is a subtype of @Nullable T). In fact, simple type annotations of the @Nullable variety may be defined within the program and type-checked by the generic Basic Type Checker provided as part of the type checker framework. For instance, the subtyping relationship may expressed declaratively as:

    @SubtypeOf( { Nullable.class } )
    public @interface NonNull { }

However, a specialized checker can do more than just check type correctness. The NonNull checker, for instance, issues a warning every time the program tries to dereference a @Nullable object (remember that @Nullable is the default implicit annotation).

An even more useful checker enforces immutability constraints (the generalization of const types). The IGJ checker (Immutability Generic Java) introduces, among others, the following annotations:

  • @Mutable: The default. Allows the object to be modified.
  • @ReadOnly: An object cannot be modified through this reference. This is analogous to C++ const reference.
  • @Immutable: An object may not be modified through any reference (not only the current one). Java Strings are immutable.

Incidentally, similar immutability constraints are built into the type system of the D programming language, except that D’s const and immutable (formerly known as invariant) are transitive.

Try it!

Suppose you are a fan of immutability constraints and you want to try them right now. First, download the JSR 308 compiler and checkers from the pluggable types web site. To compile your annotated program, use the command line:

javac -processor checkers.IGJ.IGJChecker <source-files>

If your program doesn’t obey immutability constraints specified in your source code, you’ll see a bunch of warnings and errors.

Now for the annotations. You might want to do them bottom up:

  • Annotate methods that don’t modify this:
    void foo() @ReadOnly;

    These methods may only call other @ReadOnly methods of the same object (the checker will tell you if you break this rule).

  • Annotate arguments that are not modified by a method, e.g.,
    void foo(@ReadOnly Bar bar);
  • Annotate objects that never change after creation:
    @Immutable Bar bar = new @Immutable Bar();

And so on…

You can pass @Immutable as well as @Mutable objects to methods that take @ReadOnly arguments, but not the other way around.

Of course, any attempt to modify a @ReadOnly or @Immutable object will be flagged as error by the checker.

Nullness checker

The nullness checker protects your program from dereferencing uninitialized objects. For instance, you can only call a method or access a property of an object that is typed as @NonNull.

Since the implicit default annotation is @Nullable, the following code will be flagged as an error:

Bar bar;

There are some interesting issues related to nullness. For instance, during object construction, this cannot be considered @NonNull. Consequently you cannot call regular methods from an object’s constructor. To remedy this situation without compromising the type system, a special annotation @Raw was added. Inside the constructor, this (and all @NonNull subobjects under construction) are considered @Raw. So if you want to call an object’s method from inside its constructor, such method must be annotated @Raw.


It’s possible to overload functions based on their nullness annotation. For instance, if you know that a function’s argument is @NonNull, you might want to skip a null test:

String stringize(Bar bar)
   if (bar != null)
      return bar.toString();
      return "";

// Specialized version
String stringize(@NonNull Bar bar)
   return bar.toString();

Interestingly, since the compiler does flow analysis, it knows that inside the if statement bar is not null, even though it’s (implicitly) declared as @Nullable, so it won’t flag it as an error. (This is called flow-dependent typing–I implemented something like that in my Dr. Dobb’s article using local variable shadowing.)

Let’s try something more challenging–declaring the identity function, a function that simply returns its argument. The naive declaration

T identity(T x);

has a problem when nullability annotations are in force. Since the default annotation is @Nullable, the above is equivalent to

@Nullable T identity(@Nullable T x);

When called with a @NonNull argument, this function will return a @Nullable result. That’s not good! You can ameliorate this situation by declaring a separate overload:

@NonNull T identity(@NonNull T x);

The problem is that, when you keep adding new kinds of type annotations to your system, the number of overloads of identity starts growing exponentially.

Enter polymorphic annotation @PolyNull. The identity function can pass the nullness annotation through, if you declare it as follows:

@PolyNull T identity(@PolyNull T x);

Amazingly enough, @PolyNull will also work with multiple arguments. Consider the max function:

@PolyNull T max(@PolyNull T x, @PolyNull T y);

Through the magic of type unification, this declaration does the right thing. The nullability annotations of both arguments are unified to the lowest common denominator–the most derived annotation in the subtyping hierarchy to which both types can be implicitly converted. So, if both arguments are @NonNull, their unification is @NonNull, and the result is @NonNull. But if even one of the arguments is @Nullable, then the result is @Nullable. Notice that this declaration covers four possible overloads:

@NonNull T max(@NonNull T x, @NonNull T y);
@NullableT max(@NonNull T x, @NullableT y);
@NullableT max(@NullableT x, @NonNull T y);
@Nullable T max(@NullableT x, @NullableT y);

For non-static methods, the annotation of this may also take part in the unification:

@PolyNull T foo(@PolyNull T x) @PolyNull;


One of the limitation of user-defined type annotations is that they can’t take part in program optimization. That’s because the type checker is not allowed to modify the AST. (See Walter Bright’s blog in Dr. Dobbs about optimization opportunities based on immutability in the D programming language.)


Michael Ernst, who was Matthew Papi’s doctoral adviser, clarified some of the issues for me. Since I’m attending his seminar, I’ll be blogging more about his work.

How does Java do it? Motivation for C++ programmers

Things like double-checked locking pattern and Peterson lock work out of the box in Java, as long as you declare all shared variables volatile. (Do not confuse Java volatile with C++ volatile–the latter has nothing to do with multithreading.) Obviously the Java compiler knows what kind of fences are necessary on relaxed memory architectures, including the x86. I thought it was worth looking at.

Why the sudden interest in Java? Because Java memory model is very relevant to C++0x. The keyword here is sequential consistency. Java enforces sequential consistency on all access to volatile variables. C++0x introduces atomic objects which, by default, also follow sequential consistency. So C++ atomics will, by default, behave almost exactly like Java volatile variables.

Also, why even bother studying the quirks of the x86? Can’t we just stick to locks and, occasionally, to default atomics and let the compiler/library writers worry about how they translate into fences? Absolutely!

This should be the end of the story, except that a lot of programmers seem to “know better.” They will try to optimize multithreaded algorithms and get into a world of trouble. So if you know anybody who might be tempted to write or optimize lock-free algorithms, read on.

Why the x86? Not only because it’s the most common chip, but also because its memory model is deceptively “normal.” It’s much more likely for programmers to attempt to play games with the x86 than, for instance, with the alpha. The rest of this post should serve as motivation to stay away form such games.

What is Sequential Consistency?

Sequential consistency is how we naturally think about multithreaded programs. It’s also how we think about the world. If A happens before B then it cannot be true that B happens before A. If one processor stores 1 in variable x, and another processor stores 1 in variable y, than either the sequence of events is:

  • x flips to 1 and then y flips to 1, or
  • y flips to 1 and then x flips to 1

(assume both are initially zero). Which sequence actually happened could be observed by a third processor, which loads both variables. If, for instance, it sees x == 1 and y == 0, it concludes that the write to x happened earlier. If it sees x == 0 and y == 1, it concludes that the write to y happened earlier (if it sees x == y, it can’t tell).

Now, in a sequentially consistent world, a fourth processor could not see the two events in a different order than what the third processor observed. We assume that, for each particular run of the program, there is a global sequence of events that is seen by all processors.

Of course, on multicore processors many things can happen simultaneously, and it usually doesn’t matter–except when memory access is involved. Sequentially consistent model assumes that there is a single switch between all processors and memory, and only one processor at a time can access it. This imaginary switch serves as the serialization point.

The x86 is not sequentially consistent

Welcome to the world of special relativity! Two observers (cores) might see two events (memory writes) in different order. Even on an x86.

The technical explanation is that, instead of one bus serializing all memory accesses, each core uses its own memory cache. Writes propagate from one cache to another with finite speed (measured in clock cycles). Reads snoop around the caches before going to shared memory. All this caching is done because a trip to main memory very expensive–hundreds of cycles.

I don’t care, says Java

The Java memory model requires that all access to shared volatile variables be sequentially consistent across all threads, even when they run on different cores. How do they enforce it?

The primary source for this kind of information is Doug Lea’s excellent The JSR-133 Cookbook for Compiler Writers.

Here’s the big picture: When Java is translated into bytecode, special codes are issued around volatile-variable access. These codes tell the Java runtime where memory fences should be inserted. The bytecodes are supposed to be processor independent; the fences are highly processor dependent. I’ll call the special bytecodes memory barriers as opposed to processor-specific memory fences. Memory barriers are inserted conservatively–the compiler assumes the most relaxed memory model (in other words, the bytecode has to run correctly on an alpha, whose memory model is so relaxed you’d think it was conceived in a hot tub).

Fences should go between memory accesses, so Java barriers are named after the kind of accesses (load or store) they separate. There is a LoadLoad barrier between volatile reads, LoadStore barrier between a read and a write, StoreLoad barrier between a write and a read, and StoreStore barrier between writes.

Here’s the first problem: the compiler can often only see one part of the pair. In that case it has to assume the worst and use the most restrictive barrier. Considering that, here are some of the cookbook rules:

  • Issue a StoreStore barrier before each volatile store.
  • Issue a StoreLoad barrier after each volatile store.
  • Issue LoadLoad and LoadStore barriers after each volatile load.

That’s a lot of barriers! Fortunately, the compiler can optimize some of them away using dataflow analysis.

The next step is to run the bytecode on a particular processor (or compile it to native code). If the processor is an alpha, practically all Java barriers translate into some kind of fences. But if it’s an x86, all but the StoreLoad barriers are turned into no-ops. The StoreLoad barrier is either translated into an mfence or a locked instruction (lock xchg).

When executed on an x86, the barrier rules boil down to this one: Issue an mfence after each volatile store (or turn it into lock xchg).

Peterson lock on an x86 in Java

Figuring the details of the Peterson lock in Java was not easy. I’m grateful to Doug Lea for patiently explaining to me some of the gotchas. I am solely responsible for any mistakes though.

In my previous post I came to the conclusion that for Peterson lock to work on an x86, an mfence is needed. Here’s the relevant pseudo-code:

Thread 0 Thread 1
store(zeroWants, 1)
store(victim, 0)
// Java puts another fence here
r0 = load(oneWants)
r1 = load(victim)
store(oneWants, 1)
store(victim, 1)
// Java puts another fence here
r0 = load(zeroWants)
r1 = load(victim)

In Java, all three variables, zeroWants, oneWants, and victim, would be declared volatile. Using the x86 Java translation rules, that would mean putting an mfence after every store to these variables.

The fences after the writes to zeroWant and oneWant are just what the doctor ordered. Form my previous post we know why they are needed on an x86.

The mfence after the write to victim is something new though. It isn’t strictly necessary on an x86, but to see that you really have to analyze the whole algorithm–something that’s beyond capabilities of modern compilers.

My original (turns out, incorrect) reasoning was that no fence is needed between the write to victim and the subsequent read because, on an x86, reads and writes to the same location are never reordered. Well, that’s not the whole story because…

Intra-processor forwarding is allowed

Consider the following example from the x86 spec. Initially x == y == 0.

Processor 0 Processor 1
mov [_x], 1
mov r1, [_x]
mov r2, [_y]
mov [_y], 1

mov r3, [_y]
mov r4, [_x]

The result r2 == 0 and r4 == 0 is allowed.

Let’s analyze that. First of all, because reads and writes to the same location are never reordered, r1 and r3 must both be 1.

If r2 == 0, we would naturally conclude that the write to y at processor 1 hasn’t happened yet. It means: processor 1 hasn’t executed its code yet. When it does, it should see the earlier write to x (processor 0 saw it!). Guess what–sometimes it doesn’t.

The technical reason for this behavior is that the write to x is immediately visible at processor 0, before it propagates to processor 1. Similarly, the write to y is immediately visible at processor 1, before it propagates to processor 0. So it is possible for both processor to see each other’s writes delayed, even though they see their own writes immediately.

This result violates sequential consistency and therefore cannot be tolerated in Java. Hence the need for the mfence between the write to victim and the subsequent read of victim.

Gosh darn it! I want to optimize it!

Having said that, I must point out that, in this particular case, lack of sequential consistency is not a deal breaker. It’s okay for thread 0 of Peterson lock to read stale (local) value of victim, even if “in the meanwhile” thread 1 overwrote it. All it means is that some unnecessary spins might be performed. This doesn’t violate the lock principles and doesn’t lead to a deadlock as long as the new value of victim eventually reaches thread 0.

I’m pretty sure of this reasoning–at least at the time of this writing. However, I wouldn’t be totally surprised is somebody found a fault in it.

The bottom line is that even such an “obvious” optimization is not obviously correct. The proof of correctness of the Peterson lock is based on the assumption of sequential consistency. If we relax this assumption even slightly, we have to redo the whole proof.

Any volunteers?

« Previous Page