Type System

Learning a new programming paradigm is like learning a foreign language. You learn the new vocabulary, the grammar, a few idioms, but you still formulate your thoughts in your native tongue. It takes years of immersion before you start thinking in a foreign language. In programming, the litmus test comes when you’re approaching a new problem. Will you formulate your solution in terms of the old or the new paradigm? Will you see procedures, objects, or functions?

I remember proposing an object oriented approach to the design of a content index back in my Microsoft years. The reaction was: It’s a great paradigm, but it’s not applicable to this particular problem. There are no “objects” in the content index. Indeed, you can’t find Employees and Payrolls, Students and Courses, DisplayableObjects and LightRays in the content index. But after a short brain storm we discovered such exotic objects as a Resource Manager or a Master Merge. We ended up with a fine piece of OO engineering that is still part of the Windows shell after all those years.

There’s a similar, if not larger, shift in thinking when you learn functional programming, especially if you come from an OO background. Initially you can’t help but see everything through the perspective of mutable data structures and loops. There are no obvious “functions” in your favorite problem. Functions are these weird stateless things–they always produce the same results when called with the same arguments. They are good in mathematics, but not in real-life programming.

And yet, there are people who write complete applications using functional (or at least hybrid) languages. One such example is the game The Path of Go created by Microsoft Research for the Xbox. It’s not a spectacular game as far as UI goes, but it plays some mean Go and it’s written in F#.

F# is a mostly functional language (based on ML) with support for object-oriented programming and access to the rich .NET libraries. In this respect it’s similar to Scala. Roughly: F# is to .NET what Scala is to JVM. Both languages were designed by excellent teams. F# was designed by Microsoft Research in Cambridge, England. The chief architect of F# is Don Syme. Scala is the brainchild of Martin Odersky.

I decided to learn F# and immerse myself in the new paradigm. So I had to pick a random problem, not one with obvious functional implementation, and start from scratch, design and all. In practice, I had to do a lot of experimenting in order to familiarize myself with the language and the library. Experimenting, by the way, was made relatively easy by the inclusion of an F# interpreter in Visual Studio 2010.

To learn F#, I used only online documentation which, as I found out, is not that good. There are also fewer online discussions about F# than, say, about Haskell. The two websites I used most are:

The Problem

Without further ado, let me describe the challenge:

Write a program that finds duplicate files on disk.

In particular, I was interested in finding duplicate image files, but for testing I used text files. The program should therefore concentrate on files with particular extensions. It should also be able to skip directories that I’m not interested in. The duplicates (or triplicates, etc.) don’t have to have the same names but have to have identical extensions and contents.

The Design

Not surprisingly, functional programming requires a major adjustment. It’s one thing to read somebody else’s code and admire the tricks, but a completely different thing to be designing and writing functional code from scratch. But once you get the hang of it, it actually becomes easy and quite natural.

The most important thing you notice when using functional languages is that types are mostly unobtrusive due to type inference, but type checking is very strong. Essentially, if you manage to compile your program, it usually runs correctly. You spend much less time debugging (which I found rather difficult in F#) and much more time figuring out why the types don’t match. Of course, you have to learn a whole new language of error messages.

So it’s definitely a steep learning curve but once you’re over the hump you start reaping the benefits.

The naive approach to solving my problem would be to list all files on disk (recursively, starting with the root directory) and compare each with each. That would scale very poorly, O(N2), so we need to do some pruning.

Let’s first group files by extension and size. There will be a lot of singleton groups– containing only one file with a particular combination of extension and size. Let’s eliminate them from consideration. After that we’ll be dealing with much smaller groups of files so, within those groups, we can do full-blown byte-by-byte comparisons. Strict comparisons will potentially split those groups into even smaller groups. Again, we should eliminate the resulting singletons. Finally, we should be able to print the resulting lists of lists of identical files.

For an imperative programmer the first impulse would be to use a lot of looping; e.g., for each file retrieve its extension and size, etc. An object-oriented programmer would use vectors, hash tables, and looping over iterators.

How would a functional programmer approach the subject? Iteration is out of the question. Recursion and immutable lists are in. Quite often functions operating on lists can be expressed as list comprehensions. But there’s an even better tool called sequences in F#. They’re sort of like iterators, but with some very nice compositional properties. Sequences can be composed using pipelining. So let me express the above design as a pipeline.

The Pipeline

This is the refinement of the original design that takes into account data structures: sequences, lists, and tuples in various combinations.

  1. The source for the pipeline is a sequence of file paths coming from a recursive enumerator.
  2. The first operation is to group the files that share the same key: in our case the key will be a tuple (file extension, file size).
  3. The next operation is to filter out groups of length one, the singletons.
  4. Since the grouping injected keys into our stream, we need to strip them now and convert groups to lists.
  5. Now we group byte-wise equal files within each group.
  6. Then we remove singletons within those subgroups,
  7. Flatten lists of lists, and
  8. Print the results.

There are only two stages that deal with technical details of data structures: the stripping of the keys and the flattening of the lists. Everything else follows from high-level design.

Here’s the pipeline in its full functional glory. The |> symbol is used to forward the results of one stage to the next.

  (filterOutPaths ["c:\\Windows";"c:\\ProgramData";"c:\\Program Files"])
  (filterExt [".jpg"; ".gif"])
|> Seq.groupBy (fun pth->(Path.GetExtension pth, (FileInfo pth).Length))
|> Seq.filter (fun (_, s) -> (Seq.length s) > 1)
|> Seq.map (fun (_, sq) -> [for path in sq -> path]) 
|> Seq.map groupEqualFiles
|> Seq.map filterOutSingletons
|> Seq.collect Seq.ofList
|> Seq.iter (fun lst -> printfn "%A" lst)

I will go through it line by line shortly.

I realize that this is a handful and if you have no previous experience with functional programming you are likely to feel overwhelmed at some point. The important thing is to observe how the original design translates almost one-to-one into implementation. Notice also the points of customization–they are almost universally plugs for user-defined functions. For instance, you customize Seq.map, Seq.filter, or Seq.collect by passing functions, often lambdas, as their arguments. Also, look how the function enumFilesRec is used. I decided to make its first two arguments functions even though my first impulse was to directly pass lists of directories to be skipped and extensions to be accepted. This way my design will work even if I later decide to filter files by, say, time of creation or size.

The Stages

Here’s the line by line reading of the pipeline code. My suggestion is to read as far as your patience permits and then skip to conclusions.

  1. I’m calling my function enumFilesRec with three arguments:
      (filterOutPaths ["c:\\Windows";"c:\\ProgramData";"c:\\Program Files"])
      (filterExt [".jpg"; ".gif"])
    1. A directory filter: a function (predicate) that returns true for all directories except the ones listed as arguments to filterOutPaths. It’s worth mentioning that filterOutPaths is a function that returns another function — the predicate expected by enumFilesRec.
    2. A file filter: a function that returns true only for listed extensions. Again, filterExt is a function that takes a list and returns a predicate.
    3. The top directory: the root of the listing.

    enumFilesRec returns a sequence of paths. Since the sequence is only evaluated on demand, the call to this function returns almost immediately.

  2. The next stage of the pipeline:
    |> Seq.groupBy (fun p->(Path.GetExtension p, (FileInfo p).Length))

    applies Seq.gropuBy to the incoming sequence of paths. Seq.groupBy takes one argument– a function that takes a path and generates a key:

    fun path -> (Path.GetExtension path, (FileInfo path).Length)

    The key is the tuple consisting of file extension and file length:

    (Path.GetExtension path, (FileInfo path).Length)

    F# notation for anonymous functions (lambdas) is of the form:

    fun x -> expr

    The function Seq.gropuBy groups all elements of the sequence into subgroups that share the same key. The result is a sequence of sequences (the groups). Of course, to perform this step the whole input sequence must be scanned. That forces the actual listing of directories on disk, which takes the bulk of the run time.

  3. The next stage performs Seq.filter on the sequence:
    |> Seq.filter (fun (_, s) -> (Seq.length s) > 1)

    Seq.filter takes a predicate– here defined by a lambda– and applies it to all elements of the sequence; passing through only those that satisfy the predicate. This is the predicate:

    fun (_, s) -> (Seq.length s) > 1

    Notice that the previous step produced a sequence whose elements were tuples of (key, subsequence) with the subsequences sharing the same key. The lambda pattern-matches these tuples, (_, s), ignoring the key and testing the length of the subsequence against one. That eliminates singleton groups.

  4. We can now get rid of the keys and convert the subsequences into plain lists that will be needed for further processing.
    |> Seq.map (fun (_, sq) -> [for path in sq -> path])

    I use the workhorse of sequences, Seq.map, that applies a function to every element of the sequence. Remember that the element is still a tuple (key, subsequence). The lambda ignores the key and returns a list:

    fun (_, sq) -> [for path in sq -> path]

    The expression:

    [for path in sq -> path]

    enumerates the paths in the sequence sq and uses them to initialize a list (the brackets denote a list in F#). In functional programming such constructs are known as list comprehensions. The expression for path in sq -> path is called a generator.

  5. The next stage looks deceptively simple:
    |> Seq.map groupEqualFiles

    It applies a function, groupEqualsFiles to each list in the sequence. The interesting work happens in that function, which I will analyze shortly. Suffice it to say that it produces a list of sublists of identical files. Some of the sublists may be singletons.

    It might be a little hard to keep track of all those sequences, subsequences, and sublists. A good development environment will show you all the types while you’re developing the program. You may also sketch simple examples:

    seq[ [[a; a]; [b]]; [[c; c; c]; [d; d]; [e]] ]

    This one shows a sequence of lists of lists of identical elements analogous to the output of the last stage.

  6. Next I apply another function, filterOutSingletons to each list of sublists.
    |> Seq.map filterOutSingletons

    I end up with a sequence of lists of sublists of length greater than one containing identical files. The sequence above would be transformed to:

    seq[ [[a; a]]; [[c; c; c]; [d; d]] ]
  7. In the next step I flatten this hierarchy using Seq.collect.
    |> Seq.collect Seq.ofList

    Seq.collect takes a function that turns each element of the original sequence into a sequence and concatenates all those sequences into one. Like this:

    seq[ [a; a]; [c; c; c]; [d; d] ]

    Remember that the element of our sequence is a list of sublists. We can easily convert such a list to a sequence by applying Seq.ofList to it. It creates a sequence of sublists, and Seq.collect will concatenate all such sequences. I end up with a big sequence of lists. Those lists contain identical files. Voila!

  8. The final step is to print those lists.
    |> Seq.iter (fun lst -> printfn "%A" lst)

    I apply Seq.iter, which takes a void function (a function returning unit, in the F# parlance):

    fun lst -> printfn "%A" lst

    (which is really not a function because it has a side effect of printing its argument–a list). Seq.iter is just like Seq.map, but it consumes its input sequence producing nothing (except for side effects). Unlike Haskell, F# doesn’t track I/O side effects in the type system.


For those who are really curious, I can go on filling in the details–the implementations of various functions used in the pipeline. Those functions use a variety of functional features of F# such as lists, recursion, pattern matching, etc. This is the bread and butter of functional programming.

Let me start with the function that enumerates files in a directory tree. The idea is to first list the files in the current directory and pass them through the file filter; then list the subdirectories, filter them through the directory filter, and recurse into each subdirectory. Since this function is the first stage of the pipeline, it should produce a sequence.

let rec enumFilesRec dirFilter fileFilter dir =
   seq {
         enumFilesSafe dir
         |> Seq.filter fileFilter
         enumDirsSafe dir
         |> Seq.filter dirFilter
         |> Seq.map (fun sub -> enumFilesRec dirFilter fileFilter sub)
         |> Seq.collect id

Monads Anyone?

I don’t want to scare anyone but F# sequences are monads. People usually have strong feelings about monads, some love them, some hate them. But don’t get intimidated by monads. The theory behind them is hard, but the usage is pretty simple.

To create a sequence you use the seq { ... } block sprinkled with yield and yield! statements. When such a sequence is enumerated (you can do it with a loop for instance: for elem in sqnc), each yield returns an element and suspends the execution of the seq block until the next call. The next iteration resumes right after the last yield. In our case we are building a new sequence from existing ones. To dive into another sequence inside the seq block we use yield! (yield bang). This is what the above code does: It first dives into file enumeration (a sequence returned by enumFilesSafe) and then into the enumeration of files in subdirectories.

enumFilesSafe is a function that calls the system’s Directory.EnumerateFiles API (part of the .NET library). I had to encapsulate it into my own function in order to catch (and ignore) the UnauthorizedAccessExceptions. Notice the use of pipelining to filter the paths.

After the sequence of files paths is exhausted, we enter the second yield!. This one starts by enumerating subdirectories. Subdirectory paths are pipelined through the directory filter. Now we have to do something for each subdirectory– that’s the clue to use Seq.map. The mapping function:

fun sub -> enumFilesRec dirFilter fileFilter sub

simply calls enumFilesRec recursively, passing it the filters and the name of the subdirectory. Notice that enumFilseRec returns another sequence, so we end up with a sequence of sequences corresponding to individual subdirectories. To flatten this hierarchy I use Seq.collect. Notice that I pass it the identity function, id, which just returns its argument: The elements of my sequence are sequences and I don’t have to do anything to them.

The second function I’d like to discuss is groupEqualFiles. It gets a list of file paths and splits it into sublists containing byte-wise identical files. This problem can be decomposed in the following way: Let’s pick the first file and split the rest into two groups: the ones equal to that file and the ones not equal. Then do the same with the non-equal group. The “do the same” part hints at recursion. Here’s the code:

let groupEqualFiles paths =
    let rec groupEqualFilesRec soFar lst =
        match lst with 
        | [] -> soFar
        | (file::tail) ->
            let (eq, rest) = groupFilesEqualTo file tail
            if rest.IsEmpty then eq::soFar
            else groupEqualFilesRec (eq::soFar) rest
    groupEqualFilesRec [] paths

A recursive solution often involves defining a recursive function and then calling it with some initial arguments. That’s the case here as well. The recursive function groupEqualFilesRec takes the accumulator, the soFar list, and a list of files to group.

let rec groupEqualFilesRec soFar lst =
    match lst with 
    | [] -> soFar
    | (file::tail) ->
        let (eq, rest) = groupFilesEqualTo file tail
        if rest.IsEmpty then eq::soFar
        else groupEqualFilesRec (eq::soFar) rest

The new trick here is pattern matching. A list can be empty and match the pattern [], or it can be split into the head and tail using the pattern (file::tail). In the first case I return the soFar list and terminate recursion. Otherwise I call another function groupFilesEqualTo with the head and the tail of the list. This auxiliary function returns a tuple of lists: the equal group and the rest. Symbolically, when called with a and [b; a; c; b; d; a] it produces:

([a; a; a], [b; c; b; d])

The tuple is immediately pattern matched to (eq, rest) in:

let (eq, rest) = groupFilesEqualTo file tail

if the rest is empty, I prepend the eq list to the accumulator, soFar. Otherwise I recursively call groupEqualFilesRec with the augmented accumulator and the rest.

The function groupEqualFiles simply calls the recursive groupEqualFilesRec with an empty accumulator and the initial list. The result is a list of lists.

For completeness, here’s the implementation of the recursive function groupFilesEqualTo

let rec groupFilesEqualTo file files =
   match files with 
   | [] -> ([file], [])
   | (hd::tail) -> 
       let (eqs, rest) = (groupFilesEqualTo file tail)
       if eqFiles file hd then (hd::eqs, rest) else (eqs, hd::rest)

Again, this function pattern-matches the list. If it’s empty, it returns a tuple consisting of the singleton list containing the file in question and the empty rest. Otherwise it calls itself recursively to group the tail. The result is pattern-matched into (eqs, rest). Now the comparison is made between the original file and the head of the original list (we could have done it before making the recursive call, but this way the code is more compact). If they match then the head is prepended to the list of equal files, otherwise it lands in the rest.

Did I mention there would be a test at the end? By now you should be able to analyze the implementation of filterOutSingletons:

let rec filterOutSingletons lstOfLst =
    match lstOfLst with
    | [] -> []
    | (h::t) -> 
        let t1 = filterOutSingletons t
        if (List.length h) > 1 then h::t1 else t1


I am not arguing that we should all switch to functional programming. For one thing, despite great progress, performance is still a problem in many areas. Immutable data structures are great, especially in concurrent programming, but can at times be awkward and inefficient. However, I strongly believe that any programmer worth his or her salt should be fluent with the functional paradigm.

The heart of programming is composition and reuse. In object-oriented programming you compose and reuse objects. In functional programming you do the same with functions. There are myriads of ways you can compose functions, the simplest being pipelining, passing functions as arguments, and returning functions from functions. There are lambdas, closures, continuations, comprehensions and, yes, monads. These are powerful tools in the hands of a skilled programmer. Not every problem fits neatly within the functional paradigm, but neither do all problems fit the OO paradigm. What’s important is having choices.

The code is available on GitHub.

I gave this presentation on Nov 17 at the Northwest C++ Users Group. Here are the links to the two-part video:

  1. Part I
  2. Part II

There was an unresolved discussion about one of the examples–the one about template specialization conflicting with modular type checking. It turns out the example was correct (that is an error would occur).

template<class T> struct vector {
    vector(int); // has constructor

//-- This one typechecks
template<LessThanComparable T>
void f(T x) { vector<T> v(100); ... }

//-- Specialization of vector for int
template<> struct vector<int> {
    // missing constructor!

int main() { f(4); } // error

The issue was whether the instantiation of f in main would see the specialization of the vector template for int even though the specialization occurs after the definition of f. Yes, it would, because vector<T> is a dependent name inside f. Dependent names are resolved in the context of template instantiation–here in the context of main, where the specialization is visible. Only non-dependent names are resolved in the context of template definition.

Some of the brightest C++ minds had been working on concepts for years, only to vote them out of the C++0x spec–unanimously, not long after they’ve been voted in. In my naivete I decided to write a short blog post explaining why in my opinion it’s still worth pursuing concepts into the next revision of the Standard. I showed a draft to my friends and was overwhelmed by criticism. The topic turned out to be so controversial that I was forced not only to defend every single point but to discuss alternative approaches and their pluses. That’s why this post grew so large.

What Are Concepts?

In a nutshell, concepts are a type system for types. Below I have tabled corresponding features and examples of non-generic programming with types vs. generic programming with concepts. The concept Iterator specifies the “type” of the type parameter, T, which is used in the definition of the template function, Count.

Regular Programming Generic Programming
functions, data structures template functions, parametrized types
int strlen(char const * s)
template<Iterator T>
void Count(T first, T last)
parameter: s type parameter: T
type: char const * concept: Iterator

Just to give you a feel for the syntax, here's an example of a concept--a simplified definition of Iterator:

concept Iterator<typename T>
    : EqualityComparable<T>
    typename value_type;
    T& operator++(T&);

Iterator illustrates both the simplicity and the complexity of concepts. Let me concentrate on the simple aspects first and leave the discussion of complexities for later.

Iterator refines (inherits from) another concept, EqualityComparable (anything that can be compared using '=='). It declares an associated type, value_type; and an associated function, operator++. Associated types and functions are the bread and butter of concepts.

You might have seen something like an "associated type" implemented as a typedef, value_type, in STL iterators. Of course, such typedefs are only a convention. You may learn about such conventions by analyzing somebody else's code or by reading documentation (it so happens that STL is well documented, but that's definitely not true of all libraries), but they are not part of the language.

You use concepts when you define template functions or parametrized data structures. For instance, here's a toy function template, Count:

template<Iterator T> int Count(T first, T last) {
    int count = 0;
    while (!(first == last)) { 
    return count;

As you can see, this function uses two properties of the concept Iterator: you can compare two iterators for equality and you can increment an iterator.

Ignoring the fine points for now, this seems to be pretty straightforward. Now, let's see what problems were concepts designed to solve.

Who Ordered Concepts?

Besides being an elegant abstraction over the type system, concepts offer some of the same advantages that types do:

  • Better error reporting
  • Better documentation
  • Overloading

Error Reporting

Why do templates produce such horrible error messages? The main reason is that errors are detected too late.

It's the same phenomenon you have in weakly typed languages, except that there error detection is postponed even further--till runtime. In weakly typed languages an error is discovered when, for instance, a certain operation cannot be performed on a certain variable. This might happen deep into the calling tree, often inside a library you're not familiar with. You essentially need to examine a stack trace to figure out where the root cause of the problem is and whether it's in your code or inside a library.

Conversely, in statically typed languages, most errors are detected during type-checking. The fact that a certain operation cannot be performed on a certain variable is encoded in the type of that variable. If the error is indeed in your code, it will be detected at the call site and will be much more informative.

In current C++, template errors are detected when a certain type argument does not support a certain operation (often expressed as the inability to convert one type to another or a failure of type deduction). It often happens deep into the instantiation tree. You need an equivalent of the stack trace to figure out where the root cause of the error is. It's called an "instantiation stack" and is dumped by the compiler upon a template error. It often spans several pages and contains unfamiliar names and implementation details of some library code.

I suspect that the main reason a lot of C++ programmers still shun template programming is because they were burnt by such error messages. I know that I take a coffee break before approaching a template error message.

Self Documentation

The only thing better than early error detection is not making the error in the first place. When you're calling a function, you usually look at its signature first. The signature tells you what types are expected and what type is returned. You don't have to study the body of the function and the bodies of functions that it calls (and so on). You don't have to rely on comments or off-line documentation. Signatures are the kind of documentation that is checked by the compiler. Even more importantly, at some point the compiler also checks the body of the function you're calling to see if its arguments are used in accordance with their declared types.

Contrast it with template programming. If the "signature" of a template is of the form template<class T>, you have no information about T. What kind of T's are acceptable can only be divined by studying the template's body and the bodies of templates it is instantiating. (And, of course, the compiler can do nothing to check this "signature" against the template body.)

The Case of STL

Even if you're not defining templates in your own code, you probably use them through the Standard Library--STL in particular. STL is a library of algorithms and containers. Alexander Stepanov's stroke of genius was to separate the two. Using STL you may apply one of the 112 algorithms to 12 standard containers. It's easy to calculate that there are 1344 possible combinations (not counting those algorithms that operate on two or three containers at a time).

Of course not all combinations of algorithms and containers make sense. For instance, you can't apply std::sort to a std::list. The reason is simple: sort (which is based on quicksort) requires random access to the container; a list can only provide sequential access. And this is the crux of the matter: this important restriction on access cannot be expressed in C++. Granted, the code will not compile because at some point sort will try to perform some operation that the list iterator doesn't support. You'll get a page-long error message. By my last count a leading C++ compiler emits 73 cryptic lines, many of them stretching to 350 characters, and nary a mention of random access being required by sort.

Template error messages would be even worse if not for a system of hacks (I mean, template tricks) and concept-like conventions in the STL. For instance, a dummy field iterator_tag corresponds to a concept name and iterator traits look a lot like concept maps. With a number of new features in C++0x (especially decltype that make SFINAE constructs such as the enable_if more powerful), even more "tricks" became possible. If some of this sounds like Gibberish to you, you're in good company. That's the mess that the concept proposal was trying to fix. Concepts do introduce a lot of complexity into the language but they reduce and organize the complexity of programs.

It's worth mentioning that Alex Stepanov collaborated in the development of the concept description language, Tecton, and participated in the C++ concept effort, so he's been acutely aware of the problems with STL.

Competing Options

The idea of limiting types that can be passed to generic algorithms or data structures has been around for some time. The C++ concept proposal is based on the Haskell's notion of type classes. Several other languages opted for simpler constrained templates instead.

Constrained Templates

One way to look at concepts is that they constrain the kind of types that may be passed to a template function or data type. One can drop concepts and instead impose type constraints directly in the definition of a template. For instance, rather than defining the Iterator concept, we could have tried to impose analogous type constraints in the definition of Count (C++ doesn't have special syntax for it but, of course, there are template tricks for everything).

There are various ways of expressing type constraints, so let's look at a few examples. In Java, everything is an object, and objects are classified by classes and interfaces. Java reuses the same class/interface mechanism for constraining template parameters. A generic SortedList, for instance, is parametrized by the type of its elements, T. The constraint imposed on type T is that it implements the interface Comparable.

class SortedList<T extends Comparable<T>>

This approach breaks down for built-in types since, for instance, int cannot be derived from Comparable. (Java's workaround is to use boxed types, like Integer, which are magically derived from Comparable.)

C# has similar approach, with some restrictions and some extensions. For instance, C# lets you specify that a given type is default constructible, as in this example:

class WidgetFactory<T> where T : Widget, new()

Here type T must be derived from Widget and it must have a default constructor.

In the D programming language, template constraints may be expressed by arbitrary compile-time functions as well as use-patterns, which are the topic of the next section.

These approaches are constrained to structural (as opposed to semantic) matching of types. A type is accepted by a constrained template if the operations supported by it match those specified by the constraints name-by-name. If one library names the same operations differently, they won't match (see how Concept Maps deal with this problem).

Use Patterns vs. Signatures

Use patterns formed the basis on the original Texas proposal for C++ Concepts. The idea was to show to the compiler representative examples of what you were planning to do with type arguments. These use patterns look almost like straightforward C++. For instance, an Iterator concept might be defined as follows:

concept Iterator<typename Iter, typename Value> {
    Var<Iter> i;
    Iter j = i; // copyable
    bool b = (i != j); // (non)-equality comparable
    ++i; // supports operator++
    Var<const Value> v;
    *i = v; // connects Value type with Iter type

Notice the subtlety with defining dummy variables i and v. You can't just write:

Iter i;

because that would imply that Iter has a default constructor-- which is not required. Hence special notation Var<T> for introducing dummy variables. In general, the use-pattern language was supposed to be a subset of C++ with some additional syntax on the side. To my knowledge this subset has never been formally defined. Also, there are some useful concepts that are hard or impossible to define in use-pattern language (see the sidebox).

Then there is the problem of concept-checking the body of a template before it is instantiated-- the holy grail of conceptology. It involves some nontrivial analysis and comparison of two parsing trees (the pattern's, and the template body's). As far as I know no formal algorithm has been presented.

Because of the difficulties in formalizing the use-pattern approach, the C++ committee decided to focus on defining concepts using signatures.

We've seen such a signature in the Iterator concept: T&operator++(T&).

How are we to interpret a signature within the context of a concept? Naively, we'd require that there should exist a free function called operator++. But what if the type T has a method operator++ instead? Do we have to list it too (as T::operator++())? A uniform call syntax was proposed that would unify functions and method calls. Our operator++ would match a function or a method (this feature didn't make it into the final concept proposal.)

Also, an operator signature must also match the built-in operator--this way we'd allow an arbitrary pointer to match the Iterator concept.

These are some of the complications I mentioned at the beginning of this post. There are also complications related to symbol lookup (it has to take into account the Argument-Dependent-Lookup hack) and overload resolution, which are notoriously difficult in C++.

Who Ordered Concept Maps?

So far we've talked about concepts and about templates that use concepts. Now let's focus on what happens when we instantiate a template with concrete types. How does the compiler check that those types fulfill the constraints of the concepts? One possibility is the "duck-typing" approach--if the type in question defines the same associated types and functions (and possibly some other constraints I haven't talked about), it is a match. This is called structural (same set of functions) and nominal (by-name) matching.

Structural matching is simple and straightforward but it might lead to "false positives." Here's the archetypal example: Syntactically, there is no difference between a ForwardIterator and an InputIterator, yet semantically those two are not equivalent. Some algorithms that work fine onForwardIterators will fail on InputIterators (the difference is that you can't go twice through an InputIterator).

Semantic matching, on the other hand, requires the client to explicitly match types with concepts (only the clients knows the meaning of a particular concept). This is done through concept maps (in Haskell they are called "instances"). With this approach, the mapping between Iterator and a pointer to T would require the following statement:

concept_map Iterator<T *> {
    typedef T value_type;

Notice that we don't have to provide the mapping for operator++; it's automatically mapped into the built-in ++. In general, if the associated functions have the same names as the functions defined for the particular type (like ++ in this case), the concept map may be empty. It's mostly because of the need to create those empty concept maps that the concept proposal was rejected.

Interestingly, both groups that worked on concepts-- the Texas group with Bjarne Stroustrup and the Indiana group with Doug Gregor-- supported concept maps. The difference was that the Texas group wanted concepts to be matched automatically, unless marked as explicit; whereas the Indiana group required explicit concept maps, unless the concept was marked auto.

On a more general level, concept maps enable cross-matching between different libraries and naming conventions. Semantically, a concept from one library may match a type from another library, but the corresponding functions and types may have different names. For instance, a StackOfInt concept from one library may declare an associated function called push. A vector from the Standard Library, conceptually, implements this interface, but uses a different name for it. A concept map would provide the appropriate mapping:

concept_map StackOfInt<std::vector<int>> {
    void push(std::vector<int> & v, int x) {


In my opinion, concepts would add a lot to C++. They would replace conventions and hacks in the STL with a coherent set of language rules. They would provide verifiable documentation and drastically simplify template error messages.

Concept maps have an important role to play as the glue between concepts and types. With concept maps you'll be able to add, post hoc, a translation layer between disparate libraries. The back-and-forth between explicit and auto concepts has a philosophical aspect: Do we prefer precision or recall in concept matching? It also has a practical aspect: Is writing empty concept maps an abomination? More experience is needed to resolve this issue.

The major complaint I've heard about the concept proposal was that it was too complex. Indeed, it took 30 pages to explain it in the Draft Standard. However, adding a new layer of abstraction to an existing language, especially one with such baggage as C++, cannot be trivial. The spec had to take into account all possible interaction with other language features, such as name resolution, overload resolution, ADL, various syntactic idiosyncrasies, and so on. But programmers rarely read language specs. I'm sure that, if there was a book "Effective Concepts" or some such, programmers would quickly absorb the rules and start using concepts.

The current hope is that concepts will become part of the 1x version of C++. Doug Gregor is working on a new version of the C++ compiler, which is likely to become testing grounds for concepts. He also publishes a blog with Dave Abrahams (see for instance this post and the follow up).

On the other hand, the concept effort has definitely lost its momentum after it was voted out of C++0x. Corporate support for further research is dwindling. The hope is in the educated programmers saying no to hacks and demanding the introduction of concepts in future versions of the C++ Standard.


I'm grateful to the "Seattle Gang", in particular (alphabetically) Andrei Alexandrescu, Walter Bright, Dave Held, Eric Niebler, and Brad Roberts, for reading and commenting on several drafts of this post. I'd like to thank Jeremy Siek for interesting discussions and the inspiration for this post.


  1. Gabriel Dos Reis and Bjarne Stroustrup, Specifying C++ Concepts, the original Texas proposal with use patterns.
  2. Joint paper by the Indiana and Texas groups, Concepts: Linguistic Support for Generic Programming in C++
  3. Google Video of Doug Gregor talking about concepts.
  4. 2009 Draft C++ Standard with concepts still in. See section 14.10.
  5. Bjarne Stroustrup, The concept killer article.
  6. Jeremy Siek, The C++0x Concept Effort presentation slides.
  7. Posts about concepts on C++Next by Dave Abrahams and Doug Gregor.
  8. Dave Abrahams, To Auto or Not?.
  9. Examples of concept-like hacks in C++0x.

What’s the use of theory if you can’t apply it in practice? I’ve been blogging about concurrency, starting with the intricacies of multicore memory models, all the way through to experimental thread-safe type systems and obscure languages. In real life, however, I do most of my development in C++, which offers precious little support for multithreading. And yet I find that my coding is very strongly influenced by theoretical work.

C++ doesn’t support the type system based on ownership, which is a way to write provably race-free code; in fact it doesn’t have a notion of immutability, and very limited support for uniqueness. But that doesn’t stop me from considering ownership properties in my design and implementation. A lot of things become much clearer and less error-prone with better understanding of high-level paradigms.

I own a tiny software company, Reliable Software, which makes a distributed version control system, Code Co-op, written entirely in C++. Recently I had to increase the amount of concurrency in one of its components. I reviewed existing code, written some years ago, and realized how unnecessarily complex it was. No, it didn’t have data races or deadlocks, but it wasn’t clear how it would fare if more concurrency were added. I decided to rewrite it using the new understanding.

As it turned out, I was able to make it much more robust and maintainable. I found many examples of patterns based on ownership. I was able to better separate shared state from the non-shared, immutable from mutable, values from references. I was able to put synchronization exactly where it was needed and remove it from where it wasn’t. I’ve found reasonable trade-offs between message passing and data sharing.

It was a pretty amazing trip. I described it in my other blog, which deals more with actual programming experience. I though it might be a nice break from all the theory you normally find in this blog.

I’ve blogged before about the C++ unique_ptr not being unique and how true uniqueness can be implemented in an ownership-based type system. But I’ve been just scratching the surface.

The new push toward uniqueness is largely motivated by the demands of multithreaded programming. Unique objects are alias free and, in particular, cannot be accessed from more than one thread at a time. Because of that, they never require locking. They can also be safely passed between threads without the need for deep copying. In other words, they are a perfect vehicle for safe and efficient message passing. But there’s a rub…

The problem is this: How do you create and modify unique objects that have internal pointers. A classic example is a doubly linked list. Consider this Java code:

public class Node {
    public Node _next;
    public Node _prev;
public class LinkedList {
    private Node _head;
    public void insert(Node n) {
        n._next = _head;
        if (_head != null)
            _head._prev = n;
        _head = n;

Suppose that you have a unique instance of an empty LinkedList and you want to insert a new link into it without compromising its uniqueness.

The first danger is that there might be external aliases to the node you are inserting–the node is not unique, it is shared. In that case, after the node is absorbed:

_head = n;

_head would be pointing to an alias-contaminated object. The list would “catch” aliases and that would break the uniqueness property.

The remedy is to require that the inserted node be unique too, and the ownership of it be transferred from the caller to the insert method. (Notice however that, in the process of being inserted, the node loses its uniqueness, since there are potentially two aliases pointing to it from inside the list–one is _head and the other is _head._prev. Objects inside the list don’t have to be unique–they may be cross-linked.)

The second danger is that the method insert might “leak” aliases. The tricky part is when we let the external node, n, store the reference to our internal _head:

n._next = _head

We know that this is safe here because the node started unique and it will be absorbed into the list, so this alias will become an internal alias, which is okay. But how do we convince the compiler to certify this code as safe and reject code that isn’t? Type system to the rescue!

Types for Uniqueness

There have been several approaches to uniqueness using the type system. To my knowledge, the most compact and comprehensive one was presented by Haller and Odersky in the paper, Capabilities for External Uniqueness, which I will discuss in this post. The authors not only presented the theory but also implemented the prototype of the system as an extension of Scala. Since not many people are fluent in Scala, I’ll translate their examples into pseudo-Java, hopefully not missing much.

Both in Scala and Java one can use annotations to extend the type system. Uniqueness introduces three such annotations, @unique, @transient, and @exposed; and two additional keywords, expose and localize.

-Objects that are @unique

In the first approximation you may think of a @unique object as a leak-proof version of C++ unique_ptr. Such object is guaranteed to be “tracked” by at most one reference–no aliases are allowed. Also no external references are allowed to point to the object’s internals and, conversely, object internals may not reference any external objects. However, and this is a very important point, the insides of the @unique object may freely alias each other. Such a closed cross-linked mess is called a cluster.

Consider, for instance, a (non-empty) @unique linked list. It’s cluster consists of cross-linked set of nodes. It’s relatively easy for the compiler to guarantee that no external aliases are created to a @unique list–the tricky part is to allow the manipulation of list internals without breaking its uniqueness (Fig 1 shows our starting point).

Fig 1. The linked list and the node form separate clusters

Look at the definition of insert. Without additional annotations we would be able to call it with a node that is shared between several external aliases. After the node is included in the list, those aliases would be pointing to the internals of the list thus breaking its uniqueness. Because of that, the uniqueness-savvy compiler will flag a call to such un-annotated insert on a @unique list as an error. So how can we annotate insert so that it guarantees the preservation of uniqueness?

-Exposing and Localizing

Here’s the modified definition of insert:

public void insert(@unique Node n) @transient {
    expose (this) { list =>
        var node = localize (n, list);
        node._next = list._head;
        if (list._head != null)
            list._head._prev = node;
        list._head = node;

Don’t worry, most of the added code can be inferred by the compiler, but I make it explicit here for the sake of presentation. Let’s go over some of the details.

The node, n passed to insert is declared as @unique. This guarantees that it forms its own cluster and that n is the only reference to it. Also, @unique parameters to a method are consumed by that method. The caller thus loses her reference to it (the compiler invalidates it), as demonstrated in this example:

@unique LinkedList lst = new @unique LinkedList();
@unique Node nd = new @unique Node();
nd._next; // error: nd has been consumed!

The method itself is annotated as @transient. It means that the this object is @unique, but not consumed by the method. In general, the @transient annotation may be applied to any parameter, not only this. You might be familiar with a different name for transient–borrowed.

Inside insert, the this parameter is explicitly exposed (actually, since the method is @transient, the compiler would expose this implicitly).

expose (this) { list => ... }

The new name for the exposed this is list.

Once a cluster is exposed, some re-linking of its constituents is allowed. The trick is not to allow any re-linking that would lead to the leakage of aliases. And here’s the trick: To guarantee no leakage, the compiler assigns the exposed object a special type–its original type tagged by a unique identifier. This identifier is created for the scope of each expose statement. All members of the exposed cluster are also tagged by the same tag. Since the compiler type-checks every assignment it automatically makes sure that both sides have the same tag.

Now we need one more ingredient–bringing the @unique node into the cluster. This is done by localizing the parameter n to the same cluster as list.

var node = localize (n, list);

The localize statement does two things. It consumes n and it returns a reference to it that is tagged by the same tag as its second parameter. From that point on, node has the same tagged type as all the exposed nodes inside the list, and all assignments type-check.

Exposed list and localized node

Fig 2. The list has been exposed: all references are tagged. The node has been localized (given the same tag as the list). Re-linking is now possible without violating the type system.

Note that, in my pseudo-Java, I didn’t specify the type of node returned by localize. That’s because tagged types are never explicitly mentioned in the program. They are the realm of the compiler.

Functional Decomposition

The last example was somewhat trivial in that the code that worked on exposed objects fit neatly into one method. But a viable type system cannot impose restrictions on structuring the code. The basic requirement for any programming language is to allow functional decomposition–delegating work to separate subroutines, which can be re-used in other contexts. That’s why we have to be able to define functions that operate on exposed and/or localized objects.

Here’s an example from Haller/Odersky that uses recursion within the expose statement. append is a method of a singly-linked list:

void append(@unique SinglyLinkedList other) @transient
    expose(this) { list =>
        if (list._next == null)
            list._next = other; // localize and consume

In the first branch of the if statement, a @unique parameter, other, is (implicitly) localized and consumed. In the second branch, it is recursively passed to append. Notice an important detail, the subject of append, list._next, is not @unique–it is exposed. Its type has been tagged by a unique tag. But the append method is declared as @transient. It turns out that both unique and exposed arguments may be safely accepted as transient parameters (including the implicit this parameter).

Because of this rule, it’s perfectly safe to forgo the explicit expose inside a transient method. The append method may be thus simplified to:

void append(@unique SinglyLinkedList other) @transient
    // 'this' is implicitly exposed
    if (_next == null)
        _next = other; // localize and consume

Things get a little more interesting when you try to reuse append inside another method. Consider the implementation of insert:

void insert(@unique SingleLinkedList other) @transient
    var locOther = localize(other, this);
    if (other != null) 
        _next = locOther;

The insert method is transient–it works on unique or exposed lists. It accepts a unique list, other, which is consumed by the localize statement. The this reference is implicitly exposed with the same tag as locOther, so the last statement _next=locOther type-checks. The only thing that doesn’t type-check is the argument to append, which is supposed to be unique, but here it’s exposed instead.

This time there is no safe conversion to help us, so if we want to be able to reuse append, we have to modify its definition. First of all, we’ll mark its parameter as @exposed. An exposed parameter is tagged by the caller. In order for append to work, the this reference must also be tagged by the caller–with the same tag. Otherwise the assignment, _next=other, inside append, wouldn’t type-check. It follows that the append method must also be marked as @exposed (when there is more than one exposed parameter, they all have to be tagged with the same tag).

Here’s the new version of append:

void append(@exposed SinglyLinkedList other) @exposed
    if (_next == null)
        _next = other; // both exposed under the same tag
        _next.append(other); // both exposed under the same tag

Something interesting happened to append. Since it now operates on exposed objects, it’s the caller’s responsibility to expose and localize unique object (this is exactly what we did in insert). Interestingly, append will now also operate on non-annotated types. You may, for instance, append one non-unique list to another non-unique list and it will type-check! That’s because non-annotated types are equivalent to exposed types with a null tag–they form a global cluster of their own.

This kind of polymorphism (non-annotated/annotated) means that in many cases you don’t have to define separate classes for use with unique objects. What Haller and Odersky found out is that almost all class methods in the Scala’s collection library required only the simplest @exposed annotations without changing their implementation. That’s why they proposed to use the @exposed annotation on whole classes.


Every time I read a paper about Scala I’m impressed. It’s a language that has very solid theoretical foundations and yet is very practical–on a par with Java, whose runtime it uses. I like Scala’s approach towards concurrency, with strong emphasis on safe and flexible message passing. Like functional languages, Scala supports immutable messages. With the addition of uniqueness, it will also support safe mutable messages. Neither kind requires synchronization (outside of that provided by the message queue), or deep copying.

There still is a gap in the Scala’s concurrency model–it’s possible to share objects between threads without any protection. It’s up to the programmer to declare shared methods as synchronized–just like in Java; but there is no overall guarantee of data-race freedom. So far, only ownership systems were able to deliver that guarantee, but I wouldn’t be surprised if Martin Odersky had something else up his sleeve for Scala.

I’d like to thank Philip Haller for reading the draft of this post and providing valuable comments. Philip told me that a new version of the prototype is in works, which will simplify the system further, both for the programmer and the implementer.

Here’s the video from my recent talk to the Northwest C++ Users Group (NWCPP) about how to translate the data-race free type system into a system of user-defined annotations in C++. I start with the definition of a data race and discuss various ways to eliminate them. Then I describe the ownership system and give a few examples of annotated programs.

Here are the slides from the presentation. They include extensive notes.

By the way, we are looking for speakers at the NWCPP, not necessarily related to C++. We are thinking of changing the charter to include all programming languages. If you are near Seattle on Oct 21 09 (or any third Wednesday of the month), and you are ready to give a 60 min presentation, please contact me.

Is the Actor model just another name for message passing between threads? In other words, can you consider a Java Thread object with a message queue an Actor? Or is there more to the Actor model? Bartosz investigates.

I’ll start with listing various properties that define the Actor Model. I will discuss implementation options in several languages.


Actors are objects that execute concurrently. Well, sort of. Erlang, for instance, is not an object-oriented language, so we can’t really talk about “objects”. An actor in Erlang is represented by a thing called a Process ID (Pid). But that’s nitpicking. The second part of the statement is more interesting. Strictly speaking, an actor may execute concurrently but at times it will not. For instance, in Scala, actor code may be executed by the calling thread.

Caveats aside, it’s convenient to think of actors as objects with a thread inside.

Message Passing

Actors communicate through message passing. Actors don’t communicate using shared memory (or at least pretend not to). The only way data may be passed between actors is through messages.

Erlang has a primitive send operation denoted by the exclamation mark. To send a message Msg to the process (actor) Pid you write:

Pid ! Msg

The message is copied to the address space of the receiver, so there is no sharing.

If you were to imitate this mechanism in Java, you would create a Thread object with a mailbox (a concurrent message queue), with no public methods other than put and get for passing messages. Enforcing copy semantics in Java is impossible so, strictly speaking, mailboxes should only store built-in types. Note that passing a Java Strings is okay, since strings are immutable.

-Typed messages

Here’s the first conundrum: in Java, as in any statically typed language, messages have to be typed. If you want to process more than one type of messages, it’s not enough to have just one mailbox per actor. In Erlang, which is dynamically typed, one canonical mailbox per actor suffices. In Java, mailboxes have to be abstracted from actors. So an actor may have one mailbox for accepting strings, another for integers, etc. You build actors from those smaller blocks.

But having multiple mailboxes creates another problem: How to block, waiting for messages from more than one mailbox at a time without breaking the encapsulation? And when one of the mailboxes fires, how to retrieve the correct type of a message from the appropriate mailbox? I’ll describe a few approaches.

-Pattern matching

Scala, which is also a statically typed language, uses the power of functional programming to to solve the typed messages problem. The receive statement uses pattern matching, which can match different types. It looks like a switch statements whose case labels are patterns. A pattern may specify the type it expects. You may send a string, or an integer, or a more complex data structure to an actor. A single receive statement inside the actor code may match any of those.

receive {
    case s: String => println("string: "+ s)
    case i: Int => println("integer: "+ i)
    case m => println("unknown: "+ m)

In Scala the type of a variable is specified after the colon, so s:String declares the variable s of the type String. The last case is a catch-all.

This is a very elegant solution to a difficult problem of marrying object-oriented programming to functional programming–a task at which Scala exceeds.


Of course, we always have the option of escaping the type system. A mailbox could be just a queue of Objects. When a message is received, the actor could try casting it to each of the expected types in turn or use reflection to find out the type of the message. Here’s what Martin Odersky, the creator of Scala, has to say about it:

The most direct (some would say: crudest) form of decomposition uses the type-test and type-cast instructions available in Java and many other languages.

In the paper he co-authored with Emir and Williams (Matching Objects With Patterns) he gives the following evaluation of this method:

Evaluation: Type-tests and type-casts require zero overhead for the class hierarchy. The pattern matching itself is very verbose, for both shallow and deep patterns. In particular, every match appears as both a type-test and a subsequent type-cast. The scheme raises also the issue that type-casts are potentially unsafe because they can raise ClassCastExceptions. Type-tests and type-casts completely expose representation. They have mixed characteristics with respect to extensibility. On the one hand, one can add new variants without changing the framework (because there is nothing to be done in the framework itself). On the other hand, one cannot invent new patterns over existing variants that use the same syntax as the type-tests and type-casts.

The best one could do in C++ or D is to write generic code that hides casting from the client. Such generic code could use continuations to process messages after they’ve been cast. A continuation is a function that you pass to another function to be executed after that function completes (strictly speaking, a real continuation never returns, so I’m using this word loosely). The above example could be rewritten in C++ as:

void onString(std::string const & s) {
    cout << "string: " << s << std::endl;
void onInt(int i) {
    cout << "integer: " << i << std::endl;

receive<std::string, int> (&onString, &onInt);

where receive is a variadic template (available in C++0x). It would do the dynamic casting and call the appropriate function to process the result. The syntax is awkward and less flexible than that of Scala, but it works.

The use of lambdas might make things a bit clearer. Here’s an example in D using lambdas (function literals), courtesy Sean Kelly and Jason House:

    (string s){ writefln("string: %s", s); },
    (int i){ writefln("integer: %s", i); }

Interestingly enough, Scala’s receive is a library function with the pattern matching block playing the role of a continuation. Scala has syntactic sugar to make lambdas look like curly-braced blocks of code. Actually, each case statement is interpreted by Scala as a partial function–a function that is not defined for all values (or types) of arguments. The pattern matching part of case becomes the isDefinedAt method of this partial function object, and the code after that becomes its apply method. Of course, partial functions could also be implemented in C++ or D, but with a lot of superfluous awkwardness–lambda notation doesn’t help when partial functions are involved.


Finally, there is the problem of isolation. A message-passing system must be protected from data sharing. As long as the message is a primitive type and is passed by value (or an immutable type passed by reference), there’s no problem. But when you pass a mutable Object as a message, in reality you are passing a reference (a handle) to it. Suddenly your message is shared and may be accessed by more than one thread at a time. You either need additional synchronization outside of the Actor model or risk data races. Languages that are not strictly functional, including Scala, have to deal with this problem. They usually pass this responsibility, conveniently, to the programmer.


Java is not a good language to implement the Actor model. You can extend Java though, and there is one such extension worth mentioning called Kilim by Sriram Srinivasan and Alan Mycroft from Cambridge, UK. Messages in Kilim are restricted to objects with no internal aliasing, which have move semantics. The pre-processor (weaver) checks the structure of messages and generates appropriate Java code for passing them around. I tried to figure out how Kilim deals with waiting on multiple mailboxes, but there isn’t enough documentation available on the Web. The authors mention using the select statement, but never provide any details or examples.

Correction: Sriram was kind enough to provide an example of the use of select:

int n = Mailbox.select(mb0, mb1, .., timeout);

The return value is the index of the mailbox, or -1 for the timeout. Composability is an important feature of the message passing model.

Dynamic Networks

Everything I described so far is common to CSP (Communicating Sequential Processes) and the Actor model. Here’s what makes actors more general:

Connections between actors are dynamic. Unlike processes in CSP, actors may establish communication channels dynamically. They may pass messages containing references to actors (or mailboxes). They can then send messages to those actors. Here’s a Scala example:

receive {
    case (name: String, actor: Actor) =>
        actor ! lookup(name)

The original message is a tuple combining a string and an actor object. The receiver sends the result of lookup(name) to the actor it has just learned about. Thus a new communication channel between the receiver and the unknown actor can be established at runtime. (In Kilim the same is possible by passing mailboxes via messages.)

Actors in D

The D programming language with my proposed race-free type system could dramatically improve the safety of message passing. Race-free type system distinguishes between various types of sharing and enforces synchronization when necessary. For instance, since an Actor would be shared between threads, it would have to be declared shared. All objects inside a shared actor, including the mailbox, would automatically inherit the shared property. A shared message queue inside the mailbox could only store value types, unique types with move semantics, or reference types that are either immutable or are monitors (provide their own synchronization). These are exactly the types of messages that may be safely passed between actors. Notice that this is more than is allowed in Erlang (value types only) or Kilim (unique types only), but doesn’t include “dangerous” types that even Scala accepts (not to mention Java or C++).

I will discuss message queues in the next installment.

If it weren’t for the multitude of opportunities to shoot yourself in the foot, multithreaded programming would be easy. I’m going to discuss some of these “opportunities” in relation to global variables. I’ll talk about general issues and discuss the ways compilers can detect them. In particular, I’ll show the protections provided by my proposed extensions to the type system.

Global Variables

There are so many ways the sharing of global variables between threads can go wrong, it’s scary.

Let me start with the simplest example: the declaration of a global object of class Foo (in an unspecified language with Java-like syntax).

Foo TheFoo = new Foo;

In C++ or Java, TheFoo would immediately be visible to all threads, even if Foo provided no synchronization whatsoever (strictly speaking Java doesn’t have global variables, but static data members play the same role).

If the programmer doesn’t do anything to protect shared data, the default immediately exposes her to data races.

The D programming language (version 2.0, also known as D2) makes a better choice–global variables are, by default, thread local. That takes away the danger of accidental sharing. If the programmer wants to share a global variable, she has to declare it as such:

shared Foo TheFoo = new shared Foo;

It’s still up to the designer of the class Foo to provide appropriate synchronization.

Currently, the only multithreaded guarantee for shared objects in D2 is the absence of low-level data races on multiprocessors–and even that, only in the safe subset of D. What are low level data races? Those are the races that break some lock-free algorithms, like the infamous Double-Checked Locking Pattern. If I were to explain this to a Java programmer, I’d say that all data members in a shared object are volatile. This property propagates transitively to all objects the current object has access to.

Still, the following implementation of a shared object in D would most likely be incorrect even with the absence of low-level data races:

class Foo {
    private int[] _arr;
    public void append(int i) {
       _arr ~= i; // array append

auto TheFoo = new shared Foo;

The problem is that an array in D has two fields: the length and the pointer to a buffer. In shared Foo, each of them would be updated atomically, but the duo would not. So two threads calling TheFoo.append could interleave their updates in an unpredictable way, possibly leading to loss of data.

My race-free type system goes further–it eliminates all data races, both low- and high-level. The same code would work differently in my scheme. When an object is declared shared, all its methods are automatically synchronized. TheFoo.append would take Foo‘s lock and make the whole append operation atomic. (For the advanced programmer who wants to implement lock-free algorithms my scheme offers a special lockfree qualifier, which I’ll describe shortly.)

Now suppose that you were cautious enough to design your Java/D2 class Foo to be thread safe:

class Foo {
    private int [] _arr;
    public synchronized void append(int i) {
       _arr ~= i; // array append

Does it mean your global variable, TheFoo, is safe to use? Not in Java. Consider this:

static Foo TheFoo; // static = global
// Thread 1
TheFoo = new Foo();
// Thread 2
while (TheFoo == null)

You won’t even know what hit you when your program fails. I will direct the reader to one of my older posts that explains the problems of publication safety on a multiprocessor machine. The bottom line is that, in order to make your program work correctly in Java, you have to declare TheFoo as volatile (or final, if you simply want to prevent such usage). Again, it looks like in Java the defaults are stacked against safe multithreading.

This is not a problem in D2, since shared implies volatile.

In my scheme, the default behavior of shared is different. It works like Java’s final. The code that tries to rebind the shared object (re-assign to the handle) would not compile. This is to prevent accidental lock-free programming. (If you haven’t noticed, the code that waits on the handle of TheFoo to switch from null to non-null is lock-free. The handle is not protected by any lock.) Unlike D2, I don’t want to make lock-free programming “easy,” because it isn’t easy. It’s almost like D2 is endorsing lock-free programming by giving the programmer a false sense of security.

So what do you do if you really want to spin on the handle? You declare your object lockfree.

lockfree Foo TheFoo;

lockfree implies shared (it doesn’t make sense otherwise), but it also makes the handle “volatile”. All accesses to it will be made sequentially consistent (on the x86, it means all stores will compile to xchg).

Note that lockfree is shallow–data members of TheFoo don’t inherit the lockfree qualification. Instead, they inherit the implied shared property of TheFoo.

It’s not only object handles that can be made lockfree. Other atomic data types like integers, Booleans, etc., can also be lockfree. A lockfree struct is also possible–it is treated as a tuple whose all elements are lockfree. There is no atomicity guarantee for the whole struct. Methods can be declared lockfree to turn off default synchronization.


Even the simplest case of sharing a global variable between threads is fraught with danger. My proposal inobtrusively eliminates most common traps. The defaults are carefully chosen to let the beginners avoid the pitfalls of multithreaded programming.

In my last post I talked about the proposal for the ownership scheme for multithreaded programs that provides alias control and eliminates data races. The scheme requires the addition of new type qualifiers to the (hypothetical) language. The standard concern is that new type qualifiers introduce code duplication. The classic example is the duplication of getters required by the introduction of the const modifier:

class Foo {
    private Bar _bar;
    Bar get() {
        return _bar;
    const Bar get() const {
        return _bar;

Do ownership annotations lead to the same kind of duplication? Fortunately not. It’s true that, in most cases, two implementations of each public method are needed–with and without synchronization–but this is taken care by the compiler, not by the programmer. Unlike in Java, we don’t need a different class for shared Vector and thread-local ArrayList. In my scheme, when a vector is instantiated as a monitor (shared), the compiler automatically puts in the necessary synchronization code.

Need for generic code

The ownership scheme introduces an element of genericity by letting the programmer specify ownership during the instantiation of a class (just as a template parameter is specified during the instantiation of a template).

I already mentioned that most declarations can be expressed in two ways: one using type qualifiers, another using template notation–the latter exposing the generic nature of ownership annotations. For instance, these two forms are equivalent:

auto foo2 = new shared Foo;
auto foo2 = new Foo<owner::self>;

The template form emphasizes the generic nature of ownership annotations.

With the ownership system in place, regular templates parametrized by types also gain an additional degree of genericity since their parameters now include implicit ownership information. This is best seen in objects that don’t own the objects they hold. Most containers have this property (unless they are restricted to storing value types). For instance, a stack object might be thread-local while its elements are either thread-local or shared. Or the stack might be shared, with shared elements, etc. The source code to implement such stacks may be identical.

The polymorphic scheme and the examples are based on the GRFJ paper I discussed in a past post.

An example

– Stack

A parameterized stack might look like this :

class Stack<T> {
    private Node<T> _top;
    void push(T value) {
        auto newNode = new Node<owner::this, T>;
        newNode.init(:=value, _top);
        _top = newNode;
    T pop() {
        if (_top is null) return null;
        auto value := _top.value();
        _top = _top.next();
        return value;

This stack is parametrized by the type T. This time, however, T includes ownership information. In particular T could be shared, unique, immutable or–the default–thread-local. The template picks up whatever qualifier is specified during instantiation, as in:

auto stack = new Stack<unique Foo>;

The _top (the head of the linked list) is of the type Node, which is parametrized by the type T. What is implicit in this declaration is that _top is owned by this–the default assignment of ownership for subobjects. If you want, you can make this more explicit:

private Node<owner::this, T> _top;

Notice that, when constructing a new Node, I had to specify the owner explicitly as this. The default would be thread-local and could leak thread-local aliases in the constructor. It is technically possible that the owner::this could, at least in this case, be inferred by the compiler through simple flow analysis.

Let’s have a closer look at the method push, where some interesting things are happening. First push creates a new node, which is later assigned to _top. The compiler cannot be sure that the Node constructor or the init method don’t leak aliases. That looks bad at first glance because, if an alias to newNode were leaked, that would lead to the leakage of _top as well (after a pop).

And here’s the clincher: Because newNode was declared with the correct owner–the stack itself–it can’t leak an alias that has a different owner. So anybody who tries to access the (correctly typed) leaked alias would have to hold the lock on the stack. Which means that, if the stack is shared, unsynchronized access to any of the nodes and their aliases is impossible. Which means no races on Nodes.

I also used the move operator := to move the values stored on the stack. That will make the stack usable with unique types. (For non-unique types, the move operator turns into regular assignment.)

I can now instantiate various stacks with different combinations of ownerships. The simplest one is:

auto localStack = new Stack<Foo>;

which is thread-local and stores thread-local objects of class Foo. There are no restrictions on Foo.

A more interesting combination is:

auto localStackOfMonitors = new Stack<shared Foo>;

This is a thread-local stack which stores monitor objects (the opposite is illegal though, as I’ll explain in a moment).

There is also a primitive multithreaded message queue:

auto msgQueue = new shared Stack<shared Foo>;

Notice that code that would try to push a thread-local object on the localStackOfMonitors or the msgQueue would not compile. We need the rich type system to be able to express such subtleties.

Other interesting combinations are:

auto stackOfImmutable = new shared Stack<immutable Foo>;
auto stackOfUnique = new shared Stack<unique Foo>;

The latter is possible because I used move operators in the body of Stack.

– Node

Now I’ll show you the fully parameterized definition of Node. I made all ownership annotations explicit for explanatory purposes. Later I’ll argue later that all of them could be elided.

class Node<T> {
    T _value;
    Node<owner::of_this, T> _next;
    void init(T v, Node<owner::of_this, T> next)
        _value := v;
        _next = next;
    T value() {
        return :=_value;
    Node<owner::of_this, T> next() {
        return _next;

Notice the declaration of _next: I specified that it must be owned by the owner of the current object, owner::of_this. In our case, the current object is a node and its owner is an instance of the Stack (let’s assume it’s the self-owned msgQueue).

This is the most logical assignment of ownership: all nodes are owned by the same stack object. That means no ownership conversions need be done, for instance, in the implementation of pop. In this assignment:

_top = _top.next();

the owner of _top is msgQueue, and so is the owner of its _next object. The types match exactly. I drew the ownership tree below. The fat arrows point at owners.
Ownership hierarchy using owner::of_this

But that’s not the only possibility. The default–that is _next being owned by the current node–would work too. The corresponding ownership tree is shown below.

Ownership hierarchy using owner::this

The left-hand side of the assignment

_top = _top.next();

is still owned by msgQueue. But the _next object inside the _top is not. It is owned by the _top node itself. These are two different owners so, during the assignment, the compiler has to do an implicit ownership conversion. Such conversion is only safe if both owners belong to the same ownership tree (sometimes called a “region”). Indeed, ownership is only needed for correct locking, and the locks always reside at the top of the tree (msgQueue in this case). So, after all, we don’t need to annotate _next with the ownership qualifier.

The two other annotations can be inferred by the compiler (there are some elements of type inference even in C++0x and D). The argument next to the method init must be either owned by this or be convertible to owner::this because of the assignment

_next = next;

Similarly, the return from the method next is implicitly owned by this (the node). When it’s used in Stack.pop:

_top = _top.next();

the owner conversion is performed.

With ownership inference, the definition of Node simplifies to the following:

class Node<T> {
    T _value;
    Node<T> _next; // by default owned by this
    void init(T v, Node<T> next)
        _value := v;
        _next = next; // inference: owner of next must be this
    T value() {
        return :=_value;
    Node<T> next() {
        return _next; // inference: returned node is owned by this

which has no ownership annotations.

Let me stress again a very important point: If init wanted to leak the alias to next, it would have to assign it to a variable of the type Node<owner::this, T>, where this is the current node. The compiler would make sure that such a variable is accessed only by code that locks the root of the ownership tree, msgQueue. This arrangement ensures the absence of races for the nodes of the list.

Another important point is that Node contains _value of type T as a subobject. The compiler will refuse instantiations where Node‘s ownership tree is shared (its root is is self-owned), and T is thread-local. Indeed, such instantiation would lead to races if an alias to _value escaped from Node. Such an alias, being thread-local, would be accessible without locking.

Comment on notation

In general, a template parameter list might contain a mixture of types, type qualifiers, values, (and, in D, aliases). Because of this mixture, I’m using special syntax for ownership qualifiers, owner::x to distinguish them from other kinds of parameters.

As you have seen, a naked ownership qualifier may be specified during instantiation. If it’s the first template argument, it becomes the owner of the object. Class templates don’t specify this parameter, but they have access to it as owner::of_this.

Other uses of qualifier polymorphism

Once qualifier polymorphism is in the language, there is no reason not to allow other qualifiers to take part in polymorphism. For instance, the old problem of having to write separate const versions of accessors can be easily solved:

class Foo {
    private Bar _bar;
    public mut_q Bar get<mutability::mut_q>() mut_q
        return _bar;

Here method get is parametrized by the mutability qualifier mut_q. The values it can take are: mutable (the default), const, or immutable. For instance, in

auto immFoo = new immutable Foo;
immutable Bar b = immFoo.get();

the immutable version of get is called. Similarly, in

void f(const Foo foo) {
    const Bar b = foo.get();

the const version is called (notice that f may also be called with an immutable object–it will work just fine).

Class methods in Java or D are by default virtual. This is why, in general, non-final class methods cannot be templatized (an infinite number of possible versions of a method would have to be included in the vtable). Type qualifiers are an exception, because there is a finite number of them. It would be okay for the vtable to have three entries for the method get, one for each possible value of the mutability parameter. In this case, however, all three are identical, so the compiler will generate just one entry.


The hard part–explaining the theory and the details of the ownership scheme–is over. I will now switch to a tutorial-style presentation that is much more programmer friendly. You’ll see how simple the scheme really is in practice.

Since ownership plays a major role in race-free programming, it will be the first topic in my proposal for a race-free system. I presented the bird’s eye view of the system and provided a few teasers in my previous post. The design is based on published papers (see bibliography at the end). My contribution was to integrate several ideas into one package.

When I showed this proposal to my friends they either didn’t believe it could work or considered it too complex, depending which end they were looking at. From users’ perspective, the system looks relatively simple, so the natural reaction is: That can’t work. If you get into the details of why it works, and how the compiler knows you are in danger of a data race, you need some theory, and that is complex. So I decided to deal with some theory first, to show that the things work. If you’re not into theory, just look at the examples. They are usually simple to understand.


The ownership relationship is necessary to establish a tree-like structure among objects. This is needed by the compiler to decide which lock, if any, is responsible for the protection of each object, and take it when necessary. Simply speaking, the lock at the root of each tree protects the rest of the tree. If you think that your multithreaded programs don’t follow a tree structure, look at them closely. If they don’t, you either already have data races or are likely to develop them in the future.

-Every object has an owner

The owner may be another object–usually the embedding object. In the example below:

class Foo {
    void doWork() { _bar.doWork(); }
    private Bar _bar;

auto foo = new Foo;

the embedded object _bar is owned, at runtime, by the object foo (I repeat, the concrete object, not the class Foo). This is the default ownership relationship for embedded objects, so no special notation is needed to establish it (I’ll show later how to override this default).

There are also special symbolic “owners” that are used for the roots of ownership trees:

  • thread,
  • self,
  • unique, and
  • immutable.

unique and immutable are included in this list for convenience. I’ll discuss them later.


Every object has just one owner for life, a condition necessary to create ownership trees that can be checked at compile time. Every tree has a single root and a lock is attached to that root, if needed.

The ownership information is embedded in the type of the object. Using this information, the compiler is able to deduce which lock must be held while accessing that object, and what kind of aliasing is allowed. All races (access to mutable shared variables without locking) are detected at compile time. I’ll sketch a proof later.

-What may be shared

Only immutable objects or objects rooted with a self-owned object may be shared between threads.

Additionally, objects whose direct owner is self (such objects are called monitors) may have multiple aliases while being shared. Monitors may own (and protect) other objects that are not monitors.


The compiler will make sure that access to an object can only happen when the root of its ownership tree is locked (symbolic owners other than self are considered locked at all times). Since an object may only have one lock associated with it (at the top of its ownership tree), this condition is enough to ensure freedom from races.

Proof: I have to show that when a (mutable) object is seen by more than one thread, each access to it (read or write) is always protected by the same lock. Indeed, for an object to be shared between threads, the root of its ownership tree must be self, hence the top object must be a monitor. This monitor’s lock is always, automatically or explicitly, taken before accessing any member of the tree. The compiler knows which lock to take because the ownership information is encoded in the type of the object.

Introducing ownership annotations

Ownership is specified at the instance level (although it may be restricted at the class level). The previous example, which relied on default assignment of owners, is equivalent to the more explicit instance-level specification (that you will never see in actual programs):

Foo<owner::thread> foo = new Foo<owner::thread>;

This declares and constructs foo as being owned by the symbolic owner, thread. The embedded object _bar‘s owner is foo.

-Creating a monitor

A self-owned object is a monitor (I will alternate between the notation using shared type modifier or explicit owner annotation, <owner::self>). It contains a hidden lock and its methods are, by default, synchronized. Continuing with my example:

auto fooMon = new shared Foo;
// The same as:
// auto fooMon = new Foo<owner::self>;

The variable fooMon is a monitor and the doWork method is implicitly synchronized. The object _bar is now owned by fooMon. Its type can be expressed (this is rarely needed, however see the example of external ownership) as:


Types parameterized by runtime entities (fooMon is a runtime handle) are known in programming language theory as dependent types.

Notice that I’m using the same class to create thread-local and shared instances. This is usually possible unless there is a specific restriction at the class level.

Note to D programmers: The current semantics of D “shared” is slightly different from my proposal. For instance, it forces all embedded objects to be monitors (their methods must be synchronized by their own lock), requires explicit use of the synchronized keyword, and forces all access in non-synchronized methods to be sequentially consistent. (And it doesn’t guarantee freedom from races.)

Thread-local objects

The special thread owner, which is the owner of all thread-local objects, is conceptually always locked, so thread-local objects don’t require any synchronization. Also, thread is the default owner so, in the absence of any ownership annotations, all objects are thread-local. That’s one of the defaults that makes single-threaded programs work as-is.

Here’s an interesting twist–global and static objects are by default thread-local. This part has been implemented in D, uncovering a number of threading bugs in the process.


The special self owner (or the shared type modifier) is used to create monitor objects. A monitor has a built-in lock and all its public methods are by default synchronized.

As always with defaults, the language must provide a (preferably safe) way to bypass them. To prevent locking, a method may be explicitly marked as lockfree. The compiler is obliged to check if the lockfree method doesn’t access the object’s members in a non-safe way (although it can’t prevent high-level races on lockfree variables). That restricts the lockfree constructs to those that don’t require whole-program analysis to prove their safety.

The lockfree annotation is essential for, among others, the double-checked locking pattern (DCLP). I showed its implementation as a teaser in my previous post.


As I explained earlier, data members of an object are by default owned by that object. This way they inherit the root owner from their parent. This is another default that makes single-threaded programs run without additional qualifiers.

Notice that there are two important aspects of ownership, the direct owner and the root owner, which might be different. The direct owner is used in type-checking, the root owner in deciding which synchronization method to use. Both are known or inferred during compilation.

As usual, the defaults may be overridden. For instance, you may embed a monitor in a thread-local object by qualifying it as self-owned/shared:

class Holder {
    private Mon<owner::self> _mon;

or, in common notation, as shared:

class Holder {
    private shared Mon _mon;

Here, _mon is not owned by Holder (the default has been overridden) so it doesn’t inherit its root owner. Its methods are synchronized by its own lock. As you can see, ownership tree not always reflects embedding. An embedded monitor starts a new tree.

Well, the situation is a bit subtler. Objects in Java or D have reference semantics, so there is a hidden pointer, or handle, in the code above. Accessing the handle is not the same as accessing the object proper. Consider this example:

class Holder {
    private shared Mon _mon;
    public setMon(shared Mon newMon) {
        _mon = newMon;

Let’s instantiate a self-owned Holder and a self-owned Mon:

auto holder = new shared Holder;
auto newMon = new shared Mon;

Since holder is itself a monitor, the setMon method is automatically synchronized by its lock (it must be!). Therefore, strictly speaking, the handle part of _mon is owned by holderMon, whereas the object-proper part is self-owned.

You cannot embed a thread-owned object inside a monitor–the compiler would flag it as an error. This is part of alias control–a thread-local object might possibly have thread-local aliases that may be accessed without locking. Being part of a monitor, it could then migrate to another thread and cause a race.

What if a subobject is accessed directly (not through a method)? This may happen when the subobject is declared public:

class Foo {
    public Bar _bar;

In that case not all uses of _bar are allowed. Consider this:

auto foo = new shared Foo;
foo._bar.m(); // error

Access to _bar must happen only when foo is locked. The compiler knows it because the full type of _bar is:


Here’s the corrected code:

synchronized(foo) {

An even better solution is to make _bar private and provide appropriate methods to access it. Those methods would be automatically synchronized for a shared foo.

unique and immutable

I discussed unique objects in one of my previous posts. Although not strictly required in the ownership scheme, uniqueness allows for very efficient and safe transmission of large objects between threads. It makes sense to include unique as another symbolic root owner, since its multithreaded semantics is different from other types and it doesn’t require locking.

Some languages, including D, define immutable objects, which cannot be modified after creation. Such objects may be freely shared and passed by reference between threads. Again, immutable may be used as a root owner.


With the preliminaries out of the way, I can now explain in more detail the workings of the teaser from my previous post. Here’s the definition of the class MVar:

class MVar<T> {
    T    _msg;
    bool _full;
    void put(T msg) {
        _msg := msg; // move
        _full = true;
    T take() {
        while (!_full)
        _full = false;
        return := _msg;

First, let’s instantiate MVar as a shared (self-owned) monitor that is used to pass unique objects of class Foo as messages:

auto chanUnique = new shared MVar<unique Foo>;

The type of _msg in this instantiation is unique Foo, which is the same as Foo<owner::unique>. The method put takes unique Foo, so the following code is type-correct:

auto foo = new unique Foo;
chanUnique.put(:= foo); // move foo

Notice that unique objects cannot be assigned or passed by value–they have to be moved, hence the use of the move operator, :=. Internally, the method put also uses the move operator (good thinking on the part of the designer–otherwise MVar couldn’t be instantiated with unique). What’s interesting about this example is that messages are not deep-copied between threads. They are safely passed by reference.

Since chanUnique is self-owned (shared), both put and get are automatically synchronized.

Now let’s access chanUnique from another thread:

// another thread
unique Foo f2 = chanUnique.get(); // implicit move of rvalue

The return type of get is unique Foo, so the types check. I could have used the move operator, but since the right hand side is an rvalue, the compiler lets me use the assignment.

Now for the tricky case: What’s wrong with this code?

auto mVar = new shared MVar<Foo>;
auto myFoo = new Foo;
myFoo.unsyncMethod(); // ouch!

Since myFoo is created as thread-local (that’s the default), its methods are not synchronized. If I were able to pass it to shared MVar, another thread could obtain it through get. It could then call the unsynchronized method unsyncMethod at the moment when I was calling it. A data race would be possible! Or would it?

Guess what–the compiler won’t let you shoot yourself in the foot. It will notice that it would have to instantiate a shared object mVar with a thread-local member _msg. That’s against the rules! (A shared object cannot own a thread-local object.)

External ownership

In the original GRFJ paper the authors showed an example where one object was owned by another object without the former being embedded in the latter. They made an observation that, for the purpose of locking, the ownership relationship must be unchangeable: You can’t switch the owner on the fly. Therefore external ownership is allowed only if the owner is declared final.

final shared Lock lockObj = new shared Lock;
auto foo = new Foo<owner::lockObj>;
auto bar = new Bar<owner::lockObj>;

In this case, the compiler will only allow access to foo under the lock of lockObj, as in:

synchronized(lockObj) {

This construct is useful in situations where the locking discipline is not easily convertible to object hierarchy.


You might have noticed my use of dual notation. Most user code would be written with type qualifiers such as shared, unique, or immutable. However, in some cases I used an alternative notation that looked more like the specification of template parameters: <owner::self>, <owner::unique>, <owner::immutable>, or even <owner::thread> (in D they would be surrounded by !( and )). This was not meant to further confuse the reader, but as a gentle introduction to qualifier polymorphism, which I will describe in the next installment. I will show how classes and methods may be parameterized with different types of ownership, cutting down code duplication.

I’d like to thank Andrei Alexandrescu, Walter Bright, Sean Kelly and Jason House for very helpful comments. I’m also indebted to the D community for discussing my previous posts.


  1. Boyapati, Rinard, A Parameterized Type System for Race-Free Java Programs
  2. C. Flanagan, M. Abadi, Object Types against Races.

« Previous PageNext Page »