D Programming Language

Archived Posts from this Category

April 3, 2013

Beautiful Concurrency

Posted by Bartosz Milewski under C++, Concurrency, D Programming Language, Functional Programming, Haskell, Multithreading, Programming, Software Transactional Memory
[7] Comments

I’ve been dealing with concurrency for many years in the context of C++ and D (see my presentation about Software Transactional Memory in D). I worked for a startup, Corensic, that made an ingenious tool called Jinx for detecting data races using a lightweight hypervisor. I recorded a series of screencasts teaching concurrency to C++ programmers. If you follow these screencasts you might realize that I was strongly favoring functional approach to concurrency, promoting immutability and pure functions. I even showed how non-functional looking code leads to data races that could be detected by (now defunct) Jinx. Essentially I was teaching C++ programmers how to imitate Haskell.

Except that C++ could only support a small subset of Haskell functionality and provide no guarantees against even the most common concurrency errors.

It’s unfortunate that most programmers haven’t seen what Haskell can do with concurrency. There is a natural barrier to learning a new language, especially one that has the reputation of being mind boggling. A lot of people are turned off simply by unfamiliar syntax. They can’t get over the fact that in Haskell a function call uses no parentheses or commas. What in C++ looks like

f(x, y)

in Haskell is written as:

f x y

Other people dread the word “monad,” which in Haskell is essentially a tool for embedding imperative code inside functional code.

Why is Haskell syntax the way it is? Couldn’t it have been more C- or Java- like? Well, look no farther than Scala. Because of Java-like syntax the most basic functional trick, currying a function, requires a Scala programmer to anticipate this possibility in the function definition (using special syntax: multiple pairs of parentheses). If you have no access to the source code for a function, you can’t curry it (at least not without even more syntactic gymnastics).

If you managed to wade through this post so far, you are probably not faint of heart and you will accept the challenge of reading an excellent article by Simon Peyton Jones, Beautiful Concurrency that does a great job of explaining to the uninitiated Haskell’s beautiful approach to concurrency. It’s not a new article, but it has been adapted to the new format of the School of Haskell by Yours Truly, which means you’ll be able to run code examples embedded in it. If you feel adventurous, you might even edit them and see how the results change.

October 21, 2009

What Does Haskell Have to Do with C++?

Posted by Bartosz Milewski under C++, D Programming Language, Functional Programming, Haskell, Programming
[58] Comments

If you want to understand C++ template metaprogramming (TMP) you have to know functional programming. Seriously. I want you to think of TMP as maximally obfuscated (subset of) Haskell, and I’ll illustrate this point by point. If you don’t know Haskell, don’t worry, I’ll explain the syntax as I go.

The nice thing about single-paradigm languages like Haskell is that they have very simple syntax (think of Lisp that does everything with just a bunch of parentheses). I will start the Haskell-C++TMP mapping with basics, like functions and recursion, but I’ll try to cover a lot more, including higher-order functions, pattern matching, list comprehension (did you know it was expressible in C++?), and more.

Keep in mind that my Haskell examples are runtime functions operating on runtime data whereas their C++ TMP equivalents are compile-time templates operating mostly on types. Operation on types are essential in providing correct and efficient implementations of parametrized classes and functions.

By necessity the examples are simple, but the same mapping may be applied to much more complex templates from the C++ Standard Library and Boost. As a bonus, I’ll also explain the hot new thing, variadic templates.

Functional Approach to Functions

How do you implement useful functions if you don’t have mutable variables, if statements, or loops? To a C++ programmer that might seem like an impossible task. But that’s the reality of C++ compile-time language that forms the basis of TMP. Functional programming to the rescue!

As a warm-up, let’s see how Haskell implements a simple function, the factorial:

fact 0 = 1
fact n = n * fact (n - 1)

The first line states that the factorial of zero is one. The second line defines factorial for a non-zero argument, n (strictly speaking it only works for positive non-zero arguments). It does it using recursion: factorial of n is equal to n times the factorial of n-1. The recursion stops when n is equal to zero, in which case the first definition kicks in. Notice that the definition of the function fact is split into two sub-definitions. So when you call:

fact 4

the first definition is looked up first and, if it doesn’t match the argument (which it doesn’t), the second one comes into play. This is the simplest case of pattern matching: 4 doesn’t match the “pattern” 0, but it matches the pattern n.

Here’s almost exactly the same code expressed in C++ TMP:

template<int n> struct
fact {
    static const int value = n * fact<n - 1>::value;
};

template<> struct
fact<0> { // specialization for n = 0
    static const int value = 1;
};

You might notice how the horrible syntax of C++ TMP obscures the simplicity and elegance of this code. But once you are equipped with the C++/Haskell decoder ring, things become a lot clearer.

Let’s analyze this code. Just like in Haskell, there are two definitions of fact, except that their order is inverted. This is because C++ requires template specialization to follow the template’s general definition (or declaration, as we’ll see later). The pattern matching of arguments in C++ does not follow the order of declarations but rather is based on “best match”. If you instantiate the template with argument zero:

cout << "Factorial of 0 = " << fact<0>::value << endl;

the second pattern, fact<0>, is a better fit. Otherwise the first one, <int n>, is used.

Notice also the weird syntax for “function call”

fact<n>::value

and for the “return statement”

static const int value = n * fact<n - 1>::value;

This all makes sense if you look at templates as definitions of parameterized types, which was their initial purpose in C++. In that interpretation, we are defining a struct called fact, parameterized by an integer n, whose sole member is a static const integer called value. Moreover, this template is specialized for the case of n equal zero.

Now I want you to forget about what I just said and put on the glasses which make the C++ code look like the corresponding Haskell code.

Here’s another example–this time of a predicate (a function returning a Boolean):

is_zero 0 = True
is_zero x = False

Let’s spice it up a little for C++ and define a predicate on types rather than integers. The following compile-time function returns true only when the type T is a pointer:

template<class T> struct
isPtr {
    static const bool value = false;
};

template<class U> struct
isPtr<U*> {
    static const bool value = true;
};

This time the actual argument to isPtr is first matched to the more specialized pattern, U* and, if it fails, the general pattern is used.

We can add yet another specialization, which will pattern-match a const pointer:

template<class U> struct
isPtr<U * const> {
    static const bool value = true;
};

These types of type predicates may be, for instance, used to select more flexible and efficient implementations of parameterized containers. Think of the differences between a vector of values vs. a vector of pointers.

Lists

The basic data structure in functional languages is the list. Haskell’s lists are introduced using square brackets. For instance, a list of three numbers, 1, 2, 3, looks like:

[1, 2, 3]

List processing in functional languages follows the standard pattern: a list is split into head and tail, an operation is performed on the head, and the tail is processed using recursion. The splitting is done by pattern matching: the Haskell pattern being (head:tail) (to be precise, the colon in parentheses represents the cons operation–the creation of a list by prepending an element to an existing list).

Here’s a simple function, count, that calculates the length of a list:

count [] = 0
count (head:tail) = 1 + count tail

The first pattern, [], matches an empty list; the second a non-empty one. Notice that a function call in Haskell doesn’t use parentheses around arguments, so count tail is interpreted as a call to count with the argument tail.

Before C++0x, TMP was severely crippled by the lack of a list primitive. People used separate definitions for a list of zero, one, two, etc., elements, and even used special macros to define them. This is no longer true in C++0x, thanks to variadic templates and template parameter packs. Here’s our Haskel count translated into C++ TMP:

// Just a declaration
template<class... list> struct
count;

template<> struct
count<> {
    static const int value = 0;
};

template<class head, class... tail> struct
count<head, tail...> {
    static const int value = 1 + count<tail...>::value;
};

First we have a declaration (not a definition) of the template count that takes a variable number of type parameters (the keyword class or typename introduces a type parameter). They are packed into a template parameter pack, list.

Once this general declaration is visible, specializations may follow in any order. I arranged them to follow the Haskell example. The first one matches the empty list and returns zero. The second one uses the pattern, <head, tail…>. This pattern will match any non-empty list and split it into the head and the (possibly empty) tail.

To “call” a variadic template, you initiate it with an arbitrary number of arguments and retrieve its member, value, e.g.,

int n = count<int, char, long>::value; // returns 3

A few words about variadic templates: A variadic template introduces a template parameter pack using the notation class… pack (or int… ipack, etc…). The only thing you may do with a pack is to expand it and pass to another variadic template. The expansion is done by following the name of the pack with three dots, as in tail…. You’ll see more examples later.

Variadic templates have many applications such as type-safe printf, tuples (objects that store an arbitrary number of differently typed arguments), variants, and many more.

Higher-Order Functions and Closures

The real power of functional programming comes from treating functions as first class citizens. It means that you may pass functions to other functions and return functions from functions. Functions operating on functions are called higher-order functions. Surprisingly, it seems like compile-time C++ has better support for higher-order functions than run-time C++.

Let’s start with a Haskell example. I want to define a function that takes two predicate functions and returns another predicate function that combines the two using logical OR. Here it is in Haskell:

or_combinator f1 f2 = 
    λ x -> (f1 x) || (f2 x)

The or_combinator returns an anonymous function (the famous “lambda”) that takes one argument, x, calls both f1 and f2 with it, and returns the logical OR of the two results. The return value of or_combinator is this freshly constructed function. I can then call this function with an arbitrary argument. For instance, here I’m checking if 2 is either zero or one (guess what, it isn’t!):

(or_combinator is_zero is_one) 2

I put the parentheses around the function and its arguments for readability, although they are not strictly necessary. 2 is the argument to the function returned by or_combinator.

The lambda that’s returned from or_combinator is actually a closure. It “captures” the two arguments, f1 and f2 passed to or_combinator. They may be used long after the call to or_combinator has returned.

It might take some getting used to it before you are comfortable with functions taking functions and returning functions, but it’s much easier to learn this stuff in Haskell than in the obfuscated C++. Indeed, here’s an almost direct translation of this example:

template<template<class> class f1, template<class> class f2> struct
or_combinator {
    template<class T> struct
    lambda {
        static const bool value = f1<T>::value || f2<T>::value;
    };
};

Since in the metalanguage a function is represented by a template, the template or_combinator takes two such templates as arguments. It “calls” these templates using the standard syntax f<T>::value. Actually, the or_combinator doesn’t call these functions. Instead it defines a new template, which I call lambda, that takes the argument T and calls those functions. This template acts like a closure–it captures the two templates that are the arguments to or_combinator.

Here’s how you may use the or_combinator to combine two tests, isPtr and isConst and apply the result to the type const int:

std::cout 
   << "or_combinator<isPtr, isConst>::lambda<const int>::value = "
   << or_combinator<isPtr, isConst>::lambda<const int>::value 
   << std::endl;

Such logical combinators are essential for predicate composability.

Higher-Order Functions Operating on Lists

Once you combine higher-order functions with lists you have a powerful functional language at your disposal. Higher-order functions operating on lists look very much like algorithms. Let me show you some classic examples. Here’s the function (or algorithm), all, that returns true if and only if all elements of a list satisfy a given predicate.

all pred [] = True
all pred (head:tail) = (pred head) && (all pred tail)

By now you should be familiar with all the techniques I used here, like pattern matching or list recursion.

Here’s the same code obfuscated by the C++ syntax:

template<template<class> class predicate, class... list> struct
all;

template<template<class> class predicate> struct
all<predicate> {
    static const bool value = true;
};

template<
    template<class> class predicate, 
    class head, 
    class... tail> struct
all<predicate, head, tail...> {
    static const bool value = 
        predicate<head>::value 
        && all<predicate, tail...>::value;
};

Except for the initial declaration required by C++ there is a one-to-one paradigm match between the two implementations.

Another useful algorithm, a veritable workhorse of functional programming, is “fold right” (together with it’s dual partner, “fold left”). It folds a list while accumulating the results (that’s why in runtime C++ this algorithm is called “accumulate”). Here’s the Haskell implementation:

foldr f init [] = init
foldr f init (head:tail) =
    f head (foldr f init tail)

Function f, which is the first argument to foldr, takes two arguments, the current element of the list and the accumulated value. Its purpose is to process the element and incorporate the result in the accumulator. The new accumulated value is then returned. It is totally up to the client to decide what kind of processing to perform, how it is accumulated, and what kind of value is used. The second argument, init, is the initial value for the accumulator.

Here’s how it works: The result of foldr is generated by acting with f on the head of the list and whatever has been accumulated by processing the tail of the list. The algorithm recurses until the tail is empty, in which case it returns the initial value. At runtime this type of algorithm would make N recursive calls before starting to pop the stack and accumulate the results.

For instance, foldr may be used to sum the elements of a list (so_far is the accumulator, which is initialized to zero):

add_it elem so_far = elem + so_far
sum_it lst = foldr add_it 0 lst

The accumulator function is add_it. If, instead, I wanted to calculate the product of all elements, I’d use a function mult_it and the starting value of one. You get the idea.

Here’s the same algorithm in C++ TMP:

template<template<class, int> class, int, class...> struct
fold_right;

template<template<class, int> class f, int init> struct
fold_right<f, init> {
    static const int value = init;
};

template<template<class, int> class f, int init, class head, class...tail> struct
fold_right<f, init, head, tail...> {
    static const int value = f<head, fold_right<f, init, tail...>::value>::value;
};

Once you understand the Haskell version, this complex code suddenly becomes transparent (if it doesn’t, try squinting 😉 ).

Lists of Numbers

Let’s now switch to integers for a moment. Haskell defines a function sum that adds all elements of a list:

sum [] = 0
sum (head:tail) = head + (sum tail)

We can do the same in C++ TMP (in five times as many lines of code):

template<int...> struct
sum;

template<> struct
sum<> {
    static const int value = 0;
};

template<int i, int... tail> struct
sum<i, tail...> {
    static const int value = i + sum<tail...>::value;
};

List Comprehension

Haskell has one more trick up its sleeve for operating on lists without explicit recursion. It’s called list comprehension. It’s a way of defining new lists based on existing lists. The nomenclature and notation are borrowed from Set Theory, where you often encounter definitions such as: S is a set of elements, where… Let’s look at a simple Haskell example:

[x * x | x <- [3, 4, 5]]

This is a set (list) of elements x * x, where x is from the list [3, 4, 5].

Remember our recursive definition of count? Using list comprehension it’s reduced to a one-liner:

count lst = sum [1 | x <- lst]

Here, we create a list of ones, one for each element of the list. Our result is the sum of those ones. To make this definition more amenable to translation into C++, let’s define an auxiliary function one that, for any argument x, returns 1.

one x = 1

Here’s the modified definition of count:

count lst = sum [one x | x <- lst]

Now we are ready to convert this code to C++ TMP:

template<class T> struct
one {
    static const int value = 1;
};

template<class... lst> struct
count {
    static const int value = sum<one<lst>::value...>::value;
};

Here our list is stored in a template parameter pack, lst. If we wanted to expand this pack, we’d use the notation lst…, but that’s not what’s happening here. The ellipsis appears after the pattern containing the pack:

one<lst>::value...

Compare this with the equivalent Haskell:

[one x | x <- lst]

In C++, when the ellipsis follows a pattern that contains a pack, it’s not the pack that’s expanded, but the whole pattern is repeated for each element of the pack. Here, if our list were <int, char, void*>, the pattern would be expanded to:

<one<int>::value, one<char>::value, one<void*>::value>

The subsequent call to sum would be made with those arguments.

Notice that a different positioning of the ellipsis would result in a completely different expansion. This pattern:

one<lst...>::value

would result in the call to one with the list of types, which would be an error.

Here’s another example of pattern expansion: a function that counts the number of pointers in a list of types:

template<class... lst> struct
countPtrs {
    static const int value = sum<isPtr<lst>::value ...>::value;
};

In this case the pattern is:

isPtr<lst>::value ...

and it expands into a list of Booleans. (I’m taking advantage of the fact that false is zero and true is one, when converted to integers.)

You may find a more complex practical example in the Gregor, Järvi, and Powell paper (see bibliography).

Continuations

List comprehension can be used to define some very useful higher-order functions. One of such functions is map, which takes a list and applies a unary function to each element, resulting in a new list. You might be familiar with the runtime implementation of this algorithm in C++ STL under the name of transform. This is what map looks like in Haskell:

map f lst = [f x | x <- lst]

Here, f is the unary function and lst is the input list. You have to admire the terseness and elegance of this notation.

The first impulse would be to translate it into C++ TMP as:

template<template<class> class f, class... lst> struct
map {
    typedef f<lst>... type;
};

This is surprisingly terse too. The problem is that it doesn’t compile. As far as I know there is no way for a template to “return” a variable list of elements. In my opinion, this is a major language design flaw, but that’s just me.

There are several workarounds, none of them too exciting. One is to define a separate entity called a typelist along the lines of:

template struct
typelist<hd, tl...> {
    typedef hd head;
    typedef typelist<tl...> tail;
};

(As a matter of fact I have implemented typelists and related algorithms both in C++ and D.)

Another approach is to use continuations. Template parameter packs cannot be returned, but they can be passed to variadic templates (after expansion). So how about defining an algorithm like map to take one additional function that would consume the list that is the result of mapping? Such a function is often called a continuation, since it continues the calculation where normally one would return the result. First, let’s do it in Haskell:

map_cont cont f lst = cont [f x | x <- lst]

The function map_cont is just like map except that it takes a continuation, cont, and applies it to the result of mapping. We can test it by defining yet another implementation of count:

count_cont lst = map_cont sum one lst

The continuation here is the function sum that will be applied to the list produced by acting with function one on the list lst. Since this is quite a handful, let me rewrite it in a more familiar notation of runtime C++:

int map_cont(int (*cont)(list), int (*f)(int), list lst) {
    list tmp;
    for (auto it = lst.begin(); it != lst.end(); ++it)
        tmp.push_front(f(*it));
    return cont(tmp);
}

Now for the same thing in compile-time C++:

template<template<class...> class cont, 
         template<class> class f,
         class... lst> struct
map_cont {
    static const int value =
        cont<typename f<lst>::type ...>::value;
};

It’s a one-to-one mapping of Haskell code–and it has very little in common with the iterative runtime C++ implementation. Also notice how loose the typing is in the TMP version as compared with the runtime version. The continuation is declared as a variadic template taking types, function f is declared as taking a type, and the list is a variadic list of types. Nothing is said about return types, except for the constraints that f returns a type (because cont consumes a list of types) and cont returns an int. Actually, this last constraint can be relaxed if we turn integers into types–a standard trick (hack?) in TMP:

template<int n> struct
Int {
    static const int value = n;
};

Loose typing–or “kinding,” as it is called for types of types–is an essential part of compile-time programming. In fact the popularity of the above trick shows that C++ kinding might be already too strong.

The D Digression

I’m grateful to Andrei Alexandrescu for reviewing this post. Since he objected to the sentence, “I’m disappointed that the D programming language followed the same path as C++ rather than lead the way,” I feel compelled to support my view with at least one example. Consider various implementations of all.

In Haskell, beside the terse and elegant version I showed before:

all pred [] = True
all pred (head:tail) = (pred head) && (all pred tail)

there is also a slightly more verbose one:

all pred list =
    if null list then True
    else pred (head list) && all pred (tail list)

which translates better into D. The D version (taken from its standard library, Phobos) is not as short as Haskell’s, but follows the same functional paradigm:

template allSatisfy(alias F, T...) {
    static if (T.length == 1)
    {
        alias F!(T[0]) allSatisfy;
    }
    else
    {
        enum bool allSatisfy = F!(T[0]) && allSatisfy!(F, T[1 .. $]);
    }
}

It definitely beats C++, there’s no doubt about it. There are some oddities about it though. Notice that the type tuple, T… gets the standard list treatment, but the split into the head and tail follows the array/slice notation. The head is T[0] and the tail is an array slice, T[1..$]. Instead of using value for the return value, D uses the “eponymous hack,” as I call it. The template “returns” the value using its own name. It’s a hack because it breaks down if you want to define the equivalent of “local variable” inside a template. For instance, the following code doesn’t compile:

template allSatisfy(alias F, T...) {
    static if (T.length == 1)
    {
        alias F!(T[0]) allSatisfy;
    }
    else
    {
        private enum bool tailResult = allSatisfy!(F, T[1..$]);
        enum bool allSatisfy = F!(T[0]) && tailResult;
    }
}

This breaks one of the fundamental property of any language: decomposability. You want to be able to decompose your calculation into smaller chunks and then combine them together. Of course, you may still use decomposition if you don’t use the eponymous hack and just call your return value value. But that means modifying all the calling sites.

By the way, this is how decomposition works in Haskell:

all2 pred [] = True;
all2 pred (head:tail) = (pred head) && tailResult
    where tailResult = all2 pred tail

Anyway, the important point is that there is no reason to force functional programming paradigm at compile-time. The following hypothetical syntax would be much easier for programmers who are used to imperative programming:

template allSatisfy(alias Pred, T...) {
    foreach(t; T)
        if (!Pred!(t))
            return false;
    return true;
}

In fact it’s much closer to the parameterized runtime function proposed by Andrei:

bool all(alias pred, Range)(Range r) {
    foreach (e; r) 
        if (!pred(e)) 
            return false;
    return true;
}

This is why I’m disappointed that the D programming language followed the same path as C++ rather than lead the way.

Conclusions

I have argued that some familiarity with Haskell may be really helpful in understanding and designing templates in C++. You might ask why C++ chose such horrible syntax to do compile-time functional programming. Well, it didn’t. The ability to do compile-time calculations in C++ was discovered rather than built into the language. It was a very fruitful discovery, as the subsequent developments, especially the implementation of the Boost MPL, have shown. However, once the functional paradigm and its weird syntax took root in C++ TMP, it stayed there forever. I’m aware of only one effort to rationalize C++ TMP by Daveed Vandevoorde in Reflective Metaprogramming in C++.

I have tested all code in this blog using the Hugs interpreter for Haskell and the GNU C++ compiler v. 4.4.1 with the special switch -std=c++0x. The names I used in the blog might conflict with the standard definitions. For instance, Hugs defines its own map and foldr.

If you want to learn more about template metaprogramming, I recommend two books:

Andrei Alexandrescu, Modern C++ Design
David Abrahams and Aleksey Gurtvoy, C++ Template Metaprogramming

The reference I used for variadic templates was the paper by Douglas Gregor, Jaakko Järvi, and Gary Powell

September 8, 2009

Template Metaprogramming Made Easy (Huh?)

Posted by Bartosz Milewski under C++, D Programming Language, Functional Programming, Programming
[22] Comments

“I’ve been doing some template metaprogramming lately,” he said nonchallantly.

Why is it funny? Because template metaprogramming is considered really hard. I mean, über-guru-level hard. I’m lucky to be friends with two such gurus, Andrei Alexandrescu who wrote the seminal “Modern C++ Programming,” and Eric Niebler, who implemented the Xpressive library for Boost; so I know the horrors.

But why is template metaprogramming so hard? Big part of it is that C++ templates are rather ill suited for metaprogramming, to put it mildly. They are fine for simple tasks like parameterized containers and some generic algorithms, but not for operations on types or lists of types. To make things worse, C++ doesn’t provide a lot in terms of reflection, so even such simple tasks like deciding whether a given type is a pointer are hard (see the example later). Granted, C++0x offers some improvements, like template parameter packs; but guru-level creds are still required.

So, I’ve been doing some template metaprogramming lately… in D. The D programming language doesn’t have the baggage of compatibility with previous botched attempts, so it makes many things considerably easier on programmers. But before I get to it, I’d like to talk a little about the connection between generic programming and functional programming, give a short intro to functional programming; and then show some examples in C++ and D that involve pattern matching and type lists.

It’s not your father’s language

The key to understanding metaprogramming is to realize that it’s done in a different language than the rest of your program. Both in C++ and D you use a form of functional language for that purpose. First of all, no mutation! If you pass a list of types to a template, it won’t be able to append another type to it. It will have to create a completely new list using the old list and the new type as raw materials.

Frankly, I don’t know why mutation should be disallowed at compile time (all template calculations are done at compile time). In fact, for templates that are used in D mixins, I proposed not to invent a new language but to use a subset of D that included mutation. It worked just fine and made mixins much easier to use (for an example, see my DrDobbs article).

Once you disallow mutation, you’re pretty much stuck with functional paradigm. For instance, you can’t have loops, which require a mutable loop counter or some other mutable state, so you have to use recursion.

You’d think functional programmers would love template metaprogramming; except that they flip over horrendous syntax of C++ templates. The one thing going for functional programming is that it’s easy to define and implement. You can describe typeless lambda calculus with just a few formulas in operational semantics.

One thing is important though: meta-language can’t be strongly typed, because a strongly typed language requires another language to implement generic algorithms on top of it. So to terminate the succession of meta-meta-meta… languages there’s a need for either a typeless, or at least dynamically-typed, top-level meta-language. My suspicion is that C++0x concepts failed so miserably because they dragged the metalanguage in the direction of strong typing. The nails in the coffin for C++ concepts were concept maps, the moral equivalent of implicit conversions in strongly-typed languages.

Templates are still not totally typeless. They distinguish between type arguments (introduced by typename or class in C++), template template arguments, and typed template arguments. Here’s an example that shows all three kinds:

template<class T, template<class X> class F, int n>

Functional Programming in a Nutshell

“Functions operating on functions”–that’s the gist of functional programming. The rest is syntactic sugar. Some of this sugar is very important. For instance, you want to have built-in integers and lists for data types, and pattern matching for dispatching.

-Functions

Here’s a very simple compile-time function in the C++ template language:

template<class T> 
struct IsPtr {
    static const bool apply = false;
}

If it doesn’t look much like a function to you, here it is in more normal ad-hoc notation:

IsPtr(T) {
    return false;
}

You can “execute” or “call” this meta-function by instantiating the template IsPtr with a type argument and accessing its member apply:

IsPtr<int>::apply;

There is nothing magical about “apply”, you can call it anything (“result” or “value” are other popular identifiers). This particular meta-function returns a Boolean, but any compile-time constant may be returned. What’s more important, any type or a template may be returned. But let’s not get ahead of ourselves.

-Pattern matching

You might be wondering what the use is for a function (I’ll be dropping the “meta-” prefix in what follows) that always returns false and is called IsPtr. Enter the next weapon in the arsenal of functional programmers: pattern matching. What we need here is to be able to match function arguments to different patterns and execute different code depending on the match. In particular, we’d like to return a different value, true, for T matching the pattern T*. In the C++ metalanguage this is done by partial template specialization. It’s enough to define another template of the same name that matches a more specialized pattern, T*:

template<class T>
struct IsPtr<T*> {
    static const bool apply = true;
}

When faced with the call,

IsPtr<int*>::apply

the compiler will first look for specializations of the template IsPtr, starting with the most specialized one. In our case, the argument int* matches the pattern T* so the version returning true will be instantiated. Accessing the apply member of this instantiation will result in the Boolean value true, which is exactly what we wanted. Let me rewrite this example using less obfuscated syntax.

IsPtr(T*) {
    return true;
}
IsPtr(T) { // default case
    return false;
}

D template syntax is slightly less complex than that of C++. The above example will read:

template IsPtr(T) {
    static if (is (T dummy: U*, U))
        enum IsPtr = true;
    else
        enum IsPtr = false;
}
// Compile-time tests
static assert( IsPtr!(int*) );
static assert( !IsPtr!(int) );

As you can see, D offers compile-time if statements and more general pattern matching. The syntax of pattern matching is not as clear as it could be (what’s with the dummy?), but it’s more flexible. Compile-time constants are declared as enums.

There is one little trick (a hack?) that makes the syntax of “function call” a little cleaner. If, inside the template, you define a member of the same name as the template itself (I call it the “eponymous” member) than you don’t have to use the “apply” syntax. The “call” looks more like a call, except for the exclamation mark before the argument list (a D tradeoff for not using angle brackets). You’ll see later how the eponymous trick fails for more complex cases.

-Lists

The fundamental data structure in all functional languages is a list. Lists are very easy to operate upon using recursive algorithms and, as it turns out, they can be used to define arbitrarily complex data structures. No wonder C++0x felt obliged to introduce a compile-time type list as a primitive. It’s called a template parameter pack and the new syntax is:

template<class... T>Foo

You can instantiate such a template with zero arguments,

Foo<>

one argument,

Foo<int>

or more arguments,

Foo<int, char*, void*>

How do you iterate over a type list? Well, there is no iteration in the metalanguge so the best you can do is to use recursion. To do that, you have to be able to separate the head of the list from its tail. Then you perform the action on the head and call yourself recursively with the tail. The head/tail separation is done using pattern matching.

Let me demonstrate a simple example from the paper Variadic Templates by Garry Powell et al. It calculates the length of a pack using recursion. First, the basic case–length-zero list:

template<>
struct count <> {
    static const int value = 0;
}

That is the full specialization of a template, so it will be tried first. Here’s the general case:

template<typename Head, typename... Tail>
struct count<Head, Tail...> {
    static const int value = 1 + count<Tail...>::value;
}

Let’s see what it would look like in “normal” notation:

count() {
    return 0;
}
count(head, tail) {
    return 1 + count(tail);
}

And here’s the D version:

template count(T...) {
    static if (T.length == 0)
        enum count = 0;
    else
        enum count = 1 + count!(T[1..$]);
}
// tests
static assert( count!() == 0);
static assert( count!(int, char*, char[]) == 3);

T… denotes a type tuple, which supports array-like access. To get to the tail of the list, D uses array slicing, where T[1..$] denotes the slice of the array starting from index 1 up to the length of the array (denoted by the dollar sign). I’ll explain the important differences between C++ pack and D tuple (including pack expansion) in the next installment.

Conclusion

When looked upon from the functional perspective, template metaprogramming doesn’t look as intimidating as it it seems at first. Knowing this interpretation makes you wonder if there isn’t a better syntax or even a better paradigm for metaprogramming.

I’ll discuss more interesting parts of template metaprogramming in the next installment (this one is getting too big already). In particular, I’ll show examples of higher order meta-functions like Filter or Not and some interesting tricks with type lists.

August 19, 2009

The Anatomy of Reference Counting

Posted by Bartosz Milewski under C++, Concurrency, D Programming Language, Multithreading, Programming
[23] Comments

What is there to reference counting that is not obvious? In any language that supports deterministic destruction and the overloading of the copy constructor and the assignment operator it should be trivial. Or so I though until I decided to implement a simple ref-counted thread handle in D. Two problems popped up:

How does reference counting interact with garbage collection?
How to avoid data races in a multithreaded environment?

In purely garbage-collected languages, like Java, you don’t implement reference counting, period. Which is pretty bad, if you ask me. GC is great at managing memory, but not so good at managing other resources. When a program runs out of memory, it forces a collection and reclaims unused memory. When it runs out of, say, system thread handles, it doesn’t reclaim unused handles–it just dies. You can’t use GC to manage system handles. So, as far as system resources go, Java forces the programmer to use the moral equivalent of C’s malloc and free. The programmer must free the resources explicitly.

In C++ you have std::shared_ptr for all your reference-counting needs, but you don’t have garbage collection for memory–at least not yet. (There is also the Microsoft’s C++/CLI which mixes the two systems.)

D offers the best of both worlds: GC and deterministic destruction. So let’s use GC to manage memory and reference counting (or other policies, like uniqueness) to manage other limited resources.

First attempt

The key to reference counting is to have a “value” type, like the shared_ptr in C++, that can be cloned and passed between functions at will. Internally this value type must have access to a shared chunk of memory that contains the reference count. In shared_ptr this chunk is a separately allocated integral type–the counter. The important thing is that all clones of the same resource share the same counter. The counter’s value reflects how many clones there are. Copy constructors and assignment operators take care of keeping the count exact. When the count goes to zero, that is the last copy of the resource goes out of scope, the resource is automatically freed (for instance, by calling CloseHandle).

In my first attempt, I decided that the memory allocated for the counter should be garbage-collected. After all, the important thing is to release the handle–the memory will take care of itself.

struct RcHandle {
   shared Counter _counter; // GC'd shared Counter object
   HANDLE _h;
   ~this() { // destructor
      if (_counter.dec() == 0) // access the counter
         CloseHandle(_h);
   }
   // methods that keep the counter up to date
}

RcHandle is a struct, which is a value type in D. Counter is a class, which is a reference type; so _counter really hides a pointer to shared memory.

A few tests later I got a nasty surprise. My program faulted while trying to access an already deallocated counter. How did that happen? How could garbage collector deallocate my counter if I still had a reference to it?

Here’s what I did in my test: I embedded the ref-counted handle inside another garbage-collected object:

class Embedder { // GC'd class object
   RcHandle _rc;
}

When the time came to collect that object (which happened after the program ran to completion), its finalizer was called. Whenever an object contains fields that have non-trivial destructors, the compiler generates a finalizer that calls the destructors of those embedded objects–_rc in this case. The destructor of the ref-counted handle checks the reference count stored in the counter. Unfortunately the counter didn’t exist anymore. Hours of debugging later I had the answer.

What happened is that the garbage collector had two objects on its list: the embedder and the counter. It just so happened that the collector decided to collect those two objects in reverse order: the counter first, then the embedding object. So, by the time it got to the finalizer of the embedding object, the counter was gone!

What I discovered (with the help of other members of the D team who were involved in the discussion) was that there are some limitations on mixing garbage collection with deterministic destruction. There is a general rule:

An object’s destructor must not access any garbage-collected objects embedded in it.

Since the destructor of the ref-counted handle must have access to the counter, the counter must not be garbage-collectible. That means only one thing: it has to be allocated using malloc and explicitly deallocated using free. Which brings us to the second problem.

Concurrency

What can be simpler than an atomic reference count? On most processors you can atomically increment and decrement a memory location. You can even decrement and test the value in one uninterruptible operation. Problem solved! Or is it?

There is one tiny complication–the location that you are atomically modifying might disappear. I know, this is totally counter-intuitive. After all the management of the counter follows the simple rule: the last to leave the room turns off the light. If the destructor of RcHandle sees the reference count going from one to zero, it knows that no other RcHandle has access it, and it can safely free the counter. Who can argue with cold logic?

Here’s the troubling scenario: RcHandle is embedded in an object that is visible from two threads:

class Embedder {
   RcHandle _rcH;
}
shared Embedder emb;

Thread 1 tries to overwrite the handle:

RcHandle myHandle1;
emb._rcH = myHandle1;

while Thread 2 tries to copy the same handle to a local variable:

RcHandle myHandle2 = emb._rcH;

Consider the following interleaving:

T2: Load the address of the _counter embedded in _rcH.
T1: Swap emb._rcH with myHandle
T1: Decrement the counter that was embedded in _rcH. If it’s zero (and it is, in this example), free the counter.
T2: Increment the _counter. Oops! This memory location has just been freed.

The snag is that there is a window between T2 reading the pointer in (1), and incrementing the location it’s pointing to in (4). Within that window, the reference count does not match the actual number of clients having access to the pointer. If T1 happens to do its ugly deed of freeing the counter within that window, the race may turn out deadly. (This problem has been known for some time and there were various proposals to fix it, for instance using DCAS, as in this paper on Lock-Free Reference Counting.)

Should we worry? After all the C++ shared_ptr also exposes this race and nobody is crying havoc. It turns out that it all boils down to the responsibilities of the shared object (and I’m grateful to Andrei for pointing it out).

A shared object should not willy-nilly expose its implementation to clients

If the clients of Embedder want access to the handle, they should call a synchronized method. Here’s the correct, race-free implementation of the Embedder in the scenario I just described.

class Embedder {
private:
   RcHandle _rcH;
public:
   synchronized RcHandle GetHandle() const { return _rcH; }
   synchronized void SetHandle(RcHandle h) { _rcH = h; }
   ...
}

The method GetHandle copies _rcH and increments its count under the Embedder‘s lock. Another thread calling SetHandle has no chance of interleaving with that action, because it is forced to use the same lock. D2 actually enforces this kind of protection for shared objects, so my original example wouldn’t even compile.

You might be thinking right now that all this is baloney because I’m trying to fit a square peg into a round hole. I’m imposing value semantics on a non-atomic object, and you cannot atomically overwrite a non-atomic object. However, using this logic, you could convince yourself that a double cannot have value semantics (on most common architectures doubles are too large to be atomic). And yet you can safely pass doubles between threads and assign them to each other (which is the same as overwriting). It’s only when you embed a double inside a shared object, you have to protect it from concurrent access. And it’s not the double protecting itself–it’s the shared embedder that is responsible for synchronization. It’s exactly the same with RcHandle.

Conclusions

Just when you thought you knew everything about reference counting you discover something you haven’t though about. The take-home message is that mixing garbage collection with deterministic destruction is not a trivial matter, and that a race-free reference-counted object is vulnerable to races when embedded it in another shared object. Something to keep in mind when programming in D or in C++.

July 16, 2009

On Actors and Casting

Posted by Bartosz Milewski under C++, Concurrency, D Programming Language, Erlang, Java, Multithreading, Programming, Scala, Type System
[31] Comments

Is the Actor model just another name for message passing between threads? In other words, can you consider a Java Thread object with a message queue an Actor? Or is there more to the Actor model? Bartosz investigates.

I’ll start with listing various properties that define the Actor Model. I will discuss implementation options in several languages.

Concurrency

Actors are objects that execute concurrently. Well, sort of. Erlang, for instance, is not an object-oriented language, so we can’t really talk about “objects”. An actor in Erlang is represented by a thing called a Process ID (Pid). But that’s nitpicking. The second part of the statement is more interesting. Strictly speaking, an actor may execute concurrently but at times it will not. For instance, in Scala, actor code may be executed by the calling thread.

Caveats aside, it’s convenient to think of actors as objects with a thread inside.

Message Passing

Actors communicate through message passing. Actors don’t communicate using shared memory (or at least pretend not to). The only way data may be passed between actors is through messages.

Erlang has a primitive send operation denoted by the exclamation mark. To send a message Msg to the process (actor) Pid you write:

Pid ! Msg

The message is copied to the address space of the receiver, so there is no sharing.

If you were to imitate this mechanism in Java, you would create a Thread object with a mailbox (a concurrent message queue), with no public methods other than put and get for passing messages. Enforcing copy semantics in Java is impossible so, strictly speaking, mailboxes should only store built-in types. Note that passing a Java Strings is okay, since strings are immutable.

-Typed messages

Here’s the first conundrum: in Java, as in any statically typed language, messages have to be typed. If you want to process more than one type of messages, it’s not enough to have just one mailbox per actor. In Erlang, which is dynamically typed, one canonical mailbox per actor suffices. In Java, mailboxes have to be abstracted from actors. So an actor may have one mailbox for accepting strings, another for integers, etc. You build actors from those smaller blocks.

But having multiple mailboxes creates another problem: How to block, waiting for messages from more than one mailbox at a time without breaking the encapsulation? And when one of the mailboxes fires, how to retrieve the correct type of a message from the appropriate mailbox? I’ll describe a few approaches.

-Pattern matching

Scala, which is also a statically typed language, uses the power of functional programming to to solve the typed messages problem. The receive statement uses pattern matching, which can match different types. It looks like a switch statements whose case labels are patterns. A pattern may specify the type it expects. You may send a string, or an integer, or a more complex data structure to an actor. A single receive statement inside the actor code may match any of those.

receive {
    case s: String => println("string: "+ s)
    case i: Int => println("integer: "+ i)
    case m => println("unknown: "+ m)
}

In Scala the type of a variable is specified after the colon, so s:String declares the variable s of the type String. The last case is a catch-all.

This is a very elegant solution to a difficult problem of marrying object-oriented programming to functional programming–a task at which Scala exceeds.

-Casting

Of course, we always have the option of escaping the type system. A mailbox could be just a queue of Objects. When a message is received, the actor could try casting it to each of the expected types in turn or use reflection to find out the type of the message. Here’s what Martin Odersky, the creator of Scala, has to say about it:

The most direct (some would say: crudest) form of decomposition uses the type-test and type-cast instructions available in Java and many other languages.

In the paper he co-authored with Emir and Williams (Matching Objects With Patterns) he gives the following evaluation of this method:

Evaluation: Type-tests and type-casts require zero overhead for the class hierarchy. The pattern matching itself is very verbose, for both shallow and deep patterns. In particular, every match appears as both a type-test and a subsequent type-cast. The scheme raises also the issue that type-casts are potentially unsafe because they can raise ClassCastExceptions. Type-tests and type-casts completely expose representation. They have mixed characteristics with respect to extensibility. On the one hand, one can add new variants without changing the framework (because there is nothing to be done in the framework itself). On the other hand, one cannot invent new patterns over existing variants that use the same syntax as the type-tests and type-casts.

The best one could do in C++ or D is to write generic code that hides casting from the client. Such generic code could use continuations to process messages after they’ve been cast. A continuation is a function that you pass to another function to be executed after that function completes (strictly speaking, a real continuation never returns, so I’m using this word loosely). The above example could be rewritten in C++ as:

void onString(std::string const & s) {
    cout << "string: " << s << std::endl;
}
void onInt(int i) {
    cout << "integer: " << i << std::endl;
}

receive<std::string, int> (&onString, &onInt);

where receive is a variadic template (available in C++0x). It would do the dynamic casting and call the appropriate function to process the result. The syntax is awkward and less flexible than that of Scala, but it works.

The use of lambdas might make things a bit clearer. Here’s an example in D using lambdas (function literals), courtesy Sean Kelly and Jason House:

receive(
    (string s){ writefln("string: %s", s); },
    (int i){ writefln("integer: %s", i); }
);

Interestingly enough, Scala’s receive is a library function with the pattern matching block playing the role of a continuation. Scala has syntactic sugar to make lambdas look like curly-braced blocks of code. Actually, each case statement is interpreted by Scala as a partial function–a function that is not defined for all values (or types) of arguments. The pattern matching part of case becomes the isDefinedAt method of this partial function object, and the code after that becomes its apply method. Of course, partial functions could also be implemented in C++ or D, but with a lot of superfluous awkwardness–lambda notation doesn’t help when partial functions are involved.

-Isolation

Finally, there is the problem of isolation. A message-passing system must be protected from data sharing. As long as the message is a primitive type and is passed by value (or an immutable type passed by reference), there’s no problem. But when you pass a mutable Object as a message, in reality you are passing a reference (a handle) to it. Suddenly your message is shared and may be accessed by more than one thread at a time. You either need additional synchronization outside of the Actor model or risk data races. Languages that are not strictly functional, including Scala, have to deal with this problem. They usually pass this responsibility, conveniently, to the programmer.

-Kilim

Java is not a good language to implement the Actor model. You can extend Java though, and there is one such extension worth mentioning called Kilim by Sriram Srinivasan and Alan Mycroft from Cambridge, UK. Messages in Kilim are restricted to objects with no internal aliasing, which have move semantics. The pre-processor (weaver) checks the structure of messages and generates appropriate Java code for passing them around. I tried to figure out how Kilim deals with waiting on multiple mailboxes, but there isn’t enough documentation available on the Web. The authors mention using the select statement, but never provide any details or examples.

Correction: Sriram was kind enough to provide an example of the use of select:

int n = Mailbox.select(mb0, mb1, .., timeout);

The return value is the index of the mailbox, or -1 for the timeout. Composability is an important feature of the message passing model.

Dynamic Networks

Everything I described so far is common to CSP (Communicating Sequential Processes) and the Actor model. Here’s what makes actors more general:

Connections between actors are dynamic. Unlike processes in CSP, actors may establish communication channels dynamically. They may pass messages containing references to actors (or mailboxes). They can then send messages to those actors. Here’s a Scala example:

receive {
    case (name: String, actor: Actor) =>
        actor ! lookup(name)
}

The original message is a tuple combining a string and an actor object. The receiver sends the result of lookup(name) to the actor it has just learned about. Thus a new communication channel between the receiver and the unknown actor can be established at runtime. (In Kilim the same is possible by passing mailboxes via messages.)

Actors in D

The D programming language with my proposed race-free type system could dramatically improve the safety of message passing. Race-free type system distinguishes between various types of sharing and enforces synchronization when necessary. For instance, since an Actor would be shared between threads, it would have to be declared shared. All objects inside a shared actor, including the mailbox, would automatically inherit the shared property. A shared message queue inside the mailbox could only store value types, unique types with move semantics, or reference types that are either immutable or are monitors (provide their own synchronization). These are exactly the types of messages that may be safely passed between actors. Notice that this is more than is allowed in Erlang (value types only) or Kilim (unique types only), but doesn’t include “dangerous” types that even Scala accepts (not to mention Java or C++).

I will discuss message queues in the next installment.

June 23, 2009

Multithreading Tutorial: Globals

Posted by Bartosz Milewski under C++, Concurrency, D Programming Language, Java, Multithreading, Programming, Type System
[2] Comments

If it weren’t for the multitude of opportunities to shoot yourself in the foot, multithreaded programming would be easy. I’m going to discuss some of these “opportunities” in relation to global variables. I’ll talk about general issues and discuss the ways compilers can detect them. In particular, I’ll show the protections provided by my proposed extensions to the type system.

Global Variables

There are so many ways the sharing of global variables between threads can go wrong, it’s scary.

Let me start with the simplest example: the declaration of a global object of class Foo (in an unspecified language with Java-like syntax).

Foo TheFoo = new Foo;

In C++ or Java, TheFoo would immediately be visible to all threads, even if Foo provided no synchronization whatsoever (strictly speaking Java doesn’t have global variables, but static data members play the same role).

If the programmer doesn’t do anything to protect shared data, the default immediately exposes her to data races.

The D programming language (version 2.0, also known as D2) makes a better choice–global variables are, by default, thread local. That takes away the danger of accidental sharing. If the programmer wants to share a global variable, she has to declare it as such:

shared Foo TheFoo = new shared Foo;

It’s still up to the designer of the class Foo to provide appropriate synchronization.

Currently, the only multithreaded guarantee for shared objects in D2 is the absence of low-level data races on multiprocessors–and even that, only in the safe subset of D. What are low level data races? Those are the races that break some lock-free algorithms, like the infamous Double-Checked Locking Pattern. If I were to explain this to a Java programmer, I’d say that all data members in a shared object are volatile. This property propagates transitively to all objects the current object has access to.

Still, the following implementation of a shared object in D would most likely be incorrect even with the absence of low-level data races:

class Foo {
    private int[] _arr;
    public void append(int i) {
       _arr ~= i; // array append
    }
}

auto TheFoo = new shared Foo;

The problem is that an array in D has two fields: the length and the pointer to a buffer. In shared Foo, each of them would be updated atomically, but the duo would not. So two threads calling TheFoo.append could interleave their updates in an unpredictable way, possibly leading to loss of data.

My race-free type system goes further–it eliminates all data races, both low- and high-level. The same code would work differently in my scheme. When an object is declared shared, all its methods are automatically synchronized. TheFoo.append would take Foo‘s lock and make the whole append operation atomic. (For the advanced programmer who wants to implement lock-free algorithms my scheme offers a special lockfree qualifier, which I’ll describe shortly.)

Now suppose that you were cautious enough to design your Java/D2 class Foo to be thread safe:

class Foo {
    private int [] _arr;
    public synchronized void append(int i) {
       _arr ~= i; // array append
    }
}

Does it mean your global variable, TheFoo, is safe to use? Not in Java. Consider this:

static Foo TheFoo; // static = global
// Thread 1
TheFoo = new Foo();
// Thread 2
while (TheFoo == null)
    continue;
TheFoo.append(1);

You won’t even know what hit you when your program fails. I will direct the reader to one of my older posts that explains the problems of publication safety on a multiprocessor machine. The bottom line is that, in order to make your program work correctly in Java, you have to declare TheFoo as volatile (or final, if you simply want to prevent such usage). Again, it looks like in Java the defaults are stacked against safe multithreading.

This is not a problem in D2, since shared implies volatile.

In my scheme, the default behavior of shared is different. It works like Java’s final. The code that tries to rebind the shared object (re-assign to the handle) would not compile. This is to prevent accidental lock-free programming. (If you haven’t noticed, the code that waits on the handle of TheFoo to switch from null to non-null is lock-free. The handle is not protected by any lock.) Unlike D2, I don’t want to make lock-free programming “easy,” because it isn’t easy. It’s almost like D2 is endorsing lock-free programming by giving the programmer a false sense of security.

So what do you do if you really want to spin on the handle? You declare your object lockfree.

lockfree Foo TheFoo;

lockfree implies shared (it doesn’t make sense otherwise), but it also makes the handle “volatile”. All accesses to it will be made sequentially consistent (on the x86, it means all stores will compile to xchg).

Note that lockfree is shallow–data members of TheFoo don’t inherit the lockfree qualification. Instead, they inherit the implied shared property of TheFoo.

It’s not only object handles that can be made lockfree. Other atomic data types like integers, Booleans, etc., can also be lockfree. A lockfree struct is also possible–it is treated as a tuple whose all elements are lockfree. There is no atomicity guarantee for the whole struct. Methods can be declared lockfree to turn off default synchronization.

Conclusion

Even the simplest case of sharing a global variable between threads is fraught with danger. My proposal inobtrusively eliminates most common traps. The defaults are carefully chosen to let the beginners avoid the pitfalls of multithreaded programming.

June 15, 2009

Race-free Multithreading : Owner polymorphism

Posted by Bartosz Milewski under Concurrency, D Programming Language, Java, Multithreading, Programming, Type System
[4] Comments

In my last post I talked about the proposal for the ownership scheme for multithreaded programs that provides alias control and eliminates data races. The scheme requires the addition of new type qualifiers to the (hypothetical) language. The standard concern is that new type qualifiers introduce code duplication. The classic example is the duplication of getters required by the introduction of the const modifier:

class Foo {
    private Bar _bar;
public:
    Bar get() {
        return _bar;
    }
    const Bar get() const {
        return _bar;
    }
}

Do ownership annotations lead to the same kind of duplication? Fortunately not. It’s true that, in most cases, two implementations of each public method are needed–with and without synchronization–but this is taken care by the compiler, not by the programmer. Unlike in Java, we don’t need a different class for shared Vector and thread-local ArrayList. In my scheme, when a vector is instantiated as a monitor (shared), the compiler automatically puts in the necessary synchronization code.

Need for generic code

The ownership scheme introduces an element of genericity by letting the programmer specify ownership during the instantiation of a class (just as a template parameter is specified during the instantiation of a template).

I already mentioned that most declarations can be expressed in two ways: one using type qualifiers, another using template notation–the latter exposing the generic nature of ownership annotations. For instance, these two forms are equivalent:

auto foo2 = new shared Foo;
auto foo2 = new Foo<owner::self>;

The template form emphasizes the generic nature of ownership annotations.

With the ownership system in place, regular templates parametrized by types also gain an additional degree of genericity since their parameters now include implicit ownership information. This is best seen in objects that don’t own the objects they hold. Most containers have this property (unless they are restricted to storing value types). For instance, a stack object might be thread-local while its elements are either thread-local or shared. Or the stack might be shared, with shared elements, etc. The source code to implement such stacks may be identical.

The polymorphic scheme and the examples are based on the GRFJ paper I discussed in a past post.

An example

– Stack

A parameterized stack might look like this :

class Stack<T> {
    private Node<T> _top;
public:
    void push(T value) {
        auto newNode = new Node<owner::this, T>;
        newNode.init(:=value, _top);
        _top = newNode;
    }
    T pop() {
        if (_top is null) return null;
        auto value := _top.value();
        _top = _top.next();
        return value;
    }
}

This stack is parametrized by the type T. This time, however, T includes ownership information. In particular T could be shared, unique, immutable or–the default–thread-local. The template picks up whatever qualifier is specified during instantiation, as in:

auto stack = new Stack<unique Foo>;

The _top (the head of the linked list) is of the type Node, which is parametrized by the type T. What is implicit in this declaration is that _top is owned by this–the default assignment of ownership for subobjects. If you want, you can make this more explicit:

private Node<owner::this, T> _top;

Notice that, when constructing a new Node, I had to specify the owner explicitly as this. The default would be thread-local and could leak thread-local aliases in the constructor. It is technically possible that the owner::this could, at least in this case, be inferred by the compiler through simple flow analysis.

Let’s have a closer look at the method push, where some interesting things are happening. First push creates a new node, which is later assigned to _top. The compiler cannot be sure that the Node constructor or the init method don’t leak aliases. That looks bad at first glance because, if an alias to newNode were leaked, that would lead to the leakage of _top as well (after a pop).

And here’s the clincher: Because newNode was declared with the correct owner–the stack itself–it can’t leak an alias that has a different owner. So anybody who tries to access the (correctly typed) leaked alias would have to hold the lock on the stack. Which means that, if the stack is shared, unsynchronized access to any of the nodes and their aliases is impossible. Which means no races on Nodes.

I also used the move operator := to move the values stored on the stack. That will make the stack usable with unique types. (For non-unique types, the move operator turns into regular assignment.)

I can now instantiate various stacks with different combinations of ownerships. The simplest one is:

auto localStack = new Stack<Foo>;

which is thread-local and stores thread-local objects of class Foo. There are no restrictions on Foo.

A more interesting combination is:

auto localStackOfMonitors = new Stack<shared Foo>;

This is a thread-local stack which stores monitor objects (the opposite is illegal though, as I’ll explain in a moment).

There is also a primitive multithreaded message queue:

auto msgQueue = new shared Stack<shared Foo>;

Notice that code that would try to push a thread-local object on the localStackOfMonitors or the msgQueue would not compile. We need the rich type system to be able to express such subtleties.

Other interesting combinations are:

auto stackOfImmutable = new shared Stack<immutable Foo>;
auto stackOfUnique = new shared Stack<unique Foo>;

The latter is possible because I used move operators in the body of Stack.

– Node

Now I’ll show you the fully parameterized definition of Node. I made all ownership annotations explicit for explanatory purposes. Later I’ll argue later that all of them could be elided.

class Node<T> {
private:
    T _value;
    Node<owner::of_this, T> _next;
public:
    void init(T v, Node<owner::of_this, T> next)
    {
        _value := v;
        _next = next;
    }
    T value() {
        return :=_value;
    }
    Node<owner::of_this, T> next() {
        return _next;
    }
}

Notice the declaration of _next: I specified that it must be owned by the owner of the current object, owner::of_this. In our case, the current object is a node and its owner is an instance of the Stack (let’s assume it’s the self-owned msgQueue).

This is the most logical assignment of ownership: all nodes are owned by the same stack object. That means no ownership conversions need be done, for instance, in the implementation of pop. In this assignment:

_top = _top.next();

the owner of _top is msgQueue, and so is the owner of its _next object. The types match exactly. I drew the ownership tree below. The fat arrows point at owners.
Ownership hierarchy using owner::of_this

But that’s not the only possibility. The default–that is _next being owned by the current node–would work too. The corresponding ownership tree is shown below.

Ownership hierarchy using owner::this

The left-hand side of the assignment

_top = _top.next();

is still owned by msgQueue. But the _next object inside the _top is not. It is owned by the _top node itself. These are two different owners so, during the assignment, the compiler has to do an implicit ownership conversion. Such conversion is only safe if both owners belong to the same ownership tree (sometimes called a “region”). Indeed, ownership is only needed for correct locking, and the locks always reside at the top of the tree (msgQueue in this case). So, after all, we don’t need to annotate _next with the ownership qualifier.

The two other annotations can be inferred by the compiler (there are some elements of type inference even in C++0x and D). The argument next to the method init must be either owned by this or be convertible to owner::this because of the assignment

_next = next;

Similarly, the return from the method next is implicitly owned by this (the node). When it’s used in Stack.pop:

_top = _top.next();

the owner conversion is performed.

With ownership inference, the definition of Node simplifies to the following:

class Node<T> {
private:
    T _value;
    Node<T> _next; // by default owned by this
public:
    void init(T v, Node<T> next)
    {
        _value := v;
        _next = next; // inference: owner of next must be this
    }
    T value() {
        return :=_value;
    }
    Node<T> next() {
        return _next; // inference: returned node is owned by this
    }
}

which has no ownership annotations.

Let me stress again a very important point: If init wanted to leak the alias to next, it would have to assign it to a variable of the type Node<owner::this, T>, where this is the current node. The compiler would make sure that such a variable is accessed only by code that locks the root of the ownership tree, msgQueue. This arrangement ensures the absence of races for the nodes of the list.

Another important point is that Node contains _value of type T as a subobject. The compiler will refuse instantiations where Node‘s ownership tree is shared (its root is is self-owned), and T is thread-local. Indeed, such instantiation would lead to races if an alias to _value escaped from Node. Such an alias, being thread-local, would be accessible without locking.

Comment on notation

In general, a template parameter list might contain a mixture of types, type qualifiers, values, (and, in D, aliases). Because of this mixture, I’m using special syntax for ownership qualifiers, owner::x to distinguish them from other kinds of parameters.

As you have seen, a naked ownership qualifier may be specified during instantiation. If it’s the first template argument, it becomes the owner of the object. Class templates don’t specify this parameter, but they have access to it as owner::of_this.

Other uses of qualifier polymorphism

Once qualifier polymorphism is in the language, there is no reason not to allow other qualifiers to take part in polymorphism. For instance, the old problem of having to write separate const versions of accessors can be easily solved:

class Foo {
    private Bar _bar;
    public mut_q Bar get<mutability::mut_q>() mut_q
    {
        return _bar;
    }
}

Here method get is parametrized by the mutability qualifier mut_q. The values it can take are: mutable (the default), const, or immutable. For instance, in

auto immFoo = new immutable Foo;
immutable Bar b = immFoo.get();

the immutable version of get is called. Similarly, in

void f(const Foo foo) {
    const Bar b = foo.get();
}

the const version is called (notice that f may also be called with an immutable object–it will work just fine).

Class methods in Java or D are by default virtual. This is why, in general, non-final class methods cannot be templatized (an infinite number of possible versions of a method would have to be included in the vtable). Type qualifiers are an exception, because there is a finite number of them. It would be okay for the vtable to have three entries for the method get, one for each possible value of the mutability parameter. In this case, however, all three are identical, so the compiler will generate just one entry.

Conclusion

The hard part–explaining the theory and the details of the ownership scheme–is over. I will now switch to a tutorial-style presentation that is much more programmer friendly. You’ll see how simple the scheme really is in practice.

June 2, 2009

Race-free Multithreading: Ownership

Posted by Bartosz Milewski under Concurrency, D Programming Language, Java, Multithreading, Programming, Type System
[18] Comments

Since ownership plays a major role in race-free programming, it will be the first topic in my proposal for a race-free system. I presented the bird’s eye view of the system and provided a few teasers in my previous post. The design is based on published papers (see bibliography at the end). My contribution was to integrate several ideas into one package.

When I showed this proposal to my friends they either didn’t believe it could work or considered it too complex, depending which end they were looking at. From users’ perspective, the system looks relatively simple, so the natural reaction is: That can’t work. If you get into the details of why it works, and how the compiler knows you are in danger of a data race, you need some theory, and that is complex. So I decided to deal with some theory first, to show that the things work. If you’re not into theory, just look at the examples. They are usually simple to understand.

Owners

The ownership relationship is necessary to establish a tree-like structure among objects. This is needed by the compiler to decide which lock, if any, is responsible for the protection of each object, and take it when necessary. Simply speaking, the lock at the root of each tree protects the rest of the tree. If you think that your multithreaded programs don’t follow a tree structure, look at them closely. If they don’t, you either already have data races or are likely to develop them in the future.

-Every object has an owner

The owner may be another object–usually the embedding object. In the example below:

class Foo {
    void doWork() { _bar.doWork(); }
    private Bar _bar;
}

auto foo = new Foo;

the embedded object _bar is owned, at runtime, by the object foo (I repeat, the concrete object, not the class Foo). This is the default ownership relationship for embedded objects, so no special notation is needed to establish it (I’ll show later how to override this default).

There are also special symbolic “owners” that are used for the roots of ownership trees:

thread,
self,
unique, and
immutable.

unique and immutable are included in this list for convenience. I’ll discuss them later.

-Trees

Every object has just one owner for life, a condition necessary to create ownership trees that can be checked at compile time. Every tree has a single root and a lock is attached to that root, if needed.

The ownership information is embedded in the type of the object. Using this information, the compiler is able to deduce which lock must be held while accessing that object, and what kind of aliasing is allowed. All races (access to mutable shared variables without locking) are detected at compile time. I’ll sketch a proof later.

-What may be shared

Only immutable objects or objects rooted with a self-owned object may be shared between threads.

Additionally, objects whose direct owner is self (such objects are called monitors) may have multiple aliases while being shared. Monitors may own (and protect) other objects that are not monitors.

-Locking

The compiler will make sure that access to an object can only happen when the root of its ownership tree is locked (symbolic owners other than self are considered locked at all times). Since an object may only have one lock associated with it (at the top of its ownership tree), this condition is enough to ensure freedom from races.

Proof: I have to show that when a (mutable) object is seen by more than one thread, each access to it (read or write) is always protected by the same lock. Indeed, for an object to be shared between threads, the root of its ownership tree must be self, hence the top object must be a monitor. This monitor’s lock is always, automatically or explicitly, taken before accessing any member of the tree. The compiler knows which lock to take because the ownership information is encoded in the type of the object.

Introducing ownership annotations

Ownership is specified at the instance level (although it may be restricted at the class level). The previous example, which relied on default assignment of owners, is equivalent to the more explicit instance-level specification (that you will never see in actual programs):

Foo<owner::thread> foo = new Foo<owner::thread>;

This declares and constructs foo as being owned by the symbolic owner, thread. The embedded object _bar‘s owner is foo.

-Creating a monitor

A self-owned object is a monitor (I will alternate between the notation using shared type modifier or explicit owner annotation, <owner::self>). It contains a hidden lock and its methods are, by default, synchronized. Continuing with my example:

auto fooMon = new shared Foo;
// The same as:
// auto fooMon = new Foo<owner::self>;
fooMon.doWork();

The variable fooMon is a monitor and the doWork method is implicitly synchronized. The object _bar is now owned by fooMon. Its type can be expressed (this is rarely needed, however see the example of external ownership) as:

Bar<owner::fooMon>

Types parameterized by runtime entities (fooMon is a runtime handle) are known in programming language theory as dependent types.

Notice that I’m using the same class to create thread-local and shared instances. This is usually possible unless there is a specific restriction at the class level.

Note to D programmers: The current semantics of D “shared” is slightly different from my proposal. For instance, it forces all embedded objects to be monitors (their methods must be synchronized by their own lock), requires explicit use of the synchronized keyword, and forces all access in non-synchronized methods to be sequentially consistent. (And it doesn’t guarantee freedom from races.)

Thread-local objects

The special thread owner, which is the owner of all thread-local objects, is conceptually always locked, so thread-local objects don’t require any synchronization. Also, thread is the default owner so, in the absence of any ownership annotations, all objects are thread-local. That’s one of the defaults that makes single-threaded programs work as-is.

Here’s an interesting twist–global and static objects are by default thread-local. This part has been implemented in D, uncovering a number of threading bugs in the process.

Monitors

The special self owner (or the shared type modifier) is used to create monitor objects. A monitor has a built-in lock and all its public methods are by default synchronized.

As always with defaults, the language must provide a (preferably safe) way to bypass them. To prevent locking, a method may be explicitly marked as lockfree. The compiler is obliged to check if the lockfree method doesn’t access the object’s members in a non-safe way (although it can’t prevent high-level races on lockfree variables). That restricts the lockfree constructs to those that don’t require whole-program analysis to prove their safety.

The lockfree annotation is essential for, among others, the double-checked locking pattern (DCLP). I showed its implementation as a teaser in my previous post.

Subobjects

As I explained earlier, data members of an object are by default owned by that object. This way they inherit the root owner from their parent. This is another default that makes single-threaded programs run without additional qualifiers.

Notice that there are two important aspects of ownership, the direct owner and the root owner, which might be different. The direct owner is used in type-checking, the root owner in deciding which synchronization method to use. Both are known or inferred during compilation.

As usual, the defaults may be overridden. For instance, you may embed a monitor in a thread-local object by qualifying it as self-owned/shared:

class Holder {
    private Mon<owner::self> _mon;
}

or, in common notation, as shared:

class Holder {
    private shared Mon _mon;
}

Here, _mon is not owned by Holder (the default has been overridden) so it doesn’t inherit its root owner. Its methods are synchronized by its own lock. As you can see, ownership tree not always reflects embedding. An embedded monitor starts a new tree.

Well, the situation is a bit subtler. Objects in Java or D have reference semantics, so there is a hidden pointer, or handle, in the code above. Accessing the handle is not the same as accessing the object proper. Consider this example:

class Holder {
    private shared Mon _mon;
    public setMon(shared Mon newMon) {
        _mon = newMon;
    }
}

Let’s instantiate a self-owned Holder and a self-owned Mon:

auto holder = new shared Holder;
auto newMon = new shared Mon;
holder.setMon(newMon);

Since holder is itself a monitor, the setMon method is automatically synchronized by its lock (it must be!). Therefore, strictly speaking, the handle part of _mon is owned by holderMon, whereas the object-proper part is self-owned.

You cannot embed a thread-owned object inside a monitor–the compiler would flag it as an error. This is part of alias control–a thread-local object might possibly have thread-local aliases that may be accessed without locking. Being part of a monitor, it could then migrate to another thread and cause a race.

What if a subobject is accessed directly (not through a method)? This may happen when the subobject is declared public:

class Foo {
    public Bar _bar;
}

In that case not all uses of _bar are allowed. Consider this:

auto foo = new shared Foo;
foo._bar.m(); // error

Access to _bar must happen only when foo is locked. The compiler knows it because the full type of _bar is:

Bar<owner::foo>

Here’s the corrected code:

synchronized(foo) {
    foo._bar.m();
}

An even better solution is to make _bar private and provide appropriate methods to access it. Those methods would be automatically synchronized for a shared foo.

unique and immutable

I discussed unique objects in one of my previous posts. Although not strictly required in the ownership scheme, uniqueness allows for very efficient and safe transmission of large objects between threads. It makes sense to include unique as another symbolic root owner, since its multithreaded semantics is different from other types and it doesn’t require locking.

Some languages, including D, define immutable objects, which cannot be modified after creation. Such objects may be freely shared and passed by reference between threads. Again, immutable may be used as a root owner.

Example

With the preliminaries out of the way, I can now explain in more detail the workings of the teaser from my previous post. Here’s the definition of the class MVar:

class MVar<T> {
private:
    T    _msg;
    bool _full;
public:
    void put(T msg) {
        _msg := msg; // move
        _full = true;
        notify();
    }
    T take() {
        while (!_full)
            wait();
        _full = false;
        return := _msg;
    }
}

First, let’s instantiate MVar as a shared (self-owned) monitor that is used to pass unique objects of class Foo as messages:

auto chanUnique = new shared MVar<unique Foo>;

The type of _msg in this instantiation is unique Foo, which is the same as Foo<owner::unique>. The method put takes unique Foo, so the following code is type-correct:

auto foo = new unique Foo;
chanUnique.put(:= foo); // move foo

Notice that unique objects cannot be assigned or passed by value–they have to be moved, hence the use of the move operator, :=. Internally, the method put also uses the move operator (good thinking on the part of the designer–otherwise MVar couldn’t be instantiated with unique). What’s interesting about this example is that messages are not deep-copied between threads. They are safely passed by reference.

Since chanUnique is self-owned (shared), both put and get are automatically synchronized.

Now let’s access chanUnique from another thread:

// another thread
unique Foo f2 = chanUnique.get(); // implicit move of rvalue

The return type of get is unique Foo, so the types check. I could have used the move operator, but since the right hand side is an rvalue, the compiler lets me use the assignment.

Now for the tricky case: What’s wrong with this code?

auto mVar = new shared MVar<Foo>;
auto myFoo = new Foo;
mVar.put(myFoo);
myFoo.unsyncMethod(); // ouch!

Since myFoo is created as thread-local (that’s the default), its methods are not synchronized. If I were able to pass it to shared MVar, another thread could obtain it through get. It could then call the unsynchronized method unsyncMethod at the moment when I was calling it. A data race would be possible! Or would it?

Guess what–the compiler won’t let you shoot yourself in the foot. It will notice that it would have to instantiate a shared object mVar with a thread-local member _msg. That’s against the rules! (A shared object cannot own a thread-local object.)

External ownership

In the original GRFJ paper the authors showed an example where one object was owned by another object without the former being embedded in the latter. They made an observation that, for the purpose of locking, the ownership relationship must be unchangeable: You can’t switch the owner on the fly. Therefore external ownership is allowed only if the owner is declared final.

final shared Lock lockObj = new shared Lock;
auto foo = new Foo<owner::lockObj>;
auto bar = new Bar<owner::lockObj>;

In this case, the compiler will only allow access to foo under the lock of lockObj, as in:

synchronized(lockObj) {
    foo.method();
    bar.method();
}

This construct is useful in situations where the locking discipline is not easily convertible to object hierarchy.

Conclusion

You might have noticed my use of dual notation. Most user code would be written with type qualifiers such as shared, unique, or immutable. However, in some cases I used an alternative notation that looked more like the specification of template parameters: <owner::self>, <owner::unique>, <owner::immutable>, or even <owner::thread> (in D they would be surrounded by !( and )). This was not meant to further confuse the reader, but as a gentle introduction to qualifier polymorphism, which I will describe in the next installment. I will show how classes and methods may be parameterized with different types of ownership, cutting down code duplication.

I’d like to thank Andrei Alexandrescu, Walter Bright, Sean Kelly and Jason House for very helpful comments. I’m also indebted to the D community for discussing my previous posts.

Bibliography

Boyapati, Rinard, A Parameterized Type System for Race-Free Java Programs
C. Flanagan, M. Abadi, Object Types against Races.

May 21, 2009

unique_ptr–How Unique is it?

Posted by Bartosz Milewski under C++, D Programming Language, Multithreading, Programming, Type System
[29] Comments

I’ve been using auto_ptrs before they were available in C++ (I made my own). I wrote several articles about resource management, in which auto_ptr played a prominent role. auto_ptr had always had many flaws, but it was a workhorse of memory management in a language that shuns garbage collection. Many of the flaws have been understood over the years and, in the 0x version of C++, auto_ptr was supplanted by the new improved unique_ptr. Using the latest features of C++, like rvalue references, unique_ptr implements move semantics in a consistent manner. You can now store unique_ptrs in most containers and apply many algorithms to them.

So why am I not jumping for joy? Because I know how much more can be done.

But first let me summarize the idea behind the unique pointer. It is a (smart) pointer that is the only reference to the object it’s pointing to. In particular, you cannot make a copy of a unique pointer–if you could, you’d end up with two references to the same object. You can only move it (hence the move semantics), making the source of the move invalid. With the older auto_ptr the moving was done quietly during assignment or pass-by-value. The problem with that arrangement was that the source auto_ptr would suddenly turn to null, which was sometimes confusing and occasionally led to access violations–as in this example:

void f(auto_ptr<Foo> foo); // pass by value

auto_ptr<Foo> pFoo (new Foo());
f(pFoo); // pass by value nulls the internal pointer
pFoo->method(); // access violation

The improvement provided by unique_ptr is to require an explicit move, to sensitize the programmer to the fact that the source of move becomes invalid. Still, the following code will compile, although the bug is much more prominent:

void f(unique_ptr<Foo> foo);

unique_ptr<Foo> pFoo (new Foo())
f(move(pFoo)); // explicit move
pFoo->method(); // access violation

A bigger problem is that there is no guarantee that a unique_ptr indeed stores the unique reference to an object. To begin with, unique_ptr can be constructed from any pointer. That pointer might have already been aliased, as in:

void f(unique_ptr<Foo> foo) {
    // destructor of foo called at the end of scope
}

Foo * foo = new Foo();
unique_ptr<Foo> upFoo(foo); 
// foo now aliases the contents of upFoo
f(move(upFoo)); // explicit move
foo->method(); // undefined behavior

There is also an obvious backdoor in the form of the method unique_ptr::get, which can be used to spread new aliases. This can be particularly insidious when you have to pass the result of get to a foreign function:

void f(Foo * pf) {
    globalFoo = pf; // creates a global alias
}

unique_ptr<Foo> pFoo(new Foo());
f(pFoo.get()); // leaks an alias

Finally, it’s possible to create aliases to data members of the object stored in unique_ptr, as well as assign aliased references to its data members. Consider this example:

class Foo {
public:
    ~Foo() { delete _bar; }
    Bar * _bar;
};

Bar * pBar = new Bar();
unique_ptr<Foo> upFoo(new Foo());
pFoo->_bar = pBar;
// pBar is now an alias to a member of Foo
upFoo.reset(); // deletes _bar inside foo
pBar->method(); // undefined behavior

All this adds up to quite a number of ways to shoot yourself in the foot! Still, if the only problems were deterministic access violations, I could live with them (I’m a very disciplined programmer). They are reasonably easy to reproduce and can be debugged using standard methods (code coverage). But now there is a new elephant in the room–multithreading.

The beauty of unique objects is that they can be safely passed (moved) between threads and they never require locking. Indeed, since only one thread at a time can reference them, there is no danger of data races. Except when, as in the examples above, aliases are inadvertently leaked from unique_ptr. Accessing an object through such aliases from more than one thread without locking is a very nasty bug, usually very hard to reproduce.

Type system approach

Okay, so what’s the panacea? I’m glad you asked–the type system, of course! Make unique a type qualifier and all your troubles are gone. Notice that the compiler has no idea about the semantics of some library class called unique_ptr, but it can be made aware of the semantics of the unique type qualifier. What’s more, it can enforce this semantics from cradle to grave–from construction to destruction. (I know this proposal has no chance to be incorporated into C++, but it might be useful in, let’s say, the D programming language.)

You don’t construct a unique pointer from an arbitrary pointer. You just construct the object as unique (I’m still using C++-like syntax):

unique Foo * foo = new unique Foo();

Now that the compiler knows the meaning of unique, it can, for instance, detect if a unique pointer is accessed after having been moved. The compiler might prevent such an invalid pointer from being dereferenced or passed to other functions. Such bugs can be detected at compile time, rather than through access violations at runtime.

Do we need separate syntax for move? It might seem that using the assignment syntax (the equal sign) wouldn’t be so bad, as long as the compiler prevents us from dereferencing the null pointer left behind after a move. However, another problem arises when you start instantiating templates with unique pointers. Some containers and algorithms will work out of the box. Some will refuse to be instantiated because internally they try to access unique pointers after the move. But there is a third category of templates that, although move-correct, will generate unexpected results–like vectors with null unique pointers. Unfortunately, there are situations where the compiler has limited vision (i.e., it cannot do whole-program analysis) and can miss a null unique pointer dereference. For instance, it can’t prevent the use of a vector after a random element has been moved out of it.

Let’s assume that we have a special syntax for move, say := . Here’s a simple example of its use:

unique * Foo foo1 = new unique Foo();
unique * Foo foo2 := foo1; // move
foo1->method(); // compile-time error: foo1 invalid after move

In C++0x, a lot of containers and algorithms have been rewritten to use the explicit move instead of the assignment. It wouldn’t be a big deal to use := instead. Notice that the compiler would not allow the use of a regular assignment on unique pointers (unless they are rvalues), so the templates that are not ready for move semantics would refuse to get instantiated with unique template parameters.

Obviously, the move operator applied to a non-unique pointer must also work, otherwise we’d have to write separate templates for unique and non-unique template arguments.

We also have to use our special notation when passing unique arguments to functions, as in:

void f(unique * Foo);

unique * Foo foo = new unique Foo();
f(:= foo); // move
foo->method(); // compile-time error: foo invalid after move

When returning an lvalue unique pointer (for instance a member of a data structure), we use the notation return := foo;. The move operator may be elided when returning a local unique pointer, since the source ceases to exist upon the return (in general, move is implied when the source is a unique rvalue).

The biggest strength of the type-based approach is that the compiler can reliably prevent the aliasing of unique pointers. An assignment (as opposed to move) of an (lvalue) unique pointer to either unique or non-unique pointer is impossible. Taking the address of a unique pointer is forbidden too.

That leaves us with an interesting dilemma: How do you call a (library) function that takes a regular pointer if all you have at your disposal is a unique pointer? C++ unique_ptr let’s you do it through its method get. But we know how dangerous get is as far as aliasing goes.

If you don’t have the source code for the function you are calling, you probably shouldn’t be calling it with a unique pointer, because you have no idea what this function will do with it (store an alias in a global variable?). If you know the implementer of the function and he/she is willing to donate his/her kidney in case the function leaks aliases, you may risk casting away the uniqueness.

There is however a way for a function to guarantee that it’s alias-safe by declaring its parameters lent. The compiler will check the body of the function to make sure it doesn’t leak aliases to any part of the lent parameter, nor does it store non-unique (and possibly aliased) objects inside the lent object. Of course, the function can only pass the lent parameter (or any sub-part of it) to another function that makes the same guarantee.

It’s not obvious which should be the default: lent or it’s opposite, owned. If lent were the default, the compiler would be flagging a lot of valid single-threaded code (although it makes sense to assume that methods of a monitor object take lent parameters by default).

The relationship between unique and lent is very similar to that between immutable and const in D. If you declare data as unique or immutable you can safely pass it to a functions that declares its parameter as lent or const, respectively. lent guarantees that the parameter will not be aliased, const that it won’t be mutated.

Speaking of immutable–there’s always been a problem with constructing non-trivial immutable objects. The tricky part is that during construction we often need to explicitly modify the object, but we don’t want to leak non-const aliases to it or to its sub-parts. And now we have a new tool at hand to control aliasing–unique pointers. Instead of constructing an immutable object, you may construct a unique object, with all the guarantees about aliasing, and then move it to an immutable pointer. Just like this:

unique Foo * pFoo = new unique Foo();
immutable Foo * imFoo := pFoo;

(Of course, C++ doesn’t have immutable types either, but D does.)

By the way, you can always safely move a unique pointer to any of immutable, const, or regular pointers. Notice that it doesn’t mean that unique is a supertype of, for instance, immutable. You can’t pass unique where immutable is expected (you don’t want read-only aliases to escape to another thread!)–you can only move it.

A method that promises not to leak aliases to this is declared as lent. The compiler will detect any violations of this promise, as in:

class Foo {
    Bar * _bar;
public:
    Bar * get() lent {
        return _bar; // error: a lent method returning an alias
    }
};

In general, when you are dealing with a unique object, you may only call its lent methods.

Issues

Strict uniqueness imposes constraints on the internal structure of objects. Imagine creating a unique doubly-linked list. A link in such a list must be accessible from two places: from the previous link and from the next link. The scheme that absolutely prevents the creation of aliases to unique objects will not allow the creation of doubly-linked lists–and rightly so! Imagine moving a link out of the list: you’ll end up with a null pointer in the list, and a possible cross-thread aliasing if the (unique) link migrates to another thread.

There is a solution to this problem based on the ownership scheme (like the one used in Guava or GRFJ), which allows cross-aliasing of objects that share the same owner (e.g., all links are owned by the list). Such aliases cannot be leaked because the compiler won’t allow objects owned by a unique object to be moved. But that’s a topic for another blog post.

The use of iterators on unique containers is also highly restricted, but there are other ways of iterating that are inherently more thread-safe.

Conclusion

Any extension to the type system is usually met with some resistance from the user community. While some appreciate the added safety, others complain about added complexity. So it’s very important to be able to limit this complexity to specific areas.

When are unique and lent qualifiers really needed? In C++, at least in my experience, unique_ptr serves mostly as a poor man’s garbage collector. Since memory management permeates all C++ programs, the switch to using unique qualifiers would require a lot of changes even in single-threaded programs. In contrast, in garbage-collected languages, the main usefulness of unique is for multithreaded programming. A message queue that passes large unique objects between threads is more efficient than the one that deep-copies them. And unique objects never require locking.

When multithreading is involved, a safer type system doesn’t really add complexity. It translates the hard complexity of shared-memory concurrency into something that is more approachable by mere mortals.

As always, the introduction of new type modifiers may lead to code duplication (as in the case of const and non-const accessors). This problem can be dealt with using qualifier polymorphism–a topic for a whole new blog post.

I understand that this post might be hard to follow. That doesn’t mean that the uniqueness scheme is difficult to understand. My task was not only to explain the hows, but also the whys–and that requires giving a lot of background.

Bibliography

Jonathan Aldrich, Valentin Kostadinov, Craig Chambers, Alias Annotations for Program Understanding. Describes unique and lent annotations in detail.
See also my previous blog on Generic Race-Free Java.

February 5, 2009

Immutable Things, where do they all come from?

Posted by Bartosz Milewski under C++, D Programming Language, Java, Programming
1 Comment

An immutable object never changes. You can bet your program on it. As I explained in my previous post, the same is not true for const objects (or readonly objects, in dialects of Java). They may be changed through mutable aliases. An immutable object has no mutable aliases. Ever!

Small print: this guarantee is predicated on the programmer not overriding the type system with casts and other escape mechanisms.

To my knowledge, immutability is currently available in the D programming language and in a Java dialect called IGJ (Immutability Generic Java). It is the default in functional languages.

The closest you can get to immutability in C++ is by using const as a storage class:

const double pi = 3.141592;
const char ERRORMSG[] = "Your bank went belly up.";

Here, the value of pi or ERRORMSG is guaranteed to never change. Global or static immutable values can be used at compile time (for instance as labels in a switch statement).

Creating Immutable Objects

Defining immutable numbers, arrays, or POD structs is pretty straightforward. But what about more complex objects that don’t have static initializers? How do you create an immutable linked list? In functional languages, lists are treated as built-in types; like arrays in general purpose languages. They can be statically initialized. But in C++ or Java the creation of a list involves memory allocations and reference manipulation. Since, by definition, we can’t manipulate an immutable list, it seems like we can never create one!

How about relaxing the constraints a little to allow mutation inside the constructor of an immutable object. Here’s a hypothetical example of a list in D (the current D compiler doesn’t fully implement immutability, so my examples are written in pseudo-D):

class IntList
{
public:
  // default constructor
  this() {} // _head is default-initialized to null
  // one element constructor
  this(int i) {
    _head = new IntLink(i); // mutates _head
  }
private:
  IntLink _head;
}
// will this work?
immutable IntList oneList= new immutable IntList(1);

There is a significant problem with this solution. If this is not considered immutable inside the constructor then there is no guarantee that a mutable reference to this or any of its subobjects won’t escape its scope. Consider this example:

IntLink globalLink; // mutable!

IntList.this(int i) {
  _head = new IntLink(i);
  globalLink = _head; // escape!
}

immutable IntList immList= new immutable IntList(1);
globalLink.setValue(2); // mutates immList!

Here, an immutable object immList has been mutated through an alias globalLink. We can’t allow this!

It’s true that a compiler could perform escape analysis on the constructor of IntList, provided it has access to its source code; which might not always be true when it’s compiling the statement that creates the immutable object. After all, class IntList might be implemented in a third-party library.

In the absence of source code, the only other possibility is to include immutability information in the type signature of the constructor. When an immutable object is created, the compiler would use an immutable constructor, and it would fail if one didn’t exist. Conversely, an immutable constructor would not compile if it allowed a mutable reference to escape. This bad code would not compile:

IntList.this(int i) immutable {
  _head = new IntLink(i);
  globalLink = _head; // error!
}

Of course, no mutable methods may be called from inside an immutable constructor–they couldn’t guarantee the non-escape of mutable aliases.

This solution works, even if it’s not perfect. It often leads to code duplication (the immutable constructor being identical to the mutable one, as in the IntList example). Moreover, it prevents some forms of refactoring. Even though, inside an immutable constructor, you may initialize an object’s fields, you can’t delegate this task to a (perforce immutable) method of the object.

Assignment is not the same as Mutation

The key to immutable construction is the observation that, when constructing an immutable object, it’s okay to assign the object’s fields but not to mutate them. During construction only such “shallow” mutation should be possible.

In my example, the assignment to _head is okay, but the mutation of the IntLink object attached to it should be prohibited. Indeed, I don’t need to mutate the head link once it’s constructed. Of course the construction of an immutable IntLink follows the same rules. Here’s the relevant code:

class IntLink
{
  this(int i) immutable {
    // _next is default-initialized to null
    _val = i; // field assignment
  }
  int _val;
  IntLink _next;
}

With this understanding, it’s actually possible to minimize code duplication. To this end, IGJ introduces a new type modifier, AssignFields. A constructor or a method that performs no other mutation but field assignment may be declared AssignFields. Since AssignFields methods and AssignFields constructors can also be used in mutable contexts, they don’t have to be duplicated. Expanding on the above example:

class IntLink
{
  this(int i) assignfields {
    // _next is default-initialized to null
    SetValue(i); // Ok: it's an assignfields method
  }
  void SetValue(int i) assignfields {
    _val = i; // field assignment
  }
  int _val;
  IntLink _next;
}

As you can see, I was even able to refactor the part of the constructor that does the assignment to _val. I can now use the same constructor in both, mutable and immutable, contexts. The SetValue method can only be called in a mutable or assignfields context.

immutable IntLink iLink = new immutable IntLink(1);
IntLink mLink = new IntLink(2);
mLink.SetValue(3);

Subtyping relationships

It is okay to pass an immutable object to a function that expects a const object. After all, such a function will not mutate the object. If a reference to the object escapes the function, it can only be a const reference. And again, const reference cannot be used to mutate the object, so we’re fine.

The compiler will allow this subsumption if we establish that immutable is a subtype of const (more precisely, for any type T, immutable T is a subtype of const T). This is very similar to the compiler allowing the passing a derived class object to a function that accepts a base class objects–the Liskov substitution principle.

The full subtyping hierarchy between various mutability annotations is as follows:

immutable is a subtype of const: You can pass an immutable object to a function taking a const argument.
assignfields is a subtype of const: You can call a const method from an assignfields method (subsumption of this).
mutable is a subtype of assignfields. You can call an assignfields method on a mutable object (as in mLink.SetValue()).

Because of transitivity, mutable is also a subtype of const. You can pass a mutable object to a function that takes a const argument (you do it in C++ without much thinking).

subtyping relationships

Conclusion

There is some advantage to introducing yet another type modifier, assignfields, to smooth out the process of constructing immutable objects. On the other hand, is it worth additional complexity? How often does one construct non-trivial immutable objects? If it’s not a common programming pattern then maybe we can live with current restrictions and some code duplication. We still have very little experience in using immutable in D, so we might have to wait and see.

Recent news: A video of my talk to the Vancouver C++ Users group was just posted. The black shadow in front of the bright screen talking about memory fences is yours truly.

If you found this post interesting, please register your vote on reddit, so that more people can find it.