Sharing and Unsharing of Data Between Threads

I already mentioned the paper, SharC: Checking Data Sharing Strategies for Multithreaded C, by Anderson et al. The authors describe a strategy for checking multithreaded correctness of C programs. They were able to classify sharing modes into five categories:

Private (to current thread)
Read only
Shared under a specific lock (the lock is part of the type)
Racy (no checking)
Dynamically checked (to be either read-only or private)

The programmer makes strategic annotations by adding sharing type qualifiers to declarations of shared data. The SharC tool then derives the rest, and flags inconsistencies. There is also a run-time component for checking dynamically shared data and allowing safe casting between sharing modes.

You might notice some similarities to the D sharing model. For one, SharC assumes that all unannotated sharing is unintended and treats it as an error. In D, if you don’t annotate something as shared, it will be allocated from a thread-private pool and it will be invisible to other threads.
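For readers coming from C++, the difference in defaults can be illustrated with a small sketch. C++ goes the other way around: namespace-scope data is shared unless you mark it thread_local. The names below (private_counter, shared_counter, bump_in_worker) are made up for the illustration, and thread_local is only a rough analogy for D's thread-private default, not a description of how D implements it.

```cpp
#include <thread>

// In C++ the default is "shared"; thread_local opts out of sharing.
// D, as described above, flips the default: unannotated data is per-thread.
thread_local int private_counter = 0;  // one copy per thread
int shared_counter = 0;                // one copy for the whole process

// Hypothetical helper: runs a worker thread that increments both counters.
void bump_in_worker() {
    std::thread worker([] {
        ++private_counter;  // touches the worker's own copy only
        ++shared_counter;   // visible to all threads; unsynchronized, which is
                            // safe here only because we join before reading
    });
    worker.join();
}
```

After bump_in_worker() returns, the calling thread still sees its own private_counter untouched, while shared_counter has changed.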

The annotation system of SharC was not designed to be very practical. The authors didn’t expect the programmer to precisely annotate all shared variables–it’s clearly too much work. Instead they fell back on global program analysis, which is quite expensive and requires access to all sources.

In D we’ll have to make a few compromises to get maximum benefit from the type system without over-burdening the programmer with tedious annotations. (Making the right trade-offs is the hardest part of language design. You know you got it right when half of the programmers hate you for making it too strict, and the other half for making it too relaxed.) D reduces sharing annotations to just two type qualifiers: shared and invariant. I talked about shared in one of my previous posts. The invariant type modifier is already well defined and in use in D 2.0.

The most interesting part of any sharing scheme based on types is the transitions between modes. A common example where such transitions might be desirable is the producer-consumer queue. Objects may be passed between threads–through the queue–either by value or by reference. When they are passed by reference, they have to be shared–multiple threads may access them concurrently. However, once the consumer gets exclusive access to such an object, she might want to treat it as non-shared. Why? Because she might want to pass it to a library function, which was not designed to deal with shared objects.

For obvious reasons we don’t want conversions between shared and non-shared data to be implicit. That would pretty much defeat the whole scheme. So we are left with explicit casting. Here’s my current thinking (which hasn’t been peer-reviewed yet).

There are two types of casts, unchecked and checked. An unchecked cast always succeeds (assuming it compiles); a checked one might fail at run time, by throwing an exception or returning null. C++ dynamic_cast is checked. So is the D cast when applied to class objects. There are unchecked casts for numeric types in D, but the standard library provides a checked conversion template, to (in the module std.conv), which checks for over- and underflows.
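To make the distinction concrete, here is a minimal C++ sketch of a checked numeric conversion next to an unchecked one. The names checked_to and unchecked_to_byte are hypothetical. The sketch is only loosely modelled on the idea behind D's std.conv to and makes no claim about how that template is implemented.

```cpp
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical checked conversion: throws instead of silently wrapping.
// Restricted to targets narrower than long long so that the range
// comparison below is always done in long long without surprises.
template <typename To>
To checked_to(long long value) {
    static_assert(std::is_integral<To>::value && sizeof(To) < sizeof(long long),
                  "sketch: target must be an integral type narrower than long long");
    const long long lo = std::numeric_limits<To>::min();
    const long long hi = std::numeric_limits<To>::max();
    if (value < lo || value > hi)
        throw std::out_of_range("checked_to: value does not fit in target type");
    return static_cast<To>(value);
}

// Unchecked conversion: compiles and always "succeeds", reducing the value
// modulo 256 when it does not fit.
inline unsigned char unchecked_to_byte(long long value) {
    return static_cast<unsigned char>(value);
}
```

With these definitions, checked_to<unsigned char>(300) throws, while unchecked_to_byte(300) quietly yields 44.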

How can you check a cast that strips the shared modifier–i.e., privatizes the object? SharC offers one solution–reference counting.

You can be 100% sure that no other thread has access to your shared data when you can prove that you have the only reference to it. SharC’s run-time uses a very clever reference counting scheme borrowed from a garbage collector to accomplish just that. Could we do it in D? We probably could, if we committed to a reference-counting GC, but that’s rather unlikely.
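The "sole reference" idea can be sketched in C++ with std::shared_ptr. This is only an analogy: it is not SharC's algorithm, the function name privatize is made up, and use_count() is merely approximate if other threads are copying the pointer at the same moment, so a real implementation would need something stronger.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical checked "privatizing" access: succeeds only if the caller
// holds the one and only owning reference to the object.
// Caveats of this sketch: weak_ptr observers are not counted, and
// use_count() can be stale under concurrent copies of the handle.
template <typename T>
T& privatize(const std::shared_ptr<T>& handle) {
    if (!handle)
        throw std::invalid_argument("privatize: null handle");
    if (handle.use_count() != 1)
        throw std::logic_error("privatize: other references still exist");
    return *handle;  // caller may now treat the object as thread-private
}
```

The check fails as soon as a second owner exists, and succeeds again once that owner lets go.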

What’s the next best thing? Locking the object! This will only work for class objects, which have a built-in monitor, but at least in the SafeD subset we expect class objects to play a major role. Locking the object when privatizing it has several advantages.

The cast will fail if the object is already locked by another thread.
If another thread still has shared access to the object after privatization and tries to lock it, the lock attempt will fail.
The reverse operation, casting an object back to shared, can be checked.
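The three points above can be sketched in C++ as follows. Everything here is an assumption made for illustration: the class Monitor and its methods are hypothetical, a std::mutex guards state that a real thin lock would pack into a single word, and a busy lock simply reports failure instead of blocking.

```cpp
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for a class object's built-in monitor.
class Monitor {
public:
    // Cast shared -> private: take exclusive ownership for the caller.
    // Fails if the object is currently locked (no re-entrancy in this sketch).
    void privatize() {
        std::lock_guard<std::mutex> guard(state_mutex_);
        if (owner_ != std::thread::id{})
            throw std::runtime_error("privatize: object is already locked");
        owner_ = std::this_thread::get_id();
        privatized_ = true;
    }

    // What a synchronized method would do on entry in shared mode.
    // Returns false if the lock is merely busy; throws if the object has
    // been privatized, because then a stale shared alias is being used.
    bool try_lock() {
        std::lock_guard<std::mutex> guard(state_mutex_);
        if (privatized_)
            throw std::runtime_error("try_lock: object was privatized");
        if (owner_ != std::thread::id{})
            return false;
        owner_ = std::this_thread::get_id();
        return true;
    }

    // Release a lock taken with try_lock().
    void unlock() {
        std::lock_guard<std::mutex> guard(state_mutex_);
        if (privatized_ || owner_ != std::this_thread::get_id())
            throw std::runtime_error("unlock: not locked in shared mode by this thread");
        owner_ = std::thread::id{};
    }

    // Cast private -> shared: checked, only the privatizing thread may do it.
    void share() {
        std::lock_guard<std::mutex> guard(state_mutex_);
        if (!privatized_ || owner_ != std::this_thread::get_id())
            throw std::runtime_error("share: object is not privatized by this thread");
        privatized_ = false;
        owner_ = std::thread::id{};
    }

private:
    std::mutex state_mutex_;
    std::thread::id owner_{};   // default-constructed id means "no owner"
    bool privatized_ = false;   // the "exclusively owned" state
};
```

Throwing on a lock attempt against a privatized object is a design choice of this sketch. It turns a leftover shared alias into a loud error rather than a silent wait.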

Most of this can be easily accomplished by slightly modifying thin locks (see my previous post). I’ll provide more details later.

Now, let’s talk about casting back to shared (I’m using “back” deliberately, as we don’t want to share objects that were created as non-shared). This cast is much trickier, since a lot of bad things might have happened while the object was unshared. We can check for some of them and trust the programmer not to do others.

We have to trust the programmer not to squirrel away non-shared aliases to a temporarily unshared object (or its internals). Such aliases become unprotected back doors to the object when it becomes shared again. Remember that, even if you call a synchronized method on an object that is not declared as shared, the synchronization is statically elided by the compiler (at least that’s the plan).

Another danger is that non-shared objects may be inserted into a temporarily unshared object. For that we could check during casting, if we used a different heap for allocating shared objects. We could ask the garbage collector if a given pointer points into the shared heap or not. For class objects, this check can be made much faster by testing a special bit in the thin lock.

Checked sharing casts in both directions have to be recursive, since the shared qualifier is transitive. When casting from shared to non-shared, each class object must have its thin lock put in the “exclusively owned by current thread” state. When casting back to shared, each class object must have its thin lock put back in the sharing state, and all pointers must be checked against the shared heap.
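Here is a rough C++ sketch of such recursive casts over an object graph. The Node type, its two flags, and the function names are all hypothetical: on_shared_heap stands in for the "special bit in the thin lock", and privatized stands in for the "exclusively owned by current thread" state. Lock ownership checks are left out to keep the example short.

```cpp
#include <stdexcept>
#include <unordered_set>
#include <vector>

// Hypothetical object model for the illustration.
struct Node {
    explicit Node(bool shared_heap) : on_shared_heap(shared_heap) {}
    bool on_shared_heap;          // was this object allocated as shared?
    bool privatized = false;      // thin lock in "exclusively owned" state
    std::vector<Node*> children;  // references held by this object
};

// Gather every object reachable from n; the set also guards against cycles.
void collect(Node& n, std::unordered_set<Node*>& seen) {
    if (!seen.insert(&n).second) return;  // already visited
    for (Node* child : n.children) collect(*child, seen);
}

// shared -> private, applied transitively.
void cast_to_private(Node& root) {
    std::unordered_set<Node*> reachable;
    collect(root, reachable);
    for (Node* n : reachable) n->privatized = true;
}

// private -> shared, applied transitively. Every reachable object must live
// on the shared heap; the check runs first so a failure changes nothing.
void cast_to_shared(Node& root) {
    std::unordered_set<Node*> reachable;
    collect(root, reachable);
    for (Node* n : reachable)
        if (!n->on_shared_heap)
            throw std::logic_error("cast to shared: non-shared object is reachable");
    for (Node* n : reachable) n->privatized = false;
}
```

If a non-shared object was inserted into the graph while it was privatized, cast_to_shared refuses the cast and leaves every lock state as it was.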

Conveniently enough, in D, such checked casts can be implemented in the library using unchecked casts and reflection.

Bartosz Milewski's Programming Cafe