Andrei brought up the idea of encoding the sharing of an object between threads in the type of the object. After months of discussion we are still not sure how far we want to go with it. One thing is certain: letting the programmer mark objects for sharing can help the compiler prevent many concurrency bugs.
One common concurrency error is accidental sharing. Some data structures are designed for multi-threaded access (for example, objects with synchronized methods), and they usually work fine, deadlocks aside. The problem arises when a chunk of data that was not designed for sharing is accessed by multiple threads. Such an error is hard to detect because, in general, concurrency bugs are hard to reproduce.
The proposal is to make accidental sharing impossible. This requires that all objects be thread-local by default. For instance, if you declare a global object and initialize it in one thread, another thread will see a different version of this object. In most cases it will see a null pointer or an uninitialized object handle, and you’ll get an easy-to-reproduce null-reference error.
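The sketch below makes the thread-local default concrete. It is written in Rust as a stand-in, because the proposal targets D. Rust globals are not thread-local by default, so the sketch opts in explicitly with `thread_local!`. `CONFIG` and `other_thread_view` are hypothetical names. One thread initializes the variable, and a freshly spawned thread still sees its own copy at the initial value.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Hypothetical global. Rust needs an explicit thread_local! here; under
    // the proposal, this per-thread behavior would be the default.
    static CONFIG: Cell<Option<u32>> = Cell::new(None);
}

/// Initializes CONFIG in the calling thread, then reports what this thread
/// and a newly spawned thread each see.
pub fn other_thread_view() -> (Option<u32>, Option<u32>) {
    CONFIG.with(|c| c.set(Some(42)));
    // The new thread gets its own copy, still at its initial value (None).
    let seen_elsewhere = thread::spawn(|| CONFIG.with(|c| c.get()))
        .join()
        .expect("spawned thread panicked");
    let seen_here = CONFIG.with(|c| c.get());
    (seen_here, seen_elsewhere)
}
```

Calling `other_thread_view()` should yield `(Some(42), None)`. The `None` plays the role of the null reference that makes the error easy to reproduce.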
If you consciously want to share an object, you have to declare it “shared”. This type modifier is transitive (just like const and invariant in the D programming language), so a shared object cannot contain references to non-shared objects. Code that tries it simply won’t compile.
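Rust's `Sync` auto trait enforces a transitive rule similar to this one, so it can serve as an analogy. This is Rust's mechanism, not D's `shared`. In the sketch, `SharedNode`, `LocalNode`, `assert_shareable` and `share_chain` are hypothetical names. A type counts as shareable only if every field is, so a struct holding an `Rc` or a `RefCell` is rejected at compile time.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::thread;

// Analogy only: Rust's Sync auto trait, not D's `shared` qualifier.
// All names below are hypothetical.

/// Compiles only if `T` may be referenced from several threads at once.
fn assert_shareable<T: Sync>() {}

/// Every field is itself shareable, so the whole node is Sync.
pub struct SharedNode {
    pub value: Mutex<i32>,
    pub next: Option<Arc<SharedNode>>,
}

/// RefCell and Rc are not Sync, so neither is this node.
#[allow(dead_code)]
pub struct LocalNode {
    value: RefCell<i32>,
    next: Option<Rc<LocalNode>>,
}

/// Several threads reach through `head` into `tail`; everything they can
/// reach is shareable, so the compiler accepts the spawn.
pub fn share_chain() -> i32 {
    assert_shareable::<SharedNode>();
    // assert_shareable::<LocalNode>(); // error: `RefCell<i32>` is not Sync

    let tail = Arc::new(SharedNode { value: Mutex::new(0), next: None });
    let head = Arc::new(SharedNode {
        value: Mutex::new(0),
        next: Some(Arc::clone(&tail)),
    });
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let head = Arc::clone(&head);
            thread::spawn(move || {
                let next = head.next.as_ref().expect("tail is linked");
                *next.value.lock().unwrap() += 1;
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *tail.value.lock().unwrap();
    total
}
```

Uncommenting the `LocalNode` check turns the program into a compile error. That is the "it simply won’t compile" behavior described above.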
A function may declare its (reference) parameter as “shared”, in which case the compiler won’t let you pass a non-shared object to it. Conversely, if the parameter is declared non-shared (the default), no shared argument may be passed in its place, so the function is guaranteed that the object is thread-local. (See, however, “unsharing” below.)
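Here is a Rust sketch of the parameter rule. It assumes that a dedicated wrapper type can stand in for the qualifier. `Shared<T>` is a hypothetical newtype, not a standard type. Because it is a distinct type, passing a plain `u64` where `&Shared<u64>` is expected is a type error, and so is the reverse. This mirrors both directions of the rule.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical wrapper standing in for D's `shared` qualifier. Being a
/// distinct type, it cannot be confused with a plain thread-local value.
pub struct Shared<T: Send>(Arc<Mutex<T>>);

impl<T: Send> Shared<T> {
    pub fn new(value: T) -> Self {
        Shared(Arc::new(Mutex::new(value)))
    }

    /// Reads the current value under the lock.
    pub fn get(&self) -> T
    where
        T: Copy,
    {
        *self.0.lock().unwrap()
    }
}

impl<T: Send> Clone for Shared<T> {
    /// Cloning copies the handle, not the data: both point at one object.
    fn clone(&self) -> Self {
        Shared(Arc::clone(&self.0))
    }
}

/// Parameter declared "shared": only a `Shared<u64>` is accepted.
pub fn bump_shared(counter: &Shared<u64>) {
    *counter.0.lock().unwrap() += 1;
}

/// Default (thread-local) parameter: a plain `u64`, no locking needed.
pub fn bump_local(counter: &mut u64) {
    *counter += 1;
}

pub fn demo() -> (u64, u64) {
    let mut local = 0u64;
    bump_local(&mut local);
    // bump_shared(&local); // error: expected `&Shared<u64>`, found `&u64`

    let shared = Shared::new(0u64);
    let handles: Vec<_> = (0..3)
        .map(|_| {
            let s = shared.clone();
            thread::spawn(move || bump_shared(&s))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // bump_local(&mut shared); // error: expected `&mut u64`, found `&mut Shared<u64>`
    (local, shared.get())
}
```

`demo()` returns `(1, 3)`: one bump of the local counter and three bumps of the shared counter, one from each spawned thread.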
Let me discuss potential objections to this scheme.
The first is performance: not for shared objects, mind you, but for the non-shared ones. Walter tells us that accessing a thread-local static variable adds between one and three instructions of overhead. That seems quite reasonable, especially considering that in a multi-threaded environment the use of global non-shared variables is rarely correct.
There is also a performance penalty when starting a new thread: all static variables it has access to have to be default-initialized, and all module constructors have to be called. This might amount to quite a bit. We will recommend against overusing global variables and module constructors. The way to amortize this cost is to use thread pools.
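A counter can illustrate the startup cost and why thread pools amortize it. In this hypothetical Rust sketch, the lazily initialized thread-local `SCRATCH` stands in for per-thread statics and module constructors. `INITS` counts how often that setup runs. Spawning a thread per task pays the cost once per task, while a small pool pays it at most once per worker.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical counter: how many times any thread ran the per-thread setup.
static INITS: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    // Stand-in for per-thread statics plus module constructors: the
    // initializer runs once in each thread that touches SCRATCH.
    static SCRATCH: Vec<u8> = {
        INITS.fetch_add(1, Ordering::SeqCst);
        vec![0u8; 4096]
    };
}

fn task() {
    // Touching the thread-local forces its initialization in this thread.
    SCRATCH.with(|buf| assert_eq!(buf.len(), 4096));
}

/// Spawns a fresh thread for every task: one initialization per task.
pub fn thread_per_task(tasks: usize) -> usize {
    let before = INITS.load(Ordering::SeqCst);
    let handles: Vec<_> = (0..tasks).map(|_| thread::spawn(task)).collect();
    for h in handles {
        h.join().unwrap();
    }
    INITS.load(Ordering::SeqCst) - before
}

/// Reuses a fixed set of workers: at most one initialization per worker.
pub fn pooled(tasks: usize, workers: usize) -> usize {
    let before = INITS.load(Ordering::SeqCst);
    let (tx, rx) = mpsc::channel::<()>();
    let rx = Arc::new(Mutex::new(rx));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || loop {
                // The lock is held only while receiving, not while working.
                let job = rx.lock().unwrap().recv();
                match job {
                    Ok(()) => task(),
                    Err(_) => break, // channel closed and drained
                }
            })
        })
        .collect();
    for _ in 0..tasks {
        tx.send(()).unwrap();
    }
    drop(tx); // closing the channel lets idle workers exit
    for h in handles {
        h.join().unwrap();
    }
    INITS.load(Ordering::SeqCst) - before
}
```

With these definitions, `thread_per_task(16)` should report 16 initializations, while `pooled(64, 4)` should report between 1 and 4. `INITS` is a single global, so the two measurements are only meaningful when run one after the other.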
What about invariant objects (ones that are never modified)? Those can be safely shared, so they must not be allocated as thread-local. It is okay for a shared object to contain references to invariant objects.
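Immutable data needs neither per-thread copies nor locking. The Rust analogy below does not model D's `invariant` itself. Data that is never mutated sits behind an `Arc` and is read from any thread without a lock. A mutable shared object holds a reference to it. `Config`, `SharedLog` and `run` are hypothetical names.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Analogy only: immutable data behind Arc, not D's `invariant`.
// All names are hypothetical.

/// Never modified after construction.
pub struct Config {
    pub limits: Vec<u32>,
}

/// A mutable shared object that points at immutable data.
pub struct SharedLog {
    pub config: Arc<Config>,
    pub entries: Mutex<Vec<u32>>,
}

/// Each reader sums the immutable limits without locking, then records
/// the result in the mutable part of the shared object.
pub fn run(readers: u32) -> Vec<u32> {
    let config = Arc::new(Config { limits: vec![10, 20, 30] });
    let log = Arc::new(SharedLog {
        config,
        entries: Mutex::new(Vec::new()),
    });
    let handles: Vec<_> = (0..readers)
        .map(|i| {
            let log = Arc::clone(&log);
            thread::spawn(move || {
                let total: u32 = log.config.limits.iter().sum(); // no lock
                log.entries.lock().unwrap().push(total + i); // locked
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let mut entries = log.entries.lock().unwrap().clone();
    entries.sort();
    entries
}
```

`run(3)` returns `[60, 61, 62]`. Only the `entries` field ever needs the mutex.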
Can a shared object be “unshared”? This is a tricky one. There are situations in which threads hand over objects to each other. The object is shared only during the handover; otherwise it is accessed by one thread at a time. The currently owning thread should be able to call regular library functions (ones that don’t expect sharing) with such objects. So we need some kind of share/unshare cast. On the other hand, such a cast creates a wormhole into accidental sharing. There is an interesting SharC paper that discusses runtime techniques to make “unsharing” safe. Safely casting a temporarily non-shared object back to shared is even trickier. I’ll talk more about it in my next post.
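One way to get a safe handover without any cast is to transfer ownership. This Rust sketch relies on move semantics. Once the producer sends the buffer through a channel, the compiler forbids the producer from touching it again, so the consumer can pass it to an ordinary non-shared function. This substitutes for the share/unshare cast; it is neither the SharC runtime technique nor the proposed D design. `normalize` and `hand_over` are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical ordinary library function that knows nothing about threads.
pub fn normalize(data: &mut Vec<i32>) {
    data.sort();
    data.dedup();
}

/// Producer builds a buffer and hands it over; after `send` only the
/// consumer can reach it, so it is never shared while in use.
pub fn hand_over() -> Vec<i32> {
    let (tx, rx) = mpsc::channel::<Vec<i32>>();
    let producer = thread::spawn(move || {
        let buf = vec![3, 1, 3, 2, 1];
        tx.send(buf).unwrap();
        // buf.len(); // error: borrow of moved value `buf`
    });
    let consumer = thread::spawn(move || {
        let mut buf = rx.recv().unwrap();
        normalize(&mut buf); // plain, non-shared call on the handed-over object
        buf
    });
    producer.join().unwrap();
    consumer.join().unwrap()
}
```

`hand_over()` returns `[1, 2, 3]`. The compiler proves that only one thread owns the buffer at a time, which is exactly the property that a bare cast cannot guarantee.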
Finally, there is an unexpected bonus from this scheme for the garbage collector. We will be able to use a separate shared heap (which will also store invariant objects), and separate per-thread heaps for non-shared objects. Since there can’t be any references going from the shared/invariant heap to non-shared ones, per-thread garbage collection will be easy. Only occasional collection of the shared heap would require the cooperation of all threads, and even that could be done without stopping the world.
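The heap invariant can be illustrated with a toy model. The types are hypothetical; this is not a real collector. A shared-heap object can store only indices into the shared heap, so an edge from the shared heap into a thread-local heap cannot be represented. As a result, a per-thread mark-and-sweep needs only that thread's roots. It follows local edges and stops at references into the shared heap.

```rust
// Toy model with hypothetical types; not a real garbage collector.

#[derive(Clone, Copy)]
pub enum Ref {
    Local(usize),
    Shared(usize),
}

/// Shared-heap objects can point only at other shared objects: the field
/// type makes a shared-to-local edge unrepresentable.
#[allow(dead_code)]
pub struct SharedObj {
    pub children: Vec<usize>,
}

/// Objects in a thread-local heap may point anywhere.
pub struct LocalObj {
    pub children: Vec<Ref>,
}

pub struct LocalHeap {
    pub objects: Vec<Option<LocalObj>>,
}

impl LocalHeap {
    /// Mark-and-sweep over this heap only. Roots come from the owning
    /// thread; the shared heap is never scanned because it cannot hold
    /// references into this heap. Returns the number of objects freed.
    pub fn collect(&mut self, roots: &[Ref]) -> usize {
        let mut marked = vec![false; self.objects.len()];
        let mut stack: Vec<usize> = roots
            .iter()
            .filter_map(|r| match r {
                Ref::Local(i) => Some(*i),
                Ref::Shared(_) => None,
            })
            .collect();
        while let Some(i) = stack.pop() {
            if marked[i] {
                continue;
            }
            marked[i] = true;
            if let Some(obj) = &self.objects[i] {
                for child in &obj.children {
                    // Ref::Shared edges are left to the shared-heap collector.
                    if let Ref::Local(j) = child {
                        stack.push(*j);
                    }
                }
            }
        }
        let mut freed = 0;
        for (i, slot) in self.objects.iter_mut().enumerate() {
            if !marked[i] && slot.is_some() {
                *slot = None;
                freed += 1;
            }
        }
        freed
    }
}

/// Object 2 is unreachable from the thread's single root (object 0).
pub fn demo() -> (usize, Vec<bool>) {
    let mut heap = LocalHeap {
        objects: vec![
            Some(LocalObj { children: vec![Ref::Local(1), Ref::Shared(0)] }),
            Some(LocalObj { children: vec![] }),
            Some(LocalObj { children: vec![Ref::Local(1)] }),
        ],
    };
    let freed = heap.collect(&[Ref::Local(0)]);
    let live = heap.objects.iter().map(|o| o.is_some()).collect();
    (freed, live)
}
```

`demo()` returns `(1, [true, true, false])`. The reverse direction also shows up in the model: `Ref::Shared` edges out of local heaps are roots for the shared heap. That is why collecting the shared heap requires the cooperation of all threads.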