April 2009



Making the C language race-free is quite a challenge. I discussed one approach in my previous post. Here’s another one, based on lightweight annotations and whole-program analysis.

SharC is a sharing checker and program transformer for C. It is described in the paper SharC: Checking Data Sharing Strategies for Multithreaded C.

Sharing modes

The goal of SharC is to detect (and possibly eliminate) data races from legacy C programs. This is an incremental process in which the programmer annotates the program with type qualifiers, runs the checker, listens to its complaints, ads more annotations, etc., until SharC is totally satisfied and produces an instrumented version of the program with additional built-in runtime checks.

Some of the annotations that specify sharing modes are similar to those of Cyclone, but there are some new ones.

  • private: Thread-local
  • readonly: Does not change (immutable). Interestingly, it may be cast to another sharing mode that is writable.
  • locked(lock): Shared under specific lock. The lock must be readonly.
  • racy: Intentionally racy. No compile- or runtime checks.
  • dynamic: May change sharing mode at runtime, at which point it’s checked for readonly or single-thread access.

Note that SharC provides ways to move data between various sharing modes. The key to these operations is a very clever reference counting scheme built into the memory allocator. The paper provides the details of that scheme, but I’ll skip them here.

The overall policy of SharC is to consider almost all sharing (even under locks) illegal unless explicitly declared by the user.

Safe cast

Dynamic types can be cast between sharing modes. Here’s an example:

int readonly * y;
...
x = SCAST(int private *, y);

A pointer to a readonly int (which may be shared) is cast to a pointer to private (thread-only) int. Normally such cast would be unsafe. The int might be referenced by multiple readonly pointers (potentially shared between threads). The clients of those pointers don’t expect the value to ever change and may access it without locking. That’s why SCAST not only nulls the original pointer y, but also checks if the data reference count is zero (using the clever reference counting scheme I mentioned before). If it’s not, it reports an error.

Example

The paper offers a more elaborate example, which shows the use of dynamic data types and checked casting. The example is based on a pattern typical for multimedia processing and games.

A data buffer is processed in stages and passed from thread to thread.

We start with a linked list of stages. Each stage is run by a separate thread. The first stage is fed some data in a buffer (for instance a frame of a movie). The thread responsible for that stage starts its own processing. When it’s done, it hands the buffer over to the next stage, and so on. The buffer is like a car on an assembly line.

When the assembly line is created, each stage is assigned a different processing function through the function pointer fun (see code below).

You may also think of a stage as a one-element synchronous channel, with the buffer being the message.

What is important in this example is that we don’t want to copy a potentially very large buffer. We want to pass it between threads by reference without risking data races.

This is a stage structure with all the necessary SharC annotations.

typedef struct stage(q) {
    struct stage dynamic * q next;
    cond racy * q cv;
    mutex racy * readonly mut;
    char locked(mut) * locked(mut) sdata;
    void (*q fun)(char private * private fdata);
} stage_t;

Don’t be put off by the sheer number of annotations. Most of them are defaults or can be easily inferred by SharC.

This is what the programmer must annotate:

  • The buffer sdata is shared and protected by mut.
  • The function pointed to by fun takes a private (thread-local) buffer, fdata.

Notice that, without casting, fun cannot be called with sdata, because sdata is shared.

What SharC infers is that:

  • Structure stage is parametrized by some sharing policy, q (which stands for “qualifier”). This qualifier will be provided when the variable of type stage is defined. (This is called qualifier polymorphism.)
  • All pointers inside stage, except for mut and sdata inherit the qualifier q.
  • The field next, which is a pointer to the next stage, has two qualifiers. The pointer itself inherits the qualifier q from its parent. The data it points to is marked dynamic, which means that it will be checked during runtime to be either accessed by a single thread or immutable.
  • The pointers cv and fun also inherit the parent qualifier, q. Condition variable cv is declared racy to turn off compiler checks for sharing. cv is shared between threads but does not require locking.
  • The mutex object is also racy. The pointer to it though is readonly (immutable). This is an important detail, since it is unsafe to change the mutex during the lifetime of data it protects (which is sdata in this case).
  • sdata is a pointer to a char buffer. Both the pointer and the buffer it points to are protected by the mutex, mut.
  • Finally, the pointer fun inherits its qualifier from the parent. The function parameter, fdata, is a private pointer to a private buffer.

This is the beginning of the thread function, thrFunc:

void dynamic * private thrFunc(void dynamic * private d) {
    stage_t dynamic * private S = d;
    stage_t dynamic * private nextS = S->next;
    char private * private ldata;
    ...

The function takes and returns a private void pointer to dynamically shared data. In reality this pointer points to a stage structure–this is reflected in the definition of S. Notice that in C a void pointer can be converted to a typed pointer. In SharC, the compiler makes sure that the qualifiers match.

The next field of S inherits its qualifier private from S. The data it points to (the next stage) is dynamic, as per the definition of stage.

Finally there is a private pointer to a private buffer, which will be used to point to the shared buffer. Since in general private and shared don’t mix, a sharing cast will be performed in the body of thrFunc (see below).

Here’s the rest of thrFunc:

    ...
    while (notDone) {
        mutexLock(S->mut);
        while (S->sdata == NULL)
            condWait(S->cv,S->mut);
        ldata = SCAST(char private *, S->sdata);
        condSignal(S->cv);
        mutexUnlock(S->mut);
        S->fun(ldata);
        if (nextS) {
            mutexLock(nextS->mut);
            while (nextS->sdata)
                condWait(nextS->cv,nextS->mut);
            nextS->sdata =
                SCAST(char locked(nextS->mut) *, ldata);
            condSignal(nextS->cv);
            mutexUnlock(nextS->mut);
        }
    }
    return NULL;
}

The stage S that was passed to the thread function contains shared pointer sdata. To access this pointer we need to take the correct lock, S->mut. This is the lock that was specified in the declaration of sdata inside stage.

First we loop waiting for S->sdata to become non-null. Another thread is supposed to initialize that pointer. As long as the pointer is null, we block on the condition variable S->cv.

while (S->sdata == NULL)
    condWait(S->cv,S->mut);

Notice that condWait takes the mutex as its second argument. It has to unlock it while waiting for the signal, otherwise no other thread would have a chance to modify S->sdata (and we’d have a livelock). (In the channel analogy, we are performing a synchronous receive.)

Now we are ready to call the function S->fun to process the buffer. However, that function is declared to take a private buffer, not a shared buffer. It could, for instance, be a library function that has no provisions for sharing. It could be strlen or a Fast Fourier Transform.

The programmer knows that it is safe to call this function, because the buffer has been handed over between threads and shared access is no longer in the picture. What remains is to convince the compiler that that’s okay, and to add some code that does necessary checks at runtime. All this is done by SCAST.

The line

ldata = SCAST(char private *, S->sdata);

casts S->sdata to a pointer to a private (thread-local) buffer and stores it in the private variable ldata (local data). It also nulls the source and makes sure there are no more references to it (a very clever reference counting scheme, remember?). If this looks to you like move semantics, you get the right idea.

Still under lock, we signal the previous thread that the shared buffer has been nulled so it can re-fill it.

We are now ready to release the lock and do some local processing. We call the function S->fun with the (now private) buffer.

If there is a next stage (pointed to by S->next), we want to pass our processed buffer to it. Before we do that, we have to wait until the buffer in the next stage becomes null. (In the channel analogy, we are now doing a synchronous send.)

while (nextS->sdata)
    condWait(nextS->cv,nextS->mut);

We have to do it under the next stage lock.

We assign our current buffer to the next stage and signal the next thread. Here, again, we have to perform a cast: our buffer is private, and next->sdata is shared and locked under next->mut. The statement:

nextS->sdata = SCAST(char locked(nextS->mut) *, ldata);

casts ldata (local data) to a pointer to char locked(nextS->mut). This is exactly the opposite of what we did earlier (casting shared data to local data). As before, the runtime has to null the source and check if there are no other references to it. This way thread-local unique data may be moved to another thread. If you followed my previous posts, you notice the pattern: unique data, move semantics.

Conclusion

What is unique in SharC approach is the presence of dynamically checked casts that are used to change sharing modes at runtime. This is probably the best that can be done on top of C.

When designing a new language, we could use different mechanisms to achieve the same goals. For instance, the buffer could be declared unique. The move semantics would then guarantee the absence of dangerous aliasing thus eliminating the need for behind-the-scenes reference counting.

The problem of passing a unique object to an unsuspecting library function (without nulling the source) can be solved by declaring the function parameter as lent (which should be the default).

If you read this space on a semi-regular basis, you might have noticed that recently it turned into a series of “Cliff Notes for CS grad students.” That was because I tried to prepare myself for the task of designing a multithreading system and memory model for the D programming language. My future posts will (hopefully) track progress in that direction.


When I read Dan Grossman’s paper, Type-Safe Multithreading in Cyclone, I imagine Dan on a raft in the middle of a tropical cyclone, his threads soaked by the rain, frantically racing to build a shelter. Maybe a type system can’t protect you from the rain, but it sure can protect you from data races–even if the final product looks a little like a Rube Goldberg contraption.

Cyclone, as it turns out, is a safe dialect of C. Dan’s goal was to extended Cyclone’s type system to make it safe in a multithreaded environment. This is not an easy task in a language that uses pointers and manual memory management. The ideas are similar to those in the dialects of Java in my previous posts, but the type-theoretical concepts are pretty advanced.

-Locks

A Cyclone lock is a special type parameterized by name. Defining such entities requires a major type-system tour de force. Little known notions like “existential types,” “dependent types,” and “effect systems” come into play. The major point of all this magic is that you want to use lock names without knowing what they are.

You create a Cyclon lock using the following pseudo-syntax:

let lk<λ> = newlock();

I’ll use Greek letters for symbolic lock names (except for the special name, loc), since the author didn’t propose any concrete syntax and, admittedly, never finished the implementation.

What’s happening behind the scenes is that newlock returns an existential type, which is unpacked into variable lk of type lock_t<λ>. That variable may, for instance, be passed to a (generic) function taking lock_t<λ>.

The beauty of an existential type is that it won’t let the client “guess” the name of the lock. The name is totally opaque–there is not even a type for it. It can only be generated by the system in the call to newlock.

To synchronize a (possibly compound) statement, you precede it with sync(lk), where lk is a lock variable (or an expression that evaluates to such). Continuing the previous example, we may write:

sync(lk) ++*p;

which increments the value pointed to by p under the lock lk.

-Ownership

In an OO language, ownership is the key to a race-free type system. The owner of a given object has to be locked before the object may be accessed.

In a non-OO language, like Cyclone, instead of specifying the owner of data, you specify the lock that protects it. Since you might not have access to the lock variable when you are declaring data, the symbolic lock name is used instead.

Also, since all references in Cyclon are expressed through pointers, lock annotations are associated with pointers. (This is again parallel to annotating object references with their owner objects.)

We can now complete the example above with the annotated definition of p:

int *λ p = new 0;

The pointer p can only be access when the lock whose name is λ is held.

-Local and Shared data

In Cyclon, thread-local data appear as a special case of shared data. A special lock name loc and a (global) variable nonlock are used to turn off the locking of individual instances. (Compare this to the dummy owner thisThread in the OO approach.)

Lock names are also used to parameterize functions and structs. In that case it might be necessary to be able to encode the “sharability” of the type. If the lock type has the kind LS, it actually protects Shared data. The kind LU, on the other hand, may be Unsharable, i.e., it could be instantiated with the thread-local loc. LS is a sub-kind of LU.

The main purpose of “kinding” is to ensure that thread-local data can never be shared between threads.

-Example

All this sounds very dry without an actual (however contrived) example. Let’s start with a simple function inc:

void inc<λ:LU>(int *λ p; {λ}) 
{
    *p = *p + 1;
}

The function is parameterized by a lock name parameter λ of the LU kind. It means that it can be instantiated both in the sharing and thread-local context. Lock-name variable λ first appears in the declaration of the pointer parameter p. Data pointed to by p is protected by the lock named λ. That’s not telling a lot–every pointer in Cyclon is protected by some lock (or nonlock).

What makes the difference is that the same name λ appears in the effects part of the declaration. The runtime system keeps track of what locks are being held at any point in the program, and passes this information to every function. Here, {λ} is an effect that contains one lock name, λ. In other words, the lock that protects p must be taken before calling inc. (Of course, other locks might be held too.)

Here’s an example of how inc may be called from another function, inc2.

void inc2<λ:LU>(lock_t<λ> plk, int *λ p ;{}) 
{
    sync(plk) inc(p);
}

The effects part of the declaration of inc2 is empty. No locks need be held before calling it. But when inc is called, the current effects include the name of the lock specified in the declaration of plk (the sync section adds λ to the effects). Since the same lock name appears in the declaration of p, the precondition for calling inc is fulfilled.

A struct containing a pointer must also be an existential type. The actual value of λ is assigned to LkInt when the type is “packed” (created from its constituents).

struct LkInt 
{<λ:LS> 
    lock_t<λ> plk;
    int*λ p; 
};

Here, LkInt (locked integer) only makes sense in a sharing context, so λ is of the strictly sharing kind, LS. That allows LkInts to be passed between threads.

Here’s an example of using a pointer to LkInt:

void g<λ:LU>(struct LkInt *λ s ;{λ}) 
{
    let LkInt{<λ'> .plk=lk, .p=ptr} = *s;
    inc2(lk, ptr);
}

Function g is parameterized by some lock name of the LU kind (so it may be loc). This lock is used by the callee to lock the LkInt argument s (λ is present in the callee effects).

Now s itself contains a pointer and a lock (of the LS kind). We get at them by unpacking an existential type–splitting it into constituents and giving them new names, lk and ptr. The name of lk is λ’. A shallow copy of s is performed in the process. We can now call inc2, which doesn’t require any effects, and uses lk to lock and increment ptr.

Finally, all these things come together in the example below:

void f(;{}) {
    let lk<λ> = newlock();
    int *λ p1 = new 0;
    int *loc p2 = new 0;
    struct LkInt *loc s = new LkInt{.plk=lk, .p=p1};
    spawn(g, s, sizeof(struct LkInt));
    inc2(lk, p1);
    inc2(nonlock, p2);
}

Here’s the play-by-play:

  • Function f requires no effects.
  • The programmer creates a lock lk with the name λ (unpacks an existential type returned by newlock).
  • She creates a pointer p1 to an integer protected by the lock named λ.
  • Then she creates an unprotected thread-local pointer p2 (using dummy lock name, loc)
  • She combines the lock lk and the pointer p1 into a struct LkInt. This is an example of “packing” an existential type. Note that both components must share the same lock name. (The pointer to LkInt, s, is thread-local–lock name: loc.)
  • The programmer spawns a thread that executes the function g with the argument s. The argument is passed by a shallow copy, but deep contents (the lock and the pointer) are shared. In fact they must be shared with the kinding LS; otherwise spawn would not compile. This mechanism protects any thread-local data from being shared with a new thread.
  • While the call to g(s) executes in parallel, the programmer increments p1 under the lock lk. Notice that these are the same variables that the other thread operates on. The shared lock enforces mutual exclusion.
  • The increment of thread-local p2 requires no lock, that’s why dummy nonlock is used.

-Limitations

The biggest limitation of the Cyclon system is that it’s static. The association between data and locks is rigid. That makes several useful idioms impossible to implement. There is no way to “move” data between threads. You can either pass a pointer together with its lock (essentially creating a monitor), or pass data by value.

The other serious problem is the overall complexity of the system–especially the “effects” part. When effects have to be included and manipulated in generic code things get a little hairy.

Granted, a lot of annotations can be elided–the defaults “do the right thing” in most cases. Still, what remains is asking a lot from an average programmer.

-Conclusion

I would like to thank Dan Grossman for correcting some of my mistakes and explaining a few subtle points.

In my next installment I’ll describe a more recent race-free extension of C, SharC, which supports a mix of static and dynamic checking.