In the previous two installments I tried to explain the origins of rvalue references in C++. If you were left with the impression that each design step was a reaction to some previously unanticipated problem, you’re right. As often happens in C++, the design was reactive rather than proactive.

Passing by value

If you think about it long enough, you realize that the process that led to rvalue references started with the way C++ implemented passing by value. In C, passing by value was simple–a shallow copy of data was made. “Shallow” means that no pointers were followed in the process of copying. C++ decided to provide hooks for the programmer to override this behavior. If you define a copy constructor, it will be used instead of the shallow copy. A user-defined copy constructor may, for instance, follow pointers or references and clone the pointed-to objects, thus implementing “deep copy”. It may also do bookkeeping–count references, etc.

Whenever the compiler needs to copy an object–and that may happen during return by value or pass by value–it inserts the appropriate call to the copy constructor (if one is not provided by the programmer, the default shallow copy is used).
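Here is a minimal sketch of a class with a user-defined copy constructor that performs a deep copy (the class and its members are made up purely for illustration):

  #include <cstring>

  struct MyString {
     char*       data;
     std::size_t len;

     MyString(const char* s) : len(std::strlen(s)) {
        data = new char[len + 1];
        std::memcpy(data, s, len + 1);
     }

     // User-defined copy constructor: clone the pointed-to buffer (deep copy).
     // Note that the shallow member, len, must still be copied by hand.
     MyString(const MyString& other) : len(other.len) {
        data = new char[len + 1];
        std::memcpy(data, other.data, len + 1);
     }

     ~MyString() { delete[] data; }
  };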

This is a very general mechanism, which, however, leads to some awkwardness and inefficiencies. For instance, the writer of a copy constructor must remember to copy all of the shallow data members–something that the default shallow copy does automatically. The inefficiencies show up when copying rvalues.

Return value optimizations

Let’s revisit the example from my previous post:

 auto_ptr<Foo> factory() {
    return auto_ptr<Foo>(new Foo);
 }
 // client code
 auto_ptr<Foo> ap = factory();

Here, a temporary auto_ptr is created inside the function, copied using a copy constructor into the caller’s auto_ptr, and then immediately destroyed.

This is so obviously inefficient that the compiler will often optimize it away by constructing the returned auto_ptr directly in the space allocated by the caller (a pointer to which is passed as a hidden argument to the function). This is called return value optimization (RVO) and it eliminates the calls to the copy constructor and the destructor.
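One way to observe the elision (this is my own test, not part of the original example) is to give the class a noisy copy constructor and watch what gets printed:

  #include <iostream>

  struct Noisy {
     Noisy() { std::cout << "default ctor\n"; }
     Noisy(const Noisy&) { std::cout << "copy ctor\n"; }
     ~Noisy() { std::cout << "dtor\n"; }
  };

  Noisy factory() {
     return Noisy();  // constructed directly in the caller's space when RVO kicks in
  }

  int main() {
     Noisy n = factory();
     // with RVO: "default ctor", then a single "dtor" at the end of main;
     // without it you'd also see "copy ctor" (and extra "dtor"s)
  }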

But what about more general cases, like this one:

 auto_ptr<Foo> factory() {
    auto_ptr<Foo> src;
    ...
    src.reset(new Foo);
    ...
    return src;
 }
 // client code
 auto_ptr<Foo> ap = factory();

Some compilers will attempt to optimize this code as well, using the named return value optimization (NRVO).

The C++ Standard allows such optimizations, which elide the calls to a copy constructor and a destructor, but it doesn’t mandate them. Moreover, NRVO cannot be applied to functions that have multiple return statements, with different objects being returned from each.
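Here is the kind of function I mean (my example; the two return statements hand back two different named objects):

  #include <string>

  std::string pick(bool flag) {
     std::string yes("affirmative");
     std::string no("negative");
     if (flag)
        return yes;  // two different named objects may be returned,
     return no;      // so the compiler can't construct either one
                     // directly in the caller's space: NRVO doesn't apply
  }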

But notice an interesting thing: If (N)RVO were mandated by the language and applicable in every case, we could almost implement auto_ptr without the rvalue reference copy constructor. We wouldn’t need to reach back to the source auto_ptr from the copy constructor to turn off its destruction. The destructor (together with the copy constructor) would be guaranteed to be elided.

In fact, if we don’t insist on using the caller’s space for creating the return value, we could cover the case of multiple return statements. And how would we transfer bits between the two spaces? We’d do a shallow, bitwise copy!

An alternative return-by-value scheme

Here’s the scheme that was proposed for the D programming language (a sketch of the lowering follows the list):

  1. The compiler adds a hidden parameter to every function that returns a non-trivial data structure by value. This parameter is the address of the stack space allocated by the caller for the return value. (This is temporary space for the rvalue returned by the function.) This part is the same as in the C++ implementation.
  2. The function allocates its local variables on the stack, as usual, including the ones that are (or might be–in case of multiple return statements) returned. RVO may be used to eliminate some of these allocations. This is still in common with C++.
  3. When executing the return statement, the value to be returned is bit-copied into the caller’s space.
  4. No copy constructor is called. No destructor is called for the variable that’s been returned.
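
Here is a rough sketch of that lowering, written as plain C-style C++ (this is only my illustration of the idea, not actual compiler output):

  #include <cstring>

  struct S { int a; int b; };

  // What the programmer writes (conceptually):
  //    S factory() { S src; /* ... */ return src; }
  //
  // What the compiler conceptually generates: the caller passes the address
  // of the stack space it reserved for the result (the hidden parameter).
  void factory_lowered(S* ret) {
     S src;                              // ordinary local variable
     src.a = 1;                          // ... the body runs as usual ...
     src.b = 2;
     std::memcpy(ret, &src, sizeof(S));  // return: bit-copy into the caller's space
     // no copy constructor is called, and no destructor runs for src
  }

  // caller's code (conceptually):
  //    S result;                 // space allocated by the caller
  //    factory_lowered(&result);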

Look ma! No copy constructor!

What is amazing about this scheme is that it may be applied to all situations where the source of the copy is an rvalue. This is also true when passing an rvalue to a function, as in this example:

  struct S;
  void f(S s); // by value
  // caller's code
  f(S(1, 2)); // passing an rvalue

Suddenly there is no need whatsoever for an rvalue-taking copy constructor because it would never be called. Since an rvalue-taking copy constructor was the main motivation for introducing rvalue references to C++, in D we can get away without them.

The idea of passing rvalues without calling a copy constructor takes some getting used to. Notice, however, that even in C++ there is no guarantee that the copy constructor will be called (remember RVO?).

But what about non-trivial copy constructors, like in ref_ptr (a reference-counting smart pointer)? Surely it must be called to update the reference count, right? Not when the source is an rvalue! Remember that we are eliding not only the copy constructor (which increments the reference count) but also the destructor of the source (which decrements the reference count). The net result is no change to the reference count.
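To see why, here is a bare-bones sketch of such a reference-counting pointer (a real ref_ptr would also need an assignment operator and more; this is just enough to show the counting):

  template<class T>
  struct ref_ptr {
     T*   ptr;
     int* count;

     explicit ref_ptr(T* p) : ptr(p), count(new int(1)) {}

     ref_ptr(const ref_ptr& other)            // copy constructor: ++count
        : ptr(other.ptr), count(other.count) { ++*count; }

     ~ref_ptr() {                             // destructor: --count
        if (--*count == 0) { delete ptr; delete count; }
     }
  };
  // When copying from an rvalue, D elides both the copy constructor (the ++)
  // and the source's destructor (the --), so the count comes out unchanged.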

This is a very important optimization, especially in multithreaded code, where reference counting incurs expensive synchronization overhead.

Location transparency

There is one gotcha in this scheme–we are assuming that a shallow copy will always result in a valid object. This is not true if the object contains a pointer to itself. A shallow copy will duplicate the pointer; the object itself will move to a new location, but the pointer will still point at the old location. This could be a disaster.
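Here is a contrived example of such an object (mine, for illustration):

  #include <cstring>

  struct SelfRef {
     char  buffer[16];
     char* current;               // points into the object itself

     SelfRef() : current(buffer) {}
  };

  int main() {
     SelfRef a;
     SelfRef b;
     std::memcpy(&b, &a, sizeof(SelfRef));  // shallow, bitwise copy
     // b sits at a new location, but b.current still points into a.buffer --
     // the old location
  }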

For this and several other reasons D assumes location transparency. An object cannot point at itself (or any of its shallow members), because then it couldn’t be transparently moved to a new location.

Unfortunately this requirement cannot be enforced statically. A dynamic check during copying is possible, but it’s not clear that the overhead is always worth paying. This is still an open design area.
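Such a check could, for instance, verify during the copy that no pointer member points back into the source object's own footprint (this is pure speculation on my part, just to show the kind of overhead involved):

  #include <cstddef>
  #include <cstdint>

  // Does p point into the object that occupies [obj, obj + size)?
  inline bool points_into(const void* p, const void* obj, std::size_t size) {
     std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
     std::uintptr_t o = reinterpret_cast<std::uintptr_t>(obj);
     return a >= o && a - o < size;
  }
  // A checked copy would assert !points_into(member, &src, sizeof src)
  // for every pointer member -- correct, but not free.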

Notice that, in D, the scope of this problem is narrowed to structs and fixed arrays that contain pointers. There is no direct way to manipulate the location of class objects (they are always accessed indirectly)–except by the garbage collector. A generational garbage collector that moves objects in memory could definitely profit from location transparency.

The binding table

The binding table in D is slightly different than the one in C++:

  • references bind to lvalues (same as C++)
  • const references bind only to lvalues (different)
  • “values” (arguments passed by value) bind to
    • lvalues (same as C++)
    • rvalues (same as C++, except that no copy constructor is called)

Notice that in D we don’t want a const reference to bind to an rvalue. Binding const references to rvalues is an inherently dangerous feature of C++. Consider the following C++ example:

  struct S;
  const S& f(const S& x) {
     return x;
  }
  // caller's code
  const S& y = f(S(1, 2));
  y.access(); // kaboom!

The temporary S(1, 2) is destroyed at the end of the full expression, leaving y a dangling reference. Instead, if you want to have a version of a function that binds to rvalues, you declare its argument as passed by value, as in this D example:

  S f(S x) {
    return x;
  }
  // caller's code
  auto x = f(S(1, 2));
  x.access; // fine, it's a copy

In the worst case, the overhead of the D solution is a bitwise copy of a struct (which can often be optimized away), versus the indirect access of the C++ solution. But considering that it buys us safety, it’s a trade-off well worth making.

I’d like to thank Andrei Alexandrescu for his comments and, in general, for driving the design of pass-by-value in D.
