Some of the brightest C++ minds had been working on concepts for years, only to vote them out of the C++0x spec–unanimously, not long after they’ve been voted in. In my naivete I decided to write a short blog post explaining why in my opinion it’s still worth pursuing concepts into the next revision of the Standard. I showed a draft to my friends and was overwhelmed by criticism. The topic turned to be so controversial that I was forced not only to defend every single point but to discuss alternative approaches and their pluses. That’s why this post grew so large.

What Are Concepts?

In a nutshell, concepts are a type system for types. Below I have tabled corresponding features and examples of non-generic programming with types vs. generic programming with concepts. The concept Iterator specifies the “type” of the type parameter, T, which is used in the definition of the template function, Count.

Regular Programming Generic Programming
functions, data structures template functions, parametrized types
Example:
int strlen(char const * s)
template<Iterator T>
void Count(T first, T last)
parameter: s type parameter: T
type: char const * concept: Iterator



Just to give you a feel for the syntax, here's an example of a concept--a simplified definition of Iterator:

concept Iterator<typename T>
    : EqualityComparable<T>
{ 
    typename value_type;
    T& operator++(T&);
    ...
}

Iterator illustrates both the simplicity and the complexity of concepts. Let me concentrate on the simple aspects first and leave the discussion of complexities for later.

Iterator refines (inherits from) another concept, EqualityComparable (anything that can be compared using '=='). It declares an associated type, value_type; and an associated function, operator++. Associated types and functions are the bread and butter of concepts.

You might have seen something like an "associated type" implemented as a typedef, value_type, in STL iterators. Of course, such typedefs are only a convention. You may learn about such conventions by analyzing somebody else's code or by reading documentation (it so happens that STL is well documented, but that's definitely not true of all libraries), but they are not part of the language.

You use concepts when you define template functions or parametrized data structures. For instance, here's a toy function template, Count:

template<Iterator T> int Count(T first, T last) {
    int count = 0;
    while (!(first == last)) { 
        ++first;
        ++count;
    }
    return count;
}

As you can see, this function uses two properties of the concept Iterator: you can compare two iterators for equality and you can increment an iterator.

Ignoring the fine points for now, this seems to be pretty straightforward. Now, let's see what problems were concepts designed to solve.

Who Ordered Concepts?

Besides being an elegant abstraction over the type system, concepts offer some of the same advantages that types do:

  • Better error reporting
  • Better documentation
  • Overloading

Error Reporting

Why do templates produce such horrible error messages? The main reason is that errors are detected too late.

It's the same phenomenon you have in weakly typed languages, except that there error detection is postponed even further--till runtime. In weakly typed languages an error is discovered when, for instance, a certain operation cannot be performed on a certain variable. This might happen deep into the calling tree, often inside a library you're not familiar with. You essentially need to examine a stack trace to figure out where the root cause of the problem is and whether it's in your code or inside a library.

Conversely, in statically typed languages, most errors are detected during type-checking. The fact that a certain operation cannot be performed on a certain variable is encoded in the type of that variable. If the error is indeed in your code, it will be detected at the call site and will be much more informative.

In current C++, template errors are detected when a certain type argument does not support a certain operation (often expressed as the inability to convert one type to another or a failure of type deduction). It often happens deep into the instantiation tree. You need an equivalent of the stack trace to figure out where the root cause of the error is. It's called an "instantiation stack" and is dumped by the compiler upon a template error. It often spans several pages and contains unfamiliar names and implementation details of some library code.

I suspect that the main reason a lot of C++ programmers still shun template programming is because they were burnt by such error messages. I know that I take a coffee break before approaching a template error message.

Self Documentation

The only thing better than early error detection is not making the error in the first place. When you're calling a function, you usually look at its signature first. The signature tells you what types are expected and what type is returned. You don't have to study the body of the function and the bodies of functions that it calls (and so on). You don't have to rely on comments or off-line documentation. Signatures are the kind of documentation that is checked by the compiler. Even more importantly, at some point the compiler also checks the body of the function you're calling to see if its arguments are used in accordance with their declared types.

Contrast it with template programming. If the "signature" of a template is of the form template<class T>, you have no information about T. What kind of T's are acceptable can only be divined by studying the template's body and the bodies of templates it is instantiating. (And, of course, the compiler can do nothing to check this "signature" against the template body.)

The Case of STL

Even if you're not defining templates in your own code, you probably use them through the Standard Library--STL in particular. STL is a library of algorithms and containers. Alexander Stepanov's stroke of genius was to separate the two. Using STL you may apply one of the 112 algorithms to 12 standard containers. It's easy to calculate that there are 1344 possible combinations (not counting those algorithms that operate on two or three containers at a time).

Of course not all combinations of algorithms and containers make sense. For instance, you can't apply std::sort to a std::list. The reason is simple: sort (which is based on quicksort) requires random access to the container; a list can only provide sequential access. And this is the crux of the matter: this important restriction on access cannot be expressed in C++. Granted, the code will not compile because at some point sort will try to perform some operation that the list iterator doesn't support. You'll get a page-long error message. By my last count a leading C++ compiler emits 73 cryptic lines, many of them stretching to 350 characters, and nary a mention of random access being required by sort.

Template error messages would be even worse if not for a system of hacks (I mean, template tricks) and concept-like conventions in the STL. For instance, a dummy field iterator_tag corresponds to a concept name and iterator traits look a lot like concept maps. With a number of new features in C++0x (especially decltype that make SFINAE constructs such as the enable_if more powerful), even more "tricks" became possible. If some of this sounds like Gibberish to you, you're in good company. That's the mess that the concept proposal was trying to fix. Concepts do introduce a lot of complexity into the language but they reduce and organize the complexity of programs.

It's worth mentioning that Alex Stepanov collaborated in the development of the concept description language, Tecton, and participated in the C++ concept effort, so he's been acutely aware of the problems with STL.

Competing Options

The idea of limiting types that can be passed to generic algorithms or data structures has been around for some time. The C++ concept proposal is based on the Haskell's notion of type classes. Several other languages opted for simpler constrained templates instead.

Constrained Templates

One way to look at concepts is that they constrain the kind of types that may be passed to a template function or data type. One can drop concepts and instead impose type constraints directly in the definition of a template. For instance, rather than defining the Iterator concept, we could have tried to impose analogous type constraints in the definition of Count (C++ doesn't have special syntax for it but, of course, there are template tricks for everything).

There are various ways of expressing type constraints, so let's look at a few examples. In Java, everything is an object, and objects are classified by classes and interfaces. Java reuses the same class/interface mechanism for constraining template parameters. A generic SortedList, for instance, is parametrized by the type of its elements, T. The constraint imposed on type T is that it implements the interface Comparable.

class SortedList<T extends Comparable<T>>

This approach breaks down for built-in types since, for instance, int cannot be derived from Comparable. (Java's workaround is to use boxed types, like Integer, which are magically derived from Comparable.)

C# has similar approach, with some restrictions and some extensions. For instance, C# lets you specify that a given type is default constructible, as in this example:

class WidgetFactory<T> where T : Widget, new()

Here type T must be derived from Widget and it must have a default constructor.

In the D programming language, template constraints may be expressed by arbitrary compile-time functions as well as use-patterns, which are the topic of the next section.

These approaches are constrained to structural (as opposed to semantic) matching of types. A type is accepted by a constrained template if the operations supported by it match those specified by the constraints name-by-name. If one library names the same operations differently, they won't match (see how Concept Maps deal with this problem).

Use Patterns vs. Signatures

Use patterns formed the basis on the original Texas proposal for C++ Concepts. The idea was to show to the compiler representative examples of what you were planning to do with type arguments. These use patterns look almost like straightforward C++. For instance, an Iterator concept might be defined as follows:

concept Iterator<typename Iter, typename Value> {
    Var<Iter> i;
    Iter j = i; // copyable
    bool b = (i != j); // (non)-equality comparable
    ++i; // supports operator++
    Var<const Value> v;
    *i = v; // connects Value type with Iter type
    ...
}

Notice the subtlety with defining dummy variables i and v. You can't just write:

Iter i;

because that would imply that Iter has a default constructor-- which is not required. Hence special notation Var<T> for introducing dummy variables. In general, the use-pattern language was supposed to be a subset of C++ with some additional syntax on the side. To my knowledge this subset has never been formally defined. Also, there are some useful concepts that are hard or impossible to define in use-pattern language (see the sidebox).

Then there is the problem of concept-checking the body of a template before it is instantiated-- the holy grail of conceptology. It involves some nontrivial analysis and comparison of two parsing trees (the pattern's, and the template body's). As far as I know no formal algorithm has been presented.

Because of the difficulties in formalizing the use-pattern approach, the C++ committee decided to focus on defining concepts using signatures.

We've seen such a signature in the Iterator concept: T&operator++(T&).

How are we to interpret a signature within the context of a concept? Naively, we'd require that there should exist a free function called operator++. But what if the type T has a method operator++ instead? Do we have to list it too (as T::operator++())? A uniform call syntax was proposed that would unify functions and method calls. Our operator++ would match a function or a method (this feature didn't make it into the final concept proposal.)

Also, an operator signature must also match the built-in operator--this way we'd allow an arbitrary pointer to match the Iterator concept.

These are some of the complications I mentioned at the beginning of this post. There are also complications related to symbol lookup (it has to take into account the Argument-Dependent-Lookup hack) and overload resolution, which are notoriously difficult in C++.

Who Ordered Concept Maps?

So far we've talked about concepts and about templates that use concepts. Now let's focus on what happens when we instantiate a template with concrete types. How does the compiler check that those types fulfill the constraints of the concepts? One possibility is the "duck-typing" approach--if the type in question defines the same associated types and functions (and possibly some other constraints I haven't talked about), it is a match. This is called structural (same set of functions) and nominal (by-name) matching.

Structural matching is simple and straightforward but it might lead to "false positives." Here's the archetypal example: Syntactically, there is no difference between a ForwardIterator and an InputIterator, yet semantically those two are not equivalent. Some algorithms that work fine onForwardIterators will fail on InputIterators (the difference is that you can't go twice through an InputIterator).

Semantic matching, on the other hand, requires the client to explicitly match types with concepts (only the clients knows the meaning of a particular concept). This is done through concept maps (in Haskell they are called "instances"). With this approach, the mapping between Iterator and a pointer to T would require the following statement:

template<T>
concept_map Iterator<T *> {
    typedef T value_type;
};

Notice that we don't have to provide the mapping for operator++; it's automatically mapped into the built-in ++. In general, if the associated functions have the same names as the functions defined for the particular type (like ++ in this case), the concept map may be empty. It's mostly because of the need to create those empty concept maps that the concept proposal was rejected.

Interestingly, both groups that worked on concepts-- the Texas group with Bjarne Stroustrup and the Indiana group with Doug Gregor-- supported concept maps. The difference was that the Texas group wanted concepts to be matched automatically, unless marked as explicit; whereas the Indiana group required explicit concept maps, unless the concept was marked auto.

On a more general level, concept maps enable cross-matching between different libraries and naming conventions. Semantically, a concept from one library may match a type from another library, but the corresponding functions and types may have different names. For instance, a StackOfInt concept from one library may declare an associated function called push. A vector from the Standard Library, conceptually, implements this interface, but uses a different name for it. A concept map would provide the appropriate mapping:

concept_map StackOfInt<std::vector<int>> {
    void push(std::vector<int> & v, int x) {
        v.push_back(x);
    }
}

Conclusion

In my opinion, concepts would add a lot to C++. They would replace conventions and hacks in the STL with a coherent set of language rules. They would provide verifiable documentation and drastically simplify template error messages.

Concept maps have an important role to play as the glue between concepts and types. With concept maps you'll be able to add, post hoc, a translation layer between disparate libraries. The back-and-forth between explicit and auto concepts has a philosophical aspect: Do we prefer precision or recall in concept matching? It also has a practical aspect: Is writing empty concept maps an abomination? More experience is needed to resolve this issue.

The major complaint I've heard about the concept proposal was that it was too complex. Indeed, it took 30 pages to explain it in the Draft Standard. However, adding a new layer of abstraction to an existing language, especially one with such baggage as C++, cannot be trivial. The spec had to take into account all possible interaction with other language features, such as name resolution, overload resolution, ADL, various syntactic idiosyncrasies, and so on. But programmers rarely read language specs. I'm sure that, if there was a book "Effective Concepts" or some such, programmers would quickly absorb the rules and start using concepts.

The current hope is that concepts will become part of the 1x version of C++. Doug Gregor is working on a new version of the C++ compiler, which is likely to become testing grounds for concepts. He also publishes a blog with Dave Abrahams (see for instance this post and the follow up).

On the other hand, the concept effort has definitely lost its momentum after it was voted out of C++0x. Corporate support for further research is dwindling. The hope is in the educated programmers saying no to hacks and demanding the introduction of concepts in future versions of the C++ Standard.

Acknowledgments

I'm grateful to the "Seattle Gang", in particular (alphabetically) Andrei Alexandrescu, Walter Bright, Dave Held, Eric Niebler, and Brad Roberts, for reading and commenting on several drafts of this post. I'd like to thank Jeremy Siek for interesting discussions and the inspiration for this post.

Bibliography

  1. Gabriel Dos Reis and Bjarne Stroustrup, Specifying C++ Concepts, the original Texas proposal with use patterns.
  2. Joint paper by the Indiana and Texas groups, Concepts: Linguistic Support for Generic Programming in C++
  3. Google Video of Doug Gregor talking about concepts.
  4. 2009 Draft C++ Standard with concepts still in. See section 14.10.
  5. Bjarne Stroustrup, The concept killer article.
  6. Jeremy Siek, The C++0x Concept Effort presentation slides.
  7. Posts about concepts on C++Next by Dave Abrahams and Doug Gregor.
  8. Dave Abrahams, To Auto or Not?.
  9. Examples of concept-like hacks in C++0x.
About these ads