The series of posts about so-called benign data races stirred a lot of controversy and led to numerous discussions at the startup I was working at, called Corensic. Two bastions formed, one claiming that no data race was benign, and the other claiming that data races were essential for performance. Then it turned out that we couldn’t even agree on the definition of a data race. In particular, the C++11 definition seemed to deviate from the established notions.

What Is a Data Race Anyway?

First of all, let’s make sure we know what we’re talking about. In current usage a data race is synonymous with a low-level data race, as opposed to a high-level race that involves either multiple memory locations, or multiple accesses per thread. Everybody agrees on the meaning of data conflict, which is multiple threads accessing the same memory location, at least one of them through a write. But a data conflict is not necessarily a data race. In order for it to become a race, one more condition must be true: the access has to be “simultaneous.”

Unfortunately, simultaneity is not a well defined term in concurrent systems. Leslie Lamport was the first to observe that a distributed system follows the rules of Special Relativity, with no independent notion of simultaneity, rather than those of Galilean Mechanics, with its absolute time. So, really, what defines a data race is up to your notion of simultaneity.

Maybe it’s easier to define what isn’t, rather than what is, simultaneous? Indeed, if we can tell which event happened before another event, we can be sure that they weren’t simultaneous. Hence the use of the famous “happened before” relationship in defining data races. In Special Relativity this kind of relationship is established by the exchange of messages, which can travel no faster than the speed of light. The act of sending a message always happens before the act of receiving the same message. In concurrent programming this kind of connection is made using synchronizing actions. Hence an alternative definition of a data race: A memory conflict without intervening synchronization.

The simplest examples of synchronizing actions are the taking and the releasing of a lock. Imagine two threads executing this code:

  mutex.lock();
  x = x + 1;
  mutex.unlock();

In any actual execution, accesses to the shared variable x from the two threads will be separated by a synchronization. The happens-before (HB) arrow will always go from one thread releasing the lock to the other thread acquiring it. For instance in:

  #  Thread 1           Thread 2
  1  mutex.lock();
  2  x = x + 1;
  3  mutex.unlock();
  4                     mutex.lock();
  5                     x = x + 1;
  6                     mutex.unlock();

the HB arrow goes from 3 to 4, clearly separating the conflicting accesses in 2 and 5.
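
To make this concrete, here is a small runnable version of the example (a sketch of mine, not part of the original text): both threads increment x under the same mutex, so every pair of conflicting accesses is separated by an unlock-to-lock happens-before edge.

#include <iostream>
#include <mutex>
#include <thread>

int x = 0;
std::mutex mtx;

void increment() {
    mtx.lock();
    x = x + 1;       // the only access to x, always under the lock
    mtx.unlock();
}

int main() {
    std::thread t1(increment);
    std::thread t2(increment);
    t1.join();
    t2.join();
    std::cout << x << '\n';   // always prints 2: no data race, no lost update
}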

Notice the careful choice of words: “actual execution.” The following execution that contains a race can never happen, provided the mutex indeed guarantees mutual exclusion:

  #  Thread 1           Thread 2
  1  mutex.lock();
  2                     mutex.lock();
  3  x = x + 1;         x = x + 1;
  4  mutex.unlock();
  5                     mutex.unlock();

It turns out that the selection of possible executions plays an important role in the definition of a data race. In every memory model I know of, only sequentially consistent executions are tried in testing for data races. Notice that non-sequentially-consistent executions may actually happen, but they do not enter the data-race test.

In fact, most languages try to provide the so called DRF (Data Race Free) guarantee, which states that all executions of data-race-free programs are sequentially consistent. Don’t be alarmed by the apparent circularity of the argument: you start with sequentially consistent executions to prove data-race freedom and, if you don’t find any data races, you conclude that all executions are sequentially consistent. But if you do find a data race this way, then you know that non-sequentially-consistent executions are also possible.


DRF guarantee. If there are no data races for sequentially consistent executions, there are no non-sequentially consistent executions. But if there are data races for sequentially consistent executions, the non-sequentially consistent executions are possible.

As you can see, in order to define a data race you have to precisely define what you mean by “simultaneous,” or by “synchronization,” and you have to specify to which executions your definition may be applied.

The Java Memory Model

In Java, besides traditional mutexes that are accessed through “synchronized” methods, there is another synchronization device called a volatile variable. Any access to a volatile variable is considered a synchronization action. You can draw happens-before arrows not only between consecutive unlocks and locks of the same object, but also between consecutive accesses to a volatile variable. With this extension in mind, Java offers the traditional DRF guarantee. The semantics of data-race-free programs is well defined in terms of sequential consistency, thus making every Java programmer happy.

But Java didn’t stop there, it also attempted to provide at least some modicum of semantics for programs with data races. The idea is noble–as long as programmers are human, they will write buggy programs. It’s easy to proclaim that any program with data races exhibits undefined behavior, but if this undefined behavior results in serious security loopholes, people get really nervous. So what the Java memory model guarantees on top of DRF is that the undefined behavior resulting from data races cannot lead to out-of-thin-air values appearing in your program (for instance, security credentials for an intruder).

It is now widely recognized that this attempt to define the semantics of data races has failed, and the Java memory model is broken (I’m citing Hans Boehm here).

The C++ Memory Model

Why is it so important to have a good definition of a data race? Is it because of the DRF guarantee? That seems to be the motivation behind the Java memory model. The absence of data races defines a subset of programs that are sequentially consistent and therefore have well-defined semantics. But these two properties, being sequentially consistent and having well-defined semantics, are not necessarily the same. After all, Java tried (albeit unsuccessfully) to define semantics for non-sequentially-consistent programs.

So C++ chose a slightly different approach. The C++ memory model is based on partitioning all programs into three categories:

  1. Sequentially consistent,
  2. Non-sequentially consistent, but with defined semantics, and
  3. Incorrect programs with undefined semantics

The first category is very similar to race-free Java programs. The place of Java volatile is taken by C++11 default atomic. The word “default” is crucial here, as we’ll see in a moment. Just like in Java, the DRF guarantee holds for those programs.

It’s the second category that’s causing all the controversy. It was introduced not so much for security as for performance reasons. Sequential consistency is expensive on most multiprocessors. This is why many C++ programmers currently resort to “benign” data races, even at the risk of undefined behavior. Hans Boehm’s paper, How to miscompile programs with “benign” data races, delivered a death blow to such approaches. He showed, example by example, how legitimate compiler optimizations may wreak havoc on programs with “benign” data races.

Fortunately, C++11 lets you relax sequential consistency in a controlled way, which combines high performance with the safety of well-defined (if complex) semantics. So the second category of C++ programs use atomic variables with relaxed memory ordering semantics. Here’s some typical syntax taken from my previous blog post:

std::atomic<int> owner{0};
...
owner.load(std::memory_order_relaxed);

And here’s the controversial part: According to the C++ memory model, relaxed memory operations, like the above load, don’t contribute to data races, even though they are not considered synchronization actions. Remember one of the versions of the definition of a data race: Conflicting actions without intervening synchronization? That definition doesn’t work any more.

The C++ Standard decided that only conflicts for which there is no defined semantics are called data races.

Notice that some forms of relaxed atomics may introduce synchronization. For instance, a write access with memory_order_release “happens before” another access with memory_order_acquire, if the latter follows the former in a particular execution (but not if they are reversed!).
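
To illustrate (this sketch is mine, not from the post): in the classic publication pattern the release store synchronizes with the acquire load that reads its value, so the reader is guaranteed to see the payload.

#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;
std::atomic<bool> ready{false};

void producer() {
    payload = 42;                                  // ordinary, non-atomic write
    ready.store(true, std::memory_order_release);  // publishes the payload
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) // synchronizes with the release store
        ;                                          // spin until the flag is seen
    assert(payload == 42);                         // guaranteed by happens-before
}

int main() {
    std::thread t1(producer);
    std::thread t2(consumer);
    t1.join();
    t2.join();
}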

Conclusion

What does it all mean for the C++11 programmer? It means that there no longer is an excuse for data races. If you need benign data races for performance, rewrite your code using weak atomics. Weak atomics give you the same kind of performance as benign data races but they have well defined semantics. Traditional “benign” races are likely to be broken by optimizing compilers or on tricky architectures. But if you use weak atomics, the compiler will apply whatever means necessary to enforce the correct semantics, and your program will always execute correctly. It will even naturally align atomic variables to avoid torn reads and writes.
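
For instance, a “lossy” statistics counter that used to be coded as a benign race can be rewritten with a relaxed atomic. This is only a sketch of the idea: the relaxed operations impose no ordering on surrounding code, but the counter can no longer be torn, optimized away, or reused by the compiler as scratch storage.

#include <atomic>

std::atomic<long> hits{0};

void onRequest() {
    hits.fetch_add(1, std::memory_order_relaxed);  // atomic increment, no synchronization
}

long roughCount() {
    return hits.load(std::memory_order_relaxed);   // may be slightly stale, never garbage
}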

What’s more, since C++11 has well defined memory semantics, compiler writers are no longer forced to be conservative with their optimizations. If the programmer doesn’t specifically mark shared variables as atomic, the compiler is free to optimize code as if it were single-threaded. So all those clever tricks with benign data races are no longer guaranteed to work, even on relatively simple architectures, like the x86. For instance, the compiler is free to use your lossy counter or a binary flag for its own temporary storage, as long as it restores it back later. If other threads access those variables through racy code, they might see arbitrary values as part of the “undefined behavior.” You have been warned!


This post is based on the talk I gave in Moscow, Russia, in February 2015 to an audience of C++ programmers.

Let’s agree on some preliminaries.

C++ is a low level programming language. It’s very close to the machine. C++ is engineering at its grittiest.

Category theory is the most abstract branch of mathematics. It’s very very high in the layers of abstraction. Category theory is mathematics at its highest.

So why have I decided to speak about category theory to C++ programmers? There are many reasons.

The main reason is that category theory captures the essence of programming. We can program at many levels, and if I ask somebody “What is programming?” most C++ programmers will probably say that it’s about telling the computer what to do. How to move bytes from memory to the processor, how to manipulate them, and so on.

But there is another view of programming and it’s related to the human side of programming. We are humans writing programs. We decide what to tell the computer to do.

We are solving problems. We are finding solutions to problems and translating them in the language that is understandable to the computer.

But what is problem solving? How do we, humans, approach problem solving? It was only a recent development in our evolution that we have acquired these fantastic brains of ours. For hundreds of millions of years not much was happening under the hood, and suddenly we got this brain, and we used this brain to help us chase animals, shoot arrows, find mates, organize hunting parties, and so on. It’s been going on for a few hundred thousand years. And suddenly the same brain is supposed to solve problems in software engineering.

So how do we approach problem solving? There is one general approach that we humans have developed for problem solving. We had to develop it because of the limitations of our brain, not because of the limitations of computers or our tools. Our brains have this relatively small cache memory, so when we’re dealing with a huge problem, we have to split it into smaller parts. We have to decompose bigger problems into smaller problems. And this is very human. This is what we do. We decompose, and then we attack each problem separately, find the solution; and once we have solutions to all the smaller problems, we recompose them.

So the essence of programming is composition.

If we want to be good programmers, we have to understand composition. And who knows more about composing than musicians? They are the original composers!

So let me show you an example. This is a piece by Johann Sebastian Bach. I’ll show you two versions of this composition. One is low level, and one is high level.

The low level is just sampled sound. These are bytes that approximate the waveform of the sound.

[Image: the sampled waveform of the piece]

And this is the same piece in typical music notation.

[Image: the same piece in musical notation (Gavotte en Rondeau)]

Which one is easier to manipulate? Which one is easier to reason about? Obviously, the high level one!

Notice that, in the high level language, we use a lot of different abstractions that can be processed separately. We split the problem into smaller parts. We know that there are things called notes, and they can be reproduced, in this particular case, using violins. There are also some letters like E, A, B7: these are chords. They describe harmony. There is melody, there is harmony, there is the bass line.

Musicians, when they compose music, use higher level abstractions. These higher level abstractions are easier to manipulate, reason about, and modify when necessary.

And this is probably what Bach was hearing in his head.

And he chose to represent it using the high level language of musical notation.

Now, if you’re a rap musician, you work with samples, and you learn how to manipulate the low level description of music. It’s a very different process. It’s much closer to low-level C++ programming. We often do copy and paste, just like rap musicians. There’s nothing wrong with that, but sometimes we would like to be more like Bach.

So how do we approach this problem as programmers and not as musicians? We cannot use musical notation to lift ourselves to higher levels of abstraction. We have to use mathematics. And there is one particular branch of mathematics, category theory, that is exactly about composition. If programming is about composition, then this is what we should be looking at.

Category theory, in general, is not easy to learn, but the basic concepts of category theory are embarrassingly simple. So I will talk about some of those embarrassingly simple concepts from category theory, and then explain how to use them in programming in some weird ways that would probably not have occurred to you when you’re programming.

Categories

So what is this concept of a category? Two things: objects and arrows between objects.

In category theory you don’t ask what these objects are. You call them objects, you give them names like A, B, C, D, etc., but you don’t ask what they are or what’s inside them. And then you have arrows that connect objects. Every arrow starts at some object and ends at some object. You can have many arrows going between two objects, or none whatsoever. Again, you don’t ask what these arrows are. You just give them names like f, g, h, etc.

And that’s it—that’s how you visualize a category: a bunch of objects and a bunch of arrows between them.

There are some operations on arrows and some laws that they have to obey, and they are also very simple.

Since composition is the essence of category theory (and of programming), we have to define composition in a category.

Composition

Whenever you have an arrow f going from object A to object B, here represented by two little piggies, and another arrow g going from object B to object C, there is an arrow called their composition, g ∘ f, that goes directly from object A to object C. We pronounce this “g after f.”

Composition is part of the definition of a category. Again, since we don’t know what these arrows are, we don’t ask what composition is. We just know that for any two composable arrows — such that the end of one coincides with the start of the other — there exists another arrow that’s their composition.

And this is exactly what we do when we solve problems. We find an arrow from A to B — that’s our subproblem. We find an arrow from B to C, that’s another subproblem. And then we compose them to get an arrow from A to C, and that’s a solution to our bigger problem. We can repeat this process, building larger and larger solutions by solving smaller problems and composing the solutions.

Notice that when we have three arrows to compose, there are two ways of doing that, depending on which pair we compose first. We don’t want composition to have history. We want to be able to say: This arrow is a composition of these three arrows: h after g after f, without having to use parentheses for grouping. That’s called associativity:

 h ∘ (g ∘ f) = (h ∘ g) ∘ f

Composition in a category must be associative.

And finally, every object has to have an identity arrow. It’s an arrow that goes from the object back to itself. You can have many arrows that loop back to the same object. But there is always one such loop for every object that is the identity with respect to composition.

Identity

It has the property that if you compose it with any other arrow that’s composable with it — meaning it either starts or ends at this object — you get that arrow back. It acts like multiplication by one. It’s an identity — it doesn’t change anything.

Monoid

I can immediately give you an example of a very simple category that I’m sure you know very well and have used all your adult life. It’s called a monoid. It’s another embarrassingly simple concept. It’s a category that has only one object. It may have lots of arrows, but all these arrows have to start at this object and end at this object, so they are all composable. You can compose any two arrows in this category to get another arrow. And there is one arrow that’s the identity. When composed with any other arrow it will give you back the same arrow.


There are some very simple examples of monoids. We have natural numbers with addition and zero. An arrow corresponds to adding a number. For instance, you will have an arrow that corresponds to adding 5. You compose it with an arrow that corresponds to adding 3, and you get an arrow that corresponds to adding 8. The identity arrow corresponds to adding zero.

Multiplication forms a monoid too. The identity arrow corresponds to multiplying by 1. The composition rule for these arrows is just a multiplication table.

Strings form another interesting monoid. An arrow corresponds to appending a particular string. The unit arrow appends the empty string. What’s interesting about this monoid is that it has no additional structure. In particular, it doesn’t have an inverse for any of its arrows. There are no “negative” strings. There is no anti-“world” string that, when appended to “Hello world”, would result in the string “Hello”.

In each of these monoids, you can think of the one object as being a set: a set of all numbers, or a set of all strings. But that’s just an aid to imagination. All information about the monoid is in the composition rules — the multiplication table for arrows.

In programming we encounter monoids all over the place. We just normally don’t call them that. But every time you have something like logging, gathering data, or auditing, you are using a monoid structure. You’re basically adding some information to a log, appending, or concatenating, so that’s a monoidal operation. And there is an identity log entry that you may use when you have nothing interesting to add.
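
To make this concrete, here is the log monoid spelled out in C++ (a sketch of mine; the names mappend and mempty are borrowed from Haskell, not from the talk): an associative binary operation plus its unit.

#include <string>

using Log = std::string;

Log mappend(Log const & a, Log const & b) {
    return a + b;        // concatenation is associative
}

Log mempty() {
    return "";           // the unit: appending it changes nothing
}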

Types and Functions

So a monoid is one example, but there is something closer to our hearts as programmers, and that’s the category of types and functions. And the funny thing is that this category of types and functions is actually almost enough to do programming, and in functional languages that’s what people do. In C++ there is a little bit more noise, so it’s harder to abstract this part of programming, but we do have types — it’s a strongly typed language, modulo implicit conversions. And we do have functions. So let’s see why this is a category and how it’s used in programming.

This category is actually called Set — a category of sets — because, to the lowest approximation, types are just sets of values. The type bool is a set of two values, true and false. The type int is a set of integers from something like negative two billion to two billion (on a 32-bit machine). All types are sets: whether it’s numbers, enums, structs, or objects of a class. There could be an infinite set of possible values, but it’s okay — a set may be infinite. And functions are just mappings between these sets. I’m talking about the simplest functions, ones that take just one value of some type and return another value of another type. So these are arrows from one type to another.

Can we compose these functions? Of course we can. We do it all the time. We call one function, it returns some value, and with this value we call another function. That’s function composition. In fact this is the basis of procedural decomposition, the first serious approach to formalizing problem solving in programming.

Here’s a piece of C++ code that composes two functions f and g.

C g_after_f(A x) {
    B y = f(x);
    return g(y);
}

In modern C++ you can make this code generic — a higher order function that accepts two functions and returns a third function that’s the composition of the two.
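
Here’s a minimal sketch of what such a generic compose could look like in C++14 (my illustration, not code from the talk):

template<class F, class G>
auto compose(F f, G g) {
    return [f, g](auto x) { return g(f(x)); };   // g after f
}

Calling compose(f, g) produces a callable that behaves just like the hand-written g_after_f above.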

Can you compose any two functions? Yes — if they are composable. The output type of one must match the input type of another. That’s the essence of strong typing in C++ (modulo implicit conversions).

Is there an identity function? Well, in C++ we don’t have an identity function in the library, which is too bad. That’s because there’s a complex issue of how you pass things: is it by value, by reference, by const reference, by move, and so on. But in functional languages there is just one function called identity. It takes an argument and returns it back. But even in C++, if you limit yourself to functions that take arguments by value and return values, then it’s very easy to define a generic identity function.
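
Here’s one such sketch, restricted to passing and returning by value as described above (my code, not the talk’s):

template<class T>
T identity(T x) {
    return x;    // returns its argument unchanged
}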

Notice that the functions I’m talking about are actually a special kind of function called pure functions. They can’t have any side effects. Mathematically, a function is just a mapping from one set to another set, so it can’t have side effects. Also, a pure function must return the same value when called with the same arguments. This is called referential transparency.

A pure function doesn’t have any memory or state. It doesn’t have static variables, and doesn’t use globals. A pure function is an ideal we strive towards in programming, especially when writing reusable components and libraries. We don’t like having global variables, and we don’t like state hidden in static variables.

Moreover, if a function is pure, you can memoize it. If a function takes a long time to evaluate, maybe you’ll want to cache the value, so it can be retrieved quickly next time you call it with the same arguments.
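
Here’s a sketch of memoization for a one-argument pure function (an illustration of mine; it assumes the argument type is hashable and ignores thread safety):

#include <functional>
#include <memory>
#include <unordered_map>

template<class A, class B>
std::function<B(A)> memoize(std::function<B(A)> f) {
    auto cache = std::make_shared<std::unordered_map<A, B>>();
    return [f, cache](A x) {
        auto it = cache->find(x);
        if (it != cache->end())
            return it->second;     // reuse the cached result
        B y = f(x);
        cache->emplace(x, y);      // safe to cache only because f is pure
        return y;
    };
}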

Another property of pure functions is that all dependencies in your code only come through composition. If the result of one function is used as an argument to another then obviously you can’t run them in parallel or reverse the order of execution. You have to call them in that particular order. You have to sequence their execution. The dependencies between functions are fully explicit. This is not true for functions that have side effects. They may look like independent functions, but they have to be executed in sequence, or their side effects will be different.

We know that compiler optimizers will try to rearrange our code, but it’s very hard to do it in C++ because of hidden dependencies. If you have two functions that are not composed, they just calculate different things, and you try to call them in a different order, you might get a completely different result. It’s because of the order of side effects, which are invisible to the compiler. You would have to go deep into the implementation of the functions; you would have to analyse everything they are doing, and the functions they are calling, and so on, in order to find out what these side effects are; and only then you could decide: Oh, I can swap these two functions.

In functional programming, where you only deal with pure functions, you can swap any two functions that are not explicitly composed, and composition is immediately visible.

At this point I would expect half of the audience to leave and say: “You can’t program with pure functions. Programming is all about side effects.” And it’s true. So in order to keep you here I will have to explain how to deal with side effects. But it’s important that you start with something that is easy to understand, something you can reason about, like pure functions, and then build side effects on top of these things, so you can build up abstractions on top of other abstractions.

You start with pure functions and then you talk about side effects, not the other way around.

Auditing

Instead of explaining the general theory of side effects in category theory, I’ll give you an example from programming. Let’s solve this simple problem that, in all likelihood, most C++ programmers would solve using side effects. It’s about auditing.

You start with a sequence of functions that you want to compose. For instance, you have a function logIn. You give it a password and it returns a key. And you have another function, withdraw. You give it a key and it gives you back money. You want to compose these two functions, so you start with a password and you get money. Excellent!

But now you have a new requirement: you want to have an audit trail. Every time one of these functions is called, you want to log something in the audit trail, so that you’ll know what things have happened and in what order. That’s a side effect, right?

How do we solve this problem? Well, how about creating a global variable to store the audit trail? That’s the simplest solution that comes to mind. And it’s exactly the same method that’s used for standard output in C++, with the global object std::cout. The functions that access a global variable are obviously not pure functions; we are talking about side effects.

string audit;

int logIn(string passwd){
  audit += passwd;
  return 42;
}

double withdraw(int key){
  audit += "withdrawing ";
  return 100.0;
}

So we have a string, audit, it’s a global variable, and in each of these functions we access this global variable and append something to it. For simplicity, I’m just returning some fake numbers, not to complicate things.

This is not a good solution, for many reasons. It doesn’t scale very well. It’s difficult to maintain. If you want to change the name of the variable, you’d have to go through all this code and modify it. And if, at some point, you decide you want to log more information, not just a string but maybe a timestamp as well, then you have to go through all this code again and modify everything. And I’m not even mentioning concurrency. So this is not the best solution.

But there is another solution that’s really pure. It’s based on the idea that whatever you’re accessing in a function, you should pass explicitly to it, and then return it, with modifications, from the function. That’s pure. So here’s the next solution.

pair<int, string>
logIn(string passwd, string audit){
  return make_pair(42, audit + passwd);
}

pair<double, string>
withdraw(int key, string audit){
  return make_pair(100.0
                 , audit + "withdrawing ");
}

You modify all the functions so that they take an additional argument, the audit string. And the return type is also changed. When we had an int before, it’s now a pair of int and string. When we had a double before, it’s now a pair of double and string. These functions now call make_pair before they return, and they put in whatever they were returning before, plus they do this concatenation of a new message at the end of the old audit string. This is a better solution because it uses pure functions. They only depend on their arguments. They don’t have any state, they don’t access any global variables. Every time you call them with the same arguments, they produce the same result.

The problem though is that they don’t memoize that well. Look at the function logIn: you normally get the same key for the same password. But if you want to memoize it when it takes two arguments, you suddenly have to memoize it for all possible histories. Even if you call it with the same password, but the audit string is different, you can’t just access the cache, you have to cache a new pair of values. Your cache explodes with all possible histories.

An even bigger problem is security. Each of these functions has access to the complete log, including the passwords.

Also, each of these functions has to care about things that maybe it shouldn’t be bothered with. It knows about how to concatenate strings. It knows the details of the implementation of the log: that the log is a string. It must know how to accumulate the log.

Now I want to show you a solution that maybe is not that obvious, maybe a little outside of what we would normally think of.

pair<int, string>
logIn(string passwd){
  return make_pair(42, passwd);
}

pair<double, string>
withdraw(int key){
  return make_pair(100.0
                  , "withdrawing ");
}

We use modified functions, but they don’t take the audit string any more. They just return a pair of whatever they were returning before, plus a string. But each of them only creates a message about what it considers important. It doesn’t have access to any log and it doesn’t know how to work with an audit trail. It’s just doing its local thing. It’s only responsible for its local data. It’s not responsible for concatenation.

It still creates a pair and it has a modified return type.

We have one problem though: we don’t know how to compose these functions. We can’t pass a pair of key and string from logIn to withdraw, because withdraw expects an int. Of course we could extract the int and drop the string, but that would defeat the goal of auditing the code.

Let’s go back a little bit and see how we can abstract this thing. We have functions that used to return some types, and now they return pairs of the original type and a string. This should in principle work with any original type, not just an int or a double. In functional programming we call this “lifting.” Here we lift some type A to a new type, which is a pair of A and a string. Or we can say that we are “embellishing” the return type of a function by pairing it with a string.

I’ll create an alias for this new parameterised type and call it Writer.

template<class A>
using Writer = pair<A, string>;

My functions now return Writers: logIn returns a writer of int, and withdraw returns a writer of double. They return “embellished” types.

Writer<int> logIn(string passwd){
    return make_pair(42, passwd);
}

Writer<double> withdraw(int key){
    return make_pair(100.0, "withdrawing ");
}

So how do we compose these embellished functions?

In this case we want to compose logIn with withdraw to create a new function called transact. This new function transact will take a password, log the user in, withdraw money, and return the money plus the audit trail. But it will return the audit trail only from those two functions.

Writer<double> transact(string passwd){
  auto p1 = logIn(passwd);
  auto p2 = withdraw(p1.first);
  return make_pair(p2.first
          , p1.second + p2.second);
}

How is it done? It’s very simple. I call the first function, logIn, with the password. It returns a pair of key and string. Then I call the second function, passing it the first component of the pair — the key. I get a new pair with the money and a string. And then I perform the composition. I take the money, which is the first component of the second pair, and I pair it with the concatenation of the two strings that were the second components of the pairs returned by logIn and withdraw.

So the accumulation of the log is done “in between” the calls (think of composition as happening between calls). I have these two functions, and I’m composing them in this funny way that involves the concatenation of strings. The accumulation of the log does not happen inside these two functions, as it happened before. It happens outside. And I can pull out this code and abstract the composition. It doesn’t really matter what functions I’m calling. I can do it for any two functions that return embellished results. I can write generic code that does it and I can call it “compose”.

template<class A, class B, class C>
function<Writer<C>(A)> compose(function<Writer<B>(A)> f
                              ,function<Writer<C>(B)> g)
{
    return [f, g](A x) {
        auto p1 = f(x);
        auto p2 = g(p1.first);
        return make_pair(p2.first
                  , p1.second + p2.second);
    };
}

What does compose do? It takes two functions. The first function takes an A and returns a Writer of B. The second function takes a B and returns a Writer of C. When I compose them, I get a function that takes an A and returns a Writer of C.

This higher order function just does the composition. It has no idea that there are functions like logIn or withdraw, or any other functions that I may come up with later. It takes two embellished functions and glues them together.

We’re lucky that in modern C++ we can work with higher order functions that take functions as arguments and return other functions.

This is how I would implement the transact function using compose.

Writer<double> transact(string passwd){
  return compose<string, int, double>
           (logIn, withdraw)(passwd);
}

The transact function is nothing but the composition of logIn and withdraw. It doesn’t contain any other logic. I’m using this special composition because I want to create an audit trail. And the audit trail is accumulated “between” the calls — it’s in the glue that glues these two functions together.

This particular implementation of compose requires explicit type annotations, which is kind of ugly. We would like the types to be inferred. And you can do it in C++14 using generalised lambdas with return type deduction. This code was contributed by Eric Niebler.

auto const compose = [](auto f, auto g) {
    return [f, g](auto x) {
        auto p1 = f(x);
        auto p2 = g(p1.first);
        return make_pair(p2.first
                    , p1.second + p2.second);
    };
};
Writer<double> transact(string passwd){
  return compose(logIn, withdraw)(passwd);
}

Back to Categories

Now that we’ve done this example, let’s go back to where we started. In category theory we have functions and we have composition of functions. Here we also have functions and composition, but it’s a funny composition. We have functions that take simple types, but they return embellished types. The types don’t match.

Let me remind you what we had before. We had a category of types and pure functions with the obvious composition.

  • Objects: types,
  • Arrows: pure functions,
  • Composition: pass the result of one function as the argument to another.

What we have created just now is a different category. Slightly different. It’s a category of embellished functions. Objects are still types: Types A, B, C, like integers, doubles, strings, etc. But an arrow from A to B is not a function from type A to type B. It’s a function from type A to the embellishment of the type B. The embellished type depends on the type B — in our case it was a pair type that combined B and a string — the Writer of B.

Now we have to say how to compose these arrows. It’s not as trivial as it was before. We have one arrow that takes A into a pair of B and string, and we have another arrow that takes B into a pair of C and string, and the composition should take an A and return a pair of C and string. And I have just defined this composition. I wrote code that does this:

auto const compose = [](auto f, auto g) {
    return [f, g](auto x) {
        auto p1 = f(x);
        auto p2 = g(p1.first);
        return make_pair(p2.first
                    , p1.second + p2.second);
    };
};

So do we have a category here? A category that’s different from the original category? Yes, we do! It has composition and it has identity.

What’s its identity? It has to be an arrow from the object to itself, from A to A. But an arrow from A to A is a function from A to a pair of A and string — to a Writer of A. Can we implement something like this? Yes, easily. We will return a pair that contains the original value and the empty string. The empty string will not contribute to our audit trail.

template<class A>
Writer<A> identity(A x) {
    return make_pair(x, "");
}

Is this composition associative? Yes, it is, because the underlying composition is associative, and the concatenation of strings is associative.

We have a new category. We have incorporated side effects by modifying the original category. We are still only using pure functions and yet we are able to accumulate an audit trail as a side effect. And we moved the side effects to the definition of composition.

It’s a funny new way of looking at programming. We usually see the functions, and the data being passed between functions, and here suddenly we see a new dimension to programming that is orthogonal to this, and we can manipulate it. We change the way we compose functions. We have this new power to change composition. We have a new way of solving problems by moving to these embellished functions and defining a new way of composing them. We can define new combinators to compose functions, and we’ll let the combinators do some work that we don’t want these functions to do. We can factor these things out and make them orthogonal.

Does this approach generalize?

One easy generalisation is to observe that the Writer structure works for any monoid. It doesn’t have to be just strings. Look at how composition and identity are defined in our new category. The only properties of the log we are using are concatenation and unit. Concatenation must be associative for the composition to be associative. And we need a unit of concatenation so that we can define identity in our category. We don’t need anything else. This construction will work with any monoid.

And that’s great because you have one more dimension in which you can modify your code without touching the rest. You can change the format of the log, and all you need to modify in your code is compose and identity. You don’t have to go through all your functions and modify the code. They will still work because all the concatenation of logs is done inside compose.
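
Here’s a sketch of that generalisation (the names WriterM, composeM, and identityM are mine, chosen to avoid clashing with the code above). The only assumptions are that the log type has an associative operator+ and that a default-constructed log is its unit:

#include <utility>

template<class A, class Log>
using WriterM = std::pair<A, Log>;

auto const composeM = [](auto f, auto g) {
    return [f, g](auto x) {
        auto p1 = f(x);
        auto p2 = g(p1.first);
        return std::make_pair(p2.first, p1.second + p2.second);  // monoidal append
    };
};

template<class A, class Log>
WriterM<A, Log> identityM(A x) {
    return std::make_pair(x, Log{});   // the unit of the monoid
}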

Kleisli Categories

This was just a little taste of what is possible with category theory. The thing I called embellishment is called a functor in category theory. You can implement categorical functors in C++. There are all kinds of embellishments/functors that you can use here. And now I can tell you the secret: this funny composition of functions with the funny identity is really a monad in disguise. A monad is just a funny way of composing embellished functions so that they form a category. A category based on a monad is called a Kleisli category.

Are there any other interesting monads that I can use this construction with? Yes, lots! I’ll give you one example. Functions that return futures. That’s our new embellishment. Give me any type A and I will embellish it by making it into a future. This embellishment also produces a Kleisli category. The composition of functions that return futures is done through the combinator “then”. You call one function returning a future and compose it with another function returning a future by passing it to “then.” You can compose these function into chains without ever having to block for a thread to finish. And there is an identity, which is a function that returns a trivial future that’s always ready. It’s called make_ready_future. It’s an arrow that takes A and returns a future of A.
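
Here’s a rough sketch of Kleisli composition for futures (mine, not from the talk). Real code would chain non-blocking continuations with .then, as in std::experimental::future; since std::future has no .then, this illustration simply waits inside a worker task launched with std::async, which is a blocking simplification of the idea:

#include <future>

template<class F, class G>
auto composeAsync(F f, G g) {
    return [f, g](auto x) {
        return std::async(std::launch::async, [f, g, x] {
            auto b = f(x).get();   // wait for the first future's value
            return g(b).get();     // feed it to the second embellished function
        });
    };
}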

Now you understand what’s really happening. We are creating this new category based on future being a monad. We have new words to describe what we are doing. We are reusing an idea from category theory to solve a completely different problem.

Resumable Functions

There is one little inconvenience with this approach. It requires writing a lot of so-called “boilerplate” code: repetitive code that obscures the simple logic. Here it’s the glue code, the “compose” and the “then.” What you’d like to do is to write your code directly in terms of embellished functions, and have the composition remain implicit. People noticed this and came up with solutions. In the case of futures, the practical solution is called resumable functions.

Resumable functions are designed to hide the composition of functions that return futures. Here’s an example.

int cnt = 0;
do
{
   cnt = await streamR.read(512, buf);
   if ( cnt == 0 ) break;
   cnt = await streamW.write(cnt, buf);
} while (cnt > 0);

This code copies a file using a buffer, but it does it asynchronously. We call a function read that’s asynchronous. It doesn’t immediately fill the buffer, it returns a future instead. Then we call the function write that’s also asynchronous. We do it in a loop.

This code looks almost like sequential code, except that it has these await keywords. These are the points of insertion of our composition. These are the places where the code is chopped into pieces and composed using then.

I won’t go into details of the implementation. The point is that the composition of these embellished functions is almost entirely hidden. It doesn’t look like composition in a Kleisli category, but it really is.

This solution is usually described at a very low level, in terms of coroutines implemented as state machines with static variables and gotos. And what is being lost in all this engineering talk is how general this idea is — the idea of overloading composition to build a category of embellished functions.

Just to drive this home, here’s an example of different code that does completely different stuff. It calculates Fibonacci numbers on demand. It’s a generator of Fibonacci numbers.

generator<int> fib() 
{
    int a = 0; 
    int b = 1; 
    for (;;) { 
        __yield_value a; 
        auto next = a + b; 
        a = b; 
        b = next; 
    } 
} 

Instead of await it has __yield_value. But it’s the same idea of resumable functions, only with a different monad. This monad is called a list monad. And this kind of code in combination with Eric Niebler’s proposed range library could lead to very powerful programming idioms.

Conclusion

Why do we have to separate the two notions: that of resumable functions and that of generators, if they are based on the same abstraction? Why do we have to reinvent the wheel?

There’s this great opportunity for C++, and I’m afraid it will be missed like so many other opportunities for great generalisations that were missed in the past. It’s the opportunity to introduce one general solution based on monads, rather than keep creating ad-hoc solutions, one problem at a time. The same very general pattern can be used to control all kinds of side effects. It can be used for auditing, exceptions, ranges, futures, I/O, continuations, and all kinds of user-defined monads.

This amazing power could be ours if we start thinking in more abstract terms, if we reach into category theory.


This is part 4 of the miniseries about solving a simple constraint-satisfaction problem:

  s e n d
+ m o r e
---------
m o n e y

using monads in C++. Previously: The Tale of Two Monads. To start from the beginning, go to Using Monads in C++ to Solve Constraints: 1.

In the previous post I showed some very general programming techniques based on functional data structures and monads. A lot of the code I wrote to solve this particular arithmetic puzzle is easily reusable. In fact the monadic approach is perfect for constructing libraries. I talked about it when discussing C++ futures and ranges. A monad is a pure expression of composability, and composability is the most important feature of any library.

You would be justified to think that rolling out a monad library just to solve a simple arithmetic puzzle is overkill. But I’m sure you can imagine a lot of other, more complex problems that could be solved using the same techniques. You might also fear that such functional methods — especially when implemented in C++, which is not optimized for this kind of programming — would lead to less performant code. This doesn’t matter too much if you are solving a one-shot problem, and the time you spend developing and debugging your code dwarfs the time it takes the program to complete execution. But even if performance were an issue and you were faced with a larger problem, functional code can be parallelized much more easily than its imperative counterpart, with no danger of concurrency bugs. So you might actually get better performance from functional code by running it in parallel.

Refactoring

In this installment I’d like to talk about something that a lot of functional programmers swear by: Functional programs are amazingly easy to refactor. Anybody who has tried to refactor C++ code can testify to how hard it is, and how long it takes to iron out all the bugs introduced by refactoring. With functional code, it’s a breeze. So let’s have another look at the code from the previous post and do some deep surgery on it. This is our starting point:

StateList<tuple<int, int, int>> solve()
{
    StateList<int> sel = &select<int>;

    return mbind(sel, [=](int s) {
    return mbind(sel, [=](int e) {
    return mbind(sel, [=](int n) {
    return mbind(sel, [=](int d) {
    return mbind(sel, [=](int m) {
    return mbind(sel, [=](int o) {
    return mbind(sel, [=](int r) {
    return mbind(sel, [=](int y) {
        return mthen(guard(s != 0 && m != 0), [=]() {
            int send  = asNumber(vector<int>{s, e, n, d});
            int more  = asNumber(vector<int>{m, o, r, e});
            int money = asNumber(vector<int>{m, o, n, e, y});
            return mthen(guard(send + more == money), [=]() {
                return mreturn(make_tuple(send, more, money));
            });
        });
    });});});});});});});});
}

What immediately stands out is the amount of repetition: eight lines of code look almost exactly the same. Conceptually, these lines correspond to eight nested loops that would be used in the imperative solution. The question is, how can we abstract over this repetition? It’s hard to abstract over control structures such as loops, but in our monadic implementation the loops are replaced with higher-order functions, and that opens up a path toward even further abstraction.

Reifying Substitutions

The first observation is that we are missing a data structure that should correspond to the central concept we use when describing the solution — the substitution. We are substituting numbers for letters, but we only see those letters as names of variables. A more natural implementation would use a map data structure, mapping characters to integers.

Notice, however, that this mapping has to be dynamically collected and torn down as we are exploring successive branches of the solution space. The imperative approach would demand backtracking. The functional approach makes use of persistent data structures. I have described such data structures previously, in particular a persistent red-black tree. It can be easily extended to a red-black map. You can find the implementation on github.

A red-black map is a generic data structure, which we’ll specialize for our use:

using Map = RBMap<char, int>;

A map lookup may fail, so it should either be implemented to return an optional, or use a default value in case of a failure. In our case we know that we never look up a value that’s not in the map (unless our code is buggy), so we can use a helper function that never fails:

int get(Map const & subst, char c)
{
    return subst.findWithDefault(-1, c);
}

Once we have objectified the substitution as a map, we can convert a whole string of characters to a number in one operation:

int toNumber(Map const & subst, string const & str)
{
    int acc = 0;
    for (char c : str)
    {
        int d = get(subst, c);
        acc = 10 * acc + d;
    }
    return acc;
}

There is one more thing that we may automate: finding the set of unique characters in the puzzle. Previously, we just figured out that they were s, e, n, d, m, o, r, y and hard-coded them into the solution. Now we can use a function, fashioned after a Haskell utility called nub:

string nub(string const & str)
{
    string result;
    for (char c : str)
    {
        if (result.find(c) == string::npos)
            result.push_back(c);
    }
    return result;
}

Don’t worry about the quadratic running time of nub — it’s only called once.

Recursion

With those preliminaries out of the way, let’s concentrate on analyzing the code. We have a series of nested mbind calls. The simplest thing is to turn these nested calls into recursion. This involves setting up the first recursive call, implementing the recursive function, and writing the code to terminate the recursion.

The main observation is that each level of nesting accomplishes one substitution of a character by a number. The recursion should end when we run out of characters to substitute.

Let’s first think about what data has to be passed down in a recursive call. One item is the string of characters that still need substituting. The second is the substitution map. While the first argument keeps shrinking, the second one keeps growing. The third argument is the number to be used for the next substitution.

Before we make the first recursive call, we have to prepare the initial arguments. We get the string of characters to be substituted by running nub over the text of our puzzle:

nub("sendmoremoney")

The second argument should start as an empty map. The third argument, the digit to be used for the next substitution, comes from binding the selection plan select<int> to a lambda that will call our recursive function. In Haskell, it’s common to call the auxiliary recursive function go, so that’s what we’ll do here as well:

StateList<tuple<int, int, int>> solve()
{
    StateList<int> sel = &select<int>;
    Map subst;
    return mbind(sel, [=](int s) {
        return go(nub("sendmoremoney"), subst, s);
    });
}

When implementing go, we have to think about terminating the recursion. But first, let’s talk about how to continue it.

We are called with the next digit to be used for substitution, so we should perform this substitution. This means inserting a key/value pair into our substitution map subst. The key is the first character from the string str. The value is the digit i that we were given.

subst.inserted(str[0], i)

Notice that, because the map is immutable, the method inserted returns a new map with one more entry. This is a persistent data structure, so the new map shares most of its data with its previous version. The creation of the new version takes logarithmic time in the number of entries (just as it does with a regular balanced tree that’s used in the Standard Library std::map). The advantage of using a persistent map is that we don’t have to do any backtracking — the unmodified version is still available after the return from the recursive call.
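
For instance (a tiny illustration, using only the RBMap operations already shown):

Map m0;                          // empty substitution
Map m1 = m0.inserted('s', 9);    // a new version with one entry
// m0 is still empty here; another branch of the search can keep using it,
// so no explicit backtracking is needed.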

The recursive call to go expects three arguments: (1) the string of characters yet to be substituted — that will be the tail of the string that we were passed, (2) the new substitution map, and (3) the new digit n. We get this new digit by binding the digit selection sel to a lambda that makes the recursive call:

mbind(sel, [=](int n) {
    string tail(&str[1], &str[str.length()]);
    return go(tail, subst.inserted(str[0], i), n);
});

Now let’s consider the termination condition. Since go was called with the next digit to be substituted, we need at least one character in the string for this last substitution. So the termination is reached when the length of the string reaches one. We have to make the last substitution and then call the final function that prunes the bad substitutions. Here’s the complete implementation of go:

StateList<tuple<int, int, int>> go(string str, Map subst, int i)
{
    StateList<int> sel = &select<int>;

    assert(str.length() > 0);
    if (str.length() == 1)
        return prune(subst.inserted(str[0], i));
    else
    {
        return mbind(sel, [=](int n) {
            string tail(&str[1], &str[str.length()]);
            return go(tail, subst.inserted(str[0], i), n);
        });
    }
}

The function prune is lifted almost literally from the original implementation. The only difference is that we are now using a substitution map.

StateList<tuple<int, int, int>> prune(Map subst)
{
    return mthen(guard(get(subst, 's') != 0 && get(subst, 'm') != 0), 
      [=]() {
        int send  = toNumber(subst, "send");
        int more  = toNumber(subst, "more");
        int money = toNumber(subst, "money");
        return mthen(guard(send + more == money), [=]() {
            return mreturn(make_tuple(send, more, money));
        });
    });
}

The top-level code that starts the chain of events and displays the solution is left unchanged:

int main()
{
    List<int> lst{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    cout << evalStateList(solve(), lst);
    return 0;
}

It’s a matter of opinion whether the refactored code is simpler or not. It’s definitely easier to generalize and it’s less error prone. But the main point is how easy it was to do the refactoring. The reason is that, in functional code, there are no hidden dependencies, because there are no hidden side effects. What would usually be considered a side effect in imperative code is accomplished using pure functions and monadic binding. There is no way to forget that a given function works with state, because a function that deals with state has the state specified in its signature — it returns a StateList. And it doesn’t itself modify the state. In fact it doesn’t have access to the state. It just composes functions that will modify the state when executed. That’s why we were able to move these functions around so easily.

A Word About Syntax

Monadic code in C++ is artificially complex. That’s because C++ was not designed for functional programming. The same program written in Haskell is much shorter. Here it is, complete with the helper functions prune, get, and toNumber:

solve = StateL select >>= go (nub "sendmoremoney") M.empty
  where
    go [c] subst i = prune (M.insert c i subst)
    go (c:cs) subst i = StateL select >>= go cs (M.insert c i subst)
    prune subst = do
        guard (get 's' /= 0 && get 'm' /= 0)
        let send  = toNumber "send"
            more  = toNumber "more"
            money = toNumber "money"
        guard (send + more == money)
        return (send, more, money)
      where
        get c = fromJust (M.lookup c subst)
        toNumber str = asNumber (map get str)
        asNumber = foldl (\t u -> t*10 + u) 0
main = print $ evalStateL solve [0..9]

Some of the simplification comes from using the operator >>= in place of mbind, and a simpler syntax for lambdas. But there is also some built-in syntactic sugar for chains of monadic bindings in the form of the do notation. You see an example of it in the implementation of prune.

The funny thing is that C++ is on the verge of introducing something equivalent to do notation called resumable functions. There is a very strong proposal for resumable functions in connection with the future monad, dealing with asynchronous functions. There is talk of using it with generators, or lazy ranges, which are a version of the list monad. But there is still a need for a push towards generalizing resumable functions to user-defined monads, like the one I described in this series. It just doesn’t make sense to add separate language extensions for each aspect of the same general pattern. This pattern has been known in functional programming for a long time and it’s called a monad. It’s a very useful and versatile pattern whose importance has been steadily growing as the complexity of software projects increases.

Bibliography

  1. The code for this blog is available on github.
  2. Douglas M. Auclair, MonadPlus: What a Super Monad!, The Monad.Reader Issue 11, August 25, 2008. It contains the Haskell implementation of the solution to this puzzle.
  3. Justin Le, Unique sample drawing & searches with List and StateT — “Send more money”. It was the blog post that inspired this series.

This is part 3 of the miniseries about solving a simple constraint-satisfaction problem:

  s e n d
+ m o r e
---------
m o n e y

using monads in C++. Previously: The State Monad

A constraint satisfaction problem may be solved by brute force, by trying all possible substitutions. This may be done using the imperative approach with loops, or a declarative approach with the list monad. The list monad works by fanning out a list of “alternative universes,” one for each individual substitution. It lets us explore the breadth of the solution space. The universes in which the substitution passes all the constraints are then gathered together to form the solution.

We noticed also that, as we are traversing the depth of the solution space, we could avoid testing unnecessary branches by keeping track of some state. Indeed, we start with a list of 10 possibilities for the first character, but there are only 9 possibilities for the second character, 8 for the third, and so on. It makes sense to start with a list of ten digits, and keep removing digits as we make our choices. At every point, the list of remaining choices is the state we have to keep track of.

Again, there is the imperative approach and the functional approach to state. The imperative approach makes the state mutable and the composition of actions requires backtracking. The functional approach uses persistent, immutable data structures, and the composition is done at the level of functions — plans of action that can be executed once the state is available. These plans of action form the state monad.

The functional solution to our problem involves the combination of the list monad and the state monad. Mashing two monads together is not trivial — in Haskell this is done using monad transformers — but here I’ll show you how to do it manually.

State List

Let’s start with the state, which is the list of digits to choose from:

using State = List<int>;

In the state monad, our basic data structure was a function that took a state and returned a pair: a value and a new state. Now we want our functions to generate, as in quantum mechanics, a whole list of alternative universes, each described by a pair: a value and a state.

Notice that we have to vary both the value and the state within the list. For instance, the function that selects a number from a list should produce a different number and a different list in each of the alternative universes. If the choice is, say, 3, the new state will have 3 removed from the list.

Here’s the convenient generic definition of the list of pairs:

template<class A>
using PairList = List<pair<A, State>>;

And this is our data structure that represents a non-deterministic plan of action:

template<class A> 
using StateList = function<PairList<A>(State)>;

Once we have such a non-deterministic plan of action, we can run it. We give it a state and get back a list of results, each paired with a new state:

template<class A>
PairList<A> runStateList(StateList<A> st, State s)
{
    return st(s);
}

Composing Non-Deterministic Plans

As we’ve seen before, the essence of every monad is the composition of individual actions. The higher-order function mbind that does the composition has the same structure for every monad. It takes two arguments. One argument is the result of our previous activity. For the list monad, it was a list of values. For the state monad it was a plan of action to produce a value (paired with the leftover state). For the combination state-list monad it will be a plan of action to produce a list of values (each combined with the leftover state).

The second argument to mbind is a continuation. It takes a single value. It’s supposed to be the value that is encapsulated in the first argument. In the case of the list monad, it was one value from the list. In the case of the state monad, it was the value that would be returned by the execution of the plan. In the combined state-list monad, this will be one of the values produced by the plan.

The continuation is supposed to produce the final result, be it a list of values or a plan. In the state-list monad it will be a plan to produce a list of values, each with its own leftover state.

Finally, the result of mbind is the composite: for the list monad it was a list, for the state monad it was a plan, and for the combination, it will be a plan to produce a list of values and states.

We could write the signature of mbind simply as:

template<class A, class B>
StateList<B> mbind(StateList<A>, function<StateList<B>(A)>)

but that would prevent the C++ compiler from making automatic type deductions. So instead we parameterize it with the type A and a function type F. That requires some additional footwork to extract the return type from the type of F:

template<class A, class F>
auto mbind(StateList<A> g, F k) 
    -> decltype(k(g(State()).front().first))

The implementation of mbind is a combination of what we did for the list monad and for the state monad. To begin with, we are supposed to return a plan, so we have to create a lambda. This lambda takes a state s as an argument. Inside the lambda, given the state, we can run the first argument — the plan. We get a list of pairs. Each pair contains a value and a new state. We want to execute the continuation k for each of these values.

To do that, we run fmap over this list (think std::transform, but less messy). Inside fmap, we disassemble each pair into value and state, and execute the continuation k over the value. The result is a new plan. We run this plan, passing it the new state. The result is a list of pairs: value, state.

Since each item in the list produces a list, the result of fmap is a list of lists. Just like we did in the list monad, we concatenate all these lists into one big list using concatAll.

template<class A, class F>
auto mbind(StateList<A> g, F k) 
    -> decltype(k(g(State()).front().first))
{
    return [g, k](State s) {
        PairList<A> plst = g(s);
        // List<PairList<B>> 
        auto lst2 = fmap([k](pair<A, State> const & p) {
            A a = p.first;
            State s1 = p.second;
            auto ka = k(a);
            auto result = runStateList(ka, s1);
            return result;
        }, plst);
        return concatAll(lst2);
    };
}

To make state-list into a full-blown monad we need one more ingredient: the function mreturn that turns any value into a trivial plan to produce a singleton list with that value. The state is just piped through:

template<class A>
StateList<A> mreturn(A a)
{
    return [a](State s) {
        // singleton list
        return PairList<A>(make_pair(a, s)); 
    };
}
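
To see mreturn at work, here’s a quick sanity check (a sketch; it uses only the pieces defined above, plus the List’s initializer-list constructor seen in main):

State digits{ 0, 1, 2 };
PairList<int> r = runStateList(mreturn(42), digits);
// r holds exactly one pair: the value 42 and the unchanged state [0, 1, 2]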

So that would be all for the state-list monad, except that in C++ we have to special-case mbind for the type of continuation that takes a void argument (there is no value of type void in C++). We’ll call this version mthen:

template<class A, class F>
auto mthen(StateList<A> g, F k) -> decltype(k())
{
    return [g, k](State s) {
        PairList<A> plst = g(s);
        auto lst2 = fmap([k](pair<A, State> const & p) {
            State s1 = p.second;
            auto ka = k();
            auto result = runStateList(ka, s1);
            return result;
        }, plst);
        return concatAll(lst2);
    };
}

Notice that, even though the value resulting from running the first argument to mthen is discarded, we still have to run it because we need the modified state. In a sense, we are running it only “for side effects,” although all these functions are pure and have no real side effects. Once again, we can have our purity cake and eat it too, simulating side effects with pure functions.

Guards

Every time we generate a candidate substitution, we have to check if it satisfies our constraints. That would be easy if we could access the substitution. But we are in a rather peculiar situation: We are writing code that deals with plans to produce substitutions. How do you test a plan? Obviously we need to write a plan to do the testing (haven’t we heard that before?).

Here’s the trick: Look at the implementation of mthen (or mbind for that matter). When is the continuation not executed? The continuation is not executed if the list passed to fmap is empty. Not executing the continuation is equivalent to aborting the computation. And what does mthen return in that case? An empty list!

So the way to abort a computation is to pass an argument to mthen that produces an empty list. Such a poison pill of a plan is produced by a function traditionally called mzero.

template<class A>
StateList<A> mzero()
{
    return[](State s) {
        return PairList<A>(); // empty list
    };
}

In functional programming, a monad that supports this functionality is called a “monad plus” (it also has a function called mplus). It so happens that the list monad and the state-list monad are both plus.

Now we need a function that produces a poison pill conditionally, so we can pass the result of this function to mthen. Remember, mthen ignores the individual results, but is sensitive to the size of the list. The function guard, on success, produces a discardable value — here, a nullptr — in a singleton list. This will trigger the execution of the continuation in mthen, while discarding the nullptr. On failure, guard produces mzero, which will abort the computation.

StateList<void*> guard(bool b)
{
    if (b) {
        return [](State s) {
            // singleton list with discardable value
            return List<pair<void*, State>>(make_pair(nullptr, s));
        };
    }
    else
        return mzero<void*>(); // empty list
}
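
Here is a small illustration of the mechanism, a sketch that uses only the primitives defined above: a plan that keeps a value only when it is even.

StateList<int> keepIfEven(int i)
{
    // guard(false) yields an empty list, so the continuation never runs
    return mthen(guard(i % 2 == 0), [=]() {
        return mreturn(i);
    });
}

Running keepIfEven(3) with any state produces an empty list; running keepIfEven(4) produces the singleton (4, state).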

The Client Side

Everything I’ve described so far is reusable code that should be put in a library. Equipped with the state-list monad library, all the client needs to do is to write a few utility functions and combine them into a solution.

Here’s the function that picks a number from a list. Actually, it picks a number from a list in all possible ways. It returns a list of all picks paired with the remainders of the list. This function is our non-deterministic selection plan.

template<class A>
PairList<A> select(List<A> lst)
{
    if (lst.isEmpty())
        return PairList<A>();

    A       x  = lst.front();
    List<A> xs = lst.popped_front();

    auto result = List<pair<A, State>>();
    forEach(select(xs), [x, &result](pair<A, List<A>> const & p)
    {
        A       y  = p.first;
        List<A> ys = p.second;
        auto y_xys = make_pair(y, ys.pushed_front(x));
        result = result.pushed_front(y_xys);
    });

    return result.pushed_front(make_pair(x, xs));
}
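
For instance (a sketch; the exact order of the alternatives depends on how the persistent List assembles them, and forEach is assumed to work on any List, as it does above):

List<int> digits{ 1, 2, 3 };
forEach(select(digits), [](pair<int, State> const & p) {
    cout << p.first << " | ";
    forEach(p.second, [](int d) { cout << d << " "; });
    cout << endl;
});
// Prints the three alternative picks, one per line (in some order):
//   1 | 2 3
//   2 | 1 3
//   3 | 1 2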

Here’s a little utility function that converts a list (actually, a vector) of digits to a number:

int asNumber(vector<int> const & v)
{
    int acc = 0;
    for (auto i : v)
        acc = 10 * acc + i;
    return acc;
}

And here’s the final solution to our puzzle:

StateList<tuple<int, int, int>> solve()
{
    StateList<int> sel = &select<int>;

    return mbind(sel, [=](int s) {
    return mbind(sel, [=](int e) {
    return mbind(sel, [=](int n) {
    return mbind(sel, [=](int d) {
    return mbind(sel, [=](int m) {
    return mbind(sel, [=](int o) {
    return mbind(sel, [=](int r) {
    return mbind(sel, [=](int y) {
        return mthen(guard(s != 0 && m != 0), [=]() {
            int send  = asNumber(vector<int>{s, e, n, d});
            int more  = asNumber(vector<int>{m, o, r, e});
            int money = asNumber(vector<int>{m, o, n, e, y});
            return mthen(guard(send + more == money), [=]() {
                return mreturn(make_tuple(send, more, money));
            });
        });
    });});});});});});});});
}

It starts by creating the basic plan called sel for picking numbers from a list. (The list is the state that will be provided later.) It binds this selection to a very large continuation that produces the final result — the plan to produce triples of numbers.

Here, sel is really a plan to produce a list, but the continuation expects just one number, s. If you look at the implementation of mbind, you’ll see that this continuation is called for every single pick, and the results are aggregated into one list.

The large continuation that is passed to the first mbind is itself implemented using mbind. This one, again, takes the plan sel. But we know that the state on which this sel will operate is one element shorter than the one before it. This second selection is passed to the next continuation, and so on.

Then the pruning begins: There is an mthen that takes a guard that aborts all computations where either s or m is zero. Then three numbers are formed from the selected digits. Notice that all those digits are captured by the enclosing lambdas by value (the [=] clauses). Yet another mthen takes a guard that checks the arithmetic and, if that one passes, the final plan to produce a singleton triple (send, more, money) is returned.

In theory, all these singletons would be concatenated on the way up to produce a list of solutions, but in reality this puzzle has only one solution. To get this solution, you run the plan with the list of digits:

List<int> lst{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
cout << evalStateList(solve(), lst);

The function evalStateList works the same way as runStateList except that it discards the leftover state. You can either implement it or use runStateList instead.
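
Here is one possible sketch of evalStateList, under the assumption that fmap is the generic list map used earlier inside mbind:

template<class A>
List<A> evalStateList(StateList<A> st, State s)
{
    // keep the values, drop the leftover states
    return fmap([](pair<A, State> const & p) { return p.first; },
                runStateList(st, s));
}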

What’s Next?

One of the advantages of functional programming (other than thread safety) is that it makes refactoring really easy. Did you notice that 8 lines of code in our solution are almost identical? That’s outrageous! We have to do something about it. And while we’re at it, isn’t it obvious that a substitution should be represented as a map from characters to digits? I’ll show you how to do it in the next installment.

All this code and more is available on github.


This is the second part of the cycle “Using Monads in C++ to Solve Constraints” in which I’m illustrating functional techniques using the example of a simple puzzle:

  s e n d
+ m o r e
---------
m o n e y

Previously, I talked about using the list monad to search the breadth of the solution space.

What would you do if you won a lottery? Would you buy a sports car, drop your current job, and go on a trip across the United States? Maybe you would start your own company, multiply the money, and then buy a private jet?

We all like making plans, but they are often contingent on the state of our finances. Such plans can be described by functions. For instance, the car-buying plan is a function:

pair<Car, Cash> buyCar(Cash cashIn)

The input is some amount of Cash, and the output is a brand new Car and the leftover Cash (not necessarily a positive number!).

In general, a financial plan is a function that takes cash and returns the result paired with the new value of cash. It can be described generically using a template:

template<class A>
using Plan = function<pair<A, Cash>(Cash)>;

You can combine smaller plans to make bigger plans. For instance, you may use the leftover cash from your car purchase to finance your trip, or invest in a business, and so on.

There are some things that you already own, and you can trivially include them in your plans:

template<class A>
Plan<A> got_it(A a)
{
    return [a](Cash s) { return make_pair(a, s); };
}

What does all this daydreaming have to do with the solution to our puzzle? I mentioned previously that we needed to keep track of state, and this is how functional programmers deal with state. Instead of relying on side effects to silently modify the state, they write code that generates plans of action.

An imperative programmer, on the other hand, might implement the car-buying procedure by passing it a bank object, and the withdrawal of money would be a side effect of the purchase. Or, the horror!, the bank object could be global.

In functional programming, each individual plan is a function: The state comes in, and the new state goes out, paired with whatever value the function was supposed to produce in the first place. These little plans are aggregated into bigger plans. Finally, the master plan is executed — that’s when the actual state is passed in and the result comes out together with a new state. We can do the same in modern C++ using lambdas.

You might be familiar with a similar technique used with expression templates in C++. Expression templates form the basis of efficient implementation of matrix calculus, where expressions involving matrices and vectors are not evaluated on the spot but rather converted to parse trees. These trees may then be evaluated using more efficient techniques. You can think of an expression template as a plan for evaluating the final result.

The State Monad

To find the solution to our puzzle, we will generate substitutions by picking digits from a list of integers. This list of integers will be our state.

using State = List<int>;

We will be using a persistent list, so we won’t have to worry about backtracking. Persistent lists are never mutated — all their versions persist, and we can go back to them without fearing that they may have changed. We’ll need that when we combine our state calculations with the list monad to get the final solution. For now, we’ll just consider one substitution.
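
For example (a sketch, assuming the persistent List interface used throughout these posts):

List<int> xs{ 1, 2, 3 };
List<int> ys = xs.popped_front();   // ys is [2, 3]
List<int> zs = xs.pushed_front(0);  // zs is [0, 1, 2, 3]
// xs is still [1, 2, 3]: no version was ever mutated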

We’ll be making and combining all kinds of plans that rely on, and produce new state:

template<class A>
using Plan = function<pair<A, State>(State)>;

We can always run a plan, provided we have the state available:

template<class A>
pair<A, State> runPlan(Plan<A> pl, State s)
{
    return pl(s);
}

You may recall from the previous post that the essence of every monad is the ability to compose smaller items into bigger ones. In the case of the state monad we need the ability to compose plans of action.

For instance, suppose that you know how to generate a plan to travel across the United States on a budget, as long as you have a car. You don’t have a car though. Not to worry, you have a plan to get a car. It should be easy to generate a composite plan that involves buying the car and then continuing with your trip.

Notice the two ingredients: one is the plan to buy a car: Plan<Car>. The second ingredient is a function that takes a car and produces the plan for your trip, Plan<Trip>. This function is the continuation that leads to your final goal: it’s a function of the type Plan<Trip>(Car). And the continuation itself may in turn be a composite of many smaller plans.

So here’s the function mbind that binds a plan pl to a continuation k. The continuation uses the output of the plan pl to generate another plan. The function mbind is supposed to return a new composite plan, so it must return a lambda. Like any other plan, this lambda takes a state and returns a pair: value, state. We’ll implement this lambda for the most general case.

The logic is simple. Inside the lambda we will have the state available, so we can run the plan pl. We get back a pair: the value of type A and the new state. We pass this value to the continuation k and get back a new plan. Finally, we run that plan with the new state and we’re done.

template<class A, class F>
auto mbind(Plan<A> pl, F k) -> decltype(k(pl(State()).first))
{
    using B = decltype(k(pl(State()).first)(State()).first);
    // this should really be done with concepts
    static_assert(std::is_convertible<
        F, std::function<Plan<B>(A)>> ::value,
        "mbind requires a function type Plan<B>(A)");

    return [pl, k](State s) {
        pair<A, State> ps = runPlan(pl, s);
        Plan<B> plB = k(ps.first);
        return runPlan(plB, ps.second); // new state!
    };
}

Notice that all this running of plans inside mbind doesn’t happen immediately. It happens when the lambda is executed, which is when the larger plan is run (possibly as part of an even larger plan). So what mbind does is to create a new plan to be executed in the future.

And, as with every monad, there is a function that takes regular input and turns it into a trivial plan. I called it got_it before, but a more common name would be mreturn:

template<class A>
Plan<A> mreturn(A a)
{
    return [a](State s) { return make_pair(a, s); };
}

The state monad usually comes with two helper functions. The function getState gives direct access to the state by turning it into the return value:

Plan<State> getState()
{
    return [](State s) { return make_pair(s, s); };
}

Using getState you may examine the state when the plan is running, and dynamically select different branches of your plan. This makes monads very flexible, but it also makes the composition of different monads harder. We’ll see this in the next installment, when we compose the state monad with the list monad.
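
Here is a minimal sketch of such branching, built only from getState, mreturn, and mbind (the name firstOrDefault is made up for this example; isEmpty and front are methods of the persistent List):

Plan<int> firstOrDefault(int def)
{
    return mbind(getState(), [=](State s) {
        // examine the state at run time and branch on it
        return mreturn(s.isEmpty() ? def : s.front());
    });
}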

The second helper function is used to modify (completely replace) the state.

Plan<void*> putState(State newState)
{
    return [=](State s) { return make_pair(nullptr, newState); };
}

It doesn’t evaluate anything useful, so its return type is void* and the returned value is nullptr. Its only purpose is to encapsulate a side effect. What’s amazing is that you can do it and still preserve functional purity. There’s no better example of having your cake and eating it too.
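
The two helpers compose nicely. Here is a sketch of a plan that drops the first digit from the state (popped_front is the persistent-list method used in the example below):

Plan<void*> dropFirst()
{
    return mbind(getState(), [](State s) {
        return putState(s.popped_front());
    });
}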

Example

To demonstrate the state monad in action, let’s implement a tiny example. We start with a small plan that just picks the first number from a list (the list will be our state):

pair<int, List<int>> select(List<int> lst)
{
    int i = lst.front();
    return make_pair(i, lst.popped_front());
}

Persistent list method popped_front returns a list with its first element popped off. Typical of persistent data structures, this method doesn’t modify the original list. It doesn’t copy the list either — it just creates a pointer to its tail.

Here’s our first plan:

Plan<int> sel = &select;

Now we can create a more complex plan to produce a pair of integers:

Plan<pair<int, int>> pl = 
    mbind(sel, [=](int i) { return
    mbind(sel, [=](int j) { return
    mreturn(make_pair(i, j));
}); });

Let’s analyze this code. The first mbind takes a plan sel that selects the first element from a list (the list will be provided later, when we execute this plan). It binds it to the continuation (whose body I have grayed out) that takes the selected integer i and produces a plan to make a pair of integers. Here’s this continuation:

[=](int i) { return
    mbind(sel, [=](int j) { return
    mreturn(make_pair(i, j));
}); }

It binds the plan sel to another, smaller continuation that takes the selected element j and produces a plan to make a pair of integers. Here’s this smaller continuation:

[=](int j) { return
    mreturn(make_pair(i, j));
}

It combines the first integer i that was captured by the lambda, with the second integer j that is the argument to the lambda, and creates a trivial plan that returns a pair:

mreturn(make_pair(i, j));

Notice that we are using the same plan sel twice. But when this plan is executed inside of our final plan, it will return two different elements from the input list. When mbind is run, it first passes the state (the list of integers) to the first sel. It gets back the modified state — the list with the first element popped. It then uses this shorter list to execute the plan produced by the continuation. So the second sel gets the shortened list, and picks its first element, which is the second element of the original list. It, too, shortens the list, which is then passed to mreturn, which doesn’t modify it any more.

We can now run the final plan by giving it a list to pick from:

List<int> st{ 1, 2, 3 };
cout << runPlan(pl, st) << endl;
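
If the stream-output operators for pairs and lists are not at hand, the result can also be checked piecewise (a sketch, assuming <cassert> is included):

auto res = runPlan(pl, st);        // pair<pair<int, int>, State>
assert(res.first.first  == 1);     // first pick from { 1, 2, 3 }
assert(res.first.second == 2);     // second pick
assert(res.second.front() == 3);   // leftover state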

We are still not ready to solve the puzzle, but we are much closer. All we need is to combine the list monad with the state monad. I’m leaving this task for the next installment.

But here’s another look at the final solution:

StateL<tuple<int, int, int>> solve()
{
    StateL<int> sel = &select<int>;

    return mbind(sel, [=](int s) {
    return mbind(sel, [=](int e) {
    return mbind(sel, [=](int n) {
    return mbind(sel, [=](int d) {
    return mbind(sel, [=](int m) {
    return mbind(sel, [=](int o) {
    return mbind(sel, [=](int r) {
    return mbind(sel, [=](int y) {
        return mthen(guard(s != 0 && m != 0), [=]() {
            int send  = asNumber(vector<int>{s, e, n, d});
            int more  = asNumber(vector<int>{m, o, r, e});
            int money = asNumber(vector<int>{m, o, n, e, y});
            return mthen(guard(send + more == money), [=]() {
                return mreturn(make_tuple(send, more, money));
            });
        });
    });});});});});});});});
}

This time I didn’t rename mbind to for_each and mreturn to yield.

Now that you’re familiar with the state monad, you may appreciate the fact that the same sel passed as the argument to mbind will yield a different number every time, as long as the state list contains unique digits.

Before You Post a Comment

I know what you’re thinking: Why would I want to complicate my life with monads if there is a much simpler, imperative way of dealing with state? What’s the payoff of the functional approach? The immediate payoff is thread safety. In imperative programming mutable shared state is a never-ending source of concurrency bugs. The state monad and the use of persistent data structures eliminate the possibility of data races. And they do it without any need for synchronization (other than reference counting inside shared pointers).

I will freely admit that C++ is not the best language for functional programming, and that monadic code in C++ looks overly complicated. But, let’s face it, in C++ everything looks overly complicated. What I’m describing here are implementation details of something that should be encapsulated in an easy to use library. I’ll come back to it in the next installment of this mini-series.

Challenges

  1. Implement select from the example in the text using getState and putState.
  2. Implement evalPlan, a version of runPlan that only returns the final value, without the state.
  3. Implement mthen — a version of mbind where the continuation doesn’t take any arguments. It ignores the result of the Plan, which is the first argument to mthen (but it still runs it, and uses the modified state).
  4. Use the state monad to build a simple RPN calculator. The state is the stack (a List) of items:
    enum ItemType { Plus, Minus, Num };
    
    struct Item
    {
        ItemType _type;
        int _num;
        Item(int i) : _type(Num), _num(i) {}
        Item(ItemType t) : _type(t), _num(-1) {}
    };

    Implement the function calc() that creates a plan to evaluate an integer. Here’s the test code that should print -1:

    List<Item> stack{ Item(Plus)
                   , Item(Minus)
                   , Item(4)
                   , Item(8)
                   , Item(3) };
    cout << evalPlan(calc(), stack) << endl;

    (The solution is available on github.)


I am sometimes asked by C++ programmers to give an example of a problem that can’t be solved without monads. This is the wrong kind of question — it’s like asking if there is a problem that can’t be solved without for loops. Obviously, if your language supports a goto, you can live without for loops. What monads (and for loops) can do for you is to help you structure your code. The use of loops and if statements lets you convert spaghetti code into structured code. Similarly, the use of monads lets you convert imperative code into declarative code. These are the kind of transformations that make code easier to write, understand, maintain, and generalize.

So here’s a problem that you may get as an interview question. It’s a small problem, so the advantages of various approaches might not be immediately obvious, especially if you’ve been trained all your life in imperative programming, and you are seeing monads for the first time.

You’re supposed to write a program to solve this puzzle:

  s e n d
+ m o r e
---------
m o n e y

Each letter corresponds to a different digit between 0 and 9. Before you continue reading this post, try to think about how you would approach this problem.

The Analysis

It never hurts to impress your interviewer with your general knowledge by correctly classifying the problem. This one belongs to the class of “constraint satisfaction problems.” The obvious constraint is that the numbers obtained by substituting letters with digits have to add up correctly. There are also some less obvious constraints, namely the numbers should not start with zero.

If you were to solve this problem using pencil and paper, you would probably come up with lots of heuristics. For instance, you would deduce that m must stand for 1 because that’s the largest possible carry from the addition of two digits (even if there is a carry from the previous column). Then you’d figure out that s must be either 8 or 9 to produce this carry, and so on. Given enough time, you could probably write an expert system with a large set of rules that could solve this and similar problems. (Mentioning an expert system could earn you extra points with the interviewer.)

However, the small size of the problem suggests that a simple brute force approach is probably best. The interviewer might ask you to estimate the number of possible substitutions, which is 10!/(10 – 8)! or roughly 2 million. That’s not a lot. So, really, the solution boils down to generating all those substitutions and testing the constraints for each.

The Straightforward Solution

The mind of an imperative programmer immediately sees the solution as a set of 8 nested loops (there are 8 unique letters in the problem: s, e, n, d, m, o, r, y). Something like this:

for (int s = 0; s < 10; ++s)
    for (int e = 0; e < 10; ++e)
        for (int n = 0; n < 10; ++n)
            for (int d = 0; d < 10; ++d)
                ...

and so on, until y. But then there is the condition that the digits have to be different, so you have to insert a bunch of tests like:

e != s
n != s && n != e
d != s && d != e && d != n

and so on, the last one involving 7 inequalities… Effectively you have replaced the uniqueness condition with 28 new constraints.

This would probably get you through the interview at Microsoft, Google, or Facebook, but really, can’t you do better than that?

The Smart Solution

Before I proceed, I should mention that what follows is almost a direct translation of a Haskell program from the blog post by Justin Le. I strongly encourage everybody to learn some Haskell, but in the meanwhile I’ll be happy to serve as your translator.

The problem with our naive solution is the 28 additional constraints. Well, I guess one could live with that — except that this is just a tiny example of a whole range of constraint satisfaction problems, and it makes sense to figure out a more general approach.

The problem can actually be formulated as a superposition of two separate concerns. One deals with the depth and the other with the breadth of the search for solutions.

Let me touch on the depth issue first. Let’s consider the problem of creating just one substitution of letters with numbers. This could be described as picking 8 digits from a list of 0, 1, …9, one at a time. Once a digit is picked, it’s no longer in the list. We don’t want to hard code the list, so we’ll make it a parameter to our algorithm. Notice that this approach works even if the list contains duplicates, or if the list elements are not easily comparable for equality (for instance, if they are futures). We’ll discuss the list-picking part of the problem in more detail later.

Now let’s talk about breadth: we have to repeat the above process for all possible picks. This is what the 8 nested loops were doing. Except that now we are in trouble because each individual pick is destructive. It removes items from the list — it mutates the list. This is a well known problem when searching through solution spaces, and the standard remedy is called backtracking. Once you have processed a particular candidate, you put the elements back in the list, and try the next one. Which means that you have to keep track of your state, either implicitly on your function’s stack, or in a separate explicit data structure.

Wait a moment! Weren’t we supposed to talk about functional programming? So what’s all this talk about mutation and state? Well, who said you can’t have state in functional programming? Functional programmers have been using the state monad since time immemorial. And mutation is not an issue if you’re using persistent data structures. So fasten your seat belts and make sure your folding trays are in the upright position.

The List Monad

We’ll start with a small refresher in quantum mechanics. As you may remember from school, quantum processes are non-deterministic. You may repeat the same experiment many times and every time get a different result. There is a very interesting view of quantum mechanics called the many-worlds interpretation, in which every experiment gives rise to multiple alternate histories. So if the spin of an electron may be measured as either up or down, there will be one universe in which it’s up, and one in which it’s down.

We’ll use the same approach to solving our puzzle. We’ll create an alternate universe for each digit substitution for a given letter. So we’ll start with 10 universes for the letter s; then we’ll split each of them into ten universes for the letter e, and so on. Of course, most of these universes won’t yield the desired result, so we’ll have to destroy them. I know, it seems kind of wasteful, but in functional programming it’s easy come, easy go. The creation of a new universe is relatively cheap. That’s because new universes are not that different from their parent universes, and they can share almost all of their data. That’s the idea behind persistent data structures. These are the immutable data structures that are “mutated” by cloning. A cloned data structure shares most of its implementation with the parent, except for a small delta. We’ll be using persistent lists described in my earlier post.

Once you internalize the many-worlds approach to programming, the implementation is pretty straightforward. First, we need functions that generate new worlds. Since we are cheap, we’ll only generate the parts that are different. So what’s the difference between all the worlds that we get when selecting the substitution for the letter s? Just the number that we assign to s. There are ten worlds corresponding to the ten possible digits (we’ll deal with the constraints like s being different from zero later). So all we need is a function that generates a list of ten digits. These are our ten universes in a nutshell. They share everything else.

Once you are in an alternate universe, you have to continue with your life. In functional programming, the rest of your life is just a function called a continuation. I know it sounds like a horrible simplification. All your actions, emotions, and hopes reduced to just one function. Well, maybe the continuation just describes one aspect of your life, the computational part, and you can still hold on to your emotions.

So what do our lives look like, and what do they produce? The input is the universe we’re in, in particular the one number that was picked for us. But since we live in a quantum universe, the outcome is a multitude of universes. So a continuation takes a number, and produces a list. It doesn’t have to be a list of numbers, just a list of whatever characterizes the differences between alternate universes. In particular, it could be a list of different solutions to our puzzle — triples of numbers corresponding to “send”, “more”, and “money”. (There is actually only one solution, but that’s beside the point.)

And what’s the very essence of this new approach? It’s the binding of the selection of the universes to the continuation. That’s where the action is. This binding, again, can be expressed as a function. It’s a function that takes a list of universes and a continuation that produces a list of universes. It returns an even bigger list of universes. We’ll call this function for_each, and we’ll make it as generic as possible. We won’t assume anything about the type of the universes that are passed in, or the type of the universes that the continuation k produces. We’ll also make the type of the continuation a template parameter and extract the return type from it using auto and decltype:

template<class A, class F>
auto for_each(List<A> lst, F k) -> decltype(k(lst.front()))
{
    using B = decltype(k(lst.front()).front());
    // This should really be expressed using concepts
    static_assert(std::is_convertible<
        F, std::function<List<B>(A)>>::value,
        "for_each requires a function type List<B>(A)");

    List<List<B>> lstLst = fmap(k, lst);
    return concatAll(lstLst);
}

The function fmap is similar to std::transform. It applies the continuation k to every element of the list lst. Because k itself produces a list, the result is a list of lists, lstLst. The function concatAll concatenates all those lists into one big list.

Congratulations! You have just seen a monad. This one is called the list monad and it’s used to model non-deterministic processes. The monad is actually defined by two functions. One of them is for_each, and here’s the other one:

template<class A>
List<A> yield(A a)
{
    return List<A> (a);
}

It’s a function that returns a singleton list. We use yield when we are done multiplying universes and we just want to return a single value. We use it to create a single-valued continuation. It represents the lonesome boring life, devoid of any choices.
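
Here is a tiny illustration of the two functions working together (a sketch; the List template is assumed to work for char the same way it works for int): every element of one list is combined with every element of another, and the singleton lists produced by yield are flattened into one result.

List<pair<int, char>> pairs =
    for_each(List<int>{ 1, 2 }, [](int i) {
        return for_each(List<char>{ 'a', 'b' }, [=](char c) {
            return yield(make_pair(i, c));
        });
    });
// pairs contains (1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')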

I will later rename these functions to mbind and mreturn, because they are part of any monad, not just the list monad.

The names like for_each or yield have a very imperative ring to them. That’s because, in functional programming, monadic code plays a role similar to imperative code. But neither for_each nor yield are control structures — they are functions. In particular for_each, which sounds and works like a loop, is just a higher order function; and so is fmap, which is used in its implementation. Of course, at some level the code becomes imperative — fmap can either be implemented recursively or using an actual loop — but the top levels are just declarations of functions. Hence, declarative programming.

There is a slight difference between a loop and a function on lists like for_each: for_each takes a whole list as an argument, while a loop might generate individual items — in this case integers — on the fly. This is not a problem in a lazy functional language like Haskell, where a list is evaluated on demand. The same behavior may be implemented in C++ using streams or lazy ranges. I won’t use it here, since the lists we are dealing with are short, but you can read more about it in my earlier post Getting Lazy with C++.

We are not ready yet to implement the solution to our puzzle, but I’d like to give you a glimpse of what it looks like. For now, think of StateL as just a list. See if it starts making sense (I grayed out the usual C++ noise):

StateL<tuple<int, int, int>> solve()
{
    StateL<int> sel = &select<int>;

    return for_each(sel, [=](int s) {
    return for_each(sel, [=](int e) {
    return for_each(sel, [=](int n) {
    return for_each(sel, [=](int d) {
    return for_each(sel, [=](int m) {
    return for_each(sel, [=](int o) {
    return for_each(sel, [=](int r) {
    return for_each(sel, [=](int y) {
        return yield_if(s != 0 && m != 0, [=]() {
            int send  = asNumber(vector<int>{s, e, n, d});
            int more  = asNumber(vector<int>{m, o, r, e});
            int money = asNumber(vector<int>{m, o, n, e, y});
            return yield_if(send + more == money, [=]() {
                return yield(make_tuple(send, more, money));
            });
        });
    });});});});});});});});
}

The first for_each takes a selection of integers, sel (never mind how we deal with uniqueness), and a continuation: a lambda that takes one integer, s, and produces a list of solutions — tuples of three integers. This continuation, in turn, calls for_each with a selection for the next letter, e, and another continuation that returns a list of solutions, and so on.

The innermost continuation is a conditional version of yield called yield_if. It checks a condition and produces a zero- or one-element list of solutions. Internally, it calls another yield_if, which calls the ultimate yield. If that final yield is called (and it might not be, if one of the previous conditions fails), it will produce a solution — a triple of numbers. If there is more than one solution, these singleton lists will get concatenated inside for_each while percolating to the top.

In the second part of this post I will come back to the problem of picking unique numbers and introduce the state monad. You can also have a peek at the code on github.

Challenges

  1. Implement for_each and yield for a vector instead of a List. Use the Standard Library transform instead of fmap.
  2. Using the list monad (or your vector monad), write a function that generates all positions on a chessboard as pairs of characters between 'a' and 'h' and numbers between 1 and 8.
  3. Implement a version of for_each (call it repeat) that takes a continuation k of the type function<List<B>()> (notice the void argument). The function repeat calls k for each element of the list lst, but it ignores the element itself.

This is part 8 of Categories for Programmers. Previously: Functors. See the Table of Contents.

Now that you know what a functor is, and have seen a few examples, let’s see how we can build larger functors from smaller ones. In particular it’s interesting to see which type constructors (which correspond to mappings between objects in a category) can be extended to functors (which include mappings between morphisms).

Bifunctors

Since functors are morphisms in Cat (the category of categories), a lot of intuitions about morphisms — and functions in particular — apply to functors as well. For instance, just like you can have a function of two arguments, you can have a functor of two arguments, or a bifunctor. On objects, a bifunctor maps every pair of objects, one from category C, and one from category D, to an object in category E. Notice that this is just saying that it’s a mapping from a cartesian product of categories C×D to E.

That’s pretty straightforward. But functoriality means that a bifunctor has to map morphisms as well. This time, though, it must map a pair of morphisms, one from C and one from D, to a morphism in E.

Again, a pair of morphisms is just a single morphism in the product category C×D. We define a morphism in a cartesian product of categories as a pair of morphisms which goes from one pair of objects to another pair of objects. These pairs of morphisms can be composed in the obvious way:

(f, g) ∘ (f', g') = (f ∘ f', g ∘ g')

The composition is associative and it has an identity — a pair of identity morphisms (id, id). So a cartesian product of categories is indeed a category.

An easier way to think about bifunctors would be to consider them functors in each argument separately. So instead of translating functorial laws — associativity and identity preservation — from functors to bifunctors, it would be enough to check them separately for each argument. However, in general, separate functoriality is not enough to prove joint functoriality. Categories in which joint functoriality fails are called premonoidal.

Let’s define a bifunctor in Haskell. In this case all three categories are the same: the category of Haskell types. A bifunctor is a type constructor that takes two type arguments. Here’s the definition of the Bifunctor typeclass taken directly from the library Data.Bifunctor:

class Bifunctor f where
    bimap :: (a -> c) -> (b -> d) -> f a b -> f c d
    bimap g h = first g . second h
    first :: (a -> c) -> f a b -> f c b
    first g = bimap g id
    second :: (b -> d) -> f a b -> f a d
    second = bimap id

The type variable f represents the bifunctor. You can see that in all type signatures it’s always applied to two type arguments. The first type signature defines bimap: a mapping of two functions at once. The result is a lifted function, (f a b -> f c d), operating on types generated by the bifunctor’s type constructor. There is a default implementation of bimap in terms of first and second, which shows that it’s enough to have functoriality in each argument separately to be able to define a bifunctor. (As mentioned before, this doesn’t always work, because the two maps may not commute, that is first g . second h may not be the same as second h . first g.)

The two other type signatures, first and second, are the two fmaps witnessing the functoriality of f in the first and the second argument, respectively.

The typeclass definition provides default implementations for both of them in terms of bimap.

When declaring an instance of Bifunctor, you have a choice of either implementing bimap and accepting the defaults for first and second, or implementing both first and second and accepting the default for bimap (of course, you may implement all three of them, but then it’s up to you to make sure they are related to each other in this manner).

Product and Coproduct Bifunctors

An important example of a bifunctor is the categorical product — a product of two objects that is defined by a universal construction. If the product exists for any pair of objects, the mapping from those objects to the product is bifunctorial. This is true in general, and in Haskell in particular. Here’s the Bifunctor instance for a pair constructor — the simplest product type:

instance Bifunctor (,) where
    bimap f g (x, y) = (f x, g y)

There isn’t much choice: bimap simply applies the first function to the first component, and the second function to the second component of a pair. The code pretty much writes itself, given the types:

bimap :: (a -> c) -> (b -> d) -> (a, b) -> (c, d)

The action of the bifunctor here is to make pairs of types, for instance:

(,) a b = (a, b)
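
For C++ programmers, the same idea can be sketched for std::pair (this is just an illustration, not code from the text):

#include <functional>
#include <utility>

// bimap for std::pair: apply f to the first component and g to the second
template<class A, class B, class C, class D>
std::pair<C, D> bimap(std::function<C(A)> f, std::function<D(B)> g,
                      std::pair<A, B> const & p)
{
    return std::make_pair(f(p.first), g(p.second));
}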

By duality, a coproduct, if it’s defined for every pair of objects in a category, is also a bifunctor. In Haskell, this is exemplified by the Either type constructor being an instance of Bifunctor:

instance Bifunctor Either where
    bimap f _ (Left x)  = Left (f x)
    bimap _ g (Right y) = Right (g y)

This code also writes itself.

Now, remember when we talked about monoidal categories? A monoidal category defines a binary operator acting on objects, together with a unit object. I mentioned that Set is a monoidal category with respect to cartesian product, with the singleton set as a unit. And it’s also a monoidal category with respect to disjoint union, with the empty set as a unit. What I haven’t mentioned is that one of the requirements for a monoidal category is that the binary operator be a bifunctor. This is a very important requirement — we want the monoidal product to be compatible with the structure of the category, which is defined by morphisms. We are now one step closer to the full definition of a monoidal category (we still need to learn about naturality, before we can get there).

Functorial Algebraic Data Types

We’ve seen several examples of parameterized data types that turned out to be functors — we were able to define fmap for them. Complex data types are constructed from simpler data types. In particular, algebraic data types (ADTs) are created using sums and products. We have just seen that sums and products are functorial. We also know that functors compose. So if we can show that the basic building blocks of ADTs are functorial, we’ll know that parameterized ADTs are functorial too.

So what are the building blocks of parameterized algebraic data types? First, there are the items that have no dependency on the type parameter of the functor, like Nothing in Maybe, or Nil in List. They are equivalent to the Const functor. Remember, the Const functor ignores its type parameter (really, the second type parameter, which is the one of interest to us, the first one being kept constant).

Then there are the elements that simply encapsulate the type parameter itself, like Just in Maybe. They are equivalent to the identity functor. I mentioned the identity functor previously, as the identity morphism in Cat, but didn’t give its definition in Haskell. Here it is:

data Identity a = Identity a
instance Functor Identity where
    fmap f (Identity x) = Identity (f x)

You can think of Identity as the simplest possible container that always stores just one (immutable) value of type a.

Everything else in algebraic data structures is constructed from these two primitives using products and sums.

With this new knowledge, let’s have a fresh look at the Maybe type constructor:

data Maybe a = Nothing | Just a

It’s a sum of two types, and we now know that the sum is functorial. The first part, Nothing, can be represented as a Const () acting on a (the first type parameter of Const is set to unit — later we’ll see more interesting uses of Const). The second part is just a different name for the identity functor. We could have defined Maybe, up to isomorphism, as:

type Maybe a = Either (Const () a) (Identity a)

So Maybe is the composition of the bifunctor Either with two functors, Const () and Identity. (Const is really a bifunctor, but here we always use it partially applied.)

We’ve already seen that a composition of functors is a functor — we can easily convince ourselves that the same is true of bifunctors. All we need is to figure out how a composition of a bifunctor with two functors works on morphisms. Given two morphisms, we simply lift one with one functor and the other with the other functor. We then lift the resulting pair of lifted morphisms with the bifunctor.

We can express this composition in Haskell. Let’s define a data type that is parameterized by a bifunctor bf (it’s a type variable that is a type constructor that takes two types as arguments), two functors fu and gu (type constructors that take one type variable each), and two regular types a and b. We apply fu to a and gu to b, and then apply bf to the resulting two types:

newtype BiComp bf fu gu a b = BiComp (bf (fu a) (gu b))

That’s the composition on objects, or types. Notice how in Haskell we apply type constructors to types, just like we apply functions to arguments. The syntax is the same.

If you’re getting a little lost, try applying BiComp to Either, Const (), Identity, a, and b, in this order. You will recover our bare-bone version of Maybe b (a is ignored).

The new data type BiComp is a bifunctor in a and b, but only if bf is itself a Bifunctor and fu and gu are Functors. The compiler must know that there will be a definition of bimap available for bf, and definitions of fmap for fu and gu. In Haskell, this is expressed as a precondition in the instance declaration: a set of class constraints followed by a double arrow:

instance (Bifunctor bf, Functor fu, Functor gu) =>
  Bifunctor (BiComp bf fu gu) where
    bimap f1 f2 (BiComp x) = BiComp ((bimap (fmap f1) (fmap f2)) x)

The implementation of bimap for BiComp is given in terms of bimap for bf and the two fmaps for fu and gu. The compiler automatically infers all the types and picks the correct overloaded functions whenever BiComp is used.

The x in the definition of bimap has the type:

bf (fu a) (gu b)

which is quite a mouthful. The outer bimap breaks through the outer bf layer, and the two fmaps dig under fu and gu, respectively. If the types of f1 and f2 are:

f1 :: a -> a'
f2 :: b -> b'

then the final result is of the type bf (fu a') (gu b'). Here’s the type of the outer bimap, specialized to the bifunctor bf:

bimap :: (fu a -> fu a') -> (gu b -> gu b')
  -> bf (fu a) (gu b) -> bf (fu a') (gu b')

If you like jigsaw puzzles, these kinds of type manipulations can provide hours of entertainment.

So it turns out that we didn’t have to prove that Maybe was a functor — this fact followed from the way it was constructed as a sum of two functorial primitives.

A perceptive reader might ask the question: If the derivation of the Functor instance for algebraic data types is so mechanical, can’t it be automated and performed by the compiler? Indeed, it can, and it is. You need to enable a particular Haskell extension by including this line at the top of your source file:

{-# LANGUAGE DeriveFunctor #-}

and then add deriving Functor to your data structure:

data Maybe a = Nothing | Just a
  deriving Functor

and the corresponding fmap will be implemented for you.

The regularity of algebraic data structures makes it possible to derive instances not only of Functor but of several other type classes, including the Eq type class I mentioned before. There is also the option of teaching the compiler to derive instances of your own typeclasses, but that’s a bit more advanced. The idea though is the same: You provide the behavior for the basic building blocks, sums, and products, and let the compiler figure out the rest.

Functors in C++

If you are a C++ programmer, you obviously are on your own as far as implementing functors goes. However, you should be able to recognize some types of algebraic data structures in C++. If such a data structure is made into a generic template, you should be able to quickly implement fmap for it.

Let’s have a look at a tree data structure, which we would define in Haskell as a recursive sum type:

data Tree a = Leaf a | Node (Tree a) (Tree a)
    deriving Functor

As I mentioned before, one way of implementing sum types in C++ is through class hierarchies. It would be natural, in an object-oriented language, to implement fmap as a virtual function of the base class Functor and then override it in all subclasses. Unfortunately this is impossible because fmap is a template, parameterized not only by the type of the object it’s acting upon (the this pointer) but also by the return type of the function that’s been applied to it. Virtual functions cannot be templatized in C++. We’ll implement fmap as a generic free function, and we’ll replace pattern matching with dynamic_cast.

The base class must define at least one virtual function in order to support dynamic casting, so we’ll make the destructor virtual (which is a good idea in any case):

template<class T>
struct Tree {
    virtual ~Tree() {};
};

The Leaf is just an Identity functor in disguise:

template<class T>
struct Leaf : public Tree<T> {
    T _label;
    Leaf(T l) : _label(l) {}
};

The Node is a product type:

template<class T>
struct Node : public Tree<T> {
    Tree<T> * _left;
    Tree<T> * _right;
    Node(Tree<T> * l, Tree<T> * r) : _left(l), _right(r) {}
};

When implementing fmap we take advantage of dynamic dispatching on the type of the Tree. The Leaf case applies the Identity version of fmap, and the Node case is treated like a bifunctor composed with two copies of the Tree functor. As a C++ programmer, you’re probably not used to analyzing code in these terms, but it’s a good exercise in categorical thinking.

template<class A, class B>
Tree<B> * fmap(std::function<B(A)> f, Tree<A> * t)
{
    Leaf<A> * pl = dynamic_cast <Leaf<A>*>(t);
    if (pl)
        return new Leaf<B>(f (pl->_label));
    Node<A> * pn = dynamic_cast<Node<A>*>(t);
    if (pn)
        return new Node<B>( fmap<A>(f, pn->_left)
                          , fmap<A>(f, pn->_right));
    return nullptr;
}

For simplicity, I decided to ignore memory and resource management issues, but in production code you would probably use smart pointers (unique or shared, depending on your policy).
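
For the curious, here is what a shared_ptr-based variant might look like. This is only a sketch under the assumption that the tree classes are reworked to hold shared_ptr children; the names TreeS, LeafS, and NodeS are made up to avoid clashing with the definitions above.

#include <functional>
#include <memory>

template<class T>
struct TreeS {
    virtual ~TreeS() {}
};

template<class T>
struct LeafS : TreeS<T> {
    T _label;
    LeafS(T l) : _label(l) {}
};

template<class T>
struct NodeS : TreeS<T> {
    std::shared_ptr<TreeS<T>> _left;
    std::shared_ptr<TreeS<T>> _right;
    NodeS(std::shared_ptr<TreeS<T>> l, std::shared_ptr<TreeS<T>> r)
        : _left(l), _right(r) {}
};

template<class A, class B>
std::shared_ptr<TreeS<B>> fmap(std::function<B(A)> f, std::shared_ptr<TreeS<A>> t)
{
    // Leaf case: apply f to the label, just like the Identity functor
    if (auto pl = std::dynamic_pointer_cast<LeafS<A>>(t))
        return std::make_shared<LeafS<B>>(f(pl->_label));
    // Node case: recurse into both subtrees
    if (auto pn = std::dynamic_pointer_cast<NodeS<A>>(t))
        return std::make_shared<NodeS<B>>(fmap<A, B>(f, pn->_left),
                                          fmap<A, B>(f, pn->_right));
    return nullptr;
}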

Compare it with the Haskell implementation of fmap:

instance Functor Tree where
    fmap f (Leaf a) = Leaf (f a)
    fmap f (Node t t') = Node (fmap f t) (fmap f t')

This implementation can also be automatically derived by the compiler.

The Writer Functor

I promised that I would come back to the Kleisli category I described earlier. Morphisms in that category were represented as “embellished” functions returning the Writer data structure.

type Writer a = (a, String)

I said that the embellishment was somehow related to endofunctors. And, indeed, the Writer type constructor is functorial in a. We don’t even have to implement fmap for it, because it’s just a simple product type.
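
If you wanted to spell it out in C++ anyway, a minimal sketch (assuming the pair-of-value-and-string representation of Writer from the Kleisli post) could look like this:

#include <functional>
#include <string>
#include <utility>

template<class A>
using Writer = std::pair<A, std::string>;

template<class A, class B>
Writer<B> fmap(std::function<B(A)> f, Writer<A> const & w)
{
    // act on the value, pass the log through untouched
    return std::make_pair(f(w.first), w.second);
}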

But what’s the relation between a Kleisli category and a functor — in general? A Kleisli category, being a category, defines composition and identity. Let me remind you that the composition is given by the fish operator:

(>=>) :: (a -> Writer b) -> (b -> Writer c) -> (a -> Writer c)
m1 >=> m2 = \x -> 
    let (y, s1) = m1 x
        (z, s2) = m2 y
    in (z, s1 ++ s2)

and the identity morphism by a function called return:

return :: a -> Writer a
return x = (x, "")

It turns out that, if you look at the types of these two functions long enough (and I mean, long enough), you can find a way to combine them to produce a function with the right type signature to serve as fmap. Like this:

fmap f = id >=> (\x -> return (f x))

Here, the fish operator combines two functions: one of them is the familiar id, and the other is a lambda that applies return to the result of acting with f on the lambda’s argument. The hardest part to wrap your brain around is probably the use of id. Isn’t the argument to the fish operator supposed to be a function that takes a “normal” type and returns an embellished type? Well, not really. Nobody says that a in a -> Writer b must be a “normal” type. It’s a type variable, so it can be anything, in particular it can be an embellished type, like Writer b.

So id will take Writer a and turn it into Writer a. The fish operator will fish out the value of a and pass it as x to the lambda. There, f will turn it into a b and return will embellish it, making it Writer b. Putting it all together, we end up with a function that takes Writer a and returns Writer b, exactly what fmap is supposed to produce.
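
To see it concretely for Writer, here is the expansion spelled out (my own step-by-step check, not compilable code):

fmap f
  = id >=> (\x -> return (f x))
  = \w -> let (y, s1) = id w            -- i.e., (y, s1) = w
              (z, s2) = return (f y)    -- i.e., (z, s2) = (f y, "")
          in (z, s1 ++ s2)
  = \(y, s1) -> (f y, s1 ++ "")
  = \(y, s1) -> (f y, s1)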

Notice that this argument is very general: you can replace Writer with any type constructor. As long as it supports a fish operator and return, you can define fmap as well. So the embellishment in the Kleisli category is always a functor. (Not every functor, though, gives rise to a Kleisli category.)
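
In fact the recipe can be written once and for all. Here's a compilable sketch of my own, using the fish operator from Control.Monad; it assumes a Monad instance, a notion we'll get to later:

import Control.Monad ((>=>))

-- Derive fmap from the fish operator and return:
fmapFromKleisli :: Monad m => (a -> b) -> m a -> m b
fmapFromKleisli f = id >=> (return . f)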

You might wonder if the fmap we have just defined is the same fmap the compiler would have derived for us with deriving Functor. Interestingly enough, it is. This is due to the way Haskell implements polymorphic functions. It’s called parametric polymorphism, and it’s a source of so called theorems for free. One of those theorems says that, if there is an implementation of fmap for a given type constructor, one that preserves identity, then it must be unique.

Covariant and Contravariant Functors

Now that we’ve reviewed the writer functor, let’s go back to the reader functor. It was based on the partially applied function-arrow type constructor:

(->) r

We can rewrite it as a type synonym:

type Reader r a = r -> a

for which the Functor instance, as we’ve seen before, reads:

instance Functor (Reader r) where
    fmap f g = f . g

But just like the pair type constructor, or the Either type constructor, the function type constructor takes two type arguments. The pair and Either were functorial in both arguments — they were bifunctors. Is the function constructor a bifunctor too?

Let’s try to make it functorial in the first argument. We’ll start with a type synonym — it’s just like the Reader but with the arguments flipped:

type Op r a = a -> r

This time we fix the return type, r, and vary the argument type, a. Let’s see if we can somehow match the types in order to implement fmap, which would have the following type signature:

fmap :: (a -> b) -> (a -> r) -> (b -> r)

With just two functions taking a and returning, respectively, b and r, there is simply no way to build a function taking b and returning r! It would be different if we could somehow invert the first function, so that it took b and returned a instead. We can’t invert an arbitrary function, but we can go to the opposite category.

A short recap: For every category C there is a dual category Cᵒᵖ. It’s a category with the same objects as C, but with all the arrows reversed.

Consider a functor that goes between Cᵒᵖ and some other category D:
F :: Cᵒᵖ → D
Such a functor maps a morphism fᵒᵖ :: a → b in Cᵒᵖ to the morphism F fᵒᵖ :: F a → F b in D. But the morphism fᵒᵖ secretly corresponds to some morphism f :: b → a in the original category C. Notice the inversion.

Now, F is a regular functor, but there is another mapping we can define based on F, which is not a functor — let’s call it G. It’s a mapping from C to D. It maps objects the same way F does, but when it comes to mapping morphisms, it reverses them. It takes a morphism f :: b → a in C, maps it first to the opposite morphism fᵒᵖ :: a → b and then uses the functor F on it, to get F fᵒᵖ :: F a → F b.

Considering that F a is the same as G a and F b is the same as G b, the whole trip can be described as:
G f :: (b → a) → (G a → G b)
It’s a “functor with a twist.” A mapping of categories that inverts the direction of morphisms in this manner is called a contravariant functor. Notice that a contravariant functor is just a regular functor from the opposite category. The regular functors, by the way — the kind we’ve been studying thus far — are called covariant functors.

Here’s the typeclass defining a contravariant functor (really, a contravariant endofunctor) in Haskell:

class Contravariant f where
    contramap :: (b -> a) -> (f a -> f b)

Our type constructor Op is an instance of it:

instance Contravariant (Op r) where
    -- (b -> a) -> Op r a -> Op r b
    contramap f g = g . f

Notice that the function f inserts itself before (that is, to the right of) the contents of Op — the function g.
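
One practical caveat (my note): a partially applied type synonym like Op r isn't allowed in an instance head, so to compile this you'd wrap Op in a newtype, for instance:

newtype Op r a = Op (a -> r)

instance Contravariant (Op r) where
    contramap f (Op g) = Op (g . f)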

The definition of contramap for Op may be made even terser, if you notice that it’s just the function composition operator with the arguments flipped. There is a special function for flipping arguments, called flip:

flip :: (a -> b -> c) -> (b -> a -> c)
flip f y x = f x y

With it, we get:

contramap = flip (.)

Profunctors

We’ve seen that the function-arrow operator is contravariant in its first argument and covariant in the second. Is there a name for such a beast? It turns out that, if the target category is Set, such a beast is called a profunctor. Because a contravariant functor is equivalent to a covariant functor from the opposite category, a profunctor is defined as:
Cᵒᵖ × D → Set

Since, to first approximation, Haskell types are sets, we apply the name Profunctor to a type constructor p of two arguments, which is contra-functorial in the first argument and functorial in the second. Here’s the appropriate typeclass taken from the Data.Profunctor library:

class Profunctor p where
  dimap :: (a -> b) -> (c -> d) -> p b c -> p a d
  dimap f g = lmap f . rmap g
  lmap :: (a -> b) -> p b c -> p a c
  lmap f = dimap f id
  rmap :: (b -> c) -> p a b -> p a c
  rmap = dimap id

All three functions come with default implementations. Just like with Bifunctor, when declaring an instance of Profunctor, you have a choice of either implementing dimap and accepting the defaults for lmap and rmap, or implementing both lmap and rmap and accepting the default for dimap.

Now we can assert that the function-arrow operator is an instance of a Profunctor:

instance Profunctor (->) where
  dimap ab cd bc = cd . bc . ab
  lmap = flip (.)
  rmap = (.)
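
Here's a small usage example of my own (dimap for the function arrow comes from the Data.Profunctor module of the profunctors package): it adapts a function at both ends, pre-processing its input and post-processing its output.

import Data.Profunctor (dimap)

celsiusToFahrenheit :: Double -> Double
celsiusToFahrenheit c = c * 9 / 5 + 32

-- read adapts the input, show adapts the output
stringly :: String -> String
stringly = dimap read show celsiusToFahrenheit
-- stringly "100.0" evaluates to "212.0"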

Profunctors have their application in the Haskell lens library. We’ll see them again when we talk about ends and coends.

The Hom-Functor

The above examples are the reflection of a more general statement that the mapping that takes a pair of objects a and b and assigns to it the set of morphisms between them, the hom-set C(a, b), is a functor. It is a functor from the product category Cᵒᵖ × C to the category of sets, Set.

Let’s define its action on morphisms. A morphism in Cᵒᵖ × C is a pair of morphisms from C:

f :: a'→ a
g :: b → b'

The lifting of this pair must be a morphism (a function) from the set C(a, b) to the set C(a', b'). Just pick any element h of C(a, b) (it’s a morphism from a to b) and assign to it:

g ∘ h ∘ f

which is an element of C(a', b').
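
In Haskell terms (my own rendering), this lifting is literally the dimap we wrote for the function arrow:

-- Given f :: a' -> a and g :: b -> b', lift the pair to act on any h :: a -> b
homLift :: (a' -> a) -> (b -> b') -> (a -> b) -> (a' -> b')
homLift f g h = g . h . f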

As you can see, the hom-functor is a special case of a profunctor.

Challenges

  1. Show that the data type:
    data Pair a b = Pair a b

    is a bifunctor. For additional credit implement all three methods of Bifunctor and use equational reasoning to show that these definitions are compatible with the default implementations whenever they can be applied.

  2. Show the isomorphism between the standard definition of Maybe and this desugaring:
    type Maybe' a = Either (Const () a) (Identity a)

    Hint: Define two mappings between the two implementations. For additional credit, show that they are the inverse of each other using equational reasoning.

  3. Let’s try another data structure. I call it a PreList because it’s a precursor to a List. It replaces recursion with a type parameter b.
    data PreList a b = Nil | Cons a b

    You could recover our earlier definition of a List by recursively applying PreList to itself (we’ll see how it’s done when we talk about fixed points).

    Show that PreList is an instance of Bifunctor.

  4. Show that the following data types define bifunctors in a and b:
    data K2 c a b = K2 c
    data Fst a b = Fst a
    data Snd a b = Snd b

    For additional credit, check your solutions against Conor McBride’s paper Clowns to the Left of me, Jokers to the Right.

  5. Define a bifunctor in a language other than Haskell. Implement bimap for a generic pair in that language.
  6. Should std::map be considered a bifunctor or a profunctor in the two template arguments Key and T? How would you redesign this data type to make it so?

Next: Function Types.

Acknowledgment

As usual, big thanks go to Gershom Bazerman for reviewing this article.


Categories for Programmers. Previously Products and Coproducts. See the Table of Contents.

We’ve seen two basic ways of combining types: using a product and a coproduct. It turns out that a lot of data structures in everyday programming can be built using just these two mechanisms. This fact has important practical consequences. Many properties of data structures are composable. For instance, if you know how to compare values of basic types for equality, and you know how to generalize these comparisons to product and coproduct types, you can automate the derivation of equality operators for composite types. In Haskell you can automatically derive equality, comparison, conversion to and from string, and more, for a large subset of composite types.
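
For instance, in Haskell a single deriving clause takes care of all of these for a simple composite type (a small example of my own):

data Point = Point Double Double
    deriving (Eq, Ord, Show, Read)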

Let’s have a closer look at product and sum types as they appear in programming.

Product Types

The canonical implementation of a product of two types in a programming language is a pair. In Haskell, a pair is a primitive type constructor; in C++ it’s a relatively complex template defined in the Standard Library.

Pairs are not strictly commutative: a pair (Int, Bool) cannot be substituted for a pair (Bool, Int), even though they carry the same information. They are, however, commutative up to isomorphism — the isomorphism being given by the swap function (which is its own inverse):

swap :: (a, b) -> (b, a)
swap (x, y) = (y, x)

You can think of the two pairs as simply using a different format for storing the same data. It’s just like big endian vs. little endian.

You can combine an arbitrary number of types into a product by nesting pairs inside pairs, but there is an easier way: nested pairs are equivalent to tuples. It’s the consequence of the fact that different ways of nesting pairs are isomorphic. If you want to combine three types in a product, a, b, and c, in this order, you can do it in two ways:

((a, b), c)

or

(a, (b, c))

These types are different — you can’t pass one to a function that expects the other — but their elements are in one-to-one correspondence. There is a function that maps one to another:

alpha :: ((a, b), c) -> (a, (b, c))
alpha ((x, y), z) = (x, (y, z))

and this function is invertible:

alpha_inv :: (a, (b, c)) -> ((a, b), c)
alpha_inv  (x, (y, z)) = ((x, y), z)

so it’s an isomorphism. These are just different ways of repackaging the same data.

You can interpret the creation of a product type as a binary operation on types. From that perspective, the above isomorphism looks very much like the associativity law we’ve seen in monoids:

(a * b) * c = a * (b * c)

Except that, in the monoid case, the two ways of composing products were equal, whereas here they are only equal “up to isomorphism.”

If we can live with isomorphisms, and don’t insist on strict equality, we can go even further and show that the unit type, (), is the unit of the product the same way 1 is the unit of multiplication. Indeed, the pairing of a value of some type a with a unit doesn’t add any information. The type:

(a, ())

is isomorphic to a. Here’s the isomorphism:

rho :: (a, ()) -> a
rho (x, ()) = x
rho_inv :: a -> (a, ())
rho_inv x = (x, ())

These observations can be formalized by saying that Set (the category of sets) is a monoidal category. It’s a category that’s also a monoid, in the sense that you can multiply objects (here, take their cartesian product). I’ll talk more about monoidal categories, and give the full definition in the future.

There is a more general way of defining product types in Haskell — especially, as we’ll see soon, when they are combined with sum types. It uses named constructors with multiple arguments. A pair, for instance, can be defined alternatively as:

data Pair a b = P a b

Here, Pair a b is the name of the type parameterized by two other types, a and b; and P is the name of the data constructor. You define a pair type by passing two types to the Pair type constructor. You construct a pair value by passing two values of appropriate types to the constructor P. For instance, let’s define a value stmt as a pair of String and Bool:

stmt :: Pair String Bool
stmt = P "This statement is" False

The first line is the type declaration. It uses the type constructor Pair, with String and Bool replacing a and the b in the generic definition of Pair. The second line defines the actual value by passing a concrete string and a concrete Boolean to the data constructor P. Type constructors are used to construct types; data constructors, to construct values.

Since the name spaces for type and data constructors are separate in Haskell, you will often see the same name used for both, as in:

data Pair a b = Pair a b

And if you squint hard enough, you may even view the built-in pair type as a variation on this kind of declaration, where the name Pair is replaced with the binary operator (,). In fact you can use (,) just like any other named constructor and create pairs using prefix notation:

stmt = (,) "This statement is" False

Similarly, you can use (,,) to create triples, and so on.

Instead of using generic pairs or tuples, you can also define specific named product types, as in:

data Stmt = Stmt String Bool

which is just a product of String and Bool, but it’s given its own name and constructor. The advantage of this style of declaration is that you may define many types that have the same content but different meaning and functionality, and which cannot be substituted for each other.

Programming with tuples and multi-argument constructors can get messy and error prone — keeping track of which component represents what. It’s often preferable to give names to components. A product type with named fields is called a record in Haskell, and a struct in C.

Records

Let’s have a look at a simple example. We want to describe chemical elements by combining two strings, name and symbol; and an integer, the atomic number; into one data structure. We can use a tuple (String, String, Int) and remember which component represents what. We would extract components by pattern matching, as in this function that checks if the symbol of the element is the prefix of its name (as in He being the prefix of Helium):

startsWithSymbol :: (String, String, Int) -> Bool
startsWithSymbol (name, symbol, _) = isPrefixOf symbol name

This code is error prone, and is hard to read and maintain. It’s much better to define a record:

data Element = Element { name         :: String
                       , symbol       :: String
                       , atomicNumber :: Int }

The two representations are isomorphic, as witnessed by these two conversion functions, which are the inverse of each other:

tupleToElem :: (String, String, Int) -> Element
tupleToElem (n, s, a) = Element { name = n
                                , symbol = s
                                , atomicNumber = a }
elemToTuple :: Element -> (String, String, Int)
elemToTuple e = (name e, symbol e, atomicNumber e)

Notice that the names of record fields also serve as functions to access these fields. For instance, atomicNumber e retrieves the atomicNumber field from e. We use atomicNumber as a function of the type:

atomicNumber :: Element -> Int

With the record syntax for Element, our function startsWithSymbol becomes more readable:

startsWithSymbol :: Element -> Bool
startsWithSymbol e = isPrefixOf (symbol e) (name e)

We could even use the Haskell trick of turning the function isPrefixOf into an infix operator by surrounding it with backquotes, and make it read almost like a sentence:

startsWithSymbol e = symbol e `isPrefixOf` name e

The parentheses could be omitted in this case, because an infix operator has lower precedence than a function call.

Sum Types

Just as the product in the category of sets gives rise to product types, the coproduct gives rise to sum types. The canonical implementation of a sum type in Haskell is:

data Either a b = Left a | Right b

And like pairs, Eithers are commutative (up to isomorphism), can be nested, and the nesting order is irrelevant (up to isomorphism). So we can, for instance, define a sum equivalent of a triple:

data OneOfThree a b c = Sinistral a | Medial b | Dextral c

and so on.

It turns out that Set is also a (symmetric) monoidal category with respect to coproduct. The role of the binary operation is played by the disjoint sum, and the role of the unit element is played by the initial object. In terms of types, we have Either as the monoidal operator and Void, the uninhabited type, as its neutral element. You can think of Either as plus, and Void as zero. Indeed, adding Void to a sum type doesn’t change its content. For instance:

Either a Void

is isomorphic to a. That’s because there is no way to construct a Right version of this type — there isn’t a value of type Void. The only inhabitants of Either a Void are constructed using the Left constructor, and they simply encapsulate a value of type a. So, symbolically, a + 0 = a.
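
Here's a quick witness of this isomorphism (my own sketch, using absurd from Data.Void):

import Data.Void (Void, absurd)

toA :: Either a Void -> a
toA (Left x)  = x
toA (Right v) = absurd v  -- unreachable: there are no values of type Void

fromA :: a -> Either a Void
fromA = Left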

Sum types are pretty common in Haskell, but their C++ equivalents, unions or variants, are much less common. There are several reasons for that.

First of all, the simplest sum types are just enumerations and are implemented using enum in C++. The equivalent of the Haskell sum type:

data Color = Red | Green | Blue

is the C++:

enum { Red, Green, Blue };

An even simpler sum type:

data Bool = True | False

is the primitive bool in C++.

Simple sum types that encode the presence or absence of a value are variously implemented in C++ using special tricks and “impossible” values, like empty strings, negative numbers, null pointers, etc. This kind of optionality, if deliberate, is expressed in Haskell using the Maybe type:

data Maybe a = Nothing | Just a

The Maybe type is a sum of two types. You can see this if you separate the two constructors into individual types. The first one would look like this:

data NothingType = Nothing

It’s an enumeration with one value called Nothing. In other words, it’s a singleton, which is equivalent to the unit type (). The second part:

data JustType a = Just a

is just an encapsulation of the type a. We could have encoded Maybe as a type synonym:

type Maybe' a = Either () a

More complex sum types are often faked in C++ using pointers. A pointer can be either null, or point to a value of specific type. For instance, a Haskell list type, which can be defined as a (recursive) sum type:

data List a = Nil | Cons a (List a)

can be translated to C++ using the null pointer trick to implement the empty list:

template<class A> 
class List {
    Node<A> * _head;
public:
    List() : _head(nullptr) {}  // Nil
    List(A a, List<A> l)        // Cons
      : _head(new Node<A>(a, l))
    {}
};

Notice that the two Haskell constructors Nil and Cons are translated into two overloaded List constructors with analogous arguments (none, for Nil; and a value and a list for Cons). The List class doesn’t need a tag to distinguish between the two components of the sum type. Instead it uses the special nullptr value for _head to encode Nil.

The main difference, though, between Haskell and C++ types is that Haskell data structures are immutable. If you create an object using one particular constructor, the object will forever remember which constructor was used and what arguments were passed to it. So a Maybe object that was created as Just "energy" will never turn into Nothing. Similarly, an empty list will forever be empty, and a list of three elements will always have the same three elements.

It’s this immutability that makes construction reversible. Given an object, you can always disassemble it down to parts that were used in its construction. This deconstruction is done with pattern matching and it reuses constructors as patterns. Constructor arguments, if any, are replaced with variables (or other patterns).

The List data type has two constructors, so the deconstruction of an arbitrary List uses two patterns corresponding to those constructors. One matches the empty Nil list, and the other a Cons-constructed list. For instance, here’s the definition of a simple function on Lists:

maybeTail :: List a -> Maybe (List a)
maybeTail Nil = Nothing
maybeTail (Cons _ t) = Just t

The first part of the definition of maybeTail uses the Nil constructor as pattern and returns Nothing. The second part uses the Cons constructor as pattern. It replaces the first constructor argument with a wildcard, because we are not interested in it. The second argument to Cons is bound to the variable t (I will call these things variables even though, strictly speaking, they never vary: once bound to an expression, a variable never changes). The return value is Just t. Now, depending on how your List was created, it will match one of the clauses. If it was created using Cons, the two arguments that were passed to it will be retrieved (and the first discarded).

Even more elaborate sum types are implemented in C++ using polymorphic class hierarchies. A family of classes with a common ancestor may be understood as one variant type, in which the vtable serves as a hidden tag. What in Haskell would be done by pattern matching on the constructor, and by calling specialized code, in C++ is accomplished by dispatching a call to a virtual function based on the vtable pointer.

You will rarely see union used as a sum type in C++ because of severe limitations on what can go into a union. Before C++11 you couldn’t even put a std::string into a union, because it has a copy constructor; and even now you have to manage the construction and destruction of such a member by hand.

Algebra of Types

Taken separately, product and sum types can be used to define a variety of useful data structures, but the real strength comes from combining the two. Once again we are invoking the power of composition.

Let’s summarize what we’ve discovered so far. We’ve seen two commutative monoidal structures underlying the type system: We have the sum types with Void as the neutral element, and the product types with the unit type, (), as the neutral element. We’d like to think of them as analogous to addition and multiplication. In this analogy, Void would correspond to zero, and unit, (), to one.

Let’s see how far we can stretch this analogy. For instance, does multiplication by zero give zero? In other words, is a product type with one component being Void isomorphic to Void? For example, is it possible to create a pair of, say Int and Void?

To create a pair you need two values. Although you can easily come up with an integer, there is no value of type Void. Therefore, for any type a, the type (a, Void) is uninhabited — has no values — and is therefore equivalent to Void. In other words, a*0 = 0.

Another thing that links addition and multiplication is the distributive property:

a * (b + c) = a * b + a * c

Does it also hold for product and sum types? Yes, it does — up to isomorphisms, as usual. The left hand side corresponds to the type:

(a, Either b c)

and the right hand side corresponds to the type:

Either (a, b) (a, c)

Here’s the function that converts them one way:

prodToSum :: (a, Either b c) -> Either (a, b) (a, c)
prodToSum (x, e) = 
    case e of
      Left  y -> Left  (x, y)
      Right z -> Right (x, z)

and here’s one that goes the other way:

sumToProd :: Either (a, b) (a, c) -> (a, Either b c)
sumToProd e = 
    case e of
      Left  (x, y) -> (x, Left  y)
      Right (x, z) -> (x, Right z)

The case ... of expression is used for pattern matching inside functions. Each pattern is followed by an arrow and the expression to be evaluated when the pattern matches. For instance, if you call prodToSum with the value:

prod1 :: (Int, Either String Float)
prod1 = (2, Left "Hi!")

the e in case e of will be equal to Left "Hi!". It will match the pattern Left y, substituting "Hi!" for y. Since the x has already been matched to 2, the result of the case of clause, and the whole function, will be Left (2, "Hi!"), as expected.

I’m not going to prove that these two functions are the inverse of each other, but if you think about it, they must be! They are just trivially re-packing the contents of the two data structures. It’s the same data, only different format.

Mathematicians have a name for such two intertwined monoids: it’s called a semiring. It’s not a full ring, because we can’t define subtraction of types. That’s why a semiring is sometimes called a rig, which is a pun on “ring without an n” (negative). But barring that, we can get a lot of mileage from translating statements about, say, natural numbers, which form a rig, to statements about types. Here’s a translation table with some entries of interest:

Numbers      Types
0            Void
1            ()
a + b        Either a b = Left a | Right b
a * b        (a, b) or Pair a b = Pair a b
2 = 1 + 1    data Bool = True | False
1 + a        data Maybe a = Nothing | Just a

The list type is quite interesting, because it’s defined as a solution to an equation. The type we are defining appears on both sides of the equation:

List a = Nil | Cons a (List a)

If we do our usual substitutions, and also replace List a with x, we get the equation:

x = 1 + a * x

We can’t solve it using traditional algebraic methods because we can’t subtract or divide types. But we can try a series of substitutions, where we keep replacing x on the right hand side with (1 + a*x), and use the distributive property. This leads to the following series:

x = 1 + a*x
x = 1 + a*(1 + a*x) = 1 + a + a*a*x
x = 1 + a + a*a*(1 + a*x) = 1 + a + a*a + a*a*a*x
...
x = 1 + a + a*a + a*a*a + a*a*a*a...

We end up with an infinite sum of products (tuples), which can be interpreted as: A list is either empty, 1; or a singleton, a; or a pair, a*a; or a triple, a*a*a; etc… Well, that’s exactly what a list is — a string of a’s!

There’s much more to lists than that, and we’ll come back to them and other recursive data structures after we learn about functors and fixed points.

Solving equations with symbolic variables — that’s algebra! It’s what gives these types their name: algebraic data types.

Finally, I should mention one very important interpretation of the algebra of types. Notice that a product of two types a and b must contain both a value of type a and a value of type b, which means both types must be inhabited. A sum of two types, on the other hand, contains either a value of type a or a value of type b, so it’s enough if one of them is inhabited. Logical and and or also form a semiring, and it too can be mapped into type theory:

Logic        Types
false        Void
true         ()
a || b       Either a b = Left a | Right b
a && b       (a, b)

This analogy goes deeper, and is the basis of the Curry-Howard isomorphism between logic and type theory. We’ll come back to it when we talk about function types.

Challenges

  1. Show the isomorphism between Maybe a and Either () a.
  2. Here’s a sum type defined in Haskell:
    data Shape = Circle Float
               | Rect Float Float

    When we want to define a function like area that acts on a Shape, we do it by pattern matching on the two constructors:

    area :: Shape -> Float
    area (Circle r) = pi * r * r
    area (Rect d h) = d * h

    Implement Shape in C++ or Java as an interface and create two classes: Circle and Rect. Implement area as a virtual function.

  3. Continuing with the previous example: We can easily add a new function circ that calculates the circumference of a Shape. We can do it without touching the definition of Shape:
    circ :: Shape -> Float
    circ (Circle r) = 2.0 * pi * r
    circ (Rect d h) = 2.0 * (d + h)

    Add circ to your C++ or Java implementation. What parts of the original code did you have to touch?

  4. Continuing further: Add a new shape, Square, to Shape and make all the necessary updates. What code did you have to touch in Haskell vs. C++ or Java? (Even if you’re not a Haskell programmer, the modifications should be pretty obvious.)
  5. Show that a + a = 2 * a holds for types (up to isomorphism). Remember that 2 corresponds to Bool, according to our translation table.

Next: Functors.

Acknowledgments

Thanks go to Gershom Bazerman for reviewing this post and helpful comments.


In the previous installment of Categories for Programmers, Categories Great and Small, I gave a few examples of simple categories. In this installment we’ll work through a more advanced example. If you’re new to the series, here’s the Table of Contents.

Composition of Logs

You’ve seen how to model types and pure functions as a category. I also mentioned that there is a way to model side effects, or non-pure functions, in category theory. Let’s have a look at one such example: functions that log or trace their execution. Something that, in an imperative language, would likely be implemented by mutating some global state, as in:

string logger;

bool negate(bool b) {
     logger += "Not so! ";
     return !b;
}

You know that this is not a pure function, because its memoized version would fail to produce a log. This function has side effects.

In modern programming, we try to stay away from global mutable state as much as possible — if only because of the complications of concurrency. And you would never put code like this in a library.

Fortunately for us, it’s possible to make this function pure. You just have to pass the log explicitly, in and out. Let’s do that by adding a string argument, and pairing regular output with a string that contains the updated log:

pair<bool, string> negate(bool b, string logger) {
     return make_pair(!b, logger + "Not so! ");
}

This function is pure: it has no side effects, it returns the same pair every time it’s called with the same arguments, and it can be memoized if necessary. However, considering the cumulative nature of the log, you’d have to memoize all possible histories that can lead to a given call. There would be a separate memo entry for:

negate(true, "It was the best of times. ");

and

negate(true, "It was the worst of times. ");

and so on.

It’s also not a very good interface for a library function. The callers are free to ignore the string in the return type, so that’s not a huge burden; but they are forced to pass a string as input, which might be inconvenient.

Is there a way to do the same thing less intrusively? Is there a way to separate concerns? In this simple example, the main purpose of the function negate is to turn one Boolean into another. The logging is secondary. Granted, the message that is logged is specific to the function, but the task of aggregating the messages into one continuous log is a separate concern. We still want the function to produce a string, but we’d like to unburden it from producing a log. So here’s the compromise solution:

pair<bool, string> negate(bool b) {
     return make_pair(!b, "Not so! ");
}

The idea is that the log will be aggregated between function calls.

To see how this can be done, let’s switch to a slightly more realistic example. We have one function from string to string that turns lower case characters to upper case:

string toUpper(string s) {
    string result;
    int (*toupperp)(int) = &toupper; // toupper is overloaded
    transform(begin(s), end(s), back_inserter(result), toupperp);
    return result;
}

and another that splits a string into a vector of strings, breaking it on whitespace boundaries:

vector<string> toWords(string s) {
    return words(s);
}

The actual work is done in the auxiliary function words:

vector<string> words(string s) {
    vector<string> result{""};
    for (auto i = begin(s); i != end(s); ++i)
    {
        if (isspace(*i))
            result.push_back("");
        else
            result.back() += *i;
    }
    return result;
}

We want to modify the functions toUpper and toWords so that they piggyback a message string on top of their regular return values.

We will “embellish” the return values of these functions. Let’s do it in a generic way by defining a template Writer that encapsulates a pair whose first component is a value of arbitrary type A and the second component is a string:

template<class A>
using Writer = pair<A, string>;

Here are the embellished functions:

Writer<string> toUpper(string s) {
    string result;
    int (*toupperp)(int) = &toupper;
    transform(begin(s), end(s), back_inserter(result), toupperp);
    return make_pair(result, "toUpper ");
}

Writer<vector<string>> toWords(string s) {
    return make_pair(words(s), "toWords ");
}

We want to compose these two functions into another embellished function that uppercases a string and splits it into words, all the while producing a log of those actions. Here’s how we may do it:

Writer<vector<string>> process(string s) {
    auto p1 = toUpper(s);
    auto p2 = toWords(p1.first);
    return make_pair(p2.first, p1.second + p2.second);
}

We have accomplished our goal: The aggregation of the log is no longer the concern of the individual functions. They produce their own messages, which are then, externally, concatenated into a larger log.

Now imagine a whole program written in this style. It’s a nightmare of repetitive, error-prone code. But we are programmers. We know how to deal with repetitive code: we abstract it! This is, however, not your run of the mill abstraction — we have to abstract function composition itself. But composition is the essence of category theory, so before we write more code, let’s analyze the problem from the categorical point of view.

The Writer Category

The idea of embellishing the return types of a bunch of functions in order to piggyback some additional functionality turns out to be very fruitful. We’ll see many more examples of it. The starting point is our regular category of types and functions. We’ll leave the types as objects, but redefine our morphisms to be the embellished functions.

For instance, suppose that we want to embellish the function isEven that goes from int to bool. We turn it into a morphism that is represented by an embellished function. The important point is that this morphism is still considered an arrow between the objects int and bool, even though the embellished function returns a pair:

pair<bool, string> isEven(int n) {
     return make_pair(n % 2 == 0, "isEven ");
}

By the laws of a category, we should be able to compose this morphism with another morphism that goes from the object bool to whatever. In particular, we should be able to compose it with our earlier negate:

pair<bool, string> negate(bool b) {
     return make_pair(!b, "Not so! ");
}

Obviously, we cannot compose these two morphisms the same way we compose regular functions, because of the input/output mismatch. Their composition should look more like this:

pair<bool, string> isOdd(int n) {
    pair<bool, string> p1 = isEven(n);
    pair<bool, string> p2 = negate(p1.first);
    return make_pair(p2.first, p1.second + p2.second);
}

So here’s the recipe for the composition of two morphisms in this new category we are constructing:

  1. Execute the embellished function corresponding to the first morphism
  2. Extract the first component of the result pair and pass it to the embellished function corresponding to the second morphism
  3. Concatenate the second component (the string) of the first result and the second component (the string) of the second result
  4. Return a new pair combining the first component of the final result with the concatenated string.

If we want to abstract this composition as a higher order function in C++, we have to use a template parameterized by three types corresponding to three objects in our category. It should take two embellished functions that are composable according to our rules, and return a third embellished function:

template<class A, class B, class C>
function<Writer<C>(A)> compose(function<Writer<B>(A)> m1, 
                               function<Writer<C>(B)> m2)
{
    return [m1, m2](A x) {
        auto p1 = m1(x);
        auto p2 = m2(p1.first);
        return make_pair(p2.first, p1.second + p2.second);
    };
}

Now we can go back to our earlier example and implement the composition of toUpper and toWords using this new template:

Writer<vector<string>> process(string s) {
   return compose<string, string, vector<string>>(toUpper, toWords)(s);
}

There is still a lot of noise with the passing of types to the compose template. This can be avoided as long as you have a C++14-compliant compiler that supports generalized lambda functions with return type deduction (credit for this code goes to Eric Niebler):

auto const compose = [](auto m1, auto m2) {
    return [m1, m2](auto x) {
        auto p1 = m1(x);
        auto p2 = m2(p1.first);
        return make_pair(p2.first, p1.second + p2.second);
    };
};

In this new definition, the implementation of process simplifies to:

Writer<vector<string>> process(string s){
   return compose(toUpper, toWords)(s);
}

But we are not finished yet. We have defined composition in our new category, but what are the identity morphisms? These are not our regular identity functions! They have to be morphisms from type A back to type A, which means they are embellished functions of the form:

Writer<A> identity(A);

They have to behave like units with respect to composition. If you look at our definition of composition, you’ll see that an identity morphism should pass its argument without change, and only contribute an empty string to the log:

template<class A>
Writer<A> identity(A x) {
    return make_pair(x, "");
}

You can easily convince yourself that the category we have just defined is indeed a legitimate category. In particular, our composition is trivially associative. If you follow what’s happening with the first component of each pair, it’s just a regular function composition, which is associative. The second components are being concatenated, and concatenation is also associative.

An astute reader may notice that it would be easy to generalize this construction to any monoid, not just the string monoid. We would use mappend inside compose and mempty inside identity (in place of + and ""). There really is no reason to limit ourselves to logging just strings. A good library writer should be able to identify the bare minimum of constraints that make the library work — here the logging library’s only requirement is that the log have monoidal properties.
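
In Haskell notation (a sketch of my own, anticipating the next section), the generalized composition and identity could look like this:

composeW :: Monoid w => (a -> (b, w)) -> (b -> (c, w)) -> (a -> (c, w))
composeW m1 m2 = \x ->
    let (y, w1) = m1 x
        (z, w2) = m2 y
    in (z, w1 `mappend` w2)

identityW :: Monoid w => a -> (a, w)
identityW x = (x, mempty)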

Writer in Haskell

The same thing in Haskell is a little more terse, and we also get a lot more help from the compiler. Let’s start by defining the Writer type:

type Writer a = (a, String)

Here I’m just defining a type alias, an equivalent of a typedef (or using) in C++. The type Writer is parameterized by a type variable a and is equivalent to a pair of a and String. The syntax for pairs is minimal: just two items in parentheses, separated by a comma.

Our morphisms are functions from an arbitrary type to some Writer type:

a -> Writer b

We’ll declare the composition as a funny infix operator, sometimes called the “fish”:

(>=>) :: (a -> Writer b) -> (b -> Writer c) -> (a -> Writer c)

It’s a function of two arguments, each being a function on its own, and returning a function. The first argument is of the type (a->Writer b), the second is (b->Writer c), and the result is (a->Writer c).

Here’s the definition of this infix operator — the two arguments m1 and m2 appearing on either side of the fishy symbol:

m1 >=> m2 = \x -> 
    let (y, s1) = m1 x
        (z, s2) = m2 y
    in (z, s1 ++ s2)

The result is a lambda function of one argument x. The lambda is written as a backslash — think of it as the Greek letter λ with an amputated leg.

The let expression lets you declare auxiliary variables. Here the result of the call to m1 is pattern matched to a pair of variables (y, s1); and the result of the call to m2, with the argument y from the first pattern, is matched to (z, s2).

It is common in Haskell to pattern match pairs, rather than use accessors, as we did in C++. Other than that there is a pretty straightforward correspondence between the two implementations.

The overall value of the let expression is specified in its in clause: here it’s a pair whose first component is z and the second component is the concatenation of two strings, s1++s2.

I will also define the identity morphism for our category, but for reasons that will become clear much later, I will call it return.

return :: a -> Writer a
return x = (x, "")

For completeness, let’s have the Haskell versions of the embellished functions upCase and toWords:

upCase :: String -> Writer String
upCase s = (map toUpper s, "upCase ")
toWords :: String -> Writer [String]
toWords s = (words s, "toWords ")

The function map corresponds to the C++ transform. It applies the character function toUpper to the string s. The auxiliary function words is defined in the standard Prelude library.

Finally, the composition of the two functions is accomplished with the help of the fish operator:

process :: String -> Writer [String]
process = upCase >=> toWords
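
For instance, assuming the definitions above are in scope (with toUpper imported from Data.Char), here's a quick check of my own:

main :: IO ()
main = print (process "hello world")
-- prints (["HELLO","WORLD"],"upCase toWords ")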

Kleisli Categories

You might have guessed that I haven’t invented this category on the spot. It’s an example of the so called Kleisli category — a category based on a monad. We are not ready to discuss monads yet, but I wanted to give you a taste of what they can do. For our limited purposes, a Kleisli category has, as objects, the types of the underlying programming language. Morphisms from type A to type B are functions that go from A to a type derived from B using the particular embellishment. Each Kleisli category defines its own way of composing such morphisms, as well as the identity morphisms with respect to that composition. (Later we’ll see that the imprecise term “embellishment” corresponds to the notion of an endofunctor in a category.)

The particular monad that I used as the basis of the category in this post is called the writer monad and it’s used for logging or tracing the execution of functions. It’s also an example of a more general mechanism for embedding effects in pure computations. You’ve seen previously that we could model programming-language types and functions in the category of sets (disregarding bottoms, as usual). Here we have extended this model to a slightly different category, a category where morphisms are represented by embellished functions, and their composition does more than just pass the output of one function to the input of another. We have one more degree of freedom to play with: the composition itself. It turns out that this is exactly the degree of freedom which makes it possible to give simple denotational semantics to programs that in imperative languages are traditionally implemented using side effects.

Challenge

A function that is not defined for all possible values of its argument is called a partial function. It’s not really a function in the mathematical sense, so it doesn’t fit the standard categorical mold. It can, however, be represented by a function that returns an embellished type optional:

template<class A> class optional {
    bool _isValid;
    A    _value;
public:
    optional()    : _isValid(false) {}
    optional(A v) : _isValid(true), _value(v) {}
    bool isValid() const { return _isValid; }
    A value() const { return _value; }
};

As an example, here’s the implementation of the embellished function safe_root:

optional<double> safe_root(double x) {
    if (x >= 0) return optional<double>{sqrt(x)};
    else return optional<double>{};
}

Here’s the challenge:

  1. Construct the Kleisli category for partial functions (define composition and identity).
  2. Implement the embellished function safe_reciprocal that returns a valid reciprocal of its argument, if it’s different from zero.
  3. Compose safe_root and safe_reciprocal to implement safe_root_reciprocal that calculates sqrt(1/x) whenever possible.

Acknowledgments

I’m grateful to Eric Niebler for reading the draft and providing the clever implementation of compose that uses advanced features of C++14 to drive type inference. I was able to cut the whole section of old fashioned template magic that did the same thing using type traits. Good riddance! I’m also grateful to Gershom Bazerman for useful comments that helped me clarify some important points.

Next: Products and Coproducts.


In the previous installment of Category Theory for Programmers we talked about the category of types and functions. If you’re new to the series, here’s the Table of Contents.

You can get real appreciation for categories by studying a variety of examples. Categories come in all shapes and sizes and often pop up in unexpected places. We’ll start with something really simple.

No Objects

The most trivial category is one with zero objects and, consequently, zero morphisms. It’s a very sad category by itself, but it may be important in the context of other categories, for instance, in the category of all categories (yes, there is one). If you think that an empty set makes sense, then why not an empty category?

Simple Graphs

You can build categories just by connecting objects with arrows. You can imagine starting with any directed graph and making it into a category by simply adding more arrows. First, add an identity arrow at each node. Then, for any two arrows such that the end of one coincides with the beginning of the other (in other words, any two composable arrows), add a new arrow to serve as their composition. Every time you add a new arrow, you have to also consider its composition with any other arrow (except for the identity arrows) and itself. You usually end up with infinitely many arrows, but that’s okay.

Another way of looking at this process is that you’re creating a category, which has an object for every node in the graph, and all possible chains of composable graph edges as morphisms. (You may even consider identity morphisms as special cases of chains of length zero.)

Such a category is called a free category generated by a given graph. It’s an example of a free construction, a process of completing a given structure by extending it with a minimum number of items to satisfy its laws (here, the laws of a category). We’ll see more examples of it in the future.
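
As a toy illustration (my own sketch, not from the text), you can model a morphism of the free category as a path, that is, a list of composable edges, with the empty path playing the role of the identity:

type Node = String

data Edge = Edge { from :: Node, to :: Node }

-- A morphism from node a to node b is a chain of edges leading from a to b.
type Path = [Edge]

-- Composition is concatenation (it assumes the end of p coincides with the start of q);
-- the identity on any node is the empty path [].
composePath :: Path -> Path -> Path
composePath p q = p ++ q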

Orders

And now for something completely different! A category where a morphism is a relation between objects: the relation of being less than or equal. Let’s check if it indeed is a category. Do we have identity morphisms? Every object is less than or equal to itself: check! Do we have composition? If a <= b and b <= c then a <= c: check! Is composition associative? Check! A set with a relation like this is called a preorder, so a preorder is indeed a category.

You can also have a stronger relation, that satisfies an additional condition that, if a <= b and b <= a then a must be the same as b. That’s called a partial order.

Finally, you can impose the condition that any two objects are in a relation with each other, one way or another; and that gives you a linear order or total order.

Let’s characterize these ordered sets as categories. A preorder is a category where there is at most one morphism going from any object a to any object b. Another name for such a category is “thin.” A preorder is a thin category.

A set of morphisms from object a to object b in a category C is called a hom-set and is written as C(a, b) (or, sometimes, Hom_C(a, b)). So every hom-set in a preorder is either empty or a singleton. That includes the hom-set C(a, a), the set of morphisms from a to a, which must be a singleton, containing only the identity, in any preorder. You may, however, have cycles in a preorder. Cycles are forbidden in a partial order.

It’s very important to be able to recognize preorders, partial orders, and total orders because of sorting. Sorting algorithms, such as quicksort, bubble sort, merge sort, etc., can only work correctly on total orders. Partial orders can be sorted using topological sort.

Monoid as Set

Monoid is an embarrassingly simple but amazingly powerful concept. It’s the concept behind basic arithmetic: Both addition and multiplication form a monoid. Monoids are ubiquitous in programming. They show up as strings, lists, foldable data structures, futures in concurrent programming, events in functional reactive programming, and so on.

Traditionally, a monoid is defined as a set with a binary operation. All that’s required from this operation is that it’s associative, and that there is one special element that behaves like a unit with respect to it.

For instance, natural numbers with zero form a monoid under addition. Associativity means that:

(a + b) + c = a + (b + c)

(In other words, we can skip parentheses when adding numbers.)

The neutral element is zero, because:

0 + a = a

and

a + 0 = a

The second equation is redundant, because addition is commutative (a + b = b + a), but commutativity is not part of the definition of a monoid. For instance, string concatenation is not commutative and yet it forms a monoid. The neutral element for string concatenation, by the way, is an empty string, which can be attached to either side of a string without changing it.

In Haskell we can define a type class for monoids — a type for which there is a neutral element called mempty and a binary operation called mappend:

class Monoid m where
    mempty  :: m
    mappend :: m -> m -> m

The type signature for a two-argument function, m->m->m, might look strange at first, but it will make perfect sense after we talk about currying. You may interpret a signature with multiple arrows in two basic ways: as a function of multiple arguments, with the rightmost type being the return type; or as a function of one argument (the leftmost one), returning a function. The latter interpretation may be emphasized by adding parentheses (which are redundant, because the arrow is right-associative), as in: m->(m->m). We’ll come back to this interpretation in a moment.

Notice that, in Haskell, there is no way to express the monoidal properties of mempty and mappend (i.e., the fact that mempty is neutral and that mappend is associative). It’s the responsibility of the programmer to make sure they are satisfied.

Haskell classes are not as intrusive as C++ classes. When you’re defining a new type, you don’t have to specify its class up front. You are free to procrastinate and declare a given type to be an instance of some class much later. As an example, let’s declare String to be a monoid by providing the implementation of mempty and mappend (this is, in fact, done for you in the standard Prelude):

instance Monoid String where
    mempty = ""
    mappend = (++)

Here, we have reused the list concatenation operator (++), because a String is just a list of characters.

A word about Haskell syntax: Any infix operator can be turned into a two-argument function by surrounding it with parentheses. Given two strings, you can concatenate them by inserting ++ between them:

"Hello " ++ "world!"

or by passing them as two arguments to the parenthesized (++):

(++) "Hello " "world!"

Notice that arguments to a function are not separated by commas or surrounded by parentheses. (This is probably the hardest thing to get used to when learning Haskell.)

It’s worth emphasizing that Haskell lets you express equality of functions, as in:

mappend = (++)

Conceptually, this is different than expressing the equality of values produced by functions, as in:

mappend s1 s2 = (++) s1 s2

The former translates into equality of morphisms in the category Hask (or Set, if we ignore bottoms, which is the name for never-ending calculations). Such equations are not only more succinct, but can often be generalized to other categories. The latter is called extensional equality, and states the fact that for any two input strings, the outputs of mappend and (++) are the same. Since the values of arguments are sometimes called points (as in: the value of f at point x), this is called point-wise equality. Function equality without specifying the arguments is described as point-free. (Incidentally, point-free equations often involve composition of functions, which is symbolized by a point, so this might be a little confusing to the beginner.)

The closest one can get to declaring a monoid in C++ would be to use the (proposed) syntax for concepts.

template<class T>
  T mempty = delete;

template<class T>
  T mappend(T, T) = delete;

template<class M>
  concept bool Monoid = requires (M m) {
    { mempty<M> } -> M;
    { mappend(m, m) } -> M;
  };

The first definition uses a variable template (a C++14 feature). A polymorphic value is a family of values — a different value for every type.

The keyword delete means that there is no default value defined: It will have to be specified on a case-by-case basis. Similarly, there is no default for mappend.

The concept Monoid is a predicate (hence the bool type) that tests whether there exist appropriate definitions of mempty and mappend for a given type M.

An instantiation of the Monoid concept can be accomplished by providing appropriate specializations and overloads:

template<>
std::string mempty<std::string> = {""};

std::string mappend(std::string s1, std::string s2) {
    return s1 + s2;
}

Monoid as Category

That was the “familiar” definition of the monoid in terms of elements of a set. But as you know, in category theory we try to get away from sets and their elements, and instead talk about objects and morphisms. So let’s change our perspective a bit and think of the application of the binary operator as “moving” or “shifting” things around the set.

For instance, there is the operation of adding 5 to every natural number. It maps 0 to 5, 1 to 6, 2 to 7, and so on. That’s a function defined on the set of natural numbers. That’s good: we have a function and a set. In general, for any number n there is a function of adding n — the “adder” of n.

How do adders compose? The composition of the function that adds 5 with the function that adds 7 is a function that adds 12. So the composition of adders can be made equivalent to the rules of addition. That’s good too: we can replace addition with function composition.

But wait, there’s more: There is also the adder for the neutral element, zero. Adding zero doesn’t move things around, so it’s the identity function in the set of natural numbers.

Instead of giving you the traditional rules of addition, I could as well give you the rules of composing adders, without any loss of information. Notice that the composition of adders is associative, because the composition of functions is associative; and we have the zero adder corresponding to the identity function.

An astute reader might have noticed that the mapping from integers to adders follows from the second interpretation of the type signature of mappend as m->(m->m). It tells us that mappend maps an element of a monoid set to a function acting on that set.
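
In code, this is just partial application (my own illustration):

-- Partially applying (+) turns a number into a function acting on numbers:
adder :: Int -> (Int -> Int)
adder = (+)

-- adder 5 . adder 7 behaves exactly like adder 12,
-- and adder 0 is the identity function.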

Now I want you to forget that you are dealing with the set of natural numbers and just think of it as a single object, a blob with a bunch of morphisms — the adders. A monoid is a single object category. In fact the name monoid comes from Greek mono, which means single. Every monoid can be described as a single object category with a set of morphisms that follow appropriate rules of composition.

String concatenation is an interesting case, because we have a choice of defining right appenders and left appenders (or prependers, if you will). The composition tables of the two models are a mirror reverse of each other. You can easily convince yourself that appending “bar” after “foo” corresponds to prepending “foo” after prepending “bar”.

You might ask the question whether every categorical monoid — a one-object category — defines a unique set-with-binary-operator monoid. It turns out that we can always extract a set from a single-object category. This set is the set of morphisms — the adders in our example. In other words, we have the hom-set M(m, m) of the single object m in the category M. We can easily define a binary operator in this set: The monoidal product of two set-elements is the element corresponding to the composition of the corresponding morphisms. If you give me two elements of M(m, m) corresponding to f and g, their product will correspond to the composition g∘f. The composition always exists, because the source and the target for these morphisms are the same object. And it’s associative by the rules of category. The identity morphism is the neutral element of this product. So we can always recover a set monoid from a category monoid. For all intents and purposes they are one and the same.

Monoid hom-set seen as morphisms and as points in a set

There is just one little nit for mathematicians to pick: morphisms don’t have to form a set. In the world of categories there are things larger than sets. A category in which morphisms between any two objects form a set is called locally small. As promised, I will be mostly ignoring such subtleties, but I thought I should mention them for the record.

A lot of interesting phenomena in category theory have their root in the fact that elements of a hom-set can be seen both as morphisms, which follow the rules of composition, and as points in a set. Here, composition of morphisms in M translates into monoidal product in the set M(m, m).

Acknowledgments

I’d like to thank Andrew Sutton for rewriting my C++ monoid concept code according to his and Bjarne Stroustrup’s latest proposal.

Challenges

  1. Generate a free category from:
    1. A graph with one node and no edges
    2. A graph with one node and one (directed) edge (hint: this edge can be composed with itself)
    3. A graph with two nodes and a single arrow between them
    4. A graph with a single node and 26 arrows marked with the letters of the alphabet: a, b, c … z.
  2. What kind of order is this?
    1. A set of sets with the inclusion relation: A is included in B if every element of A is also an element of B.
    2. C++ types with the following subtyping relation: T1 is a subtype of T2 if a pointer to T1 can be passed to a function that expects a pointer to T2 without triggering a compilation error.
  3. Considering that Bool is a set of two values True and False, show that it forms two (set-theoretical) monoids with respect to, respectively, operator && (AND) and || (OR).
  4. Represent the Bool monoid with the AND operator as a category: List the morphisms and their rules of composition.
  5. Represent addition modulo 3 as a monoid category.

Next: A programming example of pure functions that do logging using Kleisli categories.
