For some time now I’ve been floating the idea of writing a book about category theory that would be targeted at programmers. Mind you, not computer scientists but programmers — engineers rather than scientists. I know this sounds crazy and I am properly scared. I can’t deny that there is a huge gap between science and engineering because I have worked on both sides of the divide. But I’ve always felt a very strong compulsion to explain things. I have tremendous admiration for Richard Feynman who was the master of simple explanations. I know I’m no Feynman, but I will try my best. I’m starting by publishing this preface — which is supposed to motivate the reader to learn category theory — in hopes of starting a discussion and soliciting feedback.

I will attempt, in the space of a few paragraphs, to convince you that this book is written for you, and whatever objections you might have to learning one of the most abstract branches of mathematics in your “copious spare time” are totally unfounded.

My optimism is based on several observations. First, category theory is a treasure trove of extremely useful programming ideas. Haskell programmers have been tapping this resource for a long time, and the ideas are slowly percolating into other languages, but this process is too slow. We need to speed it up.

Second, there are many different kinds of math, and they appeal to different audiences. You might be allergic to calculus or algebra, but it doesn’t mean you won’t enjoy category theory. I would go as far as to argue that category theory is the kind of math that is particularly well suited for the minds of programmers. That’s because category theory — rather than dealing with particulars — deals with structure. It deals with the kind of structure that makes programs composable.

Composition is at the very root of category theory — it’s part of the definition of the category itself. And I will argue strongly that composition is the essence of programming. We’ve been composing things forever, long before some great engineer came up with the idea of a subroutine. Some time ago the principles of structured programming revolutionized programming because they made blocks of code composable. Then came object-oriented programming, which is all about composing objects. Functional programming is not only about composing functions and algebraic data structures — it makes concurrency composable — something that’s virtually impossible with other programming paradigms.

Third, I have a secret weapon, a butcher’s knife, with which I will butcher math to make it more palatable to programmers. When you’re a professional mathematician, you have to be very careful to get all your assumptions straight, qualify every statement properly, and construct all your proofs rigorously. This makes mathematical papers and books extremely hard to read for an outsider. I’m a physicist by training, and in physics we made amazing advances using informal reasoning. Mathematicians laughed at the Dirac delta function, which was made up on the spot by the great physicist P. A. M. Dirac to solve some differential equations. They stopped laughing when they discovered a completely new branch of calculus called distribution theory that formalized Dirac’s insights.

Of course when using hand-waving arguments you run the risk of saying something blatantly wrong, so I will try to make sure that there is solid mathematical theory behind informal arguments in this book. I do have a worn-out copy of Saunders Mac Lane’s Category Theory for the Working Mathematician on my nightstand.

Since this is category theory for programmers I will illustrate all major concepts using computer code. You are probably aware that functional languages are closer to math than the more popular imperative languages. They also offer more abstracting power. So a natural temptation would be to say: You must learn Haskell before the bounty of category theory becomes available to you. But that would imply that category theory has no application outside of functional programming and that’s simply not true. So I will provide a lot of C++ examples. Granted, you’ll have to overcome some ugly syntax, the patterns might not stand out from the background of verbosity, and you might be forced to do some copy and paste in lieu of higher abstraction, but that’s just the lot of a C++ programmer.

But you’re not off the hook as far as Haskell is concerned. You don’t have to become a Haskell programmer, but you need it as a language for sketching and documenting ideas to be implemented in C++. That’s exactly how I got started with Haskell. I found its terse syntax and powerful type system a great help in understanding and implementing C++ templates, data structures, and algorithms. But since I can’t expect the readers to already know Haskell, I will introduce it slowly and explain everything as I go.

If you’re an experienced programmer, you might be asking yourself: I’ve been coding for so long without worrying about category theory or functional methods, so what’s changed? Surely you can’t help but notice that there’s been a steady stream of new functional features invading imperative languages. Even Java, the bastion of object-oriented programming, let the lambdas in. C++ has recently been evolving at a frantic pace — a new standard every few years — trying to catch up with the changing world. All this activity is in preparation for a disruptive change or, as we physicists call it, a phase transition. If you keep heating water, it will eventually start boiling. We are now in the position of a frog that must decide if it should continue swimming in increasingly hot water, or start looking for some alternatives.


One of the forces that are driving the big change is the multicore revolution. The prevailing programming paradigm, object oriented programming, doesn’t buy you anything in the realm of concurrency and parallelism, and instead encourages dangerous and buggy design. Data hiding, the basic premise of object orientation, when combined with sharing and mutation, becomes a recipe for data races. The idea of combining a mutex with the data it protects is nice but, unfortunately, locks don’t compose, and lock hiding makes deadlocks more likely and harder to debug.

But even in the absence of concurrency, the growing complexity of software systems is testing the limits of scalability of the imperative paradigm. To put it simply, side effects are getting out of hand. Granted, functions that have side effects are often convenient and easy to write. Their effects can in principle be encoded in their names and in the comments. A function called SetPassword or WriteFile is obviously mutating some state and generating side effects, and we are used to dealing with that. It’s only when we start composing functions that have side effects on top of other functions that have side effects, and so on, that things start getting hairy. It’s not that side effects are inherently bad — it’s the fact that they are hidden from view that makes them impossible to manage at larger scales. Side effects don’t scale, and imperative programming is all about side effects.

Changes in hardware and the growing complexity of software are forcing us to rethink the foundations of programming. Just like the builders of Europe’s great gothic cathedrals, we’ve been honing our craft to the limits of material and structure. There is an unfinished gothic cathedral in Beauvais, France, that stands witness to this deeply human struggle with limitations. It was intended to beat all previous records of height and lightness, but it suffered a series of collapses. Ad hoc measures like iron rods and wooden supports keep it from disintegrating, but obviously a lot of things went wrong. From a modern perspective, it’s a miracle that so many gothic structures were successfully completed without the help of modern material science, computer modelling, finite element analysis, and general math and physics. I hope future generations will be as admiring of the programming skills we’ve been displaying in building complex operating systems, web servers, and the internet infrastructure. And, frankly, they should, because we’ve done all this based on very flimsy theoretical foundations. We have to fix those foundations if we want to move forward.

Ad hoc measures preventing the Beauvais cathedral from collapsing


Data races lead to undefined behavior; but how bad can they really be? In my previous post I talked about benign data races and I gave several examples taken from the Windows kernel. Those examples worked because the kernel was compiled with a specific compiler for a specific processor. But in general, if you want your code to be portable, you can’t have data races, period.

You just cannot reason about something that is specifically defined to be “undefined.” So, obviously, you cannot prove the correctness of a program that has data races. However, very few people engage in proofs of correctness. In most cases the argument goes, “I can’t see how this can fail, therefore it must be correct” (maybe not in so many words). I call this “proof by lack of imagination.” If you want to become a concurrency expert, you have to constantly stretch your imagination. So let’s do a few stretches.

One of the readers of my previous post, Duarte Nunes, posted a very interesting example of a benign data race. Here’s the code:

#include <mutex>
#include <thread>

// For the sake of the example, thread IDs are treated as plain ints, with 0
// meaning "no owner" (std::this_thread::get_id() actually returns a
// std::thread::id, so read the comparisons below as pseudocode).
int owner = 0;
int count = 0;
std::mutex mtx;

bool TryEnter() {
    if (owner == std::this_thread::get_id()) {  // owner may be read outside the lock: the racy read
        count += 1;                             // only reached when this thread already holds the lock
        return true;
    }

    if (mtx.try_lock()) {
        owner = std::this_thread::get_id();     // under the lock
        return true;
    }
    return false;
}

void Exit() {
    // In a correct program, Exit is called only while holding the lock.
    if (count != 0) {
        count -= 1;
        return;
    }
    owner = 0;
    mtx.unlock();
}

The body of Exit is executed under the lock (in a correct program, Exit will always be called after the lock has been acquired), and so is the increment of count in TryEnter. Notice that the variable count is always accessed under the lock, so no data races there. However, the variable owner may be read outside of the lock, in the first test in TryEnter. That’s the data race we are talking about.

Try to analyze this code and imagine what could go wrong. Notice that the compiler or the processor can’t be too malicious. The code still has to work correctly if the data race is removed, for instance if the racy read is put under the lock.

Here’s an attempt at the “proof” of correctness. First, Duarte observed that “The valid values for the owner variable are zero and the id of any thread in the process.” That sort of makes sense, doesn’t it? Now, the only way the racy read can have any effect is if the value of owner is equal to the current thread’s ID. But that’s exactly the value that could have been written only by the current thread, and under the lock.

There are two possibilities when reading owner: either we are still under that lock, in which case the read is not at all racy; or we have already released the lock. But the release of the lock happens only after the current thread zeroes owner.

Notice that this is a single-thread analysis and, within a single thread, events must be ordered (no need to discuss memory fences or acquire/release semantics at this point). A read following a write in the same thread cannot see the value that was there before the write. That would break regular single-threaded programs. Of course, other threads may have overwritten this zero with their own thread IDs, but never with the current thread ID. Or so the story goes…

Brace yourself: Here’s an example of how a compiler or the processor may “optimize” the program:

void Exit() {
    if (count != 0) {
        count -= 1;
        return;
    }
    owner = 42;   // speculative store invented by the compiler or the processor
    owner = 0;
    mtx.unlock();
}

You might argue that this is insane and no compiler in its right mind would do a thing like this; but the truth is: It’s a legal program transformation. The effect of this modification is definitely not observable in the current thread. Neither is it observable by other threads in the absence of data races. Now, the unfortunate thread whose ID just happens to be 42 might observe this value and take the wrong turn. But it can only observe it through a racy read. For instance, it would never see this value if it read owner under the lock. Moreover, it would never see it if the variable owner were defined as ‘atomic’:

std::atomic<int> owner{0};

Stores and loads of atomic variables are, by default, sequentially consistent. Unfortunately sequential consistency, even on an x86, requires memory fences, which can be quite costly. It would definitely be overkill in our example. So here’s the trick: Tell the compiler to forgo sequential consistency on a per read/write basis. For instance, here’s how you read an atomic variable without imposing any ordering constraints:

owner.load(std::memory_order_relaxed)

Such ‘relaxed’ operations will not introduce any memory fences, at least not on any processor I know about.

Here’s the version of the code that is exactly equivalent to the original, except that it’s correct and portable:

#include <atomic>
#include <mutex>
#include <thread>

// As before, thread IDs are treated as plain ints for the sake of the example.
std::atomic<int> owner{0};
int count = 0;
std::mutex mtx;

bool TryEnter() {
    if (owner.load(std::memory_order_relaxed) == std::this_thread::get_id()) {
        count += 1;
        return true;
    }

    if (mtx.try_lock()) {
        owner.store(std::this_thread::get_id(), std::memory_order_relaxed);
        return true;
    }
    return false;
}

void Exit() {
    if (count != 0) {
        count -= 1;
        return;
    }
    owner.store(0, std::memory_order_relaxed);
    mtx.unlock();
}

So what has changed? Can’t the compiler still do the same dirty trick, and momentarily store 42 in the owner variable? No, it can’t! Since the variable is declared ‘atomic,’ the compiler can no longer assume that the write can’t be observed by other threads.

The new version has no data races: The Standard specifically states that ‘atomic’ variables don’t contribute to data races, even in their most relaxed form.

C++ Standard, (1.10.5):
[...] “Relaxed” atomic operations are not synchronization operations even though, like synchronization operations, they cannot contribute to data races.

With those changes, I believe that our “proof” of correctness may actually be turned into a more rigorous proof using the axioms of the C++ memory model (although I’m not prepared to present one). We can have our cake (correct, portable code) and eat it too (no loss of performance). And, by the way, the same trick may be used in the case of lossy counters from my previous post.
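
Here’s a sketch (mine, not code from that post) of what the trick might look like for such a counter: declare it atomic, but access it with relaxed operations. The counter stays cheap and stays lossy, yet it no longer constitutes a data race:

#include <atomic>

std::atomic<long> hits{0};   // statistics only; losing an occasional increment is acceptable

void CountHit() {
    // Relaxed load and store instead of a non-atomic ++: no data race, hence
    // defined behavior, but increments may still be lost between the load and
    // the store, which is fine for a lossy counter.
    hits.store(hits.load(std::memory_order_relaxed) + 1,
               std::memory_order_relaxed);
}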

Warning: I do not recommend this style of coding, or the use of weak atomics, to anybody who is not implementing operating system kernels or concurrency libraries.

Acknowledgments

I’d like to thank Luis Ceze, Hans Boehm, and Anthony Williams for useful remarks and for verifying my assumptions about the C++ memory model.

Bibliography

  1. C++ Draft Standard

We have this friendly competition going on between Eric Niebler and myself. He writes some clever C++ template code, and I feel the compulsion to explain it to him in functional terms. Then I write a blog about Haskell or category theory and Eric feels a compulsion to translate it into C++.

Eric is now working on his proposal to rewrite the C++ STL in terms of ranges and I keep reinterpreting his work in terms familiar to functional programmers. Eric’s range comprehensions are a result of some of this back and forth.

Lazy ranges are such an excellent example of functional programming that it would be foolish for me to pass this opportunity to dissect them. To any functional programmer worth their salt they just scream “monad!” A monad is a higher order pattern that can be built step by step, so that’s what I’m going to do. I’ll start with the functor pattern, then add some functionality that will make it a pointed functor, then add some more to make it an applicative functor, and finally add some more to make it a monad. This gradual buildup of functionality is reminiscent of building a class hierarchy, and indeed it can be modelled as such in Haskell (although Haskell type classes are slightly different than C++ classes). This hierarchy would look something like this:

  • A monad is-an applicative functor
  • An applicative functor is-a pointed functor
  • A pointed functor is-a functor

So let’s start with a functor.

Functor

I have a pet peeve about the use of the word “functor” in C++. People keep calling function objects functors. It’s like calling Luciano Pavarotti an “operator,” because he sings operas. The word functor has a very precise meaning in mathematics — moreover, it’s the branch of mathematics that’s extremely relevant to programming. So hijacking this term to mean a function-like object causes unnecessary confusion.

A functor in functional programming is a generic template, which allows the “lifting” of functions. Let me explain what it means. A generic template takes an arbitrary type as a template argument. So a range (whether lazy or eager) is a generic template because it can be instantiated for any type. You can have a range of integers, a range of vectors, a range of ranges, and so on. (We’ll come back to ranges of ranges later when we talk about monads.)

The “lifting” of functions means this: Give me any function from some type T to some other type U and I can apply this function to a range of T and produce a range of U. You may recognize this kind of lifting in the STL algorithm std::transform, which can be used to apply a function to a container. STL containers are indeed functors. Unfortunately, their functorial nature is buried under the noise of iterators. In Eric’s range library, the lifting is done much more cleanly using view::transform. Have a look at this example:

 int total = accumulate(view::iota(1) |
                        view::transform([](int x){return x*x;}) |
                        view::take(10), 0);

Here, view::transform takes an anonymous function that squares its argument, and lifts this function to the level of ranges. The range created by view::iota(1) is piped into it from the left, and the resulting range of squares emerges from it on the right. The (infinite) range is then truncated by take-ing the first 10 elements.

The function view::iota(1) is a factory that produces an infinite range of consecutive integers starting from 1. (We’ll come back to range factories later.)

In this form, view::transform plays the role of a higher-order function: one that takes a function and returns a function. It almost reaches the level of terseness and elegance of Haskell, where this example would look something like this:

total = sum $ take 10 $ fmap (\x->x*x) [1..]

(Traditionally, the flow of data in Haskell is from right to left.) The (higher-order) function fmap can be thought of as a “method” of the class Functor that does the lifting in Haskell. In C++ there is no overall functor abstraction, so each functor names its lifting function differently — for ranges, it’s view::transform.
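
To make the parallel concrete, here is a minimal sketch (mine, not part of any library) of what a lifting function for one particular functor, std::vector, might look like; view::transform plays exactly this role for ranges:

#include <algorithm>
#include <iterator>
#include <vector>

// A hypothetical fmap for std::vector: lift a function T -> U
// to a function std::vector<T> -> std::vector<U>.
template <class T, class F>
auto fmap(F f, std::vector<T> const & xs) -> std::vector<decltype(f(xs[0]))> {
    std::vector<decltype(f(xs[0]))> ys;
    ys.reserve(xs.size());
    std::transform(xs.begin(), xs.end(), std::back_inserter(ys), f);
    return ys;
}

// Usage: lift the squaring function over a vector.
// auto squares = fmap([](int x) { return x * x; }, std::vector<int>{1, 2, 3});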

The intuition behind a functor is that it generates a family of objects that somehow encapsulate values of arbitrary types. This encapsulation can be very concrete or very abstract. For instance, a container simply contains values of a given type. A range provides access to values that may be stored in some other container. A lazy range generates values on demand. A future, which is also a functor (or will be, in C++17), describes a value that might not be currently available because it’s being evaluated in a separate thread.

All these objects have one thing in common: they provide means to manipulate the encapsulated values with functions. That’s the only requirement for a functor. It’s not required that a functor provide access to encapsulated values (which may not even exist), although most do. In fact there is a functor (really, a monad), in Haskell, that provides no way of accessing its values other than outputting them to external devices.

Pointed Functor

A pointed functor is a functor with one additional ability: it lets you lift individual values. Give me a value of any type and I will encapsulate it. In Haskell, the encapsulating function is called pure although, as we will see later, in the context of a monad it’s called return.

All containers are pointed, because you can always create a singleton container — one that contains only one value. Ranges are more interesting. You can obviously create a range from a singleton container. But you can also create a lazy range from a value using a (generic) function called view::single, which doesn’t have a backing container behind it.

There is, however, an alternative way of lifting a value to a range, and that is by repeating it indefinitely. The function that creates such infinite (lazy) ranges is called view::repeat. For instance, view::repeat(1) creates an infinite series of ones. You might be wondering what use there could be for such a repetitive range. Not much, unless you combine it with other ranges. In general, pointed functors are not very interesting other than as stepping stones towards applicative functors and monads. Here’s a quick illustration of both liftings, and then let’s move on.
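
A sketch, using the view names from Eric’s library as they appear in this post:

// Lifting individual values into lazy ranges:
auto one  = view::single(42);                  // the one-element range {42}
auto ones = view::repeat(1) | view::take(5);   // {1, 1, 1, 1, 1}, truncated so we can look at it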

Applicative Functor

An applicative functor is a pointed functor with one additional ability: it lets you lift multi-argument functions. We already know how to lift a single-argument function using fmap (or transform, or whatever it’s called for a particular functor).

With multi-argument functions acting on ranges we have two different options corresponding to the two choices for pure I mentioned before: view::single and view::repeat.

The idea, taken from functional languages, is to consider what happens when you provide the first argument to a function of multiple arguments (it’s called partial application). You don’t get back a value. Instead you get something that expects one or more remaining arguments. A thing that expects arguments is called a function (or a function object), so you get back a function of one fewer arguments. In C++ you can’t just call a function with fewer arguments than expected, or you get a compilation error, but there is a (higher-order) function in the Standard Library called std::bind that implements partial application.

This kind of transformation from a function of multiple arguments to a function of one argument that returns a function is called currying.
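
Here’s a small sketch of both ideas in plain C++ (using addition so as not to spoil the make_pair example below):

#include <functional>

// Uncurried: a function of two arguments.
auto add = [](int a, int b) { return a + b; };

// Curried: a function of one argument returning a function of one argument.
auto add_curried = [](int a) {
    return [a](int b) { return a + b; };
};

auto add5 = add_curried(5);   // partial application by currying
int x = add5(3);              // 8

// std::bind achieves partial application without manual currying.
auto add5b = std::bind(add, 5, std::placeholders::_1);
int y = add5b(3);             // 8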

Let’s consider a simple example. We want to apply std::make_pair to two ranges: view::ints(10, 11) and view::ints(1, 3). To this end, let’s replace std::make_pair with the corresponding curried function of one argument returning a function of one argument:

[](int i) { return [i](int j) { return std::make_pair(i, j); };}

First, we want to apply this function to the first range. We know how to apply a function to a range: we use view::transform.

auto partial_app = view::ints(10, 11)
                 | view::transform([](int i) {
                       return [i](int j) { return std::make_pair(i, j); };
                   });

What’s the result of this application? Can you guess? Our curried function will be applied to each integer in the range, returning a function that pairs that integer with its argument. So we end up with a range of functions of the form:

[i](int j) { return std::make_pair(i, j); }

So far so good — we have just used the functorial property of the range. But now we have to decide how to apply a range of functions to the second range of values. And that’s the essence of the definition of an applicative functor. In Haskell the operation of applying encapsulated functions to encapsulated arguments is denoted by an infix operator <*>.

With ranges, there are two basic strategies:

  1. We can enumerate all possible combinations — in other words create the cartesian product of the range of functions with the range of values — or
  2. Match corresponding functions to corresponding values — in other words, “zip” the two ranges.

The former, when applied to view::ints(1, 3), will yield:

{(10,1),(10,2),(10,3),(11,1),(11,2),(11,3)}

and the latter will yield:

{(10, 1),(11, 2)}

(when the ranges are not of equal length, zipping stops when the shorter one is exhausted).

To see that those two choices correspond to the two choices for pure, we have to look at some consistency conditions. One of them is that if you encapsulate a single-argument function in a range using pure and then apply it to a range of arguments, you should get the same result as if you simply fmapped this function directly over the range of arguments. For the record, I’ll write it here in Haskell:

pure f <*> xs == fmap f xs

This is sort of an obvious requirement: You have two different ways of applying a single-argument function to a range, they better give the same result.

Let’s try it with the view::single version of pure. When acting on a function, it will create a one-element range containing this function. The “all possible combinations” application will just apply this function to all elements of the argument range, which is exactly what view::transform would do.

Conversely, if we apply view::repeat to the function, we’ll get an infinite range that repeats this function at every position. We have to zip this range with the range of arguments in order to get the same result as view::transform. So this implementation of pure works with the zippy applicative. Notice that if the argument range is finite the result will also be finite. But this kind of application will also work with infinite ranges thanks to laziness.
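
Here’s a sketch of that check written with ranges (a small stand-in apply function plays the role of Haskell’s <*>; the names and the view::ints bounds follow this post’s conventions):

// "pure f <*> xs", with pure == view::repeat and zippy application:
int (*square)(int) = [](int x) { return x * x; };
auto apply = [](int (*f)(int), int x) { return f(x); };

auto lhs = view::zip_with(apply, view::repeat(square), view::ints(1, 5));

// ...should describe the same range as "fmap f xs":
auto rhs = view::ints(1, 5) | view::transform(square);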

So there are two legitimate implementations of the applicative functor for ranges. One uses view::single to lift values and uses the all possible combinations strategy to apply a range of functions to a range of arguments. The other uses view::repeat to lift values and the zipping application for ranges of functions and arguments. They are both acceptable and have their uses.

Now let’s go back to our original problem of applying a function of multiple arguments to a bunch of ranges. Since we are not doing it in Haskell, currying is not really a practical option.

As it turns out, the second version of applicative has been implemented by Eric as a (higher-order) function view::zip_with. This function takes a multi-argument callable object as its first argument, followed by a variadic list of ranges.
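
Using it for our pairing example would look something like this (a sketch; the lambda just uncurries std::make_pair):

auto pairs = view::zip_with(
    [](int i, int j) { return std::make_pair(i, j); },
    view::ints(10, 11), view::ints(1, 3));
// yields {(10, 1), (11, 2)}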

There is no corresponding implementation for the combinatorial applicative. I think the most natural interface would be an overload of view::transform (or maybe view::fmap) with the same signature as zip_with. Our example would then look like this:

view::transform(std::make_pair, view::ints(10, 11), view::ints(1, 3));

The need for this kind of interface is not as acute because, as we’ll learn next, the combinatorial applicative is supplanted by a more general monadic interface.

Monad

Monads are applicative functors with one additional functionality. There are two equivalent ways of describing this functionality. But let me first explain why this functionality is needed.

The range library comes with a bunch of range factories, such as view::iota, view::ints, or view::repeat. It’s also very likely that users of the library will want to create their own range factories. The problem is: How do you compose existing range factories to obtain new range factories?

Let me give you an example that generated a blog post exchange between me and Eric. The challenge was to generate a list of Pythagorean triples. The idea is to take a cross product of three infinite ranges of integers and select those triples that satisfy the equation x² + y² = z². The cross product of ranges is reminiscent of the “all possible combinations” applicative, and indeed that’s the applicative that can be extended to a monad (the zippy one can’t).

To make this algorithm feasible, we have to organize these ranges so we can (lazily) traverse them. Let’s start with a factory that produces all integers from 1 to infinity. That’s the view::ints(1) factory. Then, for each z produced by that factory, let’s create another factory view::ints(1, z). This range will provide our xs — and it makes no sense to try xs that are bigger than zs. These values, in turn, will be used in the creation of the third factory, view::ints(x, z) that will generate our ys. At the end we’ll filter out the triples that don’t satisfy the Pythagorean equation.

Notice how we are feeding the output of one range factory into another range factory. Do you see the problem? We can’t just iterate over an infinite range. We need a way to glue the output side of one range factory to the input side of another range factory without unpacking the range. And that’s what monads are for.

Remember also that there are functors that provide no way of extracting values, or for which extraction is expensive or blocking (as is the case with futures). Being able to compose those kinds of functor factories is often essential, and again, the monad is the answer.

Now let’s pinpoint the type of functionality that would allow us to glue range factories end-to-end. Since ranges are functorial, we can use view::transform to apply a factory to a range. After all a factory is just a function. The only problem is that the result of such application is a range of ranges. So, really, all that’s needed is a lazy way of flattening nested ranges. And that’s exactly what Eric’s view::flatten does.

With this new flattening power at our disposal, here’s a possible beginning of the solution to the Pythagorean triple problem:

view::ints(1) | view::transform([](int z) {
                    return view::ints(1, z) | ... ;
                }) | view::flatten

However, this combination of view::transform and view::flatten is so useful that it deserves its own function. In Haskell, this function is called “bind” and is written as an infix operator >>=. (And, while we’re at it, flatten is called join.)
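
In other words, the monadic bind for ranges could be sketched like this (mbind is my name here, not the library’s):

// Lift the range-returning function f with transform, then flatten the
// resulting range of ranges.
auto mbind = [](auto rng, auto f) {
    return rng | view::transform(f) | view::flatten;
};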

And guess what the combination of view::transform and view::flatten is called in the range library. This discovery struck me as I was looking at one of Eric’s examples. It’s called view::for_each. Here’s the solution to the Pythagorean triple problem using view::for_each to bind range factories:

auto triples =
  for_each(ints(1), [](int z) {
    return for_each(ints(1, z), [=](int x) {
      return for_each(ints(x, z), [=](int y) {
        return yield_if(x*x + y*y == z*z, std::make_tuple(x, y, z));
      });
    });
  });

And here’s the same code in Haskell:

triples = 
  (>>=) [1..] $ \z -> 
     (>>=) [1..z] $ \x -> 
        (>>=) [x..z] $ \y -> 
           guard (x^2 + y^2 == z^2) >> return (x, y, z)

I have purposefully re-formatted Haskell code to match C++ (A more realistic rendition of it is in my post Getting Lazy with C++). Bind operators >>= are normally used in infix position but here I wanted to match them against for_each. Haskell’s return is the same as view::single, which Eric renamed to yield inside for_each. In this particular case, yield is conditional, which in Haskell is expressed using guard. The syntax for lambdas is also different, but otherwise the code matches almost perfectly.

This is an amazing and somewhat unexpected convergence. In our Twitter exchange, Eric sheepishly called his for_each code imperative. We are used to thinking of for_each as synonymous with looping, which is such an iconic imperative construct. But here, for_each is the monadic bind — the epitome of functional programming. This puppy is purely functional. It’s an expression that returns a value (a range) and has no side effects.

But what about those loops that do have side effects and don’t return a value? In Haskell, side effects are always encapsulated using monads. The equivalent of a for_each loop with side effects would return a monadic object. What we consider side effects would be encapsulated in that object. It’s not the loop that performs side effects, it’s that object. It’s an executable object. In the simplest case, this object contains a function that may be called with the state that is to be modified. For side effects that involve the external world, there is a special monad called the IO monad. You can produce IO objects, you can compose them using monadic bind, but you can’t execute them. Instead you return one such object that combines all the IO of your program from main and let the runtime execute it. (At least that’s the theory.)
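
To get a feel for it in C++ terms, here is a toy sketch (entirely mine, not any real library) of a side effect packaged as a value: nothing happens until somebody runs it, and composing two of them just builds a bigger description:

#include <functional>
#include <iostream>

// A toy "IO action": a description of a side effect, not its execution.
struct Action {
    std::function<void()> run;
};

// Composing actions yields another action; still nothing has executed.
Action andThen(Action first, Action second) {
    return Action{[=] { first.run(); second.run(); }};
}

int main() {
    Action hello{[] { std::cout << "Hello, "; }};
    Action world{[] { std::cout << "world\n"; }};
    Action program = andThen(hello, world);
    program.run();   // only here do the side effects happen
}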

Is this in any way relevant to an imperative programmer? After all, in C++ you can perform side effects anywhere in your code. The problem is that there are some parts of your code where side effects can kill you. In concurrent programs uncontrolled side effects lead to data races. In Software Transactional Memory (STM, which at some point may become part of C++) side effects may be re-run multiple times when a transaction is retried. There is an urgent need to control side effects and to separate pure functions from their impure brethren. Encapsulating side effects inside monads could be the ticket to extend the usefulness of pure functions inside an imperative language like C++.

To summarize: A monad is an applicative functor with an additional ability, which can be expressed either as a way of flattening a doubly encapsulated object, or as a way of applying a functor factory to an encapsulated object.

In the range library, the first method is implemented through view::flatten, and the second through view::for_each. Being an applicative functor means that a range can be manipulated using view::transform and that any value may be encapsulated using view::single or, inside for_each, using yield.

The ability to apply a range of functions to a range of arguments that is characteristic of an applicative functor falls out of the monadic functionality. For instance, the example from the previous section can be rewritten as:

for_each(ints(10, 11), [](int i) {
  return for_each(ints(1, 3), [i](int j) {
    return yield(std::make_pair(i, j));
  });
});

The Mess We’re In

I don’t think the ideas I presented here are particularly difficult. What might be confusing though is the many names that are used to describe the same thing. There is a tendency in imperative (and some functional) languages to come up with cute names for identical patterns specialized to different applications. It is also believed that programmers would be scared by terms taken from mathematics. Personally, I think that’s silly. A monad by any other name would smell as sweet, but we wouldn’t be able to communicate about them as easily. Here’s a sampling of various names used in relation to concepts I talked about:

  1. Functor: fmap, transform, Select (LINQ)
  2. Pointed functor: pure, return, single, repeat, make_ready_future, yield, await
  3. Applicative functor: <*>, zip_with
  4. Monad: >>=, bind, mbind, for_each, next, then, SelectMany (LINQ)

Part of the problem is the lack of expressive power in C++ to unite such diverse phenomena as ranges and futures. Unfortunately, the absence of unifying ideas adds to the already overwhelming complexity of the language and its libraries. The functional paradigm could be a force capable of building connections between seemingly distant application areas.

Acknowledgments

I’m grateful to Eric Niebler for reviewing the draft of this blog and correcting several mistakes. The remaining mistakes are all mine.


Can a data race not be a bug? In the strictest sense I would say it’s always a bug. A correct program written in a high-level language should run the same way on every processor present, past, and future. But there is no proscription, or even a convention, about what a processor should (or shouldn’t) do when it encounters a race. This is usually described in higher-level language specs by the ominous phrase: “undefined behavior.” A data race could legitimately reprogram your BIOS, wipe out your disk, and stop the processor’s fan causing a multi-core meltdown.

Data race: Multiple threads accessing the same memory location without intervening synchronization, with at least one thread performing a write.

However, if your program is only designed to run on a particular family of processors, say the x86, you might allow certain types of data races for the sake of performance. And as your program matures, i.e., goes through many cycles of testing and debugging, the proportion of buggy races to benign races keeps decreasing. This becomes a real problem if you are using a data-race detection tool that cannot distinguish between the two. You get swamped by false positives.

Microsoft Research encountered and dealt with this problem when running their race detector called DataCollider on the Windows kernel (see Bibliography). Their program found 25 actual bugs, and almost an order of magnitude more benign data races. I’ll summarize their methodology and discuss their findings about benign data races.

Data Races in the Windows Kernel

The idea of the program is very simple. Put a hardware breakpoint on a shared memory access and wait for one of the threads to stumble upon it. This is a code breakpoint, which is triggered when the particular code location is executed. The x86 also supports another kind of a breakpoint, which is called a data breakpoint, triggered when the program accesses a specific memory location. So when a thread hits the code breakpoint, DataCollider installs a data breakpoint at the location the thread was just accessing. It then stalls the current thread and lets all other threads run. If any one of them hits the data breakpoint, it’s a race (as long as one of the accesses is a write). Consider this: If there was any synchronization (say, a lock acquisition was attempted) between the two accesses, the second thread would have been blocked from accessing that location. Since it wasn’t, it’s a classic data race.

Notice that this method might not catch all data races, but it doesn’t produce false positives. Except, of course, when the race is considered benign.

There are other interesting details of the algorithm. One is the choice of code locations for installing breakpoints. DataCollider first analyzes the program’s assembly code to create a pool of memory accesses. It discards all thread-local accesses and explicitly synchronized instructions (for instance, the ones with the LOCK prefix). It then randomly picks locations for breakpoints from this pool. Notice that rarely executed paths are as likely to be sampled as the frequently executed ones. This is important because data races, like fugitive criminals, often hide in less frequented places.

Pruning Benign Races

90% of data races caught by DataCollider in the Windows kernel were benign. For several reasons it’s hard to say how general this result is. First, the kernel had already been tested and debugged for some time, so many low-hanging concurrency bugs had already been picked. Second, operating system kernels are highly optimized for a particular processor and might use all kinds of tricks to improve performance. Finally, kernels often use unusual synchronization strategies. Still, it’s interesting to see what shape benign data races take.

It turns out that half of all false positives came from lossy counters. There are many places where statistics are gathered: counting certain kinds of events, either for reporting or for performance enhancements. In those situations losing a few increments is of no relevance. However, not all counters are lossy and, for instance, a data race in reference counting is a serious bug. DataCollider uses a simple heuristic to detect lossy counters: they are the ones that are always incremented. A reference counter, on the other hand, is as often incremented as decremented.

Another benign race happens when one thread reads a particular bit in a bitfield while another thread updates another bit. A bit update is a read-modify-write (RMW) sequence: The thread reads the previous value of the bitfield, modifies one bit, and writes the whole bitfield back. Other bits are overwritten in the process too, but their new values are the same as the old values. A read from another thread of any of the non-changed bits does not interfere with the write, at least not on the x86. Of course if yet another thread modified one of those bits, it would be a real bug, and it would be caught separately. The pruning of this type of race requires analysis of surrounding code (looking for the masking of other bits).
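
Here’s a small sketch (mine, for illustration) of the situation:

struct Flags {
    unsigned dirty  : 1;
    unsigned pinned : 1;
};
Flags flags = {};

// Thread A: setting one bit is a read-modify-write of the whole word.
void MarkDirty() { flags.dirty = 1; }

// Thread B: reads a logically different bit, but from the same word that
// thread A rewrites, so by the C++ definition this is still a data race.
bool IsPinned() { return flags.pinned != 0; }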

The Windows kernel also has some special variables that are racy by design; current time is one such example. DataCollider has these locations hard-coded and automatically prunes them away.

There are benign races that are hard to prune automatically, and those are left for manual pruning (in fact, DataCollider reports all races, it just de-emphasizes the ones it considers benign). One of them is the double-checked locking pattern (DCLP), where a thread makes a non-synchronized read to be later re-confirmed under the lock. This pattern happens to work on the x86, although it definitely isn’t portable.

Finally, there is the interesting case of idempotent writes: two racing writes that happen to write the same value to the same location. Even though such scenarios are easy to prune, the implementers of DataCollider decided not to prune them because more often than not they led to the uncovering of concurrency bugs. Below is a table that summarizes the various cases.

Benign race                        Differentiate from                  Pruned?
Lossy counter                      Reference counting                  Yes
Read and write of different bits   Read and write of the whole word    Yes
Deliberately racy variables                                            Yes
DCLP                                                                   No
Idempotent writes                                                      No

Conclusion

In the ideal world there would be no data races. But a concurrency bug detector must take into account the existence of benign data races. In the early stages of product testing the majority of detected races are real bugs. It’s only when chasing the most elusive of concurrency bugs that it becomes important to weed out benign races. But it’s the elusive ones that bite the hardest.

Bibliography

  1. John Erickson, Madanlal Musuvathi, Sebastian Burckhardt, Kirk Olynyk, Effective Data-Race Detection for the Kernel

If you’re like me, you might be worried that the world is being taken over by monoids. Binary operators that are associative and respect unity are popping up everywhere. Even our beloved monads are being dismissively called “just” monoids in the category of endofunctors. Personally, I have nothing against associativity, but when it’s not there, it’s not there. Did you know, for instance, that addition is non-associative? Of course, I’m talking about floating-point addition. Taking averages is non-associative. Lie algebras are non-associative. At a recent FARM (Functional Art, Music, Modelling and Design) Workshop, Paul Hudak spoke of a non-associative way of tiling temporal media (the %\ — percent backslash — operator).
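
If you doubt the claim about floating-point addition, here’s a one-line check with IEEE 754 doubles:

#include <cstdio>

int main() {
    double a = 0.1, b = 0.2, c = 0.3;
    // With IEEE 754 doubles, (a + b) + c == 0.6000000000000001
    // while a + (b + c) == 0.6, so this prints "no".
    std::printf("%s\n", (a + b) + c == a + (b + c) ? "yes" : "no");
}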

In mathematics, there exists a simpler notion of magma, which is a structure with a binary operator with no additional conditions imposed on it. So a monoid is really “just” a magma with some additional structure. Take that, monoid!

There was a recent interesting post about free magmas and their relation to binary trees, which shows that magmas might be relevant outside of geology. All this encouraged me to revisit some of the areas traditionally associated with monoids and give them another look-see. If nothing else, it’s an opportunity to have a little refresher in monoidal and enriched categories. Hopefully, this will also help in understanding Edward Kmett’s latest experiments with encoding category theory in Hask.

Magmatic Category

Let’s start by slimming down the usual definition of a monoidal category: A magmatic category is a category 𝒱0 with a functor ⨁ : 𝒱0 ⨯ 𝒱0 → 𝒱0 from the cartesian product of 𝒱0 with itself to 𝒱0. This is our “attach” operator that works on objects and can be lifted to morphisms.

That’s it! A monoidal category would also have the unit object I, the associator, the left unitor, the right unitor, and two coherence axioms. But if magma is all we need, we’re done.

In what follows I’m not going to be interested in associativity until much later, but I will consider including the unit object I (a magma with a unit is called a unital magma). What defines the unit object are two natural (in X) isomorphisms: l_X: I ⨁ X → X and r_X: X ⨁ I → X, called, respectively, the left and the right unitors. Still, since there is no associativity, there are none of the coherence axioms one has to worry about in a monoidal category.

Having a unit object allows us to define an interesting functor that maps any object X to the hom-set 𝒱0(I, X) — the set of morphisms from I to X (obviously, this is a set only if 𝒱0 is locally small). We’ll call this functor V, to make sure you pay attention to fonts, and not just letters. Why is it interesting? Because it’s as close as we can get, in category theory, to defining an element of an object. Remember: in category theory you don’t look inside objects — all you have at your disposal are morphisms between amorphous objects.

So how would you define an element of, let’s say, a set, without looking inside it? You’d define it as a function (morphism) from a one-element set. Such a morphism picks a single element in the target set. But what’s a one-element set (remember: we can’t look inside sets)? Well, it’s a terminal object in the category of sets. Again, a terminal object is defined entirely in terms of morphisms. It’s the object that has a unique morphism coming to it from any other object. (Exercise: Show that a one element set is terminal among sets.)

So an element of a set can be defined as a morphism from the terminal set. Nothing stops us from extending this definition of an “element” to an arbitrary category that has a terminal object. But what does a terminal object have to do with a unit object in the monoidal (or unital-magmatic) category? In the archetypal monoidal category — the cartesian monoidal category — the terminal object is the unit. The binary operator is the product, defined by a universal construction. That’s why we can call a morphism f from the hom-set 𝒱0(I, X) an element of X. In this language, V maps X into a set of elements of X — the underlying set.

Exponential Objects and Currying

The usual universal construction of an exponential object Y^X can be easily reproduced in a magmatic category. We just replace the cartesian product in that construction with the “attach” operator we have at our disposal.

Here’s how you do it: Start with a pair of objects, X and Y. Consider all pairs <object Z, morphism g> in which g is a morphism from Z ⨁ X to Y. We are looking for one of these pairs, call it <Y^X, app>, such that any other pair “factorizes” through it (Y^X is just an object with a funny symbol). It means that, for each <Z, g> there is a unique morphism λ_g : Z → Y^X, such that:

g = app ∘ (λ_g ⨁ id)

Notice that we are using the fact that ⨁ is a (bi-)functor, which means it can be applied to morphisms as well as objects. Here it’s applied to λ_g and id (the identity morphism on X).

The universal construction of an exponential object

There is an interesting class of categories in which there is an exponential object for every pair of objects. Or even better, an exponential object that is defined by an adjunction. An adjunction relates two functors. In this case, one functor is defined by the action of the binary operator: Z → Z ⨁ X. The exponentiation functor Y → Y^X is then its right adjoint. The adjunction is a natural bijection of hom-sets:

λ: 𝒱0(Z ⨁ X, Y) ≅ 𝒱0(Z, Y^X)

This equation is just the re-statement of the condition that for every g : Z ⨁ X → Y there is a unique λ_g : Z → Y^X, except that we now require this mapping to be natural in both Y and Z.

Here’s the interesting part: This definition can be viewed as generalized currying. Normal currying establishes the relationship between functions of two arguments and functions returning functions. A function of two arguments can be thought of as a function defined on a cartesian product of those arguments. Here, we are replacing the cartesian product with our magmatic operation, and the adjunction tells us that for every morphism g from Z ⨁ X to Y there is a morphism λ_g from Z to the exponential object Y^X. The currying intuition is that Y^X objectifies morphisms from X to Y; it objectifies the hom-set 𝒱0(X, Y) inside 𝒱0. In fact Y^X is often called the internal hom-set (in which case 𝒱0(X, Y) is called the external hom-set).

This intuition is reinforced by the observation that, if we replace Z with I, the magmatic unit, in the adjunction, we get:

𝒱0(I ⨁ X, Y) ≅ 𝒱0(I, Y^X)

I ⨁ X is naturally isomorphic to X (through the left unitor), so we get:

𝒱0(X, Y) ≅ 𝒱0(I, Y^X)

Now recall what we have said about morphisms from the unit: they define elements of an object through the functor we called V. The equation:

𝒱0(X, Y) ≅ V Y^X

tells us that elements of the (external) hom-set 𝒱0(X, Y) are in one-to-one correspondence with the “elements” of the exponential Y^X. This is not exactly the isomorphism of external and internal hom-sets, because of the intervening functor V, which might not even be faithful. However, this is exactly one of the conditions that define a closed category, hinting at the possibility that the adjunction defining the exponential object in a unital magmatic category might be enough to make it closed.

All this was done without any mention of associativity. So what is associativity good for? Here’s an example. Let’s replace X with (A ⨁ B) in the adjunction:

𝒱0(Z ⨁ (A ⨁ B), Y) ≅ 𝒱0(Z, Y^(A ⨁ B))

If we have associativity, the above is naturally isomorphic to:

𝒱0((Z ⨁ A) ⨁ B, Y) ≅ 𝒱0(Z ⨁ A, Y^B) ≅ 𝒱0(Z, (Y^B)^A)

Putting it all together and replacing Z with I, we get the natural isomorphism between “elements” of two objects:

V Y^(A ⨁ B) ≅ V (Y^B)^A

This looks just like currying inside the internal hom-set, except for the functor V. So to curry from the external to the internal hom-set it’s enough to have a unital magmatic category, but to curry within the internal hom-set we need the full monoidal category.

A word about notation: Traditionally, the exponential object from X to Y is denoted as Y^X. That’s the convention I’ve been following so far. Arguably, this notation is somewhat awkward. In his book, Basic Concepts of Enriched Category Theory, Kelly uses a more convenient notation: [X, Y]. In his notation, the above equation takes a slightly more readable form:

V [(A ⨁ B), Y] ≅ V [A, [B, Y]]

Magmatically Enriched Category

A (locally small) category consists of objects and sets of morphisms, hom-sets, between every pair of objects. All we need to assume about morphisms is that they compose nicely and that there exist morphisms that act as units of composition. Think of it this way: Each hom-set is an object from the Set category. Composition is a mapping from a cartesian product of two such objects to a third object.

But why restrict ourselves to Set to draw our hom-sets from? Why not pick an arbitrary category? The only question is: How should we define composition of morphisms? Morphisms are elements of hom-sets. In an arbitrary category we might not be able to define elements of objects. What we could do, however, is to pick a category that has all the products and define composition as a mapping (morphism) from a product of two objects to a third object. This is analogous to defining composition in terms of cartesian products of hom-sets, rather than defining it point-wise on individual morphisms.

But once we made a bold decision to abandon sets, why restrict ourselves to cartesian monoidal categories (the ones with a product defined by a universal construction)? All we really need is some notion of a binary operation, and that’s given by a magmatic category. So let’s define a magmatically enriched category 𝒜 as a collection of objects together with an additional magmatic category (𝒱0, ⨁) from which we pick our hom-objects (no longer called hom-sets). For any two objects A and B in the category 𝒜, a hom-object 𝒜(A, B) is an object of 𝒱0. Composition is defined for any triple of objects A, B, and C. It’s a morphism between objects in 𝒱0:

M: 𝒜(B, C) ⨁ 𝒜(A, B) → 𝒜(A, C)

Of course I stole this definition from a monoidally enriched category. Another part of it, which might be useful in the future, is the definition of a unit element. In a normal category, a unit morphism id_A is an element of the hom-set 𝒜(A, A) of morphisms that go from A back to A. In an enriched category, the only way to define an element of a hom-object is through a mapping from a unit object I, so here’s the generalization of the identity morphism:

j_A: I → 𝒜(A, A)

We have to make sure that there is appropriate interplay between composition, left and right unitors, and the unit element. These are expressed through commutative diagrams:

Commuting diagrams relating composition M, unitors l and r, and the unit element I

As before, since we don’t have to worry about associativity, we can omit the rest of the axioms.

Functors and Natural Transformations

A functor maps objects to objects and morphisms to morphisms. The mapping of objects translates directly to enriched settings, but we don’t want to deal with morphisms individually. We don’t want to look inside hom objects. Fortunately, we can define the action of a functor on a hom-object as a whole. The mappings of hom-objects are just morphisms in 𝒱0.

To define a functor T from an enriched category 𝒜 to ℬ, we have to define its action on any  hom-object 𝒜(A, B):

T_AB : 𝒜(A, B) → ℬ(T A, T B)

Notice that both categories must be enriched over the same 𝒱0 for T_AB to be a morphism.

A natural transformation is trickier. Normally it is defined as a family of morphisms parameterized by objects. In the enriched setting, we have a slight problem: For a given object A we have to pick a single morphism from the hom-object ℬ(T A, S A). Here T and S are functors between two enriched categories 𝒜 and ℬ. Luckily, we have this trick of replacing element selection with a mapping from the unit object. We’ll use it to define the components of a natural transformation:

α_A: I → ℬ(T A, S A).

The fact that the existence of the unit is necessary to define natural transformations in enriched categories is a little surprising. It also means that adjointness, which is defined through natural isomorphism between hom-objects, cannot be defined without a unit.

The generalization of the naturality condition is also tricky. Normally we just pick a morphism f : A → B in the category 𝒜 and lift it using two functors, T and S, to the target category ℬ. We get two morphisms,

T f : T A → T B

S f : S A → S B

We can then compose them with two components of the natural transformation,

α_A : T A → S A

α_B : T B → S B

and demand that the resulting diagram commute:

α_B ∘ T f = S f ∘ α_A

In an enriched category we can’t easily pick individual morphisms, so we have to operate on whole hom-objects. Here’s what we have at our disposal. The components of the natural transformation at A and B:

α_B: I → ℬ(T B, S B)

α_A: I → ℬ(T A, S A)

The action of the two functors on the hom-object connecting A to B:

T_AB : 𝒜(A, B) → ℬ(T A, T B)

S_AB : 𝒜(A, B) → ℬ(S A, S B)

Also, the composition of morphisms is replaced with the action of our binary operator on hom-objects. If you look at the hom-sets involved in the standard naturality condition, their composition corresponds to these two mappings under M:

ℬ(T B, S B) ⨁ ℬ(T A, T B) → ℬ(T A, S B)

ℬ(S A, S B) ⨁ ℬ(T A, S A) → ℬ(T A, S B)

The right hand sides are identical, the left hand sides can be obtained by either acting with α on the unit object, or by lifting the hom-object 𝒜(A, B) using S or T. There aren’t that many choices in this jigsaw puzzle. In particular, we get:

α ⨁ T : I ⨁ 𝒜(A, B) → ℬ(T B, S B) ⨁ ℬ(T A, T B)

S ⨁ α : 𝒜(A, B) ⨁ I → ℬ(S A, S B) ⨁ ℬ(T A, S A)

Finally, both of the left hand sides can be obtained from 𝒜(A, B) by applying the inverse of, respectively, l and r — the left and right unitors (remember, they are isomorphisms, so they are invertible). That gives us the naturality hexagon:

Naturality condition in enriched categories

Representable functors and the Yoneda lemma

On the surface, the enriched version of the Yoneda lemma looks deceptively simple. But there are some interesting nuances that should be mentioned. So let me recap what Yoneda is in the more traditional setting (see my blog post about it). It’s about functors from an arbitrary category to a much simpler category of sets — a category where we can actually look at elements, and where morphisms are familiar functions. In particular we are practically given for free those very convenient representable functors. These are the functors that map objects to hom-sets. You just fix one object, say K, in the category 𝒜, and for any other object X you get the set 𝒜(K, X) in Set.

This mapping of X to 𝒜(K, X) is called a representable functor because, in a way, it represents the object X as a set. It is a functor because it’s also defined on morphisms. Indeed, take a morphism f: X → Y. Its image must be a function from 𝒜(K, X) to 𝒜(K, Y). So pick an h in 𝒜(K, X) and map it to the composition f∘h, which is indeed in 𝒜(K, Y).

Now take any functor F from 𝒜 to Set, not necessarily representable. Yoneda tells us that natural transformations from any representable functor 𝒜(K, _) to F are in one to one correspondence with the elements of F K. In other words, there is a bijection:

Nat(𝒜(K, _), F) ≅ F K

which, additionally, is natural in both K and F (naturality in F requires you to look at F as an object in the category of functors, where morphisms are natural transformations).
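
For Haskell-minded readers, here is a minimal sketch of the two directions of this bijection, with the representable functor 𝒜(K, _) played by the function type ((->) k) and an arbitrary Functor f standing in for F (the names toYoneda and fromYoneda are mine, not standard library functions):

{-# LANGUAGE RankNTypes #-}

-- A natural transformation from the representable functor ((->) k) to f
-- corresponds to a value of type f k, and vice versa:
toYoneda :: Functor f => f k -> (forall x. (k -> x) -> f x)
toYoneda fk = \h -> fmap h fk

fromYoneda :: (forall x. (k -> x) -> f x) -> f k
fromYoneda alpha = alpha id

Going one way we apply fmap; going the other way we probe the transformation with the identity on k.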

Enriched representable functors

So how do we go about generalizing Yoneda to an enriched category? We start with a representable functor, except that now 𝒜(K, X) is not a hom-set but a hom-object in 𝒱0. The mapping from the enriched category 𝒜 to 𝒱0 cannot be treated as an enriched functor, because enriched functors are only defined between categories enriched over the same base category, and 𝒱0 is not enriched. So the trick is to enrich 𝒱0 over itself.

What does it mean? We want to keep the objects of 𝒱0 but somehow replace the hom-sets with objects from — again — 𝒱0. Now, if 𝒱0 is closed, it contains objects that are perfect for this purpose: the exponential objects. We’ll call the category resulting from enriching 𝒱0 over 𝒱0 simply 𝒱, hoping not to cause too much confusion. So we define a hom-object 𝒱(X, Y) as [X, Y] (the new notation for the exponential Y^X). We can apply our magmatic binary operator to two such hom-objects and map the resulting object to another hom-object:

M: [Y, Z] ⨁ [X, Y] → [X, Z]

This defines composition of hom-objects for us. The mapping is induced by the adjunction that defines the exponential object, and in fact requires associativity, so we can relax and admit that we are now in a closed monoidal (rather than magmatic) category. Oh well!
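
As a down-to-earth sketch, here is this composition morphism in Haskell, taking the tensor to be the ordinary pair and the internal hom [x, y] to be the function type x -> y (the name compose is mine; this is the familiar special case, not the general construction):

-- The composition morphism M for the internal hom of Hask:
compose :: (y -> z, x -> y) -> (x -> z)
compose (g, f) = g . f

-- Its curried form is ordinary function composition:
-- curry compose g f == g . f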

We’ve seen before that there is a mapping between hom-sets and exponentials, which we can rewrite in this form:

𝒱0(X, Y) ≅ V [X, Y]

(with V on the right hand side the functor that defines “elements” of an object). So 𝒱 not only has the same objects as 𝒱0, but its hom-objects are mapped by the functor V back to the hom-sets of 𝒱0. That makes 𝒱 a perfect stand-in for 𝒱0 in the enriched realm.

We are now free to define (enriched) functors that go from any category 𝒜 enriched over 𝒱0 to 𝒱. In particular, the mapping 𝒜(K, _): 𝒜 → 𝒱, which assigns the object 𝒜(K, X) to every X, defines a representable enriched functor. This way an object X in some exotic enriched category 𝒜 is represented by an object in a presumably better understood category 𝒱 which, being closed and monoidal, has more structure to play with.

Enriched Yoneda lemma

We now have all the tools to formulate the Yoneda lemma for enriched categories. Let’s pick an object K in an enriched category 𝒜. It defines an enriched representable functor, 𝒜(K, _) from 𝒜 to 𝒱. Let’s take another functor F from 𝒜 to 𝒱. The Yoneda lemma states that all natural transformations between these two functors:

α: 𝒜(K, _) → F

are in one-to-one correspondence with “elements” of the object F K, where by “elements” we mean mappings:

η: I → F K

More precisely, there is a bijection (a one-to-one mapping that is onto) between the set of natural transformations Nat(𝒜(K, _), F) and the set 𝒱0(I, F K):

Nat(𝒜(K, _), F) ≅ 𝒱0(I, F K).

Notice that the bijection here is between sets: a set of natural transformations and a set of morphisms. There is a stronger version of the Yoneda lemma, which establishes a bijection between objects rather than sets. But for that one needs to objectify natural transformations, something that can indeed be done using ends. But that’s a topic for a whole new post.

Conclusions

My main motivation in this blog post was to try to figure out how far one can get with magmas before being forced to introduce a unit, and how far one can get with unital magmas before being forced to introduce associativity. These results are far from rigorous, since I might not have been aware of some hidden assumptions, and because the fact that some property is used in the proof of another property doesn’t mean that it is necessary.

So here’s the short summary. The following things are possible to define:

  1. Magmatic category
  2. Exponential objects and currying in a magmatic category
  3. Adjunction between magmatic product and exponentiation — the internal hom
  4. In a unital magmatic category, the adjunction leads to an isomorphism between the external hom-set and the “elements” of the internal hom
  5. Currying in the internal hom requires a monoidal category

Having a magmatic category one can define a magmatically enriched category and the following things:

  1. Functors
  2. Natural transformations — only over a unital magmatic category
  3. Enriched Yoneda lemma — only over a monoidal category

Acknowledgments

I’m grateful to Gershom Bazerman for stimulating discussions and helpful pointers.

Bibliography

  1. G. M. Kelly, Basic Concepts of Enriched Category Theory
  2. nLab, Closed category

I’m not fond of arguments based on lack of imagination. “There’s no way this code may fail!” might be a sign of great confidence or the result of ignorance. The inability to come up with a counterexample doesn’t prove a theorem. And yet there is one area of programming where such arguments work, and are quite useful. These are parametricity arguments: free theorems about polymorphic functions. Fortunately, there is solid theory behind parametricity. Free theorems are not based on ignorance. So I decided to read the relevant papers (see bibliography at the end of this post) and write a blog about it. How hard could it be? A few months and several failed attempts later I realized how naive I was. But I think I finally understand the basics enough to explain them in relatively simple terms.

Motivation

Here’s a classic example — a function that takes a list of arbitrary type a and returns a list of the same type:

r :: [a] -> [a]

What can this function do? Since it has to work with any type of list element, it can’t do anything type-specific. It can’t modify the elements or invent new ones. So all it can do is rearrange them, duplicate, or remove. Can you think of anything else?
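
A few functions that genuinely have this type illustrate the point (these are just standard-library combinations, used here as examples):

r1, r2, r3 :: [a] -> [a]
r1 = reverse                   -- rearranges elements
r2 = take 5                    -- drops elements
r3 = concatMap (replicate 2)   -- duplicates elements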

The question is a little tricky because it all depends on the kind of polymorphism your language supports. In Haskell, where we have parametric polymorphism, the above statement is for the most part true (modulo termination worries). In C++, which supports ad-hoc polymorphism, a generic function like:

template<class T> 
list<T> r(list<T>);

can do all kinds of weird things.

Parametric polymorphism means that a function will act on all types uniformly, so the declaration at the top indeed drastically narrows down the possibilities.

For instance, try fmapping any function of the type:

f :: a -> b

over a list of a. You can either apply fmap before or after acting on it with r. It shouldn’t matter whether you first modify the elements of the list and then rearrange them, or first rearrange and then modify them. The result should be the same:

r (fmap f as) = fmap f (r as)

But is it true just because we can’t imagine how it may fail, or can we make a formal argument to prove it?
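
Before making the formal argument, here is a quick sanity check of the equation for one arbitrary choice of r and f (an illustration only, not a proof):

check :: Bool
check = reverse (fmap show as) == fmap show (reverse as)
  where
    as = [1, 2, 3] :: [Int]
-- check evaluates to True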

Let’s Argue (Denotational) Semantics

One way to understand polymorphism is to have a good model for types. At first approximation types can be modeled as sets of values (strictly speaking, as shown by Reynolds, the set-theoretical model fails in the case of polymorphic lambda calculus, but there are ways around it).

The type Bool is a two-element set of True and False, Integer is a set of integers, and so on. Composite types can also be defined set-theoretically. For instance, a pair type is a cartesian product of two sets. A list of a is a set of lists with elements from the set a. A function type a->b is a set of functions between two sets.

For parametric polymorphism you need to first be able to define functions on types: functions that take a type and produce a new type. In other words, you should be able to define a family of types that is parametrized by another type.

For instance, given some type a, produce a type of pairs: (a, a). This can be formally written (not in Haskell) as:

Λa . (a, a)

Notice the capital lambda for defining functions on types, as opposed to the lowercase lambda used for functions on values.

To turn a family of types into a family of values, you put the universal quantifier in front of it, as in:

forall a . (a, a)

This is a valid type signature, but you’d be hard pressed to implement it. You’d have to provide a pair of concrete values for every possible type. You can’t do it uniformly across all types.

But there are some types of values that can be specified wholesale. These are function values. With one formula you can define a whole family of functions. The following signature, for instance, is perfectly implementable:

forall a . a -> a

Let’s analyze it. It consists of a type function:

Λa . a -> a

which, for any type a, returns a function type a->a. When universally quantified with forall, it becomes a family of concrete functions parameterized by type. This is possible because all these functions can be defined with one closed term:

id x = x

In Haskell, we usually omit the forall quantifier when there’s no danger of confusion. Any signature that contains a type variable is automatically universally quantified over it. (You’ll have to use explicit forall, however, with higher-order polymorphism, where a polymorphic function can be passed as an argument to another function.)

So what’s the set-theoretic model for polymorphism? You simply replace types with sets. A function on types becomes a function on sets. Notice that this is not the same as a function between sets. The latter assigns elements of one set to elements of another. The former assigns sets to sets. As in: Take any set a and return a cartesian product of the set with itself. (You might be tempted to think of functions on sets as functions between sets of all sets, but this kind of thinking leads to well known paradoxes.)

Or take any set a and return the set of functions from this set to itself. For this one we can easily build a polymorphic function — one which for every type a produces an actual function whose type is (a->a). So, for instance, if a is Int, we will produce a function (Int->Int); if a is Bool, we will produce a function (Bool->Bool); and so on. Now, with ad-hoc polymorphism it’s okay to code the Int function separately from the Bool function; but in parametric polymorphism, you’ll have to use the same code for all types.

Any language that provides some kind of pattern-matching on types (e.g., template specialization in C++) automatically introduces ad-hoc polymorphism. Incidentally, ad-hoc polymorphism is also possible in Haskell with the use of type families.
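
In Haskell the everyday mechanism for ad-hoc polymorphism is the type class (type families being another route). Here is a small sketch of a non-parametric “r” that dispatches on the element type (the class name AdHoc is mine):

class AdHoc a where
  r :: [a] -> [a]

instance AdHoc Int where
  r = map (+ 1)   -- type-specific: modifies the elements

instance AdHoc Bool where
  r = reverse     -- a completely different behavior for a different type

For such an r the free theorem from the beginning of this post no longer has to hold.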

Preservation of Relations

Let’s go to our original example and rewrite it using the explicit universal quantifier:

r :: forall a. [a] -> [a]

It defines a family of functions parametrized by the type a. When used in Haskell code, a particular member of this family will be picked automatically by the type inference system, depending on the context. In what follows, I’ll use explicit subscripting for the same purpose. The free theorem I mentioned before can be rewritten as:

rb (fmap f as) = fmap f (ra as)

with the function:

f :: a -> b

serving as a bridge between the types a and b. Specifically, f relates values of type a to values of type b. This relation happens to be functional, which means that there is only one value of type b corresponding to any given value of type a.

But the correspondence between elements of two lists may, in principle, be more general. What’s more general than a function? A relation. A relation between two sets a and b is defined as a set of pairs — a subset of the cartesian product of a and b. A function is a special case of a relation, one that can be represented as a set of pairs of the form (x, f x), or in relational notation x <=> f x. This relation is often called the graph of the function, since it can be interpreted as coordinates of points on a 2-d plane that form the plot of the function.

Reynolds’ key insight was that you can abstract the shape of a data structure by defining relations between values. For instance, how do we know that two pairs have the same shape — even if one is a pair of integers, say (1, 7), and the other a pair of colors, say (Red, Blue)? Because we can relate 1 to Red and 7 to Blue. This relation may be called “occupying the same position”.

Notice that the relation doesn’t have to be functional. The pair (2, 2) can be related to the pair (Black, White) using the non-functional relation:

(2 <=> Black),
(2 <=> White)

Conversely, given any relation between integers and colors, you can easily test which integer pairs are related to which color pairs. For the above relation, for instance, these are all the pairs that are related:

((2, 2) <=> (Black, Black)),
((2, 2) <=> (Black, White)),
((2, 2) <=> (White, Black)),
((2, 2) <=> (White, White))

This idea is easily extended to lists. Two lists are related if their corresponding elements are related: the first element of one list must be related to the first element of the second list, etc.; and empty lists are always related.
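
These relations are easy to transcribe into Haskell. Here is a sketch, with relations represented as Boolean-valued predicates (the names Rel, graph, pairRel and listRel are mine):

type Rel a b = a -> b -> Bool

-- The graph of a function, viewed as a relation:
graph :: Eq b => (a -> b) -> Rel a b
graph f x y = f x == y

-- Two pairs are related if they are related component-wise:
pairRel :: Rel a b -> Rel a' b' -> Rel (a, a') (b, b')
pairRel r r' (x, x') (y, y') = r x y && r' x' y'

-- Two lists are related if they have equal length and corresponding
-- elements are related; two empty lists are always related:
listRel :: Rel a b -> Rel [a] [b]
listRel r as bs = length as == length bs && and (zipWith r as bs)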

In particular, if the relationship between elements is established by a function f, it’s easy to convince yourself that the lists as and bs are related if

bs = fmap f as

With this in mind, our free theorem can be rewritten as:

rb bs = fmap f (ra as)

In other words, it tells us that the two lists

rb bs

and

ra as

are related under f.

Fig 1. Polymorphic function r rearranges lists but preserves relations between elements

So r transforms related lists into related lists. It may change the shape of the list, but it never touches the values in it. When it acts on two related lists, it rearranges them in exactly the same way, without breaking any of the relations between corresponding elements.

Reading Types as Relations

The above examples showed that we can define relations between values of composite types in terms of relations between values of simpler types. We’ve seen this with the pair constructor and with the list constructor. Continuing this trend, we can state that two functions:

f :: a -> b

and

g :: a' -> b'

are related iff, for related x and y, f x is related to g y. In other words, related functions map related arguments to related values.

Notice what we are doing here: We are consistently replacing types with relations in type constructors. This way we can read complex types as relations.
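
The relation on functions cannot be decided mechanically, because it quantifies over all related arguments, but we can at least check it on a finite sample of related argument pairs. A sketch along the lines of the predicates above (funRelOn is my name):

funRelOn :: [(a, a')] -> (b -> b' -> Bool) -> (a -> b) -> (a' -> b') -> Bool
funRelOn relatedArgs rb f g =
  and [ rb (f x) (g y) | (x, y) <- relatedArgs ]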

But what about primitive types? Let’s consider an example. Two functions from lists to integers that simply calculate the lengths of the lists:

lenStr  :: [Char] -> Int
lenBool :: [Bool] -> Int

What happens when we call them with two related lists? The first requirement for lists to be related is that they are of equal length. So when called with related lists the two functions will return the same integer value. It makes sense for us to consider these two functions related because they don’t inspect the values stored in the lists — just their shapes. They also look like components of the same parametrically polymorphic function, length.

It therefore makes sense to read a primitive type, such as Int, as an identity relation: two values are related if they are equal. This way our two functions, lenStr and lenBool are indeed related, because they turn related lists to related results.

Notice that for non-polymorphic functions the relationship that follows from their type is pretty restrictive. For instance, two functions Int->Int are related if and only if their outputs are equal for equal inputs. In other words, the functions must be (extensionally) equal.

Now let’s go back to polymorphic functions. The type of a polymorphic function is specified by universally quantifying a function on types.

f :: forall A. ΦA

The function Φ maps types to types. In our set-theoretical model it maps sets to sets, but we want to read it in terms of relations.

A function on relations does two things: it maps input sets to output sets, and it maps pairs from input sets to pairs from output sets. If you pick an arbitrary relation A between sets a and a' then Φ will map A to a relation ΦA between sets b and b'. The connection between a and b, and between a' and b' may, in principle, vary with the input relation.

In describing polymorphic functions, we are only interested in functions on relations that “factorize” into functions on sets. It means that Φ defines two families of functions on sets, φ and ψ, such that our b is given by φa and b' is given by ψa'. ΦA then establishes a relation between the sets φa and ψa'.

Fig 2. Φ maps relations to relations

Given that Φ maps relations to relations, a universally quantified version of it:

forall A. ΦA

maps pairs of sets to pairs of values.

Now suppose you have two polymorphic functions g and g':

g  :: forall a . φa
g' :: forall a'. ψa'

They both map types (sets) to values. We can instantiate g at some type a, and it will return a value ga of the type b=φa. We can instantiate g' at some type a', and it will return a value g'a' of the type b'=ψa'. We can do this for any relation A between two arbitrary sets a and a'.

We will say that g and g' are related through the relation induced by the type (forall A. ΦA) iff:

  • Φ factorizes, as a function on sets, into φ and ψ, and
  • the results ga and g'a' are related by ΦA.

Fig 3. Relation between two polymorphic functions

In other words, polymorphic functions are related if they map related types to related values. Notice that in the interesting examples these values are themselves functions.

With these definitions, we can now reinterpret any type signature as a relation between values.

The Parametricity Theorem

Reynolds’ second key insight was that any term is in a relation with itself — the relation being defined by the term’s type. We have indeed defined the mapping of types to relations to make this work. Primitive types turn into identity relations, so obviously a primitive value is in relation with itself. A function between primitive types is in relation with itself because it maps related (equal) arguments into related (equal) results. A list or a pair of primitive types is in relation with itself because each element of it is equal to itself. You can recurse and consider a list of functions, or a pair of lists, etc., building the proof inductively, proceeding from simpler types to more and more complex types. The proof goes over all possible term construction rules and typing rules in a given theory.

Formally, this kind of proof is called “structural induction,” because you’re showing that more complex structures will satisfy the theorem as long as the simpler ones from which they are constructed, do. The only tricky part is dealing with polymorphic functions, because they are quantified over all types. In fact, this is the reason why the naive interpretation of types as sets breaks down (see, however, Pitts’ paper). It is possible, however, to prove the parametricity theorem in a more general setting, for instance, using frames, or in the framework of operational semantics.

Wadler’s key insight was to interpret Reynolds’ theorem not only as a way of identifying different implementations of the same type — for instance, cartesian and polar representations of complex numbers — but also as a source of free theorems for polymorphic types.

Let’s try applying the parametricity theorem to some simple examples. Take a constant term: an integer like 5. Its type Int can be interpreted as a relation, which we defined to be the identity relation (it’s one of the primitive types). And indeed, 5 is in this relation with 5.

Take a function like:

ord :: Char -> Int

Its type defines a relation between functions: Two functions of the type Char->Int are related if they return equal integers for equal characters. Obviously, ord is in this relation with itself.

Those were trivial examples. The interesting ones involve polymorphic functions. So let’s go back to our starting example. The term now is the polymorphic function r whose type is:

forall a . [a] -> [a]

Parametricity tells us that r is in relation with itself. However, comparing a polymorphic function to itself involves comparing the instantiations of the same function at two arbitrary types (sets), say a and a':

ra <=> ra'

We are free to pick an arbitrary relation A between elements of these sets. Φ will map A to another relation ΦA which, in our case, is induced by the type:

[a] -> [a]

Fig 4. Polymorphic function r maps related types to related values, which themselves are functions on lists

By our definition, two functions ra and ra' are related if they map related arguments to related results. In this case the arguments and the results are lists. So if we have two related lists:

as  :: [a]
as' :: [a']

they must, by parametricity, be mapped to two related lists:

bs  = ra  as
bs' = ra' as'

This must be true for any type of relation, so let’s pick a functional relation generated by some (f :: a -> a'). We have:

as' = fmap f as

The results of applying r, therefore, must be related through the same relation:

bs' = fmap f bs

Putting it all together, we get our expected result:

ra' (fmap f as) = fmap f (ra as)

Parametricity and Natural Transformations

The free theorem I used as the running example is interesting for another reason: The list constructor is a functor. You may think of functors as generalized containers for storing arbitrary types of values. You can imagine that they have shapes; and for two containers of the same shape you may establish a correspondence between “positions” at which the elements are stored. This is quite easy for traditional containers like lists or trees, and with a leap of faith it can be stretched to non-traditional “containers” like functions. We used the intuition of relations corresponding to the idea of “occupying the same position” within a data structure. This notion can be readily generalized to any polymorphic containers. Two trees, for instance, are related if they are both empty, or if they have the same shape and their corresponding elements are related.

Let’s try another functor: You can also think of Maybe as having two shapes: Nothing and Just. Two Nothings are always related, and two Justs are related if their contents are related.

This observation immediately gives us a free theorem about polymorphic functions of the type:

r :: forall a. [a] -> Maybe a

an example of which is safeHead. The theorem is:

fmap h . safeHead == safeHead . fmap h

Notice that the fmap on the left is defined by the Maybe functor, whereas the one on the right is the list one.
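
For completeness, safeHead is not in the Prelude under that name (Data.Maybe exports the equivalent listToMaybe), so here is a definition together with a spot check of the theorem for one particular h (an illustration, not a proof):

safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

checkSafeHead :: Bool
checkSafeHead = (fmap show . safeHead) xs == (safeHead . fmap show) xs
  where xs = [1, 2, 3] :: [Int]
-- True; the fmap on the left is Maybe's, the one on the right is the list's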

If you accept the premise that an appropriate relation can be defined for any functor, then you can derive a free theorem for all polymorphic functions of the type:

r :: forall a. f a -> g a

where f and g are functors. This type of function is known as a natural transformation between the two functors, and the free theorem:

fmap h . r == r . fmap h

is the naturality condition. That’s how naturality follows from parametricity.

Acknowledgments

I’d like to thank all the people I talked to about parametricity at the ICFP in Gothenburg, and Edward Kmett for reading and commenting on the draft of this blog.

Appendix: Other Examples

Here’s a list of other free theorems from Wadler’s paper. You might try proving them using parametricity.

r :: [a] -> a -- for instance, head
f . r == r . fmap f
r :: [a] -> [a] -> [a] -- for instance, (++)
fmap f (r as bs) == r (fmap f as) (fmap f bs)
r :: [[a]] -> [a] -- for instance, concat
fmap f . r == r . fmap (fmap f)
r :: (a, b) -> a -- for instance, fst
f . r == r . mapPair (f, g)
r :: (a, b) -> b -- for instance, snd
g . r == r . mapPair (f, g)
r :: ([a], [b]) -> [(a, b)] -- for instance, uncurry zip
fmap (mapPair (f, g)) . r == r . mapPair (fmap f, fmap g)
r :: (a -> Bool) -> [a] -> [a] -- for instance, filter
fmap f . r (p . f) == r p . fmap f
r :: (a -> a -> Ordering) -> [a] -> [a] -- for instance, sortBy
 -- assuming: cmp' (f x) (f y) == cmp x y (f is monotone, preserves order)
fmap f . r cmp == r cmp' . fmap f
r :: (a -> b -> b) -> b -> [a] -> b -- for instance, foldr
-- assuming: g (acc x y) == acc (f x) (g y)
g . foldr acc zero == foldr acc (g zero) . fmap f
r :: a -> a -- id
f . r == r . f
r :: a -> b -> a -- for instance, the K combinator
f (r x y) == r (f x) (g y)

where:

mapPair :: (a -> c, b -> d) -> (a, b) -> (c, d)
mapPair (f, g) (x, y) = (f x, g y)
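
None of these are proofs, but any of them can be spot-checked with QuickCheck for particular choices of f and g. For instance, the (++) theorem with f doubling each element (prop_append is my name, not part of any library):

import Test.QuickCheck

prop_append :: [Int] -> [Int] -> Bool
prop_append as bs =
  fmap f (as ++ bs) == fmap f as ++ fmap f bs
  where f = (* 2)

main :: IO ()
main = quickCheck prop_append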

Bibliography

  1. John C Reynolds, Types, Abstraction and Parametric Polymorphism
  2. Philip Wadler, Theorems for Free!
  3. Claudio Hermida, Uday S. Reddy, Edmund P. Robinson, Logical Relations and Parametricity – A Reynolds Programme for Category Theory and Programming Languages
  4. Derek Dreyer, Parametricity and Relational Reasoning, Oregon Programming Languages Summer School

In category theory we never look inside objects. All information about objects is encoded in the arrows (morphisms) between them. In set theory, on the other hand, we express properties of sets through their elements. But there is a strong link between categories and sets, at least in the case of locally small categories. In those categories, morphisms between any two objects form a set. We call this set the hom-set.

Go one level up in abstraction and you have categories of functors, which are mappings between categories. In a functor category a functor is treated as an object, and a natural transformation — a mapping between functors — is treated as a morphism. So a hom-set in a functor category is a set of natural transformations between two functors. An interesting question arises: How is this set related to the hom-sets in the categories that are mapped by these functors? After all, they are sets in the same category of sets. The answer involves, as usual, a universal construction. This one is called an end.

Notation

The problem with category theory is that it deals with so many levels of abstraction that you quickly run out of fonts for your notation. I’ll try to stick to more or less the standard notation. I’ll use capital letters C, D, etc. for categories (script font would be nice, but it’s not very practical in a blog). The corresponding lower case letters, c, d — optionally with primes, like c', d' — will denote objects in those categories. I’ll use f, g, h, etc. for morphisms and F, G, H for functors. I’ll also use the terse notation for hom-sets, so the set of all morphisms between objects c and c' in category C will be simply C(c, c'). The set of natural transformations between two functors F and G, both mapping C to D, will be [C, D](F, G). I will not use parentheses for functors acting on objects or morphisms, so F acting on c will be simply F c (and similarly F f, when acting on f). But occasionally I’ll break the rules if it helps the presentation.

Natural Transformations

A natural transformation is a map between functors. So let’s start with two functors F and G, both going from category C to another category D. Let’s focus on one object in C, call it c. F maps c to F c and G maps c to G c. These two, F c and G c, are objects in D. As such they define a hom-set: the set of all morphisms from F c to G c denoted D(F c, G c). A natural transformation picks one morphism from every such hom-set. Not all picks are natural, as we’ll see soon.

Hom-set defined by two images of object c under functors F and G

A hom-set, being a set, is also an object in Set — the category of sets. So let’s look at the hom-sets D(F c, G c) as objects in Set. There is one for every c. If we carefully pick one element from each, we will have a natural transformation (I said “carefully”). A natural transformation is a family of morphisms picked from hom-sets D(F c, G c), one for each c. If you think of those hom-sets as fibers growing from objects in C, a natural transformation is a cross-section through all these fibers.

Here’s another way of looking at this. Let’s pick an arbitrary set in Set, call it X (sorry, I’m using a capital letter for an object in Set, but I do need lowercase x for its element). A function φ from X to the set D(F c, G c) maps an element x of X to a particular morphism in D(F c, G c). We can think of it as picking a component of some natural transformation. Each choice of x would potentially correspond to a different natural transformation. So the set X could be looked upon as representing some set of natural transformations. We still have to fill in a lot of blanks, but we are on the right track.

A function from a set X to a hom-set defined by two functors F and G

Since a natural transformation is a family of morphisms, we need a family of such functions from X to D(F c, G c), one for every c. Let’s call this family τ. When we fix c, it’s a function from X to D(F c, G c). When we fix x, it’s a precursor of a natural transformation: a family of morphisms picked from hom-sets.

A family of functions parameterized by c and x

The only thing missing from this picture is the naturality condition. Not all families of morphisms picked from the hom-sets D(F c, G c) are natural. We need to account for the fact that our functors F and G not only map objects, but also morphisms.

So let’s look at morphisms in C and how they are mapped to morphisms in D by, say, our functor F. We can pick two objects in C, c and c'. Morphisms between those two form a hom-set C(c, c'). F maps this hom-set to another hom-set, D(F c, F c'). Similarly, G maps the same hom-set to D(G c, G c').

Four hom-sets defined by two functors F and G and two objects c and c'

Taken together, we have a diagram of four hom-sets between four objects: F c, F c', G c, and G c'. Fixing a morphism f from c to c' (an element of C(c, c')) picks two morphisms, one F f from D(F c, F c') and one G f from D(G c, G c'), given the functoriality of F and G. Fixing an x in X picks two morphisms, one τc x from D(F c, G c) and one τc' x from D(F c', G c'). These four better commute:

G f . τc x = τc' x . F f

The commuting naturality square

That’s the naturality condition. Any τ that satisfies this condition defines a set of natural transformations, one for each x.

Wedges

All this time I’ve been setting up the scene for one important insight. The set X and the family of functions τ look a lot like a cone (see my blog post about limits). Except that, instead of one functor, we have two, and τ is not really a natural transformation. But we are getting close. And if we can carve something cone-like out of this construction, then we could maybe find something limit-like. And indeed the cone-like object is called a wedge and the limit-like thing is called an end, and in our case the end would be a set of all natural transformations from F to G. So let’s work this thing out.

If we were constructing a cone, we’d start with a functor from our source category C to the target category — Set in this case. That’s easy to define for objects: c goes into the set D(F c, G c). But we run into a problem with morphisms. Suppose we want to map a morphism h that goes from c to c':

h : c -> c'

Its image should be a function from D(F c, G c) to D(F c', G c'). It’s a function that maps morphisms to morphisms. Let’s see what morphisms are at our disposal. We have:

F h : F c -> F c'
G h : G c -> G c'

How can we turn a morphism f from D(F c, G c)

f : F c -> G c

into a morphism f' from D(F c', G c'):

f' : F c' -> G c'

One thing we could do is to post-compose G h after f to get to G c'; but there is no way to get anywhere from F c'. So it can’t be that simple.

But notice one thing. Even though τ maps elements of X to “diagonal” hom-sets (by diagonal I mean that the argument c is repeated in D(F c, G c)), the naturality condition involves off-diagonal hom-sets, like D(F c, F c') and D(G c, G c') (c potentially different from c'). So maybe we should open our minds to those off-diagonal hom-sets in our search for a functor? How about a functor that maps a pair of objects (c, c') to a hom-set D(F c, G c')? Let’s see if we can figure out its action on morphisms.

Now, since we are constructing a functor of two arguments, we have to define its action on a pair of morphisms, (f, g).

f : c -> c'
g : d -> d'

The image of this pair should be a function in Set that maps D(F c, G d) into D(F c', G d'). Unfortunately, we run into the same problem again. Given h : F c -> G d, we can follow it with G g : G d -> G d', but there is no way we can move anywhere from F c'. The only morphism at our disposal goes the wrong way, F f : F c -> F c'. If we could only reverse it!

Ah! but functors that go “the wrong way” on morphisms are well known. They are called contravariant functors, as opposed to the good old covariant functors that we have grown to love and cherish. What we need is a functor that’s contravariant in the first argument and covariant in the second. Such functors even have a name: they are called profunctors.

So, given a pair of morphisms (f, g) our profunctor Snat will map a morphism from D(F c', G d) to a morphism from D(F c, G d'). Notice that this time I have inverted c and c'.

Given h : F c' -> G d, we can easily construct a corresponding morphism in D(F c, G d') by this composition:

G g . h . F f : F c -> G d'

Constructing the action of a profunctor built from two functors on a pair of morphisms

So that’s our action of Snat on a pair of morphisms from C. It turns a morphism h into:

(Snat f g) h = G g . h . F f
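
In Haskell terms, taking both categories to be the category of types and F and G to be Functor instances, this action is just pre- and post-composition with fmap (the name snatAction is mine):

snatAction :: (Functor f, Functor g)
           => (c -> c') -> (d -> d')
           -> (f c' -> g d) -> (f c -> g d')
snatAction fmor gmor h = fmap gmor . h . fmap fmor
-- note the contravariance: we are given c -> c', yet the result consumes f c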

Let’s summarize what we have so far: We have a profunctor Snat, a set X, and a family of morphisms τ from X to diagonal elements Snat c c. We are not interested in defining τ for off-diagonal elements of Snat. What remains is to impose some kind of generic condition on τ that would let it generate natural transformations only. We would like to formulate this condition in terms of a general profunctor S, if possible.

What’s the minimal consistency condition on τ, given that it only generates diagonal elements S c c? We need a way to somehow connect two such objects, say S c c and S c' c'. Suppose we have a morphism f : c -> c'. How can we use this morphism to operate on S c c and S c' c'? A profunctor S can be used to map a pair of morphisms, so how about pairing our f with something obvious that always exists, like the identity morphism id? Let’s try it:

S idc f  : S c  c  -> S c c'
S f idc' : S c' c' -> S c c'

(Notice the inversion of c and c' when f is used as the first argument — that’s our contravariance in action.) We have found two ways to get from two diagonal elements to the same non-diagonal element of S. Together with τc and τc' they form a diamond. This diamond better commute or we’re in trouble.

The wedge condition

S idc f . τc = S f idc' . τc'

Now we can finally define a wedge for an arbitrary profunctor S. A wedge for S consists of an object X and a family of morphisms τc from X to S c c that satisfy the wedge condition above.

And if we plug in our special profunctor Snat that maps pairs of objects to hom-sets, the wedge condition turns into the naturality condition for our two functors. Let’s check this out.

Remember, this is how Snat acts on a pair of morphisms f and g:

(Snat f g) h = G g . h . F f

Substituting identities in the right places, we get (remember, functors turn identities into identities):

(Snat idc f) h = G f . h
(Snat f idc') h' = h' . F f

Notice that h and h' are morphisms in D but, at the same time, elements of sets S c c and S c' c' in the wedge condition. They are given, respectively, by the action of τc and τc' on an element x of X. The wedge condition for Snat will therefore post-compose τc with G f and pre-compose τc' with F f:

G f . τc = τc' . F f

And this is exactly the naturality condition for components of τ. It means that a τ that satisfies the wedge condition defines a natural transformation for any choice of x.

Dinatural transformations

This definition of a wedge looks almost like the definition of a cone, except that the commuting conditions in a cone were expressed in terms of naturality of the transformation between the diagonal functor ΔX and the functor F. Here, we could also say that the object X is an image of a diagonal profunctor ΔX — trivially defined to map pairs of objects from C to an object X, and pairs of morphisms to the identity morphism on X.

A wedge (Strictly speaking, a profunctor is a mapping from Cop × C to D — the “op” accounting for the reversal of the direction of morphisms in the first argument)

So what we need to complete the picture is some generalized notion of a natural transformation between two profunctors R and S. Since we are only interested in the mapping of diagonal elements of profunctors, this transformation will be called a dinatural transformation (diagonally natural). All we need is to expand the top vertex of the diamond diagram (the wedge condition) to create the commuting hexagon below.

Dinaturality condition

S idc f . τc . R f idc = S f idc' . τc' . R idc' f

With this dinaturality condition we can define a wedge for any profunctor S with a vertex X as a dinatural transformation between ΔX and S.

Now, just as we have defined a limit as a universal cone, we can define an end as a universal wedge. An end is an object End S and a dinatural transformation ω such that for any other wedge (X, τ) there is a unique morphism h from X to End S for which all the triangles in the following diagram commute:

The end as a universal wedge

In particular, since the wedges for our profunctor Snat defined natural transformations, their end defines the set of all natural transformations between the functors F and G: [C, D](F, G).

There is special notation for an end using the integral symbol. We are “integrating” a profunctor S along the diagonal over all objects c in category C.

End S = ∫ c S(c, c)

Using this notation, the set of natural transformations can be written as the following end:

[C, D](F, G) = ∫ c D(F c, G c)

Note: This is a bit of abuse of notation that happens quite often in category theory. The integral sign makes more sense for coends, which are duals of ends. Coends are related to colimits, which include coproducts, which are the category theory proxies for sums. In this sense, a coend is a generalization of a sum; just as an integral is an infinite continuous sum. By the same token, an end is a generalization of a product. Unfortunately, there is no common symbol for a continuous product in calculus, so we are stuck with the integral symbol.

In Haskell, ends become universal quantifiers, hence this definition of natural transformation from Edward Kmett’s category extras (here f and g are functors):

type f :~> g = forall a. f a -> g a
type Natural f g = f :~> g

You can find more about representing profunctors, dinatural transformations, and ends in Haskell in Edward’s blog.
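
An end itself can also be modeled with a universal quantifier ranging over the diagonal of a two-argument type constructor. Here is a rough sketch (all of these names are mine; see Edward's blog for the production-grade treatment):

{-# LANGUAGE RankNTypes #-}

newtype End p = End { runEnd :: forall c. p c c }

-- The profunctor built from two functors f and g:
newtype FunHom f g a b = FunHom { runFunHom :: f a -> g b }

-- The set of natural transformations as an end over that profunctor:
type Nat f g = End (FunHom f g)

-- For example, the head-taking natural transformation packaged as an end:
safeHeadNat :: Nat [] Maybe
safeHeadNat = End (FunHom safeHead)
  where
    safeHead []      = Nothing
    safeHead (x : _) = Just x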

The Moment of Zen

You can think of a hom-set as defining a mapping: It takes a pair of objects and generates a set of morphisms — an object in Set. A profunctor generalizes this idea. It takes a pair of objects and generates a hom-set-like object in a category that’s not necessarily the category of sets. The mapping from a pair of objects to a hom-set is functorial: its action on pairs of morphisms is well defined as well. Except that this action is contravariant in the first argument and covariant in the second. And so is a profunctor which imitates it.

We’ve seen before how a functor can embed a simple graph into a category and define a limit. The limit embodies the structure of this graph in a single object. A profunctor embeds a structure of a hom-set into a category. An end then embodies this structure in a single object. When this hom-set structure is fashioned using two functors, the end becomes a set of mappings between those two functors — a set of natural transformations. But nothing stops us from fashioning more complex hom-set-like structures and finding their ends.

One such construction I’ve had my eyes on for a long time is called the Kan extension. To give you an idea, imagine a functor T that’s defined on a sub-category M of a bigger category C. It maps M into a category A. The question is, can we extend, or interpolate, this functor over the rest of C? How would we define its value on an object c that’s not in M? The trick is to look at all the hom-sets that go from c to objects in M.

Kan extension setup


Our extended functor will have to map not only c to an object in A, but also all those hom-sets to hom-sets in A. After all, that’s what functors do. This mapping of hom-sets looks very much like a profunctor. It has to be contravariant in its source and covariant in its target. If this profunctor has an end, that’s a perfect candidate for the image of c under the extended functor. That, in a nutshell, is the idea behind the right Kan extension (the left one is, predictably, built with coends). But that’s a topic for another blog post.