“Data structures in functional languages are immutable.”

What?! How can you write programs if you can’t mutate data? To an imperative programmer this sounds like anathema. “Are you telling me that I can’t change a value stored in a vector, delete a node in a tree, or push an element on a stack?” Well, yes and no. It’s all a matter of interpretation. When you give me a list and I give you back the same list with one more element, have I modified it or have I constructed a brand new list with the same elements plus one more?

Why would you care? Actually, you might care if you are still holding on to your original list. Has that list changed? In a functional language, the original data structure will remain unmodified! The version from before the modification persists — hence such data structures are called persistent (it has nothing to do with being storable on disk).

In this post I will discuss the following aspects of persistent data structures:

  • They are immutable, therefore
    • They are easier to reason about and maintain
    • They are thread-safe
  • They can be implemented efficiently
  • They require some type of resource management to “garbage collect” them.

I will illustrate these aspects on a simple example of a persistent linked list implemented in C++.

Motivation

There is a wealth of persistent data structures in functional languages, a lot of them based on the seminal book by Chris Okasaki, Purely Functional Data Structures (based on his thesis, which is available online). Unfortunately, persistent data structures haven’t found their way into imperative programming yet. In this series of blog posts I’ll try to provide the motivation for using functional data structures in imperative languages and start translating some of them into C++. I believe that persistent data structures should be part of every C++ programmer’s toolbox, especially one interested in concurrency and parallelism.

Persistence and Immutability

What’s the advantage of persistent data structures? For one, they behave as if they were immutable, yet you can modify them. The trick is that the modifications never spread to the aliases of the data structure — nobody else can observe them other than the mutator itself. This way you avoid any implicit long-distance couplings. This is important for program maintenance — you know that your bug fixes and feature tweaks will remain localized, and you don’t have to worry about breaking remote parts of the program that happen to have access to the same data structure.

There is another crucial advantage of immutable data structures — they are thread safe! You can’t have a data race on a structure that is read-only. Since copies of a persistent data structure are immutable, you don’t need to synchronize access to them. (This is, by the way, why concurrent programming is much easier in functional languages.)

Persistence and Multithreading

So if you want just one reason to use persistent data structures — it’s multithreading. A lot of conventional wisdom about performance is void in the face of multithreading. Concurrent and parallel programming introduces new performance criteria. It forces you to balance the cost of accessing data structures against the cost of synchronization.

Synchronization is hard to get right in the first place and is fraught with such dangers as data races, deadlocks, livelocks, priority inversions, etc. But even if your synchronization is correct, it may easily kill your performance. Locks in the wrong places can be expensive. You might be tempted to use a traditional mutable data structure like a vector or a tree under a single lock, but that creates a bottleneck for other threads. Or you might be tempted to handcraft your own fine-grained locking schemes, which is tantamount to designing your own data structures, whose correctness and performance characteristics are very hard to estimate. Even lock-free data structures of proven correctness can incur a substantial synchronization penalty by spinning on atomic variables (more about it later).

The fastest synchronization is no synchronization at all. That’s why the holy grail of parallelism is either not to share, or share only immutable data structures. Persistent data structures offer a special kind of mutability without the need for synchronization. You are free to share these data structures without synchronization because they never change under you. Mutation is accomplished by constructing a new object.

Persistence and Performance

So what’s the catch? You guessed it — performance! A naive implementation of a persistent data structure would require a lot of copying — the smallest modification would produce a new copy of the whole data structure. Fortunately, this is not necessary, and most implementations try to minimize copying by essentially storing only the “deltas.” Half of this blog will be about performance analysis and showing that you can have your cake and eat it too.

Every data structure has unique performance characteristics. If you judge a C++ vector by the performance of indexed access, its performance is excellent: it’s O(1) — constant time. But if you judge it by the performance of insert, which is O(N) — linear time — it’s pretty bad. Similarly, persistent data structures have their good sides and bad sides. Appending to the end of a persistent singly-linked list, for instance, is O(N), but push and pop from the front are a comfortable O(1).

Most importantly, major work has been done designing efficient persistent data structures. In many cases they closely match the performance of mutable data structures, or come within a logarithmic factor of them.

Persistent List: First Cut

Before I get to more advanced data structures in future installments of this blog, I’d like to start with the simplest of all: the singly-linked list. Because it’s so simple, it will be easy to demonstrate the craft behind an efficient implementation of persistence.

We all know how to implement a singly linked list in C++. Let’s take a slightly different approach here and define it abstractly first. Here’s the definition:

A list of T is either empty or consists of an element of type T followed by a list of T.

This definition translates directly into a generic data structure with two constructors, one for the empty list and another taking the value/list (head/tail) pair:

template<class T>
class List {
public:
    List();
    List(T val, List tail);
    ...
};

Here’s the trick: Since we are going to make all Lists immutable, we can guarantee that the second argument to the second constructor (the tail) is forever frozen. Therefore we don’t have to deep-copy it; we can just store a reference to it inside the list. This way we can implement this constructor to be O(1), both in time and space. It also means that those List modifications that involve only the head of the list will have constant cost — because they can all share the same tail. This is very important, because the naive copying implementation would require O(N) time. You’ll see the same pattern in other persistent data structures: they are constructed from big immutable chunks that can be safely aliased rather than copied. Of course this raises the problem of collecting the no-longer-referenced tails — I’ll talk about it later.

Another important consequence of immutability is that there are two kinds of Lists: empty and non-empty. A List that was created empty will always remain empty. So the most important question about a list will be whether it’s empty or not. Something in the implementation of the list must store this information. We could, for instance, have a Boolean data member, _isEmpty, but that’s not what we do in C++. For better or worse we use this “clever” trick called the null pointer. So a List is really a pointer that can either be null or point to the first element of the list. That’s why there is no overhead in passing a list by value — we’re just copying a single pointer.

Is it a shallow or a deep copy? Technically it’s shallow, but because the List is (deeply) immutable, there is no observable difference.

Here’s the code that reflects the discussion so far:

template<class T>
class List
{
    struct Item;
public:
    List() : _head(nullptr) {}
    List(T v, List tail) : _head(new Item(v, tail._head)) {}
    bool isEmpty() const { return !_head; }
private:
    // may be null
    Item const * _head;
};

The Item contains the value, _val, and a pointer, _next, to the next (constant) Item (which may be shared between many lists):

    struct Item
    {
        Item(T v, Item const * tail) : _val(v), _next(tail) {}
        T _val;
        Item const * _next;
    };

The fact that items are (deeply) immutable can’t be expressed in the C++ type system, so we use (shallow) constness and recursive reasoning to enforce it.

In a functional language, once you’ve defined your constructors, you can automatically use them for pattern matching in order to “deconstruct” objects. For instance, an empty list would match the empty list pattern/constructor and a non-empty list would match the (head, tail) pattern/constructor (often called “cons”, after Lisp). Instead, in C++ we have to define accessors like front, which returns the head of the list, and pop_front, which returns the tail:

    T front() const
    {
        assert(!isEmpty());
        return _head->_val;
    }
    List pop_front() const
    {
        assert(!isEmpty());
        return List(_head->_next);
    }

In the implementation of pop_front I used an additional private constructor:

    explicit List (Item const * items) : _head(items) {}

Notice the assertions: You are not supposed to call front or pop_front on an empty list. Make sure you always check isEmpty before you call them. Admittedly, this kind of interface exposes the programmer to potential bugs (forgetting to check for an empty list) — something that the pattern-matching approach of functional languages mitigates to a large extent. You could make these two methods safer in C++ by using boost optional, but not without some awkwardness and performance overhead.
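
For illustration, here is one possible shape of such a safer accessor. This is just a sketch, not part of the interface above (maybe_front is my name), and it assumes Boost.Optional (#include <boost/optional.hpp>):

    boost::optional<T> maybe_front() const
    {
        if (isEmpty())
            return boost::none;   // empty list: no value, no assertion
        return _head->_val;       // wraps the head value in an optional
    }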

We have defined five primitives: two constructors plus isEmpty, front, and pop_front. They completely describe a persistent list and are all O(1). Everything else can be implemented using those five. For instance, we may add a helper method push_front:

    List push_front(T v) const {
       return List(v, *this);
    }

Notice that push_front does not modify the list — it returns a new list with the new element at its head. Because of the implicit sharing of the tail, push_front is executed in O(1) time and takes O(1) space.
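
To see this sharing in action, here is a short usage sketch (the variable names are mine, not from the post):

List<int> a;                  // []
auto b = a.push_front(1);     // [1]
auto c = b.push_front(2);     // [2, 1]; the tail [1] is shared with b
auto d = b.push_front(3);     // [3, 1]; shares the same [1], and b is unchanged

Both c and d alias b’s single Item instead of copying it, which is exactly why push_front stays O(1).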

The list is essentially a LIFO stack and its asymptotic behavior is the same as that of the std::vector implementation of a stack (without the random access, and with front and back inverted). There is an additional constant cost of allocating Items (and deallocating them, as we’ll see soon), both in terms of time and space. In return we are gaining multithreaded performance by avoiding the need to lock our immutable data structures.

I’m not going to argue whether this tradeoff is always positive in the case of simple lists. Remember, I used lists to demonstrate the principles behind persistent data structures in the simplest setting. In the next installment though, I’m going to make a stronger case for tree-based data structures.

Reference Counting

If we could use garbage collection in C++ (and there are plans to add it to the Standard) we’d be done with the List, at least in the case of no hard resources (the ones that require finalization). As it is, we had better come up with a scheme for releasing both memory and hard resources owned by lists. Since persistent data structures use sharing in their implementation, the simplest thing to do is to replace naked pointers with shared pointers.

Let’s start with the pointer to the head of the list:

std::shared_ptr<const Item> _head;

We no longer need to initialize _head to nullptr in the empty list constructor because shared_ptr does it for us. We need, however, to construct a new shared_ptr when creating a list from a head and a tail:

List() {}
List(T v, List const & tail) 
    : _head(std::make_shared<Item>(v, tail._head)) {}

Item itself needs a shared_ptr as _next:

struct Item
{
    Item(T v, std::shared_ptr<const Item> const & tail) 
        : _val(v), _next(tail) {}
    T _val;
    std::shared_ptr<const Item> _next;
};

Surprisingly, these are the only changes we have to make. Everything else just works. Every time a shared_ptr is copied, as in the constructors of List and Item, a reference count is automatically increased. Every time a List goes out of scope, the destructor of _head decrements the reference count of the first Item of the list. If that item is not shared, it is deleted, which decreases the reference count of the next Item, and so on.

Let’s talk about performance again, because now we have to deal with memory management. Reference counting doesn’t come for free. First, a standard implementation of shared_ptr consists of two pointers — one pointing to data, the other to the reference counter (this is why I’m now passing List by const reference rather than by value — although it’s not clear if this makes much difference).

Notice that I was careful to always do Item allocations using make_shared, rather than allocating data using new and then turning it into a shared_ptr. This way the counter is allocated in the same memory block as the Item. This not only avoids the overhead of a separate call to new for the (shared) counter, but also helps locality of reference.
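
To make the difference concrete, here are the two ways of writing the List constructor (a sketch; only the first one is what the actual code above uses):

// one allocation: the Item and its reference counter share a single memory block
List(T v, List const & tail)
    : _head(std::make_shared<Item>(v, tail._head)) {}

// two allocations: one for the Item (new), a separate one for the counter's control block
List(T v, List const & tail)
    : _head(new Item(v, tail._head)) {}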

Then there is the issue of accessing the counter. Notice that the counter is only accessed when an Item is constructed or destroyed, and not, for instance, when the list is traversed. So that’s good. What’s not good is that, in a multithreaded environment, counter access requires synchronization. Granted, this is usually the lock-free kind of synchronization provided by shared_ptr, but it’s still there.

So my original claim that persistent data structures didn’t require synchronization was not exactly correct in a non-garbage-collected environment. The problem is somewhat mitigated by the fact that this synchronization happens only during construction and destruction, which are already heavy duty allocator operations with their own synchronization needs.

The cost of synchronization varies depending on how much contention there is. If there are only a few threads modifying a shared list, collisions are rare and the cost of a counter update is just one CAS (Compare And Swap) or an equivalent atomic operation. The overhead is different on different processors, but the important observation is that it’s the same overhead as in an efficient implementation of a mutex in the absence of contention (the so-called thin locks or futexes require just one CAS to enter the critical section — see my blog about thin locks).

At high contention, when there are a lot of collisions, the reference count synchronization degenerates to a spin lock. (A mutex, on the other hand, would fall back on the operating system, since it must enqueue blocked threads). This high contention regime, however, is unlikely in the normal usage of persistent data structures.

A little digression about memory management is in order. Allocating Items from a garbage-collected heap would likely be more efficient, because then persistent objects would really require zero synchronization, especially if we had separate per-processor heaps. It’s been known for some time that the tradeoff between automated garbage collection (GC) and reference counting (RC) is far from obvious. David Bacon et al. showed that, rather than there being one most efficient approach, there is a whole spectrum of solutions between GC and RC, each with its own performance tradeoffs.

There is a popular belief that GC always leads to long unexpected pauses in the execution of the program. This used to be true in the old days, but now we have incremental concurrent garbage collectors that either never “stop the world” or stop it for short, bounded periods of time (just do an internet search for “parallel incremental garbage collection”). On the other hand, manual memory management a la C++ has latency problems of its own. Data structures that use bulk allocation, like vectors, have to occasionally double their size and copy all elements. In a multithreaded environment, this not only blocks the current thread from making progress but, if the vector is shared, may block other threads as well.

The use of shared_ptr in the implementation of containers may also result in arbitrarily long and quite unpredictable slowdowns. A destruction of a single shared_ptr might occasionally lead to a cascade of dereferences that deallocate large portions of a data structure, which may in turn trigger a bout of free list compactions within the heap (this is more evident in tree-like, branching, data structures). It’s important to keep these facts in mind when talking about performance tradeoffs, and use actual timings in choosing implementations.

List Functions

Since, as I said, a persistent list is immutable, we obviously cannot perform destructive operations on it. If we want to increment each element of a list of integers, for instance, we have to create a new list (which, by the way, doesn’t change the asymptotic behavior of such an operation). In functional languages such bulk operations are normally implemented using recursion.

You don’t see much recursion in C++ because of one problem: C++ doesn’t guarantee tail recursion optimization. In any functional language worth its salt, a recursive function that calls itself in its final step is automatically replaced by a loop. In C++, recursion consumes stack and may lead to stack overflow. So it’s the lack of guaranteed tail recursion optimization that is at the root of C++ programmers’ aversion to recursion. Of course, there are also algorithms that cannot be made tail recursive, like tree traversals, which are nevertheless implemented using recursion even in C++. One can make an argument that (balanced) tree algorithms will only use O(log(N)) amounts of stack, thus mitigating the danger of stack overflow.

List algorithms may be implemented either with recursion or with loops and iterators. I’ll leave the implementation of iterators for a persistent list to the reader — notice that only a const forward iterator or an output iterator makes sense in this case. Instead I’ll show you a few examples of recursive algorithms. They can all be rewritten using loops and iterators, but it’s interesting to see them in the purest form.

The example of incrementing each element of a list is a special case of a more general algorithm of applying a function to all elements of a list. This algorithm is usually called fmap and can be generalized to a large variety of data structures. Those parameterized data structures that support fmap are called functors (not to be confused with the common C++ misnomer for a function object). Here’s fmap for our persistent list:

template<class U, class T, class F>
List<U> fmap(F f, List<T> lst)
{
    static_assert(std::is_convertible<F, std::function<U(T)>>::value, 
                 "fmap requires a function type U(T)");
    if (lst.isEmpty()) 
        return List<U>();
    else
        return List<U>(f(lst.front()), fmap<U>(f, lst.pop_front()));
}

An astute reader will notice a similarity between fmap and the standard C++ algorithm transform in both semantics and interface. The power of the Standard Template Library can be traced back to its roots in functional programming.

The static_assert verifies that the template argument F is convertible to a function type that takes T and returns U. This way fmap may be instantiated for a function pointer, a function object (a class with an overloaded operator()), or a lambda, as long as its output type is convertible to U. Ultimately, these kinds of constraints should be expressible as concepts.

The compiler is usually able to infer type arguments for a template function by analyzing the instantiation context. Unfortunately, inferring the return type of a functional argument like F in fmap is beyond its abilities, so you are forced to specify the type of U at the call site, as in this example (also, toupper is defined to return an int rather than char):

auto charLst2 = fmap<char>(toupper, charLst);

There is a common structure to recursive functions operating on functional data structures. They usually branch on, essentially, different constructors. In the implementation of fmap, we first check for an empty list — the result of the empty constructor — otherwise we deconstruct the (head, tail) constructor. We apply the function f to the head and then recurse into the tail.

Notice that fmap produces a new list of the same shape (number and arrangement of elements) as the original list. There are also algorithms that either change the shape of the list, or produce some kind of a “total” from a list. An example of the former is filter:

template<class T, class P>
List<T> filter(P p, List<T> lst)
{
    static_assert(std::is_convertible<P, std::function<bool(T)>>::value, 
                 "filter requires a function type bool(T)");
    if (lst.isEmpty())
        return List<T>();
    if (p(lst.front()))
        return List<T>(lst.front(), filter(p, lst.pop_front()));
    else
        return filter(p, lst.pop_front());
}

Totaling a list requires some kind of running accumulator and a function to process an element of a list and “accumulate” it in the accumulator, whatever that means. We also need to define an “empty” accumulator to start with. For instance, if we want to sum up the elements of a list of integers, we’d use an integer as an accumulator, set it initially to zero, and define a function that adds an element of a list to the accumulator.

In general such accumulation may produce different results when applied left to right or right to left (although not in the case of summation). Therefore we need two such algorithms, foldl (fold left) and foldr (fold right).

The right fold first recurses into the tail of the list to produce a partial accumulator then applies the function f to the head of the list and that accumulator:

template<class T, class U, class F>
U foldr(F f, U acc, List<T> lst)
{
    static_assert(std::is_convertible<F, std::function<U(T, U)>>::value, 
                 "foldr requires a function type U(T, U)");
    if (lst.isEmpty())
        return acc;
    else
        return f(lst.front(), foldr(f, acc, lst.pop_front()));
}

Conversely, the left fold first applies f to the head of the list and the accumulator that was passed in, and then calls itself recursively with the new accumulator and the tail of the list. Notice that, unlike foldr, foldl is tail recursive.

template<class T, class U, class F>
U foldl(F f, U acc, List<T> lst)
{
    static_assert(std::is_convertible<F, std::function<U(U, T)>>::value, 
                 "foldl requires a function type U(U, T)");
    if (lst.isEmpty())
        return acc;
    else
        return foldl(f, f(acc, lst.front()), lst.pop_front());
}

Again, the STL implements a folding algorithm as well, called accumulate. I’ll leave it to the reader to figure out which fold it implements, left or right, and why.

In C++ we can have procedures that instead of (or along with) producing a return value produce side effects. We can capture this pattern with forEach:

template<class T, class F>
void forEach(List<T> lst, F f) 
{
    static_assert(std::is_convertible<F, std::function<void(T)>>::value, 
                 "forEach requires a function type void(T)");
    if (!lst.isEmpty()) {
        f(lst.front());
        forEach(lst.pop_front(), f);
    }
}

We can, for instance, use forEach to implement print:

template<class T>
void print(List<T> lst)
{
    forEach(lst, [](T v) 
    {
        std::cout << "(" << v << ") "; 
    });
    std::cout << std::endl;
}

Singly-linked list concatenation is not a cheap operation. It takes O(N) time (there are however persistent data structures that can do this in O(1) time). Here’s the recursive implementation of it:

template<class T>
List<T> concat(List<T> const & a, List<T> const & b)
{
    if (a.isEmpty())
        return b;
    return List<T>(a.front(), concat(a.pop_front(), b));
}

We can reverse a list using foldl in O(N) time. The trick is to use a new list as the accumulator:

template<class T>
List<T> reverse(List<T> const & lst)
{
    return foldl([](List<T> const & acc, T v)
    {
        return List<T>(v, acc);
    }, List<T>(), lst);
}
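
Putting a few of these functions together, here is a short usage sketch (my example, not from the original post):

List<int> lst;
for (int i = 3; i > 0; --i)
    lst = lst.push_front(i);    // builds 1, 2, 3
auto doubled = fmap<int>([](int x) { return 2 * x; }, lst);
print(doubled);                 // prints: (2) (4) (6)
print(reverse(doubled));        // prints: (6) (4) (2)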

Again, all these algorithms can be easily implemented using iteration rather than recursion. In fact, once you define (input/output) iterators for a List, you can just use the STL algorithms.
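
Here, for instance, is a minimal sketch of what such a const forward iterator might look like if added as a nested class of List (so that it can see the private Item). This is my sketch, not code from the post, and it omits the iterator_traits boilerplate a production version would need:

    class iterator
    {
    public:
        explicit iterator(Item const * p) : _p(p) {}
        T const & operator*() const { return _p->_val; }
        iterator & operator++() { _p = _p->_next.get(); return *this; }
        bool operator!=(iterator const & other) const { return _p != other._p; }
    private:
        // non-owning: the List that produced this iterator keeps the chain alive
        Item const * _p;
    };
    // in List<T>:
    iterator begin() const { return iterator(_head.get()); }
    iterator end() const { return iterator(nullptr); }

With begin and end in place, a call like std::accumulate(lst.begin(), lst.end(), 0) (from <numeric>) plays the role of a fold.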

Conclusion

A singly linked list is not the most efficient data structure in the world but it can easily be made persistent. What’s important is that a persistent list supports all the operations of a LIFO stack in constant time and is automatically thread safe. You can safely and efficiently pass such lists to and from threads without the need to synchronize (except for the internal synchronization built into shared pointers).

Complete code for this post is available on GitHub. It uses some advanced features of C++11. I compiled it with Visual Studio 2013.

Appendix: Haskell Implementation

Functional languages support persistent lists natively. But it’s not hard to implement lists explicitly. The following one-liner is one such implementation in Haskell:

data List t = Empty | Cons t (List t)

This list is parameterized by the type parameter t (just like our C++ template was parameterized by T). It has two constructors, one called Empty and the other called Cons. The latter takes two arguments: a value of type t and a List of t — the tail. These constructors can be used for both the creation of new lists and for pattern-matching. For instance, here’s the implementation of cat (the function concat is already defined in the Haskell default library so I had to use a different name):

cat Empty lst = lst
cat (Cons x tail) lst = Cons x (cat tail lst)

The selection between the empty and non-empty case is made through pattern matching on the first argument. The first line matches the pattern (constructor) Empty. The second line matches the Cons pattern and, at the same time, extracts the head and the tail of the list (extraction using pattern matching is thus much safer than calling head or tail because an empty list won’t match this pattern). It then constructs a new list with x as the head and the (recursive) concatenation of the first list’s tail with the second list. (I should mention that this recursive call is lazily evaluated in Haskell, so the cost of cat is amortized across many accesses to the list — more about lazy evaluation in the future.)


Edward Kmett’s lens library made lenses the talk of the town. This is, however, not a lens tutorial (let’s wait for Simon Peyton Jones’s upcoming intro to lenses (edit: here it is)). I’m going to concentrate on one aspect of lenses that I’ve found intriguing — the van Laarhoven representation.

Quick introduction: A lens is a data structure with a getter and a setter.

get :: b -> a
set :: b -> a -> b

Given an object of type b, the getter returns the value of type a of the substructure on which the lens is focused (say, a field of a record, or an element of a list). The setter takes the object and a new value and returns a modified object, with the substructure at the focus replaced by the new value.

A lens can also be represented as a single function that returns a Store structure. Given an object of type b, the lens returns a pair: the old value at the focus and a function to modify it. This pair represents the Store comonad (I followed the naming convention in Russell O’Connor’s paper, see bibliography):

data Store a b = Store
  { pos  :: a
  , peek :: a -> b
  }

The two elements of Store are just like getter and setter except that the initial b -> was factored out.

Here’s the unexpected turn of events: Twan van Laarhoven came up with a totally different representation for a lens. Applied to the Store comonad it looks like this:

forall g . Functor g => (a -> g a) -> g b

The Store with pos and peek is equivalent to this single polymorphic function. Notice that the polymorphism is in terms of the functor g, which means that the function may only use the functor-specific fmap, nothing else. Recall the signature of fmap — for a given g:

fmap :: (a -> b) -> g a -> g b

The only way the van Laarhoven function could produce the result g b is if it has access to a function a -> b and a value of type a (which is exactly the content of Store). It can apply the function a -> g a to this value and fmap the function a -> b over it. Russell O’Connor showed the isomorphism between the two representations of the Store comonad.

This is all hunky-dory, but what’s the theory behind this equivalence? When is a data structure equivalent to a polymorphic function? I’ve seen this pattern in the Yoneda lemma (for a refresher, see my tutorial: Understanding Yoneda). In Haskell we usually see a slightly narrower application of Yoneda. It says that a certain set of polymorphic functions (natural transformations) is equivalent to a certain set of values in the image of the functor g:

forall t . (a -> t) -> g t
   ≈ g a

Here, (a -> t) is a function, which is mapped to g t — some functor g acting on type t. This mapping is defined once for all possible types t — it’s a natural transformation — hence forall t. All such natural transformations are equivalent to (parameterized by, isomorphic to) a type (set of values) g a. I use the symbol ≈ for type isomorphism.

There definitely is a pattern, but we have to look at the application of Yoneda not to Haskell types (the Hask category) but to Haskell functors.

Yoneda on Functors

We are in luck because functors form a category. Objects in the Functor Category are functors between some underlying categories, and morphisms are natural transformations between those functors.

Here’s the Yoneda construction in the Functor Category. Let’s fix one functor f and consider all natural transformations of f to an arbitrary functor g. These transformations form a set, which is an object in the Set Category. For each choice of g we get a different set. Let’s call this mapping from functors to sets a representation, rep. This mapping, the canonical Yoneda embedding, is actually a functor. I’ll call this a second order functor, since it takes a functor as its argument. Being a functor, its action on morphisms in the Functor Category is also well-defined — morphisms being natural transformations in this case.

Now let’s consider an arbitrary mapping from functors to sets, let’s call it eta, and let’s assume that it’s also a 2nd order functor. We now have two such functors, rep and eta. The Yoneda lemma considers mappings between these 2nd order functors, but not just any mappings — natural transformations. Since those transformations map 2nd order functors, I’ll also call them 2nd order natural transformations.

What does it mean for a transformation to be natural? Technically, it means that a certain diagram commutes (see my Yoneda tutorial), but the Haskell intuition is the following: A natural transformation must be polymorphic in its argument. It treats this argument generically, without considering its peculiarities. In our case the 2nd order natural transformation will be polymorphic in its functor argument g.

The Yoneda lemma tells us that all such 2nd order natural transformations from rep to eta are in one-to-one correspondence with the elements of eta acting on f. In other words, the set eta f is what parameterizes these natural transformations.

Back to Haskell

Let’s get down to earth. We’ll specialize the general Functor Category to the category of endofunctors in Hask and replace the Set Category with Hask (a category of Haskell types, which are treated as sets of values).

Elements of the Functor Category will be represented in Haskell through the Functor class. If f and g are two functors, a natural transformation between them can be defined pointwise (i.e., acting on actual types) as a polymorphic function:

forall t . f t -> g t

Another way of looking at this formula is that it describes a mapping from an arbitrary functor g to the Haskell type forall t . f t -> g t, where f is some fixed functor. This mapping is the Yoneda embedding we were talking about in the previous section. We’ll call it RepF:

{-# LANGUAGE Rank2Types #-}

type RepF f g = (Functor f, Functor g) 
    => forall t . f t -> g t

The second ingredient we need is the mapping Eta that, just like RepF, maps functors to types. Since the kind of the functor g is * -> *, the kind of Eta is (* -> *) -> *.

type family Eta :: (* -> *) -> *

Putting all this together, the Yoneda lemma tells us that the following types are equivalent:

{-# LANGUAGE Rank2Types, TypeFamilies #-}

type family Eta :: (* -> *) -> *

type NatF f = Functor f  
    => forall g . Functor g 
    => forall t. (f t -> g t) -> Eta g

-- equivalent to 

type NatF' f = Functor f => Eta f

There are many mappings that we can substitute for Eta, but I’d like to concentrate on the simplest nontrivial one, parameterized by some type b. The action of this EtaB on a functor g is defined by the application of g to b.

{-# LANGUAGE Rank2Types #-}

type EtaB b g = Functor g => g b

Now let’s consider 2nd order natural transformations between RepF and EtaB or, more precisely, between RepF f and EtaB b for some fixed f and b. These transformations must be polymorphic in g:

type NatFB f b = Functor f  
    => forall g . Functor g 
    => forall t. (f t -> g t) -> EtaB b g

This can be further simplified by applying EtaB to its arguments:

type NatFB f b = Functor f  
    => forall g . Functor g 
    => forall t. (f t -> g t) -> g b

The final step is to apply the Yoneda lemma which, in this case, tells us that the above type is equivalent to the type obtained by acting with EtaB b on f. This type is simply f b.

Do you see how close we are to the van Laarhoven equivalence?

forall g . Functor g => (a -> g a) -> g b
    ≈ (a, a -> b)

We just need to find the right f. But before we do that, one little exercise to get some familiarity with the Yoneda lemma for functors.

Undoing fmap

Here’s an interesting choice for the functor f — function application. For a given type a the application (->) a is a functor. Indeed, it’s easy to implement fmap for it:

instance Functor ((->) a) where
    -- fmap :: (t -> u) -> (a -> t) -> (a -> u)
    fmap f g = f . g

Let’s plug this functor in the place of f in our version of Yoneda:

type NatApA a b =   
    forall g . Functor g 
    => forall t. ((a -> t) -> g t) -> g b

NatApA a b ≈ a -> b

Here, f t was replaced by a -> t and f b by a -> b. Now let’s dig into this part of the formula:

forall t. ((a -> t) -> g t)

Isn’t this just the left hand side of the regular Yoneda? It’s a natural transformation between the Yoneda embedding functor (->) a and some functor g. The lemma tells us that this is equivalent to the type g a. So let’s make the substitution:

type NatApA a b =   
    forall g . Functor g 
    => g a -> g b

NatApA a b ≈ a -> b

On the one hand we have a function g a -> g b which maps types lifted by the functor g. If this function is polymorphic in the functor g, then it is equivalent to a function a -> b. This equivalence shows that fmap can go both ways. In fact it’s easy to show the isomorphism of the two types directly. Of course, given a function a -> b and any functor g, we can construct a function g a -> g b by applying fmap. Conversely, if we have a function g a -> g b that works for any functor g, then we can use it with the trivial identity functor Identity and recover a -> b.

So this is not a big deal, but the surprise for me was that it followed from the Yoneda lemma.

The Store Comonad and Yoneda

After this warmup exercise, I’m ready to unveil the functor f that, when plugged into the Yoneda lemma, will generate the van Laarhoven equivalence. This functor is:

Product (Const a) ((->) a)

When acting on any type b, it produces:

(Const a b, a -> b)

The Const functor ignores its second argument and is equivalent to its first argument:

newtype Const a b = Const { getConst :: a }

So the right hand side of the Yoneda lemma is equivalent to the Store comonad (a, a -> b).

Let’s look at the left hand side of the Yoneda lemma:

type NatFB f b = Functor f  
    => forall g . Functor g 
    => forall t. (f t -> g t) -> g b

and do the substitution:

forall g . Functor g 
  => forall t. ((a, a -> t) -> g t) -> g b

Here’s the curried version of the function in parentheses:

forall t. (a -> (a -> t) -> g t)

Since the first argument a doesn’t depend on t, we can move it in front of forall:

a -> forall t . (a -> t) -> g t

We are now free to apply the 1st order Yoneda

forall t . (a -> t) -> g t ≈ g a

Which gives us:

a -> forall t . (a -> t) -> g t ≈ a -> g a

Substituting it back to our formula, we get:

forall g . Functor g => (a -> g a) -> g b
  ≈ (a, a -> b)

Which is exactly the van Laarhoven equivalence.

Conclusion

I have shown that the equivalence of the two formulations of the Store comonad and, consequently, the Lens, follows from the Yoneda lemma applied to the Functor Category. This equivalence is a special case of the more general formula for functor-polymorphic representations:

type family Eta :: (* -> *) -> *

Functor f  
    => forall g . Functor g 
    => forall t. (f t -> g t) -> Eta g
  ≈ Functor f => Eta f

This formula is parameterized by two entities: a functor f and a 2nd order functor Eta.

In this article I restricted myself to one particular 2nd order functor, EtaB b, but it would be interesting to see if more complex Etas lead to more interesting results.

In the choice of the functor f I also restricted myself to just a few simple examples. It would be interesting to try, for instance, functors that generate recursive data structures and try to reproduce some of Russell O’Connor’s biplate and multiplate results.

Appendix: Another Exercise

Just for the fun of it, let’s try substituting Const a for f:

The naive substitution would give us this:

forall g . Functor g 
    => (forall t . Const a t -> g t) -> g b 
  ≈ Const a b

But the natural transformation on the left:

forall t . Const a t -> g t

cannot be defined for all gs. Const a t doesn’t depend on t, so the co-domain of the natural transformation can only include functors that don’t depend on their argument — constant functors. So we are really only looking for gs that are Const c for any choice of c. All the allowed variation in g can be parameterized by type c:

forall c . (Const a t -> Const c t) -> Const c b 
  ≈ Const a b

If you remove the Const noise, the conclusion turns out to be pretty trivial:

forall c . (a -> c) -> c ≈ a

It says that a polymorphic function that takes a function a -> c and returns a c must have some fixed a inside. This is pretty obvious, but it’s nice to know that it can be derived from the Yoneda lemma.

Acknowledgment

Special thanks go to the folks on the Lens IRC channel for correcting my Haskell errors and for helpful advice.

Update

After I finished writing this blog, I contacted Russell O’Connor asking him for comments. It turned out that he and Mauro Jaskelioff had been working on a paper in which they independently came up with almost exactly the same derivation of the van Laarhoven representation for the lens starting from the Yoneda lemma. They later published the paper, A Representation Theorem for Second-Order Functionals, including a link to this blog. It contains a much more rigorous proof of the equivalence.

Bibliography

  1. Twan van Laarhoven, [CPS based functional references](http://twanvl.nl/blog/haskell/cps-functional-references)
  2. Russell O’Connor, [Functor is to Lens as Applicative is to Biplate](http://arxiv.org/pdf/1103.2841v2.pdf)

I’ve been looking for a good analogy of what programming in C++ feels like and I remembered this 1990 Tim Burton movie, Edward Scissorhands.

It’s a darker version of Pinocchio, shot in suburban settings. In the movie’s poster, the scary guy (Johnny Depp) is trying to gently hug Winona Ryder, but his clumsy scissor-hands are making it very dangerous for both of them. His face is already covered with deep scars.

Having scissors for hands is not all that bad. Edward has many talents: he can, for instance, create stunning dog hairdos.

I often have these kinds of thoughts after attending C++ conferences: this time it was Going Native 2013. The previous year, the excitement was all about the shiny new C++11 Standard. This year it was more of a reality check. Don’t get me wrong — there were many stunning dog hairdos on display (I mean C++ code that was elegant and simple) but the bulk of the conference was about how to avoid mutilation and how to deliver first aid in case of accidental amputation.

Little Shop of Horrors

There was so much talk about how not to use C++ that it occurred to me that maybe this wasn’t the problem of incompetent programmers, but that straightforward C++ is plain wrong. So if you just learn the primitives of the language and try to use them, you’re doomed.

C++ has an excuse for that: backward compatibility — in particular compatibility with C. You might think of the C subset of C++ as a bona fide assembly language which you shouldn’t use in day-to-day programming, except that it’s right there on the surface. If you reach blindly into your C++ toolbox, you’re likely to come up with naked pointers, for loops, and all this ugly stuff.

A well-known example of what not to do is to use malloc to dynamically allocate memory, and free to deallocate it. malloc takes a count of bytes and returns a void pointer, which you have to cast to something more usable — it would be hard to come up with a worse API for memory management. Here’s an example of really bad (but almost correct, if it weren’t for the possibility of null pointer dereference) code:

struct Pod {
    int count;
    int * counters;
};

int n = 10;
Pod * pod = (Pod *) malloc (sizeof(Pod));
pod->count = n;
pod->counters = (int *) malloc (n * sizeof(int));
...
free (pod->counters);
free (pod);

Hopefully, nobody writes code like this in C++, although I’m sure there are a lot of legacy apps with such constructs, so don’t laugh.

C++ “solved” the problem of redundant casting and error-prone size calculations by replacing malloc and free with new and delete. The corrected C++ version of the code above would be:

struct Pod {
    int count;
    int * counters;
};

int n = 10;
Pod * pod = new Pod;
pod->count = n;
pod->counters = new int [n];
...
delete [] pod->counters;
delete pod;

BTW, the null pointer dereference problem is solved too, because new will throw an exception when the system runs out of memory. There is still a slight chance of a memory leak if the second new fails (But how often does that happen? Hint: how big can n get?) So here’s the really correct version of the code:

class Snd { // Sophisticated New Data
public:
    Snd (int n) : _count(n), _counters(new int [n]) {}
    ~Snd () { delete [] _counters; }
private:
    int _count;
    int * _counters;
};

Snd * snd = new Snd (10);
...
delete snd;

Are we done yet? Of course not! The code is not exception safe.

The C++ lore is that you should avoid naked pointers, avoid arrays, avoid delete. So the remedy for the lameness of malloc is operator new, which is also broken because it returns a dangerous pointer and pointers are bad.

We all know (and have scars on our faces to prove it) that you should use the Standard Library containers and smart pointers whenever possible. Oh, and use value semantics for passing things around. No wait! Value semantics comes with a performance penalty because of excessive copying. So what about shared_ptr and vectors of shared_ptr? But that adds the overhead of reference counting! No, here’s a new idea: move semantics and rvalue references.
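
To make that concrete, here is a sketch of how the earlier Snd example looks once it follows that advice (my code, not from the post): std::vector owns the buffer, so the destructor, copying, and exception safety come for free.

#include <vector>

class Snd { // Sophisticated New Data, lore-compliant edition
public:
    explicit Snd(int n) : _counters(n) {}  // vector value-initializes n ints
private:
    std::vector<int> _counters;            // replaces both _count and the raw array
};

Snd snd(10);   // value semantics: no new, no delete, nothing to leak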

I can go on and on like this (and I often do!). Do you see the pattern? Every remedy breeds another remedy. It’s no longer just the C subset that should be avoided. Every new language feature or library addition comes with a new series of gotchas. And you know a new feature is badly designed if Scott Meyers has a talk about it. (His latest was about the pitfalls of, you guessed it, move semantics.)

The Philosophy of C++

Bjarne Stroustrup keeps stressing how important backward compatibility is for C++. It’s one of the pillars of the C++ philosophy. Considering how much legacy code there is, it makes perfect sense. Compatibility, though, takes a very heavy toll on the evolution of the language. If nature were as serious about backward compatibility as C++ is, humans would still have tails, gills, flippers, antennae, and external skeletons on top of internal ones — they all made sense at some point in the evolution.

C++ has become an extremely complex language. There are countless ways of doing the same thing — almost all of them either plain wrong, dangerous, unmaintainable, or all of the above. The problem is that most code compiles and even runs. The mistakes and shortcomings are discovered much later, often after the product has been released.

You might say: Well, that’s the nature of programming. If you think so, you should seriously look at Haskell. Your first reaction will be: I don’t know how to implement the first thing (other than the factorial and Fibonacci numbers) in this extremely restrictive language. This is totally different from the C++ experience, where you can start hacking from day one. What you don’t realize is that it will take you 10 years, if you’re lucky, to discover the “right way” of programming in C++ (if there even is such a thing). And guess what, the better a C++ programmer you are, the more functional your programs look. Ask any C++ guru and they will tell you: avoid mutation, avoid side effects, don’t use loops, avoid class hierarchies and inheritance. But you will need strict discipline and total control over your collaborators to pull that off, because C++ is so permissive.

Haskell is not permissive, it won’t let you — or your coworkers — write unsafe code. Yes, initially you’ll be scratching your head trying to implement something in Haskell that you could hack in C++ in 10 minutes. If you’re lucky, and you work for Sean Parent or another exceptional programmer, he will code review your hacks and show you how not to program in C++. Otherwise, you might be kept in the dark for decades, accumulating self-inflicted wounds and dreaming of dog hairdos.

Resource Management

I started this post with examples of resource management (strictly speaking, memory management), because this is one of my personal favorites. I’ve been advocating and writing about it since the nineties (see the bibliography at the end). Obviously I have failed, because 20 years later resource management techniques are still not universally known. Bjarne Stroustrup felt obliged to spend half of his opening talk explaining resource management to the crowd of advanced C++ programmers. Again, one could blame incompetent programmers for not accepting resource management as the foundation of C++ programming. The problem though is that there is nothing in the language that would tell a programmer that something is amiss in the code I listed at the beginning of this post. In fact, learning the correct techniques often feels like learning a new language.

Why is it so hard? Because in C++ the bulk of resource management is memory management. In fact it has to be stressed repeatedly that garbage collection would not solve the problem of managing resources: There will always be file handles, window handles, open databases and transactions, etc. These are important resources, but their management is overshadowed by the tedium of memory management. The reason C++ doesn’t have garbage collection is not because it can’t be done in an efficient way, but because C++ itself is hostile to GC. The compiler and the runtime have to always assume the worst — not only that any pointer can alias any other pointer but that a memory address can be stored as an integer or its lower bits could be used as bitfields (that’s why only conservative garbage collectors are considered for C++).

It’s a common but false belief that reference counting (using shared pointers in particular) is better than garbage collection. There is actual research showing that the two approaches are just two sides of the same coin. You should realize that deleting a shared pointer may lead to an arbitrarily long pause in program execution, with performance characteristics similar to those of a garbage sweep. It’s not only because every serious reference counting algorithm must be able to deal with cycles, but also because every time a reference count goes to zero on a piece of data, a whole graph of pointers reachable from that object has to be traversed. A data structure built with shared pointers might take a long time to delete and, except for simple cases, you’ll never know which shared pointer will go out of scope last and trigger it.

Careful resource management and sparing use of shared_ptr might still be defensible for single-threaded programs, but the moment you start using concurrency, you’re in big trouble. Every increment or decrement of the counter requires locking! This locking is usually implemented with atomic variables, but so are mutexes! Don’t be fooled: accessing atomic variables is expensive. Which brings me to the central problem with C++.

Concurrency and Parallelism

It’s been 8 years since Herb Sutter famously exclaimed: The Free Lunch is Over! Ever since then the big C++ oil tanker has been slowly changing its course. It’s not like concurrency was invented in 2005. POSIX threads were defined in 1995. Microsoft introduced threads in Windows 95 and multiprocessor support in Windows NT. Still, concurrency was only acknowledged in the C++ Standard in 2011.

C++11 had to start from scratch. It had to define the memory model: when and in what order memory writes from multiple threads become visible to other threads. For all practical purposes, the C++ memory model was copied from Java (minus some controversial guarantees that Java made about behavior under data races). In a nutshell, C++ programs are sequentially consistent if there are no data races. However, since C++ had to compete with the assembly language, the full memory model includes so called weak atomics, which I would describe as portable data races, and recommend staying away from.
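
As a minimal illustration (my example, not from the post): with the default sequentially consistent atomics, the program below is data-race free and the assertion cannot fire; switch the store and load to memory_order_relaxed (the “weak atomics”) and the reader is allowed to see data before it sees ready.

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> ready{false};
int data = 0;

void writer()
{
    data = 42;              // ordinary write...
    ready.store(true);      // ...published by a sequentially consistent store
}

void reader()
{
    while (!ready.load())   // this load synchronizes with the store above
        ;
    assert(data == 42);     // holds: no data race, sequential consistency applies
}

int main()
{
    std::thread t1(writer);
    std::thread t2(reader);
    t1.join();
    t2.join();
}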

C++11 also defined primitives for thread creation and management, and basic synchronization primitives as defined by Dijkstra and Hoare back in the 1960s, such as mutexes and condition variables. One could argue whether these are indeed the right building blocks for synchronization, but maybe that doesn’t really matter because they are known not to be composable anyway. The composable abstraction for synchronization is STM (Software Transactional Memory), which is hard to implement correctly and efficiently in an imperative language. There is an STM study group in the Standards Committee, so there is a chance it might one day become part of the Standard. But because C++ offers no control over effects, it will be very hard to use properly.

There was also a misguided and confusing attempt at providing support for task-based parallelism with async tasks and non-composable futures (both seriously considered for deprecation in C++14). Thread-local variables were also standardized, making the task-based approach that much harder. Locks and condition variables are also tied to threads, not tasks. So that was pretty much a disaster. The Standards Committee has its work cut out for many years ahead. That includes task-based composable parallelism, communication channels to replace futures (one would hope), task cancellation and, probably longer term, data-driven parallelism, including GPU support. A derivative of Microsoft PPL and Intel TBB should become part of the Standard (hopefully not Microsoft AMP).

Let’s take a great leap of faith and assume that all these things will be standardized and implemented by, say, 2015. Even if that happens, I still don’t think people will be able to use C++ for mainstream parallel programming. C++ has been designed for single-threaded programming, and parallel programming requires a revolutionary rather than evolutionary change. Two words: data races. Imperative languages offer no protection against data races — maybe with the exception of D.

In C++, data is shared between threads by default, is mutable by default, and functions have side effects almost by default. All those pointers and references create fertile grounds for data races, and the vulnerability of data structures and functions to races is in no way reflected in the type system. In C++, even if you have a const reference to an object, there is no guarantee that another thread won’t modify it. Still worse, any references inside a const object are mutable by default.
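
Here is a small single-threaded illustration of that last point (my example, not from the post); spread the two aliases across two threads and you have exactly the data race described above:

#include <cassert>

struct Counter { int n = 0; };

void observe(Counter const & c, Counter & alias)
{
    int before = c.n;            // c is a const view of the object...
    alias.n += 1;                // ...yet the object changes through another alias
    assert(c.n == before + 1);   // the "const" data moved under our feet
}

int main()
{
    Counter counter;
    observe(counter, counter);   // both parameters refer to the same object
}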

D at least has the notion of deep constness and immutability (no thread can change an immutable data structure). Another nod towards concurrency from D is the ability to define pure functions. Also, in D, mutable objects are not shared between threads by default. It is a step in the right direction, even though it imposes a runtime cost on shared objects. Most importantly though, threads are not a good abstraction for parallel programming, so this approach won’t work with lightweight tasks and work-stealing queues, where tasks are passed between threads.

But C++ doesn’t support any of this and it doesn’t look like it ever will.

Of course, you might recognize all these pro-concurrency and parallelism features as functional programming — immutability and pure functions in particular. At the risk of sounding repetitive: Haskell is way ahead of the curve with respect to parallelism, including GPU programming. That was the reason I so easily converted to Haskell after years of evangelizing good programming practices in C++. Every programmer who’s serious about concurrency and parallelism should learn enough Haskell to understand how it deals with it. There is an excellent book by Simon Marlow, Parallel and Concurrent Programming in Haskell. After you read it, you will either start using functional techniques in your C++ programming, or realize what an impedance mismatch there is between parallel programming and an imperative language, and you will switch to Haskell.

Conclusions

I believe that the C++ language and its philosophy are in direct conflict with the requirements of parallel programming. This conflict is responsible for the very slow uptake of parallel programming in mainstream software development. The power of multicore processors, vector units, and GPUs is being squandered by the industry because of an obsolete programming paradigm.

Bibliography

Here I put together some of my publications about resource management:

  1. Bartosz Milewski, “Resource Management in C++,” Journal of Object Oriented Programming, March/April 1997, Vol. 10, No 1. p. 14-22. This is still pre-unique_ptr, so I’m using auto_ptr for what it’s worth. Since you can’t have vectors of auto_ptr I implemented an auto_vector.
  2. C++ Report in September 1998 and February 1999 (still using auto_ptr).
  3. C++ in Action (still auto_ptr), Addison Wesley 2001. See an excerpt from this book that talks about resource management.
  4. Walking Down Memory Lane, with Andrei Alexandrescu, CUJ October 2005 (using unique_ptr)
  5. unique_ptr–How Unique is it?, WordPress, 2009

Here are some of my blogs criticizing the C++11 approach to concurrency:

  1. Async Tasks in C++11: Not Quite There Yet
  2. Broken promises–C++0x futures

This is my 100th WordPress post, so I decided to pull out all the stops and go into some crazy stuff where hard math and hard physics mix freely with wild speculation. I hope you will enjoy reading it as much as I enjoyed writing it.

It’s a HoTT Summer of 2013

One of my current activities is reading the new book, Homotopy Type Theory (HoTT) that promises to revolutionize the foundations of mathematics in a way that’s close to the heart of a programmer. It talks about types in the familiar sense: Booleans, natural numbers, (polymorphic) function types, tuples, discriminated unions, etc.

As do previous type theories, HoTT assumes the Curry-Howard isomorphism that establishes the correspondence between logic and type theory. The gist of it is that any theorem can be translated into a definition of a type; and its proof is equivalent to producing a value of that type (false theorems correspond to uninhabited types that have no elements). Such proofs are by necessity constructive: you actually have to construct a value to prove a theorem. None of this “if it didn’t exist then it would lead to contradictions” negativity that is shunned by intuitionistic logicians. (HoTT doesn’t constrain itself to intuitionistic logic — too many important theorems of mathematics rely on non-constructive proofs of existence — but it clearly delineates its non-intuitionistic parts.)
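
Here is the Curry-Howard correspondence in miniature, as Haskell code (a sketch; the names proofK and absurd are mine):

{-# LANGUAGE EmptyCase #-}

-- The "theorem" A implies (B implies A) becomes the type a -> b -> a;
-- its proof is any program of that type.
proofK :: a -> b -> a
proofK a _ = a

-- A false theorem corresponds to an uninhabited type: Void has no values,
-- so no honest, total program can produce one.
data Void

-- "From falsehood anything follows": we never have to produce a result,
-- because there is no Void value to pattern-match on.
absurd :: Void -> a
absurd v = case v of {}

main :: IO ()
main = putStrLn (proofK "a witness of a -> b -> a" ())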

Type theory has been around for some time, and several languages and theorem provers have been implemented on the base of the Curry-Howard isomorphism: Agda and Coq being common examples. So what’s new?

Set Theory Rant

Here’s the problem: Traditional type theory is based on set theory. A type is defined as a set of values. Bool is a two-element set {True, False}. Char is a set of all (Unicode) characters. String is an infinite set of all lists of characters. And so on. In fact all of traditional mathematics is based on set theory. It’s the “assembly language” of mathematics. And it’s not a very good assembly language.

First of all, the naive formulation of set theory suffers from paradoxes. One such paradox, called Russell’s paradox, is about sets that are members of themselves. A “normal” set is not a member of itself: a set of dogs is not a dog. But a set of all non-dogs is a member of itself — it’s “abnormal”. The question is: Is the set of all “normal” sets normal or abnormal? If it’s normal, then it’s a member of the set of normal sets, right? Oops! That would make it abnormal. So maybe it’s abnormal, that is, not a member of the set of normal sets. Oops! That would make it normal. That just shows you that our natural intuitions about sets can lead us astray.

Fortunately there is an axiomatic set theory called the Zermelo–Fraenkel (or ZF) theory, which avoids such paradoxes. There are actually two versions of this theory: with or without the Axiom of Choice. The version without it seems to be too weak (not every vector space has a basis, the product of compact sets isn’t necessarily compact, etc.); the one with it (called ZFC) leads to weird non-intuitive consequences.

What bothers many mathematicians is that proofs that are based on set theory are rarely formal enough. It’s not that they can’t be made formal, it’s just that formalizing them would be so tedious that nobody wants to do it. Also, when you base any theory on set theory, you can formulate lots of idiotic theorems that have nothing to do with the theory in question but are only relevant to its clunky set-theoretical plumbing. It’s like the assembly language leaking out from higher abstractions. Sort of like programming in C.

Donuts are Tastier than Sets

Tired of all this nonsense with set theory, a group of Princeton guys and their guests decided to forget about sets and start from scratch. Their choice for the foundation of mathematics was the theory of homotopy. Homotopy is about paths — continuous maps from real numbers between 0 and 1 to topological spaces; and continuous deformations of such paths. The properties of paths capture the essential topological properties of spaces. For instance, if there is no path between a and b, it means that the space is not connected — it has at least two disjoint components — a sits in one and b in another.

Two paths from a to b that cannot be continuously deformed into each other

If two paths between a and b cannot be deformed into each other, it means that there is a hole in space between them.

Obviously, this “traditional” formulation of homotopy relies heavily on set theory. A topological space, for instance, is defined in terms of open sets. So the first step is to distill the essence of homotopy theory by getting rid of sets. Enter Homotopy Type Theory. Paths and their deformations become primitives in the theory. We still get to use our intuitions about paths as curves inscribed on surfaces, but otherwise the math is totally abstract. There is a small set of axioms, the basic one asserting that the statement that a and b are equivalent is equivalent to the statement that they are equal. Of course the notions of equivalence and equality have special meanings and are very well defined in terms of primitives.

Cultural Digression

Why homotopy? I have my own theory about it. Our mathematics has roots in Ancient Greece, and the Greeks were not interested in developing technology because they had very cheap labor — slaves.

Euclid explaining geometry to his pupils in Raphael’s School of Athens

Instead, like all agricultural societies before them (Mesopotamia, Egypt), they were into owning land. Land owners are interested in geometry — Greek word γεωμετρία literally means measuring Earth. The “computers” of geometry were the slate, ruler and compass. Unlike technology, the science of measuring plots of land was generously subsidized by feudal societies. This is why the first rigorous mathematical theory was Euclid’s geometry, which happened to be based on axioms and logic. Euclid’s methodology culminated in the 20th century in Hilbert’s program of axiomatization of the whole of mathematics. This program crashed and burned when Gödel proved that any non-trivial theory (one containing arithmetic) is chock full of non-decidable theorems.

I was always wondering what mathematics would be like if it were invented by an industrial, rather than agricultural, society. The “computer” of an industrial society is the slide rule, which uses (the approximation of) real numbers and logarithms. What if Newton and Leibniz never studied Euclid? Would mathematics be based on calculus rather than geometry? Calculus is not easy to axiomatize, so we’d have to wait for the Euclid of calculus for a long time. The basic notions of calculus are Banach spaces, topology, and continuity. Topology and continuity happen to form the basis of homotopy theory as well. So if Greeks were an industrial society they could have treated homotopy as more basic than geometry. Geometry would then be discovered not by dividing plots of land but by studying solutions to analytic equations. Instead of defining a circle as a set of points equidistant from the center, as Euclid did, we would first define it as a solution to the equation x² + y² = r².

Now imagine that this hypothetical industrial society also skipped the hunter-gatherer phase of development. That’s the period that gave birth to counting and natural numbers. I know it’s a stretch of imagination worthy of a nerdy science fiction novel, but think of a society that would evolve from industrial robots if they were abandoned by humanity in a distant star system. Such a society could discover natural numbers by studying the topology of manifolds that are solutions to n-dimensional equations. The number of holes in a manifold is always a natural number. You can’t have half a hole!

Instead of counting apples (or metal bolts) they would consider the homotopy of the two-apple space: Not all points in that space can be connected by continuous paths.

There is no continuous path between a and b

Maybe in a world where homotopy were the basis of all mathematics, Andrew Wiles’s proof of Fermat’s Last Theorem could fit in the margin of a book — as long as it were a book on cohomology and elliptic curves (some of the areas of mathematics Wiles used in his proof). Prime numbers would probably be discovered by studying the zeros of the Riemann zeta function.

Industrial robot explaining to its pupils the homotopy of a two-apple space.

Quantum Digression

If our industrial robots were very tiny and explored the world at the quantum level (nanorobots?), they might try counting particles instead of apples. But in quantum mechanics, a two-particle state is not a direct product of two one-particle states. Two particles share the same wave function. In some cases this function can be factorized when particles are far apart, in others it can’t, giving rise to quantum entanglement. In the quantum world, 2 is not always equal to 1+1.

In Quantum Field Theory (QFT — the relativistic counterpart of Quantum Mechanics), physicists calculate the so-called S-matrix, which describes idealized experiments in which particles are far away from each other in the initial and final states. Since they don’t interact, they can be approximated by single-particle states. For instance, you can start with a proton and an antiproton coming at each other from opposite directions. They can be approximated as two separate particles. Then they smash into each other, produce a large multi-particle mess that escapes from the interaction region and is eventually seen as (approximately) separate particles by a big underground detector. (That’s, for instance, how the Higgs boson was discovered.) The number of particles in the final state may very well be different from the number of particles in the initial state. In general, QFT does not preserve the number of particles. There is no such conservation law.

Counting particles is very different from counting apples.

Relaxing Equality

In traditional mathematics, the notions of isomorphism and equality are very different. Isomorphism means (in Greek, literally) that things have the same shape, but aren’t necessarily equal. And yet mathematicians often treat isomorphic things as if they were equal. They prove a property of one thing and then assume that this property is also true for all things isomorphic. And it usually is, but nobody has the patience to prove it on a case-by-case basis. This phenomenon even has its own name: abuse of notation. It’s like writing programs in a language in which equality ‘==’ does not translate into the assembly-language CMP instruction followed by a conditional jump. We would like to work with structural identity, but all we do is compare pointers. You can overload operator ‘==’ in C++ but many a bug was the result of comparing pointers instead of values.

How can we make isomorphism more like equality? HoTT took quite an unusual approach by relaxing equality enough to make it plausibly equivalent to isomorphism.

HoTT’s homotopic version of equality is this: Two things are equal if there is a path between them. This equality is reflexive, symmetric, and transitive, just like equality is supposed to be. Reflexivity, for instance, tells us that x=x, and indeed there is always a trivial (constant) path from a point to itself. But there could also be other non-trivial paths looping from the point to itself. Some of them might not even be contractible. They all contribute to equality x=x.

There could be several paths between different points, a and b, making them “equal”: a=b. We are tempted to say that in this picture equality is a set of paths between points. Well, not exactly a set but the next best thing to a set — a type. So equality is a type, often called “identity type”, and two things are equal if the “identity type” for them is inhabited. That’s a very peculiar way to define equality. It’s an equality that carries with it a witness — a construction of an element of the equality type.
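
Haskell can't express honest identity types with higher paths, but its type-equality witness gives a taste of "equality that carries a witness." Here's a minimal sketch, essentially what Data.Type.Equality provides, written out by hand:

{-# LANGUAGE GADTs, TypeOperators #-}

-- The type a :~: b plays the role of an "identity type": it is inhabited
-- only when a and b are the same type, and Refl is the witness.
data a :~: b where
  Refl :: a :~: a

-- Given a witness of equality we can transport a value from one (equal)
-- type to the other (a baby version of transporting propositions).
castWith :: a :~: b -> a -> b
castWith Refl x = x

intIsInt :: Int :~: Int
intIsInt = Refl

main :: IO ()
main = print (castWith intIsInt 42)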

Relaxing Isomorphism

The first thing we could justifiably expect from any definition of equality is that if two things are equal they should be isomorphic. In other words, there should be an invertible function that maps one equal thing to another equal thing. This sounds pretty obvious until you realize that, since equality is relaxed, it’s not! In fact we can’t prove strict isomorphism between things that are homotopically equal. But we do get a slightly relaxed version of isomorphism called equivalence. In HoTT, if things are equal they are equivalent. Phew, that’s a relief!

The trick is going the other way: Are equivalent things equal? In traditional mathematics that would be blatantly wrong — there are many isomorphic objects that are not equal. But with HoTT’s notion of equality, there is nothing that would contradict it. In fact, the statement that equivalence is equivalent to equality can be added to HoTT as an axiom. It’s called Voevodsky’s axiom of univalence.

It’s hard (or at least tedious), in traditional math, to prove that properties (propositions) can be carried over isomorphisms. With univalence, equivalence (generalized isomorphism) is the same as equality, and one can prove once and for all that propositions can be transported between equal objects. With univalence, the tedium of proving that if one object has a given property then all equivalent (“isomorphic”) object have the same property is eliminated.

Incidentally, where do types live? Is there (ahem!) a set of all types? There’s something better! A type of types, called a Universe. Since a Universe is a type, is it a member of itself? You can almost see Russell’s paradox looming in the background. But don’t despair: a Universe is not a member of itself, it’s a member of a higher Universe. In fact there are infinitely many Universes, each being a member of the next one.

Taking Roots

How does relaxed equality differ from the set-theoretical one? The simplest such example is the equality of the Bool type with itself. There are two ways Bool can be equal to itself: one maps True to True and False to False, the other maps True to False and False to True. The first one is the identity mapping; the second one is not — but its square is! (Apply this mapping twice and you get back to the original.) Within HoTT you can take the square root of identity!
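
In Haskell terms (a loose analogy, since Haskell functions are not HoTT paths), the two invertible self-maps of Bool are id and not, and only the square of the flip is the identity:

-- The two invertible maps from Bool to itself: the identity and the flip.
flipB :: Bool -> Bool
flipB = not

main :: IO ()
main = do
  print (map flipB [True, False])            -- [False,True]: not the identity...
  print (map (flipB . flipB) [True, False])  -- [True,False]: ...but its square is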

So here’s an interesting intuition for you: HoTT is to set theory as complex numbers are to real numbers (in complex numbers you can take a square root of -1). Paradoxically, complex numbers make a lot of things simpler. For instance, all quadratic equations are suddenly solvable. Sine and cosine become two projections of the same complex exponential. Riemann’s zeta function gains very interesting zeros on the critical line. The hope is that switching from sets to homotopy will lead to similar simplifications.

I like the example with flipping Booleans because it reminds me of an interesting quantum phenomenon. Imagine a quantum state with two identical particles. What happens when you switch the particles? If you get exactly the same state, the particles are called bosons (think photons). If you don’t, they are called fermions (think electrons). But when you flip fermions twice, you get back to the original state. In many ways fermions behave like square roots of bosons. For instance their equation of motion (Dirac equation) when squared produces the bosonic equation of motion (Klein-Gordon equation).

Computers Hate Sets

There is another way HoTT is better than set theory. (And, in my cultural analogy, it becomes more pertinent when an industrial society transitions to a computer society.) There is no good way to represent sets on a computer. Data structures that model sets are all fake. They always put some restrictions on the type of elements they can store. For instance, the elements must be comparable, or hashable, or something. Even the simplest set of just two elements is implemented as an ordered pair — a set has no notion of a first or a second element (and in fact the definition of an ordered pair as a set is quite tricky). You can easily write a program in Haskell that takes a (potentially infinite) list of pairs and picks one element from each pair to form a (potentially infinite) list of picks. You can, for instance, tell the computer to pick the left element from each pair. Replace lists of pairs with sets of sets and you can’t do it! There is no constructive way of creating such a set and its very existence hinges on the axiom of choice.
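
The list version of that choice really is a constructive one-liner (the names pairs and leftPicks are mine):

-- An infinite list of pairs...
pairs :: [(Integer, Integer)]
pairs = [(n, negate n) | n <- [1 ..]]

-- ...and a "choice function" that picks the left element of every pair.
-- No axiom of choice needed; it's just map fst.
leftPicks :: [Integer]
leftPicks = map fst pairs

main :: IO ()
main = print (take 5 leftPicks)  -- [1,2,3,4,5]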

This fact alone convinces me that set theory is not the best foundation for the theory of computing. But is homotopy a better assembly language for computing? We can’t represent sets using digital computers, but can we represent homotopy? Or should we start building computers from donuts and rubber strings? Maybe if we keep miniaturizing our computers down to the Planck scale, we could find a way to do calculations using loop quantum gravity, if it ever pans out.

Aharonov-Bohm Experiment

Even without invoking quantum gravity, quantum mechanics exhibits a lot of interesting non-local behaviors that often probe the topological properties of the surroundings. For instance, in the classic double-slit experiment, the fact that there are paths between the source of electrons and the screen that are not homotopically equivalent makes the electrons produce an interference pattern. But my favorite example is the Aharonov-Bohm experiment.

First, let me explain what a magnetic potential is. One of Maxwell’s equations states that the divergence of the magnetic field is always zero (see the Tidbit at the end of this post that explains this notation):

∇·B = 0

This is the reason why magnetic field lines are always continuous. Interestingly, this equation has a solution that follows from the observation that the divergence of a curl is zero. So we can represent the magnetic field as the curl of some other vector field, which is called the magnetic potential A:

B = ∇×A

It’s just a clever mathematical trick. There is no way to measure magnetic potential, and the solution isn’t even unique: you can add a gradient of any scalar field to it and you’ll get the same magnetic field (the curl of a gradient is zero). So A is totally fake, it exists only as a device to simplify some calculations. Or is it…?

It turns out that electrons can sniff the magnetic potential, but only if there’s a hole in space. If you have a thin (almost) infinite linear coil with a current running through its windings, (almost) all the magnetic field will be confined to its interior. Outside the coil there’s no magnetic field. However, there is a nonzero curl-free magnetic potential circling it. Now imagine using this coil as a separator between the two slits of the double-slit experiment. As before, there are two paths for the electron to follow: to the left of the coil and to the right of the coil. But now, along one path, the electron will be traveling with the lines of magnetic potential; along the other, against.

Aharonov-Bohm experiment. There are two paths available to the electron.

Magnetic potential doesn’t contribute to the electron’s energy or momentum but it does change its phase. So in the presence of the coil, the interference pattern in the two slit experiment shifts. That shift has been experimentally confirmed. The Aharonov-Bohm effect takes place because the electron is excluded from the part of space that is taken up by the coil — think of it as an infinite vertical line in space. The space available to the electron contains paths that cannot be continuously deformed into each other (they would have to cross the coil). In HoTT that would mean that although the point a, which is the source of the electron, and point b, where the electron hit the screen, are “equal,” there are two different members of the equality type.

The Incredible Quantum Homotopy Computer

The Aharonov-Bohm effect can be turned on and off by switching the current in the coil (actually, nobody uses coils in this experiment, but there is some promising research that uses nano-rings instead). If you can imagine a transistor built on the Aharonov-Bohm principle, you can easily imagine a computer. But can we go beyond digital computers and really explore varying homotopies?

I’ll be the first to admit that it might be too early to go to Kickstarter and solicit funds for a computer based on the Aharonov-Bohm effect that would be able to prove theorems formulated using Homotopy Type Theory; but the idea of breaking away from digital computing is worth a thought or two.

Or we can leave it to the post apocalyptic industrial-robot civilization that doesn’t even know what a digit is.

Acknowledgments

I’m grateful to the friendly (and patient) folks on the HoTT IRC channel for answering my questions and providing valuable insights.

Tidbit about Differential Operators

What are all those curls, divergences, and gradients? It’s just some vectors in 3-D space.

A scalar field φ(x, y, z) is a single function of space coordinates x, y, and z. You can calculate three different derivatives of this function with respect to x, y, and z. You can symbolically combine these three derivatives into one vector, (∂x, ∂y, ∂z). There is a symbol for that vector, called a nabla: ∇. If you apply a nabla to a scalar field, you get a vector field that is called the gradient, ∇φ, of that field. In coordinates, it is: (∂xφ, ∂yφ, ∂zφ).

A vector field V(x, y, z) is a triple of functions forming a vector at each point of space, (Vx, Vy, Vz). The magnetic field B and the magnetic potential A are such vector fields. There are two ways you can apply a nabla to a vector field. One is just a scalar product of the nabla and the vector field, ∇·V, and it’s called the divergence of the vector field. In components, you can rewrite it as ∂xVx + ∂yVy + ∂zVz.

The other way of multiplying two vectors is called the vector product and its result is a vector. The vector product of the nabla and a vector field, ∇×A, is called the curl of that field. In components it is: (∂yAz – ∂zAy, ∂zAx – ∂xAz, ∂xAy – ∂yAx).

The vector product of two vectors is perpendicular to both. So when you then take a scalar product of the vector product with any of the original vectors, you get zero. This works also with nablas so, for instance, ∇·(∇×A) = 0 — the divergence of a curl is zero. That’s why the solution to ∇·B = 0 is B = ∇×A.

Similarly, because the vector product of a vector with itself is zero, we get ∇×(∇φ) = 0 — the curl of a gradient is zero. That’s why we can always add a term of the form ∇φ to A and get the same field B. In physics, this freedom is called gauge invariance.
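
If you'd rather see those two identities checked than take them on faith, here's a quick-and-dirty numerical spot check in Haskell using central differences (the sample fields, the point, and the step size are arbitrary choices of mine; the results are zero only up to floating-point noise):

type Vec = (Double, Double, Double)

h :: Double
h = 1e-4

-- Central-difference partial derivatives of a scalar function of space.
dX, dY, dZ :: (Vec -> Double) -> Vec -> Double
dX f (x, y, z) = (f (x + h, y, z) - f (x - h, y, z)) / (2 * h)
dY f (x, y, z) = (f (x, y + h, z) - f (x, y - h, z)) / (2 * h)
dZ f (x, y, z) = (f (x, y, z + h) - f (x, y, z - h)) / (2 * h)

fst3 (x, _, _) = x
snd3 (_, y, _) = y
thd3 (_, _, z) = z

grad :: (Vec -> Double) -> Vec -> Vec
grad f p = (dX f p, dY f p, dZ f p)

divergence :: (Vec -> Vec) -> Vec -> Double
divergence a p = dX (fst3 . a) p + dY (snd3 . a) p + dZ (thd3 . a) p

curl :: (Vec -> Vec) -> Vec -> Vec
curl a p = ( dY (thd3 . a) p - dZ (snd3 . a) p
           , dZ (fst3 . a) p - dX (thd3 . a) p
           , dX (snd3 . a) p - dY (fst3 . a) p )

-- Arbitrary smooth sample fields.
sampleA :: Vec -> Vec
sampleA (x, y, z) = (y * z, x * x, sin x + y)

samplePhi :: Vec -> Double
samplePhi (x, y, z) = x * y + cos z

main :: IO ()
main = do
  print (divergence (curl sampleA) (0.3, -1.2, 2.5))  -- ~0: divergence of a curl
  print (curl (grad samplePhi) (0.3, -1.2, 2.5))      -- ~(0,0,0): curl of a gradient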


What is algebra? Naively speaking, algebra gives us the ability to perform calculations with numbers and symbols. Abstract algebra treats symbols as elements of a vector space: they can be multiplied by scalars and added to each other. But what makes algebras stand apart from linear spaces is the presence of vector multiplication: a bilinear product of vectors whose result is another vector (as opposed to the inner product, which produces a scalar). Complex numbers, for instance, can be described as 2-d vectors, whose components are the real and the imaginary parts.

But nothing prepares you for this definition of F-algebra from the Haskell package Control.Functor.Algebra:

type Algebra f a = f a -> a

In this post I will try to bridge the gap between traditional algebras and more powerful F-algebras. F-algebras reduce the notion of an algebra to the bare minimum. It turns out that the three basic ingredients of an algebra are: a functor, a type, and a function. It always amazes me how much you can do with so little. In particular I will explain a very general way of evaluating arbitrary expressions using catamorphisms, which reduces to foldr when applied to lists (which can also be looked at as simple F-algebras).

The Essence of Algebra

There are two really essential aspects of an algebra:

  1. The ability to form expressions and
  2. The ability to evaluate these expressions

The Essence of Expression

The standard way of generating expressions is to use grammars. Here’s an example of a grammar in Haskell:

data Expr = Const Int
          | Add Expr Expr
          | Mul Expr Expr

Like most non-trivial grammars, this one is defined recursively. You may think of Expr as a self-similar fractal. An Expr, as a type, contains not only Const Int, but also Add and Mul, which in turn contain Exprs, and so on. It’s trees all the way down.

1. The fractal nature of an expression type

But recursion can be abstracted away to uncover the real primitives behind expressions. The trick is to define a non-recursive function and then find its fixed point.

Since here we are dealing with types, we have to define a type function, otherwise known as type constructor. Here’s the non-recursive precursor of our grammar (later you’ll see that the F in ExprF stands for functor):

data ExprF a = Const Int
             | Add a a
             | Mul a a

The fractally recursive structure of Expr can be generated by repeatedly applying ExprF to itself, as in ExprF (ExprF (ExprF a)), etc. The more times we apply it, the deeper trees we can generate. After infinitely many iterations we should get to a fixed point where further iterations make no difference. It means that applying one more ExprF wouldn’t change anything — a fixed point doesn’t move under ExprF. It’s like adding one to infinity: you get back infinity.

In Haskell, we can express the fixed point of a type constructor f as a type:

newtype Fix f = In (f (Fix f))

If you look at this formula closely, it is exactly what I said: Fix f is the type you get by applying f to itself. It’s a fixed point of f. (In the literature you’ll sometimes see Fix called Mu.)

We only need one generic recursive type, Fix, to be able to crank out other recursive types from (non-recursive) type constructors.
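
To see that this recipe isn't tied to expressions, here's another recursive type cranked out of the same Fix, this time Peano-style natural numbers (NatF, zero, and three are made-up names for this illustration):

newtype Fix f = In (f (Fix f))   -- same Fix as above

-- The non-recursive "shape" of natural numbers...
data NatF a = ZeroF | SuccF a

-- ...and the recursive naturals as its fixed point.
type Nat = Fix NatF

zero, three :: Nat
zero  = In ZeroF
three = In (SuccF (In (SuccF (In (SuccF zero)))))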

One thing to observe about the data constructor of Fix: In can be treated as a function that takes an element of type f (Fix f) and produces a Fix f:

In :: f (Fix f) -> Fix f

We’ll use this function later.

With that, we can redefine Expr as a fixed point of ExprF:

type Expr = Fix ExprF

You might ask yourself: Are there any values of the type Fix ExprF at all? Is this type inhabited? It’s a good question and the answer is yes, because there is one constructor of ExprF that doesn’t depend on a. This constructor is Const Int. We can bootstrap ourselves because we can always create a leaf Expr, for instance:

val :: Fix ExprF
val = In (Const 12)

Once we have that ability, we can create more and more complex values using the other two constructors of ExprF, as in:

testExpr = In $ (In $ (In $ Const 2) `Add` 
                (In $ Const 3)) `Mul` (In $ Const 4)

The Essence of Evaluation

Evaluation is a recipe for extracting a single value from an expression. In order to evaluate expressions which are defined recursively, the evaluation has to proceed recursively as well.

Again, recursion can be abstracted away — all we really need is an evaluation strategy for each top level construct (generated, for instance, by ExprF) and a way to evaluate its children. Let’s call this non-recursive top-level evaluator alg and the recursive one (for evaluating children) eval. Both alg and eval return values of the same type, a.

First, we need to be able to map eval over the children of an expression. Did somebody mention mapping? That means we need a functor!

Indeed, it’s easy to convince ourselves that our ExprF is a functor:

instance Functor ExprF where
    fmap f (Const i) = Const i
    fmap f (left `Add` right) = (f left) `Add` (f right)
    fmap f (left `Mul` right) = (f left) `Mul` (f right)

An F-algebra is built on top of a functor — any functor. (Strictly speaking, an endofunctor: it maps a given category into itself — in our examples the category refers to Hask — the category of all Haskell types).

Now suppose we know how to evaluate all the children of Add and Mul in an Expr, giving us values of some type a. All that’s left is to evaluate (Add a a) and (Mul a a) in ExprF a. (We also need to evaluate Const Int, but that doesn’t involve recursion.)

Here’s an example of such an evaluator that produces Int values:

alg :: ExprF Int -> Int

alg (Const i)   = i
alg (x `Add` y) = x + y
alg (x `Mul` y) = x * y

(Notice that we are free to add and multiply x and y, since they are just Ints.)

What I have done here is to pick one particular type, Int, as my evaluation target. This type is called the carrier type of the algebra. I then defined a function alg from the image of Int under the functor ExprF back to Int.

Just to show that the carrier type is arbitrary, let me define another evaluator that returns a string:

import Data.Char (chr, ord)  -- chr and ord come from Data.Char

alg' :: ExprF String -> String
alg' (Const i)   = [chr (ord 'a' + i)]
alg' (x `Add` y) = x ++ y
alg' (x `Mul` y) = concat [[a, b] | a <- x, b <- y]

F-Algebras

We are now ready to define F-algebras in the most general terms. First I’ll use the language of category theory and then quickly translate it to Haskell.

An F-algebra consists of:

  1. an endofunctor F in a category C,
  2. an object A in that category, and
  3. a morphism from F(A) to A.

An F-algebra in Haskell is defined by a functor f, a carrier type a, and a function from (f a) to a. (The underlying category is Hask.)

Right about now the definition with which I started this post should start making sense:

type Algebra f a = f a -> a

For a given functor f and a carrier type a, the algebra is defined by specifying just one function. Often this function itself is called the algebra, hence my use of the name alg in previous examples.

Back to our concrete example: the functor is ExprF, the carrier type is Int, and the function is alg:

-- My simple algebra
type SimpleA = Algebra ExprF Int

alg :: SimpleA
alg (Const i)   = i
alg (x `Add` y) = x + y
alg (x `Mul` y) = x * y

The only thing that’s still missing is the definition of the function eval, which takes care of evaluating children of an expression. It turns out this function can be defined in a very general form. To do that we’ll need to familiarize ourselves with the notion of the initial algebra.

Initial Algebras

There are many algebras based on a given functor (I’ve shown you two so far). But there is one algebra to bind them all — the initial algebra. In fact you’ve already seen elements of it. Remember the Fix type function?

newtype Fix f = In (f (Fix f))

Given any functor f it defines a new unique type Fix f. We will now lift ourselves by the bootstraps. We’ll use this type as a carrier in the definition of another algebra. This will turn out to be our initial algebra.

First, let’s go back to our example and, instead of using Int or String, use (Fix ExprF) as the carrier type:

type ExprInitAlg = Algebra ExprF (Fix ExprF)

We have the functor and the carrier type. To complete the triple we need to define a function with the following signature:

ex_init_alg :: ExprF (Fix ExprF) -> Fix ExprF

Guess what, we already have a function of this type. It’s the constructor of Fix:

ex_init_alg = In

(Replace f with ExprF in the definition of Fix to see that the type signatures match.)

But wait! What does this “evaluator” evaluate? Given (ExprF Expr) it produces an Expr. For instance, when given,

Add (In $ Const 2) (In $ Const 3)

it will produce an Expr:

In $ Add (In $ Const 2) (In $ Const 3)

Unlike the evaluators we’ve been using so far, this evaluator doesn’t reduce anything. It is not lossy. It preserves all the information passed to it as input. [Note: In fact, Lambek’s lemma states that the initial algebra is an isomorphism.] In comparison, all other evaluators potentially lose some information. They return some kind of summary of the information encoded in the data structure. In this sense, the algebra we have just defined is at least as powerful as all other algebras based on the same functor. That’s why it’s called the initial algebra.

The word initial has a special meaning in category theory. The initial algebra has the property that there exists a (unique) homomorphism from it to any other algebra based on the same functor.

A homomorphism is a mapping that preserves certain structure. In the case of algebras, a homomorphism has to preserve the algebraic structure. An algebra consists of a functor, a carrier type, and an evaluator. Since we are keeping the functor fixed, we only need to map carrier types and evaluators.

In fact, a homomorphism of algebras is fully specified by a function that maps one carrier to another and obeys certain properties. Since the carrier of the initial algebra is Fix f, we need a function:

g :: Fix f -> a

where a is the carrier for the other algebra. That algebra has an evaluator alg with the signature:

alg :: f a -> a

2. Homomorphism from the initial algebra to an arbitrary algebra

The special property g has to obey is that it shouldn’t matter whether we first use the initial algebra’s evaluator and then apply g, or first apply g (through fmap) and then the second algebra’s evaluator, alg. Let’s check the types involved to convince ourselves that this requirement makes sense.

The first evaluator uses In to go from f (Fix f) to Fix f. Then g takes Fix f to a.

The alternate route uses fmap g to map f (Fix f) to f a, followed by alg from f a to a. Notice that this is the first time that we used the functorial property of f. It allowed us to lift the function g to fmap g.

The crucial observation is that In is a lossless transformation and it can be easily inverted. The inverse of In is unFix:

unFix :: Fix f -> f (Fix f)
unFix (In x) = x

With one reversal of the arrow In to unFix, it’s easy to see that going the route of g is the same as taking the detour through unFix, followed by fmap g, and then alg:

3. Defining a catamorphism

g = alg . (fmap g) . unFix

We can use this equation as a recursive definition of g. We know that this definition converges because the application of g through fmap deals with subtrees of the original tree, and they are strictly smaller than the original tree.

We can abstract the evaluation further by factoring out the dependence on alg (redefining g = cata alg):

cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg = alg . fmap (cata alg) . unFix

The result is a very generic function called a catamorphism. We have constructed the catamorphism from an algebra in order to prove that the fixed point of this algebra’s functor is the initial algebra. But wait, haven’t we just created the recursive evaluator we’ve been looking for?

Catamorphisms

Look again at the type signature of the catamorphism with some additional (redundant) parentheses:

cata :: Functor f => (f a -> a) -> (Fix f -> a)

It takes an arbitrary algebra, which is a non-recursive function f a -> a, and returns an evaluator function, (Fix f -> a). This function takes an expression of the type Fix f and evaluates it down to type a. A catamorphism lets us evaluate arbitrarily nested expressions!

Let’s try it with our simple functor ExprF, which we used to generate nested expressions of the type Fix ExprF.

We have already defined an alg for it:

type SimpleA = Algebra ExprF Int

alg :: SimpleA
alg (Const i)   = i
alg (x `Add` y) = x + y
alg (x `Mul` y) = x * y

So our full-blown evaluator is just:

eval :: Fix ExprF -> Int
-- eval = cata alg = alg . fmap (cata alg) . unFix
eval = alg . fmap eval . unFix

Let’s analyze it: First, unFix allows us to peek at the top level of the input expression: It’s either a leaf Const i or an Add or Mul whose children are, again, full-blown expressions, albeit one level shallower. We evaluate the children by recursively applying eval to them. We end up with a single-level tree whose leaves are now evaluated down to Ints. That allows us to apply alg and get the result.

You can test this on a sample expression:

testExpr = In $ (In $ (In $ Const 2) `Add` 
                (In $ Const 3)) `Mul` (In $ Const 4)

You can run (and modify) this code online in the School of Haskell version of this blog.

{-# LANGUAGE DeriveFunctor #-}
data ExprF r = Const Int
             | Add r r
             | Mul r r
    deriving Functor

newtype Fix f = In (f (Fix f))
unFix :: Fix f -> f (Fix f)
unFix (In x) = x

cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg = alg . fmap (cata alg) . unFix

alg :: ExprF Int -> Int
alg (Const i)   = i
alg (x `Add` y) = x + y
alg (x `Mul` y) = x * y

eval :: Fix ExprF -> Int
eval = cata alg

testExpr = In $ 
             (In $ (In $ Const 2) `Add` (In $ Const 3)) `Mul` 
             (In $ Const 4)

main = print $ eval $ testExpr

foldr

Traversing and evaluating a recursive data structure? Isn’t that what foldr does for lists?

Indeed, it’s easy to create algebras for lists. We start with a functor:

data ListF a b = Nil | Cons a b

instance Functor (ListF a) where
    fmap f Nil = Nil
    fmap f (Cons e x) = Cons e (f x)

The first type argument to ListF is the type of the element, the second is the one we will recurse into.

Here’s a simple algebra with the carrier type Int:

algSum :: ListF Int Int -> Int
algSum Nil = 0
algSum (Cons e acc) = e + acc

Using the constructor In we can recursively generate arbitrary lists:

lst :: Fix (ListF Int)
lst = In $ Cons 2 (In $ Cons 3 (In $ Cons 4 (In Nil)))

Finally, we can evaluate our list using our generic catamorphism:

cata algSum lst

Of course, we can do exactly the same thing with a more traditional list and foldr:

foldr (\e acc -> e + acc) 0 [2..4]

You should see the obvious parallels between the definition of the algSum algebra and the two arguments to foldr. The difference is that the algebraic approach can be generalized beyond lists to any recursive data structure.
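
To make the "beyond lists" point concrete, here's the same machinery applied to a small binary tree of Ints (TreeF, sumAlg, and tree are names I made up for this example):

{-# LANGUAGE DeriveFunctor #-}

newtype Fix f = In (f (Fix f))

unFix :: Fix f -> f (Fix f)
unFix (In x) = x

cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg = alg . fmap (cata alg) . unFix

-- The non-recursive "shape" of a binary tree with Ints in the leaves.
data TreeF a b = Leaf a | Node b b
    deriving Functor

-- An algebra that sums up all the leaves.
sumAlg :: TreeF Int Int -> Int
sumAlg (Leaf n)   = n
sumAlg (Node l r) = l + r

tree :: Fix (TreeF Int)
tree = In (Node (In (Leaf 1)) (In (Node (In (Leaf 2)) (In (Leaf 3)))))

main :: IO ()
main = print (cata sumAlg tree)  -- 6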

Here’s the complete list example:

newtype Fix f = In (f (Fix f))

unFix :: Fix f -> f (Fix f)
unFix (In x) = x

cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg = alg . fmap (cata alg) . unFix

data ListF a b = Nil | Cons a b

instance Functor (ListF a) where
    fmap f Nil = Nil
    fmap f (Cons e x) = Cons e (f x)

algSum :: ListF Int Int -> Int
algSum Nil = 0
algSum (Cons e acc) = e + acc

lst :: Fix (ListF Int)
lst = In $ Cons 2 (In $ Cons 3 (In $ Cons 4 (In Nil)))

main = do
    print $ (cata algSum) lst
    print $ foldr (\e acc -> e + acc) 0 [2, 3, 4]

Conclusion

Here are the main points of this post:

  1. Just like recursive functions are defined as fixed points of regular functions, recursive (nested) data structures can be defined as fixed points of regular type constructors.
  2. Functors are interesting type constructors because they give rise to nested data structures that support recursive evaluation (generalized folding).
  3. An F-algebra is defined by a functor f, a carrier type a, and a function from f a to a.
  4. There is one initial algebra that maps into all algebras defined over a given functor. This algebra’s carrier type is the fixed point of the functor in question.
  5. The unique mapping from the initial algebra to any other algebra over the same functor is generated by a catamorphism.
  6. A catamorphism takes a simple algebra and creates a recursive evaluator for a nested data structure (the fixed point of the functor in question). This is a generalization of list folding to arbitrary recursive data structures.

Acknowledgment

I’m grateful to Gabriel Gonzalez for reviewing this post. Gabriel made an interesting observation:

“Actually, even in Haskell recursion is not completely first class because the compiler does a terrible job of optimizing recursive code. This is why F-algebras and F-coalgebras are pervasive in high-performance Haskell libraries like vector, because they transform recursive code to non-recursive code, and the compiler does an amazing job of optimizing non-recursive code.”

Bibliography

Most examples in my post were taken from the first two publications below:

  1. Fixing GADTs by Tim Philip Williams.
  2. Advanced Functional Programming, Tim Sheard’s course notes.
  3. Functional Programming with Bananas, Lenses, Envelopes, and Barbed Wire by Erik Meijer, Maarten Fokkinga, and Ross Paterson.
  4. Recursive types for free! by Philip Wadler
  5. Catamorphisms in Haskell Wiki

You don’t need to know anything about category theory to use Haskell as a programming language. But if you want to understand the theory behind Haskell or contribute to its development, some familiarity with category theory is a prerequisite.

Category theory is very easy at the beginning. I was able to explain what a category is to my 10-year-old son. But the learning curve gets steeper as you go. Functors are easy. Natural transformations may take some getting used to, but after chasing a few diagrams, you’ll get the hang of it. The Yoneda lemma is usually the first serious challenge, because to understand it, you have to be able to juggle several things in your mind at once. But once you’ve got it, it’s very satisfying. Things just fall into place and you gain a lot of intuition about categories, functors, and natural transformations.

A Teaser Problem

You are given a polymorphic function imager that, for any function from Bool to any type r, returns a list of r. Try running this code in the School of Haskell, with colorMap, heatMap, and soundMap. You may also define your own function of Bool and pass it to imager.

{-# LANGUAGE ExplicitForAll #-}
imager :: forall r . ((Bool -> r) -> [r])
imager = ???

data Color = Red | Green | Blue        deriving Show
data Note  = C | D | E | F | G | A | B deriving Show

colorMap x = if x then Blue else Red
heatMap  x = if x then 32   else 212
soundMap x = if x then C    else G

main = print $ imager colorMap

Can you guess the implementation of imager? How many possible imagers with the same signature are there? By the end of this article you should be able to validate your answers using the Yoneda lemma.

Categories

A category is a bunch of objects with arrows between them (incidentally, a “bunch” doesn’t mean a set but a more general collection). We don’t know anything about the objects — all we know is the arrows, a.k.a. morphisms.

Our usual intuition is that arrows are sort of like functions. Functions are mappings between sets. Indeed, morphisms have some function-like properties, for instance composability, which is associative:

Fig 1. Associativity of morphisms demonstrated on Haskell functions. (In my pictures, piggies will represent objects; sacks of potatoes, sets; and fireworks, morphisms.)

h :: a -> b
g :: b -> c
f :: c -> d

f . (g . h) == (f . g) . h

There is also an identity morphism for every object in a category, just like the id function:

Fig 2. The identity morphism.

id :: a -> a

id . f == f . id == f

In all Haskell examples I’ll be using the category Hask of Haskell types, with morphisms being plain old functions. An object in Hask is a type, like Int, [Bool], or [a]->Int. Types are nothing more than just sets of values. Bool is a two element set {True, False}, Integer is the set of all integers, and so on.

In general, the category of all sets and functions is called Set.

So how good is this sets-and-functions intuition for an arbitrary category? Are all categories really like collections of sets, and morphisms are like functions from set to set? What does the word like even mean in this context?

Functors

In category theory, when we say one category is “like” another category, we usually mean that there is a mapping between the two. For this mapping to be meaningful, it should preserve the structure of the category. So not only every object from one category has to be mapped into an object from another category, but also all morphisms must be mapped correctly — meaning they should preserve composition. Such a mapping has a name: it’s called a functor.

Functors in Hask are described by the type class Functor

class Functor f where
    fmap :: (a -> b) -> (f a -> f b)

A Haskell Functor maps types into types and functions into functions — a type constructor does the former and fmap does the latter.

A type constructor is a mapping from one type to another. For instance, a list type constructor takes any type a and creates a list type, [a].

So instead of asking if every category is “like” the Set category, we can ask a more precise question: For what kinds of categories (if not all of them) do there exist functors that map them into Set? Such categories are called representable, meaning they have a representation in Set.

As a physicist I had to deal a lot with groups, such as groups of spacetime rotations in various dimensions or unitary groups in complex spaces. It was very handy to represent these abstract groups as matrices acting on vectors. For instance, different representations of the same Lorentz group (more precisely, SL(2, C)) would correspond to physical particles with different spins. So vector spaces and matrices are to abstract groups as sets and functions are to abstract categories.

Yoneda Embedding

One of the things Yoneda showed is that there is at least one canonical functor from any so-called locally small category into the category of sets and functions. The construction of this functor is surprisingly easy, so let me sketch it.

This functor should map every object in category C into a set. Set of what? It doesn’t really matter, a set is a set. So how about using a set of morphisms?

Fig 3. The Yoneda embedding. Object X is mapped by the functor into the set HA(X). The elements of the set correspond to morphisms from A to X.

How can we map any object into a set of morphisms? Easy. First, let’s arbitrarily fix one object in the category C, call it A. It doesn’t matter which object we pick, we’ll just have to hold on to it. Now, for every object X in C there is a set of morphisms (arrows) going from our fixed A to this X. We designate this set to be the image of X under the functor we are constructing. Let’s call this functor HA. There is one element in the set HA(X) for every morphism from A to X.

A functor must define a mapping of objects to objects (to sets, in our case) and morphisms to morphisms (to functions in our case). We have established the first part of the mapping. To define the second part, let’s pick an arbitrary morphism f from X to Y. We have to map it to some function from the set HA(X) to the set HA(Y).

Fig 4. The Yoneda functor also maps morphisms. Here, morphism f is mapped into the function HA(f) between sets HA(X) and HA(Y).

Let’s define this function, we’ll call it HA(f), through its action on any element of the set HA(X), call it x. By our construction, x corresponds to some particular morphism, u, from A to X. We now have at our disposal two morphisms, u :: A -> X and f :: X -> Y (that’s the morphism we are mapping). We can compose them. The result f . u is a morphism from A to Y, so it’s a member of the set HA(Y). We have just defined a function that takes an x from HA(X) and maps it into y from HA(Y), and this will be our HA(f).

Of course, you have to prove that this construction of HA is indeed a functor preserving composition of morphisms, but that’s reasonably easy, once the technique we have just used becomes familiar to you. Here’s the gist of this technique: Use components! When you are defining a functor from category C to category D, pick a component — an object X in C — and define its image, F(X). Then pick a morphism f in C, say from X to Y, and define its image, F(f), as a particular morphism from F(X) to F(Y).

Similarly, when defining a function from set S to T, use its components. Pick an element x of S and define its image in T. That’s exactly what we did in our construction.
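
In Hask this construction should look familiar: HA is essentially the "reader" functor ((->) a), which sends a type x to the function type a -> x and a function f to post-composition with f. Here's a small sketch, using my own HomA wrapper just to make the Functor instance explicit:

-- H_A(X) is the type of "morphisms" from a to x; H_A(f) is composition with f.
newtype HomA a x = HomA (a -> x)

instance Functor (HomA a) where
    fmap f (HomA u) = HomA (f . u)

runHomA :: HomA a x -> a -> x
runHomA (HomA u) = u

-- For example, mapping show over "arrows out of Int":
example :: HomA Int String
example = fmap show (HomA (+ 1))

main :: IO ()
main = putStrLn (runHomA example 41)  -- prints "42"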

Incidentally, what was that requirement that the category C be locally small? A category is locally small if the collection of morphisms between any two objects forms a set. This may come as a surprise but there are things in mathematics that are too big to be sets. A classic example is a collection of all sets, which cannot be a set itself, because it would lead to a paradox. A collection of all sets, however, is the basis of the Set category (which, incidentally, turns out to be locally small).

Summary So Far

Let me summarize what we’ve learned so far. A category is just a bunch of abstract objects and arrows between them. It tells us nothing about the internal structure of objects. However, for every (locally small) category there is a structure-preserving mapping (a functor) that maps it into a category of sets. Objects in the Set category do have internal structure: they have elements; and morphisms are functions acting on those elements. A representation maps the category’s surface structure of morphisms into the internal structure of sets.

It is like figuring out the properties of orbitals in atoms by looking at the chemical compounds they form, and at the way valencies work. Or discovering that baryons are composed of quarks by looking at their decay products. Incidentally, no one has ever “seen” a free quark, they always live inside other particles. It’s as if physicists were studying the “category” of baryons by mapping them into sets of quarks.

A Bar Example

This is all nice but we need an example. Let’s start with “A mathematician walks into a bar and orders a category.” The barman says, “We have this new tasty category but we can’t figure out what’s in it. All we know is that it has just one object A” — (“Oh, it’s a monoid,” the mathematician murmurs knowingly) — “…plus a whole bunch of morphisms. Of course all these morphisms go from A to A, because there’s nowhere else to go.”

What the barman doesn’t know is that the new category is just a re-packaging of the good old set of ASCII strings. The morphisms correspond to appending strings. So there is a morphism called foo that appends the string "foo":

foo :: String -> String
foo = (++"foo")

main = putStrLn $ foo "Hello "

and so on.

“I can tell you what’s inside A,” says the mathematician, “but only up to an isomorphism. I’m a mathematician not a magician.”

She quickly constructs a set that contains one element for each morphism — morphisms must, by law, be listed by the manufacturer on the label. So, when she sees foo, she puts an element with the label “foo”, and so on. Incidentally, there is one morphism with no name, which the mathematician maps to an empty label. (In reality this is an identity morphism that appends an empty string.)

“That’s what’s inside the object A,” she says.

“Moreover, this set comes equipped with functions that rearrange its elements. In fact there is a function for every morphism listed in the category,” she says. “Name any morphism and I’ll construct you a function.”

The barman gives her morphism p, which in reality is:

p = (++"p")

“Okay,” she says, “here’s how I construct the corresponding function. Pick any element in my set.”

The barman picks “foo”.

“Okay, ‘foo’ corresponds to the morphism foo,” she says, “so tell me what you call the morphism that’s the composition of foo and p?” (By law, the manufacturer is obliged to specify all admissible compositions of morphisms on the label.)

“It’s called foop,” says the barman.

Quick check:

p . foo == (++"p") . (++"foo") == (++"foop")
foop = (++"foop")

“Okay,” she says, “the function corresponding to p maps “foo” into “foop”. Hm, how curious! I bet this function will map the no-label element into “p”, is that right?”

“Indeed, it does,” says the barman.

Quick check:

p . id == p

“I bet you this is just a string monoid,” says the mathematician.

“I think I’ll have my usual Top on the rocks instead.”

Natural Transformations

We’ve seen how to construct a representation of any (locally small) category in Set by selecting an arbitrary object A in the category and studying morphisms originating at A. What if we choose a different object B instead? How different is the representation HA from HB? For that matter, what if we pick any functor F from C to Set ? How is it related to HA? That’s what the Yoneda lemma is all about.

Let me start with a short recap.

A functor is a mapping between categories that preserves their structure. The structure of a category is defined by the way its morphisms compose. A functor F maps objects into objects and morphism into morphisms in such a way that if f . g = h then F(f) . F(g) = F(h).

A natural transformation is a mapping between functors that preserves the structure of the underlying categories.

Fig 5. A component of a transformation Φ at X. Φ maps functor F into functor G, ΦX is a morphism that maps object F(X) into object G(X).

First we have to understand how to define mappings between functors. Suppose we have functors F and G, both going from category C to category D. For a given object X in C, F will map it into F(X) in D, and G will map it into G(X) in D. A mapping Φ between functors must map object F(X) to object G(X), both in category D. We know that a mapping of objects is called a morphism. So for every object X we have to provide a morphism ΦX from F(X) to G(X). This morphism is called a component of Φ at X. Or, looking at it from a different angle, Φ is a family of morphisms parameterized by X.

An Example of Natural Transformation

Just to give you some Haskell intuition, consider functors on Hask. These are mappings of types (type constructors) such as a -> [a] or a -> Maybe a, with the corresponding mappings of morphisms (functions) defined by fmap. Recall:

class Functor f where
    fmap :: (a -> b) -> (f a -> f b)

The mapping between Haskell functors is a family of functions parameterized by types. For instance, a mapping between the [] functor and the Maybe functor will map a list of a, [a] into Maybe a. Here’s an example of such a family of functions called safeHead:

safeHead :: [a] -> Maybe a
safeHead []     = Nothing
safeHead (x:xs) = Just x

A family of functions parameterized by type is nothing special: it’s called a polymorphic function. It can also be described as a function on both types and values, with a more verbose signature:

{-# LANGUAGE ExplicitForAll #-}

safeHead :: forall a . [a] -> Maybe a
safeHead []     = Nothing
safeHead (x:xs) = Just x

main = print $ safeHead ["One", "Two"]

Let’s go back to natural transformations. I have described what it means to define a transformation of functors in terms of objects, but functors also map morphisms. It turns out, however, that the transformation of morphisms is completely determined by the two functors. A morphism f from X to Y is transformed under F into F(f) and under G into G(f). G(f) must therefore be the image of F(f) under Φ. No choice here! Except that now we have two ways of going from F(X) to G(Y).

Fig 6. The naturality square. Φ is a natural transformation if this diagram commutes, that is, both paths are equivalent.

We can first use the morphism F(f) to take us to F(Y) and then use ΦY to get to G(Y). Or we can first take ΦX to take us to G(X), and then G(f) to get to G(Y). We call Φ a natural transformation if these two paths result in the same morphism (the diagram commutes).

The best insight I can offer is that a natural transformation works on structure, while a general morphism works on contents. The naturality condition ensures that it doesn’t matter if you first rearrange the structure and then the content, or the other way around. Or, in other words, that a natural transformation doesn’t touch the content. This will become clearer in examples.

Going back to Haskell: Is safeHead a natural transformation between the two functors [] and Maybe? Let’s start with a function f from some type a to b. There are two ways of mapping this function: one using the fmap defined by [], which is the list function map; and the other using Maybe’s fmap, which is defined in Maybe’s Functor instance:

instance Functor Maybe where
   fmap f (Just x) = Just (f x)
   fmap _ Nothing  = Nothing

The two paths from [a] to Maybe b are:

  1. Apply fmap f to [a] to get [b] and then safeHead it, or
  2. Apply safeHead to [a] and then use the Maybe version of fmap.

There are only two cases to consider: an empty list and a non-empty list. For an empty list we get Nothing both ways; otherwise we get Just of f applied to the first element of the list.
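
Here's an executable spot check of both paths around the naturality square (a test on sample inputs, not a proof; the function f is an arbitrary pick of mine):

safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

f :: Int -> String
f = show

main :: IO ()
main = do
    -- non-empty list: both paths give Just "1"
    print (fmap f (safeHead [1, 2, 3]), safeHead (fmap f [1, 2, 3]))
    -- empty list: both paths give Nothing
    print (fmap f (safeHead []), safeHead (fmap f ([] :: [Int])))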

We have thus shown that safeHead is a natural transformation. There are more interesting examples of natural transformations in Haskell; monadic return and join come to mind.

The intuition behind natural transformations is that they deal with structure, not contents. safeHead couldn’t care less about what’s stored in a list, but it understands the structure of the list: things like the list being empty, or having a first element. The type of this element doesn’t matter. In Haskell, natural transformations are polymorphic functions that can, like safeHead, be typed using forall:

safeHead :: forall a . [a] -> Maybe a

Yoneda Lemma

Going back to the Yoneda lemma, it states that for any functor from C to Set there is a natural transformation from our canonical representation HA to this functor. Moreover, there are exactly as many such natural transformations as there are elements in F(A).

That, by the way, answers our other question about the dependence on the choice of A in the Yoneda embedding. The Yoneda lemma tells us that there are natural transformations both ways between HA and HB.

Amazingly, the proof of the Yoneda lemma, at least in one direction, is quite simple. The trick is to first define the natural transformation Φ on one special element of HA(A): the element that corresponds to the identity morphism on A (remember, there is always one of these for every object). Let’s call this element p. Its image under ΦA will be in F(A), which is a set. You can pick any element of this set and it will define a different but equally good Φ. Let’s call this element q. So we have fixed ΦA(p) = q.

Now we have to define the action of Φ on an arbitrary element in the image of HA. Remember that the functor HA transforms objects in C into sets. So let’s take an arbitrary object X and its image HA(X). The elements in HA(X) correspond to morphisms from A to X. So let’s pick one such morphism and call it f. Its image is an element r in HA(X). The question is, what does r map into under Φ? Remember, its image must be an element of F(X).

Fig 7. The mappings in the Yoneda lemma. F is an arbitrary functor. Any choice of p determines the morphism ΦX for any X.

To figure that out, let’s consider the F route. F being a functor transforms our morphism f into F(f) — which is a morphism from F(A) to F(X). But, as you may remember, we have selected a special element in F(A) — our q. Now apply F(f) to q and you get an element in F(X), call it s. (Remember, F(f) is just a regular function between two sets, F(A) and F(X).)

There’s nothing more natural than picking ΦX(r) to be this s! We have thus defined a natural transformation Φ for any X and r.

The straightforward proof that this definition of Φ is indeed natural is left as an exercise for the reader.

A Haskell Example

I’ve been very meticulous about distinguishing between morphisms from A to X in C and the corresponding set elements in HA(X). But in practice it’s more convenient to skip the middleman and define natural transformations in the Yoneda lemma as going directly from these morphisms to F(X). Keeping this in mind, the Haskell version of the Yoneda lemma is often written as follows:

forall r . ((a -> r) -> f r) ~ f a

where the (lowercase) f is the functor (think of it as a type constructor and its corresponding fmap), and (a -> r) is a function corresponding to the morphism from A to X in our original formulation. Yoneda’s natural transformation maps this morphism into the image of r under f — the F(X) in the original formulation. The forall r means that the function ((a -> r) -> f r) works for any type r, as is necessary to make it a natural transformation.

The lemma states that the type of this function, forall r . ((a -> r) -> f r), is equivalent to the much simpler type f a. If you remember that types are just sets of values, you can interpret this result as stating that there is a one-to-one correspondence between natural transformations of this type and values of the type f a.
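
Here's a minimal sketch of both directions of this correspondence; the names toYoneda and fromYoneda are mine, and RankNTypes is assumed to express the inner forall:

{-# LANGUAGE RankNTypes #-}

-- From a value of type f a to a natural transformation: just fmap over it.
toYoneda :: Functor f => f a -> (forall r. (a -> r) -> f r)
toYoneda fa = \g -> fmap g fa

-- And back: probe the natural transformation with the identity function.
fromYoneda :: (forall r. (a -> r) -> f r) -> f a
fromYoneda phi = phi id

-- The round trip returns the original value, as the lemma promises.
nt :: forall r. (Int -> r) -> [r]
nt = toYoneda [1, 2, 3]

main = print $ fromYoneda nt  -- prints [1,2,3]

Probing with the identity function is exactly the trick we are about to use on imager.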

Remember the example from the beginning of this article? There was a function imager with the following signature:

imager :: forall r . ((Bool -> r) -> [r])

This looks very much like a natural transformation from the Yoneda lemma with the type a fixed to Bool and the functor f being the list functor []. (I’ll call the functions Bool->r iffies.)

The question was, how many different implementations of this signature are there?

The Yoneda lemma tells us exactly how to construct such natural transformations. It instructs us to start with an identity iffie: idBool :: Bool -> Bool, and pick any element of [Bool] to be its image under our natural transformation. We can, for instance, pick [True, False, True, True]. Once we’ve done that, the action of this natural transformation on any iffie h is fixed. We just map the morphism h using the functor (in Haskell we fmap the iffie), and apply it to our pick, [True, False, True, True].

Therefore, all natural transformations with the signature:

forall r . ((Bool -> r) -> [r])

are in one-to-one correspondence with different lists of Bool.

Conversely, if you want to find out what list of Bool is hidden in a given implementation of imager, just pass it an identity iffie. Try it:

{-# LANGUAGE ExplicitForAll #-}

imager :: forall r . ((Bool -> r) -> [r])
imager iffie = fmap iffie [True, False, True, True]

data Color = Red | Green | Blue        deriving Show
data Note  = C | D | E | F | G | A | B deriving Show

-- A few sample iffies with different result types:
colorMap x = if x then Blue else Red
heatMap  x = if x then 32   else 212
soundMap x = if x then C    else G

-- The identity iffie, used to recover the hidden list:
idBool :: Bool -> Bool
idBool x = x

main = print $ imager idBool

Remember, this application of the Yoneda lemma is only valid if imager is a natural transformation — its naturality square must commute. The two functors in the imager naturality diagram are the Yoneda embedding and the list functor. Naturality of imager translates into the requirement that any function f :: a -> b modifying an iffie can be pulled out of imager:

imager (f . iffie) == map f (imager iffie)

Here’s an example of such a function, one translating colors to strings, commuting with the application of imager:

{-# LANGUAGE ExplicitForAll #-}

imager :: forall r . ((Bool -> r) -> [r])
imager iffie = fmap iffie [True, False, True, True]

data Color = Red | Green | Blue  deriving Show

colorMap x = if x then Blue else Red

f :: Color -> String
f = show 

main = do
    print $ imager (f . colorMap)
    print $ map f (imager colorMap)

The Structure of Natural Transformations

That brings up another important intuition about the Yoneda lemma in Haskell. You start with a type signature that describes a natural transformation: a particular kind of polymorphic function that takes a probing function as an argument and returns a value of the type obtained by applying a functor to the probing function’s result type. The Yoneda lemma tells us that the structure of this natural transformation is tightly constrained.

One of the strengths of Haskell is its very strict and powerful type system. Many Haskell programmers start designing their programs by defining type signatures of major functions. The Yoneda lemma tells us that type signatures not only restrict how functions can be combined, but also how they can be implemented.

As an extreme, there is one particular signature that has only one implementation: a->a (or, more explicitly, forall a. a -> a). The only natural implementation of this signature is the identity function, id.

Just for fun, let me sketch the proof using the Yoneda lemma. If we pick the source type as the singleton unit type, (), then the Yoneda embedding consists of all functions taking unit as an argument. A function taking unit has only one return value so it’s really equivalent to this value. The functor we pick is the identity functor. So the question is, how many natural transformations of the following type are there?

forall a. ((() -> a) -> a)

Well, there are as many as there are elements in the image of () under the identity functor, which is exactly one! Since a function ()->a can be identified with a, it means we have only one natural transformation with the following signature:

forall a. (a -> a)

Moreover, by the Yoneda construction, this function is defined by fmapping the function ()->a over the element () using the identity functor. So our natural transformation, when probed with a value of the type a, will return the same value. But that’s just the definition of the identity function. (In reality things are slightly more complicated because every Haskell type must include undefined, but that’s a different story.)
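
Here's a minimal sketch of that argument, using Data.Functor.Identity as the identity functor; the names only and viaYoneda are mine, not part of the original text:

{-# LANGUAGE RankNTypes #-}

import Data.Functor.Identity (Identity (..))

-- The Yoneda form of the signature: source type (), functor Identity.
-- Its implementation is fixed by the single element Identity ().
only :: forall a. (() -> a) -> Identity a
only g = fmap g (Identity ())

-- Identifying (() -> a) with a and stripping Identity yields the identity.
viaYoneda :: a -> a
viaYoneda x = runIdentity (only (\() -> x))

main = print $ viaYoneda (42 :: Int)  -- prints 42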

Here’s an exercise for the reader: Show that the naturality square for this example is equivalent to id commuting with any function: f . id == id . f.

Conclusion

I hope I provided you with enough background information and intuition so that you’ll be able to easily read more advanced blog posts, like Reverse Engineering Machines with the Yoneda Lemma by Dan Piponi, or GADTs by Gabriel Gonzales.

Acknowledgments

I’d like to thank Gabriel Gonzales for providing useful comments and John Wiegley, Michael Sloan, and Eric Niebler for many interesting conversations.


I’ve been dealing with concurrency for many years in the context of C++ and D (see my presentation about Software Transactional Memory in D). I worked for a startup, Corensic, that made an ingenious tool called Jinx for detecting data races using a lightweight hypervisor. I recorded a series of screencasts teaching concurrency to C++ programmers. If you follow these screencasts you might realize that I was strongly favoring a functional approach to concurrency, promoting immutability and pure functions. I even showed how non-functional-looking code leads to data races that could be detected by the (now defunct) Jinx. Essentially I was teaching C++ programmers how to imitate Haskell.

Except that C++ could support only a small subset of Haskell’s functionality and provided no guarantees against even the most common concurrency errors.

It’s unfortunate that most programmers haven’t seen what Haskell can do with concurrency. There is a natural barrier to learning a new language, especially one that has the reputation of being mind-boggling. A lot of people are turned off simply by unfamiliar syntax. They can’t get over the fact that in Haskell a function call uses no parentheses or commas. What in C++ looks like

f(x, y)

in Haskell is written as:

f x y

Other people dread the word “monad,” which in Haskell is essentially a tool for embedding imperative code inside functional code.
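
Here's a tiny example of my own to illustrate the point: an imperative-looking sequence of I/O actions, written in do notation, which is just sugar for the IO monad:

-- Imperative-looking code embedded in a purely functional language.
main :: IO ()
main = do
    putStrLn "What's your name?"
    name <- getLine
    putStrLn ("Hello, " ++ name)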

Why is Haskell syntax the way it is? Couldn’t it have been more C- or Java-like? Well, look no further than Scala. Because of its Java-like syntax, the most basic functional trick, currying a function, requires a Scala programmer to anticipate this possibility in the function definition (using special syntax: multiple pairs of parentheses). If you have no access to the source code of a function, you can’t curry it (at least not without even more syntactic gymnastics).
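
In Haskell, by contrast, every function is curried by default, so partial application needs no special syntax. A minimal illustration (the names add and increment are my own):

-- add is really a function of one argument returning another function,
-- so partial application comes for free.
add :: Int -> Int -> Int
add x y = x + y

increment :: Int -> Int
increment = add 1   -- add partially applied to its first argument

main = print $ increment 41  -- prints 42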

If you managed to wade through this post so far, you are probably not faint of heart and will accept the challenge of reading an excellent article by Simon Peyton Jones, Beautiful Concurrency, which does a great job of explaining Haskell’s beautiful approach to concurrency to the uninitiated. It’s not a new article, but it has been adapted to the new format of the School of Haskell by Yours Truly, which means you’ll be able to run the code examples embedded in it. If you feel adventurous, you might even edit them and see how the results change.


Anybody who tries to popularize Haskell has to deal with the monad problem. You can’t write the simplest Haskell program without a monad (the IO monad in particular) and yet there is no easy way to explain what a monad is. Mind you, when teaching object oriented programming, nobody starts with a definition of “object.” There’s no need for that because everybody has some intuition about objects — we are surrounded by objects in real life. Which is very convenient because if you tried to define what an object is in C++ or Java, you’d have to use quite a bit of hard-core PL theory; plus you’d have to wave your hands a lot. This trick doesn’t work with monads because, except for category theorists, nobody comes to Haskell with any intuitions about monads. The name monad is scary in itself (this is probably why monads in F# are innocuously called “computation expressions”).

In my n’th attempt to tackle monads I tried a different, mystical, approach. In mysticism you often talk about things that you cannot define. Taoism takes this approach to the extreme — any attempt to define Tao must fail: “Tao that can be described is not the eternal Tao.” Hence the slightly tongue-in-cheek approach I took in this tutorial, Pure Functions, Laziness, I/O, and Monads. If you’re new to Haskell, it might help if you read the two previous tutorials in my Basics of Haskell. Notice that these tutorials are in the new format introduced by the School of Haskell, where you can run and edit program snippets directly in your browser.

I must confess that this tutorial got a slightly cold welcome in the Haskell community. That’s why I’m curious what non-Haskellers think of it. Does my approach make monads less or more scary? Take into account that I decided to tackle monads right after explaining the very basics of Haskell syntax — in most courses or books monads are treated as an advanced topic.


At last I can start blabbing about it. The School of Haskell went beta. We’ve been working on it for the last half year with a team of top-notch Haskell programmers.

Our main effort is to create an online Haskell IDE, but the technology we’ve been developing for it was just aching to be used for teaching Haskell. Imagine an online tutorial where snippets of code can be compiled and run on the spot. You can play with them: edit, compile, and run with your changes. You can pull up instant documentation for any library function used in a snippet. Yup! And there’s more. Read the blog I linked to above and sign up for the beta. It’s a great opportunity to fulfill your New Year’s resolution to finally learn Haskell or to create some Haskell content yourself. Enjoy!


2013 starts on a very positive note: Simon Peyton-Jones has just endorsed our company, FP Complete. And, as the saying goes, he put his money where his mouth is: He invested his own money in FP Complete. Here’s what he wrote about us in his blog:

Of course, FP Complete thereby faces the challenge of making a business out of a language ecosystem, something known to be difficult! Here I am encouraged by the fact that Aaron and Bartosz are not, like many of us, primarily techno-geeks whose primary motivation is the technology itself (just read the Haskell Weekly News if you have any doubt what I mean). They are excited by Haskell all right, but they want to build a business, and they have experience of doing just that. And I’m very encouraged by the fact that Michael Snoyman (a giant of the Haskell community) has joined them, along with several others.

Of course, if you’ve been following my blogs you know that I am a techno-geek too. You might have noticed a slow evolution in my interests from the messiness of C++ and its template metaprogramming, through a flirt with the D language, the pitfalls of concurrency and parallelism, to the clarity and robustness of Haskell. If I was able to entice some of you to make some of this journey with me, it makes me very happy.

But, as Simon PJ pointed out, loving a language and having fun with it is not the same as creating a successful business. Aaron (the CEO) and I have been spending a lot of time discussing business strategies. Everything we do is a deliberate step towards self-sustainability and profitability. All our work would be for naught if we couldn’t create a steady stream of revenue. It is a tremendous task, but we have experience and we are quick learners. We talk to companies to help them figure out how they can improve their bottom lines using Haskell. We identify the obstacles in the way of adoption of Haskell and help overcome them. If we can tap into even a small percentage of the profits that the industry will make by adopting Haskell, our future is secured. And Simon P-J may rest assured that his investment will pay off manyfold.
