I’m not fond of arguments based on lack of imagination. “There’s no way this code may fail!” might be a sign of great confidence or the result of ignorance. The inability to come up with a counterexample doesn’t prove a theorem. And yet there is one area of programming where such arguments work, and are quite useful. These are parametricity arguments: free theorems about polymorphic functions. Fortunately, there is solid theory behind parametricity. Free theorems are not based on ignorance. So I decided to read the relevant papers (see bibliography at the end of this post) and write a blog about it. How hard could it be? A few months and several failed attempts later I realized how naive I was. But I think I finally understand the basics enough to explain them in relatively simple terms.


Here’s a classic example — a function that takes a list of arbitrary type a and returns a list of the same type:

r :: [a] -> [a]

What can this function do? Since it has to work with any type of list element, it can’t do anything type-specific. It can’t modify the elements or invent new ones. So all it can do is rearrange them, duplicate, or remove. Can you think of anything else?

The questions it a little tricky because it all depends on the kind of polymorphism your language supports. In Haskell, where we have parametric polymorphism, the above statement is for the most part true (modulo termination worries). In C++, which supports ad-hoc polymorphism, a generic function like:

template<class T> 
list<T> r(list<T>);

can do all kinds of weird things.

Parametric polymorphism means that a function will act on all types uniformly, so the declaration at the top indeed drastically narrows down the possibilities.

For instance, try fmapping any function of the type:

f :: a -> b

over a list of a. You can either apply fmap before or after acting on it with r. It shouldn’t matter whether you first modify the elements of the list and then rearrange them, or first rearrange and then modify them. The result should be the same:

r (fmap f as) = fmap f (r as)

But is it true just because we can’t imagine how it may fail, or can we make a formal argument to prove it?

Let’s Argue (Denotational) Semantics

One way to understand polymorphism is to have a good model for types. At first approximation types can be modeled as sets of values (strictly speaking, as shown by Reynolds, the set-theoretical model fails in the case of polymorphic lambda calculus, but there are ways around it).

The type Bool is a two-element set of True and False, Integer is a set of integers, and so on. Composite types can also be defined set-theoretically. For instance, a pair type is a cartesian product of two sets. A list of a is a set of lists with elements from the set a. A function type a->b is a set of functions between two sets.

For parametric polymorphism you need to first be able to define functions on types: functions that take a type and produce a new type. In other words, you should be able to define a family of types that is parametrized by another type.

For instance, given some type a, produce a type of pairs: (a, a). This can be formally written (not in Haskell) as:

Λa . (a, a)

Notice the capital lambda for defining functions on types, as opposed to the lowercase lambda used for functions on values.

To turn a family of types into a family of values, you put the universal quantifier in front of it, as in:

forall a . (a, a)

This is a valid type signature, but you’d be hard pressed to implement it. You’d have to provide a pair of concrete values for every possible type. You can’t do it uniformly across all types.

But there are some types of values that can be specified wholesale. These are function values. With one formula you can define a whole family of functions. The following signature, for instance, is perfectly implementable:

forall a . a -> a

Let’s analyze it. It consists of a type function:

Λa . a -> a

which, for any type a, returns a function type a->a. When universally quantified with forall, it becomes a family of concrete functions parameterized by type. This is possible because all these functions can be defined with one closed term:

id x = x

In Haskell, we usually omit the forall quantifier when there’s no danger of confusion. Any signature that contains a type variable is automatically universally quantified over it. (You’ll have to use explicit forall, however, with higher-order polymorphism, where a polymorphic function can be passed as an argument to another function.)

So what’s the set-theoretic model for polymorphism? You simply replace types with sets. A function on types becomes a function on sets. Notice that this is not the same as a function between sets. The latter assigns elements of one set to elements of another. The former assigns sets to sets. As in: Take any set a and return a cartesian product of the set with itself. (You might be tempted to think of functions on sets as functions between sets of all sets, but this kind of thinking leads to well known paradoxes.)

Or take any set a and return the set of functions from this set to itself. For this one we can easily build a polymorphic function — one which for every type a produces an actual function whose type is (a->a). So, for instance, if a is Int, we will produce a function (Int->Int); if a is Bool, we will produce a function (Bool->Bool); and so on. Now, with ad-hoc polymorphism it’s okay to code the Int function separately from the Bool function; but in parametric polymorphism, you’ll have to use the same code for all types.

Any language that provides some kind of pattern-matching on types (e.g., template specialization in C++) automatically introduces ad-hoc polymorphism. Incidentally, ad-hoc polymorphism is also possible in Haskell with the use of type families.

Preservation of Relations

Let’s go to our original example and rewrite it using the explicit universal quantifier:

r :: forall a. [a] -> [a]

It defines a family of functions parametrized by the type a. When used in Haskell code, a particular member of this family will be picked automatically by the type inference system, depending on the context. In what follows, I’ll use explicit subscripting for the same purpose. The free theorem I mentioned before can be rewritten as:

rb (fmap f as) = fmap f (ra as)

with the function:

f :: a -> b

serving as a bridge between the types a and b. Specifically, f relates values of type a to values of type b. This relation happens to be functional, which means that there is only one value of type b corresponding to any given value of type a.

But the correspondence between elements of two lists may, in principle, be more general. What’s more general than a function? A relation. A relation between two sets a and b is defined as a set of pairs — a subset of the cartesian product of a and b. A function is a special case of a relation, one that can be represented as a set of pairs of the form (x, f x), or in relational notation x <=> f x. This relation is often called the graph of the function, since it can be interpreted as coordinates of points on a 2-d plane that form the plot the function.

The key insight of Reynolds’ was that you can abstract the shape of a data structure by defining relations between values. For instance, how do we know that two pairs have the same shape — even if one is a pair of integers, say (1, 7), and the other a pair of colors, say (Red, Blue)? Because we can relate 1 to Red and 7 to Blue. This relation may be called: “occupying the same position”.

Notice that the relation doesn’t have to be functional. The pair (2, 2) can be related to the pair (Black, White) using the non-functional relation:

(2 <=> Black),
(2 <=> White)

Conversely, given any relation between integers and colors, you can easily test which integer pairs are related to which color pairs. For the above relation, for instance, these are all the pairs that are related:

((2, 2) <=> (Black, Black)),
((2, 2) <=> (Black, White)),
((2, 2) <=> (White, Black)),
((2, 2) <=> (White, White))

This idea is easily extended to lists. Two lists are related if their corresponding elements are related: the first element of one list must be related to the first element of the second list, etc.; and empty lists are always related.

In particular, if the relationship between elements is established by a function f, it’s easy to convince yourself that the lists as and bs are related if

bs = fmap f as

With this in mind, our free theorem can be rewritten as:

rb bs = fmap f (ra as)

In other words, it tells us that the two lists

rb bs


ra as

are related under f.

Fig 1. Polymorphic function r rearranges lists but preserves relations between elements

Fig 1. Polymorphic function r rearranges lists but preserves relations between elements

So r transforms related lists into related lists. It may change the shape of the list, but it never touches the values in it. When it acts on two related lists, it rearranges them in exactly the same way, without breaking any of the relations between corresponding elements.

Reading Types as Relations

The above examples showed that we can define relations between values of composite types in terms of relations between values of simpler types. We’ve seen this with the pair constructor and with the list constructor. Continuing this trend, we can state that two functions:

f :: a -> b


g :: a' -> b'

are related iff, for related x and y, f x is related to g y. In other words, related functions map related arguments to related values.

Notice what we are doing here: We are consistently replacing types with relations in type constructors. This way we can read complex types as relations.

But what about primitive types? Let’s consider an example. Two functions from lists to integers that simply calculate the lengths of the lists:

lenStr  :: [Char] -> Int
lenBool :: [Bool] -> Int

What happens when we call them with two related lists? The first requirement for lists to be related is that they are of equal length. So when called with related lists the two functions will return the same integer value . It makes sense for us to consider these two functions related because they don’t inspect the values stored in the lists — just their shapes. They also look like components of the same parametrically polymorphic function, length.

It therefore makes sense to read a primitive type, such as Int, as an identity relation: two values are related if they are equal. This way our two functions, lenStr and lenBool are indeed related, because they turn related lists to related results.

Notice that for non-polymorphic functions the relationship that follows from their type is pretty restrictive. For instance, two functions Int->Int are related if and only if their outputs are equal for equal inputs. In other words, the functions must be (extensionally) equal.

Now let’s go back to polymorphic functions. The type of a polymorphic function is specified by universally quantifying a function on types.

f :: forall A. ΦA

The function Φ maps types to types. In our set-theoretical model it maps sets to sets, but we want to read it in terms of relations.

A function on relations does two things: it maps input sets to output sets, and it maps pairs from input sets to pairs from output sets. If you pick an arbitrary relation A between sets a and a' then Φ will map A to a relation ΦA between sets b and b'. The connection between a and b, and between a' and b' may, in principle, vary with the input relation.

In describing polymorphic functions, we are only interested in functions on relations that “factorize” into functions on sets. It means that Φ defines two families of functions on sets, φ and ψ, such that our b is given by φa and b' is given by ψa'. ΦA then establishes a relation between the sets φa and ψa'.

Fig 2. Φ maps relations to relations

Fig 2. Φ maps relations to relations

Given that Φ maps relations to relations, a universally quantified version of it:

forall A. ΦA

maps pairs of sets to pairs of values.

Now suppose you have two polymorphic functions g and g':

g  :: forall a . φa
g' :: forall a'. ψa'

They both map types (sets) to values. We can instantiate g at some type a, and it will return a value ga of the type b=φa. We can instantiate g' at some type a', and it will return a value g'a' of the type b'=ψa'. We can do this for any relation A between two arbitrary sets a and a'.

We will say that g and g' are related through the relation induced by the type (forall A. ΦA) iff:

  • Φ factorizes, as a function on sets, into φ and ψ, and
  • the results ga and g'a' are related by ΦA.
Fig 3. Relation between two polymorphic functions

Fig 3. Relation between two polymorphic functions

In other words, polymorphic functions are related if they map related types to related values. Notice that in the interesting examples these values are themselves functions.

With these definitions, we can now reinterpret any type signature as a relation between values.

The Parametricity Theorem

Reynolds’ second key insight was that any term is in a relation with itself — the relation being defined by the term’s type. We have indeed defined the mapping of types to relations to make this work. Primitive types turn into identity relations, so obviously a primitive value is in relation with itself. A function between primitive types is in relation with itself because it maps related (equal) arguments into related (equal) results. A list or a pair of primitive types is in relation with itself because each element of it is equal to itself. You can recurse and consider a list of functions, or a pair of lists, etc., building the proof inductively, proceeding from simpler types to more and more complex types. The proof goes over all possible term construction rules and typing rules in a given theory.

Formally, this kind of proof is called “structural induction,” because you’re showing that more complex structures will satisfy the theorem as long as the simpler ones from which they are constructed, do. The only tricky part is dealing with polymorphic functions, because they are quantified over all types. In fact, this is the reason why the naive interpretation of types as sets breaks down (see, however, Pitts’ paper). It is possible, however, to prove the parametricity theorem in a more general setting, for instance, using frames, or in the framework of operational semantics.

Wadler’s key insight was to interpret Reynolds’ theorem not only as a way of identifying different implementations of the same type — for instance, cartesian and polar representations of complex numbers — but also as a source of free theorems for polymorphic types.

Let’s try applying parametricity theorem to some simple examples. Take a constant term: an integer like 5. Its type Int can be interpreted as a relation, which we defined to be the identity relation (it’s one of the primitive types). And indeed, 5 is in this relation with 5.

Take a function like:

ord :: Char -> Int

Its type defines a relation between functions: Two functions of the type Char->Int are related if they return equal integers for equal characters. Obviously, ord is in this relation with itself.

Those were trivial examples. The interesting ones involve polymorphic functions. So let’s go back to our starting example. The term now is the polymorphic function r whose type is:

forall a . [a] -> [a]

Parametricity tells us that r is in relation with itself. However, comparing a polymorphic function to itself involves comparing the instantiations of the same function at two arbitrary types (sets), say a and a':

ra <=> ra'

We are free to pick an arbitrary relation A between elements of these sets. A will be mapped to another relation by ΦA which, in our case, is induced by the type:

[a] -> [a]
Fig 4. Polymorphic function r maps related types to related values, which themselves are functions on lists

Fig 4. Polymorphic function r maps related types to related values, which themselves are functions on lists

By our definition, two functions ra and ra’ are related if they map related arguments to related results. In this case the arguments and the results are lists. So if we have two related lists:

as  :: [a]
as' :: [a']

they must, by parametricity, be mapped to two related lists:

bs  = ra  as
bs' = ra' as'

This must be true for any type of relation, so let’s pick a functional relation generated by some (f :: a -> a'). We have:

as' = fmap f as

The results of applying r, therefore, must be related through the same relation:

bs' = fmap f bs

Putting it all together, we get our expected result:

ra' (fmap f as) = fmap f (ra as)

Parametricity and Natural Transformations

The free theorem I used as the running example is interesting for another reason: The list constructor is a functor. You may think of functors as generalized containers for storing arbitrary types of values. You can imagine that they have shapes; and for two containers of the same shape you may establish a correspondence between “positions” at which the elements are stored. This is quite easy for traditional containers like lists or trees, and with a leap of faith it can be stretched to non-traditional “containers” like functions. We used the intuition of relations corresponding to the idea of “occupying the same position” within a data structure. This notion can be readily generalized to any polymorphic containers. Two trees, for instance, are related if they are both empty, or if they have the same shape and their corresponding elements are related.

Let’s try another functor: You can also think of Maybe as having two shapes: Nothing and Just. Two Nothings are always related, and two Justs are related if their contents are related.

This observation immediately gives us a free theorem about polymorphic functions of the type:

r :: forall a. [a] -> Maybe a

an example of which is safeHead. The theorem is:

fmap h . safeHead == safeHead . fmap h

Notice that the fmap on the left is defined by the Maybe functor, whereas the one on the right is the list one.

If you accept the premise that an appropriate relation can be defined for any functor, then you can derive a free theorem for all polymorphic functions of the type:

r :: forall a. f a -> g a

where f and g are functors. This type of function is known as a natural transformation between the two functors, and the free theorem:

fmap h . r == r . fmap h

is the naturality condition. That’s how naturality follows from parametricity.


I’d like to thank all the people I talked to about parametricity at the ICFP in Gothenburg, and Edward Kmett for reading and commenting on the draft of this blog.

Appendix: Other Examples

Here’s a list of other free theorems from Wadler’s paper. You might try proving them using parametricity.

r :: [a] -> a -- for instance, head
f . r == r . fmap f
r :: [a] -> [a] -> [a] -- for instance, (++)
fmap f (r as bs) == r (fmap f as) (fmap f bs)
r :: [[a]] -> [a] -- for instance, concat
fmap f . r == r . fmap (fmap f)
r :: (a, b) -> a -- for instance, fst
f . r == r . mapPair (f, g)
r :: (a, b) -> b -- for instance, snd
g . r == r . mapPair (f, g)
r :: ([a], [b]) -> [(a, b)] -- for instance, uncurry zip
fmap (mapPair (f, g)) . r == r . mapPair (fmap f, fmap g)
r :: (a -> Bool) -> [a] -> [a] -- for instance, filter
fmap f . r (p . f) = r p . fmap f
r :: (a -> a -> Ordering) -> [a] -> [a] -- for instance, sortBy
 -- assuming: f is monotone (preserves order)
fmap f . r cmp == r cmp' . fmap f
r :: (a -> b -> b) -> b -> [a] -> b -- for instance, foldl
-- assuming: g (acc x y) == acc (f x) (g y)
g . foldl acc zero == foldl acc (g zero) . fmap f
r :: a -> a -- id
f . r == r . f
r :: a -> b -> a -- for instance, the K combinator
f (r x y) == r (f x) (g y)


mapPair :: (a -> c, b -> d) -> (a, b) -> (c, d)
mapPair (f, g) (x, y) = (f x, g y)


  1. John C Reynolds, Types, Abstraction and Parametric Polymorphism
  2. Philip Wadler, Theorems for Free!
  3. Claudio Hermida, Uday S. Reddy, Edmund P. Robinson, Logical Relations and Parametricity – A Reynolds Programme for Category Theory and Programming Languages
  4. Derek Dreyer, Paremetricity and Relational Reasoning, Oregon Programming Languages Summer School

In category theory we never look inside objects. All information about objects is encoded in the arrows (morphisms) between them. In set theory, on the other hand, we express properties of sets through their elements. But there is a strong link between categories and sets, at least in the case of locally small categories. In those categories, morphisms between any two objects form a set. We call this set the hom-set.

Go one level up in abstraction and you have categories of functors, which are mappings between categories. In a functor category a functor is treated as an object, and a natural transformation — a mapping between functors– is treated as a morphism. So a hom-set in a functor category is a set of natural transformations between two functors. An interesting question arises: How is this set related to the hom-sets in the categories that are mapped by these functors? After all, they are sets in the same category of sets. The answer involves, as usual, a universal construction. This one is called an end.


The problem with category theory is that it deals with so many levels of abstraction that you quickly run out of fonts for your notation. I’ll try to stick to more or less the standard notation. I’ll use capital letters C, D, etc. for categories (script font would be nice, but it’s not very practical in a blog). The corresponding lower case letters, c, d — optionally with primes, like c', d' — will denote objects in those categories. I’ll use f, g, h, etc. for morphisms and F, G, H for functors. I’ll also use the terse notation for hom-sets, so the set of all morphisms between objects c and c' in category C will be simply C(c, c'). The set of natural transformations between two functors F and G, both mapping C to D, will be [C, D](F, G). I will not use parentheses for functors acting on objects or morphisms, so F acting on c will be simply F c (and similarly F f, when acting on f). But occasionally I’ll break the rules if it helps the presentation.

Natural Transformations

A natural transformation is a map between functors. So let’s start with two functors F and G, both going from category C to another category D. Let’s focus on one object in C, call it c. F maps c to F c and G maps c to G c. These two, F c and G c, are objects in D. As such they define a hom-set: the set of all morphisms from F c to G c denoted D(F c, G c). A natural transformation picks one morphism from every such hom-set. Not all picks are natural, as we’ll see soon.

Hom-set defined by two images of object c under functors F and G

Hom-set defined by two images of object c under functors F and G

A hom-set, being a set, is also an object in Set — the category of sets. So let’s look at the hom-sets D(F c, G c) as objects in Set. There is one for every c. If we carefully pick one element from each, we will have a natural transformation (I said “carefully”). A natural transformation is a family of morphisms picked from hom-sets D(F c, G c), one for each c. If you think of those hom-sets as fibers growing from objects in C, a natural transformation is a cross-section through all these fibers.

Here’s another way of looking at this. Let’s pick an arbitrary set in Set, call it X (sorry, I’m using a capital letter for an object in Set, but I do need lowercase x for its element). A function φ from X to the set D(F c, G c) maps an element x of X to a particular morphism in D(F c, G c). We can think of it as picking a component of some natural transformation. Each choice of x would potentially correspond to a different natural transformation. So the set X could be looked upon as representing some set of natural transformations. We still have to fill in a lot of blanks, but we are on the right track.

A function from a set X to a hom-set defined by two functors F and G

A function from a set X to a hom-set defined by two functors F and G

Since a natural transformation is a family of morphisms, we need a family of such functions from X to D(F c, G c), one for every c. Let’s call this family τ. When we fix c, it’s a function from X to D(F c, G c). When we fix x, it’s a precursor of a natural transformation: a family of morphisms picked from hom-sets.

A family of functions parameterized by c and x

A family of functions parameterized by c and x

The only thing missing from this picture is the naturality condition. Not all families of D(F c, G c) are natural. We need to account for the fact that our functors F and G not only map objects, but also morphisms.

So let’s look at morphisms in C and how they are mapped to morphisms in D by, say, our functor F. We can pick two object in C, c and c'. Morphisms between those two form a hom-set C(c, c'). F maps this hom-set to another hom-set, D(F c, F c'). Similarly, G maps the same hom-set to D(G c, G c').

Four hom-sets defined by two functors F and G and two objects c and c'

Four hom-sets defined by two functors F and G and two objects c and c’

Taken together, we have a diagram of four hom-sets between four objects: F c, F c', G c, and G c'. Fixing a morphism f from c to c' (an element of C(c, c')) picks two morphisms, one F f from D(F c, F c') and one G f from D(G c, G c'), given the functoriality of F and G. Fixing an x in X picks two morphisms, one τc x from D(F c, G c) and one τc' x from D(F c', G c'). These four better commute:

G f . τc x = τc' x . F f
The commuting naturality square

The commuting naturality square

That’s the naturality condition. Any τ that satisfies this condition defines a set of natural transformations, one for each x.


All this time I’ve been setting up the scene for one important insight. The set X and the family of functions τ look a lot like a cone (see my blog post about limits). Except that, instead on one functor, we have two, and τ is not really a natural transformation. But we are getting close. And if we can carve out something cone-like out of this construction, than we could maybe find something limit-like. And indeed the cone-like object is called a wedge and the limit-like thing is called an end, and in our case the end would be a set of all natural transformations from F to G. So let’s work this thing out.

If we were constructing a cone, we’d start with a functor from our source category C to the target category — Set in this case. That’s easy to define for objects: c goes into the set D(F c, G c). But we run into a problem with morphisms. Suppose we want to map a morphism h that goes from c to c':

h : c -> c'

Its image should be a function from D(F c, G c) to D(F c', G c'). It’s a function that maps morphisms to morphisms. Let’s see what morphisms are at our disposal. We have:

F h : F c -> F c'
G h : G c -> G c'

How can we turn a morphism f from D(F c, G c)

f : F c -> G c

into a morphism f' from D(F c', G c'):

f' : F c' -> G c'

One thing we could do is to post-compose G h after f to get to G c'; but there is no way to get anywhere from F c'. So it can’t be that simple.

But notice one thing. Even though τ maps elements of X to “diagonal” hom-sets (by diagonal I mean that the argument c is repeated in D(F c, G c)), naturality condition involves off-diagonal hom-sets, like D(F c, F c') and D(G c, G c') (c potentially different from c'). So maybe we should open our minds to those off-diagonal hom-sets in our search of a functor? How about a functor that maps a pair of elements (c, c') to a hom-set D(F c, G c')? Let’s see if we can figure out its action on morphisms.

Now, since we are constructing a functor of two arguments, we have to define its action on a pair of morphisms, (f, g).

f : c -> c'
g : d -> d'

The image of this pair should be a function in Set that maps D(F c, G d) into D(F c', G d'). Unfortunately, we run into the same problem again. Given h : F c -> G d, we can follow it with G g : G d -> G d', but there is no way we can move anywhere from F c'. The only morphism at our disposal goes the wrong way, F f : F c -> F c'. If we could only reverse it!

Ah! but functors that go “the wrong way” on morphisms are well known. They are called contravariant functors, as opposed to the good old covariant functors that we have grown to love and cherish. What we need is a functor that’s contravariant in the first argument and covariant in the second. Such functors even have a name: they are called profunctors.

So, given a pair of morphisms (f, g) our profunctor Snat will map a morphism from D(F c', G d) to a morphism from D(F c, G d'). Notice that this time I have inverted c and c'.

Given h : F c' -> G d, we can easily construct a correpsponding morphism in D(F c, G d') by this composition:

G g . h . F f : F c' -> G d
Constructing the action of a profunctor built from two functors on a pair of morphisms

Constructing the action of a profunctor built from two functors on a pair of morphisms

So that’s our action of Snat on a pair of morphisms from C. It turns a morphism h into:

(Snat f g) h = G g . h . F f

Let’s summarize what we have so far: We have a profunctor Snat, a set X, and a family of morphisms τ from X to diagonal elements Snat c c. We are not interested in defining τ for off-diagonal elements of Snat. What remains is to impose some kind of generic condition on τ that would let it generate natural transformations only. We would like to formulate this condition in terms of a general profunctor S, if possible.

What’s the minimal consistency condition on τ, given that it only generates diagonal elements S c c? We need a way to somehow connect two such objects, say S c c and S c' c'. Suppose we have a morphism f : c -> c'. How can we use this morphism to operate on S c c and S c' c'? A profunctor S can be used to map a pair of morphisms, so how about pairing our f with something obvious that always exists, like the identity morphism id? Let’s try it:

S idc f  : S c  c  -> S c c'
S f idc' : S c' c' -> S c c'

(Notice the inversion of c and c' when f is used as the first argument — that’s our contravariance in action.) We have found two ways to get from two diagonal elements to the same non-diagonal element of S. Together with τc and τc' they form a diamond. This diamond better commute or we’re in trouble.

The wedge condition

The wedge condition

S idc f . τc = S f idc' . τc'

Now we can finally define a wedge for an arbitrary profunctor S. A wedge for S consists of an object X and a family of morphisms τc from X to S c c that satisfy the wedge condition above.

And if we plug in our special profunctor Snat that maps pairs of objects to hom-sets, the wedge condition turns into the naturality condition for our two functors. Let’s check this out.

Remember, this is how Snat acts on a pair of morphisms f and g:

(Snat f g) h = G g . h . F f

Substituting identities in the right places, we get (remember, functors turn identities into identities):

(Snat idc f) h = G f . h
(Snat f idc') h' = h' . F f

Notice that h and h' are morphisms in D but, at the same time, elements of sets S c c and S c' c' in the wedge condition. They are given, respectively, by the action of τc and τc’ on an element x of X. The wedge condition for Snat will therefore post-compose τc and pre-compose τc:

G f . τc = τc' . F f

And this is exactly the naturality condition for components of τ. It means that τ that satisfies the wedge condition is a natural transformation for any choice of x.

Dinatural transformations

This definition of a wedge looks almost like the definition of a cone, except that the commuting conditions in a cone were expressed in terms of naturality of the transformation between the diagonal functor ΔX and the functor F. Here, we could also say that the object X is an image of a diagonal profunctor ΔX — trivially defined to map pairs of objects from C to an object X, and pairs of morphisms to the identity morphism on X.

A wedge

A wedge (Strictly speaking, a profunctor is a mapping from CopxC to D — the “op” accounting for the reversal of the direction of morphisms in the first argument)

So what we need to complete the picture is some generalized notion of a natural transformation between two profunctors R and S. Since we are only interested in the mapping of diagonal elements of profunctors, this transformation will be called a dinatural transformation (diagonally natural). All we need is to expand the top vertex of the diamond diagram (the wedge condition) to create the commuting hexagon below.

Dinaturality condition

Dinaturality condition

S idc f . τc . R f idc = S f idc' . τc' . R idc' f

With this dinaturality condition we can define a wedge for any profunctor S with a vertex X as a dinatural transformation between ΔX and S.

Now, just as we have defined a limit as a universal cone, we can define an end as a universal wedge. An end is an object End S and a dinatural transformation ω such that for any other wedge (X, τ) there is a unique morphism h from X to End S for which all the triangles in the following diagram commute:

The end as a universal wedge

The end as a universal wedge

In particular, since the wedges for our profunctor Snat defined natural transformations, their end defines the set of all natural transformations between the functors F and G: [C, D](F, G).

There is special notation for an end using the integral symbol. We are “integrating” a profunctor S along a diagnonal over all objects c in category C.

End S = ∫ c S(c, c)

Using this notation, the set of natural transformations can be written as the following end:

[C, D](F, G) = ∫ c D(F c, G c)

Note: This is a bit of abuse of notation that happens quite often in category theory. The integral sign makes more sense for coends, which are duals of ends. Coends are related to colimits, which include coproducts, which are the category theory proxies for sums. In this sense, a coend is a generalization of a sum; just as an integral is an infinite continuous sum. By the same token, an end is a generalization of a product. Unfortunately, there is no common symbol for a continuous product in calculus, so we are stuck with the integral symbol.

In Haskell, ends become universal quantifiers, hence this definition of natural transformation from Edward Kmett’s category extras (here f and g are functors):

type f :~> g = forall a. f a -> g a
type Natural f g = f :~> g

You can find more about representing profunctors, dinatural transformations, and ends in Haskell in Edward’s blog.

The Moment of Zen

You can think of a hom-set as defining a mapping: It takes a pair of objects and generates a set of morphisms — an object in Set. A profunctor generalizes this idea. It takes a pair of objects and generates a hom-set-like object in a category that’s not necessarily the category of sets. The mapping from a pair of objects to a hom-set is functorial: its action on pairs of morphisms is well defined as well. Except that this action is contravariant in the first argument and covariant in the second. And so is a profunctor which imitates it.

We’ve seen before how a functor can embed a simple graph into a category and define a limit. The limit embodies the structure of this graph in a single object. A profunctor embeds a structure of a hom-set into a category. An end then embodies this structure in a single object. When this hom-set structure is fashioned using two functors, the end becomes a set of mappings between those two functors — a set of natural transformations. But nothing stops us from fashioning more complex hom-set-like structures and finding their ends.

One such construction I’ve had my eyes on for a long itme is called the Kan extension. To give you an idea, imagine a functor T that’s defined on a sub-category M of a bigger category C. It maps it into a category A. The question is, can we extend, or interpolate, this functor over the rest of C? How would we define its value on an object c that’s not in M? The trick is to look at all the hom-sets that go from c to objects in M.

Kan extension

Kan extension setup

Our extended functor will have to map not only c to an object in A, but also all those hom-sets to hom-sets in A. After all, that’s what functors do. This mapping of hom-sets looks very much like a profunctor. It has to be contravariant in its source and covariant in its target. If this profunctor has an end, that’s a perfect candidate for the image of c under the extended functor. That, in a nutshell, is the idea behind the right Kan extension (the left one is, predictably, built with coends). But that’s a topic for another blog post.

Haskell is a language deeply rooted in category theory. But as you don’t need to study the root system of Vitis vinifera in order to enjoy a glass of wine, you don’t need to know much about category theory in order to program in Haskell. Nevertheless, some of us just can’t help ourselves. We have to dig into the rich terroir of category theory to gain deeper insight into the art of functional programming. And today I’d like to talk about functions.

The basic category-theoretical model for Haskell is the category Hask, where objects are Haskell types, and morphisms are functions. The problem with this picture is that it puts functions on a different footing than the rest of the language. Functions from type A to type B — in other words, morphisms from object A to object B in Hask — form a set. This set is called the hom set, Hom(A, B). The fact that it’s just a set and not something bigger is a property of Hask — the property of being locally small. But in Haskell functions from type A to type B also form a type A->B. A type is an object in Hask. So what’s the connection between the set Hom(A, B) and the object A->B? The answer to this question is very interesting and involves products, exponentials, currying, and of course universal constructions.

In my previous blog I talked about the universal construction of limits — objects that represent relationships between other objects. In particular, a product can be defined as such a limit representing the most trivial relationship between two objects — that of just being two objects. Morphisms are also involved in relationships between objects, so maybe there is a way of representing them as an object as well. And indeed, it’s possible to define an object to represent a set of morphisms between two other objects A and B. Such an object is called the exponential and denoted by BA.

Notice that the domain A of the morphisms appears in the exponent. That might seem odd at first, but it makes perfect sense if you consider the relationship between multiplication (product) and exponentiation. In arithmetic, mn means m multiplied by itself n times. If you replace m and n with types (for simplicity, think of types as sets of values) and multiplication with (set-theoretical) product, you can think of mn as a set of n-tuples of values of type m: (m1, m2, m3,… mn). Of course, if n is a type, it’s not immediately clear what an n-tuple is (it’s a categorical power), but you can gain some intuition if you consider enumerated finite types. For instance, functions from Bool to any type m, Bool->m, can be represented as all possible pairs of ms (one value for True and one for False). They correspond to the exponential mBool. Also, for finite types, the number of different functions from m to n is equal to mn. But the connection between products and exponentials goes deeper than that.

Universal Construction

The basic relationship describing a function is that of application. Given a pair (function, argument), produce a result. It terms of types, a function of type X->Y applied to X produces Y. We want to define the exponential object YX to model this relationship. How do we do that?

There isn’t really that much choice. We need to map a pair of objects (YX, X) to Y. But what is a pair, and what does it mean to map? We can represent the pair as an object — a product of YX × X — and then we can map it to Y using a morphism, which we’ll call app.

It immediately follows that we can’t define exponential objects if we don’t have products. Again, it kind of make intuitive sense — exponentiation arising from iterated multiplication.

From previous experience we know that having a relationship between objects is usually not enough to define a new object. There may be many other objects that model this relationship. We need a way to compare them and pick the one that models it best.

So suppose that we have an impostor object Z, together with a morphism g from Z × X to Y impersonating application. We know that our choice for YX is universal if for any Z and g there is a unique morphism, which we’ll call λg, that maps Z to YX, and which factors through app:

g = app . (λg, id)
Universality diagram defining the exponential object

Universality diagram defining the exponential object

Such universal object might not exist in every category, but it does in Hask. In general, a category in which there is a terminal object, a product of any two objects, and an exponential of any two objects is called Cartesian closed. Cartesian closed categories are, for obvious reasons, very important in computer science.


There’s another way of looking at the diagram that defines the exponential object. You can think of the morphism g as a function of two variables:

g :: (Z, X) -> Y

For any such g there is a unique morphism λg that maps Z to YX, an object representing a function from X to Y. This establishes a one-to-one correspondence between functions of two variables and functions returning functions, which we know under the name of currying. So currying “falls out” of the definition of the exponential object.


Any time there is a one-to-one correspondence between sets of morphisms you might want to look for an underlying adjunction. You might remember from my previous blog post that a functor F is said to be left adjoint to the functor G (or G right adjoint to F) if the following two hom sets are naturally isomorphic:

Hom(FZ, Y) ~ Hom(Z, GY)

In our case we have a one-to-one mapping between the morphism g from Z×X to Y and the morphism λg from Z to YX. In a category where all products and all exponentials exist, we can define these two functors:

FXZ = Z × X

In Haskell, these functors would be implemented as:

newtype F x z = F (z, x)
instance Functor (F x) where
    fmap f (F (z, x)) = F (f z, x)

newtype G x y = G (x -> y)
instance Functor (G x) where
    fmap f (G g) = G (f . g)

and the isomorphism of hom sets would be given by the function phi and its inverse phi':

phi :: (F x z -> y) -> z -> G x y
phi f z = G $ \x -> f (F (z, x))

phi' :: (z -> G x y) -> F x z -> y
phi' g (F (z, x)) = let G f = g z 
                    in f x

Exponentiation can thus be defined as the right adjoint of taking a product.

C++ is like an oil tanker — it takes a long time for it to change course. The turbulent reefs towards which C++ has been heading were spotted on the horizon more than ten years ago. I’m talking, of course, about the end of smooth sailing under the Moore’s law and the arrival of the Multicore. It took six years to acknowledge the existence of concurrency in the C++11 Standard, but that’s only the beginning. It’s becoming more and more obvious that a major paradigm shift is needed if C++ is to remain relevant in the new era.

Why do we need a new paradigm to deal with concurrency? Can’t we use object oriented programming with small modifications? The answer to this question goes to the heart of programming: it’s about composability. We humans solve complex problems by splitting them into smaller subproblems. This is a recursive process, we split subproblems into still smaller pieces, and so on. Eventually we reach the size of the problem which can be easily translated into computer code. We then have to compose all these partial solutions into larger programs.

The key to composability is being able to hide complexity at each level. This is why object oriented programming has been so successful. When you’re implementing an object, you have to deal with its internals, with state transitions, intermediate states, etc. But once the object is implemented, all you see is the interface. The interface must be simpler than the implementation for object oriented programming to make sense. You compose larger objects from smaller objects based on their interfaces, not the details of their implementation. That’s how object oriented programming solves the problem of complexity.

Unfortunately, objects don’t compose in the presence of concurrency. They hide the wrong kind of things. They hide sharing and mutation. Let me quote the definition of data race: Two or more threads accessing the same piece of memory at the same time, at least one of them writing. In other words: Sharing + Mutation = Data Race. Nothing in the object’s interface informs you about the possibility of sharing and mutation inside the object’s implementation. Each object in isolation may be data-race-free but their composition may inadvertently introduce data races. And you won’t know about it unless you study the details of their implementation down to every single memory access.

In Java, an attempt had been made to mollify this problem: Every object is equipped with a mutex that can be invoked by declaring the method synchronized. This is not a scalable solution. Even Java’s clever thin lock implementation incurs non-negligible performance overhead, so it is used only when the programmer is well aware of potential races, which requires deep insight into the implementation of all subobjects, exactly the thing we are trying to avoid.

More importantly, locking itself doesn’t compose. There’s a classic example of a locked bank account whose deposit and withdraw methods are synchronized by a lock. The problem occurs when one tries to transfer money from one account to another. Without exposing the locks, it’s impossible to avoid a transient state in which the funds have already left one account but haven’t reached the second. With locks exposed, one may try to hold both locks during the transfer, but that creates a real potential for deadlocks. (Software Transactional Memory provides a composable solution to this problem, but there are no practical implementations of STM outside of Haskell and Clojure.)

Moreover, if we are interested in taking advantage of multicores to improve performance, the use of locks is a non-starter. Eking out parallel performance is hard enough without locks, given all the overheads of thread management and the Amdahl’s law. Parallelism requires a drastically different approach.

Since the central problem of concurrency is the conflict between sharing and mutation, the solution is to control these two aspects of programming. We can do mutation to our heart’s content as long as there’s no sharing. For instance, we can mutate local variables; or we can ensure unique ownership by making deep copies, using move semantics, or by employing unique_ptrs. Unique ownership plays very important role in message passing, allowing large amounts of data to be passed cheaply between threads.

However, the key to multicore programming is controlling mutation. This is why functional languages have been steadily gaining ground in concurrency and parallelism. In a nutshell, functional programmers have found a way to program using what, to all intents and purposes, looks like immutable data. An imperative programmer, when faced with immutability, is as confused as a barbecue cook in a vegetarian kitchen. And the truth is that virtually all data structures from the C++ standard library are unsuitable for this kind of programming — the standard vector being the worst offender. A continuous slab of memory is perfect for random or sequential access, but the moment mutation is involved, you can’t share it between threads. Of course, you can use a mutex to lock the whole vector every time you access it, but as I explained already, you can forget about performance and composability of such a solution.

The trick with functional data structures is that they appear immutable, and therefore require no synchronization when accessed from multiple threads. Mutation is replaced by construction: you construct a new object that’s a clone of the source object but with the requested modification in place. Obviously, if you tried to do this with a vector, you’d end up with a lot of copying. But functional data structures are designed for maximum sharing of representation. So a clone of a functional object will share most of its data with the original, and only record a small delta. The sharing is totally transparent since the originals are guaranteed to be immutable.

A singly-linked list is a classical, if not somewhat trivial, example of such a data structure. Adding an element to the front of a list requires only the creation of a single node to store the new value and a pointer to the original (immutable) list. There are also many tree-like data structures that are logarithmically cheap to clone-mutate (red-black trees, leftist heaps). Parallel algorithms are easy to implement with functional data structures, since the programmer doesn’t have to worry about synchronization.

Functional data structures, also known as “persistent” data structures, are naturally composable. This follows from the composability of immutable data — you can build larger immutable object from smaller immutable objects. But there’s more to it: This new way of mutating by construction also composes well. A composite persistent object can be clone-mutated by clone-mutating only the objects on the path to the mutation; everything else can be safely shared.

Concurrency also introduces nonstandard flows of control. In general, things don’t progress sequentially. Programmers have to deal with inversion of control, jumping from handler to handler, keeping track of shared mutable state, etc. Again, in functional programming this is nothing unusual. Functions are first class citizens and they can be composed in many ways. A handler is nothing but a continuation in the continuation passing style. Continuations do compose, albeit in ways that are not familiar to imperative programmers. Functional programmers have a powerful compositional tool called a monad that, among other things, can linearize inverted flow of control. The design of libraries for concurrent programming makes much more sense once you understand that.

A paradigm shift towards functional programming is unavoidable and I’m glad to report that there’s a growing awareness of that new trend among C++ programmers. I used to be the odd guy talking about Haskell and monads at C++ meetings and conferences. This is no longer so. There was a sea change at this year’s C++Now. The cool kids were all talking about functional programming, and the presentation “Functional Data Structures in C++” earned me the most inspiring session award. I take it as a sign that the C++ community is ready for a big change.

In my last blog post I talked about a universal construction of a product in an arbitrary category. This kind of construction might seem very abstract to you and me, but not to a mathematician. Every step in that construction may be analyzed under a microscope and further generalized.

Let’s start with the selection of two objects whose product we are defining. What does it mean to select two objects in a category C? In any other branch of mathematics that would be a stupid question, but not in category theory. You select two objects by providing a functor from a two-object category to C.

You might be used to thinking of categories as those big hulking things, like the category of sets or monoids. But there are also dwarf categories that consist of one or two objects and just a handful of arrows between them. They don’t represent anything other than simple graphs. In particular, the simplest non-trivial category, called 1, is just a single object with one identity morphism looping back on itself. We can define a functor from that category to any other category. It will map the single object to a particular object in the target category. (It will also map the only morphism into the identity morphism for that target object.) This functor is the essence of picking an object in a category. Instead of saying “Pick an object in the category C,” you may say “Give me a functor from the singleton category to C.”

The next simplest category is a two-object category, {1, 2}. We have two objects and two identity morphisms acting on them. Assume there are no morphisms between the two objects (so it’s a “discrete” category). A functor F from this category to C is the essence of picking two objects in C.

Functor F from {1,2} to C

Fig 1. Functor F from {1,2} to C

Actually, there is an even simpler functor from {1, 2} to C, a degenerate functor that picks just one object in C — it maps both object to the same object in C. We’ll call this functor ΔX, where X is the target object in C.

Const functor from {1, 2} to C

Fig 2. Const functor from {1, 2} to C

When there are two functors, there may be a natural transformation between them. Let’s try to define such a transformation between ΔX and F. The first functor, ΔX, maps both objects 1 and 2 to X. The second, F, maps 1 to A and 2 to B. A natural transformation between these two functors has two components: the first is a morphism p1 from X to A, and the second is a morphism p2 from X to B.

Components of the natural transformation from the constant functor to F

Fig 3. Components of the natural transformation from the constant functor Δ to F

But that’s exactly one of the triples (X, p1, p2) that I used in the definition of a product in my previous post. The product was defined as the universal triple with a unique mapping from any other triple to it.

We no longer have to talk about selecting objects and constructing triples. Instead we can talk about natural transformations between the constant functor and a functor from the category {1, 2} to C.

This might not sound like a great simplification, but it’s an essential step toward the next generalization. We are going to replace the simple category {1, 2} with an arbitrary category J (sometimes called an index category). Again, we have the constant functor ΔX from J to C, which maps all objects in J to a single object X. It also maps all morphisms in J to a single identity morphism iX. We also have the functor F that embeds J in C, but it will now map morphisms as well as objects. It’s helpful to think of J as a graph with vertices and arrows. F embeds this graph into C. As before, a natural transformation from ΔX to F has as its components morphisms going from X to the vertices of the diagram defined by F. Because of its conical shape, this natural transformation is called a cone.

A cone

Fig 5. A cone

But a natural transformation not only acts between objects but also between morphisms. In fact “naturality” is defined in terms of morphisms.

Fig 6 illustrates the general case. We have two functors F and G. These functors map objects X and Y, respectively, to FX, FY, and GX, GY.

A morphism f from X to Y is mapped by F to Ff and by G to Gf.

On top of that, we have the natural transformation Φ from F to G, whose components are ΦX and ΦY. For Φ to be natural these four arrows must form a commutative diagram. In other words:

Gf . ΦX = ΦY . Ff
Naturality square

Fig 6. Naturality square

Replace F with ΔX and you’ll see that the naturality condition for our cone translates into the commutativity of all the triangles with the vertex X. We didn’t see this condition in the product example, because there the base of the cone had no arrows. So let’s find a better example.


A lot of practical math problems boil down to solving an equation or two. An equation usually has the form: A function of some variable is equal to zero. The set of solutions is called a kernel. But this definition assumes that there is a zero, and that’s too specific. A more general formulation talks about two functions being equal. A set of values on which two functions are equal is called the equilizer. So the equilizer is the generalization of the kernel of the difference of two functions. It will work even if we don’t know what zero is or how to subtract values.

Let’s dissect this definition further. We have two functions, f and g, from A to B. The equilizer is the subset of A on which f and g are equal. How can we generalize this definition so that we don’t have to think about elements of A and B? We have two problems: how to define a subset and how to define the equality of elements.

Let’s start with the subset. We can look at a subset as an embedding of one set inside another. Every function, in a matter of speaking, defines an embedding of its domain into its co-domain. For instance, a function from real numbers to 3-d points can be viewed as an embedding of a line in space.

Let’s see how far we can get with this idea for the purpose of defining the equilizer of f and g. We need another set X and a function p from X to A. The image of X in A will serve as our definition of a subset of A. Now let’s apply the function f after p. We get a composite function f . p from X to B. Similarly, we can apply g after p to get g . p. In general, these two composite functions will be different, except when the image of p falls inside the equilizer of f and g. Notice that we can talk about equality of functions without resorting to equality of elements if we look at them as morphisms in a category.

You see where this is heading? The set X with a function p, while not exactly defining the equilizer, provides some sort of a candidate for it. This candidate can be fully described in categorical terms as an object and a morphism, without recourse to sets or their elements. And the equality

f . p = g . p

is just the condition for the following diagram to commute:

Fig 7. The equilizer cone

Fig 7. The equilizer cone

But this is a cone for a two-object-two-arrow category! The objects A and B and the morphisms f and g can be seen as the image of this category under some functor F. Likewise, the object X is the image of this category under the constant functor ΔX. The two orange arrows are the components of the natural transformation from ΔX to F, and the commutativity of this diagram is nothing but its naturality condition.

Of course, there may be many pairs (X, p) that satisfy our conditions. We are looking for the universal one, with the property that any other pair (Y, q) can be mapped into it.

Fig 8. X is the equilizer if its cone is universal

Fig 8. X is the equilizer if its cone is universal

Just like in the product case, we have to demand that q factorizes through h, or that the appropriate triangles commute — in particular q = p . h. This universal cone is the equilizer.

The Moment of Zen

You might ask yourself the question: Why should I care whether some diagrams commute or not? In particular, why are natural transformations better than the “unnatural” ones? Why is it important that naturality diagrams commute? There must be something incredibly deep behind this idea of commutativity to make it pop up in so many diverse branches of mathematics unified by category theory.

Indeed, the very definition of category contains the mother of all commuting diagrams: The condition that, if there is a morphism from A to B and another from B to C, then there is a shortcut morphism from A to C, and it doesn’t matter which way you go. This is the essence of composability: you decompose the path from A to C into its components.

Every programmer is familiar with the idea of functional decomposition, but this pattern goes much deeper than just programming. It’s the essence of our knowledge, of our understanding. We don’t understand anything unless we are able to split it into smaller steps and then put these steps back together — compose them. Without composition we can’t deal with complexity.

The other foundation of understanding is the ability to create models. We create simple models of complex phenomena all the time. By understanding how models work, we gain insight into how complex phenomena work. But for that we need to establish the correspondence, the mapping, between the domain of the model and the domain of the phenomenon that we’re studying. And what’s the most important property of that mapping? Well, our understanding of the model is built on our understanding of its parts and of the ways they compose. So the mapping must preserve composability! Mappings that preserve composability are called functors.

We’ve used some very simple categories as our models. The {1, 2} category modeled the selection of two objects. We used a functor to embed this model into a bigger category C. In that particular case, there wasn’t much structure for the functor to preserve. The model for the equilizer, on the other hand, was a bit more involved. Besides the two objects, it also had two parallel arrows. The functor that embedded this model had to map those arrows as well.

If functors are used for modeling, what are natural transformations? They relate different phenomena described by the same model.

A stick figure is a model for a human being. It can be mapped into Alice, or it can be mapped into Bob. A natural transformation would map Alice into Bob in such a way as to preserve the stick-figure structure. So if the circle is mapped into Alice’s head by one functor, and into Bob’s head by another, a natural transformation would map Alice’s head into Bob’s head.

The model tells us that there is a neck between the head and the torso. So we could use Alice’s neck to go from her head to her torso, and then map her torso to Bob’s torso using the natural transformation. But we could also map Alice’s head to Bob’s head using the natural transformation, and then use Bob’s neck to get to his torso. Naturality tells us that it doesn’t matter which way we go, the result is the same.

We also have the constant functor, which maps the model into a single blob. A natural transformation then maps this blob into Bob; again, with the condition that it doesn’t matter whether we reach Bob’s torso from the blob through his right hand and arm or through his left hand and arm.

Now take into account that, although the stick figure is just made of points and lines, it is mapped into real human beings and real blobs. So a morphism from a blob to Bob’s hand is not trivial, and there may be many different ones. Similarly, the arm that connects the hand to the torso contains veins, arteries, nerves, etc. So naturality is a non-trivial condition, especially if the blob itself is complex.

So what is special about the universal blob? It has to have some interesting internal structure because it not only maps into every part of Bob, but it maps better than any other blob. Any other blob that maps naturally into Bob also maps into the universal blob. And it maps in such a way that it doesn’t matter if you go directly from the said blob to Bob, or if you go through the universal blob. The universal blob contains the stick-figure essence of Bob, and no other blob (that is not isomorphic to it) can take its place.

Universality tells us something about the structure of an object not by dissecting it but by describing its relationship to other objects that model a certain relationship.


Having defined a cone as a natural transformation from ΔX to F — two functors that map the category J to category C — we can now define the limit of F as the universal cone. For that, we need a way to compare cones that differ only by their top vertex. If there is a universal cone with the top vertex U, then any other cone with the top vertex V can be uniquely mapped onto it. It must be a mapping that preserves the structure of the cone, so if h maps V into U, h must factorize all the arrows that form the V cone into the arrows that form the U cone. For instance, in Fig 9, we must have q = p . h, etc.

The limit U is the vertex of the universal cone

Fig 9. The limit U is the vertex of the universal cone

But there’s a better way of looking at it. Notice that, in the definition of a limit, we establish a one to one correspondence between a cone with the vertex V and a morphism h from V to U. The definition talks about there being a unique h for every cone, but the other way around works as well. Given an h we can construct the cone at V by composing the morphisms — as in q = p . h.

A cone is a natural transformation, a member of Nat(ΔV, F), the set of natural transformation between two functors, ΔV and F. A morphism h is a member of the home set Hom(V, U), the set of morphisms between two objects. So we can say that if U is a limit of F then there is an isomorphism between those two sets:

Nat(ΔV, F) ~ Hom(V, U)

In fact, if we insist that this isomorphism be natural, we’ll get all the commuting triangles for free. Remember? Naturality condition is just the commutability of certain diagrams. So if there is a natural isomorphism between these two sets, then U is a limit of F. In fact, we can use this isomorphism as the definition of the limit.

But what does it mean that the isomorphism is natural? What are the two functors that are mapped by it? One functor maps V into Hom(V, U), and the other maps V into Nat(ΔV, F). Both Hom(V, U) and Nat(ΔV, F) are sets in the category of sets. So these are functors from C to Set. You might recognize the first one from the Yoneda lemma.

The second one is a bit more tricky. To understand it, you have to realize that functors themselves form a category. If functors are object in the functor category than natural transformations between these functors are morphisms. Natural transformations compose (and their composition is associative), and there always is a unit natural transformation for each functor. So this is a legitimate category. A hom set in that category is a set of all natural transformations between two functors. Nat(ΔV, F) is one such hom set.

Whenever you see a natural isomorphism of hom sets, chances are there is an adjunction between two functors. A functor F is said to be left adjoint to the functor G (or G right adjoint to F) if the following two hom sets are naturally isomorphic:

Hom(FX, Y) ~ Hom(X, GY)

If you compare this with our definition of a limit, you’ll see that the functor that maps F to its limit U is right adjoint to the const functor ΔV. Of course this is only true if the said functor exists, which is not always true — not all diagrams of shape J must have limits in C.

Adjunction between delta and f is the natural isomorphism between the corresponding hom sets (arrows).

The adjunction between ΔV and Lim F (limit of F) is the natural isomorphism between the corresponding hom sets (arrows).

I hope to talk more about adjoint functors in the future.


Hopefully this blog post will prepare you to watch this excellent series of videos by Catsters on YouTube:

  1. Cones and limits: Definitions.
  2. Examples of limits: Terminal object, product, pullback, equalizer.
  3. Cones as natural transformations.
  4. Formal definition of limit as natural isomorphism.
  5. Limits and adjunctions.
  6. Colimits.

In category theory, all information about objects is encoded in the arrows (morphisms) between them. You’re not supposed to look inside an object to study its internal structure. You’re supposed to look at the relationships this object has with other objects in the category, including itself.

This takes some getting used to, especially when you’re talking about familiar objects like sets. How do you express the idea that a set is empty, or that it consists of pairs of elements from two other sets, without talking about elements?

On the other hand, if you learn to construct particular types of sets purely in terms of their relationships, you can easily apply those constructions to other categories.

So let’s try to define a product of two sets A and B by looking only at its relationships with other objects — relationships being morphisms or, in the case of sets, functions. A Cartesian product AxB comes equipped with two functions called projections (in Haskell they are called fst and snd). So maybe a product AxB can be defined as a set equipped with two functions — one mapping it into A and the other mapping it into B. Unfortunately, there are infinitely many such sets, most of them looking nothing like a product, so it’s not a very good definition.

It turns out that, in order to characterize a set as being a product of A and B, we have to look at its relationship with every other set that has maps into A and B. Only when our set is surrounded by other similar sets, does it stand out. It stands out as the model for its relationship with A and B. It’s the one that has “the best” projections. Its projections are so good that any other pair of projections from any other set have to factor through them. Let me explain what that means.

How can you tell that one candidate for a product is better than another? Suppose you have two sets X and Y and their mappings to A and B. Altogether you have:

p1 :: X -> A
p2 :: X -> B
q1 :: Y -> A
q2 :: Y -> B

We want to somehow relate Y to X, so let’s assume that there is a function h from Y to X. We are interested in functions that also relate the projections, so let’s impose these two conditions on h:

q1 = p1 . h
q2 = p2 . h
Factorization of q1 and q2 through h.

Factorization of q1 and q2 through h.

If this were multiplication of natural numbers, we would say that q1 and q2 have a common factor, h. Here, q1 and q2 are functions and the dot is composition, but we still say that q1 and q2 factorize through h.

In most cases there will be no such function h, and we won’t be able to compare candidates. And that’s okay. But sometimes we’ll get really lucky, and there will be one and only one such function h between Y and X. In that case we will say that X is better than Y as the candidate for the product of A and B.

In particular, if we were allowed to look inside the sets, and if X were just a set of pairs (a, b), then we could always construct a unique h :: Y -> X using q1 and q2, like this:

h y = (q1 y, q2 y)

where y is an element of Y. Of course, we are not allowed to look at the components, but I wanted to motivate our preference for Y over X.

For the sake of an argument, let’s try some other combinations in Haskell — with sets represented as types. For instance, we could try to pretend that the the product of Int and Bool is String, with

p1 = length
p2 "" = False
p2 _ = True

This won’t work because there is a triple (Y, q1, q2) that won’t factorize through p1 and p2. The simplest such Y is the actual product (Int, Bool) with the usual projections fst and snd. There is no h that would map (8, False) into a String whose p1 yields 8 and p2 yields False. (There is no empty string of length 8.)

So let’s try something bigger, like (Int, Bool, Char). Could this work as a product of Int and Bool? This time we can easily find factorizable mappings from (Int, Bool) — but they won’t be unique. We can tuck any Char to a pair of Int and Bool and get a factorizable h, as in:

h (i, b) = (i, b, 'a')
h' (i, b) = (i, b, 'b')

Some sets will be too small, others will be too big — like Goldilocks, we have to find the one that’s just right.

What’s important is that we have a way of comparing triples (X, p1, p2) based on unique factorizability. If there exist “the best” such triple — one that any other triple uniquely factorizes into — we’ll call it the product of A and B. Again, it’s not necessary that any two triples be comparable — we don’t need the total order. We just need any triple to be comparable with the one that represents the product.

Putting all this together, X is a product of A and B if and only if:

  1. There is a pair of morphisms:
    p1 :: X -> A
    p2 :: X -> B
  2. For any other object Y with a pair of morphisms
    q1 :: Y -> A
    q2 :: Y -> B

    there is a unique morphism h, such that

    q1 = p1 . h
    q2 = p2 . h

Notice that I used a very general language of objects and morphisms, rather than sets and functions. This lets us define products in an arbitrary category.

There is no guarantee that a product of two arbitrary objects exists in a particular category. And even if it exists, it doesn’t have to be unique. Well, it has to be unique up to an isomorphism. The intuition is this: If you have two objects X and Y that both fulfill our conditions, then there must be a mapping h from X to Y, and a mapping h’ from Y to X, both factorizing the respective projections. One can easily show that they have to be the inverse of each other.

Even in Haskell, the Cartesian product of two types is not unique. The built-in pair is one representation, but here’s another one:.

type ChurchPair a b = forall c . (a -> b -> c) -> c

It’s a type of polymorphic functions that’s isomorphic to the pair type. To prove this, here are the two mappings, from (ChurchPair a b) to (a, b) and back (and, of course, one is the inverse of the other):

churchToPair :: ChurchPair a b -> (a, b)
churchToPair f = f (\x y -> (x, y))

pairToChurch :: (a, b) -> ChurchPair a b
pairToChurch (x, y) = \g -> g x y

You might wonder what this universal definition of a product means in other, more exotic categories. Here’s one: Every partially ordered set (poset) is a category with the less-than-or-equal relations playing the role of morphisms. A product of two objects A and B in a poset turns out to be the largest object that’s smaller than both A and B, a.k.a., the meet, or the infimum, or the greatest lower bound. Here’s another: In the category of logical predicates, the product is the conjunction; and so on.

Universal Construction

This kind of characterization of a particular type of object in terms of its relationship with the rest of the universe is called a universal construction and is very common in category theory. We specify a certain property and then establish a hierarchy of objects according to how well they model this property. We then pick the “best” model. Best could mean the simplest, the least constrained, the smallest, or the largest, depending on the context.

I used the same method in my previous blog, Understanding Free Monoids. There, the free monoid was picked as a universal object in the category of monoids. We considered all monoids that shared the same generators, and there was one (up to isomorphism) that could be uniquely mapped to all others.

There’s much more to universal constructions that those few examples suggest. For instance, the statement that “there is a unique something for every something else” suggests some kind of isomorphism. Commuting diagrams hint at naturality — that is a natural transformations between functors. These ideas generalize to limits, co-limits, adjunctions and, ultimately, Kan extensions. I hope to write more on these topics in the future.

Initial Object

Let’s try to use a universal construction to define an empty set without talking about elements (or, rather, lack thereof). What can we say about functions in and out of an empty set? First of all, you can’t have a function going from any non-empty set into an empty set. Just try defining a function from integers to an empty set: What’s its value at zero?!

On the other hand, it’s easy to map an empty set to any other set. It’s a no-brainer: You’ll never be asked to provide any value. So an empty set has this property that there’s a unique mapping from it into any other set. Before we use it as a definition of emptiness, let’s ask if there is any other set with this property? There isn’t, because if there were, there would be a mapping from it to our empty set and we have just said that that was impossible.

Having defined an empty set in terms of functions, we can generalize this definition to any category. An object that has a unique mapping to any other object in the category is called the initial object. Can there be more than one initial object in a category? Suppose that we have two such objects, I and I’. By definition, there must be a unique mapping from I to I’, because I is initial. By the same token there must be a mapping from I’ to I, because I’ is initial. What’s the composition of these two mappings? It has to be the identity mapping. There can be no other mapping from an initial object to itself (why?). So the two objects I and I’ must be isomorphic. (But, as we’ve seen, in the category of sets there is only one initial object: the empty set.)

You might recognize initial algebras from my blog Understanding F-Algebras as examples of initial objects in the category of F-algebras.

Terminal Object

Every construction in category theory has its dual: one obtained by inverting the direction of all arrows. We have just defined the initial object as the one with unique morphisms going to any other object. The dual of that would be an object that has a unique morphism coming from any other object. We’ll call such an object, predictably, the terminal object.

What’s the terminal object in the category of sets? It’s a one-element set. The unique morphism (function) from any set to a one-element set simply maps all elements of the set to that one element. It’s called the constant function. So here we have a universal definition of a one-element set as the final object in the category of sets.

Lazy evaluation can be a powerful tool for structuring your code. For instance, it can let you turn your code inside out, inverting the flow of control. Many a Haskell program take advantage of laziness to express algorithms in clear succinct terms, turning them from recipes to declarations.

The question for today’s blog post is: How can we tap the power of lazy evaluation in an inherently eager language like C++? I’ll lead you through a simple coding example and gradually introduce the building blocks of lazy programming: the suspension, the lazy stream, and a whole slew of functional algorithms that let you operate on them. In the process we’ll discover some fundamental functional patterns like functors, monads, and monoids. I have discussed them already in my post about C++ futures. It’s very edifying to see them emerge in a completely different context.

The Problem

Let’s write a program that prints the first n Pythagorean triples. A Pythagorean triple consists of three integers, x, y, and z, that satisfy the relation x2 + y2 = z2. Let’s not be fancy and just go with the brute force approach. Here’s the program in C:

void printNTriples(int n)
    int i = 0;
    for (int z = 1; ; ++z)
        for (int x = 1; x <= z; ++x)
            for (int y = x; y <= z; ++y)
                if (x*x + y*y == z*z) {
                    printf("%d, %d, %d\n", x, y, z);
                    if (++i == n)

Here, a single C function serves three distinct purposes: It

  1. Generates Pythagorean triples,
  2. Prints them,
  3. Counts them; and when the count reaches n, breaks.

This is fine, as long as you don’t have to modify or reuse this code. But what if, for instance, instead of printing, you wanted to draw the triples as triangles? Or if you wanted to stop as soon as one of the numbers reached 100? The problem with this code is that it’s structured inside out: both the test and the sink for data are embedded in the innermost loop of the algorithm. A more natural and flexible approach would be to:

  1. Generate the list of Pythagorean triples,
  2. Take the first ten of them, and
  3. Print them.

And that’s exactly how you’d write this program in Haskell:

main = print (take 10 triples)

triples = [(x, y, z) | z <- [1..]
                     , x <- [1..z]
                     , y <- [x..z]
                     , x^2 + y^2 == z^2]

This program reads: take 10 triples and print them. It declares triples as a list (square brackets mean a list) of triples (x, y, z), where (the vertical bar reads “where”) z is an element of the list of integers from 1 to infinity, x is from 1 to z, y is from x to z, and the sum of squares of x and y is equal to the square of z. This notation is called “list comprehension” and is characteristic of Haskell terseness.

You see the difference? Haskell let’s you abstract the notion of the list of Pythagorean triples so you can operate on it as one entity, whereas in C (or, for that matter, in C++) we were not able to disentangle the different, orthogonal, aspects of the program.

The key advantage of Haskell in this case is its ability to deal with infinite lists. And this ability comes from Haskell’s inherent laziness. Things are never evaluated in Haskell until they are absolutely needed. In the program above, it was the call to print that forced Haskell to actually do some work: take 10 elements from the list of triples. Since the triples weren’t there yet, it had to calculate them, but only as many as were requested and not a number more.


We’ll start with the most basic building block of laziness: a suspended function. Here’s the first naive attempt:

template<class T>
class Susp {
    explicit Susp(std::function<T()> f)
        : _f(f)
    T get() { return _f(); }
    std::function<T()> _f;

We often create suspensions using lambda functions, as in:

int x = 2;
int y = 3;
Susp<int> sum([x, y]() { return x + y; });
int z = sum.get();

Notice that the suspended lambda may capture variables from its environment: here x and y. A lambda, and therefore a suspension, is a closure.

The trouble with this implementation is that the function is re-executed every time we call get. There are several problems with that: If the function is not pure, we may get different values each time; if the function has side effects, these may happen multiple times; and if the function is expensive, the performance will suffer. All these problems may be addressed by memoizing the value.

Here’s the idea: The first time the client calls get we should execute the function and store the returned value in a member variable. Subsequent calls should go directly to that variable. We could implement this by setting a Boolean flag on the first call and then checking it on every subsequent call, but there’s a better implementation that uses thunks.

A thunk is a pointer to a free function taking a suspension (the this pointer) and returning a value (by const reference). The get method simply calls this thunk, passing it the this pointer.

Initially, the thunk is set to thunkForce, which calls the method setMemo. This method evaluates the function, stores the result in _memo, switches the thunk to thunkGet, and returns the memoized value. On subsequent calls get goes through the getMemo thunk which simply returns the memoized value.

template<class T>
class Susp
    // thunk
    static T const & thunkForce(Susp * susp) {
        return susp->setMemo();
    // thunk
    static T const & thunkGet(Susp * susp) {
        return susp->getMemo();
    T const & getMemo() {
        return _memo;
    T const & setMemo() {
        _memo = _f();
        _thunk = &thunkGet;
        return getMemo();
    explicit Susp(std::function<T()> f)
        : _f(f), _thunk(&thunkForce), _memo(T())
    T const & get() {
        return _thunk(this);
    T const & (*_thunk)(Susp *);
    mutable T   _memo;

    std::function<T()> _f;

(By the way, the function pointer declaration of _thunk looks pretty scary in C++, doesn’t it?)

[Edit: I decided to remove the discussion of the thread safe implementation since it wasn't ready for publication. The current implementation is not thread safe.]

You can find a lot more detail about the Haskell implementation of suspended functions in the paper by Tim Harris, Simon Marlow, and Simon Peyton Jones, Haskell on a Shared-Memory Multiprocessor.

Lazy Stream

The loop we used to produce Pythagorean triples in C worked on the push principle — data was pushed towards the sink. If we want to deal with infinite lists, we have to use the pull principle. It should be up to the client to control the flow of data. That’s the inversion of control I was talking about in the introduction.

We’ll use a lazy list and call it a stream. In C++ a similar idea is sometimes expressed in terms of input and forward iterators, although it is understood that an iterator itself is not the source or the owner of data — just an interface to one. So we’ll stick with the idea of a stream.

We’ll implement the stream in the functional style as a persistent data structure fashioned after persistent lists (see my series of blog post on persistent data structures). It means that a stream, once constructed, is never modified. To “advance” the stream, we’ll have to create a new one by calling the const method pop_front.

Let’s start with the definition: A stream is either empty or it contains a suspended cell. This immediately suggests the implementation as a (possibly null) pointer to a cell. Since the whole stream is immutable, the cell will be immutable too, so it’s perfectly safe to share it between copies of the stream. We can therefore use a shared pointer:

template<class T>
class Stream
    std::shared_ptr <Susp<Cell<T>>> _lazyCell;

Of course, because of reference counting and memoization, the stream is only conceptually immutable and, in the current implementation, not thread safe.

So what’s in the Cell? Remember, we want to be able to generate infinite sequences, so Stream must contain the DNA for not only producing the value of type T but also for producing the offspring — another (lazy) Stream of values. The Cell is just that: A value and a stream.

template<class T>
class Cell
    Cell() {} // need default constructor for memoization
    Cell(T v, Stream<T> const & tail)
        : _v(v), _tail(tail)
    explicit Cell(T v) : _v(v) {}
    T val() const {
        return _v;
    Stream<T> pop_front() const {
        return _tail;
    T _v;
    Stream<T> _tail;

This mutually recursive pair of data structures works together amazingly well.

template<class T>
class Stream
    std::shared_ptr <Susp<Cell<T>>> _lazyCell;
    Stream() {}
    Stream(std::function<Cell<T>()> f)
        : _lazyCell(std::make_shared<Susp<Cell<T>>>(f))
    Stream(Stream && stm)
        : _lazyCell(std::move(stm._lazyCell))
    Stream & operator=(Stream && stm)
        _lazyCell = std::move(stm._lazyCell);
        return *this;
    bool isEmpty() const
        return !_lazyCell;
    T get() const
        return _lazyCell->get().val();
    Stream<T> pop_front() const
        return _lazyCell->get().pop_front();

There are several things worth pointing out. The two constructors follow our formal definition of the Stream: one constructs an empty stream, the other constructs a suspended Cell. A suspension is created from a function returning Cell.

I also added a move constructor and a move assignment operator for efficiency. We’ll see it used in the implementation of forEach.

The magic happens when we call get for the first time. That’s when the suspended Cell comes to life. The value and the new stream are produced and memoized for later use. Or, this may happen if the first call is to pop_front. Notice that pop_front is a const method — the Stream itself is immutable. The method returns a new stream that encapsulates the rest of the sequence.

Let’s get our feet wet by constructing a stream of integers from n to infinity. The constructor of a Stream takes a function that returns a Cell. We’ll use a lambda that captures the value of n. It creates a Cell with that value and a tail, which it obtains by calling intsFrom with n+1:

Stream<int> intsFrom(int n)
    return Stream<int>([n]()
        return Cell<int>(n, intsFrom(n + 1)); 

It’s a recursive definition, but without the usual recursive function calls that eat up the stack. The call to the inner intsFrom is not made from the outer intsFrom. Instead it’s made the first time get is called on the emerging Stream.

Of course, we can also create finite streams, like this one, which produces integers from n to m:

Stream<int> ints(int n, int m)
    if (n > m)
        return Stream<int>();
    return Stream<int>([n, m]()
        return Cell<int>(n, ints(n + 1, m));

The trick is to capture the limit m as well as the recursion variable n. When the limit is reached, we simply return an empty Stream.

We’ll also need the method take, which creates a Stream containing the first n elements of the original stream:

Stream take(int n) const {
    if (n == 0 || isEmpty())
        return Stream();
    auto cell = _lazyCell;
    return Stream([cell, n]()
        auto v = cell->get().val();
        auto t = cell->get().pop_front();
        return Cell<T>(v, t.take(n - 1));

Here we are capturing the suspended cell and use it to lazily generate the elements of the new, truncated, Stream. Again, the key to understanding why this works is to keep in mind that Streams and Cells are conceptually immutable, and therefore can be shared by the implementation. This has some interesting side effects, which don’t influence the results, but change the performance. For instance, if the caller of take forces the evaluation of the first n elements — e.g., by passing them through the consuming forEach below — these elements will appear miraculously memoized in the original Stream.

Finally, we’ll need some way to iterate through streams. Here’s an implementation of forEach that consumes the stream while enumerating it and feeding its elements to a function.

template<class T, class F>
void forEach(Stream<T> strm, F f)
    while (!strm.isEmpty())
        strm = strm.pop_front();

It’s the assignment:

strm = strm.pop_front();

which consumes the stream by decreasing the reference count of the head of the Stream. In particular, if you pass an rvalue Stream to forEach, its elements will be generated and deleted in lockstep. The algorithm will use constant memory, independent of the virtual length of the Stream. What Haskell accomplishes with garbage collection, we approximate in C++ with reference counting and shared_ptr.

Working with Streams

It’s not immediately obvious how to translate our Pythagorean triple program from nested loops to lazy streams, so we’ll have to take inspiration from the corresponding Haskell program. Let me first rewrite it using a slightly different notation:

triples = do
    z <- [1..]
    x <- [1..z]
    y <- [x..z]
    guard (x^2 + y^2 == z^2)
    return (x, y, z)

The general idea is this: Start with the stream of integers from 1 to infinity. For every such integer — call it z — create a stream from 1 to z. For each of those — call them x — create a stream from x to z. Filter out those which don’t satisfy the Pythagorean constraint. Finally, output a stream of tuples (x, y, z).

So far we’ve learned how to create a stream of integers — we have the function intsFrom. But now we’ll have to do something for each of these integers. We can’t just enumerate those integers and apply a function to each, because that would take us eternity. So we need a way to act on each element of a stream lazily.

In functional programming this is called mapping a function over a list. In general, a parameterized data structure that can be mapped over is called a functor. I’m going to show you that our Stream is a functor.

Stream as a Functor

The idea is simple: we want to apply a function to each element of a stream to get a new transformed stream (it’s very similar to the std::transform algorithm from STL). The catch is: We want to do it generically and lazily.

To make the algorithm — we’ll call it fmap — generic, we have to parameterize it over types. The algorithm starts with a Stream of elements of type T and a function from T to some other type U. The result should be a stream of U.

We don’t want to make U the template argument, because then the client would have to specify it explicitly. We want the compiler to deduce this type from the type of the function. We want, therefore, the function type F to be the parameter of our template (this will also allow us to call it uniformly with function pointers, function objects, and lambdas):

template<class T, class F>
auto fmap(Stream<T> stm, F f)

Without the use of concepts, we have no way of enforcing, or even specifying, that F be a type of a function from T to U. The best we can do is to statically assert it inside the function:

static_assert(std::is_convertible<F, std::function<U(T)>>::value,
        "fmap requires a function type U(T)");

But what is U? We can get at it using decltype:


Notice that decltype takes, as an argument, an expression that can be statically typed. Here, the expression is a function call of f. We also need a dummy argument for this function: we use the result of stm.get(). The argument to decltype is never evaluated, but it is type-checked at compile time.

One final problem is how to specify the return type of fmap. It’s supposed to be Stream<U>, but we don’t know U until we apply decltype to the arguments of fmap. We have to use the new auto function declaration syntax of C++11. So here are all the type-related preliminaries:

template<class T, class F>
auto fmap(Stream<T> stm, F f)->Stream<decltype(f(stm.get()))>
    using U = decltype(f(stm.get()));
    static_assert(std::is_convertible<F, std::function<U(T)>>::value,
        "fmap requires a function type U(T)");

Compared to that, the actual implementation of fmap seems rather straightforward:

    if (stm.isEmpty()) return Stream<U>();
    return Stream<U>([stm, f]()
        return Cell<U>(f(stm.get()), fmap(stm.pop_front(), f));

In words: If the stream is empty, we’re done — return an empty stream. Otherwise, create a new stream by suspending a lambda function. That function captures the original stream (by value) and the function f, and returns a Cell. That cell contains the value of f acting on the first element of the original stream, and a tail. The tail is created with fmap acting on the rest of the original stream.

Equipped with fmap, we can now attempt to take the first step towards generating our triples: apply the function ints(1, z) to each element of the stream intsFrom(1):

fmap(intsFrom(1), [](int z)
    return ints(1, z);

The result is a Stream of Streams of integers of the shape:

1 2
1 2 3
1 2 3 4
1 2 3 4 5

But now we are stuck. We’d like to apply ints(x, z) to each element of that sequence, but we don’t know how to get through two levels of Stream. Our fmap can only get through one layer. We need a way to flatten a Stream of Streams. That functionality is part of what functional programmers call a monad. So let me show you that Stream is indeed a monad.

Stream as a Monad

If you think of a Stream as a list, the flattening of a list of lists is just concatenation. Suppose for a moment that we know how to lazily concatenate two Streams (we’ll get to it later) and let’s implement a function mjoin that concatenates a whole Stream of Streams.

You might have noticed a pattern in the implementation of lazy functions on streams. We use some kind of recursion, which starts with “Are we done yet?” If not, we do an operation that involves one element of the stream and the result of a recursive call to the function itself.

The “Are we done yet?” question usually involves testing for an empty stream. But here we are dealing with a Stream of Streams, so we have to test two levels deep. This way we’ll ensure that the concatenation of a Stream of empty Streams immediately returns an empty Stream.

The recursive step in mjoin creates a Cell whose element is the head of the first stream, and whose tail is the concatenation of the tail of the first stream and the result of mjoin of the rest of the streams:

template<class T>
Stream<T> mjoin(Stream<Stream<T>> stm)
    while (!stm.isEmpty() && stm.get().isEmpty())
        stm = stm.pop_front();
    if (stm.isEmpty()) return Stream<T>();
    return Stream<T>([stm]()
        Stream<T> hd = stm.get();
        return Cell<T>( hd.get()
                      , concat(hd.pop_front(), mjoin(stm.pop_front())));

The combination of fmap and mjoin lets us compose function like intsFrom or ints that return Streams. In fact, this combination is so common that it deserves its own function, which we’ll call mbind:

template<class T, class F>
auto mbind(Stream<T> stm, F f) -> decltype(f(stm.get()))
    return mjoin(fmap(stm, f));

If we use mbind in place of fmap:

mbind(intsFrom(1), [](int z)
    return ints(1, z);

we can produce a flattened list:

1 1 2 1 2 3 1 2 3 4...

But it’s not just the list: Each element of the list comes with variables that are defined in its environment — here the variable z. We can keep chaining calls to mbind and capture more variables in the process:

mbind(intsFrom(1), [](int z)
    return mbind(ints(1, z), [z](int x)
        return mbind(ints(x, z), [x, z](int y)

At this point we have captured the triples x, y, z, and are ready for the Pythagorean testing. But before we do it, let’s define two additional functions that we’ll use later.

The first one is mthen which is a version of mbind that takes a function of no arguments. The idea is that such a function will be executed for each element of the stream, but it won’t use the value of that element. The important thing is that the function will not be executed when the input stream is empty. In that case, mthen will return an empty stream.

We implement mthen using a slightly modified version of fmap that takes a function f of no arguments:

template<class T, class F>
auto fmapv(Stream<T> stm, F f)->Stream<decltype(f())>
    using U = decltype(f());
    static_assert(std::is_convertible<F, std::function<U()>>::value,
        "fmapv requires a function type U()");

    if (stm.isEmpty()) return Stream<U>();
    return Stream<U>([stm, f]()
        return Cell<U>(f(), fmapv(stm.pop_front(), f));

We plug it into the definition of mthen the same way fmap was used in mbind:

template<class T, class F>
auto mthen(Stream<T> stm, F f) -> decltype(f())
    return mjoin(fmapv(stm, f));

The second useful function is mreturn, which simply turns a value of any type into a one-element Stream:

template<class T>
Stream<T> mreturn(T v)
    return Stream<T>([v]() {
        return Cell<T>(v);

We’ll need mreturn to turn our triples into Streams.

It so happens that a parameterized type equipped with mbind and mreturn is called a monad (it must also satisfy some additional monadic laws, which I won’t talk about here). Our lazy Stream is indeed a monad.

Stream as a Monoid and a Monad Plus

When implementing mjoin we used the function concat to lazily concatenate two Streams. Its implementation follows the same recursive pattern we’ve seen so many times:

template<class T>
Stream<T> concat( Stream<T> lft
                , Stream<T> rgt)
    if (lft.isEmpty())
        return rgt;
    return Stream<T>([=]()
        return Cell<T>(lft.get(), concat<T>(lft.pop_front(), rgt));

What’s interesting is that the concatenation of streams puts them under yet another well known functional pattern: a monoid. A monoid is equipped with a binary operation, just like concat, which must be associative and possess a unit element. It’s easy to convince yourself that concatenation of Streams is indeed associative, and that the neutral element is an empty Stream. Concatenating an empty Stream, whether in front or in the back of any other Stream, doesn’t change the original Stream.

What’s even more interesting is that being a combination of a monoid and a monad makes Stream into a monad plus, and every monad plus defines a guard function — exactly what we need for the filtering of our triples. This function takes a Boolean argument and outputs a Stream. If the Boolean is false, the Stream is empty (the unit element of monad plus!), otherwise it’s a singleton Stream. We really don’t care what value sits in this Stream — we never use the result of guard for anything but the flow of control. In Haskell, there is a special “unit” value () — here I use a nullptr as its closest C++ analog.

Stream<void*> guard(bool b)
    if (b) return Stream<void*>(nullptr);
    else return Stream<void*>();

We can now pipe the result of guard into mthen, which will ignore the content of the Stream but won’t fire when the Stream is empty. When the Stream is not empty, we will call mreturn to output a singleton Stream with the result tuple:

Stream<std::tuple<int, int, int>> triples()
    return mbind(intsFrom(1), [](int z)
        return mbind(ints(1, z), [z](int x)
            return mbind(ints(x, z), [x, z](int y)
                return mthen(guard(x*x + y*y == z*z), [x, y, z]()
                    return mreturn(std::make_tuple(x, y, z));

These singletons will then be concatenated by the three levels of mbind to create one continuous lazy Stream of Pythagorean triples.

Compare this function with its Haskell counterpart:

triples = do
    z <- [1..]
    x <- [1..z]
    y <- [x..z]
    guard (x^2 + y^2 == z^2)
    return (x, y, z)

Now, the client can take 10 of those triples from the Stream — and the triples still won’t be evaluated!. It’s the consuming forEach that finally forces the evaluation:

void test()
    auto strm = triples().take(10);
    forEach(std::move(strm), [](std::tuple<int, int, int> const & t)
        std::cout << std::get<0>(t) << ", " 
                  << std::get<1>(t) << ", " 
                  << std::get<2>(t) << std::endl;


The generation of Pythagorean triples is a toy example, but it shows how lazy evaluation can be used to restructure code in order to make it more reusable. You can use the same function triples to print the values in one part of your program and draw triangles in another. You can filter the triples or impose different termination conditions. You can use the same trick to generate an infinite set of approximation to the solution of a numerical problem, and then use different strategies to truncate it. Or you can create an infinite set of animation frames, and so on.

The building blocks of laziness are also reusable. I have used them to implement the solution to the eight-queen problem and a conference scheduling program. Once they made thread safe, the combinators that bind them are thread safe too. This is, in general, the property of persistent data structures.

You might be concerned about the performance of lazy data structures, and rightly so. They use the heap heavily, so memory allocation and deallocation is a serious performance bottleneck. There are many situation, though, where code structure, reusability, maintenance, and correctness (especially in multithreaded code) are more important than performance. And there are some problems that might be extremely hard to express without the additional flexibility gained from laziness.

I made the sources to all code in this post available on GitHub.