Type System

The Free Theorem for Ends

In Haskell, the end of a profunctor p is defined as a product of all diagonal elements:

forall c. p c c

together with a family of projections:

pi :: Profunctor p => forall c. (forall a. p a a) -> p c c
pi e = e

In category theory, the end must also satisfy the edge condition which, in (type-annotated) Haskell, could be written as:

dimap f idb . pib = dimap ida f . pia

for any f :: a -> b.
Using a suitable formulation of parametricity, this equation can be shown to be a free theorem. Let’s first review the free theorem for functors before generalizing it to profunctors.

Functor Characterization

You may think of a functor as a container that has a shape and contents. You can manipulate the contents without changing the shape using fmap. In general, when applying fmap, you not only change the values stored in the container, you change their type as well. To really capture the shape of the container, you have to consider not only all possible mappings, but also more general relations between different contents.

A function is directional, and so is fmap, but relations don’t favor either side. They can map multiple values to the same value, and they can map one value to multiple values. Any relation on values induces a relation on containers. For a given functor F, if there is a relation a between type A and type A':

A <=a=> A'

then there is a relation between type F A and F A':

F A <=(F a)=> F A'

We call this induced relation F a.

For instance, consider the relation between students and their grades. Each student may have multiple grades (if they take multiple courses) so this relation is not a function. Given a list of students and a list of grades, we would say that the lists are related if and only if they match at each position. It means that they have to be equal length, and the first grade on the list of grades must belong to the first student on the list of students, and so on. Of course, a list is a very simple container, but this property can be generalized to any functor we can define in Haskell using algebraic data types.

The fact that fmap doesn’t change the shape of the container can be expressed as a “theorem for free” using relations. We start with two related containers:

xs :: F A
xs':: F A'

where A and A' are related through some relation a. We want related containers to be fmapped to related containers. But we can’t use the same function to map both containers, because they contain different types. So we have to use two related functions instead. Related functions map related types to related types so, if we have:

f :: A -> B
f':: A'-> B'

and A is related to A' through a, we want B to be related to B' through some relation b. Also, we want the two functions to map related elements to related elements. So if x is related to x' through a, we want f x to be related to f' x' through b. In that case, we’ll say that f and f' are related through the relation that we call a->b:

f <=(a->b)=> f'

For instance, if f is mapping students’ SSNs to last names, and f' is mapping letter grades to numerical grades, the results will be related through the relation between students’ last names and their numerical grades.

To summarize, we require that for any two relations:

A <=a=> A'
B <=b=> B'

and any two functions:

f :: A -> B
f':: A'-> B'

such that:

f <=(a->b)=> f'

and any two containers:

xs :: F A
xs':: F A'

we have:

if       xs <=(F a)=> xs'
then   F xs <=(F b)=> F xs'

This characterization can be extended, with suitable changes, to contravariant functors.

Profunctor Characterization

A profunctor is a functor of two variables. It is contravariant in the first variable and covariant in the second. A profunctor can lift two functions simultaneously using dimap:

class Profunctor p where
    dimap :: (a -> b) -> (c -> d) -> p b c -> p a d

We want dimap to preserve relations between profunctor values. We start by picking any relations a, b, c, and d between types:

A <=a=> A'
B <=b=> B'
C <=c=> C'
D <=d=> D'

For any functions:

f  :: A -> B
f' :: A'-> B'
g  :: C -> D
g' :: C'-> D'

that are related through the following relations induced by function types:

f <=(a->b)=> f'
g <=(c->d)=> g'

we define:

xs :: p B C
xs':: p B'C'

The following condition must be satisfied:

if             xs <=(p b c)=> xs'
then   (p f g) xs <=(p a d)=> (p f' g') xs'

where p f g stands for the lifting of the two functions by the profunctor p.

Here’s a quick sanity check. If b and c are functions:

b :: B'-> B
c :: C -> C'

than the relation:

xs <=(p b c)=> xs'


xs' = dimap b c xs

If a and d are functions:

a :: A'-> A
d :: D -> D'

then these relations:

f <=(a->b)=> f'
g <=(c->d)=> g'


f . a = b . f'
d . g = g'. c

and this relation:

(p f g) xs <=(p a d)=> (p f' g') xs'


(p f' g') xs' = dimap a d ((p f g) xs)

Substituting xs', we get:

dimap f' g' (dimap b c xs) = dimap a d (dimap f g xs)

and using functoriality:

dimap (b . f') (g'. c) = dimap (f . a) (d . g)

which is identically true.

Special Case of Profunctor Characterization

We are interested in the diagonal elements of a profunctor. Let’s first specialize the general case to:

C = B
C'= B'
c = b

to get:

xs = p B B
xs'= p B'B'


if             xs <=(p b b)=> xs'
then   (p f g) xs <=(p a d)=> (p f' g') xs'

Chosing the following substitutions:

A = A'= B
D = D'= B'
a = id
d = id
f = id
g'= id
f'= g

we get:

if              xs <=(p b b)=> xs'
then   (p id g) xs <=(p id id)=> (p g id) xs'

Since p id id is the identity relation, we get:

(p id g) xs = (p g id) xs'


dimap id g xs = dimap g id xs'

Free Theorem

We apply the free theorem to the term xs:

xs :: forall c. p c c

It must be related to itself through the relation that is induced by its type:

xs <=(forall b. p b b)=> xs

for any relation b:

B <=b=> B'

Universal quantification translates to a relation between different instantiations of the polymorphic value:

xsB <=(p b b)=> xsB'

Notice that we can write:

xsB = piB xs
xsB'= piB'xs

using the projections we defined earlier.

We have just shown that this equation leads to:

dimap id g xs = dimap g id xs'

which shows that the wedge condition is indeed a free theorem.

Natural Transformations

Here’s another quick application of the free theorem. The set of natural transformations may be represented as an end of the following profunctor:

type NatP a b = F a -> G b
instance Profunctor NatP where
    dimap f g alpha = fmap g . alpha . fmap f

The free theorem tells us that for any mu :: NatP c c:

(dimap id g) mu = (dimap g id) mu

which is the naturality condition:

mu . fmap g = fmap g . mu

It’s been know for some time that, in Haskell, naturality follows from parametricity, so this is not surprising.


I’d like to thank Edward Kmett for reviewing the draft of this post.


  1. Bartosz Milewski, Ends and Coends
  2. Edsko de Vries, Parametricity Tutorial, Part 1, Part 2, Contravariant Functions.
  3. Bartosz Milewski, Parametricity: Money for Nothing and Theorems for Free

Unlike monads, which came into programming straight from category theory, applicative functors have their origins in programming. McBride and Paterson introduced applicative functors as a programming pearl in their paper Applicative programming with effects. They also provided a categorical interpretation of applicatives in terms of strong lax monoidal functors. It’s been accepted that, just like “a monad is a monoid in the category of endofunctors,” so “an applicative is a strong lax monoidal functor.”

The so called “tensorial strength” seems to be important in categorical semantics, and in his seminal paper Notions of computation and monads, Moggi argued that effects should be described using strong monads. It makes sense, considering that a computation is done in a context, and you should be able to make the global context available under the monad. The fact that we don’t talk much about strong monads in Haskell is due to the fact that all functors in the category Set, which underlies the Haskell’s type system, have canonical strength. So why do we talk about strength when dealing with applicative functors? I have looked into this question and came to the conclusion that there is no fundamental reason, and that it’s okay to just say:

An applicative is a lax monoidal functor

In this post I’ll discuss different equivalent categorical definitions of the applicative functor. I’ll start with a lax closed functor, then move to a lax monoidal functor, and show the equivalence of the two definitions. Then I’ll introduce the calculus of ends and show that the third definition of the applicative functor as a monoid in a suitable functor category equipped with Day convolution is equivalent to the previous ones.

Applicative as a Lax Closed Functor

The standard definition of the applicative functor in Haskell reads:

class Functor f => Applicative f where
    (<*>) :: f (a -> b) -> (f a -> f b)
    pure :: a -> f a

At first sight it doesn’t seem to involve a monoidal structure. It looks more like preserving function arrows (I added some redundant parentheses to suggest this interpretation).

Categorically, functors that “preserve arrows” are known as closed functors. Let’s look at a definition of a closed functor f between two categories C and D. We have to assume that both categories are closed, meaning that they have internal hom-objects for every pair of objects. Internal hom-objects are also called function objects or exponentials. They are normally defined through the right adjoint to the product functor:

C(z × a, b) ≅ C(z, a => b)

To distinguish between sets of morphisms and function objects (they are the same thing in Set), I will temporarily use double arrows for function objects.

We can take a functor f and act with it on the function object a=>b in the category C. We get an object f (a=>b) in D. Or we can map the two objects a and b from C to D and then construct the function object in D: f a => f b.


We call a functor closed if the two results are isomorphic (I have subscripted the two arrows with the categories where they are defined):

f (a =>C b) ≅ (f a =>D f b)

and if the functor preserves the unit object:

iD ≅ f iC

What’s the unit object? Normally, this is the unit with respect to the same product that was used to define the function object using the adjunction. I’m saying “normally,” because it’s possible to define a closed category without a product.

Note: The two arrows and the two is are defined with respect to two different products. The first isomorphism must be natural in both a and b. Also, to complete the picture, there are some diagrams that must commute.

The two isomorphisms that define a closed functor can be relaxed and replaced by unidirectional morphisms. The result is a lax closed functor:

f (a => b) -> (f a => f b)
i -> f i

This looks almost like the definition of Applicative, except for one problem: how can we recover the natural transformation we call pure from a single morphism i -> f i.

One way to do it is from the position of strength. An endofunctor f has tensorial strength if there is a natural transformation:

stc a :: c ⊗ f a -> f (c ⊗ a)

Think of c as the context in which the computation f a is performed. Strength means that we can use this external context inside the computation.

In the category Set, with the tensor product replaced by cartesian product, all functors have canonical strength. In Haskell, we would define it as:

st (c, fa) = fmap ((,) c) fa

The morphism in the definition of the lax closed functor translates to:

unit :: () -> f ()

Notice that, up to isomorphism, the unit type () is the unit with respect to cartesian product. The relevant isomorphisms are:

λa :: ((), a) -> a
ρa :: (a, ()) -> a

Here’s the derivation from Rivas and Jaskelioff’s Notions of Computation as Monoids:

≅  (a, ())   -- unit law, ρ-1
-> (a, f ()) -- lax unit
-> f (a, ()) -- strength
≅  f a       -- lifted unit law, f ρ

Strength is necessary if you’re starting with a lax closed (or monoidal — see the next section) endofunctor in an arbitrary closed (or monoidal) category and you want to derive pure within that category — not after you restrict it to Set.

There is, however, an alternative derivation using the Yoneda lemma:

f ()
≅ forall a. (() -> a) -> f a  -- Yoneda
≅ forall a. a -> f a -- because: (() -> a) ≅ a

We recover the whole natural transformation from a single value. The advantage of this derivation is that it generalizes beyond endofunctors and it doesn’t require strength. As we’ll see later, it also ties nicely with the Day-convolution definition of applicative. The Yoneda lemma only works for Set-valued functors, but so does Day convolution (there are enriched versions of both Yoneda and Day convolution, but I’m not going to discuss them here).

We can define the categorical version of the Haskell’s applicative functor as a lax closed functor going from a closed category C to Set. It’s a functor equipped with a natural transformation:

f (a => b) -> (f a -> f b)

where a=>b is the internal hom-object in C (the second arrow is a function type in Set), and a function:

1 -> f i

where 1 is the singleton set and i is the unit object in C.

The importance of a categorical definition is that it comes with additional identities or “axioms.” A lax closed functor must be compatible with the structure of both categories. I will not go into details here, because we are really only interested in closed categories that are monoidal, where these axioms are easier to express.

The definition of a lax closed functor is easily translated to Haskell:

class Functor f => Closed f where
    (<*>) :: f (a -> b) -> f a -> f b
    unit :: f ()

Applicative as a Lax Monoidal Functor

Even though it’s possible to define a closed category without a monoidal structure, in practice we usually work with monoidal categories. This is reflected in the equivalent definition of the Haskell’s applicative functor as a lax monoidal functor. In Haskell, we would write:

class Functor f => Monoidal f where
    (>*<) :: (f a, f b) -> f (a, b)
    unit :: f ()

This definition is equivalent to our previous definition of a closed functor. That’s because, as we’ve seen, a function object in a monoidal category is defined in terms of a product. We can show the equivalence in a more general categorical setting.

This time let’s start with a symmetric closed monoidal category C, in which the function object is defined through the right adjoint to the tensor product:

C(z ⊗ a, b) ≅ C(z, a => b)

As usual, the tensor product is associative and unital — with the unit object i — up to isomorphism. The symmetry is defined through natural isomorphism:

γ :: a ⊗ b -> b ⊗ a

A functor f between two monoidal categories is lax monoidal if there exist: (1) a natural transformation

f a ⊗ f b -> f (a ⊗ b)

and (2) a morphism

i -> f i

Notice that the products and units on either side of the two mappings are from different categories.

A (lax-) monoidal functor must also preserve associativity and unit laws.

For instance a triple product

f a ⊗ (f b ⊗ f c)

may be rearranged using an associator α to give

(f a ⊗ f b) ⊗ f c

then converted to

f (a ⊗ b) ⊗ f c

and then to

f ((a ⊗ b) ⊗ c)

Or it could be first converted to

f a ⊗ f (b ⊗ c)

and then to

f (a ⊗ (b ⊗ c))

These two should be equivalent under the associator in C.


Similarly, f a ⊗ i can be simplified to f a using the right unitor ρ in D. Or it could be first converted to f a ⊗ f i, then to f (a ⊗ i), and then to f a, using the right unitor in C. The two paths should be equivalent. (Similarly for the left identity.)


We will now consider functors from C to Set, with Set equipped with the usual cartesian product, and the singleton set as unit. A lax monoidal functor is defined by: (1) a natural transformation:

(f a, f b) -> f (a ⊗ b)

and (2) a choice of an element of the set f i (a function from 1 to f i picks an element from that set).

We need the target category to be Set because we want to be able to use the Yoneda lemma to show equivalence with the standard definition of applicative. I’ll come back to this point later.

The Equivalence

The definitions of a lax closed and a lax monoidal functors are equivalent when C is a closed symmetric monoidal category. The proof relies on the existence of the adjunction, in particular the unit and the counit of the adjunction:

ηa :: a -> (b => (a ⊗ b))
εb :: (a => b) ⊗ a -> b

For instance, let’s assume that f is lax-closed. We want to construct the mapping

(f a, f b) -> f (a ⊗ b)

First, we apply the lifted pair (unit, identity), (f η, f id)

(f a -> f (b => a ⊗ b), f id)

to the left hand side. We get:

(f (b => a ⊗ b), f b)

Now we can use (the uncurried version of) the lax-closed morphism:

(f (b => x), f b) -> f x

to get:

f (a ⊗ b)

Conversely, assuming the lax-monoidal property we can show that the functor is lax-closed, that is to say, implement the following function:

(f (a => b), f a) -> f b

First we use the lax monoidal morphism on the left hand side:

f ((a => b) ⊗ a)

and then use the counit (a.k.a. the evaluation morphism) to get the desired result f b

There is yet another presentation of applicatives using Day convolution. But before we get there, we need a little refresher on calculus.

Calculus of Ends

Ends and coends are very useful constructs generalizing limits and colimits. They are defined through universal constructions. They have a few fundamental properties that are used over and over in categorical calculations. I’ll just introduce the notation and a few important identities. We’ll be working in a symmetric monoidal category C with functors from C to Set and profunctors from Cop×C to Set. The end of a profunctor p is a set denoted by:

a p a a

The most important thing about ends is that a set of natural transformations between two functors f and g can be represented as an end:

[C, Set](f, g) = ∫a C(f a, g a)

In Haskell, the end corresponds to universal quantification over a functor of mixed variance. For instance, the natural transformation formula takes the familiar form:

forall a. f a -> g a

The Yoneda lemma, which deals with natural transformations, can also be written using an end:

z (C(a, z) -> f z) ≅ f a

In Haskell, we can write it as the equivalence:

forall z. ((a -> z) -> f z) ≅ f a

which is a generalization of the continuation passing transform.

The dual notion of coend is similarly written using an integral sign, with the “integration variable” in the superscript position:

a p a a

In pseudo-Haskell, a coend is represented by an existential quantifier. It’s possible to define existential data types in Haskell by converting existential quantification to universal one. The relevant identity in terms of coends and ends reads:

(∫ z p z z) -> y ≅ ∫ z (p z z -> y)

In Haskell, this formula is used to turn functions that take existential types to functions that are polymorphic:

(exists z. p z z) -> y ≅ forall z. (p z z -> y)

Intuitively, it makes perfect sense. If you want to define a function that takes an existential type, you have to be prepared to handle any type.

The equivalent of the Yoneda lemma for coends reads:

z f z × C(z, a) ≅ f a

or, in pseudo-Haskell:

exists z. (f z, z -> a) ≅ f a

(The intuition is that the only thing you can do with this pair is to fmap the function over the first component.)

There is also a contravariant version of this identity:

z C(a, z) × f z ≅ f a

where f is a contravariant functor (a.k.a., a presheaf). In pseudo-Haskell:

exists z. (a -> z, f z) ≅ f a

(The intuition is that the only thing you can do with this pair is to apply the contramap of the first component to the second component.)

Using coends we can define a tensor product in the category of functors [C, Set]. This product is called Day convolution:

(f ★ g) a = ∫ x y f x × g y × C(x ⊗ y, a)

It is a bifunctor in that category (read, it can be used to lift natural transformations). It’s associative and symmetric up to isomorphism. It also has a unit — the hom-functor C(i, -), where i is the monoidal unit in C. In other words, Day convolution imbues the category [C, Set] with monoidal structure.

Let’s verify the unit laws.

(C(i, -) ★ g) a = ∫ x y C(i, x) × g y × C(x ⊗ y, a)

We can use the contravariant Yoneda to “integrate over x” to get:

y g y × C(i ⊗ y, a)

Considering that i is the unit of the tensor product in C, we get:

y g y × C(y, a)

Covariant Yoneda lets us “integrate over y” to get the desired g a. The same method works for the right unit law.

Applicative as a Monoid

Given a monoidal category, we can always define a monoid as an object m equipped with two morphisms:

μ :: m ⊗ m -> m
η :: i -> m

satisfying the laws of associativity and unitality.

We have shown that the functor category [C, Set] (with C a symmetric monoidal category) is monoidal under Day convolution. An object in this category is a functor f. The two morphisms that would make it a candidate for a monoid are natural transformations:

μ :: f ★ f -> f
η :: C(i, -) -> f

The a component of the natural transformation μ can be rewritten as:

(∫ x y f x × f y × C(x ⊗ y, a)) -> f a

which is equivalent to:

x y (f x × f y × C(x ⊗ y, a) -> f a)

or, upon currying:

x y (f x, f y) -> C(x ⊗ y, a) -> f a

It turns out that so defined monoid is equivalent to a lax monoidal functor. This was shown by Rivas and Jaskelioff. The following derivation is due to Bob Atkey.

The trick is to start with the whole set of natural transformation from f★f to f. The multiplication μ is just one of them. We’ll express the set of natural transformations as an end:

a ((f ★ f) a -> f a)

Plugging in the formula for the a component of μ, we get:

a x y (f x, f y) -> C(x ⊗ y, a) -> f a

The end over a does not involve the first argument, so we can move the integral sign:

x y (f x, f y) -> ∫ a C(x ⊗ y, a) -> f a

Then we use the Yoneda lemma to “perform the integration” over a:

x y (f x, f y) -> f (x ⊗ y)

You may recognize this as a set of natural transformations that define a lax monoidal functor. We have established a one-to-one correspondence between these natural transformations and the ones defining monoidal multiplication using Day convolution.

The remaining part is to show the equivalence between the unit with respect to Day convolution and the second part of the definition of the lax monoidal functor, the morphism:

1 -> f i

We start with the set of natural transformations that contains our η:

a (i -> a) -> f a

By Yoneda, this is just f i. Picking an element from a set is equivalent to defining a morphism from the singleton set 1, so for any choice of η we get:

1 -> f i

and vice versa. The two definitions are equivalent.

Notice that the monoidal unit η under Day convolution becomes the definition of pure in the Haskell version of applicative. Indeed, when we replace the category C with Set, f becomes and endofunctor, and the unit of Day convolution C(i, -) becomes the identity functor Id. We get:

η :: Id -> f

or, in components:

pure :: a -> f a

So, strictly speaking, the Haskell definition of Applicative mixes the elements of the lax closed functor and the monoidal unit under Day convolution.


I’m grateful to Mauro Jaskelioff and Exequiel Rivas for correspondence and to Bob Atkey, Dimitri Chikhladze, and Make Shulman for answering my questions on Math Overflow.

This is part 9 of Categories for Programmers. Previously: Functoriality. See the Table of Contents.

So far I’ve been glossing over the meaning of function types. A function type is different from other types.

Take Integer, for instance: It’s just a set of integers. Bool is a two element set. But a function type a->b is more than that: it’s a set of morphisms between objects a and b. A set of morphisms between two objects in any category is called a hom-set. It just so happens that in the category Set every hom-set is itself an object in the same category —because it is, after all, a set.

Hom-set in Set is just a set

Hom-set in Set is just a set

The same is not true of other categories where hom-sets are external to a category. They are even called external hom-sets.

Hom-set in category C is an external set

Hom-set in category C is an external set

It’s the self-referential nature of the category Set that makes function types special. But there is a way, at least in some categories, to construct objects that represent hom-sets. Such objects are called internal hom-sets.

Universal Construction

Let’s forget for a moment that function types are sets and try to construct a function type, or more generally, an internal hom-set, from scratch. As usual, we’ll take our cues from the Set category, but carefully avoid using any properties of sets, so that the construction will automatically work for other categories.

A function type may be considered a composite type because of its relationship to the argument type and the result type. We’ve already seen the constructions of composite types — those that involved relationships between objects. We used universal constructions to define a product type and a coproduct types. We can use the same trick to define a function type. We will need a pattern that involves three objects: the function type that we are constructing, the argument type, and the result type.

The obvious pattern that connects these three types is called function application or evaluation. Given a candidate for a function type, let’s call it z (notice that, if we are not in the category Set, this is just an object like any other object), and the argument type a (an object), the application maps this pair to the result type b (an object). We have three objects, two of them fixed (the ones representing the argument type and the result type).

We also have the application, which is a mapping. How do we incorporate this mapping into our pattern? If we were allowed to look inside objects, we could pair a function f (an element of z) with an argument x (an element of a) and map it to f x (the application of f to x, which is an element of b).

In Set we can pick a function f from a set of functions z and we can pick an argument x from the set (type) a. We get an element f x in the set (type) b.

In Set we can pick a function f from a set of functions z and we can pick an argument x from the set (type) a. We get an element f x in the set (type) b.

But instead of dealing with individual pairs (f, x), we can as well talk about the whole product of the function type z and the argument type a. The product z×a is an object, and we can pick, as our application morphism, an arrow g from that object to b. In Set, g would be the function that maps every pair (f, x) to f x.

So that’s the pattern: a product of two objects z and a connected to another object b by a morphism g.

A pattern of objects and morphisms that is the starting point of the universal construction

A pattern of objects and morphisms that is the starting point of the universal construction

Is this pattern specific enough to single out the function type using a universal construction? Not in every category. But in the categories of interest to us it is. And another question: Would it be possible to define a function object without first defining a product? There are categories in which there is no product, or there isn’t a product for all pairs of objects. The answer is no: there is no function type, if there is no product type. We’ll come back to this later when we talk about exponentials.

Let’s review the universal construction. We start with a pattern of objects and morphisms. That’s our imprecise query, and it usually yields lots and lots of hits. In particular, in Set, pretty much everything is connected to everything. We can take any object z, form its product with a, and there’s going to be a function from it to b (except when b is an empty set).

That’s when we apply our secret weapon: ranking. This is usually done by requiring that there be a mapping between candidate objects — a mapping that somehow factorizes our construction. In our case, we’ll decree that z together with the morphism g from z×a to b is better than some other z' with its own application g', if and only if there is a mapping h from z' to z such that the application of g' factors through the application of g. (Hint: Read this sentence while looking at the picture.)

Establishing a ranking between candidates for the function object

Establishing a ranking between candidates for the function object

Now here’s the tricky part, and the main reason I postponed this particular universal construction till now. Given the morphism h :: z'-> z, we want to close the diagram that has both z' and z crossed with a. What we really need, given the mapping h from z' to z, is a mapping from z'×a to z×a. And now, after discussing the functoriality of the product, we know how to do it. Because the product itself is a functor (more precisely an endo-bi-functor), it’s possible to lift pairs of morphisms. In other words, we can define not only products of objects but also products of morphisms.

Since we are not touching the second component of the product z'×a, we will lift the pair of morphisms (h, id), where id is an identity on a.

So, here’s how we can factor one application, g, out of another application g':

g' = g ∘ (h × id)

The key here is the action of the product on morphisms.

The third part of the universal construction is selecting the object that is universally the best. Let’s call this object a⇒b (think of this as a symbolic name for one object, not to be confused with a Haskell typeclass constraint — I’ll discuss different ways of naming it later). This object comes with its own application — a morphism from (a⇒b)×a to b — which we will call eval. The object a⇒b is the best if any other candidate for a function object can be uniquely mapped to it in such a way that its application morphism g factorizes through eval. This object is better than any other object according to our ranking.

The definition of the universal function object

The definition of the universal function object. This is the same diagram as above, but now the object a⇒b is universal.


A function object from a to b is an object a⇒b together with the morphism

eval :: ((a⇒b) × a) -> b

such that for any other object z with a morphism

g :: z × a -> b

there is a unique morphism

h :: z -> (a⇒b)

that factors g through eval:

g = eval ∘ (h × id)

Of course, there is no guarantee that such an object a⇒b exists for any pair of objects a and b in a given category. But it always does in Set. Moreover, in Set, this object is isomorphic to the hom-set Set(a, b).

This is why, in Haskell, we interpret the function type a->b as the categorical function object a⇒b.


Let’s have a second look at all the candidates for the function object. This time, however, let’s think of the morphism g as a function of two variables, z and a.

g :: z × a -> b

Being a morphism from a product comes as close as it gets to being a function of two variables. In particular, in Set, g is a function from pairs of values, one from the set z and one from the set a.

On the other hand, the universal property tells us that for each such g there is a unique morphism h that maps z to a function object a⇒b.

h :: z -> (a⇒b)

In Set, this just means that h is a function that takes one variable of type z and returns a function from a to b. That makes h a higher order function. Therefore the universal construction establishes a one-to-one correspondence between functions of two variables and functions of one variable returning functions. This correspondence is called currying, and h is called the curried version of g.

This correspondence is one-to-one, because given any g there is a unique h, and given any h you can always recreate the two-argument function g using the formula:

g = eval ∘ (h × id)

The function g can be called the uncurried version of h.

Currying is essentially built into the syntax of Haskell. A function returning a function:

a -> (b -> c)

is often thought of as a function of two variables. That’s how we read the un-parenthesized signature:

a -> b -> c

This interpretation is apparent in the way we define multi-argument functions. For instance:

catstr :: String -> String -> String
catstr s s’ = s ++ s’

The same function can be written as a one-argument function returning a function — a lambda:

catstr’ s = \s’ -> s ++ s’

These two definitions are equivalent, and either can be partially applied to just one argument, producing a one-argument function, as in:

greet :: String -> String
greet = catstr “Hello “

Strictly speaking, a function of two variables is one that takes a pair (a product type):

(a, b) -> c

It’s trivial to convert between the two representations, and the two (higher-order) functions that do it are called, unsurprisingly, curry and uncurry:

curry :: ((a, b)->c) -> (a->b->c)
curry f a b = f (a, b)


uncurry :: (a->b->c) -> ((a, b)->c)
uncurry f (a, b) = f a b

Notice that curry is the factorizer for the universal construction of the function object. This is especially apparent if it’s rewritten in this form:

factorizer :: ((a, b)->c) -> (a->(b->c))
factorizer g = \a -> (\b -> g (a, b))

(As a reminder: A factorizer produces the factorizing function from a candidate.)

In non-functional languages, like C++, currying is possible but nontrivial. You can think of multi-argument functions in C++ as corresponding to Haskell functions taking tuples (although, to confuse things even more, in C++ you can define functions that take an explicit std::tuple, as well as variadic functions, and functions taking initializer lists).

You can partially apply a C++ function using the template std::bind. For instance, given a function of two strings:

std::string catstr(std::string s1, std::string s2) {
    return s1 + s2;

you can define a function of one string:

using namespace std::placeholders;

auto greet = std::bind(catstr, "Hello ", _1);
std::cout << greet("Haskell Curry");

Scala, which is more functional than C++ or Java, falls somewhere in between. If you anticipate that the function you’re defining will be partially applied, you define it with multiple argument lists:

def catstr(s1: String)(s2: String) = s1 + s2

Of course that requires some amount of foresight or prescience on the part of a library writer.


In mathematical literature, the function object, or the internal hom-object between two objects a and b, is often called the exponential and denoted by ba. Notice that the argument type is in the exponent. This notation might seem strange at first, but it makes perfect sense if you think of the relationship between functions and products. We’ve already seen that we have to use the product in the universal construction of the internal hom-object, but the connection goes deeper than that.

This is best seen when you consider functions between finite types — types that have a finite number of values, like Bool, Char, or even Int or Double. Such functions, at least in principle, can be fully memoized or turned into data structures to be looked up. And this is the essence of the equivalence between functions, which are morphisms, and function types, which are objects.

For instance a (pure) function from Bool is completely specified by a pair of values: one corresponding to False, and one corresponding to True. The set of all possible functions from Bool to, say, Int is the set of all pairs of Ints. This is the same as the product Int × Int or, being a little creative with notation, Int2.

For another example, let’s look at the C++ type char, which contains 256 values (Haskell Char is larger, because Haskell uses Unicode). There are several functions in the part of the C++ Standard Library that are usually implemented using lookups. Functions like isupper or isspace are implemented using tables, which are equivalent to tuples of 256 Boolean values. A tuple is a product type, so we are dealing with products of 256 Booleans: bool × bool × bool × ... × bool. We know from arithmetics that an iterated product defines a power. If you “multiply” bool by itself 256 (or char) times, you get bool to the power of char, or boolchar.

How many values are there in the type defined as 256-tuples of bool? Exactly 2256. This is also the number of different functions from char to bool, each function corresponding to a unique 256-tuple. You can similarly calculate that the number of functions from bool to char is 2562, and so on. The exponential notation for function types makes perfect sense in these cases.

We probably wouldn’t want to fully memoize a function from int or double. But the equivalence between functions and data types, if not always practical, is there. There are also infinite types, for instance lists, strings, or trees. Eager memoization of functions from those types would require infinite storage. But Haskell is a lazy language, so the boundary between lazily evaluated (infinite) data structures and functions is fuzzy. This function vs. data duality explains the identification of Haskell’s function type with the categorical exponential object — which corresponds more to our idea of data.

Cartesian Closed Categories

Although I will continue using the category of sets as a model for types and functions, it’s worth mentioning that there is a larger family of categories that can be used for that purpose. These categories are called cartesian closed, and Set is just one example of such a category.

A cartesian closed category must contain:

  1. The terminal object,
  2. A product of any pair of objects, and
  3. An exponential for any pair of objects.

If you consider an exponential as an iterated product (possibly infinitely many times), then you can think of a cartesian closed category as one supporting products of an arbitrary arity. In particular, the terminal object can be thought of as a product of zero objects — or the zero-th power of an object.

What’s interesting about cartesian closed categories from the perspective of computer science is that they provide models for the simply typed lambda calculus, which forms the basis of all typed programming languages.

The terminal object and the product have their duals: the initial object and the coproduct. A cartesian closed category that also supports those two, and in which product can be distributed over coproduct

a × (b + c) = a × b + a × c
(b + c) × a = b × a + c × a

is called a bicartesian closed category. We’ll see in the next section that bicartesian closed categories, of which Set is a prime example, have some interesting properties.

Exponentials and Algebraic Data Types

The interpretation of function types as exponentials fits very well into the scheme of algebraic data types. It turns out that all the basic identities from high-school algebra relating numbers zero and one, sums, products, and exponentials hold pretty much unchanged in any bicartesian closed category theory for, respectively, initial and final objects, coproducts, products, and exponentials. We don’t have the tools yet to prove them (such as adjunctions or the Yoneda lemma), but I’ll list them here nevertheless as a source of valuable intuitions.

Zeroth Power

a0 = 1

In the categorical interpretation, we replace 0 with the initial object, 1 with the final object, and equality with isomorphism. The exponential is the internal hom-object. This particular exponential represents the set of morphisms going from the initial object to an arbitrary object a. By the definition of the initial object, there is exactly one such morphism, so the hom-set C(0, a) is a singleton set. A singleton set is the terminal object in Set, so this identity trivially works in Set. What we are saying is that it works in any bicartesian closed category.

In Haskell, we replace 0 with Void; 1 with the unit type (); and the exponential with function type. The claim is that the set of functions from Void to any type a is equivalent to the unit type — which is a singleton. In other words, there is only one function Void->a. We’ve seen this function before: it’s called absurd.

This is a little bit tricky, for two reasons. One is that in Haskell we don’t really have uninhabited types — every type contains the “result of a never ending calculation,” or the bottom. The second reason is that all implementations of absurd are equivalent because, no matter what they do, nobody can ever execute them. There is no value that can be passed to absurd. (And if you manage to pass it a never ending calculation, it will never return!)

Powers of One

1a = 1

This identity, when interpreted in Set, restates the definition of the terminal object: There is a unique morphism from any object to the terminal object. In general, the internal hom-object from a to the terminal object is isomorphic to the terminal object itself.

In Haskell, there is only one function from any type a to unit. We’ve seen this function before — it’s called unit. You can also think of it as the function const partially applied to ().

First Power

a1 = a

This is a restatement of the observation that morphisms from the terminal object can be used to pick “elements” of the object a. The set of such morphisms is isomorphic to the object itself. In Set, and in Haskell, the isomorphism is between elements of the set a and functions that pick those elements, ()->a.

Exponentials of Sums

ab+c = ab × ac

Categorically, this says that the exponential from a coproduct of two objects is isomorphic to a product of two exponentials. In Haskell, this algebraic identity has a very practical, interpretation. It tells us that a function from a sum of two types is equivalent to a pair of functions from individual types. This is just the case analysis that we use when defining functions on sums. Instead of writing one function definition with a case statement, we usually split it into two (or more) functions dealing with each type constructor separately. For instance, take a function from the sum type (Either Int Double):

f :: Either Int Double -> String

It may be defined as a pair of functions from, respectively, Int and Double:

f (Left n)  = if n < 0 then "Negative int" else "Positive int"
f (Right x) = if x < 0.0 then "Negative double" else "Positive double"

Here, n is an Int and x is a Double.

Exponentials of Exponentials

(ab)c = ab×c

This is just a way of expressing currying purely in terms of exponential objects. A function returning a function is equivalent to a function from a product (a two-argument function).

Exponentials over Products

(a × b)c = ac × bc

In Haskell: A function returning a pair is equivalent to a pair of functions, each producing one element of the pair.

It’s pretty incredible how those simple high-school algebraic identities can be lifted to category theory and have practical application in functional programming.

Curry-Howard Isomorphism

I have already mentioned the correspondence between logic and algebraic data types. The Void type and the unit type () correspond to false and true. Product types and sum types correspond to logical conjunction ∧ (AND) and disjunction ⋁ (OR). In this scheme, the function type we have just defined corresponds to logical implication ⇒. In other words, the type a->b can be read as “if a then b.”

According to the Curry-Howard isomorphism, every type can be interpreted as a proposition — a statement or a judgment that may be true or false. Such a proposition is considered true if the type is inhabited and false if it isn’t. In particular, a logical implication is true if the function type corresponding to it is inhabited, which means that there exists a function of that type. An implementation of a function is therefore a proof of a theorem. Writing programs is equivalent to proving theorems. Let’s see a few examples.

Let’s take the function eval we have introduced in the definition of the function object. Its signature is:

eval :: ((a -> b), a) -> b

It takes a pair consisting of a function and its argument and produces a result of the appropriate type. It’s the Haskell implementation of the morphism:

eval :: (a⇒b) × a -> b

which defines the function type a⇒b (or the exponential object ba). Let’s translate this signature to a logical predicate using the Curry-Howard isomorphism:

((a ⇒ b) ∧ a) ⇒ b

Here’s how you can read this statement: If it’s true that b follows from a, and a is true, then b must be true. This makes perfect intuitive sense and has been known since antiquity as modus ponens. We can prove this theorem by implementing the function:

eval :: ((a -> b), a) -> b
eval (f, x) = f x

If you give me a pair consisting of a function f taking a and returning b, and a concrete value x of type a, I can produce a concrete value of type b by simply applying the function f to x. By implementing this function I have just shown that the type ((a -> b), a) -> b is inhabited. Therefore modus ponens is true in our logic.

How about a predicate that is blatantly false? For instance: if a or b is true then a must be true.

a ⋁ b ⇒ a

This is obviously wrong because you can chose an a that is false and a b that is true, and that’s a counter-example.

Mapping this predicate into a function signature using the Curry-Howard isomorphism, we get:

Either a b -> a

Try as you may, you can’t implement this function — you can’t produce a value of type a if you are called with the Right value. (Remember, we are talking about pure functions.)

Finally, we come to the meaning of the absurd function:

absurd :: Void -> a

Considering that Void translates into false, we get:

 false ⇒ a

Anything follows from falsehood (ex falso quodlibet). Here’s one possible proof (implementation) of this statement (function) in Haskell:

absurd (Void a) = absurd a

where Void is defined as:

newtype Void = Void Void

As always, the type Void is tricky. This definition makes it impossible to construct a value because in order to construct one, you would need to provide one. Therefore, the function absurd can never be called.

These are all interesting examples, but is there a practical side to Curry-Howard isomorphism? Probably not in everyday programming. But there are programming languages like Agda or Coq, which take advantage of the Curry-Howard isomorphism to prove theorems.

Computers are not only helping mathematicians do their work — they are revolutionizing the very foundations of mathematics. The latest hot research topic in that area is called Homotopy Type Theory, and is an outgrowth of type theory. It’s full of Booleans, integers, products and coproducts, function types, and so on. And, as if to dispel any doubts, the theory is being formulated in Coq and Agda. Computers are revolutionizing the world in more than one way.


  1. Ralph Hinze, Daniel W. H. James, Reason Isomorphically!. This paper contains proofs of all those high-school algebraic identities in category theory that I mentioned in this chapter.

Next: Natural Transformations.


I’d like to thank Gershom Bazerman for checking my math and logic, and André van Meulebrouck, who has been volunteering his editing help throughout this series of posts.

I’m not fond of arguments based on lack of imagination. “There’s no way this code may fail!” might be a sign of great confidence or the result of ignorance. The inability to come up with a counterexample doesn’t prove a theorem. And yet there is one area of programming where such arguments work, and are quite useful. These are parametricity arguments: free theorems about polymorphic functions. Fortunately, there is solid theory behind parametricity. Free theorems are not based on ignorance. So I decided to read the relevant papers (see bibliography at the end of this post) and write a blog about it. How hard could it be? A few months and several failed attempts later I realized how naive I was. But I think I finally understand the basics enough to explain them in relatively simple terms.


Here’s a classic example — a function that takes a list of arbitrary type a and returns a list of the same type:

r :: [a] -> [a]

What can this function do? Since it has to work with any type of list element, it can’t do anything type-specific. It can’t modify the elements or invent new ones. So all it can do is rearrange them, duplicate, or remove. Can you think of anything else?

The questions it a little tricky because it all depends on the kind of polymorphism your language supports. In Haskell, where we have parametric polymorphism, the above statement is for the most part true (modulo termination worries). In C++, which supports ad-hoc polymorphism, a generic function like:

template<class T> 
list<T> r(list<T>);

can do all kinds of weird things.

Parametric polymorphism means that a function will act on all types uniformly, so the above declaration of r indeed drastically narrows down the possibilities.

For instance, consider what happens when you map any function of the type:

f :: a -> b

over a list of a. You can either apply map before or after acting on it with r. It shouldn’t matter whether you first modify the elements of the list and then rearrange them, or first rearrange and then modify them. The result should be the same:

r (map f as) = map f (r as)

But is it true just because we can’t imagine how it may fail, or can we make a formal argument to prove it?

Let’s Argue (Denotational) Semantics

One way to understand polymorphism is to have a good model for types. At first approximation types can be modeled as sets of values (strictly speaking, as shown by Reynolds, the set-theoretical model fails in the case of polymorphic lambda calculus, but there are ways around it).

The type Bool is a two-element set of True and False, Integer is a set of integers, and so on. Composite types can also be defined set-theoretically. For instance, a pair type is a cartesian product of two sets. A list of a is a set of lists with elements from the set a. A function type a->b is a set of functions between two sets.

For parametric polymorphism you need to first be able to define functions on types: functions that take a type and produce a new type. In other words, you should be able to define a family of types that is parametrized by another type. In Haskell, we call such things type constructors.

For instance, given some type a, produce a type of pairs: (a, a). This can be formally written (not in Haskell) as:

Λa . (a, a)

Notice the capital lambda for defining functions on types (sets), as opposed to the lowercase lambda used for functions on values (set elements).

To turn a family of types into a family of values — a polymorphic value — you put the universal quantifier forall in front of it. Don’t read too much into the quantifier aspect of it — it makes sense in the Curry-Howard isomorphism, but here it’s just a piece of syntax. It means that you use the type constructor to pick a type, and then you pick a specific value of that type.

You may recall the Axiom of Choice (AoC) from set theory. This axiom says that if you have a set of sets then there always exists a set of samples created by picking one element from each set. It’s like going to a chocolate store and ordering one of each. It’s a controversial axiom, and mathematicians are very careful in either using or avoiding it. The controversy is that, for infinite sets of sets, there may be no constructive way of picking elements. And in computer science we are not so much interested in proofs of existence, as in actual algorithms that produce tangible results.

Here’s an example:

forall a . (a, a)

This is a valid type signature, but you’d be hard pressed to implement it. You’d have to provide a pair of concrete values for every possible type. You can’t do it uniformly across all types. (Especially that some types are uninhabited, as Gershom Bazerman kindly pointed out to me.)

Interestingly enough, you can sometimes define polymorphic values if you constrain polymorphism to certain typeclasses. For instance, when you define a numeric constant in Haskell:

x = 1

its type is polymorphic:

x :: forall a. Num a => a

(using the language extension ExplicitForAll). Here x represents a whole family of values, including:

1.0 :: Float
1 :: Int
1 :: Integer

with potentially different representations.

But there are some types of values that can be specified wholesale. These are function values. Functions are first class values in Haskell (although you can’t compare them for equality). And with one formula you can define a whole family of functions. The following signature, for instance, is perfectly implementable:

forall a . a -> a

Let’s analyze it. It consists of a type function, or a type constructor:

Λa . a -> a

which, for any type a, returns a function type a->a. When universally quantified with forall, it becomes a family of concrete functions, one per each type. This is possible because all these functions can be defined with one closed term (see Appendix 2). Here’s this term:

\x -> x

In this case we actually have a constructive way of picking one element — a function — for each type a. For instance, if a is a String, we pick a function that takes any String and returns the same string. It’s a particular String->String function, one of many possible String->String functions. And it’s different from the Int->Int function that takes an Int and returns the same Int. But all these identity functions are encoded using the same lambda expression. It’s that generic formula that allows us to chose a representative function from each set of functions a->a: one from the set String->String, one from the set Int->Int, etc.

In Haskell, we usually omit the forall quantifier when there’s no danger of confusion. Any signature that contains a type variable is automatically universally quantified over it. (You’ll have to use explicit forall, however, with higher-order polymorphism, where a polymorphic function can be passed as an argument to another function.)

So what’s the set-theoretic model for polymorphism? You simply replace types with sets. A function on types becomes a function on sets. Notice that this is not the same as a function between sets. The latter assigns elements of one set to elements of another. The former assigns sets to sets — you could call it a set constructor. As in: Take any set a and return a cartesian product of this set with itself.

Or take any set a and return the set of functions from this set to itself. We have just seen that for this one we can easily build a polymorphic function — one which for every type a produces an actual function whose type is (a->a). Now, with ad-hoc polymorphism it’s okay to code the String function separately from the Int function; but in parametric polymorphism, you’ll have to use the same code for all types.

This uniformity — one formula for all types — dramatically restricts the set of polymorphic functions, and is the source of free theorems.

Any language that provides some kind of pattern-matching on types (e.g., template specialization in C++) automatically introduces ad-hoc polymorphism. Ad-hoc polymorphism is also possible in Haskell through the use of type classes and type families.

Preservation of Relations

Let’s go to our original example and rewrite it using the explicit universal quantifier:

r :: forall a. [a] -> [a]

It defines a family of functions parametrized by the type a. When used in Haskell code, a particular member of this family will be picked automatically by the type inference system, depending on the context. In what follows, I’ll use explicit subscripting for the same purpose. The free theorem I mentioned before can be rewritten as:

rb (map f as) = map f (ra as)

with the function:

f :: a -> b

serving as a bridge between the types a and b. Specifically, f relates values of type a to values of type b. This relation happens to be functional, which means that there is only one value of type b corresponding to any given value of type a.

But the correspondence between elements of two lists may, in principle, be more general. What’s more general than a function? A relation. A relation between two sets a and b is defined as a set of pairs — a subset of the cartesian product of a and b. A function is a special case of a relation, one that can be represented as a set of pairs of the form (x, f x), or in relational notation x <=> f x. This relation is often called the graph of the function, since it can be interpreted as coordinates of points on a 2-d plane that form the plot the function.

The key insight of Reynolds was that you can abstract the shape of a data structure by defining relations between values. For instance, how do we know that two pairs have the same shape — even if one is a pair of integers, say (1, 7), and the other a pair of colors, say (Red, Blue)? Because we can relate 1 to Red and 7 to Blue. This relation may be called: “occupying the same position”.

Notice that the relation doesn’t have to be functional. The pair (2, 2) can be related to the pair (Black, White) using the non-functional relation:

(2 <=> Black),
(2 <=> White)

This is not a function because 2 is not mapped to a single value.

Conversely, given any relation between integers and colors, you can easily test which integer pairs are related to which color pairs. For the above relation, for instance, these are all the pairs that are related:

((2, 2) <=> (Black, Black)),
((2, 2) <=> (Black, White)),
((2, 2) <=> (White, Black)),
((2, 2) <=> (White, White))

Thus a relation between values induces a relation between pairs.

This idea is easily extended to lists. Two lists are related if their corresponding elements are related: the first element of one list must be related to the first element of the second list, etc.; and empty lists are always related.

In particular, if the relationship between elements is established by a function f, it’s easy to convince yourself that the lists as and bs are related if

bs = map f as

With this in mind, our free theorem can be rewritten as:

rb bs = map f (ra as)

In other words, it tells us that the two lists

rb bs


ra as

are related through f.


Fig 1. Polymorphic function r rearranges lists but preserves relations between elements

So r transforms related lists into related lists. It may change the shape of the list, but it never touches the values in it. When it acts on two related lists, it rearranges them in exactly the same way, without breaking any of the relations between corresponding elements.

Reading Types as Relations

The above examples showed that we can define relations between values of composite types in terms of relations between values of simpler types. We’ve seen this with the pair constructor and with the list constructor. Continuing this trend, we can state that two functions:

f :: a -> b


g :: a' -> b'

are related iff, for related x and y, f x is related to g y. In other words, related functions map related arguments to related values.

Notice what we are doing here: We are consistently replacing types with relations in type constructors. This way we can read complex types as relations. The type constructor -> acts on two types, a and b. We extend it to act on relations: The “relation constructor” -> in A->B takes two relations A (between a and a') and B (between b and b') and produces a relation between functions f and g.

But what about primitive types? Let’s consider an example. Two functions from lists to integers that simply calculate the lengths of the lists:

lenStr  :: [Char] -> Int
lenBool :: [Bool] -> Int

What happens when we call them with two related lists? The first requirement for lists to be related is that they are of equal length. So when called with related lists the two functions will return the same integer value . It makes sense for us to consider these two functions related because they don’t inspect the values stored in the lists — just their shapes. (They also look like components of the same parametrically polymorphic function, length.)

It therefore makes sense to read a primitive type, such as Int, as an identity relation: two values are related if they are equal. This way our two functions, lenStr and lenBool are indeed related, because they turn related lists to related (equal) results.

Notice that for non-polymorphic functions the relationship that follows from their type is pretty restrictive. For instance, two functions Int->Int are related if and only if their outputs are equal for equal inputs. In other words, the functions must be (extensionally) equal.

All these relations are pretty trivial until we get to polymorphic functions. The type of a polymorphic function is specified by universally quantifying a function on types (a type constructor).

f :: forall a. φa

The type constructor φ maps types to types. In our set-theoretical model it maps sets to sets, but we want to read it in terms of relations.

Functions on relations

A general relation is a triple: We have to specify three sets, a, a', and a set of pairs — a subset of the cartesian product a × a'. It’s not at all obvious how to define functions that map relations to relations. What Reynolds chose is a definition that naturally factorizes into three mappings of sets, or to use the language of programming, three type constructors.

First of all, a function on relations Φ (or a “relation constructor”) is defined by two type constructors, φ and ψ. When Φ acts on a relation A between sets a and a', it first maps those sets, so that b=φa and b'=ψa'. ΦA then establishes a relation between the sets b and b'. In other words, ΦA is a subset of b × b'.


Fig 2. Φ maps relations to relations. The squarish sets represent cartesian products (think of a square as a cartesian product of two segments). Relations A and ΦA are subsets of these products.

Relations between polymorphic functions

Given that Φ maps relations to relations, a universally quantified version of it:

forall A. ΦA

maps pairs of sets to pairs of values.

Now suppose that you have two polymorphic functions g and g':

g  :: forall a . φa
g' :: forall a'. ψa'

They both map types (sets) to values.

  • We can instantiate g at some type a, and it will return a value ga of the type b=φa.
  • We can instantiate g' at some type a', and it will return a value g'a' of the type b'=ψa'.

We can do this for any relation A between two arbitrary sets a and a'.

We will say that g and g' are related through the relation induced by the type (forall A. ΦA) iff the results ga and g'a' are related by ΦA.


Fig 3. Relation between two polymorphic functions. The pair (g a, g' a') falls inside the relation ΦA.

In other words, polymorphic functions are related if they map related types to related values. Notice that in the interesting examples these values are themselves functions.

With these definitions, we can now reinterpret any type signature as a relation between values.

The Parametricity Theorem

Reynolds’ second key insight was that any term is in a relation with itself — the relation being induced by the term’s type. We have indeed defined the mapping of types to relations to make this work. Primitive types turn into identity relations, so obviously a primitive value is in relation with itself. A function between primitive types is in relation with itself because it maps related (equal) arguments into related (equal) results. A list or a pair of primitive types is in relation with itself because each element of it is equal to itself. You can recurse and consider a list of functions, or a pair of lists, etc., building the proof inductively, proceeding from simpler types to more and more complex types. The proof goes over all possible term construction rules and typing rules in a given theory.

Formally, this kind of proof is called “structural induction,” because you’re showing that more complex structures will satisfy the theorem as long as the simpler ones, from which they are constructed, do. The only tricky part is dealing with polymorphic functions, because they are quantified over all types (including polymorphic types). In fact, this is the reason why the naive interpretation of types as sets breaks down (see, however, Pitts’ paper). It is possible, however, to prove the parametricity theorem in a more general setting, for instance, using frames, or in the framework of operational semantics, so we won’t worry about it here.

Wadler’s key insight was to interpret Reynolds’ theorem not only as a way of identifying different implementations of the same type — for instance, cartesian and polar representations of complex numbers — but also as a source of free theorems for polymorphic types.

Let’s try applying parametricity theorem to some simple examples. Take a constant term: an integer like 5. Its type Int can be interpreted as a relation, which we defined to be the identity relation (it’s one of the primitive types). And indeed, 5 is in this relation with 5.

Take a function like:

ord :: Char -> Int

Its type defines a relation between functions: Two functions of the type Char->Int are related if they return equal integers for equal characters. Obviously, ord is in this relation with itself.

Parametricity in Action

Those were trivial examples. The interesting ones involve polymorphic functions. So let’s go back to our starting example. The term now is the polymorphic function r whose type is:

r :: forall a . [a] -> [a]

Parametricity tells us that r is in relation with itself. However, comparing a polymorphic function to itself involves comparing the instantiations of the same function at two arbitrary types, say a and a'. Let’s go through this example step by step.

We are free to pick an arbitrary relation A between elements of two arbitrary input sets a and a'. The type of r induces a mapping Φ on relations. As with every function on relations, we have to first identify the two type constructors φ and ψ, one mapping a and one mapping a'. In our case they are identical, because they are induced by the same polymorphic function. They are equal to:

Λ a. [a]->[a]

It’s a type constructor that maps an arbitrary type a to the function type [a]->[a].

The universal quantifier forall means that r lets us pick a particular value of the type [a]->[a] for each a. This value is a function that we call ra. We don’t care how this function is picked by r, as long as it’s picked uniformly, using a single formula for all a, so that our parametricity theorem holds.


Fig 4. Polymorphic function r maps related types to related values, which themselves are functions on lists

Parametricity means that, if a is related to a', then:

ra <=> ra'

This particular relation is induced by the function type [a]->[a]. By our definition, two functions are related if they map related arguments to related results. In this case both the arguments and the results are lists. So if we have two related lists, as and as':

as  :: [a]
as' :: [a']

they must, by parametricity, be mapped to two related lists, bs and bs':

bs  = ra  as
bs' = ra' as'

This must be true for any relation A, so let’s pick a functional relation generated by some function:

f :: a -> a'

This relation induces a relation on lists:

as' = map f as

The results of applying r, therefore, must be related through the same relation:

bs' = map f bs

Combining all these equalities, we get our expected result:

ra' (map f as) = map f (ra as)

Parametricity and Natural Transformations

The free theorem I used as the running example is interesting for another reason: The list constructor is a functor. You may think of functors as generalized containers for storing arbitrary types of values. You can imagine that they have shapes; and for two containers of the same shape you may establish a correspondence between “positions” at which the elements are stored. This is quite easy for traditional containers like lists or trees, and with a leap of faith it can be stretched to non-traditional “containers” like functions. We used the intuition of relations corresponding to the idea of “occupying the same position” within a data structure. This notion can be readily generalized to any polymorphic containers. Two trees, for instance, are related if they are both empty, or if they have the same shape and their corresponding elements are related.

Let’s try another functor: You can also think of Maybe as having two shapes: Nothing and Just. Two Nothings are always related, and two Justs are related if their contents are related.

This observation immediately gives us a free theorem about polymorphic functions of the type:

r :: forall a. [a] -> Maybe a

an example of which is safeHead. The theorem is:

fmap h . safeHead == safeHead . fmap h

Notice that the fmap on the left is defined by the Maybe functor, whereas the one on the right is the list one.

If you accept the premise that an appropriate relation can be defined for any functor, then you can derive a free theorem for all polymorphic functions of the type:

r :: forall a. f a -> g a

where f and g are functors. This type of function is known as a natural transformation between the two functors, and the free theorem:

fmap h . r == r . fmap h

is the naturality condition. That’s how naturality follows from parametricity.


I’d like to thank all the people I talked to about parametricity at the ICFP in Gothenburg, and Edward Kmett for reading and commenting on the draft of this blog.

Appendix 1: Other Examples

Here’s a list of other free theorems from Wadler’s paper. You might try proving them using parametricity.

r :: [a] -> a -- for instance, head
f . r == r . fmap f
r :: [a] -> [a] -> [a] -- for instance, (++)
fmap f (r as bs) == r (fmap f as) (fmap f bs)
r :: [[a]] -> [a] -- for instance, concat
fmap f . r == r . fmap (fmap f)
r :: (a, b) -> a -- for instance, fst
f . r == r . mapPair (f, g)
r :: (a, b) -> b -- for instance, snd
g . r == r . mapPair (f, g)
r :: ([a], [b]) -> [(a, b)] -- for instance, uncurry zip
fmap (mapPair (f, g)) . r == r . mapPair (fmap f, fmap g)
r :: (a -> Bool) -> [a] -> [a] -- for instance, filter
fmap f . r (p . f) = r p . fmap f
r :: (a -> a -> Ordering) -> [a] -> [a] -- for instance, sortBy
 -- assuming: f is monotone (preserves order)
fmap f . r cmp == r cmp' . fmap f
r :: (a -> b -> b) -> b -> [a] -> b -- for instance, foldl
-- assuming: g (acc x y) == acc (f x) (g y)
g . foldl acc zero == foldl acc (g zero) . fmap f
r :: a -> a -- id
f . r == r . f
r :: a -> b -> a -- for instance, the K combinator
f (r x y) == r (f x) (g y)


mapPair :: (a -> c, b -> d) -> (a, b) -> (c, d)
mapPair (f, g) (x, y) = (f x, g y)

Appendix 2: Identity Function

Let’s prove that there is only one polymorphic function of the type:

r :: forall a. a -> a

and it’s the identity function:

id x = x

We start by picking a particular relation. It’s a relation between the unit type () and an arbitrary (inhabited) type a. The relation consists of just one pair ((), c), where () is the unit value and c is an element of a. By parametricity, the function

r() :: () -> ()

must be related to the function

ra :: a -> a

There is only one function of the type ()->() and it’s id(). Related functions must map related argument to related values. We know that r() maps unit value () to unit value (). Therefore ra must map c to c. Since c is arbitrary, ra must be an identity for all (inhabited) as.


  1. John C Reynolds, Types, Abstraction and Parametric Polymorphism
  2. Philip Wadler, Theorems for Free!
  3. Claudio Hermida, Uday S. Reddy, Edmund P. Robinson, Logical Relations and Parametricity – A Reynolds Programme for Category Theory and Programming Languages
  4. Derek Dreyer, Paremetricity and Relational Reasoning, Oregon Programming Languages Summer School
  5. Janis Voigtländer, Free Theorems Involving Type Constructor Classes

This is my 100th WordPress post, so I decided to pull all the stops and go into some crazy stuff where hard math and hard physics mix freely with wild speculation. I hope you will enjoy reading it as much as I enjoyed writing it.

It’s a HoTT Summer of 2013

One of my current activities is reading the new book, Homotopy Type Theory (HoTT) that promises to revolutionize the foundations of mathematics in a way that’s close to the heart of a programmer. It talks about types in the familiar sense: Booleans, natural numbers, (polymorphic) function types, tuples, discriminated unions, etc.

As do previous type theories, HoTT assumes the Curry-Howard isomorphism that establishes the correspondence between logic and type theory. The gist of it is that any theorem can be translated into a definition of a type; and its proof is equivalent to producing a value of that type (false theorems correspond to uninhabited types that have no elements). Such proofs are by necessity constructive: you actually have to construct a value to prove a theorem. None if this “if it didn’t exist then it would lead to contradictions” negativity that is shunned by intuitionistic logicians. (HoTT doesn’t constrain itself to intuitionistic logic — too many important theorems of mathematics rely on non-constructive proofs of existence — but it clearly delineates its non-intuitionistic parts.)

Type theory has been around for some time, and several languages and theorem provers have been implemented on the base of the Curry-Howard isomorphism: Agda and Coq being common examples. So what’s new?

Set Theory Rant

Here’s the problem: Traditional type theory is based on set theory. A type is defined as a set of values. Bool is a two-element set {True, False}. Char is a set of all (Unicode) characters. String is an infinite set of all lists of characters. And so on. In fact all of traditional mathematics is based on set theory. It’s the “assembly language” of mathematics. And it’s not a very good assembly language.

First of all, the naive formulation of set theory suffers from paradoxes. One such paradox, called Russell’s paradox, is about sets that are members of themselves. A “normal” set is not a member of itself: a set of dogs is not a dog. But a set of all non-dogs is a member of itself — it’s “abnormal”. The question is: Is the set of all “normal” sets normal or abnormal? If it’s normal that it’s a member of normal sets, right? Oops! That would make it abnormal. So maybe it’s abnormal, that is not a member of normal sets. Oops! That would make it normal. That just shows you that our natural intuitions about sets can lead us astray.

Fortunately there is an axiomatic set theory called the Zermelo–Fraenkel (or ZF) theory, which avoids such paradoxes. There are actually two versions of this theory: with or without the Axiom of Choice. The version without it seems to be too weak (not every vector space has a basis, the product of compact sets isn’t necessarily compact, etc.); the one with it (called ZFC) leads to weird non-intuitive consequences.

What bothers many mathematicians is that proofs that are based on set theory are rarely formal enough. It’s not that they can’t be made formal, it’s just that formalizing them would be so tedious that nobody wants to do it. Also, when you base any theory on set theory, you can formulate lots of idiotic theorems that have nothing to do with the theory in question but are only relevant to its clunky set-theoretical plumbing. It’s like the assembly language leaking out from higher abstractions. Sort of like programming in C.

Donuts are Tastier than Sets

Tired of all this nonsense with set theory, a group of Princeton guys and their guests decided to forget about sets and start from scratch. Their choice for the foundation of mathematics was the theory of homotopy. Homotopy is about paths — continuous maps from real numbers between 0 and 1 to topological spaces; and continuous deformations of such paths. The properties of paths capture the essential topological properties of spaces. For instance, if there is no path between a and b, it means that the space is not connected — it has at least two disjoint components — a sits in one and b in another.

Two paths from a to b that cannot be continuously deformed into each other

If two paths between a and b cannot be deformed into each other, it means that there is a hole in space between them.

Obviously, this “traditional” formulation of homotopy relies heavily on set theory. A topological space, for instance, is defined in terms of open sets. So the first step is to distill the essence of homotopy theory by getting rid of sets. Enter Homotopy Type Theory. Paths and their deformations become primitives in the theory. We still get to use our intuitions about paths as curves inscribed on surfaces, but otherwise the math is totally abstract. There is a small set of axioms, the basic one asserting that the statement that a and b are equivalent is equivalent to the statement that they are equal. Of course the notions of equivalence and equality have special meanings and are very well defined in terms of primitives.

Cultural Digression

Why homotopy? I have my own theory about it. Our mathematics has roots in Ancient Greece, and the Greeks were not interested in developing technology because they had very cheap labor — slaves.

Euclid explaining geometry to his pupils in Raphael’s School of Athens

Instead, like all agricultural societies before them (Mesopotamia, Egypt), they were into owning land. Land owners are interested in geometry — Greek word γεωμετρία literally means measuring Earth. The “computers” of geometry were the slate, ruler and compass. Unlike technology, the science of measuring plots of land was generously subsidized by feudal societies. This is why the first rigorous mathematical theory was Euclid’s geometry, which happened to be based on axioms and logic. Euclid’s methodology culminated in the 20th century in Hilbert’s program of axiomatization of the whole of mathematics. This program crashed and burned when Gödel proved that any non-trivial theory (one containing arithmetic) is chock full of non-decidable theorems.

I was always wondering what mathematics would be like if it were invented by an industrial, rather than agricultural, society. The “computer” of an industrial society is the slide rule, which uses (the approximation of) real numbers and logarithms. What if Newton and Leibniz never studied Euclid? Would mathematics be based on calculus rather than geometry? Calculus is not easy to axiomatize, so we’d have to wait for the Euclid of calculus for a long time. The basic notions of calculus are Banach spaces, topology, and continuity. Topology and continuity happen to form the basis of homotopy theory as well. So if Greeks were an industrial society they could have treated homotopy as more basic than geometry. Geometry would then be discovered not by dividing plots of land but by studying solutions to analytic equations. Instead of defining a circle as a set of points equidistant from the center, as Euclid did, we would first define it as a solution to the equation x2+y2=r2.

Now imagine that this hypothetical industrial society also skipped the hunter-gather phase of development. That’s the period that gave birth to counting and natural numbers. I know it’s a stretch of imagination worthy a nerdy science fiction novel, but think of a society that would evolve from industrial robots if they were abandoned by humanity in a distant star system. Such a society could discover natural numbers by studying the topology of manifolds that are solutions to n-dimensional equations. The number of holes in a manifold is always a natural number. You can’t have half a hole!

Instead of counting apples (or metal bolts) they would consider the homotopy of the two-apple space: Not all points in that space can be connected by continuous paths.

There is no continuous path between a and b

Maybe in the world where homotopy were the basis of all mathematics, Andrew Wiles’s proof of the Fermat’s Last Theorem could fit in a margin of a book — as long as it were a book on cohomology and elliptic curves (some of the areas of mathematics Wiles used in his proof). Prime numbers would probably be discovered by studying the zeros of the Riemann zeta function.

Industrial robot explaining to its pupils the homotopy of a two-apple space.

Quantum Digression

If our industrial robots were very tiny and explored the world at the quantum level (nanorobots?), they might try counting particles instead of apples. But in quantum mechanics, a two-particle state is not a direct product of two one-particle states. Two particles share the same wave function. In some cases this function can be factorized when particles are far apart, in others it can’t, giving rise to quantum entanglement. In quantum world, 2 is not always equal to 1+1.

In Quantum Field Theory (QFT — the relativistic counterpart of Quantum Mechanics), physicist calculate the so called S matrix that describes idealized experiments in which particles are far away from each other in the initial and final states.  Since they don’t interact, they can be approximated by single-particle states. For instance, you can start with a proton and an antiproton coming at each other from opposite directions. They can be approximated as two separate particles. Then they smash into each other, produce a large multi-particle mess that escapes from the interaction region and is eventually seen as (approximately) separate particles by a big underground detector. (That’s, for instance, how the Higgs boson was discovered.) The number of particles in the final state may very well be different from the number of particles in the initial state. In general, QFT does not preserve the number of particles. There is no such conservation law.

Counting particles is very different from counting apples.

Relaxing Equality

In traditional mathematics, the notions of isomorphism and equality are very different. Isomorphism means (in Greek, literally) that things have the same shape, but aren’t necessarily equal. And yet mathematicians often treat isomorphic things as if they were equal. They prove a property of one thing and then assume that this property is also true for all things isomorphic. And it usually is, but nobody has the patience to prove it on the case-by-case basis. This phenomenon even has its own name: abuse of notation. It’s like writing programs in a language in which equality ‘==’ does not translate into the assembly-language CMP instruction followed be a conditional jump. We would like to work with structural identity, but all we do is compare pointers. You can overload operator ‘==’ in C++ but many a bug was the result of comparing pointers instead of values.

How can we make isomorphism more like equality? HoTT took quite an unusual approach by relaxing equality enough to make it plausibly equivalent to isomorphism.

HoTT’s homotopic version of equality is this: Two things are equal if there is a path between them. This equality is reflexive, symmetric, and transitive, just like equality is supposed to be. Reflexivity, for instance, tells us that x=x, and indeed there is always a trivial (constant) path from a point to itself. But there could also be other non-trivial paths looping from the point to itself. Some of them might not even be contractible. They all contribute to equality x=x.

There could be several paths between different points, a and b, making them “equal”: a=b. We are tempted to say that in this picture equality is a set of paths between points. Well, not exactly a set but the next best thing to a set — a type. So equality is a type, often called “identity type”, and two things are equal if the “identity type” for them is inhabited. That’s a very peculiar way to define equality. It’s an equality that carries with it a witness — a construction of an element of the equality type.

Relaxing Isomorphism

The first thing we could justifiably expect from any definition of equality is that if two things are equal they should be isomorphic. In other words, there should be an invertible function that maps one equal thing to another equal thing. This sound pretty obvious until you realize that, since equality is relaxed, it’s not! In fact we can’t prove strict isomorphism between things that are homotopically equal. But we do get a slightly relaxed version of isomorphism called equivalence. In HoTT, if things are equal they are equivalent. Phew, that’s a relief!

The trick is going the other way: Are equivalent things equal? In traditional mathematics that would be blatantly wrong — there are many isomorphic objects that are not equal. But with the HoTT’s notion of equality, there is nothing that would contradict it. In fact, the statement that equivalence is equivalent to equality can be added to HoTT as an axiom. It’s called Voevodski’s axiom of univalence.

It’s hard (or at least tedious), in traditional math, to prove that properties (propositions) can be carried over isomorphisms. With univalence, equivalence (generalized isomorphism) is the same as equality, and one can prove once and for all that propositions can be transported between equal objects. With univalence, the tedium of proving that if one object has a given property then all equivalent (“isomorphic”) object have the same property is eliminated.

Incidentally, where do types live? Is there (ahem!) a set of all types? There’s something better! A type of types called a Universe. Since a Universe is a type, is it a member of itself? You can almost see the Russel’s paradox looming in the background. But don’t despair, a Universe is not a member of itself, it’s a member of the higher Universe. In fact there are infinitely many Universes, each being a member of the next one.

Taking Roots

How does relaxed equality differ from the set-theoretical one? The simplest such example is the equality of Boolean types. There are two ways you can equal the Bool type to itself: One maps True to True and False to False, the other maps True to False and False to True. The first one is an identity mapping, but the second one is not — its square though is! (apply this mapping twice and you get back to original). Within HoTT you can take the square root of identity!

So here’s an interesting intuition for you: HoTT is to set theory as complex numbers are to real numbers (in complex numbers you can take a square root of -1). Paradoxically, complex numbers make a lot of things simpler. For instance, all quadratic equations are suddenly solvable. Sine and cosine become two projections of the same complex exponential. Riemann’s zeta function gains very interesting zeros on the imaginary line. The hope is that switching from sets to homotopy will lead to similar simplifications.

I like the example with flipping Booleans because it reminds me of an interesting quantum phenomenon. Imagine a quantum state with two identical particles. What happens when you switch the particles? If you get exactly the same state, the particles are called bosons (think photons). If you don’t, they are called fermions (think electrons). But when you flip fermions twice, you get back to the original state. In many ways fermions behave like square roots of bosons. For instance their equation of motion (Dirac equation) when squared produces the bosonic equation of motion (Klein-Gordon equation).

Computers Hate Sets

There is another way HoTT is better than set theory. (And, in my cultural analogy, that becomes more pertinent when an industrial society transitions to a computer society.) There is no good way to represent sets on a computer. Data structures that model sets are all fake. They always put some restrictions on the type of elements they can store. For instance the elements must be comparable, or hashable, or something. Even the simplest set of just two elements is implemented as an ordered pair — in sets you can’t have the first or the second element of a set (and in fact the definition of a pair as a set is quite tricky). You can easily write a program in Haskell that would take a (potenitally infinite) list of pairs and pick one element from each pair to form a (potentially infinite) list of picks. You can, for instance, tell the computer to pick the left element from each pair. Replace lists of pairs with sets of sets and you can’t do it! There is no constructive way of creating such a set and it’s very existence hinges on the axiom of choice.

This fact alone convinces me that set theory is not the best foundation for the theory of computing. But is homotopy a better assembly language for computing? We can’t represent sets using digital computers, but can we represent homotopy? Or should we start building computers from donuts and rubber strings? Maybe if we keep miniaturizing our computers down to the Planck scale, we could find a way to do calculations using loop quantum gravity, if it ever pans out.

Aharonov-Bohm Experiment

Even without invoking quantum gravity, quantum mechanics exhibits a lot of interesting non-local behaviors that often probe the topological properties of the surroundings. For instance, in the classic double-slit experiment, the fact that there are paths between the source of electrons and the screen that are not homotopically equivalent makes the electrons produce an interference pattern. But my favorite example is the Bohm-Aharonov experiment.

First, let me explain what a magnetic potential is. One of the Maxwell’s equations states that the divergence of the magnetic field is always zero (see a Tidbit at the end of this post that explains this notation):

This is the reason why magnetic field lines are always continuous. Interestingly, this equation has a solution that follows from the observation that the divergence of a curl is zero. So we can represent the magnetic field as a curl of some other vector field, which is called the magnetic potential A:

It’s just a clever mathematical trick. There is no way to measure magnetic potential, and the solution isn’t even unique: you can add a gradient of any scalar field to it and you’ll get the same magnetic field (the curl of a gradient is zero). So A is totally fake, it exists only as a device to simplify some calculations. Or is it…?

It turns out that electrons can sniff the magnetic potential, but only if there’s a hole in space. It turns out that, if you have a thin (almost) infinite linear coil with a current running through its windings, (almost) all magnetic field will be confined to its interior. Outside the coil there’s no magnetic field. However, there is a nonzero curl-free magnetic potential circling it. Now imagine using this coil as a separator between the two slits of the double-slit experiment. As before, there are two paths for the electron to follow: to the left of the coil and to the right of the coil. But now, along one path, the electron will be traveling with the lines of magnetic potential; along the other, against.

Aharononv-Bohm experiment. There are two paths available to the electron.

Magnetic potential doesn’t contribute to the electron’s energy or momentum but it does change its phase. So in the presence of the coil, the interference pattern in the two slit experiment shifts. That shift has been experimentally confirmed. The Aharonov-Bohm effect takes place because the electron is excluded from the part of space that is taken up by the coil — think of it as an infinite vertical line in space. The space available to the electron contains paths that cannot be continuously deformed into each other (they would have to cross the coil). In HoTT that would mean that although the point a, which is the source of the electron, and point b, where the electron hit the screen, are “equal,” there are two different members of the equality type.

The Incredible Quantum Homotopy Computer

The Aharonov-Bohm effect can be turned on and off by switching the current in the coil (actually, nobody uses coils in this experiment, but there is some promising research that uses nano-rings instead). If you can imagine a transistor built on the Aharonov-Bohm principle, you can easily imagine a computer. But can we go beyond digital computers and really explore varying homotopies?

I’ll be the first to admit that it might be too early to go to Kickstarter and solicit funds for a computer based on the Aharonov-Bohm effect that would be able to prove theorems formulated using Homotopy Type Theory; but the idea of breaking away from digital computing is worth a thought or two.

Or we can leave it to the post apocalyptic industrial-robot civilization that doesn’t even know what a digit is.


I’m grateful to the friendly (and patient) folks on the HoTT IRC channel for answering my questions and providing valuable insights.

Tidbit about Differential Operators

What are all those curls, divergences, and gradients? It’s just some vectors in 3-D space.

A scalar field φ(x, y, z) is a single function of space coordinates x, y, and z. You can calculate three different derivatives of this function with respect to x, y, and z. You can symbolically combine these three derivatives into one vector, (∂x, ∂y, ∂z). There is a symbol for that vector, called a nabla: . If you apply a nabla to a scalar field, you get a vector field that is called the gradient, φ, of that field. In coordinates, it is: (∂xφ, ∂yφ, ∂zφ).

A vector field V(x, y, z) is a triple of functions forming a vector at each point of space, (Vx, Vy, Vz). Magnetic field B and magnetic potential A are such vector fields. There are two ways you can apply a nabla to a vector field. One is just a scalar product of the nabla and the vector field, ·V, and it’s called the divergence of the vector field. In components, you can rewrite it as ∂xVx + ∂yVy + ∂zVz.

The other way of multiplying two vectors is called the vector product and its result is a vector. The vector product of the nabla and a vector field, ×A, is called the curl of that field. In components it is: (∂yAz – ∂zAy, ∂zAx – ∂xAz, ∂xAy – ∂yAx).

The vector product of two vectors is perpendicular to both. So when you then take a scalar product of the vector product with any of the original vectors, you get zero. This works also with nablas so, for instance, ·(×A) = 0 — the divergence of a curl is zero. That’s why the solution to ·B is B = ×A.

Similarly, because the vector product of a vector with itself is zero, we get ×φ = 0 — the curl of a gradient is zero. That’s why we can always add a term of the form φ to A and get the same field B. In physics, this freedom is called gauge invariance.

You don’t need to know anything about category theory to use Haskell as a programming language. But if you want to understand the theory behind Haskell or contribute to its development, some familiarity with category theory is a prerequisite.

Category theory is very easy at the beginning. I was able to explain what a category is to my 10-year old son. But the learning curve gets steeper as you go. Functors are easy. Natural transformations may take some getting used to, but after chasing a few diagrams, you’ll get the hang of it. The Yoneda lemma is usually the first serious challenge, because to understand it, you have to be able to juggle several things in your mind at once. But once you’ve got it, it’s very satisfying. Things just fall into place and you gain a lot of intuition about categories, functors, and natural transformations.

A Teaser Problem

You are given a polymorphic function imager that, for any function from Bool to any type r, returns a list of r. Try running this code in the School of Haskell, with colorMap, heatMap, and soundMap. You may also define your own function of Bool and pass it to imager.

{-# LANGUAGE ExplicitForAll #-}
imager :: forall r . ((Bool -> r) -> [r])
imager = ???

data Color = Red | Green | Blue        deriving Show
data Note  = C | D | E | F | G | A | B deriving Show

colorMap x = if x then Blue else Red
heatMap  x = if x then 32   else 212
soundMap x = if x then C    else G

main = print $ imager colorMap

Can you guess the implementation of imager? How many possible imagers with the same signature are there? By the end of this article you should be able to validate your answers using the Yoneda’s lemma.


A category is a bunch of objects with arrows between them (incidentally, a “bunch” doesn’t mean a set but a more generall collection). We don’t know anything about the objects — all we know is the arrows, a.k.a morphisms.

Our usual intuition is that arrows are sort of like functions. Functions are mappings between sets. Indeed, morphisms have some function-like properties, for instance composability, which is associative:

Fig 1. Associativity of morphisms demonstrated on Haskell functions. (In my pictures, piggies will represent objects; sacks of potatoes, sets; and fireworks, morphisms.)

h :: a -> b
g :: b -> c
f :: c -> d

f . (g . h) == (f . g) . h

There is also an identity morphism for every object in a category, just like the id function:

Fig 2. The identity morphism.

id :: a -> a

id . f == f . id == f

In all Haskell examples I’ll be using the category Hask of Haskell types, with morphisms being plain old functions. An object in Hask is a type, like Int, [Bool], or [a]->Int. Types are nothing more than just sets of values. Bool is a two element set {True, False}, Integer is the set of all integers, and so on.

In general, a category of all sets and functions is called Set .

So how good is this sets-and-functions intuition for an arbitrary category? Are all categories really like collections of sets, and morphisms are like functions from set to set? What does the word like even mean in this context?


In category theory, when we say one category is “like” another category, we usually mean that there is a mapping between the two. For this mapping to be meaningful, it should preserve the structure of the category. So not only every object from one category has to be mapped into an object from another category, but also all morphisms must be mapped correctly — meaning they should preserve composition. Such a mapping has a name: it’s called a functor.

Functors in Hask are described by the type class Functor

class Functor f where
fmap :: (a -> b) -> (f a -> f b)

A Haskell Functor maps types into types and functions into functions — a type constructor does the former and fmap does the latter.

A type contructor is a mapping from one type to another. For instance, a list type constructor takes any type a and creates a list type, [a].

So instead of asking if every category is “like” the Set category, we can ask a more precise question: For what types of categories (if not all of them) there exist functors that map them into Set . Such categories are called representable, meaning they have a representation in Set .

As a physicist I had to deal a lot with groups, such as groups of spacetime rotations in various dimensions or unitary groups in complex spaces. It was very handy to represent these abstract groups as matrices acting on vectors. For instance, different representations of the same Lorenz group (more precisely, SL(2, C)) would correspond to physical particles with different spins. So vector spaces and matrices are to abstract groups as sets and functions are to abstract categories.

Yoneda Embedding

One of the things Yoneda showed is that there is at least one canonical functor from any so called locally small category into the category of sets and functions. The construction of this functor is surpisingly easy, so let me sketch it.

This functor should map every object in category C into a set. Set of what? It doesn’t really matter, a set is a set. So how about using a set of morphisms?

Fig 3. The Yoneda embedding. Object X is mapped by the functor into the set HA(X). The elements of the set correspond to morphisms from A to X.

How can we map any object into a set of morphisms? Easy. First, let’s arbitrarily fix one object in the category C, call it A. It doesn’t matter which object we pick, we’ll just have to hold on to it. Now, for every object X in C there is a set of morphisms (arrows) going from our fixed A to this X. We designate this set to be the image of X under the functor we are constructing. Let’s call this functor HA. There is one element in the set HA(X) for every morphism from A to X.

A functor must define a mapping of objects to objects (to sets, in our case) and morphisms to morphisms (to functions in our case). We have established the first part of the mapping. To define the second part, let’s pick an arbitrary morphism f from X to Y. We have to map it to some function from the set HA(X) to the set HA(Y).

Fig 4. The Yoneda functor also maps morphisms. Here, morphism f is mapped into the function HA(f) between sets HA(X) and HA(Y).

Let’s define this function, we’ll call it HA(f), through its action on any element of the set HA(X), call it x. By our construction, x corresponds to some particular morphism, u, from A to X. We now have at our disposal two morphisms, u :: A -> X and f :: X -> Y (that’s the morphism we are mapping). We can compose them. The result f . u is a morphism from A to Y, so it’s a member of the set HA(Y). We have just defined a function that takes an x from HA(X) and maps it into y from HA(Y), and this will be our HA(f).

Of course, you have to prove that this construction of HA is indeed a functor preserving composition of morphisms, but that’s reasonably easy, once the technique we have just used becomes familiar to you. Here’s the gist of this technique: Use components! When you are defining a functor from category C to category D, pick a component — an object X in C — and define its image, F(X). Then pick a morphism f in C, say from X to Y, and define its image, F(f), as a particular morphism from F(X) to F(Y).

Similarly, when defining a function from set S to T, use its components. Pick an element x of S and define its image in T. That’s exactly what we did in our construction.

Incidentally, what was that requirement that the category C be locally small? A category is locally small if the collection of morphisms between any two objects forms a set. This may come as a surprise but there are things in mathematics that are too big to be sets. A classic example is a collection of all sets, which cannot be a set itself, because it would lead to a paradox. A collection of all sets, however, is the basis of the Set category (which, incidentally, turns out to be locally small).

Summary So Far

Let me summarize what we’ve learned so far. A category is just a bunch of abstract objects and arrows between them. It tells us nothing about the internal structure of objects. However, for every (locally small) category there is a structure-preserving mapping (a functor) that maps it into a category of sets. Objects in the Set category do have internal structure: they have elements; and morphisms are functions acting on those elements. A representation maps the categorie’s surface structure of morphisms into the internal structure of sets.

It is like figuring out the properties of orbitals in atoms by looking at the chemical compounds they form, and at the way valencies work. Or discovering that baryons are composed of quarks by looking at their decay products. Incidentally, no one has ever “seen” a free quark, they always live inside other particles. It’s as if physicists were studying the “category” of baryons by mapping them into sets of quarks.

A Bar Example

This is all nice but we need an example. Let’s start with “A mathematician walks into a bar and orders a category.” The barman says, “We have this new tasty category but we can’t figure out what’s in it. All we know is that it has just one object A” — (“Oh, it’s a monoid,” the mathematician murmurs knowingly) — “…plus a whole bunch of morphisms. Of course all these morphisms go from A to A, because there’s nowhere else to go.”

What the barman doesn’t know is that the new category is just a re-packaging of the good old set of ASCII strings. The morphisms correspond to appending strings. So there is a morphism called foo that apends the string "foo"

foo :: String -> String
foo = (++"foo")

main = putStrLn $ foo "Hello "

and so on.

“I can tell you what’s inside A,” says the mathematician, “but only up to an isomorphism. I’m a mathematician not a magician.”

She quickly constructs a set that contains one element for each morphism — morphisms must, by law, be listed by the manufacturer on the label. So, when she sees foo, she puts an element with the label “foo”, and so on. Incidentally, there is one morphism with no name, which the mathematician maps to an empty label. (In reality this is an identity morphism that appends an empty string.)

“That’s what’s inside the object A,” she says.

“Moreover, this set comes equipped with functions that rearrange its elements. In fact there is a function for every morphism listed in the category,” she says. “Name any morphism and I’ll construct you a function.”

The barman gives her morphism p, which in reality is:

p = (++"p")

“Okay,” she says, “here’s how I construct the corresponding function. Pick any element in my set.”

The barman picks “foo”.

“Okay, ‘foo’ corresponds to the morphism foo,” she says, “so tell me what you call the morphism that’s the composition of foo and p?” (By law, the manufacturer is obliged to specify all admissible compositions of morphisms on the label.)

“It’s called foop,” says the barman.

Quick check:

p . foo == (++"p") . (++"foo") == (++"foop")
foop = (++"foop")

“Okay,” she says, “the function corresponding to p maps “foo” into “foop”. Hm, how curious! I bet this function will map the no-label elment into “p”, is that right?”

“Indeed, it does,” says the barman.

Quick check:

p . id == p

“I bet you this is just a string monoid,” says the mathematician.

“I think I’ll have my usual Top on the rocks instead.”

Natural Transformations

We’ve seen how to construct a representation of any (locally small) category in Set by selecting an arbitrary object A in the category and studying morphisms originating at A. What if we choose a different object B instead? How different is the representation HA from HB? For that matter, what if we pick any functor F from C to Set ? How is it related to HA? That’s what the Yoneda lemma is all about.

Let me start with a short recap.

A functor is a mapping between categories that preserves their structure. The structure of a category is defined by the way its morphisms compose. A functor F maps objects into objects and morphism into morphisms in such a way that if f . g = h then F(f) . F(g) = F(h).

A natural transformation is a mapping between functors that preserves the structure of the underlying categories.

Fig 5. A component of a transformation Φ at X. Φ maps functor F into functor G, ΦX is a morphism that maps object F(X) into object G(X).

First we have to understand how to define mappings between functors. Suppose we have functors F and G, both going from category C to category D. For a given object X in C, F will map it into F(X) in D, and G will map it into G(X) in D. A mapping Φ between functors must map object F(X) to object G(X), both in category D. We know that a mapping of objects is called a morphism. So for every object X we have to provide a morphism ΦX from F(X) to G(X). This morphism is called a component of Φ at X. Or, looking at it from a different angle, Φ is a family of morphisms parameterized by X.

An Example of Natural Transformation

Just to give you some Haskell intuition, consider functors on Hask . These are mapping of types (type constructors) such as a -> [a] or a -> Maybe a, with the corresponging mappings of morphisms (functions) defined by fmap. Recall:

class Functor f where
fmap :: (a -> b) -> (f a -> f b)

The mapping between Haskell functors is a family of functions parameterized by types. For instance, a mapping between the [] functor and the Maybe functor will map a list of a, [a] into Maybe a. Here’s an example of such a family of functions called safeHead:

safeHead :: [a] -> Maybe a
safeHead []     = Nothing
safeHead (x:xs) = Just x

A family of functions parameterized by type is nothing special: it’s called a polymorphic function. It can also be described as a function on both types and values, with a more verbose signature:

{-# LANGUAGE ExplicitForAll #-}

safeHead :: forall a . [a] -> Maybe a
safeHead []     = Nothing
safeHead (x:xs) = Just x

main = print $ safeHead ["One", "Two"]

Let’s go back to natural transformations. I have described what it means to define a transformation of functors in terms of objects, but functors also map morphism. It turns out, however, that the tranformation of morphisms is completely determined by the two functors. A morphism f from X to Y is transformed under F into F(f) and under G into G(f). G(f) must therefore be the image of F(f) under Φ. No choice here! Except that now we have two ways of going from F(X) to G(Y).

Fig 6. The naturality square. Φ is a natural transformation if this diagram commutes, that is, both paths are equivalent.

We can first use the morphism F(f) to take us to F(Y) and then use ΦY to get to G(Y). Or we can first take ΦX to take us to G(X), and then G(f) to get to G(Y). We call Φ a natural transformation if these two paths result in the same morphism (the diagram commutes).

The best insight I can offer is that a natural transformation works on structure, while a general morphism works on contents. The naturality condition ensures that it doesn’t matter if you first rearrange the structure and then the content, or the other way around. Or, in other words, that a natural transformation doesn’t touch the content. This will become clearer in examples.

Going back to Haskell: Is safeHead a natural transformation between two functors [] and Maybe? Let’s start with a function f from some type a to b. There are two ways of mapping this function: one using the fmap defined by [], which is the list function map; and the other using the Maybe‘s fmap, which is defined in the Maybe‘s functor instance definition:

instance Functor Maybe where
   fmap f (Just x) = Just (f x)
   fmap _ Nothing  = Nothing

The two path from [a] to Maybe b are:

  1. Apply fmap f to [a] to get [b] and then safeHead it, or
  2. Apply safeHead to [a] and then use the Maybe version of fmap.

There are only two cases to consider: an empty list and a non-empty list. For an emtpy list we get Nothing both ways, otherwise we get Just f acting on the first element of the list.

We have thus shown that safeHead is a natural transformation. There are more interestig examples of natural transformations in Haskell; monadic return and join come to mind.

The intuition behind natural transformations is that they deal with structure, not contents. safeHead couldn’t care less about what’s stored in a list, but it understands the structure of the list: things like the list being empty, or having a first element. The type of this element doesn’t matter. In Haskell, natural transformations are polymorphic functions that can, like safeHead be typed using forall:

safeHead :: forall a . [a] -> Maybe a

Yoneda Lemma

Going back to the Yoneda lemma, it states that for any functor from C to Set there is a natural transformation from our canonical representation HA to this functor. Moreover, there are exactly as many such natural transformations as there are elements in F(A).

That, by the way, answers our other question about the dependence on the choice of A in the Yoneda embedding. The Yoneda lemma tells us that there are natural transformations both ways between HA and HB.

Amazingly, the proof of the Yoneda lemma, at least in one direction, is quite simple. The trick is to first define the natural transformation Φ on one special element of HA(A): the element that corresponds to the identity morphism on A (remember, there is always one of these for every object). Let’s call this element p. Its image under ΦA will be in F(A), which is a set. You can pick any element of this set and it will define a different but equally good Φ. Let’s call this element q. So we have fixed ΦA(p) = q.

Now we have to define the action of Φ on an arbitrary element in the image of HA. Remember that the functor HA transforms objects in C into sets. So let’s take an arbitrary object X and its image HA(X). The elements in HA(X) correspond to morphisms from A to X. So let’s pick one such morphism and call it f. Its image is an element r in HA(X). The question is, what does r map into under Φ? Remember, it’s image must be an element of F(X).

Fig 7. The mappings in the Yoneda lemma. F is an arbitrary functor. Any choice of p determines the morphism ΦX for any X.

To figure that out, let’s consider the F route. F being a functor transforms our morphism f into F(f) — which is a morphism from F(A) to F(X). But, as you may remember, we have selected a special element in F(A) — our q. Now apply F(f) to q and you get an element in F(X), call it s. (Remember, F(f) is just a regular function between two sets, F(A) and F(X).)

There’s nothing more natural than picking ΦX(r) to be this s! We have thus defined a natural transformation Φ for any X and r.

The straightforward proof that this definition of Φ is indeed natural is left as an exercise to the user.

A Haskell Example

I’ve been very meticulous about distinguishing between morphisms from A to X in C and the corresponding set elements in HA(X). But in practice it’s more convenient to skip the middle man and define natural transformations in the Yoneda lemma as going directly from these morphisms to F(X). Keeping this in mind, the Haskell version of the Yoneda lemma is ofter written as follows:

forall r . ((a -> r) -> f r) ~ f a

where the (lowercase) f is the functor (think of it as a type constructor and its corresponding fmap), (a -> r) is a function corresponding to the morphism from A to X in our orginal formulation. The Yoneda’s natural transformation maps this morphism into the image of r under f — the F(X) in the original formulation. The forall r means that the function ((a -> r) -> f r) works for any type r, as is necessary to make it a natural transformation.

The lemma states that the type of this function, forall r . ((a -> r) -> f r) is equivalent to the much simpler type f a. If you remember that types are just sets of values, you can interpret this result as stating that there is one-to-one correspondence between natural transformations and values of the type f r.

Remember the example from the beginning of this article? There was a function imager with the following signature:

imager :: forall r . ((Bool -> r) -> [r])

This looks very much like a natural transformation from the Yoneda lemma with the type a fixed to Bool and the functor, the list functor []. (I’ll call the functions Bool->r iffies.)

The question was, how many different implementations of this signature are there?

The Yoneda lemma tells us exactly how to construct such natural transformations. It instructs us to start with an identity iffie: idBool :: Bool -> Bool, and pick any element of [Bool] to be its image under our natural transformation. We can, for instance, pick [True, False, True, True]. Once we’ve done that, the action of this natural transformation on any iffie h is fixed. We just map the morphism h using the functor (in Haskell we fmap the iffie), and apply it to our pick, [True, False, True, True].

Therefore, all natural transformations with the signature:

forall r . ((Bool -> r) -> [r])

are in one-to-one correspondence with different lists of Bool.

Conversely, if you want to find out what list of Bool is hidden in a given implementation of imager, just pass it an identity iffie. Try it:

{-# LANGUAGE ExplicitForAll #-}

imager :: forall r . ((Bool -> r) -> [r])
imager iffie = fmap iffie [True, False, True, True]

data Color = Red | Green | Blue        deriving Show
data Note  = C | D | E | F | G | A | B deriving Show

colorMap x = if x then Blue else Red
heatMap  x = if x then 32   else 212
soundMap x = if x then C    else G
idBool :: Bool -> Bool
idBool x = x

main = print $ imager idBool

Remember, this application of the Yoneda lemma is only valid if imager is a natural transformation — its naturality square must commute. The two functors in the imager naturality diagram are the Yoneda embedding and the list functor. Naturality of imager translates into the requirement that any function f :: a -> b modifying an iffie could be pulled out of the imager:

imager (f . iffie) == map f (imager iffie)

Here’s an example of such a function translating colors to strings commuting with the application of imager:

{-# LANGUAGE ExplicitForAll #-}

imager :: forall r . ((Bool -> r) -> [r])
imager iffie = fmap iffie [True, False, True, True]

data Color = Red | Green | Blue  deriving Show

colorMap x = if x then Blue else Red

f :: Color -> String
f = show 

main = do
    print $ imager (f . colorMap)
    print $ map f (imager colorMap)

The Structure of Natural Transformations

That brings another important intuition about the Yoneda lemma in Haskell. You start with a type signature that describes a natural transformation: a particular kind of polymorphic function that takes a probing function as an argument and returns a type that’s the result of a functor acting on the result type of the probing function. Yoneda tells us that the structure of this natural transformation is tightly constrained.

One of the strengths of Haskell is its very strict and powerful type system. Many Haskell programers start designing their programs by defining type signatures of major functions. The Yoneda lemma tells us that type signatures not only restrict how functions can be combined, but also how they can be implemented.

As an extreme, there is one particular signature that has only one implementation: a->a (or, more explicitly, forall a. a -> a). The only natural implementation of this signature is the identity function, id.

Just for fun, let me sketch the proof using the Yoneda lemma. If we pick the source type as the singleton unit type, (), then the Yoneda embedding consists of all functions taking unit as an argument. A function taking unit has only one return value so it’s really equivalent to this value. The functor we pick is the identity functor. So the question is, how many natural tranformation of the the following type are there?

forall a. ((() -> a) -> a)

Well, there are as many as there are elements in the image of () under the identity functor, which is exactly one! Since a function ()->a can be identified with a, it means we have only one natural transformation with the following signature:

forall a. (a -> a)

Moreover, by Yoneda construction, this function is defined by fmapping the function ()->a over the element () using the identity functor. So our natural transformation, when probed with a value of the type a will return the same value. But that’s just the definition of the identity function. (In reality things are slightly more complicated because every Haskell type must include undefined, but that’s a different story.)

Here’s an exercise for the reader: Show that the naturality square for this example is equivalent to id commuting with any function: f . id == id . f.


I hope I provided you with enough background information and intuition so that you’ll be able to easily read more advanced blog posts, like this one:
Reverse Engineering Machines with the Yoneda Lemma by Dan Piponi, or GADTs by Gabriel Gonzales.


I’d like to thank Gabriel Gonzales for providing useful comments and John Wiegley, Michael Sloan, and Eric Niebler for many interesting conversations.

“You must be kidding!” would be the expected reaction to “Monads in C++.” Hence my surprise when I was invited to Boostcon 11 to give a three-hour presentation on said topic, a presentation which was met with totally unexpected level of interest. I imagine there was a small uptick in the sales of Haskell books afterwards.

Before I answer the question: “Why are monads relevant in C++?”, let me first answer the question: “Why is Haskell relevant in C++?” It turns out that Haskell makes C++ template metaprogramming if not easy then at least approachable (see my blog, What Does Haskell Have to Do with C++?). You have to understand that compile-time C++ is a strict functional language operating (mostly) on types. If that is not immediately obvious, it’s because the C++ template syntax is so appalling.

Now, not everybody falls in love with the Haskell syntax (at least not at first sight) but it fits the functional paradigm much better than anything else. What’s more, the correspondence between Haskell code and C++ template code is so direct that it could almost be translated mechanically. It really pays to design and test your template metaprograms in Haskell first, before translating them to C++. Conversely, translating complex template code to Haskell may help you understand it better and enable you to maintain and extend it. Those were my reasons to advocate the use of Haskell as a pseudo-language for C++ template metaprogramming.

Armed with Haskell I could study and analyze some of the most sophisticated C++ metacode. But when I looked at Eric Niebler’s Boost Proto library and couldn’t make heads or tails of it, even after discussing it with him over email and over beers, I was stumped.

Monads and EDSLs

Boost Proto is a library for implementing domain-specific languages (DSLs). It’s an impressive piece of C++ metaprogramming but it’s hard to comprehend and it doesn’t yield to straightforward translation to Haskell. The problem is that it combines compile-time constructs with run time code in a non-trivial way. I struggled long with Proto until one day I had an epiphany– I was looking at a monad. But first:

What Is an EDSL?

Great things come out of abusing C++. One example is the abuse of templates (which were originally designed to support parametrized data types) to express compile-time computations. The result is Template Metaprogrammming (TMP). Another example is the abuse of operator overloading, which created a fertile ground for Embedded Domain-Specific Languages (EDSLs). Eric’s Proto combined the two abominations into a library for constructing EDSLs.

In this post I will construct a simple EDSL in C++. This is how it works:

  1. You overload the heck out of operators
  2. The overloaded operators build trees from expressions instead of eagerly computing results
  3. The type of the tree encodes the structure of the expression that created it. Since it’s a type, the structure of the expression is available at compile time
  4. You construct an object of that type
  5. You execute that object (because it is a function object) with appropriate arguments, and get the results you wanted.

A more general EDSL will also create a runtime data structure that stores the values of any variables and literals in the expression. Algorithms can walk the tree at compile time, runtime, or both to compute values and perform actions. You can even write a compile time algorithm to compute a runtime algorithm that munges trees in interesting ways.

My moment of Zen was when I realized that an EDSL corresponds to a Haskell reader monad. (If you’re not familiar with Haskell or monads, read my series, Monads for the Curios Programmer.) To my own amazement this analogy worked and led to executable code.

Haskell Monad C++ “Monad”
Composition of monadic functions Compile-time parsing of expressions
Execution of compound action on astate Application of function object to runtime values

To prove the concept, I picked a simple EDSL based on one of the Eric’s examples. It’s a two-argument lambda EDSL. It lets you write an expression with two placeholders, arg1 and arg2. Through the magic of templates and operator overloading, this expression is interpreted as an anonymous function of two arguments. Here’s an example which evaluates to 25:

int x = (arg1 * arg2 + 13)(3, 4)

It turns out that the most important step in this process is to be able to convert an expression tree into a function object. Let me do this in Haskell first, and then translate it into C++.

The Expression Monad in Haskell

Let’s start with a short:

Reader Monad Refresher

A reader monad (a state monad with immutable state) provides a way to express stateful computations in a pure functional language. A stateful computation takes some arguments and returns a value, but unlike a function, it makes use of some external state. This computation can be turned into a function that takes the same arguments but, instead of returning a value, returns another function that takes a state as an argument and calculates the value. (The distinction between a computation and a function is essential here.)

Statefull computation represented as a monadic function returning an action


The state, in the case of our expression monad, is a collection of two values — the arguments to the lambda. They will be used to replace the two placeholders in the expression.

Here’s the definition of our state — a list of integers (we really need only two, but I’m being sloppy… I mean general):

type Args = [Int]


An action is a function that takes the state and produces a value of arbitrary type t:

Args -> t

You might remember from my previous post that one element of the monad is a type constructor: a mapping of an arbitrary type into an enriched type. In this case the enriched type is an action type. Because of technicalities related to Haskell’s type inference, I’ll have to thinly encapsulate an action inside a new type, which I will call Prog:

newtype Prog t = PR (Args -> t)

The idea is that the composition of monadic actions is similar to the compilation of source code: it produces a “program,” which is then “run.” In fact, I’ll define an auxiliary function that does just that, it runs a “program”:

run :: Prog t -> Args -> t
run (PR act) args = act args

This is a function that takes a Prog t as its first argument and pattern matches it to the constructor (PR act). The second argument is the state, args of type Args. The result of running the program is a value of type t.

Monadic Functions

The next step is to define some primitive monadic functions. These are functions that produce actions (or “programs”). Here’s a very useful function that produces an action that extracts the n’th value from the list of arguments (the state):

getArg :: Int -> Prog Int
getArg n = PR (λ args -> args !! n)

getArg takes an Int (in our case, zero or one) and returns a lambda (encapsulated in a Prog using the constructor PR). This lambda takes a list of Ints, args, and extracts its n’th element (args !! n means: take the n’th element of the list, args).

You’ve seen examples of monadic functions in my previous blog post, but it’s worth repeating the idea: getArg is a function that returns an action that is like a promise to extract the n’th argument when the arguments become available.

Just for fun, here’s another monadic function that takes n and promises to return twice n when the arguments are provided. It doesn’t matter that the arguments are going to be ignored.

doubleIt n = PR (λ args -> 2 * n)

Monadic Bind

I need two monadic functions, getArg and doubleIt, in order to demonstrate their composition using “bind.” I want to create an action that will get the first argument using getArg 0 and double it using doubleIt v. If you remember how “bind” works, it takes an action, in this case the result of getArg 0, and a continuation, which represents “the rest of the computation.”

Combining two monadic functions using bind

In our case, the continuation is a lambda that takes an argument v (that’s the future result of the first action) and performs doubleIt v.

λ v -> doubleIt v

This lambda returns an action because doubleIt v returns an action.

The signature of bind is:

bind :: (Prog a) -> (a -> (Prog b)) -> (Prog b)

You might have noticed that I use the words “action” and “program” interchangeably, although, strictly speaking, an action is the contents of a program. However, this distinction is an artifact or a Haskell quirk — a monad can’t be defined using a type alias, so we need the Prog type to encapsulate the action. Curiously, we won’t have this problem in C++.

The purpose of bind is to glue together an action with a continuation and return a new action. That means that bind has to return a lambda of appropriate type:

bind (PR act) cont =
    PR (λ args -> ... produce value of type b ...)

This lambda will be executed when arguments are available. At that point, with the arguments handy, we’ll be able to first execute the action, act, and pass its result to the continuation. The continuation will return another action, which we can promptly execute too:

bind (PR act) cont =
    PR (λ args ->
        let v = act args
            (PR act') = cont v
            act' args)

(In Haskell you can use primed variables, like act'. I like that notation.)

This is very much like the state monad bind, except that we don’t have to worry about chaining the state, which is immutable. In fact the above is the definition of the reader monad’s bind.

Let’s test our new bind by composing getArg with doubleIt:

test0 :: Prog Int
test0 =
    bind (getArg 0) (λ v -> doubleIt v)

Composing monadic function, GetArg, with the continuation that calls another monadic function, doubleIt

We can run the program produced by test0 to see that it actually works:

> let prog = test0
> run prog [3,4]
> run prog [11,0]

For completeness, here’s the full definition of the Prog reader monad:

instance Monad Prog where
     return v = PR (λ args -> v)
     (PR act) >>= cont = PR (λ arg -> let v = act arg
                                          (PR act') = cont v
                                      in act' arg)

With that definition, we can rewrite our little example in the do notation:

test1 = do
    v <- getArg 0
    doubleIt v

Expression Tree

We have the definition of a program — a thinly encapsulated action. To create an actual program we need some kind of source code and a compiler. Our source code takes the form of an expression tree. A tree in Haskell is defined as a tagged union:

data Exp = Const Int
           | Plus Exp Exp
           | Times Exp Exp
           | Arg1 
           | Arg2

The first constructor, Const takes an Int and creates a leaf node corresponding to this constant value. The next two constructors are recursive, they each take two expressions and produce, correspondingly, a Plus and a Times node. Finally, there are two placeholders for the two arguments that will be provided at “runtime.”


Compilation is turning source code into a program. This process can be represented as a function with the following signature:

compile :: Exp -> Prog Int

It takes an expression tree (the source) and returns a program that evaluates to an integer. Looking from another perspective, compile is just a monadic function in our Prog monad.

We will define compile in little steps driven by the various patterns in the definition of Exp. If the expression matches the Const node, we define compile as:

compile (Const c) = return c

Remember, return is a function that takes a value and returns an action (program) that will produce this value when executed.

Another easy case is Arg1 (and Arg2)–we already have a monadic function getArg that we can call:

compile Arg1 = getArg 0

The interesting case is the Plus (or Times) node. Here we have to recursively call compile for both children and then compose the results. Well, that’s what monadic function composition is for. Here’s the code using the do notation, hiding the calls to bind:

compile (Plus e1 e2) = 
        v1 <- compile e1
        v2 <- compile e2
        return (v1 + v2)

That’s it! We can now compile a little test expression (Arg1 * Arg2 + 13):

testExp =
    let exp = (Plus (Times Arg1 Arg2)(Const 13))
    in compile exp

and we can run the resulting program with an argument list:

> let args = [3, 4]
> let prog = testExp
> run prog args

The Expression Monad in C++

The translation of the Haskell expression monad to C++ is almost embarrassingly easy. I’ll be showing Haskell code side by side with its C++ equivalents.

Expression Tree

data Exp = Const Int
           | Plus Exp Exp
           | Times Exp Exp
           | Arg1 
           | Arg2

In C++, the expression tree is a compile-time construct that is expressed as a type (I’ll show you later where this type originates from). We have separate types for all the leaves, and the non-leaf nodes are parametrized by the types of their children.

template<int n> struct Const {};

template<class E1, class E2>
struct Plus {};

template<class E1, class E2>
struct Times {};

struct Arg1 {};
struct Arg2 {};

For instance, here’s the type that corresponds to the expression arg1*arg2+13:

Plus<Times<Arg1, Arg2>, Const<13>>



type Args = [Int]

C++: State is a runtime object. I implemented it using an array of two integers and, for good measure, I added a constructor and an accessor.

struct Args
    Args(int i, int j) {
        _a[0] = i;
        _a[1] = j;
    int operator[](int n) { return _a[n]; }
    int _a[2];


Here’s the tricky part: How to represent an action? Remember, an action takes state, which is now represented by Args, and returns a value of some type. Because Args are only available at runtime, an action must be a runtime function or, even better, a function object.

How does that fit in our compile-time/runtime picture? We want our C++ monadic functions to be “executed” at compile time, but they should produce actions that are executed at runtime. All we can do at compile time is to operate on types, and this is exactly what we’ll do. We will create a new type that is a function object. A function object is a struct that implements an overloading of a function-call operator.

There’s another way of looking at it by extending the notion of a metafunction. In one of my previous posts I described metafunctions that “return” values, types, or even other metafunctions. Here we have a metafunction that returns a (runtime) function. This view fits better the Haskell monadic picture where a monadic function returns an action.

Type Constructor

Unfortunately, not everything can be expressed as neatly in C++ as it is in Haskell. In particular our type constructor:

newtype Prog t = PR (Args -> t)

doesn’t have a direct C++ counterpart. It should be a template that, for any type T, defines an action returning that type. In this construction, an action is represented by a C++ function object, so we would like something like this:

template<class T> struct PR {
    T operator()(Args args);

which, for many reasons, is useless. What we really need is for PR to be a “concept” that specifies a type with an associated method, operator(), of a particular signature. Since concepts are not part of C++11, we’ll have to rely on programming discipline and hope that, if we make a mistake, the compiler error messages will not be totally horrible.

Monadic Metafunction

Let’s start by translating the Haskell monadic function getArg:

getArg :: Int -> Prog Int
getArg n = PR (λ args -> args !! n)

Here it is in C++:

template<int n>
struct GetArg { // instance of the concept PR
    int operator()(Args args) {
        return args[n];

It is a metafunction that takes a compile-time argument n and “returns” an action. Translation: it defines a struct with an overloaded operator() that takes Args and returns an int. Again, ideally this metafunction should be a struct that is constrained by the concept PR.


Let’s look again at the Haskell’s implementation of monadic bind:

bind (PR prog) cont =
    PR (λ args ->
        let v = prog args
            (PR prog') = cont v
            prog' args)

It takes two arguments: an action and a continuation. We know what an action is in our C++ construction: it’s a function object with a particular signature. We’ll parameterize our C++ Bind with the type, P1, of that object. The continuation is supposed to take whatever that action returns and return a new action. The type of that new action, P2, will be the second template parameter of Bind.

We’ll encode the type of the continuation as the standard function object taking an int and returning P2:


Now, the C++ Bind is a metafunction of P1 and P2 that “returns” an action. The act of “returning” an action translates into defining a struct with the appropriate overload of operator(). Here’s the skeleton of this struct:

template<class P1, class P2> // compile-time type parameters
struct Bind {
    Bind(P1 prog, std::function<P2(int)> cont)
        : _prog(prog), _cont(cont)
    P1 _prog;
    std::function<P2(int)> _cont;

Notice that at runtime we will want to construct an object of this type, Bind, and pass it the runtime arguments: the action and the continuation. The role of the constructor requires some explanation. Haskell function bind is a monadic function of two arguments. Its C++ counterpart is a metafunction that takes four arguments: P1 and P2 at compile time, and prog and cont at runtime. This is a general pattern: When constructing a monadic metafunction in C++ we try to push as much as possible into compile time, but some of the arguments might not be available until runtime. In that case we shove them into the constructor.

The interesting part is, of course, the function-call operator, which really looks like a one-to-one translation of the Haskell implementation:

template<class P1, class P2>
struct Bind {
    int operator()(Args args) {
        int v = _prog(args);
        P2 prog2 = _cont(v);
        return prog2(args);

Things to observe: A smart compiler should be able to inline all these calls because it knows the types P1 and P2, so it can look up the implementations of their function-call operators. What’s left to the runtime is just the operations on the actual runtime values, like the ones done inside GetArg::operator(). However, I have been informed by Eric Niebler that many of these optimizations are thwarted by the choice of std::function for the representation of the continuation. This problem can be overcome, but at some loss of clarity of code, so I’ll stick to my original implementation (also, see Appendix 2).


All that is left to complete the monad is the return function. Here it is in Haskell:

return :: a -> Prog a
return v = PR (λ args -> v)

And here it is, as a function object, in C++:

struct Return
    Return(int v) : _v(v) {}
    int operator()(Args args) 
        return _v;
    int _v;

Of course, in full generality, Return should be parameterized by the return type of its operator(), but for this example a simple int will do. The argument v is only available at runtime, so it is passed to the constructor of Return.

The “Compile” Metafunction in C++

Now that we have our monad implemented, we can use it to build complex monadic functions from simple ones (such as GetArg). In Haskell we have implemented a monadic function compile with the following signature:

compile :: Exp -> Prog Int

It takes an expression tree and returns an action that evaluates this tree.

The C++ equivalent is a metafunction that takes a (compile-time) expression tree and defines a struct with the appropriate overload of a function-call operator. Lacking concept support, the latter requirement can’t be directly expressed in C++. We can, and in fact have to, provide a forward declaration of Compile:

template<class Exp>
struct Compile;

In Haskell, we defined compile using pattern matching. We can do the same in C++. Just like we split the Haskel definition of compile into multiple sub-definitions corresponding to different argument patterns, we’ll split the C++ definition into multiple specializations of the general template, Compile.

Here’s the first specialization in Haskell:

compile (Const c) = return c

and in C++:

template<int c>
struct Compile<Const<c>> : Return
    Compile() : Return(c) {}

I could have defined a separate overload of operator() for this case, but it’s simpler to reuse the one in Return.

Here’s another trivial case:

compile Arg1 = getArg 0

translates into:

struct Compile<Arg1> : GetArg<0> {};

The real fun begins with the Plus node, because it involves composition of monadic functions. Here’s the de-sugared Haskell version:

compile (Plus exL exR) = 
  bind compile exL
       λ left -> 
          bind compile exR
               λ right ->
                   return (left + right)

The logic is simple: First we compile the left child and bind the result with the continuation (the lambda) that does the rest. Inside this continuation, we compile the right child and bind the result with the continuation that does the rest. Inside that continuation we add the two results (they are the arguments to the continuation) and encapsulate the sum in an action using return.

In C++, the binding is done by creating the appropriate Bind object and passing it (in the constructor) a function object and a continuation. The function object is the result of the compilation of the left child (we construct a temporary object of this type on the fly):


Just like in Haskell, the continuation is a lambda, only now it’s the C++11 lambda. Here’s the code, still with some holes to be filled later:

template<class L, class R>
struct Compile<Plus<L, R>> {
  int operator()(Args args) 
    return Bind<...> (
      [](int left) {
        return Bind<...>(
          [left](int right) {
             return Return(left + right);

Notice that the second lambda must explicitly capture the local variable, left. In Haskell, this capture was implicit.

The types for the instantiations of the two Bind templates can be easily derived bottom up. Ideally, we would like the compiler to infer them, just like in Haskell, but the C++ compiler is not powerful enough (although, at the cost of muddying the code some more, one can define template functions that return Bind objects of appropriate types — in C++ type inference works for template functions).

Here’s the final implementation with the types filled in:

template<class L, class R>
struct Compile<Plus<L, R>> {
  int operator()(Args args) 
    return Bind<Compile<L>, Bind<Compile<R>, Return>> (
      [](int left) -> Bind<Compile<R>, Return> {
        return Bind<Compile<R>, Return>(
          [left](int right) -> Return {
            return Return(left + right);

It’s quite a handful and, frankly speaking, I would have never been able to understand it, much less write it, if it weren’t for Haskell.

The Test

You can imagine the emotional moment when I finally ran the test and it produced the correct result. I evaluated the simple expression, Arg1*Arg2+13, with arguments 3 and 4 and got back 25. The monad worked!

void main () {
    Args args(3, 4);
    Compile<Plus<Times<Arg1, Arg2>, Const<13>>> act;
    int v = act(args);
    std::cout << v << std::endl;


Now that we have the monad and a monadic function Compile, we can finally build a simple embedded domain-specific language. Our minimalistic goal is to be able to evaluate the following expression:

int x = (arg1 + arg2 * arg2)(3, 4);

The trick is to convince the compiler to construct a very special type that represents the particular expression tree. In our case it should be something like this:

Plus< Arg1, Times<Arg2, Arg2> >

With this type, call it E, the compiler should call a special metafunction we’ll call Lambda, which returns a function object of two integral arguments:

template<class E>
struct Lambda {
    int operator()(int x, int y) {
        Args args(x, y);
        Compile<E> prog;
        return prog(args);

How does one do it? This is really the bread and butter of EDSL writers — pretty elementary stuff — but I’ll explain it anyway. We start by declaring two objects, arg1 and arg2:

const Lambda<Arg1> arg1;
const Lambda<Arg2> arg2;

These little objects can infect any expression with their Lambda-ness, and spread the infection with the help of appropriately overloaded arithmetic operators.

For instance, when the compiler sees arg1+arg2, it will look for the overload of operator+ that takes two Lambdas. And we’ll provide this one:

template<class E1, class E2>
Lambda<Plus<E1, E2>> operator+ (Lambda<E1> e1, Lambda<E2> e2)
    return Lambda<Plus<E1, E2>>();

Notice that this operator returns another Lambda, albeit of a more complex type, thus spreading the Lambda-ness even further, and so on. (In this very simplified example I’m ignoring the arguments e1 and e1. In general they would be used to create a runtime version of the expression tree.)

Let’s see how it works in our example:

(arg1 + arg2 * arg2)(3, 4)

The seed types are, respectively,

Lambda<Arg1>,  Lambda<Arg2>, Lambda<Arg2> 

The application of the inner operator* (its definition is left as an exercise to the reader), produces the following type:

Lambda<Times<Arg2, Arg2>>

This type is passed, along with Lambda<Arg1> to operator+, which returns:

Lambda<Plus<Arg1, Times<Arg2, Arg2>>

All this is just type analysis done by the compiler to find out what type of object is returned by the outermost call to operator+.

The next question the compiler asks itself is whether this object can be called like a function, with the two integer arguments. Why, yes! A Lambda object (see above) defines an overload of the function-call operator. The instantiation of this particular Lambda defines the following overload:

int Lambda<Plus<Arg1, Times<Arg2, Arg2>>::operator()(int x, int y) {
     Args args(x, y);
     Compile<Plus<Arg1, Times<Arg2, Arg2>> prog;
     return prog(args);

This function will be called at runtime with the arguments 3 and 4 to produce the expected result, 19.

The code that has to be actually executed at runtime is the call to prog(args), which is mostly a series of Binds, an addition, and a multiplication. Since the implementation of a Bind‘s function-call operators has no flow of control statements (if statements or loops), it can all be inlined by an optimizing compiler (modulo the glitch with std::function I mentioned earlier). So all that’s left is the addition and multiplication. Eric tells me that this is how Proto expressions work, and there is no reason why the monadic version wouldn’t lead to the same kind of performance.

Conculsions and Future Work

I came up with this C++ monadic construction to help me understand the kind of heavy-duty template metaprogramming that goes on in Proto. Whether I have succeeded is a matter of opinion. I exchanged some of the complexity of template manipulations for a different complexity of Haskell and monads. I would argue that any C++ template metaprogrammer should know at least one functional language, so learning Haskell is a good investment. Monads are just part of the same Haskell package.

Assuming you understand monads, is the monadic construction of an EDSL simpler than the original using Proto? I will argue that it is. The implementation of Proto is pretty much monolithic, the concepts it’s using have very little application or meaning outside of Proto. The monadic approach, on the other hand, decomposes into several layers of abstraction. At the bottom you have the construction of the state-, or reader-, monad. The monad exposes just three primitives: the type constructor, bind, and return. With these primitives you can create and compose monadic (meta-) functions — in my example it was the compile metafunction. Finally, you use these metafunctions to build an EDSL.

With this three-layer structure comes a well defined set of customization points. Users may plug in their own type constructors and implement their own Bind and Return. Or, they might use the default monad and just create their own monadic functions. It’s important that the procedure for building and composing monadic functions is well defined and that it uses straight C++ for implementing the logic. Inside a monadic function you can use regular C++ statements and standard control flow devices.

It’s not yet totally clear how general this approach is — after all what I described is a toy example. But there is a lot of interest in re-thinking or maybe even re-implementing Boost Proto and Phoenix in terms of monads. At Boostcon I started working with Joel Falcou and Hartmut Kaiser on this project, and later Eric Niebler and Thomas Heller joined our group. We believe that having solid theoretical foundations might encourage wider acceptance of some of the more complex template libraries. Especially if strict one-to-one correspondence with Haskell code could be established.


I’ve been given so much feedback on my Boostcon presentation and the drafts of this blog post that I should really list the following people as co-authors: Eric Niebler, Thomas Heller, Joel Falcou, and Hartmut Keiser. Thank you guys!

This page has been translated into Spanish language by Maria Ramos.

Appendix 1

The picture wouldn’t be complete if I didn’t provide the translation of the EDSL construction back to Haskell. It’s pretty straightforward, except for the last part where an expression becomes a callable function object. Maybe there is a way to bend Haskell’s syntax to do it directly but, since this is just a proof of concept, I took a shortcut and defined a function toFun which turns a Lambda expression into a function.

newtype Lambda = L Exp

toFun (L ex) =
  λ x y ->
     run (compile ex) [x, y]

instance Num Lambda where
    (L e1) + (L e2) = L (Plus e1 e2)
    (L e1) * (L e2) = L (Times e1 e2)
    fromInteger n = L (Const (fromInteger n))

test =
    let arg1 = L Arg1
        arg2 = L Arg2
        (toFun (arg1 + 2 * arg2 * arg2)) 2 3

The overloading of arithmetic operators is done by making Lambda an instance of the type class Num.

Appendix 2

When writing code at this level of abstraction it’s easy to bump into compiler bugs. For instance, Visual Studio 2010 won’t instantiate this simpler version of Bind that doesn’t use std::function:

template<class P1, class Cont>
struct Bind
    Bind(P1 prog, Cont f)
        : _prog(prog), _f(f)
    int operator()(State args) {
        int v = _prog(args);
        auto p2 = _f(v);
        return p2(args);
    P1 _prog;
    Cont _f; // store a lambda continuation


  1. Bartosz Milewski, Monads for the Curious Programmer:
  2. Bartosz Milewski, What Does Haskell Have to Do with C++?
  3. Eric Niebler, Expressive C++. A series of blog posts that started all this
  4. Brian McNamara and Yannis Smaragdakis, Functional Programming in C++. A Haskell-in-C++ library
  5. Brian McNamara and Yannis Smaragdakis, Syntax sugar for C++: lambda, infix, monads, and more. More traditional approach to monads in C++ as a runtime device
  6. Zoltán Porkoláb, Ábel Sinkovics, Domain-specific Language Integration with Compile-time Parser Generator Library. A C++ implementation of a compile-time parser that uses a monadic approach (although the word “monad” is never mentioned in the paper)

In the previous installment I introduced monads using two examples: the Maybe monad and the List monad. Admittedly, those weren’t the most exciting uses of monads, although they served the purpose of explaining the underlying theory. I also mentioned that monads were introduced into functional programming as a general solution for a variety of problems. One of those problems was representing stateful computations as functions.

Functions, State, and Side Effects

Here’s the problem: In functional languages, every time you call a function with the same arguments, it must return the same result. In fact, if the compiler detects such a situation, it is free to memoize the result of the first call and skip subsequent calls altogether. A stateful computation, on the other hand, might return different results every time it’s performed. It may, for instance, access some global or static variables. It may also modify those variables– in other words have side effects. In extreme cases a computation might be performed purely for side effects and not even bother to return any results.

This kind of behavior is often troublesome even in imperative programming. The use of global variables in particular is being discouraged. A better solution is to encapsulate the state and pass it explicitly to functions that use it. As a syntactic shortcut, in object-oriented languages, some of the state is regularly passed to functions (methods) as a hidden “this” or “self” argument. There’s even a syntax for composing such functions, as in this JavaScript snippet:

with(document) {
    var t = title;
    write(t + " and more");

Here, title is a property and write a method of the object document. (If you put on your monadic glasses, it almost looks like do notation.)

In functional languages we have one more limitation: we cannot mutate any data. There’s a standard way to overcome this limitation: Instead of modifying the data, you create a modified copy. This doesn’t even have to be expensive, if the language supports smart data structures that silently substitute references for copies whenever possible. Most operations on lists in Haskell are optimized this way, and there’s even a language, Clojure, at whose core are “persistent” data structures that behave as if they were immutable, but do a lot of sharing behind the scenes. Immutability is a very attractive feature when you are doing concurrent programming: access to immutable data requires no synchronization.

Taking all this into account, the way to translate stateful computations into functional language is to use functions that explicitly take state (encapsulated in some data structure) and return the, possibly modified, state together with the usual return value. For instance, a C++ “function”:

int pop() {
    auto v = glob.top();
    return v;

that operates on a global vector, glob of the type std::vector<int>, would be turned into a Haskell function:

pop (ST lst) = (ST (tail lst), head lst)

operating on the state of type Stack:

newtype Stack = ST [Int]

The constructor ST creates a Stack from a list of integers. This constructor is also used for pattern matching, as in the argument of pop. The function head returns the first element of a list, tail returns the rest.

The signature of pop is characteristic of functions that operate on state:

top:: Stack-> (Stack, Int)

Such functions are often called “actions.”

There are two problems with this scheme: (1) It’s awkward, and (2) It doesn’t fit our monadic approach. In this example, the original computation (as expressed in C++) takes no arguments and returns an integer. Its functional counterpart takes a state and returns an integer combined with the state. It’s not clear how one would bind such functions together and use the convenient do notation.

We are on the right track though. We just need to get even more general: We need to separate the construction of an action from its execution. Our basic blocks will be functions that return actions– we’ll call them “monadic functions.” Since action is a function, we’ll be dealing with functions returning functions; or higher order functions.

Our goal is to find a way to compose monadic functions into larger monadic functions. A composite monadic function will return a composite action. We will then execute such action on a state, and get our result.

This new description fits the monadic pattern much better. We start with a generic stateful computation that takes an argument of type a and returns a value of type b, and we turn it into a (monadic) function that takes type a and returns an enriched type based on b. This time, though, the enriched type is a function type– an action. In general, an action is a function that takes a state (of some type S) and returns a tuple consisting of the (possibly modified) state and the value of type b.

S -> (S, b)

Here’s the first element of the monad– a type constructor: For any type t it defines a new type: an action to calculate a value of type t given a state. This type constructor is part of the state monad. Before we get to a more formal definition, let’s do some practice exercises.

The Monadic Calculator

There’s a perfect example of a stateful computation: a stack-based calculator. The state in this case is represented by the type Calc:

newtype Calc = Calc [Int]

that hides a list of integers– our calculator’s stack.

First, lets define a monadic function (a function that returns an action) that pops an element off the calculator’s stack. It will be a function returning a function, so we need to use lambdas.

popCalc = 
    \(Calc lst) -> (Calc (tail lst), head lst)

The body of the lambda is almost identical to the implementation of pop above. Notice that popCalc takes no arguments. Rather, the function that it produces takes a calculator as an argument and returns the calculator back, paired with the result–the value at the top of the stack. In other words, popCalc returns a promise to calculate the top of the calculator’s stack when the stack is available.

Here’s how you can use popCalc. First, you call it with no arguments and record the returned action. Next, you create a calculator (with a non-empty stack, otherwise the next line of code would bomb). You apply the action to that calculator and record the result– you pattern-match it to a tuple consisting of a changed calculator and a number. Finally you display that number. This is the actual output of a Haskell interpreter session:

> let f = popCalc
> let calc = Calc [3, 2, 1]
> let (calc', x) = f calc
> x

While we’re at it, we can similarly implement a pushCalc function:

pushCalc n =
    \(Calc lst) -> (Calc (n:lst), ())

Notice that the lambda produced by pushCalc returns a modified calculator (argument n is prepended to the list) paired with a special value () of the type unit— a Haskell equivalent of void. The imperative equivalent of this function would return void and work only through side effects. Notice also that the lambda is actually a closure: it captures the outer variable n for later use.

Finally, we need a function that performs some calculation; after all we are implementing a calculator:

addCalc =
    \(Calc lst) -> let (a:b:rest) = lst
                       (Calc ((a + b):rest), ())

Here I’m matching the calculator’s list with the pattern (a:b:rest) to retrieve the top two elements. The modified calculator has the sum of those two elements on the top of its stack.

We can use all these functions in combination to perform more complex operations, like adding two numbers. Here’s a piece code that might rival some of the Rube Goldberg creations:

add x y =
    let pux = pushCalc x -- promise to push x
        puy = pushCalc y -- promise to push y
        axy = addCalc    -- promise to add top numbers
        pp = popCalc     -- promise to pop the result
        calc = Calc []   -- we need a calculator
        (calc1, _) = pux calc  -- actually push x
        (calc2, _) = puy calc1 -- actually push y
        (calc3, _) = axy calc2 -- actually add top numbers
        (_, z) = pp calc3      -- actually pop the result
        z  -- return the result

But what we really want is to be able to combine smaller actions into larger actions. For that we have to define bind. The signature of bind, in this case, should be:

bind :: (Calc -> (Calc, a)) ->        -- action
        (a -> (Calc -> (Calc, b)) ->  -- continuation
        (Calc -> (Calc, b))           -- new action

I have highlighted our enriched types–the action types. This signature looks much more complex than the signature of the Maybe bind, but that’s only because the enriched type is itself a function type. Other than that, the structure is the same: bind accepts an action and a continuation and returns a new action. The continuation in this case takes an argument of type a (the value to be calculated by the first action) and returns the composite action.

In fact, if we define Action as a type alias:

type Action a = Calc -> (Calc, a)

the signature of bind can be abbreviated to:

bind :: (Action a) -> (a -> (Action b)) ->  (Action b)

Now for the implementation. Since the result of bind is an action, it has to return a lambda of the appropriate signature.

bind act cont =
    \calc -> ... produce (Calc, b) tuple ...

Bind is supposed to compose the action, act, with the continuation, cont. So it should first apply act to calc.

let (calc', v) = act calc

The result is a tuple (calc', v) with a new calculator and a value v of type a.

This is the v that the continuation expects, so the next step is to apply the continuation:

act' = cont v

The result of the continuation is a new action. This new action can then be executed, that is applied to the new calculator:

act' calc'

to produce the desired result– a tuple of the type (Calc, b).

Here’s the final code:

bind act cont =
    \calc ->
        let (calc', v) = act calc
            act' = cont v
            act' calc'

To complete our construction of the monad, we need to define return. The signature of return is:

return :: a -> Action a

and the implementation is pretty straightforward. It takes a value v and returns a promise to return this value.

return v = \calc -> (calc, v)

An astute reader might notice that nothing in this construction depends on the peculiarities of the type Calc. It will work for any type that is used to represent state. So we have in fact just constructed a generic state monad. The stack-based calculator is just a silly example of that monad.

It’s not difficult to implement bind as an infix operator, >>=, and turn the calculator into a monad that’s recognizable by the Haskell compiler (see Appendix 1). Then the relevant part of the add function may be rewritten in the do notation:

add x y = do
    pushCalc x
    pushCalc y
    r <- popCalc
    return r

Let me present the same code without any syntactic sugar, using the cascading lambda-within-lambda notation:

add x y =
  bind (pushCalc x) 
       (\() -> bind (pushCalc y)
                    (\() -> bind addCalc
                                 (\() -> bind popCalc
                                              (\z -> return z))))

This is not something you will see often in Haskell programs, but I will eventually want to go beyond Haskell. My goal is to connect back with C++, and this is the form that’s most convenient form making such a transition.

So let’s painstakingly analyze this code. We are binding the first action, (pushCalc x), to the rest of the code. The rest of the code is expressed as one huge lambda. To make these two parts fit together, their types have to match. The value produced by the action pushCalc is void (a.k.a., “unit”)– so it’s type is Action (). Therefore the lambda to which it binds must also take void, hence the notation:

\() -> ...

The body of that lambda is another bind, and so on, until we get to the interesting part, which is popCalc.

popCalc is an action that calculates a value: its signature is Action Int. This value is passed to the lambda to which popCalc is bound. Therefore this last lambda takes an Int argument, z. Finally, this value is enclosed in an action, and that’s done by the function return.

This unsugared notation elucidates one more aspect of the monadic approach that’s very relevant in the context of Haskell. Haskell is a lazy language: it doesn’t evaluate anything unless it is strictly necessary for achieving some final goal. Also, when it needs to evaluate several independent things, it will do that in some arbitrary, unpredictable order. So if it were somehow possible to implement the imperative versions of push and pop in Haskell, we would have two problems: push would never be evaluated because it produces no result, and even if it were, its evaluation could be swapped with the subsequent pop. Monadic bind forces the ordered evaluation of actions by introducing explicit data dependency. If pop follows push in the chain of binds, pop cannot be evaluated before push because its argument is the calculator that is returned by push. The two are linked by data dependency which, by the way, is not so obvious in the do notation.


The state monad is a very interesting pattern from the programming point of view. Instead of doing something, you create an action that is executed (maybe even multiple times) later. The monadic scaffolding provides the standard amenities like the ability to compose actions, and the do notation makes writing functions that produce functions much more natural. There is an interesting variation of the state monad called the IO monad, which is used for input and output in Haskell. I describe it in Appendix 2.

There are many patterns in imperative languages that have elements, or sometimes just hints, of a state monad. For instance, in the OO world you might encounter a very useful Command Pattern. You can “bind” command objects using the Composite Pattern. In languages that support anonymous functions and closures, like JavaScript, C# and, recently C++, you can return functions from functions directly. This might help, for instance, in dealing with inversion of control, where you return a closure as an event handler (that would be material for another series of blog posts).

But I have in mind a very specific example that I’ve been working on in C++ that fits the monadic pattern perfectly, and I’m going to write about it in the next installment of this series.

I’d like to thank Eric Niebler for valuable comments on the draft of this post.

Appendix 1

The full implementation of a stack-based calculator requires a few more Haskell tricks. First, we have to explicitly define our type constructor. I’ll call the new type Calculation, with the type constructor CL that encapsulates an action:

newtype Calculation a = CL (Calc -> (Calc, a))

Monadic functions have to return this new type, so they all wrap their actions into a Calculation.

pushCalc n =
    CL (\(Calc lst) -> (Calc (n:lst), ()))

topCalc = 
    CL (\(Calc lst) -> (Calc lst, head lst))

popCalc =
    CL (\(Calc lst) -> (Calc (tail lst), head lst))

addCalc =
    CL (\(Calc lst) -> let (a:b:rest) = lst
                           (Calc ((a + b):rest), ()))

Haskell has a built-in type class for Monads (think of a type class as a C++ concept). We have to tell Haskell that our Calculation is an instance of Monad and provide the definition of the two associated functions: bind, using infix notation, and return.

instance Monad Calculation where
    return x = 
        CL (\calc -> (calc, x))
    CL(c) >>= cont =
        CL (\calc ->
            let (calc', v) = c calc
                CL c' = cont v
                c' calc')

With those definitions, our add function can be written using the do notation.

add x y = do
    pushCalc x
    pushCalc y
    r <- popCalc
    return r

Since we are not expecting any values to be calculated by pushCalc or addCalc, there are no left arrows accompanying them in the do notation.

Appendix 1a: The Applicative

Haskell keeps evolving and, on occasion, the changes in the language break old code. In particular the Monad class has a superclass now, called Applicative. That’s why the Monad instance for Calculation won’t compile any more, unless you explicitly add the instance for Applicative. Fortunately, the Applicative functionality can be fully implemented using the Monad interface, as in:

instance Applicative Calculation where
  pure = return
  mf <*> mx = do f <- mf
                 x <- mx
                 return (f x)

This won’t compile either, because Applicative requires Functor as its superclass. So we have to make Applicative a Functor. The simplest is to let the compiler work it out, but you have to include this line at the very top of the file:

{-# LANGUAGE DeriveFunctor #-}

and modify the definition of Calculation:

newtype Calculation a = CL (Calc -> (Calc, a))
  deriving Functor

In fact, it’s possible to implement the calculator as an Applicative, without the need for a Monad instance. But that’s a different story.

Appendix 2: The IO Monad

Strictly speaking a lazy purely functional language like Haskell cannot do any input or output. That’s because the compiler is free to memoize the result of the first call to, say, getChar and elide all subsequent calls. Calls to putChar, which don’t return anything, may be ignored altogether. This is why most functional languages cheat when it comes to I/O. But not Haskell. Monads to the rescue!

Let’s think for a moment why getChar may return different characters every time it’s called. It’s because there is a keyboard somewhere out there that changes its state. Why is it changing its state? Because there is a human who pushes the keys. Why is the human pushing the keys? Because he or she got a phone call from China that the price of rice is about to go up, so it makes sense to buy some futures. And so on… In other words there is this external world that is full of state.

What if we encapsulate that whole world in a hidden data structure and write our program as an action that operates on it? This is very similar to the state monad pattern except that here the programmer has no access to the state and cannot execute the action. The action is produced by main and passed to the system for execution. It’s the system, wink wink, that has access to “the world” and may pass it as a state argument to that action. Of course it’s all smoke and mirrors, but it successfully staves off the insanity of the external world from impinging on the purity of Haskell.

How does it work in practice? There is a monad called IO. It’s almost like a state monad, except that its type can’t be expressed in Haskell, because it would have to look something like this:

type IO a = World -> (World, a)

and we don’t know what World is. The main function in a Haskell program is a monadic IO action, usually:

main :: IO ()

with the type parameter a replaced by unit, (), (although any other type will work too).

The simplest main is just a single IO action:

main = putStrLn "Hello World!"

but in general main is a do block.

You might ask: If a Haskell program is one monadic IO action, then where does the traditional functional code go? The answer is that you can call traditional functions from anywhere in the do block. Even in my highly biased example there were several non-monadic function calls (head, tail, operator (+), etc.). Imagine a Haskell program as a tree: it’s trunk is monadic IO, and so are all the thick branches that have anything to do with I/O. But the thinner branches and leaves are your run of the mill functions that get (lazily) evaluated only when the main IO action is executed by the system.

Another interesting observation is that all functions that perform I/O have this information encoded in their type signature. Not in the type of its arguments, mind you, but in the return type. This is almost the opposite of what you see in imperative languages, where you’d have to pass some kind of I/O object (file or stream) to a procedure that performs I/O (except when that object is global, like standard input/output in C++). In Haskell, if you want your function to perform I/O, two things must happen: it must return an IO action that it internally constructs; and the action must find its way to the very top, to the main function. On the way up, it might be bound with other such actions either using bind or the do notation.

You might ask: How does Haskell make sure that all IO actions get to the top, so that the system may execute them? It doesn’t! But consider what you would have to do in order not to pass an action to the top. You’d have to explicitly ignore it, as in:

let _ = putStrLn "Don't print me!"

Ignoring things in Haskell is not a default thing.

In my previous post I talked about categories, functors, endofunctors, and a little about monads. Why didn’t I just start with a Haskell definition of a monad and skip the math? Because I believe that monads are bigger than that. I don’t want to pigeonhole monads based on some specific language or application. If you know what to look for you’ll find monads everywhere. Some languages support them better, as a single general abstraction, others require you to re-implement them from scratch for every use case. I want you to be able to look at a piece of C++ code and say, “Hey, that looks like a monad.” That’s why I started from category theory, which sits above all programming languages and provides a natural platform to talk about monads.

In this installment I will talk about some specific programming challenges that, on the surface, seem to have very little in common. I will show how they can be solved and how from these disparate solutions a pattern emerges that can be recognized as a monad. I’ll talk in some detail about the Maybe monad and the List monad, since they are simple and illustrate the basic ideas nicely. I will leave the really interesting examples, like the state monad and its variations, to the next installment.

Challenges of Functional Programming

You know what a function is: called with the same argument it always returns the same result. A lot of useful computations map directly to functions. Still, a lot don’t. We often have to deal with some notions of computations that are not functions. We can describe them in words or implement as procedures with side effects. In non-functional languages such computations are often called “functions” anyway– I’ll enclose such usage with quotation marks. Eugenio Moggi, the guy who introduced monads to computing, listed a few examples:

  • Partial “functions”: For some arguments they never terminate (e.g., go into an infinite loop). These are not really functions in the mathematical sense.
  • Nondeterministic “functions”: They don’t return a single result but a choice of results. Non-deterministic parsers are like this. When given an input, they return a set of possible parses. Which of them is right depends on the context. These are not really functions because a function must return a single value.
  • “Functions” with side effects: They access and modify some external state and, when called repeatedly, return different results depending on that state.
  • “Functions” that throw exceptions: They are not defined for certain arguments (or states) and they throw exceptions instead of returning a result.
  • Continuations.
  • Interactive input: getchar is a good example. It’s result depends on what hits the keyboard.
  • Interactive output: putchar is a good example. It’s not a function because it doesn’t return anything (if it returns an error code, it doesn’t depend on the argument). Still, it can’t be just optimized away if we want any output to appear on the screen.

The amazing thing is that, using some creative thinking, all these problems can be solved using pure functional methods. I won’t discuss all of them, but I’ll show you a series of progressively more interesting examples.

Error Propagation and Exceptions

Who doesn’t deal with error propagation? Let’s dissect the problem: You are defining a computation that can return a valid result only for a subset of possible arguments. So what happens when the argument is “wrong”? Well, you can return an error.

In the simplest case you might add one extra bit, success or failure, to your result. Sometimes you can spare a bit in the result type: If the correct result is always a non-negative number, you may return negative one as an error. Or, if the result is a pointer, you might return a null pointer instead. But these are just small hacks introduced for the sake of performance. And, like most hacks, they can lead to dangerous code. Consider this (likely incorrect) code:

size_t off = fileName.find('.');
string ext = fileName.substr(off, fileName.length() - off);

If fileName contains no dot, the result is a special value npos signifying an error. The problem is that npos is of the same type as a non-error result, so it can be passed quietly to string::substr causing undefined behavior.

A more general and safer solution is to change– extend– the type of the result to include an error bit. You can do it for any type of result. In Haskell, you just define a type constructor called Maybe:

data Maybe a = Nothing | Just a

It takes an arbitrary type a and defines a new type that adds the special value Nothing to the set of possible values of a. In C++ that would be equivalent to a template:

template<class T>
struct Maybe {
    T just; // valid if 'nothing' is false
    bool nothing;

(This is just an example. I’m not arguing that Maybe would be very useful in C++ which has other mechanisms for error propagation.)

So here we have a way to turn a computation that’s not defined for all values into a function that’s defined over the whole domain, but which returns a new richer type.

The next question is, do these new things– functions returning Maybe— compose? What should the caller do with the result of such a function when it’s part of a larger computation? One thing is for sure, this result cannot be passed directly to an unsuspecting function–the error cannot be easily ignored– which is good. If we replace find by a new function safe_find that returns a Maybe<size_t>, the client won’t call substr with it. The types wouldn’t match. Instead, the result of safe_find must be unpacked and (much more likely than before) tested.

Maybe<size_t> off = safe_find(fileName, '.');
string ext;
if (!off.nothing)
    ext = fileName.substr(off.just, fileName.length() - off.just);

Notice what happened here: By changing the return type of a function we broke the natural way functions compose– the result of one becoming the input of the next. On the other hand, we have come up with a new way of composing such functions. Let me chain a few of such compositions for better effect:

Maybe<Foo> foo = f(x);
if (!foo.nothing) {
    Maybe<Bar> bar = g(foo.just);
    if (!bar.nothing) {
        Maybe<Baz> baz = h(bar.just);
        if (!baz.nothing) {

I have highlighted the elements of the “glue,” which is used to compose our new, more error-conscious functions. Notice how this boilerplate glue gets in the way of code clarity. Ideally, we would like to write something like this:

    auto y = f(x);
    auto v = g(y);
    auto z = h(v);
    return z;

where DO would magically provide the glue that’s implicit in the chaining of f, g, and h.

It’s not easy to do this– abstract the glue away– in C++. I’m not even going to try. But in Haskell it’s a different story. Let’s start with the almost direct translation of the C++ code into Haskell:

compose n =
    let m = f n in
    case m of
    Nothing -> Nothing
    Just n1 ->
        let m1 = g n1 in
        case m1 of
        Nothing -> Nothing
        Just n2 -> 
            let n3 = h n2 in

The if statements are replaced by Haskell’s pattern matching (case x of). The ms are the Maybes and the ns art their contents (if they aren’t Nothings).

Notice the characteristic cascading style of this code– the nested conditionals or pattern matches. Let’s analyze one level of such a cascade. We start with a Maybe value (one returned by f n). We unpack it and examine the contents. If the result is not an error (Just n1), we continue with the rest of the body of compose. I have highlighted the “rest of the body” in blue (the “glue” is still in red).

What’s also important is that, if the result is an error (the Nothing branch of the pattern match) the whole “rest of the body” is skipped. In order to abstract the glue, we have to abstract the “rest of the body” as well. Such an abstraction is called a continuation. Let’s write this continuation as a lambda (lambdas in Haskell are written using the backslash, which is nothing but the Greek letter λ missing one leg):

\ n1 -> 
    let m1 = g n1 in
    case m1 of
    Nothing -> Nothing
    Just n2 -> 
        let n3 = h n2 in

And here’s the trick: We can abstract the glue as a (higher-order) function that takes a Maybe value and a continuation. Let’s call this new function bind and rewrite compose with its help:

compose n =
    let m = f n in
    -- the first argument is m, the second is the whole continuation
    bind m (\n1 ->
                let m1 = g n1 in
                case m1 of
                Nothing -> Nothing
                Just n2 -> 
                    let n3 = h n2 in

Here’s how bind is implemented:

bind m f = 
    case m of 
    Nothing -> Nothing  -- bypass the continuation
    Just n -> f n       -- pass n to the continuation

or, more compactly,

bind Nothing cont  = Nothing
bind (Just n) cont = cont n

Figure 1 illustrates the complete code transformation. The result of f n, which is a Maybe, is passed to bind, represented by a blue box. Inside bind the Maybe is unpacked. If its value is Nothing, nothing happens. If its value is Just n1, the rest of the code, the continuation, is called with the argument n1. The continuation itself is rewritten using bind, and so on. The final continuation calls return, which I will explain shortly.

Fig 1. Composition of functions that return Maybe results

The position of bind, the blue box in Figure 1, between its Maybe argument and the continuation suggests that infix notation might be more appropriate. Indeed, in Haskell bind is represented by the infix operator, >>=:

Nothing  >>= cont = Nothing
(Just x) >>= cont = cont x

(The left-hand side of the equal sign is the operator between its two arguments [actually, patterns], and the right-hand side is the body of the function.) We can express the type signature of the bind function as:

(>>=) :: Maybe a -> (a -> Maybe b) -> Maybe b

It takes a Maybe a and a function from a to Maybe b (that’s our continuation). It returns a Maybe b. When dealing with higher order functions I find type signatures extremely useful; so much so that I often use them as comments in my C++ meta-code.

The complete rewrite of compose using >>= looks like this:

compose1 n =
    f n >>= \n1 ->
        g n1 >>= \n2 ->
            h n2 >>= \n3 ->
                return n3

Now is the time to explain the mysterious return at the end of the function. No, it’s not the keyword for returning a result from a function. It’s a function that takes an argument of type a and turns it into a Maybe a:

return n = Just n

We need return because the result of compose is a Maybe. If any of the functions returns Nothing, compose returns Nothing (it’s part of the definition of >>=). Only when all functions succeed do we call return with the correct result, n3. It turns n3 into Just n3, thus announcing the success of the computation and encapsulating the final result.

You might say that using return instead of Just n3 is an overkill, but there are good reasons to do that. One is to encapsulate and hide direct access to the implementation of Maybe. The other has to do with the ways this pattern is generalized beyond Maybe.

The Maybe Monad

What does Maybe have to do with monads? Let’s see what we have done so far from a more general perspective.

We started from a computation which didn’t behave like a function– here, it was not defined for all arguments. We had found a clever way to turn it into a function by enriching its return type. Such general “enriching” of types can be expressed as a type constructor.

From a category-theoretical point of view, defining a type constructor, which is a mapping from types to types, gets us half way towards defining an endofunctor. The other part is the mapping of functions (morphisms). For any function that takes values of type a and returns values of type b we need a corresponding function that acts between the enriched types. In Haskell, the mapping of functions to functions can be expressed as a higher-order function. When this mapping is part of an endofunctor, the corresponding higher-order function is called fmap.

Let’s call the type constructor M. It constructs the enriched type M a from any type a. The function fmap has the following signature:

fmap :: (a -> b) -> (M a -> M b)

It maps a function from a to b to a function from M a to M b. In the case of Maybe we already know the type constructor part, and fmap will follow soon.

Let’s go back to our computation turned function returning an enriched type. In general, its signature is:

f :: a -> M b

Since we care about composability of computations, we need a way to glue the enriched functions together. In the process of gluing we usually have to inspect the enriched type and make decisions based on its value. One of such decisions might be whether to continue with the rest of the computation or not. So the general bind operation must take two arguments: the result of the previous enriched function and the rest of the computation– a continuation:

bind :: M a -> (a -> M b) -> M b

If you squint enough, you can see a similarity between bind and fmap.

Here’s how you do it: Imagine that you supply bind with its second argument, the continuation. There is still one free parameter of the type M a. So what’s left is a function that takes M a and returns M b. (This is not exactly currying, since we are binding the second argument rather than the first, but the idea is the same.) That’s exactly the type of function that occurs on the right hand side of the signature of fmap. Seen from that point of view, bind maps functions into functions:

(a -> M b) -> (M a -> M b)

The function of the left hand side is our continuation, the one on the right hand side is bind with the second argument plugged by the continuation.

This signature is really close to that of fmap, but not quite. Luckily, we still have a second family of functions at our disposal, the polymorphic return. The signature of return is:

return :: a -> M a

Using both bind and return we can lift any function f:

f :: a -> b

to a function g:

g :: M a -> M b

Here’s the magic formula that defines g in terms of f, bind, and return (the dot denotes regular function composition):

g ma = bind ma (return . f)

Or, using infix notation:

g ma = ma >>= (return . f)

This mapping of f into g becomes our definition of fmap. Of course, fmap must obey some axioms, in particular it must preserve function composition and interact correctly with unit (which I will discuss shortly). These axioms translate back into corresponding conditions on bind and return, which I’m not going to discuss here.

Now that we have recovered the functor, we need the two other parts of a monad: unit and join. Guess what, return is just another name for unit. For any type, it lifts an element of that type to an element of the enriched type in the most natural manner (incidentally, the word natural has a very precise meaning in category theory; both unit and join are natural transformations of the functor (M, fmap)).

To recover join, let’s look at its type signature in Haskell:

join :: M (M a) -> M a

It collapses one layer of the type constructor. In the case of Maybe it should map a double Maybe to a single Maybe. It’s easy to figure out that it should map Just (Just x) into Just x and anything else to Nothing. But there is a more general way of defining join using bind combined with the identity function:

join :: M (M a) -> M a
join mmx = mmx >>= id

We are binding a value of type M (M a) to a function id. This function, when acting on M a returns M a. You can convince yourself that these signatures match the general signature of bind (see Appendix); and that this definition, when applied to Maybe, produces the correct result.

To summarize: What we have done here is to show that Maybe together with bind and return is a monad in the category-theory sense. Actually, we have shown something far more general: Any triple that consists of a type constructor and two functions bind and return that obey certain identities (axioms) define a monad. In category theory this triple is called a Kleisli triple and can be used as an alternative definition of a monad.

Syntactic Sugar

So far you’ve seen only the Maybe example, so it might seem like we are pulling out a monadic cannon to kill a mosquito. But, as you’ll see later, this same pattern pops up in a lot of places in various guises. In fact it’s so common in functional languages that it acquired its own syntactic sugar. In Haskell, this sugar is called the do notation. Let’s go back to the implementation of our function compose:

compose n =
    f n >>= \n1 ->
        g n1 >>= \n2 ->
            h n2 >>= \n3 ->
                return n3

Here’s exactly the same code in do notation:

compose n = do
    n1 <- f n
    n2 <- g n1
    n3 <- h n2
    return n3

This looks deceptively similar to an imperative program. In fact here’s a C++ version of it (for the sake of the argument let’s assume that f takes a pointer to Foo and h returns an integer) :

int compose(Foo * n)
    auto n1 = f(n);
    auto n2 = g(n1);
    auto n3 = h(n2);
    return n3;

Uh, one more thing. This is how you would call it:

try {
catch(...) {
    // error handling

In C++ you get virtually the same functionality not by modifying the return type, but by throwing an exception. (Now you may utter your first cry, “Hey, that looks like a monad!”, when looking at C++ code.)

Just like in our Haskell example, once any of the functions reports an error (throws an exception), the rest of the body of compose is skipped. You might say that C++ exceptions offer more power than the Maybe monad and you’ll be right. But that’s because I haven’t shown you the Error monad and the Exception monad.

Where Haskell’s Exception monad beats C++ exceptions is in type checking. Remember the unfortunate attempt at adding exceptions specification to the C++ type system? (Java went the same way.) Here’s what the C++ standard says:

An implementation shall not reject an expression merely because when executed it throws or might throw an exception that the containing function does not allow.

In other words, exceptions specifications are bona fide comments. Not so in Haskell! If a function returns a Maybe or Exception, that becomes part of its type signature which is checked both at the call site and at the function definition site. No cheating is allowed, period.

But the major difference between Haskell’s and C++’s approach is that the do notation generalizes to all monads, whereas C++ neat try/catch syntax applies only to exceptions.

A word of caution when reading Haskell monadic code. Despite similarities in usage, Haskell’s left arrow is not an assignment. The left hand side identifier corresponds to the argument of the continuation (which is the rest of the do block). It is the result of unpacking the outcome of the right hand side expression. This unpacking always happens inside bind.

Non-deterministic Computations

In the previous post I introduced the list monad by defining a functor and two families of functions. The type constructor part of the functor mapped any type a into the list of a, [a]. The part that acted on functions (now we know that, in general, it’s called fmap; but for lists it’s just map) worked by applying the function to each element of the list. The unit and the join were polymorphic functions. unit was defined as:

 unit x = [x]

and join as concat.

Now I can tell you that the list monad provides the solution to the problem of implementing non-deterministic computations. Imagine parsing a non-deterministic grammar. A production might offer multiple alternative parse trees for the same input.

These types of computations may be simulated with functions returning lists of possible results. Here’s the same trick again: whatever type is returned by a deterministic parser (e.g., a parse-tree type), it’s now turned into an enriched type (a parse tree list).

We’ve already seen the category-theory construction of the list monad but here we are facing a slightly different problem: how to string together functions that return lists. We know that we have to define bind. It’s signature, in this case, would be:

bind :: [a] -> (a -> [b]) -> [b]

It takes a list (which is the result of a previous function) and a continuation, (a -> [b]), and must produce another list. The obvious thing is to apply the continuation to each element of the input list. But since the continuation also produces a list, we’ll end up with a list of lists. To get a single list as the result, we’ll simply concatenate the sublists. Here’s the final implementation:

xs >>= cont = concat (map cont xs)

The application of the continuation to each element of the input list is done using map.

Is this a coincidence that we were able to define bind in therms of join and fmap? Not at all. The general formula for converting the functor definition of a monad to a Kleisli triple is:

  • Take the object-mapping part of the functor (the type constructor)
  • Define bind as
    bind x f = join ((fmap f) x))

    where fmap is the part of the functor that maps morphisms.

  • Define return as unit

Now we know how to move between those two definitions back and forth.

Just as in the case of Maybe, we can apply the do notation to functions returning lists:

toss2Dice = do
    n <- tossDie
    m <- tossDie
    return (n + m)

Here, tossDie returns the list of all possible outcomes of a die toss, and toss2Dice returns the list of all possible sums of the outcomes of a two-die toss.

An interesting observation is that the list monad is closely related to list comprehensions, down to the use of left arrows (in Haskell). For instance, the above example is equivalent to:

toss2Dice = [n + m | n <- tossDie, m <- tossDie]


There is a large class of computations that convert input to output in a non-functional way. Many of those can be expressed as functions that return “enriched” output. This enrichment of the output can be expressed in terms of a type constructor. This type constructor defines the first component of a monad.

Computations have to be composable, so we need a way of composing the enriched functions. This introduces the second component of the monad, the bind family of higher-order functions.

Finally, we should be able to construct functions that return enriched types, and for that we need the third component of the monad, the return family of functions. They convert regular values into their enriched counterparts.

I’ve shown you two examples of Haskell monads in action, but the real treat will come in the third installment of my mini-series. I’ll show you how to define and compose functions that, instead of returning regular values, return functions that calculate those values.

Appendix: Join From Bind

By popular demand (see comments), I decided to explain in more detail the typing of the formula:

join :: M (M a) -> M a
join mmx = mmx >>= id

At first look it seems like id doesn’t have the right signature for the second parameter to bind:

bind :: M a' -> (a' -> M b') -> M b'

(I changed the names of type arguments to a' and b' to avoid confusion.) However, notice that here we are calling bind with the first argument, mmx, of type M (M a). It means that a' in this particular instantiation of bind is (M a). The second argument to bind is id with the signature:
id: c -> c
Here, id will be called with c equal to (M a), so its return type will also be (M a). With those substitutions, bind will also return type (M a):

M (M a) -> (M a -> M a) -> M a

Thus the whole combination (mmx >>= id) ends up with the right type signature for join.


  1. Eugenio Moggi, Notions of Computation and Monads. This is a hard core research paper that started the whole monad movement in functional languages.
  2. Philip Wadler, Monads for Functional Programming. The classic paper introducing monads into Haskell.
  3. Tony Morris, What Does Monad Mean?
  4. Brian Beckman, Don’t fear the Monad
  5. Graham Hutton, Erik Meijer, Monadic Parsing in Haskell.
  6. Brian McNamara, Yannis Smaragdakis, Syntax sugar for FC++: lambda, infix, monads, and more. A C++ template library for functional programming that imitates Haskell pretty closely.

The Monad is like a bellows:
it is empty yet infinitely capable.
The more you use it, the more it produces;
the more you talk about it, the less you understand.

–Monad Te Ching

I don’t know if I’m exaggerating but it seems like every programmer who gets monads posts a tutorial about them. (And each post begins with: There’s already a lot of monad tutorials on the Internet, but…) The reason is that getting monads it’s like a spiritual experience that you want to share with others.

When facing a monad, people often behave like the three blind men describing an elephant. You’ll see monads described as containers and monads described as actions. Some people see them as a cover-up for side effects, others as examples of endofunctors in Category Theory.

Monads are hard to describe because they don’t correspond to anything in our everyday experience. Compare this with Objects in Object-Oriented programming. Even an infant knows what an object is (something you can put in your mouth). What do you do with a monad?

But first, let me answer the pertinent question:

Why Bother?

Monads enable pure functional programmers to implement mutation, state, I/O, and a plethora of other things that are not functions. Well, you might say, they brought it on themselves. They tied their hands behind their backs and now they’re bragging that they can type with their toes. Why should we pay attention?

The thing is, all those non-functional things that we are so used to doing in imperative programming are also sources of a lot of troubles. Take side effects for instance. Smart programmers (read: the ones who burnt their fingers too many times) try to minimize the use of global and static variables for fear of side effects. That’s doable if you know what you’re doing. But the real game changer is multithreading. Controlling the sharing of state between threads is not just good programming practice– it’s a survival skill. Extreme programming models are in use that eliminate sharing altogether, like Erlang’s full isolation of processes and its restriction of message passing to values.

Monads stake the ground between total anarchy of imperative languages and the rigid dictatorship of Erlang-like isolationism. They don’t prohibit sharing or side effects but let you control them. And, since the control is exercised through the type system, a program that uses monads can be checked for correctness by the compiler. Considering how hard it it to test for data races in imperative programs, I think it’s worth investing some time to learn monads.

There is also a completely different motivation: metaprogramming. The template language used for metaprogramming in C++ is a pure functional language (see my blog post, What does Haskell have to do with C++?). If monads are so important in functional programming, they must also pop up in C++ metaprogramming. And indeed they do. I hope to discuss this topic in a future post.

So what’s a monad?

A Categorical Answer

If you don’t know anything about category theory, don’t get intimidated. This is really simple stuff and it will clarify a lot of things, not to mention earning you some bragging rights. My main goal is to share some intuitions from mathematics that will build foundations for a deeper understanding of monads in programming. In this installment I will explain categories, functors, and endofunctors, leading up to monads. I will give examples taken both from everyday life and from programming. I will really get into monads and their practical applications in the next installment, so be patient.


A category is a natural extension of our notion of sets and functions. The generalization of a set in a category is called an object (a pretty neutral term with little semantic ballast), and the generalization of a function is called a morphism. In fact, the standard example of a category is the category of sets and functions called (capital letter) Set.

A morphism (read “function”) goes from one object (read “set”) to another. Mathematical functions like sin or exp usually go from the set of real numbers to the set of real numbers. But you may also define functions like isPrime that go from natural numbers to Booleans, or a function price that goes from a set of goods to the set of numbers.

The only thing a mathematician needs to know about morphisms is that they can be composed. If you have a morphism from A to B, A->B, and another going from B to C, B->C, then they can be composed to a morphism from A to C, A->C. And just like the standard composition of functions, morphism composition must be associative, so we don’t need parentheses when composing more than two of them.

Actually, two things. There must be, for every object, a special morphism called identity that essentially does nothing and when composed with any other morphism reproduces the same morphism.

Just to throw you off the track, a category doesn’t have to be built on sets and functions. You can easily construct simple categories from blobs and arrows. Fig 1 shows such a category that contains two objects and four morphisms: arrows between them (formally, those arrows are ordered pairs of objects so, for instance, f is a pair (A, B)). You can easily check that any two morphisms can be composed and that the two moprphisms iA and iB serve as identities.

Fig 1. A simple category with two objects and four morphisms.

That’s it! Hopefully I have just convinced you that a category is not a big deal. But let’s get down to Earth. The one category that’s really important in programming languages is the category of types and functions, in particular its Haskell version called Hask. There usually is a finite set of basic types like integers or Booleans, and an infinite set of derived types, like lists of integers, functions from integers to Booleans, etc. In Hask, a type is just a set of values. For instance, the type Char is a set {‘a’, ‘b’, ‘c’, … }.

So, in the category Hask, types are objects and functions are morphisms. Indeed, a function maps one type into another (forget for a moment functions of multiple arguments– they can be modeled with currying– and polymorphic functions– they are families of functions). And these are functions in the functional-programming sense: called with the same values they return the same values–no side effects allowed.

Function composition is just passing the result of one function as an argument to another. The identity function takes x and immediately returns it back.

This is all fine, but what’s in it for me, you might ask. So here’s the first insight and a moment of Zen. If there is one thing that you can call the essence of programming, it’s composability. In any style of programming you always compose your program from smaller pieces, and those pieces from even smaller pieces, and so on. That’s why categories with their composable morphisms are so important. The essence of Lego blocks is the way they fit together, their composability, not the color or size. The essence of functional programming is how functions work together: how you can build larger functions from smaller ones.

Every category is defined by its choice of objects and morphisms. But is there something that can characterize a given category that’s independent of its choice of particular objects and morphisms? How do you expose the inner structure of a particular category? Mathematicians know exactly how to do that. You have to be able to map categories into other categories while preserving some constraints imposed by the way morphisms are attached to objects and the way they compose. Such maps let you find similarities between categories and catalog different kinds of categories. That’s when things get really interesting.


A functor, F, is a map from one category to another: it maps objects into objects and morphisms into morphisms. But it can’t do it in a haphazard way because that would destroy the very structures that we are after. So we must impose some “obvious” (mathematicians love that word) constraints.

First of all, if you have a morphism between two objects in the first category then it better be mapped into a morphism between the corresponding objects in the second category. Fig 2 explains this diagrammatically. Object A is mapped into F(A), object B into F(B). A morphism f from A to B is mapped into a morphism F(f) from F(A) to F(B). Mathematicians say that such a diagram must commute, that is the result must be the same whether you go from A to F(A) and then apply F(f), or first apply f and then go from B to F(B).

Functor diagram

Fig 2. Diagram showing the action of a functor F on objects A and B and a morphism f. The bottom part lives in F's domain (source) category, the top part in its codomain (the target).

Moreover, such mapping should preserve the composition property of morphisms. So if morphism h is a composition of f and g, then F(h) must be a composition of F(f) and F(g). And, of course, the functor must map identity morphisms into identity morphisms.

To get a feel for how constrained functors are by these conditions, consider how you could map the category in Fig 1 into itself (such a functor just rearranges things inside one category). There are two trivial mappings that collapse both objects into one (either A or B), and turn all morphisms into identity. Then there is the identity functor that maps both objects into themselves and all morphisms into themselves. Finally, there is just one “interesting” functor that maps A into B and B into A with f and g switching roles. Now imagine a similar category but with the g arrow removed (yes, it’s still a category). Suddenly there is no functor other than the collapsing ones between Fig 1 and that new category. That’s because the two categories have completely different structure.

Let me now jump into more familiar territory. Since we are mostly interested in one category, Hask, let me define a functor that maps that category into itself (such functors are called endofunctors). An object in Hask is a type, so our functor must map types into types. The way to look at it is that a functor in Hask constructs one type from another– it’s a type constructor. Don’t get confused by the name: a type constructor creates a new type in your program, but that type has already existed in Hask.

A classical example is the list type constructor. Given any type it constructs a list of that type. Type Integer is mapped into list of integers or, in Haskell notation, [Integer]. Notice that this is not a map defined on integer values, like 1, 2, or 3. It also doesn’t add a new type to Hask— the type [Integer] is already there. It just maps one type into another. For C++ programmers: think of mapping type T into a container of T; for instance, std::vector<T>.

Mapping the types is the easy part, what about functions? We have to find a way to take a particular function and map it into a function on lists. That’s also easy: apply the function to each element of the list in turn. There is a (higher level) function in Haskel that does it. It’s called map and it takes a function and a list and returns a new list (or, because of currying, you may say that it takes a function and returns a function acting on lists). In C++ there is a corresponding template function called std::transform (well, it takes two iterators and a function object, but the idea is the same).

Mathematicians often use diagrams to illustrate the properties of morphisms and functors (see Fig 2). The arrows for morphisms are usually horizontal, while the arrows for functors are vertical (going up). That’s why the mapping of morphisms under a functor is often called lifting. You can take a function operating on integers and “lift it” (using a functor) to a function operating on lists of integers, and so on.

The list functor obviously preserves function composition and identity (I’ll leave it as an easy but instructive exercise for the reader).

And now for another moment of Zen. What’s the second most important property of programming? Reusability! Look what we have just done: We took all the functions we’ve implemented so far and lifted them to the level of lists. We’ve got functions operating on lists essentially for free (well, we’ve got a small but important subset of those functions). And the same trick may be applied to all kinds of containers, arrays, trees, queues, unique_ptrs and more.

It’s all beautiful, but you don’t really need category theory to apply functions to lists. Still it’s always good to see patterns in programming, and this one is definitely a keeper. The real revolution starts with monads. And, guess what, the list functor is actually a monad. You just need a few more ingredients.

What’s the intuition behind the statement that mappings expose the structure of the system? Consider the schematic of the London underground in Fig 3. It’s just a bunch of circles and lines. It’s only relevant because there is a mapping between the city of London and this schematic. The circles correspond to tube stations and the lines to train connections. Most importantly, if trains run between two stations, the corresponding circles in the diagram are connected by lines and vice versa: these are the constraints that the mapping preserves. The schematic shows a certain structure that exists in London (mostly hidden underground) which is made apparent by the mapping.

Fig 3. The schematic map of London underground system.

Interestingly, what I’m doing here is also mapping: London and the underground map correspond to two categories. Trains stations/circles are objects and train connections/lines are morphism. How’s that for an example?


Mathematicians love mappings that preserve “obvious” constraints. As I explained, such mappings abstract inner structures away from the details of implementation. But you can also learn a lot about structure by studying non-trivial mappings into itself. Functors that map a category into itself are called endofunctors (like endo-scopes they let you look inside things). If functors expose similarities, endofunctors expose self-similarities. Take one look at the fractal fern, Fig 4, and you’ll understand how powerful self-similarity can be.

Fractal Fern

Fig 4. This fractal fern was generated using just four endomorphisms.

With a little bit of imagination you can see the list functor exposing fern-like structures inside Hask (Fig 5). Chars fan out into lists of Chars, which then fan out into lists of lists of Chars, and so on, ad infinitum. Horizontal structures described by functions from Char to Bool are reflected at higher and higher levels as functions on lists, lists of lists, etc.

Fig 5. The action of the list type constructor reveals fractal-like structure inside Hask. The functor lifts things up, the functions act horizontally.

A C++ template that takes a type parameter could be considered a type constructor. How likely is it that it also defines a functor (loosely speaking– C++ is not as mathematized as Haskell)? You have to ask yourself: Is the type parameter constrained in any way? It’s often hard to say, because type constraints are implicit in the body of a template and are tested only during instantiation. For instance, the type parameter for a std::vector must be copyable. That eliminates, for instance, classes that have private or deleted (in C++0x) copy constructors. This is not a problem though, because copyable types form a subcategory (I’m speaking really loosely now). The important thing is that a vector of copyable is itself copyable, so the “endo-” part of the endomorphism holds. In general you want to be able to feed the type created by the type constructor back to the type constructor, as in std::vector<std::vector<Foo>>. And, of course, you have to be able to lift functions in a generic way too, as in std::transform.


Ooh, Monads!
–Haskell Simpson

It’s time to finally lift the veil. I’ll start with the definition of a monad that builds on the previous sections and is mostly used by mathematicians. There is another one that’s less intuitive but easier to use in programming. I’ll leave that one for later.

A monad is an endofunctor together with two special families of morphisms, both going vertically, one up and one down (for “directions” see Fig 5). The one going up is called unit and the one going down is called join.

Now we are juggling a lot of mappings so let’s slow down to build some intuition. Remember, a functor maps objects: in our case, types, which are sets of values. The functor doesn’t see what’s inside the objects; morphisms, in general, do. In our case, a morphism is a function that maps values of one type into values of another type. Our functors, which are defined by type constructors, usually map poorer types into richer types; in the sense that type Bool is a set that contains just two elements, True and False, but type [Bool] contains infinitely many lists of True and False.

Unit takes a value from the poorer type, then picks one value from the richer type, and pronounces the two roughly equivalent. Such a rough equivalent of True from the Bool object is the one-element list [True] from the [Bool] object. Similarly, unit would map False into [False]. It would also map integer 5 into [5] and so on.

Unit can be thought of as immersing values from a lower level into the higher level in the most natural way possible. By the way, in programming we call a family of functions defined for any type a polymorphic function. In C++, we would express unit as a template, like this:

template<class T>
std::vector<T> unit(T value) {
    std::vector<T> vec;
    return vec;

To explain join, imagine the functor acting twice. For instance, from a given type T the list functor will first construct the type [T] (list of T), and then [[T]] (list of list of T). Join removes one layer of “listiness” by joining the sub-lists. Plainly speaking, it just concatenates the inner lists. Given, for instance, [[a, b], [c], [d, e]], it produces [a, b, c, d, e]. It’s a many-to-one mapping from the richer type to the poorer type and the type-parameterized family of joins also forms a polymorphic function (a template, in C++).

There are a few monadic axioms that define the properties of unit and join (for instance that unit and join cancel each other), but I’m not going to elaborate on them. The important part is that the existence of unit and join imposes new constraints on the endofunctor and thus exposes even more structure.

Mathematicians look at join as the grandfather of all multiplication with unit being its neutral element. It’s heaven for mathematicians because multiplication leads to algebraic structures and indeed monads are great for constructing algebras and finding their hidden properties.

Unlike mathematicians, we programmers are not that interested in algebraic structures. So there must be something else that makes monads such a hit. As I mentioned in the beginning, in programming we often face problems that don’t naturally translate into the functional paradigm. There are some types of computations that are best expressed in imperative style. It doesn’t mean they can’t be translated into functions, it’s just that the translation is somewhat awkward and tedious. Monads provide an elegant tool to do this translation. Monads made possible the absorption and assimilation of imperative programming into functional programming, so much so that some people claim (tongue in cheek?) that Haskell is the best imperative language. And like all things functional monads are bound to turn around and find their place in imperative programming. But that’s material for my next blog post.


Next Page »