This is part 14 of Categories for Programmers. Previously: Free Monoids. See the Table of Contents.

It’s about time we had a little talk about sets. Mathematicians have a love/hate relationship with set theory. It’s the assembly language of mathematics — at least it used to be. Category theory tries to step away from set theory, to some extent. For instance, it’s a known fact that the set of all sets doesn’t exist, but the category of all sets, Set, does. So that’s good. On the other hand, we assume that morphisms between any two objects in a category form a set. We even called it a hom-set. To be fair, there is a branch of category theory where morphisms don’t form sets. Instead they are objects in another category. Those categories that use hom-objects rather than hom-sets are called enriched categories. In what follows, though, we’ll stick to categories with good old-fashioned hom-sets.

A set is the closest thing to a featureless blob you can get outside of categorical objects. A set has elements, but you can’t say much about these elements. If you have a finite set, you can count the elements. You can kind of count the elements of an infinite set using cardinal numbers. The set of natural numbers, for instance, is smaller than the set of real numbers, even though both are infinite. But, maybe surprisingly, the set of rational numbers is the same size as the set of natural numbers.

Other than that, all the information about sets can be encoded in functions between them — especially the invertible ones called isomorphisms. For all intents and purposes isomorphic sets are identical. Before I summon the wrath of foundational mathematicians, let me explain that the distinction between equality and isomorphism is of fundamental importance. In fact it is one of the main concerns of the latest branch of mathematics, the Homotopy Type Theory (HoTT). I’m mentioning HoTT because it’s a pure mathematical theory that takes inspiration from computation, and one of its main proponents, Vladimir Voevodsky, had a major epiphany while studying the Coq theorem prover. The interaction between mathematics and programming goes both ways.

The important lesson about sets is that it’s okay to compare sets of unlike elements. For instance, we can say that a given set of natural transformations is isomorphic to some set of morphisms, because a set is just a set. Isomorphism in this case just means that for every natural transformation from one set there is a unique morphism from the other set and vice versa. They can be paired against each other. You can’t compare apples with oranges, if they are objects from different categories, but you can compare sets of apples against sets of oranges. Often transforming a categorical problem into a set-theoretical problem gives us the necessary insight or even lets us prove valuable theorems.

The Hom Functor

Every category comes equipped with a canonical family of mappings to Set. Those mappings are in fact functors, so they preserve the structure of the category. Let’s build one such mapping.

Let’s fix one object a in C and pick another object x also in C. The hom-set C(a, x) is a set, an object in Set. When we vary x, keeping a fixed, C(a, x) will also vary in Set. Thus we have a mapping from x to Set.
[Figure: Hom-Set]

If we want to stress the fact that we are considering the hom-set as a mapping in its second argument, we use the notation:

C(a, -)

with the dash serving as the placeholder for the argument.

This mapping of objects is easily extended to the mapping of morphisms. Let’s take a morphism f in C between two arbitrary objects x and y. The object x is mapped to the set C(a, x), and the object y is mapped to C(a, y), under the mapping we have just defined. If this mapping is to be a functor, f must be mapped to a function between the two sets:

C(a, x) -> C(a, y)

Let’s define this function point-wise, that is for each argument separately. For the argument we should pick an arbitrary element of C(a, x) — let’s call it h. Morphisms are composable if they match end to end. It so happens that the target of h matches the source of f, so their composition:

f ∘ h :: a -> y

is a morphism going from a to y. It is therefore a member of C(a, y).

[Figure: Hom Functor]

We have just found our function from C(a, x) to C(a, y), which can serve as the image of f. If there is no danger of confusion, we’ll write this lifted function as:

C(a, f)

and its action on a morphism h as:

C(a, f) h = f ∘ h

Since this construction works in any category, it must also work in the category of Haskell types. In Haskell, the hom-functor is better known as the Reader functor:

type Reader a x = a -> x
instance Functor (Reader a) where
    fmap f h = f . h
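
Strictly speaking, GHC won’t accept an instance declared for a type synonym like Reader. The compilable counterpart is the function-type instance that already ships with the Prelude, which in essence reads:

instance Functor ((->) a) where
    fmap f h = f . h

With it, fmap (+ 1) (* 2) 5 evaluates to 11: double first, then increment.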

Now let’s consider what happens if, instead of fixing the source of the hom-set, we fix the target. In other words, we’re asking the question if the mapping

C(-, a)

is also a functor. It is, but instead of being covariant, it’s contravariant. That’s because the same kind of matching of morphisms end to end results in postcomposition by f; rather than precomposition, as was the case with C(a, -).

We have already seen this contravariant functor in Haskell. We called it Op:

type Op a x = x -> a
instance Contravariant (Op a) where
    contramap f h = h . f
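
(As with Reader, a type synonym can’t carry an instance. A compilable version wraps the function in a newtype — essentially the Op type that Data.Functor.Contravariant provides, sketched here:)

import Data.Functor.Contravariant (Contravariant (..))

newtype Op a x = Op (x -> a)

instance Contravariant (Op a) where
    contramap f (Op h) = Op (h . f)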

Finally, if we let both objects vary, we get a profunctor C(-, =), which is contravariant in the first argument and covariant in the second (to underline the fact that the two arguments may vary independently, we use a double dash as the second placeholder). We have seen this profunctor before, when we talked about functoriality:

instance Profunctor (->) where
  dimap ab cd bc = cd . bc . ab
  lmap = flip (.)
  rmap = (.)

The important lesson is that this observation holds in any category: the mapping of objects to hom-sets is functorial. Since contravariance is equivalent to a mapping from the opposite category, we can state this fact succinctly as:

C(-, =) :: C^op × C -> Set

Representable Functors

We’ve seen that, for every choice of an object a in C, we get a functor from C to Set. This kind of structure-preserving mapping to Set is often called a representation. We are representing objects and morphisms of C as sets and functions in Set.

The functor C(a, -) itself is sometimes called representable. More generally, any functor F that is naturally isomorphic to the hom-functor, for some choice of a, is called representable. Such a functor must necessarily be Set-valued, since C(a, -) is.

I said before that we often think of isomorphic sets as identical. More generally, we think of isomorphic objects in a category as identical. That’s because objects have no structure other than their relation to other objects (and themselves) through morphisms.

For instance, we’ve previously talked about the category of monoids, Mon, that was initially modeled with sets. But we were careful to pick as morphisms only those functions that preserved the monoidal structure of those sets. So if two objects in Mon are isomorphic, meaning there is an invertible morphism between them, they have exactly the same structure. If we peeked at the sets and functions that they were based upon, we’d see that the unit element of one monoid was mapped to the unit element of another, and that a product of two elements was mapped to the product of their mappings.

The same reasoning can be applied to functors. Functors between two categories form a category in which natural transformations play the role of morphisms. So two functors are isomorphic, and can be thought of as identical, if there is an invertible natural transformation between them.

Let’s analyze the definition of the representable functor from this perspective. For F to be representable we require that: There be an object a in C; one natural transformation α from C(a, -) to F; another natural transformation, β, in the opposite direction; and that their composition be the identity natural transformation.

Let’s look at the component of α at some object x. It’s a function in Set:

αx :: C(a, x) -> F x

The naturality condition for this transformation tells us that, for any morphism f from x to y, the following diagram commutes:

F f ∘ αx = αy ∘ C(a, f)

In Haskell, we would replace natural transformations with polymorphic functions:

alpha :: forall x. (a -> x) -> F x

with the optional forall quantifier. The naturality condition

fmap f . alpha = alpha . fmap f

is automatically satisfied due to parametricity (it’s one of those theorems for free I mentioned earlier), with the understanding that fmap on the left is defined by the functor F, whereas the one on the right is defined by the reader functor. Since fmap for reader is just function precomposition, we can be even more explicit. Acting on h, an element of C(a, x), the naturality condition simplifies to:

fmap f (alpha h) = alpha (f . h)

The other transformation, beta, goes the opposite way:

beta :: forall x. F x -> (a -> x)

It must respect naturality conditions, and it must be the inverse of α:

α ∘ β = id = β ∘ α

We will see later that a natural transformation from C(a, -) to any Set-valued functor always exists, as long as F a is non-empty (Yoneda’s lemma), but it’s not necessarily invertible.

Let me give you an example in Haskell with the list functor and Int as a. Here’s a natural transformation that does the job:

alpha :: forall x. (Int -> x) -> [x]
alpha h = map h [12]

I have arbitrarily picked the number 12 and created a singleton list with it. I can then fmap the function h over this list and get a list of the type returned by h. (There are actually as many such transformations as there are lists of integers.)

The naturality condition is equivalent to the composability of map (the list version of fmap):

map f (map h [12]) = map (f . h) [12]

But if we tried to find the inverse transformation, we would have to go from a list of arbitrary type x to a function returning x:

beta :: forall x. [x] -> (Int -> x)

You might think of retrieving an x from the list, e.g., using head, but that won’t work for an empty list. Notice that there is no choice for the type a (in place of Int) that would work here. So the list functor is not representable.

Remember when we talked about Haskell (endo-) functors being a little like containers? In the same vein we can think of representable functors as containers for storing memoized results of function calls (the members of hom-sets in Haskell are just functions). The representing object, the type a in C(a, -), is thought of as the key type, with which we can access the tabulated values of a function. The transformation we called α is called tabulate, and its inverse, β, is called index. Here’s a (slightly simplified) Representable class definition:

class Representable f where
   type Rep f :: *
   tabulate :: (Rep f -> x) -> f x
   index    :: f x -> Rep f -> x

Notice that the representing type, our a, which is called Rep f here, is part of the definition of Representable. The star just means that Rep f is a type (as opposed to a type constructor, or other more exotic kinds).

Infinite lists, or streams, which cannot be empty, are representable.

data Stream x = Cons x (Stream x)

You can think of them as memoized values of a function taking an Integer as an argument. (Strictly speaking, I should be using non-negative natural numbers, but I didn’t want to complicate the code.)

To tabulate such a function, you create an infinite stream of values. Of course, this is only possible because Haskell is lazy. The values are evaluated on demand. You access the memoized values using index:

instance Representable Stream where
    type Rep Stream = Integer
    tabulate f = Cons (f 0) (tabulate (f . (+1)))
    index (Cons b bs) n = if n == 0 then b else index bs (n - 1)
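
For instance (a small usage sketch; the doubling function is my arbitrary pick), we can tabulate a function once and read its memoized values back with index:

doubles :: Stream Integer
doubles = tabulate (* 2)

-- index doubles 7 evaluates to 14; thanks to laziness, each entry of
-- the stream is computed only when it's first accessed.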

It’s interesting that you can implement a single memoization scheme to cover a whole family of functions with arbitrary return types.

Representability for contravariant functors is similarly defined, except that we keep the second argument of C(-, a) fixed. Or, equivalently, we may consider functors from C^op to Set, because C^op(a, -) is the same as C(-, a).

There is an interesting twist to representability. Remember that, in cartesian closed categories, hom-sets can internally be treated as exponential objects. The hom-set C(a, x) is equivalent to the exponential object x^a, and for a representable functor F we can write:

(-)^a = F

Let’s take the logarithm of both sides, just for kicks:

a = log F

Of course, this is a purely formal transformation, but if you know some of the properties of logarithms, it is quite helpful. In particular, it turns out that functors that are based on product types can be represented with sum types, and that sum-type functors are not in general representable (example: the list functor).

Finally, notice that a representable functor gives us two different implementations of the same thing — one a function, one a data structure. They have exactly the same content — the same values are retrieved using the same keys. That’s the sense of “sameness” I was talking about. Two naturally isomorphic functors are identical as far as their contents are concerned. On the other hand, the two representations are often implemented differently and may have different performance characteristics. Memoization is used as a performance enhancement and may lead to substantially reduced run times. Being able to generate different representations of the same underlying computation is very valuable in practice. So, surprisingly, even though it’s not concerned with performance at all, category theory provides ample opportunities to explore alternative implementations that have practical value.

Challenges

  1. Show that the hom-functors map identity morphisms in C to corresponding identity functions in Set.
  2. Show that Maybe is not representable.
  3. Is the Reader functor representable?
  4. Using Stream representation, memoize a function that squares its argument.
  5. Show that tabulate and index for Stream are indeed the inverse of each other. (Hint: use induction.)
  6. The functor:
    data Pair a = Pair a a

    is representable. Can you guess the type that represents it? Implement tabulate and index.


Next: The Yoneda Lemma.

Acknowledgments

I’d like to thank Gershom Bazerman for checking my math and logic, and André van Meulebrouck, who has been volunteering his editing help throughout this series of posts.


This is part 13 of Categories for Programmers. Previously: Limits and Colimits. See the Table of Contents.

Monoids are an important concept in both category theory and in programming. Categories correspond to strongly typed languages, monoids to untyped languages. That’s because in a monoid you can compose any two arrows, just as in an untyped language you can compose any two functions (of course, you may end up with a runtime error when you execute your program).

We’ve seen that a monoid may be described as a category with a single object, where all logic is encoded in the rules of morphism composition. This categorical model is fully equivalent to the more traditional set-theoretical definition of a monoid, where we “multiply” two elements of a set to get a third element. This process of “multiplication” can be further dissected into first forming a pair of elements and then identifying this pair with an existing element — their “product.”

What happens when we forgo the second part of multiplication — the identification of pairs with existing elements? We can, for instance, start with an arbitrary set, form all possible pairs of elements, and call them new elements. Then we’ll pair these new elements with all possible elements, and so on. This is a chain reaction — we’ll keep adding new elements forever. The result, an infinite set, will be almost a monoid. But a monoid also needs a unit element and the law of associativity. No problem, we can add a special unit element and identify some of the pairs — just enough to support the unit and associativity laws.

Let’s see how this works in a simple example. Let’s start with a set of two elements, {a, b}. We’ll call them the generators of the free monoid. First, we’ll add a special element e to serve as the unit. Next we’ll add all the pairs of elements and call them “products”. The product of a and b will be the pair (a, b). The product of b and a will be the pair (b, a), the product of a with a will be (a, a), the product of b with b will be (b, b). We can also form pairs with e, like (a, e), (e, b), etc., but we’ll identify them with a, b, etc. So in this round we’ll only add (a, a), (a, b), (b, a), and (b, b), and end up with the set {e, a, b, (a, a), (a, b), (b, a), (b, b)}.

In the next round we’ll keep adding elements like: (a, (a, b)), ((a, b), a), etc. At this point we’ll have to make sure that associativity holds, so we’ll identify (a, (b, a)) with ((a, b), a), etc. In other words, we won’t be needing internal parentheses.

You can guess what the final result of this process will be: we’ll create all possible lists of as and bs. In fact, if we represent e as an empty list, we can see that our “multiplication” is nothing but list concatenation.

This kind of construction, in which you keep generating all possible combinations of elements, and perform the minimum number of identifications — just enough to uphold the laws — is called a free construction. What we have just done is to construct a free monoid from the set of generators {a, b}.

Free Monoid in Haskell

A two-element set in Haskell is equivalent to the type Bool, and the free monoid generated by this set is equivalent to the type [Bool] (list of Bool). (I am deliberately ignoring problems with infinite lists.)

A monoid in Haskell is defined by the type class:

class Monoid m where
    mempty  :: m
    mappend :: m -> m -> m

This just says that every Monoid must have a neutral element, which is called mempty, and a binary function (multiplication) called mappend. The unit and associativity laws cannot be expressed in Haskell and must be verified by the programmer every time a monoid is instantiated.

The fact that a list of any type forms a monoid is described by this instance definition:

instance Monoid [a] where
    mempty  = []
    mappend = (++)

It states that an empty list [] is the unit element, and list concatenation (++) is the binary operation.
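
The laws themselves, which Haskell can’t enforce, can at least be spot-checked on random inputs. Here’s a sketch using the QuickCheck library (an extra dependency I’m assuming, not something from the text) to test the list instance:

import Test.QuickCheck

prop_leftUnit, prop_rightUnit :: [Int] -> Bool
prop_leftUnit  xs = mappend mempty xs == xs
prop_rightUnit xs = mappend xs mempty == xs

prop_assoc :: [Int] -> [Int] -> [Int] -> Bool
prop_assoc xs ys zs =
    mappend (mappend xs ys) zs == mappend xs (mappend ys zs)

main :: IO ()
main = do
    quickCheck prop_leftUnit
    quickCheck prop_rightUnit
    quickCheck prop_assoc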

As we have seen, a list of type a corresponds to a free monoid with the set a serving as generators. The set of natural numbers with multiplication is not a free monoid, because we identify lots of products. Compare for instance:

2 * 3 = 6
[2] ++ [3] = [2, 3] -- not the same as [6]

That was easy, but the question is, can we perform this free construction in category theory, where we are not allowed to look inside objects? We’ll use our workhorse: the universal construction.

The second interesting question is, can any monoid be obtained from some free monoid by identifying more than the minimum number of elements required by the laws? I’ll show you that this follows directly from the universal construction.

Free Monoid Universal Construction

If you recall our previous experiences with universal constructions, you might notice that it’s not so much about constructing something as about selecting an object that best fits a given pattern. So if we want to use the universal construction to “construct” a free monoid, we have to consider a whole bunch of monoids from which to pick one. We need a whole category of monoids to choose from. But do monoids form a category?

Let’s first look at monoids as sets equipped with additional structure defined by unit and multiplication. We’ll pick as morphisms those functions that preserve the monoidal structure. Such structure-preserving functions are called homomorphisms. A monoid homomorphism must map the product of two elements to the product of the mapping of the two elements:

h (a * b) = h a * h b

and it must map unit to unit.

For instance, consider a homomorphism from lists of integers to integers. If we map [2] to 2 and [3] to 3, we have to map [2, 3] to 6, because concatenation

[2] ++ [3] = [2, 3]

becomes multiplication

2 * 3 = 6
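
In Haskell, this particular homomorphism is just product, mapping the list monoid to the multiplicative monoid of integers (a small sketch):

h :: [Int] -> Int
h = product

-- h ([2] ++ [3]) == h [2] * h [3]  -- both sides equal 6
-- h []           == 1              -- unit maps to unit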

Now let’s forget about the internal structure of individual monoids, and only look at them as objects with corresponding morphisms. You get a category Mon of monoids.

Okay, maybe before we forget about internal structure, let us notice an important property. Every object of Mon can be trivially mapped to a set. It’s just the set of its elements. This set is called the underlying set. In fact, not only can we map objects of Mon to sets, but we can also map morphisms of Mon (homomorphisms) to functions. Again, this seems sort of trivial, but it will become useful soon. This mapping of objects and morphisms from Mon to Set is in fact a functor. Since this functor “forgets” the monoidal structure — once we are inside a plain set, we no longer distinguish the unit element or care about multiplication — it’s called a forgetful functor. Forgetful functors come up regularly in category theory.

We now have two different views of Mon. We can treat it just like any other category with objects and morphisms. In that view, we don’t see the internal structure of monoids. All we can say about a particular object in Mon is that it connects to itself and to other objects through morphisms. The “multiplication” table of morphisms — the composition rules — is derived from the other view: monoids-as-sets. By going to category theory we haven’t lost this view completely — we can still access it through our forgetful functor.

To apply the universal construction, we need to define a special property that would let us search through the category of monoids and pick the best candidate for a free monoid. But a free monoid is defined by its generators. Different choices of generators produce different free monoids (a list of Bool is not the same as a list of Int). Our construction must start with a set of generators. So we’re back to sets!

That’s where the forgetful functor comes into play. We can use it to X-ray our monoids. We can identify the generators in the X-ray images of those blobs. Here’s how it works:

We start with a set of generators, x. That’s a set in Set.

The pattern we are going to match consists of a monoid m — an object of Mon — and a function p in Set:

p :: x -> U m

where U is our forgetful functor from Mon to Set. This is a weird heterogeneous pattern — half in Mon and half in Set.

The idea is that the function p will identify the set of generators inside the X-ray image of m. It doesn’t matter that functions may be lousy at identifying points inside sets (they may collapse them). It will all be sorted out by the universal construction, which will pick the best representative of this pattern.

[Figure: Monoid Pattern]

We also have to define the ranking among candidates. Suppose we have another candidate: a monoid n and a function that identifies the generators in its X-ray image:

q :: x -> U n

We’ll say that m is better than n if there is a morphism of monoids (that’s a structure-preserving homomorphism):

h :: m -> n

whose image under U (remember, U is a functor, so it maps morphisms to functions) factorizes through p:

q = U h . p

If you think of p as selecting the generators in m; and q as selecting “the same” generators in n; then you can think of h as mapping these generators between the two monoids. Remember that h, by definition, preserves the monoidal structure. It means that a product of two generators in one monoid will be mapped to a product of the corresponding two generators in the second monoid, and so on.

[Figure: Monoid Ranking]

This ranking may be used to find the best candidate — the free monoid. Here’s the definition:

We’ll say that m (together with the function p) is the free monoid with the generators x if and only if there is a unique morphism h from m to any other monoid n (together with the function q) that satisfies the above factorization property.

Incidentally, this answers our second question. The function U h is the one that has the power to collapse multiple elements of U m to a single element of U n. This collapse corresponds to identifying some elements of the free monoid. Therefore any monoid with generators x can be obtained from the free monoid based on x by identifying some of the elements. The free monoid is the one where only the bare minimum of identifications have been made.
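
In Haskell, this universal property is captured (up to the usual caveats about infinite lists) by foldMap: the free monoid on x is [x], the function p embeds a generator as a singleton list, and foldMap q is the unique homomorphism h. A sketch:

import Data.Monoid (Sum (..))

-- p identifies the generators inside the free monoid [x]
p :: x -> [x]
p a = [a]

-- for any monoid m and any q :: x -> m, the unique homomorphism
-- factoring q through p is foldMap q
h :: Monoid m => (x -> m) -> [x] -> m
h = foldMap

-- the factorization h q . p == q, checked with (Int, +, 0) as the target:
check :: Bool
check = h Sum (p 5) == Sum (5 :: Int)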

We’ll come back to free monoids when we talk about adjunctions.

Challenges

  1. You might think (as I did, originally) that the requirement that a homomorphism of monoids preserve the unit is redundant. After all, we know that for all a
    h a * h e = h (a * e) = h a

    So h e acts like a right unit (and, by analogy, as a left unit). The problem is that h a, for all a, might only cover a sub-monoid of the target monoid. There may be a “true” unit outside of the image of h. Show that an isomorphism between monoids that preserves multiplication must automatically preserve unit.

  2. Consider a monoid homomorphism from lists of integers with concatenation to integers with multiplication. What is the image of the empty list []? Assume that all singleton lists are mapped to the integers they contain, that is [3] is mapped to 3, etc. What’s the image of [1, 2, 3, 4]? How many different lists map to the integer 12? Is there any other homomorphism between the two monoids?
  3. What is the free monoid generated by a one-element set? Can you see what it’s isomorphic to?

Next: Representable Functors.



Lenses are a fascinating subject. Edward Kmett’s lens library is an indispensable tool in every Haskell programmer’s toolbox. I set out to write this blog post with the goal of describing some new insights into their categorical interpretation, but then I started reviewing all the different formulations of lenses and their relations to each other. So this post turned into a little summary of the theoretical underpinning of lenses.

If you’re already familiar with lenses, you may skip directly to the last section, which describes some new results.

The Data-Centric Picture

[Figure: Lens]

Lenses start as a very simple idea: an accessor/mutator or getter/setter pair — something familiar to every C++, Java, or C# programmer. In Haskell they can be described as two functions:

get :: s -> a
set :: s -> a -> s

Given an object of type s, the first function produces the value of the object’s sub-component of type a — which is the focus of the particular lens. The second function takes the object together with a new value for the sub-component, and produces a modified object.

A simple example is a lens that focuses on the first component of a pair:

get :: (a, a') -> a
get (x, y) = x
set :: (a, a') -> a -> (a, a')
set (x, y) x' = (x', y)

In Haskell we can go even further and define a lens for a polymorphic object, in which the setter changes not only the value of the sub-component, but also its type. This will, of course, also change the type of the resulting object. So, in general we have:

get :: s -> a
set :: s -> b -> t

Our pair example doesn’t change much on the surface:

get :: (a, a') -> a
get (x, y) = x
set :: (a, a') -> b -> (b, a')
set (x, y) x' = (x', y)

The difference is that the type of x' can now be different from the type of x. The first component of the pair is of type a before the update, and of type b (the same as that of x') after it. This way we can turn a pair of, say, (Int, Bool) to a pair of (String, Bool).

We can always go back to the monomorphic version of the lens by choosing b equal to a and t equal to s.

Not every pair of functions like these constitutes a lens. A lens has to obey a few laws (first formulated by Pierce in the database context). In particular, if you get a component after you set it, you should get back the value you just put in:

get . set s = id

If you call set with the value you obtained from get, you should get back the unchanged object:

set s (get s) = s

Finally, a set should overwrite the result of a previous set:

set (set s a) b = set s b
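
For the first-component lens above, the three laws are easy to spot-check at concrete types (a minimal sketch, not a proof):

get :: (Int, Bool) -> Int
get (x, _) = x

set :: (Int, Bool) -> Int -> (Int, Bool)
set (_, y) x' = (x', y)

main :: IO ()
main = do
    let s = (1, True)
    print (get (set s 7) == 7)          -- get after set
    print (set s (get s) == s)          -- set after get
    print (set (set s 7) 9 == set s 9)  -- set overwrites set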

So this is what I would call a “classical” lens. It’s formulated in easy-to-understand programming terms.

The Algebraic Picture

It was Russell O’Connor who noticed that when you refactor the common element from the getter/setter pair, you get an interesting algebraic structure. Instead of writing the lens as two functions, we can write it as one function returning a pair:

s -> (a, b -> t)

Think of this as factoring out the “this” pointer in OO and returning the interface. Of course, the difference is that, in functional programming, the setter does not mutate the original — it returns a new version of the object instead.

Let’s for a moment concentrate on the monomorphic version of the lens — one in which a is the same as b, and s is the same as t. We can define a data structure:

data Store a s = Store a (a -> s)

and rewrite the lens as:

s -> Store a s

For instance, our first-component-of-a-pair lens will take the form:

fstl :: (a, b) -> Store a (a, b)
fstl (x, y) = Store x (\x' -> (x', y))

The first observation is that, for any a, Store a is a functor:

instance Functor (Store a) where
    fmap f (Store x h) = Store x (f . h)

It’s a well-known fact that you can define algebras for a functor, so-called F-algebras. An algebra for a functor f consists of a type s called the carrier type and a function called the action:

alg :: f s -> s

This is almost like a lens (with f replaced by Store a), except that the arrow goes the wrong way. Not to worry: there is a dual notion called a coalgebra. It consists of a carrier type s and a function:

coalg :: s -> f s

Substitute Store a for f and you see that a lens is nothing but a coalgebra for this functor. This is not saying much — we have just given a mathematical name to a programming construct, no big deal. Except that Store a is more than a functor — it’s also a comonad.

What’s a comonad? It’s a monad with the arrows reversed.

You know that you can define a monad in Haskell using return and join. Reverse the arrows on those two, and you get extract and duplicate, the two functions that define a comonad.

class (Functor w) => Comonad w where
    extract :: w a -> a
    duplicate :: w a -> w (w a)

Here, w is a type constructor that is also a functor.

This is how you can implement those two functions for Store a:

instance Comonad (Store a) where
    -- extract :: Store a s -> s
    extract (Store x h) = h x
    -- duplicate :: Store a s -> Store a (Store a s)
    duplicate (Store x h) = Store x (\y -> Store y h)

So now we have two structures: a comonad w and a coalgebra:

type Coalgebra w s = s -> w s

A lens is a special case of this coalgebra where w is Store a.

Every time you have two structures, you may legitimately ask the question: Are they compatible? Just by looking at types, you may figure out some obvious compatibility conditions. In particular, since they go the opposite way, it would make sense for extract to undo the action of coalg:

coalg :: Coalgebra w s
extract . coalg = id

[Figure: Coalgebra Law 1]

Also, duplicating the result of coalg should be the same as applying coalg twice (the second time lifted by fmap):

fmap coalg . coalg = duplicate . coalg

[Figure: Coalgebra Law 2]

If these two conditions are satisfied, we call coalg a comonad coalgebra.

And here’s the clencher:

These two conditions when applied to the Store a comonad are equivalent to our earlier lens laws.

Let’s see how it works. First we’ll express the result of our lens coalgebra acting on some object s in terms of get and set (curried set s is a function a->s):

coalg s = Store (get s) (set s)

The first condition extract . coalg = id immediately gives us the law:

set s (get s) = s

When we act with duplicate on coalg s, we get:

Store (get s) (\y -> Store y (set s))

On the other hand, when we fmap our coalg over coalg s, we get:

Store (get s) ((\s' -> Store (get s') (set s')) . set s)

Two functions — the second components of the Store objects in those equations — must be equal when acting on any a. The first function produces:

Store a (set s)

In the second one, we first apply set s to a to get set s a, which we then pass to the lambda to get:

Store (get (set s a)) (set (set s a))

This reproduces the other two (monomorphic) lens laws:

get (set s a) = a

and

set (set s a) = set s

This algebraic construction can be extended to type-changing lenses by replacing Store with its indexed version:

data IStore a b t = IStore a (b -> t)

and the comonad with its indexed counterpart. The indexed store is also called Context in the lens parlance, and an indexed comonad is also called a parametrized comonad.

So what’s an indexed comonad? Let’s start with an indexed functor. It’s a type constructor that takes three types, a, b, and s, and is a functor in the third argument:

class IxFunctor f where
    imap :: (s -> t) -> f a b s -> f a b t

IStore is obviously an indexed functor:

instance IxFunctor IStore where
    -- imap :: (s -> t) -> IStore a b s -> IStore a b t
    imap f (IStore x h) = IStore x (f . h)

An indexed comonad has the indexed versions of extract and duplicate:

class IxComonad w where
    iextract :: w a a t -> t
    iduplicate :: w a b t -> w a c (w c b t)

Notice that iextract is “diagonal” in the index types, whereas the double application of w shares one index, c, between the two applications. This plays very well with the unit and multiplication interpretation of a monad — here it looks just like matrix multiplication (although we are dealing with a comonad rather than a monad).

It’s easy to see that the instantiation of the indexed comonad for IStore works the same way as the instantiation of the comonad for Store. The types just work out that way.

instance IxComonad IStore where
    -- iextract :: IStore a a t -> t
    iextract (IStore a h) = h a
    -- iduplicate :: IStore a b t -> IStore a c (IStore c b t)
    iduplicate (IStore a h) = IStore a (\c -> IStore c h)

There is also an indexed version of a comonad coalgebra, where the coalgebra is replaced by a family of mappings from some carrier type s to w a b t; with the type t determined by s together with the choice of the indexes a and b:

type ICoalg w s t a b = s -> w a b t

The compatibility conditions that make it an (indexed) comonad coalgebra are almost identical to the standard compatibility conditions, except that we have to be careful about the index types. Here’s the first condition:

icoalg_aa :: ICoalg w s t a a
iextract . icoalg_aa = id

[Figure: ICoalgebra Law]

Let’s analyze the types. The inner part has the type:

icoalg_aa :: s -> w a a t

We apply iextract to it, which has the type:

iextract :: w a a t -> t

and get:

iextract . icoalg_aa :: s -> t

The right hand side of the condition has the type:

id :: s -> s

It follows that, for the diagonal components of ICoalg w s t, t must be equal to s. The diagonal part of ICoalg w s t is therefore a family of regular coalgebras.

As we have done with the monomorphic lens, we can express (ICoalg IStore s t a b), when acting on s, in terms of get and set:

icoalg_ab s = IStore (get s) (set s)

But now get s is of type a, while set s is of the type b -> t. We can still apply iextract to the diagonal term IStore a a t as required by the first compatibility condition. When equating the result to id, we recover the lens law:

set s (get s) = s

Similarly, it’s straightforward to see that the second compatibility condition:

icoalg_bc :: ICoalg w s t b c
icoalg_ab :: ICoalg w s t a b
icoalg_ac :: ICoalg w s t a c
imap icoalg_bc . icoalg_ab = iduplicate . icoalg_ac

is equivalent to the other two lens laws.

The Parametric Picture

Despite being theoretically attractive, standard lenses were awkward to use and, in particular, to compose. The breakthrough came when Twan van Laarhoven realized that there is a higher-order representation for them that has very nice compositional properties. Composing lenses to focus on sub-objects of sub-objects turned into simple function composition.

Here’s Twan’s representation (generalized by Russell for the polymorphic case):

type Lens s t a b = forall f. Functor f => (a -> f b) -> (s -> f t)

So a lens is a polymorphic higher order function with a twist. The twist is that it’s polymorphic with respect to a functor rather than a type.

You can think of it this way: the caller provides a function to modify a particular field of s, turning it from type a to f b. What the caller gets back is a function that transforms the whole of s to f t. The idea is that the lens knows how to reconstruct the object, while putting it under a functor f — if you tell it how to modify a field, also under this functor.

For instance, continuing with our example, here’s the van Laarhoven lens that focuses on the first component of a pair:

vL :: Lens (a, c) (b, c) a b
vL h (x, x') = fmap (\y -> (y, x')) (h x)

Here, s is (a, c) and t is (b, c).

To see that the van Laarhoven representation is equivalent to the get/set one, let’s first change the order of arguments and pull s outside of the forall quantifier:

Lens s t a b = s -> (forall f. Functor f => (a -> f b) -> f t)

Here’s how you can read this definition: For a given s, if you give me a function from a to f b, I will produce a value of type f t. And I don’t care what functor you use!

What does it mean not to care about the functor? It means that the lens must be parametrically polymorphic in f. It can’t do case analysis on a functor. It must be implemented using the same formula for f being the list functor, or the Maybe functor, or the Const functor, etc. There’s only one thing all these functors have in common, and that’s the fmap function; so that’s what we are allowed to use in the implementation of the lens.

Now let’s think what we can do with a function a -> f b that we were given. There’s only one thing: apply it to some value of type a. So we must have access to a value of type a. The result of this application is some value of type f b, but we need to produce a value of the type f t. The only way to do it is to have a function of type b->t and sneak it under the functor using fmap (so here’s where the generic functor comes in). We conclude that the implementation of the function:

forall f. Functor f => (a -> f b) -> f t

must be hiding a value of type a and a function b->t. But that’s exactly the contents of IStore a b t. Parametricity tells us that there is an IStore hiding inside the van Laarhoven lens. The lens is equivalent to:

s -> IStore a b t

In fact, with a clever choice of functors we can recover both get and set from the van Laarhoven representation.

First we select our functor to be Const a. Note that the parameter a is not the one over which the functor is defined. Const a takes a second parameter b over which it is functorial. And, even though it takes b as a type parameter, it doesn’t use it at all. Instead, like a magician, it palms an a, and then reveals it at the end of the trick.

newtype Const a b = Const { getConst :: a }

instance Functor (Const a) where
    -- fmap :: (s -> t) -> Const a s -> Const a t
    fmap _ (Const a) = (Const a)

The constructor Const happens to be a function

Const :: a -> Const a b

which has the required form to be the first argument to the lens:

a -> f b

When we apply the lens, let’s call it vL, to the function Const, we get another function:

vL Const :: s -> Const a t

We can apply this function to s, and then, in the final reveal, retrieve the value of a that was smuggled inside Const a:

get vL s = getConst $ vL Const s

Similarly, we can recover set from the van Laarhoven lens using the Identity functor:

newtype Identity a = Identity { runIdentity :: a }

We define:

set vL s x = runIdentity $ vL (Identity . const x) s
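
Putting the pieces together, here is a self-contained sketch (I’ve filled in the Functor instance for Identity, which the text above only names):

{-# LANGUAGE RankNTypes #-}

type Lens s t a b = forall f. Functor f => (a -> f b) -> (s -> f t)

newtype Const a b = Const { getConst :: a }
instance Functor (Const a) where
    fmap _ (Const a) = Const a

newtype Identity a = Identity { runIdentity :: a }
instance Functor Identity where
    fmap f (Identity a) = Identity (f a)

-- the first-component lens from before
vL :: Lens (a, c) (b, c) a b
vL h (x, x') = fmap (\y -> (y, x')) (h x)

get :: Lens s t a b -> s -> a
get l = getConst . l Const

set :: Lens s t a b -> s -> b -> t
set l s x = runIdentity (l (Identity . const x) s)

-- get vL (1, True)      == 1
-- set vL (1, True) "hi" == ("hi", True)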

The beauty of the van Laarhoven representation is that it composes lenses using simple function composition. A lens takes a function and returns a function. This function can, in turn, be passed as the argument to another lens, and so on.

There’s an interesting twist to this kind of composition — the function composition operator in Haskell is the dot, just like the field accessor in OO languages like Java or C++; and the composition follows the same order as the composition of accessors in those languages. This was first observed by Conal Elliott in the context of semantic editor combinators.

Consider a lens that focuses on the a field inside some object s. Its type is:

Lens s t a b

When given a function:

h :: a -> f b

it returns a function:

h' :: s -> f t

Now consider another lens that focuses on the s field inside some even bigger object u. Its type is:

Lens u w s t

It expects a function of the type:

g :: s -> f t

We can pass the result of the first lens directly to the second lens to form a composite:

Lens u w s t . Lens s t a b

We get a lens that focuses on the a field of the object s that is the sub-object of the big object u. It works just like in Java, where you apply a dot to the result of a getter or a setter, to dig deeper into a subobject.
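
Using the helpers from the earlier sketch, composing vL with itself digs two levels into a nested pair (a quick illustration):

-- a lens onto the first component of the first component
inner :: Lens ((a, c), d) ((b, c), d) a b
inner = vL . vL

-- get inner ((1, 'x'), True)      == 1
-- set inner ((1, 'x'), True) "hi" == (("hi", 'x'), True)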

Not only do lenses compose using regular function composition, but we can also use the identity function as the identity lens. So lenses form a category. It’s time to have a serious look at category theory. Warning: Heavy math ahead!

The Categorical Picture

I used parametricity arguments to justify the choice of the van Laarhoven representation for the lens. The lens function is supposed to have the same form for all functors f. Parametricity arguments have an operational feel to them, which is okay, but I feel like a solid categorical justification is more valuable than any symbol-shuffling argument. So I worked on it, and eventually came up with a derivation of the van Laarhoven representation using the Yoneda lemma. Apparently Russell O’Connor and Mauro Jaskelioff had similar feelings because they came up with the same result independently. We used the same approach, going through the Store functor and applying the Yoneda lemma twice, once in the functor category, and once in the Set category (see the Bibliography).

I would like to present the same result in a more general setting of the Yoneda embedding. It’s a direct consequence of the Yoneda lemma, and it states that any category can be embedded (fully and faithfully) in the category of functors from that category to Set.

Here’s how it works: Let’s fix some object a in some category C. For any object x in that category there is a hom-set C(a, x) of morphisms from a to x. A hom-set is a set — an object in the category Set of sets. So we have defined a mapping from C to Set that takes an x and maps it to the set C(a, x). This mapping is called C(a, _), with the underscore serving as a placeholder for the argument.

[Figure: Hom-Set]

It’s easy to convince yourself that this mapping is in fact a functor from C to Set. Indeed, take any morphism f from x to y. We want to map this morphism to a function (a morphism in Set) that goes between C(a, x) and C(a, y). Let’s define this lifted function component-wise: given any element h from C(a, x) we can map it to f . h. It’s just a composition of two morphisms from C. The resulting morphism is a member of C(a, y). We have lifted a morphism f from C to Set thus establishing that C(a, _) is a functor.

[Figure: Hom Functor]

Now consider two such functors, C(a, _) and C(b, _). The Yoneda embedding theorem tells us that there is a one-to-one correspondence between the set of natural transformations between these two functors and the hom-set C(b, a).

Nat(C(a, _), C(b, _)) ≅ C(b, a)

Notice the reversed order of a and b on the right-hand side.

[Figure: Yoneda Embedding]

Let’s rephrase what we have just seen. For every a in C, we can define a functor C(a, _) from C to Set. Such a functor is a member of the functor category Fun(C, Set). So we have a mapping from C to the functor category Fun(C, Set). Is this mapping a functor?

We have just seen that there is a mapping between morphisms in C and natural transformations in Fun(C, Set) — that’s the gist of the Yoneda embedding. But natural transformations are morphisms in the functor category. So we do have a functor from C to the functor category Fun(C, Set). It maps objects to objects and morphisms to morphisms. It’s a contravariant functor, because of the reversal of a and b. Moreover, it maps the hom-sets in the two categories one-to-one, so it’s a fully faithful functor, and therefore it defines an embedding of categories. Every category C can be embedded in the functor category Fun(C, Set). That’s called the Yoneda embedding.

[Figure: Yoneda Embedding 2]

There’s an interesting consequence of the Yoneda embedding: Every functor category can be embedded in its own functor category — just replace C with a functor category in the Yoneda embedding. Recall that functors between any two categories form a category. It’s a category in which objects are functors and morphisms are natural transformations. Yoneda embedding works for that category too, which means that a functor category can be embedded in a category of functors from that functor category to Set.

Let’s see what that means. We can fix one functor, say R and consider the hom-set from R to some arbitrary functor f. Since we are in a functor category, this hom-set is a set of natural transformations between the two functors, Nat(R, f).

Now let’s pick another functor S. It also defines a set of natural transformations Nat(S, f). We can keep picking functors and mapping them to sets (sets of natural transformations). In fact we know from the previous argument that this mapping is itself a functor. This time it’s a functor from a functor category to Set.

[Figure: Functor Embedding]

What does the Yoneda embedding tell us about any two such functors? That the set of natural transformations between them is isomorphic to the (reversed) hom-set. But this time hom-sets are sets of natural transformations. So we have:

Nat(Nat(R, _), Nat(S, _)) ≅ Nat(S, R)

[Figure: Functor Embedding 2]

All natural transformations in this formula are regular natural transformations, except for the outer one, which is more interesting. You may recall that a natural transformation is a family of morphisms parameterized by objects. But in this case objects are functors, and morphisms are themselves natural transformations. So it’s a family of natural transformations parameterized by functors. Keep this in mind as we proceed.

To get a better feel of what’s happening, let’s translate this to Haskell. In Haskell we represent natural transformations as polymorphic functions. This makes sense, since a natural transformation is a family of morphisms (here functions) parameterized by objects (here types). So a member of Nat(R, f) can be represented as:

forall x. R x -> f x

Similarly, the second natural transformation in our formula turns into:

forall y. S y -> f y

As I said, the outer natural transformation in the Yoneda embedding is a family of natural transformations parameterized by a functor, so we get:

forall f. Functor f => (forall x. R x -> f x) -> (forall y. S y -> f y)

You can already see one element of the van Laarhoven representation: the quantification over a functor.

The right hand side of the Yoneda embedding is a natural transformation:

forall z. S z -> R z

The next step is to pick the appropriate functors for R and S. We’ll take R to be IStore a b and S to be IStore s t.

Let’s work on the first part:

forall x. IStore a b x -> f x

A function from IStore a b x is equivalent to a function of two arguments, one of them of type a and another of type b->x:

forall x. a -> (b -> x) -> f x

We can pull a out of forall to get:

a -> (forall x. (b -> x) -> f x)

If you squint a little, you recognize that the thing in parentheses is a natural transformation between the functor C(b, _) and f, where C is the category of Haskell types. We can now apply the Yoneda lemma, which says that this set of natural transformations is isomorphic to the set f b:

forall x. (b -> x) -> f x ≅ f b
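
The two directions of this isomorphism can be spelled out in Haskell (a sketch; RankNTypes assumed):

fromYoneda :: (forall x. (b -> x) -> f x) -> f b
fromYoneda alpha = alpha id    -- probe the transformation with id

toYoneda :: Functor f => f b -> (forall x. (b -> x) -> f x)
toYoneda fb g = fmap g fb      -- the only move parametricity allows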

We can apply the same transformation to the second part of our identity:

forall y. (IStore s t y -> f y) ≅ s -> f t

Taking it all together, we get:

forall f. Functor f => (a -> f b) -> (s -> f t) 
    ≅ forall z. IStore s t z -> IStore a b z

Let’s now work on the right hand side:

forall z. IStore s t z -> IStore a b z 
    ≅ forall z. s -> (t -> z) -> IStore a b z

Again, pulling s out of forall and applying the Yoneda lemma, we get:

s -> IStore a b t

But that’s just the standard representation of the lens:

s -> IStore a b t ≅ (s -> a, s -> b -> t) = (get, set)

Thus the Yoneda embedding of the functor category leads to the van Laarhoven representation of the lens:

forall f. Functor f => (a -> f b) -> (s -> f t) 
  ≅ (s -> a, s -> b -> t)

Playing with Adjunctions

This is all very satisfying, but you may wonder what’s so special about the IStore functor? The crucial step in the derivation of the van Laarhoven representation was the application of the Yoneda lemma to get this identity:

forall x. IStore a b x -> f x ≅ a -> f b
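
Here are the two directions of that identity, written against the IStore defined earlier (a sketch; RankNTypes assumed):

toFun :: (forall x. IStore a b x -> f x) -> (a -> f b)
toFun alpha a = alpha (IStore a id)

toNat :: Functor f => (a -> f b) -> (forall x. IStore a b x -> f x)
toNat g (IStore a h) = fmap h (g a)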

Let’s rewrite it in the more categorical language:

Nat(IStore a b, f) ≅ C(a, f b)

The set of natural transformations from the functor IStore a b to the functor f is isomorphic to the hom-set between a and f b. Any time you see an isomorphism of hom-sets (and remember that Nat is the hom-set in the functor category), you should be on the lookout for an adjunction. And indeed, we have an adjunction between two functors. One functor is defined as:

a -> IStore a b

It takes an object a in C and maps it to a functor IStore a b parameterized by some other object b. The other functor is:

f -> f b

It maps a functor, an object in the functor category, to an object in C. This functor is also parameterized by the same b. Since this is a flipped application, I’ll call it Flapp:

newtype Flapp b f = Flapp (f b)

So, for any b, the functor-valued functor IStore _ b is left adjoint to Flapp b. This is what makes IStore special.

[Figure: IStore Adjunction]

As a side note: IStore a b is a covariant functor in a and a contravariant functor in b. However, Store a is not functorial in a, because a appears in both positive and negative position in its definition. So the adjunction trick doesn’t work for a simple (monomorphic) lens.

We can now turn the tables and use the adjunction to define the functor IStore in an arbitrary category (notice that the Yoneda lemma worked only for Set-valued functors). We just define a functor-valued functor IStore to be the left adjoint to Flapp, provided it exists.

Nat(IStore a b, f) ≅ C(a, f b)

Here, Nat is a set of natural transformations between endofunctors in C.

We can substitute the so defined functor into the Yoneda embedding formula we used earlier:

Nat(Nat(IStore a b, f), Nat(IStore s t, f))
    ≅ Nat(IStore s t, IStore a b)

We can now use the adjunction, rather than the Yoneda lemma, to eliminate some of the occurrences of IStore:

Nat(C(a, f b), C(s, f t))
    ≅ C(s, IStore a b t)

This is slightly more general than the original van Laarhoven equivalence.

We can go even farther and reproduce the Jaskelioff and O’Connor trick of constraining the generic functor in the definition of the van Laarhoven lens to a pointed or applicative functor. This results in a multi-focus lens. In particular, if we use pointed functors, we get lenses with zero or one targets, so called affine lenses. Restricting the functors further to applicative leads to lenses with any number of targets, or traversals.

The trick is that any pointed or applicative functor can be stripped of the additional functionality and treated just like any other functor. This act of “forgetting” about pure and <*> may itself be considered a functor in the functor category. It’s called, appropriately, a forgetful functor. The left adjoint to a forgetful functor (if it exists) is called a free functor. It takes an arbitrary functor and creates a pointed functor by generating an artificial pure; or it creates an applicative functor by adding <*>. This adjunction is described by a natural isomorphism of hom-sets — in this case sets of natural transformations:

Nat(S, U f) ≅ Nat(S*, f)

Here, U is the forgetful functor, and S* is the free applicative/pointed version of the functor S. The functor f ranges across applicative (respectively, pointed) functors.

Now we can try to substitute the free version of IStore in the Yoneda embedding formula:

Nat(Nat(IStore* a b, f), Nat(IStore* s t, f))
    ≅ Nat(IStore* s t, IStore* a b)

The formula holds for any applicative (pointed) functor f and a set of natural transformations over such functors.

The first step is to use the forgetful/free adjunction:

Nat(Nat(IStore a b, U f), Nat(IStore s t, U f))
    ≅ Nat(IStore s t, U IStore* a b)

Then we can use our defining adjunction for IStore to get:

Nat(C(a, U f b), C(s, U f t)) 
    ≅ C(s, U IStore* a b t)

In Haskell notation this reads:

forall f. Applicative f => (a -> f b) -> (s -> f t)
    ≅ s -> IAppStore a b t

(the action of U is implicit).

The free applicative version of IStore is defined as:

data IAppStore a b t = 
    Unit t
  | IAppStore a (IAppStore a b (b -> t))

These are all known results, but the use of the Yoneda embedding and the adjunction to define the IStore functor makes the derivation more compact and slightly more general.

Acknowledgments

I’m grateful to Mauro Jaskelioff, Gershom Bazerman, and Joseph Abrahamson for reading the draft and providing helpful comments and to André van Meulebrouck for editing help.

Bibliography

  1. Edward Kmett, The Haskell Lens Library
  2. Simon Peyton Jones, Lenses: Compositional Data Access and Manipulation. A Skills Matter video presentation.
  3. Joseph Abrahamson, A Little Lens Starter Tutorial
  4. Joseph Abrahamson, Lenses from Scratch
  5. Artyom, lens over tea tutorial
  6. Twan van Laarhoven, CPS based functional references
  7. Bartosz Milewski, Lenses, Stores, and Yoneda
  8. Mauro Jaskelioff, Russell O’Connor, A Representation Theorem for Second-Order Functionals
  9. Bartosz Milewski, Understanding Yoneda

This is part 11 of Categories for Programmers. Previously: Natural Transformations. See the Table of Contents.

Introduction to Part II

In the first part of the book I argued that both category theory and programming are about composability. In programming, you keep decomposing a problem until you reach the level of detail that you can deal with, solve each subproblem in turn, and re-compose the solutions bottom-up. There are, roughly speaking, two ways of doing it: by telling the computer what to do, or by telling it how to do it. One is called declarative and the other imperative.

You can see this even at the most basic level. Composition itself may be defined declaratively; as in, h is a composite of g after f:

h = g . f

or imperatively; as in, call f first, remember the result of that call, then call g with the result:

h x = let y = f x
      in g y

The imperative version of a program is usually described as a sequence of actions ordered in time. In particular, the call to g cannot happen before the execution of f completes. At least, that’s the conceptual picture — in a lazy language, with call-by-need argument passing, the actual execution may proceed differently.

In fact, depending on the cleverness of the compiler, there may be little or no difference between how declarative and imperative code is executed. But the two methodologies differ, sometimes drastically, in the way we approach problem solving and in the maintainability and testability of the resulting code.

The main question is: when faced with a problem, do we always have the choice between declarative and imperative approaches to solving it? And, if there is a declarative solution, can it always be translated into computer code? The answer to this question is far from obvious and, if we could find it, we would probably revolutionize our understanding of the universe.

Let me elaborate. There is a similar duality in physics, which either points at some deep underlying principle, or tells us something about how our minds work. Richard Feynman mentions this duality as an inspiration in his own work on quantum electrodynamics.

There are two forms of expressing most laws of physics. One uses local, or infinitesimal, considerations. We look at the state of a system around a small neighborhood, and predict how it will evolve within the next instant of time. This is usually expressed using differential equations that have to be integrated, or summed up, over a period of time.

Notice how this approach resembles imperative thinking: we reach the final solution by following a sequence of small steps, each depending on the result of the previous one. In fact, computer simulations of physical systems are routinely implemented by turning differential equations into difference equations and iterating them. This is how spaceships are animated in the game Asteroids. At each time step, the position of a spaceship is changed by adding a small increment, which is calculated by multiplying its velocity by the time delta. The velocity, in turn, is changed by a small increment proportional to acceleration, which is given by force divided by mass.

Asteroids

These are the direct encodings of the differential equations corresponding to Newton’s laws of motion:

F = m dv/dt
v = dx/dt
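
As a concrete illustration, here is a minimal sketch of one such integration step in Haskell. The one-dimensional state and the unit mass are my own simplifying assumptions:

-- One Euler step: the position advances by velocity times the time delta,
-- the velocity by acceleration (force over mass) times the time delta.
eulerStep :: Double -> Double -> (Double, Double) -> (Double, Double)
eulerStep dt force (x, v) = (x + v * dt, v + (force / mass) * dt)
  where mass = 1.0 -- hypothetical unit mass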

Similar methods may be applied to more complex problems, like the propagation of electromagnetic fields using Maxwell’s equations, or even the behavior of quarks and gluons inside a proton using lattice QCD (quantum chromodynamics).

This local thinking, combined with the discretization of space and time encouraged by the use of digital computers, found its extreme expression in Stephen Wolfram’s heroic attempt to reduce the complexity of the whole universe to a system of cellular automata.

The other approach is global. We look at the initial and the final state of the system, and calculate a trajectory that connects them by minimizing a certain functional. The simplest example is Fermat’s principle of least time. It states that light rays propagate along paths that minimize their flight time. In particular, in the absence of reflecting or refracting objects, a light ray from point A to point B will take the shortest path, which is a straight line. But light propagates slower in dense (transparent) materials, like water or glass. So if you pick the starting point in the air, and the ending point under water, it’s more advantageous for light to travel longer in the air and then take a shortcut through water. The path of minimum time makes the ray refract at the boundary of air and water, resulting in Snell’s law of refraction:

sin θ1 / sin θ2 = v1 / v2

where v1 is the speed of light in the air and v2 is the speed of light in the water.

Snell

All of classical mechanics can be derived from the principle of least action. The action can be calculated for any trajectory by integrating the Lagrangian, which is the difference between kinetic and potential energy (notice: it’s the difference, not the sum — the sum would be the total energy). When you fire a mortar to hit a given target, the projectile will first go up, where the potential energy due to gravity is higher, and spend some time there racking up a negative contribution to the action. It will also slow down at the top of the parabola, to minimize kinetic energy. Then it will speed up to go quickly through the area of low potential energy.

Mortar

Feynman’s greatest contribution was to realize that the principle of least action can be generalized to quantum mechanics. There, again, the problem is formulated in terms of initial state and final state. The Feynman path integral between those states is used to calculate the probability of transition.

Feynman

The point is that there is a curious unexplained duality in the way we can describe the laws of physics. We can use the local picture, in which things happen sequentially and in small increments. Or we can use the global picture, where we declare the initial and final conditions, and everything in between just follows.

The global approach can be also used in programming, for instance when implementing ray tracing. We declare the position of the eye and the positions of light sources, and figure out the paths that the light rays may take to connect them. We don’t explicitly minimize the time of flight for each ray, but we do use Snell’s law and the geometry of reflection to the same effect.

The biggest difference between the local and the global approach is in their treatment of space and, more importantly, time. The local approach embraces the immediate gratification of here and now, whereas the global approach takes a long-term static view, as if the future had been preordained, and we were only analyzing the properties of some eternal universe.

Nowhere is it better illustrated than in the Functional Reactive Programming approach to user interaction. Instead of writing separate handlers for every possible user action, all having access to some shared mutable state, FRP treats external events as an infinite list, and applies a series of transformations to it. Conceptually, the list of all our future actions is there, available as the input data to our program. From a program’s perspective there’s no difference between the list of digits of π, a list of pseudo-random numbers, or a list of mouse positions coming through computer hardware. In each case, if you want to get the nth item, you have to first go through the first n-1 items. When applied to temporal events, we call this property causality.
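
Here’s a little sketch of this view in Haskell — external events as an ordinary list that we transform declaratively. The event type is made up for illustration:

data Event = MouseMove (Int, Int) | KeyPress Char

-- Given the (conceptually infinite) list of all future events,
-- extract the mouse trajectory as just another list transformation.
mousePath :: [Event] -> [(Int, Int)]
mousePath events = [pos | MouseMove pos <- events]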

So what does it have to do with category theory? I will argue that category theory encourages a global approach and therefore supports declarative programming. First of all, unlike calculus, it has no built-in notion of distance, or neighborhood, or time. All we have is abstract objects and abstract connections between them. If you can get from A to B through a series of steps, you can also get there in one leap. Moreover, the major tool of category theory is the universal construction, which is the epitome of a global approach. We’ve seen it in action, for instance, in the definition of the categorical product. It was done by specifying its properties — a very declarative approach. It’s an object equipped with two projections, and it’s the best such object — it optimizes a certain property: the property of factorizing the projections of other such objects.

ProductRanking
Compare this with Fermat’s principle of minimum time, or the principle of least action.

Contrast this with the traditional definition of the cartesian product, which is much more imperative. You describe how to create an element of the product by picking one element from one set and another element from another set. It’s a recipe for creating a pair. And there’s another recipe for disassembling a pair.

In almost every programming language, including functional languages like Haskell, product types, coproduct types, and function types are built in, rather than being defined by universal constructions; although there have been attempts at creating categorical programming languages (see, e.g., Tatsuya Hagino’s thesis in Bibliography).

Whether used directly or not, categorical definitions justify pre-existing programming constructs, and give rise to new ones. Most importantly, category theory provides a meta-language for reasoning about computer programs at a declarative level. It also encourages reasoning about problem specification before it is cast into code.

Next: Limits and Colimits.

Acknowledgments

I’d like to thank Gershom Bazerman for checking my math and logic, and André van Meulebrouck, who has been volunteering his editing help.

Bibliography

  1. Tatsuya Hagino, A Categorical Programming Language.

This is part 12 of Categories for Programmers. Previously: Declarative Programming. See the Table of Contents.

It seems like in category theory everything is related to everything and everything can be viewed from many angles. Take for instance the universal construction of the product. Now that we know more about functors and natural transformations, can we simplify and, possibly, generalize it? Let us try.

ProductPattern

The construction of a product starts with the selection of two objects a and b, whose product we want to construct. But what does it mean to select objects? Can we rephrase this action in more categorical terms? Two objects form a pattern — a very simple pattern. We can abstract this pattern into a category — a very simple category, but a category nevertheless. It’s a category that we’ll call 2. It contains just two objects, 1 and 2, and no morphisms other than the two obligatory identities. Now we can rephrase the selection of two objects in C as the act of defining a functor D from the category 2 to C. A functor maps objects to objects, so its image is just two objects (or it could be one, if the functor collapses objects, which is fine too). It also maps morphisms — in this case it simply maps identity morphisms to identity morphisms.

Two

What’s great about this approach is that it builds on categorical notions, eschewing the imprecise descriptions like “selecting objects,” taken straight from the hunter-gatherer lexicon of our ancestors. And, incidentally, it is also easily generalized, because nothing can stop us from using categories more complex than 2 to define our patterns.

But let’s continue. The next step in the definition of a product is the selection of the candidate object c. Here again, we could rephrase the selection in terms of a functor from a singleton category. And indeed, if we were using Kan extensions, that would be the right thing to do. But since we are not ready for Kan extensions yet, there is another trick we can use: a constant functor Δ from the same category 2 to C. The selection of c in C can be done with Δc. Remember, Δc maps all objects into c and all morphisms into idc.
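
In Haskell, Δc is essentially the familiar Const functor. Here’s a quick sketch, under a made-up name to avoid clashing with the library’s Const:

-- The constant functor: it maps every type a to c, and every
-- function to the identity on c (the payload is carried unchanged).
newtype Delta c a = Delta c

instance Functor (Delta c) where
    fmap _ (Delta c) = Delta c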

TwoDelta

Now we have two functors, Δc and D going between 2 and C so it’s only natural to ask about natural transformations between them. Since there are only two objects in 2, a natural transformation will have two components. Object 1 in 2 is mapped to c by Δc and to a by D. So the component of a natural transformation between Δc and D at 1 is a morphism from c to a. We can call it p. Similarly, the second component is a morphism q from c to b — the image of the object 2 in 2 under D. But these are exactly like the two projections we used in our original definition of the product. So instead of talking about selecting objects and projections, we can just talk about picking functors and natural transformations. It so happens that in this simple case the naturality condition for our transformation is trivially satisfied, because there are no morphisms (other than the identities) in 2.
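
In Haskell terms, such a cone is nothing more than an apex together with the two components of the natural transformation. A little sketch, with names of my own choosing:

-- A cone over the two-object diagram picking out a and b:
-- an apex c together with the two components p and q.
data Cone2 c a b = Cone2 (c -> a) (c -> b)

-- The product is the universal such cone, with fst and snd
-- as the components.
productCone :: Cone2 (a, b) a b
productCone = Cone2 fst snd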

ProductCone

A generalization of this construction to categories other than 2 — ones that, for instance, contain non-trivial morphisms — will impose naturality conditions on the transformation between Δc and D. We call such transformation a cone, because the image of Δ is the apex of a cone/pyramid whose sides are formed by the components of the natural transformation. The image of D forms the base of the cone.

In general, to build a cone, we start with a category I that defines the pattern. It’s a small, often finite category. We pick a functor D from I to C and call it (or its image) a diagram. We pick some c in C as the apex of our cone. We use it to define the constant functor Δc from I to C. A natural transformation from Δc to D is then our cone. For a finite I it’s just a bunch of morphisms connecting c to the diagram: the image of I under D.

Cone

Naturality requires that all triangles (the walls of the pyramid) in this diagram commute. Indeed, take any morphism f in I. The functor D maps it to a morphism D f in C, a morphism that forms the base of some triangle. The constant functor Δc maps f to the identity morphism on c. Δ squishes the two ends of the morphism into one object, and the naturality square becomes a commuting triangle. The two arms of this triangle are the components of the natural transformation.

ConeNaturality

So that’s one cone. What we are interested in is the universal cone — just like we picked a universal object for our definition of a product.

There are many ways to go about it. For instance, we may define a category of cones based on a given functor D. Objects in that category are cones. Not every object c in C can be an apex of a cone, though, because there may be no natural transformation between Δc and D.

To make it a category, we also have to define morphisms between cones. These would be fully determined by morphisms between their apexes. But not just any morphism will do. Remember that, in our construction of the product, we imposed the condition that the morphisms between candidate objects (the apexes) must be common factors for the projections. For instance:

p' = p . m
q' = q . m

ProductRanking

This condition translates, in the general case, to the condition that the triangles whose one side is the factorizing morphism all commute.

Cone Commutativity

The commuting triangle connecting two cones, with the factorizing morphism h (here, the lower cone is the universal one, with Lim D as its apex).

We’ll take those factorizing morphisms as the morphisms in our category of cones. It’s easy to check that those morphisms indeed compose, and that the identity morphism is a factorizing morphism as well. Cones therefore form a category.

Now we can define the universal cone as the terminal object in the category of cones. The definition of the terminal object states that there is a unique morphism from any other object to that object. In our case it means that there is a unique factorizing morphism from the apex of any other cone to the apex of the universal cone. We call this universal cone the limit of the diagram D, Lim D (in the literature, you’ll often see a left arrow pointing towards I under the Lim sign). Often, as a shorthand, we call the apex of this cone the limit (or the limit object).

The intuition is that the limit embodies the properties of the whole diagram in a single object. For instance, the limit of our two-object diagram is the product of two objects. The product (together with the two projections) contains the information about both objects. And being universal means that it has no extraneous junk.

Limit as a Natural Isomorphism

There is still something unsatisfying about this definition of a limit. I mean, it’s workable, but we still have this commutativity condition for the triangles that are linking any two cones. It would be so much more elegant if we could replace it with some naturality condition. But how?

We are no longer dealing with one cone but with a whole collection (in fact, a category) of cones. If the limit exists (and — let’s make it clear — there’s no guarantee of that), one of those cones is the universal cone. For every other cone we have a unique factorizing morphism that maps its apex, let’s call it c, to the apex of the universal cone, which we named Lim D. (In fact, I can skip the word “other,” because the identity morphism maps the universal cone to itself and it trivially factorizes through itself.) Let me repeat the important part: given any cone, there is a unique morphism of a special kind. We have a mapping of cones to special morphisms, and it’s a one-to-one mapping.

This special morphism is a member of the hom-set C(c, Lim D). The other members of this hom-set are less fortunate, in the sense that they don’t factorize the mapping of the two cones. What we want is to be able to pick, for each c, one morphism from the set C(c, Lim D) — a morphism that satisfies the particular commutativity condition. Does that sound like defining a natural transformation? It most certainly does!

But what are the functors that are related by this transformation?

One functor is the mapping of c to the set C(c, Lim D). It’s a functor from C to Set — it maps objects to sets. In fact it’s a contravariant functor. Here’s how we define its action on morphisms: Let’s take a morphism f from c' to c:

f :: c' -> c

Our functor maps c' to the set C(c', Lim D). To define the action of this functor on f (in other words, to lift f), we have to define the corresponding mapping between C(c, Lim D) and C(c', Lim D). So let’s pick one element u of C(c, Lim D) and see if we can map it to some element of C(c', Lim D). An element of a hom-set is a morphism, so we have:

u :: c -> Lim D

We can precompose u with f to get:

u . f :: c' -> Lim D

And that’s an element of C(c', Lim D) — so indeed, we have found a mapping of morphisms:

contramap :: (c' -> c) -> (c -> Lim D) -> (c' -> Lim D)
contramap f u = u . f

Notice the inversion in the order of c and c' characteristic of a contravariant functor.

HomSetMapping

To define a natural transformation, we need another functor that’s also a mapping from C to Set. But this time we’ll consider a set of cones. Cones are just natural transformations, so we are looking at the set of natural transformations Nat(Δc, D). The mapping from c to this particular set of natural transformations is a (contravariant) functor. How can we show that? Again, let’s define its action on a morphism:

f :: c' -> c

The lifting of f should be a mapping of natural transformations between two functors that go from I to C:

Nat(Δc, D) -> Nat(Δc', D)

How do we map natural transformations? Every natural transformation is a selection of morphisms — its components — one morphism per element of I. A component of some α (a member of Nat(Δc, D)) at a (an object in I) is a morphism:

αa :: Δc a -> D a

or, using the definition of the constant functor Δ,

αa :: c -> D a

Given f and α, we have to construct a β, a member of Nat(Δc', D). Its component at a should be a morphism:

βa :: c' -> D a

We can easily get the latter from the former by precomposing it with f:

βa = αa . f

It’s relatively easy to show that those components indeed add up to a natural transformation.

NatMapping

Given our morphism f, we have thus built a mapping between two natural transformations, component-wise. This mapping defines contramap for the functor:

c -> Nat(Δc, D)

What I have just done is to show you that we have two (contravariant) functors from C to Set. I haven’t made any assumptions — these functors always exist.

Incidentally, the first of these functors plays an important role in category theory, and we’ll see it again when we talk about Yoneda’s lemma. There is a name for contravariant functors from any category C to Set: they are called “presheaves.” This one is called a representable presheaf. The second functor is also a presheaf.

Now that we have two functors, we can talk about natural transformations between them. So without further ado, here’s the conclusion: A functor D from I to C has a limit Lim D if and only if there is a natural isomorphism between the two functors I have just defined:

C(c, Lim D) ≃ Nat(Δc, D)

Let me remind you what a natural isomorphism is. It’s a natural transformation whose every component is an isomorphism, that is to say an invertible morphism.
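
For the two-object diagram, where the limit is the product, this isomorphism can be witnessed directly in Haskell. A sketch: morphisms into the limit become functions into the pair, and cones become pairs of projections:

-- One direction: a morphism into the limit determines a cone.
toCone :: (c -> (a, b)) -> (c -> a, c -> b)
toCone h = (fst . h, snd . h)

-- The other direction: every cone factorizes through the limit.
fromCone :: (c -> a, c -> b) -> (c -> (a, b))
fromCone (p, q) = \c -> (p c, q c)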

I’m not going to go through the proof of this statement. The procedure is pretty straightforward if not tedious. When dealing with natural transformations, you usually focus on components, which are morphisms. In this case, since the target of both functors is Set, the components of the natural isomorphism will be functions. These are higher order functions, because they go from the hom-set to the set of natural transformations. Again, you can analyze a function by considering what it does to its argument: here the argument will be a morphism — a member of C(c, Lim D) — and the result will be a natural transformation — a member of Nat(Δc, D), or what we have called a cone. This natural transformation, in turn, has its own components, which are morphisms. So it’s morphisms all the way down, and if you can keep track of them, you can prove the statement.

The most important result is that the naturality condition for this isomorphism is exactly the commutativity condition for the mapping of cones.

As a preview of coming attractions, let me mention that the set Nat(Δc, D) can be thought of as a hom-set in the functor category; so our natural isomorphism relates two hom-sets, which points at an even more general relationship called an adjunction.

Examples of Limits

We’ve seen that the categorical product is a limit of a diagram generated by a simple category we called 2.

There is an even simpler example of a limit: the terminal object. The first impulse would be to think of a singleton category as leading to a terminal object, but the truth is even starker than that: the terminal object is a limit generated by an empty category. A functor from an empty category selects no object, so a cone shrinks to just the apex. The universal cone is the lone apex that has a unique morphism coming to it from any other apex. You will recognize this as the definition of the terminal object.

The next interesting limit is called the equalizer. It’s a limit generated by a two-element category with two parallel morphisms going between them (and, as always, the identity morphisms). This category selects a diagram in C consisting of two objects, a and b, and two morphisms:

f :: a -> b
g :: a -> b

To build a cone over this diagram, we have to add the apex, c and two projections:

p :: c -> a
q :: c -> b

EqualizerCone

We have two triangles that must commute:

q = f . p
q = g . p

This tells us that q is uniquely determined by one of these equations, say, q = f . p, and we can omit it from the picture. So we are left with just one condition:

f . p = g . p

The way to think about it is that, if we restrict our attention to Set, the image of the function p selects a subset of a. When restricted to this subset, the functions f and g are equal.
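
If we approximate sets by lists, this picture can be sketched in Haskell (a toy model, not a general construction):

-- The equalizer of f and g carves out the part of a on which they agree.
equalize :: Eq b => (a -> b) -> (a -> b) -> [a] -> [a]
equalize f g = filter (\x -> f x == g x)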

For instance, take a to be the two-dimensional plane parameterized by coordinates x and y. Take b to be the real line, and take:

f (x, y) = 2 * y + x
g (x, y) = y - x

The equalizer for these two functions is the set of real numbers (the apex, c) and the function:

p t = (t, (-2) * t)

Notice that (p t) defines a straight line in the two-dimensional plane. Along this line, the two functions are equal.

Of course, there are other sets c' and functions p' that may lead to the equality:

f . p' = g . p'

but they all uniquely factor out through p. For instance, we can take the singleton set () as c' and the function:

p' () = (0, 0)

It’s a good cone, because f (0, 0) = g (0, 0). But it’s not universal, because of the unique factorization through h:

p' = p . h

with

h () = 0

EqualizerLimit

An equalizer can thus be used to solve equations of the type f x = g x. But it’s much more general, because it’s defined in terms of objects and morphisms rather than algebraically.

An even more general idea of solving an equation is embodied in another limit — the pullback. Here, we still have two morphisms that we want to equate, but this time their domains are different. We start with a three-object category of the shape: 1->2<-3. The diagram corresponding to this category consists of three objects, a, b, and c, and two morphisms:

f :: a -> b
g :: c -> b

This diagram is often called a cospan.

A cone built on top of this diagram consists of the apex, d, and three morphisms:

p :: d -> a
q :: d -> c
r :: d -> b

PullbackCone

Commutativity conditions tell us that r is completely determined by the other morphisms, and can be omitted from the picture. So we are only left with the following condition:

g . q = f . p

A pullback is a universal cone of this shape.

PullbackLimit

Again, if you narrow your focus down to sets, you can think of the object d as consisting of pairs of elements from a and c for which f acting on the first component is equal to g acting on the second component. If this is still too general, consider the special case in which g is a constant function, say g _ = 1.23 (assuming that b is a set of real numbers). Then you are really solving the equation:

f x = 1.23

In this case, the choice of c is irrelevant (as long as it’s not an empty set), so we can take it to be a singleton set. The set a could, for instance, be the set of three-dimensional vectors, and f the vector length. Then the pullback is the set of pairs (v, ()), where v is a vector of length 1.23 (a solution to the equation sqrt (x^2 + y^2 + z^2) = 1.23), and () is the dummy element of the singleton set.
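
Sticking with the set picture, the pullback, too, can be sketched in Haskell with lists standing in for sets (again, a toy model):

-- The pullback of f and g: all pairs on which the two functions agree.
pullback :: Eq b => (a -> b) -> (c -> b) -> [a] -> [c] -> [(a, c)]
pullback f g as cs = [(x, y) | x <- as, y <- cs, f x == g y]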

But pullbacks have more general applications, also in programming. For instance, consider C++ classes as a category in which morphisms are arrows that connect subclasses to superclasses. We’ll consider inheritance a transitive property, so if C inherits from B and B inherits from A then we’ll say that C inherits from A (after all, you can pass a pointer to C where a pointer to A is expected). Also, we’ll assume that C inherits from C, so we have the identity arrow for every class. This way subclassing is aligned with subtyping. C++ also supports multiple inheritance, so you can construct a diamond inheritance diagram with two classes B and C inheriting from A, and a fourth class D multiply inheriting from B and C. Normally, D would get two copies of A, which is rarely desirable; but you can use virtual inheritance to have just one copy of A in D.

What would it mean to have D be a pullback in this diagram? It would mean that any class E that multiply inherits from B and C is also a subclass of D. This is not directly expressible in C++, where subtyping is nominal (the C++ compiler wouldn’t infer this kind of class relationship — it would require “duck typing”). But we could go outside of the subtyping relationship and instead ask whether a cast from E to D would be safe or not. This cast would be safe if D were the bare-bone combination of B and C, with no additional data and no overriding of methods. And, of course, there would be no pullback if there is a name conflict between some methods of B and C.

Classes

There’s also a more advanced use of a pullback in type inference. There is often a need to unify the types of two expressions. For instance, suppose that the compiler wants to infer the type of a function:

twice f x = f (f x)

It will assign preliminary types to all variables and sub-expressions. In particular, it will assign:

f       :: t0
x       :: t1
f x     :: t2
f (f x) :: t3

from which it will deduce that:

twice :: t0 -> t1 -> t3

It will also come up with a set of constraints resulting from the rules of function application:

t0 = t1 -> t2 -- because f is applied to x
t0 = t2 -> t3 -- because f is applied to (f x)

These constraints have to be unified by finding a set of types (or type variables) that, when substituted for the unknown types in both expressions, produce the same type. One such substitution is:

t1 = t2 = t3 = Int
twice :: (Int -> Int) -> Int -> Int

but, obviously, it’s not the most general one. The most general substitution is obtained using a pullback. I won’t go into the details, because they are beyond the scope of this book, but you can convince yourself that the result should be:

twice :: (t -> t) -> t -> t

with t a free type variable.

Colimits

Just like all constructions in category theory, limits have their dual image in opposite categories. When you invert the direction of all arrows in a cone, you get a co-cone, and the universal one of those is called a colimit. Notice that the inversion also affects the factorizing morphism, which now flows from the universal co-cone to any other co-cone.

Colimit

Cocone with a factorizing morphism h connecting two apexes.

A typical example of a colimit is a coproduct, which corresponds to the diagram generated by 2, the category we’ve used in the definition of the product.

CoproductRanking

Both the product and the coproduct embody the essence of a pair of objects, each in a different way.

Just like the terminal object was a limit, so the initial object is a colimit corresponding to the diagram based on an empty category.

The dual of the pullback is called the pushout. It’s based on a diagram called a span, generated by the category 1<-2->3.

Continuity

I said previously that functors come close to the idea of continuous mappings of categories, in the sense that they never break existing connections (morphisms). The actual definition of a continuous functor F from a category C to C’ includes the requirement that the functor preserve limits. Every diagram D in C can be mapped to a diagram F ∘ D in C’ by simply composing two functors. The continuity condition for F states that, if the diagram D has a limit Lim D, then the diagram F ∘ D also has a limit, and it is equal to F (Lim D).

Continuity

Notice that, because functors map morphisms to morphisms, and compositions to compositions, an image of a cone is always a cone. A commuting triangle is always mapped to a commuting triangle (functors preserve composition). The same is true for the factorizing morphisms: the image of a factorizing morphism is also a factorizing morphism. So every functor is almost continuous. What may go wrong is the uniqueness condition. The factorizing morphism in C’ might not be unique. There may also be other “better cones” in C’ that were not available in C.

A hom-functor is an example of a continuous functor. Recall that the hom-functor, C(a, b), is contravariant in the first variable and covariant in the second. In other words, it’s a functor:

C^op × C -> Set

When its second argument is fixed, the hom-set functor (which becomes the representable presheaf) maps colimits in C to limits in Set; and when its first argument is fixed, it maps limits to limits.

In Haskell, a hom-functor is the mapping of any two types to a function type, so it’s just a parameterized function type. When we fix the second parameter, let’s say to String, we get the contravariant functor:

newtype ToString a = ToString (a -> String)
instance Contravariant ToString where
    contramap f (ToString g) = ToString (g . f)

Continuity means that when ToString is applied to a colimit, for instance a coproduct Either b c, it will produce a limit; in this case a product of two function types:

ToString (Either b c) ~ (b -> String, c -> String)

Indeed, any function of Either b c is implemented as a case statement with the two cases being serviced by a pair of functions.
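
This isomorphism is easy to witness in code — here’s a sketch, with function names of my own invention:

-- A function out of a coproduct is a pair of functions, and vice versa.
untangle :: (Either b c -> String) -> (b -> String, c -> String)
untangle h = (h . Left, h . Right)

tangle :: (b -> String, c -> String) -> (Either b c -> String)
tangle (f, g) = either f g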

Similarly, when we fix the first argument of the hom-set, we get the familiar reader functor. Its continuity means that, for instance, any function returning a product is equivalent to a product of functions; in particular:

r -> (a, b) ~ (r -> a, r -> b)

I know what you’re thinking: You don’t need category theory to figure these things out. And you’re right! Still, I find it amazing that such results can be derived from first principles with no recourse to bits and bytes, processor architectures, compiler technologies, or even lambda calculus.

If you’re curious where the names “limit” and “continuity” come from, they are a generalization of the corresponding notions from calculus. In calculus limits and continuity are defined in terms of open neighborhoods. Open sets, which define topology, form a category (a poset).

Challenges

  1. How would you describe a pushout in the category of C++ classes?
  2. Show that the limit of the identity functor Id :: C -> C is the initial object.
  3. Subsets of a given set form a category. A morphism in that category is defined to be an arrow connecting two sets if the first is a subset of the second. What is a pullback of two sets in such a category? What’s a pushout? What are the initial and terminal objects?
  4. Can you guess what a coequalizer is?
  5. Show that, in a category with a terminal object, a pullback towards the terminal object is a product.
  6. Similarly, show that a pushout from an initial object (if one exists) is the coproduct.

Next: Free Monoids.

Acknowledgments

I’d like to thank Gershom Bazerman for checking my math and logic, and André van Meulebrouck, who has been volunteering his editing help.


This is part 10 of Categories for Programmers. Previously: Function Types. See the Table of Contents.

We talked about functors as mappings between categories that preserve their structure. A functor “embeds” one category in another. It may collapse multiple things into one, but it never breaks connections. One way of thinking about it is that with a functor we are modeling one category inside another. The source category serves as a model, a blueprint, for some structure that’s part of the target category.

1_Functors

There may be many ways of embedding one category in another. Sometimes they are equivalent, sometimes very different. One may collapse the whole source category into one object, another may map every object to a different object and every morphism to a different morphism. The same blueprint may be realized in many different ways. Natural transformations help us compare these realizations. They are mappings of functors — special mappings that preserve their functorial nature.

Consider two functors F and G between categories C and D. If you focus on just one object a in C, it is mapped to two objects: F a and G a. A mapping of functors should therefore map F a to G a.

2_NatComp

Notice that F a and G a are objects in the same category D. Mappings between objects in the same category should not go against the grain of the category. We don’t want to make artificial connections between objects. So it’s natural to use existing connections, namely morphisms. A natural transformation is a selection of morphisms: for every object a, it picks one morphism from F a to G a. If we call the natural transformation α, this morphism is called the component of α at a, or αa.

αa :: F a -> G a

Keep in mind that a is an object in C while αa is a morphism in D.

If, for some a, there is no morphism between F a and G a in D, there can be no natural transformation between F and G.

Of course that’s only half of the story, because functors not only map objects, they map morphisms as well. So what does a natural transformation do with those mappings? It turns out that the mapping of morphisms is fixed — under any natural transformation between F and G, F f must be transformed into G f. What’s more, the mapping of morphisms by the two functors drastically restricts the choices we have in defining a natural transformation that’s compatible with it. Consider a morphism f between two objects a and b in C. It’s mapped to two morphisms, F f and G f in D:

F f :: F a -> F b
G f :: G a -> G b

The natural transformation α provides two additional morphisms that complete the diagram in D:

αa :: F a -> G a
αb :: F b -> G b

3_Naturality

Now we have two ways of getting from F a to G b. To make sure that they are equal, we must impose the naturality condition that holds for any f:

G f ∘ αa = αb ∘ F f

The naturality condition is a pretty stringent requirement. For instance, if the morphism F f is invertible, naturality determines αb in terms of αa. It transports αa along f:

αb = (G f) ∘ αa ∘ (F f)⁻¹

4_Transport

If there is more than one invertible morphism between two objects, all these transports have to agree. In general, though, morphisms are not invertible; but you can see that the existence of natural transformations between two functors is far from guaranteed. So the scarcity or the abundance of functors that are related by natural transformations may tell you a lot about the structure of categories between which they operate. We’ll see some examples of that when we talk about limits and the Yoneda lemma.

Looking at a natural transformation component-wise, one may say that it maps objects to morphisms. Because of the naturality condition, one may also say that it maps morphisms to commuting squares — there is one commuting naturality square in D for every morphism in C.

Naturality

This property of natural transformations comes in very handy in a lot of categorical constructions, which often include commuting diagrams. With a judicious choice of functors, a lot of these commutativity conditions may be transformed into naturality conditions. We’ll see examples of that when we get to limits, colimits, and adjunctions.

Finally, natural transformations may be used to define isomorphisms of functors. Saying that two functors are naturally isomorphic is almost like saying they are the same. Natural isomorphism is defined as a natural transformation whose components are all isomorphisms (invertible morphisms).

Polymorphic Functions

We talked about the role of functors (or, more specifically, endofunctors) in programming. They correspond to type constructors that map types to types. They also map functions to functions, and this mapping is implemented by a higher order function fmap (or transform, then, and the like in C++).

To construct a natural transformation we start with an object, here a type, a. One functor, F, maps it to the type F a. Another functor, G, maps it to G a. The component of a natural transformation alpha at a is a function from F a to G a. In pseudo-Haskell:

alphaa :: F a -> G a

A natural transformation is a polymorphic function that is defined for all types a:

alpha :: forall a . F a -> G a

The forall a is optional in Haskell (and in fact requires turning on the language extension ExplicitForAll). Normally, you would write it like this:

alpha :: F a -> G a

Keep in mind that it’s really a family of functions parameterized by a. This is another example of the terseness of the Haskell syntax. A similar construct in C++ would be slightly more verbose:

template<class A> G<A> alpha(F<A>);

There is a more profound difference between Haskell’s polymorphic functions and C++ generic functions, and it’s reflected in the way these functions are implemented and type-checked. In Haskell, a polymorphic function must be defined uniformly for all types. One formula must work across all types. This is called parametric polymorphism.

C++, on the other hand, supports by default ad hoc polymorphism, which means that a template doesn’t have to be well-defined for all types. Whether a template will work for a given type is decided at instantiation time, where a concrete type is substituted for the type parameter. Type checking is deferred, which unfortunately often leads to incomprehensible error messages.

In C++, there is also a mechanism for function overloading and template specialization, which allows different definitions of the same function for different types. In Haskell this functionality is provided by type classes and type families.

Haskell’s parametric polymorphism has an unexpected consequence: any polymorphic function of the type:

alpha :: F a -> G a

where F and G are functors, automatically satisfies the naturality condition. Here it is in categorical notation (f is a function f::a->b):

G f ∘ αa = αb ∘ F f

In Haskell, the action of a functor G on a morphism f is implemented using fmap. I’ll first write it in pseudo-Haskell, with explicit type annotations:

fmapG f . alphaa = alphab . fmapF f

Because of type inference, these annotations are not necessary, and the following equation holds:

fmap f . alpha = alpha . fmap f

This is still not real Haskell — function equality is not expressible in code — but it’s an identity that can be used by the programmer in equational reasoning; or by the compiler, to implement optimizations.

The reason why the naturality condition is automatic in Haskell has to do with “theorems for free.” Parametric polymorphism, which is used to define natural transformations in Haskell, imposes very strong limitations on the implementation — one formula for all types. These limitations translate into equational theorems about such functions. In the case of functions that transform functors, free theorems are the naturality conditions. [You may read more about free theorems in my blog Parametricity: Money for Nothing and Theorems for Free.]

One way of thinking about functors in Haskell that I mentioned earlier is to consider them generalized containers. We can continue this analogy and consider natural transformations to be recipes for repackaging the contents of one container into another container. We are not touching the items themselves: we don’t modify them, and we don’t create new ones. We are just copying (some of) them, sometimes multiple times, into a new container.

The naturality condition becomes the statement that it doesn’t matter whether we modify the items first, through the application of fmap, and repackage later; or repackage first, and then modify the items in the new container, with its own implementation of fmap. These two actions, repackaging and fmapping, are orthogonal. “One moves the eggs, the other boils them.”

Let’s see a few examples of natural transformations in Haskell. The first is between the list functor, and the Maybe functor. It returns the head of the list, but only if the list is non-empty:

safeHead :: [a] -> Maybe a
safeHead [] = Nothing
safeHead (x:xs) = Just x

It’s a function polymorphic in a. It works for any type a, with no limitations, so it is an example of parametric polymorphism. Therefore it is a natural transformation between the two functors. But just to convince ourselves, let’s verify the naturality condition.

fmap f . safeHead = safeHead . fmap f

We have two cases to consider; an empty list:

fmap f (safeHead []) = fmap f Nothing = Nothing
safeHead (fmap f []) = safeHead [] = Nothing

and a non-empty list:

fmap f (safeHead (x:xs)) = fmap f (Just x) = Just (f x)
safeHead (fmap f (x:xs)) = safeHead (f x : fmap f xs) = Just (f x)

I used the implementation of fmap for lists:

fmap f [] = []
fmap f (x:xs) = f x : fmap f xs

and for Maybe:

fmap f Nothing = Nothing
fmap f (Just x) = Just (f x)

An interesting case is when one of the functors is the trivial Const functor. A natural transformation from or to a Const functor looks just like a function that’s either polymorphic in its return type or in its argument type.

For instance, length can be thought of as a natural transformation from the list functor to the Const Int functor:

length :: [a] -> Const Int a
length [] = Const 0
length (x:xs) = Const (1 + unConst (length xs))

Here, unConst is used to peel off the Const constructor:

unConst :: Const c a -> c
unConst (Const x) = x

Of course, in practice length is defined as:

length :: [a] -> Int

which effectively hides the fact that it’s a natural transformation.

Finding a parametrically polymorphic function from a Const functor is a little harder, since it would require the creation of a value from nothing. The best we can do is:

scam :: Const Int a -> Maybe a
scam (Const x) = Nothing

Another common functor that we’ve seen already, and which will play an important role in the Yoneda lemma, is the Reader functor. I will rewrite its definition as a newtype:

newtype Reader e a = Reader (e -> a)

It is parameterized by two types, but is (covariantly) functorial only in the second one:

instance Functor (Reader e) where  
    fmap f (Reader g) = Reader (\x -> f (g x))

For every type e, you can define a family of natural transformations from Reader e to any other functor f. We’ll see later that the members of this family are always in one-to-one correspondence with the elements of f e (the Yoneda lemma).

For instance, consider the somewhat trivial unit type () with one element (). The functor Reader () takes any type a and maps it into a function type ()->a. These are just all the functions that pick a single element from the set a. There are as many of these as there are elements in a. Now let’s consider natural transformations from this functor to the Maybe functor:

alpha :: Reader () a -> Maybe a

There are only two of these, dumb and obvious:

dumb :: Reader () a -> Maybe a
dumb (Reader _) = Nothing

and

obvious :: Reader () a -> Maybe a
obvious (Reader g) = Just (g ())

(The only thing you can do with g is to apply it to the unit value ().)

And, indeed, as predicted by the Yoneda lemma, these correspond to the two elements of the Maybe () type, which are Nothing and Just (). We’ll come back to the Yoneda lemma later — this was just a little teaser.

Beyond Naturality

A parametrically polymorphic function between two functors (including the edge case of the Const functor) is always a natural transformation. Since all standard algebraic data types are functors, any polymorphic function between such types is a natural transformation.

We also have function types at our disposal, and those are functorial in their return type. We can use them to build functors (like the Reader functor) and define natural transformations that are higher-order functions.

However, function types are not covariant in the argument type. They are contravariant. Of course contravariant functors are equivalent to covariant functors from the opposite category. Polymorphic functions between two contravariant functors are still natural transformations in the categorical sense, except that they work on functors from the opposite category to Haskell types.

You might remember the example of a contravariant functor we’ve looked at before:

newtype Op r a = Op (a -> r)

This functor is contravariant in a:

instance Contravariant (Op r) where
    contramap f (Op g) = Op (g . f)

We can write a polymorphic function from, say, Op Bool to Op String:

predToStr :: Op Bool a -> Op String a
predToStr (Op f) = Op (\x -> if f x then "T" else "F")

But since the two functors are not covariant, this is not a natural transformation in Hask. However, because they are both contravariant, they satisfy the “opposite” naturality condition:

contramap f . predToStr = predToStr . contramap f

Notice that the function f must go in the opposite direction than what you’d use with fmap, because of the signature of contramap:

contramap :: (b -> a) -> (Op Bool a -> Op Bool b)

Are there any type constructors that are not functors, whether covariant or contravariant? Here’s one example:

a -> a

This is not a functor because the same type a is used both in the negative (contravariant) and positive (covariant) position. You can’t implement fmap or contramap for this type. Therefore a function of the signature:

(a -> a) -> f a

where f is an arbitrary functor, cannot be a natural transformation. Interestingly, there is a generalization of natural transformations, called dinatural transformations, that deals with such cases. We’ll get to them when we discuss ends.

Functor Category

Now that we have mappings between functors — natural transformations — it’s only natural to ask the question whether functors form a category. And indeed they do! There is one category of functors for each pair of categories, C and D. Objects in this category are functors from C to D, and morphisms are natural transformations between those functors.

We have to define composition of two natural transformations, but that’s quite easy. The components of natural transformations are morphisms, and we know how to compose morphisms.

Indeed, let’s take a natural transformation α from functor F to G. Its component at object a is some morphism:

αa :: F a -> G a

We’d like to compose α with β, which is a natural transformation from functor G to H. The component of β at a is a morphism:

βa :: G a -> H a

These morphisms are composable and their composition is another morphism:

βa ∘ αa :: F a -> H a

We will use this morphism as the component of the natural transformation β ⋅ α — the composition of two natural transformations β after α:

(β ⋅ α)a = βa ∘ αa

5_Vertical

One (long) look at a diagram convinces us that the result of this composition is indeed a natural transformation from F to H:

H f ∘ (β ⋅ α)a = (β ⋅ α)b ∘ F f

6_VerticalNaturality

Composition of natural transformations is associative, because their components, which are regular morphisms, are associative with respect to their composition.

Finally, for each functor F there is an identity natural transformation 1F whose components are the identity morphisms:

idF a :: F a -> F a

So, indeed, functors form a category.
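
In Haskell, where natural transformations are polymorphic functions, vertical composition is just function composition at every type. A sketch, assuming the RankNTypes extension:

{-# LANGUAGE RankNTypes #-}

type Nat f g = forall a. f a -> g a

-- Vertical composition: compose the components at each type.
vertComp :: Nat g h -> Nat f g -> Nat f h
vertComp beta alpha = beta . alpha

-- The identity natural transformation on any functor f.
idNat :: Nat f f
idNat = id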

A word about notation. Following Saunders Mac Lane I use the dot for the kind of natural transformation composition I have just described. The problem is that there are two ways of composing natural transformations. This one is called the vertical composition, because the functors are usually stacked up vertically in the diagrams that describe it. Vertical composition is important in defining the functor category. I’ll explain horizontal composition shortly.

6a_Vertical

The functor category between categories C and D is written as Fun(C, D), or [C, D], or sometimes as D^C. This last notation suggests that a functor category itself might be considered a function object (an exponential) in some other category. Is this indeed the case?

Let’s have a look at the hierarchy of abstractions that we’ve been building so far. We started with a category, which is a collection of objects and morphisms. Categories themselves (or, strictly speaking, small categories, whose objects form sets) are objects in a higher-level category Cat. Morphisms in that category are functors. A Hom-set in Cat is a set of functors. For instance, Cat(C, D) is a set of functors between two categories C and D.

7_CatHomSet

A functor category [C, D] is also a set of functors between two categories (plus natural transformations as morphisms). Its objects are the same as the members of Cat(C, D). Moreover, a functor category, being a category, must itself be an object of Cat (it so happens that the functor category between two small categories is itself small). We have a relationship between a Hom-set in a category and an object in the same category. The situation is exactly like the exponential object that we’ve seen in the last section. Let’s see how we can construct the latter in Cat.

As you may remember, in order to construct an exponential, we need to first define a product. In Cat, this turns out to be relatively easy, because small categories are sets of objects, and we know how to define cartesian products of sets. So an object in a product category C × D is just a pair of objects, (c, d), one from C and one from D. Similarly, a morphism between two such pairs, (c, d) and (c', d'), is a pair of morphisms, (f, g), where f :: c -> c' and g :: d -> d'. These pairs of morphisms compose component-wise, and there is always an identity pair that is just a pair of identity morphisms. To make the long story short, Cat is a full-blown cartesian closed category in which there is an exponential object D^C for any pair of categories. And by “object” in Cat I mean a category, so D^C is a category, which we can identify with the functor category between C and D.

2-Categories

With that out of the way, let’s have a closer look at Cat. By definition, any Hom-set in Cat is a set of functors. But, as we have seen, functors between two objects have a richer structure than just a set. They form a category, with natural transformations acting as morphisms. Since functors are considered morphisms in Cat, natural transformations are morphisms between morphisms.

This richer structure is an example of a 2-category, a generalization of a category where, besides objects and morphisms (which might be called 1-morphisms in this context), there are also 2-morphisms, which are morphisms between morphisms.

In the case of Cat seen as a 2-category we have:

  • Objects: (Small) categories
  • 1-morphisms: Functors between categories
  • 2-morphisms: Natural transformations between functors.

Instead of a Hom-set between two categories C and D, we have a Hom-category — the functor category D^C. We have regular functor composition: a functor F from D^C composes with a functor G from E^D to give G ∘ F from E^C. But we also have composition inside each Hom-category — vertical composition of natural transformations, or 2-morphisms, between functors.

8_Cat-2-Cat

With two kinds of composition in a 2-category, the question arises: How do they interact with each other?

Let’s pick two functors, or 1-morphisms, in Cat:

F :: C -> D
G :: D -> E

and their composition:

G ∘ F :: C -> E

Suppose we have two natural transformations, α and β, that act, respectively, on functors F and G:

α :: F -> F'
β :: G -> G'

10_Horizontal

Notice that we cannot apply vertical composition to this pair, because the target of α is different from the source of β. In fact they are members of two different functor categories: D^C and E^D. We can, however, apply composition to the functors F’ and G’, because the target of F’ is the source of G’ — it’s the category D. What’s the relation between the functors G’ ∘ F’ and G ∘ F?

Having α and β at our disposal, can we define a natural transformation from G ∘ F to G’∘ F’? Let me sketch the construction.

9_Horizontal

As usual, we start with an object a in C. Its image splits into two objects in D: F a and F'a. There is also a morphism, a component of α, connecting these two objects:

αa :: F a -> F'a

When going from D to E, these two objects split further into four objects:

G (F a), G'(F a), G (F'a), G'(F'a)

We also have four morphisms forming a square. Two of these morphisms are the components of the natural transformation β:

βF a :: G (F a) -> G'(F a)
βF'a :: G (F'a) -> G'(F'a)

The other two are the images of αa under the two functors (functors map morphisms):

G αa :: G (F a) -> G (F'a)
G'αa :: G'(F a) -> G'(F'a)

That’s a lot of morphisms. Our goal is to find a morphism that goes from G (F a) to G'(F'a), a candidate for the component of a natural transformation connecting the two functors G ∘ F and G’∘ F’. In fact there’s not one but two paths we can take from G (F a) to G'(F'a):

G'αa ∘ βF a
βF'a ∘ G αa

Luckily for us, they are equal, because the square we have formed turns out to be the naturality square for β.

We have just defined a component of a natural transformation from G ∘ F to G’∘ F’. The proof of naturality for this transformation is pretty straightforward, provided you have enough patience.

We call this natural transformation the horizontal composition of α and β:

β ∘ α :: G ∘ F -> G' ∘ F'
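
In Haskell, the component of the horizontal composition at a can be sketched directly. Here I take the first of the two (equal) paths, G'αa after βF a; RankNTypes assumed, names mine:

{-# LANGUAGE RankNTypes #-}

-- The component of (beta ∘ alpha) at a: apply beta at (f a),
-- then map alpha's component under g'.
horizComp :: Functor g'
          => (forall x. g x -> g' x) -- beta
          -> (forall x. f x -> f' x) -- alpha
          -> g (f a) -> g' (f' a)
horizComp beta alpha = fmap alpha . beta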

Again, following Mac Lane I use the small circle for horizontal composition, although you may also encounter star in its place.

Here’s a categorical rule of thumb: Every time you have composition, you should look for a category. We have vertical composition of natural transformations, and it’s part of the functor category. But what about the horizontal composition? What category does that live in?

The way to figure this out is to look at Cat sideways. Look at natural transformations not as arrows between functors but as arrows between categories. A natural transformation sits between two categories, the ones that are connected by the functors it transforms. We can think of it as connecting these two categories.

Sideways

Let’s focus on two objects of Cat — categories C and D. There is a set of natural transformations that go between functors that connect C to D. These natural transformations are our new arrows from C to D. By the same token, there are natural transformations going between functors that connect D to E, which we can treat as new arrows going from D to E. Horizontal composition is the composition of these arrows.

We also have an identity arrow going from C to C. It’s the identity natural transformation that maps the identity functor on C to itself. Notice that the identity for horizontal composition is also the identity for vertical composition, but not vice versa.

Finally, the two compositions satisfy the interchange law:

(β' ⋅ α') ∘ (β ⋅ α) = (β' ∘ β) ⋅ (α' ∘ α)

I will quote Saunders Mac Lane here: The reader may enjoy writing down the evident diagrams needed to prove this fact.

There is one more piece of notation that might come in handy in the future. In this new sideways interpretation of Cat there are two ways of getting from object to object: using a functor or using a natural transformation. We can, however, re-interpret the functor arrow as a special kind of natural transformation: the identity natural transformation acting on this functor. So you’ll often see this notation:

F ∘ α

where F is a functor from D to E, and α is a natural transformation between two functors going from C to D. Since you can’t compose a functor with a natural transformation, this is interpreted as a horizontal composition of the identity natural transformation 1F after α.

Similarly:

α ∘ F

is a horizontal composition of α after 1F.

Conclusion

This concludes the first part of the book. We’ve learned the basic vocabulary of category theory. You may think of objects and categories as nouns; and morphisms, functors, and natural transformations as verbs. Morphisms connect objects, functors connect categories, natural transformations connect functors.

But we’ve also seen that, what appears as an action at one level of abstraction, becomes an object at the next level. A set of morphisms turns into a function object. As an object, it can be a source or a target of another morphism. That’s the idea behind higher order functions.

A functor maps objects to objects, so we can use it as a type constructor, or a parametric type. A functor also maps morphisms, so it is a higher order function — fmap. There are some simple functors, like Const, product, and coproduct, that can be used to generate a large variety of algebraic data types. Function types are also functorial, both covariant and contravariant, and can be used to extend algebraic data types.

Functors may be looked upon as objects in the functor category. As such, they become sources and targets of morphisms: natural transformations. A natural transformation is a special type of polymorphic function.

Challenges

  1. Define a natural transformation from the Maybe functor to the list functor. Prove the naturality condition for it.
  2. Define at least two different natural transformations between Reader () and the list functor. How many different lists of () are there?
  3. Continue the previous exercise with Reader Bool and Maybe.
  4. Show that horizontal composition of natural transformations satisfies the naturality condition (hint: use components). It’s a good exercise in diagram chasing.
  5. Write a short essay about how you may enjoy writing down the evident diagrams needed to prove the interchange law.
  6. Create a few test cases for the opposite naturality condition of transformations between different Op functors. Here’s one choice:
    op :: Op Bool Int
    op = Op (\x -> x > 0)

    and

    f :: String -> Int
    f x = read x

Next: Declarative Programming.

Acknowledgments

I’d like to thank Gershom Bazerman for checking my math and logic, and André van Meulebrouck, who has been volunteering his editing help.


This is part 9 of Categories for Programmers. Previously: Functoriality. See the Table of Contents.

So far I’ve been glossing over the meaning of function types. A function type is different from other types.

Take Integer, for instance: It’s just a set of integers. Bool is a two-element set. But a function type a->b is more than that: it’s a set of morphisms between objects a and b. A set of morphisms between two objects in any category is called a hom-set. It just so happens that in the category Set every hom-set is itself an object in the same category — because it is, after all, a set.

Hom-set in Set is just a set

The same is not true of other categories where hom-sets are external to a category. They are even called external hom-sets.

Hom-set in category C is an external set

It’s the self-referential nature of the category Set that makes function types special. But there is a way, at least in some categories, to construct objects that represent hom-sets. Such objects are called internal hom-sets.

Universal Construction

Let’s forget for a moment that function types are sets and try to construct a function type, or more generally, an internal hom-set, from scratch. As usual, we’ll take our cues from the Set category, but carefully avoid using any properties of sets, so that the construction will automatically work for other categories.

A function type may be considered a composite type because of its relationship to the argument type and the result type. We’ve already seen the constructions of composite types — those that involved relationships between objects. We used universal constructions to define a product type and a coproduct type. We can use the same trick to define a function type. We will need a pattern that involves three objects: the function type that we are constructing, the argument type, and the result type.

The obvious pattern that connects these three types is called function application or evaluation. Given a candidate for a function type, let’s call it z (notice that, if we are not in the category Set, this is just an object like any other object), and the argument type a (an object), the application maps this pair to the result type b (an object). We have three objects, two of them fixed (the ones representing the argument type and the result type).

We also have the application, which is a mapping. How do we incorporate this mapping into our pattern? If we were allowed to look inside objects, we could pair a function f (an element of z) with an argument x (an element of a) and map it to f x (the application of f to x, which is an element of b).

In Set we can pick a function f from a set of functions z and we can pick an argument x from the set (type) a. We get an element f x in the set (type) b.

But instead of dealing with individual pairs (f, x), we can as well talk about the whole product of the function type z and the argument type a. The product z×a is an object, and we can pick, as our application morphism, an arrow g from that object to b. In Set, g would be the function that maps every pair (f, x) to f x.

So that’s the pattern: a product of two objects z and a connected to another object b by a morphism g.

A pattern of objects and morphisms that is the starting point of the universal construction

Is this pattern specific enough to single out the function type using a universal construction? Not in every category. But in the categories of interest to us it is. And another question: Would it be possible to define a function object without first defining a product? There are categories in which there is no product, or there isn’t a product for all pairs of objects. The answer is no: there is no function type if there is no product type. We’ll come back to this later when we talk about exponentials.

Let’s review the universal construction. We start with a pattern of objects and morphisms. That’s our imprecise query, and it usually yields lots and lots of hits. In particular, in Set, pretty much everything is connected to everything. We can take any object z, form its product with a, and there’s going to be a function from it to b (except when b is an empty set).

That’s when we apply our secret weapon: ranking. This is usually done by requiring that there be a mapping between candidate objects — a mapping that somehow factorizes our construction. In our case, we’ll decree that z together with the morphism g from z×a to b is better than some other z' with its own application g', if and only if there is a mapping h from z' to z such that the application of g' factors through the application of g. (Hint: Read this sentence while looking at the picture.)

Establishing a ranking between candidates for the function object

Now here’s the tricky part, and the main reason I postponed this particular universal construction till now. Given the morphism h :: z' -> z, we want to close the diagram that has both z' and z crossed with a. What we really need, given the mapping h from z' to z, is a mapping from z'×a to z×a. And now, after discussing the functoriality of the product, we know how to do it. Because the product itself is a functor (more precisely an endo-bi-functor), it’s possible to lift pairs of morphisms. In other words, we can define not only products of objects but also products of morphisms.

Since we are not touching the second component of the product z'×a, we will lift the pair of morphisms (h, id), where id is an identity on a.

So, here’s how we can factor one application, g, out of another application g':

g' = g ∘ (h × id)

The key here is the action of the product on morphisms.

The third part of the universal construction is selecting the object that is universally the best. Let’s call this object a⇒b (think of this as a symbolic name for one object, not to be confused with a Haskell typeclass constraint — I’ll discuss different ways of naming it later). This object comes with its own application — a morphism from (a⇒b)×a to b — which we will call eval. The object a⇒b is the best if any other candidate for a function object can be uniquely mapped to it in such a way that its application morphism g factorizes through eval. This object is better than any other object according to our ranking.

The definition of the universal function object. This is the same diagram as above, but now the object a⇒b is universal.

Formally:

A function object from a to b is an object a⇒b together with the morphism

eval :: ((a⇒b) × a) -> b

such that for any other object z with a morphism

g :: z × a -> b

there is a unique morphism

h :: z -> (a⇒b)

that factors g through eval:

g = eval ∘ (h × id)

Of course, there is no guarantee that such an object a⇒b exists for any pair of objects a and b in a given category. But it always does in Set. Moreover, in Set, this object is isomorphic to the hom-set Set(a, b).

This is why, in Haskell, we interpret the function type a->b as the categorical function object a⇒b.

Currying

Let’s have a second look at all the candidates for the function object. This time, however, let’s think of the morphism g as a function of two variables, z and a.

g :: z × a -> b

Being a morphism from a product comes as close as it gets to being a function of two variables. In particular, in Set, g is a function from pairs of values, one from the set z and one from the set a.

On the other hand, the universal property tells us that for each such g there is a unique morphism h that maps z to a function object a⇒b.

h :: z -> (a⇒b)

In Set, this just means that h is a function that takes one variable of type z and returns a function from a to b. That makes h a higher order function. Therefore the universal construction establishes a one-to-one correspondence between functions of two variables and functions of one variable returning functions. This correspondence is called currying, and h is called the curried version of g.

This correspondence is one-to-one, because given any g there is a unique h, and given any h you can always recreate the two-argument function g using the formula:

g = eval ∘ (h × id)

The function g can be called the uncurried version of h.

Currying is essentially built into the syntax of Haskell. A function returning a function:

a -> (b -> c)

is often thought of as a function of two variables. That’s how we read the un-parenthesized signature:

a -> b -> c

This interpretation is apparent in the way we define multi-argument functions. For instance:

catstr :: String -> String -> String
catstr s s' = s ++ s'

The same function can be written as a one-argument function returning a function — a lambda:

catstr' s = \s' -> s ++ s'

These two definitions are equivalent, and either can be partially applied to just one argument, producing a one-argument function, as in:

greet :: String -> String
greet = catstr "Hello "

Strictly speaking, a function of two variables is one that takes a pair (a product type):

(a, b) -> c

It’s trivial to convert between the two representations, and the two (higher-order) functions that do it are called, unsurprisingly, curry and uncurry:

curry :: ((a, b)->c) -> (a->b->c)
curry f a b = f (a, b)

and

uncurry :: (a->b->c) -> ((a, b)->c)
uncurry f (a, b) = f a b

Notice that curry is the factorizer for the universal construction of the function object. This is especially apparent if it’s rewritten in this form:

factorizer :: ((a, b)->c) -> (a->(b->c))
factorizer g = \a -> (\b -> g (a, b))

(As a reminder: A factorizer produces the factorizing function from a candidate.)

In non-functional languages, like C++, currying is possible but nontrivial. You can think of multi-argument functions in C++ as corresponding to Haskell functions taking tuples (although, to confuse things even more, in C++ you can define functions that take an explicit std::tuple, as well as variadic functions, and functions taking initializer lists).

You can partially apply a C++ function using the template std::bind. For instance, given a function of two strings:

std::string catstr(std::string s1, std::string s2) {
    return s1 + s2;
}

you can define a function of one string:

using namespace std::placeholders;

auto greet = std::bind(catstr, "Hello ", _1);
std::cout << greet("Haskell Curry");

Scala, which is more functional than C++ or Java, falls somewhere in between. If you anticipate that the function you’re defining will be partially applied, you define it with multiple argument lists:

def catstr(s1: String)(s2: String) = s1 + s2

Of course that requires some amount of foresight or prescience on the part of a library writer.

Exponentials

In mathematical literature, the function object, or the internal hom-object between two objects a and b, is often called the exponential and denoted by b^a. Notice that the argument type is in the exponent. This notation might seem strange at first, but it makes perfect sense if you think of the relationship between functions and products. We’ve already seen that we have to use the product in the universal construction of the internal hom-object, but the connection goes deeper than that.

This is best seen when you consider functions between finite types — types that have a finite number of values, like Bool, Char, or even Int or Double. Such functions, at least in principle, can be fully memoized or turned into data structures to be looked up. And this is the essence of the equivalence between functions, which are morphisms, and function types, which are objects.

For instance, a (pure) function from Bool is completely specified by a pair of values: one corresponding to False, and one corresponding to True. The set of all possible functions from Bool to, say, Int is the set of all pairs of Ints. This is the same as the product Int × Int or, being a little creative with notation, Int^2.
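
Here’s that equivalence spelled out in code. This is a minimal sketch; the names tabulate and index are purely illustrative (similar functions exist in libraries for representable functors, but nothing here depends on them):

-- A pure function from Bool is fully determined by two values
tabulate :: (Bool -> a) -> (a, a)
tabulate f = (f False, f True)

-- and a pair of values fully determines such a function
index :: (a, a) -> (Bool -> a)
index (x, _) False = x
index (_, y) True  = y

Composing the two in either order gives back what you started with, which is the isomorphism between Bool -> a and the product (a, a).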

For another example, let’s look at the C++ type char, which contains 256 values (Haskell Char is larger, because Haskell uses Unicode). There are several functions in the C++ Standard Library that are usually implemented using lookups. Functions like isupper or isspace are implemented using tables, which are equivalent to tuples of 256 Boolean values. A tuple is a product type, so we are dealing with products of 256 Booleans: bool × bool × bool × ... × bool. We know from arithmetic that an iterated product defines a power. If you “multiply” bool by itself 256 (or char) times, you get bool to the power of char, or bool^char.

How many values are there in the type defined as 256-tuples of bool? Exactly 2^256. This is also the number of different functions from char to bool, each function corresponding to a unique 256-tuple. You can similarly calculate that the number of functions from bool to char is 256^2, and so on. The exponential notation for function types makes perfect sense in these cases.

We probably wouldn’t want to fully memoize a function from int or double. But the equivalence between functions and data types, if not always practical, is there. There are also infinite types, for instance lists, strings, or trees. Eager memoization of functions from those types would require infinite storage. But Haskell is a lazy language, so the boundary between lazily evaluated (infinite) data structures and functions is fuzzy. This function vs. data duality explains the identification of Haskell’s function type with the categorical exponential object — which corresponds more to our idea of data.

Cartesian Closed Categories

Although I will continue using the category of sets as a model for types and functions, it’s worth mentioning that there is a larger family of categories that can be used for that purpose. These categories are called cartesian closed, and Set is just one example of such a category.

A cartesian closed category must contain:

  1. The terminal object,
  2. A product of any pair of objects, and
  3. An exponential for any pair of objects.

If you consider an exponential as an iterated product (possibly infinitely many times), then you can think of a cartesian closed category as one supporting products of an arbitrary arity. In particular, the terminal object can be thought of as a product of zero objects — or the zero-th power of an object.

What’s interesting about cartesian closed categories from the perspective of computer science is that they provide models for the simply typed lambda calculus, which forms the basis of all typed programming languages.

The terminal object and the product have their duals: the initial object and the coproduct. A cartesian closed category that also supports those two, and in which product can be distributed over coproduct

a × (b + c) = a × b + a × c
(b + c) × a = b × a + c × a

is called a bicartesian closed category. We’ll see in the next section that bicartesian closed categories, of which Set is a prime example, have some interesting properties.

Exponentials and Algebraic Data Types

The interpretation of function types as exponentials fits very well into the scheme of algebraic data types. It turns out that all the basic identities from high-school algebra relating the numbers zero and one, sums, products, and exponentials hold pretty much unchanged in any bicartesian closed category for, respectively, initial and final objects, coproducts, products, and exponentials. We don’t have the tools yet to prove them (such as adjunctions or the Yoneda lemma), but I’ll list them here nevertheless as a source of valuable intuitions.

Zeroth Power

a^0 = 1

In the categorical interpretation, we replace 0 with the initial object, 1 with the final object, and equality with isomorphism. The exponential is the internal hom-object. This particular exponential represents the set of morphisms going from the initial object to an arbitrary object a. By the definition of the initial object, there is exactly one such morphism, so the hom-set C(0, a) is a singleton set. A singleton set is the terminal object in Set, so this identity trivially works in Set. What we are saying is that it works in any bicartesian closed category.

In Haskell, we replace 0 with Void; 1 with the unit type (); and the exponential with the function type. The claim is that the set of functions from Void to any type a is equivalent to the unit type — which is a singleton. In other words, there is only one function Void->a. We’ve seen this function before: it’s called absurd.

This is a little bit tricky, for two reasons. One is that in Haskell we don’t really have uninhabited types — every type contains the “result of a never ending calculation,” or the bottom. The second reason is that all implementations of absurd are equivalent because, no matter what they do, nobody can ever execute them. There is no value that can be passed to absurd. (And if you manage to pass it a never ending calculation, it will never return!)

Powers of One

1^a = 1

This identity, when interpreted in Set, restates the definition of the terminal object: There is a unique morphism from any object to the terminal object. In general, the internal hom-object from a to the terminal object is isomorphic to the terminal object itself.

In Haskell, there is only one function from any type a to unit. We’ve seen this function before — it’s called unit. You can also think of it as the function const partially applied to ().
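
Spelled out, that’s about as small as Haskell functions get (the name unit follows the text; it’s not a Prelude function):

unit :: a -> ()
unit _ = ()

-- equivalently, const partially applied to ()
unit' :: a -> ()
unit' = const ()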

First Power

a^1 = a

This is a restatement of the observation that morphisms from the terminal object can be used to pick “elements” of the object a. The set of such morphisms is isomorphic to the object itself. In Set, and in Haskell, the isomorphism is between elements of the set a and functions that pick those elements, ()->a.
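
As a quick sketch, here are the two directions of this isomorphism (the names pick and unpick are purely illustrative):

-- every element of a gives a function that picks it
pick :: a -> (() -> a)
pick x = \() -> x

-- and every such function determines an element
unpick :: (() -> a) -> a
unpick f = f ()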

Exponentials of Sums

a^(b+c) = a^b × a^c

Categorically, this says that the exponential from a coproduct of two objects is isomorphic to a product of two exponentials. In Haskell, this algebraic identity has a very practical interpretation. It tells us that a function from a sum of two types is equivalent to a pair of functions from individual types. This is just the case analysis that we use when defining functions on sums. Instead of writing one function definition with a case statement, we usually split it into two (or more) functions dealing with each type constructor separately. For instance, take a function from the sum type (Either Int Double):

f :: Either Int Double -> String

It may be defined as a pair of functions from, respectively, Int and Double:

f (Left n)  = if n < 0 then "Negative int" else "Positive int"
f (Right x) = if x < 0.0 then "Negative double" else "Positive double"

Here, n is an Int and x is a Double.
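
The isomorphism itself can be witnessed directly. In this sketch, either comes from the Prelude, while split and unsplit are illustrative names:

-- a^(b+c) ≅ a^b × a^c, written on functions
split :: (Either b c -> a) -> (b -> a, c -> a)
split f = (f . Left, f . Right)

unsplit :: (b -> a, c -> a) -> (Either b c -> a)
unsplit (g, h) = either g h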

Exponentials of Exponentials

(a^b)^c = a^(b×c)

This is just a way of expressing currying purely in terms of exponential objects. A function returning a function is equivalent to a function from a product (a two-argument function).

Exponentials over Products

(a × b)^c = a^c × b^c

In Haskell: A function returning a pair is equivalent to a pair of functions, each producing one element of the pair.
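
Again, the witnesses are one-liners (the names fanOut and fanIn are illustrative; fanIn is the (&&&) combinator from Control.Arrow in disguise):

-- (a × b)^c ≅ a^c × b^c, written on functions
fanOut :: (c -> (a, b)) -> (c -> a, c -> b)
fanOut f = (fst . f, snd . f)

fanIn :: (c -> a, c -> b) -> (c -> (a, b))
fanIn (g, h) = \x -> (g x, h x)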

It’s pretty incredible how those simple high-school algebraic identities can be lifted to category theory and have practical application in functional programming.

Curry-Howard Isomorphism

I have already mentioned the correspondence between logic and algebraic data types. The Void type and the unit type () correspond to false and true. Product types and sum types correspond to logical conjunction ∧ (AND) and disjunction ⋁ (OR). In this scheme, the function type we have just defined corresponds to logical implication ⇒. In other words, the type a->b can be read as “if a then b.”

According to the Curry-Howard isomorphism, every type can be interpreted as a proposition — a statement or a judgment that may be true or false. Such a proposition is considered true if the type is inhabited and false if it isn’t. In particular, a logical implication is true if the function type corresponding to it is inhabited, which means that there exists a function of that type. An implementation of a function is therefore a proof of a theorem. Writing programs is equivalent to proving theorems. Let’s see a few examples.

Let’s take the function eval we have introduced in the definition of the function object. Its signature is:

eval :: ((a -> b), a) -> b

It takes a pair consisting of a function and its argument and produces a result of the appropriate type. It’s the Haskell implementation of the morphism:

eval :: (a⇒b) × a -> b

which defines the function type a⇒b (or the exponential object b^a). Let’s translate this signature to a logical predicate using the Curry-Howard isomorphism:

((a ⇒ b) ∧ a) ⇒ b

Here’s how you can read this statement: If it’s true that b follows from a, and a is true, then b must be true. This makes perfect intuitive sense and has been known since antiquity as modus ponens. We can prove this theorem by implementing the function:

eval :: ((a -> b), a) -> b
eval (f, x) = f x

If you give me a pair consisting of a function f taking a and returning b, and a concrete value x of type a, I can produce a concrete value of type b by simply applying the function f to x. By implementing this function I have just shown that the type ((a -> b), a) -> b is inhabited. Therefore modus ponens is true in our logic.

How about a predicate that is blatantly false? For instance: if a or b is true then a must be true.

a ⋁ b ⇒ a

This is obviously wrong because you can choose an a that is false and a b that is true, and that’s a counter-example.

Mapping this predicate into a function signature using the Curry-Howard isomorphism, we get:

Either a b -> a

Try as you may, you can’t implement this function — you can’t produce a value of type a if you are called with the Right value. (Remember, we are talking about pure functions.)

Finally, we come to the meaning of the absurd function:

absurd :: Void -> a

Considering that Void translates into false, we get:

false ⇒ a

Anything follows from falsehood (ex falso quodlibet). Here’s one possible proof (implementation) of this statement (function) in Haskell:

absurd (Void a) = absurd a

where Void is defined as:

newtype Void = Void Void

As always, the type Void is tricky. This definition makes it impossible to construct a value because in order to construct one, you would need to provide one. Therefore, the function absurd can never be called.

These are all interesting examples, but is there a practical side to Curry-Howard isomorphism? Probably not in everyday programming. But there are programming languages like Agda or Coq, which take advantage of the Curry-Howard isomorphism to prove theorems.

Computers are not only helping mathematicians do their work — they are revolutionizing the very foundations of mathematics. The latest hot research topic in that area is called Homotopy Type Theory, and is an outgrowth of type theory. It’s full of Booleans, integers, products and coproducts, function types, and so on. And, as if to dispel any doubts, the theory is being formulated in Coq and Agda. Computers are revolutionizing the world in more than one way.

Bibliography

  1. Ralph Hinze, Daniel W. H. James, Reason Isomorphically!. This paper contains proofs of all those high-school algebraic identities in category theory that I mentioned in this chapter.

Next: Natural Transformations.

Acknowledgments

I’d like to thank Gershom Bazerman for checking my math and logic, and André van Meulebrouck, who has been volunteering his editing help throughout this series of posts.


This is part 8 of Categories for Programmers. Previously: Functors. See the Table of Contents.

Now that you know what a functor is, and have seen a few examples, let’s see how we can build larger functors from smaller ones. In particular it’s interesting to see which type constructors (which correspond to mappings between objects in a category) can be extended to functors (which include mappings between morphisms).

Bifunctors

Since functors are morphisms in Cat (the category of categories), a lot of intuitions about morphisms — and functions in particular — apply to functors as well. For instance, just like you can have a function of two arguments, you can have a functor of two arguments, or a bifunctor. On objects, a bifunctor maps every pair of objects, one from category C, and one from category D, to an object in category E. Notice that this is just saying that it’s a mapping from a cartesian product of categories C×D to E.

Bifunctor

That’s pretty straightforward. But functoriality means that a bifunctor has to map morphisms as well. This time, though, it must map a pair of morphisms, one from C and one from D, to a morphism in E.

Again, a pair of morphisms is just a single morphism in the product category C×D. We define a morphism in a cartesian product of categories as a pair of morphisms which goes from one pair of objects to another pair of objects. These pairs of morphisms can be composed in the obvious way:

(f, g) ∘ (f', g') = (f ∘ f', g ∘ g')

The composition is associative and it has an identity — a pair of identity morphisms (id, id). So a cartesian product of categories is indeed a category.
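
Reading morphisms as Haskell functions, composition in the product category is just pairwise composition. A small sketch (compPair and idPair are made-up names):

-- (f, g) ∘ (f', g') = (f ∘ f', g ∘ g')
compPair :: (b -> c, b' -> c') -> (a -> b, a' -> b') -> (a -> c, a' -> c')
compPair (f, g) (f', g') = (f . f', g . g')

-- the identity is a pair of identities
idPair :: (a -> a, b -> b)
idPair = (id, id)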

An easier way to think about bifunctors would be to consider them functors in each argument separately. So instead of translating the functor laws — composition and identity preservation — from functors to bifunctors, it would be enough to check them separately for each argument. However, in general, separate functoriality is not enough to prove joint functoriality. Categories in which joint functoriality fails are called premonoidal.

Let’s define a bifunctor in Haskell. In this case all three categories are the same: the category of Haskell types. A bifunctor is a type constructor that takes two type arguments. Here’s the definition of the Bifunctor typeclass taken directly from the library Data.Bifunctor:

class Bifunctor f where
    bimap :: (a -> c) -> (b -> d) -> f a b -> f c d
    bimap g h = first g . second h
    first :: (a -> c) -> f a b -> f c b
    first g = bimap g id
    second :: (b -> d) -> f a b -> f a d
    second = bimap id

The type variable f represents the bifunctor. You can see that in all type signatures it’s always applied to two type arguments. The first type signature defines bimap: a mapping of two functions at once. The result is a lifted function, (f a b -> f c d), operating on types generated by the bifunctor’s type constructor. There is a default implementation of bimap in terms of first and second, which shows that it’s enough to have functoriality in each argument separately to be able to define a bifunctor. (As mentioned before, this doesn’t always work, because the two maps may not commute, that is first g . second h may not be the same as second h . first g.)

bimap

The two other type signatures, first and second, are the two fmaps witnessing the functoriality of f in the first and the second argument, respectively.

first

second

The typeclass definition provides default implementations for both of them in terms of bimap.

When declaring an instance of Bifunctor, you have a choice of either implementing bimap and accepting the defaults for first and second, or implementing both first and second and accepting the default for bimap (of course, you may implement all three of them, but then it’s up to you to make sure they are related to each other in this manner).

Product and Coproduct Bifunctors

An important example of a bifunctor is the categorical product — a product of two objects that is defined by a universal construction. If the product exists for any pair of objects, the mapping from those objects to the product is bifunctorial. This is true in general, and in Haskell in particular. Here’s the Bifunctor instance for a pair constructor — the simplest product type:

instance Bifunctor (,) where
    bimap f g (x, y) = (f x, g y)

There isn’t much choice: bimap simply applies the first function to the first component, and the second function to the second component of a pair. The code pretty much writes itself, given the types:

bimap :: (a -> c) -> (b -> d) -> (a, b) -> (c, d)

The action of the bifunctor here is to make pairs of types, for instance:

(,) a b = (a, b)

By duality, a coproduct, if it’s defined for every pair of objects in a category, is also a bifunctor. In Haskell, this is exemplified by the Either type constructor being an instance of Bifunctor:

instance Bifunctor Either where
    bimap f _ (Left x)  = Left (f x)
    bimap _ g (Right y) = Right (g y)

This code also writes itself.

Now, remember when we talked about monoidal categories? A monoidal category defines a binary operator acting on objects, together with a unit object. I mentioned that Set is a monoidal category with respect to cartesian product, with the singleton set as a unit. And it’s also a monoidal category with respect to disjoint union, with the empty set as a unit. What I haven’t mentioned is that one of the requirements for a monoidal category is that the binary operator be a bifunctor. This is a very important requirement — we want the monoidal product to be compatible with the structure of the category, which is defined by morphisms. We are now one step closer to the full definition of a monoidal category (we still need to learn about naturality, before we can get there).

Functorial Algebraic Data Types

We’ve seen several examples of parameterized data types that turned out to be functors — we were able to define fmap for them. Complex data types are constructed from simpler data types. In particular, algebraic data types (ADTs) are created using sums and products. We have just seen that sums and products are functorial. We also know that functors compose. So if we can show that the basic building blocks of ADTs are functorial, we’ll know that parameterized ADTs are functorial too.

So what are the building blocks of parameterized algebraic data types? First, there are the items that have no dependency on the type parameter of the functor, like Nothing in Maybe, or Nil in List. They are equivalent to the Const functor. Remember, the Const functor ignores its type parameter (really, the second type parameter, which is the one of interest to us, the first one being kept constant).

Then there are the elements that simply encapsulate the type parameter itself, like Just in Maybe. They are equivalent to the identity functor. I mentioned the identity functor previously, as the identity morphism in Cat, but didn’t give its definition in Haskell. Here it is:

data Identity a = Identity a
instance Functor Identity where
    fmap f (Identity x) = Identity (f x)

You can think of Identity as the simplest possible container that always stores just one (immutable) value of type a.

Everything else in algebraic data structures is constructed from these two primitives using products and sums.

With this new knowledge, let’s have a fresh look at the Maybe type constructor:

data Maybe a = Nothing | Just a

It’s a sum of two types, and we now know that the sum is functorial. The first part, Nothing, can be represented as a Const () acting on a (the first type parameter of Const is set to unit — later we’ll see more interesting uses of Const). The second part is just a different name for the identity functor. We could have defined Maybe, up to isomorphism, as (the prime avoids a clash with the Prelude’s Maybe):

type Maybe' a = Either (Const () a) (Identity a)

So Maybe is the composition of the bifunctor Either with two functors, Const () and Identity. (Const is really a bifunctor, but here we always use it partially applied.)

We’ve already seen that a composition of functors is a functor — we can easily convince ourselves that the same is true of bifunctors. All we need is to figure out how a composition of a bifunctor with two functors works on morphisms. Given two morphisms, we simply lift one with one functor and the other with the other functor. We then lift the resulting pair of lifted morphisms with the bifunctor.

We can express this composition in Haskell. Let’s define a data type that is parameterized by a bifunctor bf (it’s a type variable that is a type constructor that takes two types as arguments), two functors fu and gu (type constructors that take one type variable each), and two regular types a and b. We apply fu to a and gu to b, and then apply bf to the resulting two types:

newtype BiComp bf fu gu a b = BiComp (bf (fu a) (gu b))

That’s the composition on objects, or types. Notice how in Haskell we apply type constructors to types, just like we apply functions to arguments. The syntax is the same.

If you’re getting a little lost, try applying BiComp to Either, Const (), Identity, a, and b, in this order. You will recover our bare-bone version of Maybe b (a is ignored).

The new data type BiComp is a bifunctor in a and b, but only if bf is itself a Bifunctor and fu and gu are Functors. The compiler must know that there will be a definition of bimap available for bf, and definitions of fmap for fu and gu. In Haskell, this is expressed as a precondition in the instance declaration: a set of class constraints followed by a double arrow:

instance (Bifunctor bf, Functor fu, Functor gu) =>
  Bifunctor (BiComp bf fu gu) where
    bimap f1 f2 (BiComp x) = BiComp ((bimap (fmap f1) (fmap f2)) x)

The implementation of bimap for BiComp is given in terms of bimap for bf and the two fmaps for fu and gu. The compiler automatically infers all the types and picks the correct overloaded functions whenever BiComp is used.

The x in the definition of bimap has the type:

bf (fu a) (gu b)

which is quite a mouthful. The outer bimap breaks through the outer bf layer, and the two fmaps dig under fu and gu, respectively. If the types of f1 and f2 are:

f1 :: a -> a'
f2 :: b -> b'

then the final result is of the type bf (fu a') (gu b'):

bimap_bf :: (fu a -> fu a') -> (gu b -> gu b')
  -> bf (fu a) (gu b) -> bf (fu a') (gu b')

If you like jigsaw puzzles, these kinds of type manipulations can provide hours of entertainment.

So it turns out that we didn’t have to prove that Maybe was a functor — this fact followed from the way it was constructed as a sum of two functorial primitives.

A perceptive reader might ask the question: If the derivation of the Functor instance for algebraic data types is so mechanical, can’t it be automated and performed by the compiler? Indeed, it can, and it is. You need to enable a particular Haskell extension by including this line at the top of your source file:

{-# LANGUAGE DeriveFunctor #-}

and then add deriving Functor to your data structure:

data Maybe a = Nothing | Just a
  deriving Functor

and the corresponding fmap will be implemented for you.

The regularity of algebraic data structures makes it possible to derive instances not only of Functor but of several other type classes, including the Eq type class I mentioned before. There is also the option of teaching the compiler to derive instances of your own typeclasses, but that’s a bit more advanced. The idea though is the same: You provide the behavior for the basic building blocks and sums and products, and let the compiler figure out the rest.

Functors in C++

If you are a C++ programmer, you obviously are on your own as far as implementing functors goes. However, you should be able to recognize some types of algebraic data structures in C++. If such a data structure is made into a generic template, you should be able to quickly implement fmap for it.

Let’s have a look at a tree data structure, which we would define in Haskell as a recursive sum type:

data Tree a = Leaf a | Node (Tree a) (Tree a)
    deriving Functor

As I mentioned before, one way of implementing sum types in C++ is through class hierarchies. It would be natural, in an object-oriented language, to implement fmap as a virtual function of the base class Functor and then override it in all subclasses. Unfortunately this is impossible because fmap is a template, parameterized not only by the type of the object it’s acting upon (the this pointer) but also by the return type of the function that’s been applied to it. Virtual functions cannot be templatized in C++. We’ll implement fmap as a generic free function, and we’ll replace pattern matching with dynamic_cast.

The base class must define at least one virtual function in order to support dynamic casting, so we’ll make the destructor virtual (which is a good idea in any case):

template<class T>
struct Tree {
    virtual ~Tree() {};
};

The Leaf is just an Identity functor in disguise:

template<class T>
struct Leaf : public Tree<T> {
    T _label;
    Leaf(T l) : _label(l) {}
};

The Node is a product type:

template<class T>
struct Node : public Tree<T> {
    Tree<T> * _left;
    Tree<T> * _right;
    Node(Tree<T> * l, Tree<T> * r) : _left(l), _right(r) {}
};

When implementing fmap we take advantage of dynamic dispatching on the type of the Tree. The Leaf case applies the Identity version of fmap, and the Node case is treated like a bifunctor composed with two copies of the Tree functor. As a C++ programmer, you’re probably not used to analyzing code in these terms, but it’s a good exercise in categorical thinking.

template<class A, class B>
Tree<B> * fmap(std::function<B(A)> f, Tree<A> * t)
{
    Leaf<A> * pl = dynamic_cast <Leaf<A>*>(t);
    if (pl)
        return new Leaf<B>(f (pl->_label));
    Node<A> * pn = dynamic_cast<Node<A>*>(t);
    if (pn)
        return new Node<B>( fmap<A>(f, pn->_left)
                          , fmap<A>(f, pn->_right));
    return nullptr;
}

For simplicity, I decided to ignore memory and resource management issues, but in production code you would probably use smart pointers (unique or shared, depending on your policy).

Compare it with the Haskell implementation of fmap:

instance Functor Tree where
    fmap f (Leaf a) = Leaf (f a)
    fmap f (Node t t') = Node (fmap f t) (fmap f t')

This implementation can also be automatically derived by the compiler.

The Writer Functor

I promised that I would come back to the Kleisli category I described earlier. Morphisms in that category were represented as “embellished” functions returning the Writer data structure.

type Writer a = (a, String)

I said that the embellishment was somehow related to endofunctors. And, indeed, the Writer type constructor is functorial in a. We don’t even have to implement fmap for it, because it’s just a simple product type.
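
If we did write it out, it would just map over the first component. A sketch (since Writer here is a type synonym for a pair, this is the pair functor in its first argument, written as a standalone function):

fmapWriter :: (a -> b) -> Writer a -> Writer b
fmapWriter f (x, s) = (f x, s)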

But what’s the relation between a Kleisli category and a functor — in general? A Kleisli category, being a category, defines composition and identity. Let me remind you that the composition is given by the fish operator:

(>=>) :: (a -> Writer b) -> (b -> Writer c) -> (a -> Writer c)
m1 >=> m2 = \x -> 
    let (y, s1) = m1 x
        (z, s2) = m2 y
    in (z, s1 ++ s2)

and the identity morphism by a function called return:

return :: a -> Writer a
return x = (x, "")

It turns out that, if you look at the types of these two functions long enough (and I mean, long enough), you can find a way to combine them to produce a function with the right type signature to serve as fmap. Like this:

fmap f = id >=> (\x -> return (f x))

Here, the fish operator combines two functions: one of them is the familiar id, and the other is a lambda that applies return to the result of acting with f on the lambda’s argument. The hardest part to wrap your brain around is probably the use of id. Isn’t the argument to the fish operator supposed to be a function that takes a “normal” type and returns an embellished type? Well, not really. Nobody says that a in a -> Writer b must be a “normal” type. It’s a type variable, so it can be anything, in particular it can be an embellished type, like Writer b.

So id will take Writer a and turn it into Writer a. The fish operator will fish out the value of a and pass it as x to the lambda. There, f will turn it into a b and return will embellish it, making it Writer b. Putting it all together, we end up with a function that takes Writer a and returns Writer b, exactly what fmap is supposed to produce.

Notice that this argument is very general: you can replace Writer with any type constructor. As long as it supports a fish operator and return, you can define fmap as well. So the embellishment in the Kleisli category is always a functor. (Not every functor, though, gives rise to a Kleisli category.)

You might wonder if the fmap we have just defined is the same fmap the compiler would have derived for us with deriving Functor. Interestingly enough, it is. This is due to the way Haskell implements polymorphic functions. It’s called parametric polymorphism, and it’s a source of so called theorems for free. One of those theorems says that, if there is an implementation of fmap for a given type constructor, one that preserves identity, then it must be unique.

Covariant and Contravariant Functors

Now that we’ve reviewed the writer functor, let’s go back to the reader functor. It was based on the partially applied function-arrow type constructor:

(->) r

We can rewrite it as a type synonym:

type Reader r a = r -> a

for which the Functor instance, as we’ve seen before, reads:

instance Functor (Reader r) where
    fmap f g = f . g

But just like the pair type constructor, or the Either type constructor, the function type constructor takes two type arguments. The pair and Either were functorial in both arguments — they were bifunctors. Is the function constructor a bifunctor too?

Let’s try to make it functorial in the first argument. We’ll start with a wrapper that is just like the Reader but with the arguments flipped (a newtype rather than a type synonym, because Haskell won’t accept an instance declaration for a partially applied type synonym):

newtype Op r a = Op (a -> r)

This time we fix the return type, r, and vary the argument type, a. Let’s see if we can somehow match the types in order to implement fmap, which would have the following type signature:

fmap :: (a -> b) -> (a -> r) -> (b -> r)

With just two functions taking a and returning, respectively, b and r, there is simply no way to build a function taking b and returning r! It would be different if we could somehow invert the first function, so that it took b and returned a instead. We can’t invert an arbitrary function, but we can go to the opposite category.

A short recap: For every category C there is a dual category C^op. It’s a category with the same objects as C, but with all the arrows reversed.

Consider a functor that goes between C^op and some other category D:

F :: C^op → D

Such a functor maps a morphism f^op :: a → b in C^op to the morphism F f^op :: F a → F b in D. But the morphism f^op secretly corresponds to some morphism f :: b → a in the original category C. Notice the inversion.

Now, F is a regular functor, but there is another mapping we can define based on F, which is not a functor — let’s call it G. It’s a mapping from C to D. It maps objects the same way F does, but when it comes to mapping morphisms, it reverses them. It takes a morphism f :: b → a in C, maps it first to the opposite morphism f^op :: a → b and then uses the functor F on it, to get F f^op :: F a → F b.

Contravariant

Considering that F a is the same as G a and F b is the same as G b, the whole trip can be described as:

G f :: (b → a) → (G a → G b)

It’s a “functor with a twist.” A mapping of categories that inverts the direction of morphisms in this manner is called a contravariant functor. Notice that a contravariant functor is just a regular functor from the opposite category. The regular functors, by the way — the kind we’ve been studying thus far — are called covariant functors.

Here’s the typeclass defining a contravariant functor (really, a contravariant endofunctor) in Haskell:

class Contravariant f where
    contramap :: (b -> a) -> (f a -> f b)

Our type constructor Op is an instance of it:

instance Contravariant (Op r) where
    -- contramap :: (b -> a) -> Op r a -> Op r b
    contramap f (Op g) = Op (g . f)

Notice that the function f inserts itself before (that is, to the right of) the contents of Op — the function g.

The definition of contramap for Op may be made even terser if you notice that, the newtype wrapper aside, it’s just the function composition operator with the arguments flipped. There is a special function for flipping arguments, called flip:

flip :: (a -> b -> c) -> (b -> a -> c)
flip f y x = f x y

With it (ignoring the wrapping and unwrapping of Op), we get:

contramap = flip (.)

Profunctors

We’ve seen that the function-arrow operator is contravariant in its first argument and covariant in the second. Is there a name for such a beast? It turns out that, if the target category is Set, such a beast is called a profunctor. Because a contravariant functor is equivalent to a covariant functor from the opposite category, a profunctor is defined as:

C^op × D → Set

Since, to first approximation, Haskell types are sets, we apply the name Profunctor to a type constructor p of two arguments, which is contra-functorial in the first argument and functorial in the second. Here’s the appropriate typeclass taken from the Data.Profunctor library:

class Profunctor p where
  dimap :: (a -> b) -> (c -> d) -> p b c -> p a d
  dimap f g = lmap f . rmap g
  lmap :: (a -> b) -> p b c -> p a c
  lmap f = dimap f id
  rmap :: (b -> c) -> p a b -> p a c
  rmap = dimap id

All three functions come with default implementations. Just like with Bifunctor, when declaring an instance of Profunctor, you have a choice of either implementing dimap and accepting the defaults for lmap and rmap, or implementing both lmap and rmap and accepting the default for dimap.

dimap

dimap

Now we can assert that the function-arrow operator is an instance of a Profunctor:

instance Profunctor (->) where
  dimap ab cd bc = cd . bc . ab
  lmap = flip (.)
  rmap = (.)
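
Here’s a quick usage sketch of dimap at this instance (the function doubleDigits is made up for illustration): it pre-composes length with show and post-composes it with (*2):

import Data.Profunctor (dimap)

-- (*2) . length . show, written profunctorially
doubleDigits :: Int -> Int
doubleDigits = dimap show (*2) length

-- doubleDigits 12345 == 10  (five digits, doubled)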

Profunctors have their application in the Haskell lens library. We’ll see them again when we talk about ends and coends.

The Hom-Functor

The above examples are the reflection of a more general statement that the mapping that takes a pair of objects a and b and assigns to it the set of morphisms between them, the hom-set C(a, b), is a functor. It is a functor from the product category C^op × C to the category of sets, Set.

Let’s define its action on morphisms. A morphism in C^op × C is a pair of morphisms from C:

f :: a' → a
g :: b → b'

The lifting of this pair must be a morphism (a function) from the set C(a, b) to the set C(a', b'). Just pick any element h of C(a, b) (it’s a morphism from a to b) and assign to it:

g ∘ h ∘ f

which is an element of C(a', b').

As you can see, the hom-functor is a special case of a profunctor.

Challenges

  1. Show that the data type:
    data Pair a b = Pair a b

    is a bifunctor. For additional credit implement all three methods of Bifunctor and use equational reasoning to show that these definitions are compatible with the default implementations whenever they can be applied.

  2. Show the isomorphism between the standard definition of Maybe and this desugaring:
    type Maybe' a = Either (Const () a) (Identity a)

    Hint: Define two mappings between the two implementations. For additional credit, show that they are the inverse of each other using equational reasoning.

  3. Let’s try another data structure. I call it a PreList because it’s a precursor to a List. It replaces recursion with a type parameter b.
    data PreList a b = Nil | Cons a b

    You could recover our earlier definition of a List by recursively applying PreList to itself (we’ll see how it’s done when we talk about fixed points).

    Show that PreList is an instance of Bifunctor.

  4. Show that the following data types define bifunctors in a and b:
    data K2 c a b = K2 c
    data Fst a b = Fst a
    data Snd a b = Snd b

    For additional credit, check your solutions against Conor McBride’s paper Clowns to the Left of me, Jokers to the Right.

  5. Define a bifunctor in a language other than Haskell. Implement bimap for a generic pair in that language.
  6. Should std::map be considered a bifunctor or a profunctor in the two template arguments Key and T? How would you redesign this data type to make it so?

Next: Function Types.

Acknowledgment

As usual, big thanks go to Gershom Bazerman for reviewing this article.


This is part 7 of Categories for Programmers. Previously: Simple Algebraic Data Types. See the Table of Contents.

At the risk of sounding like a broken record, I will say this about functors: A functor is a very simple but powerful idea. Category theory is just full of those simple but powerful ideas. A functor is a mapping between categories. Given two categories, C and D, a functor F maps objects in C to objects in D — it’s a function on objects. If a is an object in C, we’ll write its image in D as F a (no parentheses). But a category is not just objects — it’s objects and morphisms that connect them. A functor also maps morphisms — it’s a function on morphisms. But it doesn’t map morphisms willy-nilly — it preserves connections. So if a morphism f in C connects object a to object b,

f :: a -> b

the image of f in D, F f, will connect the image of a to the image of b:

F f :: F a -> F b

(This is a mixture of mathematical and Haskell notation that hopefully makes sense by now. I won’t use parentheses when applying functors to objects or morphisms.)

As you can see, a functor preserves the structure of a category: what’s connected in one category will be connected in the other category. But there’s something more to the structure of a category: there’s also the composition of morphisms. If h is a composition of f and g:

h = g . f

we want its image under F to be a composition of the images of f and g:

F h = F g . F f

Finally, we want all identity morphisms in C to be mapped to identity morphisms in D:

F id_a = id_(F a)

Here, id_a is the identity at the object a, and id_(F a) the identity at F a.

Note that these conditions make functors much more restrictive than regular functions. Functors must preserve the structure of a category. If you picture a category as a collection of objects held together by a network of morphisms, a functor is not allowed to introduce any tears into this fabric. It may smash objects together, it may glue multiple morphisms into one, but it may never break things apart. This no-tearing constraint is similar to the continuity condition you might know from calculus. In this sense functors are “continuous” (although there exists an even more restrictive notion of continuity for functors).

Just like functions, functors may do both collapsing and embedding. The embedding aspect is more prominent when the source category is much smaller than the target category. In the extreme, the source can be the trivial singleton category — a category with one object and one morphism (the identity). A functor from the singleton category to any other category simply selects an object in that category. This is fully analogous to the property of morphisms from singleton sets selecting elements in target sets.

The maximally collapsing functor is called the constant functor Δ_c. It maps every object in the source category to one selected object c in the target category. It also maps every morphism in the source category to the identity morphism id_c. It acts like a black hole, compacting everything into one singularity. We’ll see more of this functor when we discuss limits and colimits.
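
In Haskell, the counterpart of the constant functor is the Const type constructor, which we’ll meet again shortly. As a quick sketch of its definition:

data Const c a = Const c

-- fmap discards the function; there is nothing of type a to apply it to
instance Functor (Const c) where
    fmap _ (Const x) = Const x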

Functors in Programming

Let’s get down to earth and talk about programming. We have our category of types and functions. We can talk about functors that map this category into itself — such functors are called endofunctors. So what’s an endofunctor in the category of types? First of all, it maps types to types. We’ve seen examples of such mappings, maybe without realizing that they were just that. I’m talking about definitions of types that were parameterized by other types. Let’s see a few examples.

The Maybe Functor

The definition of Maybe is a mapping from type a to type Maybe a:

data Maybe a = Nothing | Just a

Here’s an important subtlety: Maybe itself is not a type, it’s a type constructor. You have to give it a type argument, like Int or Bool, in order to turn it into a type. Maybe without any argument represents a function on types. But can we turn Maybe into a functor? (From now on, when I speak of functors in the context of programming, I will almost always mean endofunctors.) A functor is not only a mapping of objects (here, types) but also a mapping of morphisms (here, functions). For any function from a to b:

f :: a -> b

we would like to produce a function from Maybe a to Maybe b. To define such a function, we’ll have two cases to consider, corresponding to the two constructors of Maybe. The Nothing case is simple: we’ll just return Nothing back. And if the argument is Just, we’ll apply the function f to its contents. So the image of f under Maybe is the function:

f' :: Maybe a -> Maybe b
f' Nothing = Nothing
f' (Just x) = Just (f x)

(By the way, in Haskell you can use apostrophes in variable names, which is very handy in cases like these.) In Haskell, we implement the morphism-mapping part of a functor as a higher order function called fmap. In the case of Maybe, it has the following signature:

fmap :: (a -> b) -> (Maybe a -> Maybe b)

We often say that fmap lifts a function. The lifted function acts on Maybe values. As usual, because of currying, this signature may be interpreted in two ways: as a function of one argument — which itself is a function (a->b) — returning a function (Maybe a -> Maybe b); or as a function of two arguments returning Maybe b:

fmap :: (a -> b) -> Maybe a -> Maybe b

Based on our previous discussion, this is how we implement fmap for Maybe:

fmap _ Nothing = Nothing
fmap f (Just x) = Just (f x)

To show that the type constructor Maybe together with the function fmap form a functor, we have to prove that fmap preserves identity and composition. These are called “the functor laws,” but they simply ensure the preservation of the structure of the category.

Equational Reasoning

To prove the functor laws, I will use equational reasoning, which is a common proof technique in Haskell. It takes advantage of the fact that Haskell functions are defined as equalities: the left hand side equals the right hand side. You can always substitute one for another, possibly renaming variables to avoid name conflicts. Think of this as either inlining a function, or the other way around, refactoring an expression into a function. Let’s take the identity function as an example:

id x = x

If you see, for instance, id y in some expression, you can replace it with y (inlining). Further, if you see id applied to an expression, say id (y + 2), you can replace it with the expression itself (y + 2). And this substitution works both ways: you can replace any expression e with id e (refactoring). If a function is defined by pattern matching, you can use each sub-definition independently. For instance, given the above definition of fmap you can replace fmap f Nothing with Nothing, or the other way around. Let’s see how this works in practice. Let’s start with the preservation of identity:

fmap id = id

There are two cases to consider: Nothing and Just. Here’s the first case (I’m using Haskell pseudo-code to transform the left hand side to the right hand side):

  fmap id Nothing 
= { definition of fmap }
  Nothing 
= { definition of id }
  id Nothing

Notice that in the last step I used the definition of id backwards. I replaced the expression Nothing with id Nothing. In practice, you carry out such proofs by “burning the candle at both ends,” until you hit the same expression in the middle — here it was Nothing. The second case is also easy:

  fmap id (Just x) 
= { definition of fmap }
  Just (id x) 
= { definition of id }
  Just x
= { definition of id }
  id (Just x)

Now, let’s show that fmap preserves composition:

fmap (g . f) = fmap g . fmap f

First the Nothing case:

  fmap (g . f) Nothing 
= { definition of fmap }
  Nothing 
= { definition of fmap }
  fmap g Nothing
= { definition of fmap }
  fmap g (fmap f Nothing)

And then the Just case:

  fmap (g . f) (Just x)
= { definition of fmap }
  Just ((g . f) x)
= { definition of composition }
  Just (g (f x))
= { definition of fmap }
  fmap g (Just (f x))
= { definition of fmap }
  fmap g (fmap f (Just x))
= { definition of composition }
  (fmap g . fmap f) (Just x)

It’s worth stressing that equational reasoning doesn’t work for C++ style “functions” with side effects. Consider this code:

int square(int x) {
    return x * x;
}

int counter() {
    static int c = 0;
    return c++;
}

int y = square(counter());

Using equational reasoning, you would be able to inline square to get:

int y = counter() * counter();

This is definitely not a valid transformation, and it will not produce the same result. Despite that, the C++ compiler will try to use equational reasoning if you implement square as a macro, with disastrous results.
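
Here’s a sketch of that failure mode; the macro SQUARE is my hypothetical stand-in, not code from the original text:

// A macro "square" -- expansion is purely textual.
#define SQUARE(x) ((x) * (x))

// SQUARE(counter()) expands to ((counter()) * (counter())),
// so counter() is called twice, its side effect happens twice,
// and the two calls may be evaluated in an unspecified order.
int y = SQUARE(counter());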

Optional

Functors are easily expressed in Haskell, but they can be defined in any language that supports generic programming and higher-order functions. Let’s consider the C++ analog of Maybe, the template type optional. Here’s a sketch of the implementation (the actual implementation is much more complex, dealing with various ways the argument may be passed, with copy semantics, and with the resource management issues characteristic of C++):

template<class T>
class optional {
    bool _isValid; // the tag
    T    _v;
public:
    optional()    : _isValid(false) {}         // Nothing
    optional(T x) : _isValid(true) , _v(x) {}  // Just
    bool isValid() const { return _isValid; }
    T val() const { return _v; }
};

This template provides one part of the definition of a functor: the mapping of types. It maps any type T to a new type optional<T>. Let’s define its action on functions:

template<class A, class B>
std::function<optional<B>(optional<A>)> 
fmap(std::function<B(A)> f) 
{
    return [f](optional<A> opt) {
        if (!opt.isValid())
            return optional<B>{};
        else
            return optional<B>{ f(opt.val()) };
    };
}

This is a higher order function, taking a function as an argument and returning a function. Here’s the uncurried version of it:

template<class A, class B>
optional<B> fmap(std::function<B(A)> f, optional<A> opt) {
    if (!opt.isValid())
        return optional<B>{};
    else
        return optional<B>{ f(opt.val()) };
}
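
For completeness, here’s how a call to this version might look (a hypothetical usage; the explicit template arguments are needed because a bare lambda is not literally a std::function, so deduction would fail without them):

optional<int> o{42};
// Naming A and B explicitly lets the lambda convert to std::function<bool(int)>.
optional<bool> b = fmap<int, bool>([](int x) { return x > 0; }, o);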

There is also an option of making fmap a template method of optional. This embarrassment of choices makes abstracting the functor pattern in C++ a problem. Should functor be an interface to inherit from (unfortunately, you can’t have template virtual functions)? Should it be a curried or an uncurried free template function? Can the C++ compiler correctly infer the missing types, or should they be specified explicitly? Consider a situation where the input function f takes an int to a bool. How will the compiler figure out the type of g:

auto g = fmap(f);

especially if, in the future, there are multiple functors overloading fmap? (We’ll see more functors soon.)

Typeclasses

So how does Haskell deal with abstracting the functor? It uses the typeclass mechanism. A typeclass defines a family of types that support a common interface. For instance, the class of objects that support equality is defined as follows:

class Eq a where
    (==) :: a -> a -> Bool

This definition states that type a is of the class Eq if it supports the operator (==) that takes two arguments of type a and returns a Bool. If you want to tell Haskell that a particular type is Eq, you have to declare it an instance of this class and provide the implementation of (==). For example, given the definition of a 2D Point (a product type of two Floats):

data Point = Pt Float Float

you can define the equality of points:

instance Eq Point where
    (Pt x y) == (Pt x' y') = x == x' && y == y'
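
With this instance in scope, points compare like any other Eq values; a tiny check you could run in ghci (hypothetical):

Pt 1.0 2.0 == Pt 1.0 2.0  -- True
Pt 1.0 2.0 == Pt 1.0 3.0  -- False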

Here I used the operator (==) (the one I’m defining) in the infix position between the two patterns (Pt x y) and (Pt x' y'). The body of the function follows the single equal sign. Once Point is declared an instance of Eq, you can directly compare points for equality. Notice that, unlike in C++ or Java, you don’t have to specify the Eq class (or interface) when defining Point — you can do it later in client code.

Typeclasses are also Haskell’s only mechanism for overloading functions (and operators). We will need that for overloading fmap for different functors. There is one complication, though: a functor is not defined as a type but as a mapping of types, a type constructor. We need a typeclass that’s not a family of types, as was the case with Eq, but a family of type constructors. Fortunately a Haskell typeclass works with type constructors as well as with types. So here’s the definition of the Functor class:

class Functor f where
    fmap :: (a -> b) -> f a -> f b

It stipulates that f is a Functor if there exists a function fmap with the specified type signature. The lowercase f is a type variable, similar to type variables a and b. The compiler, however, is able to deduce that it represents a type constructor rather than a type by looking at its usage: acting on other types, as in f a and f b. Accordingly, when declaring an instance of Functor, you have to give it a type constructor, as is the case with Maybe:

instance Functor Maybe where
    fmap _ Nothing = Nothing
    fmap f (Just x) = Just (f x)
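
With this instance in place, lifting a function over Maybe works as you’d expect; a couple of hypothetical ghci evaluations:

fmap (+1) (Just 41)  -- Just 42
fmap (+1) Nothing    -- Nothing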

By the way, the Functor class, as well as its instance definitions for a lot of simple data types, including Maybe, are part of the standard Prelude library.

Functor in C++

Can we try the same approach in C++? A type constructor corresponds to a template class, like optional, so by analogy, we would parameterize fmap with a template template parameter F. This is the syntax for it:

template<template<class> class F, class A, class B>
F<B> fmap(std::function<B(A)>, F<A>);

We would like to be able to specialize this template for different functors. Unfortunately, there is a prohibition against partial specialization of template functions in C++. You can’t write:

template<class A, class B>
optional<B> fmap<optional>(std::function<B(A)> f, optional<A> opt)

Instead, we have to fall back on function overloading, which brings us back to the original definition of the uncurried fmap:

template<class A, class B>
optional<B> fmap(std::function<B(A)> f, optional<A> opt) 
{
    if (!opt.isValid())
        return optional<B>{};
    else
        return optional<B>{ f(opt.val()) };
}

This definition works, but only because the second argument of fmap selects the overload. It totally ignores the more generic definition of fmap.

The List Functor

To get some intuition as to the role of functors in programming, we need to look at more examples. Any type that is parameterized by another type is a candidate for a functor. Generic containers are parameterized by the type of the elements they store, so let’s look at a very simple container, the list:

data List a = Nil | Cons a (List a)

We have the type constructor List, which is a mapping from any type a to the type List a. To show that List is a functor, we have to define the lifting of functions: given a function a->b, define a function List a -> List b:

fmap :: (a -> b) -> (List a -> List b)

A function acting on List a must consider two cases corresponding to the two list constructors. The Nil case is trivial — just return Nil — there isn’t much you can do with an empty list. The Cons case is a bit tricky, because it involves recursion. So let’s step back for a moment and consider what we are trying to do. We have a list of a, a function f that turns a to b, and we want to generate a list of b. The obvious thing is to use f to turn each element of the list from a to b. How do we do this in practice, given that a (non-empty) list is defined as the Cons of a head and a tail? We apply f to the head and apply the lifted (fmapped) f to the tail. This is a recursive definition, because we are defining lifted f in terms of lifted f:

fmap f (Cons x t) = Cons (f x) (fmap f t)

Notice that, on the right hand side, fmap f is applied to a list that’s shorter than the list for which we are defining it — it’s applied to its tail. We recurse towards shorter and shorter lists, so we are bound to eventually reach the empty list, or Nil. But as we’ve decided earlier, fmap f acting on Nil returns Nil, thus terminating the recursion. To get the final result, we combine the new head (f x) with the new tail (fmap f t) using the Cons constructor. Putting it all together, here’s the instance declaration for the list functor:

instance Functor List where
    fmap _ Nil = Nil
    fmap f (Cons x t) = Cons (f x) (fmap f t)
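
A quick, hypothetical check of the instance:

xs :: List Int
xs = Cons 1 (Cons 2 Nil)

-- fmap (* 10) xs evaluates to Cons 10 (Cons 20 Nil)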

If you are more comfortable with C++, consider the case of a std::vector, which could be considered the most generic C++ container. The implementation of fmap for std::vector is just a thin encapsulation of std::transform:

template<class A, class B>
std::vector<B> fmap(std::function<B(A)> f, std::vector<A> v)
{
    std::vector<B> w;
    std::transform( std::begin(v)
                  , std::end(v)
                  , std::back_inserter(w)
                  , f);
    return w;
}

We can use it, for instance, to square the elements of a sequence of numbers:

std::vector<int> v{ 1, 2, 3, 4 };
auto w = fmap([](int i) { return i*i; }, v);
std::copy( std::begin(w)
         , std::end(w)
         , std::ostream_iterator<int>(std::cout, ", "));

Most C++ containers are functors by virtue of implementing iterators that can be passed to std::transform, which is the more primitive cousin of fmap. Unfortunately, the simplicity of a functor is lost under the usual clutter of iterators and temporaries (see the implementation of fmap above). I’m happy to say that the new proposed C++ range library makes the functorial nature of ranges much more pronounced.

The Reader Functor

Now that you might have developed some intuitions — for instance, functors being some kind of containers — let me show you an example which at first sight looks very different. Consider a mapping of type a to the type of a function returning a. We haven’t really talked about function types in depth — the full categorical treatment is coming — but we have some understanding of those as programmers. In Haskell, a function type is constructed using the arrow type constructor (->) which takes two types: the argument type and the result type. You’ve already seen it in infix form, a->b, but it can equally well be used in prefix form, when parenthesized:

(->) a b

Just like with regular functions, type functions of more than one argument can be partially applied. So when we provide just one type argument to the arrow, it still expects another one. That’s why:

(->) a

is a type constructor. It needs one more type b to produce a complete type a->b. As it stands, it defines a whole family of type constructors parameterized by a. Let’s see if this is also a family of functors. Dealing with two type parameters can get a bit confusing, so let’s do some renaming. Let’s call the argument type r and the result type a, in line with our previous functor definitions. So our type constructor takes any type a and maps it into the type r->a. To show that it’s a functor, we want to lift a function a->b to a function that takes r->a and returns r->b. These are the types that are formed using the type constructor (->) r acting on, respectively, a and b. Here’s the type signature of fmap applied to this case:

fmap :: (a -> b) -> (r -> a) -> (r -> b)

We have to solve the following puzzle: given a function f::a->b and a function g::r->a, create a function r->b. There is only one way we can compose the two functions, and the result is exactly what we need. So here’s the implementation of our fmap:

instance Functor ((->) r) where
    fmap f g = f . g

It just works! If you like terse notation, this definition can be reduced further by noticing that composition can be rewritten in prefix form:

fmap f g = (.) f g

and the arguments can be omitted to yield a direct equality of two functions:

fmap = (.)

This combination of the type constructor (->) r with the above implementation of fmap is called the reader functor.
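
Here’s a small, hypothetical example of the reader functor at work:

f :: Int -> Int
f = fmap (+ 1) (* 2)  -- the same as (+ 1) . (* 2)

-- f 5 evaluates to 11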

Functors as Containers

We’ve seen some examples of functors in programming languages that define general-purpose containers, or at least objects that contain some value of the type they are parameterized over. The reader functor seems to be an outlier, because we don’t think of functions as data. But we’ve seen that pure functions can be memoized, and function execution can be turned into table lookup. Tables are data. Conversely, because of Haskell’s laziness, a traditional container, like a list, may actually be implemented as a function. Consider, for instance, an infinite list of natural numbers, which can be compactly defined as:

nats :: [Integer]
nats = [1..]

In the first line, a pair of square brackets is Haskell’s built-in type constructor for lists. In the second line, square brackets are used to create a list literal.

Obviously, an infinite list like this cannot be stored in memory. The compiler implements it as a function that generates Integers on demand. Haskell effectively blurs the distinction between data and code. A list could be considered a function, and a function could be considered a table that maps arguments to results. The latter can even be practical if the domain of the function is finite and not too large. It would not be practical, however, to implement strlen as table lookup, because there are infinitely many different strings. As programmers, we don’t like infinities, but in category theory you learn to eat infinities for breakfast. Whether it’s a set of all strings or a collection of all possible states of the Universe, past, present, and future — we can deal with it!

So I like to think of the functor object (an object of the type generated by an endofunctor) as containing a value or values of the type over which it is parameterized, even if these values are not physically present there. One example of a functor is a C++ std::future, which may at some point contain a value, but it’s not guaranteed it will; and if you want to access it, you may block waiting for another thread to finish execution. Another example is a Haskell IO object, which may contain user input, or the future versions of our Universe with “Hello World!” displayed on the monitor. According to this interpretation, a functor object is something that may contain a value or values of the type it’s parameterized upon. Or it may contain a recipe for generating those values. We are not at all concerned about being able to access the values — that’s totally optional, and outside of the scope of the functor. All we are interested in is to be able to manipulate those values using functions. If the values can be accessed, then we should be able to see the results of this manipulation. If they can’t, then all we care about is that the manipulations compose correctly and that the manipulation with an identity function doesn’t change anything.

Just to show you how much we don’t care about being able to access the values inside a functor object, here’s a type constructor that ignores completely its argument a:

data Const c a = Const c

The Const type constructor takes two types, c and a. Just like we did with the arrow constructor, we are going to partially apply it to create a functor. The data constructor (also called Const) takes just one value of type c. It has no dependence on a. The type of fmap for this type constructor is:

fmap :: (a -> b) -> Const c a -> Const c b

Because the functor ignores its type argument, the implementation of fmap is free to ignore its function argument — the function has nothing to act upon:

instance Functor (Const c) where
    fmap _ (Const v) = Const v
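
A hypothetical example: whatever function you lift, the payload is untouched; only the ignored type parameter changes:

c :: Const String Int
c = Const "payload"

c' :: Const String Bool
c' = fmap even c  -- still Const "payload"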

This might be a little clearer in C++ (I never thought I would utter those words!), where there is a stronger distinction between type arguments — which are compile-time — and values, which are run-time:

template<class C, class A>
struct Const {
    Const(C v) : _v(v) {}
    C _v;
};

The C++ implementation of fmap also ignores the function argument and essentially re-casts the Const argument without changing its value:

template<class C, class A, class B>
Const<C, B> fmap(std::function<B(A)> f, Const<C, A> c) {
    return Const<C, B>{c._v};
}

Despite its weirdness, the Const functor plays an important role in many constructions. In category theory, it’s a special case of the Δc functor I mentioned earlier — the endofunctor case of a black hole. We’ll be seeing more of it in the future.

Functor Composition

It’s not hard to convince yourself that functors between categories compose, just like functions between sets compose. A composition of two functors, when acting on objects, is just the composition of their respective object mappings; and similarly when acting on morphisms. After jumping through two functors, identity morphisms end up as identity morphisms, and compositions of morphisms finish up as compositions of morphisms. There’s really nothing much to it. In particular, it’s easy to compose endofunctors. Remember the function maybeTail? I’ll rewrite it using Haskell’s built-in implementation of lists:

maybeTail :: [a] -> Maybe [a]
maybeTail [] = Nothing
maybeTail (x:xs) = Just xs

(The empty list constructor that we used to call Nil is replaced with the empty pair of square brackets []. The Cons constructor is replaced with the infix operator : (colon).) The result of maybeTail is of a type that’s a composition of two functors, Maybe and [], acting on a. Each of these functors is equipped with its own version of fmap, but what if we want to apply some function f to the contents of the composite: a Maybe list? We have to break through two layers of functors. We can use fmap to break through the outer Maybe. But we can’t just send f inside Maybe because f doesn’t work on lists. We have to send (fmap f) to operate on the inner list. For instance, let’s see how we can square the elements of a Maybe list of integers:

square x = x * x

mis :: Maybe [Int]
mis = Just [1, 2, 3]

mis2 = fmap (fmap square) mis

The compiler, after analyzing the types, will figure out that, for the outer fmap, it should use the implementation from the Maybe instance, and for the inner one, the list functor implementation. It may not be immediately obvious that the above code may be rewritten as:

mis2 = (fmap . fmap) square mis

But remember that fmap may be considered a function of just one argument:

fmap :: (a -> b) -> (f a -> f b)

In our case, the second fmap in (fmap . fmap) takes as its argument:

square :: Int -> Int

and returns a function of the type:

[Int] -> [Int]

The first fmap then takes that function and returns a function:

Maybe [Int] -> Maybe [Int]

Finally, that function is applied to mis. So the composition of two functors is a functor whose fmap is the composition of the corresponding fmaps.

Going back to category theory: It’s pretty obvious that functor composition is associative (the mapping of objects is associative, and the mapping of morphisms is associative). And there is also a trivial identity functor in every category: it maps every object to itself, and every morphism to itself. So functors have all the same properties as morphisms in some category. But what category would that be? It would have to be a category in which objects are categories and morphisms are functors. It’s a category of categories. But a category of all categories would have to include itself, and we would get into the same kinds of paradoxes that made the set of all sets impossible.

There is, however, a category of all small categories called Cat (which is big, so it can’t be a member of itself). A small category is one in which objects form a set, as opposed to something larger than a set. Mind you, in category theory, even an infinite uncountable set is considered “small.” I thought I’d mention these things because I find it pretty amazing that we can recognize the same structures repeating themselves at many levels of abstraction. We’ll see later that functors form categories as well.

Challenges

  1. Can we turn the Maybe type constructor into a functor by defining:
    fmap _ _ = Nothing

    which ignores both of its arguments? (Hint: Check the functor laws.)

  2. Prove functor laws for the reader functor. Hint: it’s really simple.
  3. Implement the reader functor in your second favorite language (the first being Haskell, of course).
  4. Prove the functor laws for the list functor. Assume that the laws are true for the tail part of the list you’re applying it to (in other words, use induction).

Acknowledgments

Gershom Bazerman is kind enough to keep reviewing these posts. I’m grateful for his patience and insight.

Next: Functoriality


Simple Algebraic Data Types

This is part 6 of Categories for Programmers. Previously: Products and Coproducts. See the Table of Contents.

We’ve seen two basic ways of combining types: using a product and a coproduct. It turns out that a lot of data structures in everyday programming can be built using just these two mechanisms. This fact has important practical consequences. Many properties of data structures are composable. For instance, if you know how to compare values of basic types for equality, and you know how to generalize these comparisons to product and coproduct types, you can automate the derivation of equality operators for composite types. In Haskell you can automatically derive equality, comparison, conversion to and from string, and more, for a large subset of composite types.

Let’s have a closer look at product and sum types as they appear in programming.

Product Types

The canonical implementation of a product of two types in a programming language is a pair. In Haskell, a pair is a primitive type constructor; in C++ it’s a relatively complex template defined in the Standard Library.

Pairs are not strictly commutative: a pair (Int, Bool) cannot be substituted for a pair (Bool, Int), even though they carry the same information. They are, however, commutative up to isomorphism — the isomorphism being given by the swap function (which is its own inverse):

swap :: (a, b) -> (b, a)
swap (x, y) = (y, x)

You can think of the two pairs as simply using a different format for storing the same data. It’s just like big endian vs. little endian.

You can combine an arbitrary number of types into a product by nesting pairs inside pairs, but there is an easier way: nested pairs are equivalent to tuples. It’s the consequence of the fact that different ways of nesting pairs are isomorphic. If you want to combine three types in a product, a, b, and c, in this order, you can do it in two ways:

((a, b), c)

or

(a, (b, c))

These types are different — you can’t pass one to a function that expects the other — but their elements are in one-to-one correspondence. There is a function that maps one to another:

alpha :: ((a, b), c) -> (a, (b, c))
alpha ((x, y), z) = (x, (y, z))

and this function is invertible:

alpha_inv :: (a, (b, c)) -> ((a, b), c)
alpha_inv  (x, (y, z)) = ((x, y), z)

so it’s an isomorphism. These are just different ways of repackaging the same data.

You can interpret the creation of a product type as a binary operation on types. From that perspective, the above isomorphism looks very much like the associativity law we’ve seen in monoids:

(a * b) * c = a * (b * c)

Except that, in the monoid case, the two ways of composing products were equal, whereas here they are only equal “up to isomorphism.”

If we can live with isomorphisms, and don’t insist on strict equality, we can go even further and show that the unit type, (), is the unit of the product the same way 1 is the unit of multiplication. Indeed, the pairing of a value of some type a with a unit doesn’t add any information. The type:

(a, ())

is isomorphic to a. Here’s the isomorphism:

rho :: (a, ()) -> a
rho (x, ()) = x
rho_inv :: a -> (a, ())
rho_inv x = (x, ())

These observations can be formalized by saying that Set (the category of sets) is a monoidal category. It’s a category that’s also a monoid, in the sense that you can multiply objects (here, take their cartesian product). I’ll talk more about monoidal categories, and give the full definition in the future.

There is a more general way of defining product types in Haskell — especially, as we’ll see soon, when they are combined with sum types. It uses named constructors with multiple arguments. A pair, for instance, can be defined alternatively as:

data Pair a b = P a b

Here, Pair a b is the name of the type parameterized by two other types, a and b; and P is the name of the data constructor. You define a pair type by passing two types to the Pair type constructor. You construct a pair value by passing two values of appropriate types to the constructor P. For instance, let’s define a value stmt as a pair of String and Bool:

stmt :: Pair String Bool
stmt = P "This statement is" False

The first line is the type declaration. It uses the type constructor Pair, with String and Bool replacing a and the b in the generic definition of Pair. The second line defines the actual value by passing a concrete string and a concrete Boolean to the data constructor P. Type constructors are used to construct types; data constructors, to construct values.

Since the name spaces for type and data constructors are separate in Haskell, you will often see the same name used for both, as in:

data Pair a b = Pair a b

And if you squint hard enough, you may even view the built-in pair type as a variation on this kind of declaration, where the name Pair is replaced with the binary operator (,). In fact you can use (,) just like any other named constructor and create pairs using prefix notation:

stmt = (,) "This statement is" False

Similarly, you can use (,,) to create triples, and so on.
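
For instance (a hypothetical snippet):

triple :: (String, Bool, Int)
triple = (,,) "foo" True 3  -- the same as ("foo", True, 3)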

Instead of using generic pairs or tuples, you can also define specific named product types, as in:

data Stmt = Stmt String Bool

which is just a product of String and Bool, but it’s given its own name and constructor. The advantage of this style of declaration is that you may define many types that have the same content but different meaning and functionality, and which cannot be substituted for each other.

Programming with tuples and multi-argument constructors can get messy and error prone — keeping track of which component represents what. It’s often preferable to give names to components. A product type with named fields is called a record in Haskell, and a struct in C.

Records

Let’s have a look at a simple example. We want to describe chemical elements by combining two strings, name and symbol; and an integer, the atomic number; into one data structure. We can use a tuple (String, String, Int) and remember which component represents what. We would extract components by pattern matching, as in this function that checks if the symbol of the element is the prefix of its name (as in He being the prefix of Helium):

import Data.List (isPrefixOf)

startsWithSymbol :: (String, String, Int) -> Bool
startsWithSymbol (name, symbol, _) = isPrefixOf symbol name

This code is error prone, and is hard to read and maintain. It’s much better to define a record:

data Element = Element { name         :: String
                       , symbol       :: String
                       , atomicNumber :: Int }

The two representations are isomorphic, as witnessed by these two conversion functions, which are the inverse of each other:

tupleToElem :: (String, String, Int) -> Element
tupleToElem (n, s, a) = Element { name = n
                                , symbol = s
                                , atomicNumber = a }
elemToTuple :: Element -> (String, String, Int)
elemToTuple e = (name e, symbol e, atomicNumber e)

Notice that the names of record fields also serve as functions to access these fields. For instance, atomicNumber e retrieves the atomicNumber field from e. We use atomicNumber as a function of the type:

atomicNumber :: Element -> Int

With the record syntax for Element, our function startsWithSymbol becomes more readable:

startsWithSymbol :: Element -> Bool
startsWithSymbol e = isPrefixOf (symbol e) (name e)

We could even use the Haskell trick of turning the function isPrefixOf into an infix operator by surrounding it with backquotes, and make it read almost like a sentence:

startsWithSymbol e = symbol e `isPrefixOf` name e

The parentheses could be omitted in this case, because an infix operator has lower precedence than a function call.

Sum Types

Just as the product in the category of sets gives rise to product types, the coproduct gives rise to sum types. The canonical implementation of a sum type in Haskell is:

data Either a b = Left a | Right b

And like pairs, Eithers are commutative (up to isomorphism), can be nested, and the nesting order is irrelevant (up to isomorphism). So we can, for instance, define a sum equivalent of a triple:

data OneOfThree a b c = Sinistral a | Medial b | Dextral c

and so on.

It turns out that Set is also a (symmetric) monoidal category with respect to coproduct. The role of the binary operation is played by the disjoint sum, and the role of the unit element is played by the initial object. In terms of types, we have Either as the monoidal operator and Void, the uninhabited type, as its neutral element. You can think of Either as plus, and Void as zero. Indeed, adding Void to a sum type doesn’t change its content. For instance:

Either a Void

is isomorphic to a. That’s because there is no way to construct a Right version of this type — there isn’t a value of type Void. The only inhabitants of Either a Void are constructed using the Left constructor, and they simply encapsulate a value of type a. So, symbolically, a + 0 = a.
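
Here’s a sketch of that isomorphism, assuming the Data.Void module from the base library (its absurd function eliminates the impossible Right case):

import Data.Void (Void, absurd)

toA :: Either a Void -> a
toA (Left x)  = x
toA (Right v) = absurd v  -- can never be reached

fromA :: a -> Either a Void
fromA = Left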

Sum types are pretty common in Haskell, but their C++ equivalents, unions or variants, are much less common. There are several reasons for that.

First of all, the simplest sum types are just enumerations and are implemented using enum in C++. The equivalent of the Haskell sum type:

data Color = Red | Green | Blue

is the C++:

enum { Red, Green, Blue };

An even simpler sum type:

data Bool = True | False

is the primitive bool in C++.

Simple sum types that encode the presence or absence of a value are variously implemented in C++ using special tricks and “impossible” values, like empty strings, negative numbers, null pointers, etc. This kind of optionality, if deliberate, is expressed in Haskell using the Maybe type:

data Maybe a = Nothing | Just a

The Maybe type is a sum of two types. You can see this if you separate the two constructors into individual types. The first one would look like this:

data NothingType = Nothing

It’s an enumeration with one value called Nothing. In other words, it’s a singleton, which is equivalent to the unit type (). The second part:

data JustType a = Just a

is just an encapsulation of the type a. We could have encoded Maybe as:

type Maybe a = Either () a

More complex sum types are often faked in C++ using pointers. A pointer can be either null, or point to a value of specific type. For instance, a Haskell list type, which can be defined as a (recursive) sum type:

data List a = Nil | Cons a (List a)

can be translated to C++ using the null pointer trick to implement the empty list:

template<class A> 
class List {
    Node<A> * _head; // Node<A>, assumed defined elsewhere, pairs a value with a tail List<A>
public:
    List() : _head(nullptr) {}  // Nil
    List(A a, List<A> l)        // Cons
      : _head(new Node<A>(a, l))
    {}
};

Notice that the two Haskell constructors Nil and Cons are translated into two overloaded List constructors with analogous arguments (none, for Nil; and a value and a list for Cons). The List class doesn’t need a tag to distinguish between the two components of the sum type. Instead it uses the special nullptr value for _head to encode Nil.

The main difference, though, between Haskell and C++ types is that Haskell data structures are immutable. If you create an object using one particular constructor, the object will forever remember which constructor was used and what arguments were passed to it. So a Maybe object that was created as Just "energy" will never turn into Nothing. Similarly, an empty list will forever be empty, and a list of three elements will always have the same three elements.

It’s this immutability that makes construction reversible. Given an object, you can always disassemble it down to parts that were used in its construction. This deconstruction is done with pattern matching and it reuses constructors as patterns. Constructor arguments, if any, are replaced with variables (or other patterns).

The List data type has two constructors, so the deconstruction of an arbitrary List uses two patterns corresponding to those constructors. One matches the empty Nil list, and the other a Cons-constructed list. For instance, here’s the definition of a simple function on Lists:

maybeTail :: List a -> Maybe (List a)
maybeTail Nil = Nothing
maybeTail (Cons _ t) = Just t

The first part of the definition of maybeTail uses the Nil constructor as pattern and returns Nothing. The second part uses the Cons constructor as pattern. It replaces the first constructor argument with a wildcard, because we are not interested in it. The second argument to Cons is bound to the variable t (I will call these things variables even though, strictly speaking, they never vary: once bound to an expression, a variable never changes). The return value is Just t. Now, depending on how your List was created, it will match one of the clauses. If it was created using Cons, the two arguments that were passed to it will be retrieved (and the first discarded).

Even more elaborate sum types are implemented in C++ using polymorphic class hierarchies. A family of classes with a common ancestor may be understood as one variant type, in which the vtable serves as a hidden tag. What in Haskell would be done by pattern matching on the constructor, and by calling specialized code, in C++ is accomplished by dispatching a call to a virtual function based on the vtable pointer.

You will rarely see union used as a sum type in C++ because of severe limitations on what can go into a union. You can’t even put a std::string into a union because it has a copy constructor.

Algebra of Types

Taken separately, product and sum types can be used to define a variety of useful data structures, but the real strength comes from combining the two. Once again we are invoking the power of composition.

Let’s summarize what we’ve discovered so far. We’ve seen two commutative monoidal structures underlying the type system: We have the sum types with Void as the neutral element, and the product types with the unit type, (), as the neutral element. We’d like to think of them as analogous to addition and multiplication. In this analogy, Void would correspond to zero, and unit, (), to one.

Let’s see how far we can stretch this analogy. For instance, does multiplication by zero give zero? In other words, is a product type with one component being Void isomorphic to Void? For example, is it possible to create a pair of, say Int and Void?

To create a pair you need two values. Although you can easily come up with an integer, there is no value of type Void. Therefore, for any type a, the type (a, Void) is uninhabited — has no values — and is therefore equivalent to Void. In other words, a*0 = 0.

Another thing that links addition and multiplication is the distributive property:

a * (b + c) = a * b + a * c

Does it also hold for product and sum types? Yes, it does — up to isomorphisms, as usual. The left hand side corresponds to the type:

(a, Either b c)

and the right hand side corresponds to the type:

Either (a, b) (a, c)

Here’s the function that converts them one way:

prodToSum :: (a, Either b c) -> Either (a, b) (a, c)
prodToSum (x, e) = 
    case e of
      Left  y -> Left  (x, y)
      Right z -> Right (x, z)

and here’s one that goes the other way:

sumToProd :: Either (a, b) (a, c) -> (a, Either b c)
sumToProd e = 
    case e of
      Left  (x, y) -> (x, Left  y)
      Right (x, z) -> (x, Right z)

The case/of construct is used for pattern matching inside functions. Each pattern is followed by an arrow and the expression to be evaluated when the pattern matches. For instance, if you call prodToSum with the value:

prod1 :: (Int, Either String Float)
prod1 = (2, Left "Hi!")

the e in case e of will be equal to Left "Hi!". It will match the pattern Left y, substituting "Hi!" for y. Since the x has already been matched to 2, the result of the case of clause, and the whole function, will be Left (2, "Hi!"), as expected.

I’m not going to prove that these two functions are the inverse of each other, but if you think about it, they must be! They are just trivially re-packing the contents of the two data structures. It’s the same data, only different format.

Mathematicians have a name for such two intertwined monoids: it’s called a semiring. It’s not a full ring, because we can’t define subtraction of types. That’s why a semiring is sometimes called a rig, which is a pun on “ring without an n” (negative). But barring that, we can get a lot of mileage from translating statements about, say, natural numbers, which form a rig, to statements about types. Here’s a translation table with some entries of interest:

Numbers      Types
0            Void
1            ()
a + b        Either a b = Left a | Right b
a * b        (a, b) or Pair a b = Pair a b
2 = 1 + 1    data Bool = True | False
1 + a        data Maybe a = Nothing | Just a

The list type is quite interesting, because it’s defined as a solution to an equation. The type we are defining appears on both sides of the equation:

List a = Nil | Cons a (List a)

If we do our usual substitutions, and also replace List a with x, we get the equation:

x = 1 + a * x

We can’t solve it using traditional algebraic methods because we can’t subtract or divide types. But we can try a series of substitutions, where we keep replacing x on the right hand side with (1 + a*x), and use the distributive property. This leads to the following series:

x = 1 + a*x
x = 1 + a*(1 + a*x) = 1 + a + a*a*x
x = 1 + a + a*a*(1 + a*x) = 1 + a + a*a + a*a*a*x
...
x = 1 + a + a*a + a*a*a + a*a*a*a...

We end up with an infinite sum of products (tuples), which can be interpreted as: A list is either empty, 1; or a singleton, a; or a pair, a*a; or a triple, a*a*a; etc… Well, that’s exactly what a list is — a string of as!

There’s much more to lists than that, and we’ll come back to them and other recursive data structures after we learn about functors and fixed points.

Solving equations with symbolic variables — that’s algebra! It’s what gives these types their name: algebraic data types.

Finally, I should mention one very important interpretation of the algebra of types. Notice that a product of two types a and b must contain both a value of type a and a value of type b, which means both types must be inhabited. A sum of two types, on the other hand, contains either a value of type a or a value of type b, so it’s enough if one of them is inhabited. Logical and and or also form a semiring, and it too can be mapped into type theory:

Logic     Types
false     Void
true      ()
a || b    Either a b = Left a | Right b
a && b    (a, b)

This analogy goes deeper, and is the basis of the Curry-Howard isomorphism between logic and type theory. We’ll come back to it when we talk about function types.

Challenges

  1. Show the isomorphism between Maybe a and Either () a.
  2. Here’s a sum type defined in Haskell:
    data Shape = Circle Float
               | Rect Float Float

    When we want to define a function like area that acts on a Shape, we do it by pattern matching on the two constructors:

    area :: Shape -> Float
    area (Circle r) = pi * r * r
    area (Rect d h) = d * h

    Implement Shape in C++ or Java as an interface and create two classes: Circle and Rect. Implement area as a virtual function.

  3. Continuing with the previous example: We can easily add a new function circ that calculates the circumference of a Shape. We can do it without touching the definition of Shape:
    circ :: Shape -> Float
    circ (Circle r) = 2.0 * pi * r
    circ (Rect d h) = 2.0 * (d + h)

    Add circ to your C++ or Java implementation. What parts of the original code did you have to touch?

  4. Continuing further: Add a new shape, Square, to Shape and make all the necessary updates. What code did you have to touch in Haskell vs. C++ or Java? (Even if you’re not a Haskell programmer, the modifications should be pretty obvious.)
  5. Show that a + a = 2 * a holds for types (up to isomorphism). Remember that 2 corresponds to Bool, according to our translation table.

Next: Functors.

Acknowledgments

Thanks go to Gershom Bazerman for reviewing this post and helpful comments.
