Monads



I’m totally exhausted, so I won’t write much. I spoke to the audience practically non-stop for three hours, and then I was rewriting Proto with Joel Falcou and Hartmut Kaiser over beers till late at night. My talk about the connection between Haskell monads and Boost Proto was very well received–beyond my expectations. It looks like there is some real need to put theoretical foundations under the more advanced areas of template metaprogramming; and monads just fit the bill. There was a monad lurking in Proto all this time– the only thing missing was “bind”.

I’m looking forward to tomorrow’s keynote by Hans Boehm, and the continued hacking of Proto with Joel and Hartmut.


In my previous post I talked about categories, functors, endofunctors, and a little about monads. Why didn’t I just start with a Haskell definition of a monad and skip the math? Because I believe that monads are bigger than that. I don’t want to pigeonhole monads based on some specific language or application. If you know what to look for you’ll find monads everywhere. Some languages support them better, as a single general abstraction, others require you to re-implement them from scratch for every use case. I want you to be able to look at a piece of C++ code and say, “Hey, that looks like a monad.” That’s why I started from category theory, which sits above all programming languages and provides a natural platform to talk about monads.

In this installment I will talk about some specific programming challenges that, on the surface, seem to have very little in common. I will show how they can be solved and how from these disparate solutions a pattern emerges that can be recognized as a monad. I’ll talk in some detail about the Maybe monad and the List monad, since they are simple and illustrate the basic ideas nicely. I will leave the really interesting examples, like the state monad and its variations, to the next installment.

Challenges of Functional Programming

You know what a function is: called with the same argument it always returns the same result. A lot of useful computations map directly to functions. Still, a lot don’t. We often have to deal with some notions of computations that are not functions. We can describe them in words or implement as procedures with side effects. In non-functional languages such computations are often called “functions” anyway– I’ll enclose such usage with quotation marks. Eugenio Moggi, the guy who introduced monads to computing, listed a few examples:

  • Partial “functions”: For some arguments they never terminate (e.g., go into an infinite loop). These are not really functions in the mathematical sense.
  • Nondeterministic “functions”: They don’t return a single result but a choice of results. Non-deterministic parsers are like this. When given an input, they return a set of possible parses. Which of them is right depends on the context. These are not really functions because a function must return a single value.
  • “Functions” with side effects: They access and modify some external state and, when called repeatedly, return different results depending on that state.
  • “Functions” that throw exceptions: They are not defined for certain arguments (or states) and they throw exceptions instead of returning a result.
  • Continuations.
  • Interactive input: getchar is a good example. It’s result depends on what hits the keyboard.
  • Interactive output: putchar is a good example. It’s not a function because it doesn’t return anything (if it returns an error code, it doesn’t depend on the argument). Still, it can’t be just optimized away if we want any output to appear on the screen.

The amazing thing is that, using some creative thinking, all these problems can be solved using pure functional methods. I won’t discuss all of them, but I’ll show you a series of progressively more interesting examples.

Error Propagation and Exceptions

Who doesn’t deal with error propagation? Let’s dissect the problem: You are defining a computation that can return a valid result only for a subset of possible arguments. So what happens when the argument is “wrong”? Well, you can return an error.

In the simplest case you might add one extra bit, success or failure, to your result. Sometimes you can spare a bit in the result type: If the correct result is always a non-negative number, you may return negative one as an error. Or, if the result is a pointer, you might return a null pointer instead. But these are just small hacks introduced for the sake of performance. And, like most hacks, they can lead to dangerous code. Consider this (likely incorrect) code:

size_t off = fileName.find('.');
string ext = fileName.substr(off, fileName.length() - off);

If fileName contains no dot, the result is a special value npos signifying an error. The problem is that npos is of the same type as a non-error result, so it can be passed quietly to string::substr causing undefined behavior.

A more general and safer solution is to change– extend– the type of the result to include an error bit. You can do it for any type of result. In Haskell, you just define a type constructor called Maybe:

data Maybe a = Nothing | Just a

It takes an arbitrary type a and defines a new type that adds the special value Nothing to the set of possible values of a. In C++ that would be equivalent to a template:

template<class T>
struct Maybe {
    T just; // valid if 'nothing' is false
    bool nothing;
};

(This is just an example. I’m not arguing that Maybe would be very useful in C++ which has other mechanisms for error propagation.)

So here we have a way to turn a computation that’s not defined for all values into a function that’s defined over the whole domain, but which returns a new richer type.

The next question is, do these new things– functions returning Maybe— compose? What should the caller do with the result of such a function when it’s part of a larger computation? One thing is for sure, this result cannot be passed directly to an unsuspecting function–the error cannot be easily ignored– which is good. If we replace find by a new function safe_find that returns a Maybe<size_t>, the client won’t call substr with it. The types wouldn’t match. Instead, the result of safe_find must be unpacked and (much more likely than before) tested.

Maybe<size_t> off = safe_find(fileName, '.');
string ext;
if (!off.nothing)
    ext = fileName.substr(off.just, fileName.length() - off.just);

Notice what happened here: By changing the return type of a function we broke the natural way functions compose– the result of one becoming the input of the next. On the other hand, we have come up with a new way of composing such functions. Let me chain a few of such compositions for better effect:

Maybe<Foo> foo = f(x);
if (!foo.nothing) {
    Maybe<Bar> bar = g(foo.just);
    if (!bar.nothing) {
        Maybe<Baz> baz = h(bar.just);
        if (!baz.nothing) {
            ...
        }
    }
}

I have highlighted the elements of the “glue,” which is used to compose our new, more error-conscious functions. Notice how this boilerplate glue gets in the way of code clarity. Ideally, we would like to write something like this:

DO
{
    auto y = f(x);
    auto v = g(y);
    auto z = h(v);
    return z;
}

where DO would magically provide the glue that’s implicit in the chaining of f, g, and h.

It’s not easy to do this– abstract the glue away– in C++. I’m not even going to try. But in Haskell it’s a different story. Let’s start with the almost direct translation of the C++ code into Haskell:

compose n =
    let m = f n in
    case m of
    Nothing -> Nothing
    Just n1 ->
        let m1 = g n1 in
        case m1 of
        Nothing -> Nothing
        Just n2 -> 
            let n3 = h n2 in
            n3

The if statements are replaced by Haskell’s pattern matching (case x of). The ms are the Maybes and the ns art their contents (if they aren’t Nothings).

Notice the characteristic cascading style of this code– the nested conditionals or pattern matches. Let’s analyze one level of such a cascade. We start with a Maybe value (one returned by f n). We unpack it and examine the contents. If the result is not an error (Just n1), we continue with the rest of the body of compose. I have highlighted the “rest of the body” in blue (the “glue” is still in red).

What’s also important is that, if the result is an error (the Nothing branch of the pattern match) the whole “rest of the body” is skipped. In order to abstract the glue, we have to abstract the “rest of the body” as well. Such an abstraction is called a continuation. Let’s write this continuation as a lambda (lambdas in Haskell are written using the backslash, which is nothing but the Greek letter λ missing one leg):

\ n1 -> 
    let m1 = g n1 in
    case m1 of
    Nothing -> Nothing
    Just n2 -> 
        let n3 = h n2 in
        n3

And here’s the trick: We can abstract the glue as a (higher-order) function that takes a Maybe value and a continuation. Let’s call this new function bind and rewrite compose with its help:

compose n =
    let m = f n in
    -- the first argument is m, the second is the whole continuation
    bind m (\n1 ->
                let m1 = g n1 in
                case m1 of
                Nothing -> Nothing
                Just n2 -> 
                    let n3 = h n2 in
                    n3)

Here’s how bind is implemented:

bind m f = 
    case m of 
    Nothing -> Nothing  -- bypass the continuation
    Just n -> f n       -- pass n to the continuation

or, more compactly,

bind Nothing cont  = Nothing
bind (Just n) cont = cont n

Figure 1 illustrates the complete code transformation. The result of f n, which is a Maybe, is passed to bind, represented by a blue box. Inside bind the Maybe is unpacked. If its value is Nothing, nothing happens. If its value is Just n1, the rest of the code, the continuation, is called with the argument n1. The continuation itself is rewritten using bind, and so on. The final continuation calls return, which I will explain shortly.

Fig 1. Composition of functions that return Maybe results

The position of bind, the blue box in Figure 1, between its Maybe argument and the continuation suggests that infix notation might be more appropriate. Indeed, in Haskell bind is represented by the infix operator, >>=:

Nothing  >>= cont = Nothing
(Just x) >>= cont = cont x

(The left-hand side of the equal sign is the operator between its two arguments [actually, patterns], and the right-hand side is the body of the function.) We can express the type signature of the bind function as:

(>>=) :: Maybe a -> (a -> Maybe b) -> Maybe b

It takes a Maybe a and a function from a to Maybe b (that’s our continuation). It returns a Maybe b. When dealing with higher order functions I find type signatures extremely useful; so much so that I often use them as comments in my C++ meta-code.

The complete rewrite of compose using >>= looks like this:

compose1 n =
    f n >>= \n1 ->
        g n1 >>= \n2 ->
            h n2 >>= \n3 ->
                return n3

Now is the time to explain the mysterious return at the end of the function. No, it’s not the keyword for returning a result from a function. It’s a function that takes an argument of type a and turns it into a Maybe a:

return n = Just n

We need return because the result of compose is a Maybe. If any of the functions returns Nothing, compose returns Nothing (it’s part of the definition of >>=). Only when all functions succeed do we call return with the correct result, n3. It turns n3 into Just n3, thus announcing the success of the computation and encapsulating the final result.

You might say that using return instead of Just n3 is an overkill, but there are good reasons to do that. One is to encapsulate and hide direct access to the implementation of Maybe. The other has to do with the ways this pattern is generalized beyond Maybe.

The Maybe Monad

What does Maybe have to do with monads? Let’s see what we have done so far from a more general perspective.

We started from a computation which didn’t behave like a function– here, it was not defined for all arguments. We had found a clever way to turn it into a function by enriching its return type. Such general “enriching” of types can be expressed as a type constructor.

From a category-theoretical point of view, defining a type constructor, which is a mapping from types to types, gets us half way towards defining an endofunctor. The other part is the mapping of functions (morphisms). For any function that takes values of type a and returns values of type b we need a corresponding function that acts between the enriched types. In Haskell, the mapping of functions to functions can be expressed as a higher-order function. When this mapping is part of an endofunctor, the corresponding higher-order function is called fmap.

Let’s call the type constructor M. It constructs the enriched type M a from any type a. The function fmap has the following signature:

fmap :: (a -> b) -> (M a -> M b)

It maps a function from a to b to a function from M a to M b. In the case of Maybe we already know the type constructor part, and fmap will follow soon.

Let’s go back to our computation turned function returning an enriched type. In general, its signature is:

f :: a -> M b

Since we care about composability of computations, we need a way to glue the enriched functions together. In the process of gluing we usually have to inspect the enriched type and make decisions based on its value. One of such decisions might be whether to continue with the rest of the computation or not. So the general bind operation must take two arguments: the result of the previous enriched function and the rest of the computation– a continuation:

bind :: M a -> (a -> M b) -> M b

If you squint enough, you can see a similarity between bind and fmap.

Here’s how you do it: Imagine that you supply bind with its second argument, the continuation. There is still one free parameter of the type M a. So what’s left is a function that takes M a and returns M b. (This is not exactly currying, since we are binding the second argument rather than the first, but the idea is the same.) That’s exactly the type of function that occurs on the right hand side of the signature of fmap. Seen from that point of view, bind maps functions into functions:

(a -> M b) -> (M a -> M b)

The function of the left hand side is our continuation, the one on the right hand side is bind with the second argument plugged by the continuation.

This signature is really close to that of fmap, but not quite. Luckily, we still have a second family of functions at our disposal, the polymorphic return. The signature of return is:

return :: a -> M a

Using both bind and return we can lift any function f:

f :: a -> b

to a function g:

g :: M a -> M b

Here’s the magic formula that defines g in terms of f, bind, and return (the dot denotes regular function composition):

g ma = bind ma (return . f)

Or, using infix notation:

g ma = ma >>= (return . f)

This mapping of f into g becomes our definition of fmap. Of course, fmap must obey some axioms, in particular it must preserve function composition and interact correctly with unit (which I will discuss shortly). These axioms translate back into corresponding conditions on bind and return, which I’m not going to discuss here.

Now that we have recovered the functor, we need the two other parts of a monad: unit and join. Guess what, return is just another name for unit. For any type, it lifts an element of that type to an element of the enriched type in the most natural manner (incidentally, the word natural has a very precise meaning in category theory; both unit and join are natural transformations of the functor (M, fmap)).

To recover join, let’s look at its type signature in Haskell:

join :: M (M a) -> M a

It collapses one layer of the type constructor. In the case of Maybe it should map a double Maybe to a single Maybe. It’s easy to figure out that it should map Just (Just x) into Just x and anything else to Nothing. But there is a more general way of defining join using bind combined with the identity function:

join :: M (M a) -> M a
join mmx = mmx >>= id

We are binding a value of type M (M a) to a function id. This function, when acting on M a returns M a. You can convince yourself that these signatures match the general signature of bind (see Appendix); and that this definition, when applied to Maybe, produces the correct result.

To summarize: What we have done here is to show that Maybe together with bind and return is a monad in the category-theory sense. Actually, we have shown something far more general: Any triple that consists of a type constructor and two functions bind and return that obey certain identities (axioms) define a monad. In category theory this triple is called a Kleisli triple and can be used as an alternative definition of a monad.

Syntactic Sugar

So far you’ve seen only the Maybe example, so it might seem like we are pulling out a monadic cannon to kill a mosquito. But, as you’ll see later, this same pattern pops up in a lot of places in various guises. In fact it’s so common in functional languages that it acquired its own syntactic sugar. In Haskell, this sugar is called the do notation. Let’s go back to the implementation of our function compose:

compose n =
    f n >>= \n1 ->
        g n1 >>= \n2 ->
            h n2 >>= \n3 ->
                return n3

Here’s exactly the same code in do notation:

compose n = do
    n1 <- f n
    n2 <- g n1
    n3 <- h n2
    return n3

This looks deceptively similar to an imperative program. In fact here’s a C++ version of it (for the sake of the argument let’s assume that f takes a pointer to Foo and h returns an integer) :

int compose(Foo * n)
{
    auto n1 = f(n);
    auto n2 = g(n1);
    auto n3 = h(n2);
    return n3;
}

Uh, one more thing. This is how you would call it:

try {
    compose(pFoo);
}
catch(...) {
    // error handling
}

In C++ you get virtually the same functionality not by modifying the return type, but by throwing an exception. (Now you may utter your first cry, “Hey, that looks like a monad!”, when looking at C++ code.)

Just like in our Haskell example, once any of the functions reports an error (throws an exception), the rest of the body of compose is skipped. You might say that C++ exceptions offer more power than the Maybe monad and you’ll be right. But that’s because I haven’t shown you the Error monad and the Exception monad.

Where Haskell’s Exception monad beats C++ exceptions is in type checking. Remember the unfortunate attempt at adding exceptions specification to the C++ type system? (Java went the same way.) Here’s what the C++ standard says:

An implementation shall not reject an expression merely because when executed it throws or might throw an exception that the containing function does not allow.

In other words, exceptions specifications are bona fide comments. Not so in Haskell! If a function returns a Maybe or Exception, that becomes part of its type signature which is checked both at the call site and at the function definition site. No cheating is allowed, period.

But the major difference between Haskell’s and C++’s approach is that the do notation generalizes to all monads, whereas C++ neat try/catch syntax applies only to exceptions.

A word of caution when reading Haskell monadic code. Despite similarities in usage, Haskell’s left arrow is not an assignment. The left hand side identifier corresponds to the argument of the continuation (which is the rest of the do block). It is the result of unpacking the outcome of the right hand side expression. This unpacking always happens inside bind.

Non-deterministic Computations

In the previous post I introduced the list monad by defining a functor and two families of functions. The type constructor part of the functor mapped any type a into the list of a, [a]. The part that acted on functions (now we know that, in general, it’s called fmap; but for lists it’s just map) worked by applying the function to each element of the list. The unit and the join were polymorphic functions. unit was defined as:

 unit x = [x]

and join as concat.

Now I can tell you that the list monad provides the solution to the problem of implementing non-deterministic computations. Imagine parsing a non-deterministic grammar. A production might offer multiple alternative parse trees for the same input.

These types of computations may be simulated with functions returning lists of possible results. Here’s the same trick again: whatever type is returned by a deterministic parser (e.g., a parse-tree type), it’s now turned into an enriched type (a parse tree list).

We’ve already seen the category-theory construction of the list monad but here we are facing a slightly different problem: how to string together functions that return lists. We know that we have to define bind. It’s signature, in this case, would be:

bind :: [a] -> (a -> [b]) -> [b]

It takes a list (which is the result of a previous function) and a continuation, (a -> [b]), and must produce another list. The obvious thing is to apply the continuation to each element of the input list. But since the continuation also produces a list, we’ll end up with a list of lists. To get a single list as the result, we’ll simply concatenate the sublists. Here’s the final implementation:

xs >>= cont = concat (map cont xs)

The application of the continuation to each element of the input list is done using map.

Is this a coincidence that we were able to define bind in therms of join and fmap? Not at all. The general formula for converting the functor definition of a monad to a Kleisli triple is:

  • Take the object-mapping part of the functor (the type constructor)
  • Define bind as
    bind x f = join ((fmap f) x))

    where fmap is the part of the functor that maps morphisms.

  • Define return as unit

Now we know how to move between those two definitions back and forth.

Just as in the case of Maybe, we can apply the do notation to functions returning lists:

toss2Dice = do
    n <- tossDie
    m <- tossDie
    return (n + m)

Here, tossDie returns the list of all possible outcomes of a die toss, and toss2Dice returns the list of all possible sums of the outcomes of a two-die toss.

An interesting observation is that the list monad is closely related to list comprehensions, down to the use of left arrows (in Haskell). For instance, the above example is equivalent to:

toss2Dice = [n + m | n <- tossDie, m <- tossDie]

Conclusions

There is a large class of computations that convert input to output in a non-functional way. Many of those can be expressed as functions that return “enriched” output. This enrichment of the output can be expressed in terms of a type constructor. This type constructor defines the first component of a monad.

Computations have to be composable, so we need a way of composing the enriched functions. This introduces the second component of the monad, the bind family of higher-order functions.

Finally, we should be able to construct functions that return enriched types, and for that we need the third component of the monad, the return family of functions. They convert regular values into their enriched counterparts.

I’ve shown you two examples of Haskell monads in action, but the real treat will come in the third installment of my mini-series. I’ll show you how to define and compose functions that, instead of returning regular values, return functions that calculate those values.

Appendix: Join From Bind

By popular demand (see comments), I decided to explain in more detail the typing of the formula:

join :: M (M a) -> M a
join mmx = mmx >>= id

At first look it seems like id doesn’t have the right signature for the second parameter to bind:

bind :: M a' -> (a' -> M b') -> M b'

(I changed the names of type arguments to a' and b' to avoid confusion.) However, notice that here we are calling bind with the first argument, mmx, of type M (M a). It means that a' in this particular instantiation of bind is (M a). The second argument to bind is id with the signature:
id: c -> c
Here, id will be called with c equal to (M a), so its return type will also be (M a). With those substitutions, bind will also return type (M a):

M (M a) -> (M a -> M a) -> M a

Thus the whole combination (mmx >>= id) ends up with the right type signature for join.

Bibliography

  1. Eugenio Moggi, Notions of Computation and Monads. This is a hard core research paper that started the whole monad movement in functional languages.
  2. Philip Wadler, Monads for Functional Programming. The classic paper introducing monads into Haskell.
  3. Tony Morris, What Does Monad Mean?
  4. Brian Beckman, Don’t fear the Monad
  5. Graham Hutton, Erik Meijer, Monadic Parsing in Haskell.
  6. Brian McNamara, Yannis Smaragdakis, Syntax sugar for FC++: lambda, infix, monads, and more. A C++ template library for functional programming that imitates Haskell pretty closely.

The Monad is like a bellows:
it is empty yet infinitely capable.
The more you use it, the more it produces;
the more you talk about it, the less you understand.

–Monad Te Ching

I don’t know if I’m exaggerating but it seems like every programmer who gets monads posts a tutorial about them. (And each post begins with: There’s already a lot of monad tutorials on the Internet, but…) The reason is that getting monads it’s like a spiritual experience that you want to share with others.

When facing a monad, people often behave like the three blind men describing an elephant. You’ll see monads described as containers and monads described as actions. Some people see them as a cover-up for side effects, others as examples of endofunctors in Category Theory.

Monads are hard to describe because they don’t correspond to anything in our everyday experience. Compare this with Objects in Object-Oriented programming. Even an infant knows what an object is (something you can put in your mouth). What do you do with a monad?

But first, let me answer the pertinent question:

Why Bother?

Monads enable pure functional programmers to implement mutation, state, I/O, and a plethora of other things that are not functions. Well, you might say, they brought it on themselves. They tied their hands behind their backs and now they’re bragging that they can type with their toes. Why should we pay attention?

The thing is, all those non-functional things that we are so used to doing in imperative programming are also sources of a lot of troubles. Take side effects for instance. Smart programmers (read: the ones who burnt their fingers too many times) try to minimize the use of global and static variables for fear of side effects. That’s doable if you know what you’re doing. But the real game changer is multithreading. Controlling the sharing of state between threads is not just good programming practice– it’s a survival skill. Extreme programming models are in use that eliminate sharing altogether, like Erlang’s full isolation of processes and its restriction of message passing to values.

Monads stake the ground between total anarchy of imperative languages and the rigid dictatorship of Erlang-like isolationism. They don’t prohibit sharing or side effects but let you control them. And, since the control is exercised through the type system, a program that uses monads can be checked for correctness by the compiler. Considering how hard it it to test for data races in imperative programs, I think it’s worth investing some time to learn monads.

There is also a completely different motivation: metaprogramming. The template language used for metaprogramming in C++ is a pure functional language (see my blog post, What does Haskell have to do with C++?). If monads are so important in functional programming, they must also pop up in C++ metaprogramming. And indeed they do. I hope to discuss this topic in a future post.

So what’s a monad?

A Categorical Answer

If you don’t know anything about category theory, don’t get intimidated. This is really simple stuff and it will clarify a lot of things, not to mention earning you some bragging rights. My main goal is to share some intuitions from mathematics that will build foundations for a deeper understanding of monads in programming. In this installment I will explain categories, functors, and endofunctors, leading up to monads. I will give examples taken both from everyday life and from programming. I will really get into monads and their practical applications in the next installment, so be patient.

Categories

A category is a natural extension of our notion of sets and functions. The generalization of a set in a category is called an object (a pretty neutral term with little semantic ballast), and the generalization of a function is called a morphism. In fact, the standard example of a category is the category of sets and functions called (capital letter) Set.

A morphism (read “function”) goes from one object (read “set”) to another. Mathematical functions like sin or exp usually go from the set of real numbers to the set of real numbers. But you may also define functions like isPrime that go from natural numbers to Booleans, or a function price that goes from a set of goods to the set of numbers.

The only thing a mathematician needs to know about morphisms is that they can be composed. If you have a morphism from A to B, A->B, and another going from B to C, B->C, then they can be composed to a morphism from A to C, A->C. And just like the standard composition of functions, morphism composition must be associative, so we don’t need parentheses when composing more than two of them.

Actually, two things. There must be, for every object, a special morphism called identity that essentially does nothing and when composed with any other morphism reproduces the same morphism.

Just to throw you off the track, a category doesn’t have to be built on sets and functions. You can easily construct simple categories from blobs and arrows. Fig 1 shows such a category that contains two objects and four morphisms: arrows between them (formally, those arrows are ordered pairs of objects so, for instance, f is a pair (A, B)). You can easily check that any two morphisms can be composed and that the two moprphisms iA and iB serve as identities.

Fig 1. A simple category with two objects and four morphisms.

That’s it! Hopefully I have just convinced you that a category is not a big deal. But let’s get down to Earth. The one category that’s really important in programming languages is the category of types and functions, in particular its Haskell version called Hask. There usually is a finite set of basic types like integers or Booleans, and an infinite set of derived types, like lists of integers, functions from integers to Booleans, etc. In Hask, a type is just a set of values. For instance, the type Char is a set {‘a’, ‘b’, ‘c’, … }.

So, in the category Hask, types are objects and functions are morphisms. Indeed, a function maps one type into another (forget for a moment functions of multiple arguments– they can be modeled with currying– and polymorphic functions– they are families of functions). And these are functions in the functional-programming sense: called with the same values they return the same values–no side effects allowed.

Function composition is just passing the result of one function as an argument to another. The identity function takes x and immediately returns it back.

This is all fine, but what’s in it for me, you might ask. So here’s the first insight and a moment of Zen. If there is one thing that you can call the essence of programming, it’s composability. In any style of programming you always compose your program from smaller pieces, and those pieces from even smaller pieces, and so on. That’s why categories with their composable morphisms are so important. The essence of Lego blocks is the way they fit together, their composability, not the color or size. The essence of functional programming is how functions work together: how you can build larger functions from smaller ones.

Every category is defined by its choice of objects and morphisms. But is there something that can characterize a given category that’s independent of its choice of particular objects and morphisms? How do you expose the inner structure of a particular category? Mathematicians know exactly how to do that. You have to be able to map categories into other categories while preserving some constraints imposed by the way morphisms are attached to objects and the way they compose. Such maps let you find similarities between categories and catalog different kinds of categories. That’s when things get really interesting.

Functors

A functor, F, is a map from one category to another: it maps objects into objects and morphisms into morphisms. But it can’t do it in a haphazard way because that would destroy the very structures that we are after. So we must impose some “obvious” (mathematicians love that word) constraints.

First of all, if you have a morphism between two objects in the first category then it better be mapped into a morphism between the corresponding objects in the second category. Fig 2 explains this diagrammatically. Object A is mapped into F(A), object B into F(B). A morphism f from A to B is mapped into a morphism F(f) from F(A) to F(B). Mathematicians say that such a diagram must commute, that is the result must be the same whether you go from A to F(A) and then apply F(f), or first apply f and then go from B to F(B).

Functor diagram

Fig 2. Diagram showing the action of a functor F on objects A and B and a morphism f. The bottom part lives in F's domain (source) category, the top part in its codomain (the target).

Moreover, such mapping should preserve the composition property of morphisms. So if morphism h is a composition of f and g, then F(h) must be a composition of F(f) and F(g). And, of course, the functor must map identity morphisms into identity morphisms.

To get a feel for how constrained functors are by these conditions, consider how you could map the category in Fig 1 into itself (such a functor just rearranges things inside one category). There are two trivial mappings that collapse both objects into one (either A or B), and turn all morphisms into identity. Then there is the identity functor that maps both objects into themselves and all morphisms into themselves. Finally, there is just one “interesting” functor that maps A into B and B into A with f and g switching roles. Now imagine a similar category but with the g arrow removed (yes, it’s still a category). Suddenly there is no functor other than the collapsing ones between Fig 1 and that new category. That’s because the two categories have completely different structure.

Let me now jump into more familiar territory. Since we are mostly interested in one category, Hask, let me define a functor that maps that category into itself (such functors are called endofunctors). An object in Hask is a type, so our functor must map types into types. The way to look at it is that a functor in Hask constructs one type from another– it’s a type constructor. Don’t get confused by the name: a type constructor creates a new type in your program, but that type has already existed in Hask.

A classical example is the list type constructor. Given any type it constructs a list of that type. Type Integer is mapped into list of integers or, in Haskell notation, [Integer]. Notice that this is not a map defined on integer values, like 1, 2, or 3. It also doesn’t add a new type to Hask— the type [Integer] is already there. It just maps one type into another. For C++ programmers: think of mapping type T into a container of T; for instance, std::vector<T>.

Mapping the types is the easy part, what about functions? We have to find a way to take a particular function and map it into a function on lists. That’s also easy: apply the function to each element of the list in turn. There is a (higher level) function in Haskel that does it. It’s called map and it takes a function and a list and returns a new list (or, because of currying, you may say that it takes a function and returns a function acting on lists). In C++ there is a corresponding template function called std::transform (well, it takes two iterators and a function object, but the idea is the same).

Mathematicians often use diagrams to illustrate the properties of morphisms and functors (see Fig 2). The arrows for morphisms are usually horizontal, while the arrows for functors are vertical (going up). That’s why the mapping of morphisms under a functor is often called lifting. You can take a function operating on integers and “lift it” (using a functor) to a function operating on lists of integers, and so on.

The list functor obviously preserves function composition and identity (I’ll leave it as an easy but instructive exercise for the reader).

And now for another moment of Zen. What’s the second most important property of programming? Reusability! Look what we have just done: We took all the functions we’ve implemented so far and lifted them to the level of lists. We’ve got functions operating on lists essentially for free (well, we’ve got a small but important subset of those functions). And the same trick may be applied to all kinds of containers, arrays, trees, queues, unique_ptrs and more.

It’s all beautiful, but you don’t really need category theory to apply functions to lists. Still it’s always good to see patterns in programming, and this one is definitely a keeper. The real revolution starts with monads. And, guess what, the list functor is actually a monad. You just need a few more ingredients.

What’s the intuition behind the statement that mappings expose the structure of the system? Consider the schematic of the London underground in Fig 3. It’s just a bunch of circles and lines. It’s only relevant because there is a mapping between the city of London and this schematic. The circles correspond to tube stations and the lines to train connections. Most importantly, if trains run between two stations, the corresponding circles in the diagram are connected by lines and vice versa: these are the constraints that the mapping preserves. The schematic shows a certain structure that exists in London (mostly hidden underground) which is made apparent by the mapping.

Fig 3. The schematic map of London underground system.

Interestingly, what I’m doing here is also mapping: London and the underground map correspond to two categories. Trains stations/circles are objects and train connections/lines are morphism. How’s that for an example?

Endofunctors

Mathematicians love mappings that preserve “obvious” constraints. As I explained, such mappings abstract inner structures away from the details of implementation. But you can also learn a lot about structure by studying non-trivial mappings into itself. Functors that map a category into itself are called endofunctors (like endo-scopes they let you look inside things). If functors expose similarities, endofunctors expose self-similarities. Take one look at the fractal fern, Fig 4, and you’ll understand how powerful self-similarity can be.

Fractal Fern

Fig 4. This fractal fern was generated using just four endomorphisms.

With a little bit of imagination you can see the list functor exposing fern-like structures inside Hask (Fig 5). Chars fan out into lists of Chars, which then fan out into lists of lists of Chars, and so on, ad infinitum. Horizontal structures described by functions from Char to Bool are reflected at higher and higher levels as functions on lists, lists of lists, etc.

Fig 5. The action of the list type constructor reveals fractal-like structure inside Hask. The functor lifts things up, the functions act horizontally.

A C++ template that takes a type parameter could be considered a type constructor. How likely is it that it also defines a functor (loosely speaking– C++ is not as mathematized as Haskell)? You have to ask yourself: Is the type parameter constrained in any way? It’s often hard to say, because type constraints are implicit in the body of a template and are tested only during instantiation. For instance, the type parameter for a std::vector must be copyable. That eliminates, for instance, classes that have private or deleted (in C++0x) copy constructors. This is not a problem though, because copyable types form a subcategory (I’m speaking really loosely now). The important thing is that a vector of copyable is itself copyable, so the “endo-” part of the endomorphism holds. In general you want to be able to feed the type created by the type constructor back to the type constructor, as in std::vector<std::vector<Foo>>. And, of course, you have to be able to lift functions in a generic way too, as in std::transform.

Monads

Ooh, Monads!
–Haskell Simpson

It’s time to finally lift the veil. I’ll start with the definition of a monad that builds on the previous sections and is mostly used by mathematicians. There is another one that’s less intuitive but easier to use in programming. I’ll leave that one for later.

A monad is an endofunctor together with two special families of morphisms, both going vertically, one up and one down (for “directions” see Fig 5). The one going up is called unit and the one going down is called join.

Now we are juggling a lot of mappings so let’s slow down to build some intuition. Remember, a functor maps objects: in our case, types, which are sets of values. The functor doesn’t see what’s inside the objects; morphisms, in general, do. In our case, a morphism is a function that maps values of one type into values of another type. Our functors, which are defined by type constructors, usually map poorer types into richer types; in the sense that type Bool is a set that contains just two elements, True and False, but type [Bool] contains infinitely many lists of True and False.

Unit takes a value from the poorer type, then picks one value from the richer type, and pronounces the two roughly equivalent. Such a rough equivalent of True from the Bool object is the one-element list [True] from the [Bool] object. Similarly, unit would map False into [False]. It would also map integer 5 into [5] and so on.

Unit can be thought of as immersing values from a lower level into the higher level in the most natural way possible. By the way, in programming we call a family of functions defined for any type a polymorphic function. In C++, we would express unit as a template, like this:

template<class T>
std::vector<T> unit(T value) {
    std::vector<T> vec;
    vec.push_back(value);
    return vec;
}

To explain join, imagine the functor acting twice. For instance, from a given type T the list functor will first construct the type [T] (list of T), and then [[T]] (list of list of T). Join removes one layer of “listiness” by joining the sub-lists. Plainly speaking, it just concatenates the inner lists. Given, for instance, [[a, b], [c], [d, e]], it produces [a, b, c, d, e]. It’s a many-to-one mapping from the richer type to the poorer type and the type-parameterized family of joins also forms a polymorphic function (a template, in C++).

There are a few monadic axioms that define the properties of unit and join (for instance that unit and join cancel each other), but I’m not going to elaborate on them. The important part is that the existence of unit and join imposes new constraints on the endofunctor and thus exposes even more structure.

Mathematicians look at join as the grandfather of all multiplication with unit being its neutral element. It’s heaven for mathematicians because multiplication leads to algebraic structures and indeed monads are great for constructing algebras and finding their hidden properties.

Unlike mathematicians, we programmers are not that interested in algebraic structures. So there must be something else that makes monads such a hit. As I mentioned in the beginning, in programming we often face problems that don’t naturally translate into the functional paradigm. There are some types of computations that are best expressed in imperative style. It doesn’t mean they can’t be translated into functions, it’s just that the translation is somewhat awkward and tedious. Monads provide an elegant tool to do this translation. Monads made possible the absorption and assimilation of imperative programming into functional programming, so much so that some people claim (tongue in cheek?) that Haskell is the best imperative language. And like all things functional monads are bound to turn around and find their place in imperative programming. But that’s material for my next blog post.

Bibliography

« Previous Page