Monads



This is part 21 of Categories for Programmers. Previously: Monads: Programmer’s Definition. See the Table of Contents.

Now that we know what the monad is for — it lets us compose embellished functions — the really interesting question is why embellished functions are so important in functional programming. We’ve already seen one example, the Writer monad, where embellishment let us create and accumulate a log across multiple function calls. A problem that would otherwise be solved using impure functions (e.g., by accessing and modifying some global state) was solved with pure functions.

The Problem

Here is a short list of similar problems, copied from Eugenio Moggi’s seminal paper, all of which are traditionally solved by abandoning the purity of functions.

  • Partiality: Computations that may not terminate
  • Nondeterminism: Computations that may return many results
  • Side effects: Computations that access/modify state
    • Read-only state, or the environment
    • Write-only state, or a log
    • Read/write state
  • Exceptions: Partial functions that may fail
  • Continuations: Ability to save the state of the program and then restore it on demand
  • Interactive Input
  • Interactive Output

What really is mind blowing is that all these problems may be solved using the same clever trick: turning to embellished functions. Of course, the embellishment will be totally different in each case.

You have to realize that, at this stage, there is no requirement that the embellishment be monadic. It’s only when we insist on composition — being able to decompose a single embellished function into smaller embellished functions — that we need a monad. Again, since each of the embellishments is different, monadic composition will be implemented differently, but the overall pattern is the same. It’s a very simple pattern: composition that is associative and equipped with identity.

The next section is heavy on Haskell examples. Feel free to skim or even skip it if you’re eager to get back to category theory or if you’re already familiar with Haskell’s implementation of monads.

The Solution

First, let’s analyze the way we used the Writer monad. We started with a pure function that performed a certain task — given arguments, it produced a certain output. We replaced this function with another function that embellished the original output by pairing it with a string. That was our solution to the logging problem.

We couldn’t stop there because, in general, we don’t want to deal with monolithic solutions. We needed to be able to decompose one log-producing function into smaller log-producing functions. It’s the composition of those smaller functions that led us to the concept of a monad.

What’s really amazing is that the same pattern of embellishing the function return types works for a large variety of problems that normally would require abandoning purity. Let’s go through our list and identify the embellishment that applies to each problem in turn.

Partiality

We modify the return type of every function that may not terminate by turning it into a “lifted” type — a type that contains all values of the original type plus the special “bottom” value ⊥. For instance, the Bool type, as a set, would contain two elements: True and False. The lifted Bool contains three elements. Functions that return the lifted Bool may produce True or False, or execute forever.

The funny thing is that, in a lazy language like Haskell, a never-ending function may actually return a value, and this value may be passed to the next function. We call this special value the bottom. As long as this value is not explicitly needed (for instance, to be pattern matched, or produced as output), it may be passed around without stalling the execution of the program. Because every Haskell function may be potentially non-terminating, all types in Haskell are assumed to be lifted. This is why we often talk about the category Hask of Haskell (lifted) types and functions rather than the simpler Set. It is not clear, though, that Hask is a real category (see this Andrej Bauer post).
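
To make this concrete, here’s a small sketch (the names are invented just for illustration):

-- A computation that never produces a Bool; denotationally, its value is bottom.
diverge :: Bool
diverge = go 0
  where go :: Integer -> Bool
        go n = go (n + 1)

-- Laziness lets bottom be passed around unexamined:
alwaysTrue :: Bool -> Bool
alwaysTrue _ = True

-- alwaysTrue diverge evaluates to True, because the argument is never inspected.
-- Pattern matching on diverge (e.g., not diverge) would loop forever.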

Nondeterminism

If a function can return many different results, it may as well return them all at once. Semantically, a non-deterministic function is equivalent to a function that returns a list of results. This makes a lot of sense in a lazy garbage-collected language. For instance, if all you need is one value, you can just take the head of the list, and the tail will never be evaluated. If you need a random value, use a random number generator to pick the n-th element of the list. Laziness even allows you to return an infinite list of results.

In the list monad — Haskell’s implementation of nondeterministic computations — join is implemented as concat. Remember that join is supposed to flatten a container of containers — concat concatenates a list of lists into a single list. return creates a singleton list:

instance Monad [] where
    join = concat
    return x = [x]

The bind operator for the list monad is given by the general formula — fmap followed by join — which, in this case, gives:

as >>= k = concat (fmap k as)

Here, the function k, which itself produces a list, is applied to every element of the list as. The result is a list of lists, which is flattened using concat.
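
For example, tracing this formula on a small input (a quick sketch with an invented function):

-- A non-deterministic "function" returning both a number and its negation:
plusMinus :: Int -> [Int]
plusMinus x = [x, -x]

-- [1, 2, 3] >>= plusMinus
--   = concat (fmap plusMinus [1, 2, 3])
--   = concat [[1, -1], [2, -2], [3, -3]]
--   = [1, -1, 2, -2, 3, -3]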

From the programmer’s point of view, working with a list is easier than, for instance, calling a non-deterministic function in a loop, or implementing a function that returns an iterator (although, in modern C++, returning a lazy range would be almost equivalent to returning a list in Haskell).

A good example of using non-determinism creatively is in game programming. For instance, when a computer plays chess against a human, it can’t predict the opponent’s next move. It can, however, generate a list of all possible moves and analyze them one by one. Similarly, a non-deterministic parser may generate a list of all possible parses for a given expression.

Even though we may interpret functions returning lists as non-deterministic, the applications of the list monad are much wider. That’s because stitching together computations that produce lists is a perfect functional substitute for iterative constructs — loops — that are used in imperative programming. A single loop can often be rewritten using fmap that applies the body of the loop to each element of the list. The do notation in the list monad can be used to replace complex nested loops.

My favorite example is the program that generates Pythagorean triples — triples of positive integers that can form sides of right triangles.

triples = do
    z <- [1..]
    x <- [1..z]
    y <- [x..z]
    guard (x^2 + y^2 == z^2)
    return (x, y, z)

The first line tells us that z gets an element from an infinite list of positive numbers [1..]. Then x gets an element from the (finite) list [1..z] of numbers between 1 and z. Finally y gets an element from the list of numbers between x and z. We have three numbers 1 <= x <= y <= z at our disposal. The function guard takes a Bool expression and returns a list of units:

guard :: Bool -> [()]
guard True  = [()]
guard False = []

This function (which is a member of a larger class called MonadPlus) is used here to filter out non-Pythagorean triples. Indeed, if you look at the implementation of bind (or the related operator >>), you’ll notice that, when given an empty list, it produces an empty list. On the other hand, when given a non-empty list (here, the singleton list containing unit [()]), bind will call the continuation, here return (x, y, z), which produces a singleton list with a verified Pythagorean triple. All those singleton lists will be concatenated by the enclosing binds to produce the final (infinite) result. Of course, the caller of triples will never be able to consume the whole list, but that doesn’t matter, because Haskell is lazy.

The problem that normally would require a set of three nested loops has been dramatically simplified with the help of the list monad and the do notation. As if that weren’t enough, Haskell lets you simplify this code even further using list comprehension:

triples = [(x, y, z) | z <- [1..], x <- [1..z], y <- [x..z], x^2 + y^2 == z^2]

This is just further syntactic sugar for the list monad (strictly speaking, MonadPlus).

You might see similar constructs in other functional or imperative languages under the guise of generators and coroutines.

Read-Only State

A function that has read-only access to some external state, or environment, can always be replaced by a function that takes that environment as an additional argument. A pure function (a, e) -> b (where e is the type of the environment) doesn’t look, at first sight, like a Kleisli arrow. But as soon as we curry it to a -> (e -> b) we recognize the embellishment as our old friend the reader functor:

newtype Reader e a = Reader (e -> a)

You may interpret a function returning a Reader as producing a mini-executable: an action that given an environment produces the desired result. There is a helper function runReader to execute such an action:

runReader :: Reader e a -> e -> a
runReader (Reader f) e = f e

It may produce different results for different values of the environment.

Notice that both the function returning a Reader, and the Reader action itself are pure.

To implement bind for the Reader monad, first notice that you have to produce a function that takes the environment e and produces a b:

ra >>= k = Reader (\e -> ...)

Inside the lambda, we can execute the action ra to produce an a:

ra >>= k = Reader (\e -> let a = runReader ra e
                         in ...)

We can then pass the a to the continuation k to get a new action rb:

ra >>= k = Reader (\e -> let a  = runReader ra e
                             rb = k a
                         in ...)

Finally, we can run the action rb with the environment e:

ra >>= k = Reader (\e -> let a  = runReader ra e
                             rb = k a
                         in runReader rb e)

To implement return we create an action that ignores the environment and returns the unchanged value.

Putting it all together, after a few simplifications, we get the following definition:

instance Monad (Reader e) where
    ra >>= k = Reader (\e -> runReader (k (runReader ra e)) e)
    return x = Reader (\e -> x)
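
Here’s a small usage sketch, using the Reader type defined above (the Config type and its fields are made up for illustration):

data Config = Config { verbose :: Bool, prompt :: String }

greeting :: Reader Config String
greeting = Reader (\e -> prompt e ++ ">")

-- A Kleisli arrow that consults the environment to embellish its result:
annotate :: String -> Reader Config String
annotate s = Reader (\e -> if verbose e then s ++ " (verbose)" else s)

-- Compose the two with bind and run the result with a concrete environment:
-- runReader (greeting >>= annotate) (Config True "ghci")  ==>  "ghci> (verbose)"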

Write-Only State

This is just our initial logging example. The embellishment is given by the Writer functor:

newtype Writer w a = Writer (a, w)

For completeness, there’s also a trivial helper runWriter that unpacks the data constructor:

runWriter :: Writer w a -> (a, w)
runWriter (Writer (a, w)) = (a, w)

As we’ve seen before, in order to make Writer composable, w has to be a monoid. Here’s the monad instance for Writer written in terms of the bind operator:

instance (Monoid w) => Monad (Writer w) where 
    (Writer (a, w)) >>= k = let (a', w') = runWriter (k a)
                            in Writer (a', w `mappend` w')
    return a = Writer (a, mempty)
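
A quick usage sketch (the functions are invented for illustration):

double :: Int -> Writer String Int
double n = Writer (2 * n, "doubled ")

increment :: Int -> Writer String Int
increment n = Writer (n + 1, "incremented ")

-- The logs are accumulated by bind:
-- runWriter (double 5 >>= increment)  ==>  (11, "doubled incremented ")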

State

Functions that have read/write access to state combine the embellishments of the Reader and the Writer. You may think of them as pure functions that take the state as an extra argument and produce a pair value/state as a result: (a, s) -> (b, s). After currying, we get them into the form of Kleisli arrows a -> (s -> (b, s)), with the embellishment abstracted in the State functor:

newtype State s a = State (s -> (a, s))

Again, we can look at a Kleisli arrow as returning an action, which can be executed using the helper function:

runState :: State s a -> s -> (a, s)
runState (State f) s = f s

Different initial states may not only produce different results, but also different final states.

The implementation of bind for the State monad is very similar to that of the Reader monad, except that care has to be taken to pass the correct state at each step:

sa >>= k = State (\s -> let (a, s') = runState sa s
                            sb = k a
                        in runState sb s')

Here’s the full instance:

instance Monad (State s) where
    sa >>= k = State (\s -> let (a, s') = runState sa s 
                            in runState (k a) s')
    return a = State (\s -> (a, s))

There are also two helper Kleisli arrows that may be used to manipulate the state. One of them retrieves the state for inspection:

get :: State s s
get = State (\s -> (s, s))

and the other replaces it with a completely new state:

put :: s -> State s ()
put s' = State (\s -> ((), s'))
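
Here’s a small sketch that combines get and put to hand out consecutive labels (the names are made up for illustration):

fresh :: State Int String
fresh = get >>= \n ->
        put (n + 1) >>= \_ ->
        return ("id" ++ show n)

-- Each run of fresh consumes one counter value and bumps the state:
-- runState (fresh >>= \a -> fresh >>= \b -> return (a, b)) 0
--   ==>  (("id0", "id1"), 2)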

Exceptions

An imperative function that throws an exception is really a partial function — it’s a function that’s not defined for some values of its arguments. The simplest implementation of exceptions in terms of pure total functions uses the Maybe functor. A partial function is extended to a total function that returns Just a whenever it makes sense, and Nothing when it doesn’t. If we want to also return some information about the cause of the failure, we can use the Either functor instead (with the first type fixed, for instance, to String).

Here’s the Monad instance for Maybe:

instance Monad Maybe where
    Nothing >>= k = Nothing
    Just a  >>= k = k a
    return a = Just a

Notice that monadic composition for Maybe correctly short-circuits the computation (the continuation k is never called) when an error is detected. That’s the behavior we expect from exceptions.
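
A typical sketch of this short-circuiting (the function is invented for illustration):

safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv n d = Just (n `div` d)

-- Chaining partial computations; the first Nothing aborts the rest:
-- safeDiv 100 5 >>= safeDiv 60  ==>  Just 3
-- safeDiv 100 0 >>= safeDiv 60  ==>  Nothing  (the continuation is never called)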

Continuations

It’s the “Don’t call us, we’ll call you!” situation you may experience after a job interview. Instead of getting a direct answer, you are supposed to provide a handler, a function to be called with the result. This style of programming is especially useful when the result is not known at the time of the call because, for instance, it’s being evaluated by another thread or delivered from a remote web site. A Kleisli arrow in this case returns a function that accepts a handler, which represents “the rest of the computation”:

data Cont r a = Cont ((a -> r) -> r)

The handler a -> r, when it’s eventually called, produces the result of type r, and this result is returned at the end. A continuation is parameterized by the result type. (In practice, this is often some kind of status indicator.)

There is also a helper function for executing the action returned by the Kleisli arrow. It takes the handler and passes it to the continuation:

runCont :: Cont r a -> (a -> r) -> r
runCont (Cont k) h = k h

The composition of continuations is notoriously difficult, so its handling through a monad and, in particular, the do notation, is of extreme advantage.

Let’s figure out the implementation of bind. First let’s look at the stripped down signature:

(>>=) :: ((a -> r) -> r) -> 
         (a -> (b -> r) -> r) -> 
         ((b -> r) -> r)

Our goal is to create a function that takes the handler (b -> r) and produces the result r. So that’s our starting point:

ka >>= kab = Cont (\hb -> ...)

Inside the lambda, we want to call the function ka with the appropriate handler that represents the rest of the computation. We’ll implement this handler as a lambda:

runCont ka (\a -> ...)

In this case, the rest of the computation involves first calling kab with a, and then passing hb to the resulting action kb:

runCont ka (\a -> let kb = kab a
                  in runCont kb hb)

As you can see, continuations are composed inside out. The final handler hb is called from the innermost layer of the computation. Here’s the full instance:

instance Monad (Cont r) where
    ka >>= kab = Cont (\hb -> runCont ka (\a -> runCont (kab a) hb))
    return a = Cont (\ha -> ha a)
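
Here’s a tiny usage sketch (invented for illustration): two continuation-style computations composed with bind and run with a final handler.

addCont :: Int -> Int -> Cont r Int
addCont x y = Cont (\h -> h (x + y))

squareCont :: Int -> Cont r Int
squareCont x = Cont (\h -> h (x * x))

-- Supply show as the final handler; it is called from the innermost layer:
-- runCont (addCont 1 2 >>= squareCont) show  ==>  "9"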

Interactive Input

This is the trickiest problem and a source of a lot of confusion. Clearly, a function like getChar, if it were to return a character typed at the keyboard, couldn’t be pure. But what if it returned the character inside a container? As long as there was no way of extracting the character from this container, we could claim that the function is pure. Every time you call getChar it would return exactly the same container. Conceptually, this container would contain the superposition of all possible characters.

If you’re familiar with quantum mechanics, you should have no problem understanding this analogy. It’s just like the box with the Schrödinger’s cat inside — except that there is no way to open or peek inside the box. The box is defined using the special built-in IO functor. In our example, getChar could be declared as a Kleisli arrow:

getChar :: () -> IO Char

(Actually, since a function from the unit type is equivalent to picking a value of the return type, the declaration of getChar is simplified to getChar :: IO Char.)

Being a functor, IO lets you manipulate its contents using fmap. And, as a functor, it can store the contents of any type, not just a character. The real utility of this approach comes to light when you consider that, in Haskell, IO is a monad. It means that you are able to compose Kleisli arrows that produce IO objects.

You might think that Kleisli composition would allow you to peek at the contents of the IO object (thus “collapsing the wave function,” if we were to continue the quantum analogy). Indeed, you could compose getChar with another Kleisli arrow that takes a character and, say, converts it to an integer. The catch is that this second Kleisli arrow could only return this integer as an (IO Int). Again, you’ll end up with a superposition of all possible integers. And so on. The Schrödinger’s cat is never out of the bag. Once you are inside the IO monad, there is no way out of it. There is no equivalent of runState or runReader for the IO monad. There is no runIO!

So what can you do with the result of a Kleisli arrow, the IO object, other than compose it with another Kleisli arrow? Well, you can return it from main. In Haskell, main has the signature:

main :: IO ()

and you are free to think of it as a Kleisli arrow:

main :: () -> IO ()

From that perspective, a Haskell program is just one big Kleisli arrow in the IO monad. You can compose it from smaller Kleisli arrows using monadic composition. It’s up to the runtime system to do something with the resulting IO object (also called IO action).
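
For instance, here’s a minimal sketch of such a program, composed from the standard getLine and putStrLn using bind:

main :: IO ()
main = getLine >>= \name ->
       putStrLn ("Hello, " ++ name ++ "!")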

Notice that the arrow itself is a pure function — it’s pure functions all the way down. The dirty work is relegated to the system. When it finally executes the IO action returned from main, it does all kinds of nasty things like reading user input, modifying files, printing obnoxious messages, formatting a disk, and so on. The Haskell program never dirties its hands (well, except when it calls unsafePerformIO, but that’s a different story).

Of course, because Haskell is lazy, main returns almost immediately, and the dirty work begins right away. It’s during the execution of the IO action that the results of pure computations are requested and evaluated on demand. So, in reality, the execution of a program is an interleaving of pure (Haskell) and dirty (system) code.

There is an alternative interpretation of the IO monad that is even more bizarre but makes perfect sense as a mathematical model. It treats the whole Universe as an object in a program. Notice that, conceptually, the imperative model treats the Universe as an external global object, so procedures that perform I/O have side effects by virtue of interacting with that object. They can both read and modify the state of the Universe.

We already know how to deal with state in functional programming — we use the state monad. Unlike simple state, however, the state of the Universe cannot be easily described using standard data structures. But we don’t have to, as long as we never directly interact with it. It’s enough that we assume that there exists a type RealWorld and, by some miracle of cosmic engineering, the runtime is able to provide an object of this type. An IO action is just a function:

type IO a  =  RealWorld -> (a, RealWorld)

Or, in terms of the State monad:

type IO = State RealWorld

However, >=> and return for the IO monad have to be built into the language.

Interactive Output

The same IO monad is used to encapsulate interactive output. RealWorld is supposed to contain all output devices. You might wonder why we can’t just call output functions from Haskell and pretend that they do nothing. For instance, why do we have:

putStr :: String -> IO ()

rather than the simpler:

putStr :: String -> ()

Two reasons: Haskell is lazy, so it would never call a function whose output — here, the unit object — is not used for anything. And, even if it weren’t lazy, it could still freely change the order of such calls and thus garble the output. The only way to force sequential execution of two functions in Haskell is through data dependency. The input of one function must depend on the output of another. Having RealWorld passed between IO actions enforces sequencing.

Conceptually, in this program:

main :: IO ()
main = do
    putStr "Hello "
    putStr "World!"

the action that prints “World!” receives, as input, the Universe in which “Hello ” is already on the screen. It outputs a new Universe, with “Hello World!” on the screen.

Conclusion

Of course I have just scratched the surface of monadic programming. Monads not only accomplish, with pure functions, what normally is done with side effects in imperative programming, but they also do it with a high degree of control and type safety. They are not without drawbacks, though. The major complaint about monads is that they don’t easily compose with each other. Granted, you can combine most of the basic monads using the monad transformer library. It’s relatively easy to create a monad stack that combines, say, state with exceptions, but there is no formula for stacking arbitrary monads together.

Next: Monads Categorically.


This is part 20 of Categories for Programmers. Previously: Free/Forgetful Adjunctions. See the Table of Contents.

Programmers have developed a whole mythology around monads. It’s supposed to be one of the most abstract and difficult concepts in programming. There are people who “get it” and those who don’t. For many, the moment when they understand the concept of the monad is like a mystical experience. The monad abstracts the essence of so many diverse constructions that we simply don’t have a good analogy for it in everyday life. We are reduced to groping in the dark, like those blind men touching different parts of the elephant and exclaiming triumphantly: “It’s a rope,” “It’s a tree trunk,” or “It’s a burrito!”

Let me set the record straight: The whole mysticism around the monad is the result of a misunderstanding. The monad is a very simple concept. It’s the diversity of applications of the monad that causes the confusion.

As part of research for this post I looked up duct tape (a.k.a., duck tape) and its applications. Here’s a little sample of things that you can do with it:

  • sealing ducts
  • fixing CO2 scrubbers on board Apollo 13
  • wart treatment
  • fixing Apple’s iPhone 4 dropped call issue
  • making a prom dress
  • building a suspension bridge

Now imagine that you didn’t know what duct tape was and you were trying to figure it out based on this list. Good luck!

So I’d like to add one more item to the collection of “the monad is like…” clichés: The monad is like duct tape. Its applications are widely diverse, but its principle is very simple: it glues things together. More precisely, it composes things.

This partially explains the difficulties a lot of programmers, especially those coming from an imperative background, have with understanding the monad. The problem is that we are not used to thinking of programming in terms of function composition. This is understandable. We often give names to intermediate values rather than pass them directly from function to function. We also inline short segments of glue code rather than abstract them into helper functions. Here’s an imperative-style implementation of the vector-length function in C:

double vlen(double * v) {
  double d = 0.0;
  int n;
  for (n = 0; n < 3; ++n)
    d += v[n] * v[n];
  return sqrt(d);
}

Compare this with the (stylized) Haskell version that makes function composition explicit:

vlen = sqrt . sum . fmap (flip (^) 2)

(Here, to make things even more cryptic, I partially applied the exponentiation operator (^) by setting its second argument to 2.)

I’m not arguing that Haskell’s point-free style is always better, just that function composition is at the bottom of everything we do in programming. And even though we are effectively composing functions, Haskell does go to great lengths to provide imperative-style syntax called the do notation for monadic composition. We’ll see its use later. But first, let me explain why we need monadic composition in the first place.

The Kleisli Category

We have previously arrived at the writer monad by embellishing regular functions. The particular embellishment was done by pairing their return values with strings or, more generally, with elements of a monoid. We can now recognize that such embellishment is a functor:

newtype Writer w a = Writer (a, w)

instance Functor (Writer w) where
  fmap f (Writer (a, w)) = Writer (f a, w)

We have subsequently found a way of composing embellished functions, or Kleisli arrows, which are functions of the form:

a -> Writer w b

It was inside the composition that we implemented the accumulation of the log.

We are now ready for a more general definition of the Kleisli category. We start with a category C and an endofunctor m. The corresponding Kleisli category K has the same objects as C, but its morphisms are different. A morphism between two objects a and b in K is implemented as a morphism:

a -> m b

in the original category C. It’s important to keep in mind that we treat a Kleisli arrow in K as a morphism between a and b, and not between a and m b.

In our example, m was specialized to Writer w, for some fixed monoid w.

Kleisli arrows form a category only if we can define proper composition for them. If there is a composition, which is associative and has an identity arrow for every object, then the functor m is called a monad, and the resulting category is called the Kleisli category.

In Haskell, Kleisli composition is defined using the fish operator >=>, and the identity arrow is a polymorphic function called return. Here’s the definition of a monad using Kleisli composition:

class Monad m where
  (>=>) :: (a -> m b) -> (b -> m c) -> (a -> m c)
  return :: a -> m a

Keep in mind that there are many equivalent ways of defining a monad, and that this is not the primary one in the Haskell ecosystem. I like it for its conceptual simplicity and the intuition it provides, but there are other definitions that are more convenient when programming. We’ll talk about them momentarily.

In this formulation, monad laws are very easy to express. They cannot be enforced in Haskell, but they can be used for equational reasoning. They are simply the standard composition laws for the Kleisli category:

(f >=> g) >=> h = f >=> (g >=> h) -- associativity
return >=> f = f                  -- left unit
f >=> return = f                  -- right unit

This kind of a definition also expresses what a monad really is: it’s a way of composing embellished functions. It’s not about side effects or state. It’s about composition. As we’ll see later, embellished functions may be used to express a variety of effects or state, but that’s not what the monad is for. The monad is the sticky duct tape that ties one end of an embellished function to the other end of an embellished function.

Going back to our Writer example: The logging functions (the Kleisli arrows for the Writer functor) form a category because Writer is a monad:

instance Monoid w => Monad (Writer w) where
    f >=> g = \a -> 
        let Writer (b, s)  = f a
            Writer (c, s') = g b
        in Writer (c, s `mappend` s')
    return a = Writer (a, mempty)

Monad laws for Writer w are satisfied as long as monoid laws for w are satisfied (they can’t be enforced in Haskell either).
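
For instance, here’s a quick equational check of the left unit law for Writer, using the definitions above (a sketch):

--   (return >=> f) a
-- = let Writer (b, s)  = return a   -- so b = a, s = mempty
--       Writer (c, s') = f a
--   in  Writer (c, mempty `mappend` s')
-- = Writer (c, s')                  -- left identity of the monoid
-- = f a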

There’s a useful Kleisli arrow defined for the Writer monad called tell. Its sole purpose is to add its argument to the log:

tell :: w -> Writer w ()
tell s = Writer ((), s)

We’ll use it later as a building block for other monadic functions.

Fish Anatomy

When implementing the fish operator for different monads you quickly realize that a lot of code is repeated and can be easily factored out. To begin with, the Kleisli composition of two functions must return a function, so its implementation may as well start with a lambda taking an argument of type a:

(>=>) :: (a -> m b) -> (b -> m c) -> (a -> m c)
f >=> g = \a -> ...

The only thing we can do with this argument is to pass it to f:

f >=> g = \a -> let mb = f a
                in ...

At this point we have to produce the result of type m c, having at our disposal an object of type m b and a function g :: b -> m c. Let’s define a function that does that for us. This function is called bind and is usually written in the form of an infix operator:

(>>=) :: m a -> (a -> m b) -> m b

For every monad, instead of defining the fish operator, we may instead define bind. In fact the standard Haskell definition of a monad uses bind:

class Monad m where
    (>>=) :: m a -> (a -> m b) -> m b
    return :: a -> m a

Here’s the definition of bind for the Writer monad:

(Writer (a, w)) >>= f = let Writer (b, w') = f a
                        in  Writer (b, w `mappend` w')

It is indeed shorter than the definition of the fish operator.

It’s possible to further dissect bind, taking advantage of the fact that m is a functor. We can use fmap to apply the function a -> m b to the contents of m a. This will turn each a into an m b. The result of the application is therefore of type m (m b). This is not exactly what we want — we need a result of type m b — but we’re close. All we need is a function that collapses or flattens the double application of m. Such a function is called join:

join :: m (m a) -> m a

Using join, we can rewrite bind as:

ma >>= f = join (fmap f ma)

That leads us to the third option for defining a monad:

class Functor m => Monad m where
    join :: m (m a) -> m a
    return :: a -> m a

Here we have explicitly requested that m be a Functor. We didn’t have to do that in the previous two definitions of the monad. That’s because any type constructor m that either supports the fish or bind operator is automatically a functor. For instance, it’s possible to define fmap in terms of bind and return:

fmap f ma = ma >>= \a -> return (f a)
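
As a quick sanity check (a sketch), for the list monad this formula reproduces the ordinary map:

-- fmap f ma = ma >>= \a -> return (f a)
--           = concat (map (\a -> [f a]) ma)   -- the list monad’s bind and return
--           = map f ma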

For completeness, here’s join for the Writer monad:

join :: Monoid w => Writer w (Writer w a) -> Writer w a
join (Writer ((Writer (a, w')), w)) = Writer (a, w `mappend` w')

The do Notation

One way of writing code using monads is to work with Kleisli arrows — composing them using the fish operator. This mode of programming is the generalization of the point-free style. Point-free code is compact and often quite elegant. In general, though, it can be hard to understand, bordering on cryptic. That’s why most programmers prefer to give names to function arguments and intermediate values.

When dealing with monads it means favoring the bind operator over the fish operator. Bind takes a monadic value and returns a monadic value. The programmer may choose to give names to those values. But that’s hardly an improvement. What we really want is to pretend that we are dealing with regular values, not the monadic containers that encapsulate them. That’s how imperative code works — side effects, such as updating a global log, are mostly hidden from view. And that’s what the do notation emulates in Haskell.

You might be wondering then, why use monads at all? If we want to make side effects invisible, why not stick to an imperative language? The answer is that the monad gives us much better control over side effects. For instance, the log in the Writer monad is passed from function to function and is never exposed globally. There is no possibility of garbling the log or creating a data race. Also, monadic code is clearly demarcated and cordoned off from the rest of the program.

The do notation is just syntactic sugar for monadic composition. On the surface, it looks a lot like imperative code, but it translates directly to a sequence of binds and lambda expressions.

For instance, take the example we used previously to illustrate the composition of Kleisli arrows in the Writer monad. Using our current definitions, it could be rewritten as:

process :: String -> Writer String [String]
process = upCase >=> toWords

This function turns all characters in the input string to upper case and splits it into words, all the while producing a log of its actions.

In the do notation it would look like this:

process s = do
    upStr <- upCase s
    toWords upStr

Here, upStr is just a String, even though upCase produces a Writer:

upCase :: String -> Writer String String
upCase s = Writer (map toUpper s, "upCase ")

This is because the do block is desugared by the compiler to:

process s = 
   upCase s >>= \ upStr ->
       toWords upStr

The monadic result of upCase is bound to a lambda that takes a String. It’s the name of this string that shows up in the do block. When reading the line:

upStr <- upCase s

we say that upStr gets the result of upCase s.

The pseudo-imperative style is even more pronounced when we inline toWords. We replace it with the call to tell, which logs the string "toWords ", followed by the call to return with the result of splitting the string upStr using words. Notice that words is a regular function working on strings.

process s = do
    upStr <- upCase s
    tell "toWords "
    return (words upStr)

Here, each line in the do block introduces a new nested bind in the desugared code:

process s = 
    upCase s >>= \upStr ->
      tell "toWords " >>= \() ->
        return (words upStr)

Notice that tell produces a unit value, so it doesn’t have to be passed to the following lambda. Ignoring the contents of a monadic result (but not its effect — here, the contribution to the log) is quite common, so there is a special operator to replace bind in that case:

(>>) :: m a -> m b -> m b
m >> k = m >>= (\_ -> k)

The actual desugaring of our code looks like this:

process s = 
    upCase s >>= \upStr ->
      tell "toWords " >>
        return (words upStr)

In general, do blocks consist of lines (or sub-blocks) that either use the left arrow to introduce new names that are then available in the rest of the code, or are executed purely for side-effects. Bind operators are implicit between the lines of code. Incidentally, it is possible, in Haskell, to replace the formatting in the do blocks with braces and semicolons. This provides the justification for describing the monad as a way of overloading the semicolon.
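
For instance, the process function above could equivalently be written as (a sketch):

process s = do { upStr <- upCase s; tell "toWords "; return (words upStr) }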

Notice that the nesting of lambdas and bind operators when desugaring the do notation has the effect of influencing the execution of the rest of the do block based on the result of each line. This property can be used to introduce complex control structures, for instance to simulate exceptions.

Interestingly, the equivalent of the do notation has found its application in imperative languages, C++ in particular. I’m talking about resumable functions or coroutines. It’s not a secret that C++ futures form a monad. It’s an example of the continuation monad, which we’ll discuss shortly. The problem with continuations is that they are very hard to compose. In Haskell, we use the do notation to turn the spaghetti of “my handler will call your handler” into something that looks very much like sequential code. Resumable functions make the same transformation possible in C++. And the same mechanism can be applied to turn the spaghetti of nested loops into list comprehensions or “generators,” which are essentially the do notation for the list monad. Without the unifying abstraction of the monad, each of these problems is typically addressed by providing custom extensions to the language. In Haskell, this is all dealt with through libraries.

Next: Monads and Effects.


This is part 4 of the miniseries about solving a simple constraint-satisfaction problem:

  s e n d
+ m o r e
---------
m o n e y

using monads in C++. Previously: The Tale of Two Monads. To start from the beginning, go to Using Monads in C++ to Solve Constraints: 1.

In the previous post I showed some very general programming techniques based on functional data structures and monads. A lot of the code I wrote to solve this particular arithmetic puzzle is easily reusable. In fact the monadic approach is perfect for constructing libraries. I talked about it when discussing C++ futures and ranges. A monad is a pure expression of composability, and composability is the most important feature of any library.

You would be justified in thinking that rolling out a monad library just to solve a simple arithmetic puzzle is overkill. But I’m sure you can imagine a lot of other, more complex problems that could be solved using the same techniques. You might also fear that such functional methods — especially when implemented in C++, which is not optimized for this kind of programming — would lead to less performant code. This doesn’t matter too much if you are solving a one-shot problem, and the time you spend developing and debugging your code dwarfs the time it takes the program to complete execution. But even if performance were an issue and you were faced with a larger problem, functional code can be parallelized much more easily than its imperative counterpart, with no danger of concurrency bugs. So you might actually get better performance from functional code by running it in parallel.

Refactoring

In this installment I’d like to talk about something that a lot of functional programmers swear by: Functional programs are amazingly easy to refactor. Anybody who has tried to refactor C++ code can testify to how hard it is, and how long it takes to iron out all the bugs introduced by refactoring. With functional code, it’s a breeze. So let’s have another look at the code from the previous post and do some deep surgery on it. This is our starting point:

StateList<tuple<int, int, int>> solve()
{
    StateList<int> sel = &select<int>;

    return mbind(sel, [=](int s) {
    return mbind(sel, [=](int e) {
    return mbind(sel, [=](int n) {
    return mbind(sel, [=](int d) {
    return mbind(sel, [=](int m) {
    return mbind(sel, [=](int o) {
    return mbind(sel, [=](int r) {
    return mbind(sel, [=](int y) {
        return mthen(guard(s != 0 && m != 0), [=]() {
            int send  = asNumber(vector<int>{s, e, n, d});
            int more  = asNumber(vector<int>{m, o, r, e});
            int money = asNumber(vector<int>{m, o, n, e, y});
            return mthen(guard(send + more == money), [=]() {
                return mreturn(make_tuple(send, more, money));
            });
        });
    });});});});});});});});
}

What immediately stands out is the amount of repetition: eight lines of code look almost exactly the same. Conceptually, these lines correspond to eight nested loops that would be used in the imperative solution. The question is, how can we abstract over control structures, such as loops? But in our monadic implementation the loops are replaced with higher-order functions, and that opens up a path toward even further abstraction.

Reifying Substitutions

The first observation is that we are missing a data structure that should correspond to the central concept we use when describing the solution — the substitution. We are substituting numbers for letters, but we only see those letters as names of variables. A more natural implementation would use a map data structure, mapping characters to integers.

Notice, however, that this mapping has to be dynamically collected and torn down as we are exploring successive branches of the solution space. The imperative approach would demand backtracking. The functional approach makes use of persistent data structures. I have described such data structures previously, in particular a persistent red-black tree. It can be easily extended to a red-black map. You can find the implementation on github.

A red-black map is a generic data structure, which we’ll specialize for our use:

using Map = RBMap<char, int>;

A map lookup may fail, so it should either be implemented to return an optional, or use a default value in case of a failure. In our case we know that we never look up a value that’s not in the map (unless our code is buggy), so we can use a helper function that never fails:

int get(Map const & subst, char c)
{
    return subst.findWithDefault(-1, c);
}

Once we have objectified the substitution as a map, we can convert a whole string of characters to a number in one operation:

int toNumber(Map const & subst, string const & str)
{
    int acc = 0;
    for (char c : str)
    {
        int d = get(subst, c);
        acc = 10 * acc + d;
    }
    return acc;
}

There is one more thing that we may automate: finding the set of unique characters in the puzzle. Previously, we just figured out that they were s, e, n, d, m, o, r, y and hard-coded them into the solution. Now we can use a function, fashioned after a Haskell utility called nub:

string nub(string const & str)
{
    string result;
    for (char c : str)
    {
        if (result.find(c) == string::npos)
            result.push_back(c);
    }
    return result;
}

Don’t worry about the quadratic running time of nub — it’s only called once.

Recursion

With those preliminaries out of the way, let’s concentrate on analyzing the code. We have a series of nested mbind calls. The simplest thing is to turn these nested calls into recursion. This involves setting up the first recursive call, implementing the recursive function, and writing the code to terminate the recursion.

The main observation is that each level of nesting accomplishes one substitution of a character by a number. The recursion should end when we run out of characters to substitute.

Let’s first think about what data has to be passed down in a recursive call. One item is the string of characters that still need substituting. The second is the substitution map. While the first argument keeps shrinking, the second one keeps growing. The third argument is the number to be used for the next substitution.

Before we make the first recursive call, we have to prepare the initial arguments. We get the string of characters to be substituted by running nub over the text of our puzzle:

nub("sendmoremoney")

The second argument should start as an empty map. The third argument, the digit to be used for the next substitution, comes from binding the selection plan select<int> to a lambda that will call our recursive function. In Haskell, it’s common to call the auxiliary recursive function go, so that’s what we’ll do here as well:

StateList<tuple<int, int, int>> solve()
{
    StateList<int> sel = &select<int>;
    Map subst;
    return mbind(sel, [=](int s) {
        return go(nub("sendmoremoney"), subst, s);
    });
}

When implementing go, we have to think about terminating the recursion. But first, let’s talk about how to continue it.

We are called with the next digit to be used for substitution, so we should perform this substitution. This means inserting a key/value pair into our substitution map subst. The key is the first character from the string str. The value is the digit i that we were given.

subst.inserted(str[0], i)

Notice that, because the map is immutable, the method inserted returns a new map with one more entry. This is a persistent data structure, so the new map shares most of its data with its previous version. The creation of the new version takes logarithmic time in the number of entries (just as it does with a regular balanced tree that’s used in the Standard Library std::map). The advantage of using a persistent map is that we don’t have to do any backtracking — the unmodified version is still available after the return from the recursive call.

The recursive call to go expects three arguments: (1) the string of characters yet to be substituted — that will be the tail of the string that we were passed, (2) the new substitution map, and (3) the new digit n. We get this new digit by binding the digit selection sel to a lambda that makes the recursive call:

mbind(sel, [=](int n) {
    string tail(&str[1], &str[str.length()]);
    return go(tail, subst.inserted(str[0], i), n);
});

Now let’s consider the termination condition. Since go was called with the next digit to be substituted, we need at least one character in the string for this last substitution. So the termination is reached when the length of the string reaches one. We have to make the last substitution and then call the final function that prunes the bad substitutions. Here’s the complete implementation of go:

StateList<tuple<int, int, int>> go(string str, Map subst, int i)
{
    StateList<int> sel = &select<int>;

    assert(str.length() > 0);
    if (str.length() == 1)
        return prune(subst.inserted(str[0], i));
    else
    {
        return mbind(sel, [=](int n) {
            string tail(&str[1], &str[str.length()]);
            return go(tail, subst.inserted(str[0], i), n);
        });
    }
}

The function prune is lifted almost literally from the original implementation. The only difference is that we are now using a substitution map.

StateList<tuple<int, int, int>> prune(Map subst)
{
    return mthen(guard(get(subst, 's') != 0 && get(subst, 'm') != 0), 
      [=]() {
        int send  = toNumber(subst, "send");
        int more  = toNumber(subst, "more");
        int money = toNumber(subst, "money");
        return mthen(guard(send + more == money), [=]() {
            return mreturn(make_tuple(send, more, money));
        });
    });
}

The top-level code that starts the chain of events and displays the solution is left unchanged:

int main()
{
    List<int> lst{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    cout << evalStateList(solve(), lst);
    return 0;
}

It’s a matter of opinion whether the refactored code is simpler or not. It’s definitely easier to generalize and it’s less error prone. But the main point is how easy it was to make this refactoring. The reason is that, in functional code, there are no hidden dependencies, because there are no hidden side effects. What would usually be considered a side effect in imperative code is accomplished using pure functions and monadic binding. There is no way to forget that a given function works with state, because a function that deals with state has the state specified in its signature — it returns a StateList. And it doesn’t itself modify the state. In fact it doesn’t have access to the state. It just composes functions that will modify the state when executed. That’s why we were able to move these functions around so easily.

A Word About Syntax

Monadic code in C++ is artificially complex. That’s because C++ was not designed for functional programming. The same program written in Haskell is much shorter. Here it is, complete with the helper functions prune, get, and toNumber:

solve = StateL select >>= go (nub "sendmoremoney") M.empty
  where
    go [c] subst i = prune (M.insert c i subst)
    go (c:cs) subst i = StateL select >>= go cs (M.insert c i subst)
    prune subst = do
        guard (get 's' /= 0 && get 'm' /= 0)
        let send  = toNumber "send"
            more  = toNumber "more"
            money = toNumber "money"
        guard (send + more == money)
        return (send, more, money)
      where
        get c = fromJust (M.lookup c subst)
        toNumber str = asNumber (map get str)
        asNumber = foldl (\t u -> t*10 + u) 0
main = print $ evalStateL solve [0..9]

Some of the simplification comes from using the operator >>= in place of mbind, and a simpler syntax for lambdas. But there is also some built-in syntactic sugar for chains of monadic bindings in the form of the do notation. You see an example of it in the implementation of prune.

The funny thing is that C++ is on the verge of introducing something equivalent to do notation called resumable functions. There is a very strong proposal for resumable functions in connection with the future monad, dealing with asynchronous functions. There is talk of using it with generators, or lazy ranges, which are a version of the list monad. But there is still a need for a push towards generalizing resumable functions to user-defined monads, like the one I described in this series. It just doesn’t make sense to add separate language extensions for each aspect of the same general pattern. This pattern has been known in functional programming for a long time and it’s called a monad. It’s a very useful and versatile pattern whose importance has been steadily growing as the complexity of software projects increases.

Bibliography

  1. The code for this blog is available on github.
  2. Douglas M. Auclair, MonadPlus: What a Super Monad!, The Monad.Reader Issue 11, August 25, 2008. It contains the Haskell implementation of the solution to this puzzle.
  3. Justin Le, Unique sample drawing & searches with List and StateT — “Send more money”. It was the blog post that inspired this series.

This is the second part of the cycle “Using Monads in C++ to Solve Constraints” in which I’m illustrating functional techniques using the example of a simple puzzle:

  s e n d
+ m o r e
---------
m o n e y

Previously, I talked about using the list monad to search the breadth of the solution space.

What would you do if you won a lottery? Would you buy a sports car, drop your current job, and go on a trip across the United States? Maybe you would start your own company, multiply the money, and then buy a private jet?

We all like making plans, but they are often contingent on the state of our finances. Such plans can be described by functions. For instance, the car-buying plan is a function:

pair<Car, Cash> buyCar(Cash cashIn)

The input is some amount of Cash, and the output is a brand new Car and the leftover Cash (not necessarily a positive number!).

In general, a financial plan is a function that takes cash and returns the result paired with the new value of cash. It can be described generically using a template:

template<class A>
using Plan = function<pair<A, Cash>(Cash)>;

You can combine smaller plans to make bigger plans. For instance, you may use the leftover cash from your car purchase to finance your trip, or invest in a business, and so on.

There are some things that you already own, and you can trivially include them in your plans:

template<class A>
Plan<A> got_it(A a)
{
    return [a](Cash s) { return make_pair(a, s); };
}

What does all this daydreaming have to do with the solution to our puzzle? I mentioned previously that we needed to keep track of state, and this is how functional programmers deal with state. Instead of relying on side effects to silently modify the state, they write code that generates plans of action.

An imperative programmer, on the other hand, might implement the car-buying procedure by passing it a bank object, and the withdrawal of money would be a side effect of the purchase. Or, the horror!, the bank object could be global.

In functional programming, each individual plan is a function: The state comes in, and the new state goes out, paired with whatever value the function was supposed to produce in the first place. These little plans are aggregated into bigger plans. Finally, the master plan is executed — that’s when the actual state is passed in and the result comes out together with a new state. We can do the same in modern C++ using lambdas.

You might be familiar with a similar technique used with expression templates in C++. Expression templates form the basis of efficient implementation of matrix calculus, where expressions involving matrices and vectors are not evaluated on the spot but rather converted to parse trees. These trees may then be evaluated using more efficient techniques. You can think of an expression template as a plan for evaluating the final result.

The State Monad


To find the solution to our puzzle, we will generate substitutions by picking digits from a list of integers. This list of integers will be our state.

using State = List<int>;

We will be using a persistent list, so we won’t have to worry about backtracking. Persistent lists are never mutated — all their versions persist, and we can go back to them without fearing that they may have changed. We’ll need that when we combine our state calculations with the list monad to get the final solution. For now, we’ll just consider one substitution.

We’ll be making and combining all kinds of plans that rely on, and produce new state:

template<class A>
using Plan = function<pair<A, State>(State)>;

We can always run a plan, provided we have the state available:

template<class A>
pair<A, State> runPlan(Plan<A> pl, State s)
{
    return pl(s);
}

You may recall from the previous post that the essence of every monad is the ability to compose smaller items into bigger ones. In the case of the state monad we need the ability to compose plans of action.

For instance, suppose that you know how to generate a plan to travel across the United States on a budget, as long as you have a car. You don’t have a car though. Not to worry, you have a plan to get a car. It should be easy to generate a composite plan that involves buying the car and then continuing with your trip.

Notice the two ingredients: one is the plan to buy a car: Plan<Car>. The second ingredient is a function that takes a car and produces the plan for your trip, Plan<Trip>. This function is the continuation that leads to your final goal: it’s a function of the type Plan<Trip>(Car). And the continuation itself may in turn be a composite of many smaller plans.

So here’s the function mbind that binds a plan pl to a continuation k. The continuation uses the output of the plan pl to generate another plan. The function mbind is supposed to return a new composite plan, so it must return a lambda. Like any other plan, this lambda takes a state and returns a pair: value, state. We’ll implement this lambda for the most general case.

The logic is simple. Inside the lambda we will have the state available, so we can run the plan pl. We get back a pair: the value of type A and the new state. We pass this value to the continuation k and get back a new plan. Finally, we run that plan with the new state and we’re done.

template<class A, class F>
auto mbind(Plan<A> pl, F k) -> decltype(k(pl(State()).first))
{
    using B = decltype(k(pl(State()).first)(State()).first);
    // this should really be done with concepts
    static_assert(std::is_convertible<
        F, std::function<Plan<B>(A)>> ::value,
        "mbind requires a function type Plan<B>(A)");

    return [pl, k](State s) {
        pair<A, State> ps = runPlan(pl, s);
        Plan<B> plB = k(ps.first);
        return runPlan(plB, ps.second); // new state!
    };
}

Notice that all this running of plans inside mbind doesn’t happen immediately. It happens when the lambda is executed, which is when the larger plan is run (possibly as part of an even larger plan). So what mbind does is to create a new plan to be executed in the future.

And, as with every monad, there is a function that takes regular input and turns it into a trivial plan. I called it got_it before, but a more common name would be mreturn:

template<class A>
Plan<A> mreturn(A a)
{
    return [a](State s) { return make_pair(a, s); };
}

The state monad usually comes with two helper functions. The function getState gives direct access to the state by turning it into the return value:

Plan<State> getState()
{
    return [](State s) { return make_pair(s, s); };
}

Using getState you may examine the state when the plan is running, and dynamically select different branches of your plan. This makes monads very flexible, but it also makes the composition of different monads harder. We’ll see this in the next installment, when we compose the state monad with the list monad.

The second helper function is used to modify (completely replace) the state.

Plan<void*> putState(State newState)
{
    return [=](State s) { return make_pair(nullptr, newState); };
}

It doesn’t evaluate anything useful, so its return type is void* and the returned value is nullptr. Its only purpose is to encapsulate a side effect. What’s amazing is that you can do it and still preserve functional purity. There’s no better example of having your cake and eating it too.

Example

To demonstrate the state monad in action, let’s implement a tiny example. We start with a small plan that just picks the first number from a list (the list will be our state):

pair<int, List<int>> select(List<int> lst)
{
    int i = lst.front();
    return make_pair(i, lst.popped_front());
}

The persistent list method popped_front returns a list with its first element popped off. As is typical of persistent data structures, this method doesn’t modify the original list. It doesn’t copy the list either — it just creates a pointer to its tail.
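
If you don’t have that post handy, here’s a minimal sketch of the list interface assumed in this example. It’s a stripped-down stand-in (it needs <memory> and <initializer_list>), not the full implementation:

template<class T>
class List
{
    struct Node
    {
        Node(T v, std::shared_ptr<const Node> tail) : _v(v), _tail(tail) {}
        T _v;
        std::shared_ptr<const Node> _tail;
    };
    std::shared_ptr<const Node> _head;
    explicit List(std::shared_ptr<const Node> head) : _head(head) {}
public:
    List() {} // empty list
    List(T v, List tail) // cons: put a new head in front of an existing list
        : _head(std::make_shared<const Node>(v, tail._head)) {}
    List(std::initializer_list<T> init)
    {
        for (auto it = init.end(); it != init.begin(); )
            _head = std::make_shared<const Node>(*--it, _head);
    }
    bool isEmpty() const { return !_head; }
    T front() const { return _head->_v; }
    List popped_front() const { return List(_head->_tail); } // shares the tail
};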

Here’s our first plan:

Plan<int> sel = &select;

Now we can create a more complex plan to produce a pair of integers:

Plan<pair<int, int>> pl = 
    mbind(sel, [=](int i) { return
    mbind(sel, [=](int j) { return
    mreturn(make_pair(i, j));
}); });

Let’s analyze this code. The first mbind takes a plan sel that selects the first element from a list (the list will be provided later, when we execute this plan). It binds it to a continuation that takes the selected integer i and produces a plan to make a pair of integers. Here’s this continuation, shown on its own:

[=](int i) { return
    mbind(sel, [=](int j) { return
    mreturn(make_pair(i, j));
}); }

It binds the plan sel to another, smaller continuation that takes the selected element j and produces a plan to make a pair of integers. Here’s this smaller continuation:

[=](int j) { return
    mreturn(make_pair(i, j));
}

It combines the first integer i that was captured by the lambda, with the second integer j that is the argument to the lambda, and creates a trivial plan that returns a pair:

mreturn(make_pair(i, j));

Notice that we are using the same plan sel twice. But when this plan is executed inside of our final plan, it will return two different elements from the input list. When mbind is run, it first passes the state (the list of integers) to the first sel. It gets back the modified state — the list with the first element popped. It then uses this shorter list to execute the plan produced by the continuation. So the second sel gets the shortened list, and picks its first element, which is the second element of the original list. It, too, shortens the list, which is then passed to mreturn, which doesn’t modify it any more.

We can now run the final plan by giving it a list to pick from:

List<int> st{ 1, 2, 3 };
cout << runPlan(pl, st) << endl;
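
In case you’re wondering about runPlan itself: since a Plan is just a function from State to a value/state pair, running a plan amounts to applying it. Here’s a sketch consistent with how it’s used in this post (the cout line above also assumes some operator<< overload for pairs):

template<class A>
pair<A, State> runPlan(Plan<A> pl, State s)
{
    return pl(s); // execute the plan with the initial state
}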

We are still not ready to solve the puzzle, but we are much closer. All we need is to combine the list monad with the state monad. I’m leaving this task for the next installment.

But here’s another look at the final solution:

StateL<tuple<int, int, int>> solve()
{
    StateL<int> sel = &select<int>;

    return mbind(sel, [=](int s) {
    return mbind(sel, [=](int e) {
    return mbind(sel, [=](int n) {
    return mbind(sel, [=](int d) {
    return mbind(sel, [=](int m) {
    return mbind(sel, [=](int o) {
    return mbind(sel, [=](int r) {
    return mbind(sel, [=](int y) {
        return mthen(guard(s != 0 && m != 0), [=]() {
            int send  = asNumber(vector<int>{s, e, n, d});
            int more  = asNumber(vector<int>{m, o, r, e});
            int money = asNumber(vector<int>{m, o, n, e, y});
            return mthen(guard(send + more == money), [=]() {
                return mreturn(make_tuple(send, more, money));
            });
        });
    });});});});});});});});
}

This time I didn’t rename mbind to for_each and mreturn to yield.

Now that you’re familiar with the state monad, you may appreciate the fact that the same sel passed as the argument to mbind will yield a different number every time, as long as the state list contains unique digits.

Before You Post a Comment

I know what you’re thinking: Why would I want to complicate my life with monads if there is a much simpler, imperative way of dealing with state? What’s the payoff of the functional approach? The immediate payoff is thread safety. In imperative programming, mutable shared state is a never-ending source of concurrency bugs. The state monad and the use of persistent data structures eliminate the possibility of data races. And they do it without any need for synchronization (other than reference counting inside shared pointers).

I will freely admit that C++ is not the best language for functional programming, and that monadic code in C++ looks overly complicated. But, let’s face it, in C++ everything looks overly complicated. What I’m describing here are implementation details of something that should be encapsulated in an easy to use library. I’ll come back to it in the next installment of this mini-series.

Challenges

  1. Implement select from the example in the text using getState and putState.
  2. Implement evalPlan, a version of runPlan that only returns the final value, without the state.
  3. Implement mthen — a version of mbind where the continuation doesn’t take any arguments. It ignores the result of the Plan, which is the first argument to mthen (but it still runs it, and uses the modified state).
  4. Use the state monad to build a simple RPN calculator. The state is the stack (a List) of items:
    enum ItemType { Plus, Minus, Num };
    
    struct Item
    {
        ItemType _type;
        int _num;
        Item(int i) : _type(Num), _num(i) {}
        Item(ItemType t) : _type(t), _num(-1) {}
    };

    Implement the function calc() that creates a plan to evaluate an integer. Here’s the test code that should print -1:

    List<Item> stack{ Item(Plus)
                   , Item(Minus)
                   , Item(4)
                   , Item(8)
                   , Item(3) };
    cout << evalPlan(calc(), stack) << endl;

    (The solution is available on github.)


I am sometimes asked by C++ programmers to give an example of a problem that can’t be solved without monads. This is the wrong kind of question — it’s like asking if there is a problem that can’t be solved without for loops. Obviously, if your language supports a goto, you can live without for loops. What monads (and for loops) can do for you is to help you structure your code. The use of loops and if statements lets you convert spaghetti code into structured code. Similarly, the use of monads lets you convert imperative code into declarative code. These are the kind of transformations that make code easier to write, understand, maintain, and generalize.

So here’s a problem that you may get as an interview question. It’s a small problem, so the advantages of various approaches might not be immediately obvious, especially if you’ve been trained all your life in imperative programming, and you are seeing monads for the first time.

You’re supposed to write a program to solve this puzzle:

  s e n d
+ m o r e
---------
m o n e y

Each letter corresponds to a different digit between 0 and 9. Before you continue reading this post, try to think about how you would approach this problem.

The Analysis

It never hurts to impress your interviewer with your general knowledge by correctly classifying the problem. This one belongs to the class of “constraint satisfaction problems.” The obvious constraint is that the numbers obtained by substituting letters with digits have to add up correctly. There are also some less obvious constraints, namely that the numbers should not start with zero.

If you were to solve this problem using pencil and paper, you would probably come up with lots of heuristics. For instance, you would deduce that m must stand for 1 because that’s the largest possible carry from the addition of two digits (even if there is a carry from the previous column). Then you’d figure out that s must be either 8 or 9 to produce this carry, and so on. Given enough time, you could probably write an expert system with a large set of rules that could solve this and similar problems. (Mentioning an expert system could earn you extra points with the interviewer.)

However, the small size of the problem suggests that a simple brute force approach is probably best. The interviewer might ask you to estimate the number of possible substitutions, which is 10!/(10 – 8)! = 10·9·8·7·6·5·4·3 = 1,814,400, or roughly 2 million. That’s not a lot. So, really, the solution boils down to generating all those substitutions and testing the constraints for each.

The Straightforward Solution

The mind of an imperative programmer immediately sees the solution as a set of 8 nested loops (there are 8 unique letters in the problem: s, e, n, d, m, o, r, y). Something like this:

for (int s = 0; s < 10; ++s)
    for (int e = 0; e < 10; ++e)
        for (int n = 0; n < 10; ++n)
            for (int d = 0; d < 10; ++d)
                ...

and so on, until y. But then there is the condition that the digits have to be different, so you have to insert a bunch of tests like:

e != s
n != s && n != e
d != s && d != e && d != n

and so on, the last one involving 7 inequalities… Effectively you have replaced the uniqueness condition with 1 + 2 + … + 7 = 28 new constraints, one for each pair of letters.

This would probably get you through the interview at Microsoft, Google, or Facebook, but really, can’t you do better than that?

The Smart Solution

Before I proceed, I should mention that what follows is almost a direct translation of a Haskell program from the blog post by Justin Le. I strongly encourage everybody to learn some Haskell, but in the meanwhile I’ll be happy to serve as your translator.

The problem with our naive solution is the 28 additional constraints. Well, I guess one could live with that — except that this is just a tiny example of a whole range of constraint satisfaction problems, and it makes sense to figure out a more general approach.

The problem can actually be formulated as a superposition of two separate concerns. One deals with the depth and the other with the breadth of the search for solutions.

Let me touch on the depth issue first. Let’s consider the problem of creating just one substitution of letters with numbers. This could be described as picking 8 digits from a list of 0, 1, …9, one at a time. Once a digit is picked, it’s no longer in the list. We don’t want to hard code the list, so we’ll make it a parameter to our algorithm. Notice that this approach works even if the list contains duplicates, or if the list elements are not easily comparable for equality (for instance, if they are futures). We’ll discuss the list-picking part of the problem in more detail later.

Now let’s talk about breadth: we have to repeat the above process for all possible picks. This is what the 8 nested loops were doing. Except that now we are in trouble because each individual pick is destructive. It removes items from the list — it mutates the list. This is a well known problem when searching through solution spaces, and the standard remedy is called backtracking. Once you have processed a particular candidate, you put the elements back in the list, and try the next one. Which means that you have to keep track of your state, either implicitly on your function’s stack, or in a separate explicit data structure.

Wait a moment! Weren’t we supposed to talk about functional programming? So what’s all this talk about mutation and state? Well, who said you can’t have state in functional programming? Functional programmers have been using the state monad since time immemorial. And mutation is not an issue if you’re using persistent data structures. So fasten your seat belts and make sure your folding trays are in the upright position.

The List Monad

We’ll start with a small refresher in quantum mechanics. As you may remember from school, quantum processes are non-deterministic. You may repeat the same experiment many times and every time get a different result. There is a very interesting view of quantum mechanics called the many-worlds interpretation, in which every experiment gives rise to multiple alternate histories. So if the spin of an electron may be measured as either up or down, there will be one universe in which it’s up, and one in which it’s down.

Selections

We’ll use the same approach to solving our puzzle. We’ll create an alternate universe for each digit substitution for a given letter. So we’ll start with 10 universes for the letter s; then we’ll split each of them into ten universes for the letter e, and so on. Of course, most of these universes won’t yield the desired result, so we’ll have to destroy them. I know, it seems kind of wasteful, but in functional programming it’s easy come, easy go. The creation of a new universe is relatively cheap. That’s because new universes are not that different from their parent universes, and they can share almost all of their data. That’s the idea behind persistent data structures. These are the immutable data structures that are “mutated” by cloning. A cloned data structure shares most of its implementation with the parent, except for a small delta. We’ll be using persistent lists described in my earlier post.

Once you internalize the many-worlds approach to programming, the implementation is pretty straightforward. First, we need functions that generate new worlds. Since we are cheap, we’ll only generate the parts that are different. So what’s the difference between all the worlds that we get when selecting the substitution for the letter s? Just the number that we assign to s. There are ten worlds corresponding to the ten possible digits (we’ll deal with the constraints like s being different from zero later). So all we need is a function that generates a list of ten digits. These are our ten universes in a nutshell. They share everything else.

Once you are in an alternate universe, you have to continue with your life. In functional programming, the rest of your life is just a function called a continuation. I know it sounds like a horrible simplification. All your actions, emotions, and hopes reduced to just one function. Well, maybe the continuation just describes one aspect of your life, the computational part, and you can still hold on to your emotions.

So what do our lives look like, and what do they produce? The input is the universe we’re in, in particular the one number that was picked for us. But since we live in a quantum universe, the outcome is a multitude of universes. So a continuation takes a number, and produces a list. It doesn’t have to be a list of numbers, just a list of whatever characterizes the differences between alternate universes. In particular, it could be a list of different solutions to our puzzle — triples of numbers corresponding to “send”, “more”, and “money”. (There is actually only one solution, but that’s beside the point.)

And what’s the very essence of this new approach? It’s the binding of the selection of the universes to the continuation. That’s where the action is. This binding, again, can be expressed as a function. It’s a function that takes a list of universes and a continuation that produces a list of universes. It returns an even bigger list of universes. We’ll call this function for_each, and we’ll make it as generic as possible. We won’t assume anything about the type of the universes that are passed in, or the type of the universes that the continuation k produces. We’ll also make the type of the continuation a template parameter and extract the return type from it using auto and decltype:

template<class A, class F>
auto for_each(List<A> lst, F k) -> decltype(k(lst.front()))
{
    using B = decltype(k(lst.front()).front());
    // This should really be expressed using concepts
    static_assert(std::is_convertible<
        F, std::function<List<B>(A)>>::value,
        "for_each requires a function type List<B>(A)");

    List<List<B>> lstLst = fmap(k, lst);
    return concatAll(lstLst);
}

The function fmap is similar to std::transform. It applies the continuation k to every element of the list lst. Because k itself produces a list, the result is a list of lists, lstLst. The function concatAll concatenates all those lists into one big list.
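
Neither fmap nor concatAll is shown in this post; here are hedged sketches of both, written against the same persistent-list interface, plus the two-list concat they rely on (names and signatures are my guesses, not necessarily the library’s):

template<class T>
List<T> concat(List<T> const & lft, List<T> const & rgt)
{
    if (lft.isEmpty()) return rgt;
    // cons the head of the left list onto the concatenation of its tail with rgt
    return List<T>(lft.front(), concat(lft.popped_front(), rgt));
}

template<class A, class F>
auto fmap(F f, List<A> lst) -> List<decltype(f(lst.front()))>
{
    using B = decltype(f(lst.front()));
    if (lst.isEmpty()) return List<B>();
    // apply f to the head, recurse into the tail
    return List<B>(f(lst.front()), fmap(f, lst.popped_front()));
}

template<class T>
List<T> concatAll(List<List<T>> xss)
{
    if (xss.isEmpty()) return List<T>();
    return concat(xss.front(), concatAll(xss.popped_front()));
}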

Congratulations! You have just seen a monad. This one is called the list monad and it’s used to model non-deterministic processes. The monad is actually defined by two functions. One of them is for_each, and here’s the other one:

template<class A>
List<A> yield(A a)
{
    return List<A> (a);
}

It’s a function that returns a singleton list. We use yield when we are done multiplying universes and we just want to return a single value. We use it to create a single-valued continuation. It represents the lonesome boring life, devoid of any choices.

I will later rename these functions to mbind and mreturn, because they are part of any monad, not just the list monad.

The names like for_each or yield have a very imperative ring to them. That’s because, in functional programming, monadic code plays a role similar to imperative code. But neither for_each nor yield are control structures — they are functions. In particular for_each, which sounds and works like a loop, is just a higher order function; and so is fmap, which is used in its implementation. Of course, at some level the code becomes imperative — fmap can either be implemented recursively or using an actual loop — but the top levels are just declarations of functions. Hence, declarative programming.

There is a slight difference between a loop and a function on lists like for_each: for_each takes a whole list as an argument, while a loop might generate individual items — in this case integers — on the fly. This is not a problem in a lazy functional language like Haskell, where a list is evaluated on demand. The same behavior may be implemented in C++ using streams or lazy ranges. I won’t use it here, since the lists we are dealing with are short, but you can read more about it in my earlier post Getting Lazy with C++.

We are not ready yet to implement the solution to our puzzle, but I’d like to give you a glimpse of what it looks like. For now, think of StateL as just a list. See if it starts making sense once you look past the usual C++ noise:

StateL<tuple<int, int, int>> solve()
{
    StateL<int> sel = &select<int>;

    return for_each(sel, [=](int s) {
    return for_each(sel, [=](int e) {
    return for_each(sel, [=](int n) {
    return for_each(sel, [=](int d) {
    return for_each(sel, [=](int m) {
    return for_each(sel, [=](int o) {
    return for_each(sel, [=](int r) {
    return for_each(sel, [=](int y) {
        return yield_if(s != 0 && m != 0, [=]() {
            int send  = asNumber(vector<int>{s, e, n, d});
            int more  = asNumber(vector<int>{m, o, r, e});
            int money = asNumber(vector<int>{m, o, n, e, y});
            return yield_if(send + more == money, [=]() {
                return yield(make_tuple(send, more, money));
            });
        });
    });});});});});});});});
}

The first for_each takes a selection of integers, sel, (never mind how we deal with uniqueness); and a continuation, a lambda, that takes one integer, s, and produces a list of solutions — tuples of three integers. This continuation, in turn, calls for_each with a selection for the next letter, e, and another continuation that returns a list of solutions, and so on.

The innermost continuation is a conditional version of yield called yield_if. It checks a condition and produces a zero- or one-element list of solutions. Internally, it calls another yield_if, which calls the ultimate yield. If that final yield is called (and it might not be, if one of the previous conditions fails), it will produce a solution — a triple of numbers. If there is more than one solution, these singleton lists will get concatenated inside for_each while percolating to the top.
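
The implementation of yield_if isn’t shown here. One way to write it, consistent with how it’s used above (my sketch, not necessarily the actual code), is to run the continuation only when the condition holds and otherwise return an empty list, so the whole branch of the search dies out:

template<class F>
auto yield_if(bool condition, F cont) -> decltype(cont())
{
    using L = decltype(cont());
    return condition ? cont() : L(); // L() is an empty list of solutions
}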

In the second part of this post I will come back to the problem of picking unique numbers and introduce the state monad. You can also have a peek at the code on github.

Challenges

  1. Implement for_each and yield for a vector instead of a List. Use the Standard Library transform instead of fmap.
  2. Using the list monad (or your vector monad), write a function that generates all positions on a chessboard as pairs of characters between 'a' and 'h' and numbers between 1 and 8.
  3. Implement a version of for_each (call it repeat) that takes a continuation k of the type function<List<B>()> (notice the void argument). The function repeat calls k for each element of the list lst, but it ignores the element itself.

We have this friendly competition going on between Eric Niebler and myself. He writes some clever C++ template code, and I feel the compulsion to explain it to him in functional terms. Then I write a blog about Haskell or category theory and Eric feels a compulsion to translate it into C++.

Eric is now working on his proposal to rewrite the C++ STL in terms of ranges and I keep reinterpreting his work in terms familiar to functional programmers. Eric’s range comprehensions are a result of some of this back and forth.

Lazy ranges are such an excellent example of functional programming that it would be foolish for me to pass this opportunity to dissect them. To any functional programmer worth their salt they just scream “monad!” A monad is a higher order pattern that can be built step by step, so that’s what I’m going to do. I’ll start with the functor pattern, then add some functionality that will make it a pointed functor, then add some more to make it an applicative functor, and finally add some more to make it a monad. This gradual buildup of functionality is reminiscent of building a class hierarchy, and indeed it can be modelled as such in Haskell (although Haskell type classes are slightly different than C++ classes). This hierarchy would look something like this:

  • A monad is-an applicative functor
  • An applicative functor is-a pointed functor
  • A pointed functor is-a functor

So let’s start with a functor.

Functor

I have a pet peeve about the use of the word “functor” in C++. People keep calling function objects functors. It’s like calling Luciano Pavarotti an “operator,” because he sings operas. The word functor has a very precise meaning in mathematics — moreover, it’s the branch of mathematics that’s extremely relevant to programming. So hijacking this term to mean a function-like object causes unnecessary confusion.

A functor in functional programming is a generic template, which allows the “lifting” of functions. Let me explain what it means. A generic template takes an arbitrary type as a template argument. So a range (whether lazy or eager) is a generic template because it can be instantiated for any type. You can have a range of integers, a range of vectors, a range of ranges, and so on. (We’ll come back to ranges of ranges later when we talk about monads.)

The “lifting” of functions means this: Give me any function from some type T to some other type U and I can apply this function to a range of T and produce a range of U. You may recognize this kind of lifting in the STL algorithm std::transform, which can be used to apply a function to a container. STL containers are indeed functors. Unfortunately, their functorial nature is buried under the noise of iterators. In Eric’s range library, the lifting is done much more cleanly using view::transform. Have a look at this example:

 int total = accumulate(view::iota(1) |
                        view::transform([](int x){return x*x;}) |
                        view::take(10), 0);

Here, view::transform takes an anonymous function that squares its argument, and lifts this function to the level of ranges. The range created by view::iota(1) is piped into it from the left, and the resulting range of squares emerges from it on the right. The (infinite) range is then truncated by take-ing the first 10 elements.

The function view::iota(1) is a factory that produces an infinite range of consecutive integers starting from 1. (We’ll come back to range factories later.)

In this form, view::transform plays the role of a higher-order function: one that takes a function and returns a function. It almost reaches the level of terseness and elegance of Haskell, where this example would look something like this:

total = sum $ take 10 $ fmap (\x->x*x) [1..]

(Traditionally, the flow of data in Haskell is from right to left.) The (higher-order) function fmap can be thought of as a “method” of the class Functor that does the lifting in Haskell. In C++ there is no overall functor abstraction, so each functor names its lifting function differently — for ranges, it’s view::transform.

The intuition behind a functor is that it generates a family of objects that somehow encapsulate values of arbitrary types. This encapsulation can be very concrete or very abstract. For instance, a container simply contains values of a given type. A range provides access to values that may be stored in some other container. A lazy range generates values on demand. A future, which is also a functor (or will be, in C++17), describes a value that might not be currently available because it’s being evaluated in a separate thread.

All these objects have one thing in common: they provide means to manipulate the encapsulated values with functions. That’s the only requirement for a functor. It’s not required that a functor provide access to encapsulated values (which may not even exist), although most do. In fact there is a functor (really, a monad), in Haskell, that provides no way of accessing its values other than outputting them to external devices.

Pointed Functor

A pointed functor is a functor with one additional ability: it lets you lift individual values. Give me a value of any type and I will encapsulate it. In Haskell, the encapsulating function is called pure although, as we will see later, in the context of a monad it’s called return.

All containers are pointed, because you can always create a singleton container — one that contains only one value. Ranges are more interesting. You can obviously create a range from a singleton container. But you can also create a lazy range from a value using a (generic) function called view::single, which doesn’t have a backing container behind it.

There is, however, an alternative way of lifting a value to a range, and that is by repeating it indefinitely. The function that creates such infinite (lazy) ranges is called view::repeat. For instance, view::repeat(1) creates an infinite series of ones. You might be wondering what use there could be for such a repetitive range. Not much, unless you combine it with other ranges. In general, pointed functors are not very interesting other than as stepping stones towards applicative functors and monads. So let’s move on.

Applicative Functor

An applicative functor is a pointed functor with one additional ability: it lets you lift multi-argument functions. We already know how to lift a single-argument function using fmap (or transform, or whatever it’s called for a particular functor).

With multi-argument functions acting on ranges we have two different options corresponding to the two choices for pure I mentioned before: view::single and view::repeat.
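
For concreteness, lifting a plain value with each of them looks like this (a sketch in range-v3 syntax; treat the details as approximate):

auto just_one = view::single(42); // a one-element lazy range: {42}
auto all_ones = view::repeat(1);  // an infinite lazy range: 1, 1, 1, ...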

The idea, taken from functional languages, is to consider what happens when you provide the first argument to a function of multiple arguments (it’s called partial application). You don’t get back a value. Instead you get something that expects one or more remaining arguments. A thing that expects arguments is called a function (or a function object), so you get back a function of one fewer arguments. In C++ you can’t just call a function with fewer arguments than expected, or you get a compilation error, but there is a (higher-order) function in the Standard Library called std::bind that implements partial application.

This kind of transformation from a function of multiple arguments to a function of one argument that returns a function is called currying.
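
Here’s the contrast in code, using a throwaway add function of my own (a sketch; add, add5, and curried_add are illustrative names, not anything from the range library):

#include <functional>

int add(int a, int b) { return a + b; }

// Partial application: fix the first argument, wait for the second.
auto add5 = std::bind(add, 5, std::placeholders::_1);

// Currying by hand: a function of one argument returning a function of one argument.
auto curried_add = [](int a) { return [a](int b) { return a + b; }; };

int x = add5(3);           // 8
int y = curried_add(5)(3); // also 8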

Let’s consider a simple example. We want to apply std::make_pair to two ranges: view::ints(10, 11) and view::ints(1, 3). To this end, let’s replace std::make_pair with the corresponding curried function of one argument returning a function of one argument:

[](int i) { return [i](int j) { return std::make_pair(i, j); };}

First, we want to apply this function to the first range. We know how to apply a function to a range: we use view::transform.

auto partial_app = view::ints(10, 11)
                 | view::transform([](int i) {
                      return [i](int j) { return std::make_pair(i, j); };
                   });

What’s the result of this application? Can you guess? Our curried function will be applied to each integer in the range, returning a function that pairs that integer with its argument. So we end up with a range of functions of the form:

[i](int j) { return std::make_pair(i, j); }

So far so good — we have just used the functorial property of the range. But now we have to decide how to apply a range of functions to the second range of values. And that’s the essence of the definition of an applicative functor. In Haskell the operation of applying encapsulated functions to encapsulated arguments is denoted by an infix operator <*>.

With ranges, there are two basic strategies:

  1. We can enumerate all possible combinations — in other words create the cartesian product of the range of functions with the range of values — or
  2. Match corresponding functions to corresponding values — in other words, “zip” the two ranges.

The former, when applied to view::ints(1, 3), will yield:

{(10,1),(10,2),(10,3),(11,1),(11,2),(11,3)}

and the latter will yield:

{(10, 1),(11, 2)}

(when the ranges are not of equal length, zipping stops as soon as the shorter one is exhausted).

To see that those two choices correspond to the two choices for pure, we have to look at some consistency conditions. One of them is that if you encapsulate a single-argument function in a range using pure and then apply it to a range of arguments, you should get the same result as if you simply fmapped this function directly over the range of arguments. For the record, I’ll write it here in Haskell:

pure f <*> xs == fmap f xs

This is sort of an obvious requirement: You have two different ways of applying a single-argument function to a range, they better give the same result.

Let’s try it with the view::single version of pure. When acting on a function, it will create a one-element range containing this function. The “all possible combinations” application will just apply this function to all elements of the argument range, which is exactly what view::transform would do.

Conversely, if we apply view::repeat to the function, we’ll get an infinite range that repeats this function at every position. We have to zip this range with the range of arguments in order to get the same result as view::transform. So this implementation of pure works with the zippy applicative. Notice that if the argument range is finite the result will also be finite. But this kind of application will also work with infinite ranges thanks to laziness.

So there are two legitimate implementations of the applicative functor for ranges. One uses view::single to lift values and uses the all possible combinations strategy to apply a range of functions to a range of arguments. The other uses view::repeat to lift values and the zipping application for ranges of functions and arguments. They are both acceptable and have their uses.

Now let’s go back to our original problem of applying a function of multiple arguments to a bunch of ranges. Since we are not doing it in Haskell, currying is not really a practical option.

As it turns out, the second version of applicative has been implemented by Eric as a (higher-order) function view::zip_with. This function takes a multi-argument callable object as its first argument, followed by a variadic list of ranges.
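
For instance, the zippy version of our pairing example would look something like this (a sketch; I’m assuming the same view::ints semantics as earlier in this post):

auto zipped = view::zip_with(
    [](int i, int j) { return std::make_pair(i, j); },
    view::ints(10, 11), view::ints(1, 3));
// expected: {(10, 1), (11, 2)}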

There is no corresponding implementation for the combinatorial applicative. I think the most natural interface would be an overload of view::transform (or maybe view::fmap) with the same signature as zip_with. Our example would then look like this:

view::transform(std::make_pair, view::ints(10, 11), view::ints(1, 3));

The need for this kind of interface is not as acute because, as we’ll learn next, the combinatorial applicative is supplanted by a more general monadic interface.

Monad

Monads are applicative functors with one additional functionality. There are two equivalent ways of describing this functionality. But let me first explain why this functionality is needed.

The range library comes with a bunch of range factories, such as view::iota, view::ints, or view::repeat. It’s also very likely that users of the library will want to create their own range factories. The problem is: How do you compose existing range factories to obtain new range factories?

Let me give you an example that generated a blog post exchange between me and Eric. The challenge was to generate a list of Pythagorean triples. The idea is to take a cross product of three infinite ranges of integers and select those triples that satisfy the equation x² + y² = z². The cross product of ranges is reminiscent of the “all possible combinations” applicative, and indeed that’s the applicative that can be extended to a monad (the zippy one can’t).

To make this algorithm feasible, we have to organize these ranges so we can (lazily) traverse them. Let’s start with a factory that produces all integers from 1 to infinity. That’s the view::ints(1) factory. Then, for each z produced by that factory, let’s create another factory view::ints(1, z). This range will provide our xs — and it makes no sense to try xs that are bigger than zs. These values, in turn, will be used in the creation of the third factory, view::ints(x, z) that will generate our ys. At the end we’ll filter out the triples that don’t satisfy the Pythagorean equation.

Notice how we are feeding the output of one range factory into another range factory. Do you see the problem? We can’t just iterate over an infinite range. We need a way to glue the output side of one range factory to the input side of another range factory without unpacking the range. And that’s what monads are for.

Remember also that there are functors that provide no way of extracting values, or for which extraction is expensive or blocking (as is the case with futures). Being able to compose those kinds of functor factories is often essential, and again, the monad is the answer.

Now let’s pinpoint the type of functionality that would allow us to glue range factories end-to-end. Since ranges are functorial, we can use view::transform to apply a factory to a range. After all a factory is just a function. The only problem is that the result of such application is a range of ranges. So, really, all that’s needed is a lazy way of flattening nested ranges. And that’s exactly what Eric’s view::flatten does.

With this new flattening power at our disposal, here’s a possible beginning of the solution to the Pythagorean triple problem:

view::ints(1) | view::transform([](int z) {
                return view::ints(1, z) | ...; }) | view::flatten

However, this combination of view::transform and view::flatten is so useful that it deserves its own function. In Haskell, this function is called “bind” and is written as an infix operator >>=. (And, while we’re at it, flatten is called join.)

And guess what the combination of view::transform and view::flatten is called in the range library. This discovery struck me as I was looking at one of Eric’s examples. It’s called view::for_each. Here’s the solution to the Pythagorean triple problem using view::for_each to bind range factories:

auto triples =
  for_each(ints(1), [](int z) {
    return for_each(ints(1, z), [=](int x) {
      return for_each(ints(x, z), [=](int y) {
        return yield_if(x*x + y*y == z*z, std::make_tuple(x, y, z));
      });
    });
  });

And here’s the same code in Haskell:

triples = 
  (>>=) [1..] $ \z -> 
     (>>=) [1..z] $ \x -> 
        (>>=) [x..z] $ \y -> 
           guard (x^2 + y^2 == z^2) >> return (x, y, z)

I have purposefully re-formatted Haskell code to match C++ (A more realistic rendition of it is in my post Getting Lazy with C++). Bind operators >>= are normally used in infix position but here I wanted to match them against for_each. Haskell’s return is the same as view::single, which Eric renamed to yield inside for_each. In this particular case, yield is conditional, which in Haskell is expressed using guard. The syntax for lambdas is also different, but otherwise the code matches almost perfectly.

This is an amazing and somewhat unexpected convergence. In our Twitter exchange, Eric sheepishly called his for_each code imperative. We are used to thinking of for_each as synonymous with looping, which is such an iconic imperative construct. But here, for_each is the monadic bind — the epitome of functional programming. This puppy is purely functional. It’s an expression that returns a value (a range) and has no side effects.

But what about those loops that do have side effects and don’t return a value? In Haskell, side effects are always encapsulated using monads. The equivalent of a for_each loop with side effects would return a monadic object. What we consider side effects would be encapsulated in that object. It’s not the loop that performs side effects, it’s that object. It’s an executable object. In the simplest case, this object contains a function that may be called with the state that is to be modified. For side effects that involve the external world, there is a special monad called the IO monad. You can produce IO objects, you can compose them using monadic bind, but you can’t execute them. Instead, you return from main one such object that combines all the IO of your program, and let the runtime execute it. (At least that’s the theory.)

Is this in any way relevant to an imperative programmer? After all, in C++ you can perform side effects anywhere in your code. The problem is that there are some parts of your code where side effects can kill you. In concurrent programs uncontrolled side effects lead to data races. In Software Transactional Memory (STM, which at some point may become part of C++) side effects may be re-run multiple times when a transaction is retried. There is an urgent need to control side effects and to separate pure functions from their impure brethren. Encapsulating side effects inside monads could be the ticket to extend the usefulness of pure functions inside an imperative language like C++.

To summarize: A monad is an applicative functor with an additional ability, which can be expressed either as a way of flattening a doubly encapsulated object, or as a way of applying a functor factory to an encapsulated object.

In the range library, the first method is implemented through view::flatten, and the second through view::for_each. Being an applicative functor means that a range can be manipulated using view::transform and that any value may be encapsulated using view::single or, inside for_each, using yield.

The ability to apply a range of functions to a range of arguments that is characteristic of an applicative functor falls out of the monadic functionality. For instance, the example from the previous section can be rewritten as:

for_each(ints(10, 11), [](int i) {
  return for_each(ints(1, 3), [i](int j) {
    return yield(std::make_pair(i, j));
  });
});

The Mess We’re In

I don’t think the ideas I presented here are particularly difficult. What might be confusing though is the many names that are used to describe the same thing. There is a tendency in imperative (and some functional) languages to come up with cute names for identical patterns specialized to different applications. It is also believed that programmers would be scared by terms taken from mathematics. Personally, I think that’s silly. A monad by any other name would smell as sweet, but we wouldn’t be able to communicate about them as easily. Here’s a sampling of various names used in relation to concepts I talked about:

  1. Functor: fmap, transform, Select (LINQ)
  2. Pointed functor: pure, return, single, repeat, make_ready_future, yield, await
  3. Applicative functor: <*>, zip_with
  4. Monad: >>=, bind, mbind, for_each, next, then, SelectMany (LINQ)

Part of the problem is the lack of expressive power in C++ to unite such diverse phenomena as ranges and futures. Unfortunately, the absence of unifying ideas adds to the already overwhelming complexity of the language and its libraries. The functional paradigm could be a force capable of building connections between seemingly distant application areas.

Acknowledgments

I’m grateful to Eric Niebler for reviewing the draft of this blog and correcting several mistakes. The remaining mistakes are all mine.


Lazy evaluation can be a powerful tool for structuring your code. For instance, it can let you turn your code inside out, inverting the flow of control. Many a Haskell program takes advantage of laziness to express algorithms in clear, succinct terms, turning them from recipes into declarations.

The question for today’s blog post is: How can we tap the power of lazy evaluation in an inherently eager language like C++? I’ll lead you through a simple coding example and gradually introduce the building blocks of lazy programming: the suspension, the lazy stream, and a whole slew of functional algorithms that let you operate on them. In the process we’ll discover some fundamental functional patterns like functors, monads, and monoids. I have discussed them already in my post about C++ futures. It’s very edifying to see them emerge in a completely different context.

The Problem

Let’s write a program that prints the first n Pythagorean triples. A Pythagorean triple consists of three integers, x, y, and z, that satisfy the relation x² + y² = z². Let’s not be fancy and just go with the brute force approach. Here’s the program in C:

void printNTriples(int n)
{
    int i = 0;
    for (int z = 1; ; ++z)
        for (int x = 1; x <= z; ++x)
            for (int y = x; y <= z; ++y)
                if (x*x + y*y == z*z) {
                    printf("%d, %d, %d\n", x, y, z);
                    if (++i == n)
                        return;
                }
}

Here, a single C function serves three distinct purposes: It

  1. Generates Pythagorean triples,
  2. Prints them,
  3. Counts them; and when the count reaches n, breaks.

This is fine, as long as you don’t have to modify or reuse this code. But what if, for instance, instead of printing, you wanted to draw the triples as triangles? Or if you wanted to stop as soon as one of the numbers reached 100? The problem with this code is that it’s structured inside out: both the test and the sink for data are embedded in the innermost loop of the algorithm. A more natural and flexible approach would be to:

  1. Generate the list of Pythagorean triples,
  2. Take the first ten of them, and
  3. Print them.

And that’s exactly how you’d write this program in Haskell:

main = print (take 10 triples)

triples = [(x, y, z) | z <- [1..]
                     , x <- [1..z]
                     , y <- [x..z]
                     , x^2 + y^2 == z^2]

This program reads: take 10 triples and print them. It declares triples as a list (square brackets mean a list) of triples (x, y, z), where (the vertical bar reads “where”) z is an element of the list of integers from 1 to infinity, x is from 1 to z, y is from x to z, and the sum of squares of x and y is equal to the square of z. This notation is called “list comprehension” and is characteristic of Haskell terseness.

You see the difference? Haskell lets you abstract the notion of the list of Pythagorean triples so you can operate on it as one entity, whereas in C (or, for that matter, in C++) we were not able to disentangle the different, orthogonal aspects of the program.

The key advantage of Haskell in this case is its ability to deal with infinite lists. And this ability comes from Haskell’s inherent laziness. Things are never evaluated in Haskell until they are absolutely needed. In the program above, it was the call to print that forced Haskell to actually do some work: take 10 elements from the list of triples. Since the triples weren’t there yet, it had to calculate them, but only as many as were requested and not a number more.

Suspension

We’ll start with the most basic building block of laziness: a suspended function. Here’s the first naive attempt:

template<class T>
class Susp {
public:
    explicit Susp(std::function<T()> f)
        : _f(f)
    {}
    T get() { return _f(); }
private:
    std::function<T()> _f;
};

We often create suspensions using lambda functions, as in:

int x = 2;
int y = 3;
Susp<int> sum([x, y]() { return x + y; });
...
int z = sum.get();

Notice that the suspended lambda may capture variables from its environment: here x and y. A lambda, and therefore a suspension, is a closure.

The trouble with this implementation is that the function is re-executed every time we call get. There are several problems with that: If the function is not pure, we may get different values each time; if the function has side effects, these may happen multiple times; and if the function is expensive, the performance will suffer. All these problems may be addressed by memoizing the value.

Here’s the idea: The first time the client calls get we should execute the function and store the returned value in a member variable. Subsequent calls should go directly to that variable. We could implement this by setting a Boolean flag on the first call and then checking it on every subsequent call, but there’s a better implementation that uses thunks.

A thunk is a pointer to a free function taking a suspension (the this pointer) and returning a value (by const reference). The get method simply calls this thunk, passing it the this pointer.

Initially, the thunk is set to thunkForce, which calls the method setMemo. This method evaluates the function, stores the result in _memo, switches the thunk to thunkGet, and returns the memoized value. On subsequent calls, get goes through thunkGet, which simply returns the memoized value via getMemo.

template<class T>
class Susp
{
    // thunk
    static T const & thunkForce(Susp * susp) {
        return susp->setMemo();
    }
    // thunk
    static T const & thunkGet(Susp * susp) {
        return susp->getMemo();
    }
    T const & getMemo() {
        return _memo;
    }
    T const & setMemo() {
        _memo = _f();
        _thunk = &thunkGet;
        return getMemo();
    }
public:
    explicit Susp(std::function<T()> f)
        : _f(f), _thunk(&thunkForce), _memo(T())
    {}
    T const & get() {
        return _thunk(this);
    }
private:
    T const & (*_thunk)(Susp *);
    mutable T   _memo;

    std::function<T()> _f;
};

(By the way, the function pointer declaration of _thunk looks pretty scary in C++, doesn’t it?)

[Edit: I decided to remove the discussion of the thread safe implementation since it wasn’t ready for publication. The current implementation is not thread safe.]

You can find a lot more detail about the Haskell implementation of suspended functions in the paper by Tim Harris, Simon Marlow, and Simon Peyton Jones, Haskell on a Shared-Memory Multiprocessor.

Lazy Stream

The loop we used to produce Pythagorean triples in C worked on the push principle — data was pushed towards the sink. If we want to deal with infinite lists, we have to use the pull principle. It should be up to the client to control the flow of data. That’s the inversion of control I was talking about in the introduction.

We’ll use a lazy list and call it a stream. In C++ a similar idea is sometimes expressed in terms of input and forward iterators, although it is understood that an iterator itself is not the source or the owner of data — just an interface to one. So we’ll stick with the idea of a stream.

We’ll implement the stream in the functional style as a persistent data structure fashioned after persistent lists (see my series of blog post on persistent data structures). It means that a stream, once constructed, is never modified. To “advance” the stream, we’ll have to create a new one by calling the const method pop_front.

Let’s start with the definition: A stream is either empty or it contains a suspended cell. This immediately suggests the implementation as a (possibly null) pointer to a cell. Since the whole stream is immutable, the cell will be immutable too, so it’s perfectly safe to share it between copies of the stream. We can therefore use a shared pointer:

template<class T>
class Stream
{
private:
    std::shared_ptr <Susp<Cell<T>>> _lazyCell;
};

Of course, because of reference counting and memoization, the stream is only conceptually immutable and, in the current implementation, not thread safe.

So what’s in the Cell? Remember, we want to be able to generate infinite sequences, so Stream must contain the DNA for not only producing the value of type T but also for producing the offspring — another (lazy) Stream of values. The Cell is just that: A value and a stream.

template<class T>
class Cell
{
public:
    Cell() {} // need default constructor for memoization
    Cell(T v, Stream<T> const & tail)
        : _v(v), _tail(tail)
    {}
    explicit Cell(T v) : _v(v) {}
    T val() const {
        return _v;
    }
    Stream<T> pop_front() const {
        return _tail;
    }
private:
    T _v;
    Stream<T> _tail;
};

This mutually recursive pair of data structures works together amazingly well.

template<class T>
class Stream
{
private:
    std::shared_ptr <Susp<Cell<T>>> _lazyCell;
public:
    Stream() {}
    Stream(std::function<Cell<T>()> f)
        : _lazyCell(std::make_shared<Susp<Cell<T>>>(f))
    {}
    Stream(Stream && stm)
        : _lazyCell(std::move(stm._lazyCell))
    {}
    // Declaring the move operations suppresses the implicit copies, but copies
    // are needed later (lambdas capture Streams by value and Cell stores a
    // Stream tail), so bring them back explicitly.
    Stream(Stream const & stm) = default;
    Stream & operator=(Stream const & stm) = default;
    Stream & operator=(Stream && stm)
    {
        _lazyCell = std::move(stm._lazyCell);
        return *this;
    }
    bool isEmpty() const
    {
        return !_lazyCell;
    }
    T get() const
    {
        return _lazyCell->get().val();
    }
    Stream<T> pop_front() const
    {
        return _lazyCell->get().pop_front();
    }
};

There are several things worth pointing out. The two constructors follow our formal definition of the Stream: one constructs an empty stream, the other constructs a suspended Cell. A suspension is created from a function returning Cell.

I also added a move constructor and a move assignment operator for efficiency (plus explicitly defaulted copy operations, which declaring the move operations would otherwise suppress; copies are needed because lambdas capture Streams by value). We’ll see the moves used in the implementation of forEach.

The magic happens when we call get for the first time. That’s when the suspended Cell comes to life. The value and the new stream are produced and memoized for later use. Or, this may happen if the first call is to pop_front. Notice that pop_front is a const method — the Stream itself is immutable. The method returns a new stream that encapsulates the rest of the sequence.

Let’s get our feet wet by constructing a stream of integers from n to infinity. The constructor of a Stream takes a function that returns a Cell. We’ll use a lambda that captures the value of n. It creates a Cell with that value and a tail, which it obtains by calling intsFrom with n+1:

Stream<int> intsFrom(int n)
{
    return Stream<int>([n]()
    {
        return Cell<int>(n, intsFrom(n + 1)); 
    });
}

It’s a recursive definition, but without the usual recursive function calls that eat up the stack. The call to the inner intsFrom is not made from the outer intsFrom. Instead it’s made the first time get is called on the emerging Stream.

Of course, we can also create finite streams, like this one, which produces integers from n to m:

Stream<int> ints(int n, int m)
{
    if (n > m)
        return Stream<int>();
    return Stream<int>([n, m]()
    {
        return Cell<int>(n, ints(n + 1, m));
    });
}

The trick is to capture the limit m as well as the recursion variable n. When the limit is reached, we simply return an empty Stream.

We’ll also need the method take, which creates a Stream containing the first n elements of the original stream:

Stream take(int n) const {
    if (n == 0 || isEmpty())
        return Stream();
    auto cell = _lazyCell;
    return Stream([cell, n]()
    {
        auto v = cell->get().val();
        auto t = cell->get().pop_front();
        return Cell<T>(v, t.take(n - 1));
    });
}

Here we are capturing the suspended cell and using it to lazily generate the elements of the new, truncated Stream. Again, the key to understanding why this works is to keep in mind that Streams and Cells are conceptually immutable, and therefore can be shared by the implementation. This has some interesting side effects, which don’t influence the results, but change the performance. For instance, if the caller of take forces the evaluation of the first n elements — e.g., by passing them through the consuming forEach below — these elements will appear miraculously memoized in the original Stream.

Finally, we’ll need some way to iterate through streams. Here’s an implementation of forEach that consumes the stream while enumerating it and feeding its elements to a function.

template<class T, class F>
void forEach(Stream<T> strm, F f)
{
    while (!strm.isEmpty())
    {
        f(strm.get());
        strm = strm.pop_front();
    }
}

It’s the assignment:

strm = strm.pop_front();

which consumes the stream by decreasing the reference count of the head of the Stream. In particular, if you pass an rvalue Stream to forEach, its elements will be generated and deleted in lockstep. The algorithm will use constant memory, independent of the virtual length of the Stream. What Haskell accomplishes with garbage collection, we approximate in C++ with reference counting and shared_ptr.
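
Here’s a quick usage sketch that puts the pieces together, using the intsFrom and take defined in this post: take the first few elements of an infinite stream and print them.

forEach(intsFrom(1).take(5), [](int i)
{
    std::cout << i << std::endl; // prints 1 2 3 4 5, one per line
});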

Working with Streams

It’s not immediately obvious how to translate our Pythagorean triple program from nested loops to lazy streams, so we’ll have to take inspiration from the corresponding Haskell program. Let me first rewrite it using a slightly different notation:

triples = do
    z <- [1..]
    x <- [1..z]
    y <- [x..z]
    guard (x^2 + y^2 == z^2)
    return (x, y, z)

The general idea is this: Start with the stream of integers from 1 to infinity. For every such integer — call it z — create a stream from 1 to z. For each of those — call them x — create a stream from x to z. Filter out those which don’t satisfy the Pythagorean constraint. Finally, output a stream of tuples (x, y, z).

So far we’ve learned how to create a stream of integers — we have the function intsFrom. But now we’ll have to do something for each of these integers. We can’t just enumerate those integers and apply a function to each, because that would take us eternity. So we need a way to act on each element of a stream lazily.

In functional programming this is called mapping a function over a list. In general, a parameterized data structure that can be mapped over is called a functor. I’m going to show you that our Stream is a functor.

Stream as a Functor

The idea is simple: we want to apply a function to each element of a stream to get a new transformed stream (it’s very similar to the std::transform algorithm from STL). The catch is: We want to do it generically and lazily.

To make the algorithm — we’ll call it fmap — generic, we have to parameterize it over types. The algorithm starts with a Stream of elements of type T and a function from T to some other type U. The result should be a stream of U.

We don’t want to make U the template argument, because then the client would have to specify it explicitly. We want the compiler to deduce this type from the type of the function. We want, therefore, the function type F to be the parameter of our template (this will also allow us to call it uniformly with function pointers, function objects, and lambdas):

template<class T, class F>
auto fmap(Stream<T> stm, F f)

Without the use of concepts, we have no way of enforcing, or even specifying, that F be a type of a function from T to U. The best we can do is to statically assert it inside the function:

static_assert(std::is_convertible<F, std::function<U(T)>>::value,
        "fmap requires a function type U(T)");

But what is U? We can get at it using decltype:

decltype(f(stm.get()));

Notice that decltype takes, as an argument, an expression that can be statically typed. Here, the expression is a function call of f. We also need a dummy argument for this function: we use the result of stm.get(). The argument to decltype is never evaluated, but it is type-checked at compile time.

One final problem is how to specify the return type of fmap. It’s supposed to be Stream<U>, but we don’t know U until we apply decltype to the arguments of fmap. We have to use the new auto function declaration syntax of C++11. So here are all the type-related preliminaries:

template<class T, class F>
auto fmap(Stream<T> stm, F f)->Stream<decltype(f(stm.get()))>
{
    using U = decltype(f(stm.get()));
    static_assert(std::is_convertible<F, std::function<U(T)>>::value,
        "fmap requires a function type U(T)");
    ...
}

Compared to that, the actual implementation of fmap seems rather straightforward:

    if (stm.isEmpty()) return Stream<U>();
    return Stream<U>([stm, f]()
    {
        return Cell<U>(f(stm.get()), fmap(stm.pop_front(), f));
    });

In words: If the stream is empty, we’re done — return an empty stream. Otherwise, create a new stream by suspending a lambda function. That function captures the original stream (by value) and the function f, and returns a Cell. That cell contains the value of f acting on the first element of the original stream, and a tail. The tail is created with fmap acting on the rest of the original stream.

Equipped with fmap, we can now attempt to take the first step towards generating our triples: apply the function ints(1, z) to each element of the stream intsFrom(1):

fmap(intsFrom(1), [](int z)
{
    return ints(1, z);
});

The result is a Stream of Streams of integers of the shape:

1
1 2
1 2 3
1 2 3 4
1 2 3 4 5
...

But now we are stuck. We’d like to apply ints(x, z) to each element of that sequence, but we don’t know how to get through two levels of Stream. Our fmap can only get through one layer. We need a way to flatten a Stream of Streams. That functionality is part of what functional programmers call a monad. So let me show you that Stream is indeed a monad.

Stream as a Monad

If you think of a Stream as a list, the flattening of a list of lists is just concatenation. Suppose for a moment that we know how to lazily concatenate two Streams (we’ll get to it later) and let’s implement a function mjoin that concatenates a whole Stream of Streams.

You might have noticed a pattern in the implementation of lazy functions on streams. We use some kind of recursion, which starts with “Are we done yet?” If not, we do an operation that involves one element of the stream and the result of a recursive call to the function itself.

The “Are we done yet?” question usually involves testing for an empty stream. But here we are dealing with a Stream of Streams, so we have to test two levels deep. This way we’ll ensure that the concatenation of a Stream of empty Streams immediately returns an empty Stream.

The recursive step in mjoin creates a Cell whose element is the head of the first stream, and whose tail is the concatenation of the tail of the first stream and the result of mjoin of the rest of the streams:

template<class T>
Stream<T> mjoin(Stream<Stream<T>> stm)
{
    while (!stm.isEmpty() && stm.get().isEmpty())
    {
        stm = stm.pop_front();
    }
    if (stm.isEmpty()) return Stream<T>();
    return Stream<T>([stm]()
    {
        Stream<T> hd = stm.get();
        return Cell<T>( hd.get()
                      , concat(hd.pop_front(), mjoin(stm.pop_front())));
    });
}

The combination of fmap and mjoin lets us compose functions like intsFrom or ints that return Streams. In fact, this combination is so common that it deserves its own function, which we’ll call mbind:

template<class T, class F>
auto mbind(Stream<T> stm, F f) -> decltype(f(stm.get()))
{
    return mjoin(fmap(stm, f));
}

If we use mbind in place of fmap:

mbind(intsFrom(1), [](int z)
{
    return ints(1, z);
});

we can produce a flattened list:

1 1 2 1 2 3 1 2 3 4...
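
To see this concretely, we can take a finite prefix of the flattened stream and print it with forEach (a small sketch using the take member that appears in the test function at the end of this post):

auto flat = mbind(intsFrom(1), [](int z)
{
    return ints(1, z);
});
forEach(flat.take(10), [](int n)
{
    std::cout << n << " ";
});
// prints: 1 1 2 1 2 3 1 2 3 4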

But it’s not just the list: Each element of the list comes with variables that are defined in its environment — here the variable z. We can keep chaining calls to mbind and capture more variables in the process:

mbind(intsFrom(1), [](int z)
{
    return mbind(ints(1, z), [z](int x)
    {
        return mbind(ints(x, z), [x, z](int y)
        {
            ...
        });
    });
});

At this point we have captured the triples x, y, z, and are ready for the Pythagorean testing. But before we do it, let’s define two additional functions that we’ll use later.

The first one is mthen, which is a version of mbind that takes a function of no arguments. The idea is that such a function will be executed for each element of the stream, but it won’t use the value of that element. The important thing is that the function will not be executed when the input stream is empty. In that case, mthen will return an empty stream.

We implement mthen using a slightly modified version of fmap that takes a function f of no arguments:

template<class T, class F>
auto fmapv(Stream<T> stm, F f)->Stream<decltype(f())>
{
    using U = decltype(f());
    static_assert(std::is_convertible<F, std::function<U()>>::value,
        "fmapv requires a function type U()");

    if (stm.isEmpty()) return Stream<U>();
    return Stream<U>([stm, f]()
    {
        return Cell<U>(f(), fmapv(stm.pop_front(), f));
    });
}

We plug it into the definition of mthen the same way fmap was used in mbind:

template<class T, class F>
auto mthen(Stream<T> stm, F f) -> decltype(f())
{
    return mjoin(fmapv(stm, f));
}

The second useful function is mreturn, which simply turns a value of any type into a one-element Stream:

template<class T>
Stream<T> mreturn(T v)
{
    return Stream<T>([v]() {
        return Cell<T>(v);
    });
}

We’ll need mreturn to turn our triples into Streams.
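
For instance, a small usage sketch (relying on mreturn producing a one-element Stream, as described above):

auto one = mreturn(std::make_tuple(3, 4, 5));
// one.isEmpty() is false, one.get() is the tuple (3, 4, 5),
// and one.pop_front() is an empty Stream.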

It so happens that a parameterized type equipped with mbind and mreturn is called a monad (it must also satisfy some additional monadic laws, which I won’t talk about here). Our lazy Stream is indeed a monad.
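
For the record, those laws, stated informally for Stream (reading ~ as "produces the same elements"), are:

// Left identity:  mbind(mreturn(v), f)     ~  f(v)
// Right identity: mbind(stm, mreturn<T>)   ~  stm
// Associativity:  mbind(mbind(stm, f), g)  ~  mbind(stm, [=](T x) { return mbind(f(x), g); })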

Stream as a Monoid and a Monad Plus

When implementing mjoin we used the function concat to lazily concatenate two Streams. Its implementation follows the same recursive pattern we’ve seen so many times:

template<class T>
Stream<T> concat( Stream<T> lft
                , Stream<T> rgt)
{
    if (lft.isEmpty())
        return rgt;
    return Stream<T>([=]()
    {
        return Cell<T>(lft.get(), concat<T>(lft.pop_front(), rgt));
    });
}

What’s interesting is that the concatenation of streams puts them under yet another well known functional pattern: a monoid. A monoid is equipped with a binary operation, just like concat, which must be associative and possess a unit element. It’s easy to convince yourself that concatenation of Streams is indeed associative, and that the neutral element is an empty Stream. Concatenating an empty Stream, whether in front or in the back of any other Stream, doesn’t change the original Stream.

What’s even more interesting is that being a combination of a monoid and a monad makes Stream into a monad plus, and every monad plus defines a guard function, which is exactly what we need for filtering our triples. This function takes a Boolean argument and outputs a Stream. If the Boolean is false, the Stream is empty (the unit element of monad plus!); otherwise it’s a singleton Stream. We really don’t care what value sits in this Stream, since we never use the result of guard for anything but flow of control. In Haskell, there is a special "unit" value (); here I use nullptr as its closest C++ analog.

Stream<void*> guard(bool b)
{
    if (b) return Stream<void*>(nullptr);
    else return Stream<void*>();
}

We can now pipe the result of guard into mthen, which will ignore the content of the Stream but won’t fire when the Stream is empty. When the Stream is not empty, we will call mreturn to output a singleton Stream with the result tuple:

Stream<std::tuple<int, int, int>> triples()
{
    return mbind(intsFrom(1), [](int z)
    {
        return mbind(ints(1, z), [z](int x)
        {
            return mbind(ints(x, z), [x, z](int y)
            {
                return mthen(guard(x*x + y*y == z*z), [x, y, z]()
                {
                    return mreturn(std::make_tuple(x, y, z));
                });
            });
        });
    });
}

These singletons will then be concatenated by the three levels of mbind to create one continuous lazy Stream of Pythagorean triples.

Compare this function with its Haskell counterpart:

triples = do
    z <- [1..]
    x <- [1..z]
    y <- [x..z]
    guard (x^2 + y^2 == z^2)
    return (x, y, z)

Now the client can take 10 of those triples from the Stream, and the triples still won’t be evaluated! It’s the consuming forEach that finally forces the evaluation:

void test()
{
    auto strm = triples().take(10);
    forEach(std::move(strm), [](std::tuple<int, int, int> const & t)
    {
        std::cout << std::get<0>(t) << ", " 
                  << std::get<1>(t) << ", " 
                  << std::get<2>(t) << std::endl;
    });
}
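
If everything is wired up as above, this should print the first ten Pythagorean triples in generation order (increasing z, and for equal z, increasing x):

3, 4, 5
6, 8, 10
5, 12, 13
9, 12, 15
8, 15, 17
12, 16, 20
7, 24, 25
15, 20, 25
10, 24, 26
20, 21, 29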

Conclusion

The generation of Pythagorean triples is a toy example, but it shows how lazy evaluation can be used to restructure code and make it more reusable. You can use the same function triples to print the values in one part of your program and draw triangles in another. You can filter the triples or impose different termination conditions. You can use the same trick to generate an infinite sequence of approximations to the solution of a numerical problem, and then use different strategies to truncate it. Or you can create an infinite sequence of animation frames, and so on.

The building blocks of laziness are also reusable. I have used them to implement a solution to the eight-queens problem and a conference scheduling program. Once they are made thread safe, the combinators that bind them are thread safe too. This is, in general, a property of persistent data structures.

You might be concerned about the performance of lazy data structures, and rightly so. They use the heap heavily, so memory allocation and deallocation can become a serious performance bottleneck. There are many situations, though, where code structure, reusability, maintenance, and correctness (especially in multithreaded code) are more important than raw performance. And there are some problems that might be extremely hard to express without the additional flexibility gained from laziness.

I made the sources to all code in this post available on GitHub.
