November 2014

This is part of the book Category Theory for Programmers. The previous instalment was Category: The Essence of Composition. See the Table of Contents.

The category of types and functions plays an important role in programming, so let’s talk about what types are and why we need them.

Who Needs Types?

There seems to be some controversy about the advantages of static vs. dynamic and strong vs. weak typing. Let me illustrate these choices with a thought experiment. Imagine millions of monkeys at computer keyboards happily hitting random keys, producing programs, compiling, and running them.


With machine language, any combination of bytes produced by monkeys would be accepted and run. But with higher level languages, we do appreciate the fact that a compiler is able to detect lexical and grammatical errors. Lots of monkeys will go without bananas, but the remaining programs will have a better chance of being useful. Type checking provides yet another barrier against nonsensical programs. Moreover, whereas in a dynamically typed language, type mismatches would be discovered at runtime, in strongly typed statically checked languages type mismatches are discovered at compile time, eliminating lots of incorrect programs before they have a chance to run.

So the question is, do we want to make monkeys happy, or do we want to produce correct programs?

The usual goal in the typing monkeys thought experiment is the production of the complete works of Shakespeare. Having a spell checker and a grammar checker in the loop would drastically increase the odds. The analog of a type checker would go even further by making sure that, once Romeo is declared a human being, he doesn’t sprout leaves or trap photons in his powerful gravitational field.

Types Are About Composability

Category theory is about composing arrows. But not any two arrows can be composed. The target object of one arrow must be the same as the source source object of the next arrow. In programming we pass the results on one function to another. The program will not work if the target function is not able to correctly interpret the data produced by the source function. The two ends must fit for the composition to work. The stronger the type system of the language, the better this match can be described and mechanically verified.

The only serious argument I hear against strong static type checking is that it might eliminate some programs that are semantically correct. In practice, this happens extremely rarely and, in any case, every language provides some kind of a backdoor to bypass the type system when that’s really necessary. Even Haskell has unsafeCoerce. But such devices should by used judiciously. Franz Kafka’s character, Gregor Samsa, breaks the type system when he metamorphoses into a giant bug, and we all know how it ends.

Another argument I hear a lot is that dealing with types imposes too much burden on the programmer. I could sympathize with this sentiment after having to write a few declarations of iterators in C++ myself, except that there is a technology called type inference that lets the compiler deduce most of the types from the context in which they are used. In C++, you can now declare a variable auto and let the compiler figure out its type.

In Haskell, except on rare occasions, type annotations are purely optional. Programmers tend to use them anyway, because they can tell a lot about the semantics of code, and they make compilation errors easier to understand. It’s a common practice in Haskell to start a project by designing the types. Later, type annotations drive the implementation and become compiler-enforced comments.

Strong static typing is often used as an excuse for not testing the code. You may sometimes hear Haskell programmers saying, “If it compiles, it must be correct.” Of course, there is no guarantee that a type-correct program is correct in the sense of producing the right ouput. The result of this cavalier attitude is that in several studies Haskell didn’t come as strongly ahead of the pack in code quality as one would expect. It seems that, in the commercial setting, the pressure to fix bugs is applied only up to a certain quality level, which has everything to do with the economics of software development and the tolerance of the end user, and very little to do with the programming language or methodology. A better criterion would be to measure how many projects fall behind schedule or are delivered with drastically reduced functionality.

As for the argument that unit testing can replace strong typing, consider the common refactoring practice in strongly typed languages: changing the type of an argument of a particular function. In a strongly typed language, it’s enough to modify the declaration of that function and then fix all the build breaks. In a weakly typed language, the fact that a function now expects different data cannot be propagated to call sites. Unit testing may catch some of the mismatches, but testing is almost always a probabilistic rather than a deterministic process. Testing is a poor substitute for proof.

What Are Types?

The simplest intuition for types is that they are sets of values. The type Bool (remember, concrete types start with a capital letter in Haskell) is a two-element set of True and False. Type Char is a set of all Unicode characters like 'a' or 'ą'.

Sets can be finite or infinite. The type of String, which is a synonym for a list of Char, is an example of an infinite set.

When we declare x to be an Integer:

x :: Integer

we are saying that it’s an element of the set of integers. Integer in Haskell is an infinite set, and it can be used to do arbitrary precision arithmetic. There is also a finite-set Int that corresponds to machine type, just like the C++ int.

There are some subtleties that make this identification of types and sets tricky. There are problems with polymorphic functions that involve circular definitions, and with the fact that you can’t have a set of all sets; but as I promised, I won’t be a stickler for math. The great thing is that there is a category of sets, which is called Set, and we’ll just work with it. In Set, objects are sets and morphisms (arrows) are functions.

Set is a very special category, because we can actually peek inside its objects and get a lot of intuitions from doing that. For instance, we know that an empty set has no elements. We know that there are special one-element sets. We know that functions map elements of one set to elements of another set. They can map two elements to one, but not one element to two. We know that an identity function maps each element of a set to itself, and so on. The plan is to gradually forget all this information and instead express all those notions in purely categorical terms, that is in terms of objects and arrows.

In the ideal world we would just say that Haskell types are sets and Haskell functions are mathematical functions between sets. There is just one little problem: A mathematical function does not execute any code — it just knows the answer. A Haskell function has to calculate the answer. It’s not a problem if the answer can be obtained in a finite number of steps — however big that number might be. But there are some calculations that involve recursion, and those might never terminate. We can’t just ban non-terminating functions from Haskell because distinguishing between terminating and non-terminating functions is undecidable — the famous halting problem. That’s why computer scientists came up with a brilliant idea, or a major hack, depending on your point of view, to extend every type by one more special value called the bottom and denoted by _|_, or Unicode ⊥. This “value” corresponds to a non-terminating computation. So a function declared as:

f :: Bool -> Bool

may return True, False, or _|_; the latter meaning that it would never terminate.

Interestingly, once you accept the bottom as part of the type system, it is convenient to treat every runtime error as a bottom, and even allow functions to return the bottom explicitly. The latter is usually done using the expression undefined, as in:

f :: Bool -> Bool
f x = undefined

This definition type checks because undefined evaluates to bottom, which is a member of any type, including Bool. You can even write:

f :: Bool -> Bool
f = undefined

(without the x) because the bottom is also a member of the type Bool->Bool.

Functions that may return bottom are called partial, as opposed to total functions, which return valid results for every possible argument.

Because of the bottom, you’ll see the category of Haskell types and functions referred to as Hask rather than Set. From the theoretical point of view, this is the source of never-ending complications, so at this point I will use my butcher’s knife and terminate this line of reasoning. From the pragmatic point of view, it’s okay to ignore non-terminating functions and bottoms, and treat Hask as bona fide Set (see Bibliography at the end).

Why Do We Need a Mathematical Model?

As a programmer you are intimately familiar with the syntax and grammar of your programming language. These aspects of the language are usually described using formal notation at the very beginning of the language spec. But the meaning, or semantics, of the language is much harder to describe; it takes many more pages, is rarely formal enough, and almost never complete. Hence the never ending discussions among language lawyers, and a whole cottage industry of books dedicated to the exegesis of the finer points of language standards.

There are formal tools for describing the semantics of a language but, because of their complexity, they are mostly used with simplified academic languages, not real-life programming behemoths. One such tool called operational semantics describes the mechanics of program execution. It defines a formalized idealized interpreter. The semantics of industrial languages, such as C++, is usually described using informal operational reasoning, often in terms of an “abstract machine.”

The problem is that it’s very hard to prove things about programs using operational semantics. To show a property of a program you essentially have to “run it” through the idealized interpreter.

It doesn’t matter that programmers never perform formal proofs of correctness. We always “think” that we write correct programs. Nobody sits at the keyboard saying, “Oh, I’ll just throw a few lines of code and see what happens.” We think that the code we write will perform certain actions that will produce desired results. We are usually quite surprised when it doesn’t. That means we do reason about programs we write, and we usually do it by running an interpreter in our heads. It’s just really hard to keep track of all the variables. Computers are good at running programs — humans are not! If we were, we wouldn’t need computers.

But there is an alternative. It’s called denotational semantics and it’s based on math. In denotational semantics every programing construct is given its mathematical interpretation. With that, if you want to prove a property of a program, you just prove a mathematical theorem. You might think that theorem proving is hard, but the fact is that we humans have been building up mathematical methods for thousands of years, so there is a wealth of accumulated knowledge to tap into. Also, as compared to the kind of theorems that professional mathematicians prove, the problems that we encounter in programming are usually quite simple, if not trivial.

Consider the definition of a factorial function in Haskell, which is a language quite amenable to denotational semantics:

fact n = product [1..n]

The expression [1..n] is a list of integers from 1 to n. The function product multiplies all elements of a list. That’s just like a definition of factorial taken from a math text. Compare this with C:

int fact(int n) {
    int i;
    int result = 1;
    for (i = 2; i <= n; ++i)
        result *= i;
    return result;

Need I say more?

Okay, I’ll be the first to admit that this was a cheap shot! A factorial function has an obvious mathematical denotation. An astute reader might ask: What’s the mathematical model for reading a character from the keyboard or sending a packet across the network? For the longest time that would have been an awkward question leading to a rather convoluted explanation. It seemed like denotational semantics wasn’t the best fit for a considerable number of important tasks that were essential for writing useful programs, and which could be easily tackled by operational semantics. The breakthrough came from category theory. Eugenio Moggi discovered that computational effect can be mapped to monads. This turned out to be an important observation that not only gave denotational semantics a new lease on life and made pure functional programs more usable, but also shed new light on traditional programming. I’ll talk about monads later, when we develop more categorical tools.

One of the important advantages of having a mathematical model for programming is that it’s possible to perform formal proofs of correctness of software. This might not seem so important when you’re writing consumer software, but there are areas of programming where the price of failure may be exorbitant, or where human life is at stake. But even when writing web applications for the health system, you may appreciate the thought that functions and algorithms from the Haskell standard library come with proofs of correctness.

Pure and Dirty Functions

The things we call functions in C++ or any other imperative language, are not the same things mathematicians call functions. A mathematical function is just a mapping of values to values.

We can implement a mathematical function in a programming language: Such a function, given an input value will calculate the output value. A function to produce a square of a number will probably multiply the input value by itself. It will do it every time it’s called, and it’s guaranteed to produce the same output every time it’s called with the same input. The square of a number doesn’t change with the phases of the Moon.

Also, calculating the square of a number should not have a side effect of dispensing a tasty treat for your dog. A “function” that does that cannot be easily modelled as a mathematical function.

In programming languages, functions that always produce the same result given the same input and have no side effects are called pure functions. In a pure functional language like Haskell all functions are pure. Because of that, it’s easier to give these languages denotational semantics and model them using category theory. As for other languages, it’s always possible to restrict yourself to a pure subset, or reason about side effects separately. Later we’ll see how monads let us model all kinds of effects using only pure functions. So we really don’t lose anything by restricting ourselves to mathematical functions.

Examples of Types

Once you realize that types are sets, you can think of some rather exotic types. For instance, what’s the type corresponding to an empty set? No, it’s not C++ void, although this type is called Void in Haskell. It’s a type that’s not inhabited by any values. You can define a function that takes Void, but you can never call it. To call it, you would have to provide a value of the type Void, and there just aren’t any. As for what this function can return, there are no restrictions whatsoever. It can return any type (although it never will, because it can’t be called). In other words it’s a function that’s polymorphic in the return type. Haskellers have a name for it:

absurd :: Void -> a

(Remember, a is a type variable that can stand for any type.) The name is not coincidental. There is deeper interpretation of types and functions in terms of logic called the Curry-Howard isomorphism. The type Void represents falsity, and the type of the function absurd corresponds to the statement that from falsity follows anything, as in the Latin adage “ex falso sequitur quodlibet.”

Next is the type that corresponds to a singleton set. It’s a type that has only one possible value. This value just “is.” You might not immediately recognise it as such, but that is the C++ void. Think of functions from and to this type. A function from void can always be called. If it’s a pure function, it will always return the same result. Here’s an example of such a function:

int f44() { return 44; }

You might think of this function as taking “nothing”, but as we’ve just seen, a function that takes “nothing” can never be called because there is no value representing “nothing.” So what does this function take? Conceptually, it takes a dummy value of which there is only one instance ever, so we don’t have to mention it explicitly. In Haskell, however, there is a symbol for this value: an empty pair of parentheses, (). So, by a funny coincidence (or is it a coincidence?), the call to a function of void looks the same in C++ and in Haskell. Also, because of the Haskell’s love of terseness, the same symbol () is used for the type, the constructor, and the only value corresponding to a singleton set. So here’s this function in Haskell:

f44 :: () -> Integer
f44 () = 44

The first line declares that f44 takes the type (), pronounced “unit,” into the type Integer. The second line defines f44 by pattern matching the only constructor for unit, namely (), and producing the number 44. You call this function by providing the unit value ():

f44 ()

Notice that every function of unit is equivalent to picking a single element from the target type (here, picking the Integer 44). In fact you could think of f44 as a different representation for the number 44. This is an example of how we can replace explicit mention of elements of a set by talking about functions (arrows) instead. Functions from unit to any type A are in one-to-one correspondence with the elements of that set A.

What about functions with the void return type, or, in Haskell, with the unit return type? In C++ such functions are used for side effects, but we know that these are not real functions in the mathematical sense of the word. A pure function that returns unit does nothing: it discards its argument.

Mathematically, a function from a set A to a singleton set maps every element of A to the single element of that singleton set. For every A there is exactly one such function. Here’s this function for Integer:

fInt :: Integer -> ()
fInt x = ()

You give it any integer, and it gives you back a unit. In the spirit of terseness, Haskell lets you use the wildcard pattern, the underscore, for an argument that is discarded. This way you don’t have to invent a name for it. So the above can be rewritten as:

fInt :: Integer -> ()
fInt _ = ()

Notice that the implementation of this function not only doesn’t depend on the value passed to it, but it doesn’t even depend on the type of the argument.

Functions that can be implemented with the same formula for any type are called parametrically polymorphic. You can implement a whole family of such functions with one equation using a type parameter instead of a concrete type. What should we call a polymorphic function from any type to unit type? Of course we’ll call it unit:

unit :: a -> ()
unit _ = ()

In C++ you would write this function as:

template<class T>
void unit(T) {}

Next in the typology of types is a two-element set. In C++ it’s called bool and in Haskell, predictably, Bool. The difference is that in C++ bool is a built-in type, whereas in Haskell it can be defined as follows:

data Bool = True | False

(The way to read this definition is that Bool is either True or False.) In principle, one should also be able to define a Boolean type in C++ as an enumeration:

enum bool {

but C++ enum is secretly an integer. The C++11 “enum class” could have been used instead, but then you would have to qualify its values with the class name, as in bool::true and bool::false, not to mention having to include the appropriate header in every file that uses it.

Pure functions from Bool just pick two values from the target type, one corresponding to True and another to False.

Functions to Bool are called predicates. For instance, the Haskell library Data.Char is full of predicates like isAlpha or isDigit. In C++ there is a similar library that defines, among others, isalpha and isdigit, but these return an int rather than a Boolean. The actual predicates are defined in std::ctype and have the form ctype::is(alpha, c), ctype::is(digit, c), etc.


  1. Define a higher-order function (or a function object) memoize in your favorite language. This function takes a pure function f as an argument and returns a function that behaves almost exactly like f, except that it only calls the original function once for every argument, stores the result internally, and subsequently returns this stored result every time it’s called with the same argument. You can tell the memoized function from the original by watching its performance. For instance, try to memoize a function that takes a long time to evaluate. You’ll have to wait for the result the first time you call it, but on subsequent calls, with the same argument, you should get the result immediately.
  2. Try to memoize a function from your standard library that you normally use to produce random numbers. Does it work?
  3. Most random number generators can be initialized with a seed. Implement a function that takes a seed, calls the random number generator with that seed, and returns the result. Memoize that function. Does it work?
  4. Which of these C++ functions are pure? Try to memoize them and observe what happens when you call them multiple times: memoized and not.
    1. The factorial function from the example in the text.
    2. std::getchar()
    3. bool f() { 
          std::cout << "Hello!" << std::endl;
          return true; 
    4. int f(int x)
          static int y = 0;
          y += x;
          return y;
  5. How many different functions are there from Bool to Bool? Can you implement them all?
  6. Draw a picture of a category whose only objects are the types Void, () (unit), and Bool; with arrows corresponding to all possible functions between these types. Label the arrows with the names of the functions.


  1. Nils Anders Danielsson, John Hughes, Patrik Jansson, Jeremy Gibbons, Fast and Loose Reasoning is Morally Correct. This paper provides justification for ignoring bottoms in most contexts.

Next: Categories Great and Small

Ferrari museum in Maranello

Ferrari museum in Maranello

I was recently visiting the Ferrari museum in Maranello, Italy, where I saw this display of telemetry data from racing cars.

Telemetry data from a racing car. The contour of the racing track is shown in the upper left corner and various data channels are displayed below.

Telemetry data from a racing car. The racing track is displayed in the upper left corner and various data channels are displayed below.

The processing and the display of telemetry data is an interesting programming challenge. It has application in space exploration (as in, when you land a probe on a surface of a comet), medicine, and the military. The same techniques are used in financial systems where streams carry information about stock prices, commodity prices, and currency exchange rates.

It’s also a problem that lends itself particularly well to functional programming. If you are one of these shops working with telemetry, and you have to maintain legacy code written in imperative style, you might be interested in an alternative approach, especially if you are facing constant pressure to provide more sophisticated analysis tools and introduce concurrency to make the system faster and more responsive.

What all these applications have in common is that they deal with multiple channels generating streams of data. The data has to be either displayed in real time or stored for later analysis and processing. It’s pretty obvious to a functional programmer that channels are functors, and that they should be composed using combinators. In fact this observation can drive the whole architecture. The clincher is the issue of concurrency: retrofitting non-functional code to run in parallel is a lost battle — it’s almost easier to start from scratch. But treating channels as immutable entities makes concurrency almost an after-thought.

Everything is a Number

The most basic (and totally wrong) approach is to look at telemetry as streams of numbers. This is the assembly language of data processing. When everything is a number and you can apply your math any way you wish. The problem is that you are throwing away a lot of useful information. You want to use types as soon as possible to encode additional information and to prevent nonsensical operations like adding temperature to velocity.

In an engineering application, the least you can do is to keep track of units of measurement. You also want to distinguish between channels that produce floating-point numbers and ones that produce integers, or Booleans, or strings. This immediately tells you that a channel should be a polymorphic data structure. You should be able to stream any type of data, be it bytes, complex numbers, or vectors.

Everything is an Object

To an object-oriented mind it looks very much like a channel should be an object that is parameterized by the type of data it carries. And as an object it should have some methods. We need the get method to access the current value, and the next method to increment the position in the stream. As an imperative programmer you might also be tempted to provide a mutator, set. If you ever want your program to be concurrent, don’t even think about it!

If you’re a C++ programmer, you may overload some operators, and use * and ++ instead. That would make a channel look more like a forward iterator. But whatever you call it, a functional programmer will recognize it as a list, with the head and tail functionality.

Everything is a List

Let’s talk about lists, because there is a lot of misunderstanding around them. When people think of lists in imperative languages they think about storage. A list is probably the worst data type for storing data. Imperative programmers naturally assume that functional programmers, who use lists a lot, must be crazy. They are not! A Haskell list is almost never used for storing bulk data. A list is either an interface to data that is stored elsewhere, or a generator of data. Haskell is a lazy functional language, so what looks like a data structure is really a bunch of functions that provide data on demand.

That’s why I wouldn’t hesitate to implement channels as lists in Haskell. As an added bonus, lists can provide a pull interface to data that is being pushed. Reactive programs that process streams of data may be written as if all the data were already there — the event handler logic can be hidden inside the objects that generate the data. And this is just what’s needed for live telemetry data.

Obviously, functional programming is easier in Haskell than in C++, C#, or Java. But given how much legacy software there is, it could be a lost cause to ask management to (a) throw away existing code and start from scratch, (b) retrain the team to learn a new language, and (c) deal with completely new performance characteristics, e.g., lazy evaluation and garbage collection. So, realistically, the best we can do is to keep introducing functional methods into imperative languages, at least for the time being. It doesn’t mean that Haskell should’t play an important role in it. Over and over again I find myself prototyping solutions in Haskell before translating them into C++. The added effort pays back handsomely through faster prototyping, better code quality, and fewer bugs to chase. So I would highly recommend to every imperative programmer to spend, say, an hour a day learning and playing with Haskell. You’d be amazed how it helps in developing your programming skills.

Everything is a Functor

So, if you’re an object oriented programmer, you’ll probably implement a channel as something like this:

template <class T> Channel {
    virtual T get();
    virtual bool next();

and then get stuck. With this kind of interface, the rest of your program is bound to degenerate into a complex system of loops that extract data from streams and process them, possibly stuffing it back into other streams.

Instead, I propose to try the functional way. I will show you some prototype code in Haskell, but mostly explain how things work, so a non-Haskell programmer can gain some insight.

Here’s the definition of a polymorphic channel type, Chan:

data Chan a = Chan [a]

where a plays the role of a type variable, analogous to T in the C++ code above. The right hand side of the equal sign defines the constructor Chan that takes a list as an argument. Constructors are used both for constructing and for pattern matching. The notation [a] means a list of a.

The details don’t really matter, as long as you understand that the channel is implemented as a list. Also, I’m making things a little bit more explicit for didactic purposes. A Haskell programmer would implement the channel as a type alias, type, rather than a separate type.

Rule number one of dealing with lists is: try not to access their elements in a loop (or, using the functional equivalent of a loop — recursively). Operate on lists holistically. For instance, one of the most common operations on lists is to apply a function to every element. That means we want our Chan to be a functor.

A functor is a polymorphic data type that supports operating on its contents with a function. In the case of Chan that’s easy, since a list itself is a functor. I’ll be explicit here, again for didactic reasons. This is how you make Chan an instance of the Functor class by defining how to fmap a function f over it:

instance Functor Chan where
    fmap f (Chan xs) = Chan (map f xs)

Here, map is a library function that applies f to every element of the list. This is very much like applying C++ std::transform to a container, except that in Haskell everything is evaluated lazily, so you can apply fmap to an infinite list, or to a list that is not there yet because, for instance, it’s being generated in real time from incoming telemetry.

Everything is a Combinator

Let’s see how far we can get with this channel idea. The next step is to be able to combine multiple channels to generate streams of derived data. For instance, suppose that you have one channel from a pressure gauge, and another providing volume data, and you want to calculate instantaneous temperature using the ideal gas equation.

Let’s start with defining some types. We want separate types for quantities that are measured using different units. Once more, I’m being didactic here, because there are ready-made Haskell libraries that use so called phantom types to encode measurement units. Here I’ll do it naively:

data Pressure = Pascal Float
data Volume   = Meter3 Float
data Temp     = Kelvin Float

I’ll also define the ideal gas constant:

constR = 8.314472 -- J/(mol·K)

Here’s the function that calculates the temperature of ideal gas:

getT :: Float -> Pressure -> Volume -> Temp
getT n (Pascal p) (Meter3 v) = Kelvin (p * v / (n * constR))

The question is, how can we apply this function to the pressure and volume channels to get the temperature channel? We know how to apply a function to a single channel using fmap, but here we have to work with two channels. Fortunately, a channel is not just a functor — it’s an applicative functor. It defines the action of multi-argument functions on multiple channels. I’ll give you a Haskell implementation, but you should be able to do the same in C++ by overloading fmap or transform.

instance Applicative Chan where
    pure x = Chan (repeat x)
    (Chan fs) <*> (Chan xs) = Chan (zipWith ($) fs xs)

The Applicative class defines two functions. One is called pure, and it creates a constant channel from a value by repeatedly returning the same value. The other is a binary operator <*> that applies a channel of functions (yes, you can treat functions the same way you treat any other data) to a channel of values. The function zipWith applies, pairwise, functions to arguments using the function application operator ($).

Again, the details are not essential. The bottom line is that this allows us to apply our function getT to two channels (actually, three channels, since we also need to provide the amount of gas in moles — here I’m assuming 0.1 moles).

chT :: Chan Pressure -> Chan Volume -> Chan Temp
chT chP chV = getT <$> pure 0.1 <*> chP <*> chV

Such functions that combine channels into new channels are called combinators, and an applicative functor makes the creation of new combinators very easy.

The combinators are not limited to producing physical quantities. They may as well produce channels of alerts, channels of pixels for display, or channels of visual widgets. You can construct the whole architecture around channels. And since we’ve been only considering functional data structures, the resulting architecture can be easily subject to parallelization.

Moving Average

But don’t some computations require mutable state? For instance, don’t you need some kind of accumulators in order to calculate, let’s say, moving averages? Let’s see how this can be done functionally.

The idea is to keep a running sum of list elements within a fixed window of size n. When advancing through the list, we will add the new incoming element to the running sum and subtract the old outgoing element. The average is just this sum divided by n.

We can use the old trick of delaying the list by n positions. We’ll pad the beginning of the delayed list with n zeros. Here’s the Haskell code:

delay :: Num a => Int -> [a] -> [a]
delay n lst = replicate n 0 ++ lst

The first line is the (optional, but very useful) type signature. The second line defines the function delay that takes the delay counter n and the list. The function returns a list that is obtained by concatenating (operator ++) the zero-filled list (replicate n 0) in front of the original list. For instance, if you start with the list [1, 2, 3, 4] and delay it by 2, you’ll get [0, 0, 1, 2, 3, 4].

The next step is to create a stream of deltas — the differences between elements separated by n positions. We do it by zipping two lists: the original and the delayed one.

zip lst (delay n lst)

The function zip pairs elements from the first list with the elements from the second list.

Continuing with our example, the zipping will produce the pairs [(1, 0), (2, 0), (3, 1), (4, 2)]. Notice that the left number in each pair is the incoming element that is to be added to the running sum, while the right number is the outgoing one, to be subtracted from the running sum.

Now if we subtract the two numbers in each pair we’ll get exactly the delta that has to be added to the running sum at each step. We do the subtraction by mapping the operator (-) over the list. (To make the subtraction operator (-) operate on pairs we have to uncurry it. (If you don’t know what currying is, don’t worry.)

deltas :: Num a => Int -> [a] -> [a]
deltas n lst = map (uncurry (-)) (zip lst (delay n lst))

Continuing with the example, we will get [1, 2, 2, 2]. These are the amounts by which the running sum should change at every step. (Incidentally, for n equal to one, the deltas are proportional to the derivative of the sampled stream.)

Finally, we have to start accumulating the deltas. There is a library function scanl1 that can be used to produce a list of partial sums when called with the summation operator (+).

slidingSums :: Num a => Int -> [a] -> [a]
slidingSums n lst =  scanl1 (+) (deltas n lst)

At each step, scanl1 will add the delta to the previous running sum. The “1” in its name means that it will start with the first element of the list as the accumulator. The result, in our little example, is [1, 3, 5, 7]. What remains is to divide each sum by n and we’re done:

movingAverage :: Fractional a => Int -> [a] -> [a]
movingAverage n list = map (/ (fromIntegral n)) (slidingSums n list)

Since n is an integer, it has to be explicitly converted to a fractional number before being passed to the division operator. This is done using fromIntegral. The slightly cryptic notation (/ (fromIntegral n)) is called operator section. It just means “divide by n.”

As expected, the final result for the two-element running average of [1, 2, 3, 4] is [0.5, 1.5, 2.5, 3.5]. Notice that we haven’t used any mutable state to achieve this result, which makes this code automatically thread safe. Also, because the calculation is lazy, we can calculate the moving average of an infinite list as long as we only extract a finite number of data points. Here, we are printing the first 10 points of the 5-element moving average of the list of integers from 1 to infinity.

print (take 10 (movingAverage 5 [1..]))

The result is:

[0.2, 0.6, 1.2, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]


The functional approach is applicable to designing software not only in the small but, more importantly, in the large. It captures the patterns of interaction between components and the ways they compose. The patterns I mentioned in this post, the functor and the applicative functor, are probably the most common, but functional programmers have at their disposal a large variety of patterns borrowed from various branches of mathematics. These patterns can be used by imperative programmers as well, resulting in cleaner and more maintainable software that is, by construction, multithread-ready.

I was overwhelmed by the positive response to my previous post, the Preface to Category Theory for Programmers. At the same time, it scared the heck out of me because I realized what high expectations people were placing in me. I’m afraid that no matter what I’ll write, a lot of readers will be disappointed. Some readers would like the book to be more practical, others more abstract. Some hate C++ and would like all examples in Haskell, others hate Haskell and demand examples in Java. And I know that the pace of exposition will be too slow for some and too fast for others. This will not be the perfect book. It will be a compromise. All I can hope is that I’ll be able to share some of my aha! moments with my readers. Let’s start with the basics.

A category is an embarrassingly simple concept. A category consists of objects and arrows that go between them. That’s why categories are so easy to represent pictorially. An object can be drawn as a circle or a point, and an arrow… is an arrow. (Just for variety, I will occasionally draw objects as piggies and arrows as fireworks.) But the essence of a category is composition. Or, if you prefer, the essence of composition is a category. Arrows compose, so if you have an arrow from object A to object B, and another arrow from object B to object C, then there must be an arrow — their composition — that goes from A to C.


In a category, if there is an arrow going from A to B and an arrow going from B to C then there must also be a direct arrow from A to C that is their composition. This diagram is not a full category because it’s missing identity morphisms (see later).


Arrows as Functions

Is this already too much abstract nonsense? Do not despair. Let’s talk concretes. Think of arrows, which are also called morphisms, as functions. You have a function f that takes an argument of type A and returns a B. You have another function g that takes a B and returns a C. You can compose them by passing the result of f to g. You have just defined a new function that takes an A and returns a C.

In math, such composition is denoted by a small circle between functions: g∘f. Notice the right to left order of composition. For some people this is confusing. You may be familiar with the pipe notation in Unix, as in:

lsof | grep Chrome

or the chevron >> in F#, which both go from left to right. But in mathematics and in Haskell functions compose right to left. It helps if you read g∘f as “g after f.”

Let’s make this even more explicit by writing some C code. We have one function f that takes an argument of type A and returns a value of type B:

B f(A a);

and another:

C g(B b);

Their composition is:

C g_after_f(A a)
    return g(f(a));

Here, again, you see right-to-left composition: g(f(a)); this time in C.

I wish I could tell you that there is a template in the C++ Standard Library that takes two functions and returns their composition, but there isn’t one. So let’s try some Haskell for a change. Here’s the declaration of a function from A to B:

f :: A -> B


g :: B -> C

Their composition is:

g . f

Once you see how simple things are in Haskell, the inability to express straightforward functional concepts in C++ is a little embarrassing. In fact, Haskell will let you use Unicode characters so you can write composition as:

g ∘ f

You can even use Unicode double colons and arrows:

f ∷ A → B

So here’s the first Haskell lesson: Double colon means “has the type of…” A function type is created by inserting an arrow between two types. You compose two functions by inserting a period between them (or a Unicode circle).

Properties of Composition

There are two extremely important properties that the composition in any category must satisfy.

1. Composition is associative. If you have three morphisms, f, g, and h, that can be composed (that is, their objects match end-to-end), you don’t need parentheses to compose them. In math notation this is expressed as:

h∘(g∘f) = (h∘g)∘f = h∘g∘f

In (pseudo) Haskell:

f :: A -> B
g :: B -> C
h :: C -> D
h . (g . f) == (h . g) . f == h . g . f

(I said “pseudo,” because equality is not defined for functions.)

Associativity is pretty obvious when dealing with functions, but it may be not as obvious in other categories.

2. For every object A there is an arrow which is a unit of composition. This arrow loops from the object to itself. Being a unit of composition means that, when composed with any arrow that either starts at A or ends at A, respectively, it gives back the same arrow. The unit arrow for object A is called idA (identity on A). In math notation, if f goes from A to B then

f∘idA = f


idB∘f = f

When dealing with functions, the identity arrow is implemented as the identity function that just returns back its argument. The implementation is the same for every type, which means this function is universally polymorphic. In C++ we could define it as a template:

template<class T> T id(T x) { return x; }

Of course, in C++ nothing is that simple, because you have to take into account not only what you’re passing but also how (that is, by value, by reference, by const reference, by move, and so on).

In Haskell, the identity function is part of the standard library (called Prelude). Here’s its declaration and definition:

id :: a -> a
id x = x

As you can see, polymorphic functions in Haskell are a piece of cake. In the declaration, you just replace the type with a type variable. Here’s the trick: names of concrete types always start with a capital letter, names of type variables start with a lowercase letter. So here a stands for all types.

Haskell function definitions consist of the name of the function followed by formal parameters — here just one, x. The body of the function follows the equal sign. This terseness is often shocking to newcomers but you will quickly see that it makes perfect sense. Function definition and function call are the bread and butter of functional programming so their syntax is reduced to the bare minimum. Not only are there no parentheses around the argument list but there are no commas between arguments (you’ll see that later, when we define functions of multiple arguments).

The body of a function is always an expression — there are no statements in functions. The result of a function is this expression — here, just x.

This concludes our second Haskell lesson.

The identity conditions can be written (again, in pseudo-Haskell) as:

f . id == f
id . f == f

You might be asking yourself the question: Why would anyone bother with the identity function — a function that does nothing? Then again, why do we bother with the number zero? Zero is a symbol for nothing. Ancient Romans had a number system without a zero and they were able to build excellent roads and aqueducts, some of which survive to this day.

Neutral values like zero or id are extremely useful when working with symbolic variables. That’s why Romans were not very good at algebra, whereas the Arabs and the Persians, who were familiar with the concept of zero, were. So the identity function becomes very handy as an argument to, or a return from, a higher-order function. Higher order functions are what make symbolic manipulation of functions possible. They are the algebra of functions.

To summarize: A category consists of objects and arrows (morphisms). Arrows can be composed, and the composition is associative. Every object has an identity arrow that serves as a unit under composition.

Composition is the Essence of Programming

Functional programmers have a peculiar way of approaching problems. They start by asking very Zen-like questions. For instance, when designing an interactive program, they would ask: What is interaction? When implementing Conway’s Game of Life, they would probably ponder about the meaning of life. In this spirit, I’m going to ask: What is programming? At the most basic level, programming is about telling the computer what to do. “Take the contents of memory address x and add it to the contents of the register EAX.” But even when we program in assembly, the instructions we give the computer are an expression of something more meaningful. We are solving a non-trivial problem (if it were trivial, we wouldn’t need the help of the computer). And how do we solve problems? We decompose bigger problems into smaller problems. If the smaller problems are still too big, we decompose them further, and so on. Finally, we write code that solves all the small problems. And then comes the essence of programming: we compose those pieces of code to create solutions to larger problems. Decomposition wouldn’t make sense if we weren’t able to put the pieces back together.

This process of hierarchical decomposition and recomposition is not imposed on us by computers. It reflects the limitations of the human mind. Our brains can only deal with a small number of concepts at a time. One of the most cited papers in psychology, The Magical Number Seven, Plus or Minus Two, postulated that we can only keep 7 ± 2 “chunks” of information in our minds. The details of our understanding of the human short-term memory might be changing, but we know for sure that it’s limited. The bottom line is that we are unable to deal with the soup of objects or the spaghetti of code. We need structure not because well-structured programs are pleasant to look at, but because otherwise our brains can’t process them efficiently. We often describe some piece of code as elegant or beautiful, but what we really mean is that it’s easy to process by our limited human minds. Elegant code creates chunks that are just the right size and come in just the right number for our mental digestive system to assimilate them.

So what are the right chunks for the composition of programs? Their surface area has to increase slower than their volume. (I like this analogy because of the intuition that the surface area of a geometric object grows with the square of its size — slower than the volume, which grows with the cube of its size.) The surface area is the information we need in order to compose chunks. The volume is the information we need in order to implement them. The idea is that, once a chunk is implemented, we can forget about the details of its implementation and concentrate on how it interacts with other chunks. In object-oriented programming, the surface is the class declaration of the object, or its abstract interface. In functional programming, it’s the declaration of a function. (I’m simplifying things a bit, but that’s the gist of it.)

Category theory is extreme in the sense that it actively discourages us from looking inside the objects. An object in category theory is an abstract nebulous entity. All you can ever know about it is how it relates to other object — how it connects with them using arrows. This is how internet search engines rank web sites by analyzing incoming and outgoing links (except when they cheat). In object-oriented programming, an idealized object is only visible through its abstract interface (pure surface, no volume), with methods playing the role of arrows. The moment you have to dig into the implementation of the object in order to understand how to compose it with other objects, you’ve lost the advantages of your programming paradigm.


  1. Implement, as best as you can, the identity function in your favorite language (or the second favorite, if your favorite language happens to be Haskell).
  2. Implement the composition function in your favorite language. It takes two functions as arguments and returns a function that is their composition.
  3. Write a program that tries to test that your composition function respects identity.
  4. Is the world-wide web a category in any sense? Are links morphisms?
  5. Is Facebook a category, with people as objects and friendships as morphisms?
  6. When is a directed graph a category?

Next: Types and Functions.