Preface

In my previous blog post I used, without proof, the fact that the initial algebra of the functor I + h \otimes - is a free monoid. The proof of this statement is not at all trivial and, frankly, I would never have been able to work it out by myself. I was lucky enough to get help from a mathematician, Alex Campbell, who sent me the proof he extracted from the paper by G. M. Kelly (see Bibliography).

I worked my way through this proof, filling some of the steps that might have been obvious to a mathematician, but not to an outsider. I even learned how to draw diagrams using the TikZ package for LaTeX.

What I realized was that category theorists have developed a unique language to manipulate mathematical formulas: the language of 2-dimensional diagrams. For a programmer interested in languages this is a fascinating topic. We are used to working with grammars that essentially deal with string substitutions. Although diagrams can be serialized—TikZ lets you do it—you can’t really operate on diagrams without laying them out on a page. The most common operation—diagram pasting—involves combining two or more diagrams along common edges. I am amazed that, to my knowledge, there is no tool to mechanize this process.

In this post you’ll also see lots of examples of using the same diagram shape (the free-construction diagram, or the algebra-morphism diagram), substituting new expressions for some of the nodes and edges. Not all substitutions are valid and I’m sure one could develop some kind of type system to verify their correctness.

Because of proliferation of diagrams, this blog post started growing out of proportion, so I decided to split it into two parts. If you are serious about studying this proof, I strongly suggest you download (or even print out) the PDF version of this blog.

Part I: Free Algebras

Introduction

Here’s the broad idea: The initial algebra that defines a free monoid is a fixed point of the functor I + h \otimes -, which I will call the list functor. Roughly speaking, it’s the result of replacing the dash in the definition of the functor with the result of the replacement. For instance, in the first iteration we would get:

I + h \otimes (I + h \otimes -) \cong I + h + h \otimes h \otimes -

I used the fact that I is the unit of the tensor product, the associativity of \otimes (all up to isomorphism), and the distributivity of tensor product over coproduct.

Continuing this process, we would arrive at an infinite sum of powers of h:

m = I + h + h \otimes h + h \otimes h \otimes h + ...

Intuitively, a free monoid is a list of hs, and this representation expresses the fact that a list is either trivial (corresponding to the unit I), or a single h, or a product of two hs, and so on…

Let’s have a look at the structure map of the initial algebra of the list functor:

I + h \otimes m \to m

Mapping out of a coproduct (sum) is equivalent to defining a pair of morphisms \langle \pi, \sigma \rangle:

\pi \colon I \to m

\sigma \colon h \otimes m \to m

the second of which may, in itself, be considered an algebra for the product functor h \otimes -.

Our goal is to show that the initial algebra of the list functor is a monoid so, in particular, it supports multiplication:

\mu \colon m \otimes m \to m

Following our intuition about lists, this multiplication corresponds to list concatenation. One way of concatenating two lists is to keep multiplying the second list by elements taken from the first list. This operation is described by the application of our product functor h \otimes -. Such repetitive application of a functor is described by a free algebra.

There is just one tricky part: when concatenating two lists, we have to disassemble the left list starting from the tail (alternatively, we could disassemble the right list from the head, but then we’d have to append elements to the tail of the left list, which is the same problem). And here’s the ingenious idea: you disassemble the left list from the head, but instead of applying the elements directly to the right list, you turn them into functions that prepend an element. In other words you convert a list of elements into a (reversed) list of functions. Then you apply this list of functions to the right list one by one.

This conversion is only possible if you can trade product for function — the process we know as currying. Currying is possible if there is an adjunction between the product and the exponential, a.k.a, the internal hom, [k, n] (which generalizes the set of functions from k to n):

C(m \otimes k, n) \cong C(m, [k, n])

We’ll assume that the underlying category C is monoidal closed, so that we can curry morphisms that map out from the tensor product:

g \colon m \otimes k \to n

\bar{g} \colon m \to [k, n]

(In what follows I’ll be using the overbar to denote the curried version of a morphism.)

The internal hom can also be defined using a universal construction, see Fig. 1. The morphism eval corresponds to the counit of the adjunction (although the universal construction is more general than the adjunction).

Fig. 1. Universal construction of the internal hom [k, n]. For any object m and a morphism g \colon m \otimes k \to n there is a unique morphism \bar{g} (the curried version of g) which makes the triangle commute.

The hard part of the proof is to show that the initial algebra produces a free monoid, which is a free object in the category of monoids. I’ll start by defining the notion of a free object.

Free Objects

You might be familiar with the definition of a free construction as the left adjoint to the forgetful functor. Fig 2 illustrates the essence of such an adjunction.

Fig. 2. Free/forgetful adjunction

The left hand side is in some category D of structured objects: algebras, monoids, monads, etc. The right hand side is in the underlying category C, often the category of sets. The adjunction establishes a one-to-one correspondence between sets of morphisms, of which g and f are examples. If U is the forgetful functor, then F is the free functor, and the object F x is called the free object generated by x. The adjunction is an isomorphism of hom-sets, natural in both x and z:

D(F x, z) \cong C(x, U z)

Unfortunately, this description doesn’t quite work for some important cases, like free monads. In the case of free monads, the right category is the category of endofunctors, and the left category is the category of monads. Because of size issues, not every endofunctor on the right generates a free monad on the left.

It turns out that there is a weaker definition of a free object that doesn’t require a full blown adjunction; and which reduces to it, when the adjunction can be defined globally.

Let’s start with the object x on the right, and try to define the corresponding free object F x on the left (by abuse of notation I will call this object F x, even if there is no functor F). For our definition, we need a condition that would work universally for any object z, and any morphism f from x to U z.

We are still missing one important ingredient: something that would tell us that x acts as a set of generators for F x. This property can be expressed by providing a morphism that inserts x into U (F x)—the object underlying the free object. In the case of an adjunction, this morphism happens to be the component of the unit natural transformation:

\eta \colon Id \to U \circ F

where Id is the identity functor (see Fig. 3).

Fig. 3. Unit of adjunction

The important property of the unit of adjunction \eta is that it can be used to recover the mapping from the left hom-set to the right hom-set in Fig. 2. Take a morphism g \colon F x \to z, lift it using U, and compose it with \eta_x. You get a morphism f \colon x \to U z:

f = U g \circ \eta_x

In the absence of an adjunction, we’ll make the existence of the morphism \eta_x part of the definition of the free object.

Definition. {Free object.}
A free object on x consists of an object F x and a morphism \eta_x \colon x \to U (F x) such that, for every object z and a morphism f \colon x \to U z, there is a unique morphism g \colon F x \to z such that:

U g \circ \eta_x = f

The diagram in Fig. 4 illustrates this definition. It is essentially a composition of the two previous diagrams, except that we don’t require the existence of a global mapping, let alone a functor, F.

Fig. 4. Definition of a free object F x

The morphism \eta_x is called the universal mapping, because it’s independent of z and f. The equation:

U g \circ \eta_x = f

is called the universal condition, because it works universally for every z and f. We say that g is induced by the morphism f.

There is a standard trick that uses the universal condition: Suppose you have two morphisms g and g' from the universal object to some z. To prove that they are equal, it’s enough to show that they are both induced by the same f. Showing the equality:

U g \circ \eta_x = U g' \circ \eta_x

is often easier because it lets you work in the simpler, underlying category.

Free Algebras

As I mentioned in the introduction, we are interested in algebras for the product functor: h \otimes -, which we’ll call h-algebras. Such an algebra consists of a carrier n and a structure map:

\nu \colon h \otimes n \to n

For every h, h-algebras form a category; and there is a forgetful functor U that maps an h-algebra to its carrier object n. We can therefore define a free algebra as a free object in the category of h-algebras, which may or may not be generalizable to a full-blown free/forgetful adjunction. Fig. 5 shows the universal condition that defines such a free algebra (m_k, \sigma) generated by an object k.

Fig. 5. Free h-algebra (m_k, \sigma) generated by k

In particular, we can define an important free h-algebra generated by the identity object I. This algebra (m, \sigma) has the structure map:

\sigma \colon h \otimes m \to m

and is described by the diagram in Fig. 6:

Fig. 6. Free h-algebra (m, \sigma) generated by I

Its universal condition reads:

g \circ \pi = f

By definition, since g is an algebra morphism, it makes the diagram in Fig. 7 commute:

Fig. 7. g is an h-algebra morphism

We use a notational convenience: h \otimes g is the lifting of the morphism g by the product functor h \otimes -. This might be confusing at first, because it looks like we are multiplying an object h by a morphism g. One way to parse it is to consider that, if we keep the first argument to the tensor product constant, then it’s a functor in the second component, symbolically h \otimes -. Since it’s a functor, we can use it to lift a morphism g, which can be notated as h \otimes g.

Alternatively, we can exploit the fact that tensor product is a bifunctor, and therefore it may lift a pair of morphism, as in id_h \otimes g; and h \otimes g is just a shorthand notation for this.

Bifunctoriality also means that the tensor product preserves composition and identity in both arguments. We’ll use these facts extensively later, especially as the basis for string diagrams.

The important property of m is that it also happens to be the initial algebra for the list functor I + h \otimes -. Indeed, for any other algebra with the carrier n and the structure map a pair \langle f, \nu \rangle, there exists a unique g given by Fig 6, such that the diagram in Fig. 8 commutes:

Fig. 8. Initiality of the algebra (m, \langle \pi, \sigma \rangle) for the functor I + h \otimes -.

Here, inl and inr are the two injections into the coproduct.

If you visualize m as the sum of all powers of h, \pi inserts the unit I (zeroth power) into it, and \sigma inserts the sum of non-zero powers.

\langle \pi, \sigma \rangle \colon I + h \otimes m \to m

The advantage of this result is that we can concentrate on studying the simpler problem of free h-algebras rather than the problem of initial algebras for the more complex list functor.

Example

Here’s some useful intuition behind h-algebras. Think of the functor (h \otimes -) as a means of forming formal expressions. These are very simple expressions: you take a variable and pair it (using the tensor product) with h. To define an algebra for this functor you pick a particular type n for your variable (i.e., the carrier object) and a recipe for evaluating any expression of type h \otimes n (i.e., the structure map).

Let’s try a simple example in Haskell. We’ll use pairs (and tuples) as our tensor product, with the empty tuple () as unit (up to isomorphism). An algebra for a functor f is defined as:

type Algebra f a = f a -> a

Consider h-algebras for a particular choice of h: the type of real numbers Double. In other words, these are algebras for the functor ((,) Double). Pick, as your carrier, the type of vectors:

data Vector = Vector { x :: Double
                     , y :: Double 
                     } deriving Show

and the structure map that scales a vector by multiplying it by a number:

vecAlg :: Algebra ((,) Double) Vector
vecAlg (a, v) = Vector { x = a * x v
                       , y = a * y v }

Define another algebra with the carrier the set of real numbers, and the structure map multiplication by a number.

mulAlg :: Algebra ((,) Double) Double
mulAlg (a, x) = a * x

There is an algebra morphism from vecAlg to mulAlg, which takes a vector and maps it to its length.

algMorph :: Vector -> Double
algMorph v = sqrt((x v)^2 + (y v)^2)

This is an algebra morphism, because it doesn’t matter if you first calculate the length of a vector and then multiply it by a, or first multiply the vector by a and then calculate its length (modulo floating-point rounding).

A free algebra has, as its carrier, the set of all possible expressions, and an evaluator that tells you how to convert a functor-ful of such expressions to another valid expression. A good analogy is to think of the functor as defining a grammar (as in BNF grammar) and a free algebra as a language generated by this grammar.

You can generate free h-expressions recursively. The starting point is the set of generators k as the source of variables. Applying the functor to it produces h \otimes k. Applying it again, produces h \otimes h \otimes k, and so on. The whole set of expressions is the infinite coproduct (sum):

k + h \otimes k + h \otimes h \otimes k + ...

An element of this coproduct is either an element of k, or an element of the product h \otimes k, and so on…

The universal mapping injects the set of generators k into the set of expressions (here, it would be the leftmost injection into the coproduct).

In the special case of an algebra generated by the unit I, the free algebra simplifies to the power series:

I + h + h \otimes h + ...

and \pi injects I into it.

Continuing with our example, let’s consider the free algebra for the functor ((,) Double) generated by the unit (). The free algebra is, symbolically, an infinite sum (coproduct) of tuples:

data Expr =
    () 
  | Double 
  | (Double, Double) 
  | (Double, Double, Double) 
  | ...

Here’s an example of an expression:

e = (1.1, 2.2, 3.3)

As you can see, free expressions are just lists of numbers. There is a function that inserts the unit into the set of expressions:

pi :: () -> Expr
pi () = ()

The free evaluator is a function (not actual Haskell):

sigma (a, ()) = a
sigma (a, x)  = (a, x)
sigma (a, (x, y)) = (a, x, y)
sigma (a, (x, y, z)) = (a, x, y, z)
...

Let’s see how the universal property works in our example. Let’s try, as the target (n, \nu), our earlier vector algebra:

vecAlg :: Algebra ((,) Double) Vector
vecAlg (a, v) = Vector { x = a * x v
                       , y = a * y v }

The morphism f picks some concrete vector, for instance:

f :: () -> Vector
f () = Vector 3 4

There should be a unique morphism of algebras g that takes an arbitrary expression (a list of Doubles) to a vector, such that g . pi = f picks our special vector:

Vector 3 4

In other words, g must take the empty tuple (the result of pi) to Vector 3 4. The question is, how is g defined for an arbitrary expression? Look at the diagram in Fig. 7 and the commuting condition it implies:

g \circ \sigma = \nu \circ (h \otimes g)

Let’s apply it to an expression (a, ()) (the action of the functor (h, -) on ()). Applying sigma to it produces a, followed by the action of g resulting in g a. This should be the same as first applying the lifted (id, g) acting on (a, ()), which gives us (a, Vector 3 4); followed by vecAlg, which produces Vector (a * 3) (a * 4). Together, we get:

g a = Vector (a * 3) (a * 4)

Repeating this process gives us:

g :: Expr -> Vector
g () = Vector 3 4
g a  = Vector (a * 3) (a * 4)
g (a, b) = Vector (a * b * 3) (a * b * 4)
...

This is the unique g induced by our f.

Properties of Free Algebras

Here are some interesting properties that will help us later: h-algebras are closed under multiplication and exponentiation. If (n, \nu) is an h-algebra, then there are also h-algebras with the carriers n \otimes k and [k, n], for an arbitrary object k. Let’s find their structure maps.

The first one should be a mapping:

h \otimes n \otimes k \to n \otimes k

which we can simply take as the lifting of \nu by the tensor product: \nu \otimes k.

The second one is:

\tau_{k} \colon h \otimes [k, n] \to [k, n]

which can be uncurried to:

h \otimes [k, n] \otimes k \to n

We have at our disposal the counit of the hom-adjunction:

eval \colon [k, n] \otimes k \to n

which we can use in combination with \nu:

\nu \circ (h \otimes eval)

to implement the (uncurried) structure map.

Here’s the same construction for Haskell programmers. Given an algebra:

nu :: Algebra ((,) h) n

we can create two algebras:

alpha :: Algebra ((,) h) (n, k)
alpha (a, (n, k)) = (nu (a, n), k)

and:

tau :: Algebra ((,) h) (k -> n)
tau (a, kton) = k -> nu (a, kton k)

These two algebras are related through an adjunction in the category of h-algebras:

Alg\big((n \otimes k, \nu \otimes k), (l, \lambda)\big) \cong Alg\big((n, \nu), ([k, l], \tau_{k})\big)

which follows directly from hom-adjunction acting on carriers.

C(n \otimes k, l) \cong C(n, [k, l])

Finally, we can show how to generate free algebras from arbitrary generators. Intuitively, this is obvious, since it can be described as the application of distributivity of tensor product over coproduct:

k + h \otimes k + h \otimes h \otimes k + ... =  (I + h + h \otimes h + ...) \otimes k

More formally, we have:

Proposition.
If (m, \sigma) is is the free h-algebra generated by I, then (m \otimes k, \sigma \otimes k) is the free h-algebra generated by k with the universal map given by \pi \otimes k.

Proof.
Let’s show the universal property of (m \otimes k, \sigma \otimes k). Take any h-algebra (n, \nu) and a morphism:

f \colon k \to n

We want to show that there is a unique g \colon m \otimes k \to n that closes the diagram in Fig. 9.

Fig. 9. Free h-algebra generated by k \cong I \otimes k

I have rewritten f (taking advantage of the isomorphism I \otimes k \cong k), as:

f \colon I \otimes k \to n

We can curry it to get:

\bar{f} \colon I \to [k, n]

The intuition is that the morphism \bar{f} selects an element of the internal hom [k, n] that corresponds to the original morphism f \colon k \to n.

We’ve seen earlier that there exists an h-algebra with the carrier [k, n] and the structure map \tau_k. But since m is the free h-algebra, there must be a unique algebra morphism \bar{g} (see Fig. 10):

\bar{g} \colon (m, \sigma) \to ([k, n], \tau_k)

such that:

\bar{g} \circ \pi = \bar{f}

Fig. 10. The construction of the unique morphism \bar{g}

Uncurying this \bar{g} gives us the sought after g.

The universal condition in Fig. 9:

g \circ (\pi \otimes k) = f

follows from pasting together two diagrams that define the relevant hom-adjunctions in Fig 11 (c.f., Fig. 1).

Fig. 11. The diagram defining the currying of both g and g \circ (\pi \otimes k). This is the pasting together of two diagrams that define the universal property of the internal hom [k, n], one for the object I and one for the object m.


\square

The immediate consequence of this proposition is that, in order to show that two h-algebra morphisms g, g' \colon m \otimes k \to n are equal, it’s enough to show the equality of two regular morphisms:

g \circ (\pi \otimes k) = g' \circ (\pi \otimes k) \colon k \to n

(modulo isomorphism between k and I \otimes k). We’ll use this property later.

It’s instructive to pick apart this proof in the simpler Haskell setting. We have the target algebra:

nu :: Algebra ((,) h) n

There is a related algebra [k, n] of the form:

tau :: Algebra ((,) h) (k -> n)
tau (a, kton) = k -> nu (a, kton k)

We’ve analyzed, previously, a version of the universal construction of g, which we can now generalize to Fig. 10. We can build up the definition of \bar{g}, starting with the condition \bar{g} \circ \pi = \bar{f}. Here, \bar{f} selects a function from the hom-set: this function is our f. That gives us the action of g on the unit:

g :: Expr -> (k -> n)
g () = f

Next, we make use of the fact that \bar{g} is an algebra morphism that satisfies the commuting condition:

\bar{g} \circ \sigma = \tau \circ (h \otimes \bar{g})

As before, we apply this equation to the expression (a, ()). The left hand side produces g a, while the right hand side produces tau (a, f). Next, we
apply the same equation to (a, (b, ())). The left hand side produces g (a, b). The right hand produces tau (a, tau (b, f)), and so on. Applying the definition of tau, we get:

g :: Expr -> (k -> n)
g () = f
g a  = k -> nu (a, f k)
g (a, b) = k -> nu (a, nu (b, f k))
...

Notice the order reversal in function application. The list that is the argument to g is converted to a series of applications of nu, with list elements in reverse order. We first apply nu (b, -) and then nu (a, -). This reversal is crucial to implementing list concatenation, where nu will prepend elements of the first list to the second list. We’ll see this in the second installment of this blog post.

Bibliography

  1. G.M. Kelly, A Unified Treatment of Transfinite Constructions for Free Algebras, Free Monoids, Colimits, Associated Sheaves, and so On. Bulletin of the Australian Mathematical Society, vol. 22, no. 01, 1980, p. 1.