Category Theory



This post is based on the talk I gave at Functional Conf 2022. There is a video recording of this talk.

Disclaimers

Data types may contain secret information. Some of it can be extracted, some is hidden forever. We’re going to get to the bottom of this conspiracy.

No data types were harmed while extracting their secrets.

No coercion was used to make them talk.

We’re talking, of course, about unsafeCoerce, which should never be used.

Implementation hiding

The implementation of a function, even if it’s available for inspection by a programmer, is hidden from the program itself.

What is this function, with the suggestive name double, hiding inside?

 x  | double x
----+----------
 2  |    4
 3  |    6
 -1 |   -2

Best guess: It’s hiding 2. It’s probably implemented as:

double x = 2 * x

How would we go about extracting this hidden value? We can just call it with the unit of multiplication:

double 1
> 2

Is it possible that it’s implemented differently (assuming that we’ve already checked it for all values of the argument)? Of course! Maybe it’s adding one, multiplying by two, and then subtracting two. But whatever the actual implementation is, it must be equivalent to multiplication by two. We say that the implementation is isomorphic to multiplying by two.

Functors

A functor is a data type that hides things of type a. Being a functor means that it’s possible to modify its contents using a function. That is, if we’re given a function a->b and a functorful of a‘s, we can create a functorful of b‘s. In Haskell we define the Functor class as a type constructor equipped with the method fmap:

class Functor f where
  fmap :: (a -> b) -> f a -> f b

A standard example of a functor is a list of a‘s. The implementation of fmap applies a function g to all its elements:

instance Functor [] where
  fmap g [] = []
  fmap g (a : as) = (g a) : fmap g as

Saying that something is a functor doesn’t guarantee that it actually “contains” values of type a. But most data structures that are functors will have some means of getting at their contents. When they do, you can verify that they change their contents after applying fmap. But there are some sneaky functors.

For instance, Maybe a tells us: Maybe I have an a, maybe I don’t. But if I have it, fmap will change it to a b.

instance Functor Maybe where
  fmap g Nothing = Nothing
  fmap g (Just a) = Just (g a)

A function that produces values of type a is also a functor. A function e->a tells us: I’ll produce a value of type a if you ask nicely (that is call me with a value of type e). Given a producer of a‘s, you can change it to a producer of b‘s by post-composing it with a function g :: a -> b:

instance Functor ((->) e) where
  fmap g f = g . f
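
For instance, in GHCi, where this instance is already provided by the Prelude, mapping (+1) over the function (*2) produces a function that doubles its argument and then adds one:

> fmap (+1) (*2) 5
> 11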

Then there is the trickiest of them all, the IO functor. IO a tells us: Trust me, I have an a, but there’s no way I could tell you what it is. (Unless, that is, you peek at the screen or open the file to which the output is redirected.)

Continuations

A continuation is telling us: Don’t call us, we’ll call you. Instead of providing the value of type a directly, it asks you to give it a handler, a function that consumes an a and returns the result of the type of your choice:

type Cont a = forall r. (a -> r) -> r

You’d suspect that a continuation either hides a value of type a or has the means to produce it on demand. You can actually extract this value by calling the continuation with an identity function:

runCont :: Cont a -> a
runCont k = k id

In fact, Cont a is, for all intents and purposes, equivalent to a: it’s isomorphic to it. Indeed, given a value of type a you can produce a continuation as a closure:

mkCont :: a -> Cont a
mkCont a = \k -> k a

The two functions, runCont and mkCont, are the inverse of each other, thus establishing the isomorphism Cont a ~ a.
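
As a quick sanity check of the isomorphism, the round trip through mkCont and runCont returns the original value (the definition of Cont already requires the RankNTypes extension):

> runCont (mkCont 42)
> 42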

The Yoneda Lemma

Here’s a variation on the theme of continuations. Just like a continuation, this function takes a handler of a‘s, but instead of producing an x, it produces a whole functorful of x‘s:

type Yo f a = forall x. (a -> x) -> f x

Just like a continuation was secretly hiding a value of the type a, this data type is hiding a whole functorful of a‘s. We can easily retrieve this functorful by using the identity function as the handler:

runYo :: Functor f => Yo f a -> f a
runYo g = g id

Conversely, given a functorful of a‘s we can reconstruct Yo f a by defining a closure that fmap‘s the handler over it:

mkYo :: Functor f => f a -> Yo f a
mkYo fa = \g -> fmap g fa

Again, the two functions, runYo and mkYo, are the inverse of each other, thus establishing a very important isomorphism called the Yoneda lemma:

Yo f a ~ f a
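
As a small illustration (yo is an ad-hoc name), we can wrap a list and then run it with different handlers; the handler gets fmapped over the hidden functorful:

yo :: Yo [] Int
yo = mkYo [1, 2, 3]

> runYo yo
> [1,2,3]
> yo show
> ["1","2","3"]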

Both continuations and the Yoneda lemma are defined as polymorphic functions. The forall x in their definition means that they use the same formula for all types (this is called parametric polymorphism). A function that works for any type cannot make any assumptions about the properties of that type. All it can do is to look at how this type is packaged: Is it passed inside a list, a function, or something else. In other words, it can use the information about the form in which the polymorphic argument is passed.

Existential Types

One cannot speak of existential types without mentioning Jean-Paul Sartre.
An existential data type says: There exists a type, but I’m not telling you what it is. Actually, the type has been known at the time of construction, but then all its traces have been erased. This is only possible if the data constructor is itself polymorphic. It accepts any type and then immediately forgets what it was.

Here’s an extreme example: an existential black hole. Whatever falls into it (through the constructor BH) can never escape.

data BlackHole = forall a. BH a

Even a photon can’t escape a black hole:

bh :: BlackHole
bh = BH "Photon"

We are familiar with data types whose constructors can be undone, for instance by pattern matching. In type theory we define types by providing introduction and elimination rules. These rules tell us how to construct and how to deconstruct types.

But existential types erase the type of the argument that was passed to the (polymorphic) constructor so they cannot be deconstructed. However, not all is lost. In physics, we have Hawking radiation escaping a black hole. In programming, even if we can’t peek at the existential type, we can extract some information about the structure surrounding it.

Here’s an example: We know we have a list, but of what?

data SomeList = forall a. SomeL [a]

It turns out that to undo a polymorphic constructor we can use a polymorphic function. We have at our disposal functions that act on lists of arbitrary type, for instance length:

length :: forall a. [a] -> Int

The use of a polymorphic function to “undo” a polymorphic constructor doesn’t expose the existential type:

len :: SomeList -> Int
len (SomeL as) = length as

Indeed, this works:

someL :: SomeList
someL = SomeL [1..10]
> len someL
> 10

Extracting the tail of a list is also a polymorphic function. We can use it on SomeList without exposing the type a:

trim :: SomeList -> SomeList
trim (SomeL []) = SomeL []
trim (SomeL (a: as)) = SomeL as

Here, the tail of the (non-empty) list is immediately stashed inside SomeList, thus hiding the type a.

But this will not compile, because it would expose a:

bad :: SomeList -> a
bad (SomeL as) = head as

Producer/Consumer

Existential types are often defined using producer/consumer pairs. The producer is able to produce values of the hidden type, and the consumer can consume them. The role of the client of the existential type is to activate the producer (e.g., by providing some input) and to pass the result (without looking at it) directly to the consumer.

Here’s a simple example. The producer is just a value of the hidden type a, and the consumer is a function consuming this type:

data Hide b = forall a. Hide a (a -> b)

All the client can do is to match the consumer with the producer:

unHide :: Hide b -> b
unHide (Hide a f) = f a

This is how you can use this existential type. Here, Int is the visible type, and Char is hidden:

secret :: Hide Int
secret = Hide 'a' ord

The function ord (imported from Data.Char) is the consumer that turns the character into its ASCII code:

> unHide secret
> 97

Co-Yoneda Lemma

There is a duality between polymorphic types and existential types. It’s rooted in the duality between universal quantifiers (for all, \forall) and existential quantifiers (there exists, \exists).

The Yoneda lemma is a statement about polymorphic functions. Its dual, the co-Yoneda lemma, is a statement about existential types. Consider the following type that combines the producer of x‘s (a functorful of x‘s) with the consumer (a function that transforms x‘s to a‘s):

data CoYo f a = forall x. CoYo (f x) (x -> a)

What does this data type secretly encode? The only thing the client of CoYo can do is to apply the consumer to the producer. Since the producer has the form of a functor, the application proceeds through fmap:

unCoYo :: Functor f => CoYo f a -> f a
unCoYo (CoYo fx g) = fmap g fx

The result is a functorful of a‘s. Conversely, given a functorful of a‘s, we can form a CoYo by matching it with the identity function:

mkCoYo :: Functor f => f a -> CoYo f a
mkCoYo fa = CoYo fa id

This pair of functions, unCoYo and mkCoYo, one the inverse of the other, witnesses the isomorphism

CoYo f a ~ f a

In other words, CoYo f a is secretly hiding a functorful of a‘s.
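
For instance (coyo is an ad-hoc name), we can stash a list of Ints together with show as the consumer; extracting the contents produces the list of strings:

coyo :: CoYo [] String
coyo = CoYo ([1, 2, 3] :: [Int]) show

> unCoYo coyo
> ["1","2","3"]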

Contravariant Consumers

The informal terms producer and consumer can be given a more rigorous meaning. A producer is a data type that behaves like a functor. A functor is equipped with fmap, which lets you turn a producer of a‘s into a producer of b‘s using a function a->b.

Conversely, to turn a consumer of a‘s to a consumer of b‘s you need a function that goes in the opposite direction, b->a. This idea is encoded in the definition of a contravariant functor:

class Contravariant f where
  contramap :: (b -> a) -> f a -> f b
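
A simple example of a consumer is a predicate. The following sketch mirrors the Predicate type from Data.Functor.Contravariant:

newtype Predicate a = Predicate (a -> Bool)

instance Contravariant Predicate where
  contramap g (Predicate p) = Predicate (p . g)

Given, say, isEven :: Predicate Int (a hypothetical predicate), contramap length isEven is a predicate that tests whether a string has an even number of characters.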

There is also a contravariant version of the co-Yoneda lemma, which reverses the roles of a producer and a consumer:

data CoYo' f a = forall x. CoYo' (f x) (a -> x)

Here, f is a contravariant functor, so f x is a consumer of x‘s. It is matched with the producer of x‘s, a function a->x.

As before, we can establish an isomorphism

CoYo' f a ~ f a

by defining a pair of functions:

unCoYo' :: Contravariant f => CoYo' f a -> f a
unCoYo' (CoYo' fx g) = contramap g fx

mkCoYo' :: Contravariant f => f a -> CoYo' f a
mkCoYo' fa = CoYo' fa id

Existential Lens

A lens abstracts a device for focusing on a part of a larger data structure. In functional programming we deal with immutable data, so in order to modify something, we have to decompose the larger structure into the focus (the part we’re modifying) and the residue (the rest). We can then recreate a modified data structure by combining the new focus with the old residue. The important observation is that we don’t care what the exact type of the residue is. This description translates directly into the following definition:

data Lens' s a =
  forall c. Lens' (s -> (c, a)) ((c, a) -> s)

Here, s is the type of the larger data structure, a is the type of the focus, and the existentially hidden c is the type of the residue. A lens is constructed from a pair of functions, the first decomposing s and the second recomposing it.

Given a lens, we can construct two functions that don’t expose the type of the residue. The first is called get. It extracts the focus:

toGet :: Lens' s a -> (s -> a)
toGet (Lens' frm to) = snd . frm

The second, called set, replaces the focus with the new value:

toSet :: Lens' s a -> (s -> a -> s)
toSet (Lens' frm to) = \s a -> to (fst (frm s), a)
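
As a minimal example (sndLens' is an ad-hoc name), here’s a lens focusing on the second component of a pair, with the first component playing the role of the residue:

sndLens' :: Lens' (c, a) a
sndLens' = Lens' id id

> toGet sndLens' (1, 'a')
> 'a'
> toSet sndLens' (1, 'a') 'b'
> (1,'b')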

Notice that access to the residue is not possible. The following will not compile:

bad :: Lens' s a -> (s -> c)
bad (Lens' frm to) = fst . frm

But how do we know that a pair of a getter and a setter is exactly what’s hidden in the existential definition of a lens? To show this we have to use the co-Yoneda lemma. First, we have to identify the producer and the consumer of c in our existential definition. To do that, notice that a function returning a pair (c, a) is equivalent to a pair of functions, one returning c and another returning a. We can thus rewrite the definition of a lens as a triple of functions:

data Lens' s a = 
  forall c. Lens' (s -> c) (s -> a) ((c, a) -> s)

The first function is the producer of c‘s, so the rest will define a consumer. Recall the contravariant version of the co-Yoneda lemma:

data CoYo' f s = forall c. CoYo' (f c) (s -> c)

We can define the contravariant functor that is the consumer of c‘s and use it in our definition of a lens. This functor is parameterized by two additional types s and a:

data F s a c = F (s -> a) ((c, a) -> s)

This lets us rewrite the lens using the co-Yoneda representation, with f replaced by (partially applied) F s a:

type Lens' s a = CoYo' (F s a) s

We can now use the isomorphism CoYo' f s ~ f s. Plugging in the definition of F, we get:

Lens' s a ~ CoYo' (F s a) s
CoYo' (F s a) s ~ F s a s
F s a s ~ (s -> a, (s, a) -> s)

We recognize the two functions as the getter and the setter. Thus the existential representation of the lens is indeed isomorphic to the getter/setter pair.

Type-Changing Lens

The simple lens we’ve seen so far lets us replace the focus with a new value of the same type. But in general the new focus could be of a different type. In that case the type of the whole thing will change as well. A type-changing lens thus has the same decomposition function, but a different recomposition function:

data Lens s t a b =
  forall c. Lens (s -> (c, a)) ((c, b) -> t)

As before, this lens is isomorphic to a get/set pair, where get extracts an a:

toGet :: Lens s t a b -> (s -> a)
toGet (Lens frm to) = snd . frm

and set replaces the focus with a new value of type b to produce a t:

toSet :: Lens s t a b -> (s -> b -> t)
toSet (Lens frm to) = \s b -> to (fst (frm s), b)
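
The pair example generalizes directly (sndLens is an ad-hoc name): replacing the second component of a pair may change its type:

sndLens :: Lens (c, a) (c, b) a b
sndLens = Lens id id

> toSet sndLens (1, 'a') True
> (1,True)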

Other Optics

The advantage of the existential representation of lenses is that it easily generalizes to other optics. The idea is that a lens decomposes a data structure into a pair of types (c, a); and a pair is a product type, symbolically c \times a:

data Lens s t a b =
  forall c. Lens (s -> (c, a))
                 ((c, b) -> t)

A prism does the same for the sum data type. A sum c + a is written as Either c a in Haskell. We have:

data Prism s t a b =
  forall c. Prism (s -> Either c a)
                  (Either c b -> t)
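
For instance (rightPrism is an ad-hoc name), a prism focusing on the Right component of Either is built from two identity functions:

rightPrism :: Prism (Either c a) (Either c b) a b
rightPrism = Prism id id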

We can also combine sum and product in what is called an affine type c_1 + c_2 \times a. The resulting optic has two possible residues, c1 and c2:

data Affine s t a b =
  forall c1 c2. Affine (s -> Either c1 (c2, a))
                       (Either c1 (c2, b) -> t)

The list of optics goes on and on.

Profunctors

A producer can be combined with a consumer in a single data structure called a profunctor. A profunctor is parameterized by two types; that is, p a b is a consumer of a‘s and a producer of b‘s. We can turn a consumer of a‘s and a producer of b‘s into a consumer of s‘s and a producer of t‘s using a pair of functions, the first of which goes in the opposite direction:

class Profunctor p where
  dimap :: (s -> a) -> (b -> t) -> p a b -> p s t

The standard example of a profunctor is the function type p a b = a -> b. Indeed, we can define dimap for it by precomposing it with one function and postcomposing it with another:

instance Profunctor (->) where
  dimap f g pab = g . pab . f

Profunctor Optics

We’ve seen functions that were polymorphic in types. But polymorphism is not restricted to types. Here’s a definition of a function that is polymorphic in profunctors:

type Iso s t a b = forall p. Profunctor p =>
  p a b -> p s t

This function says: Give me any producer of b‘s that consumes a‘s and I’ll turn it into a producer of t‘s that consumes s‘s. Since it doesn’t know anything else about its argument, the only thing this function can do is to apply dimap to it. But dimap requires a pair of functions, so this profunctor-polymorphic function must be hiding such a pair:

s -> a
b -> t

Indeed, given such a pair, we can reconstruct its implementation:

mkIso :: (s -> a) -> (b -> t) -> Iso s t a b
mkIso g h = \p -> dimap g h p
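
Conversely, we can extract the hidden pair by instantiating the polymorphic function at a concrete profunctor that merely records the two functions. This is a sketch; PairOfFun and unIso are ad-hoc names, not taken from any optics library:

-- A profunctor that remembers a function into a and a function out of b
data PairOfFun a b s t = PairOfFun (s -> a) (b -> t)

instance Profunctor (PairOfFun a b) where
  -- dimap pre-composes the first function and post-composes the second
  dimap f g (PairOfFun h k) = PairOfFun (h . f) (g . k)

unIso :: Iso s t a b -> (s -> a, b -> t)
unIso iso = case iso (PairOfFun id id) of
  PairOfFun f g -> (f, g)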

All other optics have their corresponding implementation as profunctor-polymorphic functions. The main advantage of these representations is that they can be composed using simple function composition.

Main Takeaways

  • Producers and consumers correspond to covariant and contravariant functors
  • Existential types are dual to polymorphic types
  • Existential optics combine producers with consumers in one package
  • In such optics, producers decompose, and consumers recompose data
  • Functions can be polymorphic with respect to types, functors, or profunctors

A PDF version of this post is available on github.

Abstract

The co-presheaf optic is a new kind of optic that generalizes the polynomial lens. Its distinguishing feature is that it’s not based on the action of a monoidal category. Instead, the action is parameterized by functors between different co-presheaf categories. The composition of these actions corresponds to composition of functors rather than the more traditional tensor product. These functors and their composition have a representation in terms of profunctors.

Motivation

A lot of optics can be defined using the existential, or coend, representation:

\mathcal{O}\langle a, b\rangle \langle s, t \rangle = \int^{m \colon \mathcal M} \mathcal C (s, m \bullet a) \times \mathcal D ( m \bullet b, t)

Here \mathcal M is a monoidal category with an action on objects of two categories \mathcal C and \mathcal D (I’ll use the same notation for both actions). The actions are composed using the tensor product in \mathcal M:

n \bullet (m \bullet a) = (n \otimes m) \bullet a

The idea of this optic is that we have a pair of morphisms, one decomposing the source s into the action of some m on a, and the other recomposing the target t from the action of the same m on b. In most applications we pick \mathcal D to be the same category as \mathcal C.

Recently, there has been renewed interest in polynomial functors. Morphisms between polynomial functors form a new kind of optic that doesn’t neatly fit this mold. They do, however, admit an existential representation of the form:

\int^{c_{k i}} \prod_{k \in K} \mathbf{Set} \left(s_k,  \sum_{n \in N} a_n \times c_{n k} \right) \times \prod_{i \in K}  \mathbf{Set} \left(\sum_{m \in N} b_m \times c_{m i}, t_i \right)

Here the sets s_k and t_i can be treated as fibers over the set K, while the sets a_n and b_m are fibers over a different set N.

Alternatively, we can treat these fibrations as functors from discrete categories to \mathbf{Set}, that is co-presheaves. For instance a_n is the result of a co-presheaf a acting on an object n of a discrete category \mathcal N. The products over K can be interpreted as ends that define natural transformations between co-presheaves. The interesting part is that the matrices c_{n k} are fibrated over two different sets. I have previously interpreted them as profunctors:

c \colon \mathcal N^{op} \times \mathcal K \to \mathbf{Set}

In this post I will elaborate on this interpretation.

Co-presheaves

A co-presheaf category [\mathcal C, \mathbf{Set}] behaves, in many respects, like a vector space. For instance, it has a “basis” consisting of representable functors \mathcal C (r, -), in the sense that any co-presheaf is a colimit of representables. Moreover, colimit-preserving functors between co-presheaf categories are very similar to linear transformations between vector spaces. Of particular interest are functors that are left adjoint to some other functors, since left adjoints preserve colimits.
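
Concretely, the co-Yoneda lemma exhibits any co-presheaf a as a coend (a colimit) built out of representables:

a \cong \int^{n \colon \mathcal C} a(n) \times \mathcal C(n, -)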

The polynomial lens formula has a form suggestive of a vector-space interpretation. We have one vector space with vectors \vec{s} and \vec{t} and another with \vec{a} and \vec{b}. Rectangular matrices c_{n k} can be seen as components of a linear transformation between these two vector spaces. We can, for instance, write:

\sum_{n \in N} a_n \times c_{n k} = c^T a

where c^T is the transposed matrix. Transposition here serves as an analog of adjunction.

We can now re-cast the polynomial lens formula in terms of co-presheaves. We no longer interpret \mathcal N and \mathcal K as discrete categories. We have:

a, b \colon [\mathcal N, \mathbf{Set}]

s, t \colon [\mathcal K, \mathbf{Set}]

In this interpretation c is a functor between categories of co-presheaves:

c \colon [\mathcal N, \mathbf{Set}] \to [\mathcal K, \mathbf{Set}]

We’ll write the action of this functor on a co-presheaf a as c \bullet a.

We assume that this functor has a right adjoint c^{\dagger} and therefore preserves colimits:

[\mathcal K, \mathbf{Set}] (c \bullet a, t) \cong [\mathcal N, \mathbf{Set}] (a, c^{\dagger} \bullet t)

where:

c^{\dagger} \colon [\mathcal K, \mathbf{Set}] \to [\mathcal N, \mathbf{Set}]

We can now generalize the polynomial optic formula to:

\mathcal{O}\langle a, b\rangle \langle s, t \rangle = \int^{c} [\mathcal K, \mathbf{Set}] \left(s,  c \bullet a \right) \times [\mathcal K, \mathbf{Set}] \left(c \bullet b, t \right)

The coend is taken over all functors that have a right adjoint. Fortunately, there is a better representation for such functors. It turns out that colimit-preserving functors:

c \colon [\mathcal N, \mathbf{Set}] \to [\mathcal K, \mathbf{Set}]

are equivalent to profunctors (see the Appendix for the proof). Such a profunctor:

p \colon \mathcal N^{op} \times \mathcal K \to \mathbf{Set}

is given by the formula:

p \langle n, k \rangle = c ( \mathcal N(n, -)) k

where \mathcal N(n, -) is a representable co-presheaf.

The action of c can be expressed as a coend:

(c \bullet a) k = \int^{n} a(n) \times p \langle n, k \rangle

The co-presheaf optic is then a coend over all profunctors p \colon \mathcal N^{op} \times \mathcal K \to \mathbf{Set}:

\int^{p} [\mathcal K, \mathbf{Set}] \left(s,  \int^{n} a(n) \times p \langle n, - \rangle \right) \times [\mathcal K, \mathbf{Set}] \left(\int^{n'} b(n') \times p \langle n', - \rangle, t \right)

Composition

We have defined the action c \bullet a as the action of a functor on a co-presheaf. Given two composable functors:

c \colon  [\mathcal N, \mathbf{Set}] \to [\mathcal K, \mathbf{Set}]

and:

c' \colon  [\mathcal K, \mathbf{Set}] \to [\mathcal M, \mathbf{Set}]

we automatically get the associativity law:

c' \bullet (c \bullet a) = (c' \circ c) \bullet a

The composition of functors between co-presheaves translates directly to profunctor composition. Indeed, the profunctor p' \diamond p corresponding to c' \circ c is given by:

(p' \diamond p) \langle n, m \rangle = (c' \circ c) ( \mathcal N(n, -)) m

and can be evaluated to:

c' ( c ( \mathcal N(n, -))) m \cong \int^{k} c ( \mathcal N(n, -)) k \times p' \langle k, m \rangle

\cong \int^{k} p \langle n, k \rangle \times p' \langle k, m \rangle

which is the standard definition of profunctor composition.

Consider two composable co-presheaf optics, \mathcal{O}\langle a, b\rangle \langle s, t \rangle and \mathcal{O}\langle a', b' \rangle \langle a, b \rangle. The first one tells us that there exists a c and a pair of natural transformations:

l_c (s,  a ) = [\mathcal K, \mathbf{Set}] \left(s,  c \bullet a \right)

r_c (b, t) = [\mathcal K, \mathbf{Set}] \left(c \bullet b, t \right)

The second tells us that there exists a c' and a pair:

l'_{c'} (a,  a' ) = [\mathcal N, \mathbf{Set}] \left(a,  c' \bullet a' \right)

r'_{c'} (b', b) = [\mathcal N, \mathbf{Set}] \left(c' \bullet b', b \right)

The composition of the two should be an optic of the type \mathcal{O}\langle a', b'\rangle \langle s, t \rangle. Indeed, we can construct such an optic using the composition c' \circ c and a pair of natural transformations:

s \xrightarrow{l_c (s,  a )} c \bullet a \xrightarrow{c \,\circ \, l'_{c'} (a,  a')} c \bullet (c' \bullet a') \xrightarrow{assoc} (c \circ c') \bullet a'

(c \circ c') \bullet b' \xrightarrow{assoc^{-1}} c \bullet (c' \bullet b') \xrightarrow{c \, \circ \, r'_{c'} (b', b)} c \bullet b \xrightarrow{r_c (b, t)}  t

Generalizations

By duality, there is a corresponding optic based on presheaves. Also, (co-) presheaves can be naturally generalized to enriched categories, where the correspondence between left adjoint functors and enriched profunctors holds as well.

Appendix

I will show that a functor between two co-presheaf categories that has a right adjoint, and therefore preserves colimits:

c \colon [\mathcal N, \mathbf{Set}] \to [\mathcal K, \mathbf{Set}]

is equivalent to a profunctor:

p \colon \mathcal N^{op} \times \mathcal K \to \mathbf{Set}

The profunctor is given by:

p \langle n, k \rangle = c ( \mathcal N(n, -)) k

and the functor c can be recovered using the formula:

c (a) k = \int^{n'} a (n') \times p \langle n', k \rangle

where:

a \colon [\mathcal N, \mathbf{Set}]

I’ll show that these formulas are the inverse of each other. First, inserting the formula for c into the definition of p should give us p back:

\int^{n'} \mathcal N(n, -) (n') \times p\langle n', k \rangle \cong  p \langle n, k \rangle

which follows from the co-Yoneda lemma.

Second, inserting the formula for p into the definition of c should give us c back:

\int^{n'} a(n') \times c(\mathcal N(n', -)) k  \cong c (a) k

Since c preserves all colimits, and any co-presheaf is a colimit of representables, it’s enough that we prove this identity for a representable:

a (n) = \mathcal N (r, n)

We have to show that:

\int^{n'}  \mathcal N (r, n')  \times  c(\mathcal N(n', -)) k \cong  c ( \mathcal N (r, -) ) k

and this follows from the co-Yoneda lemma.


A PDF of this post is available on github.

Motivation

In this post I’ll be looking at a subcategory of \mathbf{Poly} that consists of polynomial functors in which the fibration is done over one fixed set N:

P(y) = \sum_{n \in N} s_n \times \mathbf{Set}(t_n, y)

The reason for this restriction is that morphisms between such functors, which are called polynomial lenses, can be understood in terms of monoidal actions. Optics that have this property automatically have profunctor representation. Profunctor representation has the advantage that it lets us compose optics using regular function composition.

Previously I’ve explored the representations of polynomial lenses as optics in terms of functors and profunctors on discrete categories. With just a few modifications, we can make these categories non-discrete. The trick is to replace sums with coends and products with ends; and, when appropriate, interpret ends as natural transformations.

Monoidal Action

Here’s the existential representation of a lens between polynomials in which all fibrations are over the same set N:

\mathbf{Pl}\langle s, t\rangle \langle a, b\rangle \cong

\int^{c_{k i}} \prod_{k \in N} \mathbf{Set} \left(s_k,  \sum_{n \in N} a_n \times c_{n k} \right) \times \prod_{i \in N}  \mathbf{Set} \left(\sum_{m \in N} b_m \times c_{m i}, t_i \right)

This makes the matrices c_{n k} “square.” Such matrices can be multiplied using a version of matrix multiplication.

Interestingly, this idea generalizes naturally to a setting in which N is replaced by a non-discrete category \mathcal{N}. In this setting, we’ll write the residues c_{m n} as profunctors:

c \colon \mathcal{N}^{op} \times \mathcal{N} \to \mathbf{Set}

They are objects in the monoidal category in which the tensor product is given by profunctor composition:

(c \diamond c') \langle m, n \rangle = \int^{k \colon \mathcal{N}} c \langle m, k \rangle \times c' \langle k, n \rangle

and the unit is the hom-functor \mathcal{N}(m, n). (Incidentally, a monoid in this category is called a promonad.)

When \mathcal{N} is a discrete category, these definitions decay to standard matrix multiplication:

\sum_k c_{m k} \times c'_{k n}

and the Kronecker delta \delta_{m n}.

We define the monoidal action of the profunctor c acting on a co-presheaf a as:

(c \bullet a) (m) = \int^{n \colon \mathcal{N}} a(n) \times c \langle n, m \rangle

This is reminiscent of a vector being multiplied by a matrix. Such an action of a monoidal category equips the co-presheaf category with the structure of an actegory.

A product of hom-sets in the definition of the existential optic turns into a set of natural transformations in the functor category [\mathcal{N}, \mathbf{Set}]:

\mathbf{Pl}\langle s, t\rangle \langle a, b\rangle \cong \int^{c \colon [\mathcal{N}^{op} \times \mathcal{N}, Set]}   [\mathcal{N}, \mathbf{Set}]  \left(s, c \bullet a\right)  \times  [\mathcal{N}, \mathbf{Set}]  \left(c \bullet b, t\right)

Or, using the end notation for natural transformations:

\int^{c} \left( \int_m \mathbf{Set}\left(s(m), (c \bullet a)(m)\right)  \times  \int_n \mathbf{Set} \left((c \bullet b)(n), t(n)\right) \right)

As before, we can eliminate the coend if we can isolate c in the second hom-set using a series of isomorphisms:

\int_n  \mathbf{Set} \left(\int^k b(k) \times c\langle k, n \rangle , t(n) \right)

\cong  \int_n \int_k \mathbf{Set}\left( b(k) \times c\langle k, n \rangle , t (n)\right)

\cong   \int_{n, k} \mathbf{Set}\left(c\langle k, n \rangle , [b(k), t (n)]\right)

I used the fact that a mapping out of a coend is an end. The result, after applying the Yoneda lemma to eliminate the end over k, is:

\mathbf{Pl}\langle s, t\rangle \langle a, b\rangle \cong \int_m \mathbf{Set}\left(s(m), \int^j a(j) \times [b(j), t(m)] \right)

or, with some abuse of notation:

[\mathcal{N}, \mathbf{Set}] ( s, [b, t] \bullet a)

When \mathcal{N} is discrete, this formula decays to the one for the polynomial lens.

Profunctor Representation

Since this poly-lens is a special case of a general optic, it automatically has a profunctor representation. The trick is to define a generalized Tambara module, that is a category \mathcal{T} of profunctors of the type:

P \colon [\mathcal{N}, \mathbf{Set}]^{op}  \times [\mathcal{N}, \mathbf{Set}] \to \mathbf{Set}

with additional structure given by the following family of transformations, in components:

\alpha_{c, s, t} \colon P\langle s, t \rangle \to P \left \langle c \bullet s, c \bullet t \right \rangle

The profunctor representation of the polynomial lens is then given by an end over all profunctors in this Tambara category:

\mathbf{Pl}\langle s, t\rangle \langle a, b\rangle \cong \int_{P \colon \mathcal{T}} \mathbf{Set}\left ( (U P)\langle a, b \rangle, (U P) \langle s, t \rangle \right)

where U is the obvious forgetful functor from \mathcal{T} to the underlying profunctor category.


Lenses and, more generally, optics are an example of hard-core category theory that has immediate application in programming. While working on polynomial lenses, I had a vague idea how they could be implemented in a programming language. I thought up an example of a polynomial lens that would focus on all the leaves of a tree at once. It could retrieve or modify them in a single operation. There already is a Haskell optic called traversal that could do it. It can safely retrieve a list of leaves from a tree. But there is a slight problem when it comes to replacing them: the size of the input list has to match the number of leaves in the tree. If it doesn’t, the traversal doesn’t work.

A polynomial lens adds an additional layer of safety by keeping track of the sizes of both the trees and the lists. The problem is that its implementation requires dependent types. Haskell has some support for dependent types, so I tried to work with it, but I quickly got bogged down. So I decided to bite the bullet and quickly learn Idris. This was actually easier than I expected and this post is the result.

Counted Vectors and Trees

I started with the “Hello World!” of dependent types: counted vectors. Notice that, in Idris, type signatures use a single colon rather than Haskell’s double colon. You can quickly get used to it after the compiler slaps you a few times.

data Vect : Type -> Nat -> Type where
  VNil : Vect a Z
  VCons : (x: a) -> (xs : Vect a n) -> Vect a (S n)

If you know Haskell GADTs, you can easily read this definition. In Haskell, we usually think of Nat as a “kind”, but in Idris types and values live in the same space. Nat is just an implementation of Peano arithmetic, with Z standing for zero, and (S n) for the successor of n. Here, VNil is the constructor of an empty vector of size Z, and VCons prepends a value of type a to the tail of size n resulting in a new vector of size (S n). Notice that Idris is much more explicit about types than Haskell.

The power of dependent types is in very strict type checking of both the implementation and of usage of functions. For instance, when mapping a function over a vector, we can make sure that the result is the same size as the argument:

mapV : (a -> b) -> Vect a n -> Vect b n
mapV f VNil = VNil
mapV f (VCons a v) = VCons (f a) (mapV f v)

When concatenating two vectors, the length of the result must be the sum of the two lengths, (plus m n):

concatV : Vect a m -> Vect a n -> Vect a (plus m n)
concatV VNil v = v
concatV (VCons a w) v = VCons a (concatV w v)

Similarly, when splitting a vector in two, the lengths must match, too:

splitV : (n : Nat) -> Vect a (plus n m) -> (Vect a n, Vect a m)
splitV Z v = (VNil, v)
splitV (S k) (VCons a v') = let (v1, v2) = splitV k v'
                            in (VCons a v1, v2)

Here’s a more complex piece of code that implements insertion sort:

sortV : Ord a => Vect a n -> Vect a n
sortV VNil = VNil
sortV (VCons x xs) = let xsrt = sortV xs 
                     in (ins x xsrt)
  where
    ins : Ord a => (x : a) -> (xsrt : Vect a n) -> Vect a (S n)
    ins x VNil = VCons x VNil
    ins x (VCons y xs) = if x < y then VCons x (VCons y xs)
                                  else VCons y (ins x xs)

In preparation for the polynomial lens example, let’s implement a node-counted binary tree. Notice that we are counting nodes, not leaves. That’s why the node count for Node is the sum of the node counts of the children plus one:

data Tree : Type -> Nat -> Type where
  Empty : Tree a Z
  Leaf  : a -> Tree a (S Z)
  Node  : Tree a n -> Tree a m -> Tree a (S (plus m n))

All this is not much different from what you’d see in a Haskell library.

Existential Types

So far we’ve been dealing with functions that return vectors whose lengths can be easily calculated from the inputs and verified at compile time. This is not always possible, though. In particular, we are interested in retrieving a vector of leaves from a tree that’s parameterized by the number of nodes. We don’t know up front how many leaves a given tree might have. Enter existential types.

An existential type hides part of its implementation. An existential vector, for instance, hides its size. The receiver of an existential vector knows that the size “exists”, but its value is inaccessible. You might wonder then: What can be done with such a mystery vector? The only way for the client to deal with it is to provide a function that is insensitive to the size of the hidden vector. A function that is polymorphic in the size of its argument. Our sortV is an example of such a function.

Here’s the definition of an existential vector:

data SomeVect : Type -> Type where
  HideV : {n : Nat} -> Vect a n -> SomeVect a

SomeVect is a type constructor that depends on the type a—the payload of the vector. The data constructor HideV takes two arguments, but the first one is surrounded by a pair of braces. This is called an implicit argument. The compiler will figure out its value from the type of the second argument, which is Vect a n. Here’s how you construct an existential:

secretV : SomeVect Int
secretV = HideV (VCons 42 VNil)

In this case, the compiler will deduce n to be equal to one, but the recipient of secretV will have no way of figuring this out.

Since we’ll be using types parameterized by Nat a lot, let’s define a type synonym:

Nt : Type
Nt = Nat -> Type

Both Vect a and Tree a are examples of this type.

We can also define a generic existential for stashing such types:

data Some : Nt -> Type where
  Hide : {n : Nat} -> nt n -> Some nt

and some handy type synonyms:

SomeVect : Type -> Type
SomeVect a = Some (Vect a)
SomeTree : Type -> Type
SomeTree a = Some (Tree a)

Polynomial Lens

We want to translate the following categorical definition of a polynomial lens:

\mathbf{PolyLens}\langle s, t\rangle \langle a, b\rangle = \prod_{k} \mathbf{Set}\left(s_k, \sum_{n} a_n \times [b_n, t_k] \right)

We’ll do it step by step. First of all, we’ll assume, for simplicity, that the indices k and n are natural numbers. Therefore the four arguments to PolyLens are types parameterized by Nat, for which we have a type alias:

PolyLens : Nt -> Nt -> Nt -> Nt -> Type

The definition starts with a big product over all k‘s. Such a product corresponds, in programming, to a polymorphic function. In Haskell we would write it as forall k. In Idris, we’ll accomplish the same using an implicit argument {k : Nat}.

The hom-set notation \mathbf{Set}(a, b) stands for a set of functions from a to b, or the type a -> b. So does the notation [a, b] (the internal hom is the same as the external hom in \mathbf{Set}). The product a \times b is the type of pairs (a, b).

The only tricky part is the sum over n. A sum corresponds exactly to an existential type. Our SomeVect, for instance, can be considered a sum over n of all vector types Vect a n.

Here’s the intuition: Consider that to construct a sum type like Either a b it’s enough to provide a value of either type a or type b. Once the Either is constructed, the information about which one was used is lost. If you want to use an Either, you have to provide two functions, one for each of the two branches of the case statement. Similarly, to construct SomeVect it’s enough to provide a vector of some particular length n. Instead of the two possibilities of Either, we have infinitely many possibilities corresponding to different n‘s. The information about which n was used is then promptly lost.

The sum in the definition of the polynomial lens:

\sum_{n} a_n \times [b_n, t_k]

can be encoded in this existential type:

data SomePair : Nt -> Nt -> Nt -> Type where
  HidePair : {n : Nat} -> 
             (k : Nat) -> a n -> (b n -> t k) -> SomePair a b t

Notice that we are hiding n, but not k.

Taking it all together, we end up with the following type definition:

PolyLens : Nt -> Nt -> Nt -> Nt -> Type
PolyLens s t a b = {k : Nat} -> s k -> SomePair a b t

The way we read this definition is that PolyLens is a function polymorphic in k. Given a value of the type s k it produces an existential pair SomePair a b t. This pair contains a value of the type a n and a function b n -> t k. The important part is that the value of n is hidden from the caller inside the existential type.

Using the Lens

Because of the existential type, it’s not immediately obvious how one can use the polynomial lens. For instance, we would like to be able to extract the foci a n, but we don’t know what the value of n is. The trick is to hide n inside an existential Some. Here is the “getter” for this lens:

getLens :  PolyLens sn tn an bn -> sn n -> Some an
getLens lens t =
  let  HidePair k v _ = lens t
  in Hide v

We call lens with the argument t, pattern match on the constructor HidePair and immediately hide the contents back using the constructor Hide. The compiler is smart enough to know that the existential value of n hasn’t been leaked.

The second component of SomePair, the “setter”, is trickier to use because, without knowing the value of n, we don’t know what argument to pass to it. The trick is to take advantage of the match between the producer and the consumer that are the two components of the existential pair. Without disclosing the value of n we can take the a‘s and use a polymorphic function to transform them into b‘s.

transLens : PolyLens sn tn an bn -> ({n : Nat} -> an n -> bn n)
        -> sn n -> Some tn
transLens lens f t =
  let  HidePair k v vt = lens t
  in  Hide (vt (f v))

The polymorphic function here is encoded as ({n : Nat} -> an n -> bn n). (An example of such a function is sortV.) Again, the value of n that’s hidden inside SomePair is never leaked.

Example

Let’s get back to our example: a polynomial lens that focuses on the leaves of a tree. The type signature of such a lens is:

treeLens : PolyLens (Tree a) (Tree b) (Vect a) (Vect b)

Using this lens we should be able to retrieve a vector of leaves Vect a n from a node-counted tree Tree a k and replace it with a new vector Vect b n to get a tree Tree b k. We should be able to do it without ever disclosing the number of leaves n.

To implement this lens, we have to write a function that takes a tree of a and produces a pair consisting of a vector of a‘s and a function that takes a vector of b‘s and produces a tree of b‘s. The type b is fixed in the signature of the lens. In fact we can pass this type to the function we are implementing. This is how it’s done:

treeLens : PolyLens (Tree a) (Tree b) (Vect a) (Vect b)
treeLens {b} t = replace b t

First, we bring b into the scope of the implementation as an implicit parameter {b}. Then we pass it as a regular type argument to replace. This is the signature of replace:

replace : (b : Type) -> Tree a n -> SomePair (Vect a) (Vect b) (Tree b)

We’ll implement it by pattern-matching on the tree.

The first case is easy:

replace b Empty = HidePair 0 VNil (\v => Empty)

For an empty tree, we return an empty vector and a function that takes an empty vector and recreates an empty tree.

The leaf case is also pretty straightforward, because we know that a leaf contains just one value:

replace b (Leaf x) = HidePair 1 (VCons x VNil) 
                                (\(VCons y VNil) => Leaf y)

The node case is more tricky, because we have to recurse into the subtrees and then combine the results.

replace b (Node t1 t2) =
  let (HidePair k1 v1 f1) = replace b t1
      (HidePair k2 v2 f2) = replace b t2
      v3 = concatV v1 v2
      f3 = compose f1 f2
  in HidePair (S (plus k2 k1)) v3 f3

Combining the two vectors is easy: we just concatenate them. Combining the two functions requires some thinking. First, let’s write the type signature of compose:

compose : (Vect b n -> Tree b k) -> (Vect b m -> Tree b j) ->
       (Vect b (plus n m)) -> Tree b (S (plus j k))

The input is a pair of functions that turn vectors into trees. The result is a function that takes a larger vector whose size is the sum of the two sizes, and produces a tree that combines the two subtrees. Since it adds a new node, its node count is the sum of the node counts plus one.

Once we know the signature, the implementation is straightforward: we have to split the larger vector and pass the two subvectors to the two functions:

compose {n} f1 f2 v =
  let (v1, v2) = splitV n v
  in Node (f1 v1) (f2 v2)

The split is done by looking at the type of the first argument (Vect b n -> Tree b k). We know that we have to split at n, so we bring {n} into the scope of the implementation as an implicit parameter.

Besides the type-changing lens (that changes a to b), we can also implement a simple lens:

treeSimpleLens : PolyLens (Tree a) (Tree a) (Vect a) (Vect a)
treeSimpleLens {a} t = replace a t

We’ll need it later for testing.

Testing

To give it a try, let’s create a small tree with five nodes and three leaves:

t3 : Tree Char 5
t3 = (Node (Leaf 'z') (Node (Leaf 'a') (Leaf 'b')))

We can extract the leaves using our lens:

getLeaves : Tree a n -> SomeVect a
getLeaves t = getLens treeSimpleLens t

As expected, we get a vector containing 'z', 'a', and 'b'.

We can also transform the leaves using our lens and the polymorphic sort function:

trLeaves : ({n : Nat} -> Vect a n -> Vect b n) -> Tree a n -> SomeTree b
trLeaves f t = transLens treeLens f t
> trLeaves sortV t3

The result is a new tree: ('a',('b','z'))

Complete code is available on github.


A PDF of this post is available on github.

Motivation

Lenses seem to pop up in most unexpected places. Recently a new type of lens showed up as a set of morphisms between polynomial functors. This lens seemed to not fit the usual classification of optics, so it was not immediately clear that it had an existential representation using coends and, consequently, a profunctor representation using ends. A profunctor representation of optics is of special interest since it lets us compose optics using standard function composition. In this post I will show how the polynomial lens fits into the framework of general optics.

Polynomial Functors

A polynomial functor in \mathbf{Set} can be written as a sum (coproduct) of representables:

P(y) = \sum_{n \in N} s_n \times \mathbf{Set}(t_n, y)

The two families of sets, s_n and t_n, are indexed by elements of the set N (in particular, you may think of it as a set of natural numbers, but any set will do). In other words, they are fibrations of some sets S and T over N. In programming we call such families dependent types. We can also think of these fibrations as functors from a discrete category \mathcal{N} to \mathbf{Set}.

Since, in \mathbf{Set}, the internal hom is isomorphic to the external hom, a polynomial functor is sometimes written in the exponential form, which makes it look more like an actual polynomial or a power series:

P(y) = \sum_{n \in N} s_n \times y^{t_n}

or, by representing all sets s_n as sums of singletons:

P(y) = \sum_{n \in N} y^{t_n}

I will also use the notation [t_n, y] for the internal hom:

P(y) = \sum_{n \in N} s_n \times [t_n, y]

Polynomial functors form a category \mathbf{Poly} in which morphisms are natural transformations.

Consider two polynomial functors P and Q. A natural transformation between them can be written as an end. Let’s first expand the source functor:

\mathbf{Poly}\left( \sum_k s_k \times [t_k, -], Q\right)  =  \int_{y\colon \mathbf{Set}} \mathbf{Set} \left(\sum_k s_k \times [t_k, y], Q(y)\right)

The mapping out of a sum is isomorphic to a product of mappings:

\cong \prod_k \int_y \mathbf{Set} \left(s_k \times [t_k, y], Q(y)\right)

We can see that a natural transformation between polynomials can be reduced to a product of natural transformations out of monomials. So let’s consider a mapping out of a monomial:

\int_y \mathbf{Set} \left( s \times [t, y], \sum_n a_n \times [b_n, y]\right)

We can use the currying adjunction:

\int_y \mathbf{Set} \left( [t, y],  \left[s, \sum_n a_n \times [b_n, y]\right]  \right)

or, in \mathbf{Set}:

\int_y \mathbf{Set} \left( \mathbf{Set}(t, y), \mathbf{Set} \left(s, \sum_n a_n \times [b_n, y]\right)  \right)

We can now use the Yoneda lemma to eliminate the end. This will simply replace y with t in the target of the natural transformation:

\mathbf{Set}\left(s, \sum_n a_n \times [b_n, t] \right)

The set of natural transformations between two arbitrary polynomials \sum_k s_k \times [t_k, y] and \sum_n a_n \times [b_n, y] is called a polynomial lens. Combining the previous results, we see that it can be written as:

\mathbf{PolyLens}\langle s, t\rangle \langle a, b\rangle = \prod_{k \in K} \mathbf{Set}\left(s_k, \sum_{n \in N} a_n \times [b_n, t_k] \right)

Notice that, in general, the sets K and N are different.

Using dependent-type language, we can describe the polynomial lens as acting on a whole family of types at once. For a given value of type s_k it determines the index n. The interesting part is that this index and, consequently, the type of the focus a_n and the type of the new focus b_n depend not only on the type but also on the value of the argument s_k.

Here’s a simple example: consider a family of node-counted trees. In this case s_k is a type of a tree with k nodes. For a given node count we can still have trees with a different number of leaves. We can define a poly-lens for such trees that focuses on the leaves. For a given tree it produces a counted vector a_n of leaves and a function that takes a counted vector b_n (same size, but different type of leaf) and returns a new tree t_k.

Lenses and Kan Extensions

After publishing an Idris implementation of the polynomial lens, Baldur Blöndal (Iceland Jack) made an interesting observation on Twitter: The sum type in the definition of the lens looks like a left Kan extension. Indeed, if we treat a and b as co-presheaves, the left Kan extension of a along b is given by the coend:

Lan_b a \cong \int^{n \colon \mathcal{N}} a(n) \times [b(n), -]

A coend over a discrete category is a sum (coproduct), since the co-wedge condition is trivially satisfied.

Similarly, an end over a discrete category \mathcal{K} becomes a product. An end of hom-sets becomes a natural transformation. A polynomial lens can therefore be rewritten as:

\prod_{k \in K} \mathbf{Set}\left(s_k, \sum_{n \in N} a_n \times [b_n, t_k] \right)  \cong [\mathcal{K}, \mathbf{Set}](s, (Lan_b a) \circ t)

Finally, since the left Kan extension is the left adjoint of functor pre-composition, we get this very compact formula:

\mathbf{PolyLens}\langle s, t\rangle \langle a, b\rangle \cong [\mathbf{Set}, \mathbf{Set}](Lan_t s, Lan_b a)

which works for arbitrary categories \mathcal{N} and \mathcal{K} for which the relevant Kan extensions exist.

Existential Representation

A lens is just a special case of optics. Optics have a very general representation as existential types or, categorically speaking, as coends.

The general idea is that optics describe various modes of decomposing a type into the focus (or multiple foci) and the residue. This residue is an existential type. Its only property is that it can be combined with a new focus (or foci) to produce a new composite.

The question is, what’s the residue in the case of a polynomial lens? The intuition from the counted-tree example tells us that such a residue should be parameterized by both the number of nodes and the number of leaves. It should encode the shape of the tree, with placeholders replacing the leaves.

In general, the residue will be a doubly-indexed family c_{m n} and the existential form of poly-lens will be implemented as a coend over all possible residues:

\mathbf{Pl}\langle s, t\rangle \langle a, b\rangle \cong

\int^{c_{k i}} \prod_{k \in K} \mathbf{Set} \left(s_k,  \sum_{n \in N} a_n \times c_{n k} \right) \times \prod_{i \in K}  \mathbf{Set} \left(\sum_{m \in N} b_m \times c_{m i}, t_i \right)

To see that this representation is equivalent to the previous one let’s first rewrite a mapping out of a sum as a product of mappings:

\prod_{i \in K}  \mathbf{Set} \left(\sum_{m \in N} b_m \times c_{m i}, t_i \right) \cong \prod_{i \in K} \prod_{m \in N} \mathbf{Set}\left(b_m \times c_{m i}, t_i \right)

and use the currying adjunction to get:

\prod_{i \in K} \prod_{m \in N} \mathbf{Set}\left(c_{m i}, [b_m, t_i ]\right)

The main observation is that, if we treat the sets N and K as discrete categories \mathcal{N} and \mathcal{K}, a product of mappings can be considered a natural transformation between functors. Functors from a discrete category are just mappings of objects, and naturality conditions are trivial.

A double product can be considered a natural transformation from a product category. And since a discrete category is its own opposite, we can (anticipating the general profunctor case) rewrite our mappings as natural transformations:

\prod_{i \in K} \prod_{m \in N} \mathbf{Set} \left(c_{m i}, [b_m, t_i] \right) \cong [\mathcal{N}^{op} \times \mathcal{K}, \mathbf{Set}]\left(c_{= -}, [b_=, t_- ]\right)

The indexes were replaced by placeholders. This notation underscores the interpretation of b as a functor (co-presheaf) from \mathcal{N} to \mathbf{Set}, t as a functor from \mathcal{K} to \mathbf{Set}, and c as a profunctor on \mathcal{N}^{op} \times \mathcal{K}.

We can therefore use the co-Yoneda lemma to eliminate the coend over c_{ki}. The result is that \mathbf{Pl}\langle s, t\rangle \langle a, b\rangle can be written as:

\int^{c_{k i}} \prod_{k \in K} \mathbf{Set} \left(s_k,  \sum_{n \in N} a_n \times c_{n k} \right) \times [\mathcal{N}^{op} \times \mathcal{K}, \mathbf{Set}]\left(c_{= -}, [b_=, t_- ]\right)

\cong  \prod_{k \in K} \mathbf{Set}\left(s_k, \sum_{n \in N} a_n \times [b_n, t_k] \right)

which is exactly the original polynomial-to-polynomial transformation.

Acknowledgments

I’m grateful to David Spivak, Jules Hedges and his collaborators for sharing their insights and unpublished notes with me, especially for convincing me that, in general, the two sets N and K should be different.


A PDF of this post is available on github.

Motivation

A lens is a reification of the concept of object composition. In a nutshell, it describes the process of decomposing the source object s into a focus a and a residue c and recomposing a new object t from a new focus b and the same residue c.

The key observation is that we don’t care what the residue is as long as it exists. This is why a lens can be implemented, in Haskell, as an existential type:

data Lens s t a b where
  Lens :: (s -> (c, a)) -> ((c, b) -> t) -> Lens s t a b

In category theory, the existential type is represented as a coend—essentially a gigantic sum over all objects c in the category \mathcal{C}:

\mathcal{L}\langle s, t\rangle \langle a, b \rangle = \int^{c \colon \mathcal{C}} \mathcal{C}(s, c \times a) \times  \mathcal{C}(c \times b, t)

There is a simple recipe to turn this representation into the more familiar one. The first step is to use the currying adjunction on the second hom-functor:

\mathcal{C}(c \times b, t) \cong  \mathcal{C}(c, [b, t])

Here, [b, t] is the internal hom, or the function object (b->t).
Once the object c appears as the source in the hom-set, we can use the co-Yoneda lemma to eliminate the coend. This is the formula we use:

\int^c F c \times  \mathcal{C}(c, x) \cong F x

It works for any functor F from the category \mathcal{C} to \mathbf{Set} so, in particular we have:

\mathcal{L}\langle s, t\rangle \langle a, b \rangle \cong \int^{c} \mathcal{C}(s, c \times a) \times  \mathcal{C}(c, [b, t]) \cong \mathcal{C}(s, [b, t] \times a)

The result is a pair of arrows:

\mathcal{C}(s, [b, t]) \times \mathcal{C}(s, a)

the first corresponding to:

set :: s -> b -> t

and the second to:

get :: s -> a

It turns out that this trick works for more general optics. It all depends on being able to isolate the object c as the source of the second hom-set.

We’ve been able to do it case-by-case for lenses, prisms, traversals, and the whole zoo of optics. It turns out that the same problem was studied in all its generality by Australian category theorists Janelidze and Kelly in a context that had nothing to do with optics.

Monoidal Actions

Here’s the most general definition of an optic:

\mathcal{O}\langle s, t\rangle \langle a, b \rangle = \int^{c \colon \mathcal{M}} \mathcal{D}(s, K_c a) \times  \mathcal{C}(L_c b, t)

This definition involves three separate categories. The category \mathcal{M} is monoidal, and it has an action defined on both \mathcal{C} and \mathcal{D}. A category with a monoidal action is called an actegory.

We get the definition of a lens by having all three categories be the same—a cartesian closed category \mathcal{C}. We define the action of this category on itself:

L_c a = c \times a

(and the same for K_c).

There are two equivalent ways of defining the action of a monoidal category. One is as the mapping

\bullet \colon \mathcal{M} \times \mathcal{C} \to \mathcal{C}

written in infix notation as c \bullet a. It has to be equipped with two additional structure maps—isomorphisms that relate the tensor product \otimes in \mathcal{M} and its unit I to the action in \mathcal{C}:

\alpha_{d c a} \colon (d \otimes c) \bullet a  \to d \bullet (c \bullet a)

\lambda_a \colon I \bullet a \to a

plus some coherence conditions corresponding to associativity and unit laws.

Using this notation, we can rewrite the definition of an optic as:

\mathcal{O}\langle s, t\rangle \langle a, b \rangle = \int^{c \colon \mathcal{M}} \mathcal{D}(s, c \bullet a) \times  \mathcal{C}(c \bullet b, t)

with the understanding that we use the same symbol for two different actions in \mathcal{C} and \mathcal{D}.

Alternatively, we can curry the action \bullet, and use the mapping:

L \colon \mathcal{M} \to [\mathcal{C}, \mathcal{C}]

The target category here is the category of endofunctors [\mathcal{C}, \mathcal{C}], which is naturally equipped with a monoidal structure given by functor composition (and, as we well know, a monad is just a monoid in that category).

The two structure maps from the definition of \bullet translate to the requirement that L be a strict monoidal functor, mapping tensor product to functor composition and unit object to identity functor.

The Adjunction

When we were eliminating the coend in the definition of a lens, we used the currying adjunction. This particular adjunction works inside a single category but, in general, an adjunction relates two functors between a pair of categories. Therefore, to eliminate the coend from the optic, we need an adjunction that looks like this:

\mathcal{C}(c \bullet a, t) \cong \mathcal{M}(c, R_a t)

The category on the right is the monoidal category \mathcal{M}, because c is an object of this category.

Using the adjunction and the co-Yoneda lemma we get:

\mathcal{O}\langle s, t\rangle \langle a, b \rangle = \int^{c \colon \mathcal{M}} \mathcal{D}(s, c \bullet a) \times  \mathcal{M}(c, R_b t) \cong  \mathcal{D}(s, R_b t \bullet a)

There is a whole slew of monoidal actions that have a right adjoint of this type. Let’s look at an example.

The category of sets is a monoidal category, and we can define its action on another category using the formula:

\mathcal{C}(c \cdot a, t) \cong \mathbf{Set}(c, \mathcal{C}(a, t))

This is the definition of a copower. A category in which this adjunction holds for all objects is called copowered or tensored over \mathbf{Set}.

The intuition is that a copower is like an iterated sum (hence the multiplication sign). Indeed, a mapping out of a coproduct of c copies of a, where c is a set, is equivalent to c mappings out of a.
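
In Haskell, where the copower by a set c is just the product, the defining adjunction is ordinary currying; the two directions below are just uncurry and curry under more suggestive names:

-- Set(c × a, t) ≅ Set(c, Set(a, t))
toCopower :: (c -> a -> t) -> ((c, a) -> t)
toCopower g (c, a) = g c a

fromCopower :: ((c, a) -> t) -> (c -> a -> t)
fromCopower f c a = f (c, a)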

This formula generalizes to the case in which \mathcal{C} is a category enriched over a monoidal category \mathcal{V}. In that case we have:

\mathcal{C}(c \cdot a, t) \cong \mathcal{V}(c, \mathcal{C}(a, t))

Here, the hom \mathcal{C}(a, t) is an object in \mathcal{V}.

Relation to Enrichment

We are interested in the adjunction:

\mathcal{C}( c \bullet a, t) \cong \mathcal{M}(c, R_a t)

The functor R is covariant in t and contravariant in a:

R \colon \mathcal{C}^{op} \times \mathcal{C} \to \mathcal{M}

In other words, it’s a profunctor. In fact, it has the right signature to be a hom-functor. And this is what Janelidze and Kelly show: the functor R can serve as the hom-functor that generates the enrichment for \mathcal{C}. If we call the enriched category \mathbf{C}, we can define the hom-object as:

\mathbf{C}(a, t) = R_a t

Our adjunction can be rewritten as:

\mathcal{C}( c \bullet a, t) \cong \mathcal{M}(c, \mathbf{C}(a, t))

The counit of this adjunction is a mapping:

\epsilon_{a t} \colon \mathbf{C}(a, t)  \bullet a \to t

which is analogous to function application.
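
For the cartesian self-enrichment, where the hom-object is the function type, this counit is literally uncurried application:

eps :: (a -> t, a) -> t
eps (f, a) = f a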

The hom-object \mathbf{C}(a, t) in an enriched category must satisfy the composition and identity laws. In an enriched category, composition is a mapping:

\circ \colon \mathbf{C}(b, t) \otimes \mathbf{C}(a, b) \to \mathbf{C}(a, t)

Let’s see if we can get this mapping from our adjunction by replacing c with \mathbf{C}(b, t) \otimes \mathbf{C}(a, b). We get:

\mathcal{C}( (\mathbf{C}(b, t) \otimes \mathbf{C}(a, b)) \bullet a, t) \cong \mathcal{M}(\mathbf{C}(b, t) \otimes \mathbf{C}(a, b), \mathbf{C}(a, t))

The right-hand side should contain the mapping we’re looking for. All we need is to point at a morphism on the left. Indeed, we can define it as the following composite:

\big( \mathbf{C}(b, t) \otimes \mathbf{C}(a, b)\big) \bullet a  \xrightarrow{\alpha}

\mathbf{C}(b, t) \bullet \big(\mathbf{C}(a, b) \bullet a\big)   \xrightarrow{id \; \bullet \; \epsilon}

\mathbf{C}(b, t) \bullet b  \xrightarrow{\epsilon} t

We used the structure map \alpha and (twice) the counit of the adjunction \epsilon.
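
Specialized to the cartesian self-enrichment, this composite reads as plain function composition, assembled from two applications (a quick sketch):

-- alpha reassociates, id • eps applies the inner function, eps the outer
comp :: ((b -> t, a -> b), a) -> t
comp ((g, f), a) = g (f a)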

Similarly, the identity of composition, which is the mapping:

id_a \colon I \to \mathbf{C}(a, a)

is adjoint to the other structure map \lambda_a.

Janelidze and Kelly go on to prove that the action of a right-closed monoidal category having a right adjoint of this kind is equivalent to the existence of a tensored enrichment of the category on which the action is defined.

The two examples of monoidal actions we’ve seen so far are indeed equivalent to enrichments. A cartesian closed category in which we defined the action L_c a = c \times a is automatically self-enriched. The copower action L_c a = c \cdot a is equivalent to enrichment over \mathbf{Set} (which doesn’t mean much, since ordinary categories are naturally \mathbf{Set}-enriched; but not all of them are tensored).

Acknowledgments

I’m grateful to Jules Hedges and his collaborators for sharing their insights and unpublished notes with me.


Category theory extracts the essence of structure and composition. At its foundation it deals with the composition of arrows. Building on the composition of arrows, it goes on to describe the ways objects can be composed: we have products, coproducts and, at a higher level, tensor products. They all describe various modes of composing objects. In monoidal categories any two objects can be composed.

Unlike composition, which can be described uniformly, decomposition requires case-by-case treatment. It’s easy to decompose a cartesian product using projections. A coproduct (sum) can be decomposed using pattern matching. A generic tensor product, on the other hand, has no standard means of decomposition.

Optics is the essence of decomposition. It answers the question of what it means to decompose a composite.

We consider an object decomposable when:

  • We can split it into the focus and the complement,
  • We can replace the focus with something else, without changing the complement, to get a new composite object,
  • We can zoom in; that is, if the focus is decomposable, we can compose the two decompositions,
  • It’s possible for the whole object to be the focus.

Let’s translate these requirements into the language of category theory. We’ll start with the standard example: the lens, which is the optic for decomposing cartesian products.

The splitting means that there is a morphism from the composite object s to the product c \times a, where c is the complement and a is the focus. This morphism is a member of the hom-set \mathcal{C}(s, c \times a).

To replace the focus we need another morphism that takes the same complement c, combines it with the new focus b to produce the new composite t. This morphism is a member of the hom-set \mathcal{C}(c \times b, t).

Here’s the important observation: We don’t care what the complement is. We are “focusing” on the focus. We carry the complement over to combine it with the new focus, but we don’t use it for anything else. It’s a featureless black box.

To erase the identity of the complement, we hide it inside a coend. A coend is a generalization of a sum, so it is written using the integral sign (see the Appendix for details). Programmers know it as an existential type, logicians call it an existential quantifier. We say that there exists a complement c, but we don’t care what it is. We “integrate” over all possible complements.

Here’s the existential definition of the lens:

L(s, t; a, b) = \int^{c : \mathcal{C}} \mathcal{C}(s, c \times a) \times \mathcal{C}(c \times b, t)

Just like we construct a coproduct using one of the injections, so the coend is constructed using one of a (possibly infinite) family of injections. In our case we construct a lens L(s, t; a, b) by injecting a pair of morphisms from the two hom-sets sharing the same c. But once the lens is constructed, there is no way to extract the original c from it.

It’s not immediately obvious that this representation of the lens reproduces the standard setter/getter form. However, in a cartesian closed category, we can use the currying adjunction to transform the second hom-set:

\mathcal{C}(c \times b, t) \cong \mathcal{C}(c, [b, t])

Here, [b, t] is the internal hom, or the function object representing morphisms from b to t. We can then use the co-Yoneda lemma to reduce the coend:

\int^{c : \mathcal{C}} \mathcal{C}(s, c \times a) \times \mathcal{C}(c, [b, t]) \cong \mathcal{C}(s, [b, t] \times a) \cong \mathcal{C}(s \times b, t) \times \mathcal{C}(s, a)

The first part of this product is the setter: it takes the source object s and the new focus b to produce the new target t. The second part is the getter that extracts the focus a.

Even though all optics have similar form, each of them reduces differently.

Here’s another example: the prism. We just replace the product with the coproduct (sum).

P(s, t; a, b) = \int^{c : \mathcal{C}} \mathcal{C}(s, c + a) \times \mathcal{C}(c + b, t)

This time the reduction goes through the universal property of the coproduct: a mapping out of a sum is a product of mappings:

\mathcal{C}(c + b, t) \cong\mathcal{C}(c, t) \times\mathcal{C}(b, t)

Again, we use the co-Yoneda to reduce the coend:

\int^{c : \mathcal{C}} \mathcal{C}(s, c + a) \times\mathcal{C}(c, t) \times\mathcal{C}(b, t) \cong\mathcal{C}(s, t + a) \times\mathcal{C}(b, t)

The first one extracts the focus a, if possible; otherwise it constructs a t (by secretly injecting a c). The second constructs a t by injecting a b.
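
As with the lens, the whole derivation can be sketched in Haskell (the names are mine):

{-# LANGUAGE ExistentialQuantification #-}

data Prism s t a b = forall c. Prism (s -> Either c a) (Either c b -> t)

-- reduction to the familiar match/build form
toMatchBuild :: Prism s t a b -> (s -> Either t a, b -> t)
toMatchBuild (Prism split merge) =
  ( either (Left . merge . Left) Right . split  -- match: a t or the focus
  , merge . Right )                             -- build: inject a new focus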

We can easily generalize existential optics to an arbitrary tensor product in a monoidal category:

O(s, t; a, b) = \int^{c : \mathcal{C}} \mathcal{C}(s, c \otimes a) \times \mathcal{C}(c \otimes b, t)

In general, though, this form cannot be further reduced using the co-Yoneda trick.

But what about the third requirement: the zooming-in property of optics? In the case of the lens and the prism it works because of associativity of the product and the sum. In fact it works for any tensor product. If you can decompose s into c \otimes a, and further decompose a into c' \otimes a', then you can decompose s into (c \otimes c') \otimes a'. Zooming-in is made possible by the associativity of the tensor product.

Focusing on the whole object plays the role of the unit of zooming.

These two properties are used in the definition of the category of optics. The objects in this category are pairs of objects in \mathcal{C}. A morphism from a pair \langle s, t \rangle to \langle a, b \rangle is the optic O(s, t; a, b). Zooming-in is the composition of morphisms.

But this is still not the most general setting. The useful insight is that the multiplication (product) in a lens, and addition (coproduct) in a prism, look like examples of linear transformations, with the residue c playing the role of a parameter. In fact, a composition of a lens with a prism produces a 2-parameter affine transformation, which also behaves like an optic. We can therefore generalize optics to work with an arbitrary monoidal action (first hinted in the discussion at the end of this blog post). Categories with such actions are known as actegories.

The idea is that you define a family of endofunctors A_m in \mathcal{C} that is parameterized by objects from a monoidal category \mathcal{M}. So far we’ve only discussed examples where the parameters were taken from the same category \mathcal{C} and the action was either multiplication or addition. But there are many examples in which \mathcal{M} is not the same as \mathcal{C}.

The zooming principles are satisfied if the action respects the tensor product in \mathcal{M}:

A_{m \otimes n} \cong A_m \circ A_n

A_1 \cong \mathit{Id}

(Here, 1 is the unit object with respect to the tensor product \otimes in \mathcal{M}, and \mathit{Id} is the identity endofunctor.)
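
For instance, for the prism the action is A_c = Either c, the tensor on parameters is again the sum, and the unit is the empty type Void. One direction of each isomorphism, as a quick sketch:

import Data.Void (Void, absurd)

-- A_{c + d} a ≅ A_c (A_d a)
assoc :: Either (Either c d) a -> Either c (Either d a)
assoc (Left (Left c))  = Left c
assoc (Left (Right d)) = Right (Left d)
assoc (Right a)        = Right (Right a)

-- A_0 ≅ Id, with Void as the empty sum
unit :: Either Void a -> a
unit (Left v)  = absurd v
unit (Right a) = a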

The actegorical version of the optic doesn’t deal directly with the residue. It tells us that the “unimportant” part of the composite object can be parameterized by some m \colon \mathcal{M}.

This additional abstraction allows us to transport the residue between categories. It’s enough to have one action L_m in \mathcal{C} and another R_m in \mathcal{D} to create a mixed optic (first introduced by Mitchell Riley):

O(s, t; a, b) = \int^{m : \mathcal{M}} \mathcal{C}(s, L_m a) \times \mathcal{D}(R_m b, t)

The separation of the focus from the complement using monoidal actions is reminiscent of what physicists call the distinction between “physical” and “gauge” degrees of freedom.

An in-depth presentation of optics, including their profunctor representation, is available in this paper.

Appendix: Coends and the Co-Yoneda Lemma

A coend is defined for a profunctor, that is, a functor of two variables, one contravariant and one covariant, p \colon \mathcal{C}^{op} \times \mathcal{C} \to \mathbf{Set}. It’s a cross between a coproduct and a trace, as it’s constructed using injections of diagonal elements (with some identifications):

\iota_{a} \colon p \langle a, a \rangle \to \int^{c : \mathcal{C}} p \langle c, c \rangle

The co-Yoneda lemma is an identity that holds for any covariant functor (copresheaf) F \colon \mathcal{C} \to \mathbf{Set}:

\int^{c \colon \mathcal{C}} F(c) \times \mathcal{C}(c, x) \cong F(x)
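
In Haskell the lemma can be rendered as an existential data type; packing uses the identity function, and unpacking is a single fmap (a minimal sketch):

{-# LANGUAGE ExistentialQuantification #-}

data CoYoneda f x = forall c. CoYoneda (f c) (c -> x)

inject :: f x -> CoYoneda f x
inject fx = CoYoneda fx id

extract :: Functor f => CoYoneda f x -> f x
extract (CoYoneda fc g) = fmap g fc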


A PDF version of this post is available on GitHub.

Dependent types, in programming, are families of types indexed by elements of an indexing type. For instance, counted vectors are families of tuples indexed by natural numbers—the lengths of the vectors.
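
In Haskell, with the usual extensions, the standard rendering of counted vectors indexes the tuple type by a type-level natural (this is a textbook construction, not specific to this post):

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Nat = Z | S Nat

data Vec (n :: Nat) a where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a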

In category theory we model dependent types as fibrations. We start with the total space E, the base space B, and a projection, or a display map, p \colon E \to B. The fibers of p correspond to members of the type family. For instance, the total space, or the bundle, of counted vectors is the list type \mathit{List} (A) (a free monoid generated by A) with the projection \mathit{len} \colon \mathit{List} (A) \to \mathbb{N} that returns the length of a list.

Another way of looking at dependent types is as objects in the slice category \mathcal{C}/B. Counted vectors, for instance, are represented as objects in \mathcal{C}/\mathbb{N} given by pairs \langle \mathit{List} (A), \mathit{len} \rangle. Morphisms in the slice category correspond to fibre-wise mappings between bundles.

We often require that \mathcal{C} be a locally cartesian closed category, that is, a category whose slice categories are cartesian closed. In such categories, the base-change functor f^* has both a left adjoint, the dependent sum \Sigma_f, and a right adjoint, the dependent product \Pi_f. The base-change functor is defined as a pullback:

[Diagram: the base-change pullback square]

This pullback defines a cartesian product in the slice category \mathcal{C}/B between two objects: \langle B', f \rangle and \langle E, p \rangle. In a locally cartesian closed category, this product has the right adjoint, the internal hom in \mathcal{C}/B.

Dependent optics

The most general optic is given by two monoidal actions L_m and R_m in two categories \mathcal{C} and \mathcal{D}. It can be written as the following coend of the product of two hom-sets:

O(A, A'; S, S') = \int^{m \colon \mathcal{M}} \mathcal{C}( S, L_m A) \times \mathcal{D}(R_m A', S')

Monoidal actions are parameterized by objects in a monoidal category (\mathcal{M}, \otimes, 1).

Dependent optics are a special case of general optics, where one or both categories in question are slice categories. When the monoidal action is defined in the slice category, the transformations must respect fibrations. For instance, the action in the bundle \langle E, p \rangle over B must commute with the projection:

p \circ L_m = p

This is reminiscent of gauge transformations in physics, which act on fibers in bundles over spacetime. The action must respect the monoidal structure of \mathcal{M} so, for instance,

L_{m \otimes n} \cong L_m \circ L_n

L_1 \cong \mathit{Id}

We can define a dependent (mixed) optic as:

\int^{m : \mathcal{M}} (\mathcal{C}/B)( S, L_m A) \times (\mathcal{D}/B')(R_m A', S')

Just like regular optics, dependent optics can be represented using Tambara modules, which are profunctors with the additional structure given by transformations:

\alpha_{m, \langle A, A' \rangle} \colon P \langle A, A' \rangle \to P\langle L_m A, R_m A' \rangle

where A and A' are objects in the appropriate slice categories.
The optic is then given by the following end in the Tambara category:

O(A, A'; S, S') = \int_{P : \mathbf{Tam}} \mathbf{Set}(P \langle A, A' \rangle, P \langle S, S' \rangle)

Dependent lens

The primordial optic, the lens, is defined by the monoidal action of a product. By analogy, we define a dependent lens by the action of the product in a slice category. The action parameterized by an object \langle C, q \rangle on another object \langle A, p \rangle is given by the pullback:

M_C A = C \times_B A

Since a pullback is the product in the slice category \mathcal{C}/B, it is automatically associative and unital, so it can be used to define a dependent lens:

\mathit{DLens}(A, A'; S, S') = \int^{\langle C, p \rangle : \mathcal{C}/B} (\mathcal{C}/B)( S, C \times_B A) \times (\mathcal{C}/B)(C \times_B A', S')

Since \mathcal{C} is locally cartesian closed, there is an adjunction between the product and the exponential. We can use it to get:

\cong \int^{\langle C, p \rangle : \mathcal{C}/B} (\mathcal{C}/B)( S, C \times_B A) \times (\mathcal{C}/B)(C , [A', S']_B)

We can then apply the co-Yoneda lemma to get the setter/getter form:

(\mathcal{C}/B)( S, [A', S']_B \times_B A)

The internal hom [A', S']_B in a locally cartesian closed category can be expressed using a dependent product:

\left [\left \langle A' \atop p \right \rangle, \left \langle S' \atop q \right \rangle \right ] \cong \Pi_p \left(p^* \left \langle S' \atop q \right \rangle \right)

where p \colon A' \to B is the fibration of A', \Pi_p is the right adjoint to the base change functor, and p^* is the base-change functor along p.

The dependent lens can be written as:

(\mathcal{C} / B) \left( \left \langle S \atop r \right \rangle, \Pi_p \left(p^* \left \langle S' \atop q \right \rangle \right) \times \left \langle A \atop r' \right \rangle \right)

In particular, if B is \mathbb{N}, this is equal to an infinite tuple of functions:

O(A, B; S, T) \cong \prod_n \left( s_n \to \left((b_n \to t_n) \times a_n \right) \right)

or fiber-wise pairs of setter/getter \langle s_n \to b_n \to t_n, s_n \to a_n \rangle indexed by n.
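
As a loose Haskell approximation, with each family of fibers collapsed to a single type and Integer standing in for the naturals, such a dependent lens is an indexed family of these pairs:

-- one setter/getter pair for every index n (fibers collapsed to s, t, a, b)
type DepLens s t a b = Integer -> s -> (b -> t, a)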

Traversals

Traversals are optics whose monoidal action is generated by polynomial functors of the form:

M_{c} a = \sum_{n \colon \mathbb{N}} c_n \times a^n

The coefficients c_n can be expressed as a fibration \langle C, p \colon C \to \mathbb{N} \rangle, with C = \sum_n c_n, the sum of the fibers. The set of powers of a can be similarly written as \langle L(a), \mathit{len} \rangle, with L(a) the type of list of a (a free monoid generated by a), and \mathit{len} the function that assigns the length to a list. The monoidal action can then be written using a product (pullback) in the slice category \mathbf{Set}/\mathbb{N}:

\left \langle {C \atop p} \right \rangle \times \left \langle {L(a) \atop \mathit{len}} \right \rangle

There is an obvious forgetful functor U \colon \mathbf{Set}/\mathbb{N} \to \mathbf{Set}, which can be used to express the polynomial action:

M_c a = U\left( \left \langle {C \atop p} \right \rangle \times \left \langle {L(a) \atop \mathit{len}} \right \rangle \right)

The traversal is the optic:

\int^{\langle C, p \rangle : \mathbf{Set}/\mathbb{N}} \mathbf{Set} \left(s, M_c a \right) \times \mathbf{Set}(M_c b, t)

Equivalently, the second factor can be rewritten as:

\mathbf{Set}\left( \sum_{n \colon \mathbb{N}} c_n \times b^n, t\right) \cong \prod_{n \colon \mathbb{N}} \mathbf{Set}(c_n \times b^n, t)

This, in turn, is equivalent to a single hom-set in the slice category:

\cong (\mathbf{Set}/\mathbb{N})\left(\left \langle {C \atop p} \right \rangle \times \left \langle {L(b) \atop \mathit{len}} \right \rangle, \left \langle {\mathbb{N} \times t \atop \pi_1} \right \rangle \right)

where \pi_1 is the projection from the cartesian product.

The traversal is therefore a mixed optic:

\int^{\langle C, p \rangle : \mathbf{Set}/\mathbb{N}} \mathbf{Set} \left(s, M_c a \right) \times (\mathbf{Set}/\mathbb{N})\left( \left \langle {C \atop p} \right \rangle \times \left \langle {L(b) \atop \mathit{len}} \right \rangle, \left \langle {\mathbb{N} \times t \atop \pi_1} \right \rangle \right)

The second factor can be transformed using the internal hom adjunction:

(\mathbf{Set}/\mathbb{N})\left(\left \langle {C \atop p} \right \rangle, \left[ \left \langle {L(b) \atop \mathit{len}} \right \rangle, \left \langle {\mathbb{N} \times t \atop \pi_1} \right \rangle \right] \right)

We can then use the ninja Yoneda lemma on the optic to “integrate” over \langle C, p \rangle and get:

O(a, b; s, t) \cong \mathbf{Set} \left( s, U\left( \left[ \left \langle {L(b) \atop \mathit{len}} \right \rangle, \left \langle {\mathbb{N} \times t \atop \pi_1} \right \rangle \right] \times \left \langle {L(a) \atop \mathit{len}} \right \rangle \right) \right)

which, in components, reads:

s \to \sum_n \left( (b^n \to t) \times a^n \right)
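
A well-known Haskell counterpart of this sum is the FunList data type: either we are done and hold a t, or we hold one more a together with a FunList that expects one more b. Unrolling More n times yields n values of type a paired with a function b^n -> t, which is exactly one term of the sum:

data FunList a b t = Done t | More a (FunList a b (b -> t))

type Traversal s t a b = s -> FunList a b t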


This post is based on the talk I gave in Moscow, Russia, in February 2015 to an audience of C++ programmers.

Let’s agree on some preliminaries.

C++ is a low level programming language. It’s very close to the machine. C++ is engineering at its grittiest.

Category theory is the most abstract branch of mathematics. It’s very very high in the layers of abstraction. Category theory is mathematics at its highest.

So why have I decided to speak about category theory to C++ programmers? There are many reasons.

The main reason is that category theory captures the essence of programming. We can program at many levels, and if I ask somebody “What is programming?” most C++ programmers will probably say that it’s about telling the computer what to do. How to move bytes from memory to the processor, how to manipulate them, and so on.

But there is another view of programming and it’s related to the human side of programming. We are humans writing programs. We decide what to tell the computer to do.

We are solving problems. We are finding solutions to problems and translating them in the language that is understandable to the computer.

But what is problem solving? How do we, humans, approach problem solving? It was only recently in our evolution that we acquired these fantastic brains of ours. For hundreds of millions of years not much was happening under the hood, and suddenly we got this brain, and we used this brain to help us chase animals, shoot arrows, find mates, organize hunting parties, and so on. It’s been going on for a few hundred thousand years. And suddenly the same brain is supposed to solve problems in software engineering.

So how do we approach problem solving? There is one general approach that we humans have developed for problem solving. We had to develop it because of the limitations of our brain, not because of the limitations of computers or our tools. Our brains have this relatively small cache memory, so when we’re dealing with a huge problem, we have to split it into smaller parts. We have to decompose bigger problems into smaller problems. And this is very human. This is what we do. We decompose, and then we attack each problem separately, find the solution; and once we have solutions to all the smaller problems, we recompose them.

So the essence of programming is composition.

If we want to be good programmers, we have to understand composition. And who knows more about composing than musicians? They are the original composers!

So let me show you an example. This is a piece by Johann Sebastian Bach. I’ll show you two versions of this composition. One is low level, and one is high level.

The low level is just sampled sound. These are bytes that approximate the waveform of the sound.

[Figure: the piece as sampled sound, bytes approximating the waveform]

And this is the same piece in typical music notation.

[Figure: the same piece in standard musical notation]

Which one is easier to manipulate? Which one is easier to reason about? Obviously, the high level one!

Notice that, in the high level language, we use a lot of different abstractions that can be processed separately. We split the problem into smaller parts. We know that there are things called notes, and they can be reproduced, in this particular case, using violins. There are also some letters like E, A, B7: these are chords. They describe harmony. There is melody, there is harmony, there is the bass line.

Musicians, when they compose music, use higher level abstractions. These higher level abstractions are easier to manipulate, reason about, and modify when necessary.

And this is probably what Bach was hearing in his head.

And he chose to represent it using the high level language of musical notation.

Now, if you’re a rap musician, you work with samples, and you learn how to manipulate the low level description of music. It’s a very different process. It’s much closer to low-level C++ programming. We often do copy and paste, just like rap musicians. There’s nothing wrong with that, but sometimes we would like to be more like Bach.

So how do we approach this problem as programmers and not as musicians? We cannot use musical notation to lift ourselves to higher levels of abstraction. We have to use mathematics. And there is one particular branch of mathematics, category theory, that is exactly about composition. If programming is about composition, then this is what we should be looking at.

Category theory, in general, is not easy to learn, but the basic concepts of category theory are embarrassingly simple. So I will talk about some of those embarrassingly simple concepts from category theory, and then explain how to use them in programming in some weird ways that would probably not have occurred to you when you’re programming.

Categories

So what is this concept of a category? Two things: objects and arrows between objects.

In category theory you don’t ask what these objects are. You call them objects, you give them names like A, B, C, D, etc., but you don’t ask what they are or what’s inside them. And then you have arrows that connect objects. Every arrow starts at some object and ends at some object. You can have many arrows going between two objects, or none whatsoever. Again, you don’t ask what these arrows are. You just give them names like f, g, h, etc.

And that’s it—that’s how you visualize a category: a bunch of objects and a bunch of arrows between them.

There are some operations on arrows and some laws that they have to obey, and they are also very simple.

Since composition is the essence of category theory (and of programming), we have to define composition in a category.

Composition

Whenever you have an arrow f going from object A to object B, here represented by two little piggies, and another arrow g going from object B to object C, there is an arrow called their composition, g ∘ f, that goes directly from object A to object C. We pronounce this “g after f.”

Composition is part of the definition of a category. Again, since we don’t know what these arrows are, we don’t ask what composition is. We just know that for any two composable arrows — such that the end of one coincides with the start of the other — there exists another arrow that’s their composition.

And this is exactly what we do when we solve problems. We find an arrow from A to B — that’s our subproblem. We find an arrow from B to C, that’s another subproblem. And then we compose them to get an arrow from A to C, and that’s a solution to our bigger problem. We can repeat this process, building larger and larger solutions by solving smaller problems and composing the solutions.

Notice that when we have three arrows to compose, there are two ways of doing that, depending on which pair we compose first. We don’t want composition to have history. We want to be able to say: This arrow is a composition of these three arrows: f after g after h, without having to use parentheses for grouping. That’s called associativity:

 (f ∘ g) ∘ h = f ∘ (g ∘ h)

Composition in a category must be associative.

And finally, every object has to have an identity arrow. It’s an arrow that goes from the object back to itself. You can have many arrows that loop back to the same object. But there is always one such loop for every object that is the identity with respect to composition.

Identity

It has the property that if you compose it with any other arrow that’s composable with it — meaning it either starts or ends at this object — you get that arrow back. It acts like multiplication by one. It’s an identity — it doesn’t change anything.

Monoid

I can immediately give you an example of a very simple category that I’m sure you know very well and have used all your adult life. It’s called a monoid. It’s another embarrassingly simple concept. It’s a category that has only one object. It may have lots of arrows, but all these arrows have to start at this object and end at this object, so they are all composable. You can compose any two arrows in this category to get another arrow. And there is one arrow that’s the identity. When composed with any other arrow it will give you back the same arrow.

There are some very simple examples of monoids. We have natural numbers with addition and zero. An arrow corresponds to adding a number. For instance, you will have an arrow that corresponds to adding 5. You compose it with an arrow that corresponds to adding 3, and you get an arrow that corresponds to adding 8. Identity arrow corresponds to adding zero.

Multiplication forms a monoid too. The identity arrow corresponds to multiplying by 1. The composition rule for these arrows is just a multiplication table.

Strings form another interesting monoid. An arrow corresponds to appending a particular string. The unit arrow appends the empty string. What’s interesting about this monoid is that it has no additional structure. In particular, it doesn’t have an inverse for any of its arrows. There are no “negative” strings. There is no anti-“world” string that, when appended to “Hello world”, would result in the string “Hello”.

In each of these monoids, you can think of the one object as being a set: a set of all numbers, or a set of all strings. But that’s just an aid to imagination. All information about the monoid is in the composition rules — the multiplication table for arrows.

In programming we encounter monoids all over the place. We just normally don’t call them that. But every time you have something like logging, gathering data, or auditing, you are using a monoid structure. You’re basically adding some information to a log, appending, or concatenating, so that’s a monoidal operation. And there is an identity log entry that you may use when you have nothing interesting to add.

Types and Functions

So monoid is one example, but there is something closer to our hearts as programmers, and that’s the category of types and functions. And the funny thing is that this category of types and functions is actually almost enough to do programming, and in functional languages that’s what people do. In C++ there is a little bit more noise, so it’s harder to abstract this part of programming, but we do have types — it’s a strongly typed language, modulo implicit conversions. And we do have functions. So let’s see why this is a category and how it’s used in programming.

This category is actually called Set — a category of sets — because, to the lowest approximation, types are just sets of values. The type bool is a set of two values, true and false. The type int is a set of integers from something like negative two billion to two billion (on a 32-bit machine). All types are sets: whether it’s numbers, enums, structs, or objects of a class. There could be an infinite set of possible values, but it’s okay — a set may be infinite. And functions are just mappings between these sets. I’m talking about the simplest functions, ones that take just one value of some type and return another value of another type. So these are arrows from one type to another.

Can we compose these functions? Of course we can. We do it all the time. We call one function, it returns some value, and with this value we call another function. That’s function composition. In fact this is the basis of procedural decomposition, the first serious approach to formalizing problem solving in programming.

Here’s a piece of C++ code that composes two functions f and g.

C g_after_f(A x) {
    B y = f(x);
    return g(y);
}

In modern C++ you can make this code generic — a higher order function that accepts two functions and returns a third function that’s the composition of the two.

Can you compose any two functions? Yes — if they are composable. The output type of one must match the input type of another. That’s the essence of strong typing in C++ (modulo implicit conversions).

Is there an identity function? Well, in C++ we don’t have an identity function in the library, which is too bad. That’s because there’s a complex issue of how you pass things: is it by value, by reference, by const reference, by move, and so on. But in functional languages there is just one function called identity. It takes an argument and returns it back. But even in C++, if you limit yourself to functions that take arguments by value and return values, then it’s very easy to define a generic identity function.

Notice that the functions I’m talking about are actually a special kind of function, called pure functions. They can’t have any side effects. Mathematically, a function is just a mapping from one set to another set, so it can’t have side effects. Also, a pure function must return the same value when called with the same arguments. This is called referential transparency.

A pure function doesn’t have any memory or state. It doesn’t have static variables, and doesn’t use globals. A pure function is an ideal we strive towards in programming, especially when writing reusable components and libraries. We don’t like having global variables, and we don’t like state hidden in static variables.

Moreover, if a function is pure, you can memoize it. If a function takes a long time to evaluate, maybe you’ll want to cache the value, so it can be retrieved quickly next time you call it with the same arguments.

Another property of pure functions is that all dependencies in your code only come through composition. If the result of one function is used as an argument to another then obviously you can’t run them in parallel or reverse the order of execution. You have to call them in that particular order. You have to sequence their execution. The dependencies between functions are fully explicit. This is not true for functions that have side effects. They may look like independent functions, but they have to be executed in sequence, or their side effects will be different.

We know that compiler optimizers will try to rearrange our code, but it’s very hard to do it in C++ because of hidden dependencies. If you have two functions that are not composed, they just calculate different things, and you try to call them in a different order, you might get a completely different result. It’s because of the order of side effects, which are invisible to the compiler. You would have to go deep into the implementation of the functions; you would have to analyse everything they are doing, and the functions they are calling, and so on, in order to find out what these side effects are; and only then you could decide: Oh, I can swap these two functions.

In functional programming, where you only deal with pure functions, you can swap any two functions that are not explicitly composed, and composition is immediately visible.

At this point I would expect half of the audience to leave and say: “You can’t program with pure functions. Programming is all about side effects.” And it’s true. So in order to keep you here I will have to explain how to deal with side effects. But it’s important that you start with something that is easy to understand, something you can reason about, like pure functions, and then build side effects on top of these things, so you can build up abstractions on top of other abstractions.

You start with pure functions and then you talk about side effects, not the other way around.

Auditing

Instead of explaining the general theory of side effects in category theory, I’ll give you an example from programming. Let’s solve this simple problem that, in all likelihood, most C++ programmers would solve using side effects. It’s about auditing.

You start with a sequence of functions that you want to compose. For instance, you have a function logIn. You give it a password and it returns a key. And you have another function, withdraw. You give it a key and it gives you back money. You want to compose these two functions, so you start with a password and you get money. Excellent!

But now you have a new requirement: you want to have an audit trail. Every time one of these functions is called, you want to log something in the audit trail, so that you’ll know what things have happened and in what order. That’s a side effect, right?

How do we solve this problem? Well, how about creating a global variable to store the audit trail? That’s the simplest solution that comes to mind. And it’s exactly the same method that’s used for standard output in C++, with the global object std::cout. The functions that access a global variable are obviously not pure functions; we are talking about side effects.

string audit;

int logIn(string passwd){
  audit += passwd;
  return 42;
}

double withdraw(int key){
   audit += "withdrawing ";
   return 100.0;
}

So we have a string, audit; it’s a global variable, and in each of these functions we access it and append something to it. For simplicity, I’m just returning some fake numbers, so as not to complicate things.

This is not a good solution, for many reasons. It doesn’t scale very well. It’s difficult to maintain. If you want to change the name of the variable, you’d have to go through all this code and modify it. And if, at some point, you decide you want to log more information, not just a string but maybe a timestamp as well, then you have to go through all this code again and modify everything. And I’m not even mentioning concurrency. So this is not the best solution.

But there is another solution that’s really pure. It’s based on the idea that whatever you’re accessing in a function, you should pass explicitly to it, and then return it, with modifications, from the function. That’s pure. So here’s the next solution.

pair<int, string> 
logIn(string passwd, string audit){
  return make_pair(42, audit + passwd);
}

pair<double, string> 
withdraw(int key, string audit){
  return make_pair(100.0
                 , audit + "withdrawing ");
}

You modify all the functions so that they take an additional argument, the audit string. And the return type is also changed. When we had an int before, it’s now a pair of int and string. When we had a double before, it’s now a pair of double and string. These functions now call make_pair before they return, and they put in whatever they were returning before, plus they do this concatenation of a new message at the end of the old audit string. This is a better solution because it uses pure functions. They only depend on their arguments. They don’t have any state, they don’t access any global variables. Every time you call them with the same arguments, they produce the same result.

The problem though is that they don’t memoize that well. Look at the function logIn: you normally get the same key for the same password. But if you want to memoize it when it takes two arguments, you suddenly have to memoize it for all possible histories. Even if you call it with the same password, but the audit string is different, you can’t just access the cache, you have to cache a new pair of values. Your cache explodes with all possible histories.

An even bigger problem is security. Each of these functions has access to the complete log, including the passwords.

Also, each of these functions has to care about things that maybe it shouldn’t be bothered with. It knows about how to concatenate strings. It knows the details of the implementation of the log: that the log is a string. It must know how to accumulate the log.

Now I want to show you a solution that maybe is not that obvious, maybe a little outside of what we would normally think of.

pair<int, string> 
logIn(string passwd){
  return make_pair(42, passwd);
}

pair<double, string> 
withdraw(int key){
  return make_pair(100.0
                  , "withdrawing ");
}

We use modified functions, but they don’t take the audit string any more. They just return a pair of whatever they were returning before, plus a string. But each of them only creates a message about what it considers important. It doesn’t have access to any log and it doesn’t know how to work with an audit trail. It’s just doing its local thing. It’s only responsible for its local data. It’s not responsible for concatenation.

It still creates a pair and it has a modified return type.

We have one problem though: we don’t know how to compose these functions. We can’t pass a pair of key and string from logIn to withdraw, because withdraw expects an int. Of course we could extract the int and drop the string, but that would defeat the goal of auditing the code.

Let’s go back a little bit and see how we can abstract this thing. We have functions that used to return some types, and now they return pairs of the original type and a string. This should in principle work with any original type, not just an int or a double. In functional programming we call this “lifting.” Here we lift some type A to a new type, which is a pair of A and a string. Or we can say that we are “embellishing” the return type of a function by pairing it with a string.

I’ll create an alias for this new parameterised type and call it Writer.

template<class A>
using Writer = pair<A, string>;

My functions now return Writers: logIn returns a writer of int, and withdraw returns a writer of double. They return “embellished” types.

Writer<int> logIn(string passwd){
    return make_pair(42, passwd);
}

Writer<double> withdraw(int key){
    return make_pair(100.0, "withdrawing ");
}

So how do we compose these embellished functions?

In this case we want to compose logIn with withdraw to create a new function called transact. This new function transact will take a password, log the user in, withdraw money, and return the money plus the audit trail. But it will return the audit trail only from those two functions.

Writer<double> transact(string passwd){
  auto p1 = logIn(passwd);
  auto p2 = withdraw(p1.first);
  return make_pair(p2.first
          , p1.second + p2.second);
}

How is it done? It’s very simple. I call the first function, logIn, with the password. It returns a pair of key and string. Then I call the second function, passing it the first component of the pair — the key. I get a new pair with the money and a string. And then I perform the composition. I take the money, which is the first component of the second pair, and I pair it with the concatenation of the two strings that were the second components of the pairs returned by logIn and withdraw.

So the accumulation of the log is done “in between” the calls (think of composition as happening between calls). I have these two functions, and I’m composing them in this funny way that involves the concatenation of strings. The accumulation of the log does not happen inside these two functions, as it happened before. It happens outside. And I can pull out this code and abstract the composition. It doesn’t really matter what functions I’m calling. I can do it for any two functions that return embellished results. I can write generic code that does it and I can call it “compose”.

template<class A, class B, class C>
function<Writer<C>(A)> compose(function<Writer<B>(A)> f
                              ,function<Writer<C>(B)> g)
{
    return [f, g](A x) {
        auto p1 = f(x);
        auto p2 = g(p1.first);
        return make_pair(p2.first
                  , p1.second + p2.second);
    };
}

What does compose do? It takes two functions. The first function takes an A and returns a Writer of B. The second function takes a B and returns a Writer of C. When I compose them, I get a function that takes an A and returns a Writer of C.

This higher order function just does the composition. It has no idea that there are functions like logIn or withdraw, or any other functions that I may come up with later. It takes two embellished functions and glues them together.

We’re lucky that in modern C++ we can work with higher order functions that take functions as arguments and return other functions.

This is how I would implement the transact function using compose.

Writer<double> transact(string passwd){
  return compose<string, int, double>
           (logIn, withdraw)(passwd);
}

The transact function is nothing but the composition of logIn and withdraw. It doesn’t contain any other logic. I’m using this special composition because I want to create an audit trail. And the audit trail is accumulated “between” the calls — it’s in the glue that glues these two functions together.

This particular implementation of compose requires explicit type annotations, which is kind of ugly. We would like the types to be inferred. And you can do it in C++14 using generalised lambdas with return type deduction. This code was contributed by Eric Niebler.

auto const compose = [](auto f, auto g) {
    return [f, g](auto x) {
        auto p1 = f(x);
        auto p2 = g(p1.first);
        return make_pair(p2.first
                    , p1.second + p2.second);
    };
};
Writer<double> transact(string passwd){
  return compose(logIn, withdraw)(passwd);
}

Back to Categories

Now that we’ve done this example, let’s go back to where we started. In category theory we have functions and we have composition of functions. Here we also have functions and composition, but it’s a funny composition. We have functions that take simple types, but they return embellished types. The types don’t match.

Let me remind you what we had before. We had a category of types and pure functions with the obvious composition.

  • Objects: types,
  • Arrows: pure functions,
  • Composition: pass the result of one function as the argument to another.

What we have created just now is a different category. Slightly different. It’s a category of embellished functions. Objects are still types: A, B, C, like integers, doubles, strings, etc. But an arrow from A to B is not a function from type A to type B. It’s a function from type A to the embellishment of the type B. The embellished type depends on the type B — in our case it was a pair type that combined B and a string — the Writer of B.

Now we have to say how to compose these arrows. It’s not as trivial as it was before. We have one arrow that takes A into a pair of B and string, and we have another arrow that takes B into a pair of C and string, and the composition should take an A and return a pair of C and string. And I have just defined this composition. I wrote code that does this:

auto const compose = [](auto f, auto g) {
    return [f, g](auto x) {
        auto p1 = f(x);
        auto p2 = g(p1.first);
        return make_pair(p2.first
                    , p1.second + p2.second);
    };
};

So do we have a category here? A category that’s different from the original category? Yes, we do! It has composition and it has identity.

What’s its identity? It has to be an arrow from the object to itself, from A to A. But an arrow from A to A is a function from A to a pair of A and string — to a Writer of A. Can we implement something like this? Yes, easily. We will return a pair that contains the original value and the empty string. The empty string will not contribute to our audit trail.

template<class A>
Writer<A> identity(A x) {
    return make_pair(x, "");
}

Is this composition associative? Yes, it is, because the underlying composition is associative, and the concatenation of strings is associative.

We have a new category. We have incorporated side effects by modifying the original category. We are still only using pure functions and yet we are able to accumulate an audit trail as a side effect. And we moved the side effects to the definition of composition.

It’s a funny new way of looking at programming. We usually see the functions, and the data being passed between functions, and here suddenly we see a new dimension to programming that is orthogonal to this, and we can manipulate it. We change the way we compose functions. We have this new power to change composition. We have a new way of solving problems by moving to these embellished functions and defining a new way of composing them. We can define new combinators to compose functions, and we’ll let the combinators do some work that we don’t want these functions to do. We can factor these things out and make them orthogonal.

Does this approach generalize?

One easy generalisation is to observe that the Writer structure works for any monoid. It doesn’t have to be just strings. Look at how composition and identity are defined in our new category. The only properties of the log we are using are concatenation and unit. Concatenation must be associative for the composition to be associative. And we need a unit of concatenation so that we can define identity in our category. We don’t need anything else. This construction will work with any monoid.

And that’s great because you have one more dimension in which you can modify your code without touching the rest. You can change the format of the log, and all you need to modify in your code is compose and identity. You don’t have to go through all your functions and modify the code. They will still work because all the concatenation of logs is done inside compose.

Kleisli Categories

This was just a little taste of what is possible with category theory. The thing I called embellishment is called a functor in category theory. You can implement categorical functors in C++. There are all kinds of embellishments/functors that you can use here. And now I can tell you the secret: this funny composition of functions with the funny identity is really a monad in disguise. A monad is just a funny way of composing embellished functions so that they form a category. A category based on a monad is called a Kleisli category.

Are there any other interesting monads that I can use this construction with? Yes, lots! I’ll give you one example. Functions that return futures. That’s our new embellishment. Give me any type A and I will embellish it by making it into a future. This embellishment also produces a Kleisli category. The composition of functions that return futures is done through the combinator “then”. You call one function returning a future and compose it with another function returning a future by passing it to “then.” You can compose these functions into chains without ever having to block for a thread to finish. And there is an identity, which is a function that returns a trivial future that’s always ready. It’s called make_ready_future. It’s an arrow that takes A and returns a future of A.

Now you understand what’s really happening. We are creating this new category based on future being a monad. We have new words to describe what we are doing. We are reusing an idea from category theory to solve a completely different problem.

Resumable Functions

There is one little inconvenience with this approach. It requires writing a lot of so-called “boilerplate” code: repetitive code that obscures the simple logic. Here it’s the glue code, the “compose” and the “then.” What you’d like to do is write your code directly in terms of embellished functions, with the composition left implicit. People noticed this and came up with solutions. In the case of futures, the practical solution is called resumable functions.

Resumable functions are designed to hide the composition of functions that return futures. Here’s an example.

int cnt = 0;
do
{
   cnt = await streamR.read(512, buf);
   if ( cnt == 0 ) break;
   cnt = await streamW.write(cnt, buf);
} while (cnt > 0);

This code copies a file using a buffer, but it does it asynchronously. We call a function read that’s asynchronous. It doesn’t immediately fill the buffer, it returns a future instead. Then we call the function write that’s also asynchronous. We do it in a loop.

This code looks almost like sequential code, except that it has these await keywords. These are the points of insertion of our composition. These are the places where the code is chopped into pieces and composed using then.

I won’t go into details of the implementation. The point is that the composition of these embellished functions is almost entirely hidden. It doesn’t look like composition in a Kleisli category, but it really is.

This solution is usually described at a very low level, in terms of coroutines implemented as state machines with static variables and gotos. And what is being lost in all this engineering talk is how general this idea is — the idea of overloading composition to build a category of embellished functions.

Just to drive this home, here’s an example of different code that does completely different stuff. It calculates Fibonacci numbers on demand. It’s a generator of Fibonacci numbers.

generator<int> fib() 
{
    int a = 0; 
    int b = 1; 
    for (;;) { 
        __yield_value a; 
        auto next = a + b; 
        a = b; 
        b = next; 
    } 
}

Instead of await it has __yield_value. But it’s the same idea of resumable functions, only with a different monad. This monad is called a list monad. And this kind of code in combination with Eric Niebler’s proposed range library could lead to very powerful programming idioms.

Conclusion

Why do we have to separate the two notions: that of resumable functions and that of generators, if they are based on the same abstraction? Why do we have to reinvent the wheel?

There’s this great opportunity for C++, and I’m afraid it will be missed like so many other opportunities for great generalisations that were missed in the past. It’s the opportunity to introduce one general solution based on monads, rather than keep creating ad-hoc solutions, one problem at a time. The same very general pattern can be used to control all kinds of side effects. It can be used for auditing, exceptions, ranges, futures, I/O, continuations, and all kinds of user-defined monads.

This amazing power could be ours if we start thinking in more abstract terms, if we reach into category theory.


The main idea of functional programming is to treat functions like any other data types. In particular, we want to be able to pass functions as arguments to other functions, return them as values, and store them in data structures. But what kind of data type is a function? It’s a type that, when paired with another piece of data called the argument, can be passed to a function called apply to produce the result.

apply :: (a -> d, a) -> d

In practice, function application is implicit in the syntax of the language. But, as we will see, even if your language doesn’t support higher-order functions, all you need is to roll your own apply.

But where do these function objects, arguments to apply, come from; and how does the built-in apply know what to do with them?

When you’re implementing a function, you are, in a sense, telling apply what to do with it–what code to execute. You’re implementing individual chunks of apply. These chunks are usually scattered all over your program, sometimes anonymously in the form of lambdas.

We’ll talk about program transformations that introduce more functions, replace anonymous functions with named ones, or turn some functions into data types, without changing program semantics. The main advantage of such transformations is that they may improve performance, sometimes drastically so; or support distributed computing.

Function Objects

As usual, we look to category theory to provide a theoretical foundation for defining function objects. It turns out that we are able to do functional programming because the category of types and functions is cartesian closed. The first part, cartesian, means that we can define product types. In Haskell, we have the pair type (a, b) built into the language. Categorically, we would write it as a \times b. Product is functorial in both arguments so, in particular, we can define a functor

    L_a c = c \times a

It’s really a family of functors parameterized by a.

The right adjoint to this functor

    R_a d = a \to d

defines the function type a \to d (a.k.a., the exponential object d^a). The existence of this adjunction is what makes a category closed. You may recognize these two functors as, respectively, the writer and the reader functor. When the parameter a is restricted to monoids, the writer functor becomes a monad (the reader is already a monad).

An adjunction is defined as a (natural) isomorphism of hom-sets:

    D(L c, d) \cong C(c, R d)

or, in our case of two endofunctors, for some fixed a,

    C(c \times a, d) \cong C(c, a \to d)

In Haskell, this is just the definition of currying:

curry   :: ((c, a) -> d)   -> (c -> (a -> d))
uncurry :: (c -> (a -> d)) -> ((c, a) -> d)
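One possible implementation of this pair (equivalent to the standard Prelude definitions):

curry   f c a    = f (c, a)
uncurry g (c, a) = g c a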

You may recognize the counit of this adjunction

    \epsilon_d : L_a (R_a d) \to \mbox{Id}\; d

as our apply function

    \epsilon_d : ((a \to d) \times a) \to d

Adjoint Functor Theorem

In my previous blog post I discussed Freyd’s Adjoint Functor Theorem from the categorical perspective. Here, I’m going to try to give it a programming interpretation. Also, the original theorem was formulated in terms of finding the left adjoint to a given functor. Here, we are interested in finding the right adjoint to the product functor. This is not a problem, since every construction in category theory can be dualized by reversing the arrows. So instead of considering the comma category c/R, we’ll work with the comma category L/d. Its objects are pairs (c, f), in which f is a morphism

    f \colon L c \to d.

This is the general picture but, in our case, we are dealing with a single category, and L is an endofunctor. We can implement the objects of our comma category in Haskell

data Comma a d c = Comma c ((c, a) -> d)

The type a is just a parameter; it parameterizes the (left) functor L_a

    L_a c = c \times a

and d is the target object of the comma category.

We are trying to construct a function object representing functions a->d, so what role does c play in all of this? To understand that, you have to take into account that a function object can be used to describe closures: functions that capture values from their environment. The type c represents those captured values. We’ll see this more explicitly later, when we talk about defunctionalizing closures.

Our comma category is a category of all closures that go from a to d while capturing all possible environments. The function object we are constructing is essentially a sum of all these closures, except that some of them are counted multiple times, so we need to perform some identifications. That’s what morphisms are for.

The morphisms of the comma category are morphisms h \colon c \to c' in \mathcal C that make the appropriate triangles in \mathcal D commute.

Unfortunately, commuting diagrams cannot be expressed in Haskell. The closest we can get is to say that a morphism from

c1 :: Comma a d c

to

c2 :: Comma a d c'

is a function h :: c -> c' such that, if

c1 = Comma c f
f :: (c, a) -> d
c2 = Comma c' g
g :: (c', a) -> d

then

f = g . bimap h id

Here, bimap h id is the lifting of h to the functor L_a. More explicitly

f (c, x) = g (h c, x)

As we are interpreting c as the environment in which the closure is defined, the question is: does f use all of the information encoded in c or just a part of it? If it’s just a part, then we can factor it out. For instance, consider a lambda that captures an integer, but it’s only interested in whether the integer is even or odd. We can replace this lambda with one that captures a Boolean, and use the function even to transform the environment.
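Here’s that example in code, with a hypothetical closure f whose Int environment is factored through a smaller Bool environment:

f :: (Int, String) -> String
f (n, s) = if even n then s else reverse s

h :: Int -> Bool
h = even

g :: (Bool, String) -> String
g (b, s) = if b then s else reverse s

-- f = g . bimap h id, that is, f (n, s) == g (even n, s)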

The next step in the construction is to define the projection functor from the comma category L/d back to \mathcal C that forgets the f part and just keeps the object c

    \pi_d \colon (c, f) \mapsto c

We use this functor to define a diagram in \mathcal C. Now, instead of taking its limit, as we did in the previous installment, we’ll take the colimit of this diagram. We’ll use this colimit to define the action of the right adjoint functor R on d.

    R d = \underset{L/d}{\mbox{colim}} \; \pi_d

In our case, the forgetful functor discards the function part of Comma a d c, keeping only the environment c. This means that, as long as d is not Void, we are dealing with a gigantic diagram that encompasses all objects in our category of types. The colimit of this diagram is a gigantic coproduct of everything, modulo identifications introduced by morphisms of the comma category. But these identifications are crucial in pruning out redundant closures. Every lambda that uses only part of the information from the captured environment can be identified with a simpler lambda that uses a simplified environment.

For illustration, consider a somewhat extreme case of constructing the function object 1 \to d, or d^1 (d to the power of the terminal object). This object should be isomorphic to d. Let’s see how this works: The terminal object 1 is the unit of the product, so

    L_1 c = c \times 1 \cong c

so the comma category L_1 / d is just the slice category C/d of arrows into d. It so happens that this category has a terminal object, namely (d, id_d). The colimit of a diagram that has a terminal object is that terminal object. So, indeed, in this case our construction produces a function object that is isomorphic to d.

    1 \to d \cong d

Intuitively, given a lambda that captures a value of type c from the environment and returns a d, we can trivially factor it out, using the lambda itself to transform the environment from c to d, and then applying the identity on d. The latter corresponds to the comma category object (d, id_d), and the forgetful functor maps it to d.
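Here’s this factorization sketched in Haskell (the names are mine); the environment transformer just applies the given closure to the dummy unit argument:

h :: ((c, ()) -> d) -> c -> d
h f c = f (c, ())   -- transform the environment from c to d

g :: (d, ()) -> d
g (x, ()) = x       -- corresponds to the terminal object (d, id_d)

-- for any f :: (c, ()) -> d, we have f (c, ()) == g (h f c, ())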

It’s instructive to run a few more examples to get the hang of it. For instance, the function object Bool->d can be constructed by considering closures of the type

f :: (c, Bool) -> d

Any such closure can be factorized by the following transformation of the environment

h :: c -> (d, d)
h c = (f (c, True), f (c, False))

followed by

g :: ((d, d), Bool) -> d
g ((d1, d2), b) = if b then d1 else d2

Indeed:

f (c, b) = g (h c, b)

In other words

    2 \to d \cong d \times d

where 2 corresponds to the Bool type.

Counit

We are particularly interested in the counit of the adjunction. Its component at d is a morphism

    \epsilon_d : L R d \to d

It also happens to be an object in the comma category, namely

    (R d, \epsilon_d \colon L R d \to d).

In fact, it is the terminal object in that category. You can see this because, for any other object (c, f \colon L c \to d), there is a morphism h \colon c \to R d that makes the appropriate triangle commute.

This morphism h is a leg in the colimit cocone that defines R d. We know for sure that c is in the base of that cocone, because it’s the projection \pi_d of (c, f \colon L c \to d).

To get some insight into the construction of the function object, imagine that you can enumerate the set of all possible environments c_i. The comma category L_a/d would then consist of pairs (c_i, f_i \colon (c_i, a) \to d). The coproduct of all those environments is a good candidate for the function object a \to d. Indeed, let’s try to define a counit for it:

    (\coprod c_i, a) \to d \cong (\coprod (c_i, a)) \to d \cong \prod ((c_i, a) \to d)

I used the distributive law:

    (\coprod c_i, a) \cong \coprod (c_i, a)

and the fact that the mapping out of a sum is the product of mappings. The right hand side can be constructed from the morphisms of the comma category.
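For a coproduct of just two environments, we can write this mapping out of the sum in Haskell (a sketch, with names of my choosing); the pair of component functions is precisely the product of mappings:

applyE :: ((c1, a) -> d) -> ((c2, a) -> d) -> (Either c1 c2, a) -> d
applyE f1 f2 (Left  c1, x) = f1 (c1, x)
applyE f1 f2 (Right c2, x) = f2 (c2, x)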

So the object \coprod c_i satisfies at least one requirement of the function object: there is an implementation of apply for it. It is highly redundant, though. This is why, instead of the coproduct, we used the colimit in our construction of the function object. Also, we ignored the size issues.

Size Issues

As we discussed before, this construction doesn’t work in general because of size issues: the comma category is not necessarily small, and the colimit might not exist.

To address these problems, we have previously defined small solution sets. In the case of the right adjoint, a solution set is a family of objects that is weakly terminal in L/d. These are pairs (c_i, f_i \colon L c_i \to d) that, among themselves, can factor out any g \colon L c \to d

    g = f_i \circ L h

It means that we can always find an index i and a morphism h \colon c \to c_i to satisfy that equation. Every g might require a different f_i and h to factor through but, for any g, we are guaranteed to always find a pair.

Once we have a complete solution set, the right adjoint R d is constructed by first forming the coproduct of all the c_i and then using a coequalizer to collapse it into the terminal object of the comma category.

What is really interesting is that, in some cases, we can just use the coproduct of the solution set, \coprod_i c_i, to approximate the adjoint (thus skipping the coequalizer part).


The idea is that, in a particular program, we don’t need to represent all possible function types, just a (small) subset of those. We are also not particularly worried about uniqueness: it’s no problem if the same function ends up with multiple syntactic representations.

Let’s reformulate Freyd’s construction of the function object in programming terms. The solution set is a set of types c_i and functions

    f_i \colon (c_i, a) \to d

such that, for any function

    g \colon (c, a) \to d

that is of interest in our program (for instance, one used as an argument to another function), there exist an index i and a function

    h \colon c \to c_i

such that g can be rewritten as

    g (c, a) = f_i (h c, a)

In other words, every function of interest can be replaced by one of the solution-set functions. The environment for this standard function can always be extracted from the environment of the more general function.

CPS Transformation

A particular application of higher-order functions shows up in the context of the continuation passing transformation. Let’s look at a simple example. We are going to implement a function that traverses a binary tree containing strings and concatenates them all into one string. Here’s the tree

data Tree = Leaf String 
          | Node Tree String Tree

Recursive traversal is pretty straightforward

show1 :: Tree -> String
show1 (Leaf s) = s
show1 (Node l s r) =
  show1 l ++  s ++ show1 r

We can test it on a small tree:

tree :: Tree
tree = Node (Node (Leaf "1 ") "2 " (Leaf "3 "))
            "4 " 
            (Leaf "5 ")
test = show1 tree

There is just one problem: recursion consumes the runtime stack, which is usually a limited resource. Your program may run out of stack space, resulting in a “stack overflow” runtime error. This is why the compiler will turn recursion into iteration whenever possible. And it is always possible if the function is tail recursive, that is, if the recursive call is the last call in the function. No operation on the result of the recursive call is permitted in a tail recursive function.
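For contrast, here’s a sketch of a tail recursive function: the recursive call is the very last thing it does, so the compiler can replace it with a loop.

sumAcc :: Int -> [Int] -> Int
sumAcc acc []       = acc
sumAcc acc (x : xs) = sumAcc (acc + x) xs  -- tail call: nothing is done with its result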

This is clearly not happening in our implementation of show1: After the recursive call is made to traverse the left subtree, we still have to make another call to traverse the right tree, and the two results must be concatenated with the contents of the node.

Notice that this is not just a functional programming problem. In an imperative language, where iteration is the rule, tree traversal is still implemented using recursion. That’s because the data structure itself is recursive. It used to be a common interview question to implement non-recursive tree traversal, but the solution is always to explicitly implement your own stack (we’ll see how it’s done at the end of this post).

There is a standard procedure to make functions tail recursive using continuation passing style (CPS). The idea is simple: if there is stuff to do with the result of a function call, let the function we’re calling do it instead. This “stuff to do” is called a continuation. The function we are calling takes the continuation as an argument and, when it finishes its job, it calls it with the result. A continuation is a function, so CPS-transformed functions have to be higher-order: they must accept functions as arguments. Often, the continuations are defined on the spot using lambdas.

Here’s the CPS transformed tree traversal. Instead of returning a string, it accepts a continuation k, a function that takes a string and produces the final result of type a.

show2 :: Tree -> (String -> a) -> a
show2 (Leaf s) k = k s
show2 (Node lft s rgt) k =
  show2 lft (\ls -> 
    show2 rgt (\rs -> 
      k (ls ++ s ++ rs)))

If the tree is just a leaf, show2 calls the continuation with the string that’s stored in the leaf.

If the tree is a node, show2 calls itself recursively to convert the left child lft. This is a tail call; nothing more is done with its result. Instead, the rest of the work is packaged into a lambda and passed as a continuation to show2. This is the lambda

\ls -> 
    show2 rgt (\rs -> 
      k (ls ++ s ++ rs))

This lambda will be called with the result of traversing the left child. It will then call show2 with the right child and another lambda

\rs -> 
      k (ls ++ s ++ rs)

Again, this is a tail call. This lambda expects the string that is the result of traversing the right child. It concatenates the left string, the string from the current node, and the right string, and calls the original continuation k with it.

Finally, to convert the whole tree t, we call show2 with a trivial continuation that accepts the final result and immediately returns it.

show t = show2 t (\x -> x)

There is nothing special about lambdas as continuations. It’s possible to replace them with named functions. The difference is that a lambda can implicitly capture values from its environment. A named function must capture them explicitly. The three lambdas we used in our CPS-transformed traversal can be replaced with three named functions, each taking an additional argument representing the values captured from the environment:

done s = s
next (s, rgt, k) ls = show3 rgt (conc (ls, s, k))
conc (ls, s, k) rs = k (ls ++ s ++ rs)

The first function, done, is the identity function; it forces the generic type a to be narrowed down to String.

Here’s the modified traversal using named functions and explicit captures.

show3 :: Tree -> (String -> a) -> a
show3 (Leaf s) k = k s
show3 (Node lft s rgt) k =
  show3 lft (next (s, rgt, k))

show t = show3 t done

We can now start making the connection with the earlier discussion of the adjoint functor theorem. The three functions we have just defined, done, next, and conc, form the family

    f_i \colon (c_i, a) \to d.

They are functions of two arguments. The first argument represents the object c_i, a member of the solution set; it corresponds to the environment captured by the closure. The three c_i are, respectively

()
(String, Tree, String -> String)
(String, String, String -> String)

(Notice the empty environment of done, here represented as the unit type ().)

The second argument of all three functions is of type String, and the return type is also String so, according to Freyd’s theorem, we are in the process of defining the function object a \to d, where both a and d are String.

Defunctionalization

Here’s the interesting part: instead of defining the general function type String->String, we can approximate it with the coproduct of the elements of the solution set. Here, the three components of the sum type correspond to the environments captured by our three functions.

data Kont = Done 
          | Next String Tree   Kont 
          | Conc String String Kont

The counit of the adjunction is approximated by a function from this sum type paired with a String, returning a String

apply :: Kont -> String -> String
apply Done s = s
apply (Next s rgt k) ls = show4 rgt (Conc ls s k)
apply (Conc ls s k) rs  = apply k (ls ++ s ++ rs)

Rather than passing one of the three functions to our higher-order CPS traversal, we can pass this sum type

show4 :: Tree -> Kont -> String
show4 (Leaf s) k = apply k s
show4 (Node lft s rgt) k = 
  show4 lft (Next s rgt k)

This is how we execute it

show t = show4 t Done

We have gotten rid of all higher-order functions by replacing their function arguments with a data type equipped with the apply function. There are several situations in which this is advantageous. In procedural languages, defunctionalization may be used to replace recursion with loops. In fact, the Kont data structure can be seen as a user-defined stack, especially if it’s rewritten as a list.

type Kont = [(String, Either Tree String)]

Here, Done is replaced by the empty list, and both Next and Conc correspond to pushing a value on the stack, with the Either distinguishing between the two.
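Here’s a sketch of that list-based version (the names Stack, applyS, and showS are mine): Next s rgt becomes a pushed pair (s, Left rgt), and Conc ls s becomes (s, Right ls).

type Stack = [(String, Either Tree String)]

applyS :: Stack -> String -> String
applyS [] s = s
applyS ((s, Left rgt) : k) ls = showS rgt ((s, Right ls) : k)
applyS ((s, Right ls) : k) rs = applyS k (ls ++ s ++ rs)

showS :: Tree -> Stack -> String
showS (Leaf s) k = applyS k s
showS (Node lft s rgt) k = showS lft ((s, Left rgt) : k)

-- show t = showS t []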

In Haskell, the compiler performs tail recursion optimization, but defunctionalization may still be useful in implementing distributed systems or web servers. Any time we need to pass a function between a client and a server, we can replace it with a data type that can be easily serialized.

Bibliography

  1. John C. Reynolds, Definitional Interpreters for Higher-Order Programming Languages.
  2. James Koppel, The Best Refactoring You’ve Never Heard Of.
