The main idea of functional programming is to treat functions like any other data type. In particular, we want to be able to pass functions as arguments to other functions, return them as values, and store them in data structures. But what kind of data type is a function? It’s a type whose values, when paired with another piece of data called the argument, can be passed to a function called apply to produce the result.

apply :: (a -> d, a) -> d

In practice, function application is implicit in the syntax of the language. But, as we will see, even if your language doesn’t support higher-order functions, all you need is to roll your own apply.

But where do these function objects, arguments to apply, come from; and how does the built-in apply know what to do with them?

When you’re implementing a function, you are, in a sense, telling apply what to do with it: what code to execute. You’re implementing individual chunks of apply. These chunks are usually scattered all over your program, sometimes anonymously in the form of lambdas.
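As a minimal sketch of what rolling your own apply might look like, suppose a program only ever needs two kinds of functions from Int to Int. The names IntFun, AddN, Scale, and mapFun are hypothetical, introduced only for this illustration.

```haskell
-- Hypothetical data type standing in for the Int -> Int functions
-- that this particular program needs.
data IntFun
  = AddN Int   -- represents \x -> x + n, with n captured
  | Scale Int  -- represents \x -> x * n, with n captured

-- Each clause is one "chunk" of apply: the body of one function.
apply :: (IntFun, Int) -> Int
apply (AddN n, x)  = x + n
apply (Scale n, x) = x * n

-- A first-order replacement for map: it takes data, not a function.
mapFun :: IntFun -> [Int] -> [Int]
mapFun _ []       = []
mapFun f (x : xs) = apply (f, x) : mapFun f xs
```

Here mapFun (AddN 1) plays the role of map (\x -> x + 1), but no function value is ever passed around.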

We’ll talk about program transformations that introduce more functions, replace anonymous functions with named ones, or turn some functions into data types, without changing program semantics. The main advantage of such transformations is that they may improve performance, sometimes drastically, or make it easier to support distributed computing.

## Function Objects

As usual, we look to category theory to provide the theoretical foundation for defining function objects. It turns out that we are able to do functional programming because the category of types and functions is cartesian closed. The first part, cartesian, means that we can define product types. In Haskell, we have the pair type (a, b) built into the language. Categorically, we would write it as $a \times b$. The product is functorial in both arguments so, in particular, we can define a functor

$L_a c = c \times a$

It’s really a family of functors that is parameterized by $a$.

The right adjoint to this functor

$R_a d = a \to d$

defines the function type $a \to d$ (a.k.a., the exponential object $d^a$). The existence of this adjunction is what makes a category closed. You may recognize these two functors as, respectively, the writer and the reader functor. When the parameter $a$ is restricted to monoids, the writer functor becomes a monad (the reader is already a monad).

An adjunction is defined as a (natural) isomorphism of hom-sets:

$D(L c, d) \cong C(c, R d)$

or, in our case of two endofunctors, for some fixed $a$,

$C(c \times a, d) \cong C(c, a \to d)$

In Haskell, this is just the definition of currying:

curry   :: ((c, a) -> d)   -> (c -> (a -> d))
uncurry :: (c -> (a -> d)) -> ((c, a) -> d)

You may recognize the counit of this adjunction

$\epsilon_d : L_a (R_a d) \to \mbox{Id}\; d$

as our apply function

$\epsilon_d : ((a \to d) \times a) \to d$
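As a small sketch using only the Prelude, the counit can be obtained by uncurrying the identity, and currying it gives the identity back. The name identityViaCurry is hypothetical.

```haskell
-- The counit of the currying adjunction: evaluate a function at an argument.
-- It is the image of the identity on (a -> d) under uncurry.
apply :: (a -> d, a) -> d
apply = uncurry id

-- Transposing the counit back across the adjunction recovers the identity.
identityViaCurry :: (a -> d) -> (a -> d)
identityViaCurry = curry apply
```

For instance, apply ((+ 1), 41) evaluates the function (+ 1) at 41.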

In my previous blog post I discussed Freyd’s Adjoint Functor Theorem from the categorical perspective. Here, I’m going to try to give it a programming interpretation. Also, the original theorem was formulated in terms of finding the left adjoint to a given functor. Here, we are interested in finding the right adjoint to the product functor. This is not a problem, since every construction in category theory can be dualized by reversing the arrows. So instead of considering the comma category $c/R$, we’ll work with the comma category $L/d$. Its objects are pairs $(c, f)$, in which $f$ is a morphism

$f \colon L c \to d$.

This is the general picture but, in our case, we are dealing with a single category, and $L$ is an endofunctor. We can implement the objects of our comma category in Haskell

data Comma a d c = Comma c ((c, a) -> d)

The type a is just a parameter; it parameterizes the (left) functor $L_a$

$L_a c = c \times a$

and d is the target object of the comma category.

We are trying to construct a function object representing functions a->d, so what role does c play in all of this? To understand that, you have to take into account that a function object can be used to describe closures: functions that capture values from their environment. The type c represents those captured values. We’ll see this more explicitly later, when we talk about defunctionalizing closures.

Our comma category is a category of all closures that go from $a$ to $d$ while capturing all possible environments. The function object we are constructing is essentially a sum of all these closures, except that some of them are counted multiple times, so we need to perform some identifications. That’s what morphisms are for.

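The morphisms of the comma category are morphisms $h \colon c \to c'$ in $\mathcal C$ that make the triangle formed by $f \colon L c \to d$, $g \colon L c' \to d$, and $L h \colon L c \to L c'$ commute in $\mathcal D$, that is, $f = g \circ L h$.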

Unfortunately, commuting diagrams cannot be expressed in Haskell. The closest we can get is to say that a morphism from

c1 :: Comma a d c

to

c2 :: Comma a d c'

is a function h :: c -> c' such that, if

c1 = Comma c f
f :: (c, a) -> d
c2 = Comma c' g
g :: (c', a) -> d

then

f = g . bimap h id

Here, bimap h id is the lifting of h to the functor $L_a$. More explicitly

f (c, x) = g (h c, x)

As we are interpreting c as the environment in which the closure is defined, the question is: does f use all of the information encoded in c or just a part of it? If it’s just a part, then we can factor it out. For instance, consider a lambda that captures an integer, but it’s only interested in whether the integer is even or odd. We can replace this lambda with one that captures a Boolean, and use the function even to transform the environment.
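The even/odd example can be sketched as follows. The particular closure bodies (reversing a string on odd input) and the helper runComma are assumptions made up for illustration; only Comma comes from the text above.

```haskell
-- An object of the comma category: an environment and a function using it.
data Comma a d c = Comma c ((c, a) -> d)

-- A closure that captures an Int but only looks at its parity.
f :: (Int, String) -> String
f (n, s) = if even n then s else reverse s

-- The same behavior, capturing only a Bool.
g :: (Bool, String) -> String
g (b, s) = if b then s else reverse s

-- The morphism of environments that forgets everything but the parity.
h :: Int -> Bool
h = even

-- Hypothetical helper: run a closure on an argument.
runComma :: Comma a d c -> a -> d
runComma (Comma c k) x = k (c, x)
```

With these definitions, f (n, s) and g (h n, s) agree for every n and s, so Comma n f and Comma (h n) g describe the same closure.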

The next step in the construction is to define the projection functor from the comma category $L/d$ back to $\mathcal C$ that forgets the $f$ part and just keeps the object $c$

$\pi_d \colon (c, f) \mapsto c$

We use this functor to define a diagram in $\mathcal C$. Now, instead of taking its limit, as we did in the previous installment, we’ll take the colimit of this diagram. We’ll use this colimit to define the action of the right adjoint functor $R$ on $d$.

$R d = \underset{L/d}{\mbox{colim}} \; \pi_d$

In our case, the forgetful functor discards the function part of Comma a d c, keeping only the environment $c$. This means that, as long as d is not Void, we are dealing with a gigantic diagram that encompasses all objects in our category of types. The colimit of this diagram is a gigantic coproduct of everything, modulo identifications introduced by morphisms of the comma category. But these identifications are crucial in pruning out redundant closures. Every lambda that uses only part of the information from the captured environment can be identified with a simpler lambda that uses a simplified environment.

For illustration, consider a somewhat extreme case of constructing the function object $1 \to d$, or $d^1$ ($d$ to the power of the terminal object). This object should be isomorphic to $d$. Let’s see how this works: The terminal object $1$ is the unit of the product, so

$L_1 c = c \times 1 \cong c$

so the comma category $L_1 / d$ is just the slice category $C/d$ of arrows to $d$. It so happens that this category has the terminal object $(d, id_d)$. The colimit of a diagram whose indexing category has a terminal object is the image of that terminal object, here $\pi_d (d, id_d) = d$. So, indeed, in this case, our construction produces a function object that is isomorphic to $d$.

$1 \to d \cong d$

Intuitively, given a lambda that captures a value of type $c$ from the environment and returns a $d$, we can trivially factor it out, using this lambda to transform the environment from $c$ to $d$ and then applying the identity on $d$. The latter corresponds to the comma category object $(d, id_d)$, and the forgetful functor maps it to $d$.
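A minimal sketch of this factorization, with the unit type () standing in for the terminal object. All four names are hypothetical.

```haskell
-- h transforms the environment c into d by running the closure.
hUnit :: ((c, ()) -> d) -> c -> d
hUnit f c = f (c, ())

-- g is the closure for the terminal object (d, id): it returns its environment.
gUnit :: (d, ()) -> d
gUnit (d, ()) = d

-- Witnesses of the isomorphism between () -> d and d.
toValue :: (() -> d) -> d
toValue k = k ()

fromValue :: d -> (() -> d)
fromValue d = \() -> d
```

For any f, the equation f (c, ()) = gUnit (hUnit f c, ()) holds by unfolding the definitions.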

It’s instructive to run a few more examples to get the hang of it. For instance, the function object Bool->d can be constructed by considering closures of the type

f :: (c, Bool) -> d

Any such closure can be factorized by the following transformation of the environment

h :: c -> (d, d)
h c = (f (c, True), f (c, False))

followed by

g :: ((d, d), Bool) -> d
g ((d1, d2), b) = if b then d1 else d2

Indeed:

f (c, b) = g (h c, b)

In other words
$2 \to d \cong d \times d$
where $2$ corresponds to the Bool type.
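The isomorphism can be witnessed directly by a pair of mutually inverse functions. The names toPair and fromPair are hypothetical.

```haskell
-- Tabulate a function on Bool as a pair of its two results.
toPair :: (Bool -> d) -> (d, d)
toPair k = (k True, k False)

-- Turn a pair back into a function by selecting a component.
fromPair :: (d, d) -> (Bool -> d)
fromPair (d1, d2) b = if b then d1 else d2
```

The ordering of the components (True first) matches the h and g defined above.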

## Counit

We are particularly interested in the counit of the adjunction. Its component at $d$ is a morphism

$\epsilon_d : L R d \to d$

It also happens to be an object in the comma category, namely

$(R d, \epsilon_d \colon L R d \to d)$.

In fact, it is the terminal object in that category. You can see that because for any other object $(c, f \colon L c \to d)$ there is a morphism $h \colon c \to R d$ that makes the triangle formed by $f$, $L h$, and $\epsilon_d$ commute: $f = \epsilon_d \circ L h$.

This morphism $h$ is a leg in the terminal cocone that defines $R d$. We know for sure that $c$ is in the base of that cocone, because it’s the projection $\pi_d$ of $(c, f \colon L c \to d)$.

To get some insight into the construction of the function object, imagine that you can enumerate the set of all possible environments $c_i$. The comma category $L_a/d$ would then consist of pairs $(c_i, f_i \colon (c_i, a) \to d)$. The coproduct of all those environments is a good candidate for the function object $a \to d$. Indeed, let’s try to define a counit for it:

$(\coprod c_i, a) \to d \cong (\coprod (c_i, a)) \to d \cong \prod ((c_i, a) \to d)$

I used the distributive law:

$(\coprod c_i, a) \cong \coprod (c_i, a)$

and the fact that the mapping out of a sum is the product of mappings. The right hand side can be constructed from the morphisms of the comma category.

So the object $\coprod c_i$ satisfies at least one requirement of the function object: there is an implementation of apply for it. It is highly redundant, though. This is why, instead of the coproduct, we used the colimit in our construction of the function object. Also, we ignored the size issues.
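For just two environments, the coproduct is Either and the argument above can be sketched in a few lines. The names applySum and splitSum are hypothetical.

```haskell
-- A pair of closure bodies yields an apply for the sum of their environments.
applySum :: ((c1, a) -> d, (c2, a) -> d) -> (Either c1 c2, a) -> d
applySum (f1, _) (Left c1, x)  = f1 (c1, x)
applySum (_, f2) (Right c2, x) = f2 (c2, x)

-- Conversely, a mapping out of the sum splits into a pair of mappings.
splitSum :: ((Either c1 c2, a) -> d) -> ((c1, a) -> d, (c2, a) -> d)
splitSum k = (\(c1, x) -> k (Left c1, x), \(c2, x) -> k (Right c2, x))
```

The two functions are inverse to each other, which is the two-element case of a mapping out of a sum being a product of mappings.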

## Size Issues

As we discussed before, this construction doesn’t work in general because of size issues: the comma category is not necessarily small, and the colimit might not exist.

To address these problems, we have previously defined small solution sets. In the case of the right adjoint, a solution set is a family of objects that is weakly terminal in $L/d$. These are pairs $(c_i, f_i \colon L c_i \to d)$ that, among themselves, can factor out any $g \colon L c \to d$

$g = f_i \circ L h$

It means that we can always find an index $i$ and a morphism $h \colon c \to c_i$ to satisfy that equation. Every $g$ might require a different $f_i$ and $h$ to factor through but, for any $g$, we are guaranteed to always find a pair.

Once we have a complete solution set, the right adjoint $R d$ is constructed by first forming a coproduct of all the $c_i$ and then using a coequalizer to construct one terminal object.

What is really interesting is that, in some cases, we can just use the coproduct of the solution set, $\coprod_i c_i$, to approximate the adjoint (thus skipping the coequalizer part).

The idea is that, in a particular program, we don’t need to represent all possible function types, just a (small) subset of those. We are also not particularly worried about uniqueness: it’s no problem if the same function ends up with multiple syntactic representations.

Let’s reformulate Freyd’s construction of the function object in programming terms. The solution set is the set of types $c_i$ and functions
$f_i \colon (c_i, a) \to d$
such that, for any function
$g \colon (c, a) \to d$
that is of interest in our program (for instance, used as an argument to another function) there exists an $i$ and a function
$h \colon c \to c_i$
such that $g$ can be rewritten as
$g (c, a) = f_i (h c, a)$
In other words, every function of interest can be replaced by one of the solution-set functions. The environment for this standard function can always be extracted from the environment of the more general function.

## CPS Transformation

A particular application of higher-order functions shows up in the context of the continuation-passing transformation. Let’s look at a simple example. We are going to implement a function that traverses a binary tree containing strings, and concatenates them all into one string. Here’s the tree

data Tree = Leaf String
          | Node Tree String Tree


Recursive traversal is pretty straightforward

show1 :: Tree -> String
show1 (Leaf s) = s
show1 (Node l s r) =
  show1 l ++ s ++ show1 r


We can test it on a small tree:

tree :: Tree
tree = Node (Node (Leaf "1 ") "2 " (Leaf "3 "))
            "4 "
            (Leaf "5 ")

test = show1 tree


There is just one problem: recursion consumes the runtime stack, which is usually a limited resource. Your program may run out of stack space resulting in the “stack overflow” runtime error. This is why the compiler will turn recursion into iteration, whenever possible. And it is always possible if the function is tail recursive, that is, the recursive call is the last call in the function. No operation on the result of the recursive call is permitted in a tail recursive function.
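To make the distinction concrete, here is a sketch of the same computation written both ways. The functions sumTo and sumToAcc are hypothetical examples, and both assume a non-negative argument.

```haskell
-- Not tail recursive: the addition happens after the recursive call returns.
-- Assumes n >= 0.
sumTo :: Int -> Int
sumTo 0 = 0
sumTo n = n + sumTo (n - 1)

-- Tail recursive: the recursive call is the last thing the function does.
-- The pending work is carried along in the accumulator instead.
-- Assumes n >= 0.
sumToAcc :: Int -> Int -> Int
sumToAcc acc 0 = acc
sumToAcc acc n = acc' `seq` sumToAcc acc' (n - 1)
  where
    acc' = acc + n  -- forced with seq so that no chain of thunks builds up
```

The seq is a Haskell-specific detail: in a lazy language the accumulator would otherwise grow as an unevaluated expression.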

This is clearly not happening in our implementation of show1: After the recursive call is made to traverse the left subtree, we still have to make another call to traverse the right tree, and the two results must be concatenated with the contents of the node.

Notice that this is not just a functional programming problem. In an imperative language, where iteration is the rule, tree traversal is still implemented using recursion. That’s because the data structure itself is recursive. It used to be a common interview question to implement non-recursive tree traversal, but the solution is always to explicitly implement your own stack (we’ll see how it’s done at the end of this post).

There is a standard procedure to make functions tail recursive using continuation passing style (CPS). The idea is simple: if there is stuff to do with the result of a function call, let the function we’re calling do it instead. This “stuff to do” is called a continuation. The function we are calling takes the continuation as an argument and, when it finishes its job, it calls it with the result. A continuation is a function, so CPS-transformed functions have to be higher-order: they must accept functions as arguments. Often, the continuations are defined on the spot using lambdas.

Here’s the CPS transformed tree traversal. Instead of returning a string, it accepts a continuation k, a function that takes a string and produces the final result of type a.

show2 :: Tree -> (String -> a) -> a
show2 (Leaf s) k = k s
show2 (Node lft s rgt) k =
  show2 lft (\ls ->
    show2 rgt (\rs ->
      k (ls ++ s ++ rs)))

If the tree is just a leaf, show2 calls the continuation with the string that’s stored in the leaf.

If the tree is a node, show2 calls itself recursively to convert the left child lft. This is a tail call: nothing more is done with its result. Instead, the rest of the work is packaged into a lambda and passed as a continuation to show2. This is the lambda

\ls ->
  show2 rgt (\rs ->
    k (ls ++ s ++ rs))

This lambda will be called with the result of traversing the left child. It will then call show2 with the right child and another lambda

\rs ->
  k (ls ++ s ++ rs)

Again, this is a tail call. This lambda expects the string that is the result of traversing the right child. It concatenates the left string, the string from the current node, and the right string, and calls the original continuation k with it.

Finally, to convert the whole tree t, we call show2 with a trivial continuation that accepts the final result and immediately returns it.

show t = show2 t (\x -> x)
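Because the result type a is generic, the initial continuation need not be the identity. The following self-contained sketch repeats show2 and runs it with two different continuations; the names sample, asString, and asLength are hypothetical.

```haskell
data Tree = Leaf String
          | Node Tree String Tree

show2 :: Tree -> (String -> a) -> a
show2 (Leaf s) k = k s
show2 (Node lft s rgt) k =
  show2 lft (\ls ->
    show2 rgt (\rs ->
      k (ls ++ s ++ rs)))

-- A small tree used only for this illustration.
sample :: Tree
sample = Node (Leaf "a") "b" (Leaf "c")

-- The identity continuation returns the concatenated string itself.
asString :: String
asString = show2 sample id

-- Any other consumer of the final string works too, here its length.
asLength :: Int
asLength = show2 sample length
```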

There is nothing special about lambdas as continuations. It’s possible to replace them with named functions. The difference is that a lambda can implicitly capture values from its environment. A named function must capture them explicitly. The three lambdas we used in our CPS-transformed traversal can be replaced with three named functions, each taking an additional argument representing the values captured from the environment:

done s = s
next (s, rgt, k) ls = show3 rgt (conc (ls, s, k))
conc (ls, s, k) rs = k (ls ++ s ++ rs)


The first function, done, is the identity function; using it as the initial continuation forces the generic type a to be narrowed down to String.

Here’s the modified traversal using named functions and explicit captures.

show3 :: Tree -> (String -> a) -> a
show3 (Leaf s) k = k s
show3 (Node lft s rgt) k =
  show3 lft (next (s, rgt, k))

show t = show3 t done


We can now start making the connection with the earlier discussion of the adjoint theorem. The three functions we have just defined, done, next, and conc, form the family

$f_i \colon (c_i, a) \to b$.

They are functions of two arguments, or a pair of arguments. The first argument represents the object $c_i$, part of the solution set. It corresponds to the environment captured by the closure. The three $c_i$ are, respectively

()
(String, Tree, String -> String)
(String, String, String -> String)


(Notice the empty environment of done, here represented as the unit type ().)

The second argument of all three functions is of the type String, and the return type is also String so, according to Freyd’s theorem, we are in the process of defining the function object $a \to b$, where $a$ is String and $b$ is String.
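To make the shape of this family visible in the types, here is a sketch in which the three functions are uncurried and given explicit signatures, with the empty environment of done written out as (). The names K, doneF, nextF, concF, and showU are hypothetical variants of the definitions above.

```haskell
data Tree = Leaf String
          | Node Tree String Tree

-- The continuation type used throughout this example.
type K = String -> String

-- Each function has the form (environment, String) -> String.
doneF :: ((), String) -> String
doneF ((), s) = s

nextF :: ((String, Tree, K), String) -> String
nextF ((s, rgt, k), ls) = showU rgt (curry concF (ls, s, k))

concF :: ((String, String, K), String) -> String
concF ((ls, s, k), rs) = k (ls ++ s ++ rs)

-- The traversal, with the answer type fixed to String.
showU :: Tree -> K -> String
showU (Leaf s) k = k s
showU (Node lft s rgt) k =
  showU lft (curry nextF (s, rgt, k))
```

Currying a function together with its environment, as in curry doneF (), gives back an ordinary continuation of type String -> String.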

## Defunctionalization

Here’s the interesting part: instead of defining the general function type String->String, we can approximate it with the coproduct of the elements of the solution set. Here, the three components of the sum type correspond to the environments captured by our three functions.

data Kont = Done
          | Next String Tree   Kont
          | Conc String String Kont


The counit of the adjunction is approximated by a function from this sum type paired with a String, returning a String

apply :: Kont -> String -> String
apply Done s = s
apply (Next s rgt k) ls = show4 rgt (Conc ls s k)
apply (Conc ls s k) rs  = apply k (ls ++ s ++ rs)


Rather than passing one of the three functions to our higher-order CPS traversal, we can pass this sum type

show4 :: Tree -> Kont -> String
show4 (Leaf s) k = apply k s
show4 (Node lft s rgt) k =
  show4 lft (Next s rgt k)


This is how we execute it

show t = show4 t Done

We have gotten rid of all higher-order functions by replacing their function arguments with a data type equipped with the apply function. There are several situations when this is advantageous. In procedural languages, defunctionalization may be used to replace recursion with loops. In fact, the Kont data structure can be seen as a user-defined stack, especially if it’s rewritten as a list.

type Kont = [(String, Either Tree String)]

Here, Done was replaced with an empty list and Next and Conc together correspond to pushing a value on the stack.
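One possible way to spell out the list-based version is sketched below. The encoding of frames is an assumption: a Next frame is stored as (s, Left rgt) and a Conc frame as (s, Right ls), and the names Stack, applyStack, and showStack are hypothetical.

```haskell
data Tree = Leaf String
          | Node Tree String Tree

-- The continuation as an explicit stack of frames.
-- (s, Left rgt) : the right subtree rgt of a node labeled s is still pending.
-- (s, Right ls) : the left result ls is known, waiting for the right result.
type Stack = [(String, Either Tree String)]

applyStack :: Stack -> String -> String
-- Empty stack: nothing left to do, return the result.
applyStack [] s = s
-- Left subtree done: remember its result and traverse the right subtree.
applyStack ((s, Left rgt) : k) ls = showStack rgt ((s, Right ls) : k)
-- Right subtree done: concatenate and pass the result to the rest of the stack.
applyStack ((s, Right ls) : k) rs = applyStack k (ls ++ s ++ rs)

showStack :: Tree -> Stack -> String
showStack (Leaf s) k = applyStack k s
showStack (Node lft s rgt) k =
  showStack lft ((s, Left rgt) : k)
```

A traversal starts with the empty stack, as in showStack t [].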

In Haskell, the compiler performs tail recursion optimization, but defunctionalization may still be useful in implementing distributed systems, or web servers. Any time we need to pass a function between a client and a server, we can replace it by a data type that can be easily serialized.
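As a rough sketch of the serialization idea, the derived Show and Read instances can stand in for a real wire format. This choice is an assumption made for brevity, and encode and decode are hypothetical names; a production system would more likely use a dedicated format.

```haskell
data Tree = Leaf String
          | Node Tree String Tree
  deriving (Eq, Show, Read)

data Kont = Done
          | Next String Tree   Kont
          | Conc String String Kont
  deriving (Eq, Show, Read)

-- Stand-ins for a wire format, based on the derived instances.
encode :: Kont -> String
encode = show

-- Note: read is partial and fails at runtime on malformed input.
decode :: String -> Kont
decode = read

apply :: Kont -> String -> String
apply Done s = s
apply (Next s rgt k) ls = show4 rgt (Conc ls s k)
apply (Conc ls s k) rs  = apply k (ls ++ s ++ rs)

show4 :: Tree -> Kont -> String
show4 (Leaf s) k = apply k s
show4 (Node lft s rgt) k =
  show4 lft (Next s rgt k)
```

A continuation such as Next "4 " (Leaf "5 ") Done can be encoded on one machine, decoded on another, and resumed there with apply, which would not be possible with an opaque function value.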

## Bibliography

1. John C. Reynolds, Definitional Interpreters for Higher-Order Programming Languages
2. James Koppel, The Best Refactoring You’ve Never Heard Of.