The main idea of functional programming is to treat functions like any other data types. In particular, we want to be able to pass functions as arguments to other functions, return them as values, and store them in data structures. But what kind of data type is a function? It’s a type that, when paired with another piece of data called the argument, can be passed to a function called *apply* to produce the result.

apply :: (a -> d, a) -> d

In practice, function application is implicit in the syntax of the language. But, as we will see, even if your language doesn’t support higher-order functions, all you need is to roll out your own *apply*.

But where do these function objects, arguments to `apply`

, come from; and how does the built-in `apply`

know what to do with them?

When you’re implementing a function, you are, in a sense, telling *apply* what to do with it–what code to execute. You’re implementing individual chunks of *apply*. These chunks are usually scattered all over your program, sometimes anonymously in the form of lambdas.

We’ll talk about program transformations that introduce more functions, replace anonymous functions with named ones, or turn some functions into data types, without changing program semantics. The main advantage of such transformations is that they may improve performance, sometimes drastically so; or support distributed computing.

## Function Objects

As usual, we look to category theory to provide theoretical foundation for defining function objects. It turns out that we are able to do functional programming because the category of types and functions is *cartesian closed*. The first part, *cartesian*, means that we can define product types. In Haskell, we have the pair type `(a, b)`

built into the language. Categorically, we would write it as . Product is functorial in both arguments so, in particular, we can define a functor

It’s really a family of functors that it parameterized by .

The right adjoint to this functor

*defines* the function type (a.k.a., the exponential object ). The existence of this adjunction is what makes a category *closed*. You may recognize these two functors as, respectively, the writer and the reader functor. When the parameter is restricted to monoids, the writer functor becomes a monad (the reader is already a monad).

An adjunction is defined as a (natural) isomorphism of hom-sets:

or, in our case of two endofunctors, for some fixed ,

In Haskell, this is just the definition of currying:

curry :: ((c, a) -> d) -> (c -> (a -> d)) uncurry :: (c -> (a -> d)) -> ((c, a) -> d)

You may recognize the counit of this adjunction

as our `apply`

function

## Adjoint Functor Theorem

In my previous blog post I discussed the Freyd’s Adjoint Functor theorem from the categorical perspective. Here, I’m going to try to give it a programming interpretation. Also, the original theorem was formulated in terms of finding the left adjoint to a given functor. Here, we are interested in finding the right adjoint to the product functor. This is not a problem, since every construction in category theory can be dualized by reversing the arrows. So instead of considering the comma category , we’ll work with the comma category . Its objects are pairs , in which is a morphism

.

This is the general picture but, in our case, we are dealing with a single category, and is an endofunctor. We can implement the objects of our comma category in Haskell

data Comma a d c = Comma c ((c, a) -> d)

The type `a`

is just a parameter, it parameterizes the (left) functor

and `d`

is the target object of the comma category.

We are trying to construct a function object representing functions `a->d`

, so what role does `c`

play in all of this? To understand that, you have to take into account that a function object can be used to describe closures: functions that capture values from their environment. The type `c`

represents those captured values. We’ll see this more explicitly later, when we talk about defunctionalizing closures.

Our comma category is a category of all closures that go from to while capturing all possible environments. The function object we are constructing is essentially a sum of all these closures, except that some of them are counted multiple times, so we need to perform some identifications. That’s what morphisms are for.

The morphisms of the comma category are morphisms in that make the following triangles in commute.

Unfortunately, commuting diagrams cannot be expressed in Haskell. The closest we can get is to say that a morphism from

c1 :: Comma a d c

to

c2 :: Comma a d c'

is a function `h :: c -> c'`

such that, if

c1 = Comma c f f :: (c, a) -> d c2 = Comma c' g g :: (c', a) -> d

then

f = g . bimap h id

Here, `bimap h id`

is the lifting of `h`

to the functor . More explicitly

f (c, x) = g (h c, x)

As we are interpreting `c`

as the environment in which the closure is defined, the question is: does `f`

use all of the information encoded in `c`

or just a part of it? If it’s just a part, then we can factor it out. For instance, consider a lambda that captures an integer, but it’s only interested in whether the integer is even or odd. We can replace this lambda with one that captures a Boolean, and use the function `even`

to transform the environment.

The next step in the construction is to define the projection functor from the comma category back to that forgets the part and just keeps the object

We use this functor to define a diagram in . Now, instead of taking its limit, as we did in the previous installment, we’ll take the colimit of this diagram. We’ll use this colimit to define the action of the right adjoint functor on .

In our case, the forgetful functor discards the function part of `Comma a d c`

, keeping only the environment . This means that, as long as `d`

is not `Void`

, we are dealing with a gigantic diagram that encompasses all objects in our category of types. The colimit of this diagram is a gigantic coproduct of everything, modulo identifications introduced by morphisms of the comma category. But these identifications are crucial in pruning out redundant closures. Every lambda that uses only part of the information from the captured environment can be identified with a simpler lambda that uses a simplified environment.

For illustration, consider a somewhat extreme case of constructing the function object , or ( to the power of the terminal object). This object should be isomorphic to . Let’s see how this works: The terminal object is the unit of the product, so

so the comma category is just the slice category of arrows to . It so happens that this category has the terminal object . The colimit of a diagram that has a terminal object is that terminal object. So, indeed, in this case, our construction produces a function object that is isomorphic to .

Intuitively, given a lambda that captures a value of type from the environment and returns a , we can trivially factor it out, using this lambda to transform the environment for to and then apply the identity on . The latter corresponds to the comma category object , and the forgetful functor maps it to .

It’s instructive to run a few more examples to get the hang of it. For instance, the function object `Bool->d`

can be constructed by considering closures of the type

f :: (c, Bool) -> d

Any such closure can be factorized by the following transformation of the environment

h :: c -> (d, d) h c = (f (c, True), f (c, False))

followed by

g :: ((d, d), Bool) -> d g ((d1, d2), b) = if b then d1 else d2

Indeed:

f (c, b) = g (h c, b)

In other words

where corresponds to the `Bool`

type.

## Counit

We are particularly interested in the counit of the adjunction. Its component at is a morphism

It also happens to be an object in the comma category, namely

.

In fact, it is the terminal object in that category. You can see that because for any other object there is a morphism that makes the following triangle commute:

This morphisms is a leg in the terminal cocone that defines . We know for sure that is in the base of that cocone, because it’s the projection of .

To get some insight into the construction of the function object, imagine that you can enumerate the set of all possible environments . The comma category would then consist of pairs . The coproduct of all those environments is a good candidate for the function object . Indeed, let’s try to define a counit for it:

I used the distributive law:

and the fact that the mapping out of a sum is the product of mappings. The right hand side can be constructed from the morphisms of the comma category.

So the object satisfies at least one requirement of the function object: there is an implementation of `apply`

for it. It is highly redundant, though. This is why, instead of the coproduct, we used the colimit in our construction of the function object. Also, we ignored the size issues.

## Size Issues

As we discussed before, this construction doesn’t work in general because of size issues: the comma category is not necessarily small, and the colimit might not exist.

To address this problems, we have previously defined small *solution sets*. In the case of the right adjoint, a solution set is a family of objects that is *weakly terminal* in . These are pairs that, among themselves, can factor out any

It means that we can always find an index and a morphism to satisfy that equation. Every might require a different and to factor through but, for any , we are guaranteed to always find a pair.

Once we have a complete solution set, the right adjoint is constructed by first forming a coproduct of all the and then using a coequalizer to construct one terminal object.

What is really interesting is that, in some cases, we can just use the coproduct of the solution set, to approximate the adjoint (thus skipping the equalizer part).

The idea is that, in a particular program, we don’t need to represent all possible function types, just a (small) subset of those. We are also not particularly worried about uniqueness: it’s no problem if the same function ends up with multiple syntactic representations.

Let’s reformulate Freyd’s construction of the function object in programming terms. The solution set is the set of types and functions

such that, for any function

that is of interest in our program (for instance, used as an argument to another function) there exists an and a function

such that can be rewritten as

In other words, every function of interest can be replaced by one of the solution-set functions. The environment for this standard function can be always extracted from the environment of the more general function.

## CPS Transformation

A particular application of higher order functions shows up in the context of continuation passing transformation. Let’s look at a simple example. We are going to implement a function that traverses a binary tree containing strings, and concatenates them all into one string. Here’s the tree

data Tree = Leaf String | Node Tree String Tree

Recursive traversal is pretty straightforward

show1 :: Tree -> String show1 (Leaf s) = s show1 (Node l s r) = show1 l ++ s ++ show1 r

We can test it on a small tree:

tree :: Tree tree = Node (Node (Leaf "1 ") "2 " (Leaf "3 ")) "4 " (Leaf "5 ")

test = show1 tree

There is just one problem: recursion consumes the runtime stack, which is usually a limited resource. Your program may run out of stack space resulting in the “stack overflow” runtime error. This is why the compiler will turn recursion into iteration, whenever possible. And it is always possible if the function is *tail recursive*, that is, the recursive call is the *last call* in the function. No operation on the result of the recursive call is permitted in a tail recursive function.

This is clearly not happening in our implementation of `show1`

: After the recursive call is made to traverse the left subtree, we still have to make another call to traverse the right tree, and the two results must be concatenated with the contents of the node.

Notice that this is not just a functional programming problem. In an imperative language, where iteration is the rule, tree traversal is still implemented using recursion. That’s because the data structure itself is recursive. It used to be a common interview question to implement non-recursive tree traversal, but the solution is always to explicitly implement your own stack (we’ll see how it’s done at the end of this post).

There is a standard procedure to make functions tail recursive using continuation passing style (CPS). The idea is simple: if there is stuff to do with the result of a function call, let the function we’re calling do it instead. This “stuff to do” is called a continuation. The function we are calling takes the continuation as an argument and, when it finishes its job, it calls it with the result. A continuation is a function, so CPS-transformed functions have to be higher-order: they must accept functions as arguments. Often, the continuations are defined on the spot using lambdas.

Here’s the CPS transformed tree traversal. Instead of returning a string, it accepts a continuation `k`

, a function that takes a string and produces the final result of type `a`

.

show2 :: Tree -> (String -> a) -> a show2 (Leaf s) k = k s show2 (Node lft s rgt) k = show2 lft (\ls -> show2 rgt (\rs -> k (ls ++ s ++ rs)))

If the tree is just a leaf, `show2`

calls the continuation with the string that’s stored in the leaf.

If the tree is a node, `show2`

calls itself recursively to convert the left child `lft`

. This is a tail call, nothing more is done with its result. Instead, the rest of the work is packaged into a lambda and passed as a continuation to `show2`

. This is the lambda

\ls -> show2 rgt (\rs -> k (ls ++ s ++ rs))

This lambda will be called with the result of traversing the left child. It will then call `show2`

with the right child and another lambda

\rs -> k (ls ++ s ++ rs)

Again, this is a tail call. This lambda expects the string that is the result of traversing the right child. It concatenates the left string, the string from the current node, and the right string, and calls the original continuation `k`

with it.

Finally, to convert the whole tree `t`

, we call `show2`

with a trivial continuation that accepts the final result and immediately returns it.

show t = show2 t (\x -> x)

There is nothing special about lambdas as continuations. It’s possible to replace them with named functions. The difference is that a lambda can implicitly capture values from its environment. A named function must capture them explicitly. The three lambdas we used in our CPS-transformed traversal can be replaced with three named functions, each taking an additional argument representing the values captured from the environment:

done s = s next (s, rgt, k) ls = show3 rgt (conc (ls, s, k)) conc (ls, s, k) rs = k (ls ++ s ++ rs)

The first function `done`

is an identity function, it forces the generic type `a`

to be narrowed down to `String`

.

Here’s the modified traversal using named functions and explicit captures.

show3 :: Tree -> (String -> a) -> a show3 (Leaf s) k = k s show3 (Node lft s rgt) k = show3 lft (next (s, rgt, k)) show t = show3 t done

We can now start making the connection with the earlier discussion of the adjoint theorem. The three functions we have just defined, `done`

, `next`

, and `conc`

, form the family

.

They are functions of two arguments, or a pair of arguments. The first argument represents the object , part of the solution set. It corresponds to the environment captured by the closure. The three are, respectively

() (String, Tree, String -> String) (String, String, String->String)

(Notice the empty environment of `done`

, here represented as the unit type `()`

.)

The second argument of all three functions is of the type `String`

, and the return type is also `String`

so, according to Freyd’s theorem, we are in the process of defining the function object , where is `String`

and is `String`

.

## Defunctionalization

Here’s the interesting part: instead of defining the general function type `String->String`

, we can approximate it with the coproduct of the elements of the solution set. Here, the three components of the sum type correspond to the environments captured by our three functions.

data Kont = Done | Next String Tree Kont | Conc String String Kont

The counit of the adjunction is approximated by a function from this sum type paired with a `String`

, returning a `String`

apply :: Kont -> String -> String apply Done s = s apply (Next s rgt k) ls = show4 rgt (Conc ls s k) apply (Conc ls s k) rs = apply k (ls ++ s ++ rs)

Rather than passing one of the three functions to our higher-order CPS traversal, we can pass this sum type

show4 :: Tree -> Kont -> String show4 (Leaf s) k = apply k s show4 (Node lft s rgt) k = show4 lft (Next s rgt k)

This is how we execute it

show t = show4 t Done

We have gotten rid of all higher-order functions by replacing their function arguments with a data type equipped with the `apply`

function. There are several situations when this is advantageous. In procedural languages, defunctionalization may be used to replace recursion with loops. In fact, the `Kont`

data structure can be seen as a user-defined stack, especially if it’s rewritten as a list.

type Kont = [(String, Either Tree String)]

Here, `Done`

was replaced with an empty list and `Next`

and `Conc`

together correspond to *pushing* a value on the stack.

In Haskell, the compiler performs tail recursion optimization, but defunctionalization may still be useful in implementing distributed systems, or web servers. Any time we need to pass a function between a client and a server, we can replace it by a data type that can be easily serialized.

## Bibliography

- John C. Reynolds, Definitional Interpreters for Higher-Order Programming Languages
- James Koppel, The Best Refactoring You’ve Never Heard Of.