Optics for the Working Mathematician

Category theory extracts the essence of structure and composition. At its foundation it deals with the composition of arrows. Building on composition of arrows it then goes on describing the ways objects can be composed: we have products, coproducts and, at a higher level, tensor products. They all describe various modes of composing objects. In monoidal categories any two objects can be composed.

Unlike composition, which can be described uniformly, decomposition requires case-by-case treatment. It’s easy to decompose a cartesian product using projections. A coproduct (sum) can be decomposed using pattern matching. A generic tensor product, on the other hand, has no standard means of decompositon.

Optics is the essence of decomposition. It answers the question of what it means to decompose a composite.

We consider an object decomposable when:

We can split it into the focus and the complement,
We can replace the focus with something else, without changing the complement, to get a new composite object,
We can zoom in; that is, if the focus is decomposable, we can compose the two decompositions,
It’s possible for the whole object to be the focus.

Let’s translate these requirements into the language of category theory. We’ll start with the standard example: the lens, which is the optic for decomposing cartesian products.

The splitting means that there is a morphism from the composite object $s$ to the product $c \times a$ , where $c$ is the complement and $a$ is the focus. This morphism is a member of the hom-set $\mathcal{C}(s, c \times a)$ .

To replace the focus we need another morphism that takes the same complement $c$ , combines it with the new focus $b$ to produce the new composite $t$ . This morphism is a member of the hom-set $\mathcal{C}(c \times b, t)$

Here’s the important observation: We don’t care what the complement is. We are “focusing” on the focus. We carry the complement over to combine it with the new focus, but we don’t use it for anything else. It’s a featureless black box.

To erase the identity of the complement, we hide it inside a coend. A coend is a generalization of a sum, so it is written using the integral sign (see the Appendix for details). Programmers know it as an existential type, logicians call it an existential quantifier. We say that there exists a complement $c$ , but we don’t care what it is. We “integrate” over all possible complements.

Here’s the existential definition of the lens:

$L(s, t; a, b) = \int^{c : \mathcal{C}} \mathcal{C}(s, c \times a) \times \mathcal{C}(c \times b, t)$

Just like we construct a coproduct using one of the injections, so the coend is constructed using one of (possibly infinite number of) injections. In our case we construct a lens $L(s, t; a, b)$ by injecting a pair of morphisms from the two hom-sets sharing the same $c$ . But once the lens is constructed, there is no way to extract the original $c$ from it.

It’s not immediately obvious that this representation of the lens reproduces the standard setter/getter form. However, in a cartesian closed category, we can use the currying adjunction to transform the second hom-set:

$\mathcal{C}(c \times b, t) \cong \mathcal{C}(c, [b, t])$

Here, $[b, t]$ is the internal hom, or the function object representing morphisms from $b$ to $t$ . We can then use the co-Yoneda lemma to reduce the coend:

$\int^{c : \mathcal{C}} \mathcal{C}(s, c \times a) \times \mathcal{C}(c, [b, t]) \cong \mathcal{C}(s, [b, t] \times a) \cong \mathcal{C}(s \times b, t) \times \mathcal{C}(s, a)$

The first part of this product is the setter: it takes the source object $s$ and the new focus $b$ to produce the new target $t$ . The second part is the getter that extracts the focus $a$ .

Even though all optics have similar form, each of them reduces differently.

Here’s another example: the prism. We just replace the product with the coproduct (sum).

$P(s, t; a, b) = \int^{c : \mathcal{C}} \mathcal{C}(s, c + a) \times \mathcal{C}(c + b, t)$

This time the reduction goes through the universal property of the coproduct: a mapping out of a sum is a product of mappings:

$\mathcal{C}(c + b, t) \cong\mathcal{C}(c, t) \times\mathcal{C}(b, t)$

Again, we use the co-Yoneda to reduce the coend:

$\int^{c : \mathcal{C}} \mathcal{C}(s, c + a) \times\mathcal{C}(c, t) \times\mathcal{C}(b, t) \cong\mathcal{C}(s, t + a) \times\mathcal{C}(b, t)$

The first one extracts the focus $a$ , if possible, otherwise it constructs a $t$ (by secretly injecting a $c$ ). The second constructs a $t$ by injecting a $b$ .

We can easily generalize existential optics to an arbitrary tensor product in a monoidal category:

$O(s, t; a, b) = \int^{c : \mathcal{C}} \mathcal{C}(s, c \otimes a) \times \mathcal{C}(c \otimes b, t)$

In general, though, this form cannot be further reduced using the co-Yoneda trick.

But what about the third requirement: the zooming-in property of optics? In the case of the lens and the prism it works because of associativity of the product and the sum. In fact it works for any tensor product. If you can decompose $s$ into $c \otimes a$ , and further decompose $a$ into $c' \otimes a'$ , then you can decompose $s$ into $(c \otimes c') \otimes a'$ . Zooming-in is made possible by the associativity of the tensor product.

Focusing on the whole object plays the role of the unit of zooming.

These two properties are used in the definition of the category of optics. The objects in this category are pairs of object in $\mathcal{C}$ . A morphism from a pair $\langle s, t \rangle$ to $\langle a, b \rangle$ is the optic $O(s, t; a, b)$ . Zooming-in is the composition of morphisms.

But this is still not the most general setting. The useful insight is that the multiplication (product) in a lens, and addition (coproduct) in a prism, look like examples of linear transformations, with the residue $c$ playing the role of a parameter. In fact, a composition of a lens with a prism produces a 2-parameter affine transformation, which also behaves like an optic. We can therefore generalize optics to work with an arbitrary monoidal action (first hinted in the discussion at the end of this blog post). Categories with such actions are known as actegories.

The idea is that you define a family of endofunctors $A_m$ in $\mathcal{C}$ that is parameterized by objects from a monoidal category $\mathcal{M}$ . So far we’ve only discussed examples where the parameters were taken from the same category $\mathcal{C}$ and the action was either multiplication or addition. But there are many examples in which $\mathcal{M}$ is not the same as $\mathcal{C}$ .

The zooming principles are satisfied if the action respects the tensor product in $\mathcal{M}$ :

$A_{m \otimes n} \cong A_m \circ A_n$

$A_1 \cong \mathit{Id}$

(Here, $1$ is the unit object with respect to the tensor product $\otimes$ in $\mathcal{M}$ , and $\mathit{Id}$ is the identity endofunctor.)

The actegorical version of the optic doesn’t deal directly with the residue. It tells us that the “unimportant” part of the composite object can be parameterized by some $m \colon \mathcal{M}$ .

This additional abstraction allows us to transport the residue between categories. It’s enough that we have one action $L_m$ in $\mathcal{C}$ and another $R_m$ in $\mathcal{D}$ to create this mixed optics (first introduced by Mitchell Riley):

$O(s, t; a, b) = \int^{m : \mathcal{M}} \mathcal{C}(s, L_m a) \times \mathcal{D}(R_m b, t)$

The separation of the focus from the complement using monoidal actions is reminiscent of what physicists call the distinction between “physical” and “gauge” degrees of freedom.

An in-depth presentation of optics, including their profunctor representation, is available in this paper.

Appendix: Coends and the Co-Yoneda Lemma

A coend is defined for a profunctor, that is a functor of two variables, one contravariant and one covariant, $p \colon \mathcal{C}^{op} \times \mathcal{C} \to \mathbf{Set}$ . It’s a cross between a coproduct and a trace, as it’s constructed using injections of diagonal elements (with some identifications):

$\iota_{a} \colon p \langle a, a \rangle \to \int^{c : \mathcal{C}} p \langle c, c \rangle$

Co-Yoneda lemma is the identity that works for any covariant functor (copresheaf) $F \colon \mathcal{C} \to \mathbf{Set}$ :

$\int^{c \colon \mathcal{C}} F(c) \times \mathcal{C}(c, x) \cong F(x)$

Jay Gray Says:

September 9, 2021 at 5:53 am

Bartosz, your articles are damn-difficult to read (:->), but I always learn something (although usually less than you expect/intend).
My request is that you relate your examples to some more conventional structures.
One we use, because it is widely used, is the JSON-LD notation and registration of ‘categories’ using different ontologies.
The most widely used ontology (although most users do not think of it as an ‘ontology’) is schema.org.
The core datatypes are immutable categories. We can translate your concepts to a combination of JSON-LD and schema.org (sprinkled with the addition of other RDF datasets like US NIH PubChem and EMBL ChEBI).
We can build complex expressions by linking categories (won’t go further on this point because I’m not trying to sell you an alternative).
Just a suggestion from a less sophisticated student who tries to apply your ideas about category theory to real-world applications.

Bartosz Milewski Says:

September 9, 2021 at 11:36 am

This particular article was written specifically for category theorists. I have several earlier posts explaining these thing for programmers. As far as databases are concerned, I know too little about it. You might want to look at the work of David Spivak and Brendan Fong.

Johan Lövgren Says:

September 15, 2021 at 6:03 am

Bartosz, I am curious about your physics analogy. If I understand you correctly, you are comparing the monoidal action parametrized by m to gauge transformations parametrized by some spacetime point x. And that we then “sum” over m just like we integrate over x in the action?

Are you suggesting gauge theory can be thought of as an optic? Or vice-versa? Or both? 🙂

Anyway I would be glad to hear you elaborate on the subject.

September 15, 2021 at 10:09 am

I’m comparing monoidal action to gauge transformation, say U(1) for electromagnetic field, or SU(3) color for strong interactions. A field is a section of a bundle whose base is the spacetime, and the fiber carries a representation of this group.

Bartosz Milewski's Programming Cafe