We live in interesting times. For instance, we are witnessing several extinction events all at once. One of them is the massive extinction of species. The other is the extinction of jobs. Both are caused by advances in technology. As programmers, we might consider ourselves immune to the latter–after all, somebody will have to program these self-driving trucks that eliminate the need for drivers, or the diagnostic tools that eliminate the need for doctors. Eventually, though, even programming jobs will be automated. I can imagine the last programmer putting finishing touches on the program that will make his or her job redundant.

But before we get there, let’s consider which programming tasks are the first to go, and which stand the best chance of persisting the longest. Experience tells us that it’s the boring, menial jobs that get automated first. So any time you get bored with your work, take note: you are probably doing something that a computer could do better.

One such task is the implementation of user interfaces. All this code that’s behind various buttons, input fields, sliders, etc., is pretty much standard. Granted, you have to put a lot of effort into making the code portable across a myriad of platforms: various desktops, web browsers, phones, watches, fridges, etc. But that’s exactly the kind of expertise that is easily codified. If you find yourself doing copy-and-paste programming, watch out: your buddy computer can do it too. Work on generating UIs has already started; see, for instance, pix2code.

The design of user interfaces, as opposed to their implementation, will be more resistant to automation. Not only because it involves creativity, but also because it deals with human issues. Good design must serve the human in front of it. I’m not saying that modeling a human user is impossible, but it’s definitely harder. Of course, in many standard tasks, a drastically simplified human model will work just fine.

So I’m sorry to say it, but those programmers who specialize in HTML and JavaScript will have to retrain themselves.

The next job on the chopping block, in my opinion, is that of a human optimizer. In fact, the only reason it hasn’t been eliminated yet is economic. It’s still cheaper to hire people to optimize code than it is to invest in the necessary infrastructure. You might think that programmers are expensive–the salaries of programmers are quite respectable in comparison to other industries. But if this were true, a lot more effort would go into improving programmers’ productivity, in particular into creating better tools. This is not happening. But as the demand for software grows and AI gets cheaper, at some point the economic balance will change. It will become advantageous to use AI to optimize code.

I’m sorry to say it, but C and C++ programmers will have to go. These are the languages whose only raison d’être is to squeeze maximum performance from hardware. We’ll probably always be interested in performance, but there are other ways of achieving it. We are familiar with optimizing compilers that have virtually eliminated the job of the assembly language programmer. They use optimizers based on algorithmic principles–that is, methods that are understandable to humans. But there is a whole new generation of AI waiting in the wings, one that can be trained to optimize code written in higher-level languages. Imagine a system that would take this definition of quicksort, written in Haskell:

qsort [] = []
qsort (p:xs) = qsort lesser ++ [p] ++ qsort greater
    where (lesser, greater) = partition (< p) xs

and produce code that would run as fast as its hand-coded C counterpart. Even if you don’t know Haskell, I can explain this code to you in just a few sentences. The first line says that sorting an empty list produces an empty list. The second line defines the action of quicksort on a list that consists of a head p–that will be our pivot–and the tail xs. The result is the concatenation (the symbol ++) of three lists. The first one is the result of (recursively) sorting the list lesser, the second is the singleton list containing the pivot, and the third is the result of sorting the list greater. Finally, the pair of lists (lesser, greater) is produced by partitioning xs using the predicate (< p), which reads “less than p.” You can’t get any simpler than that.
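If you want to check it yourself, the only missing piece is an import of partition from Data.List; with that added, loading the file in GHCi gives, for example:

ghci> qsort [3,1,4,1,5,9,2,6]
[1,1,2,3,4,5,6,9]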

Of course, the transformation required to optimize this algorithm is highly nontrivial. Depending on the rest of the program, the AI might decide to change the representation of data from a list to a vector, replace copying with destructive swapping, put some effort into selecting a better pivot, use a different algorithm for sorting very short lists, and so on. This is what a human optimizer would do. But how much harder is this task than, say, playing a game of go against a grandmaster?
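Just to give a feel for the target of such a transformation, here is a rough sketch of what the first two of those changes might look like when written by hand, using mutable unboxed vectors from the vector package. The names qsortVec and go are mine, and the pivot selection and small-list handling are deliberately left naive:

import Control.Monad (when)
import Control.Monad.ST (runST)
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as M

-- In-place quicksort: a different data representation (unboxed vector)
-- and destructive swapping instead of copying.
qsortVec :: (Ord a, V.Unbox a) => V.Vector a -> V.Vector a
qsortVec v = runST $ do
  mv <- V.thaw v
  go mv 0 (M.length mv - 1)
  V.freeze mv
  where
    go mv lo hi = when (lo < hi) $ do
      p <- M.read mv hi                 -- naive pivot: the last element
      let loop i j
            | j >= hi   = do M.swap mv i hi; return i
            | otherwise = do
                x <- M.read mv j
                if x < p
                  then do M.swap mv i j; loop (i + 1) (j + 1)
                  else loop i (j + 1)
      i <- loop lo lo                   -- partition in place
      go mv lo (i - 1)
      go mv (i + 1) hi

Getting from the three-line specification to something like this, and knowing when it is worth doing, is precisely the optimizer’s job.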

I am immensely impressed with the progress companies like Google or IBM made in playing go, chess, and Jeopardy, but I keep asking myself, why don’t they invest all this effort in programming technology? I can’t help but see parallels with Ancient Greece. The Ancient Greeks made tremendous breakthroughs in philosophy and mathematics–just think about Plato, Socrates, Euclid, or Pythagoras–but they had no technology to speak of. Hero of Alexandria invented a steam engine, but it was never put to work. It was only used as a parlor trick. There are many explanations of this phenomenon, but one that strikes close to home is that the Greeks didn’t need technology because they had access to cheap labor through slavery. I’m not implying that programmers are treated like slaves–far from it–but they seem to be considered cheap labor. In fact it’s so cheap to produce software that most of it is given away for free, or for the price of users’ attention in ad-supported software. A lot of software is just bait that’s supposed to entice the user to buy something more valuable, like beer.

It’s gradually becoming clear that programming jobs are diverging. This is not yet reflected in salaries, but as the job market matures, some programming jobs will be eliminated while others will grow in demand. The one area where humans are still indispensable is in specifying what has to be done. The AI will eventually be able to implement any reasonable program, as long as it gets a precise enough specification. So the programmers of the future will stop telling the computer how to perform a given task; instead, they will specify what to do. In other words, declarative programming will overtake imperative programming. But I don’t think that explaining to the AI what it’s supposed to do will be easy. The AI will remain rather dumb, at least in the foreseeable future. It’s been noted that software that can beat the best go players in the world would be at a complete loss trying to prepare dinner or wash the dishes. It’s able to play go because it’s reasonably easy to codify the task of playing go–the legal moves and the goal of the game. Humans are extremely bad at expressing their wishes, as illustrated by the following story:

A poor starving peasant couple are granted three wishes and the woman, just taking the first thing that comes to her mind, wishes for one sausage, which she receives immediately. Her husband, pointing out that she could have wished for immense wealth or food to last them a lifetime, becomes angry with her for making such a stupid wish and, not thinking, wishes the sausage were stuck on her nose. Sure enough, the sausage is stuck in the middle of her face, and then they have to use the third wish to make it go away, upon which it disappears completely.

As long as the dumb AI is unable to guess our wishes, there will be a need to specify them using a precise language. We already have such a language; it’s called math. The advantage of math is that it was invented for humans, not for machines. It solves the basic problem of formalizing our thought process so it can be reliably transmitted and verified. The definition of quicksort in Haskell is very mathematical. It can be easily verified using induction, because it’s recursive and it operates on a recursive data structure: a list. The first line of code establishes the base case: an empty list is trivially sorted. Then we perform the induction step. We assume that we know how to sort all proper sublists of our list. We create two such sublists by partitioning the tail around the pivot. We sort the sublists, and then construct the final sorted list by inserting the pivot between them. As mathematical proofs go, this one is not particularly hard. In fact, in a typical mathematical text, it would be considered so trivial as to be left as an exercise for the reader.
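The specification behind that proof can also be written down inside Haskell itself and checked mechanically on random inputs. This is not a proof, but it gives a taste of what a machine-checkable specification looks like; a minimal sketch using the QuickCheck library (the property name is mine):

import Data.List (partition, sort)
import Test.QuickCheck (quickCheck)

qsort :: Ord a => [a] -> [a]
qsort [] = []
qsort (p:xs) = qsort lesser ++ [p] ++ qsort greater
  where (lesser, greater) = partition (< p) xs

-- The specification: qsort agrees with the reference sort from Data.List.
prop_qsortMatchesSort :: [Int] -> Bool
prop_qsortMatchesSort xs = qsort xs == sort xs

main :: IO ()
main = quickCheck prop_qsortMatchesSort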

Still, this kind of mathematical thinking seems to be alien to most people, including a lot of programmers. So why am I proposing it as the “programming language” of the future? Math is hard, but let’s consider the alternatives. Every programming language is a compromise between the human and the computer. There are languages that are “close to the metal,” like assembly or C, and there are languages that try to imitate natural language, like Cobol or SQL. But even in low-level languages we try to use meaningful names for variables and functions in an attempt to make code more readable. In fact, there are programs that deliberately obfuscate source code by removing the formatting and replacing names with gibberish. The result is unreadable to most humans, but makes no difference to computers. Mathematical language doesn’t have to be machine readable. It’s a language that was created by the people, for the people. The reason we find mathematical texts harder to read than, say, C++ code is that mathematicians work at a much higher level of abstraction. If we tried to express the same ideas in C++, we would very quickly get completely lost.

Let me give you a small example. In mathematics, a monad is defined as a monoid in the category of endofunctors. That’s a very succinct definition. In order to understand it, you have to internalize a whole tower of abstractions, one built on top of another. When we implement monads in Haskell, we don’t use that definition. We pick a particular, very simple category and implement only one aspect of the definition (we don’t implement the monadic laws). In C++, we don’t even do that. If there are any monads in C++, they are implemented ad hoc, and not as a general concept (an example is the future monad, which, to this day, remains incomplete).
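For comparison, here is roughly what that one aspect looks like in Haskell: just the two operations, with the monoid structure relegated to comments. The primed names are mine, to avoid clashing with the Monad class from the Prelude:

-- A stripped-down copy of the standard Monad class: operations only.
class Applicative m => Monad' m where
  return' :: a -> m a
  (>>>=)  :: m a -> (a -> m b) -> m b

-- The "monoid" part of the definition is not expressed in the code at all.
-- It survives only as laws the instance writer promises to uphold:
--   return' a >>>= f    =  f a                          (left identity)
--   m >>>= return'      =  m                            (right identity)
--   (m >>>= f) >>>= g   =  m >>>= (\x -> f x >>>= g)    (associativity)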

There is also some deeper math in the quicksort example. It’s a recursive function, and recursion is related to algebras and fixed points. A more elaborate version of quicksort decomposes it into more fundamental components. The recursion is captured in a combination of unfolding and folding called a hylomorphism. The unfolding is described by a coalgebra, while the folding is driven by an algebra.

{-# LANGUAGE DeriveFunctor #-}

import Data.List (partition)

-- The usual recursion-scheme vocabulary this snippet relies on:
-- an algebra folds one layer of f, a coalgebra unfolds one.
type Algebra f a = f a -> a
type Coalgebra f a = a -> f a

-- A hylomorphism: unfold with the coalgebra, then fold with the algebra.
hylo :: Functor f => Algebra f b -> Coalgebra f a -> a -> b
hylo alg coalg = alg . fmap (hylo alg coalg) . coalg

-- One level of a binary tree: a value of type a and two children of type r.
data TreeF a r = Leaf | Node a r r
  deriving Functor

split :: Ord a => Coalgebra (TreeF a) [a]
split [] = Leaf
split (a : as) = Node a l r
  where (l, r) = partition (< a) as

join :: Algebra (TreeF a) [a]
join Leaf = []
join (Node a l r) = l ++ [a] ++ r

qsort :: Ord a => [a] -> [a]
qsort = hylo join split

You might think that this representation is overkill. You may even use it in a conversation to impress your friends: "Quicksort is just a hylomorphism, what is the problem?" So how is it better than the original three-liner?

qsort [] = []
qsort (p:xs) = qsort lesser ++ [p] ++ qsort greater
    where (lesser, greater) = partition (< p) xs

The main difference is that the flow of control in this new implementation is driven by a data structure generated by the functor TreeF. This functor describes a binary tree in which every node holds a value of type a and has two children. We use those children in the unfolding process to store lists of elements: lesser ones on the left, greater (or equal) on the right. Then, in the folding process, these children are filled in again–this time with sorted lists. This may seem like an insignificant change, but it engages a different processing ability of our brains. The recursive function tells us a linear, one-dimensional story. It appeals to our story-telling ability. The functor-driven approach appeals to our visual cortex. There is an up and a down, a left and a right in the tree. Not only that, we can think of the algorithm in terms of movement, or animation. We are first "growing" the tree from the seed and then "traversing" it to gather the fruit from the branches. These are powerful metaphors.
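This growing-then-traversing picture can even be made explicit in the code. A hylomorphism fuses the two phases, but nothing stops us from splitting them apart and materializing the intermediate tree. Here is a sketch that assumes the TreeF, Algebra, Coalgebra, split, and join definitions from the listing above:

-- The fixed point of TreeF: an actual tree we can inspect.
newtype Fix f = Fix (f (Fix f))

-- Unfold: grow the tree from the seed.
ana :: Functor f => Coalgebra f a -> a -> Fix f
ana coalg = Fix . fmap (ana coalg) . coalg

-- Fold: traverse the tree and gather the fruit.
cata :: Functor f => Algebra f a -> Fix f -> a
cata alg (Fix t) = alg (fmap (cata alg) t)

-- The same qsort, with the intermediate tree made visible.
qsort' :: Ord a => [a] -> [a]
qsort' = cata join . ana split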

If this kind of visualization works for us, it may well work for the AI that will try to optimize our programs. It may also be able to access a knowledge base of similar algorithms based on recursion schemes and category theory.

I’m often asked by programmers: How is learning category theory going to help me in my everyday programming? The implication being that it’s not worth learning math if it can’t be immediately applied to your current job. This makes sense if you are trying to locally optimize your life. You are close to a local optimum of your utility function and you want to get even closer to it. But the utility function is not constant–it evolves in time. Local optima disappear. Category theory is an insurance policy against your current watering hole drying up.