The Functional Revolution in C++

June 9, 2014

The Functional Revolution in C++

Posted by Bartosz Milewski under C++, Concurrency, Functional Programming, Multicore, Multithreading, Parallelism, Programming
[48] Comments

C++ is like an oil tanker — it takes a long time for it to change course. The turbulent reefs towards which C++ has been heading were spotted on the horizon more than ten years ago. I’m talking, of course, about the end of smooth sailing under the Moore’s law and the arrival of the Multicore. It took six years to acknowledge the existence of concurrency in the C++11 Standard, but that’s only the beginning. It’s becoming more and more obvious that a major paradigm shift is needed if C++ is to remain relevant in the new era.

Why do we need a new paradigm to deal with concurrency? Can’t we use object oriented programming with small modifications? The answer to this question goes to the heart of programming: it’s about composability. We humans solve complex problems by splitting them into smaller subproblems. This is a recursive process, we split subproblems into still smaller pieces, and so on. Eventually we reach the size of the problem which can be easily translated into computer code. We then have to compose all these partial solutions into larger programs.

The key to composability is being able to hide complexity at each level. This is why object oriented programming has been so successful. When you’re implementing an object, you have to deal with its internals, with state transitions, intermediate states, etc. But once the object is implemented, all you see is the interface. The interface must be simpler than the implementation for object oriented programming to make sense. You compose larger objects from smaller objects based on their interfaces, not the details of their implementation. That’s how object oriented programming solves the problem of complexity.

Unfortunately, objects don’t compose in the presence of concurrency. They hide the wrong kind of things. They hide sharing and mutation. Let me quote the definition of data race: Two or more threads accessing the same piece of memory at the same time, at least one of them writing. In other words: Sharing + Mutation = Data Race. Nothing in the object’s interface informs you about the possibility of sharing and mutation inside the object’s implementation. Each object in isolation may be data-race-free but their composition may inadvertently introduce data races. And you won’t know about it unless you study the details of their implementation down to every single memory access.

In Java, an attempt had been made to mollify this problem: Every object is equipped with a mutex that can be invoked by declaring the method synchronized. This is not a scalable solution. Even Java’s clever thin lock implementation incurs non-negligible performance overhead, so it is used only when the programmer is well aware of potential races, which requires deep insight into the implementation of all subobjects, exactly the thing we are trying to avoid.

More importantly, locking itself doesn’t compose. There’s a classic example of a locked bank account whose deposit and withdraw methods are synchronized by a lock. The problem occurs when one tries to transfer money from one account to another. Without exposing the locks, it’s impossible to avoid a transient state in which the funds have already left one account but haven’t reached the second. With locks exposed, one may try to hold both locks during the transfer, but that creates a real potential for deadlocks. (Software Transactional Memory provides a composable solution to this problem, but there are no practical implementations of STM outside of Haskell and Clojure.)

Moreover, if we are interested in taking advantage of multicores to improve performance, the use of locks is a non-starter. Eking out parallel performance is hard enough without locks, given all the overheads of thread management and the Amdahl’s law. Parallelism requires a drastically different approach.

Since the central problem of concurrency is the conflict between sharing and mutation, the solution is to control these two aspects of programming. We can do mutation to our heart’s content as long as there’s no sharing. For instance, we can mutate local variables; or we can ensure unique ownership by making deep copies, using move semantics, or by employing unique_ptrs. Unique ownership plays very important role in message passing, allowing large amounts of data to be passed cheaply between threads.

However, the key to multicore programming is controlling mutation. This is why functional languages have been steadily gaining ground in concurrency and parallelism. In a nutshell, functional programmers have found a way to program using what, to all intents and purposes, looks like immutable data. An imperative programmer, when faced with immutability, is as confused as a barbecue cook in a vegetarian kitchen. And the truth is that virtually all data structures from the C++ standard library are unsuitable for this kind of programming — the standard vector being the worst offender. A continuous slab of memory is perfect for random or sequential access, but the moment mutation is involved, you can’t share it between threads. Of course, you can use a mutex to lock the whole vector every time you access it, but as I explained already, you can forget about performance and composability of such a solution.

The trick with functional data structures is that they appear immutable, and therefore require no synchronization when accessed from multiple threads. Mutation is replaced by construction: you construct a new object that’s a clone of the source object but with the requested modification in place. Obviously, if you tried to do this with a vector, you’d end up with a lot of copying. But functional data structures are designed for maximum sharing of representation. So a clone of a functional object will share most of its data with the original, and only record a small delta. The sharing is totally transparent since the originals are guaranteed to be immutable.

A singly-linked list is a classical, if not somewhat trivial, example of such a data structure. Adding an element to the front of a list requires only the creation of a single node to store the new value and a pointer to the original (immutable) list. There are also many tree-like data structures that are logarithmically cheap to clone-mutate (red-black trees, leftist heaps). Parallel algorithms are easy to implement with functional data structures, since the programmer doesn’t have to worry about synchronization.

Functional data structures, also known as “persistent” data structures, are naturally composable. This follows from the composability of immutable data — you can build larger immutable objects from smaller immutable objects. But there’s more to it: This new way of mutating by construction also composes well. A composite persistent object can be clone-mutated by clone-mutating only the objects on the path to the mutation; everything else can be safely shared.

Concurrency also introduces nonstandard flows of control. In general, things don’t progress sequentially. Programmers have to deal with inversion of control, jumping from handler to handler, keeping track of shared mutable state, etc. Again, in functional programming this is nothing unusual. Functions are first class citizens and they can be composed in many ways. A handler is nothing but a continuation in the continuation passing style. Continuations do compose, albeit in ways that are not familiar to imperative programmers. Functional programmers have a powerful compositional tool called a monad that, among other things, can linearize inverted flow of control. The design of libraries for concurrent programming makes much more sense once you understand that.

A paradigm shift towards functional programming is unavoidable and I’m glad to report that there’s a growing awareness of that new trend among C++ programmers. I used to be the odd guy talking about Haskell and monads at C++ meetings and conferences. This is no longer so. There was a sea change at this year’s C++Now. The cool kids were all talking about functional programming, and the presentation “Functional Data Structures in C++” earned me the most inspiring session award. I take it as a sign that the C++ community is ready for a big change.

48 Responses to “The Functional Revolution in C++”

kjellkod Says:

June 9, 2014 at 12:02 pm
Congrats on the award 🙂

Do you know if it will be video-casted somewhere so that we, who could not attend, can watch it?
Bartosz Milewski Says:

June 9, 2014 at 12:29 pm
Thanks! The talk has been recorded, so I assume it should be available online soon.
bhall Says:

June 9, 2014 at 12:56 pm
Re: “there are no practical implementations of STM outside of Haskell”

I’ve been using a home-grown STM system in C++ for a few years now with great results. Granted, I doubt that our implementation would be considered practical for general use. I’ll hopefully being talking about it at CppCon, still waiting to hear if my talk was accepted and if its not I’ll eventually get around to blogging about my experiences. But to go with what you’re saying here: even with STM in the mix the use of functional data structures is still a necessity to maintain composaiblity (but that might be due to the brain-dead STM implementation that we use in order to avoid transacting “too much” and running into performance problems).
Kevin Cameron Says:

June 9, 2014 at 1:45 pm
Concurrency is an old prblem that has been tackled before, and I’m not seeing how functional programming helps. The well known methodology for constructing parallel machines is the synchronous FSM methodology used with Verilog & VHDL, and it’s not hard to bump that up to asynchronous and do it with C++ (http://parallel.cc). The second choice would be something like Erlang that has communication as a first class construct.

Functional programming implies a machine that can handle a relatively deep call stack, or a compiler that knows how to collapse the code – that’ll mostly be tail-end recursion, but that’s awlward across libraries.

STM (like many things) should never appear at the user level (IMO).

PS: there was a company in the UK in the 1980s taking a functional programming approach to hardware design. AFAIK the hardware piece didn’t fly, but it may have morphed into this – http://www.spark-2014.org/about
Bartosz Milewski Says:

June 9, 2014 at 1:55 pm
I’m a big fan of STM. I even proposed it for the D language when I worked on it with Andrei and Walter. But it’s an uphill battle for a non-functional language, mostly because the type system doesn’t provide strong enough guarantees. Haskell can afford some amazing optimizations in the implementation of STM because it can rely on the purity of functions. In Haskell you also know exactly which mutations have to be transacted, whereas in C++ you have to instrument all writes. You end up needing two versions of each function, one with write guards for use in transactions, and one without. It’s all possible but at the cost of performance and robustness.
Nathan Says:

June 9, 2014 at 2:03 pm
I’m not familiar with Clojure, but I believe it has an STM implementation as well:
http://clojure.org/concurrent_programming

Thanks for the great read!
Bartosz Milewski Says:

June 9, 2014 at 2:20 pm
@Nathan: You’re right, I fixed it.
Kevin Cameron Says:

June 9, 2014 at 3:59 pm
A problem with programming languages in general is that people write in features of hardware and lose useful abstraction. STM is one of those things probably added to help with keeping big databases consistent, much like the assembly level calls used to build mutexes, just using them raw isn’t a good idea since that then usually ties future hardware into using the same methodology.

Functional programming is not really a general solution for programming since plenty of applications deal with large quantities of mutable data, in those cases an OO/FSM approach seems like a better choice to me. It certainly doesn’t map happily into DSP, GP-GPU or FPGA, and doesn’t look anything like how the Internet/Cloud computing works.
Bartosz Milewski Says:

June 9, 2014 at 4:06 pm
Kevin, I think you have it backwards. Haskell offers some of the most advanced solutions to DSP, GP-GPU and FPGA programming. It’s a common misunderstanding that functional languages don’t deal well with mutation. Functional languages give you the tools to control, not eliminate, mutation.
nosenseetal Says:

June 9, 2014 at 4:11 pm
a) afaik C++ has a way to lock all the mutexes properly without lock inversion deadlock
http://en.cppreference.com/w/cpp/thread/lock
b) “Moreover, if we are interested in taking advantage of multicores to improve performance, the use of locks is a non-starter”

“A continuous slab of memory is perfect for random or sequential access, but the moment mutation is involved, you can’t share it between threads. Of course, you can use a mutex to lock the whole vector every time you access it, but as I explained already, you can forget about performance and composability of such a solution. ”

It all depends on your usage pattern. This generalizations are fanboy talk.

c)”Nothing in the object’s interface informs you about the possibility of sharing and mutation inside the object’s implementation.”
Its hard to tell you this, but you dont know blank and blank
http://channel9.msdn.com/posts/C-and-Beyond-2012-Herb-Sutter-You-dont-know-blank-and-blank
😛

also Im really curious how those magical immutable lists deal with freeing of resources without locks:
I mean how do they know when part of the list is no longer needed(number of users drops to 0) if they dont use shared_ptr like atomic wizardry? Ofc really “smart” solution is to collect the garbage when program exits, aka to never reclaim allocated resources.

Please note: this is nothing personal against you and a lot of your posts are usable and informative(I watched your YT cpp lectures and I hope cppnow gets released soon 😀 ), but when you turn into FP PR zealot is gets kind of ridiculous… like 2013 is the year of Haskell post…
I mean multicore CPUs have been here for 10(+?) years, Haskell people have been claiming that Haskell is gonna take over the world for the previous 10 years because you know you cant deal with concurrency with imperative programming…
reminds me(iirc) of quicksort guy that used to claim for decades that sw hit the limit and only cure is more cowbell, I mean more formal verification. And in the meantime XP source looks like a tiny codebase by todays standards. 😀
Bartosz Milewski Says:

June 9, 2014 at 6:25 pm
@nosenseetal: I let my enthusiasm show through my posts — guilty as charged — but I’m also trying very hard to present solid arguments. Before this post there were several earlier ones dealing with lots of practical details including the discussion of memory management, synchronization, and reference counting. You might be interested in those (see links embedded in here).
Kevin Cameron Says:

June 9, 2014 at 7:10 pm
@Bartosz – I’ll give you that with a heap of analysis you can turn functional code into other forms, but there’s a bunch of machine scenarios where a (deep) call stack is something you want to avoid.

The main problem I have is that it hides all the communication in a way that makes it difficult to talk about. In HW design (and parallel processing) it becomes important at some point to put constraints on communication paths to balance up timing. It’s also hard to tell which things should be persistent constructs – e.g. GP-GPUs are difficult to feed and like to do stream processing, and doing that from a functional description leaves a lot of work to the compiler. I’m fairly confident the LLVM guys aren’t working on that, but I’ll ask them the next time I see them.

Of course if you don’t care about performance it won’t matter 😉
Phil Miller Says:

June 9, 2014 at 7:57 pm
Alternately, the general computing crowd can follow what people doing larger-scale parallel computing for decades have done, with quite a lot of success, and control or eliminate sharing rather than mutation. The notion of who (which process or which object, depending on the model) owns some data becomes a first-class consideration. Algorithms are developed and programs architected to address this head-on.

To programmers trained on imperative, procedural, and/or common object-oriented paradigms, messaging-based computing is as much a new way of thinking as relatively pure functional programming. The difference is that it has an established track record for parallel and concurrent computing.

For readers not familiar with the realm of parallel computing, the heaviest hitters are MPI, Charm++, Globals Arrays, Unified Parallel C, and Coarray Fortran (now just ‘Fortran’, since the 2008 standard). Any of these can sensibly be used even on a single multicore node, and the algorithms they inspire generally perform better than shared-state algorithms. The strongest factor in this is that the focus on locality naturally engenders cache-friendly code as well.
Phil Miller Says:

June 9, 2014 at 8:19 pm
@Kevin: If you don’t care about performance, then you shouldn’t be worrying about parallelism anyway. Just write the code the easiest way that works, and move on. The only reason parallelism matters is for performance.
Bartosz Milewski Says:

June 9, 2014 at 8:28 pm
@Phil: I don’t see message passing as the opposite or even as the alternative to functional programming. Message passing fits perfectly inside the functional paradigm. To a very good approximation Erlang, the epitome of message passing, is a functional language. There are several message-passing libraries in Haskell, including Cloud Haskell, which virtually implements Erlang inside Haskell.
Kirit Says:

June 9, 2014 at 8:42 pm
A few years ago I wrote a persistent JSON implementation that we’ve been using to very good effect since. We use it for inter-thread communication, asynchronous logging and we’ve also built a thread safe transactional JSON database on it suitable for small data blobs (up to a meg or two). It was surprisingly simple to achieve on top of shared_ptr and boost::variant.

The JSON structure is here https://github.com/KayEss/fost-base/blob/master/Cpp%2Finclude%2Ffost%2Fjson-core.hpp and the mutator is here https://github.com/KayEss/fost-base/blob/master/Cpp%2Finclude%2Ffost%2Fjson.hpp
Kev Says:

June 9, 2014 at 9:29 pm
@nosenseetal – watched the Sutter video: there’s a lot of “should” in there (i.e. it’s not enforced).

The other reason a lot of the C/C++ discussion is somewhat disingenuous is that there is no mention in the language about caching policy in the hardware. Really you only get predictable behavior if you stay in the same core or on a given cache line – i.e. just because you grabbed the mutex on one line doesn’t mean the other core flushed its cache copy of the rest of the object.

FP kinda avoids those problems by being more of a copy-in/copy-out paradigm, but it’s just not mirroring an efficient hardware methodology. Also a fork/join construct sort of breaks the basic premise, and maybe I want to make void/REST calls that never return.

Put another way: C/C++ is great on a sequential processor, FP is great on a stack machine, both suck at SMP because SMP is a bad architecture and neither approach has sensible handling for concurrency, and most non-SMP/Von-Neumann machines/systems just aren’t friendly to FP.

FP does nothing to get around this – http://en.wikipedia.org/wiki/Von_Neumann_architecture#Von_Neumann_bottleneck
Arnold deVos (@a4dev) Says:

June 10, 2014 at 1:22 am
A nicely reasoned post!
Rob Grainger Says:

June 10, 2014 at 5:25 am
And of course, Alan Kay always argued that the he wished he’d coined the phrase message-based programming rather than object-oriented programming. In recent papers he definitely seems aware of messages in these terms, as opposed to a synonym for a polymorphic call. Indeed, early Smalltalks (pre ST-80) used the Actor model of message-passing.
Adi Oltean Says:

June 10, 2014 at 2:57 pm
@Kevin – regarding functional programming in HW, have you looked at Bluespec? Bluespec is derived from Haskell, essentially.

Anong other things, it inherits a STM-like concept that allows expressing “transactional” state transitions in a much simpler fashion than writing similar FSM-based code in RTLs such as Verilog or VHDL. So the Bluespec code you have to write is much cleaner than the associated Verilog/VHDL code and much simpler to verify. Most importantly, the generated RTL code from Bluespec is typically very close “to the metal” (both in terms of consumed silicon area size/# of gates and performance) than highly-optimized, hand-tuned Verilog code.

You should ask around for people exposed to Bluespec – youll see that typically once they cross the bridge they are never going back to Verilog 🙂
Kevin Cameron Says:

June 10, 2014 at 4:30 pm
@Adi – I have looked at Bluespec in the past, but my own focus is getting to an asynchronous FSM methodology that is SW/programmer friendly rather than just a better HDL. I spend my days trying to fix Verilog issues, and I’m quite keen to kill it off 😉

At a glance Bluespec seems to suffer from the same problem as SystemC that it came out of an RTL ecosystem and failed to drop the clocks. Really you want a self/data timed representation where you just add constraints about delivery time and let the tools work out how to clock it as part of synthesis.

CSP – http://usingcsp.com – was the paradigm behind Occam and the Transputer. It’s quite good for a low level description too, but I like that it scales up to larger things like IoT (everyone’s favorite TLA at the moment). It’s essentially a subset of the asynchronous FSM approach.

BTW: the Cell processor (PS3) was a good candidate for a CSP approach, and not for FP because it had no traps for running off the call stack on the coprocessors (among other things).
C++的函数式革命 | 我爱互联网 Says:

June 11, 2014 at 5:53 am
[…] 　　英文原文：The Functional Revolution in C++ […]
Chris Green Says:

June 11, 2014 at 7:57 am
there are no practical implementations of STM outside of Haskell and Clojure
What about the intel transactional memory extensions? Admittedly they could be said to not strictly compose because of limitations about the amount of memory that can be touched inside a transaction without causing an abort, but within those limitations it seems pretty solid. At a bare minimum, it greatly increases the set of atomic primitives that can be used (atomically update several different memory locations? no problem).
Bartosz Milewski Says:

June 11, 2014 at 12:42 pm
@Chris: Strictly speaking, this is HTM: Hardware Transactional Memory, but it can be used to improve the implementation of STM. Personally, I have big hopes about this technology.
Adi Oltean Says:

June 11, 2014 at 12:49 pm
@Kevin – >>> that it came out of an RTL ecosystem and failed to drop the clocks.

What do you mean? You don’t need explicit clock “wires” anymore in Bluespec.
Kevin Cameron Says:

June 11, 2014 at 1:56 pm
@Adi – there’s specific mention of “cycle” in the docs (as with SystemC), so I’m assuming Bluespec is still more synchronous than asynchronous underneath. However, my issue is more that it’s a hardware centric view of the world, rather than a software centric view with a route to hardware.

Bluespec scores ~ 380 hits on LinkedIn, while Angular-JS (which is sort of asynchronous/event-driven) scores ~ 13000. Bluespec may be the best thing since sliced bread, but I can’t see it being a winning paradigm.

That’s why prefer to pick a language that most people use, do some extensions, then work in a subset – that’s easier than persuading everyone to use a new language. Most “new” languages are 90%+ semantically equivalent to old languages with new syntax – e.g. what does Python add that you can’t do in C++?
Mutually Excluded Composibility | Backwards Incompatibilities Says:

June 11, 2014 at 9:37 pm
[…] other day there was a post over at Bartosz Milewski’s Programming Cafe about the recent upsurge in interest in functional programming among the C++ community. There was […]
Adi Oltean Says:

June 12, 2014 at 7:04 pm
(I have written a much longer comment but the internet ate it. Let’s hope this one survives)

@Kevin: >>> there’s specific mention of “cycle” in the docs (as with SystemC)

I thought we were talking about clocks and clock wires? 🙂 Anyway, the clock concept is implicit in Bluespec (not explicit like in RTL languages such as VHDL/Verilog) and it is used purely as a technique to order conflicting transactions.

Here is a good reference on how Bluespec works: http://csg.csail.mit.edu/6.S078/6_S078_2012_www/resources/bsv_by_example.pdf

>> Bluespec scores ~ 380 hits on LinkedIn, while Angular-JS (which is sort of asynchronous/event-driven) scores ~ 13000.

Not sure why this is relevant. BTW, I did that search and Angular.JS seems a JavaScript-specific stuff, i.e. the software world (as can be seen from the first result in Linkedin for the search {Angular-JS}. Let’s not compare apples with oranges.

Anyway, my point isn’t that Bluespec is the answer to everything but that its underlying technique (Haskell-style functional programming + STM) can be a better alternative for HW design. So I am still waiting for an open source Bluespec-like simulation/synthesis environment …

>> However, my issue is more that it’s a hardware centric view of the world, rather than a software centric view with a route to hardware.

Any HW engineer will argue that hardware is hardware and can’t be constrained in software-style execution models. Any time you are getting away from the full capacity of using as many gates/circuits as needed in parallel (toward sequential execution of instructions) your abstraction loses power. I.e. your design will consume more gates, more power and will be slower than something that allows the designer to get “close to the metal”.
Adi Oltean Says:

June 12, 2014 at 7:07 pm
(BTW – see section 1.3 from the link I pasted above – it addresses some of the points in this discussion)
Kevin Cameron Says:

June 12, 2014 at 11:42 pm
@Adi –

>Any HW engineer will argue that hardware is hardware and can’t be constrained in software-style execution models. Any time you are getting away from the full capacity of using as many gates/circuits as needed in parallel (toward sequential execution of instructions) your abstraction loses power. I.e. your design will consume more gates, more power and will be slower than something that allows the designer to get “close to the metal”.

I would argue that our current approach to software is a thin abstraction of a Von-Neumann architecture that only works for certain scenarios. If I give you a machine with 10000 cores and NUMA you want something that looks a lot like hardware design (minus the clocks) as your software methodology. But hey, if you can persuade the Angular JS guys that Bluespec is a better solution I’m not going to stop you 😉

On clocks: Verilog/VHDL are languages that support a hardware modelling event-driven paradigm where clocks are like any other signal on a wire (RTL is just a programming style). Bluespec & SystemC don’t do hardware style modelling of wires but expect data transfers to be synchronized on some implicit clock. Higher level event driven paradigms just react to data arriving and have no concept of clocks, which is what you want as a programmer – clocks are an aspect of implementation and shouldn’t be in the algorithmic spec.

As someone well familiar with RTL: it’s a horrible abstraction level to work at, you want to be higher without the clocks, or lower with real circuit behavior. I expect it to be killed off by formal verification in the not too distant future.

Put another way: we need to program communications (SDN) and we need to design hardware (HDLs for FPGA & ASIC), why can’t we use the same methodology for that and many-core software? Whatever that methodology is I’m confident it doesn’t look like FP.
Adi Oltean Says:

June 13, 2014 at 1:32 pm
I would argue that our current approach to software is a thin abstraction of a Von-Neumann architecture that only works for certain scenarios. If I give you a machine with 10000 cores and NUMA you want something that looks a lot like hardware design (minus the clocks) as your software methodology.

Not really. First, 10,000 cores is a tiny amount compared with the large number of “building blocks” in a moderately-sized silicon chip. Second, a “core” is fairly limited in executing sequential instructions. IMHO, what you want might work on a GPU but it will have limited expressiveness in designing something to be laid out in an ASIC or FPGA.

Sure, it may work in some paradigms (dataflow centric). Altera makes progress on their OpenCL/FPGA which enables a similar programming model as the original OpenCL/GPU but IMHO this is an generic approach that won’t work efficiently for any problem where FPGAs are involved.

But hey, if you can persuade the Angular JS guys that Bluespec is a better solution I’m not going to stop you 😉

Not really seeing your point here. Bluespec is specifically designed for hardware design, not for the type of (software) problems that Angular.JS is designed to solve. You were the one that brought Angular.js in discussion so, it seems you agree with me that, it is not relevant in the context of HW design and you are comparing apples with oranges 🙂

Higher level event driven paradigms just react to data arriving and have no concept of clocks,

Seems you are proposing a novel HW design paradigm. Can you show me some real examples of this actually working in practice in a large-scale chip shipping in volume or are you just talking about some hypothetical lines of research? For one thing, having clocks (even implicitly) enables critical things like static timing verification that actually scales to millions of assembled modules and billions of gates. And this is usually the critical part in designing a large-scale ASIC.

IMHO, it is much simpler/practical to analyze an arbitrary number of HW building blocks/modules operating independently each within their clock domain and bound through async FIFOs as opposed to do a grand design truly based on asynchronous … latches or something like this.

I’m not saying that something like this won’t work, but you need to start with the proper toolset support from industrial-quality vendors (which is missing at the moment)
Kevin Cameron Says:

June 13, 2014 at 3:03 pm
@Adi –

Seems you are proposing a novel HW design paradigm.

Yep (although not that novel). EDA needs to move up from RTL, and programming needs to acknowledge that old sequential programming styles don’t work for many-core (or IoT etc.). My view is that an asynchronous FSM (CSP-like) methodology fulfils that requirement. Just write your code with as many threads/FSMs as you can and let the compilers deal with getting it onto your platform (serializing parallel code is easier than parallelizing serial code).

Here are some target software platforms –

http://www.tilera.com/
http://www.parallella.org/

Click to access 3-NASCUG20-ManyCoreVP-Victoria.pdf

http://www.kalray.eu/products/ (1024 core next year)

In the case of the FPGA target systems you can get the compilers to try to implement the FSM’s as gates etc.

With certain coding styles (OO) you can extract the FSMs and messaging automatically, so it’s not as far-fetched as it might look 😉
nosenseetal Says:

June 13, 2014 at 3:33 pm
@bartosz
you missed my point about magical immutable structures(to be fair I was not really clear 🙂 )… my impression is that like your C++ list has a shared_ptr(that uses atomics) ANY implementation of same structure in ANY language is not free from using atomics because need for shared_ptr is not a defect of a C++ but a “defect” of HW or physical world if you will…
Am I wrong about this?
Bartosz Milewski Says:

June 13, 2014 at 3:54 pm
@nosenseetal: You need atomics if you do reference counting, but you don’t if you do garbage collection, like in Haskell or Clojure.

It’s a totally different topic for discussion, but there are some good arguments that GC might be a better fit for concurrency than manual memory management a la C++. Try looking up the ABA problem and hazard pointers.
Adi Oltean Says:

June 13, 2014 at 4:27 pm
>> Here are some target software platforms […]

These are all about many-core architectures, not digital logic-style ones. Very different things, requiring very different methodologies.

I am talking about traditional digital logic HW design, where you operate with combinational logic (AND/OR/NOT/etc) combined with sequential logic (arrays of clock-driven D-flip-flops). This is how all hardware is built today in 99.99%

Generating a pure asynchronous HW design (at the gate-level) is doable for small chips (has been done before) but one of the main roadblocks is scalability of the verification, especially getting timing closure. Any time you keep adding gates, the complexity of the underlying verification increases exponentially. You can’t hand-wave these details unfortunately …
Kevin Cameron Says:

June 14, 2014 at 6:41 pm
>These are all about many-core architectures, not digital logic-style ones

My point was that an asynchronous specification as an “executable spec” is better than RTL, and is a usable programming methodology for many-core. An asynchronous spec can be turned into a synchronous logic design more efficiently because it doesn’t have unnecessary clocking clouding the functionality.

Certain programming styles are semantically equivalent, so pieces of FP and OO are translatable into asynchronous FSM style, but I say just go with it direct because you have to do that for the hardware design anyway.
nosenseetal Says:

June 20, 2014 at 5:10 pm
Bartosz
Im just gonna say I am a bit skeptical about GC and HPC, but maybe (if his talk will be recorded) Herbs talk at cppcon will make me change my mind. 😀
Kevin Cameron Says:

June 21, 2014 at 3:37 pm
GC isn’t a good idea if you have a large distributed machine since it gets more expensive to tell if things are still in use – TidalScale are working on giving you very large SMP machines spanning mutiple servers. You also want coding styles that are formally verifiable, Coverity’s static analysis tools can tell you that that you are not deallocating memory (as in you probably should), but can’t help much with sloppy coding styles like garbage collection. Languages that make humans “productive” are often not amenable computer aid, and are less likey to work well for parallel programming (humans are intrinsically bad at parallel stuff).
The Curious Failure of Functional Programming for Parallel Applications | Probably Dance Says:

September 7, 2014 at 6:56 am
[…] popular as we do more and more parallel programming, but that doesn’t happen. As an example this blog post made the rounds recently which has this quote in […]
Kenneth Kasajian Says:

April 3, 2015 at 10:17 pm
One of the best explanations of the problem I’ve read in a while.

The movement toward functional programming in main-stream languages has been extremely slow. It’s one thing to program in a functional style for expressiveness, but the promise of automatic parallelism in languages such as C# and Java, like erlang/Haskell, is spotty at best.

• There’s no tail-call optimization.
• There’s no true immutable structure.
• There’s memo-ization.
• If you try attempt different aggregate function over the same sequence, it won’t “fold” the iterations. That’s because there’s no “deforestization” (the removal of intermediate data structures) in LINQ. The end result is that hand-written for loops are faster than LINQ.

MS doesn’t seem to be all that eager to solve that problem and programmers are content because they don’t know what’s possible. “Don’t use LINQ in time-critical code” is the advice, missing the point that it’s not about isolated instances of time-criticality but that an overall shift to functional programmer is required to have a hope of optimizing for multi-core.

Fortunately, many applications are not CPU-bound. Advancements in other areas is more common: Node.js is solving the opposite problem (message passing)

Yet CPU-bound software is happy to eat clock-cycles like it’s a free resource. It becomes more of an issue when wasting CPU cycles means $$$ because you have to buy more servers to run your web services.

Eventually someone’s gonna say, “Hey, if we limit the number of Docker instances on the box but have more parallel programs, we can save money.”

Show them how money will be saved. Then people will listen.
Larry (@ljr1981) Says:

April 4, 2015 at 4:59 pm
Hello Barstosz!

If you’ve worked on D, then you are well aware of Eiffel. I wonder—have you seen it recently? There are several points to make:

Concurrency is now handled by the notion of “separate” and opens the doorway to SCOOP (Simple Concurrent Object Oriented Programming)—a world where the compiler itself is handling the complex concurrency code, relieving the programmer of this direct burden (beyond managing the notion of separateness).

Void-safety is another outstanding daily reality for Eiffel, wherein one is relieved from the burden of having great cause for concern regarding null pointers and calls. This is a powerful reality to live in on a moment-by-moment basis.

Design-by-Contract, sensible-and-simple Multiple Inheritance, Generics, and other technologies built into the notation are staple items the Eiffel programmer embraces on a minute-by-minute basis without concern and with great and practiced confidence.

The compiler is now rife with outstanding tools. Pick-and-drop navigation blows the mind of almost every programmer I show it to. The built-in AutoTest tool is outstanding for implementing TDD + Design-by-Contract for rock-solid regression and integration testing. And there is a new kid in town: Eiffel Inspector, which is a fabulous and user-definable Code Analysis tool that is gaining in importance, to where it is nearly as indispensable as the compiler itself.

Finally, Eiffel has had beautiful blending of functional programming for many years (about 15+) by the notion of Agents. For example, they serve as the basis for the Eiffel Vision (GUI library) MVC model.

With recent additions of Eiffel libraries for web-programming and the possibilities opened up for putting Eiffel-based web programming behind Phone-Gap and Cordova to have platform-independent “Mobile First” programming, the reach of this general purpose language has now blossomed greatly.

Now—admittedly, I am not a C or C++ programmer. While I have a certification in C# and some college courses in C, with some additional training in Java and JavaScript, I will not attempt to present myself as knowing these languages well enough to compare. All I can tell you is that I have worked deeply in Eiffel for the last 5 years and have grown to have a great appreciation for the language and the method.

I encourage all programmers to give this language a first, second, and third look. Don’t let “not-mainstream” turn you off. My team of 8 programmers have a history of taking programmers from knowing no Eiffel to being nicely proficient in about 1-2 weeks training time. We then find that it take about a year to become highly competent and excellent. Yet, even after the 1-2 weeks, we have seen programmers tackle tasks where they are changing huge amounts of system code successfully (our code base presently stands at about 850,000 LOC).

Anyway—give Eiffel a look. I will warn you that the documentation is poor, but that is changing slowly. The folks at Eiffel Software are aware of this shortcoming and are working diligently to make the change. It is mostly due to the fact that many of them are not American and their English language skills at writing English-based documentation are not the best (poor grammar and such). Nevertheless, that is changing for the better.

Also—do not let the price tag for the Enterprise version scare you either. There is a GPL FREE version, which is entirely unfettered and without lack, when compared to the paid version. Also—I have it on good authority from inside Eiffel Software that they will work with ANYONE on getting the price tag to something manageable and reasonable. All you have to do is ask them to negotiate. I have done so myself.

At the end of the matter—Eiffel and EiffelStudio is a fantastic general-purpose software engineering product that I hope you all will consider.

Cheers!
С++ встречи в России | FNIT.RU Says:

April 19, 2015 at 9:27 pm
[…] в C++ со строгих математических позиций. В этой его статье хорошо описано зачем и […]
Going Parallel: Overture | Crimild Project Development Diary Says:

June 27, 2015 at 7:31 pm
[…] promising. I know that this is going to be a long process if I want Crimild to be fully parallel (which might involve a bit of paradigm shifting), so I expect to have a lot of trial and […]
Functional Programming: Links, News And Resources (16) | Angel "Java" Lopez on Blog Says:

September 23, 2015 at 12:11 pm
[…] The Functional Revolution in C | Bartosz Milewski’s Programming Cafe https://bartoszmilewski.com/2014/06/09/the-functional-revolution-in-c/ […]
willnations Says:

December 9, 2015 at 12:49 am
I have a growing interest in functional programming, so I’m glad to see it coming to C++; however, this syntax seems a little awkward / complicated / unwieldy. Know if there are simplifications in the works like a keyword / operator / something? Don’t really wanna have to slog through a workflow like that (just from the sound of it).
Bartosz Milewski Says:

December 9, 2015 at 3:17 pm
@willnations: There is some progress on that front. The keyword auto introduces limited form of type inference, there are const expressions that may be better than the template notation, there is talk of using resumable functions to support monads. But the pain will always be there.
C++s Strengths Keep it Relevant in an Age of Code Bootcamps - Blog - Developer Tools - IDERA Community Says:

September 28, 2018 at 3:52 am
[…] programming: while not a pure functional language, functional style is a direction the language has been heading and is very powerful in […]
C++’s Strengths Keep it Relevant in an Age of Code Bootcamps - Blog - Developer Tools - IDERA Community Says:

October 10, 2018 at 1:10 am
[…] programming: while not a pure functional language, functional style is a direction the language has been heading and is very powerful in […]