I’ve been looking for a good analogy of what programming in C++ feels like and I remembered this 1990 Tim Burton movie, Edward Scissorhands.
It’s a darker version of Pinocchio, shot in suburban settings. In this poster, the scary guy (Johnny Depp) is trying to gently hug Winona Ryder but his clumsy scissor-hands are making it very dangerous for both of them. His face is already covered with deep scars.
Having scissors for hands is not all that bad. Edward has many talents: he can, for instance, create stunning dog hairdos.
I often have these kinds of thoughts after attending C++ conferences: this time it was Going Native 2013. The previous year, the excitement was all about the shiny new C++11 Standard. This year it was more of a reality check. Don’t get me wrong — there were many stunning dog hairdos on display (I mean C++ code that was elegant and simple) but the bulk of the conference was about how to avoid mutilation and how to deliver first aid in case of accidental amputation.
Little shop of horrors
There was so much talk about how not to use C++ that it occurred to me that maybe this wasn’t the problem of incompetent programmers, but that straightforward C++ is plain wrong. So if you just learn the primitives of the language and try to use them, you’re doomed.
C++ has an excuse for that: backward compatibility — in particular compatibility with C. You might think of the C subset of C++ as a bona fide assembly language which you shouldn’t use in day-to-day programming, except that it’s right there on the surface. If you reach blindly into your C++ toolbox, you’re likely to come up with naked pointers, for loops, and all this ugly stuff.
A well known example of what not to do is to use malloc to dynamically allocate memory, and free to deallocate it. malloc takes a count of bytes and returns a void pointer, which you have to cast to something more usable — it would be hard to come up with a worse API for memory management. Here’s an example of really bad (but almost correct, if it weren’t for the possibility of null pointer dereference) code:
struct Pod {
    int count;
    int* counters;
};

int n = 10;
Pod* pod = (Pod*) malloc(sizeof(Pod));
pod->count = n;
pod->counters = (int*) malloc(n * sizeof(int));
...
free(pod->counters);
free(pod);
Hopefully, nobody writes code like this in C++, although I’m sure there are a lot of legacy apps with such constructs, so don’t laugh.
C++ “solved” the problem of redundant casting and error-prone size calculations by replacing malloc and free with new and delete. The corrected C++ version of the code above would be:
struct Pod {
    int count;
    int* counters;
};

int n = 10;
Pod* pod = new Pod;
pod->count = n;
pod->counters = new int[n];
...
delete [] pod->counters;
delete pod;
BTW, the null pointer dereference problem is solved too, because new will throw an exception when the system runs out of memory. There is still a slight chance of a memory leak if the second new fails. (But how often does that happen? Hint: how big can n get?) So here’s the really correct version of the code:
class Snd { // Sophisticated New Data
public:
    Snd(int n) : _count(n), _counters(new int[n]) {}
    ~Snd() { delete [] _counters; }
private:
    int _count;
    int* _counters;
};

Snd* snd = new Snd(10);
...
delete snd;
Are we done yet? Of course not! The code is not exception safe.
The C++ lore is that you should avoid naked pointers, avoid arrays, avoid delete. So the remedy for the lameness of malloc is operator new, which is also broken because it returns a dangerous pointer and pointers are bad.
We all know (and have scars on our faces to prove it) that you should use the Standard Library containers and smart pointers whenever possible. Oh, and use value semantics for passing things around. No wait! Value semantics comes with a performance penalty because of excessive copying. So what about shared_ptr and vectors of shared_ptr? But that adds the overhead of reference counting! No, here’s a new idea: move semantics and rvalue references.
I can go on and on like this (and I often do!). Do you see the pattern? Every remedy breeds another remedy. It’s no longer just the C subset that should be avoided. Every new language feature or library addition comes with a new series of gotchas. And you know a new feature is badly designed if Scott Meyers has a talk about it. (His latest was about the pitfalls of, you guessed it, move semantics.)
The Philosophy of C++
Bjarne Stroustrup keeps stressing how important backward compatibility is for C++. It’s one of the pillars of the C++ philosophy. Considering how much legacy code there is, it makes perfect sense. Compatibility, though, takes a very heavy toll on the evolution of the language. If nature were as serious about backward compatibility as C++ is, humans would still have tails, gills, flippers, antennae, and external skeletons on top of internal ones — they all made sense at some point in the evolution.
C++ has become an extremely complex language. There are countless ways of doing the same thing — almost all of them either plain wrong, dangerous, unmaintainable, or all of the above. The problem is that most code compiles and even runs. The mistakes and shortcomings are discovered much later, often after the product has been released.
You might say: well, that’s the nature of programming. If you think so, you should seriously look at Haskell. Your first reaction will be: I don’t know how to implement the first thing (other than the factorial and Fibonacci numbers) in this extremely restrictive language. This is totally different from the C++ experience, where you can start hacking from day one. What you don’t realize is that it will take you 10 years, if you’re lucky, to discover the “right way” of programming in C++ (if there even is such a thing). And guess what, the better a C++ programmer you are, the more functional your programs look. Ask any C++ guru and they will tell you: avoid mutation, avoid side effects, don’t use loops, avoid class hierarchies and inheritance. But you will need strict discipline and total control over your collaborators to pull that off, because C++ is so permissive.
Haskell is not permissive; it won’t let you — or your coworkers — write unsafe code. Yes, initially you’ll be scratching your head trying to implement something in Haskell that you could hack in C++ in 10 minutes. If you’re lucky, and you work for Sean Parent or another exceptional programmer, he will code review your hacks and show you how not to program in C++. Otherwise, you might be kept in the dark for decades, accumulating self-inflicted wounds and dreaming of dog hairdos.
Resource Management
I started this post with examples of resource management (strictly speaking, memory management), because this is one of my personal favorites. I’ve been advocating and writing about it since the nineties (see the bibliography at the end). Obviously I have failed, because 20 years later resource management techniques are still not universally known. Bjarne Stroustrup felt obliged to spend half of his opening talk explaining resource management to a crowd of advanced C++ programmers. Again, one could blame incompetent programmers for not accepting resource management as the foundation of C++ programming. The problem, though, is that there is nothing in the language that would tell a programmer that something is amiss in the code I listed at the beginning of this post. In fact, learning the correct techniques often feels like learning a new language.
Why is it so hard? Because in C++ the bulk of resource management is memory management. In fact it has to be stressed repeatedly that garbage collection would not solve the problem of managing resources: There will always be file handles, window handles, open databases and transactions, etc. These are important resources, but their management is overshadowed by the tedium of memory management. The reason C++ doesn’t have garbage collection is not because it can’t be done in an efficient way, but because C++ itself is hostile to GC. The compiler and the runtime have to always assume the worst — not only that any pointer can alias any other pointer but that a memory address can be stored as an integer or its lower bits could be used as bitfields (that’s why only conservative garbage collectors are considered for C++).
It’s a common but false belief that reference counting (using shared pointers in particular) is better than garbage collection. There is actual research showing that the two approaches are just two sides of the same coin. You should realize that deleting a shared pointer may lead to an arbitrarily long pause in program execution, with similar performance characteristics as a garbage sweep. It’s not only because every serious reference counting algorithm must be able to deal with cycles, but also because every time a reference count goes to zero on a piece of data a whole graph of pointers reachable from that object has to be traversed. A data structure built with shared pointers might take a long time to delete and, except for simple cases, you’ll never know which shared pointer will go out of scope last and trigger it.
Careful resource management and sparing use of shared_ptr might still be defensible for single-threaded programs, but the moment you start using concurrency, you’re in big trouble. Every increment or decrement of the counter requires locking! This locking is usually implemented with atomic variables, but so are mutexes! Don’t be fooled: accessing atomic variables is expensive. Which brings me to the central problem with C++.
Concurrency and Parallelism
It’s been 8 years since Herb Sutter famously exclaimed: The Free Lunch is Over! Ever since then, the big C++ oil tanker has been slowly changing its course. It’s not like concurrency was invented in 2005. POSIX threads were defined in 1995. Microsoft introduced threads in Windows 95 and multiprocessor support in Windows NT. Still, concurrency was only acknowledged in the C++ Standard in 2011.
C++11 had to start from scratch. It had to define the memory model: when and in what order memory writes from multiple threads become visible to other threads. For all practical purposes, the C++ memory model was copied from Java (minus some controversial guarantees that Java made about behavior under data races). In a nutshell, C++ programs are sequentially consistent if there are no data races. However, since C++ had to compete with assembly language, the full memory model includes so-called weak atomics, which I would describe as portable data races, and recommend staying away from.
C++11 also defined primitives for thread creation and management, and basic synchronization primitives as defined by Dijkstra and Hoare back in the 1960s, such as mutexes and condition variables. One could argue whether these are indeed the right building blocks for synchronization, but maybe that doesn’t really matter because they are known not to be composable anyway. The composable abstraction for synchronization is STM (Software Transactional Memory), which is hard to implement correctly and efficiently in an imperative language. There is an STM study group in the Standards Committee, so there is a chance it might one day become part of the Standard. But because C++ offers no control over effects, it will be very hard to use properly.
There was also a misguided and confusing attempt at providing support for task-based parallelism with async tasks and non-composable futures (both seriously considered for deprecation in C++14). Thread-local variables were also standardized, making the task-based approach that much harder. Locks and condition variables are also tied to threads, not tasks. So that was pretty much a disaster. The Standards Committee has its work cut out for it for many years ahead. That includes task-based composable parallelism, communication channels to replace futures (one would hope), task cancellation and, probably longer term, data-driven parallelism, including GPU support. A derivative of Microsoft PPL and Intel TBB should become part of the Standard (hopefully not Microsoft AMP).
Let’s take a great leap of faith and assume that all these things will be standardized and implemented by, say, 2015. Even if that happens, I still don’t think people will be able to use C++ for mainstream parallel programming. C++ has been designed for single-threaded programming, and parallel programming requires a revolutionary rather than evolutionary change. Two words: data races. Imperative languages offer no protection against data races — maybe with the exception of D.
In C++, data is shared between threads by default, is mutable by default, and functions have side effects almost by default. All those pointers and references create fertile grounds for data races, and the vulnerability of data structures and functions to races is in no way reflected in the type system. In C++, even if you have a const reference to an object, there is no guarantee that another thread won’t modify it. Still worse, any references inside a const object are mutable by default.
D at least has the notion of deep constness and immutability (no thread can change an immutable data structure). Another nod towards concurrency from D is the ability to define pure functions. Also, in D, mutable objects are not shared between threads by default. It is a step in the right direction, even though it imposes runtime cost for shared objects. Most importantly though, threads are not a good abstraction for parallel programming, so this approach won’t work with lightweight tasks and work-stealing queues, where tasks are passed between threads.
But C++ doesn’t support any of this and it doesn’t look like it ever will.
Of course, you might recognize all these pro-concurrency and parallelism features as functional programming — immutability and pure functions in particular. At the risk of sounding repetitive: Haskell is way ahead of the curve with respect to parallelism, including GPU programming. That was the reason I so easily converted to Haskell after years of evangelizing good programming practices in C++. Every programmer who’s serious about concurrency and parallelism should learn enough Haskell to understand how it deals with them. There is an excellent book by Simon Marlow, Parallel and Concurrent Programming in Haskell. After you read it, you will either start using functional techniques in your C++ programming, or realize what an impedance mismatch there is between parallel programming and an imperative language, and you will switch to Haskell.
Conclusions
I believe that the C++ language and its philosophy are in direct conflict with the requirements of parallel programming. This conflict is responsible for the very slow uptake of parallel programming in mainstream software development. The power of multicore processors, vector units, and GPUs is being squandered by the industry because of an obsolete programming paradigm.
Bibliography
Here I put together some of my publications about resource management:
- Bartosz Milewski, “Resource Management in C++,” Journal of Object Oriented Programming, March/April 1997, Vol. 10, No. 1, p. 14-22. This is still pre-unique_ptr, so I’m using auto_ptr, for what it’s worth. Since you can’t have vectors of auto_ptr, I implemented an auto_vector.
- C++ Report, September 1998 and February 1999 (still using auto_ptr).
). - C++ in Action (still auto_ptr), Addison Wesley 2001. See an excerpt from this book that talks about resource management.
- Walking Down Memory Lane, with Andrei Alexandrescu, CUJ October 2005 (using unique_ptr)
- unique_ptr–How Unique is it?, WordPress, 2009
Here are some of my blogs criticizing the C++11 approach to concurrency:
September 19, 2013 at 4:06 pm
> It’s a common but false belief that reference counting (using shared pointers in particular) is better than garbage collection. There is actual research showing that the two approaches are just two sides of the same coin. You should realize that deleting a shared pointer may lead to an arbitrary long pause in program execution, with similar performance characteristics as a garbage sweep. It’s not only because every serious reference counting algorithm must be able to deal with cycles, but also because every time a reference count goes to zero on a piece of data a whole graph of pointers reachable from that object has to be traversed. A data structure built with shared pointers might take a long time to delete and, except for simple cases, you’ll never know which shared pointer will go out of scope last and trigger it.
I think you’re misinterpreting this paper, or maybe shared_ptr. The “reference counting” referred to in the paper is a _method_ of garbage collection, not an alternative to it. The alternative called “reference counting” doesn’t deal with cyclic references, which means it’s not garbage collection at all. This is how shared_ptrs behave; they *will* leak given cyclic references. But that does mean that shared_ptr has decidedly deterministic deletion performance characteristics – either the reference count will be decremented and nothing else will happen, or the reference count will decrement and the underlying object’s destructor will be called. Any shared pointers owned by the object will get a similar treatment, and so on, recursively, and any naked pointers will be deleted (and their objects destroyed) unconditionally. shared_ptr essentially guarantees that you won’t traverse any more of the object graph at deletion time than you would with naked pointers, and maybe substantially *less*, at the cost of some O(1) time (per shared_ptr copy/assigned or deleted) and space bookkeeping. It’s not garbage collection at all.
It’s true you can’t know in the general case which shared_ptr going out of scope will trigger the destruction, but you should know your own program’s logic enough to have a fairly good idea which objects (and their subgraphs) will be long-lived. If it becomes a problem, that means you’ve got the object owned as a non-shared pointer somewhere, or your object graph is no longer acyclic.
September 19, 2013 at 4:26 pm
Notice that this kind of argument: “you should know your own program’s logic enough…” doesn’t scale at all. C++ programs are a maintenance nightmare because they are built from non-composable parts. You can’t build something in isolation and be confident that it will work correctly with the rest of the system unless you understand the rest of the system. And what if you’re writing or using libraries? Even if you are a C++ guru, you will have to work with other less experienced programmers and use and maintain code written by others. They usually don’t know the logic of your program well enough, and there is nothing preventing them from quietly screwing things up for you.
September 19, 2013 at 4:58 pm
The only difference is that in C++ the safety net is optional, and you’re free to cause side effects at your own peril. Since humans are fallible, it makes sense to put in guard rails by default, and let mature systems slowly take the helmet off once they know how to ride the bicycle. Mandating monadic paradigms holistically and ubiquitously as the ultimate truth is to be beholden to mantra, whereas ultimately, although harder, context is what matters…
September 19, 2013 at 5:22 pm
Any thoughts on Rust? I saw D got a mention.
September 19, 2013 at 5:34 pm
1. “There was also a misguided and confusing attempt at providing support for task-based parallelism with async tasks and non-composable futures (both seriously considered for deprecation in C++14).”
Can you please point to such intent in committee doc?
2. Why no single word about Scala – all libs of java + Haskell like power in one?
September 19, 2013 at 5:37 pm
Are you sure that parallelizing is the only thing to do with high-performance code?
– The speedup of a program using multiple processors is limited
by the time needed for the sequential fraction of the program.
(Amdahl’s law)
In Haskell it’s easy to parallelize code, but performance is not only about using your CPU cores. And it’s hard to write code with performance comparable to C++ code.
Today, CPU silicon is invested principally in one thing: cache.
You solve one problem, but you regress in others: memory usage.
Haskell promotes a coding style where your data is sparse (terrible data locality), and even basic data types are boxed.
Use a vector? Not idiomatic Haskell! You have to use some tree!
So your code will execute on all your cores, but all your cores will spend the majority of their time in frontend stalls! Not a great speedup.
The number of cores can increase, but that is useless if they spend their time waiting for data.
If people write custom allocators, it is not because they are fools.
And bad cache usage hurts even more in a multi-threaded environment than in a single-threaded one: the MESI protocol will take additional time.
And in all of this, not a word about NUMA architectures, which will probably be predominant in the future. How can Haskell be made NUMA-aware?
You are making it sound like “we don’t need to worry about low-level details”… where is my API for CPU affinity? When a thread is scheduled onto another core, the data set has to be read twice from memory.
This matters in high-performance code.
If the OS exposes APIs like that, it is not because its designers are fools.
Communication channels are not a “silver bullet”.
What if I have to manage a large shared data structure with thousands of edits per second? It’s obvious that message passing would be suicide.
(And please don’t tell me things like “simply don’t have a central state” — it’s obviously not possible in all cases.)
If people have researched complex lockless data structures, it is not because they are fools, or don’t know about all this.
IMHO STM is what looks really promising… but I don’t know enough to talk about it.
It turns out that if you really want to write high-performance code… you regress from Haskell to C++.
And I personally find that debugging and understanding the flow of Haskell code is hard, contrary to C++.
Being able to understand when something will be evaluated is also something that matters.
Sorry for my English.
I did read your other posts on C++ vs Haskell’s “true beauty”, but honestly I can’t see how we are going to rewrite code that really thinks about the hardware with something that doesn’t, and claim that this is how we will manage “high performance code”.
Maybe we simply have a different definition of “high performance”.
September 19, 2013 at 6:01 pm
I’ve been doing hobby game development on the side using HTML5 stuff, but have begun reading Bjarne’s latest book to learn C++. By the sounds of this, I not only want to learn through good resources, but to learn how to program in a functional style. “The C++ Programming Language” teaches the use of classes and such, at least as far as I’ve gotten. Is it practical for someone to learn the language to code their own projects and not get bit in the ass?
September 19, 2013 at 6:24 pm
@Nicola: I really recommend Simon Marlow’s book. It will answer your questions. Locality and unboxing are very important indeed and, at least for matrix calculations, are provided by the Repa library in Haskell. GPU code is produced by the Accelerate library. On GPUs you usually have to copy your data to private GPU memory, and Accelerate does it for you. These libraries are all about scalable performance. Parallel programming is pretty useless if it can’t speed programs up.
September 19, 2013 at 6:26 pm
@Aaron. I would recommend Bjarne’s book, “A Tour of C++”, as a good introduction to the language: http://www.stroustrup.com/Tour.html .
September 19, 2013 at 6:42 pm
@Albert,
> Since humans are fallible, it makes sense to put in guard rails by default, and let mature systems slowly migrate the helmet off the system when it knows how to ride the bicycle.
That makes perfect sense — especially if there were good ways to enforce those guard rails, ideally at compile time. Having a set of default compiler declarations that outlaw foot-shooting is a great way to prevent it in the average case while letting “mature systems” do whatever they want as long as they sort of “sign the disclaimer.”
Allowing anyone to silently and accidentally slip through the guard rails and into oblivion is an entirely more problematic scenario, however.
September 19, 2013 at 7:45 pm
Do you know of any Haskell IDE that supports debugging Haskell code which is embedded in C++?
September 19, 2013 at 8:17 pm
Nicola,
I’ve been slowly building up a set of cache-oblivious yet still purely functional data structures in Haskell. Ryan Newton has been working towards NUMA-awareness. I have a decent high performance lock-free work-stealing deque and tbb-style task monad. With GHC 7.8 we’re getting primops for working with SIMD, prefetching, etc. and filling in other gaps in the high-performance spectrum.
Moreover I would challenge that the ‘vector isn’t idiomatic haskell’ argument is a few years old. Nowadays vector, repa, and accelerate are all pretty well en-meshed in the culture.
That said, CPU affinity _is_ still something we don’t have a story for. It bites me too. HECs don’t tend to move around, but that isn’t the best guarantee.
I won’t lie and say we have all of these things today, but we’re not ignoring them. Haskell is continuing to evolve and borrow good ideas.
September 19, 2013 at 8:40 pm
Nice. I never came up with Edward when thinking of C++; for me it was always Frankenstein’s creation…
September 19, 2013 at 9:18 pm
Thank you Edward Kmett, I really appreciate your message
September 19, 2013 at 9:31 pm
I think it is worth pointing out that C++ *can* offer some protection against data races in parallel programs. If you write parallel programs using a language that supports strict fork-join task-based parallelism, then you can use a race detector to find races in your code.
Cilk Plus provides such extensions to C and C++ for task parallelism, which fits exactly into this category. [Quasi-promotional links will follow. 🙂 ]
https://www.cilkplus.org/cilk-plus-tutorial
It also provides Cilk Screen, a corresponding race detector for finding data races.
http://software.intel.com/en-us/articles/an-introduction-to-the-cilk-screen-race-detector
Note that Cilk Screen actually exploits the strict fork-join structure of parallelism to make strong correctness guarantees. For a large class of interesting Cilk Plus programs (e.g., mostly those that use only cilk_spawn, cilk_sync, cilk_for, and reducers), one can actually *prove* that the race detector will find all data races for the given input. This guarantee is not true in general of parallel programming using pthreads, or even other task-parallel programming platforms (e.g., TBB), because those platforms support more arbitrary, unstructured parallelism.
Language support for strict fork-join parallelism in the style of Cilk Plus has been proposed for the C++ standard.
See proposal N3409 (n3409.pdf).
For anyone who is serious about making C++ usable for mainstream parallel programming, I would encourage taking a look at this proposal. The strictness provided by having language keywords for task parallelism provides stronger guarantees for race-detection tools, as compared to task-parallel libraries like Microsoft PPL or Intel TBB. In addition to the benefits of strictness, having a language extension for fork-join parallelism instead of a library interface is, at least in my opinion, a much more intuitive and user-friendly way to support parallelism in C++.
September 19, 2013 at 10:15 pm
> I believe that the C++ language and its philosophy are in direct conflict
> with the requirements of parallel programming. … The power of multicore
> processors, vector units, and GPUs is being squandered by the
> industry because of an obsolete programming paradigm.
You’re right! But C++ is still one of the mainstream languages…
September 20, 2013 at 1:03 am
I like to see how people who don’t know something write about it and complain how strange it looks to them.
My answer to all these miracle articles is simple: I recommend that you learn.
September 20, 2013 at 1:41 am
I think sooner or later C++ will come to the situation where the old gurus are gone from IT, but the new guys won’t accept so much headache from C++ when there are such candies as C# or D. And this is normal: with gigahertz under the bonnet, who cares how many microseconds pass for i++? Safety and robustness are the main goals of modern, complex software. Plus effectiveness, which is no longer a matter of counting “ticks per second” but of parallelism: use as many CPUs as you can. Especially when even smartphones are slowly coming to 8(!!!) cores and x64.
September 20, 2013 at 5:54 am
I was at GoingNative 2013 too. And there were a lot of young people (most of the audience) who know how to use C++.
If I give you a rocket and you try to dig with it instead of flying, nothing will help you.
September 20, 2013 at 6:00 am
I recommend that everybody who reads this article verify (google) claims like:
” the moment you start using concurrency, you’re in big trouble. Every increment or decrement of the counter requires locking! This locking is usually implemented with atomic variables, but so are mutexes!”
And I assure you, the whole article is the same.
September 20, 2013 at 7:33 am
Haskell creates a new value for every parameter / function return value; that means that even for trivial tasks it churns through non-trivial amounts of RAM; the footprint of a Haskell program is therefore rather large; I guess that makes it rather impractical for real systems.
September 20, 2013 at 7:46 am
But of course, this one is an interesting observation: “If nature were as serious about backward compatibility as C++ is, humans would still have tails, gills, flippers, antennae, and external skeletons on top of internal ones — they all made sense at some point in the evolution.”
I guess C++ is like Perl: for many things there is more than one way to do it. Don’t like exceptions? Disable the feature during compilation and code as in C with error codes. That’s what makes the whole thing malleable; somehow the right kind of flexibility is a winning feature.
September 20, 2013 at 10:34 am
Three reflections come into my mind. You are very close to convince me but there are points that remain unclear for me.
The first one is about the critical mass needed to carry on. If I tell my boss I want to develop one of the modules of our system, just the smallest and simpliest one, in Haskell he will not let me do. For sure.
It doesn’t matter what I tell him about Haskell. The reason for his deny is simple: Who’s going to improve and mantain that module if you leave the company? It’s easy to hire a new C++ programmer, but that’s not
the case in Haskell. As long as there isn’t a critical mass of Haskell developers, and Haskell resources, there is no chance to replace C++ (or any other stablished language) and, as long as companies not assume the risk of begin to develop in Haskell that critical mass is unreachable. Here we have a nice circular dependency.
For the second point, I have to recall that my knowledge of Haskell is very limited, mainly to what I have learned reading your posts, so maybe the following is completely pointless (all my excuses if that’s the case).
As far as my little knowledge lights my mind, I agree with you: Haskell’s approach to concurrency is a better solution than the C++ approach. But the problem in C++ is the way it “accesses” the shared data, not the way it “works” with that data.
Of course, to work with the data you have to access it, but let me explain:
We are talking about the functional paradigm not because we have problems modeling and representing systems; the OOP paradigm has shown itself to be quite powerful in this regard. The problems, as you have stated repeatedly lately, are in the reads and writes of the shared data. The way this data is computed and processed, and the logical models we develop for that, are not a problem (if you are a decent programmer). As far as I understand, Haskell being a pure functional language, OOP is out of its
scope, so it seems to me that if we fall into the arms of functional programming we have to give up all the benefits OOP offers
(and they are a lot). I wonder if Haskell, or any other functional language, doesn’t offer us a tricky tradeoff: I can assure your life in the wild frontier, no need to worry about data races and all this stuff, but the price you have to pay is the loss of many of the advantages OOP offers you.
I have talked to colleagues and in other forums about the ideas you have been exposing in your posts about Haskell. I have to say that in most cases I found skepticism, if not direct and hostile rejection. But I have the impression that the rejection is not due to any technical reason but to the prospect of losing all the OOP benefits. From time to time I have to take care of old pieces of code written in plain C; these are never good days.
The final reflection comes immediately (again from my ignorance): isn’t it possible to combine the benefits of the Haskell approach to
concurrency with the advantages that OOP offers?
September 20, 2013 at 12:23 pm
Very nice article.
“Two words: data races. Imperative languages offer no protection against data races — maybe with the exception of D.”
Have you looked at Rust? Variable declarations default to immutable, and the type system disallows sharing of mutable data between tasks.
September 20, 2013 at 1:30 pm
I don’t know much about Rust, but I noticed one thing: Like Erlang, it supports only one model of concurrency: message passing. That’s very limiting. Haskell actually lets you share memory in a very controlled way, which makes the implementation of STM possible.
September 20, 2013 at 1:58 pm
Yes, there is a bootstrapping problem with Haskell and it won’t be solved overnight. Many people, however, noticed Haskell’s hockey-stick growth pattern in recent years. There are also alternatives to Haskell like F# or Scala (which, by the way, is more object oriented than functional).
I’m not sure which OO features are important to you. A lot of them are available in Haskell. They might not look the same but play the same role. Haskell’s polymorphism, for instance, is strictly more powerful than C++’s.
September 20, 2013 at 2:04 pm
Actually, behind the scenes Haskell uses pointers for everything, including passing data to and from functions. It also tries to share as much data as possible when making modifications to data structures. It also gives the programmer the option of not creating thunks (using strictness annotations) and using unboxed values inside data structures. But it does all this without exposing the program to data races.
September 20, 2013 at 2:19 pm
Michael Moser,
You’d be surprised.
Due to its strict control over side effects, GHC is very good at rejiggering code into a form that does few to no allocations for the kinds of things you usually turn into an inner loop.
Supercompilation (http://community.haskell.org/~ndm/downloads/paper-supero_making_haskell_faster-27_sep_2007.pdf) can even “beat C” in some cases.
Stream fusion (http://metagraph.org/papers/stream_fusion.pdf) and the worker-wrapper transformation (http://www.cs.nott.ac.uk/~gmh/wrapper.pdf) are used heavily in the compiler and core libraries to make it so that idiomatic Haskell doesn’t have to be slow.
The vector library uses stream fusion to generate rather impressive unrolled loops. This is getting better now with Geoffrey Mainland’s work (http://research.microsoft.com/en-us/um/people/simonpj/papers/ndp/haskell-beats-C.pdf). Now idiomatic Haskell code like:
dot v w = mfold' (+) 0 (mzipWith (*) v w)
is starting to even beat hand-rolled C that uses SSE primitives!
The key to being able to do these rewrites on your code is the freedom from arbitrary side-effects. That lets the compiler move code around in ways that would be terrifying and unverifiable in a C/C++ compiler.
September 20, 2013 at 11:08 pm
I think it’s better to link to original sources. The “free lunch” article by Herb Sutter first appeared in Dr. Dobb’s (http://www.drdobbs.com/184405990) and was updated by him there last year
(http://www.drdobbs.com/232400273).
September 21, 2013 at 4:38 am
Non-deterministic actor message passing is still hard to debug, while a system like http://www.infoq.com/presentations/LMAX is easy, and you can write “single-threaded” code.
September 21, 2013 at 6:45 am
Michael Moser, even when GHC does allocate in your inner loop, it will allocate 1MB in the nursery, then a minor GC will copy the few live objects out of it and not pay for the rest. So all these extra allocations will only actually cost on the order of 2MB of RAM.
The major downside of this, imo, is that it isn’t optimally cache-friendly, but it is going to work at L2 bandwidth, which is not too bad.
September 22, 2013 at 1:35 am
“The composable abstraction for synchronization is STM” – that seems rather sweeping. Personally, I think STM should be relegated to lower levels and never exposed to users.
Since you mention Hoare – CSP has been around longer and is a paradigm that spans everything from hardware design (extremely parallel) to client-server cloud computing.
Anything based on an explicit shared-memory model is extremely hard to scale to large numbers of threads and also hard to formalize and debug, so I’d agree C++11 is a bit of a bust for parallel processing.
However, the object oriented model of programming C++ supports is not dissimilar to what is used in hardware description languages like SystemVerilog which use different synchronization paradigms and can handle millions of threads using FSM methodology that is well understood. ICs have many parallel processes and need to be correct when fabbed (not many patching options).
So C++ just needs to head in a different direction.
On another note, I’m finding language discussions boring – the same stuff gets rehashed repeatedly and “new” languages rarely do anything new. Let’s ditch the languages and just talk about semantics. Someone stood up at Silicon Valley Code Camp last year and described how to do what Haskell does in C++, so saying one is better is somewhat pointless.
“You can’t build something in isolation and be confident that it will work correctly with the rest of the system unless you understand the rest of the system.”
– you can if its interfaces are fully defined. Don’t use shared memory (leave that to the compilers).
In that vein someone pointed me at this recently –
http://ccadar.github.io/klee/
It’ll tell you whether your Haskell is equivalent to your C++
http://www.haskell.org/ghc/docs/7.6.1/html/users_guide/code-generators.html
September 22, 2013 at 8:12 am
[…] Edward C++Hands […]
September 22, 2013 at 10:18 am
fact n = if n == 1 then 1 else n * fact (n - 1)
If Haskell is unable to do tail-call optimization, then there are going to be n stack frames, each holding a different value, right?
Now
for( fact = i = 1; i < n ; i++, fact = fact * i );
Here there is one stack frame and one copy of the value.
Am I missing something?
September 22, 2013 at 11:51 am
@Michael: Haskell guarantees tail-call optimization.
But just like we would like to avoid explicit loops in C++, we’d like to avoid explicit recursion in Haskell. So a more readable implementation would be (it also gives the right answer for n=0):
fact n = product [1..n]
Actually, factorial is not defined for negative numbers, so all these definitions are mathematically incorrect there. You might want to protect against that using a runtime assertion:
BTW, product itself is defined in the Prelude in terms of foldl – still without exposing recursion:
product = foldl (*) 1
foldl itself is recursive.
September 22, 2013 at 12:14 pm
@Kevin Cameron: I don’t find all language discussions boring. They potentially lead to the improvement of existing languages and creation of new better ones. At some level all languages are equivalent, so long as they are Turing complete. So the fact that you can translate Haskell to C++ and vice versa misses the point. The point is: how easy it is to express your problem in a given language and how much room for error there is (there’s always some room for errors, but there’s a huge difference in this respect between vanilla machine language and, say, Agda).
As for STM, it’s not a panacea, but it is the only composable synchronization mechanism that I know of. There are some drawbacks to STM (performance under high contention, fairness, starvation guarantees), so Haskell offers other synchronization mechanisms as a fallback. Those are not composable and might lead to deadlocks, just like traditional locks.
September 23, 2013 at 12:44 am
@Bartosz – It’s not the languages per se, it’s more that by the time you have the language a bunch of things have been cast in stone and they won’t be changed. In two decades of language design for hardware design I don’t think I’ve ever had a decent discussion about what problem the language was supposed to solve, or about the methods needed in the absence of a programming language. I.e., the choice of Haskell or C++ is immaterial, it’s the algorithms that are important; semantics trump syntax.
STM is unlikely to fit in with KLEE (or other formal approaches) since it is non-deterministic – i.e., composable or not, it may not complete. As such it’s not a useful abstraction for programmers – from Wikipedia:
“The only sticking point is that it’s unclear to the caller, who is unaware of the implementation details of the component methods, when they should attempt to re-execute the transaction if it fails.”
September 23, 2013 at 1:01 am
Good article. As an experienced C++ programmer I agree that C++ has some serious limitations that make parallel programming both harder and more dangerous than necessary — and parallel programs are hard enough generally.
Haskell might well have advantages for parallel processing — much as APL did for numerical programming — but I find it hard to see how it can overcome the obstacles of large existing codebases in C++ and shops full of C++ programmers who work on them. Are companies going to risk abandoning all that investment to move to a lesser-known language? Hiring good C++ programmers isn’t that easy: how much harder would it be to find good Haskell programmers, given their smaller numbers?
September 23, 2013 at 10:42 am
Any thoughts on Go? It lacks generics, sure, but one day they will probably add them. Of course, second-order linear logic is undecidable, but plain System F is undecidable too, so that’s not that big a deal, and we don’t need much of the first-order stuff anyway.
September 23, 2013 at 11:31 am
Don’t confuse high-level logic with low-level programming.
Yes, if you want to program a UI, use HTML5 (don’t bother with rendering, animation, typesetting, etc.). If you want to solve formulas, use FORTRAN (no pointer aliasing, CUDA support, etc.). If you want to write game logic, use the scripting language that comes with the _engine_ (whichever it is).
But…
All those things have to run on an _engine_. You have to program that engine in some language, you know, that thing that drives an application. Or go a level deeper and ask yourself about an operating system. An OS written in Haskell? Maybe, but I strongly doubt it (you’ll end up with lots of native calls and be in the formidable position of debugging your code across language boundaries).
You should distinguish between different disciplines.
And speaking about mobile development (people are talking about multi-core embedded devices): those devices have one constraint that’s actually found on all other platforms as well (even if they’re wired to a power line). The magic word is energy efficiency, or to put it in a buzzword: Green IT.
I know that some problems can be solved a lot more elegantly (and easier) using a different language than C++. But to realize them you need something that gives you fine grained control over the bits and pieces that are being executed, a guarantee (see real-time requirements for, let’s say, nuclear reactor ECUs) that they take X cycles to execute, a way to look at/tweak the actual machine code that is being executed.
Really. Every high-level language is transformed into machine code at some point and you are talking about people shooting themselves in the foot when they use the “wrong” tools to solve a problem. It’s not a matter of the wrong tools, it’s a matter of people being unable to use them correctly. Don’t put a 3 year old behind the wheel of a Ferrari and complain if the result is a bloody massacre.
Don’t get me wrong, I admire people seeking out other language alternatives and trying to invent the ‘better’ programming language. But the reality so far showed me one thing: Giving people control is a good thing if they can handle that. It’s a disaster if they can’t. C++ is the better C, it’s a SYSTEM programming language. It’s not meant to be compact and look sweet when you want to calculate Fibonacci or the Mandelbrot fractal. It’s about efficiency, control and guarantees (very much like the assembler counter-part, but on a higher level of abstraction).
So, if you don’t know your system, you are doomed. True. If you want to shield your ‘core’ from bad influences you have to behave like an OS: put those processes into user space and shield your kernel from them. You could start every co-worker’s code in a different process and use IPC. A process is crashing? Just restart. That’s not the way to go if you want a good software solution, but it’s probably necessary if you are working with co-workers who are unable to produce stable code.
As a last statement I just want to emphasize that software (apart from _simple_ Hello World programs) forms _complex_ systems. You might like the following article:
http://www.colorado.edu/AmStudies/lewis/ecology/break.htm#effect
Bottom-line: Nothing is safe. Nothing is stable. The more complex systems get, the more likely they will fail.
September 23, 2013 at 12:04 pm
@Joker_vD Go shares many of the same pitfalls C++ has when it comes to concurrency. In fact, it has it worse in some cases. For instance, it has absolutely no notion of immutability. At least C++ has ‘const’ (which doesn’t solve the entire issue as pointed out in the article).
Furthermore, Go seems to shun anything related to functional programming approaches. Even C++ has a more concise lambda syntax. C++’s RAII is superior to Go’s defer statement, and Go’s error handling is very primitive and error-prone. It also managed to fall for Hoare’s billion-dollar mistake (null pointers).
September 23, 2013 at 4:06 pm
@Kevin Cameron: I found the offending quote in Wikipedia. I can’t decide if it’s just confusing or plain wrong. This part is wrong: ” it’s unclear to the caller … when they should attempt to re-execute the transaction if it fails.” The caller never re-executes the transaction — the system does — until it commits or the code throws an uncaught exception.
The ‘retry’ keyword is a very clever, composable way of implementing condition variables within the framework of STM. ‘retry’ unwinds the transaction and blocks until one of the shared variables (TVars) in its read set is externally modified. Then it restarts the transaction. Because of ‘retry’, implementing a bounded producer/consumer queue is trivial in Haskell. It’s a major undertaking in C++. (Have a look at this implementation. The author concludes that there are no deadlocks, because nobody could find them by inspection.)
September 23, 2013 at 4:10 pm
@bjforshaw: “Are companies going to risk abandoning all that investment to move to a lesser-know language?” They did move to Java, didn’t they? The reasons had something to do with the complexity of programming in C++. Unfortunately, Java is even less ready for parallel programming than C++.
September 23, 2013 at 4:27 pm
@Joker_vD: Go’s answer to data races is ThreadSanitizer. It’s supposed to instrument your program, analyze it while it’s running, and catch possible sources of data races. I was in the business of data race detection when I worked for Corensic, and I have personally met Dmitry Vyukov, who works on ThreadSanitizer. I assume it’s a good tool, but no race detector is 100% accurate, and when data races slip through testing and into production they are extremely hard to reproduce. There are some major companies that have known about fatal races in their products but couldn’t diagnose them for years.
September 23, 2013 at 11:56 pm
“The caller never re-executes the transaction — the system does — until it commits or the code throws an uncaught exception.” – my understanding of STM is that it’s an all-or-nothing commit, and if you get nothing it’s up to you what to do about it; as such it’s non-deterministic as a language mechanism/semantic. Functionally it’s not much different from asking for a lock and maybe not getting it; the difference is that it’s more fire-and-forget from a programmer’s perspective – like using garbage collection (which is also somewhat non-deterministic). STM is not a good language mechanism, since the compilers and tools are not going to be able to help you much.
October 2, 2013 at 10:01 pm
[…] Milewski’s rant: “Edward C++Hands“ […]
October 18, 2013 at 12:54 pm
this is the worst promo for *Haskell* in a long, long time…
because it makes it look like only crazy, delusional people who have to resort to false statements to make it look better than C++ use it.
I would say that Alexandrescu converted 100x as many people to D as you converted to Haskell (because he has a very good understanding of C++ and authority in the C++ community), but that would be false. In fact the ratio is negative. If it weren’t for your ridiculous attempts to attack C++, *more* C++ programmers would try/switch to Haskell.
October 18, 2013 at 1:57 pm
@nosenseetal: If you find fault with any of my arguments, please say so and I’ll be glad to respond. I can’t argue with arguments about authority and rates of conversion.
October 18, 2013 at 4:55 pm
@Bartosz:
your argument about atomics is just horrible…
“This locking is usually implemented with atomic variables, but so are mutexes! Don’t be fooled: accessing atomic variables is expensive.”
Accessing atomic vars on x86 (and soon on ARM, iirc from Herb’s blog post) is cheap; stores are expensive.
Ofc there is fake and non-fake sharing, contention, bla bla bla…
And blasting shared_ptr for using atomics:
a) shared pointers should not be used as the default smart pointer; pointers should be avoided anyway. 🙂
b) the shared_ptr atomic ref count is expensive compared to a normal ++ and --, but it is not done 24/7
or to make it simpler: FP division is slower than multiplication, but if you told somebody to replace every /some_fp_constant in their code with
* (one_divided_by_that_fp_constant) they would think you are crazy, because unless it is done very often it is a very, very small cost.
If you want you can bother and do some measurements or read/see what bureau14 did:
https://blogea.bureau14.fr/
https://github.com/boostcon/cppnow_presentations_2013/blob/master/tue/scaling_with_cpp11.pdf?raw=true
(their quasarDB is written in C++11 and asm, not Haskell, those crazy crazy people…)
making atomics look slow (guilty by association 😛 by mentioning mutexes) is crazy…
Atomics are super duper fast!
If you don’t trust me, check out Martin Thompson’s work on the Disruptor. 🙂
(Java, but it translates to C++ since, like you say, they share a similar mem model)
Ofc feel free to trash the Disruptor’s performance with your pure immutable code… you might even earn some serious money… I hear money is good these days in the finance industry 😛
http://lmax-exchange.github.io/disruptor/
October 18, 2013 at 5:06 pm
“But just like we would like to avoid explicit loops in C++” – really? Why?
The more explicit you are in coding the easier it is for the tools to understand what you are trying to do and help make it efficient.
There was another session at Silicon Valley Code Camp this year about how to do monadic (Haskell-like) stuff in Java, and (frankly) I can’t see the benefit – the speaker said (more or less) it was just a style preference. It certainly doesn’t appear to have any natural parallelism that can be exploited.
October 19, 2013 at 11:52 am
@Kevin: A short succinct answer would be “loops are the new gotos”. Seriously though, when you write a loop you are telling the processor “how” to do things. You get into a lot of detail that obscures the “what” to do. When you, instead, call std::transform or std::accumulate, you are telling the compiler “what” to do. Algorithms, unlike loops, are also easily composable, and that’s where a good compiler can perform spectacular optimizations, like fusions.
As for monads, their power is not in code generation but in letting the type system check your code before it’s run. Monads encapsulate effects. C++ (and Java) have been struggling, mostly unsuccessfully, with the effect system for exceptions — the exception specification — and that’s the easiest one to implement in functional languages (the Maybe/Either/Error monads).
Because of the lack of the effect system, the attempts to implement software transactional memory (STM) in imperative languages have fizzled out. Monads (and their cousins, applicatives) make concurrent programming work. With a few lines of Haskell code you can implement a concurrent bounded producer/consumer queue. Try doing this in C++!
October 20, 2013 at 5:29 pm
@nosenseetal: Comparing performance of synchronization schemes is nontrivial because you have to take into account different usage scenarios, in particular low and high contention regimes. At low contention a well implemented lock (thin lock, which I described in one of my previous blogs) is no more expensive than an atomic variable. Why? Because it is implemented as an atomic variable.
Entering a critical section is one atomic CAS (compare and swap), and so is the modification of the reference counter. Things become different in higher contention regimes. A thin lock is upgraded to a blocking mutex, which has a queue of blocked threads. A reference counter becomes a spin lock in which threads busy-wait. As you know, there are well-known tradeoffs between blocking and spinning locks, and neither is universally better than the other.
You are right that on an x86 atomic reads are cheaper, but there isn’t really much need for reading a reference counter except in the process of changing it. You can’t write code that first reads the refcount and then, based on the value, decides whether to modify it or not. That won’t work for obvious reasons.
If you studied processor architecture, you must know that atomics are not cheap because of cache coherency requirements.
October 25, 2013 at 3:15 pm
@Bartosz
I am not aware of a usage where shared_ptr contention is a big problem; if you know of one, please share. Please be specific.
But not just to be disruptive 😛 I’m gonna give you a suggestion:
make a Coursera-like offer for your Haskell IDE:
you make a course for C++ devs, and if they buy a personal licence for 2 years they can take the course, and maybe you even give feedback on general problems with the homework… And ofc it should not be some 6-week intro to Haskell but a real from-imperative-zero-to-immutable-hero journey. 🙂
Ofc maybe ppl would not be interested, or even worse they would be, but if your course sucks they would hate you for scamming them out of their money… 🙂
BUT I think it is an idea worth trying…
Just for the record I preregister 🙂
October 25, 2013 at 4:55 pm
@nosenseetal : I haven’t been working for FP Complete since May, so I have no interest in promoting their IDE. Incidentally, I was the ardent proponent of releasing the personal version of the IDE for free.
As for shared_ptr, I’m currently working on a series of blogs about functional data structures in C++ where I will discuss various performance issues in much more detail.
October 28, 2013 at 5:49 pm
@bartosz
eh, so much for: https://bartoszmilewski.com/2013/01/03/the-year-of-haskell/
either way, like I said, please be very precise if possible, and please don’t use examples like 5 threads each trying to populate a vector of 1M shared_ptrs, copying from the same source sp. 🙂
November 7, 2013 at 5:09 pm
I’m well aware that I’m not in a position to lecture you, and I don’t intend to do such a thing, the following is merely my observation:
In general you shouldn’t use a bulldozer in place of a minivan (think of them as tools)! You shouldn’t use a jackhammer when you simply need a hammer (again, think of them as tools)! As has been said billions of times before, different languages have different uses and different audiences… you can’t indefinitely generalize them and put them in a single group, then give a general specification of what a good language is and criticize one for not complying with your specification.
Is C++ perfect? No, far from it. After all any (even unfair!) type of criticism is good for the evolution of the language. Now the question I have is why people criticize the language for what it doesn’t try to accomplish? C++ has survived and is what it is today mostly because there is reasonably enough use for it to keep it alive and going.
Correct me if I’m wrong, but C++ exists to address performance (as much as a native high level language can achieve) and backward compatibility with C and previous C++ standards. Another thing is, why should we decrease the capabilities of a language like C++ just to make it “easier” to use/understand for programmers who are not motivated enough to learn, or simply cannot afford to learn how to use it (for whatever reason)? I don’t think being “charitable” was an objective in designing C and C++ !
Sometimes I wonder why experts like the author of this blog try to change C++ into what it wasn’t meant to be, what it isn’t, and what it isn’t trying to be. Maybe the solution is to design a whole new language with another set of design objectives?
One more thing is likening C++’s backward compatibility to humans still having tails and gills! You know that analogy doesn’t really work! C and C++ are not that old! You are given a bunch of options, and you can choose what to use according to your needs… if someone is not in a position to (at least) make an educated choice, he/she shouldn’t be using C++; why would anyone choose a tool without the required skills and expect to get the desired results? Again… I don’t think ‘being charitable’ is a good design objective!
December 2, 2013 at 3:00 am
[…] a language. Thus far, I had only been exposed to C and C++. and what’s to love there? (Hint: NOTHING) That definitely changed with my exposure to Java in this class. Coming from C/C++, I felt that […]
August 6, 2014 at 6:19 pm
[…] Edward C++Hands […]
September 15, 2014 at 1:42 am
I’m late to reply, but anyway.
In C, redundant casts are redundant. This compiles with “gcc -ansi -Wall -Wextra -pedantic” without warnings:
So C++ didn’t solve that problem; it created it.
September 18, 2014 at 1:40 am
Sad to read so many still defending C++. Change can only come incrementally. Keep fighting the good fight. At the end of the day, modern hardware is a poor match for the C/C++ model. The very thing that made C-style languages so fast is their Achilles’ heel today. They are very much optimized for the PDP-11, but our modern machines don’t resemble the PDP-11 in any way.