Sidebar: Two Fine Points

1. “Before every read we have to access our write set…” This is in case we have already written to this location during the current transaction. The write hasn’t been reflected in shared memory yet — it’s still sitting in our write set. We want to read this new value, not the original. So it’s not a conflict, but rather a consistency requirement.

2. “The lock and version number are sampled in one load.” The version number is combined with the lock bit in a single memory word, so it can be loaded in one instruction. Now we have a local copy of the version/lock and we can check it. If the test fails, we abort; if it succeeds, we proceed.

The traditional lock-based approach to concurrent programming is error prone, hard to debug, and almost impossible to scale.
In my previous post I gave a preview of how concurrent programming might evolve in the future, and the topic of Software Transactional Memory (STM) came up quite naturally. STM has the potential to replace locking with an easy to use, virtually fool-proof, scalable paradigm for concurrent access.
It’s true that STM has its problems; performance and integration with existing languages being the serious ones. The hope is that performance can be improved, especially if there’s some support in hardware, and that a new batch of concurrent languages will incorporate STM more smoothly.
In this post I won’t be elaborating on the superiority of STM over other paradigms but rather concentrate on how it works. I hope my explanation will be more approachable than academic papers.
STM in a Nutshell
Let’s analyze a very simple example, a singleton pattern in C++ (I chose C++ because it exposes the use of pointers):
Foo * getFoo() {
    static Foo * pFoo = 0;
    if (pFoo == 0)        // <- read
        pFoo = new Foo(); // <- write
    return pFoo;
}
As you might know, this code doesn’t work correctly in a multithreaded environment. Let’s have a look at why, but from a slightly unusual angle. The statement:
if (pFoo == 0)
executes a memory read of the variable pFoo. The value that’s been read is usually stored in a register–in other words, is cached–at least until it’s compared with zero. Suppose that we read zero (a null pointer). We then proceed with the allocation and construction of the Foo object. We update the variable pFoo (a memory write) and return it to the caller.
The important thing is that the write to pFoo is only correct if the condition pFoo == 0 is maintained up to the moment when the new value is stored (we don’t even think of such subtleties in sequential programming). But that condition may be broken by another thread storing its own creation in pFoo. It’s a classic data race, and its consequences may vary from a simple memory leak to a deadly memory fault (which might happen on some processors if the object Foo manages its own internal pointers). In general, it’s the gaps between logically related reads and writes that leave room for other threads to break our assumptions. If we could only concentrate those reads and writes into one atomic action, we’d be safe.
Let me propose a crazy solution: before performing the write to pFoo, why don’t we re-check, or validate, our original read (assume that it’s still in the register)? If the value in memory is still equal to the value in the register, we proceed with the write; otherwise we back off and try again.
Wait a moment! Aren’t we just postponing the problem? What about another thread messing with pFoo right between the read validation and the write? True, that is a problem, and we have to figure out a way to make the sequence read-validation/final-write atomic. But the beauty of this crazy approach is that it can be made universal. The code that does generic read verification may be compiled into the language runtime. It may be implemented once and for all, thoroughly tested, and used whenever concurrent access is needed.
All the programmer has to do is to inform the compiler that a logically related group of reads and writes must be done atomically; for instance, by enclosing the relevant code in an atomic block (not actual C++!):
Foo * getFoo() {
static Foo * pFoo = 0;
atomic {
if (pFoo == 0) // read
pFoo = new Foo(); // write(s), possibly reads
}
return pFoo;
}
Here’s what happens inside the atomic block (there are many variations of the algorithm but the idea is the same).
- First, a transaction is started–the runtime allocates a data structure containing two logs.
- Every memory read inside the atomic block is logged in a log called the read set.
- Every memory write, instead of going directly to memory, is written into the write set (there are also versions of STM that let the writes go through, to be undone if the transaction aborts).
- At the end of the atomic block the system attempts to commit the transaction. It does that by first verifying the read log and then performing the writes. (There is a very fine-grained system of locking in play, which I’ll describe shortly.)
- If the read verification discovers a mismatch, the transaction is aborted and repeated from scratch, until it succeeds.
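The logging steps above can be sketched in C++. This is a toy, single-threaded illustration of the bookkeeping only; all names (Transaction, stmRead, stmWrite, stmCommit) are made up, and the locking, versioning, and retry machinery that a real STM needs is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Toy transaction bookkeeping: a read set and a write set.
struct Transaction {
    std::vector<int64_t*> readSet;                  // addresses we have read
    std::unordered_map<int64_t*, int64_t> writeSet; // address -> pending value
};

// Every read is logged; a read of a location we have already written
// must come from the write set, not from memory.
int64_t stmRead(Transaction& tx, int64_t* addr) {
    tx.readSet.push_back(addr);
    auto it = tx.writeSet.find(addr);
    return it != tx.writeSet.end() ? it->second : *addr;
}

// A write is "virtual": it goes into the write set, not into memory.
void stmWrite(Transaction& tx, int64_t* addr, int64_t value) {
    tx.writeSet[addr] = value;
}

// Commit flushes the write set. (A real TL2 commit would first lock the
// write locations, bump the clock, and validate the read set.)
void stmCommit(Transaction& tx) {
    for (auto& [addr, value] : tx.writeSet) *addr = value;
}
```

Note how the read-after-write case is handled inside stmRead: this is the "consult the write set before every read" rule discussed later.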
Some Details
There is one particular implementation of STM that’s been most widely accepted: Transactional Locking II (TL2), which I will summarize here.
Memory Layout
Imagine that for every single word of main memory you also have a separate data structure that stores the version number and a write lock for that word. How this can be done efficiently is an important implementation detail that I will explain later. For each memory word, its version number is initially set to zero and its lock is open.
Transaction
Now let’s start a transaction. We allocate a thread-local transaction object that has a read set and a write set, initially empty. We assign a version number to the transaction by (atomically) sampling the global “version clock” (it’s called a clock because it can only move forward). This will be our transaction’s read version number.
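A minimal sketch of that start-up step, assuming the version clock is a process-wide atomic counter (all names here are illustrative):

```cpp
#include <atomic>
#include <cstdint>

// The global version clock: a monotonically increasing atomic counter.
std::atomic<uint64_t> globalVersionClock{0};

struct Txn {
    uint64_t readVersion; // clock value sampled at transaction start
    // read set and write set omitted in this sketch
};

// Beginning a transaction is just one atomic load of the clock.
Txn beginTransaction() {
    return Txn{globalVersionClock.load(std::memory_order_acquire)};
}
```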
The compiler has conveniently instrumented all reads and writes inside the scope of the transaction (that is, in the atomic block). Each read adds an entry to our read set, storing the address of the word being read. Each write adds an entry to our write set, storing the address and the value to be written. The value in main memory is not changed at this point–the write is virtual.
Actually, before every read, we have to access our write set in case we have already written to that location. Moreover, we need to check that the location is not locked by another committing transaction, and that its version is less than or equal to our transaction’s read version. This is why we need all those per-location locks and version numbers. If the location is locked, or if its version number is larger than our read version, we abort the transaction and repeat the whole process.
These checks are costly, so they are optimized down to just a few instructions in the common case. The write-set check, for instance, is implemented using a Bloom filter. Before every read we sample the corresponding lock and version number (in one load, since they share one word) and keep it for later. Then we check our write set using the Bloom filter. If we don’t find the corresponding entry, we read the value directly from memory; otherwise we read it from our write log. Then we store the read address in our read set and, finally, recheck the version number/lock that we sampled earlier. (On some processors, read barriers are required when accessing the locks.)
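As a C++ sketch, the per-read check might look like this. It is simplified: the Bloom-filtered write set is replaced by a plain optional pending value, read-set logging is elided, and the version << 1 | lock-bit word encoding (described later under Conserving Memory) is assumed.

```cpp
#include <atomic>
#include <cstdint>
#include <optional>

// One versioned lock guarding a location: version in the upper bits,
// lock bit in bit 0. (This encoding is an assumption of the sketch.)
using VersionedLock = std::atomic<uint64_t>;

// Instrumented read, simplified from TL2. Returns false if the
// transaction must abort.
bool instrumentedRead(uint64_t readVersion, VersionedLock& vlock,
                      const int64_t& location,
                      std::optional<int64_t> pendingWrite, int64_t& out) {
    uint64_t pre = vlock.load(std::memory_order_acquire); // one load: lock+version
    if (pre & 1) return false;                            // locked by a committer
    if ((pre >> 1) > readVersion) return false;           // newer than our snapshot
    // Consult the write set first (read-after-write consistency).
    out = pendingWrite ? *pendingWrite : location;
    // (A real implementation would log the address in the read set here.)
    uint64_t post = vlock.load(std::memory_order_acquire);
    return post == pre;                                   // changed meanwhile: abort
}
```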
Going back to our singleton example, Fig 1 shows the state of the transaction right before committing. We have read the value of pFoo (directly from memory, since there was no entry in our write set yet) and logged the address of pFoo in our read set. We have checked that the version of pFoo, 6, was less than the version of our transaction, 8. We have also allocated a new Foo in memory. During the construction of Foo, two of its fields were (virtually) written, and they have been saved in the write set together with the new value of pFoo (the pointer to the new Foo).

Fig 1. An STM transaction before commit. Every location in main memory has a corresponding version/lock entry. Both the read set and the write set have an entry for pFoo. The write set also stores the new value of pFoo, which is a pointer to the newly allocated Foo.
Committing the Transaction

At commit time, we lock all our write locations (there are three of those) using the per-word locks. Since we might be racing with other transactions, we have to be careful not to cause deadlocks. The simplest precaution is to use bounded spinlocks: if we fail to acquire any lock within a set timeout, we abort the transaction.

Sidebar: Sequencing and Validation
During read set validation, we check the lock and the version number of each location that’s been read by our transaction. The problem is that this operation is not atomic (we don’t lock read locations). While we are validating, other transactions might modify the already validated locations. I’ll show you that it really doesn’t matter in the big scheme of things.
In the big scheme of things, there is a sequence of transactions, each making an atomic update. So our program’s memory switches from one consistent state to another. When, precisely, this jump occurs is called the “sequence point.” As I mentioned, the sequence point for each transaction is reached after the write set is locked but before the read set is validated.
Now imagine that we have two read locations, r1 and r2. We have validated r1 and, while we’re validating r2, a stray transaction commits a write to r1. What can be said about the sequence points of the two transactions? Our sequence point was set before we started read validation. Since we have successfully validated r1, it hadn’t been locked by the stray transaction. That means the other transaction’s sequence point must have been set after ours, and we don’t care about “future” transactions. All we have to worry about at this point is whether an earlier transaction, with an earlier sequence point (which is, however, greater than our read version), has overwritten any of those locations.
At this point we atomically increment and fetch the global version clock–this will be our transaction’s write version, not to be confused with the earlier read version. We haven’t committed the transaction yet but, if we commit it, this will be its official commit point, a.k.a. sequence point.
Now we can validate the read set at our leisure. We make sure that the read locations are not locked by another thread, and that their version numbers are still less than or equal to our transaction’s read version. This is a slightly tricky point–see the sidebar.
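A sketch of that validation loop, with an illustrative ReadEntry type (the lockedByUs flag marks locations that are also in our own write set, so their locks are held by us):

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// One read-set entry: the versioned lock of the location (version << 1 |
// lock bit, an assumed encoding), and whether we hold that lock ourselves.
struct ReadEntry {
    std::atomic<uint64_t>* vlock;
    bool lockedByUs;
};

// Validation: every read location must be unlocked (unless locked by us)
// and its version must not exceed our transaction's read version.
bool validateReadSet(const std::vector<ReadEntry>& readSet,
                     uint64_t readVersion) {
    for (const auto& e : readSet) {
        uint64_t v = e.vlock->load(std::memory_order_acquire);
        if ((v & 1) && !e.lockedByUs) return false; // locked by another thread
        if ((v >> 1) > readVersion) return false;   // written after we read it
    }
    return true;
}
```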
In our example, the locks for pFoo and for the two fields inside the newly allocated Foo will be taken, the clock incremented to 9, and the read set validated. The read set is small–just the location of the variable pFoo. It is not locked by another thread (although it is locked by us, because it is also present in our write set), and its version number, 6, is still less than or equal to our read version, 8.
The final step is to commit the write set. Each location is updated and then unlocked. We also set the version number for each modified location to the write version of our transaction (the value of the version clock at our sequence point). In our case, we change the version number of pFoo from 6 to 9.
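That final update-and-unlock step can be sketched as follows, assuming the version/lock word encoding described later under Conserving Memory (WriteEntry and commitWrites are made-up names):

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// One write-set entry at commit time: the target word, the new value, and
// the versioned lock we are currently holding for it (lock bit set).
struct WriteEntry {
    int64_t* addr;
    int64_t value;
    std::atomic<uint64_t>* vlock;
};

// Store each new value, then publish the write version and release the
// lock in a single store (version << 1 has the lock bit cleared).
void commitWrites(std::vector<WriteEntry>& writeSet, uint64_t writeVersion) {
    for (auto& w : writeSet) {
        *w.addr = w.value;
        w.vlock->store(writeVersion << 1, std::memory_order_release);
    }
}
```

The release store is what makes the new value and the new version visible together to other transactions sampling the lock word.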
You might be wondering why we verify each read at least twice–once while executing the instrumented client code, and once during validation. After all, if we read the wrong version of the variable, the transaction will fail anyway during verification. The problem is that, after reading a bunch of inconsistent values, the client code might get into all kinds of trouble, including infinite loops or access violations. This used to be a major concern before TL2.
Reasons to Abort
Let’s consider what can go wrong in our example that would force us to abort (and retry).
- During the first read of pFoo:
  - We find it locked. That could happen if another transaction is just committing its write to pFoo.
  - Its version number is greater than our transaction’s read version number. That could happen if another transaction has just committed its own write.
- During pre-commit locking, we find pFoo locked by another transaction. (We let that transaction proceed.)
- During read validation of pFoo:
  - The location is locked by another transaction. (This can’t happen in our case because we have locked it ourselves.)
  - Its version number is greater than our read version number. This could happen if another transaction committed its write to pFoo after we last read it.
You can easily convince yourself that in each case it really makes sense to abort the transaction. Moreover, when we retry the transaction, pFoo will have already been initialized by another thread. In that case the transaction will just breeze through, almost as fast as the lock-based implementation.
Optimizations
Note that during our transaction other non-conflicting transactions could have committed successfully. Each such transaction increments the version clock. This is why our write version could be arbitrarily larger than our read version. On the other hand, if the write version is just one unit larger than the read version, we know that we were alone, and we can skip the read validation altogether. In fact this is what would have happened in our example, where the read version was 8 and the write version was 9.
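As a sketch, the test is a one-liner (the function name is made up):

```cpp
#include <cstdint>

// If no other transaction committed between our read and write versions,
// no read location can have changed, so read validation can be skipped.
bool canSkipValidation(uint64_t readVersion, uint64_t writeVersion) {
    return writeVersion == readVersion + 1;
}
```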
Another great optimization is read-only transactions. Transactions that don’t perform any writes don’t have to log the reads. It’s enough that they perform the lock/version check at the point of each read. If all reads are successful, the transaction is done. It has seen a consistent state of memory.
It’s worth noting that atomic operations in TL2 are non-blocking. They are usually implemented using the CAS instruction (Compare and Swap). The locking of individual memory words is done using bounded spinlocks.
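A sketch of such a bounded, CAS-based acquisition of one per-word lock (the versioned-lock encoding and the spin budget are assumptions of the sketch):

```cpp
#include <atomic>
#include <cstdint>

// Try to set the lock bit (bit 0) of a versioned lock with a bounded spin.
// On timeout the caller aborts the transaction instead of blocking: no
// thread is ever suspended by the operating system.
bool tryLockVersioned(std::atomic<uint64_t>& vlock, int maxSpins = 100) {
    for (int i = 0; i < maxSpins; ++i) {
        uint64_t v = vlock.load(std::memory_order_relaxed);
        if (v & 1) continue;                   // held by someone else; retry
        if (vlock.compare_exchange_strong(v, v | 1,
                                          std::memory_order_acquire))
            return true;                       // version bits left intact
    }
    return false;                              // timed out: caller aborts
}
```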
Conserving Memory
Let’s see how we can implement all the locks and version numbers without doubling or tripling our computer’s memory. To begin with, we can combine the lock with the version number in one word–a versioned lock. We only need one bit for a spinlock and we can stick it at the bottom of the version number, as long as we only use even version numbers.
The second trick is to use a reasonably sized table of versioned locks and hash all memory locations into that table. Since there will be many locations that map into the same lock, spurious aborts may happen. Notice however that spurious aborts have no impact on the semantics of the program. Aborted transactions will be retried until they commit. In practice such spurious conflicts happen reasonably rarely.
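The whole scheme can be sketched as follows (the table size, the word-granularity shift, and the modulo hash are all assumptions; real implementations tune these):

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// A fixed table of versioned locks (version << 1 | lock bit). Every word
// of memory hashes into this table, so unrelated locations may share an
// entry, causing occasional spurious aborts but never incorrect results.
constexpr std::size_t kLockTableSize = 1 << 20;
std::atomic<uint64_t> lockTable[kLockTableSize]; // zero-init: version 0, unlocked

// Map an address to its versioned lock, at 8-byte-word granularity.
std::atomic<uint64_t>& lockFor(const void* addr) {
    auto a = reinterpret_cast<std::uintptr_t>(addr);
    return lockTable[(a >> 3) % kLockTableSize];
}
```

With a power-of-two table size the modulo reduces to a mask, so the mapping costs only a couple of instructions per access.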
Performance
With the usual caveat that there are lies, damned lies, and benchmarks: the performance of STM clocks in at about half the performance of hand-crafted locks (this is more or less the consensus number–see the interview with Peyton-Jones and Harris for confirmation). That’s not bad, considering all the instrumentation that goes into it. But if you have a program that spends 90% of its time concurrently inserting and removing items from a red-black tree, you might consider hiring a team of highly experienced programmers and testers to create a faster, lock-based, or even lock-free, solution. Take into account an important fact, though: STM is very fine grained. For instance, when you’re inserting an item into a tree, the STM transaction will only lock the nodes that you are actually modifying. STM will easily beat a solution that uses one global lock for the whole tree. Per-node manual locking is hard to implement correctly because of the risk of deadlocks.
By its optimistic nature, STM performs best under low contention scenarios where the vast majority of transactions commit on the first try.
What’s Next?
Will performance problems interfere with a wider acceptance of STM? There is an interesting analogy between garbage collection (GC) and STM put forward by Dan Grossman. Historically, performance concerns were a major obstacle in the acceptance of GC until a general-purpose language, Java, adopted it as the memory management tool. If the analogy holds, the same will happen to STM and shared-memory concurrency.
STM has a permanent spot, and probably the best implementation, in Haskell (doesn’t it remind you of the relationship between Lisp and GC?). There’s been some STM activity in C++, Java, and other languages (see Bibliography) but, without language support, STM doesn’t offer enough bang for the buck in terms of safety and ease of use. That’s why I’m really excited about the use of STM in a batch of new high-performance languages, in particular in Chapel. Will Chapel become to STM what Java became to GC? Only time will tell.
I’ve been following the progress of Chapel’s implementation of STM and talked to some of the developers. In distributed environments, STM can perform outstandingly. That’s because the overhead of STM read/write logging (which is done locally) is dwarfed by the usual distributed communication costs. The trick is to piggyback STM communications on top of remote reads and writes, which have to happen anyway. Chapel also has a chance to implement a type system that makes STM safer to use. I hope to discuss those options in a future post.
Bibliography
- Herlihy and Moss, Transactional Memory: Architectural Support for Lock-Free Data Structures
- Dice, Shalev, Shavit, Transactional Locking II
- Interview with Simon Peyton-Jones and Tim Harris about STM
- Harris, Fraser, Language Support for Lightweight Transactions–STM in Java
- TinySTM, a public domain implementation of STM for C and C++
- Deuce, a Java implementation of STM
- Bronson, Chafi, Olukotun, CCSTM: A Library-Based STM for Scala
- Harris, Marlow, Peyton Jones, Herlihy, Composable Memory Transactions–STM in Haskell.
- Sridharan, Vetter, Kogge, Scalable Software Transactional Memory for Global Address Space Architectures–STM in Chapel
September 27, 2010 at 12:24 pm
> If the analogy holds, the same will happen to STM and shared-memory concurrency.
Ok, let’s wait for another 30 years 🙂
September 27, 2010 at 1:52 pm
Clojure has STM: http://clojure.org/concurrent_programming
September 27, 2010 at 2:22 pm
Another aspect of STM is how well it fits with heterogeneity.
With processors like the Cell (one main processor with several sub-processors adept at performing specific tasks) most likely being the future of computing, an architecture where each processor has its own local memory store lends itself to STM by eliminating one of its two major overheads: data has to be copied locally anyway. The only overhead remaining is the commit, and any subsequent re-run of the atomic block!
February 17, 2015 at 5:59 am
I don’t quite get the point of the ‘big scheme of things’. Consider two scenarios:
Another transaction starts after my transaction and commits to my read set just before my read validation, causing my transaction to fail.
Another transaction starts after my transaction and commits to my read set just after my read validation; my transaction is safe to proceed.
It doesn’t seem different to me in the two cases (my transaction starts before the other transaction), yet the end result is different. The read-set validation seems pointless here?
February 17, 2015 at 5:40 pm
@Hoang: It doesn’t matter when transactions started. What matters is their sequence points. You do worry about transactions that reached their sequence point before you, but you don’t have to worry about the ones that reached it later (they will see your write set locked).
April 16, 2016 at 3:23 pm
I first got to know about MVCC (multiversion concurrency control), which claims to replace locking with versioning. Yet I reasoned that you cannot get away with versions alone. You need some locking at the commit. You seem to prove this. I am still unaware of the difference between STM and MVCC.
From your explanation I did not get, however, how MVCC/STM saves us from locks. You still have to abort and retry in case of a concurrent transaction. This seems to be more laborious than just waiting until that transaction completes. So, I do not see any advantages over classical locks. It looks like cooking stone soup to me: you start with a “lock free” paradigm but what you arrive at in the end is a worse form of locking.
MVCC is a good example. Suppose you have checked out some sources from the server. You can modify individual pieces of code lock free. But, in the end, you commit and find out that those pieces of code have changed. You have a read/write conflict and need to restart your “transaction”. The same effect would be in place if the collaborating parties just locked the pieces of code globally. You go to update something and see that that piece is blocked. You realize it immediately with classical locking; you realize it post factum with MVCC. But the result is the same.
Does MVCC let one party finish where simple locking would cause a deadlock? I could not understand this from your TL2. It seems that if two transactions intermix reads and writes to the same fields, they will conflict and block each other. They both have to drop, I guess. When you say that you may need a limited number of retries, you mean that this scheme is not deadlock free?
April 17, 2016 at 9:05 am
@Valentin: The main point of STM is that it’s optimistic. It assumes that updates are relatively rare, which is true in a lot of scenarios. So, to begin with, conflicts are rare. With locks, you must take a lock every time you read, even if you know that updates to your data are extremely rare in your program.
Secondly, whenever a conflict appears, at least one thread makes progress. That’s guaranteed by the algorithm.
Thirdly, programming with STM is very easy and safe. You never have to worry about deadlocks. That allows you to implement very fine granularity concurrency control. A lot of lock-based data structures use a single lock for a whole data structure, which very quickly becomes a major bottleneck.
MVCC is mostly used in databases and version-control systems, although a version of MVCC is used in the implementation of STM in Clojure.
STM is mostly used for in-memory transactions (although, at some point, the Chapel team at Cray worked on a version of STM for distributed systems). Therefore it doesn’t follow the whole ACID principle: it skips Durability.
April 17, 2016 at 2:02 pm
But, in the end, at the verification/write stage, you still do the locking. So, you need to lock anyway. Shouldn’t this nullify the optimistic no-locking advantages? Because, you see, you have read, and you have checked that nobody has overwritten your read. Yet this check needs to be done under a lock, together with the subsequent write, for atomicity. Otherwise, you cannot be sure that nobody interferes between your last read and the subsequent write.
April 18, 2016 at 8:48 am
But you say that you still need the locks anyway! You confirm that you verify that nobody has overridden the data you have read during the transaction. However, you say, this won’t help because somebody else may interfere between your last read and write operation. You need a lock anyway to ensure atomicity in STM. This is what I have got from your article. So, how can STM save you from locks when you need them anyway? This is the thing I could never understand reading about MVCC. It seems you cannot commit without the lock.
April 18, 2016 at 11:48 am
The whole algorithm is lock free in the sense that there is no blocking. A standard lock has to work with the operating system when it wants to suspend a thread that fails to acquire the lock. That’s very expensive.
The per-word locks in STM are very lightweight. They just do a few CAS instructions in a loop to set the lock bit. If the bit is still locked by another thread, the transaction is aborted and retried. There is no need to suspend any thread. Spinlocks are usually much faster than blocking locks, if contention is rare. These per-word locks in STM are also released very quickly, thus reducing the likelihood of contention even further.
Anyway, the performance analysis of STM is not easy. The whole mechanism is very fine tuned at the lowest level. And performance is not the main reason to use STM — ease of use and safety are the main advantages. This is reminiscent of the tradeoffs between manual memory management using new/delete (in C++) and garbage collection (e.g., in Java).
April 25, 2016 at 11:43 am
Hi Bartosz,
I just read the post; it is pretty instructive and your explanation is indeed more approachable than academic papers, especially “STM in a Nutshell”. But I’m still puzzled by several descriptions as you go deeper. I would really appreciate it if you could explain more on the following.
You mentioned “before every read we have to access our write set in case we have already written to that location”. Can you be more specific on this? My confusion is: how can a thread conflict with its own reads and writes? If so, could you give a scenario?
I don’t quite understand the description: “The lock and version number are sampled in one load, the lock bit is checked, the address stored in the read set, and then the version number and lock rechecked once more. On some processors, read barriers are required when accessing the locks.” Could you give any context on this?
In the example you mentioned, two fields were virtually written and they have been saved in the write set together with the new value of pFoo. Could you explain more on the three fields for write?
Also, when does the read set validation happen? Before we lock all our write locations, or after? I don’t quite get the whole picture of how the read set and write locks interleave with each other.
Also, it seems there are multiple versions, including the read version and the write version. If possible, could you explain the relationship?
Again, thanks for your help on this!
Tingyu
April 27, 2016 at 9:03 am
Hi Bartosz,
Thanks for these answers; they are pretty helpful and the detail is much more approachable to me now. Based on your answer, I have two incremental questions.
You mentioned that the write version (sequence point) is defined by the incremented clock after locking the write set. Is the locally incremented clock also made global at the time of locking? If so, other transactions could get the latest global clock. I just wonder whether the global clock is changed at the time of locking, or after the transaction is over?
You mentioned that a potential risk is that, during commit time, r1 may be overwritten by another transaction’s commit operations on r1. Even so, we don’t need to worry about it. I don’t quite understand the reasoning. It seems the read validation on r1 could trigger some global operation (e.g. locking r1, or changing the version of r1) so that another concurrent transaction can observe the difference on r1 and will not overwrite it. Is that true?
Say we have one transaction (Tran1) with read version 8 and write version 11, and another transaction (Tran2) with read version 9 and write version 10. What will happen if Tran2 overwrites the “r1” in Tran1 during commit?
In another scenario, if we change the read version and write version of Tran2 to 9 and 12 respectively, and keep the read version and write version of Tran1 as 9 and 10, what will happen if Tran2 overwrites r1 in Tran1 during commit?
I really appreciate your time in answering me these questions!
Thanks,
Tingyu
April 30, 2016 at 4:28 am
Yes, it sounds like a conspiracy. All versioning expositions start with the claim that you proceed optimistically, without locks. At the end of the day you find out that locking is still needed at the commit stage. But only a write lock. They say that they read some values and, in a versioning scheme, they need to check the read values before committing the transaction, even though they confirm themselves that it won’t work — a parallel transaction may slip in; it can read-verify and commit while you are doing the same “at your leisure”. You normally take read locks to prevent this, I guess. I do not see how write locks (which prevent write-write conflicts, IMO) can ensure read-verification-and-commit atomicity. Unless you have locked your read locations, there can always be a transaction between your read verification and commit. It can always update pFoo between your pFoo==null and pFoo = new.
Yes, I am recalling something. I read some Wikipedia article about MVCC saying that the read set must be a subset of the write set. This won’t let the other threads update your read set before you commit or roll back.
I read the same hoax in the stackoverflow (TL2 exposition) as well
I wonder why nobody notices that the double check does not save you from a concurrent transaction sneaking in between your read check and write. Even those who write a sidebar telling that a simple double check does not solve the problem forget to make the solution clear. And if you ask them to, they will evade again into explaining how lock-free their algorithm is. The expert refusal to iron out this key issue makes me think that TL2 is a big hoax.
April 30, 2016 at 12:53 pm
Ok, you did read a consistent snapshot, and another transaction starts to overwrite your input data after you validated them but before you flushed the commit output. You say that that transaction must have a higher ID. I agree that it is fine if it is a future transaction overwriting your input data. But couldn’t it be some older one, which was hanging around and suddenly decided to lock its write set and commit, right after your validation but before you committed the write?
May 2, 2016 at 4:43 pm
I understand your frustration. I had to work really hard to understand the original paper. My hope was that I could explain it a little better. Trust me, this is not a hoax or a conspiracy. This stuff has been implemented and it works in commercial software. Haskell’s STM is based on TL2.
May 2, 2016 at 5:09 pm
I agree with Valentin. I’m frustrated by this as well. To figure out the answers to my questions (e.g. how the global version is incremented and how the read is validated without conflict), I have read several papers in the past few days, but none of them explained it clearly. It seems to me that the experts hide the answers behind a lot of discussion of irrelevant details. I don’t know why there is not a single paper discussing this key issue; I believe it could be clearly depicted with a few simple examples.
Tingyu
May 2, 2016 at 6:15 pm
I don’t think the experts are trying to obfuscate something simple. It’s just not that simple. You have to analyze it step by step.
If you can find a simple explanation with just a few examples, I’m all ears.
May 2, 2016 at 6:52 pm
I’m not saying the question is simple. I’m just curious why I can’t find literature that describes this issue. I would appreciate it if you pointed me to some other papers talking about it besides your blog. I would also appreciate it once you get the answers to my questions; I will keep an eye on your blog.
May 2, 2016 at 8:21 pm
I did provide bibliography. In particular this paper by Dice, Shalev, and Shavit.
May 26, 2018 at 1:42 pm
Hi Bartosz,
My name is Alexander Granin.
You probably don’t remember me, but we both gave talks about functional programming in C++ back in 2014 in Novosibirsk. Since then, I have researched this theme a lot; I’ve made many showcase projects and given 4 talks about FP in C++.
And now my biggest achievement is a monadic Software Transactional Memory for C++ and Haskell. It’s based on my original approach, with the Free monad as the key component. The interface is quite similar to Haskell’s STM.
https://github.com/graninas/cpp_stm_free
I’ve implemented the Dining Philosopher task on it, and it works reliably. Also, I’ve finished a tutorial about it:
I hope this implementation will interest the community because it’s really simple internally and can be a demo of STM’s key principles. But the code is not optimized yet. To be honest, it’s highly inefficient currently (because I’ve been working on it for only two months), and probably cannot be used in production at this point. Still, I consider it an interesting approach.
June 28, 2020 at 8:32 am
The first example of a singleton would actually be thread-safe if it were written like this:
static Foo * foo = new Foo;
return foo;
The compiler is obliged to generate thread-safe code for static initialization, and it’s going to insert something like std::once_flag & std::call_once.
Of course this example is as good as any.
February 6, 2021 at 1:57 am
Great write-up!
One thing I still struggle to understand is the whole “Sequencing and Validation” section. No matter how you look at it, there’s a race condition when a context switch occurs after the read-set validation block inside commit(). Such a race condition will result in a transaction possibly looking at an inconsistent view of shared memory.
I mean, if this case doesn’t matter, then the whole readset validation seems redundant and pointless.
Would love a second explanation about why it will not matter.
December 15, 2022 at 8:10 am
I have a question about the final read set validation.
In particular, about the accepted read location versions.
It’s mentioned that they need to be “less or equal to our transaction’s read version.”
But it seems that a version greater than the current transaction’s “write version” is also ok. Is this right?
December 15, 2022 at 10:03 am
I’ll direct you to the original paper, which has moved since. Here’s the new link.