TPC-See?
One thing about concurrency control (“isolation”) in a transactional database is that it incurs costs, and there are broadly two kinds of such costs.
The first such cost is the essential sacrifice of providing isolation. In a general transactional model, two transactions simply cannot concurrently operate mutably on the same piece of data. There’s a speed limit imposed by our requirement for isolation that we can’t optimize our way out of, and it rears its head in the form of aborts (in the case of optimistic concurrency control) and lock contention (in the case of pessimistic concurrency control). Or both, if you’re feeling frisky.
The second such cost is overhead. Stronger levels of isolation typically require more bookkeeping than weaker ones. We need to aggressively track the read and write sets of our transactions, even if, in reality, no conflicts will ever occur.
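To make these two costs concrete, here’s a toy sketch (in Python, with made-up names, not any real database’s implementation) of the kind of bookkeeping an optimistic scheme does. Every read and write gets recorded whether or not anyone is around to conflict with, and conflicts, when they do happen, surface as aborts at commit time.

```python
# Toy optimistic concurrency control. Every transaction records its read set
# and write set (bookkeeping we pay for even when nothing conflicts), and at
# commit time we check whether a concurrently committed transaction wrote
# anything we read (conflicts surface as aborts, the first kind of cost).

class Aborted(Exception):
    pass

class Store:
    def __init__(self):
        self.data = {}
        self.commit_log = []  # (commit_seq, set of keys written)
        self.next_seq = 0

    def begin(self):
        return Txn(self, start_seq=self.next_seq)

class Txn:
    def __init__(self, store, start_seq):
        self.store = store
        self.start_seq = start_seq
        self.read_set = set()
        self.write_set = {}

    def read(self, key):
        self.read_set.add(key)  # bookkeeping on every single read
        if key in self.write_set:
            return self.write_set[key]
        return self.store.data.get(key)

    def write(self, key, value):
        self.write_set[key] = value  # buffered until commit

    def commit(self):
        # Abort if anyone committed a write to a key we read after we began.
        for seq, written in self.store.commit_log:
            if seq >= self.start_seq and written & self.read_set:
                raise Aborted(f"conflict on {written & self.read_set}")
        self.store.data.update(self.write_set)
        self.store.commit_log.append((self.store.next_seq, set(self.write_set)))
        self.store.next_seq += 1
```

Even if two transactions never touch the same key, they both still pay for the sets, the scan over the commit log, and the sequencing at commit.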
In practice, if you remove the contention, you can reduce the impact of the first kind of cost. Replacing a counter increment with a row insertion means that transactions aren’t fighting to increment that one location in memory.
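Using the toy transaction interface from the sketch above (and equally made-up key names), the rewrite looks something like this:

```python
import uuid

# Hot: every transaction reads and writes the same key, so any two concurrent
# page views conflict with each other.
def record_page_view_hot(txn):
    count = txn.read("page_views") or 0
    txn.write("page_views", count + 1)

# Cold: each transaction writes its own unique key, so nobody fights over a
# single location; the total gets summed up when somebody asks for it.
def record_page_view_cold(txn):
    txn.write(f"page_views/{uuid.uuid4()}", 1)

def total_page_views(view_values):
    # view_values: the values stored under all the "page_views/*" keys.
    return sum(view_values)
```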
But if I’ve removed the contention, which is precisely the reason I have concurrency control in the first place, why am I still eating the second cost?
If I’ve designed my workload such that every one of my customers operates in a disjoint keyspace, and every one of my customers only issues transactions serially, then, well, I don’t need any of this stuff at all. All my executions will be serializable without it because all my transactions either commute or are non-concurrent.
I share this example to make the point that the observed isolation of a workload is a function of both the concurrency control algorithm used and the workload itself. If we say that a concurrency control algorithm provides a particular level of isolation, we’re saying that it does so for every workload. But for many specific workloads it might provide something that appears stronger to any observer, if said workload doesn’t expose whatever corner of the algorithm fails to provide it. If I ran the above workload using “Justin’s Shitty Concurrency Control Algorithm,” which has no locking or other safety mechanisms of any kind, the result would be indistinguishable from an algorithm that provides true serializability. In fact, JSCCA is probably better in that situation because it has fewer costs of the second kind.
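For illustration (everything here is hypothetical), JSCCA and that workload might look like this: no locks, no validation, no read or write sets, and yet any interleaving is equivalent to some serial order because concurrent transactions always belong to different customers and touch disjoint keys.

```python
# JSCCA: no locks, no validation, no bookkeeping of any kind. Writes go
# straight to the store.
class JSCCATxn:
    def __init__(self, store):
        self.store = store

    def read(self, key):
        return self.store.get(key)

    def write(self, key, value):
        self.store[key] = value

# Every key is prefixed by the customer that owns it, and each customer only
# issues one transaction at a time, so concurrent transactions commute.
def transfer(store, customer, src, dst, amount):
    txn = JSCCATxn(store)
    src_balance = txn.read(f"{customer}/{src}") or 0
    dst_balance = txn.read(f"{customer}/{dst}") or 0
    txn.write(f"{customer}/{src}", src_balance - amount)
    txn.write(f"{customer}/{dst}", dst_balance + amount)

store = {}
transfer(store, "alice", "checking", "savings", 10)
transfer(store, "bob", "checking", "savings", 25)
```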
The Transaction Processing Performance Council (TPC, for some reason) provides a number of standard benchmarks for comparing database systems. These serve a variety of different purposes, but the most well-known is probably TPC-C (the “-C” standing for the third letter of the alphabet), which is used for testing transaction processing. TPC-C has a long, kind of boring spec that vendors publishing benchmarking blog posts love to ignore. One thing I want to highlight in that spec is that it requires serializable execution for a run of the benchmark to be considered valid. There’s a set of queries that have to return expected results in order for a given run to “count.” Of course, there are also other requirements like “getting your implementation audited by TPC,” which I’m not sure anyone has done since the days when you had to request that a copy of the spec be mailed to you on six thousand floppy disks.
But it raises an interesting question, since if we’re gaming this here benchmark, well, we’ve fixed a workload, which means we might not really need the full power of a serializable isolation algorithm! Maybe we can get away with something more akin to Justin’s Shitty Concurrency Control Algorithm and save ourselves some overhead. But this is a carefully designed, well-regarded benchmark—surely it actually exercises serializable execution?
In Making Snapshot Isolation Serializable, Fekete et al. show how a workload running under snapshot isolation can be guaranteed to have only serializable executions. The paper is great. The point is that they recognize a particular kind of “dangerous structure” in the dependency graph of transactions that is the source of every isolation anomaly that can occur under snapshot isolation, and that if you prevent such dangerous structures, all executions are serializable. Later work turned this into a runtime algorithm that breaks dangerous structures by carefully aborting transactions; the result is called “Serializable Snapshot Isolation,” and that’s the algorithm that eventually made its way into Postgres as its implementation of serializable isolation.
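Roughly (and skipping the papers’ conditions on commit ordering), a dangerous structure is a transaction with both an incoming and an outgoing rw-antidependency edge, where each edge connects transactions that ran concurrently. As a toy check over a precomputed dependency graph, and emphatically not the papers’ or Postgres’s actual code, it might look like this:

```python
def has_dangerous_structure(rw_edges, concurrent):
    """rw_edges: pairs (a, b) meaning 'a read something that b later wrote'
    (an rw-antidependency from a to b). concurrent(a, b) says whether the
    two transactions overlapped in time."""
    for t_in, pivot in rw_edges:
        for maybe_pivot, t_out in rw_edges:
            if (maybe_pivot == pivot
                    and concurrent(t_in, pivot)
                    and concurrent(pivot, t_out)):
                return True
    return False

def always_concurrent(a, b):
    return True

# T1 -rw-> T2 -rw-> T3 with everything concurrent trips the check (T2 is the
# pivot); a lone rw edge does not.
print(has_dangerous_structure({("T1", "T2"), ("T2", "T3")}, always_concurrent))  # True
print(has_dangerous_structure({("T1", "T2")}, always_concurrent))                # False
```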
The paper also analyzes TPC-C and shows that it surprisingly contains no dangerous structures that can induce anomalies! To connect the dots:
- TPC-C requires serializable isolation.
- If you provide snapshot isolation, you can provide serializable isolation by avoiding dangerous structures.
- TPC-C has no dangerous structures.
- Therefore, TPC-C, run under snapshot isolation, provides only serializable executions.
That sounds like an oversight! Is anyone looking into this? I asked my friend Pittsford Travers if he knew anything about this and he said, "well, who do you think might benefit from TPC-C not actually exercising serializability, and would have the power to influence such things?" When I told him I didn't know what he was gesturing at, but that he was sounding conspiratorial again, he sent me this clip (skip to 8:40). Decide for yourself.