More Blind Writes and Merge Operators
Programming Note
There is one week left to email me (firstname . lastname at gmail.com, or reply to this email) your thoughts on the first two chapters of Feedback Control for Computer Systems: Introducing Control Theory to Enterprise Programmers. Less than that, actually: I will be sharing my thoughts next Monday, so you need to message me before then (preferably by Sunday).
To clarify what this "is": it is not a "book club," per se. I think of it as a "study group" loosely organized around the structure of this book. There is no in-person or video call discussion component, there is only:
- emailing me your longer form thoughts on the material, ideally in the form of a blog post I can link to, and
- vaguely real-time discussion on the NULL BITMAP Discord server.
As for the type of commentary I'm interested in: I'm largely interested in your personal reaction to the material, the stuff that I wouldn't be able to get myself. Connections to problems you've faced before, other fields you have some kind of interest in, your emotional reaction to it. It's all fair game. It's an open invitation. Code you've written or visualizations you've made in response to the text are especially welcome; my philosophy on this material is that hands-on "messing around with stuff" experience is the best kind of experience.
Anyway. On with your regularly scheduled NULL BITMAP.
We talked last week about what it might look like if SQL were designed more explicitly with log-structured merge trees in mind. This week I want to go into more detail about the expressive power and limitations of this idea.
First, we have to talk about the relationship between a data structure and a log.
One way to think about a data structure is that it's an object that we continually mutate.
For example, say we have a set data structure:
s = set()
s.add(1) # s is now {1}
s.add(2) # s is now {1, 2}
s.add(3) # s is now {1, 2, 3}
s.remove(2) # s is now {1, 3}
We can also think of this as a fold over the operations we want to perform on s:
Add(1)
Add(2)
Add(3)
Remove(2)
This has been described to me as "logs are the Universal Turing Machine of data structures": if you have the ability to represent a log, you can use that to represent any data structure at all. This is why consensus algorithms like Raft and Paxos are so concerned with replicating logs. We can build, on top of those, any data structure we'd like.
You can also think of this as a sort of state machine (this is where the terminology of "replicated state machine" comes from): the state of our set data structure is the elements contained in it, and the symbols that cause it to transition to a new state are the commands of either Add(x) or Remove(x).
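To make the fold concrete, here's a minimal Python sketch (the tuple encoding of operations and the apply function are my own, not anything canonical): the transition function takes the current state and one log entry and produces the next state, and folding it over the log reconstructs the set.

from functools import reduce

def apply(state, op):
    # The transition function: current state plus one log entry
    # yields the next state.
    kind, x = op
    if kind == "Add":
        return state | {x}
    else:  # "Remove"
        return state - {x}

log = [("Add", 1), ("Add", 2), ("Add", 3), ("Remove", 2)]

# Folding the transition function over the log rebuilds the data structure.
print(reduce(apply, log, set()))  # {1, 3}

Replaying the same log against the same initial state always produces the same final state, which is exactly the property a replicated state machine relies on.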
If you want to represent something like this in a traditional SQL database you need to do a read-modify-write. If we had a "set" column type in SQL and we wanted to add a new element to it, we'd have to read out the old value, compute the new set value, and then re-insert it. You wouldn't do this in most SQL databases, though: you'd set up your data model something like:
CREATE TABLE sets (
id INT,
elem INT,
PRIMARY KEY (id, elem)
);
And then modifications look like this:
INSERT INTO sets VALUES ($1, 1);
INSERT INTO sets VALUES ($1, 2);
INSERT INTO sets VALUES ($1, 3);
DELETE FROM sets WHERE id = $1 AND elem = 2;
This is sort of like what we were describing: we're turning a single point of contention into something that can be handled more granularly.
When you read that value back, if you want it as a single value, you'd probably just do some kind of aggregation to reconstruct it:
SELECT array_agg(elem) FROM sets WHERE id = $1;
This way of doing things is pretty common, but it does have some downsides that we talked about last week:
- the logical schema of the data is unaware of the "structure": it doesn't know we only plan to read it in this aggregated form, so the database can't clean up any of these aggregations in the background automatically, and
- the semantics of SQL demand that these INSERT operations still do a read, because a primary key conflict needs to be reported to the user, even if the user doesn't care. This is still true even if you use some kind of UUID primary key: the database still needs to check.
One way you might think about this is that if the values we've inserted are a, b, c, d, and e, then the computed value is f(f(f(f(a, b), c), d), e).
Now, the problem we run into if we try to move f deeper into the bowels of our storage engine and compute it lazily during cleanup is that there's an order required in the evaluation: we can only compute left-to-right. We can compute ab = f(a, b), and abc = f(ab, c), but we have to proceed in this way: if we ever have c and d right next to each other, we can't just opportunistically merge them, since we need the full chain of values that comes before them.
Therein lies the additional restriction of associativity: f is associative if f(f(a, b), c) = f(a, f(b, c)). Some examples of things that are associative are addition, multiplication, and string concatenation.
If we have associativity, we have the ability to merge objects whenever it's convenient for us.
So, what kinds of things are associative that we can get some value out of? Well, like we said, addition is a big one, because it means we can count stuff. In general, things that are associative smell like function composition.
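To sketch what that buys us, here's a toy counter store in Python, loosely inspired by the merge operators found in LSM storage engines like RocksDB (the names and structure here are mine, not any real engine's API): increments are blind writes, and because addition is associative the store is free to collapse pending operands whenever it's convenient.

class CounterStore:
    def __init__(self):
        self.pending = {}  # key -> list of merge operands (increments)

    def merge(self, key, n):
        # A blind write: we never read the current value.
        self.pending.setdefault(key, []).append(n)

    def compact(self, key):
        # Because addition is associative, we can fold adjacent operands
        # together at any time, without knowing any "base" value.
        ops = self.pending.get(key, [])
        if len(ops) > 1:
            self.pending[key] = [sum(ops)]

    def get(self, key):
        # Reads fold whatever operands remain.
        return sum(self.pending.get(key, []))

store = CounterStore()
store.merge("clicks", 1)
store.merge("clicks", 1)
store.compact("clicks")  # safe to do at any point
store.merge("clicks", 1)
print(store.get("clicks"))  # 3

The important thing is that compact can run at any time, on any adjacent run of operands, and the answer a reader gets is the same.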
There are two main classes of things that aren't associative for these purposes.
The first is things that are morally associative, but aren't for some reason related to their implementation. The root example of this is float addition. Float addition is intended to approximate real number addition, which is associative, but float addition isn't. This can be seen by noticing that for a sufficiently big x and a sufficiently small y, x + y = x, and so (x + y) + y = x as well. And (...((x + y) + y) + ... + y) = x. But if we add up enough of these ys together, they can be large enough to be noticeable when added to x: x + (y + ... + y) != x.
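You can watch this happen in Python. At 1e16 the gap between adjacent representable doubles is 2.0, so a lone 1.0 rounds away, but two 1.0s added together first survive:

x = 1e16
y = 1.0
print((x + y) + y == x)  # True: each y is rounded away individually
print(x + (y + y) == x)  # False: the ys combine into 2.0 first, which registers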
This problem extends to anything that's built on floats: if you have a struct whose fields you combine by adding them together, and any of those fields are floats, your combining function is no longer associative.
Things like this are basically associative, and if you can tolerate some loss of accuracy, treating them as if they were might be okay. The way you could get into trouble with this sort of thing is if you need your values to always come out exactly the same. Say you are replicating data and you checksum it across the various replicas: if the storage engines on two different replicas merged the data in different orders, your checksums will come out differently, even if you don't care much about the slight loss of precision itself.
The other kind of thing is the one that's just not associative. Our set datatype is one example: merging sets and removing elements from sets is not associative:
({1} union {1}) remove {1} = {1} remove {1} = {}
{1} union ({1} remove {1}) = {1} union {} = {1}
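You can check this directly in Python:

print(({1} | {1}) - {1})  # set()
print({1} | ({1} - {1}))  # {1}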
And like, most things are like this. Being associative is a fairly strong property. I think one obvious example of something you might want is arbitrary finite state machines, where it would be cool to be able to blind write state transitions and have the state of your state machine automatically transition.
I wanted to provide a bit more colour to this topic because it sounded like it wasn't that compelling for some people. Next time we talk about it we will get into some examples of workloads that might like to make use of blind writing.
Remember, next week is the first two chapters of the feedback book!