NULL BITMAP by Justin Jaffray
Subscribe
RSS
Archive
Why are Things Associative?
December 16, 2024
The big three algebraic properties that show up in the distributed systems literature are associative, commutativity, and idempotency (EYE-dem-po-ten-see)....
What are the Magical Clocks for?
December 9, 2024
At Re:Invent last week, AWS announced DSQL, their new serverless SQL database. As a fan of distributed SQL databases I have been enjoying reading about the...
Thoughts on DuckDB's Grammar Patching Thing
December 2, 2024
It is the job of modern programming languages to be amenable to abstractions. It should be easy for users to take a little bundle of functionality and reuse...
Fine! I'll Play With Skiplists
November 25, 2024
Follow me on Bluesky! A Log-Structured Merge tree, or LSM, is a popular data structure for storage engines. It’s what is used by RocksDB, which is sort of...
The Prequel to SQL is SEQUEL
November 18, 2024
Meta note: follow me on Bluesky if you are so inclined. I've been going through some historical stuff lately. Been on a history kick. We went to see the...
The CVM Algorithm
November 11, 2024
Everything you need to know about query planning can be understood from this query: SELECT * FROM xy WHERE y = 3 ORDER BY x Imagine we have two indexes, one...
How Many Grains of Sand are in a Pile
November 4, 2024
“This code’s no good,” says Pittsford, pointing at the screen: def update_and_get_balance(account, amount): database.debit(account, amount) return...
Adventures in Probability
October 28, 2024
I hope everyone had a good weekend. I went on a hike. It was great. Statistics One of my great school regrets, next to “taking three years of business...
Pushing Values to Zero
October 21, 2024
There is a popular presentation of the exponential distribution that goes through the geometric distribution. The geometric distribution arises from flipping...
Linearity in Query Processing
October 14, 2024
I saw a fun post this week about interpreting colours as vectors and the way you can exploit that interpretation to nicely represent certain operations you...
The Relational Derivative Pt. 2: Groups are Good
October 7, 2024
Last week we talked about the notion of the relational derivative and its application to enforcing integrity constraints in databases. The relational...
Integrity Constraints and the Relational Derivative
September 30, 2024
In a SQL database, you can set up a foreign key with REFERENCES: nullbitmap=# CREATE TABLE ab (a INT PRIMARY KEY, b INT); CREATE TABLE nullbitmap=# INSERT...
The Two Machines
September 23, 2024
There's a joke in my friend circle that asks "is it a database?" A startup, a program, a syscall, a person good with numbers, a person with a good memory....
Benchmarks That Aren't Your Friends
September 16, 2024
We’ve now talked twice about an important dimension of a benchmark: the openness of the loop. While there’s more subtlety to it, if what you take away is:...
Measuring Throughput
September 9, 2024
Note: I'm trying out enabling comments. Behave! I reserve the right to disable them again for any reason including "the very idea of someone commenting made...
SQL's Grammar Ambiguity
September 2, 2024
I was going to write a longer, more involved post this week, but I wound up getting Covid and didn't have it in me to work through all the stuff I had to for...
Languages Without Abstraction
August 26, 2024
Implementing something like a compiler, there is the understanding that we want different representations of a program for different purposes. This is why we...
The Closed-Loop Benchmark Trap
August 19, 2024
Let's make a little mock database. It won't actually do anything but pretend to handle requests. I've had cause at work lately to write some Go so we're...
Some Books I Like // One Year of NULL BITMAP
August 12, 2024
This is the 52nd NULL BITMAP, which means I have been doing this for a whole year! I skipped one week before I decided that this was going to be a weekly...
Solving Rubik's Cubes with Computers
August 5, 2024
The first place I got into programming was making games. You can probably dig up some games I made long ago if you try hard enough (do not). In high school,...
What do we want? Negation! When do we want it? Not later!
July 29, 2024
Hello I am traveling this weekend! I am home in Ontario. I wanted to share this picture I took of this guy feeding a bear. Bear eating an object out of a...
A Query Planning Guideline
July 22, 2024
A number of readers last week reached out to direct my attention to What if a SQL Statement Returned a Database which is a much more thorough and well-...
Multiple Returns in SQL
July 15, 2024
Podcast Appearance I was on the Thinking About Computers podcast recently! We had a fun chat on the topic of writing technical content and research....
Deriving Functional Dependencies for Selections
July 8, 2024
Over the last two weeks you were babysat by the pre-scheduling feature of Buttondown, since I was away in Japan without my laptop. Here's me in Japan: I had...
To Understand Correctness, You Must First Understand Incorrectness
July 1, 2024
Recently I was in a Discord channel where someone wrote something akin to “I have a question about linearizability” which had an attached thread with 80...
Physical Properties #4
June 24, 2024
Previous parts of this series: Physical Properties #1 Physical Properties #2 Physical Properties #3 Relational query planning makes a distinction between...
In Codd we Trust (or not)
June 17, 2024
Get in loser, we’re doing another Codd philosophizing session. Codd’s paper introducing the relational model opens up like so: Future users of large data...
NULL BITMAP Builds a Database #2: Enter the Memtable
June 10, 2024
I didn't realize how hard it would be to bin-pack episodes of this thing into the roughly ~750 word chunks that I try to keep issues of this newsletter at....
TPC-See?
June 3, 2024
One thing about concurrency control (“isolation”) in a transactional database is that it incurs costs, and there’s broadly two kinds of such costs. The first...
Avoiding Cross Products with the Query Graph
May 27, 2024
To compute the join of two relations, we find all pairs of rows their rows which have the same value for any columns with the same name. This is sometimes...
NULL BITMAP Builds a Database #1: The Log is Literally the Database
May 20, 2024
It is time to end the tyranny of people becoming interested in database implementation and building a BTree. Let us turn to the succor of immutable storage....
A Card Counting Trick
May 13, 2024
I used to call this thing a “game” but my friend Kevin (who is not the same Kevin as last week but who is also a mathematician) kept telling me it’s really...
The Official NULL BITMAP Glossary: Graph Theory Edition
May 6, 2024
Last week a post of mine made it to God’s favourite website and one thing I was struck by was how many people disagreed about basic graph theory terminology....
Not all Graphs are Trees
April 29, 2024
It's pretty easy to imagine how to represent relational algebra expressions as a tree—they are already structurally rooted trees where each operator has its...
Heath's Theorem
April 22, 2024
We don't have all that much in the world of relational query planning that could be considered a "fundamental theorem," as in like, some central idea that...
The Geometry of SQL
April 15, 2024
Today I want to talk about a way to think about some relational algebra operations. First, let's start with this relation: This is just a handful of games...
My First Distributed System
April 8, 2024
I can show you a picture of the first distributed system I ever used: (Not entirely accurate, I had a Game Boy Color.) When I was a kid, we'd spend summers...
A Sniff Test for Some Query Optimizers
April 1, 2024
One important part of query planning is performing transformations over queries. Today I want to see how a couple common databases perform on a completely...
When Compositionality Fails
March 25, 2024
The idea of abstraction is that we can take some complex thing and present it as some simpler thing. Where people can ignore the aspects of the thing not...
So You Want to Generate SQL Queries (me too)
March 18, 2024
We have talked before about how to appropriately test query planners. I wrote there: I love metamorphic testing for SQL databases because it in large part...
CAP is Good, Actually
March 11, 2024
It seems like there are two main takes regarding the CAP theorem online: In introductory materials, it is presented as a deep, fundamental truth about...
A Very Basic Decorrelator
March 4, 2024
Today we're going to begin implementing a simple query decorrelator. We're not going to finish it in this post, and I'm not sure how many posts it will take,...
Some of My Favourite Query Planning Papers
February 26, 2024
Something I've learned about programmers is that for some reason they absolutely love being recommended papers. They lose their minds for it. So here is an...
The Three Places for Data in an LSM
February 19, 2024
We have talked before about how to conceptualize what an LSM does. I want to talk about another way to think through how we put together this data structure....
Physical Properties #3
February 12, 2024
Last week we walked through how a query optimizer might use Physical Properties to optimize a query plan. This week, I want to talk through one surprising...
Physical Properties #2
February 5, 2024
Last week, we talked about the idea of physical properties, which are attributes of a result set that, in some sense, do not have bearing on whether that...
Physical Properties #1
January 29, 2024
I wanna talk about relational algebra. Specifically, the things relational algebra is not concerned with, but obviously matter. Things like: ordering of a...
Testing Query Planners
January 22, 2024
A thing I did not appreciate for a long time is how different pieces of software merit different testing methodologies. I don't necessarily mean like, a...
Why SQL is Unkillable
January 15, 2024
My Twitter bio is it is easier to imagine an end to computing than an end to sql. I've thought a lot about the question of why SQL is such a cockroach. It...
Certificates and Duality
January 8, 2024
I wanna talk about optimization. Not query optimization, or program optimization, today, but the other kind. Mathematical optimization. Like when your high...
Older archives
GitHub
Website
Bluesky
Twitter