NULL BITMAP by Justin Jaffray

April 27, 2026

Where Optimizations Come From

I used to think that optimizations in compilers and databases were a bit of a bandaid for badly-written code or something. I no longer think this is a correct mental model, though.

I've also been trying to learn more about compilers recently, just because I have query planning problems that I feel like must have already been solved by the compiler world. So here's some musing.

Languages like Rust lean on compiler optimizations much more heavily than other low-ish-level languages. In C and Go, it's idiomatic to write more-or-less what you expect to happen. As a result, modulo some things that are hard to express directly (say, stack/heap allocation behaviour in Go), very often the generated code closely resembles the code that was written. In Rust, this is true semantically but not so much syntactically. Because a style of leaning on "zero-cost" abstractions is much more common in Rust, Rust programmers generally have a much greater expectation that the wrappers and indirection and generic code they write will "boil off" into something that's actually reasonable.
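To make the "boil off" expectation concrete, here's a sketch (my example, not from the post) of an iterator chain and the hand-written loop a Rust programmer expects it to compile down to. Each adapter (`map`, `filter`) is its own struct with its own `next` method, but inlining and the usual optimization passes typically collapse the whole chain into straight-line loop code with no allocation or dynamic dispatch:

```rust
// Abstracted version: layers of iterator adapters.
fn sum_of_even_squares(xs: &[i64]) -> i64 {
    xs.iter()
        .map(|x| x * x)
        .filter(|sq| sq % 2 == 0)
        .sum()
}

// Roughly what the optimizer is expected to reduce it to:
fn sum_of_even_squares_manual(xs: &[i64]) -> i64 {
    let mut total = 0;
    for &x in xs {
        let sq = x * x;
        if sq % 2 == 0 {
            total += sq;
        }
    }
    total
}

fn main() {
    let xs = [1, 2, 3, 4, 5];
    // Squares are 1, 4, 9, 16, 25; the even ones sum to 20.
    assert_eq!(sum_of_even_squares(&xs), 20);
    assert_eq!(sum_of_even_squares(&xs), sum_of_even_squares_manual(&xs));
    println!("{}", sum_of_even_squares(&xs));
}
```

If the compiler *didn't* reliably perform this collapse, writing the first version would be a real cost rather than a stylistic choice, which is exactly the point about expectations below.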

In my experience, Rust programmers have a fairly strong mental model of when and how the code they write will get turned into machine code, and they're constantly aware of where the various optimization and monomorphization barriers that they might be introducing are. As a result, the Rust compiler must perform those optimizations or else the software it produces will simply not be fit-for-purpose. This is not as true of the C or Go compilers.

And the reason for this, in a language like Rust, is that people want to write more abstract software. We want to use generics, to use iterator chains, and so on, and as a result optimizations are particularly important. This is cultural, not an innate feature of the language being optimized.

To be reductive, optimizations are needed for two kinds of reasons:

  • decisions that cannot be explicitly expressed in the language, and
  • simplifications from the peeling away of abstractions (which we could have, theoretically, done ourselves).

I think this dichotomy also exists in the database query planning world but manifests in a slightly different way.

The class of things that we can't express (by design) in a SQL query is much broader than in a typical programming language. Save for the presence of things like hints, we can't typically say whether a join should be a hash join or a merge join or an index join, or even in what order the joins should be computed at all. Whether an aggregation will use a hash table or rely on some kind of sortedness is totally inexpressible in vanilla SQL. All of this is the purview of the optimizer.

On the other hand, many SQL queries are built up incrementally across various layers of a stack. A data visualization tool might talk to an ORM which has its own abstractions, which then writes queries in terms of several views. While in theory, if a user saw such a query before it was executed, they could probably rewrite it to remove obvious redundancies, no one except for the database itself actually has a convenient place to do such rewrites. We need the database, which sits at the very end of this process, to peel off the abstractions that the data visualization tool and the ORM leaned on, in order to get to something that's actually appropriate to execute.

I think discussing compilers and query planners together is a bit tricky because they are similar in many ways and very different in many ways. But I think this is one place where they align and it's useful to think about how they're similar.
