The Problem with Declarativity
IN or OR
There was a very nice blog post from OtterTune [1] recently, digging into some of the quirks of the way Postgres executes a particular query:
Query best practices: When should you use the IN instead of the OR operator? | OtterTune
I really liked this post, not just because it's well written. I liked it because it highlights the uncomfortable truth at the heart of declarative query planning.
At the extremes of declarativity, it's really tempting to believe that the program or query is just a way to express what results you want out of the computer. In fact, that's the promise! Codd's whole deal was a world where the computer is the only party that cares about the execution of my query.
Of course in practice, that's untrue on its face. Not just because of some theoretical reason like "the computational difficulty of determining if two queries are the same," or something, but empirically: in popular databases, semantically equivalent ways of writing very common operations get executed completely differently. And users care about that!
Not to mention, query planners do a bad job a lot. You're pretty happy if they let you not think about how your query is executed 90% of the time. You can bring that number up if you're willing to throw even more computational resources at the problem (and usage-based providers would be happy for you to do that), but this is a thing that anyone who interacts with databases in some intimate way will have to think about at some point.
I should say, I'm a declarativity guy. But to me, fervent belief in this kind of declarativity isn't about allowing programmers to ignore "implementation details." I think that argument falls apart pretty fast under some light scrutiny. While it is nice to have those 90% of cases where you can ignore such things, that last 10% still exists.
I think those of us who feel strongly about declarative computation do so because we believe in giving the computer options. I can implement a query by hand in C and it will work fine (well, probably not if I wrote it in C, but you get my point). But as the data it operates on shifts and grows, the implementation decisions I made may no longer make sense. Or they might not be right in a different setting. Or there might be a better way to do things when my program is embedded in a larger context. And using a language like SQL, not imposing requirements on how a query is executed, gives the system the freedom to make adjustments to suit changing needs without intervention. It's about alleviating a maintenance burden and about allowing a different scale of program authorship.
One problem, once you decide that people writing queries should also understand how those queries are executed, is that you are now at least somewhat obligated not to mangle the translation from SQL to query plan too badly. Not that this is a bad idea for debuggability. It does call into question, to me, the entire premise of decoupling "what you want" from "how to compute it."
That's no IN
This particular computation has frustrated me a lot in SQL databases because it's so common and so obnoxious to express. You always end up doing this kind of thing where you construct a big IN statement or a big OR, and you generally have to do this via string concatenation because (at least in Postgres, which is the dialect I'm most familiar with) you can't send big tuples as parameters to placeholders.
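For concreteness, the two shapes look something like this (the table and column names are made up, and the literal list stands in for whatever you spliced into the query string):

```sql
-- The set of interesting ids written as an IN list.
SELECT * FROM users WHERE id IN (1, 2, 3, 4);

-- The same restriction written as a chain of ORs.
SELECT * FROM users WHERE id = 1 OR id = 2 OR id = 3 OR id = 4;
```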
Not to mention that, as the post shows, in Postgres you apparently want to structure this differently depending on the shape of the data. It seems like Postgres doesn't do any kind of cost-based analysis to automatically swap between the two.
The thing that really frustrates me here is that Postgres has the machinery within its guts to optimize this exact computation itself. The computation the author really wants to perform is a join. If you want to restrict some relation to the rows where columns are in some set, you just want the join of that relation on that column with said set.
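As a sketch of what I mean (again, made-up names), the restriction is just a join against the set of wanted values, here spelled as a VALUES list:

```sql
-- Restricting "users" to a set of ids by joining against that set directly,
-- rather than encoding it as an IN list or an OR chain.
SELECT u.*
FROM users AS u
JOIN (VALUES (1), (2), (3), (4)) AS wanted(id)
  ON u.id = wanted.id;
```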
The problem is that Postgres doesn't let you have table-valued parameters, where you send in a relation as a placeholder. You can send in an array and then unnest it, but that's finicky, troublesome with multiple columns, and I would just like something that feels a little more natural.
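For what it's worth, the array workaround looks roughly like this, assuming a single bigint key column; $1 is an ordinary array parameter sent from the client:

```sql
-- Unnest an array parameter into a one-column relation and join against it.
SELECT u.*
FROM users AS u
JOIN unnest($1::bigint[]) AS ids(id)
  ON u.id = ids.id;
```

With multiple columns you end up passing one array per column and relying on multi-argument unnest to zip them back together, which is exactly the kind of finicky bookkeeping I'd rather not do.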
We Fixed the Bug
VLDB was this past weekend and I've been slowly making my way through some of the papers that looked interesting to me. One that stirred up some conversation in my circles online was Automatic SQL Error Mitigation in Oracle by Pasupuleti, Li, Su, and Zaiuddin.
I want to be very clear: I think this is a genius piece of engineering and solves real problems.
That said. It's extremely funny.
Here's the abstract:
Despite best coding practices, software bugs are inevitable in a large codebase. In traditional databases, when errors occur during query processing, they disrupt user workflow until workarounds are found and applied. Manual identification of workarounds often relies on a trial-and-error method. The process is not only time-consuming but also requires domain expertise that users are often lacking. In this paper, we propose a framework to automatically mitigate errors that occur during query compilation (including optimization and code generation) without any user intervention. An error is intercepted by the database internally, a workaround is identified for it, and the query is recompiled using the workaround. The entire process remains transparent to the user with the query being executed seamlessly. The proposed technique handles SQL errors during query compilation and provides three types of mitigation strategies – i) quickly failover to one of the readily-available historical plans for the statement ii) apply targeted error-correcting directives (hints) identified from the optimizer context at the time of the error iii) modify the global configuration of the optimizer using hints.
This feature has been implemented and will be released in an upcoming version of Oracle Autonomous Database.
To get a little more precise regarding what they're talking about, here's a passage from slightly later:
For example, an assertion can be added to check whether a pointer is null before its de-reference and is fired if the pointer is found to be null. In our experience, assertions account for a large percent of bugs filed for SQL query compilation failures.
What they're saying is, if, during the process of planning a query, an assertion fails, they say "ah, that was no good," and then give it another go, but differently this time.
We're not talking about the distributed systems kind of failure, which is normal, transient, and expected in everyday operation. We're talking about automatically mitigating logic bugs that exist in the binary you have shipped to your customers.
This is some next level "reliable computer from unreliable parts" kind of stuff.
I think there's a couple main takes on this. The first is that if you're this afraid of bugs you can detect via assertions, imagine what's lurking in the codebase that can't be detected via assertions. Wrong query results are much more insidious than an assertion failure, and are much harder to detect (though modern research has led to some great tools for this).
However, I think pragmatically speaking, as a business, the easiest way to undermine customer trust in your database (the piece of software you plan on trusting the most) is assertion failures or panics. I actually buy that the cost in trust from your query optimizer choking on a query for no good reason is so high that you will happily do goofy things like this to prevent it from happening. Yes, wrong data is bad. But this is immediate, annoying, and from the customer's perspective, amateur hour.
Oracle is old, crusty, and written in C, which is not known for its safety facilities. And SQL queries come in so many shapes and sizes that, like any programming language, it's impossible to predict every kind of input you'll eventually receive. And unlike most programming languages, you're expected to do deep, logical rewrites of your operators and the way they interact. There are a lot of moving pieces, and planning bugs are found all the time, even in mature databases.
My takeaway is that this is the kind of solution to a problem you'd only encounter working on a piece of software as old and widely-used as Oracle. It's cool to me to see what comes out of that kind of environment.
[1]: Not a sponsor of this newsletter. Nor do I endorse any database technology. Unless one of them wants to pay me a lot of money. And I mean a LOT of money. Definitely at least a twenty. You know how to contact me.