Testing Query Planners
A thing I did not appreciate for a long time is how different pieces of software merit different testing methodologies. I don't necessarily mean like, a pacemaker should be subject to different standards than an app that makes fart noises, I mean, different kinds of software demand different techniques to effectively test them, in addition to demanding different intensities of testing independent of their importance.
Software A might demand more intensive testing than software B not because it is more important that it works properly," but because structurally, socially, or otherwise, it will be more difficult to verify it is correct without particular tools.
Many services, for example, are simple interfaces over some data store. They come up, maybe read some configuration, then serve approximately the same query for their entire lives. The kind of testing this merits is very different from a query planner, which is going to serve a very wide range of different queries and workloads. Two deployments of it might not see any of the same kind of work.
For this reason, I think software with a wide surface area, like a query planner, is much better served by a property-based testing approach than a piecemeal, unit test-at-a-time approach. I want to emphasize that I am not a PBT purist, I think it has a very high cost and has questionable benefit for most things. But for anything that resembles a "language" and permits construction of arbitrarily complex inputs, my experiences have sold me on it being an absolute necessity.
Luckily for us, over the past few years, there have been some fantastic techniques developed expressly for the purpose of doing this kind of testing on SQL databases. They're generally some variation of metamorphic testing. I love metamorphic testing for SQL databases because it in large part reduces the problem of testing a database to building a high-quality query generator. I'll just outline my favourite (simplest) technique, Ternary Logic Partitioning, but Manuel Rigger and his collaborators have, at this point, developed a wide variety of these techniques.
Input: A SQL query Q
and a predicate p
.
Output: Three new SQL queries Q_t
, Q_f
, Q_N
, which must satisfy Q_t + Q_f + Q_N = Q
We transform Q
and output:
SELECT * FROM Q WHERE p
SELECT * FROM Q WHERE NOT p
SELECT * FROM Q WHERE p IS NULL
The power of this transformation is that SQL planners love to push predicates like this down, and, ideally, in some of those cases, the predicates making their way down the tree will disturb the structure, triggering possibly buggy optimizations. Then, you'll be able to detect the difference in output with the original query.
I think this is so powerful because SQL is so old and wide, and it's very difficult for any individual programmer to test all O(n^2)
combinations of features. I think it's very cool that we can, via metamorphic testing, go straight from "SQL query" to "test case," with no need for an oracle of what the right answer is in the middle.