A Sniff Test for Some Query Optimizers

                April 1, 2024

            A Sniff Test for Some Query Optimizers

            One important part of query planning is performing transformations over queries. Today I want to see how a couple common databases perform on a completely made-up and unrepresentative benchmark.
I have a lot of caveats for this one. I need to emphasize before we start that the following test is not remotely indicative of overall optimizer quality, and it's not necessarily even a deficiency if an optimizer doesn't pass it, there's a lot of variables that go into deciding which transformations to do. I don't have a deep understanding of how all of these optimizers work, so it's possible they do the transformations I've listed in different situations, or in a cost-based way, or for their use-cases, they're not that useful. I'm also biased because I worked on CockroachDB and am only aware of several transformations because of that.
I have, out of my ass, plucked three SQL Query optimizations. Today we're going to survey a list (which I have also plucked out of my ass) of databases and see which of them do them.
The transformations I have chosen are:

An EXISTS to semijoin decorrelation,
DISTINCT elimination based on primary key, and
a common subexpression elimination.

EXISTS to semijoin decorrelation
Query
CREATE TABLE ab (a INT PRIMARY KEY, b INT);
CREATE TABLE xy (x INT PRIMARY KEY, y INT);
SELECT * FROM ab WHERE EXISTS (SELECT * FROM xy WHERE a = x);

Explanation
As written, this query suggests evaluating the subquery SELECT * FROM xy WHERE a = x for each row. This can be transformed into a semijoin, or even a regular inner join to exploit the database's built-in join functionality to do it faster.
DuckDB
Verdict: ✅
EXPLAIN SELECT * FROM ab WHERE EXISTS (SELECT * FROM xy WHERE a = x);

┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│         HASH_JOIN         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            SEMI           ├──────────────┐
│  a IS NOT DISTINCT FROM a │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│          SEQ_SCAN         ││          SEQ_SCAN         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             ab            ││             xy            │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             a             ││             x             │
│             b             ││                           │
└───────────────────────────┘└───────────────────────────┘

CockroachDB
Verdict: ✅
> EXPLAIN SELECT * FROM ab WHERE EXISTS (SELECT * FROM xy WHERE a = x);
           info
---------------------------
  distribution: local
  vectorized: true

  • merge join (semi)
  │ equality: (a) = (x)
  │ left cols are key
  │ right cols are key
  │
  ├── • scan
  │     missing stats
  │     table: ab@ab_pkey
  │     spans: FULL SCAN
  │
  └── • scan
        missing stats
        table: xy@xy_pkey
        spans: FULL SCAN
(17 rows)

Postgres
Verdict: ✅
EXPLAIN SELECT * FROM ab WHERE EXISTS (SELECT * FROM xy WHERE a = x);
                            QUERY PLAN
------------------------------------------------------------------
 Hash Join  (cost=60.85..99.39 rows=2260 width=8)
   Hash Cond: (ab.a = xy.x)
   ->  Seq Scan on ab  (cost=0.00..32.60 rows=2260 width=8)
   ->  Hash  (cost=32.60..32.60 rows=2260 width=4)
         ->  Seq Scan on xy  (cost=0.00..32.60 rows=2260 width=4)
(5 rows)

SQLite
Verdict: ❌
> EXPLAIN QUERY PLAN SELECT * FROM ab WHERE EXISTS (SELECT * FROM xy WHERE a = x);
QUERY PLAN
|--SCAN ab
`--CORRELATED SCALAR SUBQUERY 1
   `--SEARCH xy USING INDEX sqlite_autoindex_xy_1 (x=?)

DISTINCT elimination
Query
CREATE TABLE ab (a INT PRIMARY KEY, b INT);
SELECT DISTINCT * FROM ab;

Explanation
Since a is a primary key of the table ab, a must be non-null and distinct for every pair of rows. This means that every row in the table is distinct from every other row, and the query can be simplified to:
SELECT * FROM ab;

DuckDB
Verdict: ❌
D EXPLAIN SELECT DISTINCT * FROM ab;

┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│       HASH_GROUP_BY       │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             #0            │
│             #1            │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            ab.a           │
│            ab.b           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         SEQ_SCAN          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             ab            │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             a             │
│             b             │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           EC: 0           │
└───────────────────────────┘

CockroachDB
Verdict: ✅
> EXPLAIN SELECT DISTINCT * FROM ab;
         info
-----------------------
  distribution: local
  vectorized: true

  • scan
    missing stats
    table: ab@ab_pkey
    spans: FULL SCAN
(7 rows)

Postgres
Verdict: ❌
=# EXPLAIN SELECT DISTINCT * FROM ab;
                         QUERY PLAN
------------------------------------------------------------
 HashAggregate  (cost=43.90..66.50 rows=2260 width=8)
   Group Key: a, b
   ->  Seq Scan on ab  (cost=0.00..32.60 rows=2260 width=8)
(3 rows)

SQLite
Verdict: ❌
> EXPLAIN QUERY PLAN SELECT DISTINCT * FROM ab;
QUERY PLAN
|--SCAN ab USING INDEX sqlite_autoindex_ab_1
`--USE TEMP B-TREE FOR DISTINCT

Common Subexpression Elimination
Query
CREATE TABLE ab (a INT PRIMARY KEY, b INT);
SELECT
  (SELECT max(a) FROM ab),
  (SELECT max(a) FROM ab)+1,
  (SELECT max(b) FROM ab),
  (SELECT max(b) from ab)+1;

Explanation
This one is a little contrived. I'm not even sure what the right way to do it is. I kind of just want to see what happens. I'll say a database passes if it doesn't scan the tables (even partially) four times.
DuckDB
Verdict: ❌
Oh god, this one is getting me sent to your spam filter for sure.
┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│          SUBQUERY         │
│       (SUBQUERY + 1)      │
│          SUBQUERY         │
│       (SUBQUERY + 1)      │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│       CROSS_PRODUCT       ├────────────────────────────────────────────────────────────────────────┐
└─────────────┬─────────────┘                                                                        │
┌─────────────┴─────────────┐                                                          ┌─────────────┴─────────────┐
│       CROSS_PRODUCT       │                                                          │    UNGROUPED_AGGREGATE    │
│                           ├───────────────────────────────────────────┐              │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│                           │                                           │              │         first(#0)         │
└─────────────┬─────────────┘                                           │              └─────────────┬─────────────┘
┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│       CROSS_PRODUCT       │                             │    UNGROUPED_AGGREGATE    ││         PROJECTION        │
│                           ├──────────────┐              │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│                           │              │              │         first(#0)         ││             #0            │
└─────────────┬─────────────┘              │              └─────────────┬─────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│    UNGROUPED_AGGREGATE    ││    UNGROUPED_AGGREGATE    ││         PROJECTION        ││      STREAMING_LIMIT      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││                           │
│         first(#0)         ││         first(#0)         ││             #0            ││                           │
└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         PROJECTION        ││         PROJECTION        ││      STREAMING_LIMIT      ││    UNGROUPED_AGGREGATE    │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││                           ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             #0            ││             #0            ││                           ││          max(#0)          │
└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│      STREAMING_LIMIT      ││      STREAMING_LIMIT      ││    UNGROUPED_AGGREGATE    ││         PROJECTION        │
│                           ││                           ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│                           ││                           ││          max(#0)          ││             a             │
└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│    UNGROUPED_AGGREGATE    ││    UNGROUPED_AGGREGATE    ││         PROJECTION        ││         SEQ_SCAN          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│          max(#0)          ││          max(#0)          ││             a             ││             ab            │
│                           ││                           ││                           ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│                           ││                           ││                           ││             a             │
│                           ││                           ││                           ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│                           ││                           ││                           ││           EC: 0           │
└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└───────────────────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         PROJECTION        ││         PROJECTION        ││         SEQ_SCAN          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             b             ││             b             ││             ab            │
│                           ││                           ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│                           ││                           ││             a             │
│                           ││                           ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│                           ││                           ││           EC: 0           │
└─────────────┬─────────────┘└─────────────┬─────────────┘└───────────────────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         SEQ_SCAN          ││         SEQ_SCAN          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             ab            ││             ab            │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             b             ││             b             │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           EC: 0           ││           EC: 0           │
└───────────────────────────┘└───────────────────────────┘

(Aside, I love this plan format.)
CockroachDB
Verdict: ❌
> EXPLAIN SELECT                                                                                                                                                                           (SELECT max(a) FROM ab),                                                                                                                                                                                          (SELECT max(a) FROM ab)+1,                                                                                                                                                                                        (SELECT max(b) FROM ab),                                                                                                                                                                                          (SELECT max(b) from ab)+1;
                     info
-----------------------------------------------
  distribution: local
  vectorized: true

  • root
  │
  ├── • values
  │     size: 4 columns, 1 row
  │
  ├── • subquery
  │   │ id: @S1
  │   │ original sql: (SELECT max(a) FROM ab)
  │   │ exec mode: one row
  │   │
  │   └── • group (scalar)
  │       │
  │       └── • revscan
  │             missing stats
  │             table: ab@ab_pkey
  │             spans: LIMITED SCAN
  │             limit: 1
  │
  ├── • subquery
  │   │ id: @S2
  │   │ original sql: (SELECT max(a) FROM ab)
  │   │ exec mode: one row
  │   │
  │   └── • group (scalar)
  │       │
  │       └── • revscan
  │             missing stats
  │             table: ab@ab_pkey
  │             spans: LIMITED SCAN
  │             limit: 1
  │
  ├── • subquery
  │   │ id: @S3
  │   │ original sql: (SELECT max(b) FROM ab)
  │   │ exec mode: one row
  │   │
  │   └── • group (scalar)
  │       │
  │       └── • scan
  │             missing stats
  │             table: ab@ab_pkey
  │             spans: FULL SCAN
  │
  └── • subquery
      │ id: @S4
      │ original sql: (SELECT max(b) FROM ab)
      │ exec mode: one row
      │
      └── • group (scalar)
          │
          └── • scan
                missing stats
                table: ab@ab_pkey
                spans: FULL SCAN
(57 rows)

Postgres
Verdict: ❌
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Result  (cost=76.92..76.94 rows=1 width=16)
   InitPlan 2 (returns $1)
     ->  Result  (cost=0.19..0.20 rows=1 width=4)
           InitPlan 1 (returns $0)
             ->  Limit  (cost=0.15..0.19 rows=1 width=4)
                   ->  Index Only Scan Backward using ab_pkey on ab  (cost=0.15..83.51 rows=2249 width=4)
                         Index Cond: (a IS NOT NULL)
   InitPlan 4 (returns $3)
     ->  Result  (cost=0.19..0.20 rows=1 width=4)
           InitPlan 3 (returns $2)
             ->  Limit  (cost=0.15..0.19 rows=1 width=4)
                   ->  Index Only Scan Backward using ab_pkey on ab ab_1  (cost=0.15..83.51 rows=2249 width=4)
                         Index Cond: (a IS NOT NULL)
   InitPlan 5 (returns $4)
     ->  Aggregate  (cost=38.25..38.26 rows=1 width=4)
           ->  Seq Scan on ab ab_2  (cost=0.00..32.60 rows=2260 width=4)
   InitPlan 6 (returns $5)
     ->  Aggregate  (cost=38.25..38.26 rows=1 width=4)
           ->  Seq Scan on ab ab_3  (cost=0.00..32.60 rows=2260 width=4)
(19 rows)

SQLite
Verdict: ❌
QUERY PLAN
|--SCAN CONSTANT ROW
|--SCALAR SUBQUERY 1
|  `--SEARCH ab USING COVERING INDEX sqlite_autoindex_ab_1
|--SCALAR SUBQUERY 2
|  `--SEARCH ab USING COVERING INDEX sqlite_autoindex_ab_1
|--SCALAR SUBQUERY 3
|  `--SEARCH ab
`--SCALAR SUBQUERY 4
   `--SEARCH ab

Conclusion
I dunno. I wouldn't advise drawing any actual conclusions from the outcome of this, besides it being sort of fun to think about. If you know any of these systems intimately and think I misrepresented them, let me know. Or if you have other fun tests like this, also let me know.

Don't miss what's next. Subscribe to NULL BITMAP by Justin Jaffray:

Start the conversation:

NULL BITMAP by Justin Jaffray

A Sniff Test for Some Query Optimizers

`EXISTS` to semijoin decorrelation

Query

Explanation

DuckDB

CockroachDB

Postgres

SQLite

`DISTINCT` elimination

Query

Explanation

DuckDB

CockroachDB

Postgres

SQLite

Common Subexpression Elimination

Query

Explanation

DuckDB

CockroachDB

Postgres

SQLite

Conclusion