Why SQL is Unkillable
My Twitter bio is "it is easier to imagine an end to computing than an end to sql."
I've thought a lot about the question of why SQL is such a cockroach. It seems surprising to me that we have so, so many different programming languages, but only a handful of query languages, and in the realm of general-purpose data manipulation, really only SQL stands tall. Sure, there are a handful of languages purporting to be "a better SQL," but none of those are getting implemented in a major database any time soon. And if I'm making the call on what query language to support in a new database, you bet your behind it's gonna be SQL.
So what's the reason for this distinction? Why has there been a proliferation of programming languages, but among query languages merely the one behemoth of SQL? I think this is overdetermined, and I have a handful of contributing factors in mind.
SQL's Penetration is Absurd
There's probably no computer language in the world with as diverse a userbase as SQL. Now, that's not to say that it's necessarily important that data analysts are able to work with your transactional database, but there's a lot of value in being able to piggyback off of
- extant documentation of how to use SQL, and
- your users' pre-existing understanding of SQL.
When it comes to teaching people how to use a tool, being able to leverage things that already exist is a big win. If you can get value out of fifty years of accumulated documentation, that's great. If people just already know how to query your database, that's even better.
SQL is "Good Enough"
I think this is sort of the lazy answer. But it's true, at least to some degree. For general-purpose programming languages, there is some space for debate about what things are important, and what parts of computation should be emphasized. But if you look at any honest attempt at replacing SQL, you get something that's still vaguely SQL-shaped, in terms of its implementation of the relational model. We got the fundamentals right on the first try, basically. And yeah, there's some bad stuff in there: bizarre syntax, weird semantics, poor capabilities for abstraction. But empirically it seems like people are getting on just fine despite all those things.
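To make "weird semantics" a bit more concrete, here's a minimal sketch of one well-known oddity: how NULL behaves in comparisons. It runs SQL through Python's built-in `sqlite3` module purely because that's the handiest engine to demo with; the table `t` and its column `x` are made up for illustration.

```python
import sqlite3

# Throwaway in-memory database; table `t` and column `x` are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (2,), (None,)])

# NULL = NULL is not true; it evaluates to NULL (None in Python).
null_eq = conn.execute("SELECT NULL = NULL").fetchone()[0]

# The NULL row satisfies neither `x = 2` nor `x <> 2`.
eq_count = conn.execute("SELECT count(*) FROM t WHERE x = 2").fetchone()[0]
ne_count = conn.execute("SELECT count(*) FROM t WHERE x <> 2").fetchone()[0]

# NOT IN against a list that contains NULL never evaluates to true.
not_in_count = conn.execute(
    "SELECT count(*) FROM t WHERE x NOT IN (2, NULL)"
).fetchone()[0]

print(null_eq, eq_count, ne_count, not_in_count)  # prints: None 1 1 0
```

Three rows go in, but the `=` and `<>` counts only add up to two, and the `NOT IN` query returns nothing at all. People trip over this, and then they get on with their day.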
SQL is Not a Standard
Yes, okay, there is a document which is "the SQL standard." That thing is about as toothless as they come, though. Any dream that once existed of having SQL be portable across databases is completely dead. The reason everyone uses SQL is that everyone knows SQL.
But this is not such a bad thing. Once you throw out the dream of one, unified SQL, the language itself becomes a malleable platform. Vendors who have specific needs, or features, can graft them onto SQL pretty easily, without having to go through some standards body, or requiring users to activate some pragma. It's just "_____-flavoured SQL."
I'm Not Going to Transpile My SQL
I've seen a couple of SQL-killers operate by compiling their query language into SQL, so you can use it directly against Postgres, or whatever. Look, I'll be straight with you: I'm not going to do that. It took the JavaScript community years to make that experience satisfactory, and they had the force of Google and Mozilla behind them, along with a ton of community will to make it work. I haven't actually tried one of these languages that compiles to SQL, but it's a hard sell to introduce another layer into the stack for me, chief.
The Query Language is a Small Piece of the Puzzle
A database is an extremely expensive and complex piece of software to build. There are a ton of components to be built, depending on the architecture, that are not related to the query language at all. This means that most-to-all database projects have to answer the question "what query language are we going to use?"
Nobody ever got fired for choosing SQL. In a piece of software with as many moving pieces as a database, an important consideration is to de-risk as many individual pieces as possible. If you have a choice between one of the most popular and successful languages of all time, and something new and arguably "better," the sensible choice is SQL, regardless of any aesthetic qualms one might have with its design.
I think this is related to, but distinct from:
Novelty Budgets
You only get a couple of pivot points to innovate on when building a new database. For a scary, important piece of your tech stack like a database, you're not willing to take weird bets on too many dimensions of it. I'd be willing to bet that SQL, being good enough for most things, is not bad enough to justify spending some of that budget on replacing it.