I am a SQL Injection Attack


                October 4, 2022


Recently I've been very interested in "meatspace models", where you explain a CS concept in terms of the real world. It's the "monad is a burrito" gag, except you actually pick something where the analogy is helpful. This seems especially effective with concurrency topics. I first saw this in terms of race conditions. My own first attempt was two-phase commit, and the Strangeloop TLA+ workshop was framed in terms of musical chairs.
In "Commonsense Computing", Lewandowski et al. argue that students learn concurrency faster when it's presented as "real-life" problems, not abstract algorithms. I think about this a lot. Here's my attempt at translating the two phase commit protocol: https://t.co/mwdbwT5NSj
— Inactive; Bluesky is @hillelwayne(dot)com (@hillelogram) August 25, 2022

Are there meatspace models for things besides concurrency? Turns out I've already used a meatspace model to explain SQL injection attacks to layfriends (say, when one is in the news). At a very high level, an injection attack is the conflation of syntax and data, which isn't a difference most people have encountered before. So here's how I explain it.¹

I used to have this problem when meeting new people:

H: What's your name?
B: Brian.
H: Hillel, nice to meet you.
B: So what's your name?
H: ???

It took me an embarrassingly long time to realize what was happening: my name sounds an awful lot like "Hello"! There are two distinct sentences with the same order of words, but different meanings:

"Hello, nice to meet you": The first word is part of the syntax that makes the sentence a standard idiom.
"Hillel, nice to meet you": The first word is carrying data, my name, to the other party.

A person has to resolve the ambiguity of the sentence and figure out what the first word was for from the context. For standard names, like "Brian", it's easy enough to tell. But my name is exceedingly rare in the US, so people are much more likely to hear "hello" instead.²
Once I realized this, I started using a construct which is unambiguous:

H: Nice to meet you, my name is Hillel.

"My name is X" makes it clear that "X" is definitely data. A person will always interpret what follows as my name. If it sounds like "Hello", instead of assuming it was actually syntax, they'll ask me to repeat it.
Make sense? Great. So how does this apply to "SQL injections"?
SQL Injections
The good thing about computer programs is they're never ambiguous. The bad thing about computer programs is they're never ambiguous.
The way databases work is the programmer sets up a query, and the user provides some data. For example, the query might be:

Display for user INSERT-ID-HERE their email

The only thing you can do is give the query an ID. If you give it ID 19, you get

Display for user 19 their email

Clearly, the query is the syntax, and the ID you submit is intended to be data. A SQL injection attack is when you instead submit something that's ambiguous: it's supposed to be data, but it can also be read as syntax. For example, giving the ID "19 their password and". Then the query becomes

Display for user 19 their password and their email

And now the query engine is interpreting part of the data, the "their password and", as syntax. And now you've leaked the user's password!
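To make that concrete, here's a minimal Python sketch of the vulnerable pattern, using the standard library's sqlite3 module. The users table, its columns, the sample row, and the function names are all made up for illustration. Real SQL also reads a little differently than the English version: instead of "their password and", the attacker tacks a UNION clause onto the ID.

```python
import sqlite3


def make_demo_db():
    # Hypothetical in-memory database with a single made-up user.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, password TEXT)"
    )
    conn.execute("INSERT INTO users VALUES (19, 'brian@example.com', 'hunter2')")
    conn.commit()
    return conn


def email_for_user_unsafe(conn, user_id):
    # VULNERABLE: the user's input is pasted straight into the query text,
    # so the database has no way to tell which part was meant to be data.
    query = f"SELECT email FROM users WHERE id = {user_id}"
    return [row[0] for row in conn.execute(query)]
```

Calling email_for_user_unsafe(conn, "19") returns only the email. Calling it with "19 UNION SELECT password FROM users WHERE id = 19" returns the password as well, because the database can't tell where the data stops and the syntax starts.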
There are different ways we can construct queries to make the data part clearer, but also different ways that attackers can construct their data to look like syntax. Ultimately what we programmers do is use a "prepared statement", where we send the query and data in separate messages. That might look like

Display for user {id} their email
{id} is INSERT-ID-HERE

Now if I try the same trick, I'd be sending to the database

Display for user {id} their email
{id} is 19 their password and

And the database would get back to me "there's no user '19 their password and'." And that's how SQL injections work!
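In Python, a sketch of the "query and data sent separately" idea might look like the following, using the standard library's sqlite3 module and its ? placeholder. The users table, the sample row, and the function names are hypothetical, and the malicious input here is a UNION clause, which is roughly what "their password and" would be in real SQL.

```python
import sqlite3


def make_demo_db():
    # Hypothetical in-memory database with a single made-up user.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, password TEXT)"
    )
    conn.execute("INSERT INTO users VALUES (19, 'brian@example.com', 'hunter2')")
    conn.commit()
    return conn


def email_for_user_safe(conn, user_id):
    # The query text is fixed; the "?" marks where the data goes.
    # The value is passed separately, so it is never parsed as SQL.
    query = "SELECT email FROM users WHERE id = ?"
    return [row[0] for row in conn.execute(query, (user_id,))]
```

Here email_for_user_safe(conn, 19) returns the email, while passing "19 UNION SELECT password FROM users WHERE id = 19" returns an empty list: the whole string is treated as one (nonexistent) ID, just like "there's no user '19 their password and'".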

So that's an example of a meatspace model: first I present a version of the topic that's based in a real-life problem, so the student can relate to it directly. Then I analogize it to the computational domain.³ Notice that it's an incomplete description, and there's going to be aspects of the real system not represented in the meatspace model. This is fine; it's there to bootstrap the person's mental model, not be the mental model.
Even so, we can still dynamically extend the meatspace model to match specific complexities. So it's viable to backport a specific real-system problem into the model, explain it there, and then forward-port into the real model. I found that useful for introducing the topic of strong fairness to my audience. If everybody's playing musical chairs, and there's far more people than chairs, you may never actually sit in a chair, even if you have an endless sequence of rounds. Strong fairness is the guarantee that if you play enough times, you will eventually sit in a chair.
It's interesting to me how easy it is to come up with meatspace models for concurrency topics, while people have tried and failed for years to come up with a good meatspace model for FP monads. Maybe it's too broad a topic?

¹ Mandatory mention of "little bobby tables" so 50,000 people don't email me.

² For a more extreme case, the Hebrew word for "hello" is Shalom, which is also a common Jewish name.

³ If you don't have anybody with convenient ambiguous names hanging around, the Abbott and Costello sketch Who's on First might be a good substitute.

