Predicate Logic for Programmers: Status Report
A 2000-word piece on why I haven't yet finished a different piece
Two years ago I started a new book: Predicate Logic for Programmers. In it I said
People often ask me what’s the best math to learn for formal methods, and my answer is always “predicate logic”. 1 It’s super useful to specifying properties, understanding requirements, and just modeling things in general. Then they ask me how to learn it and I falter.
I estimated it would be in early-access by June 2021. But then real life intervened, and then ADHD happened, and then I didn't touch it for two years. In January of this year I picked it up again, with the goal of having the first draft done by the end of winter. Well, that's not happening either, but I am 10k words in, so that's progress! Let's do a rundown of what the book is, what I have planned, and what I'm struggling with.
Goals and Themes
When you write a book, you should have a good idea of the audience, the goal, and how it'll reach the goal. The first two are easy: the audience is programmers, and the goal is to get them to use predicate logic in their everyday work. To achieve that goal, the book needs to:
- Have lots of exercises, with solutions, to help people develop the skills
- Be focused on the use-cases of logic
- Be concise, no more than 20k words (80ish pages) or so.
These all affect the design decisions of the book. For one, if I'm writing the book for programmers, I should use programmer notations, like &&
instead of /\
. I also use structs and arrays to keep it close to what programmers use. Two, it means I don't use terminology like "models" and I write a bunch of sloppy stuff that wouldn't pass muster in a more mathematically-bent text.
Probably the worst offenders are 1) not bothering to axiomatize sets, and 2) throwing in functions without formally defining them. I spent a lot of time debating both choices and whether I should keep everything formal and rigorous. Ultimately I decided against it because I think the errors these potentially introduce aren't likely to come up in the day-to-day application of logic to programming. So I'm preferring conceptual simplicity to complete soundness.
(By analogy, consider that hash tables aren't constant-time lookup, and have a worst case complexity of O(n). But most hash table implementations make the worst case exceptionally hard to hit in normal use, so virtually all programmers can just work under the assumption that hash lookups are O(1).)
Math is Flexible
Books, even technical books, have themes. The theme of this book is that math is flexible. From the current draft:
People think that math is stricter than code. In fact the opposite is true: programming languages, which need to be computable, are much more exacting than math is. And that's what makes math so useful to us: it's flexibility. In math, we can come up with new syntax whenever we want, as long as it's clear what we're saying. So we can adapt our notation to our problems, instead of the other way around. This is one reason mathematicians like to use all the funny symbols.
One example, take the statement
all x, y in set: P(x, y)
Now, a common pattern is that you want to say "for different x, y". Formally you can do it like this:
all x, y in set: x != y => P(x, y)
But if we know the rote move to say x and y are different, why not just add a new symbol?
all disj x, y, z in set: Q(x, y, z)
^
| disjoint
Math is a language where you can invent new grammar whenever you want.
Logic as abstraction
The other theme is logic as an abstraction. A logical predicate don't need to be "implemented" in the way that code abstractions do. When I say "A user belongs to a group", I can think of at least four ways to represent this at the database level:
- A user record, with a groups field
- A group record, with a users field
- A
user_group
many-to-many table - A
user_events
table withjoin_group
andleave_group
records (or anadmin
table withadd_to_group
andremove_from_group
records!)
Now add on subgroups and conditional groups and the checking logic and you can see how even a simple statement like "A user can belong to a group" gets very complicated to implement!
At the predicate level, we can ignore all those details of implementation and say BelongsTo(user, group)
is a primitive predicate that's true or false. Any implementation eventually maps onto that concept. In fact, we can go even further and say that there's a difference between BelongsTo(user, group)
and SystemThinksBelongsTo(user, group)
: making a distinction between the reality we need to administer and the system we use to administer it. As we build the system, we can "flesh out" the predicates and define them in terms of our chosen implementation.
Book Layout
I'm pretty set on this layout, though I might add a fourth use case if I can think of a really good one.
The Math
Four chapters on propositional logic, predicate logic, set theory, and implication. I elide a lot of specifics and nuance in these chapters, the point is just to quickly get people comfortable with manipulating logical symbols. The whole section is about 5,000 words.
Function specifications
This is the most immediately "obvious" use of logic: formally specifying what it means for a function to be correct. Say we have find(list, x)
, which returns the first index of x
, or null
if there is no such index. We can write the formal specification like this:
given: find(list, x) -> o
ensures:
o = null =>
all i in 0..<len(list): list[i] != x
o != null =>
1. list[o] = x
2. all i in 0..<o:
list[i] != x
(This is also a good example of "math is flexible". There isn't any consensus notation for a function's contracts, so I just made up a yaml notation. Also I stole the ..<
notation from Chapel.)
The chapter's maybe 70% written, and includes a section on how ways to directly test logical specifications: property testing, design by contract, and formal verification.
Data constraints
This is about both internal data invariants, like end >= start
, and global data invariants, like database constraints and such. Something like "every row has a unique id" is global. You need two rows to violate it, each of which is valid by itself but the both together are invalid.
all row1, row2 in db:
row1 != row2 => row1.id != row2.id
I haven't started this chapter, though I have some outlines. I want to talk about the difference between what's representable and what's valid, and discuss class invariants, constraint features and Make Illegal States Unrepresentable as enforcement techniques.
Requirements
This is about encoding English requirements of a system into logical formulae. IMO it's the most valuable use of formal logic, as the process eliminates a lot of ambiguity and reveals subtle consequences of the system. I'm using the Yale library room reservation rules as an example:
- Rooms can be reserved up to 4 months (120 days) in advance
- Patrons can hold 2 active reservations at a time
- Reservations have a 3-hour time limit
- Rooms are open to all when not reserved; however, users with a valid reservation are given priority
- Reservations can only be made by current TC students, faculty, and staff
If you try to formalize these requirements, you learn a lot of things, principally that there's a difference in the domain between a reservation entity and a room being reserved. Some of these rules are relevant to only one of the two. And some rules are ambiguous, such as time limits. "A reservation entities can span at most 3 hours" is behaviorally different from "a room can be reserved for at most 3 hours."
I'm using a couple more examples, too, access policies and a third I haven't decided yet.
Technology
The use cases cover how and why to use formal logic. This is a reference place of "things you can do", sort of like a sin table or handbook of equations. For example, to say that x
is the most in a set under some condition f
, you could write
all y in set: f(x) >= f(y)
It also includes rewrite rules, like all x: !P(x) <=> !(some x: P(x))
, etc.
Challenges
Stuff that's slowing the book down.
Coming up with exercises is hard
So hard.
WTF is Notation
The functional specification chapter has been pretty straightforward because the notation for functional operations already exists, sorta. The problem is the requirement chapter, the things that deal with state. This is the stuff that formal logic is most useful for, but also far enough removed from mathematical theory that there's no material on how to actual do it. I'm making stuff up as I go along.
Just one example: the difference between elements of a type set and elements of the state. Say we have a type of account:
struct Account {
id: Nat,
email: String,
username: String
password: String,
}
This struct is actually a set of all all possible accounts. In any actual system, only a subset of them would actually exist, and that's what we want to write properties on. So how do we distinguish between "the set of accounts" and "the set of accounts that are in your database?" I could just write all x in Account
and conflate the two, but that hides that there is a difference, which is important! $Account
or @Account
? Those work but are they an extraneous notation load? Or maybe I should be writing state.accounts
in case we need to have multiple distinct subsets of Account
in the system...
Anyway there's notational questions like this all over the book, and it'd be easier if there was prior art I could draw upon, but I haven't been able to find any. Right now I'm doing whatever feels right but that leads to a lot of inconsistency in the notations that I use, and I need to get them all consistent for any final version.
Books are Hard
This is a PDF-first project, which I'm working with LaTeX generated by Sphinx. I was so gosh-darn proud of myself when I figured out how to automatically push exercise solutions to the end of the book.
I'm struggling more with the "design" part. I want this book to look good, or at least not like a techie wrote it in LaTeX. How do people design books? How do I lay it out? Should I buy a good font? etc etc etc.
Status and Timeline
I'm on a freeze until April or so to update and run my TLA+ workshop. Right now I have 10k words, so a little over half done? The math section is done except for examples and exercises, function specifications is 70% done, data invariants is 0% done, requirements is 50% done, techniques is 10%. Once I start working on it again, the order I'm thinking is:
- Add outlines of the content I plan to write, and notes, to all the existing chapters.
- Send it around to a couple of friends asking for structural and material feedback.
- Fill it out, with examples and low polish but few exercises.
- Make an early-purchasable version to get wider feedback.
- Add exercises, polish, pay a designer to help make a good cover and layout.
- Beta, final feedback.
- Pay a proofreader and copy-editor.
- Final version!
I don't know what the actual timeline is, though. I mean I'd like to get to at least (4) by the end of the year, but we all know bad I am with deadlines.
That's all I got for today, back to workshop revisions. Have a great week everyone!
If you're reading this on the web, you can subscribe here. Updates are once a week. My main website is here.
My new book, Logic for Programmers, is now in early access! Get it here.