A Very Brief Intro to Formal Methods (aka my job)
Office Hours
Worked pretty well last time, so let's do it again! This Friday, 11 AM CST, Zoom Room [redacted]
, password is [redacted]
. Not sure if this is going to be a weekly thing, might move the time if it is. Feel free to send thoughts my way.
Formal Methods
I recently got a lot of followers from my essays on software history. A lot of you might not know what it is I do for an actual living. And it's something I've always had trouble explaining to people, even other programmers. I work in a pretty weird niche of programming, and while I want to make it more mainstream, it's a pretty uphill battle as you'd expect. Let's see if I can explain what formal methods are!
Note: I cover this all in a lot more detail in Why Don't People Use Formal Methods?. If you want a deeper intro, that's a good 6,000 words for ya.
Proving Code Correct
I'm sure you're all familiar with testing. I have a function sort
and want to make sure works properly. I'll assume for now that we are only dealing with integer arrays. We might write a test like
assert sort([2, 1, 3]) == [1, 2, 3]
But all this is doing is testing a single example of what the sort function should do. I have no guarantee based on this test that, say, an input like [1, -5, 0, 0, 2, -9, 100]
would work! All I know is that the specific input I tested works.
This is fine for most cases. But there are certain domains where this is not enough. It's not enough for something to be probably correct, or definitely correct most of the time. It must be definitely correct all of the time. We're willing to put in extra effort if it guarantees that we are safe in every possible case.
To do that, I first need a specification. The specification is a description of what I want my program to definitely do. In this case the specification of sort
is this:
forall i, j in 0..Len(l)-1:
if i < j, then l[i] <= l[j]
If I can show that this code implements this specification then I'm guaranteed that the output of this function will always be sorted, no matter what I put in. This is called formal verification. We take a piece of code and mathematically prove that it does what we expected to in all circumstances. This is not a pipe dream! There are tools that do this right now. We've actually been doing this kind of work since at least 1975, and the tools have only gotten better since.
If it works, why haven't you heard of it? It's much harder than it sounds, and it sounds pretty hard. First of all, it's meaningless to talk about correctness without being very explicit in what kind of specification you mean. Proving something correct only proves that it matches the specification you intend, not the you have the right specification the first place. For example! Here's a function that satisfies our specification of sort
:
def sort(l: list[int]): list[int]
ensures forall i, j in 0..Len(l)-1:
if i < j, then l[i] <= l[j]
{
return []
}
The problem is that we didn't specify that it should be the input list that is sorted. Our specification did not match our intentions. Writing the correct spec in the first place is half the battle.
And even then, lots of things can interfere with correctness. Consider a function that modifies an array in place. If you look at just that function, it might seem that it guarantees the array is sorted. Now imagine that in a separate thread we are mutating the same array. The sort function cannot guarantee that the output is sorted, because something else could unsort it before sort finished. Pretty much everything that makes programming languages more expressive also makes them harder to prove.
Finally, there's the actual process of proving itself. Most code verification tools need a lot of hints from the programmer or steps filled in for them. It's still possible, but you need a lot of time and skill. This all makes it cost-ineffective for most orgs compared to things like code review and testing.
I told you that story so I can tell you this one
Okay so proving code is hard. One thing we can do is make better tools to prove code. And as I've said before, tools have been getting a lot better over time. There's also something else we can do: instead of proving code correct, we prove abstract models of the system correct. Then, once we're confident the system works, we implement it in code. There's no connection between the proven abstract model and the implemented code, besides for code review and careful testing and stuff. This makes it a lot easier to write and prove properties about. In fact, showing properties hold is often completely automatic, with no extra "hinting" from the programmer needed.
It's kind of hard to explain, but I like to think of it as testing ideas instead of code. I've also heard it described as "design verification", "debuggable design", and "lightweight formal methods". It's also a lot more effective than people think. Probably best to show that with an example. Imagine you have
- Users
- Roles, which users can have
- Resources, with access and block policies, both of which can include both users and roles
- Parent Resources, where access to a parent resource grants access to the child resources. Not all resources have parents.
What kinds of pathological cases would you test? What kind of weird edge cases do you need to worry about?
How about this one?
That's where a user has direct access to a parent resource, but it also has a role that is blocked by the child resource. Does this case satisfy all your properties? If you're lucky, you spend a day tracing the code before deciding that yes, it does. If you're unlucky, you find out late in development and have to redo your entire access model in order to handle this situation.
By contrast, I wrote those four rules in Alloy and it found this particular case in a tenth of a second. And it would test this case, and all other complicated cases, for any property I wanted. Would you rather find problems a few minutes into modeling or months into your development process?
This is stuff I believe is world-changing. I've employed a few of these design verification languages to production systems and the results have been consistently stellar. My favorite case was a greenfield project I was hired on as a consultant. They were trying to create a new kind of blockchain for their purposes and had already built a prototype. Within three hours of modelling, we discovered that a blockchain was completely the wrong data structure and they had to start from square one. Possibly the only time anyone was persuaded not to use a blockchain.
What I actually do
This kind of design modeling is incredibly powerful. But it's also hard to use. This isn't an essential problem but an accidental one. It just happens that they don't have great UI/UX, and most aren't well documented. If something is hard to learn, you're not going to use it, even if it has the potential to cut months off your development time.¹
Most of my work is on making these tools more accessible. That's mostly been writing documentation and guides to make it easier for beginners to get started.² More recently, I've been doing work on auxiliary tooling, mostly CLI. I also teach, which is where most of my income comes from. Companies want to use formal specifications, so they bring me in to teach some people.
(I care a lot about teaching. One of the best compliments I've gotten was from a student who said it was the only corporate workshop he's ever been to that was worth the time. 16 hours of hands-on labs and group exercises.)
I've done the most work on two of these tools:
- TLA+ is a formal specification language that is extremely good at modeling concurrent or distributed systems. I've written two resources on it, a free online guide called Learn TLA+ and a book called Practical TLA+. More recently I've been working on an experimental new CLI to make it easier for people to use TLA+.
- Alloy is a great tool for modeling data structures and relationships in systems. You saw an example of its visualizer up there. I'm in the process of writing the (eventually-will-be) official online documentation, and I've organized an internal project to add an Alloy CLI.
Here's some of my favorite stuff I've written on using these tools:
- Augmenting Agile with Formal Methods: Using TLA+ to find a bug in a thread scheduler that took professional testers weeks to find.
- Formally Modeling Database Migrations: Using Alloy to verify that replacing a set column with a many-to-many relationship doesn't lead to us violating database constraints.
- Feature Interaction Bugs: Using TLA+ to show that two completely independent features of an ecommerce site can both work correctly and break a global invariant when combined.
If you're interested in using this stuff at work, I also wrote a few guides:
- The Business Case for Formal Methods: How to convince your boss to let you spend time learning and applying formal methods. Includes lots of case studies on how it saves money and devtime.
- Using Formal Methods at Work: How to apply FM in productive ways, even when you're just starting out.
- Or you can just hire me lol
¹ I call this the "flossing problem". Half of Americans don't floss, even though it's like two minutes a day. If the annoyingness of flossing is enough to get people to chance root canals, then poor UI/UX on a tool is enough to sink it.
² I also run Let's Prove Leftpad, where people can compare and contrast different theorem provers. But that's more on the "verifying production code" side of things.
If you're reading this on the web, you can subscribe here. Updates are once a week. My main website is here.
My new book, Logic for Programmers, is now in early access! Get it here.