What are the Rosettas of formal specification?

Logic for Programmers

        January 15, 2025

What are the Rosettas of formal specification?
Plus, new book release!

        First of all, I just released version 0.6 of Logic for Programmers! You can get it here. Release notes in the footnote.¹
I've been thinking about my next project after the book's done. One idea is to do a survey of new formal specification languages. There's been a lot of new ones in the past few years (P, Quint, etc), plus some old ones I haven't critically examined (SPIN, mcrl2). I'm thinking of a brief overview of each, what's interesting about it, and some examples of the corresponding models.
For this I'd want a set of "Rosetta" examples. Rosetta Code is a collection of programming tasks done in different languages. For example, "99 bottles of beer on the wall" in over 300 languages. If I wanted to make a Rosetta Code for specifications of concurrent systems, what examples would I use? 
What makes a good Rosetta examples?
A good Rosetta example would be simple enough to understand and implement but also showcase the differences between the languages. 
A good example of a Rosetta example is leftpad for code verification. Proving leftpad correct is short in whatever verification language you use. But the proofs themselves are different enough that you can compare what it's like to use code contracts vs with dependent types, etc. 
A bad Rosetta example is "hello world". While it's good for showing how to run a language, it doesn't clearly differentiate languages. Haskell's "hello world" is almost identical to BASIC's "hello world".
Rosetta examples don't have to be flashy, but I want mine to be flashy. Formal specification is niche enough that regardless of my medium, most of my audience hasn't use it and may be skeptical. I always have to be selling. This biases me away from using things like dining philosophers or two-phase commit.
So with that in mind, three ideas:
1. Wrapped Counter
A counter that starts at 1 and counts to N, after which it wraps around to 1 again.
Why it's good
This is a good introductory formal specification: it's a minimal possible stateful system without concurrency or nondeterminism. You can use it to talk about the basic structure of a spec, how a verifier works, etc. It also a good way of introducing "boring" semantics, like conditionals and arithmetic, and checking if the language does anything unusual with them. Alloy, for example, defaults to 4-bit signed integers, so you run into problems if you set N too high.²
At the same time, wrapped counters are a common building block of complex systems. Lots of things can be represented this way: N=1 is a flag or blinker, N=3 is a traffic light, N=24 is a clock, etc.
The next example is better for showing basic safety and liveness properties, but this will do in a pinch. 
2. Threads
A counter starts at 0. N threads each, simultaneously try to update the counter. They do this nonatomically: first they read the value of the counter and store that in a thread-local tmp, then they increment tmp, then they set the counter to tmp. The expected behavior is that the final value of the counter will be N.
Why it's good
The system as described is bugged. If two threads interleave the setlocal commands, one thread update can "clobber" the other and the counter can go backwards. To my surprise, most people do not see this error. So it's a good showcase of how the language actually finds real bugs, and how it can verify fixes.
As to actual language topics: the spec covers concurrency and track process-local state. A good spec language should make it possible to adjust N without having to add any new variables. And it "naturally" introduces safety, liveness, and action properties.
Finally, the thread spec is endlessly adaptable. I've used variations of it to teach refinement, resource starvation, fairness, livelocks, and hyperproperties. Tweak it a bit and you get dining philosophers.
3. Bounded buffer
We have a bounded buffer with maximum length X. We have R reader and W writer processes. Before writing, writers first check if the buffer is full. If full, the writer goes to sleep. Otherwise, the writer wakes up a random sleeping process, then pushes an arbitrary value. Readers work the same way, except they pop from the buffer (and go to sleep if the buffer is empty).
The only way for a sleeping process to wake up is if another process successfully performs a read or write.
Why it's good
This shows process-local nondeterminism (in choosing which sleeping process to wake up), different behavior for different types of processes, and deadlocks: it's possible for every reader and writer to be asleep at the same time.
The beautiful thing about this example: the spec can only deadlock if X < 2*(R+W). This is the kind of bug you'd struggle to debug in real code. An in fact, people did struggle: even when presented with a minimal code sample and told there was a bug, many testing experts couldn't find it. Whereas a formal model of the same code finds the bug in seconds. 
If a spec language can model the bounded buffer, then it's good enough for production systems.
On top of that, the bug happens regardless of what writers actually put in the buffer, so you can abstract that all away. This example can demonstrate that you can leave implementation details out of a spec and still find critical errors.
Caveat
This is all with a heavy TLA+ bias. I've modeled all of these systems in TLA+ and it works pretty well for them. That is to say, none of these do things TLA+ is bad at: reachability, subtyping, transitive closures, unbound spaces, etc. I imagine that as I cover more specification languages I'll find new Rosettas.

Exercises are more compact, answers now show name of exercise in title

"Conditionals" chapter has new section on nested conditionals

"Crash course" chapter significantly rewritten
Starting migrating to use consistently use == for equality and = for definition. Not everything is migrated yet
"Beyond Logic" appendix does a slightly better job of covering HOL and constructive logic
Addressed various reader feedback
Two new exercises

↩

You can change the int size in a model run, so this is more "surprising footgun and inconvenience" than "fundamental limit of the specification language." Something still good to know! ↩

            If you're reading this on the web, you can subscribe here. Updates are once a week. My main website is here.
My new book, Logic for Programmers, is now in early access! Get it here.

                                Don't miss what's next. Subscribe to Computer Things:

            Email address (required)

          Join the discussion:

                            A

                                    Alex

                        January 24, 2025, morning

        "Rosetta Code for specifications"? I thought it reminded me of something...
Yep, sure, the https://www.cs.ru.nl/~freek/100/ ! 😁
Yes, I know it's a very different thing, it has nothing to do with programming for the starter. Still, if someone wants to get a gist of different Interactive Theorem Provers, the list provides a number of reference points.

Reply
Report

          Add a comment:

    You're not signed in. Posting this comment will subscribe you to this newsletter with the email address you enter below.

                Share this email:

                                Share on Facebook

                                Share on LinkedIn

                                Share on Hacker News

                                Share on Bluesky