Formal Specification Languages
Alloy Workshop
Slots for the September 21st Alloy workshop are still available! Learn how to model domains in a way that finds the bugs for you, rather than three months into the project. Register here. Use the code YARLSNARTH
for $500 off.
Formal Specification Languages
One of the problems of having a general interest newsletter is that a lot of the new subscribers don't actually know what I do for a living. You see the Alloy workshop and wonder "what the heck is Alloy?" I'd guess I have at least 600 new subscribers who have never heard of formal methods, much less about specific formal methods. Since I've been asked to write a quick utility intro to the space for a while now, I figure I'll spend this newsletter just running through some of the tools that are out there. Also I'm only going to talk about formal specificiation languages for design, not formal verification of code. I'm just a lot more experienced in the former. If you want to broader introduction to formal methods, I wrote a bit here and a broader thing on why they're uncommon here.
Broadly speaking, a formal specification language is a notation we can use to describe the design of a system without having to implement it first. Once we have a description of the system, we can then directly test that design itself for bugs. So you can find issues with your system in a couple hours of modeling versus a couple months of implementing. And the issues that formal specification tends to find are the ones that are really subtle, dangerous, and expensive to fix. This makes writing formal specifications a great way to deliver better software faster and cheaper.
(Oh hey, guess who's offering a workshop on this?)
There's a lot of different formal specification languages. A common and unfortunate property of them is that they are mostly developed in academia. Nothing wrong with academia, but it means that they tend not to have great UI/UX or tutorials for industry developers. This is starting to change as more developers in the industry are using these tools, but it's a long road.
I listed some of the formal specification languages I know about below, ordered by how comfortable I am with them and how much I can say about them. All headers are links to the official sites.
Z
That was a lie, I have to start with Z. While I never used it, Z is so historically important to the broader space of formal methods that I have to introduce it first. So much of the field walks down the paths that Z blazed.
It's debatable whether Z, VDM, or Patrick Doyle's work was the first published work in specification. But Z was first specification approach to see "widespread" use, meaning "more than a dozen people". Most famously it was used by Praxis Critical Systems as part of their "correct-by-construction" approach.
Z is based on a mix of set theory to describe the states of the system and schemas to describe its behavior. Everything is expressed as mathematical relationships on sets and relations. This means you need a strong grounding in formal logic to use Z well. Here's what a Z schema looks like:
A whole lot of symbols there. That was one of Z's major problems: it had a huge custom vocabulary that you needed to know in order to use it. With most formal specifications, the challenge is learning how to express your problems in a compatible way. Z had the further problem that it was hard to even speak the language.
The other big problem with Z was that it was too "expressive". Even though it was very hard to say anything, there was a lot you could, theoretically, say. This meant that making tooling for Z was extremely difficult. According to one author I read the tooling never really got past pretty printing and the most basic type checking. Checking the actual properties of your specification was out of reach. The situation may have improved since then, but there's nothing like the modern tooling we have now. I think that Z's still used in teaching environments but is considered obsolete for actual industry practice. There are several descendants, one of which (B/Event-B) is from the same developer and can be considered the official successor.
As a note: Z is pronounced "Zed", even if you're an American. The pronunciation is in the ISO. I pronounce Event-B "Event-Bed" as revenge.
Alloy
Developed by Daniel Jackson in response to his perceived issues with Z. Alloy follows a similar approach in it's design principles, but with two additional constraints: Alloy must have a simple syntax and must be easy to write tooling for. IMO it is succeeded in both of these remarkably well.
Everything in Alloy is either a "signature", or type, or relationship between signatures. For example I could write a directed acyclic graph like this:
sig Node {
edge: set Node
}
fact "DAG" {
no iden & ^edge
}
So Node
is a signature and edge
is a relationship that maps nodes to nodes. There's still some syntax you have to learn but much less than Z had. More complex properties, like changing state and nested information, has to be expressed in as signatures and relationships. That requires a bit of cleverness.
All FM tools have a niche and Alloy's is domain modeling and graph data. It's a great tool if you know that you're going to have to deal with graphs. People tend to use it outside of its best niche, though, because of its absolutely fantastic tooling. Alloy specifications can be cast to SAT problems, meaning that even very complex models can be checked in less than a second. It also has a REPL and a model visualizer, making it easy to share both examples and counterexamples of properties with non-technical people.
Alloy is one of the tools I know best. It feels like a set of lockpicks to me: small, elegant, precise motions of the fingers. Also I have an agenda here. In addition to the workshop stuff, I accidentally-sorta-kinda joined the Alloy board. Long story. I'm currently working on writing the online reference documentation, which you can read the current version of here. It's not yet official, but it's starting to get close to that. Woot!
TLA+
Arguably the fastest growing formal method at present. I like to think I played a big part of that but that might just be my hubris talking.
A fact I glazed over earlier is that modeling time is really, really hard. Z and Alloy do it by splaying all the states out at once and expressing time as a relationship between those sets of states. This works but is very finicky and limited. The other common approach is to use something called temporal logic, which extends standard mathematical logic to where the truth of statements can change over time. There are a lot of different kinds of temporal logic with different trade-offs. One of these, Leslie Lamport's Temporal Logic of Actions, is known to be easier to scale to real world systems than other kinds of temporal logic. He originally proposed that it be integrated with Z, but once that was rejected he created his own specification language called TLA+.
TLA+ is designed to model concurrent and distributed systems. If you have multiple things all changing state nondeterministically then TLA+ is often the right tool for the job. It's also capable of checking liveness properties, properties of systems that must eventually be true for the spec to be valid. For example, an email sending service might buffer emails, but should eventually send them. If the design makes it possible for an email to never be sent then TLA+ can find that as a bug.
TLA+ was the first formal method I discovered. I used it to design a complicated vendor integration and was blown away by its effectiveness. It cut a month off at the timeline and saved another month on post-deployment maintenance. I left that company three years ago but my ex-coworkers have told me that the system is still running along, no problems in prod. That experience inspired me to write an introductory website, then a book, and then start consulting in FM. Like I said, I've got some hubris about my role in popularizing TLA+.
(I'll prob be offering my next TLA+ workshop in October, will release deets on that soon!)
If Alloy is a set of lockpicks then TLA+ is a sledgehammer. It feels big and heavy and cumbersome, and it makes walls disappear. You swing at something you want to not be there anymore.
PRISM
Quick digression from the parade of Z's children. PRISM is a bit of an odd duck. All these tools are for finding faults in systems. Either the system has the problem or it doesn't. PRISM is different. PRISM is used for probabilistic specifications, where different things have different chances of happening. Instead of saying "this will fail if X happens", PRISM can instead say "this has a 20% chance of failing if X happens and X will happen on average once every 7 minutes." Really powerful stuff! The downside is that PRISM has an extremely restricted syntax in order to be tractable. No strings, no functions, no collections of elements. To use PRISM well you have to be very good at converting your problem into PRISM-speak.
(For the record, despite its crippling syntax PRISM is still cutting edge of probabilistic specification. Your next best alternative is to just write a giant transition matrix and hope for the best.)
I've written about PRISM before and said it felt completely useless. But I keep getting drawn back to it because I find it's power so fascinating. I recently realized the trick to it is to owrk one level above the spec: write programs that writes the spec for you. I'm currently working on a piece about exploring queuing theory via PRISM. Here's a sneak peek:
B/Event-B
The direct successor to Z. I don't know too much about this so won't say very much. To my understanding B was the core language and Event-B is the dominant dialect. It tends to focus a bit more on refinement than other specification languages, which broadly speaking is a way of showing that a more complex specification has the same properties as a less complex one. It seems like there's also a focus on physical machines? Most famously it was used to model the Paris Métro Line 14 train controller. I have the Event-B IDE on my computer and it's next my line of things to learn, more to round out my specification experience than for any specific thing I'm looking for in Event-B itself. But it feels like something that I'll be more excited about once I actually know it.
Spin
Purely from a career perspective this is the one I should be looking at next. Spin seems to have that same mix of expressivity, accessibility, and tooling firepower that make TLA+ and Alloy so powerful. I could easily see it being the third language I offer as an industry-oriented workshop.
Spin was designed for modeling networking protocols and that shows in its level of detail. Spin syntax looks like a mishmash of C and Algol-68:
mtype = {M_UP, M_DW};
chan Chan_data_down = [0] of {mtype};
chan Chan_data_up = [0] of {mtype};
proctype P1 (chan Chan_data_in, Chan_data_out)
{
do
:: Chan_data_in ? M_UP -> skip;
:: Chan_data_out ! M_DW -> skip;
od;
};
(from wikipedia)
That's far closer to the level of code implementation then any other specification I've talked about. Spin can even integrate snippets of C into its specifications at the cost of losing model tractability. Spin also has native syntax for communicating sequential processes. It should be feasible to adapt Spin for modeling Go code, a domain I can see performing admirably.
Then again, I haven't used Spin, only read about it. Maybe I'll try out for a few days and find out that "reformat your hard drive on failure" is part of the language semantics. Until then, the official book sits on my bookshelf, mocking my laziness.
Misc Languages
Formal specification languages I have less to say about.
mCRL2
I thought that several different people recommended this to me, but looking back it was just a single Hacker News argument. Let it never be said that I don't dive headfirst into any rabbit hole I see. mCRL2 purports to be a specification language for concurrent systems and protocols. To my understanding it's highly academic and mostly used as part of student projects. One paper lists one success case as being used as part of the Dezyne modeling tool but the paper they reference expressly says that Dezyne uses FDR for verification. I've also heard that it sees some use in automotive software but haven't been able to track down any industrial confirmation of that. I found this language extremely difficult to learn and the public models indicate that the language has lots of boilerplate, so I don't believe it's worth learning.
Runway
A sadly defunct tool. Developed by Diego Ongaro, the inventor of Raft, while he was at Salesforce. It purported to be a simple specification language with easy integration into visualizers, which would make it very easy to teach to beginners. Work on it seems to have stopped after Ongaro was poached by Amazon.
FDR
Formal specification language for communicating sequential processes. Main interesting thing to me is that the specification language is functional and lazy. Seems like it might be good for real-time specifications. The big problem with it is the license: it's only free for research and personal use. Anybody who wants to use it for industry work has to pay for a license.
KeyMaera X
A specification language for real-time cyberphysical systems, with an emphasis on control theory and differential equations. Haven't touched it at all but chatted with one of the core developers and they seem cool.
SDL
An old graphic modeling language. The developer, Ivar Jacobson, would go on to be one of the developers of UML. I don't consider UML an FSL because it was never intended to have mathematically rigorous semantics, though people have used subsets of UML for formal specification. One of UML's successors, SysML, looks to be more formalized.
Other Notations of Note
Finite State Machines
When it comes to actually designing systems at scale, the only effective structure we really have is the finite state machine. They're simple, easy to reason about, and easy to analyze. Most specification languages eventually boil down to "how well can we express the state machine we need." Finite state machines aren't too expressive, though, so people often make more expressive extensions that are less analytically tractable. This includes things like Petri nets, state charts, and transition systems. State machines are great y'all
Decision Tables
A simple technique to exhaustively express all of the outcome of a set of enumerated conditions. I wrote a bit about them here
.
Conclusion
None really, these are all cool, also give me money
Tweetstorms
Only two this week:
- Where math comes from: My response to an internet dogpile I probably shouldn't have gotten involved in. How the math we have changes over time, and the ways we think about algebra and stuff now is different from how people in the past thought about it.
- Exploring problem spaces: how we don't really have a discipline of figuring out what problems we'll run into in a given domain. Not how to solve the problems, just waht the problems are in the first place. A bunch of people recommended domain-driven design as a viable approach here, which I'll have to look into.
-
What I'm referring to as Spin, is in fact, PROMELA/Spin, or as I've recently taken to calling it, PROMELA plus Spin. Everybody refers to it as Spin, but that's just the name of the model checker. PROMELA is name of the specification language. But the same guy developed both and refers to it as Spin, so I'm doing that here too. ↩
If you're reading this on the web, you can subscribe here. Updates are once a week. My main website is here.
My new book, Logic for Programmers, is now in early access! Get it here.