Creatively Misusing TLA+
Five not-quite-normal uses
I spent the past few weeks thinking about complexity and poking dead birds and stuff, but now that the March TLA+ workshop is available (use C0MPUT3RTHINGS
for 15% off!), I'm back in teacher mode and making workshop improvements.1 TLA+ is intended for finding flaws in software designs. But what else can we do with it?
Creative Misuse is the use of a tool in an unintended way. For example, if you use a screwdriver to pry something open, or a book as a monitor stand. Creative misuse in software includes making games in Excel spreadsheets and using yes
to test for broken hardware. Creative misuse is 1) very fun, and 2) expands the space of how useful the tool is. I love finding creative misuses for tools. Here's a few of them for TLA+.
Data Generation
In most programming languages, we construct values step-by-step. In most formal specification languages, we instead take a big set of all possible values and then winnow them down with predicates. This is less efficient, which is why programming languages don't do it, but it's also a lot more expressive.
This is useful for generating things like initial system configurations like "all nodes start with a different priority":
{ priority \in [Node -> 1..NumNodes]:
\A n1, n2 \in Node:
n1 /= n2 => priority[n1] /= priority[n2]
}
But I occasionally flip it around and use it to get values for other purposes, like "give me two sets where every set operation gives something different and interesting".
LET set == SUBSET (1..5) IN
{
<<s1, s2>> \in set \X set:
AllDistinct(<<s1, s2,
s1 \union s2, s1 \intersect s2,
s1 \ s2, s2 \ s1>>)
}
(Here's a good TLA+ exercise: implement AllDistinct
!)
Some subclasses of data generation:
Constraint Solving
One exercise I like a lot:
Solve for x:
x³ - 20x² + 7x = 14,388
.x
is a natural number below 100.
The TLA+ Way:
CHOOSE x \in 0..100: x^3 - 20*x^2 + 7*x = 14388
Okay cool, but now imagine a problem with multiple solutions, where we want to find solution with the lowest cost function. In TLA+:
MaximalChoice(pred(_), cost(_), set) ==
CHOOSE x \in set:
/\ pred(x)
/\ \A y \in set:
pred(y) => cost(x) <= cost(y)
This is using TLA+ as a constraint solver. It's a pretty slow one, since it works by enumerating all possibilities (and it don't do floats). That's why it's a creative misuse!
Dedicated constraint solvers like MiniZinc use all sorts of heuristics to speed things up (and do floats). But unlike TLA+, they don't have collection types. If you want to constraint solve on a nested data structure you have to find some way of encoding it as an array of numbers. In TLA+ you can directly solve with complex data types. For that reason I still reach for it occasionally for small-but-intricate problems.
(For some problems I like using Alloy, for the same reasons. Alloy tends to be better with subtypes and graphs. One of these days I should turn my attention back to Alloy, it's quite nice.)
Truth Tables
For what values are P && !(Q || R)
true?
[P, Q, R \in BOOLEAN |-> P /\ ~(Q \/ R)]
Synthesis
Here's code that generates simple digital circuits, like half-adders. It's in Alloy but the principles are the same.
Finding Solutions
This the same principle as data generation, lifted to the level of specification. TLA+ does worst-case model checking, so it fails if it finds any path to an error. This opens a famous trick: if you want to find the set of steps that solve a problem, write a property saying "the problem isn't solved" and make that an invariant. Then any behavior that finds the solution also breaks the invariant, and the model checker will dutifully spit out the set of steps in that behavior.
Leslie Lamport himself used this trick to solve the diehard water jug puzzle. I've also demoed this trick in learntla, using it to find sequences of operations that add up to a given number.
Even with the pedigree, this is still an unintended use of TLA+: the model checker can't give you more than one solution, at least not without a little bit of cleverness, and it can't constraint-solve on all possible solutions, at least not without a whole lot of cleverness.
Visualizing State Spaces
TLC has a generate graphviz output button, so you can see the state space for a given model.
For tiny state spaces it's a pretty great teaching tool, but past just 100 distinct states it breaks down. Fortunately there are other tools you can use! You can dump the graphviz to a file, and then import it with something like Gephi.
This is a different spec that's about 10 times larger than the graphviz output earlier. I highlighted the one starting state and the 26 terminal states.
I haven't done much with graph analysis, but I'm sure there's also sorts of fun tricks this opens up.
Counting
Imagine we have m workers w₁, w₂, … wₘ. The first one takes N₁ steps to complete, the second N₂, etc. Each worker can run between any two steps of another worker. How many possible interleavings are there total?
This many:
(N₁+N₂+...+Nₘ)!
--------------
N₁!*N₂!*...*Nₘ!
You can reason out the equation from first principles, but I found it a lot easier to first put a few combinations of values into TLA+, see how big the corresponding state spaces were, and then figure out the formula from that.
Generating Test Cases
Can we use TLA+ to directly verify our code? …sort of. In eXtreme modeling in practice, a team at MongoDB2 investigated two possible ways:
- Taking traces of the real system behavior and seeing if they match a valid TLA+ behavior
- Taking behaviors from TLA+ simulations and generating corresponding tests
According to their experience, (1) is too inefficient to do at scale. They were more successful with (2):
We found [system traces] to be impractical for testing that the Server conformed to a highly abstract specification. [Test generation] was highly successful for Realm Sync, however.
Note the constraints on their system: they used TLA+ to generate possible inputs to a complex, but deterministic, function. They speculate it would be more difficult to do for nondeterministic or concurrent systems, which also matches my experience. It's a lot easier if all the processes are modeled as communicating state machines, though, an idea that P explores in more detail.
I like using tools in weird ways. Most of these creative misuses won't be covered in the workshop because I'll already have my hands full teaching the actual uses of TLA+, verifying properties of software designs.
Update for the Internets
This was sent as part of an email newsletter; you can subscribe here. Common topics are software history, formal methods, the theory of software engineering, and silly research dives. Updates are 6x a month. I also have a website where I put my polished writing (the newsletter is more for off-the-cuff stuff).
If you're reading this on the web, you can subscribe here. Updates are once a week. My main website is here.
My new book, Logic for Programmers, is now in early access! Get it here.