TLA from first principles
Building an intuition for the mathematics behind the "TLA" in "TLA+"
No Newsletter next week
I'll be speaking at USENIX SRECon!
TLA from first principles
I'm working on v0.5 of Logic for Programmers. In the process of revising the "System Modeling" chapter, I stumbled on a great way to explain the Temporal Logic of Actions that TLA+ is based on. I'm reproducing that bit here with some changes to fit the newsletter format.
Note that by this point the reader has already encountered property testing, formal verification, decision tables, and nontemporal specifications, and should already have a lot of practice expressing things as predicates.
The intro
We have some bank users, each with an account balance. Bank users can wire money to each other. We have overdraft protection, so wires cannot reduce an account value below zero.
For the purposes of introducing the ideas, we'll assume an extremely simple system: two hardcoded
variables alice
and bob
, both start with 10 dollars, and transfers
are only from Alice to Bob. Also, the transfer is totally atomic: we
check for adequate funds, withdraw, and deposit all in a single moment
of time. Later [in the chapter] we'll allow for multiple nonatomic transfers at the same time.
First, let's look at a valid behavior of the system, or possible way it can evolve.
alice 10 > 5 > 3 > 3 > ...
bob 10 > 15 > 17 > 17 > ...
In programming, we'd think of alice
and bob
as variables that change. How do we represent those variables purely in terms of predicate logic? One way is to instead think of them as arrays of values. alice[0]
is the initial state of alice
, alice[1]
is after the first time step, etc. Time, then, is "just" the set of natural numbers.
Time = {0, 1, 2, 3, ...}
alice = [10, 5, 3, 3, ...]
bob = [10, 15, 17, 17, ...]
In comparison to our valid behavior, here are some invalid behaviors:
alice = [10, 3, ...]
bob = [10 15, ...]
alice = [10, 1, ...]
bob = [10 21, ...]
The first is invalid because Bob received more money than Alice lost. The second is invalid because it violates our proposed invariant, that accounts cannot go negative. Can we write a predicate that is true for valid transitions and false for our two invalid behaviors?
Here's one way:
Time = Nat // {0, 1, 2, etc}
Transfer(t: Time) =
some value in 0..=alice[t]:
1. alice[t+1] = alice[t]  value
2. bob[t+1] = bob[t] + value
Go through and check that this is true for every t
in the valid
behavior and false for at least one t
in the invalid behavior. Note
that the steps where Alice doesn't send a transfer also pass
Transfer
; we just pick value = 0
.
I can now write a predicate that perfectly describes a valid behavior:
Spec =
1. alice[0] = 10
2. bob[0] = 10
3. all t in Time:
Transfer(t)
Now allowing "nothing happens" as "Alice sends an empty transfer" is a little bit weird. In the real system, we probably don't want people to constantly be sending each other zero dollars:
Transfer(t: Time) =
 some value in 0..=alice[t]:
+ some value in 1..=alice[t]:
1. alice[t+1] = alice[t]  value
2. bob[t+1] = bob[t] + value
But now there can't be a timestep where nothing happens. And that means
no behavior is valid! At every step, Alice must transfer at least one dollar to Bob.
Eventually there is some t
where alice[t] = 0 && bob[t] = 20
. Then
Alice can't make a transfer, Transfer(t)
is false, and so Spec
is
false.^{1}
So typically when modeling we add a stutter step, like this:
Spec =
1. alice[0] = 10
2. bob[0] = 10
3. all t in Time:
 Transfer(t)
 1. alice[t+1] = alice[t]
2. bob[t+1] = bob[t]
(This is also why we can use infinite behaviors to model a finite algorithm. If the algorithm completes at t=21
, t=22,23,24...
are all stutter steps.)
There's enough moving parts here that I'd want to break it into subpredicates.
Init =
1. alice[0] = 10
2. bob[0] = 10
Stutter(t) =
1. alice[t+1] = alice[t]
2. bob[t+1] = bob[t]
Next(t) = Transfer(t) // foreshadowing
Spec =
1. Init
2. all t in Time:
Next(t)  Stutter(t)
Now finally, how do we represent the property NoOverdrafts
? It's an
invariant that has to be true at all times. So we do the same thing we
did in Spec
, write a predicate over all times.
property NoOverdrafts =
all t in Time:
alice[t] >= 0 && bob[t] >= 0
We can even say that Spec => NoOverdrafts
, ie if a behavior is valid
under Spec
, it satisfies NoOverdrafts
.
One of the exercises
Modify the Next
so that Bob can send Alice transfers, too. Don't try
to be too clever, just do this in the most direct way possible.
Bonus: can Alice and Bob transfer to each other in the same step?
Solution [in back of book]: We can rename Transfer(t)
to TransferAliceToBob(t)
, write the
converse as a new predicate, and then add it to next
. Like this
TransferBobToAlice(t: Time) =
some value in 1..=bob[t]:
1. alice[t+1] = alice[t]  value
2. bob[t+1] = bob[t] + value
Next(t) =
 TransferAliceToBob(t)
 TransferBobToAlice(t)
Now, can Alice and Bob transfer to each other in the same step? No.
Let's say they both start with 10 dollars and each try to transfer five
dollars to each other. By TransferAliceToBob
we have:
1. alice[1] = alice[0]  5 = 5
2. bob[1] = bob[0] + 5 = 15
And by TransferBobToAlice
, we have:
1. bob[1] = bob[0]  5 = 5
2. alice[1] = alice[0] + 5 = 15
So now we have alice[1] = 5 && alice[1] = 15
, which is always false.
Temporal Logic
This is good and all, but in practice, there's two downsides to treating time as a set we can quantify over:
 It's cumbersome. We have to write
var[t]
andvar[t+1]
all over the place.  It's too powerful. We can write expressions like
alice[t^25] = alice[t] + t
.
Problem (2) might seem like a good thing; isn't the whole point of logic to be expressive? But we have a longterm goal in mind: getting a computer to check our formal specification. We need to limit the expressivity of our model so that we can make it checkable.
In practice, this will mean making time implicit to our model, instead of explicitly quantifying over it.
The first thing we need to do is limit how we can use time. At a
given point in time, all we can look at is the current value of a
variable (var[t]
) and the next value (var[t+1]
). No var[t+16]
or
var[t1]
or anything else complicated.
And it turns out we've already seen a mathematical convention for
expressing this: priming!^{2} For a
given time t
, we can define var
to mean var[t]
and var'
to mean
var[t+1]
. Then Transfer(t)
becomes
Transfer =
some value in 1..=alice:
1. alice' = alice
2. bob' = bob
Next we have the construct all t in Time: P(t)
in both Spec
and
NoOverdrafts
. In other words, "P is always true". So we can add
always
as a new term. Logicians conventionally use □ or []
to mean the same thing.^{3}
property NoOverdrafts =
always (alice >= 0 && bob[t] >= 0)
// or [](alice >= 0 && bob[t] >= 0)
Spec =
Init && always (Next  Stutter)
Now time is almost completely implicit in our spec, with just one
exception: Init
has alice[0]
and bob[0]
. We just need one more
convention: if a variable is referenced outside of the scope of a
temporal operator, it means var[0]
. Since Init
is outside of the []
, it becomes
Init =
1. alice = 10
2. bob = 10
And with that, we've removed Time
as an explicit value in our model.
The addition of primes and always
makes this a temporal logic, one that can model how things change over time. And that makes it ideal for modeling software systems.
Modeling with TLA+
One of the most popular specification languages for modeling these kinds of concurrent systems is TLA+. TLA+ was invented by the Turing awardwinner Leslie Lamport, who also invented a wide variety of concurrency algorithms and LaTeX. Here's our current spec in TLA+:
 MODULE transfers 
EXTENDS Integers
VARIABLES alice, bob
vars == <>
Init ==
alice = 10
/\ bob = 10
AliceToBob ==
\E amnt \in 1..alice:
alice' = alice  amnt
/\ bob' = bob + amnt
BobToAlice ==
\E amnt \in 1..bob:
alice' = alice + amnt
/\ bob' = bob  amnt
Next ==
AliceToBob
\/ BobToAlice
Spec == Init /\ [][Next]_vars
NoOverdrafts ==
[](alice >= 0 /\ bob >= 0)
====
TLA+ uses ASCII versions of mathematicians notation: /\
/\/
for
&&/
, \A \E
for all/some
, etc. The only thing that's "unusual"
(besides ==
for definition) is the [][Next]_vars
bit. That's TLA+
notation for [](Next  Stutter)
: Next
or Stutter
always happens.
The rest of the chapter goes on to explain model checking, PlusCal (for modeling nonatomic transactions without needing to explain the exotic TLA+ function syntax), and liveness properties. But this is the intuition behind the "temporal logic of actions": temporal operators are operations on the set of points of time, and we restrict what we can do with those operators to make reasoning about the specification feasible.
Honestly I like it enough that I'm thinking of redesigning my TLA+ workshop to start with this explanation. Then again, maybe it only seems good to me because I already know TLA+. Please let me know what you think about it!
Anyway, the new version of the chapter will be in v0.5, which should be out midNovember.
Blog Rec
This one it's really dear to me: Metadata, by Murat Demirbas. When I was first trying to learn TLA+ back in 2016, his post on using TLA+ in a distributed systems class was one of, like... three public posts on TLA+. I must have spent hours rereading that post and puzzling out this weird language I stumbled into. Later I emailed Murat with some questions and he was super nice in answering them. Don't think I would have ever grokked TLA+ without him.
In addition to TLA+ content, a lot of the blog is also breakdowns of papers he read— like the morning paper, except with a focus on distributed systems (and still active). If you're interested in learning more about the science of distributed systems, he has an excellent page on foundational distributed systems papers. But definitely check out his his deep readings, too!

In the book this is presented as an exercise (with the solution in back). The exercise also clarifies that since
Time = Nat
, all behaviors have an infinite number of steps. ↩ 
Priming is introduced in the chapter on decision tables, and again in the chapter on database invariants.
x'
is "the next value ofx
", so you can use it to express database invariants like "jobs only move fromready
tostarted
oraborted
." ↩ 
I'm still vacillating on whether I want a "beyond logic" appendix that covers higher order logic, constructive logic, and modal logic (which is what we're sneakily doing right now!)
While I'm here, this explanation of
always
asall t in Time
isn't 100% accurate, since it doesn't explain why things like[](P => []Q)
or<>[]P
make sense. But it's accurate in most cases and is a great intuition pump. ↩
If you're reading this on the web, you can subscribe here. Updates are once a week. My main website is here.
My new book, Logic for Programmers, is now in early access! Get it here.
Aww, I was hoping you were going to use NoOverdrafts
to then show how the spec could instead be written by including NoOverdrafts
in the spec itself to obviate the upper bound in the transfer spec itself, i.e. some value in 1..=alice[t]
can then turn into just some value in 1..
.
First thought: we could do that, but that puts the cart before the horse! The goal of writing the spec is to make sure that a higherlevel property is guaranteed by a lowerlevel specification.
Second thought: I wonder if any safety property can be turned into a deadlock property by doing exactly what you said, and incorporating the property as part of the spec.
I like this method of explanation; I think it helps with getting an intuition for how time is implicit in TLA+. Quantifying over
t: Time
seems like it makes sense once you've introduced quantifiers, and then you can motivate priming as a straightforward followup from that.Possible typo in the code block after introducing priming; right now it says
Transfer = some value in 1..=alice: 1. alice' = alice 2. bob' = bob
Should that be1. alice' = alice + value 2. bob' = bob  value
?Argh, formatting.
Original code block:
Transfer = some value in 1..=alice: 1. alice' = alice 2. bob' = bob
Possible correction:
Transfer = some value in 1..=alice: 1. alice' = alice  value 2. bob' = bob + value
...apparently multiline code blocks don't work in Buttondown's comment formatting. Ah well.