Computer Things

Subscribe
Archives
October 22, 2024

TLA from first principles

Building an intuition for the mathematics behind the "TLA" in "TLA+"

No Newsletter next week

I'll be speaking at USENIX SRECon!

TLA from first principles

I'm working on v0.5 of Logic for Programmers. In the process of revising the "System Modeling" chapter, I stumbled on a great way to explain the Temporal Logic of Actions that TLA+ is based on. I'm reproducing that bit here with some changes to fit the newsletter format.

Note that by this point the reader has already encountered property testing, formal verification, decision tables, and nontemporal specifications, and should already have a lot of practice expressing things as predicates.


The intro

We have some bank users, each with an account balance. Bank users can wire money to each other. We have overdraft protection, so wires cannot reduce an account value below zero.

For the purposes of introducing the ideas, we'll assume an extremely simple system: two hardcoded variables alice and bob, both start with 10 dollars, and transfers are only from Alice to Bob. Also, the transfer is totally atomic: we check for adequate funds, withdraw, and deposit all in a single moment of time. Later [in the chapter] we'll allow for multiple nonatomic transfers at the same time.

First, let's look at a valid behavior of the system, or possible way it can evolve.

alice   10 ->  5 -> 3  -> 3  -> ...
bob     10 -> 15 -> 17 -> 17 -> ...

In programming, we'd think of alice and bob as variables that change. How do we represent those variables purely in terms of predicate logic? One way is to instead think of them as arrays of values. alice[0] is the initial state of alice, alice[1] is after the first time step, etc. Time, then, is "just" the set of natural numbers.

Time  = {0, 1, 2, 3, ...}
alice = [10, 5, 3, 3, ...]
bob   = [10, 15, 17, 17, ...]

In comparison to our valid behavior, here are some invalid behaviors:

alice = [10, 3,  ...]
bob   = [10  15, ...]

alice = [10, -1,  ...]
bob   = [10  21,  ...]

The first is invalid because Bob received more money than Alice lost. The second is invalid because it violates our proposed invariant, that accounts cannot go negative. Can we write a predicate that is true for valid transitions and false for our two invalid behaviors?

Here's one way:

Time = Nat // {0, 1, 2, etc}

Transfer(t: Time) =
  some value in 0..=alice[t]:
    1. alice[t+1] = alice[t] - value
    2. bob[t+1] = bob[t] + value

Go through and check that this is true for every t in the valid behavior and false for at least one t in the invalid behavior. Note that the steps where Alice doesn't send a transfer also pass Transfer; we just pick value = 0.

I can now write a predicate that perfectly describes a valid behavior:

Spec = 
  1. alice[0] = 10
  2. bob[0]   = 10
  3. all t in Time:
    Transfer(t)

Now allowing "nothing happens" as "Alice sends an empty transfer" is a little bit weird. In the real system, we probably don't want people to constantly be sending each other zero dollars:

Transfer(t: Time) =
- some value in 0..=alice[t]:
+ some value in 1..=alice[t]:
    1. alice[t+1] = alice[t] - value
    2. bob[t+1] = bob[t] + value

But now there can't be a timestep where nothing happens. And that means no behavior is valid! At every step, Alice must transfer at least one dollar to Bob. Eventually there is some t where alice[t] = 0 && bob[t] = 20. Then Alice can't make a transfer, Transfer(t) is false, and so Spec is false.1

So typically when modeling we add a stutter step, like this:

Spec =
  1. alice[0] = 10
  2. bob[0]   = 10
  3. all t in Time:
    || Transfer(t)
    || 1. alice[t+1] = alice[t]
       2. bob[t+1] = bob[t]

(This is also why we can use infinite behaviors to model a finite algorithm. If the algorithm completes at t=21, t=22,23,24... are all stutter steps.)

There's enough moving parts here that I'd want to break it into subpredicates.

Init =
  1. alice[0] = 10
  2. bob[0]   = 10

Stutter(t) =
  1. alice[t+1] = alice[t]
  2. bob[t+1] = bob[t]

Next(t) = Transfer(t) // foreshadowing

Spec =
  1. Init
  2. all t in Time:
    Next(t) || Stutter(t)

Now finally, how do we represent the property NoOverdrafts? It's an invariant that has to be true at all times. So we do the same thing we did in Spec, write a predicate over all times.

property NoOverdrafts =
  all t in Time:
    alice[t] >= 0 && bob[t] >= 0

We can even say that Spec => NoOverdrafts, ie if a behavior is valid under Spec, it satisfies NoOverdrafts.

One of the exercises

Modify the Next so that Bob can send Alice transfers, too. Don't try to be too clever, just do this in the most direct way possible.

Bonus: can Alice and Bob transfer to each other in the same step?

Solution [in back of book]: We can rename Transfer(t) to TransferAliceToBob(t), write the converse as a new predicate, and then add it to next. Like this

TransferBobToAlice(t: Time) =
  some value in 1..=bob[t]:
    1. alice[t+1] = alice[t] - value
    2. bob[t+1] = bob[t] + value

Next(t) =
  || TransferAliceToBob(t)
  || TransferBobToAlice(t)

Now, can Alice and Bob transfer to each other in the same step? No. Let's say they both start with 10 dollars and each try to transfer five dollars to each other. By TransferAliceToBob we have:

1. alice[1] = alice[0] - 5 = 5
2. bob[1] = bob[0] + 5 = 15

And by TransferBobToAlice, we have:

1. bob[1] = bob[0] - 5 = 5
2. alice[1] = alice[0] + 5 = 15

So now we have alice[1] = 5 && alice[1] = 15, which is always false.

Temporal Logic

This is good and all, but in practice, there's two downsides to treating time as a set we can quantify over:

  1. It's cumbersome. We have to write var[t] and var[t+1] all over the place.
  2. It's too powerful. We can write expressions like alice[t^2-5] = alice[t] + t.

Problem (2) might seem like a good thing; isn't the whole point of logic to be expressive? But we have a long-term goal in mind: getting a computer to check our formal specification. We need to limit the expressivity of our model so that we can make it checkable.

In practice, this will mean making time implicit to our model, instead of explicitly quantifying over it.

The first thing we need to do is limit how we can use time. At a given point in time, all we can look at is the current value of a variable (var[t]) and the next value (var[t+1]). No var[t+16] or var[t-1] or anything else complicated.

And it turns out we've already seen a mathematical convention for expressing this: priming!2 For a given time t, we can define var to mean var[t] and var' to mean var[t+1]. Then Transfer(t) becomes

Transfer =
  some value in 1..=alice:
    1. alice' = alice
    2. bob' = bob

Next we have the construct all t in Time: P(t) in both Spec and NoOverdrafts. In other words, "P is always true". So we can add always as a new term. Logicians conventionally use □ or [] to mean the same thing.3

property NoOverdrafts =
  always (alice >= 0 && bob[t] >= 0)
  // or [](alice >= 0 && bob[t] >= 0)

Spec =
  Init && always (Next || Stutter)

Now time is almost completely implicit in our spec, with just one exception: Init has alice[0] and bob[0]. We just need one more convention: if a variable is referenced outside of the scope of a temporal operator, it means var[0]. Since Init is outside of the [], it becomes

Init =
  1. alice = 10
  2. bob = 10

And with that, we've removed Time as an explicit value in our model.

The addition of primes and always makes this a temporal logic, one that can model how things change over time. And that makes it ideal for modeling software systems.

Modeling with TLA+

One of the most popular specification languages for modeling these kinds of concurrent systems is TLA+. TLA+ was invented by the Turing award-winner Leslie Lamport, who also invented a wide variety of concurrency algorithms and LaTeX. Here's our current spec in TLA+:

---- MODULE transfers ----
EXTENDS Integers

VARIABLES alice, bob
vars == <>

Init ==
  alice = 10 
  /\ bob = 10

AliceToBob ==
  \E amnt \in 1..alice:
    alice' = alice - amnt
    /\ bob' = bob + amnt

BobToAlice ==
  \E amnt \in 1..bob:
    alice' = alice + amnt
    /\ bob' = bob - amnt

Next ==
  AliceToBob
  \/ BobToAlice

Spec == Init /\ [][Next]_vars

NoOverdrafts ==
  [](alice >= 0 /\ bob >= 0)

====

TLA+ uses ASCII versions of mathematicians notation: /\/\/ for &&/||, \A \E for all/some, etc. The only thing that's "unusual" (besides == for definition) is the [][Next]_vars bit. That's TLA+ notation for [](Next || Stutter): Next or Stutter always happens.


The rest of the chapter goes on to explain model checking, PlusCal (for modeling nonatomic transactions without needing to explain the exotic TLA+ function syntax), and liveness properties. But this is the intuition behind the "temporal logic of actions": temporal operators are operations on the set of points of time, and we restrict what we can do with those operators to make reasoning about the specification feasible.

Honestly I like it enough that I'm thinking of redesigning my TLA+ workshop to start with this explanation. Then again, maybe it only seems good to me because I already know TLA+. Please let me know what you think about it!

Anyway, the new version of the chapter will be in v0.5, which should be out mid-November.


Blog Rec

This one it's really dear to me: Metadata, by Murat Demirbas. When I was first trying to learn TLA+ back in 2016, his post on using TLA+ in a distributed systems class was one of, like... three public posts on TLA+. I must have spent hours rereading that post and puzzling out this weird language I stumbled into. Later I emailed Murat with some questions and he was super nice in answering them. Don't think I would have ever grokked TLA+ without him.

In addition to TLA+ content, a lot of the blog is also breakdowns of papers he read— like the morning paper, except with a focus on distributed systems (and still active). If you're interested in learning more about the science of distributed systems, he has an excellent page on foundational distributed systems papers. But definitely check out his his deep readings, too!


  1. In the book this is presented as an exercise (with the solution in back). The exercise also clarifies that since Time = Nat, all behaviors have an infinite number of steps. ↩

  2. Priming is introduced in the chapter on decision tables, and again in the chapter on database invariants. x' is "the next value of x", so you can use it to express database invariants like "jobs only move from ready to started or aborted." ↩

  3. I'm still vacillating on whether I want a "beyond logic" appendix that covers higher order logic, constructive logic, and modal logic (which is what we're sneakily doing right now!)

    While I'm here, this explanation of always as all t in Time isn't 100% accurate, since it doesn't explain why things like [](P => []Q) or <>[]P make sense. But it's accurate in most cases and is a great intuition pump. ↩

If you're reading this on the web, you can subscribe here. Updates are once a week. My main website is here.

My new book, Logic for Programmers, is now in early access! Get it here.

Don't miss what's next. Subscribe to Computer Things:
Join the discussion:
Dylan Sprague
Oct. 22, 2024, evening

I like this method of explanation; I think it helps with getting an intuition for how time is implicit in TLA+. Quantifying over t: Time seems like it makes sense once you've introduced quantifiers, and then you can motivate priming as a straightforward followup from that.

Possible typo in the code block after introducing priming; right now it says Transfer = some value in 1..=alice: 1. alice' = alice 2. bob' = bob Should that be 1. alice' = alice + value 2. bob' = bob - value ?

Reply Report
Dylan Sprague
Oct. 22, 2024, evening

Argh, formatting.

Original code block:

Transfer = some value in 1..=alice: 1. alice' = alice 2. bob' = bob

Possible correction:

Transfer = some value in 1..=alice: 1. alice' = alice - value 2. bob' = bob + value

Reply Report
Dylan Sprague
Oct. 22, 2024, evening

...apparently multi-line code blocks don't work in Buttondown's comment formatting. Ah well.

Reply Report
Yawar Raza
Oct. 22, 2024, evening

Aww, I was hoping you were going to use NoOverdrafts to then show how the spec could instead be written by including NoOverdrafts in the spec itself to obviate the upper bound in the transfer spec itself, i.e. some value in 1..=alice[t] can then turn into just some value in 1...

Reply Report
Brent
Oct. 25, 2024, evening

First thought: we could do that, but that puts the cart before the horse! The goal of writing the spec is to make sure that a higher-level property is guaranteed by a lower-level specification.

Second thought: I wonder if any safety property can be turned into a deadlock property by doing exactly what you said, and incorporating the property as part of the spec.

Reply Report
This email brought to you by Buttondown, the easiest way to start and grow your newsletter.