Scaffolding TLA+

concept

                May 20, 2021

            Scaffolding TLA+

            I'm in the process of updating my TLA+ workshop for my class next week.¹ Every time I run it I get new ideas on what to improve. After April's class, one of the most important changes I noticed was also one of the smallest. In TLA+, there's two ways to write "not-equals": # and /=. I'm used to writing /= and find # really weird, so I teach /=. After April, I mass changed every /= to #.
Why? Because people kept writing \= by mistake.
…
Most people approach formal methods like they're learning a new programming language. This is understandable, but also a mistake. Specifications serve a different role and have to be thought about differently than implementation details. When teaching TLA+, I'm simultaneously covering three topics:

The concept of specifications and modeling. Things like predicates/properties, facts/axioms, first-order logic, state and behavior spaces.
TLA+'s conceptual model. Actions, stutter-invariance, error traces, adversarial nondeterminism.
The syntax of TLA+. How to write things.

The only way to really build a mental model is through regular practice and feedback. You try things out, make mistakes, and learn from them. This is necessary but also slow and painful, so much of teaching is finding ways to speed this up. Creating specific controlled environments where mistakes are obvious and the feedback is both quick and comprehensive. Anything that slows this down is something I want out. One of the biggest impedances is overload: if you're faced with three new concepts at once, you can't easily get feedback on just one of them. You have no idea where your mistakes are and which things you are misunderstanding. It takes less time to learn five things sequentially than three things simultaneously. The practice of designing the learning environment so students only have to learn one idea at a time is called scaffolding.
TLA+ is notoriously hard to learn. This is mostly attributed to the concept of specifying, but a huge part of the problem is how hard it is to scaffold TLA+. This is because topic (3), TLA+'s syntax, so unfriendly for beginners. Much of it is intuitive to experienced practitioners but gets in the way of initial learning. Students spend so much time wrestling with the syntax that could be better spent on the core concepts.
A small example: a student has a set of servers that can be online or offline. As part of an action they want to add online server to the set of active servers. Here's what they write:
active' == active /union CHOOSE node in Servers: IsOnline(node)

There are seven separate issues in this one liner. Here's five of them:

It's active' =, not active ==.
It's \union, not /union.
It's node \in Servers, not node in Servers.
\union takes two sets, while we're unioning a set and an individual element here. You have to wrap the CHOOSE statement in brackets.
CHOOSE is deterministic. For a given set S and predicate P, CHOOSE n \in S: P(n) will always choose the same value. This means you've collapsed a nondeterministic choice into a deterministic one, so you're not actually specifying what you wanted to.

Of those problems, only the fifth one is an issue with their mental model. That's the thing I actually need to get them to understand. Everything else is just noise. But it's noise that gets in the way of them learning how to model, noise they have to spend lots of time and mental energy learning to fix. It is incredibly discouraging to beginners. Why even model a real-world system if you're going to spend 10x as much time hunting down syntax errors?
This would be a little bit more tolerable if the parser gave clearer error messages, but it doesn't. I dropped that code sample into the IDE and here's the only error I got:
***Parse Error***
Was expecting "Expression or Instance"
Encountered "Beginning of definition" and token "=="

Residual stack trace follows:
[snip]

It's breaking on active' ==, which isn't intuitively obvious from the error. Not exactly helpful for beginners.
I have an incredible amount of respect for the people who make the TLA+ tooling. However, they're a very small group of people and Microsoft seems uninterested in funding one of their most revolutionary technologies. So compiler errors, which primarily affect beginners over experts, have always been a low priority. That means it falls to me as a teacher to carefully design my lessons to keep all these sharp edges away from my users. Sharp edges like "What if they write \= instead of /=?"
***Parse Error***
Was expecting "==== or more Module body"
Encountered "1" at line 12, column 21 and token "="

So I'm switching to using # instead. 
Edge Smoothing
Fixing the /= problem is easy. A harder one is set comprehensions: TLA+ has syntax for mapping sets and filtering them, and they're incredibly easy to confuse:
{F(x): x \in set} \* map
{x \in set: F(x)} \* filter

I'll be introducing these to my students and then telling them to use two helper operators I wrote:
Map(Op(_), set) == {Op(x): x \in set}
Filter(Op(_), set) == {x \in set: Op(x)}

Instead of writing {x*2: x \in 1..3}, they can write Map(LAMBDA x: x*2, 1..3). Much less confusing.
Unlike the previous fix, though, this substitution is worse in the long run. You can't use the full map/filter syntax, like multiset mapping, and it's a convention you will never see in the wild, so it makes reading real-world specs harder. Then why give students these substitutions?
Again, it's about scaffolding. Normally students have to learn map/filter syntax along with everything else. Giving them a substitute takes that off the table so they can focus on the more important stuff. By the time my substitute switches from help to hindrance, they'll already know all of the other stuff. They can incorporate the "proper" syntax into an existing mental model vs learning it while building a new mental model.
This is a subtle lesson in teaching: time isn't fungible. The time beginners spend learning is far more fragile and precious than the time practitioners spend learning. It's often worth teaching simpler misconceptions to beginners if it saves time in getting them to the practitioner level. Of course I need to be clear when I'm doing this. My students are adults, not children, and deserve to know when I intentionally teach simplifications. 
All of this seems incredibly obvious to me, but that's just because I'm a professional teacher. It's hard to imagine that many people don't already know this, but I have to remind myself that most people don't need to know this and may not have been put in situations where it's useful for them to think about. Expert blindness strikes again.

If you're unfamiliar with TLA+, welcome to the newsletter! TLA+ is a tool for finding bugs in abstract models of software systems. Here's an post on finding bugs in some Go code and here's a talk on finding bugs in an ETL worker queue. ↩

            If you're reading this on the web, you can subscribe here. Updates are once a week. My main website is here.
My new book, Logic for Programmers, is now in early access! Get it here.

Don't miss what's next. Subscribe to Computer Things:

Start the conversation: