Users are Nondeterministic Agents of Chaos
New Post: Clever vs Insightful Code
I shared an early version with the newsletter some time back, and now the final version is done! This one is less technical than usual, just me trying to articulate a difference between different classes of "cleverness" in programming. I think lumping them all under the same brush does us a disservice, and I figure calling out the differences can lead other people to do more interesting stuff with the division. Enjoy the blog!
New Projects Page
Putting what I consider my best work in one place. Right now contains my formal methods teaching and research projects, like the Crossover project. That way people can find it without having to archive dive my blog.
Users are Nondeterministic Agents of Chaos
Work on Predicate Logic for Programmers continues. I'm tentatively considering three "main areas" of application:
- Specifying functions
- Specifying requirements
- Database constraints
(2) is really interesting. What makes specifying requirements different from specifying functions? One obvious difference is scope: you can't write a total specification for requirements the same way you can for functions, you can only do partial specifications. A subtler—but more impactful—difference is that requirements are much more about user interaction than functions are. We don't just care that there's one function that's correct, but that there's a range of things the user can do, and in aggregate they are all correct.
And this is really interesting to me! A specification state is nondeterministic if there are multiple possible next states; the behavior of the system can go in one of N ways. Normally I think of nondeterminism in three very rough categories:
- Probabilistic Nondeterminism: when something has flat-out random chance involved. Random number generators are the most obvious source of this.1
- Abstract Nondeterminisim: when a system is technically deterministic but the determinism is so complex that it's easier to treat it as nondeterministic in a spec. Like if we have workers writing messages to a queue, you can model all of the factors that could lead to a message being dropped, or you can say it's nondeterministic: "the writer will either write a message or drop a message."
- Concurrency: multiple agents might locally be deterministic, but they can run in any order. You might have multiple workers writing to the queue, so even if they're guaranteed to successfully write, then the ordering of writes is nondeterministic.
User interactions are a fourth kind of nondeterminism: one agent in the system is continually making decisions that affect the state of the system. From a specification perspective, this looks the most like abstract nondeterminism. But unlike AN, users are essentially nondeterministic. AN goes away as we make the system more detailed, but users always have free will and can do whatever they want. Further, user actions are innately concurrent and asynchronous, because they live in the real world. Most long-lived systems need to assume that both a lot of different people are using them, and that many people are using them over long lengths of time.
Simple example: I buy a plane ticket. The next week, I upgrade to first-class, and the week after that, I cancel. I'm making nondeterministic decisions across a span of two weeks, meaning lots of other concurrent agents (including other users) are going to be doing stuff in that time. And those are going to affect the actions I choose to take. Maybe I can get a full refund and in the intervening time seats have gotten cheaper, so I save money if I cancel and buy a new seat.
Anyway, this makes modeling user interactions a potential use case for formal methods since nondeterminism is, as a rule, horrible. The logic for the example up there could look something like:
Types User, Seat
// two people can't share a seat
Property
all s in Seat:
at-most-one u in User:
u has-seat s
BuySeat(u, s):
1. u can-afford s
2. no u2 in User: u2 has-seat s
3. u' has-seat s
4. ChargeUser(u, s)
ChangeSeat(u, s1, s2):
1. 1. u has-seat s1
2. u' !has-seat s1
2. BuySeat(u, s2)
// this charges them for both seats
// too lazy to fix for newsletter
UpgradeSeat(u, s):
1. some old_seat in seat:
1. u has-seat old_seat
2. s is-nicer-than old_seat
2. ChangeSeat(u, old_seat, s)
RefundSeat(u, s):
1. u has-seat s
2. !RefundPeriodExpired(s)
3. u' !has-seat s
(Yes, those are pretty terrible requirements, but this is a newsletter, so I'm allowed to do things all shoddy. Just trying to get ideas across!)
As you can tell, I'm still playing around a lot with the most useful syntax. Feedback on last week's post indicated I shouldn't use |
for separating quantifiers from expressions, because people thought it looked too much like ||
. Also, I don't yet know the best way to express temporality in predicate logic that'd be 1) useful, 2) understandable, and 3) kinda sorta rigorous. TLA+ style is too complex, dynamic logic is too restricted at this level, with everything else I'd be worried about frame problems. Right now I'm leaning towards some kind of state machine formalism, possibly DASH, and the more I think about that the more I like it.2
Back to "users are nondeterministic". Are they nondeterministic in certain ways? For abstract nondeterminism, we say one of N actions happens, but we often expect something to be much more common than other things. Going back to the worker and message queue, even though a message can nondeterministically be dropped, we expect that only a tiny minority of messages will be. So we can reasonably include that as an assumption in our model. I'd guess that users are totally arbitrary, though, since each person has their own reasons for doing things. "We violate this property if a message is dropped at the wrong time" might be an acceptable cost, since it will happen so rarely. But "We violate a system priority if the user does these exact five things in a row" is unacceptable, since that will happen all the time.
(To be clear, this isn't the user being dumb or anything. It's more that as the software engineers, we have a narrow view of how people will want to use our software. A thousand users will be much more creative than any team of programmers could ever be.)
I guess this is already obvious to most of you? Of course users are going to be breaking everything. The idea I'm trying to play with is that we can specify a model of a user as essentially taking random, arbitrary actions. Even if every individual person is following a specific path, the aggregate over all of them is equivalent to total nondeterminism, at least if we're doing worst-case modeling.
One last consequence of this: outside the formal methods world, showing "system matches requirements" is often called acceptance testing. Here's an example of one such tool, Cucumber (ref):
Scenario: Customer has a broker policy so DOB is requested
Given I have a "Broker" policy
When I submit my policy number
Then I should be asked for my date of birth
Scenario: Customer has a tied agent policy so last name is requested
Given I have a "TiedAgent" policy
When I submit my policy number
Then I should be asked for my last name
These are each a single user action. There's no sense of a user doing multiple actions in a nondeterministic fashion, which limits how much the acceptance tests can actually verify the requirements. To properly acceptance test you'd need to have the user take many actions in sequence, possibly with concurrent interrupts, and not just in orders you'd expect. That'd be some form of model-based testing, I think.
-
Yes, most RNGs are pseudorandom, so technically this isn't nondeterministic. However, pseudorandomness is an unfortunate implementation detail: in most cases, we use RNGs because we genuinely want things to be as close to random as possible. ↩
-
I hash out a surprising number of ideas on this newsletter. It's like having thousands of rubber ducks! ↩
If you're reading this on the web, you can subscribe here. Updates are once a week. My main website is here.
My new book, Logic for Programmers, is now in early access! Get it here.