Refactoring Invariants

should

                September 24, 2024

            Refactoring Invariants

                Porting a formal methods idea to everyday coding

            (Feeling a little sick so this one will be short.)
I'm often asked by clients to review their (usually TLA+) formal specifications. These specs are generally slower and more convoluted than an expert would write. I want to fix them up without changing the overall behavior of the spec or introducing subtle bugs.
To do this, I use a rather lovely feature of TLA+. Say I see a 100-line Foo action that I think I can refactor down to 20 lines. I'll first write a refactored version as a separate action NewFoo, then I run the model checker with the property
RefactorProp ==
    [][Foo <=> NewFoo]_vars

That's an intimidating nest of symbols but all it's saying is that every Foo step must also be a NewFoo step. If the refactor ever does something different from the original action, the model-checker will report the exact behavior and transition it fails for. Conversely, if the model checker passes, I can safely assume they have identical behaviors.
This is a refactoring invariant:¹ the old and new versions of functions have identical behavior. Refactoring invariants are superbly useful in formal specification. Software devs spend enough time refactoring that they'd be useful for coding, too.
Alas, refactoring invariants are a little harder to express in code. In TLA+ we're working with bounded state spaces, so the model checker can check the invariant for every possible state. Even a simple program can have an unbounded state space via an infinite number of possible function inputs. 
(Also formal specifications are "pure" simulations while programs have side effects.)
The "normal" way to verify a program refactoring is to start out with a huge suite of oracle tests. This should catch a bad refactor via failing tests. The downside is that you might not have the test suite in the first place, or not one that covers your particular refactoring. Second, even if the test suite does, it only indirectly tests the invariant. It catches the refactoring error as a consequence of testing other stuff. What if we want to directly test the refactoring invariant?
Two ways of doing this
One: by pulling in formal methods. Ray Myers has a neat video on formally proving a refactoring is correct. That one's in the niche language ACL2, but he's also got one on refactoring C. You might not even to prove the refactoring correct, you could probably get away with using an SMT solver to find counterexamples.
Two: by using property-based testing. Generate random inputs, pass them to both functions, and check that the outputs are identical. Using the python Hypothesis library:
from hypothesis import given
import hypothesis.strategies as st

# from the `gilded rose kata`
def update_quality(list[Item]):
    ...

def update_quality_new(list[Item]):
    ...

@given(st.lists(st.builds(Item)))
def test_refactoring(l):
    assert update_quality(l) == update_quality_new(l)

One tricky bit is if the function is part of a long call chain A -> B -> C, and you want to test that refactoring C' doesn't change the behavior of A. You'd have to add a B' that uses C' and then an A' that uses B'. Maybe you could instead create a branch, commit the change the C' in that branch, and then run a cross-branch test against each branch's A.
Impure functions are harder. The test now makes some side effect twice, which could spuriously break the refactoring invariant. You could instead test the changes are the same, or try to get the functions to effect different entities and then compare the updates of each entity. There's no general solution here though, and there might be No Good Way for a particular effectful refactoring.
Behavior-changing rewrites
We can apply similar ideas for rewrites that change behavior. Say we have an API, and v1 returns a list of user names while v2 returns a {version, userids} dict. Then we can find some transformation of v2 into v1, and run the refactoring invariant on that:
def v2_to_v1(v2_resp):
    return [User(id).name for user in v2_resp["userids"]]

@given(some_query_generator)
def test_refactoring(q):
    assert v1(q) == v2_to_v1(v2(q))

Fun fact: v2_to_v1 is a software homomorphism!

Well technically it's an action property since it's on the transitions of states, not the states, but "refactor invariant" gets the idea across better. ↩

            If you're reading this on the web, you can subscribe here. Updates are once a week. My main website is here.
My new book, Logic for Programmers, is now in early access! Get it here.

Don't miss what's next. Subscribe to Computer Things:

Start the conversation: