A Short Treatise on Bugs
First a term: by "treatise", I'm not saying this newsletter is comprehensive, persuasive, or even correct. I'm using it to mean a very specific type of writing, one that presents an idea and its consequences without trying to convince people of that idea. I don't even know if I'm convinced by the idea. I just want to see if I can communicate a specific idea, and if people understand it but reject it, I'm happy.[1]
Consider we have a rectangle object:
class Rect:
    def __init__(self, l, w):
        self.l = l
        self.w = w

    @property
    def l(self):
        return self._l

    @l.setter
    def l(self, _l):
        self._l = _l

    @property
    def w(self):
        return self._w

    @w.setter
    def w(self, _w):
        self._w = _w
And a function that uses rectangles and calls an API with the length and width:
def complicated_api_call(r: Rect):
    api.one(r.l, r.w, more_params)
    api.two(r.l, r.w, ...)
    api.tre(r.l, r.w, ...)
Now imagine we have a lot of these functions, so that Rect is used everywhere. So far so good. Now, after all that, we find out we got the API totally wrong. While we thought the parameter order was length-width, the actual order is width-length. That's a bug, normally one we'd fix by changing the arguments in all of our API calls. Or we could do this instead:
@l.setter
def l(self, _l):
-    self._l = _l
+    self._w = _l

@w.setter
def w(self, _w):
-    self._w = _w
+    self._l = _w
Now when we try to set the length we set the width instead. We've fixed the bug without having to hunt down all the API callsites! Yay!
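To check that the "fix" does what we want, trace the patched setters (a quick sketch, assuming the diff above has been applied):

r = Rect(3, 4)               # length 3, width 4
# __init__ runs self.l = 3, which now stores 3 into _w,
# then self.w = 4, which stores 4 into _l.
assert (r.l, r.w) == (4, 3)  # the getters now return width-then-length
# So complicated_api_call(r) passes 4, 3 -- width first, which is
# the order the API actually expects.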
This "fix" breaks at least two other things:
- Any API calls that use length-then-width are now broken.
- Directly
get
ing the value ofl
andw
no longer works.
In most codebases, that would be enough to make this not work. But we can imagine codebases where people don't do either yet. For example, it could be that we're creating Rect to immediately pass into api and don't really use it besides that. In that case, we'd see the original bug get "fixed" with no new wrong behavior introduced. Is the "fix" still a bug?
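For instance, the entire (hypothetical) lifecycle of a Rect might be a single line:

complicated_api_call(Rect(3, 4))  # construct, pass to the API, discard

No code outside the api calls ever reads l or w directly, so nothing can observe that the setters are crossed.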
Intuitively, I want to say yes, it is, because why on earth does the l setter modify w? But I also think of bugs as "the system does not do the right thing", and here the system does do the right thing. None of the extant observable behaviors look wrong, because we aren't using Rect in any cases where it would be wrong.
So I need to expand my mental definition of "bug". I know this is a bug but can't describe why it's a bug, except for post-hoc rationalizations. I want to understand my thinking better. Some ideas I had:
It's not a bug
While it looks like a bug, it isn't a bug. After all, the system is behaving as expected.
This is a pretty silly argument in this case, but remember, this is only a minimal example. If all of the api calls were made by legacy clients I don't control, then introducing the "fix" might be the least-bad option. Or it could be that properly fixing the bug is a lot harder than patching it over.
It's not a bug (but something else)
It's not a bug, but some other Bad Thing in the codebase. Like how code smells and antipatterns aren't bugs but still bad. Maybe "tech debt"?
I don't like this solution but I'm not sure why. "Tech debt" feels too mild a term here. It so obviously screams "bug" that I want it to be one.
It confuses people
Anybody who looks at this is going to think it's wrong. This adds a knowledge burden to new developers, who have to master the weird codebase convention of "use length when you mean width".
This answer means that bugs aren't just about program behavior. A bug can be something that confuses the reader. This is a very "software as artifact" view where we see the codebase as an entity distinct from its purpose. The bug is then on the entity, not the purpose.
Running with this idea:
- Is any source of confusion a bug? Probably not. Lots of confusion comes from inherently-complex domains. Maybe different kinds of confusion?
- Are language footguns "bugs"? Thinking of how in JavaScript, ['1', '7', '11'].map(parseInt) == [1, NaN, 3], because map passes each element's index as parseInt's radix argument.
It's fragile
Programs change. While the bug doesn't affect anything right now, it's likely to trigger if we add new features. It's so obviously risky we should just fix it properly.
The bug, then, is "latent". It doesn't affect the current behavior of the system but spreads to corrupt future behaviors. As part of my formal methods work I've run into a lot of systems with race conditions that don't cause problems until, years later, someone tweaks the code a little.
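As a toy illustration of that kind of latency (my own example, not from any of those systems):

import threading

counter = 0

def bump(n):
    global counter
    for _ in range(n):
        tmp = counter      # read...
        counter = tmp + 1  # ...modify-write: not atomic

# Called from a single thread, this is always correct -- the race is
# latent. Tweak the code to add a second thread and updates vanish:
threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # frequently less than 200000: lost updates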
One variation of this: the bug isn't breaking anything we're observing but causing issues in a place we're not observing. This is more likely when the program is embedded in a larger open system, like a client API.
It's not OSHA-compliant
Combining the previous two ideas. A new developer signs on, sees this awful code, writes a "fix", and breaks production. It is code that guides reasonable people into doing something dangerous. In fact, it motivates them to do that.
Unstable Equilibrium
One running theme I see in all these explanations is that the code is "unstable". It fixes the problem right now, but there are too many ways for it to backfire later. Our engineering intuition is screaming at us that this is so obviously broken in principle even if it's not broken (yet) in practice. We have experience with how systems evolve, and we're seeing all of the futures in which everything catches fire.
Is anything that triggers that intuition a "bug"? I don't think so. Lots of choices lead to future fires, but we file those under low-quality code or inadequate tooling or stuff like that. Maybe it's the risk level? Most "risky" code sets things on fire in some futures; this one sets things on fire in almost all futures.
At the same time, fixing the bug is also risky. If you switch things back you have to hunt down every single Rect callsite, and if you miss a couple you're gonna have very weird results. Once you're done, you're done. But it's weird that "fixing the bug" means, in the short-term, potentially shifting the system from one with no observable errors to one with many observable errors.
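For concreteness, "switching things back" means reverting the setters and then making this edit at every callsite (same stand-in api and more_params as the earlier pseudocode); every site you miss is one of those weird results:

def complicated_api_call(r: Rect):
    api.one(r.w, r.l, more_params)  # was api.one(r.l, r.w, ...)
    api.two(r.w, r.l, ...)
    api.tre(r.w, r.l, ...)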
...
I think I'm now even more confused than I was before. And most bugs aren't going to be like this, so is this even worth exploring? I dunno. My gut is telling me yes, though. We don't have much of a discipline of bugs. There's a lot of academic research and tooling, but you can get like 90% of the mainstream software theory by reading Julia Evans' article. So the act of exploring weird edge cases seems a good idea, because it gets us closer to further developing the bug discipline.
[1] This is as much me trying to convey the idea of a "treatise" as it is talking about bugs.