Software Engineers Don't Have Disciplines
I know I said no newsletter this week, between the Alloy workshop and my panicked rush to finish The Crossover Project, but I found this old abandoned essay and thought it was perfect for the newsletter. Enjoy!
I want you to do a quick exercise. First, here are two sorting algorithms. Which of these is faster in the average case? By how much?
def bubblesort(l):
    done = False
    out = l[:]
    while not done:
        done = True
        for i in range(1, len(out)):
            if out[i] < out[i-1]:
                done = False
                out[i], out[i-1] = out[i-1], out[i]
    return out

def quicksort(l):
    if len(l) <= 1:
        return l
    pivot = l[0]
    lt = [x for x in l if x < pivot]
    eq = [x for x in l if x == pivot]
    gt = [x for x in l if x > pivot]
    return quicksort(lt) + eq + quicksort(gt)
In the average case, the first is O(n^2) and the latter is O(n log n). Not only can we say quicksort is faster, but we can say by how much it's faster. We can also say that in the worst case quicksort is also O(n^2), and we can say what conditions cause that worst case.
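As a rough illustration of that worst case, here is a minimal sketch that instruments the quicksort above to count how many elements it scans while partitioning. The name `quicksort_work` and the choice to count one scan per call are assumptions made for this example, not part of the original code. Because the pivot is always the first element, an already-sorted list with distinct values leaves `lt` empty on every call, so the recursion peels off only one element at a time.

```python
import random

def quicksort_work(l):
    """Return (sorted copy, number of elements scanned while partitioning).

    Hypothetical instrumented variant of quicksort() above.
    """
    if len(l) <= 1:
        return l, 0
    pivot = l[0]
    lt = [x for x in l if x < pivot]
    eq = [x for x in l if x == pivot]
    gt = [x for x in l if x > pivot]
    sorted_lt, work_lt = quicksort_work(lt)
    sorted_gt, work_gt = quicksort_work(gt)
    # Count one scan of the list per call; the three passes above
    # only change this by a constant factor.
    return sorted_lt + eq + sorted_gt, len(l) + work_lt + work_gt

n = 200
already_sorted = list(range(n))
shuffled = already_sorted[:]
random.Random(0).shuffle(shuffled)  # fixed seed so the run is repeatable

_, sorted_work = quicksort_work(already_sorted)
_, shuffled_work = quicksort_work(shuffled)

print("sorted input:  ", sorted_work)    # n + (n-1) + ... + 2 = 20099
print("shuffled input:", shuffled_work)  # far smaller, on the order of n log n
```

The sorted input also drives the recursion depth to n, so much larger lists would hit Python's default recursion limit with this implementation.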
Second part: you're writing a room reservation system for a company. People can reserve an unreserved room for a given timeslot, and can cancel their reservations. You're asked to implement two new features:
- Two people can co-reserve a room for an event, and either can cancel it.
- A person can be put on the waitlist for a room. If the current reservation is cancelled, the next person on the waitlist gets it.
Which will be more work to implement?
We can say a lot about the two algorithms, while we can say very little about the two requirements. But not only do we know very little, we can't as easily discuss it. We don't have words for it. In other words, we have a discipline of algorithms, but not a discipline of requirements.
What is a discipline?
A discipline is a shared, collected body of information around a topic. The information doesn't need to be consistent with itself, and likely won't be. The discipline covers not just what we know, but what we've thought before, what we've argued about, what we want to explore next, what our heuristics are. Here are some statements that are covered by the discipline of algorithms:
- Algorithms have a worst-case, average-case, and best-case runtime, which grows with the input. Calculating average-case is much harder than worst- or best-case.
- For small inputs, other factors can dominate (the k factor).
- Some names of techniques: divide and conquer, memoization, dynamic programming.
- Algorithms belong to classes of runtime. P = NP is an unsolved problem.
- Good books in the field are TAOCP and SICP.
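To make one of those technique names concrete, here is a small sketch of memoization using Python's standard `functools.lru_cache`. The Fibonacci functions and the `calls` counter are illustrative choices for this example only; the point is that anyone inside the discipline can name the technique and predict its effect on the number of calls.

```python
from functools import lru_cache

# Hypothetical counters, used only to show how often each version is called.
calls = {"plain": 0, "memo": 0}

def fib_plain(n):
    """Naive recursion: recomputes the same subproblems many times."""
    calls["plain"] += 1
    return n if n < 2 else fib_plain(n - 1) + fib_plain(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    """Memoized recursion: each distinct n is computed once, then cached."""
    calls["memo"] += 1
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

assert fib_plain(20) == fib_memo(20) == 6765
print(calls)  # the memoized version runs its body once per distinct n (21 times)
```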
Algorithms are a special case, since they often have a rigorous mathematical basis. But we also have disciplines that aren't mathematical, as we have shared experience to work with. We also have disciplines of testing, software architecture, design patterns, and languages. I find the best indicator of the richness of a discipline is how much terminology we have. I can intelligibly say "Integration tests are slower and flakier than unit tests, so use mocks when practicing TDD." I'm using terms with carried meanings.[1] I can also do things like get into arguments about the "testing pyramid", or identify testing antipatterns, or find the appropriate test framework in a given language. We can't do the same with requirements. I have no word for "requirement the client is probably going to forget about", or "requirement that will have a noticeable impact on performance", or "requirement that's hard but not boring". I've even gotten into arguments with people who don't think these are meaningful distinctions to make!
Disciplines aren't exclusive, and we can go as narrow or as broad as we like. We have a discipline of testing. We also have a discipline of unit testing, which is narrower, and a discipline of software verification, which is broader. Disciplines can also be more or less developed.
There are a lot of advantages to having a discipline. It means we can communicate complex ideas. It gives us a field of study with topics we can look up. If somebody says "I want to get better at testing", then somebody will probably recommend Kent Beck, PBT, or Clean Code. Then they will all argue with each other. The discipline doesn't have to commit to a position, it just needs to aggregate the information.
There are also drawbacks. Disciplines lead to jargon and raise the barrier to entry.
Why don't we have certain disciplines?
One answer is "because we don't care". This doesn't work, because there are things we absolutely care about that we don't have disciplines for (package management, debugging). There are also specialist domains most of us don't care that much about that have very rich disciplines (most of academia). So it's not just "what we care about", or even "what we find useful".
I believe in a sort of "great man" theory here. It takes more "activation energy" to establish a new discipline than to expand and refine an existing one. Once Design Patterns came out, people quickly started remixing the ideas and applying them to other fields in software. But that was only after someone else established a discipline of patterns. Making the first move is the hard part. You also have to be lucky, and your establishing document has to catch on. It's a very random process.
There's an alternate universe where Martin Fowler became a doctor and Data and Reality hit mainstream. We'd lack a discipline of refactoring, but our data modeling would be amazing.
How can we get them?
Data and Reality was a monumental achievement, a master creating a discipline from first principles. Very, very few disciplines come out this way. You don't need insight or ingenuity to make a discipline. Rather, it's about having the patience to do the painstaking work of aggregating and curating. The material is already out there. It's in people's individual tricks, their heuristics, the pitfalls they know to work around. It's not a discipline because nobody put all of these in one place and drew the links between different ideas.
The authors of Design Patterns are clear that they aren't coming up with a new theory. They're just documenting existing patterns. People were already writing Factories and Adapters and preferring composition to inheritance. The book's contribution was the four authors talking to many people and seeing the commonalities. They wrote down the oral knowledge.
Aggregation is itself a skill. You have to make the information into something. You've probably seen the Awesome Foobar fad, like Awesome ML or Awesome Cold Showers. These lists just collect, and don't discuss. They treat all of the links as separate pieces of writing instead of connecting them together. By contrast, Julia Evans talked to many people and wrote What does debugging a program look like? She extracts the common techniques people use in debugging, includes anecdotes, and shares resources. Every one of those techniques becomes a launching point to further develop the discipline.
[1] In some cases this leads to confusion. Nobody can really agree on the boundary between integration and unit tests. But testers have a conception of it.
My new book, Logic for Programmers, is now in early access! Get it here.