I have complicated feelings about TDD
They're not all good and not all bad
There were a couple of threads this week on why Test Driven Development, or TDD, isn't more widely used by programmers.
https://twitter.com/GeePawHill/status/1556825728326553600
That thread (it's a good one) argues that the problem was an organizational failure by TDD proponents, pushing too hard and letting memetic decay transmute "TDD" into "tests r gud". I have a different explanation: TDD isn't as valuable as its strongest proponents believe. Most of them are basing TDD's value on their experience, so I'll base it on mine.
Let's start with my background: I'd consider myself a "TDD person". I learned it in 2012, it helped me get my first software job, and my first two jobs were places that did strict TDD in Ruby. For a while all my personal projects followed strict TDD, and if I ever went crazy and did a tech startup, I'd use TDD to write the software. I defended it back in 2018 and would defend it now.
The difference is that I treat it as a useful technique, one of many, while the very strongest advocates consider it transformational. Some claim it's as important to programming as washing your hands is to medicine. Others think my experience with formal methods irrevocably stains my TDD credentials.
Why the difference? Because we mean two different things. I practice "weak TDD", which just means "writing tests before code, in short feedback cycles". This is sometimes derogatively referred to as "test-first". Strong TDD follows a much stricter "red-green-refactor" cycle:
- Write a minimal failing test.
- Write the minimum code possible to pass the test.
- Refactor everything without introducing new behavior.
The emphasis is on minimality. In its purest form we have Kent Beck's test && commit || reset (TCR): if the minimal code doesn't pass, erase all changes and start over.
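As a rough sketch of one TCR step, assuming the test run, the commit, and the reset are passed in as plain callables (the names `tcr`, `run_tests`, `commit`, and `reset` are made up for illustration and don't come from any tool):

```python
from typing import Callable


def tcr(run_tests: Callable[[], bool],
        commit: Callable[[], None],
        reset: Callable[[], None]) -> bool:
    """One 'test && commit || reset' step (hypothetical helper).

    Returns True if the change was kept, False if it was thrown away.
    """
    if run_tests():
        commit()   # tests passed: keep the change
        return True
    reset()        # tests failed: erase the change and start over
    return False


# Stand-in callables that just record what happened.
log = []
tcr(lambda: True, lambda: log.append("commit"), lambda: log.append("reset"))
tcr(lambda: False, lambda: log.append("commit"), lambda: log.append("reset"))
print(log)  # ['commit', 'reset']
```

In a real repository the callables would presumably wrap a test command, `git commit -am`, and `git reset --hard`; that wiring is left out here.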
Further, we have different views on why to do TDD. Proponents of strong TDD often say it's not a testing technique, but rather a "design technique" that happens to use testing. I struggle with this framing for two reasons. First of all, they're using "design" in a very different way than I do: the local code organization versus the system specification. Second, many say it was always this way, when the original book explicitly calls it a testing technique. It's okay to say "it was but we've learned since then", but I get annoyed by historical revisionism.2
Regardless, it's a core tenet of modern strong TDD: TDD makes your design better. In other words, weak TDD is a technique, while strong TDD is a paradigm.
The straw maximalist
No one likes to hear they’re doing it wrong, least of all when they’re doing it wrong.
What if you tried TDD and it didn’t “work” (whatever that means), but in fact the thing you tried wasn’t TDD at all? — Against TDD
So a disclaimer: this is a newsletter piece, not a blog, so I'm trying to timebox the writing to one day.1 So I didn't do as much research to make sure I completely understand the other side. Also, for the sake of not working through every nuance, I'm going to focus on a "maximalist" model of TDD:
- TDD must be used in all but the most exceptional cases.
- The TDD cycle should be followed as strictly as possible (though TCR is unnecessary).
- Test-first isn't TDD.
- TDD always leads to a better design.
- TDD obviates other forms of design.
- TDD obviates other forms of verification.
- TDD cannot fail. If it causes problems, it's because you did it wrong.
The tradeoff between TDD and productivity is about the learning curve. Once you reach the top of that hill there's no tradeoff. If you're talking about a tradeoff, that signals where on the hill you might be.
— Jason Gorman @jasongorman@mastodon.cloud (@jasongorman) May 16, 2022
I don't believe there are many true maximalists out there, though I've met at least one.3 Most advocates are moderate in some ways and extreme in others; I'm certainly no exception! But the maximalist is a good model for what the broader TDD dialog looks like. While people pay lip service to things like "use the right tool" and "there is no silver bullet", they often express their maximal viewpoints and don't share their caveats. Maximalist thinking is spread diffusely across the discipline.
Further, the counterpart to maximal TDD is "some TDD", and knowing why maximalism breaks down helps me figure out where that "some" should be. So even if (almost) nobody is maximalist, it's worth investigating. Just know that I'm assuming a spherical cow.
Analyzing Maximalism
The maximalist case for TDD comes from two benefits: it's good for your testing and it's good for design.
Verification
Test Driven Development IS Double Entry Bookkeeping. Same discipline. Same reasoning. Same result.
— Uncle Bob Martin (@unclebobmartin) November 11, 2019
The argument here is pretty simple: under maximal TDD, every line of code written is covered by a test, which will catch more bugs. I believe this, too! More test coverage means fewer bugs.
The problem is that TDD tests are very constrained. To keep the TDD cycle fast, your tests need to be fast to write and fast to run: "hundreds of tests in a second". The only tests that fit these criteria are hand-made unit tests. This leaves out other forms of testing:
- Integration testing
- End-to-end testing
- Mutation testing
- Fuzzing
- Property testing
- Model-based testing
For unit testing to be sufficient, it needs to supersede all of these other forms of testing. And it also needs to supersede non-testing based verification techniques:
- Manual testing
- Code review
- Type systems
- Static analysis
- Contracts
- Shoving assert statements everywhere
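To make the last two items concrete, here is a minimal sketch of a contract written with plain assert statements. The `withdraw` function and its rules are hypothetical, chosen only to show a precondition and a postcondition sitting next to the code they guard:

```python
def withdraw(balance_cents: int, amount_cents: int) -> int:
    """Hypothetical example: return the balance left after a withdrawal."""
    # Preconditions: what the caller must guarantee.
    assert amount_cents > 0, "amount must be positive"
    assert amount_cents <= balance_cents, "insufficient funds"

    new_balance = balance_cents - amount_cents

    # Postcondition: what this function guarantees in return.
    assert 0 <= new_balance < balance_cents
    return new_balance


print(withdraw(1000, 250))  # 750
```

Unlike a unit test, these checks run on every call with whatever inputs the rest of the system actually produces. One caveat: Python drops assert statements when run with `-O`, so this style suits development and test runs more than production guards.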
"But nobody says unit tests are all you need!" Well consider yourself lucky, because I've run into that strain of maximalism many, many times. If you use TDD you don't have bugs, so if you have bugs you didn't use TDD right.
I don’t have to write the test for nil because I know that nil will never be passed. I don’t need to make states unrepresentable, if I know those states will never be represented. — Tests and Types
But that's impossible. Unit tests only cover units. There are no side effects, nondeterminism, or sequences of events. They only cover what the programmer thought to test, and only the specific inputs they chose. But many serious bugs are higher level, from correct components interacting in a bad way (1 2 3).4 Or they only happen with very specific inputs. Or they always happen on a nil, but there's only a specific call chain that could pass nil. You never know if states will never be represented.
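A small sketch of "correct components interacting in a bad way". Both functions are hypothetical and each one passes its own unit tests; the bug only exists in the composition, because one speaks cents and the other speaks dollars:

```python
def price_in_cents(text: str) -> int:
    """Hypothetical parser: '12.50' -> 1250 (integer cents)."""
    return round(float(text) * 100)


def qualifies_for_free_shipping(total_dollars: float) -> bool:
    """Hypothetical rule: orders of $50 or more ship free."""
    return total_dollars >= 50


# Unit tests: each component is fine in isolation.
assert price_in_cents("12.50") == 1250
assert qualifies_for_free_shipping(50.0)
assert not qualifies_for_free_shipping(49.99)

# Composition: a 75-cent order gets free shipping, and no unit test notices.
print(qualifies_for_free_shipping(price_in_cents("0.75")))  # True
```

Catching this takes something that exercises the two together (an integration test, a type that distinguishes cents from dollars, a contract on the argument), which is the kind of check the list above is about.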
"Design"
Test Driven Development (TDD) is not a testing approach.
It is a design approach. It helps you to build clean, tested and bugless code by using automated tests.
Tests are not the outputs of TDD. Tests are the inputs and the clean design and code are the outputs.
— Daniel Moka⚡ (@dmokafa) November 30, 2020
As I said before, TDD advocates use "design" in a very different way than I do, so let's start by explaining the difference.
Design, to me, is the software's specification. We have a problem we want to solve and a set of properties we want to preserve: does our system satisfy them? For example, consider a worker that pulls data from three streams, merges them together, and uploads them into a database. I want to make sure that data isn't duplicated, stream downtime is handled gracefully, all data is eventually merged, etc. I don't care what methods the code is calling to make its "API requests" or how it turns a JSON response into domain objects. I just care what it does with the data.
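One way to picture design-as-specification is to write the properties down as checks that don't care how the worker is built. This is only a sketch under heavy assumptions: records are modeled as hypothetical `(record_id, payload)` tuples, a stream that is down is modeled as an empty list, and `merge_streams` is a stand-in for the real worker.

```python
from typing import Iterable

Record = tuple[str, str]  # (record_id, payload); hypothetical shape


def merge_streams(streams: Iterable[Iterable[Record]]) -> list[Record]:
    """Stand-in worker: merge all streams, keeping the first copy of each id."""
    seen: set[str] = set()
    merged: list[Record] = []
    for stream in streams:
        for record_id, payload in stream:
            if record_id not in seen:
                seen.add(record_id)
                merged.append((record_id, payload))
    return merged


def check_design_properties(streams: list[list[Record]],
                            merged: list[Record]) -> None:
    """The specification: says what must hold, not how the code gets there."""
    ids = [record_id for record_id, _ in merged]
    # Property 1: data isn't duplicated.
    assert len(ids) == len(set(ids))
    # Property 2: every record that arrived on any stream ends up merged.
    expected = {record_id for stream in streams for record_id, _ in stream}
    assert set(ids) == expected


# The middle stream is "down" (empty); the others overlap on record "b".
streams = [[("a", "1"), ("b", "2")], [], [("b", "2"), ("c", "3")]]
merged = merge_streams(streams)
check_design_properties(streams, merged)
print(merged)  # [('a', '1'), ('b', '2'), ('c', '3')]
```

Nothing in `check_design_properties` mentions methods, handlers, or JSON parsing. It would stay the same if the worker were rewritten from scratch.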
By contrast, the "design" in TDD is how the code is organized. Is munge a public or private method? Should we split the http response handler into separate objects? What are the parameters for the check_available method? TDD advocates talk about "listening to your tests": if writing the tests is hard, then that points to a problem in your code. You should refactor the code to make it easier to test. In other words, code that is hard to test through TDD is badly organized.
TDD is a design technique. If you don't need to design, you don't need TDD. (The tests are just a nice side effect of the design process.) I'm hard pressed to imagine a system so small that you can get away with zero design, tho.
— Allen Holub allenholub.(mstdn.social,bsky.social) (@allenholub) February 14, 2021
(Again, this is the maximalist position.)
But does TDD guarantee good organization? I don't think so. We know that TDDed code looks different. Among other things:
- Dependency injection. This makes the code more configurable at the cost of making it a lot more complex.
- Lots of small functions instead of a few larger functions.
- Large surfaces of public methods instead of deep use of private methods.
Are these necessarily bad? No. Can they be bad? Yes! Sometimes large functions make for better abstractions and small functions lead to confusing behavior graphs. Sometimes dependency injection makes code a lot more complex and hard to understand. Sometimes large public APIs tighten module coupling by encouraging reuse of "implementation objects". If TDD is at odds with your organization, sometimes the TDD is wrong.
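A tiny sketch of the dependency-injection tradeoff, using a hypothetical expiry check. The names `is_expired_direct` and `is_expired_injected` are invented for the example:

```python
import time
from typing import Callable


def is_expired_direct(expires_at: float) -> bool:
    """Reads the clock itself: simple to call, awkward to unit test."""
    return time.time() >= expires_at


def is_expired_injected(expires_at: float,
                        now: Callable[[], float] = time.time) -> bool:
    """Takes the clock as a parameter: easy to unit test, one more knob."""
    return now() >= expires_at


# The injected version can be pinned to any moment without patching anything.
print(is_expired_injected(100.0, now=lambda: 99.0))   # False
print(is_expired_injected(100.0, now=lambda: 100.0))  # True
```

With one function and one dependency the extra parameter is cheap. The cost shows up when every collaborator is threaded through every constructor and call site, and a reader has to trace which implementation actually gets passed in.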
Now that's a fairly weak argument, because it applies as much to any kind of design pressure. The more specific problem with maximalism is that the code organization must evolve in extremely small steps. This leads to path dependence: the end result of the code is strongly influenced by the path you took to get there.6 Take quicksort. Following maximal TDD, here were the first seven tests I wrote:
quicksort([]) # prove it exists
assert quicksort([]) == []
assert quicksort([1]) == [1]
assert quicksort([2, 1]) == [1, 2]
assert quicksort([1, 2]) == [1, 2]
assert quicksort([1, 2, 1]) == [1, 1, 2]
assert quicksort([2, 3, 1]) == [1, 2, 3]
And here's the minimal code that passes it:
def quicksort(l):
    if not l:
        return []
    out = [l[0]]
    for i in l[1:]:
        if i <= out[0]:
            out = [i] + out
        else:
            out.append(i)
    return out
To be clear, I wasn't trying to be perverse here, this is how I used to do it when I was being strict about TDD. With more tests it will converge on being correct, but the design is gonna be all wonky, because we wrapped the code around a bunch of tiny tests.
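To see how far from "correct" the minimal code still is, here is the same function with the seven tests replayed against it, followed by one three-element input it gets wrong. The input `[3, 1, 2]` is just one counterexample picked by hand:

```python
def quicksort(l):
    # Same minimal implementation as above: it only compares
    # each element against the current first element of `out`.
    if not l:
        return []
    out = [l[0]]
    for i in l[1:]:
        if i <= out[0]:
            out = [i] + out
        else:
            out.append(i)
    return out


# The seven tests all pass.
quicksort([])
assert quicksort([]) == []
assert quicksort([1]) == [1]
assert quicksort([2, 1]) == [1, 2]
assert quicksort([1, 2]) == [1, 2]
assert quicksort([1, 2, 1]) == [1, 1, 2]
assert quicksort([2, 3, 1]) == [1, 2, 3]

# But anything larger than the new minimum is appended unsorted.
print(quicksort([3, 1, 2]))  # [1, 3, 2]
```

The eighth test would force another small patch on top of this structure, which is the path dependence in action.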
Now I said I do "weak TDD", so I'd still write a test before quicksort. Unlike with maximal TDD, though, I wouldn't write a unit test. Instead:
from hypothesis import given
import hypothesis.strategies as st

@given(st.lists(st.integers()))
def test_it_sorts(l):
    out = quicksort(l)
    # Every element must be >= the one before it.
    for i in range(1, len(l)):
        assert out[i] >= out[i-1]
This is an example of property testing. Instead of coding to a bunch of specific examples, I'm coding to the definition of sorting, and the test will run my code on random lists and check if the property holds. The conceptual unification runs much deeper, and this drives better organization.
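For readers without Hypothesis installed, here is a dependency-free sketch of the same idea. It is a crude stand-in, not how Hypothesis works internally (no shrinking, no smart input generation). The `quicksort` here is a textbook version written for the example, and the permutation check is an extra property beyond the ordering check in the test above:

```python
import random
from collections import Counter


def quicksort(items):
    """Textbook quicksort (not in-place), first element as pivot."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x <= pivot]
    larger = [x for x in rest if x > pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)


def check_sort_properties(sort, trials=200, seed=0):
    """Hand-rolled property check: run `sort` on random integer lists."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        out = sort(data)
        # Property 1: the output is in non-decreasing order.
        assert all(a <= b for a, b in zip(out, out[1:])), f"not ordered: {data} -> {out}"
        # Property 2: the output has exactly the same elements as the input.
        assert Counter(out) == Counter(data), f"not a permutation: {data} -> {out}"


check_sort_properties(quicksort)  # passes silently
```

The second property matters because an ordering-only check would happily accept a function that returns a list of zeros of the right length.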
That leads to my biggest pet peeve about maximalist TDD: it emphasizes local organization over global organization. If it can keep you from thinking holistically about a function, it can also keep you from thinking holistically about the whole component or interactions between components.5 Up-front planning is a good thing. It leads to better design.
Architecture is too essential to design up-front.
— James Shore @jamesshore@mastodon.online (@jamesshore) July 18, 2021
(Actually my biggest pet peeve is that it makes people conflate code organization with software design, but non-TDDers conflate them too, so maybe I just picked an exceptionally poor topic to evangelize.)
In defense of TDD
I've spent enough time talking trash about TDD. As I said before, I regularly practice the "weak" form of TDD: write some kind of verification before writing the code, but without sticking to minimality or even test-based verification. The TDD maximalist might say this isn't "real TDD", but hell with them.
Weak TDD has four benefits:
- You write more tests. If writing a test "gates" writing code, you have to do it. If you can write tests later, you can keep putting it off and never get around to it. This, IMO, is the principal benefit of teaching TDD to early-stage programmers.
- It's easier to refactor, as you catch regressions more easily.
- All of your code now has at least one client as you develop it. This tells you if your interfaces are too awkward early on.
- It gets you in the habit of thinking about how your code will be verified, even if you don't actually do so with a unit test.
Wait, aren't these the same benefits as maximal TDD? "It checks if you've got awkward interfaces" sounds an awful lot like "listening to your tests." Well, yes. You should listen to your tests! TDD often makes your design better!
My point is that it can also make your design worse. Some TDD is better than no TDD, but no TDD is better than excessive TDD. TDD is a method you use in conjunction with other methods. Sometimes you'll listen to the methods and they'll give conflicting advice. Sometimes, TDD's advice will be right and sometimes it will be wrong. Sometimes it'll be so wrong that you shouldn't use TDD in that circumstance.
Why TDD hasn't conquered the world
Somewhat eye opening day. Test driven development was cutting edge circa 1999. It is the basis of modern development. I can't imagine not using it. Hearing companies that don't use it, is like hearing companies go "have you heard of this new thing called Linux?" ... wtf?
— Simon Wardley (@swardley) February 8, 2022
So, after all that, I have my hypothesis on why TDD doesn't spread. And to be honest it's a pretty anticlimactic one. Maximal TDD isn't nearly as important as the maximalist would believe. TDD is better used in a portfolio of methods. Since there are way more useful methods than one person can possibly master, you pick what you want to get good at. Often TDD doesn't make the cut.
I'd equate it to shell scripting. I spent a lot of time this spring learning shell scripting.7 It's paid off tenfold. I think that every developer should know how to write custom functions. Is it more important than TDD? If people don't have the time to learn both, which one should they pick? What if proper TDD takes so much time you can learn both shell scripting and debugging practices? When do people get to stop?
Conclusions
I have no idea where I'm even ending up. This "one day newsletter" took three days and 2500 words and I don't know if it made anything clearer for me or for any of you. I don't even know if my understandings are valid, because I didn't do much research or work through any of the nuances. This is why I should leave this stuff to the blog and just use the newsletter for regex stanning.
Update for the Internets
This was sent as part of an email newsletter; you can subscribe here. Common topics are software history, formal methods, the theory of software engineering, and silly research dives. Updates are usually 1x a week. I also have a website where I put my polished writing (the newsletter is more for off-the-cuff stuff).
1. Future Hillel here, past Hillel is a goddamn liar, this took three days. That should give you a sense of how long my blog posts take.
2. To be clear, it's a pretty minor example of revisionism.
3. No, it's not who you're thinking of. It's a guy who isn't really known outside of some Chicago tech circles.
4. Lots of serious bugs are also at the unit level: see simple testing can prevent most critical failures.
5. Someone's going to roll out the whole Sudoku thing, where Ron Jeffries started writing a Sudoku solver with TDD and never finished. I don't like that example, because 1) he started it to kill time in an airport lobby, 2) he hadn't heard of sudoku before and was just playing around with TDD, and 3) it's okay to lose interest in projects.
6. The refactoring stage of red-green-refactor helps here, but not as much as you'd expect.
7. I'm on Windows, so powershell.