Road tests and Lab tests (This Old Pony #105)
In this issue I want to introduce the concept of road tests and lab tests, not as new kinds of tests but as an alternative taxonomy for thinking about how and what we test in our Django applications (or any others, for that matter). I've found it a helpful way of thinking about testing, and likely some of you will, too.
Types of tests and testing pyramids
Much of testing terminology focuses on what is tested and the environment in which tests are run. While valuable for describing individual test functions and modules, these names and categories are inadequate for describing the degree of confidence that tests should inspire (the same goes for test coverage, too, by the way[0]).
What kinds of tests are there? How many ways are there to describe tests? So many!
- Unit tests
- Functional tests
- Integration tests
- Acceptance tests
- User tests
- Regression tests
- End-to-end tests
- System tests
- Service tests
- Smoke tests
- Behavioral tests
One thing you'll notice is that many of the labels enumerated here are not mutually exclusive. Some describe the level of code tested (e.g. unit tests, integration tests), some imply something about who is reading or writing the test (e.g. behavioral tests), and others describe where in the process testing happens or what is running the test.
If you cut the test types across the dimension of "level of integration", you can stack them into a pyramid.
Well, the actual testing pyramid[1] is a little easier to construct. It shows the hierarchy of integration level as well as the expected relative speed of individual test execution, and provides a rough guide to how many of each kind of test your test suite should have.
I'll risk saying that the testing pyramid is a widely accepted guideline for writing tests. It neatly encapsulates the advice:
Write lots of small and fast unit tests. Write some more coarse-grained tests and very few high-level tests that test your application from end to end.
Now this is good advice, all things being equal. But it leaves unsaid the assumptions about why you should write more or fewer of certain tests. What are the metrics, the goals? Is it coverage and test breadth? Test depth? Who are you writing the tests for? What are they being used for? What are the consequences of code failing or being wrong?
Test metrics and goals
The question of whether a test is a unit test or an integration test doesn't answer the questions of what the test is for, or what kinds of answers it should give. It's not that this taxonomy is useless; rather, it's like categorizing shoes by weight, color, and size instead of, say, their intended purpose. Or like organizing the books on your shelf by the color of their spine rather than, say, their topic.
Apologies, but I think this is absolutely pathological behavior:
[Image: shelves of books arranged by the color of their spines[2]]
Can you imagine organizing code like this?
So let's break down the dichotomy at issue. Lab tests are tests for the developer: they give you some confidence that some individual part, component, or behavior works as expected. Road tests are oriented toward the customer, e.g. testing business requirements, verifying that some major behavior tying multiple things together works as expected. You could point a new developer at the lab tests to get an idea of what a component should be able to handle and how robust it is (regardless of the component's complexity or level of integration); but you would point them at the road tests to understand how the application should work for end users.
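To make the dichotomy concrete, here's a minimal sketch in Django; the app, URLs, and utility function are all invented for illustration. The first test is a lab test exercising one component in isolation; the second is a road test walking a customer-facing workflow through the test client.

    # Lab test: for the developer; one component, in isolation.
    from django.test import SimpleTestCase, TestCase

    from catalog.utils import normalize_sku  # hypothetical utility


    class NormalizeSkuTests(SimpleTestCase):
        def test_strips_whitespace_and_uppercases(self):
            self.assertEqual(normalize_sku("  ab-123 "), "AB-123")


    # Road test: for the customer; a business requirement, end to end.
    class AddToCartTests(TestCase):
        def test_shopper_can_add_item_to_cart(self):
            response = self.client.post("/cart/add/", {"sku": "AB-123", "qty": 1})
            self.assertRedirects(response, "/cart/")
            self.assertContains(self.client.get("/cart/"), "AB-123")

The lab test tells you the component is sound; the road test tells you a shopper can actually use it.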
Organizing road tests and lab tests
One concrete thing this taxonomy can help with is test organization. Tests for components - lab tests - might be bundled with the individual modules and libraries in your web app (e.g. various app_name/tests/ modules) and road tests might be kept separately in a top-level directory organized by user type or scenario rather than by module/component.
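By way of illustration, a layout along those lines might look like this (the app and feature names are made up):

    project/
        catalog/
            tests/              # lab tests live with their component
                test_models.py
                test_utils.py
        orders/
            tests/
                test_forms.py
        road_tests/             # road tests live at the top level...
            checkout/           # ...grouped by feature or scenario
                test_guest_purchase.py
                test_saved_card_purchase.py
            accounts/
                test_password_reset.py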
That being said, there is no one way of dividing up or writing these tests. My own preferred way - at this time - is as described above, with a single test function in each road-test file executing a single workflow and verifying side effects and results along the way with detailed subtests. Test files are named for the action or business requirement, and grouped in folders by feature and product. While not written exactly in the style of literate programming, they are at least loquaciously commented and documented.
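Skeletally, one of those road-test files might read like this, leaning on unittest's subTest support to checkpoint the workflow (the checkout flow and its fields are, again, hypothetical):

    # road_tests/checkout/test_guest_purchase.py
    from django.core import mail
    from django.test import TestCase


    class GuestPurchaseTest(TestCase):
        """A guest can buy an item without creating an account."""

        def test_guest_can_complete_a_purchase(self):
            # The guest adds an item to their cart...
            self.client.post("/cart/add/", {"sku": "AB-123", "qty": 1})
            with self.subTest("the cart contains the item"):
                self.assertContains(self.client.get("/cart/"), "AB-123")

            # ...then checks out with shipping and payment details.
            response = self.client.post(
                "/checkout/",
                {"email": "guest@example.com", "address": "1 Main St"},
                follow=True,
            )
            with self.subTest("the order confirmation is shown"):
                self.assertContains(response, "Thank you for your order")
            with self.subTest("a confirmation email went out"):
                self.assertEqual(len(mail.outbox), 1)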
The main thing this taxonomy gets you, though, is a better grasp of how confident you should really be in your tests. If you have customer-facing behavior covered by road tests, higher confidence is arguably warranted. You can have 100% test coverage with unit tests or integration tests and still radically miss the mark.
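A toy demonstration of that gap (both functions are invented): each unit below can be 100% covered by passing lab tests, and the customer still sees the wrong price.

    def price_in_cents(dollars: float) -> int:
        """Convert a dollar amount to integer cents."""
        return round(dollars * 100)

    def format_price(dollars: float) -> str:
        """Render a dollar amount for display."""
        return f"${dollars:.2f}"

    # Each function passes its own unit tests, but wire them together in a
    # template and the customer sees "$999.00" for a $9.99 item; no lab
    # test catches that, and the first road test through the page would.
    print(format_price(price_in_cents(9.99)))  # $999.00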
Isn't this just the taxonomy of functional tests and non-functional tests? It smells similar, but tastes different. What about acceptance testing? That's probably closer, but the key is that road tests and lab tests form a pair of related terms. What is the counterpart to an acceptance test? A non-acceptance test?
Regardless of where the tests live, with lab tests and road tests we have an understanding of the limitations of each kind of test, of what they can tell us, and of where to focus for feature changes and bug fixes[3].
Organizedly yours,
Ben
[0] Test coverage without depth is like checking that every wheel is attached to your car before a long trip without also checking the tire pressure. But don't worry, you checked all the wheels!
[1] From guest poster Ham Vocke https://martinfowler.com/articles/practical-test-pyramid.html
[2] Books photo by See-ming Lee, CC BY-SA 2.0 Deed
[3] It also points to the relative disposability of tests