The Myth of Self-Documenting Code
One of the weirdest things about software engineering is how many people hate comments.
Like actually hate. There are influential people out there who say that comments are a sign your code is bad, because you failed to make your code understandable enough to not need them. Comments lie and go out of date, they say; code is the only source of truth. Instead, you should have self-documenting code, where good variable names and code organization make comments superfluous. People can learn what the code does just by reading the code! Then you don't need comments, except for very rare exceptions.
I'm sorry, but this is impossible.
One of the first things you learn as a teacher is that most "obvious" things aren't. No matter how simple and obvious something is to you, it isn't simple or obvious for anybody else. This applies to programming, too. When you write "self-documenting code", it's self-documenting to you. But you have all this internal context that makes it self-documenting. Other people don't have that context. They need more information than your code has.
You can embed some of this information in the variable and function names, but you're only getting one slug per thing, which only lets you encode one small, distinct concept. You could try "encoding more" by breaking a function into subfunctions, but then you're making the code harder to follow in order to make it more "readable". You can also embed some of this information in tests, but then your tests are forced to be both tests and documentation, and they no longer have a single responsibility.
But replacing comments with functions 'n tests has a deeper, more fundamental problem: they can only encode certain classes of information. People refer to this as "comment why, not what", but there's a huge class of information beyond "why" that can't be encoded in the codeflow: negative information, optimization info, tips, etc. Most people don't even realize they can document that stuff. But you can, quite easily, with comments.
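To make that concrete, here's a small sketch in Java. Everything in it is made up for illustration: the TagNormalizer class, the canonical helper, and the requirement that tags keep the order the user typed them. The point is only that the comments carry three kinds of information the code can't express on its own: negative information, optimization info, and a tip.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;

// Hypothetical example: the class, its methods, and the "keep the user's
// order" requirement are all assumptions invented for this sketch.
final class TagNormalizer {

    static List<String> normalize(List<String> rawTags) {
        // Negative information: we deliberately do NOT sort the tags.
        // (Assumed requirement) tags are displayed in the order the user typed them.
        //
        // Optimization info: LinkedHashSet gives expected O(1) duplicate checks
        // while preserving insertion order; a List.contains() loop would be quadratic.
        //
        // Tip: to change what counts as a duplicate, change only canonical().
        LinkedHashSet<String> seen = new LinkedHashSet<>();
        for (String tag : rawTags) {
            seen.add(canonical(tag));
        }
        return new ArrayList<>(seen);
    }

    // Two tags are the same if they match after trimming and lowercasing.
    private static String canonical(String tag) {
        return tag.trim().toLowerCase(Locale.ROOT);
    }
}
```

Delete the three comments and the code behaves identically, but nothing left in it tells a reader that sorting was considered and rejected.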
Tactical Sasquatch
If self-documenting code is a myth, why are there so many examples online? There are tons of articles showing snippets of claimed "self-documenting code". But these examples are small, self-contained things, like
fun area(length, width) {
    return length*width;
}
I'll borrow an idea from one of my favorite blogs: the difference between tactics and operations. It's easy to follow the moment-to-moment flow of the code and what it's doing, so it's tactically self-documenting. But operational information is more global: it's about how the code fits in with the larger program. The code's behavior can't tell us that because it isn't supposed to know anything about the larger program. That would break the very idea of encapsulation!
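Here's a sketch of that gap using the area function, rewritten in Java. The Geometry class and every fact in the comment (the units, the upstream validation) are assumptions I invented for the example; none of them could be recovered from the function body.

```java
// Hypothetical example: the class name and every fact stated in the
// comment below are assumptions made up for this sketch.
final class Geometry {

    /**
     * Operational context the signature cannot carry (all assumed here):
     * - Callers pass dimensions in meters, so the result is in square meters.
     * - Inputs are validated upstream, so this method does not check
     *   for negative values.
     */
    static double area(double length, double width) {
        return length * width;
    }
}
```

The body is as tactically clear as code gets, and it still says nothing about units or who is responsible for validation.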
My favorite "clean code" project is Uncle Bob's FitNesse. I like it because it's a large program written by the most influential Clean Coder, so it should be our best candidate for truly self-documenting code. Here's one snippet from the Shutdown class:
private void run(String[] args) throws Exception {
    if (!parseArgs(args))
        usage();
    ResponseParser response = buildAndSendRequest();
    String status = checkResponse(response);
    if (!"OK".equals(status)) {
        LOG.warning("Failed to shutdown. Status = " + status);
        System.exit(response.getStatus());
    }
}
Readable? Yes. You can see exactly what this calls and what each thing it calls says it does. But we operationally know that this is supposed to, at some point, make an API call to actually send the shutdown signal. It can't tell us where that is! At best it vaguely gestures at buildAndSendRequest.
In fact you need to dig four methods deep: run calls buildAndSendRequest, which calls ResponseParser.performHttpRequest, which calls RequestBuilder.send. The code isn't self-documenting because a critical part of the documentation, the actual API call, is an operational concern, not a tactical one.[1]
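Here's roughly what closing that gap with a comment could look like. This is a stand-in, not FitNesse's actual code: the method names follow the call chain described above, flattened into one hypothetical ShutdownSketch class, and every body is a stub that only records that it ran.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for illustration only: the method names mirror the call chain
// described in the text, but every body is a stub, not FitNesse's real code.
final class ShutdownSketch {

    // Records which methods ran, so the call chain is visible when executed.
    static final List<String> TRACE = new ArrayList<>();

    /**
     * Operational note: the shutdown signal is not sent in this method.
     * The actual call happens three calls down:
     * run -> buildAndSendRequest -> performHttpRequest -> send.
     */
    static String run() {
        TRACE.add("run");
        return buildAndSendRequest();
    }

    static String buildAndSendRequest() {
        TRACE.add("buildAndSendRequest");
        return performHttpRequest();
    }

    static String performHttpRequest() {
        TRACE.add("performHttpRequest");
        return send();
    }

    // Stub: a real implementation would make the network call here.
    static String send() {
        TRACE.add("send");
        return "OK";
    }

    public static void main(String[] args) {
        String status = run();
        System.out.println(status + " via " + String.join(" -> ", TRACE));
    }
}
```

The doc comment on run answers the operational question in a few lines, without the reader having to open three more methods.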
Remember: what's obvious to you is not obvious to everyone else. You might not even realize these operational questions exist because you've so internalized the code they're too trivial to even be questions. You'd look at your code and think "yep this is self-documenting" because it answers tactical questions, and those are the only ones you can think of.
Siren Songs
What makes self-documenting code so appealing to people? I think it's a plethora of (legitimate!) reasons:
- Comments can fall out of sync, which makes them less reliable. While the same thing can happen to method names and tests, it's less likely, so it takes less maintenance to uphold them.
- Comments clutter up the codebase. Files are streams of bytes, meaning the only portable way to encode metadata is to drop it in the file, and comments are metadata.
- It's easier for us to remember the times comments were obsolete and "tricked" us than the times they helped us.
- Our discipline has developed a large body of theory on how to write better code, but not write better comments.
- Writing comments (communicating human to human) is a totally different skill from writing code, one that most developers haven't practiced.
A lot of these boil down to "comments are kinda garbo", and I 100% agree: comments are garbo. We need something better than comments, or at the very least we need way better comment tech! But even though comments are garbo, they're still the only means we have of interleaving code with expressive human-to-human communication. Code and tests are not nearly expressive enough to embed all useful information. We can't sufficiently self-document our code.
Now I kinda wanna hack together a "comment embedded" using neovim floating windows, could be fun
SPLASH
Decided on a whim to go. If you're around come say hi!
Update for the Hacker News Crowd
This was sent as part of an email newsletter; you can subscribe here. Common topics are software history, formal methods, the theory of software engineering, and silly research dives. I also have a website where I put my more polished and heavily-edited writing (the newsletter is more for off-the-cuff stuff). Newsletters are usually 1x a week.
[1] While writing this, Bob commented "That project is twenty years old and has had a long and varied history … Perfect? No. What is?" I think my point still stands: the operational information is missing not because the code is insufficiently clean, but because it can't be easily encoded in the code.