Codebases as communication
Conventionally we communicate programming ideas with talks, papers, and blog posts. But we can also communicate ideas with entire codebases. If someone finds a security exploit, she'll sometimes publish a proof of concept to prove the exploit isn't just theoretical.
Now let's say the exploit PoC comes with a ton of command-line flags: verbose mode, configuration options, output formats, the whole works. The writer is now communicating something subtly different: not just that the exploit exists, but that she wants you to experiment with it. She's making it as easy as possible for you to play with the exploit yourself and come up with variations and consequences.
This makes codebases like any other kind of communication medium. There are different styles you can use to say subtly different things. There are also different "genres", or overt things you use the codebase to say. Some examples:
Of Genres
This is by no means exhaustive. A codebase can:
- Normalize cross-language comparisons. TodoMVC showcases the differences between frontend frameworks, and I run Let's Prove Leftpad to sorta do the same with formal methods.
- Platform a technique. A simple business app with lots of well-designed property tests shows people how to use property testing. A codebase can show what a "proper" Rails application looks like. You can also add a complication to the codebase to show how the technique handles it.
- Make a technical statement. Show that code written in a certain style is likely to have bugs. Or store the same data in two different JSON formats to show which queries are easier in each format.
- Make a political statement. Consider an NLP project that matches writing styles of anonymous Gab accounts with verified Twitter accounts, or one that does sentiment analysis on dev.to comments of male- vs female-authored articles.
- Show the viability of an idea. A codebase that transforms a TLA+ specification into a fixed testing suite won't be directly useful to anybody, but it can inspire other people to try making their own versions. A codebase doing lots of fun things with Word2Vec will inspire people to play with Word2Vec.
- Be funny. Kevin Kuchta is the undisputed master of this, with things like a compile-time spellchecker and CSS-only chat.
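As a rough illustration of the "technical statement" genre, here's a minimal sketch of the two-JSON-formats idea. The data, the field names, and both layouts are invented for this example; the only point is that the same records get easier or harder to query depending on their shape.

```python
import json

# Hypothetical data: the same three orders in two JSON layouts.
NESTED = json.loads("""
{
  "alice": [{"item": "pen", "price": 2}, {"item": "ink", "price": 5}],
  "bob":   [{"item": "pen", "price": 2}]
}
""")

FLAT = json.loads("""
[
  {"customer": "alice", "item": "pen", "price": 2},
  {"customer": "alice", "item": "ink", "price": 5},
  {"customer": "bob",   "item": "pen", "price": 2}
]
""")


def spend_nested(data, customer):
    # Per-customer queries are a single lookup in the nested layout.
    return sum(order["price"] for order in data.get(customer, []))


def spend_flat(data, customer):
    # The flat layout has to scan every record to answer the same question.
    return sum(row["price"] for row in data if row["customer"] == customer)


def buyers_nested(data, item):
    # Cross-customer queries need a loop inside a loop here...
    return sorted(
        name for name, orders in data.items()
        if any(order["item"] == item for order in orders)
    )


def buyers_flat(data, item):
    # ...but are a single filter over the flat layout.
    return sorted({row["customer"] for row in data if row["item"] == item})


if __name__ == "__main__":
    print(spend_nested(NESTED, "alice"), spend_flat(FLAT, "alice"))
    print(buyers_nested(NESTED, "pen"), buyers_flat(FLAT, "pen"))
```

Both layouts give the same answers. What the reader is meant to take away is which function was awkward to write in which layout.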
Of Styles
Styles are qualities inside a codebase that encourage certain kinds of reader responses. Done right, they also signal intentionality: that you intend for those responses. Again, not exhaustive.
- Well-commented code encourages the reader to understand something. The something could be the problem domain, the solution, a used library, or the language itself.
- You can write code to encourage extraction, where the reader adapts part of the code to their own projects. You can also write it to encourage modification, so that people fork your project and make their own changes.
- A good UI encourages use of the code. A set of well-documented CLI flags, along with programmable completion, signal that you want people to actively use your codebase.
- A programming "flourish" showcases your own skill. A small utility script written in Idris, a library with a Clojure transducer, a codebase with a formal spec. When used sparingly, it shows you're both competent and judicious. You can do cool things but aren't compelled to.
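To make the "good UI" style a bit more concrete, here's a small sketch of a CLI with documented flags, using Python's standard `argparse`. The tool name, the flags, and what they do are all hypothetical. It doesn't cover programmable completion, which needs shell-specific setup outside this sketch.

```python
import argparse
import json
import sys


def build_parser():
    # Every flag below is invented for illustration. The help strings are the
    # part that signals "please run this yourself".
    parser = argparse.ArgumentParser(
        prog="demo",
        description="Hypothetical demo tool for a communicative codebase.",
    )
    parser.add_argument("target", nargs="?", default="example",
                        help="what to run the demo against (default: example)")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print each step as it runs")
    parser.add_argument("--format", choices=["text", "json"], default="text",
                        help="output format (default: text)")
    parser.add_argument("--repeat", type=int, default=1,
                        help="how many times to run the demo (default: 1)")
    return parser


def run(argv):
    args = build_parser().parse_args(argv)
    if args.verbose:
        for i in range(args.repeat):
            print(f"run {i + 1} against {args.target}")
    result = {"target": args.target, "runs": args.repeat}
    if args.format == "json":
        return json.dumps(result, sort_keys=True)
    return f"{result['target']}: {result['runs']} run(s)"


if __name__ == "__main__":
    print(run(sys.argv[1:]))
```

Running it with `--help` prints every flag alongside its description, which is most of what "well-documented" means at this scale.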
Repeated style patterns across codebases can communicate fundamental values. If a company releases a lot of open source projects, and many of them have a single flourish, it shows that engineers at the company get to work with exciting technology. If everything uses TDD, it shows that the company treats TDD as a fundamental part of the development process.
Like other forms of media, a codebase style can show unintentional things about the author. What does it say about a programmer when their favorite test string is "boobies"?
Considerations
For a codebase to communicate well, the message needs to be easy to understand. So the code layout needs to be very simple. Most production code doesn't communicate well because it's split across files in nested folders. That's hard to navigate. Try to have only a few files and make the most important ones clear.
Boilerplate code hampers communication by adding noise to the code. If you're trying to communicate something that's language-agnostic, use low-boilerplate languages for the codebase.
Science code is often bad at communicating because it's a spaghetti mess (example). At the same time, lots of abstractions and indirection tax the reader's working memory. The code needs to be organized but not too organized.
If you have an intentional genre and style, you should make that clear in a project readme. Better overexplain than underexplain. I think it also makes sense to talk directly to the reader. Call out specific lines, go into detail.
Unlike in regular programs, redundancy helps with communication. Present the same thing in increasingly complex ways, or show variations and their consequences.
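Here's one way that kind of redundancy might look, as a sketch only. It uses leftpad because it's small. The three versions and their names are made up for this example, and a production codebase would keep only one of them.

```python
def leftpad_v1(s, width):
    # Version 1: the whole idea in one line, padding with spaces only.
    # Multiplying a string by a negative number gives "", so long inputs
    # pass through unchanged.
    return " " * (width - len(s)) + s


def leftpad_v2(s, width, fill=" "):
    # Version 2: the same idea with the pad character as a parameter.
    # Variation and consequence: a two-character fill overshoots the width.
    return fill * (width - len(s)) + s


def leftpad_v3(s, width, fill=" "):
    # Version 3: the same idea again, now rejecting the input that made
    # version 2 misbehave.
    if len(fill) != 1:
        raise ValueError("fill must be exactly one character")
    missing = max(0, width - len(s))
    return fill * missing + s


if __name__ == "__main__":
    print(repr(leftpad_v1("7", 3)))
    print(repr(leftpad_v2("7", 3, "0")))
    print(repr(leftpad_v2("x", 3, "ab")))  # longer than the requested width
    print(repr(leftpad_v3("7", 3, "0")))
```

Reading the three in order tells the reader why the check in version 3 exists, which a single final version wouldn't.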
All this means that communicative codebases will look different from other kinds of code. Genre definitely sets them apart; production code can also have styles, but it'll be harder for the casual reader to notice them.
Questions I have about this
- What other genres and styles are there?
- What are some good communicative codebases that already exist? I only know of examples in the comparison and joke genres.
- Are there different techniques for writing communicative code?
- While production code isn't communicative, can you have communicative carveouts? Like a complex app that calls attention to one particular file with a technical flourish.
- How do we distinguish between intentional and unintentional stylings? What about conspicuousness: where the author wants us to notice her style?
- What would a code sharing platform that specializes in communicative code look like? Some common features like issues would be less important, while some rare features like discussion boards or code annotations would be more important.
- What would a language designed for communicative projects look like? It might seem like communication is too complex to support with general features, but that's not true: "less boilerplate" is generally better for communication. So what other features would help? A really good CLI library? A powerful DSL builder? Are there things that would be footguns for 100-file codebases that would be beneficial for 3-file codebases?