Collecting and curating material is good and we should do it more


                May 31, 2023


Chronicling, plugin systems, and six other disjointed ideas combine in an argument for how to "engineerize" software more

GOTO Chicago is over! My talk, "Is software engineering really engineering", went over pretty well, and I'm happy with how it turned out. I'm going to try submitting it to other conferences. Beyond that, I have a TLA+ workshop I'm teaching on June 12th and after that my next engagement is in September, so I have a lot of time to fill.
(Use the code C0MPUT3RTHINGS for 10% off the TLA+ workshop! It's a full day of hands-on experience, plus I give my students followup reviews on their specifications.)
In the talk Q&A, one person raised an objection that I thought was really interesting and deserved a longer response.
My original claim
As you may know, the talk was based on my Crossover Project, where I interviewed people who've done both traditional and software engineering. I'm deeply interested in what's needed to make our field more "engineering-like", and the last part of the talk was about what the two worlds could learn from each other. We could say a lot about the high-level principles, like "community" or "responsibility", but far more interesting are the specifics. It's one thing to say that "traditional engineering needs more open conferences", quite another to say "trad eng needs better version control!"
In the talk, I focused on one cool specific idea from trad engineering: hyperspecific learning material. I was inspired by this book an interviewee mentioned:

That's a whole book on manufacturing snap fits! I said that there's nothing like that in software: we talk about specific tools, but never about specific domains. What if we had a book on how to do versioning, or how to make a good plugin system! That'd go a long way to improving the state of our field.
The Problem
Someone objected to this. According to him, we're not far enough along for these materials to be useful, and instead they'd ossify us. We don't yet know how to do versioning well in all cases, so a book on it would just get everybody to cargo-cult a poor solution.
And he had a lot of experience backing this. He was an editor at a large software publisher and regularly sifted through tons of low-quality "best-practices" submissions. He called them "summer projects", like "what I did on my summer vacation". Someone builds a system and decides to write a book saying "my way was the best way to make the system". Sometimes he'd get several books on the same topic, all saying "my way is the best way", and they'd be different ways!
If it's not clear already, this critique is coming from a person I deeply respect who's done brilliant work in his field. It's not something I can dismiss with "well why don't you interview 15 people", and at the same time, I still disagree with this. Even if people aren't doing this aggregation work well, it doesn't mean the work isn't worth doing!
My defense
The Research Process
Let's start by listing some of the things that we do in this kind of "survey chronicling".¹ It's hard to find material on this online, and none of my humanities books give it a definition, so I'm just going to invent my own process.²

Collection: gathering material that's out there and putting it in one place.
Curation: identifying which gathered material is useful for knowledge-building.
Analysis: taking the curated material, breaking it down, and studying what it's "saying".
Synthesis: taking the analytic information and processing it into an overall idea.

Synthesis is the stage that leads to "best practices", through reasoning like "projects that do X benefit and projects that don't do X suffer." The objection as I remember it is that there's not enough material to collect, so we'll do poor synthesis.
What if we skip synthesis? If we don't synthesize, we can't synthesize poorly. And this is still useful! It's even useful without the analysis:

Just collection: The "awesome list of" repos on GitHub that get a bazillion stars. People like these because collection is a long and tedious process.
Collection + curation: Academic "survey papers", or "literature reviews". People like these because curation is a long and tedious process.

Analysis, too, is a long and tedious process, where you actually go into the details of your material. Analysis without synthesis gives us a "snapshot of knowledge" at the time of research, a sense of what we already know without trying to create new knowledge. Now this could quickly become outdated as we learn new things! But it accelerates the process of learning new things, as a lot of the long and tedious work is already done.
In contrast, a "summer project" resource doesn't do that tedious work for the user. I can't pick up "The One True Plugin System" and get a sense of the whole plugin system landscape. By doing analysis and synthesis from a single source, the writer skipped the necessary collection and curation, which is the bedrock of chronicling.
An example of useful collection: connectors
An example of this working properly is the book Documenting Software Architectures. They list six types of "connectors" between software components:

You can tell the book is from the early 2000s because it lists "peer-to-peer" as a major architectural motif. Napster and BitTorrent were Big Deals back then. The list is only a snapshot of early 2000s architectural knowledge, and things have developed since then. Even so, reading it back in 2017 really helped me! Having those six terms, with some discussion of tradeoffs and real-world examples, gave me a surer footing in talking about modern systems. Here's what it says about "pipe-and-filter" architectures:

That's honestly not a huge amount of analysis right there, but it's still more analysis of pipe-and-filter than I'd considered before, so it was a valuable read.
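For readers who haven't run into the term, here is a minimal Python sketch of the pipe-and-filter idea. It is an illustration under my own assumptions, not an excerpt from the book: the filter names (`strip_blank`, `lowercase`, `only_matching`) and the `pipeline` helper are hypothetical, chosen only to show independent processing steps connected by streams.

```python
from typing import Callable, Iterable, Iterator

# A filter takes a stream of items and returns a transformed stream.
Filter = Callable[[Iterable[str]], Iterator[str]]


def strip_blank(lines: Iterable[str]) -> Iterator[str]:
    """Filter: drop empty or whitespace-only lines."""
    return (line for line in lines if line.strip())


def lowercase(lines: Iterable[str]) -> Iterator[str]:
    """Filter: normalize each line to lowercase."""
    return (line.lower() for line in lines)


def only_matching(word: str) -> Filter:
    """Build a filter that keeps only lines containing `word`."""
    def _filter(lines: Iterable[str]) -> Iterator[str]:
        return (line for line in lines if word in line)
    return _filter


def pipeline(source: Iterable[str], *filters: Filter) -> Iterator[str]:
    """The "pipes": feed each filter's output into the next filter."""
    stream: Iterable[str] = source
    for f in filters:
        stream = f(stream)
    return iter(stream)


# Hypothetical input, just to exercise the pipeline.
result = list(pipeline(
    ["Plugin API", "", "Versioning", "plugin loader"],
    strip_blank,
    lowercase,
    only_matching("plugin"),
))
print(result)
```

Each filter knows nothing about its neighbors, only about the stream it receives, so filters can be reordered or swapped without touching the others. That independence is the property the architecture is usually valued for.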
There's plenty to collect
The other objection is that there's not enough material out there to collect: it took hundreds of years of doing civil and mechanical engineering to build up the body of knowledge necessary for chronicling.
I think this doesn't hold for three reasons. First of all, not all engineering fields took centuries: electrical engineering only got started in the 1800s and was fully mainstream as an "engineering discipline" by 1900. Second, we communicate with each other much more than other fields do. There is no civil engineering equivalent of GitHub, or dev.to, or Strange Loop.
Third, there is enough material. To understand plugin systems better, here are some potential projects we can look at:

WebExtensions, and the legacy addon system Firefox had before that
Plugin systems for Eclipse, VSCode, Atom, Emacs, and Vim
The six different Office plugin APIs, Google Workspace plugins, whatever LibreOffice does
Sphinx, Jekyll, WordPress, and Hugo plugins
Plugins for game modding: Minecraft, Quake, Wesnoth, etc

That's five different universes of plugin systems. Maybe, on aggregation, there's some universal analysis you can do, or maybe you'll have to approach each universe separately. The point is, there's a lot of material out there!
Oh, and most of it is open source. We can see how the plugin systems are implemented! Many of the people who built these systems have public emails and will answer questions! That's crazy!
What's stopping us?
This is always the question that kills my hopes and dreams, because it always has the same answer: the work is hard and unrewarding. Nobody's paying for it and it doesn't lead to any academic prestige. I wonder how the "snap-fit handbook" happened. Shouldn't it have had the same constraints?
Okay, after a lot of searching, it seems that the author is someone who specializes in snap-fits, and has specialized in them for over 40 years, so maybe it's a "prestige" book. Lots of people write books (including me!) to establish themselves as experts. Whoever makes the snapshot of knowledge can then be the Plugins Person. So maybe the limiting factor is the market forces of consultancy. There are lots of consulting shops for specific technologies, so they have incentives to write books on specific technologies, but I haven't seen that many consulting shops for specific problem domains.

¹ I got the term "chronicling" from Nathaniel Reindl-Scheibel (@masto.don.gs) in a Bluesky convo. I'm @hillelwayne.com btw.

² I'm sure this is incorrect and the correct version either 1) doesn't exist yet or 2) exists and is impossible for a normie to find. This is frustrating me more and more these days.

            If you're reading this on the web, you can subscribe here. Updates are once a week. My main website is here.
My new book, Logic for Programmers, is now in early access! Get it here.
