Code review vs code proofreading

typeset

                February 20, 2023

            Code review vs code proofreading

                What if the problem with code review is that we're doing it wrong?

            Administrative Stuff

Just one month until the March TLA+ Workshop! Thanks to everyone who already signed up, I'm in the process of revising everything and am real excited to share the new content. There's still nine slots left if you want to join!
I have a new blog post up: NP-Complete isn't (always) Hard, about NP-complete problems and modern SAT solvers.

Code review vs code proofreading
So in my last newsletter I had an aside on code review:

This is also why a lot of people hate code review. It’s good when you’re acting as an editor, looking for foundational improvements to the code, but it’s awful when you’re acting as a proofreader. That probably deserves its own essay!

Let's dig into that a bit more.
Proofreading comes from the printing industry.¹ Historically, printers would typeset a hand-written document for copying, and proofreaders would check for discrepancies between the typeset "proof" and the original manuscript. Proofreading was both hard and boring, so it's easy to miss things.  These days, with digital automated "typesetting", proofreading is synonymous with copyediting, which also includes things like fixing word choices, stylistic inconsistency, etc. 
However, "proper" copyediting changes the piece more than proofreading. We can think of the range of editing actions as being on a spectrum of invasiveness:

Proofreading, finding spelling and grammatical errors
Copyediting, improving word choice, stylistic consistency (do you write do not or don't), grammatical errors
Substantive editing, sentence structure, organization of content
Really substantive editing, choice of content, paper thesis 

The further up the rung you go, the more editing becomes an art. I've had lots of different people help edit my blog posts and I've noticed that people tend to "settle" on one style of feedback. It also seems to form a pyramid: it's a lot easier to find people who like to do proofreading and copyediting than who do substantive editing. That's likely because proofreading is very objective while substantive editing is heavily subjective. 
The important thing is that these are different skills and activities. The techniques and challenges of proofreading are different from the challenges and techniques of copyediting, which are different from substantive editing.
So what?
Both Microsoft and Google have written case studies on how they internally use code review. What they consistently find is that the majority of the code review comments are not found defects, rather they are suggestions to improve the code. And those improvements can be also placed on a spectrum: 

Formatting, indentation, bikeshedding
Code consistency (do you write id() or get_id()), local code organization
Understandability, documentation
"Fundamentals": is this the right code for our need? Is it done well?

I see an awful lot of parallels to the editing spectrum. It's really easy to get bikeshedding about formatting and indentation, so much so that newer languages now have automatic formatters to just nip the problem in the bud. It's also a lot more annoying to both do and receive, to the point where I think its annoyingness inspires the idea that pairing is a sufficient replacement. Proofreading sucks and everybody hates it.²
I don't think this is a radical idea. It's implicit in all the code review guides I read as research— design and complexity come before style and formatting. So why doesn't this normally happen?
One thing I didn't mention about the editing spectrum is that the more invasive forms of editing happen earlier in the process. You do substantive edits on the first rough drafts, where parts of the manuscript are totally unpolished, outline chunks, or even just "TODOs". The editor doesn't need your prose to flow well to see flaws in the structure or a hole in your evidence! Then copy editing happens when later the overall structure is nailed down but the prose is still in rough shape, and then once you have a "final" draft you're only looking for spelling and grammatical errors: proofreading. Of course things aren't that fixed and you can do any of those forms of editing at any stage, this is just the most efficient default. If you proofread too early you might throw away all the spelling corrections when you delete half the essay, and if you substantial edit too late you throw away all the effort on polishing.
In code review, by contrast, code is typically reviewed when it's potentially ready to merge. Substantive editing at that stage can mean throwing away a huge amount of labor. And its harder to see fundamental issues in a ready PR, for the same reason that it's easier to find design flaws in a specification than a final implementation.
Now, there are other reasons why people proofread more than they copy/substantive-edit, but this reason is the main one that is directly tied to the PR process. Meaning it can be changed with a different process. Say I submit a PR with this content:
class DateTimeDifferenceResolver {
  constructor(date1, date2) {
    // todo
  }

  execute() {
    // todo
  }
}

// DateTimeDifferenceResolver will be used in files
// A, B, C, D, E

There's no implementation there, so the only thing people can review is the copy and substance. If someone goes "that could just be a function" I didn't waste an hour of my life writing (new DateTimeDifferenceResolver(a, b)).execute() everywhere.
The obvious drawback of this approach is it makes the development cycle longer. Instead of at least one round of PR feedback, I need two: a dedicated substance review and then a proofreading review. As such, it doesn't make sense for the small changes most PRs are about. But it could be a useful process for bigger changes.
Hero image is from https://biostatmatt.com/uploads/ProofreadSymbols.pdf.
(The subtler drawback is that CI/CD setups usually assume that PRs consist of completed code, not code drafts, so wouldn't play well with this approach.)

A lot of programming terms come from printing, for example boilerplate code. ↩

I once had a teammate who would only give PR feedback on a single file at a time, so you had to resubmit corrections several times to get all the proofreading he should have given you in the first place. ↩

            If you're reading this on the web, you can subscribe here. Updates are once a week. My main website is here.
My new book, Logic for Programmers, is now in early access! Get it here.

Don't miss what's next. Subscribe to Computer Things:

Start the conversation: