Educational Codebases
We get better at coding by reading code, but code isn't designed to be studied. So let's make projects that *can* be studied.
New blog post, which is also a new project: Learn AutoHotKey by stealing my scripts. It's about this project. Patreon notes here.
This is my second "educational codebase", after TLA+ Graph Modeling. The idea's been stuck in my head for a while. A common bit of advice to new coders is "get better at programming by reading code." The problem is that codebases aren't set up to be read. While we prize readability in our code, that's readability for other contributors. There are certain things we can assume for other contributors that we can't assume for beginners. Take the most starred codebase on GitHub, React. Here's a chunk of one file:
// $FlowFixMe[missing-this-annot]
function Chunk(status: any, value: any, reason: any, response: Response) {
  this.status = status;
  this.value = value;
  this.reason = reason;
  this._response = response;
}
// We subclass Promise.prototype so that we get other methods like .catch
Chunk.prototype = (Object.create(Promise.prototype): any);
// TODO: This doesn't return a new Promise chain unlike the real .then
Chunk.prototype.then = function <T>(
  this: SomeChunk<T>,
  resolve: (value: T) => mixed,
  reject: (reason: mixed) => mixed,
) {
  const chunk: SomeChunk<T> = this;
  // If we have resolved content, we try to initialize it first which
  // might put us back into one of the other states.
  switch (chunk.status) {
    case RESOLVED_MODEL:
      initializeModelChunk(chunk);
      break;
Now think about all of the JavaScript features and concepts this uses: prototypes, promises, switch statements, underscore fields. It's also got a lot of Flow features, like annotations and generics, which aren't clearly distinguished from the JavaScript concepts! A beginner reading this would be hopelessly lost. Maybe there's a better React file to start with, but how is our beginner supposed to find it?
The other problem: there's no way to differentiate idiomatic code and "good design" from kludges and pragmatic workarounds. A lot of Python code uses strings for paths instead of path objects because pathlib is only about ten years old. A beginner seeing `open(path)` wouldn't know it's legacy and they shouldn't use it!
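To make the contrast concrete, here is a minimal sketch of the two styles side by side, with the kind of comments an educational codebase might carry. The file name `notes.txt` and both function names are hypothetical, made up for illustration.

```python
import os
import tempfile
from pathlib import Path


def read_notes_legacy(folder: str) -> str:
    # Older style: the path is a plain string, joined with os.path.join.
    # You will see this a lot in code written before pathlib was common.
    path = os.path.join(folder, "notes.txt")
    with open(path) as f:
        return f.read()


def read_notes_pathlib(folder: str) -> str:
    # pathlib style: `/` joins path segments, and the Path object
    # carries its own helpers, such as read_text().
    return (Path(folder) / "notes.txt").read_text()


# Write a throwaway file (hypothetical contents) and read it back both ways.
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "notes.txt").write_text("hello")
    legacy_result = read_notes_legacy(folder)
    pathlib_result = read_notes_pathlib(folder)
```

Both functions return the same text. Without the comments, nothing on the page tells a beginner which one is the older idiom.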
(This isn't only a problem with coding newbies: people learning a new language have the same problem, though to a lesser extent.)
The solution: educational codebases
An educational codebase is one designed to be read and studied.
Here's a snippet from my main.ahk:
/*
"Copy as markdown link." If you have `link` on the clipboard and have selected `title`, it will set your clipboard to `[title](link)`.
*/
; #>!c --> win + right-alt + c
; I use just `#!c` personally but that's a bad habit,
; since it would override any Windows hotkeys
#>!c:: {
  ctmp := A_clipboard ; Save what's on the clipboard (A_clipboard) for later formatting.
  ; In Windows clipboard copying is async.
  ; ClipWait can detect if the clipboard is NON-EMPTY.
  ; So to detect "we've copied" we have to empty the clipboard first.
  A_clipboard := ""
  Send "^c" ; Presses ctrl+C for you
  ClipWait 5 ; Wait up to 5 seconds for clipboard data. See https://www.autohotkey.com/docs/v2/lib/ClipWait.htm
  A_clipboard := "[" . A_clipboard . "](" . ctmp . ")"
}
That's 10 lines of comments for five lines of script. All the functions and idioms and peculiarities are explained. I hope this makes it as easy for the student to understand and tweak as possible.
This is still a new idea for me, but here's a couple of things I think are true about educational codebases.
Educational Codebases aren't Toys
The scripts in the repo aren't just ones I whipped up as demos, they're actual scripts I use myself. I find that when I run into issues implementing a "toy", I'm tempted to change the problem instead of going through the inconvenience of writing a complicated solution. So I don't get the same breadth of teaching with toys as I do with "real" problems.
Educational Codebases aren't Production
Production code is often too complicated and has too many hacks to be great for teaching. It's easier to teach if you simplify things a bit, make things uniform, and fix all the bad habits. In the above script, I wrote `#>!c` when my own config has `#!c`, because I don't want other people adopting my old bad habits.
Yes, I realize this contradicts the last point about wanting complicated examples. There's a balance to be had, and I think the sweet spot comes from simplifying production code, not making a toy.
And on a more basic level, production code doesn't need so many comments. I already know AHK, I don't need five lines explaining what `ClipWait` is doing!
No good conclusion, I just want to see people make more educational codebases. I think they're a great idea!