Donald Knuth Was Framed
YOW! Talk
My YOW! talk, "Designing Distributed Systems with TLA+", is now available! You can watch it here.
Donald Knuth Was Framed
The other day I was talking with a friend about structured editing and literate programming came up. LP was one of Donald Knuth's ideas, to structure programs as readable documents instead of just machine docs. He was interested in it, I was cautiously skeptical. We both knew the famous story about it:
In 1986, Jon Bentley asked Knuth to demonstrate the concept of literate programming by writing a program in WEB. Knuth came up with an 8-pages long monolithic listing that was published together with a critique by Douglas McIlroy of Bell Labs. McIlroy praised intricacy of Knuth's solution, his choice of a data structure (Frank M. Liang's hash trie), but noted that more practical, much faster to implement, debug and modify solution of the problem takes only six lines of shell script by reusing standard Unix utilities. McIlroy concluded:[12]
Knuth has shown us here how to program intelligibly, but not wisely. I buy the discipline. I do not buy the result. He has fashioned a sort of industrial-strength Faberge egg—intricate, wonderfully worked, refined beyond all ordinary desires, a museum piece from the start.
The program was "print out the top K most-used words in a text." The six lines of shell script were
tr -cs A-Za-z '\n' |
tr A-Z a-z |
sort |
uniq -c |
sort -rn |
sed ${1}q
A damning counter. But neither of us had ever read the paper. And as you know, I'm all about primary sources. We pulled up the paper here and read through it together. And it left us with a very different understanding of literate programming, and the challenge, than the famous story gave.
First of all, we found that Literate Programming as conceived by Knuth not just "text, code, text, code". You could say "now we add foo
to section <<bar>>
" even after having already created bar
and modified it several times before. Writing the code out-of-order is the whole point. You'd use a tool to take the narrative, or "weave" and synthesize the code, the "tangle". Very roughly, you could write:
Blah blah blah
<<foo>>
<<bar>>
Blah blah
<<bar>> = print("world
Blah
<<foo>> = print("hello ");
Blah
<<bar>> += !");
And it would weave that into the actual program.
The actual paper paints LP in a much better light than we thought it did.
- Knuth was asked to demonstrate his tool for LP, WEB, and was demonstrating LP as part of that. He wasn't competing with McIlroy, and didn't know how McIlroy was going to critique it.
- The actual content is effectively his stream-of-consciousness notes about what he's doing. He discusses why he makes the choices he does and what he's thinking about as primary concerns. It's easy to skim or deep-dive the code.
- Most of the "eight pages" aren't because Knuth is doing LP, but because he's Donald Knuth:
- One page is him setting up the problem ("what do we mean by 'word'? What if multiple words share the same frequency?") and one page is just the index.
- Another page is just about working around specific Pascal issues no modern language has, like "how do we read in an integer" and "how do we identify letters when Pascal's character set is poorly defined."
- Then there's almost four pages of handrolling a hash trie.
- Knuth is doing everything from first principles. One of McIlroy's strongest criticisms is "everything there- even input conversion and sorting-is programmed monolithically and from scratch. In particular the isolation of words, the handling of punctuation, and the treatment of case distinctions are built in." Of course it is! Knuth was trying to show how WEB makes it easier to write and understand programs, not explain how Pascal packages work! His ideas aren't invalidated just because we can import libraries.
[rant] McIlroy could only write such a tight script because we're doing something that's near perfect for shell scripting: simple processing of text. If we choose a slightly different problem, like "find the top K pairs of words and print the Levenshtein distance between each pair" then shell scripting that would be a lot harder.[/rant]
Not to say that McIlroy is entirely wrong, and he raises a lot of good points. But I don't think this is the damning indictment of LP or a total vindication of the Unix model. It also got me to look into the Leo Editor, which seems a more modern version of LP. I'll report on how it goes.
Update for the wave of new readers
This was sent as part of an email newsletter; you can subscribe here. Common topics are software history, formal methods, the theory of software engineering, and silly research dives. Updates are usually 1x a week. I also have a website where I put my more polished and heavily-edited writing (the newsletter is more for off-the-cuff stuff).
If you're reading this on the web, you can subscribe here. Updates are once a week. My main website is here.
My new book, Logic for Programmers, is now in early access! Get it here.