I finally found a use for XML
The Problem
When teaching things I like to break code up into a set of small changes, showing the differences between each change. So if I start with
// version 1
boilerplate
more boilerplate
some code
more boilerplate
I'd show the next small change as a diff, like this:
// version 2
boilerplate
more boilerplate
- some code
+ other code
more boilerplate
I might do this four or five times for a single code snippet, gradually progressing from a base version to a final version. To keep the explanations in sync, I store each intermediate as a full file and automatically generate the diff snippets from them. In Sphinx, that looks like:
.. literalinclude:: file1
   :diff: file2
Having multiple files is good because it means I can test every one of them individually and make sure every diff does have the properties I say it has. Each file is mostly the same as the previous version, except for one small change.
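Outside of Sphinx, the same kind of diff snippet can be generated from two full versions with Python's standard difflib. This is only a minimal sketch: the function name diff_versions and the two inline version strings are hypothetical stand-ins for real files.

```python
import difflib


def diff_versions(old: str, new: str,
                  old_name: str = "file1", new_name: str = "file2") -> str:
    """Return a unified diff between two full versions of a snippet."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(lines)


# Stand-ins for the contents of two version files.
version1 = "boilerplate\nmore boilerplate\nsome code\nmore boilerplate\n"
version2 = "boilerplate\nmore boilerplate\nother code\nmore boilerplate\n"

print(diff_versions(version1, version2))
```

Because each version is a complete file, each one can still be run or checked on its own before the diff is generated.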
Changing the boilerplate sucks though. If I make any tweaks to the initial version of the code, I have to manually change every single file to keep them all in sync. With just a first and final version that's not too bad, but if I have four intermediates that gets tiring really fast.
I wanted a way to put all of the versions in a single file, marking up which lines belong to which versions. Prior art is beamer, a LaTeX package for making slides. In beamer, each slide "frame" compiles into a set of slides. You can write a \onslide[2]{b} to say that slide 1 of the frame contains "a" and slide 2 contains "a b". But since writing documentation in LaTeX is almost as bad as writing it in markdown, I looked for a more general-purpose solution. Everything had problems:
- Software template libraries like handlebars and jinja2 are for filling fixed formats with collections of prestructured data, like <li>Name: {firstname} {lastname}</li>. My data is fixed and I need the logic to be flexible.
- I looked into m4 like three or four times but always bounced off; it's just too messy.
- Several people recommended storing the versions as git commits and rebasing whenever I needed to change something. This would be ridiculously heavyweight and still require a ton of manual labor.
There was no available tooling to do this, so I needed to roll my own.
XML to the rescue!
I came up with my own lightweight XML format. The only tag is s, for switch. <s on="1,3-4">foo</s> means that foo should be present in versions 1, 3, and 4 of the file. Ranges can be open-ended for convenience, so on="2-" means version 2 and everything after it.
IsUnique(s) == <s on="1">Cardinality(seen) = Len(s)</s><s on="2-">
\A i, j \in 1..Len(s):
<s on="3">i # j => </s>seq[i] # seq[j]</s>
(You can see a full template file here.)
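To make the on syntax concrete, here is a minimal sketch of how such a value could be checked against a version number. The function name version_matches is hypothetical. Treating a missing lower bound (like "-3") as "from version 1" is also an assumption; the format as described only shows ranges that are open at the top.

```python
def version_matches(spec: str, version: int) -> bool:
    """Return True if `version` is selected by an on-spec like "1,3-4" or "2-"."""
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            # Assumption: a missing lower bound means "from version 1";
            # a missing upper bound means "no upper limit".
            lo = int(low) if low.strip() else 1
            hi = int(high) if high.strip() else None
            if version >= lo and (hi is None or version <= hi):
                return True
        elif int(part) == version:
            return True
    return False


for v in range(1, 6):
    print(v, version_matches("1,3-4", v), version_matches("2-", v))
```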
I also wrote a script to parse the XML and generate the corresponding files. For each version, it uses an XPath query to find all s tags and checks whether each one matches that version. If a tag doesn't match, it simply deletes the text in the tag and all its children. Since XML is naturally tree-like this means I can nest changes inside other changes, making really intricate progressions possible with minimal boilerplate.
\* file__1.tla
IsUnique(s) == Cardinality(seen) = Len(s)
\* file__2.tla
IsUnique(s) ==
\A i, j \in 1..Len(s):
seq[i] # seq[j]
\* file__3.tla
IsUnique(s) ==
\A i, j \in 1..Len(s):
i # j => seq[i] # seq[j]
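The generation step can be sketched in a few lines of Python. This is not the actual script: instead of running an XPath query and deleting nodes, it walks the tree recursively and collects only the text that belongs to the requested version, which gives the same result for nested switches. The <root> wrapper, the function names, and the inline template are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical template. The <root> wrapper is an assumption, added because
# an XML document needs a single top-level element.
TEMPLATE = r"""<root>IsUnique(s) == <s on="1">Cardinality(seen) = Len(s)</s><s on="2-">
  \A i, j \in 1..Len(s):
    <s on="3">i # j => </s>seq[i] # seq[j]</s>
</root>"""


def matches(spec: str, version: int) -> bool:
    """Check a version against an on value such as "1,3-4" or "2-"."""
    for part in spec.split(","):
        low, dash, high = part.strip().partition("-")
        if not dash:
            if int(low) == version:
                return True
        elif int(low or 1) <= version and (not high or version <= int(high)):
            return True
    return False


def render(node: ET.Element, version: int) -> str:
    """Collect the text of `node`, skipping children that do not match."""
    parts = [node.text or ""]
    for child in node:
        on = child.get("on")
        # Assumption: a tag without an on attribute is always kept.
        if on is None or matches(on, version):
            parts.append(render(child, version))
        # Tail text sits after the child's closing tag, so it belongs to the
        # parent and is kept whether or not the child was included.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(TEMPLATE)
for version in (1, 2, 3):
    print(f"--- version {version} ---")
    print(render(root, version), end="")
```

Skipping a child skips everything nested inside it, which is what lets switches nest. The text after a closing tag is stored on the child as its tail, so it has to be kept even when the child itself is dropped.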
What tickles me is that XML is exactly the right choice for this. Two reasons why JSON/YAML/TOML wouldn't work:
- I need inline markup, not just line-level markup.
- I need to preserve text formatting, like indentation and newlines.
In other words, I'm working on a semantic text document. JSON et al are better than XML for configs and API formats, but they're not text markup languages. For that, XML still remains the right tool for the job. And I didn't even know XML before! It was so much better than the alternatives that I learned it just for this one project.
Now just because it's the right tool doesn't mean it's the best possible tool. Like everybody else I find it annoyingly verbose, and I don't like having to write <= as &lt;=. Pollen seems like it would be a lot better, since I could write my switches as ◊s[1,3-4]{foo}. But that would mean migrating everything to Racket, which I can't do (for this project). As it stands, XML does a great job solving my problem.
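The escaping at least doesn't have to be done by hand. A minimal sketch, assuming code snippets are run through Python's standard xml.sax.saxutils.escape before being pasted into a template. That step is an assumption for illustration, not part of the script described above.

```python
from xml.sax.saxutils import escape, unescape

# A hypothetical snippet containing characters that are special in XML.
snippet = r"x <= y /\ y >= z"

escaped = escape(snippet)
print(escaped)

# Unescaping gives back the original text.
print(unescape(escaped))
```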
Mailbag next week
I enjoyed the one back in December and think it would be fun to repeat every 3-4 months. So next week I'll be answering reader questions. Send them here.
New York Trip
(This is a bit silly but I really want to see if it works)
I have to go to NYC April 9th for a wedding. Normally I'd do the "get in Friday, wedding Saturday, leave Sunday" thing, but I think it would be fun to spend a bit more time in the city. That's a bit more feasible if I can get flight and hotel covered. So if anybody wants to host a talk at their company or anything, let me know! I can do the usual prepped ones on TLA+, ESE, or the Crossover Project, or more informal stuff on... well, pretty much anything I've written about. We can figure out the specifics together. Time should be within a few weekdays of the 9th, either before or after works for me.
I'd also be up for regular consulting gigs on TLA+/Alloy spec review or pairing; see my consulting page here.
(If this works and I get to spend a few extra days in New York, I'll also make time to meet with people one-on-one. I'll let you all know next week!)