Back Just In Time To Close Out the Year
Been a while, internet! This is my first newsletter since May 12. Let’s see, where to begin…
For the bulk of the last three and a half months, I have basically been “in transit”. And in order to explain that, I have to go right back to the very beginning of the pandemic, with a breakup and an obscenely✱ expensive downtown Vancouver apartment. She had picked it out when we moved in together, but she couldn’t afford it on her own, so the place went to me. I had just landed a terrifically lucrative contract—more on that story some other time—that was just about to start as everything was going into lockdown, so this arrangement suited me just fine—or at least as fine as it could, given the circumstances.
✱ The apartment was only modestly above normal-expensive by Vancouver standards, but it was definitely more suitable for a couple—and their paycheques.
Come around the beginning of this year, as the first vaccinations were starting to nibble at the third wave of infections, this situation was no longer suiting me just fine. I was starting to think I should proooobably start thinking about finding somewhere to live that wasn’t 300 too many square feet and 500 too many dollars. The problem was: I couldn’t picture it. I don’t know if it was pandemic torpor, but I was completely incapable of imagining what my next home in Vancouver would look like. So I just…did nothing.
Fast forward to July: my grandmother dies✱. She left behind a modest house in a nice part of Toronto, and the gargantuan chore of getting it into a state where it could be sold. A few days after that, for the first time in months, I had some clarity: there’s plenty of room in the house in Toronto, and my dad and his three siblings could use my help with it. So I gave notice on my beautiful downtown Vancouver apartment—that had actually become kind of a sad and empty place anyway—tetrised everything I own into storage, and got back on a plane.
✱ After a few weeks of a sudden and acute illness. Not covid. She was 93. I flew out to see her the week before she passed and left the day before—I had to attend another funeral, for my aunt—also not covid—back in Vancouver.
The three months that ensued are not terribly remarkable, other than the fact that the realtor insisted everybody vacate so he could blitz as many prospects through as possible, which resulted in me spending two weeks sharing a 400-square-foot hotel suite with my dad (at least it was an actual suite). And that was after losing close to $800—about a third of what I managed to claw back—after ejecting from the shittiest yet most artfully misrepresented AirBnB I had ever seen. The final wrinkle in that story was that the family who eventually bought the house wanted possession in four weeks, which, while I can’t fault my dad and his siblings for taking the deal, wasn’t part of the original plan. So I ended up spending the last twelve days of November in a hotel and two more (much nicer) AirBnBs, all the while trying to finish a consulting project I picked up in early October.
And now I’m here.
And I have a colossal backlog of stuff I’ve been thinking about but haven’t been able to write down.
I ended up writing a whole other 4000-word thing and subsequently decided I wanted to sit on it, so here is what I was originally going to write about: the project I just completed.
Not Everything Was a Goat Rodeo
During this turbulent period, I somehow managed to pitch, negotiate, execute, and finish a fairly significant consulting project. I think I took meetings in half a dozen different places. It was a straightforward strategery gig: I scooped up a bunch of information, figured out what it meant, infused it with insights, and presented it. I also did a little bit of exploratory programming—made them a library for doing some stuff with a little command-line wrapper around it. This time the project was around internationalization and localization, something I have done before (I designed, wrote, and maintained the entire L10N infrastructure for my employer from 2002 through 2005 by myself), but not in a while (it turns out not to have changed much).
Since early 2018, I have been delivering my work using a little client extranet I have cobbled together—although I have (still mostly unrealized) plans for it going all the way back to early 2009. The purpose of delivering my work product by extranet is not only so there is one authoritative place for it, but also to exercise a particular fine-grained control over it that can’t be achieved with conventional documents.
I am speaking specifically of the style I am cultivating, which I call dense hypermedia. The key assertion that underpins dense hypermedia is that conventional documents, up to and including Web pages, are sparse in the amount of information they carry. This is especially significant for business documents, which are not only sparse (Edward Tufte wrote an entire essay on the information content of PowerPoint) but also tend to refer to things informally—people, companies, products, projects, places, events, concepts, arguments, decisions, quantities, even other documents—with no hyperlink. Not only, in this situation, do readers have to look up references manually, but authors have to write more words to expose and contextualize a given reference. The implication is that denser documents, with more hyperlinks, could require considerably less text to convey the same amount of information.
Take the average website and bleach off everything but the main page content (that is, strip the navigation, sidebars, header, footer, whatever), and you probably won’t find many links in what remains; certainly not many more than you would in a Word document, because people are still authoring Web pages in Microsoft Word (or Google Docs if you’re hip, or Markdown if you’re l33t). There still isn’t much (although there’s not zero) in the way of link-heavy text authoring interfaces.
The big deliverable for this project described a number of processes, so the narrative form of a document actually suited it. However, the other big deliverable was something like an inventory: extremely formulaic in structure, and while it needed to be represented as a document, it was arguably better suited as a database. Even the narrative document referred to people, companies, products, concepts, et cetera. This data could be encoded into the deliverable, and then pulled out and flipped around to make the entire set more valuable: show me all document sections that mention product X, all initiatives where person Y is a stakeholder (and who reports to them), the meeting where this or that decision was made (and who made it, and who was present), the content of the decision itself, its rationale, and so on. All this data could be encoded as hard nodes and connections and embedded into the documents. So I did. At least some of it.
Nobody asked for it (well, except for the part that they did ask for), but this is easily the most, uh, infused I have ever made a set of client deliverables. The most voluminous category I embedded, aside from some additional document semantics that I use mainly for controlling the layout (I mean, it’s there, why not), is a glossary of over a hundred concepts used throughout the documents. And, since the glossary is derived from the content, I also know where, down to the section (I could go down to the paragraph but it gets messy), the terms are referenced. This means the glossary is also an index. Furthermore, since I make these in SKOS, they have additional structure: semantic relations between concepts (broader, narrower, related), named collections, etc.
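To give a sense of what that encoding looks like, here is a minimal sketch in Python with rdflib; the URIs, and the choice of dct:references as the section-to-concept predicate, are placeholders rather than the actual vocabulary in the deliverables:

```python
# A minimal sketch of a SKOS glossary that doubles as an index.
# Everything here (URIs, labels, predicates) is a placeholder.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import SKOS, DCTERMS, RDF

EX = Namespace("https://client.example/")  # hypothetical extranet namespace
g = Graph()

# Two glossary concepts with a semantic relation between them.
loc, i18n = EX["concept/localization"], EX["concept/internationalization"]
for concept, label in ((loc, "localization"), (i18n, "internationalization")):
    g.add((concept, RDF.type, SKOS.Concept))
    g.add((concept, SKOS.prefLabel, Literal(label, lang="en")))
g.add((loc, SKOS.broader, i18n))

# Document sections declare which concepts they reference; this is the
# data that turns the glossary into an index.
section = EX["process-doc#translation-workflow"]
g.add((section, DCTERMS.references, loc))

# The "index" view: every section that references a given concept.
for s in g.subjects(DCTERMS.references, loc):
    print(s)
```

The same triples that generate the glossary entries are the ones you walk backwards to get the index.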
I figure I encoded probably a third to half of everything worth encoding, which isn’t bad for almost no infrastructure and no explicit mandate from the client.
This is another case of a document which is just a freeze-dried database, so author the database, generate a document out of it, then embed the database right into the document. It will actually survive the round trip. I do a number of these now—the glossary on my own website (which needs the same treatment) is generated from the same code—but I’m doing other forms too. The goal is to be able to round-trip the data contained within them; that is, to regenerate an identical document from the embedded data.
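Here is the round trip itself as a minimal sketch: author a graph, embed it in the generated document (as JSON-LD here, purely for brevity), pull it back out, and check that what comes out is identical to what went in. The HTML scaffolding and the string-slicing extraction are crude stand-ins for the real pipeline, and the names are made up.

```python
# A minimal sketch of the freeze-dried-database round trip.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import SKOS, RDF
from rdflib.compare import isomorphic

EX = Namespace("https://client.example/")

source = Graph()
source.add((EX.term, RDF.type, SKOS.Concept))
source.add((EX.term, SKOS.prefLabel, Literal("a glossary term", lang="en")))

# "Generate" the document with the database embedded right in it.
payload = source.serialize(format="json-ld")
html = ('<html><body><h1>Glossary</h1><p>a glossary term</p>'
        '<script type="application/ld+json">' + payload + '</script>'
        '</body></html>')

# Pull the database back out of the document...
embedded = html.split('<script type="application/ld+json">')[1].split('</script>')[0]
recovered = Graph().parse(data=embedded, format="json-ld")

# ...and confirm it survived the trip.
assert isomorphic(source, recovered)
```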
I consider what I created on this project to be somewhat transitional: the target was documents; the client wanted documents and they got documents. Over two months I wrote about a dozen Web pages (a few of those were pieces of composite documents, so more like eight or nine) into my little client extranet. At the end of the job, they asked for a PDF version: 245 pages. I had literally written them an entire book.
Exactly one hundred pages of this document is the glossary/index, which, in retrospect, I probably could have fiddled with to make it take up less space (it’s the backlinks, mostly). I generated the PDF by first transforming the extranet into LaTeX and then massaging it from there.
I think it’s remarkable—and it’s why I consider this project to be transitional—that a fistful of Web pages, when typeset for print, becomes an entire book. A 245-page book (or even its ~90-page core) is a daunting artifact. Moreover, because we were dealing with a mix of business, editorial, and technical concerns, no one person at the client’s organization would read the entire thing end to end.
This was potentially a missed opportunity to deploy some rudimentary audience mapping, tailored to the actual stakeholders on the project:
create a list of stakeholders within the organization (which I already had, since I interviewed them all)
create audience archetypes (business, editorial, technical) and attach the stakeholders to them
harvest the set of concepts from the documents (including what concept was referenced where)
map audiences to a handful of concepts (I had previously come up with the relations aware-of, understands, values, eschews)
The essential criterion for determining a document’s audience, in my opinion, is whether the audience under consideration can understand it. There is an inclination to say that relevance is most important, but a relevant document you can’t understand is useless. Here is an opportunity for some interplay: you have a document (section) which you intend to be for a certain audience, but it mentions concepts they don’t understand. So you either fix the document by adding definitions of and introductions to said concepts (if you deem them essential), or you split out the part that deals with the offending concepts and reserve it for a different audience. While this editorial work ultimately has to be done by hand, it can be detected on at least a semiautomatic basis.
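Here is the shape of that semiautomatic check as a toy sketch in plain Python, using the structures from the list above; every name in it is hypothetical:

```python
# A toy sketch: flag document sections that mention concepts their
# intended audience is not recorded as understanding. All names are
# hypothetical.

# Stakeholders attached to audience archetypes.
stakeholders = {"alice": "business", "bev": "editorial", "carol": "technical"}

# Audience -> concepts, via the "understands" relation; "aware-of",
# "values", and "eschews" would be parallel maps.
understands = {
    "business":  {"budget", "timeline"},
    "editorial": {"style guide", "glossary"},
    "technical": {"message catalogue", "locale negotiation"},
}

# Harvested from the documents: each section's intended audience and the
# concepts it references.
sections = {
    "overview":        {"audience": "business",  "concepts": {"budget", "locale negotiation"}},
    "string-handling": {"audience": "technical", "concepts": {"message catalogue"}},
}

for name, info in sections.items():
    gap = info["concepts"] - understands[info["audience"]]
    if gap:
        # Either introduce these concepts in the section, or split that
        # material out for a different audience.
        readers = [p for p, a in stakeholders.items() if a == info["audience"]]
        print(f"{name} (for {', '.join(readers)}): may not understand {sorted(gap)}")
```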
This, I think, is another place where we can see the benefit of delivering consulting work product as hypermedia: we can map document sections (again, you could do individual paragraphs if you felt like dealing with thousands of objects instead of mere hundreds) to audiences, and then resolve those audiences to individual people. Then we can curate the absolute minimum subset of text that each individual (often very busy) person needs to read to get value out of the work product. Since it’s hypertext, they can venture farther afield if they want to, if and when they have the time.
I should remark here that my infrastructure for doing this is incredibly sketchy—as in the code is tantamount to a sketch—as nobody has ever asked me directly for its outcome. So if I’m doing it, it’s off the corner of my desk.
Anyway, that’s what we can do with just concepts and audiences, and there are a whackload more viable data objects in there besides.
In addition to this augmented documentation, I also wrote two OWL ontologies: one was a line item and the other was not. The line-item one was quite a narrowly-targeted thing which nevertheless has considerable generic applicability, while the other was more of a grab-bag of formal entities that weren’t represented either conspicuously or accurately enough by existing ontologies. I think I am just going to bake the latter in as part of the service: create the organization-specific ontology on the assumption that you’re not going to have the time to find and audit existing ontologies for everything you’re planning to model; you can go back and fill them in later.
For those of you who don’t know what an ontology is in this context, it’s a controlled vocabulary that defines the semantics for not only classes of entities, but also relations between classes of entities (also known as attributes, predicates, properties). In the particular case of OWL (and/or RDF Schema), these definitions carry rules that enable us to infer latent facts from asserted data, which turns out to be pretty powerful.
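As a minimal illustration of what that inference buys you (made-up terms, and only one RDFS rule applied by hand; a real reasoner applies the whole rule set): if the ontology says anything that manages something is a Manager, and the data only says that alice manages a project, then the fact that alice is a Manager falls out for free.

```python
# A hand-rolled illustration of one RDFS inference rule (rdfs:domain),
# with made-up terms; a real reasoner applies many such rules.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("https://client.example/")
g = Graph()

# Ontology: anything that manages something is a Manager.
g.add((EX.manages, RDFS.domain, EX.Manager))
# Data: alice manages the glossary project. Nothing says she is a Manager.
g.add((EX.alice, EX.manages, EX.glossaryProject))

# Rule: ?p rdfs:domain ?c . ?s ?p ?o  =>  ?s rdf:type ?c
inferred = [
    (s, RDF.type, c)
    for p, c in g.subject_objects(RDFS.domain)
    for s, _ in g.subject_objects(p)
]
for triple in inferred:
    g.add(triple)

# The latent fact, never asserted directly:
assert (EX.alice, RDF.type, EX.Manager) in g
```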
Gearing Up For the Next One
There are a few other things on the back end I would like to get up and running before my next project. The first is the matter of authenticating to my little extranet. This time around, I just notified everybody involved of a shared username and password. It did the trick but it wasn’t ideal. I do, however, have a little piece of middleware that I wrote that recapitulates the forgot-my-password loop: it emails you a one-time magic link, and clicking it logs you in. My wish here is not to have to manage user accounts or passwords at all. I nevertheless need to protect confidential client deliverables. So the way I (and numerous other entities) see it, this magic link is no less secure than your email account.
I blasted this thing out over a week in May 2019 (this is not the first time I have been around the authentication module block), and actually used it to present a pitch in September of that year. What I would do is just send the one-off link to the prospect, whose first encounter with the extranet was that of already being logged in, because they were. This got the prospective client all bent out of shape, and we were unable to convince them that the material was secure.
It actually turns out to be okay that we didn’t get this job, because working with the person I pitched it with turned out to be one of the worst experiences I’ve ever had with anybody in this industry; a fact I would have discovered with even higher stakes attached than when I actually did.
As for the extranet, once again, it was no less secure than their own corporate email accounts.
The fix for this short-circuited forgot-my-password authentication process, I’m thinking, is pure security theatre: force the user through the process themselves. Show them that the site is off limits without special access. Put up a page with a form input that says something like “enter your email address and we’ll send you a special link to access this site”. On the back end, I can just have a list of valid addresses for a given extranet (or since client teams tend to all have email accounts on the same domain, add a wildcard). The problem with this plan is that it turns a project that was started and finished in under a week into a whole Thing™ with actual significant UX design requirements. Plus it touches email, which is always a pain in the ass. If I grit my teeth, I suppose I can get this done by mid-January, or at least in time for my next engagement.
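The mechanics underneath are nothing exotic. Here is a minimal sketch using only the Python standard library: an HMAC-signed, time-limited token bound to the requester’s email address. The email sending, the per-extranet allow-list storage, the domain wildcard, and all of the UX are left out, and every name is hypothetical.

```python
# A minimal sketch of the magic-link flow: sign the requester's address
# and an expiry time, mail them the resulting link (not shown), and
# verify the signature when they come back. All names are hypothetical.
import hashlib, hmac, time

SECRET = b"keep-this-out-of-version-control"
ALLOWED = {"alice@client.example", "bev@client.example"}  # or match a whole domain
TTL = 15 * 60  # seconds the link stays valid

def make_link(email, base="https://extranet.example/login"):
    if email not in ALLOWED:
        return None  # silently drop: don't reveal who is on the list
    expires = str(int(time.time()) + TTL)
    sig = hmac.new(SECRET, f"{email}|{expires}".encode(), hashlib.sha256).hexdigest()
    return f"{base}?email={email}&expires={expires}&sig={sig}"

def verify(email, expires, sig):
    expected = hmac.new(SECRET, f"{email}|{expires}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig) and int(expires) > time.time()

print(make_link("alice@client.example"))
```

Which is to say the hard part really is the email plumbing and the UX, not the token.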
Once logging in to my little client extranet is solved, there is another piece of infrastructure I’d like to set up for future projects. Now, I didn’t really log my hours on this project because I wasn’t really billing ’em. (I did, however, log my time on specific things when I wanted to remember how long they took.) However, as a line item on the project, the client specifically asked for weekly written status reports. I thought, I’ll do better than that, and just keep a daily journal. Really nothing more than a grouping of commit-message-length blurbs of what I did that day. Now, I was doing these totally freeform, but it would be pretty sweet to have a little Twitteresque text box somewhere on the extranet that I can just type into.
I am actually pretty surprised at my discipline for keeping that journal, which only really fell apart near the end of the project when I was churning out text. This is fine, because all the journal would have said for the last couple weeks was “Writing.”
I should note as well that I am a big proponent of doing forensics on everyday information offgassing for reporting tasks that nobody should ever have to do by hand. Time stamps on files and version control, for example, are a pretty decent proxy for when you quit for the day (and in the latter case, where you were at when you quit). Of course the problem is I have to remember to commit before I quit for the day, and have been trying to decide on a warning mechanism that wouldn’t be too annoying.
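One candidate for that mechanism, sketched below with a hypothetical list of repository paths: a script run from cron or a shell-logout hook that only speaks up when there is actually something uncommitted.

```python
# A sketch of a low-annoyance end-of-day nag: walk a (hypothetical) list
# of working repositories and complain only about the ones with
# uncommitted changes.
import subprocess
from pathlib import Path

REPOS = [Path.home() / "projects" / "extranet"]  # wherever the work lives

for repo in REPOS:
    dirty = subprocess.run(
        ["git", "-C", str(repo), "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    if dirty:
        print(f"{repo}: uncommitted changes; commit before you quit for the day")
```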
The penultimate improvement I had in mind was around the fact that I was working out in the open, meaning I would be writing documents as people were reading them. It occurred to me that I could probably write a few lines of JavaScript that would notify onlookers to refresh the page they were currently looking at if I had updated it.
But First…
What I’m planning to work on over the next few weeks is an overhaul of my breadboard-slash-Swiss-army-knife for Things I Believe Web Content Should Be Able To Do. The goal is to take it from a sketchy bag of code that can only be run from a developer REPL, to an actual cohesive tool that somebody other than me can use.
Part of my motivation for the overhaul is that I used this infrastructure to wrangle all the content and metadata for this latest client project. I gave the client a taxidermied copy (the thing is more or less a static website generator anyway) for them to put online internally, but if they wanted to regenerate it for some reason (not that it was part of the deal, and I don’t expect them to), they’d be at a bit of an impasse.
The main reason, however, is just that It Is Time™. I hacked on this thing a bit during the project in order to get some necessary functionality working that I had not previously needed, and so had left as a stub. It turns out that adding said functionality made a routine process, one that gets executed thousands of times over the course of generating a website, intolerably slow. Like seconds-into-minutes kind of slow. So that part definitely needs to get rewritten pronto. But doing that is going to set off a chain reaction that is going to end up with me overhauling the entire thing; I’m not even going to try to pretend it won’t.
Neurologically speaking, “instant” is anything that happens in under about a hundred milliseconds. It is incredibly important, especially for computer stuff, that if you can’t outright deliver in that timespan, you at least show the user that something is happening. Anything longer than that and people start to think that whatever they did to get the result they were expecting didn’t work, and that is immensely frustrating. Now, I am not currently generating anything on the fly, but I will be, so the part that is currently intolerably slow needs to be fast by then.
Mostly, what I need is a workbench for developing this technique, this thing I’m calling dense hypermedia. What I want to do is vastly increase my capacity for communicating complex ideas and develop some innovative methods around persuasion and storytelling that have fallen into disuse (at least, outside of a super niche indie game-adjacent scene) since the HyperCard days. The problem is that when you try to do this on the Web, there are too many pieces, which are too small and contain too many links, to manage by conventional means. In the process of solving for this, though, I will solve a bunch of other perennial problems that have plagued the Web since its inception.
Take addressing, for example. Everybody hates a 404. Link rot was in the news briefly this year, and the numbers are abysmal. The mean survival time for a URL is something like 90 days. Then consider the fraction of those that aren’t a website going out of business, but a page that was merely moved or renamed.
In this situation, the 404 is a lie. Or maybe just incompetence: the website is just too dumb to know that there was something there and where it is now. If the page did get deleted, there is another error code for that: 410 (which you almost never see), the website acknowledging that there was something there and it has been deliberately removed. In either case, you need some kind of mechanism for remembering which addresses have ever been exposed, and where the content behind each of those addresses actually is now, if it is anywhere at all. In my opinion this should be a content management system’s number-one job. (It is rarely even a human’s job.)
This addressing issue is actually right at the centre of everything: as with ordinary files, if you want to refer to a page on the Web, you need to call it something. Unlike with files, though, what you call it is actually consequential, for a number of reasons. Since you can’t refer to a page (or in a lot of cases, even save the damn thing) without deciding on a name for it, you either have to stop what you’re doing and do that work right now, or come up with a temporary one. But if you do that, you will need to rename it eventually, along with updating all the references to it. Many of these references you will not control, because they will be on other pages on other people’s websites. As such, you have to actually put that renaming logic in your website’s address resolver itself.
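Concretely, the resolver logic amounts to something like the following toy sketch (the paths and the table are made up): remember every address that has ever been exposed and what became of it, and only answer 404 when an address genuinely never existed.

```python
# A toy sketch of a resolver that remembers every address ever exposed,
# so a request gets a redirect (301) or an honest "gone" (410) instead
# of a reflexive 404. The table and paths are made up.
HISTORY = {
    "/essays/dense-hypermedia-draft": "/essays/dense-hypermedia",  # renamed
    "/essays/dense-hypermedia":       "/essays/dense-hypermedia",  # current
    "/scratch/old-experiment":        None,                        # deliberately removed
}

def resolve(path):
    if path not in HISTORY:
        return 404, None        # never existed: the only honest 404
    target = HISTORY[path]
    if target is None:
        return 410, None        # existed once, removed on purpose
    if target != path:
        return 301, target      # moved or renamed: redirect
    return 200, path            # still right where it says it is

print(resolve("/essays/dense-hypermedia-draft"))  # (301, '/essays/dense-hypermedia')
```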
So a big chunk of this project (indeed the part that is currently intolerably slow) is exactly this address resolver that is impervious to 404s. The rest of it is basically routines for managing metadata and generating boilerplate.
My hope, though, in the coming year, is to use this updated infrastructure to start doing more dense hypermedia projects, and at least one or two client gigs where that’s the actual point of the engagement. I’ve also set up the ol’ streameroo again, so I’ll be streaming the overhaul from time to time at twitch.tv/doriantaylor.