Where did the web go?
This post is about the preservation of digital things.
I wonder if you'll still be able to read it in 25 years?
Hacking my way around town
There was a time in the late 1990s when I got a name for myself as a freelancer-about-town, catering to a specific niche of Oxford-based publishers who needed a website built.
From humble beginnings when a friend introduced me to an editor at Oxford University Press, I ended up with a nice little portfolio of sites for the likes of Elsevier, Blackwell and Macmillan, as well as some direct work with various OUP authors.
My combination of skills as a reasonably competent designer, being enough of a code monkey to hack together some HTML and JavaScript (oh, and FTP some files – important to finish the job), and having the ability to translate what people said they wanted into a vaguely coherent end product, helped fuel this sideline. I was a multidisciplinary team of one for hire – yay for generalists.
For five or so years my output was pretty prolific, both through the freelance work and as part of a burgeoning web agency.
Those were formative times: the web was still relatively new and disciplines were only just getting established, and there was a huge amount of creative and technological exploration about exactly where it could all lead and how best to deliver online experiences.
I distinctly remember a conversation with my boss at the time, who had just purchased a minuscule Nokia phone, and we ended up cranking out some pages via WAP. Holy guacamole, you can even do this web stuff on a mobile phone!
Looking back now it’s hard to find a solid record from that period. You can see a historical snapshot of many websites on the Wayback Machine, but in many cases it captures only fragments, and certainly doesn’t cover more elaborate functionality. Spectacular and often highly sophisticated Flash games and animations, pre-YouTube .mov files, even basic image maps – all buried without trace.
Up until last year I took peculiar pleasure knowing that at least one of the websites I’d created during that phase remained online decades later. The online home of children’s book illustrator Korky Paul, dating back to 1997, proudly stood as an artefact of that bygone web era.
Alas I was devastated (OK, maybe more wistful and nostalgic) that it got a full makeover last year, and in one fell swoop the last remnant of my freelance heyday evaporated into cyberspace.
Nothing ever lasts forever
I think it’s interesting to observe just how much of the early web you now can’t get your hands on in any meaningful way.
By contrast, despite being unable to access the files, formatting and code I was absorbed in 25 years ago, I can still head to Amazon and purchase a book about Scottish mountains that I designed in 1996.
For all of the ‘technology will kill the…’ narrative that gets pumped out when every new wave of bullshit innovation hits, in retrospect you can almost always find a contradictory point of view.
Thus when we're told Kindles will replace books, music will all be digital, cryptocurrencies will usurp the banking system, we’ll be doing our daily stand-ups in the metaverse, AI will wash the dishes, and we’ll all be watching movies with clunking over-priced ski goggles on our heads, sometimes a pinch of salt is required.
I’m obviously not suggesting that technology hasn’t had a profound impact on all of the industries and mediums I’ve listed above. Just maybe that it’s not always as extreme or game-changing as we're led to expect.
Therefore it’s worth being aware, or even concerned, that while there’s a seemingly bottomless pit of investment in new and shiny tech, some of the important and foundational stuff can get left to languish, or slowly fade into the background.
I love this episode of Cautionary Tales where Tim Harford examines the oddly British story of the Domesday Project, and the heroic efforts of individual enthusiasts in preventing a huge swathe of geographical and historical data gathered in the 1980s from disappearing entirely.
As the show notes say:
We tend to take archives for granted — but preservation doesn’t happen by accident, and digitisation doesn’t mean that something will last forever. And the erasure of the historical record has disastrous consequences for humanity.
Although it may seem a touch trivial, the middle-class revolt that occurred in 2016 when the BBC threatened to get rid of its online recipe archive is a really great example of what can be lost.
Thankfully the archive was saved for a nation who couldn’t conceive of life without a quick soup recipe containing minimal ingredients and rated at least four stars.
But what would have happened if the decision hadn’t been reversed? And what if it wasn’t recipes under threat, but a massive chunk of social history, or material that captured the zeitgeist in a way that only a digital format could?
This topic is raised in BBC Future when it highlights the online data that's being deleted, and discusses some of the reasons – some unimportant, some less so – why content vanishes, and the potential implications.
Through one of my previous roles when I worked at National Museums Scotland, I know that there are people who spend all day every day working on preservation – mostly real objects, but also intangible material.
From my time there I can recall a discussion about online storage, a subject that I had up until that point thought remarkably little about.
When my slightly evangelistic perspective about 'moving everything to the Cloud' was challenged with the counterpoint that I needed to think about how assets might be accessed in 100 years' time, it helped to hit home that we may all be creating our own Domesday Projects, condemned by a lack of forward thinking and technical obsolescence.
As a case in point Seb Chan's fascinating article from 2013 talks about the Cooper Hewitt Museum's project to preserve code from an iOS app.
At least when considering the preservation of online data, you're dealing with content available on the (largely) free and open web. When it comes to proprietary systems and hardware it's a whole other ball game. As he writes:
Software written for the first iPhones, released only six years ago in 2007, no longer works on today’s iPhones. It might be because the operating system was taught a fresh new way of thinking about things. It might be because new hardware was invented that is foreign to and misunderstood by the past. Often it’s both.
The fact that many of the image links in the post are now broken tells a story in itself about the fragility of digital life.
Whose job is it anyway?
My thinking for this post was sparked by a recent series of stories about Google retiring its web cache feature.
This was largely reported in the technology press and social media, but not really picked up by mainstream outlets. It's not like they're getting rid of our goddam recipes! Oh, wait.
I'm not 100% clear from the reporting, but Google seems to be delegating responsibility for backing up the Internet to the Internet Archive, an online library that's been running since the 1990s.
The archive is a remarkable resource, and you definitely get a whiff of the olde worlde Internet when you peruse its quirky range of content.
But it's a non-profit, and therefore reliant on philanthropy and donations to keep it ticking along. And as the article above notes, its records are less extensive than Google's own archives.
The subject matter of some of the specific archives (the Grateful Dead, U.S. political TV adverts, Remembering 9/11) also points to the fact it, not unreasonably, leans to an American-centric interpretation of history.
The sad truth is, we don't really have an answer to the question of whose responsibility it is to curate and care for the world's digital media on an ongoing basis.
If it's left in the hands of commercial companies then you're always running the risk of such a large, costly and unprofitable initiative being cut.
Add to that the risk of content being removed in ways that others would only countenance in exceptional circumstances, allowing those with money, power and influence to sculpt their own version of events.
And when it comes to long-term archiving strategies, is it really going to be down to AWS or Microsoft to determine how this comes to pass?
I absolutely take the point that this is an area you could massively overthink, particularly when considering the petabytes of data created every single day.
But it also doesn't feel unimportant to be considering how future generations will understand what went on in the decades after the web came onstream.
I may have lost a few funky little websites; the risk is we all lose knowledge and resources of far greater consequence.