Information is Relational
Google's AI Overviews Fails Helpfully Highlight a Source of Danger
By Emily
We tend to think of information as a set of objective facts that just exist in the world, and of reliable information sources as somehow self-evidently reliable. But in fact, information and the information ecosystem are inherently relational. The marks on the page (or pixels or bytes) are just that, unless someone is interpreting them. A news site only works as a news site and a satirical blog only works as a satirical blog because their audiences understand them to be those things.
When Google, Microsoft, and OpenAI try to insert so-called “AI” systems (driven by LLMs) between information seekers and information providers, they are interrupting the ability to make and maintain those relationships. (They are also disrupting the economic relationships that information providers enter into: If the search website displays answers directly, information seekers don't click through to the underlying pages, and those pages lose advertising revenue. But that is not my focus here.)
I've been writing for a while, in academic papers as well as blog posts and op-eds, about why LLMs are a bad replacement for search engines, and how synthetic media spills are polluting our information ecosystem.
One of the key points is that, even if the answers provided could be magically made to be always “correct” (an impossible goal, for many reasons, but bear with me), chatbot-mediated information access systems interrupt a key sense-making process.
An example I like to use is as follows:
Say you enter a medical query into a traditional search engine (think one that would return “10 blue links”), and the links you get point to a variety of sites. Perhaps you're offered links to the Mayo Clinic, WebMD, Dr. Oz's site, and a forum where people navigating similar medical questions are discussing their own experiences. As a denizen of the Internet in 2024, you have had the opportunity to form opinions about these different sites and how to situate the information provided by each. You might know the Mayo Clinic as a renowned cancer treatment center, WebMD as a commercial web property but one that does work with MDs to vet information, and Dr. Oz as a charlatan. The forum is particularly interesting, because any given answer lifted from such a site might be the kind of thing you'd want to confirm before acting on, but the potential to connect with other people living through similar medical journeys, share stories, and pool information can be invaluable.
If instead of the 10 blue links, you get an answer from a chatbot that reproduces information from some or all of these four options (let's assume it even does so reliably), you've lost the ability to situate the information in its context. And, in this example, you've lost the opportunity to serendipitously discover the existence of the community in the forum. Worse still, if this is frequently how you get access to information, you lose the opportunity to build up your own understanding of the landscape around you in the information ecosystem.
In the few days since Google launched their “AI Overviews” feature, the denizens of the Internet have had a field day turning up absurd fail cases. I want to use a few of them here to show how they highlight the ways this approach to information access ruptures the relationship between reader and writer.
Some of the most famous examples as I write involve advice to eat a small rock each day for better health, and to use non-toxic glue on pizza to keep the cheese from sliding off. These are based on items from the satirical paper The Onion and a Reddit post, respectively.
Closer to home for me, the AI Overviews feature also output, in response to the query “mt rainier eruption prediction”, the following very amusing string based on a post from local satirical paper The Needling: “According to a 2022 study by Washington state geologists, Mount Rainier is unlikely to erupt during most people's lifetimes, with the possible exception of the very end.”
These examples are clearly cherry-picked, first by the people who found them and then by everyone who chose to share them (including me). But they are instrumental in revealing the problem with severing information (or non-information, when the LLM generates something inconsistent with the underlying text) from its context. And as the work of Prof. Safiya Noble reminds us, this isn't just a question of decontextualization, where words are stripped of their context. It's also a question of recontextualization, wherein the purported answers are coming from an automated system, perceived as “objective” by many, from a company whose stated mission is to “organize the world's information and make it universally accessible and useful.”
Taking the long view, and following Dr. Noble, I think it's time to reconsider information access as a public good and to remember that as recently as the 1990s, that's generally how it worked. You wanted information and didn't know where to start? You went to the library. In the short run, I urge search users to always click through and find the source. Turn off the “AI Overviews” feature if you can; ignore it otherwise. The only thing it's good for is as a way to see the egg on Google's face.
Follow-up post: Information Access is a Public Good