Information literacy and chatbots as search
By Emily
This post started off as a thread I wrote and posted across social media on Sunday evening. I'm reproducing the thread (lightly edited) first and then below I address several recurring responses I saw on social media.
The thread
As OpenAI and Meta introduce LLM-driven searchbots, I'd like to once again remind people that neither LLMs nor chatbots are good technology for information access.
Chirag Shah and I wrote about this in two academic papers:
- 2022: Situating Search
- 2024: Envisioning Information Access Systems: What Makes for Good Tools and a Healthy Web?
We also have an op-ed from Dec 2022:
Why are LLMs bad for search? Because LLMs are nothing more than statistical models of the distribution of word forms in text, set up to output plausible-sounding sequences of words. For a detailed presentation of this point, see this talk I gave in August 2023:
If someone uses an LLM as a replacement for search, and the output they get is correct, this is just by chance. Furthermore, a system that is right 95% of the time is arguably more dangerous than one that is right 50% of the time. People will be more likely to trust the output, and likely less able to fact check the 5%.
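To make the "plausible sequences of words" point concrete, here is a toy sketch: a word-bigram model trained on a few sentences. It is vastly simpler than a transformer-based LLM, but it illustrates the same basic setup, in that the only thing the model captures is the distribution of word forms in its training text, so whether a continuation happens to be true is an accident of that text.

```python
# Toy illustration (not how production LLMs are built): a word-bigram model
# that only captures the distribution of word forms in its training text.
# Its outputs are "plausible next words", with no notion of truth.
import random
from collections import defaultdict

corpus = (
    "the capital of australia is canberra . "
    "the capital of australia is sydney . "   # wrong, but present in the data
    "the capital of france is paris ."
).split()

# Count which words follow which.
following = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev].append(nxt)

def continue_text(prompt_word, length=6):
    """Extend a prompt by repeatedly sampling a likely next word."""
    out = [prompt_word]
    for _ in range(length):
        options = following.get(out[-1])
        if not options:
            break
        out.append(random.choice(options))
    return " ".join(out)

# The continuation may name canberra, sydney, or paris; which one you get,
# and whether it is correct, is an accident of the training data.
print(continue_text("capital"))
```

Scaling the model up makes the continuations more fluent; it does not change what is being modeled.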
But even if the chatbots on offer were built around something other than LLMs, something that could reliably get the right answer, they'd still be a terrible technology for information access.
Setting things up so that you get "the answer" to your question cuts off the user's ability to do the sense-making that is critical to information literacy. That sense-making includes refining the question, understanding how different sources speak to the question, and locating each source within the information landscape.
Imagine putting a medical query into a standard search engine and receiving a list of links including one to a local university medical center, one to WebMD, one to Dr. Oz, and one to an active forum for people with similar medical issues. If you have the underlying links, you have the opportunity to evaluate the reliability and relevance of the information for your current query—and also to build up your understanding of those sources over time. If instead you get an answer from a chatbot, even if it is correct, you lose the opportunity for that growth in information literacy.
The case of the discussion forum has a further twist: Any given piece of information there is probably one you'd want to verify from other sources, but the opportunity to connect with people going through similar medical journeys is priceless.
Finally, the chatbots-as-search paradigm encourages us to just accept answers as given, especially when they are stated in terms that are both friendly and authoritative. But now more than ever we all need to level-up our information access practices and hold high expectations regarding provenance—i.e. citing of sources. The chatbot interface invites you to just sit back and take the appealing-looking AI slop as if it were "information". Don't be that guy.
Common responses/FAQs
The responses to this thread were overwhelmingly positive. It seems that most people it reached are appropriately skeptical of using LLMs and chatbots for search. But it of course also got some pushback. Here, I've categorized the most common kinds and provided responses:
"Doesn't RAG (retrieval augmented generation) solve this?"
Retrieval augmented generation systems run a standard web search based on the input into the chat interface, then use an LLM to "summarize" the documents retrieved (sketched in code after the list below). This does not solve the problems mentioned above for several reasons:
- The summary extruded from the LLM is still synthetic text, and likely to contain errors both in the form of extra word sequences motivated by the pre-training data for the LLM rather than the input texts AND in the form of omission. It's difficult to detect when the summary you are relying on is actually missing critical information.
- Even if the setup includes links to the retrieved documents, the presence of the summary discourages users from actually drilling down and reading them.
- This is still a framing that says: Your question has an answer, and the computer can give it to you. This framing brings all the attendant problems, as outlined above and in the papers cited.
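For readers who haven't seen the pattern spelled out, here is a minimal sketch of the RAG pipeline described above; web_search() and llm_generate() are hypothetical stubs, not any particular product's API. The thing to notice is that what the user is shown is still synthetic text extruded from an LLM, now conditioned on retrieved documents.

```python
# Minimal sketch of the RAG pattern. web_search() and llm_generate() are
# hypothetical stubs standing in for a search index and an LLM.
from typing import List

def web_search(query: str) -> List[str]:
    # Stub: a real system would query a search index here.
    return ["text of retrieved document 1", "text of retrieved document 2"]

def llm_generate(prompt: str) -> str:
    # Stub: a real system would sample synthetic text from an LLM here.
    return "synthetic summary of the retrieved documents"

def rag_answer(query: str) -> str:
    docs = web_search(query)  # ordinary retrieval step
    prompt = (
        "Summarize these documents to answer the question:\n"
        + "\n---\n".join(docs)
        + "\nQuestion: " + query
    )
    # The user is shown this synthetic summary rather than the documents,
    # so omissions or additions relative to the sources are easy to miss.
    return llm_generate(prompt)

print(rag_answer("is this treatment safe?"))
```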
"But LLMs work great for code!"
First, I'm not really convinced—they're probably decent for boilerplate, and the code query example is at least a case where you could catch many incorrect answers by just trying to compile or run the code, but there are still security issues here. Would someone who is writing code by querying Copilot or whatever really be in a position to detect security problems injected through that automatically provided code?
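As a hypothetical illustration of the kind of issue I mean (not a claim about any particular assistant's output): boilerplate that compiles and runs can still carry a textbook vulnerability, like SQL built by string interpolation, and nothing about the code's plausible appearance nudges the user to notice.

```python
# Hypothetical example of plausible-looking boilerplate with a security flaw:
# it runs fine, but builds SQL by string interpolation, a textbook
# injection vulnerability.
import sqlite3

def find_user(conn: sqlite3.Connection, username: str):
    query = f"SELECT * FROM users WHERE name = '{username}'"  # injectable
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # The safer version uses a parameterized query.
    return conn.execute(
        "SELECT * FROM users WHERE name = ?", (username,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

# A crafted input dumps the whole table through the naive version:
print(find_user(conn, "x' OR '1'='1"))       # [('alice',)]
print(find_user_safe(conn, "x' OR '1'='1"))  # []
```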
Second, I encourage anyone who has this reaction to take a good hard look at why they think a system being good (or even halfway decent) at providing boilerplate code gives it any credibility for other kinds of information seeking behavior. The tech companies are counting on you agreeing that writing code is the pinnacle of cognitive activity, and so anything that can do that can surely do any "lesser" tasks.
"Search is also statistics"
Yes, sure—researchers working on information retrieval have been using machine learning methods for decades. But it matters how the input and output map to the task at hand. If you're training a system to rank documents based on their relevance to a query (and their reliability), and then using that system to rank documents in response to queries, that's a decent match. If you're training a system to output plausible sequences of words based on the distribution of those words in some training corpus, and then using that system to produce plausible sequences of words as continuations of queries but telling people they are answers to their queries ... that's a problem.
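Schematically, the mismatch looks something like this; both functions below are hypothetical stand-ins rather than real systems, but they show how differently the input and output map onto the task.

```python
# Schematic contrast of the two input/output mappings described above;
# relevance_score() and sample_plausible_words() are hypothetical stubs.
from typing import List, Tuple

def relevance_score(query: str, doc: str) -> float:
    # Stub: a real ranker is trained to predict relevance (and reliability).
    return float(sum(word in doc for word in query.split()))

def rank_documents(query: str, docs: List[str]) -> List[Tuple[str, float]]:
    # Output: an ordering over existing documents the user can go read.
    scored = [(doc, relevance_score(query, doc)) for doc in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def sample_plausible_words(prefix: str) -> str:
    # Stub: a real LLM samples a likely-looking word sequence.
    return prefix + " ... [synthetic continuation]"

def generate_answerish_text(query: str) -> str:
    # Output: newly synthesized text that looks like an answer,
    # whether or not it is one.
    return sample_plausible_words(prefix=query)
```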
"Traditional search engines are bad, actually"
I'm not disagreeing. Abdicating information access systems to commercial interests (rather than seeing them and supporting them as a public good) has led to lots of harm. In particular, I recommend Dr. Safiya Noble's Algorithms of Oppression on this topic. But just because the current system is bad doesn't mean any given change is going to be better—especially a change that not only has all the problems outlined above but also doesn't address the underlying problems with the existing system (commercial interest).
"People can be wrong too"
This analogy is misleading—people are accountable for what they say. People are also understood to be individuals with certain experience and expertise, not set up as all-knowing oracles. But most importantly, the analogy is dehumanizing. For more, see:
"But I use it all the time and I'm happy with it!"
I'm sorry to hear that. I hope you will reconsider in light of the above, or if not that, then in light of the environmental and social impacts of these systems.