The Tragedy of the AI Commons
Hi friends –
Last month, I was captivated by an investigation in The Atlantic revealing that Meta's AI model had been trained on a trove of more than 170,000 pirated books. I contacted the author, Alex Reisner, and asked if my books were in there. "Stealing MySpace and Dragnet Nation are both in the dataset, sorry to say," he replied.
Of course, I am not the only creator whose work has been ingested into the AI machine, copyright be damned. But it was strange to me how shocked and violated I felt.
It wasn't a financial blow (I don't think the AI bots are going to cut into my meager sales of decades-old nonfiction) so much as a sense of broken trust. I had really believed the makers of the leading AI systems – OpenAI’s ChatGPT, Google’s Bard, Meta’s Llama and Anthropic’s Claude – when they said their models were trained on data “publicly available on the Internet.”
This broken trust is what I am calling the Tragedy of the AI Commons in this week’s article for New York Times Opinion (gift link).
Here’s my argument in a nutshell:
What’s the Problem?: The best parts of the Internet are places where people share their work, their art, their ideas. But now rapacious tech companies are scooping up all of this beautiful human expression to feed into their for-profit AI systems.
Understandably, many are pulling back from sharing. Artists are deleting their work from X (formerly Twitter), Hollywood writers and actors are on strike in part to make sure their work is not fed into AI systems, and publishers like The New York Times and CNN are using technical measures to prevent AI bots from scraping their websites.
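(For the technically curious: the main such measure is a site's robots.txt file, which asks named crawlers to stay away. Here is a minimal sketch, using OpenAI's GPTBot and Common Crawl's CCBot as illustrative examples rather than a full account of what any particular publisher blocks:

# robots.txt: ask these AI crawlers not to fetch any pages
User-agent: GPTBot    # OpenAI's web crawler
Disallow: /

User-agent: CCBot     # Common Crawl, whose archives feed many training datasets
Disallow: /

It is an honor system, though; a robots.txt file only keeps out crawlers that choose to respect it.)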
Meanwhile, dubious AI-generated content is rushing into the public sphere. NewsGuard has identified 475 AI-generated news and information websites. AI-generated music is flooding streaming services and generating royalties for scammers. Dangerous AI-written books, such as mushroom foraging guides that could lead readers to misidentify highly poisonous fungi, are prevalent on Amazon.
Altogether, this is leading to a polluted public sphere.
What is Being Done?: Many artists are fighting legal battles contesting the use of their work. For instance, a group of authors is suing AI outfits over the use of their pirated books in AI training data.
The European Union is, of course, considering legislation. Later this year, it is set to pass the world’s first comprehensive set of restrictions on AI, which would require AI companies to disclose what copyrighted data was used to train their systems.
But transparency is hardly enough to rebalance the power between those whose data is being exploited and the companies poised to cash in on the exploitation.
What More Needs to be Done?: Tim Friedlander, founder and president of the National Association of Voice Actors, has called for AI companies to adopt ethical standards. He says that actors need three Cs: consent, control and compensation.
In fact, all of us need the three Cs. Without consent, control of our work, and a chance to be compensated when others profit from it, I worry that the best parts of the Internet – the art, the music, the creativity shared in public – will wither. And that will be a real tragedy.
Thanks for reading.
Best
Julia