AI is Not Living Up to Its Hype
Hi friends —
Last week, OpenAI CEO Sam Altman promised that the company would release “new stuff” that “feels like magic.” But this week’s announcement was not particularly magical — it was a rather routine update to make ChatGPT cheaper and faster.
To me, this feels like an apt metaphor for the moment we are in. The makers of AI have been promising us “the most powerful technology humanity has yet invented,” but in my latest piece for New York Times Opinion, I argue that AI is actually more often too buggy and unreliable to be useful.
“It’s looking less like an all-powerful being and more like a bad intern whose work is so unreliable that it’s often easier to do the task yourself,” I wrote.
The evidence is on my side. Researchers are increasingly finding that AI provides unreliable answers to even simple questions, whether about the law, medicine, or voter information.
Even programming, where AI is meant to excel, has shortcomings. Consider Devin, an “AI software engineer” whose release was heavily hyped earlier this year. Devin’s maker, a startup called Cognition, posted a video that claimed to show Devin completing a coding task that had been uploaded to a website for freelancers called Upwork.
A few weeks later, though, a software engineer named Carl Brown debunked Devin’s work on his YouTube channel. Brown completed the same task in just 36 minutes, compared with the more than six hours it appears to have taken Devin, based on timestamps in Cognition’s video. One reason for the delay was that Devin generated code containing errors and then spent a lot of time debugging the errors it had created.
Brown also beat Devin by running two simple commands that were located in the “ReadMe” file provided by the client. Devin apparently couldn’t locate those instructions, so it ran a slow, outdated programming language through a complicated process. “This is not something I would accept in a code review from a junior developer,” Brown said in his video.
A few days later, the engineer who posted the task to Upwork, Felipe Tambasco, weighed in with his own video titled “Devin didn't solve my computer vision project,” stating that Devin only completed a portion of the task described in his Upwork post.
Cognition responded on Twitter by acknowledging that Devin did not complete the output requested and adding that “Devin is often inefficient and makes mistakes, some that it fixes and others that cause it to get stuck.”
If even the AI makers admit their products are “often inefficient” and make mistakes, then we have to reckon with the fact that we are pouring billions of dollars, our energy resources, and a generation of the brightest math and science minds into technology that might end up akin to the Roomba vacuum. The Roomba does a passable job when you are home alone, but if you are having guests over, you are going to want to get out a proper vacuum and do the job yourself.
Is Roomba-quality AI worth all that we are investing in it? I don’t have the answer, but I think it’s a question we all need to start considering.
As always, thanks for reading.
Best,
Julia