The Midjourney Mess
Everything Is True
Ada Hoffmann's author newsletter
(This post was originally for subscribers only; I’ve chosen to release it, and its sequel post, for free as a part of the SFWA roundup of posts on AI and the creative industries.)
When I was in graduate school - not that long ago; I got my PhD in 2019 - my field of study was computational creativity. Now I kind of want to apologize to everybody. Like, sorry about the mess! But this isn't really an apology post. I've been chewing on the recent furore over Midjourney (and similar AI art generators) and their potential use as a source of illustrations for book covers, and I want to talk a little bit about how we got here, and about some lessons I've belatedly learned.
When I was in graduate school, I genuinely didn't believe that AI art would replace humans. Quite frankly very few people did! Computational creativity was quite a small field, and when people did talk about their work replacing humans, it was often in these very credulous, self-aggrandizing, Singularity-esque terms. Like, ooh, what if AI art got so good that humans couldn't comprehend it? That kind of thing. I didn't take it seriously. Even the people who did take it seriously tended not to be thinking in specific or detailed terms about what would be an ethical vs an unethical use of this type of AI.
It's hard to explain what we did find interesting about AI art. Hindsight tends to twist things, but I think I and many others viewed it as a cool toy - not a replacement for human artists, but something interesting that a certain kind of technically-minded artist could fool around with to make cool things in a new way; also, not incidentally, something that would improve our understanding of what creativity even is and how it works in the first place.
(And if you're saying "but Ada, why did you spend those precious years researching a cool toy, when you could have been curing cancer or something?" - then you vastly underestimate the appeal of cool toys, and you overestimate my attention span.)
Anyway, then 2019 happened (I had already passed my thesis defense at this point and was just waiting for convocation) and large language models happened,* starting with GPT-2.** And you know the rest.
The thing is - in a perfect world, I do think that even large language models could be used appropriately, by some artists, as cool toys, without wrecking things for the rest of the artists in the world. Ursula Vernon has a good thread about some of the nuance here with Midjourney. The majority of the people using this stuff, in Vernon's experience, really do view it as a cool toy - either something just for fun or something they can work into their pipeline as a digital artist. The people who are like "bwahaha now we'll never have to pay fair wages to an artist again" are a small (though vocal, and infuriating) minority.
And that's where it would stop, if not for capitalism.
Nobody who actually enjoys writing wants to see AI replace human writers. The people who are using GPT-3 to write their books for them are overwhelmingly people in economic situations that demand absurd amounts of output from them in a short time. They're self-published authors on Amazon who know that the algorithm is going to abandon them if they don't churn out a book every two months. Or they're underpaid writers working for content farms, sweating for ad money. (And in some cases they are simply writers who think the software is a cool toy and want to see what happens - but that use case doesn’t worry me as much.)
I don't think most people who develop these tools are snickering into their hands, going "bwahaha, we will make human creativity obsolete!" But they've released these tools into a world in which economic conditions dictate that people will be incentivized to let AI do their writing for them, and in which the very rich people at the top of the heap are already incentivized to ask for more and more cheap writing, faster.
While I'm not as familiar with the world of illustration, it seems like the conditions are similar. There are big powerful companies that don't see why they should pay for a human artist's work when there are cheaper, faster options; and there are tiny little outfits that maybe would like to pay good money (or maybe not; who knows?) but who don't have that much in the first place and are tempted to cut corners where they can.
As Ted Chiang writes in one of my favorite AI essays ever, most of the AI problems that scare us are actually capitalism problems. And I don’t think AI art is any different.
I also don't think my tiny little computational creativity research community ever really talked about this. (I may be forgetting someone's work or overlooking someone who published something useful after I graduated; if so, many apologies and please link me.) There were some discussions to the effect of: what if AI can do something that humans used to do themselves? Is that good or bad? How would the creative world adjust? But I don't think anyone from that community sat down and discussed the economic conditions that incentivized people to automate things even when they wouldn't otherwise want to, or to replace people with software even when the software is not as good. These discussions have happened elsewhere, but I don't think we ever really talked about how they applied to us.***
There's a lot more to say about this in future posts, but in the meantime: Fuck capitalism. Or at least, that's where I'm putting the blame today.
*I'm lumping art generators like Midjourney in with large language models for simplicity, for reasons that will make more sense to a computer scientist. Obviously art is not language; but in practice, these models do need some relatively sophisticated way of handling language in order to take text prompts, and the way they learn visual art from large data sets is broadly similar to the way they learn language.
**Technically, there were a few smaller neural network-based language models that showed promise before I graduated - I was pretty impressed with some of Ross Goodwin’s results at poetry generation, for instance - but these models were still a rare, niche thing at the time, and there were also many small attempts at neural network text generation that failed badly.
***I'm maybe taking on a little more collective blame here than I deserve, or lumping two communities together that have been largely separate in practice. There’s a lot of inside baseball here that I don’t want to go into, but the short version is that the academic community I was part of and the corporate/startup communities that are making these things are really not the same community at all, and communication between the two tends to be minimal. This doesn’t obviate the fact that both communities were working on similar problems, and that mine didn’t really take the dangers seriously.