ARTchivist's Notebook: DALL-E & Me, or Alt text as AI instructions
By now you're probably familiar with AI image generators such as DALL-E 2 and Midjourney. (An image created by Midjourney recently won a prize at the Colorado State Fair.) These artificial intelligences, fattened on enormous image banks, have a broader "knowledge" of art history (no, image history) than most of us. (I put "knowledge" in quotes because I'm not sure whether AIs "know" things in the same way humans do, but that's a subject for another time.) Because their training data is so vast, they can create incredibly convincing, original digital images in a matter of moments from nothing but basic text prompts.
As an art writer and archivist, I've spent a lot of time doing the opposite: turning images into text. I have described a large number of artworks to analyze and critique them, but also simply to share something of the experience of seeing them. As an archivist, I have assigned keywords and written image descriptions and alt text, also with the intention of providing broader access to visual works. The image descriptions and alt text I've written help people who use a screen reader get a sense of the images they encounter on the Internet. (If you're not familiar with image description and alt text, the Cooper Hewitt has an excellent guide.)
It recently occurred to me that the descriptions I've spent so much time producing might have another use: as instructions for an AI. I was particularly curious to see what an AI would do with alt text, which is created with the express purpose of conjuring an image in the mind of a person who can't see it clearly or at all. What would alt text conjure in the "mind" of the AI?
Thanks to a generous friend, I borrowed a login to DALL-E 2. Here's what it did with a few examples of alt text I wrote for images featured on the Judy Chicago Research Portal.
Original image with alt text:
Color photograph of a child walking and holding a blue disc next to a kneeling woman dressed in white in front of rows of white blocks and white smoke
What DALL-E saw:
The results are...not bad! It kind of missed the part about the "kneeling woman," but I guess it counts that her knees are bent?
Original image with alt text:
Color photograph of a woman submerged in a bathtub, seen from above
What DALL-E saw:
These are positively creepy, and I suspect they came up against DALL-E's express prohibition of nudity. Also, it's interesting that at least two of the women look Asian, which was unexpected. I just assume that most women on the Internet are white, but I shudder to think there is an odd corner where Asian women bathe.
Original image with alt text:
Square canvas with yellow, orange, and red gradations leading towards a center point. A square with purple and blue gradations surrounds the point.
What DALL-E saw:
Fair, fair. To be honest, my alt text could've been better, but when you've been writing hundreds of descriptions of abstract art all afternoon, you might be excused for losing your way.
Original image with alt text:
Volunteer Marjorie Biggs leaning over and embroidering the runner for the Caroline Herschel place setting
What DALL-E saw:
I'm impressed that DALL-E knew that "Marjorie Biggs" was an older white woman. Or maybe those are the majority of people who embroider in pictures online?
I'm not sure what this experiment tells us about AI or alt text, or anything, for that matter. If nothing else, it gives us a view into a series of gaps between 1) what I see and experience, 2) what I am able to express in words, and 3) what DALL-E is able to understand, make associations with, and render.
There is a certain poetry to this game of telephone, if only to affirm the incommensurability of experience. We all see and experience things, and we try to share that experience within the limits of what our clumsy bodies and languages and tools can do.
DALL-E gives us an occasion to think about all the layers of effort that go into communication, and whether that communication is accurate and has a desired result. Can an AI-generated image make us feel something beyond wonder? Can alt text?
Do you use AI to write alt text or image description?
Take this 2-question anonymous survey.
Thanks for reading! If you have any comments or questions about this issue, please feel free to get in touch. Or follow me on LinkedIn or Twitter @SharonMizota.
ARTchivist's Notebook is an occasional newsletter musing on the intersection of archives, art, and social justice by me, Sharon Mizota, DEI metadata consultant and art writer.
I help museums, archives, libraries, and media organizations transform and share their metadata to achieve greater diversity, equity, and inclusion. Contact me to discuss your metadata project today.