Art vs Engineering
I'm finally (finally!) working on a new version of the Are We Really Engineers talk, which has got me once again thinking about the differences between "Programming" and "Software Engineering". I was also wholly consumed, physically and mentally, last week by the art machine:
> I made a thing so that anybody, even nonprogrammers, can make their AI-generated art. Just put in a text prompt and out comes art! https://t.co/QETWSB73gI
>
> — @hillelogram, August 3, 2021
As a rough summary, the art machine is based on an image recognition model called CLIP. We use CLIP to guide an image generator to produce something recognizable as the prompt. This gives you really cool images. Here's what the art machine gave me for "old gods and older machines":
Delightful.
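Mechanically, every one of these generators is the same loop: decode a latent into an image, ask CLIP how well that image matches the prompt, then nudge the latent so it scores better. Here's a toy version of that loop with stand-in networks (tiny linear layers instead of the real CLIP and VQGAN, so it runs anywhere but makes noise, not art):

```python
import torch

# Stand-ins so the sketch runs; the real notebooks use OpenAI's CLIP and a VQGAN.
clip_text  = torch.nn.Linear(32, 64)    # "encode the prompt"
clip_image = torch.nn.Linear(128, 64)   # "encode an image"
vqgan      = torch.nn.Linear(16, 128)   # "decode a latent into an image"

with torch.no_grad():
    prompt_embedding = clip_text(torch.randn(32))   # encode the prompt once

latent = torch.randn(16, requires_grad=True)        # this is what gets optimized
opt = torch.optim.Adam([latent], lr=0.05)

for step in range(100):
    image = vqgan(latent)
    match = torch.cosine_similarity(clip_image(image), prompt_embedding, dim=0)
    loss = -match          # higher similarity to the prompt = lower loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```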
Now, while I made the art machine, very little of it is original work. CLIP-based art generators have been around for months now, pioneered by people like advadnoun and Katherine Crowson. All I really did was take one of Crowson's generators and make it more accessible to nonprogrammers.
It's that "making it more accessible" bit that really interests me. I have a very different goal from the people whose work I'm adapting. They are primarily artists and data scientists. I can't read their minds, but I suspect that they designed their generators to make good art. Whereas I designed the art machine to make make easy art. We make different choices in how we design things and what we prioritize, and I wonder if some of those choices speak to the difference between "programming" and "engineering".
Couple disclaimers. First, I am not saying the artists are bad programmers. This is all a black box to me, and they understand the code much better than I do. I'm specifically saying they weren't trying to engineer something. Nor am I saying I'm a good engineer, or that the art machine is well-engineered. But it has more of an engineering mindset, which is what I want to pick apart more.
Second, I'm running this all off the top of my head. I could change my mind on all of this tomorrow.
Background Knowledge
All the AI Art generators are running on Google Colab. Colab is a free IPython notebook service. You can see, run, and clone other people's notebooks. More importantly, Colab gives you free GPUs. CLIP runs like 60x faster on a GPU than a CPU. The size of the image you can make is limited by the GPU memory. With 16gb RAM, the art machine can go up to about 700x700 pixels.
Colab instances have GPUs with either 8, 12, or 16gb of RAM. Most people get 16, but people who use it frequently or for long times are more likely to get 8 or 12. You're also less likely to get the good stuff during peak hours. I have no idea when those are.
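If you're curious which tier you got, you can check from a cell with the PyTorch that Colab preinstalls:

```python
import torch

# Print the assigned GPU and its memory, or complain if the runtime has no GPU.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.0f} GB")
else:
    print("No GPU assigned -- enable one under Runtime > Change runtime type")
```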
While you can combine CLIP with different kinds of generation networks, the community has generally settled on VQGAN, as it seems to produce the best images. Beyond that, there's a huge amount of tweaking you can do, both to the core algorithm and to the ML hyperparameters. In the hunt for good art, people have made all sorts of different notebooks:
I got super into this early July, tried to get other people to use it, and found them intimidated by the code. So I made the art machine.
Differences
Using Tools "Right"
At first I thought all this was beyond me, as I'm neither a data scientist nor an artist. But I first realized I could make positive contributions when I saw this line:
```
!curl -L {link} > vqgan_imagenet_f16_1024.yaml
!curl -L {link} > vqgan_imagenet_f16_1024.ckpt
```
The checkpoint file is 500mb and takes about a minute to download. It does this download every time you run a new image, even if the file already exists. After reading the curl documentation, I found that the `-C` flag checks if the file already exists before downloading it. Using it saved me a minute on every future run.
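With `-C` (and `-o` instead of shell redirection, so curl can see the existing file), the cell becomes something like:

```
!curl -L -C - -o vqgan_imagenet_f16_1024.yaml {link}
!curl -L -C - -o vqgan_imagenet_f16_1024.ckpt {link}
```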
It's the "reading the curl documentation" thing that interests me. While curl is a critical part of the generator, it's only used in one place. It's "auxilary", in a way, as opposed to "core". I think "engineers" are significantly more likely to research auxiliary tools so they can use them "right".
There are a couple of other places I noticed this in the art machine. First, I wanted to hide `pip` output for UI reasons. To do this I researched how IPython worked and found the `%%bash` and `%%capture` magic commands. Second, while Colab has a form extension for inputting values, it can't do things like buttons, so you can't add an easy "download all images" button to the form. If other notebooks had a "download" option, it was in another IPython cell, which you're told not to run unless you want to download. I didn't like that, so I read up on Jupyter Widgets and learned how to implement a button widget. That leads to a nicer UI:
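The button wiring is roughly this (simplified; assume the generated images land in a steps/ folder):

```python
import shutil
import ipywidgets as widgets
from IPython.display import display
from google.colab import files

button = widgets.Button(description="Download all images")

def on_download(_):
    # Assumed layout: the generated images live in a "steps/" folder.
    shutil.make_archive("images", "zip", "steps")
    files.download("images.zip")   # triggers a browser download in Colab

button.on_click(on_download)
display(button)
```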
I don't know if learning this would serve the purposes of the artist in making art. In many cases the opportunity cost just isn't worth it, and using a familiar tool "wrong" is preferable to spending time learning how to do things "right". Whereas engineering tends to work in contexts where the opportunity cost goes the other way, maybe.
Developer/Client disparity
Most of these notebooks are made by AI artists, for themselves or other artists. They focus a lot on how to get the best art. Here's how you modify the original codebook sampling notebook:
You're directly editing the code, and there are fifteen knobs for you to twiddle. This works for the purposes of finding good art: everybody is already comfortable editing code, and more knobs means more power to affect the final product. I wanted to adapt it for people who don't program. That means asking them to modify code is right out. I was also worried that too many options would lead to analysis paralysis, or make it seem like you needed a lot of practice to get good images out. So I made a bunch of simplifications:
- Three basic options, six advanced options, and even that's on the brink of too much. All of the options are documented and orthogonal. Nothing relies on anything else and there are only two ways to "tweak" things.
- All the code is hidden from you. While it's still there, you have to choose to see it, as opposed to seeing it by default.
- I also hid diagnostic information, like loss functions and warnings. This stuff is very useful if you're tinkering with the code, but it's extraneous information if you're just dabbling.
This significantly limits the power of the art machine compared to other notebooks, and I assume serious artists would prefer more powerful colabs. At the same time, it makes it a lot less intimidating to layfolk. No blocks of code or "noise weights" to fret over, just fill in the prompt and click "run".
Two other changes were more subtle. First, instead of starting the iteration loop with `i=0`, I did `i=1`. That skips printing the pre-trained output, which I found was confusing people. Second, I always print the final image. Normally people have `steps_per_image` be a factor of `total_steps`, which gives the expected behavior. But if it's not, I wanted to make sure people could still see the final result. Both of these are sharp edges you can quickly learn to work around, but I don't want people to have to work around them.
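In simplified sketch form, with stand-in helpers and made-up numbers:

```python
def train_step(i):        # stand-in for one optimization step; returns the current image
    return f"image after {i} steps"

def display_image(img):   # stand-in for showing an image in the notebook
    print(img)

total_steps, steps_per_image = 120, 50   # deliberately not a clean multiple

for i in range(1, total_steps + 1):      # start at 1: never show the untrained iteration-0 output
    image = train_step(i)
    if i % steps_per_image == 0 or i == total_steps:
        display_image(image)             # shows steps 50, 100, and the final 120
```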
I don't think "your clients are significantly different from you" is fundamental to engineering, but it seems like a good indicator that engineering practices would be more effective. Most "non-engineering" programming is used by one person and is adapted around that one person's needs. Engineering programming is instead done for other people.
(Not necessarily layfolk: I can think of things I'd do for "software engineered tool for AI Artists" that'd be different from both "tool for layfolk" and "artist programming". Checkpoints, export, variation analysis, etc. Though I'd need to talk with more artists to know what they'd actually like, versus what I imagine they'd like.)
The Software as an Artifact
This is something I saw to a lesser degree, but I really like the idea and think it's pretty important. Software engineering concerns itself more with the program as a produced artifact of the process. The code itself should be easy to analyze, query, and modify, so that we can address broader concerns beyond the specific task.
An example of this is backwards compatibility. Updating the software shouldn't change the results on old data. Another example is auditing: we often don't just want to know the state of the current database, but when/why/how it changed. These arguably aren't part of the core task the software is solving, but they're important for the surrounding context in which we use the software. So you see both features appearing a lot in software engineering but not in regular programming.
This doesn't manifest too much in this particular case, except that I have a rule to never "lock out" an art: someone should always be able to reproduce an image they get, given the right parameters. That hasn't given me too much trouble, with one small exception. The step size is defined as:
```python
# weirdness was 1 by default
step_size=0.05*(weirdness if weirdness != 11 else 22),
```
I found that a step size of `0.1` gave roughly the same quality as a step size of `0.05`, so switching would make image generation twice as fast. But if I changed the `step_size` equation, people wouldn't be able to exactly reproduce their old art. Instead, I kept the same equation but changed the default weirdness to `2`. A small fix to solve a small problem most people don't even care about, but it reflects some of the values of engineering.
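Spelled out (the function wrapper is just for illustration):

```python
def step_size(weirdness):
    # Same equation as above; 11 is special-cased.
    return 0.05 * (weirdness if weirdness != 11 else 22)

step_size(1)   # 0.05 -- the old default, so old art still reproduces exactly
step_size(2)   # 0.10 -- the new default, twice the step size per iteration
```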
Bonus: OOP Everything
The art machine is designed for nonprogrammers, which shaped a lot of the engineering decisions. I am a programmer, though, and I want to make good AI art. So I also have a separate, private notebook running in AWS SageMaker. The code has one crucial difference: instead of the run being part of a script, I wrapped all the logic in a `Task` object. That makes it easy to do batch jobs on my art.
```python
t = Task(h=300, w=500, output_dir="img/gemstones")
for g in gems:  # gems is a list of gemstone names, defined elsewhere
    t.run(f"{g} gemstone")
```
Making a program itself programmable feels very "engineering" to me. I haven't yet fully formed the idea though.
I think I'm on to something here. Most arguments about "programming" vs "engineering" center on ethics, connotations, and metaphor, but it'd be more interesting and informative to focus on the differences in approach. It also comports with other stuff I learned via the crossover project, like why most academics and scientists don't use version control even though it's a revolutionary software innovation: version control matters a lot more to engineering than to programming.
Anyway, if you want to try out the art machine, the link is here. I want to conclude by reiterating that I'm basing this off the incredible work of AI artists. I'm not "improving" their code, just adapting it to a different context. Thanks for reading!