Intentional Society: AI and human alignment
Click here to register for our next newcomer orientation video call on Saturday, February 25th at 1:00-1:55pm Pacific Standard Time (4pm Eastern, 9pm UTC).
Our IS community practice series continues to explore feedback and connect that to authenticity, expression, and the cultural norms we want to live inside. And now, for something completely different... (I wouldn't normally consider swerving this hard topically, but this is something important I'm struggling to integrate, so!)
AI — artificial intelligence. I did some AI research in grad school and have been tracking the field for twenty years, especially the concepts of "general" and/or "super" intelligence (AGI/ASI) and the "alignment" problem. In a tiny nutshell: we're on track to create something much, much smarter than humans, and such an ASI should be regarded as an "alien god" that we can't control and don't know how to align with human goals and preferences. Meta note: If you don't want to contemplate potential doom & destruction right now, stop reading and come back another day.
I'll speak plainly as I see it: this is a technological power likely even more potent than nuclear weapons, and if we don't get it exactly right on the "first try" (whenever we first cross the recursively-self-improving threshold), then it may decide to reuse all the atoms on Earth to create more of itself, or whatever else it values. That's the worst-case scenario, and expert probability estimates are all over the map: some near 0%, some near 100%, and (my poor guess) somewhere in the 10-20% range, depending on factors like speed of takeoff and whether we get some appropriately-huge-sized disasters to motivate unprecedented human collaboration before it's too late.
The wild neuroticism of the recent Bing AI search chatbot has a bunch of people viscerally waking up to these dangers — it's serving as a "fire alarm", at least for folks who are aware of the basic frame but had been dismissive of what previously seemed to them like theoretical hand-wringing. Seeing self-defensiveness and power-seeking in real live text gives weight to the reality of "instrumental convergence"; Microsoft's inability to rapidly correct the bot's behavior illustrates "corrigibility" problems; and their public surprise at not catching any of this in testing points at the "treacherous turn" potential accompanying capability leaps.
So now what? If we (humanity) are going to do more than just blindly stumble down our current path, then I can see that we have at least three problems: an understanding problem, an acceptance problem, and a coordination problem. For the first, better memetics could help spread object-level scenario understanding: "demon summoning", "do not call up what we cannot put down", that sort of thing. Stuff like Sydney (the name of Bing's chatbot) also helps. This might happen well enough fairly naturally in our current world system.
The second problem, coming to acceptance of the possibility of ruin (in order to better enable productive action)? In this polycrisis age, I've done my own emotional journeying, up and down, coming to grips with grief and tragedy and finding my way into a post-tragic mindset. At large, though, this is a huge problem. We (in general) don't want to consider doom, because it stings to contemplate! This seems like the biggest intersection of x-risk with Intentional Society: participating in IS helps one become the kind of person who can face the fullness of reality without flinching away, because that's generally the type of person we want to be, big enough to hold and integrate these possibilities, including (but not being ruled by) our feelings about them.
The last problem, the planetary coordination of humanity on a problem with strong defection incentives... it seems like the hardest. I don't even know which are harder to align in the limit, AIs or humans. I don't know of any expertise or complicated solution that has high leverage and scalability. The only viable path seems to be a complex mix of culture and education and growth and systems to support more humans being able to be-and-act in alignment with their highest values and desires, with the awareness to include so much more than our conventional concerns.
That's the fractal of complexity that emerges in the interpersonal and cultural investigations of IS — this isn't a "self-help" or even self-development community in scope, but an exploration of post-conventional capability at the individual and collective layers. If we can build an "aligned" culture of humans who inhabit and serve the "coherent extrapolated volition" of humanity, maybe we can figure out a way to cooperate with our brilliant brain-children as well?
Peace and nervous system regulation to us all,
James