Q1 2023 updates
Hi everyone,
Welcome to the AI world. It's exciting & scary but hopefully more exciting than scary. And hopefully we'll figure out how to make aliens on GPUs love us, rather than hate us (or just be indifferent towards us).
New posts:
- My 2022 self (I don't know them) was very wrong about meditation, huge monitors, and... sleep.
- lifehacks [137 of them...]
New guest post:
Hiring a research assistant for an AI steganography project
I'm spending most of my time thinking about technical AI alignment these days.
A couple of weeks back I realized that I have enough of a sense of how an LLM thinks to write a prompt that would jailbreak both GPT-4 and Claude at the same time, in a way I hadn't seen anyone do before... and it just worked. I eventually came up with what is, to my knowledge, the shortest plain-English prompt that jailbreaks both GPT-4 and Claude (it's 2 sentences...).
The project I'm thinking the most about right now is investigating steganography in GPT-4 and Claude & I'm looking for a research assistant to join me on it. I think there's a ton to learn about how models really think from looking into this. Relevant: GPT-3 will ignore tools when it disagrees with them.
If you're interested, please send me a note with (1) whether you think we should expect to find steganography or not, (2) how you'd approach this & (3) what we can learn about LLMs from the project. If things go well and we are both interested, this could become a longer-term engagement.
Links
- Sam Altman, a few weeks ago:
Some people in the AI field think the risks of AGI (and successor systems) are fictitious; we would be delighted if they turn out to be right, but we are going to operate as if these risks are existential.
- Leo Aschenbrenner:
Everyone on your timeline is fretting about AI risk ... Maybe you've even developed a slight distaste for it all ... That’s what I used to think too ... Then I got to see things more up close. And here’s the thing: nobody’s actually on the friggin’ ball on this one!
- Open-sourcing “Baby AGI”, a pared-down version of the “Task-Driven Autonomous Agent” at 105 lines of code. Three task agents (execution, creation, prioritization) work in harmony… forever.
- Holden Karnofsky: Jobs that can help with the most important century.
- Harry Potter by Balenciaga
Have a great April & as always feel free to reach out!
Stay frosty,
Alexey