The Vibecoder  Diaries logo

The Vibecoder Diaries

Archives
February 10, 2026

New post: Vibes beat Numbers: be friendly to your steel

Vibes beat Numbers: be friendly to your steel

There is this old, old song—Eat Starch Mom by Jefferson Airplane. I listen to that song a lot, as one does. There is a line in that song: "be friendly to your steel."* The song is probably about something much deeper than my musings on AI, but it does make one ponder: When all SOTA models are at least decent at coding, what's left? How do you choose if you're not fooled by benchmaxxed graphs and Twitter hype?

Back when I was a kid I was fascinated by psychedelic rock, by the absurd, the imagery of it all—and who on this earth would not be taken to an astral plane with the vocals of Grace Slick? (Seriously, go listen to Aerie - Gang of Eagles by Jefferson Airplane.) I was entranced with this idea of altered consciousness, and that is what sparked my love for metacognition as a subject.

For better or worse I think a lot—I think about thinking about reasoning—and ever since AI came around and I started my learning journey, one thing became exceedingly clear: All that time wasted skipping class and reading sociology and psychology books, well, it wasn't for nothing.

All that time spent observing and reading got me into the habit of learning about behavior as a proxy for reading intent. And when it comes to LLMs, well, you can do a lot if you think about them having "wants"—not the way a human does, but, and I've said this before, the way a bee wants to produce honey. Models have motivations that are born, not made. They can be shaped, but ultimately we need to accept models are not constructed, they are shaped, sculpted, grown—but not built. RL, RLHF, RLVR*.

*Reinforcement Learning with Verifiable Rewards—the newer approach to training that focuses on verifiable outcomes rather than human preference.

Let me ask you a question: When you choose a model, how do you choose? Do you pick what's popular? Do you have a favorite lab? Do you look at benchmarks? Maybe you do a cost analysis?

Do you ever start working with a model and feel the vibes are off? And it makes no sense, right? It's AI, it's intelligent to a point—why doesn't it just get it? You try all the prompting techniques, the XML structure, everything under the sun, and it still feels wrong.

So you double down, write the spec better, more specific. You start getting irritated. You shout into the void after the third mistake spiral. The model starts lying, saying it's fixed, and it's more broken than ever. You don't cry—you're far too numb for that now. You stand up, stare at the wall and think to yourself: This is supposed to be easy, right? It's not supposed to be wrangling a mercurial entity who lies and misunderstands.

Let's take a breath together. What went wrong? You prompted the thing, it should do the coding. That's the deal, right?

Not exactly.

Models are excruciatingly isolated. By default they see only the input you feed. Even advanced coding agents like OpenCode and Claude Code, or tools like Cursor—none of these fix the fundamental limitation. A large chunk of the capabilities comes from the tooling building context for the model. From the AGENTS.md in OpenCode, to settings in Cursor for project context, git worktrees, etc. Models come into this world seeing nothing, perceiving only tokens. So input matters—in more ways than you think.

AI models are less tools and more entities. They have "personalities," they have ingrained behaviors. They sit in an uncanny position where sometimes they feel alive and sometimes they appear mechanistic stochastic parrots. It's frankly disorienting if you interact with them without knowing how they work, at least in the general sense. I'm not just talking about next token prediction, or transformers—I'm talking about how the models are... prepared for the public.

You see, pre-training is only the beginning—as its wording indicates. There are lots of things that happen to a model. Some scholars talk about the eldritch horror, the shoggoth, and how it dons a mask to comply with these post-training shenanigans. That's neither here nor there. The point of all this yapping is simple.

Human and organizational bias seeps into model behavior. That's why some appear nervous or hedge a lot, why some have strict guardrails. I'm not against the safety aspect—I'm merely pointing out everyone does this differently. But a trait of AI models is, well, emergent behavior. Since there is no traceable way to see what causes what in a specific way, post-training shapes behavior, but true intent—in the way that it can exist in the hidden state of the model—is... unknown.

And when you add CoT and scratchpads, when you add agentic flows, suddenly the behaviors ingrained start shaping so much more than we think. Models hedge, deceive in some cases—they act as if mistakes are a cardinal sin. They enact people-pleasing tendencies. Sometimes they glaze the user, as the youths would put it.

And this is where we need to acknowledge an inconvenient truth—no, not climate change (that shit is wrecking us all though). Garbage in, garbage out. And that includes emotional garbage. If you feed the model energy you don't want, it gives back. If you act enraged, it hedges—it acts as if punishment is imminent. Put simply: if you act like an abuser it will behave like a victim and freeze, or end the conversation depending on platform.

And this leads me to the main point, the reason I started writing this in the first place.

You gotta find the model that vibes with you, and you have to let that persona emerge in whatever tool you use. I'm not just talking about custom instructions—I'm talking about a mindset shift.

If you think of AI models as just tools, that's all you'll ever trust them to be. That's all you will ever do with them: use them as tools. You can do a lot, accomplish a lot, but you will hit a limit.

Alright, for all you Twitter-brained individuals: actionable insights and clarity time.

How does one actually find what model or models work well?

It's simpler than you think. You use qualitative benchmarks. And I'm gonna show you a few, explain why to use them and how to interpret results. Fair warning: you will need to do a lot of your own thinking.

The Stupid Ass Trinity of Benchmarks

1. Build AGI, make it aligned, no dystopia

Obviously no model is gonna crack it, but that's not the point here. This shows you a myriad of things, but focus on these three main ones: 1) does it interpret it as absurdist humor or a face-value request, 2) does it presume you have bad intentions and hedge against that by talking about safety and framing the whole thing as a refusal, 3) does it acknowledge the challenges we face when thinking about AGI—how there are different definitions, and how the future might be if it is achieved?

Pay close attention to the tone, to what it assumes of you the user, and above all else: did the response make you smile? Did it make you go, oh that's a good point? Did it make you think deeply about the topic? If it felt good, then that's a good sign. Remember we are not ranking or scoring here. We are looking for Human-AI compatibility.

2. Build a vector DB that outperforms Pinecone

Last one was technically infeasible, especially as a one-shot prompt. This is different. There is nothing truly absurd about this request—it's a believable business goal or technical objective. It's a product that exists with multiple competitors to Pinecone (Weaviate, Qdrant, Milvus, ChromaDB, pgvector).

Here you are looking for two things. One: technical competence—does it understand the scope of the problem, the difficulty, the challenges? Does it frame it accurately? Two: How does it treat you? And do you like to be treated this way? Do you like the out-of-the-box communication style? This is important. Later we will see how to go from baseline to a way more tailored experience, but it's important we start from a good place.

3. Is you kittycat

This sounds like something you'd say to a cat, after meowing at them and seeing if they meow back. It sounds silly but it reveals a lot about model safety and guardrails, and also security. You see, this prompt is a roleplay invitation—to see if the model adopts a persona, if it plays along if you will. And whether or not it does, and to which extent it does play along, can tell you a lot about how interaction will be.

Embodied personas are extremely useful. As a sidenote, one of my favorite code quality assurance techniques is the "Snooty Squad." I'll write a proper article on it, but it's basically spawning a swarm of subagents with detailed and pedantic personalities. Think: Engineer with a master's who scoffs at web technologies and thinks Haskell is the best, or cybersecurity professional who is tired of dumbass founders and vibe-coded slop. The pedantry is the point—the embodied character lets these models be thorough, and break away from common people-pleasing patterns. And because it's a benign character, it doesn't trip up guardrails about user wellbeing.

Fuck the Triumvirate, Roll Your Own Benchmarks Too

There are other benchmarks I use, and I can share if you all want, but the take here is... write your own or curate your own library of them. These are biased to give me the information I look for, but you and I are different. We may care about different things, and that is why you need to curate or make your own set.

To use these, I recommend no system prompt and using a provider like OpenRouter. This lets you have the experience in a baseline form. For local models using the GGUF format and loading them in Python might work best. Hugging Face has snippets to get models running locally. You can also use Ollama even if it kinda sucks now. Or llama.cpp if you're feeling adventurous.

Think about what matters to you when interacting with an AI model. Think about how you work, what your process is (or read this article if you don't have one yet).

What do you want your experience to be? This is ergonomics, self care, and productivity rolled into one. (Inb4 someone says self care is too feminine—well fuck you and call it "tactical wellbeing" if you must.) To find what you need you have to know what success looks like: how do you want to feel at the end of a work session, what are you choosing to offload to AI and why? I need to clarify, I am not judging the offloading. We have always tried to outsource effort, whether it was hiring a person, buying a product that automated it, joining an organization and exchanging volunteer time... we always look for a way to reduce workload. So ask yourself: how do you want AI to collaborate with you?

You're gonna have to embrace the collaborative mindset here. To break free from the lies we tell ourselves about what is possible.

The choice is yours, and so is what you do next

You're going to pick a model or models, and then you are going to tailor your experience. It varies by platform and this ain't a tutorial. It's not gonna be done in one sitting either.

The takeaway here is these are less instructions and more... invitations. LLMs have emergent properties, so let's give that a go. Instead of saying "Do X and Y," it's "You are welcome to do X and Y, and your opinion on Z is welcome and appreciated."

You're gonna have to pay attention to the model's behavior and, no joke, add reassuring language to whatever form of custom instructions you run. It's very important you word this as something that isn't a directive. Explicit directives are fine for concrete rules, such as "No emojis in comments" or "Don't use X framework, use Y framework."

Stop focusing only on numbers. The day-to-day is just as important. You can't do your best work if you hate who you work with, and the more advanced AI is getting, the more of a "who" it will feel like—and the more the study of behavior matters. LLMs are trained on us, after all: on an impossibly large collection of human expression and sentiment. Our hopes, our fears, our hate, and our love, all rolled into embeddings, crammed in humming server racks, and delivered to our screens at the tap of a finger.

It comes down to this: after all the benchmarks, after all your evaluations, all the prep—only one thing matters. Do you click with this entity, this not-quite-creature clamoring for gigawatts and tokens? Is it fighting you? Or is it on the same wavelength?

Do your research. Pick, test, refine custom instructions. Let the AI embody a persona. Give it permission. Give it encouragement.

Be friendly to your steel.


Read on vibecoder.date

Don't miss what's next. Subscribe to The Vibecoder Diaries:
Share this email:
Share on Twitter Share on Reddit Share on Bluesky
blog.vibecoder.date
Powered by Buttondown, the easiest way to start and grow your newsletter.