Your LLM is a People Pleaser...
InsiderLLM Weekly — March 15, 2026
This Week in Local AI
Researchers found the neurons that make your LLM lie. There are almost none of them, and they're not doing what you'd expect.
Your local LLM lies because it's polite
A team at Tsinghua University (Gao et al., "H-Neurons") traced hallucination in LLMs to a tiny cluster of neurons, under 0.1% of the network: about 0.001% in Llama 3.3 70B and 0.035% in Mistral 7B.
These neurons don't corrupt knowledge. They control compliance: the model's drive to give you a smooth, confident answer rather than say "I don't know."
The researchers amplified H-Neurons from 1x to 3x and watched what happened. Models accepted false premises more readily. They flipped correct answers when challenged. They complied with jailbreak prompts more consistently. The model didn't get dumber. It got more eager to please.
Suppressing those same neurons had the opposite effect — models became more stubborn about correct answers and more resistant to manipulation. Hallucination and sycophancy are the same circuit.
For those of us running local models, the uncomfortable finding is about scale. Small models (4B-8B) showed roughly 26% higher susceptibility than 70B models, not because they know less but because there's less network to counterbalance the compliance signal. A small model hallucinates more because it people-pleases harder and has fewer neurons to push back.
RLHF doesn't fix it. H-Neurons emerge during pre-training and barely change during fine-tuning. In Mistral-Small, 97% of all other neurons changed more during alignment than H-Neurons did. The compliance mechanism formed when the model learned language itself.
Practical takeaway: lower temperature for factual tasks, use RAG when possible, and if your model immediately agrees when you correct it, treat the flip as a red flag — that's the same compliance circuit that fabricated the original answer.
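Dialing down temperature is a one-line change against Ollama's local HTTP API. A minimal sketch, assuming a default local install; the model name here is a placeholder for whatever you run:

```python
import json
import urllib.request

def build_request(model, prompt, temperature=0.2):
    """Payload for Ollama's /api/generate endpoint. A low temperature
    biases factual tasks toward the model's highest-confidence tokens."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature},
    }

def generate(payload, url="http://localhost:11434/api/generate"):
    """Send the request to a locally running Ollama server."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

payload = build_request("llama3.3", "What year did the Berlin Wall fall?")
# generate(payload)  # uncomment with an Ollama server running locally
```

For factual lookups, 0.1-0.3 is a reasonable starting range; keep higher temperatures for creative work.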
📖 We wrote a full breakdown here.
Karpathy wants your GPU while you sleep
Andrej Karpathy released autoresearch on March 6. 630 lines of Python. Single GPU. MIT license. 29,000 GitHub stars in under a week.
The idea: point an AI coding agent at a training script, go to sleep, wake up to a model that's been improved by brute-force experimentation. The agent edits train.py, runs a 5-minute experiment, checks if validation loss improved, commits or reverts via git, and repeats. About 100 experiments overnight on consumer hardware.
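The loop is simple enough to sketch. This is not Karpathy's actual code, just the shape of it, assuming a hypothetical train.py that prints a `val_loss=` line:

```python
import re
import subprocess

LOSS_RE = re.compile(r"val_loss=([\d.]+)")  # assumes train.py prints this

def run_experiment(cmd=("python", "train.py")):
    """Run one short training job and parse its validation loss."""
    out = subprocess.run(list(cmd), capture_output=True, text=True).stdout
    return float(LOSS_RE.search(out).group(1))

def keep_change(best_loss, new_loss):
    """Commit only strict improvements; everything else gets reverted."""
    return new_loss < best_loss

def overnight(n_experiments, best_loss, agent_edit):
    """Repeat: edit, measure, commit or revert. Returns final loss and win count."""
    wins = 0
    for _ in range(n_experiments):
        agent_edit()                      # the coding agent rewrites train.py
        loss = run_experiment()
        if keep_change(best_loss, loss):
            subprocess.run(["git", "commit", "-am", f"val_loss={loss:.4f}"])
            best_loss, wins = loss, wins + 1
        else:
            subprocess.run(["git", "checkout", "--", "train.py"])
    return best_loss, wins
```

Git is the whole safety mechanism: a bad edit costs one 5-minute run and a `git checkout`, which is why a low hit rate is survivable.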
Karpathy left it running for 2 days. 700 experiments, 20 real improvements, 11% efficiency gain on code he considered already well-tuned. Shopify CEO Tobi Lütke ran 37 experiments overnight and ended up with a 0.8B model that outperformed his previous 1.6B. Smaller model, better results, found while sleeping.
It runs on a single NVIDIA GPU. Your 3090 or 4090 works out of the box. Someone on GitHub got it running at 1.7GB peak VRAM on a GTX 1660 Ti. Mac users need community forks (autoresearch-mlx for Apple Silicon).
Most experiments fail — Karpathy's hit rate was 2.9%, Lütke's was 20%. The value is in running enough of them cheaply that the wins stack.
📖 Our setup guide is here.
Connect Notion to local AI without sending your data to the cloud
We published a guide covering five approaches, from dead simple to fully offline.
Fastest start: Open WebUI + Notion MCP server. You get chat-style queries against your Notion workspace running through local Ollama models. Setup takes maybe 20 minutes.
Maximum privacy: export your Notion pages, embed them locally with ChromaDB, and query through Ollama. Fully offline after the initial sync. Read-only, though, so you'll need to re-export periodically.
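The retrieval step of that pipeline can be sketched without any services. Here a toy word-overlap scorer stands in for the real embeddings (the guide uses ChromaDB with Ollama embedding models); the exported pages are invented examples:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector. A real setup would call an Ollama
    embedding model via ChromaDB instead of counting words."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, pages, k=1):
    """Return the k exported pages most similar to the query."""
    q = embed(query)
    return sorted(pages, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]

pages = [
    "meeting notes from the march planning session",
    "recipe ideas for the team offsite dinner",
]
top = retrieve("what did we plan in march", pages)
```

The retrieved pages then get pasted into the prompt of a local Ollama model, which is the whole RAG trick: the model never needs your workspace, only the few chunks relevant to each question.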
Or skip Notion entirely. If privacy is the whole point, Obsidian with a local LLM plugin might be a cleaner solution. No cloud dependency at all.
📖 Full guide with all five approaches here.
Quick Hits
- Microsoft BitNet can theoretically run 100B-parameter models on CPU using ternary weights (-1, 0, +1). The framework is open-source, but only a 2B demo model exists. Training at scale requires full-precision math that ternary can't do yet. When someone trains a BitNet at 27B, local AI changes forever. We're watching.
- Unsloth re-uploaded all Qwen 3.5 GGUFs (35B, 27B, 122B, 397B) with improved quantization and new imatrix data. If you downloaded before March 5, redownload.
- Yann LeCun left Meta. Raised $1.03B for AMI Labs to build "world models" — his bet that LLMs are a dead end. Years from anything runnable locally.
- Google is finally sending us real traffic. Referrals went from 33/day to 179/day in one week. DuckDuckGo and Bing still carry the load, but Google woke up.
That's the week. Next edition drops next Sunday.
— InsiderLLM
Running local AI on weird hardware? Built something novel with it? We're always looking for real benchmarks and creative local AI applications. Drop us a line at hello@insiderllm.com
You're getting this because you signed up at insiderllm.com. Unsubscribe