Weekly links, Monday March 30, 2026
Hey everyone! It's been a while. Here are some of the most interesting things I have read recently:
https://ngrok.com/blog/quantization
Quantization allows you to make models ~4x smaller and ~2x faster while losing little accuracy. But how? This is the clearest explanation of quantization I have come across.
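To make the "~4x smaller" number concrete: one place a 4x can come from is storing each weight as an 8-bit integer instead of a 32-bit float. Below is a toy sketch of symmetric per-tensor int8 quantization in plain Python. It is my own illustration, not code from the ngrok post, and the function names are made up.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate floats from the stored integers."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.013, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Rounding to the nearest integer step bounds the error by half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The trade-off shows up in `max_error`: each weight can be off by up to half of `scale`, so a single large outlier weight makes every other weight coarser. Real schemes typically use per-channel or per-block scales to limit that.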
https://factory.ai/news/missions
These folks claim to have cracked long-running, goal-oriented multi-agent systems. I wonder if it works...
https://arxiv.org/abs/2603.19461
https://github.com/facebookresearch/Hyperagents
New bit of research from Meta's FAIR lab. Here is how it works:
Pick an eval you want a system or an LLM to get better at, for example a coding benchmark. You keep a tree of codebases (we will see in a sec how it gets built). Each iteration:
- Run a script to pick which of the tree's nodes to iterate on
- Spin up a Docker container and load the codebase into it, along with all of the other iterations of the codebase and their scores in reference folders.
- A "meta-agent" runs - it is told to look at the codebase and the other iterations and make any mods it wants.
Note: the codebase contains the code for the node picker, the meta-agent, AND the task agent, so the meta-agent can modify the ENTIRE experimental system pretty much however it wants.
- The modified "task agent" runs on whatever the task is.
- The new, modified codebase along with the score is saved as a "child" of whatever node it was created from.
This system lets the meta-agent change the search strategy across three layers: which nodes we choose to iterate on, how the meta-agent works, and how the task agent works.
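Here is a rough sketch of that loop as I understand it. Everything in it is a stand-in I made up for illustration: the "codebase" is a dict with one number in it, the meta-agent is a random perturbation instead of an LLM, and the node picker is a simple score-weighted sample. In the real system the picker and the meta-agent live inside the codebase and can themselves be rewritten; this toy keeps them fixed.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    codebase: dict   # stand-in for a real repo, e.g. {"skill": 0.1}
    score: float
    children: list = field(default_factory=list)


def all_nodes(root):
    """Flatten the tree into a list (the 'archive' of past iterations)."""
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(node.children)
    return out


def pick_parent(nodes, rng):
    """Hypothetical node picker: sample by score, with a floor so nothing starves."""
    weights = [n.score + 0.05 for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


def meta_agent(parent, archive, rng):
    """Stand-in for the LLM meta-agent: returns a modified copy of the codebase."""
    child = dict(parent.codebase)
    child["skill"] = min(1.0, max(0.0, child["skill"] + rng.uniform(-0.05, 0.15)))
    return child


def run_task_agent(codebase):
    """Stand-in for running the benchmark; here the score is just 'skill'."""
    return codebase["skill"]


def evolve(root, iterations, seed=0):
    rng = random.Random(seed)
    for _ in range(iterations):
        archive = all_nodes(root)
        parent = pick_parent(archive, rng)             # choose a node to iterate on
        child_code = meta_agent(parent, archive, rng)  # modify the codebase
        score = run_task_agent(child_code)             # evaluate the modified agent
        parent.children.append(Node(child_code, score))  # save as a child
    return max(all_nodes(root), key=lambda n: n.score)


root = Node({"skill": 0.1}, 0.1)
best = evolve(root, iterations=50)
```

Because every result is kept as a child node rather than replacing its parent, a bad modification never destroys a good ancestor, and the picker can always go back to an older branch.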
Results:
| Domain | Start → End | Notes |
|---|---|---|
| Polyglot Coding | 8.4% → 26.7% | Matches the original DGM (a similar setup: https://sakana.ai/dgm/) |
| Paper Review | 0.0 → 0.710 | Beats AI-Scientist-v2 (0.630) |
| Robotics | 0.060 → 0.372 | Achieves jumping behavior |
| Cross-domain transfer | DGM: 0.0, DGM-H: 0.630 | Train on review+robotics, test on math grading |
Weird stuff they do: they use LiteLLM with no native tool calling :< They just tell the models to output JSON and parse it.
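For a sense of what "output JSON and parse it" involves, here is a minimal sketch of pulling a tool call out of free-form model text. This is not their parser. The key names `tool_name` and `tool_input` are my guess based on the wording of their prompt, and `parse_tool_call` is a hypothetical helper.

```python
import json


def parse_tool_call(response_text):
    """Return (tool_name, tool_input) from the first matching JSON object, else None."""
    decoder = json.JSONDecoder()
    start = response_text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores any trailing text after it.
            obj, _ = decoder.raw_decode(response_text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and "tool_name" in obj and "tool_input" in obj:
            return obj["tool_name"], obj["tool_input"]
        # Not a usable tool call here; try the next opening brace.
        start = response_text.find("{", start + 1)
    return None


reply = 'I will list the files.\n{"tool_name": "bash", "tool_input": {"command": "ls"}}'
call = parse_tool_call(reply)
no_call = parse_tool_call("No tool needed, the answer is 4.")
```

The fragile part is that nothing forces the model to emit valid JSON, so a parser like this has to tolerate chatter around the object and return nothing when the model skips the tool. Native tool calling pushes that validation onto the provider.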
This is the prompt:
"""
{tools_available}
Use only one tool (if needed) in this format:
ONLY USE ONE TOOL PER RESPONSE, AND STRICTLY FOLLOW THE FORMAT OF TOOL_NAME AND TOOL_INPUT ABOVE. DO NOT HALLUCINATE OR MAKE UP ANYTHING.
"""
Personal note:
I'm headed to Boston this weekend for the AISST / MAIA workshop. If you'll be there, respond and lmk! I'm excited to see you :)