Weekly links, Monday March 30, 2026


Hey everyone! It's been a while. Here are some of the most interesting things I have read recently:
https://ngrok.com/blog/quantization
Quantization allows you to make models ~4x smaller and ~2x faster while losing little accuracy. But how? This is the clearest explanation of quantization I have come across.
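To make the "~4x smaller" part concrete, here is a toy sketch of my own (not taken from the ngrok post): symmetric int8 quantization stores each float32 weight as one signed byte plus a single shared scale. The function names and the example weights are made up for illustration.

```python
import struct

def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from the stored integers and the shared scale."""
    return [v * scale for v in q]

# Made-up example weights, purely illustrative.
weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# float32 takes 4 bytes per weight, int8 takes 1 byte per weight.
fp32_bytes = len(struct.pack(f"{len(weights)}f", *weights))
int8_bytes = len(struct.pack(f"{len(q)}b", *q))

# Rounding error per weight is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The size ratio here is exactly 4 because of the byte widths; the speedup and accuracy claims depend on hardware and on the quantization scheme, which this sketch does not try to model.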
https://factory.ai/news/missions
These folks claim to have cracked long-running, goal-oriented multi-agent systems. I wonder if it works...
https://arxiv.org/abs/2603.19461
https://github.com/facebookresearch/Hyperagents
New bit of research from Meta's FAIR lab. Here is how it works:
1. Pick an eval you want a system / an LLM to get better at, for example a coding benchmark.
2. Start with a tree of codebases (we will see in a sec how it is created).
3. Run a script to pick which of the tree's nodes to iterate on.
4. Spin up a Docker container and load the codebase into it, along with all of the other iterations of the codebase and their scores in reference folders.
5. A "meta-agent" runs: it is told to look at the codebase and the other iterations and make any mods it wants.
   Note: the codebase contains the code for the node picker, the meta-agent, AND the task agent, so the meta-agent can modify the ENTIRE experimental system pretty much however it wants.
6. The modified "task agent" runs on whatever the task is.
7. The new, modified codebase, along with its score, is saved as a "child" of whatever node it was created from.

This setup lets the meta-agent change the search strategy across three layers: which nodes get picked for iteration, how the meta-agent works, and how the task agent works.
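The loop described above can be sketched roughly like this. It is a toy illustration, not the Hyperagents code: `Node`, `pick_parent`, `run_meta_agent`, `run_task_agent`, and `evolve` are hypothetical names, the "codebase" is just a dict, and both agents are stubbed with random numbers instead of LLM calls inside Docker.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    codebase: dict                      # stand-in for a full repo snapshot
    score: float                        # eval score of this iteration
    parent: "Node | None" = None
    children: list = field(default_factory=list)

def pick_parent(tree, rng):
    # Hypothetical selection rule: sample proportionally to score, with a
    # small floor so zero-score nodes can still be picked. In the real system
    # the picker is part of the codebase the meta-agent is allowed to edit.
    weights = [node.score + 0.05 for node in tree]
    return rng.choices(tree, weights=weights, k=1)[0]

def run_meta_agent(parent, tree, rng):
    # Stub: a real meta-agent would be an LLM editing files while looking at
    # the other iterations and their scores. Here we just nudge a number.
    child_code = dict(parent.codebase)
    child_code["skill"] = child_code["skill"] + rng.uniform(-0.05, 0.1)
    return child_code

def run_task_agent(codebase):
    # Stub eval: clamp the fake "skill" value into [0, 1] and call it a score.
    return max(0.0, min(1.0, codebase["skill"]))

def evolve(steps, seed=0):
    rng = random.Random(seed)
    root_code = {"skill": 0.1}
    tree = [Node(root_code, run_task_agent(root_code))]
    for _ in range(steps):
        parent = pick_parent(tree, rng)                   # choose a node
        child_code = run_meta_agent(parent, tree, rng)    # modify the codebase
        child = Node(child_code, run_task_agent(child_code), parent=parent)
        parent.children.append(child)                     # save as a child
        tree.append(child)
    return tree
```

Something like `best = max(evolve(50), key=lambda n: n.score)` would then pull out the highest-scoring iteration. The part this sketch cannot show is the interesting one: in the paper's setup, the picker and the meta-agent are themselves files the meta-agent can rewrite.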
Results:

Domain	Start → End	Notes
Polyglot Coding	8.4% → 26.7%	Matches original DGM (another similar setup https://sakana.ai/dgm/)
Paper Review	0.0 → 0.710	Beats AI-Scientist-v2 (0.630)
Robotics	0.060 → 0.372	Achieves jumping behavior
Cross-domain transfer	DGM: 0.0, DGM-H: 0.630	Train on review+robotics, test on math grading

Weird stuff they do:
They use LiteLLM with no native tool calling :< They just tell the models to output JSON and parse it.
This is the prompt: 
"""
{tools_available}
Use only one tool (if needed) in this format:

  {{
      "tool_name": ...,
      "tool_input": ...
  }}

ONLY USE ONE TOOL PER RESPONSE, AND STRICTLY FOLLOW THE FORMAT OF TOOL_NAME AND TOOL_INPUT ABOVE.
  DO NOT HALLUCINATE OR MAKE UP ANYTHING. 
"""
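For a sense of what "output JSON and parse it" involves on the receiving end, here is a minimal sketch. It is my guess at the general shape, not the repo's actual parser: `extract_tool_call` is a hypothetical helper, and the tool name in the example reply is invented.

```python
import json

def extract_tool_call(response_text):
    """Return (tool_name, tool_input) from the first JSON object in the text
    that has both keys, or None if the reply contains no tool call."""
    decoder = json.JSONDecoder()
    idx = response_text.find("{")
    while idx != -1:
        try:
            # raw_decode parses one JSON value starting at idx and ignores
            # whatever prose the model wrote after it.
            obj, _ = decoder.raw_decode(response_text, idx)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and "tool_name" in obj and "tool_input" in obj:
            return obj["tool_name"], obj["tool_input"]
        idx = response_text.find("{", idx + 1)
    return None

# Invented example reply; "bash" is a hypothetical tool name.
reply = 'I will list the files first.\n{\n  "tool_name": "bash",\n  "tool_input": {"command": "ls"}\n}\n'
call = extract_tool_call(reply)
```

Compared with native tool calling, all of the robustness (stray prose, malformed JSON, multiple objects) has to be handled by hand like this, which is presumably why the prompt shouts so much.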
Personal note:
I'm headed to Boston this weekend for the AISST / MAIA workshop. If you'll be there, respond and lmk! I'm excited to see you :)

                                Don't miss what's next. Subscribe to Julian Moncarz: