Julian Moncarz

March 30, 2026

Weekly links, Monday March 30, 2026

Hey everyone! It's been a while. Here are some of the most interesting things I have read recently:

https://ngrok.com/blog/quantization

Quantization allows you to make models ~4x smaller and ~2x faster while losing little accuracy. But how? This is the clearest explanation of quantization I have come across.
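To make the "~4x smaller" figure concrete, here is a minimal sketch of one simple scheme, symmetric per-tensor int8 quantization. It is my own toy example, not code from the ngrok post: the function names and the random weights are made up for illustration. Each 4-byte float32 weight becomes a 1-byte integer plus one shared scale factor.

```python
from array import array
import random


def quantize_int8(weights):
    """Map floats to int8 values using a single shared scale (symmetric scheme)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range we use.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Recover approximate float32 weights from the int8 values."""
    return array("f", (v * scale for v in q))


random.seed(0)
# Toy "weights": 1024 float32 values drawn from a standard normal.
weights = array("f", (random.gauss(0.0, 1.0) for _ in range(1024)))

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# float32 uses 4 bytes per value, int8 uses 1 byte per value.
size_ratio = (weights.itemsize * len(weights)) / (q.itemsize * len(q))
# Rounding error is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"size ratio: {size_ratio}, scale: {scale:.5f}, max error: {max_error:.5f}")
```

Real schemes are usually fancier than this (per-channel or per-block scales, for example), which is part of how they keep the accuracy loss small.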

https://factory.ai/news/missions

These folks claim to have cracked long-running, goal-oriented multi-agent systems. I wonder if it works...

https://arxiv.org/abs/2603.19461

https://github.com/facebookresearch/Hyperagents

New bit of research from Meta's FAIR lab. Here is how it works:

Pick an eval you want a system or an LLM to get better at, for example a coding benchmark. You start with a tree of codebases (we will see in a sec how this tree is created).

  1. Run a script to pick which of the tree's nodes to iterate on
  2. Spin up a docker container and load the codebase into it, along with all of the other iterations of the codebase and their scores in reference folders.
  3. A "meta-agent" runs - it is told to look at the codebase and the other iterations and make any mods it wants.

Note: the codebase contains the code for the node picker, the meta agent, AND the task agent - the meta agent can modify the ENTIRE experimental system pretty much however it wants.

  4. The modified "task agent" runs on whatever the task is.
  5. The new, modified codebase along with the score is saved as a "child" of whatever node it was created from.

This system lets the meta agent change the search strategy across three layers: which nodes we choose to iterate on, how the meta-agent works and how the task agent works.
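The loop described above can be sketched roughly like this. To be clear, this is a toy under heavy assumptions and not the Hyperagents code: `Node`, `pick_node`, `run_meta_agent`, `run_task_agent` and the `skill` number are all hypothetical stand-ins I made up. There is no Docker and no LLM, and the picker and meta agent are fixed functions here, whereas in the real system they live inside the codebase and can be rewritten too.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    codebase: dict                      # stand-in for a real codebase
    score: float
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)


def pick_node(archive, rng):
    """Hypothetical node picker: sample a node, favoring higher scores."""
    weights = [node.score + 0.1 for node in archive]
    return rng.choices(archive, weights=weights, k=1)[0]


def run_meta_agent(codebase, archive, rng):
    """Stub meta agent: a real one would be an LLM editing code with the
    other iterations as reference; here we just nudge a number."""
    new_codebase = dict(codebase)
    nudged = codebase["skill"] + rng.uniform(-0.05, 0.15)
    new_codebase["skill"] = min(1.0, max(0.0, nudged))
    return new_codebase


def run_task_agent(codebase):
    """Stub task agent: a real one would run the eval; here the score is the toy skill."""
    return codebase["skill"]


def evolve(iterations, seed=0):
    rng = random.Random(seed)
    root_codebase = {"skill": 0.1}
    archive = [Node(root_codebase, run_task_agent(root_codebase))]
    for _ in range(iterations):
        parent = pick_node(archive, rng)                      # choose a node to iterate on
        child_codebase = run_meta_agent(parent.codebase, archive, rng)
        score = run_task_agent(child_codebase)                # evaluate the modified codebase
        child = Node(child_codebase, score, parent=parent)
        parent.children.append(child)                         # save as a child of its source node
        archive.append(child)
    return archive


archive = evolve(20)
best = max(archive, key=lambda node: node.score)
print(f"nodes: {len(archive)}, best score: {best.score:.3f}")
```

Because every child stays in the tree, a low-scoring branch can still be picked later, which is the main difference from plain hill climbing.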

Results:

| Domain | Start → End | Notes |
| --- | --- | --- |
| Polyglot Coding | 8.4% → 26.7% | Matches original DGM (another similar setup https://sakana.ai/dgm/) |
| Paper Review | 0.0 → 0.710 | Beats AI-Scientist-v2 (0.630) |
| Robotics | 0.060 → 0.372 | Achieves jumping behavior |
| Cross-domain transfer | DGM: 0.0, DGM-H: 0.630 | Train on review+robotics, test on math grading |

Weird stuff they do: they use LiteLLM with no native tool calling :< They just tell the models to output JSON and then parse it out of the response.

This is the prompt:

"""

{tools_available}

Use only one tool (if needed) in this format: {{ "tool_name": ..., "tool_input": ... }}

ONLY USE ONE TOOL PER RESPONSE, AND STRICTLY FOLLOW THE FORMAT OF TOOL_NAME AND TOOL_INPUT ABOVE. DO NOT HALLUCINATE OR MAKE UP ANYTHING.

"""
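For a sense of what "output JSON and parse it" involves, here is a minimal sketch of a parser for that format. This is my guess at the general shape, not the parser from the Hyperagents repo: `parse_tool_call` is a hypothetical helper, and the `bash` tool name in the example is made up. It scans the reply for the first JSON object that has both `tool_name` and `tool_input` keys.

```python
import json


def parse_tool_call(response_text):
    """Return (tool_name, tool_input) from the first matching JSON object, or None."""
    decoder = json.JSONDecoder()
    start = response_text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            obj, _ = decoder.raw_decode(response_text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and "tool_name" in obj and "tool_input" in obj:
            return obj["tool_name"], obj["tool_input"]
        # Not a tool call: try the next opening brace.
        start = response_text.find("{", start + 1)
    return None


# Example model reply (made up): prose followed by a tool call.
reply = 'I will list the files first.\n{"tool_name": "bash", "tool_input": {"command": "ls"}}'
call = parse_tool_call(reply)
no_call = parse_tool_call("No tool needed, the task is done.")
print(call, no_call)
```

The fragile part is everything native tool calling would handle for you: malformed JSON, stray braces in prose, or the model emitting two calls at once.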

Personal note:

I'm headed to Boston this weekend for the AISST / MAIA workshop. If you'll be there, respond and lmk! I'm excited to see you :)
