4B Agent on 10K Data, MoE Upcycling Saves 32% Compute
- 10K Open Trajectories Train a 4B Deep Research Agent. DR-Venus combines agentic SFT with turn-level RL to deliver an edge-deployable agent that beats sub-9B agentic models and narrows the gap to the 30B class (a generic turn-level credit-assignment sketch follows after this list).
- Expanding MoE Experts from a Checkpoint Saves 32% GPU Time. Expert Upcycling copies existing experts and expands the router, then lets the copies re-differentiate during continued pretraining; choosing which experts to copy by gradient importance triples the gain (see the upcycling sketch after this list).
- 6,000 Real Coding-Agent Conversations Finally Public. SWE-chat reveals bimodal usage: 41% of sessions offload nearly everything to the agent, while in 23% humans write all the code; only 44% of agent-written code lands in commits, and 44% of turns contain pushback.
- Four Architectures Learn the Same Number Representations. Transformer, Linear RNN, LSTM, and word vectors all converge on periodic Fourier-domain features with periods T = 2, 5, and 10, but whether n mod T is linearly classifiable still depends on data format and optimizer (see the probe sketch after this list).
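For the DR-Venus item, the summary only names turn-level RL, so here is a minimal, generic sketch of what turn-level credit assignment means: every agent turn gets its own return target rather than one scalar reward for the whole trajectory. The discount factor and example rewards are illustrative assumptions, not the paper's recipe.

```python
# Generic turn-level credit assignment: each turn receives its own discounted
# return target instead of a single trajectory-level reward. Illustrative only.
from typing import List


def turn_level_returns(turn_rewards: List[float], gamma: float = 0.95) -> List[float]:
    """Discounted return for every turn, computed backwards through the episode."""
    returns, running = [], 0.0
    for r in reversed(turn_rewards):
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))


# Example: three agent turns where only the final turn completes the task;
# each turn still gets its own learning target for the policy update.
print(turn_level_returns([0.0, 0.2, 1.0]))  # approx [1.09, 1.15, 1.0]
```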
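For the Expert Upcycling item, here is a minimal PyTorch sketch of the mechanism as summarized above: copy the most important experts, grow the router, and lightly perturb the new router rows so the clones can re-differentiate during continued pretraining. It assumes an MoE layer exposed as an `nn.ModuleList` of experts plus an `nn.Linear` router with bias, and uses accumulated-gradient norms as the importance score; the function names and selection rule are assumptions, not the paper's implementation.

```python
# Sketch: duplicate gradient-important experts and expand the router to match.
# Assumes gradients were accumulated by a recent backward pass.
import copy
import torch
import torch.nn as nn


def gradient_importance(experts: nn.ModuleList) -> torch.Tensor:
    """Score each expert by the L2 norm of its accumulated gradients."""
    scores = []
    for expert in experts:
        total = sum(
            (p.grad.pow(2).sum() for p in expert.parameters() if p.grad is not None),
            torch.tensor(0.0),
        )
        scores.append(total.sqrt())
    return torch.stack(scores)


def upcycle(experts: nn.ModuleList, router: nn.Linear, num_new: int, noise: float = 1e-3):
    """Duplicate the num_new highest-scoring experts and grow the router to match."""
    top = torch.topk(gradient_importance(experts), num_new).indices.tolist()
    for idx in top:  # append deep copies of the selected experts
        experts.append(copy.deepcopy(experts[idx]))

    new_router = nn.Linear(router.in_features, len(experts))
    with torch.no_grad():
        new_router.weight[: router.out_features] = router.weight
        new_router.bias[: router.out_features] = router.bias
        for i, idx in enumerate(top):  # copy + lightly perturb rows for the clones
            row = router.out_features + i
            new_router.weight[row] = router.weight[idx] + noise * torch.randn_like(router.weight[idx])
            new_router.bias[row] = router.bias[idx]
    return experts, new_router
```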
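For the number-representation item, this probe sketch covers the two measurements implied by the summary: spectral power at period T in the embeddings of consecutive integers, and a linear probe for n mod T. It assumes a precomputed embedding matrix for the integers 0..N-1; the logistic-regression probe and the random placeholder data are illustrative, not the paper's setup.

```python
# (1) How much spectral power the integer embeddings concentrate at period T,
# (2) whether n mod T is linearly classifiable from those embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression


def periodic_power(emb: np.ndarray, T: int) -> float:
    """Fraction of (non-DC) spectral power at period T, summed over embedding dims."""
    N = emb.shape[0]
    spectrum = np.abs(np.fft.rfft(emb - emb.mean(axis=0), axis=0)) ** 2
    freqs = np.fft.rfftfreq(N)                   # cycles per integer step
    target = np.argmin(np.abs(freqs - 1.0 / T))  # frequency bin closest to period T
    return float(spectrum[target].sum() / spectrum[1:].sum())


def mod_T_probe_accuracy(emb: np.ndarray, T: int, seed: int = 0) -> float:
    """Held-out accuracy of a linear probe predicting n mod T from the embedding."""
    N = emb.shape[0]
    labels = np.arange(N) % T
    idx = np.random.default_rng(seed).permutation(N)
    train, test = idx[: N // 2], idx[N // 2 :]
    clf = LogisticRegression(max_iter=1000).fit(emb[train], labels[train])
    return float(clf.score(emb[test], labels[test]))


emb = np.random.randn(1000, 64)  # placeholder; substitute real number embeddings
for T in (2, 5, 10):
    print(T, periodic_power(emb, T), mod_T_probe_accuracy(emb, T))
```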
Also Notable
- Three-Tier Scoring for Continuous Agent Skill Learning. Skill quality, execution trace, and task outcome are each scored separately, giving finer-grained evaluation along the SkillFlow line of work. SkillLearnBench
- LLM Game-Code Generation Moves from "Runs Once" to "Iterates Across Versions". The focus shifts to cross-version experience reuse, since single-shot generation hits a ceiling that only iteration can break. CreativeGame
- Surrogate Models Approximate Black-Box LLM Behavior in Medical Prediction. The approach is worth noting, though its effectiveness hinges on details in the full paper (a rough sketch of the idea follows below). Surrogate modeling for LLMs
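A rough sketch of the surrogate-modeling idea under stated assumptions: query the black-box LLM for labels on tabular records, then fit a small interpretable model to those labels so the decision behavior can be inspected. The dummy `llm_predict` rule, the decision-tree surrogate, and the synthetic features are placeholders, not the paper's setup.

```python
# Fit an interpretable surrogate to a black-box LLM's own predictions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def llm_predict(record: np.ndarray) -> int:
    """Stand-in for the black-box LLM call (e.g., a prompted risk prediction).
    Replace with the real API call; this dummy rule just makes the sketch run."""
    return int(record.sum() > 0)


def fit_surrogate(records: np.ndarray) -> DecisionTreeClassifier:
    """Fit a shallow, inspectable tree to the LLM's predictions on the same records."""
    llm_labels = np.array([llm_predict(r) for r in records])
    surrogate = DecisionTreeClassifier(max_depth=3)
    surrogate.fit(records, llm_labels)
    return surrogate


records = np.random.randn(500, 8)  # synthetic tabular "patient" features
tree = fit_surrogate(records)
print(tree.score(records, [llm_predict(r) for r in records]))  # surrogate fidelity
```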