Olympiad Gold Becomes a Two-Step Recipe
- Olympiad Gold Becomes a Portable Two-Step Recipe. SU-01 combines reverse-perplexity curriculum SFT with two-stage RL. A 30B-A3B backbone clears IMO and IPhO gold. The result matters only if the recipe ports to other backbones.
- Multi-Turn Agent Rewards Are Too Coarse to Learn From. SDAR demotes self-distillation to a gated auxiliary objective. It reports gains of 7–10 points over GRPO on ALFWorld, WebShop, and Search-QA.
- AR Accuracy and Diffusion Speed in the Same Frame. Orthrus uses a dual architecture that shares one KV cache. The authors claim lossless inference and up to 7.8× speedup.
- Camera-Controlled Video May Not Need a Dedicated Encoder. Warp-as-History feeds camera-induced warps as pseudo history frames. Frozen models follow trajectories zero-shot.
- Multi-Hop RAG's Bottleneck Isn't Retrieval — It's Hidden State. PyRAG writes reasoning as executable Python. Errors get caught by the runtime, not by the model's self-check.
Also Notable
- MemEye Takes "Answers Derivable From the Caption" Seriously. Builds an eval that only credits answers requiring fine-grained visual evidence, raising the bar for multimodal agent memory.
- Survey on Multi-Agent Failure Attribution. Errors propagate across agents and resist diagnosis. Skim if you're shipping multi-agent products.
- Many-Shot ICL Scaling Laws Don't Hold for CoT or Reasoning Tasks. Counterintuitive prompt-tuning advice for long-context reasoning.
- Orchard: Open-Source Agent-Training Framework, Not Just Orchestration. Fills the open-source agent training infra gap.
- Reasoning RL Self-Improvement Moves From "Generate Data" to "Generate Environments." A concrete instance of zero-data self-evolution.
- SFT Data Selection Has a Generalization-vs-Extrapolation Tradeoff. Explains why perplexity, length, and difficulty heuristics keep disagreeing.
- RealICU Drops "Doctor's Historical Action" as Ground Truth. Long-context ICU clinical agent benchmark and a methodological upgrade for medical AI eval.
- VGGT-Edit: Feed-Forward 3D Scene Editing. Uses residual field prediction for dynamic response, relevant to 3D content tooling.
- Video2GUI Converts Video Into GUI Interaction Trajectories. Aimed at GUI agent pretraining, it attacks GUI data scarcity directly.
- Nexus: Time-Series Forecasting Plus Text Context in an Agentic Framework. One engineering pattern for stitching TSFMs and LLMs together.