Olympiad Gold Becomes a Two-Step Recipe

May 16, 2026

Olympiad Gold Becomes a Portable Two-Step Recipe. SU-01 combines reverse-perplexity curriculum SFT with two-stage RL. A 30B-A3B backbone clears IMO and IPhO gold. Whether the recipe ports to other backbones decides whether this matters.

Multi-Turn Agent Rewards Are Too Coarse to Learn From. SDAR demotes self-distillation to a gated auxiliary objective. Reported gains are 7–10 points over GRPO on ALFWorld, WebShop, and Search-QA.

AR Accuracy and Diffusion Speed in the Same Frame. Orthrus uses a dual architecture that shares one KV cache. The authors claim lossless inference and up to 7.8× speedup.

Camera-Controlled Video May Not Need a Dedicated Encoder. Warp-as-History feeds camera-induced warps as pseudo-history frames. Frozen models follow trajectories zero-shot.

Multi-Hop RAG's Bottleneck Isn't Retrieval — It's Hidden State. PyRAG writes reasoning as executable Python. Errors get caught by the runtime, not by the model's self-check.

Also Notable

MemEye Takes "Answers Derivable From the Caption" Seriously. Builds an eval that only credits answers requiring fine-grained visual evidence, raising the bar for multimodal agent memory.
Survey on Multi-Agent Failure Attribution. Errors propagate across agents and resist diagnosis. Skim if you're shipping multi-agent products.
Many-Shot ICL Scaling Laws Don't Hold for CoT or Reasoning Tasks. Counterintuitive prompt-tuning advice for long-context reasoning.
Orchard: Open-Source Agent-Training Framework, Not Just Orchestration. Fills the open-source agent training infra gap.
Reasoning RL Self-Improvement Moves From "Generate Data" to "Generate Environments." A concrete instance of zero-data self-evolution.
SFT Data Selection Has a Generalization-vs-Extrapolation Tradeoff. Explains why perplexity, length, and difficulty heuristics keep disagreeing.
RealICU Drops "Doctor's Historical Action" as Ground Truth. Long-context ICU clinical agent benchmark and a methodological upgrade for medical AI eval.
VGGT-Edit: Feed-Forward 3D Scene Editing. Uses residual field prediction for dynamic response, relevant to 3D content tooling.
Video2GUI Converts Video Into GUI Interaction Trajectories. Aimed at GUI agent pretraining and attacking GUI data scarcity directly.
Nexus: Time-Series Forecasting Plus Text Context in an Agentic Framework. One engineering pattern for stitching TSFMs and LLMs together.
