AI Research Brief

March 21, 2026

3B Params Win Three Olympiad Golds, 768-D Discrete Tokens Work

  • Cascade RL plus multi-domain distillation lets a model with just 3B active parameters win three olympiad golds. NVIDIA open-sourced the full training recipe. Small-model reasoning ceilings just moved.
  • Video diffusion models already encode full 3D spatial priors internally. No 3D annotations or geometry modules needed. Extract intermediate features and you get depth and scene flow prediction.
  • 768-dimensional discrete tokens serve both understanding and generation. CubiD uses fine-grained masked diffusion to sidestep high-dimensional combinatorial explosion. One fewer barrier to unified multimodal architectures.
  • Reaction latency, not trajectory smoothness, is the real VLA deployment bottleneck. FASTER provides an explicit formula and compresses reactive denoising by roughly 10x.
  • Agents that build and iterate their own skills outperform external skill injection. But percentage gains on extremely low baselines deserve a sober second look.
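The depth claim in the second bullet is a probing result: frozen intermediate features are read out by a lightweight head. As an illustration only, not the paper's pipeline, here is a toy linear probe on stand-in data; the random features, the synthetic depth target, and the closed-form ridge readout are all assumptions for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen intermediate features from a video diffusion
# model: N pixels, each with a D-dimensional feature vector.
N, D = 2000, 64
features = rng.normal(size=(N, D))

# Synthetic "ground-truth" depth that is (noisily) linear in the
# features -- a toy stand-in for real depth supervision.
w_true = rng.normal(size=D)
depth = features @ w_true + 0.1 * rng.normal(size=N)

# Linear probe: closed-form ridge regression from features to depth.
lam = 1e-2
w_probe = np.linalg.solve(features.T @ features + lam * np.eye(D),
                          features.T @ depth)
pred = features @ w_probe

# If the features already encode depth, a purely linear readout fits well.
r2 = 1 - np.sum((depth - pred) ** 2) / np.sum((depth - depth.mean()) ** 2)
print(f"probe R^2: {r2:.3f}")
```

The point of a linear (rather than learned nonlinear) probe is that a good fit implies the geometric information is present in the features themselves, not manufactured by the readout.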

Also Notable

  • Semantic Editing and Motion Preservation No Longer Fight Each Other. SAMA decouples the two objectives into independent optimization paths without external priors.
  • 3DreamBooth Uses Multi-View 3D Representations for Subject-Driven Video Generation. View consistency stops being luck-dependent; objects are no longer treated as 2D.
  • Long Video + Audio Cross-Modal Understanding Gets a Systematic Benchmark. Current OmniLLMs collapse on cross-modal tasks beyond 10 minutes.
  • Diffusion for Discrete Motion Tokens. Handles semantic conditioning and kinematic constraints simultaneously, merging two previously incompatible motion generation paradigms.
  • Video Diffusion Denoising Steps Vary Wildly in Precision Sensitivity. Step-level adaptive quantization pushes models down to 6-bit.
  • RL Alignment for Diffusion Language Models Requires Full Diffusion Probability per Step. Meta uses trajectory reduction to slash the overhead.
  • Procedural Diagnostic Environments Isolate Reasoning-Action Coupling in Tool-Augmented LLMs. Eliminates memorization and data contamination confounds. From CMU.
  • When Should a Generalist Model Split into Domain Experts? EPFL provides an optimal splitting strategy that outperforms one-size-fits-all fine-tuning.
  • Cross-Domain Video Demonstrations to Executable Code. Neuro-symbolic counterfactual reasoning auto-adapts to perceptual differences across physical environments.
  • Single-Image Reconstruction of Articulated 3D Objects. Progressive structural reasoning decomposes geometry, parts, and motion parameters layer by layer.
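The step-adaptive quantization bullet above rests on a simple observation: denoising steps differ in how much precision they need, so bits can be allocated per step. A toy numpy sketch of the idea, not the paper's method; the sensitivity scores and the bit-allocation rule here are made-up illustrations:

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization of x to the given bit-width."""
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / levels
    return np.round(x / scale) * scale

rng = np.random.default_rng(0)
activations = rng.normal(size=10_000)

# Hypothetical per-step sensitivity scores: steps whose output barely
# changes the final sample tolerate coarser quantization.
sensitivity = {"step_early": 0.9, "step_mid": 0.4, "step_late": 0.1}

# Toy bit allocation: more sensitive steps get more bits (8 down to 4).
bits_for = {s: 4 + round(4 * v) for s, v in sensitivity.items()}

for step, bits in bits_for.items():
    err = np.mean((activations - quantize(activations, bits)) ** 2)
    print(f"{step}: {bits}-bit, MSE {err:.2e}")
```

Insensitive steps dropping to 4-bit while sensitive ones stay at 8-bit is what lets the average precision land around 6-bit without uniform-quantization quality loss.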

