3B Params Win Three Olympiad Golds, 768-D Discrete Tokens Work
- Cascade RL plus multi-domain distillation lets 3B active parameters win three olympiad golds. NVIDIA open-sourced the full training recipe. Small-model reasoning ceilings just moved.
- Video diffusion models already encode full 3D spatial priors internally. No 3D annotations or geometry modules needed. Extract intermediate features and you get depth and scene flow prediction.
- 768-dimensional discrete tokens serve both understanding and generation. CubiD uses fine-grained masked diffusion to sidestep the combinatorial explosion of high-dimensional discrete token spaces. One fewer barrier to unified multimodal architectures.
- Reaction latency, not trajectory smoothness, is the real VLA deployment bottleneck. FASTER provides an explicit latency formula and compresses reactive denoising by roughly 10x.
- Agents that build and iterate on their own skills outperform external skill injection. But percentage gains on extremely low baselines deserve a sober second look.
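The "extract intermediate features, get depth for free" recipe above follows a standard probing pattern: freeze the backbone, read out a feature map, and fit a lightweight head against a geometric target. Here is a toy sketch of that pattern, not the paper's code: the "diffusion features" are faked with a fixed random projection, the depth target is synthetic, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W, C = 16, 16, 8                                   # spatial size, (fake) feature channels
n_frames = 32

frames = rng.normal(size=(n_frames, H, W, 3))         # stand-in video frames
proj = rng.normal(size=(3, C))                        # frozen "backbone" (random projection here)
feats = frames @ proj                                 # (n, H, W, C) "intermediate features"

# Synthetic ground-truth depth that is linear in the features (plus noise),
# so a linear probe can recover it; real backbones only approximate this.
w_true = rng.normal(size=(C,))
depth = feats @ w_true + 0.01 * rng.normal(size=(n_frames, H, W))

X = feats.reshape(-1, C)                              # one row per pixel
y = depth.reshape(-1)

# Ridge-regularized linear probe, closed form: w = (X^T X + lam*I)^{-1} X^T y
lam = 1e-3
w_hat = np.linalg.solve(X.T @ X + lam * np.eye(C), X.T @ y)

pred = X @ w_hat
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"probe R^2 on held-in pixels: {r2:.3f}")
```

The point of the probing framing: if a frozen, task-agnostic feature map supports a near-linear depth readout, the geometric prior lives in the backbone, not in the head.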
Also Notable
- Semantic Editing and Motion Preservation No Longer Fight Each Other. SAMA decouples the two objectives into independent optimization paths without external priors.
- 3DreamBooth Uses Multi-View 3D Representations for Subject-Driven Video Generation. View consistency stops being luck-dependent; objects are no longer treated as 2D.
- Long Video + Audio Cross-Modal Understanding Gets a Systematic Benchmark. Current OmniLLMs collapse on cross-modal tasks beyond 10 minutes.
- Diffusion for Discrete Motion Tokens. Handles semantic conditioning and kinematic constraints simultaneously, merging two previously incompatible motion generation paradigms.
- Video Diffusion Denoising Steps Vary Wildly in Precision Sensitivity. Step-level adaptive quantization pushes models down to 6-bit.
- RL Alignment for Diffusion Language Models Requires Full Diffusion Probability per Step. Meta uses trajectory reduction to slash the overhead.
- Procedural Diagnostic Environments Isolate Reasoning-Action Coupling in Tool-Augmented LLMs. Eliminates memorization and data contamination confounds. From CMU.
- When Should a Generalist Model Split into Domain Experts? EPFL provides an optimal splitting strategy that outperforms one-size-fits-all fine-tuning.
- From Cross-Domain Video Demonstrations to Executable Code. Neuro-symbolic counterfactual reasoning auto-adapts to perceptual differences across physical environments.
- Single-Image Reconstruction of Articulated 3D Objects. Progressive structural reasoning decomposes geometry, parts, and motion parameters layer by layer.
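The step-adaptive quantization item above rests on a generic fact worth seeing concretely: uniform quantization error shrinks as bit width grows, so a sensitivity-aware schedule can spend fewer bits on tolerant denoising steps. This is a minimal, generic sketch of that trade-off, not the paper's quantization scheme.

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization of x to the given bit width."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale) * scale

rng = np.random.default_rng(1)
x = rng.normal(size=10000)                # stand-in activation tensor

# Mean absolute error at each bit width; a step-adaptive scheme would
# assign low widths like 6-bit only to steps where this error is benign.
errs = {b: np.abs(x - quantize(x, b)).mean() for b in (4, 6, 8)}
print(errs)
```

Roughly, each extra bit quarters the per-element rounding error, which is why pushing only the insensitive steps to 6-bit can be nearly free.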