AI Research Brief

March 20, 2026

3D at 0.1% Tokens, Video Fine-Tuning's Hidden Spatial Cost

  • Misaligned experience replay is a silent bottleneck in agent RL. Complementary RL lets the experience extractor adapt based on policy performance, enabling co-evolution instead of static accumulation.
  • Video-SFT's temporal gains come at the cost of spatial understanding. Systematic experiments across architectures and scales confirm this is a structural trade-off, not a model-specific bug.
  • Video generation as auxiliary supervision for robot policies, disabled at deployment. GigaWorld-Policy's decoupled design runs 9x faster than Motus with a 7% higher success rate.
  • 3D tokenization shifts from geometric to semantic hierarchy. LoST achieves better reconstruction quality with 0.1% of prior methods' token count.
  • Token pruning across ViT and LLM finally unified. STTS is end-to-end trainable, with efficiency gains that scale as you sample more frames for long videos.
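STTS's actual scoring and training recipe aren't detailed in this brief. As a generic illustration of the family it belongs to, here is a minimal sketch of score-based token pruning: rank tokens by an importance score (e.g. attention mass), keep the top fraction, and preserve their original order. The function name and signature are hypothetical, not STTS's API.

```python
def prune_tokens(tokens, scores, keep_ratio=0.25):
    """Keep the highest-scoring tokens, preserving their original order.

    tokens: sequence of token embeddings (any objects)
    scores: per-token importance scores, same length as tokens
    keep_ratio: fraction of tokens to retain (at least one is kept)
    """
    assert len(tokens) == len(scores)
    k = max(1, int(len(tokens) * keep_ratio))
    # Indices of the k largest scores, re-sorted so order is preserved.
    top = sorted(
        sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    )
    return [tokens[i] for i in top]
```

Pruning at the vision-encoder stage shrinks the input to every downstream LLM layer, which is consistent with the claim that savings grow as more frames are sampled for long videos.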

Also Notable

  • Discrete Audio Tokens at 12.5fps for Autoregressive Speech Generation. A complete, deployable open-source foundation model pipeline. MOSS-TTS
  • Joint Appearance and Stereo Geometry from Pure RGB. End-to-end stereo video generation without depth maps. StereoWorld
  • MLLMs Expose Systematic Hallucination on Fine-Grained Negation. Passable coarse-grained scores mask failure modes at finer granularity. FINER
  • Hierarchical Grids Compress Long Video Navigation to Log-Scale Compute. Operates directly on raw frames, no caption preprocessing needed. VideoAtlas
  • Majority Voting in Label-Free RL Collapses Output Diversity. Co-evolving generator and verifier breaks the consensus trap. CoVerRL
  • RL-Trained Code Search Agent for Precise Large-Repo Localization. The upstream bottleneck for coding agents: get localization wrong and everything downstream fails. CodeScout
  • Integrated Gradients Guide Layer-Wise Mixed-Precision LVLM Quantization. Quantization sensitivity visualization directly cuts deployment cost. QAIG
  • GQA-to-MLA via Low-Rank Factorization. Drops KV-cache overhead without retraining. CARE
  • A New Fix for DPO's Squeezing Effect. Sharpness-aware optimization in logit space balances alignment and generalization. LogitSAM
  • Adaptive Zoom and Instruction Refinement for Small GUI Elements. A practical GRPO-trained approach. AdaZoom-GUI
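On the CARE item: the brief doesn't spell out the conversion procedure, but the underlying idea of GQA-to-MLA compression can be sketched with a truncated SVD. Factor a KV projection W into W_down @ W_up, then cache only the per-token latent x @ W_down (rank dims) instead of x @ W (d_kv dims). All names, dimensions, and the plain-SVD choice here are illustrative assumptions, not CARE's method.

```python
import numpy as np

def factorize_kv(W, rank):
    """Truncated SVD: approximate W (d_model x d_kv) as W_down @ W_up.

    W_down (d_model x rank) maps each token to a small latent that is
    cached; W_up (rank x d_kv) expands it back at attention time.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    W_down = U[:, :rank] * S[:rank]
    W_up = Vt[:rank, :]
    return W_down, W_up

# A projection with low effective rank, which makes the factorization
# near-lossless; shared KV heads in GQA tend toward this regime.
rng = np.random.default_rng(0)
d_model, d_kv, rank = 256, 128, 32
W = rng.standard_normal((d_model, rank)) @ rng.standard_normal((rank, d_kv))

W_down, W_up = factorize_kv(W, rank)
# Per-token KV cache shrinks from d_kv floats to rank floats (4x here),
# while W_down @ W_up reproduces W up to floating-point error.
```

No retraining is involved in this sketch: the factorization is a weight-space surgery, which matches the "without retraining" claim, though a real conversion would likely calibrate the truncation rank against accuracy.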

Read the full edition →
