AI Research Brief

April 8, 2026

Streaming Video QA Hits 2 FPS, RLVR Shrugs Off Noisy Labels

  • VideoLLM achieves 2 FPS streaming video QA. AURA unifies continuous perception and proactive response in one end-to-end architecture, with ASR+TTS integrated into a working interactive prototype.
  • Agent belief maintenance gets its first systematic benchmark. ClawArena covers 64 scenarios with dynamic information updates, finding that framework design accounts for nearly 60% of the performance gap between models.
  • RLVR's rollout mechanism naturally filters noisy labels. A wrong label affects training only when the model independently reproduces that wrong answer in a rollout. The method stays robust at a noise ratio of 0.9.
  • Breaking the circular dependency in test-based code selection. ACES uses leave-one-out AUC on a pass/fail matrix to weight tests by ranking consistency. Zero extra model calls required.
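
To make the ACES item concrete, here is a minimal sketch of one way to read "leave-one-out AUC on a pass/fail matrix." It is an illustration under stated assumptions, not the paper's implementation: the function names, the toy matrix, and the mapping from AUC to weight (`max(AUC - 0.5, 0)`) are all hypothetical. Each test is scored by how well the candidates that pass it outrank the candidates that fail it, where the ranking uses only the other tests.

```python
def auc(pos_scores, neg_scores):
    """Probability that a passing candidate outranks a failing one (ties count 0.5)."""
    if not pos_scores or not neg_scores:
        return 0.5  # the test does not split the candidates, so treat it as uninformative
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


def loo_auc_weights(passes):
    """passes[i][j] is 1 if candidate i passes test j, else 0."""
    n_tests = len(passes[0])
    weights = []
    for j in range(n_tests):
        # Rank candidates by how many of the OTHER tests they pass (leave test j out).
        loo = [sum(row) - row[j] for row in passes]
        pos = [s for s, row in zip(loo, passes) if row[j]]
        neg = [s for s, row in zip(loo, passes) if not row[j]]
        # Assumed mapping: only better-than-chance agreement earns weight.
        weights.append(max(auc(pos, neg) - 0.5, 0.0))
    return weights


def select(passes):
    """Pick the candidate with the highest weighted pass score."""
    weights = loo_auc_weights(passes)
    scores = [sum(w * p for w, p in zip(weights, row)) for row in passes]
    best = max(range(len(passes)), key=scores.__getitem__)
    return best, weights


# Toy matrix: 5 candidates x 5 tests. Candidates 0 and 1 tie on raw pass count (4 each).
# Test 4 is failed by candidate 0, which does best on the remaining tests.
passes = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
best, weights = select(passes)
```

In this toy case test 4 receives zero weight, which breaks the raw-count tie in favor of candidate 0. Everything is computed from the existing pass/fail matrix, consistent with the "zero extra model calls" claim.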

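The RLVR noise-filtering claim can be illustrated with a small sketch. The brief does not say which RL algorithm is used, so this assumes a GRPO-style setup (binary reward against the label, advantages normalized within each prompt's rollout group). The function name and the toy answers are hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rollout_answers, label, eps=1e-6):
    """Binary verifiable reward, then mean-centered, std-normalized advantages per group."""
    rewards = [1.0 if answer == label else 0.0 for answer in rollout_answers]
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts for one prompt; the true answer is "42" in this toy example.
rollouts = ["42", "42", "41", "42"]

# Wrong label that no rollout reproduces: every reward is 0, so every advantage is 0
# and, under this assumed setup, the sample contributes no policy gradient.
unreached = group_advantages(rollouts, label="17")

# Wrong label that one rollout happens to reproduce: that rollout is reinforced.
reached = group_advantages(rollouts, label="41")

# Clean label for comparison: the three correct rollouts get positive advantage.
clean = group_advantages(rollouts, label="42")
```

Under these assumptions a noisy label is harmful only in the `reached` case, which matches the brief's description of when wrong labels affect training.
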
Also Notable

  • How Much "Geometry Tax" Do Discrete Tokens Cost Scientific Models? — Quantifies tokenization's cost to geometric fidelity in continuous physical systems; switching to continuous output heads recovers most of the loss.
  • Combee Makes Agent Knowledge Accumulation Composable — Compositional prompt learning addresses scalability limits of existing prompt learning on agent tasks.
  • LLM Reviewers Now Retrieve Literature and Execute Code to Verify Claims — Review quality improvement shifts from "read more carefully" to "look more broadly."
  • Snapchat's Production Experience Replacing Traditional Item IDs With Semantic IDs — Covers engineering tradeoffs across the retrieval-to-ranking pipeline of a recommender system.
  • Geolocation as a Stress Test for Agentic Tool Use — Combining weak visual cues with multi-hop verification exposes agent reasoning chain weaknesses better than text-only tasks.
  • Schema Constraints Plus Hybrid Knowledge Tools for KG Triple Verification — Multi-source cross-validation reduces single-source bias in automated KG construction.
  • 4 Portable Cameras Recover 4D Dynamic Scenes — 4D dynamic scene recovery previously required dense multi-view arrays; this CVPR-accepted method is a lightweight alternative.
