Streaming Video QA Hits 2 FPS, RLVR Shrugs Off Noisy Labels

April 8, 2026

VideoLLM achieves 2 FPS streaming video QA. AURA unifies continuous perception and proactive response in a single end-to-end architecture, and integrates ASR and TTS into a working interactive prototype.

Agent belief maintenance gets its first systematic benchmark. ClawArena covers 64 scenarios with dynamic information updates, finding that framework design accounts for nearly 60% of the performance gap between models.

RLVR's rollout mechanism naturally filters noisy labels. A wrong label only affects training when the model independently reproduces that wrong answer in one of its own rollouts. The method stays robust at a noise ratio of 0.9.
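To make the filtering effect concrete, here is a minimal sketch. It assumes a binary exact-match reward and a group-mean baseline over rollouts (a GRPO-style setup); the brief does not say which RLVR variant the paper uses, and the function name `group_advantages` is hypothetical.

```python
def group_advantages(rollout_answers, label):
    """Advantage of each rollout under a binary verifiable reward.

    Assumption: reward is 1.0 on exact match with the label, else 0.0,
    and the baseline is the mean reward of the rollout group.
    """
    rewards = [1.0 if answer == label else 0.0 for answer in rollout_answers]
    baseline = sum(rewards) / len(rewards)
    return [reward - baseline for reward in rewards]


rollouts = ["42", "42", "41", "42"]

# Noisy label that no rollout reproduces: every reward is 0, so every
# advantage is 0 and this sample contributes no policy gradient.
unmatched = group_advantages(rollouts, label="7")

# Noisy label that one rollout happens to reproduce: the advantages are
# nonzero, so this is the case where the wrong label can affect training.
matched = group_advantages(rollouts, label="41")
```

Under these assumptions, a corrupted label is harmless unless the policy already assigns enough probability to the same wrong answer to sample it.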

Breaking the circular dependency in test-based code selection. ACES uses leave-one-out AUC on a pass/fail matrix to weight tests by ranking consistency. Zero extra model calls required.
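One possible reading of the leave-one-out idea is sketched below; it is not the paper's exact formulation. Assumptions: each candidate is scored by how many of the other tests it passes, a test's weight is how far its leave-one-out AUC exceeds chance (0.5), and candidates are picked by weighted pass count. All function names are hypothetical.

```python
def auc(scores, labels):
    """Chance that a passing candidate outscores a failing one (ties count 0.5)."""
    pos = [s for s, passed in zip(scores, labels) if passed]
    neg = [s for s, passed in zip(scores, labels) if not passed]
    if not pos or not neg:
        return 0.5  # the test does not separate candidates: uninformative
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def loo_test_weights(matrix):
    """matrix[i][j] is 1 if candidate i passes test j, else 0."""
    weights = []
    for j in range(len(matrix[0])):
        # Score each candidate using every test except test j.
        loo_scores = [sum(row) - row[j] for row in matrix]
        labels = [row[j] for row in matrix]
        # Assumed weighting: only above-chance ranking consistency counts.
        weights.append(max(auc(loo_scores, labels) - 0.5, 0.0))
    return weights


def select_candidate(matrix):
    """Index of the candidate with the highest weighted pass count."""
    weights = loo_test_weights(matrix)
    totals = [sum(w * p for w, p in zip(weights, row)) for row in matrix]
    return max(range(len(matrix)), key=totals.__getitem__)


# Four candidates, five tests; the last test disagrees with the other four.
pass_fail = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
weights = loo_test_weights(pass_fail)
best = select_candidate(pass_fail)
```

Everything here is computed from the pass/fail matrix alone, which matches the "zero extra model calls" claim: the test that ranks candidates opposite to the rest ends up with zero weight.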

Also Notable

How Much "Geometry Tax" Do Discrete Tokens Cost Scientific Models? — Quantifies tokenization's cost to geometric fidelity in continuous physical systems; switching to continuous output heads recovers most of the loss.
Combee Makes Agent Knowledge Accumulation Composable — Compositional prompt learning addresses scalability limits of existing prompt learning on agent tasks.
LLM Reviewers Now Retrieve Literature and Execute Code to Verify Claims — Review quality improvement shifts from "read more carefully" to "look more broadly."
Snapchat's Production Experience Replacing Traditional Item IDs With Semantic IDs — Covers engineering tradeoffs across the retrieval-to-ranking pipeline in a recommender system.
Geolocation as a Stress Test for Agentic Tool Use — Combining weak visual cues with multi-hop verification exposes agent reasoning chain weaknesses better than text-only tasks.
Schema Constraints Plus Hybrid Knowledge Tools for KG Triple Verification — Multi-source cross-validation reduces single-source bias in automated KG construction.
4 Portable Cameras Recover 4D Dynamic Scenes — Previously required dense multi-view arrays; CVPR-accepted lightweight alternative.
