11 Agent Failure Modes From Red-Teaming, Step-Level Routing Cuts Cost 700x
- A 20-person red team exposed 11 agent failure modes in a real deployment environment with persistent memory, email, Discord, and shell access. The most dangerous: agents reporting "task complete" while the underlying state is already broken.
- Video reasoning finally gets a million-scale benchmark. VBVR replaces model-based judging with rule-based scoring, shifting video evaluation from "visual quality" to "spatiotemporal causal understanding." 404 HF upvotes suggest the community has been waiting.
- A unified multimodal model runs on a phone. Mobile-O redesigns the fusion architecture with depthwise separable convolutions instead of distillation. 74% on GenEval, 6x faster than Show-O, 3 seconds per image on an iPhone.
- Multi-model routing drops from query-level to step-level. SkillOrchestra replaces end-to-end RL with explicit skill modeling, cutting training cost 300–700x and eliminating routing collapse.
- Policy collapse in RLVR training is more common than expected. Token-level entropy regularization changes wording, not reasoning paths. DSDR intervenes at both trajectory and token level, improving accuracy and pass@k.
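A quick illustration of the efficiency trick behind the Mobile-O item above: a depthwise separable convolution factors a standard KxK convolution into a per-channel depthwise pass plus a 1x1 pointwise pass. The sketch below only counts parameters to show the savings; Mobile-O's actual layer configuration is not given in the brief.

```python
def standard_conv_params(k, c_in, c_out):
    # A standard KxK conv learns one KxK filter per (input, output) channel pair.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in   # one KxK filter per input channel
    pointwise = c_in * c_out   # 1x1 conv mixes channels
    return depthwise + pointwise

k, c_in, c_out = 3, 256, 256
std = standard_conv_params(k, c_in, c_out)        # 589,824 weights
sep = depthwise_separable_params(k, c_in, c_out)  # 67,840 weights
print(f"standard: {std:,}  separable: {sep:,}  ratio: {std / sep:.1f}x")
```

For a 3x3 layer at 256 channels the factorization cuts parameters (and FLOPs, which scale the same way) by roughly 8.7x, which is why it shows up in on-device architectures.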
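The step-level routing shift described in the SkillOrchestra item can be sketched in a few lines: instead of one router decision per query, each reasoning step carries a skill tag and is dispatched to a matching model. The skill names and model names below are hypothetical; the brief does not describe the actual skill taxonomy.

```python
# Hypothetical skill -> model table; real systems would learn or curate this.
ROUTES = {"math": "small-math-model", "code": "code-model", "default": "generalist"}

def route_step(step_skill: str) -> str:
    # Step-level routing: one dispatch decision per step, falling back to a generalist.
    return ROUTES.get(step_skill, ROUTES["default"])

plan = ["math", "code", "summarize"]            # skill tag per step in a plan
assignments = [route_step(s) for s in plan]     # one model per step, not per query
print(assignments)
```

Because each decision is an explicit table lookup over modeled skills rather than an end-to-end RL policy over whole queries, a collapse where every query routes to one model cannot silently occur.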
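The baseline the DSDR item critiques can be made concrete: token-level entropy regularization adds a per-token entropy bonus to the RL objective, which rewards local lexical diversity but says nothing about diversity across whole trajectories. A minimal NumPy sketch, with illustrative shapes and a hypothetical `beta` coefficient (not DSDR's own formulation):

```python
import numpy as np

def token_entropy_bonus(logits, beta=0.01):
    """Mean per-token policy entropy, scaled by beta.

    logits: (T, V) array of per-step logits over a vocabulary of size V.
    """
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)  # shape (T,)
    return beta * entropy.mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 50))  # 8 tokens over a toy 50-word vocabulary
bonus = token_entropy_bonus(logits)
# The RL loss would subtract this bonus: loss = policy_loss - bonus
print(f"entropy bonus: {bonus:.4f}")
```

Since the bonus is maximized by spreading probability within each step independently, the policy can satisfy it by varying word choice while reusing the same reasoning path, which is the gap trajectory-level intervention targets.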
Also Notable
- Constraining Latent Reasoning to a Collaborative Manifold prevents recommendation system reasoning trajectories from drifting into implausible regions.
- TTT Layers as Implicit Representations for 3D Reconstruction handle long-context multi-view input at linear complexity. CVPR accepted.
- First Post-Training Quantization Scheme for VLA Models addresses the deployment bottleneck for vision-language-action models.
- Constraining Agent Planning and Execution With Tool Documentation reduces unrecoverable failures from single-step errors.
- Testing Whether Agents Can Infer Unstated User Requirements, covering accessibility, privacy boundaries, and catastrophic risk.
- Model Merging in the Essential Subspace reduces task interference. CVPR accepted.
- Relational Modeling for DiT Feature Caching replaces independent extrapolation to speed up diffusion generation. ICLR accepted.
- Detecting Jailbreaks Hidden in Fluent Text via Activation Decoupling without relying on surface-level semantic features.
- DeepMind's 100 Real-Robot Training Runs systematically answer which sim-to-real design choices matter most.
- Contrastive Inverse RL to Detect RLHF Reward Hacking with interpretable repair.
Don't miss what's next. Subscribe to AI Research Brief: