11 Agent Failure Modes From Red-Teaming, Step-Level Routing Cuts Cost 700x
- A 20-person red team exposed 11 agent failure modes in a real deployment environment with persistent memory, email, Discord, and shell access. The most dangerous: agents reporting "task complete" while the underlying state is already broken.
- Video reasoning finally gets a million-scale benchmark. VBVR replaces model-based judging with rule-based scoring, shifting video evaluation from "visual quality" to "spatiotemporal causal understanding." 404 HF upvotes suggest the community has been waiting.
- A unified multimodal model runs on a phone. Mobile-O redesigns the fusion architecture with depthwise separable convolutions instead of distillation. 74% on GenEval, 6x faster than Show-O, 3 seconds per image on an iPhone.
- Multi-model routing drops from query-level to step-level. SkillOrchestra replaces end-to-end RL with explicit skill modeling, cutting training cost 300–700x and eliminating routing collapse.
- Policy collapse in RLVR training is more common than expected. Token-level entropy regularization changes wording, not reasoning paths. DSDR intervenes at both trajectory and token level, improving accuracy and pass@k.
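A quick illustration of the efficiency trick behind the Mobile-O item above: a depthwise separable convolution factors a standard KxK convolution into a per-channel depthwise pass plus a 1x1 pointwise pass. The sketch below only counts parameters to show the savings; Mobile-O's actual layer configuration is not given in the brief.

```python
def standard_conv_params(k, c_in, c_out):
    # A standard KxK conv learns one KxK filter per (input, output) channel pair.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in   # one KxK filter per input channel
    pointwise = c_in * c_out   # 1x1 conv mixes channels
    return depthwise + pointwise

k, c_in, c_out = 3, 256, 256
std = standard_conv_params(k, c_in, c_out)        # 589,824 weights
sep = depthwise_separable_params(k, c_in, c_out)  # 67,840 weights
print(f"standard: {std:,}  separable: {sep:,}  ratio: {std / sep:.1f}x")
```

For a 3x3 layer at 256 channels the factorization cuts parameters (and FLOPs, which scale the same way) by roughly 8.7x, which is why it shows up in on-device architectures.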
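The step-level routing shift described in the SkillOrchestra item can be sketched in a few lines: instead of one router decision per query, each reasoning step carries a skill tag and is dispatched to a matching model. The skill names and model names below are hypothetical; the brief does not describe the actual skill taxonomy.

```python
# Hypothetical skill -> model table; real systems would learn or curate this.
ROUTES = {"math": "small-math-model", "code": "code-model", "default": "generalist"}

def route_step(step_skill: str) -> str:
    # Step-level routing: one dispatch decision per step, falling back to a generalist.
    return ROUTES.get(step_skill, ROUTES["default"])

plan = ["math", "code", "summarize"]            # skill tag per step in a plan
assignments = [route_step(s) for s in plan]     # one model per step, not per query
print(assignments)
```

Because each decision is an explicit table lookup over modeled skills rather than an end-to-end RL policy over whole queries, a collapse where every query routes to one model cannot silently occur.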
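The baseline the DSDR item critiques can be made concrete: token-level entropy regularization adds a per-token entropy bonus to the RL objective, which rewards local lexical diversity but says nothing about diversity across whole trajectories. A minimal NumPy sketch, with illustrative shapes and a hypothetical `beta` coefficient (not DSDR's own formulation):

```python
import numpy as np

def token_entropy_bonus(logits, beta=0.01):
    """Mean per-token policy entropy, scaled by beta.

    logits: (T, V) array of per-step logits over a vocabulary of size V.
    """
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)  # shape (T,)
    return beta * entropy.mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 50))  # 8 tokens over a toy 50-word vocabulary
bonus = token_entropy_bonus(logits)
# The RL loss would subtract this bonus: loss = policy_loss - bonus
print(f"entropy bonus: {bonus:.4f}")
```

Since the bonus is maximized by spreading probability within each step independently, the policy can satisfy it by varying word choice while reusing the same reasoning path, which is the gap trajectory-level intervention targets.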
Also Notable
- Constraining Latent Reasoning to a Collaborative Manifold prevents recommendation system reasoning trajectories from drifting into implausible regions.
- TTT Layers as Implicit Representations for 3D Reconstruction handle long-context multi-view input at linear complexity. CVPR accepted.
- First Post-Training Quantization Scheme for VLA Models addresses the deployment bottleneck for vision-language-action models.
- Constraining Agent Planning and Execution With Tool Documentation reduces unrecoverable failures from single-step errors.
- Testing Whether Agents Can Infer Unstated User Requirements, covering accessibility, privacy boundaries, and catastrophic risk.
- Model Merging in the Essential Subspace reduces task interference. CVPR accepted.
- Relational Modeling for DiT Feature Caching replaces independent extrapolation to speed up diffusion generation. ICLR accepted.
- Detecting Jailbreaks Hidden in Fluent Text via Activation Decoupling without relying on surface-level semantic features.
- DeepMind's 100 Real-Robot Training Runs systematically answer which sim-to-real design choices matter most.
- Contrastive Inverse RL to Detect RLHF Reward Hacking with interpretable repair.
Don't miss what's next. Subscribe to AI Research Brief: