T²PO Stabilizes Multi-Turn RL; MotionCache Cuts Video Steps 6x

May 7, 2026

Multi-Turn Agent RL Collapse May Not Be a Credit Assignment Problem. T²PO uses the model's own uncertainty to trigger thinking and resampling. Stability and final performance both rise on WebShop, ALFWorld, and Search QA. Accepted at ICML.
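To make "uncertainty as a trigger" concrete, here is a minimal sketch. It assumes uncertainty is measured as the average entropy of the next-token distributions in a turn; the blurb does not say which measure T²PO uses. The function names and the threshold of 1.0 nats are hypothetical, chosen only for illustration.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mean_entropy(step_distributions):
    """Average per-token entropy over one generated turn."""
    return sum(token_entropy(p) for p in step_distributions) / len(step_distributions)

def should_resample(step_distributions, threshold=1.0):
    """Hypothetical trigger: flag the turn when average entropy exceeds the threshold."""
    return mean_entropy(step_distributions) > threshold

# Two toy turns, each with two token positions over a 4-token vocabulary.
confident = [[0.97, 0.01, 0.01, 0.01], [0.90, 0.05, 0.03, 0.02]]
uncertain = [[0.25, 0.25, 0.25, 0.25], [0.40, 0.30, 0.20, 0.10]]

print(should_resample(confident))  # low entropy, keep the turn
print(should_resample(uncertain))  # high entropy, think again or resample
```

In a real training loop the threshold would need tuning per task, and the trigger could fire per token rather than per turn; both are open choices in this sketch.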

Factuality's Bottleneck Is Metacognition, Not Knowledge Volume. A position paper argues models still don't know what they don't know. Calibrated uncertainty is the hidden control layer in any agent reliability stack.

A Better Scorecard for Putting Medical Agents to Work. PhysicianBench drops 100 real consultations into a commercial EHR environment. Each task averages 27 tool calls. The best agent reaches 46% pass@1; the best open-source model reaches only 19%.

Pixel-Level Fix for Video Generation Caching. MotionCache assigns denoising steps per pixel using frame differences. SkyReels-V2 sees 6.28x speedup; MAGI-1 only 1.64x. Transfer depends heavily on the base model.
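A rough sketch of the per-pixel idea, under loud assumptions: frames are plain grayscale grids, motion is the absolute difference between consecutive frames, and the step budget scales linearly with motion. The helpers `motion_map` and `steps_per_pixel` and the 10 to 30 step range are hypothetical and are not taken from MotionCache itself.

```python
def motion_map(prev_frame, next_frame):
    """Absolute per-pixel difference between two grayscale frames (lists of rows)."""
    return [[abs(a - b) for a, b in zip(row_p, row_n)]
            for row_p, row_n in zip(prev_frame, next_frame)]

def steps_per_pixel(motion, min_steps=10, max_steps=30):
    """Map motion magnitude linearly to a step budget; still pixels get min_steps."""
    peak = max(max(row) for row in motion)
    if peak == 0:
        # Nothing moved, so every pixel gets the minimum budget.
        return [[min_steps for _ in row] for row in motion]
    span = max_steps - min_steps
    return [[min_steps + round(span * m / peak) for m in row] for row in motion]

# Toy 2x3 frames: the top row is static, the bottom-right region changes.
prev_frame = [[0.0, 0.0, 0.0], [0.0, 0.5, 0.5]]
next_frame = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.75]]

budget = steps_per_pixel(motion_map(prev_frame, next_frame))
print(budget)  # [[10, 10, 10], [10, 30, 20]]
```

Static pixels fall back to the minimum budget, which is where any caching savings would come from. How much of a frame is static depends on the content and on the base model's latent layout, which is one plausible reason for speedups to vary between models.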

What If Attention Is Just a Parameter-Prediction MLP. WeightFormer rewrites attention math as an MLP whose parameters are predicted from input. The linearization design goal shifts from "approximate softmax" to "predict good parameters."
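The underlying identity is easy to check by hand. For a single query, softmax attention is a two-layer MLP with a softmax activation whose first-layer weights are the keys and whose second-layer weights are the values, and both come from the input. The sketch below shows that standard identity in plain Python. It is not WeightFormer's architecture, and the names `attention` and `mlp` are illustrative only; the usual 1/sqrt(d) scaling is omitted for brevity.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, keys, values):
    """Single-query softmax attention (scaling omitted)."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

def mlp(x, w1, w2):
    """Two-layer MLP: hidden = softmax(w1 @ x), output = w2^T @ hidden."""
    hidden = softmax([sum(wi * xi for wi, xi in zip(row, x)) for row in w1])
    dim = len(w2[0])
    return [sum(h * row[j] for h, row in zip(hidden, w2)) for j in range(dim)]

# Keys and values play the role of input-dependent MLP weights.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
q = [0.5, -0.5]

attn_out = attention(q, keys, values)
mlp_out = mlp(q, w1=keys, w2=values)
print(attn_out, mlp_out)  # the two outputs match
```

Seen this way, a linear-attention variant is free to produce `w1` and `w2` by any cheap predictor rather than trying to mimic the softmax over all keys, which is the shift in design goal the blurb describes.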

Also Notable

Students Picked 80 Real-Coursework Questions Agents Can't Solve — bilingual benchmark closer to real user failures than researcher-designed tests.
Make a Model Count Repeated Symbols Until It Breaks — quantifiable minimal reliability test for the boundary between memorization and rule execution.
Treat Agentic Systems as Token-Allocation Economies — position paper reframes the stack into four economic layers and argues for token-economy evaluation over text generation.
26.7M Spatial Proteomics Patches + H&E + Clinical Trimodal Contrastive Learning — Haiku actually delivered at scale, laying a multimodal foundation model floor for spatial biology.
Brain MRI Foundation Model SAEs Collapse in Deep Layers — authors stabilize SAEs with geometric priors, adding an interpretability tool for medical imaging foundation models.
Game-Engine Synthetic Data Still Has a Visible Sim2Real Gap — even with ray tracing, real images differ visibly; the hybrid approach narrows the gap for synthetic-data training pipelines.
