AI Research Brief

May 7, 2026

T²PO Stabilizes Multi-Turn RL; MotionCache Cuts Video Steps 6x

  • Multi-Turn Agent RL Collapse May Not Be a Credit Assignment Problem. T²PO uses the model's own uncertainty to trigger thinking and resampling. Stability and final performance both improve on WebShop, ALFWorld, and Search QA. Accepted at ICML.
  • Factuality's Bottleneck Is Metacognition, Not Knowledge Volume. A position paper argues models still don't know what they don't know. Calibrated uncertainty is the hidden control layer in any agent reliability stack.
  • A Better Scorecard for Putting Medical Agents to Work. PhysicianBench places 100 real consultations in a commercial EHR environment. Each task averages 27 tool calls. The best agent reaches 46% pass@1; the best open-source model reaches only 19%.
  • Pixel-Level Fix for Video Generation Caching. MotionCache assigns denoising steps per pixel using frame differences. SkyReels-V2 sees 6.28x speedup; MAGI-1 only 1.64x. Transfer depends heavily on the base model.
  • What If Attention Is Just a Parameter-Prediction MLP? WeightFormer rewrites the attention computation as an MLP whose parameters are predicted from the input. The design goal for linearization shifts from "approximate softmax" to "predict good parameters."

Also Notable

  • Students Picked 80 Real-Coursework Questions Agents Can't Solve: a bilingual benchmark that sits closer to real user failures than researcher-designed tests.
  • Make a Model Count Repeated Symbols Until It Breaks: a quantifiable, minimal reliability test for the boundary between memorization and rule execution.
  • Treat Agentic Systems as Token-Allocation Economies: a position paper reframes the stack as four economic layers and argues for evaluating agents as token economies rather than as text generators.
  • 26.7M Spatial Proteomics Patches + H&E + Clinical Trimodal Contrastive Learning: Haiku delivers at scale, setting a baseline for multimodal foundation models in spatial biology.
  • Brain MRI Foundation Model SAEs Collapse in Deep Layers: the authors stabilize the SAEs with geometric priors, adding an interpretability tool for medical imaging foundation models.
  • Game-Engine Synthetic Data Still Has a Visible Sim2Real Gap: even with ray tracing, synthetic images differ visibly from real ones; a hybrid approach narrows the gap for synthetic-data training pipelines.

