AI Research Brief

April 12, 2026

DMax Triples Parallel Decoding Efficiency for Diffusion LMs

  • Tencent unifies robot perception and planning in a single VLM. They release both a 2B on-device model and a 32B reasoning model, calling into question whether modular pipelines are still worth their complexity.
  • Parallel decoding efficiency for diffusion language models nearly triples. DMax replaces binary mask flips with continuous embedding interpolation, hitting 1,338 tokens per second on two H200s.
  • The real bottleneck for agents isn't too few tools; it's too many calls. HDPO decouples accuracy and efficiency into orthogonal channels, cutting tool calls by orders of magnitude with no accuracy loss.
  • Text-to-video counting errors get a training-free fix. NUMINA reverse-engineers object layouts from attention heads and corrects them, plugging directly into Wan2.1 with no retraining.
  • Multi-task RL finally has a principled answer to mismatched reward distributions. G2RPO normalizes each task's advantage to N(0,1), beating comparable open-source models across 18 benchmarks.
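The G2RPO item above hinges on one simple idea: standardize each task's rewards separately before pooling them, so no task's reward scale dominates the gradient. A minimal sketch of that per-task normalization, assuming scalar rewards per rollout; function and variable names here are illustrative, not from the paper:

```python
import numpy as np

def normalize_advantages_per_task(rewards_by_task):
    """Standardize each task's rewards to roughly N(0, 1).

    Illustrative sketch of per-task advantage normalization in the
    spirit of what the G2RPO summary describes; not the paper's code.
    """
    advantages = {}
    for task, rewards in rewards_by_task.items():
        r = np.asarray(rewards, dtype=np.float64)
        # Subtract the task mean and divide by the task std, with a
        # small epsilon guarding against a task whose rewards are all equal.
        advantages[task] = (r - r.mean()) / (r.std() + 1e-8)
    return advantages

# Tasks with very different reward scales land on a common scale:
mixed = {"math": [0.0, 1.0, 1.0, 0.0], "code": [10.0, 30.0, 20.0, 40.0]}
adv = normalize_advantages_per_task(mixed)
```

After normalization both tasks have near-zero mean and unit variance, so their advantages can be mixed in one batch without one task's raw reward magnitude drowning out the other.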

Also Notable

  • 153 Everyday Tasks, 144 Real Websites, and the Best Agent Still Fails Half the Time. A high-profile large-scale agent evaluation that makes capability boundaries immediately visible.
  • AI2 Releases a Fully Open-Source Visual Web Agent. Ships with public training data and a complete pipeline, serving as a ready-to-use baseline for building your own web agent.
  • 3,000 Trajectories Distill a 9B Model That Matches Gemini 3 Pro Across Six Web Environments. A web agent recipe that cuts cost by orders of magnitude.
  • Phone Agents Need to Know When to Shut Up, Not Just Complete Tasks. A personalized agent evaluation framework testing preference inference and the judgment to intervene proactively.
  • Real-Time, High Expressiveness, and Long-Term Identity Consistency. The trilemma of digital character animation; LPM's video-based approach approximates all three at once.
  • Multiple Differentiable Rewards Jointly Guide Diffusion at Inference Time. No weight changes: alignment, fidelity, and localization unified into the sampling process.
  • Do LLMs Avoid the Same Mistakes Next Time? An implicit-memory evaluation that tests not factual recall but whether behavior automatically adapts.
  • T2I Reward Models Optimize for Average Aesthetics. This One Models Individual Preferences. What looks good varies from person to person.

