DMax Triples Parallel Decoding Efficiency for Diffusion LMs
- Tencent unifies robot perception and planning in a single VLM. They release both a 2B on-device model and a 32B reasoning model, calling into question whether modular pipelines are still worth their complexity.
- Parallel decoding efficiency for diffusion language models nearly triples. DMax replaces binary mask flips with continuous embedding interpolation, hitting 1,338 tokens per second on two H200s (see the first sketch after this list).
- The real bottleneck for agents isn't too few tools; it's too many calls. HDPO decouples accuracy and efficiency into orthogonal channels, cutting tool calls by orders of magnitude with no accuracy loss.
- Text-to-video counting errors get a training-free fix. NUMINA reverse-engineers object layouts from attention heads and corrects them, plugging directly into Wan2.1 with no retraining.
- Multi-task RL finally has a principled answer to mismatched reward distributions. G2RPO normalizes each task's advantage to N(0,1), beating comparable open-source models across 18 benchmarks (see the second sketch after this list).
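
The DMax item above names the mechanism only at headline level. Here is a minimal sketch of what replacing binary mask flips with continuous embedding interpolation could look like, assuming a learned [MASK] embedding and a per-position confidence weight; the function name, shapes, and usage are illustrative assumptions, not DMax's actual API.

```python
import torch

def interpolate_embeddings(token_emb: torch.Tensor,
                           mask_emb: torch.Tensor,
                           alpha: torch.Tensor) -> torch.Tensor:
    """Blend each position between the [MASK] embedding and its current
    token embedding, instead of flipping a hard masked/unmasked bit.

    token_emb: (batch, seq, dim) embeddings of the current token guesses
    mask_emb:  (dim,) learned embedding of the [MASK] token
    alpha:     (batch, seq) per-position confidence in [0, 1];
               0 = fully masked, 1 = fully committed token
    """
    alpha = alpha.unsqueeze(-1)  # (batch, seq, 1) so it broadcasts over dim
    return alpha * token_emb + (1.0 - alpha) * mask_emb

# Hypothetical decoding loop: feed the blended embeddings back into the
# denoiser each step and raise alpha as the model grows more confident,
# so many positions can move toward commitment in parallel.
```
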
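The G2RPO item states the core operation directly: standardize each task's rewards to N(0,1) before mixing tasks. Below is a minimal sketch of that per-task z-scoring, assuming rewards tagged with task ids; names are illustrative and this is not the paper's full algorithm.

```python
import numpy as np

def per_task_normalized_advantages(rewards: np.ndarray,
                                   task_ids: np.ndarray,
                                   eps: float = 1e-8) -> np.ndarray:
    """Z-score rewards within each task so every task contributes
    advantages on the same N(0, 1) scale, whatever its raw reward range."""
    adv = np.empty_like(rewards, dtype=np.float64)
    for task in np.unique(task_ids):
        idx = task_ids == task
        r = rewards[idx]
        adv[idx] = (r - r.mean()) / (r.std() + eps)  # per-task standardization
    return adv

# Two tasks with wildly different reward scales end up comparable:
rewards = np.array([0.1, 0.3, 0.2, 90.0, 110.0, 100.0])
task_ids = np.array([0, 0, 0, 1, 1, 1])
print(per_task_normalized_advantages(rewards, task_ids))
```

Without this step, the task with the larger raw reward variance would dominate the gradient signal in a shared policy update.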
Also Notable
- 153 Everyday Tasks, 144 Real Websites, and the Best Agent Still Fails Half the Time. A high-profile large-scale agent evaluation that makes capability boundaries immediately visible.
- AI2 Releases a Fully Open-Source Visual Web Agent. Ships with public training data and a complete pipeline, serving as a ready-to-use baseline for building your own web agent.
- 3,000 Trajectories Distill a 9B Model That Matches Gemini 3 Pro Across Six Web Environments. A web agent recipe that cuts cost by orders of magnitude.
- Phone Agents Need to Know When to Shut Up, Not Just Complete Tasks. A personalized agent evaluation framework testing preference inference and the judgment to intervene proactively.
- Real-Time, High Expressiveness, and Long-Term Identity Consistency. Digital character animation's trilemma; LPM takes a video-based approach to approximate all three at once.
- Multiple Differentiable Rewards Jointly Guide Diffusion at Inference Time. No weight updates: alignment, fidelity, and localization are folded into the sampling process itself (see the sketch after this list).
- Do LLMs Avoid the Same Mistakes Next Time? Not a test of factual recall, but of whether behavior adapts automatically: an implicit-memory evaluation.
- T2I Reward Models Optimize for Average Aesthetics. This One Models Individual Preferences. What looks good varies from person to person.
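
The inference-time multi-reward item above maps onto a familiar pattern: gradient guidance on the sample rather than on the weights. A minimal sketch, assuming each reward is a differentiable function of the current latent; this is a generic illustration of the technique, not the paper's exact update rule.

```python
import torch

def multi_reward_guidance_step(x: torch.Tensor,
                               rewards,  # list of (weight, differentiable_fn)
                               step_size: float = 0.1) -> torch.Tensor:
    """One guidance update: nudge the current sample along the gradient of a
    weighted sum of differentiable rewards, leaving model weights untouched."""
    x = x.detach().requires_grad_(True)
    total = sum(w * fn(x) for w, fn in rewards)  # scalar combined objective
    (grad,) = torch.autograd.grad(total, x)
    return (x + step_size * grad).detach()       # ascend the combined reward

# Hypothetical usage: interleave this step with the denoiser's own sampling
# steps, with one (weight, fn) pair per objective (alignment, fidelity, ...).
```
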