DMax Triples Parallel Decoding Efficiency for Diffusion LMs
- Tencent unifies robot perception and planning in a single VLM. They release both a 2B on-device model and a 32B reasoning model, calling into question whether modular pipelines are still worth their complexity.
- Parallel decoding efficiency for diffusion language models nearly triples. DMax replaces binary mask flips with continuous embedding interpolation, hitting 1,338 tokens per second on two H200s (see the first sketch after this list).
- The real bottleneck for agents isn't too few tools; it's too many calls. HDPO decouples accuracy and efficiency into orthogonal channels, cutting tool calls by orders of magnitude with no accuracy loss.
- Text-to-video counting errors get a training-free fix. NUMINA reverse-engineers object layouts from attention heads and corrects them, plugging directly into Wan2.1 with no retraining.
- Multi-task RL finally has a principled answer to mismatched reward distributions. G2RPO normalizes each task's advantage to N(0,1), beating comparable open-source models across 18 benchmarks (see the second sketch after this list).
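
The DMax item above names the mechanism only at headline level. Here is a minimal sketch of what replacing binary mask flips with continuous embedding interpolation could look like, assuming a learned [MASK] embedding and a per-position confidence weight; the function name, shapes, and usage are illustrative assumptions, not DMax's actual API.

```python
import torch

def interpolate_embeddings(token_emb: torch.Tensor,
                           mask_emb: torch.Tensor,
                           alpha: torch.Tensor) -> torch.Tensor:
    """Blend each position between the [MASK] embedding and its current
    token embedding, instead of flipping a hard masked/unmasked bit.

    token_emb: (batch, seq, dim) embeddings of the current token guesses
    mask_emb:  (dim,) learned embedding of the [MASK] token
    alpha:     (batch, seq) per-position confidence in [0, 1];
               0 = fully masked, 1 = fully committed token
    """
    alpha = alpha.unsqueeze(-1)  # (batch, seq, 1) so it broadcasts over dim
    return alpha * token_emb + (1.0 - alpha) * mask_emb

# Hypothetical decoding loop: feed the blended embeddings back into the
# denoiser each step and raise alpha as the model grows more confident,
# so many positions can move toward commitment in parallel.
```
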
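The G2RPO item states the core operation directly: standardize each task's rewards to N(0,1) before mixing tasks. Below is a minimal sketch of that per-task z-scoring, assuming rewards tagged with task ids; names are illustrative and this is not the paper's full algorithm.

```python
import numpy as np

def per_task_normalized_advantages(rewards: np.ndarray,
                                   task_ids: np.ndarray,
                                   eps: float = 1e-8) -> np.ndarray:
    """Z-score rewards within each task so every task contributes
    advantages on the same N(0, 1) scale, whatever its raw reward range."""
    adv = np.empty_like(rewards, dtype=np.float64)
    for task in np.unique(task_ids):
        idx = task_ids == task
        r = rewards[idx]
        adv[idx] = (r - r.mean()) / (r.std() + eps)  # per-task standardization
    return adv

# Two tasks with wildly different reward scales end up comparable:
rewards = np.array([0.1, 0.3, 0.2, 90.0, 110.0, 100.0])
task_ids = np.array([0, 0, 0, 1, 1, 1])
print(per_task_normalized_advantages(rewards, task_ids))
```

Without this step, the task with the larger raw reward variance would dominate the gradient signal in a shared policy update.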
Also Notable
- 153 Everyday Tasks, 144 Real Websites, and the Best Agent Still Fails Half the Time. A high-profile large-scale agent evaluation that makes capability boundaries immediately visible.
- AI2 Releases a Fully Open-Source Visual Web Agent. Ships with public training data and a complete pipeline, serving as a ready-to-use baseline for building your own web agent.
- 3,000 Trajectories Distill a 9B Model That Matches Gemini 3 Pro Across Six Web Environments. A web agent recipe that cuts cost by orders of magnitude.
- Phone Agents Need to Know When to Shut Up, Not Just Complete Tasks. A personalized agent evaluation framework testing preference inference and the judgment to intervene proactively.
- Real-Time, High Expressiveness, and Long-Term Identity Consistency. Digital character animation's trilemma; LPM takes a video-based approach to approximate all three at once.
- Multiple Differentiable Rewards Jointly Guide Diffusion at Inference Time. No weight updates: alignment, fidelity, and localization are folded into the sampling process itself (see the sketch after this list).
- Do LLMs Avoid the Same Mistakes Next Time? Not a test of factual recall, but of whether behavior adapts automatically: an implicit-memory evaluation.
- T2I Reward Models Optimize for Average Aesthetics. This One Models Individual Preferences. What looks good varies from person to person.
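
The inference-time multi-reward item above maps onto a familiar pattern: gradient guidance on the sample rather than on the weights. A minimal sketch, assuming each reward is a differentiable function of the current latent; this is a generic illustration of the technique, not the paper's exact update rule.

```python
import torch

def multi_reward_guidance_step(x: torch.Tensor,
                               rewards,  # list of (weight, differentiable_fn)
                               step_size: float = 0.1) -> torch.Tensor:
    """One guidance update: nudge the current sample along the gradient of a
    weighted sum of differentiable rewards, leaving model weights untouched."""
    x = x.detach().requires_grad_(True)
    total = sum(w * fn(x) for w, fn in rewards)  # scalar combined objective
    (grad,) = torch.autograd.grad(total, x)
    return (x + step_size * grad).detach()       # ascend the combined reward

# Hypothetical usage: interleave this step with the denoiser's own sampling
# steps, with one (weight, fn) pair per objective (alignment, fidelity, ...).
```
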