dLLMs Hallucinate Differently, PRM Labeling Cost Drops 100x

April 14, 2026

Diffusion LLMs (dLLMs) hallucinate in fundamentally different ways than autoregressive models. The first controlled comparison identifies three failure modes unique to dLLMs (premature termination, incomplete denoising, context intrusion), meaning existing detection tools need redesign.

Contrastive mutual information cuts process reward model (PRM) labeling cost by two orders of magnitude. Step-level signals are extracted directly from model probabilities, with no repeated rollouts needed. Accepted at ACL.
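To make "signals from model probabilities" concrete, here is a minimal sketch of one way a contrastive step score could be computed. It is not the paper's implementation: the specific contrast (log-likelihood of the reference answer with versus without a given step in the context) and the names `sequence_logprob` and `contrastive_step_score` are assumptions made for illustration. The token log-probabilities are assumed to come from a single forward pass of whatever model is being labeled.

```python
from typing import Sequence


def sequence_logprob(token_logprobs: Sequence[float]) -> float:
    """Sum per-token log-probabilities into one sequence log-probability."""
    return sum(token_logprobs)


def contrastive_step_score(
    answer_logprobs_with_step: Sequence[float],
    answer_logprobs_without_step: Sequence[float],
) -> float:
    """Pointwise-mutual-information-style score (illustrative assumption).

    Positive values mean the step makes the reference answer more likely;
    negative values mean it makes the answer less likely.
    """
    with_step = sequence_logprob(answer_logprobs_with_step)
    without_step = sequence_logprob(answer_logprobs_without_step)
    return with_step - without_step


# Hypothetical per-token log-probabilities of the same reference answer,
# scored once with the candidate step in context and once without it.
score = contrastive_step_score([-0.1, -0.2], [-1.0, -1.5])
print(f"step score: {score:.2f}")
```

Under this assumption each step costs two scoring passes, rather than the many sampled rollouts a Monte Carlo labeler would need.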

RAG knowledge base defense shifts from static rules to runtime adversarial games. Canary tokens, borrowed from the stack canary concept, enable continuous detection and work plug-and-play with no architecture changes.
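The general canary idea can be sketched in a few lines. This is an illustration only, under stated assumptions: the paper's runtime adversarial game is not described here, and the helpers `make_canary`, `plant_canary`, and `leaked_canaries` are hypothetical names. As with a stack canary, a random guard value is planted where normal use should never expose it, so seeing it in an output suggests the knowledge base is being extracted.

```python
import secrets
from typing import Iterable, List


def make_canary(prefix: str = "CANARY") -> str:
    """Create a random marker that is very unlikely to occur naturally."""
    return f"{prefix}-{secrets.token_hex(8)}"


def plant_canary(document: str, canary: str) -> str:
    """Append the marker to a knowledge base document before indexing."""
    return f"{document}\n[{canary}]"


def leaked_canaries(output: str, canaries: Iterable[str]) -> List[str]:
    """Return every planted marker that appears verbatim in a model output."""
    return [c for c in canaries if c in output]


canary = make_canary()
protected_doc = plant_canary("Internal pricing notes.", canary)

# A benign answer does not contain the marker; a verbatim dump does.
print(leaked_canaries("Prices vary by region.", [canary]))
print(len(leaked_canaries(protected_doc, [canary])))
```

Because the check runs on outputs only, a sketch like this needs no change to the retriever or the model, which matches the plug-and-play claim in spirit.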

TorchUMM unifies mainstream multimodal models into one codebase. Covers understanding, generation, and editing, enabling the first apples-to-apples comparison across architectures.

Also Notable

Hierarchical Analogical Reasoning Replaces Rule Matching for Content Moderation — Analogies handle gray-area cases more flexibly than hard rules.
Chain-of-Analogy Counters Decision Shortcuts in Moderation — Companion paper to CHAIRO above, using DPO to strengthen analogical reasoning quality.
Strip Textures, Keep Wireframes, Test VLM Geometric Understanding — Checks whether models truly understand spatial structure or just read texture cues.
Multi-Agent Structured Reasoning for Legal Consultation — Includes a large-scale Chinese legal QA dataset.
2.5M Spatially Aligned Samples for Remote Sensing Multimodal Pretraining — Semantic supervision for geospatial foundation model pretraining.
LLM Code Summaries Are Getting Longer, Evaluation Can't Keep Up — Reference-free fine-grained factual consistency evaluation.
Teaching Navigation Agents to Recognize Nonexistent Targets — Handle false-premise instructions instead of searching blindly until timeout.
Unsupervised Domain Adaptation for Low-Light Pose Estimation — No annotated dark-scene data required.

