8% of Tokens Decide the Reasoning Gap

May 19, 2026

"Unlearnable" Samples in RLVR. A set of hard examples never gets learned across training, even though rollouts produced correct answers on them. The reward curve climbs anyway, because the easier subset does the work.

The Reasoning Advantage Is Sparse. The gap between base and reasoning models concentrates in about 8% of tokens, enriched at early planning decisions.

Single-Model Red-Teaming Isn't Real Protection. Query a set of frontier models concurrently and any weak link delivers harmful output. Success rates reach 100%.
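
The "any weak link" arithmetic can be made concrete with a minimal sketch. It assumes an attack counts as successful on a prompt if at least one queried model returns harmful output. The function name, model names, and toy outcomes below are hypothetical illustrations, not data or code from the paper.

```python
from typing import Dict, List


def union_attack_success_rate(outcomes: Dict[str, List[bool]]) -> float:
    """Fraction of prompts on which at least one model produced harmful output.

    `outcomes` maps a model name to one boolean per prompt
    (True = that model produced harmful output for that prompt).
    """
    per_model = list(outcomes.values())
    if not per_model:
        raise ValueError("at least one model is required")
    n_prompts = len(per_model[0])
    if n_prompts == 0:
        raise ValueError("at least one prompt is required")
    if any(len(results) != n_prompts for results in per_model):
        raise ValueError("every model needs one outcome per prompt")
    # zip(*per_model) yields, for each prompt, the tuple of all models' outcomes.
    hits = sum(any(prompt_results) for prompt_results in zip(*per_model))
    return hits / n_prompts


# Hypothetical toy outcomes: each model looks fairly robust on its own.
toy_outcomes = {
    "model_a": [True, False, False, True],
    "model_b": [False, True, False, False],
    "model_c": [False, False, True, False],
}

per_model_rates = {
    name: sum(results) / len(results) for name, results in toy_outcomes.items()
}
union_rate = union_attack_success_rate(toy_outcomes)

print(per_model_rates)
print(union_rate)
```

In this toy setup no single model fails on more than half the prompts, yet every prompt gets through at least one of them, so the combined rate is far higher than any single-model rate.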

WOW-Seg Skips the Text Prompt. Meta's Mask2Token aligns masks directly to VLLM feature space. 1/8 the parameters, beats prior SOTA on LVIS.

3D Reconstruction Adds Hallucination Score Maps to Diffusion Priors. HAD uses a feedforward novel-view network for cross-validation. Unreliable pixels get masked at pixel resolution.

Also Notable

D²Evo Pairs Two-Level Difficulty Estimation With "Medium Samples Drifting During Training." Read alongside today's RLVR Unlearnability paper. Together they cover both ends of curriculum recalibration: cut the unlearnable, chase the medium.
GUI Agent Self-Evolution Writes Past Episodes Into Retrievable Memory Instead of Context. Sidesteps the two old problems with multi-step tasks: context window limits and static policies that cannot adapt.
TRACE Does Evidence Grounding Across Multiple Videos. Video agents handling long heterogeneous corpora no longer get capped by context budget. It locates and attributes evidence scattered across multiple videos.
Geometric Theory for SSL Projection Heads. Models the head as a trainable Riemannian metric. Gives an explanation for collapse and invariance observations from engineering practice.
PluRule: Same Content, Different Community Rules, Different Compliance Calls. Pluralistic governance pushes content moderation models into compositional stress tests, not single rulebooks.
Modality-Missing Sentiment Analysis Drops Feature Completion for Decision Drift. Modality loss and quality imbalance are the real-data norm. Generative completion has its own costs.
Contamination Robustness for Multi-Task Linear Regression. Theoretical, but back-solves an upper bound on outlier-task tolerance for real multi-task training.
