AI Research Brief

May 19, 2026

8% of Tokens Decide the Reasoning Gap

  • "Unlearnable" Samples in RLVR. A set of hard examples never gets learned across training, even though rollouts produced correct answers. The reward curve climbs anyway — the easier subset does the work.
  • The Reasoning Advantage Is Sparse. The gap between base and reasoning models concentrates in about 8% of tokens, enriched at early planning decisions.
  • Single-Model Red-Teaming Isn't Real Protection. Query a set of frontier models concurrently and any weak link delivers harmful output. Success rates reach 100%.
  • WOW-Seg Skips the Text Prompt. Meta's Mask2Token aligns masks directly to VLLM feature space. 1/8 the parameters, beats prior SOTA on LVIS.
  • 3D Reconstruction Adds Hallucination Score Maps to Diffusion Priors. HAD uses a feedforward novel-view network for cross-validation. Unreliable pixels get masked at pixel resolution.

Also Notable

  • D²Evo Pairs Two-Level Difficulty Estimation With "Medium Samples Drifting During Training." Read alongside today's RLVR Unlearnability paper. Together they cover both ends of curriculum recalibration: cut the unlearnable, chase the medium.
  • GUI Agent Self-Evolution Writes Past Episodes Into Retrievable Memory Instead of Context. Sidesteps the two old problems with multi-step tasks: context window limits and static policies that cannot adapt.
  • TRACE Does Evidence Grounding Across Multiple Videos. Video agents handling long heterogeneous corpora no longer get capped by context budget. Locates and attributes evidence scattered across multiple videos.
  • Geometric Theory for SSL Projection Heads. Models the head as a trainable Riemannian metric. Gives an explanation for collapse and invariance observations from engineering practice.
  • PluRule: Same Content, Different Community Rules, Different Compliance Calls. Pluralistic governance pushes content moderation models into compositional stress tests, not single rulebooks.
  • Modality-Missing Sentiment Analysis Drops Feature Completion for Decision Drift. Modality loss and quality imbalance are the real-data norm. Generative completion has its own costs.
  • Contamination Robustness for Multi-Task Linear Regression. Theoretical, but derives an upper bound on outlier-task tolerance for real multi-task training.
