AI Research Brief

May 10, 2026

Lorem Ipsum Rescues GRPO's Wasted Hard Samples

  • Skill1 Unifies Skill Retrieval, Use, and Distillation in One Policy. A single task reward co-trains all three, avoiding interference between competing reward signals. SkillOS attacks the same problem from a different angle in the same week. Agent continual learning's bottleneck is shifting from single-step inference to skill library operations.
  • DCI Lets Agents Grep the Raw Corpus Directly. Skip embeddings, vector indexes, and retrieval APIs. Beats sparse, dense, and reranking baselines on BRIGHT, BEIR subsets, and BrowseComp-Plus. The retrieval bottleneck moves from algorithm to interface.
  • LoPE Glues Lorem Ipsum Onto the Prompt. From 1.7B to 7B, prepending random Latin beats resampling the original prompt for rescuing GRPO's zero-advantage hard samples. Moving RL exploration from output to input was a path almost nobody had tried seriously.
  • CDM Brings DMD into Continuous Time. Trajectory density and distribution matching, two competing schools, collapse into one framework. 1-4 step generation no longer leans on GAN or reward patches.
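The LoPE bullet above is concrete enough to sketch: instead of resampling the same hard prompt and getting an identical zero-advantage group, prepend a fresh run of random Latin to each rollout so exploration happens on the input side. This is a hypothetical illustration of the idea, not the paper's implementation; `perturb_prompt`, the word list, and the prefix length are all assumptions.

```python
import random

# Toy word pool standing in for "random Latin" (assumption, not from the paper).
LOREM = ("lorem ipsum dolor sit amet consectetur adipiscing elit "
         "sed do eiusmod tempor incididunt ut labore").split()

def perturb_prompt(prompt, n_words=8, seed=None):
    """Prepend n_words of random lorem ipsum to a prompt (hypothetical helper)."""
    rng = random.Random(seed)
    prefix = " ".join(rng.choices(LOREM, k=n_words))
    return f"{prefix}\n{prompt}"

# A GRPO-style group of rollouts for one "hard" sample: each rollout now sees
# a distinct input, rather than the same prompt resampled four times.
rollouts = [perturb_prompt("Prove that 2+2=4.", seed=i) for i in range(4)]
```

The point of the sketch is only the shape of the intervention: the reward, policy, and advantage computation are untouched; only the input distribution per rollout changes.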

Also Notable

  • The Other Skill-Library Paper From the Same Day. SkillOS treats "which skills are worth keeping" as a trainable decision, focusing on learning the curation operator.
  • Trajectory-Level Strategy Sampling for Agentic RL. Improves exploration and credit assignment for reactive policies on long-horizon tasks.
  • "Auto-Research" as a Closed Loop With External Metrics. Specialized agents collaborate to produce auditable trial trajectories rather than a single checkpoint.
  • Multi-Reward Balancing for Diffusion RL Fine-Tuning. MARBLE drops multi-expert and fixed-weight setups for an end-to-end approach.
  • Video Reward Model Decouples Reasoning From Scoring. Think first, then score. The next step for aligning generated video to human preference.
  • Cola DLM Builds a Hierarchical Latent Diffusion Language Model. A complete generation attempt for non-AR text. Worth a glance for anyone tracking AR alternatives.
  • Long-Context Understanding by Different Means. MiA-Signature approximates global activations' downstream impact with a compact representation, sidestepping full attention's O(N²).
  • TIDE Questions "Token Index Injected Once at the Embedding Layer." Re-injects token identity at every layer to address rare-token and long-range degradation.
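TIDE's complaint in the last bullet, that token identity injected once at the embedding layer gets diluted with depth, can be illustrated with a toy numeric model. This is a loose sketch, not the paper's architecture: `mix` is a stand-in for a transformer layer that smears information across positions, and re-adding `tok_emb` each layer stands in for TIDE's per-layer re-injection.

```python
def mix(h):
    # Toy "layer": average each position with its circular neighbors,
    # standing in for deep-layer dilution of per-token identity.
    n = len(h)
    return [(h[(i - 1) % n] + h[i] + h[(i + 1) % n]) / 3 for i in range(n)]

def run(tok_emb, depth, reinject):
    h = list(tok_emb)
    for _ in range(depth):
        h = mix(h)
        if reinject:  # TIDE-style: add token identity back at every layer
            h = [a + b for a, b in zip(h, tok_emb)]
    return h

emb = [1.0, 0.0, 0.0, 0.0]                 # a "rare token" spike at position 0
once = run(emb, depth=12, reinject=False)  # inject-once: spike diffuses away
tide = run(emb, depth=12, reinject=True)   # position 0 stays distinguishable
```

With inject-once, twelve mixing layers wash the spike out toward the uniform mean; with re-injection, position 0 remains clearly separated from its neighbors at the output, which is the degradation-vs-retention contrast the bullet describes.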

