Lorem Ipsum Rescues GRPO's Wasted Hard Samples
- Skill1 Unifies Skill Retrieval, Use, and Distillation in One Policy. A single task reward co-trains all three, avoiding interference between competing reward signals. SkillOS attacks the same problem from a different angle in the same week. Agent continual learning's bottleneck is shifting from single-step inference to skill-library operations.
- DCI Lets Agents Grep the Raw Corpus Directly. Skip embeddings, vector indexes, and retrieval APIs. Beats sparse, dense, and reranking baselines on BRIGHT, BEIR subsets, and BrowseComp-Plus. The retrieval bottleneck moves from algorithm to interface.
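The "grep the raw corpus" interface can be pictured as a minimal sketch: the agent issues regex queries against plain text files and gets back file/line hits, with no embedding or index in the loop. This is an illustration of the idea only; `grep_corpus` is a hypothetical helper, and DCI's actual interface and tooling may differ.

```python
import re
from pathlib import Path

def grep_corpus(pattern, corpus_dir, max_hits=20):
    """Grep-style retrieval over raw text files: return
    (path, line_no, line) tuples for lines matching the regex.
    No embeddings, no vector index, no retrieval API."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(Path(corpus_dir).rglob("*.txt")):
        text = path.read_text(errors="ignore")
        for i, line in enumerate(text.splitlines(), 1):
            if rx.search(line):
                hits.append((str(path), i, line.strip()))
                if len(hits) >= max_hits:
                    return hits
    return hits
```

The design point is the interface: the agent can iteratively refine patterns the way a human refines shell queries, so retrieval quality rides on the policy's query-writing skill rather than on an index.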
- LoPE Glues Lorem Ipsum Onto the Prompt. Across models from 1.7B to 7B, prepending random Latin beats resampling the original prompt for rescuing GRPO's zero-advantage hard samples. Moving RL exploration from output to input was a path almost nobody had tried seriously.
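Why hard samples are "wasted" in GRPO: the advantage is computed relative to the group, so when every rollout for a hard prompt fails, all advantages collapse to zero and the prompt contributes no gradient. A toy sketch, with `lope_prompt` as a hypothetical input-perturbation helper (the paper's exact filler recipe may differ):

```python
import random
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantage: (r - mean) / (std + eps)."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hard sample: every rollout scores 0, so all advantages are 0
# and the sample produces no policy gradient.
print(grpo_advantages([0.0, 0.0, 0.0, 0.0]))  # [0.0, 0.0, 0.0, 0.0]

LOREM = ("lorem ipsum dolor sit amet consectetur adipiscing elit "
         "sed do eiusmod tempor incididunt ut labore").split()

def lope_prompt(prompt, n_tokens=16, rng=random):
    """Input-side exploration: prepend random Latin filler so
    rollouts diverge and the reward group is no longer uniform."""
    filler = " ".join(rng.choices(LOREM, k=n_tokens))
    return f"{filler}\n{prompt}"
```

If the perturbed prompt lets even one rollout succeed, the group rewards spread out and the advantages become nonzero, which is the rescue the headline refers to.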
- CDM Brings DMD into Continuous Time. Trajectory density and distribution matching, two competing schools, collapse into one framework. 1-4 step generation no longer leans on GAN or reward patches.
Also Notable
- The Other Skill-Library Paper From the Same Day. SkillOS treats "which skills are worth keeping" as a trainable decision, focusing on learning the curation operator.
- Trajectory-Level Strategy Sampling for Agentic RL. Improves exploration and credit assignment for reactive policies on long-horizon tasks.
- "Auto-Research" as a Closed Loop With External Metrics. Specialized agents collaborate to produce auditable trial trajectories rather than a single checkpoint.
- Multi-Reward Balancing for Diffusion RL Fine-Tuning. MARBLE drops multi-expert and fixed-weight setups for an end-to-end approach.
- Video Reward Model Decouples Reasoning From Scoring. Think first, then score. The next step for aligning generated video to human preference.
- Cola DLM Builds a Hierarchical Latent Diffusion Language Model. A complete attempt at non-AR text generation. Worth a glance for anyone tracking alternatives to autoregression.
- Long-Context Understanding by Different Means. MiA-Signature approximates global activations' downstream impact with a compact representation, sidestepping full attention's O(N²).
- TIDE Questions "Token Index Injected Once at the Embedding Layer." Re-injects token identity at every layer to address rare-token and long-range degradation.