AI论文简报

February 15, 2026

Running short on RL training data? Just stitch easy problems into hard ones

  • Combine solved easy problems into new hard ones. Composition-RL turns wasted RLVR training samples into effective composite challenges, with consistent gains from 4B to 30B models.
  • 5B parameters doing the job of 80B. DeepGen 1.0 beats rivals more than ten times its size in both image generation and editing, with code and weights fully open-sourced.
  • Students can surpass their teachers. ExOPD breaks the distillation performance ceiling through "reward extrapolation," and multi-domain expert knowledge can be merged back into a single small model.
  • 1M-token context on a single A6000D. MiniCPM-SALA's hybrid of sparse and linear attention cuts long-context inference cost to one third of the original.

