AI论文简报

February 18, 2026

Binary tokens make image generation 30x faster, and RL training is learning to reflect



  • Binary tokens replace codebook indices. With 260M parameters, BitDance matches the image-generation quality of 1.4B-parameter models, runs inference 8.7x faster, and is more than 30x faster at 1024 resolution.
  • Is RL feedback too sparse for models to learn from? ERL has the model reflect on its failures before reinforcing its successes, with gains of up to 81% in complex environments.
  • Training data for search agents is expensive and hard to build. REDSearcher synthesizes high-quality, complex tasks from graph topology and pairs them with a local simulation environment, sharply cutting the cost of each RL iteration.
  • Is inference-time compute still relying on high-temperature sampling and hoping to get lucky? STATe replaces random sampling with structured reasoning templates, making the search space more controllable and more interpretable.
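To make the first item's contrast concrete, here is a minimal sketch of the general idea behind binary tokens versus codebook indices. It uses generic sign-based (lookup-free) binary quantization, which is an assumption for illustration and not BitDance's published method; the function names `binarize`, `bits_to_index`, and `nearest_codebook_index` are hypothetical. A conventional VQ tokenizer must search an explicit codebook for the nearest entry, while a d-bit binary token needs no table at all yet still addresses 2^d distinct states.

```python
from typing import List

def binarize(latent: List[float]) -> List[int]:
    """Binary token (assumed scheme): one bit per latent channel, 1 if the value is >= 0, else 0."""
    return [1 if x >= 0 else 0 for x in latent]

def bits_to_index(bits: List[int]) -> int:
    """Read the bit vector as a base-2 integer, most significant bit first."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return index

def nearest_codebook_index(latent: List[float], codebook: List[List[float]]) -> int:
    """Conventional VQ: return the index of the codebook entry with the smallest squared distance."""
    def dist(code: List[float]) -> float:
        return sum((a - c) ** 2 for a, c in zip(latent, code))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

latent = [0.7, -1.2, 0.1, -0.3]

# Binary path: no stored table; 4 bits give an implicit vocabulary of 2**4 = 16 states.
bits = binarize(latent)
print(bits, bits_to_index(bits))  # [1, 0, 1, 0] 10
print(2 ** len(latent))           # 16

# Codebook path: an explicit (toy, hypothetical) table that must be stored and searched.
codebook = [[1, 1, 1, 1], [1, -1, 1, -1], [-1, -1, -1, -1]]
print(nearest_codebook_index(latent, codebook))  # 1
```

The design point the sketch illustrates is scaling: a codebook's size and search cost grow with the number of entries, while a binary token's vocabulary grows exponentially with the bit width at no lookup cost. How BitDance actually predicts and decodes these bits is not covered by this summary.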

