Binary tokens make image generation 30x faster, and RL training is learning to reflect
- Binary tokens replace codebook indices. With 260M parameters, BitDance matches the image generation quality of 1.4B-parameter models, runs inference 8.7x faster, and exceeds a 30x speedup at 1024 resolution.
- Is RL training feedback too sparse for models to learn from? ERL has the model reflect on its failures first and then reinforces its successes, with gains of up to 81% in complex environments.
- Training data for search agents is expensive and hard to build. REDSearcher uses graph topology to synthesize high-quality complex tasks and pairs them with a local simulation environment to sharply reduce RL iteration costs.
- Does inference-time compute still rely on high-temperature sampling to get lucky? STATe replaces random sampling with structured reasoning templates, making the search space more controllable and interpretable.