Binary Tokens Make Image Generation 30x Faster, and RL Training Is Learning to Reflect

February 18, 2026

Binary tokens replace codebook indices: BitDance matches the image generation quality of 1.4B-parameter models with only 260M parameters, runs inference 8.7x faster, and is more than 30x faster at 1024 resolution.

Is RL training feedback too sparse for the model to learn from? ERL has the model reflect on its failures before reinforcing its successes, with gains of up to 81% in complex environments.

Training data for search agents is expensive and hard to build. REDSearcher synthesizes complex tasks via graph topology and pairs them with a local simulation environment to slash RL iteration costs.

Does inference-time compute still rely on high-temperature sampling and luck? STATe replaces random sampling with structured reasoning templates, making the search space more controllable and more interpretable.
