Text Diffusion Models Are No Longer Just a Proof of Concept
- Text diffusion models are no longer just a proof of concept. LLaDA2.1's 100B model reaches 892 tokens per second (TPS) on code tasks and is the first diffusion LLM (dLLM) to be trained with large-scale RL.
- Open-source joint video+audio generation is finally here. MOVA generates visuals, dialogue, sound effects, and music with a single model.
- GUI agents that actually work on real phones. UI-Venus-1.5 comes in three variants from 2B to 30B, sets new SOTA on ScreenSpot-Pro and AndroidWorld, and proved usable in hands-on tests with Chinese mobile apps.
- When post-training saturates, teach from your own weak checkpoints. WMSS uses earlier, weaker states of the model as the teacher to recover forgotten capabilities and keep improving, at zero additional inference cost.