AI论文简报 (AI Paper Briefing)

February 22, 2026

DiT Dynamic Patching Runs 3x Faster; Pared-Down Mamba Nears Softmax

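  • Latent diffusion's two-step training can be unified into one: aligning the encoder's output noise with the diffusion noise level makes training more efficient, reaching FID 1.4 on ImageNet-512.
  • DiT denoising doesn't need the finest granularity throughout: DDiT adjusts patch size dynamically to content complexity, giving a plug-and-play 3.5x speedup with no quality loss.
  • A complete, combined validation of MoE best practices: Arcee Trinity trains a 400B-parameter model with sigmoid routing and other proven techniques, with zero loss spikes over 17T tokens.
  • Mamba-2 gets more accurate by doing less. After systematically stripping out components, the simplified version nearly matches softmax attention while keeping linear complexity.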

  • Latent Diffusion's Two-Step Training Collapses Into One. Aligning encoder output noise with the diffusion schedule yields a unified objective. FID 1.4 on ImageNet-512 with lower training FLOPs.
  • DiT Denoising Doesn't Need Fine-Grained Patches at Every Step. DDiT adjusts patch size by content complexity and denoising stage. 3.5x speedup, no quality loss, no retraining.
  • A Full Validation of MoE Best Practices at Scale. Arcee Trinity combines sigmoid routing, interleaved attention, and gated attention in a 400B-parameter model trained on 17T tokens with zero loss spikes.
  • Mamba-2 Gets More Accurate by Removing Components. Systematic ablation produces a simplified variant that nearly matches softmax attention while keeping linear complexity.
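One possible reading of "aligning encoder output noise with the diffusion schedule", sketched under assumptions: if the encoder emits a latent z = μ + σ·ε with Gaussian ε, its signal-to-noise ratio is 1/σ² (taking μ as unit-scale), which matches a variance-preserving diffusion step with ᾱ = 1/(1+σ²). With an assumed cosine schedule ᾱ(t) = cos²(πt/2), the matching time has the closed form t = (2/π)·arctan σ, and rescaling z by 1/√(1+σ²) puts it exactly in the form √ᾱ·μ + √(1−ᾱ)·ε. The schedule and all function names are illustrative, not the paper's.

```python
import math
import random


def alpha_bar(t: float) -> float:
    """Signal coefficient of an (assumed) cosine VP schedule with offset s = 0, t in [0, 1]."""
    return math.cos(math.pi * t / 2) ** 2


def snr(t: float) -> float:
    """Signal-to-noise ratio alpha_bar / (1 - alpha_bar); undefined at t = 0."""
    a = alpha_bar(t)
    return a / (1.0 - a)


def matched_timestep(sigma_enc: float) -> float:
    """Diffusion time whose SNR equals 1 / sigma_enc**2.

    alpha_bar / (1 - alpha_bar) = cos^2 / sin^2 = 1 / tan^2(pi t / 2),
    so matching 1 / sigma_enc**2 gives t = (2 / pi) * atan(sigma_enc).
    """
    if sigma_enc <= 0:
        raise ValueError("sigma_enc must be positive")
    return 2.0 / math.pi * math.atan(sigma_enc)


def encode_noisy(mu, sigma_enc, rng):
    """Hypothetical encoder output: clean latent mu plus Gaussian noise of std sigma_enc."""
    return [m + sigma_enc * rng.gauss(0.0, 1.0) for m in mu]


def to_diffusion_state(z, sigma_enc):
    """Rescale z = mu + sigma*eps to sqrt(a)*mu + sqrt(1 - a)*eps with a = 1 / (1 + sigma^2)."""
    scale = 1.0 / math.sqrt(1.0 + sigma_enc ** 2)
    return [v * scale for v in z]


if __name__ == "__main__":
    rng = random.Random(0)
    sigma = 0.5
    t = matched_timestep(sigma)  # the diffusion model would start denoising from here
    z_t = to_diffusion_state(encode_noisy([0.1, -0.2, 0.3], sigma, rng), sigma)
    print(f"t = {t:.4f}", z_t)
```

Under this reading, the encoder's noisy latent is already a valid diffusion state, which is what would let the autoencoder and diffusion stages share one objective instead of being trained in sequence.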
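A minimal sketch of content- and timestep-dependent patching, assuming a single-channel latent stored as nested lists. The complexity score (latent variance), the thresholds, and the rule "coarse patches only at noisy steps on simple content" are hypothetical stand-ins, not DDiT's actual policy. What the sketch does show is the token arithmetic: for a 64×64 latent (what an 8x-downsampling VAE produces from a 512×512 image), patch size 2 yields 1,024 tokens and patch size 4 yields 256, and self-attention cost grows with the square of the token count.

```python
import statistics


def patchify(latent, p):
    """Split an H x W single-channel latent (list of rows) into non-overlapping p x p patches."""
    h, w = len(latent), len(latent[0])
    if h % p or w % p:
        raise ValueError(f"latent {h}x{w} is not divisible by patch size {p}")
    return [
        [latent[i + di][j + dj] for di in range(p) for dj in range(p)]
        for i in range(0, h, p)
        for j in range(0, w, p)
    ]


def complexity(latent):
    """Hypothetical content-complexity score: population variance of the latent values."""
    return statistics.pvariance([v for row in latent for v in row])


def choose_patch_size(t, score, fine=2, coarse=4, t_split=0.5, score_split=0.1):
    """Hypothetical policy: use coarse patches only at noisy steps (t > t_split) on simple content."""
    return coarse if t > t_split and score < score_split else fine


def relative_attention_cost(tokens_per_step, baseline_tokens):
    """Attention FLOPs scale with N^2: compare sum(N^2) against running every step at baseline_tokens."""
    return sum(n * n for n in tokens_per_step) / (len(tokens_per_step) * baseline_tokens ** 2)


if __name__ == "__main__":
    latent = [[0.0] * 64 for _ in range(64)]  # flat content, so complexity is 0
    sizes = [choose_patch_size(step / 10, complexity(latent)) for step in range(10, 0, -1)]
    tokens = [len(patchify(latent, p)) for p in sizes]
    print(sizes, relative_attention_cost(tokens, baseline_tokens=1024))
```

If half of the steps run at patch size 4, the attention term falls to (1 + 1/16)/2 ≈ 0.53 of the all-fine baseline. MLP and projection layers scale linearly with token count, so the end-to-end speedup depends on how a model's FLOPs split between attention and the rest.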
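Sigmoid routing scores each expert independently instead of normalizing logits across all experts with a softmax. Below is a minimal sketch of one common formulation: sigmoid scores, top-k selection, then renormalization over the selected experts, as used for example in DeepSeek-V3. Whether Arcee Trinity normalizes in exactly this way is an assumption, and the helper names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_topk_route(logits, k):
    """Score experts independently, keep the top-k, renormalize their weights to sum to 1.

    Unlike softmax gating, one expert's score does not depend on the other experts' logits.
    """
    if not 0 < k <= len(logits):
        raise ValueError("k must be in [1, number of experts]")
    scores = [sigmoid(z) for z in logits]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    total = sum(scores[i] for i in top)
    return {i: scores[i] / total for i in top}


def moe_combine(x, experts, logits, k):
    """Weighted sum of the selected experts' outputs for a single token x."""
    route = sigmoid_topk_route(logits, k)
    outs = {i: experts[i](x) for i in route}  # only the k selected experts run
    dim = len(next(iter(outs.values())))
    return [sum(route[i] * outs[i][d] for i in route) for d in range(dim)]


if __name__ == "__main__":
    print(sigmoid_topk_route([2.0, -1.0, 0.5, 3.0], k=2))
```

Because the sigmoid scores are not coupled through a shared denominator, raising one expert's logit does not directly suppress the others' scores, which is one commonly cited reason this gate is considered easier to keep stable.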
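The "linear complexity vs. softmax attention" contrast can be made concrete with the state-space duality that Mamba-2 is built on: a scalar-decay recurrence computes exactly the same outputs as causally masked, decay-weighted linear attention, with no softmax normalization. The sketch below (one head, one input channel, plain Python lists) shows both forms, plus causal softmax attention for comparison. It illustrates baseline Mamba-2, not the simplified variant from the summarized paper, whose removed components are not named here.

```python
import math


def ssm_recurrent(a, B, C, x):
    """Scalar-decay SSM (Mamba-2 style, one head, scalar input channel).

    h_t = a_t * h_{t-1} + B_t * x_t,  y_t = <C_t, h_t>.  O(T) time, O(N) state.
    """
    h = [0.0] * len(B[0])
    ys = []
    for a_t, b_t, c_t, x_t in zip(a, B, C, x):
        h = [a_t * h_i + b_i * x_t for h_i, b_i in zip(h, b_t)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c_t, h)))
    return ys


def ssm_as_attention(a, B, C, x):
    """Same map in quadratic form: y_t = sum_{s<=t} L[t][s] * <C_t, B_s> * x_s,
    where L[t][s] = a_{s+1} * ... * a_t is a causal decay mask (no softmax)."""
    ys = []
    for t in range(len(x)):
        y, decay = 0.0, 1.0  # decay = L[t][s], starting from the empty product at s = t
        for s in range(t, -1, -1):
            y += decay * sum(c * b for c, b in zip(C[t], B[s])) * x[s]
            decay *= a[s]
        ys.append(y)
    return ys


def causal_softmax_attention(Q, K, x):
    """Reference: y_t = sum_{s<=t} softmax_s(<q_t, k_s>) * x_s.  O(T^2) time."""
    ys = []
    for t in range(len(x)):
        logits = [sum(q * k for q, k in zip(Q[t], K[s])) for s in range(t + 1)]
        m = max(logits)  # subtract the max for numerical stability
        w = [math.exp(l - m) for l in logits]
        z = sum(w)
        ys.append(sum(w_s / z * x[s] for s, w_s in enumerate(w)))
    return ys


if __name__ == "__main__":
    a = [0.9, 0.5, 0.8, 0.7]
    B = [[1.0, 0.0], [0.5, 0.5], [0.2, 1.0], [1.0, 1.0]]
    C = [[1.0, 2.0], [0.0, 1.0], [1.0, 1.0], [0.5, -1.0]]
    x = [1.0, -2.0, 0.5, 3.0]
    print(ssm_recurrent(a, B, C, x))
    print(ssm_as_attention(a, B, C, x))
    print(causal_softmax_attention(C, B, x))
```

The recurrent and quadratic forms agree up to floating-point rounding; the difference from softmax attention lies in the weights, which are products of decays and raw dot products rather than normalized exponentials.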
