Dynamic Patches Make DiT 3x Faster; Mamba by Subtraction Approaches Softmax

February 22, 2026

Latent Diffusion's Two-Step Training Collapses Into One. Aligning encoder output noise with the diffusion schedule yields a unified objective. FID 1.4 on ImageNet-512 with lower training FLOPs.

DiT Denoising Doesn't Need Fine-Grained Patches at Every Step. DDiT adjusts patch size by content complexity and denoising stage. 3.5x speedup, no quality loss, no retraining.

A Full Validation of MoE Best Practices at Scale. Arcee Trinity combines sigmoid routing, interleaved attention, and gated attention in a 400B-parameter model trained on 17T tokens with zero loss spikes.
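For readers unfamiliar with sigmoid routing, here is a minimal sketch of the general idea, assuming the common formulation in which each expert gets an independent sigmoid score, the top-k experts are selected, and their gates are renormalized to sum to one. This is not Arcee Trinity's implementation; the function name `sigmoid_route` and the example logits are hypothetical.

```python
import math

def sigmoid_route(logits, k):
    """Select the top-k experts by sigmoid score and renormalize their gates.

    Illustrative only: `logits` holds one router logit per expert for a
    single token. Unlike softmax routing, each score is computed
    independently of the other experts.
    """
    scores = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # Indices of the k highest-scoring experts.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    total = sum(scores[i] for i in top)
    # Gates over the selected experts sum to 1.
    return {i: scores[i] / total for i in top}

# Hypothetical router logits for four experts, routing to two of them.
gates = sigmoid_route([2.0, -1.0, 0.5, 0.0], k=2)
```

Because every expert is scored on its own, raising one expert's logit does not directly lower the others' scores, which is the main contrast with a softmax router.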

Mamba-2 Gets More Accurate by Removing Components. Systematic ablation produces a simplified variant that nearly matches softmax attention while keeping linear complexity.
