Learned Sparsity Cuts Diffusion Inference Compute by 54%
- Learned sparsity cuts diffusion inference compute by 54% with no quality loss. DiffSparse trains a lightweight predictor to decide per-layer, per-step token sparsity rates. Whether the gains stack with distillation and quantization remains unverified.
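A minimal sketch of the idea, not DiffSparse's actual predictor: a cheap function maps (layer, denoising step) to a keep ratio, and only the most salient tokens survive that layer's compute. The heuristic schedule and the norm-based importance score below are assumptions standing in for the learned components.

```python
import numpy as np

def predict_keep_ratio(layer_idx, timestep, num_layers=24, num_steps=50):
    """Hypothetical stand-in for a learned sparsity predictor: keep
    more tokens in early layers and early (noisy) steps, fewer later.
    DiffSparse trains this mapping; here it is a fixed schedule."""
    layer_frac = layer_idx / max(num_layers - 1, 1)
    step_frac = timestep / max(num_steps - 1, 1)
    return float(np.clip(1.0 - 0.25 * (layer_frac + step_frac), 0.3, 1.0))

def sparse_token_pass(tokens, keep_ratio):
    """Keep the top-k tokens by L2 norm, a simple proxy for a learned
    importance score; returns the kept tokens and their indices."""
    n = tokens.shape[0]
    k = max(1, int(round(n * keep_ratio)))
    scores = np.linalg.norm(tokens, axis=-1)
    keep = np.sort(np.argsort(scores)[-k:])  # k most salient, original order
    return tokens[keep], keep
```

Only the kept tokens would be fed through the layer's attention and MLP; the compute saving is roughly quadratic in the keep ratio for attention.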
- Multi-character video identity leakage traces back to position encoding, not attention. PoCo redesigns control signals at the position embedding level, improving cross-shot consistency and reference fidelity. Sora2 is tackling the same problem.
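To make the position-embedding framing concrete, here is a toy sketch (not PoCo's actual design): each character's tokens get a disjoint positional offset before the standard sinusoidal embedding, so their control signals occupy separate regions of embedding space rather than relying on attention masks to keep identities apart. The `entity_stride` offset scheme is an assumption for illustration.

```python
import numpy as np

def sinusoidal_pe(positions, dim):
    """Standard sinusoidal position embedding for a 1D position array."""
    i = np.arange(dim // 2)
    freqs = 1.0 / (10000 ** (2 * i / dim))
    angles = positions[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

def entity_aware_pe(positions, entity_ids, dim, entity_stride=1000.0):
    """Hypothetical identity-aware encoding: shift each entity's
    positions by a large per-entity offset so two characters at the
    same frame position receive distinct embeddings."""
    shifted = positions + entity_ids * entity_stride
    return sinusoidal_pe(shifted.astype(float), dim)
```

With this scheme, token position 0 for character 0 and token position 0 for character 1 embed differently by construction, which is the kind of separation the bullet attributes to position encoding rather than attention.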
- Next-scale AR extends from images to motion generation. Coarse-to-fine hierarchical generation outperforms flattened 1D sequences. The CVPR-accepted text-to-motion model reaches SOTA and generalizes zero-shot to editing tasks.
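The coarse-to-fine loop can be sketched as follows; this is a toy illustration of next-scale autoregression applied to motion, not the paper's architecture. Random noise stands in for the learned AR model, and the scale schedule and residual refinement are assumptions.

```python
import numpy as np

def upsample_1d(seq, length):
    """Nearest-neighbour upsample of a (T, D) motion sequence along time."""
    idx = (np.arange(length) * seq.shape[0] / length).astype(int)
    return seq[idx]

def next_scale_motion(lengths=(4, 8, 16), pose_dim=6, rng=None):
    """Toy next-scale generation: produce a coarse temporal skeleton
    first, then refine at progressively finer time resolutions, each
    scale conditioned on the upsampled coarser one. Contrast with
    flattening all frames into one long 1D token sequence."""
    rng = rng or np.random.default_rng(0)
    motion = rng.normal(size=(lengths[0], pose_dim))          # coarsest scale
    for t in lengths[1:]:
        motion = upsample_1d(motion, t)                       # condition on coarse motion
        motion = motion + 0.5 * rng.normal(size=(t, pose_dim))  # residual refinement
    return motion
```

Because each scale only attends over its own (short) sequence conditioned on the coarser one, the hierarchy sidesteps the long flattened contexts that 1D AR motion models must handle.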
Also Notable
- Labels Matter More Than Images in Visual In-Context Prompt Retrieval — prompt engineering effort may be aimed at the wrong target.
- Diffusion-Generated Imagined Frames for Video Retrieval — bridges the information asymmetry when text queries describe only a video fragment.
- 3DGS Hair Reconstruction Compressed via Card Clustering — storage and rendering costs drop sharply compared with million-scale raw Gaussians.
- First Large-Scale Pixel-Level X-Ray Contraband Segmentation Benchmark — pushes security screening from bounding boxes to fine-grained segmentation.