Spectral Decay Recovers Up to 7% Accuracy Under W4A4 Quantization
- Better pretraining makes quantization worse. Amazon finds activation outlier severity scales with pretraining duration. S2D applies spectral decay during training to fix the root cause, recovering up to 7% accuracy under W4A4.
- Most fine-tuning data selection tricks are a waste of effort. Microsoft Research systematically disentangles the components and finds only gradient-based representations reliably predict downstream performance. With enough data, careful selection barely beats random.
- Token-level policy gradients fundamentally mismatch the semantic granularity of reasoning. MPO packs K consecutive tokens into a single semantic action and applies the policy gradient at that level, aligning the optimization target with the actual reasoning structure.
- Airbnb reformulates geo-retrieval as extreme classification over 25 million grid cells. This drastically narrows the candidate set before ranking, tackling the heterogeneity problem inherent in two-sided marketplaces.
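
To make the first item concrete, here is a minimal sketch of why activation outliers hurt 4-bit quantization. It is not S2D and not Amazon's setup: the symmetric per-tensor scheme, the synthetic activation values, and the names `quantize_int4` and `mean_abs_error` are all assumptions chosen for illustration. With a single shared scale, one large outlier stretches the quantization grid so far that the ordinary values collapse onto a few levels.

```python
def quantize_int4(values):
    """Symmetric per-tensor 4-bit fake quantization (illustrative assumption).

    Maps values to integers in [-8, 7] using one shared scale,
    then maps them back to floats so the error can be measured.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    scale = max_abs / 7.0  # the largest magnitude lands on level 7
    dequantized = []
    for v in values:
        q = round(v / scale)
        q = max(-8, min(7, q))  # clamp to the signed 4-bit range
        dequantized.append(q * scale)
    return dequantized


def mean_abs_error(values, dequantized, count):
    """Mean absolute error over the first `count` entries."""
    return sum(abs(values[i] - dequantized[i]) for i in range(count)) / count


# Hypothetical activations: 101 evenly spaced values in [-1, 1].
normal = [i / 50.0 - 1.0 for i in range(101)]
# The same activations plus one large outlier (value chosen arbitrarily).
with_outlier = normal + [60.0]

n = len(normal)
err_clean = mean_abs_error(normal, quantize_int4(normal), n)
err_outlier = mean_abs_error(with_outlier, quantize_int4(with_outlier), n)

print(f"error without outlier: {err_clean:.4f}")
print(f"error with outlier:    {err_outlier:.4f}")
```

In this toy case the outlier forces every ordinary value to quantize to zero, which is the failure mode that motivates fixing outliers at their source during training rather than patching them after the fact.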