Data Mixing Becomes Post-Training, Surface Cues Hijack Reasoning 38x
- Data mixing ratios move from a pre-training hyperparameter to a post-training optimization target. OptiMer trains per-dataset models, then searches for the best merge weights in parameter space. Search cost drops 15–35x (merge-search sketch below).
- Surface cues hijack LLM reasoning 8–38x more strongly than the stated target constraints, following a stable sigmoid pattern across six models. A single minimal hint recovers 15 percentage points (curve-fit sketch below).
- Dual-stream DiT unifies text semantics and spatial structure inside the architecture. MMFace-DiT beats six SOTA methods by 40% on face generation, and one model handles multiple spatial conditions (block sketch below).
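OptiMer's actual procedure isn't reproduced here; the following is a minimal sketch of the core idea, assuming per-dataset checkpoints that share one architecture and a hypothetical `evaluate` callback standing in for a held-out validation metric.

```python
# Minimal sketch of post-training merge-weight search (not OptiMer's code).
# Assumes all checkpoints share one architecture; `evaluate` is a
# hypothetical callback that scores a merged state dict on validation data.
import torch

def merge_state_dicts(state_dicts, weights):
    """Convex combination of corresponding parameter tensors."""
    return {
        key: sum(w * sd[key] for w, sd in zip(weights, state_dicts))
        for key in state_dicts[0]
    }

def search_merge_weights(state_dicts, evaluate, n_trials=64, seed=0):
    """Random search over the weight simplex; keep the best-scoring mix."""
    torch.manual_seed(seed)
    simplex = torch.distributions.Dirichlet(torch.ones(len(state_dicts)))
    best_score, best_weights = float("-inf"), None
    for _ in range(n_trials):
        weights = simplex.sample().tolist()  # uniform sample over the simplex
        score = evaluate(merge_state_dicts(state_dicts, weights))
        if score > best_score:
            best_score, best_weights = score, weights
    return best_weights, best_score
```

The parameter-space framing is what makes this cheap: each candidate mixture presumably costs one merge plus one evaluation rather than a full training run, which would account for the reported 15–35x saving.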
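The sigmoid claim is easy to picture with a quick curve fit. Below is a sketch on synthetic data; the axes (cue strength vs. hijack rate) and every number are illustrative assumptions, not the paper's measurements.

```python
# Illustrative sigmoid fit on synthetic data (hypothetical axes:
# cue strength x vs. fraction of hijacked responses y).
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, ceiling, slope, midpoint):
    """Standard logistic curve."""
    return ceiling / (1.0 + np.exp(-slope * (x - midpoint)))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 9)
y = sigmoid(x, 0.9, 10.0, 0.5) + rng.normal(0.0, 0.02, x.size)  # fake data

(ceiling, slope, midpoint), _ = curve_fit(sigmoid, x, y, p0=[1.0, 5.0, 0.5])
print(f"ceiling={ceiling:.2f} slope={slope:.2f} midpoint={midpoint:.2f}")
```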
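MMFace-DiT's exact block design isn't public in this summary; here is a minimal MMDiT-style sketch of what "dual-stream" typically means, assuming per-stream norms and projections with a single joint attention over the concatenated token sequences.

```python
# Minimal dual-stream transformer block sketch (MMDiT-style; not
# MMFace-DiT's actual code). Text and spatial tokens keep separate
# norms/projections but attend jointly in one attention pass.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualStreamBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.heads = heads
        self.norm_txt, self.norm_img = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.qkv_txt, self.qkv_img = nn.Linear(dim, 3 * dim), nn.Linear(dim, 3 * dim)
        self.proj_txt, self.proj_img = nn.Linear(dim, dim), nn.Linear(dim, dim)

    def _heads(self, qkv: torch.Tensor):
        # (B, N, 3D) -> three (B, heads, N, D/heads) tensors.
        b, n, _ = qkv.shape
        q, k, v = qkv.chunk(3, dim=-1)
        return tuple(t.view(b, n, self.heads, -1).transpose(1, 2) for t in (q, k, v))

    def forward(self, txt: torch.Tensor, img: torch.Tensor):
        qt, kt, vt = self._heads(self.qkv_txt(self.norm_txt(txt)))
        qi, ki, vi = self._heads(self.qkv_img(self.norm_img(img)))
        # Joint attention: each stream sees the other's keys/values,
        # which is where text semantics meet spatial structure.
        q, k, v = (torch.cat(p, dim=2) for p in ((qt, qi), (kt, ki), (vt, vi)))
        out = F.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(txt.shape[0], -1, txt.shape[-1])
        t_out, i_out = out.split([txt.shape[1], img.shape[1]], dim=1)
        return txt + self.proj_txt(t_out), img + self.proj_img(i_out)

# Usage: DualStreamBlock(256)(torch.randn(2, 77, 256), torch.randn(2, 64, 256))
```

Supporting multiple spatial conditions would then amount to feeding different condition token sets through the same spatial stream, rather than training one model per condition.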
Also Notable
- Fixed Evaluators in Automated Discovery Let the Search Game the Test — Co-evolving evaluators and discovery processes to prevent reward hacking.
- Noise Pre-Training Improves Implicit Neural Representations — CVPR work challenging the assumption that initialization must be data-driven.
- Panorama-to-3D Scene Generation With Spatial Consistency — CVPR work resolving the consistency vs. controllability trade-off in immersive scene generation.
- CMU's Multilingual Phoneme Recognition Recipe — Systematic validation of how English pre-trained representations generalize to low-resource languages.