PDEs Beat Attention 2x, Local RL Saves 3/4 Compute
- Decomposing formal proofs into three independent RL tasks beats end-to-end training. LongCat-Flash-Prover separates autoformalization, scaffolding, and step-by-step proving, each with its own RL loop. HisPO stabilizes MoE long-chain training. The methodology transfers regardless of model scale.
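The three-stage decomposition can be sketched as a simple pipeline in which each stage is a separately trained policy. Everything here is a hypothetical stand-in: the callable names and signatures are assumptions, not the paper's API; the point is only that the stages compose sequentially while being trained independently.

```python
def prove(informal_statement, autoformalize, scaffold, prove_steps):
    """Three-stage proving pipeline sketch (hypothetical interfaces).

    autoformalize: informal statement -> formal statement
    scaffold:      formal statement  -> proof outline (list of subgoals)
    prove_steps:   one subgoal       -> proof of that subgoal

    Each callable stands in for a model trained with its own RL loop,
    rather than one model trained end to end on the whole task.
    """
    formal = autoformalize(informal_statement)   # stage 1: autoformalization
    outline = scaffold(formal)                   # stage 2: proof scaffolding
    return [prove_steps(goal) for goal in outline]  # stage 3: step-by-step proving
```

Because each stage exposes a plain input/output contract, any one stage can be retrained or swapped without touching the others, which is what makes the per-stage RL loops independent.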
- Layering local RL on top of SFT trajectories reaches near end-to-end performance at one quarter the compute. PivotRL only rolls out at high-variance "pivot" steps. On OOD tasks it beats standard SFT by 10%. Already deployed in NVIDIA's Nemotron production models.
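A minimal sketch of the pivot-selection idea, assuming per-step variance estimates are available (e.g. variance of action log-probs across sampled continuations). The function name, the variance source, and the top-fraction heuristic are all assumptions for illustration; the paper's actual criterion may differ.

```python
import numpy as np

def select_pivot_steps(step_variances, top_frac=0.25):
    """Pick the high-variance 'pivot' steps of an SFT trajectory.

    step_variances: per-step uncertainty estimates along one trajectory
    (hypothetical; e.g. variance of policy log-probs over continuations).
    Only the top fraction of steps gets RL rollouts; the remaining steps
    stay on the SFT trajectory, saving roughly (1 - top_frac) of rollout
    compute -- consistent with the "one quarter the compute" framing.
    """
    v = np.asarray(step_variances, dtype=float)
    k = max(1, int(round(top_frac * len(v))))
    # Indices of the k highest-variance steps, returned in trajectory order.
    pivots = np.sort(np.argpartition(v, -k)[-k:])
    return pivots.tolist()
```

Rolling out only at these indices concentrates exploration where the policy is most uncertain, which is where RL feedback differs most from the SFT target.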
- A PDE replaces self-attention in world model prediction with 2x lower reconstruction error. FluidWorld uses reaction-diffusion equations for spatial inductive bias and O(N) complexity. Multi-step predictions stay stable where Transformers degrade.
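To make the O(N) claim concrete, here is a minimal sketch of token mixing via a discretized reaction-diffusion step, in the spirit described above. The step sizes, the tanh reaction term, and the periodic boundary are illustrative assumptions, not FluidWorld's actual equations; each update touches only neighboring tokens, so cost is linear in sequence length rather than quadratic as in self-attention.

```python
import numpy as np

def reaction_diffusion_mix(tokens, dt=0.1, diffusivity=0.5, steps=4):
    """O(N) token mixing via an explicit reaction-diffusion update.

    tokens: (N, d) array of token states. The diffusion term couples each
    token to its sequence neighbors (the spatial inductive bias); the
    pointwise "reaction" term is a per-token nonlinear update.
    """
    u = tokens.astype(float)
    for _ in range(steps):
        # Discrete 1D Laplacian along the sequence: u[i-1] - 2*u[i] + u[i+1],
        # with periodic boundary conditions via np.roll (an assumption).
        lap = np.roll(u, 1, axis=0) - 2.0 * u + np.roll(u, -1, axis=0)
        # Reaction term: any pointwise nonlinearity; tanh is a placeholder.
        u = u + dt * (diffusivity * lap + np.tanh(u))
    return u
```

Each of the `steps` iterations costs O(N·d), and the locality of the Laplacian is what supplies the spatial prior that attention would otherwise have to learn.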
- Aligning language and actions at inference time beats baking reasoning supervision into training. RoboAlign samples action tokens via natural-language reasoning at test time, then applies RL alignment. Just 1% of the data, applied after SFT, yields significant gains.
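The inference-time step can be caricatured as best-of-K selection: sample several reasoning-conditioned action sequences, score each for consistency with the stated reasoning, and keep the best. Both callables below are hypothetical stand-ins for the policy and the alignment reward; the actual RoboAlign procedure (RL alignment on top of this) is not reproduced here.

```python
import random

def align_at_inference(sample_actions, score_alignment, reasoning, k=8, seed=0):
    """Best-of-K sketch of inference-time language-action alignment.

    sample_actions(reasoning, rng): returns one candidate action sequence
        conditioned on a natural-language reasoning string (hypothetical).
    score_alignment(reasoning, actions): reward for how well the actions
        match the stated reasoning (hypothetical).
    """
    rng = random.Random(seed)
    # Draw K candidate action sequences conditioned on the same reasoning.
    candidates = [sample_actions(reasoning, rng) for _ in range(k)]
    # Keep the candidate whose actions best match the reasoning.
    return max(candidates, key=lambda a: score_alignment(reasoning, a))
```

Because alignment happens at sampling time, the base policy needs no extra reasoning supervision during training, which is the contrast the bullet draws.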
Also Notable
- F4Splat Uses Predictive Densification for 3D Gaussian Splatting — Controls total Gaussian count while maintaining reconstruction quality.
- Unified Framework for Discrete Diffusion With Arbitrary Noise Processes — Clean forward and reverse derivations.
- Learnable Sparse Memory Banks for Transformers — Retrieves stored knowledge via cross-attention. (ICLR)
- Context-Aware Adaptive Fine-Tuning for Vision Encoders — No more choosing between freezing and fine-tuning. (CVPR)
- Constraint-Based Filtering for Synthetic Multi-Step Reasoning Data — Systematically improves synthetic data reliability. (AAAI)
- Uncertainty-Adaptive Knowledge Distillation — Dynamically balances learning from data vs. teacher per sample. No manual tuning. (CVPR)
- Frequency-Domain Switching for Parameter-Efficient Multi-Task Learning — One model, multiple tasks. (CVPR)
- Root Cause of Sampling Bias in Latent Diffusion Models Found — Variance inflation loss correction. (CVPR)
- Emotion-Driven 3D Talking Head Synthesis — Expression control under few-shot personalization. (CVPR)
- 3D Reconstruction With Physical Uncertainty Propagation — Reconstructions that respect physics, not just visual fidelity. (CVPR)