PDEs Beat Attention 2x, Local RL Saves 3/4 Compute
- Decomposing formal proofs into three independent RL tasks beats end-to-end training. LongCat-Flash-Prover separates autoformalization, scaffolding, and step-by-step proving, each with its own RL loop. HisPO stabilizes MoE long-chain training. The methodology transfers regardless of model scale.
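The three-stage decomposition can be sketched as a simple pipeline in which each stage is a separately trained policy. Everything here is a hypothetical stand-in: the callable names and signatures are assumptions, not the paper's API; the point is only that the stages compose sequentially while being trained independently.

```python
def prove(informal_statement, autoformalize, scaffold, prove_steps):
    """Three-stage proving pipeline sketch (hypothetical interfaces).

    autoformalize: informal statement -> formal statement
    scaffold:      formal statement  -> proof outline (list of subgoals)
    prove_steps:   one subgoal       -> proof of that subgoal

    Each callable stands in for a model trained with its own RL loop,
    rather than one model trained end to end on the whole task.
    """
    formal = autoformalize(informal_statement)   # stage 1: autoformalization
    outline = scaffold(formal)                   # stage 2: proof scaffolding
    return [prove_steps(goal) for goal in outline]  # stage 3: step-by-step proving
```

Because each stage exposes a plain input/output contract, any one stage can be retrained or swapped without touching the others, which is what makes the per-stage RL loops independent.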
- Layering local RL on top of SFT trajectories reaches near end-to-end performance at one quarter the compute. PivotRL only rolls out at high-variance "pivot" steps. On OOD tasks it beats standard SFT by 10%. Already deployed in NVIDIA's Nemotron production models.
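A minimal sketch of the pivot-selection idea, assuming per-step variance estimates are available (e.g. variance of action log-probs across sampled continuations). The function name, the variance source, and the top-fraction heuristic are all assumptions for illustration; the paper's actual criterion may differ.

```python
import numpy as np

def select_pivot_steps(step_variances, top_frac=0.25):
    """Pick the high-variance 'pivot' steps of an SFT trajectory.

    step_variances: per-step uncertainty estimates along one trajectory
    (hypothetical; e.g. variance of policy log-probs over continuations).
    Only the top fraction of steps gets RL rollouts; the remaining steps
    stay on the SFT trajectory, saving roughly (1 - top_frac) of rollout
    compute -- consistent with the "one quarter the compute" framing.
    """
    v = np.asarray(step_variances, dtype=float)
    k = max(1, int(round(top_frac * len(v))))
    # Indices of the k highest-variance steps, returned in trajectory order.
    pivots = np.sort(np.argpartition(v, -k)[-k:])
    return pivots.tolist()
```

Rolling out only at these indices concentrates exploration where the policy is most uncertain, which is where RL feedback differs most from the SFT target.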
- A PDE replaces self-attention in world model prediction with 2x lower reconstruction error. FluidWorld uses reaction-diffusion equations for spatial inductive bias and O(N) complexity. Multi-step predictions stay stable where Transformers degrade.
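To make the O(N) claim concrete, here is a minimal sketch of token mixing via a discretized reaction-diffusion step, in the spirit described above. The step sizes, the tanh reaction term, and the periodic boundary are illustrative assumptions, not FluidWorld's actual equations; each update touches only neighboring tokens, so cost is linear in sequence length rather than quadratic as in self-attention.

```python
import numpy as np

def reaction_diffusion_mix(tokens, dt=0.1, diffusivity=0.5, steps=4):
    """O(N) token mixing via an explicit reaction-diffusion update.

    tokens: (N, d) array of token states. The diffusion term couples each
    token to its sequence neighbors (the spatial inductive bias); the
    pointwise "reaction" term is a per-token nonlinear update.
    """
    u = tokens.astype(float)
    for _ in range(steps):
        # Discrete 1D Laplacian along the sequence: u[i-1] - 2*u[i] + u[i+1],
        # with periodic boundary conditions via np.roll (an assumption).
        lap = np.roll(u, 1, axis=0) - 2.0 * u + np.roll(u, -1, axis=0)
        # Reaction term: any pointwise nonlinearity; tanh is a placeholder.
        u = u + dt * (diffusivity * lap + np.tanh(u))
    return u
```

Each of the `steps` iterations costs O(N·d), and the locality of the Laplacian is what supplies the spatial prior that attention would otherwise have to learn.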
- Aligning language and actions at inference time beats baking reasoning supervision into training. RoboAlign samples action tokens via natural-language reasoning at test time, then applies RL alignment. Just 1% of the data, applied after SFT, yields significant gains.
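The inference-time step can be caricatured as best-of-K selection: sample several reasoning-conditioned action sequences, score each for consistency with the stated reasoning, and keep the best. Both callables below are hypothetical stand-ins for the policy and the alignment reward; the actual RoboAlign procedure (RL alignment on top of this) is not reproduced here.

```python
import random

def align_at_inference(sample_actions, score_alignment, reasoning, k=8, seed=0):
    """Best-of-K sketch of inference-time language-action alignment.

    sample_actions(reasoning, rng): returns one candidate action sequence
        conditioned on a natural-language reasoning string (hypothetical).
    score_alignment(reasoning, actions): reward for how well the actions
        match the stated reasoning (hypothetical).
    """
    rng = random.Random(seed)
    # Draw K candidate action sequences conditioned on the same reasoning.
    candidates = [sample_actions(reasoning, rng) for _ in range(k)]
    # Keep the candidate whose actions best match the reasoning.
    return max(candidates, key=lambda a: score_alignment(reasoning, a))
```

Because alignment happens at sampling time, the base policy needs no extra reasoning supervision during training, which is the contrast the bullet draws.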
Also Notable
- F4Splat Uses Predictive Densification for 3D Gaussian Splatting — Controls total Gaussian count while maintaining reconstruction quality.
- Unified Framework for Discrete Diffusion With Arbitrary Noise Processes — Clean forward and reverse derivations.
- Learnable Sparse Memory Banks for Transformers — Retrieves stored knowledge via cross-attention. (ICLR)
- Context-Aware Adaptive Fine-Tuning for Vision Encoders — No more choosing between freezing and fine-tuning. (CVPR)
- Constraint-Based Filtering for Synthetic Multi-Step Reasoning Data — Systematically improves synthetic data reliability. (AAAI)
- Uncertainty-Adaptive Knowledge Distillation — Dynamically balances learning from data vs. teacher per sample. No manual tuning. (CVPR)
- Frequency-Domain Switching for Parameter-Efficient Multi-Task Learning — One model, multiple tasks. (CVPR)
- Root Cause of Sampling Bias in Latent Diffusion Models Found — Variance inflation loss correction. (CVPR)
- Emotion-Driven 3D Talking Head Synthesis — Expression control under few-shot personalization. (CVPR)
- 3D Reconstruction With Physical Uncertainty Propagation — Reconstructions that respect physics, not just visual fidelity. (CVPR)