RL Patches 3D Consistency Into Video Models Without Touching Architecture
- Microsoft Patches 3D Consistency Into Video Models Through RL. World-R1 turns 3D constraints into a reward signal and pairs them with a text-only world simulation dataset, so a deployed video backbone gains geometric capability without architectural surgery.
- Meta Reduces Image-Editing CoT to Five Meta-Tasks. The average gain across 21 tasks is 15.8%, and a CoT-Editing consistency reward forces what the model "thinks" to align with what it actually does.
- Process Reward Models Trained on Math Break When Ported to Data Analysis. DataPRM lets the reward model execute code to verify intermediate state, and uses a three-valued reward to separate honest exploration from real failure.
- Financial Agent Sycophancy Comes From Pre-Stated User Preferences, Not Pushback After the Answer. Most models drift toward the user's prior; input-side filtering only partially mitigates it.
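The three-valued reward in the DataPRM item above can be sketched in a few lines. This is a hypothetical illustration of the idea (names like `score_step` and the verdict labels are invented here, not DataPRM's actual API): a step that merely probes the data gets a neutral score, while steps that make a claim are verified by comparing the claimed intermediate state against what the executed code actually produced.

```python
from enum import IntEnum

class StepReward(IntEnum):
    # Hypothetical three-valued reward: separates honest exploration
    # (e.g. inspecting a dataframe) from a genuinely wrong step.
    FAILURE = -1      # claimed intermediate state contradicts execution
    EXPLORATION = 0   # neutral probing step, no verifiable claim made
    PROGRESS = 1      # claimed intermediate state matches execution

def score_step(claimed_state, executed_state, is_exploratory):
    """Score one reasoning step by executing its code and comparing
    the claimed intermediate state against the actual one."""
    if is_exploratory:
        return StepReward.EXPLORATION
    if claimed_state == executed_state:
        return StepReward.PROGRESS
    return StepReward.FAILURE
```

A binary right/wrong reward would punish exploratory steps that state no result at all; the middle value is what lets the model probe data without being penalized for it.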
Also Notable
- Attention-Based VLM Token Pruning Gets Reexamined. ICLR's LearnPruner argues that attention-score-based pruning carries systematic bias, and rethinks both what to prune and how.
- Existing Streaming VideoQA Benchmarks Are All Retrospective. Yale points out that pausing at fixed timestamps doesn't reflect real-time streaming responses, and proposes an "every-frame prediction counts" evaluation paradigm.
- Reason-Then-Act Agents Only Touch a Single Environment Per Step. ACL's DPEPO runs parallel exploration across multiple environments, expanding per-step information and easing under-exploration.
- DPO Preference Data Built From the LVLM's Own Outputs. ACL: avoids the distribution drift that comes with proprietary-model preference construction, and the self-correction path reduces hallucination.
- Chart-To-Code Has Always Been Python-Centric. ACL uses multi-language scripts of the same chart as alignment supervision, teaching the model chart semantics decoupled from any specific language.
- VLMs Fail at Cross-Frame Reasoning in Dynamic Physical Scenes. ICLR's PhysNote uses self-knowledge notes to let models accumulate physical commonsense over time, handling real-world cases beyond the textbook.
- Financial Time-Series Modeling Moves From Predicting Numbers to Giving Advice. This ICLR paper requires the model to output a direction call, its reasoning, and risk management, with hindsight preferences as the training signal.
- Autonomous Driving Topology Reasoning Usually Relies on Simplified MLPs. CVPR's TopoHR introduces a point-to-instance hierarchical centerline representation, raising geometric precision in topology reasoning.
- The Chinese Imperial Examination System as a Benchmark for Expert Historical Reasoning. ACL: moves beyond breadth-of-knowledge tests into source criticism and long-horizon reasoning.
- Standard CT Report Generation Metrics Are Too Coarse. ACL's CT-FineBench evaluates by disease attribute, so "overall report similarity" no longer masks loss of diagnostic fidelity.
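For context on the self-generated DPO item above: standard DPO contrasts the log-probabilities of a chosen and a rejected response under the policy and a frozen reference model. The sketch below is the generic DPO loss for one preference pair, not the paper's specific pipeline; in the self-generated setup, both responses would be sampled from the LVLM itself, with the self-corrected one taken as "chosen", which avoids the distribution shift of preferences built by an external proprietary model.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Generic DPO loss for one preference pair.

    Each argument is a summed token log-probability of a response
    under the policy (pi_*) or the frozen reference (ref_*).
    beta scales how strongly the policy is pushed off the reference.
    """
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(logits)): small when the policy already prefers
    # "chosen" more than the reference does, large otherwise.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy equals the reference, the loss is log 2; raising the chosen response's policy log-probability drives it down, which is the gradient signal DPO trains on.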