RL Patches 3D Consistency Into Video Models Without Touching Architecture
- Microsoft Patches 3D Consistency Into Video Models Through RL. World-R1 turns 3D constraints into a reward signal and pairs them with a text-only world simulation dataset, so a deployed video backbone gains geometric capability without architectural surgery.
- Meta Reduces Image-Editing CoT to Five Meta-Tasks. The average gain across 21 tasks is 15.8%, and a CoT-Editing consistency reward forces what the model "thinks" to align with what it actually does.
- Process Reward Models Trained on Math Break When Ported to Data Analysis. DataPRM lets the reward model execute code to verify intermediate state, and uses a three-valued reward to separate honest exploration from real failure.
- Financial Agent Sycophancy Comes From Pre-Stated User Preferences, Not Pushback After the Answer. Most models drift toward the user's prior; input-side filtering only partially mitigates it.
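The three-valued reward in the DataPRM item above can be sketched in a few lines. This is a hypothetical illustration of the idea (names like `score_step` and the verdict labels are invented here, not DataPRM's actual API): a step that merely probes the data gets a neutral score, while steps that make a claim are verified by comparing the claimed intermediate state against what the executed code actually produced.

```python
from enum import IntEnum

class StepReward(IntEnum):
    # Hypothetical three-valued reward: separates honest exploration
    # (e.g. inspecting a dataframe) from a genuinely wrong step.
    FAILURE = -1      # claimed intermediate state contradicts execution
    EXPLORATION = 0   # neutral probing step, no verifiable claim made
    PROGRESS = 1      # claimed intermediate state matches execution

def score_step(claimed_state, executed_state, is_exploratory):
    """Score one reasoning step by executing its code and comparing
    the claimed intermediate state against the actual one."""
    if is_exploratory:
        return StepReward.EXPLORATION
    if claimed_state == executed_state:
        return StepReward.PROGRESS
    return StepReward.FAILURE
```

A binary right/wrong reward would punish exploratory steps that state no result at all; the middle value is what lets the model probe data without being penalized for it.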
Also Notable
- Attention-Based VLM Token Pruning Gets Reexamined. ICLR's LearnPruner argues that attention-score-based pruning carries systematic bias, and rethinks both what to prune and how.
- Existing Streaming VideoQA Benchmarks Are All Retrospective. Yale points out that pausing at fixed timestamps doesn't reflect real-time streaming responses, and proposes an "every-frame prediction counts" evaluation paradigm.
- Reason-Then-Act Agents Only Touch a Single Environment Per Step. ACL's DPEPO runs parallel exploration across multiple environments, expanding per-step information and easing under-exploration.
- DPO Preference Data Built From the LVLM's Own Outputs. ACL: avoids the distribution drift that comes with proprietary-model preference construction, and the self-correction path reduces hallucination.
- Chart-To-Code Has Always Been Python-Centric. ACL uses multi-language scripts of the same chart as alignment supervision, teaching the model chart semantics decoupled from any specific language.
- VLMs Fail at Cross-Frame Reasoning in Dynamic Physical Scenes. ICLR's PhysNote uses self-knowledge notes to let models accumulate physical commonsense over time, handling real-world cases beyond the textbook.
- Financial Time-Series Modeling Moves From Predicting Numbers to Giving Advice. This ICLR paper requires the model to output a direction call, its reasoning, and risk management, with hindsight preferences as the training signal.
- Autonomous Driving Topology Reasoning Usually Relies on Simplified MLPs. CVPR's TopoHR introduces a point-to-instance hierarchical centerline representation, raising geometric precision in topology reasoning.
- The Chinese Imperial Examination System as a Benchmark for Expert Historical Reasoning. ACL: moves beyond breadth-of-knowledge tests into source criticism and long-horizon reasoning.
- Standard CT Report Generation Metrics Are Too Coarse. ACL's CT-FineBench evaluates by disease attribute, so "overall report similarity" no longer masks loss of diagnostic fidelity.
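For context on the self-generated DPO item above: standard DPO contrasts the log-probabilities of a chosen and a rejected response under the policy and a frozen reference model. The sketch below is the generic DPO loss for one preference pair, not the paper's specific pipeline; in the self-generated setup, both responses would be sampled from the LVLM itself, with the self-corrected one taken as "chosen", which avoids the distribution shift of preferences built by an external proprietary model.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Generic DPO loss for one preference pair.

    Each argument is a summed token log-probability of a response
    under the policy (pi_*) or the frozen reference (ref_*).
    beta scales how strongly the policy is pushed off the reference.
    """
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(logits)): small when the policy already prefers
    # "chosen" more than the reference does, large otherwise.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy equals the reference, the loss is log 2; raising the chosen response's policy log-probability drives it down, which is the gradient signal DPO trains on.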