Recalibrating the Critic Lifts Reasoning Models 18 Points
- Self-Trained Reasoning Models Stall Because the Critic Drifts. TEMPO recalibrates the critic against a small labeled set. OLMO3-7B jumps from 33% to 51% on AIME 2024, Qwen3-14B from 42% to 66%. Diversity holds.
- 8M–30M Micro LMs Write the First 4–8 Words On-Device. A cloud model continues asynchronously. From the user's perspective, latency disappears; the device-vs-cloud question stops being either-or.
- LoRA's "Locality" Is a Diagnostic Axis Worth Isolating. ShadowPEFT moves adaptation from weight space to layer space using a centralized shadow network. Same architectural signal as the B-matrix symmetry paper from two days ago.
- What Gives Away an AI Shopping Video Isn't Picture Quality. It's hand and face anomalies plus fingers clipping through products. CoInteract bakes spatial structure into generation through dual-stream training, with the auxiliary stream removed at inference, so generation cost stays flat.
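TEMPO's exact recalibration procedure isn't spelled out in the blurb above; the following is a minimal, generic sketch of the idea of recalibrating a drifted critic against a small labeled set, using Platt-style temperature-and-bias fitting on the critic's raw scores. The function names and the gradient-descent fit are illustrative assumptions, not TEMPO's method.

```python
import math

def recalibrate(scores, labels, lr=0.1, steps=500):
    """Fit a temperature t and bias b so that sigmoid(t*s + b)
    matches the labels, via gradient descent on binary cross-entropy.

    scores: raw (possibly drifted) critic scores
    labels: 0/1 ground truth from a small labeled set
    """
    t, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        gt = gb = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(t * s + b)))
            gt += (p - y) * s / n  # d(BCE)/dt
            gb += (p - y) / n      # d(BCE)/db
        t -= lr * gt
        b -= lr * gb
    return t, b

def calibrated(score, t, b):
    """Map a raw critic score to a calibrated probability."""
    return 1.0 / (1.0 + math.exp(-(t * score + b)))
```

If the critic has drifted upward (inflating all scores), the fitted bias comes out negative, pulling borderline scores back below the decision threshold without retraining the critic itself.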
Also Notable
- AnyRecon Treats Video Diffusion as a Universal 3D Reconstruction Prior — Feeds any number of unordered inputs straight in, sidestepping the geometric consistency problem under sparse views.
- Tstars-Tryon 1.0 Publishes Engineering Trade-offs for Production Virtual Try-On — Stability under extreme pose, lighting, and motion blur, plus serving latency, with real deployment detail.
- SmartPhotoCrafter Couples Reasoning, Generation, and Optimization Into End-to-End Photo Editing — Lowers the entry barrier for non-experts who can't write aesthetic instructions.

- Chat2Workflow Is the First Benchmark for LLMs Generating Executable Visual Workflows from Natural Language — Moves the direction from ad-hoc engineering experiments to quantifiable comparison.
- 15 LLMs Across 8 Tasks Show Zero-Shot Ability Explains Only Part of the Variance in Final Optimized Performance — Where the rest comes from is worth digging into.
- CityRAG Turns City Generation Into a Controllable Simulation Environment for Autonomous Driving — Supports arbitrary weather and dynamic-object configuration.
- DASH-KV Accelerates Long-Context Inference With Asymmetric KV Cache Hashing — Sidesteps the generation-quality trade-off in standard KV compression.
- GRASPrune Jointly Prunes FFN Channels and KV Head Groups Post-Pretraining — Structured pruning under a unified budget.
- Treats Evaluation, Not Models, as the Real Bottleneck for Scientific Discovery — A perspective-flipped diagnostic.
- RARE Moves RAG Evaluation Past the "Documents Are Distinct" Assumption, Onto Earnings Reports, Legal Filings, and Patents — Redundancy-aware evaluation is the next gap in RAG benchmarking.