AI Research Brief

April 25, 2026

Recalibrating the Critic Lifts Reasoning Models 18 to 24 Points

  • Self-Trained Reasoning Models Stall Because the Critic Drifts. TEMPO recalibrates the critic against a small labeled set. OLMO3-7B jumps from 33% to 51% on AIME 2024, Qwen3-14B from 42% to 66%. Diversity holds.
  • 8M–30M Micro LMs Write the First 4–8 Words On-Device. A cloud model continues asynchronously. From the user's perspective, latency disappears; the device-vs-cloud question stops being either-or.
  • LoRA's "Locality" Is a Diagnostic Axis Worth Isolating. ShadowPEFT moves adaptation from weight space to layer space using a centralized shadow network. Same architectural signal as the B-matrix symmetry paper from two days ago.
  • What Gives Away an AI Shopping Video Isn't Picture Quality. It's hand and face anomalies plus fingers clipping through products. CoInteract bakes spatial structure into generation through dual-stream training, with the auxiliary stream removed at inference, so generation cost stays flat.

Also Notable

  • AnyRecon Treats Video Diffusion as a Universal 3D Reconstruction Prior — Feeds any number of unordered inputs straight in, sidestepping the geometric consistency problem under sparse views.
  • Tstars-Tryon 1.0 Publishes Engineering Trade-offs for Production Virtual Try-On — Stability under extreme pose, lighting, and motion blur, plus serving latency, with real deployment detail.
  • SmartPhotoCrafter Couples Reasoning, Generation, and Optimization Into End-to-End Photo Editing — Sidesteps the entry pain of non-experts who can't write aesthetic instructions.
  • Chat2Workflow Is the First Benchmark for LLMs Generating Executable Visual Workflows from Natural Language — Pulls the direction from engineering experiment to quantifiable comparison.
  • 15 LLMs Across 8 Tasks Show Zero-Shot Ability Explains Only Part of Final Optimization Variance — Where the rest comes from is worth digging into.
  • CityRAG Turns City Generation Into a Controllable Simulation Environment for Autonomous Driving — Supports arbitrary weather and dynamic-object configuration.
  • DASH-KV Accelerates Long-Context Inference With Asymmetric KV Cache Hashing — Sidesteps the generation-quality trade-off in standard KV compression.
  • GRASPrune Jointly Prunes FFN Channels and KV Head Groups Post-Pretraining — Structured pruning under a unified budget.
  • Treats Evaluation, Not Models, as the Real Bottleneck for Scientific Discovery — A perspective-flipped diagnostic.
  • RARE Moves RAG Evaluation From "Documents Are Distinct" Onto Earnings Reports, Legal, and Patents — Redundancy-aware is the next gap in RAG evaluation.

