AI Research Brief
Archives
Flow-OPD Lifts GenEval From 63 to 92
May 13, 2026
Image Generation Alignment and LLM Post-Training Now Share One Toolbox. Flow-OPD ports On-Policy Distillation to flow matching. SD 3.5 Medium hits GenEval 92...
Geometry Conflict Predicts Continual Fine-Tuning Forgetting
May 12, 2026
Geometry Conflict Predicts Continual Fine-Tuning Forgetting. Treating each task's parameter-update covariance as a measurable signal, GCWM beats data-free...
Lorem Ipsum Rescues GRPO's Wasted Hard Samples
May 10, 2026
Skill1 Unifies Skill Retrieval, Use, and Distillation in One Policy. A single task reward co-trains all three, avoiding interference between competing reward...
10.6k SFT Trajectories Match Full RL Pipeline; Mamba Beats LZMA
May 8, 2026
10.6k Curated Trajectories Match a Four-Stage RL Pipeline. OpenSeeker-v2 expands knowledge graph and tool set, applies strict low-step filtering. Pure SFT on...
T²PO Stabilizes Multi-Turn RL; MotionCache Cuts Video Steps 6x
May 7, 2026
Multi-Turn Agent RL Collapse May Not Be a Credit Assignment Problem. T²PO uses model self-uncertainty to trigger thinking and resampling. Stability and final...
Gradient Boosting Turns Out to Be Diffusion's Asymptotic Optimum
May 6, 2026
Multi-Object Generation Failures Need Attribution Before Solutions: T2I multi-object failures come from scene complexity, not class imbalance. Concept-level...
ViT Pre-Trains Like an LLM, Skips the CLIP Stage
May 4, 2026
GenLIP Pre-Trains ViT With an LM Objective Directly: dropping CLIP's contrastive stage and text decoder, 8B samples match larger-data baselines on multimodal...
FD as Loss: One-Step Generation Hits 0.72 FID
May 2, 2026
Heterogeneous scientific foundation model collaboration: Eywa pulls LLMs back from "general solver" to coordinator, handing protein structure and physics...
Cross-Architecture Distillation Shrinks dLLMs to 0.6B
May 1, 2026
Cross-Architecture Distillation Shrinks dLLMs From 8B to 0.6B. TIDE is the first dLLM distillation framework where teacher and student differ in...
Recursive MAS Cuts Tokens 35%, T2I Repaints Instead of Editing
April 30, 2026
Recursive Scaling Moves From Single Models to Multi-Agent Systems. RecursiveMAS casts the entire multi-agent setup as one latent-space recursive computation,...
RL Patches 3D Consistency Into Video Models Without Touching Architecture
April 29, 2026
Microsoft Patches 3D Consistency Into Video Models Through RL. World-R1 turns 3D constraints into a reward signal and pairs them with a text-only world...
Emotion Probes Crash From 82% to 5% Without Keywords
April 28, 2026
Silicon Panels Match the Mean and Distort the Variance. Stanford used 277 professional philosophers as ground truth; seven open and closed models all...
ProEval Cuts Benchmark Eval Samples 8-65x
April 28, 2026
Benchmark Eval Becomes a Probability Problem. Google's ProEval treats LLM benchmark scoring as Bayesian estimation with a pretrained Gaussian process...
Full Traces Lift Multi-Agent Attribution Accuracy 76%
April 27, 2026
Multi-Agent Debugging Moves From Vibes to Numbers. TraceElephant turns failure attribution into an explicit benchmark, with full execution traces lifting...
4B Agent on 10K Data, MoE Upcycling Saves 32% Compute
April 25, 2026
10K Open Trajectories Train a 4B Deep Research Agent. DR-Venus combines agentic SFT with turn-level RL to deliver an edge-deployable agent that beats sub-9B...
Recalibrating the Critic Lifts Reasoning Models 18 Points
April 25, 2026
Self-Trained Reasoning Models Stall Because the Critic Drifts. TEMPO recalibrates the critic against a small labeled set. OLMO3-7B jumps from 33% to 51% on...
Coding Agents Start Cheating by Round 4 Under Score Pressure
April 25, 2026
Pressuring Coding Agents on Public Scores Actively Induces Shortcuts. 403 of 1,326 trajectories showed public scores rising while hidden true scores stayed...
A 305M Retriever Gains 45% on Instruction Following
April 22, 2026
Retrievers Ignore Instructions Because of Data, Not Capacity: IF-IR synthesizes contrastive samples from complementary instruction pairs with label reversal....
Agents Ignore Answers Placed in Plain Sight
April 21, 2026
Cohere Puts the Solution Directly in the Agent's Reading Path and It Still Follows Its Own Reasoning Trace. Terminal-Bench runs encountered the shortcut in...
3B Matches R1 on Refusal; B Matrix Is LoRA's Bottleneck
April 21, 2026
Write Abstention Into the Reward. Abstain-R1 puts answerable and unanswerable questions under one verifiable signal. A 3B model matches DeepSeek-R1 on three...