Compile the Corpus Into a Skill Tree, Train Surrogates on Logs

        April 18, 2026

RAG shifts from "retrieve-consume" to "walk-and-drill." Corpus2Skill compiles the entire corpus offline into a hierarchical skill tree; the agent drills down along summaries rather than passively receiving results, and beats dense retrieval, RAPTOR, and agentic RAG on WixQA.
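A minimal sketch of the "walk-and-drill" idea, not Corpus2Skill's actual implementation: the `SkillNode` structure, the word-overlap scorer, and the sample tree are all hypothetical stand-ins. In a real system an agent would read the child summaries and choose where to descend; here a toy scorer makes that choice so the control flow is visible.

```python
from dataclasses import dataclass, field


@dataclass
class SkillNode:
    """One node of a hypothetical skill tree: a summary plus children or leaf documents."""
    summary: str
    children: list["SkillNode"] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)


def overlap(query: str, text: str) -> int:
    """Toy relevance score: count of shared lowercase words (stand-in for the agent's choice)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def drill_down(root: SkillNode, query: str) -> tuple[list[str], list[str]]:
    """Walk from the root to a leaf, following the best-matching child summary at each level."""
    path = [root.summary]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: overlap(query, child.summary))
        path.append(node.summary)
    return path, node.documents


# Hypothetical tree, built offline in the real setting; hand-written here.
tree = SkillNode(
    "site builder help",
    children=[
        SkillNode(
            "billing and plans",
            children=[
                SkillNode("refund a plan", documents=["How to request a refund"]),
                SkillNode("upgrade a plan", documents=["How to upgrade your plan"]),
            ],
        ),
        SkillNode(
            "domain and email",
            children=[
                SkillNode("connect a domain", documents=["Connecting a custom domain"]),
            ],
        ),
    ],
)

path, docs = drill_down(tree, "how do I connect my domain")
print(path)  # summaries visited from root to leaf
print(docs)  # documents stored at the chosen leaf
```

The point of the design is that the query never touches the full corpus: each step only compares against a handful of summaries, and the path itself records why a document was reached.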

Production logs are free distillation data. TRACER trains a lightweight surrogate on them and uses a parity gate to route 83-100% of traffic across 77 intent classes to it. On NLI the gate refuses deployment outright: "knowing it can't do the job" is the most valuable capability in the system.
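One way to picture a parity gate is as a check on held-out logs before any traffic is handed over. The sketch below is an assumption-laden illustration, not TRACER's method: the function names, the 0.95 threshold, the keyword surrogate, and the sample logs are all made up for the example.

```python
from collections.abc import Callable


def agreement_rate(
    surrogate: Callable[[str], str],
    logged: list[tuple[str, str]],
) -> float:
    """Fraction of logged (input, teacher_label) pairs where the surrogate matches the teacher."""
    if not logged:
        return 0.0
    hits = sum(1 for text, teacher_label in logged if surrogate(text) == teacher_label)
    return hits / len(logged)


def parity_gate(
    surrogate: Callable[[str], str],
    held_out: list[tuple[str, str]],
    threshold: float = 0.95,  # hypothetical parity threshold
) -> bool:
    """Allow deployment only if the surrogate agrees with the teacher often enough."""
    return agreement_rate(surrogate, held_out) >= threshold


def keyword_surrogate(text: str) -> str:
    """Toy surrogate: a keyword rule standing in for a small trained classifier."""
    return "refund" if "refund" in text.lower() else "other"


# Hypothetical held-out production logs: (user input, label the large model produced).
intent_logs = [
    ("I want a refund", "refund"),
    ("Refund my order please", "refund"),
    ("Where is my package", "other"),
    ("Cancel and refund", "refund"),
]
nli_logs = [
    ("A man sleeps. A man is awake.", "contradiction"),
    ("A dog runs. An animal moves.", "entailment"),
]

print(parity_gate(keyword_surrogate, intent_logs))  # gate decision on the intent task
print(parity_gate(keyword_surrogate, nli_logs))     # gate decision on the NLI task
```

On the toy intent logs the surrogate matches every teacher label and the gate opens; on the toy NLI logs it matches none and the gate stays closed, which mirrors the refusal behavior described above.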

Visual RAG's four-stage pipeline collapses into one joint policy. UniDoc-RL trains hierarchical actions with dense rewards end-to-end, folds "actively crop a region" into the action space, and gains up to 17.7% across three benchmarks.

Flow matching post-training finally reaches the early generation steps. LeapAlign compresses long trajectories into two randomly anchored leaps, sidestepping the choice between full-trajectory backpropagation, which runs out of memory, and direct-gradient methods that can't reach early steps. Accepted to CVPR.

Imitation plus rule-based correction gives way to an adversarial closed loop. RAD-2 has diffusion generate candidates while an RL discriminator reranks them, paired with BEV feature-space closed-loop simulation; collision rate drops 56% vs a strong diffusion baseline.
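The generate-then-rerank split can be sketched in a few lines. Everything here is a stand-in: the random generator replaces a diffusion planner, `toy_score` replaces a learned RL discriminator, and the obstacle position is invented, so this shows only the control flow and none of RAD-2's models or simulation.

```python
import random
from collections.abc import Callable

Trajectory = list[float]  # lateral offset at each future step (toy representation)


def generate_candidates(rng: random.Random, n: int, horizon: int) -> list[Trajectory]:
    """Stand-in generator: random offsets per step (a real system would sample a diffusion model)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(horizon)] for _ in range(n)]


def toy_score(trajectory: Trajectory, obstacle_offset: float = 0.8) -> float:
    """Stand-in discriminator: closest approach to a hypothetical obstacle; larger is safer."""
    return min(abs(x - obstacle_offset) for x in trajectory)


def rerank(
    candidates: list[Trajectory],
    score: Callable[[Trajectory], float],
) -> list[Trajectory]:
    """Order candidates from best to worst according to the scoring function."""
    return sorted(candidates, key=score, reverse=True)


rng = random.Random(0)  # fixed seed so runs are repeatable
candidates = generate_candidates(rng, n=8, horizon=5)
ranked = rerank(candidates, toy_score)
best = ranked[0]
print(len(ranked), round(toy_score(best), 3))
```

Keeping generation and scoring as separate functions is what lets the scorer be trained in closed loop while the generator stays a pure sampler.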

Also Notable

Deep Research Agents Get a More Realistic Benchmark — Real user materials with per-task rubrics, covering multi-modal multi-file report generation. DR³-Eval
VLM Distillation Can't Share One Signal Across Both Modalities — Switch-KD admits the vision branch and language branch need different supervision. Switch-KD
Bolting AIGC Tools Together Produces Style-Clashing Pages — MM-WebAgent preserves global consistency with a hierarchical framework. MM-WebAgent
3DGS Primitive Allocation Moves From Local Heuristics to Global Scene Tokens — GlobalSplat gives feed-forward 3DGS a scene-level viewpoint. GlobalSplat
Long-Context RL Uses the Model's Own High-Magnitude Activations as Signal — LongAct sidesteps the usual reward engineering and data curation paths. LongAct
Multi-Agent Search Breaks the Diversity Ceiling of Single-Agent Tree Search — MARS² hands trajectory diversity to a parallel agent swarm. MARS²
LLM Reasoning Failures Concentrate on a Few "Pivot" Tokens — Not uniform noise but locatable points of divergence. Dissecting Failure Dynamics
VLM Visual Token Pruning Becomes a Pareto-Frontier Learning Problem — Huawei's VisPCO picks pruning configs adaptively to the compute budget. VisPCO
Can a Single Mamba Layer Carry Time-Series Classification on Its Own? — ICLR's MambaSL runs an isolated-capacity probe. MambaSL
