Reasoning Search Adds a Second Direction, World Models Add Agents

May 31, 2026

BES Moves Inference-Time Search Beyond One-Way Expansion. Forward search gains evolutionary operators to escape the model prior's "entropy shell," while backward search recursively decomposes the goal and feeds dense intermediate signals to the forward search, which in theory cuts the number of samples needed exponentially.
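
A rough way to see why a second search direction can yield an exponential saving is the classic bidirectional-search count. This is an analogy, not BES itself: the sketch assumes a uniform branching factor in both directions and that the two searches meet near the middle, and the function names are hypothetical.

```python
def one_way_cost(branching: int, depth: int) -> int:
    """Frontier size when expanding from the start only, down to `depth`."""
    return branching ** depth


def two_way_cost(branching: int, depth: int) -> int:
    """Combined frontier size when two searches each cover about half the depth.

    Assumption: the backward search branches as widely as the forward one.
    """
    forward_depth = depth // 2
    backward_depth = depth - forward_depth
    return branching ** forward_depth + branching ** backward_depth


if __name__ == "__main__":
    for depth in (4, 8, 12):
        print(depth, one_way_cost(3, depth), two_way_cost(3, depth))
```

Under these assumptions the one-way count grows as b^d while the two-way count grows as roughly 2·b^(d/2), so the gap widens exponentially with depth.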

DenoiseRL Trains on the Model's Own Failed Prefixes. The goal is not to avoid mistakes but to teach the model to recover after going off-track, a skill that matters more than mistake avoidance in long-horizon agent settings.

MemTrace Turns Memory Pipelines Into Executable Evolution Graphs. Failure attribution shifts from experience-based guesswork to subgraph analysis, with contamination clustering around "information loss" and "retrieval mismatch."

GEM Forces VLMs to Generate Depth Maps During Pretraining. The depth output isn't the point: generating it pushes the model to encode spatial structure in its representations, making up the "low-level physics" course VLMs never took.

Gamma-World Scales Interactive Video World Models to Multi-Agent Shared Spaces. Simplex Rotary encodes agent identity into RoPE geometry without learning, Sparse Hub Attention cuts cross-agent attention cost from quadratic to linear, and a model trained on 2 agents generalizes to 4 without retraining.
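
To make the quadratic-to-linear claim concrete, here is a toy count of attention pairs. The summary does not say how Sparse Hub Attention routes tokens, so this assumes a hub-and-spoke pattern in which each agent attends to its own tokens plus a small shared set of hub tokens. That pattern, the function names, and the token counts are all illustrative assumptions, not the paper's design.

```python
def full_attention_pairs(num_agents: int, tokens_per_agent: int) -> int:
    """Query-key pairs when every token attends to every agent's tokens."""
    total_tokens = num_agents * tokens_per_agent
    return total_tokens * total_tokens


def hub_attention_pairs(num_agents: int, tokens_per_agent: int, hub_tokens: int) -> int:
    """Query-key pairs under an assumed hub-and-spoke pattern.

    Each agent attends within its own tokens, and exchanges information
    with other agents only through `hub_tokens` shared tokens (hypothetical).
    """
    within_agent = num_agents * tokens_per_agent ** 2
    agent_to_hub = num_agents * tokens_per_agent * hub_tokens
    hub_to_agent = hub_tokens * num_agents * tokens_per_agent
    return within_agent + agent_to_hub + hub_to_agent


if __name__ == "__main__":
    for agents in (2, 4, 8):
        print(agents, full_attention_pairs(agents, 10), hub_attention_pairs(agents, 10, 4))
```

In this toy setup, doubling the number of agents quadruples the full-attention count but only doubles the hub count.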

Also Notable

Active Recommendation RL Has Two Policy Gradient Estimator Biases at the Path Level: the paper provides a correction method, useful for teams working on long-horizon RL.
For Small Computer-Use Agent Specialization, Picking Traces From Weak Points Beats Brute-Force Synthesis: this gives small and mid-sized teams a more economical training-data path for specialized agents.
Sparse Attention Plus HiF8 Quantization Plus RL Combined Into a Video Generation Pipeline: none of the individual techniques is new, but the engineering parameters for combining them are a useful reference.
Agent Skill Updates Framed as Gradient-Descent-Like Optimization: worth reading alongside the recent MUSE-Autoskill work for a different angle on skill management.
First Unified Evaluation of Thinking-Mode Switching in Hybrid-Reasoning LLMs: cross-model results are finally comparable, useful for picking baselines on hybrid reasoning routing.
Async Function Calling Benchmark Adds Tool Response Latency as a Dimension: a long-dodged factor is finally measured, worth watching for tool-calling framework teams.
Code Written by Models Rewrites the Runtime Itself: a conceptual design exploration, relevant for anyone tracking how agent architectures develop.
