AI Research Brief

March 19, 2026

Open-Source Search Agent Wins With 12K Samples, Agent Skills Mostly Fail

  • An open-source search agent trained on 12K synthetic samples beats closed-source competitors. OpenSeeker nearly doubles the second-best on BrowseComp with fully open data and weights. Deep Research is no longer a big-lab monopoly.
  • Cross-layer attention keeps deep signals from fading. MoDA lets each attention head attend to KV pairs from preceding layers, trading 3.7% extra FLOPs for +2.11% on downstream tasks. Open-sourced.
  • Agent skill injection sounds great; 39 of 49 skills produce zero improvement. SWE-Skills-Bench is the first rigorous evaluation of agent skills in real-world SWE. Average gain: +1.2%.
  • A mathematician formalized a plasma physics theorem in Lean 4 in 10 days without writing a single line of code by hand. The full AI-assisted workflow is publicly archived; total cost: $200.
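The MoDA mechanism above can be sketched compactly: at each layer, attention runs over the K/V pairs of the current layer *and* all preceding layers, so early-layer signals stay directly reachable. This is a minimal single-head NumPy sketch under stated assumptions; the function names, the lack of masking/multi-head structure, and the random projections are illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_layer_attention(q, kv_cache, d_k):
    """Attend over K/V pairs pooled from the current and all preceding layers."""
    K = np.concatenate([k for k, _ in kv_cache], axis=0)  # (num_layers*T, d_k)
    V = np.concatenate([v for _, v in kv_cache], axis=0)  # (num_layers*T, d_v)
    scores = q @ K.T / np.sqrt(d_k)                       # (T, num_layers*T)
    return softmax(scores) @ V                            # (T, d_v)

# Toy forward pass: 3 layers, 4 tokens, width 8.
rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.normal(size=(T, d))
kv_cache = []
for layer in range(3):
    k = x @ rng.normal(size=(d, d))
    v = x @ rng.normal(size=(d, d))
    kv_cache.append((k, v))  # this layer's K/V stays visible to later layers
    x = x + cross_layer_attention(x @ rng.normal(size=(d, d)), kv_cache, d)
```

The FLOP overhead the paper reports would come from the growing `K`/`V` concatenation; a real implementation would presumably cap or gate which earlier layers each head can see.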

Also Notable

  • Human-Scene Interaction Reconstruction Deploys Directly to Humanoid Robots. HSImul3R uses a physics simulator as a bidirectional optimization supervisor, bridging the gap between visual reconstruction and physics engines (141 HF upvotes). Source
  • Video DiT Editing Trained on 2D Images Only. ViFeEdit enforces spatial independence through architectural reparameterization, requiring zero video training data. Source
  • City-Scale World Model Grounded in Real Seoul Streets. SWM anchors video generation to retrieved street views, maintaining spatial consistency across hundreds of meters (121 HF upvotes). Source
  • Code LLM and Test LLM Co-Evolve Through Adversarial Training. Code-A1's architectural separation eliminates self-collusion risk, making white-box test generation safe. Source
  • "Wait" Tokens Aren't the Key to Reasoning; Uncertainty Externalization Is. Information-theoretic framework unifies explanations of LLM "Aha moments." Purely procedural reasoning stagnates informationally. Source
  • 464-Person Red Team Competition: Every Frontier Model Falls to Indirect Prompt Injection. Claude Opus 4.5 is most resistant (0.5% success rate), Gemini 2.5 Pro most vulnerable (8.5%). Capability and robustness show weak correlation. Source
  • Hallucination Detection Recast as Geometric Anomaly in Cognitive Trajectories. Information-theoretic probes map VLM generation to a low-dimensional cognitive state space, reaching SOTA under weak supervision. Source
  • Aleph Alpha Releases 70B Tokenizer-Free Model. HAT architecture operates at byte level, reuses Llama 3.1 backbone, outperforms original Llama in both German and English. Source
  • Unified Multimodal Model Inference Accelerated 1.78-2.01x, Training-Free. FlashU tailors optimization strategies separately for generation and understanding tasks (CVPR 2026). Source
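"Tokenizer-free" in the Aleph Alpha bullet means the model consumes raw UTF-8 bytes rather than learned subword IDs, so the vocabulary is fixed at 256 symbols and no tokenizer training is needed. A minimal sketch of byte-level encoding (the HAT hierarchy that pools bytes into higher-level units is not shown; `byte_encode` is an illustrative name, not the model's API):

```python
def byte_encode(text: str) -> list[int]:
    """Map text to a byte-level token sequence: one ID per UTF-8 byte (0-255)."""
    return list(text.encode("utf-8"))

# German text: non-ASCII characters span multiple bytes, which is why
# byte-level models sidestep subword-vocabulary bias across languages.
ids = byte_encode("Grüße")  # 'ü' and 'ß' each become two byte IDs
```

This also illustrates why German benchmarks are a natural stress test: a subword tokenizer trained mostly on English fragments German words heavily, while byte-level input treats every language uniformly.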

