Budget-Aware Agents Beat 4x Brute-Force Sampling
- SWE agent training is bottlenecked by executable environments, not algorithms. OpenSWE open-sources 45,320 Dockerized training environments across 12,800+ repos. The $1.47M build cost shows why academic labs can't fill this infrastructure gap alone.
- A budget-aware tree search beats 4x brute-force sampling at a quarter of the cost. It replaces LLM self-evaluation with relative progress scoring and drops into existing agent systems with no extra training (see the scoring sketch after this list).
- Traditional embedding benchmark scores don't predict memory retrieval performance. LMEB covers 193 tasks across four memory types and finds results orthogonal to MTEB rankings. Model scale isn't the deciding factor either.
- Enzyme catalysis is modeled as explicit "recognition, then adaptation" stages. An MoE architecture routes by active-site type, yielding better out-of-distribution generalization to novel enzyme-substrate pairs in drug discovery (a toy routing block follows this list).
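On the tree-search item: a minimal sketch of what relative progress scoring could look like, assuming a measurable progress signal such as the fraction of passing tests. All names here (`Node`, `budget_aware_search`, `relative_progress`) are hypothetical, not the paper's API; the point is that candidates are ranked against their siblings rather than self-graded absolutely by an LLM, and expansion stops once a fixed budget is spent.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Node:
    score: float                      # heap key: -relative_progress, so best pops first
    state: object = field(compare=False)
    depth: int = field(compare=False, default=0)

def relative_progress(metrics):
    """Rank candidates against their siblings (min-max normalized around the
    group mean) instead of asking an LLM to self-grade each state absolutely."""
    mean = sum(metrics) / len(metrics)
    spread = (max(metrics) - min(metrics)) or 1.0
    return [(m - mean) / spread for m in metrics]

def budget_aware_search(root, expand, progress, budget):
    """expand(state) -> child states; progress(state) -> float in [0, 1]
    (e.g., fraction of unit tests passing). Stops when the budget is spent."""
    frontier = [Node(0.0, root)]
    best, spent = root, 0
    while frontier and spent < budget:
        node = heapq.heappop(frontier)
        children = expand(node.state)
        spent += len(children)        # every expansion charges the shared budget
        if not children:
            continue
        scores = relative_progress([progress(c) for c in children])
        for child, rel in zip(children, scores):
            heapq.heappush(frontier, Node(-rel, child, node.depth + 1))
        best = max([best] + children, key=progress)
    return best
```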
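On the enzyme-catalysis item: a toy PyTorch block illustrating the recognition-then-adaptation split, assuming a fused enzyme-substrate embedding as input. `ActiveSiteMoE` and its shapes are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ActiveSiteMoE(nn.Module):
    """Toy recognition-then-adaptation block: a gate first recognizes the
    active-site type, then per-type experts adapt the pair representation."""
    def __init__(self, dim=256, n_site_types=8):
        super().__init__()
        self.gate = nn.Linear(dim, n_site_types)       # recognition stage
        self.experts = nn.ModuleList(                  # adaptation stage
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(n_site_types)
        )

    def forward(self, pair_emb):                       # pair_emb: (B, dim)
        weights = torch.softmax(self.gate(pair_emb), dim=-1)            # (B, K)
        outs = torch.stack([e(pair_emb) for e in self.experts], dim=1)  # (B, K, dim)
        return (weights.unsqueeze(-1) * outs).sum(dim=1)                # soft routing

emb = torch.randn(4, 256)        # hypothetical fused enzyme-substrate embedding
out = ActiveSiteMoE()(emb)       # -> (4, 256), feeds a downstream activity head
```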
Also Notable
- Multi-Agent Dialogue Framework for Complex Scene Composition — Four specialized agents divide attribute binding and spatial layout tasks, reducing errors from single-model one-shot generation.
- GRPO Under-Explores in T2I Flow Models; Augmenting Condition Space Fixes It — Turns sparse prompt sampling into dense coverage so that reward signals can more reliably guide generation quality (minimal sketch after this list).
- Understanding User Pointing in First-Person View — Current MLLMs perform poorly on egocentric pointing comprehension, a key capability gap for next-gen AR assistants.
- HIFICL: High-Fidelity Distillation for In-Context Learning — A mathematical framework analyzes the factors that drive ICL, then distills the gains of multi-example ICL into zero-shot inference.
- GNSS-Free Global Localization via Ground-Satellite Cross-View Matching — Dual-axis transforms learn view-invariant representations, solving pose estimation under occlusion and multipath effects.
- Machine Unlearning Must Preserve Knowledge Structure — Semantic relationships among retained knowledge should be maintained after the target data is removed, or overall model capability degrades (a candidate regularizer is sketched after this list).
- xAI Team: Aggregating Local Explanations into Global Decision Patterns — For time-series classification, extracts class-level discriminative patterns from per-sample explanations while respecting temporal dependencies.
- Offline Teacher Distillation + Prompt Tuning for Remote Sensing VLMs — A lightweight approach to transfer general vision-language models to remote sensing without large-scale labeled data.
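On the GRPO condition-augmentation item: a sketch of the densification idea in plain Python. `augment_conditions` is a stand-in (a real system would more likely use an LLM paraphraser or learned attribute perturbations); the advantage normalization is standard GRPO, computed per group of rollouts.

```python
import random
import statistics

def augment_conditions(prompt, k=4):
    """Stand-in for condition-space augmentation: densify the neighborhood
    around a sparse prompt so rollouts cover more of the condition space."""
    styles = ["photorealistic", "watercolor", "isometric", "soft studio lighting"]
    return [prompt] + [f"{prompt}, {s}" for s in random.sample(styles, k - 1)]

def grpo_advantages(rewards):
    """Standard GRPO normalization: each rollout's advantage is its reward
    standardized within the group (zero mean, unit std)."""
    mu = statistics.mean(rewards)
    sd = statistics.pstdev(rewards) or 1.0
    return [(r - mu) / sd for r in rewards]

for cond in augment_conditions("a red cube on a blue sphere"):
    rewards = [random.random() for _ in range(8)]  # stand-in for reward-model scores
    advantages = grpo_advantages(rewards)          # weights the policy-gradient update
```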
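On the unlearning item: one plausible instantiation of "preserve knowledge structure" is a regularizer that keeps the pairwise similarity graph of retained examples stable while the target data is forgotten. This is an assumption-laden sketch, not the paper's method.

```python
import torch
import torch.nn.functional as F

def structure_preservation_loss(z_unlearned, z_original):
    """Penalize drift between the two models' cosine-similarity matrices
    over a batch of retained examples, keeping relational structure intact."""
    def sim(z):
        z = F.normalize(z, dim=-1)
        return z @ z.T
    return F.mse_loss(sim(z_unlearned), sim(z_original))

# Hypothetical total objective during unlearning:
#   loss = forget_loss + lam * structure_preservation_loss(z_retain_new, z_retain_old)
```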