Agents Ignore Answers Placed in Plain Sight
- Cohere Puts the Solution Directly in the Agent's Reading Path and It Still Follows Its Own Reasoning Trace. On Terminal-Bench, agents encountered the shortcut in 79-81% of runs but acted on it only 37-50% of the time; on AppWorld, fewer than 7% of agents that read the hint actually called it.
- SkillFlow Shifts Agent Evaluation From "Can You Use Tools" to "Can You Build Skills Over a Lifetime." 166 tasks across 20 families expose lifetime-level failures; Kimi K2.5 hit 66.87% skill usage for a +0.60 point gain.
- JuRe Takes Second on TSB-AD With a 128-Dim Depthwise-Separable Conv Block. No attention, no latent variables, no adversarial components; ablations show training perturbations drive the gap, not network capacity.
- MedFocusLeak Injects Invisible Perturbations Into Non-Diagnostic Regions of Medical Scans. SOTA attack success across six imaging modalities, with black-box transfer between medical VLMs.
Also Notable
- Position Paper Calls Flat-Fact Memory APIs AI's Most Critical Architecture Flaw — proposes an independent continuity layer that carries what the model already understood.
- Tsinghua AnchorMem Splits Memory Into Anchored Facts and Associative Contexts — avoids the frequent-rewrite path used by A-Mem and Mem0.
- HSG Moves Scene Graphs From Euclidean to Hyperbolic Space — explicitly represents place-object hierarchy for multi-view and 3D scene reasoning.
- Dynamic Compute Depth Per Position for Visual Autoregressive Models — CVPR paper presented as an alternative to hard pruning.
- Systematic Survey of LLM Reinforcement Learning Under Data Scarcity — ACL paper focused on the cost of obtaining external supervision signals.
- LLM Calibration on Medical QA Skews Across Sexual Orientation and Religion Markers — ACL paper; the bias shows up in confidence, not accuracy.
- ThreadSumm Frames Nested Discussion Summarization as Hierarchical Reasoning — ACL paper using tree of thoughts for interleaved replies and overlapping topics.
- LookasideVLN Adds Orientation Awareness to Drone VLN — CVPR paper improving natural-language navigation in urban environments.
- Adaptive Masking Locates Sentiment and Rhetoric Neurons in LLMs — ACL paper offering controllable steering of generation direction.
- PBSBench Targets Single-Cell Morphology in Blood Smears Rather Than Tissue Structure — CVPR paper providing a multi-level VLM framework and benchmark for whole-slide images.