Agents Ignore Answers Placed in Plain Sight

April 21, 2026

Cohere Puts the Solution Directly in the Agent's Reading Path, and the Agent Still Follows Its Own Reasoning Trace. On Terminal-Bench, agents encountered the shortcut in 79-81% of runs but acted on it only 37-50% of the time; on AppWorld, fewer than 7% of agents that read the hint actually called it.

SkillFlow Shifts Agent Evaluation From "Can You Use Tools" to "Can You Build Skills Over a Lifetime." Its 166 tasks across 20 task families expose lifetime-level failures; Kimi K2.5 reached 66.87% skill usage for a gain of just +0.60 points.

JuRe Takes Second on TSB-AD With a 128-Dim Depthwise-Separable Conv Block. No attention, no latent variables, no adversarial components; ablations show that training-time perturbations, not network capacity, drive the gap.
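
The summary gives only the block's width, so what follows is a generic sketch of a depthwise-separable 1D convolution, not JuRe's actual block. It assumes the convolution runs along the time axis of a 128-channel series; the kernel size of 7, the sequence length, and all function names are hypothetical. It illustrates why such a block stays small: a full 128-channel convolution with kernel 7 needs 128 × 128 × 7 = 114,688 weights, while splitting it into a per-channel (depthwise) filter plus a 1x1 (pointwise) channel mix needs 128 × 7 + 128 × 128 = 17,280 (biases ignored).

```python
import random


def standard_conv_params(channels: int, kernel: int) -> int:
    # A full 1D conv mixes every input channel into every output channel at every tap.
    return channels * channels * kernel


def depthwise_separable_params(channels: int, kernel: int) -> int:
    # Depthwise: one length-`kernel` filter per channel.
    # Pointwise: a channels x channels 1x1 mix.
    return channels * kernel + channels * channels


def depthwise_separable_conv1d(x, w_dw, w_pw):
    """Hypothetical helper. x: [C][T] input, w_dw: [C][K] depthwise filters,
    w_pw: [C_out][C] pointwise weights. Uses 'valid' padding, so the output
    length is T - K + 1. Biases and nonlinearities are omitted."""
    channels, length = len(x), len(x[0])
    kernel = len(w_dw[0])
    out_len = length - kernel + 1
    # Depthwise step: each channel is filtered independently (no cross-channel mixing).
    dw = [
        [sum(w_dw[c][k] * x[c][t + k] for k in range(kernel)) for t in range(out_len)]
        for c in range(channels)
    ]
    # Pointwise step: a 1x1 conv mixes channels at each time step.
    return [
        [sum(w_pw[o][c] * dw[c][t] for c in range(channels)) for t in range(out_len)]
        for o in range(len(w_pw))
    ]


if __name__ == "__main__":
    # 128 channels as reported; kernel size 7 and length 32 are assumptions.
    C, K, T = 128, 7, 32
    rng = random.Random(0)
    x = [[rng.gauss(0, 1) for _ in range(T)] for _ in range(C)]
    w_dw = [[rng.gauss(0, 0.1) for _ in range(K)] for _ in range(C)]
    w_pw = [[rng.gauss(0, 0.1) for _ in range(C)] for _ in range(C)]
    y = depthwise_separable_conv1d(x, w_dw, w_pw)
    print(len(y), len(y[0]))  # 128 26
    print(standard_conv_params(C, K), depthwise_separable_params(C, K))  # 114688 17280
```

The pointwise term dominates the count, which is one reason a narrow 128-dim block can stay cheap even with a wide temporal kernel.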

MedFocusLeak Injects Invisible Perturbations Into Non-Diagnostic Regions of Medical Scans. SOTA attack success across six imaging modalities, with black-box transfer between medical VLMs.
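
As a rough illustration of the region constraint only, the sketch below adds bounded noise to pixels flagged by a mask and leaves every other pixel untouched. The mask layout, the L-infinity budget epsilon, and the function name are assumptions; the random noise stands in for whatever optimized perturbation MedFocusLeak actually computes, which the summary does not describe.

```python
import random


def masked_perturbation(image, mask, epsilon, seed=0):
    """Hypothetical helper: perturb only pixels where mask == 1, keeping each
    change within +/- epsilon (an L-infinity budget).

    image: H x W list of pixel values in [0, 1].
    mask:  H x W list of 0/1 flags; 1 marks an assumed non-diagnostic pixel.
    Random uniform noise stands in for an optimized adversarial perturbation.
    """
    rng = random.Random(seed)
    perturbed = []
    for img_row, mask_row in zip(image, mask):
        new_row = []
        for pixel, flag in zip(img_row, mask_row):
            delta = rng.uniform(-epsilon, epsilon) * flag  # zero wherever flag == 0
            new_row.append(min(1.0, max(0.0, pixel + delta)))  # stay in the valid pixel range
        perturbed.append(new_row)
    return perturbed


if __name__ == "__main__":
    # 6x6 scan with a hypothetical 2x2 diagnostic region in the center (mask == 0 there).
    image = [[0.5] * 6 for _ in range(6)]
    mask = [[0 if 2 <= r <= 3 and 2 <= c <= 3 else 1 for c in range(6)] for r in range(6)]
    adv = masked_perturbation(image, mask, epsilon=4 / 255)
    max_change = max(abs(a - b) for ra, rb in zip(adv, image) for a, b in zip(ra, rb))
    print(adv[2][2] == image[2][2], max_change <= 4 / 255 + 1e-12)  # True True
```

Clipping to [0, 1] can only shrink a change, so the per-pixel budget still holds after it; the diagnostic region is left bit-for-bit identical.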

Also Notable

Position Paper Calls Flat-Fact Memory APIs AI's Most Critical Architecture Flaw — proposes an independent continuity layer that carries what the model already understood.
Tsinghua AnchorMem Splits Memory Into Anchored Facts and Associative Contexts — avoids the frequent-rewrite path used by A-Mem and Mem0.
HSG Moves Scene Graphs From Euclidean to Hyperbolic Space — explicitly represents place-object hierarchy for multi-view and 3D scene reasoning.
Dynamic Compute Depth Per Position for Visual Autoregressive Models — CVPR paper presented as an alternative to hard pruning.
Systematic Survey of LLM Reinforcement Learning Under Data Scarcity — ACL paper focused on the cost of obtaining external supervision signals.
LLM Calibration on Medical QA Skews Across Sexual Orientation and Religion Markers — ACL paper; the bias shows up in confidence, not accuracy.
ThreadSumm Frames Nested Discussion Summarization as Hierarchical Reasoning — ACL paper using tree of thoughts for interleaved replies and overlapping topics.
LookasideVLN Adds Orientation Awareness to Drone VLN — CVPR paper improving natural-language navigation in urban environments.
Adaptive Masking Locates Sentiment and Rhetoric Neurons in LLMs — ACL paper offering controllable steering of generation direction.
PBSBench Targets Single-Cell Morphology in Blood Smears Rather Than Tissue Structure — CVPR paper providing a multi-level VLM framework and benchmark for whole-slide images.

Don't miss what's next. Subscribe to AI Research Brief.