Big Models Resist Rumors but Fall for Noise
- Agent failures split into two measurable error modes: locking onto one path (over-exploit) and wandering without direction (over-explore). The two can be separated by black-box metrics, with no access to model internals required, and frontier models differ clearly in their failure profiles.
- Scaling splits "reading context" into two subskills moving in opposite directions. Google gives the first scaling law for contextual entrainment across two model families: the largest model resists counterfactual rumors 4x better than the smallest, but gets derailed by irrelevant tokens 2x more often.
- Pruning for a single objective leaves better solutions on the table: Google's MOONSHOT reframes post-training one-shot pruning as multi-objective optimization and wraps existing pruners, cutting C4 perplexity by up to 32.6% on Llama-3.2 at 2:4 sparsity.
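The over-exploit/over-explore split above can be measured from a trace of agent actions alone. The paper's exact metrics aren't reproduced here; this is a minimal sketch with two hypothetical black-box scores, repetition rate for over-exploitation and normalized action entropy for over-exploration:

```python
import math
from collections import Counter

def failure_profile(actions):
    """Hypothetical black-box metrics over an agent's action trace.

    over_exploit: fraction of steps repeating the previous action
                  (locking onto one path).
    over_explore: normalized entropy of the action distribution
                  (wandering without direction).
    """
    repeats = sum(a == b for a, b in zip(actions, actions[1:]))
    over_exploit = repeats / max(len(actions) - 1, 1)
    counts = Counter(actions)
    n = len(actions)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    max_entropy = math.log2(len(counts)) if len(counts) > 1 else 1.0
    over_explore = entropy / max_entropy
    return over_exploit, over_explore

# A stuck agent repeats one action; a wandering agent never settles.
stuck = ["retry"] * 8            # -> over_exploit 1.0, over_explore 0.0
wander = list("abcdefgh")        # -> over_exploit 0.0, over_explore 1.0
```

Both scores are observable from outputs only, which is the point: failure mode can be profiled without model internals.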
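For context on the MOONSHOT bullet, 2:4 structured sparsity means that in every group of four consecutive weights, exactly two are zeroed. A minimal sketch of baseline one-shot magnitude pruning to this pattern (magnitude is only the standard single criterion; MOONSHOT's multi-objective scoring is not shown):

```python
import numpy as np

def prune_2_4(w):
    """One-shot magnitude pruning to 2:4 structured sparsity:
    in each group of 4 consecutive weights, keep the 2 largest
    magnitudes and zero the rest. Assumes w.size % 4 == 0."""
    w = np.asarray(w, dtype=float)
    groups = w.reshape(-1, 4)
    # indices of the two smallest-magnitude entries per group
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (groups * mask).reshape(w.shape)

w = np.array([0.1, -2.0, 0.3, 1.5, -0.2, 0.05, 4.0, -3.0])
# keeps -2.0 and 1.5 in the first group, 4.0 and -3.0 in the second
```

The 2:4 pattern matters because it maps onto hardware sparse-tensor support; a multi-objective pruner chooses which two weights to drop per group under more than one score.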
Also Notable
- Microsoft Adds Video Grounding to Web Agent Skills — Text-only workflow descriptions leave too much execution ambiguity; visual demos anchor skills to UI elements. WebXSkill
- HETA Uses Hessian Second-Order Info to Fix Token Attribution — Existing methods mostly rely on linear approximations, missing the causal chain in autoregressive LLMs. Accepted to ICLR. HETA
- Procedurally Generated Open-Ended Science Reasoning Problems — Current benchmarks inherit biases from known paper conclusions; InfiniteScienceGym sidesteps publication bias and annotation noise. InfiniteScienceGym
- LLMs Fill Text Attributes for Medical Knowledge Graphs — Medical concept representations have long been limited by code noise and sparse samples; downstream clinical prediction quality improves substantially.
- MIT Builds a Mathematical Framework for t-SNE's Information Loss — Which structures are necessarily lost and which can be preserved, with a theoretical yardstick for the first time. Some Theoretical Limitations of t-SNE
- SSD-GS Fills In Material-Light Interaction for 3DGS Relighting — Physically accurate relighting requires modeling scattering and shadows separately. SSD-GS