VLMs Break When You Change the Rules
- VLMs Read the Board but Can't Follow Alternative Rules. Fourteen models evaluated on identical endgame images score consistently higher under standard rules than under inverted ones. Researchers call this "semantic fixation": a warning for any application that requires models to follow custom rules (evaluation sketch after this list).
- English Safety Alignment Collapses in Low-Resource Languages. LASA anchors alignment at the model's semantic bottleneck layer, cutting LLaMA-3.1's average attack success rate from 24.7% to 2.8% (layer-anchoring sketch after this list).
- Naive Sparse Attention Breaks on Diffusion Language Models. The root cause is KV inflation from masked tokens. LoSA exploits local invariance in token states during denoising, reaching a 4.14x speedup in practice (mask-reuse sketch after this list).
- In Large Tool Libraries, Plan-Level Search Beats Picking the Right Tool at Each Step. Amazon uses prediction entropy to allocate search budget: explore more where uncertainty is high, push forward where it's low (entropy sketch after this list).
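A minimal sketch of the rule-inversion evaluation from the first item, assuming a generic VLM API: score the same images under both rule prompts and compare. `query_vlm`, the prompt wording, and the exact-match scoring are illustrative stand-ins, not the paper's harness.

```python
# Hypothetical rule prompts; the paper's actual wording is not shown here.
STANDARD_RULES = "Standard rules apply: the normally winning side wins."
INVERTED_RULES = "Inverted rules apply: the normally losing side wins."

def query_vlm(image_path: str, rules: str, question: str) -> str:
    """Placeholder for a real VLM call (any vision-language endpoint)."""
    raise NotImplementedError

def accuracy(images_with_labels, rules):
    """Exact-match accuracy of the model's answers under a given rule set."""
    correct = 0
    for image_path, label_under_rules in images_with_labels:
        answer = query_vlm(image_path, rules, "Which side wins this endgame?")
        correct += int(answer.strip().lower() == label_under_rules)
    return correct / len(images_with_labels)

# Semantic fixation shows up as a gap between these two numbers on the
# *same* images: the model keeps applying the standard rules regardless.
# std_acc = accuracy(dataset_with_standard_labels, STANDARD_RULES)
# inv_acc = accuracy(dataset_with_inverted_labels, INVERTED_RULES)
```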
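For the LASA item, a guess at what layer-anchored alignment could look like: pull a low-resource-language prompt's hidden state at an assumed bottleneck layer toward its English translation's state, so safety behavior learned in English transfers. The layer index, mean pooling, and cosine loss are all assumptions, not the published method.

```python
# Hedged sketch: anchor cross-lingual alignment at one middle layer.
import torch
import torch.nn.functional as F

BOTTLENECK_LAYER = 16  # assumed: a middle layer where semantics converge

def hidden_at_layer(model, input_ids, layer):
    """Mean-pool one layer's token states into a single vector per prompt."""
    out = model(input_ids, output_hidden_states=True)  # HF-style output
    return out.hidden_states[layer].mean(dim=1)

def anchor_loss(model, lowres_ids, english_ids):
    """Pull the low-resource representation toward the English anchor."""
    h_low = hidden_at_layer(model, lowres_ids, BOTTLENECK_LAYER)
    with torch.no_grad():  # the English side is a fixed target
        h_en = hidden_at_layer(model, english_ids, BOTTLENECK_LAYER)
    return 1.0 - F.cosine_similarity(h_low, h_en).mean()
```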
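For the LoSA item, a sketch of exploiting local invariance: compute a block-sparse attention mask once and reuse it across denoising steps until token states drift past a threshold, instead of recomputing sparsity every step. Block size, the drift metric, and the tolerance are assumptions.

```python
# Sketch: reuse a sparse attention pattern while token states are stable.
import torch

def block_sparse_mask(q, k, block=64, keep=0.25):
    """Score query/key blocks by pooled similarity; keep the top fraction.
    Assumes (seq, dim) inputs with seq divisible by `block`."""
    qb = q.view(-1, block, q.shape[-1]).mean(1)  # (q_blocks, dim)
    kb = k.view(-1, block, k.shape[-1]).mean(1)  # (k_blocks, dim)
    scores = qb @ kb.T
    k_keep = max(1, int(keep * scores.shape[-1]))
    top = scores.topk(k_keep, dim=-1).indices
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask.scatter_(1, top, True)
    return mask  # which key blocks each query block attends to

class ReusableMask:
    """Recompute the mask only when token states have drifted enough."""
    def __init__(self, tol=0.02):
        self.tol, self.mask, self.prev_k = tol, None, None

    def get(self, q, k):
        drift = (float("inf") if self.prev_k is None
                 else (k - self.prev_k).norm() / k.norm())
        if self.mask is None or drift > self.tol:
            self.mask, self.prev_k = block_sparse_mask(q, k), k.detach()
        return self.mask
```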
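For the plan-level search item, a toy version of entropy-guided budget allocation: map the policy's normalized prediction entropy over candidate tools to a branching factor. The interface and the linear schedule are assumptions, not the paper's exact algorithm.

```python
# Sketch: uncertain steps get a wide beam, confident steps a narrow one.
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def branch_width(probs, min_k=1, max_k=8):
    """Map normalized entropy in [0, 1] to a branching factor."""
    h = entropy(probs) / math.log(len(probs)) if len(probs) > 1 else 0.0
    return min_k + round(h * (max_k - min_k))

# A confident step explores one candidate tool, an uncertain one eight:
print(branch_width([0.99, 0.005, 0.003, 0.002]))  # -> 1 (low entropy)
print(branch_width([0.25, 0.25, 0.25, 0.25]))     # -> 8 (max entropy)
```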
Also Notable
- When Retrieved Evidence Conflicts With Visual Content, Should Models Refuse or Hallucinate? — an ACL benchmark evaluates deflection and hallucination separately for the first time.
- VideoLLMs Excel at Understanding but Struggle at Retrieval — ViLL-E reuses the LLM's final-layer representations as embeddings, so a single model can serve both video understanding and retrieval (embedding sketch below).
- Training LLMs to Write Reviews From Author Responses, Not Reviewer Text — NVIDIA flips the training signal from reviewer opinions to author rebuttals.
- Active Learning Can Finally Optimize Downstream Loss Directly — no longer limited to uncertainty or information-gain proxies.
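A rough sketch of the embedding idea in the ViLL-E item, shown on the text side only with a stand-in backbone: mean-pool the final layer's hidden states and normalize them for cosine retrieval. The model choice (gpt2) and the pooling are illustrative assumptions, not ViLL-E's actual setup.

```python
# Sketch: reuse an LLM's final-layer hidden states as retrieval embeddings.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")      # stand-in backbone
model = AutoModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def embed(text: str) -> torch.Tensor:
    ids = tok(text, return_tensors="pt")
    h = model(**ids).last_hidden_state   # (1, seq, dim), final layer
    v = h.mean(dim=1).squeeze(0)         # mean-pool over tokens
    return v / v.norm()                  # unit norm for cosine search

# Retrieval reduces to cosine similarity between query and candidates:
score = embed("a dog catching a frisbee") @ embed("dog playing fetch")
```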