AI Research Brief

April 22, 2026

A 305M Retriever Gains 45% on Instruction Following

  • Retrievers Ignore Instructions Because of Data, Not Capacity: IF-IR synthesizes contrastive samples from complementary instruction pairs with label reversal. A 305M encoder gains 45% on FollowIR and beats general embeddings of comparable or larger size.
  • RLHF's Single Point of Failure Lives in the Reward Model: ARES pushes red-teaming from "find the vulnerability" to end-to-end repair of the policy-reward system, closer to what teams with live RLHF pipelines actually need.
  • MLLMs Fail in Fog, Low Light, and Motion Blur, and the Fix May Not Be Algorithmic: DUALVISION adds an infrared channel for modal complementarity and open-sources 25K aligned IR-RGB images with 204K QA annotations, cutting the cost of trying IR on existing MLLMs.
  • Multi-View and 2D-3D Tasks Have Lacked a Unified Positional Encoding: URoPE samples 3D points along camera rays and projects them back to the query plane. Parameter-free, compatible with existing RoPE kernels, with stable gains on novel view synthesis, 3D detection, tracking, and depth estimation.
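The IF-IR bullet's label-reversal idea can be sketched in a few lines. This is a hypothetical illustration, not IF-IR's actual pipeline: given a query, a document, and a pair of complementary instructions under which the document's relevance flips, emit two contrastive training samples that share everything except the instruction and the label. All names (`Sample`, `synthesize_pair`) are invented for the sketch.

```python
from typing import NamedTuple

class Sample(NamedTuple):
    instruction: str
    query: str
    document: str
    relevant: bool

def synthesize_pair(query: str, document: str,
                    instr: str, complement: str) -> list[Sample]:
    """Label reversal: the document is a positive under `instr` and a
    hard negative under the complementary instruction, forcing the
    retriever to attend to the instruction rather than the query alone."""
    return [
        Sample(instr, query, document, relevant=True),
        Sample(complement, query, document, relevant=False),
    ]

# Toy example: the same (query, document) pair flips label with the instruction.
pair = synthesize_pair(
    query="papers on retrieval",
    document="A 2020 survey of dense retrieval.",
    instr="Prefer survey papers.",
    complement="Exclude survey papers.",
)
```

Because both samples agree on query and document, any loss the encoder can reduce must come from the instruction channel, which is the data-side fix the bullet describes.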
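The geometric building block the URoPE bullet describes (sample 3D points along a camera ray, then project back to the image plane) reduces to standard pinhole algebra. A minimal sketch, assuming a known intrinsics matrix `K`; the function names are illustrative and this shows only the ray/projection geometry, not URoPE's positional encoding itself.

```python
import numpy as np

def ray_points(K: np.ndarray, pixel: np.ndarray, depths: np.ndarray) -> np.ndarray:
    """Lift a pixel to 3D points sampled at the given depths along its camera ray."""
    uv1 = np.array([pixel[0], pixel[1], 1.0])
    direction = np.linalg.solve(K, uv1)           # ray direction in the camera frame
    return depths[:, None] * direction[None, :]   # (D, 3) points along the ray

def project(K: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Pinhole projection of 3D points back onto the image plane."""
    proj = points @ K.T
    return proj[:, :2] / proj[:, 2:3]             # divide by depth

# Illustrative intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.,   0., 320.],
              [  0., 500., 240.],
              [  0.,   0.,   1.]])
pts = ray_points(K, np.array([100., 80.]), np.linspace(1.0, 8.0, 4))
uv = project(K, pts)  # every depth sample reprojects to the original pixel
```

The round trip is exact by construction, which is what lets a parameter-free scheme reuse the same sampled points across views without learned projection heads.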

Also Notable

  • Split Scientific Feasibility into "Consistent with Known Knowledge" and "Supportable by Experiment" — Framed as a diagnostic-reasoning task: can LLMs tell the two layers apart?
  • Humor as a Counterfactual Unfairness Probe — What the model finds funny exposes social assumptions about identity and groups baked into training data. Clever framing.
  • Multilingual LLMs Win on High-Level Tasks, Fail on Grammatical Gender and Morphology — MORPHOGEN turns this lexical blind spot into a cross-lingual benchmark.

Read the full edition →
