AI Research Brief

April 22, 2026

A 305M Retriever Gains 45% on Instruction Following

  • Retrievers Ignore Instructions Because of Data, Not Capacity: IF-IR synthesizes contrastive samples from complementary instruction pairs with label reversal. A 305M encoder gains 45% on FollowIR and beats general embeddings of comparable or larger size.
  • RLHF's Single Point of Failure Lives in the Reward Model: ARES pushes red-teaming from "find the vulnerability" to end-to-end repair of the policy-reward system, closer to what teams with live RLHF pipelines actually need.
  • MLLMs Fail in Fog, Low Light, and Motion Blur, and the Fix May Not Be Algorithmic: DUALVISION adds an infrared channel for modal complementarity and open-sources 25K aligned IR-RGB images with 204K QA annotations, cutting the cost of trying IR on existing MLLMs.
  • Multi-View and 2D-3D Tasks Have Lacked a Unified Positional Encoding: URoPE samples 3D points along camera rays and projects them back to the query plane. Parameter-free, compatible with existing RoPE kernels, with stable gains on novel view synthesis, 3D detection, tracking, and depth estimation.
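The IF-IR bullet's label-reversal idea can be sketched in a few lines. This is a hypothetical illustration, not IF-IR's actual pipeline: given a query, a document, and a pair of complementary instructions under which the document's relevance flips, emit two contrastive training samples that share everything except the instruction and the label. All names (`Sample`, `synthesize_pair`) are invented for the sketch.

```python
from typing import NamedTuple

class Sample(NamedTuple):
    instruction: str
    query: str
    document: str
    relevant: bool

def synthesize_pair(query: str, document: str,
                    instr: str, complement: str) -> list[Sample]:
    """Label reversal: the document is a positive under `instr` and a
    hard negative under the complementary instruction, forcing the
    retriever to attend to the instruction rather than the query alone."""
    return [
        Sample(instr, query, document, relevant=True),
        Sample(complement, query, document, relevant=False),
    ]

# Toy example: the same (query, document) pair flips label with the instruction.
pair = synthesize_pair(
    query="papers on retrieval",
    document="A 2020 survey of dense retrieval.",
    instr="Prefer survey papers.",
    complement="Exclude survey papers.",
)
```

Because both samples agree on query and document, any loss the encoder can reduce must come from the instruction channel, which is the data-side fix the bullet describes.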
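The geometric building block the URoPE bullet describes (sample 3D points along a camera ray, then project back to the image plane) reduces to standard pinhole algebra. A minimal sketch, assuming a known intrinsics matrix `K`; the function names are illustrative and this shows only the ray/projection geometry, not URoPE's positional encoding itself.

```python
import numpy as np

def ray_points(K: np.ndarray, pixel: np.ndarray, depths: np.ndarray) -> np.ndarray:
    """Lift a pixel to 3D points sampled at the given depths along its camera ray."""
    uv1 = np.array([pixel[0], pixel[1], 1.0])
    direction = np.linalg.solve(K, uv1)           # ray direction in the camera frame
    return depths[:, None] * direction[None, :]   # (D, 3) points along the ray

def project(K: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Pinhole projection of 3D points back onto the image plane."""
    proj = points @ K.T
    return proj[:, :2] / proj[:, 2:3]             # divide by depth

# Illustrative intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.,   0., 320.],
              [  0., 500., 240.],
              [  0.,   0.,   1.]])
pts = ray_points(K, np.array([100., 80.]), np.linspace(1.0, 8.0, 4))
uv = project(K, pts)  # every depth sample reprojects to the original pixel
```

The round trip is exact by construction, which is what lets a parameter-free scheme reuse the same sampled points across views without learned projection heads.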

Also Notable

  • Split Scientific Feasibility into "Consistent with Known Knowledge" and "Supportable by Experiment" — Framed as a diagnostic-reasoning task: can LLMs tell the two layers apart?
  • Humor as a Counterfactual Unfairness Probe — What the model finds funny exposes social assumptions about identity and groups baked into training data. Clever framing.
  • Multilingual LLMs Win on High-Level Tasks, Fail on Grammatical Gender and Morphology — MORPHOGEN turns this lexical blind spot into a cross-lingual benchmark.

Read the full edition →
