AI Research Brief

June 9, 2026

Swap the Arm Without Retraining; VLMs See Both the Duck and the Rabbit

  • Swap a robot arm and the whole skill set breaks — the fix is rewiring, not retraining. RECENT writes skills as executable code and locally refactors only the execution bindings that shift with body or environment, letting a small model handle grounding on-device and matching the large-model version's task performance.
  • Robust-U1 makes the model repair the image before answering, turning robustness into an observable intermediate step. A three-stage self-recovery path handles blur, noise, and occlusion (the kinds of visual corruption that tend to surface only in production) at the cost of an extra reconstruction step.
  • VLMs actually "see" both readings of a duck-rabbit image. Probes find 72% of bistable images light up features for both interpretations on the vision side; the bottleneck for steering sits downstream in language, not in the vision tower.
  • Atmospheric compensation in standoff infrared imaging, long shelved, gets a set-based treatment. The work jointly inverts multiple radiance measurements of one scene as an unordered set; what transfers is the modeling stance, not the long-wave infrared (LWIR) setting itself.

Also Notable

  • Multiple Teaching Agents Each Propose a Reasonable Plan, but the Student Gets One Answer — a voting protocol coordinates multi-agent collaboration, treating disagreement as a governance problem rather than a capability gap.
  • A Map for Spending More Compute at Inference Time in Multimodal Models — a systematic survey of test-time scaling across generation and reasoning in multimodal foundation models.
