AI Research Brief

June 1, 2026

LoRA as a Ruler for Memory, Reversed Video as a Free Counterfactual

  • Flip LoRA into a measuring stick and the real capacity of parametric memory falls out: capacity follows a power law you can estimate in advance, and a token-prediction probability of 0.5 marks the threshold for verbatim recall.
  • Unified retrieval isn't about the interface — it's about not throwing away structure. OmniRetrieval routes queries to each source's native engine instead of crushing everything into a shared vector space, and beats single-source baselines across 309 knowledge bases.
  • Play a real video backwards and you get a free counterfactual. YoCausal uses reversed clips as expectation-violating negatives, and finds 13 video diffusion models can sense the arrow of time but can't explain the causality.
  • Image agents shift from rewriting prompts to writing code. GenClaw has the LLM nail down composition in SVG/HTML/Three.js as an executable sketch, then hands it to a generative model to color in — the value is control, not fidelity.
  • Agent guardrails pile on "lightweight" and "real-time," but the real novelty hides in the taxonomy. AgentDoG 1.5's substantive contribution is an updated open-world agent risk taxonomy; discount the "1k samples matches closed-source" claim, and verify it yourself since the model and dataset are open.

Also Notable

  • A Fully Open-Source Real-Time Interactive Video World Model — the whole chain is open, from data construction to streaming inference, worth a look if you want to run your own world model. minWM
  • Move Token Compression From Late Prefill Up Into the Vision Encoder: video understanding models usually compress tokens late in prefill; this compresses in the vision encoder instead, skipping the wasted computation before that point. EarlyTom
  • A Third Path for Joint Audio-Video Generation — neither two-tower post-alignment nor full tri-modal mixing, a new route to native fine-grained audio-visual sync. Native Audio-Visual Alignment
  • Render a Text Problem as an Image for a VLM and Performance Collapses — this traces where that "carrier-sensitive" bias comes from. LoMo
  • A Mechanistic Account of Why Dense Retrieval Scores High, at the Embedding Level — makes the long-black-box relevance score legible. Xetrieval
  • Self-Evolving Anchors Loosen Autoregressive Video's Over-Reliance on the First Frame — no longer chained to frame one. AdaState
  • Let the Rewriter and Encoder Co-Train Iteratively — in tool retrieval, casual queries don't match technical API terms; this evolves both ends together. CoHyDE
  • Inject 3D Spatial Priors Into a VLM Without a Dedicated 3D Encoder or 3D VQA Fine-Tuning — patches the geometric-reasoning gap. Beyond 3D VQAs
  • Generative 4D Neural Object Kinematics — lets static 3D objects produce realistic time-varying deformation under different physical conditions. NeuROK
  • An Interactive Assistant for Scientific Hypothesis Discovery — folds divergent exploration and convergent refinement into one workflow. MOOSE-Copilot

