LoRA as a Ruler for Memory, Reversed Video as a Free Counterfactual
- Flip LoRA into a measuring stick and the real capacity of parametric memory falls out: it follows a power law you can estimate in advance, and a token-prediction probability of 0.5 is the threshold for verbatim recall.
- Unified retrieval isn't about the interface — it's about not throwing away structure. OmniRetrieval routes queries to each source's native engine instead of crushing everything into a shared vector space, and beats single-source baselines across 309 knowledge bases.
- Play a real video backwards and you get a free counterfactual. YoCausal uses reversed clips as expectation-violating negatives, and finds that 13 video diffusion models can sense the arrow of time but cannot explain the causality behind it.
- Image agents shift from rewriting prompts to writing code. GenClaw has the LLM nail down composition in SVG/HTML/Three.js as an executable sketch, then hands it to a generative model to color in — the value is control, not fidelity.
- Agent guardrails pile on "lightweight" and "real-time," but the real novelty hides in the taxonomy. AgentDoG 1.5's substantive contribution is an updated open-world agent risk taxonomy; discount the "1k samples matches closed-source" claim, and verify it yourself since the model and dataset are open.
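One way to make the 0.5 figure in the first item intuitive (a reading assumed here, not a claim about how the paper derives it): under greedy decoding, any token whose predicted probability exceeds 0.5 is necessarily the top choice, so a passage in which every target token clears 0.5 is guaranteed to come back verbatim. The sketch below checks that sufficient condition. The function names and the probability values are hypothetical, for illustration only.

```python
from typing import Optional, Sequence


def guaranteed_greedy_recall(target_token_probs: Sequence[float],
                             threshold: float = 0.5) -> bool:
    """Return True if every target token has probability above the threshold.

    Above 0.5 a token must be the argmax, so greedy decoding emits it.
    This is a sufficient condition only: a token below 0.5 can still be
    the argmax, so False does not prove that recall fails.
    """
    return all(p > threshold for p in target_token_probs)


def first_unguaranteed_position(target_token_probs: Sequence[float],
                                threshold: float = 0.5) -> Optional[int]:
    """Index of the first token at or below the threshold, or None if none."""
    for i, p in enumerate(target_token_probs):
        if p <= threshold:
            return i
    return None


# Made-up per-token probabilities for two stored passages.
memorized = [0.93, 0.81, 0.67, 0.99]
shaky = [0.93, 0.48, 0.67, 0.99]

print(guaranteed_greedy_recall(memorized))   # every token clears 0.5
print(guaranteed_greedy_recall(shaky))       # second token does not
print(first_unguaranteed_position(shaky))    # position where the guarantee breaks
```

A single token at or below the threshold is enough to lose the guarantee for the whole passage, which is why a per-token threshold can act as a passage-level recall criterion.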
Also Notable
- A Fully Open-Source Real-Time Interactive Video World Model — the whole chain is open, from data construction to streaming inference, worth a look if you want to run your own world model. minWM
- Move Token Compression From Late Prefill Up Into the Vision Encoder — video understanding usually compresses late in prefill; this skips the wasted stretch before it. EarlyTom
- A Third Path for Joint Audio-Video Generation — neither two-tower post-alignment nor full tri-modal mixing, a new route to native fine-grained audio-visual sync. Native Audio-Visual Alignment
- Render a Text Problem as an Image for a VLM and Performance Collapses — this traces where that "carrier-sensitive" bias comes from. LoMo
- A Mechanistic Account of Why Dense Retrieval Scores High, at the Embedding Level — makes the long-black-box relevance score legible. Xetrieval
- Self-Evolving Anchors Loosen Autoregressive Video's Over-Reliance on the First Frame — no longer chained to frame one. AdaState
- Let the Rewriter and Encoder Co-Train Iteratively — in tool retrieval, casual queries don't match technical API terms; this evolves both ends together. CoHyDE
- Inject 3D Spatial Priors Into a VLM Without a Dedicated 3D Encoder or 3D VQA Fine-Tuning — patches the geometric-reasoning gap. Beyond 3D VQAs
- Generative 4D Neural Object Kinematics — lets static 3D objects produce realistic time-varying deformation under different physical conditions. NeuROK
- An Interactive Assistant for Scientific Hypothesis Discovery — folds divergent exploration and convergent refinement into one workflow. MOOSE-Copilot