DeepSeek V4 Cuts KV Cache to 13.5%, Latent Video Memory Runs 10x Faster

June 10, 2026


DeepSeek V4 bakes "index then attend" into its core architecture. During decoding, the full KV cache no longer stays in VRAM; instead, a Neural Memory Indexer fetches relevant history on demand. On long-context evals this cuts KV-cache usage to 13.5% while downstream accuracy rises 0.6 points.
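To make "index then attend" concrete, here is a minimal Python/NumPy sketch of block-level retrieval followed by ordinary attention. It is not DeepSeek V4's Neural Memory Indexer: the mean-pooled block summaries, the function name index_then_attend, and the block_size and top_k parameters are all assumptions for illustration. For simplicity the whole cache sits in one array here; a real system would keep unselected blocks out of VRAM.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def index_then_attend(q, keys, values, block_size=64, top_k=4):
    """Attend over only the top_k KV blocks picked by a cheap index.

    Hypothetical sketch: the index is one mean-pooled key per block,
    not DeepSeek V4's actual Neural Memory Indexer.
    Returns the attention output and the token positions that were read.
    """
    n, d = keys.shape
    n_blocks = -(-n // block_size)  # ceiling division
    # Index step: summarize each block by the mean of its keys.
    summaries = np.stack([
        keys[b * block_size:(b + 1) * block_size].mean(axis=0)
        for b in range(n_blocks)
    ])
    # Score blocks against the query and keep the best top_k (in cache order).
    k = min(top_k, n_blocks)
    chosen = np.sort(np.argsort(summaries @ q)[-k:])
    idx = np.concatenate([
        np.arange(b * block_size, min((b + 1) * block_size, n)) for b in chosen
    ])
    # Attend step: standard scaled dot-product attention over the fetched subset.
    weights = softmax(keys[idx] @ q / np.sqrt(d))
    return weights @ values[idx], idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    keys = rng.standard_normal((4096, 32))
    values = rng.standard_normal((4096, 32))
    q = rng.standard_normal(32)
    out, idx = index_then_attend(q, keys, values, block_size=64, top_k=8)
    print(out.shape, len(idx) / len(keys))  # (32,) 0.125
```

With block_size=64 and top_k=8 over a 4,096-token cache, attention reads 512 tokens, or 12.5% of the cache. That fraction comes from the toy parameters only and says nothing about how the reported 13.5% was reached. When top_k covers every block, the sketch reduces to full attention.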

Video world models move memory into latent space and skip the pixel round-trip. Mirage drops explicit RGB point clouds, runs end-to-end generation 10.57x faster using 1/55th the VRAM, and sets a new state of the art on WorldScore.

Reading a scene is easy; acting in it is not. SpatialWorld places agents in first-person environments where they must operate in the scene and reason about space at the same time. The best model averages a 17.4% success rate, held back by active exploration and long-horizon planning rather than by single-step reasoning.

Imitation learning breaks down out of distribution, but a bigger policy network isn't the fix. DARP retrieves expert demonstrations at inference time and models the difference between the query state and its retrieved neighbors' states, beating standard behavior cloning by 15–46% across several domains.
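The blurb only says DARP retrieves demonstrations and models query-to-neighbor differences. The Python sketch below shows that general retrieve-then-correct idea under loud assumptions: Euclidean nearest neighbors, and a linear delta model fit by least squares on pairs of neighboring demo states. The class name DeltaRetrievalPolicy and every detail of the fit are hypothetical, not DARP's code.

```python
import numpy as np


class DeltaRetrievalPolicy:
    """Hypothetical retrieve-then-correct policy (not DARP's actual code).

    At inference time it retrieves the k nearest expert states and adjusts
    each neighbor's action by a learned function of (query - neighbor state).
    Here that function is linear and fit by least squares: an assumption.
    """

    def __init__(self, states, actions, k=3):
        self.S = np.asarray(states, dtype=float)
        self.A = np.asarray(actions, dtype=float)
        self.k = k
        # Pair every demo state with its nearest *other* demo state.
        dist = np.linalg.norm(self.S[:, None, :] - self.S[None, :, :], axis=-1)
        np.fill_diagonal(dist, np.inf)
        nn = dist.argmin(axis=1)
        # Fit the delta model: (a_i - a_j) ≈ (s_i - s_j) @ W.
        dS = self.S - self.S[nn]
        dA = self.A - self.A[nn]
        self.W, *_ = np.linalg.lstsq(dS, dA, rcond=None)

    def act(self, query):
        q = np.asarray(query, dtype=float)
        nbrs = np.argsort(np.linalg.norm(self.S - q, axis=1))[: self.k]
        # Each neighbor proposes its action plus a correction for the state gap.
        proposals = self.A[nbrs] + (q - self.S[nbrs]) @ self.W
        return proposals.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    M = np.array([[2.0, 0.0], [0.5, -1.0]])  # toy linear "expert"
    S = rng.uniform(0.0, 1.0, size=(50, 2))
    policy = DeltaRetrievalPolicy(S, S @ M, k=3)
    print(policy.act([3.0, -2.0]))  # close to [5, 2], far outside the demo range
```

Because the toy expert is linear, the fitted correction recovers it exactly and extrapolates to a query well outside the demo range, where copying the nearest neighbor's action would be far off. Real experts are nonlinear, so a practical version would swap the least-squares fit for a learned, likely nonlinear, delta model.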

Also Notable

ToM Post-Training Hits 99%, Maybe All Shortcuts. The theory-of-mind task has exploitable shortcuts, so post-training gains like this deserve scrutiny before they are taken at face value.
Safety Judges Are Brittle, One Perturbation Flips Them. They're sensitive to small changes in prompt and rubric; this work uses curriculum training to move the judge from reliable to expressive.
Directly Translated Benchmarks Miss Cultural Context. Multilingual safety evals lose local nuance when benchmarks are translated directly; this work runs culturally adapted red-teaming for East and Southeast Asia.
Differential Privacy Has Guarantees, Real Protection Is Doubtful. Overlap with pretraining data weakens DP's practical privacy effect; this work builds an empirical benchmark to test how much protection DP actually provides.
RL Reasoning for Video Grounding Often Stays Shallow. Reasoning paths look sound but are hollow; this work applies temporally aware reasoning optimization for sharper grounding.
3D Semantic Scene Generation Drops the Triplane. Without triplanes or other heavy 3D architectures, unconditional diffusion generates editable semantic occupancy for autonomous driving.
Diffusion Models Both Generate and Learn Representations. The link between the two abilities was never clear; this work evaluates their representation space through a self-supervised lens.
AI Paper Writing Shifts From Generation to Verification. This uses a deterministic integrity gate to block fabricated citations and numbers that don't match source tables.
A Bit-Exact Consistency Catalog for 84 Numeric Formats. When porting models across accelerators that use FP8, BF16, MXFP4, and other formats, use it as a reference to catch silent precision drift.
Wastewater Sees Flu Spread Before Clinical Reports. But wastewater isn't a clean proxy for population burden; this work applies Bayesian selective latent inference to wastewater-first evidence.
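For the numeric-format catalog item, a small Python sketch of what silent precision drift can look like: two conversions that both turn float32 into BF16, one by truncation and one by round-to-nearest-even, disagree in the last bit for the same input. The function names are illustrative and not taken from the catalog; NaN and infinity handling is omitted.

```python
import math
import struct


def f32_bits(x: float) -> int:
    """IEEE 754 float32 bit pattern of x (x is first rounded to float32)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bf16_truncate(x: float) -> int:
    """BF16 by truncation: keep sign, 8 exponent bits, top 7 mantissa bits."""
    return f32_bits(x) >> 16


def bf16_round_nearest_even(x: float) -> int:
    """BF16 by round-to-nearest-even on the 16 dropped bits (NaNs not handled)."""
    b = f32_bits(x)
    bias = 0x7FFF + ((b >> 16) & 1)
    return ((b + bias) >> 16) & 0xFFFF


def bf16_to_float(h: int) -> float:
    """Widen a BF16 bit pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", h << 16))[0]


if __name__ == "__main__":
    x = 1.005859375  # 1 + 3 * 2**-9, exactly representable in float32
    a, b = bf16_truncate(x), bf16_round_nearest_even(x)
    print(hex(a), hex(b))  # 0x3f80 0x3f81
    # A loose tolerance check accepts the two results as "equal"...
    print(math.isclose(bf16_to_float(a), bf16_to_float(b), rel_tol=1e-2))  # True
    # ...while a bit-exact comparison flags the drift.
    print(a == b)  # False
```

The two results decode to 1.0 and 1.0078125, a gap that a relative tolerance of 1e-2 lets through. Comparing raw bit patterns is what surfaces the mismatch, which is the kind of check a bit-exact reference makes possible.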
