AI Research Brief
Archives
Watermarks Enable Bit-Level Tracing, Diffusion VLMs Ground GUI
March 30, 2026
Discrete diffusion VLMs validated for GUI grounding for the first time. Bidirectional attention shows structural advantages on spatial tasks. Data diversity...
Mistral Ships TTS, Diffusion LLMs Get 4.7x Faster
March 28, 2026
Mistral becomes the first major LLM lab to ship its own TTS. Three seconds of reference audio is enough for voice cloning. Speech synthesis is shifting from...
Self-Distillation Strips Out Hesitation, OOD Drops 40%
March 27, 2026
Self-distillation strips out the model's ability to hesitate, not redundant steps. Once epistemic verbalization is suppressed, OOD performance drops up to...
Speculative Execution Hits Agent Loops, 3x Faster
March 26, 2026
Speculative Execution Comes to Agent Loops, Up to 3.35x Speedup. SpecEyes borrows CPU branch prediction for multimodal agents: a small model predicts...
Diffusion OCR Decodes 3.2x Faster, Single-Stream AV in 2 Seconds
March 25, 2026
Diffusion Decoding Replaces Autoregressive OCR, Going From Serial to Parallel. MinerU-Diffusion reframes document parsing as inverse rendering, using block-...
PDEs Beat Attention 2x, Local RL Saves 3/4 Compute
March 24, 2026
Decomposing formal proofs into three independent RL tasks beats end-to-end training. LongCat-Flash-Prover separates autoformalization, scaffolding, and step-...
Seed1.8 Goes Agent-Native, Language Training Erodes Vision
March 24, 2026
Seed1.8 unifies search, code execution, and GUI interaction at the foundation layer. ByteDance's agent-native model optimizes for latency and cost in...
12B Beats GPT-4, Distilled Students Surpass Teachers
March 23, 2026
Generative recommendation's "generalization advantage" degrades to token-level memorization on closer inspection. Per-instance fusion of both paradigms beats...
3B Params Win Three Olympiad Golds, 768-D Discrete Tokens Work
March 21, 2026
Cascade RL plus multi-domain distillation lets 3B active parameters win three olympiad golds. NVIDIA open-sourced the full training recipe. Small-model...
3D at 0.1% Tokens, Video Fine-Tuning's Hidden Spatial Cost
March 20, 2026
Misaligned experience replay is a silent bottleneck in agent RL. Complementary RL lets the experience extractor adapt based on policy performance, enabling...
First 32B Industrial Code Model, War-Tested Reasoning Eval
March 19, 2026
General-purpose code models collapse on industrial tasks. The root cause is data and paradigm mismatch. InCoder-32B is the first 32B open-source base model...
Open-Source Search Agent Wins With 12K Samples, Agent Skills Mostly Fail
March 19, 2026
An open-source search agent trained on 12K synthetic samples beats closed-source competitors. OpenSeeker nearly doubles the second-best on BrowseComp with...
700K Paper Pairs Distill Taste, Null Spaces Expose Blind Spots
March 17, 2026
Community citation signals can train "taste." RLCF uses 700K paper pairs for preference modeling, producing a judge that outperforms GPT-5.2. The paradigm...
Expert Reasoning Structure for CoT, +13% on Novel Class Discovery
March 17, 2026
Design CoT Supervision From Domain Experts' Actual Reasoning Process. In medical VQA, structured clinical workflows as CoT steps improve both accuracy and...
Budget-Aware Agents Beat 4x Brute-Force Sampling
March 16, 2026
SWE agent training is bottlenecked by executable environments, not algorithms. OpenSWE open-sources 45,320 Dockerized training environments across 12,800+...
Document Agents Navigate by Luck, Prefill Speeds Up 1.82x
March 14, 2026
Document Agents' Reasoning Is Overestimated. MADQA's benchmark, designed with classical test theory, shows the best multimodal agents match human accuracy...
Encode the Answer, Not the Question — Embeddings Gain 9%
March 13, 2026
Encoding LLM Responses Instead of User Queries Lifts Embeddings by 9.3%. LLM2Vec-Gen uses purely self-supervised training to beat the best unsupervised...
\"Think It Over\" Can Unlock a Model's Memory Bank
March 12, 2026
CoT Reasoning Doubles as a Parametric Memory Search Engine. Google finds that even simple factual questions benefit from reasoning mode — reasoning tokens...
Write Code Before You Draw, Layouts Improve 68%
March 11, 2026
All Intrinsic RLVR Is Just Sharpening the Initial Distribution. Model prior quality sets the training ceiling. Model Collapse Step can predict feasibility...
4-Step Diffusion Beats 100-Step Baselines, Layer Skipping Saves 18%
March 10, 2026
Non-Differentiable Rewards Now Work for Few-Step Diffusion RL Training. 4-step generation beats 100-step baselines across the board. Human preference,...