AI Research Brief
Archives
Watermarks Enable Bit-Level Tracing, Diffusion VLMs Ground GUI
March 30, 2026
Discrete diffusion VLMs validated for GUI grounding for the first time. Bidirectional attention shows structural advantages on spatial tasks. Data diversity...
Mistral Ships TTS, Diffusion LLMs Get 4.7x Faster
March 28, 2026
Mistral becomes the first major LLM lab to ship its own TTS. Three seconds of reference audio is enough for voice cloning. Speech synthesis is shifting from...
Self-Distillation Strips Out Hesitation, OOD Drops 40%
March 27, 2026
Self-distillation strips out the model's ability to hesitate, not redundant steps. Once epistemic verbalization is suppressed, OOD performance drops up to...
Speculative Execution Hits Agent Loops, 3x Faster
March 26, 2026
Speculative Execution Comes to Agent Loops, Up to 3.35x Speedup. SpecEyes borrows CPU branch prediction for multimodal agents: a small model predicts...
Diffusion OCR Decodes 3.2x Faster, Single-Stream AV in 2 Seconds
March 25, 2026
Diffusion Decoding Replaces Autoregressive OCR, Going From Serial to Parallel. MinerU-Diffusion reframes document parsing as inverse rendering, using block-...
PDEs Beat Attention 2x, Local RL Saves 3/4 Compute
March 24, 2026
Decomposing formal proofs into three independent RL tasks beats end-to-end training. LongCat-Flash-Prover separates autoformalization, scaffolding, and step-...
Seed1.8 Goes Agent-Native, Language Training Erodes Vision
March 24, 2026
Seed1.8 unifies search, code execution, and GUI interaction at the foundation layer. ByteDance's agent-native model optimizes for latency and cost in...
12B Beats GPT-4, Distilled Students Surpass Teachers
March 23, 2026
Generative recommendation's "generalization advantage" degrades to token-level memorization on closer inspection. Per-instance fusion of both paradigms beats...
3B Params Win Three Olympiad Golds, 768-D Discrete Tokens Work
March 21, 2026
Cascade RL plus multi-domain distillation lets 3B active parameters win three olympiad golds. NVIDIA open-sourced the full training recipe. Small-model...
3D at 0.1% Tokens, Video Fine-Tuning's Hidden Spatial Cost
March 20, 2026
Misaligned experience replay is a silent bottleneck in agent RL. Complementary RL lets the experience extractor adapt based on policy performance, enabling...
First 32B Industrial Code Model, War-Tested Reasoning Eval
March 19, 2026
General-purpose code models collapse on industrial tasks. The root cause is data and paradigm mismatch. InCoder-32B is the first 32B open-source base model...
Open-Source Search Agent Wins With 12K Samples, Agent Skills Mostly Fail
March 19, 2026
An open-source search agent trained on 12K synthetic samples beats closed-source competitors. OpenSeeker nearly doubles the second-best on BrowseComp with...
700K Paper Pairs Distill Taste, Null Spaces Expose Blind Spots
March 17, 2026
Community citation signals can train "taste." RLCF uses 700K paper pairs for preference modeling, producing a judge that outperforms GPT-5.2. The paradigm...
Expert Reasoning Structure for CoT, +13% on Novel Class Discovery
March 17, 2026
Design CoT Supervision From Domain Experts' Actual Reasoning Process. In medical VQA, structured clinical workflows as CoT steps improve both accuracy and...
Budget-Aware Agents Beat 4x Brute-Force Sampling
March 16, 2026
SWE agent training is bottlenecked by executable environments, not algorithms. OpenSWE open-sources 45,320 Dockerized training environments across 12,800+...
Document Agents Navigate by Luck, Prefill Speeds Up 1.82x
March 14, 2026
Document Agents' Reasoning Is Overestimated. MADQA's benchmark, designed with classical test theory, shows the best multimodal agents match human accuracy...
Encode the Answer, Not the Question — Embeddings Gain 9%
March 13, 2026
Encoding LLM Responses Instead of User Queries Lifts Embeddings by 9.3%. LLM2Vec-Gen uses purely self-supervised training to beat the best unsupervised...
\"Think It Over\" Can Unlock a Model's Memory Bank
March 12, 2026
CoT Reasoning Doubles as a Parametric Memory Search Engine. Google finds that even simple factual questions benefit from reasoning mode — reasoning tokens...
Write Code Before You Draw, Layouts Improve 68%
March 11, 2026
All Intrinsic RLVR Is Just Sharpening the Initial Distribution. Model prior quality sets the training ceiling. Model Collapse Step can predict feasibility...
4-Step Diffusion Beats 100-Step Baselines, Layer Skipping Saves 18%
March 10, 2026
Non-Differentiable Rewards Now Work for Few-Step Diffusion RL Training. 4-step generation beats 100-step baselines across the board. Human preference,...