Archive (Page 4) • AI Research Brief • Buttondown

12k Samples Beat Finance SOTA, CUDA Optimization 35% Faster

March 10, 2026

Post-Training Data Matters More Than Model Size in Vertical Domains. A systematic ablation in finance shows that distillation quality control plus...

Drop CLIP, Gain Performance: VLMs Work Better Without It

March 9, 2026

Contrastive Pretraining Actively Hurts VLMs. CLIP optimizes for category discrimination, not fine-grained understanding. Tencent's Penguin-VL initializes the...

LLM-Initialized Vision Encoders Outperform Larger Models at 2B

March 9, 2026

An LLM-Initialized Vision Encoder at 2B Beats Larger Models on Multiple Benchmarks. Contrastive pretraining optimizes for coarse-grained matching; VLMs need...

"Be Concise" Halves Tokens, Lifts Accuracy by 16 Points

March 7, 2026

"Be Concise" Self-Distillation Halves Tokens and Raises Accuracy. Qwen3 on MATH-500: 57% fewer reasoning tokens, 16-point accuracy gain. Redundant reasoning...

Code Agents Can't Cross Repo Boundaries, Under 45% Success

March 5, 2026

Code agents fall apart outside single-repo fixes. BeyondSWE tests four dimensions across 500 instances. The best model stays below 45% success. Adding search...

Direct Lottie Generation, DPO's Built-In Forgetting Defense

March 4, 2026

AI-generated animation now outputs editable project files directly. OmniLottie compresses Lottie's verbose JSON into parameterized token sequences, letting...

9K Samples Rival R1, Most RL Gains Trace Back to SFT

March 3, 2026

A 4B reasoning model trained on 9K curated samples approaches DeepSeek-R1. CHIMERA shows the real bottleneck in reasoning training is domain coverage and...

Spectral Conditions Unify μP Scaling, Data Curation Leaks Privacy

March 3, 2026

A single spectral condition unifies μP scaling across width and depth. No more per-architecture, per-optimizer derivations for hyperparameter transfer. Code...

Drop 90% of Vision Tokens, Keep the Performance

March 2, 2026

Spatial relationships in image generation can now be optimized, not just hoped for. SpatialScore trains a reward model that outperforms GPT-4V on spatial...

Latent Reasoning's Gains Aren't From Reasoning

February 28, 2026

Latent reasoning gains come from side effects, not reasoning itself. Causal mediation analysis reveals a causal disconnect between latent tokens and both...

Tri-Modal Training From Scratch, Agentic RL Gets a Stability Fix

February 27, 2026

Apple trains a tri-modal masked diffusion model from scratch. Systematic testing of scaling laws, modality mixing, and noise schedules makes this directly...

TTT Is Linear Attention, Terminal Agent Data Recipe Goes Open

February 26, 2026

TTT architectures are formally equivalent to linear attention. NVIDIA's proof unifies two independent research communities, sharply narrowing the design...

11 Agent Failure Modes From Red-Teaming, Step-Level Routing Cuts Cost 700x

February 26, 2026

A 20-person red team exposed 11 agent failure modes in a real deployment environment with persistent memory, email, Discord, and shell access. The most...

Token Probabilities as Zero-Shot Rewards Hit 0.95 Correlation

February 25, 2026

An LLM builds an internal "world model" of kernel behavior to plan optimization paths. On complex kernels like MoE, it runs 14x faster than evolutionary...

74% of Agent Coordination May Be Wasted Effort

February 24, 2026

74% of enterprise workflow tasks don't need inter-agent coordination. Monotonicity analysis provides a formal test: if merging sub-results can't make things...

Model Folding Beats Pruning, XR Gets Hand-Level Control

February 23, 2026

Weight folding outperforms pruning at most compression rates. ICLR 2026 work proves folding yields lower reconstruction error and validates across 1,000+...