AI Research Brief
Archives
12k Samples Beat Finance SOTA, CUDA Optimization 35% Faster
March 10, 2026
Post-Training Data Matters More Than Model Size in Vertical Domains. A systematic ablation in finance shows that distillation quality control plus...
Drop CLIP, Gain Performance: VLMs Work Better Without It
March 9, 2026
Contrastive Pretraining Actively Hurts VLMs. CLIP optimizes for category discrimination, not fine-grained understanding. Tencent's Penguin-VL initializes the...
LLM-Initialized Vision Encoders Outperform Larger Models at 2B
March 9, 2026
An LLM-Initialized Vision Encoder at 2B Beats Larger Models on Multiple Benchmarks. Contrastive pretraining optimizes for coarse-grained matching; VLMs need...
"Be Concise" Halves Tokens, Lifts Accuracy by 16 Points
March 7, 2026
"Be Concise" Self-Distillation Halves Tokens and Raises Accuracy. Qwen3 on MATH-500: 57% fewer reasoning tokens, 16-point accuracy gain. Redundant reasoning...
Code Agents Can't Cross Repo Boundaries, Under 45% Success
March 5, 2026
Code agents fall apart outside single-repo fixes. BeyondSWE tests four dimensions across 500 instances. The best model stays below 45% success. Adding search...
Direct Lottie Generation, DPO's Built-In Forgetting Defense
March 4, 2026
AI-generated animation now outputs editable project files directly. OmniLottie compresses Lottie's verbose JSON into parameterized token sequences, letting...
9K Samples Rival R1, Most RL Gains Trace Back to SFT
March 3, 2026
A 4B reasoning model trained on 9K curated samples approaches DeepSeek-R1. CHIMERA shows the real bottleneck in reasoning training is domain coverage and...
Spectral Conditions Unify μP Scaling, Data Curation Leaks Privacy
March 3, 2026
A single spectral condition unifies μP scaling across width and depth. No more per-architecture, per-optimizer derivations for hyperparameter transfer. Code...
Drop 90% of Vision Tokens, Keep the Performance
March 2, 2026
Spatial relationships in image generation can now be optimized, not just hoped for. SpatialScore trains a reward model that outperforms GPT-4V on spatial...
Latent Reasoning's Gains Aren't From Reasoning
February 28, 2026
Latent reasoning gains come from side effects, not reasoning itself. Causal mediation analysis reveals a causal disconnect between latent tokens and both...
Tri-Modal Training From Scratch, Agentic RL Gets a Stability Fix
February 27, 2026
Apple trains a tri-modal masked diffusion model from scratch. Systematic testing of scaling laws, modality mixing, and noise schedules makes this directly...
TTT Is Linear Attention, Terminal Agent Data Recipe Goes Open
February 26, 2026
TTT architectures are formally equivalent to linear attention. NVIDIA's proof unifies two independent research communities, sharply narrowing the design...
11 Agent Failure Modes From Red-Teaming, Step-Level Routing Cuts Cost 700x
February 26, 2026
A 20-person red team exposed 11 agent failure modes in a real deployment environment with persistent memory, email, Discord, and shell access. The most...
Token Probabilities as Zero-Shot Rewards Hit 0.95 Correlation
February 25, 2026
An LLM builds an internal "world model" of kernel behavior to plan optimization paths. On complex kernels like MoE, it runs 14x faster than evolutionary...
74% of Agent Coordination May Be Wasted Effort
February 24, 2026
74% of enterprise workflow tasks don't need inter-agent coordination. Monotonicity analysis provides a formal test: if merging sub-results can't make things...
Model Folding Beats Pruning, XR Gets Hand-Level Control
February 23, 2026
Weight folding outperforms pruning at most compression rates. ICLR 2026 work proves folding yields lower reconstruction error and validates across 1,000+...