Archive (Page 2) • AI Research Brief • Buttondown

Open Omni Hits Flagship Scale, Self-Judge Breaks, Reasoning Leaks Forgotten Facts

April 20, 2026

Open omni finally hits closed-flagship scale. Qwen3.5-Omni pushes its parameter count into the tens of billions with 256k context and MoE, targeting latency,...

Compile the Corpus Into a Skill Tree, Train Surrogates on Logs

April 18, 2026

RAG shifts from "retrieve-consume" to "walk-and-drill." Corpus2Skill compiles the entire corpus offline into a hierarchical skill tree; the agent drills down...

Tencent Open-Sources 3D World Generation, VLM Modal Bias Probe

April 17, 2026

Tencent HY-World 2.0 ships 3D world generation as a four-stage pipeline (panorama → trajectory → view expansion → multi-view synthesis), turning text or a...

Big Models Resist Rumors but Fall for Noise

April 16, 2026

Agent failures split into two measurable error modes: locking onto one path (over-exploit) and wandering without direction (over-explore), which can be separated by...

VLMs Break When You Change the Rules

April 15, 2026

VLMs Read the Board but Can't Follow Alternative Rules. 14 models on identical endgame images score consistently higher under standard rules than inverted...

dLLMs Hallucinate Differently, PRM Labeling Cost Drops 100x

April 14, 2026

dLLMs hallucinate in fundamentally different ways than autoregressive models. The first controlled comparison identifies three unique failure modes...

SFT Convergence Hides Failures, Attention Hijacking Hits 94%

April 14, 2026

SFT loss convergence doesn't mean the model learned everything. Five systematic failure modes reproduced across three model families show that aggregate...

DMax Triples Parallel Decoding Efficiency for Diffusion LMs

April 12, 2026

Tencent unifies robot perception and planning in a single VLM. They release both a 2B on-device model and a 32B reasoning model, calling into question...

Scrambled Media Boosts Reasoning; 6B Model Tops GPT-4o

April 11, 2026

Agent Skills Should Self-Evolve From User Populations. SkillClaw turns multi-user interaction traces into skill evolution signals. One user's correction...

1.7x Faster From Fine-Tuning Alone, Token Collapse Misdiagnosed

April 10, 2026

Fine-tuning alone teaches LLMs to output multiple tokens per step. MARS needs no architecture changes and no extra parameters. Qwen2.5-7B hits 1.71x wall-...

Entropy Is Lying to You, Implicit Reasoning Tops Out at 7 Steps

April 9, 2026

Stable entropy doesn't mean healthy reasoning. RAGEN-2 exposes "template collapse" in agentic RL: models learn fixed templates for all inputs while entropy...

120B on One GPU, and 40% of Video Benchmarks Are Guessable

April 8, 2026

Single GPU Trains 120B at Full Precision, 1.84x Faster Than DeepSpeed. MegaTrain demotes the GPU to a transient compute engine, storing all parameters in CPU...

Streaming Video QA Hits 2 FPS, RLVR Shrugs Off Noisy Labels

April 8, 2026

VideoLLM achieves 2 FPS streaming video QA. AURA unifies continuous perception and proactive response in one end-to-end architecture, with ASR+TTS integrated...

Learned Sparsity Cuts Diffusion Inference Compute by 54%

April 7, 2026

Learned sparsity cuts diffusion inference compute by 54% with no quality loss. DiffSparse trains a lightweight predictor to decide per-layer, per-step token...

Open-Source 32B Cracks Hardware Code, Agents Score Just 23%

April 6, 2026

Open-Source 32B Reaches Top Tier for Hardware Code Debugging. InCoder distills reasoning chains from engineers' actual error-fix cycles. It ranks among the...

Agents Hit 23% on Hard Tasks, CLIP's Three-Year Path Dependency

April 6, 2026

Error-Driven Chain-of-Thought Synthesis Fills the Industrial Code Reasoning Data Gap. InCoder generates reasoning traces through multi-turn error feedback...

4M Game Frames Train Rendering, Internalized Skills Beat Retrieval

April 4, 2026

Discrete Tokens Are LLMs' Architectural Ceiling, Not an Optimization Target. A survey traces four technical threads showing core computation migrating from...

Single Neurons Remember Entities, Reusable Routines Boost 19%

April 3, 2026

Single MLP Neurons Can Trigger Entity-Level "Amnesia." Google verified causal links across 200 entities — knowledge editing may shift from broad surgery to...

Minimalist Agents Match MCP, Code Models Think Mid-Stream

April 3, 2026

A Terminal-Only Agent Matches Fully Equipped MCP Setups. 72 Hugging Face upvotes confirm practitioners' collective anxiety about agent over-engineering is real, but...

Data Mixing Becomes Post-Training, Surface Cues Hijack Reasoning 38x

April 1, 2026

Data mixing ratios move from pre-training hyperparameter to post-training optimization. OptiMer trains per-dataset models, then searches for optimal merge...