AI Research Brief
Archives
Open Omni Hits Flagship Scale, Self-Judge Breaks, Reasoning Leaks Forgotten Facts
April 20, 2026
Open omni finally hits closed-flagship scale. Qwen3.5-Omni pushes its parameter count into the tens of billions with 256k context and MoE, targeting latency,...
Compile the Corpus Into a Skill Tree, Train Surrogates on Logs
April 18, 2026
RAG shifts from "retrieve-consume" to "walk-and-drill." Corpus2Skill compiles the entire corpus offline into a hierarchical skill tree; the agent drills down...
Tencent Open-Sources 3D World Generation, VLM Modal Bias Probe
April 17, 2026
Tencent HY-World 2.0 ships 3D world generation as a four-stage pipeline (panorama → trajectory → view expansion → multi-view synthesis), turning text or a...
Big Models Resist Rumors but Fall for Noise
April 16, 2026
Agent failures split into two measurable error modes: locking onto one path (over-exploit) and wandering without direction (over-explore) can be separated by...
VLMs Break When You Change the Rules
April 15, 2026
VLMs Read the Board but Can't Follow Alternative Rules. 14 models on identical endgame images score consistently higher under standard rules than inverted...
dLLMs Hallucinate Differently, PRM Labeling Cost Drops 100x
April 14, 2026
dLLMs hallucinate in fundamentally different ways than autoregressive models. The first controlled comparison identifies three unique failure modes...
SFT Convergence Hides Failures, Attention Hijacking Hits 94%
April 14, 2026
SFT loss convergence doesn't mean the model learned everything. Five systematic failure modes reproduced across three model families show that aggregate...
DMax Triples Parallel Decoding Efficiency for Diffusion LMs
April 12, 2026
Tencent unifies robot perception and planning in a single VLM. They release both a 2B on-device model and a 32B reasoning model, calling into question...
Scrambled Media Boosts Reasoning; 6B Model Tops GPT-4o
April 11, 2026
Agent Skills Should Self-Evolve From User Populations. SkillClaw turns multi-user interaction traces into skill evolution signals. One user's correction...
1.7x Faster From Fine-Tuning Alone, Token Collapse Misdiagnosed
April 10, 2026
Fine-tuning alone teaches LLMs to output multiple tokens per step. MARS needs no architecture changes and no extra parameters. Qwen2.5-7B hits 1.71x wall-...
Entropy Is Lying to You, Implicit Reasoning Tops Out at 7 Steps
April 9, 2026
Stable entropy doesn't mean healthy reasoning. RAGEN-2 exposes "template collapse" in agentic RL: models learn fixed templates for all inputs while entropy...
120B on One GPU, and 40% of Video Benchmarks Are Guessable
April 8, 2026
Single GPU Trains 120B at Full Precision, 1.84x Faster Than DeepSpeed. MegaTrain demotes the GPU to a transient compute engine, storing all parameters in CPU...
Streaming Video QA Hits 2 FPS, RLVR Shrugs Off Noisy Labels
April 8, 2026
VideoLLM achieves 2 FPS streaming video QA. AURA unifies continuous perception and proactive response in one end-to-end architecture, with ASR+TTS integrated...
Learned Sparsity Cuts Diffusion Inference Compute by 54%
April 7, 2026
Learned sparsity cuts diffusion inference compute by 54% with no quality loss. DiffSparse trains a lightweight predictor to decide per-layer, per-step token...
Open-Source 32B Cracks Hardware Code, Agents Score Just 23%
April 6, 2026
Open-Source 32B Reaches Top Tier for Hardware Code Debugging. InCoder distills reasoning chains from engineers' actual error-fix cycles. It ranks among the...
Agents Hit 23% on Hard Tasks, CLIP's Three-Year Path Dependency
April 6, 2026
Error-Driven Chain-of-Thought Synthesis Fills the Industrial Code Reasoning Data Gap. InCoder generates reasoning traces through multi-turn error feedback...
4M Game Frames Train Rendering, Internalized Skills Beat Retrieval
April 4, 2026
Discrete Tokens Are LLMs' Architectural Ceiling, Not an Optimization Target. A survey traces four technical threads showing core computation migrating from...
Single Neurons Remember Entities, Reusable Routines Boost 19%
April 3, 2026
Single MLP Neurons Can Trigger Entity-Level "Amnesia." Google verified causal links across 200 entities — knowledge editing may shift from broad surgery to...
Minimalist Agents Match MCP, Code Models Think Mid-Stream
April 3, 2026
A Terminal-Only Agent Matches Fully Equipped MCP Setups. 72 HF upvotes confirm practitioners' collective anxiety about agent over-engineering is real — but...
Data Mixing Becomes Post-Training, Surface Cues Hijack Reasoning 38x
April 1, 2026
Data mixing ratios move from pre-training hyperparameter to post-training optimization. OptiMer trains per-dataset models, then searches for optimal merge...