AI Research Brief

April 8, 2026

120B on One GPU, and 40% of Video Benchmarks Are Guessable

  • Single GPU Trains 120B at Full Precision, 1.84x Faster Than DeepSpeed. MegaTrain demotes the GPU to a transient compute engine, keeping all parameters in CPU memory and streaming them to the device as needed. Pipeline double-buffering overlaps those transfers with compute, breaking the bandwidth bottleneck. Small teams should evaluate this single-machine route.
  • 40–60% of Video Understanding Questions Are Answerable Without Watching. Two independent papers expose models doing reading comprehension instead of video understanding. Filtering text bias and training on less data improves scores by 6.2 points.
  • Without Specifying Catalytic Sites, AI-Designed Enzymes Outperform Human-Engineered Ones. DISCO uses diffusion models to jointly generate sequences and 3D structures. Inference-time scaling extends the search into chemical space nature never explored.
  • Office Agents Hit 53–63% Success but 7–23% Unsafe Operations. Apple's ClawsBench exposes 8 systematic failure modes in high-fidelity multi-service workspaces. Stronger capability does not mean safer behavior.
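The double-buffering idea behind the single-GPU result is general: while the current layer computes, the next layer's parameters are fetched into a second buffer, so transfer time hides behind compute time. A minimal sketch of the pattern, with Python threads and toy lists standing in for CUDA streams and real parameter tensors (this is an illustration of the technique, not MegaTrain's implementation):

```python
import threading

# Toy "layers" whose parameters live in host (CPU) memory.
cpu_params = [[float(i)] * 4 for i in range(6)]

def fetch(layer_idx):
    # Stand-in for an async CPU->GPU copy of one layer's parameters.
    return list(cpu_params[layer_idx])

def compute(x, params):
    # Stand-in for the layer's forward pass.
    return x + sum(params)

def forward_double_buffered(x):
    buffers = [fetch(0), None]  # prefetch layer 0 into buffer 0
    for i in range(len(cpu_params)):
        nxt, t = [None], None
        if i + 1 < len(cpu_params):
            # Overlap: fetch layer i+1 while layer i is computing.
            t = threading.Thread(target=lambda j=i + 1: nxt.__setitem__(0, fetch(j)))
            t.start()
        x = compute(x, buffers[i % 2])  # consume the prefetched buffer
        if t:
            t.join()
            buffers[(i + 1) % 2] = nxt[0]
    return x

print(forward_double_buffered(0.0))  # -> 60.0 (sum of all toy layer params)
```

With two buffers alternating, the GPU never idles waiting on a transfer as long as each layer's compute takes at least as long as the next layer's copy.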

Also Notable

  • Google MedGemma 1.5: One Architecture for CT/MRI Volumes, Pathology Slides, and Longitudinal Chest X-Rays. — A 4B parameter model covers 3D imaging, pathology, and longitudinal comparison in a single architecture.
  • CMU Improves Sparse Memory Fine-Tuning: Absorbing New Knowledge Without Forgetting. — Standard fine-tuning causes catastrophic forgetting. Sparse memory methods find a better balance between knowledge injection and capability retention.
  • Controlling What Models Remember and Forget During Training, Not After. — ACL work. A training framework that directly regulates memorization behavior, offering new levers for privacy compliance and knowledge management.
  • Reliable Multi-Bit Watermarking: Traceable Binary Information in LLM Output. — ACL work. Unlike single-bit "is this AI-generated" detection, multi-bit watermarks tag specific sources and versions.
  • Multilingual Models Organize by Writing System, Not Linguistic Structure. — ACL work. Same language in a different script gets clustered separately, reframing how multilingual models organize internal representations.
  • Yale Multi-Agent Paper Writing: Tackling Shallow Literature Reviews. — Multiple specialized agents collaborate to improve literature coverage and review quality over single-agent approaches.
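The multi-bit watermarking item carries more information than a yes/no detector: each position's payload bit selects which keyed partition of the vocabulary the generator favors, and the extractor recovers the bits by checking partition membership. A toy end-to-end sketch under that generic scheme (the vocabulary, key, and deterministic token choice are illustrative assumptions, not the paper's method; a real LM would bias logits rather than pick tokens directly):

```python
import hashlib

VOCAB = [f"tok{i}" for i in range(32)]

def partition(seed, pos):
    # Pseudo-randomly split the vocab into two keyed halves per position.
    order = sorted(
        range(len(VOCAB)),
        key=lambda t: hashlib.sha256(f"{seed}:{pos}:{t}".encode()).digest(),
    )
    half = len(VOCAB) // 2
    return set(order[:half]), set(order[half:])

def embed(bits, seed="key"):
    # Toy "generation": at each position emit a token from the half
    # selected by the payload bit.
    out = []
    for pos, b in enumerate(bits):
        zero_half, one_half = partition(seed, pos)
        out.append(VOCAB[min(one_half if b else zero_half)])
    return out

def extract(tokens, seed="key"):
    # Recover each bit by testing which keyed half the token fell in.
    bits = []
    for pos, tok in enumerate(tokens):
        _, one_half = partition(seed, pos)
        bits.append(1 if VOCAB.index(tok) in one_half else 0)
    return bits

msg = [1, 0, 1, 1, 0, 0, 1]
assert extract(embed(msg)) == msg  # payload round-trips through the text
```

Because the partitions are keyed and position-dependent, an observer without the seed sees statistically ordinary token choices, while the key holder can read out source or version identifiers bit by bit.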

Read the full edition →
