Donald Knuth's Open Problem Fell to Claude in Under an Hour
1. Donald Knuth Spent Weeks on an Open Problem. Claude Solved It in an Hour. "Shock! Shock!" Donald Knuth wrote on February 28.
2. Code Agents Ace the Easy Test. The Hard One Breaks Them. Alibaba's Qwen team released Qwen3-Coder-Next this week, an 80-billion-parameter coding agent that activates only 3 billion parameters at inference time.
3. Simon Willison Publishes an Agentic Engineering Field Guide, Starting With What Not to Do The first rule of Simon Willison's new agentic engineering guide isn't a pattern. It's a warning: stop filing pull requests full of code you never read.
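The sparse-activation figure in item 2 (80B total, 3B active) is the standard mixture-of-experts pattern: a learned router picks a few experts per token, so only a small fraction of the model's parameters run on any given forward pass. Here is a minimal, illustrative top-k routing sketch — the expert count, dimensions, and function names are assumptions for demonstration, not Qwen3-Coder-Next's actual architecture:

```python
import numpy as np

def topk_moe(x, gate_w, experts, k=2):
    """Route one token vector x through the top-k of len(experts) experts.

    gate_w: (d, n) router weight matrix; experts: list of n callables.
    Only k experts execute, so active compute scales with k/n of the total.
    """
    logits = x @ gate_w                       # (n,) router scores
    top = np.argsort(logits)[-k:]             # indices of the k best experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                  # softmax over selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy example: 8 experts, only 2 active per token
rng = np.random.default_rng(0)
d, n = 16, 8
gate_w = rng.normal(size=(d, n))
experts = [(lambda W: (lambda x: x @ W))(rng.normal(size=(d, d)))
           for _ in range(n)]
out = topk_moe(rng.normal(size=d), gate_w, experts, k=2)
print(out.shape)  # (16,)
```

With k=2 of 8 experts active, roughly a quarter of the expert parameters participate per token, which is the same mechanism behind the 3B-of-80B ratio.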
In Brief
- Kuaishou Releases Kling-MotionControl for Character Animation Kuaishou published a DiT-based framework that transfers motion from a driving video to a reference image, producing full-body character animation. The system handles face, hands, and body motion through separate control modules unified in a single pipeline. HuggingFace
- Researchers Map the Design Space for Native Multimodal Pretraining A team ran controlled from-scratch pretraining experiments isolating factors that govern multimodal learning without prior language pretraining. The study uses the Transfusion framework — next-token prediction for language, diffusion for vision — to identify which design choices actually matter. HuggingFace
- PRISM Uses Process Reward Models to Fix Deep Reasoning Failures A new inference framework addresses a core flaw in deep-thinking methods: longer deliberation often amplifies errors rather than correcting them. PRISM injects correctness signals during inference through process reward models, preventing wrong candidates from dominating the solution pool. HuggingFace
- Utonia Trains One Point Cloud Encoder Across Five Domains A self-supervised point transformer encoder learns shared representations across remote sensing, outdoor LiDAR, indoor RGB-D, CAD models, and RGB-lifted point clouds. Despite radically different sensing geometries and densities, the single model matches or beats domain-specific encoders. HuggingFace
- UniG2U-Bench Tests Whether Generation Actually Helps Understanding A new benchmark systematically evaluates when multimodal generation improves comprehension, covering 7 regimes and 30 subtasks. The framework requires varying degrees of visual transformation, filling a gap left by benchmarks that test generation and understanding separately. HuggingFace
- SteerEval Benchmarks LLM Controllability Across Language, Sentiment, and Personality Researchers introduced a hierarchical evaluation framework testing how well LLMs follow behavioral specifications at three levels: what to express, how to express it, and how to instantiate it. The benchmark targets deployment in socially sensitive domains where unpredictable output poses direct risk. HuggingFace
- Mix-GRM Separates Breadth and Depth in Chain-of-Thought Reward Models Current generative reward models scale reasoning by making chains of thought longer, ignoring that breadth (multi-dimensional coverage) and depth (judgment soundness) serve different functions. Mix-GRM structures these two mechanisms separately, improving evaluation reliability without relying on raw length. HuggingFace
- InSight Replaces Difficulty Heuristics with Information-Guided Data Selection for RL Standard RL training for LLMs picks data by difficulty — favoring mid-range success rates — which confuses hard problems with informative ones. InSight uses weighted mutual information to select training samples, accounting for epistemic uncertainty from limited evidence. HuggingFace
- Kiwi-Edit Adds Reference Image Guidance to Video Editing A new pipeline generates high-quality paired training data to enable reference-guided video editing, bypassing the bottleneck of scarce training pairs. The system accepts both text instructions and reference images, giving editors precise visual control that language alone cannot specify. HuggingFace
- Adaptive Test-Time Scaling Tackles Image Editing Efficiency Image Chain-of-Thought methods improve generation quality by extending inference time but waste compute on editing tasks where the solution space is already constrained. This work introduces adaptive sampling budgets and early-stage verification tuned specifically for instruction-based image editing. HuggingFace
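The PRISM item above hinges on one idea: score intermediate reasoning steps as they are produced, and prune candidates whose step scores lag, so that wrong chains cannot dominate the pool as deliberation lengthens. A generic sketch of that step-level filtering — the scoring function, pool size, and names here are placeholders, not PRISM's actual method:

```python
def prm_filtered_search(candidates, score_step, keep=4):
    """Prune reasoning candidates using per-step process-reward scores.

    candidates: list of step sequences; score_step: maps a partial chain
    to a score in [0, 1]. After each step, only the `keep` best partial
    chains survive, instead of judging completed chains at the end.
    """
    max_len = max(len(c) for c in candidates)
    pool = list(candidates)
    for t in range(1, max_len + 1):
        scored = [(score_step(c[:t]), c) for c in pool]
        scored.sort(key=lambda p: p[0], reverse=True)  # stable: ties keep order
        pool = [c for _, c in scored[:keep]]
    return pool[0]  # highest-scoring surviving chain

# Toy run: the reward model prefers chains whose steps are non-decreasing
cands = [[1, 2, 3], [3, 1, 0], [1, 2, 5], [2, 2, 2]]
best = prm_filtered_search(
    cands,
    score_step=lambda s: sum(b >= a for a, b in zip(s, s[1:])) / max(len(s) - 1, 1),
    keep=2,
)
print(best)  # [1, 2, 3]
```

The contrast with plain best-of-n sampling is that filtering happens during generation, which is where the brief says deep-thinking methods otherwise amplify early errors.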