Donald Knuth's Open Problem Fell to Claude in Under an Hour
1. Donald Knuth Spent Weeks on an Open Problem. Claude Solved It in an Hour. "Shock! Shock!" Donald Knuth wrote on February 28.
2. Code Agents Ace the Easy Test. The Hard One Breaks Them. Alibaba's Qwen team released Qwen3-Coder-Next this week, an 80-billion-parameter coding agent that activates only 3 billion parameters at inference time.
3. Simon Willison Publishes an Agentic Engineering Field Guide, Starting With What Not to Do The first rule of Simon Willison's new agentic engineering guide isn't a pattern. It's a warning: stop filing pull requests full of code you never read.
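The sparse-activation figure in item 2 (80B total, 3B active) is the standard mixture-of-experts pattern: a learned router picks a few experts per token, so only a small fraction of the model's parameters run on any given forward pass. Here is a minimal, illustrative top-k routing sketch — the expert count, dimensions, and function names are assumptions for demonstration, not Qwen3-Coder-Next's actual architecture:

```python
import numpy as np

def topk_moe(x, gate_w, experts, k=2):
    """Route one token vector x through the top-k of len(experts) experts.

    gate_w: (d, n) router weight matrix; experts: list of n callables.
    Only k experts execute, so active compute scales with k/n of the total.
    """
    logits = x @ gate_w                       # (n,) router scores
    top = np.argsort(logits)[-k:]             # indices of the k best experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                  # softmax over selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy example: 8 experts, only 2 active per token
rng = np.random.default_rng(0)
d, n = 16, 8
gate_w = rng.normal(size=(d, n))
experts = [(lambda W: (lambda x: x @ W))(rng.normal(size=(d, d)))
           for _ in range(n)]
out = topk_moe(rng.normal(size=d), gate_w, experts, k=2)
print(out.shape)  # (16,)
```

With k=2 of 8 experts active, roughly a quarter of the expert parameters participate per token, which is the same mechanism behind the 3B-of-80B ratio.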
In Brief
- Kuaishou Releases Kling-MotionControl for Character Animation Kuaishou published a DiT-based framework that transfers motion from a driving video to a reference image, producing full-body character animation. The system handles face, hands, and body motion through separate control modules unified in a single pipeline. HuggingFace
- Researchers Map the Design Space for Native Multimodal Pretraining A team ran controlled from-scratch pretraining experiments isolating factors that govern multimodal learning without prior language pretraining. The study uses the Transfusion framework — next-token prediction for language, diffusion for vision — to identify which design choices actually matter. HuggingFace
- PRISM Uses Process Reward Models to Fix Deep Reasoning Failures A new inference framework addresses a core flaw in deep-thinking methods: longer deliberation often amplifies errors rather than correcting them. PRISM injects correctness signals during inference through process reward models, preventing wrong candidates from dominating the solution pool. HuggingFace
- Utonia Trains One Point Cloud Encoder Across Five Domains A self-supervised point transformer encoder learns shared representations across remote sensing, outdoor LiDAR, indoor RGB-D, CAD models, and RGB-lifted point clouds. Despite radically different sensing geometries and densities, the single model matches or beats domain-specific encoders. HuggingFace
- UniG2U-Bench Tests Whether Generation Actually Helps Understanding A new benchmark systematically evaluates when multimodal generation improves comprehension, covering 7 regimes and 30 subtasks. The framework requires varying degrees of visual transformation, filling a gap left by benchmarks that test generation and understanding separately. HuggingFace
- SteerEval Benchmarks LLM Controllability Across Language, Sentiment, and Personality Researchers introduced a hierarchical evaluation framework testing how well LLMs follow behavioral specifications at three levels: what to express, how to express it, and how to instantiate it. The benchmark targets deployment in socially sensitive domains where unpredictable output poses direct risk. HuggingFace
- Mix-GRM Separates Breadth and Depth in Chain-of-Thought Reward Models Current generative reward models scale reasoning by making chains of thought longer, ignoring that breadth (multi-dimensional coverage) and depth (judgment soundness) serve different functions. Mix-GRM structures these two mechanisms separately, improving evaluation reliability without relying on raw length. HuggingFace
- InSight Replaces Difficulty Heuristics with Information-Guided Data Selection for RL Standard RL training for LLMs picks data by difficulty — favoring mid-range success rates — which confuses hard problems with informative ones. InSight uses weighted mutual information to select training samples, accounting for epistemic uncertainty from limited evidence. HuggingFace
- Kiwi-Edit Adds Reference Image Guidance to Video Editing A new pipeline generates high-quality paired training data to enable reference-guided video editing, bypassing the bottleneck of scarce training pairs. The system accepts both text instructions and reference images, giving editors precise visual control that language alone cannot specify. HuggingFace
- Adaptive Test-Time Scaling Tackles Image Editing Efficiency Image Chain-of-Thought methods improve generation quality by extending inference time but waste compute on editing tasks where the solution space is already constrained. This work introduces adaptive sampling budgets and early-stage verification tuned specifically for instruction-based image editing. HuggingFace
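The PRISM item above hinges on one idea: score intermediate reasoning steps as they are produced, and prune candidates whose step scores lag, so that wrong chains cannot dominate the pool as deliberation lengthens. A generic sketch of that step-level filtering — the scoring function, pool size, and names here are placeholders, not PRISM's actual method:

```python
def prm_filtered_search(candidates, score_step, keep=4):
    """Prune reasoning candidates using per-step process-reward scores.

    candidates: list of step sequences; score_step: maps a partial chain
    to a score in [0, 1]. After each step, only the `keep` best partial
    chains survive, instead of judging completed chains at the end.
    """
    max_len = max(len(c) for c in candidates)
    pool = list(candidates)
    for t in range(1, max_len + 1):
        scored = [(score_step(c[:t]), c) for c in pool]
        scored.sort(key=lambda p: p[0], reverse=True)  # stable: ties keep order
        pool = [c for _, c in scored[:keep]]
    return pool[0]  # highest-scoring surviving chain

# Toy run: the reward model prefers chains whose steps are non-decreasing
cands = [[1, 2, 3], [3, 1, 0], [1, 2, 5], [2, 2, 2]]
best = prm_filtered_search(
    cands,
    score_step=lambda s: sum(b >= a for a, b in zip(s, s[1:])) / max(len(s) - 1, 1),
    keep=2,
)
print(best)  # [1, 2, 3]
```

The contrast with plain best-of-n sampling is that filtering happens during generation, which is where the brief says deep-thinking methods otherwise amplify early errors.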