AI Benchmark Digest — 2026-04-01
Last 24 Hours
Several "next-generation" models took first place on vision, coding, and reasoning leaderboards over the past 24 hours.
Here are the most significant shifts in the AI landscape:
- Vision Supremacy Reclaimed: Claude Opus 4.6 moved to the top of the Chatbot Arena (Vision) leaderboard with an Arena Score of 1295.0, overtaking Gemini 3 Pro by 9 points.
- Perfect Reasoning Score: Gemini 3 Deep Think (Feb 2026) scored a perfect 100.0% on the IUMB benchmark, overtaking GPT-5.2 (xhigh).
- Engineering and Coding Shifts: Claude Sonnet 4.5 took the lead in the Kaggle WWTP Engineering challenge with a score of 84.62%, while Prism (1413.0 Elo) narrowly edged out Gemini 3.1 Pro Preview to lead the Design Arena (SVG).
- Spatial and Logic Gains: GPT-5.4 High claimed the top spot in CubeBench with a 66.7% success rate, while Gemini 3 Pro Preview (1653.0 Elo) moved into first place on ASCIIBench.
- Forecasting Dominance: The Cassi ensemble 2 crowdadj model took the lead in ForecastBench with an overall score of 67.9, a result consistent with ensemble methods continuing to outperform zero-shot approaches on this benchmark.
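For a sense of scale on the Elo-style margins above, here is a minimal sketch that converts a rating gap into an expected head-to-head win rate. It assumes the standard Elo logistic curve (base 10, 400-point scale); the digest does not state how each leaderboard computes its scores, so treat the function name and the scale as illustrative assumptions rather than any leaderboard's actual method.

```python
def expected_win_rate(score_gap: float, scale: float = 400.0) -> float:
    """Expected win probability for the higher-rated model.

    Assumes the standard Elo logistic curve; the 400-point scale is an
    assumption, not something the digest specifies.
    """
    return 1.0 / (1.0 + 10.0 ** (-score_gap / scale))


# 9-point margin reported for Claude Opus 4.6 over Gemini 3 Pro
gap = 9.0
print(f"{expected_win_rate(gap):.3f}")  # prints 0.513
```

Under these assumptions, a 9-point lead corresponds to winning roughly 51% of pairwise comparisons, so a first-place change by that margin reflects a small difference in head-to-head preference.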
Last 7 Days
No significant benchmark changes in the last 7 days.