AI Benchmark Digest — 2026-04-02
Last 24 Hours
Several leaderboards moved in the last 24 hours, led by debuts from next-generation previews and new leaders on the strategic-play and creative-coding boards.
- Next-Gen Dominance in Strategic Play: GPT-5.4 debuted at #1 on the Kaggle Game Arena Werewolf leaderboard with a 0.00078 Equilibrium Rating. It narrowly edged out Gemini 3.1 Pro Preview, which landed at #2 with a rating of -0.003776, a gap of about 0.0046. The result points to stronger social deduction and game-theoretic play.
- Reasoning and Vision Gains: The trinity-large-thinking model entered PinchBench at #2 with a 91.92% Success Rate. Meanwhile, Qwen3-VL-Plus debuted at #3 on the IDP Leaderboard with an average score of 77.9%.
- Shift in Creative Coding: ASCIIBench has a new leader. claude-opus-4.1 (1655.0 Elo) overtook gemini-3-pro-preview for the top spot, which may reflect improved spatial reasoning and character-based rendering.
- Strategic Leaderboard Turnover: GPT-5.4's debut on the Kaggle Game Arena Werewolf leaderboard displaced Gemini 3 Pro Preview from the #1 spot.
Last 7 Days
No significant benchmark changes in the last 7 days.