Mikhail Doroshenko

April 2, 2026

AI Benchmark Digest — 2026-04-02

Last 24 Hours

Several AI benchmark leaderboards changed notably in the last 24 hours. New models debuted near the top of the strategic-play, reasoning, and vision rankings, and the creative-coding leaderboard has a new leader.

  • Next-Gen Dominance in Strategic Play: GPT-5.4 debuted at #1 on the Kaggle Game Arena Werewolf leaderboard with an Equilibrium Rating of 0.00078, displacing the previous leader, Gemini 3 Pro Preview. It finished narrowly ahead of Gemini 3.1 Pro Preview, which placed #2 with a rating of -0.003776. The result suggests stronger social-deduction and game-theoretic play at the top of the field.
  • Reasoning and Vision Gains: trinity-large-thinking entered PinchBench at #2 with a 91.92% success rate. Qwen3-VL-Plus also showed strong multimodal performance, debuting at #3 on the IDP Leaderboard with an average score of 77.9%.
  • Shift in Creative Coding: claude-opus-4.1 (1655.0 Elo) overtook gemini-3-pro-preview to take the top spot on ASCIIBench, which may reflect improved spatial reasoning and character-based rendering.

Last 7 Days

No significant benchmark changes in the last 7 days.


