AI Benchmark Digest — 2026-04-02

April 2, 2026

Last 24 Hours
Four leaderboards changed in the last 24 hours: new models entered Kaggle Game Arena Werewolf, PinchBench, and the IDP Leaderboard, and ASCIIBench has a new leader.

Next-Gen Dominance in Strategic Play: GPT-5.4 debuted at #1 on the Kaggle Game Arena Werewolf leaderboard with an Equilibrium Rating of 0.00078. It finished ahead of Gemini 3.1 Pro Preview, which landed at #2 with a rating of -0.003776, a gap of roughly 0.0046. The result points to strong social deduction and game-theoretic play from both previews.
Reasoning and Vision Gains: The trinity-large-thinking model entered PinchBench at #2 with a 91.92% Success Rate. Meanwhile, Qwen3-VL-Plus debuted at #3 on the IDP Leaderboard with an average score of 77.9%.
Shift in Creative Coding: claude-opus-4.1 (1655.0 ELO) overtook gemini-3-pro-preview to take the top spot on ASCIIBench, which may reflect stronger spatial reasoning and character-based rendering.
Strategic Leaderboard Turnover: On Kaggle Game Arena Werewolf, GPT-5.4 displaced the previous leader, Gemini 3 Pro Preview, from the #1 spot.

Last 7 Days
No significant benchmark changes in the last 7 days.
