Mikhail Doroshenko

April 18, 2026

AI Benchmark Digest — 2026-04-18

=== DAILY ===

NEW BENCHMARKS (4)
- GameWorld Generalist (Progress (%)): leader Gemini-3-Flash-Preview (41.9), 10 models
- GameWorld Computer-Use (Progress (%)): leader Seed-1.8 (39.8), 8 models
- LLM Stats (CyberGym) (Score (%)): leader Claude Mythos Preview (83.1), 5 models
- PickMeBench (Win Rate (%)): leader claude-opus-4-6 (78.0), 6 models

NEW #1 LEADERS (5)
- AA GDPval (ELO): Claude Opus 4.7 (max) (1752.68) beat GPT-5.4 (xhigh) (1673.63) by 79.05
- Chatbot Arena (Code) (Elo): claude-opus-4-7 (1583.0) beat claude-opus-4-6-thinking (1548.0) by 35.0
- Chatbot Arena (Text) (Elo): claude-opus-4-7-thinking (1505.0) beat claude-opus-4-6-thinking (1502.0) by 3.0
- Open Arabic LLM Leaderboard (Average Score (%)): Qwen3-8B-SFT-V2 (80.49) beat Karnak (79.37) by 1.12
- Artificial Analysis Intelligence Index (Intelligence Index): Claude Opus 4.7 (max) (57.28) beat Gemini 3.1 Pro Preview (57.18) by 0.1
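
Each "beat ... by" figure in the leaders list is the new leader's score minus the previous leader's score. A minimal sketch for re-deriving those margins is shown here; the `leaders` list and the `margin` helper are illustrative names, not part of any AI Benchmark Hub tooling, and the scores are copied by hand from the list above.

```python
from decimal import Decimal

# (benchmark, new leader score, previous leader score), copied from the digest.
# Scores are kept as strings so Decimal avoids binary floating-point rounding.
leaders = [
    ("AA GDPval", "1752.68", "1673.63"),
    ("Chatbot Arena (Code)", "1583.0", "1548.0"),
    ("Chatbot Arena (Text)", "1505.0", "1502.0"),
    ("Open Arabic LLM Leaderboard", "80.49", "79.37"),
    ("Artificial Analysis Intelligence Index", "57.28", "57.18"),
]


def margin(new_score: str, old_score: str) -> Decimal:
    """Return the lead of the new #1 over the previous #1 (hypothetical helper)."""
    return Decimal(new_score) - Decimal(old_score)


margins = {name: margin(new, old) for name, new, old in leaders}

for name, lead in margins.items():
    print(f"{name}: +{lead}")
```

The margins are in each benchmark's own units (Elo points, percentage points, index points), so they are not comparable across benchmarks.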

