AI Benchmark Digest — 2026-06-11
=== DAILY ===

NEW BENCHMARKS (1)
- GDPval-AA (Elo): leader Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) at 1932.0; 390 models ranked

NEW SCORES FROM TOP-10 MODELS (3)
- Claude Fable 5 on Chatbot Arena (Document): 1495.0 Elo (#5/29)
- Claude Fable 5 on Chatbot Arena (Vision): 1307.0 Arena Score (#2/131)
- Claude Fable 5 on React Native Evals: 86.96 Overall Score (%) (#4/28)

NEW #1 LEADERS (12)
- PACT (Lechmazur) (CMS Points): Claude Fable 5 (High) (2171.0) beat GPT-5.5 (High) (2016.0) by 155.0
- Chatbot Arena (Code) (Elo): Claude Fable 5 (1665.0) beat Claude Opus 4.7 (Thinking) (1567.0) by 98.0
- Design Arena (Data Viz) (Elo): Claude Fable 5 (1406.0) beat Claude Opus 4.7 (Thinking) (1338.0) by 68.0
- Design Arena (Website) (Elo): Claude Fable 5 (1364.0) beat Claude Opus 4.6 (1341.0) by 23.0
- Design Arena (3D) (Elo): Claude Fable 5 (1383.0) beat Kimi K2.6 (1366.0) by 17.0
- FrontierSWE (Dominance (%)): Claude Fable 5 (90.0) beat Claude Opus 4.8 (83.0) by 7.0
- Chatbot Arena (Text) (Elo): Claude Fable 5 (1510.0) beat Claude Opus 4.6 (Thinking) (1504.0) by 6.0
- SimpleBench (Score (AVG@5)): Claude Fable (81.9) beat Gemini 3.1 Pro (Preview) (79.6) by 2.3
- UGI - Writing (Writing Score): Claude 5 (74.23) beat Gemini 3.5 Flash (Thinking, Medium) (72.54) by 1.69
- EQ-Bench Longform Writing (Writing Score (0-100)): Claude Fable 5 (83.0) beat Claude Opus 4.7 (81.8) by 1.2
- LLM Stats (Video-MME) (Score (%)): MiMo-V2.5 (87.7) beat Kimi K2.5 (87.4) by 0.3
- LLM Stats (CMMLU) (Score (%)): MiMo-V2.5-Pro (90.2) beat Qwen 2 72B Instruct (90.1) by 0.1