Mikhail Doroshenko

April 30, 2026

AI Benchmark Digest — 2026-04-30

=== DAILY ===

NEW MODELS (9)
- kimi-k2.6_nitro: ELO 1888, #57/1066 (above: Grok 4.20 0309 (Reasoning), below: Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort))
  GACL - WordMatrix: 66.72 (#2/21)
- Kimi K2.6 (Non-reasoning): ELO 1808, #106/1066 (above: GPT-5 Codex, below: O3 Pro)
- deepseek-v4-flash_nitro: ELO 1785, #126/1066 (above: DeepSeek V3.2 Speciale, below: O3)
- DeepSeek V4 Pro (Non-reasoning): ELO 1763, #148/1066 (above: Kimi K2 (Thinking), below: Gemini 3 Flash (Thinking))
- DeepSeek V4 Flash (Non-reasoning): ELO 1733, #187/1066 (above: Nova 2.0 Lite (High), below: Grok 4.1 Fast (Reasoning))
- MiMo-V2.5-Pro (Non-reasoning): ELO 1729, #190/1066 (above: DeepSeek V3.2 Exp (Thinking), below: O4 Mini (High))
- Granite 4.1 30B: ELO 1502, #485/1066 (above: K2-V2 (Low), below: InternVL3-8B)
- Granite 4.1 8B: ELO 1445, #600/1066 (above: Ovis1.5-Llama3-8B, below: Nanbeige4-3B-Thinking-2511)
- Granite 4.1 3B: ELO 1372, #742/1066 (above: Mini-InternVL-Chat-2B-V1.5, below: Gemma 3n E4B Instruct Preview (May '25))

NEW #1 LEADERS (1)
- GACL - Tic-Tac-Toe (Normalized Score (0-100)): claude-sonnet-4.6 (83.6) beat claude-opus-4.6 (63.14) by 20.46
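
For readers who want to check the lead margin on the Tic-Tac-Toe entry, here is a minimal sketch of the arithmetic. The function name `lead_margin` is hypothetical and not part of AI Benchmark Hub. It assumes the margin is simply the new leader's normalized score minus the previous leader's, rounded to two decimals, which matches the numbers reported here.

```python
def lead_margin(new_leader_score: float, previous_leader_score: float) -> float:
    """Return the gap between two normalized scores (0-100 scale).

    Hypothetical helper: assumes the digest reports a plain difference
    rounded to two decimal places.
    """
    # Rounding hides binary floating-point noise in the raw subtraction.
    return round(new_leader_score - previous_leader_score, 2)


# Scores taken from the Tic-Tac-Toe entry in this digest.
margin = lead_margin(83.6, 63.14)
print(f"claude-sonnet-4.6 leads claude-opus-4.6 by {margin:.2f}")
```

The rounding step matters only because a raw floating-point subtraction of these two values may not print as exactly 20.46.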
