Mikhail Doroshenko

April 5, 2026

AI Benchmark Digest — 2026-04-05

=== DAILY ===

NEW BENCHMARKS (2)
- MathArena - ARXIV_FALSE March (Accuracy (%)): leader GPT-5.4 (xhigh) (36.61), 5 models
- MathArena - ARXIV March (Accuracy (%)): leader Gemini 3.1 Pro Preview (66.13), 5 models

NEW MODELS (11)
- qwen3.6-plus_free ELO 1698 #122/778 (above: DeepSeek R1 0528, below: Qwen 3.5 35B)
- Arcee Trinity Large Thinking
- DR_agent
- GPT-5.2 (xhigh reasoning)
- GPT-5.4 (xhigh reasoning)
- GPT-5.4 Mini (xhigh reasoning)
- Gemma 4 31B IT
- gemma-4-26B-A4B-it
- gemma-4-31B-it
- gemma-4-31b-it
- z-3510-cd

NEW #1 LEADERS (3)
- LLM Stats (Multi-Challenge) (Score (%)): GPT-5 (69.6) beat Qwen3.5-397B-A17B (67.6) by 2.0
- GACL - WordMatrix (Normalized Score (0-100)): gpt-5.3-codex (73.94) beat gemini-3.1-pro-preview (72.68) by 1.26
- LiveBench AMPS Hard (Score): gemma-4-31b-it (100.0) beat claude-opus-4-5-20251101-thinking-64k-high-effort (99.0) by 1.0

