AI Benchmark Digest — 2026-04-06
=== DAILY ===

NEW BENCHMARKS (2)
- MathArena - ARXIV_FALSE March (Accuracy (%)): leader GPT-5.4 (xhigh) (36.61), 5 models
- MathArena - ARXIV March (Accuracy (%)): leader Gemini 3.1 Pro Preview (66.13), 5 models

NEW MODELS (1)
- qwen3.6-plus_free — ELO 1698, #122/778 (above: DeepSeek R1 0528, below: Qwen 3.5 35B)

NEW #1 LEADERS (7)
- Kaggle Game Arena Chess (Elo Rating): Gemini 3.1 Pro Preview (1414.95) beat Gemini 3 Pro Preview (1314.15) by 100.8
- Design Arena (Slides) (Elo): claude-pptx-opus (1269.0) beat gamma (1251.0) by 18.0
- Design Arena (Website) (Elo): glm-5-turbo (1373.0) beat claude-opus-4-6 (1364.0) by 9.0
- LLM Stats (Multi-Challenge) (Score (%)): GPT-5 (69.6) beat Qwen3.5-397B-A17B (67.6) by 2.0
- GACL - WordMatrix (Normalized Score (0-100)): gpt-5.3-codex (73.94) beat gemini-3.1-pro-preview (72.68) by 1.26
- Design Arena (Game Dev) (Elo): glm-5-turbo (1373.0) beat claude-opus-4-6-thinking (1372.0) by 1.0
- LiveBench AMPS Hard (Score): gemma-4-31b-it (100.0) beat claude-opus-4-5-20251101-thinking-64k-high-effort (99.0) by 1.0