AI Benchmark Digest — 2026-04-23
=== DAILY ===

NEW BENCHMARKS (1)
- MathArena - ARXIVLEAN March (Accuracy (%)): leader GPT-5.4 (xhigh) (17.07), 6 models
NEW MODELS (3)
- MiMo-V2.5-Pro: Elo 1927, #33/898 (above: GPT-5.2 Codex (xHigh), below: Muse Spark)
- MediKAI: Elo 1249, #801/898 (above: InstructBLIP-7B, below: LLaVA-v1-7B)
- EMO-2B: Elo 1141, #859/898 (above: Qwen-1.8B, below: gpt-neox-20B)
NEW #1 LEADERS (4)
- Design Arena (Image) (Elo): gpt-image-2 (1404.0) beat chestnut (1329.0) by 75.0
- Design Arena (Image Editing) (Elo): gpt-image-2 (1378.0) beat hazel (1327.0) by 51.0
- Chatbot Arena (Document) (Elo): claude-opus-4-6-thinking (1528.0) beat claude-opus-4-7 (1521.0) by 7.0
- Design Arena (Website) (Elo): claude-opus-4-7-thinking (1355.0) beat claude-opus-4-6 (1349.0) by 6.0