AI Benchmark Digest — 2026-04-08
=== DAILY ===

NEW MODELS (9)
- GLM-5.1 — ELO 1916, #14/789 (above: GPT-5.2 (Medium), below: GPT-5.2 Pro)
- GLM 5.1 — ELO 1862, #30/789 (above: GPT-5 (High), below: GPT-5.4)
- GPT-5.1 Codex (High) — ELO 1831, #44/789 (above: nova-2-lite-v1, below: Kimi K2.5 (Thinking))
- Kimi K2.5 (Thinking) — ELO 1828, #45/789 (above: GPT-5.1 Codex (High), below: GPT-5.1 (Thinking))
- MiMo-V2-Omni-0327 — ELO 1803, #54/789 (above: O3 (Medium), below: GPT-5.4 Mini (High))
- Kimi K2 (Thinking) — ELO 1755, #83/789 (above: Claude Sonnet 4.5, below: DeepSeek V3.2 (Thinking))
- Qwen 3.5 35B A3B — ELO 1743, #94/789 (above: GPT-5.1 (Low), below: DeepSeek V3.1 (Thinking))
- Qwen 3.5 27B — ELO 1735, #101/789 (above: GLM-4.7, below: GPT-5.3 Instant)
- Qwen3.5 Omni Flash — ELO 1629, #194/789 (above: Qwen3-30B-A3B-2507-Think, below: Qwen 3 235B A22B 2507)
NEW #1 LEADERS (6)
- Design Arena (Video Editing) (Elo): wan-v2.7-v2v (1330.0) beat libra (1258.0) by 72.0
- Chatbot Arena (Text-to-Video) (Elo): dreamina-seedance-2.0-720p (1450.0) beat veo-3.1-audio-1080p (1381.0) by 69.0
- Chatbot Arena (Image-to-Video) (Elo): dreamina-seedance-2.0-720p (1449.0) beat grok-imagine-video-720p (1404.0) by 45.0
- Design Arena (UI Components) (Elo): coconut (1433.0) beat claude-opus-4-6 (1399.0) by 34.0
- ASCIIBench (ELO Rating): claude-opus-4.1 (1666.0) beat claude-sonnet-4.5 (1660.0) by 6.0
- Big-Bench Hard (Average (%)): gemini-1.5-pro-001 (89.2) beat DeepSeek-V3 (87.5) by 1.7
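For intuition, the Elo margins above can be converted into expected head-to-head win probabilities. A minimal sketch, assuming the arenas use the standard logistic Elo model with the conventional 400-point scale (the ratings below are taken from the leader list; the formula is the textbook Elo expectation, not anything published by the arenas themselves):

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected win probability of A over B under the standard
    logistic Elo model with a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Margins from today's NEW #1 LEADERS, as expected win probabilities.
matchups = {
    "wan-v2.7-v2v vs libra (+72)": (1330.0, 1258.0),
    "dreamina-seedance-2.0-720p vs veo-3.1-audio-1080p (+69)": (1450.0, 1381.0),
    "claude-opus-4.1 vs claude-sonnet-4.5 (+6)": (1666.0, 1660.0),
}
for name, (ra, rb) in matchups.items():
    print(f"{name}: {elo_expected_score(ra, rb):.1%}")
# A 72-point gap is roughly a 60% expected win rate; a 6-point
# gap (ASCIIBench) is a near coin flip at about 51%.
```

This is why a 6.0-point lead, as in ASCIIBench, is far less decisive than a 72.0-point one: Elo gaps map nonlinearly onto win probability.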