AI Benchmark Digest — 2026-04-19
=== DAILY ===
NEW BENCHMARKS (5)
- FoodTruckBench (Net Worth ($)): leader Claude Opus 4.6 (49519.0), 24 models
- LLM Stats (Claw-Eval) (Score (%)): leader GLM-5V-Turbo (75.0), 5 models
- LLM Stats (EmbSpatialBench) (Score (%)): leader Qwen3.5-27B (84.5), 5 models
- LLM Stats (RefSpatialBench) (Score (%)): leader Qwen3 VL 235B A22B Thinking (69.9), 5 models
- LLM Stats (ZEROBench-Sub) (Score (%)): leader Qwen3.5-122B-A10B (36.2), 5 models
NEW MODELS (4)
- Collaiborator-MEDLLM-Llama-3-8B-v2-5: ELO 1356, #678/909 (above: Gemma 3n E4B Instructed LiteRT Preview, below: Collaiborator-MEDLLM-Llama-3-8B-v2-6)
- Collaiborator-MEDLLM-Llama-3-8B-v2-6: ELO 1356, #679/909 (above: Collaiborator-MEDLLM-Llama-3-8B-v2-5, below: JSL-MedMNX-7B)
- Collaiborator-MEDLLM-Llama-3-8B: ELO 1355, #681/909 (above: JSL-MedMNX-7B, below: Command R)
- ClinicalGPT-base-zh: ELO 1075, #889/909 (above: pythia-2.8B, below: pythia-2.8B-deduped)
NEW SCORES FROM TOP-10 MODELS (1)
- Claude Opus 4.7 on BenchTable: 83.3 Total Score (%) (#3/347)
NEW #1 LEADERS (4)
- Design Arena (Website) (Elo): claude-opus-4-7 (1355.0) beat claude-opus-4-6 (1352.0) by 3.0
- MathArena - SMT 2025 (Accuracy (%)): GPT-5.2 (xhigh) (95.71) beat Gemini 3 Pro (preview) (93.4) by 2.31
- Design Arena (UI Components) (Elo): claude-opus-4-7 (1391.0) beat claude-opus-4-6 (1389.0) by 2.0
- ASCIIBench (ELO Rating): claude-opus-4.1 (1680.0) beat gemini-3-pro-preview (1679.0) by 1.0
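
The "by" figures in the leaders list are the new leader's score minus the score of the model it beat. A minimal Python sketch of that arithmetic, using only the four entries listed above. The `margin` helper and the tuple layout are hypothetical, introduced here for illustration, and are not part of whatever tooling produces this digest. Rounding to two decimals is an assumption that matches the precision shown and avoids floating-point noise in differences such as 95.71 - 93.4.

```python
# Hypothetical helper: recomputes the "beat ... by N" margins from the digest.
# The record layout below is an assumption made for illustration only.

def margin(leader_score: float, beaten_score: float) -> float:
    """Return the leader's score minus the beaten model's score, rounded to 2 decimals."""
    return round(leader_score - beaten_score, 2)

# (benchmark, new leader, leader score, beaten model, beaten model's score)
NEW_LEADERS = [
    ("Design Arena (Website)", "claude-opus-4-7", 1355.0, "claude-opus-4-6", 1352.0),
    ("MathArena - SMT 2025", "GPT-5.2 (xhigh)", 95.71, "Gemini 3 Pro (preview)", 93.4),
    ("Design Arena (UI Components)", "claude-opus-4-7", 1391.0, "claude-opus-4-6", 1389.0),
    ("ASCIIBench", "claude-opus-4.1", 1680.0, "gemini-3-pro-preview", 1679.0),
]

# Print one line per benchmark in the same "beat ... by N" style as the digest.
for bench, leader, top, beaten, beaten_score in NEW_LEADERS:
    print(f"{bench}: {leader} beat {beaten} by {margin(top, beaten_score)}")
```

The scores within each pair are on the same scale, but the scales differ across benchmarks (Elo points versus accuracy percentage), so the margins are not comparable between entries.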