AI Benchmark Digest — 2026-04-19


            
        April 19, 2026
    
    
=== DAILY ===
NEW BENCHMARKS (5)
  - FoodTruckBench (Net Worth ($)): leader Claude Opus 4.6 (49519.0), 24 models
  - LLM Stats (Claw-Eval) (Score (%)): leader GLM-5V-Turbo (75.0), 5 models
  - LLM Stats (EmbSpatialBench) (Score (%)): leader Qwen3.5-27B (84.5), 5 models
  - LLM Stats (RefSpatialBench) (Score (%)): leader Qwen3 VL 235B A22B Thinking (69.9), 5 models
  - LLM Stats (ZEROBench-Sub) (Score (%)): leader Qwen3.5-122B-A10B (36.2), 5 models
NEW MODELS (4)
  - Collaiborator-MEDLLM-Llama-3-8B-v2-5 — ELO 1356, #678/909 (above: Gemma 3n E4B Instructed LiteRT Preview, below: Collaiborator-MEDLLM-Llama-3-8B-v2-6)
  - Collaiborator-MEDLLM-Llama-3-8B-v2-6 — ELO 1356, #679/909 (above: Collaiborator-MEDLLM-Llama-3-8B-v2-5, below: JSL-MedMNX-7B)
  - Collaiborator-MEDLLM-Llama-3-8B — ELO 1355, #681/909 (above: JSL-MedMNX-7B, below: Command R)
  - ClinicalGPT-base-zh — ELO 1075, #889/909 (above: pythia-2.8B, below: pythia-2.8B-deduped)
NEW SCORES FROM TOP-10 MODELS (1)
  - Claude Opus 4.7 on BenchTable: 83.3 Total Score (%) (#3/347)
NEW #1 LEADERS (4)
  - Design Arena (Website) (Elo): claude-opus-4-7 (1355.0) beat claude-opus-4-6 (1352.0) by 3.0
  - MathArena - SMT 2025 (Accuracy (%)): GPT-5.2 (xhigh) (95.71) beat Gemini 3 Pro (preview) (93.4) by 2.31
  - Design Arena (UI Components) (Elo): claude-opus-4-7 (1391.0) beat claude-opus-4-6 (1389.0) by 2.0
  - ASCIIBench (ELO Rating): claude-opus-4.1 (1680.0) beat gemini-3-pro-preview (1679.0) by 1.0
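
The "by" margins in the leader entries above are the leader's score minus the runner-up's score. A minimal sketch for rechecking that arithmetic follows. It assumes the line layout shown in this digest; the regular expression and the `check_margin` helper are illustrative names and not part of any AI Benchmark Hub tooling.

```python
import re

# Entries copied verbatim from the "NEW #1 LEADERS" list in this digest.
LEADER_LINES = [
    "Design Arena (Website) (Elo): claude-opus-4-7 (1355.0) beat claude-opus-4-6 (1352.0) by 3.0",
    "MathArena - SMT 2025 (Accuracy (%)): GPT-5.2 (xhigh) (95.71) beat Gemini 3 Pro (preview) (93.4) by 2.31",
    "Design Arena (UI Components) (Elo): claude-opus-4-7 (1391.0) beat claude-opus-4-6 (1389.0) by 2.0",
    "ASCIIBench (ELO Rating): claude-opus-4.1 (1680.0) beat gemini-3-pro-preview (1679.0) by 1.0",
]

# Assumed layout: "(<leader score>) beat <runner-up name> (<runner-up score>) by <margin>".
# Only purely numeric parentheses are captured, so "(xhigh)" or "(preview)" are skipped.
PATTERN = re.compile(
    r"\((?P<leader>[\d.]+)\) beat .* \((?P<runner_up>[\d.]+)\) by (?P<margin>[\d.]+)$"
)


def check_margin(line):
    """Return (recomputed_margin, reported_margin) for one leader entry."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized leader entry: {line!r}")
    leader = float(match.group("leader"))
    runner_up = float(match.group("runner_up"))
    reported = float(match.group("margin"))
    # Round to two decimals to absorb floating-point noise (e.g. 95.71 - 93.4).
    return round(leader - runner_up, 2), reported


if __name__ == "__main__":
    for entry in LEADER_LINES:
        recomputed, reported = check_margin(entry)
        status = "ok" if abs(recomputed - reported) < 1e-9 else "MISMATCH"
        print(f"{status}: recomputed {recomputed}, reported {reported}")
```

Run on the four entries above, each recomputed difference matches the reported margin.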

    
