AI Benchmark Digest — 2026-04-22


            
April 22, 2026
    
    
=== DAILY ===
NEW BENCHMARKS (3)
  - ReasonScape R12 (ReasonScore): leader Qwen3.5-397B-A17B (AWQ, 16k) Thinking (951.64), 67 models
  - LLM Stats (DeepSearchQA) (Score (%)): leader Claude Opus 4.6 (91.3), 5 models
  - LLM Stats (MCP-Mark) (Score (%)): leader Kimi K2.6 (55.9), 5 models
NEW MODELS (6)
  - Ling 2.6 Flash — ELO 1658, #219/891 (above: Grok 4.1 Fast, below: Hermes 4 405B)
  - JSL-MedMNX-7B — ELO 1343, #669/891 (above: Collaiborator-MEDLLM-Llama-3-8B-v2-6, below: Command R)
  - Collaiborator-MEDLLM-Llama-3-8B-v2-1 — ELO 1342, #673/891 (above: Collaiborator-MEDLLM-Llama-3-8B, below: Yi-1.5-9B)
  - JSL-MedMNX-7B-SFT — ELO 1339, #678/891 (above: Llama-3-Orca-1.0-8B, below: Collaiborator-MEDLLM-Llama-3-8B-v2-4)
  - BioLing-7B-Dare — ELO 1294, #748/891 (above: DeepHermes 3 - Llama-3.1 8B, below: LFM2.5-1.2B-Instruct)
  - JSL-MedPhi2-2.7B — ELO 1244, #798/891 (above: Phi-3 Mini, below: Gemma 3n E2B)
NEW #1 LEADERS (6)
  - Chatbot Arena (Text-to-Image) (Elo): gpt-image-2 (medium) (1512.0) beat gemini-3.1-flash-image-preview (nano-banana-2) [web-search] (1264.0) by 248.0
  - Chatbot Arena (Image Edit) (Elo): gpt-image-2 (medium) (1513.0) beat chatgpt-image-latest-high-fidelity (20251216) (1392.0) by 121.0
  - OSWorld (Success Rate (%)): Holo3-35B-A3B (82.56) beat Opus 4.5 (74.48) by 8.08
  - Spider 2.0-DBT (Accuracy (%)): SignalPilot Agent (51.56) beat Databao Agent (44.11) by 7.45
  - Design Arena (3D) (Elo): kimi-k2.6 (1381.0) beat claude-opus-4-6 (1376.0) by 5.0
  - SEAL Showdown (Arena Score): gemini-3-pro-preview (1306.8) beat gpt-4o-audio-preview-2025-06-03 (1305.3) by 1.5
