AI Benchmark Digest — 2026-04-27
=== DAILY ===

NEW MODELS (8)
- medllama3-v11 — ELO 1333, #776/1047 (above: ai-medical-model-32bit, below: ollama_v7)
- Collaiborator-MEDLLM-Llama-3-8B-v2-5 — ELO 1321, #802/1047 (above: suzume-llama-3-8B-multilingual, below: Collaiborator-MEDLLM-Llama-3-8B-v2-6)
- Collaiborator-MEDLLM-Llama-3-8B-v2-6 — ELO 1321, #803/1047 (above: Collaiborator-MEDLLM-Llama-3-8B-v2-5, below: JSL-MedMNX-7B)
- Collaiborator-MEDLLM-Llama-3-8B — ELO 1320, #806/1047 (above: Gemma 3n E4B Instructed LiteRT Preview, below: Collaiborator-MEDLLM-Llama-3-8B-v2-1)
- Collaiborator-MEDLLM-Llama-3-8B-v2-1 — ELO 1320, #807/1047 (above: Collaiborator-MEDLLM-Llama-3-8B, below: falcon-180B)
- Collaiborator-MEDLLM-Llama-3-8B-v2-4 — ELO 1318, #813/1047 (above: JSL-MedMNX-7B-SFT, below: Llama 2 70B)
  Open Medical LLM - PubMedQA: 78.6 (#3/168)
- Collaiborator-MEDLLM-Llama-3-8B-v2-3 — ELO 1315, #821/1047 (above: Parrot-7B, below: Granite 4.0 Micro)
- Llama-8B-1807 — ELO 1039, #1028/1047 (above: pythia-2.8B-deduped, below: GPT-2)

NEW #1 LEADERS (1)
- Design Arena (Website) (Elo): claude-opus-4-6 (1349.0) beat claude-opus-4-7-thinking (1348.0) by 1.0