AI Benchmark Digest — 2026-04-28
=== DAILY ===
NEW BENCHMARKS (2)
- PredictionArena (Polymarket) (Account Value ($)): leader claude-opus-4-6 (77298.59), 10 models
- PredictionArena (Kalshi) (Account Value ($)): leader gemini-3.1-pro-preview (15363.0), 10 models
NEW MODELS (4)
- Hy3-preview (Reasoning): ELO 1839, #83/1053 (above: GPT-5.2 (Low), below: Gemini 3 Flash)
- EXAONE 4.5 33B: ELO 1698, #224/1053 (above: Apriel-v1.6-15B-Thinker, below: O1)
- llama3-slerp-med: ELO 1338, #770/1053 (above: CogVLM-17B-Chat, below: Llama-3-8B-Instruct-abliterated-dpomix)
- BioMistralMerged: ELO 1262, #921/1053 (above: Bio-Mistralv2-Squared, below: falcon-40B)
NEW #1 LEADERS (4)
- MineBench (Elo Rating): GPT 5.5 Pro (2080.73) beat GPT 5.4 Pro (1716.44) by 364.29
- GSO-Bench (Opt@1 (%)): Claude Opus 4.7 (44.12) beat Claude-4.6-Opus (33.33) by 10.79
- Epoch AI - Apex Agents (Score): gpt-5.5_xhigh (38.4) beat gpt-5.4-2026-03-05_xhigh (35.9) by 2.5
- Design Arena (Game Dev) (Elo): gpt-5.5 (1360.0) beat claude-opus-4-7 (1358.0) by 2.0
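
The "beat ... by" margins in the leader entries are plain differences between the new and previous top scores. As a minimal sketch for readers who want to re-check them, the snippet below recomputes each margin from the scores quoted in this digest. The names LEADER_CHANGES and margin are hypothetical and chosen only for illustration; they are not part of any digest tooling. Scores are parsed as decimals so the subtraction is exact.

```python
from decimal import Decimal

# (benchmark, new leader score, previous leader score), copied from the
# NEW #1 LEADERS entries in this digest. The structure is an assumption
# made for illustration only.
LEADER_CHANGES = [
    ("MineBench", "2080.73", "1716.44"),
    ("GSO-Bench", "44.12", "33.33"),
    ("Epoch AI - Apex Agents", "38.4", "35.9"),
    ("Design Arena (Game Dev)", "1360.0", "1358.0"),
]


def margin(new_score: str, old_score: str) -> Decimal:
    """Return the lead of the new #1 over the previous #1."""
    # Decimal avoids binary floating-point rounding noise in the difference.
    return Decimal(new_score) - Decimal(old_score)


margins = {name: margin(new, old) for name, new, old in LEADER_CHANGES}

for name, lead in margins.items():
    print(f"{name}: +{lead}")
```

Note that the margins are in each benchmark's own unit (Elo points, percentage points, or raw score), so they are not comparable across benchmarks.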