Mikhail Doroshenko

April 25, 2026

AI Benchmark Digest — 2026-04-25

=== DAILY ===

NEW BENCHMARKS (6)
- RuneBench (Total Peak XP Rate (XP/min)): leader GPT-5.5 (6238.0), 18 models
- AI Chess Leaderboard (Reasoning) (Elo): leader gemini-3.1-pro-preview (1874.0), 264 models
- AI Chess Leaderboard (Continuation) (Elo): leader gemini-3-pro-preview ˟ (1810.0), 212 models
- LLM Public Goods Game (Avg. Contribution (%)): leader Gemini 2.0 Flash Exp (45.2), 21 models
- LLM Emergent Collusion (Collusion Rate (%)): leader Grok 4 (0709) (75.0), 13 models
- Story Theory Bench (Score (%)): leader deepseek-v3.2 (92.2), 25 models

NEW MODELS (3)
- Med-Yi-1.5-9B — ELO 1295, #897/1064 (above: Lumina-3.5, below: BioMistral-DARE-NS)
- MedMistral-instruct — ELO 1264, #937/1064 (above: BioMistral-7B-SLERP, below: BioMistral-7B-DARE)
- Healix-1.1B-V1-Chat-dDPO — ELO 1022, #1057/1064 (above: mega-ar-126m-4k, below: GPT2_PMC)

NEW #1 LEADERS (15)
- YC-Bench (Net Worth ($K)): Claude Opus 4.7 (1714.5) beat Claude Opus 4.6 (1269.7) by 444.8
- LLM Chess (Saplin) (ELO): gemini-3.1-pro-preview (1511.4) beat gpt-5-2025-08-07-medium (1086.8) by 424.6
- VoxelBench (Rating): GPT-5.5 (xHigh) (2111.0) beat Gemini 3.1 Pro Preview (1725.0) by 386.0
- MathArena - ARXIV_FALSE March (Accuracy (%)): GPT-5.5 (xhigh) (73.66) beat GPT-5.4 (xhigh) (36.61) by 37.05
- MathArena - ARXIV_FALSE February (Accuracy (%)): GPT-5.5 (xhigh) (69.76) beat GPT-5.4 (xhigh) (38.71) by 31.05
- MathArena - APEX 2025 (Accuracy (%)): GPT-5.5 (xhigh) (80.21) beat GPT-5.4-Pro (xhigh) (69.79) by 10.42
- MathArena - ARXIV March (Accuracy (%)): GPT-5.5 (xhigh) (75.0) beat Gemini 3.1 Pro Preview (66.13) by 8.87
- MathArena - Kangaroo 2025 Levels 3-4 (Accuracy (%)): GPT-5.5 (xhigh) (89.58) beat GPT-5.4 (xhigh) (83.33) by 6.25
- MathArena - APEX Shortlist 2025 (Accuracy (%)): GPT-5.5 (xhigh) (93.75) beat Gemini 3.1 Pro Preview (89.06) by 4.69
- SlopCodeBench (Isolated Solved (%)): GPT 5.5 (High) (28.06) beat GPT 5.3-Codex (High) (23.66) by 4.4
- MathArena - Kangaroo 2025 Levels 5-6 (Accuracy (%)): GPT-5.5 (xhigh) (90.0) beat Gemini 3.1 Pro Preview (86.67) by 3.33
- PrinzBench (Score (x/99)): gpt-5.5-pro (extended)* (82.0) beat gpt-5.4-pro (extended) (79.0) by 3.0
- MathArena - USAMO 2026 (Accuracy (%)): GPT-5.5 (xhigh) (98.21) beat GPT-5.4 (xhigh) (95.24) by 2.97
- OTIS Mock AIME 2024-25 (Accuracy (%)): gpt-5.5-pre-release_xhigh (100.0) beat claude-opus-4-7_xhigh (97.8) by 2.2
- MathArena - Kangaroo 2025 Levels 1-2 (Accuracy (%)): GPT-5.5 (xhigh) (95.83) beat GPT-5.4 (xhigh) (94.79) by 1.04
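Each "beat ... by" margin above is simply the new leader's score minus the previous leader's score on that benchmark's metric. A minimal Python sketch of that arithmetic, using three rows copied from the list above (an illustration only, not the digest's actual generation code):

    # Margin = new leader's score - previous #1's score, rounded to the
    # precision shown in the digest.
    leaders = [
        # (benchmark and metric, new leader's score, previous #1's score)
        ("YC-Bench (Net Worth ($K))", 1714.5, 1269.7),
        ("VoxelBench (Rating)", 2111.0, 1725.0),
        ("OTIS Mock AIME 2024-25 (Accuracy (%))", 100.0, 97.8),
    ]
    for name, new_score, old_score in leaders:
        margin = round(new_score - old_score, 2)
        print(f"{name}: margin {margin}")  # 444.8, 386.0, 2.2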


View on AI Benchmark Hub
