Mikhail Doroshenko

May 21, 2026

AI Benchmark Digest — 2026-05-21

=== DAILY ===

NEW SCORES FROM TOP-10 MODELS (1)
- Gemini 3.5 Flash (High) on WeirdML: 62.64 Average Score (#17/124)

NEW #1 LEADERS (3)
- Kaggle Game Arena Poker (Heads Up) (Mean BB/100): GPT-5.5 (73.93) beat GPT-5.2 (40.0) by 33.93
- AA APEX-Agents (Pass@1 (%)): Gemini 3.5 Flash (high) (47.05) beat GPT-5.5 (xhigh) (37.68) by 9.37
- LA Leaderboard (Average Score (%)): Qwen2.5-14B-Instruct-GPTQ-Int8 (63.6) beat gemma-2-9b-it (63.33) by 0.27
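
Each margin in the leader list is the new leader's score minus the previous leader's score. A minimal Python sketch for rechecking them, with the scores copied from the entries above. The tuple layout and variable names are illustrative assumptions, not part of AI Benchmark Hub:

```python
# (benchmark, new leader score, previous leader score), copied from the digest.
# The structure and names here are hypothetical, chosen only for this sketch.
leaders = [
    ("Kaggle Game Arena Poker (Heads Up)", 73.93, 40.0),
    ("AA APEX-Agents", 47.05, 37.68),
    ("LA Leaderboard", 63.6, 63.33),
]

# Round to two decimals to avoid floating-point noise in the subtraction.
margins = [round(new - old, 2) for _, new, old in leaders]

for (name, new, old), margin in zip(leaders, margins):
    print(f"{name}: {new} vs {old}, margin {margin}")
```

Note that the three margins are in different units (BB/100 for the poker benchmark, percentage points for the other two), so they are not comparable across benchmarks.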


