Mikhail Doroshenko

May 30, 2026

AI Benchmark Digest — 2026-05-30

=== DAILY ===

NEW SCORES FROM TOP-10 MODELS (5)
- Claude Opus 4.8 (Adaptive Reasoning, Max Effort) on UGI - Natural Intelligence: 65.39 NatInt Score (#30/1247)
- Claude Opus 4.8 (Adaptive Reasoning, Max Effort) on UGI - Willingness (W/10): 2.2 W/10 Score (#1094/1247)
- Claude Opus 4.8 (Adaptive Reasoning, Max Effort) on UGI - Writing: 65.88 Writing Score (#34/1191)
- Claude Opus 4.8 (Adaptive Reasoning, Max Effort) on UGI Leaderboard: 52.64 UGI Score (#69/1247)
- GPT-5.4 (xHigh) on Creative Writing (Lechmazur): 3.4 Mean Score (#2/31)

NEW #1 LEADERS (2)
- Bullshit Benchmark (BS Detection Rate (%)): Claude Opus 4.8 (96.4) beat Claude Sonnet 4.6 (94.5) by 1.9
- Creative Writing (Lechmazur) (Mean Score): GPT-5.5 (xHigh) (3.5) beat GPT-5.5 (Thinking, xHigh) (3.2) by 0.3

