Mikhail Doroshenko

June 2, 2026

AI Benchmark Digest — 2026-06-02

=== DAILY ===

NEW BENCHMARKS (1)
- GIM (metric: IRT ability, theta): leader GPT-5.4 Pro (High) at 2.16; 46 models.
  Grounded Integration Measure from Meta FAIR: 820 multimodal and text-grounded problems testing integrated reasoning across quantitative, spatial, language, world-knowledge, and document tasks. Scores are reported as IRT ability on GIM-820.

NEW SCORES FROM TOP-10 MODELS (2)
- GPT-5.5 (xHigh) on IMO-Bench: 71.9% Advanced ProofBench accuracy (#4 of 12)
- GPT-5.5 Pro (xHigh) on IMO-Bench: 88.1% Advanced ProofBench accuracy (#2 of 12)


View on AI Benchmark Hub
