AI Benchmark Digest — 2026-06-02


            
June 2, 2026
    
    
=== DAILY ===
NEW BENCHMARKS (1)
  - GIM (IRT ability (theta)): leader GPT-5.4 Pro (High) (2.16), 46 models
      Grounded Integration Measure from Meta FAIR: 820 multimodal and text-grounded problems testing integrated reasoning across quantitative, spatial, language, world-knowledge, and document tasks. Scores are reported as IRT ability on GIM-820.
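
For readers unfamiliar with IRT ability scores, here is a minimal sketch of how a theta value can be read. It assumes a two-parameter logistic (2PL) model. The digest does not say which IRT model GIM uses, so the model choice, the function name `p_correct`, and the item difficulties below are illustrative assumptions, not GIM's actual scoring code.

```python
import math


def p_correct(theta: float, a: float = 1.0, b: float = 0.0) -> float:
    """Probability of solving an item under an assumed 2PL IRT model.

    theta: model ability, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Ability of the GIM leader as reported in this digest.
leader_theta = 2.16

# Hypothetical item difficulties, chosen only for illustration.
for b in (0.0, 1.0, 2.16, 3.0):
    p = p_correct(leader_theta, b=b)
    print(f"difficulty {b:+.2f}: P(correct) = {p:.3f}")
```

Under this assumed model, a model has a 50% chance of solving an item whose difficulty equals its theta, and the chance rises as theta exceeds the item difficulty.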
NEW SCORES FROM TOP-10 MODELS (2)
  - GPT-5.5 (xHigh) on IMO-Bench: 71.9 Advanced ProofBench Accuracy (%) (#4/12)
  - GPT-5.5 Pro (xHigh) on IMO-Bench: 88.1 Advanced ProofBench Accuracy (%) (#2/12)

    
