AI Benchmark Digest — 2026-06-09
=== DAILY ===

NEW SCORES FROM TOP-10 MODELS (2)
- GPT-5.5 (xHigh) on SEAL - SWE Atlas - Codebase QnA: 45.43 Score (#2/14)
- GPT-5.5 (xHigh) on SEAL - SWE Atlas - Test Writing: 42.59 Score (#3/14)

NEW #1 LEADERS (7)
- GSMA Open-Telco - TeleTables (Score (%)): TelecomGPT (88.0) beat OTel-LLM-8.3B-QnA (61.8) by 26.2
- GSMA Open-Telco LLM Leaderboard (Average Score (%)): TelecomGPT (89.64) beat OTel-LLM-8.3B-QnA (85.98) by 3.66
- SEAL - SWE Atlas - Codebase QnA (Score): Opus 4.8 (Claude Code) (48.79) beat GPT-5.5 (45.43) by 3.36
- GSMA Open-Telco - 3GPP (Score (%)): TelecomGPT (84.22) beat OTel-LLM-8.3B-QnA (81.4) by 2.82
- GSMA Open-Telco - TeleLogs (Score (%)): TelecomGPT (98.96) beat OTel-LLM-8.3B-QnA (96.3) by 2.66
- GSMA Open-Telco - srsRAN-Bench (Score (%)): TelecomGPT (91.33) beat OTel-LLM-8.3B-QnA (89.68) by 1.65
- SEAL - SWE Atlas - Test Writing (Score): Opus 4.8 (Claude Code) (45.56) beat GPT-5.4 (xHigh) (44.36) by 1.2
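
The "by" figure in each new-leader entry appears to be the leader's score minus the previous leader's score, and the entries appear to be ordered by that margin, largest first. The Python sketch below recomputes both from the scores listed above. It is an illustration only, not the digest's actual pipeline: the names NEW_LEADERS, margin, and ranked_by_margin are hypothetical, and Decimal is used on the assumption that exact two-decimal subtraction is wanted.

```python
from decimal import Decimal

# (benchmark, new leader, leader score, previous leader, previous score)
# Scores are copied from the NEW #1 LEADERS entries; the tuple layout is an
# assumption made for this sketch.
NEW_LEADERS = [
    ("GSMA Open-Telco - TeleTables", "TelecomGPT", "88.0", "OTel-LLM-8.3B-QnA", "61.8"),
    ("GSMA Open-Telco LLM Leaderboard", "TelecomGPT", "89.64", "OTel-LLM-8.3B-QnA", "85.98"),
    ("SEAL - SWE Atlas - Codebase QnA", "Opus 4.8 (Claude Code)", "48.79", "GPT-5.5", "45.43"),
    ("GSMA Open-Telco - 3GPP", "TelecomGPT", "84.22", "OTel-LLM-8.3B-QnA", "81.4"),
    ("GSMA Open-Telco - TeleLogs", "TelecomGPT", "98.96", "OTel-LLM-8.3B-QnA", "96.3"),
    ("GSMA Open-Telco - srsRAN-Bench", "TelecomGPT", "91.33", "OTel-LLM-8.3B-QnA", "89.68"),
    ("SEAL - SWE Atlas - Test Writing", "Opus 4.8 (Claude Code)", "45.56", "GPT-5.4 (xHigh)", "44.36"),
]


def margin(leader_score: str, previous_score: str) -> Decimal:
    """Return leader minus previous leader, using exact decimal arithmetic."""
    return Decimal(leader_score) - Decimal(previous_score)


def ranked_by_margin(rows):
    """Sort entries by margin, largest first (stable for equal margins)."""
    return sorted(rows, key=lambda r: margin(r[2], r[4]), reverse=True)


if __name__ == "__main__":
    for name, leader, score, previous, prev_score in ranked_by_margin(NEW_LEADERS):
        print(f"{name}: {leader} ({score}) beat {previous} ({prev_score}) "
              f"by {margin(score, prev_score)}")
```

Decimal is chosen over float here so that differences such as 89.64 - 85.98 come out as exact two-decimal values rather than binary floating-point approximations.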