Mikhail Doroshenko
Archives
AI Benchmark Digest — 2026-05-28
May 28, 2026
=== DAILY === NEW SCORES FROM TOP-10 MODELS (1) - GPT-5.5 (xHigh) on SWE-rebench: 62.73 Resolved (%) (#1/82) NEW #1 LEADERS...
AI Benchmark Digest — 2026-05-27
May 27, 2026
=== DAILY === NEW SCORES FROM TOP-10 MODELS (1) - GPT-5.4 (xHigh) on Creative Writing (Lechmazur): 3.2 Mean Score (#2/25)...
AI Benchmark Digest — 2026-05-25
May 25, 2026
=== DAILY === NEW BENCHMARKS (6) - LLMEval-Logic Base (Accuracy (%)): leader Seed 2.0 Pro (Thinking) (75.5), 14 models -...
AI Benchmark Digest — 2026-05-24
May 24, 2026
=== DAILY === NEW BENCHMARKS (14) - NanoGPT-Bench (% of Human Progress Recovered): leader Claude Opus 4.6 (9.3), 2 models...
AI Benchmark Digest — 2026-05-23
May 23, 2026
=== DAILY === NEW #1 LEADERS (1) - OSWorld (Success Rate (%)): Opus 4.7 (83.64) beat Holo3-35B-A3B (82.56) by 1.08
AI Benchmark Digest — 2026-05-22
May 22, 2026
=== DAILY === NEW SCORES FROM TOP-10 MODELS (1) - GPT-5.5 (High) on Sycophancy (Lechmazur): 3.5 Sycophancy rate % (lower is...
AI Benchmark Digest — 2026-05-21
May 21, 2026
=== DAILY === NEW SCORES FROM TOP-10 MODELS (1) - Gemini 3.5 Flash (High) on WeirdML: 62.64 Average Score (#17/124) NEW #1...
AI Benchmark Digest — 2026-05-20
May 20, 2026
=== DAILY === NEW MODELS (1) - Gemini 3.5 Flash (High): ELO 1942, #9/609 (above: Claude Opus 4.7 (Thinking), below: GPT-5.5...
AI Benchmark Digest — 2026-05-19
May 19, 2026
No significant changes.
AI Benchmark Digest — 2026-05-18
May 18, 2026
No significant changes.
AI Benchmark Digest — 2026-05-17
May 17, 2026
=== DAILY === NEW #1 LEADERS (1) - OpenClawProBench (Overall Score (%)): intern-s2-preview (76.7) beat Sensenova 6.7 Flash...
AI Benchmark Digest — 2026-05-16
May 16, 2026
=== DAILY === NEW SCORES FROM TOP-10 MODELS (1) - GPT-5.5 (xHigh) on Chatbot Arena (Code): 1501.0 Elo (#9/79) NEW #1 LEADERS...
AI Benchmark Digest — 2026-05-14
May 14, 2026
=== DAILY === NEW MODELS (4) - Doubao-Seed-2-0-Pro-260215 (High): ELO 1781, #73/796 (above: GPT-5.2 (Low), below:...
AI Benchmark Digest — 2026-05-13
May 13, 2026
=== DAILY === NEW BENCHMARKS (2) - ProgramBench (Resolved (%)): leader GPT-5.5 (xHigh) (0.5), 13 models Meta and Stanford...
AI Benchmark Digest — 2026-05-11
May 11, 2026
=== DAILY === NEW #1 LEADERS (2) - OpenClawProBench (Overall Score (%)): Sensenova 6.7 Flash Lite (73.7) beat...
AI Benchmark Digest — 2026-05-10
May 10, 2026
=== DAILY === NEW BENCHMARKS (43) - AA Global-MMLU-Lite - Arabic (Accuracy (%)): leader Gemini 3.1 Pro Preview (93.0), 119...
AI Benchmark Digest — 2026-05-09
May 9, 2026
=== DAILY === NEW BENCHMARKS (8) - Factory Code Review Benchmark (Mean F1 (%)): leader GPT-5.2 (60.5), 13 models Factory...
AI Benchmark Digest — 2026-05-08
May 8, 2026
=== DAILY === NEW BENCHMARKS (8) - EuroEval Albanian NLU (NLU Average Score (%)): leader gemini-3.1-pro-preview (61.17), 208...
AI Benchmark Digest — 2026-05-07
May 7, 2026
=== DAILY === NEW BENCHMARKS (19) - LIBRA - MatreshkaNames * (Dataset Total Score (%)): leader...
AI Benchmark Digest — 2026-05-05
May 5, 2026
=== DAILY === NEW BENCHMARKS (2) - MathArena - ARXIV_FALSE April (Accuracy (%)): leader GPT-5.5 (xhigh) (72.13), 6 models -...