Mikhail Doroshenko
Archives
AI Benchmark Digest — 2026-06-18
June 18, 2026
Daily New Benchmarks (9) AISI Cyber Cooling Tower 10M (Avg Steps (/7)): Claude Opus 4.6 leads with...
AI Benchmark Digest — 2026-06-17
June 17, 2026
Daily New Benchmarks (4) LLM Stats (Finance Agent v2) (Score (%)): Gemini 3.5 Flash leads with...
AI Benchmark Digest — 2026-06-16
June 16, 2026
Daily New Benchmarks (7) SWE-Marathon (Pass@1 (%)): Claude Opus 4.8 leads with 26.0 across 9...
AI Benchmark Digest — 2026-06-15
June 15, 2026
Daily New Benchmarks (145) Open LLM Leaderboard - IFEval (Score): Llama-3.3-70B-Instruct leads with...
AI Benchmark Digest — 2026-06-14
June 14, 2026
Daily New Benchmarks (75) Ramp SWE-Bench (Resolved (%)): Claude Fable 5 leads with 87.5 across 14...
AI Benchmark Digest — 2026-06-13
June 13, 2026
Daily New Scores From Top-10 Models (11) Claude 5 on Chess Puzzles (Epoch AI): 41.0 Accuracy (%)...
AI Benchmark Digest — 2026-06-12
June 12, 2026
=== DAILY === NEW BENCHMARKS (2) - MathArena - ARXIV_FALSE May (Accuracy (%)): leader GPT-5.5 (xhigh) (50.0), 8 models -...
AI Benchmark Digest — 2026-06-11
June 11, 2026
=== DAILY === NEW BENCHMARKS (1) - GDPval-AA (Elo): leader Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8...
AI Benchmark Digest — 2026-06-10
June 10, 2026
=== DAILY === NEW BENCHMARKS (1) - SkateBench (Success Rate (%)): leader gemini-3.1-pro-preview (96.92), 28 models...
AI Benchmark Digest — 2026-06-09
June 9, 2026
=== DAILY === NEW SCORES FROM TOP-10 MODELS (2) - GPT-5.5 (xHigh) on SEAL - SWE Atlas - Codebase QnA: 45.43 Score (#2/14) -...
AI Benchmark Digest — 2026-06-08
June 8, 2026
No significant changes.
AI Benchmark Digest — 2026-06-07
June 7, 2026
=== WEEKLY === NEW MODELS (2) - MiniMax-M3: ELO 1762, #83/970 (above: Gemini 3 Flash (High), below: Claude Opus 4.5 (Non-...
AI Benchmark Digest — 2026-06-06
June 6, 2026
=== DAILY === NEW BENCHMARKS (20) - Pencil Puzzle Bench - Yajilin (Direct-ask Success Rate (%)): leader gpt-5.2 (High)...
AI Benchmark Digest — 2026-06-04
June 4, 2026
=== DAILY === NEW #1 LEADERS (1) - GAIA (Accuracy (%)): CustomGPT.ai Research Lab v44 (93.36) beat Co-Sight Pro v1.0.1...
AI Benchmark Digest — 2026-06-03
June 3, 2026
=== DAILY === NEW SCORES FROM TOP-10 MODELS (2) - GPT-5.5 Pro on IUMB: 100.0 Score (%) (#2/55) - Gemini 3 Deep Think on...
AI Benchmark Digest — 2026-06-02
June 2, 2026
=== DAILY === NEW BENCHMARKS (1) - GIM (IRT ability (theta)): leader GPT-5.4 Pro (High) (2.16), 46 models Grounded...
AI Benchmark Digest — 2026-06-01
June 1, 2026
=== DAILY === NEW #1 LEADERS (3) - EQ-Bench Creative Writing v3 (Elo): Claude Opus 4.7 (2050.8) beat GPT-5.4 (1906.0) by...
AI Benchmark Digest — 2026-05-30
May 30, 2026
=== DAILY === NEW SCORES FROM TOP-10 MODELS (5) - Claude Opus 4.8 (Adaptive Reasoning, Max Effort) on UGI - Natural...
AI Benchmark Digest — 2026-05-29
May 29, 2026
=== DAILY === NEW BENCHMARKS (1) - DeepSWE (Pass@1 (%)): leader GPT-5.5 (xHigh) (70.0), 12 models DataCurve benchmark...
AI Benchmark Digest — 2026-05-28
May 28, 2026
=== DAILY === NEW SCORES FROM TOP-10 MODELS (1) - GPT-5.5 (xHigh) on SWE-rebench: 62.73 Resolved (%) (#1/82) NEW #1 LEADERS...