Mikhail Doroshenko
Archives
AI Benchmark Digest — 2026-05-05
May 5, 2026
AI Benchmark Digest — 2026-05-05 === DAILY === NEW BENCHMARKS (2) - MathArena - ARXIV_FALSE April (Accuracy (%)): leader GPT-5.5 (xhigh) (72.13), 6 models -...
AI Benchmark Digest — 2026-05-04
May 4, 2026
AI Benchmark Digest — 2026-05-04 === DAILY === NEW MODELS (62) - Doubao Seed Code — ELO 1645, #209/778 (above: Qwen 3 235B A22B 2507 (Reasoning), below:...
AI Benchmark Digest — 2026-05-03
May 3, 2026
AI Benchmark Digest — 2026-05-03 === DAILY === NEW BENCHMARKS (9) - Open-R1 Eval Leaderboard (Average Accuracy (%)): leader Qwen3-32B (73.74), 37 models -...
AI Benchmark Digest — 2026-05-01
May 1, 2026
AI Benchmark Digest — 2026-05-01 === DAILY === NEW BENCHMARKS (56) - LIBRA - Passkey (Dataset Total Score (%)): leader GLM-4 9B Chat (100.0), 17 models -...
AI Benchmark Digest — 2026-04-30
April 30, 2026
AI Benchmark Digest — 2026-04-30 === DAILY === NEW MODELS (9) - kimi-k2.6_nitro — ELO 1888, #57/1066 (above: Grok 4.20 0309 (Reasoning), below: Claude Sonnet...
AI Benchmark Digest — 2026-04-29
April 29, 2026
AI Benchmark Digest — 2026-04-29 === DAILY === NEW BENCHMARKS (29) - OpenVLM MME (Overall Score): leader InternVL3-78B (2538.6), 235 models - OpenVLM...
AI Benchmark Digest — 2026-04-28
April 28, 2026
AI Benchmark Digest — 2026-04-28 === DAILY === NEW BENCHMARKS (2) - PredictionArena (Polymarket) (Account Value ($)): leader claude-opus-4-6 (77298.59), 10...
AI Benchmark Digest — 2026-04-27
April 27, 2026
AI Benchmark Digest — 2026-04-27 === DAILY === NEW MODELS (8) - medllama3-v11 — ELO 1333, #776/1047 (above: ai-medical-model-32bit, below: ollama_v7) -...
AI Benchmark Digest — 2026-04-26
April 26, 2026
AI Benchmark Digest — 2026-04-26 === DAILY === NEW BENCHMARKS (5) - AI Chess Leaderboard (Continuation) (Elo): leader gemini-3-pro-preview ˟ (1810.0), 214...
AI Benchmark Digest — 2026-04-25
April 25, 2026
AI Benchmark Digest — 2026-04-25 === DAILY === NEW BENCHMARKS (6) - RuneBench (Total Peak XP Rate (XP/min)): leader GPT-5.5 (6238.0), 18 models - AI Chess...
AI Benchmark Digest — 2026-04-24
April 24, 2026
AI Benchmark Digest — 2026-04-24 === DAILY === NEW BENCHMARKS (8) - MathArena - ARXIVLEAN March (Accuracy (%)): leader Aristotle (17.07), 6 models -...
AI Benchmark Digest — 2026-04-23
April 23, 2026
AI Benchmark Digest — 2026-04-23 === DAILY === NEW BENCHMARKS (8) - MathArena - ARXIVLEAN March (Accuracy (%)): leader Aristotle (17.07), 6 models -...
AI Benchmark Digest — 2026-04-23
April 23, 2026
AI Benchmark Digest — 2026-04-23 === DAILY === NEW BENCHMARKS (1) - MathArena - ARXIVLEAN March (Accuracy (%)): leader GPT-5.4 (xhigh) (17.07), 6 models NEW...
AI Benchmark Digest — 2026-04-22
April 22, 2026
AI Benchmark Digest — 2026-04-22 === DAILY === NEW BENCHMARKS (3) - ReasonScape R12 (ReasonScore): leader Qwen3.5-397B-A17B (AWQ, 16k) Thinking (951.64), 67...
AI Benchmark Digest — 2026-04-21
April 21, 2026
AI Benchmark Digest — 2026-04-21 === DAILY === NEW BENCHMARKS (4) - ReasonScape R12 (ReasonScore): leader Qwen3.5-397B-A17B (AWQ, 16k) Thinking (951.64), 67...
AI Benchmark Digest — 2026-04-20
April 20, 2026
AI Benchmark Digest — 2026-04-20 === DAILY === NEW BENCHMARKS (30) - ArtifactsBenchmark (Average Score): leader SWE-Bench (2294.0), 30 models - BIRD-Interact...
AI Benchmark Digest — 2026-04-19
April 19, 2026
AI Benchmark Digest — 2026-04-19 === DAILY === NEW BENCHMARKS (5) - FoodTruckBench (Net Worth ($)): leader Claude Opus 4.6 (49519.0), 24 models - LLM Stats...
AI Benchmark Digest — 2026-04-18
April 18, 2026
AI Benchmark Digest — 2026-04-18 === DAILY === NEW BENCHMARKS (4) - GameWorld Generalist (Progress (%)): leader Gemini-3-Flash-Preview (41.9), 10 models -...