Mikhail Doroshenko
Archives
AI Benchmark Digest — 2026-04-14
April 14, 2026
=== DAILY === NEW BENCHMARKS (11) - SpacetimeDB LLM Benchmark (TypeScript) (Task Pass Rate (%)): leader Claude Opus 4.6...
AI Benchmark Digest — 2026-04-12
April 12, 2026
=== DAILY === NEW #1 LEADERS (2) - Design Arena (3D) (Elo): glm-5-turbo (1372.0) beat claude-opus-4-6 (1365.0) by 7.0 -...
AI Benchmark Digest — 2026-04-11
April 11, 2026
=== DAILY === NEW BENCHMARKS (20) - YKS 2025 LLM Leaderboard (Total Score (out of 200)): leader GPT-5 (194.0), 8 models -...
AI Benchmark Digest — 2026-04-10
April 10, 2026
=== DAILY === NEW BENCHMARKS (9) - WebApp1K (Pass@1 (%)): leader o3-mini (96.1), 34 models - WebApp1K Duo (Pass@1 (%)):...
AI Benchmark Digest — 2026-04-09
April 9, 2026
=== DAILY === NEW BENCHMARKS (25) - SWE-Arena (Elo Score): leader Voxtral Small 24B 2507 (1004.0), 37 models - Long Code...
AI Benchmark Digest — 2026-04-08
April 8, 2026
=== DAILY === NEW MODELS (9) - GLM-5.1 — ELO 1916, #14/789 (above: GPT-5.2 (Medium), below: GPT-5.2 Pro) - GLM 5.1 — ELO...
AI Benchmark Digest — 2026-04-07
April 7, 2026
=== DAILY === NEW MODELS (1) - Solar Pro 3 — ELO 1641, #181/781 (above: MiniMax M1 80k, below: O3 Mini) NEW #1 LEADERS (2) -...
AI Benchmark Digest — 2026-04-06
April 6, 2026
=== DAILY === NEW BENCHMARKS (2) - MathArena - ARXIV_FALSE March (Accuracy (%)): leader GPT-5.4 (xhigh) (36.61), 5 models -...
AI Benchmark Digest — 2026-04-05
April 5, 2026
=== DAILY === NEW BENCHMARKS (2) - MathArena - ARXIV_FALSE March (Accuracy (%)): leader GPT-5.4 (xhigh) (36.61), 5 models -...
AI Benchmark Digest — 2026-04-03
April 3, 2026
Last 24 Hours The last 24 hours in AI benchmarking have been dominated by the sudden arrival of "future-dated" model variants and a massive sweep by the Qwen...
AI Benchmark Digest — 2026-04-02
April 2, 2026
Last 24 Hours The AI benchmark landscape saw significant movement in the last 24 hours, highlighted by the arrival of next-generation previews and a shake-up...
AI Benchmark Digest — 2026-04-01
April 1, 2026
Last 24 Hours It was a high-stakes 24 hours for the leaderboards, with several "next-generation" heavyweights reclaiming top spots across vision, coding, and...
AI Benchmark Digest — 2026-03-31
March 31, 2026
Last 24 Hours Here is your daily update on the AI benchmark landscape: Strategic Dominance in Game Theory: The Kaggle Game Arena Four in a Row saw a massive...
AI Benchmark Digest — 2026-03-30
March 30, 2026
Last 24 Hours No significant benchmark changes in the last 24 hours. Last 7 Days The last seven days in AI benchmarking have been dominated by the arrival of...
AI Benchmark Digest — 2026-03-29
March 29, 2026
Last 24 Hours The AI benchmark landscape saw significant movement in the last 24 hours, marked by a massive leap in reasoning capabilities and a shift in...
AI Benchmark Digest — 2026-03-28
March 28, 2026
Last 24 Hours Here is your daily update on the AI benchmark landscape: DeepSeek Continues Momentum: The newly released DeepSeek V3.2 made a strong debut on...
AI Benchmark Digest — 2026-03-27
March 27, 2026
Last 24 Hours Here is your summary of AI benchmark activity over the last 24 hours: Audio Intelligence Breakthrough: The gemini-3.1-flash-live-preview...
AI Benchmark Digest — 2026-03-26
March 26, 2026
Last 24 Hours The last 24 hours in AI benchmarking have been dominated by a high-stakes shuffle at the top of the leaderboards, featuring a strong debut from...
AI Benchmark Digest — 2026-03-25
March 25, 2026
Last 24 Hours The last 24 hours in AI benchmarking have been dominated by the arrival of specialized logic tests and a significant shake-up in visual...
AI Benchmark Digest — 2026-03-25
March 25, 2026
Last 24 Hours It has been a high-velocity 24 hours in the evaluation space, with next-generation iterations from OpenAI and Google dominating the...