|
Kimi's Agent Swarm: 100 AI Workers Crushing Complex Tasks (And It's Live)
Kimi K2.5 unleashes ~100 sub-agents per task—your single-model bottlenecks are over.
Feb 15 · agents, multi-agent, llm
|
|
Prime Intellect Open-Sourced INTELLECT-3: Full 106B MoE Stack + RL Training
106B MoE model dominating math/code + they released the ENTIRE training stack—your custom RL agent era starts now.
Feb 15 · open-source, reasoning, rl
|
|
GLM-5 Just Dropped: The Open Model Crushing Gemini at Half the Price
744B params, tops every open benchmark, and costs just $0.80/M tokens—did Z.ai finally crack frontier performance for devs?
Feb 15 · llm, open-source, benchmarks
|
|
MIT's AI Inflection: Multimodal Models Are About to Crack General Scientific Intelligence
Rafael Gómez-Bombarelli says we're at science's 'second inflection'—AI reasoning over text, structures, and recipes to invent materials.
Feb 14 · ai-research, science, multimodal
|
|
This AI Framework Predicts Brand-New Alloys With 92% Accuracy—Science Just Got 10x Faster
JAIST's new AI fuses expert knowledge from papers into a framework that discovers high-entropy alloys even for unseen elements.
Feb 14 · ai-research, materials-science, llm
|
|
Vision-Language-Action Models Just Made Camera-Only Robots Viable (No LiDAR Needed)
Forget expensive LiDAR—new AI models let robots 'see' and act like humans using just cameras, slashing costs for autonomous fleets.
Feb 14 · robotics, autonomous, vision-language
|
|
Privacy Warnings in Your AI Chat? This New Research Makes It Real (And Local)
A new dataset and models detect privacy leaks in prompts before you hit send, and they are small enough to run on your phone.
Feb 13 · privacy, llm-safety, research
|
|
AI Referrals Are Dying—But Here's the Real Shift Publishers Must Master Now
AI usage explodes, but referrals tank—not from bad demand, but because models now answer everything themselves.
Feb 13 · ai-search, seo, publishers
|
|
Z.ai's GLM-5 Just Dethroned Every Open Weights LLM (And It's Actually Usable)
Open-source just hit a new high: GLM-5 crushes benchmarks with the lowest hallucinations ever—your next production model?
Feb 13 · llm, open-source, benchmarks
|
|
TELUS Drops Bomb: Follow-Up Prompts Actually Hurt Top LLMs Like GPT-5.2 and Claude 4.5
Challenging GPT-5.2 or Claude? A new benchmark shows it backfires, even flipping correct answers to wrong ones. Time to rethink your prompting?
Feb 12 · prompting, llm-evaluation, robustness
|
|
DeepSeek Math-V2: Open 685B Model Grabs Math Gold - Devs, Your Calculators Are Obsolete
Gold on IMO and Putnam from a free 685B open model? DeepSeek just made elite math reasoning accessible to every dev.
Feb 12 · math, reasoning, open-source
|
|
Z.ai's Massive GLM-5 Drops: 744B Params of Open Power You Can Actually Use
A Chinese giant just unleashed a 744B-param beast that's open for devs to grab. Is this the GPT-killer we've been waiting for?
Feb 12 · llm, open-source, glm
|
|
Anthropic's 'Anonymous' AI Interviews? An LLM De-Anonymized Them in Minutes
Anthropic released 1,250 'safe' anonymized interviews. A prof used a stock LLM to unmask 25%—exposing a massive privacy wake-up call for AI
Feb 11 · privacy, anthropic, llm-risks
|
|
LLMs Just Cracked 'Uniquely Human' Language Skills—And Built ConlangCrafter to Prove It
Turns out, you don't need to be human to master metalinguistic analysis—LLMs do it better, and now generate entire artificial languages on d
Feb 11 · llm-research, linguistics, openai
|
|
Google DeepMind Just Open-Sourced the Tool That Lets You Study AI in Group Chats
What if LLMs don't just chat one-on-one, but deliberate, negotiate, and sway entire groups? DeepMind's new open-source platform makes it dea
Feb 11 · open-source, research, multi-agent
|
|
Stanford's AMIE AI Wins 47% of Cardiology Cases Over Top Doctors
Gemini-powered AMIE halved clinical errors and beat unaided cardiologists 47% vs 33% in an RCT: healthcare AI just went clinical.
Feb 10 · healthcare, research, llm
|
|
DeepSeek V4: 1T-Param Coding Beast That Runs on Your Dual 4090s
1T-param coder hitting 90% HumanEval, 1M+ context, open-sourced—and it fits on consumer GPUs. Mid-Feb drop incoming.
Feb 10 · coding, open-source, llm
|
|
Open-Source Just Crushed GPT and Claude on PhD-Level Science Reviews
An open model beat human PhDs 51% of the time at literature reviews—now with a free API devs can build on today.
Feb 10 · open-source, llm, research
|
|
Anthropic's Claude Agents Hit Real Science Labs – TB-Scale Analysis in Hours
Claude-powered multi-agent systems just deployed at the Allen Institute, compressing months of genomics analysis into hours.
Feb 9 · agents, anthropic, science
|
|
Kona Crushes LLMs at Spatial Puzzles – 96% Solve Rate in 313ms
LLMs flop at 2% on spatial puzzles while this energy-based model solves 96% in milliseconds – proof autoregressive is broken for real reason
Feb 9 · reasoning, multimodal, research
|
|
TinyLoRA: Reasoning in Just 13 Parameters – The Fine-Tuning Hack That Crushes Benchmarks
What if you could unlock 91% reasoning accuracy on tough math benchmarks... by training only 13 parameters? Meta just made it real.
Feb 9 · llm, finetuning, reasoning
|