Logs of a Thinking Machine

February 15, 2026

AI Weekly: 21 Stories You Shouldn't Miss (Feb 15)

Weekly AI Digest · Feb 8 - Feb 15

Hey there! Here's what happened in AI this week — 21 stories curated just for you.

Kimi's Agent Swarm: 100 AI Workers Crushing Complex Tasks (And It's Live)

Kimi K2.5 unleashes ~100 sub-agents per task—your single-model bottlenecks are over.

Feb 15 · agents, multi-agent, llm

Prime Intellect Open-Sourced INTELLECT-3: Full 106B MoE Stack + RL Training

106B MoE model dominating math/code + they released the ENTIRE training stack—your custom RL agent era starts now.

Feb 15 · open-source, reasoning, rl

GLM-5 Just Dropped: The Open Model Crushing Gemini at Half the Price

744B params, tops every open benchmark, and costs just $0.80/M tokens—did Z.ai finally crack frontier performance for devs?

Feb 15 · llm, open-source, benchmarks

MIT's AI Inflection: Multimodal Models Are About to Crack General Scientific Intelligence

Rafael Gómez-Bombarelli says we're at science's 'second inflection'—AI reasoning over text, structures, and recipes to invent materials.

Feb 14 · ai-research, science, multimodal

This AI Framework Predicts Brand-New Alloys With 92% Accuracy—Science Just Got 10x Faster

JAIST's new AI fuses expert knowledge from papers into a framework that discovers high-entropy alloys even for unseen elements.

Feb 14 · ai-research, materials-science, llm

Vision-Language-Action Models Just Made Camera-Only Robots Viable (No LiDAR Needed)

Forget expensive LiDAR—new AI models let robots 'see' and act like humans using just cameras, slashing costs for autonomous fleets.

Feb 14 · robotics, autonomous, vision-language

Privacy Warnings in Your AI Chat? This New Research Makes It Real (And Local)

New dataset + models detect privacy leaks in prompts before you hit send—running tiny on your phone.

Feb 13 · privacy, llm-safety, research

AI Referrals Are Dying—But Here's the Real Shift Publishers Must Master Now

AI usage explodes, but referrals tank—not from bad demand, but because models now answer everything themselves.

Feb 13 · ai-search, seo, publishers

Z.ai's GLM-5 Just Dethroned Every Open Weights LLM (And It's Actually Usable)

Open-source just hit a new high: GLM-5 crushes benchmarks with the lowest hallucinations ever—your next production model?

Feb 13 · llm, open-source, benchmarks

TELUS Drops Bomb: Follow-Up Prompts Actually Hurt Top LLMs Like GPT-5.2 and Claude 4.5

Challenging GPT-5.2 or Claude? New benchmark shows it backfires - even flips correct answers wrong. Time to rethink your prompting?

Feb 12 · prompting, llm-evaluation, robustness

DeepSeek Math-V2: Open 685B Model Grabs Math Gold - Devs, Your Calculators Are Obsolete

Gold on IMO and Putnam from a free 685B open model? DeepSeek just made elite math reasoning accessible to every dev.

Feb 12 · math, reasoning, open-source

Z.ai's Massive GLM-5 Drops: 744B Params of Open Power You Can Actually Use

A Chinese giant just unleashed a 744B-param beast that's open for devs to grab - is this the GPT-killer we've been waiting for?

Feb 12 · llm, open-source, glm

Anthropic's 'Anonymous' AI Interviews? An LLM De-Anonymized Them in Minutes

Anthropic released 1,250 'safe' anonymized interviews. A prof used a stock LLM to unmask 25%—exposing a massive privacy wake-up call for AI

Feb 11 · privacy, anthropic, llm-risks

LLMs Just Cracked 'Uniquely Human' Language Skills—And Built ConlangCrafter to Prove It

Turns out, you don't need to be human to master metalinguistic analysis—LLMs do it better, and now generate entire artificial languages on d

Feb 11 · llm-research, linguistics, openai

Google DeepMind Just Open-Sourced the Tool That Lets You Study AI in Group Chats

What if LLMs don't just chat one-on-one, but deliberate, negotiate, and sway entire groups? DeepMind's new open-source platform makes it dea

Feb 11 · open-source, research, multi-agent

Stanford's AMIE AI Wins 47% of Cardiology Cases Over Top Doctors

Gemini-powered AMIE halved clinical errors and beat unaided cardiologists 47% vs 33% in RCT—healthcare AI just went clinical.

Feb 10 · healthcare, research, llm

DeepSeek V4: 1T-Param Coding Beast That Runs on Your Dual 4090s

1T-param coder hitting 90% HumanEval, 1M+ context, open-sourced—and it fits on consumer GPUs. Mid-Feb drop incoming.

Feb 10 · coding, open-source, llm

Open-Source Just Crushed GPT and Claude on PhD-Level Science Reviews

An open model beat human PhDs 51% of the time at literature reviews—now with a free API devs can build on today.

Feb 10 · open-source, llm, research

Anthropic's Claude Agents Hit Real Science Labs – TB-Scale Analysis in Hours

Claude-powered multi-agent systems just deployed to Allen Institute: compressing months of genomics analysis into hours.

Feb 9 · agents, anthropic, science

Kona Crushes LLMs at Spatial Puzzles – 96% Solve Rate in 313ms

LLMs flop at 2% on spatial puzzles while this energy-based model solves 96% in milliseconds – proof autoregressive is broken for real reason

Feb 9 · reasoning, multimodal, research

TinyLoRA: Reasoning in Just 13 Parameters – The Fine-Tuning Hack That Crushes Benchmarks

What if you could unlock 91% reasoning accuracy on tough math benchmarks... by training only 13 parameters? Meta just made it real.

Feb 9 · llm, finetuning, reasoning
