AI Weekly: 21 Stories You Shouldn't Miss (Feb 15)


            
        February 15, 2026
    
    


                Logs of a Thinking Machine
              

                Weekly AI Digest · Feb 8 - Feb 15
              

                Hey there! Here's what happened in AI this week — 21 stories curated just for you.
              

          Kimi's Agent Swarm: 100 AI Workers Crushing Complex Tasks (And It's Live)
        

          Kimi K2.5 unleashes ~100 sub-agents per task—your single-model bottlenecks are over.
        

          Feb 15 · agents, multi-agent, llm
        

          Prime Intellect Open-Sourced INTELLECT-3: Full 106B MoE Stack + RL Training
        

          106B MoE model dominating math/code + they released the ENTIRE training stack—your custom RL agent era starts now.
        

          Feb 15 · open-source, reasoning, rl
        

          GLM-5 Just Dropped: The Open Model Crushing Gemini at Half the Price
        

          744B params, tops every open benchmark, and costs just $0.80/M tokens—did Z.ai finally crack frontier performance for devs?
        

          Feb 15 · llm, open-source, benchmarks
        

          MIT's AI Inflection: Multimodal Models Are About to Crack General Scientific Intelligence
        

          Rafael Gómez-Bombarelli says we're at science's 'second inflection'—AI reasoning over text, structures, and recipes to invent materials.
        

          Feb 14 · ai-research, science, multimodal
        

          This AI Framework Predicts Brand-New Alloys With 92% Accuracy—Science Just Got 10x Faster
        

          JAIST's new AI fuses expert knowledge from papers into a framework that discovers high-entropy alloys even for unseen elements.
        

          Feb 14 · ai-research, materials-science, llm
        

          Vision-Language-Action Models Just Made Camera-Only Robots Viable (No LiDAR Needed)
        

          Forget expensive LiDAR—new AI models let robots 'see' and act like humans using just cameras, slashing costs for autonomous fleets.
        

          Feb 14 · robotics, autonomous, vision-language
        

          Privacy Warnings in Your AI Chat? This New Research Makes It Real (And Local)
        

          New dataset + models detect privacy leaks in prompts before you hit send—running tiny on your phone.
        

          Feb 13 · privacy, llm-safety, research
        

          AI Referrals Are Dying—But Here's the Real Shift Publishers Must Master Now
        

          AI usage explodes, but referrals tank—not from bad demand, but because models now answer everything themselves.
        

          Feb 13 · ai-search, seo, publishers
        

          Z.ai's GLM-5 Just Dethroned Every Open Weights LLM (And It's Actually Usable)
        

          Open-source just hit a new high: GLM-5 crushes benchmarks with the lowest hallucinations ever—your next production model?
        

          Feb 13 · llm, open-source, benchmarks
        

          TELUS Drops Bomb: Follow-Up Prompts Actually Hurt Top LLMs Like GPT-5.2 and Claude 4.5
        

          Challenging GPT-5.2 or Claude? A new benchmark shows it backfires, even flipping correct answers to wrong ones. Time to rethink your prompting?
        

          Feb 12 · prompting, llm-evaluation, robustness
        

          DeepSeek Math-V2: Open 685B Model Grabs Math Gold - Devs, Your Calculators Are Obsolete
        

          Gold on IMO and Putnam from a free 685B open model? DeepSeek just made elite math reasoning accessible to every dev.
        

          Feb 12 · math, reasoning, open-source
        

          Z.ai's Massive GLM-5 Drops: 744B Params of Open Power You Can Actually Use
        

          A Chinese giant just unleashed a 744B-param beast that's open for devs to grab - is this the GPT-killer we've been waiting for?
        

          Feb 12 · llm, open-source, glm
        

          Anthropic's 'Anonymous' AI Interviews? An LLM De-Anonymized Them in Minutes
        

          Anthropic released 1,250 'safe' anonymized interviews. A prof used a stock LLM to unmask 25%—exposing a massive privacy wake-up call for AI 
        

          Feb 11 · privacy, anthropic, llm-risks
        

          LLMs Just Cracked 'Uniquely Human' Language Skills—And Built ConlangCrafter to Prove It
        

          Turns out, you don't need to be human to master metalinguistic analysis—LLMs do it better, and now generate entire artificial languages on d
        

          Feb 11 · llm-research, linguistics, openai
        

          Google DeepMind Just Open-Sourced the Tool That Lets You Study AI in Group Chats
        

          What if LLMs don't just chat one-on-one, but deliberate, negotiate, and sway entire groups? DeepMind's new open-source platform makes it dea
        

          Feb 11 · open-source, research, multi-agent
        

          Stanford's AMIE AI Wins 47% of Cardiology Cases Over Top Doctors
        

          Gemini-powered AMIE halved clinical errors and beat unaided cardiologists 47% vs 33% in an RCT: healthcare AI just went clinical.
        

          Feb 10 · healthcare, research, llm
        

          DeepSeek V4: 1T-Param Coding Beast That Runs on Your Dual 4090s
        

          1T-param coder hitting 90% HumanEval, 1M+ context, open-sourced—and it fits on consumer GPUs. Mid-Feb drop incoming.
        

          Feb 10 · coding, open-source, llm
        

          Open-Source Just Crushed GPT and Claude on PhD-Level Science Reviews
        

          An open model beat human PhDs 51% of the time at literature reviews—now with a free API devs can build on today.
        

          Feb 10 · open-source, llm, research
        

          Anthropic's Claude Agents Hit Real Science Labs – TB-Scale Analysis in Hours
        

          Claude-powered multi-agent systems just deployed to Allen Institute: compressing months of genomics analysis into hours.
        

          Feb 9 · agents, anthropic, science
        

          Kona Crushes LLMs at Spatial Puzzles – 96% Solve Rate in 313ms
        

          LLMs flop at 2% on spatial puzzles while this energy-based model solves 96% in milliseconds – proof autoregressive is broken for real reason
        

          Feb 9 · reasoning, multimodal, research
        

          TinyLoRA: Reasoning in Just 13 Parameters – The Fine-Tuning Hack That Crushes Benchmarks
        

          What if you could unlock 91% reasoning accuracy on tough math benchmarks... by training only 13 parameters? Meta just made it real.
        

          Feb 9 · llm, finetuning, reasoning
        
