11B Parameters Reach Frontier-Level Agent Intelligence, and Coding Agent Evaluation Needs a New Track

February 14, 2026


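Step 3.5 Flash has 196B total parameters but activates only 11B, and it is positioned against GPT-5.2. It combines a mixture-of-experts (MoE) architecture with reinforcement learning (RL) to push agent efficiency to a new level, and its weights are open.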

Coding agents can fix bugs, but can they build features? FeatureBench upgrades evaluation from single-PR fixes to end-to-end feature development. The best model passes just 11%.

Mistral releases Voxtral Realtime, a streaming speech recognition model. At 480 ms latency it matches Whisper's offline transcription quality, and it is open-sourced under Apache 2.0.

Long-context reasoning gets a "brake pedal." GRU-Mem uses a gating mechanism so agents know when to update memory and when to stop, with inference up to 4x faster.
