Frontier-level agent intelligence from 11B active parameters: time for coding agent evaluation to change tracks
- 196B parameters, only 11B active, and it matches GPT-5.2. Step 3.5 Flash uses MoE + RL to push agent efficiency to a new high, and its weights are open.
- Coding agents can fix bugs, but can they build features? FeatureBench upgrades evaluation from single-PR fixes to end-to-end feature development. The best model passes just 11%.
- Mistral releases Voxtral Realtime, a streaming speech recognition model. At 480 ms latency it matches Whisper's offline transcription quality, and it is open-sourced under Apache 2.0.
- Long-context reasoning gets a "brake pedal." GRU-Mem uses a gating mechanism so agents know when to update memory and when to stop, making inference up to 4x faster.