AI论文简报 (AI Paper Briefing)

February 14, 2026

11B Active Parameters Deliver Frontier-Level Agent Intelligence, and Coding Agent Evaluation Needs a New Track

  • 196B total parameters, only 11B active, and it matches GPT-5.2. Step 3.5 Flash uses MoE and RL to push agent efficiency to a new level, and its weights are open.
  • Coding agents can fix bugs, but can they build features? FeatureBench upgrades evaluation from single-PR fixes to end-to-end feature development. The best model passes just 11%.
  • Mistral releases Voxtral Realtime, a streaming speech recognition model that matches Whisper's offline transcription quality at 480ms latency. It is open-sourced under Apache 2.0.
  • Long-context reasoning gets a "brake pedal." GRU-Mem uses a gating mechanism so agents know when to update memory and when to stop, making inference up to 4x faster.

