AI论文简报

February 20, 2026

Agents Rose from 80 to 90 Points, but the Failure Modes Didn't Change

  • Agent Accuracy Rose from 80 to 90, but Failure Modes Barely Changed. Tests on 14 models show that capability gains did not bring matching reliability gains. Demo-to-production decisions should hinge on failure conditions, not average accuracy.
  • VLM + Sim RL Bypasses the Demonstration Data Bottleneck. HERO lets humanoid robots manipulate never-seen objects zero-shot, cutting end-effector tracking error by 3.2x.
  • The Fast Weight Long-Context Bottleneck Is the Training Objective, Not the Architecture. Switching to next-sequence prediction combined with RL makes fixed-memory models practically competitive on long-context tasks for the first time.
  • Cold Start and Preference Drift, Solved in One Framework. Princeton's PAHF uses continual learning with dual feedback channels so agents keep up with shifting user preferences.

