AI Research Brief

May 22, 2026

$15 Per Paper, Healthcare Agents Cap at 28%

  • Auto-Research Cost Curve Has Crossed. $15 produces a full paper, but frontier LLMs still fabricate results and miss errors, so end-to-end autonomy falls short of the conference acceptance bar.
  • OProver Pulls the Compiler Loop Into Training. Failed trajectories plus verifier-repaired proofs feed SFT directly. MiniF2F 93.3% Pass@32 puts it in the current top tier among open whole-proof provers.
  • CHI-Bench Tests Policy Density, Role Switching, and Mid-Task Dialogue Together. The best agent config clears only 28%. Strict pass^3 keeps everyone under 20%.
  • CompactAttention Targets the Chunked Prefill Gap. Demoting the 2D block-sparse mask from execution plan to KV selection signal gets 2.72x attention speedup at 128K context, with dense-equivalent accuracy.
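A note on the two metrics quoted above, since they pull in opposite directions. The sketch below assumes the commonly used definitions, which this brief does not spell out: Pass@k is the chance that at least one of k sampled attempts succeeds, and pass^k is the chance that all k sampled attempts succeed. The function names are hypothetical, and OProver or CHI-Bench may compute their numbers differently.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that at least one of k attempts, drawn without replacement
    from n recorded attempts (c of them correct), is correct.
    Assumed definition; not taken from the papers summarized here."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Chance that all k attempts, drawn without replacement from n recorded
    attempts (c of them correct), are correct. Assumed definition of pass^k."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    return comb(c, k) / comb(n, k)


# A task solved in 2 of 3 recorded trials counts fully under Pass@3
# and not at all under pass^3.
lenient = pass_at_k(n=3, c=2, k=3)
strict = pass_hat_k(n=3, c=2, k=3)
```

Under these definitions a single failed trial zeroes out a task's pass^3 score, which is consistent with strict pass^3 sitting below the headline pass rate.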

Also Notable

  • Tool-Using Agents Tested in One Real-Work Pipeline. Real professional tasks expose the end-to-end failure modes of tool-using agents.
  • Training-Free N-Gram Memory Module. Plug-and-play path for MoE schemes and trainable memory embeddings.
  • Auto-Generated Abstract Reasoning Tasks, Formally Verifiable. Sidesteps human annotation cost and memorization contamination, so accuracy scores are no longer distorted by data leakage.
  • SFT That Adds New Knowledge Without Losing Old Capability. Distribution-aligned self-distillation without an external teacher. Post-training stops trading old capability for new.
  • GPU Kernel Agent With Generalization-Aware Evaluation. Pushes kernel agents from single-point capability tests to unseen-config generalization.
  • Expert-Guided Merging Then Quantization. Compresses model merging and quantization into one low-resource deployment pipeline.
