$15 Per Paper, Healthcare Agents Cap at 28%

May 22, 2026

Auto-Research Cost Curve Has Crossed. A full paper now costs $15 to produce, but frontier LLMs still fabricate results and miss errors. End-to-end autonomy remains short of the conference acceptance bar.

OProver Pulls the Compiler Loop Into Training. Failed trajectories plus verifier-repaired proofs feed SFT directly. MiniF2F 93.3% Pass@32 puts it in the current top tier among open whole-proof provers.

CHI-Bench Tests Policy Density, Role Switching, and Mid-Task Dialogue Together. The best agent configuration reaches only 28%. Under strict pass^3 scoring, every agent stays under 20%.
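
For readers unfamiliar with the two metrics quoted above, here is a minimal sketch of how Pass@k and pass^k are commonly estimated from repeated trials. It assumes the usual combinatorial estimators (Pass@k: at least one of k sampled attempts succeeds; pass^k: all k sampled attempts succeed). Whether OProver and CHI-Bench compute their numbers exactly this way is an assumption, and the function names and the example counts are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k attempts succeeds,
    given c successes observed in n independent trials of one task."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the k-subsets made up entirely of failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that all k attempts succeed (strict pass^k),
    given c successes observed in n independent trials of one task."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(c, k) counts the k-subsets made up entirely of successes.
    return comb(c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical task: 8 trials, 4 of them successful.
    n, c, k = 8, 4, 3
    print(f"pass@{k} = {pass_at_k(n, c, k):.3f}")
    print(f"pass^{k} = {pass_hat_k(n, c, k):.3f}")
```

The same 50% per-trial success rate looks strong under Pass@3 and weak under pass^3, which is why a strict consistency metric can push every agent below a threshold that a single-attempt score clears.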

CompactAttention Targets the Chunked Prefill Gap. Demoting the 2D block-sparse mask from execution plan to KV selection signal gets 2.72x attention speedup at 128K context, with dense-equivalent accuracy.

Also Notable

Tool-Using Agents Tested in One Real-Work Pipeline. Real professional tasks expose end-to-end failure modes in tool-using agents.
Training-Free N-Gram Memory Module. Plug-and-play path for MoE schemes and trainable memory embeddings.
Auto-Generated Abstract Reasoning Tasks, Formally Verifiable. Sidesteps human annotation cost and memorization contamination, so accuracy scores are no longer skewed by data leakage.
SFT That Adds New Knowledge Without Losing Old Capability. Distribution-aligned self-distillation without an external teacher. Post-training stops trading old capability for new.
GPU Kernel Agent With Generalization-Aware Evaluation. Pushes kernel agents from single-point capability tests to unseen-config generalization.
Expert-Guided Merging Then Quantization. Compresses model merging and quantization into one low-resource deployment pipeline.
