12k Samples Beat Finance SOTA, CUDA Optimization 35% Faster
- Post-Training Data Matters More Than Model Size in Vertical Domains. A systematic ablation in finance shows that distillation quality control plus difficulty-aware sampling lets an 8B model beat the same-scale SOTA with just 12k RL samples (see the sampling sketch after this list).
- Offline RL Turns Agent Planning From Guesswork Into Engineering. Microsoft trains tool-call planners on quality-scored synthetic trajectories. The approach should transfer to other multi-step agent tasks.
- Models Shouldn't Be Locked to Fixed Weights After Deployment. Tencent's HY-WU introduces a functional memory module that generates instance-level weight updates in real time, avoiding test-time optimization overhead.
- LLM CUDA Kernel Optimization Expands to General HPC. A new benchmark, MSKernelBench, covers four task categories. A multi-agent architecture produces kernels that run 35% faster overall than those from existing methods.
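On the finance result above: a minimal sketch of what difficulty-aware sampling can look like in practice. This is an illustration under assumptions, not the paper's method; the pass-rate-based difficulty estimate, the mid-difficulty weight curve, and every name below are hypothetical.

```python
import random

def estimate_difficulty(pass_rate: float) -> float:
    """Map a base-model pass rate in [0, 1] to a difficulty score in [0, 1]."""
    return 1.0 - pass_rate

def sampling_weight(difficulty: float) -> float:
    """Peak weight at mid difficulty; trivial and near-impossible items get ~0."""
    return max(0.0, 1.0 - abs(difficulty - 0.5) * 2.0)

def difficulty_aware_sample(pool: list[dict], k: int, seed: int = 0) -> list[dict]:
    """Weighted sampling without replacement (Efraimidis-Spirakis top-k).

    Each item carries a 'pass_rate': the fraction of base-model attempts
    that solved the problem, measured before RL training.
    """
    rng = random.Random(seed)

    def key(item: dict) -> float:
        w = sampling_weight(estimate_difficulty(item["pass_rate"]))
        # Zero-weight items sort to the bottom and are never selected.
        return rng.random() ** (1.0 / w) if w > 0 else float("-inf")

    return sorted(pool, key=key, reverse=True)[:k]

# Example: distill a large candidate pool down to a 12k-sample RL training set.
pool = [{"id": i, "pass_rate": random.Random(i).random()} for i in range(100_000)]
rl_set = difficulty_aware_sample(pool, k=12_000)
```

The weighted top-k trick is standard; the interesting design choice is the weight curve, which concentrates the RL budget on problems the base model sometimes, but not always, solves.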
Also Notable
- RL Agent Autonomously Runs Architecture Search Until Convergence. Bold idea, but validation scale is still small.
- Activation Steering Controls Pathological Features in Endoscopy Images Without Training or Fine-Tuning. Steering activations inside diffusion models generates causal training data (see the steering sketch after this list).
- RLVR Reasoning Chains Are Full of Redundant Steps; Re-Solving Sends Models Back to Key Nodes. Both efficiency and quality improve (ICLR).
- Slide Auto-Generation Finally Gets a Fine-Grained Rubric Benchmark. Covers layout, content, and visual consistency.
- Mila's Planet-Scale 4D Spatiotemporal World Model. Extends multi-resolution hash encoding into time for self-supervised representations across centuries and continents.
- Long Video Understanding Has a Credibility Problem: VLMs Answer Confidently Even When Key Frames Are Missing. Evaluation scores are inflated (CVPR).
- RAG Applied to Gene Perturbation Response Prediction. Cross-cell-type generalization significantly outperforms pure deep learning methods (ICLR).
- Conformal Prediction Meets Generative Molecular Design. Statistical guarantees without an oracle (ICLR).
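On the activation-steering item above: a minimal sketch of the general technique using a PyTorch forward hook, under assumptions. The model, layer, and steering direction are placeholders for illustration; the paper's actual setup for endoscopy diffusion models is not reproduced here.

```python
import torch

def make_steering_hook(direction: torch.Tensor, strength: float):
    """Build a forward hook that shifts a layer's output along `direction`.

    `direction` must broadcast against the layer's output; e.g. a (channels,)
    vector would be reshaped to (1, channels, 1, 1) for a 4D feature map.
    """
    direction = direction / direction.norm()  # steer along a unit vector

    def hook(module, inputs, output):
        # Returning a tensor from a forward hook replaces the module's output.
        return output + strength * direction.to(output.device, output.dtype)

    return hook

# Hypothetical usage on a denoising network; `unet`, the chosen submodule,
# and the probe-derived direction are assumptions for illustration:
# direction = pathological_feature_mean - normal_feature_mean
# handle = unet.mid_block.register_forward_hook(make_steering_hook(direction, 2.0))
# ... run the usual diffusion sampling loop ...
# handle.remove()  # detach the hook to restore unsteered behavior
```

This is the appeal the headline points at: no gradients, no weight changes, and removing the hook restores default behavior.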