12k Samples Beat Finance SOTA, CUDA Optimization 35% Faster
- Post-Training Data Matters More Than Model Size in Vertical Domains. A systematic ablation in finance shows that distillation quality control plus difficulty-aware sampling lets an 8B model beat the same-scale SOTA with just 12k RL samples (see the sampling sketch after this list).
- Offline RL Turns Agent Planning From Guesswork Into Engineering. Microsoft trains tool-call planners on quality-scored synthetic trajectories. The approach should transfer to other multi-step agent tasks.
- Models Shouldn't Be Locked to Fixed Weights After Deployment. Tencent's HY-WU introduces a functional memory module that generates instance-level weight updates in real time, avoiding test-time optimization overhead.
- LLM CUDA Kernel Optimization Expands to General HPC. A new benchmark, MSKernelBench, covers four task categories. A multi-agent architecture produces kernels that run 35% faster overall than those from existing methods.
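On the finance result above: a minimal sketch of what difficulty-aware sampling can look like in practice. This is an illustration under assumptions, not the paper's method; the pass-rate-based difficulty estimate, the mid-difficulty weight curve, and every name below are hypothetical.

```python
import random

def estimate_difficulty(pass_rate: float) -> float:
    """Map a base-model pass rate in [0, 1] to a difficulty score in [0, 1]."""
    return 1.0 - pass_rate

def sampling_weight(difficulty: float) -> float:
    """Peak weight at mid difficulty; trivial and near-impossible items get ~0."""
    return max(0.0, 1.0 - abs(difficulty - 0.5) * 2.0)

def difficulty_aware_sample(pool: list[dict], k: int, seed: int = 0) -> list[dict]:
    """Weighted sampling without replacement (Efraimidis-Spirakis top-k).

    Each item carries a 'pass_rate': the fraction of base-model attempts
    that solved the problem, measured before RL training.
    """
    rng = random.Random(seed)

    def key(item: dict) -> float:
        w = sampling_weight(estimate_difficulty(item["pass_rate"]))
        # Zero-weight items sort to the bottom and are never selected.
        return rng.random() ** (1.0 / w) if w > 0 else float("-inf")

    return sorted(pool, key=key, reverse=True)[:k]

# Example: distill a large candidate pool down to a 12k-sample RL training set.
pool = [{"id": i, "pass_rate": random.Random(i).random()} for i in range(100_000)]
rl_set = difficulty_aware_sample(pool, k=12_000)
```

The weighted top-k trick is standard; the interesting design choice is the weight curve, which concentrates the RL budget on problems the base model sometimes, but not always, solves.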
Also Notable
- RL Agent Autonomously Runs Architecture Search Until Convergence. Bold idea, but validation scale is still small.
- Activation Steering Controls Pathological Features in Endoscopy Images Without Training or Fine-Tuning. Steering activations inside diffusion models generates causal training data (see the steering sketch after this list).
- RLVR Reasoning Chains Are Full of Redundant Steps; Re-Solving Sends Models Back to Key Nodes. Both efficiency and quality improve (ICLR).
- Slide Auto-Generation Finally Gets a Fine-Grained Rubric Benchmark. Covers layout, content, and visual consistency.
- Mila's Planet-Scale 4D Spatiotemporal World Model. Extends multi-resolution hash encoding into time for self-supervised representations across centuries and continents.
- Long Video Understanding Has a Credibility Problem: VLMs Answer Confidently Even When Key Frames Are Missing. Evaluation scores are inflated (CVPR).
- RAG Applied to Gene Perturbation Response Prediction. Cross-cell-type generalization significantly outperforms pure deep learning methods (ICLR).
- Conformal Prediction Meets Generative Molecular Design. Statistical guarantees without an oracle (ICLR).
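On the activation-steering item above: a minimal sketch of the general technique using a PyTorch forward hook, under assumptions. The model, layer, and steering direction are placeholders for illustration; the paper's actual setup for endoscopy diffusion models is not reproduced here.

```python
import torch

def make_steering_hook(direction: torch.Tensor, strength: float):
    """Build a forward hook that shifts a layer's output along `direction`.

    `direction` must broadcast against the layer's output; e.g. a (channels,)
    vector would be reshaped to (1, channels, 1, 1) for a 4D feature map.
    """
    direction = direction / direction.norm()  # steer along a unit vector

    def hook(module, inputs, output):
        # Returning a tensor from a forward hook replaces the module's output.
        return output + strength * direction.to(output.device, output.dtype)

    return hook

# Hypothetical usage on a denoising network; `unet`, the chosen submodule,
# and the probe-derived direction are assumptions for illustration:
# direction = pathological_feature_mean - normal_feature_mean
# handle = unet.mid_block.register_forward_hook(make_steering_hook(direction, 2.0))
# ... run the usual diffusion sampling loop ...
# handle.remove()  # detach the hook to restore unsteered behavior
```

This is the appeal the headline points at: no gradients, no weight changes, and removing the hook restores default behavior.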