10.6k SFT Trajectories Match Full RL Pipeline; Mamba Beats LZMA
- 10.6k Curated Trajectories Match a Four-Stage RL Pipeline. OpenSeeker-v2 expands its knowledge graph and tool set and applies strict low-step filtering. Pure SFT on a 30B model beats Tongyi DeepResearch's full CPT+SFT+RL pipeline on BrowseComp, HLE, and xbench. The step worth investing in has moved from optimizers to trajectory synthesis.
- RL Post-Training Rollout Finally Has a Checklist. A new survey breaks the rollout lifecycle into four modules (Generate, Filter, Control, Replay), evaluates them along three axes (reliability, coverage, and cost sensitivity), and adds a diagnostic index that maps observed symptoms to the responsible module.
- A 120K-Parameter Mamba Beats LZMA on a Plain CPU. StateSMix combines online training, sparse n-gram modeling, and arithmetic coding in pure C, with no GPU needed. It beats xz -9e by 8.7% on 1MB of enwik8, but the gap shrinks to 0.7% at 10MB.
- Under $50 of Synthetic Data Lifts Open ASR to 3x Commercial Systems on Long-Tail Languages. Indic TTS synthesizes ~22K entity-dense utterances, and LoRA fine-tuning of Whisper-Telugu raises Entity-Hit-Rate from 0.027 to 0.473. A 20-utterance human sanity check addresses the same-TTS self-loop concern, i.e. that training and test audio come from the same synthesizer.
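The core StateSMix loop (an online-updated model predicts each next bit, and arithmetic coding turns those predictions into bytes) can be sketched in a few dozen lines. This is a minimal sketch under loud assumptions: the predictor is a hypothetical adaptive order-1 bit model standing in for the Mamba and sparse n-gram components, and the 32-bit carryless coder follows the classic PAQ-style design rather than anything confirmed about StateSMix. The names `BitModel`, `compress`, and `decompress` are illustrative. Because the decoder retrains the identical model on the bits it has already decoded, no model weights are stored in the output.

```python
# Minimal "online model + binary arithmetic coding" loop.
# Hypothetical stand-in: an adaptive order-1 bit model, NOT StateSMix's Mamba/n-gram mixer.

def _clamp(p: int) -> int:
    # Keep P(bit=1) strictly inside (0, 4096) so neither symbol gets a zero-width range.
    return min(max(p, 1), 4095)


class BitModel:
    """Adaptive order-1 bit predictor: P(next bit = 1) in 12-bit fixed point."""

    def __init__(self) -> None:
        # One estimate per (previous byte, bits-seen-so-far-in-this-byte) context.
        self.p = [2048] * (256 * 256)

    def predict(self, ctx: int) -> int:
        return _clamp(self.p[ctx])

    def update(self, ctx: int, bit: int) -> None:
        # Online learning: move the estimate 1/16 of the way toward the observed bit.
        target = 4095 if bit else 0
        self.p[ctx] += (target - self.p[ctx]) >> 4


def compress(data: bytes) -> bytes:
    model, out = BitModel(), bytearray()
    x1, x2, prev = 0, 0xFFFFFFFF, 0
    for byte in data:
        partial = 1  # leading 1 marks how many bits of this byte are already known
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            ctx = prev * 256 + partial
            xmid = x1 + ((x2 - x1) >> 12) * model.predict(ctx)
            if bit:
                x2 = xmid
            else:
                x1 = xmid + 1
            model.update(ctx, bit)
            partial = (partial << 1) | bit
            # Shift out leading bytes once both ends of the range agree on them.
            while ((x1 ^ x2) & 0xFF000000) == 0:
                out.append(x2 >> 24)
                x1 = (x1 << 8) & 0xFFFFFFFF
                x2 = ((x2 << 8) & 0xFFFFFFFF) | 0xFF
        prev = byte
    out += x1.to_bytes(4, "big")  # flush: x1 lies inside the final range
    return bytes(out)


def decompress(blob: bytes, n: int) -> bytes:
    model, out = BitModel(), bytearray()
    x1, x2, prev = 0, 0xFFFFFFFF, 0
    stream = iter(blob)
    x = 0
    for _ in range(4):
        x = (x << 8) | next(stream, 0)  # past the end of the stream, read zeros
    for _ in range(n):
        partial = 1
        for _ in range(8):
            ctx = prev * 256 + partial
            xmid = x1 + ((x2 - x1) >> 12) * model.predict(ctx)
            bit = 1 if x <= xmid else 0  # reproduce the encoder's split
            if bit:
                x2 = xmid
            else:
                x1 = xmid + 1
            model.update(ctx, bit)  # identical online update keeps models in sync
            partial = (partial << 1) | bit
            while ((x1 ^ x2) & 0xFF000000) == 0:
                x1 = (x1 << 8) & 0xFFFFFFFF
                x2 = ((x2 << 8) & 0xFFFFFFFF) | 0xFF
                x = ((x << 8) & 0xFFFFFFFF) | next(stream, 0)
        prev = partial & 0xFF
        out.append(prev)
    return bytes(out)
```

Swapping `BitModel` for a stronger predictor is the only change needed to improve the ratio: the coder's output stays close to the model's log-loss, so a better online model translates directly into fewer bytes.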
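Entity-Hit-Rate can be read as the fraction of reference entities that an ASR transcript gets right. The brief does not give the exact definition, so the sketch below assumes exact whole-word matching after Unicode (NFC) normalization and casefolding; `entity_hit_rate` and its input format are hypothetical.

```python
import unicodedata


def _norm(text: str) -> str:
    # NFC-normalize, casefold, and collapse whitespace so matching is not thrown off
    # by composed vs. decomposed characters or irregular spacing.
    return " ".join(unicodedata.normalize("NFC", text).casefold().split())


def entity_hit_rate(samples) -> float:
    """Fraction of reference entities found verbatim in the ASR hypothesis.

    samples: iterable of (hypothesis_text, list_of_reference_entities).
    Assumed definition: whole-word match after normalization (hypothetical).
    """
    hits = total = 0
    for hypothesis, entities in samples:
        padded = f" {_norm(hypothesis)} "  # pad so matches respect word boundaries
        for entity in entities:
            total += 1
            if f" {_norm(entity)} " in padded:
                hits += 1
    return hits / total if total else 0.0
```

On this reading, the reported move from 0.027 to 0.473 means going from fewer than 3 in 100 entities transcribed correctly to nearly half.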
Also Notable
- Multi-Turn Agent Training Environment for 10 Clinical Domains. Gymnasium-compatible, covering intake through treatment decisions; pairs with the earlier PhysicianBench evaluation layer.
- Diagnostic Agent Connected to Fitbit for Daily Self-Reported Symptoms. The performance drop when moving from curated cases to real-life self-reports is the data point worth recording.
- Workspace-Bench Targets Cross-File Dependencies. Workspace-level agent benchmark closer to real office workflows than single-file tasks.
- iWorld-Bench Adds a Large-Scale Evaluation for World Models. ICML-accepted interactive world model benchmark with a unified action-generation framework.
- PatRe Models Patent Examination as Multi-Round Office Actions Plus Rebuttals. It is the first simulation of this peer-review-style iterative process, moving beyond treating examination as static classification.
- Tencent AniMatrix Trains Anime's "Physics Violations" as a Prior. Smears, impact frames, and chibi shifts are exactly what physics-biased video models smooth out.
- Apple HeadsUp Does Feed-Forward 3D Gaussian Head Reconstruction. It builds on large-scale multi-camera capture, and the latent is engineered to be very compact.
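For readers unfamiliar with the Gymnasium interface, a multi-turn clinical environment comes down to `reset()` and `step()` calls in which each action is one conversational turn. The toy class below is a hypothetical illustration (invented cases, action strings, and reward), not the released environment; it only mirrors Gymnasium's return conventions and does not import the library.

```python
import random


class ToyIntakeEnv:
    """Hypothetical multi-turn clinical episode: ask about symptoms, then diagnose.

    Follows Gymnasium's return conventions (reset -> (obs, info);
    step -> (obs, reward, terminated, truncated, info)) without importing gymnasium.
    """

    CASES = {
        "flu": {"fever": "yes", "cough": "yes", "rash": "no"},
        "measles": {"fever": "yes", "cough": "no", "rash": "yes"},
    }

    def __init__(self, max_turns: int = 5) -> None:
        self.max_turns = max_turns
        self.rng = random.Random()
        self.diagnosis = None
        self.turn = 0

    def reset(self, *, seed=None, options=None):
        if seed is not None:
            self.rng.seed(seed)
        self.diagnosis = self.rng.choice(sorted(self.CASES))  # hidden ground truth
        self.turn = 0
        return "Patient: I don't feel well.", {"turn": 0}

    def step(self, action: str):
        self.turn += 1
        kind, _, arg = action.partition(":")
        if kind == "diagnose":
            # Terminal action: sparse reward for the correct diagnosis.
            correct = arg == self.diagnosis
            return "Episode over.", float(correct), True, False, {"correct": correct}
        # Any other action is treated as a question about the symptom named by `arg`.
        answer = self.CASES[self.diagnosis].get(arg, "not sure")
        truncated = self.turn >= self.max_turns  # turn budget exhausted
        return f"Patient: {answer}", 0.0, False, truncated, {"turn": self.turn}
```

Separating `terminated` (the agent committed to a diagnosis) from `truncated` (it ran out of turns) is what lets a trainer treat the two episode endings differently.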
Don't miss what's next. Subscribe to AI Research Brief: