10.6k SFT Trajectories Match Full RL Pipeline; Mamba Beats LZMA

May 8, 2026

10.6k Curated Trajectories Match a Four-Stage RL Pipeline. OpenSeeker-v2 expands the knowledge graph and tool set and applies strict low-step filtering. Pure SFT on a 30B model then beats Tongyi DeepResearch's full CPT+SFT+RL pipeline on BrowseComp, HLE, and xbench. The takeaway: the step worth investing in is shifting effort from optimizers to trajectory synthesis.

RL Post-Training Rollout Finally Has a Checklist. A new survey breaks the lifecycle into Generate/Filter/Control/Replay, with three-axis evaluation across reliability, coverage, and cost sensitivity, plus a symptom-to-module diagnostic index.

A 120K-Parameter Mamba Beats LZMA on a Plain CPU. StateSMix combines online training, sparse n-gram modeling, and arithmetic coding in pure C, with no GPU needed. It beats xz -9e by 8.7% on 1MB of enwik8, but the gap collapses to 0.7% by 10MB.
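To make the 8.7% figure concrete, here is a minimal sketch of how a relative size saving against an xz -9e baseline might be computed. It assumes the percentage means the reduction in compressed size relative to the baseline's output, which the summary does not state explicitly. The helpers `xz_9e_size` and `relative_saving` are hypothetical names, not part of StateSMix, and Python's standard `lzma` module with preset `9 | lzma.PRESET_EXTREME` stands in for the `xz -9e` command line.

```python
import lzma


def xz_9e_size(data: bytes) -> int:
    """Compressed size in bytes using the .xz format at preset 9 extreme."""
    return len(lzma.compress(data, preset=9 | lzma.PRESET_EXTREME))


def relative_saving(baseline_size: int, candidate_size: int) -> float:
    """Fraction by which the candidate output is smaller than the baseline."""
    if baseline_size <= 0:
        raise ValueError("baseline_size must be positive")
    return (baseline_size - candidate_size) / baseline_size


if __name__ == "__main__":
    sample = b"the quick brown fox jumps over the lazy dog\n" * 2000
    baseline = xz_9e_size(sample)
    # Hypothetical candidate size, used only to exercise the arithmetic.
    candidate = int(baseline * 0.95)
    saving = relative_saving(baseline, candidate)
    print(f"baseline={baseline} candidate={candidate} saving={saving:.1%}")
```

Under this reading, a candidate that produces 913 bytes where the baseline produces 1000 would count as an 8.7% saving.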

Under $50 of Synthetic Data Lifts Open ASR to 3x Commercial Systems on Long-Tail Languages. Indic TTS synthesizes ~22K entity-dense utterances, and LoRA fine-tuning Whisper-Telugu on them pushes Entity-Hit-Rate from 0.027 to 0.473. A 20-utterance human sanity check addresses the same-TTS self-loop concern.
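The summary does not define Entity-Hit-Rate. One plausible reading is the fraction of reference entities that appear in the ASR transcript. The sketch below assumes that definition with case-insensitive substring matching; `entity_hit_rate` is a hypothetical helper, not the paper's scorer, and the demo sentences are invented for illustration.

```python
from typing import Iterable, Sequence, Tuple


def entity_hit_rate(samples: Iterable[Tuple[str, Sequence[str]]]) -> float:
    """Share of reference entities found in their hypothesis transcript.

    samples: (hypothesis transcript, reference entities) pairs.
    Matching is a case-insensitive substring test (an assumption).
    """
    hits = 0
    total = 0
    for hypothesis, entities in samples:
        hyp = hypothesis.casefold()
        for entity in entities:
            total += 1
            if entity.casefold() in hyp:
                hits += 1
    # An empty evaluation set is reported as 0.0 rather than raising.
    return hits / total if total else 0.0


if __name__ == "__main__":
    demo = [
        ("book a cab to hyderabad airport", ["Hyderabad"]),
        ("call doctor ravi at apolo hospital", ["Ravi", "Apollo"]),
    ]
    print(f"{entity_hit_rate(demo):.3f}")
```

A stricter scorer might normalize transliteration variants or match on token boundaries; substring matching is only the simplest choice.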

Also Notable

Multi-Turn Agent Training Environment for 10 Clinical Domains. Gymnasium-compatible, covering intake through treatment decisions; pairs with the earlier PhysicianBench evaluation layer.
Diagnostic Agent Connected to Fitbit for Daily Self-Reported Symptoms. In the step from curated cases to real-life self-reports, the performance drop is the data point worth recording.
Workspace-Bench Targets Cross-File Dependencies. Workspace-level agent benchmark closer to real office workflows than single-file tasks.
iWorld-Bench Adds a Large-Scale Evaluation for World Models. ICML-accepted interactive world model benchmark with a unified action-generation framework.
PatRe Models Patent Examination as Multi-Round Office Action Plus Rebuttal. First simulation of a peer-review-style iterative process, breaking out of the static-classification view.
Tencent AniMatrix Trains Anime's "Physics Violations" as a Prior. Smear, impact frames, and chibi shifts are exactly what physics-biased video models smooth out.
Apple HeadsUp Does Forward 3D Gaussian Head Reconstruction. Multi-camera large-scale capture; the latent is engineered to be very tight.
