TTT Is Linear Attention, Terminal Agent Data Recipe Goes Open
- Test-time training (TTT) architectures are formally equivalent to linear attention. NVIDIA's proof unifies two previously independent research communities and sharply narrows the design space for efficient sequence modeling (a toy derivation of the equivalence appears just after this list).
- The training data recipe for terminal agents is finally public. The recipe covers everything from seed task generation to skill composition and training-strategy comparisons, and the full dataset and model weights are open-sourced. An 8B model trained on it jumps from 2.5% to 13.0% accuracy.
- RL-trained vision agents have a laziness problem, and now there's an engineering fix. Oversampling plus cumulative tool rewards effectively stops interaction collapse, keeping models from degenerating into single-turn QA (a hypothetical reward-shaping sketch follows the list).
- Multi-modal retrieval's storage bottleneck gets a universal compression solution. Attention-guided clustering compresses document vectors to a fixed budget while preserving retrieval quality across text, image, and video (see the clustering sketch below the list).
- Google's Aletheia completes a math proof challenge fully autonomously. But 10 problems is nowhere near enough to draw conclusions. The maturity of math reasoning benchmarks may be the bigger bottleneck.
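A toy view of the TTT and linear-attention equivalence (my illustration, not NVIDIA's proof): take a TTT layer whose inner model is linear, f(x) = W x, with the inner-product loss l_t(W) = -v_t^T W k_t. One gradient step per token with step size 1 then reproduces the unnormalized linear-attention recurrence S_t = S_{t-1} + v_t k_t^T, o_t = S_t q_t.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 4, 8
Q, K, V = rng.normal(size=(3, T, d))  # toy queries, keys, values

S = np.zeros((d, d))  # linear-attention state: running sum of v k^T
W = np.zeros((d, d))  # TTT fast weights of the inner linear model
for q, k, v in zip(Q, K, V):
    S += np.outer(v, k)               # linear attention: accumulate outer products
    grad = -np.outer(v, k)            # d/dW of the inner loss -v^T W k
    W -= 1.0 * grad                   # one SGD step per token, step size 1
    assert np.allclose(S @ q, W @ q)  # per-token outputs match exactly
print("TTT (linear inner model) == unnormalized linear attention")
```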
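On the interaction-collapse fix: the paper's exact shaping isn't reproduced here, but a cumulative tool reward in this spirit might look like the sketch below (the function name, bonus weight, and cap are my assumptions). The idea is a small bonus per genuine tool call, capped so it can't be farmed, added on top of the terminal task reward; oversampling then keeps long multi-turn trajectories represented in each RL batch.

```python
def shaped_reward(turns, task_solved, per_call_bonus=0.05, bonus_cap=0.3):
    """Hypothetical cumulative tool reward (weights and cap are my
    assumptions, not the paper's values): pay a little for each genuine
    tool call so the policy keeps interacting with the environment
    instead of collapsing into single-turn QA."""
    tool_calls = sum(1 for t in turns if t.get("tool_call") is not None)
    tool_bonus = min(per_call_bonus * tool_calls, bonus_cap)  # capped: no farming
    return (1.0 if task_solved else 0.0) + tool_bonus
```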
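Finally, a minimal sketch of what attention-guided clustering to a fixed budget could look like (a simplification under my own assumptions, not the paper's method): weight each document token by its attention mass, cluster under those weights, and keep one attention-weighted pooled vector per cluster, so every document compresses to the same number of vectors regardless of length or modality.

```python
import numpy as np
from sklearn.cluster import KMeans

def compress_doc(token_vecs: np.ndarray, attn: np.ndarray, budget: int = 32) -> np.ndarray:
    """Compress (num_tokens, dim) token vectors to (budget, dim).
    `attn` holds one attention score per token; both the clustering and
    the pooling are weighted by it so salient tokens dominate.
    Assumes num_tokens >= budget."""
    w = attn / attn.sum()
    km = KMeans(n_clusters=budget, n_init=4, random_state=0)
    labels = km.fit_predict(token_vecs, sample_weight=w)
    pooled = np.zeros((budget, token_vecs.shape[1]))
    for c in range(budget):
        mask = labels == c
        pooled[c] = np.average(token_vecs[mask], axis=0, weights=w[mask])
    return pooled  # fixed storage per document, independent of token count
```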
Also Notable
- VLM Evaluation Upgrades From Single-Turn VQA to Causal Reasoning Hierarchies — Understanding how geometry, contact, and support relationships constrain possible actions. Source
- Agentic Data Synthesis Teaches VLMs and Diffusion Models to Fix Visual Artifacts — Not just detection: localization and repair. Source
- Text Rendering Quality Assessment Has a Blind Spot — Mainstream MLLMs and OCR models are nearly blind to structural anomalies. CVPR. Source
- Driving Scene World Model Uses Ray Space for 4D Spatiotemporal Reasoning — Berkeley team unifies spatial and temporal correlations. CVPR. Source
- First Open 3D Dataset for Spinal Motion Modeling — Bridging biomechanical simulation and computer vision. CVPR. Source
- CMU Quantifies How Much LLMs Memorize Personal Information From Training Data — Emails, phone numbers, IP addresses: leakage risk is more systematic than expected. Source
- Unified Detection, Localization, and Recovery for AI Face-Swaps via Watermarking — Goes beyond detection to restore tampered regions. CVPR. Source
- When the Preference Oracle Itself Is Noisy — A robust online alignment algorithm built for the realistic premise that feedback isn't reliable. Source
- Protein Language Models Learn Systematically Different Attention Patterns From NLP Transformers — Same architecture, different data domain, different computational strategies; the finding is then used to improve inference efficiency. Source