AI Research Brief

April 4, 2026

4M Game Frames Train Rendering, Internalized Skills Beat Retrieval

  • Discrete Tokens Are LLMs' Architectural Ceiling, Not an Optimization Target. A survey traces four technical threads showing core computation migrating from token sequences to continuous latent space.
  • Agent Skills Work Better Internalized via RL Than Retrieved at Runtime. SKILL0's progressive withdrawal curriculum improves ALFWorld performance by 9.7% while using under 500 tokens per step at inference.
  • AAA Game Engines Are a Hidden Data Goldmine for Generative Rendering. 4 million synchronized RGB + G-buffer frames produce models that clearly outperform existing solutions on cross-dataset generalization.
  • Visual Features Can Be Steered in Real Time with Text Prompts. Injecting cross-attention inside ViT encoder layers enables zero-shot generalization to anomaly detection without degrading general capabilities.
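To make the text-steering idea concrete, here is a minimal sketch of a cross-attention block placed inside an encoder layer: visual tokens act as queries and text-prompt tokens supply keys and values. This is not the paper's code. The function names, the dimensions, and the zero-initialized residual `gate` are all illustrative assumptions; the gate is just one plausible way such an injection could leave the encoder's original features untouched until the prompt is allowed to steer them.

```python
import math
import random

def matmul(a, b):
    # a: n x k, b: k x m -> n x m
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def softmax(row):
    # Subtract the max for numerical stability.
    mx = max(row)
    exps = [math.exp(x - mx) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(visual, text, wq, wk, wv):
    """Visual tokens (queries) attend to text-prompt tokens (keys/values)."""
    q = matmul(visual, wq)            # n_vis x d_head
    k = matmul(text, wk)              # n_txt x d_head
    v = matmul(text, wv)              # n_txt x d_model
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))  # n_vis x n_txt
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights

def steered_block(visual, text, wq, wk, wv, gate=0.0):
    """Hypothetical residual injection: gate=0 passes visual features through unchanged."""
    attended, _ = cross_attention(visual, text, wq, wk, wv)
    return [[x + gate * a for x, a in zip(vrow, arow)]
            for vrow, arow in zip(visual, attended)]

# Toy usage: 3 visual tokens, 2 text tokens, random (untrained) projections.
random.seed(0)
d_model, d_head = 4, 2
rand = lambda r, c: [[random.gauss(0, 1) for _ in range(c)] for _ in range(r)]
visual, text = rand(3, d_model), rand(2, d_model)
wq, wk, wv = rand(d_model, d_head), rand(d_model, d_head), rand(d_model, d_model)
steered = steered_block(visual, text, wq, wk, wv, gate=0.5)
```

Because the text tokens only enter through a gated residual path, changing the prompt changes which visual features are emphasized without retraining the encoder itself, which is the intuition behind "steering in real time."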

Also Notable

  • Cross-Modal Reasoning in Latent Space — avoids the information loss of translating visual content to text. LatentUM
  • Multiple LLM Agents Autonomously Explore, Reflect, and Collaborate on Open Problems — no more hardcoded search rules. CORAL
  • Near-Identity Distractors Remove Background Dependency from Visual Encoders — identity representations that actually focus on the subject. NearID
  • Video Inpainting Beyond Filling Gaps — when the removed object has physical interactions, the entire scene's causal chain needs re-reasoning. VOID
  • Autonomous Driving VLAs Can't Do Spatial Awareness and Semantic Reasoning at Once — an attempt to unify both in a single framework. UniDriveVLA
  • 3D Textures as an Adversarial Attack Surface — closer to real deployment than 2D patches, a warning sign for VLA model robustness. Tex3D
  • Bridging 3D Data Scarcity with 2D Generation — a foundation model unifying text-to-2D and text-to-3D generation. Omni123
  • Graph-Based Synthesis of Cross-Modal Multi-Hop Reasoning Data — addressing the single-image limitation of existing multimodal benchmarks. CRIT
  • Arbitrary-Resolution Images in a Single Forward Pass — freeing ViTs from pretrained resolution constraints on dense prediction tasks. SPAR
  • Visual Riddles Test Visual Reasoning — when images are clues rather than answers, current models' cognitive abilities drop off a cliff. RebusBench
