AI Research Brief

April 11, 2026

Scrambled Media Boosts Reasoning; 6B Model Tops GPT-4o

  • Agent Skills Should Self-Evolve From User Populations. SkillClaw turns multi-user interaction traces into skill evolution signals. One user's correction auto-syncs to everyone, giving agent systems organizational memory.
  • Smart Compression Beats Brute-Force Context Windows. Tempo uses a 6B model to select key frames per query. Under an 8K token budget, it outperforms GPT-4o and Gemini 1.5 Pro.
  • Lighting Becomes a First-Class Citizen in Video Generation. LiVER decouples lighting, layout, and camera motion through a physics renderer. Accepted at CVPR, targeting professional production workflows.
  • Scramble Audio and Video, Let the Model Reassemble. OmniJigsaw uses zero-annotation temporal reordering as a proxy task, forcing models to integrate audiovisual signals. Validated across 15 benchmarks.

Also Notable

  • 170K Style Descriptions + 400K Prompts Build a Scalable Data Pipeline. Uses generative models' own style consistency to solve the data bottleneck for style transfer.
  • RLVR Improves Accuracy but Degrades Reasoning Chains. Chain-of-thought rationales decouple from the visual evidence: correct answers don't guarantee correct reasoning.
  • Virtual Try-On Starts Caring About Fit. The first try-on dataset with precise sizing annotations, evaluating whether garments actually fit rather than just how they look overlaid.
  • Gradient-Signal-Driven Adaptive Layer Sampling. Achieves near-full-parameter fine-tuning results with half the memory (ACL).
  • Stronger LLMs Cooperate Less Under Zero-Cost Collaboration. Cooperation failure in multi-agent systems is a real risk (ICLR).
  • Agent Reward Models Can't Just Evaluate Single Steps. They need to assess entire planning trajectories (ACL).
  • Annotation-Free Medical Visual Reasoning. Agentic RL lets models autonomously locate visual evidence before making judgments (ICLR).
  • More Training Data Isn't Always Better for Search Agents. A hierarchical experience framework filters high-value trajectories from random exploration.
  • Testing VLM Long-Horizon Interaction in a Pokémon 3D Environment. Closer to real agent deployment than static image-text benchmarks.
  • Continuously Editing VLM Knowledge Without Forgetting. A subspace alignment method preserves old concepts during updates (CVPR).

Read the full edition →
