Scrambled Media Boosts Reasoning; 6B Model Tops GPT-4o
- Agent Skills Should Self-Evolve From User Populations. SkillClaw turns multi-user interaction traces into skill evolution signals. One user's correction auto-syncs to everyone, giving agent systems organizational memory.
- Smart Compression Beats Brute-Force Context Windows. Tempo uses a 6B model to select key frames per query. Under an 8K token budget, it outperforms GPT-4o and Gemini 1.5 Pro.
- Lighting Becomes a First-Class Citizen in Video Generation. LiVER decouples lighting, layout, and camera motion through a physics renderer. Accepted at CVPR, targeting professional production workflows.
- Scramble Audio and Video, Let the Model Reassemble. OmniJigsaw uses zero-annotation temporal reordering as a proxy task, forcing models to integrate audiovisual signals. Validated across 15 benchmarks.
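The reordering proxy task above needs no labels because the supervision comes from the shuffle itself. A minimal sketch of how such a training example can be constructed — the segment count, feature format, and function name are illustrative assumptions, not OmniJigsaw's actual pipeline:

```python
import random

def make_reordering_example(av_clips, num_segments=4, seed=None):
    """Build a zero-annotation reordering example from paired
    audio-video segments: shuffle them, and keep the permutation
    that restores the original order as the supervision target."""
    rng = random.Random(seed)
    assert len(av_clips) == num_segments
    order = list(range(num_segments))
    rng.shuffle(order)                       # e.g. [2, 0, 3, 1]
    shuffled = [av_clips[i] for i in order]
    # Label: the position of each original segment inside the
    # shuffled sequence (the inverse permutation).
    label = [order.index(i) for i in range(num_segments)]
    return shuffled, label

clips = [("a0", "v0"), ("a1", "v1"), ("a2", "v2"), ("a3", "v3")]
shuffled, label = make_reordering_example(clips, seed=0)
# A model consumes `shuffled` (both modalities jointly) and is
# trained to predict `label`, which forces it to align audio cues
# with video cues rather than rely on either stream alone.
```

Because the target is derived mechanically from the shuffle, any unlabeled audiovisual corpus becomes training data for free.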
Also Notable
- 170K Style Descriptions + 400K Prompts Build a Scalable Data Pipeline. Uses generative models' own style consistency to solve the data bottleneck for style transfer.
- RLVR Improves Accuracy but Degrades Reasoning Chains. The chain-of-thought decouples from the visual evidence: correct answers don't guarantee correct reasoning.
- Virtual Try-On Starts Caring About Fit. The first try-on dataset with precise sizing annotations, judging whether a garment actually fits rather than just whether it looks good overlaid.
- Gradient-Signal-Driven Adaptive Layer Sampling. Achieves near-full-parameter fine-tuning results with half the memory (ACL).
- Stronger LLMs Cooperate Less Under Zero-Cost Collaboration. Cooperation failure in multi-agent systems is a real risk (ICLR).
- Agent Reward Models Can't Just Evaluate Single Steps. They need to assess entire planning trajectories (ACL).
- Annotation-Free Medical Visual Reasoning. Agentic RL lets models autonomously locate visual evidence before making judgments (ICLR).
- More Training Data Isn't Always Better for Search Agents. A hierarchical experience framework filters high-value trajectories from random exploration.
- Testing VLM Long-Horizon Interaction in a Pokémon 3D Environment. Closer to real agent deployment than static image-text benchmarks.
- Continuously Editing VLM Knowledge Without Forgetting. A subspace alignment method preserves old concepts during updates (CVPR).
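The adaptive layer sampling bullet above is the one item that reduces to a concrete selection rule. A minimal sketch of one plausible criterion — ranking layers by recent gradient norm and updating only a fixed budget of them per step; the function name, the top-k rule, and the 0.5 budget are assumptions for illustration, not the paper's published method:

```python
def select_layers(grad_norms, budget=0.5):
    """Pick the layers with the largest recent gradient norms,
    up to `budget` (fraction of layers updated per step).
    Unselected layers stay frozen and carry no optimizer state,
    which is where the memory savings come from."""
    k = max(1, int(len(grad_norms) * budget))
    ranked = sorted(range(len(grad_norms)),
                    key=lambda i: grad_norms[i], reverse=True)
    return sorted(ranked[:k])

# Per-layer gradient norms from the previous step (toy values).
norms = [0.02, 0.31, 0.07, 0.54, 0.11, 0.03, 0.48, 0.09]
active = select_layers(norms, budget=0.5)
# → [1, 3, 4, 6]: only these layers receive updates this step.
```

Recomputing the selection as gradient signals drift lets the schedule adapt over training instead of fixing a layer subset up front.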
Don't miss what's next. Subscribe to AI Research Brief.