AI Research Brief

March 4, 2026

Direct Lottie Generation, DPO's Built-In Forgetting Defense

  • AI-generated animation now outputs editable project files directly. OmniLottie compresses Lottie's verbose JSON into parameterized token sequences, letting vision-language models generate vector animations with keyframes and easing curves. No format conversion needed. CVPR accepted, 2M-animation dataset open-sourced.
  • DPO's reward estimation has implicit regularization that suppresses catastrophic forgetting. SPoT finds many common post-training practices actually break this built-in protection. A minimal 4K-sample correction pushes Qwen3-8B math performance up 6.2%.
  • Longer CoT in reward models isn't always better. Mix-GRM distinguishes breadth CoT from depth CoT, each serving different task types. Structured decomposition beats the best open-source reward model by 8.2% across five benchmarks.
  • Constraints serve as both generation blueprint and quality check. CoVe uses explicit constraints to drive both synthesis and verification of agent training data. A 4B model competes with models 17x its size on τ²-bench.
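To make the OmniLottie item concrete: Lottie stores each animated property as a verbose JSON keyframe list, which can be flattened into a compact parameterized token sequence. OmniLottie's actual tokenization scheme isn't reproduced here; this is a minimal hypothetical sketch (the `tokenize_keyframes` helper and the `("KF", ...)` token shape are illustrative assumptions).

```python
import json

# Hypothetical sketch: OmniLottie's real scheme may differ. This shows
# the general idea of compressing Lottie's verbose keyframe JSON into
# flat parameterized tokens a language model can emit.

def tokenize_keyframes(lottie_layer):
    """Flatten one layer's position keyframes into tokens of the form
    ('KF', frame_time, x, y, ease_out, ease_in)."""
    tokens = []
    for kf in lottie_layer["ks"]["p"]["k"]:       # position keyframes
        t = kf["t"]                                # frame time
        x, y = kf["s"][:2]                         # keyframe value
        eo = kf.get("o", {}).get("x", [0])[0]      # easing-out handle
        ei = kf.get("i", {}).get("x", [1])[0]      # easing-in handle
        tokens.append(("KF", t, x, y, eo, ei))
    return tokens

# A tiny Lottie-style layer: two position keyframes with easing curves.
layer = {
    "ks": {"p": {"k": [
        {"t": 0,  "s": [0, 0],    "o": {"x": [0.4]}, "i": {"x": [1.0]}},
        {"t": 30, "s": [100, 50], "o": {"x": [0.2]}, "i": {"x": [0.8]}},
    ]}}
}
print(tokenize_keyframes(layer))
```

Each token carries the same information as the original nested JSON object in a fraction of the characters, which is what makes direct generation by a vision-language model tractable.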

Also Notable

  • Multi-Image Reasoning Benchmark Focuses on Real-Life Scenarios. Tests cross-image reasoning in everyday situations, not academic exercises. Accepted at ICLR.
  • Rubric-Guided Evaluation Itself Lacks Standards. Microsoft's RubricBench measures the quality of model-generated scoring rubrics, adding a quantitative baseline for "evaluating the evaluator."
  • AutoML Library for NLU Tasks. Data-aware training pipeline auto-selection covering text classification and NER, no manual configuration needed.
  • Personal Photo Retrieval Goes Beyond Image-Text Matching. PhotoBench requires understanding timelines, social relationships, and user intent — closer to how people actually search their photos.
  • 3D Geometric Memory Bridges Video Generation and Scene Reconstruction. Injects explicit 3D structure into video diffusion models, solving multi-view consistency problems.
  • MoE Drops Fixed Top-K. DynaMoE dynamically decides how many experts to activate per token, with per-layer capacity adapting automatically.
  • In-Context Self-Reflection as Policy Optimization. No parameter updates — multi-round reflection improves answer quality. A theoretically grounded test-time scaling method.
  • RL Teaches Draft Models to Adjust Draft Length. Speculative decoding's bottleneck is fixed draft length. Adaptive adjustment stabilizes speedup ratios.
  • Modular Memory Architecture Gives Models Continual Learning. Splits memory into independent modules for experience accumulation and cross-task transfer in foundation models.
  • Interactive Benchmark for Long-Conversation Memory Management. AMemGym replaces static datasets with dynamic interaction evaluation, closer to the real memory challenges assistants face in long conversations.
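The DynaMoE item above drops MoE's fixed top-k; one common way to make k dynamic is to route each token to experts until their cumulative gate probability passes a threshold. DynaMoE's actual gating mechanism isn't public here, so this is a hedged sketch under that assumption (the `dynamic_topk` function, `threshold`, and `k_max` names are illustrative, not from the paper).

```python
import math

# Hypothetical sketch of per-token dynamic expert selection: activate
# experts in descending gate probability until a probability-mass
# threshold is reached, capped at k_max. DynaMoE's real gating may differ.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def dynamic_topk(expert_logits, threshold=0.7, k_max=4):
    """Return indices of experts to activate for one token."""
    probs = softmax(expert_logits)
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    chosen, mass = [], 0.0
    for i in order:
        chosen.append(i)
        mass += probs[i]
        if mass >= threshold or len(chosen) == k_max:
            break
    return chosen

# A confident (peaked) token routes to few experts;
# an ambiguous (flat) token routes to more.
print(dynamic_topk([4.0, 0.1, 0.0, -0.2]))   # peaked gate distribution
print(dynamic_topk([1.0, 0.9, 0.8, 0.7]))    # flat gate distribution
```

Per-layer capacity then adapts automatically: layers whose gate distributions are peaked spend less compute than layers facing ambiguous tokens.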

Read the full edition →
