Direct Lottie Generation, DPO's Built-In Forgetting Defense
- AI-generated animation now outputs editable project files directly. OmniLottie compresses Lottie's verbose JSON into parameterized token sequences, letting vision-language models generate vector animations with keyframes and easing curves. No format conversion needed. Accepted at CVPR; the 2M-animation dataset is open-sourced. (Tokenization idea sketched after this list.)
- DPO's reward estimation carries an implicit regularizer that suppresses catastrophic forgetting. SPoT finds that many common post-training practices actually break this built-in protection. A minimal 4K-sample correction lifts Qwen3-8B math performance by 6.2%. (The reference-policy anchor likely behind this is sketched after the list.)
- Longer CoT in reward models isn't always better. Mix-GRM distinguishes breadth CoT from depth CoT, each serving different task types. Structured decomposition beats the best open-source reward model by 8.2% across five benchmarks.
- Constraints serve as both generation blueprint and quality check. CoVe uses explicit constraints to drive both the synthesis and the verification of agent training data. A 4B model competes with models 17x its size on τ²-bench. (Toy example after the list.)
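
For the OmniLottie item, the core move is replacing verbose keyframe JSON with short parameterized tokens. A minimal sketch of that idea follows; the token names, quantization bins, and sample keyframe schema are assumptions, not the paper's actual tokenizer.

```python
def quantize(x, lo, hi, bins=256):
    """Map a float into one of `bins` integer buckets."""
    x = min(max(x, lo), hi)
    return round((x - lo) / (hi - lo) * (bins - 1))

def keyframe_to_tokens(kf):
    """Turn one Lottie-style position keyframe (t = time, s = value)
    into a short token sequence instead of its verbose JSON form."""
    return [
        f"<T:{quantize(kf['t'], 0, 300)}>",         # frame time on a 0-300 range
        f"<X:{quantize(kf['s'][0], -512, 512)}>",   # x position
        f"<Y:{quantize(kf['s'][1], -512, 512)}>",   # y position
        f"<EASE:{kf.get('easing', 'linear')}>",     # named easing curve
    ]

# A keyframe that would span ~10 lines of Lottie JSON becomes 4 tokens:
kf = {"t": 30, "s": [120.0, -40.5], "easing": "ease-out"}
print(" ".join(keyframe_to_tokens(kf)))
```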
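
For the DPO item, the built-in protection plausibly traces to the frozen reference policy in the standard DPO objective (shown below; SPoT's specific analysis is not reproduced here). The implicit reward is defined relative to π_ref, so optimization is anchored to the reference model by construction:

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta) =
  -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]
```

Practices that drop or weaken the π_ref terms would remove that anchor, which is consistent with the summary's claim; exactly which practices SPoT flags is not detailed here.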
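
For the CoVe item, here is a toy rendering of "constraints as both blueprint and checker." The constraint schema, task, and checks are invented for illustration; CoVe's actual pipeline is not shown.

```python
CONSTRAINTS = {
    "must_call_tools": ["search_flights", "book_flight"],  # required tool calls
    "max_turns": 6,                                        # interaction budget
}

def synthesize_task(constraints):
    """Blueprint side: expand the constraint spec into a task prompt."""
    tools = ", ".join(constraints["must_call_tools"])
    return (f"Book the cheapest flight. You must use these tools: {tools}. "
            f"Finish within {constraints['max_turns']} turns.")

def verify_trace(trace, constraints):
    """Checker side: the same spec filters generated trajectories."""
    called = {step["tool"] for step in trace if step.get("tool")}
    within_budget = len(trace) <= constraints["max_turns"]
    return set(constraints["must_call_tools"]) <= called and within_budget

trace = [{"tool": "search_flights"}, {"tool": "book_flight"}, {"tool": None}]
print(verify_trace(trace, CONSTRAINTS))  # True: keep this training sample
```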
Also Notable
- Multi-Image Reasoning Benchmark Focuses on Real-Life Scenarios. Tests cross-image reasoning in everyday situations, not academic exercises. Accepted at ICLR.
- Rubric-Guided Evaluation Itself Lacks Standards. Microsoft's RubricBench measures the quality of model-generated scoring rubrics, adding a quantitative baseline for "evaluating the evaluator."
- AutoML Library for NLU Tasks. Data-aware automatic selection of training pipelines for text classification and NER; no manual configuration needed.
- Personal Photo Retrieval Goes Beyond Image-Text Matching. PhotoBench requires understanding timelines, social relationships, and user intent — closer to how people actually search their photos.
- 3D Geometric Memory Bridges Video Generation and Scene Reconstruction. Injects explicit 3D structure into video diffusion models, solving multi-view consistency problems.
- MoE Drops Fixed Top-K. DynaMoE decides per token how many experts to activate, with per-layer capacity adapting automatically. (A possible routing rule is sketched after this list.)
- In-Context Self-Reflection as Policy Optimization. No parameter updates: multi-round reflection alone improves answer quality. A theoretically grounded test-time scaling method. (Loop sketched after the list.)
- RL Teaches Draft Models to Adjust Draft Length. A fixed draft length is speculative decoding's bottleneck; adaptive adjustment stabilizes speedup ratios. (Toy bandit framing after the list.)
- Modular Memory Architecture Gives Models Continual Learning. Splits memory into independent modules for experience accumulation and cross-task transfer in foundation models.
- Interactive Benchmark for Long-Conversation Memory Management. AMemGym replaces static datasets with dynamic interaction evaluation, closer to the real memory challenges assistants face in long conversations.
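
For the DynaMoE item, one generic way to drop a fixed top-k is threshold routing: a token activates experts until the router's cumulative probability crosses a confidence level. Whether DynaMoE uses this exact rule is an assumption; the mechanism below is illustrative.

```python
import numpy as np

def dynamic_route(router_logits, tau=0.7, k_max=4):
    """Return the experts whose cumulative router probability first
    exceeds tau, capped at k_max experts per token."""
    probs = np.exp(router_logits - router_logits.max())
    probs /= probs.sum()
    order = np.argsort(-probs)                 # experts, most confident first
    csum = np.cumsum(probs[order])
    k = int(np.searchsorted(csum, tau) + 1)    # smallest k covering tau mass
    return order[:min(k, k_max)]

# A confident token gets one expert; an ambiguous token gets several.
print(dynamic_route(np.array([4.0, 0.1, 0.0, -1.0])))   # -> [0]
print(dynamic_route(np.array([1.0, 0.9, 0.8, 0.7])))    # -> three experts
```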
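
For the self-reflection item, the loop is simple to state, since the "policy update" happens entirely in context. `llm` is a hypothetical completion function standing in for any chat model API.

```python
def reflect_loop(llm, question, rounds=3):
    """Answer, critique, revise: improvement without parameter updates."""
    answer = llm(f"Question: {question}\nAnswer:")
    for _ in range(rounds):
        critique = llm(
            f"Question: {question}\nDraft answer: {answer}\n"
            "List concrete flaws in the draft, or reply PASS if there are none."
        )
        if "PASS" in critique:
            break                          # stop early once the critic is satisfied
        answer = llm(                      # next draft conditions on the critique
            f"Question: {question}\nDraft: {answer}\n"
            f"Critique: {critique}\nWrite an improved answer:"
        )
    return answer
```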
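
For the draft-length item, the control problem can be framed with draft length as the action and accepted tokens minus drafting cost as the reward. The epsilon-greedy learner and toy acceptance model below are stand-ins; the paper's actual RL formulation is assumed, not reproduced.

```python
import random

Q = {k: 0.0 for k in range(1, 9)}        # value estimate per draft length
N = {k: 0 for k in Q}                    # visit counts

def choose_len(eps=0.1):
    """Epsilon-greedy over candidate draft lengths."""
    if random.random() < eps:
        return random.choice(list(Q))    # explore
    return max(Q, key=Q.get)             # exploit the best length so far

def update(k, accepted):
    """Reward = accepted tokens minus a per-token drafting cost."""
    r = accepted - 0.2 * k
    N[k] += 1
    Q[k] += (r - Q[k]) / N[k]            # incremental mean update

for _ in range(5000):                    # toy target: each token accepted w.p. 0.7
    k = choose_len()
    accepted = 0
    for _ in range(k):
        if random.random() >= 0.7:       # first rejection ends the draft
            break
        accepted += 1
    update(k, accepted)

print(max(Q, key=Q.get))                 # settles near a mid-length draft
```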