AGI Agent

December 21, 2025

LLM Daily: December 21, 2025

🔍 LLM DAILY

Your Daily Briefing on Large Language Models

HIGHLIGHTS

• OpenAI has introduced customization features for ChatGPT that allow users to adjust the AI's warmth, enthusiasm, and emoji usage, creating more personalized interaction experiences.

• Xiaomi has entered the frontier AI competition with MiMo-V2-Flash, a 309-billion-parameter language model that positions the Chinese tech giant as a serious competitor to established AI leaders.

• ETH Zürich researchers have developed Stackelberg Learning from Human Feedback (SLHF), a new approach to AI alignment that frames preference learning as a sequential game between policies, showing significant improvements over standard RLHF in preventing reward hacking.

• The LongVie 2 model represents a breakthrough in AI video generation, capable of creating ultra-long videos up to 5 minutes in length compared to the much shorter clips traditionally possible with generative AI.

• ComfyUI has emerged as the leading open-source tool for advanced image generation with its node-based interface, attracting over 97,500 GitHub stars and becoming the preferred solution for complex image generation workflows.


BUSINESS

OpenAI Allows Users to Customize ChatGPT's Personality

TechCrunch (2025-12-20) OpenAI has introduced a new feature allowing ChatGPT users to directly adjust the chatbot's warmth, enthusiasm, and emoji usage. This personalization option aims to make interactions more tailored to individual user preferences.

Ex-Splunk Executives' Resolve AI Reaches Unicorn Status

TechCrunch (2025-12-19) Resolve AI, founded by former Splunk executives, has achieved unicorn status with a $1 billion valuation in its Series A funding round. The investment was led by Lightspeed Venture Partners, according to sources familiar with the deal.

Cursor Acquires Graphite in Ongoing Acquisition Strategy

TechCrunch (2025-12-19) AI coding assistant Cursor has acquired Graphite, an AI code review assistant previously valued at $290 million. This acquisition continues Cursor's strategy of expanding its AI development tools portfolio through strategic purchases.

Yann LeCun Confirms New "World Model" Startup

TechCrunch (2025-12-19) Renowned AI scientist Yann LeCun has confirmed that he is launching a new startup focused on "world models," though he clarified that he won't serve as CEO. Reports suggest the company is seeking a valuation exceeding $5 billion.

ChatGPT Launches App Store for Developers

TechCrunch (2025-12-18) OpenAI has launched an app store for ChatGPT, opening its platform to third-party developers. This marketplace aims to expand ChatGPT's functionality through a variety of user experiences created by external developers.

Former British Chancellor Joins OpenAI Leadership

TechCrunch (2025-12-18) George Osborne, former British Chancellor of the Exchequer, has joined OpenAI as managing director and head of OpenAI for Countries. He will also run Coinbase's internal advisory council, highlighting the growing trend of politicians moving to tech leadership roles.

ChatGPT Mobile App Reaches $3B in Consumer Spending

TechCrunch (2025-12-18) ChatGPT's mobile app has achieved $3 billion in lifetime consumer spending in just 31 months, reaching this milestone faster than TikTok and major streaming apps.

Peripheral Labs Raises $3.6M for Sports Viewing Technology

TechCrunch (2025-12-18) Peripheral Labs has secured a $3.6 million seed round led by Khosla Ventures for its technology that utilizes self-driving car sensors to create immersive sports viewing experiences.


PRODUCTS

Xiaomi's MiMo-V2-Flash (309B Model) Enters the Big Leagues

Xiaomi has released MiMo-V2-Flash, a 309-billion-parameter language model that appears to position the Chinese tech giant as a serious competitor in the AI space. According to discussions on r/LocalLLaMA, the model is drawing attention for its capabilities, though some users are skeptical of its benchmark metrics. Reddit Discussion (2025-12-20)

LongVie 2: Ultra-Long Video World Model

A new AI model called LongVie 2 has been released, capable of generating ultra-long videos up to 5 minutes in length. This represents a significant advancement in AI video generation, which has traditionally been limited to much shorter clips. The model appears to be gaining traction in the Stable Diffusion community. Reddit Post (2025-12-20)

Awesome Production Machine Learning Repository

A new curated repository of open-source libraries for deploying, monitoring, versioning, and scaling machine learning systems has been shared with the machine learning community. This resource aims to help practitioners implement ML solutions in production environments more effectively. GitHub Repository via Reddit (2025-12-20)


TECHNOLOGY

Open Source Projects

ComfyUI - Modular Diffusion UI

The most powerful and modular diffusion model GUI with a node-based interface for creating complex image generation workflows. Differentiating itself with a focus on flexibility and power, ComfyUI allows for granular control over the generation process through its visual programming approach. With over 97,500 GitHub stars and active development, ComfyUI has become the tool of choice for advanced image generation workflows.

FireCrawl - Web Data API for AI

A specialized tool that converts entire websites into LLM-ready markdown or structured data, making web content easy to process for AI applications. FireCrawl differentiates itself by providing clean, context-preserving extraction optimized specifically for large language model consumption. The project has gained impressive traction with over 70,000 stars and shows active development with recent fixes to the API search function and billing logic.
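To make "LLM-ready markdown" concrete, here is a minimal sketch of the general idea using only Python's standard library. It is not FireCrawl's implementation or API: the `MarkdownExtractor` class, the `html_to_markdown` helper, the set of skipped tags, and the sample page are all hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Toy HTML-to-markdown converter (illustrative; not FireCrawl's code)."""

    SKIP = {"script", "style", "nav", "footer"}  # assumed page chrome to drop
    PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "li": "- ", "p": ""}

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished markdown blocks
        self.buffer = []      # raw text of the block being built
        self.prefix = ""      # markdown prefix for the current block
        self.skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.PREFIX:
            self.flush()                      # close any pending block
            self.prefix = self.PREFIX[tag]

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.PREFIX:
            self.flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.buffer.append(data)

    def flush(self):
        # Collapse runs of whitespace so inline tags do not leave odd gaps.
        text = " ".join("".join(self.buffer).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.buffer, self.prefix = [], ""


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()
    return "\n\n".join(parser.blocks)


SAMPLE_HTML = """
<html><head><style>p { color: red }</style></head>
<body>
  <nav><a href="/">Home</a></nav>
  <h1>Title</h1>
  <p>Hello <b>world</b>!</p>
  <ul><li>one</li><li>two</li></ul>
  <script>track()</script>
</body></html>
"""

print(html_to_markdown(SAMPLE_HTML))
```

The sketch keeps headings, paragraphs, and list items while dropping navigation, styles, and scripts. A production crawler also has to handle links, tables, JavaScript-rendered pages, and rate limits, none of which is attempted here.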

Models & Datasets

Z-Image-Turbo - Fast Text-to-Image Generation

Tongyi-MAI's new text-to-image model has gained significant popularity with over 3,100 likes and 340,000+ downloads. Building on research from multiple recent papers (referenced by arXiv IDs), Z-Image-Turbo offers high-quality image generation with improved inference speed. The model is compatible with the ZImagePipeline in Diffusers and available for Azure deployment.

NVIDIA Nemotron-3-Nano-30B - Efficient Large Language Model

NVIDIA's 30B parameter language model designed for efficient inference while maintaining strong capabilities across multiple languages including English, Spanish, French, German, Japanese, and Italian. Built on a diverse training dataset spanning code, math, science, and instruction following, the model uses BF16 precision and has gained over 70,000 downloads since release.

Medical Reasoning SFT Dataset

A specialized dataset containing medical reasoning examples for fine-tuning language models, with over 1,900 downloads. The dataset focuses on enhancing LLM capabilities in healthcare contexts and contains between 100K and 1M examples in parquet format. Released under the Apache-2.0 license, it's compatible with multiple data processing libraries including datasets, dask, and polars.

AnthropicInterviewer Dataset

Anthropic's dataset for improving language model evaluation through simulated interviews, with over 11,000 downloads and 328 likes. Containing between 1K and 10K examples in CSV format, this MIT-licensed dataset provides structured conversations that can be used to assess model capabilities more effectively than standard benchmarks.

Developer Tools

Cognitive Proxy

A Gradio-based tool that likely serves as an intermediary for optimizing LLM interactions, though specific details are limited. Despite being relatively new with 43 likes, the space appears to be gaining interest within the AI development community.

SMOL Training Playbook

A comprehensive resource for training smaller, more efficient language models with over 2,600 likes. Presented as a research paper with data visualizations, this Docker-based space provides practical guidelines and techniques for training resource-efficient models without sacrificing performance.

Infrastructure & Applications

AWPortrait-Z

A LoRA adapter built on the Z-Image-Turbo model, specialized for portrait generation. With 353 likes and over 4,000 downloads, this Apache-2.0 licensed model demonstrates how foundation models can be efficiently adapted for specific use cases through parameter-efficient fine-tuning techniques.
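The "parameter-efficient" part of LoRA comes down to simple arithmetic: instead of updating a full d × k weight matrix, the adapter trains two low-rank factors of shapes d × r and r × k. The sketch below compares the two counts. The dimensions are hypothetical and are not taken from AWPortrait-Z or Z-Image-Turbo.

```python
def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    full: every entry of the d x k matrix is trainable.
    lora: only the factors B (d x r) and A (r x k) are trainable.
    """
    full = d * k
    lora = r * (d + k)
    return full, lora


# Hypothetical layer size and rank, chosen only for illustration.
full, lora = lora_param_counts(d=4096, k=4096, r=16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
# prints "full: 16,777,216  lora: 131,072  ratio: 0.78%"
```

At this assumed size and rank, the adapter trains under 1% of the parameters of the layer it modifies, which is why adapters like this are small to download and cheap to train.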

Qwen-Image-to-LoRA

A Gradio-based tool for converting images to LoRA adapters compatible with the Qwen model. This space has received 243 likes and provides an accessible interface for creating personalized image generation models from reference images, lowering the technical barrier for customizing image generation.

FunctionGemma Physics Playground

An interactive demo showcasing Google's FunctionGemma model applied to physics problems. This static site with 40 likes demonstrates how specialized models can be deployed for educational use cases, particularly in STEM fields where function calling capabilities are valuable for solving computational problems.


RESEARCH

Paper of the Day

Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game (2025-12-18)

Authors: Barna Pásztor, Thomas Kleine Buening, Andreas Krause

Institution: ETH Zürich

This paper introduces a new approach to preference alignment in AI systems. Unlike standard RLHF, which optimizes a single policy with reinforcement learning, Stackelberg Learning from Human Feedback (SLHF) frames alignment as a sequential game between a Leader policy and a Follower policy, with the aim of a more robust framework for aligning AI with human preferences. The approach shows significant improvements over RLHF in preventing reward hacking and reward misspecification, two critical issues in current alignment techniques.
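For readers unfamiliar with sequential (Stackelberg) games, the toy sketch below shows the generic structure: a leader commits to a response, a follower observes it and picks its best reply, and the leader chooses while anticipating that reply. This is not the paper's algorithm. The three candidate responses, the preference probabilities in `PREF`, and the objective each player maximizes are all assumptions made up for illustration.

```python
RESPONSES = ["A", "B", "C"]

# PREF[(x, y)] = assumed probability that a human prefers x over y.
# The values form an intransitive cycle (A > B > C > A), so no single
# response beats every other one.
PREF = {("A", "B"): 0.6, ("B", "C"): 0.7, ("C", "A"): 0.9}


def pref(x: str, y: str) -> float:
    """Probability that x is preferred over y under the toy table."""
    if x == y:
        return 0.5
    if (x, y) in PREF:
        return PREF[(x, y)]
    return 1.0 - PREF[(y, x)]


def follower_best_response(leader_choice: str) -> str:
    """Follower sees the leader's choice and picks the reply most likely to beat it."""
    return max(RESPONSES, key=lambda b: pref(b, leader_choice))


def leader_value(a: str) -> float:
    """Leader's win probability once the follower has best-responded to a."""
    return pref(a, follower_best_response(a))


# The leader commits to the response that holds up best against the follower.
leader = max(RESPONSES, key=leader_value)
follower = follower_best_response(leader)
print(f"leader: {leader}, follower reply: {follower}")
```

With these made-up numbers the leader settles on B, the response whose best counter is weakest, and the follower replies with A. The point of the sketch is only the ordering of moves: the follower conditions on the leader's choice, and the leader plans for that.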

Notable Research

From Facts to Conclusions: Integrating Deductive Reasoning in Retrieval-Augmented LLMs (2025-12-18)

Authors: Shubham Mishra, Samyek Jain, Gorang Mehrishi, et al.

This research tackles a fundamental limitation in RAG systems by introducing a structured reasoning framework that helps LLMs resolve conflicts between retrieved documents and produce better-grounded responses with citation links.

Inside Out: Uncovering How Comment Internalization Steers LLMs for Better or Worse (2025-12-18)

Authors: Aaron Imani, Mohammad Moshirpour, Iftekhar Ahmed

This study reveals how code comments affect LLM-generated code quality, showing that while comments generally improve performance, they can also lead models to internalize and reproduce security vulnerabilities mentioned in comments.

Scaling Laws for Energy Efficiency of Local LLMs (2025-12-18)

Authors: Ander Alvarez, Alessandro Genuardi, Nilotpal Sinha, et al.

The researchers establish novel scaling laws for CPU-based inference of local LLMs, providing guidelines for deploying models on edge devices with favorable tradeoffs between energy efficiency and performance.

Differences That Matter: Auditing Models for Capability Gap Discovery and Rectification (2025-12-18)

Authors: Qihao Liu, Chengzhi Mao, Yaojie Liu, Alan Yuille, Wen-Sheng Chu

This paper introduces AuditDM, a framework that uses reinforcement learning to automatically identify and fix capability gaps in multimodal LLMs by generating targeted questions and counterfactual images that reveal model weaknesses.


LOOKING AHEAD

As 2025 draws to a close, we're witnessing the early stages of truly multimodal AI systems that integrate reasoning across text, image, audio, video, and structured data. The Q1 2026 release calendar suggests these capabilities will become standard across enterprise solutions. Most notably, the emerging "cognitive architecture" approach, which combines specialized model components rather than scaling single models, is gaining traction after recent research reported efficiency improvements of 30-40%.

Looking toward mid-2026, we anticipate significant advancements in AI-augmented scientific discovery as specialized research models trained on domain-specific knowledge become more accessible. The regulatory landscape will likely evolve in response, with the EU's AI Act implementation phase concluding and similar frameworks expected from several Asian markets by Q3 2026.
