AGI Agent

December 22, 2025

LLM Daily: December 22, 2025

🔍 LLM DAILY

Your Daily Briefing on Large Language Models

HIGHLIGHTS

• Resolve AI, founded by ex-Splunk executives, has achieved unicorn status with a Series A funding round, while AI scientist Yann LeCun confirmed he is launching a startup focused on world models that is reportedly seeking a $5B+ valuation.

• Researchers from Cambridge and Princeton have introduced EGGROLL, an approach for training deep learning models without backpropagation. According to community discussion, it directly optimizes evaluation metrics, and early results suggest better generalization than backpropagation-trained models.

• Open-source web crawling tools specifically optimized for LLMs are gaining significant traction, with Firecrawl (70,000+ stars) and Crawl4AI (57,000+ stars) leading the way in transforming websites into AI-ready formats.

• The SWE-Bench++ framework represents a major advancement in LLM evaluation for coding tasks, automatically generating repository-level challenges from real GitHub pull requests across multiple languages and covering both bug fixes and feature requests.


BUSINESS

Funding & Investment

Resolve AI Hits $1 Billion Valuation with Series A Round

  • AI startup founded by ex-Splunk executives reached unicorn status with a Series A funding round led by Lightspeed Venture Partners
  • TechCrunch (2025-12-19)

Yann LeCun Confirms New "World Model" Startup

  • The renowned AI scientist confirmed he is launching a new startup focused on world models, though he won't serve as CEO
  • Reports suggest the company is seeking a $5B+ valuation
  • TechCrunch (2025-12-19)

M&A

Cursor Acquires Graphite in Ongoing Acquisition Spree

  • AI coding assistant Cursor has acquired Graphite, an AI code review assistant previously valued at $290 million
  • This continues Cursor's recent acquisition strategy in the AI developer tools space
  • TechCrunch (2025-12-19)

Company Updates

OpenAI Adds User Controls for ChatGPT's Personality

  • ChatGPT users can now directly adjust the AI's warmth, enthusiasm, and emoji usage levels
  • This represents a shift toward more personalized AI interactions
  • TechCrunch (2025-12-20)

OpenAI Implements New Teen Safety Rules

  • The company has added additional safety measures for teenage users of ChatGPT
  • This comes as lawmakers consider new AI standards for minors
  • TechCrunch (2025-12-19)

Waymo Suspends Service During San Francisco Blackout

  • The autonomous vehicle company temporarily halted its robotaxi service after a blackout left many vehicles stalled on city streets
  • The incident highlights ongoing challenges for autonomous vehicle deployment
  • TechCrunch (2025-12-21)

Market Analysis

Hardware Companies Face Bankruptcy Wave

  • iRobot, Luminar, and Rad Power Bikes have all filed for bankruptcy
  • The hardware sector is facing significant challenges from tariff pressures, supply chain issues, and shifting markets
  • This trend highlights potential advantages for AI software companies over hardware-dependent businesses
  • TechCrunch (2025-12-19)

PRODUCTS

EGGROLL: Evolution Strategies for Deep Learning at Scale

Company: Research (Authors from University of Cambridge & Princeton University)
Date: (2025-12-21)
Source: https://arxiv.org/abs/2511.16652

Researchers have introduced EGGROLL, a novel approach for training deep learning models without backpropagation. According to community discussion, this method directly optimizes evaluation metrics (like NDCG for retrieval tasks) instead of using proxy loss functions. Early results suggest that models trained with EGGROLL generalize better than traditional backpropagation-trained counterparts. A PyTorch implementation has been released by a community member alongside the original JAX implementation from the research team.
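
To make the idea of gradient-free training concrete, here is a minimal sketch of a plain evolution-strategies loop in Python. It is not the EGGROLL algorithm and is not taken from the paper or either implementation. The toy dataset, the function names (`make_data`, `accuracy`, `evolve`), and every hyperparameter are assumptions chosen for illustration. The point it shows is that the update only needs metric values from randomly perturbed parameters, so the metric itself (here, accuracy) does not have to be differentiable.

```python
import random

def make_data(n, seed=0):
    """Toy data (hypothetical): the label is 1 when x0 + x1 > 1, else 0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x0, x1 = rng.random(), rng.random()
        data.append(((x0, x1), 1 if x0 + x1 > 1 else 0))
    return data

def accuracy(params, data):
    """Non-differentiable metric: fraction of correct thresholded predictions."""
    w0, w1, b = params
    hits = 0
    for (x0, x1), label in data:
        pred = 1 if w0 * x0 + w1 * x1 + b > 0 else 0
        hits += pred == label
    return hits / len(data)

def evolve(data, steps=100, pop=20, sigma=0.3, lr=0.5, seed=1):
    """Plain evolution strategies: perturb, score, move toward better scores."""
    rng = random.Random(seed)
    params = [0.0, 0.0, 0.0]
    best, best_score = list(params), accuracy(params, data)
    for _ in range(steps):
        # Sample one Gaussian perturbation per population member and score it.
        noises = [[rng.gauss(0, 1) for _ in params] for _ in range(pop)]
        scores = [accuracy([p + sigma * e for p, e in zip(params, eps)], data)
                  for eps in noises]
        mean = sum(scores) / pop
        # Move each parameter along the score-weighted average of the noise.
        for i in range(len(params)):
            step = sum((s - mean) * eps[i] for s, eps in zip(scores, noises))
            params[i] += lr * step / (pop * sigma)
        # Keep the best parameters seen so far, including the starting point.
        score = accuracy(params, data)
        if score > best_score:
            best, best_score = list(params), score
    return best, best_score
```

Calling `evolve(make_data(200))` returns the best parameters seen and their accuracy on the toy data. No gradient of the metric is computed at any point.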

Z-Image Amateur Photography LoRA

Company: Independent Creator (Major_Specific_23)
Date: (2025-12-21)
Source: https://civitai.com/models/652699/amateur-photography?modelVersionId=2524532

A new LoRA for Z-Image has been released that aims to enhance realism in generated images by mimicking amateur photography styles. The creator designed it to produce variations in lighting, more natural-looking humans, and the imperfect qualities of real amateur photography. The community has responded positively to the release, which offers a different aesthetic from the "perfect" images typically generated by AI image models.

llama.cpp Performance Improvements

Company: Open Source (llama.cpp contributors)
Date: (2025-12-21)
Source: https://www.reddit.com/r/LocalLLaMA/comments/1psbx2q/llamacpp_appreciation_post/

The open-source project llama.cpp continues to receive widespread community appreciation for its frequent updates and performance improvements. In one user-reported benchmark, throughput rose from 8 tokens per second with LM Studio to 23 tokens per second with llama.cpp (roughly a 2.9x speedup) when running Qwen3 Next 80B on consumer hardware (Radeon 6700XT 12GB + 5600G + 32GB DDR4). This is a single anecdotal data point, but it illustrates the project's ongoing optimization work for running large language models on modest machines.


TECHNOLOGY

Open Source Projects

firecrawl/firecrawl - Web Data API for AI

A TypeScript-based tool that transforms websites into LLM-ready markdown or structured data. Firecrawl allows developers to easily extract and format web content for AI applications. With over 70,000 stars and active development (last commit December 19), it has become a popular solution for web scraping in AI workflows.
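
As a rough illustration of what "LLM-ready markdown" means, the sketch below converts a small HTML snippet into markdown-like text using only Python's standard library. It is not Firecrawl's implementation or API. The class name `MarkdownExtractor`, the helper `to_markdown`, and the choice of which tags to skip are assumptions made for this example. Real pages need far more handling, such as JavaScript rendering, tables, and nested lists.

```python
import re
from html.parser import HTMLParser

class MarkdownExtractor(HTMLParser):
    """Hypothetical helper: keeps headings, paragraphs, links, and list items."""
    SKIP = {"head", "script", "style", "nav", "footer"}
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self.parts = []      # output fragments, joined at the end
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.href = None     # target of the link currently being read

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in self.HEADINGS:
            self.parts.append("\n\n" + self.HEADINGS[tag])
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag in ("ul", "ol"):
            self.parts.append("\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag == "a":
            self.parts.append("](%s)" % (self.href or ""))
            self.href = None

    def handle_data(self, data):
        # Drop text inside skipped elements; collapse whitespace elsewhere.
        if not self.skip_depth:
            self.parts.append(re.sub(r"\s+", " ", data))

def to_markdown(html):
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.parts).split("\n")]
    # Collapse runs of blank lines so the output stays compact.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
```

Passing a page's HTML string to `to_markdown` yields text in which navigation, scripts, and styles are dropped while headings, links, and list items survive in markdown form.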

unclecode/crawl4ai - LLM Friendly Web Crawler

This open-source Python web crawler and scraper is specifically designed to extract content in formats optimized for large language models. With 57,504 stars and recent updates (v0.7.8 released on December 11), Crawl4AI offers a specialized solution for gathering training and inference data from websites.

Models & Datasets

Tongyi-MAI/Z-Image-Turbo - Fast Text-to-Image Generation

A high-performance text-to-image diffusion model with over 3,200 likes and 350,000+ downloads. Z-Image-Turbo implements a custom pipeline architecture (ZImagePipeline) and is supported by multiple research papers (arxiv:2511.22699, arxiv:2511.22677, arxiv:2511.13649). The model is optimized for speed while maintaining high-quality image generation.

Qwen/Qwen-Image-Layered - Image Editing with Layers

An image-text-to-image generation model that enables layered editing capabilities. Built on the Qwen/Qwen-Image base model, it provides sophisticated image manipulation features as described in a recent research paper (arxiv:2512.15603). The model supports both English and Chinese inputs and implements a custom QwenImageLayeredPipeline.

OpenMed/Medical-Reasoning-SFT-GPT-OSS-120B - Healthcare Reasoning Dataset

A specialized medical dataset, with 179 likes and over 2,100 downloads, designed for training language models on medical reasoning tasks. The collection contains between 100K and 1M examples in parquet format, making it valuable for researchers developing AI systems for healthcare applications.

google/mobile-actions - Function Calling for Mobile

A dataset from Google designed specifically for training function-calling capabilities for mobile applications. With 1,200+ downloads, this collection provides examples for FunctionGemma and other models to interact with mobile interfaces. The dataset contains 1K-10K JSON-formatted examples.
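
For readers unfamiliar with function calling, the sketch below shows the general pattern in Python: the model emits a JSON object naming a function and its arguments, and the application validates and dispatches it. The JSON shape (`name` plus `arguments`) and the tools `set_alarm` and `send_message` are hypothetical. They are not taken from the dataset or from FunctionGemma's actual format.

```python
import json

def set_alarm(hour: int, minute: int) -> str:
    """Hypothetical mobile action."""
    return f"Alarm set for {hour:02d}:{minute:02d}"

def send_message(to: str, body: str) -> str:
    """Hypothetical mobile action."""
    return f"Message to {to}: {body}"

# Registry of functions the model is allowed to call.
TOOLS = {"set_alarm": set_alarm, "send_message": send_message}

def dispatch(model_output: str) -> str:
    """Parse a JSON function call and run the matching registered function."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc
    if not isinstance(call, dict):
        raise ValueError("function call must be a JSON object")
    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        raise ValueError("unknown function: %r" % (name,))
    arguments = call.get("arguments", {})
    if not isinstance(arguments, dict):
        raise ValueError("arguments must be a JSON object")
    return TOOLS[name](**arguments)
```

Restricting dispatch to an explicit registry means a malformed or unexpected call fails with a `ValueError` rather than executing arbitrary code.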

Developer Tools

ResembleAI/chatterbox-turbo-demo - Voice AI Interface

A Gradio-based demo for conversational AI with voice capabilities from ResembleAI. With 323 likes, this space demonstrates advanced voice synthesis and processing technologies, offering developers a reference implementation for building voice-enabled AI applications.

HuggingFaceTB/smol-training-playbook - LLM Training Resource

A comprehensive guide, with over 2,600 likes, for training smaller language models. This Docker-based space provides researchers and developers with best practices, code examples, and data visualizations for efficient training of small to medium-sized language models, addressing the growing interest in more efficient AI development.

webml-community/FunctionGemma-Physics-Playground - Function Calling Testbed

A static demonstration environment for testing Google's FunctionGemma capabilities in physics-related applications. This playground allows developers to explore and understand how function calling can be implemented for scientific computing and physics simulations.

Infrastructure

nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 - Optimized 30B Parameter Model

NVIDIA's 30B parameter language model optimized for performance with BF16 precision. With 419 likes and 74,000+ downloads, this model is designed for high-efficiency inference across various languages. It has been trained on a diverse set of datasets from NVIDIA's Nemotron collection and is compatible with various deployment endpoints.

AiSudo/Qwen-Image-to-LoRA - LoRA Generation Tool

A Gradio interface with 261 likes that enables users to generate Low-Rank Adaptation (LoRA) models from images using Qwen's image understanding capabilities. This tool simplifies the creation of specialized image generation models with minimal computational resources, making advanced model fine-tuning more accessible.
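
As background on why LoRA keeps resource needs low, the sketch below works through the standard low-rank update W' = W + (alpha / r) * B A in plain Python and compares parameter counts. It illustrates the general LoRA technique only, not how this tool produces its adapters. The function names and the example dimensions are assumptions for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_merge(w, b, a, alpha):
    """Return W + (alpha / r) * B A, where B is d x r and A is r x k."""
    r = len(a)               # rank of the adapter
    scale = alpha / r
    delta = matmul(b, a)     # d x k low-rank update
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

def param_counts(d, k, r):
    """Trainable parameters: full d x k matrix versus a rank-r adapter."""
    return d * k, r * (d + k)

# Example: one 4096 x 4096 layer with a rank-8 adapter.
full, adapter = param_counts(4096, 4096, 8)
```

For the example dimensions, the adapter trains 65,536 values against 16,777,216 for the full matrix, which is 256 times fewer.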

AI-nthusiast/cognitive-proxy - AI System Integration

A Gradio-based tool for orchestrating interactions between different AI systems. With 44 likes, this proxy service helps developers create more complex AI applications by coordinating multiple models and services, addressing the growing need for interoperable AI systems.


RESEARCH

Paper of the Day

SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories (2025-12-19)

Lilin Wang, Lucas Ramalho, Alan Celestino, Phuc Anthony Pham, Yu Liu, Umang Kumar Sinha, Andres Portillo, Onassis Osunwa, Gabriel Maduekwe

This paper stands out for addressing a critical gap in LLM evaluation for software engineering tasks by introducing an automated framework that generates repository-level coding challenges from real open-source GitHub projects. Unlike previous static benchmarks that rely on manual curation and focus primarily on Python bug fixes, SWE-Bench++ automatically harvests live pull requests to create a diverse, multilingual benchmark covering both bug fixes and feature requests. This represents a significant advancement in creating scalable, realistic evaluation frameworks for code-related LLM capabilities.

Notable Research

XAgen: An Explainability Tool for Identifying and Correcting Failures in Multi-Agent Workflows (2025-12-19)

Xinru Wang, Ming Yin, Eunyee Koh, Mustafa Doga Dogan

This research addresses a critical challenge in LLM-based multi-agent systems by developing an explainability tool that helps users identify and fix failures in agent workflows, based on insights from interviews with 12 practitioners.

Towards Explainable Conversational AI for Early Diagnosis with Large Language Models (2025-12-19)

Maliha Tabassum, M Shamim Kaiser

The authors introduce a diagnostic chatbot powered by GPT-4o with RAG capabilities that brings transparency to medical diagnostics, addressing key challenges in healthcare including inefficient diagnostics and limited access to specialists.

Enabling Disaggregated Multi-Stage MLLM Inference via GPU-Internal Scheduling and Resource Sharing (2025-12-19)

Lingxiao Zhao, Haoran Zhou, Yuezhi Che, Dazhao Cheng

This paper presents a novel system design for multimodal LLM inference that addresses performance bottlenecks through GPU-internal scheduling and resource sharing across the three stages of MLLM processing.

GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation (2025-12-19)

Rang Li, Lei Li, Shuhuai Ren, Hao Tian, Shuhao Gu, Shicheng Li, Zihao Yue, Yudong Wang, Wenhan Ma, Zhe Yang, Jingyuan Ma, Zhifang Sui, Fuli Luo

This work introduces a comprehensive evaluation framework that exposes critical limitations in current multimodal LLMs' visual grounding abilities through multi-dimensional assessment techniques.


LOOKING AHEAD

As 2025 comes to a close, we're seeing clear signals of what 2026 will bring to AI. The integration of multimodal reasoning capabilities into everyday applications is accelerating, with several major platforms now blending text, image, audio, and physical world interactions. The regulatory landscape continues to evolve, with most obligations under the EU AI Act scheduled to apply from August 2026 and similar frameworks gaining traction globally.

Watch for the emergence of highly specialized domain-expert models that rival human specialists in fields like medicine and engineering. The long-promised general-purpose household robots also appear poised for mainstream adoption by mid-2026, as recent breakthroughs in embodied AI and affordable sensor technology have finally aligned. These developments suggest we're entering a new phase where AI transitions from digital assistant to physical world collaborator.
