LLM Daily: September 23, 2025
Your Daily Briefing on Large Language Models
HIGHLIGHTS
• Facebook is developing an AI dating assistant for its platform to help users build profiles and find better matches, marking Meta's latest effort to integrate AI features into its social services.
• Alibaba Cloud has released three new specialized Qwen3-Omni 30B models built on its A3B mixture-of-experts design, generating excitement for their potential to run powerful AI capabilities on consumer-grade hardware.
• Apple researchers have introduced AToken, a unified visual tokenizer that works across images, videos, and 3D assets while achieving both high-fidelity reconstruction and semantic understanding in a single framework.
• Major tech companies including Meta, Oracle, Microsoft, Google, and OpenAI are making billion-dollar investments in AI infrastructure projects that are powering the current AI boom and shaping the future of computing.
BUSINESS
Facebook Adding AI Dating Assistant
Facebook is developing an AI assistant for its Dating platform to help users build their profiles and find better matches, according to TechCrunch. The new feature aims to improve the dating experience on Meta's platform. TechCrunch (2025-09-22)
Major AI Infrastructure Investments Revealed
A new analysis has uncovered details about the biggest AI infrastructure projects across the tech industry, with significant investments from Meta, Oracle, Microsoft, Google, and OpenAI. These billion-dollar deals are powering the current AI boom and shaping the future of computing infrastructure. TechCrunch (2025-09-22)
Sequoia Capital Invests in Irregular
Sequoia Capital has announced a new investment in Irregular, an AI security startup. The venture firm highlighted the partnership in their latest announcement, though specific funding details weren't disclosed. Sequoia Capital (2025-09-17)
Silicon Valley's New Trend: AI Training Environments
A wave of startups is creating reinforcement learning (RL) environments to help major AI labs train more capable agents. Industry observers note this could become Silicon Valley's next major investment trend, with companies like Anthropic, OpenAI, and Scale AI showing interest in these specialized training platforms. TechCrunch (2025-09-21)
YouTube Unveils New GenAI Tools for Creators
YouTube has announced a suite of new generative AI tools and features for content creators during its "Made on YouTube" event. The updates include enhancements to Studio, YouTube Live, and various AI-powered creative tools designed to help video creators streamline their production processes. TechCrunch (2025-09-20)
PRODUCTS
Alibaba Releases New Qwen3-Omni Models
Company: Alibaba Cloud (Established player)
Release Date: 2025-09-22
HuggingFace Repository
Alibaba's Qwen team has released three new Qwen3-Omni models on HuggingFace, expanding their open-source AI model lineup. The three specialized 30B models are:
- Qwen3-Omni-30B-A3B-Captioner - Optimized for image captioning tasks
- Qwen3-Omni-30B-A3B-Thinking - Enhanced for reasoning capabilities
- Qwen3-Omni-30B-A3B - General-purpose model
These models use Qwen's A3B mixture-of-experts design, with 30B total parameters but only about 3B activated per token, which has prompted significant community interest, particularly around compatibility with llama.cpp for local deployment. The release has generated excitement among local AI enthusiasts, with many hoping the low active-parameter count will let these models run on consumer-grade hardware like NVIDIA RTX 3060 GPUs.
Qwen Image Edit 2509 Released
Company: Alibaba Cloud (Established player)
Release Date: 2025-09-22
Reddit Announcement
Alibaba has released "Qwen-Image-Edit-2509," the September iteration of their image editing AI. This update introduces several significant improvements:
- Multi-image Editing Support: Can now edit multiple images simultaneously using image concatenation techniques
- Enhanced Performance: Improved editing capabilities compared to the August release
- Monthly Release Cycle: This appears to be part of a regular monthly update schedule for their image editing technology
The model is available through the "Image Editing" feature in Qwen Chat. This regular release cadence suggests Alibaba is committing significant resources to improving their visual AI capabilities and keeping pace with competitors in the rapidly evolving AI image editing space.
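The "image concatenation" mentioned above is a generic technique for feeding several pictures to a model as one canvas; the sketch below is a rough illustration of horizontal concatenation using Pillow, not Qwen's actual preprocessing pipeline, which hasn't been published.

```python
from PIL import Image

def hconcat(images):
    """Paste images side by side on one shared canvas (simple concatenation)."""
    canvas = Image.new("RGB", (sum(im.width for im in images),
                               max(im.height for im in images)))
    x = 0
    for im in images:
        canvas.paste(im, (x, 0))
        x += im.width  # advance the paste position past the previous image
    return canvas

a = Image.new("RGB", (4, 3), "red")
b = Image.new("RGB", (5, 6), "blue")
combined = hconcat([a, b])
print(combined.size)  # (9, 6)
```

A real multi-image editor would also need to track where each source image landed on the canvas so edits can be split back out afterward.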
TECHNOLOGY
Open Source Projects
AUTOMATIC1111/stable-diffusion-webui
A comprehensive web interface for Stable Diffusion with 156,699 stars (+38 today). This Gradio-based UI provides extensive features including outpainting, inpainting, color sketch, prompt matrix, and upscaling capabilities in a user-friendly package. The project continues to see active development with multiple commits merged just yesterday.
comfyanonymous/ComfyUI
A powerful and modular diffusion model GUI with 89,069 stars (+104 today). ComfyUI distinguishes itself with a node-based interface that allows for advanced workflow customization. Recent updates include bug fixes to the WanAnimateToVideo node and addition of offset parameters, showing continued active development.
Models & Datasets
Alibaba-NLP/Tongyi-DeepResearch-30B-A3B
Alibaba's 30B parameter Mixture-of-Experts model based on Qwen3 architecture. The model has gained significant traction with 561 likes and 7,564 downloads. It supports conversational applications and text generation tasks under the Apache 2.0 license.
ibm-granite/granite-docling-258M
IBM's document-specialized model (258M parameters) designed for processing documents, code, formulas, charts, and tables. Based on the IDEFICS3 architecture, it excels at OCR, layout understanding, and information extraction from complex documents. With 523 likes and 17,529 downloads, it's gaining popularity for document-specific AI tasks.
InternRobotics/OmniWorld
A large-scale dataset for robotics and multi-modal AI applications with 11,749 downloads. OmniWorld supports various tasks including text-to-video, image-to-video, image-to-3D, and robotics applications. The dataset contains between 1M and 10M samples and was last updated on September 21, 2025.
LucasFang/FLUX-Reason-6M
A large-scale dataset containing 6 million image-text pairs specifically curated for visual reasoning tasks. With 75 likes and 36,323 downloads, it's becoming a reference dataset for training multimodal reasoning capabilities, available in Parquet format under Apache 2.0 license.
Developer Tools & Spaces
Wan-AI/Wan2.2-Animate
A Gradio-based interface for the Wan2.2-Animate-14B model with 298 likes. The space provides easy access to this popular video generation model, allowing users to create animations without requiring local setup or specialized hardware.
Kwai-Kolors/Kolors-Virtual-Try-On
An immensely popular virtual clothing try-on interface with 9,679 likes. This Gradio-based tool allows users to visualize clothing items on digital models, making it useful for fashion e-commerce and personal shopping applications.
not-lain/background-removal
A practical tool for removing backgrounds from images with 2,344 likes. This Gradio-based interface provides a simple way to extract subjects from images, useful for graphics work, e-commerce, and content creation.
finegrain/finegrain-image-enhancer
An AI-powered image enhancement tool with 1,769 likes. This space combines upscaling, clarity improvement, and refinement techniques to transform low-quality images into higher-quality versions. It leverages multiple AI models including Stable Diffusion variants to achieve impressive results.
RESEARCH
Paper of the Day
AToken: A Unified Tokenizer for Vision (2025-09-17)
Jiasen Lu, Liangchen Song, Mingze Xu, Byeongjoo Ahn, Yanjun Wang, Chen Chen, Afshin Dehghan, Yinfei Yang
Apple
This paper introduces what the authors describe as the first unified visual tokenizer that works across images, videos, and 3D assets while achieving both high-fidelity reconstruction and semantic understanding. Unlike existing specialized tokenizers that focus on either reconstruction or understanding for individual modalities, AToken bridges this gap with a pure transformer architecture using 4D rotary position embeddings to unify both tasks and modalities within a single framework.
The authors demonstrate that AToken achieves state-of-the-art performance on reconstruction benchmarks while matching specialized tokenizers on understanding tasks. This work addresses a critical challenge in multimodal AI by providing a common representational foundation that could significantly advance the development of more unified vision-language models.
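The 4D rotary embeddings extend the standard rotary position embedding (RoPE) scheme from language models to spatial and temporal axes. The NumPy sketch below shows plain 1-D RoPE (not the paper's 4D variant): features are rotated in pairs by position-dependent angles, so attention dot products depend only on relative position.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate consecutive feature pairs of x by angles proportional to pos."""
    half = x.shape[0] // 2
    freqs = base ** (-np.arange(half) / half)  # one rotation frequency per pair
    theta = pos * freqs
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin  # standard 2-D rotation, applied pairwise
    out[1::2] = x1 * sin + x2 * cos
    return out

q = np.array([0.3, -1.2, 0.8, 0.5])
k = np.array([1.0, 0.4, -0.7, 0.2])
# Rotations preserve vector norms...
assert np.allclose(np.linalg.norm(rope(q, 7)), np.linalg.norm(q))
# ...and the q-k dot product depends only on relative position (5-2 == 3-0).
assert np.allclose(rope(q, 5) @ rope(k, 2), rope(q, 3) @ k)
```

A 4D variant would apply rotations like this independently along time, height, width, and a fourth axis, splitting the feature dimension among them.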
Notable Research
Rethinking Molecule Synthesizability with Chain-of-Reaction (2025-09-19)
Seul Lee, Karsten Kreis, Srimukh Prasad Veccham, Meng Liu, Danny Reidenbach, Saee Paliwal, Weili Nie, Arash Vahdat
The authors introduce ReaSyn, a novel framework that addresses the challenge of generating synthesizable molecules, a common pitfall in molecular generative models. Their approach uses a chain-of-reaction method that effectively explores neighboring chemical spaces to find optimal synthesizable molecules, overcoming the limitations of previous approaches in this exponentially large combinatorial space.
BEFT: Bias-Efficient Fine-Tuning of Language Models (2025-09-19)
Baichuan Huang, Ananth Balashankar, Amir Aminifar
This research proposes a novel parameter-efficient fine-tuning approach that focuses exclusively on bias terms, achieving remarkable parameter efficiency while maintaining competitive performance. The authors identify and optimize key bias terms across different components of transformer models, introducing a paradigm that requires only 0.01-0.06% of trainable parameters compared to full fine-tuning.
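The core idea of bias-only tuning is easy to illustrate. The toy sketch below (not the paper's actual method) freezes the weight of a one-dimensional linear model and fits only its bias by gradient descent, mimicking how such tuning updates a tiny fraction of parameters.

```python
# Toy bias-only fine-tuning: the "pretrained" weight w is frozen and
# only the bias b is trained, here on data with an extra offset of 3.

def fit_bias_only(xs, ys, w_frozen, lr=0.1, steps=200):
    b = 0.0  # the only trainable parameter
    n = len(xs)
    for _ in range(steps):
        # gradient of mean squared error with respect to b alone
        grad_b = sum(2 * (w_frozen * x + b - y) for x, y in zip(xs, ys)) / n
        b -= lr * grad_b
    return b

xs = [0.0, 1.0, 2.0, 3.0]
w = 2.0                          # frozen weight
ys = [w * x + 3.0 for x in xs]   # target data shifted by 3
b_fit = fit_bias_only(xs, ys, w)
print(round(b_fit, 3))  # 3.0
```

In a transformer, the same pattern means setting `requires_grad = False` on everything except bias vectors, which is where the paper's 0.01-0.06% trainable-parameter figure comes from.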
Multi-Physics: A Comprehensive Benchmark for Multimodal LLMs Reasoning on Chinese Multi-Subject Physics Problems (2025-09-19)
Zhongze Luo, Zhenshuai Yin, Yongxin Guo, Zhichao Wang, Jionghao Zhu, Xiaoying Tang
The researchers introduce a novel benchmark for evaluating multimodal LLMs on Chinese physics reasoning tasks, addressing critical gaps in current evaluation frameworks by providing fine-grained subject coverage, step-by-step reasoning assessments, and systematic evaluation of visual information processing in non-English contexts.
Inverting Trojans in LLMs (2025-09-19)
Zhengxing Li, Guangmingmei Yang, Jayaram Raghuram, David J. Miller, George Kesidis
This paper tackles the challenging problem of backdoor detection in LLMs, developing novel inversion techniques to identify malicious triggers in the discrete token space. The authors overcome fundamental limitations of traditional gradient-based search methods that work well for images but fail for text, proposing efficient search strategies for the vast combinatorial space of potential triggers.
LOOKING AHEAD
As Q3 2025 draws to a close, we're seeing early signals of what might define AI's next evolution. The emergence of fully autonomous AI agents capable of self-supervision and complex task coordination marks a significant leap beyond today's systems. Industry insiders hint that Q4 will bring the first commercial deployments of true multimodal reasoning systems that can transfer knowledge across domains without explicit training.
Looking into early 2026, several tech giants are racing to demonstrate practical quantum advantage in large-scale model training, though any real convergence of quantum computing and neural architectures remains speculative. Meanwhile, regulatory frameworks are struggling to keep pace, with updated EU guidance on autonomous agent governance expected by year-end – potentially reshaping how these technologies can be deployed globally.