🔍 LLM DAILY
Your Daily Briefing on Large Language Models
September 22, 2025
HIGHLIGHTS
• Meituan has released LongCat-Flash-Thinking, an open-source AI model with enhanced reasoning capabilities that reaches top-tier accuracy on the AIME25 benchmark while using 64.5% fewer tokens, leveraging asynchronous reinforcement learning for a reported 3x speedup over synchronous frameworks.
• Silicon Valley is witnessing a surge in startups developing reinforcement learning environments for AI agent training, attracting significant interest from major players including Anthropic, OpenAI, and Scale AI in what TechCrunch calls "Silicon Valley's next craze."
• VOX-KRIKRI research introduces a novel continuous fusion approach that bridges the gap between speech and language models without relying on tokenization boundaries, achieving state-of-the-art performance on audio question-answering benchmarks.
• Sequoia Capital has announced a partnership with AI startup Irregular; funding details were not disclosed, but the deal signals continued venture capital interest in AI companies.
• ComfyUI, a node-based visual AI engine for diffusion models, continues to gain popularity among advanced users, with recent updates to its WAN (Wan video model) nodes and LoRA trainer for FP8 models offering greater customization than traditional interfaces.
BUSINESS
Funding & Investment
- Sequoia Capital invests in Irregular: Sequoia announced a partnership with Irregular in a recent blog post titled "Partnering with Irregular: Ahead of the Curve." The AI startup appears to have secured funding, though the amount was not disclosed. (2025-09-17)
Market Analysis
- Silicon Valley's newest trend - AI training environments: According to TechCrunch, a wave of startups is building reinforcement learning (RL) environments to help AI labs train agents. The emerging sector is attracting significant attention from major players including Anthropic, OpenAI, and Scale AI, potentially becoming "Silicon Valley's next craze." (2025-09-21)
- California's SB 53 could impact big AI companies: TechCrunch reports that California's new AI safety bill (SB 53) may provide meaningful oversight of large AI companies. The legislation appears to have a stronger chance of becoming law compared to previous attempts at AI regulation in the state. (2025-09-19)
Company Updates
- YouTube announces new generative AI tools: At its "Made on YouTube" event, the Google-owned platform unveiled a slate of new AI features and tools designed specifically for creators, focused on expanding content creation capabilities through generative AI. (2025-09-20)
PRODUCTS
LongCat-Flash-Thinking - New Open-Source AI Model with Enhanced Reasoning
Company: Meituan (Established Chinese tech company)
Released: 2025-09-21
Source: HuggingFace | Official Website
Meituan has released LongCat-Flash-Thinking, a new open-source AI model designed for enhanced reasoning at lower cost. Meituan reports state-of-the-art results on logic, mathematics, coding, and agent tasks, reaching top-tier accuracy on the AIME25 benchmark with native tool use while consuming 64.5% fewer tokens. Built with agent compatibility in mind, the model was trained with an asynchronous reinforcement learning framework that reportedly achieves a 3x speedup over synchronous frameworks. The weights are available on HuggingFace, and the model can be tested through the official website.
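For readers who want to try the checkpoint, the sketch below shows one plausible way to load it with Hugging Face transformers. The repo ID, the trust_remote_code requirement, and the chat-template usage are assumptions drawn from common practice for large MoE releases, not details confirmed by the model card.

```python
# Minimal sketch: loading the checkpoint with Hugging Face transformers.
# The repo ID and chat-template usage are assumptions; check the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meituan-longcat/LongCat-Flash-Thinking"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the dtype stored in the checkpoint
    device_map="auto",       # shard across available GPUs
    trust_remote_code=True,  # custom MoE architectures often require this
)

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```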
Qwen Image Edit - Growing Community Interest
Company: Alibaba Cloud (Established tech company)
Released: Previously launched, gaining traction
Source: Reddit Discussion
Qwen Image Edit, an AI-powered image editing tool from Alibaba Cloud, is gaining attention in the AI art community for its capabilities. Users on Reddit are sharing successful experiments and probing the tool's limits. Official documentation appears limited, so the community is working out best practices and workflows; one user is compiling a comprehensive guide to be released freely. The tool appears particularly effective for sophisticated image manipulations, though some users report inconsistent results with default settings.
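Given the limited documentation the community mentions, a minimal usage sketch may help. It assumes the QwenImageEditPipeline integration that recent diffusers releases appear to ship for the Qwen/Qwen-Image-Edit checkpoint; verify the class and repo names against the model card before relying on them.

```python
# Hedged sketch: editing an image with Qwen Image Edit via diffusers.
# QwenImageEditPipeline and the Qwen/Qwen-Image-Edit repo ID are
# assumptions based on recent diffusers releases, not verified here.
import torch
from PIL import Image
from diffusers import QwenImageEditPipeline

pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
).to("cuda")

source = Image.open("photo.png").convert("RGB")  # any local input image
edited = pipe(
    image=source,
    prompt="replace the sky with a dramatic sunset",
    num_inference_steps=50,
).images[0]
edited.save("edited.png")
```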
TECHNOLOGY
Open Source Projects
AUTOMATIC1111/stable-diffusion-webui - 156,603 stars
The most popular web interface for Stable Diffusion, built with Gradio. Recently updated with fixes for image upscaling on CPU systems, this comprehensive UI offers outpainting, inpainting, color sketch, prompt matrix, and more, and remains the go-to option for Stable Diffusion users.
comfyanonymous/ComfyUI - 88,967 stars
A powerful and modular visual AI engine featuring a node-based interface for diffusion models. With recent updates to its WAN (Wan video model) nodes and fixes to the LoRA trainer for FP8 models, ComfyUI offers greater customization and control than traditional interfaces, making it popular among advanced users who need fine-grained control over their generation pipelines.
Models & Datasets
tencent/SRPO
A new text-to-image diffusion model from Tencent that has quickly gained attention, with 865 likes. The model appears to implement techniques from arXiv:2509.06942 and offers enhanced image generation capabilities, making it one of the most popular recent additions to Hugging Face.
Alibaba-NLP/Tongyi-DeepResearch-30B-A3B
A 30B parameter language model from Alibaba that utilizes a mixture-of-experts (MoE) architecture based on the Qwen3 family. With 500 likes and compatibility with AutoTrain and Inference Endpoints, this conversational model represents Alibaba's continued push into the large language model space.
google/vaultgemma-1b
A differentially private 1B parameter model from Google, trained using DP-SGD (Differentially Private Stochastic Gradient Descent). VaultGemma demonstrates Google's focus on privacy-preserving machine learning techniques while still maintaining usable text generation capabilities. The model references multiple privacy-focused research papers in its metadata.
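DP-SGD, the training method named here, bounds any single example's influence on the model by clipping each per-example gradient and adding calibrated Gaussian noise before the optimizer step. The PyTorch sketch below illustrates the mechanism only; it is not Google's training code, and production systems vectorize per-sample gradients rather than using this naive loop.

```python
# Illustrative DP-SGD step in plain PyTorch (not VaultGemma's pipeline).
import torch

def dp_sgd_step(model, loss_fn, batch, optimizer, clip_norm=1.0, noise_mult=1.0):
    """One DP-SGD step: per-example gradient clipping, then Gaussian noise."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    for x, y in batch:  # batch: iterable of (input, target) pairs
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        # Scale this example's gradient so its global L2 norm is <= clip_norm
        total_norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in params)).item()
        scale = min(1.0, clip_norm / (total_norm + 1e-6))
        for s, p in zip(summed, params):
            s.add_(p.grad, alpha=scale)

    for p, s in zip(params, summed):
        noise = torch.randn_like(s) * (noise_mult * clip_norm)
        p.grad = (s + noise) / len(batch)  # noisy average replaces the gradient
    optimizer.step()
```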
Wan-AI/Wan2.2-Animate-14B
A 14B-parameter animation-focused diffusion model that has garnered 277 likes and nearly 9,000 downloads. Based on research from arXiv:2503.20314, this model specializes in generating animated content and is complemented by a popular Gradio space (Wan-AI/Wan2.2-Animate) for easy access.
HuggingFaceFW/finepdfs
A massive multilingual dataset specifically designed for PDF processing and understanding, with support for hundreds of languages. With 546 likes and over 74,000 downloads, this dataset provides valuable training data for models that need to handle PDF documents in diverse languages and formats.
Developer Tools & Spaces
Wan-AI/Wan2.2-Animate
A Gradio-based interface for the Wan2.2-Animate model, making animation generation accessible without requiring local installation. With 215 likes, this space provides an easy way to experiment with advanced animation capabilities.
Kwai-Kolors/Kolors-Virtual-Try-On
An immensely popular virtual try-on application with 9,677 likes. This Gradio space lets users virtually try on clothing items, demonstrating a practical application of AI in fashion e-commerce and showing how computer vision models can enhance shopping experiences.
finegrain/finegrain-image-enhancer
A specialized image enhancement space with 1,767 likes that combines upscaling, clarity improvements, and refinement techniques. It leverages Stable Diffusion 1.5 and Juggernaut models to provide high-quality image enhancement for artistic purposes.
not-lain/background-removal
A highly practical background removal tool with 2,336 likes. This space provides a clean interface for automatically removing backgrounds from images, a common task in image editing and content creation workflows.
RESEARCH
Paper of the Day
VOX-KRIKRI: Unifying Speech and Language through Continuous Fusion (2025-09-19)
Dimitrios Damianos, Leon Voukoutis, Georgios Paraskevopoulos, Vassilis Katsouros
This paper stands out for introducing a multimodal integration approach that bridges speech and language models without relying on tokenization boundaries. VOX-KRIKRI's continuous fusion mechanism connects Whisper's speech encoder to LLMs through an intermediate audio-conditioned text space, yielding a more natural and effective alignment of the two modalities. The approach demonstrates state-of-the-art performance on audio question-answering benchmarks while maintaining computational efficiency, paving the way for more integrated speech-language AI systems.
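To make "continuous fusion" concrete, the schematic sketch below shows one way such an adapter could look: continuous Whisper encoder states are projected into the LLM's embedding space and prepended to the text embeddings, so no audio tokenization boundary is imposed. This is our reading of the abstract, not the authors' released code, and all dimensions are illustrative.

```python
# Schematic continuous speech-LLM fusion adapter (illustrative, not the
# authors' implementation; dimensions are placeholder assumptions).
import torch
import torch.nn as nn

class ContinuousFusionAdapter(nn.Module):
    def __init__(self, audio_dim=1280, llm_dim=4096):  # 1280 ~ Whisper large
        super().__init__()
        # MLP projector from the audio encoder width to the LLM width
        self.proj = nn.Sequential(
            nn.Linear(audio_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, llm_dim)
        )

    def forward(self, audio_states, text_embeds):
        # audio_states: (B, T_audio, audio_dim) continuous Whisper features
        # text_embeds:  (B, T_text, llm_dim) LLM input embeddings
        fused_prefix = self.proj(audio_states)                # (B, T_audio, llm_dim)
        return torch.cat([fused_prefix, text_embeds], dim=1)  # joint sequence
```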
Notable Research
AToken: A Unified Tokenizer for Vision (2025-09-17)
Jiasen Lu, Liangchen Song, Mingze Xu, Byeongjoo Ahn, Yanjun Wang, Chen Chen, Afshin Dehghan, Yinfei Yang
The first unified visual tokenizer that handles both high-fidelity reconstruction and semantic understanding across images, videos, and 3D assets, encoding diverse visual inputs into a shared 4D latent space using a pure transformer architecture.
Rethinking Molecule Synthesizability with Chain-of-Reaction (2025-09-19)
Seul Lee, Karsten Kreis, Srimukh Prasad Veccham, Meng Liu, Danny Reidenbach, Saee Paliwal, Weili Nie, Arash Vahdat
Introduces ReaSyn, a generative framework that tackles the synthesizability challenge in molecular generation by exploring neighboring synthesizable molecules through a chain-of-reaction mechanism, significantly improving coverage and optimization performance.
BEFT: Bias-Efficient Fine-Tuning of Language Models (2025-09-19)
Baichuan Huang, Ananth Balashankar, Amir Aminifar
Presents a parameter-efficient fine-tuning technique that updates only bias terms, achieving competitive performance with orders of magnitude fewer trainable parameters than other PEFT methods and proving particularly effective in low-data scenarios.
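The core recipe, training only bias vectors, is easy to state in code. The sketch below follows the general bias-only fine-tuning pattern; BEFT's specific criterion for selecting which bias terms to update is not reproduced here, and the model and hyperparameters are placeholder choices.

```python
# Minimal bias-only fine-tuning sketch in PyTorch/transformers: freeze
# everything except bias vectors (plus the task head), then train as usual.
# BEFT's rule for choosing *which* biases to update is not shown.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
for name, param in model.named_parameters():
    # Keep the randomly initialized classifier head trainable too,
    # since it has no pretrained weights to fall back on.
    param.requires_grad = name.endswith(".bias") or name.startswith("classifier")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.4f}%)")

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
# ...standard training loop over a downstream dataset follows...
```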
Inverting Trojans in LLMs (2025-09-19)
Zhengxing Li, Guangmingmei Yang, Jayaram Raghuram, David J. Miller, George Kesidis
Addresses the critical security challenge of detecting and inverting backdoor triggers in LLMs by developing innovative methods to overcome the discrete input space problem and efficiently search through the vast token tuple space.
LOOKING AHEAD
As we move toward Q4 2025, the integration of embodied AI with multimodal LLMs continues to reshape human-AI interaction. The emerging generation of models capable of real-time physical world reasoning—interpreting visual, auditory, and spatial data simultaneously—signals a shift from passive text generators to active environmental reasoners. Several research labs are now demonstrating early prototypes of systems that can understand implicit social cues and maintain coherent identity across long-term interactions.
We anticipate that by early 2026, the first wave of commercially viable AI systems combining neuromorphic hardware with quantum-assisted training will reach market, dramatically reducing both training costs and inference latency. This will likely accelerate the development of personalized AI assistants with genuine contextual memory, further blurring the line between digital tools and collaborative partners.