AGI Agent


LLM Daily: September 30, 2025

🔍 LLM DAILY

Your Daily Briefing on Large Language Models

September 30, 2025

HIGHLIGHTS

• OpenAI co-founder John Schulman has published research on AI training efficiency showing that Low-Rank Adaptation (LoRA) can match full fine-tuning performance in reinforcement learning while using only two-thirds of the computational resources.

• The recruiting AI sector is seeing significant investment activity: Y Combinator-backed startup Alex has raised $17M for interview-automation technology, and Sequoia Capital is backing the Juicebox recruiting platform.

• Researchers from the University of Sydney have developed Dynamic Experts Search (DES), a novel technique that enhances reasoning in Mixture-of-Experts LLMs without additional training, boosting performance by 17.2% on GSM8K and 18.5% on MATH benchmarks.

• Open-source AI development is accelerating with Dify's production-ready platform for agentic workflows reaching over 115,000 GitHub stars and "LLMs-from-scratch" providing educational resources for building GPT-like models in PyTorch.


BUSINESS

Funding & Investment

AI Recruiter Alex Raises $17M to Automate Initial Job Interviews (2025-09-29)
Alex, a Y Combinator alum, secured a $17 million Series A round led by Peak XV Partners. The startup is building AI technology to automate the initial job interview process, potentially transforming HR tech workflows. TechCrunch

Sequoia Capital Invests in Recruiting Platform Juicebox (2025-09-25)
Sequoia Capital announced its partnership with Juicebox, describing it as "The Recruiting Platform Founders Are Obsessed With." This investment highlights continued VC interest in AI-powered recruiting solutions. Sequoia Capital

Technology & Competition

DeepSeek Releases 'Sparse Attention' Model to Cut API Costs (2025-09-29)
DeepSeek researchers have released a new experimental model using sparse attention techniques that reportedly cuts API costs in half for long-context operations. This development could significantly reduce the operational costs of AI systems, particularly for applications requiring extended context windows. TechCrunch
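Sparse attention in general restricts each token to a subset of earlier positions, which is where the long-context savings come from. The sketch below counts attended query-key pairs under full causal attention versus a generic sliding window; it illustrates the cost argument only, and is not a description of DeepSeek's specific attention pattern, which the report does not detail.

```python
def attended_pairs(seq_len, window=None):
    """Count query-key pairs computed under causal attention.
    window=None -> full causal attention (quadratic in seq_len);
    window=w    -> each token attends to at most its w most recent tokens."""
    total = 0
    for q in range(seq_len):
        keys = q + 1 if window is None else min(q + 1, window)
        total += keys
    return total

full = attended_pairs(32_768)                # full attention over a 32K context
sparse = attended_pairs(32_768, window=4096) # illustrative 4K sliding window
print(sparse / full)  # ~0.23: roughly a 4x reduction in attention compute
```

The ratio shrinks further as the context grows, since full attention scales quadratically while windowed attention scales linearly.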

OpenAI Challenges Google and Amazon with New Shopping System (2025-09-29)
OpenAI has launched a new agentic shopping system, positioning itself as a direct competitor to Google and Amazon in the e-commerce space. This marks a significant expansion of OpenAI's commercial strategy beyond core AI development. TechCrunch

Policy & Regulation

California Governor Signs Landmark AI Safety Bill (2025-09-29)
Governor Gavin Newsom signed SB 53, a landmark AI safety bill requiring large AI labs including OpenAI, Anthropic, Meta, and Google DeepMind to implement transparent safety protocols. The legislation also provides whistleblower protections for employees at these companies, setting a potential precedent for AI regulation in other jurisdictions. TechCrunch

International Developments

South Korea Launches Initiative to Compete with Global AI Leaders (2025-09-27)
South Korea has unveiled its most ambitious sovereign AI initiative to date, with major tech companies including LG and SK Telecom developing their own large language models. This national effort aims to establish South Korean AI capabilities that can compete with OpenAI, Google, and other global leaders. TechCrunch


PRODUCTS

Thinking Machines Releases Research on LoRA for Reinforcement Learning

Blog Post | Thinking Machines (led by OpenAI co-founders) | (2025-09-29)

John Schulman, OpenAI co-founder and researcher at Thinking Machines, has published significant new research demonstrating that Low-Rank Adaptation (LoRA) can match the performance of full fine-tuning in reinforcement learning scenarios while using only two-thirds of the computational resources. This breakthrough challenges the previous assumption that high-quality thinking models required extensive hardware (8+ GPUs). The research suggests that LoRA can now be used to effectively train models with substantially reduced computational requirements, potentially democratizing advanced model development.
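The mechanism LoRA relies on can be sketched in a few lines of PyTorch: freeze the pretrained weight and learn only a low-rank update, so the trainable parameter count drops from d_out×d_in to r×(d_in+d_out). This is a generic illustration of the technique, not the Thinking Machines training setup; the rank, scaling factor, and layer size below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B A x, with W frozen and A, B trainable."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # freeze pretrained weights
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(d_out, r))        # up-projection, zero-init
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(trainable, total)  # 8192 trainable out of 270848 total parameters
```

Because B starts at zero, the adapted layer initially behaves exactly like the frozen base model, and only the small A/B matrices receive gradients during RL fine-tuning.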

New Research Paper: "No Prompt Left Behind" Advances RLHF Techniques

Paper on arXiv | Academic Research | (2025-09-29)

A new research paper introduces "Entropy-Guided Advantage Shaping," an enhancement to Reinforcement Learning with Verifiable Rewards (RLVR). The technique addresses a critical limitation in current approaches like GRPO that only use problems where model responses differ significantly. The new method enables training on a wider range of prompts, including those with zero-variance responses, potentially improving large language models' reasoning abilities. The research is available on both arXiv and Hugging Face.
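The limitation the paper targets is easy to see in code. GRPO-style methods compute group-relative advantages, so a prompt where every sampled response receives the same verifiable reward yields all-zero advantages and contributes no learning signal. The sketch below shows the baseline behavior being fixed, not the paper's entropy-guided shaping term itself.

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages as in GRPO: (r - mean) / (std + eps).
    With verifiable 0/1 rewards, a group where every sample scores the
    same produces all-zero advantages -- and therefore zero gradient."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

mixed = grpo_advantages([1.0, 0.0, 1.0, 0.0])        # informative group
degenerate = grpo_advantages([1.0, 1.0, 1.0, 1.0])   # zero-variance group

print(mixed)       # nonzero advantages: this prompt trains the model
print(degenerate)  # [0.0, 0.0, 0.0, 0.0]: this prompt is wasted
```

"No Prompt Left Behind" shapes the advantages of these degenerate groups so that zero-variance prompts, which current pipelines discard, still provide a training signal.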

New Stable Diffusion Workflow Combines Multiple Models for Enhanced Video Generation

Workflow on Civitai | Community Developer (alisitskii) | (2025-09-29)

A new AI creative workflow combines HiDream, WanVision 2.2, USDU, and GIMM VFI models to produce highly realistic and atmospheric AI-generated videos. The multi-stage process begins with text-to-image generation, followed by image-to-image refinement for realism, conversion to video, upscaling to 1080p, frame interpolation for smoothness, and AI-generated audio for a complete multimedia experience. The community reception has been enthusiastic, with users noting the impressive quality and realism of the generated content.


TECHNOLOGY

Open Source Projects

langgenius/dify - Production-Ready Platform for Agentic Workflows

Dify is a comprehensive platform for building and deploying AI applications with agentic workflows. It recently added file upload capabilities for workflow development, allowing developers to recreate Google NotebookLM-like podcast experiences. The project is gaining significant traction with over 115,000 GitHub stars and continuous improvements to its core functionalities and type safety.

rasbt/LLMs-from-scratch - Build a GPT-like LLM in PyTorch

This educational repository provides step-by-step guidance for implementing a ChatGPT-like large language model from scratch using PyTorch. With nearly 74,000 stars and growing rapidly (+390 today), it serves as the official code companion to Sebastian Raschka's book of the same name. Recent updates include improved generation stability on MPS devices and compatibility testing with Python 3.13.

Shubhamsaboo/awesome-llm-apps - LLM Application Showcase

A curated collection of practical LLM applications showcasing AI agents and RAG implementations using OpenAI, Anthropic, Gemini, and open-source models. With over 70,000 stars, this repository recently added a YAML-based Multi-Agent Web Research System with Firecrawl MCP, emphasizing low-code implementations for complex AI workflows.

Models & Datasets

Text-to-Image & 3D Generation

  • tencent/HunyuanImage-3.0 - The latest version of Tencent's Hunyuan text-to-image model, already drawing 631 likes shortly after release.
  • Wan-AI/Wan2.2-Animate-14B - An animation-focused diffusion model with 563 likes and over 35,000 downloads, implementing techniques from a recent arXiv paper (2503.20314) for creating animated content.
  • tencent/Hunyuan3D-Part - Specialized 3D generation model from Tencent focusing on part segmentation and generation, built on their Hunyuan3D-2.1 architecture and trained on Objaverse and Objaverse-XL datasets.

Multimodal & LLMs

  • Qwen/Qwen3-Omni-30B-A3B-Instruct - A 30B-parameter multimodal model from Qwen that supports text-to-audio and any-to-any transformations. With 547 likes and over 125,000 downloads, it's among the most widely deployed recent releases.
  • deepseek-ai/DeepSeek-V3.2-Exp - An experimental conversational model from DeepSeek with FP8 quantization support, compatible with endpoints and AutoTrain, released under an MIT license.

Datasets

  • openai/gdpval - A multimodal validation dataset from OpenAI spanning audio, document, image, text, and video modalities. With 121 likes and over 6,500 downloads since its release on September 25th, it's gaining significant adoption for model evaluation.
  • nvidia/Nemotron-Personas-Japan - A synthetic Japanese-language dataset from NVIDIA featuring persona-based text and images, with over 6,800 downloads and categorized in the 1-10M size range.
  • tencent/WildSpeech-Bench - A speech benchmark dataset containing audio and text samples for evaluating LLM speech processing capabilities, accompanied by a recent arXiv paper (2506.21875).
  • ScaleAI/SWE-bench_Pro - A software engineering benchmark from Scale AI for evaluating coding capabilities of models, gaining quick adoption with over 2,600 downloads since release.

Developer Tools & Spaces

  • Wan-AI/Wan2.2-Animate - A highly popular Gradio interface for the Wan2.2 animation model with over 1,100 likes, allowing easy access to the model's animation capabilities.
  • multimodalart/ai-toolkit - A comprehensive Docker-based toolkit for AI development, gaining traction with 84 likes.
  • Kwai-Kolors/Kolors-Virtual-Try-On - An extremely popular virtual try-on application with 9,720 likes, demonstrating practical retail applications of AI technology.
  • not-lain/background-removal - A widely-used tool for image background removal with over 2,300 likes, implementing efficient MCP server technology for processing.
  • Respair/Takane - A new Japanese text-to-speech system featuring autoregressive speech generation with anime-style voice output, showcasing specialized voice synthesis capabilities.

RESEARCH

Paper of the Day

Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time (2025-09-26)

Authors: Yixuan Han, Fan Ma, Ruijie Quan, Yi Yang

Institution: The University of Sydney

This paper is significant because it introduces a novel approach to improve the reasoning capabilities of Mixture-of-Experts (MoE) LLMs without requiring additional training or fine-tuning. By identifying that varying the number of activated experts in MoE architectures creates diverse solution paths, the researchers unlock a new dimension for test-time optimization.

The authors propose Dynamic Experts Search (DES), which systematically explores different expert activation patterns during inference to find optimal reasoning paths. Their approach demonstrates substantial improvements across multiple reasoning benchmarks, including a 17.2% boost on GSM8K and 18.5% on MATH when applied to Mixtral-8x7B. These results are particularly important as they reveal a previously underexplored source of model diversity that can be leveraged during inference time.
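The search dimension DES exploits can be illustrated with a toy model: treating the number of activated experts as a knob that changes the routing, sample one solution path per expert count and aggregate the answers. Everything below (the mock MoE, the candidate k values, the majority-vote aggregation) is an illustrative assumption to show the shape of the idea, not the authors' implementation or scoring rule.

```python
import random

def moe_answer(question_seed, k):
    """Toy stand-in for a MoE forward pass: the number of activated
    experts k changes the routing, hence the sampled reasoning path.
    A real system would run the model with top-k expert activation."""
    rng = random.Random(question_seed * 100 + k)
    return rng.choice([41, 42, 42, 43])  # toy answer distribution

def dynamic_experts_search(question_seed, k_values=(1, 2, 4, 8)):
    """Sketch of the DES idea: explore one solution path per expert
    count at test time, then aggregate (here, by majority vote)."""
    answers = [moe_answer(question_seed, k) for k in k_values]
    return max(set(answers), key=answers.count)

print(dynamic_experts_search(0))
```

The key observation from the paper is that these expert-count-varied paths are diverse enough to make such test-time aggregation pay off, without any additional training.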

Notable Research

InfiAgent: Self-Evolving Pyramid Agent Framework for Infinite Scenarios (2025-09-26)

Authors: Chenglin Yu, Yang Yu, Songmiao Wang, et al.

InfiAgent introduces a novel self-evolving pyramid framework that enables LLM agents to automatically adapt to new scenarios without manual prompt engineering or human intervention. The system uses a hierarchical approach with specialized experts and a top-down/bottom-up evolution mechanism, demonstrating superior performance compared to traditional agent frameworks across diverse tasks.

VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing (2025-09-26)

Authors: Ke Wang, Houxing Ren, Zimu Lu, Mingjie Zhan, Hongsheng Li

This research introduces the first comprehensive benchmark for evaluating voice-first AI assistants, covering 13 multimodal task categories with over 10,000 examples. VoiceAssistant-Eval assesses capabilities across audio understanding, speech generation, and visual processing, providing crucial insights into the current limitations of leading AI systems in voice-first interactions.

Where MLLMs Attend and What They Rely On: Explaining Autoregressive Token Generation (2025-09-26)

Authors: Ruoyu Chen, Xiaoqing Guo, Kangwei Liu, et al.

The researchers present EAGLE, a novel black-box framework that explains how multimodal LLMs generate tokens by attributing outputs to specific visual regions and quantifying the importance of different modalities. This work significantly advances our understanding of multimodal model interpretability by revealing which visual elements influence specific parts of the generated text.

PRIME: Planning and Retrieval-Integrated Memory for Enhanced Reasoning (2025-09-26)

Authors: Hieu Tran, Zonghai Yao, Nguyen Luong Tran, et al.

PRIME introduces a novel agent framework that combines task planning with a sophisticated memory system that dynamically retrieves and organizes contextual information. By integrating planning with memory management, this approach significantly improves LLM performance on complex, multi-step reasoning tasks requiring long-term context retention.

AxLLM: accelerator architecture for large language models with computation reuse capability (2025-09-26)

Authors: Soroush Ahadi, Mehdi Modarressi, Masoud Daneshtalab

This paper presents a hardware accelerator architecture specifically designed for quantized LLMs that exploits parameter locality to enable computation reuse. The proposed architecture delivers up to 3.8× speedup and 3.6× energy reduction compared to state-of-the-art accelerators, demonstrating a promising direction for more efficient LLM deployment on resource-constrained devices.
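The computation-reuse principle behind such accelerators can be mimicked in software: a low-bit quantized weight vector takes only a handful of distinct values, so instead of one multiply per weight, you can sum the activations that share each quantized level and do one multiply per level. This is a generic illustration of reuse for quantized dot products, not a model of AxLLM's actual hardware datapath; the 4-bit codebook below is an assumption.

```python
import numpy as np

def dot_with_reuse(x, w_q, levels):
    """Dot product with computation reuse: group activations by their
    weight's quantized level, then multiply once per level
    (16 multiplies for 4-bit weights, however long the vector is)."""
    partial = np.zeros(len(levels))
    for level_idx in range(len(levels)):
        partial[level_idx] = x[w_q == level_idx].sum()  # accumulate, no multiplies
    return float(partial @ levels)                      # one multiply per level

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)                 # activations
levels = np.linspace(-1, 1, 16)               # illustrative 4-bit codebook
w_q = rng.integers(0, 16, size=1024)          # quantized weight indices
exact = float(x @ levels[w_q])                # conventional dot product
reused = dot_with_reuse(x, w_q, levels)
print(abs(exact - reused) < 1e-9)
```

In hardware, trading 1024 multiplies for 1024 additions plus 16 multiplies is what yields the reported speedup and energy savings.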


LOOKING AHEAD

As we close Q3 2025, the convergence of multimodal LLMs with real-time data processing is reshaping AI capabilities. The recent breakthroughs in sub-1ms inference latency suggest that by Q1 2026, we'll see the first truly conversational AI assistants indistinguishable from human interactions in specialized domains.

Watch for the emerging "cognitive architecture" paradigm gaining momentum in Q4, as researchers move beyond monolithic models toward modular AI systems with specialized components working in concert. These developments, coupled with the anticipated regulatory frameworks expected from the Global AI Governance Summit in November, will likely accelerate responsible AI deployment while addressing growing concerns about synthetic media authentication and AI-generated content provenance.

Don't miss what's next. Subscribe to AGI Agent.