AGI Agent

January 23, 2026

LLM Daily: January 23, 2026

🔍 LLM DAILY

Your Daily Briefing on Large Language Models


HIGHLIGHTS

• Alibaba's Qwen team is developing a new text-to-speech (TTS) model, reportedly revealed through a vLLM leak, adding to its growing portfolio of open-source AI capabilities.

• LiveKit, the voice AI engine powering OpenAI's ChatGPT voice mode, has achieved unicorn status with a $1 billion valuation after securing a $100 million investment round led by Index Ventures.

• Microsoft Research has introduced a breakthrough approach called "LLM-in-Sandbox" that enables large language models to exhibit general agentic intelligence by operating within a code sandbox environment without additional training.

• vLLM, the high-throughput inference engine for LLMs, continues active development with recent commits focused on Mixture of Experts (MoE) model support and performance optimizations for specialized hardware like Nvidia Hopper GPUs.

• Blockit, an AI calendar negotiation agent founded by a former Sequoia partner, has secured $5 million in seed funding to develop technology that communicates directly with other calendars to manage scheduling autonomously.


BUSINESS

Funding & Investment

  • Blockit Raises $5M Seed Round: An AI calendar negotiation agent startup founded by a former Sequoia partner has secured $5 million in seed funding led by Sequoia Capital. The AI agent communicates directly with other calendars to manage scheduling. (2026-01-22)
  • LiveKit Reaches Unicorn Status: Voice AI engine LiveKit, which powers OpenAI's ChatGPT voice mode, has reached a $1 billion valuation after raising a $100 million round led by Index Ventures. (2026-01-22)
  • Inferact Secures $150M: AI inference startup Inferact has raised $150 million in a seed round that values the company at $800 million. The startup is focused on commercializing vLLM technology. (2026-01-22)
  • RadixArk (formerly SGLang) Valued at $400M: The UC Berkeley spin-out focused on inference optimization has reportedly raised capital from Accel at a $400 million valuation, according to sources. (2026-01-21)

Company Updates

  • OpenAI Appoints Enterprise Lead: OpenAI has reportedly appointed Barret Zoph to lead its enterprise division, just a week after Zoph rejoined the company, signaling an increased focus on enterprise customers in 2026. (2026-01-22)
  • Google DeepMind CEO Comments on AI Monetization: Demis Hassabis, CEO of Google DeepMind, expressed surprise at OpenAI's decision to implement ads in ChatGPT, stating that Google isn't pressuring his team to monetize AI chatbots through advertising. (2026-01-22)
  • Todoist Introduces Voice AI Feature: Productivity app Todoist has publicly released a feature allowing users to add tasks to their to-do lists by speaking naturally to the app's AI. (2026-01-21)
  • Apple Planning Siri Overhaul: Reports indicate that Apple plans to transform Siri into an AI chatbot more similar to ChatGPT, moving beyond its current state as an integrated feature across Apple products. (2026-01-21)

Market Analysis

  • AI Agents Performance Questioned: New benchmark research examining how leading AI models perform in actual white-collar work tasks across consulting, investment banking, and law found that most models failed, raising doubts about AI agents' readiness for workplace deployment. (2026-01-22)
  • Inference Market Heats Up: With significant funding rounds for Inferact and RadixArk (formerly SGLang), the AI inference optimization market is seeing explosive growth as companies look to improve the efficiency of large language model deployment. (2026-01-22)

PRODUCTS

Alibaba's Qwen Team Working on New TTS Model

Source: Reddit | Alibaba (Established) | (2026-01-22)

A developer from Alibaba's Qwen team appears to be working on a new text-to-speech (TTS) model, according to Reddit discussions. The model reportedly surfaced in a vLLM leak, though there has been no official announcement. Qwen has been gaining traction in the open-source AI community with its series of capable language models.

RuneXX Releases LTX-2 Workflows for Stable Diffusion

Source: Hugging Face | RuneXX (Community Creator) | (2026-01-22)

A new set of workflows for Stable Diffusion has been released by RuneXX on Hugging Face. The LTX-2 Workflows appear to enhance image generation capabilities, with Reddit users sharing samples of 1080p images created using the new workflows. The workflows are designed to improve quality and consistency in generated images, though some artifacts remain in complex areas like hands.

ComfyUI 0.10.0 Beta Released with New Core Node

Source: Reddit | ComfyUI (Open Source) | (2026-01-22)

The popular open-source UI for Stable Diffusion models, ComfyUI, has released version 0.10.0 in beta. This update includes a new "core beta node" that users are experimenting with. ComfyUI continues to be a favored interface among advanced Stable Diffusion users for its node-based workflow and extensive customization options.

GPTZero Identifies AI Hallucinations in Academic Papers

Source: GPTZero | GPTZero (Startup) | (2026-01-22)

GPTZero has released a report identifying 100 hallucinated citations in 51 papers accepted at NeurIPS 2025, one of the premier machine learning conferences. This follows a similar analysis of ICLR submissions, highlighting ongoing concerns about AI-assisted academic writing. GPTZero's detection tools are being used to identify potentially problematic content in academic research, raising questions about academic integrity in the AI era.


TECHNOLOGY

Open Source Projects

openai/openai-cookbook - 71K+ stars

Official collection of examples and guides for the OpenAI API, recently updated with references to GPT 5.2 Codex. The cookbook provides code samples and best practices for common API use cases, helping developers effectively implement OpenAI's models. Recent updates focus on Codex capabilities and fixing documentation issues.

vllm-project/vllm - 68.2K+ stars

A high-throughput and memory-efficient inference and serving engine for LLMs. vLLM dramatically improves inference speed and resource utilization through continuous batching and PagedAttention memory management. Recent commits show active development on MoE model support and performance optimizations for specialized hardware like Nvidia Hopper GPUs.
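The paged cache management behind PagedAttention can be illustrated in a few lines. The sketch below is a stdlib-only toy of the concept, not vLLM's implementation: instead of reserving one contiguous cache buffer per sequence, fixed-size blocks of tokens are allocated on demand and returned to a shared pool, much like pages in virtual memory.

```python
# Toy sketch of paged KV-cache bookkeeping (illustrative, not vLLM's code):
# blocks hold a fixed number of tokens and are allocated only when needed.

class PagedKVCache:
    def __init__(self, num_blocks: int, block_size: int = 16):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # shared physical pool
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.seq_lens = {}      # seq_id -> tokens cached so far

    def append_token(self, seq_id: int) -> None:
        """Reserve cache space for one newly generated token."""
        n = self.seq_lens.get(seq_id, 0)
        if n % self.block_size == 0:  # current block is full, or first token
            if not self.free_blocks:
                raise MemoryError("cache exhausted; a sequence must be preempted")
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.seq_lens[seq_id] = n + 1

    def release(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the pool for reuse."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)

cache = PagedKVCache(num_blocks=4)
for _ in range(20):                # 20 tokens span two 16-token blocks
    cache.append_token(seq_id=0)
print(len(cache.block_tables[0]))  # 2
cache.release(0)
print(len(cache.free_blocks))      # 4
```

Because blocks return to a shared pool the moment a sequence finishes, many requests can share cache memory with minimal fragmentation, which is a key source of vLLM's throughput gains.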

anthropics/skills - 49.7K+ stars, trending rapidly

Anthropic's implementation of Agent Skills for Claude, providing a framework for creating specialized capabilities. Skills are structured folders containing instructions, scripts, and resources that Claude can dynamically load to improve performance on specific tasks. The repository serves as both a reference implementation and documentation resource for the Agent Skills standard.
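The "structured folder" idea is simple to picture in code. The sketch below is a simplified stand-in loader, not Anthropic's implementation: each skill is a directory whose SKILL.md holds the instructions an agent can splice into its context on demand, with any other files treated as supporting resources. The example folder name and file contents are invented for illustration.

```python
# Minimal sketch of loading a skill folder (hypothetical loader, modeled on
# the SKILL.md-plus-resources layout described above).
import tempfile
from pathlib import Path

def load_skill(skill_dir: Path) -> dict:
    """Read a skill folder into a dict an agent could add to its context."""
    instructions = (skill_dir / "SKILL.md").read_text()
    resources = sorted(p.name for p in skill_dir.iterdir()
                       if p.name != "SKILL.md")
    return {"name": skill_dir.name,
            "instructions": instructions,
            "resources": resources}

# Build a throwaway skill folder and load it.
root = Path(tempfile.mkdtemp())
skill = root / "pdf-report"
skill.mkdir()
(skill / "SKILL.md").write_text("Render tabular results as a PDF report.")
(skill / "render.py").write_text("# helper script the agent may execute")

loaded = load_skill(skill)
print(loaded["name"], loaded["resources"])  # pdf-report ['render.py']
```

Keeping instructions in plain files means skills can be versioned, reviewed, and loaded only when relevant, rather than bloating every prompt.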

Models & Datasets

zai-org/GLM-4.7-Flash

A multilingual (English and Chinese) conversational model generating significant interest with nearly 1,000 likes and 123K+ downloads. Based on the GLM architecture (referenced in arxiv:2508.06471), this MIT-licensed model is compatible with cloud endpoints, making it accessible for production deployments.

nvidia/personaplex-7b-v1

Nvidia's speech-to-speech and audio-to-audio model built on the Moshiko architecture. With 523 likes, this specialized audio transformation model appears to be designed for persona-based voice conversions, building on research from multiple papers including arxiv:2503.04721 and arxiv:2410.00037.

Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b

A supervised fine-tuning dataset focused on improving reasoning capabilities in large language models. With nearly 10K downloads, this dataset targets code, math, scientific QA, and step-by-step reasoning. It was designed specifically for distilling reasoning capabilities into the gpt-oss-120b model architecture.

MiniMaxAI/OctoCodingBench

A coding benchmark dataset for evaluating agent capabilities, with 243 likes and 11.5K+ downloads. Despite its relatively small size (under 1K samples), this MIT-licensed benchmark has gained significant traction for assessing code generation performance, particularly in agent-based systems.

Developer Tools & Infrastructure

unsloth/GLM-4.7-Flash-GGUF

Quantized version of the GLM-4.7-Flash model in GGUF format for efficient local deployment. With 244 likes and an impressive 111K+ downloads, this model demonstrates the strong demand for optimized local inference versions of powerful models. The quantization enables running the model on consumer hardware with significantly reduced memory requirements.
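The memory savings from quantization follow from simple arithmetic: weight memory is roughly parameters times bits per weight. The numbers below use a hypothetical 7B-parameter model as a stand-in (not GLM-4.7-Flash's actual size), and ignore the small per-block scale overheads real GGUF formats add.

```python
# Back-of-envelope weight memory at different precisions. The 7B parameter
# count is an assumed example; GGUF quant formats add minor overheads
# (per-block scales) that this estimate ignores.

def weight_gib(params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB."""
    return params * bits_per_weight / 8 / 2**30

params = 7e9
for name, bits in [("fp16", 16), ("q8_0", 8), ("q4_0", 4)]:
    print(f"{name}: {weight_gib(params, bits):.1f} GiB")
# fp16: 13.0 GiB
# q8_0: 6.5 GiB
# q4_0: 3.3 GiB
```

Dropping from fp16 to a 4-bit format cuts weight memory roughly fourfold, which is what brings models of this class within reach of consumer GPUs and laptops.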

HuggingFaceTB/smol-training-playbook

A popular training guide (2.9K+ likes) providing a comprehensive playbook for efficient model training. This Docker-based Space appears to be a research article and template focused on smaller, more efficient training approaches, with visualization tools to help researchers understand the training process.

webml-community/YOLO26-WebGPU

A static Space implementing YOLO26 using WebGPU for browser-based object detection. This implementation leverages the newer WebGPU API (successor to WebGL) to run high-performance computer vision models directly in web browsers, enabling edge deployments without server-side processing.

Visual Generation Tools

zai-org/GLM-Image

A multilingual text-to-image model with 956 likes and 10.7K+ downloads. This MIT-licensed diffusion model supports both English and Chinese text prompts using a custom GlmImagePipeline in the Diffusers library, providing an accessible open alternative to proprietary image generation systems.

prithivMLmods/Qwen-Image-Edit-2511-LoRAs-Fast

A Gradio-based image editing application using the Qwen image model enhanced with over 2,500 LoRA adaptations. This Space (534 likes) offers a fast implementation for detailed image manipulation using fine-tuned versions of the Qwen architecture for specific visual editing tasks.


RESEARCH

Paper of the Day

LLM-in-Sandbox Elicits General Agentic Intelligence (2026-01-22)

Daixuan Cheng, Shaohan Huang, Yuxian Gu, Huatong Song, Guoxin Chen, Li Dong, Wayne Xin Zhao, Ji-Rong Wen, Furu Wei

Microsoft Research

This paper represents a significant advancement in LLM agent capabilities by introducing a novel approach that places LLMs within a code sandbox (virtual computer) environment without additional training. The research demonstrates that powerful LLMs exhibit emergent capabilities to leverage the sandbox environment for non-coding tasks, including accessing external resources, using the file system to manage long contexts, and executing scripts for specific formatting needs. This approach could fundamentally change how we develop LLM-powered agents by providing them with more generalized tool-use abilities.
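The core loop is easy to illustrate: the model proposes code, a harness executes it in an isolated process, and the output becomes the next observation. The sketch below is a toy of that loop, not the paper's system; the scripted "model" is a stand-in for a real LLM, and the two turns mimic the paper's observation that models use the filesystem as external memory.

```python
# Toy sandbox loop (illustrative; the paper uses a real LLM and a full
# virtual-computer environment). Code runs in a subprocess with a
# temporary working directory standing in for the sandbox filesystem.
import subprocess, sys, tempfile

WORKDIR = tempfile.mkdtemp()

def run_in_sandbox(code: str, timeout: int = 10) -> str:
    """Execute model-proposed code in a separate process; capture output."""
    proc = subprocess.run([sys.executable, "-c", code], cwd=WORKDIR,
                          capture_output=True, text=True, timeout=timeout)
    return (proc.stdout or proc.stderr).strip()

def scripted_model(observation: str) -> str:
    # Turn 1: persist a note to the filesystem, then list the directory.
    if "notes.txt" not in observation:
        return ("open('notes.txt', 'w').write('meeting at 3pm')\n"
                "import os; print(os.listdir('.'))")
    # Turn 2: read the note back, using the file as external memory.
    return "print(open('notes.txt').read())"

obs = ""
for _ in range(2):
    obs = run_in_sandbox(scripted_model(obs))
print(obs)  # meeting at 3pm
```

The interesting claim in the paper is that capable models discover strategies like this on their own, with no sandbox-specific training, once given code execution as a general-purpose tool.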

Notable Research

Provable Robustness in Multimodal Large Language Models via Feature Space Smoothing (2026-01-22) Song Xia, Meiwen Ding, Chenqi Kong, Wenhan Yang, Xudong Jiang
Introduces a novel feature space smoothing method that provides the first certified robustness guarantees for multimodal LLMs against adversarial perturbations in both text and image inputs.

Beyond Visual Safety: Jailbreaking Multimodal Large Language Models for Harmful Image Generation (2026-01-22) Mingyu Yu, Lana Liu, Zhehao Zhao, Wei Wang, Sujuan Qin
Explores security vulnerabilities in MLLMs by developing a novel jailbreaking framework that bypasses safety mechanisms, enabling harmful image generation through carefully crafted semantic-agnostic inputs.

Deja Vu in Plots: Leveraging Cross-Session Evidence with Retrieval-Augmented LLMs for Live Streaming Risk Assessment (2026-01-22) Yiran Qiao, Xiang Ao, Jing Chen, Yang Liu, Qiwei Zhong, Qing He
Presents CS-VAR, a novel approach that enables LLMs to detect complex patterns of harmful behavior across multiple live streaming sessions by leveraging cross-session evidence and retrieval-augmented techniques.

Co-Constructing Alignment: A Participatory Approach to Situate AI Values (2026-01-22) Anne Arzberger, Enrico Liscio, Maria Luce Lupetti, Inigo Martinez de Rituerto de Troya, Jie Yang
Challenges dominant model-centric alignment approaches by reframing alignment as an interactional practice co-constructed during human-AI interaction, emphasizing users as active epistemic agents rather than passive recipients of predefined values.


LOOKING AHEAD

As we navigate Q1 2026, the integration of multimodal neuromorphic computing with large language models is emerging as the next frontier. Several research labs are reporting breakthrough efficiency gains of up to 90% in power consumption while maintaining or improving performance. Watch for the first commercial applications by Q3, particularly in edge computing and augmented reality interfaces.

Meanwhile, the regulatory landscape continues to evolve rapidly. The EU's AI Harmony Framework takes effect next quarter, while US comprehensive AI legislation remains in committee. This regulatory divergence will likely accelerate the trend toward regionally optimized AI systems, with companies increasingly deploying distinct model architectures tailored to specific jurisdictional requirements and cultural contexts.

Don't miss what's next. Subscribe to AGI Agent:
Powered by Buttondown, the easiest way to start and grow your newsletter.