AGI Agent


LLM Daily: May 22, 2025

🔍 LLM DAILY

Your Daily Briefing on Large Language Models

May 22, 2025

HIGHLIGHTS

• LM Arena, the AI benchmarking organization behind popular leaderboards, has secured a massive $100M seed funding at a $600M valuation, signaling increased investment in standardized AI evaluation infrastructure.

• Mistral AI and All Hands AI have released Devstral-Small-2505, a specialized agentic LLM designed specifically for software engineering tasks and optimized to work with OpenHands development workflows.

• Google has entered the diffusion model space with Gemini Diffusion, an experimental model that generates text through diffusion rather than token-by-token autoregressive decoding, marking a notable departure from its existing LLM lineup.

• The "Mixture-of-Thought" framework introduced by University of Pittsburgh and Carnegie Mellon researchers enables models to seamlessly integrate natural language, code, and symbolic logic, significantly improving performance on complex logical reasoning tasks.

• Dify, an open-source LLM app development platform offering AI workflow management and RAG pipelines, is gaining tremendous momentum with over 98,000 GitHub stars as teams seek to build AI applications without extensive engineering resources.


BUSINESS

Funding & Investment

LM Arena secures $100M seed funding at $600M valuation (2025-05-21) AI benchmarking organization LM Arena has raised $100 million in a seed round led by Andreessen Horowitz (a16z) and UC Investments. The company, known for its crowdsourced benchmarking projects that major AI labs use to test and market their models, is now valued at $600 million. TechCrunch

Google commits $150M to smart glasses partnership with Warby Parker (2025-05-20) Google announced a $150 million commitment to jointly develop AI-powered smart glasses with Warby Parker based on Android XR. Google has already allocated $75 million toward product development and commercialization costs. The announcement came during Google I/O 2025. TechCrunch

M&A

OpenAI acquires Jony Ive's io for $6.5 billion (2025-05-21) OpenAI is acquiring io, the device startup led by former Apple designer Jony Ive and OpenAI CEO Sam Altman, in an all-equity deal valued at $6.5 billion. As part of the acquisition, Ive and his design firm LoveFrom will lead design efforts at OpenAI. The startup had been in development for two years before this unusual acquisition. TechCrunch

Company Updates

Google unveils Gemini 2.5 with Deep Think capabilities (2025-05-20) At Google I/O 2025, the company revealed Gemini 2.5 featuring Deep Think technology, AI Mode in Search, and Veo 3 for video generation with audio. Google also introduced a premium $249 Ultra plan targeting power users and enterprises, positioning itself ahead of competitors with these advanced AI capabilities. VentureBeat

Sergey Brin makes surprise Google I/O appearance, declares AGI ambitions (2025-05-21) Google co-founder Sergey Brin made an unexpected appearance at Google I/O, stating "Gemini will be the very first AGI." His comments revealed philosophical tensions with DeepMind CEO Demis Hassabis, who advocates for scientific caution in AI development. During the event, Brin also acknowledged making "a lot of mistakes with Google Glass." VentureBeat | TechCrunch

OpenAI enhances Responses API with new enterprise features (2025-05-21) OpenAI has rapidly updated its Responses API with Model Context Protocol (MCP) support, native image generation through GPT-4o, and additional enterprise features. The update includes support for remote MCP servers, integration of image generation and Code Interpreter tools, and improvements to file search capabilities. VentureBeat
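For readers wiring this up, a minimal sketch of calling the updated Responses API with a remote MCP server is shown below; the server URL, label, and model name are placeholders rather than real endpoints, and the tool schema follows OpenAI's announcement at the time of writing.

```python
# Hedged sketch: Responses API call that attaches a remote MCP server as a tool.
# The server_url/server_label values are placeholders, not real endpoints.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-4o",
    input="Summarize the open issues in our tracker.",
    tools=[
        {
            "type": "mcp",                            # remote MCP server tool
            "server_label": "issue_tracker",          # placeholder label
            "server_url": "https://example.com/mcp",  # placeholder URL
            "require_approval": "never",              # skip per-call approval
        }
    ],
)

print(response.output_text)
```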

Meta launches program for startups using Llama AI models (2025-05-21) Meta has introduced a new program designed to encourage startups to adopt its Llama AI models. The initiative aims to expand the ecosystem of companies building on Meta's open-source AI technology. TechCrunch

Mistral AI releases Devstral open-source model for developers (2025-05-21) Mistral AI has launched Devstral, a powerful new open-source software engineering agent model that can run on laptops. The model is designed to make AI development more accessible by requiring less computational resources. VentureBeat

Google introduces Gemma 3n for on-device AI (2025-05-20) Google has expanded its "open" AI model family with Gemma 3n, designed to run efficiently on phones, laptops, and tablets. Available in preview, the model can handle multiple modalities including audio, text, images, and videos while maintaining performance on consumer devices. TechCrunch

Market Analysis

Klarna uses AI avatar of CEO for earnings call (2025-05-21) Fintech company Klarna utilized an AI avatar of CEO Sebastian Siemiatkowski to deliver its earnings report. The company disclosed the use of AI, though observers noted there were only subtle signs that the presentation wasn't delivered by the real CEO. This represents a novel application of AI in corporate communications. TechCrunch

Raindrop launches AI-native observability platform (2025-05-19) Raindrop has unveiled an AI-native observability platform designed to monitor AI application performance. The company's rebranding and product expansion reflect its belief that the next generation of software observability will be AI-first by design, helping developers identify when AI applications go "off-script" or create negative user experiences. VentureBeat


PRODUCTS

New Releases

Mistral AI & All Hands AI: Devstral-Small-2505

Released: (2025-05-21) Link: Hugging Face Repository

Mistral AI and All Hands AI have collaborated to release Devstral, a specialized agentic LLM designed specifically for software engineering tasks. Unlike Mistral's more general-purpose coding model Codestral, Devstral is trained to work with OpenHands, making it more specialized for certain development workflows. Community members have already begun creating GGUF versions for local deployment and compatibility with various frameworks.

Google DeepMind: Gemini Diffusion

Released: (2025-05-22) Link: Official Announcement

Google has entered the diffusion model space with Gemini Diffusion, an experimental model that generates text by iteratively refining noisy drafts rather than predicting tokens one at a time. While full details on its reasoning capabilities and how it compares with Google's autoregressive models are still emerging, the release signals continued investment in diversifying the company's AI model architectures.

Flux.dev: GrainScape UltraReal LoRA

Released: (2025-05-22) Link: Reddit Announcement

An updated version of the GrainScape UltraReal LoRA for the Flux.1 [dev] model has been released. This new version was trained on a new dataset built from scratch to enhance both image fidelity and personality. Key improvements include reduced vertical banding on flat textures, enhanced grain structure, and boosted color depth for more vivid outputs. The LoRA maintains strong performance for black-and-white generations while addressing previous "same face" issues in portraits.
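For anyone experimenting with the release, the sketch below shows one way to attach a Flux LoRA in diffusers; the LoRA repository id and weight filename are placeholders, not the actual GrainScape UltraReal artifacts.

```python
# Hedged sketch: applying a community LoRA to the Flux.1 [dev] pipeline with diffusers.
# The LoRA repo id and weight_name are placeholders -- substitute the real files.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

pipe.load_lora_weights(
    "your-namespace/grainscape-ultrareal",  # placeholder repository
    weight_name="lora.safetensors",         # placeholder filename
)

image = pipe(
    "35mm film photo of a rainy street at night, heavy natural grain",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("grainscape_test.png")
```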


TECHNOLOGY

Open Source Projects

Dify - LLM App Development Platform

Dify is gaining significant momentum (98,437 stars, +357 today) as an open-source platform for building LLM applications. It offers an intuitive interface that combines AI workflow management, RAG pipelines, agent capabilities, and observability features. Recent updates focus on improving OpenSearch configuration and fixing knowledge retrieval issues, making it increasingly production-ready for teams looking to build AI applications without extensive engineering resources.
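Apps built in Dify can also be consumed programmatically; the minimal sketch below assumes Dify's published chat-messages endpoint and an app-level API key, with the base URL pointing at your own deployment.

```python
# Hedged sketch: querying a Dify chat app over HTTP.
# Assumes a self-hosted instance and an app-level API key; endpoint and
# payload fields follow Dify's public API docs at the time of writing.
import os
import requests

base_url = os.environ.get("DIFY_BASE_URL", "http://localhost/v1")
api_key = os.environ["DIFY_API_KEY"]  # app key from the Dify console

resp = requests.post(
    f"{base_url}/chat-messages",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "inputs": {},                 # app-defined variables, if any
        "query": "What does our refund policy say?",
        "response_mode": "blocking",  # wait for the full answer
        "user": "llm-daily-demo",     # arbitrary end-user identifier
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["answer"])
```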

ComfyUI - Powerful Modular Diffusion Interface

ComfyUI (77,426 stars, +109 today) continues to dominate as the most flexible visual interface for diffusion models. Built around a node-based workflow, it allows for intricate customization of image generation pipelines. Recent commits show ongoing performance improvements for handling large prompt queues and significant code optimization in the server component, enhancing stability for complex workflows.

LangChain - Context-Aware Reasoning Framework

LangChain (107,902 stars) remains one of the most established frameworks for building context-aware AI applications. Recent updates include documentation improvements for Exa integration and fixes for OpenAI's strict schema implementations, reflecting the project's ongoing commitment to maintaining compatibility with evolving LLM APIs.
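As a rough illustration of the strict-schema path mentioned above, the sketch below binds a Pydantic model to ChatOpenAI via with_structured_output; the model name and schema are illustrative only.

```python
# Hedged sketch: structured output with LangChain + OpenAI's strict JSON schema mode.
# Model name and schema are illustrative, not taken from the release notes.
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class PaperSummary(BaseModel):
    title: str = Field(description="Paper title")
    takeaway: str = Field(description="One-sentence summary")

llm = ChatOpenAI(model="gpt-4o-mini")
structured_llm = llm.with_structured_output(
    PaperSummary, method="json_schema", strict=True
)

result = structured_llm.invoke(
    "Summarize: Learning to Reason via Mixture-of-Thought for Logical Reasoning"
)
print(result.takeaway)
```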

Models & Datasets

Mistral's Devstral-Small-2505

Mistral AI has released a new small model specifically optimized for developers, supporting an impressive 17 languages including English, French, German, Spanish, Japanese, Korean, Russian, Chinese, and more. The model is vLLM-compatible and distributed under the Apache 2.0 license, making it accessible for commercial applications.
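Since the model is vLLM-compatible, a minimal local-inference sketch looks roughly like the following; the repository id is assumed to be the one on Mistral's Hugging Face page, and Mistral-format checkpoints may need extra loader flags noted on the model card.

```python
# Hedged sketch: running Devstral locally with vLLM.
# The repo id is an assumption; the model card may require additional
# Mistral-specific loader/tokenizer flags not shown here.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Devstral-Small-2505")
params = SamplingParams(temperature=0.2, max_tokens=512)

prompt = "Write a Python function that parses a .env file into a dict."
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```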

BAGEL-7B-MoT

ByteDance has released BAGEL-7B-MoT, an any-to-any multimodal conversion model built on Qwen2.5-7B-Instruct. The model, described in arXiv paper 2505.14683, enables flexible conversions between different modalities, expanding the toolkit for developers working on multimodal applications.

Wan2.1-VACE-14B

This video generation model from Wan-AI supports multiple input modes: video-to-video editing, reference-to-video, and image-to-video generation. With impressive download numbers (12,886) and strong community support (262 likes), it implements techniques from arXiv papers 2503.20314 and 2503.07598 and is available under the Apache 2.0 license.

Ultra-FineWeb Dataset

OpenBMB has released an enormous pretraining dataset (>1T tokens) for language models, supporting both English and Chinese. With nearly 12,000 downloads, this dataset, described in arXiv papers 2505.05427 and 2412.04315, offers a substantial resource for researchers training foundation models.
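Given its size, streaming is the practical way to sample the corpus; the sketch below assumes the dataset id, an "en" configuration, and a "content" text field, all of which should be confirmed on the dataset card.

```python
# Hedged sketch: streaming a few Ultra-FineWeb examples without downloading
# the full >1T-token corpus. Repo id, config name, and field name are assumptions.
from datasets import load_dataset

ds = load_dataset("openbmb/Ultra-FineWeb", "en", split="train", streaming=True)

for i, example in enumerate(ds):
    print(example["content"][:200])  # field name may differ; check the card
    if i == 2:
        break
```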

EuroSpeech Dataset

A comprehensive multilingual speech dataset covering 24 European languages, EuroSpeech supports both automatic speech recognition and text-to-speech tasks. Its broad language coverage (including German, English, French, Italian, and many others) makes it particularly valuable for developing inclusive speech technologies across Europe.

Developer Tools & Interfaces

Step1X-3D

A popular Gradio interface (162 likes) for 3D content generation from StepFun AI, offering tools for creating three-dimensional assets from simple prompts and lowering the barrier for users without 3D expertise.

Kolors Virtual Try-On

This extremely popular Gradio space (8,806 likes) from Kwai-Kolors provides virtual clothing try-on, letting users visualize how garments would look when worn without a physical fitting and demonstrating a practical retail application of generative AI.

SmolVLM Realtime WebGPU

The WebML community has created a demonstration of running visual language models directly in the browser using WebGPU, showcasing the potential for client-side AI without server dependencies. This technical achievement (106 likes) points to expanding possibilities for edge computing with multimodal models.

AI Comic Factory

With over 10,000 likes, this Docker-based space provides tools for generating complete comic strips using AI, demonstrating the creative potential of generative models for visual storytelling and content creation.


RESEARCH

Paper of the Day

Learning to Reason via Mixture-of-Thought for Logical Reasoning (2025-05-21)

Tong Zheng, Lichang Chen, Simeng Han, R. Thomas McCoy, Heng Huang
University of Pittsburgh, Carnegie Mellon University

This paper stands out for introducing a novel multi-modal reasoning framework that mirrors how humans naturally utilize different representational formats to solve logical problems. Unlike traditional approaches that rely on a single reasoning modality, the proposed Mixture-of-Thought (MoT) paradigm allows models to seamlessly integrate natural language, code, and symbolic logic during both training and inference. The authors demonstrate that this approach significantly outperforms single-modality methods on complex logical reasoning tasks, suggesting a promising direction for enhancing LLMs' reasoning capabilities through more human-like cognitive processes.
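A schematic way to picture inference under such a scheme (not the authors' implementation) is to sample one solution per reasoning modality and aggregate by majority vote, as sketched below; ask_model is a hypothetical helper that prompts the model in a modality-specific format.

```python
# Schematic sketch of modality-level voting in the spirit of Mixture-of-Thought.
# `ask_model` is a hypothetical helper, not part of the paper's released code.
from collections import Counter

MODALITIES = ["natural_language", "code", "symbolic_logic"]

def ask_model(problem: str, modality: str) -> str:
    """Hypothetical call that returns the model's final answer when prompted
    to reason in the given representational modality."""
    raise NotImplementedError

def mixture_of_thought_answer(problem: str) -> str:
    answers = [ask_model(problem, m) for m in MODALITIES]
    # Majority vote across modalities; ties fall back to the first answer.
    best, count = Counter(answers).most_common(1)[0]
    return best if count > 1 else answers[0]
```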

Notable Research

DEBATE, TRAIN, EVOLVE: Self Evolution of Language Model Reasoning (2025-05-21)
Gaurav Srivastava, Zhenyu Bi, Meng Lu, Xuan Wang
This paper introduces a ground truth-free training framework that enables language models to autonomously enhance their reasoning abilities through multi-agent debates, demonstrating significant improvements on complex reasoning tasks without requiring additional supervised data.

Subquadratic Algorithms and Hardness for Attention with Any Temperature (2025-05-20)
Shreya Gupta, Boyang Huang, Barna Saha, Yinzhan Xu, Christopher Ye
The researchers provide theoretical breakthroughs on the computational complexity of attention mechanisms, establishing precise conditions under which subquadratic attention computation is possible, with implications for scaling transformer models to longer context lengths.

Programmatic Video Prediction Using Large Language Models (2025-05-20)
Hao Tang, Kevin Ellis, Suhas Lohit, Michael J. Jones, Moitreya Chatterjee
This innovative approach leverages LLMs to generate programmatic representations of video dynamics, enabling more interpretable and controllable video prediction that outperforms traditional neural network methods on various benchmarks.

Alignment Under Pressure: The Case for Informed Adversaries When Evaluating LLM Defenses (2025-05-21)
Xiaoxue Yang, Bozhidar Stevanoski, Matthieu Meeus, Yves-Alexandre de Montjoye
The authors challenge current evaluation practices for LLM alignment defenses, demonstrating that more informed adversarial approaches can effectively bypass defenses that report near-zero attack success rates against standard jailbreaking methods.

Research Trends

Recent research is showing a distinct shift toward more sophisticated reasoning paradigms for LLMs, with multiple papers exploring multi-modal reasoning, self-improvement through debate, and programmatic representations of knowledge. There's also increasing attention to the theoretical foundations of transformer architectures, particularly regarding computational efficiency at scale. Additionally, alignment research is maturing with more rigorous adversarial evaluation methods, suggesting the field is moving beyond simplistic safety metrics toward more robust evaluation frameworks. These trends collectively point to a growing emphasis on creating models that reason more like humans while maintaining computational efficiency and safety.


LOOKING AHEAD

As Q2 2025 comes to a close, the integration of multimodal capabilities into everyday LLM applications has become standard rather than exceptional. Looking to Q3 and beyond, we anticipate the first commercial deployment of truly context-aware models that maintain persistent memory across sessions without explicit prompting. The regulatory landscape continues to evolve, with the EU's AI Act implementation phase entering its final stages and similar frameworks developing in Asia-Pacific markets.

The research community's focus is increasingly shifting toward computational efficiency rather than raw parameter count. This "efficiency revolution" is likely to produce models in late 2025 that match today's top performers with just 30% of the computational footprint – a crucial development as energy consumption concerns mount among enterprise AI adopters.
