AGI Agent

July 20, 2025

LLM Daily: July 20, 2025

🔍 LLM DAILY

Your Daily Briefing on Large Language Models


HIGHLIGHTS

• Blaxel has raised $7.3M to build specialized cloud infrastructure for AI agents, positioning itself as an "AWS for AI agents" after already processing billions of agent requests in the emerging autonomous AI systems market.

• Researchers have developed NeuralOS, an experimental operating system that generates every screen frame with neural networks instead of a traditional software stack. It runs at 1.8 fps on an NVIDIA H100 GPU and serves as a proof of concept for neural networks as a computing architecture.

• DeepMind researchers have established a theoretical connection between supervised fine-tuning (SFT) and reinforcement learning (RL), showing that SFT on curated data maximizes a lower bound on the RL objective in a sparse-reward setting, with implications for improving model alignment techniques.

• Dify, an open-source platform for developing agentic workflows (107,656 GitHub stars), has added workflow file upload capabilities enabling podcast summarization similar to Google's NotebookLM, making AI application development more accessible.


BUSINESS

Funding & Investment

  • Blaxel Raises $7.3M in Seed Funding: The company is building specialized cloud infrastructure for AI agents, positioned as an "AWS for AI agents." According to VentureBeat, Blaxel has already processed billions of agent requests and aims to challenge AWS with its purpose-built platform for autonomous AI systems. (2025-07-17)
  • Greptile Closing $30M Series A: The Y Combinator alum, which builds AI code review tools, is reportedly in talks with Benchmark to lead its Series A at a $180 million valuation, according to sources cited by TechCrunch. (2025-07-18)
  • Confident Security Emerges from Stealth with $4.2M: The San Francisco-based startup, positioning itself as "the Signal for AI," has developed a tool that wraps around AI models to ensure data privacy. The company just announced its funding and public launch. (2025-07-17)

M&A Activity

  • Windsurf Being Acquired by Cognition: The AI coding startup's CEO, Jeff Wang, described on X the "very bleak" mood at the company before the acquisition deal was reached. His account offers rare insight into the challenges AI startups face in the run-up to an acquisition. (2025-07-19)
  • Cursor Acquires Koala: Cursor maker Anysphere is acquiring enterprise startup Koala as part of its strategy to compete with Microsoft's GitHub Copilot in the AI coding tools space. The acquisition represents consolidation in the competitive AI development tools market. (2025-07-18)
  • ServiceNow-Moveworks Deal Under Antitrust Review: The acquisition of Moveworks by ServiceNow, announced in March, is reportedly being reviewed for antitrust concerns. According to sources familiar with the matter, the probe began in June. (2025-07-18)

Company Updates

  • OpenAI Launches ChatGPT Agent: OpenAI has unveiled its most capable AI agent to date, giving ChatGPT its own computer to autonomously use email, web apps, and create files. The agent can work with login-protected websites through a secure browser view. (2025-07-17)
  • OpenAI Strengthens ChatGPT Agent Security: OpenAI's red team ran 110 coordinated attacks to build a defense system for ChatGPT Agent that reportedly achieves 95% effectiveness. The testing covered risks including biological-threat misuse and data exfiltration attacks. (2025-07-18)
  • Mistral AI Enhances Le Chat Platform: Mistral has added deep research capabilities and voice mode to its Le Chat platform, bringing it into direct competition with ChatGPT and Gemini. The update includes native multilingual reasoning and advanced image editing features. (2025-07-17)
  • Anthropic Tightens Claude Code Usage Limits: Users of Claude Code have reported unexpectedly restrictive usage limits, particularly affecting heavy users on the $200-a-month Max plan. The changes were implemented without prior notification to users. (2025-07-17)

Market Analysis

  • Google Takes Top Spot in Embedding Model Leaderboard: Google's new Gemini Embedding model now leads the MTEB benchmark, though it faces strong competition from both closed and open-source alternatives. Alibaba's open-source model is closing the performance gap. (2025-07-19)
  • AnyCoder Launches K2-Powered Web App Development Tool: A new tool powered by Kimi K2 enables fast prototyping and deployment of web applications, targeting both novice developers and experts looking to quickly spin up new projects. (2025-07-18)
  • Y Combinator Startup Pivots Away from Windows AI Agents: Pig.dev, which was building agent technology for controlling Windows desktops, has abandoned that approach and pivoted. The case highlights the challenges facing startups in the AI agent space. (2025-07-18)

PRODUCTS

NeuralOS: A Generative OS Powered by Neural Networks

Company: Research Project (yuntiandeng) | Date: (2025-07-19)

NeuralOS is an experimental operating system that generates every screen frame entirely from mouse and keyboard inputs using neural networks. Running at 1.8 fps on an NVIDIA H100 GPU, the system uses an RNN to track computer state and a diffusion model to generate screen pixels. It functions without a traditional software stack, creating a fully hallucinated computing environment. While impractical for everyday use because of its resource requirements, it explores neural networks as a fundamental computing architecture.
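The two-stage loop described above (a recurrent model that tracks state, followed by a generator that draws the frame) can be illustrated with a toy sketch. This is not the NeuralOS code: the class name, dimensions, and input encoding are hypothetical, the weights are random, and the renderer is a simple linear map with a sigmoid standing in for the diffusion model. It only shows how each frame depends on the accumulated state rather than on a conventional software stack.

```python
import math
import random

# Hypothetical toy sizes; the real system's dimensions are not given in the text.
STATE_DIM = 8
INPUT_DIM = 3            # assumed encoding: (mouse_x, mouse_y, key_pressed)
FRAME_W, FRAME_H = 4, 3


def make_matrix(rows, cols, rng):
    """Build a small random weight matrix as a list of rows."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToyNeuralOS:
    """Recurrent state tracker plus a stand-in frame generator."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_state = make_matrix(STATE_DIM, STATE_DIM, rng)
        self.w_input = make_matrix(STATE_DIM, INPUT_DIM, rng)
        self.w_frame = make_matrix(FRAME_W * FRAME_H, STATE_DIM, rng)
        self.state = [0.0] * STATE_DIM

    def step(self, user_input):
        # Recurrent update: the new state depends on the old state and the input.
        pre = [a + b for a, b in zip(matvec(self.w_state, self.state),
                                     matvec(self.w_input, user_input))]
        self.state = [math.tanh(x) for x in pre]
        return self.render()

    def render(self):
        # Stand-in for the diffusion renderer (NOT a diffusion model):
        # a linear map plus sigmoid gives pixel intensities in [0, 1].
        flat = [1.0 / (1.0 + math.exp(-x))
                for x in matvec(self.w_frame, self.state)]
        return [flat[r * FRAME_W:(r + 1) * FRAME_W] for r in range(FRAME_H)]


os_sim = ToyNeuralOS(seed=0)
frame_idle = os_sim.step([0.0, 0.0, 0.0])    # no input yet
frame_click = os_sim.step([0.5, 0.5, 1.0])   # pointer at center, key pressed

# At the reported 1.8 fps, each frame takes roughly 0.56 seconds to generate.
seconds_per_frame = 1 / 1.8
```

The frame rate arithmetic at the end puts the reported speed in perspective: about half a second per frame on an H100, which is why the source calls the system impractical for everyday use.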

Kontext LoRA for Google Earth to Drone Photography Conversion

Company: Independent Developer (Alternative_Lab_4441) | Date: (2025-07-19)

A developer has released a specialized LoRA for the FLUX.1 Kontext image-editing model that transforms Google Earth screenshots into realistic drone photography. Designed primarily for architectural visualization, the tool lets designers and urban planners quickly generate photorealistic aerial views of a location without flying a drone. The model is available as a free download along with a workflow guide. Community response has been enthusiastic, particularly among people working in design and visualization.

Flux Depth for Dungeon Styling

Company: Independent Developer (darabos) | Date: (2025-07-19)

A new application of the Flux Depth model has been shared for styling fantasy dungeons for tabletop gaming and visual media. This specialized implementation helps creators generate consistent, atmospherically appropriate dungeon environments with realistic lighting and depth. The tool appears particularly valuable for game masters, indie game developers, and fantasy content creators looking to quickly produce high-quality dungeon visualizations.


TECHNOLOGY

Open Source Projects

langchain-ai/langchain - 111,810 ⭐

A popular framework for building context-aware, LLM-powered reasoning applications. Recent activity includes fixes to the HuggingFace integration, documentation improvements, and updates to testing components, showing continued active maintenance of this mature project.

langgenius/dify - 107,656 ⭐

A production-ready platform for developing agentic workflows that simplifies AI application development. Dify now features workflow file upload capabilities, enabling podcast summarization similar to Google's NotebookLM. Recent commits focus on UI improvements and documentation enhancements.

rasbt/LLMs-from-scratch - 59,330 ⭐

A comprehensive educational repository for implementing ChatGPT-like LLMs in PyTorch from scratch. The project is the official code repository for Sebastian Raschka's book and features step-by-step implementation of GPT-like models, with recent updates improving memory optimization and fixing semantic errors.

Models & Datasets

New LLMs

  • moonshotai/Kimi-K2-Instruct - A new instruction-tuned model from Moonshot AI, garnering significant attention with 1,523 likes and over 125K downloads.
  • mistralai/Voxtral-Mini-3B-2507 and mistralai/Voxtral-Small-24B-2507 - Mistral AI's new audio-text-to-text models designed for multilingual audio processing, available in both 3B and 24B parameter sizes. The 24B version is fine-tuned from Mistral-Small-24B-Base-2501.
  • HuggingFaceTB/SmolLM3-3B - A compact but capable multilingual model with 538 likes and over 212K downloads, supporting English, French, Spanish, Italian, Portuguese, Chinese, Arabic, and Russian.
  • LGAI-EXAONE/EXAONE-4.0-32B - LG AI's latest 32B parameter model with particular strength in Korean and Spanish languages alongside English, referenced in a recent arXiv paper (2507.11407).

Datasets

  • NousResearch/Hermes-3-Dataset - A training dataset likely used for the Hermes-3 model series, containing between 100K and 1M examples with Apache 2.0 licensing.
  • microsoft/rStar-Coder - Microsoft's dataset for code generation containing 1-10M samples, linked to a recent arXiv paper (2505.21297) and compatible with multiple data processing libraries.
  • HuggingFaceTB/smoltalk2 - A large-scale conversation dataset (1-10M samples) likely used to train the SmolLM models, referenced in multiple arXiv papers.
  • LGAI-EXAONE/KMMLU-Pro - A Korean language evaluation benchmark dataset in the style of MMLU, containing 1K-10K samples for testing LLM capabilities in Korean.
  • microsoft/NextCoderDataset - Microsoft's code generation dataset with MIT licensing and 100K-1M examples specifically designed for text-to-code applications.

Developer Tools & Demos

  • llamameta/Grok-4-heavy-free - A free hosted demo of Grok-4, allowing users to test this advanced model without requiring their own compute resources.
  • FunAudioLLM/ThinkSound - A Gradio-based demo showcasing audio processing capabilities, demonstrating the growing interest in multimodal applications for LLMs.
  • Miragic-AI/Miragic-Virtual-Try-On - A virtual clothing try-on application using AI, with a companion Miragic-Speed-Painting application for AI-assisted digital art creation.
  • open-llm-leaderboard/open_llm_leaderboard - The highly popular (13,319 likes) benchmark leaderboard for open LLMs, featuring automatic submissions and public evaluations on code, math, and general English language tasks.
  • Kwai-Kolors/Kolors-Virtual-Try-On - An extremely popular virtual try-on demo with 9,346 likes, demonstrating the practical application of AI in fashion e-commerce.

RESEARCH

Paper of the Day

Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved) (2025-07-17)

Authors: Chongli Qin, Jost Tobias Springenberg

Institution: DeepMind

This paper stands out for its theoretical contribution connecting two fundamental paradigms in LLM training: supervised fine-tuning (SFT) and reinforcement learning (RL). The authors provide a rigorous mathematical formulation showing that SFT on curated data can be understood as maximizing a lower bound on the RL objective in a sparse reward setting.

Building on this view, the authors argue that SFT can be substantially improved and introduce practical methods for doing so that draw on RL theory. The result shows that two seemingly distinct approaches to LLM alignment are closely connected, with implications for how model alignment and optimization strategies are designed.
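The lower-bound claim can be sketched with a standard importance-sampling argument. The derivation below is a reconstruction under stated assumptions and not necessarily the paper's own notation or proof: \pi_\theta is the model being trained, \pi_{\mathrm{ref}} is the policy that generated the data (assumed to put nonzero probability on every trajectory \pi_\theta can produce), and R(\tau) \in \{0, 1\} is a sparse reward that equals 1 exactly for the trajectories kept by curation.

```latex
% Assumed notation (not taken from the paper): \pi_\theta is the trained model,
% \pi_{\mathrm{ref}} the data-generating policy, R(\tau) \in \{0,1\} the sparse reward.
\begin{align*}
J(\theta)
  &= \mathbb{E}_{\tau \sim \pi_\theta}\big[ R(\tau) \big]
   = \mathbb{E}_{\tau \sim \pi_{\mathrm{ref}}}\!\left[
       \frac{\pi_\theta(\tau)}{\pi_{\mathrm{ref}}(\tau)}\, R(\tau) \right] \\
  &\ge \mathbb{E}_{\tau \sim \pi_{\mathrm{ref}}}\!\left[
       R(\tau) \left( 1 + \log \frac{\pi_\theta(\tau)}{\pi_{\mathrm{ref}}(\tau)} \right) \right]
   \qquad \text{(since } x \ge 1 + \log x \text{ for } x > 0 \text{ and } R \ge 0\text{)} \\
  &= \mathbb{E}_{\tau \sim \pi_{\mathrm{ref}}}\big[ R(\tau) \log \pi_\theta(\tau) \big] + C,
  \qquad
  C = \mathbb{E}_{\tau \sim \pi_{\mathrm{ref}}}\big[ R(\tau)\,\big(1 - \log \pi_{\mathrm{ref}}(\tau)\big) \big].
\end{align*}
```

Because C does not depend on \theta and R is binary, the remaining term equals the probability of a trajectory passing curation multiplied by the average log-likelihood of the curated trajectories under \pi_\theta. Maximizing it is therefore the usual SFT objective on the filtered data, which is the sense in which SFT maximizes a lower bound on the RL objective.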

Notable Research

Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)

Authors: Hao Sun, Mihaela van der Schaar

This comprehensive review bridges the gap between inverse reinforcement learning and LLM alignment, providing a systematic analysis of recent advances at this intersection and highlighting future research opportunities for enhancing LLM capabilities.

Insights into a radiology-specialised multimodal large language model with sparse autoencoders (2025-07-17)

Authors: Kenza Bouzid, Shruthi Bannur, et al.

The authors apply Matryoshka-SAE to analyze the MAIRA-2 radiology model, extracting interpretable features that reveal how the model processes medical images and generates reports, advancing mechanistic interpretability in healthcare AI.

Black Box Deployed -- Functional Criteria for Artificial Moral Agents in the LLM Era (2025-07-17)

Authors: Matthew E. Brophy

This philosophical paper argues that traditional ethical frameworks for artificial moral agents are obsolete for evaluating LLMs due to their opacity, proposing new functional criteria focused on observable outcomes rather than transparent internal processes.

VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding (2025-07-17)

Authors: Shihao Wang, Guo Chen, et al.

The research introduces a novel approach to video understanding that enables more precise temporal localization within videos, improving Video-LLMs' ability to identify and process relevant content in complex, lengthy video sequences.


LOOKING AHEAD

As we move deeper into Q3 2025, the convergence of multimodal LLMs with physical robotics is emerging as a defining trend of the coming months. Recent demonstrations of LLM-powered autonomous systems that reason while navigating real-world environments suggest the first commercial applications could arrive by Q1 2026. Meanwhile, the regulatory landscape continues to evolve: the EU AI Act's obligations for general-purpose AI models begin to apply on August 2, 2025, and similar frameworks are expected from APAC countries before year-end.

Perhaps most significant is the brewing paradigm shift toward truly distributed AI computation. The limitations of centralized data centers are becoming apparent, and we expect major announcements on edge-optimized foundation models before 2026, potentially revolutionizing how AI processing is allocated between cloud and local devices.
