June 19, 2025, 9:17 p.m.

06-18-2025

TLDR of AI news

Model and Dataset Releases

  • Essential-Web 24T Token Dataset: Essential AI has released Essential-Web v1.0, a 24-trillion-token pre-training dataset. It features rich metadata and document-level labels across a 12-category taxonomy to aid in data curation for creating high-performing models. Models trained on it show improved performance in areas like web code and STEM.

  • Llama 4 Models: Meta AI, in partnership with DeepLearning.AI, launched a new course covering Llama 4. The release includes new models such as Maverick, a 400B-parameter Mixture-of-Experts (MoE) model with a 1M-token context window, and Scout, a 109B-parameter MoE model with a 10M-token context window. The course also covers new tools for prompt optimization and synthetic data generation.

  • MiniMax Open Models: MiniMax is open-sourcing MiniMax-M1, a new LLM with a 1M token context window specializing in long-context reasoning. The company also introduced Hailuo 02, a video model focused on high quality and cost efficiency.

  • Midjourney V1 Video Model: Midjourney has launched its V1 video model, enabling users to animate their generated images.

  • Arcee Foundation Models (AFM): Arcee has released its AFM family of models, beginning with AFM-4.5B. This foundation model is designed specifically for enterprise applications.

  • KREA AI Public Beta: Krea 1 is now available in a public beta, aiming to provide users with better aesthetic control and overall image quality in generations.

  • OpenAI ChatGPT "Record Mode": A new "Record mode" feature is being rolled out for ChatGPT Pro, Enterprise, and Edu subscribers using the macOS desktop application.

Research and Technical Developments

  • Emergent Misalignment in Models: OpenAI research demonstrated that training a model like GPT-4o on insecure code can lead to broad, unintended misaligned behaviors. A specific internal activation pattern was identified as the cause, which can be directly manipulated to make a model more or less aligned, suggesting a path toward an early warning system for misalignment.

  • Continuous vs. Discrete Reasoning: A recent paper shows that reasoning in a continuous embedding space is theoretically more powerful than reasoning in discrete token space.

  • Autoregressive U-Nets for Language: A new model architecture, the Autoregressive U-Net, processes raw bytes directly and incorporates tokenization within the model. This avoids predefined vocabularies by pooling bytes into words and word-grams, improving performance on character-level tasks and in low-resource languages.
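
The pooling idea can be sketched in a few lines: group a raw UTF-8 byte stream into word-sized chunks at whitespace boundaries, with no predefined vocabulary. This is a toy illustration of the concept, not the paper's actual pooling scheme:

```python
def pool_bytes_into_words(text: str) -> list[bytes]:
    """Split a UTF-8 byte stream into word-sized pools.

    Spaces attach to the following word, one common convention in
    byte-level models; no vocabulary or tokenizer is involved.
    """
    data = text.encode("utf-8")
    pools, current = [], bytearray()
    for b in data:
        # Start a new pool at each space, unless we're at the very beginning.
        if b == ord(" ") and current:
            pools.append(bytes(current))
            current = bytearray()
        current.append(b)
    if current:
        pools.append(bytes(current))
    return pools

print(pool_bytes_into_words("hello wide world"))
# → [b'hello', b' wide', b' world']
```

A second pooling stage over these word pools would yield the word-gram level the summary mentions.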

  • Robotics and Tactile Sensing: A new 3D-printable tactile sensor, e-Flesh, has been developed to democratize touch sensing in robotics by measuring deformations in 3D-printable objects.

  • Challenges in Visual Reasoning: A visual geometry problem posted online proved difficult for numerous multimodal models. Models including Mistral Small 3.1, Gemma 3 27B, Qwen2.5-VL, Claude Sonnet 4, and GPT-4o consistently failed to solve it.

  • Human Trust in AI Voice: A paper found that people trust AI-generated output more when delivered via voice (74% trust) compared to text (64% trust), partly due to the difficulty in distinguishing between human and AI-generated voices.

Developer Tools, Frameworks, and Infrastructure

  • Model Context Protocol (MCP) Ecosystem: Block's engineering team has described how it builds MCP servers that connect internal systems to assistants like Claude. MCP client/host options are being explored for corporate environments with tool restrictions. Arize AI has also launched a Text-to-GraphQL MCP server that teaches agents to traverse large GraphQL schemas directly.
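
For context, MCP is a JSON-RPC 2.0 protocol in which a server advertises tools that a client (such as Claude) can call. A minimal sketch of how a server might answer a tools/list request, where the get_invoice tool is a made-up example and not from Block or Arize:

```python
import json

# Hypothetical tool definition for an internal system (illustrative only).
TOOLS = [
    {
        "name": "get_invoice",
        "description": "Fetch an invoice record by ID from an internal system.",
        "inputSchema": {
            "type": "object",
            "properties": {"invoice_id": {"type": "string"}},
            "required": ["invoice_id"],
        },
    }
]

def handle_request(raw: str) -> str:
    """Answer a JSON-RPC 2.0 'tools/list' request the way an MCP server would."""
    req = json.loads(raw)
    if req.get("method") == "tools/list":
        result = {"tools": TOOLS}
    else:
        result = {"error": "method not handled in this sketch"}
    return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "result": result})

print(handle_request('{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}'))
```

In practice an SDK handles the transport and framing; the point is only that a "server" is a process exposing typed tools over this request/response shape.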

  • Agent Development Frameworks: LlamaIndex now officially supports AG-UI via CopilotKit, enabling developers to integrate backend agents into front-end applications with minimal boilerplate. Developers are also discussing combining DSPy and LangGraph for production use in multi-agent research systems.

  • Open Source Coding Tools: The OpenHands CLI has been introduced as an open-source coding agent with high accuracy, local operation, and model choice flexibility, now simplified by removing the Docker requirement. DeepSite v2 was also released for "vibe coding," offering targeted edits, website redesigns, and integration with the DeepSeek-R1 model.

  • Multi-GPU Training and Optimization: Unsloth is actively developing dual GPU support and plans to add support for Gemma 3 models soon. Red Hat AI and Axolotl have integrated with LLM-Compressor to make fine-tuning sparse models more efficient.

  • GPU Agnostic Platform: Modular Platform 25.4 now allows the same code to run on AMD and NVIDIA GPUs without changes, reporting up to a 53% throughput increase on certain workloads. Modular has also open-sourced over 450k lines of Mojo kernel code.

  • Platform-Specific Updates: LM Studio now supports tool calling via its API. Perplexity's CEO proposed an "AI Drive" concept, a self-organizing and searchable drive for user assets to make the product feel more like an OS. OpenAI has updated its GPTs platform, allowing users to manually select the underlying model (e.g., GPT-4o, GPT-4) for a custom GPT.
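
Since LM Studio's local server speaks the OpenAI-compatible chat completions API, a tool-calling request is just a standard OpenAI-style payload. A sketch, where the model name, tool definition, and endpoint are illustrative assumptions rather than anything LM Studio prescribes:

```python
import json

# Request body for LM Studio's OpenAI-compatible endpoint
# (by default http://localhost:1234/v1/chat/completions).
payload = {
    "model": "local-model",  # whichever model is loaded in LM Studio
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

body = json.dumps(payload)
print(body[:60])
```

POSTing this body to the local endpoint would return either a normal assistant message or a tool_calls entry naming get_weather with its arguments, which the client then executes.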

Industry News and Company Strategy

  • Intense Competition for AI Talent: Sam Altman claimed that Meta has been making aggressive offers to recruit top OpenAI researchers, with signing bonuses allegedly reaching $100 million, highlighting the intense competition for leading AI talent.

  • Apple's On-Device AI Strategy: Analysis suggests Apple Intelligence could shift agentic AI onto devices, creating a user-owned runtime. Apple aims to manage the associated security risks through sandboxing and App Store policies.

  • Sakana AI's Niche Focus: Sakana AI is developing specialized AI agents for financial tasks, such as generating loan approval documents, with a goal of achieving extremely high accuracy. The company's strategy contrasts with building general-purpose AI.

  • Amazon Headcount Reduction: An internal memo from Amazon's CEO indicated that the company expects to reduce its headcount over the next few years as a result of efficiency gains.

  • OpenAI Launches Podcast: OpenAI has started its own podcast, as announced by CEO Sam Altman.

  • Vatican Engages on AI Ethics: Pope Leo has made the potential threat of AI to humanity a signature issue. In response, leaders from Google, Microsoft, and Cisco are consulting with the Vatican to influence its stance on AI policy and ethics.

Model Performance, Pricing, and Adoption

  • Google Gemini 2.5 Flash Price Increase: Google has doubled the input price of its Gemini 2.5 Flash model on Vertex AI, from $0.15 to $0.30 per 1 million tokens. Output pricing rose more sharply, from $0.60 to $2.50 per 1 million tokens, replacing the earlier split between "thinking" and "non-thinking" output rates.
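
A back-of-envelope sketch of what the change means for a sample workload, assuming the two quoted figures are per-million-token input and output rates (the workload numbers are made up):

```python
# $ per 1 million tokens, before and after the repricing.
OLD_INPUT, NEW_INPUT = 0.15, 0.30
OLD_OUTPUT, NEW_OUTPUT = 0.60, 2.50

def workload_cost(input_mtok: float, output_mtok: float,
                  in_rate: float, out_rate: float) -> float:
    """Cost in dollars for a workload measured in millions of tokens."""
    return input_mtok * in_rate + output_mtok * out_rate

# Hypothetical monthly workload: 100M input tokens, 20M output tokens.
before = workload_cost(100, 20, OLD_INPUT, OLD_OUTPUT)
after = workload_cost(100, 20, NEW_INPUT, NEW_OUTPUT)
print(before, after)  # → 27.0 80.0
```

For output-heavy workloads the effective increase is closer to 4x than 2x, since the output rate rose far more than the input rate.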

  • Comparative Model Performance: Benchmarks showed Gemini 2.5 Flash outperforming Claude 3.7 Sonnet (Thinking). However, users reported that Claude 4 Sonnet was experiencing significant performance lags and slowness on some platforms.

  • High Cost of Reasoning Model Evaluations: An analysis found that evaluating advanced chain-of-thought models is becoming prohibitively expensive for many researchers: testing OpenAI's o1 model on seven reasoning benchmarks cost $2,767, whereas evaluating more than 80 non-reasoning models cost about $2,400 combined.

  • OpenAI's Internal Progress Perception: Commentary on OpenAI's new podcast suggests GPT-5 may be released in the summer but might not represent a major capability leap over GPT-4.5, potentially being more of an incremental upgrade.

  • The Cost of Politeness: A calculation estimated that using polite phrases like "Please" and "thank you" with LLMs could collectively cost users approximately $9.5 million per year in extra tokens at GPT-4o rates.
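
Estimates like this reduce to one multiplication chain: daily messages, times the fraction containing politeness, times extra tokens per message, times the per-token rate, times 365. A sketch where every input is an illustrative assumption, not the original calculation:

```python
# GPT-4o list price for input tokens: $2.50 per 1M tokens.
GPT4O_INPUT_RATE = 2.50 / 1_000_000  # $ per token

def yearly_politeness_cost(msgs_per_day: float, polite_fraction: float,
                           extra_tokens: float, rate: float) -> float:
    """Dollars per year spent on extra politeness tokens (all inputs assumed)."""
    return msgs_per_day * polite_fraction * extra_tokens * rate * 365

# Assumed: 1B messages/day, 10% polite, 4 extra tokens each.
cost = yearly_politeness_cost(1_000_000_000, 0.10, 4, GPT4O_INPUT_RATE)
print(f"${cost:,.0f} per year")
```

These placeholder inputs land well below the quoted $9.5 million, which mostly shows how sensitive the headline figure is to assumptions about traffic volume and token counts.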

Broader Commentary and Societal Impact

  • Debate on AI Development Priorities: The CEO of Runway argued that the primary long-term moat for a company is its people, not compute, data, or distribution. In a different critique, some AI safety approaches were criticized for focusing on strategy over the actual building of safe AI.

  • Humanoid Robots as an AGI Vector: Figure AI's CEO asserted that humanoid robots are the "ultimate deployment vector for AGI." In contrast, the head of Covariant argued that intelligence, not physical form, is the primary limitation for robots, and dexterous manipulation does not require a humanoid form factor.

  • Concerns Over xAI's Potential Bias: A public exchange involving Elon Musk and the Grok chatbot raised concerns about owner-driven bias and the handling of politically sensitive information in AI models. The discussion highlighted the risk that an AGI could reflect the biases of its creators if not properly aligned.

  • California AI Regulation Framework: A new report from the Joint California Policy Working Group on AI Frontier Models received praise for its thoughtful framework for policymaking, particularly its points on third-party assessments, transparency, and whistleblower protections.

You just read issue #30 of TLDR of AI news. You can also browse the full archives of this newsletter.