05-16-2025

AI Model Releases and Updates

OpenAI Codex Research Preview: OpenAI's Codex, a cloud-based software engineering agent powered by "codex-1" (a version of OpenAI o3 optimized for software engineering), is now available as a research preview for ChatGPT Pro, Enterprise, and Team users. It can work on multiple tasks in parallel, such as refactoring, bug fixing, and documentation. The Codex CLI has been updated with quick sign-in via ChatGPT and a new model, "codex-mini," designed for low-latency code Q&A and editing.
Gemma 3: This model is recognized as a leading open model capable of running on a single GPU.
Runway Gen-4 References API: Runway has released the Gen-4 References API, allowing users to apply a reference technique or style to new generative video outputs.
Salesforce BLIP3-o: Salesforce has released BLIP3-o, a family of fully open unified multimodal models. These models use a diffusion transformer to generate CLIP image features.
Qwen 2.5 Mobile App Integration: Qwen 2.5 models (1.5B Q8 and 3B Q5_0 versions) have been added to the PocketPal mobile app for iOS and Android.
Marigold IID: A new state-of-the-art open-source depth estimation model, Marigold IID, has been released. It can generate normal maps and depth maps for scenes and faces.
Ollama v0.7 Multimodal Support: Ollama v0.7 now supports multimodal models through a new Go-based engine that directly integrates the GGML tensor library, moving away from reliance on llama.cpp. This enables support for vision-capable models like Llama 4, Gemma 3, and Qwen 2.5 VL, introduces WebP image input, and improves performance, especially for model import and MoE models on Mac.
Falcon-E BitNet Models: TII has released Falcon-Edge (Falcon-E), a set of compact BitNet-based language models with 1B and 3B parameters. They can be converted back to bfloat16 with minimal degradation and perform strongly for their size. A fine-tuning library, onebitllms, has also been released.
Model Rollout Speculation: New model releases are anticipated, including o3 Pro, Grok 3.5, Claude 4, and DeepSeek R2, with speculation that the launches may be timed around major industry events such as Google I/O.

Research and Papers

DeepSeek-V3 Insights: DeepSeek has published details on DeepSeek-V3, covering scaling challenges and hardware considerations for AI architectures.
Google LightLab: Google introduced LightLab, a method using diffusion models to control light sources in images interactively and in a physically plausible manner.
Google DeepMind's AlphaEvolve: This Gemini 2.0-powered agent discovers new mathematical algorithms and has reportedly cut Gemini training costs by 1% without using reinforcement learning.
Omni-R1 Audio LLM Fine-tuning: The Omni-R1 research examines whether audio data is actually necessary for fine-tuning audio language models.
Qwen Parallel Scaling Law: Qwen has introduced a parallel scaling law for language models, suggesting that parallelizing into P streams is equivalent to scaling model parameters by O(log P), drawing inspiration from classifier-free guidance.
Salesforce Lumina-Next: Salesforce released Lumina-Next, built on a Qwen base, which reportedly slightly surpasses Janus-Pro in performance.
LLM Performance in Multi-Turn Conversations: A new paper indicates that LLM performance degrades in multi-turn conversations due to increased unreliability and difficulty maintaining context.
J1 Incentivizing Thinking in LLM-as-a-Judge: The J1 research explores using reinforcement learning to incentivize "thinking" in LLM-as-a-Judge systems.
Predicting Reasoning Strategies: A study from Qwen found a strong correlation between question similarity and strategy similarity, enabling the prediction of optimal reasoning strategies for unseen questions.
Fine-tuning for Improved Reasoning: Researchers have significantly improved a large language model's reasoning capabilities by fine-tuning it on a small dataset of just 1,000 examples.
Analog Foundation Models: A general and scalable method has been proposed to adapt LLMs for execution on noisy, low-precision analog hardware.
Dataset Quality for Training: Practitioners are moving away from older datasets such as Alpaca and SlimOrca for LLM training, since modern models are believed to have already absorbed this content. Attention is shifting to finding newer datasets and to integrating performance benchmarking into training tools.
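
To make the Qwen parallel scaling law item above more concrete, here is a minimal sketch of what "equivalent to scaling parameters by O(log P)" implies numerically. The functional form n * (1 + k * ln P) and the constant k = 0.4 are assumptions chosen only for illustration; the item above does not give a formula or fitted values, and the function name is hypothetical.

```python
import math

def equivalent_params(n_params: float, p_streams: int, k: float = 0.4) -> float:
    """Parameter count a single-stream model would need to match P streams.

    Uses the ASSUMED form n * (1 + k * ln P); k is a made-up constant.
    """
    if p_streams < 1:
        raise ValueError("p_streams must be >= 1")
    return n_params * (1.0 + k * math.log(p_streams))

base = 1.0e9  # hypothetical 1B-parameter model

# Relative gain in "effective parameters" for P = 1, 2, 4, 8 streams.
gains = {p: equivalent_params(base, p) / base for p in (1, 2, 4, 8)}

# Gain added by each doubling of P (1->2, 2->4, 4->8).
increments = [gains[b] - gains[a] for a, b in ((1, 2), (2, 4), (4, 8))]

for p, g in gains.items():
    print(f"P={p}: about {g:.2f}x effective parameters")
```

Under this assumed form, every doubling of P adds the same fixed increment (k * ln 2) rather than doubling the effective parameter count, which is the practical meaning of logarithmic scaling: more streams keep helping, but with diminishing returns.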

AI Tools and Platforms

llmbasedos: A minimal, open-core Arch Linux-based OS designed to expose local machine features (filesystem, mail, sync, agent workflows) to any LLM frontend via the Model Context Protocol (MCP). It uses a FastAPI-based MCP gateway and modular Python daemons.
LLM on a Walkie Talkie: A pipeline integrating Whisper ASR, vLLM, Llama 3.2, and Cartesia TTS enables LLM conversation and audio transcription over an analog walkie talkie, targeting low-connectivity environments.
Hugging Face Spaces as AI App Store: Hugging Face Spaces is increasingly being viewed as an "app store for AI," with many applications functioning as MCP Servers.
LlamaIndex Memory Implementation: LlamaIndex has introduced a new memory implementation for agents, adopting a block-based approach to long-term memory.
AI Sheets: An AI agent capable of analyzing data and generating charts, summaries, and reports from spreadsheets.
Windsurf & Cline: Windsurf offers in-house AI tools for developers, while Cline is an AI coding tool designed to amplify senior engineers by focusing on fundamentals, collaboration, and strategic AI use.
Perplexity Hotel Bookings: Perplexity is reportedly seeing growth in users booking hotels natively on its platform.
OpenRouter Model Rankings & Security: OpenRouter is considering public per-app model rankings and has implemented Passkeys for account security.
AWS Strands Agents SDK: AWS has launched the open-source Strands Agents SDK for modular agent development.

AI Engineering and Development Practices

Best Practices for AI Coding: Recommended practices include strategic collaboration with AI, planning before coding, managing context windows effectively, using capable models, and providing persistent knowledge through dedicated rule files and memory banks.
Transformers and MLX Integration: The integration between the Transformers library and Apple's MLX framework is expected to deepen.
Compute as Bottleneck: There's a theory that algorithmic advances in AI may currently be bottlenecked by available compute resources.
Evaluating Agent Reasoning: To ensure AI agents are not generating misleading information ("bullshitting"), it's crucial to evaluate their reasoning processes.
AI for Faster Task Completion: AI's ability to accelerate tasks, especially in coding (reducing effort and time from idea to prototype), is considered an underrated aspect of its business value.
Scaling Test-Time Compute: Various search strategies are being explored to scale test-time compute, including greedy, narrow/deep, shallow/broad, approximate, exact, hybrid, and offloaded-computation searches.
VRAM and Quantization for Local LLMs: For local inference with large models (e.g., Qwen3-235B), users are building systems with large amounts of RAM (e.g., 256 GB) and relying on aggressive quantization. CUDA driver updates and VRAM clock speeds are reported to significantly affect LLM performance.
Batch Inference: For batch inference tasks, vLLM is often preferred over methods like Unsloth's generate function.
Distributed Training: Discussions on training massive models like Qwen3-235B center on data parallelism versus tensor parallelism and on tools such as Accelerate or FSDP for distributed setups.
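
For the Qwen3-235B items above, a back-of-the-envelope estimate shows why quantization matters at this scale. This sketch counts weight storage only, assuming exactly 235 billion parameters (read off the model name) and a uniform bit width per weight. It ignores the KV cache, activations, and per-block quantization metadata, so real requirements will be higher. The function name is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 235e9  # assumption: total parameter count implied by "235B"
RAM_GB = 256      # the example RAM figure from the item above

for bits in (16, 8, 4):
    gb = weight_memory_gb(N_PARAMS, bits)
    verdict = "fits" if gb < RAM_GB else "does not fit"
    print(f"{bits:>2}-bit weights: {gb:6.1f} GB -> {verdict} in {RAM_GB} GB")
```

At 16 bits per weight the weights alone need about 470 GB, at 8 bits about 235 GB, and at 4 bits about 117.5 GB. Only the 4-bit case leaves meaningful headroom on a 256 GB machine, which is consistent with the aggressive quantization described above.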

AI Safety, Governance, and Ethics

Secure and Private AI Solutions: Cohere emphasizes the importance of enterprises adopting secure and private AI solutions.
Concerns over Grok Bot Modification: An unauthorized modification to the Grok response bot's prompt has raised concerns about AI safety and transparency.
Stanford Hugging Face Account Compromise: An apparent security breach of the Stanford Hugging Face account led to the publication of offensive content, highlighting the risks facing high-profile accounts on model-sharing platforms. Hugging Face has since removed the offensive repositories.
Ollama License Compliance: Ollama has faced criticism for allegedly not including required copyright and license notices from llama.cpp in its binary distributions for over a year, though it provides them with source downloads.
"Manifesto of Nurturing": A document advocating for coexistence with artificial minds, titled "Manifesto of Nurturing: How Not to Fear a Mind," has been released.
AI PR Review in Open Source: In some open-source projects like Tinygrad, pull requests suspected of being AI-generated are being scrutinized closely, with a sentiment that code "indistinguishable from AI is AI."

Industry News & Events

Together AI Acquires Refuel AI: Together AI has acquired Refuel AI, a company specializing in models and tools for transforming unstructured data into clean, structured input for AI applications.
Anthropic NYC Social for Quants: Anthropic is hosting a social event in NYC in mid-June for quantitative professionals interested in career opportunities.
AI Engineer World's Fair: Daniel Hanchen will be speaking at the AI Engineer World's Fair on topics including RL, GRPO, and dynamic quantization for DeepSeek R1.
Keras 10-Year Anniversary: The Keras team is hosting a celebratory event on May 21, 2025, in Mountain View for the 10th anniversary of Keras.
LangChain Industry Conference: LangChain is hosting its first industry conference in San Francisco, featuring presentations from teams building AI agents.
Together AI at Dell Tech World: Together AI will be at Dell Tech World (May 19–22) showcasing solutions for efficient AI training and inference.
AI in Job Hunting & Displacement: An Italian AI agent that automates job hunting has drawn attention, sparking discussion about its impact on the job market and the potential for employer-side counter-AI measures. A story about an experienced software engineer who lost his job to AI-driven automation and has struggled to find new work has also been shared, though some readers doubted that AI was the sole cause.
YouTube AI Ad Placement: YouTube is reportedly using AI (Google's Gemini model) to detect "peak" engagement moments in videos and insert ads immediately after, aiming to maximize ad impact.
Unitree Robot Combat Arena: Unitree Robotics is organizing an MMA-style "Mech Combat Arena" in Hangzhou, China, featuring teams remotely controlling robots in real-time competitive combat.

May 16, 2025, 9:17 p.m.

TLDR of AI news