A new version of the DeepSeek R1 model, DeepSeek-R1-0528, has been released on Hugging Face and is available on some inference partner platforms. It continues to use the MIT license for model weights and code. The community is actively converting the model to GGUF format for broader compatibility.
The Gemma model family has seen numerous releases over the past six months, including PaliGemma 2, PaliGemma 2 Mix, Gemma 3, ShieldGemma 2, TxGemma, Gemma 3 QAT, Gemma 3n Preview, and MedGemma, as well as DolphinGemma and the upcoming SignGemma.
The Claude 4 launch is reported to be significantly accelerating development workflows. The combination of Opus 4, Claude Code, and the Claude Max plan is considered a high-return AI coding stack.
Codestral Embed, a code embedding model with output dimensionality configurable up to 3072, has been released.
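Configurable-dimension embeddings are typically consumed by keeping only the leading components and re-normalizing, trading recall for storage. A minimal sketch of that pattern, assuming a Matryoshka-style embedding where leading dimensions carry the most information (an illustration, not Codestral Embed's actual API):

```python
import numpy as np

def truncate_embedding(vec, dim):
    """Keep the first `dim` components of an embedding and re-normalize.

    Assumes the embedding was trained so that leading dimensions are the
    most informative (Matryoshka-style); this is a generic illustration,
    not Codestral Embed's documented behavior.
    """
    v = np.asarray(vec, dtype=np.float32)[:dim]
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# Example: shrink a full 3072-d vector to 256 dimensions for cheaper storage.
full = np.random.default_rng(1).normal(size=3072)
small = truncate_embedding(full, 256)
print(small.shape)  # (256,)
```

Re-normalizing after truncation keeps cosine-similarity comparisons meaningful across different output sizes.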
The BAGEL model, proposed and implemented by ByteDance, is an open-source multimodal model designed for reading, reasoning, drawing, and editing; it supports long, mixed-modality contexts and arbitrary aspect ratios without sacrificing quality.
An updated DeepSeek model (possibly R1 v2 or 0528) reportedly shows improved accuracy, successfully answering test questions that stumped Gemini 2.5 Pro, though with increased response latency. A previous bug that caused hallucinated invisible tokens with the '翻译' ('translate') prompt has been fixed.
Google AI Edge Gallery, an open-source app, enables on-device, offline execution of generative AI models (like Gemma3-1B-IT q4) on Android (iOS soon), with features like 'Ask Image,' 'Prompt Lab,' and 'AI Chat,' and tunable inference settings. Some users report instability and potential privacy concerns with network requests.
Chatterbox TTS 0.5B, an open-source English-only text-to-speech model that claims to surpass ElevenLabs in quality, has been released. It is installable via pip, with weights on Hugging Face, offers adjustable expressiveness parameters, and runs viably on CPU for short utterances.
Google announced SignGemma, an upcoming open-source model in the Gemma family designed to translate sign language into spoken-language text. It aims to improve accessibility and real-time multimodal communication and is expected later this year. Its point-cloud visualizations are reportedly less uncanny than those of previous models.
Tencent released Hunyuan Video Avatar, an open-source, audio-driven image-to-video generation model that supports multiple characters. The initial release supports a single character and audio inputs of up to 14 seconds. The minimum hardware requirement is a 24GB GPU, with 80GB recommended.
A new anime-specific fine-tune of the WAN video generation model has been released on CivitAI, offering image-to-video and text-to-video capabilities for stylized animation.
Anthropic has rolled out a Claude voice mode beta for mobile devices, enabling English-language voice interactions, such as calendar summaries, across all user plans.
Claude Opus 4 has reportedly reached the #1 position in the WebDev Arena benchmark, surpassing the previous Claude 3.7 and matching Gemini 2.5 Pro. Evaluations also show a significant improvement in coding performance for Sonnet 4.
Claude Opus 4 is claimed to achieve state-of-the-art results on the ARC-AGI-2 benchmark. Claude 4 Sonnet might be the first model to significantly benefit from test-time-compute on ARC-AGI 2, beating o3-preview on this benchmark at a substantially lower cost.
Findings suggest that random rewards in reinforcement learning only improve Qwen models, and that the observed gains were an artifact of clipping; this raises questions about the validity of RL papers that evaluate only on Qwen, since the model improves under any random reward.
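The mechanism behind "gains from clipping" can be illustrated with a toy NumPy sketch of a PPO-style clipped surrogate objective (a generic illustration of the clipping asymmetry, not the cited work's actual training setup): even when advantages are zero-mean random noise, the clipped objective has a systematically nonzero expectation, so updates are not neutral.

```python
import numpy as np

# Toy demonstration: PPO-style clipped surrogate under random advantages.
# All names and distributions here are illustrative assumptions.
rng = np.random.default_rng(0)
eps = 0.2  # standard PPO clip range
n = 100_000

ratios = np.exp(rng.normal(0.0, 0.5, size=n))   # pi_new / pi_old, lognormal
advantages = rng.choice([-1.0, 1.0], size=n)    # random reward -> random sign

unclipped = ratios * advantages
clipped = np.clip(ratios, 1 - eps, 1 + eps) * advantages
surrogate = np.minimum(unclipped, clipped)       # PPO pessimistic bound

# Without clipping the objective averages to ~0; the min() with the clipped
# term is asymmetric, so the clipped surrogate is biased negative, meaning
# random rewards still produce systematic (non-neutral) policy updates.
print(unclipped.mean(), surrogate.mean())
```

The point is only that clipping breaks the symmetry of a zero-mean reward signal; which behaviors that bias reinforces depends on the base model, consistent with the observation that only Qwen benefits.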
Nemotron-CORTEXA reportedly reached the top of the SWEBench leaderboard by solving 68.2% of SWEBench GitHub issues using a multi-step problem localization and repair process.
A paper on VideoGameBench indicates that the best-performing model, Gemini 2.5 Pro, completes only 0.48% of VideoGameBench and 1.6% of VideoGameBench Lite.
Frontier LLMs are reported to find solving ‘Modern Sudokus’ challenging.
DeepSeek-R1-0528 is noted for its strong coding capabilities, with user reports indicating it performs on par with, or close to, models like Gemini 2.5 Pro, handling complex coding tasks and resolving issues that stumped other leading models. In a custom Scrabble coding test, it generated accurate, working code and robust tests on the first try, producing more concise code than competitors.
A comparison attempt between DeepSeek-R1-0528 and Claude-4-Sonnet using a 'heptagon + 20 balls' benchmark was deemed uninformative as it relies on external physics engines, not the LLMs' inherent abilities.
Gemma 3 27B QAT, running on RDNA3 Gen1 hardware, reportedly achieved 11 tokens per second.
In user tests for web development, Gemini 2.5 Pro ranked highly, outperforming Grok 3. Some users ranked Opus 4 above o3 for coding.
Perplexity Pro reportedly outperformed Sonar Pro in 20 tests, despite claims that Perplexity uses open-source models.