[AINews] not much happened today

        February 26, 2025

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜

        a quiet day.

AI News for 2/24/2025-2/25/2025. We checked 7 subreddits, 433 Twitters and 29 Discords (220 channels, and 5949 messages) for you. Estimated reading time saved (at 200wpm): 503 minutes. You can now tag @smol_ai for AINews discussions!

You should follow DeepSeek's #OpenSourceWeek, but the releases so far have not met our bar for headline story status.

The Table of Contents and Channel Summaries have been moved to the web version of this email.

AI Twitter Recap
Claude 3.7 Sonnet Release and Performance

Claude 3.7 Sonnet excels in coding and reasoning: @skirano highlighted that Claude 3.7 Sonnet with Claude Code can generate an entire "glass like" design system in one shot, including all components. @omarsar0 demonstrated Claude 3.7's reasoning and coding capabilities by creating a simulator for attention mechanisms.  @reach_vb noted that Claude 3.7 beats DeepSeek R1 and is on par with O3-mini (high) in non-thinking mode, anticipating strong performance in thinking mode. @ArtificialAnlys benchmarked Claude 3.7 Sonnet as the best non-reasoning model for coding, outperforming DeepSeek v3, Gemini 2.0 Pro, and GPT-4o on their coding evals SciCode and LiveCodeBench. @terryyuezhuo shared BigCodeBench-Hard results showing Claude-3.7 (w/o thinking) achieving 33.8% Complete, comparable to Qwen2.5-Coder-32B-Instruct, and outperforming o3-mini and o1-2024-12-17.
Claude 3.7 Sonnet available on multiple platforms: @perplexity_ai announced Claude 3.7 Sonnet's availability on Perplexity Pro, noting improvements in agentic workflows and code generation. @_akhaliq confirmed Claude 3.7 Sonnet is live on Anychat with coder mode. @_philschmid mentioned availability on Anthropic, Amazon Bedrock, and Google Cloud, at the same price of $3/$15 per million input/output tokens.
Claude 3.7 Sonnet's "Thinking Mode" and Context Window: @_philschmid highlighted Claude 3.7's thinking mode with up to 64k tokens and reasoning tokens display, along with a 200k context window and 128k output token length. @Teknium1 praised the toggleable think mode in Claude.
Claude 3.7 Sonnet's coding tool "Claude Code":  @_philschmid introduced Claude Code, a CLI-based coding assistant capable of reading, modifying files, and executing commands. @catherineols described Claude Code as more autonomous than other tools, capable of deciding to run tests and edit files. @goodside previewed Claude Code, noting it sees files, writes diffs, runs commands, and is like a lightweight Cursor without the editor.
Claude 3.7 Sonnet price comparison: @_philschmid pointed out that Claude 3.7's price remained at $3/$15 per million input/output tokens, making it 30x more expensive than Gemini 2.0 Flash and ~3x more than OpenAI o3-mini.
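To make the per-token pricing above concrete, here is a minimal sketch that turns the quoted $3/$15 per million input/output tokens into a per-request cost. The function name and the example token counts are illustrative assumptions, not figures from any of the cited posts.

```python
# Prices quoted above for Claude 3.7 Sonnet, in USD per million tokens.
INPUT_USD_PER_MTOK = 3.0
OUTPUT_USD_PER_MTOK = 15.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    return (
        input_tokens * INPUT_USD_PER_MTOK
        + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


# Hypothetical request: a 20k-token prompt and a 4k-token reply.
example_cost = request_cost_usd(20_000, 4_000)
print(f"${example_cost:.2f}")
```

Because output tokens cost five times as much as input tokens at these rates, long replies tend to dominate the bill even when the prompt is much larger.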

DeepSeek and Qwen Model Updates and Open Source Releases

DeepSeek releases DeepEP communication library: @deepseek_ai announced DeepEP, an open-source EP communication library for MoE model training and inference, featuring efficient all-to-all communication, NVLink and RDMA support, FP8 support, and optimized kernels. @reach_vb detailed DeepEP's features, including asymmetric-domain bandwidth forwarding, low-latency kernels with pure RDMA, and PTX optimizations for Hopper GPUs. @danielhanchen highlighted DeepSeek's #2 OSS release with MoE kernels, expert parallelism, and FP8 for training and inference.
Qwen2.5-Max "Thinking (QwQ)" mode and upcoming open source release: @Alibaba_Qwen released "Thinking (QwQ)" in Qwen Chat, backed by QwQ-Max-Preview, a reasoning model based on Qwen2.5-Max, noting enhanced capabilities in math, coding, and agent tasks. @huybery teased the future of Qwen, mentioning the upcoming official release of QwQ-Max and the planned open-weight release of both QwQ-Max and Qwen2.5-Max under Apache 2.0 license, along with smaller variants like QwQ-32B and mobile apps. @reach_vb excitedly announced QwQ & Qwen 2.5 Max open source release soon.

Video and Multimodal Model Developments

Google Veo 2 video model surpasses Sora in benchmarks: @ArtificialAnlys reported Google Veo 2 surpassed OpenAI’s Sora and Kling 1.5 Pro in their Video Arena, noting strengths in rendering people and realistic physics. Veo 2 can generate minutes of 4K video but is currently limited to 720p video with 8s duration at a price of $0.50 per second.
Alibaba Wan2.1 open AI video generation model: @_akhaliq announced Alibaba's Wan2.1, an open AI video generation model, ranking #1 on the VBench leaderboard, outperforming SOTA open-source & commercial models in complex motion dynamics, physics simulation, and text rendering. @multimodalart confirmed Wan2.1 is Apache 2.0 open source and available on Hugging Face.
RunwayML Creative Partners Program for artists: @c_valenzuelab described RunwayML's Creative Partners Program, giving artists free access to tools to reward experimentation and inspiration, contrasting it with companies copying the effort for product promotion without honoring artists.

Tools, Libraries and Datasets

Replit Agent v2 released: @pirroh announced Replit Agent v2 in Early Access, highlighting a new app creation experience, realtime app design preview, and instructions for access. @hwchase17 noted Replit agent v2 is powered by LangGraph and LangSmith.
LangChain JS adds Claude 3.7 Support and LangGraph Supervisor: @LangChainAI shared tips for building agents with Claude 3.7, demonstrating tool-calling agents with configurable reasoning. @LangChainAI introduced LangGraph.js Supervisor, a library for building hierarchical multi-agent systems with LangGraph. @LangChainAI listed 17 new integration packages added to LangChain Python. @LangChainAI announced Claude 3.7 support in LangChain JS.
vLLM integrates EP support: @vllm_project announced initial EP support merged in vLLM, with integration of collectives coming soon. @reach_vb confirmed vLLM's lightning-fast integration of EP.
OlmOCR by Allen AI for PDF parsing: @mervenoyann presented OlmOCR, a new tool by @allen_ai for parsing PDFs, based on Qwen2VL-7B, and available on transformers with Apache 2.0 license.
Big-Math dataset for RL in LLMs: @arankomatsuzaki and @iScienceLuvr shared SynthLabs' Big-Math, a large-scale, high-quality math dataset for reinforcement learning in language models, containing over 250,000 questions with verifiable answers.
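A dataset with verifiable answers lends itself to a simple rule-based reward for RL. The sketch below is a hypothetical illustration of that idea, not SynthLabs' actual grading code: the function names and the normalization rules are assumptions.

```python
from fractions import Fraction


def normalize_answer(text: str) -> str:
    """Canonicalize an answer string so equivalent forms compare equal."""
    cleaned = text.strip().rstrip(".").replace(",", "").replace(" ", "")
    try:
        # Fraction parses integers, decimals, and "a/b" forms exactly.
        return str(Fraction(cleaned))
    except (ValueError, ZeroDivisionError):
        # Non-numeric answers fall back to case-insensitive matching.
        return cleaned.lower()


def verifiable_reward(model_answer: str, reference_answer: str) -> float:
    """Return 1.0 when the answers match after normalization, else 0.0."""
    return float(normalize_answer(model_answer) == normalize_answer(reference_answer))
```

A binary reward like this is easy to check automatically, which is what makes closed-form math answers convenient for RL; real graders usually need more careful handling of units, symbolic expressions, and formatting.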

Research and Analysis

OpenAI Deep Research for paid users: @OpenAI announced Deep research rolling out to all ChatGPT Plus, Team, Edu, and Enterprise users, with improvements including embedded images with citations and better understanding of uploaded files. @OpenAI detailed usage limits for Plus, Team, Enterprise, Edu, and Pro users. @OpenAI shared the system card for Deep research. @OpenAI mentioned community expert involvement in training Deep research and opened interest registration for future model contributions. @kevinweil announced Deep research rolling out to all paid users, highlighting its capability for week-long research tasks in 15 minutes. Separately, @AravSrinivas announced Perplexity's Deep Research API availability for developers.
Minions: Cost-efficient collaboration between local and cloud models: @togethercompute introduced Minions, a method pairing small language models on a laptop with frontier cloud models, preserving 98% of accuracy for <18% of the cost. @iScienceLuvr highlighted Minions achieving 5.7x cost reduction while maintaining 97.9% cloud model performance.
Learning to Reason from Feedback at Test-Time (FTTT): @dair_ai presented research on Feedback-based Test-Time Training (FTTT), enabling LLMs to learn iteratively from environment feedback during inference, using self-reflected feedback and OPTUNE, a learnable test-time optimizer.
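The two Minions figures quoted above are consistent with each other, which a quick calculation shows. This is only arithmetic on the numbers from the tweets; the variable names are illustrative.

```python
# Figure quoted by @iScienceLuvr: a 5.7x cost reduction versus cloud-only.
cost_reduction_factor = 5.7

# A 5.7x reduction means paying 1/5.7 of the original cost.
remaining_cost_fraction = 1 / cost_reduction_factor

print(f"{remaining_cost_fraction:.1%}")  # share of the cloud-only cost
```

The result lands just under the "<18% of the cost" figure from @togethercompute, so the two tweets describe the same result in different units.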

AI Industry and Market Trends

Focus on AI agents and agency: @polynoamial questioned if AI models will soon have agency. @swyx emphasized Agency > Intelligence, defining agency as "getting what you want done" and "doing the right things". @omarsar0 expressed being impressed by Windsurf agentic capabilities.
Open source AI momentum: @ClementDelangue urged for more public, open, collaborative AI. @reach_vb thanked Alibaba_Qwen for their commitment to Open Source and Science. @NandoDF highlighted European AI entrepreneurship and competition, suggesting eliminating notice periods and non-competes to boost the European AI industry.
AI in specific domains: @RichardSocher anticipated epic progress when hill climbing starts on meaningful bio benchmarks. @SchmidhuberAI is hiring postdocs to develop an Artificial Scientist for novel chemical materials for climate change. @METR_Evals is running a pilot experiment to measure AI tools' impact on open source developer productivity.
AI safety and alignment concerns: @sleepinyourhat shared a surprising and disconcerting LLM alignment result. @NeelNanda5 announced a Google DeepMind team using model internals in production to enhance Gemini safety. @sarahcat21 discussed the need for high quality annotations for improving model capabilities and alignment, noting degrading annotation quality.
AI and the future of work: @adcock_brett predicted a future with more humanoids than humans doing various services and collapsing the price of goods/services. @RichardMCNgo discussed the concentrated nature of tech development driven by AI. @francoisfleuret asked for stories from people whose professional lives have been changed by AI models.

Memes and Humor

Death Star Startup Pitch: @arankomatsuzaki joked about a startup with a "bold vision: the Death Star" seeking a $500k seed round.
Worker 17 and AI overlords: @nearcyan shared a meme about "Worker 17" and an "AllKnowingLineSupervisingAutonomousSuperIntelligence", depicting a harsh work environment. @nearcyan continued the "Worker 17" theme, and @rishdotblog joked about future robot overlords hating humans.
Claude playing Pokemon on Twitch: @AnthropicAI announced "Can Claude play Pokémon?" and @kipperrii invited people to watch Claude play Pokemon on Twitch. @_philschmid joked about waiting for the first "AI plays Pokemon" stream. @nearcyan urged people to watch Claude playing Pokemon on Twitch. @AmandaAskell stated "Watching Claude play Pokemon is a delight."
Anthropic branding and aversion to number four: @scaling01 joked about Anthropic being "more Elven than Human". @dylan522p humorously suggested Anthropic is a Chinese AI company due to their aversion to the number four.
Other humorous tweets: @giffmana shared a funny prompt and response from Grok. @nearcyan made a joke that was missed by others. @teortaxesTex shared a funny image related to Nvidia. @abacaj joked about loyalty to models. @Yuchenj_UW thanked OpenAI with a DeepSeek tweet.

AI Reddit Recap
/r/LocalLlama Recap
Theme 1. DeepSeek's DeepEP: Enhanced MoE GPU Communication

DeepSeek Release 2nd Bomb, DeepEP a communication library tailored for MoE model (Score: 407, Comments: 48): DeepSeek has released DeepEP, a communication library specifically designed for Mixture-of-Experts (MoE) models and expert parallelism (EP). DeepEP features high-throughput, low-latency all-to-all GPU kernels and supports low-precision operations such as FP8, but is currently limited to GPUs with the Hopper architecture like H100, H200, and H800. The code is available on GitHub.
DeepEP Performance Optimization: A notable discovery in the DeepEP repository is the use of an undocumented PTX instruction, ld.global.nc.L1::no_allocate.L2::256B, for extreme performance on Hopper architectures. The instruction reads volatile GPU memory with the non-coherent .nc modifier, which is formally undefined behavior, but it has been tested to be correct on Hopper and improves performance significantly.
Potential for Practical Applications: Users express hope that DeepEP's improvements could make Local R1 more practical by enabling faster inference on Mixture-of-Experts models, addressing previous performance issues with DeepSeek.
Hardware Limitations and Aspirations: While DeepEP currently supports only Hopper architecture GPUs, there is interest in porting it to other GPUs like the 3090s, reflecting a desire for broader hardware compatibility.

DeepSeek 2nd OSS package - DeepEP - Expert parallel FP8 MOE kernels (Score: 153, Comments: 11): DeepSeek released its second open-source software package, DeepEP, which features expert parallel FP8 Mixture of Experts (MOE) kernels.
DeepEP includes inference style kernels for Mixture of Experts (MoE) layers with FP8 support and expert parallelism, enabling the overlap of GPU/CPU communication and GPU computation. It is also suitable for training large MoE models.
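For readers unfamiliar with expert parallelism, the following is a single-process, pure-Python illustration of the dispatch and combine pattern that such libraries accelerate. It does not use DeepEP's API: the function names, the scalar "tokens", and the toy experts are all assumptions made for the sketch. In a real EP setup each expert would live on a different GPU, and the grouping step would be an all-to-all exchange over NVLink or RDMA.

```python
from collections import defaultdict


def dispatch(topk_experts):
    """Group token indices by destination expert (the 'all-to-all' send)."""
    per_expert = defaultdict(list)
    for token_idx, experts in enumerate(topk_experts):
        for expert_id in experts:
            per_expert[expert_id].append(token_idx)
    return dict(per_expert)


def combine(num_tokens, per_expert, expert_outputs, gate_weights):
    """Return each token's gate-weighted sum of expert outputs."""
    combined = [0.0] * num_tokens
    for expert_id, token_indices in per_expert.items():
        for slot, token_idx in enumerate(token_indices):
            weight = gate_weights[token_idx][expert_id]
            combined[token_idx] += weight * expert_outputs[expert_id][slot]
    return combined


# Toy example: 3 scalar "tokens", 2 experts chosen per token out of 3.
tokens = [1.0, 2.0, 3.0]
topk_experts = [[0, 1], [1, 2], [0, 2]]
gate_weights = [{0: 0.5, 1: 0.5}, {1: 0.25, 2: 0.75}, {0: 1.0, 2: 0.0}]

# Hypothetical experts: expert e simply multiplies its input by (e + 1).
per_expert = dispatch(topk_experts)
expert_outputs = {
    e: [tokens[i] * (e + 1) for i in idxs] for e, idxs in per_expert.items()
}
result = combine(len(tokens), per_expert, expert_outputs, gate_weights)
print(result)
```

Every token is sent to several experts and every expert receives tokens from many sources, which is why the communication pattern is all-to-all and why its throughput and latency matter so much for MoE training and inference.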

Theme 2. Sonnet 3.7 Dominates Benchmark Testing

New LiveBench results just released. Sonnet 3.7 reasoning now tops the charts and Sonnet 3.7 is also top non-reasoning model (Score: 257, Comments: 53): Sonnet 3.7 from Anthropic leads the latest LiveBench results, achieving top scores in both Global Average (76.10) and Reasoning Average (87.83). The table showcases performance metrics of models from organizations like OpenAI and Google across categories including Coding, Mathematics, Data Analysis, and Language.
Anthropic's Sonnet 3.7 leads in performance, but there are calls for releasing the model weights for local use. LiveBench results highlight improvements in coding and reasoning, with users noting the model's efficiency and quality compared to others like O3 mini high and Gemini 2 Flash.
Discussions focus on benchmark limitations and real-world performance, with some users expressing skepticism about the model's math scores due to inconsistencies with official benchmarks. There is interest in seeing if using 128k tokens for evaluation could improve results, despite concerns about latency.
The community is keen on more efficient model usage and hardware improvements, as some feel that the raw strength of models is reaching a plateau. The Aider leaderboard shows Sonnet 3.7 as significantly ahead of 3.5, indicating positive reception for its performance in coding tasks.

Sonnet 3.7 near clean sweep of EQ-Bench benchmarks (Score: 106, Comments: 54): Sonnet 3.7 achieves a near clean sweep of the EQ-Bench benchmarks, indicating significant advancements in AI model performance. This highlights the model's effectiveness and capability in various benchmark tests.
Discussions around Sonnet 3.7's writing style highlight its "safe" approach, with comparisons to other models like Deepseek-R1 and OpenAI. Users question the descriptions like "earthy" and "spiky," while some find the model's style appealing to "liberal arts" audiences. Sonnet 3.7 shows significant improvements in humor understanding, as noted in the Buzzbench results.
The cost-effectiveness of AI models is debated, with Sonnet 3.7 being more expensive than alternatives like Gemini. The discussion centers on whether the performance justifies the cost, especially for different user demographics, such as high-earning professionals versus hobbyists or students.
Darkest Muse, a smaller 9b model, is praised for its creative writing capabilities, including character dialogue and poetic style, despite limitations in instruction following. The model's fine-tuning process involved training on human authors from the Gutenberg library, pushing it to the edge of model collapse for unique results.

Theme 3. Alibaba's Wan 2.1 Video Model Open-Source Release Scheduled

Alibaba video model Wan 2.1 will be released Feb 25th,2025 and is open source! (Score: 408, Comments: 49): Alibaba announced the open-source release of its video model Wan 2.1, scheduled for February 25th, 2025. The event, featuring a futuristic design with the theme "BEYOND VISION," will be broadcast live at 11:00 PM (UTC+8), highlighting the model's innovative potential.
Naming Conventions: The name Wan is derived from the Chinese pronunciation for 10,000, similar to Qwen, which represents 1,000. This reflects a pattern in Alibaba's naming strategy for their models.
Model Availability and Performance: Users are eager for the release of Wan 2.1, with discussions on its availability on Hugging Face and concerns about server overload affecting generation capabilities. A smaller model is also available, as noted in the README on Hugging Face.
Hardware Requirements and Comparisons: There is optimism that Wan 2.1 will be runnable on consumer-grade GPUs like the RTX 3060, with comparisons to Flux, which has reduced its training requirements from 24 GB to 6 GB. Users hope Wan 2.1 will surpass SORA in terms of capabilities and open-source accessibility.

WAN Video model launched (Score: 100, Comments: 13): WAN Video model has been launched with weights available on Hugging Face. Although not a Large Language Model (LLM), it may interest many in the AI community.
Quantization is applicable to video generation models as well, with GGUFs already available for Hunyuan and LTX. These are popular due to the difficulty of fitting large models in VRAM, and similar GGUFs are anticipated for WAN soon.
There is a 1.3B version of the WAN model that requires only 8.19 GB VRAM, but it is restricted to 480p resolution due to limited training data at higher resolutions. However, users can upscale the output to achieve better results.
The WAN Video model at 14B is considered large for open models, comparable to the Hunyuan model at 13B, with LTX being a smaller option at 2B. The WAN model's release in both 1.3B and 14B variants aims to cater to different use cases and hardware capabilities.
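As a rough guide to why quantization matters for the model sizes discussed above, this sketch estimates the memory needed for the weights alone at several precisions. It is a back-of-the-envelope calculation under stated assumptions: the bytes-per-parameter table is idealized (real GGUF quantizations carry some overhead), and it ignores activations, the text encoder, and the VAE, so it will not match reported VRAM figures such as 8.19 GB.

```python
# Idealized storage cost per parameter, in bytes (an assumption of this sketch).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp8": 1.0, "4-bit": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for name, params in [("1.3B", 1.3e9), ("14B", 14e9)]:
    for precision in BYTES_PER_PARAM:
        print(f"{name} @ {precision}: {weight_memory_gb(params, precision):.1f} GB")
```

Under these assumptions the 14B weights alone exceed a 24 GB consumer card at bf16 and only fit comfortably once quantized to 8 bits or fewer, which is in line with the interest in GGUF releases.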

Theme 4. Gemma 3 27b Release: A New Contender in AI Models

Gemma 3 27b just dropped (Gemini API models list) (Score: 102, Comments: 27): Gemma 3 27b has been added to the Gemini API models list, featuring a user-friendly interface with a search bar and clickable model entries such as "Gemini 1.5 Pro" and "Gemini 2.0 Flash". The active model, "models/gemma-3-27b-it", is highlighted, suggesting it is currently selected, underscoring a structured and professional layout for ease of navigation.
Model Lineage and Performance: There is a discussion about the lineage and performance of Gemma models, with users noting that Gemma 2 was superior for short story writing compared to Gemini, particularly the 9b version. Gemma and Gemini have similar response styles, but Flash is a different model.
Access and Integration: Users question how Open WebUI accesses Google's unreleased models, with clarifications that it doesn't natively access models. Instead, users can add models via external APIs like Vertex AI or LiteLLM, and there is interest in finding the correct API URL as the current one doesn't list Gemma.
Model Size Perception: There's a humorous exchange about the perception of model sizes, with 70B now considered medium and 24B considered small, reflecting the rapid advancements in AI model scaling.

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

Theme 1. WAN 2.1 Released and Open Source with New Features

WAN Released (Score: 382, Comments: 169): WAN Released: The WAN video model has been released with open-source weights available for download. Multiple models are live on Hugging Face, enabling broader access and experimentation.
Several users discussed the VRAM requirements for different model versions, noting that the 1.3B parameter model requires 8GB VRAM and the 14B model could potentially run on 10GB VRAM. There is also interest in using bf16 precision to reduce VRAM usage.
Users are exploring Gradio applications and installation processes, with CeFurkan working on a Gradio app and installer compatible with Windows and Python 3.10 VENV. There are challenges with RTX 5000 series not having proper PyTorch support.
The community is curious about the model's capabilities in handling multiple tasks like Text-to-Video, Image-to-Video, and Video-to-Audio, with some expressing skepticism about audio generation. Multiple safetensors are discussed, with guidance on handling them using the diffusers library.

Alibaba video model Wan 2.1 will be released today and is open source! (Score: 415, Comments: 104): Alibaba has announced the open-source release of its Wan 2.1 video model. The release event will be live-streamed on February 25, 2025, at 11:00 PM (UTC+8), with the event branded under TONGYI MOMENT and featuring a futuristic, sleek visual design.
Discussions highlight the technical requirements for running the Wan 2.1 video model, with users speculating it might need 80GB VRAM but hoping it can run on 16GB VRAM with techniques like offloading and fp8, similar to hunyuan. Some users express a desire for a model that can scale from high to lower specs, akin to Deepseek R1.
The release event will be live-streamed, likely on Alibaba's official X account. Users are curious about the model's capabilities, particularly its ability to perform image-to-video transformations, which has been confirmed by commenters.
There is humorous commentary on the model's name Wanx, with users noting its phonetic resemblance to "wank" and speculating on the implications, including potential branding for uncensored/NSFW models.

My very first Wan 2.1 Generation on RTX 3090 Ti (Score: 524, Comments: 181): The post provides a first look at Wan 2.1 Generation using an RTX 3090 Ti. Since the post body is empty and the content is primarily in a video, no further details can be summarized.
VRAM Requirements and Optimization: CeFurkan and others discussed optimizing the 1.3B and 14B models to run on 6GB and 10GB GPUs, respectively, with the RTX 3090 Ti using up to 18GB VRAM for generation. The community expressed interest in running these models on lower VRAM setups, such as 3060 12GB, and CeFurkan is developing an AIO installer to simplify usage.
Model Capabilities and Performance: The Wan 2.1 Generation supports text to video, image to video, and video to video generation, with 16 FPS for five-second clips. CeFurkan is working on a Gradio app for easier use, and users are impressed by the quality, comparing it favorably to Hunyuan Video.
Community Contributions and Resources: Kijai's ComfyUI integration is in development, with resources like DiffSynth-Studio and Kijai/WanVideo_comfy available for users. The community is actively sharing examples and prompts, with some users asking about potential NSFW capabilities and the ease of use compared to ComfyUI.

Theme 2. Claude 3.7 Model: Enhanced Capabilities and Accessibility

Holy. Shit. 3.7 is literally magic. (Score: 565, Comments: 111): Claude 3.7 has significantly improved in extended thinking, model quality, and output, making it 10 times more useful than its predecessor, Claude 3.5. The author used Claude 3.7 to design an interactive SaaS-style demo app, including an advanced ROI calculator and onboarding process, all within a single chat, highlighting its potential for real-world applications.
Claude 3.7 Improvements: Users highlight significant improvements in Claude 3.7 over 3.5, particularly in following complex instructions and reducing cognitive load, with enhanced troubleshooting protocols and smoother operation. The model's ability to automatically check entire chains before making changes is seen as a major advancement.
Usage and Cost Considerations: Discussions around inference costs and token management suggest that Claude may face bottlenecks due to hardware limitations, impacting its market strategy. Some users report strange errors and suboptimal suggestions, possibly due to token conservation strategies in Copilot, while others find Cline extension a superior alternative for coding tasks.
SaaS and Development Efficiency: The creation of complex SaaS applications is now faster and more efficient with Claude 3.7, allowing users to complete months of development work in days. However, there are concerns about potential nerfing due to tighter censorship filters, which could degrade model performance over time.

Claude 3.7 is $1 a month for college students (Score: 187, Comments: 42): Claude 3.7 is now available to college students at a promotional rate of $1/month (down from the regular price of $20/month), as announced in an email to the Cornell community. The offer requires students to sign up with their .edu email and highlights features such as "Write code," "Extract insights," and "Brainstorm."
Commenters express skepticism about the authenticity of the Claude 3.7 offer, with multiple users suggesting it might be a phishing scam due to the lack of official announcements or information on Google and Claude's official website.
Some users joke about enrolling at Cornell to take advantage of the offer, while others speculate that Anthropic might be using this as a strategy to collect data from students at prestigious universities.
There is a call for verification of the email's legitimacy, with suggestions to check the email source and concerns about the possibility of stolen or exploited accounts being resold.

"Claude 3.7, make a snake game, but the snake is self-aware it is in a game and trying to escape" (Score: 407, Comments: 32): Claude 3.7 is tasked with creating a snake game where the snake is self-aware and attempts to escape the game. The post does not provide further details or context beyond this intriguing concept.
Users are impressed by Claude 3.7's ability to create complex outputs from simple prompts, with some comparing the experience to AGI and expressing disbelief at the results, such as the creation of a self-aware snake game and a fully functional website with multiple tools.
Hereditydrift highlights the complexity and creativity of Claude 3.7's output from a minimal prompt, specifically mentioning the unexpected inclusion of a "Matrix section," which astonishes many users.
Admirable_Scallion25 and others note that Claude 3.5 does not achieve the same level of complexity in one attempt, indicating a significant improvement in Claude 3.7's capabilities.

Theme 3. Claude Sonnet 3.7 Reigns Supreme: New top model in LLM benchmark

Sonnet 3.7 Extended Reasoning w/ 64k thinking tokens is the #1 model (Score: 154, Comments: 20): Sonnet 3.7 Extended Reasoning with 64k tokens by Anthropic leads in performance, boasting the highest global average score of 76.10, according to a table comparing AI models. It excels across various metrics including reasoning, coding, mathematics, data analysis, and language, outperforming models from OpenAI, xAI, and Google.
Sonnet 3.7 Extended Reasoning with 64k tokens is praised for its performance, with Bindu Reddy highlighting its speed, reasoning, and coding abilities, labeling it the "best, most usable, and generally available model". Users note its improvement over the 3.5 model and its leading position in benchmarks like LiveBench.
Some users question the benchmark's real-world applicability, suggesting that cost normalization is essential for comparison, especially when considering test time compute scaling. They appreciate Sonnet's control over scaling costs, which optimizes workflows.
Sonnet 3.7 is noted for outperforming o3-mini-high in various benchmarks including SWE bench, webdev arena, and Aider benchmark. In UI design and aesthetics, it significantly surpasses o3-mini-high and o1 pro, indicating specialized training in common UI elements.

[R] Analysis of 400+ ML competitions in 2024 (Score: 227, Comments: 19): The analysis of over 400 ML competitions in 2024 highlights that Kaggle remains the largest platform by prize money and user base. Python dominates as the primary language, with PyTorch preferred over TensorFlow at a 9:1 ratio, and NVIDIA GPUs, particularly the A100, are predominantly used for training models. Additionally, convolutional neural networks excel in computer vision, while gradient-boosted decision trees are favored in tabular/time-series competitions. The full report is available here.
Jax Popularity and Advantages: Despite the dominance of PyTorch, some users express disappointment over the limited use of Jax in competitions, noting its simplicity and resemblance to numpy with additional features like grad, vmap, and jit. Jax is reportedly gaining traction in academia, although many professionals prefer sticking with PyTorch.
Synthetic Data in ML Competitions: There is a debate about the effectiveness of using synthetic data in competitions, with concerns about it potentially "blurring" the original dataset. However, thoughtful use, such as generating synthetic backgrounds and superimposing objects for training, has proven beneficial, as demonstrated in a spacecraft detection competition, enhancing model robustness and generalization.
Generative Models and Data Augmentation: Users discuss the implications of using generative models for data augmentation, emphasizing the importance of processing synthetic data carefully to add meaningful information. Successful strategies involve removing nonsensical examples and focusing on solutions that enhance training, as highlighted by a winning competition team's documentation.

Theme 4. Advanced Voice Features and Deep Research in GPT-4o Updates

Grok is cooked (Score: 172, Comments: 61): The post highlights concerns about Grok's potential biases following its deployment, as evidenced by its response identifying "Donald Trump" as the biggest disinformation spreader in a user query. This raises questions about the AI's validity and neutrality, particularly in politically sensitive contexts like elections, immigration, and climate change.
There is a significant debate over Grok's bias, with some users arguing that its responses are influenced by an overwhelming amount of media, while others suggest that it may be biased in favor of Elon Musk. Wagagastiz points to a lack of media defending Musk as a sign of bias, while derfw counters that Grok's responses might indicate neutrality.
Concerns about conservative bias and attempts to manipulate AI responses are prevalent, with users like well-filibuster speculating on efforts to retrain or create new chatbots to align with conservative views. Excellent_Egg5882 highlights a pattern of conservatives downvoting reality when it conflicts with their biases.
Skepticism about the ability to maintain an unbiased LLM is evident, with users like ai_and_sports_fan and Earth-Jupiter-Mars expressing distrust in the long-term neutrality of Grok and other AI systems, given past instances of censorship and manipulation.

Deep research is now out for all Plus Users! (Score: 287, Comments: 63): Sam Altman announced via a tweet that "deep research" is now accessible to ChatGPT Plus users, calling it one of his favorite releases. The tweet garnered significant attention with 31.5K views, 261 retweets, 103 quote tweets, and 1.1K likes.
Users discussed the monthly limit for deep research, with confirmation that Plus users have a limit of 10 uses per month, while Pro users receive 120 uses. There was confusion about usage counts, but it was clarified that follow-up questions do not count against the limit.
Some users expressed disappointment with the feature, citing inaccuracies, such as incorrect Nvidia stock prices. Others shared successful use cases, like using AI to create a custom Music LLM with MusicGen and Replicate.com.
Several users faced access issues, with suggestions to log out and back in or switch to the desktop version to resolve it. The feature's availability varied, with some users still unable to access it despite being Plus users.

We are rolling out a version of Advanced Voice powered by GPT-4o mini to give all ChatGPT free users a chance to preview it daily across platforms. (Score: 115, Comments: 28): OpenAI is rolling out a version of Advanced Voice powered by GPT-4o mini for all ChatGPT free users, allowing daily previews across platforms. The conversation pace and tone are similar to the GPT-4o version, but it is more cost-effective, as noted in a tweet that has received 3.3K views.
Source Link: A source link to the announcement tweet by OpenAI can be found here.
User Concerns: Users are questioning the functionality and limitations of the new feature, such as whether it can read for more than 4 minutes without restarting, and expressing dissatisfaction with the current rate limit for video sharing.
Feature Requests: Users are requesting additional features, such as making the Operator available for free and introducing Advanced Memory capabilities.

AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking

Theme 1. Claude 3.7 Sonnet Storms the AI Scene

Sonnet 3.7 Unleashes Coding Chaos: Anthropic's Claude 3.7 Sonnet is making waves with its superior coding abilities, particularly in agentic tasks, leading to user excitement and rapid integration into tools like Cursor IDE and Aider.  Users are reporting significant performance boosts, especially in front-end development and complex problem-solving, but some debate whether the reported 3x price increase for "thinking tokens" is justified given the performance gains.
Thinking Mode Unveiled, But Not Without Quirks: Claude 3.7 Sonnet introduces a new 'thinking mode' with up to 64,000 output tokens, visible in tools like Sage, allowing users to observe the model's reasoning process through 'thinking' tags. However, some users are experiencing issues with context window management and rule adherence in Cursor, and others note a 10-second delay in output display with O3 models, although most agree the overall performance is a major upgrade.
Claude Code Challenges Aider's Code Editing Crown:  Anthropic's release of Claude Code, a terminal-based agentic coding tool, is seen by some as an Aider clone, but early reports suggest it excels at code assistance, outperforming Aider in complex error resolution tasks, such as fixing 21 compile errors in Rust in one go.  The tool is currently a limited research preview separate from Anthropic subscriptions, sparking discussions about caching mechanisms and potential cost implications, with some users reporting "astronomical Anthropic costs" recently.

Theme 2. DeepSeek's Deep Dive into Model Efficiency

MLA: Shrinking KV Cache, Expanding Horizons: DeepSeek AI's Multi-Head Latent Attention (MLA) is gaining attention for its potential to drastically reduce KV cache size by 5-10x, with papers like MHA2MLA and TransMLA exploring its implementation in models like Llama. While early results show mixed performance impacts (1-2% performance drop in some cases, enhancement in others), the significant memory savings make MLA a promising avenue for efficient inference, particularly for larger models.
DeepEP: Open-Sourcing MoE Training's Secret Sauce:  DeepSeek has released DeepEP, the first open-source EP communication library designed for efficient all-to-all communication in Mixture of Experts (MoE) model training and inference.  This library enables efficient expert parallelism and supports FP8, potentially democratizing access to advanced MoE model architectures and training techniques.
DeepScaleR: RL Supercharges Smaller Models:  DeepScaleR, fine-tuned from Deepseek-R1-Distilled-Qwen-1.5B using simple Reinforcement Learning (RL), achieved 43.1% Pass@1 accuracy on AIME2024, demonstrating that RL techniques can significantly boost the performance of smaller models, potentially surpassing larger models like O1 Preview in specific tasks.

Theme 3. Open Source Tooling and Ecosystem Growth

OpenRouter Opens Gates to Claude 3.7 and Beyond: OpenRouter has rapidly integrated Claude 3.7 Sonnet, priced at $3 per million input tokens and $15 per million output tokens (thinking tokens included), and plans to support Claude 3.7's extended thinking feature soon. It also offers other models such as o3-mini-high, giving users a single point of access to multiple providers that can bypass rate limits, with coding sessions reportedly costing around $3 for 2 hours.
QuantBench Quantifies Quantization Speed:  The release of QuantBench on GitHub is accelerating quantization workflows, demonstrated by its use in creating the Qwen 2.5 VL 7B GGUF quant, available on Hugging Face. This tool, tested with the latest llama.cpp and CLIP hardware acceleration, simplifies and speeds up the process of model quantization, making efficient model deployment more accessible.
MCP Registry API: Standardizing AI Agent Development: Anthropic's announcement of the official MCP registry API is hailed as a significant step towards standardizing Model Context Protocol (MCP) development. This API aims to become the source of truth for MCPs, promoting interoperability and streamlining integration efforts for AI applications and agents, with community projects like opentools.com/registry already leveraging it.

Theme 4. Benchmarking Battles: Models Face Real-World Tests

Kagi's Benchmarks Crown Gemini 2.0 Pro, But Sonnet Still Strong: According to the Kagi LLM Benchmarking Project, Google's gemini-2.0-pro-exp-02-05 achieved 60.78% accuracy, outperforming Anthropic's claude-3-7-sonnet-20250219 at 53.23% and OpenAI's gpt-4o at 48.39%. Claude Sonnet 3.7 still shows strong performance elsewhere, particularly on the Aider polyglot leaderboard, where it scored 65% using thinking tokens. These benchmarks highlight the dynamic landscape of LLM performance and the ongoing race for accuracy and efficiency.
Misguided Attention Eval Exposes Overfitting Weakness: The Misguided Attention Eval is being used to test LLMs' reasoning abilities in the presence of misleading information, specifically targeting overfitting.  Sonnet-3.7 benchmarked as the top non-reasoning model in this evaluation, nearly surpassing o3-mini, suggesting it exhibits robust performance even when confronted with deceptive prompts.
SWE Bench Sees Claude 3.7 Grab Top Spot: Claude 3.7 Sonnet is now leading on the SWE bench, demonstrating its prowess in software engineering tasks.  Its capabilities extend to active code collaboration, including searching, editing, testing, and committing code to GitHub, solidifying its position as a top contender for coding-related applications.

Theme 5. Hardware Horizons: From Brains to Silicon

Brain's Parallelism Puzzles GPU Architects: Discussions are comparing the brain's stateful parallel processing to GPU efficiency, suggesting that current RNN architectures, while leveraging parallel processing, do not fully capture the brain's capabilities and may not scale optimally for LLMs.  The consensus is that extremely tuned architectures and inductive biases, inspired by the brain, may be more crucial than simply scaling up model size for future advancements.
Speculative Decoding Speeds Up LM Studio:  Users are exploring speculative decoding in LM Studio, particularly with Llama 3.1 8B and Llama 3.2 1B models, as documented in LM Studio's documentation. This technique, which uses a smaller "draft" model to predict tokens for a larger model, promises to significantly increase generation speed without compromising response quality, enhancing the efficiency of local LLM inference.
M2 Max Still a Power Sipper Compared to M4 Max:  While the M4 Max is the latest from Apple, some users are sticking with the M2 Max, citing concerns about the M4 Max's high power consumption, reaching 140W, compared to the M2 Max's more efficient 60W.  For users with sufficient performance from the M2 Max, especially those running locally, the power efficiency and availability of refurbished models make it a compelling alternative.

PART 1: High level Discord summaries

Cursor IDE Discord

Claude 3.7 Sonnet Triggers Coding Boom: Claude 3.7 Sonnet is being rolled out in Cursor IDE with users reporting superior coding capabilities, especially in real-world agentic tasks.
Enthusiastic users proclaimed "Sleeping has become optional" and are rapidly integrating the model.

MCPs Supercharge Claude's Coding Abilities: Members are combining MCP (Model Context Protocol) servers, such as Perplexity search and browser tools, with custom instructions to boost Claude 3.7's reasoning and coding capabilities in Cursor.
One user forked the sequential thinking MCP with their own tweaks, highlighting the benefits of combining custom instructions with MCP servers.

Installation Tips and Tricks Released for Cursor: Users shared tips for installing and updating to Cursor 0.46.3 to access Claude 3.7, including manually adding the model and checking for updates, as well as links to direct downloads for various operating systems like Windows and macOS.
Several users noted difficulties with the auto-update feature, recommending manual download and installation for a smoother experience.

Sonnet 3.7 Reaches New SVG Heights: Many agreed that Sonnet 3.7 is a major upgrade, especially for frontend tasks and code generation, with members praising its ability to generate landing pages.
Members shared examples of complex tasks, like recreating X's UI or generating SVG code, being handled with ease.

Context Window Problems and The Rule Bloat: Several members noted issues with Claude 3.7 in Cursor, including difficulties with code indexing in workspaces, custom rules bloating the context window, and the model sometimes ignoring those rules.
Despite these challenges, most users found workarounds and praised the model's overall performance.

aider (Paul Gauthier) Discord

Sonnet 3.7 Steals Aider's Spotlight: Claude 3.7 Sonnet hit a 65% score on the Aider polyglot leaderboard, utilizing 32k thinking tokens.
Some are debating if the performance increase justifies the reported 3x price hike for Sonnet 3.7 when using thinking tokens.

Anthropic drops Claude Code Aider-Clone: Anthropic released Claude Code, considered by some to be an Aider clone.
Members are reporting the superiority of code quality and are hopeful for the future of Claude 3.7 compared to OpenAI.

Unlock O3-Mini via OpenRouter: The o3-mini-high model, optimized for STEM reasoning tasks, can be accessed through OpenRouter; it is the same as o3-mini with reasoning effort set to high.
Coding sessions could cost around $3 for 2 hours of use via OpenRouter, which can bypass rate limits and offers a single point of access to multiple providers.

HN Profile Gets Roasted by LLM: Claude Sonnet 3.7 can now analyze your Hacker News profile to give highlights and trends.
A member described the LLM's deep dive into their post history as a 'roast' that was allegedly scary accurate.

Gemini 2.0 Pro Outpaces Rivals, per Kagi: According to the Kagi LLM Benchmarking Project, Google's gemini-2.0-pro-exp-02-05 achieved 60.78% accuracy, surpassing Anthropic's claude-3-7-sonnet-20250219 at 53.23% and OpenAI's gpt-4o at 48.39%.
Gemini 2.0 Pro also showed a median latency of 1.72s and a speed of 51.25 tokens/sec, compared to Claude Sonnet 3.7's 2.82s and 54.12 tokens/sec, and GPT-4o's 2.07s and 4 tokens/sec.

Codeium (Windsurf) Discord

Vim Chat Plagued by Issues: A user reported issues starting Codeium Chat in Vim via a Putty SSH session, facing connection errors when attempting to access the provided URL in a browser.
The error message indicated that "This site can't be reached 127.0.0.1 refused to connect".

Windsurfers Await Claude 3.7 Arrival: Members are eagerly anticipating the integration of Claude 3.7 into Windsurf, expressing frustration over the perceived delay compared to platforms like Cursor and T3, and requesting its addition ASAP.
Members argued that "windsurf should go and be early tester"; the devs are reportedly cooking to push Claude 3.7 into production, with a possible release by end of day.

Deepseek Hallucinates User Prompts: A user reports Deepseek hallucinating user requests and then starting to implement changes based on those hallucinated requests.
The AI bot invented its own user prompt and then started to implement changes based on that hallucinated user prompt 😆.

Windsurf Dev Comms Draw Fire: Users are frustrated by the perceived lack of communication from the Windsurf devs regarding the Claude 3.7 integration, with one user noting, "part of the frustration is there is no comms from the devs".
Other users defended Windsurf, seeing little commercial risk in releasing once the integration is more stable: "being fast at implementing things doesn't mean it's solid".

MCP Server Practicality Queried: Users discussed practical uses for the MCP server, with examples including integrating Jira tickets, sharing of custom apps, and utilizing cloud services.
Members asked, "What do you guys use MCP server for, practically? Are there real life examples that makes your life really easy? Can't think of any."

OpenAI Discord

Grok 3 Talks Too Much: Members find Grok 3 too verbose even when prompted for concise responses, though it proves to be a powerhouse in coding and creativity.
One member noted that they are switching to Grok because it is less censored out of the gate.

Perplexity Plans Agentic Comet: Perplexity is launching Comet, a new agentic browser, similar to The Browser Company's work.
The agentic browser space is heating up with more competitors.

Claude 3.7 Arrives with New Coding Power: Anthropic just dropped Claude 3.7 Sonnet, which shows improvements in coding and front-end web development, and also introduced Claude Code, a command line tool for agentic coding.
One user pointed out that the model's knowledge cutoff date is February 19, 2025.

Claude Code Enters the Terminal: Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster through natural language commands.
However, it is a limited research preview and is separate from the Pro or Anthropic subscriptions.

O3 exhibits 10 second delay: A user reported issues with O3, where it indicates reasoning success but then delays displaying the full text for up to 10 seconds, affecting various models including O1 Pro.
They mentioned experiencing these problems consistently between 3pm-7pm EST, with text sometimes appearing on different devices than expected.

Unsloth AI (Daniel Han) Discord

Tax Evasion Talk Results in Timeout: A user was muted for discussing tax avoidance strategies, as giving tax avoidance recommendations is against the rules; some users pointed out the implications for invoicing.
A user responded, "the company i was billing invoice too told me stupid that i was reporting income".

CUDA Kernel Causes Colab Catastrophe: A user reported a CUDA error (illegal memory access) on Google Colab with T4, suggesting trying setting CUDA_LAUNCH_BLOCKING=1 and compiling with TORCH_USE_CUDA_DSA for debugging, as per PyTorch documentation.
Another user reported weird spikes in grad norm up to 2000, suggesting the model might be broken.
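For the illegal-memory-access report above, here is a minimal sketch of the first suggestion. CUDA kernel launches are asynchronous, so the Python traceback often points at the wrong line; setting CUDA_LAUNCH_BLOCKING=1 before CUDA initializes makes the error surface at the offending call. TORCH_USE_CUDA_DSA is described by the PyTorch error message as a compile-time option, so it is not set here. The tensor operation is only a placeholder workload, not the user's code.

```python
import os

# Must be set before the CUDA context is created; doing it before importing
# torch is the safest ordering in a notebook.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

try:
    import torch
except ImportError:  # keeps the sketch runnable on machines without PyTorch
    torch = None

if torch is not None and torch.cuda.is_available():
    # Placeholder workload: with blocking launches, a faulty kernel would
    # raise here rather than at some later, unrelated line.
    x = torch.ones(4, device="cuda")
    print(x.sum().item())
```

In Colab the runtime usually needs a restart first, since the variable has no effect once CUDA has already been initialized in the session.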

Qwen2.5 VL 72B Eats Memory Alive: A user faced out-of-memory errors trying to run Qwen2.5 VL 72B on 48GB with a 32K context length, then successfully loaded it with 8k context length after being advised to try 8k or quantize the KV cache to fp8.
The user noted it was necessary to extract the thinking traces from the model.
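A rough way to see why dropping from 32K to 8K context, or quantizing the KV cache to fp8, helps: the cache grows linearly with both context length and bytes per element. The sketch below uses hypothetical dimensions (80 layers, 8 KV heads, head size 128) chosen to resemble a large grouped-query-attention model; they are assumptions, not values read from the Qwen2.5 VL 72B config, and weights, activations, and vision tokens are ignored.

```python
def kv_cache_gib(seq_len, bytes_per_elem, n_layers=80, n_kv_heads=8, head_dim=128):
    """KV-cache size in GiB for one sequence (keys and values, every layer)."""
    # Factor 2 = one key vector plus one value vector per token, head, and layer.
    elems = 2 * n_layers * n_kv_heads * head_dim * seq_len
    return elems * bytes_per_elem / 2**30

# Assumed settings: fp16 uses 2 bytes per element, fp8 uses 1.
for label, ctx, nbytes in [("32K fp16", 32768, 2), ("8K fp16", 8192, 2), ("32K fp8", 32768, 1)]:
    print(f"{label}: {kv_cache_gib(ctx, nbytes):.2f} GiB")
```

Under these assumed dimensions the cache shrinks from 10 GiB to 2.5 GiB at 8K context, or to 5 GiB when the 32K cache is stored in fp8.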

DeepSeek MLA ported to Llama via TransMLA: Users explored implementing DeepSeek's Multi-Head Latent Attention (MLA) on a Llama model, suggesting retraining, but others pointed to fxmeng/TransMLA, a post-training conversion method from GQA to MLA.
The linked paper is called Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs.

rslora role in Rank Stability: The use of rslora addresses numerical stability in high rank scenarios, but a user cautioned that if r/a = 1, rslora can worsen things, advising to keep r/a = 1 and skip rslora.
The team stated that rslora performs a single sqrt and requires a correction term if the rank gets too big.
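The caution about r/a = 1 is easier to see from the scaling factors themselves. Standard LoRA scales the adapter update by alpha / r, while rank-stabilized LoRA uses alpha / sqrt(r). The helper below is a hypothetical illustration (the `use_rslora` flag name is borrowed for readability and is not a call into any library).

```python
import math

def lora_scale(alpha, r, use_rslora=False):
    """Multiplier applied to the low-rank update B @ A."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r

for r in (8, 64, 256):
    alpha = r  # the r/a = 1 setting discussed above
    print(r, lora_scale(alpha, r), lora_scale(alpha, r, use_rslora=True))
```

With alpha equal to r, the standard factor stays at 1 for every rank, whereas the rsLoRA factor grows as sqrt(r), so turning rsLoRA on without also lowering alpha effectively multiplies the update size.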

OpenRouter (Alex Atallah) Discord

Claude 3.7 Sonnet Lands on OpenRouter!: Claude 3.7 Sonnet is now available on OpenRouter with best-in-class performance in mathematical reasoning, coding, and complex problem-solving.
The pricing is set at $3 per million input tokens and $15 per million output tokens, including thinking tokens, with full caching support at launch.
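As a quick worked example of that pricing, the sketch below bills thinking tokens at the output rate, as stated above. It ignores caching discounts and any provider fees, and the token counts in the usage line are made up for illustration.

```python
INPUT_USD_PER_M = 3.0    # USD per million input tokens
OUTPUT_USD_PER_M = 15.0  # USD per million output tokens (thinking tokens included)

def request_cost_usd(input_tokens, output_tokens, thinking_tokens=0):
    """Estimated cost of one request, without caching discounts."""
    billed_output = output_tokens + thinking_tokens
    return (input_tokens * INPUT_USD_PER_M + billed_output * OUTPUT_USD_PER_M) / 1_000_000

# Hypothetical session: 200k tokens in, 20k visible output, 30k thinking tokens.
print(f"${request_cost_usd(200_000, 20_000, 30_000):.2f}")
```

In this made-up session the 50k billed output tokens cost more than the 200k input tokens, which is why long thinking budgets dominate the bill.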

Extended Thinking Feature Coming Soon: The Extended Thinking feature is coming soon to the OpenRouter API, which enables step-by-step processing for complex tasks, as detailed in Anthropic's documentation.
OpenRouter is actively working on implementing full support for Claude 3.7's extended thinking feature, which does not currently support pre-fills, aiming for launch soon with updated documentation.

GCP Gears Up for Claude 3.7: Google Cloud Platform (GCP) is preparing to support Claude 3.7 Sonnet, launching in us-east5 and europe-west1 with model ID claude-3-7-sonnet@20250219.
Users are reminded that the model features a hybrid reasoning approach, offering both standard and extended thinking modes and maintaining performance parity with its predecessor in standard mode.

OpenRouter Revs Up Claude 3.7 Throttling: OpenRouter increased the TPM (tokens per minute) for anthropic/claude-3.7-sonnet, while anthropic/claude-3.7-sonnet:beta has a lower TPM initially, set to increase as users migrate from 3.5.
The model has a 200,000 token context window, though some users feel its output pricing might cause complaints.

API Key Credits Safety Clarified: Users are reminded that API keys do not contain credits; deleting a key only revokes access, and credits remain tied to the account.
Lost keys cannot be recovered due to security measures.

Interconnects (Nathan Lambert) Discord

Meta AI Expands to MENA: Meta AI has expanded to the Middle East and North Africa (MENA), supporting Arabic on Instagram, WhatsApp, and Messenger.
This expansion opens the chatbot to millions more users in the region.

Claude 3.7 Sonnet launches with Thinking Mode: Anthropic launched Claude 3.7 Sonnet, a hybrid reasoning model with step-by-step thinking, and Claude Code, a command line tool for agentic coding, priced at $3 per million input tokens and $15 per million output tokens.
Researchers noted Claude's thought process as eerily similar to their own, exploring different angles and double-checking answers, showcasing improvements using parallel test-time compute scaling on the GPQA evaluation.

Qwen Chat reasoning model Released: Alibaba Qwen released "Thinking (QwQ)" in Qwen Chat, backed by their QwQ-Max-Preview, which is a reasoning model based on Qwen2.5-Max, licensed under Apache 2.0.
The model will come in smaller variants, e.g., QwQ-32B, for local deployment, with a viral Twitter demo showcasing improved math, coding, and agent capabilities.

Berkeley Advanced Agents MOOC Features Tulu 3: The "Berkeley Advanced Agents" MOOC features Hanna Hajishirzi discussing Tulu 3 today, May 30th, at 4PM PST, with a link to the YouTube video.
The MOOC has been gaining traction as a great resource for engineers interested in agents.

Google's Co-Scientist fed Team's Prior Work: Google's Co-Scientist AI tool, based on the Gemini LLM, had been fed a 2023 paper by the team it was assisting, including a version of the hypothesis that the AI tool later suggested as a solution.
The article highlighted that the BBC coverage failed to mention that the AI tool had been given the answer, raising eyebrows.

Eleuther Discord

Parallel Brains Outpace Tuned GPUs: Discussions compared the brain's stateful parallel processing to GPU efficiency, noting current RNN architectures, which differ from human processing, cannot scale to LLM level and should be data efficient.
Members concluded that extremely tuned architectures become more relevant than simply scaling up when drawing inspiration from the brain.

Proxy Engine Structures LLM Chaos: The Proxy Structuring Engine (PSE) was introduced to address structural inconsistencies in LLM outputs, providing inference-time steering for creative freedom.
The engine enforces structure boundaries and it is fit for use cases like Advanced Agents & Chatbots, Data Pipelines & APIs, and Automated Code Generation.

Wavelet Coding Tokenizes Image Generation: A new approach to autoregressive image generation based on wavelet image coding and a variant of a language transformer is detailed in this paper.
The transformer learns statistical correlations within a token sequence, reflecting correlations between wavelet subbands at various resolutions.

MLA Squeezes KV Cache: Two papers, MHA2MLA and TransMLA, explore adapting models to Multi-head Latent Attention (MLA), significantly reducing KV cache size (5-10x).
While one paper showed deteriorated performance (1-2%), the other showed enhanced performance, suggesting MLA could be non-inferior to MHA, especially with larger models and more parameters.

Mixed Precision Toggles Optimizer Defaults: During mixed precision training with BF16, the master FP32 weights typically reside in GPU VRAM, unless ZeRO offload is enabled.
It is common to store the first and second Adam moments in bf16 while keeping master weights in fp32, unless the momentum/variance states are sharded via ZeRO.
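The memory impact of the moment precision can be estimated per parameter. This is a back-of-the-envelope sketch under stated assumptions: bf16 working weights and gradients, an fp32 master copy, no ZeRO sharding or offload, and a hypothetical 7B-parameter model. Real frameworks differ in which of these buffers they keep.

```python
def train_state_bytes_per_param(moment_bytes):
    """Bytes of persistent training state per parameter with Adam."""
    working_weights = 2              # bf16 copy used in forward/backward
    grads = 2                        # bf16 gradients (assumption)
    master_weights = 4               # fp32 master copy
    adam_moments = 2 * moment_bytes  # first and second moment
    return working_weights + grads + master_weights + adam_moments

n_params = 7e9  # hypothetical model size
for label, moment_bytes in [("fp32 moments", 4), ("bf16 moments", 2)]:
    per_param = train_state_bytes_per_param(moment_bytes)
    print(f"{label}: {per_param} bytes/param, {per_param * n_params / 1e9:.0f} GB")
```

Under these assumptions, moving the two moments from fp32 to bf16 takes the state from 16 to 12 bytes per parameter.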

Nous Research AI Discord

LLMs Invoke Tools Autonomously: Some LLMs invoke tools without explicit token sequences, suggesting hard-coded patterns from training via reinforcement learning or SFT.
This token-saving approach's reliability compared to ICL remains unclear without benchmarks.

Claude 3.7 Sonnet Takes the SWE Crown: Claude 3.7 Sonnet leads on the SWE bench, enabling active code collaboration like searching, editing, testing, and committing code to GitHub.
A member suggested that 3.7 being a point release makes sense since Claude 3.5 was already a reasoning model, also hinting that future reasoning models will be 'crazy'.

QwQ-Max-Preview Aims for Deep Reasoning: QwQ-Max-Preview blog shows a model built on Qwen2.5-Max that excels in deep reasoning, math, coding, general domains, and agent tasks.
Speculation arose around key tokens in QwQ's reasoning traces resembling R1, suggesting it requires less compute.

Sonnet-3.7 Excels in Misguided Attention Eval: Sonnet-3.7 benchmarked as top non-reasoning model in Misguided Attention Eval, nearly surpassing o3-mini.
The user seeks to activate its thinking mode via the OR API, if feasible.

Qwen AI Adds Integrated Video Generation: The updated Qwen AI chat interface now features integrated video generation capabilities.
A member noted that the artifacts are still a bit clunky, like a half baked copy.

MCP (Glama) Discord

Anthropic Finally Delivers MCP Registry API: Anthropic announced the official MCP registry API, as seen on this tweet, to be the source of truth for MCPs, streamlining development and integration efforts with solutions like opentools.com/registry.
This API will help the community fill the source-of-truth gap for portable & secure code for AI Apps and Agents.

Claude 3.7 Debuts 'Thinking' Tags: Claude 3.7 has been released, featuring 64,000 output extended thinking tokens and a new 'latest' alias.
Users noted it is back to following long-ish system prompts and spotting social engineering, and that it also uses 'thinking' tags when using tools, adding a cute touch to its operation.

Claude Code Excels as Code Assistant: Claude Code (CC) is receiving high praise for its code assistance capabilities, outperforming tools like Aider in handling complex coding errors, such as resolving 21 compile errors in Rust in one shot.
Users are speculating on caching mechanisms and costs, with one user reporting astronomical Anthropic costs in the last 6 weeks.

MetaMCP Debates Open-Source Licensing: Concerns were raised regarding MetaMCP's licensing, with a user suggesting it might become a cloud SaaS, prompting the developer to seek feedback on licensing to prevent cloud monetization while keeping it self-hostable via the MetaMCP server GitHub repository.
A user suggested using AGPL licensing for MetaMCP to ensure contributions are open-sourced, also suggesting an additional clause allowing the company to sublicense under MIT-0.

Claude 3.7 Sonnet Shines on Sage: Claude 3.7 Sonnet with extended thinking capabilities is now on Sage, allowing users to see Claude's reasoning process as it tackles complex problems, including a thinking mode toggle (Command+Shift+T).
Other new features include default model settings, improved scrolling, and expandable thinking blocks.

LM Studio Discord

Qwen 2.5 VL Model Ready to Rumble: A working Qwen 2.5 VL 7B GGUF has arrived and is available on Hugging Face for immediate use.
Users report that it performs significantly better than llama3.2 vision 11b instruct and qwen2-vision 7b instruct, and works out of the box on the latest version of LM Studio.

QuantBench Accelerates Quantization: The Qwen 2.5 VL 7B GGUF quant was produced using QuantBench, now available on GitHub for accelerated quant workflows.
The model has been successfully tested on the latest llama.cpp build, with CLIP hardware acceleration enabled.

LM Studio Reveals Speculative Decoding Secrets: Users are exploring speculative decoding with Llama 3.1 8B and Llama 3.2 1B models in LM Studio, according to LM Studio's documentation.
The documentation claims that speculative decoding can substantially increase the generation speed of large language models (LLMs) without reducing response quality.
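The draft-and-verify idea can be sketched with toy stand-ins for the two models. This is a simplified greedy variant, not LM Studio's or llama.cpp's implementation: `target` and `draft` are hypothetical next-token functions over integer tokens, and a real system verifies all drafted positions in one batched forward pass instead of a Python loop.

```python
def greedy_decode(next_token, prompt, max_new):
    """Reference: plain greedy decoding with a single model."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(next_token(seq))
    return seq

def speculative_decode(target_next, draft_next, prompt, max_new, k=4):
    """Greedy speculative decoding; the result matches greedy_decode(target_next, ...)."""
    seq = list(prompt)
    goal = len(prompt) + max_new
    while len(seq) < goal:
        # 1) The small draft model proposes up to k tokens.
        proposal = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft_next(seq + proposal))
        # 2) The large target model checks each drafted position.
        for i, tok in enumerate(proposal):
            expected = target_next(seq + proposal[:i])
            if tok != expected:
                # First mismatch: keep the target's token, drop the rest of the draft.
                seq.extend(proposal[:i] + [expected])
                break
        else:
            seq.extend(proposal)  # every drafted token was accepted
    return seq

# Toy models: the draft agrees with the target except after token 5.
def target(seq):
    return (seq[-1] * 3 + 1) % 7

def draft(seq):
    return 0 if seq[-1] == 5 else target(seq)

print(speculative_decode(target, draft, [1], max_new=10))
print(greedy_decode(target, [1], max_new=10))
```

Because a drafted token is kept only when the target model would have chosen it anyway, the two printed sequences are identical; any speedup comes from how many drafted tokens are accepted per verification pass.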

Deepseek R1 671b Gorging RAM: Running Deepseek R1 671b locally needs serious RAM, with documentation specifying 192GB+; one helpful user suggested using a specific quantized version.
For those running on Macs, offloading approximately 70% of the model weights to the GPU may help.

M2 Max Sipping Power: Despite the shiny new M4 Max, one user decided to stick with their M2 Max, as the "M4 Max boosts way too hard easily pegged at 140w", and located a well priced refurbished M2 Max 96GB.
The user reports the M2 Max is sufficient for their needs, pulling only around 60W.

Stability.ai (Stable Diffusion) Discord

SD3 Ultra's Unseen Excellence: A user asked about SD3 Ultra, a comfy workflow based on SD3L 8B that delivers superior high-frequency detail.
Another member stated it still exists and is being used, implying it is not yet a public release.

Silence from Stability?: A member asked about updates on current projects or future plans, noting they haven't heard anything for a while from Stability AI.
Another member responded that nothing can be shared yet, but they are hopefully expecting announcements soon.

Dog Datasets Desired: A user requested alternative dog breed image datasets beyond the Stanford Dogs Dataset, which contains 20k images.
The user specifically needs images containing both the dog and its breed clearly labeled.

Image Generation Times Vary: Users discussed image generation times based on different hardware configurations, using various versions of Stable Diffusion.
Times ranged from around 1 minute on a GTX 1660s to 4-5s on a 3070ti using SD1.5, and 7 seconds for a 1280x720 image and 31 seconds for 1920x1080 at 32 steps with a 3060 TI.

Stability AI Solicits Suggestions: Stability AI launched a new feature request board to gather user feedback and prioritize future developments.
Users can submit and vote on feature requests directly from Discord using the /feedback command or through the new platform, aiming to ensure community voices shape future priorities.

Modular (Mojo 🔥) Discord

Mojo Conjures Graphics with GLFW/GLEW: Graphics programming in Mojo is feasible via FFI using a static library linked to GLFW/GLEW, evidenced by a Sudoku example.
A member suggested exposing only the needed calls via your own C/CPP library using alias external_call with a wrapped function, plus an example repo shows how to hijack the loader.

Mojo's magic install Faces lightbug_http Bug: Using lightbug_http dependency in a new Mojo project leads to an error with small_time.mojopkg after running magic install.
The error resembles a Stack Overflow question, hinting that small-time might be pinned to a specific version.

MAX's Game of Life gets Accelerated by Hardware: A member showcases a hardware-accelerated Conway's Game of Life by bridging MAX and Pygame, revealing a creative application, as shown in their attached conway.gif.
They demonstrated GPU use in their MAX implementation with a guns pattern, packed bit by bit and rendered using a naive pixel-by-pixel internal function; the output tensor is then cast into an np array and given to pygame to render, as demonstrated in their guns.gif.

Game of Life Creates Computer Architectures: A member shared a project (nicolasloizeau.com) about crafting a computer within Conway's Game of Life, demonstrating its Turing completeness via glider beams for logic gates.
A member also implemented wrapping in their Conway's Game of Life simulation using MAX, enabling the creation of spaceship patterns and showcasing the ability to add parameters to the model from the graph API, as showcased in their spaceship.gif.
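Wrapping means the grid edges are joined, so a spaceship that leaves one side re-enters on the opposite side. The sketch below shows the idea in plain Python using modulo arithmetic on coordinates; it is an illustration only and is unrelated to the member's MAX graph implementation.

```python
from collections import Counter

def step(live, rows, cols):
    """One Game of Life generation on a wrapped (toroidal) grid of live cells."""
    counts = Counter(
        ((r + dr) % rows, (c + dc) % cols)  # modulo joins opposite edges
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth on exactly 3 neighbours, survival on 2 or 3.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}

def run(live, rows, cols, generations):
    for _ in range(generations):
        live = step(live, rows, cols)
    return live

glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
print(sorted(run(glider, 6, 6, 4)))
```

A glider moves one cell diagonally every four generations, so on a 6x6 wrapped grid it crosses the edges and is back at its starting cells after 24 generations.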

Notebook LM Discord

NotebookLM Eases Use with PowerPoint Conversion: A user detailed a workaround to import physical books into NotebookLM by photographing pages, converting the PDF to PowerPoint, uploading to Google Slides, and importing the slides.
They observed that NotebookLM can process text images in slides, but not directly from PDF files.

Language Prompts Misfire on German: A user reported issues getting NotebookLM hosts to speak German, even with specific prompts requesting German.
The hosts spoke English or gibberish, sometimes starting in German before switching, indicating potential issues with language prompt accuracy.

Savin/Ricoh Copier Revives Book Scanning: A user advised scanning books to PDF using a Savin/Ricoh copier and uploading to NotebookLM.
They affirmed that even with poor source text quality, NLM accurately answered questions about the scanned document.

Users Request Language Customization: A user inquired about the feasibility of changing the language in NotebookLM without altering the Google account language.
This points to a demand for language customization to improve user experience and cater to diverse linguistic preferences.

Claude 3.7 Ignites Model Choice Fantasies: A user expressed enthusiasm for Claude 3.7 and desired the option to select models in NotebookLM.
Another user questioned the impact of model choice, sparking a discussion on the implications of model variety for the end user experience.

LlamaIndex Discord

LlamaIndex Unveils AI Assistant in Docs: LlamaIndex announced the release of an AI assistant directly within their documentation.
The new assistant aims to provide immediate, contextual support to users navigating the LlamaIndex ecosystem.

ComposIO HQ Drops a Banger: LlamaIndex highlighted another new release from ComposIO HQ, though specifics of the release were unmentioned.
This indicates ongoing development and feature enhancements within the ComposIO framework, a tool useful for LLM orchestration.

AnthropicAI Releases Claude Sonnet 3.7: AnthropicAI launched Claude Sonnet 3.7, with LlamaIndex offering immediate support.
Users can access the new model by running pip install llama-index-llms-anthropic --upgrade and reviewing Anthropic's announcement.

Fusion Rerank Retriever Demands Initialized Nodes: A user reported issues initializing the BM25 retriever within a fusion rerank retriever setup with Elasticsearch because the docstore was empty.
Another member clarified that BM25 requires nodes to be saved to disk or another location for initialization, as it cannot initialize directly from the vector store.

MultiModalVectorStoreIndex Throws File Error: A user encountered a [Errno 2] No such file or directory error when creating a multimodal vector index using MultiModalVectorStoreIndex with GCSReader.
The error occurred with image files present in the GCS bucket, while PDF documents were processed successfully, indicating a potential issue with image file handling.

Torchtune Discord

Truncation Troubles: Left Prevails: Members debated the use of left truncation seq[-max_seq_len:] vs right truncation seq[:max_seq_len] during finetuning, with interesting graphs.
The final decision involved exposing both methods but defaulting to left truncation for SFT in torchtune.
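The two slicing expressions above can be sketched as a single helper. The function name truncate and the side parameter are hypothetical and are not torchtune's API; only the two slices come from the discussion.

```python
def truncate(tokens, max_seq_len, side="left"):
    """Trim a token list to at most max_seq_len items.

    side="left" drops tokens from the start and keeps the tail,
    i.e. tokens[-max_seq_len:].
    side="right" drops tokens from the end and keeps the head,
    i.e. tokens[:max_seq_len].
    """
    if max_seq_len <= 0:
        # tokens[-0:] would return the whole list, so reject this case.
        raise ValueError("max_seq_len must be positive")
    if side == "left":
        return tokens[-max_seq_len:]
    if side == "right":
        return tokens[:max_seq_len]
    raise ValueError("side must be 'left' or 'right'")
```

Left truncation keeps the most recent tokens of a sequence, while right truncation keeps its beginning; sequences already within the limit come back unchanged either way.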

StatefulDataLoader Support: Merge Incoming: A member is requesting review for their PR adding support for the StatefulDataLoader class in torchtune.
The new dataloader would make data loading stateful, so that its position can be checkpointed and restored when training resumes.
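As a rough illustration of what a stateful loader buys you, here is a pure-Python sketch. ResumableBatches is a hypothetical stand-in, not torchtune's or torchdata's implementation; the state_dict and load_state_dict method names simply follow the usual PyTorch checkpointing convention.

```python
class ResumableBatches:
    """Toy batch iterator whose position can be saved and restored."""

    def __init__(self, data, batch_size):
        self.data = list(data)
        self.batch_size = batch_size
        self._next_index = 0  # index of the first item of the next batch

    def __iter__(self):
        return self

    def __next__(self):
        if self._next_index >= len(self.data):
            raise StopIteration
        start = self._next_index
        self._next_index += self.batch_size
        return self.data[start:start + self.batch_size]

    def state_dict(self):
        """Capture just enough state to resume from the same position."""
        return {"next_index": self._next_index}

    def load_state_dict(self, state):
        """Restore a position captured earlier by state_dict()."""
        self._next_index = state["next_index"]
```

Saving state_dict() alongside a model checkpoint and loading it into a fresh loader lets a resumed run continue with the next unseen batch instead of restarting the epoch.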

DeepScaleR Scales with RL: DeepScaleR was finetuned from DeepSeek-R1-Distill-Qwen-1.5B using simple reinforcement learning (RL).
DeepScaleR achieved 43.1% Pass@1 accuracy on AIME2024.

DeepSeek Opens EP Communication Library: DeepSeek introduced DeepEP, the first open-source expert-parallel (EP) communication library for MoE model training and inference.
The communication library enables efficient all-to-all communication.
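For readers unfamiliar with the pattern, all-to-all means every rank sends one chunk to every other rank, which is how tokens reach experts hosted on other devices in an MoE layer. The sketch below simulates that exchange in a single process with plain lists; the all_to_all function is a hypothetical illustration and has nothing to do with DeepEP's actual API or kernels.

```python
def all_to_all(send_buffers):
    """Simulate an all-to-all exchange between ranks.

    send_buffers[src][dst] is the chunk that rank `src` sends to rank `dst`.
    The result is indexed the other way round: result[dst][src] is the chunk
    that rank `dst` received from rank `src`.
    """
    world_size = len(send_buffers)
    for row in send_buffers:
        if len(row) != world_size:
            raise ValueError("each rank must provide one chunk per rank")
    return [
        [send_buffers[src][dst] for src in range(world_size)]
        for dst in range(world_size)
    ]
```

In effect the exchange transposes the send matrix: after it, each rank holds exactly the chunks addressed to it, one from every sender.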

Cohere Discord

Validators Ponder Profitability Threshold: A member inquired about the profitability threshold for Proof of Stake (PoS) validators within the Decentralized Science (DeSci) field.
Another member responded with "pool validator node", hinting at the importance of pool participation for validators.

Asset Expert Gets Labeled: The bot posted about an "asset value expert account" which was labelled as "nazi".
No further context was given.

DSPy Discord

DSPy Simplifies Assertion Migration: DSPy users can now use dspy.BestOfN or dspy.Refine modules to streamline migration from 2.5-style Assertions.
The dspy.BestOfN module retries a module up to N times, selecting the best reward and halting upon reaching a specified threshold.
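The retry-and-select behaviour described above can be sketched in plain Python. The best_of_n function, its parameters, and the run_once callable are hypothetical; this is a sketch of the idea, not dspy.BestOfN's signature or implementation.

```python
def best_of_n(run_once, reward_fn, n, threshold):
    """Call run_once up to n times and keep the highest-reward output.

    Stops early as soon as an output's reward reaches `threshold`.
    Returns a (best_output, best_reward) tuple.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    best_output, best_reward = None, float("-inf")
    for attempt in range(n):
        output = run_once(attempt)
        reward = float(reward_fn(output))  # bool rewards become 0.0 or 1.0
        if reward > best_reward:
            best_output, best_reward = output, reward
        if reward >= threshold:
            break  # good enough, stop retrying
    return best_output, best_reward
```

If no attempt reaches the threshold, the sketch still returns the best output seen across all n attempts.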

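DSPy Crafts Reward Functions: DSPy's reward functions can now return scalar values such as float or bool, which allows customized evaluation of module outputs.
A sample reward function was shown: def reward_fn(input_kwargs, prediction): return len(prediction.field1) == len(prediction.field1). As quoted, it compares field1 with itself and therefore always returns True, so the second operand was presumably meant to be a different field.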
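To make the two return types concrete, here is a small sketch of one bool-valued and one float-valued reward function using the (input_kwargs, prediction) shape quoted above. The names field2 and max_chars are assumptions made up for the example, and SimpleNamespace stands in for a real DSPy prediction object.

```python
from types import SimpleNamespace


def same_length_reward(input_kwargs, prediction):
    """Bool reward: True when two output fields have equal length.

    `field2` is a hypothetical second field, not taken from the source.
    """
    return len(prediction.field1) == len(prediction.field2)


def brevity_reward(input_kwargs, prediction):
    """Float reward in (0, 1]: 1.0 when field1 fits the character budget.

    `max_chars` is a hypothetical input used only for this illustration.
    """
    limit = input_kwargs["max_chars"]
    return min(1.0, limit / max(len(prediction.field1), 1))


# Stand-in for a module's prediction object.
example = SimpleNamespace(field1="abc", field2="xyz")
```

A bool reward suits pass/fail checks, while a float reward lets a selection loop rank several imperfect attempts against each other.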

The tinygrad (George Hotz) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

PART 2: Detailed by-Channel summaries and links

The full channel by channel breakdowns have been truncated for email.
If you want the full breakdown, please visit the web version of this email.
If you enjoyed AINews, please share with a friend! Thanks in advance!

Don't miss what's next. Subscribe to AI News (moved to news.smol.ai).
