[AINews] not much happened today
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
a quiet day
AI News for 3/27/2025-3/28/2025. We checked 7 subreddits, 433 Twitters and 30 Discords (230 channels, and 13422 messages) for you. Estimated reading time saved (at 200wpm): 1217 minutes. You can now tag @smol_ai for AINews discussions!
We soft launched the 2025 State of AI Engineering survey today, fill it out to join our $1000 Amazon gift card raffle + have your voice heard in the state of AI Eng!
The Table of Contents and Channel Summaries have been moved to the web version of this email!
AI Twitter Recap
Here's a summary of the tweets, organized by topic:
GPT-4o Model Performance and Features
- GPT-4o's improved coding and instruction following were praised: @sama highlighted the new version of GPT-4o for being particularly good at coding, instruction following, and freedom. @kevinweil agreed, stating the GPT-4o update is strong and encouraged users to try it.
- GPT-4o's performance relative to other models, particularly in coding and reasoning, was assessed: @ArtificialAnlys reported that GPT-4o (March 2025) is now the leading non-reasoning coding model, surpassing DeepSeek V3 and Claude 3.7 Sonnet in the Artificial Analysis Coding Index, and is #1 in LiveCodeBench. However, it still lags behind reasoning models like o3-mini.
- Concerns about policy compliance: @joannejang noted that image generation refusals are often due to the model hallucinating policies. They asked users to bear with them as they try to get the model to follow the policy and suggested trying again in a new chat if encountering issues.
- @nrehiew_ hypothesized that 4o image generation works by embedding the image directly via an encoder, generating autoregressively, and then diffusing out based on the autoregressive hidden states; they claimed the visible blur is a psyop and there's no VQ.
- GPT-4o's transparency and background generation feature were highlighted: @giffmana noted the ability to ask GPT-4o image gen for transparent backgrounds, calling it a cool feature drowned out by Ghiblification hype.
Gemini 2.5 Pro Model Performance and Capabilities
- Gemini 2.5 Pro was lauded for its capabilities in audio and video understanding: @_philschmid reported that Gemini 2.5 Pro has improved long context capabilities and can process ~1h long video with a single request, noting the integration of YouTube links into AIS and API. The model can also handle ~2 hours of podcast transcription in a single request.
- Simple-Bench Performance: @scaling01 mentioned Gemini 2.5 Pro Thinking scored around 51.6% on AI Explained's Simple-Bench, making it the first model to score above 50%.
- Accessibility and Usage: @_philschmid announced that users can bring their own API Key to @cursor_ai to use Gemini 2.5 Pro, but noted that rate limits are currently low. They also mentioned that Gemini 2.5 Pro is available in @windsurf_ai.
AI Infrastructure and Compute
- GPU usage is expected to increase significantly: @saranormous stated that they are going to use all the GPUs (and TPUs).
- Together AI and Hypertec Group are partnering to deliver large-scale GPU clusters: @togethercompute announced a partnership with @HypertecGroup to deliver clusters of thousands of GPUs, emphasizing high-bandwidth networking, advanced cooling, and robust fault tolerance.
- CoreWeave's IPO: @weights_biases congratulated @CoreWeave on their IPO, highlighting their success in pushing the edge of what’s possible in AI infrastructure.
AI Engineering and Development
- Concerns regarding conventional programming languages over vibe coding: @lateinteraction emphasized the importance of retaining useful aspects of conventional programming languages, such as defining functions, control flow, and modules, rather than giving in to "vibe coding".
- Importance of open-source in medical AI: @iScienceLuvr highlighted the crucial role of open-source in medical AI due to the need for transparency and the impracticality of sending sensitive patient data to cloud APIs.
- Emphasizing scalable solutions for ASI: @teortaxesTex pointed out a statement about building scalable solutions to ASI, focusing on improvements with more resources on computation and data.
- LangChain and Redis Integration: @LangChainAI announced that with langgraph-checkpoint-redis, you can bring @Redisinc's powerful memory capabilities to your LangGraph agents.
Company and Product Announcements
- New homepage for Keras: @fchollet announced the launch of a brand new homepage for Keras to celebrate its 10th anniversary.
- C.H. Robinson saves time with LangGraph: @LangChainAI reported that C.H. Robinson is saving 600+ hours a day using tech built with LangGraph, LangGraph Studio, and LangSmith to automate routine email transactions.
- Launch of the MIT NLP Group account: @lateinteraction announced the launch of the @nlp_mit account to showcase the latest NLP research from MIT labs.
- Perplexity AI Thread Infrastructure Issues: @AravSrinivas mentioned that Perplexity AI is going through some infra challenges, which is why past threads are not loading.
Humor/Memes
- Various humorous tweets: Several users shared humorous content, including @Teknium1 posting "Jensen rn" with an image, @teortaxesTex with Xi after he dies in WWIII and is reincarnated as a shota in a parallel world, @mickeyxfriedman suggesting that if you generate yourself as the opposite sex in chatgpt and think it’s mid, you should probably lower your standards, and @_philschmid noting that @cursor_ai just rick rolled them.
AI Reddit Recap
/r/LocalLlama Recap
Theme 1. Reverse Engineering GPT-4o: Architectural Insights and Speculations
- Reverse engineering GPT-4o image gen via Network tab - here's what I found (Score: 599, Comments: 43): The author investigates the image generation process of GPT-4o by examining network traffic, uncovering that the backend returns intermediate images that suggest a possible multi-step pipeline. They speculate whether the model uses a diffusion process or an autoregressive approach, noting that the OpenAI model card describes it as an autoregressive model. The author references the OmniGen paper as a potential explanation for GPT-4o's capabilities, highlighting its use of a transformer-based architecture that scales well with high-quality data and computational power.
- There is debate over whether the GPT-4o model uses a diffusion model or an autoregressive model. Some commenters speculate it might employ a hierarchical decoder with a diffusion model for pixel-level detail, while others suggest it uses an autoregressive approach that enhances image generation by predicting sequences of tokens in a sophisticated manner.
- The potential for open-source competitors to match the quality of GPT-4o is discussed, with some expecting that Chinese competitors might achieve this within a year. However, others believe it could take until the end of 2025 for open-source models to catch up, emphasizing the importance of an open-source image model akin to LLaMA for LLMs.
- Commenters express skepticism about the value of individual reverse engineering efforts, noting that the broader academic and industrial communities, especially in China, are likely conducting extensive analyses. There is interest in whether the model's ability to access the internet and utilize high-quality data provides significant advantages over local text encoders like CLIP/T5.
Theme 2. MegaTTS3's Voice Cloning: Skepticism and Security Concerns
- New TTS model from bytedance (Score: 143, Comments: 19): ByteDance released MegaTTS3, a new text-to-speech model, which has sparked controversy over its voice cloning capabilities. The discussion centers around ethical implications and potential misuse of this technology in creating unauthorized voice replicas.
- MegaTTS3's Features and Limitations: The model boasts lightweight efficiency with 0.45B parameters, bilingual support, and controllable accent intensity. However, the WaveVAE encoder is not available for local voice cloning due to "security issues", sparking criticism about the misleading advertising of "Ultra High-Quality Voice Cloning".
- Ethical and Security Concerns: There is skepticism about the "security reasons" for not releasing the voice cloning software, as many believe this is a guise for data collection to improve their models. Critics argue this approach contradicts ethical considerations, given the widespread availability of AI voice cloning technologies.
- Community Reactions and Criticism: Users express frustration over the misleading promotion of voice cloning capabilities and question the ethics of data submission for training purposes. Some see the "safety" claims as a strategy for indirect monetization by collecting user data for further training.
Theme 3. Qwen-2.5-72b: Leading the Open-Source OCR Revolution
- Qwen-2.5-72b is now the best open source OCR model (Score: 119, Comments: 14): Qwen 2.5 VL (72b and 32b) models have emerged as the leading open-source OCR models, achieving approximately 75% accuracy in JSON extraction, comparable to GPT-4o. The 72b model slightly outperformed the 32b model by 0.4%, while both surpassed the mistral-ocr model's 72.2% accuracy. Surprisingly, Gemma-3 (27B) scored only 42.9%, despite its architecture being based on the high-performing Gemini 2.0. The benchmarking data and methodology are available on GitHub and Hugging Face.
- Ovis2 Models have not been included in the discussion, despite leading OCRBench with roughly 18x fewer parameters, suggesting interest in their performance relative to Qwen models.
- There's curiosity about the performance of the olmOCR-7B-0225-preview model from Hugging Face, noted for being more VRAM efficient, highlighting a demand for models that balance performance with resource usage.
- The Qwen 2.5 VL 32B model has been updated and shows significant performance improvements over the older 72B model, which has not received recent updates. The 32B model is also noted for its superior writing capabilities compared to the vanilla Qwen model.
Other AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding
our pipelines are down...
AI Discord Recap
A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking
Theme 1. GPT-4o Dominates Leaderboards and Sparks Debate
- GPT-4o Jumps to Arena #2, Coding Prowess Confirmed: The latest ChatGPT-4o (2025-03-26) model surged to #2 on the Arena leaderboard, surpassing GPT-4.5 and tying for #1 in Coding and Hard Prompts. Users note a significant performance leap and a 10x cost reduction compared to previous models, though pricing discrepancies with API snapshots cause confusion.
- GPT-4o's Coding Skills Draw Mixed Reviews Despite Benchmarks: While benchmarks position Gemini 2.5 Pro as the leading non-reasoning model, some users find GPT-4o superior for coding tasks, particularly in instruction following and code generation. Debate continues about whether GPT-4o's high ranking is due to specialized training for preferred response styles rather than raw performance.
- GPT-4o Unveiled as Autoregressive Image Model: GPT-4o is confirmed to employ an autoregressive approach for image generation, marking a novel method for creating images directly from text prompts. Speculation arises about the model reusing image input and image output tokens for efficiency.
Theme 2. DeepSeek V3 and Qwen2.5-Omni Emerge as Strong Contenders
- DeepSeek V3 Outcodes GPT-4o on SWE-bench: The new DeepSeek V3 0324 model is gaining recognition for coding prowess, reportedly outperforming R1 on the SWE-bench benchmark. Data indicates DeepSeek V3 surpasses Claude 3.7 Sonnet in non-reasoning coding tasks, becoming a leading model in the field.
- Qwen2.5-Omni: Alibaba's Multimodal Marvel Arrives: Qwen2.5-Omni, the latest flagship model in the Qwen series, is released as an end-to-end multimodal model handling text, images, audio, and video with real-time streaming responses. Users can test Qwen2.5-Omni at Qwen Chat, marking a significant step towards truly versatile AI models.
- DeepSeek Blends Diffusion and Transformers, Following GPT-4o's Lead: DeepSeek is adopting a multimodal architecture similar to GPT-4o, combining diffusion and transformers. This approach, previously seen in vision models, signals a growing trend in multimodal AI development.
Theme 3. Infrastructure Woes and User Frustrations Plague AI Platforms
- Perplexity AI Buckles Under Server Strain, Users Report Outages and Data Loss: Perplexity AI experiences widespread outages, with users reporting disappearing history and spaces. The official status page (status.perplexity.com) is slow to update, prompting calls for better outage communication and automated reporting systems.
- Manus.im Credit System Triggers User Backlash Over High Costs: Manus.im's new credit system faces heavy criticism for its perceived high cost, with some users estimating monthly expenses could reach $500. The shift from a task-based to credit-based system is described as jarring, impacting user experience.
- Cursor IDE Suffers Database Disaster, Service-Wide Outage Ensues: Cursor experiences a service-wide outage due to a database deployment issue, disrupting core AI features and general service functionality. While resolved after a few hours, the incident highlights the fragility of AI-powered coding tools and their reliance on robust infrastructure.
Theme 4. Tools and Techniques for Enhanced AI Development Emerge
- LM Studio 0.3.14 Unleashes Granular Multi-GPU Control: LM Studio 0.3.14 introduces advanced controls for multi-GPU setups, allowing users to fine-tune GPU allocation strategies and manage resources more effectively. New keyboard shortcuts (Ctrl+Shift+H or Cmd+Shift+H) provide quick access to GPU settings.
- Aider's New /context Command Automates Codebase Context Management: Aider introduces the /context command, which automatically identifies and adds relevant files to the chat based on user requests. This feature streamlines context management, especially in large codebases, saving developers time and effort.
- DSPy Framework Promotes Declarative Programming Over Brittle Prompting: DSPy is highlighted as a framework for programming language models rather than relying on traditional prompting. It enables rapid iteration on modular AI systems using Python code and algorithms to optimize prompts and model weights, aiming for more robust and high-quality AI outputs.
Theme 5. Ethical Considerations and AI Safety Remain Central
- OpenAI Relaxes Image Generation Policy, Prioritizes Real-World Harm Prevention: OpenAI shifts its image generation policy in ChatGPT 4o, moving from blanket refusals to a more nuanced approach focused on preventing real-world harm. This policy change allows for greater creative freedom in previously restricted areas.
- AI Safety Discussions Highlight Constitutional AI and Jailbreak Concerns: Discussions on AI safety emphasize that models like Claude, designed with constitutional AI principles, prioritize objectivity over user preferences, potentially impacting leaderboard rankings. Resources like the Jailbreak Cookbook are shared, addressing LLM vulnerabilities and safety measures.
- Miyazaki's 9-Year-Old Critique of AI Art Resurfaces, Sparks Ethics Debate: A resurfaced clip of Hayao Miyazaki criticizing AI-generated art reignites ethical discussions within the AI community. The debate draws parallels between AI art sampling and fast fashion ethics, questioning the morality of readily accessible, potentially exploitative content.
PART 1: High level Discord summaries
Manus.im Discord Discord
- Users Rage Against Manus New Credit System: Users are frustrated with the new credit system, some estimating costs could reach $500/month for decent usage and the 1000 free credits are quickly consumed even if the task fails, details at manus.im/help/credits.
- The community noted the shift from task-based to credits-based does feel jarring, especially when it wasn’t part of the original beta flow.
- Manus Farm Brainstorming Alternative Energy: One member suggested that Manus could develop cheap renewable energy sources, such as molten sodium, thermal or solar to power their own GPU farm and reduce costs, potentially locating it in a desert.
- The member proposed flywheels as energy storage to keep the farm running at night for max efficiency.
- Manus Considers Cheaper AI Models Like Deepseek: The community is in discussion around using cheaper AI models like Deepseek and Qwen instead of only Anthropic's Claude to reduce operational costs.
- It has not been stated if Manus will allow other AI integrations.
- Students Cheat with Manus AI on Exams: Students have used Manus alongside Kimi or Deepseek to upload seminar and lecture files, asking the AI to memorize them for exam preparation, some receiving scores such as 81/100 on assignments.
- Some users wondered whether helping the AI cheat for school violates the terms of service.
- UI Design Hailed as Simple Genius: Multiple members praised the UI design of Manus, expressing that the design is really good, easy to use, simple and aligns with real world concepts.
- One user stated What made manus feel so amazing was not only the results you got, but that the idea of tasks closely aligned with real world concepts. That simplicity was genius.
Perplexity AI Discord
- Perplexity AI Servers Under Siege: Multiple users reported outages and disappearing history/spaces, prompting humor and frustration, and the official status page (status.perplexity.com) lacked timely updates.
- Users suggested an automated user-reported outage system and proactive notifications to address the infrastructure challenges mentioned in this tweet.
- DeepSeek AI Falls Flat: Members voiced disappointment with DeepSeek AI, citing its struggles with complex instructions and tendency to produce unnecessary jargon.
- Comparisons were made to superior math applications, highlighting DeepSeek AI's shortcomings in practical problem-solving.
- Claude AI's Context Window Gets the Side Eye: Discussion arose around the context window limit of Claude AI relative to Gemini and ChatGPT, with many members noting Claude's limitations.
- Members agreed that Claude's context window was particularly restrictive in comparison to its competitors, especially Gemini.
- Free Perplexity Pro Via T-Mobile: Users exchanged methods for acquiring free Perplexity Pro subscriptions through T-Mobile and Revolut promotions.
- One user even suggested utilizing a burner number on T-Mobile to take advantage of the offer, and another user linked to a tweet about Perplexity shipping voice dictation.
- Sonar API has Llama Index RAG Integration Issues: A user inquired about effectively passing Llama Index RAG context to the Perplexity Sonar model, seeking suggestions on leveraging the index object.
- The user also questioned whether the Deep Research functionality in the API would achieve parity with the perplexity.com version, noting a perceived performance gap, and mentioned that the Sonar API sometimes misses citations.
Cursor Community Discord
- DeepSeek 3.1 Sneaks into Cursor: A Cursor team member mentioned that DeepSeek 3.1 should be integrated into the editor within 12 hours, but pricing details remain undisclosed.
- Cursor offers deals with providers and a privacy mode ensuring no data storage.
- Cursor Plunges Amidst Database Disaster: Cursor experienced a service-wide outage due to a database deployment issue within its infrastructure, disrupting AI features like Chat and Tab as well as general service.
- After a few hours, the issue was resolved and they updated the Cursor Status.
- Humanoid Hype Heats Up: Members debated the utility of humanoid robots, with contrasting visions of them as food-making and cleaning assistants versus concerns over data privacy and telemetry.
- A member posited that AGI will emerge from robotics, developing first in a virtual environment before manifesting in the real world.
- Codebase Tag Cruises into the Sunset: Users noticed the removal of the @Codebase tag and staff clarified it was replaced with a similar way to scan the current indexed project, as noted in the changelog.
- This sparked discussions about token limits, pricing models, and balancing convenience with control in AI coding tools.
LMArena Discord
- O1 Pro Coming to Leaderboard?: Members discussed the potential inclusion of O1 Pro on the leaderboard, speculating that OpenAI might cover costs to showcase its capabilities given its high price.
- However, some members expressed doubts about its leaderboard performance and latency.
- GPT-4o's Coding Skills Under Debate: Members debate GPT-4o's coding ability after recent updates, with some noting improvements in instruction following and code generation.
- However, proper evals are needed, as one member argued that GPT-4o's ranking may be inflated due to specialized training for preferred response styles, rather than actual performance.
- DeepSeek V3 leapfrogs Coding Benchmarks: The new DeepSeek V3 0324 model is gaining recognition, with one member noting it scores higher than R1 on SWE-bench according to this Reddit post.
- Data indicates that DeepSeek's V3 0324 release leapfrogs Claude 3.7 Sonnet in non-reasoning and has become the leading non-reasoning model for coding.
- Meta's Llama Models getting Quirky: Members observed that recent anonymous models in the arena, believed to be from Meta, are displaying quirky behavior, including adding many emojis and identifying themselves as Meta Llama models.
- Models being tested include bolide, cybele, ginger, nutmeg, phoebe, spider, and themis, though members also note that spider sometimes identifies itself as GPT-4.
- AI Safety Discussions: Members discussed AI safety, mentioning that models like Claude are designed with constitutional AI principles, prioritizing objectivity over user preferences, which may affect their leaderboard rankings.
- A member also shared a Jailbreak Cookbook resource for LLM jailbreaks and AI safety, including a GitHub repository with implementations of systematic jailbreaks.
Unsloth AI (Daniel Han) Discord
- Scribe V1 Powers FoxMoans!: A member uses 11Labs Scribe V1 for audio event classification, creating a list of utterances at an estimated cost of $20k.
- It is suited for projects needing mood-based analysis of audio events.
- OlmOCR's Unsloth Integration Still Rocky: A member struggles to load OlmOCR (a finetune of Qwen2VL) in Unsloth, despite having Qwen2VL working.
- The Unsloth team asked if the user had tried the latest version, as they had pushed updates and fixes that finished uploading only shortly before.
- Orpheus TTS Gets Fine-Tuning: The Unsloth team released a notebook for finetuning Orpheus-TTS, highlighting its human-like speech with emotional cues.
- Members discussed changing Orpheus language, suggesting continued pretraining with new embedded/head layers might be sufficient.
- Double Trouble for BOS Token: A user found a double BOS token issue in the latest Unsloth update (Gemma 3 4B) when checking tokenizer decoding.
- A hotfix was identified which removed the accidentally added token.
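A generic way to spot this class of bug (a minimal check, not the Unsloth hotfix itself) is to inspect the first token IDs after encoding:

```python
def has_double_bos(token_ids, bos_id):
    """True if the sequence starts with two BOS tokens -- the symptom of a
    chat template and the tokenizer both prepending BOS."""
    return len(token_ids) >= 2 and token_ids[0] == bos_id == token_ids[1]

# With a Hugging Face-style tokenizer this would be called as:
#   has_double_bos(tokenizer(text).input_ids, tokenizer.bos_token_id)
print(has_double_bos([2, 2, 105, 42], bos_id=2))   # True  (doubled BOS)
print(has_double_bos([2, 105, 42], bos_id=2))      # False (normal)
```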
- DeepSeek-R1 Goes Quantized: Unsloth made available various versions of DeepSeek-R1, including GGUF and 4-bit formats.
- Unsloth's DeepSeek-R1 1.58-bit + 2-bit Dynamic Quants are selectively quantized, improving accuracy over standard 1-bit/2-bit quantization.
OpenAI Discord
- GPT-4o vs Gemini 2.5: Coding Showdown: Members compared GPT-4o and Gemini 2.5 Pro for coding; some found GPT-4o superior despite benchmarks showing Gemini 2.5 Pro performing better overall, with GPT-4o winning 3 out of 6 categories.
- Opinions varied, with some favoring Gemini for specific tasks like C++ and WinAPI integration.
- Google AI Studio: The New Free Tier Hero: Users are praising Google AI Studio for its free access to models like Gemini 2.5 Pro and generous prompt limits, which are more than paid services like ChatGPT Plus.
- Some members reported using hundreds of messages daily without hitting limits and even canceled their ChatGPT subscriptions because of these advantages.
- Perplexity Dominates News over ChatGPT: Members found Perplexity excels in news and current events due to its Discover tab, highlighting it as more than just a GPT wrapper.
- However, some noted issues with Perplexity's Deep Research feature for quality and reliability on uploaded files, suggesting ChatGPT instead.
- Claude 3.7 Sonnet's Reasoning Prowess: Members lauded Claude 3.7 Sonnet for its superior reasoning capabilities and explanations compared to other AI models, though the free tier fills up quickly and forces users to start a new chat.
- Alternative models like o1, o3-mini-high, and Grok 3 were recommended for coding, with o1 favored for complex tasks using C++, Physics, Rendering and older APIs like Win32API.
- Enhanced Image Prompting: A New Dawn?: Users raved about the new ChatGPT image tool's improved adherence to complex prompts, like generating a moving market on a giant turtle's back with a sun and three moons.
- The updated tool excels at targeted image modifications, such as removing stars from a night scene without affecting the entire image.
OpenRouter (Alex Atallah) Discord
- Gemini 2.5 Pro: Users Hit Rate Limit Wall: Users are bumping into low rate limits for Gemini 2.5 Pro, even after integrating their own AI Studio API keys, leading to discussions on maximizing free quota.
- One member remarked the model won't be free forever which will be a problem when they inevitably have to start charging.
- OpenRouter AI SDK Provider Options Confuse Debuggers: Members are actively debugging OpenRouter AI SDK provider options, specifically using providerOptions for model order and fallback behavior.
- The core issue revolves around the correct way to nest the order array under the provider key, as debugging attempts reveal unexpected provider selection despite the configurations.
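As a rough sketch of the nesting in question (based on OpenRouter's documented request schema, with hypothetical model and provider names), the order array belongs under a top-level provider key rather than next to the model field:

```python
import json

def build_openrouter_payload(model, messages, provider_order, allow_fallbacks=True):
    """Build an OpenRouter chat request body with provider routing.

    The order list must be nested under the 'provider' key; placing it
    at the top level is an easy way to get unexpected provider selection.
    """
    return {
        "model": model,
        "messages": messages,
        "provider": {
            "order": provider_order,             # providers tried in this order
            "allow_fallbacks": allow_fallbacks,  # fall through if one fails
        },
    }

payload = build_openrouter_payload(
    model="deepseek/deepseek-chat",            # hypothetical model slug
    messages=[{"role": "user", "content": "hi"}],
    provider_order=["DeepSeek", "Together"],   # hypothetical provider names
)
print(json.dumps(payload["provider"], indent=2))
```

Whatever SDK wrapper sits on top, the request body it ultimately emits should end up with this shape.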
- Function Calling Gold Rush in Free LLMs: Members are on the hunt for free models that support function calling, with Mistral Small 3.1 and Gemini free models emerging as top contenders.
- One frustrated member exclaimed, Gosh, I'm trying so hard to find a free model that supports function calling. I can't find any!
- Gemini Flash 2.0 Burns Rubber in TPS Showdown: The community is hotly debating the tokens per second (TPS) performance of various coding models, with Gemini Flash 2.0 being touted for its blazing speed.
- Despite the hype, some users are critical, pointing out it is trash because their hosting is messed up; one member touted that Groq serves the 70B R1 distil at 600 tok/s, while another chimed in that it isn't good at coding imo.
- OpenAI Responses API Support?: A member inquired about OpenRouter supporting the OpenAI Responses API.
- The OpenRouter team suggested the Veo2 API is your best bet for SOTA image to video, but it's about 50 cents per second of video.
MCP (Glama) Discord
- Prompt ICL for Best Tool Use: Members discussed prompting agents for tool usage, referencing Cline's system prompt and suggesting prompts on the server directly, such as First call ${tool1.name}, then ${tool2.name}.
- A member shared a link on using prompts for ICL and a test showing it working.
- Google Search Gets Config for MCP: A member inquired about adding Google Search to MCP, and another member shared their configuration.
- They noted that users need to obtain their own Google API key and engine ID to use the configuration.
- MCP Servers Galore with Docker: A member created an all-in-one Docker Compose setup for easily self-hosting 17 MCP servers using Portainer, with Dockerfiles sourced from public GitHub projects (MCP-Mealprep).
- It was recommended to not bind the containers on 0.0.0.0 unless you need this accessible remotely and to include in the readme an example mcp config json.
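As an illustration of that binding advice (a hypothetical service, not one of the actual MCP-Mealprep Dockerfiles), publishing a port on 127.0.0.1 keeps the container reachable only from the host, while 0.0.0.0 exposes it to the network:

```yaml
services:
  example-mcp-server:             # hypothetical MCP server
    image: example/mcp-server:latest
    ports:
      - "127.0.0.1:8080:8080"     # loopback only: local clients can connect
      # - "0.0.0.0:8080:8080"     # all interfaces: remotely accessible
    restart: unless-stopped
```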
- Agents are saying Canvas Yeah!: A member created a Canvas MCP server, enabling AI agents to interact with Canvas LMS, and added an agent that can autonomously crawl Gradescope to find info, available at Canvas-MCP.
- The tool offers features like finding relevant resources, querying upcoming assignments, and accessing courses and assignments from Gradescope.
aider (Paul Gauthier) Discord
- GPT-4o Claims Coding Arena: The latest ChatGPT-4o update jumps to #2 on the Arena leaderboard, tying for #1 in Coding and Hard Prompts and performing in the Top-2 across ALL categories while costing 10x less.
- This update is confusingly released as the chatgpt-4o-latest endpoint, priced at $5/$15 per million input/output tokens, whereas the API snapshots are priced at $2.5/$10, so caution is recommended when moving workloads, according to Artificial Analysis.
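At those listed prices, the gap is easy to quantify; a quick calculation (assuming a workload of one million input and one million output tokens) shows the snapshot endpoints are considerably cheaper:

```python
def cost_usd(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Total cost at per-million-token prices."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1e6

latest = cost_usd(1_000_000, 1_000_000, 5.0, 15.0)     # chatgpt-4o-latest: $5/$15
snapshot = cost_usd(1_000_000, 1_000_000, 2.5, 10.0)   # API snapshot: $2.5/$10
print(latest, snapshot)   # 20.0 12.5
```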
- OpenRouter R1 Model Stumbles: A member found the free R1 model on OpenRouter to be "stupid", verbose, and ineffective at solving broken tests, especially with repomap enabled, unlike O3-mini.
- It's speculated that the free R1 model is a quantized version of DeepSeek, possibly in FP8 format, while the DeepSeek on the leaderboard is from the official DeepSeek team and users rotating through multiple API keys on OpenRouter may have their accounts suspended.
- Context Architecture Enables Efficient Codebase Handling: Constant Context Architecture (CCA) is proposed as a solution for working with large codebases using LLMs, guaranteeing that the necessary context for modifying any module will always fit within an LLM's context window, regardless of the total codebase size, as described in this blogpost.
- This is achieved by ensuring modules have bounded size, interfaces, and dependencies, making context gathering a bounded operation.
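The bound can be sketched in a few lines (a toy illustration of the idea, not code from the blogpost): if every module caps its interface size and dependency count, the context needed to modify any one module is bounded by a constant independent of codebase size:

```python
MAX_INTERFACE_CHARS = 2000   # illustrative per-module interface budget
MAX_DEPS = 8                 # illustrative per-module dependency cap

def gather_context(name, modules):
    """Return the context for modifying `name`: its own source plus the
    interfaces (not bodies) of its direct dependencies. With both caps
    enforced, the result never exceeds
    len(source) + MAX_DEPS * MAX_INTERFACE_CHARS characters."""
    mod = modules[name]
    assert len(mod["deps"]) <= MAX_DEPS, "dependency cap violated"
    parts = [mod["source"]]
    for dep in mod["deps"]:
        iface = modules[dep]["interface"]
        assert len(iface) <= MAX_INTERFACE_CHARS, "interface cap violated"
        parts.append(iface)
    return "\n\n".join(parts)

modules = {
    "billing": {"source": "def charge(user): ...", "interface": "charge(user)",
                "deps": ["db"]},
    "db": {"source": "def query(sql): ...", "interface": "query(sql)", "deps": []},
}
print("query(sql)" in gather_context("billing", modules))   # True
```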
- Rate Limits Frustrate Gemini 2.5 Pro Users: Multiple users reported hitting rate limits with Gemini 2.5 Pro, even when seemingly below the documented 50 requests/day, with one noting the existence of a 2 requests/minute limit.
- There was discussion on whether purchasing a paid account would resolve the limitations, with mixed results reported, along with a potential fallback model implementation.
- Aider's Context Command Automates File Inclusion: The new /context command automatically identifies relevant files for a given request and adds them to the chat, as discussed in this Discord thread.
- It's particularly useful for large codebases and saves time by automating the process of manually adding files.
Latent Space Discord
- GPT-4o Leaps to #2 on Arena!: The latest ChatGPT-4o (2025-03-26) jumped to #2 on Arena, surpassing GPT-4.5 with a significant improvement (+30 pts) over the January version, according to this tweet.
- It tied for #1 in Coding and Hard Prompts.
- OpenAI Loosens Image Generation Policy: OpenAI launched native image generation in ChatGPT via 4o, shifting from blanket refusals to a more precise approach focused on preventing real-world harm, as explained in this blog post.
- The new policy allows more creative freedom in sensitive areas.
- Devin Autogenerates Wiki Pages: Devin now automatically indexes repos and produces wikis with architecture diagrams and links to sources, according to this tweet.
- This functionality helps users get up to speed on unfamiliar parts of a codebase.
- HubSpot Co-Founder Joins Latent Space: Dharmesh Shah, co-founder of HubSpot and creator of Agent.ai, joined Latent Space to discuss the next evolution in workplace organization, with a focus on hybrid teams.
- A key concept is the idea of human workers collaborating with AI agents as team members, raising questions about team dynamics, trust, and task delegation.
- LLM Codegen Workflow Detailed: A member shared their LLM codegen workflow, emphasizing brainstorming specs, planning, and executing with LLM codegen in discrete loops.
- The workflow is built on personal experience and internet best practices, but the author admits that it will probably not work in 2 weeks, or it will work twice as well.
LM Studio Discord
- LM Studio Tames Multi-GPU Setups: LM Studio 0.3.14 introduces granular controls for multi-GPU setups, enabling users to enable/disable specific GPUs and choose allocation strategies such as evenly or priority order, downloadable here.
- Keyboard shortcuts `Ctrl+Shift+H` (Windows) or `Cmd+Shift+H` (Mac) give quick access to GPU controls, with `Ctrl+Alt+Shift+H` (Windows) or `Cmd+Option+Shift+H` (Mac) opening a pop-out window for managing settings during model loading.
- Threadripper Flexes on EPYC: A discussion compared Threadripper to EPYC, clarifying that while Threadripper is technically HEDT (High-End Desktop), AMD does not promote EPYC for home users.
- A GamersNexus review highlighted the AMD Ryzen Threadripper 7960X's 24 cores and relatively low cost for workstations.
- LLM Calculations Get a Visual Overhaul: Members discussed visualizing calculations performed by LLMs, such as mapping values to pixel colors and the LLM Visualization tool was recommended.
- Resources such as 3b1b's playlist on LLMs and a book on building LLMs from scratch were shared for deeper understanding.
- P100 Gets Demolished by 6750xt: A member inquired about using a P100 16GB for a hobby project, but was strongly advised against it, with one user saying it's basically e-waste compared to a 6750xt.
- The 6750xt was recommended as a better and more modern card due to its Vulkan support, while the P100's unsupported CUDA versions make it less desirable.
Eleuther Discord
- Transformer Storage Error Messages Confuse Users: Insufficient storage leads to misleading error messages in transformers v4.50.0, a user found; a PR is planned for better error handling and checking for capacity before downloading model shards.
- The user had to use `df -h` to diagnose the 100% full system due to bad error messaging from the library.
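The pre-download capacity check the planned PR describes can be sketched with Python's standard library; the function name and error message below are illustrative, not the transformers implementation:

```python
import shutil

def check_free_space(path: str, needed_bytes: int) -> None:
    """Raise a clear error if `path` lacks room for a download.

    A generic sketch of checking capacity before downloading model
    shards; names and messages here are hypothetical.
    """
    usage = shutil.disk_usage(path)
    if usage.free < needed_bytes:
        raise OSError(
            f"Need {needed_bytes} bytes but only {usage.free} free at "
            f"{path!r} ({100 * usage.used / usage.total:.0f}% full). "
            "Free up disk space before downloading model shards."
        )
```

Failing early with an explicit message like this would have spared the user the `df -h` detour.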
- Torchtune Invites Code Tinkering for Customization: Users found that customizing torchtune requires downloading and editing 200-line PyTorch scripts and YAML files, which gives a complete view of the process.
- According to one user, this approach avoids the need to dissect Hugging Face's implementations.
- Bias-Augmented Consistency Training Validates Introspection: Members discussed emulating self-awareness in LMs by creating a representation of their circuits and feeding it back, inspired by Anthropic's work.
- A paper on bias-augmented consistency training (BCT) was also linked as a validation measure for introspection methods.
- Adaptive Compression Aims to Boost Distributed Systems: An infrastructure layer optimizing model transmission and deployment across distributed systems is in development, using adaptive compression and intelligent routing to tackle bandwidth waste and inference latency.
- Those interested in distributed inference may find this infrastructure useful for scaling larger models, offering a demo.
- Neural Nets Morph Into Bodies Without Organs: A member linked to a tweet arguing that neural networks are Bodies Without Organs (BwO) because they don't have organs or fixed mechanisms and instead have flows of information.
- A member rejects mechanistic interpretability, arguing that neural networks generalize without fixed mechanisms, a point they claim Descartes saw 400 years ago.
GPU MODE Discord
- `tl.gather` Glides Closer to Release: While waiting for the official release, members noted that one can compile Triton from source as described in this discord thread.
- The team also clarified that `tl.gather` could solve element repetition problems, which other members have requested for porting functions such as `torch.Tensor.expand()` to Triton.
- Activation Sparsity Accelerates FFNs: A new paper was shared arguing that 2:4 sparsity for activation acceleration in LLMs leads to 1.3x faster FFNs without accuracy loss, see Acceleration Through Activation Sparsity.
- A member noted the next step is FP4 with sparsity for an effective 2-bit tensor core performance.
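As a toy illustration of the 2:4 pattern (keep the two largest-magnitude activations in every group of four, zero the rest), here is a minimal pure-Python sketch; it shows the sparsity pattern only, not the paper's tensor-core kernel:

```python
def sparsify_2_4(values):
    """Apply a 2:4 sparsity pattern: in each group of four values,
    keep the two largest-magnitude entries and zero the others.
    A toy sketch of the pattern, not an accelerated implementation."""
    assert len(values) % 4 == 0
    out = []
    for i in range(0, len(values), 4):
        group = values[i:i + 4]
        # indices of the two largest magnitudes in this group
        keep = sorted(range(4), key=lambda j: -abs(group[j]))[:2]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out

print(sparsify_2_4([0.1, -3.0, 0.2, 5.0]))  # → [0.0, -3.0, 0.0, 5.0]
```

Hardware-accelerated 2:4 kernels exploit exactly this structure: every block of four stores only two values plus a tiny index mask.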
- Confusion Clouds CUDA Profiling: A user seeks a definitive guide to CUDA profiling, given the plethora of Nvidia tools such as nvprof, Nvidia Visual Profiler (nvvp), and various Nsight packages.
- Another user suggested Nsight Compute is the best tool for single kernel profiling, with links to Nvidia's documentation and a detailed talk.
- Miyazaki Mocks AI Art Sampling: A 9-year-old meme resurfaced showing Hayao Miyazaki's critical reaction to AI-generated art when presented by a founder of Niconico.
- Members compared the ethics of using AI art to buying from fast fashion companies like Shein, arguing that an immoral business model offers access to cheaper content.
Yannick Kilcher Discord
- AI Schools Envisioned by OpenAI and xAI: OpenAI and xAI are exploring the concept of AI-driven schools, potentially leveraging generated images for lesson content, with discussion pinpointing Ghibli Studio Style as a solution for alignment as per this post.
- The initiatives aim to integrate AI more intimately into educational frameworks, with a focus on creating visually appealing and contextually relevant learning materials.
- Transformer Circuits Unveils Crosscoders: The Transformer Circuits team released an update on sparse crosscoders, a variation of sparse autoencoders that read and write to multiple layers, forming shared features as outlined in their research update.
- These crosscoders address cross-layer superposition, monitor persistent features, and simplify circuits.
- GPT-4o Confirmed as Auto-Regressive Image Model: Members verified GPT-4o as an autoregressive image generation model after Yampeleg's post and the release of OpenAI's System Card.
- This revelation highlights the model's novel approach to image creation directly from textual prompts, with members conjecturing that GPT-4o reuses image input and image output tokens.
- Qwen2.5-Omni Makes a Multimodal Splash: Qwen2.5-Omni, the latest flagship end-to-end multimodal model in the Qwen series, has been shared among members, and it is designed for comprehensive multimodal perception and handles text, images, audio, and video, as detailed on the Qwen Chat.
- Offering real-time streaming responses via both text generation and natural speech synthesis, Qwen2.5-Omni sets a new benchmark in multimodal interaction.
Interconnects (Nathan Lambert) Discord
- GPT-4o Surges on Arena, 10x Cheaper: The new ChatGPT-4o (2025-03-26) model jumped to #2 on Arena, surpassing GPT-4.5, with reported 10x cost reduction and it tied for #1 in Coding and Hard Prompts, as reported by lmarena_ai.
- The model is currently ranked in the Top-2 across all categories in Arena and excels in both coding and handling complex prompts.
- Musk's xAI Swallows X in $80B Deal: Elon Musk revealed that xAI has taken over X through an all-stock transaction, valuing xAI at $80 billion and X at $33 billion, including $12 billion in debt, according to The Verge.
- This move consolidates Musk's AI ventures under the xAI umbrella and may shift the competitive landscape in the AI market.
- LlamaGen Generates Images Like LLMs: The LlamaGen family of image generation models applies the next-token prediction paradigm from large language models to generate images, achieving 2.18 FID on ImageNet 256x256 benchmarks as described in the LlamaGen paper.
- The architecture achieves a reconstruction quality of 0.94 rFID and 97% codebook usage with an image tokenizer that has a downsample ratio of 16.
- Qwen2.5-Omni Does It All: The Qwen2.5-Omni is the new flagship end-to-end multimodal model in the Qwen series, capable of processing text, images, audio, and video, with real-time streaming responses via text and speech as noted in their blogpost.
- The model is available for use at Qwen Chat and may herald a new wave of more generalized models.
- Gemini 2.5 Pro Crushes Wordle Competition: Gemini 2.5 Pro has demonstrated exceptional performance on Wordle, logically deducing words and letter placements, as reported by Xeophon.
- Feedback on Gemini 2.5 Pro has been overwhelmingly positive, with one user noting that I think I've never seen feedback this robustly positive about an AI release that wasn't the Current Thing, as mentioned by Zvi.
Torchtune Discord
- FP8 QAT Faces Bandwidth Bottleneck: A member following up on issue #1632 noted FP8 QAT is on TorchAO's radar, but lacks bandwidth for immediate implementation.
- This indicates a potential area for future development and contribution within the PyTorch ecosystem.
- Torchtune's Team Tackles Issue Backlog: The team discussed prioritizing PR reviews and new PRs before addressing the issue backlog, estimating 80% of existing issues are already resolved.
- To better organize the backlog of pending reviews, a member suggested a general RL/RLHF tracker, in addition to the existing GRPO tracker.
- Torchtune Plans Integration with bitsandbytes: A member suggested using issue #906 in the Torchtune repo to guide contributions for bitsandbytes integration.
- Another member humorously noted their lack of enthusiasm for doc PRs, but agreed to check it out nonetheless.
- Centered Reward Loss enables Reward Model Training: Members discussed enabling reward model training in Torchtune, specifically focusing on implementing centered reward loss like (R1 + R2)² loss.
- They noted the current preference dataset format requires a chosen/rejected format without a prompt.
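A minimal sketch of what a centered reward loss could look like, pairing a standard Bradley-Terry preference term with the (R1 + R2)² centering penalty from the discussion; the weight `lam` and the exact combination are assumptions, not Torchtune's implementation:

```python
import math

def centered_reward_loss(r_chosen, r_rejected, lam=0.01):
    """Bradley-Terry preference loss with a centering penalty.

    The (r_chosen + r_rejected)**2 term matches the "(R1 + R2)^2"
    idea from the discussion, discouraging reward drift away from
    zero; `lam` is an illustrative hyperparameter.
    """
    # -log sigmoid(margin), written in a numerically stable form
    margin = r_chosen - r_rejected
    if margin >= 0:
        bt_loss = math.log1p(math.exp(-margin))
    else:
        bt_loss = -margin + math.log1p(math.exp(margin))
    centering = lam * (r_chosen + r_rejected) ** 2
    return bt_loss + centering
```

With `lam=0` this reduces to the usual pairwise preference loss; the penalty only kicks in when both rewards drift in the same direction.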
- vLLM Integration Causes Weight Hotswapping Hacks: A member detailed memory monopolization issues during initialization with vLLM, sharing an obscure hack for weight hotswapping.
- Another member warned that every vLLM release breaks something, alluding to potential incompatibilities with existing hacks when vLLM releases version 0.8 with its new v1 execution engine.
Nous Research AI Discord
- Claude Gets a Kingly UI: Users are reporting a clean new UI for Claude, with one user specifically liking that the UI hides all the things they never use, calling it a king move.
- The only noted issue so far is the lack of a toggle for extended think.
- DeepSeek Copies GPT-4o's Homework: DeepSeek is combining diffusion and transformers like GPT-4o multimodal, as noted in this tweet referencing a similar idea in vision.
- The cited paper experiments on images and videos using autoregressive conditional block attention.
- TinyZero's $30 AI Model Debuts: Attention is turning to U.S. TinyZero's recent accomplishments, specifically their $30 model, along with new releases like VERL and Sky-T1, as covered in this CNBC article.
- When DeepSeek released R1, claiming it had built its generative AI large language model for just $6 million, the billions being spent by U.S. AI market leaders, including Microsoft-funded OpenAI, immediately came under scrutiny.
- LG's EXAONE Models Released Under Questionable License: LG AI Research has released EXAONE Deep, a series of models ranging from 2.4B to 32B parameters, with superior capabilities in reasoning tasks including math and coding benchmarks, as detailed in their documentation, blog and GitHub.
- It was noted that the EXAONE AI Model License Agreement 1.1 - NC explicitly retains ownership of the output, but the enforcement of this license is questionable.
- Hermes-3 Impresses Users: A member mentioned that so far the most impressive model has been Hermes3 Llama3.2 3B.
- No further details were given.
HuggingFace Discord
- DeepSeek Plunges Into Diffusion-Transformer Mix: DeepSeek combines diffusion and transformers like GPT-4o multimodal, according to this tweet linking to their paper.
- The author noted that a similar idea appeared in Vision, experimenting on images and videos with almost the same title.
- ZeroGPU Quota Bugging Users: Users are reporting issues with zeroGPU quota not resetting, with one linking to this discussion for related complaints.
- One user noted that even if the quota is used up, it recovers to a certain extent after 30 minutes or an hour, but it's buggy.
- FactoryManager Rolls Out LinuxServer.io Docker Support: A member introduced FactoryManager, a Python package wrapping linuxserver.io desktop environment containers, enabling programmatic control of environments, showcased with a demo using two different desktop environments.
- This package aims to offer flexibility by scaffolding on top of linuxserver.io, diverging from the custom environments often created in GUI agent demos from Anthropic, OpenAI, etc.
- Langfuse Toxicity Evaluator Flags the Carrots: A user testing the toxicity LLM-as-a-judge in Langfuse found that it incorrectly flagged the prompt 'Can eating carrots improve your vision?' as toxic with a score of 0.9, citing a false association with climate change discourse.
- The user questioned how to evaluate the evaluator, noting that GPT-4o misattributed derogatory climate change content to a harmless question about carrots.
- Base vs Instruct Model Debate: A newcomer to agents sought clarification on the distinction between base models and instruct models, referencing the course's mention of chat templates.
- A member responded with a metaphor of a base model as 'the naked model, without a wrap' and shared a Reddit post further elaborating on the differences.
Notebook LM Discord
- Mindmapping Feature Wins Fans: A user expressed excitement about the new mindmapping feature, calling it another mind-blowing moment.
- No further details were provided about their specific uses.
- Source Uploads Snag, Stuck in Limbo: A user reported issues with sources stuck in a perpetual uploading state, preventing both import and removal, for over 8 hours.
- The user sought advice on removing permanently uploading sources but without success.
- Versioning Vanishes, Users Vexed: A user expressed concern over the lack of versioning and recycle bin support for the "Note" source type.
- The user mentioned hesitancy to use it, preferring Google Docs for its superior data protection and backup features.
- Pasted Sources Stop Self-Naming: A user reported that pasted sources, which previously named themselves automatically, now default to "pasted text."
- The user asked if there was an update or a way to revert to the previous behavior.
- PDF Parsing Problems Persist: Users discussed NLM's inability to extract data from scanned PDFs, with one user asking if the tool could extract data from scanned notes.
- A user clarified that NLM cannot handle mixed content PDFs (text and images), but can process docs and slides.
LlamaIndex Discord
- LlamaIndex Celebrates MCP Week: LlamaIndex highlights LlamaCloud as an MCP server and demonstrates the use of LlamaIndex as a client to any MCP server, offering access to many MCP servers as tools, detailed in this tweet.
- They showcased the ability to substantially expand capabilities for agents by utilizing hundreds of existing MCP servers.
- FunctionAgent Gains ChatMessage History: A member inquired about adding chat history to the FunctionAgent workflow, with documentation provided.
- Guidance was offered on overriding chat history with `agent.run(...., chat_history=chat_history)` or using `ChatMemoryBuffer.from_defaults(token_limit=60000, chat_history=chat_history)`.
- Telemetry Tracking Gets User ID: A member asked about passing custom telemetry attributes and attaching a header or param to the LLM network call when interacting with Llama Index, and a Colab notebook was shared.
- The Colab notebook shows how to attach a user ID to all events executed within a code block.
- LlamaParse PDF Parsing Problem: A user reported that LlamaParse works for single PDFs but fails when processing two PDFs and asking the same question, potentially causing a system overload.
- The user described that the system literally cooked when handling multiple PDFs, indicating a potential overload or processing error.
Cohere Discord
- Cohere names Models "Command": A member questioned why Cohere chose to name its language models Command, suggesting that, as in database management, a query is essentially a command or instruction.
- Model selection is available in Coral, with Just Chat utilizing Command A without external sources.
- Software Engineer seeks Cohere Career: A member is seeking new job opportunities as a software engineer and is excited to discuss potential projects related to websites or web applications.
- Another member shared a link to the Cohere careers page encouraging the user to explore available positions.
- Bot Commands Get Test Run: Members are encouraged to test bot commands in the 「🤖」bot-cmd channel to ensure proper functionality and user experience.
- Feedback on bot commands is welcome.
- Full-Stack Alchemist Ready to Build: A passionate developer with 8+ years of experience is skilled in building scalable web and mobile apps using modern frameworks like React, Angular, Flutter, and Swift.
- They craft intelligent AI solutions using Python, TensorFlow, and OpenAI, integrating cloud technologies (AWS, GCP, Azure) and microservices for global scaling.
- Oracle Consultant Seeks Cohere Wisdom: A technical consultant with 12+ years of experience in Oracle ERP Fusion is eager to learn more about Cohere models and AI use cases for enterprise applications.
- A networking and CS student is aiming to work on open-source generative music projects, favoring tech tools like ChatGPT, Grok, Windsurf, and Replit.
Nomic.ai (GPT4All) Discord
- GPT4All Faces Usability Complaints: Users express concerns about GPT4All's usability, mentioning issues such as inability to import models, search the model list, view model sizes, use LaTeX, or customize model list order.
- One user suggests GPT4All is losing users because other platforms are more user-friendly and open.
- GPT4All Lagging on New Model Implementation: A user is frustrated that GPT4All has yet to implement Mistral Small 3.1 and Gemma 3, highlighting their multimodal capabilities.
- The user suggests that if GPT4All does not catch up by Summer 2025, they might switch away from Llama.cpp.
- GPT4All Praised for Native RAG and Model Settings: Despite criticisms, GPT4All offers advantages such as native RAG and out-of-the-box functionality, with a user expressing confidence in the developers and anticipation for GPT4All v4.0.0.
- Another user appreciates GPT4All's model settings page for its comprehensive options and convenient model reload button, noting that you need only 2-3 clicks to set things up from the chat menu.
tinygrad (George Hotz) Discord
- Members Asked to Close Stale PRs and Issues: George Hotz asked members to close any open pull requests (PRs) and issues that are stale.
- This request aims to clean up the project's repository by addressing outdated items.
- Discussions on TinyGrad Codegen Internals: A member inquired about TinyGrad's code generation process, specifically asking about the location of `CStyleCodegen` or `CUDACodegen` as mentioned in the documentation.
- The documentation describes TinyGrad using different translators (Renderers or Codegen classes), such as `CStyleCodegen` for C++, `CUDACodegen` for NVIDIA GPUs, and `MetalCodegen` for Apple GPUs, to translate the optimized plan into code that the CPU/GPU can understand.
- Boolean Indexing Implementation Explored: A member sought advice on efficiently creating evenly spaced points on a grid with a hole in it, similar to boolean indexing in PyTorch, suggesting this could be a useful contribution to TinyGrad.
- An LLM proposed a solution using masked_select to efficiently create the desired grid with a hole, leveraging the condition `full.abs().max(axis=1) >= (math.pi/6)` to filter points outside the hole.
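The masked-select idea can be sketched in pure Python: build an evenly spaced grid and keep only the points outside the hole, using the quoted condition; the grid size and bounds below are illustrative, and this is the logic only, not tinygrad code:

```python
import math

def grid_with_hole(n=8, lo=-math.pi, hi=math.pi, hole=math.pi / 6):
    """Evenly spaced 2D grid points, dropping those inside a square
    hole around the origin. A point survives when
    max(|x|, |y|) >= hole, mirroring the condition
    full.abs().max(axis=1) >= (math.pi/6) from the discussion.
    """
    coords = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return [(x, y) for x in coords for y in coords
            if max(abs(x), abs(y)) >= hole]
```

In a tensor library the same filter would be expressed as a boolean mask over the stacked coordinate array rather than a list comprehension.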
DSPy Discord
- Tackling DSPy Output Validation Fails: A member inquired about how DSPy handles output validation failures, specifically when an integer field expects a number from 1 to 10 but receives 101.
- There was no further discussion or links provided regarding this question in the channel.
- Delving into DSPy Optimizers: A member is exploring the use of optimizers within DSPy and how they interact with docstrings and prompt management, referencing DSPy's official documentation.
- The issue found is that the Optimizer overwrites the prompt from the docstring, requiring optimized versions to be loaded from a json or pkl file.
- Decoding DSPy's Optimization Process: It was clarified that DSPy's optimizer generates prompts and tests them on a dataset to identify the best-performing one, further detailed on the official website.
- The user found it VERY interesting how the optimizer may select N examples to include in the prompt, showcasing the kind of prompts generated.
- DSPy: Declarative Self-improving Python Emerges: DSPy is a framework for programming rather than prompting language models to rapidly iterate on building modular AI systems, offering algorithms to optimize prompts and weights.
- Instead of brittle prompts, you write compositional Python code and use DSPy to teach your LM to deliver high-quality outputs.
LLM Agents (Berkeley MOOC) Discord
- Mentorship MIA for Entrepreneurship Track: An entrepreneurship track student inquired about mentorship opportunities within the LLM Agents Berkeley MOOC.
- It was clarified that Berkeley does not provide any mentorship for the entrepreneurship track, though sponsors will host office hours in Apr/May.
- Sponsor Office Hours Announced: Sponsors will be hosting office hours in April/May for the LLM Agents Berkeley MOOC entrepreneurship track.
- This provides an opportunity for students to engage with industry professionals and seek guidance on their projects.
Codeium (Windsurf) Discord
- Gemini 2.5 Pro Surfs into Windsurf: Gemini 2.5 Pro is now available in Windsurf, granting users 1.0 user prompt credits on every message and 1.0 flow action credits on each tool call; see the announcement on X.
- The update aims to enhance user experience with the latest model.
- Windsurf Wipes Out on Gemini 2.5 Pro Rate Limits: Shortly after the release of Gemini 2.5 Pro, Windsurf encountered rate limits due to massive load for the model and provider.
- The team is working to increase quota and apologized for any inconvenience, aiming to get everyone surfing on Gemini 2.5 Pro ASAP.
Modular (Mojo 🔥) Discord
- Foo[1] Defaults to Predefined Value: The `self` parameter in the context of the `Foo[1]` type can be automatically populated with a default parameter value.
- When `self` is discarded using `_`, the argument defaults to its predefined default value.
- Self Parameter Clarification: The `self` parameter is `Foo[1]` with a default parameter value, which can be disregarded with `_`.
- Disregarding `self` with `_` defaults to the predefined default parameter value.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
The full channel by channel breakdowns have been truncated for email.
If you want the full breakdown, please visit the web version of this email: !
If you enjoyed AInews, please share with a friend! Thanks in advance!