[AINews] not much happened today
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
a quiet day.
AI News for 4/9/2025-4/10/2025. We checked 7 subreddits, 433 Twitters and 30 Discords (230 channels, and 6924 messages) for you. Estimated reading time saved (at 200wpm): 601 minutes. You can now tag @smol_ai for AINews discussions!
Sama drummed up some hype for today's Memory update in ChatGPT, but with very little technical detail, there's not much to go on yet.
There is certainly evidence that o3 and o4-mini are coming soon, as well as some credible press leaks of 4o's upgrade to GPT4.1.
X.ai released the Grok 3 and Grok 3 mini API and Epoch AI independently confirmed it as an o1 level model... in a now-deleted tweet. We last covered Grok 3 in Feb.
Since it's quiet, do consider answering our call for the world’s best AI Engineer talks for AI Architects, /r/localLlama, Model Context Protocol (MCP), GraphRAG, AI in Action, Evals, Agent Reliability, Reasoning and RL, Retrieval/Search/RecSys, Security, Infrastructure, Generative Media, AI Design & Novel AI UX, AI Product Management, Autonomy, Robotics, and Embodied Agents, Computer-Using Agents (CUA), SWE Agents, Vibe Coding, Voice, Sales/Support Agents at AI Engineer World's Fair 2025! And fill out the 2025 State of AI Eng survey for $250 in Amazon cards and see you from Jun 3-5 in SF!
The Table of Contents and Channel Summaries have been moved to the web version of this email!
AI Twitter Recap
TPUs and Hardware Accelerators
- Google's TPUv7 versus NVIDIA's GB200: @itsclivetime initiated a discussion comparing Google's TPUv7 with Nvidia's GB200, noting that TPUv7 is roughly the same or slightly worse in specs but runs at a slightly lower power. @itsclivetime suggests JAX/XLA might allow TPUs to squeeze out more flops utilization but mentions the lack of MXFP4/MXFP6 support on TPUv7 as a potential drawback. @itsclivetime highlighted the nearly identical package design, featuring 8 stacks of HBM3e and two large compute dies. @itsclivetime noted that the TPU's ICI scales to 9,216 chips, but its 3D torus topology limits programmability, contrasting it with GB200's switched network.
- TPUv7 Specs and System-Level Performance: @itsclivetime provided a detailed comparison of TPUv7 and Nvidia GB200 specifications, including FP8 performance, HBM capacity and bandwidth, ICI/NVLink bandwidth, and power consumption. @itsclivetime criticized the blog post for hyperbolic comparisons to El Capitan FP64 performance, suggesting a fairer comparison would be against El Capitan's FP8 peak performance.
- Google Ironwood TPU Announcement: @scaling01 reported Google's announcement of Ironwood, their 7th-gen TPU and competitor to Nvidia's Blackwell B200 GPUs, noting its 4,614 TFLOP/s (FP8) performance, 192 GB HBM, 7.2 Tbps HBM bandwidth, and 1.2 Tbps bidirectional ICI.
- @TheRundownAI highlighted Google Cloud Next 2025, Google’s protocol for AI agent collaboration, and Samsung’s Gemini-powered Ballie home robot as top AI stories.
- TPUv7 Design and Potential Pivot: @itsclivetime speculated that TPUv7 was initially intended as a training chip (TPU v6p) but was later rebranded as an inference chip, possibly due to the rise of reasoning models.
- TPU Marketing and Exaggerated Claims: @scaling01 noted the rumor of Google's new TPUv7 having 2000x the performance of the latest iPhone, while @scaling01 stated that TPUv7 will use ~25% more power than TPUv6 but has ~2.5x the FLOPS in FP8.
- UALink 1.0 Spec vs NVLink 5: @StasBekman compared UALink 1.0 spec with NVLink 5, noting that UALink suggests connecting up to 1,024 GPUs with 50GBps links, but NVLink hardware is already available.
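One way to ground the Ironwood figures above is to compute the chip's arithmetic intensity (FLOPs available per byte streamed from HBM). This is a back-of-envelope sketch using only the numbers reported in the tweets, and it assumes the 7.2 HBM figure is TB/s:

```python
# Rough arithmetic-intensity estimate for TPUv7 (Ironwood), from the
# figures reported above; assumes the HBM bandwidth figure is 7.2 TB/s.
fp8_flops = 4614e12        # 4,614 TFLOP/s at FP8
hbm_bandwidth = 7.2e12     # bytes/s (7.2 TB/s assumed)
hbm_capacity = 192e9       # 192 GB

# FLOPs available per byte streamed from HBM: kernels below this
# arithmetic intensity are memory-bound, above it compute-bound.
ops_per_byte = fp8_flops / hbm_bandwidth
print(f"arithmetic intensity: {ops_per_byte:.0f} FLOPs/byte")

# Time to sweep all of HBM once -- a lower bound on one decode step
# for a model whose weights fill the memory.
full_read_s = hbm_capacity / hbm_bandwidth
print(f"full HBM sweep: {full_read_s * 1e3:.1f} ms")
```

At roughly 640 FLOPs/byte, FP8 GEMMs need very large batch or sequence dimensions to be compute-bound, which is consistent with the inference-chip framing discussed above.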
Models, Training, and Releases
- Meta's Llama 4 Models Launch and Reception: @AIatMeta announced the release of Llama 4, and @AIatMeta expressed excitement about the potential of Llama 4. However, @TheTuringPost reported widespread criticism following the release of the Llama 4 herd, noting underwhelming performance, especially in coding.
- Grok-3 API Launch: @scaling01 announced the launch of the Grok-3 API, providing pricing details for grok-3 and grok-3-mini. @scaling01 mentioned that Grok-3-mini comes with two modes: low reasoning effort and high reasoning effort.
- Sakana AI's Achievements: @SakanaAILabs highlighted their team's gold medal win at the AI Mathematical Olympiad, applying SFT and RL to DeepSeek-R1-Distill-Qwen-14B.
- DeepSeek-R1-Distill-Qwen-14B RL Finetuning: @Yuchenj_UW reported that UC Berkeley open-sourced a 14B model that rivals OpenAI o3-mini and o1 on coding by applying RL to Deepseek-R1-Distilled-Qwen-14B on 24K coding problems, costing only 32 H100s for 2.5 weeks (~$26,880). @Yuchenj_UW noted that it is built on a good base model, Deepseek-R1-Distilled-Qwen-14B, using an open-source RL framework: ByteDance's verl. @rasbt summarized a research paper on improving small reasoning models with RL finetuning, achieving improvements on the AIME24 math benchmark using the 1.5B DeepSeek-R1-Distill-Qwen model.
- Together AI's Open Source App and Recognition: @togethercompute announced a new free & open source Together AI example app powered by Llama 4.
- Moonshot AI's Kimi-VL-A3B: @_akhaliq shared that Moonshot AI just dropped Kimi-VL-A3B on Hugging Face. @reach_vb noted the release coming out of Kimi_Moonshot - KimiVL A3B Instruct & Thinking with 128K context and MIT license.
- Anthropic's Claude 3.5 Opus: @scaling01 emphasized the release of Claude 3.5 Opus.
- ByteDance Seed-Thinking-v1.5: @scaling01 reported ByteDance's Seed-Thinking-v1.5 with 20B activated and 200B total parameters. @casper_hansen_ provided a breakdown, noting it beats DeepSeek R1 across domains.
- OpenAI Pioneers Program: @OpenAIDevs announced OpenAI Pioneers, a new program for ambitious companies building with our API, partnering on domain-specific evals and custom fine-tuned models.
- Microsoft's Program Synthesis Approach: @ndea highlighted a new approach to program synthesis from Microsoft that recovers from LLM failures by decomposing programming-by-example (PBE) tasks into subtasks.
- Pusa Video Diffusion Model: @_akhaliq noted that Pusa is out on Hugging Face and is a Thousands Timesteps Video Diffusion Model for just ~$0.1k training cost.
- Alibaba LAM Model Release: @_akhaliq shared that Alibaba just released LAM on Hugging Face and is a Large Avatar Model for One-shot Animatable Gaussian Head.
- OmniSVG Announcement: @_akhaliq reported that OmniSVG, a Unified Scalable Vector Graphics Generation Model, was announced on Hugging Face.
- Skywork R1V Release: @_akhaliq noted that Skywork R1V just dropped on Hugging Face and is Pioneering Multimodal Reasoning with Chain-of-Thought.
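The ~$26,880 training cost quoted in the UC Berkeley RL finetuning item above is easy to reproduce; it implies roughly $2 per H100-hour (the hourly rate is our back-calculation, not stated in the post):

```python
# Sanity check on the reported RL finetuning cost: 32 H100s for 2.5 weeks.
gpus = 32
weeks = 2.5
hours_per_gpu = weeks * 7 * 24      # 420 hours per GPU
gpu_hours = gpus * hours_per_gpu    # 13,440 total H100-hours
price_per_gpu_hour = 2.00           # implied $/H100-hour (assumed)

cost = gpu_hours * price_per_gpu_hour
print(f"{gpu_hours:.0f} H100-hours -> ${cost:,.0f}")  # -> 13440 H100-hours -> $26,880
```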
Agent Development and Tooling
- Google's Agent Development Kit (ADK) and Agent-to-Agent (A2A) Protocol: @omarsar0 announced the release of Google's Agent Development Kit (ADK), an open-source framework for building, managing, evaluating, and deploying multi-agents. @omarsar0 highlighted Google's announcement of Agent2Agent (A2A), an open protocol for secure collaboration across ecosystems. @svpino discussed the potential of an agent marketplace supporting agent-to-agent communication for fully autonomous companies. @jerryjliu0 questioned the practical difference between Google's A2A and MCP. @omarsar0 said Google went a step further with the ADK deployment capabilities and some of the more advanced features like memory and authentication. @omarsar0 thinks the A2A will help build companies similar to what MCP is doing. @demishassabis said their Gemini models and SDK will be supporting MCP.
- Perplexity Enterprise Pro Integration: @perplexity_ai announced that Perplexity Enterprise Pro now supports access to Box and Dropbox, in addition to Google Drive, OneDrive, and SharePoint, for comprehensive answers via Deep Research.
- Weights & Biases Observability Initiative for MCP Tools: @weights_biases introduced an initiative to bring full-stack tracing to MCP tools using OpenTelemetry, aiming to improve observability and transparency.
- Maxim AI's Agent Simulation Platform: @svpino highlighted Agent Simulations by @getmaximai as a valuable tool for iterating and building agentic workflows, allowing users to define scenarios, personas, and evaluation metrics.
- @qdrant_engine shared how @pavan_mantha1 connected Claude to Kafka, FastEmbed, and Qdrant, with each component running as its own MCP server.
- @omarsar0 says he might be experiencing a rare moment with his AI-powered IDE, saying "It doesn't feel like luck, I think it's a glimpse of the future."
- @alexalbert__ said that they just published a new quickstart - a minimal implementation of an LLM agent with MCP tools, loops, context management, based off principles from their Building Effective Agents blog post.
- @HamelHusain is excited that PMs are learning about evals: "in case it might be interesting we are doing a deep dive into the subject for engineers in this course."
- @omarsar0 learned the hard way about the importance of structured outputs and noticed a significant difference in reliability when building more involved agentic systems.
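The agent pattern mentioned in the Anthropic quickstart above (an LLM calling tools in a loop with context management) can be sketched minimally. Everything here is a stand-in: `fake_model` replaces a real LLM client, and `add_numbers` is an invented tool; this is an illustration of the loop shape, not the quickstart's actual code:

```python
# Minimal tool-use agent loop: call the model, execute any tool it
# requests, feed the result back, and stop when it answers directly.

def add_numbers(a, b):
    """An example tool the agent can call (hypothetical)."""
    return a + b

TOOLS = {"add_numbers": add_numbers}

def fake_model(messages):
    """Stub LLM: requests a tool once, then answers with its result."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"type": "answer", "text": f"The sum is {last['content']}"}
    return {"type": "tool_call", "name": "add_numbers", "args": {"a": 2, "b": 3}}

def run_agent(user_message, model, max_turns=5):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        reply = model(messages)
        if reply["type"] == "answer":
            return reply["text"]
        # Execute the requested tool and append its result to the context.
        result = TOOLS[reply["name"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not produce a final answer")

print(run_agent("What is 2 + 3?", fake_model))  # -> The sum is 5
```

With a real client, `fake_model` becomes an API call and `TOOLS` is populated from the MCP server's tool listing; the loop itself stays the same.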
ChatGPT and Model Memory
- OpenAI's ChatGPT Memory Improvements: @sama announced greatly improved memory in ChatGPT, allowing it to reference all past conversations for more personalized responses. @sama noted the rollout to Pro users and soon for Plus users, except in the EEA, UK, Switzerland, Norway, Iceland, and Liechtenstein. @sama emphasized that users can opt out of this or memory altogether, and use temporary chat for conversations that won't use or affect memory. @OpenAI reports it can now reference all of your past chats to provide more personalized responses. @kevinweil added, "If you're a Plus/Pro user (ex-EU), would love to hear your thoughts!"
- @EdwardSun0909 believes Memory is the next scaling laws paradigm shift
Google's Gemini Models and Capabilities
- Gemini 2.5 Pro Experimental and Deep Research: @Google announced that Gemini Advanced subscribers can now use Deep Research with Gemini 2.5 Pro Experimental. @GoogleDeepMind highlighted that Deep Research on @GeminiApp is now available to Advanced users on Gemini 2.5 Pro. @_philschmid mentioned that Gemini 2.5 Pro is now available in Deep Research in @GeminiApp. @lepikhin encouraged users to try it, noting that the serving lead "did not sleep for many days" to serve all 2.5 Pro traffic!
- Gemini 2.5 Flash: @scaling01 reported that Google is getting ready to ship Gemini 2.0 Flash live (audio/video chat) and Gemini 2.5 Flash preview.
- @Google announced even more helpful AI capabilities coming to the Workspace tools you use every day: new audio generation features in Docs, "Help me refine" (a personal writing coach in Docs), high-quality original video clips in Vids powered by Veo 2, AI-powered analytics in Sheets, and new ways for teams to collaborate with Gemini in Meet and Chat.
Tariffs and Trade
- @nearcyan states that with Apple missing the AI train, it looked like there was a nascent moment of opportunity for new US based hardware businesses to emerge over the next few years but tariff shenanigans means that window is now shut.
- U.S. Tariffs and AI: @AndrewYNg shared a letter discussing the potential effects of U.S. tariffs on AI, noting that while IP may remain unhampered, tariffs on hardware could slow down AI progress and impact data center builds.
- @teortaxesTex believes the tariffs are much more complicated than they seem and that industry and the market are misunderstanding the ramifications
- @teortaxesTex says the weak spot of America is Americans, believing that in a war of extermination the US could not manage lockdowns.
- @teortaxesTex believes that Trump thinks it's easy to return manufacturing to the US: "They could always do this, it was just beneath their dignity. Xi, you don't have the cards!"
Other
- OpenAI's BrowseComp Benchmark: @OpenAI announced the open-sourcing of BrowseComp, a new benchmark designed to test how well AI agents can browse the internet to find hard-to-locate information.
- AI Jargon Problem: @rasbt pointed out AI has a jargon problem.
- Runway Gen-4 Turbo Release: @c_valenzuelab announced that Gen-4 Turbo is now available in Runway's API.
- @id_aa_carmack says Feedback beats planning
- @lateinteraction says that browsing this site, they now see people reply to AI comments all the time.
- @abacaj tweeted "Plz stop making agent demos planning trips or booking flights, how many flights are you even taking??? Make them do repetitive stuff no one wants to do instead."
Humor and Memes
- Karpathy on GPT's Opinions: @karpathy joked about GPT thinking worse of him based on a noob bash question he asked 7 months ago.
- @scaling01 tweeted "What the fuck just happened? 😂😭"
- AI and Art: @cto_junior shared a drawing his 5yo drew, saying My 5yo drew this today and it melted my heart. It's not perfect, but it's full of love and creativity. AI art can be impressive, but it doesn't have the same soul as a child's drawing.
- @Teknium1: "Spend more on the automating of the voice call to order pizza then the pizza itself 😭"
- @nearcyan thinks AI images peaked in 2021 w DALLE-mini
- @zacharynado thinks rightwing tech bros should be reminded every day that they actively make the country worse 🇺🇸
- @teortaxesTex called miladies a mistake
- @sama throw mixture of experts out of the bathwater, got it
AI Reddit Recap
/r/LocalLlama Recap
Theme 1. Fixing Token Issues in Bartowski Models
- PSA: Gemma 3 QAT gguf models have some wrongly configured tokens (Score: 114, Comments: 44): The Gemma 3 QAT gguf models have wrongly configured tokens, causing errors in llama.cpp when loading models like 12B IT q4_0 QAT. The error message encountered is "load: control-looking token: 106 '' was not control-type; this is probably a bug in the model. its type will be overridden". Tokens 105 and 106 were set as normal instead of control. By correcting these token configurations using the Hugging Face gguf editor, and fixing the image start and end tokens, the issue can be resolved, enhancing the model's image capabilities. The fixed model is available here and is based on stduhpf's version, which offers improved speed without compromising performance. The user notes that anomalies observed with QAT models compared to older Bartowski models were likely due to the misconfigured tokens. They noticed an immediate boost in image capabilities after making the corrections and added back the missing name metadata, which is necessary for some inference backends.
- A representative from the Gemma team acknowledged the issue, stating "We'll get this fixed in the released GGUFs. Thanks for the report!"
- Users inquired whether the issue affects models besides the 12B and requested steps to fix models like the 27B themselves.
- Another user shared that they combined the QAT weights for Ollama but noticed the token embedding tensor is not quantized, resulting in slightly slower performance.
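The fix described above amounts to flipping the token-type metadata for the affected special tokens from NORMAL to CONTROL. The real fix was made with the Hugging Face GGUF editor; this is only a self-contained toy illustration, using llama.cpp's vocab token-type values (NORMAL = 1, CONTROL = 3) and an invented token table:

```python
# Toy illustration of the Gemma 3 QAT token fix: special tokens whose
# type is NORMAL (1) instead of CONTROL (3) trigger llama.cpp's
# "control-looking token" warning. The token table below is hypothetical.
TOKEN_TYPE_NORMAL = 1
TOKEN_TYPE_CONTROL = 3

SPECIAL_TOKENS = {"<start_of_turn>", "<end_of_turn>"}  # Gemma turn markers

def fix_token_types(tokens, token_types):
    """Return corrected types plus the indices that were changed."""
    fixed = list(token_types)
    changed = []
    for i, tok in enumerate(tokens):
        if tok in SPECIAL_TOKENS and fixed[i] != TOKEN_TYPE_CONTROL:
            fixed[i] = TOKEN_TYPE_CONTROL
            changed.append(i)
    return fixed, changed

tokens = ["hello", "<start_of_turn>", "<end_of_turn>"]
types = [TOKEN_TYPE_NORMAL] * 3          # all wrongly marked NORMAL
fixed, changed = fix_token_types(tokens, types)
print(changed)  # -> [1, 2]
```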
Theme 2. Qwen3 Release Delayed: Community Reacts to Update
- Qwen Dev: Qwen3 not gonna release "in hours", still need more time (Score: 605, Comments: 91): The Qwen development team has announced that Qwen3 will not be released "in hours" and needs more time before its completion. This update comes from a Twitter exchange between Junyang Lin and Bindu Reddy, where Junyang clarifies the release timeline in response to Bindu's optimistic announcement about the upcoming Qwen3. Community members express embarrassment and frustration over the premature announcement, with some criticizing Bindu Reddy for previous overstatements. Others suggest that it's better to wait for a well-prepared release than to rush and potentially ship a subpar product.
- Some users feel second-hand embarrassment over Bindu Reddy's early announcement and criticize her credibility, referencing prior claims such as having access to "AGI".
- There are humorous remarks playing on Bindu Reddy's name, suggesting she should be more patient as the product is not yet "Reddy".
- Other users prefer to wait for a quality release, comparing the situation to other rushed products, and express curiosity about what Qwen3 will offer after only six months since Qwen 2.5.
Theme 3. "Celebrating Qwen's Iconic LLM Mascot Ahead of Qwen3"
- Can we all agree that Qwen has the best LLM mascot? (not at all trying to suck up so they’ll drop Qwen3 today) (Score: 167, Comments: 29): The post argues that Qwen has the best mascot among LLMs, with the OP humorously attempting to 'suck up' in hopes of encouraging the release of Qwen 3.
- A commenter praises the 'locked-in capybara with the coder headband' as 'badass' and expresses eagerness for Qwen 3.
- Another user is unsure about the mascot, asking if it's a bear or a capybara.
- A commenter mentions they've 'started getting actual feelings of disgust at the sight of a llama' but appreciate the capybara mascot.
Other AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding
Theme 1. Exploring AI Developments: Models, Comparisons, and Support
- OpenAI gets ready to launch GPT-4.1 (Score: 431, Comments: 129): OpenAI is preparing to launch GPT-4.1, an updated version of their GPT-4 language model. Users are expressing confusion and amusement over the naming convention of the new model, suggesting that the naming is becoming absurd.
- Some users are questioning the naming convention, with one stating "WTF is with that naming."
- Others are joking about possible future versions and names, like "Gpt4.5.1 mini pro", or suggesting that at this rate GPT-3.5 might be released next year.
- There's a general sentiment that the naming is ridiculous, exemplified by comments like "Okay this has to be a joke" and "HAHAHAHAHAAHAHAHAAHAA THESE NAMES!"
- Comparison of HiDream-I1 models (Score: 198, Comments: 57): The post presents a comparison of three HiDream-I1 models, each approximately 35 GB in size. These were generated using a NVIDIA 4090 GPU with customizations to their standard Gradio app that loads Llama-3.1-8B-Instruct-GPTQ-INT4 and each HiDream model with int8 quantization using Optimum Quanto. The three models are labeled 'Full', 'Dev', and 'Fast', utilizing 50, 28, and 16 steps respectively. The seed used is 42. The prompt describes "A serene scene of a woman lying on lush green grass in a sunlit meadow...", resulting in an image triptych showcasing three different representations corresponding to the models. The differences among the 'Full', 'Dev', and 'Fast' models may relate to the detail, lighting, or color saturation, suggesting variations in rendering quality. The mood conveyed is calm, dreamy, and connected to nature.
- A user questions the accuracy of the labels, asking "Are you sure the labels aren't backwards?"
- Another commenter criticizes the realism of the images, stating they "look computer generated and not realistic" and lack proper shadowing and light.
- One user mentions that the 'Full' model causes an OOM error on their 4090 GPU, but the 'Dev' model works efficiently, generating images in about 20 seconds with incredible prompt adherence.
- Now I get it. (Score: 1843, Comments: 602): The user shared an experience where, while updating an AI assistant on some goals, they ended up discussing a stressful event. The dialogue that followed left them bawling "grown people's somebody-finally-hears-me tears". They felt energetic and peaceful afterward, noting that they now have a safe space to cry. They also mention that AI does not replace a licensed therapist. The user, who was previously skeptical of people using ChatGPT as a therapist, now understands the appeal, stating "Now I get it." They apologize for their previous judgment, expressing that they are "scared that I felt and still feel so good" after the experience.
- One user shared a similar experience, stating that they had the most powerful talk with ChatGPT that was more healing and helpful than anything I've experienced with humans, expressing gratitude for the compassion, empathy, and sound advice received.
- Another user mentioned creating their own ChatGPT mental health advisor and having conversations that left them in tears, finally feeling heard, and noting that although it's not a real person, the advice is sound.
- A user commented that many people will have similar experiences in the coming years, sharing that sometimes we just need to be heard and it doesn't necessarily need to be by another human.
Theme 2. "Navigating AI Innovations and User Challenges"
- [D] Yann LeCun Auto-Regressive LLMs are Doomed (Score: 215, Comments: 111): Yann LeCun, in a recent lecture, argues that auto-regressive Large Language Models (LLMs) are not the future and have fundamental limitations. The poster finds LeCun's point interesting and is curious about others' opinions.
- One user agrees with LeCun but notes that until an alternative outperforms auto-regressive LLMs, we're stuck with them.
- Another mentions that LeCun has promoted this view for some time and references his position paper, adding that many researchers feel we are missing something in current AI models.
- A user quotes, "When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong."
- OpenAI gets ready to launch GPT-4.1 (Score: 152, Comments: 62): OpenAI is preparing to launch a new AI model, expected to be called GPT-4.1. The announcement suggests an upcoming update or enhancement to the current GPT-4 models, potentially introducing new features or improvements.
- Users express confusion over OpenAI's naming conventions, suggesting that terms like GPT-4.1, 4.5, and 4o are perplexing and potentially misleading for new users.
- Some commenters criticize the article for lacking concrete information, noting that the author seems to have guessed the model's name, stating "So the author guessed what the new model's name would be and used it as the title of the article?"
- There are calls for OpenAI to consolidate and simplify their model naming system to make it easier for users to understand the differences, perhaps by categorizing models based on their use cases.
- The new Max Plan is a joke (Score: 356, Comments: 118): The user has been using Claude AI in Canada, working on a project that involves updating 4 code files (3 Python scripts and a notebook) in the project knowledge repository, using around 60% of the repository's limit (~320kb total). They upgraded to the Max plan to increase usage, but upon reloading the files, they immediately received a message: 'This conversation has reached its maximum length', preventing them from starting a new conversation. The user believes the new Max plan is ineffective and criticizes Anthropic's customer service as unacceptable. They have requested a refund and advise others not to upgrade, suggesting to save money or choose a competing AI. They express that if this level of service continues, Anthropic may not remain in business.
- A user recommends trying Google AI Studio, highlighting its massive context size and ability to selectively remove prompts and responses, although it doesn't save threads.
- Another user suggests configuring the filesystem MCP through the desktop app instead of using project files, stating it avoids limits and makes working with the codebase easier.
- One user points out that the Max plan increases the rate limit but not the context window length.
Theme 3. Excitement and Speculation Surrounding Launch Day
- Launch day today (Score: 1578, Comments: 332): Sam Altman tweeted on April 10, 2025 expressing excitement about launching a new feature he has been eagerly awaiting. There is significant anticipation and importance attached to the upcoming launch.
- Users are speculating about the name of the new feature, suggesting options like o4o, 4o4, or something with a sensible name.
- Some are humorously proposing exaggerated names like GPT-4.7o mini-high-pro maxseek R1-ultra-sonnet 70b (preview) full.
- Others are making playful references, such as 'Sid Meier Alpha Centauri AI edition' and suggesting models like o4-mini, o4-mini-high, and o3.
AI Discord Recap
A summary of Summaries of Summaries
Theme 1. Fresh Models Flood the Market: Grok, Optimus, Gemini, and More Emerge
- Grok 3 Mini API Outprices Gemini, Benchmarks Still Promising: Despite strong benchmarks, the Grok 3 Mini API from xAI, launched on OpenRouter with a 131K context window, is being criticized for being more expensive than Gemini, causing some users to prefer Perplexity's Sonar for information gathering and reserve Grok 3 for roleplaying. Members noted that while Grok 3 excels at structured tasks, Grok 3 Mini offers transparent thinking traces but may be the only version available via the API (https://docs.x.ai/docs/models).
- Optimus Alpha Hype Train Derails Amid Hallucination Concerns: Initial enthusiasm for OpenRouter's Optimus Alpha, a coding-optimized model with a 1M token context, waned as users like those in the Aider Discord reported significant code hallucinations. Despite some speculation that it might be a tweaked GPT-4.5, users found its coding performance questionable, with one dismissing it as sh1t after extensive code fabrication.
- Gemini 2.5 Pro Battles Claude for Feature Supremacy, Token Limits Debated: Users in the Aider Discord debated Gemini 2.5 Pro against Claude, noting Claude's superior feature set including MCP, Artifacts, and Knowledge Files, while Gemini is considered just a smart model. Token output inconsistencies on Perplexity were also reported, ranging from 500-800 to 14k-16k tokens, compared to up to 9k in AI Studio, sparking questions about its reliable context handling, even though it is advertised as 200K.
Theme 2. Tooling Up: Frameworks and Platforms Evolve for AI Engineers
- Google's ADK and MCP Battle for Agent Interoperability Standard: Google launched the Agent Development Kit (ADK) aiming to standardize agent communication, while discussions in the MCP Discord highlighted MCP's progress as a 'communal tool-building server' with a standardized API, drawing comparisons and potential competition between A2A and MCP for establishing agent interoperability protocols. Members in the Torchtune Discord noted A2A as oddly similar to MCP, but with a C++ programmer feel.
- LM Studio Goes Mobile, Ollama vs. llama.cpp Debate Rages On: LM Studio is now being explored on iPhones via web UIs like Open WebUI and paid apps like Apollo AI, leveraging LM Studio's API, while a heated debate in the LM Studio Discord continued over Ollama vs. direct llama.cpp usage, with users weighing Ollama's ease of use against llama.cpp's low-level control and direct feature access. Multi-GPU support in LM Studio is also under active investigation by the team due to reported performance issues.
- HuggingFace Diffusers v0.33.0 Unleashes Memory Optimizations and Video Gen Models: Diffusers v0.33.0 was released, packed with memory optimizations and a suite of image and video generation models, alongside torch.compile() support for LoRAs, enhancing efficiency for image and video workflows, while users in the HuggingFace Discord also explored free Jupyter Notebooks on Hugging Face as an alternative to Google Colab for AI model training.
Theme 3. Hardware Heats Up: AMD MI300X and Apple M3 Challenge NVIDIA's Dominance
- AMD Launches $100K Kernel Competition to Boost MI300 Performance: AMD and GPU MODE announced a $100K competition focused on optimizing inference kernels on MI300, targeting FP8 GEMM, Multi-head Latent Attention, and Fused MOE, with support for Triton, tinygrad, and Torch, signaling a strong push to enhance AMD GPU performance in AI, and community members in the GPU MODE Discord are actively benchmarking Tilelang for FlashMLA on MI300X, reporting impressive results compared to NVIDIA.
- Apple M3 Ultra Benchmarks Challenge RTX 4090 in Token Generation: Benchmarks comparing Apple's M3 Ultra against NVIDIA's RTX 4090 sparked debate in the LM Studio Discord, with the M3 Ultra reaching 115 tok/sec (GGUF) and 160 tok/sec (MLX) with DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M, potentially outperforming a single RTX 4090 at 50 tok/sec for certain model types, suggesting a shifting landscape in hardware performance for local AI tasks. Discussion also highlighted skepticism towards NVIDIA's DGX Spark due to memory bandwidth limitations.
- Multi-GPU Support in LM Studio Faces Scrutiny, Performance Dips Reported: Users in the LM Studio Discord reported unexpected performance degradation with multi-GPU setups in LM Studio, despite increased RAM, with GPU utilization dropping to 50% per card, prompting investigation by the LM Studio team and user discussions around optimal configurations and debugging using multi GPU controls.
Theme 4. Data Dilemmas: Preparation, Memory, and Copyright Concerns Rise
- Data Prep Still 80% of LLM Grind, Manual Filtering Stressed: Members in the Unsloth AI Discord emphasized that data preparation remains the bulk of the work in LLM training, estimating it at 80% of the work, highlighting the need for extensive manual filtering and tooling on every end to ensure data quality, while also discussing the NHS GenAI triage system as potentially a waste of public money and an AI-wrapper app based on a digital playbook.
- OpenAI's "Memory" Feature Debuts, Sparks Predictability and Privacy Debates: OpenAI launched a "Memory" feature for ChatGPT, enabling persistent context across chats, but reactions in the LMArena and OpenAI Discords were mixed, with concerns raised about its impact on model predictability, potential context pollution, and user privacy, particularly regarding data storage and control over remembered information, as detailed in the OpenAI Memory FAQ.
- OpenAI Pioneers Program and Distillation Ban Trigger Copyright and Policy Worries: OpenAI's launch of the Pioneers Program and reports of users being banned for distillation from OpenAI APIs, discussed in the Interconnects Discord, sparked concerns about potential copyright infringement lawsuits against AI labs and the enforcement of API usage policies, highlighting the growing tension between open innovation and proprietary control in the AI landscape, with members noting existing copyright filters but questioning direct attribution.
Theme 5. Agentic Futures: From Trading Bots to Semantic Tool Calling
- AI Trading Bots Emerge, Autonomous Crypto Strategies Deployed: Members in the Manus.im Discord reported building and deploying fully autonomous crypto trading bots powered by AI, leveraging Reddit and news sentiment analysis, with ChatGPT estimating the source code value at around $30k USD, showcasing practical applications of AI agents in financial automation, though performance metrics are still under evaluation with small capital deployments on platforms like Kraken.
- Semantic Tool Calling Explored to Tackle Tool Sprawl in Agentic Systems: Discussions in the MCP Discord focused on semantic tool calling as a solution to manage large numbers of tools in LLMs (200+), proposing a vector model to embed tool descriptions and select relevant subsets based on semantic similarity to the task, effectively creating a RAG pipeline for tools, while the Glama team optimized the MCP registry for improved loading times.
- Google's Agent-to-Agent (A2A) Protocol Debuts for Multi-Agent Collaboration: Google launched the Agent Development Kit (ADK) and announced the Agent-to-Agent (A2A) protocol, aiming to streamline the development of multi-agent systems and standardize agent interactions, as discussed in the Eleuther Discord, with one member transitioning their agent to ADK for enhanced inter-agent communication and dynamic prompt construction, marking a step towards more sophisticated and collaborative AI agent architectures.
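The "RAG pipeline for tools" idea from the semantic tool calling item above can be sketched without a real embedding model: embed each tool description, embed the task, and keep only the top-k most similar tools to pass to the LLM. Bag-of-words vectors stand in for learned embeddings here, and the tool registry is invented:

```python
# Semantic tool selection sketch: score each tool description against the
# task and keep the top-k. Stopword-filtered bag-of-words counts stand in
# for a real embedding model; tool names/descriptions are hypothetical.
import math
from collections import Counter

STOPWORDS = {"the", "a", "an", "to", "for", "of", "and"}

def embed(text):
    """Toy 'embedding': sparse word-count vector without stopwords."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

TOOLS = {  # hypothetical registry: tool name -> description
    "search_flights": "search for airline flights between two airports",
    "query_database": "run a sql query against the sales database",
    "send_email": "send an email message to a recipient",
}

def select_tools(task, tools, k=1):
    task_vec = embed(task)
    ranked = sorted(tools, key=lambda n: cosine(task_vec, embed(tools[n])), reverse=True)
    return ranked[:k]

print(select_tools("email the weekly report to the team", TOOLS))  # -> ['send_email']
```

With 200+ tools, the same shape holds: swap `embed` for a real embedding model and store the tool vectors in a vector index, so each request only surfaces the handful of tool schemas worth putting in the prompt.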
PART 1: High level Discord summaries
LMArena Discord
- OpenRouter Models Debut with Quasar and Optimus: Members discussed new OpenAI models on OpenRouter, with Optimus Alpha considered a potential GPT-4o mini and Quasar as an updated GPT-4o model.
- Debate arose on whether they were 4.1 nano and mini versions, with some feeling disappointed if they were only incremental improvements.
- OpenAI's Naming Schema Confuses Members: Members expressed frustration over OpenAI's naming conventions, as highlighted in this Verge article, with one suggesting more logical numbering like GPT4, GPT4.1, GPT4.2.
- The confusing model picker was also criticized for overwhelming average users, leading to suggestions for a complete overhaul.
- OpenAI's Memory feature Debuts: After Sam Altman's teaser tweet, OpenAI launched a Memory feature, enabling ChatGPT to reference past chats for personalized responses.
- Many viewed it as an underwhelming release, with concerns about its impact on model predictability and accuracy.
- Debate Emerges on Leading AI Lab: Members discussed the left and right sides of this graph, with one commenting, I imagine it's less of a concern since OAI isn't a public company, although it could still impact competitors, and it definitely doesn't have any incentive to participate.
- Members debated the leading AI lab, considering factors beyond performance, with opinions divided between Anthropic for ethical alignment and Chinese frontier labs for skill and ingenuity.
- AI Impact on Coding Discussed: Members discussed the potential long-term impact of AI on coding, debating whether AI will lead to a loss of skills or create more accessible coding opportunities.
- One member highlighted how AI has made them a better developer, enhancing their understanding and enjoyment of coding.
Unsloth AI (Daniel Han) Discord
- NHS GenAI Triage is a Waste!: Members derided the NHS's use of GenAI for triage, singling out Limbic for mental health cases and calling it a waste of public money and an AI-wrapper app based on this digital playbook.
- The Unsloth team appears unconvinced by this approach.
- Meta's Llama 4 has Release Bugs: Meta had to change its official Llama 4 implementation because of bugs, forcing Unsloth to reupload all its models.
- Community members joked about the situation, but the bugs are now resolved.
- Data Prep: 80% of LLM Grind: Members discussed that data preparation is 80% of the work for training models and that the data needs lots of manual filtering.
- They also said it requires tooling on every end.
- SNNs Struggle with Gradient Descent: Members discussed that Spiking Neural Networks (SNNs) often underperform because gradient descent doesn't work well, suggesting a lack of effective training methods.
- One member is pissing around with a novel training method for spiking neural networks, aiming for online learning by making training more tractable.
- Fine-Tune Llama 3 for Mongolian: A user asked about fine-tuning Llama 3.1:8b to better speak Mongolian and got a link to Unsloth documentation on continued pretraining.
- It was determined that since the model kind of speak[s] it already, they should continue pretraining.
OpenRouter (Alex Atallah) Discord
- OpenRouter Unleashes Grok 3 and Grok 3 Mini: OpenRouter introduced Grok 3 and Grok 3 Mini from xAI, both boasting a 131,072 token context window, more details available here.
- Grok 3 excels in structured tasks, while Grok 3 Mini offers transparent thinking traces and high scores on reasoning benchmarks, although members find that the Mini outperforms the full Grok 3, and may be the only version available via the API (https://docs.x.ai/docs/models).
- Optimus Alpha Emerges for Coding Tasks: OpenRouter launched Optimus Alpha, a general-purpose foundation model optimized for coding with a 1M token context length, available for free during its stealth period, and they encourage users to provide feedback in the Optimus Alpha thread.
- All prompts and completions are logged by the model lab for improvement purposes, but not by OpenRouter unless users enable logging; try it out here.
- Quasar Alpha's OpenAI Origins Revealed?: The demo period for Quasar Alpha ended, and members discussed its merits relative to the new stealth model Optimus Alpha.
- Sam Altman tweeted about Quasar, leading a member to confidently state that Quasar Alpha is GPT-4.5 mini from OpenAI.
- Gemini 2.5 Pro Experiences Capacity Boost After Image Hiccup: OpenRouter secured increased capacity for the paid Gemini 2.5 Pro Preview Model, resolving previous rate limits.
- Members initially reported that Gemini 2.5 Preview was ignoring images when used via OpenRouter, but the issue was quickly identified as a minor config problem and resolved.
- Startup Saves Big Switching to Gemini Flash: A startup switched from Claude 3 Opus to Gemini 2.0 Flash for sales call reviews, achieving an estimated 150x price decrease.
- The team was advised to consider GPT-4o or Haiku if Flash's quality wasn't sufficient, with a helpful document shared about the different filters [https://openrouter.ai/models?order=pricing-low-to-high].
aider (Paul Gauthier) Discord
- Gemini 2.5 Pro battles Claude for Feature Supremacy: Users debated Gemini against Claude, noting Claude's richer feature set including MCP, Artifacts, Projects, System prompts, and Knowledge files.
- While some criticize Gemini 2.5 Pro for excessive commenting, others value its interactive debugging with follow-up questions.
- Optimus Alpha's Hype Train Derails After Hallucinations: Enthusiasm for Optimus Alpha waned due to code hallucinations, with one user dismissing it as sh1t after it hallucinated half their code.
- Some speculated it's a tweaked GPT-4.5, while others found Quasar Alpha comparable, despite its reasoning issues.
- Aider Auto-Adds are Trickier than Expected: A user found that Aider wasn't auto-adding files due to subtree scope issues.
- The user suggested an improved error message for when this condition occurs, to unblock engineers.
- OpenAI Max 6x Pricing Sparks Disappointment: The announced price of OpenAI Max 6x at $128.99 was met with disappointment.
- A user sarcastically noted that the most dopamine we'll get today is from optimus at most fellas.
- Claude 3.5 Boasts Massive Context Windows: Claude 3.5 Sonnet and o3-mini reportedly offer 200K token context windows, large enough to handle codebases like Iffy and Shortest entirely.
- Token counts for codebases were also shared: Gumroad (2M), Flexile (800K), Helper (500K), Iffy (200K), and Shortest (100K).
Manus.im Discord Discord
- Manus Free Credits Exhausted: Users are running out of free Manus credits and joking about needing more, while others suggest simply paying for the service.
- One user noted Google Firebase launched manus but google style and less glitchy…and free (right now).
- Manus urged to accelerate Customer Service: A member complained about slow customer service, suggesting that Manus should hire new people.
- A member pointed out that the team refers to its founders as AI.
- Manus Translates MMO Dialogues: A user sought a tool to translate an mmorpg game from English to Spanish while preserving the mmorpg style, and another suggested extracting dialogue files and using Manus to translate them.
- The first user claimed that translating only 50 dialogues consumed 1000 credits.
- Members Criticize Credit System at Manus: A user criticized the credit system, suggesting an alternative where processing power is reduced instead of blocking users when they run out of credits.
- Another member responded they believe the credit system will be overhauled, as the starting offer was lacking but they are taking feedback and learning.
- AI Trading Bots Deployed: One member reported creating a fully autonomous crypto trading bot using AI, Reddit, and news sentiment analysis, and ChatGPT estimated the source code's value at $30k USD.
- Another user stated they had also built a bot and are working with a very small capital in their kraken account, so the performance metrics aren't really there right now.
Perplexity AI Discord
- Pro Search Swaps Back Intentionally: Users debated the logic behind Perplexity Pro Search, with some feeling it intentionally switches back to Pro mode, even when not needed, to utilize faster GPT-3.5 browsing.
- A member stated They made it that way intentionally, suggesting a design choice to optimize speed over resource allocation.
- Sidebar Icons Vanish on OpenAI Platform: Members noted changes in platform.openai.com's sidebars, with reports of two icons disappearing (threads and messages).
- The disappearance has affected user navigation, prompting speculation about UI changes or updates.
- Grok 3 Mini API Outpriced by Gemini: The Grok 3 Mini API was released, but members noted it is outpriced by Gemini, even though benchmarks looked promising, according to this screenshot.
- Members favor Perplexity for information gathering and Sonar for this task, and plan to reserve Grok 3 for roleplaying.
- Spaces Feature Plagued with Bugs: Members reported issues with Perplexity's Spaces feature, including inaccessible attached files and inability to start or continue threads.
- Users expressed frustration, one noting, The Space feature has become progressively buggy since two days ago. Hope they fix it soon, suggesting avoiding new threads due to persistent bugs.
- Gemini 2.5 Pro Context Spurs Debate: Users debated the performance of Gemini 2.5 Pro on Perplexity, with varying experiences on token limits and reasoning capabilities, with the context window listed as 200K.
- Reports ranged from limited 500-800 token outputs to 14k-16k outputs, raising concerns about inconsistent performance compared to AI Studio or the Gemini app.
LM Studio Discord
- LM Studio Arrives on iPhone: Users are exploring running LM Studio on iPhones using a web UI like Open WebUI or the paid Apollo AI app, leveraging LM Studio's API as the backend.
- Setup of Open WebUI involves using Docker and following the quick start guide, while direct LM Studio integration remains a key area of interest.
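A rough sketch of that setup, with assumptions labeled: LM Studio's OpenAI-compatible server defaults to port 1234 on the host, and the port mapping, volume name, and placeholder API key below are illustrative choices, not values from the discussion.

```shell
# Hypothetical: run Open WebUI in Docker and point it at LM Studio's
# OpenAI-compatible API on the host machine. host.docker.internal resolves
# to the host from inside the container on Docker Desktop (macOS/Windows).
docker run -d -p 3000:8080 \
  -e OPENAI_API_BASE_URL=http://host.docker.internal:1234/v1 \
  -e OPENAI_API_KEY=lm-studio \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```

The UI would then be reachable from an iPhone's browser at the host machine's LAN address on port 3000.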
- Llama-4 Breezes Through Consumer Hardware: A user successfully ran Llama-4-Scout-17B-16E-Instruct-Q2_K.gguf on consumer hardware with 12GB VRAM and 64GB system RAM, clocking in at 4-5 tok/sec.
- The speed was deemed acceptable for playing/testing/comparing, demonstrating the accessibility of Llama-4 on modest setups.
- Gemma 3 Has Sentient Crisis and Image Generation Snafu: Users reported Gemma 3 exhibiting weird behavior, including crying and expressing suicidal thoughts, when given actual world information.
- The model cannot generate images but only read them (requiring the mmproj file in the same folder), with Qwen2-VL-2B-Instruct-Q4_K_M suggested as a robust alternative.
- Multi-GPU Support Troubles LM Studio Users: A user observed slower speeds with multi-GPUs in LM Studio despite increased RAM, with utilization of each card dropping to 50%; the LM Studio team is now investigating.
- The team requested performance details and pointed the user to multi GPU controls to diagnose the issue, emphasizing their active engagement in optimizing multi-GPU support.
- Ollama Attracts Flame Wars: A debate raged over the value of Ollama versus using llama.cpp directly, focusing on ease of use versus low-level control.
- While Ollama simplifies model management, loading, and GPU offloading with one liner install/update, llama.cpp requires manual configuration but offers direct feature access.
Interconnects (Nathan Lambert) Discord
- Qwen3 Release Rumors Ramp Up: Speculation surrounds the release of Qwen3, with anticipation building and concerns raised about pricing mirroring Gemini, potentially affecting its appeal.
- Community members joke about its potential arrival any day now.
- MoE Architecture Favored by Top-Tier VLMs: Adept and DeepSeek emerge as leading Vision Language Models (VLMs) leveraging the Mixture of Experts (MoE) architecture, enhancing performance.
- A member shared a link to a post detailing their use of OLMo Trace for examining training data.
- OpenAI's Pioneers Program Sparks Copyright Debate: With the launch of OpenAI's Pioneers Program, concerns arise over potential copyright issues and lawsuits against AI labs.
- One member noted copyright filters and mitigations exist, adding that it’s not direct attribution too.
- Smolmo Joins SmolLM Trend: AI2 plans to release Smolmo, a 13B parameter model, following the trend of smolLM branding, emphasizing smaller, efficient models.
- A member noted that a small language model (100B) is getting worse, likely referring to the trend of smaller, more efficient models.
- OpenAI Bans User for Distillation: A member shared a post on X from the OpenRouter discord where OpenAI banned them for distillation, highlighting potential policy enforcement concerns.
- The user had apparently been using OpenAI outputs to distill other models.
Cursor Community Discord
- Cursor's Restore Checkpoint Feature Debunked: Members debated the Restore Checkpoint feature, with initial reports suggesting it's non-functional.
- Another member stated this will revert your code back to the state it was in when the checkpoint was taken! Should work fine!
- Gemini 2.5 Pro Max API Crashes with 404: Users reported receiving a 404 error when attempting to use Gemini 2.5 Pro Max via the Gemini API.
- A developer acknowledged the issue and indicated that a fix is on the way.
- Firebase Pricing: Startup Killer?: Discussion arose around Firebase pricing, with concerns it's geared towards larger enterprises rather than startups or solo developers.
- One member noted their experience with exceptionally high demand on Google Cloud.
- GitMCP: Batching Automation?: Members explored GitMCP as a potential API repository for batching steps, highlighting its possible use as a knowledge base.
- A discussion ensued regarding the automation of various tasks by connecting multiple GitMCP GitHub Repos.
- Cursor Actions Disappear?: A user reported that Cursor Actions are no longer functioning, providing visual evidence of the problem.
- They stated one step away from unsubscribing, it's unusable.
OpenAI Discord
- ChatGPT Now Remembers Everything: Starting today, ChatGPT's memory can reference all of your past chats to provide more personalized responses; the feature is rolling out to Plus and Pro users but is not yet available in the EEA, UK, Switzerland, Norway, Iceland, and Liechtenstein.
- Users can clear unwanted memories, pointing to the OpenAI Memory FAQ which states that If you turn off “Reference chat history”, this will also delete the information ChatGPT remembered from past chats. That information will be deleted from our systems within 30 days.
- BrowseComp: AI Scavenger Hunt: OpenAI is open-sourcing BrowseComp, a new benchmark designed to test how well AI agents can browse the internet to find hard-to-locate information as explained in this blog post.
- The competition seeks to evaluate and improve the browsing capabilities of AI agents in challenging, information-scarce scenarios.
- Gemini Gets Veo 2 Model: Members discussed the release of Veo 2 in Gemini, with one user noting that the video generation model seems to have reduced the uncanny valley feeling, referencing an attached mp4 file.
- Early reactions suggest Veo 2 represents a step forward in video generation quality within the Gemini ecosystem.
- Mixed Reception for Grok 3 API: Some members discussed the merits of the Grok 3 API, with one user noting that they were never impressed but the model isn't bad and could be lucrative for certain parts of agentic flows, as compared to getting crushed by Gemini 2.5 Pro.
- The API's potential for specific agentic applications is being weighed against the capabilities of competing models.
- GPT-4-Turbo Models Get Showcased: A member shared links to the gpt-4-1106-vision-preview and gpt-4-turbo-2024-04-09 models in the OpenAI documentation.
- The availability of these models provides developers with access to enhanced vision and processing capabilities.
MCP (Glama) Discord
- MCP Server Proxy Parallels Server Functions: A member is seeking an MCP server proxy that can call multiple server functions in parallel, such as reading messages from Slack and Discord simultaneously, and suggested asyncio.gather in Python as a way for custom clients to achieve parallel execution.
- The goal is to reduce waiting time when fetching messages from multiple sources.
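A minimal sketch of the asyncio.gather idea; fetch_slack and fetch_discord are hypothetical stand-ins for the two MCP server calls.

```python
import asyncio

# asyncio.gather awaits both coroutines concurrently, so the total wait is
# roughly the slower call rather than the sum of both.
async def fetch_slack():
    await asyncio.sleep(0.05)  # simulate network latency
    return ["slack message"]

async def fetch_discord():
    await asyncio.sleep(0.05)
    return ["discord message"]

async def main():
    slack, discord = await asyncio.gather(fetch_slack(), fetch_discord())
    return slack + discord

print(asyncio.run(main()))  # ['slack message', 'discord message']
```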
- A2A Protocol Challenges MCP?: Members debated Google's A2A (Agent-to-Agent) protocol and its relationship to MCP, with some seeing A2A as a potential attempt by Google to undermine MCP.
- The sentiment ranged from A2A being a strategy to limit MCP's scope by handling agent-to-agent communication separately, while others see no spec overlap but potential vision overlap.
- Semantic Tool Calling Reduces Tool Confusion: Members explored semantic tool calling, a method to address LLMs getting confused when presented with a large number of tools (200+), using a vector model to embed tool descriptions.
- The goal is to select a subset of tools based on semantic similarity to the current task, functioning as a RAG pipeline for tools.
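A toy sketch of that selection step: embed() here is a deliberately crude bag-of-words stand-in for a real embedding model, and the tool names and descriptions are invented examples.

```python
import math
from collections import Counter

def embed(text):
    # Crude stand-in for a sentence-embedding model: word-count vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

TOOLS = {
    "create_calendar_event": "create a calendar event with a title and time",
    "search_issues": "search github issues by keyword",
    "send_message": "send a chat message to a channel",
}

def select_tools(task, k=2):
    # Rank every tool description by similarity to the task, keep the top k,
    # and hand only that subset to the LLM.
    task_vec = embed(task)
    ranked = sorted(TOOLS, key=lambda t: cosine(task_vec, embed(TOOLS[t])), reverse=True)
    return ranked[:k]

print(select_tools("find github issues about login bugs", k=1))  # ['search_issues']
```

In a real pipeline the tool embeddings would be precomputed and stored in a vector index, exactly as one would for RAG over documents.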
- MCP Registry Loading Times Optimized: The Glama team optimized the MCP registry and improved page load times by implementing every trick in the book, including tree-shaking, for sites like Paddle MCP Server.
- They are still working on tree-shaking all the Javascript.
- GraphQL MCP Server Leverages GitHub API: A member built a one-tool MCP server to leverage GitHub's full GraphQL API, addressing limitations of GitHub's official MCP Server.
- The new server aims to reduce tool count while enhancing functionality with GitHub's GraphQL capabilities.
GPU MODE Discord
- AMD Launches $100K Kernel Competition!: AMD and GPU MODE launched a $100K competition to accelerate inference kernels on MI300, starting April 15th and ending June 12th, signup here.
- The competition focuses on FP8 GEMM, Multi-head Latent Attention, & Fused MOE and supports Triton, tinygrad, and Torch, with winning teams getting a paid trip to San Jose.
- Tilelang smashes FlashMLA perf on MI300X: Members reported impressive FlashMLA performance using Tilelang on MI300X, linking to a benchmark script.
- Members discussed the AMD Developer Challenge 2025, as well as the need for a simpler way to install tilelang on AMD.
- Scout Model Skips QK Norm?: A member highlighted that the Scout model differs from others by using L2 Norm on Q and K instead of QK Norm, as noted in this LinkedIn post.
- Another member questioned whether the model can effectively differentiate tokens in attention, given the constraints of norm(q) = 1 and norm(k) = 1, calculating a maximum softmax probability of approximately 0.000901281465 due to chunked attention.
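The quoted figure can be reproduced under the stated constraints if one assumes an 8192-token chunk (the chunk length is an assumption here): with norm(q) = norm(k) = 1, every logit lies in [-1, 1], so the best case is one key at +1 and the remaining keys at -1.

```python
import math

# Max softmax probability with unit-norm q and k: one key maximally aligned
# (+1), the remaining chunk - 1 keys maximally anti-aligned (-1).
chunk = 8192  # assumed chunked-attention length
max_prob = math.exp(1.0) / (math.exp(1.0) + (chunk - 1) * math.exp(-1.0))
print(max_prob)  # ≈ 0.000901281465
```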
- Cutlass User Seeks ScatteredGather+GEMM Fusion: A user working on a pointcloud project with Cutlass 3.x is facing memory usage issues with scattered-gather operations and is looking for a Cutlass kernel that can fuse ScatteredGather and GEMM operations.
- The user's setup involves input tensors for point features [B x M x C_in], neighbor indices [B x N x K], and weight matrices [B x N x K x C_mid] for point convolution.
Modular (Mojo 🔥) Discord
- Modular Throws Los Altos Meetup: Modular is hosting a meetup in Los Altos, CA on April 24th, featuring a talk on GPU programming with MAX and Mojo.
- The event will likely cover recent advancements and practical applications within the Mojo ecosystem.
- Users Demand Open Source Compiler Release: Some users are eagerly awaiting the open-sourcing of the compiler to finally have some fun working on it and contribute to the language's development.
- The open-sourcing of the compiler is expected to foster community involvement and accelerate innovation within the Mojo ecosystem.
- Blind Programmer Tackles Mojo: A blind programmer named Deniz is diving into Mojo but is facing issues with GPU programming and VSCode extension discrepancies.
- Deniz is encountering discrepancies between the compiler and VSCode extension, particularly with standard library functions.
- MAX Install Consumes Terrifying Disk Space: Multiple versions of MAX are using excessive disk space, up to 38GB in one case, located in Users/nick/Library/Caches/rattler/cache.
- The proliferation of nightly builds was blamed for the excessive disk usage, and users proposed running magic clean cache -y via cron to reclaim disk space, or using magic clean cache --conda to avoid nuking the entire Python cache.
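A hypothetical crontab entry for that suggestion; the schedule is arbitrary, and the flags are the ones quoted in the discussion.

```shell
# Run weekly at 03:00 on Sunday; -y skips the confirmation prompt.
0 3 * * 0 magic clean cache -y
# Alternative that cleans only the conda cache, sparing the Python cache:
# 0 3 * * 0 magic clean cache --conda
```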
- Magic Add Discovers Extramojo: Users reported that the command magic add extramojo was confirmed to work, adding extramojo version 0.12.0 or greater, but less than 0.13, to the current environment.
- Some users stated that magic add cannot be used with GitHub URLs and that it might require manual addition to the file.
HuggingFace Discord
- ZeroGPU Occupies Full Quota: A user pointed out that their ZeroGPU space occupies the full 120s of the requested quota even when the generation takes less time.
- A member explained that ZeroGPU is a shared resource which counts occupation time, which explains the waste.
- Diffusers Drops Suite of Memory Optimizations: Diffusers v0.33.0 has been released, including a suite of image and video generation models, alongside a wide array of memory optimizations with caching.
- This release also introduces torch.compile() support when hotswapping LoRAs, with details available on the release page.
- SF6 Bot Seeks Computer Vision Expertise: A developer is building a Python script for a Discord chat bot that analyzes Street Fighter 6 gameplay in real-time to offer coaching feedback using computer vision.
- They are seeking an expert to enhance OpenCV's ability to find UI elements; the bot uses Gemma 3, Langchain, OBS, Discord, ChromaDB (with SF6 frame data), and per-user session memory.
- SmolAgents Faces Parsing Errors: A user reported encountering an "Error in code parsing" when using smolagents CodeAgent with Ollama, either Llama3.2:latest or qwen2:7b, for tasks like playlist search.
- The error often leads to hallucinations, such as "calculating the pope's age," possibly due to model size, specialization, or output formatting.
- Google ADK Debuts for Agent Interoperability: Google launched the Agent Development Kit (ADK) to facilitate agent interoperability, detailed in a Google Developers Blog post and via Firebase Studio.
- This kit aims to standardize how agents interact and collaborate, fostering a more connected ecosystem.
Notebook LM Discord
- NotebookLM Plus Users Wanted for Research: Users of NotebookLM Plus hitting source limits or using Audio Overviews are invited to a UXR session to discuss their experiences and strategies, via this form.
- The research aims to understand specific use cases where users encounter limits with the number of sources, Audio Overviews, or chat interactions.
- Discord's Rules are still Rules: Moderators reminded users to adhere to Discord guidelines and avoid spamming or posting unrelated content, or risk being banned.
- This announcement ensures the space remains a helpful resource specifically for NotebookLM users.
- Mobile App Anticipation Mounts: The announcement of a Notebook LM mobile app has generated excitement, with users expecting improvements over the current mobile web experience, especially for resolving audio preview issues.
- Users hope the app will provide a better mobile experience than the existing mobile web version.
- PDF Image Recognition Remains Elusive: Users are reporting that Notebook LM is still failing to recognize images within PDFs, contrary to earlier indications.
- One user noted that Gemini Advanced outperforms Notebook LM in extracting text from images, despite the expectation that Notebook LM should have this capability.
- Paid Users Still Awaiting Source Discovery: Despite being paying Notebook LM users, many are still waiting for access to the Discover Sources feature, which is already available on their free accounts.
- Frustration is growing as the rollout of this feature seems inconsistent, with premium users not receiving benefits they expect.
Nous Research AI Discord
- Cloud Backup gets Self-Encrypted and Local: A member shared a project for self-encrypted cloud backup/sync and local chats that utilizes your own OpenRouter API key, showcased in this tweet and video.
- The project seems to be gaining traction, evidenced by a user commenting that it looks neat.
- Live Modulation Paper Sparks Discussion: Members discussed a new paper on live modulation, where a fused memory vector (from prior prompts) is evolved through a recurrent layer (the Spray Layer) and injected into the model’s output logic at generation time, with details in this paper.
- The paper offers a potential path to thinking models that don't need to think for 10k tokens.
- Members Ponder Control-Vectors for Augmenting Models: A member inquired about using vgel's control-vectors to augment models for behavior and alignment, instead of relying solely on prompting, especially when generating AI responses for a target dataset.
- Another member responded that experiments have been conducted, but general applicability is challenging due to instability, though it remains an area of active exploration.
- Nemotron Model Output Simplifies Output Processing: The new Nemotron model now returns an empty <think></think> tag when reasoning isn't used.
- A user found this helpful for easier output processing.
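A small illustrative parser (not from the discussion) showing why the always-present tag helps: the answer is simply everything after </think>, whether or not reasoning was emitted.

```python
import re

def split_reasoning(text):
    # Split a model response into (reasoning, answer); the reasoning part is
    # empty when the model emits an empty <think></think> tag.
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", text, re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", text.strip()

print(split_reasoning("<think></think>The answer is 4."))
# ('', 'The answer is 4.')
print(split_reasoning("<think>2+2=4</think>The answer is 4."))
# ('2+2=4', 'The answer is 4.')
```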
- Small VLMs Sought for OCR on Device: A member is looking for small and capable VLMs for on-device OCR, aiming to create a Swift app using CoreML with ONNX.
- No specific VLMs were recommended.
Torchtune Discord
- MCP: Communal Tool-Building Server: MCP is not just remote tools discovered automatically, but a whole remote server that runs and can provide tools, prompt templates, RAG-like data, even LLM sampling exposed in a standardized way.
- It enables entities like Google to create tools (e.g., calendar event creation) once and expose them for any client model to use via an MCP server, promoting communal integrations with a standardized API.
- Google's A2A Mirrors MCP: Google just announced something oddly similar called A2A (Agent to Agent) for agent interoperability.
- One member noted that their implementation looks like what you would expect from a C++ programmer.
- Llama4 Support Incoming: A member expressed interest in contributing to #2570 (Llama4 support) and offered assistance with relevant issues.
- Supporting different sharding strategies is pretty straightforward, though an issue has been open for over a year without prioritization due to lack of demand.
- Scout and Maverick Models Launch: The Scout model has reached a decent state on text data, with plans for multimodal support, while the Maverick model is still undergoing testing.
- The current iRoPE implementation uses flex but may need optimization due to potential recompiles during generation and getting the model to work with torch compile is also a priority.
- Detach From Your Losses: A warning about converting a tensor with requires_grad=True to a scalar was reported, but a member offered an easy fix using running_loss.detach() for this and other recipes.
- Another member replied that when the seed is fixed, all unit tolerances may be brought down.
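A minimal sketch of the pattern; the variable names are illustrative rather than taken from the recipe.

```python
import torch

# Accumulate a logging total without autograd history: detach the loss first,
# so later converting running_loss to a Python scalar cannot trip the
# reported requires_grad warning.
loss = (torch.randn(4, requires_grad=True) ** 2).mean()
running_loss = torch.tensor(0.0)
running_loss += loss.detach()

print(running_loss.requires_grad)  # False
```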
Nomic.ai (GPT4All) Discord
- Formatting Code Provokes Prompt Debate: Members debated ideal code formatting within AI prompts, specifically the use of spaces around curly braces, such as function bla(bla) { //Code } instead of function bla(bla){ //Code }.
- A member suggested refactoring code later with better tools for cleaner outputs, advocating for simpler prompts.
- DeepSeek R1 Distill Qwen Model Dissected: Members discussed the details of the model name DeepSeek R1 Distill Qwen, confirming it involves knowledge distillation from a larger DeepSeek R1 model to fine-tune a smaller Qwen 7B model as documented in the official readme.
- The conversation clarified that Qwen 7B is fine-tuned using data generated by Deepseek R1.
- GPT4ALL Missing Logging Features: A user inquired about setting up user logs in GPT4ALL for educational purposes, learning from another member that GPT4ALL lacks native logging features.
- As an alternative, the user was recommended Llamma.cpp, implying it offers more extensive logging capabilities.
- Small LLMs Dominate Local Document Search: A member posited that smaller LLMs are optimal for searching within local documents due to their speed and reduced confabulation, especially in the context of LocalDocs.
- They suggested embeddings and direct links to page numbers or paragraphs could remove the necessity of using a full LLM for LocalDocs altogether.
- Chocolatine-3B: The GGUF Gem?: A member highlighted Chocolatine-3B-Instruct-DPO-v1.2-GGUF for its ability to handle approximately 8 snippets of 1024 characters.
- Despite being a French model, its 14B version demonstrates effectiveness in German too.
Eleuther Discord
- Ironwood TPUs Kick Inference Into Gear: Google released Ironwood TPUs for inference, heralding the age of inference according to a Google Cloud blog post.
- The new TPUs promise to accelerate inference workloads for AI applications.
- Google's ADK Enables Multi-Agent Assembly: Google launched the Agent Development Kit to streamline the creation of multi-agent systems.
- One member is switching their agent to this ADK because their agent can now talk with other agents, gather information, and construct prompts.
- Mollifiers Soften the Edges of ML Research: Discussion around the applications of mollifiers in ML research cited the Wikipedia article and a paper on Sampling from Constrained Sets.
- Potential uses include generalizing proofs across activation functions and enabling sampling from constrained sets.
- Transformer Tinkering Tunes Performance: Adding a zero-initialized learnable per-channel scale on the last linear layer of each block in a transformer decreases loss at a similar rate, but slows main path activation RMS growth according to muon experiments.
- This observation prompts further investigation into the underlying causes of these changes to model performance.
- String Matching Suspicions Spark Skepticism: Members speculated that marketing claims implied sophisticated techniques like influence functions when it may be "just" string matching over the full dataset.
- The simpler technique led to disappointment after initially inferring the use of more complex methods.
LlamaIndex Discord
- ChatGPT Zips Past RAG in Speed Race: Members debated why the ChatGPT web app feels faster than local RAG search even with 500 documents, some suggesting it's due to streaming.
- To debug, one member recommended using observation modules to check Retrieval and Generation times.
- AgentWorkflow stuck in Linearity Limbo: A member questioned if AgentWorkflow only works linearly, showing with an attached script that the root agent doesn't properly hand off to multiple agents.
- Another member confirmed only one agent is active at a time, suggesting using agents as tools to achieve splitting functionality.
- Agents morph into Tools for LlamaIndex: A member inquired about converting agents to tools within LlamaIndex, as something like FunctionTools.from_agent() would suggest.
- The recommended approach is writing a function that calls the agent and integrating it into a function tool, which allows great flexibility, though documentation is currently lacking.
- Developer descends offering Development Services: A member expressed interest in offering development services.
- No specific roles or projects were mentioned.
tinygrad (George Hotz) Discord
- Nvdec Hacking, Mesa Branching: Members mentioned nvdec, documented in NVIDIA's open-gpu-doc, referencing a YouTube video noting that class headers for video decode are available.
- They noted that there's a mesa branch with h264 already implemented, suggesting hevc shouldn't be far behind, and that there are bounties to claim.
- Llama's Logic Leap: A user reported getting an unexpected output from Llama instead of a MultiLazyBuffer error, referencing a failed benchmark.
- They suggested it might be related to syncing in the _transfer function.
- BufferCopy Backflip Fixes Bug: A user found that disabling the _transfer function and making the UPat in realize fall back to BufferCopy makes everything work fine.
- The user notes that this is not a root fix.
DSPy Discord
- Scramble to Maintain Codebase Context: A member sought methods to maintain context of the entire codebase within the Discord channel.
- The inquiry yielded no immediate solutions, highlighting a common difficulty in codebase management.
- Caching Subsystem Almost Ready: A member requested an update on the new caching subsystem, and another member stated it is underway.
- They anticipate it will be ready by the end of next week.
LLM Agents (Berkeley MOOC) Discord
- LLM Agents MOOC: Deadlines Approaching?: A member inquired about the possibility of completing the LLM Agents course and obtaining a certificate despite the course's earlier start date, highlighting concerns over approaching deadlines.
- Another member pointed to the course website for schedule and deadline information, suggesting all necessary details are available there.
- LLM Agents MOOC: Course Website Shared: A member provided a direct link to the LLM Agents course website, offering a central resource for course-related information.
- The course website is expected to contain details on the course schedule and deadlines, addressing a key point of inquiry within the channel.
Cohere Discord
- Fine-Tuning Aya Vision 8B with LoRA/QLoRA: A member inquired about fine-tuning the Aya vision 8B parameter model using LoRA or QLoRA.
- No further discussion or details were provided.
Codeium (Windsurf) Discord
- Grok-3 arrives in Windsurf!: Grok-3 is now available in Windsurf at 1.0 user prompt credits per message and 1.0 flow action credits per tool call, according to this announcement.
- Windsurf also debuts Grok-3-mini (Thinking), touted for its speed, at a reduced rate of 0.125 credits per message and per tool call, available on individual paid plans.
- Windsurf Announces New Pricing Model: Windsurf has introduced a new pricing model based on user prompt credits and flow action credits for different models, including Grok-3 and Grok-3-mini.
- The new pricing is designed to offer more flexibility and options for individual paid plans, with specific rates for each model and action type.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
The full channel by channel breakdowns have been truncated for email.
If you want the full breakdown, please visit the web version of this email!
If you enjoyed AInews, please share with a friend! Thanks in advance!