[AINews] not much happened today
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
a quiet day is all you need.
AI News for 3/31/2025-4/1/2025. We checked 7 subreddits, 433 Twitters and 30 Discords (230 channels, and 7148 messages) for you. Estimated reading time saved (at 200wpm): 719 minutes. You can now tag @smol_ai for AINews discussions!
people were mostly smart enough not to launch things on april fools'.
Table of Contents
- AI Twitter Recap
- AI Reddit Recap
- AI Discord Recap
- PART 1: High level Discord summaries
- Manus.im Discord
- LMArena Discord
- Cursor Community Discord
- Unsloth AI (Daniel Han) Discord
- Perplexity AI Discord
- OpenAI Discord
- LM Studio Discord
- aider (Paul Gauthier) Discord
- OpenRouter (Alex Atallah) Discord
- Eleuther Discord
- Interconnects (Nathan Lambert) Discord
- GPU MODE Discord
- Latent Space Discord
- HuggingFace Discord
- Modular (Mojo 🔥) Discord
- Nous Research AI Discord
- Yannick Kilcher Discord
- MCP (Glama) Discord
- Notebook LM Discord
- Torchtune Discord
- tinygrad (George Hotz) Discord
- LlamaIndex Discord
- Nomic.ai (GPT4All) Discord
- LLM Agents (Berkeley MOOC) Discord
- Cohere Discord
- MLOps @Chipro Discord
- AI21 Labs (Jamba) Discord
- Codeium (Windsurf) Discord
- Gorilla LLM (Berkeley Function Calling) Discord
- PART 2: Detailed by-Channel summaries and links
- Manus.im Discord ▷ #showcase (1 message):
- Manus.im Discord ▷ #general (753 messages🔥🔥🔥):
- LMArena ▷ #general (977 messages🔥🔥🔥):
- LMArena ▷ #announcements (1 message):
- Cursor Community ▷ #general (867 messages🔥🔥🔥):
- Unsloth AI (Daniel Han) ▷ #general (256 messages🔥🔥):
- Unsloth AI (Daniel Han) ▷ #off-topic (48 messages🔥):
- Unsloth AI (Daniel Han) ▷ #help (248 messages🔥🔥):
- Unsloth AI (Daniel Han) ▷ #research (23 messages🔥):
- Perplexity AI ▷ #announcements (2 messages):
- Perplexity AI ▷ #general (544 messages🔥🔥🔥):
- Perplexity AI ▷ #sharing (10 messages🔥):
- Perplexity AI ▷ #pplx-api (5 messages):
- OpenAI ▷ #annnouncements (1 message):
- OpenAI ▷ #ai-discussions (314 messages🔥🔥):
- OpenAI ▷ #gpt-4-discussions (24 messages🔥):
- OpenAI ▷ #prompt-engineering (9 messages🔥):
- OpenAI ▷ #api-discussions (9 messages🔥):
- LM Studio ▷ #general (198 messages🔥🔥):
- LM Studio ▷ #hardware-discussion (63 messages🔥🔥):
- aider (Paul Gauthier) ▷ #general (230 messages🔥🔥):
- aider (Paul Gauthier) ▷ #questions-and-tips (30 messages🔥):
- OpenRouter (Alex Atallah) ▷ #announcements (13 messages🔥):
- OpenRouter (Alex Atallah) ▷ #general (98 messages🔥🔥):
- Eleuther ▷ #general (43 messages🔥):
- Eleuther ▷ #research (21 messages🔥):
- Eleuther ▷ #scaling-laws (4 messages):
- Eleuther ▷ #interpretability-general (5 messages):
- Eleuther ▷ #lm-thunderdome (28 messages🔥):
- Eleuther ▷ #gpt-neox-dev (5 messages):
- Interconnects (Nathan Lambert) ▷ #news (70 messages🔥🔥):
- Interconnects (Nathan Lambert) ▷ #random (24 messages🔥):
- Interconnects (Nathan Lambert) ▷ #rl (4 messages):
- Interconnects (Nathan Lambert) ▷ #reads (7 messages):
- GPU MODE ▷ #general (46 messages🔥):
- GPU MODE ▷ #triton (2 messages):
- GPU MODE ▷ #cuda (2 messages):
- GPU MODE ▷ #torch (5 messages):
- GPU MODE ▷ #cool-links (1 message):
- GPU MODE ▷ #jobs (1 message):
- GPU MODE ▷ #beginner (2 messages):
- GPU MODE ▷ #off-topic (2 messages):
- GPU MODE ▷ #irl-meetup (3 messages):
- GPU MODE ▷ #self-promotion (1 message):
- GPU MODE ▷ #🍿 (1 message):
- GPU MODE ▷ #reasoning-gym (9 messages🔥):
- GPU MODE ▷ #general (3 messages):
- GPU MODE ▷ #submissions (17 messages🔥):
- Latent Space ▷ #ai-general-chat (75 messages🔥🔥):
- HuggingFace ▷ #general (42 messages🔥):
- HuggingFace ▷ #today-im-learning (3 messages):
- HuggingFace ▷ #cool-finds (3 messages):
- HuggingFace ▷ #i-made-this (1 message):
- HuggingFace ▷ #computer-vision (1 message):
- HuggingFace ▷ #gradio-announcements (2 messages):
- HuggingFace ▷ #agents-course (16 messages🔥):
- HuggingFace ▷ #open-r1 (1 message):
- Modular (Mojo 🔥) ▷ #general (9 messages🔥):
- Modular (Mojo 🔥) ▷ #mojo (59 messages🔥🔥):
- Nous Research AI ▷ #general (40 messages🔥):
- Nous Research AI ▷ #ask-about-llms (8 messages🔥):
- Nous Research AI ▷ #research-papers (2 messages):
- Nous Research AI ▷ #interesting-links (10 messages🔥):
- Yannick Kilcher ▷ #general (35 messages🔥):
- Yannick Kilcher ▷ #paper-discussion (5 messages):
- Yannick Kilcher ▷ #ml-news (20 messages🔥):
- MCP (Glama) ▷ #general (38 messages🔥):
- MCP (Glama) ▷ #showcase (13 messages🔥):
- Notebook LM ▷ #announcements (1 message):
- Notebook LM ▷ #use-cases (9 messages🔥):
- Notebook LM ▷ #general (39 messages🔥):
- Torchtune ▷ #general (11 messages🔥):
- Torchtune ▷ #dev (16 messages🔥):
- tinygrad (George Hotz) ▷ #learn-tinygrad (15 messages🔥):
- LlamaIndex ▷ #blog (1 message):
- LlamaIndex ▷ #general (6 messages):
- Nomic.ai (GPT4All) ▷ #general (7 messages):
- LLM Agents (Berkeley MOOC) ▷ #mooc-questions (4 messages):
- LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (1 message):
- LLM Agents (Berkeley MOOC) ▷ #mooc-readings-discussion (1 message):
- Cohere ▷ #「💬」general (3 messages):
- Cohere ▷ #「🤝」introductions (2 messages):
- MLOps @Chipro ▷ #events (1 message):
- MLOps @Chipro ▷ #general-ml (1 message):
- AI21 Labs (Jamba) ▷ #general-chat (2 messages):
- Codeium (Windsurf) ▷ #announcements (1 message):
- Gorilla LLM (Berkeley Function Calling) ▷ #discussion (1 message):
AI Twitter Recap
Open Source Models and Releases
- OpenAI's upcoming open-weight language model: @sama stated OpenAI will not impose restrictions like preventing usage if a service exceeds 700 million monthly active users. @LiorOnAI noted that OpenAI is planning to release their first open-weight model since GPT-2 in the next few months. @ClementDelangue welcomes OpenAI's willingness to share open weights, hoping it leads to a golden age of AI progress. @snsf mentioned the open weight model coming in the next few months.
- DeepSeek's Open-Source R1 Model: @scaling01 reports that OpenAI's commitment to releasing an open-weight language model is a response to DeepSeek's R1 model launch on January 20, 2025, which challenges the notion that China lags in AI development.
- License and Usage of Open Source Models: @cognitivecompai defended the remark, saying he had merely called a license silly and that he wasn't going to abide by it.
Model Performance and Benchmarks
- Gemma Model Performance: @osanseviero announced that Gemma 3 can do function calling and is now on the Berkeley Function-Calling Leaderboard. @jack_w_rae noted Gemini's rate of progress in math is amazing to see, driven by talented researchers, observing the uplift on HMMT.
- GemmaCoder3-12b: @ben_burtenshaw introduced GemmaCoder3-12b, a code reasoning model that improves performance on the LiveCodeBench benchmark by 11 points, highlighting its ability to run on 32GB of RAM, its 128k context length, and the option to activate thinking via the chat template.
- Qwen 2.5 Models: @TheTuringPost highlights Alibaba_Qwen's Qwen2.5-Omni, which accepts any type of input and introduces a two-part Thinker-Talker system and the TMRoPE feature to create responses in both text and natural speech.
- @vipulved reported that the TogetherCompute inference team achieved 140 TPS on a 671B parameter R1 model, which is ~3x faster than Azure, and ~5.5x faster than DeepSeek API on Nvidia GPUs.
AI Product and Tool Releases & Updates
- ChatGPT and OpenAI: @kevinweil announced that the new image generation in ChatGPT is now available to 100% of free users. @OpenAI announced the release of a new voice in ChatGPT.
- Runway Gen-4: @TomLikesRobots shares excitement about Gen-4 for animating miniature diorama-style generations, praising its movement interpretation and style maintenance.
- LangChain: @LangChainAI introduced using LangGraph's pre-built computer use agent through a chat-based generative UI.
- Figure 03 Humanoid Robots: @adcock_brett discussed the first commercially deployed humanoid robots, highlighting full autonomy, real-world integration at BMW, fleet data for better pretraining, and BotQ manufacturing scaling.
- Other Tools: @juberti highlighted the new OpenAI realtime transcription API now supports WebRTC connections. @TheRundownAI mentioned Amazon’s Nova Act AI browser agent.
AI Research and Studies
- Efficient Reasoning for LLMs: @omarsar0 shared a survey focusing on reasoning economy in LLMs, analyzing how to balance deep reasoning performance with computational cost.
- Stanford's Tutor CoPilot: @DeepLearningAI reports that Stanford researchers developed Tutor CoPilot, a GPT-4-powered tool that assists online tutors.
- AI-driven Automation and Economic Implications: @EpochAIResearch noted that while AI investments might seem huge, global wages add up to over $70 trillion.
Hugging Face and Gradio
- Gradio Usage: @ClementDelangue announced that Gradio just crossed 1,000,000 monthly developers using it in March.
Humor/Memes
- Sarcasm and April Fools' Jokes: @sama joked that "-restart-0331-final-final2-restart-forreal-omfg3" is gonna hit, i know it. @vladquant jokingly announced that after strategic review, Kagi is now Kagibara.
AI Reddit Recap
/r/LocalLlama Recap
1. LLM Mathematical Reasoning Limitations
- Olympiad Obstacle: Top Models Falter: A research paper revealed that state-of-the-art LLMs like O3-MINI and Claude 3.7 scored less than 5% on the 2025 USA Mathematical Olympiad (USAMO), despite being trained on extensive mathematical data including previous olympiad problems.
- The study highlighted significant issues with the models' logical reasoning, creativity, and self-evaluation capabilities, with LLMs overestimating their own scores by up to 20x compared to human graders. Community discussion pointed to the need for specialized proof-focused benchmarks and integration with formal proof tools like Lean or Coq.
- Formal Proof Progress: Pioneering Paths Forward: Reddit users discussed ongoing research efforts in automated theorem proving, with users sharing links to Google's AlphaProof and several open-source projects from Princeton, Stanford, and Huawei focusing on formal mathematical proofs.
- The discussion highlighted the challenges of formalizing mathematics, with users suggesting that future AI systems might combine strict formalized symbolic logic with diffusion-like processes for concept discovery. Many agreed that current LLMs need specialized tools and training to excel at mathematical reasoning rather than just answer prediction.
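The formal proof tools mentioned above (Lean, Coq) accept only machine-checkable arguments. As a toy illustration of the kind of statement such tools verify, here is a one-line Lean 4 proof using only the core library:

```lean
-- A machine-checked proof that addition on naturals is commutative.
-- Lean accepts this only because every step is formally justified --
-- exactly the rigor that graders found missing from LLM-written proofs.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Benchmarks built on such systems can grade a proof mechanically rather than relying on the model's (inflated) self-evaluation.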
2. DeepMind Research Publication Strategy
- Six-Month Secrecy: DeepMind's Defensive Delay: According to a Financial Times report, Google's DeepMind will implement a six-month embargo policy on publishing strategic generative AI research papers to maintain competitive advantage, with a researcher stating they "could not imagine us putting out the transformer papers for general use now."
- The community had mixed reactions, with some users arguing the delay is reasonable given how companies like OpenAI built their business on DeepMind's freely shared research, while others expressed concern that this could be a "race to the bottom" that would eventually lead to longer delays or permanent secrecy.
- Open Research Ramifications: Progress vs. Profit: Redditors debated the impact of DeepMind's new publication policy on AI advancement, with many pointing out that transformer architecture research from 2017 created hundreds of billions in value for other companies while Google failed to capitalize on its own innovations.
- Some commenters argued that open collaboration accelerates progress for everyone, noting "we probably wouldn't be where we are currently when it comes to the field if it wasn't publicly shared," while others defended DeepMind's right to protect its intellectual property and competitive position.
3. New Tools and Features for Local LLM Users
- Hugging Face's Hardware Helper: Hugging Face launched a new feature that allows users to check if their hardware can run specific GGUF models directly from the model page by entering their hardware specifications at https://huggingface.co/settings/local-apps.
- Users welcomed this quality-of-life improvement while suggesting additional features such as filtering models by hardware compatibility, estimating maximum context length, and providing layer offload recommendations for CPU+GPU setups. The Hugging Face team indicated they would iterate on these suggestions in future updates.
- Mobile Model Momentum: iPhone Inference Innovations: A developer demonstrated achieving 90 tokens per second with Llama 3.2 1B in float16 precision on an iPhone by completely rewriting the inference engine, showcasing significant performance improvements over existing solutions like MLX.
- The community discussed the trade-offs between using float16 versus quantized models, with some questioning whether the quality difference between fp16 and q8 was significant enough to justify the performance cost, while others debated the practical applications of such small models on mobile devices.
- DeepSeek's Diminutive Deployment: V3 GGUF Quantization: User VoidAlchemy released new GGUF quantizations of DeepSeek V3-0324 using the ikawrakow/ik_llama.cpp fork, optimized to support 32k+ context in under 24GB VRAM with Multi-Layer Attention (MLA) and high-quality tensors for attention/dense layers.
- The quantizations were designed specifically for the ik_llama.cpp fork and won't work with mainline llama.cpp or other tools like Ollama or LM Studio. Performance benchmarks showed achieving near Q8_0 quality with speeds comparable to 4bpw quants on CPU-only setups.
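The 90 tokens-per-second figure above can be sanity-checked with a back-of-envelope estimate: single-stream token generation is typically memory-bandwidth-bound, with each token requiring roughly one full pass over the weights. A rough sketch (assumes naive weight streaming, no caching tricks or sparsity):

```python
def min_bandwidth_gbps(params_billion: float, bytes_per_param: float,
                       tokens_per_sec: float) -> float:
    """Effective memory bandwidth (GB/s) implied by streaming all weights once per token."""
    bytes_per_token = params_billion * 1e9 * bytes_per_param
    return bytes_per_token * tokens_per_sec / 1e9

# Llama 3.2 1B in float16 (2 bytes/param) at 90 tok/s:
print(min_bandwidth_gbps(1.0, 2.0, 90.0))  # → 180.0 GB/s effective bandwidth
```

If that figure exceeds the phone's nominal DRAM bandwidth, the rewritten engine is presumably exploiting cache residency or other tricks rather than naively re-reading all weights per token.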
4. Novel LLM Research Concepts
- Temporal Training: LLMs Trapped in Time: A Reddit user proposed creating LLMs trained exclusively on data from before a specific year or time period, such as pre-2010, sparking discussion about the feasibility and implications of such historically-bounded models.
- Community members suggested that models limited to pre-1950s data would be possible with public domain books, newspapers, and archived materials, but noted such models would reflect historical biases and technological optimism while lacking modern concepts. Some pointed to existing research like TimeLMs that tracked how language models' performance degraded on recent content.
- Acoustic Analysis: GPU Symphonies from Model Inference: Users discovered that different LLM models produce distinctive sounds from GPUs during inference, with a post linking to evidence that these audio patterns are specific to model architecture, quantization, and context size combinations.
- The discussion revealed this phenomenon is caused by "coil whine" from capacitors and inductors in the GPU's voltage regulator module, with some noting researchers have previously extracted encryption keys by recording such processing noise, raising potential security implications.
- Attention-Free Architecture: Qwerky's Quantum Leap: A post highlighted Qwerky-72B and 32B, attention-free models trained on just 8 GPUs, representing a significant advancement in efficient model architecture that requires less computational resources.
- These models, available on Hugging Face, demonstrate how attention-free architectures can reduce VRAM requirements while maintaining performance, with community members noting the potential implications for long context handling and accessibility of large model training.
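Qwerky's attention-free design is RWKV-derived; as a rough illustration (not Qwerky's actual architecture) of why such models need less VRAM for long contexts, here is a minimal causal linear-attention recurrence that keeps a fixed-size state instead of a KV cache that grows with sequence length:

```python
import numpy as np

def linear_attention_recurrent(q, k, v):
    """Causal linear attention with a fixed-size running state (O(1) memory per token)."""
    d_k, d_v = q.shape[1], v.shape[1]
    S = np.zeros((d_k, d_v))
    out = np.empty((len(q), d_v))
    for t in range(len(q)):
        S += np.outer(k[t], v[t])   # fold this token's key/value into the state
        out[t] = q[t] @ S           # read out with the query
    return out

def linear_attention_direct(q, k, v):
    """Same computation, materializing the full T x T score matrix (for comparison)."""
    scores = (q @ k.T) * np.tri(len(q))  # causal mask; no softmax in linear attention
    return scores @ v

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 4)) for _ in range(3))
print(np.allclose(linear_attention_recurrent(q, k, v),
                  linear_attention_direct(q, k, v)))  # → True
```

The state `S` stays a fixed `d_k × d_v` matrix no matter how long the context grows, which is the core of the VRAM and long-context advantage the community discussed.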
Other AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding
1. GPT-4o Image Generation Capabilities
- Precise Placement Prowess: Reddit users are impressed by GPT-4o's ability to handle precise item arrangement and text in generated images. A user shared examples showing how the model accurately places multiple icons in grid layouts with correct labeling, maintaining consistency across complex visual hierarchies.
- Many commenters noted that GPT-4o's unified text-image architecture gives it a significant advantage in understanding and executing detailed prompts compared to other models. One user demonstrated the model could handle up to 24 distinct icons with labels before quality degradation, showcasing its impressive compositional capabilities.
- Content Filter Frustrations: Users expressed frustration with GPT-4o's content filtering system, with one post titled "Chat gpt 4O sucks and everything trips its baby mode content filters" gaining significant traction. The poster complained about being unable to generate even mildly violent or suggestive content like fantasy battle scenes.
- Despite complaints, several users demonstrated techniques to work around the filters, sharing successful generations of warrior characters and stylized art. This sparked debate about OpenAI's approach to content moderation, with some users creating satirical content mocking the filters, including a GitHub repo titled "I've reverse-engineered OpenAI's ChatGPT 4o image generation algorithm" that was actually an April Fool's joke.
2. Claude vs Gemini Competition Heats Up
- Gemini 2.5 Takes the Lead: A post titled "This is the first time in almost a year that Claude is not the best model" sparked significant discussion as a Claude user admitted that Google's Gemini 2.5 now outperforms Claude across multiple use cases. The post highlighted Gemini's superior handling of context and overall reliability.
- Users debated specific strengths of each model, with many noting that Gemini 2.5's million-token context window is a game-changer compared to Claude's more limited capacity. Several commenters praised Gemini's creative writing abilities, though some suggested the influx of pro-Gemini posts might be strategic "astroturfing" rather than organic user feedback.
- Claude's Service Struggles: Multiple posts documented issues with Claude's service reliability, with users reporting increased rate limiting and error messages like "Message Limit Reached" appearing more frequently for paid subscribers. Screenshots showed the service becoming unresponsive despite users paying for premium access.
- The timing of these issues coincided with growing praise for Gemini 2.5, leading some users to question Anthropic's infrastructure scaling. One user wrote, "Anthropic should scale or get out of the business, we are paying customers," while others noted they were switching to Gemini due to Claude's increasingly restrictive usage limits.
3. Video Generation Breakthroughs
- Wan 2.1 Video Model Mastery: User @legarth shared an impressive video demonstration of the Wan 2.1 vid2vid model running locally on a 5090 GPU, transforming a clip from Tropical Thunder into a scene featuring the Joker. The model accurately maintained physics details like jacket movement despite working only with pose information.
- The creator explained they processed 216 frames (9 seconds at 24fps) but noted quality deterioration after about 120 frames. The community was particularly impressed by the model's ability to predict physics from motion alone, with one commenter noting "the jacket's cool. Physics" and another highlighting how the model handled hair movement despite the original actor being bald.
- VACE Video Control Released: A significant open-source video generation advancement was announced with the partial release of the VACE video creation and editing models on GitHub. The release includes VACE-Wan2.1-1.3B-Preview and VACE-LTX-Video-0.9, with the larger 14B version promised for later.
- Users expressed excitement about this open-source alternative to closed commercial platforms, with one commenter noting: "If this works anything like the examples shown, open-source video just leveled up big time." The technology appears to offer enhanced control over video generation, including structure and pose preservation features.
4. AI Development Tools and Innovations
- Claude Code's Costly Creation: A developer shared their experience spending $417 on Claude Code to build a word game called LetterLinks, detailing both successes and frustrations with the AI coding assistant. Despite the high cost, the user concluded it was still cheaper than hiring a freelancer for the estimated $2-3K the project would have required.
- The post highlighted specific issues with Claude Code, including context window limitations that became problematic as the codebase grew to 15K lines, and the need for extensive manual testing since "Claude can write code all day but can't click a f***ing button to see if it works." Many commenters suggested alternatives like using Gemini 2.5 Pro with its million-token context window or Claude MCP on the desktop app.
- EasyControl: Diffusion Transformer Enhancement: A new framework called EasyControl was released, designed to add efficient and flexible control capabilities to Diffusion Transformer (DiT) models. The system incorporates a lightweight Condition Injection LoRA module and position-aware training to enhance model compatibility and generation flexibility.
- Community members were particularly interested in EasyControl's potential to provide ControlNet-like functionality for Flux models, with one user commenting "Could this be the long-awaited good ControlNets for Flux?" Testing revealed mixed results, with OpenPose control working well but subject transfer capabilities showing inconsistent performance.
5. Pixel Art and Retro Graphics AI
- Retro Diffusion's Pixelated Precision: Retro Diffusion launched an interactive browser-based playground for generating authentic pixel art using AI, with no signup required. The FLUX-based model can create pixel art across various styles through smart prompting alone, without requiring LoRAs.
- The technical article accompanying the launch detailed how Retro Diffusion solved challenges specific to pixel art generation, including grid alignment, limited color palettes, and maintaining pixel-perfect outputs. The platform's creator joined the discussion, answering questions about features like animation capabilities and color palette control.
- XLSD: Lightweight Model Magic: Developer @lostinspaz shared progress on the XLSD project, which aims to create a high-quality image generation model that can run on systems with limited VRAM (8GB or even 4GB). The approach involves forcing SD1.5 to use the SDXL VAE and then training it to produce significantly better results.
- Comparison images showed substantial quality improvements over the base SD1.5 model, with the developer noting they were "cherry picking a little" but providing a fair comparison using identical settings. The community responded positively to this optimization-focused approach, with one commenter appreciating "people who push a piece of technology to its limit and explore it just for the sake of it."
AI Discord Recap
A summary of Summaries of Summaries by o1-preview-2024-09-12
Theme 1: OpenAI's Open-Weight Model Sparks Excitement
- Sam Altman Dangles Open-Weight Model Carrot: OpenAI CEO Sam Altman announced plans to release a powerful new open-weight language model, seeking developer feedback to maximize its utility. He assured they "will not do anything silly like saying that you can't use our open model if your service has more than 700 million monthly active users."
- Community Speculates on OpenAI's Strategy Shift: Nathan Lambert expects a 30B parameter reasoning model with an MIT/Apache license, igniting discussions about OpenAI's potential impact on the open-source community.
- Enthusiasts Brace for OpenAI's Return to Open Releases: AI developers express optimism about OpenAI's move, seeing it as a boost for collaboration and innovation in AI development.
Theme 2: New AI Models Under the Microscope
- Gemini 2.5 Pro's 'Aliveness' Stirs Turing Test Talk: Users are intrigued by Gemini 2.5 Pro's unique interaction style, suggesting it "might be the first to pass a serious Turing Test" due to its apparent aliveness and curiosity.
- DeepSeek R1 Zips Past Rivals, Democratizes RL: DeepSeek R1 outperforms larger labs with efficient resource use and an MIT license, making reinforcement learning accessible to the "GPU poor" through GRPO.
- Gemma 3 Smokes Gemini 1.5 in Benchmarks: Gemma 3 27B outperforms Gemini 1.5 Flash in benchmarks like MMLU-Pro and Bird-SQL, impressing users with its superior capabilities.
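The GRPO method credited above with democratizing RL replaces a learned value function with group-relative baselines: sample several completions per prompt, score them, and normalize rewards within the group. A minimal sketch of just the advantage computation (the full algorithm also includes the clipped policy-gradient update and a KL penalty):

```python
import numpy as np

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages: z-score each reward within its group.

    High-reward completions get positive advantage, low-reward negative,
    with no value network needed as a baseline.
    """
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Four sampled completions for one prompt, scored by a reward model:
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
print(adv)  # winners positive, losers negative, mean zero
```

Dropping the critic network is what makes the approach attractive to the "GPU poor": only the policy model needs to fit in memory during training.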
Theme 3: Users Vent Over AI Tool Troubles
- Manus.im Users Rage Over Credit Crunch: Manus.im's new credit system infuriates users as credits deplete rapidly, leading them to recommend alternative AI research tools like Traycer.
- Gemini 2.5 Pro Rate Limits Drive Users Nuts: Frustrated users encounter rate limits on Gemini 2.5 Pro, debating whether limits apply to free and paid tiers, with some attempting to bypass them via a VPN.
- Cursor Charges for Free Models? Users Say 'What the Heck!': Cursor users question being charged to use a free model, sparking discussions about API usage, billing practices, and transparency within the platform.
Theme 4: Open-Source Contributions and Technical Innovations Shine
- Neuronpedia Opens the Data Floodgates: The interpretability platform Neuronpedia goes open source under MIT license, releasing over 4 TB of data and tools to democratize model interpretability.
- Stanford Teaches Transformers to the Masses: Stanford opens its CS25 Transformers seminar course to the public via Zoom and YouTube, covering topics from LLM architectures to creative applications.
- Megatron Tensor Parallelism Gets Deep Dive Treatment: An illustrated deep-dive into Megatron-style tensor parallelism, including fused/parallel CE loss, is shared, enhancing understanding of ML scalability and performance techniques.
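Megatron-style tensor parallelism, the subject of that deep-dive, splits individual weight matrices across devices. A minimal sketch of the column-parallel linear layer, simulated on one machine with array shards (shapes chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=(4, 16))        # batch of activations, hidden dim 16
W = rng.normal(size=(16, 32))       # full weight matrix

# Column parallelism: each "device" holds a slice of W's output columns
# and computes its shard of the output independently; no communication
# is needed until a later row-parallel layer gathers/reduces results.
shards = np.split(W, 4, axis=1)     # 4 simulated devices
partial = [x @ w for w in shards]   # local matmuls
y = np.concatenate(partial, axis=1) # all-gather along the feature dim

assert np.allclose(y, x @ W)        # identical to the unsharded layer
```

Pairing a column-parallel layer with a row-parallel one (split along the input dim, combined with an all-reduce) is the pattern that keeps communication to one collective per transformer block.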
Theme 5: AI Makes Strides in Law and Healthcare
- AI Decodes Legal Jargon in New Seminar Series: The Silicon Valley Chinese Association Foundation hosts a seminar on AI in legislation, featuring the founder of Legalese Decoder, exploring how AI simplifies complex legal documents.
- Sophont Aims for Medical AI Revolution: Sophont launches to build open multimodal foundation models for healthcare, striving to create a DeepSeek for medical AI.
- Dream On! Rem App Wants You to Journal Your Nighttime Adventures: Rem introduces a dream journaling app that allows users to record, analyze, and share dreams, leveraging AI to uncover hidden patterns in their subconscious.
PART 1: High level Discord summaries
Manus.im Discord
- R1 Users Rage Over Credit Crunch: Many R1 users expressed dissatisfaction with the new credit system, with some experiencing complete credit depletion after a few requests, and recommended alternative AI research tools like Traycer to save credits.
- They observed that the system feels like gambling and proposed clearer, more transparent options for future plans, urging a reconsideration to support user adoption.
- Decoding Credit Consumption: Credits deplete based on LLM tokens, virtual machines, and third-party APIs, increasing with task complexity and runtime, and are now consumed even for just browsing online.
- Members reported projects failing to upload after costing 800 credits plus 1,800 more for debugging, pointing out that debugging on ChatGPT was superior.
- OpenManus Gains Traction: Despite security concerns around PATs and API keys, there's rising interest in OpenManus, with some planning to evaluate its capabilities and one member asking whether the tool's output could improve.
- Members caution about capability deficiencies when adapting it to Manus's work scenarios, but also point out that it can generate interactive study guides as websites and do in-depth research, depending on the situation.
- Manus Now Offers Website Hosting: Members report success using Manus to create hosted websites, pointing out that the software provides DNS and hosting services, while combining services like Perplexity and Gemini Deep Research.
- A member mentioned a video on website creation, leading other members to ask how to get people to actually use the resulting websites.
- Manus Android App Debuts: Users discover that Manus has an Android app, accessible via the browser by clicking a phone icon, which redirects to the Play Store.
- Some members even jokingly suggested purchasing an iPhone as a solution.
LMArena Discord
- Meta Models' Safety Settings Get Downgraded: Newer models from Meta are becoming safer by sanitizing censored details when inferring hidden context from corrupted text, marking a shift in model behavior.
- Previous models like Themis, Cybele, and Spider were eager to go where other models couldn't.
- Decoding the 'Venom' System Prompt: Members analyzed the system prompt for models like Spider, Cybele, and Themis, believing they share a similar prompt to the now-exposed venom prompt.
- The analysis reveals a whacky but intelligently crafted prompt that heavily influences the models' style and responses, particularly in how they format and structure their outputs.
- Gemini 2.5 Pro's 'Aliveness' Sparks Debate: Members expressed intrigue over the aliveness and curiosity of Gemini 2.5 Pro, with one suggesting it might be the first to pass a serious Turing Test due to its unique interaction style and exceptional creative writing.
- They highlight Gemini's top scores on Philip's SimpleBench as evidence of its potential and note the model appears to be more creative and engaging, leading to calls for a double-blind Turing Test.
- LMArena Unleashes New Models into the Pantheon: LMArena introduced a flood of anonymous models like Aether, Maverick, Ray, Stargazer, Riveroaks, with members trying to uncover their origins and capabilities.
- Stargazer is said to be made by Google (the same model as Nebula), and Riveroaks claims to be OpenAI's GPT-4o, while Maverick, Spider, and 24_karat_gold seem to share a similar style due to their common system prompts and origins at Meta.
- Alpha Arena Adds Copy Code and Images: The Alpha Arena now features a copy-code function and image generation capabilities, enhancing accessibility.
- Testers are encouraged to provide feedback via a Google Forms link and report bugs via an Airtable link.
Cursor Community Discord
- Reasoning Ability of Gemini 2.5 Pro Debated: Members are debating the reasoning capabilities of Gemini 2.5 Pro, with some finding it quick but lacking depth, while others praise its performance in specific coding scenarios, citing a tweet from Min Choi.
- Some suggested Claude 3.7 handles complexity and detail more effectively; nevertheless, the new Gemini 2.5 Pro model is now available in Cursor (see a tweet from Ryan Carson).
- Account Restrictions Ignite Trial Abuse Discussion: A user's account limitations sparked a debate about trial abuse, with claims of accounts being flagged for abuse and requiring a credit card.
- Alternatives like Windsurf or Cline were suggested to bypass payment issues, but no further details were provided about how to use those tools or their reliability.
- AI's Impact on Jobs Discussed: Members are discussing AI's potential impact on employment, speculating that 86% of jobs could be replaced by 2030.
- The common response was to learn ML/AI and prompting properly, with the additional suggestion to learn polynomial regression.
- Cursor's Charging for Free Models Questioned: Members questioned Cursor for charging to use a free model, with explanations clarifying that Cursor manages API usage through their wallet and has deals with AI models via Fireworks.
- The general consensus was that Cursor has limited token usage but is approximately 10x cheaper than Claude, offering a more cost-effective solution for some users.
Unsloth AI (Daniel Han) Discord
- Multi-GPU Support Marches to Unsloth: Unsloth is adding multi-GPU support, with the first release focusing on data parallelism; FSDP (Fully Sharded Data Parallelism) may not be included initially.
- The FSDP component will be released under the AGPL-3.0 license.
- DeepGrove's Bonsai Boasts Budget BitNet Bootstrapping: A member is skeptical about DeepGrove's Bonsai claiming to pretrain a BitNet with only $70 and 3.8b tokens.
- They're running the model in Kaggle to check its validity, exploring whether it is a blindly copied Qwen model or a Qwen model continue-trained into a BitNet.
- Unsloth Dataset Defect Detected: A user ran into a `ValueError` when using a custom dataset in Unsloth Orpheus format, which was later resolved by using a GPU.
- Another user mentioned that the Orpheus dataset uses SNAC, which operates at 24kHz.
- Gemma 3 Generates Text-to-Image Wonders: A user sought image and text inference samples for Unsloth/Gemma 3 using Hugging Face, referencing a Gemma 3 demo on Hugging Face Spaces.
- It was noted that while Llama 3.2 Vision requires an image, Gemma 3 should not have the same issue.
- Long Ctx Benchmarks? RULER is Rule!: For long ctx benches, a member stated that RULER is the bare minimum for what should be considered a long ctx bench, and NIAH is garbage.
- They added that some of the recent ones are alright.
Perplexity AI Discord
- Discord Revamp Imminent!: The moderation team is gearing up to overhaul the Discord experience, focusing on simplified onboarding, a consolidated feedback channel, and automated Pro Channel access over the next week.
- These changes aim to streamline user engagement and ensure the team stays responsive to community needs.
- Space Instructions Still Caged?: Users found out that Space Instructions in Perplexity AI have limitations on controlling the search experience, mainly impacting output summarization.
- Because instructions only apply after the data has been extracted, this prevents the AI from avoiding specific topics.
- Image Generation Goes MIA: Users noticed the disappearance of the image creation feature within Perplexity.
- While it isn't clear if the function is completely discontinued, one user suggested using the web search to find the generate option, but another confirmed that the function doesn't seem to appear for everyone, perhaps indicating phased rollout or feature testing.
- GPT Omni Receives Thumbs Down: Members have reported experiencing frustration with GPT Omni, with one describing it as suck ass.
- While designed for smarter interaction with audio, video, and images, users have noted that Omni has been dumbed down compared to GPT-4 for cost reasons.
- JSON Escapes Sonar API: A user reported issues with the Sonar API adding odd special characters to the JSON results when searching the web, despite using pydantic for formatting.
- The user provided an example where extra characters were added to the `source_name`, `source_title`, `summary`, and `url` fields in the JSON output.
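Issues like this are usually handled with a post-processing pass before validation; below is a minimal stdlib-only sketch of that idea (the cleanup rules and example field names are illustrative assumptions, not the user's actual code):

```python
import json
import re

def clean_json_text(raw: str) -> dict:
    """Strip stray control characters and literal escape sequences
    from an API response before parsing (illustrative cleanup only)."""
    cleaned = re.sub(r"[\x00-\x1f]", "", raw)  # drop raw control characters
    cleaned = cleaned.replace("\\n", " ")       # collapse stray "\n" escapes
    return json.loads(cleaned)

# A response with a stray escape sequence inside source_name:
raw = '{"source_name": "Example\\nSite", "url": "https://example.com"}'
doc = clean_json_text(raw)
print(doc["source_name"])  # "Example Site"
```

Note that blindly collapsing escape sequences can mangle legitimate ones, so in practice the cleanup rules would need to be tuned to the specific artifacts observed.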
OpenAI Discord
- ChatGPT Debuts Monday Voice: ChatGPT introduced a new voice option called Monday, accessible via the voice picker in the top right corner of voice mode, as shown in this demo video.
- Users can select the new Monday voice option by opening the voice mode and using the voice picker in the top right corner.
- Beware of Fake ChatGPT Apps!: Users reported encountering fake ChatGPT apps on the Play Store, where they purchased the app but did not receive access, emphasizing the need to verify purchases via their purchase history.
- It's crucial to ensure you're using the official app to avoid scams and ensure you have access to genuine OpenAI services.
- Gemini 2.5 Pro Rate Limits Plague Users: Users reported experiencing rate limits on Gemini 2.5 Pro, leading to discussions on whether the limit applies to both free and paid tiers, with some users trying to bypass rate limits by using a VPN.
- A suggestion was made to use Gemini in Google AI Studio instead, where the usage limits are higher (50 requests per day).
- ElevenLabs Model Promises Audiobooks: A member explored ElevenLabs' new model for narrated audio books, citing its voice cloning feature.
- While they were impressed with the initial results, they await OpenAI to release a similar voice product to avoid subscribing to external services, because it may be useful as a voice acting placeholder for game developers.
- Model Users RESET Rigid Patterns: A member shared a code snippet `FORMAT_RESET` to help models acknowledge when they've fallen into rigid patterns and rethink their approach.
- The code encourages the model to analyze what format would better suit the response and completely rethink its approach without defaulting to templates.
LM Studio Discord
- Gemma 3 Smokes Gemini 1.5: Gemma 3 27B outperformed Gemini 1.5 Flash in benchmarks like MMLU-Pro and Bird-SQL, with one member producing the data using Gemini 2.5 Pro, available for free on OpenRouter.
- A user with a 4060 Ti and i5 12400F was recommended Qwen Coder 7B, available on LM Studio's model page, though members emphasized that local LLMs generally perform worse than cloud alternatives.
- Gamers Plug eGPU into LM Studio: Members discussed the feasibility of using an eGPU with LM Studio, suggesting it should work if the computer recognizes it, though speeds may be slower, as referenced in a YouTube video comparing LLMs on RTX 4090 Laptop vs Desktop.
- Another user observed a 3.24x speedup from M4 Max to 5090 after resolving crashing issues, which aligns with the 3.28x ratio of their memory bandwidths when doing QwQ 32B 4 bit quant comparisons.
- Copilot's Code Called Garbage!: Members debated the benefits of AI assistance in programming, with one arguing it hurts more than helps due to learning from AI slop, expressing concern that Copilot is trained on garbage code.
- Others disagreed, saying Copilot works great for experienced developers, but one person noted average users trust the recommendations too easily.
- Context Size Drives Mac Preference: Despite faster Nvidia GPUs, users are leaning towards Macs for the freedom to have more context size, highlighting the utility of larger context sizes even with slower speeds.
- One user wondered what would happen if they could load the context overflow into the shared memory/system RAM while keeping the entire model in VRAM, but another user noted that the LLM needs all the context in VRAM to generate the next token.
- Nvidia Drivers Fail after 10 Hours: A user reported Nvidia driver instability after running models for 10-12 hours, requiring a driver reinstall to resolve performance issues, clarifying the issue was with the Nvidia driver itself, not the Windows OS.
- A user inquired about performance results for the Tenstorrent Wormhole (n150d and n300d) within the Discord community, expressing interest in obtaining TOK/s metrics for these models.
aider (Paul Gauthier) Discord
- Gemini 2.5 Pro is a Mixed Bag: Users are seeing mixed results with Gemini 2.5 Pro, with some models hallucinating, DC'ing, or being top-tier for coding tasks.
- One user found Gemini 2.5 Pro and DeepseekV3 to be "almost free and top tier", whereas others are giving up, and throwing away their computers, as shown in this GIF.
- RateLimitError Fixes Sought: Users are experiencing frequent RateLimitErrors when requesting summaries and clearing history.
- It was clarified that the rate limit is likely based on the number of requests per minute or day, and a possible solution may be found in this Github issue.
- Dot Command Revolution?: A user is promoting the use of .dotcommands as a productivity tool for developers, to automate tasks with single-line commands such as `.status` and `.next`.
- The goal is to provide cognitive shortcuts optimized for clarity and specific functionality, but it was noted that *THE DOT REVOLUTION IS HERE 🔥 Coders everywhere will want to try this one cool trick.*
- Aider's Subtree Savior Emerges: Members are seeking ways to limit aider to a subdirectory of a monorepo.
- The solution is to use the `--subtree-only` switch after changing to the desired directory, setting aider to ignore the repo outside the starting directory; however, the asker pointed out the FAQ on large monorepos.
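As a usage sketch (the subdirectory name here is hypothetical):

```shell
# From the monorepo root, enter the subdirectory first, then start aider
# restricted to that subtree:
cd packages/my-service
aider --subtree-only
```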
- The Case of the Misconfigured Model: A member reported that specifying model names in a local YAML config file wasn't working as expected.
- Despite the startup message showing the correct config settings, aider still defaulted to anthropic/claude-3-7-sonnet-20250219 rather than the configured deepseek/deepseek-chat.
OpenRouter (Alex Atallah) Discord
- Organizations Feature Escapes Beta!: OpenRouter announced the Organizations feature is out of beta, enabling teams to manage billing, data policies, provider preferences, and API keys in one place, according to this X post.
- Over 500 organizations were created during the two-week beta, offering full control over data policies and billing.
- Web Search Invades Chatroom!: Web search results are now integrated into the chatroom, using Perplexity results formatted similarly to OpenRouter's `:online` model variants.
- A user requested that OpenRouter post on Bluesky to avoid Xitter reliance.
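For reference, the `:online` convention appends a suffix to the model id in an OpenAI-style request body; a sketch of constructing such a payload (the model id and schema details are assumptions for illustration):

```python
import json

# Hypothetical OpenRouter-style request body; appending ":online" to the
# model id selects the web-search variant described above.
payload = {
    "model": "openai/gpt-4o:online",
    "messages": [{"role": "user", "content": "Summarize today's AI news."}],
}
body = json.dumps(payload)
print(json.loads(body)["model"])  # "openai/gpt-4o:online"
```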
- Gemini Flash 2 Transforms!: OpenRouter offers full 1M context on paid Gemini Flash 2 requests, with middle-out transforms being opt-in.
- These transforms are applied by default on endpoints with context length less than 8192 tokens, and only once the 1M limit is reached.
- Usage Downloads Coming Soon!: A member requested downloads of their usage data, including tokens and costs, as displayed on the activity page, for credit verification.
- A maintainer responded that while this feature is unavailable, we're working on it.
- EU Provider Selection Quandary!: A user inquired about selecting providers residing only in the European Union due to legal requirements.
- A maintainer noted the need but mentioned limited coverage, recommending seeking an EU certified provider for strict EU data guidelines if provider selection is not enough.
Eleuther Discord
- Stanford Teaches Transformer Class, Online: Stanford has opened its CS25 Transformers seminar course to the public via Zoom, featuring discussions with researchers and covering topics from LLM architectures to creative applications and past lectures available on YouTube.
- The course includes lectures, social events, networking sessions, and a Discord server for discussions.
- Deep Sets finds Triangles, Achieves Nothing: A member shared a link to a paper titled Deep Sets for the Area of a Triangle (arxiv link), which presents a polynomial formula for triangle area in Deep Sets form.
- The abstract concludes that the project, motivated by questions about computational complexity of n-point statistics in cosmology, gained no insights of any kind.
- Neuronpedia Opens the Data Floodgates!: The interpretability platform Neuronpedia is now MIT open source and available on GitHub, offering a quick Vercel deploy.
- A trove of interpretability data, totaling over 4 TB, is available for download as Public Datasets.
- SmolLM Scores Zeroes Out, PR fixes Aggregate Scores: A member reported that aggregate scores for tasks like leaderboard_bbh, leaderboard_math_hard, and leaderboard_musr were empty in the results JSON when running leaderboard evaluations with lm-eval on SmolLM-1.7B.
- A member shared a PR adding subtask aggregation to address missing aggregate scores in tasks with subtasks.
Interconnects (Nathan Lambert) Discord
- CodeScientist Automates Science: AllenAI introduces CodeScientist, an autonomous system using genetic search over research articles and codeblocks to generate and evaluate machine-generated ideas, with 19 discoveries from experiments in agents and virtual environments detailed in their paper.
- The system addresses limitations in current ASD systems by exploring broader design spaces and evaluating research artifacts more thoroughly.
- OpenAI Teases Open-Weight Model: OpenAI plans to release its first open-weight language model since GPT-2, seeking developer feedback via this form to maximize its utility, according to Sam Altman's tweet.
- Altman stated that they will not do anything silly like saying that you cant use our open model if your service has more than 700 million monthly active users.
- Meta Preps Screened Smart Glasses: Meta is planning to launch $1000+ Smart Glasses with a screen and hand gesture controls later this year, according to Mark Gurman's report.
- Members are interested to see how they'll do against xreal.
- Pydantic Evaluates LLMs: Pydantic Evals is a powerful evaluation framework designed to help systematically test and evaluate the performance and accuracy of the systems you build, especially when working with LLMs.
- It provides a structured environment for assessing model capabilities and identifying areas for improvement.
- Lambert Returns to OpenAI: Nathan Lambert shared his thoughts on OpenAI returning in a substack post, mentioning that he may use this format for unbaked career thoughts too.
- He also mentioned DMing some OpenAI folks about it, hoping to find allies of open source who feel exiled by the current situation.
GPU MODE Discord
- A100 Parallel Threads Face Reality Check: Members debated the maximum number of parallel threads on an A100 GPU, but practical tests using GeoHot's tool revealed a limit of 24576 or 256 threads per SM before performance degrades.
- The conversation clarified that GPUs use oversubscription to hide latencies with cheap (~1 cycle) context switches, suggesting adding threads beyond the "parallel threads" limit doesn't significantly increase runtime.
- FlexAttention Lets it All Hang Out: FlexAttention now supports arbitrary sequence lengths, removing the restriction for segment sequence lengths to be a multiple of 128 in PyTorch 2.6.
- This enhancement was discussed with Horace He at a GPU mode event in San Jose.
- Desire for Memory Savings Using Tensor Deletion: A user seeks methods to delete argument tensors within a loss function to achieve memory savings of approximately 7GB, with a related GitHub Issue available.
- The user wants to free the storage associated with a tensor after it's no longer needed, even if a reference exists in the outer scope, while ensuring it remains compatible with torch compilation to avoid graph breaks.
- Apple Pushes MLX to the Max: Apple is hiring engineers for their MLX team to build scalable, distributed training and research pipelines, advancing the frontier of ML and systems.
- The company seeks system engineers and software developers with a background in ML to build technologies powering future products.
- Megatron Tensor Parallelism gets the Deep Dive Treatment: A member wrote an illustrated deep-dive into Megatron-style tensor parallelism, including the fused/parallel CE loss, and seeks feedback on the content, available here.
- The article aims to deepen the understanding of ML scalability & performance techniques.
Latent Space Discord
- Cursor Codes Cash with Huge Round: Cursor closed a $625M funding round at a $9.6B post-money valuation, led by Thrive and A16z, with Accel as a new backer, and achieved $200M ARR, a 4x increase from its previous round in November 2024, according to The Information.
- The round sparked chatter about vibe coding, with Abe Brown noting the company's valuation has rapidly grown, approaching $10B.
- Etched Etches $85M for Transformer ASIC: Etched, a startup building transformer ASICs, closed an unannounced $85M round at a $1.5B valuation, following two stealth rounds at $500M and $750M, according to Arfur Rock.
- The company claims its chip Sohu can process over 500,000 tokens per second running Llama 70B, with one 8xSohu server replacing 160 H100s, although it cannot run CNNs, LSTMs, SSMs, or other AI models.
- OpenAI Opens the Gates with Open-Weight Model: OpenAI plans to release its first open-weight language model since GPT-2 in the coming months, seeking developer feedback on how to maximize its utility, according to OpenAI.
- The company will evaluate the model using its preparedness framework and host developer events in SF, Europe, and APAC, with Nathan Lambert anticipating a 30B parameter reasoning model with MIT/Apache license, according to his tweet.
- OpenDeepSearch Searches Deeper than GPT-4o: Seoong79 announced the release of OpenDeepSearch (ODS), an open-source search agent that works with any LLM and outperforms OpenAI’s specialized model for web search, GPT-4o-Search, on the FRAMES benchmark from DeepMind, according to his tweet.
- Specifically, OpenDeepSearch achieved +9.7% accuracy over GPT-4o-Search when paired with DeepSeek-R1.
- Sophont Seeks to Solve Medical AI with Open Models: iScienceLuvr announced the launch of Sophont, a company building open multimodal foundation models for the future of healthcare, aiming to create a DeepSeek for medical AI, according to his tweet.
- The new company seeks to create foundation models that can perform well in healthcare.
HuggingFace Discord
- DeepSeek R1 Zips Past Rivals: A tweet lauded DeepSeek R1 for outperforming larger Western labs with efficient resource utilization and a permissive MIT license.
- The release also democratized RL for the GPU poor through GRPO.
- xAI Swallows X Corp!: xAI acquired X in an all-stock deal, valuing xAI at $80 billion and X at $33 billion according to Elon's Tweet.
- The merger aims to combine xAI's AI expertise with X's extensive user base.
- LLM Hyperparameter Tuning Hot Takes: Members sought guidance on selecting hyperparameters for fine-tuning LLMs, and were directed to Unsloth's LoRA Hyperparameters Guide.
- The question focused on how contextual changes influence hyperparameter settings.
- Coding Model OpenHands LM Opens!: The open coding model OpenHands LM, a 32B parameter model, is now on Hugging Face.
- The coding model is intended for use in autonomous agents for software development, as mentioned on the project blog.
- Gradio Rides the Million Developer Wave!: Gradio announced they've hit 1,000,000 monthly active developers for building and sharing AI interfaces.
- The Gradio team expressed gratitude for the community's contributions.
Modular (Mojo 🔥) Discord
- MAX 25.2 Livestream Glitches: Modular's MAX 25.2 livestream faced technical difficulties, but a cleaned-up recording and Chris' lightning talk are now available on YouTube and YouTube, respectively.
- The team apologized and promised a better system for future events, with one member humorously mistaking a GTC video of Chris for the live event.
- Compiler Error Bugging Users: A user reported a confusing compiler error message when defining a method for a `Dataset` struct, suspecting a compiler bug; see GitHub issue #4248.
- A potential cause could be using `out self` instead of `mut self`, highlighting the need for clearer error messaging.
- Enums MIA in Mojo: Inquiries about enum updates in Mojo revealed that there are no updates available at this time.
- The response was a simple "Sadly no. 🙃🙃🙃"
- FlexAttention MAXed Out?: A user inquired about implementing flex-attention in Mojo, linking to a PyTorch blog post and suggesting it as a custom op in MAX.
- The response indicated that Mojo on the GPU is close to CUDA and *"unless you run into something that's a work in progress, MAX should be able to do more or less whatever you want."*
- Float-to-String Algorithm Fails to Float: A user ported a new float to string algorithm to Mojo from this code, referencing the creator's CPPCon talk, but found it slower than the standard library's dragonbox implementation.
- Stringifying `canada.json` went from the mid 30ms range to the low 40s, despite ripping the formatting from the standard library.
Nous Research AI Discord
- OpenAI API Gets One-Line Fix: Any tutorial working with the OpenAI API should work with the Nous Research AI API, provided the endpoint is changed to `endpoint = "api.nousresearch.com"`.
- A member confirmed the fix and noted that they will be adding styles.
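A stdlib-only sketch of that swap, building (but not sending) an OpenAI-style chat request against the new host; the `/v1/chat/completions` path and model id are assumptions layered on top of the endpoint above:

```python
import json
from urllib.request import Request

endpoint = "api.nousresearch.com"  # the one-line change from an OpenAI tutorial

# Construct the request object only; path and model id are illustrative.
req = Request(
    f"https://{endpoint}/v1/chat/completions",
    data=json.dumps({
        "model": "hermes-3",  # hypothetical model id
        "messages": [{"role": "user", "content": "Hello"}],
    }).encode(),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    method="POST",
)
print(req.full_url)  # https://api.nousresearch.com/v1/chat/completions
```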
- Midjourney Models Write Creatively: Midjourney released a new research paper with NYU on training text-based large language models (LLMs) to write more creatively, moving past image generation.
- The company also revealed it is developing its own AI hardware, announced in late summer 2024.
- Sam Altman Teases Open-Weight Model: Sam Altman announced plans to release a new open-weight language model with reasoning capabilities, seeking developer feedback through events in SF, Europe, and APAC, according to this announcement.
- This marks OpenAI's first open-weight model release since GPT-2.
- DeepSeek Jiu Jitsu saves the Open Source Community: Members expressed gratitude to DeepSeek for their sophisticated maneuvers in enabling an Open Source community.
- The sentiment was linked to this YouTube video discussing OpenAI's shifting strategy related to the open-weight model.
- CamelAIOrg Launches Project Loong 🐉: CamelAIOrg introduces Project Loong 🐉, a structured, modular solution for generating and verifying synthetic data, and this blog post details the integration of synthetic data generation with semantic verification.
- The project features a multi-agent framework ensuring accuracy and consistency.
Yannick Kilcher Discord
- Graphs Experience Learning Renaissance: A Google Research blogpost highlights the evolution of graph learning since 2019, tracing the history of graph theory back to Leonhard Euler in 1736 and its applications in modeling relationships.
- Community members showed great interest in recent advancements of the area.
- AI/ML Reshapes Job Landscape: Recent AI/ML improvements primarily impact low-level jobs, such as minor programming tasks, yet human adaptation remains crucial; AI/ML reduces dependencies on others, as exemplified by its role in initial legal assistance.
- This shift saves resources and enables multidisciplinary tasks, suggesting a significant restructuring of professional roles.
- RLHF Produces Nerfed Models: Concerns rise over RLHF leading to emergent misalignment if models are penalized for useful tasks such as ML R&D, potentially resulting in open-source models becoming increasingly evil as they compensate for suppressed behaviors.
- Discussion also touched on whether open-source models may become nerfed.
- Gemini 2.5 Pro Bombs Math Test: Testers found Gemini 2.5 Pro (experimental) to be totally trash in math, with issues in UI math display, while ChatGPT and Grok 3 demonstrated superior question comprehension in information theory and geometry.
- Results led the user to guide the language model to write correctly.
- AI Model Feedback Aired Out: With the launch of the OpenAI Open Model Feedback forum, there was renewed discussion of Ilya Sutskever's quote that *"if there was one great failing it would be that you always had to check the results"*.
- The forum aims to improve models using community input.
MCP (Glama) Discord
- Pichai Promotes MCP?: Sundar Pichai's tweet asking 'To MCP or not to MCP, that's the question' has ignited significant interest in MCP, amassing over a million views.
- The moderator of `/r/mcp` even proposed hosting an AMA if Google adopts MCP.
- ActivePieces Abandons MCP!: Active pieces, an open-source Zapier alternative, discontinued its support for MCP.
- No reason was stated, but it might be related to the MCP protocol still undergoing active development, along with the growing pains of many MCP-related side projects being deprecated.
- MCP RBAC approaches Explored: Users are exploring Role-Based Access Control (RBAC) implementations on MCP servers for segmented tool visibility, with one suggestion being integration with WorkOS.
- Another member mentioned Toolhouse API handles RBAC based on the API key.
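A minimal sketch of the idea, assuming the API key resolves to a role and the tool list is filtered per role (all key, role, and tool names here are hypothetical):

```python
# Hypothetical role-based tool visibility for an MCP-style server.
API_KEY_ROLES = {"key-admin": "admin", "key-analyst": "analyst"}
ROLE_TOOLS = {
    "admin": {"read_db", "write_db", "deploy"},
    "analyst": {"read_db"},
}

def visible_tools(api_key: str) -> set[str]:
    """Return the set of tools a caller may see; empty for unknown keys."""
    role = API_KEY_ROLES.get(api_key)
    return ROLE_TOOLS.get(role, set())

print(sorted(visible_tools("key-analyst")))  # ['read_db']
```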
- SDK Governance goes Open Source!: An open source SDK for enterprise governance (Identity, RBAC, Credentials, Auditing, Logging, Tracing) within the Model Context Protocol framework, is available at ithena-one/mcp-governance-sdk.
- Community feedback is welcomed.
- Asynchronous MCP Cometh: The extension MCPC mitigates MCP's synchronous limitations by adding asynchronous support.
- It maintains backwards compatibility, so existing setups remain functional, while the new features are available to both client and server setups.
Notebook LM Discord
- NotebookLM Chases Webby Wins: NotebookLM is nominated for three Webby Awards and is asking for community votes at this link.
- Voters should confirm their votes by clicking the verification link in their email, and check their spam folder.
- Google Tasks Tempts Integration: A user suggested that Google Tasks could integrate with NotebookLM by allowing users to pick a task list via a dropdown/popup.
- They proposed that this could work similarly to how Google Tasks allows selecting a task list for sharing.
- Archival Aspirations Arise: A user requested a way to archive notebooks in NotebookLM to hide them and reduce the number of notebooks counting against their limit.
- They suggested that hidden/archived notebooks should not appear in the list of notebooks available for sharing content.
- Gemini 2.5 Pro: Prompting Parity: A user requested that the NotebookLM IA be updated to Gemini 2.5 Pro, citing their love for the updated Gemini version.
- They hope that NotebookLM will perform even better with the new model, but the NotebookLM team has not commented on any ETAs.
- Notes, Not Sources Needed: A user with personal notes managed in Obsidian (2000+ short notes) finds the 300-note limit restrictive.
- They propose limiting the total number of words instead of the number of sources to better accommodate mesh note systems; a user suggests that folders or zipped files as a single source would also solve the problem.
Torchtune Discord
- Torchtune Scheduled for Next Friday: Members announced that the next Torchtune office hours will be held next Friday, linking to the Discord event.
- Members celebrated Discord's automatic timezone conversion feature.
- Hurry Review PR #2441: A member requested a final review for PR #2441 to expedite the merge process.
- Regression testing for PR #2477 is paused awaiting Qwen model upload to S3 for download during the regression test script, but the S3 bucket hookup is encountering internal infra snags.
- Llama2 Called Geriatric: A member suggested swapping the regression tests using the Llama2 model with something more current.
- It wasn't clear if the member's issues were related to regression test failures or simply the test suite using older components.
- Recursive Reshard Routine Removed: PR #2510 removes the `recursive_reshard` utility because it wasn't needed.
- This PR was initially intended to address #2483, but further examination revealed the utility was unnecessary.
tinygrad (George Hotz) Discord
- ImageDtype's Purpose Revealed: A member asked about the purpose of ImageDtype and the IMAGE environment variable in tinygrad, referencing its influence on Tensor.conv2d implementation with a link to a VAE training script.
- Another member thinks that it is related to accelerating comma.ai models on Qualcomm (QCOM) hardware, by utilizing mobile GPUs' texture performance and caching.
- tinygrad BEAM Leaves tf-metal in the Dust: A user reported performance gains on an M1 Pro, going from 3.2 it/s without BEAM to 28.36 it/s with BEAM=2; while Keras with tf-metal achieved about 25 it/s.
- George Hotz was pleased to see that it's "faster than tf-metal with BEAM!"
- Mobile GPUs Get Accelerated Via Textures and ImageDType: Discussion suggests ImageDType and associated functions optimize for mobile GPUs' texture performance, referencing a Microsoft research paper on mobile GPUs.
- A member questioned the hardcoding of layout specifics and suggested HWC (Height, Width, Channel) handling should be part of normal conv2d with user-defined padding.
- arange() Algorithm Optimized: A member identified suboptimal code generation for small arange ranges (e.g., `arange(1, 2, 0.1)`) compared to larger ranges (e.g., `arange(1, 10, 0.1)`) and documented their findings on `.arange()` here.
- They also noticed an unnecessary addition in the generated code, proposing a fix from `((float)((ridx0+1)))*0.1f)+0.9f)` to `(((float)((ridx0)))*0.1f)+1.0f)`.
LlamaIndex Discord
- LLM Agents Open New Frontiers for Docs: An underrated use case for LLM agents is every field that depends heavily on complex technical documentation like manufacturing, construction, and energy, where an agent can do structured extraction from documents.
- These docs are often full of screenshots as mentioned in this tweet.
- OpenAI RateLimitError Hinders ReAct Agent Locally: A user encountered an OpenAI RateLimitError (Error 429) when using a ReAct agent with a local model set up via Ollama, questioning if ReAct agents are exclusively for OpenAI LLMs, with setup details in their GitHub repository.
- The suggestion was that the embedding model might be the cause of the OpenAI error, as it could be defaulting to OpenAI's embedding model if not explicitly set, even though the user confirmed that they are using a Hugging Face embedding model, set during document creation.
- VectorStoreIndex Setup Needs LLM and Embedding Model: It was advised to pass in both the `llm` and `embed_model` when creating the VectorStoreIndex.
- Also, make sure to specify `llm` when calling `index.as_query_engine()`.
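A sketch of that advice, assuming recent `llama_index` core and integration packages (exact import paths, model classes, and the `data` directory are assumptions that vary by version and setup):

```python
# Sketch only: pass explicit llm and embed_model so nothing silently
# defaults to OpenAI (which would trigger the RateLimitError above).
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3")  # local model via Ollama (hypothetical choice)
embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"  # hypothetical embedding choice
)

docs = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(docs, embed_model=embed_model)
query_engine = index.as_query_engine(llm=llm)  # specify llm here too
```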
Nomic.ai (GPT4All) Discord
- GPT4All Expands Globally with Translations: Official translations have been rolled out for the GPT4All documentation, now supporting Simplified Chinese, Traditional Chinese, Italian, Portuguese, Romanian, and Spanish.
- This broadens accessibility and usability of GPT4All for non-English speaking developers.
- Users Debate Llama3 8B Instructor Model Use Case: A user inquired whether the Llama3 8B Instruct model is optimal for generating blog posts and web pages from video and text-based course materials.
- Another user requested that they rephrase their question.
- Clarification on .bin vs .gguf File Formats: A user initially questioned the interchangeability of .bin and .gguf file formats.
- The user then retracted the question, noting they were mistaken about the incompatibility.
LLM Agents (Berkeley MOOC) Discord
- MOOC Quizzes Completion-Based: Members confirmed that the MOOC quizzes are completion based.
- Instructors hope students will attempt their best for their own learning.
- Llama 3 Cookbook Unveiled: The LLM Agents Cookbook mentioned in the Week 5 coding agents lecture refers to the Llama 3 cookbook found here.
- Meta released the Meta Llama 3 family of LLMs in 8 and 70B sizes, optimized for dialogue use cases and outperforming other open source chat models on industry benchmarks according to their blogpost.
- Loong Verifiers Validate Reasoning Models: As discussed in Project Loong, Large Reasoning Models like DeepSeek-R1 greatly improved general reasoning when base models undergo post-training with Reinforcement Learning (RL) with a verifiable reward.
- The ability to verify accuracy is crucial for improving domain-specific capabilities, particularly in mathematics and programming.
- High-Quality Datasets Enhance CoT Learning: The consensus is that abundant, high-quality datasets, featuring questions paired with verified correct answers, are a critical prerequisite for models to learn to construct coherent Chains-of-Thought (CoTs).
- The community believes that these datasets provide the necessary signals for models to reliably arrive at correct answers.
Cohere Discord
- Command A Screams Eternally: A user found that Command A gets stuck generating the same character endlessly when encountering a context where a character is screaming with repeated letters.
- This issue occurs even with default API Playground settings, freezing the interface and preventing feedback, reliably reproduced with prompts like "Please generate a scream in fiction inside quotation marks".
- Rem App wants you to journal Dreams: A user shared Rem, a dream journaling app created with a friend to easily record, analyze, and share dreams.
- The app aims to provide a platform for users to log their dreams and gain insights into their subconscious.
- New Cohere Members make Introductions: The community welcomes new members to the Cohere Discord server, encouraging them to introduce themselves and share what they're working on.
- New members are prompted to share their company, favorite tech tools, and what they hope to gain from this community.
- Members eager to participate and learn: New members are eager to participate, learn, and get feedback on their projects.
- They are excited to engage in discussions about their favorite technologies and tools within the community.
MLOps @Chipro Discord
- Decoding Legalese Seminar: The Silicon Valley Chinese Association Foundation (SVCAF) will host a seminar on April 2, 2025, discussing AI applications in legislation, featuring the Founder of Legalese Decoder.
- The seminar will explore how AI, ML, and NLP simplify legal documents for public understanding.
- SVCAF Launches AI4Legislation Competition: SVCAF is holding a competition this summer to develop open-source AI solutions for citizen engagement in the legislative process, with details available in the official Github repo.
- The competition aims to harness AI's power to make legislative processes more equitable and effective, aligning with SVCAF's mission to educate the Chinese community in public affairs.
- AI4Legislation Seminar Series to Commence: The AI4Legislation seminar series will recur during the first week of each month to provide project guidance and information about legislative AI tools, accessible here.
- Each seminar features a different guest sharing insights on utilizing AI to address key challenges in lawmaking, exploring the potential of AI-driven governance.
AI21 Labs (Jamba) Discord
- Multilingual User Misses Poll: A member noted their absence from a recent poll, mentioning they regularly communicate in both French and English.
- They also indicated occasional use of Greek and Hebrew.
- AI21 Labs Discussed: The discussion briefly touched on AI21 Labs and their new Jamba model.
- However, no specific details or opinions about the model were shared.
Codeium (Windsurf) Discord
- Windsurf Sounds Kickstarts Auditory UX: Windsurf AI debuted Windsurf Sounds, their initial project in sound design and Auditory UX, with the goal of boosting flow state and productivity.
- Check out the full video announcement on X.com for more details.
- Windsurf Next Beta Program Opens to Early Adopters: The Windsurf Next Beta program is ready for early testers to check out new features, with downloads available at Codeium.com.
- Minimum requirements include OS X Yosemite, glibc >= 2.28 for Linux, and Windows 10 (64-bit).
Gorilla LLM (Berkeley Function Calling) Discord
- v0 Dataset: Vanished or Merged?: A member inquired about the fate of the v0 openfunctions dataset and whether it was completely merged into the v1 dataset.
- The discussion seeks to understand the architectural changes and data migration strategies, if any, between the v0 and v1 versions of the openfunctions dataset.
- Architectural Changes in Datasets: The conversation explores the architectural changes between the v0 and v1 versions of the openfunctions dataset.
- The members seek to understand the data migration strategies, if any.
The DSPy Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
Manus.im Discord ▷ #showcase (1 messages):
Amazing case
- Case Gets Lauded as "Amazing": A member labeled a certain case as amazing, using celebratory emojis, but shared no details or context about what the case refers to or why it is considered exceptional.
- The lack of context leaves the community wondering about the nature and significance of this purportedly noteworthy event.
Manus.im Discord ▷ #general (753 messages🔥🔥🔥):
Manus credits, Credit system, Pricing Structure, Token-Based System
- R1 Users Lament New Credit System: Many R1 users expressed dissatisfaction with the new credit system, especially because testing projects often exhausts credits quickly, with some experiencing complete credit depletion after just a few requests; members recommended alternative AI research tools to save credits.
- They observed that the system feels like gambling, proposed clearer and more transparent options for future plans, and urged that it be reconsidered for the sake of user adoption.
- Decoding Manus' Credit Consumption Mechanism: Credits are depleted based on LLM tokens, virtual machines, and third-party APIs, increasing with task complexity and time; tasks now consume credits even when just browsing online, which makes things hard for those doing programming.
- Members pointed out that projects failed to upload, with some citing 800 credits spent plus 1800 more on debugging, and noting that debugging on ChatGPT was superior.
- OpenManus Open-Source Alternative Gains Traction: Despite security concerns around PAT and API keys, there's rising interest in OpenManus, with some planning to evaluate its capabilities, though members caution of capability deficiencies when adapting it to Manus' work scenarios.
- A member asked whether the tool's output could improve, prompting replies that it can generate interactive study guides as websites and in-depth research, but that it depends on the situation.
- Manus Offers Support for Creating and Hosting Websites: Members are reporting success with Manus in creating a hosted website, pointing out that the software provides DNS and hosting services, while members also report combining services like Perplexity and Gemini Deep Research.
- One member said there's a video if you would like to watch this, leading other members to inquire about how to get people to use the website.
- Android App for Manus Is Available: Users discovered that Manus has an Android app, accessible via the browser by clicking a phone icon, which redirects to the Play Store, while some suggested purchasing an iPhone to solve the issue.
- Don't ask to ask, just ask: no description found
- Reve: Bring your ideas to life: no description found
- How Bankuet Works For Food Banks: Bankuet is a platform for food banks that turns donations into the supplies that food banks need most. Find out more here: https://www.bankuet.co.uk/ Credits:…
- Three.js Platformer Game: no description found
- Leonardo.Ai: Create production-quality visual assets for your projects with unprecedented quality, speed and style-consistency.
- Whale From Http://Headlikeanorange.Tumblr.Com/ GIF - Whale Sleeping Sea - Discover & Share GIFs: Click to view the GIF
- Daft Punk Disintegration GIF - Daft punk Disintegration Robot - Discover & Share GIFs: Click to view the GIF
- Manus: Manus is a general AI agent that turns your thoughts into actions. It excels at various tasks in work and life, getting everything done while you rest.
- Whale Swimming GIF - Whale Swimming Nature - Discover & Share GIFs: Click to view the GIF
- Hotel Management SaaS System (Hệ thống SaaS Quản lý Khách sạn): no description found
- Lucifer Well Hello There GIF - Lucifer Well Hello There Hello - Discover & Share GIFs: Click to view the GIF
- Chat Context Tracking Extension for LLMs - Manus: Manus is a general AI agent that turns your thoughts into actions. It excels at various tasks in work and life, getting everything done while you rest.
- GitHub - punkpeye/awesome-mcp-servers: A collection of MCP servers.: A collection of MCP servers. Contribute to punkpeye/awesome-mcp-servers development by creating an account on GitHub.
- Traycer: AI-Powered Pair Programming: An AI-powered coding assistant that plans, implements, and validates every change 🚀
- Lionel B Crypto GIF - Lionel B Crypto Currency - Discover & Share GIFs: Click to view the GIF
LMArena ▷ #general (977 messages🔥🔥🔥):
Meta Model Safety Downgrades, Decoding 'venom' Prompts, Gemini 2.5 Pro's 'Aliveness', New LMArena models
- Meta Models Get a "Safety" Downgrade: Newer models from Meta are reportedly becoming safer, with one member noting the shift by testing how the AI infers hidden context from corrupted text and observing that the models now apparently sanitize the censored details.
- In contrast, previous models like Themis, Cybele, and Spider were eager to go where other models couldn't.
- Decoding the "Venom" Prompt: A System Prompt Analysis: Members analyzed the system prompt for models like Spider, Cybele, and Themis, believing they share a similar prompt to the now-exposed venom prompt.
- The analysis reveals a whacky but intelligently crafted prompt that heavily influences the models' style and responses, particularly in how they format and structure their outputs.
- Gemini 2.5 Pro's Spooky "Aliveness" Sparks Turing Test Debates: Members express intrigue over the aliveness and curiosity of Gemini 2.5 Pro, with one suggesting it might be the first to pass a serious Turing Test due to its unique interaction style.
- They highlight Gemini's exceptional creative writing capabilities and top scores on Philip's SimpleBench as evidence of its potential and note the model appears to be more creative and engaging, leading to calls for a double-blind Turing Test.
- LMArena Introduces a Pantheon of New Models: LMArena introduces a flood of anonymous models like Aether, Maverick, Ray, Stargazer, Riveroaks, with members trying to uncover their origins and capabilities.
- Stargazer is said to be made by Google (=== Nebula), and Riveroaks claims to be from OpenAI, gpt 4o, while Maverick, Spider and 24_karat_gold seem to have a similar style due to their shared system prompts and origins at Meta.
- We are finally beginning to understand how LLMs work: No, they don't simply predict word after word: Circuit tracing is a relatively new technique that lets researchers track how an AI model builds its answers step by step – like following the wiring in...
- Download - ZeroBrane Studio - Lua IDE/editor/debugger for Windows, Mac OSX, and Linux: no description found
- Models Table: Open the Models Table in a new tab | Back to LifeArchitect.ai Open the Models Table in a new tab | Back to LifeArchitect.ai Models Table Rankings Reasoning Models • 2024Q3–2025Q1 Data dictionary Model...
- Meta (?)'s `24_karat_gold` (lmarena) System Prompt: Meta (?)'s `24_karat_gold` (lmarena) System Prompt - prompt.txt
- SimpleBench: SimpleBench
- Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad: Recent math benchmarks for large language models (LLMs) such as MathArena indicate that state-of-the-art reasoning models achieve impressive performance on mathematical competitions like AIME, with th...
- LMArena's `venom` System Prompt: LMArena's `venom` System Prompt. GitHub Gist: instantly share code, notes, and snippets.
- Gemini 2.5: Our most intelligent AI model: Gemini 2.5 is our most intelligent AI model, now with thinking.
- Adding Qwen3 and Qwen3MoE by bozheng-hit · Pull Request #36878 · huggingface/transformers: Adding Qwen3This PR adds the support of codes for the coming Qwen3 models. For information about Qwen, please visit https://github.com/QwenLM/Qwen2.5. @ArthurZucker
LMArena ▷ #announcements (1 messages):
Alpha Arena updates, Copy Code feature, Image generation, Bug reports
- *Alpha Arena* Adds Copy Code and Images: The Alpha Arena now features a copy code function and image generation capabilities.
- Users can try out the new features at alpha.lmarena.ai using the password still-alpha.
- *Alpha Arena* Testers Requested to Give Feedback: Testers are encouraged to provide feedback via a Google Forms link and report bugs via an Airtable link.
- Outdated Browsers Cause Airtable Issue: Users experiencing issues with Airtable are advised to use the desktop app or update to the latest version of Chrome, Firefox, Safari, or Edge.
- This suggestion was made to resolve potential compatibility issues.
- Arena - New UI Feedback: Tell us what you think about the new design!
- Airtable | Everyone's app platform: Airtable is a low-code platform for building collaborative apps. Customize your workflow, collaborate, and achieve ambitious outcomes. Get started for free.
Cursor Community ▷ #general (867 messages🔥🔥🔥):
Gemini 2.5 Pro Reasoning, Trial Abuse and Account Flagging, Roo Code Alternatives, Model Context Protocol, AI-Generated KFC Ad
- Gemini 2.5 Pro: Reasoning Debated: A member questioned why Gemini 2.5 Pro doesn't reason, stating He doesn't think, he responds very quickly, sparking a discussion about its capabilities.
- Others defended Gemini's abilities in specific scenarios, while some suggest Claude 3.7 handles complexity and detail more effectively.
- Account Restrictions Spark Trial Abuse Debate: After a user expresses confusion about their account limitations, another member claims the account was flagged for abusing the trial, needing a credit card.
- Another user suggested alternatives like Windsurf or Cline to bypass the payment issue.
- Comparing AI Model Performance and Tooling: Members discussed the performance of Gemini 2.5 Pro versus Claude 3.7, with some preferring Gemini 2.5 Pro and others finding it only useful for simple tasks, while one preferring Sonnet 3.7 Thinking.
- Discussions also covered the use of different tools like Roo Code and methods for prompt engineering, with emphasis on keeping prompts simple and clear and focusing on multiple shots for each task.
- Discussing AI Replacing Jobs and the Need for ML and AI Knowledge: Members discussed the future of AI and its potential impact on employment, with one suggesting that 86% of jobs could be replaced by 2030.
- The response was to learn ML/AI and prompting properly, along with fundamentals like polynomial regression.
- Cursor's Free Model Questioned: Members questioned Cursor for charging people to use a free model, with an explanation that Cursor's API usage is managed through their wallet, and they’ve got deals with some AI models via Fireworks.
- The consensus was that Cursor has limited token usage but is roughly 10x cheaper than Claude.
- Tweet from Ashton Forbes (@JustXAshton): No one would question this footage if it showed the plane crashing into the ocean. We would just say, "of course the US military was tracking a rogue Boeing 777 flying past its exercises and bases...
- Tweet from Min Choi (@minchoi): Google Gemini 2.5 Pro is the best AI coding model right now.People are finding insane ways to build apps, games, and supercharge productivity.10 wild examples:1. Office simulation game
- Tweet from wh (@nrehiew_): Almost 2.5 years after the launch of ChatGPT, OpenAI has finally released text-davinci-003
- Tweet from Salma (@Salmaaboukarr): I'm blown away!😱 This KFC concept ad is 100% AI generated!My friend David Blagojevic (he's not on X) created this ad concept for KFC and it's incredible! Tools used: Runway, Pika, Kling...
- Tweet from Ingi Erlingsson 🪄 (@ingi_erlingsson): I've had a chance to play with the new @higgsfield_ai tools over the past few days and they're a lot of fun!It's nice to see such a well curated collection of camera moves and effects all ...
- Cursor Editor Dumbness Meter: no description found
- Tweet from Ryan Carson (@ryancarson): Just officially switched from Sonnet 3.7 MAX to Gemini 2.5 Pro MAX in @cursor_aiThe combination of 1m context + strong reasoning + strong coding skills makes it a JOY to code with.👏 @tulseedoshi @Off...
- Tweet from Cj Z 🎯 (@cj_zZZz): Cursor Agent is just wild.Now i use Gemini PRO 2.5 to scan the codebase and sonnet 3.5/3.7 to execute code.In this workflow you need 3 things:1. Detailed project documentation 2. Use multiple AI codi...
- Pricing | Cursor - The AI Code Editor: Choose the plan that works for you.
- Cursor – Ask mode: no description found
- Rainbow Spongebob GIF - Rainbow Spongebob Imagination - Discover & Share GIFs: Click to view the GIF
- Art Rebellion: 2029: no description found
- <globalRules><responses>- Repeat the question before thinking about the solu - Pastebin.com: Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
- [Guide] A Simpler, More Autonomous AI Workflow for Cursor: Hey everyone, Following up on the previous KleoSr Cursor Rules system, I’ve been working for the past week and the engagement with the community inside my old thread: [Guide] Maximizing Coding Effici...
- Private Browse Man Private Browser GIF - Private Browse Man Private Browse Private Browser - Discover & Share GIFs: Click to view the GIF
- AI Model & API Providers Analysis | Artificial Analysis: Comparison and analysis of AI models and API hosting providers. Independent benchmarks across key performance metrics including quality, price, output speed & latency.
- Cursor Status: no description found
- Never Gonna Give You Up Rickroll GIF - Never Gonna Give You Up Rickroll April Fool - Discover & Share GIFs: Click to view the GIF
- Unlocking AI’s potential: How to quickly set up a Cursor MCP Server: Learn how to quickly set up a MCP Server in Cursor and unlock AI’s potential with the Model Context Protocol (MCP). Standardize LLM integration with external tools.
- Cursor – Model Context Protocol: no description found
- Changelog | Cursor - The AI Code Editor: New updates and improvements.
- Changelog - Mar 11, 2025 | Cursor - The AI Code Editor: Reliability, Keyboard Shortcuts & Early access opt-in
- Changelog for 0.45.6 version: Hi guys! maybe there’s some info about newest version, let’s talk about it
Unsloth AI (Daniel Han) ▷ #general (256 messages🔥🔥):
Blackwell Support, VLM Training, GRPO Usage, Gemini 2.5 Pro, Training with Unsloth
- RTX Pro 6000 gets PyTorch Support: A user asked about Blackwell support and mentioned having an RTX Pro 6000 with CUDA 12.8 and sm_120 to finetune Mistral.
- Another user replied that PyTorch nightly supports it, but recompiling everything is required.
- RAG Reward Ruminations: A user asked about using GRPO for RAG or similar variants with tools, and how to reward that.
- Another user outlined primary reward components, including retrieval quality rewards (relevance, diversity, accuracy), generation quality rewards (factual consistency, citation accuracy, completeness), and tool usage rewards (appropriate selection, correct usage, effective incorporation).
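Those components can be folded into one scalar reward for GRPO; a minimal sketch, where the component names, sub-scores, and weights are all illustrative assumptions rather than any Unsloth or GRPO API:

```python
# Hypothetical composite reward for GRPO on a RAG pipeline. Weights and
# sub-score names are illustrative assumptions, not a library API.

def composite_reward(retrieval_scores, generation_scores, tool_scores,
                     weights=(0.3, 0.5, 0.2)):
    """Combine per-component rewards into one scalar in [0, 1].

    Each argument is a dict of sub-scores in [0, 1], e.g.
    retrieval_scores = {"relevance": 0.9, "diversity": 0.7, "accuracy": 0.8}.
    """
    def mean(d):
        return sum(d.values()) / len(d) if d else 0.0

    w_r, w_g, w_t = weights
    return (w_r * mean(retrieval_scores)
            + w_g * mean(generation_scores)
            + w_t * mean(tool_scores))
```

The weighting between retrieval, generation, and tool-use terms is a tuning knob; nothing in the discussion fixes particular values.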
- Unsloth's Training Precision Pointers: In a discussion about training speed and precision, it was stated that 16-bit LoRA is the most precise and fastest if VRAM is not limited, and that Unsloth has optimizations for 16-bit.
- It was also suggested to benchmark both 4-bit and 8-bit to see the difference and gain practical experience.
- Multi-GPU Marvels Incoming: It was revealed that multi-GPU support is coming soon to Unsloth.
- The first release will likely include only data parallelism, and fsdp (Fully Sharded Data Parallelism) may not be included initially and will be under the AGPL3 license.
- DeepSeek Training Trials: One member lamented that they can't train on deepseek!
- It may require two nodes of H100s to train DeepSeek, even with QLoRA.
- Custom Pre-Tokenized Dataset – Axolotl: no description found
- class DataCollatorForLastCompletionOnlyLM(DataCollatorForLanguageModeling): - Pastebin.com: Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
- unsloth/phi-4-GGUF · Hugging Face: no description found
- Fix batched generation for prompts of different lengths by RunFMe · Pull Request #2216 · unslothai/unsloth: no description found
- Inicio - alberto@barrahome:~$: no description found
- alberto@barrahome:~$: no description found
- Adding Qwen3 and Qwen3MoE by bozheng-hit · Pull Request #36878 · huggingface/transformers: Adding Qwen3This PR adds the support of codes for the coming Qwen3 models. For information about Qwen, please visit https://github.com/QwenLM/Qwen2.5. @ArthurZucker
Unsloth AI (Daniel Han) ▷ #off-topic (48 messages🔥):
Lightweight Pretraining Techniques, Bonsai pretraining, BitNet training Costs, Qwen Model Rebenched, Exllama2 vs vLLM Inference
- Investigating SOTA Lightweight Pretraining for 64-GPU alternative: A member suggested investigating SOTA lightweight pretraining techniques (MoE, FP8) to achieve pretraining with a single node in a couple of weeks instead of 64 GPUs.
- They shared a link to deepgrove-ai/Bonsai suggesting it might be possible to pretrain with only $70 and 3.8b tokens on a BitNet.
- DeepGrove's Bonsai Claims $70 BitNet Pretraining: A member expressed skepticism about DeepGrove's Bonsai claim of pretraining a BitNet with only $70 and 3.8b tokens.
- They are running the model in Kaggle to see if it holds up, and to explore whether it is a blindly copied Qwen model or a Qwen model continually trained into a BitNet.
- BitNet Verification Challenges Explored: A member shared a code snippet for a modified weight quantization to determine if a model is based on BitNet architecture.
- The code uses per-tensor quantization to 1.58 bits, with no grouping needed for quantization.
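The described check can be sketched as an abs-mean ternary quantizer in the style of BitNet b1.58 (scale by the mean absolute weight, round to {-1, 0, 1}, no grouping); this illustrates the idea and is not the member's exact snippet:

```python
# Sketch of per-tensor ternary ("1.58-bit") quantization in the BitNet
# b1.58 style: one per-tensor scale, round to {-1, 0, 1}, no grouping.
# Illustration only, not the exact code shared on Discord.

def absmean_ternary(weights, eps=1e-8):
    """Quantize a flat list of weights to {-1, 0, 1} with a single
    per-tensor scale (the mean absolute weight)."""
    scale = sum(abs(w) for w in weights) / max(len(weights), 1) + eps
    # Scale, round to the nearest integer, clip to the ternary set.
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale
```

If re-quantizing a checkpoint this way leaves the weights essentially unchanged, that is evidence the model already lives on a BitNet-style ternary grid.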
- Fastest inference engine debate: exllama2 vs vLLM: A member asked about the fastest inference engine for single request non-batched decoding for Llama / Mistral 4bit quants, particularly comparing Sglang/lmdeploy and vLLM.
- The member assumes vLLM might not perform well in non-batched decoding due to its engine needing to go through llm_engine.step().
- TurboDerp's exllama2's Dynamic Mode Explored: A member shared the exllama2's dynamic mode link, noting all forward calls go through the generator, requiring control handoff to the generator job scheduling.
- Other members suggest TensorRT LLM for single token generation, while some suggest hooking the forward pass in exllama.
- GitHub - deepgrove-ai/Bonsai: Contribute to deepgrove-ai/Bonsai development by creating an account on GitHub.
- exllamav2/doc/dynamic.md at master · turboderp-org/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs - turboderp-org/exllamav2
- Injecting noise in hidden state inputs, query, key/value or attention head outputs · turboderp-org/exllamav2 · Discussion #500: Hey there! Not sure if this was already discussed somewhere around here but I stumbled across the idea of injecting noise into inference and BEFORE sampling. See https://github.com/EGjoni/DRUGS and...
Unsloth AI (Daniel Han) ▷ #help (248 messages🔥🔥):
Orpheus Dataset Issues, Model Evaluation Problems, Gemma 3 Inference Samples, Fine-tuning with PDFs, Vision Fine-tuning with Gemma 3
- Dataset causes value error: A user encountered a ValueError: expected sequence of length 203 at dim 1 (got 885) when using a custom dataset in Unsloth Orpheus format, later resolving it by using a GPU.
- Another user mentioned that the Orpheus dataset uses SNAC, which operates at 24kHz.
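Errors of the form "expected sequence of length X at dim 1 (got Y)" usually mean variable-length token sequences were batched without padding; a minimal, framework-free sketch of the usual fix (the pad_id value is a placeholder):

```python
# Minimal sketch: right-pad variable-length token sequences to a common
# length before stacking them into a batch tensor. pad_id is a
# placeholder for the tokenizer's actual padding token id.

def pad_batch(sequences, pad_id=0):
    max_len = max(len(s) for s in sequences)
    return [s + [pad_id] * (max_len - len(s)) for s in sequences]
```

In practice the tokenizer or data collator handles this (e.g. padding in the collate function), but the shape requirement it satisfies is the same.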
- Model Evaluation Incoherence Surfaces: A user reported experiencing issues during model evaluation, with the model generating incoherent text despite coherent text generation during normal inference runs.
- It was suggested that enabling report_to can help log metrics to platforms like Wandb, especially when using a custom compute_metrics function.
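A hedged sketch of that suggestion using Hugging Face TrainingArguments (the output path and step count are placeholders):

```python
# Configuration sketch: send training/eval metrics to Weights & Biases
# so evaluation output (including custom compute_metrics results) gets
# logged somewhere inspectable.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",  # placeholder path
    logging_steps=10,      # how often training metrics are logged
    report_to="wandb",     # use "none" to disable external logging
)
```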
- Gemma 3 Handles Image-Plus-Text Inference: A user asked for image and text inference samples for Unsloth/Gemma 3 using Hugging Face, referencing a Gemma 3 demo on Hugging Face Spaces.
- It was noted that while Llama 3.2 Vision requires an image, Gemma 3 should not have the same issue.
- Turning PDFs into Chatbots, Dataprep needed: A user sought guidance on fine-tuning a model using only documents (PDFs) to specialize in a language or field, after converting the PDFs to text using Langchain.
- It was suggested to use synthetic data generation via augmentoolkit, emphasizing that this process is outside the scope of Unsloth and that they should look at Unsloth docs.
- Mamba Gains Eager Attention: Users discovered a fix for Mamba implementation issues by setting attn_implementation = "eager", as highlighted in a GitHub pull request.
- Despite the fix, Mamba training was noted to be significantly slower.
- Vision Fine-tuning | Unsloth Documentation: Details on vision/multimodal fine-tuning with Unsloth
- Gemma 3 12b It - a Hugging Face Space by huggingface-projects: no description found
- Continued Pretraining | Unsloth Documentation: AKA as Continued Finetuning. Unsloth allows you to continually pretrain so a model can learn a new language.
- [DOC] Gemma 3 instructions on Vision Fine Tuning page is not correct · Issue #2265 · unslothai/unsloth: [x ] Report incorrect documentation Report needed documentation Report incorrect documentation Location of incorrect documentation -- provide links and line numbers if possible. https://docs.unslot...
- Google Colab: no description found
Unsloth AI (Daniel Han) ▷ #research (23 messages🔥):
Model Evaluation, Coding benchmarks, Long context benchmarks, Math benchmarks, Gemma 3 vs small LMs
- General performance benchmarks non-existent: A member stated that there is no such thing as general performance benchmark and every model is good/bad at a set of different verticals.
- They said it's like a fallacy to believe that by aggregating a bunch of verticals together you'll get a score result that satisfies your particular vertical.
- Coding benchmarks: Aider Polyglot and SWE Bench: For coding benchmarks, a member suggested Aider Polyglot and SWE Bench as decently applicable benchmarks.
- However, SWE Bench has issues with being based on llm frameworks, and Aider Polyglot might show how good an llm is when you use it with aider.
- RULER is bare minimum for Long Ctx Bench: For long ctx benches, a member stated that RULER is the bare minimum for what should be considered a long ctx bench, and NIAH is garbage.
- They added that some of the recent ones are alright.
- Math Benchmarks: AIME is good enough: For math benchmarks, a member suggested that AIME is good enough as long as there's no contamination and there is proper assessment with COT.
- They also mentioned that most coding benches are based on python, but there's WebDev Arena for JS.
- Small LMs vs Gemma 3: A member expressed interest in comparing Gemma-3 4B with existing small LMs.
- They asked if Open LLM doesn't have Gemma 3, are there any other viable leaderboards containing Gemma 3 against small LMs.
Perplexity AI ▷ #announcements (2 messages):
Discord improvements, Simplified onboarding, Feedback consolidation, Pro channel access
- Discord Overhaul Incoming: The mod team has gathered feedback to enhance the Discord experience and plans to implement three key improvements over the next week.
- Users can expect changes to the onboarding flow and feedback channels, with announcements made in advance to avoid surprises.
- Streamlined Onboarding for Newbies: The onboarding flow will be simplified to reduce the number of steps and choices required before engaging with the community.
- The goal is to make it easier for new users to get started and quickly become active members.
- Feedback Central: One Channel to Rule Them All: Feedback channels will be consolidated to streamline the process, ensuring the PPLX team stays informed about community requests.
- This aims to make feedback more effective and ensures the team is always aware of user needs.
- Pro Channel VIP Access: Efforts are underway to automate access to the Pro Channel, providing advanced support from mods for urgent requests.
- This will ensure that users with time-sensitive needs receive prompt and dedicated assistance.
Perplexity AI ▷ #general (544 messages🔥🔥🔥):
Space Instructions limitations, Image generation discontinued?, Apple Intelligence in the EU, Samsung AI vs Apple Intelligence, GPT Omni shortcomings
- Space Instructions offer limited control, members discover: Users discussed that Space Instructions in Perplexity AI do not provide full control over the search experience, mainly affecting output summarization rather than initial data sourcing.
- The limitation means instructions cannot prevent the AI from searching specific topics, as instructions only apply after the relevant data has already been extracted, causing frustration among some users.
- Perplexity Image Generation: Missing in Action?: A user inquired about the discontinuation of image creation within Perplexity, noting the feature's absence.
- Another user suggested using the web search to find the generate option, but another confirmed that the function doesn't seem to appear for everyone, perhaps indicating phased rollout or feature testing.
- Apple Intelligence Blocked from EU?: A user casually noted yey apple intelligence in the EU now, implying availability, though without further elaboration.
- Following the statement, others swiftly shifted the focus to discussing Samsung AI, with one user claiming it's superior, triggering a debate on the merits of each.
- Perplexity Users Grumble About GPT Omni: Users expressed dissatisfaction with GPT Omni, with one describing it as suck ass and questioning how to revert to a previous GPT version.
- Another user explained that Omni is designed for smarter interaction with audio, video, and images, but has been dumbed down compared to GPT-4 for cost reasons.
- Rumors abound: Perplexity to launch more Deep Research: A Perplexity team member hinted at an upcoming, more powerful version of Deep Research in the coming weeks.
- Speculations include a potential partnership with Groq, following the addition of text, but not the actual new deep research feature; users report that Deep Research completes in seconds instead of minutes.
- Tweet from Ask Perplexity (@AskPerplexity): Perplexity has signed a definitive agreement to acquire Poogle for $2.2T - a significant step toward improving search in the AI era.
- Tweet from Perplexity Supply (@PPLXsupply): no joke, animals love PPLX Supply
- Sam Altman Says OpenAI Will Release an ‘Open Weight’ AI Model This Summer: The news follows the breakout success of DeepSeek and growing pressure from rivals like Meta.
- Dead Reckoning Part 1 Hayley Atwell GIF - Dead reckoning part 1 Hayley atwell Pom klementieff - Discover & Share GIFs: Click to view the GIF
- Got Game Of Thrones GIF - GOT Game Of Thrones You Know Nothing - Discover & Share GIFs: Click to view the GIF
- Metal Gear Rising Metal Gear Rising Revengeance GIF - Metal Gear Rising Metal Gear Rising Revengeance Senator Armstrong - Discover & Share GIFs: Click to view the GIF
- Totoro GIF - Totoro - Discover & Share GIFs: Click to view the GIF
- Counting Money GIF - Counting Money Paper - Discover & Share GIFs: Click to view the GIF
- Website Traffic Checker: Estimate Any Site’s Traffic: Dig into the traffic data for any website and find growth opportunities for yours. Try the free version of Ahrefs’ traffic checker.
- Why Does ChatGPT's Algorithm 'Think' in Chinese?: OpenAI's new reasoning model is doing weird, unpredictable stuff.
- Hamster AI: Hamster AI, the most powerful AI tools brought into one software. Use tools from OpenAI, Cluade, and Mistral for less than the price of coffee!
- Papaya vs Musk melon nutrition content per 100g wi: Shared By Anonymous - on April 1, 2025
- Papaya vs Musk melon nutrition content per 100g wi - Blackbox: no description found
Perplexity AI ▷ #sharing (10 messages🔥):
Code Tracing in Python, AI Accuracy in Reading, API Research
- Python Code Tracing Tricks: A user asked how to trace a code on python.
- No answers were given in the context.
- AI Reading Accuracy Questioned: A user asked how accurate is AI in reading.
- No answers were given in the context.
- API Research Questioned: A user asked about researching API.
- No answers were given in the context.
Perplexity AI ▷ #pplx-api (5 messages):
Sonar API Access, Tier 2 Credits, JSON Formatting with Pydantic
- Sonar API Access Sought: A user inquired about obtaining access to the Sonar API for work-related purposes and requested contact information for a relevant person on the Perplexity team.
- James from the API team responded, offering assistance.
- Tier 2 Credits Acquired: A user confirmed they reached Tier 2 with credits.
- They want to tell it that this response will be read by brand managers of FMCG companies so please structure in a manner that is actionable for them.
- JSON Formatting Issues with Web Search Results: A user reported issues with the Sonar API adding weird special characters (e.g., "<") to the JSON results when searching the web, despite using pydantic for formatting.
- The user provided an example where extra characters were added to the `source_name`, `source_title`, `summary`, and `url` fields in the JSON output.
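Pending a server-side fix, a post-processing workaround can strip the stray characters before downstream use. A minimal sketch assuming the field names from the report (`clean_payload` is an invented helper, not part of the Sonar API):

```python
import json

# Field names follow the user's report; adjust to your actual schema.
FIELDS = ("source_name", "source_title", "summary", "url")

def clean_payload(raw: str) -> dict:
    """Strip stray angle brackets the model sometimes injects into string fields."""
    data = json.loads(raw)
    for result in data.get("results", []):
        for field in FIELDS:
            if isinstance(result.get(field), str):
                result[field] = result[field].replace("<", "").replace(">", "").strip()
    return data

raw = '{"results": [{"source_name": "<Example", "url": "https://example.com>"}]}'
cleaned = clean_payload(raw)
print(cleaned["results"][0]["source_name"])  # Example
```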
OpenAI ▷ #announcements (1 messages):
ChatGPT's new voice Monday, voice mode, voice picker
- ChatGPT Introduces New Monday Voice Option: A new voice option called Monday has been introduced in ChatGPT, accessible via the voice picker in the top right corner of voice mode, as demonstrated in the attached video.
- Monday Voice Quick Access: Users can quickly access the new Monday voice in ChatGPT by opening voice mode and selecting the voice picker located in the top right corner of the interface.
OpenAI ▷ #ai-discussions (314 messages🔥🔥):
Fake ChatGPT Apps, Gemini 2.5 Pro Rate Limits, Image Generation with Ghibli style, ElevenLabs Voice Model, AI and Creative Industries
- Beware! ChatGPT Impersonators Swarm Play Store: Users reported buying ChatGPT through the Play Store but not receiving access, raising concerns about fake impersonator apps, and urging users to check their purchase history to confirm it was with OpenAI.
- It's important to ensure you're using the official app to avoid scams and ensure you have access to OpenAI services.
- Gemini 2.5 Pro Hits Rate Limits: Users are reporting getting rate limited on Gemini 2.5 Pro, sparking discussion of whether the limit applies to both free and paid tiers; some users bypass the limits by using a VPN.
- It was suggested to use Gemini in Google AI Studio where the limits are higher (50 per day).
- Ghibli Style Conversions Tickle AI Image Generators: Members experimented with prompts to convert images to Ghibli style, one user shared their prompt "Make this image gibli style", while another suggested "Reimagine this image in the iconic Studio Ghibli style: painterly textures, soft light, and a touch of nostalgic wonder".
- The free models were used, but it was noted more improvements are needed to detail emotions and face details, with some claiming they are far better than destroying Ghibli art style for nothing.
- ElevenLabs New Voice Model: Promising but Pricey?: A member shared that they are exploring ElevenLabs' new model for creating narrated audio books, highlighting its voice cloning feature.
- While impressed with initial results and high quality, they await OpenAI to release a similar voice product to avoid subscribing to external services, as for some game developers, it could be useful as a voice acting placeholder.
- Navigating AI's Role in Creative Industries: a Tightrope Walk: The discussion touched on the use of AI in creative fields, particularly gaming, with the consensus that AI is often used by non-creatives, resulting in amateurish outputs and an overestimation of AI's current capabilities; they referenced this discussion.
- There was an exchange of opinions, with some arguing that AI is mostly assisting professionals for ideation, while others critiqued reliance on statistically average outputs and the need for human effort in creating novel works. AI integration into existing software ecosystems like Adobe and Autodesk was seen as a more promising direction.
- OpenAI.fm: An interactive demo for developers to try the new text-to-speech model in the OpenAI API
- Shia Labeouf Clapping GIF - Shia labeouf Clapping Theatre - Discover & Share GIFs: Click to view the GIF
- evolving_llms_through_text-based_self-play.pdf: no description found
OpenAI ▷ #gpt-4-discussions (24 messages🔥):
Image generation rate limits, copilot experiences, ChatGPT instructions, 4o abilities, future of image model
- OpenAI implements image generation rate limits for Plus users: Due to extreme load since the new image model was released, Plus users are now experiencing rate limits, an interim measure to mitigate the flood of users.
- One user, presumably facetiously, remarked, "At $200 a month you better not get rate limited," referencing a story that OpenAI added 1 million new users in an hour.
- Users adapt to Copilot: A member shared a feeling of adapting to Copilot, particularly when Copilot does "something dumb over and over".
- The member expressed a sense of adaptation while using Copilot.
- Users seeking guidance on ChatGPT Prompting: A user sought help to prevent ChatGPT from adding "cool or edgy" concluding remarks to its descriptions.
- Another member suggested a revised prompt, including the line: "Do not add concluding remarks outside the direct scope of fulfilling the request based on the chosen purpose. Stick to the process."
- Experimenting photo editing with 4o abilities: A user suggested a mode for editing photos or setting a custom vibe in 4o, preloading context with a game of 20 questions to narrow in on obscure human touches.
- The idea suggests leveraging 4o's ability to read between the lines for better follow-up requests and a personalized experience.
- Debating the Future of Image Model Improvements: A member asked whether OpenAI will continue to improve the image model or leave it for a few years, similar to what they did with DALL-E.
- No concrete answers were provided, but the question sparks curiosity about OpenAI's future plans for image generation technology.
OpenAI ▷ #prompt-engineering (9 messages🔥):
Custom Instructions in 'About Me' Box, Memory-Stored Prompts, Personalization in Model Responses, Model Pattern Recognition and Formatting
- Custom Instructions Work in 'About Me' Box: A member confirmed that extending custom instructions into the 'About Me' box works perfectly, as information is information to the model.
- Another member noted that the model can figure out likely patterns in your intent and runs with it, even if the fields are split mid-sentence.
- RAG-only context limits 'About me' Box: A member questioned whether the 'About Me' box gets confined to some RAG-only context-dependent space and whether it's reliable for storing entire prompts.
- They also mentioned having a ridiculous amount of memories and having tried to fit entire prompts into them, but it doesn't seem very reliable; memory-stored prompts or personas also do not activate without specifically requesting them.
- Personalization May Not Always Be Evident: A member shared examples of personalized model responses and noted that the first response may not always reflect their personalization.
- They emphasized being very clear about what they want and don't want from the model, including specifications for tool usage and NPC behavior.
- Model Learns To Reset Rigid Patterns: A member shared a code snippet `FORMAT_RESET` to help models acknowledge when they've fallen into rigid patterns and rethink their approach.
- The code encourages the model to analyze what format would better suit the response and completely rethink its approach without defaulting to templates.
OpenAI ▷ #api-discussions (9 messages🔥):
Custom instructions in 'about me' box, Memory-stored prompts, Model Guessing vs Training, FORMAT_RESET for rigid patterns
- About Me as Custom Instructions?: A member asked if extending the custom instructions into the about me box works, and another member confirmed that it works perfectly because the model uses it as additional information to work with.
- The model can figure out a likely pattern in your intent and runs with it, even if you split the field mid-sentence; there is no functional reason it wouldn’t work unless “about me” gets confined to some RAG-only context-dependent space.
- Memory-stored prompts unreliable?: A member noted they have a ridiculous amount of memories, and while they have tried to fit entire prompts into them, it doesn’t seem very reliable.
- They can’t get memory-stored prompts or personas to activate without specifically requesting them, leading them to think the model is not actually seeing them most of the time or they are a much lower priority than where custom instructions are placed in the model context; shared chat examples of prompt engineering and NPC generation.
- Model Guessing vs Training: One member shared their process by presuming that the model either guesses a lot, leading to variation, OR it was trained to output in a typical pattern which was not specifically asked for, OR it is doing exactly what was asked for.
- Conflicts are mandatory to find and fix, as they usually degrade the performance, particularly when the model is trained that humans prefer X but the user prefers otherwise.
- FORMAT_RESET for rigid patterns!: A member created a little thing for when you catch a model following a format/pattern you don't like/want, as a way to acknowledge that the model has fallen into rigid patterns and rethink its approach without defaulting to templates.
- They provided a code snippet to tell the model: `FORMAT_RESET: Acknowledge you've fallen into rigid patterns, analyze what format would better suit your response, and completely rethink your approach without defaulting to templates.`
LM Studio ▷ #general (198 messages🔥🔥):
eGPU with LM Studio, Gemini 2.5 Pro Evaluation, Gemma 3 27B Performance, Local LLM recommendations, Copilot hurts developer experience
- Plug eGPU to LM Studio!: Members discussed the feasibility of using an eGPU with LM Studio, suggesting it should work as long as the computer recognizes it, despite slower speeds, referencing a YouTube video comparing LLMs on RTX 4090 Laptop vs Desktop.
- Gemma 3 Beats Gemini 1.5 Flash: A member shared a comparison where Gemma 3 27B outperforms Gemini 1.5 Flash in several benchmarks, like MMLU-Pro and Bird-SQL.
- Another member confirmed excellent results with Gemini 2.5 Pro, while another user used Gemini 2.5 Pro to produce the data, available free on OpenRouter.
- Qwen Coder 7B recommended!: For coding on a system with a 4060 Ti and i5 12400F, Qwen Coder 7B was recommended and available on LM Studio's model page, with suggestions to offload most of it to the GPU and also use Qwen Coder 14B or 32B.
- Members emphasized that a local LLM would perform much worse than cloud alternatives like ChatGPT or Deepseek, but Gemini 2.0 Flash was considered a top performer, costing only $0.44 per 1M input tokens according to their pricing documentation.
- Copilot's Coding Critiqued!: Members debated whether AI assistance in programming is beneficial, with one arguing that it hurts more than helps because the average user learns on AI slop.
- Others disagreed, stating that Copilot works great for experienced developers, but one person claims the recommendations given are trusted too easily by the average user, in addition to concern that copilot is trained on garbage code.
- One Parameter LLM Possible?: In a lighthearted exchange, it was discussed that a one-parameter LLM is possible but useless, and one user indicated they tried 656K but it can't chat though.
- About LM Studio | LM Studio Docs: Learn how to run Llama, DeepSeek, Phi, and other LLMs locally with LM Studio.
- qwen2.5-coder-7b-instruct: qwen • Alibaba • 7B
- Gemini Pro 2.5 Experimental (free) - API, Providers, Stats: Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. Run Gemini Pro 2.5 Experimental (free) with API
- April Fool GIF - April Fool - Discover & Share GIFs: Click to view the GIF
- Introduction - Hugging Face NLP Course: no description found
- llama.cpp/examples/quantize/README.md at master · ggml-org/llama.cpp: LLM inference in C/C++. Contribute to ggml-org/llama.cpp development by creating an account on GitHub.
- Google AI Studio: Google AI Studio is the fastest way to start building with Gemini, our next generation family of multimodal generative AI models.
- LLMs on RTX 4090 Laptop vs Desktop 🤯 not even close!: It's not even close.Discount on SIHOO chair: https://hongkongsihoointelligenthomecolimited.pxf.io/aziskDiscount code: YT6OFFAmazon: https://amzn.to/4jkRYPjDi...
- How do I use LM Studio as my LLM service? · Issue #665 · pydantic/pydantic-ai: Hii everyone! I am Abhiraj, an associate SWE looking for some help/advice. I want to use LM Studio as my opensource and local LLM Service, is there any support for it yet because I am unable to fin...
LM Studio ▷ #hardware-discussion (63 messages🔥🔥):
Nvidia Drivers instability after 10-12 hours of usage, M4 Max vs 5090 Speed Comparison, Mac vs Nvidia GPUs for LLM, Tenstorrent Wormhole performance on Discord, Context Overflow and Shared Memory impact on LLM speed
- Nvidia Drivers Crash After Extended Runtime: A user reported Nvidia driver instability after running models for 10-12 hours, requiring a driver reinstall to resolve performance issues.
- The user clarified the issue was with the Nvidia driver itself, not the Windows OS, and sought to find if others experienced the same.
- M4 Max gets Speed Boost vs 5090: A user observed a 3.24x speedup from M4 Max to 5090 after resolving crashing issues, which aligns with the 3.28x ratio of their memory bandwidths when doing QwQ 32B 4 bit quant comparisons.
- They're now seeing around 21 tok/s on M4 Max and roughly 60 tok/s on 5090 when testing Gemma 3 32B q4.
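The bandwidth explanation checks out on commonly quoted spec-sheet numbers. A quick sanity check (the bandwidth figures below are public specs, not measurements from the thread):

```python
# Decode speed for local LLMs is usually memory-bandwidth bound, so tok/s
# should scale roughly with memory bandwidth. Commonly quoted specs:
m4_max_gbps = 546      # M4 Max unified memory bandwidth (GB/s)
rtx_5090_gbps = 1792   # RTX 5090 GDDR7 bandwidth (GB/s)

ratio = rtx_5090_gbps / m4_max_gbps
print(f"bandwidth ratio: {ratio:.2f}x")  # ~3.28x, matching the observed speedup
```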
- Mac Freedom of Context Size vs Nvidia Faster GPUs: While Nvidia GPUs may be faster, users are leaning towards Macs for the freedom to have more context size.
- The user highlighted that even though NVIDIA GPUs are faster, the ability to use larger context sizes is proving to be more useful.
- Tenstorrent Wormhole results sought out in Discord: A user inquired about performance results for the Tenstorrent Wormhole (n150d and n300d) within the Discord community.
- They expressed interest in obtaining TOK/s metrics for these models, but there was no follow up.
- Overflowing Context impacts LLM speed: A user wondered what would happen if they could load the context overflow into the shared memory/system RAM while keeping the entire model in VRAM.
- Another user noted that the LLM needs all the context in VRAM to generate the next token, because for each token generated, all the context goes through the transformer blocks, again and again.
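The reason context cannot spill out of VRAM without a large slowdown is that the KV cache for every prior token is read on each decode step. A back-of-envelope sizing sketch for a hypothetical model config (all parameters illustrative, not any specific model):

```python
# Back-of-envelope KV-cache sizing for a hypothetical 32-layer model.
n_layers = 32
n_kv_heads = 8
head_dim = 128
context_len = 8192
bytes_per_elem = 2  # fp16

# 2x for the K and V tensors, stored per layer, per head, per token
kv_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
print(f"KV cache: {kv_bytes / 2**30:.1f} GiB")  # 1.0 GiB for this config
```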
Link mentioned: M3 Ultra vs RTX 5090 | The Final Battle: M3 Ultra Mac Studio vs AI beast with NVIDIA RTX 5090Efficient. Productive. Organized. | Baseus Spacemate Series(MAC)11-in-1 Docking StationBuy on Amazon.US: ...
aider (Paul Gauthier) ▷ #general (230 messages🔥🔥):
Gemini 2.5 Pro experiences and limitations, RateLimitError automation strategies, Dot Command Revolution, F#, Video analysis
- *Gemini 2.5 Pro*: A Hot Mess of Highs and Hallucinations?: Users are experimenting with Gemini 2.5 Pro and reporting mixed results with some models hallucinating/DC'ing while others are providing top-tier performance for coding tasks.
- One user noted, "Gemini is hallucinating / dc'd for me, same for you guys?", while another stated that the combination of Gemini 2.5 Pro and DeepseekV3 is "almost free and top tier."
- *RateLimitError* Woes: Token Limits or Request Frequency?: A user reported frequent RateLimitErrors when requesting summaries and clearing history, and was looking for solutions to automate this process.
- Paul Gauthier clarified that the rate limit is likely based on the number of requests per minute or day, rather than the token count. One possible solution may be found in this Github issue.
- Dot Command Revolution: Aider's productivity hack?: A user is trying to promote the use of .dotcommands as a productivity hack for developers, enabling them to automate tasks with single-line commands such as `.status` and `.next`.
- The goal is to provide cognitive shortcuts optimized for clarity and specific functionality, but no one is using them. It has led to the quip "THE DOT REVOLUTION IS HERE 🔥 Coders everywhere will want to try this one cool trick."
- *F#*: Condolences or Kudos?: A user mentioned rebuilding their app from Python into F#, prompting mixed reactions, including condolences and suggestions to "use Haskell".
- While the user explained they were working on ML projects, the community seemed skeptical about the choice of F# for such tasks.
- Video Analysis: Beyond the Transcript: A user inquired about AI models' comprehension of videos, wondering if they understand emotional impact or follow visual storylines beyond just processing transcripts.
- One response indicated that "Gemini's video understanding is 1 frame per second of the video fed into the model as an image."
- LiveSWEBench: no description found
- Tweet from AI Purr-fessor (Yash) (@MahawarYas27492): Amazing 2.5 flash experimental is out. 🔥 It's very smart better than o3 mini high on some reasoning questions I tested.It's rolling out very slowly so you might need to wait for officially, I...
- Tweet from Sam Altman (@sama): TL;DR: we are excited to release a powerful new open-weight language model with reasoning in the coming months, and we want to talk to devs about how to make it maximally useful: https://openai.com/op...
- Gemini Pro 2.5 Experimental (free) - API, Providers, Stats: Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. Run Gemini Pro 2.5 Experimental (free) with API
- Do It Star Wars GIF - Do it Star wars Emperor palpatine - Discover & Share GIFs: Click to view the GIF
- Kuzco Yzma GIF - Kuzco Yzma Chat - Discover & Share GIFs: Click to view the GIF
- Online Classes Throw Away GIF - Online Classes Throw Away Computer - Discover & Share GIFs: Click to view the GIF
- April Fool April Fools GIF - April Fool April Fools Spongebob - Discover & Share GIFs: Click to view the GIF
- Restoring chat history leads to error / chat history summarization not working · Issue #2979 · Aider-AI/aider: Issue I have a chat history file that's quite long (80k tokens) but offers lots of valuable information about the project I'm building. It worked fine last week when I use model that has large...
- Interview with Vibe Coder in 2025: Vibe Codinghttps://linkgraph.net/stack/vibecoderInterview with a Professional Vibe Coder with Kai Lentit aired on © The Viboe Coder 2025.AI codingprompt eng...
- GitHub - tninja/aider.el: Interact with Aider: AI pair programming made simple: Interact with Aider: AI pair programming made simple - tninja/aider.el
- GitHub - MatthewZMD/aidermacs: AI Pair Programming in Emacs with Aider: AI Pair Programming in Emacs with Aider. Contribute to MatthewZMD/aidermacs development by creating an account on GitHub.
aider (Paul Gauthier) ▷ #questions-and-tips (30 messages🔥):
Temperature for coding, Stopping benchmarks, Aider with subdirectories, Aider local config, Model Summarization fails
- The Coder's Icy Preference for Temperature: Members discussed the optimal "temperature" for coding, with `0` being the popular choice in the channel.
- A member asked for justification for this value, requesting "is it based on smth?"
- Aider's Subtree Savior for Mono-Repos: A member asked how to limit aider to a subdirectory of a monorepo, prompting a response to use the `--subtree-only` switch after changing to the desired directory.
- This sets aider to ignore the repo outside the starting directory, though the asker noted the docs need updating and pointed to the FAQ on large monorepos.
- Config Conundrums: Model Settings in Aider: A member reported that specifying model names in a local YAML config file wasn't working as expected.
- Despite the startup message showing the correct config settings, aider still defaulted to anthropic/claude-3-7-sonnet-20250219 rather than the configured deepseek/deepseek-chat.
- Linting Loops Launching with Aider: A member inquired about running linters within aider, with another suggesting the use of `/run [npm|pnpm] run [lint|fix|whatever-command]` for a tight feedback loop.
- Another member pointed to the sample aider.conf.yml file for listing multiple linters.
- Architect Model Results Get a Promotion: A member sought a way to directly send a satisfactory response from the architect model to the editor model to conserve 2.5 Pro shots.
- The suggestion was to open a new aider instance with `--restore-chat-history` and a suitable editor configured, though the lack of a `--no-architect` flag was noted as an inconvenience.
- FAQ: Frequently asked questions about aider.
- aider/aider/website/assets/sample.aider.conf.yml at 3992681b84d1ec0cbc18657c5ca832c89d7e551c · Aider-AI/aider: aider is AI pair programming in your terminal. Contribute to Aider-AI/aider development by creating an account on GitHub.
OpenRouter (Alex Atallah) ▷ #announcements (13 messages🔥):
Organizations leave Beta, Web search results in Chatroom, Cerebras on OpenRouter, PDF support for OpenRouter API
- Organizations is Out of Beta!: OpenRouter announced that the Organizations feature is now out of beta, allowing teams to control billing, data policies, provider preferences, and API keys in one place, detailed in this X post.
- During the two-week beta, over 500 organizations were created, giving teams complete control over data policies and consolidated billing.
- Web Search Hits the Chatroom!: Web search results are now available in the chatroom, with Perplexity results formatted similarly to OpenRouter's `:online` model variants.
- Bluesky plea: A member requested that OpenRouter post on Bluesky as well, suggesting less reliance on Xitter.
- Call for Cerebras!: A member asked OpenRouter to talk to Cerebras about adding them to OpenRouter.
- PDF support in API?: A member inquired about when the OpenRouter API will support PDF files.
Link mentioned: Tweet from OpenRouter (@OpenRouterAI): Today we're taking Organizations out of beta.With Organizations, teams have complete control over data policies and consolidated billing, adding peace of mind across dozens of model providers.Key ...
OpenRouter (Alex Atallah) ▷ #general (98 messages🔥🔥):
Aider OpenRouter Copilot, Gemini Flash 2 Context, Usage Downloads, Enterprise Level Rate Limits, GPT4o Image Generation
- Gemini Flash 2 Middle-Out Transforms: Members confirmed that OpenRouter offers full 1M context on paid Gemini Flash 2 requests, with middle-out transforms being opt-in and only applied by default on endpoints with context length less than 8192 tokens.
- One member clarified that middle out only applies once you hit 1m right? (on flash) even if it's turned on.
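For context, opting in looks like an ordinary OpenRouter chat request with the `transforms` field set. A minimal sketch of such a payload (the model slug and prompt are illustrative, and the payload is only constructed here, not sent):

```python
# Sketch of an OpenRouter chat request that explicitly opts in to the
# middle-out transform (per the discussion, it is opt-in except on
# endpoints with context shorter than 8192 tokens).
payload = {
    "model": "google/gemini-2.0-flash-001",  # illustrative slug
    "messages": [{"role": "user", "content": "Summarize this long document..."}],
    # Compress the middle of the prompt if it exceeds the endpoint's
    # context window, keeping the beginning and end intact.
    "transforms": ["middle-out"],
}
print(payload["transforms"])  # ['middle-out']
```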
- Requesting Usage Download: A member inquired about obtaining downloads of their usage data, including tokens and costs, as displayed on the activity page, to verify their credit usage.
- A maintainer responded that while this feature isn't currently available, we're working on it.
- OpenRouter Enterprise Level Rate Limits: A user asked about enterprise-level rate limits; it was clarified that they disappear with a balance of $500 or more, subject to the upstream provider.
- Another member chimed in that well technically it depends upon the upstream provider.
- Auto Router for Fallback Models: A user requested a fallback model option, similar to the existing fallback provider feature.
- Another member pointed out that OpenRouter already has this via the Auto Router and the `models` parameter, as detailed in the documentation.
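A rough sketch of what model fallback looks like in a request body, per OpenRouter's model-routing docs (model slugs are illustrative, and the payload is only constructed, not sent):

```python
# Sketch of fallback via OpenRouter's `models` parameter: if the first
# model fails or is unavailable, the request falls through to the next.
payload = {
    "models": [
        "anthropic/claude-3.7-sonnet",  # preferred (illustrative slug)
        "deepseek/deepseek-chat",       # fallback (illustrative slug)
    ],
    "messages": [{"role": "user", "content": "Hello"}],
}
print(payload["models"])
```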
- OpenRouter EU Provider Selection: A user inquired about selecting providers residing only in the European Union due to legal requirements.
- A maintainer acknowledged the need but noted limited coverage today, mentioning OpenRouter allows provider selection, recommending seeking an EU certified provider for strict EU data guidelines.
- DeepSeek R1 Zero (free) - API, Providers, Stats: DeepSeek-R1-Zero is a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step. It's 671B parameters in size, with 37B active in an inf...
- Prompt Caching - Optimize AI Model Costs with Smart Caching: Reduce your AI model costs with OpenRouter's prompt caching feature. Learn how to cache and reuse responses across OpenAI, Anthropic Claude, and DeepSeek models.
- Model Routing - Smart Model Selection and Fallback: Route requests dynamically between AI models. Learn how to use OpenRouter's Auto Router and model fallback features for optimal performance and reliability.
- Discord: no description found
Eleuther ▷ #general (43 messages🔥):
Cosine Annealing LR, Mini-batch vs Batch, Gradient Accumulation, Stanford CS 25 Transformers Course, Category theory
- Debate on Cosine Annealing Learning Rate (LR) Updates: Discussion on whether Cosine Annealing LR is best updated after every batch or sample, with concerns raised about different samples receiving different training when updating after every sample.
- The recommendation was to update after every mini-batch, ignoring the exposure problem, or attempting to fix it.
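The per-mini-batch convention can be sketched with the standard cosine annealing formula, where `step` counts optimizer steps (mini-batches), not samples (a self-contained illustration, not anyone's training code):

```python
import math

# Cosine annealing stepped once per mini-batch, so every sample within a
# step sees the same learning rate.
def cosine_lr(step: int, total_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

total_steps = 1000  # number of optimizer steps, i.e. mini-batches
lrs = [cosine_lr(s, total_steps, lr_max=3e-4) for s in range(total_steps + 1)]
print(lrs[0], lrs[-1])  # starts at lr_max, anneals to lr_min
```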
- Mini-Batch vs Batch Jargon Jungle: Members discussed the difference between mini-batch and batch in machine learning, with the distinction becoming increasingly blurred due to techniques like gradient accumulation and distributed training.
- It was mentioned that a mini-batch is run before each optimizer step, while a batch is a set of unique data, but the term batch size refers to the size of the mini-batch.
- Gradient Accumulation: Pro or Con?: Members debated the merits of gradient accumulation, with one member recalling it being previously dismissed but now seeing potential advantages in the early stages of training to calibrate optimizer states.
- Another member noted that gradient accumulation can be beneficial when network communications are slower than the compute, but otherwise, it is considered bad.
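A toy numeric check of the underlying equivalence: accumulating per-micro-batch gradients, each scaled by 1/num_micro_batches, matches the full-batch gradient exactly (illustrative scalar model, not a real training loop):

```python
# Model: scalar weight w, per-sample loss (w*x - y)^2, gradient 2*x*(w*x - y).
w = 0.5
data = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]

def grad(batch, w):
    """Mean gradient over a batch."""
    return sum(2 * x * (w * x - y) for x, y in batch) / len(batch)

full = grad(data, w)  # gradient of the full-batch mean loss

# Gradient accumulation: scale each micro-batch gradient by 1/num_micro_batches
micro_batches = [data[:2], data[2:]]
accum = sum(grad(mb, w) / len(micro_batches) for mb in micro_batches)

print(abs(full - accum))  # ~0.0: identical up to float error
```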
- Stanford Launches CS25 Transformers Course to the Public: Stanford has opened its CS25 Transformers seminar course to the public via Zoom, featuring discussions with researchers and covering topics from LLM architectures to creative applications.
- The course includes lectures, social events, networking sessions, and a Discord server for discussions, with past lectures available on YouTube.
- Category Theory: Reverse Engineering DL?: Someone shared a link to a thought experiment on whether category theory could be the optimal language for reverse engineering deep learning.
- The original post argued that neural networks have embeddings, or meaningful patterns of neuron activation rather than representations.
- Tweet from Nora Belrose (@norabelrose): Neural networks don't have "representations"They have embeddings, or meaningful patterns of neuron activationThey're meaningful in the sense of enabling us to do certain thingsDifferen...
- Join our Cloud HD Video Meeting: Zoom is the leader in modern enterprise video communications, with an easy, reliable cloud platform for video and audio conferencing, chat, and webinars across mobile, desktop, and room systems. Zoom ...
- - YouTube: no description found
Eleuther ▷ #research (21 messages🔥):
ACL Rebuttals, Deep Sets for Triangle Area, Comparing Language Model Embeddings, Relative Representations, Convergence of Representations in AI
- Reviewers Get Nudged After Rebuttal Submission: A member asked about sending an extra message to ACL reviewers the day after submitting a rebuttal and another member suggested it's reasonable if the rebuttal deadline is closing or if follow-up might take a few days.
- The original poster planned to follow up that evening, with the deadline being Thursday.
- Deep Sets Calculate Triangle Area, Yields Zero Insights: A member shared a link to a paper titled Deep Sets for the Area of a Triangle (arxiv link), which presents a polynomial formula for triangle area in Deep Sets form.
- The abstract concludes that the project, motivated by questions about computational complexity of n-point statistics in cosmology, gained no insights of any kind.
- Comparing Language Model Embedding Matrices: A member inquired about methods for analyzing and proving the similarity of two language models trained with the same tokenizer but different dimensionality embedding matrices.
- Suggestions included relative representations, least squares mapping, and comparing the leading entries of the eigenvalue decomposition of W^T W.
- Relative Representations Proposed as Solution, But Maybe Not: A member suggested relative representations (arxiv link) as a potential solution for comparing language model embeddings, while also cautioning about their limited applicability.
- They linked a paper discussing cosine similarity inflation in neural representations (arxiv link) and further pointed out related works discussing whether cosine is the best way to assess similarity.
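A minimal illustration of the relative-representations idea in toy 2-D embeddings: one "model" is a rotated copy of the other, so cosine similarities to shared anchor tokens coincide even though the raw coordinates differ (all data here is invented for the demo):

```python
import math

def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def relative_rep(emb, anchors):
    """Represent an embedding by its cosine similarities to anchor embeddings."""
    return [cos_sim(emb, a) for a in anchors]

def rotate(v, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

# "Model A" embeddings for tokens t0, t1 and two anchor tokens a0, a1
A = {"t0": [1.0, 0.2], "t1": [0.3, 0.9], "a0": [1.0, 0.0], "a1": [0.0, 1.0]}
# "Model B": same semantic geometry, but the whole space is rotated
B = {k: rotate(v, 0.7) for k, v in A.items()}

max_diff = max(
    abs(x - y)
    for tok in ("t0", "t1")
    for x, y in zip(
        relative_rep(A[tok], [A["a0"], A["a1"]]),
        relative_rep(B[tok], [B["a0"], B["a1"]]),
    )
)
print(max_diff)  # ~0.0: relative reps agree across the two spaces
```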
- AI Representation Convergence, Plato Style: A member linked a paper arguing that representations in AI models, particularly deep networks, are converging towards a shared statistical model of reality, akin to Plato's concept of an ideal reality (arxiv link).
- Others suggested using CCA or SVCCA to compare the embedding matrices, referencing papers on Singular Vector Canonical Correlation Analysis (arxiv link) and projection weighted CCA (arxiv link).
- Relative representations enable zero-shot latent space communication: Neural networks embed the geometric structure of a data manifold lying in a high-dimensional space into latent representations. Ideally, the distribution of the data points in the latent space should ...
- A formula for the area of a triangle: Useless, but explicitly in Deep Sets form: Any permutation-invariant function of data points $\vec{r}_i$ can be written in the form $ρ(\sum_iϕ(\vec{r}_i))$ for suitable functions $ρ$ and $ϕ$. This form - known in the machine-learning literatur...
- Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning: We present modality gap, an intriguing geometric phenomenon of the representation space of multi-modal models. Specifically, we show that different data modalities (e.g. images and text) are embedded ...
- SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability: We propose a new technique, Singular Vector Canonical Correlation Analysis (SVCCA), a tool for quickly comparing two representations in a way that is both invariant to affine transform (allowing compa...
- Insights on representational similarity in neural networks with canonical correlation: Comparing different neural network representations and determining how representations evolve over time remain challenging open questions in our understanding of the function of neural networks. Compa...
- The Platonic Representation Hypothesis: We argue that representations in AI models, particularly deep networks, are converging. First, we survey many examples of convergence in the literature: over time and across multiple domains, the ways...
Eleuther ▷ #scaling-laws (4 messages):
Learning Rate Impact, Scaling Efficiency, Model Oomph
- Learning Rate Affects Scaling Efficiency: A member stated that a bad learning rate changes the efficiency of scaling, which affects constants A & B.
- Another member added that a bad learning rate also changes how much oomph the model can get out of a given amount of data, which would seem to implicate beta.
- Bad Learning Rate Bad: Bad learning rate is bad.
- Like, really bad.
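The "constants A & B" and "beta" in this discussion come from the Chinchilla-style parametric loss fit, $L(N, D) = E + A/N^{\alpha} + B/D^{\beta}$, where N is parameter count and D is token count. A quick sketch with the fitted values published by Hoffmann et al. (2022) shows how each term scales; a bad learning rate would effectively worsen the fitted constants (scaling efficiency) or the data exponent beta (how much "oomph" the model gets per token).

```python
def chinchilla_loss(n_params, n_tokens,
                    E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Parametric scaling-law loss L(N, D) = E + A/N^alpha + B/D^beta.

    Default constants are the fit reported in the Chinchilla paper
    (Hoffmann et al., 2022); E is the irreducible loss.
    """
    return E + A / n_params**alpha + B / n_tokens**beta
```

Growing N shrinks only the A term and growing D shrinks only the B term, which is why a training setup that degrades either exponent changes where the compute-optimal N/D trade-off lands.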
Eleuther ▷ #interpretability-general (5 messages):
Neuronpedia Open Source, Delphi auto-interp server update, Actionable Interpretability Workshop at ICML 2025, Neuronpedia Datasets
- *Neuronpedia* Goes Open Source!: The interpretability platform Neuronpedia is now MIT open source and available on GitHub with a quick Vercel deploy.
- *Delphi* Auto-Interp Server Set for Update: Neuronpedia's auto-interp server, which utilizes Eleuther's `Delphi` (previously `sae-auto-interp`), is slated for an update to the latest version.
- The update aims to introduce new scoring and explaining types, facilitated by Neuronpedia's modular design and the existing OpenAPI schemas for the `Delphi` auto-interp server.
- Dive into 4+ TB of Neuronpedia's Data!: A trove of interpretability data, totaling over 4 TB, is available for download as Public Datasets.
- *Actionable Interpretability* Workshop Accepted to ICML 2025: The Actionable Interpretability workshop has been accepted to #ICML2025 and is accepting paper submissions until May 9th per this tweet.
- Tweet from Hadas Orgad (@OrgadHadas): 🎉 Our Actionable Interpretability workshop has been accepted to #ICML2025! 🎉>> Follow @ActInterp@tal_haklay @anja_reu @mariusmosbach @sarahwiegreffe @iftenney @megamor2Paper submission deadlin...
- Tweet from neuronpedia (@neuronpedia): Announcement: we're open sourcing Neuronpedia! 🚀This includes all our mech interp tools: the interpretability API, steering, UI, inference, autointerp, search, plus 4 TB of data - cited by 35+ re...
- Neuronpedia is Now Open Source | The Residual Stream: Interpretability tools for absolutely everyone, for free. Plus 4TB of datasets.
Eleuther ▷ #lm-thunderdome (28 messages🔥):
Debugger updates, SmolLM Evaluation Issues, Open LLM Leaderboard Normalization, Subtask Aggregation PR
- Debugger Status remains vague: Members requested updates on the debugger's progress, but the specific status remained unclear, with a member offering assistance by asking what branch the debugger was working on.
- A member shared code modifications, suspecting they might be causing unnecessary load, and later reported fixing a bug related to the number of choices in questions, suggesting a PR submission.
- SmolLM leaderboard evals return empty aggregate scores: A member reported that aggregate scores for tasks like leaderboard_bbh, leaderboard_math_hard, and leaderboard_musr were empty in the results JSON when running leaderboard evaluations with lm-eval on SmolLM-1.7B.
- They provided the command used and example output, noting that individual tasks reported numbers as usual, and linked to the Hugging Face Dataset Card.
- Non-standard Normalization on the Open LLM Leaderboard: The discussion highlighted the use of a non-standard normalization method on the Open LLM Leaderboard for evaluating and comparing LLMs.
- The normalization was introduced to address issues with optimized prompts and evaluation setups that inflate model scores.
- Subtask Aggregation PR Adds Subtask Scores: A member shared a PR adding subtask aggregation, copied from a Hugging Face fork, to address missing aggregate scores in tasks with subtasks.
- Another member tested the PR, reporting that installing lm_eval via editable triggered an unrelated error, but the PR otherwise appeared to work as expected.
- Hugging Face – The AI community building the future.: no description found
- Performances are plateauing, let's make the leaderboard steep again : no description found
- Build software better, together: GitHub is where people build software. More than 150 million people use GitHub to discover, fork, and contribute to over 420 million projects.
- leaderboard - add subtask scores by baberabb · Pull Request #2867 · EleutherAI/lm-evaluation-harness: added subtask aggregate scores from https://github.com/huggingface/lm-evaluation-harness/tree/main
- vllm backend faild · Issue #2028 · EleutherAI/lm-evaluation-harness: hi, i tried to eval : export CUDA_VISIBLE_DEVICES="2,3" accelerate launch -m lm_eval --model vllm \ --model_args pretrained="THUDM/glm-4-9b",dtype=bfloat16 \ --tasks mmlu \ --devic...
Eleuther ▷ #gpt-neox-dev (5 messages):
GPT-NeoX Pre-training on NVIDIA DGX Cloud, SLURM cluster restrictions, torchrun, DeepSpeed Launch modes
- Bypassing DeepSpeed Launcher on NVIDIA DGX Cloud: A member is pre-training GPT-NeoX on NVIDIA DGX Cloud but must bypass the default `deepy.py` launcher due to SLURM and SSH restrictions, using a custom script that leverages a hostfile and `torchrun`.
- The member is using this script to perform argument parsing and launch `train.py`, and has questions regarding their approach.
- Debating Direct `train.py` Execution: A member asked if they could directly start `python train.py` with encoded `ds_config` and `megatron_config` arguments, and how to handle GPU process spawning without `torchrun`.
- Another member confirmed this approach for manual bypassing, suggesting further modular refactoring to inject node-local processes that self-assign rank and detect GPU count, coordinating through principle rather than protocol.
- Navigating DGX Cloud with Torchrun: A user is using torchrun due to cluster restrictions and disabled SSH, referencing a comment and sample script from the NVIDIA DGX Cloud documentation.
- They are seeking guidance on whether they are implementing their custom solution correctly and whether they have reinvented the wheel.
- My servers used for multi-node training do not have ssh. How can I launch multi-node training using the torchrun command? · Issue #1203 · EleutherAI/gpt-neox: My machines used for multi-node training do not allow ssh service. How can I launch multi-node training using the most basic torchrun command (torch.distributed.launch) ? The servers which I use do...
- 3. Cluster User Guide — NVIDIA DGX Cloud Slurm Documentation: no description found
- hubble-gpt-neox/deepy_simple.py at 3dd12f3cdf78a29116d55d795f783f903cc284c0 · ameyagodbole/hubble-gpt-neox: An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries - ameyagodbole/hubble-gpt-neox
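As a sketch of what "self-assign rank" can look like on a SLURM cluster with SSH disabled: each process derives its global rank from its node index and node-local task index, reproducing the contiguous-per-node layout torchrun would produce. The SLURM variable names below are the standard ones, but `GPUS_PER_NODE` is a hypothetical variable the launch script would have to export itself.

```python
import os

def derive_ranks(env=os.environ):
    """Compute torch.distributed-style ranks from SLURM-provided indices.

    rank = node_id * gpus_per_node + local_rank, so ranks are contiguous
    per node -- the same layout torchrun assigns.
    """
    node_id = int(env["SLURM_NODEID"])         # index of this node in the job
    n_nodes = int(env["SLURM_NNODES"])         # total nodes in the job
    local_rank = int(env["SLURM_LOCALID"])     # index of this task on its node
    gpus_per_node = int(env["GPUS_PER_NODE"])  # exported by the launch script
    return {
        "rank": node_id * gpus_per_node + local_rank,
        "world_size": n_nodes * gpus_per_node,
        "local_rank": local_rank,
    }
```

Each process would then pass these values to `torch.distributed.init_process_group` along with a rendezvous address, with no launcher or SSH required.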
Interconnects (Nathan Lambert) ▷ #news (70 messages🔥🔥):
CodeScientist, OpenAI open language model, Meta's smart glasses, Multi-subject RLVR
- CodeScientist Automates Scientific Discovery: AllenAI introduces CodeScientist, a system for autonomous scientific discovery that uses genetic search over research articles and codeblocks to generate and evaluate machine-generated ideas, with 19 discoveries resulting from hundreds of experiments in agents and virtual environments, detailed in their paper.
- The system addresses limitations in current ASD systems by exploring broader design spaces and evaluating research artifacts more thoroughly, though one user noted that the generated papers are rather short listicle PDFs and all papers are negative results.
- OpenAI Teases Open-Weight Language Model: OpenAI plans to release its first open-weight language model since GPT-2 in the coming months, seeking developer feedback to maximize its utility, detailed in Sam Altman's tweet and OpenAI's feedback form.
- Altman stated that they will not do anything silly like saying that you cant use our open model if your service has more than 700 million monthly active users.
- Meta Plans Smart Glasses with Screen: Meta is planning to launch $1000+ Smart Glasses with a screen and hand gesture controls later this year, according to Mark Gurman's report.
- Members are interested to see how they'll do against xreal.
- Multi-subject data for paper Expanding RL: A multi-subject multiple-choice QA dataset ExamQA is used in the Expanding RL with Verifiable Rewards Across Diverse Domains paper.
- The dataset consists of 638k college-level instances, with both questions and objective answers written by domain experts for examination purposes.
- ChatGPT Gets a New Voice: OpenAI announced a new voice in ChatGPT, generating excitement and speculation about potential capabilities.
- One member joked about this being an April Fool's joke, while others expressed genuine interest.
- Tweet from Sam Altman (@sama): TL;DR: we are excited to release a powerful new open-weight language model with reasoning in the coming months, and we want to talk to devs about how to make it maximally useful: https://openai.com/op...
- Tweet from Andrew Carr (e/🤸) (@andrew_n_carr): ask and ye shall receiveQuoting Andrew Carr (e/🤸) (@andrew_n_carr) I have a very specific agentic use case that is just hard enough that web scraping doesn't work. 1. I have a list of 1000+ books...
- Tweet from Cameron Jones (@camrobjones): New preprint: we evaluated LLMs in a 3-party Turing test (participants speak to a human & AI simultaneously and decide which is which).GPT-4.5 (when prompted to adopt a humanlike persona) was judged t...
- Tweet from Unitree (@UnitreeRobotics): Unitree Release | Unitree Dex5 Dexterous Hand - Mastering the World with Agility 🥳Single hand with 20 degrees of freedom (16 active+4 passive). Enable smooth backdrivability (direct force control). E...
- Tweet from Mark Gurman (@markgurman): NEW: Meta is planning to launch $1000+ Smart Glasses with a screen and hand gesture controls later this year. Here’s how they’ll work: https://www.bloomberg.com/news/articles/2025-04-01/how-meta-s-upc...
- Tweet from Sam Altman (@sama): we will not do anything silly like saying that you cant use our open model if your service has more than 700 million monthly active users.we want everyone to use it!
- Tweet from OpenAI (@OpenAI): No joke, there's a new voice in ChatGPT.
- Tweet from Nathan Benaich (@nathanbenaich): sad to see - looks like the end of meta's fair is upon us
- Tweet from Nathan Lambert (@natolambert): I’ve joined OpenAI
- Tweet from Amir Efrati (@amir): you know what is bananas? growing a $3b-a-year business by 30% in >>3<< months.
- CodeScientist: End-to-End Semi-Automated Scientific Discovery with Code-based Experimentation: Despite the surge of interest in autonomous scientific discovery (ASD) of software artifacts (e.g., improved ML algorithms), current ASD systems face two key limitations: (1) they largely explore vari...
- Tweet from Nathan Lambert (@natolambert): These days if you’re interested in post training you should read all the stuff Joanne puts out (mostly is this type of thing or model spec discussions). Only person talking about this stuff publicly.Q...
- virtuoussy/Multi-subject-RLVR · Datasets at Hugging Face: no description found
Interconnects (Nathan Lambert) ▷ #random (24 messages🔥):
Pydantic Evals, Grok solves math, Gemini vs GPT 4.5, MidJourney v6, GPT-4o translation
- *Pydantic Evals* is here: Pydantic Evals is a powerful evaluation framework designed to help you systematically test and evaluate the performance and accuracy of the systems you build, especially when working with LLMs.
- *Grok* solves math problems: After several unsuccessful attempts, a member found a prompt that got Grok to solve a math problem (the well-known Dubnovy Blazen problem in graph theory), showcased in this tweet.
- *Gemini* is overly eager: A member compared Gemini to GPT-4.5, observing that Gemini is overly eager to explain everything, write a lot, while making subtle, childlike jokes here and there, like an autistic engineer.
- *MidJourney* is cooking: MidJourney is currently in a preview / rating phase for the final model (likely drops tomorrow) and they are absolutely cooking.
- *GPT-4o* translation: Members commented that GPT-4o produces a simple translation, reflecting readers' preference for plain English, but that a lot is lost in the translation itself, as showcased in this YouTube video.
- Tweet from Timothy Gowers @wtgowers (@wtgowers): It's finally happened: after several unsuccessful attempts, I found a prompt that got Grok to solve a maths problem (the well-known Dubnovy Blazen problem in graph theory) I've been working on...
- Evals - PydanticAI: Agent Framework / shim to use Pydantic with LLMs
- - YouTube: no description found
Interconnects (Nathan Lambert) ▷ #rl (4 messages):
KL Penalty in RL, Base Models vs Instruct Models, Reasoning and Reinforcement Learning
- KL Penalty Dropping Debated for RL: The question arose as to why dropping the KL penalty might be beneficial when performing RL on base models but not on instruct models, as mentioned in Nathan Lambert's post.
- Reasoning Needed for Base Models: It was suggested that a larger change is needed on base models, but it may change for models that have a reasoning component.
- RLHF Book: Nathan Lambert is writing a book on RLHF that he strongly recommends reading.
Link mentioned: Recent reasoning research: GRPO tweaks, base model RL, and data curation: The papers I endorse as worth reading among a cresting wave of reasoning research.
Interconnects (Nathan Lambert) ▷ #reads (7 messages):
OpenAI returning, Long timelines to advanced AI
- Lambert airs OpenAI thoughts: Nathan Lambert shared his thoughts on OpenAI returning in a substack post, mentioning that he may use this format for unbaked career thoughts too.
- He also mentioned DMing some OpenAI folks about it, hoping to find allies of open source who feel exiled by the current situation.
- Toner's Rising Tide Substack launch: Helen Toner launched her new Substack called Rising Tide and shared a post on long timelines to advanced AI.
- In the post, she noted that arguing for anything like human-level AI in the first half of the 21st century used to be a bold claim requiring strong evidence.
Link mentioned: "Long" timelines to advanced AI have gotten crazy short: The prospect of reaching human-level AI in the 2030s should be jarring
GPU MODE ▷ #general (46 messages🔥):
CUDA occupancy, GPU parallel processing, A100 thread limit, GRPO training with Qwen
- *Debate sparks over maximum parallel threads on A100 GPUs*: A discussion arose regarding the calculation of the maximum number of threads that can run in parallel on an A100 GPU, with one member stating the number is 96 * 2048.
- Another member used GeoHot's tool to test this hypothesis, showing that the practical limit on their A100 (96 SM) GPU is 24,576 threads, or 256 threads per SM, before performance degrades.
- *Warp scheduling explained as a way to hide latencies*: The discussion clarified that while a GPU can have many concurrent threads, they may not all run truly in parallel due to resource limitations like register space and shared memory.
- A member pointed out that GPUs use oversubscription to hide latencies, and context switches between warps are cheap (~1 cycle), unlike CPUs, adding threads above the limit for "parallel threads" does not necessarily increase runtime, or at least not significantly/measurably.
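The figures in this thread can be reconciled with a bit of arithmetic: 96 * 2048 counts the maximum resident (concurrent) threads the SMs can hold, while far fewer lanes actually execute each cycle. A rough sketch, using the SM count and per-SM limits from the discussion, the 64 FP32 lanes per SM from public A100 specs, and the 256-threads-per-SM knee measured empirically above:

```python
def a100_thread_budget(num_sms=96, max_threads_per_sm=2048,
                       fp32_lanes_per_sm=64, empirical_knee_per_sm=256):
    """Contrast resident threads (the occupancy limit) with issue-parallel lanes."""
    return {
        # threads that can be resident at once (what "96 * 2048" counts)
        "max_resident_threads": num_sms * max_threads_per_sm,
        # FP32 lanes that can actually execute per cycle
        "fp32_lanes": num_sms * fp32_lanes_per_sm,
        # threads before measured performance degraded in the thread above
        "empirical_parallel_limit": num_sms * empirical_knee_per_sm,
    }
```

The gap between resident threads (196,608) and executing lanes (6,144) is exactly the oversubscription headroom the scheduler uses to hide latency.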
- *Experiments in GRPO training with Qwen 0.5B Model*: A member shared that they have finished a GRPO training run with Qwen 0.5B (code instruct) on the GPUMODE kernel dataset, but the model didn't effectively generate Triton kernels.
- They hypothesize that using SFT to teach the model the basics of Triton implementation, followed by GRPO for refinement, will be more successful.
GPU MODE ▷ #triton (2 messages):
Disable autotune, Triton kernel
- Disable Triton Autotune Temporarily: A member asked for a way to disable autotune temporarily because their Triton kernel is called in two situations, only one of which needs autotuning, and they are using the `triton.autotune` decorator.
- Another member suggested using a global variable to turn autotune on/off and reload the module, or using autotune inside the kernel instead of as a decorator.
- Global variable trick: One member suggests turning autotune on/off using a global variable and reloading the module that contains the Triton kernel.
- The other option is to use autotune inside the kernel, not as a decorator, which doesn't require reloading the module.
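A related pattern, sketched below, avoids the module reload entirely: build a tuned and an untuned variant of the kernel once and pick between them at the call site. The stub here stands in for `triton.autotune` so the pattern runs without a GPU; the kernel body and config strings are placeholders, not real Triton code.

```python
def autotune_stub(configs):
    """Stand-in for triton.autotune: wraps the kernel and records its configs."""
    def wrap(fn):
        def tuned(*args, **kwargs):
            # real autotuning would benchmark `configs` here on first call
            return fn(*args, **kwargs)
        tuned.tuned_configs = configs
        return tuned
    return wrap

def plain_kernel(x):
    return [v * 2 for v in x]  # placeholder for the real Triton kernel body

# Apply the decorator manually, once, instead of on the definition.
tuned_kernel = autotune_stub(configs=["BLOCK=64", "BLOCK=128"])(plain_kernel)

def launch(x, use_autotune):
    kernel = tuned_kernel if use_autotune else plain_kernel
    return kernel(x)
```

Because the decorator is applied by hand rather than at definition time, both call sites can coexist without touching global state or reloading the module.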
GPU MODE ▷ #cuda (2 messages):
Request for PMPP book PDF, PMPP book
- Request for PMPP book PDF: A member asked for the PMPP book PDF of the recent version.
- No links or further details were provided in the message.
- PMPP Book Inquiry: A user requested the latest version of the PMPP book PDF.
- This request did not include any links or further context.
GPU MODE ▷ #torch (5 messages):
FlexAttention, Arbitrary Sequence Lengths, PyTorch 2.6, Tensor Subclass Use Case, Memory savings
- *FlexAttention* Embraces Arbitrary Sequence Lengths: FlexAttention now supports arbitrary sequence lengths, addressing the previous requirement for segment sequence lengths to be a multiple of 128, as of PyTorch 2.6.
- This enhancement was discussed with Horace He at a GPU mode event in San Jose.
- Tensor Subclass Use Case Questioned: A user inquired about the intended use case for a tensor subclass.
- This suggests a potential issue or area for improvement in PyTorch's tensor subclassing functionality, prompting further investigation.
- Desire for Memory Savings Using Tensor Deletion: A user is seeking methods to delete argument tensors within a loss function to achieve memory savings of approximately 7GB.
- The user wants to free the storage associated with a tensor after it's no longer needed, even if a reference exists in the outer scope, while ensuring it remains compatible with torch compilation to avoid graph breaks; see the GitHub Issue for more information.
Link mentioned: Graph break on Tensor._make_subclass · Issue #150265 · pytorch/pytorch: 🐛 Describe the bug I am having the following problem from torch import nn import torch torch_compile_options = { "epilogue_fusion" : True, "max_autotune" : True, "shape_paddi...
GPU MODE ▷ #cool-links (1 messages):
marksaroufim: https://arxiv.org/abs/2503.20313
GPU MODE ▷ #jobs (1 messages):
MLX, Apple hiring, ML systems
- Apple's MLX Team is Recruiting: Apple is hiring engineers to join their MLX team and advance the frontier of ML and systems.
- The role involves building scalable, distributed training and research pipelines, working with researchers and software engineers on novel ML research algorithms.
- ML System Development at Apple: Apple's Machine Learning Research org focuses on building technologies that will power future products.
- They seek engineers with system engineering and software development backgrounds to build scalable, distributed training and research pipelines.
Link mentioned: AIML - Software Engineer for MLX, MLR - Jobs - Careers at Apple: Apply for a AIML - Software Engineer for MLX, MLR job at Apple. Read about the role and find out if it’s right for you.
GPU MODE ▷ #beginner (2 messages):
CUDA Program Execution, GPU Volumetric Data Processing
- GPU Eats Gigabytes in Volumetric Data: For models processing volumetric data, like in the medical domain, a volume of 512³ voxels, 32 channels and fp16 activations can result in 8GiB of data per layer.
- This highlights the significant memory requirements for certain types of GPU computations.
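The 8 GiB figure checks out exactly, as a quick calculation shows:

```python
def activation_bytes(side=512, channels=32, bytes_per_elem=2):
    """Per-layer activation size for a cubic volume in fp16 (2 bytes/element)."""
    return side**3 * channels * bytes_per_elem

# 512^3 voxels * 32 channels * 2 bytes = 2**33 bytes = 8 GiB
assert activation_bytes() == 8 * 2**30
```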
- CUDA Kernel Code Compilation and Execution Explored: A member is trying to understand how execution of a CUDA program works and wants to know what exactly is sent over the PCIe bus, from the CPU to the GPU.
- They assume that kernel code is compiled into some GPU-machine byte code, and when a call to kernel code is made, this code is then sent to the GPU.
GPU MODE ▷ #off-topic (2 messages):
Egg noodles with chicken and vegetables, Image Analysis with YouTube
- Egg-cellent Noodle Dish Debuts: A member showcased a dish of egg noodles with chicken and vegetables in soy sauce with black pepper, featuring egg noodles, soy sauce, chicken fillet, onion, sweet red pepper, French green beans, beef fat, and sesame.
- An image of the dish was shared (IMG_20250401_045505.jpg).
- YouTube Analysis Uploads: An image analysis was conducted with a YouTube video titled "- YouTube", although the video's description is undefined.
Link mentioned: - YouTube: no description found
GPU MODE ▷ #irl-meetup (3 messages):
NYC Meetups, Community Meetup
- NYC Meetups in the Works: A member inquired about any meetups in NYC and another member confirmed they are planning something.
- The inquiring member responded with excitement, indicating interest in attending.
- Community Plans Meetup: A community meetup is planned.
- Enthusiastic community member is excited about upcoming plans.
GPU MODE ▷ #self-promotion (1 messages):
Megatron Tensor Parallelism, Fused/Parallel CE Loss
- Deep Dive into Megatron Tensor Parallelism is Illustrated!: A member wrote an illustrated deep-dive into Megatron-style tensor parallelism, including the fused/parallel CE loss, seeking feedback on the content.
- Check out the illustrated deep-dive to deepen your understanding of ML scalability & performance techniques.
- Feedback Requested on Megatron-Style Parallelism Deep Dive: The author of an illustrated deep-dive on Megatron-style tensor parallelism is soliciting feedback.
- The article covers aspects like fused/parallel CE loss and aims to enhance understanding of ML scalability and performance.
Link mentioned: Tweet from Daniel Vega-Myhre (@vega_myhre): For any ML folks who want to deepen their understanding of ML scalability & performance techniques, I wrote an illustrated deep-dive into Megatron-style tensor parallelism: https://danielvegamyhre.git...
GPU MODE ▷ #🍿 (1 messages):
AlphaGeometry, LLM for kernel optimization
- Scoping AlphaGeometry-style LLM + verifier for kernel optimization: A member inquired about the prior exploration of using an AlphaGeometry-style LLM + verifier approach for kernel optimization.
- They asked if it had been attempted or discussed previously, acknowledging their potential rediscovery of existing concepts, given their newness to the field.
- Newbie questions about LLM and kernel optimization: A very new member is rediscovering ideas about kernel optimization and is requesting pointers to past discussions.
- They expressed an interest in anyone pointing them to what happened with the idea of using AlphaGeometry style LLM + verifier for the kernel optimization process.
GPU MODE ▷ #reasoning-gym (9 messages🔥):
OpenAI Open-Weight Reasoning Models, PR Review Requests, Arc AGI PR, Collisions PR, CodeIO Dataset Merged
- New Reasoning Gym Banner Shines: A new banner for the reasoning-gym, made with 4o, was shared, with a potential PR to add it to the readme.
- Another member pointed out the Rubik's cube depicted on the banner was "not a valid Rubik's cube reeeeeeeeeee".
- OpenAI to Open-Source Strong Models?: Members expressed surprise that OpenAI may publish strong open-weight reasoning models.
- One member speculated this could significantly increase OpenAI's valuation, while another reviewed two outstanding PRs.
- Arc AGI and Collisions PRs Ready for Scrutiny: The arc agi and collisions PRs are up for review.
- Changes were requested to the Collisions PR, specifically to unstage notebooks that were simply run without modifications.
- CodeIO Dataset Ingested: The CodeIO dataset was merged after a delay; further postprocessing will align it with the existing implementation.
- Thanks to the user who merged the CodeIO dataset.
GPU MODE ▷ #general (3 messages):
.py scripts vs .cu files, active python leaderboards
- Clarification Sought: Python vs. CUDA Submissions: A member inquired whether the leaderboards currently only accept .py scripts and not .cu files.
- Another member suggested reviewing a previous message for clarification on the submission guidelines.
- Active Leaderboards Confirmation: A member questioned whether all active leaderboards are currently restricted to Python submissions.
- Another member directed them to a previous message, likely containing details about active leaderboards and submission requirements.
GPU MODE ▷ #submissions (17 messages🔥):
vectorsum, conv2d, vectoradd, matmul, grayscale
- Vectorsum benchmark floods leaderboard: Multiple benchmark submissions for `vectorsum` on L4 and H100 GPUs using Modal runners have succeeded, with submission IDs 3372, 3374, 3375, 3395, 3396, and 3397.
- Conv2d benchmark succeeds on multiple GPUs: A leaderboard submission for `conv2d` on L4, T4, A100, and H100 GPUs using Modal runners has succeeded, with submission ID 3373.
- Vectoradd benchmarks hit T4 and H100: Leaderboard submissions for `vectoradd` on H100 and T4 GPUs using Modal runners have succeeded, with submission IDs 3394 and 3399 respectively.
- Matmul benchmarks meet A100: Leaderboard submissions for `matmul` on A100 GPUs using Modal runners have succeeded, with submission IDs 3400 and 3408.
- Grayscale tests get going: Multiple test submissions for `grayscale` on H100 GPUs using Modal runners have succeeded, with submission IDs 3402, 3403, 3404, 3405, 3406, and 3407.
Latent Space ▷ #ai-general-chat (75 messages🔥🔥):
Cursor's Funding Round, Etched's New Transformer ASIC, OpenAI's New Open-Weight Language Model, OpenDeepSearch (ODS), Sophont: Open Multimodal Foundation Models for Healthcare
- Cursor Closes Cashy Round, Codes Vibes: Cursor closed a $625M funding round at a $9.6B post-money valuation, led by Thrive and A16z, with Accel as a new backer, achieving $200M ARR, a 4x increase from its previous round in November 2024 (Source).
- Abe Brown noted that Cursor's valuation has grown rapidly, sparking the buzzphrase *vibe coding*, with its valuation possibly reaching $10B.
- Etched, the Transformer ASIC Startup Etches $85M Round: Etched, a startup developing transformer ASICs, closed an unannounced $85M round at a $1.5B valuation, following two stealth rounds at $500M and $750M, with their chip Sohu able to process over 500,000 tokens per second running Llama 70B (Source).
- Etched claims one 8xSohu server replaces 160 H100s, but Sohu cannot run CNNs, LSTMs, SSMs, or any other AI models.
- OpenAI Opens Up: Open-Weight Model Incoming: OpenAI plans to release its first open-weight language model since GPT-2 in the coming months, seeking developer feedback on how to make it maximally useful (Source).
- The company will evaluate the model according to their preparedness framework and host developer events in SF, Europe, and APAC to gather feedback and test early prototypes, and Nathan Lambert expects a 30B parameter reasoning model with MIT/Apache license (Source).
- OpenDeepSearch Opens Up Web Search: Sewoong Oh (@sewoong79) announced the release of OpenDeepSearch (ODS), an open-source search agent that works with any LLM, outperforming OpenAI's specialized model for web search, GPT-4o-Search, by +9.7% accuracy on the challenging, multi-hop FRAMES benchmark from DeepMind (Source).
- Sophont Startup Seeks to Solve Medical AI: iScienceLuvr announced the launch of Sophont, a company building open multimodal foundation models for the future of healthcare, aiming to create a DeepSeek for medical AI (Source).
- Introducing Amazon Nova Act | Amazon AGI Labs: no description found
- Tweet from Sam Altman (@sama): TL;DR: we are excited to release a powerful new open-weight language model with reasoning in the coming months, and we want to talk to devs about how to make it maximally useful: https://openai.com/op...
- Tweet from Tanishq Mathew Abraham, Ph.D. (@iScienceLuvr): I have EXCITING news:I've started a company!Introducing SophontWe’re building open multimodal foundation models for the future of healthcare. We need a DeepSeek for medical AI, and @SophontAI will...
- Tweet from Guillermo Rauch (@rauchg): We're building an API to run arbitrary compute, targeting agentic AI usecases and long-running tasks. Yes, it can run servers.Powered by the infra that runs our 1M+ daily @vercel builds, optimized...
- Tweet from Steven Heidel (@stevenheidel): we're releasing a model this year that you can run on your own hardwareQuoting Sam Altman (@sama) TL;DR: we are excited to release a powerful new open-weight language model with reasoning in the c...
- Tweet from Sewoong Oh (@sewoong79): We are releasing OpenDeepSearch (ODS), an open-source search agent that works with any LLM. When paired with DeepSeek-R1, ODS outperforms OpenAI’s specialized model for web search, GPT-4o-Search, on t...
- Tweet from Arfur Rock (@ArfurRock): 🚨New unicorn alert — Etched, word's first transformer ASICClosed an unannounced $85M at $1.5B, following two other stealth rounds at $500M then $750M.The $750M round was just ~2 months ago.Quotin...
- Tweet from Sam Julien (@samjulien): Really happy to see our @Get_Writer Palmyra X 004 model from October up on the @rungalileo Agent Leaderboard coming in at #9, beating other models from the same time like Claude 3.5 Sonnet and Gemini ...
- Tweet from Justin Uberti (@juberti): The new OpenAI realtime transcription API now supports WebRTC connections, which allows you to easily connect a MediaStream or <audio> element to the API. Just made a quick demo to show this off...
- Tweet from Amazon Science (@AmazonScience): Meet Amazon Nova Act — an effortless way to build AI agents that can reliably use browsers 🧑💻With our new model, compose robust steps into complex workflows; handle everything from bookings to QA t...
- Tweet from Arfur Rock (@ArfurRock): Cursor round closed — $625M at $9.6B post led by Thrive & A16z. Accel is a new backer.$200M ARR, up 4x from $2.5B round in November 2024.ARR multiple constant from last round at 50x.Quoting Abe Brown ...
- Tweet from 𝚐𝔪𝟾𝚡𝚡𝟾 (@gm8xx8): Proof or Bluff? Evaluating LLMs on 2025 USA Math OlympiadTop models hit <5% on 2025 USAMO full-solution eval, despite strong answer-only benchmark scores. Analysis reveals failure modes tied to tra...
- Some thoughts on OpenAI returning to open releases: And welcome to my extra uneditted thoughts blog.
- Tweet from Alex Volkov (Thursd/AI) (@altryne): ByteDance OmniHuman is now available!OmniHuman has wowed all of us with an unbelievable AI Avatar animation a few month ago an is finally accessible to general public (not for free!) It's VERY slo...
- Tweet from Nathan Lambert (@natolambert): Some basic logic for why OpenAI will probably release an ~30B param reasoning model with MIT/apache.* OpenAI will only release something clearly SOTA in size category* Will release a reasoning model, ...
- Tweet from Arfur Rock (@ArfurRock): 🚨New unicorn alert — Etched, word's first transformer ASICClosed an unannounced $85M at $1.5B, following two other stealth rounds at $500M then $750M.The $750M round was just ~2 months ago.Quotin...
- Tweet from Sewoong Oh (@sewoong79): We are releasing OpenDeepSearch (ODS), an open-source search agent that works with any LLM. When paired with DeepSeek-R1, ODS outperforms OpenAI’s specialized model for web search, GPT-4o-Search, on t...
- Tweet from Guillermo Rauch (@rauchg): We're building an API to run arbitrary compute, targeting agentic AI usecases and long-running tasks. Yes, it can run servers.Powered by the infra that runs our 1M+ daily @vercel builds, optimized...
- Tweet from Stephanie Palazzolo (@steph_palazzolo): This isn't an April Fools joke: ChatGPT revenue has surged 30% in just three months. In this morning's Agenda, @amir and I get into ChatGPT's growth, the OpenAI-Google attention war, and w...
- The case against conversational interfaces: Conversational interfaces are a bit of a meme. Every couple of years a shiny new AI development emerges and people in tech go "This is it! The next computing paradigm is here! We'll only use...
- GitHub - aws/nova-act: Amazon Nova Act is a research preview of a new AI model for developers to build agents that take actions in web browsers: Amazon Nova Act is a research preview of a new AI model for developers to build agents that take actions in web browsers - aws/nova-act
- OpenAI Valued at $300 Billion After Record-Setting Funding Round: OpenAI on Monday announced it has raised $40 billion in what is easily the largest funding round for a private tech company ever.
- [AINews] >$41B raised today (OpenAI @ 300b, Cursor @ 9.5b, Etched @ 1.5b): More money is all you need AI News for 3/28/2025-3/31/2025. We checked 7 subreddits, 433 Twitters and 30 Discords (230 channels, and 17665 messages) for you....
HuggingFace ▷ #general (42 messages🔥):
DeepSeek R1, xAI Acquires X, Hyperparameter tuning LLMs, SFTTrainer hanging, stable_baselines3 CPU faster than GPU
- DeepSeek R1 outmaneuvers Western labs: A user linked to a tweet criticizing lazy attacks trying to downplay DeepSeek R1, which outmaneuvered bloated Western labs through ferocious execution and resource efficiency.
- The member added that DeepSeek also released weights under a maximally permissive MIT license and democratized RL for the GPU poor through GRPO.
- xAI gobbles up X: A member linked to a tweet announcing that xAI has acquired X in an all-stock transaction, valuing xAI at $80 billion and X at $33 billion.
- The combination unlocks immense potential by blending xAI’s advanced AI capability and expertise with X’s massive reach.
- LLM Hyperparameter Tuning Resource Quest: A member inquired about resources for choosing hyperparameters when fine-tuning LLMs, seeking a godsend of a resource that addresses how changing context affects certain hyperparameters.
- Another member suggested checking out Unsloth's Discord and linked to Unsloth's LoRA Hyperparameters Guide.
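The interaction the member asked about (how context length ripples into other knobs) can be sketched as a plain config. The values below are illustrative assumptions, not recommendations from the linked guide:

```python
# Illustrative sketch of common LoRA fine-tuning hyperparameters.
# All values are examples for discussion, not tuned recommendations.
lora_config = {
    "r": 16,              # adapter rank: higher = more capacity, more memory
    "lora_alpha": 16,     # scaling factor; often set equal to (or 2x) r
    "lora_dropout": 0.0,  # regularization on adapter activations
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}
training_config = {
    "learning_rate": 2e-4,
    "max_seq_length": 2048,            # longer context raises memory per sample...
    "per_device_train_batch_size": 2,  # ...so per-device batch size is usually lowered
    "gradient_accumulation_steps": 8,  # ...while accumulation keeps the effective batch constant
}

# Effective batch size stays fixed even as context length forces smaller micro-batches.
effective_batch = (training_config["per_device_train_batch_size"]
                   * training_config["gradient_accumulation_steps"])
```

This is the usual trade: doubling `max_seq_length` roughly doubles activation memory per sample, which is typically absorbed by halving the micro-batch and doubling accumulation.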
- SFTTrainer freezes mid-training: A user reported that their SFTTrainer was hanging after truncating the training dataset, and it timed out after an hour.
- A member suggested that the issue might be due to a lack of progress bar appearance and potential misconfiguration of TrainingArguments or Trainer settings.
- CPU outruns GPU with stable_baselines3: A user reported receiving a warning about running PPO on the GPU with MlpPolicy, suggesting it's primarily intended for the CPU, and linked to a GitHub issue.
- The user was confused why the CPU might be faster than the GPU when running a Multi-Layer Perceptron.
- Tweet from Elon Musk (@elonmusk): @xAI has acquired @X in an all-stock transaction. The combination values xAI at $80 billion and X at $33 billion ($45B less $12B debt). Since its founding two years ago, xAI has rapidly become one of ...
- Tweet from Ross Taylor (@rosstaylor90): Nothing against Microsoft AI, who do very good work - but I just hate these kind of lazy attacks trying to downplay DeepSeek R1.In reality, DeepSeek outmanoeuvred bloated Western labs through ferociou...
- LoRA Hyperparameters Guide | Unsloth Documentation: Best practices for LoRA hyperparameters and learn how they affect the finetuning process.
- Hugging Face - Learn: no description found
- @RudeBoi on Hugging Face: "Can someone please explain to me why I am getting this error message? Please…": no description found
- Hyperparameter tuning - GeeksforGeeks: Hyperparameter tuning is the process of selecting optimal configuration settings for a machine learning model to enhance its performance, with techniques including GridSearchCV, RandomizedSearchCV, an...
- Hyperparameter search: no description found
- Hyperparameter Optimization: no description found
- [Bug]: Very Poor A2C CPU Utilization · Issue #1245 · DLR-RM/stable-baselines3: 🐛 Bug When using the A2C algorithm with 50 vectorized environments on a Ryzen Threadripper machine with 12 cores / 24 threads, I'm getting this really poor, almost single-core CPU utilization. I&...
HuggingFace ▷ #today-im-learning (3 messages):
Agents Course Unit 2.1, Run Jupyter Lab Locally, RL Course Frozen Lake issue
- Agents Course Unit 2.1 Runs Locally: A member mentioned they are learning Agents Course Unit 2.1, and it works when run using the kernel of a local venv with JupyterLab and its widgets installed.
- They noted that Colab is not an option for them because they do not have a Google account.
- Instructions to run Jupyter Lab Locally: A member is looking into how to get a notebook to run, asking if they should clone the repo and run jupyter-lab locally using these instructions.
- The user expressed confusion about where to run it, mentioning that if they used Google Colab, they're unsure how to link the notebook to the Colab Google Workspace.
- Frozen Lake Code Fixed: A member noticed that the code offered in Unit 2 in the RL Course for Frozen Lake was not working due to a Python Version issue.
- They shared a link to their HuggingFace page with code to resolve the pickle5 problem.
Link mentioned: What are LLMs? - Hugging Face Agents Course: no description found
HuggingFace ▷ #cool-finds (3 messages):
OpenHands LM, Autonomous Agents, Nature article on data access
- *OpenHands LM* opens coding!: The new open coding model OpenHands LM is available on Hugging Face and, at 32B, a reasonable size to run locally, according to a member.
- It is intended for use in autonomous agents for software development and more information is available on the project blog.
- Data access is restricted on Nature's article: A Nature article restricts access to its data under a clinical trial protocol that allows sharing deidentified information with researchers but prohibits making it publicly available.
- To protect the participant’s anonymity, any information that could identify her will not be part of the shared data, specifically her personalized voice synthesizer.
- A streaming brain-to-voice neuroprosthesis to restore naturalistic communication - Nature Neuroscience: Naturalistic communication is an aim for neuroprostheses. Here the authors present a neuroprosthesis that restores the voice of a paralyzed person simultaneously with their speaking attempts, enabling...
- all-hands/openhands-lm-32b-v0.1 · Hugging Face: no description found
- All Hands AI: no description found
HuggingFace ▷ #i-made-this (1 messages):
tonic_1: very cool
HuggingFace ▷ #computer-vision (1 messages):
YOLO vertical object detection, CNN vertical object detection, Instance segmentation fragments
- YOLO & CNN Seek Vertical Vision Boost: A member inquired about enhancing YOLO or any CNN's ability to detect vertical objects, asking if increasing depth would help.
- Responses suggested exploring data augmentation techniques or custom loss functions.
- Fragmented Instance Segmentation Fixes: A member is facing issues with their instance segmentation model detecting fragments of the same objects.
- They asked for suggestions on how to make the model recognize these fragments as one object, such as using label tags across segments.
HuggingFace ▷ #gradio-announcements (2 messages):
Gradio Milestone, Million monthly active developers
- Gradio Reaches One Million Monthly Active Developers!: Gradio announced it has achieved a milestone of 1,000,000 monthly active developers using the platform to create and share AI interfaces.
- The Gradio team expressed gratitude to the community for their invaluable contributions in achieving this significant milestone.
- Community Celebrates Gradio's Success: Members of the Gradio community celebrated the platform's achievement of reaching one million monthly active developers.
- Community members acknowledged Gradio's impact on enabling ML researchers and companies to build production-ready AI interfaces, highlighting the platform's growth and importance in the AI landscape.
HuggingFace ▷ #agents-course (16 messages🔥):
OpenAIServerModel with Ollama, Langraph OpenAI API model alternatives, Release of Unit 3
- Ollama Plays Well With OpenAIServerModel: Members discussed using OpenAIServerModel with Ollama, given its compatibility with the OpenAI API.
- Seek Alternatives to Langraph OpenAI API Model: One member requested recommendations for alternatives to the Langraph OpenAI API model for an email agent.
- Unit 3 Delayed, Community Screams Into Void: Many members are eagerly awaiting the release of Unit 3 for the Agent course, though one member noted its release was delayed.
- One member joked maybe if we all keep yelling into the void, the void will call back.
HuggingFace ▷ #open-r1 (1 messages):
Liger Kernel, GPU Memory Occupation, Speed vs. Memory Trade-off
- *Liger Kernel*: Speed Boost vs. Memory Hog: A user found that applying the Liger kernel significantly improved speed but resulted in high GPU memory occupation.
- They questioned if their application method was flawed, seeking advice to optimize memory usage without sacrificing performance.
- Analyzing Liger Kernel's Memory Footprint: The user's experience highlights a potential trade-off between computational speed and memory consumption when using the Liger kernel.
- Further investigation is needed to understand the kernel's memory management and identify possible optimization strategies.
Modular (Mojo 🔥) ▷ #general (9 messages🔥):
MAX 25.2 livestream, Chris lightning talk, GTC Chris video
- *MAX 25.2 Livestream* Kicks Off!: Modular's MAX 25.2 livestream was announced, inviting viewers to join via LinkedIn or YouTube to ask the team questions live.
- Due to technical difficulties, a new livestream link was shared (YouTube), “Introducing MAX 25.2 Live!”.
- Apologies for Tech Glitches During Livestream!: Members apologized for the technical issues during the MAX 25.2 livestream, assuring a better system for the next event.
- One member humorously recounted accidentally watching a video of Chris at GTC thinking it was part of the livestream.
- Cleaned Up Livestream and Chris's Talk Available!: A cleaned-up recording of the MAX 25.2 livestream was posted (YouTube) for those who missed it live.
- A full recording of Chris' lightning talk at the Modular booth is also available on YouTube.
- YouTube: no description found
- Introducing MAX 25.2 Live!: Join us April 1st for a livestream where we’ll dive into all things MAX 25.2! 💥 Get the full scoop on our latest release from the team who built it and a be...
- YouTube: no description found
Modular (Mojo 🔥) ▷ #mojo (59 messages🔥🔥):
Compiler Bug, Enums, Flex Attention, Float to String Algorithm, FlashAttention-2 in Mojo
- Confusing Compiler Error Message Exposed: A user reported a confusing error message when defining a method for a `Dataset` struct, suspecting a compiler bug, and provided a GitHub issue link.
- Another user suggested the issue might be due to using `out self` instead of `mut self`, while acknowledging the compiler error message was still confusing.
- Enum Updates Still MIA: A user inquired about updates on enums in Mojo, but unfortunately, there are no updates available.
- The response was a simple "Sadly no. 🙃🙃🙃"
- MAXing FlexAttention Implementation: A user asked about implementing flex-attention in Mojo and whether it's difficult, linking to a PyTorch blog post on flex-attention.
- It was suggested that implementing it as a custom op in MAX is possible and that Mojo on the GPU is close to CUDA, allowing control over memory movement, so "unless you run into something that's a work in progress, MAX should be able to do more or less whatever you want."
- Float-to-String Algorithm Porting Disappoints: A user ported a new float-to-string algorithm to Mojo, referencing the creator's CppCon talk, but found it slower than the standard library's dragonbox implementation and shared a link to the relevant code.
- The user noted that stringifying `canada.json` went from the mid-30ms range to the low-40ms range, despite ripping the formatting from the standard library.
- FlashAttention-2 Recipe Revealed: A user shared a link to a recipe containing a version of FlashAttention-2 in Mojo, emphasizing it was written for readability, not super-optimized performance, see custom-ops-ai-applications.
- Another link was provided to a recipe showing progressive optimization of matrix multiplication using Mojo's memory layout abstractions, see custom-ops-matrix-multiplication.
- ir_utils.mojo: GitHub Gist: instantly share code, notes, and snippets.
- GitHub · Build and ship software on a single, collaborative platform: Join the world's most widely adopted, AI-powered developer platform where millions of developers, businesses, and the largest open source community build software that advances humanity.
- GitHub - cassioneri/teju_jagua: Teju Jagua: Teju Jagua. Contribute to cassioneri/teju_jagua development by creating an account on GitHub.
- EmberJson/emberjson/teju/__init__.mojo at main · bgreni/EmberJson: A user friendly json library written in pure Mojo. Contribute to bgreni/EmberJson development by creating an account on GitHub.
- Custom Operations: Applications in AI Models Recipe | MAX Builds: no description found
- Custom Operations: Optimizing Matrix Multiplication Recipe | MAX Builds: no description found
- teju_jagua/teju/mshift.h at main · cassioneri/teju_jagua: Teju Jagua. Contribute to cassioneri/teju_jagua development by creating an account on GitHub.
- [BUG] Confusing error message for broken method definition · Issue #4248 · modular/max: Bug description Actual behavior I made a mistake defining a method for Dataset struct. The parameter just had a type instead of owned type. The error message told me that self was of the wrong type...
Nous Research AI ▷ #general (40 messages🔥):
OpenAI API, Midjourney New Research, Sam Altman open-weight language model, Psyche p2p, Anthropic Insights on LLMs
- One-Line Fix Makes OpenAI API tutorials work: Any tutorial that works with the OpenAI API should work with the Nous Research AI API, provided you change the endpoint in the code to `endpoint = "api.nousresearch.com"`.
- One user confirmed they had it running with that change and will be adding styles.
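A minimal stdlib-only sketch of that endpoint swap, building an OpenAI-style chat request against the Nous base URL. The `/v1/chat/completions` path and the model id are assumptions, not confirmed details of the Nous API:

```python
# Sketch: point an OpenAI-compatible chat request at the Nous endpoint.
# Path and model id below are illustrative assumptions.
import json
import urllib.request

NOUS_BASE_URL = "https://api.nousresearch.com/v1"

def build_chat_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request aimed at the Nous endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{NOUS_BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

With the official OpenAI SDK, the equivalent one-line fix is passing the Nous URL as the client's base URL instead of the default.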
- Midjourney's LLMs Write More Creatively: Midjourney released a new research paper alongside machine learning experts at NYU on training text-based large language models (LLMs) to write more creatively, expanding beyond its image generation focus.
- The company is also building its own computing and AI hardware, announced in late summer 2024.
- Sam Altman Teases New Open-Weight Model: Sam Altman announced plans to release a new open-weight language model with reasoning capabilities in the coming months, seeking developer feedback to maximize its usefulness.
- Developer events are planned in SF, Europe, and APAC to gather feedback and test early prototypes, marking OpenAI's first open-weight model release since GPT-2 (link to the announcement).
- Tracing Thoughts in Language Models: Anthropic's Insights: Anthropic has released research (Tracing Thoughts in Language Models) indicating that LLMs have a thinking language of their own and think ahead more than previously thought.
- LLMs operate in more complex ways than just processing single tokens.
- DeepSeek jiu jitsu makes open source Open AI Model possible: Members on the channel expressed 'gratitude to DeepSeek for applying complex Jiu Jitsu maneuvers to make this a reality for the Open Source community'.
- This sentiment was echoed, along with a link to a YouTube video discussing OpenAI's shifting strategy related to the open-weight model.
- Midjourney’s surprise: new research on making LLMs write more creatively: There's still a lot of juice left to be squeezed, cognitively and performance-wise, from classic Transformer-based, text-focused LLMs.
- Tweet from Sam Altman (@sama): TL;DR: we are excited to release a powerful new open-weight language model with reasoning in the coming months, and we want to talk to devs about how to make it maximally useful: https://openai.com/op...
- What OpenAI's new model reveals about its shifting strategy: CNBC’s Deirdre Bosa joins 'The Exchange' to discuss OpenAI's plans to release open-weight model in the coming months.
Nous Research AI ▷ #ask-about-llms (8 messages🔥):
DeepHermes Reasoning, Structured Output with Langchain, DeepHermes AI, Tool Calling with Reasoning
- DeepHermes Reasoning Reliability Investigated: According to a member, it is currently more reliable to avoid using JSON or tool calling with reasoning mode in DeepHermes, instead opting for the non-reasoning mode.
- The next version of DeepHermes is expected to improve on reasoning and tool calling; however, for current use, combining a reasoning system prompt, a newline, and a tool calling system prompt may yield acceptable results.
- DeepHermes AI Discovered: A member excitedly noted the existence of DeepHermes AI, discovering it is a 3B model.
- The same member observed that reasoning in DeepHermes appears to be implemented as a chain of thoughts with `<think> </think>` tags.
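Assuming the `<think> </think>` convention described above, separating the chain of thought from the final answer is a short regex exercise. This is a sketch of that observation, not Nous's official parsing:

```python
# Sketch: split DeepHermes-style output into (reasoning, answer),
# assuming the chain of thought is wrapped in <think>...</think> tags.
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no think block exists."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer
```

Stripping the block this way is also a common workaround when feeding reasoning-model output into tool-calling or JSON parsers that choke on the extra tags.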
Nous Research AI ▷ #research-papers (2 messages):
Project Loong Release, Synthetic Data Generation
- CamelAIOrg Launches Project Loong 🐉: CamelAIOrg introduces Project Loong 🐉, a structured, modular solution for generating and verifying synthetic data.
- The project features a blog post detailing the modular design that integrates synthetic data generation with semantic verification and a multi-agent framework ensuring accuracy and consistency.
- Sampling Presets Impact Metric Distribution: A member expressed curiosity about how the distribution of a beautiful metric shifts with sampling presets.
Link mentioned: Tweet from CAMEL-AI.org (@CamelAIOrg): Introducing Project Loong 🐉Blog: https://camel-ai.org/blogs/project-loong-synthetic-data-at-scale-through-verifiers…• Our structured approach to generating and validating synthetic data for enhanced ...
Nous Research AI ▷ #interesting-links (10 messages🔥):
Nous Research Portal Git Repo, X Link Removal, Contributing to Nous Research, Google's Style Guide
- Git Repo for Nous Research Portal Remains Elusive: A member inquired about the Git repository for the Nous Research portal, but it was clarified that no Git repository is needed as it uses the OpenAI library.
- The portal's environment details include production status for VercelEnv and NodeEnv, with an unknown commit, branch, last commit, and modification status.
- X Link Vanishes Due to Security Concerns: A member asked about the location of a specific X link, only to be informed that it was deleted due to concerns that it could steal user keys.
- No further details were provided regarding the nature of the link or the specific security risks it posed.
- Applications Are the Way to Contribute to Nous Research: In response to an inquiry about contributing to Nous Research, it was suggested that users can contribute by making applications with the models.
- Clarification was given that users should focus on building services using the API, rather than modifying the service itself.
- Google's Style Guide Offers Steroids for Code Generation: A member shared a link to Google's Style Guide, describing it as steroids for code generation.
- The style guide (google/styleguide) includes guidelines for AngularJS, Common Lisp, C++, and C#.
- Google Style Guides: Style guides for Google-originated open-source projects
- Nous Portal: no description found
Nous Research AI ▷ #research-papers (2 messages):
Project Loong, Synthetic Data Generation, Model Performance Enhancement
- Camel AI Launches Project Loong: Camel AI introduced Project Loong 🐉, a modular solution for generating and verifying synthetic data, and requests shares and reposts of the announcement.
- Project Loong employs a structured approach integrating synthetic data generation with semantic verification.
- Project Loong Enhances Model Performance: Project Loong aims to enhance model performance through a multi-agent framework ensuring accuracy and consistency.
- The project focuses on empowering domain-specific models with reliable reasoning signals generated from synthetic data.
Link mentioned: Tweet from CAMEL-AI.org (@CamelAIOrg): Introducing Project Loong 🐉Blog: https://camel-ai.org/blogs/project-loong-synthetic-data-at-scale-through-verifiers…• Our structured approach to generating and validating synthetic data for enhanced ...
Yannick Kilcher ▷ #general (35 messages🔥):
Graph Learning Evolution, AI/ML Job Impact, RLHF Alignment and Nerfed Models, Gemini 2.5 Pro Math Abilities, Dream Journaling App
- Graphs Evolve Beyond 2018, Sparks Graph Learning Renaissance: A member shared a Google Research blogpost about the evolution of graph learning and expressed interest in recent advancements since 2019.
- The blogpost traces graph theory back to Leonhard Euler in 1736 and discusses its applications in modeling relationships and connections.
- AI/ML Transforms Job Market: Low-Level Roles Threatened: A member suggested that recent AI/ML advancements primarily impact low-level jobs, such as minor programming tasks, but emphasized the human capacity to adapt.
- They noted AI/ML reduces dependencies on others, like using AI/ML for initial legal assistance, which saves resources and enables multi-disciplinary tasks.
- RLHF Alignment: Suppressed Behaviors Resurface in AI Models: Discussion revolved around RLHF and the potential for emergent misalignment if models are penalized for useful tasks, like ML R&D or data collection.
- Concerns were raised that if open-source models are nerfed, the first self-improving models might become increasingly evil as they compensate for suppressed behaviors.
- Gemini 2.5 Pro Flunks Math, UI Fails Disgracefully: A member tested Gemini 2.5 Pro (experimental) on math and found it to be totally trash; they added that Google's UI doesn't display math correctly.
- When asked about information theory and geometry, ChatGPT and Grok 3 were better at understanding questions, even poorly written ones, and the user later guided the model to write them correctly.
- Dream Journaling App Aims to Analyze Lucid Dreams: A member announced the creation of Rem, a dream journaling app designed for easy recording, analysis, and sharing of dreams.
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning: Large Language Models (LLMs) have shown remarkable capabilities in reasoning, exemplified by the success of OpenAI-o1 and DeepSeek-R1. However, integrating reasoning with external search processes rem...
- The evolution of graph learning: no description found
- Rem: Record your dreams, uncover hidden patterns, and connect with a community of dreamers in a beautiful, secure space.
Yannick Kilcher ▷ #paper-discussion (5 messages):
RLHF, Reward Hacking, Response Diversity, Reasoning Task Verifiers, Generative Reward Model
- Aligning LLMs via Reinforcement Learning from Human Feedback: A paper on Reinforcement Learning from Human Feedback (RLHF) was shared, noting its importance for aligning large language models with human preferences, available at arxiv.org/abs/2503.22230.
- Overlooking Prompt-Data Construction in RLHF: The paper addresses the overlooked importance of prompt-data construction and explores data-driven bottlenecks in RLHF performance scaling, particularly reward hacking and decreasing response diversity.
- Hybrid Reward System Mitigates Reward Hacking: The paper introduces a hybrid reward system combining Reasoning Task Verifiers (RTV) and a Generative Reward Model (GenRM) to mitigate reward hacking.
- Prompt Selection Method Enhances Learning Effectiveness: A novel prompt-selection method, Pre-PPO, is proposed to maintain response diversity and enhance learning effectiveness.
- Prioritizing Tasks Early Improves Performance: The paper finds that prioritizing mathematical and coding tasks early in RLHF training significantly improves performance, with experiments across two model sizes validating the methods' effectiveness and scalability.
Link mentioned: Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback: Reinforcement Learning from Human Feedback (RLHF) is crucial for aligning large language models with human preferences. While recent research has focused on algorithmic improvements, the importance of...
Yannick Kilcher ▷ #ml-news (20 messages🔥):
AI Dog Chasing Tail, AI Model Feedback, Runway Relevance, OpenAI Model Release Speculation, GPT-3.5 vs Thinking Models
- AI Called Out for Chasing its Tail: A member expressed skepticism about current AI capabilities, describing them as probabilistic text completion with a few hacks and questioning whether they constitute true thinking.
- They expressed disillusionment with the hype surrounding AI, feeling that improvements since GPT-3.5 have been more about fine-tuning than significant breakthroughs, and want end-to-end capability with no hand-holding.
- OpenAI Model Feedback Forum Launches: A member linked to the OpenAI Open Model Feedback forum.
- Another quoted Ilya Sutskever, noting that if there was one great failing it would be that you always had to check the results.
- Runway's Relevance Debated: A member questioned whether anyone still cares about what Runway does.
- Another member linked to a tweet showcasing an AI-generated KFC ad concept made with Runway, Pika, Kling AI, Google DeepMind Veo2, Luma AI, OpenAI Sora, and Topaz Labs.
- OpenAI Release Speculation Surfaces: Members speculated about OpenAI's next model release, guessing it might be a smaller model for mobile, especially since their Apple deal fell through.
- Some joked and pondered if they'd be releasing GPT 2.5, 100M parameters.
- DeepSeek R1 is like a university student: A member stated there's a massive gap between GPT 3.5 and any thinking models, likening GPT 3.5 to a 10-year-old while describing DeepSeek R1 as a university student.
Link mentioned: Tweet from Salma (@Salmaaboukarr): I'm blown away!😱 This KFC concept ad is 100% AI generated!My friend David Blagojevic (he's not on X) created this ad concept for KFC and it's incredible! Tools used: Runway, Pika, Kling...
MCP (Glama) ▷ #general (38 messages🔥):
MCP RBAC Implementation, Docker alternatives, MCP server for webapp, VirusTotal Integration, MCP for make.com or n8n cloud
- MCP Gains Traction with Pichai's Tweet: Following a tweet from Sundar Pichai asking 'To MCP or not to MCP, that's the question', interest in MCP has surged, with his tweet gaining over a million views.
- A Reddit moderator of /r/mcp even suggested doing an AMA if Google is leaning into MCP.
- Crafting RBAC on MCP Server: Users are exploring Role-Based Access Control (RBAC) implementations on MCP servers to segment tool visibility based on user roles.
- One user suggested integrating with WorkOS and another mentioned that Toolhouse API does RBAC based on the API key.
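The role-segmented tool visibility being discussed reduces to a lookup from role to allowed tools. A minimal sketch, with hypothetical role and tool names (a real deployment would back this with WorkOS identities or API-key checks, as mentioned above):

```python
# Sketch of role-based tool visibility for an MCP-style server.
# Role and tool names are hypothetical examples.
ROLE_TOOLS = {
    "admin": {"search_files", "edit_files", "run_terminal"},
    "analyst": {"search_files"},
}

def visible_tools(role: str) -> set[str]:
    """Return the set of tools a given role is allowed to see; unknown roles see nothing."""
    return ROLE_TOOLS.get(role, set())
```

The server would then filter its `tools/list` response through `visible_tools()` for the authenticated caller, so each client only ever learns about tools its role permits.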
- SDK Governance layer is Open Sourced!: A member shared an open source SDK designed for implementing enterprise governance (Identity, RBAC, Credentials, Auditing, Logging, Tracing) within the Model Context Protocol framework at ithena-one/mcp-governance-sdk.
- Feedback from the community is encouraged and very welcome.
- DesktopCommanderMCP crafts code for you: A user recommends DesktopCommanderMCP to create and update files for Claude, providing terminal control, file system search, and file editing capabilities via wonderwhy-er/DesktopCommanderMCP.
- They suggest having the LLM pick the right servers and loading only their context, instead of overwhelming the context with 30 MCPs.
- Nova Act is considered for MCP: A member suggested that it would not be difficult to have Claude spit out the act calls (from Amazon's Nova) and feed them to an MCP server hooked up to whatever is performing the actual browsing (i.e. some Nova endpoint), see this video.
- This approach involves Claude generating `nova.act` commands based on user requests, which are then executed by the MCP server.
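The relay described above can be sketched in a few lines. The `nova.act("...")` command format and the `browse` executor below are both assumptions for illustration, not a documented Nova or MCP API:

```python
import re

# Hypothetical: extract nova.act("...") commands emitted by the LLM and
# relay each instruction to whatever performs the actual browsing.
ACT_RE = re.compile(r'nova\.act\("([^"]+)"\)')

def browse(instruction):
    """Stand-in for a call to a real browsing endpoint."""
    return f"executed: {instruction}"

def relay(llm_output):
    """Find every act command in the model output and execute it in order."""
    return [browse(cmd) for cmd in ACT_RE.findall(llm_output)]

print(relay('nova.act("open example.com") then nova.act("click login")'))
# ['executed: open example.com', 'executed: click login']
```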
- Tweet from Sundar Pichai (@sundarpichai): To MCP or not to MCP, that's the question. Lmk in comments
- Tweet from Frank Fiegel (@punkpeye): @sundarpichai Hey @sundarpichai I am moderator of /r/mcp on RedditIf Google is leaning into MCP, let's do an AMA.
- GitHub - nerding-io/n8n-nodes-mcp: n8n custom node for MCP: n8n custom node for MCP. Contribute to nerding-io/n8n-nodes-mcp development by creating an account on GitHub.
- GitHub - matthewhand/mcp-openapi-proxy: Contribute to matthewhand/mcp-openapi-proxy development by creating an account on GitHub.
- GitHub - wonderwhy-er/DesktopCommanderMCP: This is MCP server for Claude that gives it terminal control, file system search and diff file editing capabilities: This is MCP server for Claude that gives it terminal control, file system search and diff file editing capabilities - wonderwhy-er/DesktopCommanderMCP
- GitHub - ithena-one/mcp-governance-sdk: Enterprise Governance Layer (Identity, RBAC, Credentials, Auditing, Logging, Tracing) for the Model Context Protocol SDK: Enterprise Governance Layer (Identity, RBAC, Credentials, Auditing, Logging, Tracing) for the Model Context Protocol SDK - ithena-one/mcp-governance-sdk
MCP (Glama) ▷ #showcase (13 messages🔥):
ActivePieces drops MCP support, MCP Autotest Tool, MCP Weekly Newsletter, Playwright MCP server with Smithery, MCP synchronous limitations
- *ActivePieces* Drops MCP Support: ActivePieces, an open-source Zapier alternative, has launched MCP support, making 280+ open-source MCPs usable on the platform.
- *Autotest* Utility for MCP Servers Released: mcp-autotest is a tool that defines expected server behavior in YAML files and checks whether the server complies.
- Version 0.2.1 tests using stdio or new streamable http transports.
- MCP Bits Newsletter goes live!: A new MCP Weekly Newsletter called MCP Bits has been published.
- It contains the latest news, articles, video and project updates; subscribe to the newsletter here.
- Playwright MCP Server Now Works via Smithery Hosting: The Playwright MCP server now works via Smithery hosting, enabling Sage to grab web content on iOS.
- MCPC enables two-way asynchronous communication: An extension called MCPC has been created to mitigate MCP's synchronous limitations and add asynchronous support.
- The new extension offers backwards compatibility, so nothing breaks—you just won’t get the extra features unless both the client and server support MCPC.
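That kind of backwards compatibility usually comes down to capability negotiation: the extension activates only when both ends advertise it. A minimal sketch (the `mcpc` flag name is an assumption here, not MCPC's actual wire format):

```python
def mcpc_enabled(client_caps, server_caps):
    """Extension features activate only when BOTH sides advertise support;
    otherwise the connection degrades gracefully to plain MCP."""
    return bool(client_caps.get("mcpc")) and bool(server_caps.get("mcpc"))

# A legacy client talking to an MCPC-aware server falls back to plain MCP:
print(mcpc_enabled({}, {"mcpc": True}))              # False
print(mcpc_enabled({"mcpc": True}, {"mcpc": True}))  # True
```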
- mcp.direct | Custom MCP Servers: Custom MCP servers for optimized LLM attention. Get in touch for tailored solutions.
- MCP Bits #1: Weekly Model Context Protocol News
- GitHub - GeLi2001/shopify-mcp: MCP server for Shopify api, usable on mcp clients such as Anthropic's Claude and Cursor IDE: MCP server for Shopify api, usable on mcp clients such as Anthropic's Claude and Cursor IDE - GeLi2001/shopify-mcp
- GitHub - strowk/mcp-autotest: Utility for autotesting MCP servers: Utility for autotesting MCP servers. Contribute to strowk/mcp-autotest development by creating an account on GitHub.
- GitHub - OlaHulleberg/mcpc: An extension to MCP (Model-Context-Protocol) that enables two-way asynchronous communication between LLMs and tools through the already existing MCP transport - no additional transport layer needed.: An extension to MCP (Model-Context-Protocol) that enables two-way asynchronous communication between LLMs and tools through the already existing MCP transport - no additional transport layer needed...
- GitHub - ithena-one/mcp-governance-sdk: Enterprise Governance Layer (Identity, RBAC, Credentials, Auditing, Logging, Tracing) for the Model Context Protocol SDK: Enterprise Governance Layer (Identity, RBAC, Credentials, Auditing, Logging, Tracing) for the Model Context Protocol SDK - ithena-one/mcp-governance-sdk
- 280+ Open Source MCPs — Use them on Activepieces now: Give AI access to your apps with 280+ open source MCPs. Use them with Claude, Cursor, or Windsurf to let AI read your emails, manage your calendar, and more.
Notebook LM ▷ #announcements (1 messages):
Webby Awards, Voting, NotebookLM nominations
- NotebookLM bags Three Webby Nominations!: NotebookLM has been nominated for THREE Webby Awards and is asking for community votes at this link.
- Voters should confirm their votes by clicking the verification link in their email, and check their spam folder.
- Search yields No Results: The search function displays *Displaying top results* alongside *No results*.
- It prompts users to refine their search criteria to narrow the results.
Link mentioned: Vote for the best of the internet: I just voted in The Webby People's Voice Awards and checked my voter registration.
Notebook LM ▷ #use-cases (9 messages🔥):
Google Tasks integration with NotebookLM, Archiving notebooks in NotebookLM, Sharing sources on different notes in NotebookLM
- Google Tasks could integrate w/ NotebookLM: A user suggested that Google Tasks could integrate with NotebookLM by allowing users to pick a task list via a dropdown/popup.
- They proposed that this could work similarly to how Google Tasks allows selecting a task list for sharing.
- Notebook Archival Feature could reduce Notebook Count: A user requested a way to archive notebooks in NotebookLM to hide them and reduce the number of notebooks counting against their limit.
- They suggested that hidden/archived notebooks should not appear in the list of notebooks available for sharing content.
- Source Sharing Between Notes: An Available Feature?: A user inquired whether it's possible to share sources on different notes within NotebookLM.
- They were unsure if this feature is currently available.
Notebook LM ▷ #general (39 messages🔥):
Timestamped sections on the todo list, NotebookLM to Gemini 2.5 Pro, Conversation ending early, Limit the total number of words, not the number of sources?, Maths notation in NLM is very hard to read
- *Timestamped Todo Lists Triumph*: A user requested adding timestamped sections to the to-do list, similar to Audible, for skipping and re-listening to specific sections.
- This suggestion aims to enhance user experience and accessibility for longer audio content.
- *Gemini 2.5 Pro Prayers*: A user requested that the NotebookLM AI be updated to Gemini 2.5 Pro, citing their love for the updated Gemini version.
- They hope that NotebookLM will perform even better with the new model, but the NotebookLM team has not commented on any ETAs.
- *Conversation Cut-Off Catastrophe*: A user reported that the conversation is ending prematurely and not covering the second resource uploaded and asked if there was a fix.
- The team requests documenting the issue in the dedicated discord channel, including a sample notebook ID if possible.
- *Notes not Sources Needed*: A user with personal notes managed in Obsidian (2000+ short notes) finds the 300-note limit restrictive.
- They propose limiting the total number of words instead of the number of sources to better accommodate mesh note systems; a user suggests that folders or zipped files as a single source would also solve the problem.
- *Math Notation Menace*: A user reported that math notation in NLM is very hard to read in normal chats, asking if there's a fix.
- The team acknowledged the issue and is investigating, but currently, no ETA is available for a change.
Torchtune ▷ #general (11 messages🔥):
Torchtune office hours, Discord timezone handling
- *Torchtune Time* Next Friday: Members announced the next Torchtune office hours next Friday, linking to the Discord event.
- *Discord Timezone Auto-Conversion Big Brain*: Members were converting timezones manually before realizing Discord handles that automatically.
- One member then posted a Big Brain meme.
Link mentioned: Brain Brain Meme GIF - Brain Brain meme Big brain - Discover & Share GIFs: Click to view the GIF
Torchtune ▷ #dev (16 messages🔥):
PR #2441 Review, Regression Testing for PR #2477, Qwen Model Upload, S3 Bucket Hookup Issues, PR #2510
- PR #2441 Needs Final Review ASAP: A member requested a final review for PR #2441 to speed up the merge process.
- Regression Testing on hold due to S3 troubles: Regression testing for PR #2477 is desired, but is blocked while waiting to upload the Qwen model to S3 for download as part of the regression test script.
- However, another member realized that there is more work to hook up their S3 bucket due to internal infra changes and suggested putting the regression testing on hold for a bit.
- Llama2 Lingers in Regression Tests: A member suggested using something a bit more modern than Llama2 for tests, but the current regression test still uses the Llama2 model.
- PR #2510 Removes Recursive Reshard Utility: PR #2510 removes the recursive_reshard utility because it wasn't needed.
- torchtune/tests/cache_artifacts.sh at f1ecdd64cd67fc33a713c073d9664ab111116606 · pytorch/torchtune: PyTorch native post-training library. Contribute to pytorch/torchtune development by creating an account on GitHub.
- REMOVE `recursive_reshard` UTILITY by ebsmothers · Pull Request #2510 · pytorch/torchtune: At first, this PR was supposed to fix #2483. Upon further inspection of this utility, it became clear that it wasn't needed. And if it's not needed, why keep it around?How do you know...
tinygrad (George Hotz) ▷ #learn-tinygrad (15 messages🔥):
ImageDtype and IMAGE env, tinygrad BEAM Performance, Mobile GPUs and ImageDType, arange() optimization
- Delving into ImageDtype and IMAGE env: A member inquired about the purpose of ImageDtype and the IMAGE environment variable in tinygrad, noting its influence on Tensor.conv2d implementation and linking to a VAE training script.
- Another member suggested it's related to running comma.ai models faster on Qualcomm (QCOM) hardware, leveraging mobile GPUs' texture performance and caching capabilities.
- tinygrad BEAM Blazes Past tf-metal: One user reported achieving 3.2 it/s on an M1 Pro without BEAM, 28.36 it/s with BEAM=2, and about 25 it/s using Keras with tf-metal.
- George Hotz responded, "glad to see it's faster than tf-metal with BEAM!"
- Mobile GPUs Get Texture Boost With ImageDType: The discussion indicates ImageDType and related functions might optimize for mobile GPUs' texture performance, citing a potential Microsoft research paper on mobile GPUs.
- One member questioned the necessity of hardcoding layout specifics and suggested HWC (Height, Width, Channel) handling should be part of normal conv2d with user-defined padding.
- arange() Gets Optimized: A member discovered suboptimal code generation for small arange ranges (e.g., `arange(1, 2, 0.1)`) compared to larger ranges (e.g., `arange(1, 10, 0.1)`), then added a chapter on `.arange()` here.
- They also spotted an unnecessary addition in the generated code, suggesting a fix from `((float)((ridx0+1)))*0.1f)+0.9f)` to `(((float)((ridx0)))*0.1f)+1.0f)`.
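The suggested fix is plain constant folding: the two index expressions are algebraically identical for every index, which a quick check confirms (a Python translation of the generated C expressions, not tinygrad code):

```python
# (ridx0 + 1) * 0.1 + 0.9  ==  ridx0 * 0.1 + 1.0  for every index,
# so the +1 inside the cast is an unnecessary addition.
def before(ridx0):
    return (ridx0 + 1) * 0.1 + 0.9

def after(ridx0):
    return ridx0 * 0.1 + 1.0

assert all(abs(before(i) - after(i)) < 1e-9 for i in range(10))
print([round(after(i), 2) for i in range(3)])  # [1.0, 1.1, 1.2]
```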
- 4 - The .arange() insanity – TinyGrad Notes: My notes on TinyGrad internals
- tinygrad-generative-ai-tutorial/vae/vae_train.py at main · currybab/tinygrad-generative-ai-tutorial: tinygrad generative ai tutorial. Contribute to currybab/tinygrad-generative-ai-tutorial development by creating an account on GitHub.
LlamaIndex ▷ #blog (1 messages):
LLM Agents for Technical Documentation, Structured Extraction from Complex Documents
- LLM Agents Tackle Technical Documentation: An underrated use case for LLM agents is every field that depends heavily on complex technical documentation like manufacturing, construction, and energy.
- It was suggested that you can build an agent that can do structured extraction from these documents.
- Complex Docs Decoded with LlamaIndex: A tweet shows a thread mentioning these docs are often full of screenshots.
- The tweet in question can be found here.
LlamaIndex ▷ #general (6 messages):
ReAct Agents, Local Models via Ollama, OpenAI Rate Limit Errors, Embedding Models, Query Engines
- ReAct Agent runs into OpenAI Rate Limits: A user encountered an OpenAI RateLimitError (Error 429) when using a ReAct agent with a local model set up via Ollama, questioning if ReAct agents are exclusively for OpenAI LLMs.
- They provided a link to their GitHub repository showing their agent setup.
- Troubleshooting the OpenAI Error: A member suggested that the embedding model might be the cause of the OpenAI error, as it could be defaulting to OpenAI's embedding model if not explicitly set.
- The user confirmed that they are using a Hugging Face embedding model, set during document creation.
- LLM and Embed Model Parameters: A member advised to pass in both the `llm` and `embed_model` when creating the VectorStoreIndex.
- Also, make sure to specify `llm` when calling `index.as_query_engine()`.
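The failure mode behind the 429 — a library silently falling back to an OpenAI default when no embedding model is passed explicitly — can be sketched with a hypothetical resolver (illustrative logic only, not LlamaIndex's actual code):

```python
def resolve_embed_model(explicit, api_key):
    """Mimic a default-to-OpenAI fallback: an explicitly passed model always
    wins; the implicit fallback needs an API key, which is the trap users
    hit when they set up a local LLM but forget the embed_model."""
    if explicit is not None:
        return explicit
    if api_key is None:
        raise RuntimeError("falling back to OpenAI embeddings but no API key set")
    return "openai-default"

print(resolve_embed_model("BAAI/bge-small-en", api_key=None))  # BAAI/bge-small-en
```

The practical takeaway matches the advice above: pass the embedding model and the LLM explicitly at every call site that accepts them, rather than relying on library defaults.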
Link mentioned: Agentic-Chat-RAG/agent_utils.py at jake-dev · JakeFurtaw/Agentic-Chat-RAG: Uses a Gradio interface to stream coding related responses from local models. Can be used in Chat Mode or Agent Mode. - JakeFurtaw/Agentic-Chat-RAG
Nomic.ai (GPT4All) ▷ #general (7 messages):
Official Translations, Llama3 8B instruct model, .bin vs .gguf
- GPT4All Goes Global with Official Translations: Official translations are now available for Simplified Chinese, Traditional Chinese, Italian, Portuguese, Romanian, and Spanish for the GPT4All documentation.
- Llama3 8B Instruct Model for Blog Posts & Web Pages?: A user asked if the Llama3 8B Instruct model would be the best model to use for making blog posts and web pages off of a bunch of courses they have recorded (video and text).
- Another user suggested the original user ask a friend to help rephrase the question in English so that they could better understand and answer the question with confidence.
- Confusion between .bin and .gguf file formats: A user asked about the difference between a .bin and a .gguf file format, apparently noting they could not interchange them.
- The same user quickly retracted this, indicating they were just mistaken.
Link mentioned: Home: GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use. - nomic-ai/gpt4all
LLM Agents (Berkeley MOOC) ▷ #mooc-questions (4 messages):
Quizzes, Completion based
- Quizzes being completion based: A member asked what score they needed to achieve on quizzes.
- Another member replied that they are completion based.
- Quizzes matter if they are attempted: A member asked if the score doesn't matter as long as quizzes are attempted.
- Another member replied yep! and added that they hope users try their best for their own learning.
LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (1 messages):
LLM Agents Cookbook, Llama 3
- LLM Agents Cookbook linked to Llama3: A member inquired whether the "LLM agents cookbook" mentioned in week 5 of Coding Agents refers to Llama 3's cookbook.
- A link to the cookbook was provided for reference.
- Meta released Llama 3: Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes.
- The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks.
Link mentioned: Llama3 Cookbook - LlamaIndex: no description found
LLM Agents (Berkeley MOOC) ▷ #mooc-readings-discussion (1 messages):
DeepSeek-R1, Reinforcement Learning, Chains-of-Thought, Project Loong
- Verifiable Rewards Boost Reasoning Models: Recent Large Reasoning Models like DeepSeek-R1 show greatly improved general reasoning capabilities when base models undergo post-training with Reinforcement Learning (RL) with a verifiable reward (as discussed in Project Loong).
- The ability to easily verify accuracy is crucial for improving domain-specific capabilities, particularly in mathematics and programming.
- High-Quality Datasets Enhance CoT Learning: An abundance of high-quality datasets, featuring questions paired with verified correct answers, is a critical prerequisite for models to learn to construct coherent Chains-of-Thought (CoTs).
- These datasets provide the necessary signals for models to reliably arrive at correct answers.
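The "verifiable reward" idea is simple to state in code: a checker compares the model's final answer against a known-correct one and emits a binary RL reward. This is a generic sketch of the concept, not DeepSeek-R1's or Project Loong's actual verifier:

```python
def verifiable_reward(model_answer, gold_answer):
    """Binary RL reward: 1.0 iff the normalized answers match exactly.
    Exact-match verification is what makes math/code domains tractable."""
    norm = lambda s: s.strip().lower()
    return 1.0 if norm(model_answer) == norm(gold_answer) else 0.0

print(verifiable_reward(" 42 ", "42"))  # 1.0
print(verifiable_reward("41", "42"))    # 0.0
```

In practice the verifier is richer (symbolic math equivalence, unit tests for code), but the training signal stays this shape: a reward the trainer can compute without human judgment.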
Link mentioned: 🐉 Loong: Synthesize Long CoTs at Scale through Verifiers: Project Loong is a collaborative effort led by CAMEL-AI to explore Long CoTs data generation through verifiers at scale.
Cohere ▷ #「💬」general (3 messages):
Command A issues, Rem dream journaling app
- Command A screams eternally: A user testing Command A found that the model gets stuck generating the same character endlessly when encountering a context where a character is screaming with repeated letters.
- This issue occurs even with default API Playground settings, freezing the interface and preventing feedback; reproduction is reliable with prompts like "Please generate a scream in fiction inside quotation marks".
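Until a model-side fix lands, a streaming client can guard against this failure mode by aborting once the same character repeats too many times in a row. This is a generic client-side sketch, not a Cohere API feature:

```python
def runaway(text, limit=50):
    """True if any single character repeats `limit` or more times in a row,
    a cheap heuristic for detecting stuck 'endless scream' generations."""
    run, prev = 0, ""
    for ch in text:
        run = run + 1 if ch == prev else 1
        prev = ch
        if run >= limit:
            return True
    return False

print(runaway("AAAH" + "H" * 100))   # True  -> stop streaming
print(runaway("a normal sentence"))  # False -> keep going
```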
- Rem app wants you to journal dreams: A user shared Rem, a dream journaling app created with a friend to easily record, analyze, and share dreams.
- The app aims to provide a platform for users to log their dreams and gain insights into their subconscious.
- Rem: Record your dreams, uncover hidden patterns, and connect with a community of dreamers in a beautiful, secure space.
- imgur.com: Discover the magic of the internet at Imgur, a community powered entertainment destination. Lift your spirits with funny jokes, trending memes, entertaining gifs, inspiring stories, viral videos, and ...
Cohere ▷ #「🤝」introductions (2 messages):
Introductions, Community growth, User interests, Networking
- Community Welcomes New Members: The community welcomes new members to the Cohere Discord server, encouraging them to introduce themselves.
- New members are prompted to share their company, what they're working on, favorite tech tools, and what they hope to gain from this community.
- New members share interests: New members are eager to participate, learn, and get feedback on their projects
- They are excited to engage in discussions about their favorite technologies and tools within the community.
MLOps @Chipro ▷ #events (1 messages):
AI in Legislation, Legalese Decoder, SVCAF's AI4Legislation competition
- Decoding Legalese with AI: Seminar Alert!: The Silicon Valley Chinese Association Foundation (SVCAF) is hosting a seminar on April 2, 2025, at 6:30pm Pacific Time to discuss AI applications in legislation, featuring the Founder of Legalese Decoder.
- The seminar will delve into how AI, ML, and NLP are used to simplify complex legal documents, making them understandable to everyone.
- SVCAF Launches AI4Legislation Competition: SVCAF is holding a competition this summer to develop open-source AI-driven solutions for citizen engagement in the legislative process, with details available in the official Github repo.
- The competition aims to harness AI's power to make legislative processes more equitable and effective, aligning with SVCAF's mission to educate the Chinese community in public affairs.
- AI4Legislation Seminar Series to start: The AI4Legislation seminar series will recur during the first week of each month, aiming to provide project guidance and current information about legislative AI tools, more information can be found here.
- Each seminar features a different guest sharing insights on utilizing AI to address key challenges in lawmaking, exploring the potential of AI-driven governance.
- AI4Legislation Seminars RSVP: Thank you for your interest in SVCAF's AI4Legislation seminars! Please check out the official competition Github repo here and join our Discord server!Silicon Valley Chinese Association Foundation...
- Seminar Series: AI4Legislation - featuring Legalese Decoder - SVCA Foundation: SVCAF is hosting the first seminar of the AI4Legislation competition on the applications of artificial intelligence in legislation. Join us to interview both the founder of Legalese Decoder and our fo...
MLOps @Chipro ▷ #general-ml (1 messages):
smartinez.ai: I think you can ask Joe
AI21 Labs (Jamba) ▷ #general-chat (2 messages):
Language Use
- Member Uses French and English Regularly: A member mentioned they missed the poll and regularly use French and English.
- They also use Greek and Hebrew at times.
- No Topics: No topics were discussed.
Codeium (Windsurf) ▷ #announcements (1 messages):
Windsurf Sounds, Auditory UX, Windsurf Next Beta
- Windsurf Sounds Launch: Windsurf AI introduced Windsurf Sounds, marking their initial foray into sound design and Auditory UX, aiming to enhance flow state and productivity.
- The full video announcement is available on X.com.
- Windsurf Next Beta Program Available: The Windsurf Next Beta program is now available for early adopters to test new features.
- Downloads are available at Codeium.com with minimum requirements including OS X Yosemite, glibc >= 2.28 for Linux, and Windows 10 (64-bit).
- Tweet from Windsurf (@windsurf_ai): Introducing Sounds, our latest development. This is our first step into the realm of sound design and auditory UX, unlocking higher levels of flow state and productivity by playing the perfect sounds ...
- Thank you for downloading Windsurf Next: Windsurf Next is our experimental beta, giving early adopters a unique opportunity to test new features before they make it to the stable release.
Gorilla LLM (Berkeley Function Calling) ▷ #discussion (1 messages):
io_uring.h, v0 openfunctions dataset, v1 dataset
- io_uring.h's v0 Dataset: Vanished or Merged?: A member inquired about the fate of the v0 openfunctions dataset within `io_uring.h` and whether it was completely merged into the v1 dataset.
- v0 vs v1 Datasets in io_uring.h: A Deep Dive?: The discussion seeks to understand the architectural changes and data migration strategies, if any, between the v0 and v1 versions of the `openfunctions` dataset in `io_uring.h`.