[AINews] Halfmoon is Reve Image: a new SOTA Image Model from ex-Adobe/Stability trio
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
Composite AI is all you need?
AI News for 3/21/2025-3/24/2025. We checked 7 subreddits, 433 Twitters and 29 Discords (227 channels, and 10464 messages) for you. Estimated reading time saved (at 200wpm): 1129 minutes. You can now tag @smol_ai for AINews discussions!
A couple of nice updates from Qwen and Deepseek today, but we give title spot to a lesser known but ambitious new entrant.
Reve, pronounced [ʀɛv], from “rêve”, has emerged from Artificial Analysis' leaderboard as the top rated imagegen model, displacing former SOTA Recraft. "The model stands out for its impressive text rendering, prompt adherence, and aesthetics." We found it remarkably easy to play with.


And it beats Ideogram for typography:

It's interesting that it comes from Christian Cantrell, former VP Product at Stability, Taesung Park, and Michaël Gharbi. All are Adobe alums, and Michaël's announcement gives the most insight into how they do it:
Reve’s mission is to invent the future of intent-driven visual creation. Capturing creative intent requires advanced machine understanding of natural language and other interactions. Turning this intent into compelling visuals calls for interactive systems that have a deep understanding of the visual world they generate, so they can iteratively amend it.
Today's text-to-image models are essentially that—random slice-of-the-world generators. There's no intelligence. This is both a data and representation problem. We need to leverage the equivalent of full documents for images, but we don't have a good representation for it. Our mission at Reve is to enhance visual generative models with logic. As the first step, we focus on understanding user intent with advanced language capabilities, resulting in superior complex prompt understanding and text writing.
There's no suggestion that it's a single model, but rather some composite of models. Probably this is what Christian wanted to build at Stability, but couldn't.
The Table of Contents and Channel Summaries have been moved to the web version of this email!
AI Twitter Recap
Here's a summary of the AI-related discussions from the provided tweets, categorized for a technical audience:
Model Releases and Updates, Including Performance
- DeepSeek V3-0324 Release and Performance: @_akhaliq announced the DeepSeek-V3-0324 release on Hugging Face, @Teknium1 also noted its release, and @reach_vb highlighted it as a post-training update with potential for improved downstream performance. Several users discussed its performance and characteristics, including @teortaxesTex, who found it comparable to Sonnet 3.6 and noted it surpasses DeepSeek-R1 and Claude-3.7 in some evaluations.
- Qwen 2.5-VL-32B-Instruct Release: @_akhaliq announced the release of Alibaba's Qwen2.5-VL-32B-Instruct on Hugging Face, and @reach_vb shared performance benchmarks indicating it beats Qwen 2.5 72B and GPT 4o Mini on vision tasks, with enhanced mathematical reasoning and human preference alignment.
- DeepSeek Model Serving: @_akhaliq noted that DeepSeek's new model is served on Hugging Face via Hyperbolic Labs, and @ClementDelangue mentioned it's available via FireworksAI and Hyperbolic Labs. @Yuchenj_UW stated that Hyperbolic Labs now serves DeepSeek-V3-0324.
- DeepSeek V3-0324 on MLX: @reach_vb reported that the latest DeepSeek V3-0324 runs at >20 toks/sec on a 512GB M3 Ultra with mlx-lm, and @awnihannun confirmed the same.
- NVIDIA Mamba Image Backbones: @mervenoyann announced NVIDIA's release of new Mamba image backbones on Hugging Face, available in various sizes and resolutions.
Frameworks and Tools
- LangChain and LangGraph Use Cases: Multiple tweets highlighted use cases of LangChain and LangGraph, including Vodafone's AI assistants for data operations @hwchase17, Klarna's AI assistant for customer support @LangChainAI, and a medical supply chain AI system @LangChainAI. @hwchase17 also mentioned context management in langgraph.
- Weave-Agent Planner Discussion: @jd_pressman discussed the design and planning of Weave-Agent, considering approaches like ReActTree and MuZero for agentic planning.
- Smolagents Growth: @AymericRoucher announced that smolagents has reached 15k GitHub stars and is integrating sandboxed code execution via E2B or Docker.
- Together Chat: @togethercompute introduced Together Chat, featuring OSS models like DeepSeek R1 for web search, coding, image generation, and image analysis, and @togethercompute listed the tech stack.
Agent Engineering and Applications
- Agent Engineering Talk and Essay: @swyx shared a talk and essay on Agent Engineering, defining agents, outlining six elements, and discussing their potential impact.
- Linear and Codegen Integration: @mathemagic1an announced Codegen's integration with Linear, enabling agents to solve tickets and close duplicates, and highlighted Linear's expanded capabilities for bots @mathemagic1an.
- Evaluation Metric for Agents: @_philschmid advocated for using pass^k instead of pass@k for evaluating agents, arguing it provides a more accurate performance metric aligned with user experience.
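For intuition, here is a small, self-contained sketch of the two metrics (the tweet does not specify implementation details, so the estimators below are standard choices, not necessarily the author's): pass@k rewards an agent that succeeds at least once in k attempts, while pass^k requires all k attempts to succeed, which better tracks the experience of a user who needs reliability.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k attempts succeeds, estimated
    without replacement from n trials with c successes (the Codex-paper estimator)."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def pass_hat_k(n: int, c: int, k: int) -> float:
    """Probability that ALL k attempts succeed, assuming i.i.d. attempts
    with empirical success rate c/n."""
    return (c / n) ** k

# An agent that solves a task on 8 of 10 trials looks great under pass@k
# but much weaker under pass^k as k grows.
n, c = 10, 8
for k in (1, 2, 4, 8):
    print(f"k={k}: pass@k={pass_at_k(n, c, k):.3f}  pass^k={pass_hat_k(n, c, k):.3f}")
```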
Economic and Strategic Implications
- AI Automation and Economic Growth Model: @EpochAIResearch discussed GATE, a model for AI automation's economic impacts, predicting trillions in AI investments, extreme compute scaling, and significant economic growth.
- US-Japan Defense Innovation Award: @SakanaAILabs announced that Sakana AI won an award at the US-Japan Competition for Defense Innovation for novel AI solutions.
- Perspectives on China and AGI: @teortaxesTex shared multiple opinions on China's technological and strategic advantages, including its state capacity, industrial base, and AGI efforts. @teortaxesTex also touched on DeepSeek's "commoditize your complement" theory.
ARC-AGI Benchmark
- ARC-AGI-2 Release and Competition: @fchollet announced the release of ARC-AGI-2, a benchmark designed to measure general fluid intelligence, and the ARC Prize 2025 competition with a $700,000 grand prize @fchollet. He noted that current top AI approaches score very low, requiring test-time adaptation, and discussed the evaluation methodology @fchollet.
Humor and Memes
- Coding by Vibes: @gneubig shared a tweet about prompting to improve vibe coding, distinguishing between coding by vibes for personal projects versus agent behavior.
AI Reddit Recap
/r/LocalLlama Recap
Theme 1. DeepSeek V3-0324: Performance and Expectations vs R1
- Deepseek releases new V3 checkpoint (V3-0324) (Score: 638, Comments: 125): DeepSeek released its new V3 checkpoint (V3-0324), which likely includes updates and improvements over previous versions. Further details on specific features or enhancements are not provided in the post.
- Discussion on the DeepSeek-V3 checkpoint (V3-0324) includes speculation about its use as a base for a future R2 release, with some users anticipating it to arrive in April. There is a debate on whether V4 is necessary for R2, with arguments suggesting that improvements can be achieved through better scaling and reasoning techniques without a new base model.
- Users are seeking benchmark results to compare the new model's performance, with some noting that no official benchmarks have been released yet. Independent tests are expected soon due to the open-source release of the weights, and there is a call for DeepSeek to release their own benchmarks similar to Mistral.
- There are observations about the model's coding skills improvement and its deployment on both API and web platforms, with some users noting a more censored version compared to the original. The MTP module is highlighted for its role in enhancing decoding speed, achieving 1.8 times TPS, as detailed in a research paper.
- New deepseek v3 vs R1 (first is v3) (Score: 282, Comments: 56): The image compares two versions of DeepSeek user interfaces: V3 and R1. V3 showcases a more dynamic design with animated weather cards for "Windy," "Rainy," "Sunny," and "Snowy," while R1 offers a simpler interface with toggle buttons for "Wind," "Rain," "Sun," and "Snow," each represented by a single icon.
- DeepSeek V3 and R1 interfaces are being compared, with V3 offering animated weather cards and R1 featuring simpler toggle buttons. Users are curious about which model corresponds to each interface and the prompts used for the comparison.
- There is a preference for open-source models over proprietary ones due to cost and flexibility, despite DeepSeek models not being the cheapest. Sonnet is noted to be significantly more expensive than V3, especially during off-peak hours.
- The discussion includes references to command-a running locally, with links provided for further exploration, such as the Hugging Face model and a GIF showcasing the interface. Users express interest in more dynamic content, like videos, to better understand the animated features.
- DeepSeek V3-0324 has caught up to Sonnet 3.7 in my code creativity benchmark - "Write a raytracer that renders an interesting scene with many colourful lightsources in python." (Score: 215, Comments: 43): DeepSeek V3-0324 has matched Sonnet 3.7 in a code creativity benchmark involving a raytracer task in Python, demonstrating significant improvement over its previous version. The benchmark revealed that while most LLMs generated simple RGB scenes, Sonnet 3.7 and now DeepSeek V3-0324 produced more complex and aesthetically pleasing scenes, though the method for this creativity boost remains speculative. More details and data are available in the GitHub repository.
- DeepSeek V3-0324 is noted for its "psychotic taste," resembling reasoning models like R1 or QwQ more than its predecessor, and has faced criticism for its creative writing outputs, which some users find incoherent despite high benchmark scores. Gemma 3 is highlighted for its coherence and creativity in fiction, contrasting with R1's often criticized outputs.
- R1 failed in the benchmark by not producing a functioning program, despite attempts, which raises questions about its effectiveness compared to older versions of DeepSeek V3. The discussion suggests that R1's long chains of thought (CoT) do not guarantee successful outputs, unlike previous versions of DeepSeek.
- The increase in program size for DeepSeek V3-0324 and Sonnet 3.7 is noted, with speculation about whether this is due to training for longer generation lengths or other optimizations. Generating 10kB of code in a single attempt is considered significant, indicating potential advancements in model capabilities.
Theme 2. Meta's ParetoQ Explored: Promise of 2-bit Models
- Meta released a paper last month that seems to have gone under the radar. ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization. This is a better solution than BitNet and means if Meta wanted (for 10% extra compute) they could give us extremely performant 2-bit models. (Score: 505, Comments: 49): Meta's ParetoQ paper introduces scaling laws for extremely low-bit LLM quantization, proposing a more effective solution than BitNet. This allows the possibility of delivering highly efficient 2-bit models with only a 10% increase in compute requirements.
- Quantization and Performance: Discussions emphasize the potential of 2-bit quantization for lightweight models, with some users noting that this could be transformative for applications like creative writing assistants and chatbots. However, concerns about potential slowdowns and the impact of quantization on model intelligence and instruction following are raised, with hopes for improvements using vulkan/T-MAC kernels. (A generic 2-bit quantization sketch follows this list.)
- Research and Comparisons: Users discuss the ParetoQ framework as a more rigorous method for comparing quantization settings, highlighting a learning transition between 2 and 3 bits. The paper is noted for its ability to optimize training for 2-3 bit models, with comparisons to AQLM and references to human synapses having 4-5 bpw.
- Resources and References: The discussion includes references to resources like the Intel auto-round project and DeepSeek-R1-int2-mixed-sym-inc, which achieve comparable performance with 97.9% accuracy retention. A link to the paper is provided: arxiv.org.
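For intuition on what "extremely low-bit" means, here is a generic illustration of symmetric 2-bit weight quantization in PyTorch. This is not ParetoQ's actual quantizer (ParetoQ is a quantization-aware training framework); it only shows the four-level rounding that makes naive 2-bit post-training quantization so lossy, which is the gap the paper's training recipe aims to close.

```python
import torch

def quantize_2bit(w: torch.Tensor):
    """Generic symmetric 2-bit quantization to levels {-2, -1, 0, 1} per output channel.
    Illustration only, not ParetoQ's training-aware scheme."""
    scale = w.abs().amax(dim=1, keepdim=True) / 2.0     # per-channel scale from max magnitude
    scale = scale.clamp(min=1e-8)
    q = torch.clamp(torch.round(w / scale), -2, 1).to(torch.int8)  # 2-bit signed range
    return q, scale

def dequantize_2bit(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(4, 8)
q, s = quantize_2bit(w)
w_hat = dequantize_2bit(q, s)
print("mean abs reconstruction error:", (w - w_hat).abs().mean().item())
# The error above is why simple rounding at 2 bits destroys quality; ParetoQ's
# contribution is showing how quantization-aware training (at ~10% extra compute)
# recovers most of it, with scaling laws across bit-widths.
```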
Theme 3. Expanding LLM Functionalities: From Text to Multimodal
- I made a diagram and explanation of how transformers work (Score: 272, Comments: 20): LLM functionalities are expanding beyond text, and a user has created a diagram and explanation to illustrate how transformers function. This effort aims to provide a clearer understanding of the internal mechanisms of transformers for those interested in AI and machine learning.
- Input and Output Embeddings: There is a discussion on whether input and output embeddings are still linked in modern transformer architectures, with users noting the difficulty in obtaining a comprehensive and current overview of these architectures.
- Resources and Diagrams: Several users shared resources to aid in understanding transformers, including a detailed explanation by Cromulent123 and a link to a GitHub page with relevant diagrams (GitHub Llama Nuts and Bolts). Another user highlighted a conceptual guide on transformers available on Ben Levinstein's Substack.
- Detailed Explanation on Transformer Functionality: Cromulent123 provides an in-depth explanation of how transformers work, focusing on the process of token embedding, the role of Query, Key, and Value Matrices, and the concept of attention scores in determining relevance. They also discuss the importance of contextual enrichment through multiple transformer blocks, emphasizing the nuanced understanding of token relationships.
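The explanation above maps onto only a few lines of code. Below is a minimal single-head scaled dot-product attention sketch in NumPy, written for this recap as an illustration (a real transformer block adds masking, multiple heads, residual connections, and normalization): tokens are projected into Query, Key, and Value spaces, attention scores measure relevance between tokens, and each output row is a contextually enriched embedding.

```python
import numpy as np

def scaled_dot_product_attention(X, Wq, Wk, Wv):
    """Single-head attention over token embeddings X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project tokens into query/key/value spaces
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # relevance of every token to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                           # each row: a context-enriched token embedding

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 16, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out = scaled_dot_product_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8): one enriched vector per input token
```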
- I don't understand what an LLM exactly is anymore (Score: 233, Comments: 89): The author is confused about the expanding definition of Large Language Models (LLMs), originally understood as systems predicting the next word based on pretrained weights from text data. They question how LLMs now encompass capabilities like audio and image generation, and cite SpatialLM, which processes 3D point cloud data, as an example of this broadening scope, seeking clarification on the connection to language models.
- Diffusion Models and LLMs: There is a debate on whether models like Stable Diffusion qualify as LLMs since they incorporate T5 for understanding text prompts, though they primarily generate images. Co0k1eGal3xy argues that such models are close to LLMs because of their advanced language understanding, despite not traditionally fitting the LLM category.
- Tokenization and Multimodal Models: suprjami explains that all data, including text, images, and audio, is tokenized into numbers for LLMs to process, which allows them to learn relationships between different media types. Chair-Short details how self-attention mechanisms and positional encoding enable LLMs to handle different data modalities, suggesting a shift from purely text-focused models to multimodal capabilities.
- Defining LLMs: Discussions highlight the blurred lines in defining LLMs, with some viewing them as large models capable of processing and generating language, regardless of the input type. SnackerSnick mentions that LLMs use tokenization and embeddings to predict subsequent tokens, while Otherwise_Marzipan11 and Co0k1eGal3xy suggest that branding and interaction with language, whether text, audio, or images, contribute to the LLM label.
- Possible Llama 4 prototypes on Chatbot Arena (Score: 105, Comments: 21): MetaAI is testing several anonymous Llama/Meta models on Chatbot Arena, potentially as prototypes for Llama 4. Models like aurora, ertiga, pinnacle, solaris, and spectra are image-enabled, while rhea is identified as Llama 3.
- Discussions reveal skepticism about model identities on Chatbot Arena, as some models, like anonymous-chatbot, claim to be from OpenAI, while others like rage and phantom are suspected to be Meta models. Users note that these models often provide inconsistent company affiliations, potentially due to a guard model or hallucinations.
- The anonymous-chatbot and nebula models are highlighted for their performance, with nebula being particularly praised for excelling in tests, while models like rage and rhea received mixed feedback, with rhea noted for its friendly demeanor and emoji use.
- There is a debate about whether any models are actually Llama 4, with users noting that none explicitly identify as such. Some comments suggest that Meta might be testing diverse writing styles or using randomized system prompts to obscure the true origin of the models.
Theme 4. TeapotLLM's Impact: Lightweight Q&A Models
- Announcing TeapotLLM- an open-source ~800M model for hallucination-resistant Q&A and document extraction, running entirely on CPU. (Score: 163, Comments: 50): TeapotLLM is an open-source model designed for hallucination-resistant Q&A and document extraction, featuring an approximate 800 million parameter architecture. It is optimized to run entirely on CPU, making it accessible for broader usage without the need for specialized hardware.
- TeapotLLM's Hallucination Resistance: Discussion highlights the model's focus on hallucination resistance and its performance against models like Qwen and Llama, with some skepticism expressed about claims of reduced hallucination. Users are curious about its placement on hallucination leaderboards, and a demo is available for testing.
- Model's Language and Output Capabilities: The model is trained primarily in English, but theoretically supports all languages covered by flan-t5. It can extract structured data into JSON using a library that parses fields into typed JSON, as detailed in the documentation (a generic sketch of this pattern follows below), though there is interest in expanding language support and testing on platforms like ollama.
- Performance and Resource Usage: TeapotLLM is optimized for CPU usage, fitting within approximately 2GB of RAM on Google Colab, making it accessible for users with limited compute resources. There is interest in exploring fine-tuning on more modern models like Qwen 0.5B to potentially enhance performance, while maintaining the current model's strengths in document extraction and concise responses.
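The "typed JSON" extraction pattern described above can be illustrated generically with pydantic v2. This is not TeapotLLM's actual API; the class name and fields below are made up for illustration, and the snippet only shows the common pattern of validating an LLM's JSON output against a typed schema.

```python
from pydantic import BaseModel, ValidationError

class Invoice(BaseModel):
    vendor: str
    total_usd: float
    paid: bool

# Pretend this string came back from a small extraction model run over a document.
llm_output = '{"vendor": "Acme Corp", "total_usd": 1249.5, "paid": false}'

try:
    invoice = Invoice.model_validate_json(llm_output)  # parses and type-checks each field
    print(invoice.total_usd)                           # 1249.5 as a real float, not raw text
except ValidationError as err:
    # A hallucination-resistant pipeline can reject or retry on malformed output.
    print("extraction failed:", err)
```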
Other AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding
Theme 1. New Improved Memory Alpha in ChatGPT Enhances Interaction
- New improved memory alpha is insane (Score: 414, Comments: 241): The post discusses the new improved memory alpha feature in ChatGPT, comparing its impact to the leap from GPT-2 to GPT-4. The author expresses skepticism about DeepSeek's ability to compete unless they adopt similar advancements, expressing confidence in OpenAI's continued leadership.
- Many users express frustration and confusion over the availability and inconsistency of the new memory alpha feature in ChatGPT, with some losing access unexpectedly despite having pro subscriptions. CyberNoche and jalpseon highlight deactivation issues, while alpha_rover and DamionPrime share positive experiences with memory persistence.
- The discussion touches on the pricing of ChatGPT subscriptions, with Initial-Kangaroo-534 questioning the value of paying $200 per month. This is contrasted by alpha_rover, who finds the feature invaluable for project continuity and would miss it compared to other AI tools.
- Some commenters like 3xNEI and SillyTwo3470 speculate on the broader implications of memory features, suggesting it could lead to human-AI hybridization. They emphasize the potential for increased personalization and the blurring of lines between tool and partner, indicating a significant shift in how users might interact with AI.
Theme 2. Anthropic's Revenue Surge Matches OpenAI's 2023 Numbers
- Anthropic is making about $115M a month now; same as OpenAI in Nov 2023 (Score: 272, Comments: 50): Anthropic is reportedly generating $115M per month, matching OpenAI's revenue in November 2023. Revenue projections for 2025 estimate $2B as likely and $4B as optimistic, with Manus contributing approximately $2 per task to their revenue. An image depicts a 40% increase in annualized revenue from December 2024 to March 2025, with figures from the Bay Area Times.
- Claude's Impact and Usage: Users highlight Claude Code as a game-changing tool, with some spending $50 per day on it due to its effectiveness in automating coding tasks. Alternatives like AIDER and Cursor's Agent are mentioned but are deemed less effective compared to Claude, which is described as being akin to having a competent intern.
- Revenue Sources and Context: A significant portion of Anthropic's revenue is attributed to integration with AWS Bedrock, with expectations of continued growth due to widespread enterprise adoption. The discussion clarifies that the reported figures represent revenue, not profit.
- Model Comparisons and Preferences: Users compare various AI models, noting that Claude offers superior performance despite smaller context windows in some cases. The OG 600b model and Sonnet 3.7 are mentioned, with the latter praised for its smart capabilities and iterative problem-solving.
Theme 3. AI-Driven Bug Fixing Automation: A 27-Day Experiment
- I made AI fix my bugs in production for 27 days straight - lessons learned (Score: 191, Comments: 80): Over 27 days, the author used Claude 3.7 to automatically fix 21 unique production bugs, resulting in 12 successful one-shot fixes, 6 partial successes, and 3 failures due to incorrect assumptions or complex issues. Despite the initial time investment exceeding manual bug fixing, the system reduced cognitive load and context switching, though it may not suit niche or complex problem domains.
- Interest in Open Sourcing: There is significant interest in the project being open-sourced, with Relevant-Pitch-8450 expressing intent to share it after some cleanup. Users appreciate the UI design and see potential utility in the tool.
- Potential Commercialization: Commenters like ClassyBukake suggest that the tool could be monetized as a service, highlighting its appeal from both personal and business perspectives.
- Cost and Time Efficiency: HelpRespawnedAsDee raises questions about the tool's cost and time efficiency over an extended period, suggesting continued use to evaluate long-term benefits.
Theme 4. Advanced Claude Workflow Integration: MCP External Tools
- My Claude Workflow Guide: Advanced Setup with MCP External Tools (Score: 124, Comments: 20): The post provides a detailed guide for setting up Claude's desktop application with external tools like Brave Search and Tavily to enhance its capabilities, requiring a Claude Pro subscription ($20/month) and specific software installations like Node.js and Python. It includes configuration examples for both Windows and macOS, instructions for accessing developer settings, and troubleshooting tips for installation and setup issues. The guide emphasizes the benefits of enhanced web search, filesystem access, and sequential thinking, and provides additional resources and security considerations for effective use. (A minimal config sketch follows this item's comments.)
- Claude's desktop application setup is praised for its accessibility to non-developers, providing a bridge for regular desktop users to enhance Claude's capabilities without coding skills. The guide is compared to Claude Code, which offers more flexibility for tech-savvy users comfortable with command line interfaces.
- A tutorial for Claude Code is recommended for those interested in exploring its capabilities, available on YouTube. This highlights the distinction between the two approaches: one prioritizing ease of use and the other, advanced customization.
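For readers who just want the shape of the setup: Claude Desktop loads MCP servers from a claude_desktop_config.json file. The sketch below writes one such config from Python; the config path, server package name, and environment variable are assumptions based on common MCP examples and may differ from the guide's exact values.

```python
import json
from pathlib import Path

# Typical config locations (verify against the guide for your OS):
#   macOS:   ~/Library/Application Support/Claude/claude_desktop_config.json
#   Windows: %APPDATA%\Claude\claude_desktop_config.json
config_path = Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"

config = {
    "mcpServers": {
        "brave-search": {
            "command": "npx",
            # Package name assumed from the reference MCP server examples; confirm before use.
            "args": ["-y", "@modelcontextprotocol/server-brave-search"],
            "env": {"BRAVE_API_KEY": "YOUR_KEY_HERE"},
        }
    }
}

config_path.parent.mkdir(parents=True, exist_ok=True)
config_path.write_text(json.dumps(config, indent=2))
print(f"Wrote MCP config to {config_path}; restart Claude Desktop to load it.")
```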
Theme 5. Wan 2.1 Video Frame Feature Innovations in AI
- Wan-i2v - Prompt: a man throws a lady overboard from the front of a cruiseship. (Score: 812, Comments: 51): Wan-i2v AI has introduced new features and advancements, as demonstrated in a prompt scenario where "a man throws a lady overboard from the front of a cruiseship." While the post does not provide further details, it suggests a focus on action-oriented scenarios or potentially controversial themes in AI-generated content.
- The Wan-i2v AI is discussed as an image-to-video tool, with some users noting that it couldn't independently create a starting frame from the Titanic movie, implying a direct screenshot was used instead. This highlights the potential limitations of AI in generating entirely original content without reference images.
- Users humorously critique the AI's understanding of physics, with comments suggesting that while AI may not currently grasp physical laws, advancements such as Stable Diffusion and Wan2.1 are rapidly improving in simulating realistic physics in animations, such as "boob jiggles."
- The conversation also touches on the idea of AI-generated alternate movie endings, with users joking about creating new endings for films like Titanic. This raises questions about copyright issues and the potential for new YouTube channels focused on AI-crafted content, despite the challenges of intellectual property rights.
- Wan 2.1 begin and ending frame feature having model coming officially (Score: 100, Comments: 13): Wan 2.1 is set to release an official model that supports start- and end-frame interpolation soon, as confirmed by user "danielzy1990" on a social media platform. For more details, refer to the GitHub issue comment.
- Users anticipate that Wan 2.1's new model will significantly enhance video control, with some expressing hope for improvements such as adding a guidance layer similar to Hunyuan to speed up generation times.
- Comparisons to Hunyuan highlight its efficiency, generating video clips at 24fps in nearly half the time it takes Wan to generate at 16fps, emphasizing the potential benefits of guidance training.
- There is interest in the model's capability to support multiple timed keyframes, with some users hoping it remains compatible with existing img2vid functionalities.
AI Discord Recap
A summary of Summaries of Summaries by o1-preview-2024-09-12
Theme 1. DeepSeek V3's Surprise Launch Shakes AI Community
- DeepSeek V3 Emerges as Open-Source Giant: DeepSeek released DeepSeek V3, a 685B-parameter mixture-of-experts model under the MIT license, accessible on Hugging Face. The community is excited, comparing it to OpenAI's o1 models in performance.
- DeepSeek V3 Outperforms R1?: Users claim DeepSeek V3 beats R1 in coding and front-end tasks, even without chain-of-thought reasoning, noting its cost-effectiveness and excellence in math.
- DeepSeek V3 Drops Without a README!: DeepSeek releases DeepSeek V3 without proper documentation, leaving users both amused and perplexed by the lack of a README, but offering a playground for experimentation.
Theme 2. Qwen Models and Upcoming AI Innovations
- Qwen3 Support Added to Hugging Face Transformers: Developers are thrilled as Qwen3 support is integrated into Hugging Face Transformers, preparing for the upcoming Qwen3 models.
- Qwen2.5-VL-32B-Instruct Released Under Apache 2.0: Qwen releases Qwen2.5-VL-32B-Instruct, a multimodal vision-language model fine-tuned with reinforcement learning, enhancing mathematical reasoning and visual problem-solving capabilities.
- Qwen3 to Support CPU Inference?: Users speculate that Qwen3-15B-A2B could be ideal for CPU inference due to its size, making advanced AI models more accessible.
Theme 3. Debates and Advances in LLM Reasoning Training
- R1-Zero Training Bias Unveiled: Researchers uncover a bias in R1-Zero-like training, where using row mean favors shorter correct responses and longer incorrect ones, impacting model outputs (see the sketch after this list).
- GRPO's Length Explosion Troubles Practitioners: Users grapple with GRPO training leading to length explosion, debating techniques like length clipping and curriculum to address the issue.
- MathFusion Supercharges LLM Math Skills: MathFusion enhances mathematical reasoning in LLMs via cross-problem instruction synthesis, improving models like DeepSeekMath-7B, Mistral-7B, and Llama3-8B.
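The row-mean bias mentioned above comes down to how per-token losses (or advantages) are averaged. The sketch below is illustrative rather than the paper's code: averaging each response over its own length before averaging across the batch ("row mean") down-weights every token of a long response, while a single mean over all tokens ("all mean") weights tokens equally.

```python
import torch

def aggregate(token_loss, mask, mode):
    """token_loss, mask: (batch, seq); mask is 1 on response tokens, 0 on padding."""
    if mode == "row_mean":   # mean per response, then mean over the batch
        return ((token_loss * mask).sum(-1) / mask.sum(-1)).mean()
    if mode == "all_mean":   # one mean over every response token in the batch
        return (token_loss * mask).sum() / mask.sum()
    raise ValueError(mode)

mask = torch.tensor([[1.0] * 2 + [0.0] * 8,   # short response: 2 tokens
                     [1.0] * 10])             # long response: 10 tokens

for mode in ("row_mean", "all_mean"):
    loss = torch.ones(2, 10, requires_grad=True)
    aggregate(loss, mask, mode).backward()
    short_w, long_w = loss.grad[0, 0].item(), loss.grad[1, 0].item()
    print(f"{mode}: per-token gradient weight short={short_w:.3f} long={long_w:.3f}")
# row_mean: short=0.250 long=0.050 -> tokens of long responses barely move the gradient,
#           rewarding short correct answers and under-penalizing long incorrect ones.
# all_mean: both ~0.083 -> every token counts equally, removing the length bias.
```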
Theme 4. Agent Engineering and MCP Developments
- AGNCY Initiative Propels Agentic Interaction Standards: Luke leads AGNCY, aiming to create an open standard for agentic interactions, providing a robust framework for developing more effective AI agents.
- MCPwizard Eases MCP Server Creation: Developers introduce mcpwizard, a CLI tool that simplifies creating and deploying MCP servers, enabling easy addition of custom tools to AI assistants like Claude.
- A16Z Explores Future of AI Tooling with MCP: A16Z publishes a deep dive into Model Context Protocol (MCP), analyzing its potential as a standard interface for AI models and discussing its impact on AI tooling.
Theme 5. NVIDIA's Nemotron-H Models and Hardware Advances
- NVIDIA Unveils Nemotron-H Hybrid Models: NVIDIA introduces the Nemotron-H family, hybrid Mamba-Transformer models offering up to 3x speed improvements, with models ranging from 8B to 47-56B parameters.
- Mistral 24B Roars Back into Favor: Mistral 24B is hailed as one of the greatest releases recently, with users impressed by its strength and accessibility under the Apache 2.0 license.
- Flash Attention and Hopper Architecture Demystified: Enthusiasts delve into Flash Attention optimizations and clarify confusion around Hopper's 64B swizzle, enhancing understanding of NVIDIA's GPU architectures.
PART 1: High level Discord summaries
Perplexity AI Discord
- Sonar 3.7 Bug kicks model: A user reported a bug with Sonar 3.7 where a chown command kicks the model out and breaks the conversation while coding, and wondered whether there was any difference in performance between high and low source amounts and in reasoning quality between search steps.
- A user followed up noting that in their experience, the difference is quite large, sharing a screenshot here.
- Sonar Model Gives Cropped Snippets: Multiple users reported that the Sonar model in the Perplexity API is truncating responses, particularly since the weekend, even though the JSON format is correct.
- A user provided an example of a JSON request and the truncated response, noting that switching to sonar-pro resolves the issue but is not preferable for cost reasons.
- Llama Index Wrestles with Sonar: A user encountered an error when configuring Sonar as a chat engine with Llama Index for a RAG project and requested assistance.
- This highlights potential integration challenges when using Sonar in conjunction with other AI development tools.
- Deep Research Rate Limit: A user inquired about the possibility of extending the limit of 100 deep researches per minute due to bulk processing needs in their application.
- This inquiry underscores the demand for higher API usage limits for users with demanding workloads.
Unsloth AI (Daniel Han) Discord
- Bonsai Bitnet Seeks Testers for Qwen2.5 Comparison: A member is looking for testers for deepgrove/Bonsai, asking how the bitnet compares to Qwen2.5 0.5B.
- They also linked a relevant Hugging Face Transformers PR about adding Qwen3 and Qwen3MoE support.
- Orpheus TTS Model Gains Audio Finetuning: Audio finetuning has arrived with the Orpheus TTS model, according to a newly released Unsloth notebook.
- A user noted that the work was all done by a particular member and that the notebook is a lot more streamlined compared to local audio tokenizing and then regular Llama3 finetuning.
- Straight PRs OK on Unsloth Github, but wait: A member inquired about contributing to Unsloth's GitHub, and another member confirmed that straight PRs are acceptable, though potential delays may occur due to the high volume of recent PRs and issues.
- The discussion then shifted to modifying data preparation steps in Colab to accommodate .txt files, aiming for cheaper inference, and the original issue was linked.
- GRPO Reasoning Needs Training Data: A user asked about training only parts of the output, specifically wanting the model to generate its own reasoning during inference.
- It was suggested to look at the GRPO notebooks as a standard way of adding reasoning, and that the model must see reasoning traces during training to take it into account during inference.
- Unsloth's Fine-Tuning Guide Now Available: A member created a guide for fine-tuning with Unsloth, covering theoretical aspects, practical examples, and how to create a reasoning model with GRPO.
- The guide compiles everything learned over the last year.
LMArena Discord
- Nebula Steals Chatbot Spotlight: Members found Nebula, an anonymous chatbot suspected to be from DeepMind, to be really good and the best anonymous model rn, outperforming others in math, English-Turkish translation, and solving Arc-AGI problems.
- It seems similar to Phantom, which users identified as a Google model, with both being tested in the arena.
- GPT-4o Gets Human Alignment Boost: GPT-4o has significantly improved through OpenAI's post-training, potentially surpassing Grok 3 soon, due to continued pretraining since December.
- Speculation suggests it might top the leaderboard, leveraging OpenAI's proficiency in human preference alignment in the LM arena.
- Specter Evolves into Phantom then Nebula: Specter, Phantom, and Nebula are revisions of the same model, in that order, showing performance jumps in a few weeks.
- Members noted a more significant performance jump from Specter to Phantom compared to Phantom to Nebula.
- LMArena Fixes Bugs, Tunes Leaderboard: The LMArena alpha received updates including bug fixes and new features, and testers are encouraged to continue testing at alpha.lmarena.ai with the password still-alpha.
- A bug preventing messages from saving and causing vote failures has been fixed, and leaderboard columns are now sortable with live data updates; feedback can be provided via this Google Forms link and bug reports can be filed using this Airtable link.
Cursor Community Discord
- Cursor's CMD+Backspace becomes problematic: Users express frustration with Cursor's CMD+Backspace leading to accidental project deletions, with some losing work up to 7 times.
- The Cursor team plans to change the default keybinding to CMD+Shift+Backspace, with configuration options, targeting a Monday rollout.
- Claude 3.7 MAX hits users' pocket: Claude 3.7 Thinking, now Claude 3.7 MAX, moves from the Pro plan to usage-based pricing, causing user frustration due to increased costs.
- Claude 3.7 MAX features a higher context window and more tool calls compared to the standard Claude 3.7 Sonnet.
- Windsurf Surfing Ahead in Responsiveness: Some users find Windsurf faster and more responsive than Cursor, citing Cursor's lagging and freezing.
- Others prefer Cursor for its rollback features and agent performance, though acknowledge AI programming's remaining challenges.
- MCP Combinations become hype: Users experiment with various MCP (Model Context Protocol) server combinations to enhance AI coding agents like Cursor, with Supabase MCP highlighted.
- Some users suggest MCPs may be overhyped, noting instances of agents over- or under-utilizing MCPs, suggesting a need for clearer instructions.
- 3D Integration Frustrates AI Coders: A user struggles to integrate a 3D model (FBX format) into a three.js project using Claude, facing issues with FBXLoader.
- The limitations of AI in handling 3D designs become clear, with suggestions to switch to GLTF format and simplify tasks.
aider (Paul Gauthier) Discord
- DeepSeek V3-0324 Beats R1?: The Aider community is excited about the new DeepSeek V3-0324 release, suggesting it outperforms R1 in coding and front-end tasks, despite lacking chain of thought.
- Members highlight its strengths in coding and math compared to previous versions, drawing comparisons to Sonnet 3.5 in benchmarks, while also noting its cost-effectiveness.
- Aider Tames Sonnet's Over-Eagerness: Paul Gauthier reveals he has managed to get Aider to mitigate Sonnet 3.7's over-eager behavior by adding a line to the prompt to chill out; this is now available in the main branch.
- He encourages users to provide feedback on this adjustment based on their coding sessions.
- Aider Gains New Homepage: Paul Gauthier announces the launch of Aider's new homepage at aider.chat, showcasing compatibility with models like Claude 3.7 Sonnet, DeepSeek R1 & Chat V3, OpenAI o1, o3-mini & GPT-4o, and support for over 100 code languages.
- This update offers an improved introduction for new users and a central hub for resources.
- Aider's Context Command Streamlines Chats: Paul Gauthier introduces an experimental /context command in Aider that automatically sets up the chat context, working best with Sonnet 3.7, R1, and o3-mini.
- This new command enhances user experience by intelligently identifying and adding relevant files to the chat.
- Community Curates LLM Contexts: A member announces the launch of ctxs.ai/weekly, a site dedicated to collecting aider conventions, prompts, and LLM-oriented documentation snippets.
- The goal is to create a useful resource for the aider community, and the member is actively soliciting feedback on how to improve the site.
Nous Research AI Discord
- LCPP Context Length Baffles: Users found that setting a context length to 100 in LCPP still tries to allocate 180GB of RAM, leading to VRAM exhaustion.
- Suggestions include Attention overriding the assigned context length, missing ROPE-specific arguments, or using Q8 quantization.
- Deepseek V3 Mirrors Sonnet 3.7: Deepseek V3 0324 shows as much variation as Sonnet 3.7, suggesting shared advancements in their architectures, viewable in this image.
- One user even called it a huge update with Sonnet-level code creativity and a potential base for R2.
- Transformers Ditch Normalization: Inspired by the Transformers without Normalization paper, a member replaced normalization with tanh.
- The discussion then focused on removing experts at inference and its effects on smaller weights.
- MathFusion Supercharges LLM Math: MathFusion improves mathematical reasoning in LLMs via cross-problem instruction synthesis, enhancing models like DeepSeekMath-7B, Mistral-7B, and Llama3-8B (more on MathFusion).
- This method creates the MathFusionQA dataset, which fine-tunes models and boosts benchmark accuracy with minimal extra data.
- Qwen3 to support CPU inference: The transformers library PR#36878 shows that Qwen3 support is being added, meaning that the models will soon be supported by the transformers library.
- A user speculated that Qwen3-15B-A2B could be a good candidate for CPU inference due to its size.
OpenAI Discord
- Sam Altman Teases GPT-5 Release: Despite the absence of an official announcement, Sam Altman confirmed that GPT-5 will launch this year, leading to speculation it could arrive in the first half to compete with R2 or Llama-4.
- Members on the OpenAI Discord server suggested that an unannounced API might also be imminent.
- GPT-4o: The Model That Converted a User: A user finds GPT-4o to be such a strong daily driver that they rarely switch models, only using other models such as 4.5, o1, o3 when the 4o messages run out or for important or unsolved problems.
- The user also claims to have built an "engine" that recovered a 400+ turn chat and continues past 500 turns retaining context with no drift or hallucinations, all through the default prompt.
- Many-Shot Prompting Boosts Multimodal Model Muscle: A research paper (MANY-SHOT IN-CONTEXT LEARNING IN MULTIMODAL FOUNDATION MODELS) suggests that closed models like GPT-4o and Gemini 1.5 Pro benefit significantly from many-shot demonstrations up to ~2,000 examples, whereas open-weight models do not show the same benefit.
- The paper notes that large multimodal foundation models like GPT-4o and Gemini 1.5 Pro show significant performance improvements when provided with many-shot demonstrations compared to few-shot examples.
- Run an F1 Team Powered by GPT-4o: The open source project FormulaGPT (github repo) simulates head-to-head races between LLM-powered teams that think contextually and adaptively by continuously reasoning, strategizing, and making nuanced decisions.
- Viewers can challenge advanced language models in Player vs. AI Mode, or watch the best AI models battle each other in AI vs. AI Mode while observing detailed AI reasoning behind each pit stop, tire change, or overtaking maneuver.
- Avoid Turnitin AI Detector, If You Dare: A member sought advice on avoiding Turnitin AI similarity detection for a report reusing their company's business model, which violates Turnitin's ToS.
- Others suggested it looked like spamming appeals to cheat homework and recommended using humanize AI tools.
OpenRouter (Alex Atallah) Discord
- OpenAI's o1-pro: Gucci-Level Pricing?: Users reacted strongly to OpenAI's o1-pro API pricing at $150/M input tokens and $600/M output tokens, with one calling it GucciAI due to its high cost.
- Another member joked that the API's slowness might be a deliberate feature to prevent overspending given compute constraints.
- Image Generation MIA on OpenRouter: A user inquired about using Gemini's image generation with the gemini-2.0-flash-exp model, but was informed that image generation is not yet supported on OpenRouter.
- The team indicated that while image generation is on their roadmap, there are currently no short-term plans to support image models like Flux.
- Lambda Endpoints Plagued by 404s: Multiple users reported encountering 404 'no endpoint found' errors when attempting to use Lambda models, despite Lambda's status page showing full operational status.
- The community offered suggestions, and some users confirmed that the Llama 3.3 70B Instruct | Lambda model was functioning correctly for them.
- DeepSeek R1 challenges OpenAI o1: Members noted that the DeepSeek R1 model, a 671B parameter model with 37B active during inference, performs comparably to OpenAI's o1 but is open-sourced and available under the MIT license.
- Its availability under the MIT license allows for commercial use.
- Claude 3.7 Sonnet Sputters with Overload Errors: Users reported frequent overload errors when using Claude 3.7 Sonnet, leading to cut-off responses and charges for input tokens.
- One user suggested a retry strategy or switching to Gemini 2.0 Pro as an alternative, acknowledging Claude's strength in translations.
LM Studio Discord
- LM Studio Lacks NPU Support: Users have reported that NPUs are not yet supported in LM Studio, but Ryzen AI support exists in version 0.3.11.
- For those with limited resources like 2GB VRAM, consider using Gemma 3 1B with Q6 or Q8 quantization and the CUDA runtime for improved performance.
- KV Cache Quants Slash VRAM Needs: Users recommend leveraging KV cache 8-bit quants to diminish memory footprint when operating models with extensive context windows, like 30k tokens.
- Keep in mind that 12GB of VRAM might prove inadequate for a 32B model, suggesting that Phi-4 or Qwen2.5 14b could serve as compelling alternatives.
- Multi GPU Gets In-App Management: Enthusiasts are raving about LM Studio controls that allow the user to select the GPU that the model will load onto, available in the latest beta build.
- Multiple users confirmed that Multi GPU is supported out of the box with the latest beta build of LM Studio.
- Google Coral TPUs a Flop for AI: The Google Coral dual TPU is inadequate for AI use as it does not have any onboard memory to store data.
- One user with an 8060s also inquired about thermal and power headroom for the Framework Desktop.
- 4060ti: Inexpensive Inference Sweet Spot: The RTX 4060 Ti with 16GB of VRAM stands out as a budget-friendly pick for AI inference, clocking in around $500 USD/EUR.
- A user mentioned it is important to note that AMD cards are not optimized for gaming and the 5000 series from Nvidia may melt.
Yannick Kilcher Discord
- VPN code hijacks OpenAI site?: Users reported seeing <veepn-guard-alert> and <veepn-lock-screen> tags on OpenAI's website, suggesting a VPN injection, but it was likely code injected by their own VPN (sm0kywu.github.io/Amodal3R).
- It appears that this user was simply using a VPN.
- cuOpt Solves Linear Programming at NVIDIA: NVIDIA® cuOpt™ is a GPU-accelerated optimization AI microservice that excels in Mixed Integer Linear Programming (MILP), Linear Programming (LP), and Vehicle Routing Problems (VRP) according to docs.nvidia.com.
- It appears this microservice is well received and performant at NVIDIA.
- CUDA Python is the new black?: Members debated whether it is truly the year of CUDA Python as mentioned by blelbach on X, with some asserting that Python is sufficient for GPU programming.
- Others mocked modern Python programmers, linking a YouTube video titled Modern Python Programmers.
- MoEs Training Stabilizes?: One user claimed that MoEs are unstable to train, but another user countered that they haven’t been unstable to train for two years and are now about the same as dense networks.
- The stability is largely due to better kernels and dropless token routing, solving issues like numerical instability and expert collapse.
- DeepSeek-V3 quietly drops: Members noted that DeepSeek released their DeepSeek-V3-0324 model, and a blog post reused their diagrams.
- The model boasts 685B parameters and offers various tensor types like BF16, F8_E4M3, and F32, with links to finetunes and quantizations.
GPU MODE Discord
- Flash Attention FA Debugging: In a discussion about understanding Flash Attention (FA), a member suggested coding and profiling/debugging, indicating that hands-on implementation aided understanding of normal attention, and similarly could for Flash Attention.
- One member ran into issues implementing Flash Attention 1 in triton: it works with TRITON_INTERPRET=1 but it has a few elements mismatched on cuda. After increasing rtol & atol the tests passed.
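For context on the tolerance fix above: it is simply a looser numerical comparison between the kernel output and a reference. The self-contained sketch below uses a bfloat16 recomputation as a stand-in for the reduced-precision Triton kernel's output (an assumption made purely for illustration):

```python
import torch

def attention(q, k, v):
    """Naive attention used as the numerical reference."""
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

torch.manual_seed(0)
q, k, v = (torch.randn(8, 128, 64) for _ in range(3))

expected = attention(q, k, v)                                          # fp32 reference (like TRITON_INTERPRET=1)
actual = attention(q.bfloat16(), k.bfloat16(), v.bfloat16()).float()   # reduced precision, like a fused kernel

print(torch.allclose(actual, expected))   # False: some elements mismatch at default tolerances
# Loosening rtol/atol, as described in the report, accepts the expected precision gap.
torch.testing.assert_close(actual, expected, rtol=5e-2, atol=5e-2)
print("matches the reference within loosened tolerances")
```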
- RTX 5080 Gets CUDA 12.8: A developer released a patch enabling full CUDA 12.8 + PyTorch 2.5.0 compatibility with the Blackwell / sm_120 architecture for the RTX 5080, providing a GitHub repo with scripts, diffs, and instructions.
- It's also confirmed that WMMA instructions are "wrappers" that compile directly to HMMA/IMMA/QMMA instructions in SASS, similar to how MMA instructions function, as shown on the CUDA Godbolt.
- Hopper's Swizzle Unpacked: The documentation's description of the 64B swizzle in the Hopper architecture is confusing to many, but it's clarified to be a 64B (bytes) swizzle where each square is 128b (bits), which translates to a 8x64 tile for 8-bit dtypes and a 8x32 tile for 16-bit types.
- A member is seeking ROCm experts to help implement a row-row bank conflict-free swizzle for the tilelang HIP backend.
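The clarified numbers check out with simple arithmetic (illustrative only, derived from the figures quoted above): a 64B swizzle permutes 128-bit atoms and spans 8 rows, so the covered tile depends only on element size.

```python
# Sanity-checking the 64B swizzle tile shapes quoted above.
SWIZZLE_BYTES = 64      # "64B swizzle"
ATOM_BITS = 128         # each square in the docs' diagram
ROWS = 8                # rows spanned by the pattern, per the description above

atom_bytes = ATOM_BITS // 8                      # 16 bytes per atom
atoms_per_row = SWIZZLE_BYTES // atom_bytes      # 4 atoms across one 64B row

for dtype_bits in (8, 16):
    elems_per_row = SWIZZLE_BYTES // (dtype_bits // 8)
    print(f"{dtype_bits}-bit dtype -> {ROWS}x{elems_per_row} tile "
          f"({atoms_per_row} x 128-bit atoms per row)")
# 8-bit dtype  -> 8x64 tile
# 16-bit dtype -> 8x32 tile
```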
- Oxford U creates AI Fellowships: The University of Oxford has a new opening for a research fellow (postdoc level or equivalent experience) to work on AI / RL in games and neuroimaging with Rui Ponte Costa, at a salary of £100k+.
- This involves developing an AI-powered technology that can infer the contributions of specific brain regions to behavior by analyzing gameplay data, enabling non-invasive diagnosis and treatment of neurological disorders.
- Flash Attention's Contiguous Memory: In Flash Attention, tensors are stored as (batch_size, N, num_heads, d), which are contiguous in d (typically > 64), enabling efficient global memory coalescing where each thread loads 16B of data.
- This also makes it easier to understand what is going on, so LLMs can be used to understand kernel code, explaining simple concepts and variable states at specific places in tensors.
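The coalescing claim follows from simple arithmetic. The numbers below are illustrative (fp16 elements, a 32-thread warp, head dimension 64) and only show why a contiguous d axis lets 16-byte per-thread loads merge into fully used memory transactions.

```python
# Why (batch, N, heads, d) coalesces nicely when d is the contiguous axis.
BYTES_PER_THREAD = 16    # one 128-bit vectorized load per thread
WARP_SIZE = 32
ELEM_BYTES = 2           # fp16
HEAD_DIM = 64            # d, typically >= 64 in Flash Attention configs

elems_per_thread = BYTES_PER_THREAD // ELEM_BYTES          # 8 fp16 values per thread
bytes_per_warp = BYTES_PER_THREAD * WARP_SIZE              # 512 contiguous bytes per warp load
rows_per_warp = bytes_per_warp // (HEAD_DIM * ELEM_BYTES)  # 4 full head-dim rows per transaction

print(elems_per_thread, bytes_per_warp, rows_per_warp)     # 8, 512, 4
# Because d is contiguous and d * 2 bytes >= 128 bytes, every 128-byte segment a warp
# touches is fully used, which is what coalesced global memory access means.
```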
Interconnects (Nathan Lambert) Discord
- Nvidia Engineers Mamba-Transformer Hybrid: Nvidia introduced the Nemotron-H family of models, including a series of 8B and 47-56B models that are hybrid Mamba-Transformer models, offering improved inference speed, according to their research.
- The model is noted for improvements in speed compared to other models.
- Mistral 24B Roars Back into Favor: The release of Mistral 24B has been received as a major highlight due to its strength and accessible base model, further aided by new open releases under the Apache 2.0 license.
- A member stated, "Mistral 24B is probably one of the greatest releases in the last months, incredibly strong model and you have access to the base model as well."
- R1-Zero Training's Length Bias Exposed: An analysis reveals that using row mean in R1-Zero-like training introduces a bias, favoring shorter correct responses and longer incorrect ones, as detailed in a paper and accompanying code.
- Switching to all mean yields comparable performance without increasing length and raised questions about plots showing increasing reasoning length correlating with increased capability.
- China Plots Open-Source AI Blitz: China plans to flood the market with open-source AI models to commoditize AI software and boost its hardware sales, potentially shaking up US tech dominance, according to this tweet.
- The release of DeepSeek models temporarily knocked ~$1T off US tech market caps, highlighting the potential impact of Chinese AI.
- Browser Automation Scales Up with Infinibranch: Morph Cloud's Infinibranch Browser was suggested as a possible solution to help scale browser-use agents, improving the success rate to approximately 80% on tasks like finding Amazon links for a list of books.
- Traditional web scraping methods have become obsolete because of JavaScript-heavy single page applications, CAPTCHAs and sophisticated bot detection.
Latent Space Discord
- Gemini Updates Get Deep Dive: Gemini's Dave Citron joined @OfficialLoganK on the Release Notes podcast to discuss recent updates, including personalization, Canvas, Audio Overviews, and Deep Research as reported by Google Gemini App.
- The discussion covered topics from recent app launches to the future of personalization in the Gemini app, including insights into user data and privacy considerations.
- Claude Code Gains Eight New Features: Anthropic launched eight new features for Claude Code to help developers build faster and smarter, documented on their engineering blog.
- Features include a new think tool, leading to discussion on its implementation and value, with some likening it to Chain of Thought prompting.
- A16Z Explores Model Context Protocol (MCP): A16Z published a deep dive into Model Context Protocol (MCP), exploring its potential as a standard interface for execution, data fetching, and tool calling in AI models: A Deep Dive Into MCP and the Future of AI Tooling | Andreessen Horowitz.
- The post examines the use cases of MCP, the challenges, and how it changes the way AI interacts with tools, noting that APIs were the internet’s first great unifier, but AI models lack an equivalent.
- Roboflow Unleashes RF-DETR for Real-Time Object Detection: Roboflow announced RF-DETR, a fully open-source real-time object detection model under the Apache 2.0 license available on GitHub.
- RF-DETR achieves SOTA performance with over 60 mAP on COCO, with base and large models at 29M and 128M parameters respectively.
- Swyx Engineers the Future of Agents: Swyx launched a new talk and essay on Agent Engineering, highlighting the reasons for going all in on Agents at @aiDotEngineer.
- The discussion defines Agents (thanks to @simonw) and elaborates on the Six Elements of Agent Engineering, examining how Agents could be ChatGPT's route to reaching 1 billion monthly active users (MAU).
Notebook LM Discord
- Mobile Study Participants Needed: The team seeks participants for a study on mobile use cases, encouraging individuals to share insights to enhance understanding of how to use the tool on mobile.
- The team also announced upcoming AI model updates, with more details to be shared soon.
- Mindmaps Emerge Gradually in NotebookLM: A user noted the absence of mindmaps in their NotebookLM, while another confirmed having them in the free version, indicating a staggered rollout of the feature.
- The mind map feature gets mixed reviews, needing constant regeneration to update and lacking details beyond the topic.
- NotebookLM Powers Extensive Research Reports: A user employs NotebookLM for research, crafting detailed reports to help people understand situations, focusing on local and regional news.
- The user also shared a link to a podcast episode discussing the legal consequences of a 911 prank call 911 Prank Call: The Felony Consequences.
- NotebookLM as HR Policy Central: A user explored using NotebookLM as a central hub for HR policies, employee handbooks, and new employee onboarding.
- Though the concept is promising, the user noted the answers weren't always accurate and wondered about effective information organization strategies.
- Mind Map Pixelation Solved with Zooming: A member suggests zooming in on tabs before downloading a Mind Map to enhance output quality and resolve pixelation issues.
- The member touted the crazy context window and low hallucination rates, even cancelling their subscriptions to ChatGPT and Claude.
Eleuther Discord
- Virtual Tester Predicts Model Performance: A member proposed a virtual testing environment to predict AI model viability before training, potentially saving resources and accelerating innovation; the simulator aims to determine if a model has a realistic chance of working or is doomed to fail early on.
- Others noted that testing new architectures at a small scale is already relatively inexpensive, costing around $5 to train a L6D512 model on a 3090 for a day.
- EleutherAI Evaluates Evaluation Methods: A member detailed evaluation methods for EleutherAI in a new blog and set up an MkDocs site for easier navigation; they also await review on this PR.
- The contributor was cautioned about using AI to generate PR content, emphasizing the need to vet contributions to avoid adding spam.
- VectorAdam claims rotation equivariance: VectorAdam modifies the second moment update to be the square of the vector norm per gradient vector, addressing coordinate-system bias in Adam, potentially improving rotation equivariance.
- It was noted that VectorAdam is not similar to Adafactor, but more like a blocked approximation with block size = hidden dim.
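As a concrete reading of that description, here is a minimal sketch of a VectorAdam-style step, written from the summary above rather than from the paper's reference implementation: the second moment accumulates the squared norm of each gradient vector instead of per-coordinate squares, so the same preconditioner scale is applied to every coordinate of a vector.

```python
import torch

def vector_adam_step(param, grad, m, v, t, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One step where param/grad have shape (num_vectors, dim), e.g. 3D vertex gradients.
    Sketch of the idea summarized above; hyperparameters are standard Adam defaults."""
    b1, b2 = betas
    m.mul_(b1).add_(grad, alpha=1 - b1)                                  # first moment, per coordinate
    v.mul_(b2).add_((1 - b2) * grad.pow(2).sum(-1, keepdim=True))        # second moment = squared vector norm
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    param.sub_(lr * m_hat / (v_hat.sqrt() + eps))                        # one scale per vector, all coords
    return param, m, v

# Example: 100 3D points being optimized; rotating the coordinate frame rotates the
# update identically, which is the rotation-equivariance property discussed above.
p = torch.randn(100, 3)
m, v = torch.zeros_like(p), torch.zeros(100, 1)
grad = torch.randn_like(p)
vector_adam_step(p, grad, m, v, t=1)
```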
- MechInterp faces backlash for being outside academia: Members discussed that there seems to be an academic 'backlash' to the 'mechinterp' brand because so much of it is outside of traditional academic channels, and they are resistant to the paradigm.
- A member found that the first token to trigger an activation is holocaust but it's not the token with the strongest activation, and wondered if neuron activation might be context specific.
- Recursive Design Trumps GANs, CNNs, and RL: A member introduced a novel diagram using a recursive design, distinguishing it from traditional GANs; this implementation emphasizes structural organization over sequential processing, leveraging CNNs for filtering and RL for refining responses.
- Another member is drafting a PR to update the evaluation logic to lm_eval==0.4.8, the latest version, referencing the Evals PR.
HuggingFace Discord
- HF Agents Course Embraces New Frameworks: The Hugging Face Agents Course now has integrations for LlamaIndex, LangChain, and smolagents, offering learners diverse approaches to agent frameworks, as noted in this tweet.
- Members using the Agents course noted that LangGraph is rigid which helps to guide their process when building smolagents.
- pdf2notes Converts PDF Notes Effortlessly: Pdf2Notes converts PDFs into organized notes using LlamaParse and Llama-3.3-70B, also utilizing DeepMind's Gemini 2 Flash for multi-modal parsing, wrapped in a Gradio and FastAPI framework.
- A member asked if pdf2notes can operate 100% locally without external APIs, raising concerns about needing subscriptions for Gemini and Groq.
- SpatialLM takes on 3D Data: SpatialLM, a 3D large language model designed to process 3D point cloud data, has been released on Hugging Face at manycore-research/SpatialLM-Llama-1B.
- It generates structured 3D scene understanding outputs and can be further explored via the project website and GitHub repository.
- InferenceClient API throws Authentication Errors: A user reported a 403 Forbidden error when attempting to list deployed models using the InferenceClient API, even with read-only tokens configured to allow calls to Inference Providers.
- The error indicates insufficient permissions to call Inference Providers, and a user posted a link with the same error.
MCP (Glama) Discord
- K8s Required for MCP Prompt Testing: A Kubernetes setup is required to test MCP prompts, such as those found in this file and this test.
- An alternative implementation with prompts is available here for managing Electric Vehicle charging stations.
- Microsoft releases official C# SDK for MCP: Microsoft has released a new official C# SDK for Model Context Protocol servers and clients, available here.
- This SDK provides developers with tools for building AI applications using JavaScript and TypeScript, integrating into web frameworks like Next.js and Svelte, per Vercel AI SDK 4.2.
- Zapier Integrates with MCP: Zapier has released an MCP server, providing access to over 8,000 integrations for AI assistants to interact with various apps.
- This integration enables AIs to perform real-world tasks such as sending messages, managing data, scheduling events, and updating records, expanding their capabilities beyond text generation.
- MCPwizard eases Server Creation: A member introduced mcpwizard, a CLI tool to simplify creating and deploying MCP servers, highlighting features like initializing projects and adding custom tools to Claude assistants.
- The tool's GitHub repo was also shared for community feedback and contributions.
- Google Sheets MCP Server Enables Direct Editing: A member built a Google Sheet MCP server, allowing Claude to directly edit spreadsheets, streamlining data handling and formula adjustments as mentioned in this tweet.
- The code can be found here.
Nomic.ai (GPT4All) Discord
- Prompting Language Models in Specific Languages: Members discussed that to make language models respond in a specific language (e.g. German), it is best to write the system message in that language to avoid triggering "Im Kontext Lernen" (in-context learning).
- It was further suggested that avoiding negative sentences can improve results, with a recommendation to rephrase instructions to use active verbs instead.
- Mistral Model Versions Clarified: It was mentioned that Mistral Nemo is a 12b model and Mistral 24b is Mistral 3 or Mistral 3.1, with discussion around specific model details for projects.
- Confusion arose around identifying the exact model, with one member emphasizing the need for precise model information to avoid issues.
- GPT4All's LocalDocs Mysteriously Vanish: A user reported that their entire catalog of local docs disappeared for no apparent reason, prompting discussion about potential causes such as changes to the install folder or lack of admin rights.
- Members recommended backing up the localdocs.db file and the original documents to prevent data loss, and suggested that a Windows 11 update might have caused the issue by messing with drive letters.
- LLMs Consider Medical Office Automation: Members discussed the potential of using local LLMs in a medical office setting to help doctors create reports and assist with treatments, with a focus on the system learning from past dictated notes.
- However, it was cautioned that LLMs may not be suitable for handling financial or medical data due to the risk of confabulation and the need for precise information.
- GPT4All Remains Blind: A member asked if any models that GPT4All can run have vision capabilities, and it was confirmed that GPT4All does not support vision capabilities.
- Alternative tools like LM-Studio were suggested as options for vision-related tasks.
Modular (Mojo 🔥) Discord
- Open APIs Pave Path for Portability: When exploring high-performance software solutions, using open and portable APIs such as OpenCL, OpenMP, OpenACC, Vulkan’s Compute API, and SYCL is a good starting point.
- POCL was pointed to as an academic project with related papers.
- Democratizing AI Compute Lowers GPU Costs: Chris Lattner's series, 'Democratizing AI Compute', underscores the importance of better hardware utilization to reduce the need for expensive GPUs.
- The series includes articles on CUDA, OpenCL, and AI compilers (TVM and XLA).
- MAX Platform Inquiries: A new user inquired about modifying the max/pipeline directory and testing changes within the MAX Platform via the pixi.toml file.
- Specifically, they were curious about altering the max-pipeline without downloading it as a dependency.
- Mojo's Formatting Tool Rivals Black and fmt: Mojo incorporates a built-in formatting tool, mojo format, akin to Black in Python or fmt in Rust, for code formatting.
- Meanwhile, GPU support for Windows is difficult because the Windows compiler toolchain is a pain to work with.
LlamaIndex Discord
- AGNCY Initiative Seeks Agentic Standard: Luke is spearheading AGNCY, an initiative focused on forging an open standard for agentic interactions.
- The project aims to provide a robust framework for developing more effective and interoperable AI agents.
- Deepseek and LlamaIndex Build Smarter RAG: Akshay Pachaar details a new project integrating Deepseek AI to create a RAG app using LlamaIndex for orchestration, Deepseek AI R1 for inference, Ollama to locally serve R1, and Streamlit for the UI; more details here.
- This is intended to demonstrate the power of combining different tools to build sophisticated applications.
- Timeouts Break Agent Workflows: A member reported that their agent workflow was crashing because of unhandled timeout errors with the OpenAI endpoint.
- It was suggested to catch WorkflowRuntimeException or Exception instead of WorkflowTimeoutError to resolve the issue.
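A minimal sketch of that pattern; the exception class names come from the discussion above and may vary by llama_index version, so treat them as assumptions:

```python
# Wrap the workflow run and catch a broad exception class instead of only the
# timeout error, per the advice above.
async def run_workflow_safely(workflow, **inputs):
    try:
        return await workflow.run(**inputs)
    except Exception as exc:  # broader than WorkflowTimeoutError
        # WorkflowRuntimeException (or any unhandled timeout from the OpenAI
        # endpoint) lands here instead of crashing the agent workflow.
        print(f"workflow run failed: {type(exc).__name__}: {exc}")
        return None
```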
- Members Ponder Function Calling in Multi-Agent: Members are contemplating whether triggering single agents via function calling could displace program-wide backoff mechanisms in multi-agent systems.
- The central question is whether these two setups might achieve the same functionality in certain scenarios, potentially streamlining system architecture.
- Crafting the Interview Grindset: A member is building a local AI using Llama 3.2, Sonnet 3.7, and Dolphin blended into a 16B model with RAG and custom fine-tuning.
- He is trying to get his AI to apply to ai/tech companies and pass interviews and has experience in face tracking, blender, unity, powershell, and TTS.
Cohere Discord
- Command-R-Plus Powers Molecular AI Assistant: An AI assistant, powered by Cohere's command-r-plus, is being used to build tools for structural biology with a MolStar molecular viewer (https://ai.doi.bio).
- The site supports a 'load' command, demonstrated by saying 'Show me 7zzz' to load PDB entries into the viewer.
- Cohere Clears Up Chat Security Policies: A member inquired about data retention and security policies for Cohere's chat feature, asking if data is used for model training.
- A Cohere team member linked the privacy policy, data usage policy, and security policy, noting that users can control data settings in their dashboard.
- API Spamming Suspected as SSL Error Culprit: A member reported experiencing SSL errors when rapidly sending requests to the API, suggesting it might be due to spamming despite proper py.ssl module installation.
- Another member proposed the issue might stem from untrusted server certificates, and others pointed out that API rate limits usually return a 429 error code rather than an SSL error.
- vnc-lm Launches RAG-Enabled Discord Bot: A member released a new version of their Discord bot, vnc-lm, featuring a RAG pipeline that augments prompts with data from Wikipedia and DuckDuckGo.
- The bot adds approximately 500 tokens to each prompt, appending five chunks of sourced information to improve the model's context, with code available on GitHub.
- vnc-lm Now Supports ALL LLMs via Docker: The updated Discord bot now supports all popular local and hosted large language model APIs, including Cohere, enabled with Docker.
- With the new release, users can easily edit messages and get new responses within Discord.
Torchtune Discord
- DeepSeek-V3 Drops Without a README: Deepseek released DeepSeek-V3 without a proper readme, accessible on Hugging Face, prompting humorous reactions.
- Despite the lack of documentation, a playground is available, allowing users to experiment with the model.
- Data Quality still Tortures AI Engineers: Despite years of research, defining and achieving good data remains a challenge for AI labs, even after the recognition of datasets like fineweb and lima.
- A member expressed frustration over the persistent lack of effective PDF extraction tools.
- LlamaExtract Tool Structures Documents: LlamaIndex launched LlamaExtract, a tool for structuring complex documents using genAI-native agents.
- It adapts the latest models to accurately structure documents like financial reports and resumes, as per a tweet from Jerry Liu.
- GRPO LoRA Scores Surprisingly High: The GRPO LoRA 3B single device achieves 54% on GSM8K, as shown in this pull request.
- It performed better than expected on novel questions, despite an error of adding an extraneous +2 in its calculation.
- CUDA Graphs Compress GPU Operations: Members discussed CUDA graphs, which capture a whole bunch of GPU operations as a graph and launch them as a single operation.
- This reduces the overhead to launch CUDA operations from the CPU, which reduces GPU idle time.
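A minimal sketch of the capture-and-replay pattern in PyTorch (assumes a CUDA device; the model and shapes are illustrative):

```python
import torch

device = "cuda"
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 1024)
).to(device)
static_x = torch.randn(64, 1024, device=device)

# Warm up on a side stream so one-time setup work is not captured into the graph.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_x)
torch.cuda.current_stream().wait_stream(s)

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):        # capture: kernels are recorded, not executed
    static_y = model(static_x)

static_x.copy_(torch.randn(64, 1024, device=device))
g.replay()                       # one launch replays the whole kernel sequence
print(static_y.shape)
```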
DSPy Discord
- DLCoT Optimizer Trims Tokens: The new DLCoT (Deconstructing Long Chain-of-Thought) Optimizer slashes token usage by 70-90% while maintaining or improving accuracy across benchmarks, available in pull request #8000.
- It enhances chain-of-thought reasoning by segmenting CoT content, removing redundant paths, filtering incorrect chains and reconstructing coherent output, while working with existing DSPy optimizers like BootstrapFewShot.
- DSPy Inspires Creativity Optimizations: Members discussed using DSPy for creative content generation by optimizing prompts and using a good judge, pointing to resources like PAPILLON and Agentic Reward Modeling.
- The discussion underscored the need for example inputs but not necessarily summaries (labels) if a judge/metric can assess summaries without a reference.
- Granular Feedback Arrives Via Prediction: Achieving granular feedback with Refine, where specific checks over an output provide targeted feedback, is coming soon.
- Version 2.6.15 will enable returning dspy.Prediction(score=..., feedback=...) to offer fine-grained feedback to the module.
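A minimal sketch of what that could look like, assuming the Refine API from recent DSPy releases and the 2.6.15 behavior described above; the checks themselves are made up for illustration:

```python
import dspy

def reward_with_feedback(args, pred):
    # Specific checks over the output provide targeted feedback.
    checks = {
        "has_citation": "[" in pred.answer,
        "under_100_words": len(pred.answer.split()) <= 100,
    }
    score = sum(checks.values()) / len(checks)
    feedback = "; ".join(f"failed: {name}" for name, ok in checks.items() if not ok)
    # Returning a Prediction (rather than a bare float) is the 2.6.15 behavior discussed above.
    return dspy.Prediction(score=score, feedback=feedback or "all checks passed")

qa = dspy.ChainOfThought("question -> answer")
refined_qa = dspy.Refine(module=qa, N=3, reward_fn=reward_with_feedback, threshold=1.0)
```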
- Multi-Agent Protocol Standard Explores Retrieval: Members explored expanding the multi-agent protocol standard (MCP) to retrievers/retrieval augmented generation.
- They are discussing a shared schema for retrieval results and methods to exchange documents and embeddings to streamline data-driven workflows and simplify combining multiple models and data sources.
tinygrad (George Hotz) Discord
- Dataset Origins Discovered: A member located the datasets/sops.gz dataset within the repo's extra directory, which is used in speed_compare_cuda_ptx.
- The dataset is generated via the generate_dataset.sh script within the same directory.
- CUDA Port Configuration Clarified: When asked about porting Tinygrad to CUDA GPU, a member provided a link to the README.md file, showcasing the project's supported backends.
- This indicates that CUDA support information is available within the project's documentation.
- Agenda Alert: Meeting #63 Topics: Meeting #63's agenda includes company updates, quantized DSP, BERT, scheduler, driver, tensor cores, WebGPU, ONNX, RetinaNet, and Torch frontend discussions.
- Also planned is to discuss bounties around the AMD LLVM backend and topics such as test_ops, multi GPU training, and torch compile.
- AMD LLVM Backend Advances: Progress on the AMD LLVM backend involves multiple merged pull requests and testing with Llama3 and Flux examples.
- Currently, a pull request is under review, marking continued development in this area.
- ONNX Frontend Emerges: The creation of tinygrad.frontend.onnx was announced, signaling a focus on ONNX preparation for the week.
- Efforts include validating the top 30 Hugging Face ONNX repos.
LLM Agents (Berkeley MOOC) Discord
- Quiz Title Typo Sparks Confusion: A member reported a typo in the title of Quiz 7, causing confusion when checking answers for Quiz 6.
- Another member acknowledged the catch and thanked the reporter.
- AgentX Research Track Application Opens: Selected students will receive mentorship from Berkeley postdocs/mentors on an AgentX Research Track project, due March 26th at 11:59pm PDT.
- Mentorship is not required to join or succeed in AgentX, and labs plus the Certificate Declaration form will be released in April as seen in the attached image.
- Research Track Goes Remote, Stays Unpaid: A member confirmed that the AgentX Research Track mentorship will be conducted remotely.
- Another member clarified that the mentorship is not paid, with mentors simply providing guidance on the research project.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Codeium (Windsurf) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
GPU MODE ▷ #torch (5 messages):
torch.compile() graph breaks, VRAM reduction techniques, FA3 attention FP8
- torch.compile() and Graph Breaks: An Investigation: A user inquired about how to check for graph breaks when using torch.compile(), noting that tlparse logs yielded missing metrics.
- They noted that training runs fine with torch.compile(model, fullgraph=True), asking if this means there are no graph breaks.
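Two common ways to check, sketched below (API details such as the ExplainOutput fields can vary slightly across PyTorch versions):

```python
import torch

def fn(x):
    if x.sum() > 0:          # data-dependent branch -> likely graph break
        return x.relu()
    return x.sigmoid()

x = torch.randn(8)

# 1) torch._dynamo.explain reports graph breaks and their reasons without failing.
report = torch._dynamo.explain(fn)(x)
print(report.graph_break_count, report.break_reasons)

# 2) fullgraph=True turns any graph break into a hard error on the first call,
#    so a training run that works under it suggests there are indeed no breaks.
try:
    torch.compile(fn, fullgraph=True)(x)
except Exception as err:
    print("graph break detected:", err)
```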
- VRAM Usage Gets Slimmer: A user outlined techniques to reduce VRAM usage, including folding the optimizer step into backward (with a link to a PyTorch tutorial) and offloading optimizer states to the CPU via torchao.
- They also mentioned partially offloading optimizer states with BNB paged optimizers, and pointed to a TorchTune page on memory optimization, referencing a table summarizing components like Model Precision, Activation Checkpointing, and Activation Offloading.
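A minimal sketch of the fused-optimizer-step idea, in the spirit of the linked tutorial (assumes torch>=2.1 for register_post_accumulate_grad_hook; the tiny model is illustrative):

```python
import torch

model = torch.nn.Linear(512, 512)
# One optimizer per parameter so each can be stepped as soon as its grad is ready.
opts = {p: torch.optim.SGD([p], lr=1e-3) for p in model.parameters()}

def step_and_free(param):
    opts[param].step()
    opts[param].zero_grad(set_to_none=True)  # free the grad immediately, saving peak VRAM

for p in model.parameters():
    p.register_post_accumulate_grad_hook(step_and_free)

loss = model(torch.randn(8, 512)).sum()
loss.backward()   # optimizer steps happen inside backward as each grad is accumulated
```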
- Serialized Compiled Models Remain Elusive: A user shared a GitHub issue about the inability to save/load compiled models and asked if anyone is actively working on it.
- The issue describes the bug as Serializing a compiled model with pickle fails.
- For the full PyTorch Developer Podcast breakdown, please visit the web version of this email.
Links mentioned:
- The PyTorch Developer Podcast: Technology Podcast · The PyTorch Developer Podcast is a place for the PyTorch dev team to do bite sized (10-20 min) topics about all sorts of internal development topics in PyTorch.
- How to save memory by fusing the optimizer step into the backward pass — PyTorch Tutorials 2.6.0+cu124 documentation: no description found
- Memory Optimization Overview — torchtune main documentation: no description found
- Make compiled models serializable · Issue #101107 · pytorch/pytorch: 🐛 Describe the bug Serializing a compiled model with pickle fails with Can't pickle local object 'convert_frame.<locals>._convert_frame' and cannot pickle 'ConfigModuleInstance&...
GPU MODE ▷ #announcements (1 messages):
Tanishq Kumar, Scaling Laws for Low Precision, Precision-aware scaling laws, post-training quantization, compute optimal
- Tanishq Kumar Talk on Scaling Laws Incoming: In about 3 hours, Tanishq Kumar will discuss his paper on "Scaling Laws for Low Precision", which introduces precision-aware scaling laws for training and inference.
- Lower Precision Training Scaling Laws: The paper proposes that training in lower precision reduces the model's effective parameter count, enabling the prediction of additional loss from low precision training and post-train quantization.
- It suggests that training larger models in lower precision may be compute optimal.
- Quantization Degradation: The research indicates that the degradation from post-training quantization escalates as models train on more data, potentially making additional pretraining data detrimental.
- The study unifies scaling laws for post and pretraining quantization to predict degradation from training and inference in varied precisions, validated on models up to 1.7B parameters trained on 26B tokens.
Link mentioned: Scaling Laws for Precision: Low precision training and inference affect both the quality and cost of language models, but current scaling laws do not account for this. In this work, we devise "precision-aware" scaling la...
GPU MODE ▷ #cool-links (1 messages):
srivarshan4271: https://lights0123.com/blog/2025/01/07/hip-script/
GPU MODE ▷ #jobs (1 messages):
AI & Neuroscience Fellowship at the University of Oxford, AI / RL in games and neuroimaging, non-invasive diagnosis and treatment of neurological disorders
- Oxford U Opens AI & Neuroscience Fellowship: The University of Oxford has a new opening for a research fellow (postdoc level or equivalent experience) to work on AI / RL in games and neuroimaging with Rui Ponte Costa.
- The salary will be £100k+, with slight adjustments based on experience level, at the Centre for Neural Circuits and Behaviour.
- AI Powers Systems-Behavioral Neuroscience: The fellowship develops an AI-powered technology that can infer the contributions of specific brain regions to behavior by analyzing gameplay data, enabling non-invasive diagnosis and treatment of neurological disorders.
- Their approach leverages state-of-the-art deep reinforcement learning models, specifically MuZero and Dreamer architectures (project link).
- Pillar VC backs AI for Science Fellows: Pillar VC and ARIA are backing AI fellows to spend one year embedded in top science labs in ARIA's Opportunity Spaces across the UK.
- They seek the next generation of founders, scientists, and leaders building AI for science (fellowship link).
Links mentioned:- ARIA Opportunity Space: Scalable Neural Interfaces: no description found
- Encode: AI for Science Fellowship: A fellowship connecting top AI talent with leading science labs in the UK to catalyze translation. Backed by Pillar, powered by ARIA.
GPU MODE ▷ #beginner (56 messages🔥🔥):
GPU/CUDA learning resources, Warp scheduler significance, Context switching, SIMD vs SIMT execution, Flash attention setup on Windows
- GPU Glossary Sparks CUDA Confusion: A member learning about GPUs/CUDA from the Modal GPU glossary expressed confusion about warp schedulers and context switching, specifically about the point of context switching if each thread shares the same instruction pointer.
- Another member explained using an example of 64 threads in two groups, showing how the scheduler executes one warp while another waits for data, similar to CPU context switching but without state storage overhead.
- SIMT Demystified: Data Differentiates Threads: A member clarified that while threads in a warp share the same instruction, the data differs, enabling SIMT (Single Instruction, Multiple Threads) execution where 32 threads can multiply 32 elements in one clock cycle.
- They emphasized that a group of 32 threads is scheduled at once, and context switching brings in a different group of 32, rather than scheduling individual threads one after another.
- Flash Attention Frustrations on Windows VM: A member encountered issues setting up the flash attention repo locally within a Windows/Ubuntu VM, struggling with nvcc version conflicts and potential disruption to existing CUDA/Torch/Triton setups.
- Considering vast.ai for development, they sought recommendations on suitable machines for Triton/CUDA work and guidance on choosing a machine to train a BERT model with custom kernels.
- CUDA Core Confusion Corrected: A member explained that NVIDIA's marketing term "CUDA cores" actually refers to FP32 units, which function similarly to SIMD operations and cannot run independently.
- Warps from different kernels can be scheduled to the same Streaming Multiprocessor (SM) in a finely time-sliced fashion, especially beneficial when threads are waiting for data loads.
- Streaming Multiprocessor Architecture Deep Dive: A member clarified that multiple thread blocks can run on one Streaming Multiprocessor (SM), which is crucial for block synchronization, allowing the SM to have warps ready to run while others await a barrier, referencing H100 Streaming Multiprocessor.
- They explained that resources like registers and shared memory determine the number of resident thread blocks, and the warp scheduler context switches between warps to keep processing units busy.
Link mentioned: GPU Glossary: A glossary of terms related to GPUs.
GPU MODE ▷ #pmpp-book (1 messages):
Amazon Book Release Date, 5th Edition of Book
- Fifth Edition Release Date Spotted on Amazon: A member reported seeing a 5th edition of an unspecified book listed on Amazon with a scheduled release date of February 2026.
- Release Date Unconfirmed: Another member requested confirmation of this release date.
GPU MODE ▷ #jax (1 messages):
bigfoot1144: Any progress so far?
GPU MODE ▷ #rocm (2 messages):
ROCm, tilelang HIP backend, row-row bank conflict-free swizzle, AMD sponsoring cards
- Seeking ROCm Row-Row Bank Conflict-Free Swizzle Implementation: A member is seeking ROCm experts to help implement a row-row bank conflict-free swizzle for the tilelang HIP backend.
- Currently, they only have solutions for NT layout conflict swizzling, and are requesting assistance from the community.
- AMD Card Sponsorship Plea for ROCm Development: The same member jokingly requested that AMD sponsor some cards for development related to ROCm.
- This highlights the resource constraints faced by some developers in the ROCm ecosystem.
GPU MODE ▷ #lecture-qa (2 messages):
Hopper Flops, H100 Clock Rate, H100 SMs, Nvidia Boost Clocks
- H100's Dense FLOPs Revealed: For fp16/bf16, dense FLOPs in Hopper = 989 TFLOPS, with an H100 clock rate of 1.830 GHz and 132 SMs.
- The FLOPs / clock / SM = (989 × 10^3 GFLOPS) / (1.83 GHz) / (132 SMs), which is approximately 4096.
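The same arithmetic written out, with FMA = 2 FLOPs used only to interpret the result:

```python
# Dense BF16/FP16 tensor-core throughput per SM per clock on H100, from the figures above.
peak_flops = 989e12   # 989 TFLOPS
clock_hz = 1.83e9     # 1.830 GHz tensor-core boost clock
num_sms = 132

flops_per_clock_per_sm = peak_flops / clock_hz / num_sms
print(round(flops_per_clock_per_sm))  # ~4094, i.e. ≈ 4096 FLOPs ≈ 2048 FMAs per SM per clock
```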
- Nvidia's Seldom-Mentioned Boost Clock Detailed: The H100 SXM has a boost clock of 1.980 GHz for normal SM operation, but if you use tensor cores it drops down to 1.830 or lower depending on power draw/thermals.
- There are some rare conditions where you get the full boost clock when running TC ops but strangely that's not always the case.
- Official Hopper Boost Clock Document Located: A document was shared which mentions the different boost clocks (GTC22 Whitepaper).
- The different boost clocks can be found in table 3, page 39 of the document.
Link mentioned: NVIDIA H100 Tensor Core GPU Architecture Overview: A high-level overview of NVIDIA H100, new H100-based DGX, DGX SuperPOD, and HGX systems, and a H100-based Converged Accelerator. This is followed by a deep dive into the H100 hardware architecture, ef...
GPU MODE ▷ #tilelang (10 messages🔥):
Tilelang 2:4 sparsity support, Tilelang v0.1.3 Release, SPGEMM issue
- Tilelang to Support 2:4 Sparsity: Tilelang plans to support 2:4 sparsity, leveraging Cute as a backend, although the user acknowledges its current uncommonness in AI workloads.
- A user expressed interest in fine-tuning 2:4 sparse LLMs, noting its success with vision models, but uncertainty about its impact on LLM accuracy.
- Tilelang v0.1.3 lands with Cute Upgrades: Tilelang released v0.1.3, featuring enhancements, optimizations, and bug fixes, including Cute upgrades.
- The release includes new kernels and tutorials such as DeepGEMM, plus autotuning and kernel caches, among other new features.
- Request to add SPGEMM Issue: A TileLang dev requested that users interested in trying Tilelang for SPGEMM should open an issue on GitHub.
- A user indicated that they would be interested in seeing progress on this if the dev team investigates further.
Link mentioned: Release v0.1.3 · tile-ai/tilelang: What's Changed[Docker] Add libstdcxx-ng-12 to Dockerfiles for CUDA versions by @LeiWang1999 in #160Add cpu jit with backend ctypes by @xs-keju in #154[Carver] Multi-Threads Compilation for Fast...
GPU MODE ▷ #metal (3 messages):
Parallelized Cholesky, Python + MLX + Metal
- Parallelized Cholesky accelerates with Python + MLX + Metal: A member shared their contribution to the community: a super high speed parallelized Cholesky in Python + MLX + Metal, along with an attached Python file.
- Another member commented this is really cool.
- MLX gains momentum: The community sees growing interest in the MLX framework for Metal.
- MLX seems to be unlocking new possibilities in high-speed computing.
GPU MODE ▷ #self-promotion (10 messages🔥):
WheelNext Initiative, CUDA Indexing Blogpost, Container-First Triton Development, GemLite bfloat16 Support
- *WheelNext* Gears Up to Enhance Python Packaging: The WheelNext initiative (wheelnext.dev) aims to improve the user experience in the Python packaging ecosystem, focusing on scientific computing and machine/deep learning.
- A meetup was announced to discuss making shipping python packages with native accelerator code much easier, with details available on Discord.
- Dive into CUDA Indexing with New Blogpost: A member shared a blog post explaining CUDA indexing with a 2D block tiling example for matrix multiplication, emphasizing row-major format.
- The post details how a 2D array A with shape (M, N) in CUDA is linearized in row-major format, mapping the coordinate (i, j) to i * N + j.
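A quick NumPy check of that mapping (the CUDA thread-coordinate comment is illustrative):

```python
import numpy as np

# For a row-major array A of shape (M, N), element (i, j) sits at flat offset i * N + j.
M, N = 4, 6
A = np.arange(M * N, dtype=np.float32).reshape(M, N)  # NumPy is row-major by default
flat = A.ravel()

i, j = 2, 5
assert A[i, j] == flat[i * N + j]

# A CUDA kernel computes the same offset from thread coordinates, e.g.
# row = blockIdx.y * blockDim.y + threadIdx.y; col = blockIdx.x * blockDim.x + threadIdx.x;
# idx = row * N + col.
print(f"A[{i},{j}] = {A[i, j]:.0f} at flat index {i * N + j}")
```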
- Container-First Approach Streamlines Triton Development: A member highlighted a new blog post about using containers to simplify and accelerate Triton kernel development.
- The post emphasizes how containerization enhances the Triton development workflow by simplifying setup, increasing consistency, and enabling more seamless collaboration.
- *GemLite* Adds bfloat16 Support for Gemma Models: GemLite now supports bfloat16 on both Hopper and non-Hopper GPUs, enabling the running of Gemma models in vllm via hqq.
- More details are available in the associated tweet and on the github pull request.
Links mentioned:- WheelNext: no description found
- Tweet from mobicham (@mobicham): GemLite now supports bfloat16 on both Hopper and non-Hopper gpus 🫡https://github.com/mobiusml/gemlite/pull/24
- Indexing in CUDA: In this blogpost I want to explain what it means for a matrix to be in row major format. This is essential to understand CUDA kernels and their methods ...
- A container-first approach to Triton development: The Triton project from OpenAI is at the forefront of a groundbreaking movement to democratize AI accelerators and GPU kernel programming. It provides a powerful and flexible framework for writi...
GPU MODE ▷ #🍿 (1 messages):
LLM Kernel Understanding, RL for Operation Understanding, Reducing Hallucinations in Kernel Creation
- LLMs Demystify Kernel Code: The idea is to use LLMs to understand kernel code, explaining simple concepts and variable states at specific places in tensors.
- This aims to ensure the LLM grasps the underlying operations.
- RL Supercharges Kernel Operation Grasp: Employ Reinforcement Learning (RL) to enhance the model's understanding of operations, ensuring a solid grasp.
- This solid grasp of kernel operations can serve as a prerequisite for creating complex kernels and potentially reducing hallucinations.
- Kernel Creation Sanity Check with LLMs: Using LLMs to verify and explain kernel operations could greatly reduce hallucinations during the complex kernel creation process.
- Such method could be seen as a sanity check for complex kernel code and design.
GPU MODE ▷ #reasoning-gym (5 messages):
veRL rollouts with sglang, low precision data types, quantization strategies for RL, ARC-AGI2 announcement
- veRL rolls out sglang support: veRL now supports rollouts with sglang as shown in this paper.
- Tiny Model Reasoning with GRPO: A study showed reinforcement learning (RL) improving reasoning in small language models (LLMs), specifically a 1.5B parameter model trained on 4 NVIDIA A40 GPUs in 24 hours.
- Adapting the Group Relative Policy Optimization (GRPO) algorithm on a curated dataset, the model achieved significant gains, such as AMC23 accuracy rising from 63% to 80% and AIME24 reaching 46.7%, with a training cost of only $42.
- ARC-AGI2 frontier benchmark: A member shared the ARC-AGI-2 announcement, a frontier AGI benchmark challenging AI reasoning systems.
- The goal is to achieve 85% accuracy with ~$0.42/task efficiency, contrasting sharply with current performance levels of base LLMs at 0% and reasoning systems at under 4%.
Links mentioned:- Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't: Enhancing the reasoning capabilities of large language models (LLMs) typically relies on massive computational resources and extensive datasets, limiting accessibility for resource-constrained setting...
- Tweet from ARC Prize (@arcprize): Today we are announcing ARC-AGI-2, an unsaturated frontier AGI benchmark that challenges AI reasoning systems (same relative ease for humans).Grand Prize: 85%, ~$0.42/task efficiencyCurrent Performanc...
GPU MODE ▷ #gpu模式 (5 messages):
CUDA core, CUDA_fp6.hpp, CUDA_fp4.hpp
- CUDA core's fp4 and fp6 use cases requested: A member inquired about which libraries utilize fp4 and fp6 within the CUDA core, referencing the presence of cuda_fp6.hpp and cuda_fp4.hpp header files in version 12.8.
- However, they noted difficulty in locating libraries that actively employ these header files.
- CUDA FP4/FP6 Library Usage: The user is asking about the usage of FP4 and FP6 data types within CUDA cores, specifically if any libraries are utilizing them.
- They have identified header files (cuda_fp6.hpp and cuda_fp4.hpp) in CUDA version 12.8, but haven't found examples of their practical application in existing libraries.
GPU MODE ▷ #general (9 messages🔥):
Submission Guide, Kernel profiling, Conv2D error
- Submission Guide Available: A member asked for a submission guide and another member shared a link to the documentation for the GPU kernel leaderboard, which is a competition platform on Discord where users can submit their own kernel implementations.
- Kernel Profiling Coming Soon!: A member asked if it was possible to profile their triton kernel via the bot itself.
- The response was that we do not currently have that possibility, but it's in store and you (most likely) can expect it for the first problem set launch.
- Conv2D Submission Error: A member reported getting a consistent error when submitting to conv2d involving subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1, and asked if it meant their CUDA source couldn't compile.
- The member was new to CUDA and C++ and was seeking assistance from the community.
Link mentioned: Getting Started | GPU MODE Kernel Leaderboard: Welcome! If you are excited about building GPU kernels, this leaderboard is the place for you! We
GPU MODE ▷ #submissions (119 messages🔥🔥):
matmul benchmarks on H100, grayscale benchmarks on A100, grayscale benchmarks on T4, L4, A100, H100, histogram benchmarks on T4, vectorsum tests on A100
- Modal Runners Ace Matmul Benchmarks on H100: Numerous matmul benchmarks and tests using Modal runners on H100 GPUs have succeeded, with submission IDs ranging from 2479 to 2487.
- These submissions indicate successful execution and integration of Modal runners for matrix multiplication tasks on high-performance GPUs.
- Grayscale Gauntlet on A100 GPUs: A multitude of grayscale benchmark and leaderboard submissions have succeeded on A100 GPUs using Modal runners, with submission IDs spanning from 2488 to 2596 and beyond.
- These consistent successes highlight the reliability and efficiency of Modal runners for image processing tasks on A100 GPUs.
- Grayscale Greatness Across GPUs: Leaderboard submissions for grayscale using Modal runners have succeeded across various GPUs, including T4, L4, A100, and H100, with an initial submission ID of 2484.
- This demonstrates the versatility of Modal runners in handling image processing tasks on diverse GPU architectures.
- Histogram Hit on T4 GPUs: A histogram benchmark submission with ID 2765 using Modal runners on T4 GPUs has succeeded.
- This indicates successful execution of histogram computation tasks on T4 GPUs utilizing the Modal runners platform.
- Vector Sum Victory and Conv2d Conquest on A100: Test submissions for vectorsum and conv2d have succeeded on A100 GPUs using Modal runners with IDs 2829 and 2830.
- These successful tests highlight the capability of Modal runners in handling vector operations and convolutional tasks on high-performance GPUs.
GPU MODE ▷ #status (2 messages):
CUDA, load_inline(), PyTorch headers, KernelBot
- load_inline() Timed Out Due to Excessive PyTorch Headers: CUDA submissions using load_inline() were timing out because about 5K PyTorch headers were being added, as investigated in this PR.
- A new mode was added to disable implicitly adding headers, and one member managed to get an example compiling from 90s to 15s, while a colleague got it from 15s to 5s.
- KernelBot leaderboard performance improved: The KernelBot leaderboard supports custom CUDA extensions via load_inline(), which previously resulted in cold starts of up to 90s.
- A member stated that they always thought it was a CUDA problem, and was happy it could be solved.
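For context, a minimal load_inline() sketch of the kind of submission involved; the kernel is illustrative, and the new header-skipping mode is not shown since its exact keyword lives in the linked PR:

```python
from torch.utils.cpp_extension import load_inline

cuda_src = r"""
__global__ void scale_kernel(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}
torch::Tensor scale(torch::Tensor x, double a) {
    int n = x.numel();
    scale_kernel<<<(n + 255) / 256, 256>>>(x.data_ptr<float>(), (float)a, n);
    return x;
}
"""
cpp_src = "torch::Tensor scale(torch::Tensor x, double a);"

mod = load_inline(
    name="scale_ext",
    cpp_sources=cpp_src,
    cuda_sources=cuda_src,
    functions=["scale"],
    verbose=True,       # compilation of the implicit headers is where the 90s cold start went
)
# usage: mod.scale(torch.ones(1024, device="cuda"), 2.0)
```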
Link mentioned: load_inline no_implicit_headers mode by msaroufim · Pull Request #149480 · pytorch/pytorch: In the kernelBot leaderboard we support people competing with custom cuda extensions via load_inline(), however even on toy kernels this can result in cold starts of up to 90s - this problem is pri...
GPU MODE ▷ #hardware (17 messages🔥):
GPU prices, VRAM requirements for LLMs, RTX Pro 6000, CUDA Capability
- GPU Prices Skyrocket Amid AI Boom: High-end consumer GPUs are becoming increasingly expensive due to NVIDIA's strategy of limiting high VRAM to those models, but cloud vendors like vast.ai and Nebius offer cheaper alternatives for running models.
- One member stated, "welcome to the ai boom," highlighting the impact of AI on GPU pricing and availability.
- Max out budget on older GPUs, run stuff locally: For local machine learning, investing in older cards like 3090 or 4090 is suggested for maximizing budget, with 2x3090 potentially outperforming a single newer card, allowing for local distributed training.
- The assertion was made that older cards provide opportunities to learn distributed stuff locally.
- Nvidia desensitizes users to high prices: The new RTX Pro 6000, with 96GB VRAM, is considered a reasonable option for professionals, normalizing the perception of high GPU costs, although it lacks NVLink.
- One member noted, "Actl i think nvidia has successfully desensitized me to their insance prices," suggesting an adjustment in expectations due to market trends.
- GDDR7 memory: The RTX Pro 6000 features 96 GB GDDR7 with ECC and 1792 GB/sec bandwidth, although discrepancies exist between the CUDA API versions reported in the Data Sheet and the TPU (TechPowerUp) specifications.
- The spec sheet reports Compute APIs as CUDA 11.6, while TPU claims CUDA 10.1, and the member highlighted that the CUDA GPUs list shows the GeForce RTX 50 series with C.C. 10.0 instead of 12.0.
GPU MODE ▷ #tpu (1 messages):
rocka2424: This is awesome, looking forward to it!
Interconnects (Nathan Lambert) ▷ #news (86 messages🔥🔥):
Nvidia Mamba-Transformer Hybrid, Qwen 2.5 Omni Model, DeepSeek V3 Model Update, Reve Image Halfmoon Model, Qwen2.5-VL-32B-Instruct
- Nvidia engineers a Nemotron-H Mamba-Transformer hybrid: Nvidia introduced the Nemotron-H family of models, including a series of 8B and 47-56B models that are hybrid Mamba-Transformer models, offering improved inference speed compared to other models, according to their research.
- Qwen Debuts Qwen2.5-Omni: An End-to-End Streaming Multimodal Model: Qwen released Qwen2.5-Omni, a multimodal model designed to perceive text, images, audio, and video, while generating text and natural speech responses in a streaming manner, according to HuggingFace.
- *DeepSeek V3* Gets a Quick Update, Still Rocks Leaderboards: DeepSeek announced a small version upgrade for the DeepSeek V3 model, with the API interface and usage method remaining unchanged, according to their HuggingFace page.
- Reve Image Launches Halfmoon: Claims Top Spot in Image Generation: Reve Image launched Halfmoon, claiming it's the best image model in the world, with impressive text rendering, prompt adherence, and aesthetics, currently accessible through their website, according to their announcement.
- Qwen Drops Qwen2.5-VL-32B-Instruct: Open Source VL Model with RLHF: Qwen open-sourced the Qwen2.5-VL-32B-Instruct model under the Apache 2.0 license, optimized with reinforcement learning, showing significant improvements in human preference and mathematical reasoning, according to their blog.
Links mentioned:- Qwen2.5-VL-32B: Smarter and Lighter: QWEN CHAT GITHUB HUGGING FACE MODELSCOPE DISCORDIntroduction At the end of January this year, we launched the Qwen2.5-VL series of models, which received widespread attention and positive feedback fro...
- Tweet from ARC Prize (@arcprize): Today we are announcing ARC-AGI-2, an unsaturated frontier AGI benchmark that challenges AI reasoning systems (same relative ease for humans).Grand Prize: 85%, ~$0.42/task efficiencyCurrent Performanc...
- Tweet from 坂本 (@AkemiMadoka): @teortaxesTex looks like a small update on v3
- Tweet from Simon Willison (@simonw): Notes on today's DeepSeek v3 0324 model - a 641 GB MIT licensed monster, but you can run it on a ~$10,000 consumer level 512GB M3 Mac Studio if you use the 352 GB quantized version via MLX https:/...
- Tweet from Reve (@reveimage): Halfmoon is Reve Image — and it’s the best image model in the world 🥇(🔊)
- Tweet from Xeophon (@TheXeophon): Tested the new DeepSeek V3 on my internal bench and it has a huge jump in all metrics on all tests.It is now the best non-reasoning model, dethroning Sonnet 3.5.Congrats @deepseek_ai!
- Tweet from Qwen (@Alibaba_Qwen): 72B too big for VLM? 7B not strong enough! Then you should use our 32B model, Qwen2.5-VL-32B-Instruct!Blog: https://qwenlm.github.io/blog/qwen2.5-vl-32b/Qwen Chat: https://chat.qwen.aiHF: https://hugg...
- Tweet from PicoCreator - AI Model Builder 🌉 (@picocreator): ❗️Attention is NOT all you need ❗️Using only 8 GPU's (not a cluster), we trained a Qwerky-72B (and 32B), without any transformer attentionWith evals far surpassing GPT 3.5 turbo, and closing in on...
- Tweet from Artificial Analysis (@ArtificialAnlys): The Halfmoon 🌓 reveal: Congratulations to @reveimage on creating the world’s leading image generation model with Reve Image!Reve Image has been in the Artificial Analysis Image Arena over the past we...
- Tweet from Aaron Meurer (@asmeurer): @simonw The license update is a big deal. The original V3 was not MIT.
- Tweet from Chubby♨️ (@kimmonismus): Sora abandons credits for all paid tiers, unlimited generations available.This is a good change.
- Tweet from Tibor Blaho (@btibor91): @TheXeophon https://x.com/btibor91/status/1899917834496729259?s=61Quoting Tibor Blaho (@btibor91) @TheXeophon-bench
- MambaVision - a nvidia Collection: no description found
- deepseek-ai/DeepSeek-V3-0324 · Hugging Face: no description found
- Nemotron-H: A Family of Accurate, Efficient Hybrid Mamba-Transformer Models: Nemotron-H is a series of hybrid Mamba-Transformer models which offer either better or on-par accuracy and improved inference speed (up to 3x) compared to other similarly-sized state-of-the-art open-s...
- Add Qwen2.5-Omni by BakerBunker · Pull Request #36752 · huggingface/transformers: What does this PR do?Add Qwen2.5 Omni ModelBefore submitting This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). Did you read the contributor...
Interconnects (Nathan Lambert) ▷ #ml-questions (25 messages🔥):
Impact of noisy data in multi-turn SFT, Transformer usage in RL, Community model preferences, Trusting eval benchmarks, Gemini's image generation
- Noise Tolerated in Multi-Turn SFT?: A member questioned how much noise impacts data quality in multi-turn SFT, especially with complex agent trajectories, suggesting that some noise is tolerable, recovery steps are valuable, and erroneous turns can be masked.
- They shared that it's difficult to collect perfect trajectories when the complexity and step count increases, like making a wrong decision about which site to go to for information or which application to use to open a file.
- Transformers Slow to Take Over RL?: A member inquired about the limited use of Transformers in RL policy models, suspecting it's due to compute and memory constraints.
- They are having trouble finding many papers where they actually used a small Transformer.
- Community Prefers Claude 3.5 for Code?: A member asked if Interconnects publishes community-preferred model lists, noting their preference for Claude 3.5 over Claude 3.7 for code, but the opposite for reasoning.
- Another member mentioned that Interconnects does not publish model lists, but they hope to add more evals to their artifacts logs series when possible.
- Private Evals > Benchmarks: Multiple members discussed trusting model eval benchmarks, with one stating Don’t trust them; have my own eval, and recommending creating a markdown file with 5-10 prompts that you care about.
- The suggestion was to run prompts with multiple models side by side in tools such as Chorus to quickly get a feel which model is good for which things.
- Gemini's Generator a Mystery?: A member inquired whether the new Gemini's image generation is autoregressive or uses a diffusion head, but its architecture remains unknown.
- Another member mentioned that labs know which websites to include to boost common benchmarks during training.
Link mentioned: Building on evaluation quicksand: On the state of evaluation for language models.
Interconnects (Nathan Lambert) ▷ #random (36 messages🔥):
LLM input/output tokens, o1-pro performance, Mistral 24B is impressive, Claude Compass starter prompts, DAPO and Dr. GRPO
- LLMs Count Input and Output Tokens: In LLMs, both input tokens and output tokens are counted during Supervised Fine-Tuning (SFT), clarifying an initial question about token handling.
- A member confirmed the token counting and humorously remarked that, "With the cost of those tokens, he could’ve bought the NYT."
- o1-pro Dominates Extended NYT Connections Benchmark: o1-pro set a new record on the Extended NYT Connections benchmark with a score of 81.7, surpassing the previous champion, o1 at 69.7, as noted in a tweet.
- The benchmark is a more challenging version of the original, with additional words per puzzle.
- Mistral 24B Impresses Community, Reputation Recovers: The release of Mistral 24B is considered a major highlight, praised for its strength and accessibility of the base model, and the promise of new open releases under Apache 2.0 is aiding in reputation recovery.
- One member stated, "Mistral 24B is probably one of the greatest releases in the last months, incredibly strong model and you have access to the base model as well."
- Claude Compass Launches Prompts: A member shared a tweet of Claude Compass's starter prompts which are deep research prompts such as 'Find credible sources for my research' and 'Analyze great investment pitches'.
- It was also noted that another company named Cohere already has a product named Compass.
- DAPO and Dr. GRPO Papers: A member is mastering DAPO and Dr. GRPO for an upcoming blog post, planning to review relevant papers and improve the RLHF book implementation section on tradeoffs.
- The notes are complete, and the member is considering covering DAPO and Dr. GRPO together, possibly deferring the rest to a future post.
Links mentioned:- Tweet from Andreas Köpf (@neurosp1ke): added mistral-small-3.1-24b-instruct
- Tweet from Lech Mazur (@LechMazur): o1-pro sets a new record on my Extended NYT Connections benchmark with a score of 81.7, easily outperforming the previous champion, o1 (69.7)! This benchmark is a more difficult version of my original...
- Tweet from Tibor Blaho (@btibor91): New: Claude Compass (deep research) starter prompts- "Find credible sources for my research"- "Provide evidence-based insights for my topic"- "Research topics for my writing"- ...
- Tweet from Lech Mazur (@LechMazur): @bradthilton I might benchmark a shorter version of hallucinations, but no chance I'm running other benchmarks.
- Llm Pricing - a Hugging Face Space by Presidentlin: no description found
Interconnects (Nathan Lambert) ▷ #memes (4 messages):
O1-pro vs BoN, O1-pro reasoning paths marginalization, Tech CEOs in Open Source RL
- O1-pro excels in Reasoning Path Merging: A member suggested that O1-pro seems more like merging reasoning paths with correct answers than simple BoN (Best-of-N) sampling.
- They noted that the output length from o1-pro is usually a lot longer than o1 but didn't know how to marginalize reasoning paths though.
- Tech CEOs champion Open Source RL: Nathan Lambert shared a post that stated major tech company CEOs are arguing for very cutting edge defaults in open-source RL repos.
- He concluded that this timeline is amazing.
Links mentioned:- Tweet from undefined: no description found
- Tweet from Nathan Lambert (@natolambert): Lol when major tech company CEOs are arguing for very cutting edge defaults in open-source RL repo's. This timeline is amazing.
Interconnects (Nathan Lambert) ▷ #rl (127 messages🔥🔥):
R1-Zero Training, GRPO Bias, LOOP & RLOO, PPO Objective, Creative Writing LLMs
- Row Mean's Length Bias Unmasked in R1-Zero Training: An analysis reveals that using row mean in R1-Zero-like training introduces a bias, favoring shorter correct responses and longer incorrect ones, as detailed in a paper and accompanying code.
- Switching to all mean yields comparable performance without increasing length; leading to questions about plots showing increasing reasoning length correlating with increased capability.
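A toy illustration of the two normalizations being compared (numbers are mine, not the paper's):

```python
import numpy as np

# Per-token loss weights implied by the two normalizations, for one short
# (10-token) and one long (200-token) response in a batch of 2.
lengths = np.array([10, 200])

row_mean_weights = 1.0 / (len(lengths) * lengths)              # mean per response, then mean over batch
all_mean_weights = np.full(len(lengths), 1.0 / lengths.sum())  # single mean over all tokens

print(row_mean_weights)  # [0.05, 0.0025] -> short-response tokens weighted ~20x more
print(all_mean_weights)  # [~0.00476, ~0.00476] -> every token weighted equally
```

Under row mean, a correct short answer's positive advantage is amplified per token while a long incorrect answer's penalty is spread thin, which is the direction of bias described above.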
- GRPO's Length Explosion Problem Plagues Practitioners: Users observed length explosion in their GRPO runs, prompting consideration of techniques like length curriculum or clipping, though these are seen as unsatisfactory band-aids.
- The core issue is garbage responses are being generated when responses are getting longer; this implies a deeper problem beyond length.
- Prefix Caching for vLLM Causes RL Issues: Members found that prefix caching for vLLM may be causing RL issues as stated in this github issue.
- Specifically, inference was worse than training and identified this caching as the culprit, demonstrating a subtle issue that may be overlooked.
- LOOP and RLOO Arise from Unbiasing Dr. GRPO: It was suggested that Dr. GRPO still has a bias that is more pronounced the smaller the group size is; to make it unbiased, simply multiply Dr. GRPO's A_i by the correction term N/(N-1), resulting in LOOP (Leave-One-Out Proximal Policy Optimization), detailed in the Dr GRPO paper.
- Removing PPO’s clipping yields RLOO (Reinforce Leave-One-Out).
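A quick numerical check of that equivalence (a standalone illustration, not code from either paper):

```python
import numpy as np

# Multiplying the group-mean-baseline advantage (Dr. GRPO style) by N/(N-1)
# recovers the leave-one-out (RLOO/LOOP-style) baseline exactly.
rng = np.random.default_rng(0)
N = 8
rewards = rng.random(N)                       # rewards for one prompt's N samples

a_group_mean = rewards - rewards.mean()       # baseline includes the sample itself
loo_baseline = (rewards.sum() - rewards) / (N - 1)
a_loo = rewards - loo_baseline                # leave-one-out baseline

assert np.allclose(a_group_mean * N / (N - 1), a_loo)
print(a_loo)
```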
- Deviation-Based DPO Diversifies Creative LLM Writing: A new paper explores promoting both output diversity and quality in creative writing LLMs, by including deviation in the training objective to facilitate learning from rare high-quality instances.
- The study adopts this approach to Direct Preference Optimization (DPO) and Odds Ratio Preference Optimization (ORPO).
Links mentioned:- Modifying Large Language Model Post-Training for Diverse Creative Writing: As creative writing tasks do not have singular correct answers, large language models (LLMs) trained to perform these tasks should be able to generate diverse valid outputs. However, LLM post-training...
- Tweet from Zichen Liu (@zzlccc): 🪂Understanding R1-Zero-Like Training: A Critical Perspective* DeepSeek-V3-Base already exhibits "Aha moment" before RL-tuning??* The ever-increasing output length in RL-tuning might be due to...
- Tweet from Tassel Pierre (@tassel_pierre): @elan_marko @zzlccc Yes but you apply the loss per token. So if you have two completions with positive advantage rewards, even if the longer one has a slightly less positive one, because it is applied...
- Tweet from Quentin Gallouédec (@QGallouedec): Thanks @ethayarajh @zzlccc!I have exactly the same question as @ethayarajh. In trl, we don't do (anymore) this par per-sequence normalization that leads to response level length bias. Instead, we ...
- Tweet from Wenhu Chen (@WenhuChen): This paper provides some really interesting insights:1. Previously, people found that Qwen base models are particularly good at R1 training to show strong exploration skills. - This paper shows that t...
- Tweet from leloy! (@leloykun): I'm not sure if someone has already pointed this out, but Dr. GRPO still has a bias that is more pronounced the smaller the group size is.To make it unbiased, simply multiply Dr. GRPO's A_i by...
- Where does the proximal policy optimization objective's ratio term come from? (ai.stackexchange.com): I will use the notation used in the proximal policy optimization paper. What approximation is needed to arrive at the surrogate objective (equation (6) above) with the ratio $r_t(\theta)$?
- Prefix Caching should be turned off for GRPO · Issue #491 · huggingface/open-r1: The performance of my runs during inference was way worse than the performance during training. After debugging, I think prefix caching is the culprit behind this. Since the model is constantly bei...
- oat/oat/algorithms/ppo.py at 7619b79a8804e813419faeda22bdd35cc4d9b9bd · sail-sg/oat: 🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc. - sail-sg/oat
- trl/trl/trainer/grpo_trainer.py at 07cfe1677e552b7d5c92b7740e5b2f0b057661d8 · huggingface/trl: Train transformer language models with reinforcement learning. - huggingface/trl
- trl/trl/trainer/ppo_trainer.py at 07cfe1677e552b7d5c92b7740e5b2f0b057661d8 · huggingface/trl: Train transformer language models with reinforcement learning. - huggingface/trl
Interconnects (Nathan Lambert) ▷ #cv (6 messages):
Operator agent limitations, Infinibranch Browsers as a solution, Intelligent Browser Automation
- Operator Agents Lack Managerial Skills: Members discussed the limitations of Operator agents, noting they struggle with complex tasks requiring coordination, such as extracting information from datasets; one person commented on needing a manager agent that tells one operator agent to get the details for one dataset.
- One member expressed frustration with the limited success rate, achieving only 4 out of 10 tasks with Operator and 6 with deep research.
- Infinibranch Browsers Reach 80% Success: A possible solution using Morph Cloud's Infinibranch Browser was suggested to help scale browser-use agents, improving the success rate to approximately 80% on tasks like finding Amazon links for a list of books.
- The original poster on X, Andrew Carr, needed to extract links for 1000+ books into a Google Sheet, which Operator was unable to hack.
- Morph Cloud Scales Autonomous Browser Workflows: Morph Cloud (https://morph.so/blog/browser-morph-cloud/) allows users to snapshot and branch complete browser states, including authentication and cookies, making it easier to scale autonomous browser workflows across multiple parallel instances.
- The blogpost further explains how traditional web scraping methods have become obsolete because of JavaScript-heavy single-page applications, dynamic loading and infinite scroll, complex user interactions required to access data, CAPTCHAs and sophisticated bot detection, and multi-step workflows that require understanding context.
Links mentioned:
- Tweet from Andrew Carr (e/🤸) (@andrew_n_carr): I have a very specific agentic use case that is just hard enough that web scraping doesn't work. 1. I have a list of 1000+ books 2. I want to find their amazon links 3. I would like those saved in a...
- Tweet from Morph (@morph_labs): Announcing Infinibranch Browsers. Morph Cloud's Infinibranch Browser scales the browser-use agent into a ~80% success rate on the list of books below. Operator doesn't get past 10%. Quoting Andrew C...
- Remote Browsers with Morph Cloud: Infinitely Scalable Browser Automation: no description found
Interconnects (Nathan Lambert) ▷ #reads (16 messages🔥):
R1-Zero-Like Training, DeepSeek-V3-Base, GRPO Bias in RL-tuning, CoT Philosophy, Math errors in AI papers
- R1-Zero Training: New Insights Emerge: A Twitter thread highlights key observations about R1-Zero-like training, suggesting DeepSeek-V3-Base shows an 'Aha moment' before RL-tuning.
- The researchers point to a potential bias in GRPO contributing to ever-increasing output length, detailing findings in a paper and providing code.
- GRPO Loss Implementation Analysis: Multiple papers this week discuss the 1/o term and its impact on longer examples, suggesting that the loss penalizes long, repetitive behaviors less while not rewarding long, exploratory generations sufficiently.
- They note that per-question normalization punishes hard questions within a batch.
- Chain of Thought and Reasoning: A member questioned if advancements are truly about reasoning or if they leverage tokens to overcome inefficiencies in task-specific next-token completion/search.
- Another suggested the viability of Chain of Thought as a form of language model reasoning, describing reasoning as very broad.
- Mathematical Concerns about Paper Calculations: There was discussion in an AI2 Slack channel suggesting potential errors or anomalies in the math presented in the paper.
- Some members expressed confusion regarding the paper's argument about length normalization bias, with further discussion occurring in a linked channel where a member provided an explanation.
Link mentioned: Tweet from Zichen Liu (@zzlccc): 🪂Understanding R1-Zero-Like Training: A Critical Perspective * DeepSeek-V3-Base already exhibits "Aha moment" before RL-tuning?? * The ever-increasing output length in RL-tuning might be due to...
--- ### **Interconnects (Nathan Lambert) ▷ #[lectures-and-projects](https://discord.com/channels/1179127597926469703/1223784028428177510/1353807028715393156)** (2 messages): > `Claude PR, Header Copy Links` - **Claude Sends Pull Request for Header Copy Links**: A member shared a [pull request](https://github.com/natolambert/rlhf-book/pull/82) made by **Claude** for adding header copy links to a GitHub repository. - **Header Copy Links Amaze**: Members found the header copy links that appear on hover to be interesting and useful. - They attached a [screenshot](https://cdn.discordapp.com/attachments/1223784028428177510/1353807029223030835/Screenshot_2025-03-24_at_12.05.34_PM.png?ex=67e2fe8c&is=67e1ad0c&hm=41d19137d3231c38197bef45a02356a9b88f754b907ba8a3f1028543cb17349e&) of the links, noting that they *worked immediately with claude code*. **Link mentioned**: <a href="https://github.com/natolambert/rlhf-book/pull/82">(experimental) Add heading anchor links for easy section linking by natolambert · Pull Request #82 · natolambert/rlhf-book</a>: Add copyable links to all headings that appear on hoverLinks copy the current URL with fragment identifier to clipboardAdd CSS for styling the anchor linksUpdate Makefile to copy new JS file to ... --- ### **Interconnects (Nathan Lambert) ▷ #[policy](https://discord.com/channels/1179127597926469703/1325523782806274089/1353105005778833449)** (9 messages🔥): > `China's Open Source AI Blitz, DeepSeek's Impact, US vs China AI Competition, Chinese consumer market for software, China commoditizing hardware` - **China Plans Open-Source AI Blitz**: According to [this tweet](https://x.com/balajis/status/1903469483739730132), China aims to flood the market with open-source AI models to **commoditize AI software** and boost its hardware sales. - The strategy is to copy, optimize, scale, and undercut Western tech, similar to their approach with manufacturing, with **DeepSeek** being a key player. - **DeepSeek Triggers Tech Market Tumble**: The release of **DeepSeek** models temporarily knocked ~$1T off US tech market caps, highlighting the potential impact of Chinese AI on global markets, per [this tweet](https://x.com/balajis/status/1903469483739730132). - The founder of DeepSeek (**Liang Wengfeng**) has met with top Chinese officials, indicating significant state support and access to *unlimited resources*. - **China's AI Competition**: A member stated that China's push in open-source AI is driven by intense domestic competition, aiming to accelerate progress rather than *bring down US tech*. - They added that most top Chinese labs realize open source is the best way to drive progress because *your close source model will be irrelevant in 3-6 mths or so, might as well accelerate*. - **Revenue from Ads and Digital Services Lower in China than US**: A member pointed out that Chinese companies aren't trying to destroy American value as a goal. - The revenue market for ads and digital services isnt the same as in the US with *much less revenue in ads and digital services in china than US* and for this reason open sourcing is more fine, as well. - **Chinese Consumers Reluctant to Pay for Software**: Chinese consumers generally avoid paying for software and services, with students and professionals being the primary payers. - The consumer market is largely dominated by **ByteDance** and previously by **Kimi**. 
**Link mentioned**: <a href="https://x.com/balajis/status/1903469483739730132">Tweet from Balaji (@balajis)</a>: AI OVERPRODUCTIONChina seeks to commoditize their complements. So, over the following months, I expect a complete blitz of Chinese open-source AI models for everything from computer vision to robotics... --- ### **Interconnects (Nathan Lambert) ▷ #[expensive-queries](https://discord.com/channels/1179127597926469703/1338919429752361103/1353032783139704883)** (17 messages🔥): > `Grok DeeperSearch, OpenAI Deep Research, Twitter Premium, HF model comparisons` - **Grok DeeperSearch Approaches OpenAI Deep Research**: The new **Grok DeeperSearch** is reportedly *"really good"* and close to **OpenAI Deep Research** in quality, which is impressive considering the short timeframe. - The initial **Grok DeepSearch** was considered *"awful"* due to hallucinating content from retrieved links, making it the worst implementation, according to some users. - **Twitter Premium Grants Access to Grok DeeperSearch**: Access to **Grok DeeperSearch** is available with **Twitter Premium** (the $10 tier), exclusively on the Grok Website. - After tweeting about the poor performance of **Grok DeepSearch**, an individual from xAI contacted one user, leading to improvements in **DeeperSearch** based on provided chats and benchmarks. - **Benchmarking Deep(Re)search Implementations**: One user maintains a markdown file with a set of questions to test search and research implementations, including **Grok DeeperSearch**. - The benchmark includes a broad shopping query, a specific shopping query, a generic paper search prompt, and a table/benchmark comparison between two models from **Hugging Face**. - **Image Generation Benchmarking**: A user shared their image generation benchmark, including prompts such as *"A woman sitting at a poker table with cards in her hands"* and *"Isometric pixel art of a waterfall"*. - These benchmarks help in comparing the performance of different models and would assist future posts. **Link mentioned**: <a href="https://fxtwitter.com/btibor91/status/1899917834496729259">Tweet from Tibor Blaho (@btibor91)</a>: @TheXeophon-bench --- ### **Latent Space ▷ #[ai-general-chat](https://discord.com/channels/822583790773862470/1075282825051385876/1352724292319576125)** (89 messages🔥🔥): > `Gemini Updates, Claude Code New Features, Model Context Protocol (MCP), AI Agents and Email, RF-DETR Object Detection Model` - **Gemini Updates Deconstructed**: Gemini's Dave Citron joined @OfficialLoganK on the Release Notes podcast to discuss recent updates, including **personalization**, **Canvas**, **Audio Overviews**, and **Deep Research**. - The discussion covered topics from recent app launches to the future of personalization in the **Gemini app**, including insights into user data and privacy considerations. - **Claude Code Gets Eight New Features**: Anthropic launched **eight** new features for **Claude Code** to help developers build faster and smarter, documented on their [engineering blog](https://www.anthropic.com/engineering/claude-think-tool). - Features include a new "think" tool, leading to discussion on its implementation and value, with some likening it to Chain of Thought prompting. - **A16Z's MCP Ecosystem Deep Dive**: A16Z published a deep dive into **Model Context Protocol (MCP)**, exploring its potential as a standard interface for execution, data fetching, and tool calling in AI models as APIs are the internet's first great unifier. 
- The post examines the use cases of MCP, the challenges, and how it changes the way AI interacts with tools, noting that APIs were the internet’s first great unifier, but AI models lack an equivalent. - **Roboflow Unleashes RF-DETR for Real-Time Object Detection**: Roboflow announced **RF-DETR**, a fully open-source real-time object detection model under the Apache 2.0 license available on [GitHub](https://github.com/roboflow/rf-detr). - RF-DETR achieves **SOTA** performance with over **60 mAP** on **COCO**, with base and large models at **29M** and **128M** parameters respectively. - **Browser Use Bags $17M to Build Web for Agents**: Browser Use raised **$17 million** to advance web agents, led by Felicis Ventures, aiming to take web agents to the next level after an initial prototype was built in just **four days** and launched on Hacker News. - The company is hiring top engineers to build the internet for LLMs, promising a challenging environment with a pure software geekery team culture. <div class="linksMentioned"> <strong>Links mentioned</strong>: <ul> <li> <a href="https://x.com/pashpops/status/1902814965595246855?s=46">Tweet from Pasha Rayan (@Pashpops)</a>: Introducing A1Mail - Email for AI Agents! 📬🤖TLDR: With A1Mail you can create an email address then send and receive mail from that address for your AI Agent - without paying $12 per month per Gmail ...</li><li><a href="https://x.com/stuffyokodraws/status/1902757447984710076">Tweet from Yoko (@stuffyokodraws)</a>: [New post] 🔥A Deep Dive Into MCP and the Future of AI Tooling APIs were the internet’s first great unifier but AI models lack an equivalent. What are the use cases of MCPs today? Where are the chal...</li><li><a href="https://x.com/GeminiApp/status/1902752852843331650">Tweet from Google Gemini App (@GeminiApp)</a>: In the latest episode of Release Notes, Gemini's Dave Citron joins @OfficialLoganK to deep dive into some of the latest Gemini updates.🎙️ Learn more about Gemini with personalization, Canvas, Aud...</li><li><a href="https://x.com/kimmonismus/status/1903221838022324226?s=46">Tweet from Chubby♨️ (@kimmonismus)</a>: Sora abandons credits for all paid tiers, unlimited generations available.This is a good change.</li><li><a href="https://x.com/karpathy/status/1903671737780498883">Tweet from Andrej Karpathy (@karpathy)</a>: I just vibe coded a whole iOS app in Swift (without having programmed in Swift before, though I learned some in the process) and now ~1 hour later it's actually running on my physical phone. It wa...</li><li><a href="https://x.com/theomediaai/status/1903448834451111988?s=61">Tweet from Theoretically Media (@TheoMediaAI)</a>: Sora goes “unlimited” but, watermarked/720/slow boat for the $20 tier. You know what would get me back on the Pro ($200) plan? Release the “Big Daddy” Sora model for Pro users. Keep the Nerf’d version...</li><li><a href="https://fxtwitter.com/karpathy/status/1903671737780498883)">Tweet from Andrej Karpathy (@karpathy)</a>: I just vibe coded a whole iOS app in Swift (without having programmed in Swift before, though I learned some in the process) and now ~1 hour later it's actually running on my physical phone. It wa...</li><li><a href="https://x.com/karpathy/status/1903671737780498883>)">Tweet from Andrej Karpathy (@karpathy)</a>: I just vibe coded a whole iOS app in Swift (without having programmed in Swift before, though I learned some in the process) and now ~1 hour later it's actually running on my physical phone. 
It wa...</li><li><a href="https://x.com/_catwu/status/1903130881205977320">Tweet from cat (@_catwu)</a>: It’s been a big week for Claude Code.We launched 8 exciting new features to help devs build faster and smarter.Here's a roundup of everything we released:</li><li><a href="https://x.com/AnthropicAI/status/1903128670081888756">Tweet from Anthropic (@AnthropicAI)</a>: We’re launching a new blog: Engineering at Anthropic.A hub where developers can find practical advice and our latest discoveries on how to get the most from Claude.</li><li><a href="https://x.com/leloykun/status/1903186153513291933">Tweet from leloy! (@leloykun)</a>: There are actually two kinds of inference-time compute:1. The thinking that happens before generating answer tokens. Think of this as the "drafting" or "planning" stage. And2. The thin...</li><li><a href="https://fxtwitter.com/gergelyorosz/status/1904089127600975966)">Tweet from Gergely Orosz (@GergelyOrosz)</a>: Right now, many AI coding tooling startups are heavily subsidizing actual cost of running AI agents.None will be able to do it indefinitely.But those that start to charge closer actual costs on their ...</li><li><a href="https://x.com/gergelyorosz/status/1904089127600975966>)">Tweet from Gergely Orosz (@GergelyOrosz)</a>: Right now, many AI coding tooling startups are heavily subsidizing actual cost of running AI agents.None will be able to do it indefinitely.But those that start to charge closer actual costs on their ...</li><li><a href="https://x.com/tokumin/status/1902251588925915429?s=46">Tweet from Simon (@tokumin)</a>: 🛳️Rolling out interactive Mindmaps in NotebookLM! I'm so inspired by the Exploratorium here in SF - What if every notebook generated your own personal set of interactive understanding toys that h...</li><li><a href="https://x.com/kalomaze/status/1903366221333958999?s=61">Tweet from kalomaze (@kalomaze)</a>: @metalure hybrid mamba, 56b. ~20T tokens (!!!)fp8 pretrain. actual depth (64 layers, ~15% have attention, rest are mamba).distilled (not SFT, actual pretrain distillation!) 47b variant. for ~60bil tok...</li><li><a href="https://fxtwitter.com/taesung/status/1904220824435032528)">Tweet from Taesung Park (@Taesung)</a>: Excited to come out of stealth at @reveimage!Today's text-to-image/video models, in contrast to LLMs, lack logic. Images seem plausible initially but fall apart under scrutiny: painting techniques...</li><li><a href="https://x.com/taesung/status/1904220824435032528>)">Tweet from Taesung Park (@Taesung)</a>: Excited to come out of stealth at @reveimage!Today's text-to-image/video models, in contrast to LLMs, lack logic. Images seem plausible initially but fall apart under scrutiny: painting techniques...</li><li><a href="https://fxtwitter.com/karpathy/status/1886192184808149383)">Tweet from Andrej Karpathy (@karpathy)</a>: There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g...</li><li><a href="https://x.com/karpathy/status/1886192184808149383>)">Tweet from Andrej Karpathy (@karpathy)</a>: There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. 
It's possible because the LLMs (e.g...</li><li><a href="https://fxtwitter.com/TransluceAI/status/1904226873879806390)">Tweet from Transluce (@TransluceAI)</a>: To interpret AI benchmarks, we need to look at the data.Top-level numbers don't mean what you think: there may be broken tasks, unexpected behaviors, or near-misses.We're introducing Docent to...</li><li><a href="https://x.com/TransluceAI/status/1904226873879806390>)">Tweet from Transluce (@TransluceAI)</a>: To interpret AI benchmarks, we need to look at the data.Top-level numbers don't mean what you think: there may be broken tasks, unexpected behaviors, or near-misses.We're introducing Docent to...</li><li><a href="https://x.com/roboflow/status/1902810257652351228?s=46">Tweet from Roboflow (@roboflow)</a>: Excited to announce RF-DETR, the current SOTA for real-time object detection, fully open source and Apache 2.0 for the community.More to come but the repo and Colab notebook are available today for yo...</li><li><a href="https://fxtwitter.com/gregpr07/status/1903835252382224795)">Tweet from Gregor Zunic (@gregpr07)</a>: We Raised $17M to Build the Future of Web for Agents 🤖A few months ago, Browser Use was just an idea weekend experiment to see if LLMs could navigate the web like humans. In just four days, we built ...</li><li><a href="https://x.com/gregpr07/status/1901686296902615122>>>">Tweet from Gregor Zunic (@gregpr07)</a>: Browser Use is Hiring the top 0.01% Founding Engineer to build internet for LLMs🔥We (2 people) have built the leading repository for web agents—45K+ GitHub stars in just 4 months. Every day, someone ...</li><li><a href="https://x.com/gregpr07/status/1903835252382224795>)">Tweet from Gregor Zunic (@gregpr07)</a>: We Raised $17M to Build the Future of Web for Agents 🤖A few months ago, Browser Use was just an idea weekend experiment to see if LLMs could navigate the web like humans. In just four days, we built ...</li><li><a href="https://a16z.com/a-deep-dive-into-mcp-and-the-future-of-ai-tooling/">A Deep Dive Into MCP and the Future of AI Tooling | Andreessen Horowitz</a>: We explore what MCP is, how it changes the way AI interacts with tools, what developers are already building, and the challenges that still need solving. 
</li><li><a href="https://x.com/ctnzr/status/1903228434232512878?s=61">Tweet from Bryan Catanzaro (@ctnzr)</a>: Nemotron-H: A family of Hybrid Mamba-Transformer LLMs.* Hybrid architecture means up to 3X faster at the same accuracy* Trained in FP8* Great for VLMs* Weights and instruct versions to come soon.https...</li><li><a href="https://hamel.dev/blog/posts/field-guide/">A Field Guide to Rapidly Improving AI Products – Hamel’s Blog</a>: Evaluation methods, data-driven improvement, and experimentation techniques from 30+ production implementations.</li><li><a href="https://huggingface.co/deepseek-ai/DeepSeek-V3-0324">deepseek-ai/DeepSeek-V3-0324 · Hugging Face</a>: no description found</li><li><a href="https://github.com/openai/openai-agents-python/blob/main/examples/agent_patterns/agents_as_tools.py">openai-agents-python/examples/agent_patterns/agents_as_tools.py at main · openai/openai-agents-python</a>: A lightweight, powerful framework for multi-agent workflows - openai/openai-agents-python</li><li><a href="https://x.com/WHinthorn/status/1903511723082232203">Tweet from WFH (@WHinthorn)</a>: Fun fact, this is how we got Claude to be a great prompt engineer where regular meta-prompting failed.https://github.com/hinthornw/promptimizer/blob/31a78b28123530571a8a098b020f5a7a5cfbc2ca/src/prompt...</li><li><a href="https://github.com/hinthornw/promptimizer/blob/31a78b28123530571a8a098b020f5a7a5cfbc2ca/src/promptim/optimizers/metaprompt.py#L238">promptimizer/src/promptim/optimizers/metaprompt.py at 31a78b28123530571a8a098b020f5a7a5cfbc2ca · hinthornw/promptimizer</a>: Prompt optimization scratch. Contribute to hinthornw/promptimizer development by creating an account on GitHub.</li><li><a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5188231">no title found</a>: no description found</li><li><a href="https://qwenlm.github.io/blog/qwen2.5-vl-32b/">Qwen2.5-VL-32B: Smarter and Lighter</a>: QWEN CHAT GITHUB HUGGING FACE MODELSCOPE DISCORDIntroduction At the end of January this year, we launched the Qwen2.5-VL series of models, which received widespread attention and positive feedback fro...</li><li><a href="https://chat.qwenlm.ai)">no title found</a>: no description found</li><li><a href="https://modelscope.cn/collections/Qwen25-VL-58fbb5d31f1d47)">魔搭社区</a>: no description found</li><li><a href="https://www.oneusefulthing.org/p/the-cybernetic-teammate">The Cybernetic Teammate</a>: Having an AI on your team can increase performance, provide expertise, and improve your experience</li><li><a href="https://www.hbs.edu/ris/Publication%20Files/24-013_d9b45b68-9e74-42d6-a1c6-c72fb70c7282.pdf)">Publications - Faculty & Research - Harvard Business School</a>: no description found</li><li><a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5162111)">Page Cannot be Found</a>: no description found</li><li><a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4945566).">Page Cannot be Found</a>: no description found</li><li><a href="https://x.com/_mchenco/status/1903520306305827051?s=46">Tweet from michelle (@_mchenco)</a>: cloudflare's first innovation week of 2025 just wrapped up, and i wasn't joking when i said every product in the future will be powered by Workers AI.we don't think of AI solely as a verti...</li><li><a href="https://blog.cloudflare.com/how-cloudflare-is-using-automation-to-tackle-phishing/">How Cloudflare is using automation to tackle phishing head on</a>: How Cloudflare is using threat intelligence and our Developer Platform products to automate phishing abuse 
reports.</li><li><a href="https://blog.cloudflare.com/ai-labyrinth/">Trapping misbehaving bots in an AI Labyrinth</a>: How Cloudflare uses generative AI to slow down, confuse, and waste the resources of AI Crawlers and other bots that don’t respect “no crawl” directives.</li><li><a href="https://blog.cloudflare.com/take-control-of-public-ai-application-security-with-cloudflare-firewall-for-ai/">Take control of public AI application security with Cloudflare's Firewall for AI</a>: Firewall for AI discovers and protects your public LLM-powered applications, and is seamlessly integrated with Cloudflare WAF. Join the beta now and take control of your generative AI security.</li><li><a href="https://blog.cloudflare.com/cloudflare-for-ai-supporting-ai-adoption-at-scale-with-a-security-first-approach/">Cloudflare for AI: supporting AI adoption at scale with a security-first approach</a>: With Cloudflare for AI, developers, security teams and content creators can leverage Cloudflare’s network and portfolio of tools to secure, observe and make AI applications resilient and safe to use.</li><li><a href="https://blog.cloudflare.com/introducing-ai-agent/">Introducing Cloudy, Cloudflare’s AI agent for simplifying complex configurations</a>: Cloudflare’s first AI agent, Cloudy, helps make complicated configurations easy to understand for Cloudflare administrators. </li> </ul> </div> --- ### **Latent Space ▷ #[ai-announcements](https://discord.com/channels/822583790773862470/1075282504648511499/1353504654897446993)** (2 messages): > `Rishi Agarwal on Distillation, Swyx's Agent Engineering Talk, Agent Engineering Elements, Agents as ChatGPT's Growth Path` - **Agarwal Surveys Distillation Techniques**: Deepmind's **Rishi Agarwal** released a short [podcast](https://youtu.be/O1AR4iL30mg) surveying **distillation** techniques in machine learning. - **Swyx Launches into Agent Engineering**: **Swyx** launched a [new talk and essay](https://x.com/swyx/status/1904256213661192405) on **Agent Engineering**. - The talk was also featured live on the [@latentspacepod](https://latent.space/p/agent), highlighting the reasons for going all in on Agents at @aiDotEngineer. - **Six Agent Engineering Elements Unveiled**: The discussion defines **Agents** (thanks to @simonw) and elaborates on the **Six Elements of Agent Engineering**. - It also examines how **Agents** could be **ChatGPT's** route to reaching **1 billion monthly active users (MAU)**. **Link mentioned**: <a href="https://x.com/swyx/status/1904256213661192405">Tweet from swyx 🌉 (@swyx)</a>: 🆕 talk + essay: Agent Engineeringhttps://latent.space/p/agentWhy we went all in on Agents @aiDotEngineerDefining Agents (thanks to @simonw)The Six Elements of Agent EngineeringWhy Agents are ChatGPT&... --- ### **Latent Space ▷ #[ai-in-action-club](https://discord.com/channels/822583790773862470/1200548371715342479/1352733334760984606)** (226 messages🔥🔥): > `DORA report, Gemini API, AI code generation, Agile adoption, Ruby on Rails` - **Google Cloud's DORA Report Explores Engineering Excellence**: The [DORA report](https://dora.dev/research/2024/dora-report/) by Google Cloud delves into metrics for **engineering excellence**, though accessing the full report requires signup. - Some found the focus on "*engineering excellence*" to be overly corporate, contrasting it with the "*yolo vibe code*" often used in prototyping. 
- **Discord Mobile App to Show Video Ads**: Discord's mobile app will introduce **video ads** starting in June, offering advertisers opportunities to showcase trailers and premium content as reported by [ArsTechnica](https://arstechnica.com/gadgets/2025/03/discord-heightens-ad-focus-by-introducing-video-ads-to-mobile-apps-in-june/). - Users expressed concerns about Discord "*enshittifying*" in preparation for an IPO, drawing parallels to the platform X. - **Gemini API is a Cheap Loss Leader**: Members are finding the **Gemini API** to be a very cheap API, with one user "*sonnet maxxing right now*," and another calls it a "*loss leader.*" - There are concerns raised about potential "*model lockin*" risks associated with relying on one AI provider and cultural differences between companies. - **AI Code Generation Replacing Manual Coding**: A member mentioned AI is writing **80-90%** of their company's code, and another admits that AI writes **99%** of their code these days, resulting in robots doing all the work. - Others mentioned their hate for "*template repos*" and that AI is much better at reinventing the wheel for itself. - **Vibe Manifesto Released**: The [Vibe Manifesto](https://vibemanifesto.org/) values flow, iteration, augmentation, product thinking, rerolling, and human taste. - These values contrast with friction, perfection, automation, code crafting, debugging, and technical constraints, respectively. <div class="linksMentioned"> <strong>Links mentioned</strong>: <ul> <li> <a href="https://vibemanifesto.org/">Vibe Coding Manifesto</a>: A philosophy for a new generation of developers</li><li><a href="https://dora.dev/research/2024/dora-report/">DORA | Accelerate State of DevOps Report 2024</a>: DORA is a long running research program that seeks to understand the capabilities that drive software delivery and operations performance. DORA helps teams apply those capabilities, leading to better ...</li><li><a href="https://tenor.com/view/putting-on-my-sunglasses-ken-ryan-gosling-barbie-movie-shades-on-gif-812066675624542171">Putting On My Sunglasses Ken GIF - Putting on my sunglasses Ken Ryan gosling - Discover & Share GIFs</a>: Click to view the GIF</li><li><a href="https://github.com/ZachBeta/ruby_ai_llm_bot_for_good_discord">GitHub - ZachBeta/ruby_ai_llm_bot_for_good_discord</a>: Contribute to ZachBeta/ruby_ai_llm_bot_for_good_discord development by creating an account on GitHub.</li><li><a href="https://docs.google.com/spreadsheets/d/1q5rwO4wleMTLXr1z58c2UC03QsDsGwbJY1v4UG7eEOs/edit?gid=1439059137#gid=1439059137">AI In Action: Weekly Jam Sessions</a>: no description found</li><li><a href="https://arstechnica.com/gadgets/2025/03/discord-heightens-ad-focus-by-introducing-video-ads-to-mobile-apps-in-june/">Discord heightens ad focus by introducing video ads to mobile apps in June</a>: Discord looks for more ways to make money ahead of expected IPO.</li><li><a href="https://github.com/ZachBeta/threejs_fpv">GitHub - ZachBeta/threejs_fpv</a>: Contribute to ZachBeta/threejs_fpv development by creating an account on GitHub. </li> </ul> </div> --- ### **Notebook LM ▷ #[announcements](https://discord.com/channels/1124402182171672732/1182376564525113484/1353176885789724783)** (1 messages): > `Mobile Study Participants, AI Model Updates` - **Mobile Study Participants Needed**: The team is still seeking participants for a study focused on mobile use cases and ideas. - Interested individuals are encouraged to join and share their insights to help the team learn more. 
- **AI Model Updates Coming Soon**: The team announced upcoming updates to their AI models. - More details will be shared in the coming days regarding specific improvements and new features. --- ### **Notebook LM ▷ #[use-cases](https://discord.com/channels/1124402182171672732/1124403655819415592/1352725684115734640)** (52 messages🔥): > `Mindmaps in NotebookLM, Research with NotebookLM, HR policies Hub in NotebookLM, NotebookLM for literature search, External Users Share NotebookLM` - **Mind Maps Gradually Appearing in NotebookLM**: A user noted that he had no mind maps in NotebookLM, to which another user replied that he did have them in the free version and that the feature is being rolled out gradually. - Not everyone is on the same server, so it takes a while before all servers are updated. - **NotebookLM: Research to Build Comprehensive Reports**: A user shared that he uses NotebookLM to do research and build comprehensive reports for generating local and sometimes regional news, to help people understand situations. - The user also shared a link to a podcast episode about a 911 prank call and its legal consequences [911 Prank Call: The Felony Consequences](https://creators.spotify.com/pod/show/peezyproductions/episodes/911-Prank-Call-The-Felony-Consequences-e30gfec). - **NotebookLM: Hub for HR Policy**: A user asked whether anyone uses NotebookLM as a hub for HR policies, employee handbooks, and onboarding of new employees, so that staff can ask questions and get the right answers. - He had tried it, but the answers were not always correct, and he wondered whether there was a way to organize the information in a particular way. - **NotebookLM: Literature Search**: A user asked how NotebookLM can be used for literature search, to which another user replied that NotebookLM has no built-in search function. - Even so, it remains very useful for learning subjects at university. - **NotebookLM: Contract Analysis**: A user has 3 one-page contracts with handwritten figures/amounts. - One of them was initially not mentioned at all. Another was reported as either EUR 700 or EUR 760, when the actual amount is EUR 400. <div class="linksMentioned"> <strong>Links mentioned</strong>: <ul> <li> <a href="https://creators.spotify.com/pod/show/peezyproductions/episodes/911-Prank-Call-The-Felony-Consequences-e30gfec">🚓 911 Prank Call: The Felony Consequences by Neural Network News</a>: 11-year-old girl named Ava in Volusia County, Florida, falsely reported a kidnapping via 911 texts. Deputies traced the messages to her residence in Port Orange, revealing the hoax. Ava confessed to m...</li><li><a href="https://open.spotify.com/episode/6a44wSFv8bc1T9x3mEE9Dq?si=tWnXTxqHQbqpky6bWqj0uw&nd=1&dlsi=d20a7ee755104caa">Sancocho con Limon - Quatsch Session 01</a>: FELD.FM · Episode </li> </ul> </div> --- ### **Notebook LM ▷ #[general](https://discord.com/channels/1124402182171672732/1124402182909857966/1352724286988746882)** (202 messages🔥🔥): > `Mind Map Pixelation Fix, Mind Map Feature Feedback, NotebookLM vs ChatGPT, Access to New NotebookLM, Feedback Methods for NotebookLM` - **Zoom in for Crisp Mind Map Downloads**: A member recommends zooming in on tabs before downloading a **Mind Map** to get a bigger and higher quality output and fix pixelation issues.
- The member also declared that *this tool is an absolute game changer*, touting the crazy context window and low hallucination rates, even cancelling their subscriptions to **ChatGPT** and **Claude**. - **Mind Mapping Sparks Symbolic Reasoning**: A user believes that getting **Mind Mapping** right is an important step toward more effective and smarter AI and may be indicative of symbolic reasoning. - They suggest that once knowledge can be expressed as a network of meanings, these data structures can be easily corrected with simple manipulations like transplanting nodes or adding intermediate nodes. - **NotebookLM is not an App, but a PWA**: A user sought to change the language on the app, but another user noted that **NotebookLM** doesn't have an app, but rather a Progressive Web App (PWA). - They recommend removing the app, loading **NotebookLM** in the browser with the `?hl=LANGUAGE` option, and then reinstalling the **PWA**. - **Podcast Language can be "Forced"**: A user found that it's possible to "force" a podcast to generate in another language by inputting a specific prompt at the beginning of the text settings, though English is the only officially supported language. - They used the prompt *PT-BR cria o podcast em português* to generate a Portuguese podcast, emphasizing it doesn't always work but finds it cool when it does. - **Mind Map Feature gets Mixed Reviews**: A user thinks that the new mind map is a great addition to **NotebookLM**, but finds it has major weaknesses. - They state that the mind map needs constant regeneration to update and lacks details beyond the topic, requiring back-and-forth navigation and asked for *topic and subtopic could be explained within the topic itself*. <div class="linksMentioned"> <strong>Links mentioned</strong>: <ul> <li> <a href="https://www.playbook.com/s/notebooklm/the-deep-dive">Playbook</a>: Playbook is a modern creative file manager. Store, tag, and organize your files and folders beautifully. Designers, sign up today and get 4TB storage!</li><li><a href="https://notebooklm.google.com/">no title found</a>: no description found</li><li><a href="https://support.google.com/notebooklm/answer/15724963?hl=en&ref_topic=14775295&sjid=1197563608642675832-NC">Learn how NotebookLM protects your data - NotebookLM Help</a>: no description found</li><li><a href="https://support.google.com/notebooklm/answer/15678219?hl=en">Upgrading to NotebookLM Plus - NotebookLM Help</a>: no description found</li><li><a href="https://support.google.com/a/answer/14338836?sjid=14118684210403272528-EU&hl=en">Export your users' data - Google Workspace Admin Help</a>: no description found </li> </ul> </div> --- ### **Eleuther ▷ #[general](https://discord.com/channels/729741769192767510/729741769738158194/1352761126160437350)** (106 messages🔥🔥): > `RWKV architecture development, AI model viability prediction, EleutherAI evaluation methods, Low precision data types for RL, MkDocs site for lm-evaluation-harness` - **Virtual Testing Environment Predicts Model Viability**: A member proposed a virtual testing environment (AKA the simulator) that predicts AI model viability before training to reduce wasted resources, saving time and accelerating AI innovation by eliminating unnecessary failed experiments before they happen in expensive real-world training. 
- The member stated that their goal is *not to achieve 100% accuracy in predicting an AI mechanism’s behavior*—it’s to create a system that can at least tell us whether a model has a realistic chance of working or is doomed to fail early on. - **EleutherAI Evaluation Methods Detailed in New Blog**: A member wrote a quick blog on evaluation methods for EleutherAI and set up an [MkDocs site for easier navigation](https://slyracoon23.github.io/lm-evaluation-harness/). - They are awaiting review on [this PR](https://github.com/EleutherAI/lm-evaluation-harness/pull/2832) too. - **Contributor Cautioned on AI-Generated Content in PRs**: A member was cautioned about the use of AI to generate content for pull requests, emphasizing the importance of vetting contributions to avoid adding spam. - It was suggested that unless the author is 100% certain they're correct on everything, *it would be better to withdraw the contribution until you are*. <div class="linksMentioned"> <strong>Links mentioned</strong>: <ul> <li> <a href="https://github.com/cloneofsimo/sdxl_inversions/blob/800613b426785757fca4964badeb666218e59eee/sdxl.py#L86">sdxl_inversions/sdxl.py at 800613b426785757fca4964badeb666218e59eee · cloneofsimo/sdxl_inversions</a>: Contribute to cloneofsimo/sdxl_inversions development by creating an account on GitHub.</li><li><a href="https://slyracoon23.github.io/blog/posts/2025-03-21_eleutherai-evaluation-methods.html">EleutherAI’s lm-evaluation-harness: Architecture and Configuration – Earl Potters</a>: A comprehensive guide to configuration, task architecture, and model integration</li><li><a href="https://slyracoon23.github.io/lm-evaluation-harness/">LM Evaluation Harness</a>: no description found</li><li><a href="https://github.com/EleutherAI/lm-evaluation-harness/pull/2832">Add MkDocs Documentation with GitHub Actions Deployment by Slyracoon23 · Pull Request #2832 · EleutherAI/lm-evaluation-harness</a>: Description:This PR introduces MkDocs integration to the LM Evaluation Harness repository, significantly enhancing documentation readability and accessibility. It provides:MkDocs setup: Configu... </li> </ul> </div> --- ### **Eleuther ▷ #[research](https://discord.com/channels/729741769192767510/747850033994662000/1352903760178974762)** (121 messages🔥🔥): > `AI simulation environments, Continual learning in production LLMs, Architecture-aware optimizers, Sharpness Disparity across Transformer blocks, VectorAdam optimizer` - **AI simulator for research**: A member shared an idea for a virtual environment to test AI innovations, potentially saving **money and resources**, as detailed in the attached [Ai_simulator.pdf](https://cdn.discordapp.com/attachments/747850033994662000/1352903759839363083/Ai_simulator.pdf?ex=67e30110&is=67e1af90&hm=6dd1c8028d8932d9e8b64355594bcf7c338adbf09e986186ccd4322d9cbcf99b&). - Others pointed out that testing new architectures at a small scale is already relatively inexpensive, costing around **$5** to train a **L6D512** model on a **3090** for a day. - **Optimal Optimizer Derivation Dilemma**: Members discussed the difficulty of deriving an optimal optimizer for specific architectures, noting that even for transformers, no such optimizer has been found, despite the availability of unconventional architectures. - One member suggested that if a near-optimal optimizer could be derived for an arbitrary architecture, it would be *work deserving of an award*. 
- **VectorAdam rotation equivariance exposed**: VectorAdam modifies the second moment update to be the square of the vector norm per gradient vector, addressing coordinate-system bias in Adam, potentially improving rotation equivariance, as shown in this [VectorAdam paper](https://www.dgp.toronto.edu/~zling/vector-adam/). - It was noted that VectorAdam is not similar to Adafactor, but more like a blocked approximation with **block size = hidden dim**. - **Convergence lemmas debunked**: It was suggested that convergence lemmas may not be important and that the regularizers can go in the loss function, so the AdamW detail can be ignored, or put in a separate loss function. - Other researchers believed this to be incorrect because the optima you're looking for is actually quite different with different regularization. <div class="linksMentioned"> <strong>Links mentioned</strong>: <ul> <li> <a href="https://arxiv.org/abs/2503.17126">Modifying Large Language Model Post-Training for Diverse Creative Writing</a>: As creative writing tasks do not have singular correct answers, large language models (LLMs) trained to perform these tasks should be able to generate diverse valid outputs. However, LLM post-training...</li><li><a href="https://arxiv.org/abs/1907.04164">Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model</a>: Increasing the batch size is a popular way to speed up neural network training, but beyond some critical batch size, larger batch sizes yield diminishing returns. In this work, we study how the critic...</li><li><a href="https://arxiv.org/abs/2502.19002">The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training</a>: Transformers consist of diverse building blocks, such as embedding layers, normalization layers, self-attention mechanisms, and point-wise feedforward networks. Thus, understanding the differences and...</li><li><a href="https://drive.google.com/file/d/1IIqxolKNn3cbQ9DaKTYqx5WIvJ04twTP/view">evolving_llms_through_text-based_self-play.pdf</a>: no description found</li><li><a href="https://www.dgp.toronto.edu/~zling/vector-adam/">VectorAdam for Rotation Equivariant Geometry Optimization</a>: no description found </li> </ul> </div> --- ### **Eleuther ▷ #[interpretability-general](https://discord.com/channels/729741769192767510/1052314805576400977/1352956296466534400)** (20 messages🔥): > `mechinterp backlash, token level activations, SAE visualizations, single token activations, untied embeddings` - **MechInterp Faces Academic 'Backlash'**: Members discussed that there seems to be an academic 'backlash' to the 'mechinterp' brand because so much of it is outside of traditional academic channels. - Theorizing that mechinterp is outside the mainstream academic channels, and they are resistant to the paradigm. - **Token Activations Analyzed for Accuracy**: A member is extracting token level activations on an **SAE**, questioning whether passing a single/pair of tokens would yield more accurate results than passing a whole sentence. - They found that the first token to trigger an activation is *holocaust* but it's not the token with the strongest activation, and wondered if neuron activation might be context specific. - **SAEviz Library for Visualization**: When looking at neuronpedia website graphs per feature/neuron, it was suggested to look into **SAEviz**, a library that does those visualizations using the **logit lens**. 
- The discussion clarified that these visualizations represent the ground truth activations rather than approximations. - **Single Token Activation Doubts Raised**: A member questioned the validity of single token activations, emphasizing that neurons are only ever active in contexts, it doesn't make sense to analyze them in isolation. - They explained that the activations are influenced by the context before; for instance, the phrase *I am a dictator I want to* might change the activation on *to*. - **Models need time to "warm up"**: A member states that models need time to 'warm up', where for the first 50 tokens contextual features tend to be ablated by the model by attending to the `end-of-text` token. - The intuition being that the model doesn't have enough information to make good judgements about context. --- ### **Eleuther ▷ #[multimodal-general](https://discord.com/channels/729741769192767510/795089627089862656/1353676488788148275)** (1 messages): > `Recursive Design, GAN vs. CNN vs. RL Architectures` - **Recursive Design Emerges as a Promising Technique**: A member introduced a novel diagram using a recursive design, distinguishing it from traditional **GANs** (*Generative Adversarial Networks*). - This member highlighted that their implementation emphasizes structural organization over sequential processing, leveraging **CNNs** for filtering and **RL** for refining responses. - **Alternate Architectures**: The user proposed an alternate architecture using recursive design. - The user distinguished the architecture from **GAN** as an expression, **CNN** for filtering, and **RL** for response refinement. --- ### **Eleuther ▷ #[gpt-neox-dev](https://discord.com/channels/729741769192767510/730090096287547444/1353833840694788237)** (1 messages): > `lm_eval update, CI test failures` - **Request to update `lm_eval`**: A member is drafting a PR to update the evaluation logic to `lm_eval==0.4.8`, the latest version, referencing the [Evals PR](https://github.com/EleutherAI/gpt-neox/pull/1348). - **CI Tests Failures**: A member observed that CI tests are failing for the **lm_eval update PR** and another test PR created with trivial changes, asking if the repo's CI is healthy, and referencing the [CI Test PR](https://github.com/EleutherAI/gpt-neox/pull/1349). <div class="linksMentioned"> <strong>Links mentioned</strong>: <ul> <li> <a href="https://github.com/EleutherAI/gpt-neox/pull/1348">Update Evaluation Logic to Latest `lm_eval` (0.4.8) and Support Automatic Benchmark Evals w/o Validation Set by Kyle1668 · Pull Request #1348 · EleutherAI/gpt-neox</a>: I&#39;m training a model where I want to train on the entire datasets. I do not want to split the dataset into train/val/test. 
I want to evaluate on a set of benchmarks, one of which was introduce...</li><li><a href="https://github.com/EleutherAI/gpt-neox/pull/1349">[Throw Away] Sanity Check CI by Kyle1668 · Pull Request #1349 · EleutherAI/gpt-neox</a>: no description found </li> </ul> </div> --- ### **HuggingFace ▷ #[announcements](https://discord.com/channels/879548962464493619/897387888663232554/1353440360520749210)** (1 messages): > `StarVector, SpatialLM, Hugging Face Agents Course, Xet on the Hub, HF Welcome Page Makeover` - ****StarVector** emerges as vector graphics virtuoso**: A new foundation model called **StarVector** has been released on Hugging Face for generating scalable vector graphics code from images and text, available at [Hugging Face](https://huggingface.co/collections/starvector/starvector-models-6783b22c7bd4b43d13cb5289). - The initial release includes the **starvector/starvector-1b-im2svg** model. - ****SpatialLM** navigates the 3D landscape**: **SpatialLM**, a 3D large language model designed to process 3D point cloud data, has been released on Hugging Face at [manycore-research/SpatialLM-Llama-1B](https://huggingface.co/manycore-research/SpatialLM-Llama-1B). - It generates structured 3D scene understanding outputs and can be further explored via the [project website](https://manycore-research.github.io/SpatialLM) and [GitHub repository](https://github.com/manycore-research/SpatialLM). - **HF Agents Course embraces LlamaIndex, LangChain, and SmolAgents**: The Hugging Face Agents Course now includes integrations for **LlamaIndex**, **LangChain**, and **smolagents**, offering learners diverse approaches to agent frameworks. - The course aims to provide fundamental knowledge applicable across different frameworks, making it accessible to those already familiar with one or more of them, according to [this tweet](https://x.com/ben_burtenshaw/status/1903025737633841170). - ****Xet** accelerates on the Hub**: Hugging Face's **Xet Team** has migrated the first Model and Dataset repositories off LFS and to Xet storage. - This is a step toward empowering AI builders to build and collaborate more effectively on massive models and datasets, described in more detail in this [blog post](https://huggingface.co/blog/xet-on-the-hub). - **Hugging Face revamps welcome page**: The Hugging Face welcome page has received a significant makeover, offering a streamlined access to community AI apps, open-source libraries, local model execution, and more. - Users can explore various sections like HF Spaces, Open Source Libraries, Local Models, and the Inference Playground via the updated [welcome page](https://huggingface.co/welcome). <div class="linksMentioned"> <strong>Links mentioned</strong>: <ul> <li> <a href="https://huggingface.co/collections/starvector/starvector-models-6783b22c7bd4b43d13cb5289">💫StarVector Models - a starvector Collection</a>: no description found</li><li><a href="https://huggingface.co/manycore-research/SpatialLM-Llama-1B">manycore-research/SpatialLM-Llama-1B · Hugging Face</a>: no description found</li><li><a href="https://x.com/ben_burtenshaw/status/1903025737633841170">Tweet from Ben Burtenshaw (@ben_burtenshaw)</a>: The @huggingface Agents Course now includes three major agent frameworks. 
LlamaIndex, LangChain, and our very own smolagents.We've worked to integrate the three frameworks in distinctive ways so ...</li><li><a href="https://huggingface.co/blog/xet-on-the-hub">Xet is on the Hub</a>: no description found</li><li><a href="https://huggingface.co/welcome">Hugging Face – The AI community building the future.</a>: no description found</li><li><a href="https://huggingface.co/blog/endpoint-analytics">The New and Fresh analytics in Inference Endpoints</a>: no description found</li><li><a href="https://huggingface.co/blog/ai-action-wh-2025">AI Policy @🤗: Response to the White House AI Action Plan RFI</a>: no description found</li><li><a href="https://huggingface.co/blog/olympic-coder-lmstudio">Open R1: How to use OlympicCoder locally for coding</a>: no description found</li><li><a href="https://huggingface.co/blog/nvidia-physical-ai">NVIDIA's GTC 2025 Announcement for Physical AI Developers: New Open Models and Datasets</a>: no description found</li><li><a href="https://huggingface.co/blog/burtenshaw/gemma3-thinking">Making Gemma 3 think</a>: no description found </li> </ul> </div> --- ### **HuggingFace ▷ #[general](https://discord.com/channels/879548962464493619/879548962464493622/1352720141007196330)** (136 messages🔥🔥): > `ComfyUI Samplers, Open Schizo Leaderboard, Short Story Generator with Pytorch, Photorealism Settings for SD1.5/SDXL, Flux.1 Model Performance` - **ComfyUI Sampler Strategy Session**: Members discussed the best **sampler_name** to use in ComfyUI, seeking recommendations for optimal configurations but not knowing much about it. - One user recommended *dpmpp_2m_sde* sampler and *kl_optimal* scheduler for photorealism with **SD1.5** and **SDXL checkpoints**. - **Showcasing Crazies on Open Schizo Leaderboard**: A new leaderboard was released on Hugging Face, showcasing top models. - Find the [Open-Schizo-Leaderboard](https://huggingface.co/spaces/rombodawg/Open-Schizo-Leaderboard) on HuggingFace. - **Model Integration Protocol (MIP) simplifies LLM-powered service**: A user is seeking feedback on **Model Integration Protocol (MIP)**, proposing a simpler and more scalable approach for OpenAI that automatically converts existing methods, classes, and HTTP endpoints into JSON-RPC using reflection. - This approach aims to drastically reduce development overhead while maintaining platform independence and compatibility with any LLM, and a [Neurocaster-Server implementation](https://github.com/vishalmysore/neurocaster-server) illustrates its use. - **Wan Models Debut AutoencoderKL**: A user encountered an import error related to `AutoencoderKLWan` from the `diffusers` library, potentially due to using a development version or a mistaken repository. - A github [issue](https://github.com/huggingface/diffusers/issues/10963) was found which explains that the user may be experiencing a development version error, since `AutoencoderKLWan` is not available yet. - **InferenceClient API throws Authentication Error**: A user reported a **403 Forbidden** error when attempting to list deployed models using the `InferenceClient` API, even with read-only tokens configured to allow calls to Inference Providers. - The error indicates insufficient permissions to call Inference Providers on behalf of the user and a user posted a [link](https://huggingface.co/posts/kpadpa/282697879499561) with the same error. 
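As a minimal sketch of the failing call from the last item above: the token value is a placeholder and the permission wording follows the discussion, so treat both as assumptions rather than exact Hub settings.

```python
# Minimal sketch of the InferenceClient call that returned 403 in the thread.
# Assumptions: huggingface_hub is installed and "hf_xxx" stands in for a
# fine-grained token that allows calls to Inference Providers.
from huggingface_hub import InferenceClient

client = InferenceClient(token="hf_xxx")
# Lists models currently deployed on the serverless Inference API; a 403 here
# usually points to missing token permissions rather than a bug in the code.
print(client.list_deployed_models())
```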
<div class="linksMentioned"> <strong>Links mentioned</strong>: <ul> <li> <a href="https://serapath-example1-orchestrator-agent.hf.space`">no title found</a>: no description found</li><li><a href="https://huggingface.co/spaces/rombodawg/Open-Schizo-Leaderboard">Try Rombos-LLM-V2.5-Qwen-7b - a Hugging Face Space by rombodawg</a>: no description found</li><li><a href="https://huggingface.co/docs/hub/spaces-sdks-docker">Docker Spaces</a>: no description found</li><li><a href="https://huggingface.co/chat/)">HuggingChat</a>: Making the community's best AI chat models available to everyone.</li><li><a href="https://huggingface.co/spaces/fantaxy/fantasy-novel-kr/discussions">fantaxy/fantasy-novel-kr · Discussions</a>: no description found</li><li><a href="https://huggingface.co/blog/hmb/gradio-dataframe-upgrade">Gradio’s Dataframe has been upgraded! 🎨</a>: no description found</li><li><a href="https://huggingface.co/posts/kpadpa/282697879499561">@kpadpa on Hugging Face: "What does this mean and how can I fix it? "This authentication method does…"</a>: no description found</li><li><a href="https://docs.vllm.ai/en/v0.7.2/getting_started/examples/whisper.html">Whisper — vLLM</a>: no description found</li><li><a href="https://aikval25.kattis.com/contests/aikval25/problems/windchill">Windchill – Kattis, AI-olympiadens Kval 2025</a>: no description found</li><li><a href="https://huggingface.co/posts/julien-c/158943939527784">@julien-c on Hugging Face: "Important notice 🚨 For Inference Providers who have built support for our…"</a>: no description found</li><li><a href="https://huggingface.co/docs/api-inference/pricing">Pricing and Rate limits</a>: no description found</li><li><a href="https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P-Diffusers">Wan-AI/Wan2.1-I2V-14B-480P-Diffusers · Hugging Face</a>: no description found</li><li><a href="https://huggingface.co/blog/open-r1/update-3">Open R1: Update #3</a>: no description found</li><li><a href="https://github.com/huggingface/diffusers/issues/10963">cannot import name 'AutoencoderKLWan' from 'diffusers' · Issue #10963 · huggingface/diffusers</a>: Describe the bug ImportError: cannot import name 'AutoencoderKLWan' from 'diffusers' (/usr/local/lib/python3.10/dist-packages/diffusers/init.py) Reproduction from diffusers import Auto...</li><li><a href="https://huggingface.co/docs/inference-endpoints/index">Inference Endpoints</a>: no description found</li><li><a href="https://huggingface.co/learn/cookbook/en/enterprise_dedicated_endpoints">Inference Endpoints (dedicated) - Hugging Face Open-Source AI Cookbook</a>: no description found</li><li><a href="https://github.com/huggingface/text-generation-inference">GitHub - huggingface/text-generation-inference: Large Language Model Text Generation Inference</a>: Large Language Model Text Generation Inference. 
Contribute to huggingface/text-generation-inference development by creating an account on GitHub.</li><li><a href="https://huggingface.co/support">Expert Support – Hugging Face</a>: no description found</li><li><a href="https://huggingface.co/spaces/fffiloni/diffusers-image-outpaint">Diffusers Image Outpaint - a Hugging Face Space by fffiloni</a>: no description found</li><li><a href="https://huggingface.co/docs/diffusers/using-diffusers/img2img">Image-to-image</a>: no description found</li><li><a href="https://github.com/justinpinkney/stable-diffusion?tab=readme-ov-file#image-mixer">GitHub - justinpinkney/stable-diffusion</a>: Contribute to justinpinkney/stable-diffusion development by creating an account on GitHub.</li><li><a href="https://github.com/TheDenk/images_mixing">GitHub - TheDenk/images_mixing: Сombine images using usual diffusion models.</a>: Сombine images using usual diffusion models. Contribute to TheDenk/images_mixing development by creating an account on GitHub.</li><li><a href="https://huggingface.co/spaces?sort=trending&search=vton">Spaces - Hugging Face</a>: no description found</li><li><a href="https://huggingface.co/spaces?sort=trending&search=try+on">Spaces - Hugging Face</a>: no description found</li><li><a href="https://archive.ph/2025.02.24-150819/https://medium.com/data-scientists-from-future/fine-tuning-open-source-language-models-a-step-by-step-guide-a38bed8df923">Fine-Tuning Open-Source Language Models: A Step-by-Step Guide | by Vi…</a>: no description found</li><li><a href="https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct">Qwen/Qwen2.5-VL-7B-Instruct · Hugging Face</a>: no description found</li><li><a href="https://huggingface.co/microsoft/Phi-4-multimodal-instruct">microsoft/Phi-4-multimodal-instruct · Hugging Face</a>: no description found</li><li><a href="https://huggingface.co/learn/cookbook/advanced_rag">Advanced RAG on Hugging Face documentation using LangChain - Hugging Face Open-Source AI Cookbook</a>: no description found</li><li><a href="https://huggingface.co/docs/transformers/tasks/asr">Automatic speech recognition</a>: no description found </li> </ul> </div> --- ### **HuggingFace ▷ #[today-im-learning](https://discord.com/channels/879548962464493619/898619964095860757/1353192213567373354)** (5 messages): > `audio processing, AI agents, Tokenisers, BPE, Unigram language modelling` - **Dive into Audio Adventures**: A member is deep-diving into **audio processing** today. - **Framework for Fantastic AI Agents**: A member is tackling the **framework** for **AI agents** today. - **Tokeniser Tussle: BPE vs Unigram**: A member is exploring the mechanics of various **tokenisers**, specifically **BPE** and **unigram language modelling**. - **Lightweight Models light up Laptops**: A member is researching **lightweight**, **fine-tunable models** suitable for running and tuning on a development laptop. --- ### **HuggingFace ▷ #[i-made-this](https://discord.com/channels/879548962464493619/897390720388825149/1353166030863732786)** (8 messages🔥): > `Logfire Callback for HF Transformers Trainer, TrashLens for image organization, pdf2notes: AI-powered PDF to Notes conversion, Kids feedback on UI/UX, Local API Usage` - ****Logfire Callback** Logs Training Events!**: A member created a [Logfire callback](https://github.com/louisbrulenaudet/logfire-callback) for **HF transformers Trainer** that logs training events. - This tool helps in tracking and analyzing the training process of transformer models in Hugging Face. 
- ****TrashLens** Brings Order to Image Chaos!**: [TrashLens](https://github.com/0xrushi/TrashLens) is designed to bring order to image chaos, helping users focus on important content and free up space effortlessly. - The tool aims to streamline image organization, making it easier to manage and declutter visual data. - ****pdf2notes** Turns PDFs into Organized Notes!**: [Pdf2Notes](https://github.com/AstraBert/pdf2notes) is an **AI-powered, open-source solution** that converts unstructured PDFs into well-ordered notes using **LlamaParse** and **Llama-3.3-70B**. - The tool uses **DeepMind's Gemini 2 Flash** for multi-modal parsing and features a chatbot for more in-depth insights, wrapped in a **Gradio** and **FastAPI** framework, and can be run locally with **Docker**. - **Kids Provide Valuable UI/UX Feedback!**: A member shared that their son helped with the UI colors and enjoys the tool, especially unlocking new achievements. - Feedback from kids emphasizes the importance of engaging UI elements and achievement systems in educational tools. - **API-Free Local Operation in Question!**: A member questioned if [pdf2notes](https://github.com/AstraBert/pdf2notes) can operate **100% locally without external APIs**, raising concerns about needing subscriptions for **Gemini** and **Groq**. - They criticized the Docker setup, suggesting it is too complex for non-power users who prefer simpler solutions without additional application installations. <div class="linksMentioned"> <strong>Links mentioned</strong>: <ul> <li> <a href="https://github.com/0xrushi/TrashLens">GitHub - 0xrushi/TrashLens</a>: Contribute to 0xrushi/TrashLens development by creating an account on GitHub.</li><li><a href="https://github.com/louisbrulenaudet/logfire-callback">GitHub - louisbrulenaudet/logfire-callback: A callback for logging training events from Hugging Face's Transformers to Logfire 🤗</a>: A callback for logging training events from Hugging Face's Transformers to Logfire 🤗 - louisbrulenaudet/logfire-callback</li><li><a href="https://github.com/AstraBert/pdf2notes">GitHub - AstraBert/pdf2notes: Turn PDF into Notes in seconds📝</a>: Turn PDF into Notes in seconds📝. Contribute to AstraBert/pdf2notes development by creating an account on GitHub. </li> </ul> </div> --- ### **HuggingFace ▷ #[computer-vision](https://discord.com/channels/879548962464493619/922424143113232404/1352896313557258272)** (6 messages): > `Qwen for video annotation, Opus clip opensource, LLMs and VLMs in autonomous driving` - **Qwen Guides Video Annotation Newbie**: A member sought advice on using **Qwen** with the **transformers library** for video frame extraction and annotation. - Another member recommended the [Qwen2.5-VL official GitHub repo](https://youtu.be/4twSI2XFK2s) for model information and quickstart examples. - **Opensource Opus Clip Tool Seeks Helping Hands**: A member is trying to create an opensource version of **Opus Clip** (**video repurposing tool**). - The author seeks assistance with their "spaghetti repo and code" which utilizes **yolov8** and **revideo** for detecting people and splitting the video vertically. - **LLMs and VLMs drive Autonomous Driving into the Future**: A member shared their new substack article about **LLMs** and **VLMs** in autonomous driving, highlighting improvements in vehicle capabilities. - The article references a survey paper, *A survey for foundation models in autonomous driving*, available on [arXiv:2402.01105](https://arxiv.org/abs/2402.01105). 
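On the Qwen video-annotation question above, a common first step is to subsample frames before handing them to the vision-language model. Below is a minimal sketch; the OpenCV dependency, file name, and sampling rate are assumptions, since the thread itself only mentions Qwen and the transformers library.

```python
# Frame-subsampling sketch. Assumed: OpenCV is installed, "clip.mp4" exists,
# and one frame in every 30 is enough context; the RGB frames can then be
# passed to a VLM such as Qwen2.5-VL for captioning/annotation.
import cv2

def sample_frames(path: str, every_n: int = 30):
    cap = cv2.VideoCapture(path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        idx += 1
    cap.release()
    return frames

print(f"sampled {len(sample_frames('clip.mp4'))} frames")
```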
**Link mentioned**: <a href="https://samerattrah.substack.com/p/autonomous-driving-with-llms-vlms">Autonomous driving with LLMs, VLMs, and MLLMs</a>: Discussing the application of Large Language/Vision Models in autonomous driving and the most significant developments and approaches. --- ### **HuggingFace ▷ #[gradio-announcements](https://discord.com/channels/879548962464493619/1014577787039924226/1353824302524530848)** (1 messages): > `Gradio Deep Links` - **Gradio 5.23 enables Deep Links!**: Gradio 5.23 introduces **Deep Links**, allowing direct linking to specific outputs like images or videos, exemplified by [this link](https://abidlabs-black-forest-labs-flux-1-schnell.hf.space/?deep_link=oUq4ebmL1Ek) to a blue jay image. - To upgrade, use `pip install --upgrade gradio`. - **Image.png**: The image shows an attached file. - The file is hosted on discord. **Link mentioned**: <a href="https://abidlabs-black-forest-labs-flux-1-schnell.hf.space/?deep_link=oUq4ebmL1Ek">black-forest-labs/FLUX.1-schnell</a>: no description found --- ### **HuggingFace ▷ #[smol-course](https://discord.com/channels/879548962464493619/1313889336907010110/1352937712549630033)** (1 messages): > `Hackathon Timing, Hackathon Details` - **Hackathon Date Still a Mystery**: A member inquired about the hackathon date, expressing difficulty in finding relevant information about it. - They mentioned the **YouTube stream** stated the **22nd of March**, but found no confirmation. - **Hackathon Details are missing**: The user is unable to find any relevant information about the Hackathon. - The user mentions the youtube stream said that it's today, but there are no details. --- ### **HuggingFace ▷ #[agents-course](https://discord.com/channels/879548962464493619/1329142738440028273/1352742104954306560)** (33 messages🔥): > `LangGraph rigidity, Local LLMs for smolagents, Gemini in LangGraph, API costs for notebooks, Agent storing retrieved info` - **LangGraph gains Fans Despite LangChain hate**: A member who just finished the **LangGraph** module likes the *rigidness* of **LangGraph** compared to **LangChain**, which they follow on Twitter and said *gets a lot of hate*. - Others seemed to echo this sentiment. - **Local LLMs Need Beefy Machines to run Smolagents**: Members found that to run a local LLM and get good results on **smolagents**, you'll need a big one (around **32B** parameters) and that implies a powerful machine. - They tried with 'small' LLMs like **qwen coder 7B** or **deepsek-r1 7B** and the results with smolagents are pretty inconsistent. - **Home Labs Arise to Reduce API Costs**: Members discussed the cost of **APIs** to complete the notebook, and those who do not wish to pay are working to build out a sufficient **home lab** to run models on and access them via **API**. - It was mentioned that InferenceClient APIs by huggingface are free to use with a limit of 300 requests/hour for free users. - **Where does the Agent store for future reference?**: In the agentic RAG section of the course ([https://huggingface.co/learn/agents-course/unit2/smolagents/retrieval_agents](https://huggingface.co/learn/agents-course/unit2/smolagents/retrieval_agents)), it is unclear how the LLM agent *stores* the retrieved information for easy access when planning future events, optimizing efficiency in subsequent tasks. - It was suggested it is not the LLM but the agent that stores the search and that the agent itself would have to write it down somewhere, not just in the context. 
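To make "the agent writes it down somewhere" concrete, the simplest version is a tiny persistence layer exposed to the agent as tools. A framework-agnostic sketch (the file name and schema are our own invention):

```python
import json
from pathlib import Path

NOTES = Path("agent_notes.json")

def save_note(topic: str, content: str) -> None:
    """Persist a retrieved snippet so later runs can reuse it without re-searching."""
    notes = json.loads(NOTES.read_text()) if NOTES.exists() else {}
    notes.setdefault(topic, []).append(content)
    NOTES.write_text(json.dumps(notes, indent=2))

def recall_notes(topic: str) -> list[str]:
    """Return everything the agent previously stored under this topic."""
    if not NOTES.exists():
        return []
    return json.loads(NOTES.read_text()).get(topic, [])

# Exposed to the agent as two tools: it calls save_note() after a successful
# retrieval and recall_notes() while planning, instead of relying on context alone.
```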
- **API Token Issue Solved!**: A member was experiencing issues running code using **HuggingFaceInferenceAPI** and getting irrelevant responses from their LLM. - The issue was identified and resolved as a problem with the **API token**, which needed to be read-only to run locally. <div class="linksMentioned"> <strong>Links mentioned</strong>: <ul> <li> <a href="https://arxiv.org/abs/2303.17651">Self-Refine: Iterative Refinement with Self-Feedback</a>: Like humans, large language models (LLMs) do not always generate the best output on their first try. Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improv...</li><li><a href="https://huggingface.co/learn/agents-course/unit2/smolagents/retrieval_agents#basic-retrieval-with-duckduckgo)">Building Agentic RAG Systems - Hugging Face Agents Course</a>: no description found</li><li><a href="https://huggingface.co/learn/agents-course">Welcome to the 🤗 AI Agents Course - Hugging Face Agents Course</a>: no description found</li><li><a href="https://docs.llamaindex.ai/en/stable/examples/customization/llms/SimpleIndexDemo-Huggingface_stablelm/">HuggingFace LLM - StableLM - LlamaIndex</a>: no description found </li> </ul> </div> --- ### **HuggingFace ▷ #[open-r1](https://discord.com/channels/879548962464493619/1333465203865817088/1352799575294873652)** (9 messages🔥): > `r1, vllm, cuda kernel` - **Debate Erupts Over r1 Training Curriculum!**: One member asked about the training curriculum, saying that it took *5 minutes* with **deepseek** to understand the humor. - Another member stated that **r1** is *incredibly slow*, requiring considerable power; their **Scaleway R1 grid** running *20 machines* around **3 PFLOPS** generated only a few hundred MB per day, so it was much faster to use **llama** and reverse engineer the thinking tokens from query response pairs. - **CUDA Kernel Improvements Discussed**: One user inquired whether **vllm** was being used and also mentioned working on some **cuda kernel improvements**. - Another member simply answered *no*. --- ### **MCP (Glama) ▷ #[general](https://discord.com/channels/1312302100125843476/1312302100125843479/1352737566960386178)** (155 messages🔥🔥): > `MCP and K8s, Anthropic's MCP, MCP server directories, C# MCP SDK, Vercel's AI SDK with MCP Clients` - **K8s Setup Required to Test MCP Prompts**: To test MCP prompts, particularly those from [this file](https://github.com/strowk/mcp-k8s-go/blob/main/testdata/list_prompts_test.yaml) and [this test](https://github.com/strowk/mcp-k8s-go/blob/10aa7fd54dd7839bbeeb6b8705243e8cdb67ca7e/testdata/with_k3d/list_k8s_namespaces_test.yaml#L50), a Kubernetes setup is required. - An alternative implementation with prompts is available [here](https://github.com/Abiorh001/mcp_ev_assistant_server) for managing Electric Vehicle charging stations. - **MCP isn't that complex! User says**: One user expressed confusion at the perception that MCP is complex, stating *JSON RPC isn't hard. Using SDKs it's even easier. Making an MCP server or client is pretty easy compared to a lot of other development work*. - They suggested that with just **1 cmd and 1 arg** you can add anything to any llm, with no need for public ip, tls cert, or any previous blocks. 
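For anyone who wants to sanity-check that claim, the official Python SDK's `FastMCP` helper really does get a working server down to a few lines. A minimal sketch, with a placeholder tool:

```python
# pip install "mcp[cli]"
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # speaks MCP over stdio, so a client can launch it with one command and one arg
```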
- **Dive into MCP Server Repositories**: Users shared a list of useful MCP server directories, including [Glama](http://glama.ai/mcp/servers) with a report card system, [PulseMCP](https://www.pulsemcp.com/) for a well-organized and exhaustive list, and the [official MCP GitHub](https://github.com/modelcontextprotocol/servers?tab=readme-ov-file#model-context-protocol-servers). - These resources help developers find and assess various MCP servers for their projects. - **New C# SDK officially released!**: A new official **C# SDK** for Model Context Protocol servers and clients has been released by Microsoft, as seen [here](https://github.com/modelcontextprotocol/csharp-sdk). - This provides developers with tools for building **AI applications** using **JavaScript** and **TypeScript**, integrating into web frameworks like [Next.js](https://nextjs.org) and [Svelte](https://svelte.dev/), per [Vercel AI SDK 4.2](https://vercel.com/blog/ai-sdk-4-2). - **Zapier Integrates with MCP for broader AI application Access**: Zapier has released an MCP server, [providing access to over 8,000 integrations](https://zapier.com/mcp) for AI assistants to interact with various apps. - This allows AIs to perform real-world tasks such as sending messages, managing data, scheduling events, and updating records, expanding their capabilities beyond text generation. <div class="linksMentioned"> <strong>Links mentioned</strong>: <ul> <li> <a href="https://x.com/llmindsetuk/status/1885719128247296109">Tweet from llmindset (@llmindsetuk)</a>: microsoft 365 copilot. the word "agent" is now defined by the millions of corporate eyeballs that will see this screen. "Enterprise data protection" is given prominence.</li><li><a href="https://x.com/tom_doerr/status/1903972369443475471">Tweet from Tom Dörr (@tom_doerr)</a>: Product requirements document to tasks toolQuoting Eyal Toledano (@EyalToledano) Sick of @cursor_ai rewriting good code or going in circles?Introducing Task Master ✨ A CLI that turns your PRD into a l...</li><li><a href="https://zapier.com/mcp">Zapier MCP—Connect your AI to any app instantly</a>: The fastest way to let your AI assistant interact with thousands of apps. No complex API integrations required.</li><li><a href="https://block.github.io/goose/docs/getting-started/providers#local-llms-ollama">Configure LLM Provider | codename goose</a>: Goose is compatible with a wide range of LLM providers, allowing you to choose and integrate your preferred model.</li><li><a href="https://VeyraX.com/mcp">Tweet from VeyraX</a>: VeyraX is Agenic Component Interface</li><li><a href="https://github.com/FreePeak/">Free Peak</a>: Indie Hacker. Free Peak has one repository available. Follow their code on GitHub.</li><li><a href="https://glama.ai/mcp/servers/@gannonh/firebase-mcp">Firebase MCP</a>: The Firebase MCP server provides a standardized interface to interact with Firebase services, including Firebase Authentication, Firestore, and Firebase Storage.</li><li><a href="https://github.com/Abiorh001/mcp_ev_assistant_server">GitHub - Abiorh001/mcp_ev_assistant_server: A powerful server implementation for managing Electric Vehicle (EV) charging stations, trip planning, and resource management. This server provides a comprehensive set of tools and APIs for EV-related services.</a>: A powerful server implementation for managing Electric Vehicle (EV) charging stations, trip planning, and resource management. 
This server provides a comprehensive set of tools and APIs for EV-rel...</li><li><a href="https://github.com/strowk/mcp-k8s-go/blob/main/testdata/list_prompts_test.yaml">mcp-k8s-go/testdata/list_prompts_test.yaml at main · strowk/mcp-k8s-go</a>: MCP server connecting to Kubernetes. Contribute to strowk/mcp-k8s-go development by creating an account on GitHub.</li><li><a href="https://vercel.com/blog/ai-sdk-4-2">AI SDK 4.2 - Vercel</a>: AI SDK 4.2 introduces MCP clients, reasoning, image generation with language models, message parts, sources, and more</li><li><a href="https://github.com/modelcontextprotocol/specification/discussions/220">MCP Hosting Working Group · modelcontextprotocol/specification · Discussion #220</a>: Pre-submission Checklist I have verified this would not be more appropriate as a feature request in a specific repository I have searched existing discussions to avoid duplicates Your Idea Hey ever...</li><li><a href="https://github.com/modelcontextprotocol/python-sdk/pull/343">Fix/base64 handling (Issue #342) by evalstate · Pull Request #343 · modelcontextprotocol/python-sdk</a>: Single line change to lowlevel/server.py + test to verify that Base64 decoding is not url safe and as expected by the Client.Motivation and ContextTransmitting Binary resources.How Has This Been...</li><li><a href="https://github.com/spences10/mcp-sequentialthinking-tools">GitHub - spences10/mcp-sequentialthinking-tools: 🧠 An adaptation of the MCP Sequential Thinking Server to guide tool usage. This server provides recommendations for which MCP tools would be most effective at each stage.</a>: 🧠 An adaptation of the MCP Sequential Thinking Server to guide tool usage. This server provides recommendations for which MCP tools would be most effective at each stage. - spences10/mcp-sequential.....</li><li><a href="https://github.com/strowk/mcp-k8s-go/blob/10aa7fd54dd7839bbeeb6b8705243e8cdb67ca7e/testdata/with_k3d/list_k8s_namespaces_test.yaml#L50">mcp-k8s-go/testdata/with_k3d/list_k8s_namespaces_test.yaml at 10aa7fd54dd7839bbeeb6b8705243e8cdb67ca7e · strowk/mcp-k8s-go</a>: MCP server connecting to Kubernetes. Contribute to strowk/mcp-k8s-go development by creating an account on GitHub.</li><li><a href="https://github.com/modelcontextprotocol/csharp-sdk">GitHub - modelcontextprotocol/csharp-sdk: The official C# SDK for Model Context Protocol servers and clients, maintained by Microsoft</a>: The official C# SDK for Model Context Protocol servers and clients, maintained by Microsoft - modelcontextprotocol/csharp-sdk</li><li><a href="https://glama.ai/mcp/servers/@heurist-network/heurist-mesh-mcp-server">Mesh Agent MCP Server</a>: A Model Context Protocol server that connects Claude to Heurist Mesh APIs, providing access to various blockchain and web3 tools including cryptocurrency data, token security, Twitter intelligence, an...</li><li><a href="https://github.com/heurist-network">Heurist</a>: Heurist is a Decentralized AI-as-a-Service Cloud. Heurist has 22 repositories available. Follow their code on GitHub.</li><li><a href="https://github.com/modelcontextprotocol/servers?tab=readme-ov-file#model-context-protocol-servers)">GitHub - modelcontextprotocol/servers: Model Context Protocol Servers</a>: Model Context Protocol Servers. 
Contribute to modelcontextprotocol/servers development by creating an account on GitHub.</li><li><a href="https://github.com/FreePeak/db-mcp-server">GitHub - FreePeak/db-mcp-server</a>: Contribute to FreePeak/db-mcp-server development by creating an account on GitHub.</li><li><a href="https://github.com/punkpeye/awesome-mcp-servers/pull/355">Update README: Add multi-database MCP server built with Golang by linhdmn · Pull Request #355 · punkpeye/awesome-mcp-servers</a>: Add multi-database MCP server built with Golang, supporting MySQL & PostgreSQLAs an alternative for the https://github.com/FreePeak/db-mcp-server </li> </ul> </div> --- ### **MCP (Glama) ▷ #[showcase](https://discord.com/channels/1312302100125843476/1315696461316358175/1352750945901084674)** (29 messages🔥): > `mcpwizard, vscode-mcp, DICOM servers MCP, google sheet MCP server, Narrative Spittoon Inversion project` - ****MCPwizard** Simplifies Server Creation**: A member introduced [mcpwizard](https://www.npmjs.com/package/mcpwizard), a CLI tool to simplify creating and deploying **MCP servers**, highlighting features like initializing projects and adding custom tools to Claude assistants. - The tool's [GitHub repo](https://github.com/yoannarz/mcpwizard) was also shared for community feedback and contributions. - ****VS Code MCP** Gets Community Acclaim**: Members shared a [VS Code MCP](https://github.com/block/vscode-mcp) that they've wanted. - It's described in action in this [Youtube Short](https://www.youtube.com/shorts/gddEgvCLrgU) . - ****DICOM MCP** Server for Clinical Imaging**: A member created an MCP server for interacting with **DICOM servers**, enabling AI assistants to query medical imaging systems for patient scans and clinical reports, available at [christianhinge.com](https://www.christianhinge.com/projects/dicom-mcp/). - The associated **GitHub repo** is located [here](https://github.com/ChristianHinge/dicom-mcp). - ****Google Sheets MCP** for Direct Editing**: A member built a **Google Sheet MCP server**, allowing Claude to directly edit spreadsheets, streamlining data handling and formula adjustments as mentioned in [this tweet](https://x.com/xing101/status/1903391600040083488). - The code can be found [here](https://github.com/xing5/mcp-google-sheets). - ****Automated Debugger MCP Server** Enhancements**: A member has been making improvements to their [automated debugger MCP server](https://github.com/jasonjmcghee/claude-debugs-for-you), encouraging others to try it out and contribute. - The server allows LLMs to *place breakpoints, run code, move between breakpoints, and evaluate expressions*. 
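Many of the servers above can be exercised with a few lines of the same SDK's client side. A rough sketch over stdio — the server command, filename, and tool name are placeholders, and the call shapes follow the python-sdk README at the time of writing:

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch any stdio MCP server as a subprocess and talk to it over its stdin/stdout.
    params = StdioServerParameters(command="python", args=["demo_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("server exposes:", [t.name for t in tools.tools])
            result = await session.call_tool("add", {"a": 2, "b": 3})
            print(result.content)

asyncio.run(main())
```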
<div class="linksMentioned"> <strong>Links mentioned</strong>: <ul> <li> <a href="https://lokka.dev/">Lokka | Lokka</a>: Lokka is an AI agent tool that brings the power of Microsoft Graph to AI agents like GitHub Copilot and Claude that run on your local desktop.</li><li><a href="https://github.com/evalstate/mcp-webcam/.">GitHub - evalstate/mcp-webcam: Capture live images from your webcam with a tool or resource request</a>: Capture live images from your webcam with a tool or resource request - GitHub - evalstate/mcp-webcam: Capture live images from your webcam with a tool or resource request</li><li><a href="https://github.com/gotohuman/gotohuman-mcp-server">GitHub - gotohuman/gotohuman-mcp-server</a>: Contribute to gotohuman/gotohuman-mcp-server development by creating an account on GitHub.</li><li><a href="https://github.com/jasonjmcghee/claude-debugs-for-you">GitHub - jasonjmcghee/claude-debugs-for-you: Enable any LLM (e.g. Claude) to interactively debug any language for you via MCP and a VS Code Extension</a>: Enable any LLM (e.g. Claude) to interactively debug any language for you via MCP and a VS Code Extension - jasonjmcghee/claude-debugs-for-you</li><li><a href="https://github.co">GitHub · Build and ship software on a single, collaborative platform</a>: Join the world's most widely adopted, AI-powered developer platform where millions of developers, businesses, and the largest open source community build software that advances humanity.</li><li><a href="https://www.christianhinge.com/projects/dicom-mcp/"> Agentic healthcare LLMs | Christian Hinge </a>: no description found</li><li><a href="https://github.com/ChristianHinge/dicom-mcp">GitHub - ChristianHinge/dicom-mcp: Model Context Protocol (MCP) for interacting with dicom servers (PACS etc.)</a>: Model Context Protocol (MCP) for interacting with dicom servers (PACS etc.) - ChristianHinge/dicom-mcp</li><li><a href="https://github.com/Kvadratni/speech-mcp">GitHub - Kvadratni/speech-mcp: Speech MCP: A Goose MCP extension for voice interaction with audio visualization</a>: Speech MCP: A Goose MCP extension for voice interaction with audio visualization - Kvadratni/speech-mcp</li><li><a href="https://github.com/MubarakHAlketbi/game-asset-mcp">GitHub - MubarakHAlketbi/game-asset-mcp: An MCP server for creating 2D/3D game assets from text using Hugging Face AI models.</a>: An MCP server for creating 2D/3D game assets from text using Hugging Face AI models. - MubarakHAlketbi/game-asset-mcp</li><li><a href="https://github.com/MushroomFleet/UNO-MCP">GitHub - MushroomFleet/UNO-MCP: Unified Narrative Operator</a>: Unified Narrative Operator. Contribute to MushroomFleet/UNO-MCP development by creating an account on GitHub.</li><li><a href="https://x.com/xing101/status/1903391600040083488">Tweet from Xing Wu (@xing101)</a>: Everyone's buzzing about #MCP and here's why: a weekend project solved a long-standing pain for me. 
No more copying data tables to spreadsheets or decoding complex formula guides from LLM conv...</li><li><a href="https://github.com/xing5/mcp-google-sheets">GitHub - xing5/mcp-google-sheets</a>: Contribute to xing5/mcp-google-sheets development by creating an account on GitHub.</li><li><a href="https://github.com/yoannarz/mcpwizard">GitHub - yoannarz/mcpwizard: A package to help you create and deploy MCP servers</a>: A package to help you create and deploy MCP servers - yoannarz/mcpwizard</li><li><a href="https://shorturl.at/sLWsr">MCPwizard helps you building mcp servers !</a>: Use Loom to record quick videos of your screen and cam. Explain anything clearly and easily – and skip the meeting. An essential tool for hybrid workplaces. </li> </ul> </div> --- ### **Nomic.ai (GPT4All) ▷ #[general](https://discord.com/channels/1076964370942267462/1090427154141020190/1352725930593878149)** (102 messages🔥🔥): > `Speech to Text Solutions, GPT4All and NSFW content, LocalDocs Disappearing, LLMs for Office Tasks, Running Models on Multiple Devices` - **Prompting Proficiency Prevails**: Members discussed that if a language model is desired to respond in a specific language (e.g. German), it is best to write the system message in that language to avoid triggering *"Im Kontext Lernen"* (in-context learning). - It was further suggested that **avoiding negative sentences** with words like *"nicht"* and *"don't"* can improve results, with a recommendation to rephrase instructions to use active verbs instead. - **Nemo's Nuances Named**: It was mentioned that [Mistral Nemo is a 12b model](https://huggingface.co/mistralai) and Mistral 24b is Mistral 3 or Mistral 3.1, with discussion around specific model details for projects. - Confusion arose around identifying the exact model, with one member emphasizing the need for precise model information to avoid issues. - **GPT4All's LocalDocs Vanish**: A user reported that their entire catalog of local docs disappeared for no apparent reason, prompting discussion about potential causes such as **changes to the install folder** or **lack of admin rights**. - Members recommended backing up the *localdocs.db* file and the original documents to prevent data loss, and suggested that a Windows 11 update might have caused the issue by messing with drive letters. - **LLMs Eye Medical Office Efficiency**: Members discussed the potential of using local LLMs in a medical office setting to help doctors create reports and assist with treatments, with a focus on the system learning from past dictated notes. - However, it was cautioned that **LLMs may not be suitable for handling financial or medical data** due to the risk of confabulation and the need for precise information. - **GPT4All Lacks Vision**: A member asked if any models that GPT4All can run have vision capabilities, and it was confirmed that **GPT4All does not support vision capabilities**. - Alternative tools like **LM-Studio** were suggested as options for vision-related tasks. 
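Putting the prompting advice from that GPT4All discussion into practice with the Python bindings might look like the sketch below. The model filename is only an example, and the German system message deliberately uses active verbs rather than negations, per the advice above:

```python
from gpt4all import GPT4All

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")  # example file; use whatever you have downloaded

# System message written in the target language, phrased actively
# ("Antworte immer auf Deutsch" = "Always answer in German") instead of "don't answer in English".
system = "Du bist ein hilfreicher Assistent. Antworte immer auf Deutsch und fasse dich kurz."

with model.chat_session(system_prompt=system):
    print(model.generate("Erkläre in zwei Sätzen, was ein Embedding ist.", max_tokens=120))
```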
<div class="linksMentioned"> <strong>Links mentioned</strong>: <ul> <li> <a href="https://huggingface.co/mistralai">mistralai (Mistral AI_)</a>: no description found</li><li><a href="https://www.reddit.com/r/LocalLLaMA/comments/1broa8h/is_there_a_way_for_me_to_use_multiple_computers/">Reddit - The heart of the internet</a>: no description found</li><li><a href="https://www.reddit.com/r/LocalLLaMA/comments/1jcm5p2/ocr_llm_for_invoice_extraction/">Reddit - The heart of the internet</a>: no description found </li> </ul> </div> --- ### **Modular (Mojo 🔥) ▷ #[general](https://discord.com/channels/1087530497313357884/1098713601386233997/1352831709149794374)** (7 messages): > `High performance software, Vendor lock-ins, OpenCL, OpenMP, OpenACC, Vulkan’s Compute API, and SYCL, Democratizing AI Compute, Hardware Lottery` - **Exploring High-Performance Software Landscape**: A member is exploring the landscape of writing **high-performance software** for various devices and industry needs, particularly concerning vendor lock-ins and the necessity of porting projects to phones or embedded devices. - They requested recommendations for papers, search terms, or authors to better understand the trade-offs and options available. - **Open and Portable APIs**: A member suggested starting with open and portable APIs such as **OpenCL**, **OpenMP**, **OpenACC**, **Vulkan’s Compute API**, and **SYCL**, citing their well-documented reasons for creation. - They also pointed to **POCL** as an academic project with related papers. - **Democratizing AI Compute Series**: A member linked to Chris Lattner's "[Democratizing AI Compute](https://www.modular.com/blog/democratizing-compute-part-1-deepseeks-impact-on-ai)" series, highlighting how **better hardware utilization** can dramatically reduce the need for expensive GPUs. - The series includes articles on **CUDA**, **OpenCL**, and **AI compilers (TVM and XLA)**. - **The Hardware Lottery**: A member recommended the paper "[The Hardware Lottery](https://arxiv.org/abs/2009.06489)" by Sara Hooker, which discusses how hardware and software can determine the success or failure of research ideas. - The abstract states that the paper *introduces the term hardware lottery to describe when a research idea wins because it is suited to the available software and hardware and not because the idea is superior to alternative research directions*. <div class="linksMentioned"> <strong>Links mentioned</strong>: <ul> <li> <a href="https://www.modular.com/blog/democratizing-compute-part-1-deepseeks-impact-on-ai">Modular: Democratizing AI Compute, Part 1: DeepSeek’s Impact on AI</a>: Part 1 of an article that explores the future of hardware acceleration for AI beyond CUDA, framed in the context of the release of DeepSeek</li><li><a href="https://arxiv.org/abs/2009.06489">The Hardware Lottery</a>: Hardware, systems and algorithms research communities have historically had different incentive structures and fluctuating motivation to engage with each other explicitly. This historical treatment is... 
</li> </ul> </div> --- ### **Modular (Mojo 🔥) ▷ #[mojo](https://discord.com/channels/1087530497313357884/1151418092052815884/1353002566157602847)** (82 messages🔥🔥): > `Mojo Logging Library, Mojo Formatter Tool, Mojo Dict Default Values, GPU Support for Windows, Mojo Inline Assembly Documentation` - **Logging Library in Mojo Remains WIP**: A logging library is work-in-progress in the standard library but is getting reworked; full serialization, and likely reflection, is needed before logging can be considered finished. - According to one member, *We would need to finish serialization before we could call logging finished, which probably means reflection.* - **Mojo Boasts Built-In Formatting Tool**: Mojo includes a built-in formatting tool, mojo format, similar to Black in Python or fmt in Rust, for code formatting. - **Dict Lacks Default Value Generation**: The Mojo Dict is more like Python's dict and does not include functionality to generate default values like defaultdict. - **Windows GPU Support Frustrates Mojo Developers**: GPU support for Windows is difficult because the Windows compiler toolchain is a pain to work with; most people do not run enterprise GPU clusters on Windows, and there's little reason to improve tooling. - **Mojo's Inline Assembly Documentation is a Mess**: Members noted the documentation for inline assembly in Mojo is a bit messy. - One member said *Time to harass Joe into writing documentation for it, then*, but this was immediately followed by *No harassing*. **Link mentioned**: <a href="https://forum.modular.com/t/question-vpermi2b-inline-assembly-output-incorrect-in-loop-context-due-to-register-allocation/1091/2?u=sora">Question: vpermi2b inline assembly output incorrect in loop context due to register allocation</a>: Maybe you could try this from sys import llvm_intrinsic alias T = SIMD[DType.int8, 64] @always_inline("nodebug") fn vpermi2b(a: T, b: T, idx: T) -> T: return llvm_intrinsic["llv... --- ### **Modular (Mojo 🔥) ▷ #[max](https://discord.com/channels/1087530497313357884/1212827597323509870/1353465638743707820)** (3 messages): > `MAX Platform, pixi.toml, max-pipeline, Python model graphs, magic CLI` - **Newcomer Asks About MAX Platform**: A new user inquired about modifying the **max/pipeline** directory and testing changes within the **MAX Platform** via the [pixi.toml file](https://github.com/modular/max/tree/main/src/max). - Specifically, they were curious about altering the **max-pipeline** without downloading it as a dependency. - **Editing Python Model Graphs**: A member explained that while **Python model graphs** aren't well-documented, the **MAX pipelines** module's Python source is downloaded locally. - Changes to these local files in `.modular/envs/max-pipelines/lib/python3.12/site-packages/max/pipelines` (or similar location in the `.magic` environment) should reflect when running pipelines. - **Running max-pipelines via Python**: The original poster asked if they could run **max-pipelines** directly with Python instead of using the **magic CLI** to add more command line parameters. - No direct response was given on the feasibility of this approach. <div class="linksMentioned"> <strong>Links mentioned</strong>: <ul> <li> <a href="https://github.com/modular/max/tree/main/src/max">max/src/max at main · modular/max</a>: The MAX Platform (includes Mojo). Contribute to modular/max development by creating an account on GitHub.</li><li><a href="https://github.com/m">m - Overview</a>: Typist, engineer, code poet, lover of beautiful data structures. 
- m </li> </ul> </div> --- ### **LlamaIndex ▷ #[blog](https://discord.com/channels/1059199217496772688/1187460979064324127/1352740201688207571)** (4 messages): > `AGNTCY, Large-Scale Structured Extraction, Deepseek R1 + LlamaIndex RAG app, WeAreDevs WebDev & AI Day` - **AGNCY Initiative for Agentic Interactions Emerges**: Luke discusses the motivations behind **AGNCY**, an effort to create an [open standard for agentic interactions](https://t.co/I558Qe2u4n). - **Scale Structured Extraction on Complex Docs**: LlamaIndex highlights how to perform **large-scale structured extraction** over complicated documents, extracting **50-100 fields** from a pydantic schema with nested sub-schemas, requiring high accuracy. - More details [here](https://t.co/tO1vACKTGo). - **Deepseek R1 and LlamaIndex Build RAG**: LlamaIndex highlights a project from Akshay Pachaar integrating **Deepseek AI** to build a **RAG app** with **LlamaIndex** for orchestration, **Deepseek AI R1** for inference, **Ollama** to locally serve R1, and **Streamlit** for the UI; more details available [here](https://t.co/KS26JUkwz0). - **WeAreDevs WebDev & AI Day Approaches**: LlamaIndex advertises **WeAreDevs WebDev & AI Day** this Thursday, promising insights from industry experts on how **AI is transforming web development** and its impact on software development, with more information [available here](https://t.co/c5N5BJ34mr). --- ### **LlamaIndex ▷ #[general](https://discord.com/channels/1059199217496772688/1059201661417037995/1352741141925203999)** (71 messages🔥🔥): > `Haystack Uninstall LlamaIndex Install, Ollama Integration Error, RTX 3060 Token Issues, Custom AI Interview Prep, Agent Workflow Timeout Error` - ****LlamaIndex + Ollama = Perfect RAG?****: A member sought help setting up a RAG pipeline with **LlamaIndex**, **Ollama**, and related integrations, receiving a code snippet from Deepseek to get started but ran into dependency issues. - The error was caused by the incorrect naming of a function argument (**model_name** instead of **model**), and while the error was resolved, the generated answer was still not what was expected. - ****Crafting Custom AI Interview Grindset****: A member is building a local AI using **Llama 3.2**, **Sonnet 3.7**, and **Dolphin** blended into a 16B model with RAG, custom fine-tuning, and dreams of landing a job at an AI/Tech company. - He is trying to get his AI to *apply to ai/tech companies and pass interviews* and has experience in face tracking, blender, unity, powershell, and tts. - ****Timeouts Break Agent Workflows!****: A member reported that their agent workflow was crashing due to unhandled **timeout errors** with the **OpenAI endpoint**. - It was suggested to catch `WorkflowRuntimeException` or `Exception` instead of `WorkflowTimeoutError`. - ****Hugging Face vs Ollama: Which LLM is Easier to Configure?****: Members discussed using **Hugging Face** models locally for chat with RAG, with one user suggesting **Ollama** is easier to configure. - Despite the debate, helpful links to **Hugging Face Embedding** examples were provided, such as [this notebook](https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/embeddings/huggingface.ipynb). - ****JSONL Datasets and Git: A Match Made in Heaven or Data Disaster?****: One member pondered the wisdom of storing datasets as **JSONL** files in **Git**, seeking insights into potential downsides. 
- There was no specific answer to this question, but it was mentioned that *Github tracks the updates to every piece of documentation*. <div class="linksMentioned"> <strong>Links mentioned</strong>: <ul> <li> <a href="https://docs.llamaindex.ai/en/stable/examples/workflow/function_calling_agent/">Workflow for a Function Calling Agent - LlamaIndex</a>: no description found</li><li><a href="https://docs.llamaindex.ai/en/stable/examples/embeddings/huggingface/">Local Embeddings with HuggingFace - LlamaIndex</a>: no description found</li><li><a href="https://docs.llamaindex.ai/en/stable/examples/llm/huggingface/">Hugging Face LLMs - LlamaIndex</a>: no description found</li><li><a href="https://github.com/run-llama/llama_cloud_services/blob/main/examples/parse/multimodal/multimodal_rag_slide_deck.ipynb">llama_cloud_services/examples/parse/multimodal/multimodal_rag_slide_deck.ipynb at main · run-llama/llama_cloud_services</a>: Knowledge Agents and Management in the Cloud. Contribute to run-llama/llama_cloud_services development by creating an account on GitHub. </li> </ul> </div> --- ### **LlamaIndex ▷ #[ai-discussion](https://discord.com/channels/1059199217496772688/1100478495295017063/1353699437750255668)** (1 messages): > `Multi-Agent Systems, Program-Wide Backoff Mechanism, Function Calling` - **Debate on Triggering Agents via Function Calling**: Members are debating if a single agent triggering other single agents via **function calling** could replace **program-wide backoff mechanisms** in multi-agent systems. - They are considering whether these two setups might overlap to achieve the same functionality in certain scenarios. - **Exploring Alternatives to Backoff Mechanisms**: The discussion focuses on whether using a single agent to trigger others via function calls is a viable alternative to a program-wide backoff mechanism. - The goal is to determine if this approach can achieve similar functionality in multi-agent systems, potentially offering a more streamlined solution. --- ### **Cohere ▷ #[「💬」general](https://discord.com/channels/954421988141711382/954421988783444043/1352766855911047238)** (25 messages🔥): > `RAG source return, data retention policy, security information about chat with cohere, sampler settings for Command A, AI assistant powered by Cohere's command-r-plus` - **Command-R-Plus Powers New AI Assistant**: A startup founder is building tools for structural biology using an AI assistant powered by **Cohere's command-r-plus**, combined with a **MolStar** molecular viewer ([https://ai.doi.bio](https://ai.doi.bio)). - The site currently supports the 'load' command for loading PDB entries into the viewer; for example, say *'Show me 7zzz'*. - **Data Retention Policy & Security Info Discussed**: A member inquired about **data retention** and **security policies** for **Cohere's chat** feature, specifically if data is used for model training. - A Cohere team member responded with links to the [privacy policy](https://cohere.com/privacy), [data usage policy](https://cohere.com/data-usage-policy), and [security policy](https://cohere.com/security), mentioning that users can control data settings in their dashboard. - **Cohere's Data Privacy and Deployment**: A Cohere team member detailed that their SaaS platform lets users control data directly from their [dashboard](https://dashboard.cohere.com/data-controls), offers **ZDR support** upon request via email, and integrates with major cloud providers (**OCI**, **Bedrock**, **Sagemaker**, **Azure Cloud**). 
- They also provide **on-prem solutions** (details at [https://cohere.com/deployment-options](https://cohere.com/deployment-options)), are **SOC II** and **GDPR** compliant, and adhere to industry standards for data security and privacy. - **Seeking RAG Replication Resources**: A member is seeking resources to replicate **RAG source return** behavior similar to **notebooklm**, where specific paragraphs are referenced in search results. - They are looking for open-source examples related to **chunking** and **data model design**. - **Command A Sampler Settings Guidance**: A member asked about released recommended **sampler settings for Command A**. - Another member suggested starting with a **temperature of 0.7** and adjusting as needed for determinism vs. flexibility; the default temperature is **0.3**. <div class="linksMentioned"> <strong>Links mentioned</strong>: <ul> <li> <a href="https://ai.doi.bio">ai.doi.bio</a>: no description found</li><li><a href="https://cohere.com/security">Security | Cohere</a>: Ensure ultimate AI security and privacy with Cohere's enterprise-grade security protocols, robust access controls, and private deployment options. </li><li><a href="https://dashboard.cohere.com/data-controls">Login | Cohere</a>: Login for access to advanced Large Language Models and NLP tools through one easy-to-use API.</li><li><a href="https://cohere.com/privacy">Privacy Policy | Cohere</a>: Cohere Inc. (“Cohere”) values and respects your privacy. We have prepared this privacy policy to explain the manner in which we collect, use and disclose personal information through our Website locat...</li><li><a href="https://cohere.com/deployment-options">Deployment Options - SaaS, Cloud API, Virtual Private Cloud (VPC), On Premise | Cohere</a>: Our solutions provide industry-leading data privacy and security and are designed to meet the diverse needs of organizations seeking to harness the power of generative AI. Whether you’re a start-up or... </li> </ul> </div> --- ### **Cohere ▷ #[「🔌」api-discussions](https://discord.com/channels/954421988141711382/1168578329423642786/1353203487290429522)** (35 messages🔥): > `Command models, SSL Errors, API Rate Limits, MongoDB` - ****Command** Models Face SSL Issues?**: A member inquired about **Command** models and their potential for generating more human-like responses, while also experiencing **SSL errors**. - Another member pointed out that SSL errors are not typically related to the model itself but rather to **untrusted certificates** or network configurations, but could be related to rate limiting. - **API Spamming Causes SSL Errors?**: A member reported encountering **SSL errors** when rapidly sending requests to the **API**, suspecting it might be due to spamming despite having the py.ssl module properly installed. - Another member suggested the issue could stem from **untrusted server certificates**, not client-side problems, and recommended contacting the support team. - **Suspect API Rate Limit Arises**: A member suspected the **SSL errors** might be related to an undocumented **API rate limit** triggered by spamming requests. - Another member noted that rate limits usually return a **429 error code**, however. - **MongoDB Status Queried**: Switching topics, a member inquired whether another's **MongoDB** was working. - The other member stated it was working fine and they used it yesterday. 
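Circling back to the rate-limit and sampler-settings threads above: a small client-side backoff wrapper covers both. The sketch below targets Cohere's v2 Python client; the call shape, response fields, and `command-a-03-2025` model name are taken on trust from the docs and the discussion, so treat them as assumptions.

```python
import time
import cohere

co = cohere.ClientV2(api_key="YOUR_API_KEY")

def chat_with_backoff(prompt: str, retries: int = 5) -> str:
    delay = 1.0
    for attempt in range(retries):
        try:
            resp = co.chat(
                model="command-a-03-2025",           # Command A, per the thread
                messages=[{"role": "user", "content": prompt}],
                temperature=0.7,                     # suggested starting point above; default is 0.3
            )
            return resp.message.content[0].text
        except Exception:                            # narrow to the SDK's 429/transport errors in real code
            if attempt == retries - 1:
                raise
            time.sleep(delay)
            delay *= 2                               # exponential backoff instead of hammering the endpoint
    return ""
```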
--- ### **Cohere ▷ #[「💡」projects](https://discord.com/channels/954421988141711382/1218409701339828245/1353370236643971123)** (2 messages): > `Discord Bot, RAG Pipeline, vnc-lm, Context Augmentation, Docker` - **vnc-lm Releases Discord Bot with RAG Integration**: A member released a new version of their Discord bot, **vnc-lm**, featuring a **RAG pipeline** that pulls data from **Wikipedia** and **DuckDuckGo** to augment prompts with additional context. - This pipeline adds approximately **500 tokens** to each prompt by appending five chunks of sourced information to improve the model's context, with code available on [GitHub](https://github.com/jake83741/vnc-lm). - **Search enabled and disabled**: The newly released bot has support for web search. - The new search can be enabled with **+ search** and disabled with **+ model**. - **Versatile Bot Supports Local and Hosted LLMs**: The updated Discord bot now supports every popular local and hosted large language model API, including **Cohere**. - The bot can be quickly built using **Docker**, allowing users to easily edit messages and get new responses within Discord. <div class="linksMentioned"> <strong>Links mentioned</strong>: <ul> <li> <a href="https://open.spotify.com/episode/6a44wSFv8bc1T9x3mEE9Dq?si=tWnXTxqHQbqpky6bWqj0uw&nd=1&dlsi=d20a7ee755104caa">Sancocho con Limon - Quatsch Session 01</a>: FELD.FM · Episode</li><li><a href="https://github.com/jake83741/vnc-lm">GitHub - jake83741/vnc-lm: A Discord bot for large language models. Add Gemini, Sonnet-3.7 DeepSeek R-1, and other models. Easily change models, edit prompts, and enable web search.</a>: A Discord bot for large language models. Add Gemini, Sonnet-3.7 DeepSeek R-1, and other models. Easily change models, edit prompts, and enable web search. - jake83741/vnc-lm </li> </ul> </div> --- ### **Torchtune ▷ #[general](https://discord.com/channels/1216353675241590815/1216353675744641096/1352733484468146206)** (33 messages🔥): > `Synthetic Data Generation with vllm and deepseek r1, Llama4 Release, Qwen3 MoE, Good Data Problem, PDF Extraction` - ****Synthetic Data** Streams from **vllm** and **Deepseek R1****: A member is generating **synthetic data** using **vllm** and **Deepseek R1**, expecting the process to run for a couple of weeks. - Training is delayed in anticipation of **Llama4's release** during LlamaCon. - **Data Quality Conundrums Continue**: Despite years of research, the definition and attainment of *good data* remain elusive for AI labs, even after the recognized importance of datasets like **fineweb** and **lima**. - A member expressed frustration over the lack of effective **PDF extraction** tools: *we still don't have amazing PDF extraction and this is making my blood boil*. - ****LlamaExtract** Tool Launched**: [LlamaIndex](https://www.llamaindex.ai/) launched **LlamaExtract**, a tool for structuring complex documents using genAI-native agents. - It adapts the latest models to accurately and reliably structure documents like financial reports and resumes. - ****DeepSeek-V3** Releases Unhinged**: A member noted the unceremonious release of **DeepSeek-V3** by Deepseek, humorously calling them *unhinged* due to the lack of a proper readme. - The model, accessible on [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324), has a blank `README.md` but provides access to a playground. - ****MoEs** Hinted for **Torchtune**?**: A subtle reference was made to the potential inclusion of **Mixture of Experts (MoE)** models in **Torchtune**. 
- The discussion touched on the practical challenges of training such large models, potentially requiring **8-9 TB of VRAM**. <div class="linksMentioned"> <strong>Links mentioned</strong>: <ul> <li> <a href="https://x.com/jerryjliu0/status/1902880391578653176">Tweet from Jerry Liu (@jerryjliu0)</a>: LlamaExtract is now in public beta 🔥- the leading, genAI-native agent for structured document extraction.We adapt the latest models and tune them so that you can structure even the most complex docum...</li><li><a href="https://huggingface.co/deepseek-ai/DeepSeek-V3-0324">deepseek-ai/DeepSeek-V3-0324 · Hugging Face</a>: no description found </li> </ul> </div> --- ### **Torchtune ▷ #[dev](https://discord.com/channels/1216353675241590815/1236040539409879170/1352735360001507479)** (23 messages🔥): > `datasets library issue, GRPO LoRA 3B Single Device, vLLM support for data generation, CUDA graphs` - **Datasets Library Troubleshoot**: Members found an issue with the **datasets library** and attempted to debug it, with one suggesting upgrading the **datasets version**. - One member confirmed that they are on the latest version **3.4.1**. - **GRPO LoRA Achieves 54% on GMS8K**: The **GRPO LoRA 3B single device** gets to **54%** on GMS8K, according to a member who shares a [link to the pull request](https://github.com/pytorch/torchtune/pull/2467). - The member noted that it performs better than expected on novel questions, despite an error of adding extraneous +2 in its calculation. - **vLLM support lacking for data generation**: Members discussed adding **vLLM support for data generation** but noted difficulties in sharing weights between vLLM and torchtune. - One suggested hosting the model in another vLLM process and converting weights, while another mentioned experimenting with a hacky way to make it work on smaller models. - **CUDA Graphs capture operations**: A member inquired about **CUDA graphs** which captures a whole bunch of GPU operations as a graph and launch them as a single operation. - Another member confirmed this and noted that it reduces the overhead to launch CUDA operations from CPU, which reduces GPU idle time. **Link mentioned**: <a href="https://github.com/pytorch/torchtune/pull/2467">GRPO LoRA Single Device by ianbarber · Pull Request #2467 · pytorch/torchtune</a>: ContextWhat is the purpose of this PR? Is it to[x ] add a new feature fix a bug update tests and/or documentation other (please add here)#2421 - exploring a LoRA recipe.ChangelogWhat are ... --- ### **DSPy ▷ #[show-and-tell](https://discord.com/channels/1161519468141355160/1202371242519441499/1353412469934264491)** (1 messages): > `DLCoT Optimizer, Chain-of-Thought Distillation, Token Usage Reduction, DSPy Optimizers` - ****DLCoT Optimizer** Launches for Chain-of-Thought**: A member has submitted a [pull request (#8000)](https://github.com/stanfordnlp/dspy/pull/8000) for a new optimizer called **DLCoT** (Deconstructing Long Chain-of-Thought) to the DSPy teleprompt module. - It enhances chain-of-thought reasoning by intelligently processing and optimizing long CoT data by segmenting CoT content, removing redundant paths, filtering incorrect chains and reconstructing coherent output. - ****DLCoT** Slashes Token Usage by 70-90%**: The **DLCoT optimizer** can reduce token usage by **70-90%** while maintaining or improving accuracy across benchmarks. - The optimizer works with existing DSPy optimizers like **BootstrapFewShot** and distills down to the most efficient reasoning path. 
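For anyone who hasn't used them, the PyTorch-level version of that idea looks roughly like this — a toy capture-and-replay sketch where the model, shapes, and warm-up loop are purely illustrative:

```python
import torch

device = "cuda"
model = torch.nn.Linear(1024, 1024).to(device)
static_in = torch.randn(64, 1024, device=device)

# Warm-up on a side stream so capture doesn't record allocator/bookkeeping work.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_in)
torch.cuda.current_stream().wait_stream(s)

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_out = model(static_in)   # every kernel here is recorded into the graph

# Later: copy new data into the static input and replay the whole captured
# sequence as a single launch, avoiding per-kernel CPU launch overhead.
static_in.copy_(torch.randn(64, 1024, device=device))
g.replay()
print(static_out.shape)
```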
**Link mentioned**: <a href="https://github.com/stanfordnlp/dspy/pull/8000">Add DLCoT Optimizer for efficient Chain-of-Thought distillation by jmanhype · Pull Request #8000 · stanfordnlp/dspy</a>: Add DLCoT (Deconstructing Long Chain-of-Thought) OptimizerOverviewThis PR adds a new optimizer to the DSPy teleprompt module: the DLCoT (Deconstructing Long Chain-of-Thought) optimizer. This feat... --- ### **DSPy ▷ #[general](https://discord.com/channels/1161519468141355160/1161519469319946286/1353165176161042493)** (20 messages🔥): > `DSPy for creative content generation, PAPILLON example, Agentic-Reward-Modeling link, DLCoT Optimizer, MIPROv2` - ****DSPy** for creative content generation discussed**: Members are discussing using **DSPy** for optimizing prompts for creative content generation and suggesting to use a *good judge*. - One member suggested checking out [PAPILLON](https://github.com/Columbia-NLP-Lab/PAPILLON/blob/main/papillon_tutorial.ipynb) and [Agentic Reward Modeling](https://github.com/THU-KEG/Agentic-Reward-Modeling) examples. - ****DLCoT Optimizer** contribution**: A member shared a new contribution, the **DLCoT (Deconstructing Long Chain-of-Thought) Optimizer**, on [GitHub](https://github.com/stanfordnlp/dspy/pull/8000) for efficient Chain-of-Thought distillation. - The member encouraged others to check it out and provide feedback. - **Optimizing Prompt without Examples**: A member is seeking guidance on optimizing a prompt for passage summarization **without examples**, using a working evaluation function and wondered if they should use **COPRO** instead of **MIPROv2**. - Another member clarified that example *inputs* are always needed but summaries (labels) are not, if a judge/metric can assess summaries without a reference/label. - **Fine-Grained Feedback via `dspy.Prediction`**: A member inquired about achieving granular feedback with **Refine**, similar to assertions/suggestions, where specific checks over an output provide targeted feedback. - Another member mentioned that in version **2.6.15**, it will be possible to return `dspy.Prediction(score=...., feedback=....)` to offer fine-grained feedback to the module. - **Multi-Agent Protocol Standard (MCP) in Retrieval**: Members discussed the potential of a multi-agent protocol standard (**MCP**) and its expansion to include retrievers/retrieval augmented generation. - The discussion included a shared schema for retrieval results and methods to exchange documents and embeddings, with an aim to streamline data-driven workflows and simplify the combination of multiple models and data sources. 
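On the "optimize without examples" thread above, a rough DSPy sketch of what that looks like: only input passages go into the trainset, and the metric judges each prediction against its input. The heuristic metric, model name, and sample passages are placeholders for the member's own evaluation function and LM.

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))   # any configured LM works; model name is a placeholder

summarize = dspy.ChainOfThought("passage -> summary")

# Label-free metric: score the summary against the *input* passage only.
def summary_metric(example, pred, trace=None):
    words = pred.summary.split()
    covered = sum(w.lower() in example.passage.lower() for w in words)
    return (covered / max(len(words), 1)) * (1.0 if len(words) <= 120 else 0.5)

# Only inputs are needed -- no reference summaries.
my_passages = [
    "DeepSeek-V3-0324 is a post-training update aimed at better downstream performance.",
    "Reve is a new image model noted for text rendering and prompt adherence.",
]
trainset = [dspy.Example(passage=p).with_inputs("passage") for p in my_passages]

optimizer = dspy.MIPROv2(metric=summary_metric, auto="light")
optimized = optimizer.compile(summarize, trainset=trainset)
```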
<div class="linksMentioned"> <strong>Links mentioned</strong>: <ul> <li> <a href="https://github.com/THU-KEG/Agentic-Reward-Modeling">GitHub - THU-KEG/Agentic-Reward-Modeling: Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems</a>: Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems - THU-KEG/Agentic-Reward-Modeling</li><li><a href="https://github.com/Columbia-NLP-Lab/PAPILLON/blob/main/papillon_tutorial.ipynb">PAPILLON/papillon_tutorial.ipynb at main · Columbia-NLP-Lab/PAPILLON</a>: Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles - Columbia-NLP-Lab/PAPILLON</li><li><a href="https://github.com/stanfordnlp/dspy/pull/8000">Add DLCoT Optimizer for efficient Chain-of-Thought distillation by jmanhype · Pull Request #8000 · stanfordnlp/dspy</a>: Add DLCoT (Deconstructing Long Chain-of-Thought) OptimizerOverviewThis PR adds a new optimizer to the DSPy teleprompt module: the DLCoT (Deconstructing Long Chain-of-Thought) optimizer. This feat... </li> </ul> </div> --- ### **DSPy ▷ #[examples](https://discord.com/channels/1161519468141355160/1161519685616025600/1353064669274701935)** (9 messages🔥): > `DSPy Modules, Creative Writing Prompts, PAPILLON, Privacy Preservation` - **DSPy Module Usage Under Scrutiny**: A member inquired about the correct usage of **DSPy Modules** within the context of generating reports and charts from a **Pandas DataFrame** using **LLMs**. - Another member pointed out the difficulty in getting help without a more specific question beyond reviewing a large attached code file, the member then specified *is that the correct way to use DSPy Modules*? - **Members seek creative writing prompt examples**: A member requested examples for improving **creative writing prompts** or similar cases where there's no clear correct answer. - A link to the **PAPILLON GitHub repository** was shared, featuring a tutorial notebook focused on privacy preservation from internet-based and local language model ensembles, [PAPILLON GitHub](https://github.com/Columbia-NLP-Lab/PAPILLON/blob/main/papillon_tutorial.ipynb). **Link mentioned**: <a href="https://github.com/Columbia-NLP-Lab/PAPILLON/blob/main/papillon_tutorial.ipynb">PAPILLON/papillon_tutorial.ipynb at main · Columbia-NLP-Lab/PAPILLON</a>: Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles - Columbia-NLP-Lab/PAPILLON --- ### **tinygrad (George Hotz) ▷ #[general](https://discord.com/channels/1068976834382925865/1068976834928193609/1353437505646755893)** (19 messages🔥): > `sops.gz dataset, Tinygrad CUDA port, Meeting #63 Agenda, AMD LLVM progress, ONNX Frontend for Tinygrad` - **Track Down `sops.gz` Origins**: A member inquired about the location of the `datasets/sops.gz` dataset used in `speed_compare_cuda_ptx`. - Another member shared that the dataset is available in the repo's [extra directory](https://github.com/tinygrad/tinygrad/blob/master/extra/datasets/sops.gz) and generated via the [generate_dataset.sh script](https://github.com/tinygrad/tinygrad/blob/master/extra/optimization/generate_dataset.sh). - **CUDA Port Ponderings**: A member inquired about the possibility of porting **Tinygrad** to **CUDA GPU** for training. - Another member responded with a link to the [README.md](https://github.com/tinygrad/tinygrad/?tab=readme-ov-file#accelerators) file, highlighting supported backends. 
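As a quick orientation for the backend question above: the same tinygrad code runs on whichever accelerator is selected via environment variable, so "porting to CUDA" is mostly a matter of setting one. A tiny sketch:

```python
from tinygrad import Device, Tensor

# Pick the backend with an env var (e.g. `CUDA=1 python script.py`); the code itself doesn't change.
print("default backend:", Device.DEFAULT)

x = Tensor.rand(4, 8)
w = Tensor.rand(8, 2)
y = (x @ w).relu()
print(y.numpy().shape)  # (4, 2), computed on the selected backend
```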
- **Meeting Agenda Announced**: The agenda for meeting #63 was announced, covering topics such as **company update**, **quantized DSP**, **BERT**, **scheduler**, **driver**, **tensor cores**, **WebGPU**, **ONNX**, **RetinaNet**, **Torch frontend** and other bounties. - Discussion included **test_ops**, **multi GPU training**, **torch compile** and bounties for an **AMD LLVM backend**. - **AMD LLVM Backend Advancements**: Progress on the **AMD LLVM backend** was reported, including multiple merged pull requests and testing with **Llama3** and **Flux** examples. - A pull request is undergoing review. - **ONNX Frontend Ascends**: A member noted that `tinygrad.frontend.onnx` now exists, expressing intent to focus on **ONNX** preparation this week. - Validation of the top 30 **Hugging Face ONNX** repos is a topic. <div class="linksMentioned"> <strong>Links mentioned</strong>: <ul> <li> <a href="https://github.com/tinygrad/tinygrad/blob/master/extra/datasets/sops.gz">tinygrad/extra/datasets/sops.gz at master · tinygrad/tinygrad</a>: You like pytorch? You like micrograd? You love tinygrad! ❤️ - tinygrad/tinygrad</li><li><a href="https://github.com/tinygrad/tinygrad/?tab=readme-ov-file#accelerators">GitHub - tinygrad/tinygrad: You like pytorch? You like micrograd? You love tinygrad! ❤️</a>: You like pytorch? You like micrograd? You love tinygrad! ❤️ - GitHub - tinygrad/tinygrad: You like pytorch? You like micrograd? You love tinygrad! ❤️ </li> </ul> </div> --- ### **tinygrad (George Hotz) ▷ #[learn-tinygrad](https://discord.com/channels/1068976834382925865/1070745817025106080/1353147176968126574)** (4 messages): > `Disable colored terminal output, tinygrad facades, GPU code generation, OpenCL empty guarantees` - **Disable colored terminal output in tinygrad**: A member asked if there's a way to disable colored terminal output. - **Tinygrad has two facades**: Tinygrad has two facades: the **deep learning** part (weights update, tensors, matrix multiplication), and the **compiler** part (GPU code generation and scheduling). - The deep learning part is better explained by [Karpathy’s Youtube tutorial](https://www.youtube.com/watch?v=VMj-3S1tku0&list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ). - **OpenCL empty values are unguaranteed**: A member reported getting weird output from the [first example in tinygrad-notes](https://mesozoic-egg.github.io/tinygrad-notes/20241231_intro.html). - It was clarified that *with OpenCL, empty is just empty, there's no guaranteed value*. **Link mentioned**: <a href="https://mesozoic-egg.github.io/tinygrad-notes/20241231_intro.html">Introduction to the internals</a>: Tutorials on tinygrad
- Mentorship is not required to join or succeed in **AgentX**, and labs plus the Certificate Declaration form will be released in April as seen in the [attached image](https://cdn.discordapp.com/attachments/1280370030609170494/1353204258450964544/image.png?ex=67e2c76c&is=67e175ec&hm=1fb895b885ce732fd7e5b99b8ff24c55286d5). - **Research Track is Confirmed to be Remote and Unpaid**: A member confirmed that the **AgentX Research Track mentorship** will be conducted remotely. - Another member clarified that the mentorship is not paid, with mentors simply providing guidance on the research project. --- > The full channel by channel breakdowns have been truncated for email. > > If you want the full breakdown, please visit the web version of this email: []()! > > If you enjoyed AInews, please [share with a friend](https://buttondown.email/ainews)! Thanks in advance!