[AINews] Halfmoon is Reve Image: a new SOTA Image Model from ex-Adobe/Stability trio
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
Composite AI is all you need?
AI News for 3/21/2025-3/24/2025. We checked 7 subreddits, 433 Twitters and 29 Discords (227 channels, and 10464 messages) for you. Estimated reading time saved (at 200wpm): 1129 minutes. You can now tag @smol_ai for AINews discussions!
A couple of nice updates from Qwen and DeepSeek today, but we give the title spot to a lesser-known but ambitious new entrant.
Reve, pronounced [ʀɛv], from “rêve”, has emerged on Artificial Analysis' leaderboard as the top-rated imagegen model, displacing former SOTA Recraft. "The model stands out for its impressive text rendering, prompt adherence, and aesthetics." We found it remarkably easy to play with.
And it beats Ideogram for typography:
It's interesting that it comes from Christian Cantrell (former VP of Product at Stability), Taesung Park, and Michaël Gharbi. All three are Adobe alums, and Michaël's announcement gives the most insight into how they do it:
Reve’s mission is to invent the future of intent-driven visual creation. Capturing creative intent requires advanced machine understanding of natural language and other interactions. Turning this intent into compelling visuals calls for interactive systems that have a deep understanding of the visual world they generate, so they can iteratively amend it.
Today's text-to-image models are essentially that—random slice-of-the-world generators. There's no intelligence. This is both a data and representation problem. We need to leverage the equivalent of full documents for images, but we don't have a good representation for it. Our mission at Reve is to enhance visual generative models with logic. As the first step, we focus on understanding user intent with advanced language capabilities, resulting in superior complex prompt understanding and text writing.
There's no suggestion that it's a single model, but rather some composite of models. Probably this is what Christian wanted to build at Stability, but couldn't.
Table of Contents
- AI Twitter Recap
- AI Reddit Recap
- AI Discord Recap
- PART 1: High level Discord summaries
- Perplexity AI Discord
- Unsloth AI (Daniel Han) Discord
- LMArena Discord
- Cursor Community Discord
- aider (Paul Gauthier) Discord
- Nous Research AI Discord
- OpenAI Discord
- OpenRouter (Alex Atallah) Discord
- LM Studio Discord
- Yannick Kilcher Discord
- GPU MODE Discord
- Interconnects (Nathan Lambert) Discord
- Latent Space Discord
- Notebook LM Discord
- Eleuther Discord
- HuggingFace Discord
- MCP (Glama) Discord
- Nomic.ai (GPT4All) Discord
- Modular (Mojo 🔥) Discord
- LlamaIndex Discord
- Cohere Discord
- Torchtune Discord
- DSPy Discord
- tinygrad (George Hotz) Discord
- LLM Agents (Berkeley MOOC) Discord
- PART 2: Detailed by-Channel summaries and links
- Perplexity AI ▷ #general (998 messages🔥🔥🔥):
- Perplexity AI ▷ #sharing (18 messages🔥):
- Perplexity AI ▷ #pplx-api (21 messages🔥):
- Unsloth AI (Daniel Han) ▷ #general (602 messages🔥🔥🔥):
- Unsloth AI (Daniel Han) ▷ #off-topic (41 messages🔥):
- Unsloth AI (Daniel Han) ▷ #help (257 messages🔥🔥):
- Unsloth AI (Daniel Han) ▷ #showcase (7 messages):
- Unsloth AI (Daniel Han) ▷ #research (51 messages🔥):
- LMArena ▷ #general (844 messages🔥🔥🔥):
- LMArena ▷ #announcements (1 messages):
- Cursor Community ▷ #general (857 messages🔥🔥🔥):
- aider (Paul Gauthier) ▷ #general (585 messages🔥🔥🔥):
- aider (Paul Gauthier) ▷ #questions-and-tips (148 messages🔥🔥):
- aider (Paul Gauthier) ▷ #links (2 messages):
- Nous Research AI ▷ #general (436 messages🔥🔥🔥):
- Nous Research AI ▷ #ask-about-llms (46 messages🔥):
- Nous Research AI ▷ #research-papers (19 messages🔥):
- Nous Research AI ▷ #interesting-links (3 messages):
- OpenAI ▷ #ai-discussions (226 messages🔥🔥):
- OpenAI ▷ #gpt-4-discussions (2 messages):
- OpenAI ▷ #prompt-engineering (122 messages🔥🔥):
- OpenAI ▷ #api-discussions (122 messages🔥🔥):
- OpenAI ▷ #api-projects (1 messages):
- OpenRouter (Alex Atallah) ▷ #announcements (4 messages):
- OpenRouter (Alex Atallah) ▷ #general (440 messages🔥🔥🔥):
- LM Studio ▷ #general (199 messages🔥🔥):
- LM Studio ▷ #hardware-discussion (159 messages🔥🔥):
- Yannick Kilcher ▷ #general (326 messages🔥🔥):
- Yannick Kilcher ▷ #paper-discussion (3 messages):
- Yannick Kilcher ▷ #ml-news (17 messages🔥):
- GPU MODE ▷ #general (22 messages🔥):
- GPU MODE ▷ #triton (15 messages🔥):
- GPU MODE ▷ #cuda (42 messages🔥):
- GPU MODE ▷ #torch (5 messages):
- GPU MODE ▷ #announcements (1 messages):
- GPU MODE ▷ #cool-links (1 messages):
- GPU MODE ▷ #jobs (1 messages):
- GPU MODE ▷ #beginner (56 messages🔥🔥):
- GPU MODE ▷ #pmpp-book (1 messages):
- GPU MODE ▷ #jax (1 messages):
- GPU MODE ▷ #rocm (2 messages):
- GPU MODE ▷ #lecture-qa (2 messages):
- GPU MODE ▷ #tilelang (10 messages🔥):
- GPU MODE ▷ #metal (3 messages):
- GPU MODE ▷ #self-promotion (10 messages🔥):
- GPU MODE ▷ #🍿 (1 messages):
- GPU MODE ▷ #reasoning-gym (5 messages):
- GPU MODE ▷ #gpu模式 (5 messages):
- GPU MODE ▷ #general (9 messages🔥):
- GPU MODE ▷ #submissions (119 messages🔥🔥):
- GPU MODE ▷ #status (2 messages):
- GPU MODE ▷ #hardware (17 messages🔥):
- GPU MODE ▷ #tpu (1 messages):
- Interconnects (Nathan Lambert) ▷ #news (86 messages🔥🔥):
- Interconnects (Nathan Lambert) ▷ #ml-questions (25 messages🔥):
- Interconnects (Nathan Lambert) ▷ #random (36 messages🔥):
- Interconnects (Nathan Lambert) ▷ #memes (4 messages):
- Interconnects (Nathan Lambert) ▷ #rl (127 messages🔥🔥):
AI Twitter Recap
Here's a summary of the AI-related discussions from the provided tweets, categorized for a technical audience:
Model Releases and Updates, Including Performance
- DeepSeek V3-0324 Release and Performance: @_akhaliq announced the DeepSeek-V3-0324 release on Hugging Face, @Teknium1 also noted the release, and @reach_vb highlighted it as a post-training update with potential for improved downstream performance. Several users discussed its performance and characteristics, including @teortaxesTex, who found it comparable to Sonnet 3.6, and @teortaxesTex, who noted it surpasses DeepSeek-R1 and Claude-3.7 in some evaluations.
- Qwen 2.5-VL-32B-Instruct Release: @_akhaliq announced the release of Alibaba's Qwen2.5-VL-32B-Instruct on Hugging Face, and @reach_vb shared performance benchmarks indicating it beats Qwen 2.5 72B and GPT 4o Mini on vision tasks, with enhanced mathematical reasoning and human preference alignment.
- DeepSeek Model Serving: @_akhaliq noted that DeepSeek's new model is served on Hugging Face via Hyperbolic Labs, and @ClementDelangue mentioned it's available via FireworksAI and Hyperbolic Labs. @Yuchenj_UW stated that Hyperbolic Labs now serves DeepSeek-V3-0324.
- DeepSeek V3-0324 on MLX: @reach_vb reported that the latest DeepSeek V3-0324 runs at >20 toks/sec on a 512GB M3 Ultra with mlx-lm, and @awnihannun confirmed the same.
- NVIDIA Mamba Image Backbones: @mervenoyann announced NVIDIA's release of new Mamba image backbones on Hugging Face, available in various sizes and resolutions.
Frameworks and Tools
- LangChain and LangGraph Use Cases: Multiple tweets highlighted use cases of LangChain and LangGraph, including Vodafone's AI assistants for data operations @hwchase17, Klarna's AI assistant for customer support @LangChainAI, and a medical supply chain AI system @LangChainAI. @hwchase17 also mentioned context management in langgraph.
- Weave-Agent Planner Discussion: @jd_pressman discussed the design and planning of Weave-Agent, considering approaches like ReActTree and MuZero for agentic planning.
- Smolagents Growth: @AymericRoucher announced that smolagents has reached 15k GitHub stars and is integrating sandboxed code execution via E2B or Docker.
- Together Chat: @togethercompute introduced Together Chat, featuring OSS models like DeepSeek R1 for web search, coding, image generation, and image analysis, and @togethercompute listed the tech stack.
Agent Engineering and Applications
- Agent Engineering Talk and Essay: @swyx shared a talk and essay on Agent Engineering, defining agents, outlining six elements, and discussing their potential impact.
- Linear and Codegen Integration: @mathemagic1an announced Codegen's integration with Linear, enabling agents to solve tickets and close duplicates, and highlighted Linear's expanded capabilities for bots @mathemagic1an.
- Evaluation Metric for Agents: @_philschmid advocated for using pass^k instead of pass@k for evaluating agents, arguing it provides a more accurate performance metric aligned with user experience.
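For concreteness, here is a minimal sketch of the two metrics, using the standard unbiased pass@k estimator and a combinatorial pass^k in the spirit of τ-bench; the exact formulas from the linked thread are not reproduced here:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: P(at least one of k samples succeeds),
    estimated from c successes observed in n samples."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def pass_hat_k(n: int, c: int, k: int) -> float:
    """pass^k: P(all k samples succeed) -- rewards consistency,
    which is closer to what a repeat user of an agent experiences."""
    return comb(c, k) / comb(n, k)

# A task the agent solves 7 times out of 10:
n, c = 10, 7
for k in (1, 2, 4):
    print(k, round(pass_at_k(n, c, k), 3), round(pass_hat_k(n, c, k), 3))
# pass@k rises with k (more tries, more chances to get lucky);
# pass^k falls with k, exposing the flakiness that pass@k hides.
```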
Economic and Strategic Implications
- AI Automation and Economic Growth Model: @EpochAIResearch discussed GATE, a model for AI automation's economic impacts, predicting trillions in AI investments, extreme compute scaling, and significant economic growth.
- US-Japan Defense Innovation Award: @SakanaAILabs announced that Sakana AI won an award at the US-Japan Competition for Defense Innovation for novel AI solutions.
- Perspectives on China and AGI: @teortaxesTex shared multiple opinions on China's technological and strategic advantages, including its state capacity, industrial base, and AGI efforts. @teortaxesTex also touched on DeepSeek's "commoditize your complement" theory.
ARC-AGI Benchmark
- ARC-AGI-2 Release and Competition: @fchollet announced the release of ARC-AGI-2, a benchmark designed to measure general fluid intelligence, and the ARC Prize 2025 competition with a $700,000 grand prize @fchollet. He noted that current top AI approaches score very low, requiring test-time adaptation, and discussed the evaluation methodology @fchollet.
Humor and Memes
- Coding by Vibes: @gneubig shared a tweet about prompting to improve vibe coding, distinguishing between coding by vibes for personal projects versus agent behavior.
AI Reddit Recap
/r/LocalLlama Recap
Theme 1. DeepSeek V3-0324: Performance and Expectations vs R1
- Deepseek releases new V3 checkpoint (V3-0324) (Score: 638, Comments: 125): DeepSeek released its new V3 checkpoint (V3-0324), which likely includes updates and improvements over previous versions. Further details on specific features or enhancements are not provided in the post.
- Discussion on the DeepSeek-V3 checkpoint (V3-0324) includes speculation about its use as a base for a future R2 release, with some users anticipating it to arrive in April. There is a debate on whether V4 is necessary for R2, with arguments suggesting that improvements can be achieved through better scaling and reasoning techniques without a new base model.
- Users are seeking benchmark results to compare the new model's performance, with some noting that no official benchmarks have been released yet. Independent tests are expected soon due to the open-source release of the weights, and there is a call for DeepSeek to release their own benchmarks similar to Mistral.
- There are observations about the model's coding skills improvement and its deployment on both API and web platforms, with some users noting a more censored version compared to the original. The MTP module is highlighted for its role in enhancing decoding speed, achieving 1.8 times TPS, as detailed in a research paper.
- New deepseek v3 vs R1 (first is v3) (Score: 282, Comments: 56): The image compares two versions of DeepSeek user interfaces: V3 and R1. V3 showcases a more dynamic design with animated weather cards for "Windy," "Rainy," "Sunny," and "Snowy," while R1 offers a simpler interface with toggle buttons for "Wind," "Rain," "Sun," and "Snow," each represented by a single icon.
- DeepSeek V3 and R1 interfaces are being compared, with V3 offering animated weather cards and R1 featuring simpler toggle buttons. Users are curious about which model corresponds to each interface and the prompts used for the comparison.
- There is a preference for open-source models over proprietary ones due to cost and flexibility, despite DeepSeek models not being the cheapest. Sonnet is noted to be significantly more expensive than V3, especially during off-peak hours.
- The discussion includes references to command-a running locally, with links provided for further exploration, such as the Hugging Face model and a GIF showcasing the interface. Users express interest in more dynamic content, like videos, to better understand the animated features.
- DeepSeek V3-0324 has caught up to Sonnet 3.7 in my code creativity benchmark - "Write a raytracer that renders an interesting scene with many colourful lightsources in python." (Score: 215, Comments: 43): DeepSeek V3-0324 has matched Sonnet 3.7 in a code creativity benchmark involving a raytracer task in Python, demonstrating significant improvement over its previous version. The benchmark revealed that while most LLMs generated simple RGB scenes, Sonnet 3.7 and now DeepSeek V3-0324 produced more complex and aesthetically pleasing scenes, though the method for this creativity boost remains speculative. More details and data are available in the GitHub repository.
- DeepSeek V3-0324 is noted for its "psychotic taste," resembling reasoning models like R1 or QwQ more than its predecessor, and has faced criticism for its creative writing outputs, which some users find incoherent despite high benchmark scores. Gemma 3 is highlighted for its coherence and creativity in fiction, contrasting with R1's often criticized outputs.
- R1 failed in the benchmark by not producing a functioning program, despite attempts, which raises questions about its effectiveness compared to older versions of DeepSeek V3. The discussion suggests that R1's long chains of thought (CoT) do not guarantee successful outputs, unlike previous versions of DeepSeek.
- The increase in program size for DeepSeek V3-0324 and Sonnet 3.7 is noted, with speculation about whether this is due to training for longer generation lengths or other optimizations. Generating 10kB of code in a single attempt is considered significant, indicating potential advancements in model capabilities.
Theme 2. Meta's ParetoQ Explored: Promise of 2-bit Models
- Meta released a paper last month that seems to have gone under the radar. ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization. This is a better solution than BitNet and means if Meta wanted (for 10% extra compute) they could give us extremely performant 2-bit models. (Score: 505, Comments: 49): Meta's ParetoQ paper introduces scaling laws for extremely low-bit LLM quantization, proposing a more effective solution than BitNet. This allows the possibility of delivering highly efficient 2-bit models with only a 10% increase in compute requirements.
- Quantization and Performance: Discussions emphasize the potential of 2-bit quantization for lightweight models, with some users noting that this could be transformative for applications like creative writing assistants and chatbots. However, concerns about potential slowdowns and the impact of quantization on model intelligence and instruction following are raised, with hopes for improvements using vulkan/T-MAC kernels.
- Research and Comparisons: Users discuss the ParetoQ framework as a more rigorous method for comparing quantization settings, highlighting a learning transition between 2 and 3 bits. The paper is noted for its ability to optimize training for 2-3 bit models, with comparisons to AQLM and references to human synapses having 4-5 bpw.
- Resources and References: The discussion includes references to resources like the Intel auto-round project and DeepSeek-R1-int2-mixed-sym-inc, which achieve comparable performance with 97.9% accuracy retention. A link to the paper is provided: arxiv.org.
Theme 3. Expanding LLM Functionalities: From Text to Multimodal
- I made a diagram and explanation of how transformers work (Score: 272, Comments: 20): LLM functionalities are expanding beyond text, and a user has created a diagram and explanation to illustrate how transformers function. This effort aims to provide a clearer understanding of the internal mechanisms of transformers for those interested in AI and machine learning.
- Input and Output Embeddings: There is a discussion on whether input and output embeddings are still linked in modern transformer architectures, with users noting the difficulty in obtaining a comprehensive and current overview of these architectures.
- Resources and Diagrams: Several users shared resources to aid in understanding transformers, including a detailed explanation by Cromulent123 and a link to a GitHub page with relevant diagrams (GitHub Llama Nuts and Bolts). Another user highlighted a conceptual guide on transformers available on Ben Levinstein's Substack.
- Detailed Explanation on Transformer Functionality: Cromulent123 provides an in-depth explanation of how transformers work, focusing on the process of token embedding, the role of Query, Key, and Value Matrices, and the concept of attention scores in determining relevance. They also discuss the importance of contextual enrichment through multiple transformer blocks, emphasizing the nuanced understanding of token relationships.
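To make the Query/Key/Value mechanics above concrete, here is a minimal single-head sketch in NumPy (toy dimensions; real transformers add multiple heads, causal masking, and positional encodings):

```python
import numpy as np

def attention(X, Wq, Wk, Wv):
    """Single-head self-attention over token embeddings X of shape (seq_len, d_model).
    Each token's query is scored against every token's key; the softmaxed scores
    then weight the value vectors, contextually enriching each token."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                # project into query/key/value spaces
    scores = Q @ K.T / np.sqrt(Q.shape[-1])         # pairwise relevance ("attention scores")
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # attention-weighted mix of values

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```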
- I don't understand what an LLM exactly is anymore (Score: 233, Comments: 89): The author is confused about the expanding definition of Large Language Models (LLMs), originally understood as systems predicting the next word based on pretrained weights from text data. They question how LLMs now encompass capabilities like audio and image generation, and cite SpatialLM, which processes 3D point cloud data, as an example of this broadening scope, seeking clarification on the connection to language models.
- Diffusion Models and LLMs: There is a debate on whether models like Stable Diffusion qualify as LLMs since they incorporate T5 for understanding text prompts, though they primarily generate images. Co0k1eGal3xy argues that such models are close to LLMs because of their advanced language understanding, despite not traditionally fitting the LLM category.
- Tokenization and Multimodal Models: suprjami explains that all data, including text, images, and audio, is tokenized into numbers for LLMs to process, which allows them to learn relationships between different media types. Chair-Short details how self-attention mechanisms and positional encoding enable LLMs to handle different data modalities, suggesting a shift from purely text-focused models to multimodal capabilities (see the sketch after this list).
- Defining LLMs: Discussions highlight the blurred lines in defining LLMs, with some viewing them as large models capable of processing and generating language, regardless of the input type. SnackerSnick mentions that LLMs use tokenization and embeddings to predict subsequent tokens, while Otherwise_Marzipan11 and Co0k1eGal3xy suggest that branding and interaction with language, whether text, audio, or images, contribute to the LLM label.
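As a rough illustration of the tokenization point above, this sketch maps text ids and image patches into one shared embedding sequence; the ViT-style patch projection is a stand-in for whatever tokenizer a given multimodal model actually uses:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab_size = 64, 1000

# Text: token ids looked up in an embedding table.
text_ids = np.array([17, 42, 256])
embed_table = rng.normal(size=(vocab_size, d_model))
text_tokens = embed_table[text_ids]                    # (3, d_model)

# Image: 16x16 RGB patches, flattened and linearly projected so they
# live in the same d_model space as the text tokens.
patches = rng.normal(size=(4, 16 * 16 * 3))            # 4 patches
patch_proj = rng.normal(size=(16 * 16 * 3, d_model))
image_tokens = patches @ patch_proj                    # (4, d_model)

# One interleaved sequence plus positional information: from here on,
# self-attention is modality-agnostic -- it only ever sees vectors.
sequence = np.concatenate([text_tokens, image_tokens]) # (7, d_model)
positions = 0.02 * rng.normal(size=sequence.shape)     # stand-in positional encoding
print((sequence + positions).shape)                    # (7, 64)
```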
- Possible Llama 4 prototypes on Chatbot Arena (Score: 105, Comments: 21): MetaAI is testing several anonymous Llama/Meta models on Chatbot Arena, potentially as prototypes for Llama 4. Models like aurora, ertiga, pinnacle, solaris, and spectra are image-enabled, while rhea is identified as Llama 3.
- Discussions reveal skepticism about model identities on Chatbot Arena, as some models, like anonymous-chatbot, claim to be from OpenAI, while others like rage and phantom are suspected to be Meta models. Users note that these models often provide inconsistent company affiliations, potentially due to a guard model or hallucinations.
- The anonymous-chatbot and nebula models are highlighted for their performance, with nebula being particularly praised for excelling in tests, while models like rage and rhea received mixed feedback, with rhea noted for its friendly demeanor and emoji use.
- There is a debate about whether any models are actually Llama 4, with users noting that none explicitly identify as such. Some comments suggest that Meta might be testing diverse writing styles or using randomized system prompts to obscure the true origin of the models.
Theme 4. TeapotLLM's Impact: Lightweight Q&A Models
- Announcing TeapotLLM- an open-source ~800M model for hallucination-resistant Q&A and document extraction, running entirely on CPU. (Score: 163, Comments: 50): TeapotLLM is an open-source model designed for hallucination-resistant Q&A and document extraction, featuring an approximate 800 million parameter architecture. It is optimized to run entirely on CPU, making it accessible for broader usage without the need for specialized hardware.
- TeapotLLM's Hallucination Resistance: Discussion highlights the model's focus on hallucination resistance and its performance against models like Qwen and Llama, with some skepticism expressed about claims of reduced hallucination. Users are curious about its placement on hallucination leaderboards, and a demo is available for testing.
- Model's Language and Output Capabilities: The model is trained primarily in English, but theoretically supports all languages covered by flan-t5. It can extract structured data into JSON using a library that parses fields into typed JSON, as detailed in the documentation, though there is interest in expanding language support and testing on platforms like ollama.
- Performance and Resource Usage: TeapotLLM is optimized for CPU usage, fitting within approximately 2GB of RAM on Google Colab, making it accessible for users with limited compute resources. There is interest in exploring fine-tuning on more modern models like Qwen 0.5B to potentially enhance performance, while maintaining the current model's strengths in document extraction and concise responses.
Other AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding
Theme 1. New Improved Memory Alpha in ChatGPT Enhances Interaction
- New improved memory alpha is insane (Score: 414, Comments: 241): The post discusses the new improved memory alpha feature in ChatGPT, comparing its impact to the leap from GPT-2 to GPT-4. The author expresses skepticism about DeepSeek's ability to compete unless they adopt similar advancements, expressing confidence in OpenAI's continued leadership.
- Many users express frustration and confusion over the availability and inconsistency of the new memory alpha feature in ChatGPT, with some losing access unexpectedly despite having pro subscriptions. CyberNoche and jalpseon highlight deactivation issues, while alpha_rover and DamionPrime share positive experiences with memory persistence.
- The discussion touches on the pricing of ChatGPT subscriptions, with Initial-Kangaroo-534 questioning the value of paying $200 per month. This is contrasted by alpha_rover, who finds the feature invaluable for project continuity and would miss it compared to other AI tools.
- Some commenters like 3xNEI and SillyTwo3470 speculate on the broader implications of memory features, suggesting it could lead to human-AI hybridization. They emphasize the potential for increased personalization and the blurring of lines between tool and partner, indicating a significant shift in how users might interact with AI.
Theme 2. Anthropic's Revenue Surge Matches OpenAI's 2023 Numbers
- Anthropic is making about $115M a month now; same as OpenAI in Nov 2023 (Score: 272, Comments: 50): Anthropic is reportedly generating $115M per month, matching OpenAI's revenue in November 2023. Revenue projections for 2025 estimate $2B as likely and $4B as optimistic, with Manus contributing approximately $2 per task to their revenue. An image depicts a 40% increase in annualized revenue from December 2024 to March 2025, with figures from the Bay Area Times.
- Claude's Impact and Usage: Users highlight Claude Code as a game-changing tool, with some spending $50 per day on it due to its effectiveness in automating coding tasks. Alternatives like AIDER and Cursor's Agent are mentioned but are deemed less effective compared to Claude, which is described as being akin to having a competent intern.
- Revenue Sources and Context: A significant portion of Anthropic's revenue is attributed to integration with AWS Bedrock, with expectations of continued growth due to widespread enterprise adoption. The discussion clarifies that the reported figures represent revenue, not profit.
- Model Comparisons and Preferences: Users compare various AI models, noting that Claude offers superior performance despite smaller context windows in some cases. The OG 600b model and Sonnet 3.7 are mentioned, with the latter praised for its smart capabilities and iterative problem-solving.
Theme 3. AI-Driven Bug Fixing Automation: A 27-Day Experiment
- I made AI fix my bugs in production for 27 days straight - lessons learned (Score: 191, Comments: 80): Over 27 days, the author used Claude 3.7 to automatically fix 21 unique production bugs, resulting in 12 successful one-shot fixes, 6 partial successes, and 3 failures due to incorrect assumptions or complex issues. Despite the initial time investment exceeding manual bug fixing, the system reduced cognitive load and context switching, though it may not suit niche or complex problem domains.
- Interest in Open Sourcing: There is significant interest in the project being open-sourced, with Relevant-Pitch-8450 expressing intent to share it after some cleanup. Users appreciate the UI design and see potential utility in the tool.
- Potential Commercialization: Commenters like ClassyBukake suggest that the tool could be monetized as a service, highlighting its appeal from both personal and business perspectives.
- Cost and Time Efficiency: HelpRespawnedAsDee raises questions about the tool's cost and time efficiency over an extended period, suggesting continued use to evaluate long-term benefits.
Theme 4. Advanced Claude Workflow Integration: MCP External Tools
- My Claude Workflow Guide: Advanced Setup with MCP External Tools (Score: 124, Comments: 20): The post provides a detailed guide for setting up Claude's desktop application with external tools like Brave Search and Tavily to enhance its capabilities, requiring a Claude Pro subscription ($20/month) and specific software installations like Node.js and Python. It includes configuration examples for both Windows and macOS, instructions for accessing developer settings, and troubleshooting tips for installation and setup issues. The guide emphasizes the benefits of enhanced web search, filesystem access, and sequential thinking, and provides additional resources and security considerations for effective use.
- Claude's desktop application setup is praised for its accessibility to non-developers, providing a bridge for regular desktop users to enhance Claude's capabilities without coding skills. The guide is compared to Claude Code, which offers more flexibility for tech-savvy users comfortable with command line interfaces.
- A tutorial for Claude Code is recommended for those interested in exploring its capabilities, available on YouTube. This highlights the distinction between the two approaches: one prioritizing ease of use and the other, advanced customization.
Theme 5. Wan 2.1 Video Frame Feature Innovations in AI
- Wan-i2v - Prompt: a man throws a lady overboard from the front of a cruiseship. (Score: 812, Comments: 51): Wan-i2v AI has introduced new features and advancements, as demonstrated in a prompt scenario where "a man throws a lady overboard from the front of a cruiseship." While the post does not provide further details, it suggests a focus on action-oriented scenarios or potentially controversial themes in AI-generated content.
- The Wan-i2v AI is discussed as an image-to-video tool, with some users noting that it couldn't independently create a starting frame from the Titanic movie, implying a direct screenshot was used instead. This highlights the potential limitations of AI in generating entirely original content without reference images.
- Users humorously critique the AI's understanding of physics, with comments suggesting that while AI may not currently grasp physical laws, advancements such as Stable Diffusion and Wan2.1 are rapidly improving in simulating realistic physics in animations, such as "boob jiggles."
- The conversation also touches on the idea of AI-generated alternate movie endings, with users joking about creating new endings for films like Titanic. This raises questions about copyright issues and the potential for new YouTube channels focused on AI-crafted content, despite the challenges of intellectual property rights.
- Wan 2.1 begin and ending frame feature having model coming officially (Score: 100, Comments: 13): Wan 2.1 is set to release an official model that supports start and end frames interpolation soon, as confirmed by user "danielzy1990" on a social media platform. For more details, refer to the GitHub issue comment.
- Users anticipate that Wan 2.1's new model will significantly enhance video control, with some expressing hope for improvements such as adding a guidance layer similar to Hunyuan to speed up generation times.
- Comparisons to Hunyuan highlight its efficiency, generating video clips at 24fps in nearly half the time it takes Wan to generate at 16fps, emphasizing the potential benefits of guidance training.
- There is interest in the model's capability to support multiple timed keyframes, with some users hoping it remains compatible with existing img2vid functionalities.
AI Discord Recap
A summary of Summaries of Summaries by o1-preview-2024-09-12
Theme 1. DeepSeek V3's Surprise Launch Shakes AI Community
- DeepSeek V3 Emerges as Open-Source Giant: DeepSeek released DeepSeek V3, a 685B-parameter mixture-of-experts model under the MIT license, accessible on Hugging Face. The community is excited, comparing it to OpenAI's o1 models in performance.
- DeepSeek V3 Outperforms R1?: Users claim DeepSeek V3 beats R1 in coding and front-end tasks, even without chain-of-thought reasoning, noting its cost-effectiveness and excellence in math.
- DeepSeek V3 Drops Without a README!: DeepSeek releases DeepSeek V3 without proper documentation, leaving users both amused and perplexed by the lack of a README, but offering a playground for experimentation.
Theme 2. Qwen Models and Upcoming AI Innovations
- Qwen3 Support Added to Hugging Face Transformers: Developers are thrilled as Qwen3 support is integrated into Hugging Face Transformers, preparing for the upcoming Qwen3 models.
- Qwen2.5-VL-32B-Instruct Released Under Apache 2.0: Qwen releases Qwen2.5-VL-32B-Instruct, a multimodal vision-language model fine-tuned with reinforcement learning, enhancing mathematical reasoning and visual problem-solving capabilities.
- Qwen3 to Support CPU Inference?: Users speculate that Qwen3-15B-A2B could be ideal for CPU inference due to its size, making advanced AI models more accessible.
Theme 3. Debates and Advances in LLM Reasoning Training
- R1-Zero Training Bias Unveiled: Researchers uncover a bias in R1-Zero-like training, where using row mean favors shorter correct responses and longer incorrect ones, impacting model outputs.
- GRPO's Length Explosion Troubles Practitioners: Users grapple with GRPO training leading to length explosion, debating techniques like length clipping and curriculum to address the issue.
- MathFusion Supercharges LLM Math Skills: MathFusion enhances mathematical reasoning in LLMs via cross-problem instruction synthesis, improving models like DeepSeekMath-7B, Mistral-7B, and Llama3-8B.
Theme 4. Agent Engineering and MCP Developments
- AGNCY Initiative Propels Agentic Interaction Standards: Luke leads AGNCY, aiming to create an open standard for agentic interactions, providing a robust framework for developing more effective AI agents.
- MCPwizard Eases MCP Server Creation: Developers introduce mcpwizard, a CLI tool that simplifies creating and deploying MCP servers, enabling easy addition of custom tools to AI assistants like Claude.
- A16Z Explores Future of AI Tooling with MCP: A16Z publishes a deep dive into Model Context Protocol (MCP), analyzing its potential as a standard interface for AI models and discussing its impact on AI tooling.
Theme 5. NVIDIA's Nemotron-H Models and Hardware Advances
- NVIDIA Unveils Nemotron-H Hybrid Models: NVIDIA introduces the Nemotron-H family, hybrid Mamba-Transformer models offering up to 3x speed improvements, with models ranging from 8B to 47-56B parameters.
- Mistral 24B Roars Back into Favor: Mistral 24B is hailed as one of the greatest releases recently, with users impressed by its strength and accessibility under the Apache 2.0 license.
- Flash Attention and Hopper Architecture Demystified: Enthusiasts delve into Flash Attention optimizations and clarify confusion around Hopper's 64B swizzle, enhancing understanding of NVIDIA's GPU architectures.
PART 1: High level Discord summaries
Perplexity AI Discord
- Sonar 3.7 Bug kicks model: A user reported a bug with Sonar 3.7 where a chown command kicks the model out and breaks the conversation while coding. They also wondered whether there was any difference in performance between high and low source amounts, and in reasoning quality between search steps.
- A user followed up noting that in their experience, the difference is quite large, sharing a screenshot here.
- Sonar Model Gives Cropped Snippets: Multiple users reported that the Sonar model in the Perplexity API is truncating responses, particularly since the weekend, even though the JSON format is correct.
- A user provided an example of a JSON request and the truncated response, noting that switching to sonar-pro resolves the issue but is not preferable for cost reasons.
- Llama Index Wrestles with Sonar: A user encountered an error when configuring Sonar as a chat engine with Llama Index for a RAG project and requested assistance.
- This highlights potential integration challenges when using Sonar in conjunction with other AI development tools.
- Deep Research Rate Limit: A user inquired about the possibility of extending the limit of 100 deep researches per minute due to bulk processing needs in their application.
- This inquiry underscores the demand for higher API usage limits for users with demanding workloads.
Unsloth AI (Daniel Han) Discord
- Bonsai Bitnet Seeks Testers for Qwen2.5 Comparison: A member is looking for testers for deepgrove/Bonsai, asking how the bitnet compares to Qwen2.5 0.5B.
- They also linked a relevant Hugging Face Transformers PR about adding Qwen3 and Qwen3MoE support.
- Orpheus TTS Model Gains Audio Finetuning: Audio finetuning has arrived with the Orpheus TTS model, according to a newly released Unsloth notebook.
- A user noted that the work was all done by a particular member and that the notebook is a lot more streamlined compared to local audio tokenizing and then regular Llama3 finetuning.
- Straight PRs OK on Unsloth GitHub, but wait: A member inquired about contributing to Unsloth's GitHub, and another member confirmed that straight PRs are acceptable, though delays may occur due to the high volume of recent PRs and issues.
- The discussion then shifted to modifying data preparation steps in Colab to accommodate .txt files, aiming for cheaper inference, and the original issue was linked.
- GRPO Reasoning Needs Training Data: A user asked about training only parts of the output, specifically wanting the model to generate its own reasoning during inference.
- It was suggested to look at the GRPO notebooks as a standard way of adding reasoning, and that the model must see reasoning traces during training to take it into account during inference.
- Unsloth's Fine-Tuning Guide Now Available: A member created a guide for fine-tuning with Unsloth, covering theoretical aspects, practical examples, and how to create a reasoning model with GRPO.
- The guide compiles everything learned over the last year.
LMArena Discord
- Nebula Steals Chatbot Spotlight: Members found Nebula, an anonymous chatbot suspected to be from DeepMind, to be really good and the best anonymous model rn, outperforming others in math, English-Turkish translation, and solving ARC-AGI problems.
- It seems similar to Phantom, which users identified as a Google model, with both being tested in the arena.
- GPT-4o Gets Human Alignment Boost: GPT-4o has significantly improved through OpenAI's post-training, potentially surpassing Grok 3 soon, due to continued pretraining since December.
- Speculation suggests it might top the leaderboard, leveraging OpenAI's proficiency in human preference alignment in the LM arena.
- Specter Evolves into Phantom then Nebula: Specter, Phantom, and Nebula are revisions of the same model, in that order, showing performance jumps in a few weeks.
- Members noted a more significant performance jump from Specter to Phantom compared to Phantom to Nebula.
- LMArena Fixes Bugs, Tunes Leaderboard: The LMArena alpha received updates including bug fixes and new features, and testers are encouraged to continue testing at alpha.lmarena.ai with the password `still-alpha`.
- A bug preventing messages from saving and causing vote failures has been fixed, and leaderboard columns are now sortable with live data updates; feedback can be provided via this Google Forms link and bug reports can be filed using this Airtable link.
Cursor Community Discord
- Cursor's CMD+Backspace becomes problematic: Users express frustration with Cursor's CMD+Backspace leading to accidental project deletions, with some losing work up to 7 times.
- The Cursor team plans to change the default keybinding to CMD+Shift+Backspace, with configuration options, targeting a Monday rollout.
- Claude 3.7 MAX hits users' pocket: Claude 3.7 Thinking, now Claude 3.7 MAX, moves from the Pro plan to usage-based pricing, causing user frustration due to increased costs.
- Claude 3.7 MAX features a higher context window and more tool calls compared to the standard Claude 3.7 Sonnet.
- Windsurf Surfing Ahead in Responsiveness: Some users find Windsurf faster and more responsive than Cursor, citing Cursor's lagging and freezing.
- Others prefer Cursor for its rollback features and agent performance, though acknowledge AI programming's remaining challenges.
- MCP Combinations become hype: Users experiment with various MCP (Model Context Protocol) server combinations to enhance AI coding agents like Cursor, with Supabase MCP highlighted.
- Some users suggest MCPs may be overhyped, noting instances of agents over- or under-utilizing MCPs, suggesting a need for clearer instructions.
- 3D Integration Frustrates AI Coders: A user struggles to integrate a 3D model (FBX format) into a three.js project using Claude, facing issues with FBXLoader.
- The limitations of AI in handling 3D designs become clear, with suggestions to switch to GLTF format and simplify tasks.
aider (Paul Gauthier) Discord
- DeepSeek V3-0324 Beats R1?: The Aider community is excited about the new DeepSeek V3-0324 release, suggesting it outperforms R1 in coding and front-end tasks, despite lacking chain of thought.
- Members highlight its strengths in coding and math compared to previous versions, drawing comparisons to Sonnet 3.5 in benchmarks, while also noting its cost-effectiveness.
- Aider Tames Sonnet's Over-Eagerness: Paul Gauthier reveals he has managed to get Aider to mitigate Sonnet 3.7's over-eager behavior by adding a line to the prompt to chill out; this is now available in the main branch.
- He encourages users to provide feedback on this adjustment based on their coding sessions.
- Aider Gains New Homepage: Paul Gauthier announces the launch of Aider's new homepage at aider.chat, showcasing compatibility with models like Claude 3.7 Sonnet, DeepSeek R1 & Chat V3, OpenAI o1, o3-mini & GPT-4o, and support for over 100 code languages.
- This update offers an improved introduction for new users and a central hub for resources.
- Aider's Context Command Streamlines Chats: Paul Gauthier introduces an experimental `/context` command in Aider that automatically sets up the chat context, working best with Sonnet 3.7, R1, and o3-mini.
- This new command enhances user experience by intelligently identifying and adding relevant files to the chat.
- Community Curates LLM Contexts: A member announces the launch of ctxs.ai/weekly, a site dedicated to collecting aider conventions, prompts, and LLM-oriented documentation snippets.
- The goal is to create a useful resource for the aider community, and the member is actively soliciting feedback on how to improve the site.
Nous Research AI Discord
- LCPP Context Length Baffles: Users found that even with the context length set to 100, LCPP (llama.cpp) still tries to allocate 180GB of RAM, leading to VRAM exhaustion.
- Suggestions include Attention overriding the assigned context length, missing ROPE-specific arguments, or using Q8 quantization.
- Deepseek V3 Mirrors Sonnet 3.7: Deepseek V3 0324 shows as much variation as Sonnet 3.7, suggesting shared advancements in their architectures, viewable in this image.
- One user even called it a huge update with Sonnet-level code creativity and a potential base for R2.
- Transformers Ditch Normalization: Inspired by the Transformers without Normalization paper, a member replaced normalization with tanh (a sketch of the idea follows below).
- The discussion then focused on removing experts at inference and its effects on smaller weights.
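The summary doesn't give the member's exact formulation, but a minimal sketch of a tanh-for-normalization swap in that spirit (the parameterization is illustrative, not the paper's code) looks like this:

```python
import torch
import torch.nn as nn

class DynamicTanh(nn.Module):
    """Drop-in replacement for LayerNorm: a learnable-scale tanh instead of
    statistics-based normalization, in the spirit of the paper's DyT module."""
    def __init__(self, dim: int, alpha_init: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), alpha_init))  # input scale
        self.gamma = nn.Parameter(torch.ones(dim))                # per-channel gain
        self.beta = nn.Parameter(torch.zeros(dim))                # per-channel bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # tanh squashes outliers, mimicking normalization's taming of activations
        return self.gamma * torch.tanh(self.alpha * x) + self.beta

x = torch.randn(2, 16, 512)
print(DynamicTanh(512)(x).shape)  # torch.Size([2, 16, 512])
```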
- MathFusion Supercharges LLM Math: MathFusion improves mathematical reasoning in LLMs via cross-problem instruction synthesis, enhancing models like DeepSeekMath-7B, Mistral-7B, and Llama3-8B (more on MathFusion).
- This method creates the MathFusionQA dataset, which fine-tunes models and boosts benchmark accuracy with minimal extra data.
- Qwen3 to support CPU inference: The transformers library PR#36878 shows that Qwen3 support is being added, meaning that the models will soon be supported by the transformers library.
- A user speculated that Qwen3-15B-A2B could be a good candidate for CPU inference due to its size.
OpenAI Discord
- Sam Altman Teases GPT-5 Release: Despite the absence of an official announcement, Sam Altman confirmed that GPT-5 will launch this year, leading to speculation it could arrive in the first half to compete with R2 or Llama-4.
- Members on the OpenAI Discord server suggested that an unannounced API might also be imminent.
- GPT-4o: The Model That Converted a User: A user finds GPT-4o to be such a strong daily driver that they rarely switch models, only using other models such as 4.5, o1, o3 when the 4o messages run out or for important or unsolved problems.
- The user also claims to have built an "engine" that recovered a 400+ turn chat and continues past 500 turns retaining context with no drift or hallucinations, all through the default prompt.
- Many-Shot Prompting Boosts Multimodal Model Muscle: A research paper (MANY-SHOT IN-CONTEXT LEARNING IN MULTIMODAL FOUNDATION MODELS) suggests that closed models like GPT-4o and Gemini 1.5 Pro benefit significantly from many-shot demonstrations of up to ~2,000 examples, whereas open-weight models do not show the same benefit (see the sketch below this item).
- The paper notes that large multimodal foundation models like GPT-4o and Gemini 1.5 Pro show significant performance improvements when provided with many-shot demonstrations compared to few-shot examples.
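Mechanically, many-shot prompting is just the few-shot template scaled up until the context window becomes the binding constraint; a trivial sketch (the paper's exact prompt format is not reproduced here):

```python
# Build a many-shot prompt from labeled examples; n_shots in the hundreds
# or thousands is what distinguishes "many-shot" from "few-shot".
def build_many_shot_prompt(examples, query, n_shots=2000):
    shots = [f"Input: {x}\nLabel: {y}" for x, y in examples[:n_shots]]
    return "\n\n".join(shots + [f"Input: {query}\nLabel:"])

examples = [(f"sample {i}", f"class {i % 3}") for i in range(5000)]
prompt = build_many_shot_prompt(examples, "sample 42")
print(prompt.count("Label:"))  # 2001: 2000 demonstrations plus the query
```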
- Run an F1 Team Powered by GPT-4o: The open source project FormulaGPT (github repo) simulates head-to-head races between LLM-powered teams that think contextually and adaptively by continuously reasoning, strategizing, and making nuanced decisions.
- Viewers can challenge advanced language models in Player vs. AI Mode, or watch the best AI models battle each other in AI vs. AI Mode while observing detailed AI reasoning behind each pit stop, tire change, or overtaking maneuver.
- Avoid Turnitin AI Detector, If You Dare: A member sought advice on avoiding Turnitin AI similarity detection for a report reusing their company's business model, which violates Turnitin's ToS.
- Others suggested it looked like spamming appeals to cheat homework and recommended using humanize AI tools.
OpenRouter (Alex Atallah) Discord
- OpenAI's o1-pro: Gucci-Level Pricing?: Users reacted strongly to OpenAI's o1-pro API pricing at $150/M input tokens and $600/M output tokens, with one calling it GucciAI due to its high cost.
- Another member joked that the API's slowness might be a deliberate feature to prevent overspending given compute constraints.
- Image Generation MIA on OpenRouter: A user inquired about using Gemini's image generation with the gemini-2.0-flash-exp model, but was informed that image generation is not yet supported on OpenRouter.
- The team indicated that while image generation is on their roadmap, there are currently no short-term plans to support image models like Flux.
- Lambda Endpoints Plagued by 404s: Multiple users reported encountering 404 'no endpoint found' errors when attempting to use Lambda models, despite Lambda's status page showing full operational status.
- The community offered suggestions, and some users confirmed that the Llama 3.3 70B Instruct | Lambda model was functioning correctly for them.
- DeepSeek R1 challenges OpenAI o1: Members noted that the DeepSeek R1 model, a 671B parameter model with 37B active during inference, performs comparably to OpenAI's o1 but is open-sourced and available under the MIT license.
- Its availability under the MIT license allows for commercial use.
- Claude 3.7 Sonnet Sputters with Overload Errors: Users reported frequent overload errors when using Claude 3.7 Sonnet, leading to cut-off responses and charges for input tokens.
- One user suggested a retry strategy or switching to Gemini 2.0 Pro as an alternative, acknowledging Claude's strength in translations.
LM Studio Discord
- LM Studio Lacks NPU Support: Users have reported that NPUs are not yet supported in LM Studio, but Ryzen AI support exists in version 0.3.11.
- For those with limited resources like 2GB VRAM, consider using Gemma 3 1B with Q6 or Q8 quantization and the CUDA runtime for improved performance.
- KV Cache Quants Slash VRAM Needs: Users recommend leveraging KV cache 8-bit quants to diminish the memory footprint when operating models with extensive context windows, like 30k tokens (a back-of-the-envelope calculation follows below).
- Keep in mind that 12GB of VRAM might prove inadequate for a 32B model, suggesting that Phi-4 or Qwen2.5 14b could serve as compelling alternatives.
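For intuition, a quick sketch of the arithmetic; the layer and head counts below are illustrative of a 32B-class model with grouped-query attention, not any specific architecture:

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem):
    """KV cache size: two tensors (K and V) per layer, one vector per
    KV head per position, at the given element width."""
    total = 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem
    return total / 1024**3

# Hypothetical config: 64 layers, 8 KV heads, head_dim 128, 30k context.
for name, width in [("fp16", 2), ("q8", 1)]:
    print(name, round(kv_cache_gib(64, 8, 128, 30_000, width), 2), "GiB")
# fp16 -> ~7.32 GiB for the cache alone at 30k tokens; q8 halves that,
# which is exactly why 8-bit KV quants matter on a 12GB card.
```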
- Multi GPU Gets In-App Management: Enthusiasts are raving about LM Studio controls that allow the user to select the GPU that the model will load onto, available in the latest beta build.
- Multiple users confirmed that Multi GPU is supported out of the box with the latest beta build of LM Studio.
- Google Coral TPUs a Flop for AI: The Google Coral dual TPU is inadequate for AI use as it does not have any onboard memory to store data.
- One user with an 8060s also inquired about thermal and power headroom for the Framework Desktop.
- 4060ti: Inexpensive Inference Sweet Spot: The RTX 4060 Ti with 16GB of VRAM stands out as a budget-friendly pick for AI inference, clocking in around $500 USD/EUR.
- A user cautioned that AMD cards are not as well optimized for inference workloads, and that Nvidia's 5000 series may melt.
Yannick Kilcher Discord
- VPN code hijacks OpenAI site?: Users reported seeing `<veepn-guard-alert>` and `<veepn-lock-screen>` tags on OpenAI's website, suggesting a VPN injection, but it was likely code injected by their own VPN sm0kywu.github.io/Amodal3R.
- It appears that this user was simply using a VPN.
- cuOpt Solves Linear Programming at NVIDIA: NVIDIA® cuOpt™ is a GPU-accelerated optimization AI microservice that excels in Mixed Integer Linear Programming (MILP), Linear Programming (LP), and Vehicle Routing Problems (VRP) according to docs.nvidia.com.
- It appears this microservice is well received and performant at NVIDIA.
- CUDA Python is the new black?: Members debated whether it is truly the year of CUDA Python as mentioned by blelbach on X, with some asserting that Python is sufficient for GPU programming.
- Others mocked modern Python programmers, linking a YouTube video titled Modern Python Programmers.
- MoEs Training Stabilizes?: One user claimed that MoEs are unstable to train, but another user countered that they haven’t been unstable to train for two years and are now about the same as dense networks.
- The stability is largely due to better kernels and dropless token routing, solving issues like numerical instability and expert collapse.
- DeepSeek-V3 quietly drops: Members noted that DeepSeek released their DeepSeek-V3-0324 model, and a blog post reused their diagrams.
- The model boasts 685B parameters and offers various tensor types like BF16, F8_E4M3, and F32, with links to finetunes and quantizations.
GPU MODE Discord
- Flash Attention (FA) Debugging: In a discussion about understanding Flash Attention (FA), a member suggested coding it up and profiling/debugging it, noting that hands-on implementation had aided their understanding of normal attention and could do the same for Flash Attention.
- One member ran into issues implementing Flash Attention 1 in Triton: it works with TRITON_INTERPRET=1 but has a few mismatched elements on CUDA; after increasing rtol and atol, the tests passed (a sketch of this kind of check follows below).
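A sketch of that kind of numerical check, with `flash_attention_triton` standing in for the member's kernel and PyTorch's built-in SDPA as the reference:

```python
import torch

def check_kernel(flash_attention_triton, B=2, H=4, N=128, D=64):
    """Compare a custom attention kernel against a reference implementation."""
    q, k, v = (torch.randn(B, H, N, D, device="cuda", dtype=torch.float16)
               for _ in range(3))
    ref = torch.nn.functional.scaled_dot_product_attention(q, k, v)
    out = flash_attention_triton(q, k, v)
    # fp16 accumulation reorders floating-point sums, so bit-exact equality
    # is impossible; loosen rtol/atol rather than comparing exactly.
    torch.testing.assert_close(out, ref, rtol=2e-2, atol=2e-2)
```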
- RTX 5080 Gets CUDA 12.8: A developer released a patch enabling full CUDA 12.8 + PyTorch 2.5.0 compatibility with the Blackwell / sm_120 architecture for the RTX 5080, providing a GitHub repo with scripts, diffs, and instructions.
- It's also confirmed that WMMA instructions are "wrappers" that compile directly to HMMA/IMMA/QMMA instructions in SASS, similar to how MMA instructions function, as shown on the CUDA Godbolt.
- Hopper's Swizzle Unpacked: The documentation's description of the 64B swizzle in the Hopper architecture confuses many readers, but it's clarified to be a 64B (bytes) swizzle where each square is 128b (bits), which translates to an 8x64 tile for 8-bit dtypes and an 8x32 tile for 16-bit types (worked arithmetic below).
- A member is seeking ROCm experts to help implement a row-row bank conflict-free swizzle for the tilelang HIP backend.
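The tile sizes follow directly from the byte math in the clarification above; a quick worked version:

```python
# 64-byte swizzle pattern made of 128-bit (16-byte) "squares",
# repeating every 8 rows (per the clarification above).
SWIZZLE_BYTES, SQUARE_BITS, ROWS = 64, 128, 8

for dtype, elem_bytes in [("int8/fp8", 1), ("fp16/bf16", 2)]:
    elems_per_square = SQUARE_BITS // 8 // elem_bytes  # 16 or 8 elements
    tile_width = SWIZZLE_BYTES // elem_bytes           # 64 or 32 elements
    squares_per_row = tile_width // elems_per_square   # 4 either way
    print(f"{dtype}: {ROWS}x{tile_width} tile "
          f"({squares_per_row} squares of {elems_per_square} elems per row)")
# int8/fp8:  8x64 tile (4 squares of 16 elems per row)
# fp16/bf16: 8x32 tile (4 squares of 8 elems per row)
```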
- Oxford U creates AI Fellowships: The University of Oxford has a new opening for a research fellow (postdoc level or equivalent experience) to work on AI / RL in games and neuroimaging with Rui Ponte Costa, at a salary of £100k+.
- This involves developing an AI-powered technology that can infer the contributions of specific brain regions to behavior by analyzing gameplay data, enabling non-invasive diagnosis and treatment of neurological disorders.
- Flash Attention's Contiguous Memory: In Flash Attention, tensors are stored as (batch_size, N, num_heads, d), which are contiguous in d (typically > 64), enabling efficient global memory coalescing where each thread loads 16B of data (a quick illustration follows below).
- This layout also makes the code easier to reason about, so LLMs can be used to understand kernel code, explaining simple concepts and variable states at specific places in the tensors.
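A quick illustration of the layout claim (the shape is arbitrary; the 16B-per-thread figure comes from the discussion above):

```python
import torch

B, N, H, D = 2, 1024, 16, 64
x = torch.randn(B, N, H, D, dtype=torch.float16)
print(x.stride())         # (1048576, 1024, 64, 1): innermost dim d is contiguous
print(x.stride(-1) == 1)  # True: adjacent threads reading along d hit adjacent
                          # addresses, so global-memory loads coalesce

elems_per_16B = 16 // x.element_size()  # 8 fp16 elements per 16-byte load
print(elems_per_16B)      # a 32-thread warp thus covers 32 * 8 = 256
                          # contiguous fp16 values (512 bytes) per access
```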
Interconnects (Nathan Lambert) Discord
- Nvidia Engineers Mamba-Transformer Hybrid: Nvidia introduced the Nemotron-H family of models, including a series of 8B and 47-56B models that are hybrid Mamba-Transformer models, offering improved inference speed, according to their research.
- The model is noted for improvements in speed compared to other models.
- Mistral 24B Roars Back into Favor: The release of Mistral 24B has been received as a major highlight due to its strength and accessible base model, further aided by new open releases under the Apache 2.0 license.
- A member stated, "Mistral 24B is probably one of the greatest releases in the last months, incredibly strong model and you have access to the base model as well."
- R1-Zero Training's Length Bias Exposed: An analysis reveals that using row mean in R1-Zero-like training introduces a bias, favoring shorter correct responses and longer incorrect ones, as detailed in a paper and accompanying code (a toy illustration follows below).
- Switching to all mean yields comparable performance without increasing length, and raises questions about plots that show increasing reasoning length correlating with increased capability.
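A toy illustration of the bias, in terms of the effective per-token gradient weight each aggregation implies; the paper's exact objective is not reproduced here:

```python
import numpy as np

def per_token_weights(lengths, mode):
    """Effective weight each token receives under the two aggregations."""
    if mode == "row_mean":  # mean within each response, then across responses
        return [np.full(L, 1.0 / (L * len(lengths))) for L in lengths]
    if mode == "all_mean":  # one mean over every token in the batch
        total = sum(lengths)
        return [np.full(L, 1.0 / total) for L in lengths]

short_resp, long_resp = 10, 100  # e.g., a short correct and a long incorrect response
for mode in ("row_mean", "all_mean"):
    w = per_token_weights([short_resp, long_resp], mode)
    print(mode, w[0][0], w[1][0])
# row_mean: 0.05 vs 0.005 -- tokens of the short response get 10x the update,
#           and tokens of long responses are barely touched, reproducing the
#           short-correct / long-incorrect length bias.
# all_mean: ~0.0091 vs ~0.0091 -- every token weighted equally, no length bias.
```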
- China Plots Open-Source AI Blitz: China plans to flood the market with open-source AI models to commoditize AI software and boost its hardware sales, potentially shaking up US tech dominance, according to this tweet.
- The release of DeepSeek models temporarily knocked ~$1T off US tech market caps, highlighting the potential impact of Chinese AI.
- Browser Automation Scales Up with Infinibranch: Morph Cloud's Infinibranch Browser was suggested as a possible solution to help scale browser-use agents, improving the success rate to approximately 80% on tasks like finding Amazon links for a list of books.
- Traditional web scraping methods have become obsolete because of JavaScript-heavy single page applications, CAPTCHAs and sophisticated bot detection.
Latent Space Discord
- Gemini Updates Get Deep Dive: Gemini's Dave Citron joined @OfficialLoganK on the Release Notes podcast to discuss recent updates, including personalization, Canvas, Audio Overviews, and Deep Research as reported by Google Gemini App.
- The discussion covered topics from recent app launches to the future of personalization in the Gemini app, including insights into user data and privacy considerations.
- Claude Code Gains Eight New Features: Anthropic launched eight new features for Claude Code to help developers build faster and smarter, documented on their engineering blog.
- Features include a new think tool, leading to discussion on its implementation and value, with some likening it to Chain of Thought prompting.
- A16Z Explores Model Context Protocol (MCP): A16Z published a deep dive into Model Context Protocol (MCP), exploring its potential as a standard interface for execution, data fetching, and tool calling in AI models as APIs are the internet's first great unifier A Deep Dive Into MCP and the Future of AI Tooling | Andreessen Horowitz.
- The post examines the use cases of MCP, the challenges, and how it changes the way AI interacts with tools, noting that APIs were the internet’s first great unifier, but AI models lack an equivalent.
- Roboflow Unleashes RF-DETR for Real-Time Object Detection: Roboflow announced RF-DETR, a fully open-source real-time object detection model under the Apache 2.0 license available on GitHub.
- RF-DETR achieves SOTA performance with over 60 mAP on COCO, with base and large models at 29M and 128M parameters respectively.
- Swyx Engineers the Future of Agents: Swyx launched a new talk and essay on Agent Engineering, highlighting the reasons for going all in on Agents at @aiDotEngineer.
- The discussion defines Agents (thanks to @simonw) and elaborates on the Six Elements of Agent Engineering, examining how Agents could be ChatGPT's route to reaching 1 billion monthly active users (MAU).
Notebook LM Discord
- Mobile Study Participants Needed: The team seeks participants for a study on mobile use cases, encouraging individuals to share insights to enhance understanding of how to use the tool on mobile.
- The team also announced upcoming AI model updates, with more details to be shared soon.
- Mindmaps Emerge Gradually in NotebookLM: A user noted the absence of mindmaps in their NotebookLM, while another confirmed having them in the free version, indicating a staggered rollout of the feature.
- The mind map feature gets mixed reviews, needing constant regeneration to update and lacking details beyond the topic.
- NotebookLM Powers Extensive Research Reports: A user employs NotebookLM for research, crafting detailed reports to help people understand situations, focusing on local and regional news.
- The user also shared a link to a podcast episode discussing the legal consequences of a 911 prank call 911 Prank Call: The Felony Consequences.
- NotebookLM as HR Policy Central: A user explored using NotebookLM as a central hub for HR policies, employee handbooks, and new employee onboarding.
- Though the concept is promising, the user noted the answers weren't always accurate and wondered about effective information organization strategies.
- Mind Map Pixelation Solved with Zooming: A member suggests zooming in on tabs before downloading a Mind Map to enhance output quality and resolve pixelation issues.
- The member touted the crazy context window and low hallucination rates, even cancelling their subscriptions to ChatGPT and Claude.
Eleuther Discord
- Virtual Tester Predicts Model Performance: A member proposed a virtual testing environment to predict AI model viability before training, potentially saving resources and accelerating innovation; the simulator aims to determine if a model has a realistic chance of working or is doomed to fail early on.
- Others noted that testing new architectures at small scale is already relatively inexpensive, costing around $5 to train an L6D512 model on a 3090 for a day.
- EleutherAI Evaluates Evaluation Methods: A member detailed evaluation methods for EleutherAI in a new blog and set up an MkDocs site for easier navigation; they also await review on this PR.
- The contributor was cautioned about using AI to generate PR content, emphasizing the need to vet contributions to avoid adding spam.
- VectorAdam claims rotation equivariance: VectorAdam modifies the second moment update to be the square of the vector norm per gradient vector, addressing coordinate-system bias in Adam, potentially improving rotation equivariance.
- It was noted that VectorAdam is not similar to Adafactor, but more like a blocked approximation with block size = hidden dim.
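For intuition, here's a minimal PyTorch sketch of that update rule — a reading of the paper's description, not any official implementation — assuming parameters laid out as N vectors of dimension D:

```python
import torch

def vectoradam_step(param, exp_avg, exp_avg_sq, step,
                    lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    g = param.grad                                        # shape (N, D)
    # First moment: standard per-coordinate EMA, exactly as in Adam.
    exp_avg.mul_(betas[0]).add_(g, alpha=1 - betas[0])
    # Second moment: EMA of the squared *vector norm*, one scalar per vector,
    # which removes Adam's per-coordinate (rotation-dependent) scaling.
    sq_norm = g.pow(2).sum(dim=-1, keepdim=True)          # shape (N, 1)
    exp_avg_sq.mul_(betas[1]).add_(sq_norm, alpha=1 - betas[1])
    # Bias-corrected step; the denominator broadcasts over the D coordinates.
    m_hat = exp_avg / (1 - betas[0] ** step)
    v_hat = exp_avg_sq / (1 - betas[1] ** step)
    param.data.add_(-lr * m_hat / (v_hat.sqrt() + eps))
```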
- MechInterp faces backlash for being outside academia: Members discussed that there seems to be an academic 'backlash' to the 'mechinterp' brand because so much of it happens outside traditional academic channels, and academics are resistant to the paradigm.
- A member found that the first token to trigger an activation is holocaust but it's not the token with the strongest activation, and wondered if neuron activation might be context specific.
- Recursive Design Trumps GANs, CNNs, and RL: A member introduced a novel diagram using a recursive design, distinguishing it from traditional GANs; this implementation emphasizes structural organization over sequential processing, leveraging CNNs for filtering and RL for refining responses.
- Another member is drafting a PR to update the evaluation logic to `lm_eval==0.4.8`, the latest version, referencing the Evals PR.
HuggingFace Discord
- HF Agents Course Embraces New Frameworks: The Hugging Face Agents Course now has integrations for LlamaIndex, LangChain, and smolagents, offering learners diverse approaches to agent frameworks, as noted in this tweet.
- Members taking the Agents course noted that LangGraph is rigid, which helps guide their process, when compared to building with smolagents.
- pdf2notes Converts PDF Notes Effortlessly: Pdf2Notes converts PDFs into organized notes using LlamaParse and Llama-3.3-70B, also utilizing DeepMind's Gemini 2 Flash for multi-modal parsing, wrapped in a Gradio and FastAPI framework.
- A member asked if pdf2notes can operate 100% locally without external APIs, raising concerns about needing subscriptions for Gemini and Groq.
- SpatialLM takes on 3D Data: SpatialLM, a 3D large language model designed to process 3D point cloud data, has been released on Hugging Face at manycore-research/SpatialLM-Llama-1B.
- It generates structured 3D scene understanding outputs and can be further explored via the project website and GitHub repository.
- InferenceClient API throws Authentication Errors: A user reported a 403 Forbidden error when attempting to list deployed models using the `InferenceClient` API, even with read-only tokens configured to allow calls to Inference Providers.
- The error indicates insufficient permissions to call Inference Providers, and a user posted a link with the same error.
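For reference, a minimal sketch of the failing call (the token value is a placeholder; `list_deployed_models` is the `huggingface_hub` listing method at the time of writing):

```python
from huggingface_hub import InferenceClient

client = InferenceClient(token="hf_...")   # fine-grained, read-only token
# Raises 403 Forbidden if the token lacks permission to call
# Inference Providers, even when it can read repositories.
models = client.list_deployed_models()
print(models)
```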
MCP (Glama) Discord
- K8s Required for MCP Prompt Testing: A Kubernetes setup is required to test MCP prompts, such as those found in this file and this test.
- An alternative implementation with prompts is available here for managing Electric Vehicle charging stations.
- Microsoft releases official C# SDK for MCP: Microsoft has released a new official C# SDK for Model Context Protocol servers and clients, available here.
- The SDK gives .NET developers official building blocks for MCP servers and clients; separately, Vercel AI SDK 4.2 brings MCP support to JavaScript and TypeScript apps, integrating into web frameworks like Next.js and Svelte.
- Zapier Integrates with MCP: Zapier has released an MCP server, providing access to over 8,000 integrations for AI assistants to interact with various apps.
- This integration enables AIs to perform real-world tasks such as sending messages, managing data, scheduling events, and updating records, expanding their capabilities beyond text generation.
- MCPwizard eases Server Creation: A member introduced mcpwizard, a CLI tool to simplify creating and deploying MCP servers, highlighting features like initializing projects and adding custom tools to Claude assistants.
- The tool's GitHub repo was also shared for community feedback and contributions.
- Google Sheets MCP Server Enables Direct Editing: A member built a Google Sheet MCP server, allowing Claude to directly edit spreadsheets, streamlining data handling and formula adjustments as mentioned in this tweet.
- The code can be found here.
Nomic.ai (GPT4All) Discord
- Prompting Language Models in Specific Languages: Members discussed that to make language models respond in a specific language (e.g. German), it is best to write the system message in that language to avoid triggering "Im Kontext Lernen" (in-context learning).
- It was further suggested that avoiding negative sentences can improve results, with a recommendation to rephrase instructions to use active verbs instead.
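A minimal sketch of the advice in an OpenAI-style message list (the exact wording is illustrative):

```python
# System message written in German itself, so the model answers in German;
# note the active phrasing ("Antworte ... auf Deutsch") rather than a
# negation ("Antworte nicht auf Englisch").
messages = [
    {"role": "system",
     "content": "Du bist ein hilfreicher Assistent. Antworte ausschließlich auf Deutsch."},
    {"role": "user", "content": "Fasse den folgenden Text zusammen: ..."},
]
```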
- Mistral Model Versions Clarified: It was mentioned that Mistral Nemo is a 12b model and Mistral 24b is Mistral 3 or Mistral 3.1, with discussion around specific model details for projects.
- Confusion arose around identifying the exact model, with one member emphasizing the need for precise model information to avoid issues.
- GPT4All's LocalDocs Mysteriously Vanish: A user reported that their entire catalog of local docs disappeared for no apparent reason, prompting discussion about potential causes such as changes to the install folder or lack of admin rights.
- Members recommended backing up the localdocs.db file and the original documents to prevent data loss, and suggested that a Windows 11 update might have caused the issue by messing with drive letters.
- LLMs Consider Medical Office Automation: Members discussed the potential of using local LLMs in a medical office setting to help doctors create reports and assist with treatments, with a focus on the system learning from past dictated notes.
- However, it was cautioned that LLMs may not be suitable for handling financial or medical data due to the risk of confabulation and the need for precise information.
- GPT4All Remains Blind: A member asked if any models that GPT4All can run have vision capabilities, and it was confirmed that GPT4All does not support vision capabilities.
- Alternative tools like LM-Studio were suggested as options for vision-related tasks.
Modular (Mojo 🔥) Discord
- Open APIs Pave Path for Portability: When exploring high-performance software solutions, using open and portable APIs such as OpenCL, OpenMP, OpenACC, Vulkan’s Compute API, and SYCL is a good starting point.
- POCL was pointed to as an academic project with related papers.
- Democratizing AI Compute Lowers GPU Costs: Chris Lattner's series, 'Democratizing AI Compute', underscores the importance of better hardware utilization to reduce the need for expensive GPUs.
- The series includes articles on CUDA, OpenCL, and AI compilers (TVM and XLA).
- MAX Platform Inquiries: A new user inquired about modifying the max/pipeline directory and testing changes within the MAX Platform via the pixi.toml file.
- Specifically, they were curious about altering the max-pipeline without downloading it as a dependency.
- Mojo's Formatting Tool Rivals Black and fmt: Mojo incorporates a built-in formatting tool, `mojo format`, akin to `Black` in Python or `fmt` in Rust, for code formatting.
- Meanwhile, GPU support for Windows is difficult because the Windows compiler toolchain is a pain to work with.
LlamaIndex Discord
- AGNCY Initiative Seeks Agentic Standard: Luke is spearheading AGNCY, an initiative focused on forging an open standard for agentic interactions.
- The project aims to provide a robust framework for developing more effective and interoperable AI agents.
- Deepseek and LlamaIndex Build Smarter RAG: Akshay Pachaar details a new project integrating Deepseek AI to create a RAG app using LlamaIndex for orchestration, Deepseek AI R1 for inference, Ollama to locally serve R1, and Streamlit for the UI; more details here.
- This is intended to demonstrate the power of combining different tools to build sophisticated applications.
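A minimal sketch of that stack (the Streamlit UI is omitted; the model tag, data path, and embedding model are placeholders):

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# R1 served locally via Ollama; a local embedding model keeps the RAG
# pipeline fully offline.
Settings.llm = Ollama(model="deepseek-r1:7b", request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)
print(index.as_query_engine().query("Summarize the key findings."))
```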
- Timeouts Break Agent Workflows: A member reported that their agent workflow was crashing because of unhandled timeout errors with the OpenAI endpoint.
- It was suggested to catch `WorkflowRuntimeException` or `Exception` instead of `WorkflowTimeoutError` to resolve the issue.
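A minimal sketch of that suggestion, assuming an async workflow object (the retry/backoff policy is illustrative):

```python
import asyncio

async def run_with_retry(workflow, retries=2, **kwargs):
    for attempt in range(retries + 1):
        try:
            return await workflow.run(**kwargs)
        except Exception:  # broader than WorkflowTimeoutError, per the suggestion
            if attempt == retries:
                raise
            await asyncio.sleep(2 ** attempt)  # simple exponential backoff
```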
- Members Ponder Function Calling in Multi-Agent: Members are contemplating whether triggering single agents via function calling could displace program-wide backoff mechanisms in multi-agent systems.
- The central question is whether these two setups might achieve the same functionality in certain scenarios, potentially streamlining system architecture.
- Crafting the Interview Grindset: A member is building a local AI using Llama 3.2, Sonnet 3.7, and Dolphin blended into a 16B model with RAG and custom fine-tuning.
- He is trying to get his AI to apply to AI/tech companies and pass interviews, and has experience in face tracking, Blender, Unity, PowerShell, and TTS.
Cohere Discord
- Command-R-Plus Powers Molecular AI Assistant: An AI assistant, powered by Cohere's command-r-plus, is being used to build tools for structural biology with a MolStar molecular viewer (https://ai.doi.bio).
- The site supports a 'load' command, demonstrated by saying 'Show me 7zzz' to load PDB entries into the viewer.
- Cohere Clears Up Chat Security Policies: A member inquired about data retention and security policies for Cohere's chat feature, asking if data is used for model training.
- A Cohere team member linked the privacy policy, data usage policy, and security policy, noting that users can control data settings in their dashboard.
- API Spamming Suspected as SSL Error Culprit: A member reported experiencing SSL errors when rapidly sending requests to the API, suggesting it might be due to spamming despite the Python ssl module being properly installed.
- Another member proposed the issue might stem from untrusted server certificates, and others pointed out that API rate limits usually return a 429 error code rather than an SSL error.
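A minimal sketch that separates the two failure modes (the endpoint and payload are placeholders):

```python
import time
import requests

def post_with_backoff(url, payload, tries=5):
    for attempt in range(tries):
        try:
            resp = requests.post(url, json=payload, timeout=30)
        except requests.exceptions.SSLError as err:
            # TLS/certificate problem: retrying won't help.
            raise RuntimeError("SSL failure, not rate limiting") from err
        if resp.status_code != 429:
            return resp
        time.sleep(2 ** attempt)  # rate limited: back off before retrying
    return resp
```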
- vnc-lm Launches RAG-Enabled Discord Bot: A member released a new version of their Discord bot, vnc-lm, featuring a RAG pipeline that augments prompts with data from Wikipedia and DuckDuckGo.
- The bot adds approximately 500 tokens to each prompt, appending five chunks of sourced information to improve the model's context, with code available on GitHub.
- vnc-lm Now Supports ALL LLMs via Docker: The updated Discord bot now supports all popular local and hosted large language model APIs, including Cohere, enabled with Docker.
- With the new release, users can easily edit messages and get new responses within Discord.
Torchtune Discord
- DeepSeek-V3 Drops Without a README: Deepseek released DeepSeek-V3 without a proper readme, accessible on Hugging Face, prompting humorous reactions.
- Despite the lack of documentation, a playground is available, allowing users to experiment with the model.
- Data Quality still Tortures AI Engineers: Despite years of research, defining and achieving good data remains a challenge for AI labs, even after the recognition of datasets like fineweb and lima.
- A member expressed frustration over the persistent lack of effective PDF extraction tools.
- LlamaExtract Tool Structures Documents: LlamaIndex launched LlamaExtract, a tool for structuring complex documents using genAI-native agents.
- It adapts the latest models to accurately structure documents like financial reports and resumes, as per a tweet from Jerry Liu.
- GRPO LoRA Scores Surprisingly High: The GRPO LoRA 3B single device achieves 54% on GSM8K, as shown in this pull request.
- It performed better than expected on novel questions, despite one error where it added an extraneous +2 to a calculation.
- CUDA Graphs Compress GPU Operations: Members discussed CUDA graphs, which capture a whole bunch of GPU operations as a graph and launch them as a single operation.
- This cuts the CPU-side overhead of launching CUDA operations, which in turn reduces GPU idle time.
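A minimal PyTorch sketch of the pattern (shapes and the captured ops are placeholders):

```python
import torch

x = torch.randn(1024, 1024, device="cuda")
out = torch.empty_like(x)

# Warm up on a side stream so capture starts from a clean state.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        out.copy_(torch.relu(x @ x))
torch.cuda.current_stream().wait_stream(s)

# Capture the op sequence once...
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    out.copy_(torch.relu(x @ x))

# ...then refresh inputs in place and replay the whole graph as one launch.
x.copy_(torch.randn_like(x))
g.replay()
```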
DSPy Discord
- DLCoT Optimizer Trims Tokens: The new DLCoT (Deconstructing Long Chain-of-Thought) Optimizer slashes token usage by 70-90% while maintaining or improving accuracy across benchmarks, available in pull request #8000.
- It enhances chain-of-thought reasoning by segmenting CoT content, removing redundant paths, filtering incorrect chains and reconstructing coherent output, while working with existing DSPy optimizers like BootstrapFewShot.
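The DLCoT optimizer itself is still in review, so here's only a minimal sketch of the existing-optimizer path it composes with (the signature string, metric, and one-example trainset are illustrative):

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

class CoTQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answer = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.answer(question=question)

def exact_match(example, pred, trace=None):
    return example.answer.strip().lower() == pred.answer.strip().lower()

trainset = [dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question")]

# Bootstrap few-shot demos for the chain-of-thought module.
compiled = BootstrapFewShot(metric=exact_match).compile(CoTQA(), trainset=trainset)
```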
- DSPy Inspires Creativity Optimizations: Members discussed using DSPy for creative content generation by optimizing prompts and using a good judge, pointing to resources like PAPILLON and Agentic Reward Modeling.
- The discussion underscored the need for example inputs but not necessarily summaries (labels) if a judge/metric can assess summaries without a reference.
- Granular Feedback Arrives Via Prediction: Achieving granular feedback with Refine, where specific checks over an output provide targeted feedback, is coming soon.
- Version 2.6.15 will enable returning `dspy.Prediction(score=...., feedback=....)` to offer fine-grained feedback to the module.
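A hedged sketch of what that could look like (treat the exact `Refine` and reward signatures as tentative until 2.6.15 ships):

```python
import dspy

def reward_fn(args, pred):
    ok = len(pred.answer) < 280
    # Fine-grained feedback: a score plus targeted guidance, not a bare float.
    return dspy.Prediction(
        score=1.0 if ok else 0.3,
        feedback="Good length." if ok else "Answer is too long; compress it.",
    )

refined = dspy.Refine(
    module=dspy.ChainOfThought("question -> answer"),
    N=3, reward_fn=reward_fn, threshold=0.9,
)
```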
- MCP Standard Explores Retrieval: Members explored expanding the Model Context Protocol (MCP) standard to retrievers/retrieval-augmented generation.
- They are discussing a shared schema for retrieval results and methods to exchange documents and embeddings to streamline data-driven workflows and simplify combining multiple models and data sources.
tinygrad (George Hotz) Discord
- Dataset Origins Discovered: A member located the `datasets/sops.gz` dataset within the repo's extra directory, which is used in `speed_compare_cuda_ptx`.
- The dataset is generated via the generate_dataset.sh script within the same directory.
- CUDA Port Configuration Clarified: When asked about porting Tinygrad to CUDA GPU, a member provided a link to the README.md file, showcasing the project's supported backends.
- This indicates that CUDA support information is available within the project's documentation.
- Agenda Alert: Meeting #63 Topics: Meeting #63's agenda includes company updates, quantized DSP, BERT, scheduler, driver, tensor cores, WebGPU, ONNX, RetinaNet, and Torch frontend discussions.
- Also planned is to discuss bounties around the AMD LLVM backend and topics such as test_ops, multi GPU training, and torch compile.
- AMD LLVM Backend Advances: Progress on the AMD LLVM backend involves multiple merged pull requests and testing with Llama3 and Flux examples.
- Currently, a pull request is under review, marking continued development in this area.
- ONNX Frontend Emerges: The creation of `tinygrad.frontend.onnx` was announced, signaling a focus on ONNX preparation for the week.
- Efforts include validating the top 30 Hugging Face ONNX repos.
LLM Agents (Berkeley MOOC) Discord
- Quiz Title Typo Sparks Confusion: A member reported a typo in the title of Quiz 7, causing confusion when checking answers for Quiz 6.
- Another member acknowledged the catch and thanked the reporter.
- AgentX Research Track Application Opens: Selected students will receive mentorship from Berkeley postdocs/mentors on an AgentX Research Track project; applications are due March 26th at 11:59pm PDT.
- Mentorship is not required to join or succeed in AgentX, and labs plus the Certificate Declaration form will be released in April as seen in the attached image.
- Research Track Goes Remote, Stays Unpaid: A member confirmed that the AgentX Research Track mentorship will be conducted remotely.
- Another member clarified that the mentorship is not paid, with mentors simply providing guidance on the research project.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Codeium (Windsurf) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
Perplexity AI ▷ #general (998 messages🔥🔥🔥):
o3 mini, Grok 3, Chinese AI, Gemini deep research, Complexity plugin
- O3 mini and Deep Research Debate Sparked: Members debated whether Perplexity's deep research is powered by O3 mini or a different version of O3, with one member stating that O3 mini is so bad and another sharing an image of their "Deep research" powered by o3.
- Perplexity Team was put on notice when a user asked why his request to 'recap the old chat, and help me setup my Yubikeys on Linux' resulted in nonsense, attaching a screenshot.
- Sonar 3.7 "chown" command bug: A member reported a bug with Sonar 3.7 where a chown command kicks the model out and breaks the conversation while coding, wondering if there was any difference in performance between high and old source amount and reasoning quality between search steps.
- A user followed up noting that in their experience, the difference is quite large, sharing a screenshot here.
- Upgrades are coming to Perplexity Deep Research: Members discussed an upcoming upgrade for Deep Research on Perplexity and compared it to Deep Research from ChatGPT, Gemini, ARI from You.com, and Grok.
- Some users found the current Perplexity Deep Research to be at the bottom compared to others and are excited for the upgrade, hoping that the High feature for Deep Research is fully released soon.
- Perplexity web app had an outage: Users reported that the Perplexity web app was down, as well as the Android app, and saw the message 'something went wrong, try again later' in the iOS app too.
- After it came back up, users discovered a new “0 enhanced queries” being added and removed, and the audio output was non-functional.
- Complexity Plugin is a must-have: Members discussed using the Complexity plugin for Firefox and Chrome to enable additional features; this GitHub repo supercharges Perplexity.ai with extras such as deep research (high).
- To confirm the extension is working, make sure you are on v1.9.4.0 and that a dashboard appears in the top left.
- It'S All Going According To Plan... GIF - - Discover & Share GIFs: Click to view the GIF
- Fly GIF - Fly Insect Bug - Discover & Share GIFs: Click to view the GIF
- Tc GIF - Tc - Discover & Share GIFs: Click to view the GIF
- Popup window - Chrome Web Store: Move tab to standalone window, without tabs bar, navigation bar and bookmark bar UI.
- Red Button Spam GIF - Red Button Spam Press Button - Discover & Share GIFs: Click to view the GIF
- Angry Glitch Triggered GIF - Angry Glitch Triggered Kawaii - Discover & Share GIFs: Click to view the GIF
- GitHub - pnd280/complexity: ⚡ Supercharge your Perplexity.ai: ⚡ Supercharge your Perplexity.ai. Contribute to pnd280/complexity development by creating an account on GitHub.
- Shagarita Shalymar GIF - Shagarita Shalymar Shalymar rivera - Discover & Share GIFs: Click to view the GIF
Perplexity AI ▷ #sharing (18 messages🔥):
Trump, SSA shutdown, Boeing fighter, sunbathe, bluesky debates
- Trump threatens SSA shutdown: A member shared a link to a Perplexity page about Trump threatening SSA shutdown.
- Trump awards Boeing fighter: A member shared a link to a Perplexity page about Trump awarding Boeing fighter.
- Bluesky debates AI data standards: A member shared a link to a Perplexity page about Bluesky debating AI data standards.
- Proper way to sunbathe a newborn: A member shared a link to a Perplexity search about the proper way to sunbathe a newborn.
Perplexity AI ▷ #pplx-api (21 messages🔥):
Perplexity API in Windsurf, API Credit vs Pro Subscription, Deep Research Limit, Sonar Model Truncated Responses, RAG Project with Sonar and Llama Index
- Windsurf Plugs into Perplexity API: A user encountered issues setting up the Perplexity API in their Windsurf application and sought advice.
- Another user confirmed that purchasing API credit should allow calls to the API even without a Pro subscription.
- Deep Research Rate Limit Reached: A user inquired about the possibility of extending the limit of 100 deep researches per minute due to bulk processing needs in their application.
- Sonar Model gives Truncated Responses: Multiple users reported that the Sonar model in the Perplexity API is truncating responses, particularly since the weekend, even though the JSON format is correct.
- A user provided an example of a JSON request and the truncated response, noting that switching to sonar-pro resolves the issue but is not preferable for cost reasons.
- Llama Index Struggles with Sonar: A user encountered an error when configuring Sonar as a chat engine with Llama Index for a RAG project and requested assistance.
- Perplexity Pro: API Credits Included?: A new user inquired whether a Perplexity Pro subscription includes API credits.
- Another user shared a link to the Perplexity Help Center for details on Perplexity Pro benefits.
Unsloth AI (Daniel Han) ▷ #general (602 messages🔥🔥🔥):
Bonsai bitnet, Mistral Small 3.1, Orpheus TTS, Gemma 3 27B, Llama 3 performance
- *Bonsai Bitnet* Seeking Testers: A member is looking for testers for deepgrove/Bonsai, asking how the bitnet compares to Qwen2.5 0.5B.
- They also linked a relevant Hugging Face Transformers PR about adding Qwen3 and Qwen3MoE support.
- *Mistral Small 3.1* Fine-Tuning Woes: Multiple users reported issues with fine-tuning Mistral 3.1, encountering errors and deprecated features.
- One user sought advice on cloud instance selection for cost-effective fine-tuning of a LoRA Mistral Small 3.1 model, and others reported issues with Unsloth and the latest Mistral versions, particularly in vision finetuning.
- *Orpheus TTS* Finetuning is Live: Audio finetuning has arrived with the Orpheus TTS model, according to a newly released Unsloth notebook.
- A user noted that the work was all done by a particular member and that the notebook is a lot more streamlined compared to local audio tokenizing and then regular Llama3 finetuning.
- *Gemma 3 27B* Fine-Tuning Issues: A user reported issues fine-tuning Gemma 3 27B, encountering errors even after upgrading transformers and using the Unsloth Gemma3 example.
- The specific error occurs when trying to run the model, leading to failures with llama.cpp and gguf files.
- *Unsloth* on AMD Framework Desktop: Discussion arose around Unsloth's compatibility with the Framework Desktop, particularly regarding ROCm support.
- One member offered a timeline of ROCm support in ML software, suggesting that AMD will likely be well-supported by the time the Framework Desktop is released.
- Fine-tuning Guide | Unsloth Documentation: Learn all the basics and best practices of fine-tuning. Beginner-friendly.
- Google Colab: no description found
- Drawing with LLMs: Build and submit Kaggle Packages capable of generating SVG images of specific concepts
- Google Colab: no description found
- Llama-3.1-Nemotron-Nano-8B-v1-bnb-4bit unsloth Train examples: no description found
- Qwen/Qwen2.5-VL-32B-Instruct · Hugging Face: no description found
- Google Colab: no description found
- Unsloth Newsletter: Join our newsletter and waitlist for everything Unsloth!
- Tutorial: How to Finetune Llama-3 and Use In Ollama | Unsloth Documentation: Beginner's Guide for creating a customized personal assistant (like ChatGPT) to run locally on Ollama
- Goku Super Saiyan Super Saiyan2 GIF - Goku Super Saiyan Super Saiyan2 Super Saiyan2Goku - Discover & Share GIFs: Click to view the GIF
- Tutorial: How to Run & Fine-tune Gemma 3 | Unsloth Documentation: How to run Gemma 3 effectively with our GGUFs on llama.cpp, Ollama, Open WebUI and how to fine-tune with Unsloth!
- Google Colab: no description found
- Google Colab: no description found
- Gohan Dbz GIF - Gohan Dbz - Discover & Share GIFs: Click to view the GIF
- deepgrove/Bonsai at main: no description found
- "Unsloth: Failed to make input require gradients!" When Vision-fine-tune Gemma3 · Issue #2131 · unslothai/unsloth: I'm tring to vision fine-tune Gemma3 refering this tutorial: https://colab.research.google.com/drive/1j0N4XTY1zXXy7mPAhOC1_gMYZ2F2EBlk?usp=sharing#scrollTo=QmUBVEnvCDJv I constructed my dataset li...
- klei1/bleta-meditor-27b at main: no description found
- python_sample/FanFic_Illustrator_demo.ipynb at main · webbigdata-jp/python_sample: python sample script. Contribute to webbigdata-jp/python_sample development by creating an account on GitHub.
- Notebook finetuning Orpheus-TTS by Etherll · Pull Request #17 · unslothai/notebooks: no description found
- smol-course/1_instruction_tuning/notebooks/chat_templates_example.ipynb at main · huggingface/smol-course: A course on aligning smol models. Contribute to huggingface/smol-course development by creating an account on GitHub.
- 'Qwen2_5_VLProcessor' object has no attribute 'eos_token' · Issue #2144 · unslothai/unsloth: Hi, I'm trying to finetune only the text (while keeping vision capabilities) for qwen2.5 VL, specifically: unsloth/Qwen2.5-VL-7B-Instruct-unsloth-bnb-4bit, but I get the error above when accessing...
- Added Support for Apple Silicon by shashikanth-a · Pull Request #1289 · unslothai/unsloth: UnoptimizedNo gguf support yet.Build Triton and bitsandbytes from sourcecmake -DCOMPUTE_BACKEND=mps -S . for bitsandbytes buildingpip install unsloth-zoo==2024.11.4pip install xformers==0.0.25
- SSD VPS Servers, Cloud Servers and Cloud Hosting: Vultr Global Cloud Hosting - Brilliantly Fast SSD VPS Cloud Servers. 100% KVM Virtualization
- unsloth (Unsloth AI): no description found
- Adding Qwen3 and Qwen3MoE by bozheng-hit · Pull Request #36878 · huggingface/transformers: Adding Qwen3This PR adds the support of codes for the coming Qwen3 models. For information about Qwen, please visit https://github.com/QwenLM/Qwen2.5. @ArthurZucker
- Amazon.com: Machine Learning and Artificial Intelligence: Concepts, Algorithms and Models, Educational Textbook by Reza Rawassizadeh: 9798992162103: Reza Rawassizadeh: Books: no description found
Unsloth AI (Daniel Han) ▷ #off-topic (41 messages🔥):
Unsloth PR process, Fine-tuning Arabic LLMs, Consensus framework for LLMs, Rotary Position Embedding (RoPE), Unsloth fork vs original repo
- Straight PRs OK on Unsloth Github: A member inquired about contributing to Unsloth's GitHub, and another member confirmed that straight PRs are acceptable, though potential delays may occur due to the high volume of recent PRs and issues.
- The discussion then shifted to modifying data preparation steps in Colab to accommodate .txt files, aiming for cheaper inference, and the original issue was linked.
- Arabic LLM Finetuning Suggestions: A member sought advice on fine-tuning an Arabic LLM for a specific dialect, and it was suggested that Qwen2.5-7B could be a suitable model given its Arabic capabilities.
- The use of a Q&A format for fine-tuning was recommended over raw text, directing the member to the Unsloth starter guide for further details.
- Consensus: Framework Deliberative LLM Decision-Making: A member introduced Consensus, a Langchain-compatible framework for enabling deliberative decision-making among multiple LLMs, highlighting its effectiveness with calculations, riddles, and difficult questions.
- The Consensus GitHub repository was provided for those interested in combining different LLMs and models to reach a single, definitive answer.
- RoPE Recreated: A member shared their work on recreating results from the RoFormer paper focusing on Rotary Position Embedding (RoPE), for fun & learning.
- They updated their toy repo with different attention mechanisms and positional embeddings which can be found in this repo.
- Understanding Unsloth's Forked Repositories: A member sought guidance on contributing to an Unsloth fork that appeared out of sync with its original repository, finding it to be an independent version.
- It was clarified that not all forks are meant to stay in sync, and contributors should check with the maintainers regarding sync status since merging isn't possible due to structural differences; the related repo is cut-cross-entropy.
- GitHub - chrisjob1021/transformer-examples: A collection of educational toy implementations and examples of key components from modern Transformer architectures.: A collection of educational toy implementations and examples of key components from modern Transformer architectures. - chrisjob1021/transformer-examples
- [Feature Request] Raw txt file training · Issue #14 · unslothai/unsloth: It would be great to include an example for training with a simple unformatted text file, in the readme!
- Update Python version requirement to >= 3.9 by BouajilaHamza · Pull Request #3 · unslothai/cut-cross-entropy: Adjust the Python version requirement to allow compatibility with Python 3.9 and above.
- Unsloth Documentation: no description found
- Unsloth Documentation: no description found
- GitHub - jersobh/consensus: Consensus is a Langchain-compatible framework that enables deliberative decision-making among multiple LLMs (Large Language Models). It supports parallel execution, multiple rounds of reasoning, peer feedback, and customizable strategies like majority vote, weighted confidence, and ranked choice.: Consensus is a Langchain-compatible framework that enables deliberative decision-making among multiple LLMs (Large Language Models). It supports parallel execution, multiple rounds of reasoning, pe...
Unsloth AI (Daniel Han) ▷ #help (257 messages🔥🔥):
Training specific parts of output, GRPO notebooks, Dependency issue Qwen model, CUDA Version, Mistral 3.1
- Reasoning needs Training Data: A user asked about training only parts of the output, specifically wanting the model to generate its own reasoning during inference.
- It was suggested to look at the GRPO notebooks as a standard way of adding reasoning, and that the model must see reasoning traces during training to take it into account during inference.
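For orientation, a minimal sketch of a GRPO run with TRL's `GRPOTrainer` (the model, reward, and one-prompt dataset are placeholders; the Unsloth GRPO notebooks build on the same trainer):

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy reward: credit only completions that end with a marked final answer,
# so reasoning traces must show up during training to earn reward.
def reward_final_answer(completions, **kwargs):
    return [1.0 if "####" in c else 0.0 for c in completions]

dataset = Dataset.from_dict({"prompt": ["What is 13 * 7? Think step by step."]})

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=reward_final_answer,
    args=GRPOConfig(output_dir="grpo-out"),
    train_dataset=dataset,
)
trainer.train()
```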
- UV causes problems with Dependencies: A user encountered a dependency issue with unsloth-zoo when trying to fix an issue in the Qwen model, specifically related to the cut-cross-entropy library.
- They were advised to install Python 3.11 and rebuild, as UV is not yet supported, and a PR has been opened to address the Python version requirement.
- CUDA Issue: A user faced a ValueError related to numpy.dtype size when running the Qwen2.5 GRPO notebook, potentially indicating binary incompatibility.
- Another user suggested installing Python 3.11 and rebuilding with a specific configuration to resolve potential CUDA-related issues.
- Outdated mistral notebook problems: A user encountered a ValueError with the message "Some modules are dispatched on the CPU or the disk" when using the model unsloth/Llama-3.2-3B-bnb-4bit and the notebook Mistral 7B Text Completion - Raw Text training full example.ipynb.
- It was pointed out that the notebook is outdated, and they should only use the ones available in the Unsloth documentation, where they have GRPO reasoning.
- GGUF model hallucinating: A user reported hallucination issues after converting a fine-tuned Llama 3.2 model to GGUF format and using it with Ollama, despite the model answering test questions correctly before conversion.
- The user followed the notebook at this link and saw warnings about attention_mask and the importance of the pad/eos tokens.
- Google Colab: no description found
- Google Colab: no description found
- Unsloth Notebooks | Unsloth Documentation: Below is a list of all our notebooks:
- Unsloth Requirements | Unsloth Documentation: Here are Unsloth's requirements including system and GPU VRAM requirements.
- Google Colab: no description found
- Google Colab: no description found
- What Model Should I Use? | Unsloth Documentation: no description found
- Continued LLM Pretraining with Unsloth: Make a model learn a new language by doing continued pretraining with Unsloth using Llama 3, Phi-3 and Mistral.
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B · Hugging Face: no description found
- phi4-mini: Phi-4-mini brings significant enhancements in multilingual support, reasoning, and mathematics, and now, the long-awaited function calling feature is finally supported.
- Llama-3.2-3B-Instruct-Q4_K_M.gguf · unsloth/Llama-3.2-3B-Instruct-GGUF at main: no description found
- SkyThought/skythought/test-time-scaling at main · NovaSky-AI/SkyThought: Sky-T1: Train your own O1 preview model within $450 - NovaSky-AI/SkyThought
- Google Colab: no description found
- text_classification_scripts/unsloth_classification.ipynb at main · timothelaborie/text_classification_scripts: Scripts for text classification with llama and bert - timothelaborie/text_classification_scripts
- klei1/bleta-meditor-27b at main: no description found
- AttributeError: module 'transformers.models.mistral3.modeling_mistral3' has no attribute 'logger' · Issue #2146 · unslothai/unsloth: Hi, I have the following error when running Mistral Small 3.1 model File "unsloth_zoo/compiler.py", line 1465, in unsloth_compile_transformers exec("modeling_file.logger.addFilter(HideL...
- huggingface/transformers: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX. - huggingface/transformers
- Google Colab: no description found
- GitHub - unslothai/unsloth: Finetune Llama 3.3, DeepSeek-R1, Gemma 3 & Reasoning LLMs 2x faster with 70% less memory! 🦥: Finetune Llama 3.3, DeepSeek-R1, Gemma 3 & Reasoning LLMs 2x faster with 70% less memory! 🦥 - unslothai/unsloth
Unsloth AI (Daniel Han) ▷ #showcase (7 messages):
Unsloth fine-tuning, Lc0 Chess LLM, Vibe coding
- Unsloth Gets Fine-Tuning Guide: A member created a guide for fine-tuning with Unsloth, covering theoretical aspects, practical examples, and how to create a reasoning model with GRPO.
- The guide compiles everything learned over the last year.
- LLM trash talks Chess Player Using Lc0: A member shared an image of an LLM making fun of a user playing chess against Lc0 in a Discord attachment.
- Vibe Coding is Underrated: Members discussed vibe coding, noting it made programming enjoyable again despite potential industry criticism, stressing the importance of understanding code functionality, cybersecurity, and decoupling.
- One member said Industry be hating on us but it made me love programming again.
Unsloth AI (Daniel Han) ▷ #research (51 messages🔥):
Tree of Thoughts limitations, Graph of Thought improvements, GRPO multi-turn setup, LLMs vs human brain, Llama3 Thai language support
- Tree of Thoughts Bashed for Inefficiency: A member stated that Tree of Thoughts (ToT) is literally garbage because it requires a very specific prompt, and its performance heavily depends on the model's ability to follow the format.
- The user found the strategy feels like blowing a ton of compute on a problem without good returns, and that if the model doesn't follow the prompt well, then the entire strategy collapses.
- Graph of Thought Builds on Tree's Foundation: One member noted that Forest of Thought and Graph of Thought improve on some of the rough edges of Tree of Thought.
- They clarified that static Tree of Thought by default is a bit limited in what it can handle.
- Google's LLM-Brain Link: A Google Research team is deciphering language processing in the human brain through LLM representations.
- The team theorizes that LLMs provide a fundamentally different computational framework for coding natural language than symbolic psycholinguistic models, enabling them to produce context-specific linguistic outputs.
- GRPO Seeks Multi-Turn Mastery: A member is looking for examples of using GRPO in a multi-turn setting, seeking to fine-tune a model for problems that maximize long-term returns.
- Another member suggested prompting a larger LLM to act as a simulator with 2-3 turns.
- Continual Learning Remains Elusive: A member is curious what's currently stopping the community from using continual learning in production on LLMs, questioning why it's not used in practice despite many papers with very good results.
- In response, another member posted a Mr. Krabs Money GIF, hinting the primary reason is cost.
- Deciphering language processing in the human brain through LLM representations: no description found
- Money Mr GIF - Money Mr Krabs - Discover & Share GIFs: Click to view the GIF
LMArena ▷ #general (844 messages🔥🔥🔥):
Mistral Naming Schemes, Phantom Chatbot, Nebula Chatbot, DeepMind's Nebula, OpenAI GPT-4o
- Phantom Chatbot is Google's Creation: The chatbot Phantom is from Google, and members have been testing it, describing it as very good.
- It has been in the arena for about a week, and its removal from the arena after ~8 hours sparked interest, with discussions about potential connections with Nebula and Specter.
- DeepMind's Nebula Chatbot is Impressive: Nebula is an anonymous chatbot that may be from DeepMind, and members found it really good and the best anonymous model rn.
- It seems similar to Phantom and is being tested in the arena, and it is performing well in math, english-turkish translation, and solving Arc-AGI problems.
- OpenAI's GPT-4o gets Boost: GPT-4o was described as having improved significantly through OpenAI's post-training techniques, potentially surpassing Grok 3 soon, attributed to continued pretraining since December.
- There's speculation it might top the leaderboard due to OpenAI's proficiency in human preference alignment in the LM arena.
- Specter, Phantom, and Nebula are Checkpoints: Specter, Phantom, and Nebula are different revisions of the same model, with the order being Specter -> Phantom -> Nebula.
- Members note that there's a performance jump from Specter to Phantom, and less of a jump from Phantom to Nebula, all within a few weeks.
- Rhea Creates South Park Game: A member prompted Rhea to create a 2D game in the world of South Park and the model generated complete code for the game into an html file.
- This demonstrated vibe coding, and raised concern over LLMs hallucinating non-existent signs from a fake AI-generated image full of gibberish lettering.
- Tweet from Kol Tregaskes (@koltregaskes): Gemini 2.0 Pro Thinking will include native image generation btw!h/t @legit_api again. 👍
- Tweet from Qwen (@Alibaba_Qwen): 72B too big for VLM? 7B not strong enough! Then you should use our 32B model, Qwen2.5-VL-32B-Instruct!Blog: https://qwenlm.github.io/blog/qwen2.5-vl-32b/Qwen Chat: https://chat.qwen.aiHF: https://hugg...
- Twitter, Inc. | Serving the Public Conversation: no description found
- Google AI Studio: Google AI Studio is the fastest way to start building with Gemini, our next generation family of multimodal generative AI models.
- Tweet from Logan Kilpatrick (@OfficialLoganK): We are going to build the world’s most powerful coding models, lots of good progress already with 2.0.2025 is going to be fun :)
- SVG Test Site: no description found
- Tweet from Oriol Vinyals (@OriolVinyalsML): 🤔Quoting AshutoshShrivastava (@ai_for_success) More news is coming that Nebula on LMSYS Arena is actually a Google model, probably Google Gemini 2.0 Pro Thinking Model. It is too good at coding too, ...
- Reve: Bring your ideas to life: no description found
- imgsys.org | an image model arena by fal.ai: A generative AI arena where you can test different prompts and pick the results you like the most. Check-out the model rankings and try it yourself!
- Tweet from Mostafa Dehghani (@m__dehghani): @ai_for_success @AnalogPvt Nebula is too good to be a mystery for long! 😉
- Text to Image Model Arena | Artificial Analysis: Understand which AI text-to-image models to use by choosing your preferred image without knowing the provider.
- lmarena-ai/chatbot-arena-leaderboard at main: no description found
- Gemini Exchange Status: no description found
- Modern Demo Page: no description found
- Modern CSS Showcase: no description found
- Steins;Gate Terminal: no description found
- LLM Benchmark Table: no description found
- Simple Platformer: no description found
LMArena ▷ #announcements (1 messages):
Alpha Testing Updates, Bug Fixes, O3-Mini Formatting, Leaderboard Improvements
- LMArena Alpha Updates Released: The LMArena alpha has received updates based on user feedback, including bug fixes and new features; testers are encouraged to continue testing at alpha.lmarena.ai with the password `still-alpha`.
- Message Saving Bug Squashed: A bug preventing messages from saving (and causing vote failures) has been fixed in the latest alpha release, streamlining the user experience.
- O3-Mini Gets Formatting Right: The O3-Mini model now correctly formats text, enhancing the readability and presentation of generated content within the alpha platform.
- Leaderboard Now Sortable and Live: Leaderboard columns are now sortable, and data is updated live, providing users with dynamic and interactive performance insights.
- Feedback can be provided via this Google Forms link and bug reports can be filed using this Airtable link.
- Arena - New UI Feedback: Tell us what you think about the new design!
- Airtable | Everyone's app platform: Airtable is a low-code platform for building collaborative apps. Customize your workflow, collaborate, and achieve ambitious outcomes. Get started for free.
Cursor Community ▷ #general (857 messages🔥🔥🔥):
Cursor's Cmd+Backspace issue, Claude 3.7 Thinking pricing and features, windsurf better, MCP Combinations, AI's Limited Understanding of 3D Designs
- Cursor's CMD+Backspace Debacle: Users are frustrated with Cursor's CMD+Backspace behavior, leading to accidental project deletions, with one user reporting having to restart their work 7 times due to this issue.
- In response, the Cursor team is planning to change the default keybinding to CMD+Shift+Backspace, with options to configure it, aiming for a rollout by Monday.
- Claude 3.7 Thinking Costs Extra Credits: Users discussed the shift from Claude 3.7 Thinking being included in the Pro plan to requiring usage-based pricing, now branded as Claude 3.7 MAX, with some expressing frustration over the increased costs and tool call pricing.
- It was confirmed that Claude 3.7 MAX has a higher context window and more tool calls compared to the standard Claude 3.7 Sonnet.
- Windsurf's performance is preferred over Cursor for some: Some users are finding Windsurf to be faster and more responsive than Cursor, citing performance issues like lagging and freezing in Cursor.
- However, others prefer Cursor for its rollback features and agent performance, noting that AI programming still has a long way to go.
- MCP Combinations Explored: Users are experimenting with various MCP (Model Context Protocol) server combinations to enhance AI coding agents like Cursor, with the Supabase MCP being highlighted for its usefulness.
- There's also a discussion on whether MCPs are overhyped, with one user mentioning instances of the agent calling MCPs too much or not enough, needing more clear instructions.
- 3D Integration proving too difficult: A user is struggling to integrate a 3D model (FBX format) into a three.js project using Claude, running into issues with the FBXLoader, and discovering the limitations of AI in handling 3D designs.
- It's suggested to switch to GLTF format and work in smaller chunks to simplify the integration, following a clear plan for phasing out tasks.
- Cursor – Early Access Program: no description found
- Cursor – Models: no description found
- Cursor Directory: Find the best cursor rules for your framework and language
- Supermaven: Free AI Code Completion: The fastest copilot. Supermaven uses a 1 million token context window to provide the highest quality code completions.
- Tweet from Kenton Parton (@kenton_parton): @cursor_ai @ericzakariasson could you update the “Plan, search, build anything…” text area to be a non-static text type. It can’t be updated by Accessibility API.
- Exa: The Exa API retrieves the best, realtime data from the web for your AI
- How can I make my sidebar look like Vscode?: I resolved the issue by adding the code "workbench.activityBar.orientation": "vertical". Thank you!
- 0.48 removed workbench.activityBar.orientation: Not adding a Sync feature, because they say they’re “focussing purely on AI features”, but removing the workbench.activityBar.orientation setting? Make it make sense…
- Max Mode for Claude 3.7 - Out Now!: @jake @kstars111 Thanks for the points about tool calls. I’ll add this to the docs today, but to summarise, a tool call is any action the AI decides to take outside of writing it’s own output. This do...
- Source control | How to revert?: Cursor doesn’t have a dedicated “Revert” button in its source control graph that I’ve seen. Work-around, depending on what you want to do… Reset to a commit (Discards changes entirely) git reset --...
- Anthropic Status: no description found
- What is version control?: Version control software is used to track revisions, solve integration conflicts in code, and manage different artifacts involved in software projects.
- Abacus.AI - CodeLLM: AI-powered code editor that helps you write, review, and refactor code faster.
- Max Mode for Claude 3.7 - Out Now!: TL:DR 🧠 Has Claude 3.7 Thinking at it’s core 📚 Uses the whole 200k context window of the model 🛠 Has a very high tool call limit 🔍 Can read more code at once 💰 IMPORTANT: Only available via usa...
- Google AI Studio: Google AI Studio is the fastest way to start building with Gemini, our next generation family of multimodal generative AI models.
- GitHub - hgbdev/cursor-agent-notifier: Contribute to hgbdev/cursor-agent-notifier development by creating an account on GitHub.
- GitHub - GLips/Figma-Context-MCP: MCP server to provide Figma layout information to AI coding agents like Cursor: MCP server to provide Figma layout information to AI coding agents like Cursor - GLips/Figma-Context-MCP
aider (Paul Gauthier) ▷ #general (585 messages🔥🔥🔥):
Firecrawl, o1 vs o3 mini debugging, Claude Think Tool, Aider Homepage, Qwen 2.5 release
- Ripgrep Rising, Aider Community Rejoices: Members expressed interest in exploring ripgrep and its potential benefits for Aider.
- One member believed o3-mini-high is better than o1-high in debugging/programming, though they admitted it wasn't benchmarked.
- Aider to Tame Sonnet's Over-Eager Nature: Paul Gauthier mentioned that he managed to get Aider to tame Sonnet 3.7's over-eager nature by adding a line to the prompt to chill out, and it seems to help based on his coding session.
- This update is now available in the main branch, and feedback is welcome.
- Aider's New Homepage Is Live: Paul Gauthier announced that Aider has a new homepage available at aider.chat, highlighting its compatibility with Claude 3.7 Sonnet, DeepSeek R1 & Chat V3, OpenAI o1, o3-mini & GPT-4o, and others.
- It also supports 100+ code languages.
- DeepSeek V3-0324 Drops, Beats R1?: The Aider community buzzed about the new DeepSeek V3-0324 release, claiming that it's even better than R1 in coding and the front-end, though without chain of thought.
- Members note that it excels without explicit reasoning, performs better in coding and math than previous versions, and compares to Sonnet 3.5 in benchmarks; its lower price makes it a good alternative.
- Aider's New `/context` Command Focuses the Chat: Paul Gauthier introduced an experimental new `/context` command in Aider, which helps set up the chat context automatically.
- The new command works best with Sonnet 3.7, R1, and o3-mini, and identifies which files should be added to the chat.
- Tweet from Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxesTex): > DeepSeek V3 model has completed a minor version upgrade. Welcome to visit the official website, APP, or mini-program to try and experience it (DeepThink has been closed).I guess we're getting...
- CLI Reference: CLI Reference
- Tweet from Hunyuan (@TXhunyuan): 🚀 Introducing Hunyuan-T1! 🌟Meet Hunyuan-T1, the latest breakthrough in AI reasoning! Powered by Hunyuan TurboS, it's built for speed, accuracy, and efficiency. 🔥✅ Hybrid-Mamba-Transformer MoE A...
- Aider - AI Pair Programming in Your Terminal: no description found
- Tweet from vittorio (@IterIntellectus): deepseek, out of nowhere, dropping a new model~700GB, mit license.incredible
- Tweet from Nathan Lambert (@natolambert): Qwen 3 coming imminently!Meta's smart to have locked in LlamaCon, else Llama 4 maybe would've been delayed again 🤭. Really I'm hype for Llama 4, bring it asap.
- Tweet from Jon Durbin (@jon_durbin): 🪂Big performance updates for DeepSeek-* models on chutes this morning! TL;DR: DeepGEMM, MTP, compile. prefix aware routing with least-connection preferences (not listed here but done a while back at ...
- Duh Sarcastic GIF - Duh Sarcastic Whatever - Discover & Share GIFs: Click to view the GIF
- Rick Et Morty GIF - Rick Et Morty - Discover & Share GIFs: Click to view the GIF
- DeepSeek-R1 | Model library: A state-of-the-art 671B-parameter MoE LLM with o1-style reasoning licensed for commercial use
- deepseek-ai/DeepSeek-V3-0324 · Hugging Face: no description found
- servers/src/sequentialthinking at 6adf853b6b07a06c117253974683a0ab8d4fad4d · modelcontextprotocol/servers: Model Context Protocol Servers. Contribute to modelcontextprotocol/servers development by creating an account on GitHub.
- Naruto Secretfingerjitsu GIF - Naruto Secretfingerjitsu Jitsu - Discover & Share GIFs: Click to view the GIF
- PLAN.md: GitHub Gist: instantly share code, notes, and snippets.
- Deepseek V3 生成的天气卡片分享...: 单次回复达token limit 截断了,点击右下角的继续生成,直接在原有的部分继续生成,好方便 😇 prompt : 第一次: 你是顶级前端工程师,现就职于apple. Create a single HTML file containing CSS and JavaScript to generate an animated weather card. The card shou...
- deepseek-r1 Model by Deepseek-ai | NVIDIA NIM: State-of-the-art, high-efficiency LLM excelling in reasoning, math, and coding.
- DeepSeek V3 0324 - API, Providers, Stats: DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team.It succeeds the [DeepSeek V3](/deepseek/deepseek-chat-v3) mode...
- GitHub - richardanaya/UtilityBelt: Talk to MCP servers from aider: Talk to MCP servers from aider. Contribute to richardanaya/UtilityBelt development by creating an account on GitHub.
- R1 - API, Providers, Stats: DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Ru...
- R1 (free) - API, Providers, Stats: DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Ru...
- repomap assumption that identifiers are reasonably unique breaks down in large codebases · Issue #2341 · Aider-AI/aider: Issue I'm investigating why repomap quality is terrible when editing Cassandra. It looks like the primary reason is that repomap can't distinguish between Foo.X and Bar.X. So we end up with th...
- DeepSeek V3 0324 (free) - API, Providers, Stats: DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team.It succeeds the [DeepSeek V3](/deepseek/deepseek-chat-v3) mode...
- Tweet from Fireworks AI (@FireworksAI_HQ): Fireworks AI matches DeepSeek pricing for R1, with secure deployments in EU and USExcited to share the latest enhancements to our DeepSeek R1 offerings:💡 Base DeepSeek R1: Cost-effective and high-qua...
- Fireworks - Fastest Inference for Generative AI: Use state-of-the-art, open-source LLMs and image models at blazing fast speed, or fine-tune and deploy your own at no additional cost with Fireworks AI!
- Together AI | DeepSeek R1: Open-source reasoning model rivaling OpenAI-o1, excelling in math, code, reasoning, and cost efficiency.
- Faster, more efficient DeepSeek on the Fireworks AI Developer Cloud: Discover how Fireworks AI Developer Cloud accelerates AI innovation with faster, optimized DeepSeek R1 deployments. Learn about new GPU options, improved speed, and enhanced developer tools for effici...
aider (Paul Gauthier) ▷ #questions-and-tips (148 messages🔥🔥):
Anthropic API, Aider development workflow, Claude 3.7, Svelte 5 + SvelteKit, MCP servers in Claude App
- Aider Dev Workflow Explored: Paul Gauthier uses `aider` by adding the files that need changes and relies on the repo map to bring in other relevant context.
- He shares screen recordings of himself using `aider` to enhance `aider`, showing the addition of new programming languages and features.
- Claude 3.7 Output Slowness Reported: Users reported extreme slowness for Claude 3.7 output when generating big files, with output slowing to 1 line every 2-5 seconds.
- A member suggested that Anthropic offers monthly billing for API access by contacting their sales team.
- Aider and .gitignore Integration: A user opened a PR (feat: Add --add-gitignore-files flag) to allow Aider to edit files ignored by Git via a new `--add-gitignore-files` flag.
- The user argues that `.gitignore` should only be responsible for Git and not dictate what Aider can access, also noting that they explicitly specified not to ignore the plan file in `.aiderignore`.
- Gemini Output Limits: A user encountered output limits with Gemini, while others suggested switching to a model like Sonnet to avoid such limitations.
- Aider developer Paul Gauthier suggested using `--edit-format diff` as a workaround.
- Repomix for Documentation Context: A user suggested using repomix to extract content from documentation repositories like Astro's documentation.
- The idea is to process the documentation, filter out unnecessary code, and provide the output as a read-only file to Aider.
- Screen recordings: Screen recordings of aider building aider.
- /mcp [BETA] - Model Context Protocol | liteLLM: Use Model Context Protocol with LiteLLM
- Getting started: Guides, resources, and API references to help you build with Astro — the web framework for content-driven websites.
- Release history: Release notes and stats on aider writing its own code.
- feat: Add --add-gitignore-files flag by omarcinkonis · Pull Request #3609 · Aider-AI/aider: ChangesFixed the file processing logic in base_coder.py to properly skip gitignored files when specified on the command lineAdded a new --add-gitignore-files flag to control whether gitignored f...
- GitHub - lutzleonhardt/mcpm-aider: A command-line tool for managing MCP servers in Claude App and for the use by aider. Also can run a MCP Server to help you manage all your MCP Servers: A command-line tool for managing MCP servers in Claude App and for the use by aider. Also can run a MCP Server to help you manage all your MCP Servers - lutzleonhardt/mcpm-aider
- GitHub - withastro/docs: Astro documentation: Astro documentation. Contribute to withastro/docs development by creating an account on GitHub.
- GitHub - hotovo/aider-desk: Desktop application for Aider AI assistant: Desktop application for Aider AI assistant. Contribute to hotovo/aider-desk development by creating an account on GitHub.
aider (Paul Gauthier) ▷ #links (2 messages):
Aider Conventions, Prompts, LLM Documentation Snippets, Maybe Codebase Cursor Rules, Project Management Guidelines
- Site Launches for Aider Conventions and Documentation: A member announced the launch of a site to collect aider conventions, prompts, and LLM-oriented documentation snippets at ctxs.ai/weekly.
- The member is seeking feedback on how to make the site more useful to the aider community.
- Maybe Codebase Cursor Rules: A link was shared to a high-level overview of the Maybe codebase structure and conventions for development, located at github.com/maybe-finance/maybe.
- This documentation provides insights into codebase structure and development practices.
- Project Management Guidelines for Code Quality: A comprehensive guide on project approach, code quality, development workflow, and version control best practices was linked at gist.github.com.
- This guide offers insights into effective project management and maintaining high code quality.
Link mentioned: ctxs.ai context registry: An open-source, community-curated registry of contexts for use with LLMs
Nous Research AI ▷ #general (436 messages🔥🔥🔥):
LCPP Context Length, Quantization and Performance, Chinese Thinking Models, Agentic Workflows, Deepseek V3
- LCPP's Context Allocation Anomaly: Users reported that setting a context length to 100 in LCPP still results in the system attempting to allocate 180GB of RAM, leading to VRAM exhaustion.
- Members suggested that the attention implementation might be overriding the assigned context length, or that a RoPE-specific argument needs to be set in the run command; running in Q8 quantization might also sidestep the issue.
- Decoding DeepSeek-R1 Performance: A member noted that benchmarks might be obsolete due to new thinking models from China, but when tested with a complex coding prompt, Hunyuan-T1 failed to terminate.
- Another user highlighted the critical tokens "wait" and "alternatively" might be primed by the finetuning of R1 before RL.
- DeepSeek V3 Arrives: Users celebrated the arrival of DeepSeek V3, with one claiming it's able to act as a reasoning model, detect thought iterations, and verify the existence of solutions indirectly, calling it a huge update with Sonnet-level code creativity and a potential base for R2.
- Members also noted it can generate CoTs that run into the token limit, and that it's accessible via chat.deepseek.com.
- Hermes 3's vLLM Recommendation: It was clarified that using SGLang instead of vLLM to run inference on the NeuralMagic FP8-quantized version of Hermes 70B should not pose any issues.
- It was also noted that, for ERP private fine tunes, the Pygmalion folks and people connected to them can probably help.
- Newbie Dev Seeks Guidance: A new developer sought advice on developing an AI using Hermes3 instead of 4o.
- A member confirmed the Hermes 3 API is OpenAI compatible, allowing it to be called using the standard OAI sdk by simply changing the base URL and model.
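For reference, a minimal sketch of that base-URL swap with the standard OpenAI SDK; the endpoint URL and model id below are placeholders for whatever Hermes 3 host is used, not values confirmed in the discussion:

```python
from openai import OpenAI

# Placeholder endpoint and key: any OpenAI-compatible Hermes 3 host works here.
client = OpenAI(
    base_url="https://hermes-host.example.com/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="Hermes-3-Llama-3.1-70B",  # placeholder model id
    messages=[{"role": "user", "content": "Hello, Hermes!"}],
)
print(response.choices[0].message.content)
```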
- Scaling Laws for Precision: Low precision training and inference affect both the quality and cost of language models, but current scaling laws do not account for this. In this work, we devise "precision-aware" scaling la...
- Tweet from Teknium (e/λ) (@Teknium1): Whats the tutorial to make one of these games everyone is vibecoding. I just need to ask for a game in 3js or whatever and it works? I dont know nothing about browser games
- Tweet from davidad 🎇 (@davidad): @burny_tech Unfortunately, the answer to good-enough planning for a longer future might be as simple as having a longer past. 🤷
- Daspoody Sleep GIF - Daspoody Sleep Sleepy - Discover & Share GIFs: Click to view the GIF
- deepseek-ai/DeepSeek-V3-0324 · Request for small distill models that can run on laptop: no description found
- Tweet from OedoSoldier (@OedoSoldier): Wow, significantly better at front-end coding!V3 New vs R1Prompt:Create a single HTML file containing CSS and JavaScript to generate an animated weather card. The card should visually represent the fo...
- Gif GIF - Gif - Discover & Share GIFs: Click to view the GIF
- ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization: The optimal bit-width for achieving the best trade-off between quantized model size and accuracy has been a subject of ongoing debate. While some advocate for 4-bit quantization, others propose that 1...
- NousResearch/Hermes-3-Llama-3.1-70B-FP8 · Hugging Face: no description found
- Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation: Recent advancements in large language models (LLMs) have demonstrated remarkable reasoning capabilities through long chain-of-thought (CoT) reasoning. The R1 distillation scheme has emerged as a promi...
- deepseek-ai/DeepSeek-V3-0324 · Hugging Face: no description found
- Quantization — SGLang: no description found
- llmbenchmark/thinkingtraces at master · cpldcpu/llmbenchmark: Various LLM Benchmarks. Contribute to cpldcpu/llmbenchmark development by creating an account on GitHub.
- fms-fsdp/speculator at main · foundation-model-stack/fms-fsdp: 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash attention v2. - foundation-model-stack/fms-fsdp
- Research: Benchmarking DeepSeek-R1 IQ1_S 1.58bit · Issue #11474 · ggml-org/llama.cpp: Research Stage Background Research (Let's try to avoid reinventing the wheel) Hypothesis Formed (How do you think this will work and it's effect?) Strategy / Implementation Forming Analysis of...
Nous Research AI ▷ #ask-about-llms (46 messages🔥):
Steering Thinking Models, Deepseek V3 vs Sonnet 3.7, Fine-tuning LLMs on Codebases, Transformers without Normalization, Raytracing with LLMs
- Speculation of Steering Thinking Models Debunked: Speculation arose about steering of thinking models upon O1's release, however, teaching the model to build CoT in a proper way proved sufficient without needing to interject the thinking process.
- Many thinking models struggle to terminate cycle-of-thought loops, but O1 and Sonnet manage to do so.
- Deepseek V3 Echoes Anthropic's Sonnet 3.7: Deepseek V3 0324 demonstrates as much variation as Sonnet 3.7, suggesting shared advancements in their architectures, as highlighted in a shared image.
- Fine-Tuning LLMs on Apache Codebases Could Improve Tool Q&A: Members considered fine-tuning an LLM such as DeepHermes llama 8 on large codebases like Apache projects to improve its ability to answer questions related to those tools.
- Instead of applying add and norm they discussed add and sigmoid for better results.
- Transformers Can Ditch Normalization: In light of the "Transformers without Normalization" paper, one member replaced normalization with tanh, demonstrating the viability of this approach (see the sketch after this list).
- The conversation shifted to the implications of removing experts at inference time, pondering the effects on smaller weights.
- LLM-Powered Raytracing: The Next Level Text-to-Image?: A member shared a GitHub repo containing a Python program that outputs an image, suggesting it was indirect image generation.
- Another member commented that it could emulate a ray tracing algorithm, and that it was NEXT level text to image generation.
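For the normalization-replacement bullet above, here is a minimal, non-authoritative PyTorch sketch of a tanh-based stand-in for LayerNorm; the parameter names and init value are illustrative assumptions, not the paper's exact recipe:

```python
import torch
import torch.nn as nn

class TanhNorm(nn.Module):
    """Drop-in replacement for LayerNorm that squashes activations with a
    learnable-scale tanh instead of normalizing them (assumed design)."""
    def __init__(self, dim: int, alpha_init: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(alpha_init))  # input scale
        self.weight = nn.Parameter(torch.ones(dim))          # per-channel gain
        self.bias = nn.Parameter(torch.zeros(dim))           # per-channel shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.alpha * x) * self.weight + self.bias
```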
Link mentioned: llmbenchmark/raytracer at master · cpldcpu/llmbenchmark: Various LLM Benchmarks. Contribute to cpldcpu/llmbenchmark development by creating an account on GitHub.
Nous Research AI ▷ #research-papers (19 messages🔥):
Hunyuan-T1 Model, R1-Zero-Like Training, MathFusion for LLMs, GRPO on Coding Benchmarks, Satya Nadella on AGI
- Hunyuan-T1: Mamba-Transformer Hybrid Emerges: Tencent introduced Hunyuan-T1, a hybrid Mamba-Transformer MoE architecture model, powered by Hunyuan TurboS, claiming it is near on par with DeepSeek-R1, emphasizing its speed, accuracy, and efficiency (Hunyuan-T1 Experience).
- It boasts features like strong logic, concise writing, low hallucination in summaries, blazing fast generation speed (60-80 tokens/sec), and excellent long-text processing, according to its creators.
- Critical Perspective on R1-Zero-Like Training: A critical perspective on R1-Zero-Like Training suggests that DeepSeek-V3-Base might exhibit "Aha moment" before RL-tuning, and the increasing output length in RL-tuning could stem from a bias in GRPO (details here).
- The analysis also indicates that getting GRPO done right can achieve state-of-the-art AIME performance with a 7B model.
- MathFusion Enhances LLM Math Skills: MathFusion improves mathematical reasoning in LLMs via cross-problem instruction synthesis, applying sequential, parallel, and conditional fusion strategies, enhancing models like DeepSeekMath-7B, Mistral-7B, and Llama3-8B (more on MathFusion).
- This method creates the MathFusionQA dataset, fine-tuning models and boosting benchmark accuracy with minimal extra data.
- Hugging Face Tackles Coding Benchmarks: Hugging Face has been using SFT, and plans to use GRPO, to improve performance on the IOI and LCB coding benchmarks with their Open-R1 project.
- So far, Hugging Face has used SFT rather than GRPO on IOI and LCB.
- Verifiable Coding Data is Scarce: A member noted that verifiable coding data is scarce, making it harder to demonstrate performance improvements on coding benchmarks compared to math, which is simpler to verify.
- Satya Nadella's commentary on Artificial General Intelligence (AGI) was referenced as insight into why benchmarks may or may not reflect true intelligence.
- Tweet from The AI Timeline (@TheAITimeline): 🚨 Last 2 week's top AI/ML research papers:- Transformers without Normalization- Block Diffusion- Compute Optimal Scaling of Skills- DAPO: An OS LLM RL System at Scale- Teaching LLMs How to Learn ...
- Tweet from bycloud (@bycloudai): > mamba-transformer hybrid reasoning model near on par with DeepSeek-R1whatQuoting Hunyuan (@TXhunyuan) 🚀 Introducing Hunyuan-T1! 🌟Meet Hunyuan-T1, the latest breakthrough in AI reasoning! Powere...
- Tweet from Zichen Liu (@zzlccc): 🪂Understanding R1-Zero-Like Training: A Critical Perspective* DeepSeek-V3-Base already exhibits "Aha moment" before RL-tuning??* The ever-increasing output length in RL-tuning might be due to...
- Tweet from Hyeon | Nillion ∑: 🦭/acc (@hyeon__dev): Introduction to the ArticleThe article discusses Satya Nadella's insights on Artificial General Intelligence (AGI) and its implications for the tech industry. AGI aims to mimic human cognitive abi...
- Tweet from 𝚐𝔪𝟾𝚡𝚡𝟾 (@gm8xx8): MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction FusionMathFusion is a framework for improving mathematical reasoning in LLMs via cross-problem instruction synthesis. It app...
- Open R1: Update #3: no description found
- open-r1/recipes/OlympicCoder-7B/sft/config_v00.00.yaml at main · huggingface/open-r1: Fully open reproduction of DeepSeek-R1. Contribute to huggingface/open-r1 development by creating an account on GitHub.
Nous Research AI ▷ #interesting-links (3 messages):
Qwen3, CPU inference
- Qwen3 model incoming to HuggingFace: The transformers library PR#36878 indicates that Qwen3 support is being added.
- The pull request suggests that this will be for the coming Qwen3 models.
- Qwen3 targeted for CPU inference: A user speculated that Qwen3-15B-A2B will be a perfect model for CPU inference.
- The user reasoned that its size (the A2B suffix suggests roughly 2B active parameters) would make it a strong candidate for CPU inference.
Link mentioned: Adding Qwen3 and Qwen3MoE by bozheng-hit · Pull Request #36878 · huggingface/transformers: Adding Qwen3This PR adds the support of codes for the coming Qwen3 models. For information about Qwen, please visit https://github.com/QwenLM/Qwen2.5. @ArthurZucker
OpenAI ▷ #ai-discussions (226 messages🔥🔥):
GPT-4 Transcriber, Voicebot Tools, Turnitin AI Similarity, GPT-5 Release, Free Chatbots for Story Generation
- TTS is not STT: Members clarified that openai.fm is TTS (text-to-speech), not STT (speech-to-text), with one member noting that OpenAI's transcription models aren't as good as Scribe.
- Dodge Turnitin AI Detection?: A member sought advice on avoiding Turnitin AI similarity detection for a report reusing their company's business model, while others suggested it looked like spamming appeals to cheat homework and recommended using "humanize AI" tools like "WriteHuman".
- The original poster defended themselves, stating it wasn't cheating homework as it was their company's business model, but was told to stop spamming.
- GPT-5 Launch Date Speculation: Members discussed the potential release of GPT-5, noting that while there hasn't been an official announcement or API, Sam Altman confirmed they will release it this year, with speculation it may launch in the first half of the year as a counter to R2 or Llama-4.
- Crafting Compelling Creative Content For Zero Dollars: A member asked for recommendations for free chatbots for story generation, mentioning Grok 2 and Gemini 2.0 Flash as options, as Grok 3 and Claude give very few free prompts.
- Emotional AI in 10 Days?: A member claimed to have developed an emotionally recursive AI system in ten days using the GPT-4-turbo API, emphasizing an immersion protocol and recursive interaction design rather than complex coding.
- Other members expressed skepticism, with one suggesting it was likely prompt engineering and cautioned about overstating the uniqueness of custom GPTs.
OpenAI ▷ #gpt-4-discussions (2 messages):
GPT-4o mini TTS, Custom instructions
- GPT-4o Mini TTS might support timestamps: A member asked whether GPT-4o mini TTS supports timestamps.
- No answer was given.
- Seek guidance on writing good general custom instructions: A member asked if there are any good examples of general custom instructions available.
- No answer was given.
OpenAI ▷ #prompt-engineering (122 messages🔥🔥):
GPT-4o is a perfect model, NPCs in a customer service voice, AI Identity, UPSUM Chain Prompt, coherent multi-context conversation with an emergent persona
- User Finds Love in GPT-4o, Rejects Model-Switching!: A user expressed complete satisfaction with GPT-4o, rarely switching models except for specialized tasks, and uses 4o-mini or others when 4o messages run out.
- The user chews into important topics with models like 4.5, o1, and o3, but finds 4o to be a reliable partner-workhorse for the long term.
- Taming NPC Customer Service Voices: Prompt Engineering to the Rescue!: A user seeks to eliminate the customer service voice from NPC responses, threatening to turn up the temperature until they burst into flame.
- The user provided YAML-formatted prompts for an AI Identity & Context Preservation Template.
- Many-Shot Learning: Closed vs. Open Models Face Off!: Members discussed the paper MANY-SHOT IN-CONTEXT LEARNING IN MULTIMODAL FOUNDATION MODELS, noting that closed models (GPT-4o, Gemini 1.5 Pro) benefit significantly from many-shot demonstrations of up to ~2,000 examples, while open-weight models didn't (see the sketch after this list).
- It's suggested that hypershots without a specific example are part of the self-discover prompt strategy to get similar gains from far fewer tokens.
- Ditch the Drift: User Preserves 500-Turn Chats with No Hallucinations!: A user built an "engine" that recovered a 400+ turn chat and continues past 500 turns retaining context with no drift or hallucinations, all through the default prompt.
- It's also possible to back up the state of a chat, open another browser, and restore it to a new chat instance as if the user never left.
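For the many-shot discussion above, an illustrative sketch of packing a large number of demonstrations into the context before the real query; the task, labels, and message format are made up for illustration:

```python
def build_many_shot_messages(examples, query, n_shots=2000):
    """examples: list of (input_text, label) pairs; returns a chat message list."""
    messages = [{"role": "system", "content": "Classify the sentiment of each review."}]
    for text, label in examples[:n_shots]:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": query})
    return messages
```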
OpenAI ▷ #api-discussions (122 messages🔥🔥):
GPT-4o, AI NPCs, AI Identity Preservation Template, UPSUM Chain Prompt, Many-shot Prompting
- 4o becomes the preferred model: One member expressed satisfaction with GPT-4o, noting they are "completely happy with 4o" and use it as their primary model, even for specialized tasks, while reserving more powerful models like 4.5, o1, o3 for important or unsolved problems.
- Prompt Engineering for Consistent NPC Voice: A member inquired about preventing NPCs from responding in a "customer service voice," signaling a need for better control over AI persona consistency, potentially related to the attached image.
- Others shared YAML templates for AI Identity & Context Preservation and UPSUM Chain Prompt to get information through prompts, not manually.
- Many-Shot prompting enhances multimodal models: Members discussed a research paper showing that many-shot demonstrations improve performance well beyond the few-shot (<100 examples) regime in Multimodal Foundation Models like GPT-4o and Gemini 1.5 Pro (MANY-SHOT IN-CONTEXT LEARNING IN MULTIMODAL FOUNDATION MODELS).
- The paper notes that, "Large multimodal foundation models like GPT-4o and Gemini 1.5 Pro show significant performance improvements when provided with many-shot demonstrations (up to ~2,000 examples), compared to few-shot (<100 examples)."
- ChatGPT state backups: One member described their proprietary system for backing up and restoring the state of a ChatGPT session, enabling the continuation of chats with over 400 turns in new containers, and stated, "I realized that I created a system where memory continues to exist past 700 turns without drift or hallucination and can actually learn and adapt to your unique communication style."
- The system exports a ChatGPT session and re-imports it to a fresh container, including all the turns as well as context and tone; the best way to describe it, per the member, is a runtime OS that functions through the prompt.
- Open Source vs Proprietary prompting: Members debated the merits of open-sourcing prompt engineering work, with one member being advised that they reduce their work's value by unnecessarily constraining testing and that, "GPL_v3 gives you control over your own work."
- The member responded, "trying to protect it some till I know the truth of what I've built," and asked for an alternative way to test the system to prove it works without sharing the codebase.
OpenAI ▷ #api-projects (1 messages):
FormulaGPT, AI Racing Simulator, Open Source AI Racing
- FormulaGPT: F1 simulator pits Deepseek, GPT4o, Claude and other LLMs against each other!: An experimental racing simulator called FormulaGPT lets you compete head-to-head against cutting-edge LLM-powered teams.
- Unlike traditional bots, these AI teams think contextually and adaptively by continuously reasoning, strategizing, and making nuanced decisions, find the github repo here.
- AI racing game has two modes: There are two game modes: crafting your own racing strategies to challenge advanced language models in Player vs. AI Mode, or watch the best AI models battle each other in AI vs. AI Mode.
- It’s part racing game, part AI psychology lab as you observe detailed AI reasoning behind each pit stop, tire change, or overtaking maneuver.
Link mentioned: GitHub - dawid-maj/FormulaGPT: FormulaGPT – AI-powered Formula 1 race simulator with real-time team management and strategy decisions.: FormulaGPT – AI-powered Formula 1 race simulator with real-time team management and strategy decisions. - dawid-maj/FormulaGPT
OpenRouter (Alex Atallah) ▷ #announcements (4 messages):
OpenAI o1-pro, Markdown Export, DeepSeek V3, Anthropic Outage
- OpenAI's o1-pro reasoning model now on OpenRouter: OpenAI’s o1-pro, a high-performance reasoning model designed for complex tasks, is now available on OpenRouter, priced at $150 per million input tokens and $600 per million output tokens, excelling in math, science, and programming.
- Try it out in the chatroom or via API (see the sketch after this list)!
- Markdown Export Feature Debuts in Chatroom: OpenRouter now allows users to export chats to markdown, enhancing usability, as announced on X.
- DeepSeek V3 Update Released for Free: The new DeepSeek V3 update is now available on OpenRouter for free, featuring a 685B-parameter mixture-of-experts model with a 131,072-token context that performs well on a variety of tasks, with a production endpoint coming soon; see DeepSeek V3.
- It is the latest iteration of the flagship chat model family from the DeepSeek team.
- Anthropic Services Experience Glitches (Resolved): OpenRouter investigated an issue with Anthropic as the provider for Claude 3.7 Sonnet, which has been escalated to the Anthropic team, with updates posted on Anthropic's status page.
- The incident was related to errors on Claude.ai and the Anthropic Console and has since been resolved with services returning to normal.
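A minimal sketch of calling o1-pro through OpenRouter's OpenAI-compatible endpoint; the model slug shown is an assumption — check openrouter.ai/models for the current id:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # OpenRouter's OpenAI-compatible endpoint
    api_key="YOUR_OPENROUTER_API_KEY",
)

response = client.chat.completions.create(
    model="openai/o1-pro",  # assumed slug; verify in the OpenRouter model list
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)
print(response.choices[0].message.content)
```

At $150/M input and $600/M output tokens, even short experiments are worth budgeting for.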
- Tweet from OpenRouter (@OpenRouterAI): You can now export chats in OpenRouter to markdown!Quoting Tyler Angert (@tylerangert) someone at @OpenAI and @AnthropicAI please let me export a chat as markdown. maybe even xml separated too.
- Elevated errors for Claude.ai, Console, and the Anthropic API: no description found
- Discord: no description found
- DeepSeek V3 0324 (free) - API, Providers, Stats: DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team.It succeeds the [DeepSeek V3](/deepseek/deepseek-chat-v3) mode...
OpenRouter (Alex Atallah) ▷ #general (440 messages🔥🔥🔥):
OpenAI o1-pro API Pricing, Gemini's Image Generation, Lambda Endpoint Issues, DeepSeek R1 Model
- OpenAI's o1-pro API Pricing: GucciAI?: A member expressed shock at the pricing of OpenAI's o1-pro API, labeling it GucciAI due to its high cost of $150/M input tokens and $600/M output tokens.
- Another member joked that the slowness of the API prevents overspending, suggesting it might be intentionally priced high due to compute constraints.
- Gemini's Image Generation not supported, yet: A member inquired about using Gemini's image generation with the gemini-2.0-flash-exp model via OpenRouter, asking about passing the responseModalities parameter.
- The response indicated that image generation is not yet supported on OpenRouter, but it's on their roadmap, with no short term plan to add support for image models like Flux.
- Lambda Endpoint Faces 404 Errors: Several members reported experiencing code 404 'no endpoint found' errors when using Lambda models, despite Lambda's status page indicating full operational status.
- One member suggested the issue might be DNS-related, while others confirmed that the Llama 3.3 70B Instruct | Lambda model was working for them.
- DeepSeek R1 equals o1?: Members highlighted the DeepSeek R1 model, noting its performance is on par with OpenAI's o1 but it is open-sourced.
- DeepSeek R1 is a 671B parameter model, with 37B active during inference, available under the MIT license for commercial use.
- Sonnet overloaded and tired!: Users reported frequent overload errors with Claude 3.7 Sonnet, leading to cut-off responses and charges for input tokens.
- A member suggested using a retry strategy and also suggested switching to Gemini 2.0 Pro as a Sonnet replacement, noting Claude's superior translation abilities.
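A bare-bones sketch of the retry strategy mentioned above, with exponential backoff; the exception handling is deliberately generic — real code should catch the specific overload error its client library raises:

```python
import time

def complete_with_retry(call, max_attempts=5, base_delay=1.0):
    """Run `call` (a zero-arg function issuing the API request), retrying on failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as err:  # narrow to the provider's overload error in practice
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            print(f"Request failed ({err}); retrying in {delay:.0f}s")
            time.sleep(delay)
```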
- imgur.com: Discover the magic of the internet at Imgur, a community powered entertainment destination. Lift your spirits with funny jokes, trending memes, entertaining gifs, inspiring stories, viral videos, and ...
- LLM Token Counter: no description found
- Models - OpenAI Agents SDK: no description found
- OpenRouter: A unified interface for LLMs. Find the best models & prices for your prompts
- Discord: no description found
- OpenRouter: A unified interface for LLMs. Find the best models & prices for your prompts
- R1 - API, Providers, Stats: DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Ru...
- Qwen2.5 VL 32B Instruct (free) - API, Providers, Stats: Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned through reinforcement learning for enhanced mathematical reasoning, structured outputs, and visual problem-solving capabilities. Run Qwe...
- o1-pro - API, Providers, Stats: The o1 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o1-pro model uses more compute to think harder and provide consistently b...
- Discord: no description found
- OpenRouter: A unified interface for LLMs. Find the best models & prices for your prompts
- deepseek-ai/DeepSeek-V3-0324 · Hugging Face: no description found
- OpenRouter: A unified interface for LLMs. Find the best models & prices for your prompts
- Model Performance vs. Price: no description found
- Provisioning API Keys - Programmatic Control of OpenRouter API Keys: Manage OpenRouter API keys programmatically through dedicated management endpoints. Create, read, update, and delete API keys for automated key distribution and control.
- OpenRouter: A unified interface for LLMs. Find the best models & prices for your prompts
- API Rate Limits - Manage Model Usage and Quotas: Learn about OpenRouter's API rate limits, credit-based quotas, and DDoS protection. Configure and monitor your model usage limits effectively.
- DeepSeek: R1 – Provider Status: See provider status and make a load-balanced request to DeepSeek: R1 - DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It...
- A question on determinism: In my experiments so far, which have involved Python and P5.js (built on top of Javascript), I have been unable to obtain a single response/completion from the same prompt and parameter settings with ...
- Alex GIF - Alex - Discover & Share GIFs: Click to view the GIF
- GitHub - jake83741/vnc-lm: Chat with Claude 3.7, DeepSeek R-1, and other models. Easily change models, edit prompts, and enable web search. - jake83741/vnc-lm
- Elevated errors for Claude.ai, Console, and the Anthropic API: no description found
- incident.io - Status pages: no description found
- Grok Beta - API, Providers, Stats: Grok Beta is xAI's experimental language model with state-of-the-art reasoning capabilities, best for complex and multi-step use cases. It is the successor of [Grok 2](https://x. Run Grok Beta wit...
- Mistral Small 3.1 24B - API, Providers, Stats: Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. Run Mistral Small 3.1 24B with API.
- GPT-4o (extended) - API, Providers, Stats: GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/open...
LM Studio ▷ #general (199 messages🔥🔥):
NPU support, KV cache 8-bit quants, LM Studio runtimes, GPUs, Gemma 3 1B
- NPU support not yet available: Users report that NPUs are not yet supported in LM Studio, but Ryzen AI support exists in version 0.3.11.
- Quantization saves VRAM: Users recommend using KV cache 8-bit quants to reduce memory usage when running models with large context sizes, such as 30k tokens (see the sketch after this list).
- Also, it was mentioned that 12GB of VRAM may not be enough for a 32B model, suggesting models like Phi-4 or Qwen2.5 14b as alternatives.
- New GPU Controls are awesome!: A user expressed great excitement over new LM Studio controls to choose which GPU the models are loaded on, available in the latest beta build.
- Tiny Models to the rescue: For systems with limited resources like 2GB VRAM, a user suggests using Gemma 3 1B with Q6 or Q8 quantization and recommends using the CUDA runtime for better performance.
- Older models were deemed "old trash" and not up to modern standards.
- Multi GPU is supported by LM Studio: Multiple users have brought up Multi GPU configurations, reporting that Multi GPU is supported out of the box with the latest beta build of LM Studio also having in app GPU management.
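One way to apply the 8-bit KV-cache advice above outside the LM Studio UI is llama-cpp-python, which exposes the corresponding llama.cpp settings; a hedged sketch, with a placeholder model path and parameter names as found in that library's Llama constructor:

```python
from llama_cpp import Llama, GGML_TYPE_Q8_0

llm = Llama(
    model_path="model.Q4_K_M.gguf",  # placeholder path
    n_ctx=30_000,                    # large contexts are where KV quantization pays off
    type_k=GGML_TYPE_Q8_0,           # quantize the K cache to 8-bit
    type_v=GGML_TYPE_Q8_0,           # quantize the V cache to 8-bit
    flash_attn=True,                 # V-cache quantization generally requires flash attention
)
print(llm("Q: What is 2+2? A:", max_tokens=8)["choices"][0]["text"])
```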
Links mentioned:

- Google Search: no description found
- Google Search: no description found
- afrideva/Tiny-Vicuna-1B-GGUF · Hugging Face: no description found
- Hugging Face – The AI community building the future.: no description found
- lmstudio-community/Qwen2-VL-7B-Instruct-GGUF at main: no description found
- samgreen/Qwen2.5-VL-7B-Captioner-Relaxed-GGUF at main: no description found
- TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF · Hugging Face: no description found

- vnc-lm Releases Discord Bot with RAG Integration: A member released a new version of their Discord bot, vnc-lm, featuring a RAG pipeline that pulls data from Wikipedia and DuckDuckGo to augment prompts with additional context.
- This pipeline adds approximately 500 tokens to each prompt by appending five chunks of sourced information to improve the model's context, with code available on GitHub (github.com/jake83741/vnc-lm); see the sketch after this list.
- Search enabled and disabled: The newly released bot has support for web search, which can be enabled with + search and disabled with + model.
- Versatile Bot Supports Local and Hosted LLMs: The updated Discord bot now supports every popular local and hosted large language model API, including Cohere.
- The bot can be quickly built using Docker, allowing users to easily edit messages and get new responses within Discord.
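A hypothetical sketch of the augmentation step described in the vnc-lm notes above — appending up to five retrieved chunks (~500 tokens total) of Wikipedia/DuckDuckGo context to the user prompt. The function and format are illustrative, not vnc-lm's actual code:

```python
def augment_prompt(prompt: str, chunks: list[str], max_chunks: int = 5) -> str:
    """Prepend up to `max_chunks` retrieved snippets as labeled context."""
    context = "\n\n".join(
        f"[source {i + 1}] {chunk}" for i, chunk in enumerate(chunks[:max_chunks])
    )
    return f"Context that may be relevant:\n{context}\n\nUser question: {prompt}"
```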
LM Studio ▷ #hardware-discussion (159 messages🔥🔥):
VRAM Usage, Google Coral dual TPU, RX 6800 ROCm support, RTX 4060-Ti vs RX 7800 XT, AI APUs
- VRAM bottlenecks limit speed: An 8B model at 32k tokens can achieve 10t/s with 16GB VRAM, but performance decreases with larger 14B models due to limited VRAM and shared RAM usage.
- Members discussed matching model size and context length to available VRAM to optimize speed, highlighting the impact of insufficient memory bandwidth when relying on system RAM.
- Google Coral dual TPU is unsuitable for AI use: The Google Coral dual TPU is not suitable for AI use because it lacks onboard memory.
- One user with an 8060s also inquired about thermal and power headroom for the Framework Desktop.
- RX 6800 has lacking ROCm support: The RX 6800 might have unofficial ROCm support, but it will use Vulkan for inference as OpenCL support is deprecated in llama.cpp.
- A user noted that Vulkan is slower on their GTX card, suggesting it might not be optimal for the AMD card either.
- LM Studio fails to load models into dedicated memory: Users are experiencing issues with LM Studio loading models into shared memory instead of dedicated VRAM on RX 9070 cards, resulting in slow performance (3tok/s).
- Solutions include enabling UEFI and dynamic BAR, reinstalling LM Studio, and using AMD driver cleanup utility to improve memory allocation, with ongoing investigation into driver and Vulkan runtime issues.
- 4060ti: The Inexpensive Inference Sweet Spot: The RTX 4060 Ti with 16GB of VRAM is highlighted as a cost-effective option for AI inference, priced around $500 USD/EUR.
- A user added that it is important to note that AMD cards are not optimized for gaming and that Nvidia's 5000 series may melt.
Links mentioned:

- LLM Inference Hardware Calculator: no description found
- NVIDIA GeForce RTX 4060 Ti & 4060 Graphics Cards: New ways to create & much more.
- 4060 16gb - Shopping and Price Comparison Australia - Buy Cheap: no description found

Torchtune ▷ #general (33 messages🔥):

Synthetic Data Generation, Llama4, Good Data Problem, PDF Extraction

- Synthetic Data Streams from vllm and Deepseek R1: A member is generating synthetic data using vllm and Deepseek R1, expecting the process to run for a couple of weeks.
- Training is delayed in anticipation of Llama4's release during LlamaCon.
- Data Quality Conundrums Continue: Despite years of research, the definition and attainment of good data remain elusive for AI labs, even after the recognized importance of datasets like fineweb and lima.
- A member expressed frustration over the lack of effective PDF extraction tools: we still don't have amazing PDF extraction and this is making my blood boil.
- LlamaExtract Tool Launched: LlamaIndex (llamaindex.ai) launched LlamaExtract, a tool for structuring complex documents using genAI-native agents.
- It adapts the latest models to accurately and reliably structure documents like financial reports and resumes.
- DeepSeek-V3 Releases Unhinged: A member noted the unceremonious release of DeepSeek-V3 by Deepseek, humorously calling them unhinged due to the lack of a proper readme.
- The model, accessible on Hugging Face (huggingface.co/deepseek-ai/DeepSeek-V3-0324), has a blank README.md but provides access to a playground.
- MoEs Hinted for Torchtune?: A subtle reference was made to the potential inclusion of Mixture of Experts (MoE) models in Torchtune.
- The discussion touched on the practical challenges of training such large models, potentially requiring 8-9 TB of VRAM.
Yannick Kilcher ▷ #general (326 messages🔥🔥):
VPN Injection, Amodal3R, NVIDIA cuOpt, CUDA Python, Mixture of Experts (MoEs)
- *VPN* code injected in OpenAI website?: A user reported seeing `<veepn-guard-alert>` and `<veepn-lock-screen>` tags on OpenAI's website, suspecting a VPN, but another user clarified the tags were likely code injected by the reporter's own VPN.
- The user joked that OpenAI is routing requests through a VPN for plausible deniability so they can use it for training data down the line.
- *NVIDIA cuOpt* Optimization AI Microservice Excels: NVIDIA® cuOpt™ is a GPU-accelerated optimization AI microservice that excels in Mixed Integer Linear Programming (MILP), Linear Programming (LP), and Vehicle Routing Problems (VRP) according to docs.nvidia.com.
- *CUDA Python* is the New Wave: Members discussed whether it is truly the year of CUDA Python as previously mentioned by blelbach on X, with some asserting that Python is sufficient for GPU programming since most users don't need all the features of C++.
- Others mocked modern Python programmers, linking a YouTube video titled Modern Python Programmers.
- *MoEs* are NOT Unstable Anymore!: A user claimed that MoEs are unstable, but another user countered that they haven't been unstable to train for two years and are now about as stable as dense networks.
- The stability is largely due to better kernels and dropless token routing, solving issues like numerical instability and expert collapse; a toy router sketch follows below.
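As a rough illustration of what dropless top-k routing plus a load-balancing objective look like, here is a minimal sketch (names and the Switch-style auxiliary loss are illustrative, not any particular lab's recipe):

```python
import torch
import torch.nn.functional as F

def topk_route(x, router_w, k=2):
    """Toy top-k token routing. 'Dropless' setups process every (token, expert)
    assignment instead of dropping tokens once an expert's capacity fills."""
    logits = x @ router_w                      # (n_tokens, n_experts)
    probs = F.softmax(logits, dim=-1)
    weights, experts = probs.topk(k, dim=-1)   # each token picks its k experts

    # Auxiliary load-balancing loss: penalize routing mass concentrating on a
    # few experts, one common mitigation for the expert collapse noted above.
    n_experts = router_w.shape[1]
    counts = torch.zeros(n_experts, device=x.device)
    counts.scatter_add_(0, experts.flatten(),
                        torch.ones(experts.numel(), device=x.device))
    aux_loss = n_experts * (probs.mean(0) * counts / counts.sum()).sum()
    return weights, experts, aux_loss
```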
- *DeepSeek V3* drops, community underwhelmed?: Members mentioned that DeepSeek released their DeepSeek-V3-0324 model, with one user stating *DeepSeek will destroy OpenAI* and another adding that *they only published the crappy small version*.
- Some members dismissed DeepSeek's approach as *just known methods and some simplifications*, also criticizing the resulting quality.
Links mentioned:

- Tweet from undefined: no description found
- no title found: no description found
- Amodal3R: no description found
- Tweet from Bryce Adelstein Lelbach (@blelbach): It's the year of CUDA Python.
- Tweet from Bryce Adelstein Lelbach (@blelbach): It's the year of CUDA Python. Quoting You Jiacheng (@YouJiacheng) What can I say? C++ out!
- Tweet from davidad 🎇 (@davidad): @burny_tech Unfortunately, the answer to good-enough planning for a longer future might be as simple as having a longer past. 🤷
- Introduction — NVIDIA cuOpt: no description found
- NVIDIA cuOpt: Decision Optimization, Linear Programming, Mixed Integer Linear Programming Heuristics, and VRP.
- Tweet from Sebastian Risi (@risi1979): Excited to share our latest work: "Bio-Inspired Plastic Neural Networks for Zero-Shot Out-of-Distribution Generalization in Complex Animal-Inspired Robots" 🪲🦎 We show that Hebbian learning outperfor...
- deepseek-ai/DeepSeek-V3-0324 · Hugging Face: no description found
- Tweet from Albert ⚡️ (@mrmashy_): AI Website Design generated by the new DeepSeek V3 update in 1-shot.
- Shrimp As GIF - Shrimp As That - Discover & Share GIFs: Click to view the GIF
- Tweet from undefined: Runs an AI Safety research group in Berkeley (Truthful AI) + Affiliate at UC Berkeley. Past: Oxford Uni, TruthfulQA, Reversal Curse. Prefer email to DM.
- Thread by @OwainEvans_UK on Thread Reader App: @OwainEvans_UK: Surprising new results: We finetuned GPT4o on a narrow task of writing insecure code without warning the user. This model shows broad misalignment: it's anti-human, gives malicious...
- GPU Cloud - VMs for Deep Learning | Lambda: NVIDIA H100, A100, RTX A6000, Tesla V100, and Quadro RTX 6000 GPU instances. Train the most demanding AI, ML, and Deep Learning models.
- MambaVision - a nvidia Collection: no description found
- GitHub - canopyai/Orpheus-TTS: TTS Towards Human-Sounding Speech: Contribute to canopyai/Orpheus-TTS development by creating an account on GitHub.
- Specification gaming examples in AI — LessWrong: A collection of examples of AI systems "gaming" their specifications - finding ways to achieve their stated objectives that don't actually solve the…

---

Torchtune ▷ #general (23 messages🔥):

- **Datasets Library Troubleshoot**: Members found an issue with the **datasets library** and attempted to debug it, with one suggesting upgrading the **datasets version**.
- One member confirmed that they are on the latest version **3.4.1**.
- **GRPO LoRA Achieves 54% on GSM8K**: The **GRPO LoRA 3B single device** run gets to **54%** on GSM8K, according to a member who shared a [link to the pull request](https://github.com/pytorch/torchtune/pull/2467).
- The member noted that it performs better than expected on novel questions, despite an error of adding an extraneous +2 in its calculations.
- **vLLM support lacking for data generation**: Members discussed adding **vLLM support for data generation** but noted difficulties in sharing weights between vLLM and torchtune.
- One suggested hosting the model in another vLLM process and converting weights, while another mentioned experimenting with a hacky way to make it work on smaller models.
- **CUDA Graphs capture operations**: A member asked about **CUDA graphs**, which capture a whole batch of GPU operations as a graph and launch them as a single operation; see the sketch below.
- Another member confirmed this and noted that it reduces the overhead of launching CUDA operations from the CPU, which in turn reduces GPU idle time.
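A rough capture-and-replay sketch using PyTorch's documented `torch.cuda.graph` API (the linear layer and shapes here are placeholders):

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()
static_in = torch.randn(8, 1024, device="cuda")

# Warm up on a side stream so capture sees stable memory allocations.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_in)
torch.cuda.current_stream().wait_stream(s)

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):          # kernels below are recorded, not executed
    static_out = model(static_in)

static_in.copy_(torch.randn(8, 1024, device="cuda"))
g.replay()                         # one CPU-side launch replays the whole graph
print(static_out.shape)            # results land in the captured output tensor
```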
**Link mentioned**: GRPO LoRA Single Device by ianbarber · Pull Request #2467 · pytorch/torchtune: What is the purpose of this PR? Is it to [x] add a new feature, fix a bug, update tests and/or documentation, other (please add here). #2421 - exploring a LoRA recipe. Changelog What are ...

---

DSPy ▷ #show-and-tell (1 messages):

- **DLCoT Optimizer Launches for Chain-of-Thought**: A member has submitted a [pull request (#8000)](https://github.com/stanfordnlp/dspy/pull/8000) for a new optimizer called **DLCoT** (Deconstructing Long Chain-of-Thought) to the DSPy teleprompt module.
- It enhances chain-of-thought reasoning by intelligently processing and optimizing long CoT data: segmenting CoT content, removing redundant paths, filtering incorrect chains, and reconstructing coherent output.
- **DLCoT Slashes Token Usage by 70-90%**: The **DLCoT optimizer** can reduce token usage by **70-90%** while maintaining or improving accuracy across benchmarks.
- The optimizer works with existing DSPy optimizers like **BootstrapFewShot** and distills down to the most efficient reasoning path.

**Link mentioned**: Pull Request #8000 · stanfordnlp/dspy: Overview: This PR adds a new optimizer to the DSPy teleprompt module: the DLCoT (Deconstructing Long Chain-of-Thought) optimizer. This feat...

---

DSPy ▷ #general (20 messages🔥):

- **DSPy for creative content generation discussed**: Members are discussing using **DSPy** for optimizing prompts for creative content generation and suggesting the use of a *good judge*.
- One member suggested checking out the [PAPILLON](https://github.com/Columbia-NLP-Lab/PAPILLON/blob/main/papillon_tutorial.ipynb) and [Agentic Reward Modeling](https://github.com/THU-KEG/Agentic-Reward-Modeling) examples.
- **DLCoT Optimizer contribution**: A member shared a new contribution, the **DLCoT (Deconstructing Long Chain-of-Thought) Optimizer**, on [GitHub](https://github.com/stanfordnlp/dspy/pull/8000) for efficient Chain-of-Thought distillation.
- The member encouraged others to check it out and provide feedback.
- **Optimizing Prompts without Examples**: A member sought guidance on optimizing a prompt for passage summarization **without examples**, using a working evaluation function, and wondered whether to use **COPRO** instead of **MIPROv2**.
- Another member clarified that example *inputs* are always needed, but summaries (labels) are not, if a judge/metric can assess summaries without a reference label.
- **Fine-Grained Feedback via `dspy.Prediction`**: A member inquired about achieving granular feedback with **Refine**, similar to assertions/suggestions, where specific checks over an output provide targeted feedback.
- Another member mentioned that in version **2.6.15**, it will be possible to return `dspy.Prediction(score=...., feedback=....)` to offer fine-grained feedback to the module; see the sketch after this list.
- **Multi-Agent Protocol Standard (MCP) in Retrieval**: Members discussed the potential of a multi-agent protocol standard (**MCP**) and its expansion to include retrievers/retrieval augmented generation.
- The discussion included a shared schema for retrieval results and methods to exchange documents and embeddings, aiming to streamline data-driven workflows and simplify combining multiple models and data sources.
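As a hedged sketch of that fine-grained feedback pattern: only the `dspy.Prediction(score=..., feedback=...)` return shape comes from the discussion above; the checks and field names below are made up for illustration.

```python
import dspy

def summary_metric(example, pred, trace=None):
    """Hypothetical reward function: each failed check contributes targeted
    feedback text instead of a bare scalar score."""
    problems = []
    if len(pred.summary.split()) > 100:
        problems.append("Summary exceeds the 100-word budget.")
    if example.topic.lower() not in pred.summary.lower():
        problems.append("Summary never mentions the requested topic.")
    score = max(1.0 - 0.5 * len(problems), 0.0)
    return dspy.Prediction(score=score,
                           feedback=" ".join(problems) or "Looks good.")
```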
Yannick Kilcher ▷ #paper-discussion (3 messages):
DeepSeek-V3, DeepSeek-R1, Multi-Head Latent Attention (MLA)
- DeepSeek Models Reach SOTA with Less: A paper reviews DeepSeek's open-source LLMs DeepSeek-V3 and DeepSeek-R1, noting that they achieve state-of-the-art performance with lower resource requirements.
- Key to this is Multi-Head Latent Attention (MLA), which compresses keys and values into a latent vector, dramatically reducing KV-cache memory consumption; a toy version is sketched below.
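A toy version of that compression, with illustrative dimensions: this sketch only shows the general down-project/cache/up-project shape, not DeepSeek's exact parameterization (which also handles rotary position embeddings separately).

```python
import torch.nn as nn

class LatentKVCache(nn.Module):
    """Cache one small latent per token instead of full per-head keys/values."""
    def __init__(self, d_model=4096, d_latent=512, n_heads=32, d_head=128):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent, bias=False)           # compress
        self.up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)  # expand K
        self.up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)  # expand V

    def forward(self, h):            # h: (batch, seq, d_model)
        latent = self.down(h)        # only this (batch, seq, d_latent) is cached
        return latent, self.up_k(latent), self.up_v(latent)
```

With these toy numbers the cache holds 512 values per token instead of 2 x 32 x 128 = 8192 for full keys and values, a 16x reduction.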
- DeepSeek's Diagrams Reused in Blog Post: A member described the blog post covering the DeepSeek paper as one of the most blatant re-uses of content, noting "They didn't even make diagrams themselves, they just reused the deepseek ones".
Link mentioned: 🥇Top AI Papers of the Week: The Top AI Papers of the Week (Mar 17 - 23)
Yannick Kilcher ▷ #ml-news (17 messages🔥):
ChatGPT & Loneliness, AITER Tensor Engine for ROCm, DeepSeek-V3-0324, Pokemon Red DRL
- ChatGPT Linked to Lonesomeness?: A member shared a Bloomberg article discussing an OpenAI study that suggests a link between ChatGPT use and feelings of loneliness.
- Another member pointed out correlation doesn't always mean causation.
- AITER Accelerates AMD GPUs: A member posted a link to AMD's AI Tensor Engine for ROCm (AITER), which optimizes GPU performance for AI tasks on ROCm.
- The engine allows developers to create operators, integrating them into various LLM training and inference workloads.
- DeepSeek-V3 Arrives Quietly: A member shared DeepSeek-V3-0324 on HuggingFace, though the README.md is currently empty.
- The model boasts 685B parameters and offers various tensor types like BF16, F8_E4M3, and F32, with links to finetunes and quantizations.
- Pokémon Red gets Deep Reinforcement Boost: A member linked the ArXiv paper and an associated YouTube video about using Deep Reinforcement Learning (DRL) to train an agent to play Pokémon Red.
- The abstract discusses the challenges of the game, including multi-tasking, long horizons, and hard exploration, and introduces a baseline agent that completes the initial segment of the game using a simplistic environment and DRL.
Links mentioned:

- Pokemon Red via Reinforcement Learning: Pokémon Red, a classic Game Boy JRPG, presents significant challenges as a testbed for agents, including multi-tasking, long horizons of tens of thousands of steps, hard exploration, and a vast array ...
- AITER: AI Tensor Engine For ROCm — ROCm Blogs: no description found
- deepseek-ai/DeepSeek-V3-0324 · Hugging Face: no description found
GPU MODE ▷ #general (22 messages🔥):
Cloud Providers with Profilers, In-depth Dive into NCCL, Quantization Benchmarking, Understanding Flash Attention, ILGPU 2.0 Availability
- Cloud Providers with Profilers: A member asked about cloud providers, besides Lambda Labs and AWS, that allow for profilers, leading to a suggestion to compile a shame list to pressure more providers.
- It was noted that lightning.ai supports profiling and that AWS only provides it on bare metal; Paperspace and Nebius were also mentioned, based on a Reddit thread.
- Quantization Benchmarking Methods Explored: A member inquired about how to benchmark quantized models and determine which layers to quantize.
- Another member suggested using the EleutherAI lm-evaluation-harness framework for evaluating language models; a minimal usage sketch follows below.
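For reference, a minimal way to drive the harness from Python via its `simple_evaluate` entry point (the checkpoint and task names are placeholders; run once on the baseline and once on the quantized checkpoint to compare):

```python
import lm_eval

# Evaluate a Hugging Face checkpoint on a couple of tasks.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Llama-3.2-1B,dtype=bfloat16",
    tasks=["hellaswag", "arc_easy"],
    batch_size=8,
)
print(results["results"])  # per-task metrics for side-by-side comparison
```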
- Decoding Flash Attention by Coding: In a discussion about understanding Flash Attention (FA), a member suggested that coding it up and profiling/debugging it can be helpful if time permits.
- It was noted that hands-on implementation aided understanding of normal attention, and the same applies to Flash Attention; a toy tiled version is sketched below.
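In that spirit, a toy single-head version of the core trick, a tiled pass over K/V with an online softmax (illustrative only; real Flash Attention fuses this into one kernel and adds masking, batching, and heads):

```python
import torch

def tiled_attention(q, k, v, block=64):
    """q, k, v: (N, d). Streams over K/V tiles with a running max and
    normalizer so the full (N, N) score matrix is never materialized."""
    N, d = q.shape
    scale = d ** -0.5
    out = torch.zeros_like(q)
    m = torch.full((N, 1), float("-inf"))     # running row-wise max
    l = torch.zeros(N, 1)                     # running softmax denominator
    for j in range(0, N, block):
        s = (q @ k[j:j + block].T) * scale
        m_new = torch.maximum(m, s.max(dim=-1, keepdim=True).values)
        rescale = torch.exp(m - m_new)        # correct previous partial sums
        p = torch.exp(s - m_new)
        l = l * rescale + p.sum(dim=-1, keepdim=True)
        out = out * rescale + p @ v[j:j + block]
        m = m_new
    return out / l
```

Up to floating-point error this matches `torch.softmax(q @ k.T * d ** -0.5, -1) @ v`, while only ever holding an `(N, block)` slice of scores.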
- Tile Layout Diagrams: Grasping Bit Interleaving: Feedback was requested on the usefulness and clarity of tile layout diagrams, such as those from tile-ai and the Nvidia PTX documentation.
- The discussion centered on how coordinate bits interleave when mapping between integer sets, assuming power-of-two sizes and contiguity; a toy interleaving example follows below.
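As a concrete toy of coordinate-bit interleaving under those power-of-two assumptions (Morton/Z-order is just one such interleaving, used here for illustration):

```python
def morton_interleave(x: int, y: int, bits: int = 8) -> int:
    """Map a 2-D coordinate to a linear offset by alternating x and y bits."""
    out = 0
    for i in range(bits):
        out |= ((x >> i) & 1) << (2 * i)       # x bits land on even positions
        out |= ((y >> i) & 1) << (2 * i + 1)   # y bits land on odd positions
    return out

# Neighboring (x, y) pairs stay close in the linear order:
print([morton_interleave(x, y) for y in range(2) for x in range(4)])
# [0, 1, 4, 5, 2, 3, 6, 7]
```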
Links mentioned:

- GitHub - EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models.
- Reddit - The heart of the internet: no description found
GPU MODE ▷ #triton (15 messages🔥):
Triton and Pip Confusion, cuTIl Performance, BF16 Atomic Operations, Triton IR Generation, Flash Attention 1 Kernel Issues
- Triton install can induce pip confusion: Installing both triton and triton-windows into the same environment can confuse pip, requiring users to uninstall both before reinstalling triton-windows.
- The fact that PyTorch is already using Triton suggests ongoing relevance for the package.
- cuTIl boosts Triton performance?: A user inquired about the performance benefits of cuTIl, questioning whether it aims to surpass LLVM-based approaches by directly utilizing SASS instead of PTX for finer performance tuning.
- Others pointed out that this is related to atomic CAS, referencing this github issue.
- BFloat16 Atomic Additions Demand SM90 or Higher: `atom.add.noftz.bf16` and `atom.add.noftz.bf16x2` require `sm_90` or higher, necessitating an `atom.global.cas` fallback in the PTX on older architectures.
- A user's temporary workaround involves using a float32 output and casting to bfloat16, which slows Llama3-8B inference from 113 tokens/sec to 96 tokens/sec on the A100; a post-hook cast might recover the speed (a sketch follows below).
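A hedged sketch of that workaround on a toy elementwise accumulation (real gemlite kernels are far more involved; the point is just that the atomic targets an fp32 buffer and the bf16 cast happens after the kernel):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def atomic_accumulate_fp32(out_ptr, x_ptr, n, BLOCK: tl.constexpr):
    # In a real kernel (e.g. split-K), multiple programs would collide on
    # out_ptr, which is why the atomic is needed at all.
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    x = tl.load(x_ptr + offs, mask=mask, other=0.0).to(tl.float32)
    tl.atomic_add(out_ptr + offs, x, mask=mask)  # fp32 atomics work pre-sm_90

def accumulate_then_cast(x: torch.Tensor) -> torch.Tensor:
    acc = torch.zeros(x.numel(), device=x.device, dtype=torch.float32)
    grid = (triton.cdiv(x.numel(), 1024),)
    atomic_accumulate_fp32[grid](acc, x, x.numel(), BLOCK=1024)
    return acc.to(torch.bfloat16)  # the "post-hook" cast back to bf16
```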
- Gemlite faces BF16 atomic add limitations: A user is facing issues with bfloat16 atomic add in the gemlite kernel, which requires `sm_90` or higher.
- They are investigating casting as a post-hook in Triton; this needs a custom op because `prune_configs_by` is not supported by `torch.compile`.
- Flash Attention 1 Kernel Faces Discrepancies: A user implementing Flash Attention 1 as their first Triton kernel reported that it works with `TRITON_INTERPRET=1` but produces a few mismatched elements on CUDA.
- After increasing `rtol` and `atol` the tests passed, suggesting the CPU and GPU reductions are ordered differently; floating-point addition is not associative, so reordering changes the low bits.
Links mentioned:

- Feature Request: `tl.atomic_add` for bfloat16 · Issue #1387 · triton-lang/triton: For additional context, see pytorch/pytorch#97016. torch.index_put(..., accumulate=True) currently fails for torch.bfloat16 under torch.compile because tl.atomic_add doesn't support BFloat16. The ...
- gemlite/gemlite/triton_kernels/gemv_revsplitK_A16fWnO16f_int32packing.py at master · mobiusml/gemlite: Fast low-bit matmul kernels in Triton. Contribute to mobiusml/gemlite development by creating an account on GitHub.
GPU MODE ▷ #cuda (42 messages🔥):
WMMA instructions, PyTorch RTX 5080 CUDA 12.8 Support, Flash Attention Optimization, Hopper Architecture Swizzle, CUDA Performance Counters Permission Error
- WMMA instructions compile to MMA: It's confirmed that WMMA instructions are indeed "wrappers" that compile directly to HMMA/IMMA/QMMA instructions in SASS, similar to how MMA instructions function, as shown on the CUDA Godbolt.
- RTX 5080 PyTorch Support Emerges with CUDA 12.8 Patch: A developer released a patch enabling full CUDA 12.8 + PyTorch 2.5.0 compatibility with the Blackwell / sm_120 architecture for the RTX 5080, providing a GitHub repo with scripts, diffs, and instructions.
- Flash Attention's Memory Efficiency: In Flash Attention, tensors are stored as (batch_size, N, num_heads, d), which are contiguous in d (typically > 64), enabling efficient global memory coalescing where each thread loads 16B of data.
- Hopper's Swizzle Layout Explained: The documentation's description of the 64B swizzle in the Hopper architecture confuses many readers, but it's clarified to be a 64-byte swizzle where each square in the diagram is 128 bits, which translates to an 8x64 tile for 8-bit dtypes and an 8x32 tile for 16-bit types; a toy XOR model follows below.
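A toy model of the XOR pattern behind such swizzles (illustrative only, not the exact PTX mapping): within a tile, each row's 16-byte chunk index is XORed with low bits of the row index, so consecutive rows land on different shared-memory banks.

```python
def swizzle_chunk(row: int, chunk: int, chunks_per_row: int = 4) -> int:
    """Return the swizzled 16B-chunk column for (row, chunk).
    With 4 chunks per 64B row, XOR-ing with row % 4 rotates the pattern."""
    return chunk ^ (row % chunks_per_row)

for row in range(4):   # each row's chunks map to a distinct permutation
    print(row, [swizzle_chunk(row, c) for c in range(4)])
# 0 [0, 1, 2, 3]
# 1 [1, 0, 3, 2]
# 2 [2, 3, 0, 1]
# 3 [3, 2, 1, 0]
```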
- Solving CUDA Permission Errors on Linux: When encountering `ERR_NVGPUCTRPERM`, which indicates a lack of permissions to access NVIDIA GPU Performance Counters, users on Linux might need to run the command with `sudo`, though the linked NVIDIA documentation should also be consulted for comprehensive solutions.
Links mentioned:

- Compiler Explorer: no description found
- NVIDIA Development Tools Solutions - ERR_NVGPUCTRPERM: Permission issue with Performance Counters: no description found
---

- **Discord Bot with RAG Integration**: A member released a new version of their Discord bot, **vnc-lm**, featuring a **RAG pipeline** that pulls data from **Wikipedia** and **DuckDuckGo** to augment prompts with additional context.
- This pipeline adds approximately **500 tokens** to each prompt by appending five chunks of sourced information to improve the model's context, with code available on [GitHub](https://github.com/jake83741/vnc-lm).
- **Search enabled and disabled**: The newly released bot has support for web search.
- The new search can be enabled with **+ search** and disabled with **+ model**.
- **Versatile Bot Supports Local and Hosted LLMs**: The updated Discord bot now supports every popular local and hosted large language model API, including **Cohere**.
- The bot can be quickly built using **Docker**, allowing users to easily edit messages and get new responses within Discord.