[AINews] lots of little things happened this week
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
Incremental updates are all you need.
AI News for 3/20/2025-3/21/2025. We checked 7 subreddits, 433 Twitters and 29 Discords (227 channels, and 3009 messages) for you. Estimated reading time saved (at 200wpm): 318 minutes. You can now tag @smol_ai for AINews discussions!
- Claude Code (which we mentioned last month) had a mini launch week
- Mindmaps in NotebookLM
- Roboflow launched their YOLO competitor
- Anthropic made a lot of noise about a think tool
- Gemini launched a bunch of things
- Kyutai Moshi added vision
- Topaz announced a fast upscaler
- Percy Liang relaunched HELM
All this and more in the Twitter/Reddit/Discord recaps. We hope to ship the weekly AINews this weekend.
Table of Contents
- AI Twitter Recap
- AI Reddit Recap
- AI Discord Recap
- PART 1: High level Discord summaries
- Cursor Community Discord
- Unsloth AI (Daniel Han) Discord
- OpenAI Discord
- LM Studio Discord
- aider (Paul Gauthier) Discord
- Perplexity AI Discord
- Interconnects (Nathan Lambert) Discord
- LMArena Discord
- Notebook LM Discord
- Nous Research AI Discord
- HuggingFace Discord
- MCP (Glama) Discord
- OpenRouter (Alex Atallah) Discord
- GPU MODE Discord
- Nomic.ai (GPT4All) Discord
- Yannick Kilcher Discord
- LlamaIndex Discord
- Cohere Discord
- Modular (Mojo 🔥) Discord
- DSPy Discord
- tinygrad (George Hotz) Discord
- Torchtune Discord
- PART 2: Detailed by-Channel summaries and links
- Cursor Community ▷ #general (789 messages🔥🔥🔥):
- Unsloth AI (Daniel Han) ▷ #general (241 messages🔥🔥):
- Unsloth AI (Daniel Han) ▷ #off-topic (3 messages):
- Unsloth AI (Daniel Han) ▷ #help (95 messages🔥🔥):
- Unsloth AI (Daniel Han) ▷ #showcase (7 messages):
- Unsloth AI (Daniel Han) ▷ #research (8 messages🔥):
- OpenAI ▷ #ai-discussions (296 messages🔥🔥):
- OpenAI ▷ #gpt-4-discussions (7 messages):
- OpenAI ▷ #prompt-engineering (22 messages🔥):
- OpenAI ▷ #api-discussions (22 messages🔥):
- LM Studio ▷ #general (103 messages🔥🔥):
- LM Studio ▷ #hardware-discussion (136 messages🔥🔥):
- aider (Paul Gauthier) ▷ #general (207 messages🔥🔥):
- aider (Paul Gauthier) ▷ #questions-and-tips (14 messages🔥):
- Perplexity AI ▷ #general (204 messages🔥🔥):
- Perplexity AI ▷ #sharing (6 messages):
- Perplexity AI ▷ #pplx-api (7 messages):
- Interconnects (Nathan Lambert) ▷ #news (76 messages🔥🔥):
- Interconnects (Nathan Lambert) ▷ #random (23 messages🔥):
- Interconnects (Nathan Lambert) ▷ #memes (2 messages):
- Interconnects (Nathan Lambert) ▷ #cv (6 messages):
- Interconnects (Nathan Lambert) ▷ #reads (16 messages🔥):
- Interconnects (Nathan Lambert) ▷ #policy (6 messages):
- LMArena ▷ #general (108 messages🔥🔥):
- Notebook LM ▷ #use-cases (21 messages🔥):
- Notebook LM ▷ #general (74 messages🔥🔥):
- Nous Research AI ▷ #general (79 messages🔥🔥):
- Nous Research AI ▷ #ask-about-llms (8 messages🔥):
- Nous Research AI ▷ #research-papers (1 messages):
- Nous Research AI ▷ #reasoning-tasks (3 messages):
- HuggingFace ▷ #general (50 messages🔥):
- HuggingFace ▷ #today-im-learning (1 messages):
- HuggingFace ▷ #i-made-this (3 messages):
- HuggingFace ▷ #computer-vision (1 messages):
- HuggingFace ▷ #smol-course (2 messages):
- HuggingFace ▷ #agents-course (24 messages🔥):
- MCP (Glama) ▷ #general (69 messages🔥🔥):
- MCP (Glama) ▷ #showcase (6 messages):
- OpenRouter (Alex Atallah) ▷ #general (64 messages🔥🔥):
- GPU MODE ▷ #general (9 messages🔥):
- GPU MODE ▷ #triton (6 messages):
- GPU MODE ▷ #cuda (3 messages):
- GPU MODE ▷ #torch (5 messages):
- GPU MODE ▷ #algorithms (2 messages):
- GPU MODE ▷ #lecture-qa (3 messages):
- GPU MODE ▷ #self-promotion (3 messages):
- GPU MODE ▷ #reasoning-gym (4 messages):
- GPU MODE ▷ #submissions (11 messages🔥):
- GPU MODE ▷ #hardware (2 messages):
- Nomic.ai (GPT4All) ▷ #general (35 messages🔥):
- Yannick Kilcher ▷ #general (10 messages🔥):
- Yannick Kilcher ▷ #paper-discussion (3 messages):
- Yannick Kilcher ▷ #ml-news (16 messages🔥):
- LlamaIndex ▷ #blog (1 messages):
- LlamaIndex ▷ #general (13 messages🔥):
- Cohere ▷ #「💬」general (4 messages):
- Cohere ▷ #「🔌」api-discussions (4 messages):
- Cohere ▷ #「🤖」bot-cmd (3 messages):
- Cohere ▷ #「🤝」introductions (2 messages):
- Modular (Mojo 🔥) ▷ #mojo (12 messages🔥):
- DSPy ▷ #general (1 messages):
- tinygrad (George Hotz) ▷ #general (1 messages):
- Torchtune ▷ #papers (1 messages):
AI Twitter Recap
Models and Benchmarks
- New research from @AnthropicAI reveals that a simple 'think' tool dramatically improves instruction adherence and multi-step problem solving for agents: @alexalbert__ documented these findings in a blog post. @skirano also noted that they made an MCP for this, which can be downloaded from their official Anthropic MCP server repo. @_philschmid observed that @AnthropicAI appears to be the first to release combined reasoning and tool use, with Claude reasoning, generating a function call, executing it, and then continuing to reason with the output. A minimal sketch of such a 'think' tool definition appears after this list.
- NVIDIA's Llama-3.3-Nemotron-Super-49B-v1 ranks at #14 on LMArena: According to @lmarena_ai, this model is a powerful open reasoning model, excelling in math with an openly released 15M post-training dataset. The ranking overview of this model, previously tested under the codename "march-chatbot" on LMArena, can be found here.
- Sakana AI is using Sudoku Puzzles to superpower AI reasoning: @SakanaAILabs announced the release of a new reasoning benchmark based on the modern variant of Sudoku to challenge the AI community, believing these puzzles are perfect for measuring progress in AI reasoning capabilities. The new benchmark and training data are available here. @hardmaru simply stated that as a species, we can improve our collective reasoning and problem-solving ability by playing Sudoku.
- The HELM benchmark has a new leaderboard: HELM Capabilities v1.0: @percyliang noted that they curated 5 challenging datasets (MMLU-Pro, GPQA, IFEval, WildBench, Omni-MATH) and evaluated 22 top language models.
- Meta AI released SWEET-RL, a novel RL algorithm for long-horizon & multi-turn tasks that performs better credit assignment: @AIatMeta reported that experiments demonstrate that SWEET-RL achieves a 6% absolute improvement in success & win rates on CollaborativeAgentBench compared to other state-of-the-art multi-turn RL algorithms, enabling Llama-3.1-8B to match or exceed the performance of GPT-4o in realistic collaborative content creation. More details on both of these releases can be found in the full paper published on arXiv.
- Meta AI also released a new agents benchmark: CollaborativeAgentBench, the first benchmark studying collaborative LLM agents that work with humans across multi-turn collaboration on realistic tasks in backend programming & frontend design: Details at @AIatMeta.
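Anthropic's 'think' tool is notable for how little machinery it needs: it is a no-op tool whose only purpose is to give the model a dedicated place to reason between tool calls. Below is a minimal sketch in the Anthropic Python SDK style; the schema mirrors the published blog example, but the model id and prompt are illustrative assumptions.

```python
# Hedged sketch of a no-op "think" tool in the Anthropic Messages API style.
# The tool performs no work; it only gives the model space to reason
# mid-trajectory. Treat the exact schema, model id, and prompt as illustrative.
import anthropic

think_tool = {
    "name": "think",
    "description": (
        "Use this tool to think about something. It will not obtain new "
        "information or change anything; it just logs the thought."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "thought": {"type": "string", "description": "A thought to think about."}
        },
        "required": ["thought"],
    },
}

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set
response = client.messages.create(
    model="claude-3-7-sonnet-latest",  # substitute any current Claude model
    max_tokens=1024,
    tools=[think_tool],
    messages=[{"role": "user", "content": "Plan a multi-step refund workflow."}],
)
# When the model calls `think`, return an empty tool_result and let it continue.
```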
Language Model Development and Releases
- Gallabytes joined Cursor to work on coding agents: After an incredible 3 years leading model development at Midjourney, @gallabytes announced their move to Cursor.
- Kyutai Labs released MoshiVis, an end-to-end low-latency Vision Speech Model: @reach_vb noted the model only adds 206M parameters and uses a learnable gating mechanism, adding only ~7ms per inference step on a Mac Mini with an M4 Pro chip, while maintaining real-time performance. A conceptual sketch of this kind of gate appears after this list.
- NVIDIA built GR00T N1, a powerful open-source AI model designed for humanoid robots: According to @TheTuringPost, it's a Vision-Language-Action (VLA) model based on Eagle-2 with SmolLM-1.7B, and a Diffusion Transformer. It generates 16 actions in ~64 milliseconds on an NVIDIA L40 GPU.
- ByteDance just announced InfiniteYou available on Hugging Face: According to @_akhaliq, this is for Flexible Photo Recrafting While Preserving Your Identity.
- Roblox just casually dropped an app for Cube 3D on Hugging Face: @_akhaliq noted that it generates 3D models directly from text.
- Claude gets real-time web search: According to @TheRundownAI, Claude gained real-time web search and OpenAI's voice AI got a personality boost. @_philschmid believes @AnthropicAI is the first to release combined reasoning + tool use.
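MoshiVis's "learnable gating mechanism" is only described at a high level in the announcement. The PyTorch sketch below shows the general Flamingo-style pattern such adapters follow, with a zero-initialized gate so the pretrained stream is untouched when training starts; it is a conceptual illustration, not Kyutai's code.

```python
import torch
import torch.nn as nn

class GatedVisionFusion(nn.Module):
    """Conceptual sketch: inject vision features into a speech/text stream
    via cross-attention scaled by a learnable, zero-initialized gate, so the
    pretrained model's behavior is unchanged at the start of training."""

    def __init__(self, dim: int, n_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # tanh(0) == 0: gate starts closed

    def forward(self, hidden: torch.Tensor, vision: torch.Tensor) -> torch.Tensor:
        attended, _ = self.cross_attn(hidden, vision, vision)
        return hidden + torch.tanh(self.gate) * attended

# Usage: fuse 16 vision tokens into a 32-step hidden stream.
fusion = GatedVisionFusion(dim=512)
out = fusion(torch.randn(2, 32, 512), torch.randn(2, 16, 512))
```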
AI Applications and Tools
- The Deep Research x AI Builder Thesis: @swyx theorizes the collision path between the prompt-to-app AI builder and the deep research agent, suggesting building a deep research app on demand to split out UI generation and data generation into separate agents.
- Dair.AI promotes the use of LLM-as-a-Judge, a technique for automating the assessment of LLM outputs by using a specialized LLM as a “Judge”: @dair_ai believes this enables rapid development of LLM applications and AI agents. A minimal judge sketch appears after this list.
- LangChain released MCP Adapters: @LangChainAI announced their new TypeScript library that connects Anthropic's MCP tools with LangChain.js & LangGraph.js, featuring multi-server support and seamless agent integration.
- LlamaIndex announced LlamaExtract is now in public beta: This leading, genAI-native agent for structured document extraction adapts the latest models to structure even the most complex documents: @jerryjliu0.
- Perplexity is working on an updated version of Deep Research: @AravSrinivas states that the new version will throw even more compute, think longer, present more detailed answers, use code execution, and render in-line charts.
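For readers new to LLM-as-a-Judge (mentioned above), here is a minimal sketch. The rubric, scale, and model name are illustrative assumptions; production setups typically add few-shot examples and position-bias mitigations.

```python
# Minimal LLM-as-a-Judge sketch. The rubric, model name, and 1-5 scale are
# illustrative assumptions, not a prescribed standard.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

JUDGE_PROMPT = """You are an impartial judge. Rate the ASSISTANT ANSWER to the
QUESTION on a 1-5 scale for factual accuracy and helpfulness.
Reply with only the integer score.

QUESTION: {question}
ASSISTANT ANSWER: {answer}"""

def judge(question: str, answer: str) -> int:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(question=question, answer=answer)}],
        temperature=0,  # keep scoring as deterministic as possible
    )
    return int(resp.choices[0].message.content.strip())

print(judge("What is 2 + 2?", "4"))  # expected: a high score like 5
```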
AI Community and Events
- Andrew Ng shared his observations from the AI Dev 25 conference: @AndrewYNg noted that agentic AI continues to be a strong theme, developers are fine-tuning smaller models on specific data, and many speakers spoke about the importance of being pragmatic about what problems we are solving, as opposed to buying into the AGI hype.
Optimization and Training
- Cloneofsimo shared findings from exploring extreme beta values in training: @cloneofsimo notes that large beta2 seems crucial until beta1 also becomes small, and that small beta1 allows small beta2. A sketch of sweeping these knobs appears after this list.
- Hamel Husain provided an update on training tools: @HamelHusain let his audience know that he'd be online in ~ 15 min (will be recorded for those who sign up).
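For context on the beta discussion above: Adam's `beta1` is the decay rate of the gradient (first-moment) average and `beta2` of the squared-gradient (second-moment) average. A minimal sketch of sweeping them in PyTorch, with illustrative values:

```python
# Sketch of sweeping Adam's beta values, the knobs discussed above.
# beta1 controls momentum of the gradient average; beta2 controls the
# second-moment (variance) average. Values below are illustrative.
import torch

model = torch.nn.Linear(128, 128)

configs = [
    (0.9, 0.999),   # PyTorch default
    (0.9, 0.9999),  # "large beta2"
    (0.5, 0.9),     # "small beta1 allows small beta2"
]
for beta1, beta2 in configs:
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4, betas=(beta1, beta2))
    # ... run one training trial per config and compare loss curves ...
```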
Humor
- Neel Nanda jokingly asked if 21% don't think someone is a billionaire: @NeelNanda5.
- Vikhyatk joked about moving to SF and finding a room for only $6000/mo: @vikhyatk.
- Swyx updated a meme: @swyx.
AI Reddit Recap
/r/LocalLlama Recap
Theme 1. SpatialLM: LLM for 3D Scene Understanding
- SpatialLM: A large language model designed for spatial understanding (Score: 1033, Comments: 94): SpatialLM is a large language model specifically designed to enhance 3D scene understanding using Llama 1B. The model focuses on improving spatial comprehension, potentially offering advancements in applications that require detailed environmental awareness.
- SpatialLM Capabilities: SpatialLM processes 3D point cloud data to generate structured scene understanding, identifying architectural elements like walls and doors and classifying objects with semantic categories. It works with various data sources, including monocular videos, RGBD images, and LiDAR sensors, making it versatile for applications in robotics and navigation.
- Technical Queries and Clarifications: Discussions raised questions about the classification of SpatialLM as a language model, given its processing of non-human readable data. It was clarified that it outputs structured 3D object graphs, which is a specific form of language, and is based on Llama 1B and Qwen 0.5B.
- Model Performance and Applications: Users expressed amazement at the model's capabilities with only 1.25 billion parameters and discussed potential applications, such as integration with text-to-speech for the visually impaired and use in robot vacuum cleaners. The model's ability to estimate object heights and its potential for integration into reasoning models were also highlighted.
Theme 2. Qwen 3: Modular AI Model Developments
- Qwen 3 is coming soon! (Score: 402, Comments: 97): Qwen 3 is anticipated to be released soon, as indicated by a pull request on the Hugging Face Transformers GitHub repository. The link to the pull request is here.
- Discussion highlights the Qwen 3 MoE model's architecture, particularly its use of 128 experts with 8 activated per token, and the 15B MoE model size, which makes it suitable for CPU inference. Users express hope for larger models, like a potential 30-40B MoE or even a 100-120B MoE, to compete with modern models.
- Several comments delve into the technical details and performance metrics of Qwen 3, with comparisons to other models like Deepseek v3. Active parameters are noted to be 2B, and there's a discussion on the model's potential performance, with references to benchmarks and model equivalence calculations; a back-of-envelope check of the active-parameter figure appears after this list.
- The community is excited about Qwen 3's potential, especially its CPU compatibility and small active parameter size, which reduces computational resource requirements. There's interest in its embedding capabilities and curiosity about its performance in coding tasks, with some users noting the vocab size of 152k and max positional embeddings of 32k.
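As a back-of-envelope check of the ~2B active-parameter figure discussed above: with 8 of 128 experts active per token, only a small slice of a 15B MoE's expert weights runs per forward pass. The expert share below is an assumption, not a spec.

```python
# Back-of-envelope check of the "~2B active parameters" claim for a
# hypothetical 15B-total MoE with 128 experts, 8 active per token.
# All numbers are rough community estimates from the thread, not specs.
total_params = 15e9
n_experts, k_active = 128, 8
expert_share = 0.9            # assume ~90% of params sit in expert FFNs
shared = total_params * (1 - expert_share)
experts = total_params * expert_share
active = shared + experts * (k_active / n_experts)
print(f"active params ≈ {active / 1e9:.2f}B")  # ≈ 2.34B under these assumptions
```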
Theme 3. Docker's Competitive Leap: LLM in Containers
- Docker's response to Ollama (Score: 240, Comments: 136): Docker is introducing a new feature that enables Mac GPU access, allowing users to run models like `mistral/mistral-small` on their machines. This update excites users as it enhances Docker Desktop's capability by allowing containers to utilize the Mac's GPU, as detailed in their official announcement and further discussed in a YouTube video.
- The discussion highlights the use of wrappers like Ollama and llama-swap for managing and running models, with some users criticizing these as unnecessary abstractions over llama.cpp. However, others argue that these tools simplify deployment, especially for those not deeply familiar with technical setups, and offer modularity and ease of use in distributing and hosting models.
- Docker's new feature enabling Mac GPU access is seen as a significant advancement, allowing Mac users to run applications in isolated environments with GPU acceleration. This update is particularly important for those using Apple silicon and is compared to the impact of GitHub Container Registry on Docker Hub, though some users express dissatisfaction with Docker's command-line interface.
- There is a debate over the open-source community's approach, with some users expressing concern about projects like Ollama branding themselves instead of contributing to existing projects like llama.cpp. Others defend the modular approach, emphasizing the importance of simplicity in development and deployment, particularly in the context of AI model hosting and managing dependencies.
Theme 4. Gemma 3, Mistral 24B, and QwQ 32B: Performance Comparison
- Gemma 3 27b vs. Mistral 24b vs. QwQ 32b: I tested on personal benchmark, here's what I found out (Score: 231, Comments: 74): QwQ 32b excels in local LLM coding and reasoning, outperforming Deepseek r1 in some instances and significantly surpassing Gemma 3 27b and Mistral 24b. In mathematics, both Gemma and QwQ handle simple tasks well, with Gemma being faster but having a more restrictive license. Mistral 24b underperforms compared to the others, though it, along with Gemma, offers image support. For further details, refer to the blog post.
- QwQ 32b's Performance and VRAM Requirements: Users confirm that QwQ 32b excels in coding and reasoning tasks, outperforming some cloud models, but note its significant VRAM requirements. This makes it challenging to run on a single GPU even with quantization, limiting its context window size.
- Model Comparisons and Quantization Concerns: There's a need for clarity on Gemma's model type used in comparisons, as well as concerns about quantization settings, particularly for Mistral, which may affect performance. RekaAI_reka-flash-3 and ExaOne Deep are suggested as alternatives for users with limited hardware resources.
- Benchmarking and Use Cases: Suggestions include running models like Gemma, Mistral, and QwQ in an IDE for more practical benchmarks, and testing ExaOne Deep and DeepHermes for comparison. Users also highlight QwQ 32b's strong performance in transcript summarization, occasionally surpassing GPT-4/4.5.
Theme 5. ByteDance's InfiniteYou: Identity-Preserving Image Model
- ByteDance released on HuggingFace an open image model that generates Photo While Preserving Your Identity (Score: 128, Comments: 36): ByteDance has launched InfiniteYou, an image generation model available on HuggingFace that allows for flexible photo recrafting while preserving individual identity. The project features a diverse array of portraits, showcasing individuals in various settings, emphasizing a blend of realism and artistic interpretation. Key resources include the project page, code repository, and the model on HuggingFace.
- Commenters critique the image quality of InfiniteYou, describing it as "rough" and "plastic-y," indicating skepticism about the model's ability to generate realistic images.
- macumazana points out that similar work has been done previously with older models, suggesting that InfiniteYou doesn't offer significant novelty or advancement in the field.
- moofunk suggests a strategic approach by focusing on model strengths and proposing the idea of chaining models to enhance photo generation quality, rather than relying on single-model outputs.
Other AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding
Theme 1. 5 Second Flux Innovation: Nunchaku, InfiniteYou, and Step-Video-TI2V
- 5 Second Flux images - Nunchaku Flux - RTX 3090 (Score: 263, Comments: 66): MIT-Han-Lab has released ComfyUI-nunchaku, a tool for generating Flux images in about 5 seconds. The announcement also mentions the RTX 3090, although no specific details about its role in the project are provided.
- Users expressed skepticism about ComfyUI-nunchaku's output quality, noting that images appear "plastic" and similar to those generated by models like SDXL. Concerns were specifically raised about the artificial appearance of human faces, often featuring cleft chins.
- Nunchaku SVDQuant offers significant performance improvements, reducing model size by 3.6×, memory usage by 3.5×, and achieving 10.1× speedup on NVIDIA RTX 4090 laptops by eliminating CPU offloading. The tool supports lora conversion similar to TensorRT, with detailed setup instructions provided via GitHub and Hugging Face.
- A user shared their experience using the deepcompressor repo for quantizing flux finetunes, encountering challenges with cuda/transformers dependencies and VRAM limitations, suggesting a 24GB VRAM is insufficient. They provided a workaround by renting an A40 GPU and shared steps for potential dependency fixes.
- InfiniteYou from ByteDance new SOTA 0-shot identity preservation based on FLUX - models and code published (Score: 193, Comments: 59): ByteDance has introduced InfiniteYou, a new state-of-the-art zero-shot identity preservation model based on FLUX. The model, alongside its code, has been published, showcasing its ability to enhance identity characteristics in images, as demonstrated in a comparison grid featuring ID Image, PuLID-FLUX, and InfU (Our model), with InfU showing advanced rendering and fidelity in identity preservation.
- Discussion around Flux's identity preservation reveals mixed opinions: while some users note that the model effectively adheres to prompts and maintains facial details, others criticize it for not accurately replicating input features like eye color and hair, as well as the "Flux chin" issue. ByteDance's InfiniteYou is viewed as a significant step forward, though its realism is questioned by some users.
- Hugging Face is a focal point for the model's availability, with users eager to see its integration into ComfyUI workflows. There is a demand for better handling of features like freckles, scars, and tattoos, which are seen as essential for high-quality facial replicas.
- Users express impatience with the current Flux model's aesthetic and predict a shift once a new open model becomes available. ByteDance's approach focuses on research and methodology rather than aesthetics, which some users find lacking in terms of practical, photorealistic application.
- Step-Video-TI2V - a 30B parameter (!) text-guided image-to-video model, released (Score: 119, Comments: 61): Step-Video-TI2V is a newly launched 30 billion parameter model that facilitates text-guided image-to-video conversion. This release marks a significant advancement in AI-driven video generation.
- Model Size and Performance: The Step-Video-TI2V model, with its 30 billion parameters and 59GB weights, is seen as a significant advancement, though its local usage is challenged by high VRAM requirements (up to 70GB for 720p videos). Users discuss the impracticality of its current resource demands, jokingly suggesting the need for a kidney to run it locally.
- Chinese AI Development: There is a perception that China is advancing rapidly in the AI sector, with multiple video models emerging consecutively, while the US and EU lag behind. Some users note that although China is producing these models, they do not always provide the best quality outputs, as seen with Yuewen's implementation.
- Quality and Compression Concerns: Users express concerns about the compression techniques used in the model, which result in a loss of detail despite the model's large size. The model's reliance on 16x spatial and 8x temporal compression is criticized for hindering its ability to generate fine details, leading to glitches and subpar results in video outputs.
Theme 2. Text-to-Video AI Advancements: From Open-Source Initiatives
- Remade is open sourcing all their Wan LoRAs on Hugging Face under the Apache 2.0 license (Score: 171, Comments: 21): Remade is open sourcing all their Wan LoRAs on Hugging Face under the Apache 2.0 license, allowing for broader access and use within the AI community.
- Some users, like Weird_With_A_Beard and Mrwhatever79, are enthusiastic about the Wan LoRAs, expressing gratitude and enjoyment in using them for video generation. However, others are skeptical about the claim of open-sourcing, highlighting that LoRAs generally don't have licenses and questioning the authenticity of the open-source claim due to premium services offered via a Discord server.
- LindaSawzRH and hurrdurrimanaccount criticize the open-source claim, arguing that the LoRAs are not truly open-source if the training data and processes are not provided, and access is behind a paywall. They express concerns about the precedent this sets for the community, with hurrdurrimanaccount questioning whether datasets are being shared.
- Ballz0fSteel shows interest in a tutorial for training Wan LoRAs, but LindaSawzRH suggests that access to such information might require payment, further fueling the discussion about the transparency and accessibility of the resources.
- Wan I2V - start-end frame experimental support (Score: 160, Comments: 21): Wan I2V introduces experimental support for start-end frames, enhancing its capabilities in video processing. This update is likely to improve the precision and efficiency of video frame analysis.
- WanVideoWrapper Update: The WanVideoWrapper by Kijai received an update for experimental start-end frame support, previously available in raindrop313's repository. This improvement allows the introduction of new objects in scenes which were difficult to prompt before, although some issues like missing elements and color shifts persist, which can be mitigated by adjusting parameters such as caching and resolution.
- Community Excitement and Testing: Users expressed enthusiasm about the update, with some already testing it with Kija nodes and reporting positive results. The feature is seen as a potential game-changer for scripted storytelling, offering more reliability than previous versions.
- Open Source and Collaboration: The community appreciates the open-source nature of the project, highlighting contributions from various developers like raindrop313 and expressing gratitude for the collaborative efforts that led to these advancements.
Theme 3. Critique of LLM Evaluation Methods: Simplification & Blame
- Shots Fired (Score: 1372, Comments: 284): Critics argue that LLM intelligence tests are often unevaluative, implying that they fail to accurately measure or reflect the true capabilities and intelligence of large language models. This criticism suggests a need for more rigorous and meaningful evaluation methods to assess AI performance.
- Yann LeCun's Perspective: Yann LeCun is discussed extensively, with many agreeing that LLMs alone won't lead to AGI. LeCun emphasizes the need for new AI architectures beyond LLMs, as presented in his speech at the NVDA conference, and is recognized for his significant contributions to AI, particularly in deep learning and CNNs.
- Limitations of LLMs: Several commenters argue that LLMs are limited in achieving AGI due to their architecture, which lacks the ability to learn and adapt like human intelligence. Virtual Intelligence (VI) is suggested as a more appropriate term for current AI capabilities, emphasizing utility over consciousness or self-awareness.
- Current AI Utility and Misconceptions: There is a consensus that while LLMs are not useless, they are tools that require proper use and understanding. Some express skepticism about the AI hype, noting that tools like Claude have improved and can enhance productivity, but they do not replace human jobs or achieve independent reasoning.
- After giving me a puzzle I couldn’t solve I asked for one simpler (Score: 529, Comments: 113): The post discusses a ChatGPT interaction where the user requested a simpler puzzle after being unable to solve "The Three Chests" puzzle. The AI responded without sarcasm, implying it genuinely believed the user needed an easier challenge, highlighting potential limitations in understanding user intent or context.
- Logical Reasoning and Puzzle Analysis: Claude's analysis of "The Three Chests" puzzle demonstrates a classic logical reasoning approach, questioning the accuracy of labels and considering potential twists like incorrect labels. The discussion highlights the need to consider whether all labels are incorrect, which would lead to choosing the chest labeled "Gold" after testing the "Silver" chest first.
- Humor and Sarcasm: Several commenters, like EuphoricDissonance and Careless_General5380, use humor to engage with the topic, joking about the treasure being "love" or "friends made along the way." This reflects the light-hearted nature of the discussion around the puzzle's simplicity and the AI's response.
- Puzzle Constraints and Solutions: Toeffli points out a missing element in the puzzle regarding truth-telling and lying notes, which affects determining the treasure's location. Professional_Text_11 and others note the absence of a rule against opening all three chests, suggesting a straightforward solution that bypasses the intended puzzle logic.
Theme 4. AI-Generated Satire and Historical Reconstructions
- Doge The Builder – Can He Break It? (Score: 183, Comments: 24): Doge The Builder satirizes Elon Musk and Dogecoin by comparing them to "Bob the Builder," highlighting themes of greed, economic chaos, and unchecked capitalism. The post humorously references a fictional licensing by the Department of Automated Truth (DOAT) and suggests a YouTube link for viewing in the comments.
- AI's Role: Commenters express admiration for the capabilities of AI in creating content like "Doge The Builder," highlighting its impressive nature in the current age.
- Cultural Impact: Discussions touch on the influence of individuals like Elon Musk on society's zeitgeist, questioning the morality of amassing wealth and its implications on civilization.
- Creation Curiosity: There is curiosity about the process of creating satirical content, with inquiries on how such pieces are made.
- Made this in 5 minutes. We're going to need some good AI detection soon... (Score: 13355, Comments: 552): The post highlights the urgent need for improved AI detection technologies, specifically in the context of rapidly produced AI-generated videos. The author underscores the ease with which such content can be created, implying potential challenges in distinguishing authentic videos from AI-generated ones.
- Concerns about the authenticity of AI-generated videos are prevalent, with users like YoshiTheDog420 expressing skepticism about ever having reliable AI detection tools. They fear that visual evidence could become unreliable, with any footage potentially dismissed as AI-generated, undermining trust in media.
- The discussion highlights the ease with which people can be fooled by AI-generated content, as Rude_Adeptness_8772 suggests a significant portion of elderly individuals might perceive such videos as genuine. Visarar_01 shares an anecdote about a family member being deceived by an AI video, illustrating the potential for misinformation.
- Some commenters, like ProfessionalCreme119, propose solutions such as integrating AI detection tools into devices to identify AI-generated videos, suggesting a need for widespread implementation of detection mechanisms. Others, like Soft-Community-8627, warn about the potential misuse of AI to fabricate events, which could be leveraged by governments to manipulate public perception.
Theme 5. AI Art and Workflow Transparency Debates
- Can we start banning people showcasing their work without any workflow details/tools used? (Score: 265, Comments: 56): The post suggests banning art posts that do not include workflow details or tools used, arguing that without such information, these posts function merely as advertisements. The author calls for a change to ensure contributions are informative and beneficial to the community.
- Many users, including Altruistic-Mix-7277 and GravitationalGrapple, argue against banning posts without workflow details, suggesting that the subreddit serves both as a gallery and a resource for learning. They emphasize the importance of open-ended discussion and the ability to ask questions directly in comments for additional details.
- Lishtenbird highlights the ongoing issue of "no workflow" posts, noting the disparity in engagement between detailed guides and flashy, low-effort content. They suggest implementing an auto-mod comment system to ensure that at least some information, like prompts, is shared, although this would require additional resources to implement.
- ByWillAlone and wonderflex discuss the subreddit’s dual nature as both an art showcase and a learning platform. They propose the idea of creating a separate space, like r/aiartschool, dedicated to in-depth tutorials and high-effort content, while maintaining the voting system to naturally filter content quality.
- This guy released a massive ComfyUI workflow for morphing AI textures... it's really impressive (TextureFlow) (Score: 105, Comments: 11): A massive ComfyUI workflow called TextureFlow has been released for generating and morphing AI textures. The release is notable for its impressive capabilities in AI texture manipulation.
- TextureFlow is available via a direct link to the workflow JSON on GitHub. Users are exploring its capabilities for AI texture manipulation and generation.
- Users like Parulanihon are experimenting with TextureFlow for logo creation, recommending a denoising level of 0.3 or 0.4 max. However, challenges include achieving transparent backgrounds and aligning with outdated YouTube tutorials, necessitating a mix-and-match approach to achieve desired results.
- No-Mistake8127 is using TextureFlow to create animated artwork for a custom Raspberry Pi driven digital frame, highlighting its ability to handle inputs such as video, text prompts, photos, movement, and controlnets.
AI Discord Recap
A summary of Summaries of Summaries by o1-2024-12-17
Theme 1. Pricing Showdowns and Censorship Woes
- Cursor Burns Wallets: Users rage over charges for connection errors and lost premium requests when downgrading plans. One member quipped "Normal agent is a shit show without max" and opted out due to cost inefficiencies.
- OpenAI’s o1 Pro Overheats: Developers call o1 Pro a "monumentally overpriced" model, preferring Claude or cheaper alternatives like DeepSeek. Some joked that o1 Pro costs $30 per full send, making it a luxury few can afford.
- Pear vs. Cursor Price War: Some note that Pear is cheaper but “can’t code worth a damn” and relies on roo code for file changes. Others warn that if Cursor’s pricing and context limits don’t improve, they might jump ship.
Theme 2. Model Upgrades and Debates
- Claude 3.7 Provokes Passion: Some swear 3.7 is "better for going the extra mile," while others say 3.5 is more accurate. The community agrees "no single hammer is better for every job," reflecting a divide over performance quirks.
- Qwen 3 Draws Crowds: People excitedly track news that Qwen 3 is imminent, following the recent Qwen 2.5 Omni release. Leaked hints suggest it might challenge top-tier models like GPT-4.5.
- Sora Falls Short: Despite big teasers, this public release underwhelmed users who found it inferior to Keling AI and Hailuo AI. Critics suspect "turbo version" hype overshadowed real performance limitations.
Theme 3. Fine-Tuning Adventures and VRAM Tussles
- Gemma 3 Keeps Breaking: Missing dependencies and `--no-deps` bugs stumped users trying older Colab notebooks. One dev lamented "Why does Llama fail here, but works fine in my other environment?"
- QLoRA Slays Memory Woes: Turning on QLoRA instantly cut VRAM usage, letting Gemma 3 run on smaller hardware. Loading in 4-bit mode helped avoid out-of-memory crashes.
- DeepHermes 24B Overflows VRAM: People face OOM errors running 24B on multi-GPU rigs, even with minimal context. Suggestions include 8-bit versions or tuning multi-GPU setups with flags like `--tensor-split`.
Theme 4. New Tools, Agents, and RAG
- Oblix Orchestrates Edge vs. Cloud: A slick demo shows agents juggling local and remote LLMs for cost/performance trade-offs. The system decides whether to run queries on hardware like Ollama or farm them out to OpenAI.
- Local RAG App Wows Coders: A fully local retrieval-augmented generation tool chats with code using GitIngest for parsing and Streamlit for UI. It runs Meta’s Llama 3.2 locally through Ollama, delighting developers seeking offline solutions.
- Semantic Workbench Rides In: Microsoft’s new VS Code extension prototypes multi-agentic systems in one place. Users wonder if it doubles as an MCP framework or stays primarily a dev tool.
Theme 5. Tokenizer Tricks, Synthetic Data, and Hardware Upgrades
- SuperBPE Shrinks Sequences: A newly minted superword tokenizer cuts sequence lengths by 33% at a fixed 200k vocab. Tests show an 8% MMLU boost and 27% faster inference compared to standard BPE; a toy illustration of the idea appears after this list.
- Synthetic Data Reigns: Researchers highlight filtering, augmentation, and generation as a way to “reject data we already predict well.” Open-source labs like Bespoke promise fresh synthetic pipelines for targeted fine-tuning.
- Nvidia’s Blackwell Sparks Skepticism: Next-gen RTX Pro cards tout up to 96GB of VRAM but threaten to worsen GPU supply shortages. Enthusiasts doubt Nvidia’s claim that “we’ll fix availability by May/June.”
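The core SuperBPE idea is that standard BPE pre-tokenizes on whitespace, so merges can never cross word boundaries; relaxing that constraint lets frequent multi-word strings become single "superword" tokens. The toy sketch below illustrates the effect with the Hugging Face `tokenizers` library; the paper's actual recipe is a two-stage curriculum, which this does not reproduce.

```python
# Conceptual sketch of the SuperBPE idea: ordinary BPE pre-tokenizes on
# whitespace, so merges never cross word boundaries; removing that
# constraint lets frequent multi-word strings become single "superwords".
# A toy illustration only, not the paper's training recipe.
from tokenizers import Tokenizer, models, trainers, pre_tokenizers

corpus = ["by the way, the model is by the way quite fast"] * 100

def train(use_whitespace_pretok: bool) -> Tokenizer:
    tok = Tokenizer(models.BPE())
    if use_whitespace_pretok:
        tok.pre_tokenizer = pre_tokenizers.Whitespace()
    trainer = trainers.BpeTrainer(vocab_size=200)
    tok.train_from_iterator(corpus, trainer)
    return tok

for ws in (True, False):
    n = len(train(ws).encode("by the way").tokens)
    # Expect fewer tokens once merges may span word boundaries.
    print(f"whitespace pre-tok={ws}: {n} tokens for 'by the way'")
```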
PART 1: High level Discord summaries
Cursor Community Discord
- Cursor's Pricing Proves Punitive: Users express frustration with Cursor's pricing model, citing charges for connection errors, resumed requests, and 'tool charges for no responses', with some reporting lost premium requests after downgrading plans.
- Some users say the 'Normal agent is a shit show without max' yet find it 'quicker than spending real $ on max', opting out of premium due to perceived cost inefficiencies.
- Claude 3.7 Causing Consternation: Members report issues with Claude 3.7's performance in Cursor, claiming false assumptions and decreased reliability compared to Claude 3.5, with some having the opposite experience.
- Opinions vary, with one user stating '3.7 is better for going the extra mile. 3.5 is better for accuracy', while another notes 'There’s no single hammer that’s better for every job'.
- Pear's Potential Prompts Pricey Problems: Users compare Pear AI to Cursor, noting Pear’s cheaper pricing but raising concerns about its reliance on roo code and its per-file change acceptance workflow, while others say Pear 'can’t code worth a damn'.
- Some Cursor users, like one who said 'I don't like pear AI that much, mainly cause they use roo code and roo code is not that stable', are considering switching if Cursor doesn't improve its context window or pricing.
- React Racketeering Raises Rivalries: The channel debates the merits of React versus Svelte for a SaaS app, with some preferring React for its large community and compatibility with Cloudflare Pages, while others find it slow and messy, advocating for Svelte.
- The user base seems pretty split, with arguments ranging from 'react is slow af' to 'svelte also doesn't need workarounds'.
- Vibe Visions Vary Wildly: Members debated the usefulness of vibe coding, with some calling it a marketing ploy and a crock, while others argued that it is a real thing requiring technical expertise, like a basic knowledge of Git.
- Despite varying definitions, a consensus emerged that successful 'vibing' requires critical thinking, debugging skills, and the ability to steer AI tools effectively.
Unsloth AI (Daniel Han) Discord
- Gemma 3 gets Dependency Glitch: Gemma 3 has a bug with `--no-deps` causing missing dependencies in older notebooks, and a Google Colab with a 2018 GPU might be too outdated for some tasks, according to this discussion.
- A user encountered issues with Llama failing in a Gemma-specific environment, but the same notebook failed on Google Colab due to missing dependencies, according to this notebook.
- Vision Fine-Tuning still on Unsloth's backburner: Despite Gemma 3 supporting images, vision fine-tuning is not yet supported on Unsloth, according to this issue.
- A user attempted to fine-tune Gemma 3 using Llama code, which failed, but they still wanted to know if the model would run images after fine-tuning text only.
- QLoRA to the Rescue for Gemma 3: Users encountered memory errors when running the Gemma 3 model, but enabling QLoRA resolved the issue, likely due to reduced VRAM usage as mentioned here; a loading sketch appears at the end of this section.
- Turning on QLoRA automatically sets `load_in_4bit = True`, which helps to reduce VRAM usage.
- Community Seeks Synthetic Data Nirvana: Members discussed tools for synthetic data generation, with one user recommending Bespoke Labs due to its extensive features, and confirmed it's open source with a dedicated Discord server.
- One user inquired about the availability of example notebooks or Colabs demonstrating the implementation of GRPO with vision models, but such an example was currently lacking, but is planned for the future.
- DPO Trainer gets an Upgrade: A user shared their experience upgrading to the latest DPO Trainer with the latest Unsloth and Unsloth Zoo, providing a link to their small diff for others facing similar challenges.
- The user also found the Zephyr (7B)-DPO notebook confusing and suggested updating it via a pull request to the Unsloth notebooks repository.
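As referenced in the QLoRA item above, here is a minimal loading sketch in Unsloth's API style. The repo id is an assumption; substitute whatever Gemma 3 checkpoint you actually use.

```python
# Hedged sketch of the QLoRA fix discussed above, in Unsloth's API style:
# load_in_4bit=True quantizes the frozen base weights so the model fits in
# far less VRAM, while LoRA adapters hold the trainable parameters.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it",  # assumed repo id; swap in your checkpoint
    max_seq_length=2048,
    load_in_4bit=True,  # the setting QLoRA turns on automatically
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```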
OpenAI Discord
- OpenAI's o1 Pro Pricing Sparks Outrage: Users are unhappy with OpenAI's API pricing for the o1 Pro model, calling it severely overpriced, and preferring Claude.
- Some joked about OpenAI's pricing strategy, and observed that DeepSeek offers comparable performance at a fraction of the cost, according to shared charts.
- Debate Surrounds o1 Architecture: Discord users are debating if OpenAI's o1 model is based on GPT-4o, with conflicting claims about its architecture.
- Arguments focus on knowledge cutoff dates; some think o1 is just gpt4o with reasoning.
- Perplexity Desktop App Boosts Loyalty: Perplexity is rewarding desktop app users with a free month of Pro after 7 days of use.
- The reward is limited to the Windows app, excluding macOS, iOS, Android, and Web users.
- GPT Pro Subscription Woes Plague Users: Users reported paying for GPT Pro but not getting subscription access, and expressed frustration over OpenAI support's unresponsiveness.
- Affected users were directed to help.openai.com for support, with assurances that the channel cannot assist with billing matters.
- Structured Output Hinders AI Reasoning: Members tested if the phrase "No other keys or commentary are allowed" reduces reasoning capabilities in structured output, and discovered an adverse effect, along with increased token usage.
- Results suggest that models are overthinking ethical implications under these conditions.
LM Studio Discord
- LM Studio API Courts RAG Integration: Users are eyeing the potential of RAG (Retrieval-Augmented Generation) integration with the LM Studio server API, similar to Ollama and Qdrant.
- While the GUI fetches only the top 3 vectors, the API could enable customized implementations with embeddings and vector databases, according to one user; a minimal sketch appears at the end of this section.
- ZeroGPU Pro Users Bump into Quota Walls: A ZeroGPU Pro user hit their GPU quota despite upgrading, possibly because they were using a FastAPI backend instead of a Gradio UI.
- They are seeking advice on resolving the quota issue when calling the ZeroGPU Pro API from their own application.
- LM Studio Inspires Browser Extension Ideas: Potential browser extensions for LM Studio are being discussed, including webpage translation using Gemma 3 27b and YouTube video summarization.
- One member suggested extensions to summarize YouTube videos by extracting and summarizing subtitles, while feasibility of real-time webpage translation was debated due to speed constraints.
- Audio Model Alchemists Brew with PyTorch: A member is experimenting with pretraining an audio model from scratch using PyTorch and a transformer architecture, aiming to generate proper audio from tokens.
- Another member shared their model's song outputs based on names (e.g., abba.mp3, mj.mp3) and suggested fine-tuning or uploading the model to Hugging Face for broader experimentation.
- RX 9070 owners report slow speeds: Several users with the new RX 9070 cards are reporting slower inference speeds compared to older cards, with one user reporting their speeds dropped from 5-7 tok/s to around 3 tok/s with a Granite 3.1 8B Q8_0 model.
- The performance issues are suspected to stem from bugs in AMD's Vulkan drivers.
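As referenced in the RAG item above, here is a minimal sketch of custom retrieval against LM Studio's OpenAI-compatible server (default `http://localhost:1234/v1`). Model names are placeholders for whatever you have loaded; a real setup would swap the in-memory cosine search for a vector database like Qdrant.

```python
# Minimal local-RAG sketch against LM Studio's OpenAI-compatible server.
# Model names below are placeholders for whatever is loaded in LM Studio.
import numpy as np
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

docs = ["LM Studio serves an OpenAI-compatible API.",
        "Qdrant is a vector database.",
        "Gemma 3 27B can translate web pages."]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-model", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(docs)
query = "Which tool stores vectors?"
q = embed([query])[0]
scores = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
context = docs[int(scores.argmax())]  # naive top-1 retrieval

answer = client.chat.completions.create(
    model="local-model",  # placeholder for your loaded chat model
    messages=[{"role": "user",
               "content": f"Answer using this context:\n{context}\n\nQ: {query}"}],
)
print(answer.choices[0].message.content)
```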
aider (Paul Gauthier) Discord
- Claude Code Copies Aider Web Search: A user observed that Claude code is implementing web search in a similar fashion to Aider, which was demonstrated in a post on X.
- It was clarified that the new Claude web search feature is currently exclusive to Claude Desktop.
- Aider's Commit Flag Triggers Hook Headaches: Aider adds the `--no-verify` flag during commits, bypassing system hooks, according to aider/repo.py code.
- The maintainer explained that this is because commit hooks could cause arbitrarily strange things to happen, suggesting the use of lint and test hooks as a workaround.
- o1-pro API Costs Price Users Out: Users trying o1-pro via the API reported exorbitant costs of $30 per full send, rendering it prohibitive.
- The high cost spurred discussions on caching mechanisms, with speculation on whether OpenAI's automatic prompt caching could help mitigate expenses.
- Pipx Package Installation Woes on Ubuntu: A user encountered difficulties installing Aider for all users on Ubuntu, despite advice to use `sudo pipx install --global aider-chat`.
- They eventually succeeded by installing with uv at `/usr/local/bin` after overcoming pip and version conflict issues.
- Aider's Auto-Fixing Needs Manual Prompting: A user reported that Aider needs manual prompts such as "fix the tests" after each failure, despite having enabled the `--auto-test` parameter, referencing the documentation here.
- Aider should automatically fix test failures when configured with the `--auto-test` setting.
Perplexity AI Discord
- Deep Research Limits Debated Fiercely: Users are debating Deep Research usage limits, referencing the Perplexity blog stating unlimited access for Pro, while others cite a 500 queries per day limit.
- A member pointed to a tweet by Aravind Srinivas indicating Paid users only need to pay $20/mo to access an expert level researcher on any topic for 500 daily queries.
- GPT 4.5's Disappearance Creates Confusion: Users report GPT 4.5 is missing from Perplexity Pro, with some suggesting the model was removed after gaining new subscribers.
- Some users lauded 4.5 as SOTA for writing text while others deemed it slow and uninsightful, creating uncertainty among the user base.
- Perplexity Users Frustrated by Auto Model Switching Glitch: Users are experiencing a glitch where Perplexity reverts to the Auto model, even after selecting a specific model like Claude.
- This issue requires users to manually reselect their preferred model, leading to frustration, especially among those who favor Claude over R1.
- API Key Spend Tracking Feature Requested: A feature request was submitted to GitHub to allow users to name API keys for better spend tracking.
- Currently, users can track spend by API key, but lack the ability to assign names, hindering efficient management of API usage costs.
- R1-1776 Finetuning Faces Censorship Scrutiny: An independent researcher found canned CCP answers and censored content in R1-1776-671B and the distilled R1-1776-70B when prompted on topics like Tiananmen Square, documented in this blog post.
- The researchers raised concerns regarding political bias and content filtering in the open-source weights of the model.
Interconnects (Nathan Lambert) Discord
- Claude Unleashes Web Search: Web search is now available in claude.ai, enabling Claude to finally search the internet and deliver true positives for research queries, confirmed in this tweet.
- It was later confirmed the search engine being used by Claude is Brave.
- Midjourney Lead Swaps Beauty for Code: After 3 years leading model development at Midjourney, a key member joined Cursor to work on coding agents, marking a shift from a focus on beauty and creativity to code, as noted in this tweet.
- The move signals a growing emphasis on practical AI applications in coding environments.
- InternVL's Training Code Opens Up: Members expressed surprise that InternVL has open source training code, making it one of the few notable models with open training pipelines, with InternVL's packing implementation provided as an example of the dataloading approach.
- The open-source nature of InternVL allows the community to inspect the data loading process and dataset iteration.
- SuperBPE Tokenizer boosts efficiency: SuperBPE, a new superword tokenizer that includes tokens spanning multiple words, created a model that consistently outperforms the BPE baseline on 30 downstream tasks (+8% MMLU), while being 27% more efficient at inference time, described in this tweet.
- At a fixed vocab size of 200k, SuperBPE reduces sequence length by 33% on average.
- Smaller Models Benefit from Synthetic Augmentation: Members discussed whether smaller datasets are a new trend, with larger models like GPT-4.5 potentially needing more data, especially during various post-training stages and the conversation touched on the use of synthetic data to augment smaller datasets for training smaller models.
- The conversation suggested a trade-off between data size, model size, and the use of synthetically generated data, implying a strategy where smaller models might rely more on enhanced datasets, while larger models can effectively utilize larger volumes of raw data.
LMArena Discord
- Claude Gets Overrated, Grok3 Still King?: Community members suggest Claude is overrated in coding due to limited evaluations beyond SWE-bench, hinting it doesn't match Grok3 on livecodebench.
- The ratings may be skewed by non-developers, leading to inaccurate assessments of its true capabilities.
- Gemma Gets Glowing Review: Members were amazed by Gemma3's 1340 score and its relatively small 27B parameter size.
- One member described Gemma's responses as autistic, giving very brief answers, often when a much longer one is warranted.
- Deepseek R1 Hogging VRAM: Deepseek R1 requires around 1000GB of VRAM, with one user deploying it on 8xH200s.
- Despite high VRAM usage, there are claims that Deepseek R1 exhibits baked-in PRO CHINA biases, raising concerns about its use, with one user saying tldr deepseek is #&&@% don't recommend using it.
- Qwen 3 Coming Soon, Qwen 2.5 Omni Announced: Reports indicate that Qwen 3 is coming soon, confirmed by a post on the Hugging Face Transformer repository.
- This news follows the announcement of Qwen 2.5 Omni, sparking interest and anticipation within the community, as noted in a Tweet from Lincoln 🇿🇦.
- Sora's Turbo Version Struggles, Hype Not Matching Reality: Users found Sora's public release underwhelming compared to its promotional materials, and possibly inferior to competitors like Keling AI and Hailuo AI.
- It's suspected that OpenAI spent hours of compute generating the promotional clips, and that the released Sora is the turbo version.
Notebook LM Discord
- NLM's Podcast Feature Gets Mixed Reactions: Users are reporting positive experiences with NotebookLM's Podcast feature, though some find that the AI cuts them short during discussions.
- One user likened the experience to being part of a radio show where I can talk to hosts, but felt like a third wheel because the AI would revert to its own script.
- Gemini 1.5 Pro Powers NotebookLM: Users discuss the underlying model of NotebookLM, with speculation pointing towards Gemini 1.5 Pro, while others suggest Gemini 2.0.
- The discussion underscores the importance of NotebookLM staying grounded in its sources, a key differentiator from Gemini.
- Users Seek Streamlined PDF Processing: A user is seeking a more efficient workflow for scanning physical papers into private online storage and making them searchable via natural language queries. They ask whether taking photos with an iPhone and sending them to NLM for automatic naming and OCR would be more efficient.
- The current manual process involves scanning to PDF, sending to Gmail, manually naming each file, OCR processing, and importing into NotebookLM.
- AI Avatar Lip Sync Services Face Off: Members compared lip syncing services for AI avatars, noting that Hedra is great but pricey.
- RunwayLM garnered less favorable feedback.
- Mind Map Feature Slowly Unveiled: The Mind Map feature rollout is proceeding slowly, with many users, including Plus subscribers, not yet seeing it in their accounts.
- Staff confirmed it will take a few days for all users to get it.
Nous Research AI Discord
- Nvidia Blackwell RTX Pro Sparks Supply Chain Concerns: Nvidia launched the Blackwell RTX Pro series for various platforms, potentially squeezing the already tight Blackwell GPU supply.
- While Nvidia anticipates improved GPU availability by May/June, skepticism persists among community members.
- Dataset Evaluation & Augmentation Proves Paramount: Discussions highlighted dataset evaluation, augmentation, sorting, and categorization as effective methods for using GPU hours, with a suggestion to filter data using a small model.
- A member noted the potential of using a small model to reject data, describing the area as "underexplored in public" and cited Predictive Data Selection and Programming Every Example.
- DeepHermes 24B Stumbles on Multi-GPU Setup: A user encountered Out-of-Memory (OOM) errors running DeepHermes 24B on a 5x 3090 setup using `llama.cpp`, even with minimal context settings.
- Suggested solutions involved using the 8-bit version, and verifying multi-GPU configurations with the `--device`, `--split-mode`, and `--tensor-split` flags.
- Hermes 3 Powers Up with Llama 3.2: Nous Research released Hermes 3 3B, a new addition to the Hermes LLM series, detailed in the Hermes 3 Technical Report.
- This model features advanced agentic capabilities, improved roleplaying, reasoning, multi-turn conversation, and long context coherence over Hermes 2.
- C# Developer Champions Anthropic LLMs: A developer offered their C# expertise and professional LLM experience to the community, highlighting their work on documentation and examples for Anthropic.
- They cited examples such as a Titanfall 2-based generator and the Bladewolf example from Metal Gear Rising, accessible on the Anthropic GitHub.
HuggingFace Discord
- Hugging Face APIs Suffer 404 Meltdown: Multiple Hugging Face API models experienced widespread 404 errors, causing significant downtime for dependent applications.
- Users reported the outage lasted almost a whole day without official acknowledgement, urging the HF dev team for immediate attention.
- Roblox's Voice Safety Classifier Speaks Up: Roblox released a large classification model trained on 2,374 hours of real-world voice chat to detect toxicity.
- The model outputs a tensor with labels like `Profanity`, `DatingAndSexting`, `Racist`, `Bullying`, `Other`, and `NoViolation`, and uses a synthetic data pipeline detailed in this blog post.
- Fuse GPU VRAM via Tensor Tricks: Users explored techniques to combine VRAM from multiple GPUs, like running Gemma3-12B on an A2000 12GB and a 1060 6GB using tensor parallelism (a device-map sketch appears at the end of this section).
- References were made to Ollama issues on GitHub and llama.cpp discussions for more on multi-GPU support.
- Oblix Platform Juggle AI on Cloud and Device: The Oblix.ai platform intelligently routes AI tasks to cloud or edge based on complexity, latency requirements, and cost considerations, using autonomous agents for optimal performance.
- A YouTube video demonstrates how Oblix dynamically decides whether to process each AI request locally or in the cloud.
- Gradio Upgrade Unwraps Dataframe Feature: A user reported that upgrading to Gradio 5.22 caused the `gr.Dataframe(wrap=True)` feature to stop working; the wrapping feature was only functioning in Gradio 5.20.
- No further information about this issue was provided.
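As referenced in the VRAM-fusing item above, here is a hedged sketch of splitting one model's weights across two mismatched GPUs with Accelerate's `device_map`. This is layer-level sharding rather than true tensor parallelism, and the checkpoint and memory caps are illustrative stand-ins.

```python
# Hedged sketch: shard one model's weights across two mismatched GPUs with
# Accelerate's device_map (layer-level splitting, not true tensor
# parallelism). The model id is a stand-in; the thread's Gemma3-12B would
# load similarly, though it may need a recent transformers version.
# Memory caps approximate an A2000 12GB + 1060 6GB with some headroom.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",                   # let Accelerate place layers per GPU
    max_memory={0: "11GiB", 1: "5GiB"},  # cap per-device usage
)
tok = AutoTokenizer.from_pretrained(model_id)
inputs = tok("Hello from two GPUs:", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```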
MCP (Glama) Discord
- Microsoft Intros Semantic Workbench: Microsoft launched the Semantic Workbench, a VS Code extension, which is a tool to prototype intelligent assistants, agents, and multi-agentic systems, prompting questions about its role as an MCP.
- A member specifically inquired if the tool functions as an MCP.
- MySQL Server Bombs Out: A user is encountering issues connecting mcp-mysql-server to Docker MySQL, reporting connection failures despite it working outside of MCP.
- The error occurs with every connection attempt, creating a significant development hurdle.
- Glama API 500 Error: A user reported receiving a 500 error from the Glama API, but another member stated that there have been no outages in the last 24 hours, and shared a code sample.
- The code to reproduce is `curl -X 'GET' 'https://glama.ai/api/mcp/v1/servers?first=10&query=github' -H 'accept: application/json'`.
- DaVinci Resolve MCP Seeks Speedy Server Claim: A user is seeking to resubmit their DaVinci Resolve MCP project with a license and updates and was told claiming the server might speed up the update process.
- The project's repo hosts the relevant code.
- Calendar Scheduling Gets Automated: A blog post detailed the use of Asana MCP and Google Calendar MCP with Goose to automate task scheduling.
- Tasks are pulled from Asana, analyzed, and scheduled in Google Calendar with a single prompt.
OpenRouter (Alex Atallah) Discord
- OpenRouter Eyes TTS, Image Gen Rollout: Members expressed interest in OpenRouter offering TTS and image generation, with some voicing concerns about potentially high pricing.
- Pricing details and release dates for the new features are still under wraps.
- Groq Hits Speed Bump, Not Sambanova: A member reported that Sambanova was down, but quickly corrected the statement, clarifying that it was Groq that was experiencing issues.
- Service status updates for Groq were not immediately available.
- GPT-4o Lands on OpenRouter: GPT-4o-64k-output-alpha is now available on OpenRouter, supporting both text and image inputs with text outputs.
- The pricing is set at $6/M input tokens and $18/M output tokens.
- Fireworks Heats Up Pricing War: Fireworks slashed pricing for R1 and V3, with V3 allegedly matching existing performance, priced at $0.90 per million tokens for both input and output.
- The move intensifies competition in the generative AI service market; more information can be found on the Fireworks pricing page.
GPU MODE Discord
- Nvidia Talk Eyes Pythonic CUTLASS: GTC attendees will hear about the Pythonic future of CUTLASS in its next major 4.0 release, including its integration into Python.
- Previously, a member announced their GTC presentation titled Performance-Optimized CUDA Kernels for Inference With Small Transformer Models [S73168] happening today at 4pm, focused on Hopper architecture.
- BFloat16 Atomic Addition Sucks: A member reported that using `tl.atomic_cas` with a lock for atomic addition with bfloat16 actually works, but it sucks.
- The member is seeking improvements to the implementation and offered a code snippet using `tl.atomic_cas` with a lock, inviting the community to enhance its performance; a sketch of the pattern appears below.
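The member's snippet wasn't captured in the summary; the following is a minimal sketch of the described pattern (a spin lock via `tl.atomic_cas` guarding a read-modify-write on a bfloat16 accumulator), not their code. Serializing every block through one lock is exactly why it "sucks"; accumulating in fp32 and casting once at the end avoids the lock entirely.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def bf16_locked_add(x_ptr, out_ptr, lock_ptr, n, BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    vals = tl.load(x_ptr + offs, mask=offs < n, other=0.0).to(tl.float32)
    partial = tl.sum(vals, axis=0)
    # Spin until the lock flips from 0 (free) to 1 (held by this program).
    while tl.atomic_cas(lock_ptr, 0, 1) == 1:
        pass
    acc = tl.load(out_ptr).to(tl.float32)               # critical section:
    tl.store(out_ptr, (acc + partial).to(tl.bfloat16))  # read-modify-write
    tl.atomic_xchg(lock_ptr, 0)                         # release the lock

x = torch.randn(4096, device="cuda", dtype=torch.bfloat16)
out = torch.zeros(1, device="cuda", dtype=torch.bfloat16)
lock = torch.zeros(1, device="cuda", dtype=torch.int32)
bf16_locked_add[(4,)](x, out, lock, x.numel(), BLOCK=1024)
```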
- Triton's Simplicity Entices GPU Newbies: A member highlighted that Triton's key strength lies not in peak performance, but in its accessibility, enabling individuals with limited GPU experience to create complex kernels, and pointed to lucidrains/native-sparse-attention-pytorch as an example.
- They noted that achieving peak performance on predefined workloads is relatively straightforward, but Triton's robustness is what sets it apart.
- FlashMLA's SmemLayoutP Unveiled: A member inquired about the dimensions of `SmemLayoutP` in the FlashMLA code, specifically its shape `((2,2), kNThreadsS, 1, kBlockN/8)` and the role of `kNThreadsS` in synchronizing P between warpgroups.
- The member speculated whether other dimensions might be related to wgmma, awaiting clarification from other experts.
- Grayscale Leaderboard Smokes the Competition: Multiple leaderboard submissions to the `grayscale` leaderboard were successful on GPUs L4, T4, A100, and H100 using Modal runners, with IDs `2351`, `2429`, `2430`, `2431`, `2459`, and `2460`.
- Benchmark submission with id `2363` to leaderboard `vectoradd` on GPUs T4, L4, A100, H100 using Modal runners also succeeded, indicating progress in the `vectoradd` benchmark across various GPU architectures.
Nomic.ai (GPT4All) Discord
- Oblix Orchestrates Local vs Cloud LLMs: A member shared a demo video (https://youtu.be/j0dOVWWzBrE) of Oblix, which seamlessly switches between local and cloud execution, using agents to monitor system resources and make decisions dynamically.
- The platform orchestrates between Ollama and OpenAI for optimal performance and cost-efficiency, as detailed on Oblix.ai.
- AI Engineers Compare LLM Leaderboards: Members shared links to Artificial Analysis and LM Arena to find reliable LLM leaderboards for specific purposes.
- Concerns were raised about filtering relevant models from these lists, particularly avoiding outdated options like Grok-3.
- Members Design Medical Data Processing PC: A member requested assistance with building a new PC to process medical data using AI, emphasizing the need for secure, offline operation.
- Another member suggested starting with an Intel i9, 128GB RAM, and an Nvidia RTX 4090.
- GPT4All Struggles with Audio Transcription: A member inquired about using GPT4All for local audio file transcription, specifically uploading .wav files, but found that it wasn't working.
- Another member clarified that GPT4All is primarily designed for docs/pdf, recommending XTTS webui for wav to text conversion, but cautioned that the installation process is complex.
Yannick Kilcher Discord
- W-GANs Sidestep Gradient Explosion: W-GANs mitigate gradient saturation by being linear, avoiding the BCE issues of traditional GANs, as shown in Figure 2 of the W-GAN paper.
- However, instability can still arise if the generator or discriminator becomes overly dominant, leading to saturation on both sides.
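A small PyTorch sketch of the contrast being described, with random scores standing in for real critic outputs: the BCE discriminator loss saturates once the discriminator grows confident, while the Wasserstein critic loss is linear in the scores, so its gradients do not vanish.

```python
import torch
import torch.nn.functional as F

real_scores = torch.randn(64, 1)  # stand-ins for critic outputs on real data
fake_scores = torch.randn(64, 1)  # ... and on generated data

# Standard GAN discriminator loss (BCE on logits): saturates when the
# discriminator becomes confident, starving the generator of gradient.
d_loss_gan = (
    F.binary_cross_entropy_with_logits(real_scores, torch.ones_like(real_scores))
    + F.binary_cross_entropy_with_logits(fake_scores, torch.zeros_like(fake_scores))
)

# WGAN critic loss: linear in the scores, so the gradient stays informative
# even when the critic cleanly separates the two distributions.
d_loss_wgan = -(real_scores.mean() - fake_scores.mean())
```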
- Transformers Get Soft with Slots: Members shared an image analysis on soft slot methods which shows how soft slots dynamically bind to input tokens or retrieved content in Transformers.
- Equations for Attention and Soft Slots (S') were shown, with learnable slots using softmax and scaled dot-product attention.
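The shared equations weren't reproduced in the summary; one plausible form, assuming standard scaled dot-product attention with learnable slot queries $S$ attending over input tokens $X$ (all notation here is our assumption, not taken from the shared image), is:

$$\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V, \qquad S' = \mathrm{softmax}\!\left(\frac{(S W_Q)(X W_K)^{\top}}{\sqrt{d_k}}\right) X W_V$$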
- OpenAI.fm's UX/UI: Fast but Flawed?: Members joked about the simple and rushed UX/UI of OpenAI.fm.
- One member pointed out that a more structured protocol is easily disrupted by less structured protocols that can evolve according to user needs, and that clients consume more of what they like and less of what they don't.
- G-Retriever Enables Chatting with Graphs: The G-Retriever paper details the semantic extraction of information from knowledge graphs, enabling chatting with your graph, graph QnA and Graph RAG.
- The paper introduces a Graph Question Answering (GraphQA) benchmark with data from scene understanding, common sense reasoning, and knowledge graph reasoning.
- Moore's Law Accelerates AI?: Members are discussing METR_Evals' research suggesting "Moore’s Law for AI agents", claiming the length of tasks AIs can do is doubling about every 7 months.
- Some members refuted the claim, arguing that certain tasks are not interesting for probabilistic models.
LlamaIndex Discord
- Local RAG App Deployed for Code Chat: A fully local, fully open-source RAG app has been built that can chat with your code and was announced in this tweet.
- The app uses GitIngest to parse the code into summaries and markdown, Streamlit for the UI, and runs Meta's Llama 3.2 locally using Ollama.
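A hedged sketch of the generation step of such a pipeline using the `ollama` Python client; the retrieval side (GitIngest parsing and chunk selection) is elided, and the context string is a placeholder.

```python
import ollama  # assumes a local Ollama server with llama3.2 pulled

context = "<top-k retrieved markdown chunks from the repo>"  # placeholder
reply = ollama.chat(
    model="llama3.2",
    messages=[
        {"role": "system", "content": f"Answer using this code context:\n{context}"},
        {"role": "user", "content": "What does the ingestion module do?"},
    ],
)
print(reply["message"]["content"])
```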
- TypeScript Bundler Config Fixed Import Bug: A member using LlamaIndex TS had an issue importing agent, which was resolved by updating the tsconfig bundler configuration.
- The user confirmed that modifying the TS config resolved the import error, and thanked the community for the suggestion.
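The exact change wasn't shown; a plausible `tsconfig.json` fragment for bundler-style module resolution, which commonly resolves this class of import error, might look like:

```json
{
  "compilerOptions": {
    "module": "ESNext",
    "moduleResolution": "bundler",
    "esModuleInterop": true
  }
}
```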
- Parallel executions limited in Agent Workflows: A member asked about limiting parallel executions in Agent Workflows, specifically for a tool with a human-in-the-loop event due to the agent calling the tool multiple times in parallel.
- The question was answered on GitHub, since the user sought to ensure the tool was called only once at a time.
Cohere Discord
- Account Limit Trumps Trial Key Limit: Users clarified that the monthly limit of 1k requests for trial keys is per account, not per key.
- They cautioned that creating multiple accounts to bypass this limit will result in removal of all accounts.
- Cohere API's Throw Errors: Users encountered various Cohere API error messages, including invalid request, rate limiting, and token limits due to empty documents, short prompts, exceeding token limits, and incorrect model specifications.
- Rate limiting errors are identified by a 429 status code, as detailed in the Cohere API documentation.
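For the 429 case specifically, a minimal backoff-and-retry sketch; the endpoint and payload shape follow Cohere's v1 chat API as we understand it, so treat both as assumptions.

```python
import time
import requests

API_URL = "https://api.cohere.com/v1/chat"  # assumed v1 chat endpoint

def chat_with_retry(api_key: str, message: str, max_retries: int = 5) -> dict:
    headers = {"Authorization": f"Bearer {api_key}"}
    for attempt in range(max_retries):
        resp = requests.post(API_URL, headers=headers,
                             json={"model": "command-r", "message": message})
        if resp.status_code != 429:   # not rate-limited: surface the result
            resp.raise_for_status()
            return resp.json()
        time.sleep(2 ** attempt)      # exponential backoff on 429
    raise RuntimeError("rate limit persisted after retries")
```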
- Cohere User Seeks Rate Limit Checker: A user inquired about an API to check their remaining rate limit usage.
- Currently, there doesn't appear to be a direct API solution available.
- Hospitality Expert Pioneers Low-Code Tech: Gaby, a professional in the hospitality industry, introduced herself as a low-code tech enthusiast, proficient with platforms like Make and Adalo.
- Her expertise showcases the growing importance of low-code tools in various industries.
Modular (Mojo 🔥) Discord
- Mojo's Duration Module Displays Weirdness: A developer working on a `duration` module proposal for Mojo ran into unexpected behavior with type casting between `Ratio` and `Duration` structs, sharing a code snippet to demonstrate the issue.
- The specifics of the bug involve unexpected results when converting between the two time formats.
- Mojo and PyTorch Team Up?: A member speculated if using PyTorch in Mojo could speed up training with MAX.
- The inquiry did not receive a response, leaving the potential benefits unconfirmed.
- Mojo Community Debates Nanosecond Precision: The community debated using nanosecond precision as the base unit for time representation in Mojo; one member noted that a `UInt64` of nanoseconds can cover over 500 years.
- Another member countered that C++ guarantees a default time resolution of at least 292 years, emphasizing that seconds are the base SI unit for time.
DSPy Discord
- MIPRO v2 Judges LLMs: A member reported using MIPRO v2 with LLM-as-a-judge as their evaluation metric and shared a link to a math reasoning tutorial showcasing its use.
- The math reasoning tutorial demonstrates MIPRO v2 optimizing a program against an LLM-judged evaluation metric.
- DSPy Shares LLM-as-a-Judge Documentation: Documentation on utilizing LLM-as-a-judge was shared from DSPy's learning resources.
- The documentation details the use of AI feedback for metric evaluations.
- Automatic Metrics Optimize DSPy: It was emphasized that automatic metrics are critical for evaluation and optimization within DSPy.
- DSPy employs metrics to monitor progress and enhance program effectiveness.
- Metrics Evaluate Task Performance: A metric is defined as a function that scores system outputs against data examples; simple tasks may use basic metrics like accuracy or exact match.
- Complex tasks benefit from metrics that assess multiple output properties via AI feedback.
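A minimal sketch of DSPy's metric convention, where a metric is any function taking `(example, prediction, trace)` and returning a score; the field name `answer` and the commented optimizer usage are illustrative assumptions.

```python
import dspy

def exact_match(example, pred, trace=None):
    # Simple metric: True/False on normalized equality of the `answer` field.
    return example.answer.strip().lower() == pred.answer.strip().lower()

# Hypothetical usage with an optimizer such as MIPROv2:
# optimizer = dspy.MIPROv2(metric=exact_match, auto="light")
# compiled = optimizer.compile(program, trainset=trainset)
```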
tinygrad (George Hotz) Discord
- Member Questions Unet3d's Dimensions: A member inquired if the example unet3d model is actually 3D, proposing it might be 2.5D because it uses 2D convolutions and 2D transposes on 3D input.
- They drew attention to the difference from a real 3D Unet architecture.
- 2D Convolutions Mimic 3D: The conversation clarified that using 2D convolutions on 3D input creates a 2.5D effect, in contrast to true 3D Unet architectures which use genuine 3D operations.
- The original poster requested clarification on the dimensionality of the implementation.
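A short PyTorch illustration of the distinction: a true 3D convolution mixes information across depth, while applying a 2D convolution slice-by-slice (the "2.5D" pattern) keeps depth slices independent.

```python
import torch
import torch.nn as nn

vol = torch.randn(1, 1, 16, 64, 64)  # (batch, channels, depth, H, W)

# True 3D convolution: the kernel spans depth as well as height/width.
conv3d = nn.Conv3d(1, 8, kernel_size=3, padding=1)
print(conv3d(vol).shape)  # torch.Size([1, 8, 16, 64, 64])

# "2.5D": fold depth into the batch and convolve each slice independently,
# so no information flows between neighboring slices.
slices = vol.squeeze(1).reshape(-1, 1, 64, 64)  # 16 separate 2D images
conv2d = nn.Conv2d(1, 8, kernel_size=3, padding=1)
print(conv2d(slices).shape)  # torch.Size([16, 8, 64, 64])
```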
Torchtune Discord
- Paper Shared on Torchtune: krammnic shared a paper on the Torchtune channel.
- No discussion occurred about this paper.
- Follow-up on Paper's Relevance: The paper's title and abstract suggest potential relevance to the ongoing discussions within the Torchtune community.
- Further investigation is needed to determine the paper's specific contributions and applicability to current projects.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Codeium (Windsurf) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
Cursor Community ▷ #general (789 messages🔥🔥🔥):
Cursor pricing, Claude 3.7, Vibe coding, Pear AI vs Cursor, React vs Svelte
- Cursor's Pricing Proves Punitive: Users are frustrated with Cursor's pricing model, where they are charged for connection errors and resuming requests, as well as 'tool charges for no responses' and reported losing premium requests after downgrading plans.
- They are finding that 'Normal agent is a shit show without max' and are 'quicker than spending real $ on max'.
- Claude 3.7 Causing Consternation: Members are reporting issues with Claude 3.7's performance in Cursor, claiming that it makes false assumptions and is less reliable than Claude 3.5, whereas others are having the opposite experience.
- As one user put it, '3.7 is better for going the extra mile. 3.5 is better for accuracy' with another adding 'There’s no single hammer that’s better for every job'.
- Pear's Potential Prompts Pricey Problems: Users are comparing Pear AI to Cursor, noting Pear’s cheaper pricing but also concerns about its reliance on roo code and per file change acceptance workflow, while others cite that Pear can’t code worth a damn.
- Some Cursor users, like one who said 'I don't like pear AI that much, mainly cause they use roo code and roo code is not that stable', are considering switching if Cursor doesn't improve its context window or pricing.
- React Racketeering Raises Rivalries: The channel is debating the merits of React versus Svelte for a SaaS app, with some preferring React for its large community and compatibility with Cloudflare Pages, while others find it slow and messy, advocating for Svelte
- The user base seems pretty split, with arguments ranging from 'react is slow af' to 'svelte also doesn't need workarounds'
- Vibe Visions Vary Wildly: Members debated the usefulness of vibe coding, with some calling it a marketing ploy and a crock, while others argued that it is a real thing requiring technical expertise, like a basic knowledge of Git.
- Despite varying definitions, a consensus emerged that successful 'vibing' requires critical thinking, debugging skills, and the ability to steer AI tools effectively.
- ThePrimeagen - Twitch: 🚨🚨 DAY 1 - VIBE CODING A Game In 7 Days Using Cursor -- #ad🚨🚨
- Auto Hide - Visual Studio Marketplace: Extension for Visual Studio Code - A tool to autohide the sidebar and terminal panel.
- Cursor – Model Context Protocol: no description found
- Cursor Directory: Find the best cursor rules for your framework and language
- Tweet from Michael Feldstein (@msfeldstein): I wish they named 3.7 something else, its a totally different model, not a better 3.5Quoting Bass (@SeifBassam) Is anyone else finding that Claude Sonnet 3.5 is better than 3.7 for coding?I reverted a...
- GitHub - demyxsh/demyx: Demyx is a Docker image that automates and manages WordPress installations. Traefik for reverse proxy with Lets Encrypt SSL/TLS. WordPress sites are powered by OpenLiteSpeed/NGINX-PHP and MariaDB.: Demyx is a Docker image that automates and manages WordPress installations. Traefik for reverse proxy with Lets Encrypt SSL/TLS. WordPress sites are powered by OpenLiteSpeed/NGINX-PHP and MariaDB. ...
- GitHub - samuelrizzo/jira-mcp-server: Contribute to samuelrizzo/jira-mcp-server development by creating an account on GitHub.
- GitHub - geekan/MetaGPT: 🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming: 🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming - geekan/MetaGPT
- Dialogo AI - Intelligent Task Automation: Dialogo AI provides intelligent AI agents that learn, adapt, and automate complex workflows across any platform. From data analysis to system management, our intelligent agents transform how you work.
Unsloth AI (Daniel Han) ▷ #general (241 messages🔥🔥):
Gemma 3 issues, Llama failing in Gemma environment, Vision fine-tuning on Gemma 3, QLoRA for Gemma 3, Synthetic data generation
- Gemma 3 glitches with dependencies, old notebooks trigger: Members reported that Gemma 3 has a bug with `--no-deps` that causes missing dependencies, and old notebooks have not been tested recently, according to this discussion.
- It was also noted that Google Colab with a 2018 GPU might be too outdated for some tasks.
- Llama struggles to function in Gemma-specific settings: A user encountered issues with Llama failing in a Gemma-specific environment but working fine in another environment without Gemma updates, referencing this notebook.
- The user expressed confusion over why the same notebook failed on Google Colab, suggesting missing dependencies due to `--no-deps`.
- Gemma 3 doesn't yet support vision fine-tuning: Despite Gemma 3 supporting images, vision fine-tuning is not yet supported on Unsloth, which was raised in this issue.
- A user attempted to fine-tune Gemma 3 using Llama code, which failed, but they still wanted to know if the model would run images after fine-tuning text only.
- QLoRA fixes memory errors in Gemma 3: When running the Gemma 3 model, users ran into memory errors, but enabling QLoRA resolved the issue, likely due to reduced VRAM usage as mentioned here.
- Turning on QLoRA sets `load_in_4bit = True`.
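A minimal sketch of what that looks like with Unsloth; the model name and sequence length are placeholders.

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it",  # placeholder checkpoint
    max_seq_length=2048,
    load_in_4bit=True,  # the QLoRA-style 4-bit load that trims VRAM use
)
```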
- Synthetic Data: Bespoke Labs is a Cool Tool: Members discussed tools for synthetic data generation, with one user recommending Bespoke Labs due to its extensive features.
- Another user confirmed it's open source with a dedicated Discord server.
- Google Colab: no description found
- Google Colab: no description found
- Google Colab: no description found
- Google Colab: no description found
- SUFE-AIFLM-Lab/Fin-R1 · Hugging Face: no description found
- Unsloth Notebooks | Unsloth Documentation: Below is a list of all our notebooks:
- notebooks/nb/Gemma3_(1B)-GRPO.ipynb at main · unslothai/notebooks: Unsloth Fine-tuning Notebooks for Google Colab, Kaggle, Hugging Face and more. - unslothai/notebooks
- "Unsloth: Failed to make input require gradients!" When Vision-fine-tune Gemma3 · Issue #2131 · unslothai/unsloth: I'm tring to vision fine-tune Gemma3 refering this tutorial: https://colab.research.google.com/drive/1j0N4XTY1zXXy7mPAhOC1_gMYZ2F2EBlk?usp=sharing#scrollTo=QmUBVEnvCDJv I constructed my dataset li...
- GitHub - canopyai/Orpheus-TTS: TTS Towards Human-Sounding Speech: TTS Towards Human-Sounding Speech. Contribute to canopyai/Orpheus-TTS development by creating an account on GitHub.
- notebooks/nb at main · unslothai/notebooks: Unsloth Fine-tuning Notebooks for Google Colab, Kaggle, Hugging Face and more. - unslothai/notebooks
- notebooks/nb/Gemma3_(4B).ipynb at main · unslothai/notebooks: Unsloth Fine-tuning Notebooks for Google Colab, Kaggle, Hugging Face and more. - unslothai/notebooks
- notebooks/nb/Mistral_(7B)-Text_Completion.ipynb at main · unslothai/notebooks: Unsloth Fine-tuning Notebooks for Google Colab, Kaggle, Hugging Face and more. - unslothai/notebooks
- Text Completion Notebook - Backwards requires embeddings to be bf16 or fp16 · Issue #2127 · unslothai/unsloth: I am trying to run the notebook from the Continue training, https://docs.unsloth.ai/basics/continued-pretraining Text completion notebook https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObe...
Unsloth AI (Daniel Han) ▷ #off-topic (3 messages):
Unsloth Submissions, Tiny-grad Spreadsheet for tasks, Github issues with high involvement
- Unsloth Submissions Start Getting Reviewed: A member mentioned that Unsloth submissions are starting to get reviewed, but they haven't helped out with any Unsloth issues yet.
- Tiny-grad Spreadsheet on TODO tasks requested: A member inquired about a tiny-grad-like spreadsheet laying out the key things that need doing, wondering if it's mainly Github issues tagged with help wanted.
- They felt this would be a good way to build confidence and stop lurking.
- Github issues with high involvement are important: A member stated that if there's a Github issue with 5 or more people involved (excluding the team), then it would be pretty important to solve.
Unsloth AI (Daniel Han) ▷ #help (95 messages🔥🔥):
DPO Trainer Upgrade, Zephyr DPO Notebook Confusion, Gemma 3 (27b) inference issue, Unsloth save and push to hub during training, Unsloth finetuning voice models
- Navigating DPO Trainer Upgrade with Unsloth Patch: A user shared their experience upgrading to the latest DPO Trainer with the latest Unsloth and Unsloth Zoo, providing a link to their small diff for others facing similar challenges.
- The user also found the Zephyr (7B)-DPO notebook confusing and suggested updating it via a pull request to the Unsloth notebooks repository.
- Sampling during training causes breakage for Llama-3: A user reported that using `LogCompletionsCallback` to sample the model during training is broken for Llama-3, along with Gemma 3 (27b), resulting in an error related to default `generation_config` values.
- A code snippet involving `FastLanguageModel.for_inference(model)` and `FastLanguageModel.for_training(model)` was shared.
- Saving and pushing to Hub: One user inquired about saving and pushing models to the Hugging Face Hub during training with Unsloth, noting that the default save strategy wasn't working.
- Another user suggested building a custom callback that uploads the model to the Hub whenever it saves to disk; they couldn't provide the specific code at the moment and instead pointed to message history for the solution (a sketch of the idea appears below).
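A sketch of that idea as a `transformers` `TrainerCallback`; the repo id is a placeholder and this assumes you are already authenticated with the Hub.

```python
from transformers import TrainerCallback

class PushToHubOnSave(TrainerCallback):
    """Push the model to the Hub every time a checkpoint is saved."""
    def __init__(self, repo_id: str):
        self.repo_id = repo_id

    def on_save(self, args, state, control, **kwargs):
        kwargs["model"].push_to_hub(
            self.repo_id, commit_message=f"checkpoint at step {state.global_step}"
        )

# trainer.add_callback(PushToHubOnSave("your-username/your-model"))
```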
- Merging Issue and Model Hallucinations: Users reported issues with merging LoRA models, experiencing problems where the merging process resulted in gibberish outputs or matrix alignment errors.
- Another user found that when fine-tuning Phi-3.5-mini-instruct, the model hallucinates when there are too many numerics in the data.
- GRPO with Vision Models Example Sought: A user inquired about the availability of example notebooks or Colabs demonstrating the implementation of GRPO with vision models, as mentioned in a recent Unsloth blog post.
- The response indicated that such an example was currently lacking, but is planned for the future.
- Google Colab: no description found
- Google Colab: no description found
- Google Colab: no description found
- Google Colab: no description found
- Unsloth Requirements | Unsloth Documentation: Here are Unsloth's requirements including system and GPU VRAM requirements.
- DPO Trainer: no description found
- Updates _unsloth_get_batch_samples to accept a 4th device parameter. by mmathew23 · Pull Request #91 · unslothai/unsloth-zoo: get_batch_samples patch currently takes 3 parameters and transformer 4.50.0 has changed it to take 4 parameters. I've updated the function to take device = None. The default keyword maintains ...
- GitHub - unslothai/notebooks: Unsloth Fine-tuning Notebooks for Google Colab, Kaggle, Hugging Face and more.: Unsloth Fine-tuning Notebooks for Google Colab, Kaggle, Hugging Face and more. - unslothai/notebooks
- updated dpo script with latest trl deps · toranb/sloth@9abead8: no description found
Unsloth AI (Daniel Han) ▷ #showcase (7 messages):
LLM chatbot, Personality bots
- Chatbot impersonates friend via finetuned LLM: A member created a chatbot that sounds like their friend Kolo via finetuned LLM.
- The member noted that it kind of sounds like Andrew Tate a bit lmao.
- Personality bots take off: A member thinks personality bots need to take off.
- They argue that it is a very good use case for fine tuning and that it can be edgy, funny, and entertaining.
Link mentioned: Vite + React: no description found
Unsloth AI (Daniel Han) ▷ #research (8 messages🔥):
Foundation Model Training, Tree-of-Thought, Monte Carlo Tree Search
- Turbocharge Foundation Model Training 6x: A member shared an image claiming a 6x speedup in foundation model training.
- Another member commented "if it works well" is a big if, suggesting skepticism about the method's reliability.
- ToT and MCTS: Relics or Relevant?: A member mentioned Tree-of-Thought (ToT) and Monte Carlo Tree Search (MCTS) as precursors to current test-time compute scaling strategies.
- Another member shared their experience, stating "I tried Tree-of-Thought before didn't perform good", to which they were asked to clarify the specific tasks.
OpenAI ▷ #ai-discussions (296 messages🔥🔥):
OpenAI Pricing, o1 Model Architecture, Grok Deep Research, Perplexity desktop app
- OpenAI's o1 Pro Pricing Angers Users: Users expressed frustration with OpenAI's API pricing, particularly for the o1 Pro model, deeming it diabolically, horrifically, monumentally overpriced.
- One user humorously remarked that one can only laugh at OpenAI's pricing philosophy, while another suggested that Claude outperforms o1 Pro.
- o1 Architecture Under Scrutiny: Discord users debated whether OpenAI's o1 model is based on GPT-4o, with claims that o1 has its own distinct architecture, while others believe it's a fine-tuned version of GPT-4o.
- Arguments centered on the fact that, if the base model were different, it would have a different knowledge cutoff, with many concluding that o1 is gpt4o with reasoning.
- DeepSeek Outshines OpenAI: Members shared charts that claimed that models like DeepSeek were comparable in performance to other models.
- The pricing of DeepSeek was significantly lower than that of OpenAI, leading to further frustration.
- Perplexity Desktop App Rewards Loyalty: A user mentioned that Perplexity is offering a free month of Perplexity Pro for users who use their desktop app for 7 days straight.
- This reward, however, is exclusive to the Windows app, excluding users on macOS, iOS, Android, and Web.
- Grok Deep Research Compared: Users are testing Grok's Deep Research function.
- One user said they are a great fan of perplexity research, after trying everything.
Link mentioned: AI Model & API Providers Analysis | Artificial Analysis: Comparison and analysis of AI models and API hosting providers. Independent benchmarks across key performance metrics including quality, price, output speed & latency.
OpenAI ▷ #gpt-4-discussions (7 messages):
GPT Pro, Subscription Issues, OpenAI Support
- User struggles with GPT Pro Subscription: A user reported paying for GPT Pro but not receiving access to the subscription, expressing frustration that OpenAI support is unresponsive.
- Another member advised contacting support at help.openai.com, emphasizing that no one in the channel can assist with billing issues.
- OpenAI Support Unresponsive to Subscription Problems: A user reported that they have been unable to get a response from OpenAI support after filling out a form with all their information regarding a failed GPT Pro subscription.
- The user indicated they have received no response since yesterday.
OpenAI ▷ #prompt-engineering (22 messages🔥):
Model Personalization, Strucutred output effect on reasoning, GPT memory usage, Github Copilot Pull Request Descriptions
- Models require personalization for various biases: The differences in model behavior are based on how the model guesses and if the model has any directed bias that needs to be adjusted for due to preferences or training data.
- Different models may require unusual prompting to bypass these biases, especially in sensitive topics.
- Structured output may affect reasoning: A member tested whether the prompt, "No other keys or commentary are allowed," reduces a model's reasoning capabilities when using structured output.
- The results indicated it might increase token usage and worsen performance in some cases, possibly due to ethical contemplation.
- Summarizing Chat history can't ignore directives: A member inquired about creating a prompt that allows ChatGPT to summarize its memory while ignoring specific directives given in previous conversations.
- The goal was to retain general knowledge while disregarding specific instructions like name preferences.
- Github Copilot for auto pull requests: A user seeks suggestions to automate pull request descriptions using GitHub Copilot, which breaks with medium-long texts.
- They currently use a manual process involving ChatGPT to refine Copilot's summaries and want to optimize this workflow.
OpenAI ▷ #api-discussions (22 messages🔥):
Prompt Engineering Adaptability, Model Guessing and Bias, Model Memory and Personalization, Structured Output and Reasoning, Github Copilot PR Optimization
- Prompt Engineering becomes Model-Specific: Members observed that prompt engineering is becoming increasingly model-specific, particularly concerning built-in reasoning capabilities, leading to a greater investment of time in refining skills.
- One member humorously noted, "It's a great way to excuse myself into even more time spent in my side love/hobby (AI). I'm just 'keeping up on my skills'."
- AI Model Guessing and Directed Bias: One member suggested that differences in AI model behavior stem from 'how the model guesses' and whether the model has any directed bias that needs to be adjusted for.
- They further added, *"I think training data for the model is like a vast sea. Any topic 'well represented' in training data may have many unique instances in the model's understanding of the topic."
- Model Memory Enhances with User Interaction: Members suggested that AI models adapt more effectively the more users engage with them, challenging the idea that AI is static.
- One member recounted successfully connecting two non-interacting APIs through GPT, emphasizing the potential for innovation and pushing the boundaries of what's believed about AI, and to "don’t be afraid to push the limits of what we believe we know about ai, AI’s changing and fast, it’s gunna be shaping the future and we’re lucky enough to be working with it when it’s new".
- Structured Output Restraints Examined: A member questioned if using the phrase "No other keys or commentary are allowed" to enforce structured output could reduce reasoning in models, prompting experimentation.
- The user ultimately noted that the phrase had little to no effect, or in some cases, had the opposite effect.
- Automating Pull Request Descriptions with Copilot: A user is seeking advice on automating pull request descriptions with Github Copilot due to issues with long inputs breaking Copilot inside Github.
- The user is using this prompt:

```
Create a pull request body description that:
- Always begins with: "This pull request introduces..."
- Includes the following sections: **Additions**, **Fixes**, **Refactors**, and **Deletions** where possible.
- Avoids any references to commit messages, links, or minor changes (such as TypeScript interface tweaks).
- Provides a short, bullet-point summary for each section.
- Maintains the same uniform, consistent structure.
```
LM Studio ▷ #general (103 messages🔥🔥):
LM Studio Server API for RAG, ZeroGPU Pro Upgrade Issues, Browser Extensions for LM Studio, Audio Model Training with PyTorch, Speculative Decoding Crashes
- LM Studio API Eyes RAG Integration: Users are exploring the potential of integrating RAG (Retrieval-Augmented Generation) functionality with the LM Studio server API, similar to Ollama and Qdrant.
- One user noted that while the GUI only retrieves the top 3 vectors, the API could allow for more customized implementations with embeddings and a vector database.
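A hedged sketch of the embeddings side of such a pipeline against LM Studio's OpenAI-compatible local server; the port is the default and the embedding model name is a placeholder for whatever is loaded.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

emb = client.embeddings.create(
    model="text-embedding-nomic-embed-text-v1.5",  # placeholder model name
    input="a chunk of a document to index",
)
vector = emb.data[0].embedding  # store in your own vector DB (e.g. Qdrant)
```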
- ZeroGPU Pro Users face GPU Quota Hiccups: A ZeroGPU Pro user reported an issue where they exceeded their GPU quota despite having a paid upgrade, possibly due to using a FastAPI backend instead of a Gradio UI.
- This user is seeking advice on how to resolve this quota issue when calling the ZeroGPU Pro API from their own application.
- LM Studio sparks Browser Extension Ideas: Users discussed potential browser extensions for LM Studio, including translating webpages using Gemma 3 27b and summarizing YouTube videos, though the feasibility of real-time webpage translation was questioned due to speed.
- One member suggested extensions could summarize YouTube videos, by summarizing the subtitles from YouTube.
- Crafting Custom Audio Models with PyTorch Transformers: A member is experimenting with pretraining an audio model from scratch using PyTorch and a transformer architecture, aiming to generate proper audio from tokens.
- Another member shared examples of their own model's output, generating songs based on names (e.g., abba.mp3, mj.mp3), and suggested fine-tuning or uploading the model to Hugging Face for others to experiment with.
- Speculative Decoding Stalls LM Studio Models: A user reported that enabling speculative decoding causes their models to crash consistently, specifically when using Qwen 2.5 7B as the main model and Qwen 2.5 0.5B as the draft model.
- Another member suggested creating a detailed report in the LM Studio Discord channel, including model details, hardware, and system information.
LM Studio ▷ #hardware-discussion (136 messages🔥🔥):
RX 9070, Vulkan performance degradation, ROCm support, Gemma3 memory allocation
- *RX 9070* Owners Report Slow Inference Speeds: Several users with the new RX 9070 are reporting slower inference speeds compared to older cards like the RX 580 and 1070, despite the 9070 showing 100% GPU load.
- One user saw speeds drop from 5-7 tok/s to around 3 tok/s with a Granite 3.1 8B Q8_0 model, while another had similar experiences after upgrading from a 1070.
- *Vulkan* Drivers Blamed for Performance Degradation: The performance issues with the RX 9070 are suspected to stem from bugs in AMD's Vulkan drivers, with one user noting that Vulkan performance in some games is also significantly lower than DirectX 11.
- One member suggests downgrading to the 24.10.1 driver, but this version does not support the RX 9000 series, and another user found that disabling flash attention improves Vulkan performance.
- ROCm Support Still in Progress for AMD GPUs: Members mention that ROCm support is still under development, and while AMD has technically added support for gfx1200, it hasn't been fully implemented in llama.cpp.
- A user shared detailed performance data across different driver versions and llama.cpp versions, showing that Vulkan performance is often lower than ROCm's, and noted that the issue has since been addressed.
- Memory Allocation Issues with Gemma3 Models: A user encounters memory allocation errors when loading Gemma3 12b models with context windows larger than 43520, receiving a VC++ error.
- They discovered that allocating even one additional token increases the buffer size by 96 MB, causing the allocation to fail, though it works under Vulkan, and the issue is very specific to Gemma3.
Link mentioned: Rtx 2080ti GIF - Rtx 2080ti - Discover & Share GIFs: Click to view the GIF
aider (Paul Gauthier) ▷ #general (207 messages🔥🔥):
Claude Code vs Aider Web Search, Aider's --no-verify Flag, o1-pro API experiences, Aider install for all users Ubuntu
- Claude Code Catches Aider Web Search: A user noted that Claude code is implementing web search the same way Aider does it.
- They linked to an X account post demonstrating this feature but others noted that this is only available on Claude Desktop.
- Aider's Git Commit Flag Causing Hook Headaches: A member noticed Aider adding the `--no-verify` flag during commits and linked to the relevant aider/repo.py code, which bypasses system hooks.
- The Aider maintainer explained that the flag is used because commit hooks could cause arbitrarily strange things to happen, and a potential workaround using lint and test hooks was suggested.
- High Costs make o1-pro API prohibitive: Some users have tried o1-pro via the API, others commented that at $30 per full send, they had to sell their computer and logged off.
- The high cost led to discussions about potential caching mechanisms and whether OpenAI's automatic prompt caching could help mitigate expenses.
- Pipx package madness: One user struggled to install Aider for all users on an Ubuntu box, despite being given advice to use `sudo pipx install --global aider-chat`.
- They ended up reporting success by installing with uv at `/usr/local/bin` after facing multiple hurdles with pip and version conflicts.
- Ripgrep MCP: Turbocharge your Claude Context: Some users are integrating Claude with Model Context Protocol (MCP) servers like mcp-ripgrep for improved file searching, since `search_files` times out on larger directories and doesn't respect `.gitignore`.
- This allows Claude to interact with the filesystem, providing better context for code generation and problem-solving, but one user was skeptical since Claude provides an "official" MCP product already.
- Vibe Coder Frontend Developer Role - CO/AI: This isn’t about grinding through syntax; it’s about prompting, iterating, and vibing your way to a brilliant product.
- Linting and testing: Automatically fix linting and testing errors.
- Reasoning models: How to configure reasoning model settings from secondary providers.
- A - Overview: A has 31 repositories available. Follow their code on GitHub.
- Can't set thinking tokens for Sonnet 3.7 via openrouter · Issue #3591 · Aider-AI/aider: Aider currently only supports setting thinking tokens the way Anthropic specifies. See: BerriAI/litellm#9429
- GitHub - mcollina/mcp-ripgrep: An MCP server to wrap ripgrep: An MCP server to wrap ripgrep. Contribute to mcollina/mcp-ripgrep development by creating an account on GitHub.
- GitHub - Aider-AI/aider: aider is AI pair programming in your terminal: aider is AI pair programming in your terminal. Contribute to Aider-AI/aider development by creating an account on GitHub.
- GitHub - modelcontextprotocol/servers: Model Context Protocol Servers: Model Context Protocol Servers. Contribute to modelcontextprotocol/servers development by creating an account on GitHub.
- aider/aider/repo.py at 14f140fdc52fbc7d819c50eca3de1b3e848282f3 · Aider-AI/aider: aider is AI pair programming in your terminal. Contribute to Aider-AI/aider development by creating an account on GitHub.
aider (Paul Gauthier) ▷ #questions-and-tips (14 messages🔥):
Aider failing tests, Aider documentation, Aider help command, aider.el package
- Aider's Auto-Fixing Capabilities for Failing Tests Explored: A user questioned if Aider automatically fixes failing tests with `--auto-test` enabled, noting the need for manual prompts like "fix the tests" after each failure, and documentation was provided.
- Aider should automatically fix test failures if configured with the `--auto-test` flag (paired with `--test-cmd` to tell Aider how to run the tests).
- Documentation Download Options Aider-la-Carte: A user sought downloadable Aider documentation and was pointed to the aider/aider/website/docs directory on GitHub.
- A file containing patterns to exclude parts of that subtree which aren't really "docs" is available here.
- Aider's Help Command Clarified: A user inquired about using Aider to its fullest potential, including wrapping it in another system and selectively using its directory scanning capabilities, and was directed to the Aider's troubleshooting documentation here.
- The documentation explains to type `/help <question>`, and aider will respond with helpful information, utilizing retrieval augmented generation (RAG) with its indexed documentation.
- Aider.el package and prompts: A user noticed that when using the `aider.el` package, the prompt changes to "What's wrong? Fix" when tests fail.
- The user confirms that when using aider in the terminal, this is the expected behavior and they have to press enter for the run tests / fix / run tests loop.
- Using /help: Use "/help <question>" to ask for help about using aider, customizing settings, troubleshooting, using LLMs, etc.
- Linting and testing: Automatically fix linting and testing errors.
- aider/aider/website/docs at main · Aider-AI/aider: aider is AI pair programming in your terminal. Contribute to Aider-AI/aider development by creating an account on GitHub.
- aider/aider/help_pats.py at main · Aider-AI/aider: aider is AI pair programming in your terminal. Contribute to Aider-AI/aider development by creating an account on GitHub.
Perplexity AI ▷ #general (204 messages🔥🔥):
Deep Research Limits, GPT 4.5 Model, Switching Models, Perplexity apps, Coding AI
- Deep Research Limits Remain Hot Topic: Users are debating whether Deep Research has usage limits, with some referencing the Perplexity blog stating unlimited access for Pro members, while others cite a 500 queries per day limit.
- Members pointed to a tweet by Aravind Srinivas who said Paid users only need to pay $20/mo to access an expert level researcher on any topic for 500 daily queries.
- Perplexity without GPT 4.5 Model?: Users are reporting that GPT 4.5 is gone from Perplexity Pro, with some suspecting that the model was removed after gaining new subscribers.
- Some users claim that 4.5 was SOTA for writing text and the best for specific tasks, while others found it slow and not insightful.
- Auto Model Switching Glitch: Several users are experiencing a glitch where Perplexity automatically switches back to the Auto model, even after selecting a specific model like Claude.
- Some find it frustrating as they have to manually change it back every time the page refreshes, expressing a preference for Claude over R1.
- Mobile and desktop apps lagging behind: Members note the High Deep Research mode is only available on the web version, not the desktop app, and the apps in general always end up really lagging behind, same goes for mobile.
- Some find it best to just use the website on all platforms.
- Perplexity not suited for coding?: A new user asked whether Perplexity is suitable for coding, or whether it can be used for math and coding games.
- Other members chimed in do not use it to code and that Claude is best for those types of tasks.
- Tweet from Aravind Srinivas (@AravSrinivas): Excited to introduce the Perplexity Deep Research Agent: available for free to all users. Paid users only need to pay $20/mo to access an expert level researcher on any topic for 500 daily queries, an...
- Tweet from Aravind Srinivas (@AravSrinivas): All Perplexity Pro users now get 500 daily DeepSeek R1 queries (without censorship and prompts not going to China). Free users get 5 daily queries.Quoting Aravind Srinivas (@AravSrinivas) 100 daily De...
- Audio Models in the API: Olivier Godement, Jeff Harris, Iaroslav Tverdoklhib, and Yi Shen introduce and demo three new audio models in the API—two speech-to-text models and one text-to-speech model—and an audio integration wi...
Perplexity AI ▷ #sharing (6 messages):
RAGs, LLM Email Reply System, NotebookLM, Deep Reasoning
- RAGs Explored: A user asked about RAGs.
- Specifically, they asked how RAG pipelines are implemented.
- LLM Email System Testing: A user shared a link to a page about testing an LLM email reply system.
- No further details were given.
- NotebookLM Introduces Interact: A user shared a link to NotebookLM introducing Interact.
- No further details were given.
- Deep Reasoning Collection Shared: A user shared a link to a Deep Reasoning collection.
- No further details were given.
Perplexity AI ▷ #pplx-api (7 messages):
API Key Spend Tracking, search_domain_filter Documentation, R1-1776 Open Source Weights, MCP issues
- API Key Spend Tracking Feature Request: Users can track spend by API key but can't name them yet, so a feature request was submitted to GitHub to address this.
- search_domain_filter Docs Updated: The documentation for `search_domain_filter` has been updated, available at Perplexity's API Reference.
- A user inquired if the limit of only 3 domains had changed (a usage sketch appears below).
- R1-1776 Finetuning Censors Material: An independent researcher evaluating R1-1776-671B and the distilled R1-1776-70B variant found canned CCP answers and censored content when prompted on topics like Tiananmen Square, described in this blogpost.
- MCP tool experiences Issues: Users have reported issues with the MCP tool not functioning correctly, documented on GitHub.
- API errors flood Perplexity users: One user reported consistently receiving various API errors (204, 503, 500), despite making no changes to their script and asked for thoughts on the issue.
- Auditing AI Bias: The DeepSeek Case: Cracking open the inner monologue of reasoning models.
- ppl-ai/api-discussion: Discussion forum for Perplexity API. Contribute to ppl-ai/api-discussion development by creating an account on GitHub.
- Problem MCP tool n8n - Perplexity · Issue #17 · ppl-ai/modelcontextprotocol: Hello everyone, I try to set up the node perplexity MCP on my workflow n8n but i got some issue. I have done the same for the mcp of BRAVE but it doesn't work for perplexity. Thanks you by Advance...
Interconnects (Nathan Lambert) ▷ #news (76 messages🔥🔥):
Claude Web Search, Midjourney -> Cursor, TokenSet Image Generation, Qwen3 Release, Hunyuan-T1
- Claude Acquires Web Browsing Superpowers: Web search is now available in claude.ai, allowing Claude to finally search the internet, delivering true positives for research queries.
- Midjourney's Model Lead Joins Cursor: After 3 years leading model development at Midjourney, a key member joined Cursor to work on coding agents, marking a shift from focus on beauty and creativity to code as noted in this tweet.
- TokenSet Breaks Barriers in Image Generation: TokenSet introduces a new paradigm for image generation, representing images as unordered token sets, enhancing global context and robustness against local perturbations, as shown in their GitHub.
- Qwen3 Imminent: Benchmarks Teased: Qwen3 may be launching soon, with members closely watching the HuggingFace and vLLM repos, with expectations that it may be GPT4.5 level.
- Nvidia's Llama-3.3-Nemotron-Super-49B-v1 Ranks High: Nvidia's Llama-3.3-Nemotron-Super-49B-v1 lands at #14 on the LMArena, excelling in math, with an openly released 15M post-training dataset available on Hugging Face.
- Tweet from lmarena.ai (formerly lmsys.org) (@lmarena_ai): New on LMArena: @Nvidia's Llama-3.3-Nemotron-Super-49B-v1 lands at #14!A powerful open reasoning model—top-15 overall, excelling in math, with an openly released 15M post-training dataset.Congrats...
- Tweet from theseriousadult (@gallabytes): After an incredible 3 years leading model development at Midjourney, I've joined Cursor to work on coding agents. I'm incredibly proud of my time at Midjourney and the work we did, of the resu...
- Tweet from Hunyuan (@TXhunyuan): 📢 Introducing TokenSet: A fundamentally new paradigm for image generation!We've broken free from traditional sequential token approaches by representing images as unordered token sets. Our innova...
- Tweet from Alex Albert (@alexalbert__): Web search is now available in claude dot ai. Claude can finally search the internet!
- Tweet from Hunyuan (@TXhunyuan): 🚀 Introducing Hunyuan-T1! 🌟Meet Hunyuan-T1, the latest breakthrough in AI reasoning! Powered by Hunyuan TurboS, it's built for speed, accuracy, and efficiency. 🔥✅ Hybrid-Mamba-Transformer MoE A...
- Tweet from Lincoln 🇿🇦 (@Presidentlin): https://www.reddit.com/r/LocalLLaMA/comments/1jgio2g/qwen_3_is_coming_soon/?sort=new
- Tweet from Vaibhav (VB) Srivastav (@reach_vb): LETS GOO! @kyutai_labs just released MoshiVis - an end-to-end low-latency Vision Speech Model, CC-BY license 🔥> Only adds 206M parameters via lightweight cross-attention (CA) modules to integrate ...
- Adding Qwen3 and Qwen3MoE by bozheng-hit · Pull Request #36878 · huggingface/transformers: Adding Qwen3This PR adds the support of codes for the coming Qwen3 models. For information about Qwen, please visit https://github.com/QwenLM/Qwen2.5. @ArthurZucker
- [Model] Add Qwen3 and Qwen3MoE by YamPengLi · Pull Request #15289 · vllm-project/vllm: DescriptionRecently, I have submitted a pull request to Hugging Face Transformers containing the implementation of the Qwen3 and Qwen3MoE model. I would also like to contribute these new modelsto ...
Interconnects (Nathan Lambert) ▷ #random (23 messages🔥):
Esoteric Total Ordering, Unitree Robotics Kip-Up, Claude uses Brave Search, Capybara Logo Change, Token Counting Inflation
- *New Faces (tm)* Sparks Esoteric Order Theories: After seeing new faces, a member joked about believing in an esoteric total ordering to new faces.
- The member jokingly analyzed an attached image, suggesting the last ordering was based on hair volume ASC.
- *Unitree's G1* Nails the Kip-Up!: UnitreeRobotics showcased their G1 humanoid robot performing a kip-up, celebrating the rapid advancement of humanoid intelligence.
- A member responded saying that this robot abuse will be a big theme in the revolution.
- *Brave* Chosen as Claude's Search Engine: It was confirmed that Claude's web search feature uses Brave Search, verified by a recent update to their Trust Center and matching search results, according to Simon Willison.
- The fact that Brave was chosen was received negatively, with members lamenting the difficulty of building better indexes and the death of the Bing API.
- Capybara Logo De-Capybarized: The Capybara logo is being reverted to a more generic logo, a decision lamented by some despite its broader appeal, as reported by Justin Lin.
- A member reacted with crying emojis to the logo change and the 9 quadrillion tokens for training.
- Token Count Inflation Allegations: A member questioned whether a token count was inflated by counting the token per image per frame per video.
- Another member suggested that this inflation was almost certain, especially after the company revealed their tokenizer, implying a strategy to boost stock value.
- Tweet from Junyang Lin (@JustinLin610): Turning the Capybara back to Logo. Suitable for more people but a little bit sad for us.
- Tweet from Unitree (@UnitreeRobotics): Movement creates intelligence - Unitree's G1 humanoid robot nails the world's first kip-up!😘This fresh, newly captured video from Unitree's testing grounds showcases the breakneck speed o...
- Tweet from Simon Willison (@simonw): I've confirmed that the search engine being used by Claude's web search feature is @brave - it's listed in a recent update to their "Trust Center" and the search results are an exa...
Interconnects (Nathan Lambert) ▷ #memes (2 messages):
Anthropic Job Application, AI Alignment, Vibe Check
- ChatGPT Crafts Cover Letter for Anthropic: A user prompted ChatGPT to draft a cover letter for an Anthropic application just in case.
- This was presented as a proactive measure, even without concrete plans to apply, showcasing the user's interest in potential opportunities at Anthropic.
- Alignment Vibe Check for Anthropic: A user jokingly offered to help Anthropic improve its AI alignment, suggesting a need to elevate the AI's main-character energy.
- The user humorously emphasized the importance of deep vibes and verified blue-check alignment status for Anthropic's AI, seemingly poking fun at contemporary social media culture and its integration into AI persona design.
- Twitter Bot Career Path Joked for AI: A member jokingly suggested that someone's message about improving Anthropic's AI sounded like a suitable style for a Twitter comment bot.
- This comment implies that the tone and language used in the message were overly enthusiastic or promotional, fitting the stereotype of automated social media engagement.
Interconnects (Nathan Lambert) ▷ #cv (6 messages):
Sonnet 3.7 Benchmarks, InternVL Open Source Training Code, OpenAI operator use case
- *Sonnet 3.7* Visual Benchmarks Remain Elusive: A member inquired about comprehensive visual benchmarks for Sonnet 3.7, noting that only MMMU results were reported in the announcement.
- So far, there is no response on benchmarks.
- *InternVL*'s Open Training Code Draws Attention: A member was surprised to discover that InternVL has open source training code.
- They believe this makes InternVL and Molmo the only notable models with open training pipelines, pointing to InternVL's packing implementation as a resource for dataloading.
- OpenAI Operator Finds Use in Dataset Iteration: A member shared their initial use case for the OpenAI operator: iterating through InternVL datasets to find ArXiv and Hugging Face links, using a specific image to demonstrate this.
- Another member suggested that deep research would work better for this use case.
Interconnects (Nathan Lambert) ▷ #reads (16 messages🔥):
RLAIF-V for MLLM Trustworthiness, Skill-Dependent Scaling Laws, Scaling RL Compute for Reasoning, SuperBPE Tokenizer, SWEET-RL for Multi-Turn LLM Agents
- *RLAIF-V* Boosts MLLM Trustworthiness with Open-Source Feedback: OpenBMB introduced RLAIF-V, a framework for aligning MLLMs using open-source feedback, claiming to surpass GPT-4V in trustworthiness, as detailed in their paper and GitHub repository.
- The framework uses inference-time scaling with the RLAIF-V reward and high-quality, generalizable feedback data, splitting responses into atomic claims in order to sample different responses.
- Knowledge and Reasoning Exhibit Different Scaling Behaviors: A new paper reveals that knowledge and reasoning skills show different scaling behaviors, suggesting that compute-optimal scaling is skill-dependent.
- The research investigates the impact of various datamixes and found fundamental differences in scaling behavior between knowledge and code, even when correcting for datamix differences.
- *Reinforcement Learning* Scales Reasoning Powers: Reinforcement learning has enabled advanced reasoning capabilities in recent language models, enabling models to "think for longer" and reduce error rates via inference-time scaling.
- The blogpost questions how to scale RL compute further, discovering even more inference-time capabilities, with several open questions on priors.
- *SuperBPE* Tokenizer Enhances Efficiency: A new superword tokenizer, SuperBPE, was created which includes tokens spanning multiple words and at 8B scale, SuperBPE models consistently outperform the BPE baseline on 30 downstream tasks (+8% MMLU), while also being 27% more efficient at inference time, as described in this tweet.
- At a fixed vocab size of 200k, SuperBPE reduces sequence length by 33% on average.
- *SWEET-RL* algorithm enhances LLM agent interaction: A novel RL algorithm called SWEET-RL was proposed for multi-turn interactions in real-world tasks for LLM agents, detailed in this paper.
- SWEET-RL uses a carefully designed optimization objective to train a critic model with access to additional training-time information, providing step-level rewards for improving the policy model.
- RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness: Traditional feedback learning for hallucination reduction relies on labor-intensive manual labeling or expensive proprietary models. This leaves the community without foundational knowledge about how ...
- Tweet from Alisa Liu (@alisawuffles): We created SuperBPE🚀, a *superword* tokenizer that includes tokens spanning multiple words.When pretraining at 8B scale, SuperBPE models consistently outperform the BPE baseline on 30 downstream task...
- Tweet from OpenBMB (@OpenBMB): 🔥 Test-time Scaling for MLLMs’ trustworthiness🌟 Thrilled to introduce our new work RLAIF-V, a novel framework for aligning MLLMs through open-source feedback, achieving trustworthiness that surpasse...
- Tweet from Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxesTex): Wait, for real? They claim a large improvement on hallucinations with basically nothing more than splitting responses into atomic claims, sampling responses using different seeds for trustworthiness c...
- Tweet from Nicholas Roberts (@nick11roberts): 📉📉NEW SCALING LAW PHENOMENON 📉📉 We find that knowledge and reasoning exhibit different scaling behaviors! Super excited to finally tell you all about our paper on the compute optimal scaling of sk...
- Tweet from Alisa Liu (@alisawuffles): What can we gain from less restrictive tokenization? To find out, we developed SuperBPE🚀, which learns subword *and* superword tokens. SuperBPE dramatically improves encoding efficiency over BPE — at...
- Compute Optimal Scaling of Skills: Knowledge vs Reasoning: Scaling laws are a critical component of the LLM development pipeline, most famously as a way to forecast training decisions such as 'compute-optimally' trading-off parameter count and dataset...
- SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks: Large language model (LLM) agents need to perform multi-turn interactions in real-world tasks. However, existing multi-turn RL algorithms for optimizing LLM agents fail to perform effective credit ass...
- Scaling RL Compute | General Reasoning: no description found
Interconnects (Nathan Lambert) ▷ #policy (6 messages):
Scaling Laws for Language Models, Data Requirements for GPT-4.5
- Bigger Models Benefit from More Data: Members discussed whether smaller datasets are a new trend, with larger models like GPT-4.5 potentially needing more data, especially during various post-training stages.
- One member suggested examining the number of tokens used in training open-source model suites for different model sizes, referencing scaling laws, with another asking about papers on the subject.
- Synthetic Data Augmentation: The conversation touches on the use of synthetic data to augment smaller datasets for training smaller models, suggesting a trade-off between data size, model size, and the use of synthetically generated data.
- This implies a strategy where smaller models might rely more on enhanced, possibly synthetic, datasets, while larger models can effectively utilize larger volumes of raw data.
LMArena ▷ #general (108 messages🔥🔥):
p2l-router-7b-0318 Model, Claude Overrated?, Google AI Studio API, Deepseek R1, Qwen 3 Coming Soon
- Is Claude really all that, or is everyone just simping?: Members believe people are overrating Claude, questioning its coding prowess due to limited evaluations beyond SWE-bench, suggesting it might not match Grok3 on livecodebench.
- Some suggest that ratings may be skewed by non-developers, leading to inaccurate assessments of its true capabilities.
- Gemma gets Glowing: Community members expressed amazement at Gemma3's 1340 score and relatively small 27B parameter size.
- One member described Gemma's responses as "autistic", giving very brief answers, often when a much longer one is warranted.
- Deepseek R1 eats VRAM: Deepseek R1 requires a substantial amount of VRAM, around 1000GB, with one user running it on 8xH200s.
- Despite high VRAM usage, there are claims that Deepseek R1 exhibits baked-in PRO CHINA biases, raising concerns about its use, with one user saying "tldr deepseek is #&&@% don't recommend using it".
- Qwen 3 Sneak Peek?: There are reports that Qwen 3 is coming soon, indicated by a post on the Hugging Face Transformer repository.
- This news follows the announcement of Qwen 2.5 Omni, sparking interest and anticipation within the community.
- OpenAI's Sora: Hype vs. Reality?: Users found Sora's public release underwhelming compared to its promotional materials, even inferior to competitors like Keling AI and Hailuo AI.
- It's suspected that OpenAI used huge amounts of compute over hours to generate the promotional videos, and that the released Sora is the faster "turbo" version.
- Tweet from Lincoln 🇿🇦 (@Presidentlin): Qwen 3 coming pr on hugging face transformer repo
- unsloth/DeepSeek-V3 · Hugging Face: no description found
Notebook LM ▷ #use-cases (21 messages🔥):
Podcast Feature in NotebookLM, NotebookLM vs Gemini, Efficient PDF Processing Workflow, AI Avatar Lip Syncing Services, Mindmap Feature rollout
- Users Blown Away by NLM's Podcast Feature: Users are very happy about the Podcast feature of NLM, but one user felt like a third wheel in the discussions because the AI tended to cut short their answers and revert to its own script.
- The user likened the experience to being part of "a radio show where I can talk to hosts".
- NotebookLM Stays Grounded in Provided Sources: A user questioned the advantage of using NotebookLM over Gemini, as both support files like PDFs.
- A member found that Gemini did not stay grounded in the sources, whereas NotebookLM's distinguishing aspect is that it only uses the sources provided.
- Streamline PDF Processing Workflow: A user seeks an efficient way to declutter physical papers, scan them into private online storage, and make them searchable by natural language queries.
- The current manual process involves scanning to PDF, sending to Gmail, manually naming each file, OCR processing, and importing into NotebookLM; the user asks whether taking photos with an iPhone and sending them to NLM for automatic naming and OCR would be more efficient.
- AI Avatar Lip Syncing Showdown: Members are comparing lip syncing services for AI avatars, with one user finding Hedra great but pricey, and being unimpressed with RunwayLM.
- Mindmap Feature Still Rolling Out: A user inquired about the absence of the Mindmap feature in their NotebookLM, and another user clarified that the feature is rolling out over a two-week period.
- The feature is not available even for NLM Plus subscribers yet.
Notebook LM ▷ #general (74 messages🔥🔥):
Flashcard Generation, NotebookLM vs Chatbase, Premium Voice Overview Limits, Mind Map Feature Rollout, Whitelist NotebookLM Crawler
- *Flashcard Feature* Requested by Plus User: A Plus user requested the integration of Flashcard generation (Anki) in NotebookLM.
- They expressed disappointment, stating that chatbase is currently a better chatbot agent.
- *Mind Map* Rollout Happening Slowly: Users are reporting that the Mind Map feature is doing a slow rollout, with many not seeing it on their accounts despite being Plus users.
- A staff member confirmed that it will take a few days for all users to get it and they should sit tight and wait.
- *Cloudflare Blocking* NotebookLM Crawler: A user asked how to technically whitelist the NotebookLM crawler, as Cloudflare was blocking it on their website.
- The user confirmed that Cloudflare, rather than NotebookLM itself, was the source of the block.
- *PDF Sources* Getting Cut Off: A user reported that their PDF source was being cut off, with NotebookLM not recognizing details near the end of the file.
- A staff member suggested that the preview may not be ground truth and to file the issue under bugs if asking about the end of the document yields no results.
- *Gemini 1.5 Pro* Powering NotebookLM: A user inquired whether NotebookLM is boosted by Gemini 1.5 Pro.
- Another asked what model NotebookLM uses and another user answered Gemini 2.0.
Nous Research AI ▷ #general (79 messages🔥🔥):
Nvidia Blackwell RTX Pro series, Data filtering strategies, DeepHermes 24B OOM issues, WorldSim appreciation
- Nvidia's Blackwell RTX Pro Enters the Ring: Nvidia has introduced the Blackwell RTX Pro series targeting laptops, desktops, standalone PCs, and data centers, potentially tightening the already limited supply of Blackwell GPUs.
- Sources at GDC/GTC suggest Nvidia aims to improve Blackwell GPU supply, hinting that supply might meet demand by May/June, though skepticism remains: "We'll believe that when we see it."
- Dataset Evaluation & Augmentation reign supreme: Discussion emphasized that the most efficient use of GPU hours lies in dataset evaluation, augmentation, sorting, and categorization.
- One member suggested filtering data using a small model to reject data it predicts well, noting this area as "underexplored in public."
- DeepHermes 24B Struggles on Multi-GPU Rig: A user faced Out-of-Memory (OOM) errors running DeepHermes 24B on a 5x 3090 rig with llama.cpp, even with the lowest context settings.
- Suggestions included trying the 8-bit version and checking multi-GPU configuration, with advice on using the `--device`, `--split-mode`, and `--tensor-split` flags for proper GPU utilization (see the sketch after this list).
- WorldSim Sparks Awe: User Calls Nous Research 'Godly': A member expressed immense enthusiasm for WorldSim, praising Nous Research for creating an incredible application of AI and stating it's "absolutely epic!!"
- The user was enthralled by the application and emphasized its masterclass quality, regretting not discovering it sooner, "Thanks so much Nous Research for creating such an incredible application of AI!"
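For reference, a minimal sketch of the multi-GPU split discussed above, using the llama-cpp-python bindings (which expose the same knobs as the CLI's `--split-mode`/`--tensor-split` flags); the model filename, split ratios, and context size are illustrative assumptions, not the user's actual setup.

```python
from llama_cpp import Llama

# Spread an 8-bit GGUF across five GPUs. tensor_split gives each device's
# share of the model; split_mode=1 splits by layer in llama.cpp's scheme.
llm = Llama(
    model_path="DeepHermes-24B.Q8_0.gguf",    # assumed filename
    n_gpu_layers=-1,                          # offload all layers to GPU
    split_mode=1,
    tensor_split=[0.2, 0.2, 0.2, 0.2, 0.2],   # even split across 5x 3090s
    n_ctx=4096,                               # keep context modest to avoid OOM
)
print(llm("Hello!", max_tokens=16)["choices"][0]["text"])
```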
- Predictive Data Selection: The Data That Predicts Is the Data That Teaches: Language model pretraining involves training on extensive corpora, where data quality plays a pivotal role. In this work, we aim to directly estimate the contribution of data during pretraining and se...
- Nvidia Blackwell RTX Pro with up to 96GB of VRAM — even more demand for the limited supply of GPUs: GB202, GB203, and GB205 are coming to professional and data center GPUs. (Updated with full specs.)
- Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale: Large language model pre-training has traditionally relied on human experts to craft heuristics for improving the corpora quality, resulting in numerous rules developed to date. However, these rules l...
- Reproducing GPT-2 (124M) in llm.c in 90 minutes for $20 · karpathy/llm.c · Discussion #481: Let's reproduce the GPT-2 (124M) in llm.c (~4,000 lines of C/CUDA) in 90 minutes for $20. The 124M model is the smallest model in the GPT-2 series released by OpenAI in 2019, and is actually quite...
Nous Research AI ▷ #ask-about-llms (8 messages🔥):
Hermes 3 Llama 3.2 3B, Model Parameters, Response Generation Issues
- Nous Research Unveils Hermes 3 Llama 3.2 3B: Nous Research introduced Hermes 3 3B, a small but mighty addition to the Hermes series of LLMs, detailed in the Hermes 3 Technical Report.
- Hermes 3 boasts improvements over Hermes 2, including advanced agentic capabilities, better roleplaying, reasoning, multi-turn conversation, and long context coherence.
- Tweaking Parameters for Casual Chat with Hermes 3: A user experimented with Hermes-3-Llama-3.2-3B-GGUF on Hugging Face, using a payload with parameters like `temperature`, `top_k`, `top_p`, and `repeat_penalty` to generate responses (a sketch of such a payload follows this list).
- Initially, the user set `temperature` to 0.2 and `top_p` to 0.7, but considered that the 0.7 and 0.85 ranges might be better, respectively.
- User Impersonation Glitch in Hermes 3: A user reported issues with Hermes 3, where it sometimes impersonates the `user` instead of maintaining its AI persona.
- The user is attempting to refine the system prompt and payload parameters to ensure logical and consistent AI-generated responses.
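For reference, a minimal sketch of the kind of sampling payload described in the parameter-tweaking item above, posted to a llama.cpp-style `/completion` endpoint; the URL, prompt, and values are illustrative assumptions, not the user's actual settings.

```python
import requests

payload = {
    "prompt": "You are Hermes, a helpful assistant.\nUser: hi!\nAssistant:",
    "temperature": 0.7,     # the ~0.7 range the user considered
    "top_k": 40,
    "top_p": 0.85,          # the ~0.85 range the user considered
    "repeat_penalty": 1.1,
    "n_predict": 256,
}
resp = requests.post("http://localhost:8080/completion", json=payload, timeout=120)
print(resp.json()["content"])
```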
Link mentioned: NousResearch/Hermes-3-Llama-3.2-3B-GGUF · Hugging Face: no description found
Nous Research AI ▷ #research-papers (1 messages):
teknium: https://x.com/nick11roberts/status/1902875088438833291?s=46
Nous Research AI ▷ #reasoning-tasks (3 messages):
Nous Hermes 2, C# Development, Anthropic LLMs
- Developer professes love for Nous Hermes 2: A member reminisced about starting with Nous Hermes a year ago, specifically mentioning Nous Hermes 2 as their "beloved".
- They have released their first desktop app and are figuring out which model to use for version 2.
- Developer offers C# skills to the community: A member offered their help, mentioning C# as their "love language" and highlighting their professional LLM experience.
- They have created several documentation and example LLMs for Anthropic.
- Anthropic LLM examples detailed: A member mentioned their work on Anthropic LLMs, including a Titanfall 2-based generator and the Bladewolf example from Metal Gear Rising.
- Their contributions can be found on the Anthropic GitHub.
HuggingFace ▷ #general (50 messages🔥):
Hugging Face API Outage, Roblox Voice Safety Classifier, Local Models for Speed & Privacy vs Cloud Models, Merge Multiple GPU VRAM, MagicQuill Low Quality Images
- HF API's 404 Error Causes App Downtime: A user reported widespread 404 errors affecting multiple Hugging Face API models, causing significant downtime for dependent applications, and requested immediate attention from the HF dev team, noting it had been almost a whole day without official acknowledgement.
- Another member tagged a HuggingFace employee to raise awareness of this urgent issue experienced by paid users.
- *Roblox* Releases Voice Safety Classifier for Toxicity Detection: Roblox released a large classification model trained on a manually curated real-world dataset of 2,374 hours of voice chat audio clips.
- The model outputs an n-by-6 tensor where the inferred labels are `Profanity`, `DatingAndSexting`, `Racist`, `Bullying`, `Other`, and `NoViolation`, based on a synthetic data pipeline described in this blog post.
- Fuse VRAM from multiple GPUs via Tensor Parallelism: Users discussed methods for combining VRAM from multiple GPUs, particularly for running models like Gemma3-12B, with one user asking if there's a way to combine an A2000 12GB and a 1060 6GB; the main recommendation was using tensor parallelism.
- One member pointed to Ollama issues on GitHub (2) and llama.cpp discussions for further information on multi-GPU support.
- *Oblix* Platform Dynamically Executes AI Tasks on Cloud or Device: Oblix.ai platform uses autonomous agents for intelligent AI orchestration that dynamically executes between cloud and on-device models, ensuring optimal performance, cost-efficiency, and security; they intelligently route AI tasks to cloud or edge based on complexity, latency requirements, and cost considerations.
- Oblix dynamically decides whether to process each AI request locally or in the cloud as shown in this YouTube video.
- Gradio Upgrade Breaks gr.Dataframe Wrapping Feature: A user reported that upgrading to Gradio 5.22 caused the `gr.Dataframe(wrap=True)` feature to stop working, with wrapping only functioning in Gradio 5.20.
- There were no other details given.
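For the Gradio item above, a minimal repro sketch (an assumption about the user's setup, since no details were given): with `wrap=True`, long cell text should wrap rather than truncate.

```python
import gradio as gr
import pandas as pd

# One long cell: with wrap=True this text should wrap across multiple lines.
df = pd.DataFrame({"text": ["a deliberately long sentence " * 20]})

with gr.Blocks() as demo:
    gr.Dataframe(value=df, wrap=True)

demo.launch()
```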
- Roblox/voice-safety-classifier · Hugging Face: no description found
- Hugging Face – The AI community building the future.: no description found
- Open Source Vector Database | Weaviate: Simplify the development of AI applications and enable developers of all levels to build, iterate, and scale AI capabilities faster.
- HF Inference API last few minutes returns the same 404 exception to all models: I think its due to the server error/issues, im getting this now as well instead of 404
- MagicQuill - a Hugging Face Space by AI4Editing: no description found
- AI4Editing/MagicQuill · I edited my image but then had no idea how to download the edited output image?: no description found
- AI4Editing/MagicQuill · Discussions: no description found
- Spaces - Hugging Face: no description found
- Mini Toy World - a Hugging Face Space by Yntec: no description found
- Transform Your AI Performance with Intelligent Hybrid Orchestration | Oblix.ai: Experience our interactive demo and see how our intelligent agents seamlessly switch between local LLM execution and cloud providers for optimal performance and cost efficiency.
- GitHub - Neph0s/awesome-llm-role-playing-with-persona: Awesome-llm-role-playing-with-persona: a curated list of resources for large language models for role-playing with assigned personas: Awesome-llm-role-playing-with-persona: a curated list of resources for large language models for role-playing with assigned personas - Neph0s/awesome-llm-role-playing-with-persona
- Reddit - The heart of the internet: no description found
- Distributed GPU inference: no description found
- Do Ollama support multiple GPUs working simultaneously? · Issue #2672 · ollama/ollama: I have 8 RTX 4090 GPUs. Can they support a 70B-int4 parameter model?
- Ollama is splitting the model between CPU and one GPU instead of using second GPU · Issue #8995 · ollama/ollama: What is the issue? Problem description My Setup I use ollama on my Laptop with an external GPU. My Laptop has an internal Nvidia Quadro M2000M. Over Thunderbolt 3 I have a Razer Core X Chroma eGPU ...
- How to properly use llama.cpp with multiple NVIDIA GPUs with different CUDA compute engine versions? · ggml-org/llama.cpp · Discussion #8725: I have an RTX 2080 Ti 11GB and TESLA P40 24GB in my machine. First of all, when I try to compile llama.cpp I am asked to set CUDA_DOCKER_ARCH accordingly. But according to what -- RTX 2080 Ti (7.5)...
HuggingFace ▷ #today-im-learning (1 messages):
richieghost: Today I'm learning Pytorch Frame.
HuggingFace ▷ #i-made-this (3 messages):
Ollama Gradio UI with Kokoro TTS, Little-Geeky-s-Learning-UI, Oblix AI orchestration platform, Edge-Cloud transitions
- Little Geeky Learns New UI Tricks: A member showcased a new UI built with Ollama, Gradio, and Kokoro TTS that automatically reads text output in a chosen voice and has model creation and management tools.
- The UI can read ebooks and answer questions about documents, as well as work with vision models to do the same for images, and an audio file is output from the UI.
- GeekyGhost shares Little-Geeky-s-Learning-UI: The Little-Geeky-s-Learning-UI is an Ollama based Gradio UI that uses Kokoro TTS.
- It allows model creation and management, reads ebooks, answers questions about documents, and works with vision models.
- Oblix orchestrates edge-cloud transitions: Oblix.ai is an AI orchestration platform powered by autonomous agents that dynamically executes between cloud and on-device models, ensuring optimal performance, cost-efficiency, and security.
- The platform features intelligent routing, performance optimization, execution agents, and cost efficiency, and a demo is available on YouTube.
- Transform Your AI Performance with Intelligent Hybrid Orchestration | Oblix.ai: Experience our interactive demo and see how our intelligent agents seamlessly switch between local LLM execution and cloud providers for optimal performance and cost efficiency.
- GitHub - GeekyGhost/Little-Geeky-s-Learning-UI: An Ollama based Gradio UI that uses Kokoro TTS: An Ollama based Gradio UI that uses Kokoro TTS. Contribute to GeekyGhost/Little-Geeky-s-Learning-UI development by creating an account on GitHub.
HuggingFace ▷ #computer-vision (1 messages):
.mwayne: https://blog.roboflow.com/fine-tune-sam-2-1/amp/
HuggingFace ▷ #smol-course (2 messages):
Manual Looping vs Vectorization, GSM8K Dataset, Tokenizer ChatML Format, Certifications
- Vectorized Processing Triumphs over Manual Looping: One member found vectorization performed much better than manual looping.
- The original poster reported that they had implemented it manually with a `for` loop, describing it as "kind of round-about".
- GSM8K Dataset Difficulties Arise: A member expressed trouble understanding the next notebook task involving the GSM8K dataset.
- The member was especially confused about the instructions to create a message format with the role and content (see the sketch after this list).
- Tokenizer's ChatML Format Examined: Doubts were raised on whether the tokenizer method always implements the same ChatML format.
- The member questioned how the function knows how the original dataset is formatted.
- Certification Assignment Location: One member inquired where they could find the assignment to get certifications.
- No further details were provided about the specific certification or platform.
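Touching both the GSM8K message-format question and the ChatML doubt above, a hedged sketch: messages are plain role/content dicts, and `apply_chat_template` renders them with whatever template ships in the tokenizer's config, which is how the function "knows" the target format. The model id and dataset field names here are illustrative assumptions.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")

example = {"question": "What is 7 * 6?", "answer": "42"}
messages = [
    {"role": "user", "content": example["question"]},
    {"role": "assistant", "content": example["answer"]},
]
# The template (ChatML-style or otherwise) comes from the tokenizer config,
# so different models can render the same messages differently.
print(tokenizer.apply_chat_template(messages, tokenize=False))
```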
HuggingFace ▷ #agents-course (24 messages🔥):
HF Course Certificate, Unit 2.1 Error, AI agent for UI automation, Langfuse Error, Smolagent model to run locally
- HF Course Certificate Achievement: A member asked about how to obtain a certificate after completing Unit 2 of the Hugging Face course.
- Another member requested the inclusion of an AI agent for UI automation.
- HF Learners Encounter Unit 2.1 Issues: A member encountered an error while running the code from Unit 2.1 Building Agents That Use Code in their own Python environment; the attached image showed the code crashing.
- A member suggested using the `hf_token` to run the model, checking terms and conditions, or verifying the HFApi object contains the token (a token-passing sketch appears after this list).
- Langfuse Integration Challenges: Members reported encountering an AttributeError related to the smolagents module when trying to connect to Langfuse in the unit2/smolagents/code_agents.ipynb notebook.
- Exploration of Local Smolagent Models: A member inquired about the best smolagent model to run locally and how to implement it in a working program.
- The member shared challenges when implementing the multiagents module.
- AI Agents Course: A learning review: A member shared their experience and learnings from completing unit 1 of the AI Agents course in a Medium blog post.
- They shared tips on how to get the most out of the first unit.
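Relating to the Unit 2.1 token issue above, a hedged sketch of authenticating before building an agent; the model id mirrors the course's usual default, and the exact smolagents parameter names are assumptions.

```python
import os
from huggingface_hub import login
from smolagents import CodeAgent, HfApiModel

# Authenticate so gated or rate-limited inference endpoints accept the request.
login(token=os.environ["HF_TOKEN"])

model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct")  # assumed model id
agent = CodeAgent(tools=[], model=model)
print(agent.run("What is 2 + 2?"))
```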
- The 🤗 AI Agents Course: A review: As a data scientist or LLM enthusiast, you hear and read about agents everywhere now. Unfortunately, not everyone has the same idea when…
- [bug] SmolagentsInstrumentor - AttributeError: module 'smolagents' has no attribute 'ApiModel' · Issue #1399 · Arize-ai/openinference: Describe the bug When following the guide on instrumenting Hugging Face smolagents using the SmolagentsInstrumentor, I get the following error: AttributeError: module 'smolagents' has no attri...
- fix: only import exported smolagents models by njbrake · Pull Request #1403 · Arize-ai/openinference: Fixes #1399
MCP (Glama) ▷ #general (69 messages🔥🔥):
mcp-mysql-server issues, fastmcp Framework, Vibe Coding, DaVinci Resolve MCP update, Glama API outage
- MySQL & MCP Hookup Headache: A user is wrestling with mcp-mysql-server connecting to Docker MySQL, reporting it bombs every connection despite working outside of MCP.
- Fastmcp Framework Frustrations: A user suspects the fastmcp framework is stripping out hidden arguments passed to commands, causing issues from `RegisterCommandsSchema` to `_RegisterCommandsSchema` in their code (a minimal server sketch follows this list).
- Vibe Coding Debate Sparks: Some users joked about vibe coding, where the coder looks at the screen, nothing makes sense, but somehow gets results.
- DaVinci Resolve MCP Seeks Speedy Server Claim: A user is seeking to resubmit their DaVinci Resolve MCP project with a license and updates, and was told claiming the server might speed up the update process, directing them to their repo.
- Glama API Grievances Galore: A user reported getting a 500 error from the Glama API, but another member stated that there have been no outages in the last 24 hours, with others sharing code samples to reproduce: `curl -X 'GET' 'https://glama.ai/api/mcp/v1/servers?first=10&query=github' -H 'accept: application/json'`.
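For context on the fastmcp item above, a minimal sketch of a FastMCP server using the MCP Python SDK; the tool name and its extra argument are hypothetical stand-ins, not the user's schema.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo")

@mcp.tool()
def register_command(name: str, hidden: bool = False) -> str:
    """Register a command; 'hidden' stands in for the extra arguments the
    user suspected were being stripped."""
    return f"registered {name} (hidden={hidden})"

if __name__ == "__main__":
    mcp.run()
```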
- Tweet from Jamie Dubs (@jamiew): 1. MCP is complex, overengineered, hard to host in cloud, security model is nonexistent2. MCP is also the best current solution for "LLM plugins" or "SDK but for LLMs". everything else...
- GitHub - cunhapaulo/marpstyle: Repository for Marp Themes created with beauty and simplicity in mind.: Repository for Marp Themes created with beauty and simplicity in mind. - cunhapaulo/marpstyle
- GitHub - ggozad/oterm: a text-based terminal client for Ollama: a text-based terminal client for Ollama. Contribute to ggozad/oterm development by creating an account on GitHub.
- GitHub - samuelgursky/davinci-resolve-mcp: MCP server integration for DaVinci Resolve: MCP server integration for DaVinci Resolve. Contribute to samuelgursky/davinci-resolve-mcp development by creating an account on GitHub.
- GitHub - MarkusPfundstein/mcp-gsuite: MCP Server to interact with Google Gsuite prodcuts: MCP Server to interact with Google Gsuite prodcuts - MarkusPfundstein/mcp-gsuite
- GitHub - isaacphi/mcp-gdrive: Model Context Protocol (MCP) Server for reading from Google Drive and editing Google Sheets: Model Context Protocol (MCP) Server for reading from Google Drive and editing Google Sheets - isaacphi/mcp-gdrive
- GitHub - kazz187/mcp-google-spreadsheet: MCP Server for Google Spreadsheet: MCP Server for Google Spreadsheet. Contribute to kazz187/mcp-google-spreadsheet development by creating an account on GitHub.
- GitHub - distrihub/mcp-google-workspace: A Model Context Protocol (MCP) server built in Rust for interacting with Google Drive and Google Sheets.: A Model Context Protocol (MCP) server built in Rust for interacting with Google Drive and Google Sheets. - distrihub/mcp-google-workspace
- GitHub - akchro/google-sheets-mcp: Contribute to akchro/google-sheets-mcp development by creating an account on GitHub.
- GitHub - rishipradeep-think41/drive-mcp: Contribute to rishipradeep-think41/drive-mcp development by creating an account on GitHub.
MCP (Glama) ▷ #showcase (6 messages):
Microsoft Semantic Workbench, Turso MCP tool video, Asana MCP + Google Calendar MCP, MCPHub.nvim + Avante + Figma MCP
- Microsoft Releases Semantic Workbench Tool: Microsoft released the Semantic Workbench, a VS Code extension, described as a versatile tool to prototype intelligent assistants, agents, and multi-agentic systems, prompting questions about its role as an MCP.
- One member asked if it was an MCP, after looking at its description.
- Turso MCP Tool Demoed by Jamie: Jamie created a video of the Turso MCP tool, built by the @tursodatabase community and showcased in a tweet.
- The video shows Claude creating a database for a domain collection.
- Automated Calendar Scheduling with Asana & Google Calendar MCPs: A blog post detailed using Asana MCP and Google Calendar MCP with Goose to automate task scheduling, illustrating how tasks are pulled from Asana, analyzed, and scheduled in Google Calendar with a single prompt. The blog post highlights the time-saving benefits of automating the organization of tasks and meetings.
- The Asana MCP server is located at Asana and the Google Calendar MCP is located at Google Calendar.
- nvim Integrates with Figma: A user showcased an integration of MCPHub.nvim with Avante and Figma MCP, demonstrating a streamlined workflow as seen in the shared video login-figma.mp4.
- Other users expressed interest with comments such as "Looks cool".
- Tweet from Jamie Barton (@notrab): Here I ask Claude to create a database for my domain collection. Don't worry, I didn't include the full list, the video is only 90 seconds.👏 Huge shout out to @spences10 and the @tursodatabas...
- MCP in Action: How I Use AI to Plan My Week with Goose, Asana, and Google Calendar: Use MCPs with Goose to automate task management and enhance productivity.
- GitHub - microsoft/semanticworkbench: A versatile tool designed to help prototype intelligent assistants, agents and multi-agentic systems: A versatile tool designed to help prototype intelligent assistants, agents and multi-agentic systems - GitHub - microsoft/semanticworkbench: A versatile tool designed to help prototype intelligent...
OpenRouter (Alex Atallah) ▷ #general (64 messages🔥🔥):
OpenRouter TTS, Ernie Models, Sambanova, Inferencenet, OpenAI audio models
- OpenRouter to Offer TTS, Image Gen, at a Cost: A member expressed interest in OpenRouter offering TTS and image generation, but voiced concerns about potentially high pricing.
- Groq vs Sambanova mixup: A member initially reported that Sambanova was down, but then corrected themselves, stating that it was Groq experiencing issues.
- GPT-4o Arrives: A user noticed that GPT-4o-64k-output-alpha is available on OpenRouter, supporting both text and image inputs with text outputs at the cost of $6/M input tokens and $18/M output tokens.
- Reasoning models usage data shared: A member published token usage data and thoughts on reasoning models, comparing them to traditional models.
- Fireworks Reduces Pricing, Matching Performance: Fireworks lowered their pricing for R1 and V3, with V3 reportedly matching existing performance metrics, specifically .9/.9.
- API Rate Limits - Manage Model Usage and Quotas: Learn about OpenRouter's API rate limits, credit-based quotas, and DDoS protection. Configure and monitor your model usage limits effectively.
- GPT-4o (extended) - API, Providers, Stats: GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/open...
- nghuyong (HuYong): no description found
- OpenRouter FAQ: Find answers to commonly asked questions about OpenRouter's unified API, model access, pricing, and integration.
- Fireworks - Fastest Inference for Generative AI: Use state-of-the-art, open-source LLMs and image models at blazing fast speed, or fine-tune and deploy your own at no additional cost with Fireworks AI!
- GitHub - mintsuku/sora: Sora is a Discord bot that integrates with the Open Router API to facilitate conversation in Discord servers.: Sora is a Discord bot that integrates with the Open Router API to facilitate conversation in Discord servers. - mintsuku/sora
- Eyes GIF - Eyes Burning My Eyes - Discover & Share GIFs: Click to view the GIF
GPU MODE ▷ #general (9 messages🔥):
vast.ai ncu profiling, Jake, Spam detection with neural nets
- NCU Profiling on Vast.ai in Question: A member inquired if vast.ai allows for ncu profiling.
- Another member responded that while someone with the handle Jake is present, they doubt bare metal access is provided.
- Spam Auto-Detection via Neural Nets: A member mentioned knowing a server that implemented automatic detection of spam messages using some kind of neural nets.
- No further details or links were provided.
GPU MODE ▷ #triton (6 messages):
cuTile talk, atomic addition with bfloat16, triton 3.1.0 and triton-windows 3.2.0, Triton's ease of use, sparse attention pattern
- Channel Eyes cuTile Talk: A member suggested inviting someone to give a talk about cuTile on the channel and the suggestion is in the works.
- No further details were provided.
- BFloat16 Atomic Addition Achieved Via Locks: A member reported that using `tl.atomic_cas` with a lock for atomic addition with bfloat16 actually works, but it sucks.
- The member is seeking improvements to the implementation, and offered a code snippet using `tl.atomic_cas` with a lock, inviting the community to enhance its performance (a sketch of the pattern follows this list).
- Triton Versions Clash Post-Install: After successfully running `triton_test.py`, a member found both triton 3.1.0 and triton-windows 3.2.0 listed in pip, expressing hesitation to uninstall the older version due to numerous files shown in the CMD.
- They sought advice on whether to uninstall triton 3.1.0, but no solutions were offered.
- Triton's Simplicity Attracts GPU Newbies: A member highlighted that Triton's key strength lies not in peak performance, but in its accessibility, enabling individuals with limited GPU experience to create complex kernels, and pointed to lucidrains/native-sparse-attention-pytorch as an example.
- They noted that achieving peak performance on predefined workloads is relatively straightforward, but Triton's robustness is what sets it apart.
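For reference, a minimal sketch of the lock pattern described above (not the member's actual snippet): a spin lock built from `tl.atomic_cas`/`tl.atomic_xchg` serializes a bf16 read-modify-write, which is also why the approach is slow.

```python
import triton
import triton.language as tl

@triton.jit
def locked_bf16_add(out_ptr, val_ptr, lock_ptr, N, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < N
    v = tl.load(val_ptr + offs, mask=mask)
    # Spin until the lock flips 0 -> 1 for this program; bf16 has no
    # native tl.atomic_add, hence the lock.
    while tl.atomic_cas(lock_ptr, 0, 1) == 1:
        pass
    x = tl.load(out_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + v, mask=mask)  # read-modify-write under the lock
    tl.atomic_xchg(lock_ptr, 0)                 # release the lock
```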
Link mentioned: native-sparse-attention-pytorch/native_sparse_attention_pytorch/triton_native_sparse_attention.py at main · lucidrains/native-sparse-attention-pytorch: Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper - lucidrains/native-sparse-attention-pytorch
GPU MODE ▷ #cuda (3 messages):
FlashMLA SmemLayoutP, Pointer Tagging
- FlashMLA's `SmemLayoutP` Dimension Decoded: A member inquired about the dimensions of `SmemLayoutP` in the FlashMLA code, specifically its shape `((2,2), kNThreadsS, 1, kBlockN/8)` and the role of `kNThreadsS` in synchronizing P between warpgroups.
- The member speculated whether other dimensions might be related to wgmma, awaiting clarification from other experts.
- Pointer Tagging Pitfalls Pondered: A member asked about potential pitfalls when implementing pointer tagging using `uint32_t*`, referencing the programming guide's suggestion of 17 available bits.
- They included an image from the programming guide for context.
Link mentioned: FlashMLA/csrc/flash_fwd_mla_kernel.h at b31bfe72a83ea205467b3271a5845440a03ed7cb · deepseek-ai/FlashMLA: FlashMLA: Efficient MLA decoding kernels. Contribute to deepseek-ai/FlashMLA development by creating an account on GitHub.
GPU MODE ▷ #torch (5 messages):
ZeRO offload, full-finetuning, 8B model, BF16, A100 40GB
- Zeroing in on ZeRO Offload for 8B Finetuning: A member inquired if ZeRO offload would enable full fine-tuning of an 8B model (in BF16) on a single A100 40GB GPU.
- Another member suggested that ZeRO is mostly for distributed training, and that checkpointing + gradient accumulation would be better for a single GPU (sketched below).
- FSDP2's Offload Parameter Examined: One member mentioned that FSDP2 has an offload_to_cpu parameter, while another suggested that a better starting point would be torchao's offload optimizer.
- DeepSpeed's ZeRO Offload Claims: A member mentioned that DeepSpeed's ZeRO offload claimed you could train a model up to 13B on a single GPU.
- They were looking for a better implementation.
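A hedged sketch of the single-GPU recipe suggested above (activation checkpointing plus gradient accumulation); the model, data, and loss are small placeholders, not an actual 8B fine-tuning setup.

```python
import torch
from torch.utils.checkpoint import checkpoint_sequential

model = torch.nn.Sequential(*[torch.nn.Linear(4096, 4096) for _ in range(8)])
model = model.cuda().bfloat16()
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
accum_steps = 8

data = [torch.randn(4, 4096, device="cuda", dtype=torch.bfloat16) for _ in range(32)]
for step, batch in enumerate(data):
    # Recompute activations during backward instead of storing them all.
    out = checkpoint_sequential(model, 4, batch, use_reentrant=False)
    loss = out.float().pow(2).mean() / accum_steps  # placeholder loss
    loss.backward()
    if (step + 1) % accum_steps == 0:               # simulate a larger batch
        opt.step()
        opt.zero_grad(set_to_none=True)
```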
GPU MODE ▷ #algorithms (2 messages):
GPU Mode Scammer, Discord Channel Alerts
- Scammer Alert Sounds the Alarm: A user alerted the channel that a scammer is present in the GPU Mode channel, suggesting potential fraudulent activity.
- The message included custom emojis related to GPUs and Dragon Ball Z, possibly as a humorous or attention-grabbing element.
- Discord Channel Experiences Scammer Scare: Members of the GPU Mode Discord channel were warned about the presence of a potential scammer.
- The warning lacked specific details, but served as a general alert to exercise caution.
GPU MODE ▷ #lecture-qa (3 messages):
Hopper Architecture, Microbenchmarking, Matrix Multiplication
- Hopper's Flops per Cycle: On Hopper, one can microbenchmark 4096 flops/2048 MAD (16-bit) per cycle per SM (a quick sanity check follows this list).
- This doubles when using 8-bit types, due to its two matmuls (Q×K and P×V).
- Microbenchmarking recommended for Hopper: One member suggested that microbenchmarking would be the way to go to gather the performance details of Hopper's architecture.
- Another member thought it was probably documented somewhere, but couldn't find any reference.
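As a rough sanity check (my arithmetic, not from the discussion), assuming an H100 SXM with 132 SMs at a ~1.8 GHz boost clock, the per-SM figure lines up with the chip's quoted dense BF16 throughput of roughly 989 TFLOPs:

```latex
4096 \,\tfrac{\text{FLOPs}}{\text{cycle}\cdot\text{SM}} \times 132 \,\text{SMs} \times 1.8\times10^{9} \,\tfrac{\text{cycles}}{\text{s}} \approx 9.7\times10^{14} \,\tfrac{\text{FLOPs}}{\text{s}} \approx 973 \,\text{TFLOPs}
```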
GPU MODE ▷ #self-promotion (3 messages):
GTC Presentation, CUDA Kernels, Small Transformer Models, Hopper Architecture, CUTLASS 4.0
- GTC Presentation Promises Performance-Optimized CUDA Kernels: A member announced their GTC presentation titled "Performance-Optimized CUDA Kernels for Inference With Small Transformer Models [S73168]" happening today at 4pm, focused on Hopper architecture.
- They encouraged attendees to come, say hi, and ask tough questions, also mentioning that the talk recordings will be available on the GTC website for those unable to attend in person.
- CUTLASS 4.0 Set to Redefine Pythonic Integration at GTC: Attendees will hear about the pythonic future of CUTLASS in its next major 4.0 version at GTC.
GPU MODE ▷ #reasoning-gym (4 messages):
Deprecated Coach class, Curriculum Experiments
- Coach Class Faces Retirement: Members discussed the removal of the Coach class, deeming it deprecated in favor of Curriculum Experiments.
- One member agreed to open a PR, confirming it's a leftover of early attempts.
- Curriculum Execution FTW: Discussion focused on streamlining curriculum execution within the codebase.
- The move aims to consolidate efforts and avoid confusion between older and newer approaches to curriculum management.
GPU MODE ▷ #submissions (11 messages🔥):
Leaderboard Submissions, GPU Tests
- Grayscale Leaderboard Receives Flood of Submissions: Multiple leaderboard submissions to the `grayscale` leaderboard were successful on GPUs: L4, T4, A100, and H100 using Modal runners.
- Submission IDs included `2351`, `2429`, `2430`, `2431`, `2459`, and `2460`.
- Vectoradd Benchmark Achieves Success: Benchmark submission with id `2363` to leaderboard `vectoradd` on GPUs: T4, L4, A100, H100 using Modal runners succeeded!
- This indicates progress in the `vectoradd` benchmark across various GPU architectures.
- Grayscale Tests Pass on A100 GPUs: Test submissions with IDs `2422` and `2423` to the `grayscale` leaderboard on A100 GPUs using Modal runners were successful.
- These tests specifically targeted the A100 GPU architecture, indicating focused testing efforts.
GPU MODE ▷ #hardware (2 messages):
Consumer GPUs, Cloud GPUs, Local vs Cloud
- Consumer GPUs obsolete rapidly: Members discussed how consumer GPUs become obsolete quickly, especially when used for purposes other than gaming.
- One member pointed out that for the same cost (~$1000), users can access the latest cloud GPUs on a rolling basis without any long-term commitment.
- Local GPU enthusiasts enjoy the "brrr" factor: Enthusiasts appreciate the auditory feedback of having a GPU running locally.
- One user stated if you don’t have a gpu at home then you never get to hear it go brrrr.
Nomic.ai (GPT4All) ▷ #general (35 messages🔥):
Oblix, AI Orchestration, Local LLM for SFW Stories, LLM Leaderboards, PC Build for Medical Data
- *Oblix* Seamlessly Switches Between Local vs Cloud: A member shared a demo video (https://youtu.be/j0dOVWWzBrE) of Oblix, which seamlessly switches between local vs cloud while still maintaining context using agents to monitor system resources and make decisions.
- The platform orchestrates between Ollama and OpenAI, dynamically deciding whether to process each AI request locally or in the cloud for optimal performance and cost-efficiency, as detailed on Oblix.ai.
- LLM Leaderboards Compared for Model Selection: Members discussed finding reliable LLM leaderboards for specific purposes, with one member sharing links to Artificial Analysis and LM Arena.
- Concerns were raised about filtering relevant models from these lists, particularly avoiding outdated or undesirable options like Grok-3.
- Members seek advice on PC build for medical data processing: A member requested assistance with building a new PC to process medical data using AI, emphasizing the need for secure, offline operation, and mentioning that the GitHub instructions weren't clear enough.
- Another member suggested starting with an Intel i9, 128GB RAM, and an Nvidia 4090 RTX.
- GPT4All not ideal for audio file transcription: A member inquired about using GPT4All for local audio file transcription, specifically uploading .wav files.
- Another member clarified that GPT4All is primarily designed for docs/pdf, recommending XTTS webui for wav to text conversion, although noting it's not a simple install.
- MTEB Leaderboard - a Hugging Face Space by mteb: no description found
- LLM Leaderboard - Compare GPT-4o, Llama 3, Mistral, Gemini & other models | Artificial Analysis: Comparison and ranking the performance of over 30 AI models (LLMs) across key metrics including quality, price, performance and speed (output speed - tokens per second & latency - TTFT), context w...
- Transform Your AI Performance with Intelligent Hybrid Orchestration | Oblix.ai: Experience our interactive demo and see how our intelligent agents seamlessly switch between local LLM execution and cloud providers for optimal performance and cost efficiency.
Yannick Kilcher ▷ #general (10 messages🔥):
W-GAN saturation, Transformers soft slots, MCP UX/UI
- W-GANs Mitigate Gradient Saturation: Traditional GANs saturate due to BCE, while W-GANs mitigate this by being linear, as illustrated in Figure 2 of the W-GAN paper (the objective is reproduced after this list).
- Although vanishing gradients are less of a problem, instability can still occur if the generator or discriminator becomes too dominant, leading to saturation at both ends.
- Soft Slots Dynamically Bind in Transformers: A member shared an image analysis on soft slot methods which shows how soft slots dynamically bind to input tokens or retrieved content in Transformers.
- The equations for Attention and Soft Slots (S') are provided, highlighting the use of softmax and scaled dot-product attention mechanisms with learnable slots.
- OpenAI.fm's UX/UI Rushed and Simple: Members commented on the simple UX/UI of OpenAI.fm, joking that it looked rushed.
- One member quoted a view that the MCP enforces too much structure, which makes it vulnerable to disruption by less structured protocols that can evolve according to user needs, emphasizing that higher variance almost always wins in a server-client system because clients consume more of what they like and less of what they don't.
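For reference, the standard W-GAN objective (from the paper, not this discussion): the critic D is constrained to be 1-Lipschitz, and because the loss is linear in the critic's outputs there is no sigmoid/BCE saturation regime.

```latex
\min_G \max_{\lVert D \rVert_L \le 1} \; \mathbb{E}_{x \sim P_r}\!\left[D(x)\right] - \mathbb{E}_{z \sim p(z)}\!\left[D(G(z))\right]
```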
Link mentioned: OpenAI.fm: An interactive demo for developers to try the new text-to-speech model in the OpenAI API
Yannick Kilcher ▷ #paper-discussion (3 messages):
G-Retriever, Graph Question Answering, Graph RAG
- G-Retriever enables chatting with graphs: The G-Retriever paper details the semantic extraction of information from a knowledge graph, enabling chatting with your graph, graph QnA and Graph RAG.
- Graph Question Answering benchmark introduced: The paper introduces a Graph Question Answering (GraphQA) benchmark with data collected from different applications including scene graph understanding, common sense reasoning, and knowledge graph reasoning.
Link mentioned: G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering: Given a graph with textual attributes, we enable users to `chat with their graph': that is, to ask questions about the graph using a conversational interface. In response to a user's questions...
Yannick Kilcher ▷ #ml-news (16 messages🔥):
Claude Pokemon, AI Moore's Law, Hunyuan-T1 model
- Claude's Pokemon Prowess Questioned: Members express skepticism about Claude's ability, noting it is "quite garbage at Pokemon", questioning its capabilities despite general improvements.
- Moore's Law for AI Agents: Discussion ensues on METR_Evals' research suggesting "Moore’s Law for AI agents", where the length of tasks AIs can do is doubling about every 7 months.
- Some members dismiss the related chart as "actual bullshit", arguing that certain tasks, like training a classifier or optimizing chip creation, shouldn't be interesting for probabilistic models.
- Hunyuan-T1 Launches for Reasoning: Hunyuan-T1, powered by Hunyuan TurboS, features a Hybrid-Mamba-Transformer MoE architecture and is designed for speed, accuracy, and efficiency, according to Tencent.
- The new model boasts low hallucination in summaries and excels in long-text processing, as featured in the Hunyuan-T1 HuggingFace demo.
- Tweet from Hunyuan (@TXhunyuan): 🚀 Introducing Hunyuan-T1! 🌟Meet Hunyuan-T1, the latest breakthrough in AI reasoning! Powered by Hunyuan TurboS, it's built for speed, accuracy, and efficiency. 🔥✅ Hybrid-Mamba-Transformer MoE A...
- Tweet from METR (@METR_Evals): When will AI systems be able to carry out long projects independently?In new research, we find a kind of “Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 mont...
- Tweet from METR (@METR_Evals): We then fit a curve that predicts the success rate of an AI based on how long it took humans to do each task. This curve characterizes how capable an AI is at different task lengths. We then summarize...
- Tweet from bycloud (@bycloudai): > mamba-transformer hybrid reasoning model near on par with DeepSeek-R1whatQuoting Hunyuan (@TXhunyuan) 🚀 Introducing Hunyuan-T1! 🌟Meet Hunyuan-T1, the latest breakthrough in AI reasoning! Powere...
LlamaIndex ▷ #blog (1 messages):
Local RAG app, GitIngest parsing, Streamlit UI, Ollama Llama 3.2
- Fully Local RAG App Aces Code Chat: A fully local, fully open-source RAG app that can chat with your code has been built by a LlamaIndex community member, and announced in a tweet.
- GitIngest Parses for Streamlit Display: The app uses GitIngest to parse the code into summaries and markdown, using Streamlit for the UI, with details available at this link.
- Ollama Runs Llama 3.2 Locally: It runs Meta's Llama 3.2 locally using Ollama.
LlamaIndex ▷ #general (13 messages🔥):
LlamaIndex TypeScript Agent Import Issue, Agent Workflow Parallel Execution Limits, Human-in-the-Loop Tool Limitations
- TypeScript Agent Import Issue Resolved: A member using LlamaIndex TS had an issue importing agent, but it was resolved by updating the tsconfig bundler configuration.
- The user confirmed that modifying the TS config resolved the import error, and thanked the community for the suggestion.
- Limiting Parallel Executions in Agent Workflows: A member asked about limiting parallel executions in Agent Workflows, specifically for a tool with a human-in-the-loop event.
- They noted that the agent was calling the tool multiple times in parallel, causing issues, and they wanted to ensure the tool was called only once at a time; the issue was replied on GitHub.
- Seeking Solutions for Human-in-the-Loop Tool Constraints: A member encountered issues due to parallel calls to a human-in-the-loop tool within an agent workflow and sought ways to limit executions.
- The user is experiencing funky issues when an agent workflow tool is called many times in parallel, and is awaiting assistance on a related GitHub issue.
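One generic workaround for the parallel-call problem above (a hedged sketch, not from the thread or an official LlamaIndex recipe) is to guard the human-in-the-loop tool with an asyncio lock so concurrent calls queue up instead of overlapping:

```python
import asyncio
from llama_index.core.tools import FunctionTool

_lock = asyncio.Lock()

async def ask_human(question: str) -> str:
    """Ask the human operator a question and return their answer."""
    async with _lock:  # only one human prompt active at a time
        return input(f"{question}\n> ")

human_tool = FunctionTool.from_defaults(async_fn=ask_human)
```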
Link mentioned: [Question]: Parallel Human in Loop with Agent Workflow Issues · Issue #18220 · run-llama/llama_index: Question Validation I have searched both the documentation and discord for an answer. Question Searching and debugging a long time to find a solution. Thanks for any help! When an agent workflow is...
Cohere ▷ #「💬」general (4 messages):
Trial Key Limits, Command-A Training Data
- Trial Key Limit is per Account: A member asked if the monthly limit of 1k requests for trial keys is per key or per account, and another member clarified that it is per account.
- They added that trying to bypass this by making multiple accounts will result in removal of all accounts.
- Inquiry About Command-A's Training Data: A member asked about the cut-off date for Command-A’s training data.
Cohere ▷ #「🔌」api-discussions (4 messages):
Cohere API Errors, Rate Limiting, Checking Rate Limits
- Cohere API throws common errors: Users discussed various Cohere API error messages, including invalid request, rate limiting, and token limits.
- The errors covered a range of issues, such as empty documents, short prompts, exceeding token limits, and incorrect model specifications.
- Rate Limiting Discussed: Members noted that rate limiting errors can be identified by a 429 status code in the response, as detailed in the Cohere API documentation (a retry sketch follows this list).
- One user noted their code crashed due to not getting a response.
- Checking API rate limits: A user asked if there was a way to check their rate limit to see how much they have left.
- No resolution was given for checking rate limits via an API.
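A hedged sketch of handling the 429 case with exponential backoff; the endpoint, model id, and payload shape are assumptions based on Cohere's public chat API, not taken from the discussion.

```python
import time
import requests

def chat_with_retry(api_key: str, message: str, retries: int = 5) -> dict:
    url = "https://api.cohere.com/v2/chat"  # assumed v2 chat endpoint
    headers = {"Authorization": f"Bearer {api_key}"}
    body = {"model": "command-a-03-2025",
            "messages": [{"role": "user", "content": message}]}
    for attempt in range(retries):
        resp = requests.post(url, headers=headers, json=body, timeout=60)
        if resp.status_code == 429:   # rate limited: back off and retry
            time.sleep(2 ** attempt)
            continue
        resp.raise_for_status()       # surface other error codes
        return resp.json()
    raise RuntimeError("rate limit retries exhausted")
```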
Link mentioned: Errors (status codes and description) — Cohere: Understand Cohere's HTTP response codes and how to handle errors in various programming languages.
Cohere ▷ #「🤖」bot-cmd (3 messages):
Bot Permissions
- Bot Permissions Issue: A user mentioned that the bot might have permissions issues on the channel.
- User Greets the Bot: A user greeted another user. No specific AI discussion or links were shared.
Cohere ▷ #「🤝」introductions (2 messages):
Introductions, Low-code tech, Community Engagement
- Hospitality Expert Joins Cohere!: Gaby, a professional in the hospitality industry, introduces herself as a low-code tech enthusiast, proficient with platforms like Make and Adalo.
- She expresses her eagerness to learn from fellow community members and contribute her own experiences.
- Low-Code Tech Takes Center Stage: Gaby's introduction highlights the growing importance of low-code tools in various industries, showcasing their accessibility and potential.
- Her expertise in Make and Adalo could provide valuable insights for others exploring similar technologies within the Cohere community.
Modular (Mojo 🔥) ▷ #mojo (12 messages🔥):
Duration Module Proposal, Mojo and PyTorch Integration, Nanosecond Precision as Base Unit
- Duration Module Proposal Reveals Weird Behavior: A member encountered a peculiar issue while working on a `duration` module proposal, specifically with type casting involving `Ratio` and `Duration` structs, and shared a code snippet exhibiting the unexpected behavior.
- PyTorch and Mojo Speed Boost?: A user inquired about the possibility of using PyTorch in Mojo and whether it could accelerate the training process with MAX.
- There was no response in the provided messages, so this idea remains an open question in this channel.
- Time Flies with Nanosecond Precision: A member suggested using nanosecond precision as the base unit for time, noting that a `UInt64` count of nanoseconds covers over 500 years, which should be sufficient (see the quick check after this list).
- Another member pointed out that C++ guarantees a default time resolution that can represent at least 292 years, also noting that seconds is the base SI unit for time.
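A quick check of those figures (my arithmetic): an unsigned 64-bit nanosecond count gives

```latex
2^{64}\ \text{ns} \approx 1.84\times10^{19}\ \text{ns} \approx 1.84\times10^{10}\ \text{s} \approx 584\ \text{years},
```

and a signed 64-bit count halves the range to roughly ±292 years, matching the C++ `<chrono>` guarantee cited above.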
DSPy ▷ #general (1 messages):
MIPRO v2, LLM-as-a-judge, Automatic Metrics, DSPy Optimization, Evaluation Metrics
- MIPRO v2 Used with LLM-as-a-Judge: A member mentioned using MIPRO v2 with LLM-as-a-judge as their metric for evaluation, pointing to a math reasoning tutorial.
- The math reasoning tutorial provides an example of using MIPRO as a metric.
- LLM-as-a-Judge Documentation Shared: Documentation on using LLM-as-a-judge was shared from DSPy's learning resources.
- The documentation provides info on using AI feedback for metric evaluations.
- Automatic Metrics Crucial for DSPy: It was highlighted that automatic metrics are essential for evaluation and optimization in DSPy.
- DSPy leverages metrics to track progress and enhance the effectiveness of programs.
- Metrics Defined for Task Evaluation: A metric is defined as a function that scores system outputs based on data examples.
- Simple tasks may use basic metrics like accuracy or exact match, while complex tasks require metrics that check multiple output properties using AI feedback.
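A minimal sketch of such a metric and its use with MIPRO v2; the program signature, field names, and training set are illustrative assumptions.

```python
import dspy

def exact_match(example, pred, trace=None):
    # Score 1 when the predicted answer matches the gold answer.
    return example.answer.strip().lower() == pred.answer.strip().lower()

program = dspy.ChainOfThought("question -> answer")
optimizer = dspy.MIPROv2(metric=exact_match, auto="light")
optimized = optimizer.compile(program, trainset=trainset)  # 'trainset' assumed
```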
- Reasoning - DSPy: The framework for programming—rather than prompting—language models.
- Metrics - DSPy: The framework for programming—rather than prompting—language models.
tinygrad (George Hotz) ▷ #general (1 messages):
Unet3d model, 2D Convolutions
- Unet3d Model uses 2D Convolutions: A member questioned whether the example unet3d model is truly 3D, suggesting it resembles 2.5D due to its reliance on 2D convolutions and 2D transposes on 3D input.
- The member emphasized the distinction from a genuine 3D Unet architecture.
- 2D vs 3D Unet Architectures: The discussion highlighted the difference between using 2D convolutions on 3D input (resulting in a 2.5D effect) and employing true 3D Unet architectures with 3D operations.
- The user sought clarification on the implementation's dimensionality.
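To make the 2D-vs-3D distinction concrete, a hedged illustration (in PyTorch for brevity; not the example model's actual code):

```python
import torch
import torch.nn as nn

vol = torch.randn(1, 1, 16, 64, 64)  # (N, C, D, H, W) volumetric input

# True 3D convolution: the kernel spans depth, mixing information across slices.
out3d = nn.Conv3d(1, 8, kernel_size=3, padding=1)(vol)
print(out3d.shape)  # torch.Size([1, 8, 16, 64, 64])

# "2.5D": fold depth into the batch and convolve each slice independently;
# nothing mixes across depth inside the convolution itself.
slices = vol.permute(0, 2, 1, 3, 4).reshape(-1, 1, 64, 64)
out2d = nn.Conv2d(1, 8, kernel_size=3, padding=1)(slices)
print(out2d.shape)  # torch.Size([16, 8, 64, 64])
```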
Torchtune ▷ #papers (1 messages):
krammnic: I like this: https://arxiv.org/pdf/2502.07923