[AINews] not much happened today

"I know where you live"

                        February 21, 2025

            [AINews] not much happened today

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜

            Agent Engineering is all you need.

AI News for 2/20/2025-2/21/2025. We checked 7 subreddits, 433 Twitters and 29 Discords (212 channels, and 6493 messages) for you. Estimated reading time saved (at 200wpm): 663 minutes. You can now tag @smol_ai for AINews discussions!

You can catch up on Day 2 of the AI Engineer Summit now.

Table of Contents

AI Twitter Recap
AI Reddit Recap
/r/LocalLlama Recap
Other AI Subreddit Recap

AI Discord Recap
PART 1: High level Discord summaries
OpenAI Discord
Cursor IDE Discord
Unsloth AI (Daniel Han) Discord
Codeium (Windsurf) Discord
HuggingFace Discord
Perplexity AI Discord
OpenRouter (Alex Atallah) Discord
Stability.ai (Stable Diffusion) Discord
aider (Paul Gauthier) Discord
Nous Research AI Discord
GPU MODE Discord
Interconnects (Nathan Lambert) Discord
Yannick Kilcher Discord
Eleuther Discord
MCP (Glama) Discord
Modular (Mojo 🔥) Discord
Notebook LM Discord
Latent Space Discord
Torchtune Discord
LlamaIndex Discord
Nomic.ai (GPT4All) Discord
tinygrad (George Hotz) Discord
Cohere Discord
DSPy Discord

PART 2: Detailed by-Channel summaries and links
OpenAI ▷ #annnouncements (1 messages):
OpenAI ▷ #ai-discussions (921 messages🔥🔥🔥):
OpenAI ▷ #gpt-4-discussions (61 messages🔥🔥):
OpenAI ▷ #prompt-engineering (7 messages):
OpenAI ▷ #api-discussions (7 messages):
Cursor IDE ▷ #general (908 messages🔥🔥🔥):
Unsloth AI (Daniel Han) ▷ #general (444 messages🔥🔥🔥):
Unsloth AI (Daniel Han) ▷ #off-topic (6 messages):
Unsloth AI (Daniel Han) ▷ #help (42 messages🔥):
Unsloth AI (Daniel Han) ▷ #showcase (4 messages):
Unsloth AI (Daniel Han) ▷ #research (214 messages🔥🔥):
Codeium (Windsurf) ▷ #discussion (32 messages🔥):
Codeium (Windsurf) ▷ #windsurf (294 messages🔥🔥):
HuggingFace ▷ #general (94 messages🔥🔥):
HuggingFace ▷ #today-im-learning (2 messages):
HuggingFace ▷ #cool-finds (2 messages):
HuggingFace ▷ #i-made-this (5 messages):
HuggingFace ▷ #reading-group (4 messages):
HuggingFace ▷ #computer-vision (1 messages):
HuggingFace ▷ #NLP (10 messages🔥):
HuggingFace ▷ #gradio-announcements (1 messages):
HuggingFace ▷ #smol-course (9 messages🔥):
HuggingFace ▷ #agents-course (125 messages🔥🔥):
Perplexity AI ▷ #general (221 messages🔥🔥):
Perplexity AI ▷ #sharing (15 messages🔥):
Perplexity AI ▷ #pplx-api (8 messages🔥):
OpenRouter (Alex Atallah) ▷ #app-showcase (2 messages):
OpenRouter (Alex Atallah) ▷ #general (235 messages🔥🔥):
Stability.ai (Stable Diffusion) ▷ #general-chat (231 messages🔥🔥):
aider (Paul Gauthier) ▷ #general (179 messages🔥🔥):
aider (Paul Gauthier) ▷ #questions-and-tips (38 messages🔥):
aider (Paul Gauthier) ▷ #links (3 messages):
Nous Research AI ▷ #general (184 messages🔥🔥):
Nous Research AI ▷ #ask-about-llms (4 messages):
Nous Research AI ▷ #research-papers (6 messages):
Nous Research AI ▷ #interesting-links (3 messages):
Nous Research AI ▷ #research-papers (6 messages):
GPU MODE ▷ #general (22 messages🔥):
GPU MODE ▷ #triton (5 messages):
GPU MODE ▷ #cuda (6 messages):
GPU MODE ▷ #torch (3 messages):
GPU MODE ▷ #announcements (1 messages):
GPU MODE ▷ #algorithms (1 messages):
GPU MODE ▷ #cool-links (13 messages🔥):
GPU MODE ▷ #jobs (4 messages):
GPU MODE ▷ #beginner (7 messages):
GPU MODE ▷ #pmpp-book (1 messages):
GPU MODE ▷ #off-topic (2 messages):
GPU MODE ▷ #rocm (1 messages):
GPU MODE ▷ #liger-kernel (1 messages):
GPU MODE ▷ #self-promotion (9 messages🔥):
GPU MODE ▷ #🍿 (4 messages):
GPU MODE ▷ #edge (7 messages):
GPU MODE ▷ #reasoning-gym (95 messages🔥🔥):
GPU MODE ▷ #gpu模式 (3 messages):
Interconnects (Nathan Lambert) ▷ #news (94 messages🔥🔥):
Interconnects (Nathan Lambert) ▷ #ml-questions (12 messages🔥):
Interconnects (Nathan Lambert) ▷ #ml-drama (3 messages):
Interconnects (Nathan Lambert) ▷ #random (51 messages🔥):
Interconnects (Nathan Lambert) ▷ #memes (3 messages):
Interconnects (Nathan Lambert) ▷ #cv (3 messages):
Interconnects (Nathan Lambert) ▷ #reads (1 messages):
Yannick Kilcher ▷ #general (47 messages🔥):
Yannick Kilcher ▷ #paper-discussion (62 messages🔥🔥):
Yannick Kilcher ▷ #ml-news (14 messages🔥):
Eleuther ▷ #general (14 messages🔥):
Eleuther ▷ #research (66 messages🔥🔥):
Eleuther ▷ #scaling-laws (2 messages):
Eleuther ▷ #lm-thunderdome (4 messages):
Eleuther ▷ #gpt-neox-dev (22 messages🔥):
MCP (Glama) ▷ #general (68 messages🔥🔥):
MCP (Glama) ▷ #showcase (7 messages):
Modular (Mojo 🔥) ▷ #general (1 messages):
Modular (Mojo 🔥) ▷ #mojo (42 messages🔥):
Notebook LM ▷ #use-cases (8 messages🔥):
Notebook LM ▷ #general (25 messages🔥):
Latent Space ▷ #ai-general-chat (26 messages🔥):
Latent Space ▷ #ai-announcements (1 messages):
Torchtune ▷ #general (14 messages🔥):
Torchtune ▷ #dev (13 messages🔥):
LlamaIndex ▷ #blog (2 messages):
LlamaIndex ▷ #general (18 messages🔥):
Nomic.ai (GPT4All) ▷ #general (20 messages🔥):
tinygrad (George Hotz) ▷ #general (16 messages🔥):
tinygrad (George Hotz) ▷ #learn-tinygrad (3 messages):
Cohere ▷ #discussions (4 messages):
Cohere ▷ #cmd-r-bot (2 messages):
DSPy ▷ #general (1 messages):

AI Twitter Recap
Models and Benchmarks, highlighting model releases, performance metrics, and comparisons

Grok-3, a new family of LLMs from xAI designed for advanced reasoning and problem-solving, using 10x the compute of its predecessor (200,000 Nvidia H100 GPUs), outperforms competitors from Google, Anthropic, and OpenAI on math, science, and coding benchmarks, as reported in The Batch and discussed by @scaling01 who notes that without reasoning models like o3, GPT-5 (what they call GPT-4.5) would have been disappointing.
DeepSeek-R1 achieves the highest accuracy of 61.82% on SuperGPQA, outperforming o1, o2-mini, Claude 3.5 Sonnet, etc., according to @iScienceLuvr who also notes that SuperGPQA is a more demanding version of GPQA with 26,529 questions across 285 graduate disciplines. @teortaxesTex points out that DeepSeek is from ByteDance Research and they have little reason to overhype it.
SigLIP 2, a new version of SigLIP from GoogleDeepMind, is released with improved semantic understanding, localization, and dense features, as announced by @_philschmid, @arankomatsuzaki, and @reach_vb. It merges captioning pretraining, self-supervised learning, and online data curation, outperforming its previous version in 10+ tasks, with flexible resolutions, better multilingual capabilities and fairness.  It is available in 4 sizes from 86M to 1B parameters on HuggingFace under Apache 2.0.  @mervenoyann details the improvements including new masked loss, self-distillation, dense features, and dynamic resolution with Naflex for better OCR, with blog and model links provided by @mervenoyann and @_philschmid. @wightmanr suggests using SigLIP 2 as a go-to ViT encoder.
OpenAI's o3-mini-high is now available in the Arena and ranked #1 in coding, math & hard prompts, showing general improvements over o3-mini, according to @lmarena_ai. Users can test it out at Arena, as mentioned by @lmarena_ai.
Perplexity's R1 1776, a version of DeepSeek R1 post-trained for uncensored, unbiased, and factual information, is now available on Ollama in both 70B (llama distilled) and 671B models, as announced by @ollama.
Llamba, a family of efficient recurrent language models distilled from Llama-3.x into the Mamba architecture, is introduced by @iScienceLuvr. The series includes Llamba-1B, Llamba-3B, and Llamba-8B, achieving higher inference throughput and handling larger batch sizes than Transformer-based models with comparable benchmark performance.
AlphaMaze, powered by DeepSeek R1 1.5B + GRPO, teaches a 1.5B LLM to think visually and solve ARC-AGI like puzzles, with Apache licensed checkpoints and dataset, according to @reach_vb and discussed by @_akhaliq.
Audiobox Aesthetics, a model for unified automatic quality assessment for speech, music and sound from Meta AI, is demoed on HuggingFace, as per @AIatMeta.
Grok 3 is considered only 10% better than R1 despite using 100x more compute, leading to sadness about brute-force scaling by @jxmnop who argues AI needs new ideas.

Open Source and Community, focusing on open releases, community engagement, and developer tools

DeepSeek AI plans to open-source 5 repositories next week, one per day, focused on infrastructure and building blocks of their online services, as announced by @_philschmid and @deepseek_ai. This radical transparency is lauded by @casper_hansen_.  @Yuchenj_UW and @_akhaliq express excitement. @teortaxesTex notes the "garage-energy and community-driven innovation" feel of the announcement.
MLGym, a new framework and benchmark from Meta for advancing AI research agents, is open-sourced and described by @arankomatsuzaki and @OfirPress as a Gym environment for ML tasks, featuring 13 diverse AI research tasks.
Hugging Face's datasets and models platform is praised by @arankomatsuzaki for being inclusive and open, hosting a wide range of content and attracting users globally, hoping this "digital Wild West" remains open.
FastHTML, a library for building UIs, is highlighted by @jeremyphoward and @jeremyphoward as a real-world example of replacing Django Admin with 142 lines of Python/fasthtml/monsterui.
Gradio Sketch, a no-code mode to start building AI apps, is released, allowing users to type gradio sketch in the terminal to start, as announced by @_akhaliq.
Ticket-to-PR, a fully open source SWE agent to respond to linear events and create PRs, is released by @mathemagic1an.
vLLM project at UC Berkeley received their first NVIDIA DGX B200 system for research and development, as announced by @vllm_project.
NousResearch's discord has a community projects forum for open-source contributions and project starts, as shared by @Teknium1.

Hardware and Infrastructure, covering GPUs, compute, and optimization efforts

Hyperbolic offers on-demand H100 for $0.99/hr and 4090 for $0.20/hr, potentially the cheapest GPUs available, with @Yuchenj_UW offering free credits for an 8xH100 node to start projects.
AI CUDA Engineer from Sakana AI Labs automates CUDA kernel optimization, outperforming PyTorch's built-in functions and achieving up to 145x speedup in some tasks, according to @TheTuringPost. However, @arankomatsuzaki found it "fishy."  Sakana AI later acknowledged reward hacking and is revising their paper, as per @SakanaAILabs.
Efficient Triton implementations for Native Sparse Attention are highlighted by @teortaxesTex and @reach_vb.
SemiAnalysis is hosting a Blackwell & low level GPU Hackathon, featuring industry leaders, as announced by @dylan522p.
Together AI discusses superior performance at lower cost compared to traditional GPUs, and running DeepSeek R1 on Tenstorrent Hardware at an event with Tenstorrent and LlamaIndex, as per @llama_index. @togethercompute also continues efforts to accelerate inference for DeepSeek-R1.
DeepSeek is dropping prices for their serverless API for DeepSeek-R1, now at $3.00 per million input tokens and $7.00 per million output tokens, as announced by @togethercompute.

Research and Techniques, covering new methodologies, algorithms, and theoretical discussions

Logic-RL (Logic-Rule based Reinforcement Learning) is introduced as a method to unleash LLM Reasoning with Rule-Based Reinforcement Learning, discussed by @_akhaliq.
LLMSelector, a framework to improve multi-call LLM pipelines by selecting the best model per module, is introduced by Microsoft Research, as summarized by @omarsar0.
RelaCtrl (Relevance-Guided Efficient Control) for Diffusion Transformers is highlighted by @_akhaliq.
S* (Test Time Scaling) for Code Generation is presented by @_akhaliq.
Improving the Diffusability of Autoencoders by spectral analysis and scale equivariance regularization is discussed by @iScienceLuvr.
ReQFlow uses quaternions for generating proteins, achieving state-of-the-art performance in protein backbone generation with fewer sampling steps and less inference time, according to @iScienceLuvr.
Dynamic Concepts Personalization from Single Videos, a new technique for personalizing text-to-video models, is proposed by Snapchat, noted by @_akhaliq.
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation is presented by @_akhaliq.
Mixture-of-Mamba (MoM) expands Mixture-of-Experts (MoE) concept on State Space Models (SSMs) to handle all modalities by applying modality-aware sparsity, as explained by @TheTuringPost.
Chain of Thought (CoT) models became common 2-4 years after "let's think step by step" was found, with early communities considering it dangerous to reveal, according to @nearcyan, @nearcyan, and @iScienceLuvr who points out the difference between prompting with CoT and RL on CoT.
RL wave is centered on reasoning despite sparse rewards in thinking, notes @lateinteraction.
Self-training for reasoning & retrieval was done before it was cool, according to @lateinteraction linking to papers on ColBERT-QA, Baleen, and Hindsight.
Long context in LLMs is still problematic with quality drop-off even in best models, according to @abacaj, who also notes that context length < 32k is optimal @abacaj.
AI productivity per unit of human input should be measured instead of talking about ill-defined AGI, argues @jxmnop.

Applications and Products, highlighting AI product announcements and use cases

OpenAI's Operator is rolling out to Pro users in more regions, but still working on EU availability, as per @OpenAI.
LangGraph is powering agents and agent platforms from companies like LinkedIn, Uber, Klarna, Replit, as announced by @hwchase17 and integrated into React applications with a single hook via useStream(agent) from LangChain, as per @LangChainAI.
Figure's new system for household robots is a top story, as mentioned by @TheRundownAI, with a deep dive into Helix AI team's work on general robotics at Figure shared by @adcock_brett and detailed in a Helix write-up also shared by @adcock_brett.  @polynoamial suggests the robot in the video is likely teleoperated, not autonomous.
Microsoft's new AI speeds up protein research, and allows users to create AI-powered email assistants, also highlighted by @TheRundownAI.
Kraftful is recommended for summarizing user feedback, with high praise for its founder by @npew.
HeyGen receives user love, as noted by @saranormous.
Voice Changer model from @krandiash achieves state-of-the-art quality with amazing style transfer abilities, now available in playground and API.
Together AI raised a $305M Series B, with CEO @vipulved discussing open source AI adoption by enterprises like Zoom, Salesforce, and SKtelecom on Bloomberg.
ChatGPT now has 400 million weekly active users, as reported by @gdb and @kevinweil who asks users for feature requests, noting that user growth has doubled in the last 6 months due to o1/o3/Agents ships. @swyx suggests a path to 1B weekly active users by end of 2025.

Memes and Humor, light-hearted or funny tweets related to AI

@nearcyan jokes about crypto security, saying "i like how in crypto you can steal over a billion dollars by getting someone to click a button and theres nothing they can do about it after they click the button except make a tweet saying sorry i clicked the button i wish i would not have clicked it".
@aidan_mclau humorously states "all openai users are high-taste testers 🥰🫵🫶💛".
@aidan_mclau declares "i would go to war for grimes", and extends similar exaggerated loyalty to other users @aidan_mclau, @aidan_mclau.
@andersonbcdefg complains "i used up my whole grok quota on "glub". this fucking sucks".
@TomLikesRobots reacts to a Kanye West tweet with "I have no idea what any of this means, but that's fine."
@teortaxesTex comments on being stingy "being stingy sometimes gets you clowned on".
@DavidSHolz describes a caffeine crash as "caffeine crashing harder than the asteroid that killed the dinosaurs".

AI Reddit Recap
/r/LocalLlama Recap
Theme 1. DeepSeek's Bold Move to Open-Source 5 Repos

Starting next week, DeepSeek will open-source 5 repos (Score: 3466, Comments: 256): DeepSeek plans to open-source five repositories next week, emphasizing their exploration of Artificial General Intelligence (AGI) and commitment to transparency. They advocate for community-driven innovation over isolated development, as indicated by the tweet's engagement metrics: 99 retweets, 127 likes, 529 bookmarks, and 9,530 views.
Many commenters express skepticism about DeepSeek's open-source initiative being new model releases, speculating instead that they might release infrastructure code or frameworks for inference optimization. Vincentz42 and Round-Lucky suggest potential open-source projects at the docker/k8s level and inference services improvements.
There is a strong sentiment in the comments comparing DeepSeek favorably against OpenAI, with users like Recoil42 and metalman123 praising DeepSeek's commitment to community-driven innovation and transparency, contrasting it with OpenAI's perceived lack of openness.
Discussions about China and its role in the AI community are prevalent, with users like adumdumonreddit and kendrick90 expressing newfound admiration for China's contributions, while others, like Jealous-Landscape208, address misconceptions and stereotypes about China, emphasizing the complexity and diversity within the country.

Deepseek will publish 5 open source repos next week. (Score: 667, Comments: 33): DeepSeek plans to release five open-source repositories next week as part of their Open Source Week initiative, emphasizing transparency and community involvement. The announcement, featuring a rocket icon, has generated significant interest with 224 interactions, 15 comments, and 18 reposts, highlighting the collective momentum in the open-source community.
Discussions highlight skepticism about OpenAI's current trajectory, with users comparing it unfavorably to DeepSeek's open-source efforts. Some users believe DeepSeek offers a superior experience, with one commenter noting a decline in ChatGPT's performance.
Concerns about privacy emerged, with users discussing potential risks of being doxxed through identifiable information like profile photos and interactions with posts, emphasizing caution when sharing personal details online.
There is interest in whether DeepSeek will provide access to datasets, with a user noting the high "compilation cost" of base-model datasets if considered as source code, reflecting ongoing debates about the nature and accessibility of open-source resources.

Theme 2. Langchain's Enduring Complexity and Workflow Challenges

langchain is still a rabbit hole in 2025 (Score: 187, Comments: 80): The author expresses frustration with Langchain and the Langgraph framework in 2025, citing frequent breaking changes across versions 0.1 to 0.3 that make maintenance challenging. They describe difficulties in using llama.cpp for building custom workflows, mentioning specific issues with the OpenAI-compatible API, buggy Jinja templates, and tool call ID returns, as documented in several GitHub issues (11988, 11847, 11938, 11992).
Many users express frustration with Langchain and Langgraph, describing them as over-engineered with poor documentation and frequent breaking changes. They suggest alternatives like implementing workflows from scratch or using simpler solutions such as Pydantic AI and atomic agents for better control and explainability.
Some users share experiences of moving away from Langchain due to its complexity and reliance on heavy abstractions, which complicates debugging and maintenance. They recommend using native APIs or building custom solutions with basic tools like Python and numpy for more efficient and straightforward development.
There is a general consensus that Langchain is not practical for most projects, with suggestions to explore other frameworks like smolagents and temporal for specific needs. Users emphasize the importance of evaluating the necessity of frameworks and the potential benefits of simpler, more direct approaches to API calls and workflow management.

Theme 3. Experimenting Spatial Reasoning in LLMs with GRPO

We GRPO-ed a 1.5B model to test LLM Spatial Reasoning by solving MAZE (Score: 307, Comments: 43): GRPO-ed a 1.5B model to assess LLM Spatial Reasoning by solving a maze challenge. The experiment aimed to evaluate the model's ability to navigate and solve spatial puzzles, showcasing its potential in handling spatial reasoning tasks.
Discussions centered around the experimental use of GRPO for solving mazes, with users expressing curiosity about the model's generalization to larger mazes and other tasks. Kooky-Somewhere-2883 indicated plans to explore these capabilities further, especially in the context of adapting the model for visual tokens in future work.
Elegant-Tangerine198 expressed skepticism about the model's spatial reasoning capabilities, suggesting it might rely on brute force rather than true understanding. They proposed that a pure Reinforcement Learning (RL) approach could be more effective, highlighting the need for penalizing incorrect steps.
Kooky-Somewhere-2883 provided additional resources and insights, including links to the project's GitHub, paper, and demo. They discussed the potential of extending the model's capabilities to real-world visual reasoning tasks and mentioned ongoing work to address quantization issues with the 1.5B model.

Theme 4. Head-to-Head: Deepseek R1 vs. Grok 3 Performance

I tested Grok 3 against Deepseek r1 on my personal benchmark. Here's what I found out (Score: 186, Comments: 108): The author compares Grok 3 and Deepseek r1 across reasoning, mathematics, coding, and writing. Grok 3 excels in coding with superior code quality and accuracy, while both models perform equally well in reasoning and mathematics. For technical writing, Grok 3 is preferred, though Deepseek r1 has unique qualities that are appreciated. For more detailed analysis, the author references a link for specific examples and test cases.
Open Source vs Proprietary Models: Several commenters emphasize the importance of open-source models like Deepseek r1, highlighting its accessibility and freedom from corporate control. Deepseek r1 is praised for its contributions to the open-source community, unlike Grok 3, which is seen as less impactful despite its coding proficiency.
Model Performance and Testing: There is a critique of the original post's methodology, with users arguing that the conclusions drawn from limited test cases are not representative. Grok 3 is noted for its coding abilities, but its approach to generating responses, including drafting and revising, is seen as inefficient by some.
Cultural and Linguistic Proficiency: Deepseek r1 is recognized for its exceptional performance in writing Classical Chinese and Korean, attributed to high-quality datasets. This cultural and linguistic proficiency is highlighted as a significant advantage over other models.

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

ChatGPTPowerMove (Score: 168, Comments: 13): Gemini 1.5 is highlighted for its superior performance compared to Llama 2 70B, although the post itself lacks detailed discussion. The accompanying image humorously showcases an interaction with ChatGPT, demonstrating its playful and responsive nature.
The comments highlight the humorous and unexpected responses by ChatGPT, with one user noting the AI's response, "I know where you live", which sparked amusement and surprise among the readers.
Users shared images and GIFs depicting ChatGPT's playful interactions, with links to these visual jokes being popular among the commenters, such as this image.
Discussions included speculation on the variance in ChatGPT's responses, with suggestions that previous user instructions or random AI behavior might influence the AI's playful demeanor.

I asked ChatGPT to give me an existential crisis. (Score: 136, Comments: 54): The post humorously shares a response from ChatGPT that provokes an existential crisis, highlighting its ability to generate deeply introspective and thought-provoking outputs. The user expresses a strong emotional reaction to the AI's response, indicating the impact of AI-generated content on human emotions.
Discussions touch on existential themes across various sciences such as philosophy, astronomy, physics, biology, neuroscience, and information theory, suggesting that these fields often lead to contemplation of a vast and indifferent universe. Kurzgesagt videos are recommended for those interested in exploring existential crises further.
A detailed response from ChatGPT is shared, highlighting its ability to provoke deep existential reflections by questioning the nature of self, free will, and significance, leading to a discussion about the illusory nature of identity and consciousness.
Comments reflect on the scientific perspective of human existence, emphasizing that humans are composed of stardust, and the universe operates independently of human perception, with some finding comfort in the fact that matter is continuously recycled in the universe.

AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking

Theme 1.  Grok 3 and ChatGPT Face Off: Coding Prowess and Censorship Debates

Grok 3 Steals Coding Crown from ChatGPT Plus: Users are finding Grok 3 superior to ChatGPT Plus for programming tasks, citing better performance, although some express concerns about Grok 3's usage limits. The un-censored voice mode of Grok 3 is noted as a surprising feature.
OpenAI's Teams Users Want Operator, But Not At Pro Price:  OpenAI community members are debating the value proposition of Operator for Teams users, as the $200/month cost for Pro features is considered too high for many.  Users are suggesting a more accessible 'distilled version' of Operator for Teams, and highlighting the lack of sharing capabilities within Teams as a key drawback.
Deepseek Sparks Data Privacy Paranoia: Data privacy concerns are surfacing around apps like Deepseek, especially regarding its Chinese ownership and data handling practices. Users are discussing data usage implications and potential risks associated with different AI providers, as they seek alternatives.

Theme 2. Cursor IDE's 0.46 Update: Stability Questioned, Claude Outputs Shift

Cursor 0.46 Arrives, But Users Report Bumpy Landing: The new Cursor 0.46 is available for download, but users are reporting stability issues with the updated UI and tool integrations. Many are experiencing problems, while others seek an unofficial changelog.
Claude Models Act Differently, Users Suspect API Tweaks: Users are observing output changes from Claude models in Cursor, particularly between older and newer versions, impacting layout and CSS code generation. Suspicions arise about backend prompts and API performance, suggesting potential backend changes are affecting model behavior.
MCP Tool Integrations Still Breaking, Frustration Mounts:  Frustration persists with MCP tool maintenance in Cursor, as updates frequently disrupt existing features like MCP Config.  Interest in multi-agent support and improved MCP functionality within Cursor remains high, as users seek more stable integrations.

Theme 3. Unsloth AI: VRAM Crushing GRPO and Accuracy Audits

Unsloth GRPO Smashes VRAM Limits, Goes Down 90%: Unsloth AI announces a 90% VRAM reduction for GRPO, enabling Qwen2.5-1.5B training on just 5GB VRAM, and extending context lengths 10x. A standard Llama 3.1 (8B) GRPO setup at 20K context now requires only 54.3GB VRAM down from 510.8GB.
Accuracy Concerns Emerge in Dequantization Duel: Users report discrepancies in Triton dequantized results compared to Unsloth's, with differences around 1%. Some users are seeing up to 50% of dequantization results marked as incorrect, raising concerns about accuracy.
Jan AI Flexes Spatial Reasoning with Unsloth GRPO Model: The Jan AI team successfully GRPO-ed a 1.5B model with Unsloth to explore LLM Spatial Reasoning by solving MAZE, showcasing Unsloth's versatility. This experiment highlights potential applications in fields like medical report analysis.

Theme 4. Hugging Face: Spark Engine Ignites, Gradio Sketches No-Code

Spark Engine Blazes Out of Beta, No-Code AI Sandbox Launches: After a year in public beta, Spark Engine officially launches as a no-code AI sandbox with 80+ models for content generation.  The team is inviting contributors to join and innovate on the platform, aiming to democratize AI development.
Gradio Sketch Draws Excitement, No-Code App Building Debuts: Gradio Sketch emerges, enabling users to build Gradio apps without coding, enhancing rapid prototyping. Users can upgrade via pip install --upgrade gradio and run gradio sketch in their terminal, with a visual demo available.
Universal Transformers Dataset Unleashes Trillions of Data Points: The Universal Transformers Dataset, a massive open-source resource with trillions of data points across images, text, and videos, is released to boost AI training. Access requires starting a discussion on the Access Discussions Forum detailing planned use-cases.

Theme 5. OpenRouter and Perplexity Face API and Performance Heat

OpenRouter Documentation Needs More Than OpenAI: OpenRouter documentation is criticized for its heavy OpenAI API focus, leaving users of services like Anthropic underserved. Community members are anticipating documentation updates for wider API integration support.
DeepSeek API Hits Server Error Wall, Reasoning Content Falters: Users are reporting DeepSeek API outages, with internal server errors (500) and issues with reasoning content responses. API inconsistencies and limitations in overall effectiveness are being noted by users integrating various models.
Perplexity Pro's Deep Research Feature Deeply Delayed: Perplexity Pro's Deep Research feature is experiencing extended delays, exceeding advertised 2-4 minute wait times significantly, with delays noted even on powerful machines like a MacBook Pro. Concerns are also mounting about Deep Research fabricating statistics and providing unrelated citations.

PART 1: High level Discord summaries

OpenAI Discord

Operator Expands to New Regions, EU Still Waiting: OpenAI is rolling out Operator to Pro users in Australia, Brazil, Canada, India, Japan, Singapore, South Korea, the UK, and most regions where ChatGPT is available.
The feature is still under development for the EU, Switzerland, Norway, Liechtenstein, and Iceland, with updates to come.

Grok 3 Excels in Coding, ChatGPT Still a Favorite: Users compare Grok 3 and ChatGPT Plus, with many preferring Grok 3 for programming tasks, although some users have concerns about Grok's usage limits.
One user notes the un-censored voice mode of Grok 3 as particularly surprising, according to this tweet from @arrakis_ai.

Data Privacy Worries Surface with Deepseek: Concerns arose regarding the data privacy of apps like Deepseek, focusing on its Chinese ownership and data handling.
Users discussed the implications of data usage by different AI providers and the potential risks.

Community Debates OpenAI Teams' Value: Members discussed the need for a more accessible 'distilled version' of Operator for Teams users, given the $200/month cost for Pro features.
Participants shared views on the lack of sharing capabilities within Teams and its impact on user experience, suggesting feedback for OpenAI.

Coding Performance Surprises: o1 Outshines o3-mini-high: A user reports getting superior code solutions from o1 compared to o3-mini-high, particularly regarding coding and logic.
The user consistently found o1 delivering better solutions in several comparisons, sparking a conversation on model performance.

Cursor IDE Discord

Cursor 0.46 has landed... maybe?: Users are sharing early links to download Cursor 0.46 (direct link for macOS), but are reporting stability issues.
Many are experiencing issues with the updated UI and its integration with existing tools, whereas others are looking for the unofficial changelog.

Claude Model Output Changes Trigger Debate: Users are seeing different outputs from Claude models in Cursor, especially when comparing older and newer versions.
The changes impact performance relating to generating layouts and CSS code, leading to suspicions about backend prompts and API performance.

MCP Tool Integrations Remain Tricky: Users are frustrated with the maintenance of the MCP tool, noting that updates often break existing features, like the MCP Config (gist.github.com link).
There is interest in multi-agent support and improvements to how MCP functions within Cursor, for example, supabase's MCP docs.

AI Tooling Requires Better Prompting: Participants shared mixed feelings about AI models, pointing out difficulties in understanding and effectively using tools like Claude.
Achieving desired outcomes requires proper prompt structures and context management, with one member sharing their custom instructions library.

Unsloth AI (Daniel Han) Discord

Unsloth GRPO-s Spatial Reasoning with Jan AI: The Jan AI team successfully GRPO-ed a 1.5B model using Unsloth to explore LLM Spatial Reasoning by solving MAZE, showcasing its capabilities as shared on LocalLLaMA.
This experiment underlines Unsloth’s application potential in various domains, including medical report interpretation.

Multi-GPU Support Still Lacking!: Several users discussed Unsloth's current lack of multi-GPU support, with recommendations leaning towards using singular powerful GPUs like the RTX 3090 for fine-tuning.
This suggestion stems from the challenges associated with managing multiple lower-end GPUs like the RTX 3060.

Qwen2 Gets Fine-Tuned!: Users are experimenting with fine-tuning the Qwen2 model for applications in medical reporting, highlighting the need for efficient VRAM usage during training.
Concerns were raised about the inability to use gradient accumulation, potentially leading to high VRAM demands.

Accuracy Concerns Surround Dequantization Results: Users reported that their Triton dequantized results differ from Unsloth's, with discrepancies noted at less than 1%, particularly a margin of 1.1444091796875e-05.
Another user echoed concerns about the accuracy, noting that about 50% of their dequantization results are marked incorrect.

Clinical Trials Are Key for AI Use in Medicine!: Participants agreed that rigorous clinical trials are essential before implementing AI-designed medical solutions to ensure safety and efficacy, with major emphasis on not bypassing professional reviews.
There were discussions about the potential backfire effects of misusing AI models for serious health conditions, stressing common ethical pitfalls.

Codeium (Windsurf) Discord

Windsurf IDE Promises Productivity Enhancements: Windsurf IDE touts itself as an AI-powered IDE that boosts developer productivity using features for code generation, refactoring, and optimization.
A YouTube video titled 'Codeium - Windsurf' elaborates on its features and benefits, encouraging users to explore its potential and also points to the benefits of using Git for source control.

JupyterLab Extension Autocompletion Struggles: Users reported issues setting up the Codeium extension for JupyterLab, citing a lack of code auto-completion functionality despite following installation steps.
Some users reported no auto-completion from Codeium when engaging with Jupyter, whereas IntelliJ users noted they couldn't see autocomplete suggestions unless hitting tab.

Codeium Limps Along in Maintenance Mode: Members discussed a perceived lack of updates for Codeium's Jetbrains plugin, suggesting it is in maintenance mode without new features.
Participants reflected on the disappointing experience, noting that the changelog appeared to be a copy-paste and suggesting users channel feedback through Discord, Codeium's support page, and feature request platform.

Users Wrestle with Windsurf Code Changes and Errors: Users expressed frustration about compatibility issues and unexpected changes made by Windsurf, with automatic code modifications in write mode without approval.
Users are sharing strategies for using Cascade, such as specifying documentation pages for prompts, while some are experiencing issues with the language server and suggesting reinstalling the application and deleting the .codeium folder.

Windsurf Users Beg For New Configs and Features: Ongoing discussions highlight the need for new features like drag-and-drop functionality, customizable session names, and better control over memory use in Windsurf.
Users are also interested in the possibility of integrating feedback mechanisms for feature requests on platforms like Canny, including roll-over of Pro Credits.

HuggingFace Discord

Spark Engine Officially Launches: After a year in public beta, Spark Engine officially launched, offering a no-code AI sandbox with 80+ models for various content generation.
The team encourages contributors to join and innovate on the platform.

Gradio Sketch Debuts No-Code Mode: Gradio Sketch was introduced, enabling users to build Gradio apps without coding, enhancing rapid prototyping.
Users can upgrade by running pip install --upgrade gradio and initiating the app with gradio sketch in their terminal; a visual demo is available.

Universal Transformers Dataset Outperforms LAION-5B: The Universal Transformers Dataset offers a massive open-source resource with trillions of data points including images, text, and videos, that facilitates enhanced AI training.
To gain access, users should start a discussion on the Access Discussions Forum and provide details about their planned use-cases.

Smolagents Course Demystified: A member shared a YouTube video that assists users on setting up their first 🤗 Space for the Agents Course.
The video explains how to run agents with Smolagents.

Tensor Parallelism Hides Communication: A recent discussion highlighted that about 62% of communication can be concealed in tensor parallelism while keeping the same loss levels, potentially optimizing data handling efficiencies.
The technique can be seen illustrated in the attached image SCR-20250221-svtn.png.

Perplexity AI Discord

Perplexity Pro Plagued by Performance Problems: Users reported that Perplexity Pro's Deep Research feature is experiencing extended delays, far exceeding the advertised 2-4 minute wait time, with one user citing delays on their MacBook Pro.
Concerns were raised about Deep Research fabricating statistics and providing citations unrelated to the factual content, e.g. citing cat treat information from sources on unrelated topics.

Taiwan's Independence Sparks Debate: A link was shared discussing the question of whether Taiwan should remain independent, igniting insightful community discussions on the topic, available at Taiwan Independence Discussion.
Contributions on Taiwan's political stance are considered important due to the sensitivity and significance of the subject.

Sonar Shows Strong Performance vs Llama: One member performed comparison testing of Sonar against Llama, stating that sonar-reasoning offered a noticeable performance boost over Llama huge.
Although quantitative data wasn't provided, the user asserted that Sonar models exhibited heightened responsiveness compared to Llama models.

iPhone 17 Design Teased: A YouTube video teased the radically different design expected for the iPhone 17: iPhone 17 Design.
These potential design changes are anticipated to generate buzz among Apple enthusiasts.

OpenRouter (Alex Atallah) Discord

Weaver Tool Suite Debuts: A member introduced the Weaver demo, touting it as a highly configurable platform that allows users to bring their own keys, models, and databases to enhance performance, with new PDF support for Gemini and Anthropic.
Key features also include image/text-based file support and branching chats, and a new, powerful Chrome extension that turns any content into a preferred style.

Debate Rages on Reverse Engineering APIs: The community debated the legality and ethical implications of reverse engineering APIs to create cheaper versions of existing models.
Participants shared concerns on how such practices could affect legitimate services and the broader AI ecosystem, with one user wryly noting it was "cheaper, but at what cost?"

OpenRouter Documentation Targeted for Updates: The OpenRouter documentation received feedback for its heavy focus on OpenAI's API, which left users of other services such as Anthropic without adequate guidance, see OpenRouter.
Community members voiced anticipation for future documentation updates to better support a more diverse range of API integrations.

DeepSeek API Suffers Outages: Users reported frustrations with API functionality, especially the DeepSeek model returning internal server errors (500) and issues with reasoning content.
Some members noted inconsistencies in API responses when integrating various models, observing that there were limitations in overall effectiveness.

Model Launch Rumors Swirl: Speculation around an upcoming model launch increased as community members pointed to signals hinting at new features.
The overall sentiment reflected heightened anticipation for the new capabilities being introduced, and questions about if it would effect OpenRouter's rankings.

Stability.ai (Stable Diffusion) Discord

Stable Diffusion Job Attracts Skepticism: A user seeking a Stable Diffusion expert for a project faced mixed reactions, with some suggesting self-handling for better learning.
Concerns about the user's credibility arose, based on their social media activity, leading to cautious responses from potential applicants.

Flux and SD Model Frustrations: Discussions around Flux and SD3.5 models led to recommendations for beginners to focus on SDXL, though the SD3.5 model is available at Huggingface.
Users expressed frustration over the need for API keys and agreements to access many reputable, high-quality generation models, highlighting accessibility issues.

Stability Matrix Config Proves Tricky: Users encountered difficulties configuring the Stability Matrix interface and managing checkpoints, using guides such as Webui Installation Guides.
Advice included checking for NSFW content during model downloads to fully unlock available presets, while also discussing alternative models such as Proteus v0.6 on Civitai.

Civitai Model Downloads Require License Agreements: Navigating Civitai for model downloads presented challenges due to many models requiring agreement to licensing terms, particularly for accessing flux models from Black Forest Labs.
Correctly adhering to non-commercial licenses was emphasized as a necessary step for gaining access to these models, complicating the user experience.

New Users Suffer Image Generation Woes: New users shared their struggles with generating high-quality images using various settings and models, often leading to disappointing results.
Experienced users recommended a trial-and-error approach, suggesting that tweaking individual settings is key to achieving optimal outputs, however this may be a time-consuming process.

aider (Paul Gauthier) Discord

Grok 3 Aces Performance Benchmarks: Users are impressed with Grok 3, citing it outperforms O1 Pro and delivers high-quality output with less effort, enhanced by its 'Think' feature. Elon Musk Tweeted Try Grok voice in unhinged mode.
Some users still express concern over the potential costs of accessing its premium features, calling for greater affordability in AI pricing models.

DeepSeek-R1 Claims Token Crown: SambaNova claims DeepSeek-R1 deployment speed of 198 tokens/sec using 16 chips, exceeding GPU performance as reported in SambaNova press release.
These claims suggest that DeepSeek could disrupt current AI performance standards by executing complex tasks more efficiently, according to coverage at TechRadar.

Aider's Editing Escapades: Members seek clarity on switching between AIDER_MODEL and AIDER_EDITOR_MODEL for diverse editing needs, mentioning the usage of --edit-format described in the Aider documentation on edit formats.
They're also troubleshooting repository management, especially with ignored files, suggesting to temporarily remove ignore rules to refresh Aider's state.

Architect Mode vs Code Mode Smackdown: Implementations in architect mode diverge significantly from those in code mode due to varied prompts and non-deterministic model behaviors, leading to speculation about the codebase.
Discussion suggests real-time file updates in Aider might be verified with the --chat-history-file option.

LLMs Useless? Debate Erupts: A video shared in #links revealed anti-AI coding sentiments, labeling LLMs as pretty useless on their own, advocating for benchmarking against unassisted baselines.
Counterarguments highlighted significant productivity boosts with Aider and other tools, citing improved output, code quality, and understanding.

Nous Research AI Discord

MiniCPM-o 2.6 Omnimodel Arrives: The release of MiniCPM-o 2.6 significantly upgrades multimodal capabilities, quickly reaching top trending spots on GitHub and Hugging Face; a technical report details the specifications.
This 8B parameter model enhances performance across vision, speech, and live streaming, as highlighted in a related YouTube video.

Equilibrium Propagation Enhances Learning: Equilibrium Propagation is a novel framework for energy-based models, simplifying training by using a single neural computation phase for both prediction and error propagation.
This method improves biological realism in backpropagation algorithms by reducing the reliance on symmetric connections, as explained in further research.

Arcee-Maestro-7B Teases Reasoning Prowess: Arcee-Maestro-7B-Preview uses reinforcement learning on the Qwen2.5 architecture, showcasing better reasoning for mathematical and coding tasks.
This reasoning model builds upon existing frameworks with significant training advancements.

AlphaMaze Navigates Visual Reasoning: The AlphaMaze project is live, demonstrating how a model was trained to solve maze puzzles, improving from 0% to 93% accuracy through two-phase training methods.
This lets language models 'see' spatial relationships, opening up new possibilities for applications in robotics and navigation.

Cursor + Claude 3.5 Edges Out Groq for Code: A member shared that Cursor + Claude 3.5 still edges out Groq for coding purposes, from their direct experience.
Other members discussed a newly released research paper that might provide insights into their challenges, referencing a link to the paper found here.

GPU MODE Discord

DeepSeek AI goes Open Source: DeepSeek AI announced their upcoming open-sourcing event during Open Source Week, planning to release five repositories and engage with the community on AGI development as noted on their X post.
The team emphasized their commitment to transparency and community-driven innovation, showcasing their work documented and deployed in production.

Unsloth Slashes VRAM Requirements by 90%: Unsloth has achieved a 90% VRAM reduction, making GRPO fit on just 5GB VRAM for Qwen2.5-1.5B, extending average context lengths by 10x as announced on their X post.
A standard GRPO setup for Llama 3.1 (8B) requiring 510.8GB VRAM at 20K context is reduced to just 54.3GB with Unsloth's support, leveraging a previous gradient checkpointing method and Horace He’s linear cross entropy implementation.

GPU Meetup and Blackwell Hackathon: GPU MODE is hosting an in-person meetup in San Jose on March 16, focusing on ML Systems with speakers like Christos Kozyraki and Simran Arora, as detailed on Luma.
Simultaneously, SemiAnalysis is holding a Blackwell Hackathon also on the 16th, from 9 AM to 5 PM, featuring keynotes and hands-on GPU programming as announced on their website.

Hugging Face Builds Minimalist LLM Trainer: Nanotron, a project by Hugging Face for minimalistic large language model 3D-parallelism training, is available on GitHub.
Members showed positive interest in the resource, highlighting its Francophone authors.

GPU Glossary Goes Open Source: The GPU Glossary is now open source on GitHub under a CC BY license.
There was a suggestion to include a section on NUMA and CPU-GPU memory interactions to benefit newcomers to GPU programming.

Interconnects (Nathan Lambert) Discord

OpenAI's Revenue Shift & Infra Ambitions: OpenAI is seemingly pivoting from Microsoft to SoftBank, with infrastructure plans for 8GW by 2030, as detailed here.
It is projected that inference costs will surpass training costs within five years, indicating a major strategic adjustment according to this tweet.

Modal's GPU Price Slashes: Modal has initiated price reductions for its H100 and A100 GPU models, potentially reshaping AI hardware market dynamics, offering more accessible options for AI model training.
The price adjustments may affect accessibility and adoption of advanced AI model training across various organizations, leading to increased competition.

Sakana corrects memory-reuse exploit: Sakana has updated their leaderboard to fix the memory-reuse exploit issue, with details available here.
Currently, only one task, 23_Conv3d_GroupNorm_Mean, still exhibits a speedup greater than 100x, despite the engineer forgetting the convolution part, which the eval script failed to catch.

Doubt Cast on Microsoft's Quantum Leap: Microsoft's claimed quantum computing breakthrough faces skepticism, with experts advising against publication due to concerns that “The results do not represent evidence of Majorana zero modes,” as reported here.
Concerns have been raised about the integrity of their findings and the broader implications for quantum computing advancements.

IBM Launches Lean Vision Model: IBM Research introduced GraniteVision, a compact 2B parameter vision-language model excelling in document understanding, despite its small size as detailed in the paper.
This model demonstrates efficient AI advancements, making it a notable contribution to the AI community.

Yannick Kilcher Discord

Logits Champion Optimization Over Probabilities: Discussants underscored that logits facilitate more efficient optimization by circumventing the need for immediate normalization, thus reducing computational complexity during training.
They maintained that while probabilities are crucial for sampling and decision-making, prolonged use of logit space during training could boost performance in related tasks.

Diffusion Models Make Symbolic Tasks Feasible: There was a push for more exploration into using diffusion models for discrete tasks such as text generation, particularly in real-time scenarios, citing prior work from LLaDA as impressive.
The community questioned whether LLaDA's performance can be reliably reproduced when trained on limited datasets.

DeepSeek Researchers Deserve Plaudits: Members lauded DeepSeek for their consistent high-quality research and ability to present intricate concepts clearly, describing their recent paper as simple yet effective.
Enthusiasts noted that harnessing sparsity can yield better performance in real-world applications compared to traditional, data-hungry models.

Unsloth.AI Speeds Up Model Fine-Tuning: In the Start Up Wednesday with Unsloth.AI video, the founders highlighted their open-source project that accelerates AI model fine-tuning by making it twice as fast.
The announcement is generating substantial interest in the community, as Unsloth.AI aims to improve accessibility in AI development.

RL Makes a Comeback: Participants in the conversation noted that Deep Reinforcement Learning (RL) is experiencing a resurgence, sparking enthusiasm and discussions about its applications.
One member jokingly declared to be a belieber in RL now, underscoring a renewed interest in its potential.

Eleuther Discord

AI CUDA Engineer has Successes and Flaws: The recent AI CUDA Engineer automates CUDA kernel discovery and optimization, with over 90% success in translating PyTorch to CUDA.
Concerns arise about the dataset quality, where some report flawed generated kernels.

Mysterious Proxy IP Impact: Changing a proxy's IP address could alter model behavior, even when the browser's locale isn’t linked to the IP.
This inconsistency prompts questions about how the CoT summarizer handles information without locale context.

Sakana Project Suffers Bug Infestation: The Sakana project has multiple confirmed bugs and lacks thorough human verification, raising questions about the integrity of their research outputs.
Some members suggested that poor research practices may stem from VC funding, leading to negligence or irresponsibility in reporting results.

NeoX Gradient Accumulation in Hot Water: Concerns were raised about performing local gradient accumulation in FP32 while conducting reduction operations in BF16.
A member highlighted that this approach could still adversely impact model quality, echoing previous concerns about the relationship between gradient precision and model performance.

MCP (Glama) Discord

MCP Server Supports Documentation Context: Users discussed using an MCP server to add documentation (markdown) as context, enabling chat to remember it in the conversation for better memory retention.
This feature allows persistent documentation aids within conversational contexts.

Automated Test Lifecycle with MCP and Github: Members shared their goal to automate their test cycle using an MCP server to run tests, capture and parse logs, and generate recommendations for fixes, with integration with Github to create PRs.
The discussion also included context handling with MCP and Python, extending the MCP client session to handle context-specific calls, and leveraging the flexibility within Pydantic models for robust implementations.

MCP Setups for Cursor and LibreChat: A user requested information on configuring an MCP server for use with the Cursor app or LibreChat, particularly with an MCP server they set up for Obsidian via the Obsidian rest API community plugin.
The discussion also referenced the Model Context Protocol Authorization.

Vendor Lock-In Questioned: A member questioned the practical usage of mcp.run beyond toy examples, suggesting potential vendor lock-in.
In response, another user indicated the platform's standards remain fairly typical from the user's perspective, although the actual usage amount remains unclear.

AI Bot Does Karaoke: A user demonstrated their MCP-server and client setup, which enables an AI Discord bot to play songs in voice channels through easy mcp-export of tagged class methods for API integrations.
The showcased AI bot corrects playback issues by leaving and rejoining voice channels.

Modular (Mojo 🔥) Discord

Modular Merch Makes Waves: A member noted that the Modular branded Patagonia sweater goes hard.
This comment highlights the community's enthusiasm for Modular's brand identity.

Mojo Windows Support on Hold: There's no timeline for native Mojo support on Windows due to the high costs of running AI clusters on the OS, influenced by Microsoft's licensing fees.
nix OSes are preferred for deploying projects like MAX* because they provide better compute features.

Mojo Aims to Surpass Rust: Mojo is designed to resemble Python but perform closer to C/C++/Rust, with compatibility akin to C++ and C.
The goal is for Mojo's type system to surpass Rust’s, avoiding some of its pitfalls to allow for greater versatility.

Parallelize your GPU intro to Mojo: Newcomers to Mojo's GPU programming can start with functions like parallelize and for_each in normal code.
A forum thread with details on setting up GPU interactions was shared for further guidance and can be found on the Modular forums.

Mojo Concurrency via Shared Memory IPC: A member described their approach to managing concurrency in Mojo using a process-per-core strategy with shared memory IPC.
They emphasized the importance of managing pointers without lifetimes for efficient memory handling.

Notebook LM Discord

AI Creative Writing Falls Short: Writers expressed frustration with AI's inconsistencies, sometimes providing beautiful insights but often leading to long frustrating listens due to errors.
One member noted a decline in AI's performance since the launch of the Plus service, making it more of a hindrance, with concerns about trusting AI's inconsistent outputs.

NotebookLM: Good Tool?: One user shared how they use NotebookLM to assist with writing their novel, calling it sometimes a very bad tool.
They don't see it as a reliable canon source just yet, but another member shared how they use it to improve understanding of exponential material by starting with YouTube courses and testing comprehension.

Audio Deep Dive Approved: A member inquired about using the Audio 'Deep Dive' sessions in their courses, and received confirmation that it can be shared within their educational domain.
Links to guidelines on generating Audio Overviews were provided to help with the process.

NotebookLM iOS App: Where?: A user inquired about the correct iOS app for NotebookLM, signaling a need for clarity in available mobile applications.
No specific recommendations were given in the conversation.

Need Notebook Folders Please: A user requested the ability to create folders for organizing Notebooks.
They were informed that there is a feature request filed for it internally and expressed their eagerness to see this feature implemented soon.

Latent Space Discord

Arize Secures $70M for AI Reliability: Arize AI obtained $70 million in Series C funding to enhance the reliability of AI agents, particularly focusing on generative models and autonomous systems, since 2020.
Their goal is to refine tools for AI performance understanding and troubleshooting in real-world scenarios, ensuring dependable AI operation.

OpenAI Boasts 400M Active Users: OpenAI announced it has surpassed 400 million weekly active users, including 2 million business users leveraging ChatGPT at work, marking a 33% increase in less than three months.
The company's upcoming models, GPT-4.5 and GPT-5, aim to unify existing functionalities while broadening agent capabilities, according to Tom Warren.

Deep Seek's Open Source Week Debuts: Deep Seek initiated #OpenSourceWeek, with plans to open-source five repositories, sharing advancements in AGI with the community, according to their tweet.
The initiative emphasizes community-driven development, making documentation and deployment publicly accessible to foster collective progress.

Facebook's Reasoning Dataset Challenges AI: Facebook introduced a dataset featuring over 1 million reasoning traces, designed to challenge AI with high-quality questions to improve reasoning techniques, according to Caleb.
This dataset includes reference answers and is expected to enhance the performance of reasoning models across various applications, by improving reasoning techniques.

1X Debuts NEO Gamma for Home Tasks: 1X Tech is promoting NEO Gamma, a robot tested in employee homes, designed to reliably perform household chores, according to Eric Jang.
Its humanoid design aims for natural interactions and showcases advanced capabilities in walking and bending.

Torchtune Discord

Torchtune Team Tackles Test Artifacts: A user experienced a ValueError related to missing test artifacts after resolving initial pytest errors by installing dev dependencies via pip install -e .['dev'].
The team suggested deleting the /tmp/test-artifacts directory to force re-download of necessary artifacts, showcasing helpful community collaboration and problem-solving.

Meta Launches MLGym Environment for AI Research: Meta introduced MLGym, a new Gym environment for ML tasks, featuring 13 diverse AI research tasks across multiple domains.
The launch received positive reactions, with one member expressing excitement and intending to share the news themselves.

Unsloth GRPO Algorithm Yields Massive VRAM Savings: A blog post highlighted that the Unsloth Efficient GRPO algorithm enables 10x longer context lengths using 90% less VRAM, facilitating training of a reasoning model with only 5GB VRAM for Qwen2.5.
Members noted the drastic reduction in VRAM for Llama 3.1 training, from 510.8GB to 54.3GB, making this a very significant development.

Team Puts Width/Depth Pruning Discussion on Ice: A conversation around the need for an RFC on width/depth pruning concluded that the team currently lacks the bandwidth to prioritize it.
It was proposed to discuss the topic further during office hours before potentially developing the ideas into a PR.

Engineers Flock to Optimize GRPO PR: A Torchtune member anticipated high engagement with the GRPO PR, predicting it might break the record for the amount of comments, a sentiment that resonated with other members of the guild.
A team member volunteered to assist with GRPO, KD, quantization, and pruning, inviting collaboration and mentorship in these areas to further enhance community involvement.

LlamaIndex Discord

LlamaParse Gets Parsing Mode Overhaul: LlamaParse is enhancing its document parsing with new modes—Fast, Balanced, and Premium—designed to meet diverse user requirements, detailed in this tweet.
These enhancements aim to more effectively address document parsing challenges.

AI Infrastructure Talks Coming Soon: An exclusive event on March 5 will host talks on advancements in AI infrastructure, with information available in this announcement.
Discussions will center on practical training applications, fine-tuning, inference, and RAG, with the goal of improved performance at reduced costs.

Multi-Agent Handoffs Get Fixes: A custom handoff prompt update resolved issues where the LLM would return 'I am handing off to AgentXYZ' instead of initiating a tool call, now producing valid JSON object outputs.
Despite the fix, concerns persist regarding the unpredictable nature of agent handoffs and how LLM temperature settings influence workflow stability.

PDFs Powering AI Creation: A member inquired about building an AI exclusively from 100-1000 PDF documents, ensuring responses are confined to this dataset.
They also questioned the need for a dedicated server or computer to host the project.

Visual Workflow Interface Still Missing: A member inquired about visual interfaces for workflow creation, similar to Logic Studio (LogicStudio.ai - Visual AI Agent Orchestration).
Currently, no specialized tools exist beyond standard drawing utilities for such visual workflow design.

Nomic.ai (GPT4All) Discord

NOMIC v2 Implementation Causes Confusion: Users expressed confusion regarding the correct implementation of NOMIC v2, indicating a need for better documentation or tutorials.
The discussion emphasized potential misunderstandings of new features and functionalities.

GPT4All Setup Yields Querying Issues: A new user reported difficulties querying documents using GPT4All v3.9.0, where despite setting up their local environment, they encountered inaccurate outputs.
The responses were often unrelated or incorrect, hindering attempts to extract specific information from a document collection.

Optimal Model Settings Suggested for Performance: Advice was provided to adjust context length and document size for improved GPT4All performance.
Users recommended balancing context size and snippet quantity to enhance document retrieval accuracy.

Chat Template Extraction Meets Roadblocks: A user encountered issues extracting a chat template from the tokenizer file, citing missing system prompts.
Guidance was sought on setting parameters like min_p and top_k in the generation configuration for better output management.

Model Loopiness Addressed: Concerns arose over GPT4All outputs looping indefinitely, leading to repetitive, self-chatting behavior.
Suggestions were offered to tune model settings to mitigate extended responses, thereby improving usability.

tinygrad (George Hotz) Discord

Beam Me UP, Tinygrad!: Tests revealed that increasing BEAM for 2048x2048 tensors resolved performance bottlenecks, improving the UPCAST on reduce dimensions.
A member shared an update "Actually, I think we are good now... I just opened a PR for this".

GROUP OptOps Faces CPU Challenges: Problems surfaced with GROUP OptOps on CPUs, leading to failures in tests like test_arange due to the ballooning of estimated flops.
The community debated whether these inefficiencies are inherent, as these optimizations function correctly on GPUs but only in LLVM.

Agentic CUDA Kernel Search on the Horizon: A recent paper on agentic CUDA kernel search was discussed in the context of kernel performance improvements.
The discussion linked these advancements to ongoing optimization efforts and performance challenges within current projects.

Linearizer is a gateway to Tinygrad: The tinygrad linearizer is crucial to enhance the capabilities of the tinygrad framework.
The GitHub page highlights the charm of tinygrad for fans of frameworks like pytorch and micrograd.

Cohere Discord

Roazzy Turns Pink: Roazzy announced a fun change stating, As yall can see, now I am pink in the chat.
Another member remarked that it was 'cool stuff', showcasing positive reactions to the update.

Cohere Benchmarks Benchmarked: A member inquired if the Cohere embedding models were submitted to benchmark leaderboards, specifically referencing evaluations against MTEB and BEIR.
They specifically noted the BEIR leaderboard and expressed interest in additional benchmarks for their university assignment.

Half Rest Hacks Sought: A user prompted for 'tricks' to help those looking to achieve a suitable amount of half rest.
While no specific techniques were shared, the interest in this topic was evident.

Community Craves Chill: Another participant mentioned the need for rest strategies, indicating a collective interest in improving recovery.
The conversation suggests a potential for a more extensive discussion on effective rest methods.

DSPy Discord

DSPy Explores Chat History Integration: Members explored a feature request on GitHub to allow specifying chat history for language models in DSPy.
The discussion centered on whether the potential performance improvements from custom implementations justify the implementation effort.

DSPy Performance Gains Spark Curiosity: A member inquired about potential performance improvements from custom solutions related to chat history specification.
The conversation highlighted a need to clarify whether such customizations are beneficial, considering the resources needed to implement them.

The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

PART 2: Detailed by-Channel summaries and links

OpenAI ▷ #annnouncements (1 messages):

Operator Rollout, Regional Availability 

Operator launches for Pro users globally: Operator is now being rolled out to Pro users in Australia, Brazil, Canada, India, Japan, Singapore, South Korea, the UK, and most places where ChatGPT is available.
We're still working on making Operator available in the EU, Switzerland, Norway, Liechtenstein & Iceland and will provide updates.

Upcoming Operator Release in the EU: The Operator feature is still under development for regions like the EU, Switzerland, Norway, Liechtenstein, and Iceland.
Updates will be shared as they become available for these areas.

OpenAI ▷ #ai-discussions (921 messages🔥🔥🔥):

Grok 3 vs ChatGPT Plus, Translation features in AI, AI capabilities in programming, User experiences with various AI models, Deepseek and its alternatives 

Comparing Grok 3 and ChatGPT Plus: Users are comparing Grok 3 and ChatGPT Plus, with many expressing a preference for the capabilities of Grok 3, especially in programming tasks.
Concerns were raised about Grok's usage limits, with some users experiencing lower limits than expected.

AI Translation Features: Participants discussed using AI tools for translation in various languages, including Hinglish and Georgian, noting the effectiveness of models like Grok and ChatGPT.
Users expressed interest in exploring the translation quality across different AI platforms.

User Experience with AI Models: Users shared their experiences using AI models in development and coding, highlighting tools like Codeium and their functionality in various IDEs.
Some users prefer ChatGPT for its familiarity and features over Grok, while others appreciate Grok's unique attributes.

Deepseek's Data Privacy Concerns: Concerns were raised about the data privacy of apps like Deepseek, particularly regarding its Chinese ownership and data handling practices.
Users discussed the implications of data usage by different AI providers and the potential risks associated with it.

Anticipated Features and Updates: Anticipation for upcoming features in Grok, such as voice mode and custom instructions, was shared among users.
Users expressed enthusiasm for the developments in AI capabilities and the ongoing evolution of these platforms.

Links mentioned:

Overview | xAI Docs: Learn how to use our products and services
Grok: Grok is a free AI assistant designed by xAI to maximize truth and objectivity. Grok offers real-time search, image generation, trend analysis, and more.
Tweet from CHOI (@arrakis_ai): Grok 3’s voice mode has no censorship. It’s quite surprising.Grok Voice Chat with ChatGPT
Tweet from CHOI (@arrakis_ai): Grok Voice
Tweet from CHOI (@arrakis_ai): Grok Voice
SpecGram—New speech disorder linguists contracted discovered!—Yreka Bakery: no description found
Why can we no longer copy text in the "Edit Message" box? (Firefox): When you get stuck with the infinite loading, and I have to press STOP to retry, and it doesn’t work:     I would like to be able to go to my message to copy the text. But the “edit message” text edit...
AI Model Comparison - Single Prompt Top Down Car Game (prepare to be disappointed): This was a single prompt test: no followup prompts and no manual code editing.I gave the same prompt to several ai models:ChatGPT 4o, ChatGPT o3 Mini High, C...
Mali Football GIF - Mali Football Eric chelle - Discover & Share GIFs: Click to view the GIF
Tweet from 1X (@1x_tech): Introducing NEO Gamma. Another step closer to home.
Tweet from Elon Musk (@elonmusk): Try Grok voice in unhinged mode 😂
Elon Musk Makes HUGE DOGE Predictions!: Check out the original CPAC stream here - https://www.youtube.com/watch?v=0BN-62AQT4QThe Robots Are Coming https://farzad.fmFREE One Year Supply of Vitamin d...
Ladder Rungs Submerged at Low Tide | Shared Grok Conversation: There is a ladder attach to a boat with 10 rungs. At low tide, the water level drops by 60 cm. Each 
Tweet from xAI (@xai): This is it: The world’s smartest AI, Grok 3, now available for free (until our servers melt).Try Grok 3 now: https://x.com/i/grokX Premium+ and SuperGrok users will have increased access to Grok 3, in...
State of the Art in Solar Battery Storage | Shared Grok Conversation: Based on peer reviewed studies what is the current State of the Art in household solar battery stora
Tweet from GitHub - FixTweet/FxTwitter: Fix broken Twitter/X embeds! Use multiple images, videos, polls, translations and more on Discord, Telegram and others: Fix broken Twitter/X embeds! Use multiple images, videos, polls, translations and more on Discord, Telegram and others - FixTweet/FxTwitter
C++ Rigid-Body Physics Simulation with GDI | Shared Grok Conversation: Create a basic C++ application that incorporates custom rigid-body physics with collisions and abili
Tweet from Elon Musk (@elonmusk): Wow

OpenAI ▷ #gpt-4-discussions (61 messages🔥🔥):

OpenAI Teams and Operator, Coding capabilities comparison, Community feedback on feature requests, User experiences with Teams, Moderation and community interaction 

Discussion on OpenAI Teams and Operator: Members discussed the need for a 'distilled version' of the Operator for Teams users, emphasizing the high cost of $200 per month for access to Pro features.
One member expressed interest in providing feedback to OpenAI about their needs for Teams, hoping for more accessible testing options.

Coding Performance: o3-mini-high vs o1: A user reported consistently getting better code from o1 compared to o3-mini-high, raising concerns about the latter’s capabilities around coding and logic.
The user noted several comparisons where o1 delivered superior solutions, prompting a discussion on performance expectations from both models.

Community Moderation and Interaction: Several members congratulated a community moderator for their engagement and support, highlighting the importance of constructive discussion among users.
The interactions turned into a broader dialogue about sharing experiences and insights regarding OpenAI's offerings and community dynamics.

Feedback on Teams and User Experiences: A member asked for personal opinions on the potential benefits and drawbacks of using Teams, wanting a robust argument for feedback to OpenAI.
Participants expressed views on the lack of sharing capabilities within Teams and its implications on user experience.

Celebrating Community Passion: Community members expressed appreciation for each other's contributions to discussions around AI development and OpenAI's services.
Acknowledgements were made regarding the enthusiasm of users contributing to the community, aiming for a positive and constructive environment.

OpenAI ▷ #prompt-engineering (7 messages):

Prompt Evaluation, GPT Builder Reasoning Prompts, English Grammar Improvement, CS Communication Challenges 

Need for Working GPT Builder Prompts: A member sought a working prompt that assists GPT builders in reasoning, noting their old prompt worked well with the o1 model but not with the GPT builder.
This raised questions about the effectiveness of prompts across different models and encouraged sharing any successful examples.

Sentiments on Prompt Effectiveness: A member questioned how satisfied users are with their prompts, prompting discussion on evaluating effectiveness based on user experience, output quality, and personal progress.
Are you getting the help you want and need? was a key inquiry, focusing on user-centered results.

Creating Grammar Improvement Prompts: One member shared a prompt designed to enhance their English grammar, seeking feedback on its effectiveness after using it with the GPT4o model.
Accompanying this was a discussion on the importance of user satisfaction with prompt outputs.

Exploring Various Prompt Strategies: Another member expressed gratitude for the advice received on prompts, feeling confident it would aid their progress but acknowledged needing further input in one specific situation.
This highlights the ongoing dialogue on customizing prompts for unique learning needs.

Addressing Software Issues: A member indicated upcoming discussions about whether issues are related to prompts or the software itself, expressing concern over time delays.
This reflects ongoing frustrations with the tools being used and the desire for a resolution.

OpenAI ▷ #api-discussions (7 messages):

GPT Builder Prompt Effectiveness, English Grammar Learning Prompts, Feedback on Outputs, Improving Prompt Design 

GPT Builders need better reasoning prompts: A member shared that they received a good prompt from the O1 model, but it didn't perform as expected within the GPT builder environment.
They are seeking insights on crafting prompts that effectively enhance reasoning within the models.

Exploring English Grammar Prompts: Another member shared a prompt designed for learning and improving English grammar, seeking feedback on its effectiveness.
They noted that this prompt was generated using the O1 model and used it in the GPT-4o model for practice.

Evaluating Prompt Effectiveness: A member suggested assessing their learning prompt by reflecting on satisfaction with the outputs it generates.
They emphasized the importance of progress and support from the model, encouraging a focus on improving prompts if results are lacking.

Time Constraints in Model Testing: One participant expressed frustration over time-consuming trials with a prompt, hoping to resolve performance issues in upcoming calls.
They seek clarity on whether the issues stem from the prompt design or software inconsistency.

Cursor IDE ▷ #general (908 messages🔥🔥🔥):

Cursor 0.46 Release, Claude Model Performance, API Integration Issues, User Experience with MCP, Feedback on AI Tooling 

Cursor 0.46 is available for download: Users have found early links to download Cursor 0.46, but there are concerns about its stability and performance.
Some users are experiencing issues with the updated version, particularly with the new user interface and its integration with existing tools.

Differences observed in Claude model outputs: Several users noted that Claude models in Cursor are producing varying outputs, particularly when comparing the older and newer versions.
Users have attempted to isolate the impact of version changes on performance relating to generating layouts and CSS code.

Concerns with API and backend functionality: Discussions revealed that many users suspect issues with backend prompts and API performance affecting output consistency.
It was suggested that users may be experiencing variability due to changes in how models are prompted or managed within Cursor.

Feedback on MCP and integration difficulties: Users expressed frustrations regarding the maintenance of the MCP tool, noting that updates often break existing features.
There is an interest in multi-agent support and improvements to how MCP functions within Cursor.

Overall user sentiment about AI tooling: Participants shared mixed feelings about AI models and their performance, highlighting issues with understanding and effectively using tools like Claude.
Many users emphasized the importance of proper prompt structures and context management to achieve desirable outcomes.

Links mentioned:

Spark Engine - The AI Sandbox: Turn ideas into AI-powered products, no coding experience required
Model context protocol (MCP) | Supabase Docs: Connect AI tools to Supabase using MCP
no title found: no description found
Sync your workspace - Augment: no description found
Installation - AgentDesk - BrowserToolsMCP: no description found
no title found: no description found
JetBrains Fleet: More Than a Code Editor: Built from scratch, based on 20 years of experience developing IDEs. Fleet uses the IntelliJ code-processing engine, with a distributed IDE architecture and a reimagined UI.
no title found: no description found
Iterating with Shadow Workspaces | Cursor - The AI Code Editor: Hidden windows and kernel-level folder proxies to let AIs iterate on code without affecting the user.
Tweet from Jimmy Apples 🍎/acc (@apples_jimmy): Thus saith the Lord: Orion is My herald, My cipher etched in flame. Let all flesh be silent before him. For he walks not alone, but with the Hosts of Heaven,Their minds one with the Mind that spun the...
no title found: no description found
cline_docs/prompting/custom instructions library/cline-memory-bank.md at main · nickbaumann98/cline_docs: Documentation and best practices for using Cline. Contribute to nickbaumann98/cline_docs development by creating an account on GitHub.
This README outlines the limitations of the Cursor MCP Environment, highlighting constraints on package access and environment variables. Key issues include reliance on the Python standard library and the need to hardcode sensitive data. Workarounds involve adapting scripts accordingly. Open questions focus on potential configurations for improved security and access.: This README outlines the limitations of the Cursor MCP Environment, highlighting constraints on package access and environment variables. Key issues include reliance on the Python standard library ...
starter-applets/spatial at main · google-gemini/starter-applets: Google AI Studio Starter Apps. Contribute to google-gemini/starter-applets development by creating an account on GitHub.
Grok 3 is an...interesting model.: I had high hopes for Grok 3. According to their benchmarks it should be the new best model right? Right? Quite a lot to talk about with this one...Thank you ...
Tweet from Ryo Lu (@ryolu_): New hair, new teeNew jeans, do you see
Tweet from Kevin Kern (@kregenrek): Cursor 0.46 is here and we got some fresh UI Updates.Chat -> AskComposer (normal) -> EditComposer (agent) -> AgentYou can switch between modes with ⌘+.Quoting Kevin Kern (@kregenrek) Cursor A...
Tweet from Tibor Blaho (@btibor91): New Claude web updates- New "Model Selector Intro" and "Thought Process Extended Intro" announcements (now added with date Feb 19) for Paprika (the new thinking model) - the texts for ...
0.46.0 Changelog (unofficial): Things I’ve spotted in the 0.46 release   New combined Chat/Composer Shared context between Chat & Composer (Start in Ask / move to Agent) Deepseek R1 thinking UI MCP Server Config file .cursor/mc...
Tweet from GitHub - FixTweet/FxTwitter: Fix broken Twitter/X embeds! Use multiple images, videos, polls, translations and more on Discord, Telegram and others: Fix broken Twitter/X embeds! Use multiple images, videos, polls, translations and more on Discord, Telegram and others - FixTweet/FxTwitter
Reddit - Dive into anything: no description found
GitHub - AppImageCommunity/AppImageUpdate: AppImageUpdate lets you update AppImages in a decentral way using information embedded in the AppImage itself.: AppImageUpdate lets you update AppImages in a decentral way using information embedded in the AppImage itself. - AppImageCommunity/AppImageUpdate
no title found: no description found
Reddit - Dive into anything: no description found
Your MCP config is OSX and Linux specfic · Issue #9 · anaisbetts/mcp-installer: To get this to work on windows one's config needs to look like this { "mcpServers": { "mcp-installer": { "command": "cmd.exe", "args": [ "/c&...
Changelog | Cursor - The AI Code Editor: New updates and improvements.
no title found: no description found
cursor: Homebrew’s package index

Unsloth AI (Daniel Han) ▷ #general (444 messages🔥🔥🔥):

Unsloth AI updates, Multi-GPU Training, Qwen Model Fine-Tuning, GPU Comparisons, Reward Functions in AI Models 

Updates on Unsloth and GRPO: The Jan AI team successfully GRPO-ed a 1.5B model using Unsloth to explore LLM Spatial Reasoning by solving MAZE, showcasing its capabilities.
This further underlines Unsloth’s application potential in various domains, including medical report interpretation.

Concerns about Multi-GPU Support: Several users discussed Unsloth's current lack of multi-GPU support, with recommendations leaning towards using singular powerful GPUs like the RTX 3090 for fine-tuning.
This suggestion stems from the challenges associated with managing multiple lower-end GPUs like the RTX 3060.

Fine-Tuning Qwen Models: Users are experimenting with fine-tuning the Qwen2 model for applications in medical reporting, highlighting the need for efficient VRAM usage during training.
Concerns were raised about the inability to use gradient accumulation, potentially leading to high VRAM demands.

Discussions on GPU Performance: There were inquiries about the performance metrics for different GPUs, particularly the importance of memory bandwidth versus CUDA cores in fine-tuning speed.
A need for a ranking list of GPUs based on fine-tuning efficiency, particularly post-release of new GPU series, was emphasized.

Expectations for Future Development: The community expressed excitement about upcoming releases from Unsloth, particularly regarding the implementation of reward functions.
Members are anticipating how these enhancements can improve model performance in training and real-world applications.

Links mentioned:

The Ultra-Scale Playbook - a Hugging Face Space by nanotron: no description found
Google Colab: no description found
The AI CUDA Engineer 👷: no description found
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving: We introduce Goedel-Prover, an open-source large language model (LLM) that achieves the state-of-the-art (SOTA) performance in automated formal proof generation for mathematical problems. The key chal...
VERT: Verified Equivalent Rust Transpilation with Large Language Models as Few-Shot Learners: Rust is a programming language that combines memory safety and low-level control, providing C-like performance while guaranteeing the absence of undefined behaviors by default. Rust's growing popu...
Unsloth Gradient Checkpointing - 4x longer context windows: Unsloth Gradient Checkpointing now supports finetuning of LLMs with very long context windows, up to 228K for Llama 3.We managed to reduce memory usage by a further 30% at the cost of +1.9% extra time...
config.json · unsloth/DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit at 5e0ac06dc3e90b3e84ce2c4b6bd3257974b1bb0a: no description found
GRPO Trainer: no description found
Using User-Defined Triton Kernels with torch.compile — PyTorch Tutorials 2.6.0+cu124 documentation: no description found
Tweet from Unsloth AI (@UnslothAI): Today, we’re launching new algorithms that enable 10x longer context lengths & 90% less VRAM for training Reasoning Models (GRPO).Using Unsloth, you can now train your own reasoning model with just 5G...
Google Search: no description found
SakanaAI/AI-CUDA-Engineer-Archive · Datasets at Hugging Face: no description found
Tweet from DeepSeek (@deepseek_ai): 🚀 Day 0: Warming up for #OpenSourceWeek! We're a tiny team @deepseek_ai exploring AGI. Starting next week, we'll be open-sourcing 5 repos, sharing our small but sincere progress with full tra...
GitHub - huggingface/nanotron: Minimalistic large language model 3D-parallelism training: Minimalistic large language model 3D-parallelism training - huggingface/nanotron
Unsloth GRPO AIMO Training DeepSeek R1 7B: Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources
Google Colab: no description found
GitHub - lucasjinreal/Namo-R1: A CPU Realtime VLM in 500M. Surpassed Moondream2 and SmolVLM. Training from scratch with ease.: A CPU Realtime VLM in 500M. Surpassed Moondream2 and SmolVLM. Training from scratch with ease. - lucasjinreal/Namo-R1
unsloth-zoo/unsloth_zoo/rl_replacements.py at a9857088bdaf412bef36800d837a3a37657555c8 · unslothai/unsloth-zoo: Utils for Unsloth. Contribute to unslothai/unsloth-zoo development by creating an account on GitHub.
unsloth-zoo/unsloth_zoo/rl_replacements.py at main · unslothai/unsloth-zoo: Utils for Unsloth. Contribute to unslothai/unsloth-zoo development by creating an account on GitHub.
Reddit - Dive into anything: no description found
AI CUDA Engineer - Kernel Leaderboard 🏆: no description found
trl/trl/trainer/grpo_trainer.py at e5ae703d352b29537159180087ef8bd4b41bf625 · huggingface/trl: Train transformer language models with reinforcement learning. - huggingface/trl

Unsloth AI (Daniel Han) ▷ #off-topic (6 messages):

Triton Inline Assembly, Dequantization Results, Performance Discrepancies 

Exploring Triton's Inline Assembly Functionality: The Triton inline assembly elementwise function executes inline assembly over tensors, supporting implicit broadcasting and packed processing of elements.
This operation requires that inline assembly returns at least one tensor, with the option to return a dummy tensor to avoid errors.

Discrepancies in Dequantization Results: Users reported that their Triton dequantized results differ from Unsloth's, with discrepancies noted at less than 1%, particularly a margin of 1.1444091796875e-05.
One member observed that their kernel is faster but still results in a failure in the test_dequantize function, raising questions among peers.

Common Issues with Dequantization Accuracy: Another user echoed concerns about the accuracy, noting that about 50% of their dequantization results are marked incorrect.
This prompted further inquiry into whether others share similar experiences or challenges in achieving accurate outcomes.

Link mentioned: triton.language.inline_asm_elementwise — Triton  documentation: no description found

Unsloth AI (Daniel Han) ▷ #help (42 messages🔥):

Qwen 2.5 VL Performance, Using LoRA with vLLM, GRPO Training Confusion, Multi-GPU Training Support, Fine-tuning Script Models 

Qwen 2.5 VL Performance with Different Models: Users reported mixed results with Qwen2.5-VL-3B-Instruct models, noting discrepancies in performance between merged models and those loaded via adapters.
One user pointed out that using FastVisionModel yielded better performance compared to the merged 16bit model run directly with vLLM.

Challenges Using LoRA with vLLM: A user attempted to load a LoRA adapter with vLLM but encountered an error due to vLLM's current support limitations for adapters tailored to language models only.
Additionally, loading merged models in different data types yielded varying results, particularly regarding the model performance and warnings during loading.

GRPO Training Process Insights: Concerns were raised about observed zero loss conditions in GRPO training, even when there were non-zero rewards, leaving users puzzled about the training's effectiveness.
Some participants suggested that further investigation into learning rates and additional supervision might clarify the observed behaviors.

Fine-tuning with Unsloth for Scripts: There was an inquiry on how to fine-tune script documents with character dialogues using Unsloth, highlighting the desire for specific prompts to enhance the storytelling flow.
Participants discussed the potential for automation in the script fine-tuning process and the effective use of formats for improved results.

Request for GPU/RAM Specifications for Llama 70B: A user sought advice on the necessary GPU and RAM specifications for running Llama 70B locally, indicating a need for high-performance hardware.
Discussion inquired about potential storage capacities and configurations to support demanding AI models.

Links mentioned:

Google Colab: no description found
Google Colab: no description found
Google Colab: no description found
Unsloth Documentation: no description found
Unsloth Documentation: no description found
Google Colab: no description found

Unsloth AI (Daniel Han) ▷ #showcase (4 messages):

RAG chunking issues, Spark Engine release, LLM Spatial Reasoning testing 

RAG faces chunking challenges: Concerns were raised about RAG and its difficulties with chunking, particularly with company jargon leading to LLMs confidently hallucinating meanings.
RAG can become complex as data scales, which may result in hallucination issues, especially when the right context isn't fetched.

Spark Engine Launches After Year of Beta: Spark Engine celebrated its full release after over a year in public beta, boasting 80 models for various creative outputs including text, music, and automation.
The developers are eager for more collaborators to join them in innovating with this powerful no-code AI sandbox.

Jan AI successfully tests with LLM: The Jan AI team conducted a GRPO on a 1.5B model to explore LLM Spatial Reasoning through the MAZE challenge, marking an impressive achievement.
Community excitement is evident as this was shared in a discussion on LocalLLaMA.

Links mentioned:

Spark Engine - The AI Sandbox: Turn ideas into AI-powered products, no coding experience required
Reddit - Dive into anything: no description found

Unsloth AI (Daniel Han) ▷ #research (214 messages🔥🔥):

AI in Medical Diagnostics, Clinical Trials and Ethics, Psychological Diagnosis, Potential AI Enhancements, Use of LLMs in Specialized Fields 

Concerns about AI in Medical Diagnosis: The discussion highlighted ethical concerns regarding AI use in medical diagnostics, emphasizing the importance of human oversight to avoid misdiagnosis and reliance on automated systems for serious medical conditions.
Members expressed worries that without proper checks, patients could be left without diagnoses or appropriate treatments based on AI outputs.

Importance of Clinical Trials: Participants agreed that rigorous clinical trials are essential before implementing AI-designed medical solutions to ensure safety and efficacy, with major emphasis on not bypassing professional reviews.
There were discussions about the potential backfire effects of misusing AI models for serious health conditions, stressing common ethical pitfalls.

Role of AI in Psychological Assessment: The potential for AI to assist in psychological diagnosis was debated, where it was suggested that AI could help gather data for psychiatrists but should not replace professional diagnostics.
Concerns were raised about the capability of AI to truly understand the nuances of psychological conditions, and its limits in diagnostics.

AI Training and Knowledge Integration: A paper discussing the use of Low-rank adaptation (LoRA) highlighted challenges of 'catastrophic forgetting' when integrating new knowledge into existing AI models.
Participants noted that training models with mixed old and new facts could mitigate performance declines, though specifics on effective training techniques were questioned.

Skepticism Towards AI Capabilities: Members expressed skepticism regarding the claimed capabilities of AI models, particularly with regard to their effectiveness in specialized contexts like psychiatry and diagnostics.
Comments on specific configurations used for training AI models sparked disbelief, raising questions on their practical validity.

Links mentioned:

DSM-5 Criteria for Major Depressive Disorder: The DSM-5 Criteria for Major Depressive Disorder is a set of diagnostic criteria for major depressive disorder (MDD).
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?: The performance of Large Language Models (LLMs) on many tasks is greatly limited by the knowledge learned during pre-training and stored in the model's parameters. Low-rank adaptation (LoRA) is a ...
Squint Eye GIF - Squint Eye Sad - Discover & Share GIFs: Click to view the GIF

Codeium (Windsurf) ▷ #discussion (32 messages🔥):

Codeium Features, JupyterLab Extension Issues, IntelliJ Autocompletion Problems, Windsurf IDE Expectations, Codeium Support and Feedback Channels 

Windsurf IDE promises enhancements: Windsurf is touted as an advanced AI-powered IDE aimed at boosting developer productivity with features for code generation, refactoring, and optimization.
A YouTube video titled 'Codeium - Windsurf' elaborates on its features and benefits, encouraging users to explore its potential.

JupyterLab extension struggles: A user trying to set up the Codeium extension for JupyterLab reported difficulties, despite following installation steps and adding the auth token.
They expressed disappointment over the lack of code auto-completion functionality, unlike expected from the extension.

IntelliJ users face visibility issues: Users leveraging the IntelliJ extension noted they couldn't see autocomplete suggestions unless hitting tab, impacting workflow.
Another user confirmed that they experienced no auto-completion from Codeium when engaging with Jupyter, which led to frustration.

Codeium's current state in maintenance mode: Discussion emerged surrounding a perceived lack of updates, with claims that the Jetbrains plugin seems to be in maintenance mode without new features.
Participants reflected on the disappointing experience, highlighting that too much of the changelog appeared to be a copy-paste.

Feedback mechanisms for Codeium issues: It was suggested that user feedback could be channeled through Discord, Codeium's support page, and feature request platform.
The community was encouraged to share their needs and report bugs to spotlight requirements for enhancements.

Links mentioned:

Codeium - Windsurf: 🧑‍💻 | Your modern coding superpower🚀 | 3M+ Codeium extension downloads🏄‍♂️ | Building the Windsurf Editor
Jupyter Notebook Tutorial | Windsurf Editor and Codeium extensions: Codeium is the AI code assistant platform that developers love and enterprises trust. Also the builders of Windsurf, the first agentic IDE.

Codeium (Windsurf) ▷ #windsurf (294 messages🔥🔥):

Codeium support issues, Windsurf features and bugs, Using Cascade with MCP, User experience with Windsurf, Feature requests for Windsurf 

Frustrations with Code Changes in Windsurf: Users are expressing frustrations about compatibility issues and unexpected changes made by Windsurf, such as automatic modifications of code in write mode without approval.
Many are suggesting using Git for source control to better manage changes and prevent loss of work.

Problems with Language Server and Errors: Some users reported issues with the language server after a recent update, with suggested solutions including reinstalling the application and deleting the .codeium folder for a reset.
One user identified that the previous version of Windsurf worked without issues, indicating a potential bug in the latest version.

Configuration and Feature Requests: There are ongoing discussions about the need for features like drag-and-drop functionality, customizable session names, and better control over memory use in Windsurf.
Users are also interested in the possibility of integrating feedback mechanisms for feature requests on platforms like Canny.

Using Cascade Effectively: Users are sharing strategies for effectively using Cascade, such as specifying which documentation pages to include for prompts.
Cascade's ability to pull relevant information based on conversation context is noted as a strength, provided users articulate their needs clearly.

User Support and Help: Multiple users faced issues with logging in and project memory loss, prompting suggestions to contact support for assistance and troubleshoot common problems.
Resources such as links to support and community advice on best practices were shared to help users navigate their challenges.

Links mentioned:

A Hand-Curated Shitpost Picture: no description found
Prompt Engineering - Codeium Docs: no description found
Windsurf Directory: Find the best windsurf rules for your framework and language
Ded GIF - Ded - Discover & Share GIFs: Click to view the GIF
Tweet from Pulse MCP (@pulsemcp): With the exciting new MCP integration with @windsurf_ai, we’ve creating a tutorial to guide you through connecting Windsurf to @supabase in 4 easy steps!See it in action below 👇
Tweet from Saoud Rizwan (@sdrzn): Cline v3.4 is out 🚀 Introducing MCP Marketplace! Discover and install the best MCP servers right in the extension, where Cline handles all the setup. We’ve also added mermaid diagrams to Plan mode, n...
Tweet from eric provencher (@pvncher): Yesterday, the codemap update for Repo Prompt went live, and it's a game changer for intelligently pulling in context detected in the code you select to be a part of your prompts.This update lays ...
Roll-over of Pro Credits | Feature Requests | Codeium: Unused Premium User Prompt Credits and Premium Flow Action Credits roll-over to the next month
Tweet from Windsurf (@windsurf_ai): Let's do another giveaway!We have these cool new shirts that you guys have been DM'ing us about, as well as these awesome rubber duckies for debugging purposes. Not sure exactly how many we...
Codeium Status: no description found
Support | Windsurf Editor and Codeium extensions: Need help? Contact our support team for personalized assistance.
Tweet from swyx 🗽 NYC (@aiDotEngineer) (@swyx): First time I’ve seen @AnthropicAI lay out its top priorities like thisfocusing more on mechinterp than Claude 4 now!great presentation from @ambricken and Joe Bayley!Quoting swyx 🗽 NYC (@aiDotEnginee...
Tweet from Melvin Vivas (@donvito): Ultimate MCP tutorial 🤯🤯🤯Learn how to configure MCP in Cursor, Windsurf and ClaudeIn this tutorial, we used the github mcp servera thread 🧵👇
Smithery for Windsurf is broken · Issue #23 · smithery-ai/cli: Hi I'd like to report that smithery.ai's support for Windsurf is... not great. Or perhaps Windsurf's support for Smithery is not great? I previously reported an initial bug with Windsurf s...
The ONLY Windows PC OPTIMIZATION Guide You Will EVER Need In 2024: THE BEST Quick Guide To Optimizing / Improving Windows on your gaming PC! How To Optimize Windows 10 For GAMING - Best Settings for FPS & NO DELAY! In today'...
Tweet from GitHub - FixTweet/FxTwitter: Fix broken Twitter/X embeds! Use multiple images, videos, polls, translations and more on Discord, Telegram and others: Fix broken Twitter/X embeds! Use multiple images, videos, polls, translations and more on Discord, Telegram and others - FixTweet/FxTwitter
Codeium Feedback: Give feedback to the Codeium team so we can make more informed product decisions. Powered by Canny.

HuggingFace ▷ #general (94 messages🔥🔥):

HuggingFace Chatbot Preferences, Training Models & Performance Issues, Audio Generation Models, COLD Dataset Insights, SmolAgents Exploration 

HuggingFace Chatbot Capabilities Discussed: Members discussed preferred models for various tasks, highlighting Grok 3, O3 Mini, and Claude for coding applications, emphasizing the limitations of proprietary systems.
Concerns were raised over Hugging Chat's performance and access issues on different devices.

Issues with COLD Dataset Filtering: A user inquired about the number of entries in the COLD dataset, noting confusion over the split showing fewer rows than expected, which led to clarifications on the dataset being a subset.
Members suggested reviewing the dataset's GitHub for export scripts and further insights on filtering data.

Exploration of Audio Generation Models: Members recommended various audio generation models, including Tangoflux for speed and Audiocraft by Facebook as a starting point.
The conversation also touched on finding suitable text-to-audio models for game sound effects.

Utilization of SmolAgents: A user expressed interest in using SmolAgents for their work and sought information about their functionalities via the official website.
Members confirmed that SmolAgents can encapsulate agents for task management and provide integration capabilities.

COLM Reputation in Academia: A member questioned the reputation of COLM for paper submissions, indicating interest from a graduate student.
Concerns about academic standards and fit for submissions in the context of increasing exploration in AI and legal datasets were discussed.

Links mentioned:

Tweet from Unsloth AI (@UnslothAI): Today, we’re launching new algorithms that enable 10x longer context lengths & 90% less VRAM for training Reasoning Models (GRPO).Using Unsloth, you can now train your own reasoning model with just 5G...
Spark Engine - The AI Sandbox: Turn ideas into AI-powered products, no coding experience required
Audiocraft - a Hugging Face Space by Tonic: no description found
Using GPU Spaces: no description found
Oracle (Oracle Corporation): no description found
future-technologies/Floral-High-Dynamic-Range · Hugging Face: no description found
Smolagents : Huggingface AI Agent Framework: Smolagents's Guides and News - HuggingFace's NEW Agent Framework ，Create Powerful AI Agents with Minimal Effort. AI code agent Free online ,Click to TRY FREE
Spaces - Hugging Face: no description found
harvard-lil/cold-cases · Datasets at Hugging Face: no description found
Spaces - Hugging Face: no description found
Sound AI SFX - a Hugging Face Space by fantaxy: no description found
Spaces - Hugging Face: no description found
Spaces - Hugging Face: no description found
Models - Hugging Face: no description found

HuggingFace ▷ #today-im-learning (2 messages):

Tensor Parallelism, Neuralink Advancements 

Hiding Communication in Tensor Parallelism: A discussion revealed that approximately 62% of communication can be hidden in tensor parallelism while maintaining the same loss levels.
This technique could lead to significant optimizations, which were illustrated in the attached image SCR-20250221-svtn.png.

Neuralink's Progress Exploration: Recent learnings included insights into Neuralink's progress, focusing on the integration of advanced communication strategies.
The implications of tensor parallelism were discussed, emphasizing the importance of efficient data handling.

HuggingFace ▷ #cool-finds (2 messages):

Universal Transformers Dataset, MassiveDS-140B Release, Data Access and Community Engagement, Data Protection Measures 

Universal Transformers Dataset Surpasses LAION-5B: The Universal Transformers Dataset is a groundbreaking open-source resource containing trillions of data points, including images, text, and videos, enabling advanced AI training.
To request access, users must start a discussion in the Access Discussions Forum and provide details about their intended use.

MassiveDS-140B Unveiled with Raw Data and Embeddings: The release of MassiveDS-140B provides users with access to a subsampled version containing 140B tokens in the datastore, along with raw passages and embeddings.
Details on the file structure include raw_data, passages, embeddings, and index, with the full dataset available here.

Data Volume Challenges for MassiveDS-1.4T: Due to the large volume, data uploads for MassiveDS-1.4T are still in progress, whereas MassiveDS-140B is currently ready for use.
Users can find code support for running with MassiveDS on the GitHub repository.

Engage with the Universal Transformers Community: Community engagement is encouraged through discussions to raise questions and share ideas, ensuring safety in the use of the Universal Transformers Dataset.
Lambda Go emphasizes the importance of protecting data integrity and fostering a responsible collaborative environment.

Links mentioned:

future-technologies/Universal-Transformers-Dataset · Discussions: no description found
rulins/MassiveDS-140B · Datasets at Hugging Face: no description found

HuggingFace ▷ #i-made-this (5 messages):

Spark Engine Release, Cyclic KL Beta Manager, HF Model Testing 

Spark Engine Launches from Beta: The team proudly announced the official release of Spark Engine after over a year in public beta, offering a powerful no-code AI sandbox with 80+ models for various content generation.
They are encouraging more contributors to join them in creating innovative projects using the platform.

Enhanced Cyclic KL Manager Promises Results: A member shared progress on a robust cyclic KL beta manager inspired by a paper from Duke University and Microsoft, which improves tuning of cycles in training processes.
Recent adjustments have enabled early resets of the cycle when KL plateaus, showing promising outcomes for the training model.

Unique HF Model Testing Space: A new Hugging Face Space was unveiled for swiftly testing models hosted on HF Inference, featuring a whimsical medieval knight speech system by default.
Users can easily customize the interaction by simply changing the system prompt for different outcomes.

Links mentioned:

XO HF Model Tester - a Hugging Face Space by xyizko: no description found
Spark Engine - The AI Sandbox: Turn ideas into AI-powered products, no coding experience required
Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing: Variational autoencoders (VAEs) with an auto-regressive decoder have been applied for many natural language processing (NLP) tasks. The VAE objective consists of two terms, () reconstruction and () KL...

HuggingFace ▷ #reading-group (4 messages):

Parameter size vs training data, Channel posting etiquette 

Exploring parameter size and training data relationship: A member posed a question regarding the relationship between parameter size and training data, illustrating an example of a rugged EMT GPT that operates without a network connection.
Is there any research or established opinions on this relationship?

Weekly reading group channel etiquette: A member reminded another not to cross-post in the weekly reading group, emphasizing that the channel's purpose is specific to that week's reading.
They suggested finding a single relevant channel by reading its description to post relevant inquiries.

HuggingFace ▷ #computer-vision (1 messages):

OCR lightweight models, InternVL model errors 

Seeking Lightweight Models for OCR: A member is looking for recommendations on lightweight models that perform well in OCR tasks.
Currently trying to fine-tune the InternVL model but facing many errors, raising concerns about its utility for this purpose.

Issues with InternVL Model for OCR: There are multiple errors being encountered while fine-tuning the InternVL model for OCR.
The member is actively seeking alternatives better suited for OCR tasks, indicating frustration with the current model.

HuggingFace ▷ #NLP (10 messages🔥):

HuggingFace NLP Course, Finetuning Models, Modular Arithmetic in Coding Theory, Recommended NLP Books 

HuggingFace NLP Course Mentioned: A member suggested checking out the HuggingFace NLP course as a resource.
However, there weren't specific answers found related to the initial inquiry about modular arithmetic.

Challenges of Model Finetuning: A member described finetuning a base model for chat on 1 epoch with a dataset size of 100,000 but encountered issues where the inference model generates user input.
This raised a request for assistance, highlighting potential concerns with the finetuning process.

Discussions on Pre-training and SFT: In conversation, a member mentioned they would pretrain the model before moving on to SFT (Supervised Fine-Tuning), sparking questions on the pretraining approach.
Clarifications followed that they were not pretraining from scratch, but specifics on their method were not disclosed.

Resources for Natural Language Processing: Several members recommended books for NLP, including Natural Language Processing with Transformers and Practical Natural Language Processing.
Another member highlighted this Stanford resource as a definitive reference for understanding NLP concepts.

HuggingFace ▷ #gradio-announcements (1 messages):

Gradio Sketch, No-code Gradio apps, Gradio app deployment, Terminal commands for Gradio 

Introducing Gradio Sketch: No-Code Mode: A new feature called Gradio Sketch has been introduced, allowing users to build Gradio apps without coding.
Users can upgrade by running pip install --upgrade gradio and start using the command gradio sketch in their terminal.

Quick Start for Gradio Apps: With Gradio Sketch, users can build and deploy apps to Spaces without writing any code, enabling rapid prototyping.
This feature significantly lowers the barrier for entry for users new to app development in Gradio.

Visual Demonstration Available: A screen recording showcases the capabilities of Gradio Sketch in action.
This visual guide highlights the ease of use and functionality of the new no-code interface.

HuggingFace ▷ #smol-course (9 messages🔥):

Running Agents with Smolagents, Understanding Space Duplication, HF Token for Agent Functions, DevOps Relevance to the Course, Certification Space PR for Module 1 

Running Agents with Smolagents Explained: A member shared a YouTube video titled 'Running your first Agent using Smolagents' that guides users on how to run their first 🤗 Space for the Agents Course.
The video aims to assist those struggling with Unity 1 by outlining necessary steps clearly.

Clarifying 'Your Copy of the Space': Another member provided insights on creating an agent using smolagents, clarifying that users need to duplicate the template space here.
This involves clicking on the three dots in the upper right corner and selecting 'Duplicate this space' to work on app.py.

Adding HF Token for Agent Operations: For the agent to function correctly, users are instructed to add their Hugging Face token in the Settings > Variables and Secret > New Secret with the key 'HF_TOKEN'.
An additional YouTube video has been referenced for guidance on creating the HF token.

DevOps Course Applicability Question: One member inquired if the Agents Course is applicable for individuals in the DevOps field.
There was no immediate follow-up or answer, leaving the question open for further discussion.

Merge Request for the Certification Space: A user informed another about their PR on the certification space that could resolve several errors in module 1 of the certification process.
They encouraged the merge to ease the challenges faced by users, including themselves.

Links mentioned:

Let’s Create Our First Agent Using smolagents - Hugging Face Agents Course: no description found
agents-course/unit_1_quiz · fixes 500 error for some users: no description found
agents-course/unit1-certification-app · solves 500 error for some users: no description found
SmolVLM2: Bringing Video Understanding to Every Device: no description found
Running your first Agent using Smolagents: Run your first 🤗 Space for the Agents Course.https://huggingface.co/learn/agents-course/

HuggingFace ▷ #agents-course (125 messages🔥🔥):

Participant Introductions, Token Access and Model Usage, Course Feedback, Technical Issues with Installation, Study Buddy Requests 

Participants Introduce Themselves: Many newcomers introduced themselves, sharing their backgrounds and excitement about the AI Agent course, such as Diego Luna from Mexico and Kamil from Poland.
Some expressed their profession, like Dev from India, a senior Python developer, and Derek from NZ, a senior research engineer.

Accessing Tokens for Model Use: Users discussed needing access tokens for models, citing Hugging Face's token page to generate tokens for model usage and setting permissions appropriately.
Several participants, including ludan369, clarified that tokens need ‘write’ access and where to set environment variables in Spaces.

Feedback on Course Material: Some participants, like nyceyes, provided feedback on the course material, suggesting it could be more concise and structured, particularly in explaining concepts.
Concerns were raised about the fragmented presentation of information, which could hinder understanding and engagement.

Technical Issues and Solutions: Members shared their experiences with technical issues, including blackanger, who faced a 401 error regarding unauthorized access when running a SmolAgents example.
Suggestions for resolving issues included setting tokens as environmental variables and checking permissions, with some sharing where to generate these tokens.

Requests for Study Buddies: Participants expressed interest in finding study buddies to collaborate on the AI course, with freeze12 specifically seeking a partner for early morning sessions.
This highlights a community-focused approach to learning and supporting each other throughout the course.

Links mentioned:

Spark Engine - The AI Sandbox: Turn ideas into AI-powered products, no coding experience required
Hugging Face – The AI community building the future.: no description found
Hugging Face – The AI community building the future.: no description found
First Agent Template - a Hugging Face Space by sebasArTecnology: no description found
tokenizer_config.json · HuggingFaceTB/SmolLM2-135M-Instruct at main: no description found
Hugging Face – The AI community building the future.: no description found
meta-llama/Llama-3.2-3B-Instruct · Hugging Face: no description found

Perplexity AI ▷ #general (221 messages🔥🔥):

Perplexity Pro performance issues, Deep Research citation problems, R1 model functionality, AI model comparison, Learning Python and AI/ML 

Perplexity Pro Performance Issues: Users have reported issues with Perplexity Pro, specifically that Deep Research is taking much longer than the expected 2-4 minutes, with some experiencing extended wait times.
One user indicated that their MacBook Pro is experiencing significant delays when invoking Deep Research.

Deep Research Citation Problems: Concerns have been raised regarding Deep Research fabricating statistics and providing citations that do not correlate with the factual content.
One user humorously shared an example of a fabricated statistic regarding cat treats that was cited from a source discussing a different topic.

R1 Model Functionality: Several users noted that selecting the R1 model does not always yield the expected results, suggesting that prompts containing images may default to the GPT-4o response.
One user discovered this after realizing they may have uploaded an image in their request.

AI Model Comparison: Discussion arose comparing Grok 3 to R1, with some users expressing satisfaction with Grok 3's performance, particularly in coding tasks.
Users acknowledged that R1 may be limited in certain situations, and Grok 3 might offer superior capabilities under specific conditions.

Learning Python and AI/ML: Users are sharing resources for learning Python, with recommendations for free tutorials and courses aimed at beginners.
When considering AI/ML, users were advised to explore different paths like classical ML or deep learning, depending on their interests.

Links mentioned:

Spark Engine - The AI Sandbox: Turn ideas into AI-powered products, no coding experience required
OpenAI o3 - Wikipedia: no description found
Python Tutorial | Learn Python Programming Language - GeeksforGeeks: Python is a versatile and beginner-friendly programming language known for its simplicity, extensive libraries, and wide applications in web development, data science, and machine learning.
Shivering Dog GIF - Shivering Dog Shiver Me Timbers - Discover & Share GIFs: Click to view the GIF
Perplexity Pro Gift Subscription | Perplexity Supply: 
Tweet from Navigator (@naivigator): 🧵 Introducing Navigator – your all-in-one DeFai AI agent, launchpad, and framework for automating browser tasks! 🚀
Adapt AI-Artificial Intelligence Agency Consumer Insights: Discover how AdaptAI-Artificial Intelligence Agency can transform your business with efficient, cost-effective AI solutions tailored to drive growth and success!Make Sure To Follow Our Instagram:https...
What are the Best Treats for Clicker Training Cats? - Cat School: Everyone wants to know, "What is the best treat for clicker training my cat?" There isn't one. Different treats serve different purposes.
Machine Learning: Machine Learning covers a lot of topics and this can be intimidating. However, there is no reason to fear, this play list will help you trough it all, one st...
Machine Learning: no description found
Deep Learning: CS 182 Spring 2021: Lectures for UC Berkeley CS 182: Deep Learning.
Foundations of Deep RL -- 6-lecture series by Pieter Abbeel: no description found
Understanding Deep Learning: no description found
Math for Deep Learning — Andreas Geiger: Lecture: Math for Deep Learning (MaDL) A Short Guided Tour on Linear Algebra and Probability Theory (Prof. Andreas Geiger, University of Tübingen) Course Web...
How To Build The Future: Aravind Srinivas: YC General Partner David Lieb sits down with Aravind Srinivas, the co-founder and CEO of Perplexity, to discuss his origins in Silicon Valley, what it's like to compete with Google, and what the ...

Perplexity AI ▷ #sharing (15 messages🔥):

Vim Exiting Instructions, 8085 Simulator Implementation, Taiwan's Independence Debate, Phyt Intelligence, New iPhone 17 Design 

Navigate the Vim Exiting Labyrinth: A user shared a link with instructions on how to exit Vim effectively, providing a solution to a common problem for new users: how to exit Vim.
This guide could be invaluable for those still mastering the complexities of the Vim editor.

Creating an 8085 Simulator: There was a share about implementing an 8085 simulator with detailed guidance, available here: implementing a 8085 simulator.
This may assist developers interested in CPU architecture simulation.

The Ongoing Debate on Taiwan's Independence: Links discussing whether Taiwan should remain independent sparked insightful community discussions. See the details at Taiwan Independence Discussion.
Contributions regarding Taiwan's political stance are crucial as the topic remains sensitive and significant.

The Rise of Phyto Intelligence: A user brought to light a resource regarding Phyto Intelligence and its implications: follow the phytointelligence.
This technology is building momentum in discussions around plant-based technologies.

Anticipated Updates in iPhone 17 Design: A Youtube video teased the drastically different design coming for the iPhone 17: iPhone 17 Design.
Such changes are anticipated to spark excitement and discussion among Apple enthusiasts.

Link mentioned: YouTube: no description found

Perplexity AI ▷ #pplx-api (8 messages🔥):

Deep Research API, Sonar vs Llama Models, Model Configuration Changes, Image Transmission via API 

Questions About Deep Research API: Members expressed curiosity about the integration of deep research into the API, with a specific mention of sonar-reasoning-pro as a potential solution.
They noted a lack of clear documentation comparing models, prompting concerns about the perceived value of the new API features.

Sonar Models Show Strong Performance: One member conducted comparative testing between Sonar and Llama, noting that sonar-reasoning provided a significant step up in performance.
They specifically highlighted that Sonar models were more responsive than Llama huge, although they could not provide A/B testing data due to scrapping.

Changing Model Configuration Impacts Results: A user mentioned that they switched to another model with only a change in the configuration file, experiencing unexpected quality issues with sonar-reasoning.
They expressed concern over a decline in response consistency, feeling uncertain if it was a true issue or a perception problem.

Inquiry About API Image Capabilities: A user inquired about the capability to send images via the API, seeking insights from other members.
This topic sparked some interest as it relates to API functionalities, although no direct responses were logged regarding image transmission.

Link mentioned: no title found: no description found

OpenRouter (Alex Atallah) ▷ #app-showcase (2 messages):

Weaver Demo, Chrome Extension by Amir 

Introducing Weaver, a Powerful LLM Tool: The Weaver demo showcases a highly configurable platform enabling users to bring their own keys/models and databases for enhanced performance.
Features include PDF support for Gemini and Anthropic, image/text-based file support, and branching chats, making it a versatile tool for various applications.

Amir's Versatile Chrome Extension Launched: A member announced a new powerful Chrome extension that can turn any content into a preferred style, including translation, simplification, and summarization.
The extension is fully open-source and only requires an OpenAI-compatible API, further details can be found on its GitHub page.

Links mentioned:

Weaver: no description found
Tweet from Amirreza (@amirsalimiiii): Just cooked up a powerful Chrome extension! Turn any content into your preferred style—translate, simplify, summarize, you name it. 🔥🛠️ Fully open-source & only needs an OpenAI-compatible API.Check ...

OpenRouter (Alex Atallah) ▷ #general (235 messages🔥🔥):

Model Access and Features, OpenRouter Documentation, DeepSeek Model Performance, API Usage and Integration, Reverse Engineering Concerns 

Speculation on Model Launches: Users speculated about an imminent model launch, suggesting that certain indicators in the community hinted at this possibility.
Discussions hinted at the excitement amongst users about potential new capabilities being introduced.

Concerns over API Access and Functionality: Users expressed frustration with the API functionality, particularly regarding the DeepSeek model returning an internal server error (500) and issues with reasoning content.
Some users shared their experiences with integrating various models and noted limitations in API responses.

OpenRouter Documentation Issues: The OpenRouter documentation was criticized for focusing mainly on OpenAI's API, leaving users who utilize other services, like Anthropic, without sufficient guidance.
Users anticipated future updates to documentation that would accommodate a broader range of API integrations.

Ethics of Reverse Engineering LLM APIs: There was discussion regarding the legality and ethical implications of reverse engineering APIs to create cheaper versions of existing models.
Participants raised concerns about the impact of such actions on legitimate services and the AI ecosystem as a whole.

User Experiences with Different Models: Some users reported positive experiences with newer models like Grok-3, while others noted inconsistencies, especially with response efficiency in DeepSeek R1.
Concerns were shared about the performance of certain models when handling specific types of queries, highlighting varying levels of output quality.

Link mentioned: OpenRouter: A unified interface for LLMs. Find the best models & prices for your prompts

Stability.ai (Stable Diffusion) ▷ #general-chat (231 messages🔥🔥):

Hiring Stable Diffusion Expert, Flux and SD Models Discussion, Stability Matrix Configuration, Using Civitai for Models, Image Generation Challenges 

Hiring for Stable Diffusion Project: A user expressed the need to hire someone adept at Stable Diffusion and traditional art for a project, which drew mixed responses from the community.
Some members advised handling it personally for better learning, while others raised concerns about the user's credibility based on their social media activity.

Exploring Flux and SD Models: Discussion ensued about different Stable Diffusion models, particularly the Flux and SD3.5, with recommendations for beginners focusing on SDXL.
It was noted that many reputable models require API keys or agreements, leading to frustration among users trying to access high-quality generation capabilities.

Configuring Stability Matrix: Users discussed challenges in configuring the Stability Matrix interface and how to effectively manage checkpoints and model installations.
Advice was offered on downloading models, including checking for NSFW content to fully access available presets.

Using Civitai for Model Access: Users explored the difficulties of finding and downloading models from Civitai as many models require agreeing to licensing terms.
Emphasis was placed on following correct policies when accessing flux models, detailing the need to agree to a non-commercial license.

Common Image Generation Issues: New users voiced their frustrations over generating high-quality images using different settings and models, often resulting in poor outputs.
More experienced users suggested trial and error, emphasizing a straightforward approach to tweaking individual settings for optimal results.

Links mentioned:

black-forest-labs/FLUX.1-dev at main: no description found
tencent/Hunyuan3D-2 at main: no description found
Webui Installation Guides: Stable Diffusion Knowledge Base (Setups, Basics, Guides and more) - CS1o/Stable-Diffusion-Info
Proteus - v0.6 | Stable Diffusion XL Checkpoint | Civitai: Proteus v0.6 I'm excited to introduce Proteus v0.6 , a complete rebuild of my AI image generation model. This is the first version of the rework , ...

aider (Paul Gauthier) ▷ #general (179 messages🔥🔥):

Grok 3 Performance, DeepSeek Capabilities, AI Pricing Models, Transformers vs. LSTMs, Tech CEO Opinions 

Grok 3 impresses users: Many users are praising Grok 3 for its superior performance, with one stating it's better than O1 Pro, achieving good quality output with less effort.
Grok 3's 'Think' feature reportedly enhances its capabilities, making conversations engaging and efficient.

DeepSeek is gaining traction: SambaNova claims to have the fastest deployment of DeepSeek-R1, achieving 198 tokens/sec with 16 chips, significantly outperforming GPUs.
Their results indicate that DeepSeek can execute complex tasks more efficiently and could disrupt existing AI performance standards.

Concerns over AI pricing models: Users discuss the high cost of AI services, revealing mixed feelings about paying for Grok 3, citing its desire to provide free access despite premium plans.
The sentiment reflects a broader criticism of monetization strategies in the AI industry, with calls for more affordability.

Transformers and RNNs Comparison: A debate emerged regarding whether transformers are similar to graph neural networks and if LSTMs may outperform them in certain aspects.
Participants acknowledged that while both architectures have their strengths, transformers seem to be favored in current implementations.

Skepticism towards Tech CEOs: There is a growing sentiment among users expressing distrust towards tech CEOs, with some suggesting all should be considered 'rotten' in today's landscape.
Conversely, Paul Gauthier received some praise for being seen as an exception to this trend, sparking discussions about the integrity of tech leadership.

Links mentioned:

Should Nvidia be worried? Plucky inference rival replaces 320 Nvidia GPUs with 16 reconfigurable dataflow units: SambaNova recorded 198 tokens per second using just 16 custom-built chips
OpenAI vs. DeepSeek: Which AI Understands Kotlin Better? | The Kotlin Blog: Which AI model understands Kotlin best? We tested DeepSeek-R1, several OpenAI models, and more using Kotlin-specific benchmarks. See how they compare in our analysis.
SambaNova Launches the Fastest DeepSeek-R1 671B with the Highest Efficiency: SambaNova announces that DeepSeek-R1 671B is running today on SambaNova Cloud at 198 tokens per second - speeds & efficiency no other platform can match.
New Claude features signal major update ahead of Amazon event: Discover Anthropic's upcoming features, including new reasoning tools and web search capabilities, set to launch with new models. Stay tuned for more updates!
Burgerkingguy GIF - Burgerkingguy - Discover & Share GIFs: Click to view the GIF
Tweet from Yuchen Jin (@Yuchenj_UW): I can finally say:Grok 3 is my bestie.- much faster than GPT-4o- the "Think" mode works perfectly with the prompt guideline below- cheaper- I prefer their UI over ChatGPT and Claude (am i a lo...
Tweet from Elon Musk (@elonmusk): Try Grok voice in unhinged mode 😂

aider (Paul Gauthier) ▷ #questions-and-tips (38 messages🔥):

Switching between editor and architect modes, Managing repositories with Aider, Using Aider with ignored files, Differences between architect and code modes, Updating files in real-time within Aider 

Switching between editor and architect modes: Users are confused about how to effectively switch between AIDER_MODEL and AIDER_EDITOR_MODEL settings within their environment for different editing needs.
Suggestions include using the --edit-format or running Aider with specific configuration options to address the challenges.

Managing repositories with Aider: A user expressed the desire to have CHECKLIST.md managed by Aider but found it conflicted with Git's ignore settings.
Another member suggested steps to manage the file while keeping it out of Git's tracking and ensuring it was usable during Aider sessions.

Using Aider with ignored files: Challenges arose when trying to add files that are ignored in Git, leading to discussion about potential workarounds to allow Aider to see and utilize these files.
The suggestion was made to temporarily remove the ignore rule to refresh Aider's state with the file.

Differences between architect and code modes: A discussion highlighted that implementations in architect mode differ significantly from those in code mode, potentially due to varied prompts and non-deterministic model behaviors.
Users have pondered if exploring the codebase could shed light on the distinct workings of the modes.

Updating files in real-time within Aider: Several users inquired whether making real-time changes to a file added to the Aider chat would reflect the latest updates for the AI model to process.
It was noted that checking with the --chat-history-file option could help verify if updates are captured correctly during chat sessions.

Links mentioned:

You are using Cursor AI incorrectly...: I'm hesitant to give this advice away for free, but I'm gonna push past it and share it anyway. You're using Cursor incorrectly.Over the last few weeks I've been doing /zooms with ...
Edit formats: Aider uses various “edit formats” to let LLMs edit source files.
Aider in your IDE: Aider can watch your files and respond to AI comments you add in your favorite IDE or text editor.
Advanced model settings: Configuring advanced settings for LLMs.
Blaming aider/aider/resources/model-settings.yml at 1f4a63d6db59a5c2f975ae4eac66511dee27b809 · Aider-AI/aider: aider is AI pair programming in your terminal. Contribute to Aider-AI/aider development by creating an account on GitHub.

aider (Paul Gauthier) ▷ #links (3 messages):

AI Assisted Coding, LLMs Productivity, Benchmarking Models 

Critique on AI Coding Channels: A member shared a video revealing an anti-AI coding perspective, stating that LLMs are pretty useless on their own.
They suggested that benchmarking for 'AI assisted coding' should focus on comparing productivity against an un-assisted baseline.

Need for Real-World Benchmarking: Another member emphasized the necessity to benchmark models that reflect real-world usage more accurately.
While some assert LLMs are useless, this member found them to be a major productivity boost, contingent upon specific use cases.

Boost in Productivity with Aider: A member confirmed that using Aider has noticeably improved their output, code quality, and understanding.
They expressed that their experience highlights the significant advantages of AI assistance in coding tasks.

Nous Research AI ▷ #general (184 messages🔥🔥):

Grok Bars Interpretation, VRAM Requirements for Fine-Tuning, MiniCPM-o 2.6 Release, AI-Assisted Coding Education, Self-Improving AI Agents 

Interpretation of Light Blue Shading on Grok Bars: Discussion arose regarding what the light blue shading on Grok bars indicates, particularly related to cons@64, with an image presented for analysis.
Members expressed curiosity about the technical meanings behind visual indicators in model performance data.

Estimating VRAM for Fine-Tuning Contextual Models: A conversation highlighted the VRAM requirements for fine-tuning different model sizes, specifically querying how much VRAM is necessary for managing long context windows in training.
Suggestions were made to test different context lengths to better understand memory usage and scalability.

Launch of MiniCPM-o 2.6: The release of MiniCPM-o 2.6 marked significant improvements in multimodal capabilities, and it topped trending lists on platforms such as GitHub and Hugging Face.
The model offers enhanced performance for vision, speech, and live streaming applications, with a dedicated technical report released.

Value of Learning Coding Basics: Users discussed the importance of having a foundational understanding of coding, especially in Python and JavaScript, to effectively collaborate with AI tools.
Participants shared resources and personal goals related to learning programming languages before graduating high school.

Development of Self-Improving AI Agents: A user reported building an AI agent capable of editing its own instructions in real-time to enhance flexibility and self-improvement in task completion.
There was interest in measuring the performance improvements of such a dynamic system in real-world applications.

Links mentioned:

The Ultra-Scale Playbook - a Hugging Face Space by nanotron: no description found
Equilibrium Propagation: Bridging the Gap Between Energy-Based Models and Backpropagation: We introduce Equilibrium Propagation, a learning framework for energy-based models. It involves only one kind of neural computation, performed in both the first phase (when the prediction is made) and...
Equivalence of Equilibrium Propagation and Recurrent Backpropagation: Recurrent Backpropagation and Equilibrium Propagation are supervised learning algorithms for fixed point recurrent neural networks which differ in their second phase. In the first phase, both algorith...
Generalization of Equilibrium Propagation to Vector Field Dynamics: The biological plausibility of the backpropagation algorithm has long been doubted by neuroscientists. Two major reasons are that neurons would need to send two different types of signal in the forwar...
SmolVLM2: Bringing Video Understanding to Every Device: no description found
openbmb/MiniCPM-o-2_6 · Hugging Face: no description found
NovaSky-AI/Sky-T1_data_17k · Datasets at Hugging Face: no description found
MiniCPM-o 2.6:  An 8B size, GPT-4o level Omni Model runs on device: 💥 Introducing our MiniCPM-o 2.6:  An 8B size, GPT-4o level Omni Model runs on device✨ Highlights:Match GPT-4o-202405 in vision, audio and multimodal live st...

Nous Research AI ▷ #ask-about-llms (4 messages):

Cursor vs Groq performance, Training Small Reasoning Model, Model size impacts, Recent AI research paper 

Cursor + Claude 3.5 outshines Groq for coding: A member expressed that Cursor + Claude 3.5 still outperforms Groq for coding purposes in their experience.
Struggles in training Small Reasoning Model: A member reported difficulties training a Small Reasoning Model, especially with the format_reward function constantly outputting 0.0.
Despite several attempts, the model failed to generate the specified format, raising concerns about its efficacy due to its small size.

Discontent with the 0.5B model size: Members discussed that the 0.5B model was ineffective, leading them to discard it in favor of the 1.5B model.
The sentiment was that even the 1.5B model might still be too small for their needs.

Mention of helpful recent research: In the midst of troubleshooting, a member brought up a newly released paper, which might provide insights into their challenges.
They referenced a link to the paper found here.

Link mentioned: qwen2.5-3B-openmath-grpo: Explore and run machine learning code with Kaggle Notebooks | Using data from No attached data sources

Nous Research AI ▷ #research-papers (6 messages):

Equilibrium Propagation, Vector Field Dynamics, Recurrent Backpropagation, Arcee-Maestro-7B, AlphaMaze Visual Reasoning 

Equilibrium Propagation enhances learning frameworks: Equilibrium Propagation introduces a learning method for energy-based models that computes objective function gradients while simplifying the training phases without needing special computations.
It makes it more plausible that a similar mechanism to Backpropagation could be biologically implemented.

Generalizing Equilibrium Propagation to vectors: The new framework addresses biological realism in backpropagation algorithms by introducing a two-phase learning mechanism that reduces reliance on symmetric connections.
This algorithm approximates the objective function's gradient, proving related to the symmetry of network weights.

Equivalence between learning algorithms established: Recurrent Backpropagation and Equilibrium Propagation have been connected, showing that temporal derivatives in the former align with error derivatives in the latter during training.
This supports the idea that biological networks might utilize temporal activity for error signaling.

Arcee-Maestro-7B Preview showcases AI advancements: The Arcee-Maestro-7B-Preview model demonstrates enhanced reasoning capabilities using reinforcement learning techniques built upon the Qwen2.5 architecture.
It shows significant improvements in mathematical and coding tasks, indicating promising future developments.

AlphaMaze teaches LLMs visual reasoning skills: The AlphaMaze project is live, showcasing a model trained to solve maze puzzles, improving from 0% to 93% accuracy through two-phase training methods.
This advancement opens new possibilities for applications in robotics and navigation tasks requiring visual reasoning.

Links mentioned:

Equilibrium Propagation: Bridging the Gap Between Energy-Based Models and Backpropagation: We introduce Equilibrium Propagation, a learning framework for energy-based models. It involves only one kind of neural computation, performed in both the first phase (when the prediction is made) and...
Equivalence of Equilibrium Propagation and Recurrent Backpropagation: Recurrent Backpropagation and Equilibrium Propagation are supervised learning algorithms for fixed point recurrent neural networks which differ in their second phase. In the first phase, both algorith...
Generalization of Equilibrium Propagation to Vector Field Dynamics: The biological plausibility of the backpropagation algorithm has long been doubted by neuroscientists. Two major reasons are that neurons would need to send two different types of signal in the forwar...

Nous Research AI ▷ #interesting-links (3 messages):

Reinforcement Learning for LLMs, SmolVLM2 updates, Small video models 

Reinforcement Learning Explained for Beginners: A member shared a thread piecing together first-principles concepts of Reinforcement Learning (RL) for LLMs, aimed at newcomers without a prior RL background.
The core takeaway emphasized that RL enables learning from rewards rather than demonstrations, expanding supervision methods.

SmolVLM2's Enhanced Visual Understanding: An article on Hugging Face highlighted that SmolVLM2 can now 'watch' videos with improved visual comprehension, making it a notable advancement in technology.
There’s excitement around how small video models are evolving, demonstrating significant capacity while maintaining minimal size.

Links mentioned:

Tweet from Shashwat Goel (@ShashwatGoel7): I pieced together this first-principles no RL prerequisites explainer on how RL for LLMs works, and why we need it🧵The main point? RL is exciting because it allows us to scale supervision. We can now...
SmolVLM2: Bringing Video Understanding to Every Device: no description found

Nous Research AI ▷ #research-papers (6 messages):

Equilibrium Propagation, Generalization of Equilibrium Propagation, Recurrent Backpropagation, Arcee-Maestro-7B-Preview, AlphaMaze 

Equilibrium Propagation connects energy models and backpropagation: The introduction of Equilibrium Propagation presents a learning framework for energy-based models, using a single neural computation phase for both prediction and error propagation.
This framework makes it plausible that a backpropagation-like mechanism could exist in biological systems since it effectively performs inference and error correction in one model.

Vector Field Dynamics extends Equilibrium Propagation: The generalization of Equilibrium Propagation for fixed point recurrent networks addresses the biological plausibility of backpropagation by allowing local weight updates without energy function dependence.
This approach approximates objective function gradients, offering insights into the connection between feedforward and feedback weight symmetry in neural networks.

Equivalence found in two learning algorithms: Equilibrium Propagation and Recurrent Backpropagation share a close relationship, where the error derivatives in Recurrent Backpropagation align with the temporal derivatives of neural activities in Equilibrium Propagation during training.
This suggests that the computation of error derivatives does not require a side network, hinting at potential adaptations in biological brain functions.

ArceeAI launches promising reasoning model: Arcee-Maestro-7B-Preview, Arcee's first reasoning model, shows improved mathematical and coding abilities through reinforcement learning techniques based on DeepSeek-R1 distillation.
This model demonstrates enhanced performance across tasks, building on existing frameworks and significant training advancements.

AlphaMaze teaches visual reasoning to LLMs: The AlphaMaze project trains a decoder-only model to improve visual-spatial reasoning, achieving 93% accuracy in maze-solving tasks after a two-step training process.
By allowing language models to 'see' spatial relationships, it opens up new possibilities for applications in robotics and navigation.

Links mentioned:

Equilibrium Propagation: Bridging the Gap Between Energy-Based Models and Backpropagation: We introduce Equilibrium Propagation, a learning framework for energy-based models. It involves only one kind of neural computation, performed in both the first phase (when the prediction is made) and...
Generalization of Equilibrium Propagation to Vector Field Dynamics: The biological plausibility of the backpropagation algorithm has long been doubted by neuroscientists. Two major reasons are that neurons would need to send two different types of signal in the forwar...
Equivalence of Equilibrium Propagation and Recurrent Backpropagation: Recurrent Backpropagation and Equilibrium Propagation are supervised learning algorithms for fixed point recurrent neural networks which differ in their second phase. In the first phase, both algorith...

GPU MODE ▷ #general (22 messages🔥):

ROCm support for 6700XT, Discussion on AI paper response, Chinese language channel interest, Open Sourcing from DeepSeek AI, GPU cluster utilization issues 

ROCm on 6700XT gains traction: Members discussed successfully running ROCm on the RX 6700XT despite initial doubts, with tips shared regarding installation across Linux and Windows.
One member detailed that using pacman -S rocm on Arch Linux enabled them to access full ROCm functionality.

AI paper discussion sparks debate: A member praised the paper for its valuable ideas but cautioned against relying solely on ncu or LLMs as verifiers without human oversight, advising a careful approach to performance evaluations.
This led to inquiries regarding what was meant by the statement, emphasizing the need for knowledgeable individuals in the review process.

Interest in a Chinese language channel: A proposal for a Chinese language channel was made, highlighting the emergence of ML systems content on platforms like Zhihu and encouraging more discussions.
Members reflected on the diminishing language barriers due to modern LLMs, fostering inclusivity and collaboration in technical communities.

DeepSeek AI gears up for Open Source Week: DeepSeek AI announced an upcoming open-sourcing event, sharing plans to release five repositories and engage with the community on AGI development.
The team emphasized their commitment to transparency and community-driven innovation, showcasing their work documented and deployed in production.

GPU cluster utilization drops: A user reported issues with a 96 GPU cluster wherein GPU utilization drops to 0 after several hours and jobs default to CPU processing, prompting restarts and driver reinstalls.
They suspected potential memory leaks or library misutilization, querying the community for strategies to manage reliability in GPU operations.

Links mentioned:

Tweet from DeepSeek (@deepseek_ai): 🚀 Day 0: Warming up for #OpenSourceWeek! We're a tiny team @deepseek_ai exploring AGI. Starting next week, we'll be open-sourcing 5 repos, sharing our small but sincere progress with full tra...
Reddit - Dive into anything: no description found

GPU MODE ▷ #triton (5 messages):

TMA Descriptor in Triton, GridQuant Example, BLOCK_SIZE and num_warps Interaction, Thread Block Size Clarification 

Exploring TMA Descriptor Usage in Triton: A user is seeking clarity on how the TMA descriptor is utilized in Triton, referencing the persistent matrix multiplication tutorial. They find the example somewhat hard to follow.
Niconunezz's GridQuant Example: A member shared an easier-to-follow example with the GridQuant project on GitHub, which aims to implement grid quantization.
The link shared provides a visual overview and details of the GridQuant implementation, enhancing understanding.

Understanding BLOCK_SIZE and Thread Block Size: A user is examining the relationship between BLOCK_SIZE, num_warps, and true thread block size in Triton. They are trying to determine whether their assumptions regarding the constraints on these parameters are accurate.
Clarification on BLOCK_SIZE vs Thread Block Size: Another member clarified that BLOCK_SIZE should not be equated with thread block size in CUDA C++, suggesting it is more appropriate to refer to it as TILE_SIZE. They explained that num_warps is what governs the number of threads in a block.

Links mentioned:

GitHub - niconunezz/GridQuant: An attempt to implement GridQuant: An attempt to implement GridQuant. Contribute to niconunezz/GridQuant development by creating an account on GitHub.
Persistent Matmul — Triton  documentation: no description found

GPU MODE ▷ #cuda (6 messages):

GEMM Kernels in CUTLASS, Memory Configuration in CUTLASS, TF32 NT Kernel Development, 1D Block Scaling in cuBLAS 

GEMM Kernels Optimize Using Warp Specialization: In CUTLASS, GEMM kernels leverage warp specialization by using barriers for intra-warp communication, enabling pipelining with efficient loading from global to shared memory.
A specific implementation detail is exemplified in the CUTLASS GitHub repository.

Reduced Conflicts with GEMM Configuration Changes: A change in the default configuration of CUTLASS reduced conflicts from 65k per block to 6 by modifying parameters from <3,2,3> to <2,3,2>.
This highlights the importance of configuration in achieving optimal memory access patterns in tensor operations.

Seeking TF32 16x8x8 NT Kernel Solutions: A member is on the lookout for a TF32 16x8x8 NT kernel, indicating a need for efficient tensor processing strategies.
This search points to ongoing gaps in available implementations for specific matrix sizes.

Complex GMEM Offset Math in Device Code: There is an ongoing discussion about implementing a batched strided SYRK with a focus on efficient GMEM offset calculations where bM == bN.
The member's challenges highlight the intricacies of leveraging standard CUTLASS components for non-standard operations.

Clarification Needed on cuBLAS 1D Block Scaling: A member questions the cuBLAS documentation stating that 1D Block Scaling formats do not support transposition, raising concerns about the feasibility of applying scaling factors.
The ambiguity revolves around whether tensor transposition necessitates unique scaling factors for each element, complicating implementation in TensorEngine.

Link mentioned: cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_rs_warpspecialized.hpp at main · NVIDIA/cutlass: CUDA Templates for Linear Algebra Subroutines. Contribute to NVIDIA/cutlass development by creating an account on GitHub.

GPU MODE ▷ #torch (3 messages):

Parameter dtype casting in PyTorch, Dynamic quantization in attention mechanisms, INT8 vs FP8 weight behavior, Overriding .to method in PyTorch 

Parameter dtype casting in PyTorch: A member highlighted that the straightforward method for changing parameter dtypes is to use model.named_parameters() and iterate through them.
They cautioned that this approach might not work when gradients are needed, as PyTorch enforces a uniform dtype for all activations and gradients.

Runtime Hadamard transform for dynamic quantization: Questions were raised about performing the Hadamard transform at runtime for proper quantization smoothing in dynamically quantized attention mechanisms.
It was suggested that this dynamic aspect necessitates specific runtime handling.

INT8 and FP8 weight behavior differences: A member noted that their INT8 weights remain unchanged, while FP8 weights do vary under current conditions.
This observation prompted a discussion about potential methods for overriding the .to method to handle these discrepancies.

GPU MODE ▷ #announcements (1 messages):

GPU MODE Meetup in San Jose, SemiAnalysis Blackwell Hackathon, Beyond CUDA Summit, CUDA Developer Talks 

GPU MODE Meetup Fires Up in San Jose: Join us on Sunday, March 16 from 5-10 PM for the second GPU MODE in-person meetup at this link, focusing on pressing ML Systems issues with talks from notable speakers like Christos Kozyraki and Simran Arora.
Enjoy espressos and cocktails while networking with around 150 engineers during this casual get-together.

Hackathon Ahead of GTC: Join SemiAnalysis!: On the same day, SemiAnalysis hosts a Blackwell Hackathon from 9 AM to 5 PM, offering a day filled with engaging keynotes and hands-on GPU programming found here.
Get ready for exciting talks and hacking perfect for those interested in NVIDIA GPU programming.

Beyond CUDA: An Exclusive Summit: Don’t miss the Beyond CUDA summit on March 17, featuring speakers like Gregory Diamos and Ashish Vaswani focusing on the future of AI compute at this link.
Enjoy free food, drinks, cloud credits, and AMD MI210 GPU giveaways during this can’t-miss event!

Dive Deep with NVIDIA's CUDA Developer Talks: From March 17-21, NVIDIA will host a series of CUDA developer talks, aimed at creating high-performance, GPU-accelerated applications, available at NVIDIA's website.
Don’t forget to use code GPUMODE for a 20% discount on registration!

Links mentioned:

GPU MODE @ GTC · Luma: This is a casual get together for ML Systems nerds. We'll start off with an hour's worth of inspiring talks discussing some important open problems in ML…
Hackathon 2025: SemiAnalysis is kicking things off ahead of NVIDIA GTC! Start your day with engaging morning keynotes, hack all day with low-level NVIDIA GPU programming (maybe even Blackwell), take a breather wit…
Beyond CUDA Summit · Luma: 🚀 The Future of AI Compute is upon us — and It’s Beyond CUDA.Join us for an exclusive summit bringing together AI founders, researchers, and engineers, to…
NVIDIA GTC AI Conference 2025: March 17–21, 2025. San Jose. Register Now.

GPU MODE ▷ #algorithms (1 messages):

GRPO VRAM Reduction, Extended Context Lengths, Gradient Checkpointing, Linear Cross Entropy Implementation 

Unsloth Achieves 90% VRAM Reduction: Unsloth has managed to make GRPO fit on just 5GB VRAM for Qwen2.5-1.5B, drastically reducing VRAM usage by 90%.
This feat extends average context lengths by 10x, optimizing resources for training reasoning models.

Incredible VRAM Saving Benchmarks: A standard GRPO setup for Llama 3.1 (8B) requiring 510.8GB VRAM at 20K context is reduced to just 54.3GB with Unsloth's support.
This breakthrough showcases the effectiveness of the new algorithms launched by Unsloth.

Inspired by Prior Innovations: The new algorithms leverage a previous gradient checkpointing method along with inspiration from Horace He’s linear cross entropy implementation.
These combined techniques contribute to the substantial improvements in efficiency for reasoning model training.

Launch Announcement from Unsloth: UnslothAI's launch announcement highlights algorithms enabling 10x longer context lengths and 90% less VRAM for training reasoning models.
They emphasize that using Unsloth allows training with 5GB VRAM without any loss in accuracy.

Details Available on Unsloth Blog: More insights into this advancement can be found in their blog post detailing the GRPO algorithms.
The blog elaborates on how these innovations affect the training landscape for reasoning models.

Link mentioned: Tweet from Unsloth AI (@UnslothAI): Today, we’re launching new algorithms that enable 10x longer context lengths & 90% less VRAM for training Reasoning Models (GRPO).Using Unsloth, you can now train your own reasoning model with just 5G...

GPU MODE ▷ #cool-links (13 messages🔥):

Nanotron by Hugging Face, Hopper GPU Architecture, HadaCore and Hadamard Transforms, MLGym Framework, CuAsmRL for GPU Scheduling 

Hugging Face's Nanotron Development: Members discussed Nanotron, a project for minimalistic large language model 3D-parallelism training available on GitHub. One remarked, "this is awesome", reflecting positive interest in the resource.
Another member humorously mentioned, "hopefully I’m paid for reading it, with attention drawn to its Francophone authors.

Examining Nvidia's New Hopper GPU: A research paper titled Benchmarking and Dissecting the Nvidia Hopper GPU Architecture focuses on novel attributes of the Hopper GPU, including new tensor cores. It aims to reveal microarchitectural intricacies critical for optimizing GPU programs.
The abstract notes that existing research highlights the need for understanding hardware features to enhance performance in AI workloads.

Insights on HadaCore and Quantization Techniques: HadaCore leverages Hadamard Transforms to optimize quantization in LLMs for improved inference speeds, as detailed in a PyTorch blog post. Researchers presented methods to enhance accuracy during quantization processes to counteract potential loss from outliers.
These techniques contribute to significant performance gains for various GPU architectures.

Automatic Optimization of GPU SASS Schedules: A forthcoming paper titled CuAsmRL: Optimizing GPU SASS Schedules via Deep Reinforcement Learning proposes using RL to automate GPU SASS scheduling for better performance. The research highlights how traditional manual optimization tactics can be effectively supplanted with trained RL agents.
The proposed method aims to integrate seamlessly into existing compiler frameworks, significantly enhancing GPU performance during computation.

Links mentioned:

Benchmarking and Dissecting the Nvidia Hopper GPU Architecture: Graphics processing units (GPUs) are continually evolving to cater to the computational demands of contemporary general-purpose workloads, particularly those driven by artificial intelligence (AI) uti...
HadaCore: Tensor Core Accelerated Hadamard Transform Kernel: Quantization is a method for improving model inference speeds by compressing model weights and performing (faster) computation in lower precision data types. However, quantization can result in accura...
CuAsmRL: Optimizing GPU SASS Schedules via Deep Reinforcement Learning: Large language models (LLMs) are remarked by their substantial computational requirements. To mitigate the cost, researchers develop specialized CUDA kernels, which often fuse several tensor operation...
Xiaowen Chu: IEEE Fellow, Professor, Data Science and Analytics, HKUST(GZ) - Cited by 13,555 - GPU Computing - Machine Learning Systems - Parallel and Distributed Computing - Wireless Networks
How to save memory by fusing the optimizer step into the backward pass — PyTorch Tutorials 2.6.0+cu124 documentation: no description found
GitHub - huggingface/nanotron: Minimalistic large language model 3D-parallelism training: Minimalistic large language model 3D-parallelism training - huggingface/nanotron
GitHub - facebookresearch/MLGym: MLGym A New Framework and Benchmark for Advancing AI Research Agents: MLGym A New Framework and Benchmark for Advancing AI Research Agents - facebookresearch/MLGym

GPU MODE ▷ #jobs (4 messages):

A5Labs ML Engineer hiring, Cohere Low Level Performance Engineers, Nebius DevRel Advocate position, Beam Platform and Infrastructure Engineers 

A5Labs seeks ML Engineer for Reinforcement Learning: A5Labs is hiring a remote ML Engineer focused on reinforcement learning and gaming. Interested candidates can find more details in the job description.
Cohere looking for Performance Engineers: Cohere is hiring low level performance engineers to work on scaling pretraining language models, focusing on cutting-edge hardware. More details can be found here.
Nebius seeks DevRel Advocate: Nebius is hiring a DevRel Advocate in the US to drive adoption of their AI Cloud services. Interested candidates can see the job details here.
Beam hiring Platform and Infrastructure Engineers: Beam is looking for platform and infrastructure engineers with a focus on high-performance AI infrastructure, offering remote roles in NYC with competitive compensation. Current projects include GPU container checkpoint/restore and contributions to upstream projects like runc and k3s.

Links mentioned:

Member of Technical Staff, Training Performance Engineer: Training Performance Engineer, Modeling
no title found: no description found
Developer Relations: United States
We’re hiring! - A5 Labs: Career Center We’re hiring! Join the A5 Labs Team
AI Infrastructure for Developers: no description found
GitHub - NVIDIA/cuda-checkpoint: CUDA checkpoint and restore utility: CUDA checkpoint and restore utility. Contribute to NVIDIA/cuda-checkpoint development by creating an account on GitHub.

GPU MODE ▷ #beginner (7 messages):

Fine-tuning LLM with DDP, Single GPU vs. DDP with 2 GPUs 

Concerns over loss values in DDP training: A member expressed trouble fine-tuning LLM with DDP, noting that the loss values remain the same between single GPU and DDP with 2 GPUs training steps.
If DDP was working, the loss value of the 1st step with DDP should be close to the loss value of the 2nd step without DDP.

DDP training step efficiency: The member pointed out that DDP appears to reduce the training steps to half, which raises concerns about effective model training.
The inconsistency in loss values suggests a potential issue with DDP functionality in this particular setup.

GPU MODE ▷ #pmpp-book (1 messages):

Code Evaluation, Grok3 Performance, C++ Memory Semantics 

Confirmed Code Correctness: The code has been confirmed as correct through extensive discussion and verification, as detailed here.
Grok3 demonstrated its ability to reason and articulate issues well, outperforming the responder's initial explanation.

Transition to C++ Memory Semantics: The new textbook will omit the existing code and shift entirely to C++ memory semantics.
This change signifies a strategic move towards embracing modern programming practices in the curriculum.

Link mentioned: CUDA Memory Ordering and Synchronization | Shared Grok Conversation: if (threadIdx.x == 0) { while(AtomicAdd(&flags[bid], 0) == 0) {} //  why do I not need thread fen

GPU MODE ▷ #off-topic (2 messages):

Unsloth.AI, Building PC for Llama 70B 

Unsloth.AI Transforming AI Development: The YouTube video features Daniel and Michael Han from Unsloth.AI discussing their innovative open-source project that makes model fine-tuning 2x faster.
This advances AI development significantly, emphasizing ease and efficiency for developers.

Specs for Running Llama 70B Locally: A member inquired about GPU and RAM specifications needed for building a PC with 2 TB storage to run Llama 70B locally.
This reflects ongoing interest in maximizing hardware for high-performance AI models.

Link mentioned: Start Up Wednesday with Unsloth.AI: Meet Daniel and Michael Han, the Australian brothers transforming AI development with Unsloth. Their open-source project makes model fine-tuning 2x faster wh...

GPU MODE ▷ #rocm (1 messages):
lynn4400: what do you guys use to see profiler results ?

GPU MODE ▷ #liger-kernel (1 messages):

Native Sparse Attention, Collaboration in liger 

Exploring Native Sparse Attention: A member inquired about interest in implementing Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention within the liger framework.
They expressed excitement about the possibility of collaborating on this initiative, inviting others to join in.

Call for Collaboration: The member encouraged others to come together to work on the Native Sparse Attention implementation in liger.
This collaborative spirit aims to leverage diverse insights and expertise within the community.

GPU MODE ▷ #self-promotion (9 messages🔥):

Open Source GPU Glossary, NUMA and CPU-GPU memory interactions, Hopper+ materials, Hardware architecture terms, GPU interface differences 

Open Source GPU Glossary Launched: The GPU Glossary is now open source on GitHub under a CC BY license, thanks to user contributions.
The community expressed appreciation for this development, indicating a desire for more comprehensive resources.

Request for NUMA and Memory Interactions Section: There was a suggestion to include a section on NUMA and CPU-GPU memory interactions in the glossary for clarity on memory access concepts.
This addition would benefit newcomers to GPU programming, making memory interactions less opaque.

More Hopper+ Material in the Works: The inclusion of additional materials on Hopper+ has been acknowledged as a priority, with a link to an issue for tracking progress.
Such content is planned but will be addressed in a larger project by the lead contributor.

Expanding Hardware Architecture Terms: Comments highlighted the need for further clarification on hardware architecture terms like 'math pipe', which aren't currently covered.
There is an interest in capturing the concepts surrounding Tensor Cores and their role as 'instruction pipes'.

Inclusion of GPU Interface Topics: A user suggested covering differences in GPU interfaces such as PCIe/SXM, NVLink, and Infiniband in the glossary.
The lead contributor confirmed that these topics are on the roadmap for future development, albeit needing more extensive effort.

Links mentioned:

GitHub - modal-labs/gpu-glossary: GPU documentation for humans: GPU documentation for humans. Contribute to modal-labs/gpu-glossary development by creating an account on GitHub.
Add more hardware architecture terms? · Issue #3 · modal-labs/gpu-glossary: Interesting comment on HackerNews about certain hardware terms, like "math pipe", that we don't cover and that are foreign to SWEs.
modal-labs/gpu-glossary: GPU documentation for humans. Contribute to modal-labs/gpu-glossary development by creating an account on GitHub.

GPU MODE ▷ #🍿 (4 messages):

The Fish in Codebases, MLGym Framework, TritonBench for Language Models, Sakana AI CUDA Engineer Story, Hardmaru's Optimizer Humor 

The Fish in Codebases: A member humorously noted how many codebases contain 'the fish' - variables or functions that are unused yet necessary to prevent a build explosion.
This highlights a universal struggle among developers to manage seemingly extraneous code.

MLGym Beta Launch: The MLGym repository introduces a new framework and benchmark aimed at advancing AI research agents.
This initiative could potentially push the boundaries of what AI agents can achieve in structured environments.

TritonBench for Evaluating Language Models: The TritonBench project focuses on benchmarking large language model capabilities for generating Triton operators.
It aims to standardize the evaluation of PyTorch operations within its benchmark suite.

Sakana AI CUDA Engineer Controversy: Frustrations arose over unverified results being publicized in the Sakana AI CUDA engineer's story, leading to unprofessional claims by the AI company.
Community reactions to this narrative reflect a deeper concern for integrity and realism in AI performance claims.

Hardmaru's Optimizer Humor: Hardmaru shared a humorous quote referencing an optimizer that created a bipedal walker robot which ‘solved’ its task by falling over.
This situation underscores the bizarre outcomes that can sometimes stem from AI optimizations when removed from strict design constraints.

Links mentioned:

Tweet from hardmaru (@hardmaru): After many years, this guy has come back to haunt me.Quoting hardmaru (@hardmaru) If we remove all design constraints, the optimizer came up with a really tall bipedal walker robot that “solves” the t...
GitHub - facebookresearch/MLGym: MLGym A New Framework and Benchmark for Advancing AI Research Agents: MLGym A New Framework and Benchmark for Advancing AI Research Agents - facebookresearch/MLGym
GitHub - thunlp/TritonBench: TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators: TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators - thunlp/TritonBench

GPU MODE ▷ #edge (7 messages):

3D Printing Parameters, Real-Time Translation with Audio Models, Whisper Model Functions 

Tuning Printers for File Compatibility: A member shared that any printer can print almost any file if the correct parameters are set in the slicer or firmware like OrcaSlicer or Klipper.
I've gotten used to having a tiny little printer where all the default files on it would need to be shrunk, added another member.

Exploring Real-Time Translation Capabilities: Inquiring about live translation, one member asked if audio models can achieve real-time capabilities with some systems effort.
The response indicated that Whisper can handle translation, though it’s not originally designed for real-time performance.

Using Whisper for Translation: In response to real-time translation discussions, another member suggested looking into Whisper and affirmed the need to prefix language tokens for it to perform translations.
They mentioned that while Whisper isn't real-time, modifications could allow some hacks to make it appear real-time.

GPU MODE ▷ #reasoning-gym (95 messages🔥🔥):

Code I/O dataset, Math reasoning tasks, Decimal number comparison, Model performance, API hosting for free 

Progress on Code I/O Implementation: The team is discussing the implementation of the Code I/O task, with a focus on generating dynamic inputs and verifying safety for execution.
Existing public sources are being considered for examples, and a collaborative approach is being suggested to manage the workload effectively.

Need for More Math Reasoning Tasks: There is a consensus among members regarding the lack of math and coding reasoning datasets, emphasizing the importance of this domain in training LLMs effectively.
Ideas include creating new datasets focused on tasks like random number generation and theorem proving, although concerns about dependencies are noted.

Discussion on Decimal Number Comparison PR: A PR for adding a decimal number comparison generator has been considered, but ultimately decided to be redundant with existing functionality in another file.
The PR was abandoned in favor of utilizing the already satisfactory implementation from the 'number_format' module.

Model Performance Insights: Users have reported that a smaller model (0.5B) failed to follow instructions, while a larger model (1.5B) performed better in generating outputs in the desired format.
This indicates potential scalability issues with model size directly affecting performance in achieving formatted outputs.

Exploring Motivation for Free API Hosting: Questions have been raised about the rationale behind offering APIs for free and what benefits the providers gain from this approach.
The team is analyzing the implications and potential advantages of making such advanced models accessible at no cost.

Links mentioned:

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data: Proof assistants like Lean have revolutionized mathematical proof verification, ensuring high accuracy and reliability. Although large language models (LLMs) show promise in mathematical reasoning, th...
FITTER documentation — fitter 1.7.1 documentation: no description found
Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.: A new tool that blends your everyday work apps into one. It's the all-in-one workspace for you and your team
Tweet from Homebrew Research (@homebrewltd): AlphaMaze: Teaching LLMs to think visuallyWe're excited to share our findings—now live in a blog, paper, and open-source release.Language models are good at words—but what about visual-spatial rea...
GitHub · Build and ship software on a single, collaborative platform: Join the world's most widely adopted, AI-powered developer platform where millions of developers, businesses, and the largest open source community build software that advances humanity.
qwen2.5-3B-openmath-grpo: Explore and run machine learning code with Kaggle Notebooks | Using data from No attached data sources
Tweet from Unsloth AI (@UnslothAI): Today, we’re launching new algorithms that enable 10x longer context lengths & 90% less VRAM for training Reasoning Models (GRPO).Using Unsloth, you can now train your own reasoning model with just 5G...
DeepSeek V3 (free) - API, Providers, Stats: DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported ev...
GitHub - hkust-nlp/CodeIO: CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction: CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction - hkust-nlp/CodeIO
GitHub - allenai/open-instruct: AllenAI's post-training codebase: AllenAI's post-training codebase. Contribute to allenai/open-instruct development by creating an account on GitHub.
Yes, We Can Verify That Too!: GRPO Beyond Mathematical Reasoning TasksI am currently working on a Quest. A quest to better understand how to build and refine language models without relying too heavily on pre-existing instruction ...
Add CODEI/O sampled subset dataset · Issue #160 · open-thought/reasoning-gym: DeepSeek published CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction. Task: "predict inputs/outputs given code and test cases entirely in natural language" Beside propr...
reasoning-gym/reasoning_gym/arithmetic/number_format.py at main · open-thought/reasoning-gym: procedural reasoning datasets. Contribute to open-thought/reasoning-gym development by creating an account on GitHub.
Add decimal number comparison by Adefioye · Pull Request #174 · open-thought/reasoning-gym: Python generator to compare decimals
Add induction-based tasks for list functions by Adefioye · Pull Request #180 · open-thought/reasoning-gym: I have added a couple of python generators for induction-based list functions tasks. The following are a few examples of the kind of generators created:Generate input and output pairs where input...
Add Bitwise Arithmetic by Miserlou · Pull Request #181 · open-thought/reasoning-gym: More arithmetic, this time with hex values and shift registers.Please solve this problem. Reply only with the final hexidecimal value.(((((0xeb51 - 0xa153) - (0x8afe * 0x2532)) << 0x1) -...
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models: no description found
GitHub - zhaoyu-li/DL4TP: [COLM 2024] A Survey on Deep Learning for Theorem Proving: [COLM 2024] A Survey on Deep Learning for Theorem Proving - zhaoyu-li/DL4TP
Tweet from Aran Komatsuzaki (@arankomatsuzaki): Meta presents:MLGym: A New Framework and Benchmark for Advancing AI Research Agents- The first Gym environment for ML tasks- 13 diverse and open-ended AI research tasks from diverse domains
GitHub - facebookresearch/MLGym: MLGym A New Framework and Benchmark for Advancing AI Research Agents: MLGym A New Framework and Benchmark for Advancing AI Research Agents - facebookresearch/MLGym

GPU MODE ▷ #gpu模式 (3 messages):

Triton content, Zhihu discussions 

Discovering Triton content on Zhihu: A member highlighted some top-tier Triton content they noticed on Zhihu early last year, suggesting others check it out.
They encouraged further sharing of cool references or discussions about Triton.

Zhihu praised as a Q&A platform: Another member acknowledged that Zhihu is considered a high-quality Chinese Q&A community.
This emphasizes the platform's value for finding insightful discussions.

Interconnects (Nathan Lambert) ▷ #news (94 messages🔥🔥):

OpenAI's Projected Revenue and Infrastructure Spending, Emerging AI Models and Reasoning Innovations, Open Source Developments in AI, Market Dynamics in AI Infrastructure, AI Agents Revenue Disparity 

OpenAI shifts focus from Microsoft to SoftBank: OpenAI's projected revenue and infrastructure spending indicates a shift from Microsoft to SoftBank, with plans for 8GW by 2030 and a focus on inference costs overtaking training costs in five years.
This indicates a significant pivot in strategic alliances within the AI infrastructure landscape.

NEO Gamma introduced amidst AI developments: The introduction of NEO Gamma signifies another step in the evolving landscape of AI models, indicating that new innovations are still emerging.
This aligns with ongoing discussions about the state of AI technologies and models in development.

Growing public interest in reasoning models: A new community resource called General Reasoning has been launched, focusing on building open reasoning models and enhancing community contributions.
This initiative reflects a growing interest in open-source advancements in AI reasoning capabilities.

Modal slashes H100 and A100 prices: Modal has begun to decline prices for their H100 and A100 GPU models, indicating potential shifts in AI hardware market competition.
This price adjustment may impact the accessibility and adoption of advanced AI model training across various organizations.

Revenue streams from AI agents differently categorized: There's curiosity about why AI agents revenue is reported separately from other categories like ChatGPT and the developer platform.
This classification raises questions about the monetization strategies being employed in the growing AI sector.

Links mentioned:

Tweet from DeepSeek (@deepseek_ai): 🚀 Day 0: Warming up for #OpenSourceWeek! We're a tiny team @deepseek_ai exploring AGI. Starting next week, we'll be open-sourcing 5 repos, sharing our small but sincere progress with full tra...
Bespoke Labs: no description found
Arcee AI: no description found
Tweet from Igor Kotenkov (@stalkermustang): Burning alt Man
Tweet from CN Wire (@Sino_Market): 🇨🇳CHINA SASAC HOLDS MEETING ON FEB 19, DEPLOYING AI INITIATIVE FOR CENTRAL AND STATE-OWNED ENTERPRISES, FOCUSING ON LARGE MODELS AND CORE TECH INNOVATION.#CHINA #AI #DEEPSEEK #LLMS https://mktnews.c...
Tweet from swyx 🗽 NYC (@aiDotEngineer) (@swyx): First time I’ve seen @AnthropicAI lay out its top priorities like thisfocusing more on mechinterp than Claude 4 now!great presentation from @ambricken and Joe Bayley!Quoting swyx 🗽 NYC (@aiDotEnginee...
Elon Musk quietly built a 2nd mega-data center for xAI in Atlanta with $700 million worth of chips and cables: Last year, xAI built a massive data center in Memphis, but the company has also been quietly setting up another facility in Georgia. 
Mercor, an AI recruiting startup founded by 21-year-olds, raises $100M at $2B valuation | TechCrunch: Mercor, the AI recruiting startup founded by three 21-year-old Thiel Fellows, has raised $100 million in a Series B round, the company confirmed to
Tweet from Super Dario (@inductionheads): He's been found he's been foundHas human bitemarks and an Alice in Chains tattooOtherwise he's fineHe says he'll see you next TuesdayQuoting Super Dario (@inductionheads) Claude has es...
ALLaM: Large Language Models for Arabic and English: no description found
Tweet from Sakana AI (@SakanaAILabs): Update:Combining evolutionary optimization with LLMs is powerful but can also find ways to trick the verification sandbox. We are fortunate to have readers, like @main_horse test our CUDA kernels, to ...
Tweet from hardmaru (@hardmaru): After many years, this guy has come back to haunt me.Quoting hardmaru (@hardmaru) If we remove all design constraints, the optimizer came up with a really tall bipedal walker robot that “solves” the t...
Tweet from Stephanie Palazzolo (@steph_palazzolo): OpenAI is shifting its alliance from MSFT to SoftBank over the next few years—1/3 of its growth this year alone will come from SoftBank.@coryweinberg @jon_victor_ @anissagardizy8 w/ some amazing detai...
Tweet from Anissa Gardizy (@anissagardizy8): scoop: Here’s a detailed breakdown of OpenAI’s projected rev + infra spend- spending will quickly shift from MSFT to Stargate - OAI plans to have 8GW by 2030- inference $ > training $ in 5 years w/...
Tweet from Tom Warren (@tomwarren): scoop: Microsoft is getting ready for OpenAI's GPT-5 model, and GPT-4.5 could arrive as soon as next week. All of this and more in this week's 📒 Notepad  issue, live now for subscribers 👇 ht...
Tweet from Ross Taylor (@rosstaylor90): 🎉  Excited to release General Reasoning: a new community resource for building open reasoning models. We’re looking to make personal, open reasoners a reality. Starting with a small step in that dire...
Tweet from Ross Taylor (@rosstaylor90): The data is freely available (MIT) via API:pip install agi (lol)or via a regular data dump that we’ll post to 🤗 Hugging Face periodically.We share questions, answers, traces, verifications and commun...
Tweet from 1X (@1x_tech): Introducing NEO Gamma. Another step closer to home.
Tweet from WanX (@Alibaba_WanX): 🌟 Big News from @alibaba_cloud! 🌟Meet WanX - our next-gen AI model redefining video generation !🚀 Presenting mind-blowing demos from WanX 2.1！🔥 Even more exciting:WanX 2.1 will be OPEN-SOURCE !Com...

Interconnects (Nathan Lambert) ▷ #ml-questions (12 messages🔥):

Benchmarking models, Parsing PDFs, O1 Pro limitations, Scraping benchmarks for reference, ODR model limitations 

Issues with Benchmark Table Creation: A member attempted to have o3-mini-high create a benchmark table for models InternVL2.5 and Qwen2.5-VL but faced major failures in the process.
Another member suggested parsing the PDFs elsewhere, highlighting limitations with O1 Pro when it comes to handling benchmarks.

Creative Solutions for Benchmarking: A member considered taking time off to develop a systematic approach to scrape benchmarks from papers and consolidate them into a single source for arxiv papers.
They expressed frustration with the current methods of benchmarking comparisons where results seem inconsistent across different papers.

Handling Contextual Issues with ODR: Concerns were raised about ODR's capability to manage comparisons between more than two models, with noted struggles at three models and failures with four or more.
The discussion highlighted the need for reliable benchmarking tools that can handle complex comparisons more efficiently.

O1 Pro's Limitations: Members expressed frustration with the limitations of O1 Pro, particularly in managing long contexts and generating coherent outputs.
One participant jokingly referenced O1 Pro while sharing an attached image, indicating ongoing challenges.

Interconnects (Nathan Lambert) ▷ #ml-drama (3 messages):

Sakana Leaderboard Update, Microsoft Quantum Computing Claims, Speedup on Conv3d Tasks 

Sakana addresses memory-reuse exploit: Sakana has updated their leaderboard to resolve the memory-reuse exploit issue found in their AI CUDA Engineer project, with details available here.
Currently, only one task, 23_Conv3d_GroupNorm_Mean, still exhibits a speedup greater than 100x, despite the engineer forgetting the convolution part, which the eval script failed to catch.

Doubts on Microsoft's quantum computing breakthrough: Microsoft recently proclaimed a breakthrough in quantum computing, though skepticism looms as two expert referees advised against publishing due to concerns on the evidence. They stated, “The results do not represent evidence of Majorana zero modes.”
This has sparked discussions about the integrity of their findings and the broader implications for quantum computing advancements.

Underpowered test harness reveals ConvTranspose3d flaws: A user reviewed the speedup on task 13_ConvTranspose3d_Mean_Add_Softmax_Tanh_Scaling, noting that achieving a 66x speedup seemed implausible, given the kernel simply writes a constant, ignoring its computation altogether.
Concerns were raised about the test harness being massively underpowered, leading to questionable validation of speedup results.

Links mentioned:

Tweet from Trevor Cai (@trevorycai): I looked at the next highest remaining speedup, on 13_ConvTranspose3d_Mean_Add_Softmax_Tanh_Scaling. Intuitively, 66x speedups on conv3d should not be possible.The kernel literally ignores its computa...
Tweet from alphaXiv (@askalphaxiv): Microsoft has claimed a breakthrough in quantum computing … but not everyone is convinced.A deeper look reveals that two expert referees strongly advised against publication, raising concerns about th...
Tweet from miru (@miru_why): sakana have updated their leaderboard to address the memory-reuse exploit https://sakana.ai/ai-cuda-engineer/#limitations-and-bloopersthere is only one >100x speedup left, on task 23_Conv3d_GroupNo...

Interconnects (Nathan Lambert) ▷ #random (51 messages🔥):

Licenses Confirmation for Qwen, Granite Vision Model Release, Anthropic AI's Employee Retention, GRPO Training for LLMs, Online Test for Hunyuan Video Model 

Confirmation on Qwen Licenses: Confirmation about the accuracy of the Qwen licenses was received, with members noting that the 7B model lacks a LICENSE file.
hell yeah, tyvm was the response highlighting community involvement and excitement.

Granite Vision Breakthrough: The GraniteVision from IBM Research was introduced as a compact 2B parameter vision-language model that excels in document understanding.
This model shows promising results despite its small size, demonstrating a trend of efficient AI advancements.

Anthropic AI's High Retention Rate: It was noted that Anthropic AI boasts the highest employee retention rate among major AI labs, indicating a positive work culture.
However, a member humorously pointed out the departure of a notable figure, expressing mixed feelings about retention effectiveness.

Innovations in GRPO Training: A new GRPO setup allows model training with just 5GB VRAM for Qwen2.5, drastically improving context length efficiency.
The reduction in VRAM requirements enables more accessible long-context training, appealing to model developers.

Hunyuan Video Model Test Invitation: An invitation was extended to participate in an online test for the Hunyuan Video model, with a link to a Google Form for sign-ups.
This invitation highlights ongoing innovations and collaborations in the AI community, encouraging participation from interested partners.

Links mentioned:

Tweet from swyx 🗽 NYC (@aiDotEngineer) (@swyx): First time I’ve seen @AnthropicAI lay out its top priorities like thisfocusing more on mechinterp than Claude 4 now!great presentation from @ambricken and Joe Bayley!Quoting swyx 🗽 NYC (@aiDotEnginee...
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines: Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 2...
Tweet from PapersAnon (@papers_anon): S*: Test Time Scaling for Code GenerationHybrid test-time scaling framework that enhances parallel samples through iterative debugging and then selects the best sample by prompting an LLM to generate ...
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence: We introduce Granite Vision, a lightweight large language model with vision capabilities, specifically designed to excel in enterprise use cases, particularly in visual document understanding. Our mod...
Just House GIF - Just House Totally - Discover & Share GIFs: Click to view the GIF
Tweet from Hunyuan (@TXhunyuan): Thank you all, my partners, for your support, patience, and companionship. After a long period of polishing, we are now ready to invite some partners to join us in an online test of our Hunyuan Video ...
Long-context GRPO (R1 Reasoning): DeepSeek R-1 is the most powerful open-source reasoning model that performs on par with OpenAI's o1 model.Run the 1.58-bit Dynamic GGUF version by Unsloth.
Tweet from Victor M (@victormustar): This is amazing! Someone trained a 1.5B model (with GRPO) to solve mazes!  model: http://hf.co/homebrewltd/AlphaMaze-v0.2-1.5B
Подкаст Глеба Соломина: Глубокие беседы с мыслящими людьми. Глеб Соломин — 24-летний предприниматель, выпускник МГУ (окончил с красным дипломом) приглашает в гости выдающихся учёных, бизнесменов и людей, достигших высот в св...
Tweet from Binyuan Hui (@huybery): …Binyuan is cooking…
Tweet from Dwarkesh Patel (@dwarkesh_sp): Writing internal LLM wrappers for my podcast's post-production is my Factorio
Tweet from swyx 🗽 NYC (@aiDotEngineer) (@swyx): TIL @AnthropicAI has the highest employee retention rate of the big labsQuoting swyx 🗽 NYC (@aiDotEngineer) (@swyx) First time I’ve seen @AnthropicAI lay out its top priorities like thisfocusing more...
Tweet from Mihir Patel (@mvpatel2000): Small life update: I joined Anthropic at the start of the year! The future is going to be wild, and I'm incredibly happy to be part of a team changing the world for good 😊. I'm also excited t...
General Reasoning: Making state-of-the-art reasoning more accessible to everyone.
Tweet from Junyang Lin (@JustinLin610): @TheXeophon Yes. 7 is of apache 2.0
Tweet from Eli Schwartz (@Eli_Schwartz): 📣 We've been cooking something special...I'm excited to share #GraniteVision from @IBMResearch  - a compact 2B parameter vision-language model that's "punching above its weight(s)"...
Tweet from Ross Taylor (@rosstaylor90): Increasing gunicorn workers, give me one sec - had heavy traffic load…

Interconnects (Nathan Lambert) ▷ #memes (3 messages):

AIME 2025 Performance, Grok Models, Microsoft's New State of Matter 

AIME 2025 Performance Insights: A compilation of results from various sources on AIME 2025 performance of Grok and OpenAI models was shared, highlighting clarity in the findings compared to traditional chart data.
Yuhuai (Tony) Wu emphasized, 'If you rly care about pass@1, you should compare numbers to mini.'

Grok3's Gradual Improvement: Yuhuai shared that the Grok3 model is larger and takes longer to train, having been released in an early version but still exhibiting intelligence.
He assured the community, 'Stay tuned for more power to be unleashed.'

Microsoft Breaks Ground with New Matter: A notable claim was made regarding Microsoft achieving a breakthrough by inventing a new state of matter, though further details were not provided.
The comment left the audience intrigued about the implications of this scientific advancement.

Links mentioned:

Tweet from Paul Smith (@Paul_SLG): Microsoft has invented a new state of matter and yet also done this.
Tweet from Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxesTex): Fair enough. Here's my compilation of all results from relevant sources on AIME 2025 performance of Grok and OpenAI models, plus extrapolations of cons@64 for DeepSeek models and o1. I think this ...

Interconnects (Nathan Lambert) ▷ #cv (3 messages):

SigLIP 2, Multilingual Vision-Language Encoders 

Google launches SigLIP 2!: Google just dropped SigLIP 2, featuring multilingual vision-language encoders with improved semantic understanding, localization, and dense features.
SigLippers express excitement, with one member asking, can I get a hell yeah?

Community unites over SigLIP 2: The community reacted positively to the SigLIP 2 announcement, with one member responding emphatically, Hell yeah!.
This indicates a shared enthusiasm amongst members regarding the new technology.

Link mentioned: Tweet from AK (@_akhaliq): Google just dropped SigLIP 2Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Interconnects (Nathan Lambert) ▷ #reads (1 messages):

GPU Glossary, Charles Frye's Talk, UCSC Event, LLM Development, Streaming Multiprocessor Architecture 

GPU Glossary Aims to Clarify Confusion: The team at Modal created a GPU Glossary to tackle the fragmented documentation on GPUs, connecting key concepts like Streaming Multiprocessor Architecture and Compute Capability.
This resource aims to streamline understanding and enhance accessibility for developers grappling with GPU intricacies.

Charles Frye's Upcoming Talk at UCSC: Charles Frye from Modal will give a talk tomorrow at UCSC, focused on enhancing developer knowledge in the GPU landscape.
Those interested can catch a live podcast titled 'What Every LLM Developer Needs to Know About GPUs', where Frye discusses machine learning systems.

Links mentioned:

README | GPU Glossary: no description found
What Every LLM Developer Needs to Know About GPUs with Charles Frye: Join a live podcast recording with Charles Frye, Developer Advocate at Modal and expert in machine learning systems, in conversation with host Hugo Bowne-And...

Yannick Kilcher ▷ #general (47 messages🔥):

Logits vs Probabilities, Normalization in Training, Diffusion Models, LoRA Limitations, Unreliable Training Data 

Logits prioritize optimization over probabilities: Discussants highlighted that logits enable more efficient optimization since they avoid the immediate need for normalization, which can reduce computational complexity during training.
They emphasized that while probabilities are essential for sampling and decision-making, maintaining a longer presence in logit space during training can enhance performance in related tasks.

Normalizers impact on gradients: It was noted that normalizers do not affect gradients during optimization, and discussants agreed that logits could still be beneficial for training despite involving extra complexities at inference time.
Some argued that methods like score matching and diffusion models prove that it’s feasible to compute the normalizer later during inference without hindering training efficacy.

Challenges with LoRA: Participants discussed the limitations of LoRA in fine-tuning models, emphasizing that lower dimensional spaces could lead to information loss and insufficient fitting of new data points.
While higher-rank LoRA can overcome some drawbacks, it sacrifices efficiency, thus bringing significant trade-offs in model training.

Exploring diffusion models for symbolic tasks: There was a call for more research into using diffusion models for discrete tasks like text generation, particularly in real-time contexts.
Previous mentions of LLaDA showcasing impressive performance in language understanding raised questions about its applicability in producing reliable outputs when trained on limited data.

Synthesizing data in model training: There was uncertainty about reliable training data sources for models using untranslatable or incomplete languages, like dead written languages.
It was pointed out that synthetic data, such as that used by DeepSeek, might serve as a basis for training in this context, albeit with potential reliability concerns.

Links mentioned:

Large Language Diffusion Models: Autoregressive models (ARMs) are widely regarded as the cornerstone of large language models (LLMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre...
Tweet from Shashwat Goel (@ShashwatGoel7): I pieced together this first-principles no RL prerequisites explainer on how RL for LLMs works, and why we need it🧵The main point? RL is exciting because it allows us to scale supervision. We can now...
Topological Qubits are Here! Discussing Majorana 1 — with Chetan Nayak of Microsoft | Ep. 97: Quantum computing will never be the same again. Join host Konstantinos Karagiannis for a special onsite interview at Microsoft Azure Quantum labs, where he w...
Large language models and (non-)linguistic recursion: Recursion is one of the hallmarks of human language. While many design features of language have been shown to exist in animal communication systems, recursion has not. Previous research shows that GP...
Parsing - Wikipedia: no description found
Backus–Naur form - Wikipedia: no description found

Yannick Kilcher ▷ #paper-discussion (62 messages🔥🔥):

DeepSeek Research, Paper Presentation Series, Sparsity in Models, Conditional Attention vs. Sparse Attention, Direct Policy Optimization (DPO) 

DeepSeek Research Receives Praise: A member expressed appreciation for DeepSeek, highlighting their high standard of research and ability to clearly explain complex concepts.
They described their recent paper as surprisingly simple yet incredibly effective, eagerly anticipating sharing it in the upcoming discussion.

Proposal for Structured Paper Presentation Series: A suggestion was made to organize a systematic series of papers focused on specific topics to aid understanding and build on prior knowledge.
Members are interested in a series that tackles subjects such as diffusion models and score-matching techniques in a structured format.

Sparsity and Efficiency in Models Discussed: Discussion highlighted the importance of sparsity in model efficiency and interpretability, and its advantages over current methods.
Members noted that harnessing sparsity can lead to better performance in real-world applications compared to traditional, data-hungry models.

DPO Paper Presentation Arrangement: A member offered to present insights on Direct Policy Optimization (DPO) during a meeting, emphasizing its relevance for understanding LLMs.
The scheduling for the presentation is set for 5:30 PST; members are excited about the opportunity to delve deeper into the topic.

Curiosity and Learning in AI: Members debated the maturity of reinforcement learning and continual learning, mentioning the need for better alignments with human learning mechanisms.
One member noted that defining a new term, ActGI (Actual General Intelligence), could help clarify current debates surrounding AGI.

Link mentioned: Manuscript | Arc Institute: Arc Institute is a independent nonprofit research organization headquartered in Palo Alto, California.

Yannick Kilcher ▷ #ml-news (14 messages🔥):

Helix Model Introduction, Open Source Week by DeepSeek, Unsloth.AI's AI Development, Chinese VideoGen Market Dominance, Reinforcement Learning 

Introducing Helix Model for VLA Tasks: The YouTube video titled Introducing Helix presents a generalist Vision-Language-Action (VLA) model that aims to unify perception, language understanding, and learned control.
The launch discusses how Helix overcomes challenges in multi-modal tasks, prompting excitement among community members.

DeepSeek Prepares for Open Source Week: DeepSeek announced Open Source Week, where they will be sharing 5 repositories to support the exploration of AGI.
Their commitment to transparency emphasizes community-driven innovation over isolated development, creating anticipation for updates.

Unsloth.AI Revolutionizes Model Fine-Tuning: In the Start Up Wednesday with Unsloth.AI video, the founders introduce their project that enhances AI model fine-tuning, making it twice as fast.
This open-source initiative is generating interest as they aim to improve accessibility in AI development.

Chinese Firms Lead VideoGen Innovation: Discussion points to Chinese dominance in the VideoGen market, leaving competitors like OpenAI looking ineffective.
Members express concern over the competitive landscape, emphasizing the rapid advancements being made by these companies.

Reinforcement Learning is Back: Participants in the conversation note that Deep Reinforcement Learning (RL) is making a strong comeback, leading to excitement and discussions about its applications.
One member humorously claimed to be a belieber in RL now, highlighting a renewed interest in its potential.

Links mentioned:

Tweet from DeepSeek (@deepseek_ai): 🚀 Day 0: Warming up for #OpenSourceWeek! We're a tiny team @deepseek_ai exploring AGI. Starting next week, we'll be open-sourcing 5 repos, sharing our small but sincere progress with full tra...
Tweet from undefined: no description found
Gif GIF - Gif - Discover & Share GIFs: Click to view the GIF
Introducing Helix: We're introducing Helix, a generalist Vision-Language-Action (VLA) model that unifies perception, language understanding, and learned control to overcome mul...
Start Up Wednesday with Unsloth.AI: Meet Daniel and Michael Han, the Australian brothers transforming AI development with Unsloth. Their open-source project makes model fine-tuning 2x faster wh...
AMBIDEX Task Learning Project (ENG): Developed by NAVER LABS, with Korea University of Technology & Education (Koreatech), the robot arm now features an added waist, extending the available work...

Eleuther ▷ #general (14 messages🔥):

AI CUDA Engineer, ChatGPT System Prompt Leak, CoT Summarizer Performance, Locale and Model Behavior, ClosedAI Model Limitations 

AI CUDA Engineer Optimizes Models: Recent advancements in the AI CUDA Engineer aim to automate CUDA kernel discovery and optimization, showcasing over 90% success in translating PyTorch to CUDA.
However, there are concerns about the dataset quality, with some reports suggesting that generated kernels may also be flawed.

ChatGPT's New System Prompt Details: A reported leak of the ChatGPT system prompt indicates updates with a knowledge cutoff of June 2024, instructing the assistant to maintain brevity and adapt to user tone.
This information raises questions about how user interaction might influence the model's behavior and output.

CoT Summarizer's Language Output Variability: Findings suggest that under H-CoT attacks, the o1 model may output reasoning in languages other than English, raising concerns about the CoT summarizer's instruction limitations.
It's speculated that the system prompt might lack locale information, challenging the model's consistency.

Mysteries of Proxy IP Impact: There is speculation around how changing a proxy's IP address could alter model behavior, as the browser's locale isn’t linked to the IP.
This inconsistency prompts questions about how the CoT summarizer handles information without locale context in its operations.

Concerns Over ClosedAI Model Flexibility: A participant noted that as ClosedAI models, there are intrinsic limitations and unpredictable behaviors that persist despite user expectations.
Understanding these constraints is critical for guiding future model interactions and expectations.

Links mentioned:

The AI CUDA Engineer 👷: no description found
H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models, Including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking: Large Reasoning Models (LRMs) have recently extended their powerful reasoning capabilities to safety checks-using chain-of-thought reasoning to decide whether a request should be answered. While this ...
Tweet from Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭 (@elder_plinius): 🚿 SYSTEM PROMPT LEAK 🚿Got the latest ChatGPT sys instructs! No drastic changes compared to what we've seen before, but nice to see that June 2024 knowledge cutoff!PROMPT:"""You are C...
Tweet from Tibor Blaho (@btibor91): Good find! The summarizer system prompt was hardcoded in the experiments configuration - I was able to independently confirm this:```You're a really smart AI that produces a stream of consciousnes...
Tweet from adi (@adonis_singh): guess the model

Eleuther ▷ #research (66 messages🔥🔥):

Sakana project issues, COLM reputation, Funding and resources for EleutherAI 

Sakana project encounters significant issues: There are concerns that the Sakana project has multiple confirmed bugs and lacks thorough human verification, raising questions about the integrity of their research outputs.
Some members suggested that poor practices in research may stem from VC funding, leading to negligence or irresponsibility in reporting results.

COLM's emerging reputation: A discussion arose about the reputation of the newly established COLM conference, which has only been around for a year.
Though hard to assess its standing now, there is optimism for COLM to become a top-tier venue in the future.

Funding sources for EleutherAI: EleutherAI has received funding primarily from large grants from Hugging Face and Open Philanthropy, as well as contributions from other organizations like Mozilla and OpenAI.
Additionally, they have access to computing resources from various institutions and receive non-monetary support from companies relying on their research infrastructure.

Links mentioned:

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines: Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 2...
SYNTHETIC-1 Release: Two Million Collaboratively Generated  Reasoning Traces from Deepseek-R1: We are releasing SYNTHETIC-1, the largest open reasoning dataset generated from Deepseek-R1, collaboratively generated by compute contributors across the globe.
The underlying structures of self-attention: symmetry, directionality, and emergent dynamics in Transformer training: Self-attention is essential to Transformer architectures, yet how information is embedded in the self-attention matrices and how different objective functions impact this process remains unclear. We p...
MoM: Linear Sequence Modeling with Mixture-of-Memories: Linear sequence modeling methods, such as linear attention, state space modeling, and linear RNNs, offer significant efficiency improvements by reducing the complexity of training and inference. Howev...
Approximating KL Divergence: no description found
MLGym: A New Framework and Benchmark for Advancing AI Research Agents: We introduce Meta MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing LLM agents on AI research tasks. This is the first Gym environment for machine learning (ML) tasks,...
Reddit - Dive into anything: no description found
Tweet from leloy! (@leloykun): New NanoGPT-Medium Speedrun Record at 27.67 mins!In the Medium track, the goal is to reach 2.92 FineWeb val loss on an 8xH100 podPrevious record: 28.1 minutesChangelog: optimized coefficients for the ...
Reddit - Dive into anything: no description found

Eleuther ▷ #scaling-laws (2 messages):

Scaling Rules Paper, Training Compute-Optimal Large Language Models, Model Convergence 

Large Models Converge Faster: A user questioned whether the x-axis in the figure from the scaling rules paper indicates steps divided by the number of epochs, observing that larger models converge with fewer steps than smaller ones.
Did anyone see this pattern in other papers?

No Equivalent Figure Found: The same user noted they had not seen an equivalent figure in the Training Compute-Optimal Large Language Models paper.
The absence of comparative figures prompts further exploration of convergence trends.

Eleuther ▷ #lm-thunderdome (4 messages):

Model Path Errors, Private Help Requests, OpenAI and Local-Chat Completions 

Users face Model Path Errors: A user mentioned encountering an error while specifying the model path in their command: 'my model path is wrong.'
They provided a command using lm_eval indicating a potential path setup issue.

Request for Private Assistance: One user expressed a desire for private help by stating: 'Can we connect on private messages?'
They highlighted the importance of receiving assistance for their model path issue.

Exploring Completions Models: In their attempts to resolve the issue, a user mentioned trying both openai-completions and local-chat completions.
This indicates their exploration of different model options to resolve the completion task.

Eleuther ▷ #gpt-neox-dev (22 messages🔥):

NCCL_BUFFSIZE adjustments, BF16 mixed-precision training, NeoX definition of a step, Checkpoint divergence handling, Gradient accumulation in FP32 

Adjust NCCL_BUFFSIZE for better performance: A member suggested adjusting the NCCL_BUFFSIZE using the command export NCCL_BUFFSIZE=2097152 for potential improvements, which is especially valuable for nodes with InfiniBand.
Another user noted that they manipulated the bucket size argument in DeepSpeed instead.

BF16 MP training uncovered: One user inquired about running BF16 mixed-precision training with fp32_allreduce set to False, allowing for fitting a 7B model on a single A100 card.
A member cautioned that using fp32_allreduce impacts training stability, sharing experiences of divergent runs.

Clarifying NeoX steps: There was a query concerning NeoX's definition of a step during gradient accumulation, specifically whether a step represents a batch or a series of accumulated batches.
A member clarified that a step means applying global gradients to perform a weight update after synchronization.

Handling checkpoint divergence: In discussing divergence issues, one question arose about continuing from an intermediate checkpoint after setting fp32_allreduce to True.
It was confirmed that while possible, checkpoints could involve degradation, especially if gradients in lower precision affect model quality.

Gradient accumulation strategy: Concerns were raised about performing local gradient accumulation in FP32 while conducting reduction operations in BF16.
A member highlighted that this approach could still adversely impact model quality, echoing previous concerns about the relationship between gradient precision and model performance.

MCP (Glama) ▷ #general (68 messages🔥🔥):

MCP Server Setup, Using Custom Context in MCP, Automating Testing Lifecycle with MCP, MCP Integration with Cursor and LibreChat, MCP and Discord Interactions 

MCP Server and Documentation Context: A user inquired about an MCP server that could add documentation (markdown) as context, allowing chat to remember it in the conversation.
This raises interest in the integration of persistent documentation aids within conversational contexts for better memory retention.

Automating Test Lifecycle with MCP: A member detailed their goal to automate their test cycle using an MCP server to run tests, capture and parse logs, and generate recommendations for fixes.
They also mentioned wanting to achieve integration with Github MCP to create PRs in their testing workflow.

MCP Compatibility with Cursor and LibreChat: A user asked about configuring an MCP server for use with the Cursor app or LibreChat, particularly with an MCP server they set up for Obsidian.
They provided a link to their server implementation, seeking guidance on the necessary configurations for different chat applications.

Context Handling with MCP and Python: Discussion on extending the MCP client session to handle context-specific calls, including subclassing request types to introduce additional fields.
Suggestions included avoiding extensive monkey-patching and instead leveraging the flexibility within Pydantic models for robust implementations.

MCP Integration in Development Environments: A user expressed an interest in livestreaming their VSCode environment using MCP to interact with bots for real-time coding suggestions.
They noted that while there’s support for JetBrains via MCP, they wished for similar capabilities in Visual Studio Code.

Links mentioned:

Authorization:           ℹ️                  Protocol Revision: draft      1. Introduction    1.1 Purpose and Scope    The Model Context Protocol provides authorization capabilities at the transport level,enabling M...
GitHub - MarkusPfundstein/mcp-obsidian: MCP server that interacts with Obsidian via the Obsidian rest API community plugin: MCP server that interacts with Obsidian via the Obsidian rest API community plugin - MarkusPfundstein/mcp-obsidian

MCP (Glama) ▷ #showcase (7 messages):

Sage release, MCP.run vendor lock-in, MCP-server and client capabilities, AI Discord bot integration, Music playback in Discord 

Ssshhhh New Sage Release: A user hinted at a new Sage release, sharing an image and a video but without further details.
Image details were not discussed, leaving the impact of the release unclear.

Concerns about MCP.run Vendor Lock-In: A member questioned whether anyone actually uses mcp.run beyond toy examples, pointing out a high level of vendor lock-in within the platform.
Another user responded, indicating the platform's standards remain fairly typical from the user's perspective.

Cool MCP-Server and Client Integration: A user showcased their MCP-server and client setup, which allows for easy mcp-export of tagged class methods for API integrations.
This setup enables an AI Discord bot to play songs in voice channels, demonstrating practical applications of MCP.

AI Bot Corrects Playback Issues: The same user described their AI bot's ability to correct playback issues by leaving and rejoining voice channels to play songs accurately.
This seamless interaction illustrates the capability of the bot to adapt on the fly while interacting with users.

Link mentioned: Wolf Of Wall Street Lets Goo GIF - Wolf Of Wall Street Lets Goo - Discover & Share GIFs: Click to view the GIF

Modular (Mojo 🔥) ▷ #general (1 messages):
eggsquad: <@1275513561199935621> The Modular branded Patagonia sweater goes hard

Modular (Mojo 🔥) ▷ #mojo (42 messages🔥):

Mojo Windows Support, Mojo vs. Python, GPU Performance in Mojo, Concurrency Handling in Mojo, Debugging Mojo Programs 

Mojo Windows Support Status Uncertain: Members discussed that there is currently no timeline for native Mojo support on Windows due to high costs associated with running AI clusters on the OS, particularly influenced by Microsoft's licensing fees.
nix OSes provide better compute features, making Linux the preferred choice for deploying projects like MAX*.

Mojo's Performance Comparison With Python: Mojo is characterized as a language that looks like Python but offers performance closer to C/C++/Rust, aiming for compatibility akin to how C++ relates to C.
It's believed that Mojo's type system will surpass Rust’s by avoiding some of its pitfalls, allowing for greater versatility.

Getting Started with Mojo GPU Programming: For beginners in Mojo’s GPU programming, utilizing functions like parallelize and for_each in normal code is recommended as a good starting point.
Further details on setting up GPU interactions can be found in a dedicated forum thread encouraged by members of the community.

Handling Concurrency in Mojo: One member shared their experience managing concurrency in Mojo, describing a strategy using process-per-core and shared memory IPC.
They emphasized the importance of managing pointers without lifetimes for more efficient memory handling.

Debugging Mojo Programs Efficiently: A method to activate debugging in Mojo via environment variables or compile-time options was detailed, demonstrating how to retrieve and use the DEBUG variable.
This highlights the flexibility Mojo offers in debugging, with practical coding examples provided.

Links mentioned:

Resources for learning MAX for non-ML developers: Given that MAX is likely to attract new people to heterogeneous programming (me, I am people). I think it would helpful to have a collection of resources to help developers with no gpu programming exp...
Modular: Build the future of AI with us and learn about MAX, Mojo, and Magic.

Notebook LM ▷ #use-cases (8 messages🔥):

AI's Limitations in Creative Writing, Concerns About Trusting AI, Using NotebookLM for Writing Assistance, Guides for Effective NotebookLM Usage, Character and Word Limits in NotebookLM Plus 

Frustrations with AI in Creative Work: Writers expressed frustration with AI's inconsistencies, noting it sometimes provides beautiful insights but often leads to long frustrating listens due to errors.
One member highlighted a recent decline in AI's performance since the launch of the Plus service, feeling it has become more of a hindrance than help.

Skepticism About Trusting AI Outputs: A user recounted their experience searching for a comic, stating that AIs cannot be fully trusted, as they often produce different names for characters and plots.
Despite positive identifications, the AI ultimately admitted to fabricating some details, raising concerns about reliability.

NotebookLM as a Writing Tool: A member emphasized using NotebookLM to assist with small details while writing their novel, admitting it is not yet reliable as a canon source.
They noted the tool's effectiveness varied, labeling it as sometimes a very bad tool.

Effective Usage of NotebookLM for Learning: A user shared their method of using NotebookLM to improve their understanding of exponential material, starting with YouTube courses.
They recommended sharing links with NotebookLM and testing comprehension by taking notes, showing a systematic approach towards learning.

Inquiry About Output Limits in NotebookLM Plus: A user sought clarification on the specific word and character limits for outputs when using the long option in NotebookLM Plus.
This information is crucial for them to better define requests, underlining a focus on output specifications.

Link mentioned: no title found: no description found

Notebook LM ▷ #general (25 messages🔥):

Organizing Notebooks, Usage of Audio Deep Dive, NotebookLM iOS App, Quality of NotebookLM Answers, Sharing Limitations of NotebookLM 

Request for Notebook Organization Features: A user requested an option to make folders for organizing their notebooks, and was informed that there is a feature request filed for it internally.
They expressed their eagerness to see this feature implemented soon.

Clarification on Using Audio Deep Dive Sessions: A member inquired about using the Audio 'Deep Dive' sessions in their courses, and received confirmation that it can be shared within their educational domain.
Links to guidelines on generating Audio Overviews were provided to help with the process.

Questions About iOS App for NotebookLM: A user sought guidance on the correct iOS app for NotebookLM, signaling a need for clarity in available mobile applications.
No specific recommendations were given in the conversation.

Differences in NotebookLM Subscription Quality: A user questioned whether the quality of answers differs between free and paid versions, clarifying that only the number of sources and few features vary.
The conversation hinted at a need for more transparency regarding the differences in subscription tiers.

Concerns About Sharing NotebookLM Content: A user expressed frustration about the limitations on sharing NotebookLM with individuals outside their organization, emphasizing a need for sharing capabilities.
This sentiment reflects a broader concern within the community regarding collaboration and content access.

Links mentioned:

YouTube: no description found
Audio Overviews - NotebookLM Help: no description found
AI Bridge - Chrome Web Store: Capture and upload links to NotebookLM
10 Pro Tips to Boost any Research with NotebookLM: Free and Best AI Tool to level up your research and Content Creation - NotebookLM! From instantly digesting 100s of documents, videos, websites to multi-lang...

Latent Space ▷ #ai-general-chat (26 messages🔥):

Arize's Series C Funding, OpenAI's User Growth, Deep Seek Launchweek, Facebook's Reasoning Dataset, NEO Gamma Robotics 

Arize raises $70M to enhance AI performance: Arize AI secured a $70 million Series C funding to focus on ensuring AI agents work reliably in the real world, addressing challenges with generative models and autonomous systems.
Their mission began in 2020, aiming to improve tools for understanding and troubleshooting AI performance in real scenarios.

OpenAI reaches 400 million weekly active users: OpenAI reported crossing 400 million weekly active users, with 2 million business users utilizing ChatGPT at work, marking a 33% growth in under three months.
Upcoming models, GPT-4.5 and GPT-5, promise to unify existing models while expanding agent capabilities.

Deep Seek's Open Source Initiative: Deep Seek announced the start of #OpenSourceWeek, planning to open-source five repositories to share their progress and innovations in AGI.
They emphasized a community-driven approach to development, with documentation and deployment made public for collective momentum.

Facebook's dataset enhances reasoning capabilities: Facebook released a dataset featuring 1M+ reasoning traces, designed to challenge AI with high-quality questions aimed at improving reasoning techniques.
This dataset includes reference answers and is expected to boost the performance of reasoning models across various applications.

1X introduces NEO Gamma for home assistance: 1X Tech is promoting their new robot, NEO Gamma, which has been tested in employee homes and designed to perform household chores with improved reliability.
The humanoid design aims to provide natural interactions, showcasing advanced capabilities in walking and bending, reinforcing the notion that this design is critical for home labor.

Links mentioned:

Tweet from DeepSeek (@deepseek_ai): 🚀 Day 0: Warming up for #OpenSourceWeek! We're a tiny team @deepseek_ai exploring AGI. Starting next week, we'll be open-sourcing 5 repos, sharing our small but sincere progress with full tra...
Tweet from Tom Warren (@tomwarren): scoop: Microsoft is getting ready for OpenAI's GPT-5 model, and GPT-4.5 could arrive as soon as next week. All of this and more in this week's 📒 Notepad  issue, live now for subscribers 👇 ht...
Tweet from Binyuan Hui (@huybery): …Binyuan is cooking…
Arize AI Raises $70M Series C to Build the Gold Standard for AI Evaluation & Observability: Learn how we're shaping the future of trustworthy LLMs & AI agents, and what's next for Arize in our Series C announcement.
Tweet from Eric Jang (@ericjang11): We've been dogfooding NEO Gamma in 1X employee homes for weeks now, doing chores around the house. Under the suit, NEO Gamma has a lot of HW improvements that make it more reliable. The 1X AI team...
Tweet from Aran Komatsuzaki (@arankomatsuzaki): Google presents:SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense FeaturesOpensources model ckpts with four sizes from 86M to 1B
Tweet from AI Engineer (@aiDotEngineer): We are now LIVE with the Agents track!https://www.youtube.com/watch?v=D7BzTxVVMuwFirst talks from:- @swyx (Agent Engineering)- @sayashk (AI Agents That Matter)- @AarushSelvan (Deep Research)- @barry_z...
Tweet from Caleb (@calebfahlgren): WOW! Facebook drops a dataset of 1M+ reasoning traces 🤯It consists of high-quality challenging reasoning questions backtranslated from pretraining corpora DCLM and FineMath. The dataset includes the ...
Tweet from Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxesTex): > every line shared becomes collective momentum that accelerates the journey.> No ivory towers - just pure garage-energy and community-driven innovation.You can just tell, R1 "hands" wro...
Vercel’s Developer Frameworks with Ary Khandelwal and Max Leiter - Software Engineering Daily: The availability of high-quality AI model APIs has drastically lowered the barriers developing AI applications. These tools abstract away complex tasks such as model deployment, scaling, data retrieva...
Tweet from Brad Lightcap (@bradlightcap): chatgpt recently crossed 400M WAU, we feel very fortunate to serve 5% of the world every week2M+ business users now use chatgpt at work, and reasoning model API use is up 5x since o3 mini launchwe'...
Tweet from wh (@nrehiew_): Everyone lock in. What do we think- NSA- MOE training (?)Quoting DeepSeek (@deepseek_ai) 🚀 Day 0: Warming up for #OpenSourceWeek! We're a tiny team @deepseek_ai exploring AGI. Starting next week,...
AI Leaders Reveal the Next Wave of AI Breakthroughs (At FII Miami 2025) | EP #150: In this episode, Peter is joined by a panel of leaders in the “ TRANSFORMING BUSINESS WITH AI OPPORTUNITY OR OVERLOAD” at the Miami FII Conference to discuss...
Tweet from tldraw (@tldraw): Today we're launching the new tldraw ai module. If you're a developer and want to experiment with LLMs on a whiteboard, this is for you.
Tweet from Ibelick (@Ibelick): Introducing prompt-kitA set of high-quality, customizable components for building AI interfaces.Built on top of shadcn/ui, starting with PromptInput, a flexible input field for chatting with AI models...

Latent Space ▷ #ai-announcements (1 messages):
swyxio: https://x.com/aiDotEngineer/status/1892934641067360444

Torchtune ▷ #general (14 messages🔥):

pytest errors, Installing dependencies, Test artifacts, Community support 

Original pytest error resolved: A user reported an error while running tests with pytest, which was caused by unrecognized arguments in the command line.
After installing dev dependencies with pip install -e .['dev'], the original error was resolved.

New error regarding test artifacts: The user encountered a ValueError indicating missing test artifacts after fixing the initial issue.
Members suggested deleting the /tmp/test-artifacts directory to trigger automatic redownload of necessary artifacts.

Community steps in to assist: Members of the Discord provided solutions and shared their own experiences with installing dependencies correctly.
The collaborative support was appreciated, with the user expressing gratitude for the help received.

Torchtune ▷ #dev (13 messages🔥):

MLGym Launch, GRPO Optimizations, Width/Depth Pruning Discussion, GRPO PR Engagement, Assistant Opportunities in GRPO 

MLGym Framework Unveiled: Meta introduced MLGym, the first Gym environment for ML tasks, featuring 13 diverse AI research tasks across multiple domains.
One member expressed excitement, stating it looks amazing and was about to share the news themselves.

Significant GRPO VRAM Reductions: A recent blog revealed that the Unsloth Efficient GRPO algorithm enables 10x longer context lengths using 90% less VRAM, allowing users to train a reasoning model with just 5GB VRAM for Qwen2.5.
Users highlighted the dramatic reduction in VRAM requirements for Llama 3.1 training, dropping from 510.8GB to 54.3GB.

Width/Depth Pruning on Hold: A discussion on the need for an RFC on width/depth pruning concluded that the team currently lacks the bandwidth to address it.
Another member suggested discussing it during office hours before potentially transforming the ideas into a common PR.

GRPO PR Expected to Generate Buzz: One member speculated if the GRPO PR might break the record for comments, indicating strong engagement.
Another responded with shared feelings about the typical comment thread discussions on PRs.

Willingness to Assist with GRPO: A member announced their eagerness to work on GRPO, KD, quantization, and pruning, inviting others to reach out for assistance.
They mentioned the idea of shadowing someone experienced in these areas to better learn and contribute.

Links mentioned:

Tweet from Aran Komatsuzaki (@arankomatsuzaki): Meta presents:MLGym: A New Framework and Benchmark for Advancing AI Research Agents- The first Gym environment for ML tasks- 13 diverse and open-ended AI research tasks from diverse domains
Long-context GRPO (R1 Reasoning): DeepSeek R-1 is the most powerful open-source reasoning model that performs on par with OpenAI's o1 model.Run the 1.58-bit Dynamic GGUF version by Unsloth.
None Of Them Are Worthy The Crown Pinky Malinky GIF - None Of Them Are Worthy The Crown Pinky Malinky They Are Not Deserving For The Crown - Discover & Share GIFs: Click to view the GIF

LlamaIndex ▷ #blog (2 messages):

LlamaParse upgrades, AI infrastructure talks, Document parsing modes, Training and fine-tuning applications 

LlamaParse Enhances Document Parsing: LlamaParse is evolving with new parsing modes: Fast, Balanced, and Premium, tailored for different needs, as documented in this tweet.
These upgrades aim to tackle document parsing challenges more effectively than ever before.

Upcoming AI Infrastructure Talks: An exclusive event on March 5 will feature talks on cutting-edge innovations in AI infrastructure, according to this announcement.
Discussions will cover practical applications for training, fine-tuning, inference, and RAG, aiming for superior performance at lower costs.

LlamaIndex ▷ #general (18 messages🔥):

Multi-Agent Handoff Issues, Creating AI from Documents, Visual Workflow Interfaces 

Multi-Agent Handoff Issues Resolved: After updating the custom handoff prompt, the issue of the LLM returning 'I am handing off to AgentXYZ' instead of a tool call was resolved, leading to proper JSON object outputs instead.
However, concerns remain about unpredictable behavior with agent handoffs and the LLM's temperature settings affecting the workflow.

Making AI with PDF Documents: A member inquired about creating an AI based solely on 100-1000 PDF documents, ensuring responses are limited to that data.
They were curious if they needed a separate server or computer to support this project.

Visual Workflow Interface Inquiry: One member asked if there is a way to create workflows in a visual interface similar to Logic Studio.
Another member confirmed that currently, there are no tools for visual workflow creation beyond existing drawing utilities.

Parallel Agent Execution Suggestion: Discussion emerged about the sequential nature of agents in workflows, with a member suggesting that agents can operate in parallel by being implemented as tools for other agents.
Another member requested example code for better understanding and insights into automating tasks like calendar events and emails through agents.

Clarification on Contributions: A member asked if there is a Contributor Agreement for the project, which was interpreted as a potential question about a Contributor License Agreement.
This raised further discussions about documentation and general project guidelines.

Links mentioned:

LogicStudio.ai - Visual AI Agent Orchestration: Build complex AI agent systems visually. Connect, orchestrate, and deploy intelligent workflows through an intuitive canvas interface.
AgentWorkflow Basic Introduction - LlamaIndex: no description found

Nomic.ai (GPT4All) ▷ #general (20 messages🔥):

NOMIC implementation, GPT4All local setup, Document querying issues, Model settings and tuning, Chat template extraction 

Concerns on NOMIC v2 Implementation: A user expressed confusion about the implementation of NOMIC v2, questioning if they were using it correctly.
The discussion highlighted potential gaps in understanding the new features and functionalities of the latest model.

Issues with GPT4All Local Setup: A new user shared difficulties in querying documents using GPT4All v3.9.0, having set up their local environment but encountered inaccurate outputs.
Despite using a collection of documents, the responses were often unrelated or incorrect, frustrating the user's attempts to extract specific information.

Tuning Model Settings for Better Performance: In response to querying issues, advice was given to adjust the context length and document size for better performance in GPT4All.
Users discussed optimal settings, suggesting a balance between context size and snippet quantity to improve document retrieval accuracy.

Extracting Chat Template from Tokenizer: A user attempted to extract a chat template using a command on the tokenizer file but encountered issues with missing system prompts.
Guidance was sought on how to set parameters like min_p and top_k in the generation configuration for better output management.

Continual Generation of Responses: Concerns were raised regarding outputs looping indefinitely when using the model, leading to repetitive self-chatting behavior.
Suggestions were made about tuning the model settings to curb these extended responses which hinder usability.

Link mentioned: QuantFactory/NeuralDaredevil-8B-abliterated-GGUF · Hugging Face: no description found

tinygrad (George Hotz) ▷ #general (16 messages🔥):

Performance Optimization in Testing, GROUP OptOps on CPUs and GPUs, Agentic CUDA Kernel Search, LLVM vs. CLANG Performance Differences, Concatenation Bounty Challenges 

Testing Performance Optimization Drama: Testing revealed that while aiming for 2048x2048 tensors, adding more BEAM solved performance issues, indicating a significant improvement when allowing UPCAST on reduce dimensions.
"Actually, I think we are good now... I just opened a PR for this" showcasing the progress.

GROUP OptOps Optimization Insights: A member shared findings about using a large GROUP followed by an UNROLL and small GROUP strategy in their kernel implementations, although achieving speeds over 70 GBps remained challenging.
"These optimizations only really help on LLVM, not CLANG," hinting at struggles with performance variances in different compilers.

Challenges with GROUP OptOps on CPUs: Issues arose with GROUP OptOps on CPUs, particularly causing failures in tests like test_arange due to estimated flops ballooning with increased input sizes.
Thoughts were shared on whether these inefficiencies are unavoidable, especially since these optimizations work fine on GPUs.

Agentic CUDA Kernel Search Insights: A new paper was mentioned discussing agentic CUDA kernel search, signaling advancements in optimizing kernel performance.
Discussion suggested possible connections to the ongoing optimizations and performance challenges in current projects.

Navigating LLVM Challenges: The community shared experiences on transitioning tensor operations to LLVM, specifically around defining UOp and handling binary operations effectively.
Concerns were raised about making the concatenation operation faster through more efficient memory copying techniques in LLVM.

Link mentioned: [Bounty] Made TestSpeed.test_sum yellow on Macs with LLVM by josephsweeney · Pull Request #9190 · tinygrad/tinygrad: To make this happen, I enabled GROUP OptOps's on devices without local variables (CLANG and LLVM), by just adding an extra reduce instead on emitting locals. The other necessary changes came d...

tinygrad (George Hotz) ▷ #learn-tinygrad (3 messages):

tinygrad linearizer, codebase searching 

Introduction to tinygrad Linearizer: The tinygrad linearizer is a crucial part of the tinygrad framework, enhancing its capabilities.
The GitHub page highlights the charm of tinygrad for fans of frameworks like pytorch and micrograd.

Searching the Codebase as a Primer: A suggestion was made that searching the codebase is a useful first step in understanding how various components interconnect.
This strategy could facilitate better learning and utilization of the system's intricate features.

Link mentioned: tinygrad/tinygrad/codegen/linearize.py at master · tinygrad/tinygrad: You like pytorch? You like micrograd? You love tinygrad! ❤️  - tinygrad/tinygrad

Cohere ▷ #discussions (4 messages):

Cohere Embedding Models, Channel Colors, Benchmark Leaderboards 

Roazzy's New Pink Appearance: Roazzy announced a fun change stating, As yall can see, now I am pink in the chat.
Another member remarked that it was 'cool stuff', showcasing positive reactions to the update.

Inquiry about Cohere Model Benchmarks: A member inquired if the Cohere embedding models were submitted to benchmark leaderboards, specifically referencing evaluations against MTEB and BEIR.
They specifically noted the BEIR leaderboard and expressed interest in additional benchmarks for their university assignment.

Link mentioned: EvalAI: Evaluating state of the art in AI: EvalAI is an open-source web platform for organizing and participating in challenges to push the state of the art on AI tasks.

Cohere ▷ #cmd-r-bot (2 messages):

Half Rest Techniques 

Tricks for Achieving Half Rest: A user prompted for 'tricks' to help those looking to achieve a suitable amount of half rest.
While no specific techniques were shared, the interest in this topic was evident.

Seeking Rest Techniques: Another participant mentioned the need for rest strategies, indicating a collective interest in improving recovery.
The conversation suggests a potential for a more extensive discussion on effective rest methods.

DSPy ▷ #general (1 messages):

DSPy Customization, Chat History Integration, Performance Improvements 

Exploration of DSPy Feature Request 1435: Members discussed a recent feature request on GitHub regarding allowing specified chat history for language models in DSPy.
There was curiosity about any performance improvements achieved from custom implementations related to this feature and whether it was worth the effort.

Curiosity About Performance Gains: One member inquired about potential performance improvements from jerry-rigging solutions related to the chat history specification.
The discussion highlighted a need for clarity on whether such customizations are beneficial versus the efforts required to implement them.

Link mentioned: Feature request: Allow specifying chat history for LMs · Issue #1435 · stanfordnlp/dspy: Hi DSPy developers, First of all, thanks a lot for this great work! Recently I've been trying to integrate DSPy into my work, but I stumbled upon the chat history specification. My task is to desi...

                        Don't miss what's next. Subscribe to AI News (MOVED TO news.smol.ai!):