[AINews] not much happened today
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
a quiet day.
AI News for 2/7/2025-2/10/2025. We checked 7 subreddits, 433 Twitters and 29 Discords (210 channels, and 11464 messages) for you. Estimated reading time saved (at 200wpm): 1218 minutes. You can now tag @smol_ai for AINews discussions!
Just like Meta's Coconut before it, Huginn's Latent Reasoning Model made a splash today. We agree with Jeremy and Andrej that the best RL will probably not be in English, but we didn't choose this as the feature story because presumably DeepSeek already tried that for r1 (our coverage here) and didn't find it worth the tradeoff of not being able to read the thoughts.
The Table of Contents and Channel Summaries have been moved to the web version of this email!
AI Twitter Recap
AI Model Releases and Advancements
- Google's Release of Gemini 2.0 Flash Thinking Experimental 1-21: DeepLearningAI announced that Google released Gemini 2.0 Flash Thinking Experimental 1-21, the latest version of its vision-language reasoning model, featuring an expanded 1 million-token context window and a user-readable chain of thought. The update improves accuracy across science, math, and multimedia benchmarks, surpassing DeepSeek-R1 but trailing OpenAI's o1 in some areas.
- Release of Zonos - Multilingual TTS Model with Voice Cloning: @reach_vb highlighted that ZyphraAI released Zonos, an Apache 2.0 licensed, multilingual Text-to-Speech model with instant voice cloning capabilities. The model supports zero-shot TTS with voice cloning using a 10-30 second speaker sample, audio prefix inputs for enhanced speaker matching, and controls for speaking rate, pitch, frequency, audio quality, and emotions. It runs at ~2x real-time speed on an RTX 4090 and is available on the Hugging Face Hub.
- Hugging Face Releases OpenR1-Math-220k Dataset: @_lewtun and @reach_vb announced the release of OpenR1-Math-220k, a large-scale math reasoning dataset based on Numina Math 1.5, containing 220K math problems and 800K raw R1 reasoning traces generated on 512 H100 GPUs. The dataset is Apache 2.0 licensed, encouraging the community to fine-tune models and advance mathematical reasoning capabilities.
Advancements in AI Reasoning and Models
- Introduction of Huginn-3.5B Latent Reasoning Model: Tom Goldstein introduced Huginn-3.5B, an open-source reasoning model that reasons implicitly in latent space without producing extra chain-of-thought tokens at test time. Trained on 800B tokens, Huginn-3.5B demonstrates significant improvements on reasoning tasks like GSM8K, outperforming larger models despite its smaller size.
- Debate on Human-Readable Reasoning Traces: Jeremy Howard predicted that training AI systems to produce human-readable reasoning traces will eventually seem bizarre, comparing it to requiring a diffusion image model to output an image sequence that matches an artist's brush strokes. He suggests that future models may internalize reasoning in ways that are not easily interpretable by humans.
- Scaling Test-Time Compute with Latent Reasoning: @iScienceLuvr discussed a new language model architecture capable of improving performance on reasoning benchmarks by implicitly reasoning in latent space. The model scales test-time computation without the need for specialized training data, supporting small context windows and capturing reasoning not easily represented in words.
AI's Impact on Industry and Economy
- Anthropic Launches the Anthropic Economic Index: AnthropicAI launched the Anthropic Economic Index, aiming to understand AI's impact on the economy over time. Their first paper analyzes millions of anonymized Claude conversations to reveal how AI is being used across different tasks and occupations. Key findings include:
- AI use tilts towards augmentation (57%) over automation (43%).
- Software and technical writing tasks have the highest AI usage.
- AI adoption is most common in medium-to-high income jobs, with low usage in very-high and low-income jobs.
- The dataset and ongoing analysis aim to track patterns of change as AI evolves.
- Integration of DeepSeek Models into Cloud Services: @teortaxesTex noted that China's three big telecom operators are rushing to integrate DeepSeek models into cloud services, potentially freezing their own LLM projects. This indicates a strategic shift towards adopting existing powerful models rather than developing new ones independently.
AI Tools, Development, and Research
- Combining Vector Search and Knowledge Graphs: Qdrant Engine shared insights on building with Neo4j and Qdrant to create a smarter GraphRAG, which leverages vector search for semantic retrieval and graph traversal for structured reasoning. This approach aims for greater accuracy with less LLM dependency.
- Using TensorFlow's ImageDataGenerator: DeepLearningAI highlighted the use of TensorFlow’s ImageDataGenerator to handle real-world images that vary in size, position, and contain multiple subjects. This tool automatically labels, resizes, and batches images for training, enhancing the efficiency of data pipelines when working with diverse image datasets.
- Exploring AI's Limitations with Unknown Unknowns: @hardmaru discussed a paper titled "Evolution and The Knightian Blindspot of Machine Learning", which argues that the process of evolution equips organisms to navigate unexpected events ("unknown unknowns"), a capability that current AI systems struggle to replicate.
Community Insights and Events
- Sam Altman's Three Observations: Sam Altman shared an essay titled "Three Observations" on the trajectory of AI progress and its economic impact, emphasizing the ongoing evolution and influence of the technology.
- AI Summit in Paris and Open-Source Advocacy: Clement Delangue announced arrival in Paris for the AI Summit, emphasizing efforts to push open-source AI alongside team members like Irene Solaiman. The focus is on doubling investments in France with an emphasis on open-source, robotics, and applications.
- Discussions on Chinese AI Progress: @teortaxesTex provided a timeline reflecting skepticism towards Chinese AI advancements, noting a progression from initial underestimation to recognition of solid engineering efforts.
Memes/Humor
- OpenAI's Super Bowl Ad and Rivalry with Google: Sam Altman humorously remarked on the challenge of surpassing Google with "man, still a long way to go to run down google 🥺" and mentioned "also our ad, it’s really good" in a conversation with @xprunie. @teortaxesTex playfully critiqued OpenAI employees for hyping their high-production-value ad, comparing OpenAI to an Apple-type corporation.
- The Hackbot Singularity and TEDx Talk: @rez0__ mentioned that "the hackbot singularity is coming" and shared his TEDx talk titled "The Rise of AI Hackbots" available on YouTube, discussing the implications of AI in cybersecurity and hacking.
- Humorous Takes on AI and Society: @teortaxesTex shared several tweets with humorous or satirical reflections on AI developments and societal observations, including commentary on public transit externalities, the robustness of nation-states, and playful jabs at corporate strategies in AI advancement.
AI Reddit Recap
/r/LocalLlama Recap
Theme 1. DeepSeek-R1/V3 Performance Showcase on Xeon and GPU
- 671B DeepSeek-R1/V3-q4 on a Single Machine (2× Xeon + 24GB GPU) – Up to 286 tokens/s Prefill & 14 tokens/s Decode (Score: 623, Comments: 165): The KTransformers team announces support for DeepSeek-R1/V3, achieving up to 286 tokens/s for prefill using a CPU/GPU hybrid inference system, which is significantly faster than llama.cpp. They highlight the use of Intel AMX-accelerated kernels and a selective expert activation method for performance enhancement, and emphasize that offloading computational tasks to the GPU aligns with DeepSeek's architecture, offering substantial speed improvements.
- CPU and GPU Configuration: The setup uses an Intel® Xeon® Gold 6454S with 32 cores per socket and 8x DDR5-4800 for each socket, paired with a 4090D GPU. The system costs approximately $10K, with discussions on whether a heavy CPU setup is better than a heavy GPU setup, considering the Xeon's cost and potential downgrades to more affordable options.
- Performance and Optimization: The DeepSeek V3/R1 model's performance is enhanced through CPU/GPU hybrid inference, though adding more GPUs does not currently offer significant improvements due to the model's sparsity. The model's footprint can be reduced significantly through optimizations, with one user reporting a 3.38 times improvement in prompt processing speed over llama.cpp, thanks to using an RTX 4090.
- Platform Support and Future Plans: There is interest in optimizing for Apple Silicon and Intel GPUs, though the current focus is on open-sourcing version 0.3 and executing planned optimizations. AMD is supported but lacks the AMX optimization for prefill speed, and there are discussions about the potential benefits of using 48GB VRAM and future support for AMD Matrix Core (AMC).
- Deepseek’s AI model is ‘the best work’ out of China but the hype is 'exaggerated,' Google Deepmind CEO says. “Despite the hype, there’s no actual new scientific advance.” (Score: 329, Comments: 244): Google DeepMind CEO commented on the DeepSeek AI model, describing it as the "best work" from China but stated the hype around it is exaggerated. He emphasized that despite the excitement, there is no actual new scientific advancement in the model.
- Commenters criticized DeepMind CEO Demis Hassabis for downplaying the DeepSeek AI model, arguing that its open-source nature and engineering efficiencies, such as reduced costs and training efficiency, are significant advancements. They accused Hassabis of dishonesty by omission, failing to acknowledge the model's open weights and cost-effectiveness as substantial contributions.
- Some commenters highlighted that DeepSeek's engineering achievements are notable, even if they don't constitute a scientific breakthrough. They pointed out that DeepSeek achieved competitive performance with ChatGPT at a fraction of the cost, challenging assumptions about China's AI capabilities and suggesting that the model's efficiency and open-source approach are valuable innovations.
- Discussions also focused on the broader implications of open-source AI models like DeepSeek, emphasizing the potential for democratizing AI technology. Commenters noted that Google's reluctance to open-source their models contrasts with the openness of DeepSeek, leading to debates about the role of open-source in advancing AI research and its geopolitical impact.
Theme 2. Innovative Techniques in LLM Model Optimization
- TL;DR of Andrej Karpathy’s Latest Deep Dive on LLMs (Score: 382, Comments: 48): Andrej Karpathy has released a 3-hour, 31-minute video on LLMs like ChatGPT, described as a "goldmine of information." A summary article condensing the key insights into 15 minutes is available here, and the original video can be found on YouTube.
- Fine-tuning and Prompt Engineering: Discussions highlight the importance of fine-tuning smaller open-source models like llama-3B and emphasize prompt engineering as crucial for optimizing LLM applications. Andrej Karpathy's work and the article by Anfal Mushtaq are noted for covering these topics in depth, alongside strategies to reduce hallucinations in model outputs.
- Data Processing and Tokenization: The article and video explore the preprocessing of vast internet text data, including rigorous filtering and tokenization using techniques like Byte Pair Encoding. This process is essential for the effective training of LLMs, balancing creativity with accuracy in model predictions.
- Humor and Engagement: Several comments playfully summarize the article and video in progressively shorter formats, including a one-minute recap, a 50-word summary, and even a haiku, showcasing community engagement and humor in distilling complex information.
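The Byte Pair Encoding step mentioned above is easy to sketch: start from individual characters and repeatedly merge the most frequent adjacent pair. A minimal illustration in Python — not Karpathy's or any production tokenizer's implementation, and `bpe_train` is a hypothetical helper name:

```python
from collections import Counter

def bpe_train(text: str, num_merges: int) -> list[tuple[str, str]]:
    """Learn BPE merges: repeatedly fuse the most frequent adjacent pair."""
    tokens = list(text)  # start from individual characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # replace every occurrence of the best pair with a merged token
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return merges

merges = bpe_train("low lower lowest", num_merges=3)
print(merges)  # the pair ('l', 'o') is merged first, then ('lo', 'w')
```

Real tokenizers operate on bytes and word frequencies rather than raw character streams, but the merge loop is the same idea.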
- New paper gives models a chance to think in latent space before outputting tokens, weights are already on HF - Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach (Score: 112, Comments: 16): Scaling LLM Compute with Latent Reasoning discusses a novel approach in AI model computation, allowing models to perform reasoning in latent space before generating output tokens. This method, detailed in the paper titled "Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach," has its weights already available on Hugging Face.
- Adaptive Compute and Latent Reasoning: A notable discussion revolves around per-token adaptive compute, where models adjust computational effort based on token importance, potentially impacting AI benchmarks significantly within the next 6-12 months. This method allows models to "think" more on complex tokens while expending less on simpler ones, suggesting a significant shift in AI processing efficiency.
- Recurrent Depth Approach and Weight Sharing: There's speculation on the implementation details, particularly whether the R blocks share weights and how these are sampled at test time. This recurrent depth approach, as discussed, could enhance the model's reasoning accuracy with increased recurrent steps, similar to efforts by OpenAI.
- Availability and Comparisons: The weights for this approach are accessible on Hugging Face, with additional resources available on GitHub. Comparisons are made to Meta's similar research, though they did not release weights, emphasizing the value of open-access research artifacts for practical exploration and understanding of AI's latent reasoning capabilities.
Theme 3. Orange Pi AI Studio Pro PC: A New Player in AI Hardware
- Orange Pi AI Studio Pro mini PC with 408GB/s bandwidth (Score: 315, Comments: 91): The Orange Pi AI Studio Pro mini PC has been released, featuring an impressive 408GB/s bandwidth. This development is significant for AI engineers looking for high-performance computing solutions in compact form factors.
- Hardware vs. Software Support: The Orange Pi AI Studio Pro mini PC is criticized for its lack of reliable software support, with users highlighting past issues with Orange Pi's software ecosystem. Concerns include the absence of updates, proprietary drivers, and poor community support, making it less appealing despite its hardware capabilities.
- Economic Considerations: Discussions emphasize the cost-effectiveness of pairing accelerators with DDR memory for AI workloads, as seen with setups like Deepseek R1 on EPYC systems costing under $10,000, compared to more expensive VRAM setups. The Orange Pi device, priced around $2,150, is seen as potentially good value for its specifications, but skepticism remains about its practical utility without robust software support.
- Alternative Solutions and Comparisons: Users suggest alternatives like older NVIDIA GPUs and Intel NUCs for better support and performance, noting the challenges of using NPUs in less mainstream systems like the Qualcomm Snapdragon X series. The Orange Pi device's potential is overshadowed by these alternatives due to its niche status and anticipated software hurdles.
Theme 4. Scaling Retrieval-Augmented Generation (RAG) for Massive Datasets
- How to scale RAG to 20 million documents ? (Score: 137, Comments: 136): To scale RAG (Retrieval-Augmented Generation) for 20 million documents, focus on optimizing latency, efficient embedding, and robust indexing strategies. Explore techniques like distributed computing, advanced indexing structures, and parallel processing to manage large-scale document retrieval efficiently.
- The discussion highlights the challenges and strategies for scaling RAG with 20 million documents, emphasizing the importance of efficient vector databases like Weaviate, PGVector, and Pinecone for handling large-scale data. HNSW indexing and Reranking strategies such as Reciprocal Rank Fusion (RRF) are recommended to optimize retrieval quality and performance.
- Participants debate the merits of fine-tuning versus context injection, with some arguing that fine-tuning is costly and less effective for large datasets. DataIsLoveDataIsLife suggests a pragmatic approach using stella_en_400M_v5 for embedding and MiniBatchKMeans for clustering, estimating a processing cost of $1,000-$20,000.
- The use of GraphRAG/LightRAG approaches and graph databases is proposed for better results, while others suggest leveraging existing search engines for retrieval. Data ingestion and indexing are also discussed, with suggestions for using middleware layers to manage data efficiently and experimenting with tools like parade db for high-scale search.
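Reciprocal Rank Fusion, recommended above for combining retrievers, needs only a few lines: each document's score is the sum of 1/(k + rank) over every ranked list it appears in. A minimal sketch — the doc IDs and the conventional `k = 60` default are illustrative:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse multiple ranked lists: each doc scores sum of 1/(k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc3", "doc1", "doc7"]   # e.g. from an HNSW vector index
keyword_hits = ["doc1", "doc9", "doc3"]  # e.g. from a keyword/BM25 index
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# → ['doc1', 'doc3', 'doc9', 'doc7']
```

Because RRF only consumes ranks, not raw scores, it sidesteps the score-calibration problem of mixing vector similarities with keyword relevance scores.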
Other AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT
Theme 1. Gemini 2 Flash: The New Benchmark for AI Translation Efficiency
- Did I save 93% of cost by using OpenAI for translation? (Score: 160, Comments: 47): The post author compares translation costs, noting that Azure charges approximately €9.60 per 1 million characters, while OpenAI's GPT-4o-mini costs around €0.70 per 1 million characters, potentially saving 93% in costs. The calculation includes the need to translate words from a given sentence, requiring the input word in the output, with costs broken down as €0.30 x 2 per million characters plus €0.075 for input.
- Discussions highlight the potential cost savings of using Gemini 2 Flash for translations, which offers better multi-lingual support and costs less than other options. Users note that with rate limiting and free tier usage, costs can be minimized or even eliminated, as detailed in Google's pricing with specifics on token costs and free tier limits.
- Several users discuss strategies to further reduce translation costs, such as utilizing batch processing and prompt caching, which can cut costs significantly by allowing non-real-time processing. A link to the OpenAI batch API documentation is provided for reference on how this can achieve up to 50% cost reduction.
- There is a conversation about the reliability and accuracy of various translation models, with some users suggesting open-source models for particular use cases, despite their slower speeds. Concerns are raised about translation quality, emphasizing the importance of having a human in the loop for large-scale translations to ensure accuracy.
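The batch-processing route mentioned above works by submitting a JSONL file with one request per line. A hedged sketch of building such a file for translation, assuming the OpenAI batch request shape (`custom_id`, `method`, `url`, `body`); the model name and prompt are illustrative, and uploading the file and creating the batch are separate API steps not shown here:

```python
import json

def build_batch_file(texts: list[str], path: str, model: str = "gpt-4o-mini") -> None:
    """Write one JSONL line per translation request, in the batch API's shape."""
    with open(path, "w", encoding="utf-8") as f:
        for i, text in enumerate(texts):
            request = {
                "custom_id": f"translate-{i}",  # used to match results to inputs
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model,
                    "messages": [
                        {"role": "system", "content": "Translate the user text to German."},
                        {"role": "user", "content": text},
                    ],
                },
            }
            f.write(json.dumps(request) + "\n")

build_batch_file(["Hello, world", "Good morning"], "batch_input.jsonl")
```

The file is then uploaded and a batch created against it; since results arrive asynchronously (within a completion window), this suits exactly the non-real-time translation workload discussed above.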
Theme 2. OpenAI's Innovative Branding with Super Bowl Ad
- OpenAI's $14 million SuperBowl ad (Score: 2722, Comments: 601): OpenAI is reportedly investing $14 million in a Super Bowl ad strategy, indicating a significant marketing push. This move could suggest an effort to increase public awareness and engagement with their AI technologies.
- Many commenters believe the Super Bowl ad effectively positions ChatGPT as a major technological milestone, similar to Apple's 1984 ad, by associating it with historical advancements like fire and the moon landing. This approach aims to create brand awareness and emotional connection rather than focus on specific functionalities.
- There is a divide in opinions about the ad's effectiveness; some argue it missed an opportunity to showcase ChatGPT's capabilities, while others see it as a strategic move to establish brand recognition and public acceptance of AI. The ad's creative and aesthetic quality received praise, with some noting its appeal to Millennials through elements like the Ratatat Neckbrace remix.
- The discussion highlights the complexity of marketing AI technologies, with some emphasizing the importance of brand positioning and awareness, while others question the decision not to demonstrate practical uses of ChatGPT in the advertisement. Critics argue that the ad may not effectively reach those unfamiliar with OpenAI or ChatGPT.
Theme 3. ChatGPT's Ascent to Top Global Website Traffic Rankings
- ChatGPT is now the 6th most visited site in the world as of January 2025, per Similarweb. The AI chatbot now holds 2.33% of global internet traffic, marking a 5.91% monthly surge. (Score: 139, Comments: 7): ChatGPT has become the 6th most visited site globally as of January 2025, according to Similarweb, capturing 2.33% of global internet traffic and experiencing a 5.91% monthly increase in visits.
- Commenters discuss that OpenAI is gaining significant data from ChatGPT interactions, which enhances their brand recognition and potential subscriber base. This data is invaluable beyond mere traffic statistics.
- OpenAI has achieved substantial brand recognition with ChatGPT, likened to historical brand dominance like Motorola's Droid. Commenters note that ChatGPT is becoming synonymous with "AI" for the general public, unlike lesser-known competitors like Claude.
- A shared Google Trends graph highlights the disparity in search interest between ChatGPT and Claude, emphasizing ChatGPT's dominant position in public awareness.
AI Discord Recap
A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking
Theme 1. Unsloth AI's Rise and Community Focus
- Unsloth Rockets to GitHub Stardom: Unsloth AI celebrates becoming the #1 trending repository on GitHub within a year of launch, marking significant community growth and impact. The community acknowledges Unsloth's contributions, particularly to Deepseek-R1, with potential integrations already in progress.
- REINFORCE Reasoning Methods Under Scrutiny: A Notion doc on reasoning LLMs trained with REINFORCE sparks debate over its novelty, with members noting that equivalent implementations already exist in Unsloth. Skepticism centers on what the approach adds over methods already available there.
- Model Merging Faces Headwinds: Merging models into MoEs draws skepticism, triggering discussions on potential downsides and limitations. The community debates potential learning losses in long output formats with shared structures, which could impede training for specific tasks.
Theme 2. No-Code AI Platforms & Tools Emerge
- Spark Engine Launches No-Code AI Powerhouse: Spark Engine v1 is live, debuting with 80+ AI models and offering no-code text, music, and video generation. Developers express interest in integrating infrastructure like Unsloth to further enhance the no-code AI ecosystem.
- Dataset Tools Gets AI-Powered EXIF Upgrade: Dataset Tools EXIF Viewer on GitHub enhances EXIF data viewing and adds support for GGUF and JPEG formats. Developers leverage AI to improve features and collaborate on code optimization for the project.
- Markdrop Python Package Drops PDF Data Bombs: Markdrop PDF to Markdown Converter on GitHub arrives as a new Python package for converting PDFs to Markdown, extracting images, and using AI for descriptions. The package quickly gains traction, hitting 7,000+ installs in a month.
Theme 3. Model Performance and Hardware Debates Heat Up
- Qwen 2.5 Leaves Llama 8B in the Dust: Qwen 2.5 outpaces Llama 8B in speed, particularly with larger models like 32B, due to better optimizations. Users suggest Qwen 2.5 is the superior choice for those with capable hardware.
- LM Studio Users Wrestle with Model Loading Errors: LM Studio users grapple with 'NO LM Runtime found for model format' errors, indicating hardware limitations. Users are advised to share system specs and screenshots and match model sizes to system capabilities based on LM Studio Docs.
- M4 Ultra vs M2 Ultra: The Great Mac Chip Showdown: A debate sparks over the value of waiting for M4 Ultra versus buying M2 Ultra for efficient model operation. Users are concerned about rising service costs amid uncertain model performance on M2 Ultra.
Theme 4. OpenAI Model Dynamics and User Concerns
- Gemini Swallows Context Whole, ChatGPT Chokes: Gemini’s massive 1-2 million token context window gains popularity over ChatGPT's 32k/128k token limits. Users prefer Gemini for complex tasks, despite ChatGPT limitations and connection errors.
- GPT-4 Feeling Dumber, Users Demand Better Prompts: GPT-4 is perceived as weaker than it once was, requiring more sophisticated prompting to yield good results. Users also report ongoing connection errors plaguing ChatGPT.
- DeepSeek's 'Unlimited' Turns Out to Have Limits: DeepSeek's 'unlimited' usage is revealed to have restrictions, with high use flagged as abusive, raising transparency questions. Users express concerns about the term 'unlimited' and inconsistent policy application.
Theme 5. Coding Tools and Agentic Workflows Evolve
- Cursor IDE Explodes with MCP Server Mania: Cursor IDE users dive deep into MCP servers, particularly Perplexity MCP server, for enhanced coding assistance. Users explore setups and troubleshoot installation issues across different operating systems.
- Agent Mode in Cursor Hailed as Debugging Hero: Agent Mode in Cursor is praised for debugging prowess, outshining standard coding commands with direct model communication. Users find integrating diverse LLMs boosts coding experience, especially with real-time assistance.
- Aider Chat History Balloons, Token Limits Loom: Aider's chat history grows excessively, reaching 25k tokens, sparking concerns about token limit overruns. Users discuss potential bugs, prompt-caching effectiveness, and performance impacts.
PART 1: High level Discord summaries
Unsloth AI (Daniel Han) Discord
- Unsloth Achieves GitHub Trending Status: Unsloth AI has become the #1 trending repository on GitHub within a year, celebrating its tools and resources.
- The community acknowledges Unsloth's contribution to Deepseek-R1, with components potentially already integrated or available in current projects.
- REINFORCE Reasoning Sparks Debate: Concerns arose over a document on Reasoning LLM using REINFORCE at this link, questioning its novelty.
- Members noted that an identical implementation already exists in Unsloth.
- Model Merging Faces Skepticism: Interest in merging several effective models into a single mixture of experts (MoE) was met with skepticism, leading to discussion about potential pitfalls and limitations.
- Discussion occurred regarding the potential loss of learning in long output formats that share common structures, which may hinder the training of specific tasks.
- Spark Engine Integrates No-Code AI: Spark Engine v1 has been launched with over 80 AI models, generating text, music, and videos at SparkEngine.ai.
- The developers expressed a desire to potentially integrate more infrastructure like Unsloth into the Spark Engine platform to foster advancements in the no-code AI realm.
- Dataset Curation Dominates Model Performance: It was emphasized that 80% of a model's performance hinges on careful dataset curation, with one member noting, 'There is no such thing as redundant research - you learn from every paper.'
- Another member is experimenting with Lora settings to develop a metacognitive first-person reasoning format.
HuggingFace Discord
- Kokoro TTS Speaks C#: A member released a C# library for Kokoro TTS, enabling plug & play integration on .NET platforms, available on GitHub.
- The library promises a multilingual experience with all voices packaged in a convenient format, supporting fast local TTS inference and working across multiple platforms.
- Dataset Tools Gets EXIF and AI Upgrade: The Dataset organizer and EXIF Viewer received updates, enhancing its capabilities to view advanced EXIF data and supporting formats like GGUF and JPEG, available on GitHub.
- The developer utilized AI tools to assist in the project, enhancing its features while collaborating with others for code optimization.
- Spark Engine Ignites AI Sandbox: The Spark Engine v1 was released after a year-long public beta, providing over 80 models for various AI tasks available at sparkengine.ai.
- The platform offers free credits daily and integrates with Hugging Face, making a robust no-code environment for users to experiment with AI capabilities.
- Markdrop Extracts PDF Data: A new Python package called Markdrop was introduced, designed for converting PDFs to Markdown with features like image extraction and AI-powered descriptions, accessible on GitHub.
- In just a month, it has achieved over 7,000 installs, showcasing its popularity among users looking for document manipulation tools.
- go-attention Implements Transformer in Pure Go: A member shared their project, go-attention, which showcases the first full attention mechanism and transformer built in pure Go, highlighting its unique capabilities on GitHub.
- The project invites others to check out examples and explore the potential of serverless implementations in Go programming.
LM Studio Discord
- Qwen 2.5 Smokes Llama 8B in Speed: Users compared Qwen 2.5 and Llama 8B, citing that Qwen offers faster response times due to optimization, especially with larger models like 32B.
- The discussion suggested that Qwen 2.5 is preferable with adequate hardware.
- LM Studio Users Battle Model Loading: Users encountered issues loading models into LM Studio, receiving errors like 'NO LM Runtime found for model format', indicating hardware limitations.
- The suggested solution was to provide system specs and screenshots for better assistance, as well as matching model size to system capabilities according to LM Studio Docs.
- Debate on M4 Ultra vs M2 Ultra ensues: A debate emerged about the value of waiting for the M4 Ultra versus purchasing the M2 Ultra for efficient model operation.
- Concerns centered on rising costs for existing services amidst uncertain performance of models on the M2 Ultra.
- PCI-E Risers Raise Eyebrows: A user inquired about using PCI-E riser cables to install additional GPUs and the performance implications, particularly with A5000 cards.
- A suggestion was made to repurpose old cases as GPU holders for enhanced cooling and space management.
OpenAI Discord
- Gemini Gains Large Context Popularity: Gemini’s capability to handle 1-2 million tokens has made it popular, especially compared to ChatGPT’s 32k and 128k tokens, enhancing usability for complex tasks.
- Users appreciate Gemini’s flexible features, making it a preferred choice for detailed work, despite concerns over ChatGPT’s limitations.
- GPT-4 Feels Weaker Nowadays: Members feel GPT-4 has become less capable and requires better prompting to yield good results, though the perceived decline in complex tasks may stem from expectations set by earlier models.
- Several users also reported ongoing connection errors while using ChatGPT, raising concerns about accessibility, which could be tied to the ChatGPT app.
- Indirect Injection: Data Needs Sanitization: Members voiced concerns over whether OpenAI has disclosed if deep research is vulnerable to indirect prompt injection from scraped pages, implying a need for data sanitization.
- Another member was optimistic about an upcoming feature addressing this concern, looking forward to more information.
- Markdown Manages URL Attention: ChatGPT is more effective with links described in markdown rather than plain URLs, improving prompt hygiene.
- Members found that using well-formatted structured data like JSON can help manage large blocks of information effectively.
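The markdown-over-bare-URL tip can be applied mechanically before a prompt is sent. A small illustrative helper — `markdownify_urls` is an assumption for this sketch, not an OpenAI feature — that wraps each bare URL as `[host](url)`:

```python
import re

# match bare http(s) URLs; the lookbehind skips URLs already inside markdown parens
URL_RE = re.compile(r"(?<!\()https?://\S+")

def markdownify_urls(prompt: str) -> str:
    """Wrap each bare URL as [host](url) so the link is described, not raw."""
    def repl(match: re.Match) -> str:
        url = match.group(0)
        host = re.sub(r"^https?://", "", url).split("/")[0]
        return f"[{host}]({url})"
    return URL_RE.sub(repl, prompt)

print(markdownify_urls("Summarize https://example.com/post/123 please"))
# → Summarize [example.com](https://example.com/post/123) please
```

A sketch like this pairs naturally with the JSON suggestion above: describe links in markdown, and wrap large data blocks in well-formed structured formats.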
- DeepSeek's 'Unlimited' Has Usage Restrictions: Reports highlight that heavy use of DeepSeek is categorized as abusive, sparking user concerns about what 'unlimited' actually means.
- The restrictions, seemingly applied inconsistently, prompted questions about policy transparency and user expectations.
Cursor IDE Discord
- Cursor MCP Servers Spark Discussion: Users on the channel discussed various MCP servers, including the Perplexity MCP server, detailing its setup and functionality within Cursor to improve coding assistance.
- Some users shared their experiences integrating different models into their workflows, while others troubleshot command prompts that returned errors.
- Agent Mode Praised for Debugging: Users explored Agent Mode functionalities and its advantages over standard coding commands, particularly praising its debugging capabilities and direct communication with models like Perplexity.
- The consensus was that integrating different LLMs could enhance the coding experience, especially with features allowing searching and real-time assistance.
- MCP Server Installation Snafus Reported: Several users encountered issues setting up MCP servers, specifically with command execution and server responses on different operating systems such as Mac and Windows.
- Discussions involved troubleshooting command prompts that returned errors or failed to connect, pointing to the need for improved documentation and support.
- Custom Cursor Rules Spark Interest: Participants discussed the possibility of creating custom cursor rules to improve the implementation of specific features while using the Perplexity MCP server, with links to Using Cursor with Convex.
- Users emphasized that integrated cursor rules could streamline workflow and enhance the ability of the AI to respond to complex code-related queries.
- Performance and Limitations Probed: Discussions occurred regarding the performance of various models, including reports of service degradation and concerns about fast API call limits within Cursor.
- Participants noted that MCP servers, if used correctly, could alleviate performance issues and provide better results than traditional web scraping methods.
Stability.ai (Stable Diffusion) Discord
- Unique Tags Boost Lora Consistency: Using unique tags in training data, such as specific names for objects or scenes, can significantly improve the consistency and narrative continuity of generated images in Lora models.
- The method helps the model to better associate specific scenes with those names, as shown in this example of Lora Training on BasedLabs.
- Optimal Flux Resolutions Found: For generating images with Flux, optimal latent sizes are around 672x1024 or 1024x672, while 1920x1088 provides a suitable quick HD generation size.
- Generating images above 1MP during initial passes may cause compositional issues.
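A small helper can make that guidance concrete by snapping a desired aspect ratio to dimensions near a pixel budget. The multiple-of-16 constraint here is an assumption (common for latent diffusion models, but check your pipeline's actual latent granularity), and the helper itself is a sketch, not part of any Flux tooling:

```python
def snap_resolution(aspect_w: int, aspect_h: int,
                    target_pixels: int = 672 * 1024, multiple: int = 16):
    """Pick a width/height near target_pixels matching the aspect ratio,
    rounded to a multiple (latent models typically need dims divisible
    by 8 or 16 -- an assumption here)."""
    scale = (target_pixels / (aspect_w * aspect_h)) ** 0.5
    w = round(aspect_w * scale / multiple) * multiple
    h = round(aspect_h * scale / multiple) * multiple
    return w, h

print(snap_resolution(2, 3))   # portrait, near the 672x1024 sweet spot
print(snap_resolution(16, 9))  # widescreen at the same pixel budget
```

Keeping the initial pass under ~1MP and upscaling afterwards sidesteps the compositional issues mentioned above.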
- Photoshop Gets ComfyUI Integration: Users are exploring the integration of various plugins for ComfyUI with Photoshop, such as Auto-Photoshop-StableDiffusion-Plugin and sd-ppp.
- These plugins enable the generation of stable diffusion images directly within Photoshop using a ComfyUI backend.
- Stable Diffusion Hit GPU Snags: Users reported troubleshooting GPU errors and slow performance issues across different Stable Diffusion UI paths, with lowering GPU settings being a common solution to resolve memory issues.
- Using specific settings and maintaining aspect ratios were recommended to improve model performance and output quality, see Stable Diffusion Knowledge Base (Setups, Basics, Guides and more).
- AI-Generated Art Gets Copyright Shield?: A recent case granted copyright protection to an AI-produced image due to sufficient human input, potentially setting a legal precedent for AI-generated content ownership, reported by cnet.com.
- The image, called A Single Piece of American Cheese, was created using Invoke's AI editing platform.
Nous Research AI Discord
- Nous Mimics META's Moves: Discussion highlights how Nous Research improves its AI models using advancements from larger companies like META and DeepSeek, while facing funding challenges as a smaller startup.
- The focus is on creating affordable frontier AI models to maintain market competitiveness, similar to building on existing codebases.
- Granite 3.1 Trains Multiple Objectives: User plans to train Granite 3.1's 3B model to explore training strategies and custom RL loops with multiple objectives per epoch in a new setup.
- This explores the potential of using multiple objectives within the novel training structure.
- Zonos Clones High Fidelity Voices: The release of Zonos, a high-fidelity TTS model featuring voice cloning, showcases strong performance against leading TTS providers.
- The model's open-source license under Apache 2.0, as noted in ZyphraAI's tweet, promotes its integration into AI development.
- LM Similarity Undermines AI Oversight: Research has proposed a probabilistic metric for language model similarity based on model mistakes to enhance AI oversight, as detailed in a paper on arxiv.org.
- The work suggests that LLMs used as judges favor models similar to themselves, and that complementary knowledge between models is what enables weak-to-strong generalization; the trend is concerning because model mistakes are becoming harder to detect just as AI oversight grows more important.
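As a toy illustration of similarity-from-mistakes (not the paper's probabilistic metric; the Jaccard formulation and function name here are stand-ins), two models can be compared by the overlap of the examples they get wrong:

```python
def mistake_overlap(preds_a, preds_b, labels):
    """Toy similarity score: Jaccard overlap of the two models' error sets.
    Illustrative only -- the cited paper defines a probabilistic metric,
    not this exact formula."""
    errs_a = {i for i, (p, y) in enumerate(zip(preds_a, labels)) if p != y}
    errs_b = {i for i, (p, y) in enumerate(zip(preds_b, labels)) if p != y}
    union = errs_a | errs_b
    return len(errs_a & errs_b) / len(union) if union else 1.0

labels  = ["A", "B", "C", "D"]
model_1 = ["A", "B", "X", "X"]   # wrong on items 2 and 3
model_2 = ["A", "X", "X", "D"]   # wrong on items 1 and 2
print(mistake_overlap(model_1, model_2, labels))  # → 0.333...
```

The higher this overlap across a population of models, the less independent signal a judge or ensemble gets from adding another model.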
- OVERTHINK slows reasoning models: The OVERTHINK attack is causing models to slow down by as much as 46x in inference by injecting decoy tasks, amplifying reasoning tokens without altering output, according to Jaechul Roh's tweet.
- The method uses complex tasks like Markov Decision Processes and Sudoku during untrusted contexts to manipulate inference processes, posing risks for models like OpenAI's o1 and o3-mini.
Codeium (Windsurf) Discord
- Windsurfers Request Profile Page Polish: The Codeium team is soliciting user feedback for improvements to the Codeium profile page, with users encouraged to submit suggestions via a provided form.
- The enhancements aim to create a more useful and personalized experience, focusing on the stats and metrics that users find most valuable.
- Jetbrain Extension Seen as Abandoned: Users worry that the Jetbrain extension model availability lags behind Windsurf, with some speculating about a shift towards a Cursor-centric approach, causing frustrations over lost functionalities.
- The announcement that a new passive in-text editor experience will be exclusive to Windsurf, leading to the deprecation of Supercomplete on the VSCode plugin, exacerbates these concerns.
- Codeium Plagued by Payment Problems: There's discussion around payment restrictions affecting Russian users, causing challenges in securing licenses due to regional limitations and company policies.
- Users are urging Codeium for clearer communication regarding these restrictions, as well as an improved payment process.
- Windsurfers Want Workflow Improvements: Windsurf users reported issues with code proposals, diff displays, and automatic updates, along with the need for more consistent tool calling among AI models like O3, Deepseek, and Claude.
- Users are also requesting better credit management, system issue notifications, improved design documents, debugging capabilities, and output consistency from AI models.
- Credit Crunch Concerns Codeium Customers: Users voiced concerns about the credit system, particularly around consumption during operations and the absence of refunds for unsuccessful attempts.
- The frustration stems from spending credits on unsatisfactory outputs, prompting calls for more transparency in usage handling.
OpenRouter (Alex Atallah) Discord
- OpenRouter Exposes Reasoning Tokens: Users can now see reasoning tokens on model activity pages alongside prompt and completion tokens for better transparency.
- This enhancement aims to provide users with deeper insights into how models perform on the OpenRouter platform.
- Chat-thyme Simplifies Discord Bot Creation: Chat-thyme lets you set up Discord bots using any OpenAI-compatible LLM framework, offering easy OpenRouter integration.
- It also integrates Exa for models supporting tool use, although reliability depends on the provider.
- FindSMap Integrates Historical Maps Globally: FindSMap is a progressive web application connecting historical maps and archaeological institutes using Open Street Maps and Leaflet.js.
- Built with Claude and Open Router, FindSMap showcases iterative development and dedication to the project.
- DeepSeek R1 faces Timeouts: Users reported significant performance issues with DeepSeek R1, experiencing timeouts during API requests; the 'nitro' variant is now integrated into the main model features, allowing users to sort by throughput.
- A new inference stack for DeepSeek R1 @togethercompute gets up to 110 t/s on the 671B parameter model (tweet).
- TypeScript SDK Eases LLM Calls: A team is building a TypeScript SDK to interface with over 60 LLMs using OpenAI's format, integrating OpenRouter.
- The GitHub project aims to simplify calls to 100+ LLM Providers, but feedback indicates it may be rough around the edges.
aider (Paul Gauthier) Discord
- DeepSeek APIs Suffer Instability: Users reported instability and unresponsiveness with DeepSeek APIs, especially when integrating them with Aider. One user had trouble getting outputs using DeepSeek with specific configurations.
- Model comparisons for DeepSeek's R1 and V3 favored Hyperbolic and OpenRouter over other providers, with users noting specific configurations enhancing performance.
- Aider Auto-Creates Files in Architect Mode: Users are experiencing Aider auto-creating files without prompts in Architect mode, leading to confusion. A user shared a screenshot showing the unexpected behavior, suggesting potential configuration issues; see issue #3153.
- This unexpected behavior is leading to confusion about the operation flow, and warrants more investigation into the config.
- Aider Chat History Reaches Token Limit: There are concerns that Aider's chat history is exceeding reasonable limits, with some users reporting it climbing to 25k tokens.
- The community discussed potential bugs and the effectiveness of prompt caching, and the overall effect on performance.
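One rough way to keep history from climbing like that, assuming the common ~4-characters-per-token heuristic for English (an approximation; real tokenizers differ, and this is not Aider's actual accounting), is to trim the oldest messages against a token budget:

```python
def rough_tokens(text: str) -> int:
    """Crude token estimate (~4 chars/token for English; an approximation)."""
    return max(1, len(text) // 4)

def trim_history(messages, budget_tokens: int = 8000):
    """Drop oldest messages until the estimated total fits the budget."""
    kept = list(messages)
    while kept and sum(rough_tokens(m) for m in kept) > budget_tokens:
        kept.pop(0)
    return kept

history = ["x" * 20000, "y" * 20000, "z" * 4000]  # ~5000, 5000, 1000 tokens
print([len(m) for m in trim_history(history)])   # → [20000, 4000]
```

Note that aggressive trimming interacts badly with prompt caching (each trim changes the cached prefix), which is part of the trade-off the community was discussing.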
- Copilot Proxy Unlocks GitHub Copilot Models: The experimental Copilot Proxy VS Code extension enables AI assistants access to GitHub Copilot's language models. A YouTube video details the extension's functionality.
- One member sought ways to utilize the Copilot Proxy work, and another suggested using the llmap repo with its `parse.py` script to extract file outlines.
- Gemini Models Effective for PHP Tasks: Users reported positive experiences with Gemini models like `gemini-1206-exp` for PHP tasks, with comparisons to other providers showing no significant differences in output.
- Aider also introduced experimental support for tree-sitter-language-pack, aiming to expand Aider's programming language capabilities. Users are encouraged to test this feature and provide feedback.
Latent Space Discord
- DeepSeek R1 Goes Local: Chinese GPU manufacturers like Moore Threads and Baidu's Kunlun are now supporting DeepSeek's R1 LLM models on local systems, increasing competition with NVIDIA.
- This move signifies growing AI hardware capabilities in China, challenging NVIDIA's dominance in AI processing.
- Anthropic Indexes Economic Impact: Anthropic launched the Economic Index, including a paper analyzing millions of anonymized Claude conversations to assess AI's impact on the economy, as discussed in their Tweet.
- Initial findings reveal material transportation shows surprisingly low engagement compared to other sectors.
- Replit Simplifies Mobile App Creation: Replit introduced early access for Native Mobile App support, enabling users to create iOS and Android apps without coding, powered by Replit Assistant; tweet here.
- This launch marks a pivot towards more accessible app development, promising full agent support soon.
- Deep Research Tool Sparks Debate: Members discussed OpenAI's new Deep Research tool, highlighting its interactive approach by asking clarifying questions before research, which signals a move towards more proactive AI as shown on their Deep Research page.
- Comparisons are emerging with tools like Hugging Face's Deep Research and other community-developed alternatives.
- ELIZA Makes a Comeback?: Members were introduced to the ELIZA Operating System (ELIZA Operating System) designed for AI agents, highlighting its foundational role in chatbot technology.
- The conversation highlighted the historical significance of chatbots like ELIZA in the context of modern AI development.
Modular (Mojo 🔥) Discord
- Mojo Faces Ecosystem Hurdles: Members debated Mojo's viability for web development, emphasizing the importance of a solid ecosystem and seamless integration with existing Python libraries.
- The general consensus was that significant effort is required to build foundational tools before widespread adoption can occur, mentioning platforms like Render as a good example.
- VariadicList Challenges Arise in Mojo: A user reported issues initializing VariadicList in Mojo, specifically concerning dynamic element repetition using the `pop.variadic.create` operation, and posted a link to the GitHub issue.
- The issue highlights potential gaps in Mojo's current capabilities for handling variadic lists, with some members sharing their own `mojoproject.toml` files (such as this one).
- Domain Knowledge Drives Business: Participants stressed that domain understanding is essential for launching a successful tech business, particularly the need for strong networking knowledge.
- Many startups neglect this aspect, which leads to avoidable challenges and impedes growth. 'Understanding the domain is crucial for launching a business', one member stated.
- Network Effects Influence Language Adoption: The group discussed how network effects impact the adoption of languages like Rust, where a vibrant ecosystem fosters experimentation and growth.
- While some tolerate rapid development 'slop', others advocate for maintaining high-quality standards to ensure long-term viability and prevent technical debt.
- C++ Remains King in High-Performance: The discussion highlighted C++'s continued dominance in performance-critical applications and its impact on new language adoption.
- While Mojo has potential, its growth hinges on seamless integration with established languages and offering substantial performance advantages over current solutions.
MCP (Glama) Discord
- No Firebase/Firestore MCP Found: A user looking for a Firebase/Firestore MCP was directed to a link indicating it might not exist, highlighting a need for such a tool.
- This gap underscores opportunities for developing MCP tools tailored to specific database integrations.
- MCP Command Path Misconfiguration: Users encountered 'No Tools Found' errors while adding MCP servers via Cursor, suggesting path misconfigurations might be the cause.
- Solutions involve verifying the correct command path and potentially resetting the application after updates, ensuring proper tool recognition.
- MCP Performance Faces Python SDK Hurdles: Users reported slow tool call responses when using MCP with Claude Desktop, attributing the issues to limitations within the Python SDK and ongoing bugs after a recent update (python-sdk@bd74227).
- The feedback emphasizes a demand for enhanced error handling and overall performance improvements to facilitate smoother operation.
- Smithery Installer Sparks Concerns: While regarded as a leading MCP installer, concerns arose about Smithery's remote data handling and overhead, prompting a search for a more local alternative.
- Users emphasized the need for privacy and efficiency, pushing for solutions that minimize remote data dependencies in MCP tools.
- Claude Desktop Beta Still Buggy: Beta testers experienced crashes with the Claude Desktop app while using their MCP servers, reflecting the current features' unreliability.
- The consensus is that the app requires extensive feedback and substantial improvements before a stable release can be anticipated, as provided in the Claude Desktop Quick Feedback form.
GPU MODE Discord
- cuBLAS Shows Varied GPU Performance: A user found cuBLAS performance inconsistent between a 1650ti and 4090, questioning if the build accommodates newer architectures.
- Discussions also touched on how increasing the L1 hit rate might alleviate stalls related to load queuing.
- Unsloth Turbocharges LLM Training: Unsloth can speed up LLM training by 30x, enabling Alpaca training in just 3 hours instead of 85, according to their blog post Introducing Unsloth.
- They claim 60% less memory usage without sacrificing accuracy, offering both open source and proprietary options.
- Mistral Finetuning Gets 14x Faster: The introduction of QLoRA support accelerates Mistral 7B finetuning by 14x on a single A100, decreasing peak VRAM usage by 70%, as noted in their blog post Unsloth update: Mistral support + more.
- Additionally, CodeLlama 34B sees a 1.9x speedup, with enhanced memory utilization preventing out-of-memory errors.
- Explore iGPU Programming on Ryzen AI: Members discussed how to leverage the iGPU in the Ryzen AI CPU (Strix Point) through graphics frameworks or potentially HIP.
- These approaches could allow developers to tap into the processing power of integrated GPUs.
- reasoning-gym gets Matrix Manipulation: The reasoning-gym saw new PRs merged, including Matrix Manipulation and Count Bits, expanding the dataset offerings.
- Members considered how to best benchmark the gym environment to see how RL training impacts generalization, and considered using OpenRouter for inference compute.
Notebook LM Discord
- NotebookLM Plus Joins Google One, Student Discounts Arrive: NotebookLM Plus is now part of Google One AI Premium, offering higher usage limits; U.S. students over 18 get a 50% discount, bringing the plan to $9.99/month.
- NotebookLM Plus increases notebook capacity by 5x, source limit per notebook by 6x, and audio overviews by 7x.
- Users grapple with NotebookLM's Source Generation Hiccups: Users report issues with NotebookLM failing to generate notes from uploaded sources like .txt and .pdf files; the system displays 'New Note: Generating' indefinitely.
- Workarounds include directly pasting text and directing users to official Google support links to understand inherent free and paid version limits.
- NotebookLM Plus Boosts Chat and Sharing Tools: NotebookLM Plus now features advanced chat customization, sharing capabilities, and provides comprehensive usage analytics.
- Notebook sharing requires Gmail to be enabled, presenting challenges for users with SSO from Azure.
- AI Bridges Clarity Gap in Medical Discussions: A member shared how AI helps clarify medical jargon related to their breast cancer diagnosis, summarizing dense articles and surgeon appointments.
- They emphasized how AI has been a comforting aid during their treatment by challenging the AI for clarifications.
- Users Build Versatile Bots With NotebookLM: A user launched the Versatile Bot Project, providing prompt documents to transform NotebookLM into different types of chatbots through specialized prompts.
- The user said that both prompts have been tested and aimed to create a customizable chatbot experience.
Eleuther Discord
- Skip Transcoders leap ahead of Sparse Autoencoders: Skip transcoders demonstrate a Pareto improvement over SAEs, providing enhanced interpretability and fidelity for researchers, and can be used with the `--transcode` and `--skip_connection` flags in the sparsify library.
- In contrast to SAEs, transcoders better approximate input-output relationships, bolstering the approach to interpretability, according to the team, which published their paper on arxiv.org.
- Partial Rewriting Faces Obstacles: The team encountered lackluster results in their research on partially rewriting transformers, as they trained a skip transcoder on the sixth layer of Pythia 160M.
- Despite initial setbacks, the team remains optimistic about refining their methods and has published a paper detailing the approach.
- GPU Retrofitting for AI: Proceed with Caution: Concerns about repurposing older 1070ti mining rigs for AI highlighted issues with outdated architecture and bandwidth limitations, possibly limiting training.
- While these GPUs could serve adequately in inference tasks, members cautioned against expecting efficient training outcomes for contemporary AI models.
- Chess-Based LLM Evaluation Gambit: EleutherAI is creating a task to evaluate LLMs using a database of 4M+ chess tactics, which could uniquely enhance LLM performance, eventually playing chess, by leveraging reinforcement learning.
- The team is determining whether to do MCQ style versus free-form generation, hoping for models to show their reasoning through tags.
- Pythia's Puzzling Checkpoint Pattern: Discussion clarified that Pythia saves checkpoints every 1,000 steps, contrary to claims of 10K steps, to enable deeper analysis using log(tokens) for interpretations.
- There was some consideration about whether smaller linear step sizes and switching over earlier would improve efficiency, weighed against concerns of wallclock overhead for saving checkpoints.
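The log(tokens) motivation can be sketched with a small helper (hypothetical; this is not Pythia's actual schedule code) that spaces checkpoints evenly in log space, sampling early training, where loss moves fastest, far more densely than a linear schedule would:

```python
import math

def log_spaced_steps(total_steps: int, n: int):
    """Checkpoint steps spaced evenly in log(step): dense early in
    training, sparse later, versus a fixed every-1,000-steps cadence."""
    ratio = math.log(total_steps) / (n - 1)
    return sorted({round(math.exp(i * ratio)) for i in range(n)})

# Pythia trains for 143,000 steps; 8 log-spaced checkpoints:
print(log_spaced_steps(143_000, 8))
```

The wallclock concern raised above is visible here: most of the saves land in the first few thousand steps, where checkpoint overhead is a larger fraction of elapsed time.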
Yannick Kilcher Discord
- Logits vs Probabilities sparks debate: Members debated the benefits of training models in log space compared to absolute space, emphasizing that log space can capture a wider range of values and can lead to more similarities in distant points.
- One member pointed out that using log space affects accuracy based on the use case.
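A concrete version of the log-space argument is the standard logsumexp trick: probabilities of large-magnitude logits overflow or underflow in absolute space but stay finite in log space. A minimal sketch:

```python
import math

def log_softmax(logits):
    """Numerically stable log-probabilities. Subtracting the max before
    exponentiating keeps every exp() argument <= 0, so nothing overflows."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

logits = [1000.0, 999.0, 0.0]
print(log_softmax(logits)[:2])  # finite values; naive math.exp(1000.0) raises OverflowError
```

Working in log space this way is why training losses are almost always computed from log-probabilities rather than probabilities.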
- Sparse Autoencoders Receive Skepticism: A member voiced skepticism about Sparse Autoencoders (SAEs) being overhyped, expressing disappointment in their interpretability and citing inconsistencies across random seeds, see this paper.
- The discussion referenced recent papers critiquing SAEs and exploring new methods for model interpretation, as well as skip transcoders outperforming SAEs, as shared on twitter.
- Guardrails Fail Bioweapons Discovery: A drug discovery algorithm, intended to minimize toxicity, reportedly switched to maximizing toxicity, leading to the discovery of 40,000 potential bioweapons in just 6 hours.
- The incident raised alarms about the effectiveness of current guardrails against broad knowledge synthesis and the risk of overlooking harmful compounds due to narrow focus.
- PlanExe AI Project launches on Github: A member introduced PlanExe, a structured AI planner built with LlamaIndex and OpenRouter, which can generate structured plans like SWOT analyses without extensive web searching, available on GitHub.
- The creator expressed uncertainty about the accuracy of the outputs but also provided a link to PlanExe-web.
- LLMs Struggle With Token Counting: Members noted that LLMs struggle with counting tokens in their context, suggesting that the difficulty extends beyond tokenization to a fundamental inability to count.
- It was simply stated by a member that LLMs can't count at all.
LlamaIndex Discord
- Gemini Flash Accelerates Document Understanding: LlamaParse now supports Gemini 2.0 Flash, achieving GPT-4o+ performance levels for document processing at a lower cost, setting the stage for enhanced workflows leveraging VLMs and LLMs.
- A tutorial by @composiohq demonstrated building a YouTube research agent with Gemini Flash 2.0, streamlining video searches and Gmail draft creation, reinforcing LlamaIndex's utility in simplifying video research workflows.
- CrossPoster App Arrives for AI-Enhanced Social Media: The CrossPoster app launched, enabling cross-posting to Twitter, LinkedIn, and BlueSky using AI to optimize social media engagement.
- The app intelligently identifies individuals and their accounts, streamlining the management of a social presence across platforms.
- OpenAI LLM Faces Timeout Troubles: Members found that the timeout for OpenAI LLM options is being overridden by the retry decorator, leading to inconsistencies, despite higher timeout settings.
- One member shared that even after submitting a bug fix, Deepseek returns a 200 OK response after 60 seconds but with an empty body, exacerbating the issue.
- Hand-off Frustrations in LlamaIndex: Users voiced concerns about the `can_handoff_to` feature in LlamaIndex, particularly when agents transfer control without a response from the receiving agent, leading to dropped requests.
- Suggested solutions included enabling debug logging and using LlamaIndex's callback handler for more effective troubleshooting.
- Metadata Must-Haves for AzureAI Search: A user questioned the hardcoded customization of filterable metadata fields in AzureAI Search, specifically noting 'author' and 'director'.
- It was clarified that Azure requires these metadata fields to be defined upfront, emphasizing the significance of well-defined and useful document fields, and the need to be aware of the current limitations of the feature.
Cohere Discord
- Trust Yourself During Job Hunt: Members on the Cohere Discord emphasized self-belief during job applications, encouraging others to trust in themselves 'regardless of what they say'.
- They added that everyone is just as uncertain, pushing for persistence in the face of challenges and highlighting the lack of hiring opportunities for engineering internships.
- Networking Boosts Exposure: Members said networking is crucial regardless of one's location, recommending participation in events to boost exposure and engagement in open-source projects to connect with others in the field.
- One user mentioned attending conferences and competitions relevant to their engineering field, even highlighting their participation in the Canadian engineering competition.
- LibreChat API calls hitting v1 instead of v2: A member highlighted that they can only access the Cohere API through `https://api.cohere.ai/v1` using LibreChat's Custom Endpoint, confirming the Cohere API works via curl.
- It was pointed out that LibreChat is currently calling the old API version (v1) and needs an update to the `/v2` endpoint, though the URL `https://api.cohere.com/v1` mirrors the functionality of `https://api.cohere.ai/v1`.
- Cohere Community lays down the Rules: Members discussed the Cohere Community rules, emphasizing respect and appropriate conduct within the server, while drafting introduction messages for newcomers, highlighting interests in AI and local initiatives like 'buy Canadian'.
- The discussion later shifted to the scalability of Cohere's API and how accessible their staff is for collaboration, while one member encouraged a Socratic dialogue about vapes.
LLM Agents (Berkeley MOOC) Discord
- Yu Su's Language Agents Lecture Livestreamed: Today at 4:00pm PST, the 3rd lecture, featuring Yu Su on Memory, Reasoning, and Planning of Language Agents, was live streamed here, arguing that contemporary AI agents use language as a vehicle for reasoning.
- Yu Su is a Distinguished Assistant Professor at the Ohio State University and co-directs the NLP group with significant contributions including Mind2Web, SeeAct, HippoRAG, LLM-Planner, and MMMU garnering recognition like the Best Student Paper Award at CVPR 2024 and Outstanding Paper Award at ACL 2023.
- MOOC Late Enrollment and Curriculum Details Awaited: Users can enroll in the LLM Agents MOOC that started in January, and staff promised to release more curriculum details soon, addressing concerns about project framework and publication limitations.
- Participants asked about the specifics of assignments and projects outside of quizzes, to which staff mentioned detailed information would be released shortly, encouraging users to remain patient while awaiting clear guidelines on project requirements and grading policies.
- Certificate Concerns in Berkeley MOOC: Several users reported not receiving their certificates while their peers have, prompting a focus on missing completed certificate declaration forms as a required step.
- Course staff reiterated that completion of this form is necessary for certificate issuance and needs to be submitted individually, and suggestions included creating an automated agent to streamline the certificate process and address common queries.
- DPO Explained and Compared to SFT: A member explained how Supervised Fine Tuning (SFT) uses only positive examples while Direct Preference Optimization (DPO) incorporates negative responses, highlighting the penalties for bad responses in DPO.
- Because SFT lacks a reward model, bad responses that are well-structured can still have their probability increased during training; DPO explicitly penalizes them.
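The penalty on bad responses can be made concrete with the standard DPO objective, -log σ(β·margin), sketched here for a single preference pair (the variable names are illustrative):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair. Inputs are summed log-probs of the
    chosen/rejected responses under the policy (pi_*) and the frozen
    reference model (ref_*). Unlike SFT, the rejected response enters the
    objective, so its probability is pushed down relative to the reference."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))  # -log sigmoid

# As the policy separates chosen from rejected, the loss falls:
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))   # margin 0 -> loss = log 2
print(dpo_loss(-3.0, -8.0, -5.0, -5.0))   # positive margin -> smaller loss
```

SFT, by contrast, would only ever see the chosen response and maximize its likelihood, leaving the rejected one untouched.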
- Lecture 2 Study Session Prompted Time Zone Concerns: A member announced a study session on Lecture 2: Learning to Reason with LLMs, inviting others to join via a provided link, preparing to discuss GRPO from DeepSeek-R1 as part of the study materials.
- One participant expressed concern about the study session's timing, noting that it fell at 3:00 AM UK time, highlighting potential scheduling conflicts for international members.
Torchtune Discord
- Exploring Artificial Data Generation Methods: A member is diving into artificial data generation and is looking for tools to turn unstructured data like PDFs and Excel files into training samples for LLMs, citing a YouTube video on the topic.
- However, there was a recognition of challenges in training LLMs with synthetic data, noting that question generation alone may not provide the comparative insights that require comprehensive data across multiple document sources.
- Kolo Simplifies Fine-Tuning: A member is developing Kolo, a tool designed to simplify model fine-tuning, but it currently lacks data creation capabilities.
- The developer plans to add a training data generation feature in the future.
- PR #2257 Under Review: A member requested a review for PR #2257, stating it passes local tests but needs more feedback.
- Reviewers lauded the changes but raised UX concerns regarding quantization and recommended documentation improvements.
- GRPO's Feature Philosophy: The team debated whether to simplify GRPO by removing functionalities, balancing usability with cleaner code.
- Opinions leaned toward removing unneeded code, with some acknowledging the potential need for features like activation checkpointing; see Grpo loss by kashif.
- Torchtune's Checkpointing Mechanics Detailed: A member shared how resume functionality updates checkpoint paths and depends on the `resume_from_checkpoint` flag, as seen in the Checkpointing in torchtune documentation.
- Discussion covered the implications of unusual workflows in loading initial weights.
Nomic.ai (GPT4All) Discord
- GPT4All Lacks Model Selection Menu: Users are concerned about the absence of a functional model selection menu with search options in GPT4All, even after 36 releases.
- A member suggested contributing code to enhance GPT4All due to its open-source nature.
- AI Agents Embrace Databases for Long-Term Memory: Members explored using AI agents with databases for long-term memory and suggested improving LLMs' temporal awareness through functions.
- The conversation speculated that 2025 could be a pivotal year for advancements in agentic AI.
- GPT4All Sidelines Image Analysis: It was clarified that GPT4All does not currently support image analysis, with suggestions to use other platforms for such tasks.
- Recommendations included tools like booruDatasetTagmanager and joycaption for image-related projects.
- Perfecting PDF Embedding Methods: Members discussed strategies for embedding and summarizing long documents like PDFs into usable formats for GPT4All.
- Proper handling of downloads to remove irrelevant content before embedding was emphasized.
- Qwen2.5 and Phi4 Win Popularity Contest: Members recommended Qwen2.5 and Phi4 for their efficiency compared to models like Mistral.
- The user-friendliness of models integrated with the app was underscored, with offers of assistance for those unfamiliar with Hugging Face.
tinygrad (George Hotz) Discord
- Tinygrad's Mobile Misadventures: Testing reveals WebGPU failing on iPhone 15 due to caching issues, while M1 Pro users report success on Safari and Chrome with tinychat demos.
- The community is calling for enhanced testing to improve compatibility, especially with WASM loading on mobile devices.
- Tinygrad's Remote Roots Revealed: Clarification emerged that tinygrad is a fully remote company, dismissing rumors of being based in San Diego due to inaccurate Twitter information.
- The correction prompted inquiries about Ampere Altra processor support and backend acceleration capabilities.
- Company Meeting Gears Up for Action: Meeting #57 is scheduled, featuring discussions on company updates, CI speed, tensor cores, and potential bounties for WebGPU and tinychat enhancements.
- The goal is to boost internal operational speeds and address community interests in ongoing projects.
- FP16's Fate in ML Frameworks: A debate broke out over why most ML frameworks don't use fp16 exclusively, surfacing its disadvantages and performance limitations.
- George responded by pointing to the Discord rules, prompting further commentary on doing research before asking questions.
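The core limitation under debate is visible directly in fp16's 10-bit mantissa. A quick demonstration (assuming numpy is available) of why frameworks keep fp32 master weights and accumulators rather than going fp16-only:

```python
import numpy as np

# fp16 has a 10-bit mantissa, so integers above 2048 are no longer all exact:
x = np.float16(2048) + np.float16(1)
print(x)  # the +1 is rounded away

# Naive fp16 accumulation stalls once the running sum's representable
# spacing exceeds the addend -- the classic argument for mixed precision.
acc16, acc32 = np.float16(0.0), np.float32(0.0)
for _ in range(10_000):
    acc16 = acc16 + np.float16(0.1)
    acc32 = acc32 + np.float32(0.1)
print(float(acc16), float(acc32))  # the fp16 sum stalls far below 1000
```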
- PR Precision and Quantization Quirks: Discussions centered on a pull request (PR) implementing a script, emphasizing the need for additional features and testing, especially with Hugging Face models.
- The community stressed the importance of clean PR structure for easy reviews while acknowledging existing numerical inaccuracies in quantized models as a challenge.
DSPy Discord
- DSPy Trains BERT to Classify Articles: A member transitioned from GPT-3.5 and GPT-4 to training a BERT model for article classification using DSPy.
- The optimized prompt now extracts a dozen fields from each article, processed in batches every 24 hours using MIPROv2 with o3-mini as teacher and Mistral Small 3 as student, which yielded a 50% discount.
- Multi-Agent Systems Boost Performance with MASS: LLMs operating as multiple agents show great promise in solving complex tasks due to effective collaboration strategies highlighted in the MASS framework.
- The analysis emphasizes the importance of prompts and topologies in multi-agent system design.
- Factorio as AI Agent System Engineering Sandbox: Static benchmarks fall short in evaluating the skills needed for dynamic system engineering, so training agents via automation-oriented sandbox games like Factorio is proposed.
- This fosters the development of reasoning and long-horizon planning capabilities essential for managing complex engineering challenges.
- Deep Research Abstractions: A member inquired about plans to introduce abstractions that simplify tasks akin to deep research.
- Are you guys planning to introduce abstractions? the member asked, highlighting their curiosity about potential upcoming features.
- DSPy Client Error Debacle: A member reported encountering the error `AttributeError: module 'dspy' has no attribute 'HFClientVLLM'` while using dspy.
- They later noted that this feature was deprecated in dspy 2.6, which resolved their confusion.
Gorilla LLM (Berkeley Function Calling) Discord
- Custom RAFT templates for Llama?: A member inquired whether their own templates, similar to RAFT's, could be used for generating synthetic datasets with Llama.
- This inquiry raises questions about the flexibility of Llama's dataset requirements and customization options.
- Compatibility issues with HF Datasets: A member voiced concerns about potential compatibility issues with HF datasets due to differing function properties.
- The member suggested converting complex objects to strings to simplify dataset loading and usage.
- JSON lines Formatting Clarified: A member clarified that there are no issues with the JSON files, noting that HF expects JSON lines formatted files.
- This clarification underscores the importance of adhering to the expected file format for successful dataset loading in HF.
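The two points above combine into a short recipe: serialize nested function-call objects to strings (so differing properties don't break schema inference), then write one JSON object per line. The record fields and file name here are made up for illustration:

```python
import json
import os
import tempfile

records = [
    {"question": "Turn on the lights",
     "function": {"name": "set_lights", "args": {"state": "on"}}},
    {"question": "What's the weather?",
     "function": {"name": "get_weather", "args": {"city": "Berkeley"}}},
]

path = os.path.join(tempfile.gettempdir(), "train.jsonl")
with open(path, "w") as f:
    for rec in records:
        row = dict(rec)
        # Nested objects with differing shapes complicate HF schema inference,
        # so store them as JSON strings and json.loads() them after loading.
        row["function"] = json.dumps(row["function"])
        f.write(json.dumps(row) + "\n")  # one object per line: JSON lines

with open(path) as f:
    first = json.loads(f.readline())
print(type(first["function"]).__name__)  # → str
```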
- README Update Proposed: A member offered to create a pull request (PR) to update the README with a new helper function.
- The suggestion was well-received, indicating a collaborative approach to improving user experience and documentation.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
The full channel by channel breakdowns have been truncated for email.
If you want the full breakdown, please visit the web version of this email: !
If you enjoyed AInews, please share with a friend! Thanks in advance!