AI News (MOVED TO news.smol.ai!)

Archives
October 11, 2024

[AINews] not much happened today

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜


a quiet long weekend is all we need.

AI News for 10/10/2024-10/11/2024. We checked 7 subreddits, 433 Twitters and 32 Discords (231 channels, and 2131 messages) for you. Estimated reading time saved (at 200wpm): 218 minutes. You can now tag @smol_ai for AINews discussions!

We are indeed fans of Tesla's Robotaxi/van/humanoid progress, but there's not much actionable for AI Engineers there. Perhaps you can read Dario Amodei's latest take on the AGI future or, closer to earth, the back-to-back Latent Space features on the $2 H100 GPU Bust or the deep dive with Ankur Goyal of Braintrust following his monster Series A.


The Table of Contents and Channel Summaries have been moved to the web version of this email!


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Model Releases and Developments

  • Aria by Rhymes AI: @mervenoyann highlighted Aria, a new 25.3B multimodal model by Rhymes AI that can take image/video inputs. It's released with Apache-2.0 license and fine-tuning scripts. @osanseviero noted it's the first Multimodal MoE (text/code/image/video) with 24.9B total params, 3.5B active per text token, and a 64k token context window. It's pre-trained on 6.4T language tokens and 400B multimodal tokens.
  • OpenAI Updates: @DbrxMosaicAI reported evaluating OpenAI's o1-preview and o1-mini models, along with Google's Gemini 1.5 Pro and Gemini 1.5 Flash. They found that the OpenAI o1 models show consistent improvement over Anthropic and Google models on a long-context RAG benchmark up to 128k tokens.
  • Google Gemini: @DbrxMosaicAI noted that despite lower performance than OpenAI and Anthropic models, Google Gemini 1.5 models have consistent RAG performance at extreme context lengths of up to 2 million tokens.
  • Meta AI: @ylecun announced Meta AI rolling out in 21 countries, including support for Tagalog, Arabic, Indonesian, Thai, and Vietnamese. However, it's still not available in the EU.
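Aria's headline numbers — 24.9B total parameters but only 3.5B active per text token — follow from mixture-of-experts routing: each layer stores many expert MLPs, but every token is dispatched to only a few of them. A back-of-envelope sketch of how total vs. active counts diverge (the configuration below is a toy illustration, not Aria's actual architecture):

```python
# Hypothetical MoE sizing sketch: in an MoE transformer, each FFN layer holds
# n_experts expert MLPs but routes every token through only top_k of them,
# so per-token compute tracks the "active" parameter count, not the total.

def moe_param_counts(d_model, d_ff, n_layers, n_experts, top_k, shared_params):
    expert_params = 2 * d_model * d_ff            # one up-proj + one down-proj per expert
    total = shared_params + n_layers * n_experts * expert_params
    active = shared_params + n_layers * top_k * expert_params
    return total, active

# Toy configuration chosen only to show the total-vs-active gap;
# shared_params stands in for embeddings + attention weights (assumed).
total, active = moe_param_counts(
    d_model=2048, d_ff=8192, n_layers=28, n_experts=64, top_k=6,
    shared_params=1_000_000_000,
)
print(f"total {total/1e9:.1f}B, active {active/1e9:.1f}B per token")
# total 61.1B, active 6.6B per token
```

The same accounting explains why an MoE model can top dense models of similar active size while needing far more memory to hold its full weights.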

AI Research and Benchmarks

  • SWE-bench: @OfirPress celebrated the one-year anniversary of SWE-bench, a benchmark for software engineering tasks. They also introduced SWE-bench Multimodal.
  • LLM Evaluation: @clefourrier shared a comprehensive guidebook for LLM evaluation, covering practical insights and theoretical knowledge gathered while managing the Open LLM Leaderboard.
  • Astute RAG: @omarsar0 discussed Astute RAG, a novel approach to deal with imperfect retrieval augmentation and knowledge conflicts in LLMs. It adaptively elicits essential information from LLMs' internal knowledge and iteratively consolidates internal and external knowledge with source-awareness.

AI Tools and Applications

  • OxyCopilot: @rohanpaul_ai introduced OxyCopilot, an AI-powered assistant from Oxylabs that simplifies web scraping. It uses advanced AI models to identify and generate complex parsing patterns accurately.
  • Taipy: @svpino shared Taipy, an open-source Python library for building end-to-end production applications without JavaScript, CSS, or HTML. It's designed for data scientists and scales well for production use.
  • Latitude: @svpino presented Latitude, an open-source prompt engineering platform that evaluates prompts across different scenarios and refines them to improve results.

AI Industry Insights

  • AI Funding: @finbarrtimbers noted that with LLMs, the old pattern of building massively profitable/successful businesses with very little capital holds less true than before, expecting a radical impact on the industry.
  • OpenAI Strategy: @_philschmid speculated on why OpenAI might not prioritize API revenue and instead focus on consumer products like ChatGPT, citing factors such as competition from open models and the potential for "AGI"/Agents to use multiple models.

Memes and Humor

  • @karpathy joked about YouTube's algorithm not understanding his desire for "highly rated, 1hr long, information dense lecture on anything esoteric."
  • @kipperrii humorously asked what to name a second array variable after naming the first one "array."

This summary captures the key discussions in the AI community, focusing on new model releases, research developments, tools, and industry insights that would be relevant for an AI engineer audience.


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. AI Hardware Advancements: New GPUs and Price Dynamics

  • AMD Launched MI325X - 1kW, 256GB HBM3, claiming 1.3x performance of H200SXM (Score: 97, Comments: 40): AMD has launched the MI325X GPU, featuring 256 GB of HBM3e memory and built on the CDNA 3 architecture, with a product link available. The GPU boasts 1.3 times greater peak theoretical FP16 and FP8 compute performance compared to NVIDIA's H200, along with 1.3 times better inference performance and token generation than the NVIDIA H100, while delivering a memory bandwidth of 6 terabytes per second.
  • $2 H100s: How the GPU Rental Bubble Burst (Score: 251, Comments: 80): The GPU rental market has experienced a significant shift, with H100 GPU prices dropping to $2 per hour, down from previous rates of $5-$10 per hour. This price reduction is attributed to increased supply and competition among cloud providers, potentially disrupting the AI infrastructure market and making high-performance computing more accessible to a wider range of researchers and developers.
    • Users report H100 GPU prices as low as $1.73-$2.40 per hour on platforms like vast.ai, datacrunch.io, and Lambda Cloud. Some express concerns about stability and performance issues with certain providers.
    • NVIDIA's AI Enterprise license expires after 5 years, limiting access to their container platform. This strategy, along with potential buyback programs for used GPUs, aims to maintain high prices and control the secondhand market.
    • The price drop could lead to an explosion of new models and benefit the open-source community. However, A100 80GB GPUs still command high prices ($16K on eBay), while older models like V100 32GB can be found for as low as $550-$1500.
  • Bought a server supporting 8*gpu to run 32b...but it screams like jet, normal? (Score: 271, Comments: 173): The post discusses the challenges of running 8 GPUs in a home server setup, specifically focusing on noise issues. The author purchased a server capable of supporting 8 GPUs to run 32B models, but found that it produces excessive noise comparable to a jet engine. This situation raises questions about the practicality and feasibility of operating high-performance GPU servers in residential settings due to noise constraints.
    • Rack mount servers are designed to run with the case closed for proper cooling. Users advised closing the lid to reduce noise and ensure proper airflow, as open cases trigger full-speed fan operation.
    • The server is likely a Supermicro 4029 model designed for passive GPUs, not desktop GPUs. Users suggested using the IPMI utility to adjust fan speeds and potentially replacing fans with quieter alternatives like Sunon Maglev fans.
    • The setup's practicality was questioned, with suggestions to use 2-4 4090s instead of 8 GPUs for 32B models. Some users recommended passively cooled GPUs and exploring desktop options to mitigate noise issues.
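For readers weighing the $2/hr rental figure against buying hardware outright, a rough break-even calculation helps frame the debate (all prices below are illustrative assumptions, not figures from the article):

```python
# Back-of-envelope buy-vs-rent sketch for a single H100.
# Every number here is an assumption for illustration only.

rental_rate = 2.00        # $/hr, the headline rental price
purchase_price = 25_000   # $, assumed street price for one H100
power_draw_kw = 0.7       # kW under load, assumed
electricity = 0.15        # $/kWh, assumed
hosting_overhead = 0.10   # $/hr for cooling/space/networking, assumed

owning_cost_per_hr = power_draw_kw * electricity + hosting_overhead
breakeven_hours = purchase_price / (rental_rate - owning_cost_per_hr)
print(f"break-even after ~{breakeven_hours:,.0f} GPU-hours "
      f"(~{breakeven_hours / 24 / 365:.1f} years of 24/7 use)")
# break-even after ~13,928 GPU-hours (~1.6 years of 24/7 use)
```

Under these assumptions, buying only pays off after well over a year of continuous utilization, which is why sustained low rental prices pressure owners more than renters.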

Theme 2. Democratizing AI: Open Source Models and Local Inference

  • I made a home server running local AI on R Pi (Score: 55, Comments: 30): Over a 10-year period, the author developed a home server running local AI on a Raspberry Pi, evolving from using Wolfram Alpha and Wit.ai to current LLMs. The latest version (MK II) operates on 8GB of memory, a new Raspberry Pi CPU, and 1 terabyte of storage, designed for areas with limited or no internet access and accessible via hotspot and browser.
    • The author uses a node server for non-LLM tasks and PeerJS for LLM streaming. The default model is llama3.2 Q4_K_M 3B running on ollama, achieving 6-7 tokens per second. A video demonstrates the response speed.
    • The device's design was inspired by Ferrari seat headrests and resembles the ship from the movie "Arrival". The case is made of translucent resin, blurring the Raspberry Pi inside. More information is available on the project website.
    • The project aims to provide AI access in areas without internet, serving as a home server/cloud with file management capabilities. It includes 1TB storage for movies, pictures, and embedded files, accessible via dual WiFi and onboard hotspot for family use.
  • Fast Llama 3+ inference in pure, modern Java (Score: 98, Comments: 37): The project llama3.java offers fast Llama 3+ inference in pure Java with no dependencies, supporting the GGUF format, Llama 3 tokenizer, and Grouped-Query Attention. It includes features such as Q8_0 and Q4_0 quantizations, fast matrix-vector multiplication using Java's Vector API, and supports Llama 3.1 and 3.2 models, along with GraalVM's Native Image and AOT model pre-loading for quick startup times.
    • Users humorously discussed Java's performance, with some expressing surprise at its speed. One commenter noted Java is "just 2-3x slower than C" and "50X faster than Python", which is commonly used in ML research.
    • The discussion touched on garbage collection in Java vs C#. One user mentioned Java's ZGC garbage collector with "0.05ms pause times", while C# was said to have "100ms+ pause times" in some cases.
    • Several comments joked about Java's reputation, with one stating "3 Billion Devices Run Llama", referencing the famous Java slogan. Another asked if the project supports GPU inference or only CPU inference.
  • I've been working on this for 6 months - free, easy to use, local AI for everyone! (Score: 631, Comments: 97): Browser-based AI tool Mela offers free, local AI capabilities for chat and document creation without requiring a backend. Developed over 6 months, the tool utilizes WebGPU for efficient processing and supports various open-source models including Llama 2, Mistral, and Phi-2. Mela features include real-time text generation, document summarization, and a built-in vector database for context-aware responses, all while prioritizing user privacy by keeping data locally on the device.
    • Papeg.ai is a browser-based AI tool created by a digital artist in Europe, offering features like real-time text generation, document summarization, and voice chat. The project is open-source on GitHub and supports custom AI models and Ollama integration.
    • Users expressed interest in the project's funding model and potential for enterprise use cases. Some concerns were raised about automatic file downloads and the need for warnings before initiating downloads.
    • The tool uses IndexDB for document storage and Orama for vector search, with hybrid searches performed on the vector database. Users can connect to external APIs and the developer is considering implementing OpenAI API integration.
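The Q8_0 and Q4_0 quantizations mentioned in the llama3.java item above are GGUF's blockwise schemes: each small block of weights stores one scale factor plus low-bit integers. A minimal pure-Python sketch of the Q8_0 idea (simplified for illustration; real implementations use fixed 32-weight blocks and vectorized code):

```python
# Blockwise int8 quantization in the spirit of GGUF's Q8_0: one float scale
# per block, signed 8-bit values for the weights. Simplified sketch only.

def q8_0_quantize(block):
    """Quantize one block of floats to (scale, list of int8 values)."""
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 127.0                  # largest magnitude maps to +/-127
    q = [max(-127, min(127, round(x / scale))) for x in block]
    return scale, q

def q8_0_dequantize(scale, q):
    return [scale * v for v in q]

weights = [0.4, -1.0, 0.25, 0.1]
scale, q = q8_0_quantize(weights)
print(q)                                  # integer codes, e.g. [51, -127, 32, 13]
print(q8_0_dequantize(scale, q))          # roughly recovers the weights
```

The trade-off between Q8_0 and Q4_0 is the usual one: fewer bits per weight means smaller files and faster memory-bound inference, at the cost of larger reconstruction error per block.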

Theme 3. New AI Model Releases and Benchmarks

  • Announcing Mistral-NeMo-Minitron 8B Instruct by Nvidia (Score: 87, Comments: 17): NVIDIA has announced the Mistral-NeMo-Minitron 8B Instruct model, a new foundation model that reportedly delivers high accuracy. The announcement includes performance comparisons and a link to a detailed blog post on the NVIDIA developer website for more information about the model's capabilities and implementation.
    • Users questioned the comparison to Gemma-7B instead of Gemma2-9B, highlighting the importance of benchmark selection in model evaluation.
    • Performance comparisons were shared, suggesting Gemini Flash 8B achieves an MMLU score of ~75, while being multimodal with potentially a smaller text model component.
    • Qwen 2.5 7B was mentioned as achieving a 75.4 MMLU-Redux score, referencing a carefully annotated version of the MMLU benchmark.
  • LLM Hallucination Leaderboard (Score: 62, Comments: 18): The LLM Hallucination Leaderboard compares the tendency of various large language models to generate false or unsupported information. Models are evaluated on their performance across three key metrics: hallucination rate, factual accuracy, and consistency. The leaderboard currently includes results for popular models like GPT-3.5, GPT-4, and Claude, providing a quantitative assessment of their propensity for confabulation in different contexts.
    • Users questioned the use of temperature 0 for testing, with the author noting that higher temperature settings didn't significantly affect results. The discussion highlighted the importance of sampling methods in LLM evaluation.
    • Initial confusion arose over GPT-4's poor performance, later clarified that GPT-4-mini performed poorly while GPT-4 excelled. This underscores the variability in performance across different versions of the same model family.
    • Llama models showed strong performance due to their cautious responses, resulting in fewer hallucinations but higher non-response rates. This highlights the trade-off between accuracy and completeness in LLM outputs.
  • DARKEST Planet 16.5B - Unusually strong non AI creative model, with "regen" randomness. (Score: 103, Comments: 28): The DARKEST Planet 16.5B model, part of the "Dark Planet" series, is a 71-layer creative AI model developed using the Brainstorm 40X process for various creative applications. It features unique properties including significant variations between "regens" using the same prompt, exceptional detail and prose levels, and unusual stability with repetition penalty 1.02 and up, and temperature 0-5, along with a provided guide for settings and quantization.
    • Users reported issues with the model's NSFW content generation, noting it often refuses to produce such content. The developer suggested trying different quantizations (Q4KS and IQ4XS) and mentioned upcoming "DARKEST PLANET" 16.5B versions that might address this.
    • The model's "non-AI" like qualities were discussed, referring to its ability to produce prose without typical AI patterns or clichés. Users appreciated its humanized text output and unpredictable regenerations for the same prompt.
    • Some users experienced difficulties with the model replying for them in roleplaying situations, despite trying various settings. The developer uploaded the full source repo to Hugging Face in response to user interest.

Theme 4. AI Evaluation and Fine-tuning Techniques

  • Hugging Face LLM Evaluation Guidebook (Score: 38, Comments: 6): Hugging Face's evaluation team has released an LLM Evaluation Guidebook on GitHub, offering comprehensive resources for creating custom evaluations, analyzing current methods, and troubleshooting. The guidebook, developed from insights gained while managing the Open LLM Leaderboard and designing lighteval, aims to provide both practical and theoretical knowledge, with plans to regularly add notebooks demonstrating fast evaluation experiments and best practices.
    • The LLM Evaluation Guidebook received positive feedback, with users appreciating the comprehensive resource. A corrected GitHub link was provided in the comments for easier access.
    • Users expressed gratitude for the guidebook and the evaluation team's contributions to the community. The submitter actively engaged with commenters, acknowledging their feedback.
    • Discussion focused on the challenges of LLM-as-a-judge workflows, highlighting issues with ambiguity in evaluation criteria. The submitter agreed, noting this method is currently unreliable but promising.
  • Monitor your LlamaIndex application for model fine-tuning or evaluation (Score: 80, Comments: 1): The author developed a tool to monitor LlamaIndex applications for model fine-tuning and evaluation by collecting model responses and implementing an annotation UI in Argilla. They shared a GitHub notebook demonstrating this setup, which could be particularly useful for applications with users who can contribute to improving model outputs.
  • Fine-tuning with small batch sizes and gradient accumulation poorly perform if you use Transformers (TRL)! (Score: 42, Comments: 22): Fine-tuning with Hugging Face libraries (TRL and Transformers) shows significant performance issues when using small batch sizes and gradient accumulation. Experiments with Llama 3.2, SmolM-135M, and Qwen2.5 demonstrate that batch_size=1 with gradient_accumulation_steps=32 performs much worse than batch_size=32 with gradient_accumulation_steps=1, despite being mathematically equivalent. This issue persists across different precision formats (bf16 and fp32) and has been reported to the TRL repository.
    • Users express a need for an up-to-date guide on fine-tuning modern models, with current best practices. The HuggingFace alignment handbook and SimPO paper are recommended resources for hyperparameters and alignment techniques.
    • Experiments with Unsloth, built on top of Transformers, show similar behavior to the original findings. The difference in training loss is observed, but validation loss remains similar, suggesting minimal impact on the model itself.
    • Discussion highlights that gradient accumulation and batch size are not strictly equivalent, contrary to common belief. The Oobabooga Training Pro extension suggests that gradient accumulation can degrade training fidelity while being VRAM-friendly.
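One way the reported mismatch can arise with token-averaged losses: averaging each micro-batch's mean loss and then averaging those means is not the same as taking one global token-level mean when sequence lengths differ. A pure-Python illustration (made-up per-token losses, not numbers from the experiments):

```python
# Why batch_size=1 + gradient accumulation can diverge from one big batch
# when the loss is a per-token average over variable-length sequences.

micro_batches = [               # per-token losses; lengths differ on purpose
    [2.0, 2.0, 2.0, 2.0],       # a 4-token sequence
    [8.0],                      # a 1-token sequence
]

# One big batch: average over all tokens together.
all_tokens = [t for mb in micro_batches for t in mb]
full_batch_loss = sum(all_tokens) / len(all_tokens)

# Naive accumulation: mean per micro-batch, then mean of the means.
naive_accum_loss = sum(sum(mb) / len(mb) for mb in micro_batches) / len(micro_batches)

print(full_batch_loss)   # 3.2  (16 tokens-worth of loss / 5 tokens)
print(naive_accum_loss)  # 5.0  ((2.0 + 8.0) / 2) -- short sequences are overweighted
```

The fix is to accumulate summed losses and token counts across micro-batches and divide once at the end, so every token carries equal weight regardless of which micro-batch it landed in.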

Other AI Subreddit Recap

/r/MachineLearning, /r/OpenAI, /r/StableDiffusion, /r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

AI Research and Techniques

  • Google DeepMind advances multimodal learning with joint example selection: In /r/MachineLearning, a Google DeepMind paper demonstrates how data curation via joint example selection can further accelerate multimodal learning.
  • Microsoft's MInference dramatically speeds up long-context task inference: In /r/MachineLearning, Microsoft's MInference technique enables inference of up to millions of tokens for long-context tasks while maintaining accuracy, dramatically speeding up supported models.
  • Scaling synthetic data creation using 1 billion web-curated personas: In /r/MachineLearning, a paper on scaling synthetic data creation leverages the diverse perspectives within a large language model to generate data from 1 billion personas curated from web data.

AI Model Releases and Improvements

  • Salesforce's "tiny giant" xLAM-1b model surpasses GPT 3.5 in function calling: In /r/LocalLLaMA, Salesforce released xLAM-1b, a 1 billion parameter model that achieves 70% accuracy in function calling, surpassing GPT 3.5. It is dubbed a "function calling giant" despite its relatively small size.
  • Phi-3 Mini (June) with function calling: In /r/LocalLLaMA, Rubra AI released an updated Phi-3 Mini model in June with function calling capabilities. It is competitive with Mistral-7b v3 and outperforms the base Phi-3 Mini.
  • Pyramid Flow SD3 open-source video generation tool released: A new open-source video generation tool called Pyramid Flow SD3 was released, based on Stable Diffusion 3. It includes 384p and 768p models, with the 384p version requiring around 26GB of memory.

AI Industry and Business

  • OpenAI projections show massive planned investments: OpenAI projections suggest the company plans to invest heavily, with losses potentially tripling to $14 billion by 2026. This indicates significant confidence in future AI capabilities and market potential.
  • Tesla unveils robotaxi concept: Elon Musk presented Tesla's robotaxi concept, featuring inductive charging, automated cleaning, and claims of enabling parking lots to be converted to parks. However, many commenters expressed skepticism about the timeline and practicality of the concept.

AI Capabilities and Limitations

  • Paper demonstrates probabilistic reasoning in LLMs: A new paper provides evidence that large language models engage in probabilistic reasoning rather than pure memorization, though some limitations are noted.
  • Debate over ChatGPT's Advanced Voice Mode capabilities: Users discussed their experiences with ChatGPT's Advanced Voice Mode, with some finding it impressive while others noted significant limitations and heavy censorship compared to text-based interactions.

Emerging Technologies

  • Brain stimulation for VR motion simulation: A new technology for simulating motion in VR using galvanic vestibular stimulation was demonstrated, potentially reducing motion sickness and enhancing immersion.
  • Ambitious longevity research goals: Clock.bio announced plans to extend human healthspan by 20 years, aiming for a Phase 3 trial based on biomarkers of aging by the end of the decade, though some commenters expressed skepticism about the timeline.

AI Discord Recap

A summary of Summaries of Summaries by O1-mini

Theme 1. Turbocharging Model Training and Fine-Tuning

  • Optimize Llama3.2 with DeepSpeed and FSDP2: Engineers are tackling the high VRAM demands of Llama3.2 by leveraging DeepSpeed and FSDP2, achieving efficient training on limited GPU resources. Techniques like activation checkpointing are proving essential for managing memory effectively.
  • Quantization Hacks Improve torchao Performance: Innovating with int8 tensor replacements and hardware-based optimizations, users are enhancing torchao for faster computations. Despite some performance challenges, blending quantization and dequantization holds promise for scalability.
  • Fine-Tuning Llama 7B on a 16GB GPU? Challenge Accepted!: Developers are pushing the limits by fine-tuning Llama 7B on a single 16GB GPU, employing tools like Runpod and CPU offload optimizers to navigate memory constraints. Success with QLoRA highlights the community's adaptability.
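A coarse VRAM budget shows why QLoRA makes the 16GB scenario plausible: the 4-bit base model is frozen, so optimizer state exists only for the small adapters. All numbers below are rough rules of thumb (assumptions), not measurements:

```python
# Rough VRAM estimate for QLoRA fine-tuning of a 7B model on a 16 GB card.

params = 7e9
base_weights_gb = params * 0.5 / 1e9      # frozen 4-bit base: ~0.5 byte/param

lora_params = 80e6                        # assumed total LoRA adapter size
# Per trainable adapter param: bf16 weight (2) + bf16 grad (2)
# + fp32 master copy (4) + Adam first/second moments (4 + 4) = 16 bytes.
lora_states_gb = lora_params * 16 / 1e9

activations_gb = 4.0                      # assumed; depends on seq len and batch

total_gb = base_weights_gb + lora_states_gb + activations_gb
print(f"~{total_gb:.1f} GB estimated -- within a 16 GB budget")
```

Full fine-tuning, by contrast, needs weight + gradient + optimizer state for all 7B parameters (tens of gigabytes before activations), which is why the CPU offload and checkpointing tricks mentioned above matter so much at this scale.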

Theme 2. Multimodal AI: Bridging Text, Image, and Audio

  • Aria Shines as the Open Multimodal Champion: The Aria model is setting benchmarks with its 3.9B active parameters per token, outperforming Pixtral-12B and Llama3.2-11B in language understanding and multimodal tasks. Its open nature fosters wider adoption and innovation in integrating diverse data types.
  • From Discord Chats to Podcasts: AI's New Playground: Communities are experimenting with generating podcasts from casual Discord conversations, utilizing tools like NotebookLM. While outputs vary in quality, the creative potential is sparking enthusiastic engagement.
  • Nonverbal Sound Analysis Takes Center Stage: Explorations into nonverbal vocalizations and emotions using TTS models are uncovering nuanced AI capabilities. Google's TTS model is at the forefront, showcasing potential for deeper emotional intelligence in AI systems.

Theme 3. Mastering Costs and GPU Infrastructure

  • H100 Rentals Dive to $2/hr: Should You Buy or Rent?: The GPU rental market has shifted dramatically, with H100 prices plummeting from $8/hr to under $2/hr thanks to the emergence of new vendors and Blackwell chips. Smaller AI firms are weighing the benefits of buying vs. renting as infrastructure options expand.
  • Batch-GPT Slashes API Costs by Over 50%: The Batch-GPT tool is revolutionizing cost management by reducing OpenAI API expenses by more than 50% through its innovative Batch API. Open-source enthusiasts are integrating auto-caching features for seamless adoption.
  • Runpod and AWS Lead the Charge in GPU Clusters: Recommendations for H100 clusters at roughly $2.5/hr spotlight services like Runpod and AWS, providing robust options for substantial AI training needs. These platforms are becoming go-to choices for scaling large model deployments efficiently.
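As context for the Batch-GPT item above, OpenAI's Batch API accepts a JSONL file of requests (billed at a discount) instead of live calls. The sketch below only builds such a file with stdlib `json`; uploading and polling are omitted, the prompts and model name are hypothetical, and Batch-GPT's own internals may differ:

```python
# Build a JSONL payload for OpenAI's Batch API: one JSON object per line,
# each with a custom_id used to match responses back to requests.
import json

prompts = ["Summarize channel #gpu-infra", "Summarize channel #rag"]  # hypothetical

lines = []
for i, prompt in enumerate(prompts):
    lines.append(json.dumps({
        "custom_id": f"req-{i}",             # your key for joining results later
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",          # assumed model choice
            "messages": [{"role": "user", "content": prompt}],
        },
    }))

batch_jsonl = "\n".join(lines)
print(f"{len(lines)} batch requests prepared")
```

Because batched requests complete asynchronously (within a service-defined window) at reduced per-token prices, any workload that tolerates latency, such as nightly summarization, is a natural fit.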

Theme 4. Navigating API Performance and Integration Hurdles

  • Perplexity API vs. Perplexity Labs: The Speed Race: Users are flagging the Perplexity API's 2-second response time as lagging behind Perplexity Labs' sub-second speed, debating the implementation of web sockets to bridge the gap. Support channels are bustling as users seek better performance and enhanced features like citation access.
  • Cohere's V2 API Struggles with Speed: Transitioning to Cohere's v2 API has brought challenges with response times creeping to 2-3 seconds, compared to v1's 1-1.5 seconds. Community members are seeking solutions and sharing migration insights to optimize their workflows.
  • Integrating OpenAI Real-Time APIs: Success Stories Needed: Developers are eager to implement OpenAI real-time APIs into projects like O1, yet face hurdles with access and documentation. Email support and community troubleshooting are critical for overcoming these integration challenges.

Theme 5. Streamlining AI Development with Cutting-Edge Tools

  • Gradio 5 Launches with Turbocharged Features: The release of Gradio 5 introduces security upgrades, a gorgeous new UI, and the innovative AI Playground feature, empowering developers to build ML applications more efficiently. These enhancements promise lightning-fast loading and an improved user experience.
  • Symphony Automates Multi-Agent AI Workflows: Symphony transforms user descriptions into functional agentic workflows, simplifying complex AI task automation. Detailed Loom demonstrations showcase how easy it is to integrate tools like perplexity and image-to-text.
  • ComfyUI vs. Automatic1111: Choosing Your AI Tool: Community preferences lean towards ComfyUI for advanced Flux usage, while Automatic1111 remains the choice for beginners. Both platforms, alongside PyTorch and Diffusers, are pivotal in enhancing the Stable Diffusion workflows for diverse user bases.


PART 1: High level Discord summaries

Notebook LM Discord

  • Audio Overviews generate problems: The team is investigating why Audio Overviews are failing to generate, which might hinder the performance of other features.
    • Members voiced concerns that this problem could cascade, affecting the functionality of additional components in the system.
  • NotebookLM enhances Homeschooling fun: Participants are exploring NotebookLM to create engaging lesson plans for homeschool settings, particularly for a 13-year-old student.
    • However, there are warnings about potential hallucinatory outputs from the AI, which may lack substantive content depth.
  • Podcasts born from Discord chatter: The community is buzzing about generating podcasts from Discord conversations, turning casual chats into entertaining audio content.
    • Some users shared humorous takes on utilizing quirky chat logs for this podcasting venture, raising eyebrows about the output quality.
  • Nonverbal Sound Analysis initiates exploration: Experiments are underway to analyze nonverbal vocalizations and emotions through TTS models, showcasing a potential area for AI capability development.
    • This endeavor is part of an ongoing investigation into how nuanced audio elements can be accurately conveyed and interpreted by AI.
  • AI explores personal Dream Journals: A member is experimenting with using AI to extract recurring themes from their personal dream journal, highlighting the diverse applications of AI.
    • This exploration encourages others to reflect on similar uses of AI for analyzing personal experiences and narratives.


Unsloth AI (Daniel Han) Discord

  • Multimodal Models Excitement: The community is eagerly awaiting support for multimodal models like Llama3.2 and Qwen2 VL, with updates expected next week.
    • This advancement is highly anticipated, with members expressing their excitement over new possibilities.
  • Fine-Tuning Strategies Under Scrutiny: Members discussed fine-tuning for models like G2-9B, noting high VRAM requirements and effectiveness with Dora.
    • Challenges with Gemma 9B emerged, including VRAM issues and the presence of NaN values during training.
  • Recommendations on H100 Clusters: Users shared insights on using H100 clusters at around $2.5/hour, highlighting the VRAM required for optimal performance.
    • Options like Runpod are recommended for those seeking substantial AI training resources.
  • Speculation on OpenAI's O1: Opinions are divided about OpenAI's O1, speculated to allow chains of prompts without user visibility.
    • Some members question the closed nature of the source, reflecting skepticism towards the claims made.
  • Exploration of CoT Reasoning in LLMs: Members believe enhancing LLMs through chain-of-thought reasoning holds promise for future models.
    • Proposals include integrating CoT into the attention model's k/v cache for potential experimentation.


HuggingFace Discord

  • Cost Calculation Tool for Distilabel: A member showcased a new package for cost calculation in Distilabel pipelines with functionalities tested on TextGeneration and TextClassification tasks, available here.
    • The package will soon support pricing options in YAML for various LLM APIs, enhancing user experience in managing costs.
  • Gradio 5 Goes Live: The Gradio 5 release announces significant enhancements, including security upgrades and an AI Playground feature, empowering developers to create ML applications more efficiently.
    • Developers can expect lightning-fast loading with implemented SSR and a gorgeous new UI design that enhances app interactions.
  • NVIDIA's Innovations in LLM Training: NVIDIA's recent research highlighted improvements in LLM training utilizing upcycled models, with the Nemotron-4 15B achieving 67.6% on MMLU.
    • Their methods incorporated MoE techniques, suggesting alternatives for optimizing large model training while addressing high-performance demands.
  • Insights into Emotion Detection Models: A user probing emotion detection models noted experiences with FER and DeepFace, prompting discussions of limitations in identifying nuanced emotional states.
    • Members pointed out specific challenges with measuring emotion accuracy, emphasizing the need for better tools across emotional recognition applications.
  • Multi-Channel Considerations in Diffusion Processes: Discussion turned to applying diffusion noise across multiple channels, particularly when processing images with different information layers, including biological data.
    • Participants raised questions about whether a single noise schedule would remain effective across diverse channel data representations.


LM Studio Discord

  • Models Shine with General-Purpose Tasks: Members confirmed that new models excel in performing various tasks akin to ChatGPT, leveraging both pretrained and instruct finetuned weights.

    • This versatility allows users to deploy these models across a range of applications effortlessly.
    • Upgrade from M1 Max to M3 Max Brings Results: An upgrade from an M1 Max with standard RAM to an M3 Max with 128GB RAM proved successful for running LLMs without issues.
    • Many users are transitioning to larger systems to effectively manage high-demand model workloads.
    • Debate on RTX 5000 Pricing Leaves Members Shocked: Rumors suggest the pricing of the new RTX 5000 series could range from $1,500 to $2,500 per card, possibly undercutting Mac Studio configurations.
    • Concerns about the expenses associated with multiple graphics cards are mounting, especially regarding thermal and energy costs.
    • Compatibility Hiccups with MLX Backend: Issues arose with model loading on GPUs using the MLX backend, where larger models default to CPU usage instead.
    • Members recommended checking performance in standalone Apple MLX setups and raising an issue on GitHub for more support.
    • External e-GPU Compatibility Comes into Question: Users explored whether attaching an RTX 4090 as an external e-GPU via Thunderbolt could expand available graphics memory, but doubted the potential performance gains.
    • The Thunderbolt connection may introduce latency, affecting the overall performance when mixing GPU resources.


Latent Space Discord

  • Wondercraft Introduces Director Mode: With the launch of Director Mode, Wondercraft empowers users to control AI voice delivery, marking it as their most significant update this year.

    • This innovation enhances creative versatility for audio projects, allowing fine-tuned performance options previously unavailable.
    • H100 GPU Prices Crash: A guest post titled $2 H100s: How the GPU Rental Bubble Burst reports a price drop of H100 rentals from $8/hr to less than $2/hr, prompting discussions on buying versus renting.
    • As Blackwell chips emerge, the article raises strategic considerations for smaller AI firms exploring infrastructure options.
    • Insights on Live Demos and Technical Hiccups: A self-proclaimed 'king of live demos' shared that novices and experienced presenters bring very different expectations to demos, which often leads to technical difficulties.
    • Community members echoed the sentiment, recounting their own demo mishaps that stalled key project presentations.
    • Challenges with Discord API Setup: Members discussed the pain of permissions when seeking API keys while transitioning between libraries like discord.py and discord.js, stressing the complications involved.
    • One member humorously noted that obtaining the correct setup feels more like an art than a straightforward process, often derailing workflows.
    • Simplifying Feature Building: Amidst feature-building discussions, suggestions for easy project ideas like a calculator app or to-do list emerged to help streamline developer efforts.
    • Emphasizing efficiency, one member stated that the 'fun stuff that works takes 10 seconds', highlighting the balance between complexity and simplicity in projects.


Modular (Mojo 🔥) Discord

  • Float Precision Fumbles with 0.15: A user questioned why a certain value does not equal 0.15, leading to discussions on float precision in programming, noting that literals are materialized to Float64.

    • It was clarified that the discrepancy is analogous to how 1/3 cannot be represented exactly in base 10.
    • Consistent Floating Point Behavior: Despite precision issues, another member reassured that values remain self-consistent in IEEE 754 64-bit floating point.
    • Calculations involving 0.15 stay accurate within the confines of that representation.
    • Defining Trivial Types Challenge: Users tackled issues defining a trait for trivial types containing only inline memory, debating AnyTrivialRegType's restrictiveness.
    • They expressed the need for alternatives, given the limitations on combining due to existing trait constraints.
    • AESNI Instruction Set Implementation Issues: A user described code using llvm_intrinsic to check for AESNI instruction set support on x86, but the compiler failed to recognize it.
    • The role of AVX2 and AVX512 was confirmed, allowing operations across multiple instruction widths.
    • In-Place Struct Creation Discussion: Queries arose about creating structs in-place to prevent unnecessary copies when appending to a list, noting that rvalue struct creation generally avoids copies.
    • The __moveinit__ method was highlighted as a lightweight approach to copying when needed.
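
The 0.15 behavior above is not Mojo-specific; any IEEE 754 double behaves the same way. A quick sketch in Python, whose floats are also 64-bit IEEE 754 doubles:

```python
from decimal import Decimal

# 0.15 has no exact binary representation; the stored value is slightly off.
print(Decimal(0.15))        # ≈ 0.1499999999999999944...
print(0.1 + 0.05 == 0.15)   # False: each literal rounds independently
print(0.15 == 0.15)         # True: the same literal materializes to the same bits
```

This is the self-consistency point from the discussion: every occurrence of the literal maps to the same 64-bit value, so comparisons between them are exact even though the value itself is not 0.15.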


Perplexity AI Discord

  • Perplexity API slower than anticipated: Users noted the Perplexity API's 2-second response times, compared to under 1 second on Perplexity Labs. They speculated that implementing web sockets, as seen on Labs, could improve the API's performance.

    • One user reported emailing support for access to citations and an increased rate limit but received no response; they were advised to contact api@perplexity.ai for faster resolution.
    • Tesla introduces new robovan model: Tesla has launched a new robovan geared towards improving urban transport with high electric efficiency and advanced driver-assistance systems.
    • This innovative model aims to significantly alter urban mobility and reduce carbon footprints, paving the way for cleaner city environments.
    • Hurricane Milton wreaks havoc in Florida: Hurricane Milton has caused major disruptions in Florida, prompting emergency evacuations, as detailed here.
    • Meteorologists continue to monitor its unpredictable path, stressing the importance of preparedness amidst such severe weather conditions.
    • Germany's apostrophe controversy intensifies: A debate surrounding Germany's apostrophe usage is stirring significant discussions on modernizing language standards.
    • Experts in linguistics are voicing opinions on whether current rules should evolve to reflect contemporary usage.
    • Engaging community interactions: Members shared lighthearted memes, including a cat in the snow with the phrase 'when hell freezes over,' reflecting the casual atmosphere of the community.
    • These playful moments were complemented by insightful discussions about features and functionality, keeping the chatter lively.


OpenRouter (Alex Atallah) Discord

  • Tackling API Usage Issues: A member inquired about dealing with billing and usage issues via DMs, prompting Alex Atallah to advise patience for IDs to appear after using the /generation API.

    • This reflects common user experiences concerning API request and response delays.
    • Comparing Model Pricing Strategies: Discussion emerged on the price differences between Mistral Nemo 12B Starcannon and Rocinante 12B, noting Mistral's more attractive pricing.
    • The conversation pointed out that limited competition in the market allows Rocinante 12B to charge higher prices.
    • LLMs Boost Writing Quality: A user shared that focusing LLMs on specific sections of articles has significantly enhanced their writing output.
    • Another user supported this, stating that with LLMs, anyone can improve their writing quality with effort.
    • How to Share Models Effectively: Users learned that the 'share models' button generates a link to share the current chatroom's model settings, but lacks details like parameters and prompts.
    • This feature simplifies sharing settings, but users may need to supplement shared links with detailed explanations.
    • Access Glitches Cause Concern: A user flagged bugs that let them access old account chats via a different account, indicating potential cookie issues.
    • This sparked a broader discussion about how chat data is handled and stored in browser tools, raising privacy considerations.


Eleuther Discord

  • GPT-NeoX enhances library with new features: The HPC team has introduced post-training features for the GPT-NeoX library, enabling native SFT, DPO, and KTO finetuning.

    • Test results reveal a 30% performance improvement over HuggingFace's trl library at the 13B scale, assuring greater scalability for massive computing systems.
    • Debating effectiveness of entropy-based sampling: Discussion on entropy-based sampling in models like Llama3.1 highlighted the need for rigorous validation of improvements over baseline reasoning scores.
    • Members called for credible evidence linking sampled techniques to performance enhancements, suggesting that detailed analysis is necessary.
    • Exploring AI's role in computational psychiatry: A proposal was made to investigate the potential of LLMs for insights into mental disorders, emphasizing the concept of 'computational psychiatry'.
    • Agreement surfaced that while LLMs don't showcase human-like disorders, analyzing their outputs could lead to valuable frameworks despite the alignment challenge.
    • lm-eval-harness raises tokenization warnings: A member reported warnings about tokenizers forking processes when running lm-eval-harness, indicating excessive output due to these warnings.
    • The issue can be resolved by setting the TOKENIZERS_PARALLELISM environment variable to false, preventing repetitive alerts while maintaining setup integrity.
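
The workaround above can be applied in the shell (`export TOKENIZERS_PARALLELISM=false`) or at the top of the harness entry point; a minimal sketch:

```python
import os

# Silence the fork-related tokenizers warning. This must run before the
# tokenizers/transformers library is imported anywhere in the process.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
print(os.environ["TOKENIZERS_PARALLELISM"])  # false
```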


aider (Paul Gauthier) Discord

  • Aider Surges Ahead of the Pack: Aider impresses users, outperforming competitors like Cline and Cursor for bug fixes and coding tasks, as one user claims it is the best after rigorous testing across frameworks.

    • Members unanimously praised its efficiency for both frontend and backend applications, calling it simply the best.
    • DeepSeek Struggles with Efficiency: Users report frustrations with DeepSeek, citing sluggish performance and inefficiencies when consolidating functions, particularly for solo developers.
    • One member reverted to using Sonnet-3.5 due to edit format errors, expressing disappointment with DeepSeek's functionality.
    • Configuration Confusion Unraveled: A user requested help configuring .env files for openrouter models, facing issues with unexpected default changes.
    • Another suggested that the --edit-format whole option could further complicate matters with DeepSeek's performance.
    • Diffsitter Dazzles with Semantic Diffs: Diffsitter serves as a tool for creating semantically meaningful diffs via AST comparison, effectively ignoring formatting variations.
    • Members appreciate how it produces cleaner diffs without the noise of extraneous spacing.
    • Error Handling Hiccups in Aider: Frequent search/replace errors in Aider prompted discussions on utilizing settings effectively to enhance performance.
    • Users referenced troubleshooting guidelines to tackle these issues, emphasizing capable model usage to improve outcomes.


OpenAI Discord

  • Voice Modulation Techniques Spark Interest: Discussants shared methods to encourage AI voice modulation, noting that specific prompts can coax the model into approximating singing without actually performing it.

    • Frustration emerged over the AI's reluctance to engage in expressive performances, straying away from drama or poetry.
    • AI Compared to High-Functioning Psychopaths: A member suggested that high-functioning psychopaths and AI share a common trait of operating on logical calculations devoid of emotional burden.
    • This led to a humorous yet serious debate on whether psychopathic traits might have been consciously modeled in AI systems.
    • OpenAI Copilot Faces Performance Critique: Users are critiquing the latest version of OpenAI Copilot, claiming it underperforms compared to previous iterations and even Google's Gemini.
    • While some defended the model, others pointed out major omissions like lacking typing animations.
    • AI Exceeds Humans in Bedside Manner: Members noted reports suggesting that AI displays a better bedside manner than human doctors, stirring discussion on AI empathy.
    • The darkly comedic twist emerged questioning if psychopathic traits in medical professionals might inadvertently lead to superior decision-making.
    • Intellectual Property Constrains Innovation: Discussion highlighted how intellectual property laws restrict innovation in AI, raising concerns about monetization and litigation risks.
    • The tension between creativity and ownership highlights how legal frameworks may impede revolutionary advancements in AI.


Stability.ai (Stable Diffusion) Discord

  • ComfyUI Takes Center Stage: Members indicate that ComfyUI is favored for Flux usage, while Automatic1111 is suggested for beginners wanting to start with Stable Diffusion. Recommendations also include using PyTorch or Diffusers for command-line interface work.

    • This highlights a broader trend in tool preference among users as they look for better workflows in AI generation.
    • AMD GPUs Face AI Testing Troubles: A member expressed frustration about the lack of CUDA support on their AMD GPU, citing difficulties with Python development. Guides for using ZLUDA versions for those with AMD GPUs featuring 8GB or more of VRAM were shared.
    • This discussion revolves around the growing pains of adapting AMD hardware for AI workloads, which is becoming increasingly critical.
    • 3060 Ti for Stable Diffusion Shines: It was confirmed that the 3060 Ti performs well for Stable Diffusion, with suggestions to upscale images to enhance quality despite its 8GB VRAM limitation. Members shared techniques like quantizations and tiled upscaling for better outputs.
    • This signifies the continued relevance of mid-tier GPUs in efficient AI generation setups.
    • Lora Trigger Management Gets Spotlight: A user inquired about effective strategies to remember trigger words for Loras and whether there's an automated way to manage them. This resulted in a well-rounded conversation about the complexities associated with Lora usage.
    • The need for systematic approaches to handle these trigger words reflects growing user challenges in enhancing AI generation fidelity.
    • Merging Models Discussed for Quality Boost: A rich discussion arose about the merits of merging models compared to consecutive passes, with members exploring specific sigma values in diffusion steps. The consensus revolves around the idea that merging two models averages their capabilities for balanced performance.
    • Such insights highlight the collective quest for improved methodologies in model enhancement.


GPU MODE Discord

  • Preparing for GPU Engineer Internship: A member requested resources and advice for a GPU Engineer internship, noting the importance of a strong CUDA background and anticipated test formats of multiple-choice questions and coding tasks.

    • This call for guidance indicates the demand for mentorship and targeted resources for aspiring engineers entering the GPU field.
    • Seeking cuDNN SDPA Implementation Resources: Query about a tutorial or implementation of an Attention layer using cuDNN's SDPA in Python illustrated community needs for better resources amidst confusion over instantiation processes.
    • A member pointed to a notebook from the cudnn-frontend's repository for further assistance, emphasizing the collaborative nature of troubleshooting.
    • Optimizing Llama 7B Training on Limited GPU: Training Llama 7B, which requires 28GB of memory, on a 16GB GPU was highlighted as challenging, prompting suggestions for utilizing tools like FSDP2 and activation checkpointing.
    • Suggestions for CPU offload optimizers were made, illustrating community adaptation strategies for fine-tuning while managing limited resources.
    • ROCm's New Windows Support: ROCm has introduced native support for Windows starting from version 6.3, significantly expanding access for AMD users to GPU technology, as noted in a recent GitHub issue.
    • The communication of this feature prompted discussions concerning clarity in ROCm's compatibility documentation.
    • Guangxuan Xiao Discusses Streaming LLM: Upcoming PyTorch Expert Exchange features Guangxuan Xiao on StreamingLLM slated for October 11th at 10AM PST.
    • An accompanying YouTube video elaborates on Efficient Streaming Language Models with Attention Sinks, demonstrating practical applications in the field.
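
The 28GB figure for Llama 7B follows from simple accounting: bf16 weights plus bf16 gradients alone already exceed a 16GB card, before any optimizer state or activations. A back-of-the-envelope sketch (parameter count is approximate):

```python
params = 7e9        # approximate parameter count for Llama 7B
bytes_bf16 = 2      # bytes per bf16 value

weights_gb = params * bytes_bf16 / 1e9   # 14 GB of weights
grads_gb = params * bytes_bf16 / 1e9     # 14 GB of gradients
print(weights_gb + grads_gb)             # 28.0 -> already over a 16GB GPU
```

Hence the suggested mitigations: shard state across devices (FSDP), recompute activations instead of storing them (activation checkpointing), and move optimizer state to CPU.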


Nous Research AI Discord

  • Llama 3.2 Fine-tuning issues raise concerns: Users reported freezing during full finetuning of the Llama 3.2 1B model, possibly due to NCCL issues with the dataset being used.

    • Another member noted success with Llama 3 8B QLoRA, suggesting the issue might be due to configuration.
    • New Speculative Decoding algorithm faster than Groq: A member highlighted their new speculative decoding algorithm outpacing Groq, sparking interest for further technical details.
    • Members expressed eagerness to explore this advancement in resource efficiency.
    • Exploring O1's Use Cases: Inquiries on the best use cases for O1 pointed out its effectiveness in coding, yet members noted its primary strength lies in math.
    • Responses confirmed limited utility in coding tasks, raising questions about its versatility.
    • Comparative Performance Analysis of O1 and GPT-4o: Private evaluations revealed GPT-4o outperformed O1 in direct answering tasks, especially in complex math exercises.
    • Despite this, O1 Mini had a slight edge over GPT-4o in coding, while O1 Preview excelled in the PAL approach.
    • OpenAI's Prompt Generation Metaprompt: A member discussed OpenAI's metaprompt for system prompt generation, hinting at upcoming integrations with DSPy.
    • A link to the OpenAI documentation provided insight into evolving methodologies.


Cohere Discord

  • Community Engagement Shines Bright: Members exchanged greetings, creating a friendly atmosphere with enthusiastic hellos that fostered openness for conversations.

    • The chat reflects a welcoming environment, encouraging interaction and connection among participants.
    • Web Search Connector Unpacked: Inquiries about enabling the Internet search tool revealed confusion in the documentation, leading to discussions on its availability in the v1 API.
    • Migration options are detailed in the Cohere migration guide, highlighting the differences for users transitioning to v2.
    • V2 API Slower Than Expected: Users noted the v2 API performs slower, with response times averaging 2-3 seconds compared to 1-1.5 seconds for v1.
    • This delay has been consistently reported, raising concerns about its impact on user experience.
    • Token Usage Discussion Sparks Debate: Questions about the necessity of using specific tokens in API requests led to discussions on their impact on response quality.
    • Clarifications suggest that understanding token requirements is crucial for effective API use, although some users question their necessity.
    • Cohere API Toolcall Issue Resolution: A user reported a Cohere API performance issue regarding toolcall, but found that the related GitHub issue had been closed.
    • They sought insights on unresolved problems while using version 5.11.0, reflecting a need for clearer resolutions from the community.


LlamaIndex Discord

  • AI Builders Night Buzzing at Zoom HQ: Join us on Monday for AI Builders Night at Zoom HQ in San Jose, featuring Biswaroop Palit from LlamaIndex discussing multi-agent systems and insights from QDrant.

    • Network with fellow developers and spark discussions around the latest AI advancements.
    • Lightning Demos Want Your Innovations: Showcase your AI-powered use cases at the meetup's lightning demos using the Zoom Developer Platform.
    • It's a prime opportunity for feedback, so share highlights on social media with #ZoomDevelopers.
  • Symphony Speeds Up Workflow Automation: Symphony automates agentic workflows, generating high-performance setups from your tools and tasks; users can join their Discord for an API key.
    • Check out this Loom video for detailed insights into creating efficient AI workflows.
    • OpenAI Batch API Not Suited for Document Summaries: Members discussed using the OpenAI Batch API within LlamaIndex's Document Summary Index, concluding it doesn't fit operational standards for efficiency.
    • There was playful frustration about the lengthy claims process, highlighting the community's preference for quicker methodologies.
    • Sponsorship Call for AI Mayhem V3 Hackathon: Representatives from Zo World are seeking sponsors for the AI Mayhem V3 hackathon in San Francisco and Bangalore, emphasizing brand visibility opportunities.
    • They encouraged reaching out for collaboration, aiming to engage top developers in this dual-location event.


OpenAccess AI Collective (axolotl) Discord

  • Multi-nodes Deployment Made Simple: For large multi-nodes deployment, utilizing AWS is recommended as it ensures better management and connectivity within the same region.

    • This approach provides a more effective system for scaling and handling resource requirements.
    • Frustration with Llama-3-8B Fine-tuning: A member shared their experience with fine-tuning Llama-3-8B on two 3090 GPUs, reporting no speed advantage compared to a single GPU setup.
    • Despite both GPUs being over 98% utilized, doubts were raised regarding data parallelism effectiveness with DeepSpeed.
    • Custom Llama Tokenizer for Character Level: Customizing the LlamaTokenizer to generate single character tokens involves subclassing and overriding the tokenize method, enhancing string processing capabilities.
    • This method is particularly aimed at optimizing large language models for tasks such as molecule design.
    • Adjusting for Character Level Tokenization: Tokenizing at the character level may necessitate adjustments to the model's maximum sequence length, impacting training and inference performance.
    • These adjustments could significantly influence the overall efficiency of model deployment.
    • Processing SMILES Strings Demonstrated: A member illustrated how the tokenizer processes a SMILES string, showcasing practical application in molecular representation.
    • While changes from the tokenizer modification may be minor, they are still deemed noteworthy in advancing processing techniques.
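
A dependency-free sketch of the override pattern described above; in practice you would subclass transformers.LlamaTokenizer rather than the toy base class here, and the SMILES string is purely illustrative:

```python
class WordTokenizer:
    """Toy stand-in for a pretrained tokenizer's default behavior."""
    def tokenize(self, text: str) -> list[str]:
        return text.split()

class CharTokenizer(WordTokenizer):
    """Override tokenize() to emit single-character tokens,
    e.g. for molecule design over SMILES strings."""
    def tokenize(self, text: str) -> list[str]:
        return list(text)

smiles = "CC(=O)O"  # acetic acid in SMILES notation
print(CharTokenizer().tokenize(smiles))  # ['C', 'C', '(', '=', 'O', ')', 'O']
```

As the summary notes, character-level tokens inflate sequence lengths, so the model's maximum sequence length usually needs adjusting alongside this change.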


DSPy Discord

  • Batch-GPT slashes API costs: A member highlighted the Batch-GPT tool that reduces OpenAI API costs by 50%+ through its Batch API, promoting cost-effective implementation.

    • This open-source project features auto-caching for repeated queries, simplifying integration with a code snippet: client = OpenAI(..., base_url='http://batch-gpt/v1').
    • DSPy onboarding form boosts user experience: An onboarding form for DSPy was introduced to guide new users through its features, improving understanding and utilization.
    • The prospect of automation in this process tied into discussions about enhancing user experience and future AGI capabilities.
    • OpenAI embraces DSPy optimizations: News broke that OpenAI intends to implement DSPy optimizations in its services, indicating a shift towards better performance and efficiency.
    • Community members reacted positively, indicating excitement about potential enhancements in future OpenAI iterations.
    • GraphIC boosts In-Context Learning: The GraphIC method was discussed, which employs graph-based representations and Bayesian Networks to improve In-context Learning (ICL).
    • This technique overcomes biases in traditional ICL methods, focusing on deeper reasoning structures needed for complex tasks.
    • Handling ambiguity in LLM classification: A member training an LLM classifier with DSPy shared the need for the model to indicate classification ambiguities like, Requires more info, ambiguity between class A and B.
    • This initiated a conversation on whether separate classes should be created for all ambiguities, addressing the nuances of classification outcomes.


tinygrad (George Hotz) Discord

  • Int64 Indexing Sparks Precision Debate: Discussion arose regarding applying int64 indexing only on ALUs where the index can exceed the int32 range, as referenced in #6987.

    • Tinygrad raises concerns if two different data types are used together, prompting consideration for operator compatibility.
    • GPU Slowness Fuels Data Type Casting: Concerns about int64 being slow on the GPU surfaced, leading to a discussion on the necessity of casting between different data types.
    • The group agreed to utilize int64 indices only when strictly necessary to boost overall performance.
    • Type Annotations Needed in nn/init.py: Members highlighted the need for type annotations in all classes within nn/init.py for improved clarity.
    • George suggested this could serve as a promising first pull request for contributors aiming to tackle this enhancement.
  • Diffusion Policy Impresses in Robot Learning: The paper on Visuomotor Policy Learning via Action Diffusion shows the Diffusion Policy yielding a 46.9% average improvement in robot behavior generation.
    • It deftly manages multimodal action distributions and high-dimensional action spaces, utilizing stochastic Langevin dynamics for stable training.
    • Streamlined Example File Preferences Discussed: In organizing the examples/ directory, George stated that having one file is preferred, emphasizing high-quality code.
    • This feedback supports the creation of coherent examples that enhance understanding.


Torchtune Discord

  • BitNet Model Implementation Smarts: A member explored how to implement the 1.58B BitNet model via matrix addition instead of multiply-accumulate, eyeing better performance on NVIDIA GPUs.

    • It was noted that utilizing tensor cores would enhance efficiency while leveraging integer operations could further optimize the model.
    • Gemma-2 Hits Fine-Tuning Bottlenecks: There's a rising buzz around Gemma-2 and its multilingual prowess, but fine-tuning still poses challenges with QLora implementations.
    • Concerns arose surrounding optimal parameter choices, with a GitHub issue initiated to rally support for improved fine-tuning.
    • Pixtral 12B Takes Center Stage: The paper on Pixtral 12B highlights its capabilities in multimodal AI and is co-authored by a team including Pravesh Agrawal.
    • It emphasizes the blend of natural images and documents, aiming for leading performance in a competitive landscape.
  • Aria Sets New Multimodal Standards: Aria emerges as an open multimodal-native model showing top-tier performance with 3.9B and 3.5B active parameters per visual and text token, respectively.
    • It outshines Pixtral-12B and Llama3.2-11B, showcasing leaps in language understanding and broader task efficiencies.
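
The addition-only trick works because 1.58-bit ternary weights take values in {-1, 0, +1}, so every dot product collapses to one masked sum minus another. A small numpy sketch of the idea (not the BitNet kernel itself):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8)).astype(np.float64)  # ternary weights
x = rng.standard_normal(8)

# Multiply-free matvec: add inputs where w == +1, subtract where w == -1.
y_add = np.array([x[W[i] == 1].sum() - x[W[i] == -1].sum()
                  for i in range(W.shape[0])])

assert np.allclose(y_add, W @ x)  # matches the ordinary multiply-accumulate
```

On real hardware the win depends on whether integer add units or tensor cores are the bottleneck, which is exactly the NVIDIA-specific question raised in the discussion.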


Interconnects (Nathan Lambert) Discord

  • Technical Insights on Replicating OpenAI's O1: A new report presents the 'journey learning' paradigm for replicating OpenAI's O1 model, showcasing an 8% improvement using just 327 training samples. The report offers in-depth observations and techniques utilized throughout the replication process, focusing on advanced reasoning capabilities.

    • The exploration emphasizes trial-and-error learning strategies and how they enhance the model's performance, as documented in discussions about mathematical reasoning integration.
    • Skeptical Views on Dowehaveopeno1.com Proposal: A suggestion was made to establish dowehaveopeno1.com as a resource for O1 replication updates, though it sparked skepticism regarding its feasibility. Community members conveyed mixed feelings, acknowledging progress but questioning if the timing for the domain creation was right.
    • The conversation revealed concerns about whether the domain would be beneficial at this stage, considering the ongoing development of the O1 replication.


Gorilla LLM (Berkeley Function Calling) Discord

  • Exciting Advancements in Gorilla LLM: Members expressed gratitude for recent enhancements in the Gorilla LLM model and encouraged submissions for a PR related to its handler.

    • The discussions highlighted existing PRs from other providers as useful references to facilitate contributions.
    • Streamlined Contribution Process: A detailed README was shared to guide users on how to effectively contribute to the Gorilla project.
    • The document includes steps for training and evaluating LLMs specifically for function calls.
    • Symphony Makes AI Workflows Easy: The Symphony model simplifies the creation of agentic workflows by transforming user descriptions into functional AI workflows, as showcased in this Loom video.
    • Community members are also invited to join the Discord to request an API key, enhancing collaboration on the project, with access details found here.


LLM Agents (Berkeley MOOC) Discord

  • Web Browser Agents Spark Interest: Users are discussing effective web browser agents, with Web Voyager surfacing as a notable contender to investigate further.

    • Members expressed enthusiasm for sharing hands-on experiences with these agents to drive collective insight.
    • Finding Lab Study Materials: A member sought guidance on optimal study methods for labs, resulting in discussions about utilizing slides and supplemental readings.
    • The conversation underscored the critical role these materials play in effective preparation for lab work.


LangChain AI Discord

  • Need for a Lightweight Vector Database on Raspberry Pi 5: A member highlighted the requirement for a lightweight vector database to facilitate a RAG setup on a Raspberry Pi 5, citing its limited RAM resources.

    • They expressed concerns about Chroma's RAM storage approach negatively impacting performance when integrated with Ollama.
    • Pinecone Recommended for Vector DB Needs: In response, another member suggested Pinecone as a practical vector database alternative for the Raspberry Pi 5 scenario.
    • This recommendation directly aimed to mitigate the limitations posed by using Chroma in this hardware context.
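
For very small corpora, a brute-force in-memory search can also stand in for a vector database entirely on a Pi. A minimal numpy sketch (the embedding model is not shown; vectors are assumed precomputed):

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k most cosine-similar rows of doc_vecs."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return np.argsort(d @ q)[::-1][:k]

docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
print(top_k(np.array([1.0, 0.1]), docs, k=2))  # nearest documents first
```

Exhaustive scan is O(n·dim) per query, which stays well within a Raspberry Pi's RAM and CPU budget for a few thousand chunks; a hosted service like Pinecone only becomes necessary at much larger scale.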


OpenInterpreter Discord

  • Calculating ElevenLabs Audio Costs: A member shared that being on the creator plan for ElevenLabs provides 100k credits per month, which works out to 833 credits, or roughly $0.18, per minute of audio.

    • This insight sheds light on the cost implications when using the app for audio production.
    • Inquiry on OpenAI Real-Time API Integration: Another member asked whether anyone had successfully integrated the OpenAI real-time API into O1.
    • This inquiry emphasizes the community's interest in sharing experiences related to API integrations and challenges faced.
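
The per-minute figure can be sanity-checked with quick arithmetic; the $22/month Creator plan price used here is an assumption, not stated in the thread:

```python
credits_per_month = 100_000
credits_per_minute = 833
plan_price_usd = 22.0  # assumed Creator plan price

minutes_per_month = credits_per_month / credits_per_minute  # ~120 minutes
usd_per_minute = plan_price_usd / minutes_per_month
print(round(minutes_per_month), round(usd_per_minute, 2))   # ~120 min, ~$0.18/min
```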


AI21 Labs (Jamba) Discord

  • Hugging Face AI21-Jamba-1.5-Mini Fails with CUDA: A user faced an error with the Hugging Face model AI21-Jamba-1.5-Mini while using torch.multiprocessing in a Docker container on Ubuntu with CUDA 12.4.

    • The error pointed out that CUDA could not be re-initialized in a forked subprocess, stressing the importance of adopting the 'spawn' start method.
    • Docker Woes on Akash with A100 GPUs: Another user reported issues running a Docker image on Akash while utilizing two A100 GPUs, though specifics on their configuration were scant.
    • They expressed frustration over the ongoing configuration challenges and their impacts on workflow.
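
The fork/CUDA conflict arises because a fork()ed child inherits the parent's CUDA context, which CUDA refuses to re-initialize. A minimal sketch using the standard library (torch.multiprocessing exposes the same set_start_method API, and with torch this must run before any CUDA tensor is created):

```python
import multiprocessing as mp

# 'spawn' starts children as fresh interpreters instead of fork()ing,
# so each worker initializes its own CUDA context cleanly.
mp.set_start_method("spawn", force=True)
print(mp.get_start_method())  # spawn
```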


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Finetuning (Hamel + Dan) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LAION Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: !

If you enjoyed AInews, please share with a friend! Thanks in advance!

Don't miss what's next. Subscribe to AI News (MOVED TO news.smol.ai!):
Powered by Buttondown, the easiest way to start and grow your newsletter.