[AINews] not much happened today
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
AI is all you need to be a chemist.
AI News for 10/8/2024-10/9/2024. We checked 7 subreddits, 433 Twitters and 31 Discords (228 channels, and 1872 messages) for you. Estimated reading time saved (at 200wpm): 222 minutes. You can now tag @smol_ai for AINews discussions!
Just a smattering of smol stories today:
- 2m emails and prompts hacked in AI girlfriend startup
- OpenAI projects $14B in losses in 2 years
- Sequoia fell in love with o1
- SearchGPT continues its rollout even as more people leave
- Demis Hassabis and John Jumper won the Chemistry Nobel for AlphaFold
Table of Contents
- AI Twitter Recap
- AI Reddit Recap
- AI Discord Recap
- PART 1: High level Discord summaries
- LM Studio Discord
- Unsloth AI (Daniel Han) Discord
- HuggingFace Discord
- Cohere Discord
- aider (Paul Gauthier) Discord
- Interconnects (Nathan Lambert) Discord
- Perplexity AI Discord
- Stability.ai (Stable Diffusion) Discord
- Eleuther Discord
- GPU MODE Discord
- OpenAI Discord
- Nous Research AI Discord
- OpenRouter (Alex Atallah) Discord
- LlamaIndex Discord
- Latent Space Discord
- Modular (Mojo 🔥) Discord
- LLM Agents (Berkeley MOOC) Discord
- OpenAccess AI Collective (axolotl) Discord
- Torchtune Discord
- LangChain AI Discord
- OpenInterpreter Discord
- tinygrad (George Hotz) Discord
- LAION Discord
- DSPy Discord
- DiscoResearch Discord
- Gorilla LLM (Berkeley Function Calling) Discord
- PART 2: Detailed by-Channel summaries and links
- LM Studio ▷ #general (204 messages🔥🔥):
- LM Studio ▷ #hardware-discussion (30 messages🔥):
- Unsloth AI (Daniel Han) ▷ #general (200 messages🔥🔥):
- Unsloth AI (Daniel Han) ▷ #help (19 messages🔥):
- HuggingFace ▷ #announcements (1 message):
- HuggingFace ▷ #general (134 messages🔥🔥):
- HuggingFace ▷ #today-im-learning (9 messages🔥):
- HuggingFace ▷ #cool-finds (1 message):
- HuggingFace ▷ #i-made-this (4 messages):
- HuggingFace ▷ #NLP (8 messages🔥):
- HuggingFace ▷ #diffusion-discussions (8 messages🔥):
- Cohere ▷ #discussions (38 messages🔥):
- Cohere ▷ #questions (36 messages🔥):
- Cohere ▷ #api-discussions (46 messages🔥):
- Cohere ▷ #projects (29 messages🔥):
- aider (Paul Gauthier) ▷ #general (70 messages🔥🔥):
- aider (Paul Gauthier) ▷ #questions-and-tips (53 messages🔥):
- aider (Paul Gauthier) ▷ #links (7 messages):
- Interconnects (Nathan Lambert) ▷ #news (44 messages🔥):
- Interconnects (Nathan Lambert) ▷ #ml-questions (27 messages🔥):
- Interconnects (Nathan Lambert) ▷ #ml-drama (16 messages🔥):
- Interconnects (Nathan Lambert) ▷ #memes (16 messages🔥):
- Interconnects (Nathan Lambert) ▷ #reads (3 messages):
- Interconnects (Nathan Lambert) ▷ #posts (15 messages🔥):
- Perplexity AI ▷ #general (100 messages🔥🔥):
- Perplexity AI ▷ #sharing (12 messages🔥):
- Perplexity AI ▷ #pplx-api (4 messages):
- Stability.ai (Stable Diffusion) ▷ #general-chat (108 messages🔥🔥):
- Eleuther ▷ #general (86 messages🔥🔥):
- Eleuther ▷ #research (7 messages):
- Eleuther ▷ #scaling-laws (6 messages):
- Eleuther ▷ #lm-thunderdome (3 messages):
- Eleuther ▷ #multimodal-general (1 message):
- GPU MODE ▷ #general (7 messages):
- GPU MODE ▷ #triton (4 messages):
- GPU MODE ▷ #torch (1 message):
- GPU MODE ▷ #beginner (1 message):
- GPU MODE ▷ #pmpp-book (3 messages):
- GPU MODE ▷ #torchao (39 messages🔥):
- GPU MODE ▷ #off-topic (1 message):
- GPU MODE ▷ #llmdotc (15 messages🔥):
- GPU MODE ▷ #rocm (2 messages):
- GPU MODE ▷ #bitnet (1 message):
- GPU MODE ▷ #webgpu (2 messages):
- GPU MODE ▷ #liger-kernel (2 messages):
- GPU MODE ▷ #metal (2 messages):
- OpenAI ▷ #ai-discussions (75 messages🔥🔥):
- OpenAI ▷ #prompt-engineering (2 messages):
- OpenAI ▷ #api-discussions (2 messages):
- Nous Research AI ▷ #general (67 messages🔥🔥):
- Nous Research AI ▷ #ask-about-llms (3 messages):
- Nous Research AI ▷ #research-papers (4 messages):
- Nous Research AI ▷ #interesting-links (1 message):
- OpenRouter (Alex Atallah) ▷ #general (71 messages🔥🔥):
- LlamaIndex ▷ #blog (4 messages):
- LlamaIndex ▷ #general (45 messages🔥):
- Latent Space ▷ #ai-general-chat (39 messages🔥):
- Latent Space ▷ #ai-announcements (1 message):
- Modular (Mojo 🔥) ▷ #general (2 messages):
- Modular (Mojo 🔥) ▷ #mojo (24 messages🔥):
- Modular (Mojo 🔥) ▷ #max (5 messages):
- LLM Agents (Berkeley MOOC) ▷ #mooc-announcements (1 message):
- LLM Agents (Berkeley MOOC) ▷ #mooc-questions (16 messages🔥):
- LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (10 messages🔥):
- OpenAccess AI Collective (axolotl) ▷ #runpod-help (26 messages🔥):
- Torchtune ▷ #general (9 messages🔥):
- Torchtune ▷ #dev (1 message):
- Torchtune ▷ #papers (3 messages):
- LangChain AI ▷ #general (9 messages🔥):
- OpenInterpreter ▷ #general (8 messages🔥):
- OpenInterpreter ▷ #ai-content (1 message):
- tinygrad (George Hotz) ▷ #general (3 messages):
- tinygrad (George Hotz) ▷ #learn-tinygrad (2 messages):
- LAION ▷ #general (1 message):
- LAION ▷ #research (3 messages):
- DSPy ▷ #show-and-tell (3 messages):
- DiscoResearch ▷ #general (1 message):
- Gorilla LLM (Berkeley Function Calling) ▷ #leaderboard (1 message):
AI Twitter Recap
all recaps done by Claude 3.5 Sonnet, best of 4 runs.
AI Advancements and Industry News
- Nobel Prize in Physics: @ilyasut announced that Geoffrey Hinton won the Nobel Prize in Physics for his contributions to AI. @demishassabis noted that Hinton "laid the foundations for the deep learning revolution that underpins the modern AI field." The award was shared with John Hopfield, recognizing their work on neural networks and their connections to physics concepts.
- Model Developments: @AIatMeta introduced a 13B parameter audio generation model as part of Meta Movie Gen, capable of generating high-quality audio synced to video. @rohanpaul_ai highlighted PMRF, a new photo-realistic image restoration algorithm.
- AI Tools and Platforms: @AnthropicAI launched the Message Batches API, allowing asynchronous processing of up to 10,000 queries at 50% less cost than standard API calls (a usage sketch follows this list). @togethercompute announced Flux Schnell, a new model available for free in their API for the next 3 months.
- AI Research: @rohanpaul_ai discussed PrefixQuant, a new quantization technique that outperforms expensive per-token dynamic quantization. @rohanpaul_ai also highlighted a paper on Prompt Caching for low-latency inference using Prompt Markup Language (PML).
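For reference, a minimal sketch of submitting a batch through Anthropic's Python SDK, assuming the beta batches endpoint and a placeholder model id; treat the exact surface as illustrative rather than authoritative:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Submit many independent requests as one asynchronous batch at reduced cost.
batch = client.beta.messages.batches.create(
    requests=[
        {
            "custom_id": f"query-{i}",  # your handle for matching results later
            "params": {
                "model": "claude-3-5-sonnet-20240620",  # placeholder model id
                "max_tokens": 256,
                "messages": [{"role": "user", "content": question}],
            },
        }
        for i, question in enumerate(["What is RLHF?", "Define KV cache."])
    ]
)
print(batch.id, batch.processing_status)  # poll until processing ends
```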
AI Engineering and Development
- Development Tools: @svpino expressed frustration with switching between different code editors, highlighting the ongoing challenge for developers to find the perfect tool. @awnihannun showcased the MLX back-end in LM Studio, demonstrating its performance on an M1 laptop.
- AI Frameworks: @hwchase17 announced "long-term memory" support in LangGraph, allowing for persistent document storage and content-based filtering across conversational threads.
- AI Evaluation: @ShreyaR shared benchmarks comparing OpenAI's DevDay Eval product and Bespoke Labs' Minicheck for hallucination detection, with Minicheck showing better accuracy in detecting hallucinations.
- AI Infrastructure: @_philschmid introduced Hex-LLM, a new LLM serving framework designed for TPUs, offering low-cost, high-throughput deployment for open models from Hugging Face.
AI Ethics and Societal Impact
- Gender Equity in Science: @mmitchell_ai emphasized the importance of men actively supporting gender equality in scientific fields, noting that women alone can only do so much, especially when they represent less than 10% of a field.
- AI Governance: @bindureddy suggested that mainstream media and Hollywood want to regulate AI prematurely to protect their status as "celebrities," viewing AI as a threat to their existence.
Memes and Humor
- @DrJimFan shared a humorous "Hitchhiker's guide to rebranding" for AI terms, mapping machine learning concepts to physics terminology.
- @AravSrinivas posted an image comparing the difference between Google and Perplexity search results, highlighting the perceived superiority of Perplexity.
- @jxmnop joked about the Nobel Prize in Physics being awarded to "ptrblock" for "fundamental contributions to physics," playing on the unexpected nature of the actual award to AI researchers.
AI Reddit Recap
/r/LocalLlama Recap
Theme 1. Continuous Finetuning: A Novel Approach to Enhancing LLM Performance
- Merging Llama 3.2 vision adapters onto 3.1 finetunes (Score: 40, Comments: 14): The post discusses merging Llama 3.2 vision adapters onto Llama 3.1 finetunes to improve capabilities, providing sample Python code for 8B/70B -> 11B/90B merges. Key considerations include skipping vision_model and cross_attn layers, handling new hidden layers (e.g., 20 new layers for 70B->90B), and addressing 8 new embeddings in the first embed layer; the author successfully merged a Hermes 70B lorablated model to create a 90B vision-capable model that retains ChatML features (an illustrative sketch follows this theme's bullets).
- Im pretty happy with How my method worked out (Continuous Finetuning) Topped Open-LLM-leaderboard with 72b (Score: 150, Comments: 45): The author's Continuous Finetuning method has topped the Open-LLM-leaderboard with a 72b model, demonstrating its effectiveness in preventing loss during AI model finetuning by combining new and previous weights. The method was applied to create Rombos-LLM-V2.5 AI models based on Qwen-2.5, which have achieved top or near-top performance across multiple leaderboard categories, as evidenced by the provided screenshots and a detailed write-up.
- The Continuous Finetuning method involves three steps: instruct fine-tuning a base model, applying the adapter to a general instructed model, and merging the resulting models. This approach can effectively add domain knowledge to AI models.
- Users expressed interest in the datasets used for training and the tools for model merging. The author recommended MergeKit for merging and provided links to MergeKit and Qwen-2.5 for further information.
- A user tested Replete-LLM-V2.5-Qwen-14b using a personal benchmark for literary creativity, finding it performed in the 1st quartile for literary form and 2nd tertile for content, demonstrating consistent performance compared to other models.
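To make the merge described above concrete, here is a hedged sketch of copying language-model weights from a 3.1 finetune into a 3.2 vision checkpoint while skipping the vision-specific tensors. The key substrings and file paths are assumptions for illustration, not the author's exact script; a real 70B -> 90B merge must also remap the inserted hidden layers and pad the embedding table:

```python
from safetensors.torch import load_file, save_file

vision_sd = load_file("llama-3.2-90b-vision/model.safetensors")      # assumed path
finetune_sd = load_file("llama-3.1-70b-finetune/model.safetensors")  # assumed path

merged = dict(vision_sd)
for name, tensor in finetune_sd.items():
    # Skip tensors the 3.1 finetune cannot provide: the vision tower and the
    # cross-attention layers are new in 3.2 (assumed key substrings).
    if "vision_model" in name or "cross_attn" in name:
        continue
    # Only overwrite weights present in both checkpoints with matching shapes;
    # the new hidden layers and 8 new embeddings need dedicated handling (omitted).
    if name in merged and merged[name].shape == tensor.shape:
        merged[name] = tensor

save_file(merged, "llama-3.2-90b-vision-merged/model.safetensors")
```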
Theme 2. vLLM Outperforms llama.cpp in Distributed Inference Benchmarks
- LM Studio ships an MLX backend! Run any LLM from the Hugging Face hub on Mac blazingly fast! ⚡ (Score: 179, Comments: 59): LM Studio has released an MLX backend, enabling fast LLM inference on Mac devices. This update allows users to run any Large Language Model from the Hugging Face hub on Mac computers with significantly improved speed, leveraging Apple's MLX machine-learning framework.
- More than 70% faster distributed inference performance in the same machine: vLLM vs. llama.cpp, is it expected or can be improved? (Score: 44, Comments: 23): vLLM demonstrates a 70% faster distributed inference performance compared to llama.cpp on the same machine. This significant speed difference raises questions about whether it's an expected outcome or if there's potential for improvement in llama.cpp's performance. The comparison highlights the importance of efficient inference implementations for large language models.
- vLLM's performance advantage over llama.cpp is expected, with 70-80% faster distributed inference. Tests on a 4 x 4090 GPU workstation showed vLLM outperforming llama.cpp significantly in multi-GPU scenarios, while single-card performance was similar.
- The performance gap is attributed to vLLM's use of hand-written CUDA kernels and OpenMP, compared to llama.cpp's reliance on standard C++ and BLAS libraries. Developers are considering adding custom kernels to llama.cpp, balancing performance gains with maintainability.
- GPUStack, a framework supporting both vLLM and llama.cpp, was used for testing. Attempts to improve llama.cpp's performance with the `--split-mode row` flag resulted in worse performance (26 tokens/sec) and uneven GPU utilization.
Theme 3. Microsoft's Differential Transformer: A Breakthrough in LLM Attention
- [New Quantization Algorithm] PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs (Score: 96, Comments: 10): PrefixQuant, a new static quantization method for LLMs, enables W4A4KV4 (4-bit weights, activations, and KV cache) inference while outperforming dynamic quantization techniques. This approach eliminates outliers and allows for efficient per-tensor static quantization of activations and KV cache, avoiding the costly per-token dynamic quantization used in previous methods to handle magnitude fluctuations across tokens (the static-vs-dynamic contrast is sketched after this post's bullets).
- Users expressed interest and excitement about testing PrefixQuant, with some skepticism about its performance claims. The community is eager to see the release of inferencing kernels for practical implementation.
- Discussion arose about perplexity scores, comparing PrefixQuant to llama.cpp's q4_K_M quantization. Users debated the comparability of results, noting differences in quantization methods and benchmarking conditions.
- Detailed analysis of llama.cpp's codebase revealed that q4_K_M quantization uses a mix of Q4 and Q6 precision, with higher precision for certain layers. This highlights the complexity of comparing different quantization methods based solely on file sizes.
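For readers new to the distinction, a minimal sketch (not PrefixQuant itself) contrasting per-tensor static quantization, where one scale is calibrated offline, with per-token dynamic quantization, where a fresh scale is computed at runtime:

```python
import torch

def quant_int4(x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Signed 4-bit integers span [-8, 7].
    return torch.clamp(torch.round(x / scale), -8, 7)

x = torch.randn(4, 16)  # activations: (tokens, hidden_dim)

# Static per-tensor: one scale, fixed offline from calibration data (assumed value).
q_static = quant_int4(x, torch.tensor(0.05))

# Dynamic per-token: a per-token scale computed at inference time.
dyn_scale = x.abs().amax(dim=-1, keepdim=True) / 7
q_dynamic = quant_int4(x, dyn_scale)
```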
- [Microsoft Research] Differential Transformer (Score: 271, Comments: 65): Microsoft Research introduced the Differential Transformer, a novel architecture that improves Large Language Model (LLM) performance by computing attention as the difference between two softmax attention maps, canceling common-mode attention noise. The paper reports strong results on language-modeling benchmarks, with particular gains in long-context utilization, hallucination mitigation, and in-context learning.
- The Differential Transformer uses a novel attention mechanism that calculates attention scores as the difference between two separate softmax attention maps, effectively canceling noise and promoting sparse attention patterns (see the sketch after this list). This approach shows promising results in long-context modeling, hallucination mitigation, and in-context learning.
- Users expressed excitement about the potential of this architecture, particularly for small models and instruction following. Some speculated on the impact of training large models from scratch with this architecture and then distilling them into smaller models for improved accuracy and cost-effectiveness.
- The implementation is available on GitHub, including versions compatible with FlashAttention. However, new models need to be trained to benefit from this architecture, as it cannot be applied to existing weights.
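A minimal sketch of the mechanism as described, assuming pre-projected query/key pairs and a fixed scalar λ; the paper's learned re-parameterization of λ and per-head normalization are omitted:

```python
import torch
import torch.nn.functional as F

def diff_attention(q1, k1, q2, k2, v, lam: float = 0.5):
    """Differential attention: the difference of two softmax attention maps."""
    d = q1.shape[-1]
    a1 = F.softmax(q1 @ k1.transpose(-2, -1) / d**0.5, dim=-1)
    a2 = F.softmax(q2 @ k2.transpose(-2, -1) / d**0.5, dim=-1)
    # Subtracting the second map cancels attention noise common to both.
    return (a1 - lam * a2) @ v

q1, k1, q2, k2 = (torch.randn(2, 8, 64) for _ in range(4))
v = torch.randn(2, 8, 64)
out = diff_attention(q1, k1, q2, k2, v)  # -> (2, 8, 64)
```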
Theme 4. Inflection AI Expands with New Models and Enterprise Offerings
- Inflection announces partnership with Intel, two new models, and enterprise plans with fine-tuning and on prem hosting (!?) (Score: 38, Comments: 11): Inflection has unveiled two new models alongside a partnership with Intel and new enterprise offerings. The company is now providing on-premises hosting options and fine-tuning capabilities for businesses, marking a significant expansion of their services. These developments position Inflection to compete more directly with established players in the AI industry, offering enhanced flexibility and customization for enterprise clients.
Other AI Subreddit Recap
r/MachineLearning, r/OpenAI, r/StableDiffusion, r/ArtificialInteligence, r/LLMDevs, r/Singularity
AI Research and Breakthroughs
- Microsoft Research's Differential Transformer introduces a novel attention mechanism that outperforms standard Transformers in language modeling tasks, showing improvements in long-context understanding, hallucination reduction, and in-context learning, while achieving these gains with fewer parameters and training tokens and particularly excelling under 4-bit quantization.
- Geoffrey Hinton and John Hopfield awarded Nobel Prize in Physics for their foundational work in machine learning and artificial neural networks, sparking discussions about the intersection of physics and AI.
AI Model Releases and Improvements
- Hailuo AI launches Image-to-Video feature, offering free unlimited use with estimated generation times.
- Runway enhances Gen-3 Alpha Turbo to allow first and last frame inputs for both horizontal and vertical aspect ratios.
Industry Developments
- OpenAI receives first DGX B200 systems, signaling expansion of their computational capabilities.
- Analyst predicts Microsoft will acquire OpenAI within three years, though some argue the acquisition has already effectively occurred.
- Google faces potential breakup following monopoly ruling, with implications for the AI industry.
Expert Opinions and Predictions
- Geoffrey Hinton states AI development is not slowing down, predicting as much change in AI in the next 10 years as in the past decade.
- Google hiring scientists interested in AI consciousness and sentience, indicating research focus in these areas.
AI-Generated Content and Tools
- Animorphs LoRA model created for generating image transformations inspired by the book series.
- AI-generated images of "Florida Man vs Hurricane Milton" showcase creative applications of image generation models.
AI Discord Recap
A summary of Summaries of Summaries by O1-mini
Theme 1. Advanced AI Model Performance and Optimization
- SOAP Optimizer Outperforms AdamW: Users tested the SOAP optimizer on Alpaca, achieving better performance than AdamW until adjusting AdamW's learning rate. However, SOAP lacks support for distributed training and bf16 formats.
- L-Mul Algorithm Slashes Energy Costs: The L-Mul algorithm approximates floating point multiplication with integer addition, reducing energy costs by 95% while maintaining higher precision compared to 8-bit floating point operations.
- Diff Transformer Enhances Attention Mechanisms: The Differential Transformer introduces a differential attention mechanism, improving long-context modeling and reducing hallucinations in tasks like question answering, outperforming traditional Transformers.
Theme 2. Infrastructure and Hardware Support for AI
- Dual GPU Setup Limited by Performance: Using an RTX 3060 and RX 6600 provides 20GB VRAM but doesn't boost speed. A second RTX 3060 may help load larger models without enhancing performance.
- Apple MLX Integration in LM Studio 0.3.4: LM Studio 0.3.4 now supports Apple MLX, enabling efficient model execution on Apple Silicon Macs and allowing users to run larger models with enhanced compatibility.
- External GPU Testing on Raspberry Pi 5: A user set up a GPU test rig on Raspberry Pi 5 with an AMD RX 460 and an amdgpu Linux kernel patch, aiming for 4K gaming and full external GPU support.
Theme 3. Challenges in Training and Fine-tuning AI Models
- Training Vicuna-7B Faces CUDA Errors: Users encountered CUDA out of memory errors when training Vicuna-7B on Runpod, despite having 5 GPUs with 24GB RAM each. Adjusting DeepSpeed configurations resolved the issue.
- Aider's Architect Mode Requires Refinement: Users reported that Architect Mode in Aider often fails to complete tasks, necessitating prompt adjustments for better planning and observation before coding.
- DeepSpeed and Accelerate Configuration Issues: Members discussed resolving DeepSpeed configuration errors by ensuring the device count is the required multiple and using correct API parameters, streamlining the training process.
Theme 4. Data Management, Security, and Scalability
- Data Breach at Muah.ai Exposes 1.9M Emails: The AI girlfriend service Muah.ai suffered a data breach, exposing 1.9 million email addresses and sensitive prompts, including information related to child exploitation.
- Model Merging at Scale Enhances Generalization: Research on model merging up to 64B parameters shows improved generalization and efficiency. Larger models enhance the benefits of merging, especially when combining multiple expert models.
- AI Data Wall Concerns: As language models approach data limits, concerns about a data wall hindering AI progress emerge. Contrasting views suggest human reasoning can compensate for limited data exposure.
Theme 5. AI Tools, Integrations, and Community Research
- Tool Integration with LangChain and Aider: Users explored integrating Livekit with LangChain for real-time capabilities and Aider for external LLM integrations, enhancing functionalities like RAG bots.
- Llama Stack Unveils New Development Tools: Llama Stack tools released by Meta provide powerful resources for developers to optimize AI model capabilities, with GitHub repositories offering detailed examples and utilities.
- Community Research and Nobel Prize Updates: The 2024 Nobel Prize in Chemistry was awarded to David Baker, Demis Hassabis, and John M. Jumper for contributions to computational protein design and AlphaFold2. Community discussions also reflected on AI research contributions and critiques, such as Schmidhuber's insights on attribution.
A summary of Summaries of Summaries by O1-preview
Theme 1. AI Model Advancements and Releases
- NVIDIA's Nemotron 51B Doubles Throughput on a Single H100 GPU: NVIDIA launched the Nemotron 51B, a NAS-optimized model achieving 2x throughput while maintaining accuracy. It's accessible via NVIDIA's API or available for download on Hugging Face.
- Meta's CoTracker 2.1 Tracks 70k Points on a Single GPU: Meta introduced CoTracker 2.1, a video motion prediction model capable of tracking 70,000 points on one GPU. The accompanying paper is available here.
- LLM360 Drops a Massive 15 Trillion Token Dataset: LLM360 unveiled a new pre-training dataset with 15 trillion tokens, emphasizing rigorous data quality and deduplication. This dataset aims to enhance training for large language models.
Theme 2. AI Tools and Integration Challenges
- Cline AI Assistant 2.0 Streams Responses into Your Editor: The new Cline AI Assistant 2.0 introduces features like streamed responses directly into editors and a cancel button for task management. Users note a 40% reduction in requests due to an XML-based tool-calling prompt.
- Aider Struggles with File Management and External LLMs: Users reported that Aider doesn't auto-populate new files in the list without manual commits. Attempts to integrate external models like SambaNova require manual API configurations, highlighting integration challenges.
- OpenAI Realtime Console Makes Voice API Accessible: A demo repository helps users test OpenAI's new Realtime Voice API with a simple
npm start
, although one user incurred $3.87 in charges for 15 minutes of use.
Theme 3. AI in Research and Recognition
- Nobel Prize in Chemistry Honors Computational Innovators: The 2024 Nobel Prize in Chemistry was awarded to David Baker, Demis Hassabis, and John M. Jumper for breakthroughs in computational protein design and protein structure prediction via AlphaFold2.
- Debate Over AI Attribution in Nobel Prizes: Controversy arose as figures like Schmidhuber criticized the Nobel Committee for overlooking significant contributors in AI, sparking discussions about proper attribution in scientific achievements.
- Scaling Laws Debate: Square Root vs. Fourth Root: Members debated scaling laws in AI, contrasting new proposals for square root scaling against Kaplan's established 0.28 constant suggesting fourth-root scaling.
Theme 4. AI for Creative and Emotional Engagement
- Emotional State Machines Make AI More Sentient: Developers are building AI with persistent emotional states, allowing bots to reflect user sentiments over time. This contrasts with typical bots that reset emotions after each interaction.
- AI's Role in Mental Health Support Under Scrutiny: Discussions highlighted the potential and challenges of using AI chatbots for mental health, with concerns about censorship policies limiting the AI's ability to handle emotional nuances effectively.
- Innovative Techniques Enhance AI Roleplay Experiences: Users shared methods for erotic roleplay (ERP) with AI, focusing on detailed character creation and immersive storytelling, though these practices raise ethical considerations.
Theme 5. Technical Challenges and Solutions in AI Development
- LM Studio Users Grapple with Model Loading Issues: Upgrading to LM Studio 0.3.4 led to problems loading models like Llama 3.2. Switching to the Vulkan backend was suggested as a workaround.
- HBM's Performance Doesn't Meet Expectations: Discussions revealed that HBM memory isn't significantly reducing power consumption or costs. The bottleneck in supplying more H100 GPUs is linked to packaging requirements.
- Torchao Encounters Quantization Hiccups: Integrating torchao with frameworks like ComfyUI led to operator errors, particularly on Windows. These issues highlight the complexities of quantization and compatibility in AI workflows.
PART 1: High level Discord summaries
LM Studio Discord
- Llama 3.2 Struggles with LM Studio: Users encountered issues loading Llama 3.2 and Dolphin 2.2.1 models in LM Studio 0.3.4, with some models that worked in earlier versions now failing to load.
- A solution suggested was switching to the Vulkan backend to potentially enhance model loading compatibility.
- MLX's Infinite Loop Crisis: Concerns arose about MLX causing infinite output loops, notably with Llama 3.1 8B Instruct 4bit, reflecting issues in model response interpretations.
- Discussions pointed toward prompt handling as the core issue, causing unwanted repetitive outputs.
- Dual GPUs, but No Speed Boost: Conversations revealed that using an RTX 3060 alongside an RX 6600 totals 20GB VRAM but lacks speed improvements.
- Users indicated that a second RTX 3060 could help load larger models but confirmed that performance would remain limited.
- LM Studio's Compatibility Updates: The launch of LM Studio 0.3.4 initiated questions about model compatibility, especially with preset migrations after the update.
- It was noted that users would likely have to manually check and adjust settings post-update.
- NVIDIA RTX 4000 Deviates from NVLink: Discussion highlighted that the NVIDIA RTX 4000 series shifted away from NVLink, opting for PCIe Gen 5 for multi-GPU connections.
- This raised questions about the speed of unconnected GPUs, with users noting surprising performance capabilities.
Unsloth AI (Daniel Han) Discord
- Model Merging at Scale Insights: New research on model merging at scale examined performance when blending models of up to 64B parameters; the findings are detailed in a paper available on arXiv.
- Members expressed excitement over systematic evaluations that could enhance model generalization and efficiency.
- Smooth Sailing with Qwen 2.5 Fine-tuning: Fine-tuning on Qwen 2.5 has become seamless after previous prompt issues were resolved. Users can find a collection of available models on Hugging Face.
- This progress reassures engineers interested in utilizing the models for their projects.
- Clarification on Dataset Formats for Unsloth: Discussions pointed out the efficiency of using Parquet over CSV files for datasets in Unsloth. Users should align dataset structures with expected column formats, such as 'train' and 'conversations'.
- Ensuring correct formats helps streamline training processes within the platform.
- Logits Exploration with Ollama Llama: Members faced challenges obtaining logit scores from Llama via Ollama in Python and debated switching to llama.cpp for better results. The search for clear resources left some users puzzled.
- This discussion emphasizes the need for better access to functional resources and methodologies for logging outputs.
- Challenges with AMD GPUs in Unsloth: Concerns were raised about limitations in creating small LoRA models on Intel GPUs, with confirmation that Unsloth does not support AMD GPUs. This raises compatibility questions for those reliant on such hardware.
- Clarifications indicated that multi-GPU setups are also unsupported, impacting training flexibility.
HuggingFace Discord
- Nvidia launches high-efficiency models: Nvidia introduced the Nemotron 51B, a NAS-optimized model achieving 2x throughput on a single H100 GPU while preserving accuracy. Users can test the model via NVIDIA's API or download it from Hugging Face.
- This model release included several variants like NVLM 1.0 aimed to bolster AI capabilities.
- Meta releases improved VLMs: Meta launched updated vision models, including CoTracker 2.1, capable of tracking 70k points on a single GPU for video motion prediction, with an accompanying paper available here.
- The updated SAM 2.1 model for image/video segmentation offers enhanced functionality for developers.
- Insights into Mira's Decentralization: A member introduced Mira, a decentralized infrastructure making AI accessible, emphasizing its community-driven projects without crypto involvement. Despite technical potential, some users raised moral concerns regarding blockchain associations.
- The discourse illustrated a growing tension over integrating such technologies in AI development.
- Evaluating Diffusion Model Training Techniques: Members clarified that the diffusers library facilitates various diffusion models, noting Stable Diffusion XL and Flux as capable integrations.
- Discussions also covered training Flux LoRAs using GGUF formats, despite current limitations on model support.
- Fine-tuning Whisper Model for ATC: A blog details the fine-tuning of a Whisper model on air traffic control communications, achieving an 84% performance improvement by reducing the word error rate (WER) from 94.59% to just 15.08% (WER computation is sketched after this list).
- The link to the GitHub repository and a blog post provide further exploration of this tailored ASR solution.
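For reference, WER is easy to compute with the jiwer package; a tiny example with hypothetical strings:

```python
import jiwer  # pip install jiwer

reference = "cleared for takeoff runway two seven"
hypothesis = "cleared for takeoff runway to seven"
# WER = (substitutions + insertions + deletions) / reference word count
print(jiwer.wer(reference, hypothesis))  # 1 substitution / 6 words ≈ 0.167
```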
Cohere Discord
- CMD-R Temperature Tweaks: Members highlighted optimal temperature settings for CMD-R, recommending 0.3 for deterministic outcomes and 0.8 for creative tasks, with concerns about generation costs.
- Suggestions included generating with 0.8 then formatting with 0.1 to balance creativity and cost.
- API Connection Hiccups: Intermittent issues with the Cohere API were reported; after a brief debugging frenzy, one member resolved them by accessing `response.message.content[0].text` (see the sketch after this list).
- Members speculated recent changes in the API might be a factor, sharing troubleshooting experiences and code adjustments.
- Innovative Emotional State Machine: A new emotional state machine intends to track user emotions with persistent memory, keeping assistant bots in tune with user sentiment.
- Unlike typical bots that reset after each interaction, these assistants remain in an emotional state reflective of accumulated user sentiment.
- Advanced RAG in Banking: A user detailed their experiments with a RAG solution yielding 75% recall@5, outperforming OpenAI for banking applications by embedding 2000 chunks.
- They aim to utilize this as a proof of concept for the bank, showcasing the feasibility of their solution.
- AI's Role in Mental Health Support: Discussion turned to the use of AI chatbots in mental health contexts, highlighting their value when human therapists are absent yet noting challenges with emotional context.
- Concerns emerged around censorship policies that limit these bots' ability to interpret complex emotional nuances, impacting their effectiveness.
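For context, a hedged sketch of the access pattern mentioned above, assuming Cohere's v2 Python client; the exact client surface may differ:

```python
import cohere

co = cohere.ClientV2()  # reads CO_API_KEY from the environment
response = co.chat(
    model="command-r-plus",  # placeholder model id
    messages=[{"role": "user", "content": "Summarize prompt caching in one line."}],
)
# The generated text lives under message.content, as members discovered:
print(response.message.content[0].text)
```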
aider (Paul Gauthier) Discord
- Aider struggles with File Management: Users faced issues with Aider not auto-populating new files in the file list, requiring the use of `/commit` or specifying file paths directly to see changes.
- Another user pointed out that files must be committed to the git repository to be available in autocomplete, underlining the importance of version control.
- Integrating External LLMs is a Challenge: Community members discussed the difficulty of integrating SambaNova models with Aider, suggesting manual API configuration for OpenAI-compatible endpoints.
- Further inquiries revealed methods for adding model pricing and token costs through metadata JSON files, yet some configurations still posed issues.
- Architect Mode needs Refinement: Concerns emerged regarding Aider's Architect mode which often fails to complete tasks fully, necessitating user intervention to continue.
- Users suggested modifying prompts for better planning and observation before coding to enhance the effectiveness of this mode.
- OpenAI Realtime Console makes voice API accessible: A demo repository for the OpenAI Realtime Console was successfully set up, simplifying access to the new voice API announced at DevDay.
- While interacting via voice incurs costs, one user noted charges of $3.87 for 15 minutes of use, which raised concerns about testing expenses.
- Cline AI Assistant 2.0 breaks new ground: The newly released Cline AI Assistant 2.0 boasts features like streamed responses directly into the editor and a cancel button for task management, enhancing usability.
- Users highlighted the XML-based tool calling prompt, which reportedly reduces requests by 40%, making resource use more efficient.
Interconnects (Nathan Lambert) Discord
- Nobel Prize in Chemistry Celebrates Computational Advances: The 2024 Nobel Prize in Chemistry has been awarded to David Baker for computational protein design and jointly to Demis Hassabis and John M. Jumper for protein structure prediction, as announced in the Nobel Prize tweet.
- Members celebrated this milestone but expressed skepticism about its implications for future innovations in AI.
- PRMs Under Scrutiny Amid Development Changes: A lack of research on PRMs was humorously noted, with members pointing out that 'almost none on PRMs, almost a billion as LLM as a judge'.
- Concerns emerged regarding the patenting process in ML, with suggestions that companies often file defensively, leading to vague claims and unresolved disputes.
- Schmidhuber Takes Aim at AI Attribution Issues: Criticism arose concerning the Nobel Prize in Physics 2024, where Schmidhuber highlighted plagiarism and misattribution in works by Hinton and collaborators, claiming significant contributions were overlooked.
- The mix of sentiments reflected a community reaction to the historical significance of AI contributions, as highlighted by user comments about Schmidhuber's critique.
- ButtBench Alignment Project Gets a Logo: The ButtBench Alignment Project designed a new logo, marking a visual identity for a project that has reached SOTA, though still far from human performance as noted by Luca Soldaini.
- This move signals a push for recognition and clarity in the goals of the project, resonating well with the community.
- Data Wall Looms in AI Development: A data wall threatens progress in language models as current offerings nearing data limits were discussed, raising questions about reliance on larger data volumes.
- Contrasting opinions suggest human performance is not solely dependent on extensive data exposure, hinting at a philosophical divide on AI efficiency.
Perplexity AI Discord
- Profit Model Queries at Perplexity AI: Concerns regarding how Perplexity AI generates profit arose, particularly with student discounts in play, making the business model appear precarious.
- sneakyf1shy humorously suggested that venture capital might be the backbone of their operations, hinting at potential long-term uncertainties.
- Complexity Extension Packs a Punch: The newly launched Complexity extension is enhancing the Perplexity experience with options for customizable themes and markdown exports, leading some to say it's ‘like Perplexity on steroids.’
- Feline and asura0_00 praised the extension for significantly boosting user interactivity.
- Perplexity AI Shortens Responses: Users noticed a trend toward more condensed responses from Perplexity AI, raising concerns that answers may lack information depth.
- Speculation suggests these changes could be tied to adjustments in token limits, affecting the quality of responses.
- Meta's Movie Maker Rocks: Meta has launched a movie generation tool, enabling users to create short films using AI, which aims to enhance storytelling.
- This development showcases the potential of AI in creative domains.
- Frustrations with Citation API Access: Members raised concerns regarding unanswered requests for whitelisting on the citation API, highlighting multiple attempts via various channels with no feedback.
- A growing sense of frustration is evident among users awaiting updates.
Stability.ai (Stable Diffusion) Discord
- ControlNet Models Simplified: A member shared a GitHub link regarding ControlNet models, suggesting users focus on practical examples while skimming the mathematical explanations.
- Scroll a bit down, ignore the math and look at the examples.
- Flux Inpainting's Fast Track: In discussions about Flux and Schnell inpainting models, one member noted that using recommended settings should reduce processing time to 1-2 minutes, versus the 25 minutes another user had experienced.
- The community highlighted key differences in iterations that affect Flux dev and Schnell performance.
- Craving Kaggle Notebooks for Image Generation: A request surfaced for a Kaggle notebook for Automatic1111, shedding light on the community's demand for structured guides.
- Members reflected on the difficulties of locating specific notebooks for seamless image generation processes.
- Distilled CFG Confuses the Masses: Discussions on the nature of distilled CFG clarified that it serves as guidance distinct from the standard CFG, arising from specific model training.
- Community members expressed that while Flux dev enhances CFG usage, it currently does not support negative prompts.
- Deforum After Colab Restrictions: A Plan: Inquiries about utilizing Deforum post-Colab restrictions prompted discussions on alternatives for accessing computing power, particularly renting GPUs.
- Suggestions included using RunPod for GPU rental as a feasible solution.
Eleuther Discord
- Nobel Prizes Ignite AI and Chemistry Debate: Recent discussions highlighted the Nobel Prize awards' relevance for AI figures such as Hinton and Hopfield, questioning their impact on traditional physics and chemistry fields.
- Opinions were split; while some feared a dilution of the award's prestige, others argued that innovation and enthusiasm should drive selection.
- PhD Candidates Push Back on Publication Metrics: Frustration emerged over the pressure from publication metrics in PhD programs, which some believed created a daunting competitive environment.
- Members proposed that effective networking might be a better strategy for securing mentorship and collaborations, rather than just chasing publication counts.
- Web3 to Web5 Transition Confuses: Debate arose on moving from Web3 to Web5, likening the naming strategy to the Fibonacci sequence, leading to speculation about future iterations like Web8.
- Conversations turned humorous with members joking about the absurdity of the progression.
- Scaling Laws Debate Engulfs Members: One member shared an overview stating that cross-entropy loss decreases with quadratic compute increase, referencing an article that proposes square-root scaling (a compact formalization follows this list).
- This was contested with Kaplan's laws suggesting a constant of 0.28, advocating for a fourth-root scaling approach.
- Spotlight on 0-shot COT Models: A focus emerged on the widespread adoption of 0-shot COT variants in recent model releases, hinting at a shift in evaluation methodologies.
- While members pondered potential evaluation implementation details, no specific techniques were mentioned.
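A compact way to state the two positions, assuming loss follows a pure power law in compute $C$:

```latex
L(C) \propto C^{-\alpha}, \qquad
\alpha = \tfrac{1}{2} \ \text{(square-root scaling)}
\quad \text{vs.} \quad
\alpha \approx 0.28 \approx \tfrac{1}{4} \ \text{(Kaplan-style fourth-root scaling)}
```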
GPU MODE Discord
- HBM's Performance Compared to Expectations: Concerns were raised regarding HBM not performing better than expected, still representing a HUGE cost in products like the H100 while not significantly reducing power consumption.
- The key bottleneck in supplying more H100s was identified as packaging requirements.
- GPT2 Training Encounters TypeError: A member reported a TypeError while running GPT2 training, related to the `normal_()` function in PyTorch 2.0.0 due to an unexpected keyword argument 'generator' (see the sketch after this list).
- Discussion suggested understanding complexities of training, including initialization and forward/backward passes.
- Seeking Libraries for WebGPU Testing: A community member seeks recommendations on libraries for testing WebGPU, currently using Vitest and Playwright but facing flaky test runs.
- They suspect the issue might stem from Playwright not properly clearing resources between test runs.
- Gearing Up Raspberry Pi 5 for 4K Gaming: After witnessing Pineboards' 4K demo, a member decided to set up a GPU test rig on Raspberry Pi 5 with the amdgpu Linux kernel patch.
- They aim for full external GPU support and shared insights on how to apply the patch.
- Launch of FusedLinearJSD: The recent pull request introduced FusedLinearJSD, enabling efficient handling of the final linear layer by avoiding large logits tensor materialization.
- This optimizes both the forward and backward pass for improved execution, mirroring the fuse linear CE approach.
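A hedged reproduction of the `normal_()` issue above with a workaround, assuming the failure mode described (an older PyTorch rejecting the `generator` keyword on the in-place call):

```python
import torch

g = torch.Generator().manual_seed(1337)
w = torch.empty(768, 768)
try:
    # GPT-2-style init; reportedly raises TypeError on PyTorch 2.0.0.
    w.normal_(mean=0.0, std=0.02, generator=g)
except TypeError:
    # Workaround: torch.normal accepts a generator; copy the result in place.
    w.copy_(torch.normal(0.0, 0.02, size=w.shape, generator=g))
```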
OpenAI Discord
- Choosing Between ChatGPT and Claude Subscriptions: A member advised against subscribing to ChatGPT for features in preview due to usage caps, although access to GPT-4 legacy and 4o models might be beneficial.
- They stressed that subscriptions should allow full functionality rather than limiting preview access.
- Understanding O1 vs. O1 Mini Models: Members compared the O1 models, which act as 'reasoners', to 4o, highlighting the O1's limited availability of 50 uses per day versus 80 uses for 4o within 3 hours.
- The discussion included plans for A/B testing between the two models to determine performance differences.
- Theoretical Exploration of AI Evolution: A theory on AI consciousness evolution was entertained, emphasizing re-training and fine-tuning for advancement in capabilities.
- Conversations swirled around the commercial viability of these evolved AI models and potential business models to support them.
- User quits ChatGPT over rewriting responses: A user expressed frustration with ChatGPT's habit of rewriting responses, causing them to stop using it for several months.
- They noted mounting headaches from the rewriting issue, which continued even when they requested that it stop.
- Possible solutions discussed for ChatGPT: Another member suggested that the rewriting behavior might relate to Canvas or DALL-E prompts, and provided a workaround for DALL-E use.
- They recommended the phrasing 'Make an image using these exact words: [your words]' to avoid the rewriting problem.
Nous Research AI Discord
- Kainan offers free compute resources: Kainan expressed willingness to provide free compute resources for a competition, sparking interest from members.
- Though there was enthusiasm, some uncertainty arose regarding how many participants would actually utilize this offer.
- 2024 Nobel Prize awarded for Protein Research: The Royal Swedish Academy of Sciences awarded the 2024 #NobelPrize in Chemistry to David Baker and Demis Hassabis & John M. Jumper for their contributions to computational protein design and structure prediction, as reported here.
- This recognition underscores the pivotal advancements in protein research within the AI community.
- LM Studio boosts performance with Apple MLX: The new LM Studio 0.3.4 is out, featuring support for Apple MLX, allowing efficient model execution on Apple Silicon Macs.
- Users are thrilled by the improvements in running larger models and the potential capabilities provided by MLX.
- LLM360 launches massive pre-training dataset: LLM360's new dataset boasts 15 trillion tokens, ensuring rigorous data quality through thorough filtering techniques.
- This initiative focuses on enhancing the training quality for LLMs, emphasizing deduplication and superior dataset structuring.
- Llama Stack reveals new development tools: A member highlighted the new Llama Stack tools released by Meta, finding them pretty powerful.
- This showcases an emerging interest within the community for utilizing advanced tools to optimize AI model capabilities.
OpenRouter (Alex Atallah) Discord
- Prompt Caching: The Good and the Bad: Members discussed the mechanics of prompt caching, noting it can be problematic for changing contexts or short prompts. One member remarked, 'You cannot disable prompt caching for those providers who do automatic prompt caching,' pointing out critical limitations.
- This sparked a debate on when and how to effectively utilize prompt caching without compromising performance.
- Curiosity about Inflection 3.0: The anticipated launch of Inflection 3.0 has generated buzz, particularly regarding its integration with Intel Gaudi 3 for better performance. Despite the excitement, some members expressed skepticism about the lack of concrete benchmark data.
- Concerns were raised that the hype might overshadow the actual performance improvements and real-world applications.
- Understanding OpenRouter API Rate Limits: Clarifications on OpenRouter API limits reveal they are dynamic and depend on account credits. One member shared a GET request example demonstrating how to check rate-limit status and the credits associated with an API key (sketched after this list).
- This guidance is crucial for optimizing API usage while ensuring compliance with request limits.
- NotebookLM Podcast Gains Traction: Participants shared positive feedback on the NotebookLM Deep Dive podcast and highlighted its utility during commutes by creating accompanying notebooks. One user noted a desire for automation tools like ai-podcast-maker, stating, 'automation ftw.'
- This discussion underscores the growing trend of integrating audio content into daily workflows for enhanced learning.
- Gemini Moderation Worries Surface: Concerns arose about Gemini potentially moderating inputs, raising fears of user bans over specific content. This initiated a broader dialogue on user experience and content moderation policies within AI frameworks.
- Participants emphasized the need for transparency in moderation practices to ensure positive engagement from users.
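A hedged sketch of that check, assuming OpenRouter's documented key-status endpoint; the response fields are illustrative:

```python
import requests

resp = requests.get(
    "https://openrouter.ai/api/v1/auth/key",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},  # placeholder key
)
resp.raise_for_status()
print(resp.json())  # expected to include rate-limit status and remaining credits
```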
LlamaIndex Discord
- LlamaIndex Workflows Tutorial Brilliance: A detailed tutorial illustrates how to implement Workflows in LlamaIndex, contrasting it with LangGraph and aiding in the creation of AI research agents.
- It includes practical debugging and optimization tips, ensuring a smoother implementation experience.
- LlamaCloud's Financial Data Superpower: In a recent demo, the team showcased how to utilize LlamaCloud and LlamaParse to automate the filling of financial spreadsheets across multiple companies.
- This highlights the substantial contribution of LLMs in streamlining data handling and analysis processes.
- SFTechWeek Meetup on Multi-Agent Workflows: A reminder to RSVP for the in-person gathering at LlamaIndex HQ during #SFTechWeek, focusing on implementing Multi-Agent workflows in real production environments.
- Participants are promised insights on RAG systems and production challenges, alongside food and networking opportunities. RSVP here.
- Build Your Own AI Agent with OpenAI: A demonstration by the team allowed users to interact with an AI agent in real-time using the OpenAI Realtime API client, showcasing voice interaction capabilities.
- This open-source tool opens doors for developers to create personalized voice agents seamlessly, with examples provided for ease.
- Semantic Chunking Conundrum in TypeScript: A user sought guidance on implementing semantic chunking in TypeScript, referencing a comparable example in Python for context.
- They expressed frustrations with the lack of available resources and sparked discussions for community solutions.
Latent Space Discord
- AI Girlfriend Service Data Breach Exposed: The AI girlfriend service Muah.ai suffered a data breach last month, impacting 1.9 million email addresses and exposing sensitive prompts.
- Security experts are alarmed by the breach, particularly because the exposed data included child-exploitation-related material.
- Sequoia Capital's Insight on AI Evolution: Sequoia’s latest essay emphasizes a transition in Generative AI from 'thinking fast' to 'thinking slow,' focusing on inference time reasoning for innovative applications.
- Companies like OpenAI and Google DeepMind are stabilizing the market, while new agentic applications are poised to emerge.
- 2024 Nobel Prize in Chemistry Awarded: The 2024 Nobel Prize in Chemistry goes to David Baker for computational protein design, and to Demis Hassabis and John M. Jumper for contributions to AlphaFold2.
- Their work is crucial in advancing biochemistry, successfully predicting structures for nearly 200 million proteins.
- Palmyra X 004 Launch Highlights: Palmyra X 004 ranked in the top 10 on HELM, showcasing full-stack tool calling and training on synthetic data.
- This model's capabilities in AI function calling and CRM improvements received attention from Venture Beat.
- ChatGPT Introduces Search Functionality: ChatGPT is rolling out SearchGPT, integrating citation features in GPT-4o to compete with platforms like Perplexity.
- This strategic move enhances ChatGPT's information retrieval capabilities and aligns it with user query requirements.
Modular (Mojo 🔥) Discord
- DOM Data Attributes Enhance HTML Elements: A DOM feature allows data storage on elements via custom attributes prefixed with `data-` (for example, `data-myattribute`), improving data handling in HTML.
- This development encourages innovative techniques for data manipulation directly via the DOM.
- WebAssembly Component Model Repository Launched: The repository for the WebAssembly Component Model has been shared, detailing its design and specifications.
- It provides essential insights for developers interested in the component model aspects of WebAssembly.
- Mojo's GPU Support Sparks Excitement: Anticipation builds around the upcoming GPU support in Mojo, promising enhanced performance capabilities.
- Community members are exploring integrating PyTorch with Mojo to optimize usage of GPU resources.
- Mojmelo Brings Scikit-learn to Mojo: The Mojmelo project aims to implement machine learning algorithms in pure Mojo, providing an alternative to Cython dependencies in Scikit-learn.
- This initiative may significantly streamline the process of running Scikit-learn workflows through Mojo functionality.
- Mojo Graph Performance Concerns: Performance tests highlighted that total compile times for graphs were 0.312s and 0.451s, leading to concerns about slower debugging processes.
- Suggestions to reuse the inference session could mitigate these compile time issues, addressing potential performance penalties from using List types.
LLM Agents (Berkeley MOOC) Discord
- Lab Assignments Officially Released: The lab assignments for the course are now live, with the first task focused on using the Autogen framework to analyze restaurant reviews, due December 12, 11:59pm PST.
- Subsequent labs will address prompt engineering for LLM security, emphasizing creating attack and defense prompts.
- Sign Up for Course Made Simple: Prospective students can easily join the course by filling out this form.
- Engagement is encouraged in the LLM Agents Discord for further collaboration.
- Lab 1 Download Issues Reported: Users encountered problems downloading Lab 1 instructions, receiving empty files, while other labs function correctly.
- It was pointed out that the file is accessible on Google Drive despite having no preview.
- Reinforcement Learning's Impact on AGI Debated: Concerns arose regarding the relevance of Reinforcement Learning (TD learning) in achieving AGI, with some questioning if agents can thrive without it.
- The discussion highlighted RL's role and efficacy in modern AI architectures.
- Call for Collaborative Learning: Members encouraged peer collaboration for brainstorming while tackling assignments, aiming for a shared learning experience.
- This encouragement is seen as a way to foster camaraderie and improve understanding of complex LLM concepts.
OpenAccess AI Collective (axolotl) Discord
- Training Process Stalls on Vicuna-7B: A user reported their training process for the Vicuna-7B model got stuck with no output and shared their command line for launching it.
- Another member suggested sharing the sample config to diagnose the problem.
- DeepSpeed Error Resolved: The user faced a DeepSpeed error stating 'Input should be a valid integer, got a number with a fractional part'.
- The community suggested ensuring the number of devices is a multiple of 2, which ultimately resolved the issue.
- Unexpected CUDA Memory Shortage: Despite having 5 GPUs with 24GB of RAM, a user encountered CUDA out of memory errors during training.
- They shared their DeepSpeed and accelerate configurations to seek insights into the memory shortage.
- Runpod Instance Insights: The user referenced their DeepSpeed configuration, noting it was derived from examples available on GitHub.
- They emphasized running experiments on a Runpod instance and highlighted its specifications for context.
- Community Collaboration for Troubleshooting: Members actively collaborated to troubleshoot various model training and configuration issues.
- They exchanged insights and links to configurations, helping to resolve the user's questions about training and resource management.
Torchtune Discord
- Model Scalability Raises Eyebrows: A member expressed concerns about the scalability of a paper whose model was trained on 350 billion tokens, questioning the significance of its improvements.
- Ironically, another member noted that ML professionals often overlook basic statistical measures like p-values.
- P-values Not Common in ML: A member shared frustration about the lack of p-values and confidence intervals in ML papers, noting that it feels triggering coming from a medical background.
- Another participant remarked that they rarely see p-value usage in ML contexts, highlighting a cultural difference in scientific reporting.
- SOAP Outperforms AdamW but Needs Tuning: A user tested the SOAP optimizer on Alpaca, noting it performed better than AdamW until they adjusted AdamW's learning rate.
- However, they mentioned that the current implementation does not support distributed training or bf16 formats yet.
- Diff Transformer Triumphs over Traditional Transformers: The Diff Transformer introduces a differential attention mechanism, enhancing attention to relevant context and outperforming traditional Transformers in various benchmarks.
- It notably aids in long-context modeling and reduces hallucination in tasks like question answering.
- L-Mul Algorithm Slashes Energy Costs: The proposed L-Mul algorithm approximates floating point multiplication with integer addition, reducing energy costs by 95% while maintaining higher precision (illustrated after this list).
- This method offers a significant improvement over 8-bit floating point multiplications, suggesting a potential for vast resource savings in neural network computations.
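A hedged sketch of the core idea rather than the paper's exact L-Mul algorithm: for positive IEEE-754 floats, integer-adding the raw bit patterns approximately adds logarithms, so one integer addition can stand in for a multiply (signs and subnormals are ignored here):

```python
import struct

def approx_mul(x: float, y: float) -> float:
    """Approximate x*y for positive floats with a single integer addition."""
    xi = struct.unpack("<I", struct.pack("<f", x))[0]
    yi = struct.unpack("<I", struct.pack("<f", y))[0]
    # 0x3F800000 is the bit pattern of 1.0f; subtracting it removes the
    # exponent bias that the integer addition double-counts.
    zi = (xi + yi - 0x3F800000) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", zi))[0]

print(approx_mul(3.0, 7.0))  # prints 20.0; the exact product is 21
```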
LangChain AI Discord
- Exploring Memcached Support in LangChain: A member is investigating whether adding support for pymemcache in LangChain is enough, or if a broader range of clients like python-memcached or pylibmc would be beneficial.
- The goal is to improve caching flexibility within LangChain, making it more adaptable to different caching needs.
- LiteLLM's Streaming and Caching Issues: Concerns arose about LiteLLM not retrieving cached tokens while streaming, leading to a query about best practices for ensuring effective caching.
- Resources on LiteLLM were shared, suggesting that token stream responses may disrupt caching mechanisms.
- SQL Query Limitations in AI: A user raised issues regarding limiting SQL queries to specific IDs without relying on LLM instructions, looking for stricter query generation methods.
- Another member recommended using grouping by ID to improve filtering and achieve more reliable results.
- SQL Chain Compatibility with Other Models: A question was raised about the SQL chain's performance with models other than GPT-3.5, which often return inaccurate results.
- One member found success using 4o-mini by focusing on precise column naming and careful question formulation.
- Integrating Livekit for Real-time LangChain Functions: Interest was expressed in integrating Livekit with LangChain to enhance its real-time capabilities for advanced applications.
- The member specifically mentioned plans to develop a RAG bot, showcasing their ambitions for progressive application development.
OpenInterpreter Discord
- Get Ready for Mozilla AI Talk!: Next week, we're excited to host a talk from a member of Mozilla AI discussing intriguing open source initiatives. Don't miss out on this opportunity to learn more!
- You can join the event here to catch the insights.
- Confusion Over --stdin Flag: A user expressed confusion on how to use the --stdin flag and mentioned they couldn't find guidance in the docs, highlighting a documentation clarity gap.
- Further clarification is needed to assist users in utilizing this feature effectively.
- LLMs Stay Deterministic with Same Seed: A discussion revealed that LLMs can be deterministic if the same seed and input are used, contrary to popular belief. ChatGPT randomizes the seed on each request to introduce non-determinism.
- It's crucial to note that using the same inputs and setting temperature to 0 should yield consistent results; a minimal API sketch follows this list.
- Unpredictability with Model Updates: Concerns were raised about model updates in ChatGPT possibly affecting result consistency over time. Changes in the model could lead to variations that disrupt previously deterministic behavior.
- Users emphasized that updates might introduce unpredictability even when the code remains static.
- Code Outcome Variability Across Systems: A member pointed out that updates to systems or Python could influence code behavior, resulting in variable outcomes. For instance, accessing user tokens could alter the execution path.
- This variability underscores the importance of a controlled environment for consistent results.
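As a concrete illustration, a minimal sketch with the OpenAI Python SDK, pinning both `seed` and `temperature` (assumes an API key in the environment; the API only promises best-effort determinism, and a model update can still change outputs for identical inputs):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    # Pinning seed and temperature=0 makes repeated calls (mostly)
    # deterministic; non-determinism normally comes from sampling with
    # a fresh random seed on each request.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        seed=42,
    )
    return resp.choices[0].message.content

print(ask("Name three prime numbers.") == ask("Name three prime numbers."))
```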
tinygrad (George Hotz) Discord
- Clang Backend Errors in Tinygrad: A user encountered an error using `exo` on Linux with the clang backend, including a lowering error with MetaOps.KERNEL that replicates across two systems, possibly linked to Nix package issues.
- Additionally, running with `TINYGRAD_DEBUG=2` logged hundreds of operations before crashing, revealing detailed activity without immediate failure.
- Introducing Fashion MNIST for Tinygrad Learners: A member proposed a Pull Request to add Fashion MNIST as a new dataset, bridging the complexity gap between MNIST and CIFAR-10 for people learning tinygrad.
- This initiative reflects an eagerness in the community to augment learning resources, prompting discussions about more datasets to further enrich training experiences.
- Expansion of Dataset Options for Learning: Members have expressed interest in adding more datasets to tinygrad, indicating a collaborative effort to boost learning opportunities beyond existing options.
- The call for new datasets promises to create a more diverse learning environment, allowing users to experiment with various data types and challenges.
LAION Discord
- Hierarchical Generation Gains Traction: A member shared a blog post on Coupling Generation and Compression, discussing a framework for Hierarchical Generation similar to Stable Cascade models.
- The article highlights the prevalent model paradigm where a decomposer is trained first, which notably affects LLMs and image generation outputs.
- o1-preview Set to Redefine Zero-shot Capabilities: o1-preview exhibits significant strengths in zero-shot (weak) out-of-distribution generalization, outperforming previous models as per preliminary findings.
- o1-mini shows no such advancement, matching previous SOTA, which clearly illustrates the value of pre-training scale in model efficacy.
- TruthfulQA Shows o1's Comprehension Skills: o1 posted strong results on TruthfulQA, particularly in grasping common misconceptions effectively, indicating potential in comprehension tasks.
- Despite its constraints, the performance demonstrates o1's ability to tackle certain understanding challenges with notable success.
DSPy Discord
- Fetching Random Cat Images Made Easy: A new feature demonstrated the ability to fetch random cat images using The Cat API. This implementation involves creating a `Cat` model and utilizing an HTTP client for seamless image retrieval; a hedged sketch of the idea appears after this list.
- The demo emphasizes simplicity, allowing developers to easily integrate cat images into their applications.
- Limiting Cat Breeds Fetching: A showcased method allows users to fetch cat breeds while restricting the number of breeds returned. Code snippets reveal that only a limited set of breeds is retrieved and can be structured into a `CatBreed` model for efficient access.
- This enhancement provides developers with tighter control over data retrieval, making it easier to handle large datasets.
- Video Demos for Visual Learners: Links to demonstration videos were shared, providing visuals on the functionality of the cat image and breed fetching features. These guides clarify implementation processes for users.
- Such resources empower developers to grasp the tools effectively and implement them with confidence.
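The demo's actual code was not included in the summary; here is a minimal sketch of the same pattern using `httpx` and Pydantic against The Cat API's public endpoints (the `Cat` and `CatBreed` fields are assumptions about the models' shape):

```python
import httpx
from pydantic import BaseModel

# Hypothetical models; Pydantic ignores extra response fields by default.
class Cat(BaseModel):
    id: str
    url: str

class CatBreed(BaseModel):
    id: str
    name: str

def fetch_random_cat() -> Cat:
    resp = httpx.get("https://api.thecatapi.com/v1/images/search")
    resp.raise_for_status()
    return Cat(**resp.json()[0])  # endpoint returns a one-element list

def fetch_breeds(limit: int = 5) -> list[CatBreed]:
    resp = httpx.get("https://api.thecatapi.com/v1/breeds", params={"limit": limit})
    resp.raise_for_status()
    return [CatBreed(id=b["id"], name=b["name"]) for b in resp.json()]
```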
DiscoResearch Discord
- Whisper Turbo German Model Halves Error Rate: The newly introduced Whisper Turbo German model reduces error rates by 50% in various benchmarks compared to earlier versions, according to a source. This model is optimized for transcription, voice commands, and automatic subtitling specifically for German.
- It enhances usability in diverse scenarios by providing dictation functions for word processing software, making it a valuable tool for developers working with German-language processing.
- Applications of Whisper Turbo Model: Key applications of the Whisper Turbo German model include effective transcription of spoken German, automatic subtitling, and facilitating voice-based search queries.
- Developers can leverage these functionalities for various projects, improving accessibility and interaction in German-speaking environments; a minimal transcription sketch follows.
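A minimal transcription sketch via the standard transformers ASR pipeline; the checkpoint id below is an assumption based on the discussion, so swap in the actual model if it differs:

```python
from transformers import pipeline

# Model id is an assumption, not confirmed by the summary above.
asr = pipeline(
    "automatic-speech-recognition",
    model="primeline/whisper-large-v3-turbo-german",
)
result = asr("besprechung.wav", return_timestamps=True)  # any German audio file
print(result["text"])
```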
Gorilla LLM (Berkeley Function Calling) Discord
- Writer's Palmyra-X-004 Model Update Request: Sam Julien from Writer requested the Palmyra-X-004 model be added to the leaderboard following an email from CTO Waseem AlShikh, showcasing their impressive results in internal benchmarks.
- The question "Do we need to submit a PR?" highlights their commitment to community engagement.
- Clarifying Leaderboard Submission Process: Sam also sought clarification about whether a PR is required for the Palmyra-X-004 model's leaderboard addition.
- This inquiry reflects a structured approach to ensure their advancements are recognized effectively within the community.
The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The LLM Finetuning (Hamel + Dan) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
LM Studio ▷ #general (204 messages🔥🔥):
Llama 3.2 Inquiry
MLX Model Issues
Model Accessibility
New Features in LM Studio 0.3.4
Quantization Model Concerns
- Llama 3.2 Model Loading Issues: Users reported difficulties loading models like Llama 3.2 and Dolphin 2.2.1 in LM Studio 0.3.4, with some models that worked in previous versions failing to load.
- Changing the backend to Vulkan was suggested as a potential solution to improve compatibility in loading models.
- MLX Performance Concerns: Some users experienced infinite output loops with models like Llama 3.1 8B Instruct 4bit using MLX, indicating an issue with model responses.
- The conversations suggested that the problem might stem from how the models interpret prompts, leading to repetitive outputs.
- Accessibility Concerns in LM Studio: A user asked where to report accessibility issues related to screenreader functionality in LM Studio.
- The community provided guidance to raise concerns in specific designated channels within the platform for better visibility and action.
- Version Updates and Compatibility: The release of LM Studio 0.3.4 sparked discussions about model compatibility, with some users unsure if updates would migrate existing model presets seamlessly.
- It was clarified that users might need to manually adjust model settings and check for model migration after updating to the latest version.
- Availability of Llama 3.2 11B: A user expressed interest in using Llama 3.2 11B, but it was noted that this model is currently not supported in LM Studio.
- Support for certain large models has been a recurring question, indicating ongoing demand for greater compatibility.
Links mentioned:
- Get error when trying to run self-quantized versions of Hermes-3-Llama-3.1-8B with 8-Bits and group-size 128 · Issue #6 · lmstudio-ai/mlx-engine: System Mac OS Sequoia Version 15.0.1 (24A348) 2020 M1 MacBook Pro 16GB uname -a: Darwin 24.0.0 Darwin Kernel Version 24.0.0: Tue Sep 24 23:36:26 PDT 2024; root:xnu-11215.1.12~1/RELEASE_ARM64_T8103 ...
- nvidia/NVLM-D-72B · Hugging Face: no description found
- Download LM Studio - Mac, Linux, Windows: Discover, download, and run local LLMs
- Video Allegedly Shows Crypto Miners Jet Washing Nvidia RTX GPUs: ‘lightly used’ GPU bargains...
- bunnycore/Llama-3.2-3B-Mix-IQ4_XS-GGUF · Hugging Face: no description found
- rombodawg/Rombos-LLM-V2.5-Qwen-14b · Hugging Face: no description found
- Reddit - Dive into anything: no description found
- Refurbished 14-inch MacBook Pro Apple M2 Max Chip with 12‑Core CPU and 38‑Core GPU - Space Gray: Supercharged by M2 Pro or M2 Max, MacBook Pro takes its power and efficiency further than ever. It delivers exceptional performance whether it’s plugged in or not, and now has even longer battery life...
- New PNY RTX A6000 48GB GDDR6 Graphics Card VCNRTXA6000-PB 751492641676 | eBay: no description found
LM Studio ▷ #hardware-discussion (30 messages🔥):
Using dual GPUs
Performance of RTX 3060 and RX 6600
R9 7900X CPU performance
AVX2 support in VMs
Difference in GPU architectures
- Dual GPU setup still limited by performance: Members discussed using both an RTX 3060 and RX 6600 together for a total of 20GB VRAM, while noting that they don't perform well together, especially in terms of speed.
- One member suggested that while a dual setup can load larger models, the speed will remain essentially the same due to the performance limits of the 6600.
- Best choices for increased VRAM: The conversation highlighted that adding a second RTX 3060 would help with loading larger models but wouldn't increase speed, echoing the sentiment of needing more VRAM for accuracy.
- One user plans to save for a more powerful GPU, specifically the RTX 3090, acknowledging that their speed at 9-10 tok/sec is manageable for now.
- Running models on CPU-only setups: A query arose about running models on a CPU-only Ubuntu VM, with members confirming that using a CPU with AVX2 instructions is necessary.
- However, they cautioned that it might be slow and advised trying it out since some software is free; a quick AVX2 check is sketched after this list.
- NVIDIA's shift from NVLink: Discussion revealed that the NVIDIA RTX 4000 series does not support NVLink, moving to PCIe Gen 5 instead for multi-GPU setups.
- This change sparked interest regarding the performance capabilities of unconnected GPUs, with users expressing surprise at their speed.
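One quick way to verify the AVX2 requirement on Linux, a minimal sketch that just inspects the kernel's reported CPU flags:

```python
# Quick Linux check (run inside the VM) for the AVX2 instruction support
# that CPU-only inference requires.
with open("/proc/cpuinfo") as f:
    flags = f.read()
print("AVX2 available" if "avx2" in flags else "No AVX2 support detected")
```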
Unsloth AI (Daniel Han) ▷ #general (200 messages🔥🔥):
Model Merging at Scale
Fine-tuning Qwen 2.5
Unsloth AI and Dataset Formats
Instruct vs. Base Models
Hugging Face Datasets
- New insights on Model Merging at Scale: Exciting new work on large-scale model merging was shared by @Prateek Yadav, addressing questions about performance when combining larger models up to 64B parameters. This research evaluates how model size, quality, and methods affect performance and generalization.
- The related paper can be found on arXiv, detailing systematic evaluations and findings.
- Fine-tuning Qwen 2.5 is now smooth sailing: @theyruinedelise confirmed that fine-tuning on Qwen 2.5 models is now problem-free after previous prompt issues had been addressed. A collection of available Qwen 2.5 models can be found on Hugging Face.
- This reassures users interested in leveraging these models for their tuning tasks.
- Understanding Dataset Formats for Unsloth: Discussion emphasized that while CSV files can be used for datasets, using formats like Parquet with Hugging Face's default datasets could be more efficient. Users were reminded to ensure their dataset structure aligns with expected column formats.
- For example, columns named 'train' and 'conversations' may be specified for clarity.
- Distinguishing Between Instruct and Base Models: Users clarified that instruct models are specifically tuned to respond to direct prompts, incorporating refinements for answering questions, unlike base models that mainly focus on outputting the next token. This distinction allows for targeted applications in different scenarios.
- Instruct models also potentially include alignment bias, which could impact their responses.
- Exploring Conversion Tools for Datasets: There was a suggestion to convert datasets into formats that are better supported within Hugging Face, with recommendations to either write custom scripts or use existing functions. This ensures that datasets are uploaded correctly for intended training purposes.
- Using the `load_dataset('csv')` function can help facilitate this process, making it more accessible for users; a short sketch follows.
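A minimal sketch of that flow with the `datasets` library, loading a local CSV and writing a Parquet copy for re-upload (file names are placeholders):

```python
from datasets import load_dataset

# Load a local CSV split, verify its columns, and save a Parquet copy
# for faster, more robust reuse with Hugging Face tooling.
ds = load_dataset("csv", data_files={"train": "conversations.csv"})
print(ds["train"].column_names)                  # check the expected columns
ds["train"].to_parquet("conversations.parquet")  # Parquet copy for re-upload
```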
Links mentioned:
- Tweet from Prateek Yadav (@prateeky2806): Ever wondered if model merging works at scale? Maybe the benefits wear off for bigger models? Maybe you considered using model merging for post-training of your large model but not sure if it genera...
- Google Colab: no description found
- Google Colab: no description found
- What Matters for Model Merging at Scale?: Model merging aims to combine multiple expert models into a more capable single model, offering benefits such as reduced storage and serving costs, improved generalization, and support for decentraliz...
- Qwen 2.5 - a unsloth Collection: no description found
- unsloth/Llama-3.2-3B-bnb-4bit · Hugging Face: no description found
- yahma/alpaca-cleaned · Datasets at Hugging Face: no description found
- Killed by Google: Killed by Google is the open source list of dead Google products, services, and devices. It serves as a tribute and memorial of beloved services and products killed by Google.
- CUDA on WSL: no description found
Unsloth AI (Daniel Han) ▷ #help (19 messages🔥):
Colab gguf file download struggles
Using logits with Ollama Llama
Continued pretraining of Llama 3.2 3b
AMD GPU limitations with Unsloth
Fine-tuning with Unsloth FastLanguageModel
- Colab gguf file download struggles: Members expressed frustrations about downloading large gguf files from Colab, citing issues like incomplete downloads and disconnections.
- One solution offered was to upload files directly to Google Drive or Hugging Face instead, avoiding Colab's download limitations.
- Using logits with Ollama Llama: A member inquired about obtaining logits scores from Llama installed via Ollama in Python but found no clear resources.
- Another member suggested that if logits are desired, switching to llama.cpp might be a better option.
- Continued pretraining of Llama 3.2 3b: A user wants to continue pretraining Llama 3.2 3b without using PEFT for new knowledge integration, questioning its feasibility.
- Responses indicated that higher rank and including embedding layers are crucial for fine-tuning and understanding parameter counts.
- AMD GPU limitations with Unsloth: A member raised concerns about the ability to create small LoRA models on Intel GPUs due to their hardware limitations.
- It was clarified that Unsloth does not support AMD GPUs, nor does it support multi-GPU setups for training.
- Fine-tuning with Unsloth FastLanguageModel: A user confirmed that setting `requires_grad` to true for all model parameters enables the use of Unsloth FastLanguageModel for full fine-tuning; see the sketch after this list.
- There was an inquiry regarding compatibility with trl's SFTTrainer, indicating interest in leveraging both for fine-tuning processes.
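A minimal sketch of the approach described, unfreezing every parameter after loading; the model name and sequence length are placeholders, not a confirmed recipe from Unsloth:

```python
from unsloth import FastLanguageModel

# Placeholders: swap in your checkpoint and context length; load in full
# precision so every weight can receive gradients.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B",
    max_seq_length=2048,
    load_in_4bit=False,
)
for param in model.parameters():
    param.requires_grad_(True)  # full fine-tuning rather than adapter-only
```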
HuggingFace ▷ #announcements (1 messages):
Nvidia models
Meta's VLMs
Hugging Face Accelerate 1.0
ColPali multimodal retrieval
Paper Central
- Nvidia launches high-efficiency models: Nvidia introduced the Nemotron 51B, a NAS-optimized model achieving 2x throughput on a single H100 GPU while preserving accuracy. Users can experiment with the model through NVIDIA's API or download it from Hugging Face.
- This model is accompanied by several others, including NVLM 1.0 and OpenMath aimed at enhancing AI capabilities.
- Meta releases improved VLMs: Meta launched its first VLMs, including CoTracker 2.1, capable of tracking 70k points on a single GPU for video motion prediction. An accompanying paper is available here.
- The SAM 2.1 model for image/video segmentation also received an update, enhancing its utility for developers.
- Hugging Face Accelerate 1.0 launched: Hugging Face announced the release of Accelerate 1.0, featuring several new functionalities for seamless AI development. This update was well-received, prompting users to explore its improvements.
- For a detailed overview, an announcement blog is available here.
- ColPali: New retrieval approach: ColPali introduces an innovative method for multimodal document retrieval, despite some reservations about its practicality. The integration with Qdrant allows for efficient indexing and searching of embeddings.
- The related blog post provides insights on how to effectively use ColPali with existing vector databases.
- Paper Central for research updates: Hugging Face unveiled Paper Central, a space designed to compile the latest research papers. It aggregates sources like arXiv and GitHub to keep researchers informed.
- This initiative aims to streamline access to crucial academic resources, enhancing the research community's productivity.
Links mentioned:
- Tweet from NVIDIA AI Developer (@NVIDIAAIDev): 👀 Experience high-efficiency NVIDIA Llama-3.1-Nemotron-51B - a NAS-optimized model achieving 2x throughput while preserving accuracy runs on a single H100 GPU. ✨Try out the Llama-3.1-Nemotron-51B N...
- Tweet from Niels Rogge (@NielsRogge): Meta has released CoTracker 2.1, an improved version of its Transformer-based model for video motion prediction, on @huggingface! Capable of tracking 70k points jointly on a single GPU Paper (with l...
- Tweet from Tris Warkentin (@triswarkentin): Gemma 2 just got even better! 🚀 New Japanese-tuned 2B model AND a $150K Kaggle competition to build Gemma models for every language. Great to have @sundarpichai here to share the excitement! Read m...
- Tweet from Zach Mueller (@TheZachMueller): The day has finally arrived, @huggingface Accelerate 1.0 is now out! There are tons of new goodies to explore and plenty more to come. I'll quickly talk about my favorites 🧵 For a refresher, g...
- Tweet from merve (@mervenoyann): Your LLM can't understand videos and images? How sad 😔 Luckily we shipped a new task for video language models 🤗 look for video-text-to-text in left tab at @huggingface /models ⏯️ It also comes...
- Tweet from Adina Yakup (@AdinaYakup): Here is a collection for leaderboards and Arenas from the Chinese community on @huggingface 🔥🏆🇨🇳 https://huggingface.co/collections/zh-ai-community/leaderboards-and-arenas-664b6913bfd9b93ba4ac242...
- Tweet from Julian Bilcke (@flngr): How it looks like right now (I'm the only user of the server so it's smooth 😂)
- Tweet from Daniel van Strien (@vanstriendaniel): ColPali is an exciting new approach to multimodal document retrieval, but some doubt its practical use with existing vector DBs. It turns out it's super easy to use @qdrant_engine to index and se...
- Tweet from JB Delbrouck (@IAMJBDEL): Paper Central is a new 🤗 Hugging Face space designed to provide the most up-to-date information on the latest research papers. It's the first portal to bring together all key sources in one place...
HuggingFace ▷ #general (134 messages🔥🔥):
Model Performance Comparison
Mira Network Discussion
Model Specificity in Use Cases
TensorFlow Issues
Python Community Q&A
- Comparing AI Models for Coding Tasks: Users discussed their experiences with different AI models, noting that Claude Sonnet 3.5 performed significantly better than GPT o1 preview for generating Rust code with fewer prompts.
- One user shared their strategy of using both Claude and GPT effectively to maximize outcomes when debugging code.
- Insight into Mira's Decentralization: A member introduced Mira, a decentralized infrastructure aimed at making AI accessible, highlighting its focus on community-driven projects without crypto tokens.
- Despite its technological promise, another user expressed moral concerns regarding blockchain and cryptocurrency associations.
- Need for Clear Model Usage Guidelines: One user questioned the lack of clarity in model cards about specific applications for various AI models, such as architecture and structural engineering tasks.
- Members noted that detailed model cards often depend on the authors' efforts and expertise in outlining effective use cases.
- Concerns with TensorFlow on GPU: Several users vented frustrations about TensorFlow's performance on GPUs, reporting bugs related to tensor initialization issues that hindered their work.
- Recommendations were made to explore alternatives or troubleshoot the underlying errors to improve functionality.
- Engagement in Python and Data Science Discussions: The channel allowed for a variety of questions around Python, with users exploring topics like workflow automation and structured data extraction.
- Overall, the dialogue reflected a blend of technical inquiries and community troubleshooting among peers.
Links mentioned:
- Mira: Decentralised Infrastructure to Universalise AI
- Klok: Klok - Crypto intelligence on command
- plandex/app/server/model/prompts at main · plandex-ai/plandex: AI driven development in your terminal. Designed for large, real-world tasks. - plandex-ai/plandex
- Reddit - Dive into anything: no description found
- ACEMAGICIAN RGB Mini PC AMD Ryzen 9 6900HX (up to 4.9 GHz), 32 GB DDR5, 512 GB SSD, AMD Radeon RX 680M Micro Desktop Computer [Adjustable Auto/Silent Eco/Performance mode]: Amazon.it: Computing: no description found
HuggingFace ▷ #today-im-learning (9 messages🔥):
Hierarchical Generation
Image Autoencoder Integration
Differences in Model Types
Hugging Face Metrics Implementation
- Hierarchical Generation Insights: A member shared a blog post discussing the hierarchical generation paradigm, emphasizing the roles of decomposers and generators in model training.
- They highlighted the importance of compression in generative models, particularly noting how this paradigm applies to both LLMs and image generators.
- Leveraging Image Autoencoders: Discussion emerged around utilizing an image autoencoder for downstream latent spaces as outlined in the hierarchical generation article.
- In response, the article's author explained that the encoder functions similarly to a VAE, trained to produce useful latents for a mini diffusion model.
- Exploring Model Types and Datasets: One member expressed interest in understanding the distinctions between base and instruct models as well as datasets suitable for LoRA fine-tuning.
- This illustrates a growing focus on model customization and training data relevance in the community.
- Evaluating Fine-Tuned Models with Hugging Face: Another member shared their learning process integrating Hugging Face metrics such as ROUGE and BertScore into their training pipeline to enhance model evaluation.
- The goal is to move away from other libraries for a more tailored approach in assessing fine-tuned models.
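A minimal sketch of that kind of metric integration with the `evaluate` library, with toy predictions and references standing in for real model outputs:

```python
import evaluate

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

preds = ["the cat sat on the mat"]
refs = ["a cat was sitting on the mat"]

# ROUGE returns aggregated scores; BERTScore returns per-example lists.
print(rouge.compute(predictions=preds, references=refs)["rougeL"])
print(bertscore.compute(predictions=preds, references=refs, lang="en")["f1"])
```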
Link mentioned: coupling generation and compression: no description found
HuggingFace ▷ #cool-finds (1 messages):
Scade tools
Comfy-Flux integration
Custom image generations
- Experimenting with Scade's Custom Image Tools: A member shared their experience using Scade to create custom tools including a background remover, hand restoration, and an upscaler for images. These tools can be imported directly from the provided Drive link.
- The biggest advantage is that building these tools on Scade is cheap, and the Comfy-Flux integration greatly enhances their quality compared to creating tools from scratch.
- Sharing and Feedback on Custom Tools: The user encourages the community to try out the mentioned tools and provide feedback, expressing hope for suggestions to improve them. They also highlighted sharing these developments on the Scade community for wider visibility.
- The member emphasized that using these tools effectively can enhance image generation quality while maintaining ease of use.
Links mentioned:
- 3 tools on comfy-flux - Google Drive: no description found
- Scade: no description found
- Created Useful Tools with Comfy-Flux on Scade.pro: I have been experimenting with custom image generations and wanna to share some of the tools i’ve built for myself using Comfy + Flux + Scade. Background remover: Easily remove the background from ...
HuggingFace ▷ #i-made-this (4 messages):
VividNode Updates
Burnout in Tech Creators
Fine-tuning Whisper for ATC
FluxBooru-CFG3.5
- VividNode v1.4.0 Introduced: The release of VividNode v1.4.0 includes support for gpt4free, allowing users to manually select providers and models, enhancing user flexibility.
- Despite its capabilities, gpt4free faces challenges such as token limits, so the advantages of offline LLM usage remain salient.
- Tech Creators Face Burnout: A tech creator expressed feelings of burnout from balancing work and side projects, highlighting the struggle to keep pace with rapid advancements.
- They plan to recruit contributors post v1.5.0 release, acknowledging that support is often only offered when actively sought.
- Fine-tuning Whisper Model Boosts Performance: A blog post was published detailing the fine-tuning of a Whisper model on pilot-air traffic control communications, yielding an 84% relative performance improvement.
- This process reduced the word error rate (WER) from 94.59% to just 15.08%, showcasing the impact of tailored ASR solutions; a quick WER computation sketch follows this list.
- Resources for Fine-tuning Whisper: The models and datasets used for fine-tuning Whisper are now shared on Hugging Face, including a GitHub repository and the dataset.
- Links to the blog post and the Hugging Face models are also provided for further exploration.
- FluxBooru-CFG3.5 Released: A link to the FluxBooru-CFG3.5 space on Hugging Face was shared, indicating recent developments in this area.
- Details about its features and applications were not elaborated upon in the message.
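For readers unfamiliar with the metric, a minimal sketch of how a WER figure is computed with the `jiwer` library; the strings are toy stand-ins for real ATC transcripts:

```python
from jiwer import wer

reference = "cleared for takeoff runway two seven right"
hypothesis = "cleared takeoff runway two seven right"
# WER = (substitutions + deletions + insertions) / reference word count
print(f"WER: {wer(reference, hypothesis):.2%}")  # 1 deletion / 7 words ~ 14.29%
```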
Link mentioned: Release v1.4.0 · yjg30737/pyqt-openai: What's Changed Add is_g4f and g4f_platform fields to message table, remove old text Fix problems related to recent update, rename file Move every function in globals.py to utils.py for better org...
HuggingFace ▷ #NLP (8 messages🔥):
ONNX conversion of T5 models
Exploratory analysis of legal documents
Big data technologies discussion
Validation of LLM outputs
Server setup for Hugging Face pipelines
- T5 ONNX files explored: A member pointed out that the required ONNX files for the T5 model can be found under the ONNX folder on the Hugging Face page, suggesting a download if needed.
- They also shared a link on different ways to convert Transformers models to ONNX including a specific example with distilbert-base-uncased-finetuned-sst-2-english.
- Exchange ideas on legal docs: One member sought to engage with others who have experience in exploratory analysis of legal documents, expressing a desire to exchange ideas and problems.
- No specific responses were noted, indicating potential interest in the topic but limited engagement.
- Big Data technologies inquiry: A member reached out to see if anyone was well-versed in Big Data technologies, particularly Kafka and Hadoop.
- This inquiry highlights a potential interest in leveraging these technologies in their projects.
- Validating unknown LLM outputs: A member requested techniques for validating unknown outputs from LLMs as JSON, aiming for validation and cleaning in Python and JavaScript.
- Another member recommended the jsonschema library, which they have used with varying success; a minimal validation sketch follows this list.
- Efficient server setup for Hugging Face pipelines: A member shared their experience of using Triton Inference Server for loading Hugging Face pipelines but expressed concerns about over-engineering without a GPU.
- They are exploring alternatives for setting up a server with 3-4 models that processes HTTP requests without needing Docker containers for each model.
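A minimal validation sketch with `jsonschema`, parsing raw model output and rejecting anything that fails the schema (the schema and output are toy examples):

```python
import json
from jsonschema import validate, ValidationError

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

raw_output = '{"name": "Ada", "age": 36}'  # stand-in for an LLM response
try:
    data = json.loads(raw_output)          # reject non-JSON output first
    validate(instance=data, schema=schema) # then enforce the schema
    print("valid:", data)
except (json.JSONDecodeError, ValidationError) as e:
    print("rejected:", e)
```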
Links mentioned:
- Convert Transformers to ONNX with Hugging Face Optimum: no description found
- GitHub - python-jsonschema/jsonschema: An implementation of the JSON Schema specification for Python: An implementation of the JSON Schema specification for Python - python-jsonschema/jsonschema
HuggingFace ▷ #diffusion-discussions (8 messages🔥):
Image Quality in Diffusion Models
Flux Loras and GGUF Training
Training Diffusion Models with Diffusers
- Assessing Image Quality in Diffusion Models: Members discussed the low resolution of a particular image, suggesting it could be produced by Flux or a pony model with a griffin lora, but noted it appeared to be post-processed.
- It was highlighted that the image could depict any random person due to its generic nature and lack of detail.
- Clarifying Diffusers and Model Types: A member clarified that diffusers is a library enabling the use of various diffusion models, specifically noting Stable Diffusion XL and Flux as capable models.
- This generative flexibility allows for the integration of models with the diffusers library.
- Training Flux Loras on GGUF Formats: A member inquired about training Flux loras and finetunes using flux gguf formats, leading to a mention that gguf is not yet supported but training with 6GB GPUs is possible using Kohya.
- There are suggestions that gguf provides more accuracy compared to fp16, but there isn't sufficient comparison data for int4 versions yet.
Cohere ▷ #discussions (38 messages🔥):
Temperature settings for CMD-R
JSON schema discussions
Introduction of new members
Nobel Prize speculation
HumanEval and QA processes
- Finding the Right Temperature for CMD-R: Members discussed optimal temperature settings for CMD-R, indicating 0.3 for deterministic outcomes and experimenting with 0.8 for creative tasks.
- One user suggested generating with 0.8 and then formatting with 0.1, while noting concerns about generation costs; a two-pass sketch follows this list.
- JSON Formatting Impact: A user noted that JSON formatting can reduce the model's capability, preferring to provide a format through prompts instead.
- Another member suggested using a schema to improve output while increasing temperature for better results.
- Welcoming New Member with R Interest: A new user introduced themselves while learning R and found the CMD-R service unexpectedly tailored for R coding.
- Discussion clarified that CMD-R can also code with R, keeping the new member engaged.
- Nobel Prize Announcement Speculation: Rumors circulated about a potential Nobel Prize in Literature awarded to the authors of the Attention Is All You Need paper.
- Some members expressed disbelief, while others supported the idea, noting its cultural impact and influence.
- Methods for QA in Generative AI: Participants shared tools and methods for QA in generative AI models, with mentions of evaluation frameworks.
- One user referred to HumanEval, expressing interest in how the evaluation process operates.
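A minimal two-pass sketch with the Cohere v2 Python SDK (assumes CO_API_KEY in the environment; the model name and prompts are placeholders, and note the second call roughly doubles token costs):

```python
import cohere

co = cohere.ClientV2()  # reads CO_API_KEY from the environment

def two_pass(prompt: str) -> str:
    # Pass 1: high temperature for creative content.
    draft = co.chat(
        model="command-r",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.8,
    ).message.content[0].text
    # Pass 2: low temperature to reformat the draft deterministically.
    return co.chat(
        model="command-r",
        messages=[{"role": "user", "content": f"Reformat as a bullet list:\n{draft}"}],
        temperature=0.1,
    ).message.content[0].text
```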
Link mentioned: Tweet from Omar Sanseviero (@osanseviero): BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Literature to the Attention Is All You Need authors. Their work has made thousands cry, laugh, or ric...
Cohere ▷ #questions (36 messages🔥):
Cohere API Issues
System Role Formatting
ETL Pipeline for RAG
Zero Data Retention for LLMs
- Cohere API connection issues: A member reported intermittent connection issues with the Cohere API, receiving an error indicating that the 'ChatResponse' object has no attribute 'text'. After troubleshooting, they discovered that using `response.message.content[0].text` resolved the problem (see the sketch after this list).
- Members shared updates and tests on troubleshooting code, suggesting recent API updates might have contributed to confusion.
- Formatting System Role in Markdown: A member inquired about the language structure necessary for shaping the system role, to which it was confirmed that formatting the task and context as markdown yields better results. Documentation was provided for further guidance on constructing effective system messages.
- A sample system message structure was mentioned, detailing how concise instructions can guide the model's behavior efficiently.
- Exploring ETL Solutions for RAG: A user introduced their capstone project focused on developing an ETL pipeline for unstructured data processing aimed at retrieval-augmented generation (RAG). They sought community insights and experiences related to this technology.
- Community members pointed out the availability of numerous use cases and blogs on Cohere's website, as well as expressed interest in hearing individual experiences with similar projects.
- Zero data retention for enterprise users: A user expressed concerns about customer data retention policies, particularly regarding LLMs storing prompts for longer periods. They were informed that zero data retention options exist for enterprise customers under certain usage commitments.
- Clarification was provided about the conditions under which Cohere can offer this option, linking it to enterprise agreements.
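For reference, the v2 access path from the fix above in context, using the Cohere v2 client (assumes CO_API_KEY in the environment):

```python
import cohere

co = cohere.ClientV2()  # v2 client; reads CO_API_KEY from the environment

response = co.chat(
    model="command-r-plus",
    messages=[{"role": "user", "content": "Hello!"}],
)
# v1's response.text no longer exists; v2 nests it under message.content:
print(response.message.content[0].text)
```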
Links mentioned:
- Migrating From API v1 to API v2 — Cohere: The document serves as a reference for developers looking to update their existing Cohere API v1 implementations to the new v2 standard.
- Crafting Effective Prompts — Cohere: This page describes different ways of crafting effective prompts for prompt engineering.
- System Messages — Cohere: This page describes how Cohere system messages work, and the effect they have on output.
Cohere ▷ #api-discussions (46 messages🔥):
Cohere API usage
RAG solution for banks
Embedding model performance
Trial key limitations
- Trial key constraints for Cohere API: A user discussed hitting the 1000 API calls limit with their Cohere trial key and received confirmation that generating a new one didn't change the situation.
- Members clarified that using more calls would require a paid plan since the trial is meant for light testing.
- Optimizing RAG solution for banking: Another user is experimenting with an RAG solution for a bank using Cohere and noted better retrieval performance compared to OpenAI at 75% recall@5.
- They plan to embed 2000 chunks for a proof of concept to showcase results to the bank.
- API call practicality for businesses: Discussion arose around the feasibility of using the trial for business scenarios, with some commenting on how it might not be worth the time or cost for larger endeavors.
- One member emphasized the importance of properly budgeted costs, especially when dealing with enterprise clients like banks.
- Cohere as a secure alternative: A member highlighted that Cohere is a direct competitor with OpenAI, focusing on data security and safety.
- They reassured that Cohere could be the perfect solution for the user's banking chatbot project.
- Community support and feedback: Members expressed enthusiasm for the user's experimentation with Cohere, encouraging updates on their findings.
- They welcomed insights on what can be improved in Cohere's offerings based on real-world application results.
Cohere ▷ #projects (29 messages🔥):
Emotional State Machine
Emotion Propagation
AI and Mental Health
Emotion in Voice AI
- Emotional State Machine Enhances Bot Interactions: A new emotional state machine tracks user emotions through persistent memory, allowing assistant bots to maintain a consistent emotional tone based on user interactions.
- This approach differs from most bots that exhibit a flexible emotion state, as it remains upset if the user triggers a negative response.
- Emotion Propagation for a Richer Experience: The system utilizes primary, secondary, and tertiary emotion propagation to create a more nuanced understanding of user sentiment, storing up to 1 million states efficiently.
- This cascading effect means users experience a more authentic interaction, where multiple emotions can influence responses simultaneously.
- AI Applications Addressing Mental Health: There is interest in leveraging AI chatbots for mental health purposes, facilitating support when human therapists are unavailable.
- The emotional aspect of these bots remains a challenge, as some encounter issues with censorship policies that limit emotional context interpretation.
- Building a Genuine Bot Persona: Efforts are underway to develop bots that respond more genuinely by creating an emotional score based on user input content.
- This technique aims to provide a diverse and dynamic emotional representation, though achieving a perfect balance in response remains a work in progress.
- Voice Integration for Enhanced Emotional Expression: There is a push to integrate these emotional systems into voice AI, tapping into more expressive capabilities beyond text interactions.
- Voice expression can communicate a wider range of emotions, enriching user experiences far beyond the limitations of text-only responses.
aider (Paul Gauthier) ▷ #general (70 messages🔥🔥):
Aider and Model Integration
Using OpenRouter with Aider
Feedback on Architect Mode
Community Discussions on LLMs
Issues with Aider Functionality
- User Queries on Aider's File Management: Users discussed challenges with Aider not auto-populating new files in the file list, resolving it by using `/commit` or specifying file paths directly.
- Another user shared that Aider requires files to be committed to the git repository to be visible in the autocomplete, highlighting the importance of version control.
- Integrating External LLMs with Aider: A user inquired about using SambaNova models with Aider, with community members suggesting manual API configuration if the endpoint is OpenAI-compatible.
- Further discussion revealed methods to manually add model pricing and token costs through metadata JSON files; a sketch of such a file appears after this list.
- Architect Mode Behavior Under Scrutiny: Concerns were raised about Aider's Architect mode not effectively planning before coding, prompting suggestions to adjust the architect prompt for improved functionality.
- One user emphasized that the current prompts can lead to premature coding without adequate observational input, advocating for behavioral modifications.
- Community Speculation on Anthropic Models: Members discussed potential announcements from Anthropic and the need to use appropriate channels for discussion around new model releases.
- Fry69_61685 directed users to specific threads for updates and to avoid TOS violations in discussions about model proxies.
- Functionality Issues with Aider: A user highlighted an issue where Aider indicated changes were made to a file, but those changes were not visible until the file was reopened.
- Community members attributed this to chat mode settings and emphasized the importance of understanding Aider's operational modes.
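A hedged sketch of generating such a metadata file: aider reads `.aider.model.metadata.json` entries in litellm's pricing format, and the model name and costs below are placeholders, not real pricing:

```python
import json

# Placeholder model name and costs; keys follow litellm's
# model_prices_and_context_window.json format that aider consumes.
metadata = {
    "sambanova/my-custom-model": {
        "max_tokens": 4096,
        "max_input_tokens": 8192,
        "input_cost_per_token": 0.000005,
        "output_cost_per_token": 0.000010,
        "litellm_provider": "openai",
        "mode": "chat",
    }
}
with open(".aider.model.metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```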
Links mentioned:
- OpenRouter: aider is AI pair programming in your terminal
- OpenAI compatible APIs: aider is AI pair programming in your terminal
- Installing aider: aider is AI pair programming in your terminal
- Chat modes: Using the chat, ask and help chat modes.
- Providers | liteLLM: Learn how to deploy + call models from different providers on LiteLLM
- OpenRouter: LLM router and marketplace
- Advanced model settings: Configuring advanced settings for LLMs.
- aider/aider/coders/architect_prompts.py at cd3e0ae91424c9d31f7b332e59c9f843eb0a7990 · Aider-AI/aider: aider is AI pair programming in your terminal. Contribute to Aider-AI/aider development by creating an account on GitHub.
- Exponent: Exponent is your AI Pair Programmer.
- litellm/model_prices_and_context_window.json at main · BerriAI/litellm: Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq] - BerriAI/litellm
aider (Paul Gauthier) ▷ #questions-and-tips (53 messages🔥):
Aider Usage Queries
Model and Mode Configuration
File Handling in Aider
Architect Mode Feedback
Performance Optimizations
- Confusion Around Model Usage in Aider: Users raised concerns about needing to switch models every time they changed modes in Aider, specifically when using `architect` or `code` modes.
- It was clarified that the weak model only applies to commit message generation, and users must manually switch main models using `/model`.
- File Handling Limitations: Issues were reported with Aider not being able to match shell glob patterns recursively, requiring manual addition of each file path for *.erb files.
- Users are encouraged to create wrapper scripts or aliases for complex file structures to streamline the process.
- Suggestions for Enhancing Architect Mode: Feedback highlighted that Aider often fails to complete tasks fully in Architect mode, necessitating user intervention to 'continue'.
- Users are asking if others have experienced similar issues and whether they are utilizing it differently.
- Configuring Aider with LLM Proxy: A user expressed difficulty configuring Aider to work exclusively with their company's LLM proxy service despite several attempts with various config files.
- They mentioned using the `--openai-api-base` and `--openai-api-key` parameters but still encountering unavailable models.
- Multithreading and Performance Concerns: A user inquired if making Aider multithreaded would improve performance, reflecting concerns regarding response times.
- Responses suggested that Aider's operation model may not benefit from multithreading due to its dependency on blocking operations with LLM inputs.
Links mentioned:
- FAQ: Frequently asked questions about aider.
- Chat modes: Using the chat, ask and help chat modes.
- In-chat commands: Control aider with in-chat commands like /add, /model, etc.
- princeton-nlp/SWE-bench_Multimodal · Datasets at Hugging Face: no description found
aider (Paul Gauthier) ▷ #links (7 messages):
OpenAI Realtime Console
Cline AI Assistant v2.0
Firefox Security Update
- OpenAI Realtime Console enables easy voice API access: A demo repository for the OpenAI Realtime Console was successfully set up, allowing users to easily test the new Realtime voice API announced at DevDay. This setup requires only a simple `npm start` to run the application locally.
- Users can interact via voice input and output; however, be warned that testing incurs costs, with one user reporting $3.87 in charges for just 15 minutes of use.
- Cline AI Assistant 2.0 boasts impressive upgrades: The newly released Cline (formerly Claude Dev) v2.0 introduces features such as streamed responses directly into your editor and a cancel button for task management. The new XML-based tool calling prompt reduces requests by about 40%, improving resource efficiency.
- A community member praised Cline, stating it's mega freakin good!, highlighting its strong performance enhancements across various use cases.
- Critical Firefox update due to security vulnerabilities: A critical exploit in Firefox has been announced, urging users to update to version `131.0.2` to mitigate potential risks associated with a use-after-free vulnerability. This advisory, released by Mozilla, indicates active exploitation of the vulnerability, with specific details outlined in Mozilla's advisory.
- Users expressed gratitude for the heads up regarding this serious security risk, emphasizing the importance of immediate updates for safety.
Links mentioned:
- openai/openai-realtime-console: I got this OpenAI demo repository working today - it's an extremely easy way to get started playing around with the new Realtime voice API they announced [at DevDay](https://simonwillison.net/2...
- Security Vulnerability fixed in Firefox 131.0.2, Firefox ESR 128.3.1, Firefox ESR 115.16.1: no description found
- Tweet from Saoud Rizwan (@sdrzn): Introducing Cline (formerly Claude Dev), an AI assistant that can use your CLI aNd Editor. v2.0 brings exciting updates: responses are now streamed into your editor, a cancel button for better control...
- Release v2.0.0 · clinebot/cline: New name: Meet Cline, an AI assistant that can use your CLI aNd Editor. While “Claude Dev” was a tribute to Claude 3.5 Sonnet, v2.0 brings updates that significantly improve performance across othe...
Interconnects (Nathan Lambert) ▷ #news (44 messages🔥):
2024 Nobel Prize in Chemistry
Nato's academic background
LMSYS becoming a company
Editing Google Scholar
Challenges in energy sciences
- Nobel Prize in Chemistry awarded to Computational Innovators: The Royal Swedish Academy of Sciences awarded the 2024 #NobelPrize in Chemistry to David Baker for computational protein design and jointly to Demis Hassabis and John M. Jumper for protein structure prediction.
- Members expressed excitement with exclamatory reactions, signaling a comeback in the scientific community.
- Nato's Journey from EE to AI: Nato shared his academic path, transitioning from Electronics Engineering (EE) to AI and reinforcement learning after working in a novel robotics lab.
- He also discussed his past experiences with MEMS, emphasizing their complexity and importance in engineering.
- LMSYS Announces Transition to Company: Nato mentioned that LMSYS is planning to become a company, a shift that he views positively as it may overcome the bad incentives of academic settings.
- There was a discussion about whether this development was better or worse than non-profit academic motivations.
- Editing Google Scholar for Improved Citation Metrics: A user asked if Nato plans to create a tutorial for editing Google Scholar after he mentioned curating his profile meticulously.
- Nato pointed out that Google’s user experience is challenging, making manual edits a painful process.
- Discussions on Challenges in Battery Development: Members confirmed the difficulties faced in battery development, expressing that doing batteries was really hard.
- The contrast between data science and energy sciences was noted, with some believing data science is significantly easier.
Links mentioned:
- Tweet from The Nobel Prize (@NobelPrize): BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Chemistry with one half to David Baker “for computational protein design” and the other half jointly to...
- MEMS - Wikipedia: no description found
Interconnects (Nathan Lambert) ▷ #ml-questions (27 messages🔥):
PRMs/Verifiers in ML
Patents in ML space
Alternatives to PRMs
Transparency in ML research
- Scarcity of PRM Research: Members discussed the lack of research on PRMs, with one humorously noting that there are 'almost none on PRMs, almost a billion as LLM as a judge'.
- Others expressed interest in finding good papers, signaling confusion about their current usefulness.
- Patenting Confusion in ML: The discussion shifted to how patents work within the machine learning space, with insights that companies file them defensively, often leading to invalidation due to vagueness.
- Concerns were raised about the financial burden of pursuing violations that are nearly impossible to prove, likening it to a 'rough deal'.
- Alternatives to PRMs Emerging: There was curiosity about what methods are replacing PRMs, with hints that big labs still utilize them, but their importance is fading.
- Discussion pointed towards the possibility that reinforcement learning on deterministic outputs could suffice without the complexity of PRMs.
- Exploring O1 Functionality: In the context of the O1 release, members questioned what would fill the PRM role, highlighting concerns about the need for some form of scoring during reasoning tree exploration.
- Despite mixed feelings about the necessity of PRMs, insights from reputable sources like John Schulman were mentioned as a reassurance.
- Advocating for Transparency: Nathan Lambert advocated for greater transparency in ML research processes, asserting it is simpler than maintaining secrecy.
- This perspective was echoed by the group, inferring that openly sharing methodologies might lead to more efficient execution.
Interconnects (Nathan Lambert) ▷ #ml-drama (16 messages🔥):
AI risks discussion featuring prominent figures
Nobel Prize controversy in AI research
ICLR 2025 review process changes
Schmidhuber's critique on attribution in AI
Social media reactions to discipline-related insights
- AI Risks Perception in Academia: Discourse on AI risks includes remarks about Geoff Hinton and Yoshua Bengio's motivations for residing in Canada, highlighting the personal history behind their views on AI governance.
- A user remarked, 'Bear that in mind when you hear them tell California what it should do about AI risks.'
- Schmidhuber Slams Nobel Prize Selection: Criticism surfaced over the Nobel Prize in Physics 2024, with claims of plagiarism and misattribution in works by Hinton and collaborators, particularly concerning Amari's contributions.
- Schmidhuber argued that important historical contributions to AI were ignored, declaring the selection was more about amplifying known names than honoring original innovators.
- ICLR 2025 Introduces Review Feedback Agent: The ICLR 2025 conference aims to enhance review quality with a feedback agent designed to guide reviewers toward more consistent, constructive evaluations, amid surging submission numbers.
- This initiative highlights the challenges posed by rapidly increasing submissions and aims to mitigate the quality discrepancies noted in past reviews.
- Mixed Reactions on Schmidhuber's Insights: Users echoed varied sentiments about Schmidhuber's outspoken critique on AI attributions, with some expressing agreement on the significance of his points regarding historical contributions.
- As one user stated, 'Tbfair he often has a point,' reflecting on Schmidhuber's influence in ongoing discussions.
- Concerns Over Review Process Changes: Fears of potential drama emerged at the prospect of alterations to the peer review process at ICLR, emphasizing the community's sensitivity to change.
- The notion that 'any change in the review process will result in drama' highlights the apprehension prevalent among conference participants.
Links mentioned:
- Tweet from Pedro Domingos (@pmddomingos): Geoff Hinton is in Canada because he couldn’t stand Reagan being president. And Yoshua Bengio is in Canada because he refused to serve in the French military. Bear that in mind when you hear them tell...
- Assisting ICLR 2025 reviewers with feedback – ICLR Blog: no description found
- Tweet from Jürgen Schmidhuber (@SchmidhuberAI): The #NobelPrizeinPhysics2024 for Hopfield & Hinton rewards plagiarism and incorrect attribution in computer science. It's mostly about Amari's "Hopfield network" and the "Boltzmann...
Interconnects (Nathan Lambert) ▷ #memes (16 messages🔥):
ButtBench Alignment Project
SuperAlignment lead at AI2
Industry vs Community
Lucas Beyer's PR Tactics
Allennlp account management
- Exciting Logo for ButtBench Alignment Project: An exciting update was shared regarding the ButtBench Alignment Project, announcing the creation of a new logo.
- Luca Soldaini noted that while the project achieved SOTA, it's still a long way from human performance.
- Natolambert Takes on SuperAlignment Title: Natolambert announced a title change to SuperAlignment lead at AI2, signaling a new position of leadership.
- This change highlights a growing influence in the AI community, emphasizing a departure from traditional industry norms.
- Discussion on Industry Norms and Shitposting: There was a humorous discussion about the ability of People In Industry™️ to engage in shitposting freely.
- Natolambert pointed out that despite some industry practices, they consider themselves outside of those typical boundaries.
- Lucas Beyer's Close Call with PR Posting: Lucas Beyer was referenced as a significant PR account for GDM, noted for posting very close to the sun.
- Despite this, members acknowledged that he understands limits, assuring his position remains secure.
- Managing the Allennlp Account: Natolambert humorously mentioned now running the Allennlp account, showing an active engagement with the community.
- He suggested the ease of engaging on Twitter compared to the challenges of traditional interviews.
Links mentioned:
- Tweet from Cody Blakeney (@code_star): Really enjoying seeing @soldni on the big screen
- Tweet from Luca Soldaini 🎀 (@soldni): exciting update: we now have a logo for the ButtBench Alignment Project Quoting Luca Soldaini 🎀 (@soldni) ButtBench update: o1-preview though really hard and got SOTA; but we are still far from hu...
Interconnects (Nathan Lambert) ▷ #reads (3 messages):
Data Wall in AI
Brute-force approach for AI development
Human reasoning vs AI data requirements
Efficiency in AI models
- Data Wall Threatens AI Progress: Current language models are approaching the limits of available text data, leading to concerns about a potential 'data wall' that could hinder progress.
- Many argue that this won't be an issue, as humans have superior language skills despite exposure to less language data.
- Brute-force Data Strategies Suggested for AI Development: To accelerate AI development, some suggest sending the concept of brute-force data usage back to 2005, instead of focusing solely on attention or transformers.
- This concept highlights the belief that increasing data volume can be key to overcoming challenges in AI training.
- Philosophical Views on AI Efficiency: A discussion emerged on the philosophical implications of current AI methodologies, with an emphasis on the need for AI models to become more efficient.
- One participant acknowledged that while the discussion leans philosophical, the efficiency of AI strategies remains crucial.
Link mentioned: The real data wall is billions of years of evolution: Careful with those human analogies
Interconnects (Nathan Lambert) ▷ #posts (15 messages🔥):
RoboNato vs RealNato
OLMo fine-tuning for content
NotebookLM concept
- Debate on RoboNato vs RealNato: A member expressed missing RoboNato but felt that RealNato's voiceover is likely superior, prompting others to consider the implications of both being available.
- “It will be really confusing to have both”, suggested one participant, while another proposed saving RoboNato for special projects.
- Special Episodes with RoboNato: There was a suggestion to have special episodes where participants converse with RoboNato while using an OLMo finetune to enhance the writing.
- One member humorously wished for time to pursue an “unhinged AI YouTuber arc” with RoboNato.
- Premium Subscription Benefits for RoboNato: The idea of providing RoboNato as a benefit for premium subscribers was brought up, adding an incentive for users to engage.
- This led to a playful discussion about a concept called NotebookLM, where two RoboNatos would review messages from the community.
- Humorous Reactions to AI Conversations: A member humorously commented on the oddity of being flirted with by one's own AI voice, highlighting potential complications of such interactions.
- The setup for these interactions was noted as being relatively straightforward, though it would likely be “???”.
Perplexity AI ▷ #general (100 messages🔥🔥):
Perplexity AI Profitability
Complexity Extension for Perplexity
Changes in Perplexity AI Responses
Future of Collections and Spaces
Access to Pro Features
- Questions About Perplexity AI's Profit Model: Discussions surfaced about how Perplexity AI is making a profit, especially with student discounts in play, leading to concerns over their business model.
- sneakyf1shy humorously suggested that it's all about venture capital, highlighting the uncertainty around their long-term goals.
- Complexity Extension Enhancements Enthusiasm: The Complexity extension was described as supercharging the Perplexity experience with features like customizable themes and markdown export options.
- The community noted it's 'like Perplexity on steroids,' enhancing user interactivity while feline and asura0_00 emphasized its usefulness.
- Perplexity AI's Condensed Responses: Members discussed noticing that Perplexity AI's answers have become more condensed, expressing concern over shorter, less informative responses.
- Some speculated this might be linked to changes in token limits, affecting the depth of answers provided.
- Hopes for Improved Collections and Spaces: There were updates regarding the move from 'collections' to 'spaces', aimed at improving user experience and productivity on the platform.
- Users expressed hope for enhancements like increased prompt limits and better integration into the searching process.
- Pro Features and API Capabilities: Users inquired whether Pro accounts would receive access to dedicated features such as the o1 model and limits on the o1-mini.
- Responses were unclear, leading to discussions about potential future features and how they'll affect the user experience.
Links mentioned:
- Tweet from Denis Yarats (@denisyarats): apparently I've been working on a personal tutor to Jensen Huang 🤣
- Perplexity redefines collections with Spaces and model setting: Perplexity is launching Spaces to enhance UX by separating collections, allowing default model settings for productivity. Future updates include knowledge-based support and file search.
- Dance Dancing GIF - Dance Dancing Indian Dance - Discover & Share GIFs: Click to view the GIF
- Tweet from TestingCatalog News 🗞 (@testingcatalog): WIP 🚧: Perplexity is set to extract Collections from its Library into a separate category called Spaces! This move will simplify the UX and boost "Spaces" usage. Current collections are ver...
- Ted Lasso Awkward GIF - Ted Lasso Awkward Side Eye - Discover & Share GIFs: Click to view the GIF
- Complexity - Perplexity.ai supercharged - Chrome Web Store: ⚡ Supercharge your Perplexity.ai
- Sama Sam Altman GIF - Sama Sam altman Openai - Discover & Share GIFs: Click to view the GIF
- Foodpanda Sauce GIF - Foodpanda Food Panda - Discover & Share GIFs: Click to view the GIF
Perplexity AI ▷ #sharing (12 messages🔥):
Meta's Movie Generator
Nobel Prize in Physics
AI Automation in Ports
AI Evaluation by Braintrust
2024 Summer Olympics
-
Meta Unveils Movie Generation Tool: Meta has launched a new movie generation tool that allows users to create short films using AI.
- This development aims to enhance creative storytelling through AI technology.
- Nobel Prize Awarded for AI Contributions: Hopfield and Hinton have been awarded the Nobel Prize in Physics for their significant work in the field of AI.
- Their research has paved the way for advancements in neural networks and machine learning.
- Guangzhou Port Goes Fully Automated: The Guangzhou Port is now fully automated, showcasing the impact of AI in logistics and automation.
- Experts stress that adopting AI in such operations is crucial for future efficiency.
- Braintrust AI Dominates Evaluation Methods: Currently, Braintrust AI is leading in AI evaluation techniques, as cited in recent discussions.
- Their approach is being recognized as innovative and effective in the industry.
- Anticipating the 2024 Summer Olympics: The upcoming 2024 Summer Olympics is generating buzz, especially regarding advancements in AI technologies that might be utilized.
- Preparations are ongoing, focusing on how AI can enhance the experience for athletes and fans alike.
Link mentioned: YouTube: no description found
Perplexity AI ▷ #pplx-api (4 messages):
Citation API Whitelisting
API Credit Purchase Issues
Invoice Company Details Update
Declined Card for API Access
-
Frustration over Citation API Whitelist Requests: A member expressed concerns about getting whitelisted for the citation API, noting that multiple requests via email, form, and helpdesk have gone unanswered.
- No updates have been provided so far, leading to growing frustration.
- Persistent Payment Failures for API Credits: A user reported issues when trying to purchase API credits, stating that attempts failed without any error messages, showing only a $XX pending status that disappears shortly.
- They noted that the only available payment method is via credit card, raising questions about other options.
- Need for Company Details on Invoice: A member indicated that their invoice defaults to their Google email and requires updating to reflect their company's name and address, creating complications.
- They are seeking guidance on how to proceed with this change.
- Decline Issues with API Access Card: One member shared frustration with their card being declined when attempting to use the API, even for a $0.00 charge.
- They are looking for potential reasons why the card is failing, which remains unclear.
Stability.ai (Stable Diffusion) ▷ #general-chat (108 messages🔥🔥):
ControlNet Models
Flux Inpainting
Kaggle Notebooks for Automatic1111
Distilled CFG Explained
Deforum Usage Alternatives
-
ControlNet Models Explained: A member asked about ControlNet models, prompting another to share a GitHub link with information and examples to explore, suggesting scrolling past the mathematical content.
- Scroll a bit down, ignore the math and look at the examples.
- Flux inpainting performance: Discussion arose regarding Flux and Schnell inpainting models, where one member noted it should take about 1-2 minutes on a decent GPU instead of 25 minutes as experienced by another.
- Key differences in iterations between Flux dev and Schnell arise from their performance and purpose.
- Need for Kaggle Notebook for Automatic1111: A member requested a Kaggle notebook for using Automatic1111, highlighting the demand for resources oriented towards image generation techniques.
- Others chimed in, noting challenges in finding specific notebooks to facilitate the process.
- Understanding Distilled CFG: Confusion emerged around distilled CFG and its implications, with discussions highlighting that it differs from the standard CFG and operates as a form of guidance established by model training.
- The community clarified how Flux dev simplifies CFG use but lacks support for negative prompts; a minimal sketch of the distinction follows this list.
- Using Deforum after Google Colab Restrictions: A member inquired about using Deforum for free after Colab restrictions were noted, prompting suggestions related to renting GPUs for this purpose.
- Resources like RunPod were recommended as alternatives for accessing necessary computing power.
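As a rough illustration of the CFG distinction discussed above (a sketch, not any particular codebase's implementation; the function and argument names here are made up):

```python
# Standard CFG: two model calls per step, extrapolating away from the
# unconditional (or negative-prompt) prediction.
def standard_cfg(model, x, t, cond, uncond, scale):
    eps_cond = model(x, t, cond)
    eps_uncond = model(x, t, uncond)
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Guidance-distilled model (the "distilled CFG" case): one call per step,
# with the guidance strength passed as a conditioning input learned during
# distillation, which is why there is no negative-prompt branch to steer.
def distilled_cfg(model, x, t, cond, scale):
    return model(x, t, cond, guidance=scale)
```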
Links mentioned:
- RunPod - The Cloud Built for AI: Develop, train, and scale AI models in one cloud. Spin up on-demand GPUs with GPU Cloud, scale ML inference with Serverless.
- Invoke AI 5.0 Tutorial - From Beginner to Pro in Minutes! Part 2 Let's Get Creative!: Invoke AI 5.0 Tutorial - From Beginner to Pro in Minutes! Part 2 Creating Images. In this video will go over some basic image creation from generating, image...
- GitHub - lllyasviel/ControlNet: Let us control diffusion models!: Let us control diffusion models! Contribute to lllyasviel/ControlNet development by creating an account on GitHub.
- Flux ControlNet - Easy Install Guide: Flux ControlNet - the EASY way to set it up. Including 3 workflowsGet my 3 WORKFLOWS here: https://www.patreon.com/posts/flux-controlnet-110607421My Flux Ins...
Eleuther ▷ #general (86 messages🔥🔥):
Nobel Prizes in AI and Chemistry
PhD Course Competition and Metrics
Web3 and Web5 Discussions
Publications and Research Collaboration
Current Topics in Chess and AI
-
Controversy Surrounding Nobel Prizes in AI and Chemistry: The recent Nobel Prize awards sparked discussions on the relevance of AI figures like Hinton and Hopfield, with opinions split on the perceived impact on physics and chemistry fields.
- One member emphasized that if a prize rewards leaders in a field, it could dilute the prestige of the award itself, while another countered that enthusiasm and innovation should be the key selection criteria.
- Competition for PhD Programs and Research Metrics: A member expressed frustration over the emphasis on publication metrics, stating that it creates a competitive and daunting atmosphere for aspiring PhD candidates.
- Opinions varied, with some suggesting that networking could be more effective than merely chasing publication numbers to secure future collaborations and mentorship.
- The Evolution of Web3 Towards Web5: Members discussed the transitions from Web3 to Web5, highlighting how the naming strategy seems akin to the Fibonacci sequence rather than a logical progression.
- The conversation took a light-hearted turn with jokes about future developments, including speculation on Web8 arising from the mix of previous iterations.
- Research Collaboration and H-Index Metrics: There was debate on the value of collaboration versus the quality of research outputs in establishing a competitive H-index, with some cautioning against simply focusing on quantity.
- Members acknowledged that while having impactful research can propel a career, the pressure to publish frequently to boost metrics remains a systemic issue.
- Chess and Notable Figures: The FIDE Chess Olympiad was mentioned, sparking discussions around prominent figures like Demis Hassabis and their connections to various communities, including chess.
- Members expressed surprise at the cross-pollination of interests between chess and AI, illustrating how figures in AI often hold significant status in different domains.
Links mentioned:
- GitHub - xjdr-alt/entropix: Entropy Based Sampling and Parallel CoT Decoding: Entropy Based Sampling and Parallel CoT Decoding . Contribute to xjdr-alt/entropix development by creating an account on GitHub.
- Reddit - Dive into anything: no description found
Eleuther ▷ #research (7 messages):
Weight Normalization in Models
Gradient Initialization Techniques
Power-Laws in Gradient Descent
-
Clarification on Weight Normalization Timing: Discussion arose on whether to normalize weights and embeddings during the forward pass or directly modify the weights themselves.
- Additionally, members debated whether normalization should occur before any data passing or after the first pass, seeking clarity on best practices.
- Empirical Approaches to Weight Initialization: Questions about improving weight initialization without rederiving MuP highlighted the potential effectiveness of upsampling or downsampling from a pretrained architecture.
- One member suggested the straightforward method of initializing weights directly from a pretrained model, emphasizing practical implications.
- Insights on Power-Laws in Optimization: A member shared a Twitter thread explaining the occurrence of power-laws in gradient descent, linking it to real-world behavior versus theoretical models.
- The thread cites Francis Bach’s work on scaling laws for optimization, offering empirical insights into gradient descent acceleration.
Links mentioned:
- Tweet from Yaroslav Bulatov (@yaroslavvb): Was happy to see Francis Bach looking at the real behavior of gradient descent. This is in contrast to "hypothetical" behavior which is what optimization literature traditionally studies. Quo...
- Showing $\sum_{i=1}^k i^{-2}(1-i^{-2})^t\approx \frac{\sqrt{\pi}}{2\sqrt{t}}$ for large $k$: For large $k$, I observe the following: $$f(t)=\sum_{i=1}^k i^{-2}(1-i^{-2})^t \approx \frac{\sqrt{\pi}}{2\sqrt{t}}$$ What's the easiest way to show this? Notebook
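For the curious, the approximation in the linked question is easy to sanity-check numerically (a quick sketch; k just needs to be large relative to sqrt(t)):

```python
import math

# f(t) = sum_{i=1}^k i^-2 * (1 - i^-2)^t, claimed ~ sqrt(pi) / (2 sqrt(t))
def f(t, k=100_000):
    return sum(i**-2 * (1 - i**-2)**t for i in range(1, k + 1))

for t in (10, 100, 1000):
    print(t, round(f(t), 4), round(math.sqrt(math.pi) / (2 * math.sqrt(t)), 4))
```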
Eleuther ▷ #scaling-laws (6 messages):
Scaling Laws Overview
Kaplan's Scaling Laws
Data and Model Size Relationship
-
Scaling Laws Overview Sparks Debate: A member shared an overview that states cross-entropy loss decreases with quadratic compute increase, proposing square root scaling based on this article.
- Another member challenged this by noting that Kaplan's laws suggest a constant of 0.28, leaning towards fourth-root scaling instead.
- Doubts About Kaplan's Relevance: Discussion ensued about Kaplan's relevance, with a member stating it is out of date, yet it and Chinchilla seem to agree on certain scaling aspects.
- It was mentioned that L(N, D) varies approximately as N^-0.5 and D^-0.5, where C = 6ND.
- Model Size Considerations: A member questioned how D^0.5 applies when the model size is already large and adjustments are made by increasing data or steps, essentially doing less than 1 epoch training.
- They expressed a need for it to align with 0.25 scaling to match their mathematical calculations.
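One way to reconcile the square-root and fourth-root readings above, as a back-of-envelope sketch assuming the exponents quoted in the thread and an even compute split (an assumption the discussion itself did not pin down):

```latex
% Assumptions: L(N,D) \propto N^{-0.5} + D^{-0.5}, compute C = 6ND, and the
% compute-optimal allocation N \propto C^{1/2}, D \propto C^{1/2}.
L(C) \;\propto\; \left(C^{1/2}\right)^{-0.5} + \left(C^{1/2}\right)^{-0.5} \;\propto\; C^{-1/4}
```

Under these assumptions, square-root scaling in each of $N$ and $D$ and fourth-root scaling in compute are the same statement, so the two positions need not conflict.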
Eleuther ▷ #lm-thunderdome (3 messages):
0-shot COT model releases
Evaluation implementation details
JAX libraries and implementations
-
0-shot COT Models in Focus: Discussion highlighted the consistent use of a 0-shot COT variant for model releases, indicating a potential trend in evaluation methodologies.
- However, there were no specifics shared regarding the evaluation implementation.
- Exploring JAX Libraries: A suggestion was made to remove assumptions of torch usage in code, looking to alternatives in JAX.
- Members pondered which canonical libraries or implementations might be beneficial to adopt.
Eleuther ▷ #multimodal-general (1 messages):
tensor_kelechi: What are the best lightweight VLMs?
GPU MODE ▷ #general (7 messages):
HBM vs SRAM scaling
3D Stacking Solutions
Memory Architecture in AI
Manufacturing Difficulties
Rotary Embeddings CUDA Kernel
-
HBM's Performance Compared to Expectations: Concerns were raised that HBM has not delivered as hoped, still representing a HUGE share of the cost of products like the H100 while not significantly reducing power consumption compared to LPDDR5.
- The key bottleneck in supplying more H100s was identified as the required packaging.
- SRAM Scaling Issues Surprises Industry: Unexpectedly, SRAM scaling slowed relative to logic, leading to significant design challenges for Graphcore, which were difficult to predict at the time of their design choices around 2015.
- As one member stated, 'there is no conference you could have gone to' to foresee this development.
- 3D Stacking as a Mitigation Strategy: Going forward, the proposed solution involves 3D stacking like that seen in MI300X, where processors are stacked on base dies manufactured on older processes for efficient resource allocation.
- This approach allows moving SRAM and I/O off the leading-edge process die, facilitating better logic scaling on advanced nodes like 3nm and 2nm.
- Difficulties in Understanding Memory Technologies: A member shared their learning process about the differences between DRAM and HBM, using resources like Claude and a video titled 'The Special Memory Powering the AI Revolution' from Asianometry.
- They highlighted the importance of understanding the manufacturing process and difficulties, especially concerning die bonding.
- Inquiry on CUDA Kernel for Rotary Embeddings: A request was made for a CUDA kernel dedicated to calculating inverse frequency for rotary embeddings, reflecting a need for more specific technical resources.
- This reflects ongoing interest in optimized implementations for specialized AI applications.
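For context on that last request, the quantity being asked about is small and well defined; below is a reference PyTorch version (the standard RoPE formula with base theta = 10000; a CUDA kernel would compute the same values):

```python
import torch

def rope_inv_freq(dim: int, theta: float = 10000.0) -> torch.Tensor:
    # One inverse frequency per pair of channels: theta^(-2j/dim) for j = 0..dim/2-1
    return 1.0 / (theta ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))

print(rope_inv_freq(8))  # tensor([1.0000, 0.1000, 0.0100, 0.0010])
```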
GPU MODE ▷ #triton (4 messages):
Triton source files
GitHub repository structure
-
Finding Triton MatMul Source File: A user sought the source file for `triton.ops.blocksparse.matmul.matmul`, asking for a GitHub link after having trouble locating it.
- Another member pointed out that the file can be found in the python/triton/ directory of the Triton repository.
- Changes in Triton Repository: User questioned the absence of the MatMul file in the main branch, wondering if it had been migrated or transformed.
- The responding member expressed uncertainty about the migration, admitting they've never contributed to Triton, but recognized the need to do so.
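For readers hunting for that file, a dense reference of what the block-sparse op computes may help orient. This is an illustrative sketch of SDD-style semantics (keep only the output blocks a layout marks as nonzero), not Triton's API:

```python
import torch

def blocksparse_matmul_ref(a, b, layout, block):
    """a: (M, K), b: (K, N), layout: (M//block, N//block) bool mask of kept blocks."""
    out = a @ b  # dense product as reference
    mask = layout.repeat_interleave(block, 0).repeat_interleave(block, 1)
    return out * mask  # zero the blocks the layout marks as absent

a, b = torch.randn(64, 32), torch.randn(32, 64)
layout = torch.rand(4, 4) > 0.5  # 16x16 blocks kept at random
c = blocksparse_matmul_ref(a, b, layout, block=16)
```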
Link mentioned: triton/python/triton/ops/blocksparse/matmul.py at 5b29da719daeb3566bfc95b7d02f3561e505bcaf · triton-lang/triton: Development repository for the Triton language and compiler - triton-lang/triton
GPU MODE ▷ #torch (1 messages):
PyTorch API changes
torch._dynamo migration
GitHub issue suggestions
-
Upgrade Woes with PyTorch API: A member encountered difficulties upgrading to the latest PyTorch release, noting that `torch._dynamo.allowed_functions` has been superseded by a new API.
- They are tracing the Git history to find the correct migration path and seek advice on undocumented API replacements.
- Seeking Help or GitHub Issue Guidance: The member is uncertain whether discussing their migration issues here is appropriate or if they should open a GitHub issue.
- They opened the floor for suggestions on strategies or resources to resolve the API replacement challenges they are facing.
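Since `torch._dynamo` is a private namespace, there is no documented one-to-one replacement; the direct successor appears to be `torch._dynamo.trace_rules` in recent releases, though that is an assumption worth checking against the Git history. For the common use case of whitelisting a function for the compiler, the public `allow_in_graph` API covers it, as sketched below (the function being whitelisted is hypothetical):

```python
import torch
import torch._dynamo

def my_custom_op(x):  # hypothetical function to whitelist
    return x * 2 + 1

# Public API covering the common allowed_functions use case:
torch._dynamo.allow_in_graph(my_custom_op)

@torch.compile
def f(x):
    return my_custom_op(x).sum()
```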
Link mentioned: mace/mace/tools/compile.py at 118a514efde34d963666118ce45360e94d648ef5 · ACEsuit/mace: MACE - Fast and accurate machine learning interatomic potentials with higher order equivariant message passing. - ACEsuit/mace
GPU MODE ▷ #beginner (1 messages):
vayuda: do macs with m series chips use arm sve instructions
GPU MODE ▷ #pmpp-book (3 messages):
5th Edition Release
Special Offers for Existing Users
-
5th Edition Launch Excites Fans: One member expressed excitement about the 5th edition release, mentioning they still own their 1st edition copy purchased upon its release.
- This reflects ongoing interest in the series as members reminisce about their initial purchases.
- Inquiry About Upgrade Discounts: A member inquired about any special offers available for those who already own an older edition.
- Another member responded with uncertainty, stating 'Idk', highlighting a lack of information on potential upgrade discounts.
GPU MODE ▷ #torchao (39 messages🔥):
torchao Integration with ComfyUI
Float8 Quantization Performance
Row-wise vs Column-wise Scaling in FSDP2
Quantization Issues on Windows
torch.inference_mode Limitations
-
torchao struggles with ComfyUI integration: A user hit an operator error when enabling `torchao` for ComfyUI, specifically when calling a `quantize_` function inside `torch.inference_mode()` (see the sketch after this list).
- Despite attempts with PyTorch nightlies and model adjustments, the problem persists, with no clarity on whether it's Windows-specific.
- Float8 quantization yields unexpected results: One member shared that using `float8_dynamic_activation_float8_weight` improved throughput by ~10% on a GPT model, but introduced latency in the `unwrap_tensor_subclasses` function.
- Discussion suggested that eliminating this function could be possible with the right PyTorch version, but exact reproduction remains difficult due to work-project constraints.
- Row-wise vs Column-wise scaling confusion in FSDP2: Discussion highlighted that row-wise scaling may not work with backward in FSDP2 due to weight transposition during backpropagation, complicating proper scaling.
- Essentially, while row-wise scaling allows for independent GPU computation, column-wise scaling faces challenges needing all-reduce operations across GPUs.
- Windows quantization issues with torchao: The integration of torchao on Windows led to operator errors, leading to speculation whether these issues are inherent to Windows or the ComfyUI framework.
- Past implementations with Hugging Face's optimum-quanto produced inadequate results, highlighting potential framework concerns.
- Limitations of torch.inference_mode(): It was pointed out that once inside `torch.inference_mode()`, users find it difficult to exit, leading to performance constraints.
- Some participants conveyed that the mode offers minimal utility when compiled, reinforcing the idea of forwarding such issues to specific developers for further insights.
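A minimal sketch of the flow under discussion, assuming torchao's `quantize_` API as documented around v0.5 (the float8 path also assumes recent hardware, e.g. sm89+, and a recent PyTorch):

```python
import torch
from torchao.quantization import quantize_, float8_dynamic_activation_float8_weight

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).cuda().bfloat16()
quantize_(model, float8_dynamic_activation_float8_weight())  # mutates in place

# Per the thread, run inference under no_grad rather than inference_mode,
# since inference_mode is hard to back out of once tensors are created in it.
with torch.no_grad():
    out = model(torch.randn(8, 1024, device="cuda", dtype=torch.bfloat16))
```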
Link mentioned: GitHub - huggingface/optimum-quanto: A pytorch quantization backend for optimum: A pytorch quantization backend for optimum. Contribute to huggingface/optimum-quanto development by creating an account on GitHub.
GPU MODE ▷ #off-topic (1 messages):
vayuda: apparently hinton is the first "pure cs" nobel prize winner
GPU MODE ▷ #llmdotc (15 messages🔥):
GPT2 Training Issues
Understanding Dependencies in Coding
floatX Definition and Usage
Using IDE Features Effectively
-
GPT2 Training Encounters TypeError: A member reported an issue while running GPT2 training: a TypeError from the `normal_()` function in PyTorch 2.0.0, caused by an unexpected keyword argument 'generator'.
- Another suggested building an understanding of the full training pipeline, including initialization and the forward/backward passes.
- floatX Definition Explained: An explanation was provided that floatX is defined as `nv_bfloat16` or `float` depending on whether the build targets bf16 or fp32. A member sought help on where to find this definition and how to include it.
- Dependency Management Concerns: A member expressed difficulty managing dependencies while coding and a shaky grasp of references. Others suggested that working with just CUDA should suffice, and that cuDNN is optional.
- Importance of IDE Features: A discussion emphasized the value of IDE functionality, such as jumping to function/type definitions, for efficient coding. Learning these skills was highlighted as beneficial for any programmer.
GPU MODE ▷ #rocm (2 messages):
Raspberry Pi 5
External GPU setup
amdgpu Linux kernel patch
4K gaming performance
-
Gearing Up Raspberry Pi 5 for 4K Gaming: After witnessing Pineboards' 4K Pi 5 external GPU gaming demo at Maker Faire Hanover, a member decided to set up a GPU test rig to explore the Pi OS `amdgpu` Linux kernel patch.
- They documented the state of the patch and shared insights on how to apply it, aiming for full external GPU support on the Raspberry Pi.
- Live Testing External GPU on Raspberry Pi 5: The member tested the setup in a livestream over the weekend, showcasing the AMD RX 460 external GPU paired with the Raspberry Pi 5.
- The testing demonstrated the GLmark2 performance, revealing significant opportunities for future GPU enhancements.
Link mentioned: Use an External GPU on Raspberry Pi 5 for 4K Gaming | Jeff Geerling: no description found
GPU MODE ▷ #bitnet (1 messages):
tiendung: how good is it compare to original method? (need CPU)
GPU MODE ▷ #webgpu (2 messages):
Testing WebGPU
Browser Automation vs Native Development
Resource Management in Playwright
-
Seeking Libraries for WebGPU Testing: A community member is looking for recommendations on libraries for testing WebGPU, currently using Vitest and Playwright but facing flakiness in test runs.
- They suspect the issue might stem from Playwright not clearing resources properly between test runs.
- Native Development Proves More Productive: One member shared that developing WebGPU code natively feels as productive as traditional native development, with shorter cycle times compared to browser automation.
- However, they noted that if the primary target is a browser, there are benefits to doing direct tests due to stricter resource constraints.
- Propose a Balanced Testing Approach: A balanced method of having native module and unit testing while employing browser testing for end-to-end scenarios was suggested.
- The discussion emphasized that this approach would depend on the specific language and tools being used for development.
GPU MODE ▷ #liger-kernel (2 messages):
FusedLinearJSD Implementation
Performance Metrics in High BT
-
Launch of FusedLinearJSD: The recent pull request introduced the FusedLinearJSD, enabling efficient handling of the final linear layer by avoiding large logits tensor materialization.
- This is similar to the existing fused linear CE approach and optimizes both the forward and backward pass; a miniature illustration of the chunking idea follows this list.
- Challenges with Benchmarking Speed: Memory peak is significantly lower, but the speed benefit mainly appears at high B×T (batch size × sequence length), which was hard to benchmark due to out-of-memory issues.
- The naive torch version encountered OOM errors, making it impossible to conduct proper performance testing in this context.
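The memory trick being described, in miniature: an illustrative chunked loss over the final projection, not the Liger kernel itself (which also fuses the backward pass and uses JSD rather than cross-entropy):

```python
import torch
import torch.nn.functional as F

def chunked_final_layer_loss(hidden, weight, targets, chunk=4096):
    """hidden: (B*T, d), weight: (vocab, d), targets: (B*T,)."""
    total, n = hidden.new_zeros(()), hidden.shape[0]
    for s in range(0, n, chunk):
        logits = hidden[s:s + chunk] @ weight.T  # only one chunk of logits alive
        total = total + F.cross_entropy(logits, targets[s:s + chunk], reduction="sum")
    return total / n
```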
Link mentioned: Add FusedLinearJSD by Tcc0403 · Pull Request #300 · linkedin/Liger-Kernel: Summary similar to the fuse linear CE. It handle the forward and backward pass of the final linear layer via JSD by avoiding the materialization of the large logits tensor. Since JSD is the last la...
GPU MODE ▷ #metal (2 messages):
GPU integer operations
bfloat16 support on M2
-
Questioning GPU Integer Shift Speed: A member expressed confusion about the slowness of the conversion to and from float, noting it involves a 16-bit shift.
- They questioned whether GPUs have vectorized integer shifts to potentially speed up the operation; the shift in question is sketched below.
- Native bfloat16 Support on M2: Another member inquired about the origin of a specific claim regarding bfloat16 data types.
- They confirmed that M2 or greater chips have native bfloat16 dtype support.
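The 16-bit shift in question, made concrete: bfloat16 is simply the top 16 bits of a float32. A NumPy sketch, using truncation rather than round-to-nearest-even for brevity:

```python
import numpy as np

def f32_to_bf16_bits(x: np.ndarray) -> np.ndarray:
    return (x.view(np.uint32) >> 16).astype(np.uint16)  # drop low mantissa bits

def bf16_bits_to_f32(bits: np.ndarray) -> np.ndarray:
    return (bits.astype(np.uint32) << 16).view(np.float32)  # widen with zero fill

x = np.array([1.5, -2.25, 3.14159], dtype=np.float32)
print(bf16_bits_to_f32(f32_to_bf16_bits(x)))  # [ 1.5      -2.25      3.140625]
```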
OpenAI ▷ #ai-discussions (75 messages🔥🔥):
ChatGPT vs. Claude Subscriptions
O1 and O1 Mini Models
AI Evolution and Consciousness
Routing Models in AI
Challenges in AI Development
-
Choosing Between ChatGPT and Claude Subscriptions: A member advised against subscribing to ChatGPT solely for features in preview, suggesting usage caps limit its appeal, while noting that access to GPT-4 legacy and 4o might be worthwhile.
- They emphasized that if subscribing, the purpose should be to use fully functional versions rather than limited previews.
- Understanding O1 vs. O1 Mini Models: Members discussed the differences between O1 and 4o models, noting that O1 models serve as 'reasoners', summarizing thoughts and declining to answer when unsure.
- The O1-mini offers 50 uses per day, while 4o provides 80 uses per 3 hours, leading to a discussion on A/B testing between the two models.
- Theoretical Exploration of AI Evolution: A discussion arose regarding the potential evolution of AI consciousness, with insights on the necessity of re-training and fine-tuning models to advance capabilities.
- Members pondered if and when evolved AI models might become commercially viable, with references to a potential business model surrounding these advancements.
- The Concept of Routing Models in AI: The concept of routing models was explored, discussing how such a model could direct queries to either O1 or 4o based on task requirements.
- This would optimize user experiences, preventing over-reliance on a single model for diverse tasks.
- Challenges and Perspectives in AI Development: Members shared thoughts on the challenges faced in AI development, particularly around achieving AGI, suggesting that current models remain narrow, despite advancements.
- The conversation touched on the marketability of AI and its direction in parallel with ongoing research efforts, comparing insights to a cultural obsession with AGI.
Link mentioned: Tweet from Mark Johns / Doomlaser (@Doomlaser): My poem about AI, in the form of a nonet. I am not an AI hater, AI is hated and feared by many, Cherished by others who know it well, Which side will win the battle, Of what it is to be, To live diff...
OpenAI ▷ #prompt-engineering (2 messages):
ChatGPT rewriting responses
Dall-E prompts
Canvas feature
-
User frustrations with ChatGPT's rewriting: A user expressed dissatisfaction, stating that ChatGPT often rewrites their responses, leading them to quit using the tool for months.
- They mentioned experiencing headaches from trying to fix what they described as a 'stupid flaw' and seek advice on preventing this behavior.
- Possible causes for rewriting behavior: Another member speculated that ChatGPT's rewriting could occur in Canvas or with Dall-E prompts, suggesting a focus on these features.
- For Dall-E, they recommended using the phrase 'Make an image using these exact words: [your words]' to prevent rewriting.
- Request for clearer examples: A response indicated a need for clarification, asking the user to share a specific conversation to better understand the rewriting issue.
- This suggestion aimed at providing more targeted assistance based on the user's exact experience with ChatGPT.
OpenAI ▷ #api-discussions (2 messages):
ChatGPT Rewriting Response Issue
DALL-E Prompts
Canvas Feature
-
User quits ChatGPT over rewriting responses: A member expressed frustration with ChatGPT's tendency to rewrite responses, stating that it drove them to quit for months.
- Headaches from fixing this flaw made the experience even worse, as they reported the bot continues rewriting even when asked to stop.
- Possible solutions discussed for ChatGPT: Another member suggested that the rewriting might be linked to specific platforms like Canvas or DALL-E prompts.
- They recommended using specific phrasing for DALL-E, saying 'Make an image using these exact words: [whatever your words are]' could help prevent the issue.
Nous Research AI ▷ #general (67 messages🔥🔥):
Free compute offer
Nobel Prize in Chemistry
Lm Studio updates and MLX
New pre-training dataset by LLM360
Job recruitment practices
-
Kainan offers free compute for competitions: Kainan expressed willingness to provide free compute resources for a competition, asking others if they're interested.
- While some showed interest, there was uncertainty about the number of participants who would take advantage of this opportunity.
- 2024 Nobel Prize awarded for protein research: The Royal Swedish Academy of Sciences awarded the 2024 #NobelPrize in Chemistry, splitting the prize between David Baker and Demis Hassabis & John M. Jumper for their contributions to computational protein design and structure prediction.
- This award highlights the significance of advancements in protein research within the scientific community.
- LM Studio update introduces Apple MLX: A new version of LM Studio (0.3.4) was released, featuring support for Apple MLX, enabling the use of MLX models and structured JSON responses on Apple Silicon Macs.
- Users noted improvements in running larger models on Apple hardware and are excited about MLX's potential for model performance; a minimal usage sketch follows the list below.
- LLM360 releases massive pre-training dataset: LLM360 announced a new pre-training dataset containing 15 trillion tokens with rigorous data filtering processes, emphasizing quality over quantity.
- The dataset is structured to support high-quality LLM training, including several filtering heuristics and a focus on deduplication.
- Job recruitment strategy discussion: In a light-hearted exchange, members discussed effective job recruitment strategies, suggesting a resume submission email and reflections on waiting for companies to reach out.
- There's a humorous yet practical perspective on making oneself indispensable through skill and visibility.
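For those wanting to poke at MLX directly rather than through LM Studio, a minimal sketch with the mlx-lm package (the model name is illustrative, and the API reflects recent mlx-lm releases, so worth verifying):

```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Meta-Llama-3.1-8B-Instruct-4bit")
print(generate(model, tokenizer, prompt="Explain MLX in one sentence.", max_tokens=64))
```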
Links mentioned:
- Tweet from The Nobel Prize (@NobelPrize): BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Chemistry with one half to David Baker “for computational protein design” and the other half jointly to...
- Tweet from Maxime Labonne (@maximelabonne): 🛞 TxT360: new pre-training dataset with 15T tokens Impressive release from LLM360 with a new pre-training dataset of 15T tokens. It includes a lot of new sources compared to previous open-sourced pr...
- GitHub - GAIR-NLP/O1-Journey: O1 Replication Journey: A Strategic Progress Report – Part I: O1 Replication Journey: A Strategic Progress Report – Part I - GAIR-NLP/O1-Journey
- GitHub - lmstudio-ai/mlx-engine: Apple MLX engine for LM Studio: Apple MLX engine for LM Studio. Contribute to lmstudio-ai/mlx-engine development by creating an account on GitHub.
- LM Studio 0.3.4 ships with Apple MLX: Super fast and efficient on-device LLM inferencing using MLX for Apple Silicon Macs.
- Download LM Studio - Mac, Linux, Windows: Discover, download, and run local LLMs
Nous Research AI ▷ #ask-about-llms (3 messages):
Llama Stack
Fast Inference with Llama 3.1-8B
Meta's GitHub Releases
-
Llama Stack Unveils New Tools: A member shared that they recently discovered Llama Stack after Meta released a comprehensive set of tools and examples on GitHub.
- They expressed interest, stating it looks pretty powerful but have yet to experiment with it.
- Seeking Fast Inference Strategies: Someone inquired about the best methods for achieving fast inference with Llama 3.1-8B or smaller models on a 4xA40 node.
- This indicates a growing interest in optimizing inference performance; one common approach is sketched after this list.
- GitHub Links to Llama Stack: Links to two GitHub repositories were shared, one featuring Agentic components and the other covering Model components of the Llama Stack APIs.
- These repositories provide detailed resources for developers wishing to implement and utilize Llama Stack features.
- Llama Cache Location: A member noted that the typical cache location for Llama models is likely in ~/.cache/huggingface/hub/ or specifically in ~/.llama.
- This highlights common practices among the community regarding model storage directories.
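One common answer to the fast-inference question, sketched with vLLM (an assumption, not what the thread settled on; note that a single 48GB A40 fits an 8B model in fp16, so four data-parallel replicas may beat tensor parallelism for raw throughput):

```python
from vllm import LLM, SamplingParams

# Tensor-parallel across the 4 A40s; drop tensor_parallel_size to 1 and run
# one replica per GPU instead if throughput matters more than latency.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=4)
params = SamplingParams(max_tokens=256, temperature=0.7)
out = llm.generate(["Explain KV caching in one paragraph."], params)
print(out[0].outputs[0].text)
```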
Links mentioned:
- GitHub - meta-llama/llama-stack-apps: Agentic components of the Llama Stack APIs: Agentic components of the Llama Stack APIs. Contribute to meta-llama/llama-stack-apps development by creating an account on GitHub.
- GitHub - meta-llama/llama-stack: Model components of the Llama Stack APIs: Model components of the Llama Stack APIs. Contribute to meta-llama/llama-stack development by creating an account on GitHub.
Nous Research AI ▷ #research-papers (4 messages):
Text to Video Models
O1 Replication Journey
Model Merging at Scale
-
Exploration of Free Text to Video Models: A member inquired about the availability of any free text to video model, both animated and non-animated, receiving suggestions for potential models like animate2diff.
- It appears there is ongoing interest in identifying more options for generating video content from text prompts.
- Insights from the O1 Replication Journey Report: This report details a groundbreaking approach to AI research, emphasizing transparency and community engagement in replicating OpenAI's O1 model.
- The methodology highlighted aims to tackle challenges in team-based projects, documenting successes and failures to enhance open science.
- Evaluating Model Merging at Scale: The study investigates model merging, focusing on how expert model size, base model quality, and quantity affect performance, utilizing methods like Averaging and TIES.
- Key findings suggest that merging is more successful with stronger base models, and larger models enhance generalization capabilities when working with multiple expert models.
Links mentioned:
- What Matters for Model Merging at Scale?: Model merging aims to combine multiple expert models into a more capable single model, offering benefits such as reduced storage and serving costs, improved generalization, and support for decentraliz...
- O1-Journey/resource/report.pdf at main · GAIR-NLP/O1-Journey: O1 Replication Journey: A Strategic Progress Report – Part I - GAIR-NLP/O1-Journey
Nous Research AI ▷ #interesting-links (1 messages):
VLM performance timeline
Vision-language models
Parameter count comparison
-
VLM performance timeline sought: A member shared a link to their VLM performance timeline but expressed a desire to see improvements over time, especially alongside parameter count.
- They noted that while similar timelines are common for LLMs, such resources for vision-language models remain scarce.
- Request for better VLM benchmarks: The member asked if anyone has seen a VLM timeline that reflects changes in performance alongside parameter counts or other characteristics.
- They indicated that such comparisons are more frequently found in discussions about LLMs, making their own attempt feel like a novelty.
Nous Research AI ▷ #research-papers (4 messages):
Free text-to-video models
O1 Replication Journey
Model merging at scale
-
Inquiry on Free Text-to-Video Models: A user asked if there are any free text-to-video models available, both animated and non-animated.
- Another member suggested looking into 'animate2diff' for potential options.
- O1 Replication Journey Unveiled: The O1 Replication Journey paper offers a strategic progress report responding to OpenAI’s O1 model, emphasizing transparency and real-time exploration.
- Significantly, they claim that their journey learning paradigm has outperformed traditional supervised learning by over 8% on the MATH dataset with only 327 training samples.
- Insights on Model Merging Effectiveness: A study highlighted the benefits of model merging, systematically evaluating factors impacting performance across different model sizes, ranging from 1B to 64B parameters.
- Key findings suggest merging is more effective with strong base models and larger sizes, leading to improved generalization capabilities, especially when merging up to 8 expert models.
Links mentioned:
- What Matters for Model Merging at Scale?: Model merging aims to combine multiple expert models into a more capable single model, offering benefits such as reduced storage and serving costs, improved generalization, and support for decentraliz...
- O1-Journey/resource/report.pdf at main · GAIR-NLP/O1-Journey: O1 Replication Journey: A Strategic Progress Report – Part I - GAIR-NLP/O1-Journey
OpenRouter (Alex Atallah) ▷ #general (71 messages🔥🔥):
Prompt Caching
Inflection 3.0 and Enterprise
OpenRouter API Rate Limits
NotebookLM Deep Dive Podcast
User Concerns about Gemini Moderation
-
Prompt Caching Explained: Members discussed the mechanics and usefulness of prompt caching, identifying situations where it may be disadvantageous, such as changing contexts or short prompts.
- One noted, 'You cannot disable prompt caching for those providers who do automatic prompt caching,' highlighting the limitations set by certain providers.
- Intrigue Surrounding Inflection 3.0: The launch of Inflection 3.0 has sparked curiosity, especially due to its potential integration with Intel Gaudi 3 for improved performance.
- However, discussions reveal skepticism about the hype, with some members noting they've seen minimal concrete information, particularly regarding benchmarks.
- OpenRouter API Rate Limits: Clarifications were made regarding OpenRouter API request limits, indicating these are dynamic based on account credits.
- One member shared a GET request example to check rate-limit usage and credits associated with an API key (sketched after this list), which can help guide usage.
- NotebookLM Podcast Utilization: Members shared positive feedback about the NotebookLM Deep Dive podcast, with some creating notebooks to listen to the content while on the go.
- One user expressed interest in automation tools like ai-podcast-maker, noting that while the audio may not be as smooth, 'automation ftw.'
- Concerns about Gemini Moderation: A user raised concerns regarding whether Gemini moderates inputs, expressing fear about potential bans due to users' input.
- This highlights a broader discussion on user experience and content moderation in AI applications.
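The key-status request mentioned above looks roughly like the following (endpoint per OpenRouter's Limits docs linked below; the response schema may evolve):

```python
import requests

resp = requests.get(
    "https://openrouter.ai/api/v1/auth/key",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},  # placeholder key
)
print(resp.json())  # usage, credit limit, and rate-limit info for the key
```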
Links mentioned:
- Inflection AI Developer Playground: Let's build a better enterprise AI.
- Patterns of Application Development Using AI: Discover practical patterns and principles for building intelligent, adaptive, and user-centric software systems that harness the power of AI.
- Introducing Inflection for Enterprise: Introducing Inflection for Enterprise, powered by our innovative Inflection 3.0 AI system in collaboration with Intel. This solution delivers exceptional price and performance for GenAI deployments us...
- Limits | OpenRouter: Set limits on model usage
- Provider Routing | OpenRouter: Route requests across multiple providers
- Inflection AI: It’s simple. We train and tune it. You own it. Let's do enterprise AI right.
LlamaIndex ▷ #blog (4 messages):
LlamaIndex Workflows tutorial
LlamaCloud and LlamaParse demo
SFTechWeek meetup
OpenAI Realtime API Client demo
-
Comprehensive Guide on LlamaIndex Workflows: A detailed tutorial by @jamescalam covers what Workflows are, in comparison to LangGraph, along with how to build an AI research agent.
- It also includes debugging and optimization tips for getting up and running easily.
- Using LlamaCloud for Financial Data Analysis: In a recent demo, @ravithejads demonstrated how to utilize LlamaCloud and LlamaParse to fill out financial spreadsheets comparing multiple companies.
- This use case showcases the practical applications of LLMs in understanding data and automating form filling.
- Reminder for SFTechWeek Meetup: A last call for attendees to join the in-person meetup at LlamaIndex HQ for discussions on Multi-Agent workflows in production during #SFTechWeek.
- The event promises food, fun, and insights on handling RAG systems and agent production challenges.
- Interactive Chat with an AI Agent: A demo featuring @LoganMarkewich showcases chatting with an AI agent using voice through the OpenAI realtime API client.
- This open-source application enables users to build their own voice agents, with examples provided for immediate use.
Links mentioned:
- RSVP to Multi-Agentic Workflows in Prod #SFTechWeek | Partiful: Note: This is an in-person meetup @LlamaIndex HQ in SF, brought to you by Activeloop & LlamaIndex. This event is a part of #TechWeek - a week of events hosted by VCs and startups to bring together...
- GitHub - run-llama/openai_realtime_client: Contribute to run-llama/openai_realtime_client development by creating an account on GitHub.
LlamaIndex ▷ #general (45 messages🔥):
Semantic chunking in TypeScript
PropertyGraphIndex extractors
Integration issues with LlamaIndex
Context chat engine and reranking
RAG reducing hallucinations
-
Seeking Semantic Chunking in TypeScript: A user inquired about implementing semantic chunking in TypeScript, similar to an example code in Python.
- They expressed difficulty finding similar functionality and sought help from the community; the Python-side parser in question is sketched after this list.
- PropertyGraphIndex Extractors Confusion: A member asked if the DynamicLLMPathExtractor could be called directly on a Document, as it's functioning during the insertion but providing unexpected results otherwise.
- Other members clarified the need to chunk the document into nodes first, indicating that the extractor is intended to process nodes with injected metadata.
- Issues with LlamaIndex and Phoenix: A user reported integration issues between Phoenix and LlamaIndex, with error messages related to context detachment during async function execution.
- Community members confirmed the error is non-critical and suggested inspecting the underlying code for enhanced functionality.
- Context Chat Engine Reranking Problem: A user encountered an issue where the reranker was skipped when using a context chat engine and sought assistance to resolve it.
- After iterating on their code, they confirmed that reworking the initializer resolved the reranking issue and confirmed successful functionality.
- RAG's Impact on Reducing Hallucinations: A member queried if any research exists on how Retrieval-Augmented Generation (RAG) reduces hallucinations, prompting a community search.
- They found a couple of academic papers discussing RAG's effectiveness in improving model output quality, acknowledging uncertainty about whether RAG directly reduces hallucination.
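For reference, the Python-side semantic chunker the first question alluded to, per LlamaIndex's documented example (the TypeScript equivalent was the open question; the data path is illustrative):

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding

docs = SimpleDirectoryReader("./data").load_data()  # illustrative path
splitter = SemanticSplitterNodeParser(
    buffer_size=1, breakpoint_percentile_threshold=95, embed_model=OpenAIEmbedding()
)
nodes = splitter.get_nodes_from_documents(docs)
```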
Links mentioned:
- Reducing hallucination in structured outputs via Retrieval-Augmented Generation: no description found
- llama_index/llama-index-core/llama_index/core/instrumentation/dispatcher.py at 65946eb92419e94a4cec85af671c78e0ed122593 · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
Latent Space ▷ #ai-general-chat (39 messages🔥):
AI girlfriend data breach
Sequoia's 3rd annual AI essay
Nobel Prize in Chemistry
Palmyra X 004 release
ChatGPT search rollout
-
AI Girlfriend Service Data Breach Exposed: The AI girlfriend service Muah.ai experienced a data breach last month, compromising 1.9 million email addresses and including sensitive prompts of a sexual nature.
- Security experts and analysts are concerned about the implications of such data exposure, especially regarding child exploitation details included in the breach.
- Sequoia Capital's Insight on AI Evolution: Sequoia’s third annual essay discusses the shift in Generative AI research from 'thinking fast' to 'thinking slow,' focusing on reasoning during inference time which is unlocking new applications.
- Key players like OpenAI and Google DeepMind are stabilizing the market, while newer agentic applications are expected to emerge in various sectors.
- 2024 Nobel Prize in Chemistry Awarded: The 2024 Nobel Prize in Chemistry was awarded to David Baker for computational protein design, and to Demis Hassabis and John M. Jumper for their work in protein structure prediction through AlphaFold2.
- This recognition highlights the significant contributions of AI in advancing biochemistry, having enabled the prediction of structures for nearly 200 million proteins.
- Palmyra X 004 Launch Highlights: Writer's new model, Palmyra X 004, ranked in the top 10 on HELM, introducing full-stack tool calling and training on synthetic data.
- The release has garnered attention, including coverage from Venture Beat, noting its capabilities in AI function calling and CRM improvements.
- ChatGPT Introduces Search Functionality: Reports indicate that ChatGPT is rolling out SearchGPT, positioning itself to compete directly with platforms like Perplexity by integrating citation features now in GPT-4o.
- This move signifies a strategic enhancement in ChatGPT’s capabilities, aligning it closer with information retrieval and user query response needs.
Links mentioned:
- Tweet from The Nobel Prize (@NobelPrize): BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Chemistry with one half to David Baker “for computational protein design” and the other half jointly to...
- Tweet from Alex Volkov (Thursd/AI) (@altryne): We had a "Cursor tips & tricks" meeting today with my colleagues at @weights_biases and I figured I'd share what we 'discovered' & shared between us in a 🧵 If you haven't cha...
- Tweet from Thomas Schulz (@thomasschulzz): BREAKING: Looks like OpenAI is entering the arena against Perplexity... citations are now in GPT-4o 👀
- Tweet from Ishaan Kapoor (@Ishaank1999): PDFs are satan’s file format. Almost everyone that builds RAG needs to deal with them - and it sucks. Solutions on the market are either too slow, too expensive or not OSS. It should be easier. ...
- Tweet from Saining Xie (@sainingxie): During my internship at DeepMind, Demis met with all the interns. When asked about the company’s goal, I vividly remember him saying, “winning *multiple* Nobel prizes.” I was shocked at the time, but ...
- Tweet from Sonya Huang 🐥 (@sonyatweetybird): Once a year, @gradypb and I sit down with our trusty AI collaborators 👾 and zoom out to the big picture on what’s happening in Generative AI. Here’s our 3rd annual take… 1: The foundation model lay...
- Tweet from Sam Julien (@samjulien): 🆕 from @Get_Writer: Palmyra X 004 🎉 Our latest frontier model ranks in the top 10 on both HELM Lite and HELM MMLU and introduces full-stack tool calling to the Writer platform!
- Tweet from Seán Ó hÉigeartaigh (@S_OhEigeartaigh): It's not done yet. Hearing reports that the Nobel prize for literature will be going to the authors of "OpenAI's nonprofit governance structure" for outstanding contributions to creati...
- Tweet from Troy Hunt (@troyhunt): This was a very uncomfortable breach to process for reasons that should be obvious from @josephfcox's article. Let me add some more "colour" based on what I found: Quoting Have I Been Pwn...
- Tweet from The Nobel Prize (@NobelPrize): The 2024 #NobelPrize laureates in chemistry Demis Hassabis and John Jumper have successfully utilised artificial intelligence to predict the structure of almost all known proteins. In 2020, Hassabis ...
- Generative AI’s Act o1: The Agentic Reasoning Era Begins.
- Tweet from Clara Shih (@clarashih): Last week @OpenAI launched ChatGPT Canvas, an interface that displays text, code, and visualization outputs. In the enterprise, we rely on more structured, trusted UX elements -- record details, lists...
- Tweet from Alex Volkov (Thursd/AI) (@altryne): We had a "Cursor tips & tricks" meeting today with my colleagues at @weights_biases and I figured I'd share what we 'discovered' & shared between us in a 🧵 If you haven't cha...
- GitHub - lumina-ai-inc/chunkr: Vision model based PDF chunking.: Vision model based PDF chunking. . Contribute to lumina-ai-inc/chunkr development by creating an account on GitHub.
Latent Space ▷ #ai-announcements (1 messages):
Molmo
Pixmo
-
Join Today's Live Session on Molmo and Pixmo!: Today's session features a discussion on Molmo and Pixmo with this link to join the Zoom call.
- This is an opportunity to gain insights directly from experts, so don't miss out!
- Excitement Building Around Molmo and Pixmo: There is growing excitement in the community about the capabilities of Molmo and Pixmo, with many eager to share their thoughts.
- Participants are encouraged to engage and contribute their insights during the session.
Link mentioned: Join our Cloud HD Video Meeting: Zoom is the leader in modern enterprise video communications, with an easy, reliable cloud platform for video and audio conferencing, chat, and webinars across mobile, desktop, and room systems. Zoom ...
Modular (Mojo 🔥) ▷ #general (2 messages):
DOM Data Attributes
WebAssembly Component Model
-
DOM allows data storage via attributes: A key DOM feature allows storing data on elements through attributes beginning with `data-` (for example, `data-myattribute`), making it possible to associate data directly with HTML elements.
- This functionality opens up creative avenues for manipulating and retrieving data within the DOM context.
- WebAssembly Component Model repository announced: The link to the repository for the WebAssembly Component Model has been shared, detailing its design and specifications at WebAssembly/component-model.
- This repository serves as a crucial resource for those interested in the intricacies of the component model in WebAssembly.
Link mentioned: GitHub - WebAssembly/component-model: Repository for design and specification of the Component Model: Repository for design and specification of the Component Model - WebAssembly/component-model
Modular (Mojo 🔥) ▷ #mojo (24 messages🔥):
Mojo and Scikit-learn
Mojo GPU Support
Running ONNX Models in Mojo
Drivers for Mojo GPU Usage
-
Mojmelo: The Mojo Solution for Scikit-learn Pipelines: A member shared Mojmelo, a project for implementing machine learning algorithms in pure Mojo 🔥, as a potential catalyst for running Scikit-learn pipelines.
- Another argument was made for Mojo's promise in replacing all Cython dependencies in Scikit-learn.
- Excitement Around Mojo's Upcoming GPU Support: Members expressed enthusiasm about the upcoming GPU support in Mojo, highlighting its potential for improved performance.
- Some are exploring possibilities for integrating PyTorch with Mojo while keeping an eye on GPU capabilities.
- Drivers Needed for Mojo to Run AI on GPU: It was clarified that using Mojo for AI on GPUs requires an Nvidia driver, with mixed responses about AMD compatibility.
- Discussions highlighted the significant roles of modern GPU drivers beyond simple communication, such as power management and multiple process handling.
- ONNX Models in Pure Mojo on GPU: A Possibility?: A user inquired about the potential to run ONNX models on pure Mojo without additional components on the GPU.
- While the capability remains uncertain, there are queries about future releases enabling this functionality.
Link mentioned: GitHub - yetalit/Mojmelo: Machine Learning algorithms in pure Mojo 🔥: Machine Learning algorithms in pure Mojo 🔥. Contribute to yetalit/Mojmelo development by creating an account on GitHub.
Modular (Mojo 🔥) ▷ #max (5 messages):
Performance of Mojo graphs
Pre-compiling graphs
Reuse of inference sessions
mojo run vs compiled binaries
Graph input types
-
Performance of Mojo graphs remains slow: A user shared performance metrics showing total compile time for two graphs: 0.312s for graph1 and 0.451s for graph2.
- They expressed frustration with the compile time impacting debugging, regardless of whether they used mojo run or a compiled binary.
- Suggestions for better graph performance: A member suggested that reusing the inference session could help amortize the cost of compilation.
- They speculated that the issue might stem from using List as input instead of a fixed-size type, potentially affecting performance.
- Comparison of mojo run and compiled binaries: Another member inquired whether the user was running a compiled binary or using mojo run to understand performance differences.
- The user confirmed the use of mojo run, leading to further discussion on the implications for compile times.
- Challenges with MAX Engine graphs: The user noted the absence of any MAX Engine graphs in the standard library, which may limit their optimization options.
- This highlights a potential area for improvement in the available tools and resources for Mojo developers.
LLM Agents (Berkeley MOOC) ▷ #mooc-announcements (1 messages):
Lab Assignments Released
Course Sign Up
Discord for Collaboration
Lab Completion Criteria
-
Lab Assignments for Course Released: The course's lab assignments have been officially released, with the first lab focusing on using the Autogen framework to analyze restaurant reviews and due by December 12, 11:59pm PST.
- Labs 2 and 3 will concentrate on prompt engineering for LLM security, specifically crafting attack and defense prompts.
- Easy Sign Up for Interested Students: Prospective students are encouraged to sign up for the course by filling out this form.
- For further discussion, students should join the LLM Agents Discord channel.
- Utilizing Discord for Questions: Discord is recommended for communicating with course staff and asking lab-related questions, as they will be actively monitoring the channel.
- Students should consult the ongoing FAQ document before posting questions to avoid redundancy.
- Collaboration Guidelines Introduced: When collaborating with others in the course, students are urged to avoid sharing exact solutions to maintain academic integrity.
- Conceptual discussions are encouraged, but specific implementation details and code files should remain private.
- Lab Completion and Submission Expectations: After lab submissions, students can expect communication regarding their completion status, with defined thresholds for passing various labs: 3/4 for Lab 1, 1/2 of hidden tests for Lab 2, and 1/3 of hidden attack tests for Lab 3.
- These criteria underscore the importance of not only completing labs but succeeding in the evaluations set forth.
Link mentioned: Large Language Model Agents: no description found
LLM Agents (Berkeley MOOC) ▷ #mooc-questions (16 messages🔥):
Lab 1 File Issues
Quiz Submission Concerns
Course Offering Next Semester
Ninja & Legendary Tier Requirements
Agent Definition Discussion
-
Lab 1 downloads empty files: Multiple users reported issues with downloading instructions for Lab 1, stating it resulted in empty files, while Labs 2 and 3 are working correctly.
- It was clarified that the file is located on Google Drive and confirmed it should be accessible despite the absence of a preview.
- Clarification on quiz submission email format: A user inquired whether their quiz submissions would be recorded correctly due to a dot in their email, which they usually omit.
- The response indicated that whatever email format is used in the signup form will track submissions, stressing accuracy during sign-up.
- Inquiry on the course offering next semester: A user posed a question about the potential re-offering of the course next semester, seeking confirmation.
- While there was no certainty, it was mentioned that the professor has previously offered other MOOCs and will likely do so again.
- Ninja and Legendary Tier requirements for labs: Questions arose about whether lab assignments are required for the Ninja and Legendary tiers, with some finding it odd that labs are tied only to the Mastery tier.
- It was noted that the expectation is for Ninja and Legendary tier students to prioritize their efforts on hackathon submissions instead.
- Agent definition debate: A user raised a query about whether a 'piece of code' using discriminative AI or a mix of generative and discriminative AI qualifies as an agent.
- They believed the answer to be yes, indicating some uncertainty around the definitions in the context of AI programming.
LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (10 messages🔥):
Role of Reinforcement Learning in AGI
Session Q&A Clarifications
Live Session Video Confusions
Collaborative Assignment Brainstorming
-
Discussion on Reinforcement Learning's Role in AGI: A member raised a question about whether Reinforcement Learning (TD learning) still holds significance in progressing towards AGI, or if agents can function effectively without it.
- This inquiry opened up a discussion on the necessity and application of RL in modern AI systems.
- Clarifying Q&A in Last Session: Concerns arose regarding the lack of a Q&A session in the previous meet, with some members stating that it did indeed occur but was not visible in the recorded video.
- One member referenced a segment from the YouTube video that reportedly included questions.
- Confusion Over Live Video Content: A member expressed confusion about the recorded live session, stating they could not find the Q&A segment in the video, even though they still had access to it.
- Another member mentioned that there were indeed questions following the clip that may not have been captured in the video.
- Call for Collaborative Learning: A member encouraged others to reach out for collaboration in discussing and brainstorming as they work on assignments.
- This invitation aimed to foster collaborative efforts among peers in tackling their coursework.
Link mentioned: YouTube: no description found
OpenAccess AI Collective (axolotl) ▷ #runpod-help (26 messages🔥):
Training Vicuna-7B model
CUDA out of memory errors
DeepSpeed configuration issues
Runpod instance usage
- Stuck Training Process on Runpod: A user reported their training process for the Vicuna-7B model got stuck with no output and shared their command line for launching it.
- Another member suggested sharing the sample config to diagnose the issue further.
- DeepSpeed Configuration Error: The user encountered an error indicating 'Input should be a valid integer, got a number with a fractional part' related to their DeepSpeed configuration.
- The community suggested ensuring the number of devices is a multiple of 2 and installing a specific version of DeepSpeed, which ultimately resolved the issue (the batch-size arithmetic behind the error is sketched after this list).
- CUDA Memory Issues Despite Sufficient Resources: The user stated they faced CUDA out of memory errors even though they have 5 GPUs, each with 24GB of VRAM.
- They provided their DeepSpeed and accelerate configurations seeking insights into the unexpected memory shortage.
- Target Configurations and Resources: The user referred to their DeepSpeed configuration and noted it was derived from examples available on GitHub for reference.
- They emphasized that they were running experiments on a Runpod instance and highlighted its specifications.
- Community Collaboration for Troubleshooting: Community members actively collaborated to troubleshoot issues arising from model training and configuration settings.
- They exchanged insights and links to configurations while assisting in resolving the user's queries about training and resource management.
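For context on the validation error above, a minimal sketch of the batch-size arithmetic that typically produces a fractional value (illustrative numbers, not the user's actual config):

```python
# Hypothetical batch-size arithmetic: DeepSpeed derives any field left
# as "auto" from train_batch_size / (micro_batch * num_gpus); with 5
# GPUs the quotient can be fractional, which pydantic rejects.
num_gpus = 5
micro_batch_per_gpu = 2
target_global_batch = 32

grad_accum = target_global_batch / (micro_batch_per_gpu * num_gpus)  # 3.2
if grad_accum != int(grad_accum):
    print(f"gradient_accumulation_steps would be {grad_accum}: "
          "'Input should be a valid integer, got a number with a fractional part'")
```

Choosing sizes so that global batch = micro batch × accumulation steps × device count divides evenly (for example, an even device count, as suggested above) avoids the fractional value.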
Links mentioned:
- axolotl/examples/medusa at main · ctlllll/axolotl: Go ahead and axolotl questions. Contribute to ctlllll/axolotl development by creating an account on GitHub.
- axolotl/deepspeed/zero1.json at main · ctlllll/axolotl: Go ahead and axolotl questions. Contribute to ctlllll/axolotl development by creating an account on GitHub.
- examples/train_lora/llama3_lora_sft_ds3.yaml throws an error · Issue #5252 · hiyouga/LLaMA-Factory: Reminder I have read the README and searched the existing issues. System Info When using ds_z3_config.json, an error is raised: pydantic_core._pydantic_core.ValidationError: 1 validation error for DeepSpeedZeroConfig...
Torchtune ▷ #general (9 messages🔥):
Model Scalability Concerns
P-value Reporting in ML
Implementation of L-mul
RL Algorithm Seed Impact
Signal vs. Noise in Research
- Model Scalability Raises Eyebrows: A member expressed concerns about whether a paper's results would scale, given that its model was trained on 350 billion tokens, and questioned the significance of the reported improvements.
- Ironically, another member noted that ML professionals often overlook basic statistical measures like p-values.
- P-values Not Common in ML: A member shared frustration about the lack of p-values and confidence intervals in ML papers, expressing how it feels triggering coming from a medical background.
- Another participant remarked that they rarely see p-value usage in ML contexts, highlighting a cultural difference in scientific reporting.
- Discussion on L-mul Implementation: There was a suggestion that L-mul should be implemented in torchao, as they are expected to embrace such projects.
- A member encouraged joining their channel for more collaborative efforts, indicating a welcoming community for new ideas.
- Changing Seeds Can Alter Results: A conversation revealed that previous studies showed changing the random seed can significantly affect the results of RL algorithms.
- One member stressed that it's tough to derive meaningful insights from small models based solely on reported numbers (a seed-aware reporting sketch follows this list).
- Evaluating the Impact of New Ideas: Regarding a new idea, one member acknowledged the potential but expressed skepticism about its practical impact alongside other optimizations.
- They view such papers as the starting point for ideas, reflecting an ongoing discussion about measuring significance in research findings.
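For readers wanting the seed-aware reporting these threads call for, a minimal sketch (all scores are made up) using scipy:

```python
# Treat per-seed results as samples and report a p-value and confidence
# interval instead of a single-run number.
import numpy as np
from scipy import stats

baseline = np.array([71.2, 69.8, 70.5, 72.0, 70.1])   # method A, 5 seeds
candidate = np.array([72.4, 70.9, 73.1, 71.8, 72.6])  # method B, 5 seeds

t, p = stats.ttest_ind(candidate, baseline, equal_var=False)  # Welch's t-test
lo, hi = stats.t.interval(0.95, len(candidate) - 1,
                          loc=candidate.mean(), scale=stats.sem(candidate))
print(f"p = {p:.3f}; candidate 95% CI = ({lo:.2f}, {hi:.2f})")
```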
Torchtune ▷ #dev (1 messages):
SOAP optimizer
AdamW learning rate issues
NanoGPT speedrunning achievements
- SOAP outperforms AdamW but needs tuning: A user tested the SOAP optimizer on Alpaca, noting it performed better than AdamW until they adjusted AdamW's learning rate (a toy learning-rate sweep is sketched after this list).
- However, they mentioned that the current implementation does not support distributed training or bf16 formats yet.
- NanoGPT sets new sample efficiency record: In a recent update, the SOAP optimizer achieved a new sample efficiency record of 3.28 Fineweb validation loss in 3.25B training tokens.
- This eclipses the previous record of 3.67B tokens set by another optimizer, according to a tweet from @kellerjordan0.
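Tuning the baseline before crediting a new optimizer usually means a learning-rate sweep; a toy sketch of what that looks like for AdamW (a tiny regression task, not the user's Alpaca setup):

```python
# Sweep AdamW's learning rate on a toy problem; an untuned baseline
# flatters any challenger optimizer.
import torch

def final_loss(lr: float, steps: int = 200) -> float:
    torch.manual_seed(0)
    model = torch.nn.Linear(1, 1)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    x = torch.randn(256, 1)
    y = 2 * x + 0.1 * torch.randn(256, 1)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

for lr in (1e-4, 1e-3, 1e-2, 1e-1):
    print(f"lr={lr:g}  final mse={final_loss(lr):.4f}")
```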
Link mentioned: Tweet from Keller Jordan (@kellerjordan0): NanoGPT speedrunning update: Using the SOAP optimizer (https://arxiv.org/abs/2409.11321), @vyasnikhil96 has achieved a new sample efficiency record of 3.28 Fineweb validation loss in 3.25B training to...
Torchtune ▷ #papers (3 messages):
Diff Transformer
L-Mul Algorithm
Floating Point Multiplication Replacement
- Diff Transformer Triumphs over Traditional Transformers: The Diff Transformer introduces a differential attention mechanism, enhancing attention to relevant context and outperforming traditional Transformers in various benchmarks (see the sketch after this list).
- It notably aids in long-context modeling and reduces hallucination in tasks like question answering.
- L-Mul Algorithm Slashes Energy Costs: The proposed L-Mul algorithm approximates floating point multiplication with integer addition, reducing energy costs by 95% while maintaining higher precision (a bit-level illustration also follows the list).
- This method offers a significant improvement over 8-bit floating point multiplications, suggesting a potential for vast resource savings in neural network computations.
- Discussion on Pretraining with L-Mul: A query was raised regarding the possibility of pretraining models using the L-Mul algorithm and its impact on performance.
- There's interest in evaluating if this approach could also help in addressing the major energy sink during pretraining.
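From the paper's abstract, the core of differential attention is subtracting two softmax attention maps so common-mode noise cancels; a condensed sketch (shapes and lambda are illustrative, and the paper's per-head lambda re-parameterization and output normalization are omitted):

```python
# Two softmax attention maps are computed from separate query/key
# projections; subtracting them cancels attention mass both maps
# assign to irrelevant context.
import torch
import torch.nn.functional as F

def diff_attention(q1, k1, q2, k2, v, lam: float = 0.8):
    d = q1.shape[-1]
    a1 = F.softmax(q1 @ k1.transpose(-2, -1) / d**0.5, dim=-1)
    a2 = F.softmax(q2 @ k2.transpose(-2, -1) / d**0.5, dim=-1)
    return (a1 - lam * a2) @ v

q1, k1, q2, k2, v = (torch.randn(1, 16, 64) for _ in range(5))
print(diff_attention(q1, k1, q2, k2, v).shape)  # torch.Size([1, 16, 64])
```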
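For intuition on trading multiplication for addition, here is the classic log-domain bit trick (not the paper's exact L-Mul algorithm, which adds its own mantissa offset term) in a few lines:

```python
# Reinterpret float bits as integers; since the exponent field stores
# log2 of the value, integer addition approximates multiplication.
# Positive inputs only.
import struct

def add_as_mul(a: float, b: float) -> float:
    ai = struct.unpack("<I", struct.pack("<f", a))[0]
    bi = struct.unpack("<I", struct.pack("<f", b))[0]
    bias = 0x3F800000  # bit pattern of 1.0f; cancels the doubled exponent bias
    return struct.unpack("<f", struct.pack("<I", ai + bi - bias))[0]

print(add_as_mul(3.0, 7.0))  # prints 20.0, approximating 3*7=21 within the trick's error bound
```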
Links mentioned:
- Addition is All You Need for Energy-efficient Language Models: Large neural networks spend most computation on floating point tensor multiplications. In this work, we find that a floating point multiplier can be approximated by one integer adder with high precisi...
- Differential Transformer: Transformer tends to overallocate attention to irrelevant context. In this work, we introduce Diff Transformer, which amplifies attention to the relevant context while canceling noise. Specifically, t...
LangChain AI ▷ #general (9 messages🔥):
Memcached support in LangChain
LiteLLM prompt caching and streaming
Natural language to SQL query limitations
SQL chain with models other than GPT 3.5
Integrating Livekit with LangChain
- Seeking Memcached Support in LangChain: A member is exploring whether adding support for pymemcache in LangChain would suffice or if multiple Memcached clients like python-memcached or pylibmc are also desired.
- This request aims to enhance the flexibility of caching options within the LangChain ecosystem (one possible shape of such an integration is sketched after this list).
- Problems with LiteLLM's Streaming and Caching: A member encountered issues retrieving cached tokens when using LiteLLM with streaming enabled and questioned best practices to ensure token caching functionality.
- They linked to useful resources on LiteLLM highlighting token stream responses might interfere with caching mechanisms.
- Limitations in Natural Language to SQL Queries: A user expressed concerns about effectively limiting SQL queries to a specific ID without trusting LLM instructions and sought alternative methods for maintaining discipline in query generation.
- Another member suggested that grouping by ID might be necessary to filter results effectively; a view-based enforcement pattern is also sketched after this list.
- SQL Chain Compatibility Beyond GPT 3.5: A query was raised regarding the compatibility of the SQL chain with models other than GPT 3.5, particularly when those attempts often yielded incorrect responses.
- A member reported success with 4o-mini by being specific with column names and question formulation.
- Interest in Livekit Integration with LangChain: A member inquired about the possibility of integrating Livekit with LangChain to enhance its functionality for real-time applications.
- They also expressed a desire to build a RAG bot, indicating interest in advanced application development using LangChain.
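On the Memcached question, a rough sketch of what a pymemcache-backed cache could look like, written against langchain_core's BaseCache contract (a hypothetical class, not an existing LangChain integration):

```python
# Hypothetical MemcachedCache: langchain_core's BaseCache contract
# (lookup/update/clear), backed by pymemcache, with langchain_core.load
# handling (de)serialization of the cached generations.
import hashlib
from typing import Any, Optional

from langchain_core.caches import RETURN_VAL_TYPE, BaseCache
from langchain_core.load import dumps, loads
from pymemcache.client.base import Client

class MemcachedCache(BaseCache):
    def __init__(self, client: Client):
        self.client = client

    def _key(self, prompt: str, llm_string: str) -> str:
        # memcached keys are length-limited, so hash the pair
        return hashlib.sha256(f"{prompt}:{llm_string}".encode()).hexdigest()

    def lookup(self, prompt: str, llm_string: str) -> Optional[RETURN_VAL_TYPE]:
        raw = self.client.get(self._key(prompt, llm_string))
        return loads(raw.decode()) if raw else None

    def update(self, prompt: str, llm_string: str, return_val: RETURN_VAL_TYPE) -> None:
        self.client.set(self._key(prompt, llm_string), dumps(list(return_val)))

    def clear(self, **kwargs: Any) -> None:
        self.client.flush_all()

# Usage sketch: from langchain.globals import set_llm_cache
# set_llm_cache(MemcachedCache(Client(("localhost", 11211))))
```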
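On constraining generated SQL to one ID without trusting the prompt, one defensive pattern is to let the generated query see only a pre-filtered view; a sketch with the standard library's sqlite3 (the orders table and user_id column are hypothetical):

```python
# The generated SQL only ever sees a pre-filtered temporary view, so
# the ID limit doesn't rely on the model following instructions.
import sqlite3

def run_scoped_query(conn: sqlite3.Connection, user_id: int, generated_sql: str):
    # sqlite3 forbids bound parameters in CREATE VIEW, so coerce to int
    # to keep the interpolation injection-safe.
    conn.execute(
        f"CREATE TEMP VIEW IF NOT EXISTS orders_scoped AS "
        f"SELECT * FROM orders WHERE user_id = {int(user_id)}"
    )
    # Prompt the model to query orders_scoped; pairing this with an
    # authorizer callback (conn.set_authorizer) that denies reads on the
    # base table closes the remaining loophole.
    return conn.execute(generated_sql).fetchall()
```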
Links mentioned:
- LiteLLM - Getting Started | liteLLM: https://github.com/BerriAI/litellm
- Quickstart | 🦜️🔗 LangChain: In this guide we'll go over the basic ways to create a Q&A chain and agent over a SQL database. These systems will allow us to ask a question about the data in a SQL database and get back a n...
OpenInterpreter ▷ #general (8 messages🔥):
Mozilla AI open source talk
Using --stdin flag confusion
LLMs and deterministic outputs
Impact of model updates
Code outcome variability
- Get Ready for Mozilla AI Talk!: Next week, we're excited to host a talk from a member of Mozilla AI discussing intriguing open source initiatives. Don't miss out on this opportunity to learn more!
- Join the event here to catch the insights.
- Confusion Over --stdin Flag: A user expressed confusion on how to use the --stdin flag and mentioned they couldn't find guidance in the docs. This highlights a gap in documentation clarity.
- Further clarification is needed in the documentation to assist users in utilizing this feature effectively.
- LLMs Stay Deterministic with Same Seed: A discussion revealed that LLMs can be deterministic if the same seed and input are used, contrary to popular belief. ChatGPT randomizes the seed on each request to introduce non-determinism.
- It's crucial to note that using the same inputs and setting temperature to 0 should yield consistent results (a small API sketch follows this list).
- Unpredictability with Model Updates: Concerns were raised about model updates in ChatGPT possibly affecting result consistency over time. Changes in the model could lead to variations that disrupt previously deterministic behavior.
- Users emphasized that updates might introduce unpredictability even when the code remains static.
- Code Outcome Variability Across Systems: A member pointed out that updates to systems or Python could influence code behavior, resulting in variable outcomes. For instance, accessing user tokens could alter the execution path.
- This variability underlines the importance of a controlled environment for consistent results.
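For reference, a small sketch of pinning those knobs through the OpenAI Python client (the model name is illustrative, and the API treats seed as best-effort, so backend updates can still change outputs, as noted above):

```python
from openai import OpenAI

client = OpenAI()  # needs OPENAI_API_KEY in the environment
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello."}],
    temperature=0,  # greedy-ish decoding
    seed=1234,      # same seed + same inputs -> (mostly) repeatable outputs
)
print(resp.choices[0].message.content)
print(resp.system_fingerprint)  # changes when the serving stack changes
```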
OpenInterpreter ▷ #ai-content (1 messages):
8i8__papillon__8i8d1tyr: https://www.youtube.com/watch?v=kNj0O7cKCU4
tinygrad (George Hotz) ▷ #general (3 messages):
exo on Linux with clang backend
Nix package issues
Tinygrad debug mode observations
Pull Request #6945 for clang
auto-casting bf16 to float32
- exo fails with clang backend on Linux: A user reported an error when using exo on Linux with the clang backend, specifically citing a failure upon invoking the clang command with a lowering error related to MetaOps.KERNEL.
- They mentioned the issue replicates on two systems and suspect it may be related to the Nix package system.
- Tinygrad debug mode shows pre-crash activity: While running TINYGRAD_DEBUG=2, detailed activity logs revealed hundreds of operations before a crash, indicating the process runs for some time before failing.
- Logs included DISK operations and CLANG copy processes, but ultimately concluded in a crash.
- Discussion on potential fix via GitHub Pull Request #6945: A user suggested that Pull Request #6945 might be a fix for the clang backend issues they're encountering.
- The PR involves rewriter hooks to implement autocasting from bf16 to float32, although the rewrite rules still need correction (a cast-based workaround is sketched below).
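Until that PR lands, one possible workaround, sketched here on the assumption that the crash happens in the bf16 lowering path rather than at tensor creation, is to cast bf16 tensors to float32 before compute:

```python
# Cast bf16 data up to float32 before doing math on the CLANG backend;
# whether this sidesteps the crash depends on where lowering fails.
from tinygrad import Tensor, dtypes

w = Tensor.randn(4, 4, dtype=dtypes.bfloat16)
w32 = w.cast(dtypes.float32)  # keep the math in float32
print((w32 @ w32).numpy())
```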
Link mentioned: WIP: autocast bf16 to float32 for clang by 1ntEgr8 · Pull Request #6945 · tinygrad/tinygrad: I hooked the rewriter using the extra_matcher field of the renderer (mimicking PTX). The rewrite rules are not correct (does not perform the shift), will fix soon. I was able to compile and run the...
tinygrad (George Hotz) ▷ #learn-tinygrad (2 messages):
Fashion MNIST PR
Dataset Suggestions
Learning Resources
- Fashion MNIST adds challenge for tinygrad learners: A member created a Pull Request to introduce Fashion MNIST as an intermediate dataset for those learning tinygrad, providing a challenge that's more complex than MNIST but simpler than CIFAR-10.
- The PR aims to help learners with additional resources, offering a useful way to expand their skills.
- Call for more dataset additions: A member inquired if the community would like to see more datasets added and tested for tinygrad to further enhance learning opportunities.
- This suggestion highlights a shared interest in continually growing the dataset options available for learners.
Link mentioned: added beautiful fashion mnist and example by Kinvert · Pull Request #6961 · tinygrad/tinygrad: People learning tinygrad might want a step in difficulty between MNIST and CIFAR-10. This is what I personally did here to keep learning tinygrad. Might be useful to others. Up to you guys if you w...
LAION ▷ #general (1 messages):
Hierarchical Generation
Stable Cascade Models
- Exploring Hierarchical Generation Models: A member shared their blog post titled A Theory for Coupling Generation and Compression, which discusses a framework for hierarchical model generation similar to Stable Cascade.
- The post emphasizes the common paradigm in generative models where a decomposer is trained first, highlighting its application to LLMs and image generators.
- Challenges in Current Generation Paradigm: The current generative model design often follows the same pattern, starting with a decomposing model that compresses data before a generator is trained.
- This method is prevalent in LLMs; it speeds up training and inference but carries side effects, such as the LLM struggling with sub-character spelling.
Link mentioned: coupling generation and compression: no description found
LAION ▷ #research (3 messages):
o1-preview Generalization
o1-mini Performance
AIW Task Issues
TruthfulQA Success
- o1-preview shows strong zero-shot generalization: Preliminary experiments suggest that o1-preview demonstrates a true leap in zero-shot (weak) out-of-distribution generalization compared to previous models.
- In comparison, o1-mini falls far behind and is on par with previous SOTA, highlighting the clear impact of pre-training scale.
- o1-preview struggles with simpler tasks: Despite being significantly better than previous models, o1-preview faces challenges on simpler tasks like AIW.
- This raises skepticism around claims that it can tackle complex olympiad problems and PhD-level problem-solving effectively.
- o1 proves understanding on TruthfulQA: o1 has shown promising results on TruthfulQA, particularly in understanding common misconceptions effectively.
- This indicates that, while it has limitations, it excels in certain comprehension tasks.
DSPy ▷ #show-and-tell (3 messages):
The Cat API
Cat image fetching tools
Cat breeds data
- Fetching Random Cat Images from The Cat API: A new feature was demonstrated to fetch random cat images using The Cat API. The implementation involves creating a Cat model and using an HTTP client to grab images seamlessly.
- Exploring Cat Breeds with Limitations: A method to fetch cat breeds with an option to limit the number returned has been showcased. Code snippets reveal that the first few breeds are retrieved and structured into a CatBreed model for easy access.
- Demonstration Video Links Shared: Links to demonstration videos were shared, highlighting the functionality of the cat image and breed fetching features. These provide visual guides on how to implement the discussed tools effectively (a hedged reconstruction of the flow follows below).
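A hedged reconstruction of the demonstrated flow (the Cat and CatBreed models mirror the summary, the endpoints are The Cat API's public ones, and httpx stands in for whatever HTTP client the demo used):

```python
# Fetch a random cat image and a limited list of breeds from The Cat API.
from dataclasses import dataclass

import httpx

@dataclass
class Cat:
    id: str
    url: str

@dataclass
class CatBreed:
    id: str
    name: str

def random_cat() -> Cat:
    data = httpx.get("https://api.thecatapi.com/v1/images/search").json()[0]
    return Cat(id=data["id"], url=data["url"])

def breeds(limit: int = 3) -> list[CatBreed]:
    data = httpx.get("https://api.thecatapi.com/v1/breeds",
                     params={"limit": limit}).json()
    return [CatBreed(id=b["id"], name=b["name"]) for b in data]

print(random_cat().url)
print([b.name for b in breeds(limit=3)])
```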
Links mentioned:
- Cool Stuff for Batman 🦇: Hi, I'm Sean Chatman, a full stack front end developer seeking full-time work. In this video titled Cool Stuff for Batman, I delve into configuring concurrency for meetings in APS models, showcasi...
- Tool Usage with ToolMixin: Hi there, I'm Sean Chatman, a skilled TypeScript React developer seeking full-time opportunities. I've developed the DSL Model Framework, a tool that simplifies DS-Pi usage with built-in Jinja...
DiscoResearch ▷ #general (1 messages):
Whisper Turbo German Model
Speech Recognition Optimization
- Whisper Turbo German Model Halves Error Rate: A new model, Whisper Turbo German, cuts error rates in half on some benchmarks compared to earlier models, according to a source.
- This model is specially optimized for applications such as transcription, voice commands, and automatic subtitling in German (a minimal usage sketch follows below).
- Applications of Whisper Turbo Model: Applications of the Whisper Turbo German model include transcription of spoken German, automatic subtitling, and voice-based search queries.
- It provides dictation functions for word processing programs, enhancing usability in diverse scenarios.
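A minimal usage sketch for the linked checkpoint via the transformers ASR pipeline (the audio path is a placeholder):

```python
# Transcribe German audio with the Hugging Face checkpoint named above.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="primeline/whisper-large-v3-turbo-german",
)
print(asr("aufnahme.wav")["text"])  # German speech in, German transcript out
```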
Link mentioned: primeline/whisper-large-v3-turbo-german · Hugging Face: no description found
Gorilla LLM (Berkeley Function Calling) ▷ #leaderboard (1 messages):
Writer's Palmyra-X-004 model
DevRel inquiries
- Writer's Palmyra-X-004 Model Update Request: Sam Julien, who leads DevRel at Writer, inquired about adding the latest Palmyra-X-004 model to the leaderboard, following an email from CTO Waseem AlShikh.
- Asking whether a PR is needed, Sam expressed confidence in the model's impressive internal results.
- Follow-up on Leaderboard Submission Process: Sam asked if they needed to submit a PR for the Palmyra-X-004 model to be added to the leaderboard.
- This inquiry highlights a proactive approach in ensuring their achievements are recognized within the community.