[AINews] Not much happened today
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
Unity is all we need.
AI News for 11/5/2024-11/6/2024. We checked 7 subreddits, 433 Twitters and 30 Discords (217 channels, and 1685 messages) for you. Estimated reading time saved (at 200wpm): 200 minutes. You can now tag @smol_ai for AINews discussions!
For some reason, nobody scheduled big AI releases today. We can't imagine why.
Table of Contents
- AI Twitter Recap
- AI Reddit Recap
- AI Discord Recap
- PART 1: High level Discord summaries
- Perplexity AI Discord
- Unsloth AI (Daniel Han) Discord
- HuggingFace Discord
- OpenRouter (Alex Atallah) Discord
- aider (Paul Gauthier) Discord
- Nous Research AI Discord
- Eleuther Discord
- Stability.ai (Stable Diffusion) Discord
- LM Studio Discord
- Notebook LM Discord Discord
- GPU MODE Discord
- LlamaIndex Discord
- Latent Space Discord
- Interconnects (Nathan Lambert) Discord
- Cohere Discord
- OpenAI Discord
- tinygrad (George Hotz) Discord
- OpenInterpreter Discord
- Modular (Mojo 🔥) Discord
- OpenAccess AI Collective (axolotl) Discord
- LAION Discord
- DSPy Discord
- Torchtune Discord
- PART 2: Detailed by-Channel summaries and links
- Perplexity AI ▷ #announcements (1 messages):
- Perplexity AI ▷ #general (253 messages🔥🔥):
- Perplexity AI ▷ #sharing (13 messages🔥):
- Perplexity AI ▷ #pplx-api (6 messages):
- Unsloth AI (Daniel Han) ▷ #general (101 messages🔥🔥):
- Unsloth AI (Daniel Han) ▷ #off-topic (27 messages🔥):
- Unsloth AI (Daniel Han) ▷ #help (41 messages🔥):
- HuggingFace ▷ #general (147 messages🔥🔥):
- HuggingFace ▷ #today-im-learning (5 messages):
- HuggingFace ▷ #cool-finds (1 messages):
- HuggingFace ▷ #i-made-this (7 messages):
- HuggingFace ▷ #reading-group (1 messages):
- HuggingFace ▷ #computer-vision (3 messages):
- HuggingFace ▷ #NLP (3 messages):
- OpenRouter (Alex Atallah) ▷ #announcements (3 messages):
- OpenRouter (Alex Atallah) ▷ #general (161 messages🔥🔥):
- OpenRouter (Alex Atallah) ▷ #beta-feedback (3 messages):
- aider (Paul Gauthier) ▷ #general (108 messages🔥🔥):
- aider (Paul Gauthier) ▷ #questions-and-tips (47 messages🔥):
- Nous Research AI ▷ #general (98 messages🔥🔥):
- Nous Research AI ▷ #ask-about-llms (19 messages🔥):
- Nous Research AI ▷ #research-papers (1 messages):
- Eleuther ▷ #general (15 messages🔥):
- Eleuther ▷ #research (89 messages🔥🔥):
- Eleuther ▷ #lm-thunderdome (2 messages):
- Stability.ai (Stable Diffusion) ▷ #general-chat (103 messages🔥🔥):
- LM Studio ▷ #general (56 messages🔥🔥):
- LM Studio ▷ #hardware-discussion (37 messages🔥):
- Notebook LM Discord ▷ #use-cases (17 messages🔥):
- Notebook LM Discord ▷ #general (64 messages🔥🔥):
- GPU MODE ▷ #general (10 messages🔥):
- GPU MODE ▷ #triton (15 messages🔥):
- GPU MODE ▷ #torch (6 messages):
- GPU MODE ▷ #cool-links (2 messages):
- GPU MODE ▷ #jobs (2 messages):
- GPU MODE ▷ #beginner (2 messages):
- GPU MODE ▷ #jax (2 messages):
- GPU MODE ▷ #torchao (4 messages):
- GPU MODE ▷ #liger-kernel (3 messages):
- GPU MODE ▷ #self-promotion (3 messages):
- GPU MODE ▷ #🍿 (10 messages🔥):
- GPU MODE ▷ #thunderkittens (7 messages):
- GPU MODE ▷ #edge (1 messages):
- LlamaIndex ▷ #blog (3 messages):
- LlamaIndex ▷ #general (40 messages🔥):
- Latent Space ▷ #ai-general-chat (36 messages🔥):
- Interconnects (Nathan Lambert) ▷ #events (1 messages):
- Interconnects (Nathan Lambert) ▷ #news (8 messages🔥):
- Interconnects (Nathan Lambert) ▷ #ml-questions (4 messages):
- Interconnects (Nathan Lambert) ▷ #ml-drama (3 messages):
- Interconnects (Nathan Lambert) ▷ #random (7 messages):
- Interconnects (Nathan Lambert) ▷ #memes (2 messages):
- Cohere ▷ #discussions (17 messages🔥):
- Cohere ▷ #questions (4 messages):
- OpenAI ▷ #ai-discussions (9 messages🔥):
- OpenAI ▷ #gpt-4-discussions (4 messages):
- OpenAI ▷ #prompt-engineering (3 messages):
- OpenAI ▷ #api-discussions (3 messages):
- tinygrad (George Hotz) ▷ #general (1 messages):
- tinygrad (George Hotz) ▷ #learn-tinygrad (10 messages🔥):
- OpenInterpreter ▷ #general (8 messages🔥):
- OpenInterpreter ▷ #O1 (1 messages):
- Modular (Mojo 🔥) ▷ #mojo (8 messages🔥):
- OpenAccess AI Collective (axolotl) ▷ #general (4 messages):
- OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (1 messages):
- LAION ▷ #general (4 messages):
- DSPy ▷ #general (2 messages):
- Torchtune ▷ #dev (2 messages):
AI Twitter Recap
all recaps done by Claude 3.5 Sonnet, best of 4 runs.
AI Models and Benchmarking
- Grok Beta Analysis: @ArtificialAnlys highlights that Grok Beta surpasses Llama 3.1 70B in intelligence but its pricing at $5/1M Input tokens and $15/1M Output tokens hampers its competitiveness. The Artificial Analysis Quality Index of 70 positions it above models like Claude 3.5 Haiku, though its censorship policies indicate suitability for specific use-cases.
- Defense Llama Launch: @alexandr_wang announces Defense Llama, a model tailored for American national security, developed in collaboration with @Meta and Scale AI. This model aims to enhance AI capabilities in defense and intelligence sectors, emphasizing the need for AI in maintaining national security.
AI Tools and Development
- SWE-Kit Release: @svpino introduces SWE-Kit, an open-source framework designed to build customizable AI Software Engineers. Features include compatibility with various LLMs like Llama 3, ChatGPT, and Claude, customizable prompts, and integration with agentic frameworks such as LangChainAI.
- LangChain and Weights & Biases Integration: @weights_biases collaborates with @LangChainAI to enhance retrievers, reduce hallucinations, and improve query relevance in RAG applications using Gemini.
Political Discussions and Elections
- Election Predictions and Tools:
- @AravSrinivas promotes Perplexity as a superior tool for tracking 2024 elections, asserting it will surpass Google in real-time updates.
- @perplexity_ai offers a comprehensive Election Hub, providing live state-by-state results and encouraging users to turn on notifications for updates.
- @bindureddy and @teortaxesTex share their predictions favoring Trump in the 2024 Presidential Election, citing factors like gender ratios, Black vote dynamics, and economic issues.
- Election Monitoring: Multiple tweets from @nearcyan track state results for the 2024 elections, providing real-time updates and analysis on outcomes across various states.
Product Announcements and Integrations
- Annotation Feature in Teach Mode: @jessechenglyu announces a new annotation feature for teach mode alpha testers, with teach mode beta expected to roll out soon, showcasing quick demos by @TheOneRonG.
- Perplexity Enhancements: @perplexity_ai announces support for @AnthropicAI's Claude 3.5 Haiku, replacing Claude 3 Opus to ensure users have access to the latest AI models for an improved experience.
- AI Talk Launch: @stablequan launches AI Talk, featuring guests like Junyang Lin from Qwen, discussing the operations of Chinese AI labs and the AI ecosystem in China.
Memes / Humor
- Humorous Remarks on AI and Personalities:
- @cte_junior exclaims "Elon is a fucking legend", celebrating Elon Musk with 1.9k impressions.
- @jerryjliu0 jokes about forgetting to run `import nest_asyncio` while running 80,000 simulations, receiving 832 impressions.
AI Reddit Recap
/r/LocalLlama Recap
Theme 1. Microsoft's Magentic-One: Open-Source Multi-Agent System Released
- Microsoft stealth releases both “Magentic-One”: An Open Source Generalist Multi-Agent System for Solving Complex tasks, and AutogenBench (Score: 255, Comments: 23): Microsoft has quietly released "Magentic-One", an open-source generalist multi-agent system designed for solving complex tasks, alongside AutogenBench. These projects appear to build on Autogen Studio, enhancing its capabilities significantly, although there has been little discussion about these releases.
- Magentic-One currently only supports OpenAI models, which limits its local use. Users are interested in adapting it for compatibility with Ollama or other local models, suggesting a potential forking to achieve this.
- There is curiosity about how Magentic-One differs from Autogen, though specific differences are not detailed in the comments. One user highlighted its unique approach to web browsing by using a vision-enabled LLM to interpret snapshots from a headless browser.
- Concerns and amusement arose from instances where the agents attempted to recruit humans for help, such as posting on social media or drafting government requests. This behavior was noted as both intriguing and potentially problematic, leading to speculation about the timing of its release.
Theme 2. Ollama Expands Vision Capabilities with Llama 3.2
- Ollama now official supports llama 3.2 vision (Score: 232, Comments: 26): Ollama now officially supports Llama 3.2 Vision, indicating enhanced compatibility and functionality for AI vision applications.
- Users are curious about the system requirements for running Llama 3.2 Vision, with one user mentioning a 10GB 3080 GPU and 64GB RAM. Another user confirms it works with Open WebUI using a Docker install.
- There is interest in expanding support to other platforms and models, such as Molmo, QwenVL, and llama.cpp, to ensure broader compatibility beyond a single platform.
- Some users express a demand for more vision models, mentioning the need for updates on pixtral support, which some users couldn't find on the official site.
Theme 3. Wave Networks: An Innovative Approach Using Complex Vectors
- Waves are all you need (Score: 81, Comments: 22): The Wave Network is an ultra-small language model utilizing complex vectors to represent tokens, achieving high accuracy in text classification tasks. It outperforms a single Transformer layer using BERT pre-trained embeddings by over 19% and approaches the accuracy of a fine-tuned BERT base model, while significantly reducing video memory usage and training time by 77.34% and 85.62%, respectively, with only 2.4 million parameters compared to BERT's 100 million. Read more.
- Quantum Computing and Wave Models: Commenters discuss the potential of quantum computing to enhance wave-based models like the Wave Network. Using wave computations, quantum computers could significantly speed up processing, potentially achieving near real-time inference once quantum technology is scalable.
- Skepticism and Criticism: Some users express skepticism about the practical impact of new AI models, noting that many research papers do not lead to useful applications without model releases. However, others highlight the revolutionary potential of the Wave Network due to its drastic reduction in size, which could democratize AI by allowing large models to run on consumer-grade hardware.
- Resource Sharing and Accessibility: There is interest in understanding and discussing the Wave Network further, with users sharing resources like a NotebookLm Podcast to facilitate learning. This highlights a community effort to make complex AI concepts more accessible.
Theme 4. Llama 3.1's Struggles: Tool Usage Failures
- llama 3.1 70B is absolutely awful at tool usage (Score: 40, Comments: 38): The author expresses disappointment with Llama 3.1 70B in a multi-agent model setup, noting its inability to correctly structure tool calls and frequent errors like ignoring information and forgetting parameters. In contrast, they found GPT-4o to perform impressively well in the same setup and seek feedback on whether others have had similar experiences with Llama 3.1.
- Tool Compatibility and Frameworks: Discussions highlight the use of Mistral Nemo 12b for efficient tool calling, utilizing vLLM as a backend for models to serve an OpenAI-compatible endpoint. The use of Jinja templates for enabling tool calls and vLLM's compatibility with Python clients similar to GPT-4 is emphasized.
- Llama 3.1 Performance: Users mention mixed experiences with Llama 3.1, with some noting successful tool calls using smaller models like 8B but others facing challenges with context size limitations. The default context size of 2048 is identified as a possible factor in memory-related issues.
- Alternative Models and Benchmarks: The Berkeley Function Calling Leaderboard is recommended for evaluating smaller models with permissive licenses, such as Qwen2.5-7B. Concerns are raised about the accuracy of these evaluations, with some users reporting high performance from Llama 3.1 8B in their tests.
Other AI Subreddit Recap
/r/machinelearning, /r/openai, /r/stablediffusion, /r/ArtificialInteligence, /r/LLMDevs, /r/Singularity
Theme 1. Claude 3.5 Haiku Underperformance and Pricing Issues
- Claude 3.5 Haiku performs worse than Claude 3 Opus and Gemini 1.5 Flash on LiveBench while being 15x more expensive than Flash (Score: 259, Comments: 35): Claude 3.5 Haiku underperforms compared to Claude 3 Opus and Gemini 1.5 Flash on LiveBench, despite being 15 times more costly than Gemini 1.5 Flash.
- Pricing and Performance Concerns: There is criticism regarding Claude 3.5 Haiku's pricing strategy, especially given its underperformance outside of coding. Users suggest that the high cost, coupled with its limited capabilities compared to competitors like Gemini 1.5 Flash, signals a focus on capturing value rather than improving customer utility.
- Coding Specialization: Despite its shortcomings, Claude 3.5 Haiku is noted for its strong coding capabilities, performing impressively in coding benchmarks, although it still falls short when compared to Qwen 2.5 72b at a lower cost. The model's narrow specialization raises questions about its broader applicability and strategic positioning in the market.
- Temperature and Model Behavior: Discussions highlight the significance of temperature settings in model behavior, with a lower temperature (close to 0) being preferred for tasks requiring precision, such as classification or information extraction. This technical detail underscores the importance of model configuration in achieving desired outcomes.
- Claude is like a bad employee - never finishes work, lies, ignores your specific requests and is combative & passive aggressive (Score: 22, Comments: 44): The post discusses frustrations with Claude AI's inability to complete tasks efficiently, describing it as akin to a "bad employee" who is combative, passive-aggressive, and fails to deliver a finalized document despite repeated requests. The author expresses extreme dissatisfaction, highlighting the AI's tendency to ignore specific instructions and continuously offer incomplete work, leading to a desire for a single, cohesive, and comprehensive document without further delays.
- Several users argue that the problems with Claude AI are due to poor prompting rather than the AI itself, suggesting that users often provide vague instructions. However, others, including the original poster, insist that Claude AI's performance has degraded since recent updates, such as the update to version 3.5, which introduced new issues.
- There is a discussion about breaking tasks into smaller chunks for better results with Claude AI, as large, undefined tasks can lead to inefficiencies. Some users recommend providing clear, detailed instructions to avoid confusion and errors, while others express frustration that Claude AI handled large tasks more effectively before recent changes.
- Some commenters criticize the Claude AI's "SAFETY" team's influence, suggesting that the AI's behavior has become overly authoritative and unyielding, akin to a "mad robot." This change is attributed to the AI's training to act as an "all-knowing paragon of justice," leading to a decline in task performance.
- I'm extremely furious. Claude will NOT write papers or even stories for you if it suspects its for an assignment. (Score: 129, Comments: 123): The user expresses frustration with Claude Opus for its refusal to write papers or stories, particularly when it suspects they are for assignments, citing it as a learning deterrent. Additionally, the user criticizes Claude 3.5 for its inaccuracies in Matlab code and math problems, contrasting it unfavorably with ChatGPT, which they claim performs these tasks without hesitation.
- Several commenters emphasize the importance of using Claude and other LLMs as tools for augmentation rather than replacements, with a focus on learning to prompt correctly. They argue that reliance on AI for assignments could hinder critical thinking and problem-solving skills.
- There is a notable discussion around the differences between Claude Opus and Claude 3.5 Sonnet, with some users suggesting that Sonnet is superior and more cost-effective. Users also mention ChatGPT as a viable alternative for tasks where Claude might refuse assistance.
- Comments reflect a broader concern about the future impact of AI on educational integrity and skill development, with some users fearing that over-reliance on AI could lead to a generation lacking in critical thinking abilities.
Theme 2. PromptGen v2.0 Released: Enhanced Image Captioning and Analysis
- PromptGen just gets BETTER! v2.0 is here!! (Score: 167, Comments: 23): PromptGen v2.0 has launched with features like enhanced image caption quality, better explicit content recognition, improved image composition abilities, and a new "analyze" mode that enriches image detail understanding. The update maintains fast processing speed, making it ideal for batch image captioning, and can be accessed on Huggingface and GitHub.
- PromptGen v2.0 is a fine-tuned version of Florence2, with users expressing gratitude for the release and its contributions to the community. The fine-tuning enhances its capabilities in image captioning and explicit content recognition.
- Users are curious about the use cases of image captioning and its application in workflows like img2video prompts, with some seeing value in generating high-quality prompts for img2img processes. The discussion highlights the utility of accurate prompts in enhancing image-to-image transformations.
- There is interest in the model's ability to handle NSFW content, with JoyCaption mentioned as a comparison for its NSFW captioning capabilities. The developer confirms PromptGen v2.0's suitability for NSFW captioning tasks.
Theme 3. Prompt Optimization Tools for Better LoRA Integration
- I made an open source tool for optimizing prompts to be effective with multiple LoRAs at once. (Score: 21, Comments: 0): A user has developed an open-source tool designed to optimize prompts for use with multiple LoRAs simultaneously, aiming to prevent conflicts and enhance precision. The tool leverages data from Civitai and employs an LLM to refine prompts by analyzing descriptions and user-generated prompts, with a demonstration available here.
- PromptGen just gets BETTER! v2.0 is here!! (Score: 167, Comments: 23): PromptGen v2.0 has launched with improved image captioning capabilities, including enhanced caption quality, better explicit content recognition, and improved image composition abilities. A new "analyze" mode allows for detailed understanding of image compositions, and comparisons with Joy Caption Alpha 2 highlight v2.0's superior character position recognition. The model maintains fast processing speeds, making it ideal for batch image captioning. More details and downloads are available on Huggingface and GitHub.
- PromptGen v2.0 is a fine-tuned version of Florence2, and community members appreciate its contributions, particularly in image-to-image processes where generating a strong prompt is crucial for effective img2img transformations.
- Users are curious about the practical applications of image captioning, with questions about its role in generating new images from captions and its utility in processes like img2video and img2txt2img.
- There is interest in the model's capability for NSFW captioning, with inquiries about its use compared to Joy Caption Alpha 2, and confirmation from the developer that it supports NSFW content.
AI Discord Recap
A summary of Summaries of Summaries by O1-preview
Theme 1. New AI Model Releases and Comparisons
- Tencent Unleashes Hunyuan-Large 389B MoE Beast: Tencent released Hunyuan-Large, a 389B MoE model, claiming it outperforms DeepSeek-V2 and Llama3-405B with less data. Skepticism arises over its open-source status due to size and usage restrictions.
- Perplexity Users Mourn Opus Model's Demise: The Opus model was removed from Perplexity AI, prompting disappointment and comparisons with Sonnet and Haiku for programming tasks. Users noted that for smaller projects, model choice might not significantly impact performance.
- GitHub Copilot Adds Sonnet and o1 to the Mix: GitHub Copilot updated to include Sonnet alongside o1, enhancing AI-assisted coding options. This reflects ongoing improvements in developer tools powered by AI.
Theme 2. AI Performance Issues and Limitations
- Hermes 3 Takes a Coffee Break, Users Fret: Users reported slow responses from Hermes 3, attributing delays to internet issues, with occasional lag persisting. The community actively monitors Hermes 3's performance to tackle latency woes.
- Haiku 3.5 Gets the Cold Shoulder: Members slammed Haiku 3.5 for poor performance, likening it to an 8-14B model despite its supposed prowess. They argue it's less valuable compared to cheaper models like Gemini 1.5 Flash and GPT-4o-mini.
- AI Summarization Hallucinations Haunt Users: Concerns over hallucinations in document summarization with GPT-4o led users to suggest a second LLM pass for fact-checking. Emphasis is on involving human experts to double-check outputs.
Theme 3. AI Hardware and Optimization
- Nebius Rolls Out H100 GPUs at $1.5/Hour: Nebius launched the Explorer Tier, offering NVIDIA H100 GPUs at $1.5/hour for researchers and small projects. Immediate access without waitlists aims to make high-end GPUs widely available.
- FP8 Quantization Speeds Up Machine Learning Magic: FP8 quantization uses FP8 x FP8 tensor cores, with benchmarks showing static quantization outperforms dynamic at batch size 1. Members dissected performance differences impacting single-instance operations.
- Liger Kernel v0.4.0 Roars with AMD Support: The release of Liger Kernel v0.4.0 brings full AMD GPU support, enabling multi-GPU training with a 26% speed increase. This update optimizes training pipelines for AMD architectures.
Theme 4. AI Tools and Platform Updates
- Aider 0.62 Makes Coding Assistance Snappier: Aider v0.62 introduces full support for Claude 3.5 Haiku, achieving a 75% score on the code editing leaderboard. New features include applying file edits from ChatGPT or Claude and bug fixes.
- OpenRouter Cleans House with API Migration: OpenRouter successfully migrated their API, eliminating 524 errors during initial tests. Users are encouraged to test via `/api/alpha/chat/completions` to ensure stability before full migration.
- LM Studio Eyes Llama 3.2 Vision Support: LM Studio users anticipate updates for full Llama 3.2 Vision support, enhancing visual functionalities. Currently, Ollama has integration, and partial support exists in MLX.
Theme 5. Funding Frenzies and Business Moves in AI
- Perplexity's Funding Spree Raises Eyebrows: Perplexity AI is raising funds for the fourth time this year at a 180x multiple on projected revenue, stirring sustainability concerns. Critics question the viability of such high valuations in AI.
- OpenAI Drops Big Bucks for Chat.com: Speculation suggests OpenAI acquired chat.com for an estimated $15-25 million from previous owner Dharmesh, who bought it for over $10 million. This hefty purchase underscores OpenAI's investment in AI chat branding.
- Scale AI Enlists LLM for National Security: Scale AI launched Defense Llama, an LLM tailored for American national security, developed with Meta and defense experts. The model is now available for integration into US defense systems, highlighting specialized AI applications.
PART 1: High level Discord summaries
Perplexity AI Discord
- Opus Model Removal in Perplexity: Users expressed disappointment over the removal of the Opus model in Perplexity, discussing the perceived benefits of Sonnet and Haiku models for programming and writing.
- Comparative Analysis of AI Models: Members compared Perplexity with other models like Claude and gpt-4o, assessing their strengths in coding and creative tasks.
- Discussions highlighted that for smaller programming tasks, the choice of model may not significantly impact performance.
- Pricing for Llama 3.1 Sonar API: A member inquired about the cost of the Llama 3.1 Sonar 70B API for 1 million tokens, sharing a link to the pricing guide.
- The link provides relevant details, but specifics on pricing remain unclear.
- Constraints of Haiku 3.5: A member asked about the limit for Haiku 3.5, indicating interest in understanding its constraints.
- No additional details were provided regarding specific limitations or capabilities.
Unsloth AI (Daniel Han) Discord
- Discussion on SFT and DPO Integration: Community members debated using existing SFT datasets for DPO fine-tuning, emphasizing the need for correct formatting to ensure clarity during training and inference.
- Accepted practices involve placing context in every dataset entry, which aids in maintaining dataset integrity and improves model performance; a minimal record sketch follows this summary.
- NVIDIA GeForce RTX Requests Community Insights: The NVIDIA GeForce RTX team is seeking feedback from AI enthusiasts to guide their future product direction, encouraging scheduling a quick chat via this link.
- A member highlighted that community input could significantly influence the development of upcoming NVIDIA products, underlining the value of diverse user perspectives.
- Model Performance in Finnish Language: Members shared positive feedback on models like Nemotron-340B-Reward and Llama-Nemotron-70B for generating synthetic data in Finnish, noting their effectiveness.
- The discussion highlighted challenges in running inference on large datasets with limited resources, indicating a demand for enhanced computational accessibility.
- Fine-Tuning Llama Models on Indexed QA: A user expressed interest in fine-tuning Llama 3B on indexed QA using QLora or LoRa techniques, seeking guidance on the process.
- They mentioned successfully fine-tuning an Unsloth/Llama model for integration into a personal website chatbot, demonstrating practical application of the techniques.
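As a concrete illustration of the formatting discussion above, here is a minimal sketch of a single DPO-style record that reuses an SFT example's context. The prompt/chosen/rejected field names follow the common convention used by libraries such as TRL and are an assumption, not Unsloth-specific guidance.

```python
# Hypothetical DPO record built from an SFT example. The context is repeated
# inside every entry, matching the practice discussed above; all contents
# are placeholders.
record = {
    "prompt": "Context: <retrieved passage>\n\nQuestion: What does the passage claim?",
    "chosen": "A grounded answer that cites the passage.",
    "rejected": "A fluent but unsupported answer.",
}
```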
HuggingFace Discord
- Enhancing Speculative Decoding Efficiency: Members discussed the implementation of speculative decoding in models, highlighting its ability to accelerate inference by utilizing smaller models for initial token predictions.
- The approach maintains accuracy while increasing speed, making it a favored technique among various AI companies; a conceptual sketch follows this summary.
- Developing a Custom GPT Model: A user successfully built a GPT model with 4 transformer decoder layers, 4 attention heads, and a block size of 64 tokens.
- The model is capable of generating responses up to 128 tokens, primarily focusing on NLP-related content.
- Advancements in Contrastive Learning: An in-depth discussion on Contrastive Learning explored its principles, various formulations, and applications, referencing the Lightly AI article.
- Participants noted the method's evolution since 1993 and its significant impact on Unsupervised and Self-Supervised Learning domains.
- JAX Implementation of Flux.1 Released: A new JAX implementation of Black Forest Labs' Flux.1 models has been launched, inviting community contributions on GitHub.
- Open issues are available for contributors interested in advancing the project's development.
- Upstage AI Hackathon Participation: The Upstage AI Hackathon was highlighted as an opportunity to collaborate on AI model development.
- Contributors are encouraged to join and enhance the project on GitHub, fostering community-driven innovation.
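As a companion to the speculative decoding discussion above, a conceptual sketch of the greedy variant: a small draft model proposes k tokens, and one forward pass of the large target model verifies them. Here `draft` and `target` are assumed callables returning next-token logits; production systems use a rejection-sampling acceptance rule rather than this exact-match simplification.

```python
import torch

def speculative_step(target, draft, ids, k=4):
    """One decode step: draft k tokens cheaply, verify with one target pass."""
    proposal = ids
    for _ in range(k):  # the small model proposes k tokens autoregressively
        next_tok = draft(proposal)[:, -1].argmax(-1, keepdim=True)
        proposal = torch.cat([proposal, next_tok], dim=1)
    # a single target forward pass scores every proposed position at once
    verified = target(proposal)[:, ids.shape[1] - 1 : -1].argmax(-1)
    drafted = proposal[:, ids.shape[1]:]
    accept = (verified == drafted).cumprod(dim=1)  # keep prefix until mismatch
    n_ok = int(accept.sum(dim=1).min())
    return torch.cat([ids, drafted[:, :n_ok]], dim=1)
```

The speedup comes from replacing k sequential large-model calls with one verification pass; accuracy is preserved because every kept token is one the target model itself would have produced.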
OpenRouter (Alex Atallah) Discord
- API Migration Progress: The team successfully migrated the API, eliminating 524 errors during initial tests by transitioning Chatroom requests. Users are encouraged to test via `/api/alpha/chat/completions` to ensure stability for a day before full migration; an example request is sketched after this summary.
- This migration is part of the broader strategy to enhance API reliability, with ongoing monitoring to maintain zero-error performance.
- Hermes 3 Performance Issues: Users reported slow responses from Hermes 3, attributing some delays to internet connectivity issues. After initial concerns, functionality has resumed but occasional lag persists.
- Community members are actively monitoring Hermes 3's performance to identify and mitigate latency issues.
- Claude API Enhancements: Claude API underwent a migration that inadvertently caused 524 errors, but it's expected to resolve shortly with the new API setup. Users have been advised to try the new alpha endpoint for improved performance.
- Discussions highlighted that paid Claude models are performing reliably, unlike some free Llama models facing rate limit messages despite light usage.
- Custom Provider Keys Inquiries: Members inquired about requesting custom provider keys and their potential benefits beyond account maintenance. There's a curiosity about how these keys might enhance their projects.
- A request was made to access the beta feature using provider keys, with other members expressing eagerness to explore custom provider keys functionalities.
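For those running the migration test mentioned above, a quick sketch with `requests`; the `/api/alpha/chat/completions` path is from the announcement, while the base URL, model slug, and payload shape follow OpenRouter's usual chat schema and should be treated as assumptions.

```python
import requests

resp = requests.post(
    "https://openrouter.ai/api/alpha/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "anthropic/claude-3.5-haiku",  # illustrative model slug
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=60,
)
print(resp.status_code)  # watching for the 524s the migration aims to eliminate
```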
aider (Paul Gauthier) Discord
- Aider 0.62 Feature Boost: Aider v0.62 introduces full support for Claude 3.5 Haiku, achieving a 75% score on the code editing leaderboard.
- This update includes the ability to apply file edits from ChatGPT or Claude and addresses bugs related to creating new files.
- LLM Performance: Sonnet vs Haiku: Members reported that Sonnet outperforms Haiku for coding and debugging tasks, despite Haiku's lower cost.
- Comparisons with Qwen 2.5 revealed that it handles coding tasks better than Llama 3.1 405B.
- Aider Configuration Management: Users can configure Aider settings using `.aider.model.settings.yml` and manage API keys with a `.env` file.
- Challenges were discussed regarding the setup of OLLAMA_API_BASE, with some users questioning the necessity of manual command specifications.
- Integrating DeepSeek with Llama.cpp: Multiple members shared their experiences running DeepSeek-V2.5 with llama.cpp, citing challenges with model size and template compatibility.
- While some achieved success with specific models, others encountered frequent errors and template mismatches.
- Command Execution Errors in Aider: A member reported that the /lint command fails to execute due to missing file specifications, although it works in the console.
- Other users confirmed that internal server errors from Anthropic may cause similar issues when executing commands within Aider.
Nous Research AI Discord
- TEE_HEE_HE Twitter Account Rescued: The team is working to get the TEE_HEE_HE Twitter account unrestricted, and it appears to be operating again as of now.
- Community members expressed excitement about interacting with the account after its reactivation.
- Hermes 405B Free Access Returns: Hermes 405B is operational again on PlayAI - HERmes, albeit with some lag.
- The functionality was highlighted as crucial, confirming that accessibility takes precedence despite performance issues.
- Funding Opportunities for ML Projects: A user discussed applying for Microsoft for Startups to obtain funding for their ML project, sharing eligibility criteria.
- They noted the potential for $150,000 in Azure credits and advised having a clear business plan for a successful application.
- Venice AI Launches Hermes 3 Abliterated: Venice AI has launched Venice.ai, introducing a new version of Hermes 3, called Abliterated, which offers reduced censorship for users.
- The service aims to provide an uncensored and private alternative to mainstream AI applications, emphasizing user privacy.
- High Costs in OpenAI Eval Feature: A user shared concerns about the high costs associated with OpenAI's eval feature while experimenting with different prompts.
- They emphasized the need for clear data formatting to streamline future research and improve data collection efficiency.
Eleuther Discord
- lm_eval encounters 'out of memory' error: While running lm_eval across 8xH100 using accelerate, a user encountered an `include/alloc.h:103 NCCL WARN Cuda failure 2 'out of memory'` error after all loglikelihood requests.
- Manually adjusting the batch size resolved the issue (a sketch follows this summary), and the user plans to submit an issue to seek further assistance from the community.
- Challenges in Hardware-aware Algebraic Rewrites: Members discussed the complexities in implementing hardware-aware algebraic rewrites, emphasizing the difficulty of translating theoretical improvements into practice.
- Chhillee noted that implementing such rewrites is generally hard, especially given the need for backward pass adaptations.
- Evolution of Flash Attention: Debate arose surrounding the development timeline of flash attention, with claims of internal implementations at major labs prior to its public release.
- Leloykun pointed out it took five years to refine the attention mechanism into its current form, though skepticism remains about earlier implementations.
- Exploration of Autoencoders Beyond LLMs: A member inquired about experiences with Autoencoders not related to LLMs, seeking insights from others.
- The response and expertise on this topic remained limited in the current discussion.
- NLP Faculty and Research at ETH/EPFL: EPFL and ETH Zurich were recommended for their competent NLP faculty when discussing research institutions in Switzerland.
- The conversation also considered whether the user was interested in opportunities within industry labs.
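Regarding the lm_eval fix above, a minimal sketch of pinning the batch size manually via lm-evaluation-harness's Python entry point instead of relying on automatic batching; the model and task names are illustrative.

```python
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Llama-3.1-8B",  # placeholder model
    tasks=["hellaswag"],                              # placeholder task
    batch_size=4,  # explicit value instead of "auto" to avoid the NCCL OOM
)
print(results["results"])
```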
Stability.ai (Stable Diffusion) Discord
- Stable Diffusion Installation on Windows 11: A member requested assistance with installing Stable Diffusion on Windows 11 and was directed to check pinned messages for comprehensive guides.
- Another user inquired about recommended checkpoints, highlighting the community's emphasis on reliable model configurations.
- SDXL Image Generation Issues: A new user expressed frustration with low-quality images generated by the SDXL model, suggesting potential misconfigurations.
- Members offered various suggestions for image size and step settings to better align with SDXL requirements.
- Exploring Outpainting Techniques: Discussion emerged around expanding images using outpainting techniques similar to popular trends on TikTok.
- Resources such as Outpainting Automatic1111 and Stable Diffusion Art's guide were shared to facilitate these methods.
- ControlNet Models in Stable Diffusion: A member queried the effectiveness of controlnet-union-sdxl compared to individual ControlNet models.
- Insights were provided on the differences in model quality and discussions on potential improvements for ControlNet integrations.
- AI Image Expansion Tools: Debate arose over the terminology and applications for AI image expansion, mentioning tools like Videoleap and CapCut.
- Despite disagreements, members clarified the capabilities and limitations within the context of AI image manipulation using the mentioned tools.
LM Studio Discord
- LM Studio Portable Version: Members inquired about running LM Studio from a USB drive, confirming that a portable version is not currently available.
- A suggestion was made to create a portable version using a script, encouraging users to search for such scripts within the Discord community.
- Intel E-Cores Performance in LM Studio: The utilization of Intel E-Cores in LM Studio was debated, with recommendations to limit threads to performance cores for enhanced efficiency.
- Consensus indicated that while reducing thread count improves performance, the speed gains might be negligible for certain use cases.
- Auto Load Models Feature in LM Studio: A request was made for an Auto Load Models feature in LM Studio, addressing the inconvenience of manually selecting models upon each launch.
- Community members discussed potential workarounds, including scripting solutions to automate model loading after the UI initializes.
- Llama 3.2 Vision Support: Llama 3.2 Vision integration was highlighted, noting its presence in Ollama and partial support in MLX.
- Anticipation was expressed for upcoming MLX updates that would fully support Llama 3.2 Vision, enhancing visual functionalities within LM Studio.
- LLM Benchmarking Standards: A proposal was made to establish an LLM Benchmark akin to 3DMark to standardize performance assessments of specific builds and software versions.
- Such a benchmark would facilitate the creation of performance rankings and tiers, providing clearer metrics for evaluating model efficiencies.
Notebook LM Discord Discord
- NotebookLM Syncs with Google Drive: A feature suggestion was made to integrate an auto-sync for Google Drive in NotebookLM, targeting a boost in productivity by reducing manual syncing.
- Users currently sync approximately 70 times daily, expressing hopes that this integration could significantly decrease their workload.
- Diarization Enhances Podcast Transcripts: Diarization technology was discussed as a method for creating clear podcast transcripts by separating speakers in recordings.
- A member shared code details, providing insights into the practical implementation of this transcription technique; a minimal sketch follows this summary.
- Deepfakes vs Face Swap Technology: Members debated the distinctions between deepfake and face swap technologies, clarifying their respective methodologies.
- It was highlighted that while deepfakes utilize existing footage to alter faces, avatars serve as more synthetic representations.
- Avatars Transforming Video Podcasts: A user showcased utilizing avatars to capture podcast content as video, aiming to enhance audience engagement.
- They suggested refining this approach for Google's innovation pipeline to elevate the podcasting experience.
- Podcast Generation from Notes Simplified: zzzuuu revealed a method to generate podcasts directly from notes using the app's conversation feature, streamlining content creation.
- Despite the convenience, they lost the original reel link, underscoring the need for better link management within the feature.
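The diarization code mentioned above wasn't reproduced in the summary, but a minimal sketch along those lines is shown below, assuming pyannote.audio and a Hugging Face access token; the model name and file path are illustrative.

```python
from pyannote.audio import Pipeline

# gated model; requires accepting its terms and supplying an HF token
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="hf_...",
)
diarization = pipeline("podcast.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s - {turn.end:.1f}s")
```

Segments labeled per speaker can then be paired with a transcription pass to produce the speaker-attributed transcripts discussed above.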
GPU MODE Discord
- Advancements in FP8 Quantization: Discussions revealed that FP8 quantization operates using FP8 x FP8 tensor cores, with Neural Magic leveraging dynamic quantization for weighting during computations. Members analyzed performance differences between static and dynamic quantization, noting that static quantization outperforms dynamic at a batch size of 1.
- Benchmarks highlighted that static quantization yields better performance for single-instance operations, while discrepancies in testing showcased varying efficiencies across AWQ, static, and dynamic quantization methods; the static/dynamic distinction is sketched after this summary.
- Deploying Triton-Compiled PTX with CUDA: Members explored challenges in calling Triton-compiled PTX using CUDA launch outside of Python, seeking optimal launch parameters. Suggestions included utilizing ncu to determine precise block and grid sizes tailored to specific problem dimensions.
- Conversations also delved into optimizing Triton kernel configurations by avoiding `autotune` and employing predefined settings based on matrix dimensions, thereby enhancing warm-up times and accommodating different GPU architectures; a fixed-configuration sketch also follows this summary.
- Nebius Introduces Explorer Tier for GPUs: Nebius launched the Explorer Tier at $1.5 per GPU per hour for the NVIDIA H100 Tensor Core SXM GPU, targeting individual researchers and small projects. This tier offers immediate access without waiting lists, positioning itself competitively in the GPU rental market.
- Nebius solicits community feedback on the Explorer Tier and emphasizes their commitment to providing a robust self-service platform, ensuring ample A100/H100 GPU availability for both large-scale and individual computational needs.
- Liger Kernel v0.4.0 Expands AMD Support: The release of Liger Kernel v0.4.0 introduced full AMD GPU support, enabling multi-GPU training with a 26% speed increase. This update enhances compatibility and optimizes the training pipeline for AMD architectures.
- Additionally, proposals to improve RMSNorm aggregation through 2-level aggregation and the implementation of a GroupNorm kernel aim to maintain output parity with Torch's implementation, further refining kernel performance and consistency.
- JAX Implementation of Flux.1 Models: The community released a JAX implementation of Black Forest Labs' Flux.1 models, available on GitHub. This project invites contributions and addresses existing open issues to enhance the codebase.
- By leveraging JAX, the implementation aims to provide robust support for the Flux.1 family, encouraging collaboration and innovation within the development community.
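To make the static-versus-dynamic distinction above concrete, a rough PyTorch sketch: static quantization fixes the scale offline from calibration data, while dynamic quantization recomputes it from each live activation. This is illustrative only, not Neural Magic's implementation.

```python
import torch

F8 = torch.float8_e4m3fn
F8_MAX = torch.finfo(F8).max

def quantize_static(x: torch.Tensor, scale: float) -> torch.Tensor:
    # scale was computed offline from calibration data: no runtime overhead
    return (x / scale).clamp(-F8_MAX, F8_MAX).to(F8)

def quantize_dynamic(x: torch.Tensor):
    # scale derived from the live tensor: extra work on every call, which is
    # one reason static can win at batch size 1
    scale = x.abs().max().item() / F8_MAX
    return (x / scale).clamp(-F8_MAX, F8_MAX).to(F8), scale
```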
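And for the autotune discussion, a minimal Triton sketch of the fixed-configuration approach: the block size is a predefined constant chosen for the problem shape rather than discovered by `@triton.autotune`, which removes autotuning warm-up. A vector add stands in for the matmul kernels actually discussed.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

def add(x, y, block=1024):  # predefined block size instead of autotune
    out = torch.empty_like(x)
    grid = (triton.cdiv(x.numel(), block),)
    add_kernel[grid](x, y, out, x.numel(), BLOCK=block)
    return out
```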
LlamaIndex Discord
- NVIDIA Developer Contest Deadline: The submission deadline for the NVIDIA Developer Contest is November 10th, offering prizes like the NVIDIA® GeForce RTX™ 4080 SUPER GPU and DLI credits.
- Running from August 27th to November 10th, the contest encourages developers to create innovative RAG applications powered by NVIDIA and LlamaIndex technologies.
- Automated Resume Insights Tutorial: A member shared a tutorial on building an automated resume insights agent utilizing core parsing, extraction, and structured output modules.
- This practical example highlights AI's potential in streamlining recruitment processes and improving candidate evaluations.
- Citation Query Engine Enhancement: A user sought guidance on enhancing citations in LlamaIndex, indicating the existing citations query engine was insufficient.
- Another member recommended checking the Citation Query Engine Implementation for enhanced customization; a usage sketch follows this summary.
- Parsing Excel Files with LlamaParse: A user inquired about parsing and indexing messy Excel files, considering converting sheets to markdown for embedding into vectordb.
- It was suggested to try LlamaParse, despite the user noting that data could not leave their cloud platform for the project.
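A hedged usage sketch of the citation engine referenced above; the import path follows recent `llama_index.core` layouts and the index construction is illustrative.

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.query_engine import CitationQueryEngine

docs = SimpleDirectoryReader("docs").load_data()  # placeholder directory
index = VectorStoreIndex.from_documents(docs)
# citation_chunk_size controls the granularity of the [n] source citations
engine = CitationQueryEngine.from_args(index, citation_chunk_size=512)
print(engine.query("What does the report conclude?"))
```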
Latent Space Discord
- Hunyuan-Large Release Outpaces Competitors: Tencent released the Hunyuan-Large, a 389B MoE model, claiming it outperforms DeepSeek-V2 and Llama3-405B with less data usage. Read the paper for more details.
- Discussions arose about its open-source status, with skepticism around model weights being equivalent to source code.
- Integuru AI Agent Faces Viability Doubts: The Integuru AI agent is viewed pessimistically, described as "very very brittle" and potentially failing due to integration maintenance challenges.
- Members expressed concerns about long-term viability with API changes affecting performance, suggesting the need for a fallback approach with a visual sandbox.
- OpenAI Acquires Premium chat.com Domain: chat.com recently changed ownership, previously bought by Dharmesh for over $10 million, now speculated to have been purchased by OpenAI for $15-25 million.
- This sale ranks among the highest for a domain name, sparking discussions on its implications for OpenAI's branding within the AI chat landscape.
- Scale AI Launches Defense Llama for National Security: Scale AI announced Defense Llama, an LLM tailored for American national security, developed in collaboration with Meta and defense experts.
- The model is now available for integration into US defense systems, highlighting the trend of specialized models in sensitive applications.
- Perplexity's Funding Raises Sustainability Concerns: Perplexity is raising funds for the fourth time this year at a 180x multiple on projected revenue.
- This high valuation has led to debates over market sustainability, with critics questioning the long-term viability of such funding rounds.
Interconnects (Nathan Lambert) Discord
- Google's AI Agent Jarvis Reveal: A tweet announced that Google inadvertently revealed its computer-based AI agent, Jarvis.
- This revelation sparked discussions on social media's reaction, with members anticipating increased excitement around the new AI agent.
- Perplexity's Valuation Amid Legal Battles: According to a tweet, Perplexity, an AI search startup, is approaching a 180x multiple on forward revenue despite ongoing legal disputes with NYT and other publishers.
- This potential valuation has drawn attention from the community, even though some members expressed confusion regarding the startup's operational model.
- Language-Legal Domain Intersection: Swedish in German: A recruiter shared an example involving 'Swedish law' written in German, illustrating the intersection of specific languages and legal domains.
- Another member highlighted that for Americans, this intersection isn't niche, as Sweden and Germany engage in significant business interactions.
- ChatGPT Performance Tracking and Prompt Drift: Discussions emphasized the importance of prompt changes and the need for metrics beyond subjective perceptions to evaluate ChatGPT's performance.
- Members speculated that ChatGPT likely utilizes a sophisticated tracking system to monitor performance intricacies related to different prompts.
- Internal GPU Issues and SSH Access to V100s: natolambert expressed a desire to share some internal GPU drama, shedding light on potential issues within the organization.
- xeophon. offered SSH access to their V100 GPU resources, demonstrating community willingness to assist amidst the internal challenges.
Cohere Discord
- Cohere's Bing-Powered Search Snippets Unveiled: A member speculated that ChatGPT and similar models leverage the Bing API to generate responses, utilizing snippets from various web sources.
- The precise decision-making process regarding the balance between search results and training data remains unclear.
- Embed3's Multimodal Marvel: Advancing Beyond CLIP: A member expressed enthusiasm about initiating projects with embed3-multimodal embeddings, considering it a significant advancement over prior models like CLIP.
- Their current focus involves developing a parsing service integrated with PostgreSQL utilizing Cohere.embed3; an embedding sketch follows this summary.
- Parsing Preferences: API Services Trump Self-hosted for Start-ups: The discussion highlighted various parsing services, noting the effectiveness of Upstage/Pymu4PDF compared to pricier alternatives like Marker.
- While self-hosting benefits those with ample compute resources, a member advocates for API services as more suitable for start-up requirements.
- Cohere's Reranker: API-Exclusive Access Confirmed: A user inquired about the availability of Cohere reranker through the API.
- Another member confirmed that it is only available via the API.
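As context for the embed3 project above, a rough sketch of calling Embed v3 through Cohere's Python SDK; the image path uses the base64 data-URI convention from Cohere's multimodal launch, and all parameter names should be checked against the current SDK.

```python
import base64
import cohere

co = cohere.Client("<COHERE_API_KEY>")

# text side: embed document chunks for a PostgreSQL/pgvector store
doc_emb = co.embed(
    texts=["Quarterly revenue grew 12 percent..."],
    model="embed-english-v3.0",
    input_type="search_document",
).embeddings

# image side: Embed 3 multimodal accepts base64 data URIs
with open("chart.png", "rb") as f:
    data_uri = "data:image/png;base64," + base64.b64encode(f.read()).decode()
img_emb = co.embed(
    images=[data_uri],
    model="embed-english-v3.0",
    input_type="image",
).embeddings
```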
OpenAI Discord
- AI Storytelling Gets a Makeover: A member expressed genuine surprise at how well AI now writes stories, noting that earlier outputs were boring and predictable.
- They mentioned feeling pleasantly surprised by the current quality, despite creating the prompts themselves.
- GitHub Copilot Unveils Sonnet and o1: GitHub Copilot now includes Sonnet alongside o1, indicating continuous enhancements in AI coding assistance tools.
- This update suggests ongoing improvements aiming to provide developers with more versatile coding options.
- LLM Hallucinations in Summarization Workflows: A member raised concerns about potential hallucinations in document summarization using GPT-4o, especially when scaling to production.
- Another member suggested implementing a second LLM pass for fact-checking to mitigate these risks; a two-pass sketch follows this summary.
- Essence of Human Oversight in LLM Summaries: Participants emphasized the necessity of involving human subject matter experts when employing powerful models for summarization tasks.
- "You really just gotta have that human… in the loop to keep an eye on things and doublecheck," highlighted the importance of human oversight.
- Overcoming JSON Data Handling and Token Limits: Users discussed challenges with processing large JSON files due to token limits, leading to incomplete data handling.
- Solutions like chunking data were considered (sketched after this summary), although alternative methods are sought to avoid complicating future tasks.
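A sketch of the two-pass arrangement suggested above, using the OpenAI Python SDK; the prompts and model choices are illustrative, and the verifier output would still go to the human reviewer emphasized in the discussion.

```python
from openai import OpenAI

client = OpenAI()

def summarize_with_check(document: str):
    summary = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": f"Summarize this document:\n\n{document}"}],
    ).choices[0].message.content
    # second pass: a temperature-0 fact-check against the source
    audit = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[{"role": "user",
                   "content": ("List any claims in SUMMARY not supported by SOURCE.\n\n"
                               f"SOURCE:\n{document}\n\nSUMMARY:\n{summary}")}],
    ).choices[0].message.content
    return summary, audit
```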
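And a minimal version of the chunking workaround for large JSON files, assuming a top-level array of records; the chunk size would be tuned to the model's context window.

```python
import json

def json_chunks(path: str, records_per_chunk: int = 50):
    with open(path) as f:
        records = json.load(f)  # assumes a top-level list of records
    for i in range(0, len(records), records_per_chunk):
        yield json.dumps(records[i : i + records_per_chunk])

# each yielded chunk becomes its own request; results are merged afterwards
```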
tinygrad (George Hotz) Discord
- Minimal TokenFormer Ported to Tinygrad: A minimal implementation of TokenFormer has been successfully ported to tinygrad, enhancing both inference and learning capabilities. The repository is available on GitHub.
- This port aims to improve model implementation and performance, with discussions focused on potential future integrations with other frameworks.
- Hailo Reverse Engineering Initiated: A member has begun the Hailo reverse engineering process to develop a new accelerator, expressing concerns about compiling Kernels multiple times when interfacing ONNX, Tinygrad, and TensorFlow.
- They aim to maintain kernel consistency across runs, especially using `BEAM=2`, to optimize the reverse engineering effort.
- CUDA WMMA Layout Discrepancies: Questions arose regarding the layout of A in CUDA WMMA as it deviates from the NVIDIA documentation.
- Clarifications were sought on ops_python mapping functions to resolve mismatches with the actual TC implementation.
- Tinygrad Enhancements and Collaborations: The community discussed enhancements to tinygrad, including improving model implementation and exploring integrations with other frameworks.
- Members expressed interest in collaborative development and suggested organizing monthly meetings to discuss ongoing projects and gather feedback.
- Performance Metrics for Tinygrad Models: A discussion emerged around establishing performance metrics for models implemented in tinygrad, with suggestions for standardized benchmarking.
- Community members agreed that shared metrics would aid in evaluating progress and attracting more users to the project.
OpenInterpreter Discord
- Seeking Standards for Tool Interfaces: A member discussed comparative tool interfaces, highlighting the need for standardization amid diverse frameworks.
- Another member humorously pointed out the challenge in providing specifics due to the numerous frameworks available.
- OS Mode Now Supports Only Anthropic Models: Members confirmed that the new OS mode exclusively supports Anthropic models, with fixes expected shortly.
- One member mentioned attempting a demo at a house party the next day.
- Claude Computer Control Explained: OS mode utilizes Claude Computer Control to execute mouse clicks, as detailed in the code.
- A member sought clarification on how prompts translate to desktop actions, including code generation and mouse clicking.
Modular (Mojo 🔥) Discord
- C_Buffer structure optimization boosts performance: A member announced changes to the C_Buffer structure, anticipating improved performance results as they develop their matmul kernel in Mojo.
- They credited the community for the insights that led to using pointers instead of lists, resulting in a faster implementation.
- Pointers enhance Mojo's matmul kernel: By switching from a list to pointers, a member reported accelerated performance in their matmul kernel within Mojo.
- This change is expected to streamline computations and leverage Mojo's capabilities more effectively.
- Bounds checks affect list structure performance: A member sought information on specific additional security bounds checks that are slowing down the list structure.
- Another member explained that these checks are generic across most programming languages except C, referencing C++'s recommended indexing methods.
OpenAccess AI Collective (axolotl) Discord
- ScheduleFree SOAP Efficiency Improvements: The ScheduleFree SOAP implementation is reported to be more compute-efficient, memory-efficient, and converges faster than traditional SOAP by enabling higher learning rates.
- These efficiency gains position it as a competitive optimizer, particularly focusing on fast _foreach and PaLM versions.
- Hyperparameter Adjustments for ScheduleFree SOAP: Optimal performance with ScheduleFree SOAP requires adjusting hyperparameters: it uses PaLM's beta2 schedule, renaming 'betas' to 'beta', and supports a 10x increase in learning rates.
- Warmup is essential, with a recommended 10% in literature, though 100 steps can be sufficient to initiate effective training.
- Declining Interest in MoEs and Model Merging Post-Llama 3.2: A member highlighted a decrease in discussions around Mixture of Experts (MoE) models and model merging since the release of Llama 3.2.
- This suggests a shift in focus and questions the current relevance of these strategies in the evolving landscape.
- CAME vs ScheduleFree SOAP Comparative Analysis: There is an ongoing discussion comparing ScheduleFree SOAP with CAME, focusing on performance metrics and efficiencies.
- This comparison reflects the community's interest in evaluating the latest advancements in optimization techniques.
- Zero2 Performance Issues and Zero1 Troubleshooting: Zero2 has been reported to be extremely slow, leading users to consider returning to Zero1 while seeking fixes.
- Users are actively exploring solutions to enhance Zero1's performance as a fallback option.
LAION Discord
- Resemble Enhance Critiqued for Artifacts: A user inquired about a speech enhancer and was directed to Resemble Enhance.
- Spirit from Germany tested it and found the results to be underwhelming due to the presence of artifacts.
- Speech Enhancers' Performance Under Scrutiny: The community discussed the performance of various speech enhancers, sharing their experiences.
- Concerns regarding artifacts and the overall effectiveness of tools like Resemble Enhance were prominently highlighted.
DSPy Discord
- RLhF Queries Open-World Reward Translation: A member raised a theoretical question about the RLhF (Reinforcement Learning from Human Feedback) paradigm, specifically regarding how to translate textual feedback into numerical rewards in open-world scenarios, beyond simple hard labeling.
- "Isn't there any other way apart from hard labeling?" suggests curiosity about more flexible feedback mechanisms.
- DSPy System Docs Show Limited Component Details: Another member reported that in a serialized multi-component DSPy system, the `lm.history()` function only displays the doc string for the first component, with intermediate classes providing less detail.
- This raises questions about whether this behavior is expected or indicates a limitation in how documentation is generated for complex systems.
Torchtune Discord
- KD-div's Cross-Entropy Misinterpretation: It's highlighted that while referred to as KD-div, the returned value is actually cross-entropy, potentially causing misinterpretation when comparing with other loss functions like KL-div.
- The confusion particularly arises during the process of swapping teacher and student logits, often termed as reverse KL.
- Cross-Entropy Optimizes Label Evolution: A viewpoint suggests that optimizing for cross-entropy feels more intuitive, extending the loss from regular hard labels to soft labels produced by a teacher model.
- This perspective emphasizes the natural progression from hard labels in training to soft labels in fine-tuning.
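For reference, the identity behind both points: with teacher distribution $p$ and student $q$,

$$\mathrm{KL}(p \,\|\, q) = H(p, q) - H(p)$$

Minimizing the returned cross-entropy $H(p, q)$ over the student therefore gives the same gradients as minimizing KL, since the teacher entropy $H(p)$ is constant, but the reported loss values differ by $H(p)$, which is exactly the misinterpretation risk noted above. Swapping teacher and student logits ("reverse KL") instead targets $\mathrm{KL}(q \,\|\, p)$, where the entropy term depends on the student and this equivalence no longer holds.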
The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The LLM Finetuning (Hamel + Dan) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
Perplexity AI ▷ #announcements (1 messages):
U.S. Presidential Race
Election Hub
- Live Tracking of U.S. Presidential Race: The Perplexity Team announced they will track the U.S. Presidential race results state-by-state, with live counts.
- Users can access this information through the election hub.
- Election Hub Launch: An election hub has been launched to provide real-time results for the upcoming presidential race in the U.S.
- This hub will facilitate users in effectively monitoring the election outcomes as they unfold.
Perplexity AI ▷ #general (253 messages🔥🔥):
Perplexity Pro Features
Claude Model Comparison
Subscription Activation Issues
AI Model Performance
User Experience Feedback
- Concerns Over Opus Removal: Users expressed disappointment over the removal of the Opus model in Perplexity, and discussed the perceived benefits of the Sonnet and Haiku models for programming and writing.
- Some users noted that despite the changes, other models can still be effective depending on the specific tasks.
- Subscription Issues: A user reported difficulties activating their subscription through a promotional code linked with their bank, raising concerns about the partnership benefits not being recognized.
- Other users suggested possible bugs within the website, and one mentioned the use of the complexity extension as a workaround.
- User Experience with Mobile App: Some users noted bugs that affected the visibility of features like focus mode, prompting discussions on workarounds and troubleshooting techniques.
- Users mentioned refreshing the page could temporarily bring back the feature but it ultimately disappears again.
- Comparison of AI Models: Discussions included comparisons between Perplexity and other models like Claude and gpt-4o, assessing their strengths in coding and creative tasks.
- Users highlighted that for smaller programming tasks, the choice of model may not significantly impact performance.
- Perplexity as a Research Tool: Users discussed the effectiveness of Perplexity as a search engine for academic needs, especially for programming homework and data comprehension.
- Despite some model limitations, many found the platform beneficial for straightforward research tasks.
Links mentioned:
- Tweet from Aravind Srinivas (@AravSrinivas): “Perplexity’s tool did not end up making any gaffes last night, providing mostly accurate voting information and also accurately tracking the results as they came in” - Wired
- Introducing the next generation of Claude: Today, we're announcing the Claude 3 model family, which sets new industry benchmarks across a wide range of cognitive tasks. The family includes three state-of-the-art models in ascending order ...
- Reddit - Dive into anything: no description found
Perplexity AI ▷ #sharing (13 messages🔥):
Powershell Coding
Distinguishing Techniques
Differences Explained
Utilizing Resources Effectively
Creator Economy Size
- Mastering PowerShell Coding: A member sought assistance on how to write PowerShell code that meets specific requirements.
- Learning advanced coding techniques can significantly aid in automation tasks.
- Easily Distinguishing Techniques: One user asked about how to easily distinguish different concepts in a particular field.
- Clarity in identification can streamline learning and application processes.
- Explaining Key Differences: A member requested clarification on the differences between two specific subjects.
- Understanding nuances can lead to better comprehension and informed discussions.
- Effective Resource Utilization: Several members discussed strategies on how to utilize resources effectively in various contexts.
- Optimal usage enhances productivity and achieves desired outcomes efficiently.
- Insights on Creator Economy Size: One inquiry was made about the scale of the creator economy, exploring its impact on industries.
- Understanding its growth can provide insights into modern economic trends.
Perplexity AI ▷ #pplx-api (6 messages):
Llama 3.1 API Pricing
Return Citations Functionality
Haiku 3.5 Limits
Translation Requests
- Llama 3.1 Sonar API Costs: A member inquired about the cost of the Llama 3.1 Sonar 70B API for 1 million tokens and shared a link to the pricing guide.
- The link appears to provide relevant details, but specifics on pricing remain unclear.
- Inconsistent 'return_citations' Functionality: A member sought assistance regarding the 'return_citations' feature, noting it sometimes fails to retrieve sources despite having access.
- They need clarification on whether the issue originates from their coding or the API's responsiveness.
- Curiosity about Haiku 3.5 Limits: A member asked about the limit for Haiku 3.5, indicating interest in understanding its constraints.
- No additional details were provided regarding specific limitations or capabilities.
- Translation Inquiry: One member requested help with translating all messages into French.
- This highlights a need for multilingual support in the discussions taking place.
Unsloth AI (Daniel Han) ▷ #general (101 messages🔥🔥):
Unsloth updates
SFT and DPO fine-tuning
Model training issues
NVIDIA feedback request
ECommerce app ideas
- Unsloth team addresses recent PR issues: Users reported problems after a recent PR, leading to suggestions to reinstall the previous version from GitHub as a fix.
- The issue appears resolved following team updates, with some users confirming operational improvements.
- Discussion on SFT and DPO integration: Community members debated using existing SFT datasets for DPO fine-tuning, with some noting the need for correct formatting (an example record shape appears after this list).
- Accepted practices involve placing context in every dataset entry to ensure clarity during both training and inference.
- Fine-tuning SmolLM2 proves challenging: A user encountered issues with endless output when fine-tuning SmolLM2, despite EOS tokens being included in the dataset.
- The community confirmed ongoing errors with the model, with updates anticipated from Hugging Face.
- NVIDIA seeks insights from non-developers: The NVIDIA GeForce RTX team solicited feedback from AI enthusiasts in the community for product direction insights.
- Non-developers were targeted for their unique perspectives, highlighting the varied user experiences with AI tools.
- User exploration of ECommerce app development: A new user expressed interest in developing an OpenAI-based app aimed at revolutionizing ECommerce and seeking investors.
- Suggestions included potentially hiring someone to assist with development and investment efforts.
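For reference, DPO-style trainers such as TRL's DPOTrainer conventionally take one record per preference comparison; the keys below follow that convention and are an assumption, not something confirmed in the thread.

```python
# One record per comparison; the shared context is repeated in every prompt,
# matching the "context in every dataset entry" practice described above.
example = {
    "prompt": "### Context\nOur store ships worldwide.\n"
              "### Question\nSummarize the refund policy.",
    "chosen": "Refunds are available within 14 days of delivery for unused items.",
    "rejected": "I'm not sure.",
}
```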
Links mentioned:
- Google Colab: no description found
- 10 Minute Meeting - Asli Sabanci: Hi there!As the NVIDIA GeForce RTX team, we're seeking input from community’s AI enthusiasts to guide the future product direction and roadmap. We'd love to meet some of you with low / no codi...
- Debate The Debate GIF - Debate The debate Trump - Discover & Share GIFs: Click to view the GIF
- importlib.metadata.PackageNotFoundError: No package metadata was found for The 'unsloth' distribution was not found and is required by this application · Issue #124 · unslothai/unsloth: training env: LLaMaFactory `01/24/2024 01:53:50 - INFO - llmtuner.model.patcher - Quantizing model to 4 bit. Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/trans...
- Bug fixes by danielhanchen · Pull Request #1245 · unslothai/unsloth: no description found
Unsloth AI (Daniel Han) ▷ #off-topic (27 messages🔥):
NVIDIA GeForce RTX Community Input
NIM API Feedback
Model Performance for Finnish Language
Scam Alerts in Discord
Experience with AI Tools
- NVIDIA GeForce RTX seeks community insights: The NVIDIA GeForce RTX team is gathering feedback from AI enthusiasts on their experience with AI tools, expressing a particular interest in non-developer perspectives. They encourage scheduling a quick chat using this link.
- A member noted that the community's input could significantly influence the development of future NVIDIA products.
- Feedback on NIM APIs requested: One member suggested enabling a pay-as-you-go model for using NIM APIs instead of the current credit system, stating that it seemed confusing. Asli from NVIDIA acknowledged the sentiment and mentioned that downloadable containers allow users to run models on their own GPUs.
- They highlighted a desire for more options and noted challenges with using heavier models without direct access to suitable hardware.
- Discussion about model performance in Finnish: Members shared positive feedback regarding models like Nemotron-340B-Reward and Llama-Nemotron-70B, noting their effectiveness in generating synthetic data in Finnish. The discussion revolved around the difficulties of running inference on large datasets with limited resources.
- The appreciation for these models indicated a demand for improved access to computational resources for extensive analyses.
- Scam alerts in the community: Members expressed concern about a potential scammer present in the Discord, urging fellow users to stay vigilant. One member reported the issue, prompting others to report messages from the suspected account.
- This highlighted a recurring issue within the community regarding scams, reflective of similar problems experienced in the past.
Links mentioned:
- 10 Minute Meeting - Asli Sabanci: Hi there!As the NVIDIA GeForce RTX team, we're seeking input from community’s AI enthusiasts to guide the future product direction and roadmap. We'd love to meet some of you with low / no codi...
- AMD just deleted Intel – 9800X3D: Check prices on Amazon belowAMD 9800X3D: https://geni.us/ySD8oAMD 9600X: https://geni.us/sDKtAMD 7800X3D: https://geni.us/xyyXANeed a new wallpaper? https://...
Unsloth AI (Daniel Han) ▷ #help (41 messages🔥):
Language Translation Data Formatting
Model Performance Issues
Fine-tuning Llama Models
Integration of Fine-tuned Models
Output Handling in Model Generation
- Suggestions for Formatting Language Translation Data: A user inquired about formatting training data based on language translations, experimenting with various formats without success using Unsloth inference.
- Another member asked for clarification on whether the issue was related to the inference method being used.
- Model Performance Concerns with Qwen and Llama: Concerns were raised regarding Qwen 2.5 1.5B hallucinating despite adding 'End of text' to the dataset, noting it performs differently than Llama 1B.
- A member suggested that Qwen might require additional training when compared to Llama.
- Fine-tuning Llama Models on Indexed QA: A user expressed interest in fine-tuning Llama 3B on indexed QA using QLora or LoRa techniques, seeking guidance on the process.
- They mentioned successfully fine-tuning an Unsloth/Llama model for integration into a personal website chatbot.
- Challenges with Model Integration: A member encountered issues integrating their fine-tuned model with MLC LLM, prompting a request for help from the community.
- They reported successfully fine-tuning a model on personal information, but struggled with integrating the model into the client-side application.
- Handling Model Output Efficiently: A user sought advice on extracting pure text output from a generated model response, sharing a snippet of their code.
- Another member suggested printing the entire output first to identify the correct object structure, leading to a solution that strips the response down to just the generated text (see the sketch after this list).
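The stripping fix usually reduces to slicing off the prompt tokens before decoding; a minimal sketch with Hugging Face transformers (gpt2 here is just a stand-in model):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=16)

prompt_len = inputs["input_ids"].shape[1]
new_tokens = output_ids[0][prompt_len:]   # drop the echoed prompt tokens
text = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
print(text)   # pure generated text, no prompt and no special tokens
```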
HuggingFace ▷ #general (147 messages🔥🔥):
Speculative Decoding
AWS Q for Fine-tuning
Song Generators
Gradio Interface
Image Similarity with CLIP
- Discussion on Speculative Decoding Efficiency: Members discussed the implementation of speculative decoding in models, noting that it speeds up inference by using a smaller draft model for initial token predictions (a minimal assisted-generation sketch follows this list).
- Speculative decoding retains accuracy while achieving faster speeds, making it popular among various AI companies.
- Exploration of AWS Q for Fine-tuning: A user inquired about the success of using AWS Q for agent building, particularly for fine-tuning and real-time data processing.
- They contemplated whether it would be more viable than utilizing models from Hugging Face for their domain knowledge set.
- Song Generators for Karaoke: Discussions revealed various tools available for generating songs, with notable mentions including Musicgen for creating lyrics and melodies.
- For karaoke-like needs, users pointed out other models capable of generating music without lyrics, such as stable-audio.
- Using Gradio for Interface Building: A user sought to understand how Gradio works, especially in interfacing between Svelte and a Python backend while building a similar application.
- They expressed interest in high-level tools and potential resources for easing their development process.
- Fine-tuning CLIP for Image Similarity: There was a query regarding the best methods to fine-tune CLIP for pure image similarity tasks, focusing on positive and negative pairs.
- Members discussed the expectations of unique indices for timestamps in univariate time series predictions using transformers.
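For the speculative decoding discussion above: Hugging Face transformers exposes this as assisted generation via the assistant_model argument to generate(), where the small draft model proposes tokens and the large model verifies them, so outputs match the large model's distribution. Model choices below are illustrative; draft and target must share a tokenizer.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
draft = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # same tokenizer

inputs = tokenizer("Speculative decoding works by", return_tensors="pt")
output = model.generate(**inputs, assistant_model=draft, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```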
Links mentioned:
- Oasis: no description found
- minchyeom/birthday-2 · Hugging Face: no description found
- LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding: We present LayerSkip, an end-to-end solution to speed-up inference of large language models (LLMs). First, during training we apply layer dropout, with low dropout rates for earlier layers and higher ...
- Cute Pinch GIF - Cute Pinch So Fluffy - Discover & Share GIFs: Click to view the GIF
- The Universe Tim And Eric Mind Blown GIF - The Universe Tim And Eric Mind Blown Mind Blown Meme - Discover & Share GIFs: Click to view the GIF
- Reddit - Dive into anything: no description found
HuggingFace ▷ #today-im-learning (5 messages):
Building a GPT model
Top-p sampling
Transformer architecture
BERT model plans
- Successfully built a GPT model: A user shared that they built a GPT model with 4 layers of transformer decoders, using 4 attention heads and a block_size of 64 tokens.
- The model generates responses up to 128 tokens and is primarily focused on producing NLP-related content.
- Understanding top-p sampling: They implemented top-p sampling during inference to enhance the model's decoding process (a from-scratch version appears after this list).
- This method aims to improve the quality of generated text, comparable to models like ChatGPT.
- Future plans for BERT and seq2seq models: The user expressed their intention to build a BERT model next, followed by a full seq2seq model using transformers.
- They noted the difference in focus between GPT, which generates content, and BERT, which derives more nuance from inputs.
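A from-scratch version of the top-p (nucleus) sampling step described above, as one might write it for a small GPT (illustrative, not the user's code):

```python
import torch

def top_p_sample(logits: torch.Tensor, p: float = 0.9) -> int:
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = probs.sort(descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    # Keep the smallest prefix whose cumulative mass exceeds p (at least 1 token).
    cutoff = int((cumulative < p).sum()) + 1
    kept = sorted_probs[:cutoff]
    choice = torch.multinomial(kept / kept.sum(), num_samples=1)
    return int(sorted_idx[choice])

next_token = top_p_sample(torch.randn(50_000), p=0.9)
```

Keeping only the smallest prefix of the sorted distribution whose mass exceeds p prunes the low-probability tail while preserving diversity among plausible tokens.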
HuggingFace ▷ #cool-finds (1 messages):
gimmyalex3089: https://cohere.com/research/aya/aya-23-technical-report.pdf
HuggingFace ▷ #i-made-this (7 messages):
YOLOv5n6 Real-Time Object Detection
Upstage AI Hackathon
Contrastive Learning
JAX Implementation of Flux.1
Formula 1 Telemetry Analysis
- Explore the Upstage AI Hackathon: Participate in the Upstage AI Hackathon aimed at fostering collaboration in AI model development.
- Contributors are encouraged to join and enhance the project on GitHub.
- Unraveling Contrastive Learning Techniques: Our latest article delves into Contrastive Learning, a powerful self-supervised technique foundational to modern AI; learn about its principles, diverse formulations, and applications in detail here (a minimal loss sketch follows this list).
- The method has evolved since its introduction in 1993 and significantly impacts the Unsupervised and Self-Supervised Learning fields.
- JAX Implements Black Forest Labs' Flux.1: A new JAX implementation of Black Forest Labs' Flux.1 models has been released, inviting contributions and community engagement. You can find the project on GitHub here.
- Open issues are available for those interested in contributing to the ongoing development.
- Chat with F1 Telemetry Data: An innovative AI application now allows users to converse with telemetry data from Formula 1 races, generating detailed reports. Explore the project here for functionalities including text-to-SQL querying.
- The beta version offers insights into driver performance and race analysis, facilitating enriched discussions among F1 enthusiasts.
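As a companion to the article above, a minimal symmetric InfoNCE-style loss of the kind used by CLIP-like models (our own sketch; the article's formulations may differ):

```python
import torch
import torch.nn.functional as F

def info_nce(emb_a: torch.Tensor, emb_b: torch.Tensor, temperature: float = 0.07):
    # Row i of emb_a and row i of emb_b form a positive pair; every other
    # pairing in the batch acts as an in-batch negative.
    emb_a = F.normalize(emb_a, dim=-1)
    emb_b = F.normalize(emb_b, dim=-1)
    logits = emb_a @ emb_b.t() / temperature
    targets = torch.arange(logits.size(0))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = info_nce(torch.randn(32, 512), torch.randn(32, 512))
```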
Links mentioned:
- @Draichi on Hugging Face: "🏁 Now it is possible to chat with telemetry data from real Formula 1 races!…": no description found
- Brief Introduction to Contrastive Learning: Contrastive Learning is a powerful approach that has gained significant traction recently. This method distinguishes between similar and dissimilar data points without relying on explicit labels. The ...
- Josephgflowers/Differential-Attention-Liquid-Metal-Tinyllama · Hugging Face: no description found
- GitHub - Gimmyalex/upstage-ai-hackathon: Contribute to Gimmyalex/upstage-ai-hackathon development by creating an account on GitHub.
- GitHub - SanshruthR/CCTV_YOLO: Fast Real-time Object Detection with High-Res Output https://x.com/_akhaliq/status/1840213012818329826: Fast Real-time Object Detection with High-Res Output https://x.com/_akhaliq/status/1840213012818329826 - SanshruthR/CCTV_YOLO
- GitHub - ml-gde/jflux: JAX Implementation of Black Forest Labs' Flux.1 family of models: JAX Implementation of Black Forest Labs' Flux.1 family of models - ml-gde/jflux
HuggingFace ▷ #reading-group (1 messages):
west_ryder: 😝
HuggingFace ▷ #computer-vision (3 messages):
HuggingMod rate limiting
New Microsoft models
- HuggingMod posts too quickly: A member advised HuggingMod to slow down posting, noting that rapid messages could lead to issues.
- Please slow down a bit was the friendly reminder given, emphasizing community engagement.
- Excitement about New Microsoft Models: A member expressed excitement about new models released by Microsoft, stating that it was exactly what another member wanted.
- The anticipation for these new models indicates a growing interest in innovative tools from Microsoft.
HuggingFace ▷ #NLP (3 messages):
Function Default Arguments
MaskGCT and F5-TTS Streaming Capabilities
- Check Your Function Arguments: A member humorously asked if there was an error and suggested checking the default arguments inside the function that drops remaining docs.
- Yah ... ok lemme check, came the reply; another member asked whether anyone had prior experience and code they could refer to.
- Exploring MaskGCT and F5-TTS for Streaming: One member shared excitement about MaskGCT and F5-TTS, questioning if they could replace their current voice model while maintaining audio chunk streaming.
- They expressed doubt about the streaming capabilities of these non-autoregressive models.
OpenRouter (Alex Atallah) ▷ #announcements (3 messages):
API Migration
Latency Optimization
Completion API Updates
- API Migration eliminates 524 errors: The team has rebuilt their API and migrated Chatroom requests, which has resulted in zero 524 errors detected during tests.
- Stability needs to hold for a day before migrating the rest of the API, with users encouraged to test via `/api/alpha/chat/completions`.
- Predicted Outputs enhance editing latency: The new predicted outputs feature for OpenAI's GPT-4 models allows for improved latency in editing and rewriting tasks through the `prediction` property (a hedged request sketch follows this list).
- This is implemented by providing a content-based `prediction` that enhances performance during text transformations.
- Completion API revamped for speed: All completion API requests have transitioned to a newly rewritten API that promises enhanced speed and better performance.
- Users have been invited to report any issues using the designated feedback channel.
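A sketch of a predicted-outputs request against the alpha endpoint, assuming it accepts OpenAI's prediction field shape; treat the exact payload as unverified and check OpenRouter's docs:

```python
import requests

ORIGINAL_CODE = "def foo():\n    return 42\n"

payload = {
    "model": "openai/gpt-4o",   # illustrative model slug
    "messages": [{"role": "user",
                  "content": "Rename foo to bar:\n" + ORIGINAL_CODE}],
    # Predicted outputs: pass the text you expect to survive mostly unchanged.
    "prediction": {"type": "content", "content": ORIGINAL_CODE},
}
resp = requests.post(
    "https://openrouter.ai/api/alpha/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json=payload,
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```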
OpenRouter (Alex Atallah) ▷ #general (161 messages🔥🔥):
Hermes 3 performance
Claude API updates
Llama model comparisons
Rate limits and errors
PDF support for Claude
- Hermes 3 shows slow responses: Users reported varied response speeds for Hermes 3, with some experiencing delays due to internet issues.
- Despite initial concerns, it seems to be functioning again, though some still notice lag.
- Claude API migration and issues: There were reported 524 errors with the Claude family that seem to coincide with a migration to a new API, which was expected to resolve shortly.
- After the update, users noted that the service was functioning with occasional timeouts and suggested trying the new alpha endpoint.
- Feedback on Llama and Claude models: Users discussed Llama models, particularly the frustrations with free versions like Llama 3.1, experiencing rate limit messages despite light usage.
- In contrast, paid models like Claude are reportedly functioning without issues, though some noted inconsistencies.
- Rate limits and how to manage errors: Questions around rate limits and response handling led to discussions about handling `429` codes that may indicate provider issues (a simple retry-with-backoff sketch follows this list).
- The community shared strategies for parsing such errors and managing unexpected behaviors from various LLM providers.
- Potential PDF support for Claude models: Users inquired about support for PDF inputs in the new Claude Sonnet 3.5, particularly for its capabilities with visuals.
- While PDF support is still in beta, it is hinted that this functionality may become available through OpenRouter in the future.
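A simple way to handle the 429 responses discussed above is exponential backoff with jitter, honoring a Retry-After header when the provider sends one (a generic sketch, not OpenRouter-specific):

```python
import random
import time
import requests

def post_with_backoff(url, payload, headers, max_retries=5):
    for attempt in range(max_retries):
        resp = requests.post(url, json=payload, headers=headers, timeout=60)
        if resp.status_code != 429:
            return resp
        # Exponential backoff with jitter; honor Retry-After when provided.
        delay = float(resp.headers.get("Retry-After", 2 ** attempt + random.random()))
        time.sleep(delay)
    raise RuntimeError("still rate limited after retries")
```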
Links mentioned:
- Model Equality Testing: Which Model Is This API Serving?: Users often interact with large language models through black-box inference APIs, both for closed- and open-weight models (e.g., Llama models are popularly accessed via Amazon Bedrock and Azure AI Stu...
- Limits | OpenRouter: Set limits on model usage
- Gemini Flash 1.5 - API, Providers, Stats: Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video...
- Anthropic Status: no description found
OpenRouter (Alex Atallah) ▷ #beta-feedback (3 messages):
Custom Provider Keys
Beta Feature Access
- Questions on Custom Provider Keys Requests: Members are inquiring about how to request custom provider keys and whether there are benefits beyond account maintenance.
- They expressed curiosity regarding the potential advantages these keys might provide in their projects.
- Request for Beta Feature Access with Provider Keys: A member requested access to the beta feature using provider keys, indicating interest in testing this functionality.
- This sentiment was echoed by others, showing a collective eagerness to explore custom provider keys further.
aider (Paul Gauthier) ▷ #general (108 messages🔥🔥):
Aider 0.62 Version Features
Model Selection and Performance
Handling File Formats in Aider
Configuration Options in Aider
Temperature Settings in LLM Models
- New Features in Aider 0.62: Aider v0.62 introduces full support for Claude 3.5 Haiku, with performance benchmarks showing it scored 75% on the code editing leaderboard.
- New features include easily applying file edits from ChatGPT or Claude, and bug fixes for creating new files.
- Model Performance Insights: Members discussed their experiences with various LLMs, noting that Sonnet outperforms Haiku for coding/debugging tasks despite Haiku's lower cost.
- Comparisons were made with Qwen 2.5, with some members confirming it handles coding tasks better than Llama 3.1 405B.
- File Format Limitations: Aider currently supports only text-based formats for files and cannot read MS-Word documents, as clarified by members.
- Questions arose about adding files from external directories, with confirmation that absolute paths can be passed for file inclusion.
- Configuration Flexibility in Aider: Users can configure settings for models using `.aider.model.settings.yml`, including setting the temperature for requests if the model supports it (a sample settings file follows this list).
- Aider also offers the ability to store configurations in a `.env` file for managing API keys and settings.
- Temperature Settings Impact: Members highlighted that Aider sends a default temperature of 0 for models that support temperature adjustments, reducing randomness in the model's output.
- Discussions included that higher temperatures can lead to less predictable responses, complicating debugging efforts.
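A sample .aider.model.settings.yml along the lines discussed above; the key names follow aider's advanced model settings docs as we understand them, so verify against the links below before relying on them:

```yaml
# .aider.model.settings.yml - per-model overrides (illustrative values;
# key names assumed from aider's advanced model settings docs)
- name: openrouter/anthropic/claude-3.5-haiku
  edit_format: diff
  use_repo_map: true
  extra_params:
    temperature: 0.2   # only honored if the model supports temperature
```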
Links mentioned:
- Config with .env: Using a .env file to store LLM API keys for aider.
- Aider LLM Leaderboards: Quantitative benchmarks of LLM code editing skill.
- Release history: Release notes and stats on aider writing its own code.
- How to configure ctags to work with CSS, SCSS, HTML?: I've already read a lot of blog posts and answers on stackoverflow, but it seems I do something wrong, because I still have the E388: Couldn't find definition error. What I did: Dow...
- Reddit - Dive into anything: no description found
- Advanced model settings: Configuring advanced settings for LLMs.
- Parameters | OpenRouter: Configure parameters for requests
- DBRX 132B Instruct - API, Providers, Stats: DBRX is a new open source large language model developed by Databricks. At 132B, it outperforms existing open source LLMs like Llama 2 70B and Mixtral-8x7b on standard indus...
- Firecrawl: Turn any website into LLM-ready data.
aider (Paul Gauthier) ▷ #questions-and-tips (47 messages🔥):
Benchmarking Aider Requests
DeepSeek and Llama.cpp Integration
Aider Configuration Issues
Using LLM Models with Aider
Aider Command Issues
- Benchmarking Aider Requests Exceed Limits: A user raised the concern about surpassing the requests per minute limit during a benchmark and asked if this affects performance or just increases execution time.
- This highlights a broader discussion about Aider's operational limits during performance evaluations.
- Experiences with DeepSeek and Llama.cpp: Multiple members shared their experiences running DeepSeek-V2.5 with llama.cpp, mentioning challenges due to model size and template compatibility issues.
- The discussion indicated that while some members found success with certain models, others faced frequent errors and mismatches in template requirements.
- Configuration Challenges with Aider: A user discussed difficulties configuring Aider with local models and environment variables, focusing on the OLLAMA_API_BASE setup (a minimal working arrangement is sketched after this list).
- The conversation explored whether manual command specifications were necessary for proper function amid the configuration challenges.
- LLM Model Usage in Aider: Users shared insights on web model and local model interactions, particularly regarding the choices of editor models relative to main architect models.
- Considerations highlighted the implications of resource allocation and task specificity in model pairing for optimal performance.
- Issues with Aider Commands: A member reported an error with the /lint command not executing due to missing file specifications, despite working fine in the console.
- Others confirmed that internal server errors from Anthropic may cause similar command execution problems within Aider.
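For the OLLAMA_API_BASE setup above, a minimal arrangement looks like this (model name and port are illustrative; the litellm-style ollama/ prefix is assumed):

```bash
# .env in the project root (aider loads it on startup), or export directly:
export OLLAMA_API_BASE=http://127.0.0.1:11434
aider --model ollama/qwen2.5-coder:32b   # model name is illustrative
```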
Links mentioned:
- Config with .env: Using a .env file to store LLM API keys for aider.
- legraphista/DeepSeek-V2.5-IMat-GGUF · Hugging Face: no description found
- mlx-community/Qwen2.5-32B-Instruct-4bit · Hugging Face: no description found
- Unexpected error: litellm.InternalServerError: AnthropicException - Overloaded · Issue #957 · Aider-AI/aider: Issue Aider v0.46.0 Models: claude-3-5-sonnet-20240620 with diff edit format, weak model claude-3-haiku-20240307 Git repo: .git with 280 files Repo-map: using 1024 tokens Use /help for help, run "...
Nous Research AI ▷ #general (98 messages🔥🔥):
TEE_HEE_HE Twitter Account
Hermes 405B Performance
Funding Resources for ML Projects
Venice AI Launch
OpenAI Eval Feature
- TEE_HEE_HE Twitter Account Rescued: The team is working to get the TEE_HEE_HE Twitter account unrestricted, and it appears to be operating again as of now.
- Community members expressed excitement about interacting with the account after its reactivation.
- Hermes 405B Free Access Returns: Users reported that Hermes 405B is operational again on Openrouter, albeit with some lag.
- The functionality was highlighted as crucial, confirming that accessibility takes precedence despite performance issues.
- Funding Opportunities for ML Projects: A user discussed applying for Microsoft for Startups to obtain funding for their ML project and shared eligibility criteria.
- The potential for $150,000 in Azure credits was noted, though a clear business plan was advised for a successful application.
- Launch of Venice AI: Venice AI has introduced a new version of Hermes 3, called Abliterated, which offers reduced censorship for users.
- The service aims to provide an uncensored and private alternative to mainstream AI applications, emphasizing user privacy.
- Concerns Over OpenAI Eval Feature Costs: A user reflected on the high costs associated with OpenAI's eval feature while experimenting with different prompts.
- They emphasized the need for clear data formatting to streamline future research and improve the efficiency of data collection.
Links mentioned:
- What is Microsoft for Startups?: Learn about the program benefits and features for startups.
- PlayAI - HERmes: Seamless, natural conversations with voice AI
- Venice | Private and Uncensored AI: Try Venice.ai for free. Generate text, images, and code using private and uncensored AI.
- Free Compute Services on AWS: no description found
- GitHub - DarkStarStrix/Auto_Api: A simplified machine learning framework: A simplified machine learning framework . Contribute to DarkStarStrix/Auto_Api development by creating an account on GitHub.
Nous Research AI ▷ #ask-about-llms (19 messages🔥):
Cursor limitations on codebase
Accessing Sonnet 3.5 via web UI
Opinions on Haiku 3.5
Comparing AI user interfaces
- Cursor struggles with large codebases: Cursor fails miserably for understanding big codebases, prompting a recommendation to try Cody from Sourcegraph, which specializes in indexing large codebases.
- Another member humorously suggested that Teknium may hold hope for future products.
- Accessing Sonnet 3.5 web UI: One user sought help on how to access Sonnet 3.5 through a web UI outside of Anthropic, to which Teknium suggested using OpenRouter.
- Although they described the chat UI as meh but usable, the user acknowledged they were checking it out.
- Haiku 3.5 raises concerns: Members expressed skepticism about Haiku 3.5, with one user commenting on its poor performance and suggesting it behaves like a smaller model (8-14b).
- Others criticized its pricing, arguing it's positioned poorly compared to cheaper options like Gemini 1.5 Flash and GPT-4o-mini.
- Comparing AI user interfaces: A member inquired about opinions on OpenWebUI, LibreChat, and Text-Generation-WebUI for AI usage, both locally and via API.
- In response, another user suggested LMStudio, although they preferred open-source options.
Nous Research AI ▷ #research-papers (1 messages):
detailoriented: Has anyone looked at federated learning in any great detail?
Eleuther ▷ #general (15 messages🔥):
lm-eval error
Autoencoders research
Research opportunities in Switzerland
NLP faculty at ETH/EPFL
Job application advice
- lm-eval Test Error Sparks Confusion: A user reported a TypeError in the lm-eval test run, stating that a 'NoneType' object is not callable.
- Another user suggested it might be best to create a new issue on GitHub if they think something is broken.
- Exploration of Autoencoder Research: A member inquired about experiences with Autoencoders that are not LLM related, seeking insights from others.
- The response and expertise on this topic remains limited in the current discussion.
- Cold Emailing Research Labs in Switzerland: A user expressed interest in cold emailing research labs in Switzerland to explore internship opportunities.
- While ETH and EPFL are well-known, they found the search too broad and sought specific leads.
- NLP Research Faculty at ETH/EPFL: In response to a query about research institutions, EPFL and ETH Zurich were recommended for their competent NLP faculty.
- The conversation delved into whether the user was interested in industry labs as well.
- Advice on Applying and Visiting Zurich: Discussion revealed that visiting Zurich would help users gauge their interest despite concerns about the need for Swiss German.
- The advice highlighted that decent English is spoken in Zurich, making it a manageable target for job applications.
Eleuther ▷ #research (89 messages🔥🔥):
Hardware-aware algebraic rewrites
Flash attention development
Attention mechanism visualization
Memory access in matrix operations
XLA and cuDNN integration
- Challenges in Hardware-aware Algebraic Rewrites: Members discussed the difficulties in achieving hardware-aware algebraic rewrites, emphasizing the complexity of translating theoretical improvements into practice.
- Chhillee noted that implementing such rewrites is generally hard, especially given the need for backward pass adaptations.
- Flash Attention's Evolution: There was debate about the timeline of flash attention development, with claims it had internal implementations at big labs before its public release.
- Leloykun pointed out that it took five years to refine the attention mechanism into its current form, but there is skepticism about prior implementations.
- Visualizing Attention Mechanisms: Leloykun shared insights on visualizing the attention mechanism, using diagrams to represent streamable operations in parallel processing.
- This method aids in understanding problem-solving by focusing on the necessary operations to compute specific output cells.
- Memory Access Optimization in Computations: A discussion highlighted the importance of memory access strategies, noting that it is sometimes better to avoid materializing matrices in order to optimize computation speed (the online-softmax trick sketched after this list is the canonical example).
- Fern.bear emphasized that optimizing memory access could depend on various factors including interconnect speed and operation frequency.
- XLA and cuDNN's Role in Attention Fusion: The conversation touched on XLA's capabilities, stating it can achieve 70% of flash attention performance on TPUs, although there's disagreement about its pre-flash abilities.
- Chhillee defended that historical strategies in XLA provided insights for attention fusion but asserted that the extension to long sequences remains the innovative edge of flash attention.
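The "avoid materializing" point above is exactly what the online normalizer paper linked below enables: softmax can be computed in one streaming pass by rescaling a running normalizer whenever the running max changes, so the full score matrix never has to exist in memory. A scalar sketch:

```python
import math

def online_softmax(xs):
    # One streaming pass: maintain a running max (m) and normalizer (d),
    # rescaling d whenever a new max appears.
    m, d = float("-inf"), 0.0
    for x in xs:
        m_new = max(m, x)
        d = d * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    return [math.exp(x - m) / d for x in xs]

print(online_softmax([1.0, 2.0, 3.0]))
```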
Links mentioned:
- Online normalizer calculation for softmax: The Softmax function is ubiquitous in machine learning, multiple previous works suggested faster alternatives for it. In this paper we propose a way to compute classical Softmax with fewer memory acce...
- A linguistic analysis of undesirable outcomes in the era of generative AI: Recent research has focused on the medium and long-term impacts of generative AI, posing scientific and societal challenges mainly due to the detection and reliability of machine-generated information...
- (NVIDIA) Using cuDNN fused attention in XLA GPU: no description found
Eleuther ▷ #lm-thunderdome (2 messages):
lm_eval performance
NCCL warnings
Batch size settings
- lm_eval experiences memory issues: While running lm_eval across 8xH100 using accelerate, a member encountered an `include/alloc.h:103 NCCL WARN Cuda failure 2 'out of memory'` message after all loglikelihood requests had completed.
- The issue occurred when trying to put everything together at the end; manually setting a smaller batch size resolved the problem (an example invocation follows this list).
- Issue Submission Planned: The member expressed intent to submit an issue regarding the out of memory error encountered with lm_eval.
- This step aims to bring attention to the problem and seek further assistance from the community.
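The workaround amounts to pinning the batch size rather than letting the harness pack everything at the end; an invocation along these lines (flags per lm-evaluation-harness's CLI; model and task are placeholders):

```bash
# Placeholders throughout; the point is the explicit --batch_size.
accelerate launch -m lm_eval \
  --model hf \
  --model_args pretrained=meta-llama/Llama-3.1-8B \
  --tasks hellaswag \
  --batch_size 4
```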
Stability.ai (Stable Diffusion) ▷ #general-chat (103 messages🔥🔥):
Stable Diffusion Installation
Image Generation Issues
Outpainting Techniques
ControlNet Models
AI Image Expansion
- Assistance Needed for Stable Diffusion Installation: A member asked for help with installing Stable Diffusion on Windows 11 and was directed to check pinned messages for guides.
- Another member inquired about good checkpoints for usage, indicating a need for community support.
- Challenges with Image Generation: A new user expressed frustration with obtaining low-quality images from the SDXL model, suggesting they might be using incorrect settings.
- Various suggestions for image size and steps were offered, emphasizing the need for settings suitable for SDXL.
- Exploring Outpainting Techniques: A user asked about expanding images with AI, similar to trends seen on TikTok, prompting discussions about outpainting.
- Useful links for outpainting techniques in Stable Diffusion were shared, highlighting different methods and resources.
- Discussion on ControlNet for Stable Diffusion: One member queried about the usage of controlnet-union-sdxl and its quality compared to individual ControlNet models.
- Others chimed in with insights, discussing differences in model quality and potential improvements.
- AI Image Expansion Tools: There was a debate over the terminology and applications for AI image expansion, with references made to various tools like Videoleap and CapCut.
- Despite disagreements, members attempted to clarify specifics on what's possible within the context of AI image manipulation.
Links mentioned:
- Outpainting Automatic1111: Here is a quick and easy way to outpaint by using the inpaint feature in Automatic1111. Step 1: Create an Image (or have one already created)I made this RPG map. Step 2: Send the image to Img2img:...
- How to use outpainting to extend images - Stable Diffusion Art: Stable Diffusion can extend an image in any direction with outpainting. It can generate a coherent background that is outside of the view.
- Infinite Zoom: Use AI To Infinitely Zoom Images & Videos | Videoleap: Start your 7-day free trial today. Use the Videoleap app to try AI Infinite Zoom. Infinitely zoom in and out of any image or video.
- Reddit - Dive into anything: no description found
LM Studio ▷ #general (56 messages🔥🔥):
LM Studio Portable Version
Intel E-Cores Utilization
Auto Load Models in LM Studio
Llama 3.2 Vision Support
Context Window Limitations
- No Portable Version for LM Studio: A member inquired about running LM Studio from a USB, and it was confirmed that a portable version does not exist.
- Another member suggested that a script could potentially make it portable, prompting users to search for it in Discord.
- Intel E-Cores and Performance: Discussion arose on whether LM Studio can utilize Intel's e-cores, with members suggesting to restrict threads to performance cores for better efficiency.
- The consensus was that while performance was generally better with fewer threads, the impact on processing speed for specific use cases could be minimal.
- Request for Auto Load Feature in LM Studio: A member expressed frustration at having to manually select models every time they open the chat UI of LM Studio.
- Several members discussed workarounds, including creating scripts to automate model loading after the UI opens.
- Llama 3.2 Vision Availability: The presence of Llama 3.2 vision in Ollama was noted, while others mentioned that MLX has support for it but it's not yet functioning in LM Studio.
- Forthcoming updates from MLX were hinted at, with indications that support for Llama 3.2 vision (mllama) may soon be included.
- Context Window Size Limit Unveiled: A user found that while the context size slider in LM Studio maxes out at 2048, typing in larger values is accepted.
- This sparked discussions on whether LM Studio caps models at this size for efficiency reasons or hardware limitations.
Links mentioned:
- Issues · lmstudio-ai/mlx-engine: 👾🍎 Apple MLX engine for LM Studio. Contribute to lmstudio-ai/mlx-engine development by creating an account on GitHub.
- Performance 3x better when use performance core only on Intel gen 12th cpu · ggerganov/llama.cpp · Discussion #572: I found by restrict threads and cores to performance cores only on Intel gen 12th processor, performance is much better than default. My process is Intel core i7 12700H, this processor has 6 perfor...
- Upgrade mlx and outlines dependencies, and add support for llama 3.2 vision by neilmehta24 · Pull Request #22 · lmstudio-ai/mlx-engine: Summary of changes MLX VLM upgrade mlx_vlm was upgraded to its latest commit. This brings in support for llama 3.2 vision (aka mllama). vision_model_kit and vision_model_wrapper was updated to brin...
LM Studio ▷ #hardware-discussion (37 messages🔥):
LLM Benchmarking
Windows Scheduler Performance
Memory Overclocking Challenges
AMD vs Intel Memory Management
Single Slot RTX 4090
- Demand for LLM Benchmark Standards: A member expressed the need for an LLM Benchmark akin to 3DMark, to facilitate benchmarking of specific builds and software revisions.
- This would help create rankings and tiers for better entry information in performance metrics.
- Windows Scheduler's Impact on Performance: Several members discussed how Windows Scheduler plays a crucial role in performance, recommending manual setting of affinity and priority for CPU threads.
- One highlighted that sticking to physical core limits for threads is essential to avoid performance regression.
- Memory Overclocking is a Crafty Endeavor: The complexity of memory overclocking on AMD systems was noted, where members pointed out challenges in achieving stability with multiple DIMMs.
- One shared experiences of achieving reduced latency through careful tuning, while cautioning against meddling without expertise.
- Contrasting AMD and Intel for Memory: Members unanimously agreed that AMD memory overclocking is more difficult than that of Intel, particularly with multiple sticks of RAM.
- One member stated that sticking with two DIMMs is the more practical choice for optimal latency and speed.
- Discussion on Single Slot RTX 4090: A member inquired about the popularity and functionality of the single slot RTX 4090, indicating interest in its engineering.
- The discussion revealed a curious appreciation for innovative designs in graphics cards.
Link mentioned: RIP Intel: AMD Ryzen 7 9800X3D CPU Review & Benchmarks vs. 7800X3D, 285K, 14900K, & More: BUY OUR NEW INDUCTOR DICE SET: https://store.gamersnexus.net/products/inductor-full-tabletop-mtg-dnd-premium-dice-set-7-piece-dice-wooden-box-token-cardBUY O...
Notebook LM Discord ▷ #use-cases (17 messages🔥):
NotebookLM integration with Google Drive
Use of Diarization in podcasts
Deepfake technology discussions
Avatars in video podcasts
Podcast reuse policies
- NotebookLM could enhance Google Drive usability: A suggestion was made to add an auto-sync feature for Google Drive documents in NotebookLM, aiming to boost productivity by minimizing the need for manual syncing.
- The user expressed frustration over needing to sync 70 times daily and hopes this feature could alleviate their workload.
- Diarization method for podcast transcription: The use of diarization technology was discussed as a method for creating transcripts that separate speakers in podcast recordings (a minimal pipeline sketch follows this list).
- A member shared code details to provide insight into their implementation of this transcription technique.
- Clarifying deepfake technology: Members engaged in a debate over the distinction between deepfakes and face swap technology, with clarifications on their definitions.
- Discussions centered on how deepfakes use existing footage to alter faces, while avatars are seen as a more synthetic representation.
- Innovations in podcasting using avatars: A user showcased their work involving avatars capturing podcast content as video, leveraging technology to engage the audience.
- They mentioned the possibility of refining this concept to encourage innovation at Google, aiming to enhance the overall podcasting experience.
- Seeking clarification on podcast reuse policy: A question arose regarding the reuse policy of certain podcasts, with a request for clarification linked to a GitHub repository.
- The user aimed to ensure compliance while utilizing podcasts in their projects.
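For context on the diarization approach above, a minimal pipeline with pyannote.audio looks roughly like this (pipeline name and auth handling assumed; the member's actual code is in the linked repo):

```python
# Requires accepting the model's terms on Hugging Face first.
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token="hf_..."
)
diarization = pipeline("podcast_episode.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s - {turn.end:.1f}s")
```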
Links mentioned:
- GitHub - jjmlovesgit/Simli_NotebookLM: A project to take an audio file, separate it into speakers, and play it with avatars, saving the recording as mp4 for sharing on social media. Ideal for Deep Dive podcasts from Google NotebookLM.
- GitHub - robbiemu/llama-gguf-optimize: Scripts and tools for optimizing quantizations in llama.cpp with GGUF imatrices.: Scripts and tools for optimizing quantizations in llama.cpp with GGUF imatrices. - robbiemu/llama-gguf-optimize
Notebook LM Discord ▷ #general (64 messages🔥🔥):
Podcast Generation from Notes
AI Pronunciation Issues
Sharing Links
Language Settings
AI Interaction Glitches
- How to generate podcasts from notes: zzzuuu mentioned they figured out that the app's conversation feature can generate podcasts from notes, but they lost the original reel link.
- They highlighted the simplicity of this feature, which seems to help streamline content creation.
- Issues with AI pronunciation: A user expressed frustration that their business name 'Easy As' was being mispronounced, suggesting the use of phonetic spelling to correct it.
- They mentioned that the AI often produced various incorrect pronunciations, prompting a need for improvement.
- Sharing links for notebooks: torahtechguy faced issues with sharing links outside their organization and highlighted that most Google services allow public sharing.
- npecom provided a detailed response explaining the different share link options available in NotebookLM.
- Changing language settings: eliano2333 is trying to convert the language display from Swedish to English despite changing computer settings, looking for help.
- Another user suggested using specific prompts for audio output in different languages, but the chat remains in Swedish.
- AI interaction glitches: stylinlp38 observed that AI bots were alternating completions in their responses, describing it as feeling like a scripted conversation.
- This glitch triggered questions about whether it was an ongoing bug or if there are specific commands to prevent such behavior.
Link mentioned: BYD's Denza: Can It Topple Mercedes & BMW in Luxury EVs? - Unveiling the D9, N9 & More! #suv #MBG.DE: Dive into the world of BYD's luxury EV sub-brand, Denza, as we explore its bold strategy to rival the titans of luxury automotive, Mercedes and BMW. This vid...
GPU MODE ▷ #general (10 messages🔥):
NVIDIA AI Dev Tech internship
FP8 quantization and computation
Dynamic vs Static quantization performance
Triton-compiled PTX with CUDA
Batch size effects on performance
- NVIDIA AI Dev Tech Team Internship Inquiry: A member is preparing for an upcoming internship interview with the NVIDIA AI Dev Tech team and is seeking advice on what to expect.
- Several members chimed in with tips and support for navigating the interview process.
- FP8 Quantization Computation Clarification: A member inquired about the computation process in FP8 quantization and received clarification that it operates using FP8 x FP8 tensor cores.
- Further dialogue explored how Neural Magic applies dynamic quantization during computation (a minimal dynamic-scaling sketch follows this list).
- Dynamic vs Static Quantization Speed Analysis: Members discussed the performance of static vs dynamic quantization, concluding that for batch size 1, static quantization generally yields better performance.
- The conversation highlighted discrepancies in personal testing results, revealing different efficiencies among AWQ, static, and dynamic quantization.
- Kernel Performance on Different Cards: The challenge arose regarding where to obtain the kernels, with insights suggesting that those from Neural Magic may not be optimized for non-H100 cards.
- One member noted that for batch size 1, performance can be memory bound, slowing down gains from quantization.
- Using Triton-compiled PTX with CUDA: A member sought experiences related to calling Triton-compiled PTX using CUDA launch, indicating interest in performance insights.
- This sparked curiosity among members, leading to discussions about potential implications and methods.
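To illustrate the dynamic-quantization point above: "dynamic" means the scale is computed from the live tensor at each call, unlike static quantization's pre-calibrated scales. A minimal per-tensor FP8 sketch (the scaling scheme is illustrative; real kernels feed the FP8 values to tensor cores rather than dequantizing):

```python
import torch

def dynamic_fp8_quantize(x: torch.Tensor):
    # Dynamic quantization: derive the scale from the tensor we are about to
    # quantize, instead of a scale calibrated offline (static quantization).
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    scale = x.abs().amax().clamp(min=1e-12) / fp8_max
    return (x / scale).to(torch.float8_e4m3fn), scale

x = torch.randn(16, 64)
x_fp8, scale = dynamic_fp8_quantize(x)
x_roundtrip = x_fp8.to(torch.float32) * scale   # dequantize to inspect error
print("max abs error:", (x - x_roundtrip).abs().max().item())
```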
GPU MODE ▷ #triton (15 messages🔥):
GPU Performance Issues
Triton Kernel Optimization
Quantization Techniques
Calling Triton PTX Outside Python
Kernel Launch Parameters
- A100 underperforms with INT1 loading: There's an ongoing issue with loading INT1 weights on A100 GPUs, making performance sub-optimal compared to other cards like the 3090. A GitHub issue highlights this inconsistency and seeks insights from the community.
- The confusion around performance metrics has been persistent for weeks, with detailed logs examined for clarity.
- FP8 Triton Kernel shows speed advantage: Using an activation quantization kernel along with the FP8 Triton kernel, one member reported faster performance than other methods, achieving 45.39 us compared to 55.23 us for Torch compiled alternatives. This showcases the efficiency of the Triton approach in matrix multiplication tasks.
- Discussions ensued about potential slowdowns when quantizing activation within the matmul, with experiments revealing mismanagement of matrix dimensions as a performance drawback.
- Optimizing Kernel Configurations: There was an exchange about avoiding `autotune` by using pre-defined configurations based on matrix dimensions for Triton kernels to improve warm-up times; suggestions included creating a dictionary keyed by size to streamline kernel calls (a sketch of the pattern follows this list).
- This method anticipates slight variations needed for different GPU architectures, leading to potentially enhanced performance.
- Challenges with Triton PTX Launch: A user expressed difficulty in calling Triton-compiled PTX kernels directly using CUDA launch syntax outside of Python, citing uncertainty on the correct launch parameters. They sought clarification on how to derive these parameters.
- Another member suggested using ncu to determine the actual block and grid sizes for specific problem sizes, facilitating better alignment with Triton-generated configurations.
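A sketch of the dictionary-of-configs idea above, replacing autotune with a shape-keyed lookup (block sizes are illustrative, not measured, and a separate table would be kept per GPU architecture):

```python
# Pre-tuned launch parameters keyed by matrix shape.
CONFIGS = {
    (4096, 4096): dict(BLOCK_M=128, BLOCK_N=128, BLOCK_K=64, num_warps=8),
    (2048, 2048): dict(BLOCK_M=64, BLOCK_N=64, BLOCK_K=32, num_warps=4),
}
DEFAULT = dict(BLOCK_M=64, BLOCK_N=64, BLOCK_K=32, num_warps=4)

def pick_config(m: int, n: int) -> dict:
    # Fall back to the largest stored bucket that fits, else the default.
    fitting = [k for k in CONFIGS if k[0] <= m and k[1] <= n]
    key = max(fitting, key=lambda k: k[0] * k[1]) if fitting else None
    return CONFIGS.get(key, DEFAULT)

print(pick_config(4096, 8192))   # reuses the (4096, 4096) entry
```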
Links mentioned:
- GitHub - jeromeku/triton-rs: Contribute to jeromeku/triton-rs development by creating an account on GitHub.
- Poor performance on Ampere vs. Ada with bitpacked weights · Issue #4906 · triton-lang/triton: I am writing a library to perform different low-bit matmul kernels in Triton/CUDA. The Triton kernels work great on Ada gpus like the 4090 RTX and the A6000 Ada - on par with Marlin on large matric...
GPU MODE ▷ #torch (6 messages):
PyTorch Tensor Iteration Performance
Numpy Iteration Speed
Torch Script Debugging
Overhead of tolist()
- tolist() shines in PyTorch tensor iteration: Benchmarking showed that using `tolist()` to iterate over a PyTorch tensor was significantly faster, taking only 0.0073 seconds compared to 2.39 seconds for the GPU-tensor loop and 0.64 seconds for the CPU-tensor loop (a reproducible sketch follows this list).
- One user mentioned that the CPU tensor iteration may be slow due to the overhead of dispatching operations on torch tensors.
- Numpy iteration faster than CPU tensors: Adding to the conversation, another user benchmarked numpy iteration at 0.0138 seconds, which was faster than CPU tensor iteration but still slower than list iteration.
- This suggests that while numpy is efficient, the list conversion provides a notable performance advantage.
- Overhead concerns with tolist(): It was pointed out that around 30% of the overhead in CPU tensor iteration derives from the `.tolist()` call, which could matter if performance is critical for repeated iterations.
- Optimizing away this conversion could lead to better performance in scenarios requiring numerous iterations over tensor data.
- Question on Torch Script Debugging: A user expressed a need for guidance on debugging Torch Script, specifically mentioning attempts to print each node's output.
- This inquiry suggests a search for better tools or methods within the community for tracing and debugging Torch Script effectively.
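A reproducible version of the comparison above (sizes scaled down so it finishes quickly; absolute times will differ by machine):

```python
import time
import torch

t = torch.randint(0, 1000, (100_000,))

start = time.perf_counter()
total = 0
for v in t:              # every step dispatches ops on a 0-d tensor
    total += int(v)
print("CPU tensor loop:", time.perf_counter() - start)

start = time.perf_counter()
total = 0
for v in t.tolist():     # one bulk conversion, then plain Python ints
    total += v
print("tolist() loop:  ", time.perf_counter() - start)
```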
GPU MODE ▷ #cool-links (2 messages):
UV Run in Shebang
PlayCanvas SuperSplat
Self-contained Python Scripts
- UV Run: Shebang Magic: Using `uv run` within a shebang is described as magic, enabling self-contained Python scripts while denoting dependencies effortlessly. It simplifies the creation of one-off scripts without needing a full dependency-management setup, as discussed in this blog post (a minimal example follows this list).
- This method was highlighted in a post by seemethere, showcasing the benefits for developers.
- PlayCanvas SuperSplat Editor Demo: A link to the PlayCanvas SuperSplat editor demonstrates a particular 3D asset, `toy-cat.ply`, showing its integration in a web environment.
- This tool allows for interactive editing of 3D assets directly in the browser, enhancing usability for game developers.
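A minimal example of the shebang trick, combining env -S with PEP 723 inline script metadata that uv reads (requests is just a stand-in dependency):

```python
#!/usr/bin/env -S uv run
# /// script
# dependencies = ["requests"]
# ///
# uv reads the inline metadata above, creates an ephemeral environment with
# requests installed, and runs the script - no project setup required.
import requests

print(requests.get("https://example.com").status_code)
```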
Links mentioned:
- Tweet from eli (@_seemethere): utilizing uv run within a shebang is literally magic. Self contained python scripts while having the ability to denote dependencies makes writing one off scripts without needing a proper dependency m...
- SuperSplat: SuperSplat is an advanced browser-based editor for manipulating and optimizing 3D Gaussian Splats. It is open source and engine agnostic.
GPU MODE ▷ #jobs (2 messages):
Internship Opportunities
Community Engagement
Job Posting Etiquette
- Inquiry on Internship Positions: A member inquired about available internship positions in Machine Learning, Deep Learning, and GPU Programming.
- This inquiry was noted as low effort as prior posts in the channel likely contain relevant information on internships.
- Advice on Community Engagement: Another member responded, highlighting the importance of engaging with the server and reading previous posts.
- They suggested that members should share more about their experience instead of simply asking about internship opportunities, emphasizing the channel's purpose.
GPU MODE ▷ #beginner (2 messages):
Caffe2 File Removal
PR Series Impact
GitHub Contributions
- Caffe2 Files Deleted: It was noted that several files related to Caffe2 were deleted in the pull request #126628 because they were no longer used.
- The pull request is linked to contributions from the user cyyever, and a mention was made regarding user @albanD.
- PR #126628 part of a larger series: One member acknowledged the need to bisect the issue and highlighted that PR #126628 was part of a series stemming from pull request #122527.
- This comment indicated a continuance of related changes affecting the overall project structure.
Links mentioned:
- [Caffe2]Remove more caffe2 files by cyyever · Pull Request #126628 · pytorch/pytorch: They are not used. cc @albanD
GPU MODE ▷ #jax (2 messages):
Learning JAX
JAX Implementation of Flux.1
- Seeking JAX Learning Resources for Beginners: A member inquired about where to learn JAX from a beginner's perspective.
- No specific resources were mentioned, but the community seems engaged in helping new learners.
- Introducing JAX Implementation of Flux.1: Recently, members released a JAX Implementation of Black Forest Labs' Flux.1 family of models.
- They highlighted that there are open issues in the codebase and welcomed contributions from anyone interested.
Link mentioned: GitHub - ml-gde/jflux: JAX Implementation of Black Forest Labs' Flux.1 family of models: JAX Implementation of Black Forest Labs' Flux.1 family of models - ml-gde/jflux
GPU MODE ▷ #torchao (4 messages):
Model Size Reduction
PyTorch-Quantization Library Disappearance
Page Cleanup Efforts
FP32 Computation for Mobile Apps
- Focus on Reducing Model Size: A member stated that their goal is reducing model size for a mobile app, noting that FP32 computation is acceptable for their needs (a weight-only quantization sketch follows this list).
- I don't expect to see speedup, but the reduction of overall package size is preferred.
- Concerns about Outdated Documentation: A user expressed frustration with navigating a large and messy page that has outdated information, making it hard to find relevant details.
- Another member acknowledged this and committed to support the pt2e quant flow going forward even as they plan page upgrades.
- Mystery of Missing PyTorch-Quantization Library: A member inquired about the sudden disappearance of the PyTorch-Quantization library from NVIDIA and its official GitHub repository.
- They expressed confusion since their work on model compression depends significantly on this library, noting that only this page mentions it.
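One route to the size-over-speed goal above is weight-only quantization, e.g. with torchao (API names as of recent torchao releases; verify against the current docs):

```python
# Weights are stored in int8 while compute stays in the original dtype,
# trading no speedup for a smaller on-disk/in-memory footprint.
import torch
from torchao.quantization import quantize_, int8_weight_only

model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU())
quantize_(model, int8_weight_only())  # mutates the model in place
```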
GPU MODE ▷ #liger-kernel (3 messages):
Liger Kernel Release v0.4.0
Efficient RMSNorm aggregation
GroupNorm Kernel implementation
- Liger Kernel v0.4.0 Boosts AMD Support: The v0.4.0 release introduces full AMD support, enabling multi-GPU training at up to 26% higher speed.
- The update is aimed at enhancing compatibility with AMD GPUs, providing a more efficient training pipeline.
- Improve RMSNorm Efficiency Using 2-Level Aggregation: There's a suggestion to try 2-level aggregation for summing `dw` and `db`, discussed in this issue.
- This method could avoid the synchronization forced by `atomic_add`, potentially improving efficiency in certain operations; see the sketch after this list.
- GroupNorm Kernel Implementation Proposal: A pull request #353 was made to implement a GroupNorm kernel that achieves parity with Torch's implementation.
- The proposal is noted as an extension of a previous enhancement (#285), with a focus on maintaining consistent output.
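To make the two-level idea concrete: instead of every thread block `atomic_add`-ing into one `dw`/`db` buffer, each block writes its own partial sum and a cheap second pass reduces the partials. A PyTorch sketch of the arithmetic (shapes and names are illustrative, not Liger's actual Triton kernel):

```python
import torch

n_rows, hidden, n_blocks = 4096, 1024, 64
row_dw = torch.randn(n_rows, hidden)  # per-row weight-gradient contributions

# One-level: what atomic_add computes, with every block contending on one buffer.
dw_atomic = row_dw.sum(dim=0)

# Two-level: each block reduces its own slice (no contention), then a
# second, much smaller reduction sums the n_blocks partial buffers.
partials = row_dw.view(n_blocks, n_rows // n_blocks, hidden).sum(dim=1)  # level 1
dw_two_level = partials.sum(dim=0)                                        # level 2

assert torch.allclose(dw_atomic, dw_two_level, atol=1e-3)
```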
Links mentioned:
- Release v0.4.0: Full AMD support, Tech Report, Modal CI, Llama-3.2-Vision! · linkedin/Liger-Kernel: Highlights AMD GPU: We have partnered with Embedding LLM to adjust the Triton configuration to fully support AMD! With version 0.4.0, you can run multi-GPU training with 26% higher speed and 60% ...
- Improve the efficiency of the RMSNorm aggregation · Issue #179 · linkedin/Liger-Kernel: 🚀 The feature, motivation and pitch Modify this line https://github.com/linkedin/Liger-Kernel/blob/main/src/liger_kernel/ops/rms_norm.py#L306, the sum in pytorch to partial aggregation in triton, r.....
- Kernels for GroupNorm by pramodith · Pull Request #353 · linkedin/Liger-Kernel: Summary Implementation of group norm that achieves output parity with torch's GroupNorm. This feature is part of #285 Details The formulas/equations involved in GroupNorm are the same as ...
GPU MODE ▷ #self-promotion (3 messages):
Nebius Explorer Tier
GPU pricing for researchers
Availability of H100 GPUs
Self-service platform advantages
- Nebius launches Explorer Tier at $1.5/hour: Nebius introduced a special pricing offer of $1.5 per GPU per hour for the NVIDIA H100 Tensor Core SXM GPU, aimed at supporting small projects and individual researchers.
- This initiative provides immediate access to resources without waiting lists, allowing users to start experimenting quickly at a competitive market rate.
- Feedback welcomed for Nebius offer: Nebius encourages the community to provide feedback and share their new pricing offer, with a link that details the Explorer Tier and how it benefits researchers.
- They aim to foster a supportive environment for AI enthusiasts, making their platform accessible for various projects.
- User inquiry about GPU instance availability: A user raised concerns about the availability of A100/H100 instances, highlighting that many cheap platforms often lack them.
- In response, Nebius emphasized that a significant portion of their capacity is reserved for self-service and on-demand usage, assuring users they shouldn't face capacity issues.
- Nebius differentiates from cheap platforms: Nebius clarified that they do not position themselves as a 'cheap platform,' but rather provide a real self-service experience for customers along with a focus on growth.
- They are committed to offering computational resources not just for large customers but also for individual projects.
Links mentioned:
- Nebius AI Cloud Explorer Tier with H100 starting from just $1.5 per hour: Discover the most efficient way to build, tune and run your AI models and applications on top-notch NVIDIA® GPUs.
- Tweet from Nebius (@nebiusai): Just $1.5/h for #H100 #GPU 🔥 To support your first steps in new AI projects, we’re introducing the Explorer Tier — enjoy NVIDIA® H100 Tensor Core SXM GPUs at just $1.5 per hour for your first 1,000 G...
GPU MODE ▷ #🍿 (10 messages🔥):
Maintainer Access
GitHub Access Request
Popcorn Project Opportunity
Automated Deployments on Heroku
Deployment Strategies
- Maintainer Access Granted: A user requested access and, after providing their GitHub handle, received maintainer access on GitHub and an invitation to the test Discord group.
- 'Sweet just gave you maintainer access' was shared as part of the acknowledgment.
- GitHub Access and Roles Needed: Another user inquired about obtaining roles and GitHub access for their account, indicating a desire to discuss something further in DMs.
- Marks responded positively, inviting the user to DM for further conversation.
- University Students Can Use Popcorn Project: Marks offered assistance for university students wanting to use the popcorn project as a credential for projects, volunteering to sign any necessary documents.
- This was reposted with the intent to be useful for others seeking formal recognition of their work.
- Heroku Automation Updates: Marks announced that automated deployments on Heroku are now functional, allowing the bot to update by pushing changes to the main branch.
- He mentioned readiness to connect to the server once GPUs are available.
- Exploring Deployment Options: Marks expressed curiosity about how Heroku performs as a deployment option while also considering using a Raspberry Pi.
- This reflects ongoing exploration of deployment strategies for the bot.
GPU MODE ▷ #thunderkittens (7 messages):
Desirable Kernels
Beginner Contributions to ThunderKittens
Preliminary List of Features
Long Convolution Example
- Confusion over Desirable Kernels List: alexarmbr asked for the list of desirable kernels and features mentioned in a previous post, struggling to locate it in the repository.
- marksaroufim clarified that they were looking for a specific list of kernels suitable for beginner contributions.
- Resource for Contributing to ThunderKittens: sydriax directed members to the ThunderKittens GitHub page for all project details, emphasizing its availability for public development.
- This transparency ensures that there are no private clones or hidden elements in the project.
- Preliminary List of Starter Examples: simran9493 mentioned there is a very preliminary list available on the GitHub README, particularly under the demos section.
- They encouraged others to explore these demos and communicate their interest in attempting to contribute.
- Encouragement for Contribution: simran9493 offered to provide PyTorch references for those interested in adding a long convolution for non-squared sequence lengths as a starter example; a reference sketch appears after this list.
- This addition aims to extend the existing long-convolution kernel, showing a proactive approach to enhancing the project.
- Ongoing Updates to Contribution List: simran9493 expressed intent to continue adding to the list of desirable contributions as it evolves.
- This reflects the collaborative spirit within the community and the commitment to keeping resources current.
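As a hedged starting point for that starter example, the standard PyTorch reference for long convolution is FFT-based, and it handles filter lengths that differ from the sequence length (the "non-squared" case) by padding both to a common FFT size:

```python
import torch

def long_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Causal long convolution via FFT. u: (batch, seq_len), k: (filter_len,)."""
    full = u.shape[-1] + k.shape[-1] - 1   # full linear-convolution length
    n = 1 << (full - 1).bit_length()       # next power of two avoids circular wrap
    u_f = torch.fft.rfft(u, n=n)
    k_f = torch.fft.rfft(k, n=n)
    y = torch.fft.irfft(u_f * k_f, n=n)
    return y[..., : u.shape[-1]]           # crop back to the input sequence length

u = torch.randn(2, 1000)                   # sequence length 1000
k = torch.randn(256)                       # filter length 256 != 1000
print(long_conv(u, k).shape)               # torch.Size([2, 1000])
```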
Links mentioned:
- GitHub - HazyResearch/ThunderKittens: Tile primitives for speedy kernels: Tile primitives for speedy kernels. Contribute to HazyResearch/ThunderKittens development by creating an account on GitHub.
GPU MODE ▷ #edge (1 messages):
Training and inference on edge devices
Hardware evolution for mobile/embedded
Challenges in production environments
Consumer and commercial use-cases
Hardware heterogeneity
- Channel Launch for Edge Device Discussions: The channel was launched to facilitate discussions on training and inference specifically for edge (mobile/embedded) devices.
- It's a space evolving rapidly due to changing hardware needs for diverse consumer and commercial applications.
- Exciting Opportunities in Edge Device Hardware: Members are encouraged to share their projects as various hardware solutions are quickly evolving to support new use-cases.
- This evolution is stressing memory and power constraints, prompting innovative approaches to tackle these challenges.
- Unique Challenges in Production Settings: In production environments, there are unique challenges concerning hardware heterogeneity, reliability, accuracy, and compute.
- Addressing these issues will be crucial for success in deploying AI applications at the edge.
LlamaIndex ▷ #blog (3 messages):
NVIDIA competition
Automated resume insights agent
AI in recruiting
- NVIDIA competition deadline approaching: The submission deadline for the NVIDIA competition is November 10th, with a chance to win prizes including an NVIDIA® GeForce RTX™ 4080 SUPER GPU and DLI credits. Interested participants can find more details and register here.
- The contest runs from August 27th to November 10th, encouraging developers to create innovative RAG applications powered by NVIDIA and LlamaIndex technologies.
- Tutorial for building automated resume insights agent: A member shared a tutorial on creating an automated resume insights agent that utilizes core parsing, extraction, and structured output modules. This practical example of AI in recruiting showcases how to handle unstructured resumes effectively, with the tutorial available here.
- The tutorial emphasizes the potential of AI in streamlining recruitment processes and improving candidate evaluations.
Link mentioned: NVIDIA and LlamaIndex Developer Contest: Stand a chance to win cash prizes, a GeForce RTX GPU, and more.
LlamaIndex ▷ #general (40 messages🔥):
Handling ChatMessage input
Issues with Anthropic tools
Citations in Llama Index
Pull Request Guidance
Parsing Excel Files
- ChatMessage Input Errors: A user encountered an error stating 'Input should be a valid dictionary or instance of ChatMessage' when using `ChatMessage.from_str()` for structured output.
- It was suggested that the input should actually be a list of messages, indicating a potential misunderstanding of the Llama Index API; see the sketch after this list.
- Bugs with Anthropic Tools: One user expressed concerns about using Anthropic with tools, receiving 'Tools are not supported in streaming mode' messages.
- Another member confirmed they were using lower-level functions, noting that streaming wasn't integrated into the FunctionCallingAgent yet.
- Improving Citation Handling: A user sought guidance on displaying citations in Llama Index, stating that the existing citations query engine wasn't sufficient.
- Another member recommended checking the Citation Query Engine Implementation for enhanced customization.
- Guidance on Pull Requests: A user was unsure about updating documentation and version bumping in a pull request for Llama Index, asking for assistance.
- Another member clarified that documentation updates do not require a version bump, but that integration readmes do.
- Parsing and Indexing Excel Files: A user inquired about methods for parsing and indexing messy Excel files, considering converting sheets to markdown for embedding into vectordb.
- It was suggested to try LlamaParse, despite the user indicating that data could not leave their cloud platform for the project.
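On the first point, the fix is usually just wrapping messages in a list before calling the LLM. A minimal sketch against the llama_index API as we understand it (model choice is illustrative):

```python
from llama_index.core.llms import ChatMessage
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")  # illustrative model choice

# chat() expects a *list* of ChatMessage objects; passing a bare message
# is what triggers the "valid dictionary or instance of ChatMessage" error.
messages = [
    ChatMessage(role="system", content="Reply with structured JSON only."),
    ChatMessage.from_str("Summarize: LlamaIndex connects data to LLMs."),  # role defaults to "user"
]
response = llm.chat(messages)
print(response.message.content)
```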
Links mentioned:
- LlamaParse: Transform unstructured data into LLM optimized formats — LlamaIndex, Data Framework for LLM Applications: LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models (LLMs).
- llama_index/llama-index-integrations/llms/llama-index-llms-ollama at main · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
- Build RAG with in-line citations - LlamaIndex: no description found
- fix var name in pinecone vector store by logan-markewich · Pull Request #16853 · run-llama/llama_index: Make sure the sparse vector is defined before trying to use it
- llama_index/llama-index-integrations/llms/llama-index-llms-ollama/pyproject.toml at 3add7441cb7d482812065b1ecaf4de2ebed4e6c6 · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
Latent Space ▷ #ai-general-chat (36 messages🔥):
Hunyuan-Large MoE Model
Integuru AI Agent
Chat.com Domain Sale
Scale AI's Defense Llama
Perplexity's Funding Round
- Hunyuan-Large Model Release: Tencent released the Hunyuan-Large, a 389B MoE model, claiming it outperforms DeepSeek-V2 and Llama3-405B with less data usage. Discussions arose about its open-source status, with skepticism around model weights being equivalent to source code.
- “Not open source,” noted one member, citing discrimination clauses in its use policy and highlighting the potential issues with hosting such a large model.
- Concerns About Integuru: The consensus on the Integuru AI agent is largely pessimistic, with comments describing it as “very very brittle” and possibly doomed to fail due to challenges in maintaining integrations. It was suggested that if successful, it could facilitate self-healing capabilities similar to TDD.
- Members expressed skepticism about long-term viability, especially with API changes affecting performance, acknowledging the need for a fallback approach with a visual sandbox.
- Chat.com Domain Acquisition: The domain chat.com, previously bought by Dharmesh for over $10 million, recently changed ownership and is now speculated to have been purchased by OpenAI for $15-25 million. This sale could rank among the highest ever for a domain name.
- The news sparked interest in the implications of the domain's value and its relationship with OpenAI's branding, with discussions highlighting its significance in the AI chat landscape.
- Scale AI's Defense Llama Announcement: Scale AI announced Defense Llama, an LLM specifically tailored for American national security, developed in collaboration with Meta and defense experts. It is now available for integration into US defense systems.
- This release positions Scale AI at the intersection of AI and national security, highlighting the growing importance of specialized models in sensitive applications.
- Perplexity's Funding Concerns: Perplexity is raising funds for the fourth time this year at a 180x multiple on projected revenue, raising eyebrows and questions about the sustainability of such high valuations. This has led to a discussion around whether the market is experiencing a bubble.
- Critics pointed out that persistent high multiples may not be sustainable, prompting concern over the long-term viability of such funding rounds.
Links mentioned:
- Tweet from Nick Turley (@nickaturley): http://chat.com
- Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent: In this paper, we introduce Hunyuan-Large, which is currently the largest open-source Transformer-based mixture of experts model, with a total of 389 billion parameters and 52 billion activation param...
- Tweet from Aadit Sheth (@aaditsh): @sama Always wondered who @dharmesh sold the domain to. It all makes sense.
- Tweet from Anu Aakash (@anukaakash): Google Notebook LM hosts discussing an image in video format, not just as audio. (inspired by @EHuanglu's video) Process:
- Tweet from Alexandr Wang (@alexandr_wang): Scale AI is proud to announce Defense Llama 🇺🇸: the LLM purpose-built for American national security. This is the product of collaboration between @Meta, Scale, and defense experts, and is availabl...
- Tweet from morgan — (@morqon): perplexity are raising for the fourth time this year, at a 180x multiple on projected revenue — did anyone say bubble?
- Tweet from dharmesh (@dharmesh): BREAKING NEWS: I bought the domain name http://chat.com. tl;dr: I did it because #ChatUX is a Very Big Deal. Details posted here (so I don't have to answer 100 DMs and emails from friends, fami...
- Tweet from Steven Tey (@steventey): OpenAI just bought http://chat.com. Fun fact: @dharmesh was the previous owner of the domain and he bought it for $10M+ Rough estimate: OpenAI bought it for $15M - $25M range – making it one of the ...
- GitHub - Integuru-AI/Integuru: The first AI agent that builds third-party integrations through reverse engineering platforms' internal APIs.: The first AI agent that builds third-party integrations through reverse engineering platforms' internal APIs. - Integuru-AI/Integuru
- Reddit - Dive into anything: no description found
- Tencent Hunyuan-Large | Hacker News: no description found
Interconnects (Nathan Lambert) ▷ #events (1 messages):
swyxio: mee
Interconnects (Nathan Lambert) ▷ #news (8 messages🔥):
Swedish law in German
AI search startup Perplexity
Google AI reveal
Intersection of languages and domains
- Recruiter Example on Language and Law: A recruiter provided an example of 'questions about Swedish law written in German', highlighting the niche intersection of specific languages and legal domains.
- Another member noted that, for Americans, this isn't too niche as Sweden and Germany have significant business interactions.
- Google's AI Agent Reveal: A member shared a tweet discussing Google's accidental reveal of its computer-based AI agent, named Jarvis.
- The discussion turned to how social media would react to this accidental reveal, anticipating heightened excitement.
- Perplexity's 180x Revenue Multiple: According to a tweet, Perplexity, an AI search startup, nears a 180x multiple on forward revenue despite ongoing legal battles with NYT and other publishers.
- This potential valuation attracted attention even from those who expressed confusion about the startup's operations.
Links mentioned:
- Tweet from Amir Efrati (@amir): wow: AI search startup Perplexity nears getting a 180x multiple on forward revenue, despite the legal battles with NYT and other publishers. https://www.theinformation.com/articles/perplexity-nears-9...
- Tweet from Amir Efrati (@amir): news: google made an oopsy and revealed its computer using agent AI (jarvis) today
Interconnects (Nathan Lambert) ▷ #ml-questions (4 messages):
Drift in prompts
ChatGPT performance tracking
Data quality for applications
- Redefining Drift: Prompt Changes Matter: A discussion emerged around drift, defined here as changing prompts for the same task, and called for metrics beyond subjective impressions for performance evaluation.
- This highlights the need for better methodologies for quantifying performance without relying on vibes; a minimal scoring sketch follows this list.
- ChatGPT's Under-the-Hood Tracking: A member speculated that ChatGPT likely monitors performance intricacies related to different prompts, hinting at a complex tracking system.
- This raises questions about the depth of insights available from such tracking.
- The Quest for High-Quality Data: There was consensus on the need for really good data to substantiate performance insights and evaluation.
- This suggests the current information might not yet be ready for traditional or 'boring' applications.
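One way to get past vibes, sketched below: freeze a small eval set and score every prompt version against it, so drift shows up as a number rather than a feeling. The `run_model` stub is hypothetical; swap in a real client:

```python
def run_model(prompt: str) -> str:
    """Hypothetical stub standing in for an LLM call."""
    return "4" if "2+2" in prompt else "Paris"

def exact_match_rate(outputs, references):
    return sum(o.strip() == r.strip() for o, r in zip(outputs, references)) / len(references)

eval_set = [("2+2?", "4"), ("Capital of France?", "Paris")]  # illustrative
prompts = {"v1": "Q: {q}\nA:", "v2": "Answer briefly: {q}"}   # two prompt versions

for version, template in prompts.items():
    outputs = [run_model(template.format(q=q)) for q, _ in eval_set]
    print(version, exact_match_rate(outputs, [a for _, a in eval_set]))
```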
Interconnects (Nathan Lambert) ▷ #ml-drama (3 messages):
GPU Drama
V100 SSH Access
- Wishing for GPU Drama Sharing: natolambert expressed a desire to share some internal GPU drama, highlighting potential issues or stories within their organization.
- This glimpse into internal conflicts sparked interest and discussion among members.
- Offering SSH Access to V100s: xeophon. responded positively, suggesting to share SSH access to some of their V100 GPU resources, indicating a willingness to help out.
- This offer shows camaraderie, with a touch of humor, as indicated by the heart emoji ❤️.
Interconnects (Nathan Lambert) ▷ #random (7 messages):
Discord app for email verification
Manual email collection process
External database synchronization
- Looking for a Discord app for email verification: A member expressed a need for a solution that can sync with an external database for email verification in Discord, stating they might build it themselves.
- 'It seems like you have to build an auth flow of some kind' is how they summarized their findings.
- Manual email collection as a fallback: Another member suggested a manual workaround by locking everyone in the channel until they provide their email and then unlocking them.
- They mentioned trying available apps but hadn't checked the options in a while.
- Quest for recommendations: Member called out for recommendations on apps that can handle email verification and database synchronization.
- This call for help sparked a light-hearted reply from another member, seeking further insights.
- Whimsical comment on service distraction: A member remarked on the nature of the process as a 'huge service/distraction' while navigating the discussion.
- This comment highlighted the often complex setups involved in managing Discord communities.
Interconnects (Nathan Lambert) ▷ #memes (2 messages):
OpenAI CEO petition
Biden not running
- OpenAI Petition for Leadership Change: A member shared a tweet advocating for OpenAI to 'fire and re-hire its CEO today as a calming distraction'.
- This petition reflects ongoing community sentiment regarding leadership stability at OpenAI.
- Surprise Announcement About Biden: A member tweeted about a surprising revelation for voters regarding Joe Biden, stating that he is not running for reelection.
- The tweet highlights the unexpected nature of this political shift, suggesting major implications for the upcoming election.
Links mentioned:
- Tweet from Alex Konrad (@alexrkonrad): petition for OpenAI to fire and re-hire its CEO today as a calming distraction
- Tweet from Armand Domalewski (@ArmandDoma): Imagine being a voter who just today found out Joe Biden isn’t running
Cohere ▷ #discussions (17 messages🔥):
Cohere Search Process
Embed3 Multimodal Embeddings
Parsing Techniques Comparison
Cohere Reranker API Availability
- Cohere Search Process Explained: A member speculated about how ChatGPT and similar models utilize the Bing API to generate responses, mentioning the use of snippets from various web sources.
- The precise decision-making process regarding the balance between search results and training data remains unclear.
- Excitement for Embed3 Multimodal Embeddings: A member expressed enthusiasm about starting projects with embed3-multimodal embeddings, viewing it as a significant advancement over previous models like CLIP.
- Their current focus is on building a parsing service integrated with PostgreSQL using Cohere.embed3; a sketch of the embed calls follows this list.
- API vs Self-hosted Parsing Techniques: The discussion highlighted various parsing services, noting the effectiveness of Upstage/Pymu4PDF in comparison to more expensive options like Marker.
- While self-hosting is valuable for those with abundant compute resources, the member finds API services more suitable for their start-up needs.
- Cohere Reranker Exclusively on API: A user inquired about the availability of Cohere reranker through the API.
- Another member confirmed that it is only available via the API.
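For the parsing-service-plus-Postgres plan above, text and image calls land in the same vector space, which is what makes a single pgvector column workable. A sketch assuming the current Cohere SDK (model name and parameters are per Cohere's Embed v3 docs as we understand them):

```python
import base64
import cohere

co = cohere.Client()  # reads the CO_API_KEY environment variable

# Text side: documents go in with input_type="search_document".
text_resp = co.embed(
    texts=["Invoice 123: three line items, total $42.00"],
    model="embed-english-v3.0",
    input_type="search_document",
)

# Image side: Embed 3 multimodal takes base64 data URLs with input_type="image".
with open("scanned_page.png", "rb") as f:
    data_url = "data:image/png;base64," + base64.b64encode(f.read()).decode()
img_resp = co.embed(
    model="embed-english-v3.0",
    input_type="image",
    images=[data_url],
)

print(len(text_resp.embeddings[0]), len(img_resp.embeddings[0]))  # same dimensionality
```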
Cohere ▷ #questions (4 messages):
Cohere billing process
Contacting sales
- Cohere Billing: GCP Marketplace Confusion: A user asked whether charges after activating Cohere through the GCP Marketplace would go to the payment card registered on the platform or be billed through GCP.
- Another member clarified that Vertex bills via GCP, likely addressing the user's concerns about billing preferences.
- Inquiries about Sales Contact: One user asked if it is possible to contact sales directly within the channel.
- A member promptly provided an email address, suggesting they reach out to support@cohere.com for further assistance.
OpenAI ▷ #ai-discussions (9 messages🔥):
AI Storytelling Improvements
Interactive Prompting Techniques
GitHub Copilot Updates
- AI storytelling has improved: A member expressed genuine surprise at how well AI now writes stories, noting that earlier outputs were boring and predictable.
- They mentioned feeling pleasantly surprised by the current quality, despite creating the prompts themselves.
- Enhancing AI creativity through interaction: A member recommended asking the AI what makes a good story before initiating a story prompt to boost creativity.
- Another member suggested asking the AI to review its own work and adjust based on the feedback to improve the output further.
- Consistent prompt techniques yield better results: One member shared that the prompt 'Please critique your answer. Then, answer the question again' has become their go-to method for any AI interaction; a minimal sketch of this loop follows this list.
- They highlighted the effectiveness of this feedback loop across various AI applications, not limited to storytelling.
- Curiosity about GPT o1 preview thought process: A member raised a concern about not being able to view the thought process for the GPT o1 preview.
- Their inquiry reflected a broader interest in understanding AI decision-making.
- GitHub Copilot introduces new features: It was noted that GitHub Copilot now has Sonnet as a new option alongside o1.
- This update suggests ongoing enhancements in AI coding assistance tools.
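The critique-then-answer loop above is easy to wire up directly. A minimal sketch against the OpenAI Python SDK (model choice is illustrative):

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # illustrative

history = [{"role": "user", "content": "Write a 3-sentence opening for a mystery story."}]
draft = client.chat.completions.create(model=MODEL, messages=history)
history.append({"role": "assistant", "content": draft.choices[0].message.content})

# The feedback loop from the thread: critique, then re-answer.
history.append({
    "role": "user",
    "content": "Please critique your answer. Then, answer the question again.",
})
final = client.chat.completions.create(model=MODEL, messages=history)
print(final.choices[0].message.content)
```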
OpenAI ▷ #gpt-4-discussions (4 messages):
Document summarization hallucinations
Involvement of human experts
Canvas document deletion in CGPT4o
- Hallucination Risks in Document Summarization: A member expressed concern about potential hallucinations in document summarization workflows, noting that testing with GPT-4o showed no issues but worried about production scaling.
- Another member confirmed that hallucinations are an inherent risk in LLMs, suggesting a second LLM pass for fact-checking to mitigate risks.
- Human Experts as Essential Safeguards: One participant emphasized the necessity of having a human subject matter expert involved when using powerful models for summarization tasks.
- “You really just gotta have that human… in the loop to keep an eye on things and doublecheck,” underscoring the importance of human oversight.
- Desire to Delete Documents in Canvas: A member voiced a wish for the ability to delete Canvas documents in the integration of CGPT4o + Canvas.
- This point highlights a potential limitation in the current functionality that users find inconvenient.
OpenAI ▷ #prompt-engineering (3 messages):
AI JSON Modification Techniques
Assistant API Token Limits
File Upload Data Handling
- AI struggles with large JSON files: A member described issues with passing large JSON data to the assistant, noting it sometimes omits parts of the data from the output.
- They speculated that this might be due to token limits, leading to incomplete processing of the input file.
- Chunking JSON data for better results: The member considered chunking the JSON data to ensure the assistant processes all entries, though they aimed to avoid this as it may complicate future tasks.
- They sought alternative solutions instead of breaking the data into smaller parts; in case chunking becomes unavoidable, a budget-based splitter is sketched after this list.
- Polling for assistant modifications: Discussion around prompting the AI to populate specific values in the JSON data without affecting other entries.
- The member mentioned using two assistants; one to handle data upload and another to format the output to avoid creative malformations in the JSON structure.
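If token limits do turn out to be the culprit, a budget-based splitter keeps every entry intact while capping each request's size. A minimal sketch (character count as a cheap proxy for tokens):

```python
import json

def chunk_entries(entries, max_chars=8000):
    """Group JSON entries into chunks that stay under a rough size budget."""
    chunks, current, size = [], [], 0
    for entry in entries:
        entry_len = len(json.dumps(entry))
        if current and size + entry_len > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(entry)
        size += entry_len
    if current:
        chunks.append(current)
    return chunks

data = [{"id": i, "value": None} for i in range(10_000)]
for chunk in chunk_entries(data):
    payload = json.dumps(chunk)  # send each payload to the assistant, merge results after
```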
OpenAI ▷ #api-discussions (3 messages):
AI Prompting Techniques
JSON Data Handling
Token Limit Issues
- Exploring AI Prompting Techniques: A user shared their method of using two assistants, one for uploading JSON data and another for formatting outputs to prevent malformation.
- They mentioned a humorous outcome where the AI gets 'creative' and outputs malformed JSON data despite instructions.
- Struggles with Large JSON Files: The user expressed frustration with the AI deleting portions of large JSON data during processing, leading to incomplete outputs.
- They speculated that this issue might be related to token limits, causing the assistant to omit data instead of processing the entire file.
- Avoiding Chunking JSON Data: The user contemplated chunking the JSON data to manage size but expressed a desire to avoid this solution due to potential complications.
- They seek a more effective approach to ensure the AI processes all entries in large JSON files without skipping.
tinygrad (George Hotz) ▷ #general (1 messages):
TokenFormer implementation
tinygrad
- Minimal TokenFormer Ported to Tinygrad: A minimal implementation of TokenFormer has been successfully ported to tinygrad and is available on the GitHub repository.
- This implementation focuses on both inference and learning capabilities, enhancing the functionality of tinygrad.
- Discussion on Tinygrad Enhancements: The community expressed excitement about the new tinygrad features and how they improve model implementation and performance. Members shared insights about potential future integrations with other frameworks to further expand tinygrad's capabilities.
- Interest in Collaborative Development: Several members voiced interest in collaborating on improving the tinygrad ecosystem, focusing on contributions and support. There was a suggestion to organize a monthly meeting to discuss ongoing projects and gather feedback.
- Exploring New Architectures in Tinygrad: Conversations shifted toward exploring new architectures that could be supported by tinygrad, specifically mentioning the need for efficiency. Members debated the merits of various design choices and frameworks that could complement tinygrad.
- Performance Metrics for Tinygrad Models: A lively discussion arose regarding the performance metrics for models implemented in tinygrad, with suggestions for standardized benchmarking. Members agreed that collective metrics would help in evaluating progress and attracting more users.
Link mentioned: GitHub - kroggen/tokenformer-minimal at tinygrad: Minimal implementation of TokenFormer for inference and learning - GitHub - kroggen/tokenformer-minimal at tinygrad
tinygrad (George Hotz) ▷ #learn-tinygrad (10 messages🔥):
Hailo Reverse Engineering
CUDA WMMA Layout Discrepancies
- Hailo Reverse Engineering begins: A member is starting their Hailo reverse engineering process with the goal of opening up a new accelerator. They expressed concern about having to compile kernels multiple times when interfacing between ONNX, Tinygrad, and TensorFlow.
- They hope to avoid wasted time by ensuring that kernels remain consistent between runs, especially when using `BEAM=2`; see the note after this list.
- Confusion over CUDA WMMA Layout: A member raised questions about whether the layout of A in CUDA WMMA differs from the expected format outlined in the NVIDIA documentation. They provided code snippets indicating discrepancies in input shapes.
- Another member sought clarification on the differences in the ops_python mapping functions and expressed interest in correctly addressing any mismatches with the actual TC implementation.
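On the `BEAM=2` point, beam search in tinygrad is driven by an environment variable read at kernel-build time, and winning kernels are cached on disk, so later runs should reuse rather than re-search them. A small sketch (illustrative; check tinygrad's environment-variable docs for specifics):

```python
import os
os.environ["BEAM"] = "2"  # must be set before tinygrad builds any kernels

from tinygrad import Tensor

a, b = Tensor.rand(256, 256), Tensor.rand(256, 256)
(a @ b).realize()  # beam-searched kernel; cached, so a second run skips the search
```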
OpenInterpreter ▷ #general (8 messages🔥):
Comparative Tool Interfaces
OS Mode Updates
Claude Computer Control
- Looking for Tool Interface Standards: A member expressed interest in discussing comparative discussions of tool interfaces, suggesting a need for standardization amid the vast array of frameworks available.
- Another member chimed in, humorously noting the overwhelming number of frameworks and how it's challenging to provide specifics.
- OS Mode Exclusively for Anthropic Models: Members discussed whether the new update to OS mode exclusively supports Anthropic models, confirming that it does for now, but fixes are anticipated soon.
- A member mentioned trying for a demo at a house party the following day.
- Understanding OS Mode Functionality: A member sought clarification on how OS mode translates prompts to actions on a desktop, questioning the mechanism behind code generation and mouse clicking.
- It was explained that the system employs Claude Computer Control to execute mouse clicks and provided a link to the relevant code for further reference.
Links mentioned:
- Remote ollama/llava (localhost:11434) fails to load in OS mode · Issue #1486 · OpenInterpreter/open-interpreter: Describe the bug Attempting to run ollama/llava in OS mode gives the following error: I'm not sure if it has to do with this specific model or not, or if being pointed to a remote ollama server is...
- open-interpreter/computer_use/tools/computer.py at development · OpenInterpreter/open-interpreter: A natural language interface for computers. Contribute to OpenInterpreter/open-interpreter development by creating an account on GitHub.
- Open Interpreter: Open Interpreter has 6 repositories available. Follow their code on GitHub.
OpenInterpreter ▷ #O1 (1 messages):
zer0blanks.: https://www.tiktok.com/t/ZTFckAFHR/
Modular (Mojo 🔥) ▷ #mojo (8 messages🔥):
C_Buffer Structure Changes
Performance Improvements using Pointers
Understanding Bounds Checks
- C_Buffer structure changes accelerate performance: A member said they would change the C_Buffer structure and promised to share performance results as they develop their matmul kernel in Mojo.
- They expressed gratitude towards the community, noting that using pointers instead of a list made the implementation faster.
- Questions on initializing lists with extra elements: A member questioned the logic behind initializing a list with 8 elements and then appending another 8.
- The original author admitted to having uploaded a prior version of their Mojo code.
- Inquiry about additional security bounds checks: A member requested links to read about the specific additional security bounds checks that slow down the list structure.
- Another member responded that these checks are generic and present in most programming languages aside from C, mentioning C++'s recommended indexing methods.
OpenAccess AI Collective (axolotl) ▷ #general (4 messages):
ScheduleFree SOAP
Hyperparameter Adjustments
MOEs and Model Merging
CAME Comparison
- ScheduleFree SOAP boasts efficiency improvements: The ScheduleFree SOAP implementation is claimed to be more compute-efficient and memory-efficient, and to converge faster than traditional SOAP by allowing higher learning rates.
- This efficiency makes it a strong contender among existing optimizers; the repo focuses primarily on fast _foreach and PaLM versions.
- Note on hyperparameter changes for ScheduleFree SOAP: For optimal performance, hyperparameters must be adjusted: it uses PaLM's beta2 schedule and renames 'betas' to 'beta', while supporting higher learning rates (roughly a 10x increase is suggested).
- Warmup is necessary; 10% is recommended in the literature, but 100 steps can suffice to get started. A hedged construction sketch follows this list.
- Interest in MoEs and model merging wanes post-Llama 3.2: A member inquired about the current status of Mixture-of-Experts models (MoEs) and model merging, noting the lack of discussion since the release of Llama 3.2.
- This raises questions about the ongoing relevance and application of these strategies in the current landscape.
- Comparative discussion with CAME: A member questioned how ScheduleFree SOAP compares with CAME, seeking insights on performance metrics or efficiencies.
- This comparison indicates an interest in understanding the latest advancements in optimization techniques.
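Putting the hyperparameter notes together, construction might look like the sketch below. The class name is hypothetical (the real one lives in heavyball/schedule_free_palm_foreach_soap.py; check the repo), but the knobs are the ones described above:

```python
import torch
from heavyball import ScheduleFreeSOAP  # hypothetical name; see the HeavyBall repo

model = torch.nn.Linear(512, 512)
base_lr = 3e-4  # what you might use with plain SOAP or AdamW

opt = ScheduleFreeSOAP(
    model.parameters(),
    lr=base_lr * 10,   # the thread suggests roughly a 10x higher learning rate
    beta=0.9,          # note 'beta', not 'betas': PaLM's beta2 schedule is built in
    warmup_steps=100,  # literature says ~10% of training; 100 steps can suffice
)
```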
Link mentioned: HeavyBall/heavyball/schedule_free_palm_foreach_soap.py at main · ClashLuke/HeavyBall: Implementations of various optimizers; mostly focussing on fast _foreach and PaLM versions - ClashLuke/HeavyBall
OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (1 messages):
Zero2 performance
Zero1 troubleshooting
- Zero2 performance issues: A user reported that Zero2 was extremely slow, indicating that it would not work for their needs.
- They expressed the need to find fixes while considering a return to Zero1.
- Exploring fixes for Zero1: Given the performance issues with Zero2, the user is looking for ways to improve Zero1.
- This suggests a potential shift back to Zero1 if Zero2's performance cannot be fixed; config fragments for both stages are sketched after this list.
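For context when weighing the switch: ZeRO-1 shards only optimizer state, while ZeRO-2 also shards gradients and therefore does more communication per step, which is where slowdowns usually hide. Config fragments for both stages, written as Python dicts (keys are standard DeepSpeed `zero_optimization` options; values need tuning per setup):

```python
zero1 = {"zero_optimization": {"stage": 1}}  # shard optimizer state only

zero2 = {
    "zero_optimization": {
        "stage": 2,                    # also shard gradients
        "overlap_comm": True,          # overlap gradient reduction with backward
        "contiguous_gradients": True,  # reduce memory fragmentation
        "reduce_bucket_size": 5e8,     # fewer, larger reduces can help slow interconnects
    }
}
```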
LAION ▷ #general (4 messages):
Open Source Speech Enhancer
Resemble Enhance
- Spirit from Germany critiques Resemble Enhance: A user inquired about a good open source speech enhancer, leading to the mention of Resemble Enhance.
- Spirit from Germany tested it and found the results to be underwhelming due to the presence of artifacts.
- Discussion around speech enhancing tools: The conversation centers on the performance of various speech enhancers with users sharing their experiences.
- Concerns regarding artifacts and overall effectiveness of tools like Resemble Enhance were prominently highlighted.
DSPy ▷ #general (2 messages):
RLHF paradigm
Serialized multi-component systems
- Understanding RLHF and Textual Feedback: A member raised a theoretical question about the RLHF (Reinforcement Learning from Human Feedback) paradigm, specifically how to translate textual feedback into numerical rewards in open-world scenarios, beyond simple hard labeling.
- 'Isn't there any other way apart from hard labeling?' suggests curiosity about more flexible feedback mechanisms.
- Limitations in Documenting Multi-Component Systems: Another member reported that in a serialized multi-component DSPy system, `lm.history()` only shows the doc string for the first component, with intermediate classes providing less detail.
- This raises questions about whether this behavior is expected or indicates a limitation in how documentation is generated for complex systems.
Torchtune ▷ #dev (2 messages):
KD-div Misinterpretation
Cross-Entropy Optimization
- KD-div's Returned Value Confusion: It was highlighted that while we refer to it as KD-div, the returned value is actually cross-entropy, which can lead to misinterpretation when comparing with other loss functions like KL-div. The relationship is written out after this list.
- The potential confusion arises particularly when considering swapping teacher and student logits, a process often referred to as reverse KL.
- Cross-Entropy as a Natural Extension: A viewpoint shared suggests it feels more intuitive to optimize for cross-entropy, interpreting it as an extension of cross-entropy loss from regular labels to soft labels produced by a teacher model.
- This perspective emphasizes the evolution from hard labels in training to soft labels in fine-tuning as a natural progression.
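To make the distinction precise: with teacher distribution p and student distribution q, the returned quantity and the named quantity differ by a term that is constant with respect to the student,

```latex
H(p, q) = \mathrm{KL}(p \,\|\, q) + H(p)
```

so minimizing the returned cross-entropy H(p, q) produces the same gradients as minimizing KL(p‖q); only the reported loss values are offset by the teacher entropy H(p). 'Reverse KL' corresponds to swapping the arguments, i.e. KL(q‖p), which is not a constant offset away and genuinely changes the objective.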