[AINews] Rombach et al: FLUX.1 [pro|dev|schnell], $31m seed for Black Forest Labs
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
Team and $31m is all you need to recreate Stability?
AI News for 7/31/2024-8/1/2024. We checked 7 subreddits, 384 Twitters and 28 Discords (335 channels, and 3565 messages) for you. Estimated reading time saved (at 200wpm): 346 minutes. You can now tag @smol_ai for AINews discussions!
We have been covering Rombach et al's work this year closely as he shipped Stable Diffusion 3 and then left Stability AI. His new stab at the text-to-image domain is FLUX.1, and we love featuring pretty images here so here it is executing a variety of standard tasks from hyperrealistic to fantastical to photorealistic to long text prompting:
The three variants span the spectrum of size and licensing:
- pro: API only
- dev: open-weight, non-commercial
- schnell: Apache 2.0
Based on Black Forest Labs' own ELO score, all three varients outdo Midjourney and Ideogram:
They also announced they will work on SOTA Text-to-Video next. All in all, one of the strongest and most confident model lab launches we've seen this past year.
Table of Contents
- AI Twitter Recap
- AI Reddit Recap
- AI Discord Recap
- PART 1: High level Discord summaries
- HuggingFace Discord
- Nous Research AI Discord
- Unsloth AI (Daniel Han) Discord
- Perplexity AI Discord
- OpenAI Discord
- CUDA MODE Discord
- Stability.ai (Stable Diffusion) Discord
- LM Studio Discord
- Eleuther Discord
- Interconnects (Nathan Lambert) Discord
- Latent Space Discord
- LlamaIndex Discord
- Cohere Discord
- LangChain AI Discord
- OpenRouter (Alex Atallah) Discord
- OpenInterpreter Discord
- Modular (Mojo 🔥) Discord
- OpenAccess AI Collective (axolotl) Discord
- DSPy Discord
- tinygrad (George Hotz) Discord
- LAION Discord
- Torchtune Discord
- MLOps @Chipro Discord
- LLM Finetuning (Hamel + Dan) Discord
- PART 2: Detailed by-Channel summaries and links
- HuggingFace ▷ #announcements (1 messages):
- HuggingFace ▷ #general (852 messages🔥🔥🔥):
- HuggingFace ▷ #cool-finds (4 messages):
- HuggingFace ▷ #i-made-this (16 messages🔥):
- HuggingFace ▷ #reading-group (8 messages🔥):
- HuggingFace ▷ #core-announcements (1 messages):
- HuggingFace ▷ #NLP (2 messages):
- HuggingFace ▷ #diffusion-discussions (2 messages):
- Nous Research AI ▷ #off-topic (1 messages):
- Nous Research AI ▷ #interesting-links (9 messages🔥):
- Nous Research AI ▷ #general (441 messages🔥🔥🔥):
- Nous Research AI ▷ #ask-about-llms (2 messages):
- Nous Research AI ▷ #reasoning-tasks-master-list (3 messages):
- Unsloth AI (Daniel Han) ▷ #general (205 messages🔥🔥):
- Unsloth AI (Daniel Han) ▷ #off-topic (4 messages):
- Unsloth AI (Daniel Han) ▷ #help (130 messages🔥🔥):
- Unsloth AI (Daniel Han) ▷ #research (5 messages):
- Perplexity AI ▷ #announcements (1 messages):
- Perplexity AI ▷ #general (293 messages🔥🔥):
- Perplexity AI ▷ #sharing (10 messages🔥):
- Perplexity AI ▷ #pplx-api (4 messages):
- OpenAI ▷ #ai-discussions (255 messages🔥🔥):
- OpenAI ▷ #gpt-4-discussions (24 messages🔥):
- OpenAI ▷ #prompt-engineering (12 messages🔥):
- OpenAI ▷ #api-discussions (12 messages🔥):
- CUDA MODE ▷ #general (55 messages🔥🔥):
- CUDA MODE ▷ #triton (9 messages🔥):
- CUDA MODE ▷ #torch (3 messages):
- CUDA MODE ▷ #algorithms (1 messages):
- CUDA MODE ▷ #cool-links (3 messages):
- CUDA MODE ▷ #pmpp-book (2 messages):
- CUDA MODE ▷ #torchao (11 messages🔥):
- CUDA MODE ▷ #llmdotc (177 messages🔥🔥):
- CUDA MODE ▷ #lecture-qa (1 messages):
- CUDA MODE ▷ #cudamode-irl (4 messages):
- Stability.ai (Stable Diffusion) ▷ #announcements (1 messages):
- Stability.ai (Stable Diffusion) ▷ #general-chat (212 messages🔥🔥):
- LM Studio ▷ #general (121 messages🔥🔥):
- LM Studio ▷ #hardware-discussion (24 messages🔥):
- Eleuther ▷ #general (88 messages🔥🔥):
- Eleuther ▷ #research (7 messages):
- Eleuther ▷ #scaling-laws (15 messages🔥):
- Eleuther ▷ #interpretability-general (5 messages):
- Eleuther ▷ #lm-thunderdome (11 messages🔥):
- Interconnects (Nathan Lambert) ▷ #news (61 messages🔥🔥):
- Interconnects (Nathan Lambert) ▷ #ml-drama (32 messages🔥):
- Interconnects (Nathan Lambert) ▷ #random (4 messages):
- Interconnects (Nathan Lambert) ▷ #posts (28 messages🔥):
- Latent Space ▷ #ai-general-chat (56 messages🔥🔥):
- LlamaIndex ▷ #blog (3 messages):
- LlamaIndex ▷ #general (47 messages🔥):
- LlamaIndex ▷ #ai-discussion (1 messages):
- Cohere ▷ #discussions (16 messages🔥):
- Cohere ▷ #questions (17 messages🔥):
- Cohere ▷ #api-discussions (15 messages🔥):
- Cohere ▷ #cohere-toolkit (3 messages):
- LangChain AI ▷ #general (45 messages🔥):
- LangChain AI ▷ #langserve (2 messages):
- LangChain AI ▷ #share-your-work (2 messages):
- OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):
- OpenRouter (Alex Atallah) ▷ #general (39 messages🔥):
- OpenInterpreter ▷ #general (23 messages🔥):
- OpenInterpreter ▷ #O1 (8 messages🔥):
- Modular (Mojo 🔥) ▷ #general (18 messages🔥):
- Modular (Mojo 🔥) ▷ #mojo (4 messages):
- Modular (Mojo 🔥) ▷ #max (5 messages):
- OpenAccess AI Collective (axolotl) ▷ #general (8 messages🔥):
- OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (5 messages):
- OpenAccess AI Collective (axolotl) ▷ #general-help (6 messages):
- OpenAccess AI Collective (axolotl) ▷ #replicate-help (1 messages):
- OpenAccess AI Collective (axolotl) ▷ #axolotl-help-bot (4 messages):
- DSPy ▷ #general (10 messages🔥):
- DSPy ▷ #random (1 messages):
- DSPy ▷ #papers (8 messages🔥):
- DSPy ▷ #jobs (2 messages):
- DSPy ▷ #colbert (1 messages):
- tinygrad (George Hotz) ▷ #general (2 messages):
- tinygrad (George Hotz) ▷ #learn-tinygrad (11 messages🔥):
- LAION ▷ #general (4 messages):
- LAION ▷ #research (6 messages):
- Torchtune ▷ #general (5 messages):
- Torchtune ▷ #dev (4 messages):
- MLOps @Chipro ▷ #events (2 messages):
- MLOps @Chipro ▷ #general-ml (5 messages):
- LLM Finetuning (Hamel + Dan) ▷ #general (3 messages):
AI Twitter Recap
all recaps done by Claude 3.5 Sonnet, best of 4 runs.
Gemma 2 Release and AI Model Developments
Google DeepMind released Gemma 2, a new family of open-source AI models, including a 2 billion parameter model (Gemma-2 2B) that has achieved impressive performance:
- @GoogleDeepMind announced Gemma-2 2B, a new 2 billion parameter model offering best-in-class performance for its size and efficient operation on various hardware.
- @lmsysorg reported that Gemma-2 2B achieved a score of 1130 on the Chatbot Arena, outperforming models 10x its size and surpassing GPT-3.5-Turbo-0613 (1117) and Mixtral-8x7b (1114).
- @rohanpaul_ai highlighted that Gemma-2 2B outperforms all GPT-3.5 models on Chatbot Arena, using distillation to learn from larger models and optimized with NVIDIA TensorRT-LLM for various hardware deployments.
- @fchollet noted that Gemma 2-2B is the best model for its size, outperforming GPT 3.5 and Mixtral on the lmsys Chatbot Arena leaderboard.
The release also includes additional components:
- ShieldGemma: Safety classifiers for detecting harmful content, available in 2B, 9B, and 27B sizes.
- Gemma Scope: Uses sparse autoencoders (SAEs) to analyze Gemma 2's internal decision-making, with over 400 SAEs covering all layers of Gemma 2 2B and 9B.
AI Model Benchmarks and Comparisons
- @bindureddy criticized the Human Eval Leaderboard, claiming it's gamed and doesn't accurately represent model performance. They argue that GPT-3.5 Sonnet is superior to GPT-4o-mini, despite leaderboard rankings.
- @Teknium1 pointed out a discrepancy between Arena scores and MMLU performance for Gemma-2 2B, noting it scores higher than GPT-3.5-turbo on Arena but has an MMLU of 50 compared to 3.5-turbo's 70.
Open-Source AI and Government Stance
- @ClementDelangue shared that the United States Department of Commerce issued policy recommendations supporting the availability of key components of powerful AI models, endorsing "open-weight" models.
- @ylecun praised the NTIA report supporting open-weight/open-source AI platforms, suggesting it's time to abandon innovation-killing bills based on imaginary risks.
AI in Coding and Development
- @svpino discussed the limitations of current AI coding tools like Cursor, ChatGPT, and Claude, noting they don't significantly improve productivity in writing code.
- @svpino emphasized the potential of "passive AI" tools that work in the background, offering recommendations and identifying issues in code without requiring explicit queries.
Other Notable AI Developments
- @c_valenzuelab demonstrated real-time video generation, producing 10 seconds of video in 11 seconds.
- @mervenoyann discussed SAMv2 (Segment Anything Model 2), which introduces a new task called "masklet prediction" for video segmentation, outperforming previous state-of-the-art models.
- @rohanpaul_ai shared information about faster ternary inference, allowing a 3.9B model to run as fast as a 2B model while using only 1GB of memory.
Memes and Humor
- @bindureddy joked about Apple Vision Pro being abandoned by users and potentially being the biggest flop in Apple's history.
- @teortaxesTex shared a humorous tweet about the "Friend" gimmick.
AI Reddit Recap
/r/LocalLlama Recap
Theme 1. Google's Gemma 2 Release and Ecosystem
- Google just launched 3 new Gemma products (Gemma 2 2B, ShieldGemma, and Gemma Scope) (Score: 143, Comments: 30): Google has expanded its Gemma AI lineup with three new products: Gemma 2 2B, ShieldGemma, and Gemma Scope. While specific details about these products are not provided in the post, the launch suggests Google is continuing to develop and diversify its AI offerings in the Gemma family.
- Gemma-2 2b 4bit GGUF / BnB quants + 2x faster finetuning with Flash Attention support! (Score: 74, Comments: 10): Google released Gemma-2 2b, trained on 2 trillion tokens of distilled output from a larger LLM. The post author uploaded 4bit quantized versions (bitsandbytes and GGUF) for 2b, 9b, and 27b models, and developed a method for 2x faster finetuning with 63% less VRAM usage, incorporating Flash Attention v2 support for Gemma-2. They provided links to various resources including Colab notebooks, quantized models on Hugging Face, and an online inference chat interface for Gemma-2 instruct.
- Google quietly released a sparse auto-encoder to interpret Gemma 2 and 9b. This is a google colab they put together to get you started. Super exciting, I hope Meta follows this example! (Score: 104, Comments: 22): Google has released a sparse auto-encoder for interpreting Gemma 2 and 9b models, providing a Google Colab notebook to help users get started with the tool. This release aims to enhance the interpretability of these language models, potentially setting a precedent for increased transparency in AI development that the poster hopes other companies like Meta will follow.
- The sparse auto-encoder tool allows visualization of layer activations for each token, potentially enabling research into refusal removal, induction heads, and model lying detection. Users can explore low-hanging fruit in safety research and measure fine-tuning impacts on specific concepts.
- The tool opens possibilities for runtime, low-cost fine-tuning to promote certain moods or themes in AI models. This could be applied to create dynamic AI experiences, such as an interrogation game where the model's lying probability is scored in real-time.
- Users discussed interpreting the tool's graphs, noting they show token probabilities which can quantify fine-tuning effects. The feature activations, represented as number strings, are considered more useful than the visual dashboard for analysis purposes.
Theme 2. Open Source LLM Advancements and Comparisons
- Llama-3.1 8B 4-bit HQQ/calibrated quantized model: 99.3% relative performace to FP16 and fast inference speed (Score: 156, Comments: 49): The Llama-3.1 8B model has been released in a 4-bit HQQ/calibrated quantized version, achieving 99.3% relative performance to FP16 while offering the fastest inference speed for transformers. This high-quality quantized model is available on Hugging Face, combining efficiency with performance for improved AI applications.
- Just dropping the image.. (Score: 562, Comments: 74): The image compares OpenAI's model releases with open-source alternatives, highlighting the rapid progress of open-source AI development. It shows that while OpenAI released GPT-3 in June 2020 and ChatGPT in November 2022, open-source models like BLOOM, OPT, and LLaMA were released in quick succession between June and December 2022, with Alpaca following in March 2023.
- Users criticize OpenAI's lack of openness, with comments like "OpenAI being full closed. The irony." and suggestions to rename it "ClosedAI" or "ClosedBots". Some argue OpenAI is sustained by public hype and brand recognition from being first in the space.
- Gemma 2 from Google receives praise, with users noting its surprising quality and personality. One user describes it as "better than L3 in many ways" and expresses anticipation for Gemma 3 with potential multimodality and longer context.
- Mistral AI is commended for its rapid progress despite limited resources compared to larger companies. Users suggest normalizing comparisons based on team size and available resources to highlight Mistral's achievements.
- Google's Gemma-2-2B vs Microsoft Phi-3: A Comparative Analysis of Small Language Models in Healthcare (Score: 65, Comments: 9): A comparative analysis of Google's Gemma-2-2b-it and Microsoft's Phi-3-4k models in the medical field reveals their performance without fine-tuning. Microsoft's Phi-3-4k outperforms with an average score of 68.93%, while Google's Gemma-2-2b-it achieves 59.21% on average, as shared in a tweet by Aaditya Ura.
- Users criticized the graph color choices in the original analysis, highlighting the importance of visual presentation in data comparisons.
- Discussion arose about the specific Phi-3 model used, with speculation it was the 3.8B Mini version. Users also inquired about fine-tuning techniques for the PubMed dataset.
- Debate ensued on the relevance of evaluating small LLMs on medical QA datasets. Some argued for its importance in assessing medical knowledge, while others noted LLMs are already being used to answer medical questions, especially in areas with limited access to doctors.
Theme 3. Hardware and Inference Optimization for LLMs
- Woah, SambaNova is getting over 100 tokens/s on llama 405B with their ASIC hardware and they let you use it without any signup or anything. (Score: 247, Comments: 94): SambaNova has achieved a breakthrough in AI hardware performance, generating over 100 tokens per second on the Llama 405B model using their ASIC hardware. This technology is now accessible to users without requiring any signup process, potentially democratizing access to high-performance AI inference capabilities.
- Post your tokens per second for llama3.1:70b (Score: 61, Comments: 124): The post requests users to share their tokens per second (TPS) performance benchmarks for the Llama 3.1 70B model. While no specific performance data is provided in the post itself, it aims to collect and compare TPS metrics from different users and hardware setups running this large language model.
- 70b here I come! (Score: 216, Comments: 65): The post author is preparing to run 70B parameter models with a high-end GPU setup. They express excitement about their upcoming capability to work with large language models, as indicated by the enthusiastic title "70b here I come!"
- Users discussed thermal management, with one mentioning undervolting two 3090 FE GPUs for better performance. The original poster uses a Meshify case with good airflow and disables the 3090 when not needed.
- Performance benchmarks were shared, with one user reporting 35 tokens per second using AWQ and LMDeploy for the LLaMA 3.1 70B model. Another recommended a GitHub tool for monitoring GDDR6 memory temperatures.
- Concerns about 3090 memory overheating were raised, especially in warmer climates. One user experienced crashes with Stable Diffusion image generation and resorted to removing the case side panel for better cooling.
Theme 4. New Tools and Frameworks for LLM Development
- PyTorch just released their own llm solution - torchchat (Score: 135, Comments: 28): PyTorch has released torchchat, a new solution for running Large Language Models (LLMs) locally on various devices including servers, desktops, and mobile. The tool supports multiple models like Llama 3.1, offers Python and native execution modes, and includes features for eval and quantization, with the GitHub repository available at https://github.com/pytorch/torchchat.
- A user tested torchchat with Llama 3.1, achieving 26.47 tokens/sec on an NVIDIA GeForce RTX 3090. Comparatively, vLLM reached 43.2 tokens/s initially, and up to 362.7 tokens/s with higher batch sizes.
- Discussions focused on performance optimization, including using --num-samples for more representative metrics after warmup, --compile and --compile-prefill for PyTorch JIT engagement, and --quantize for model quantization.
- Users inquired about ROCm support for AMD GPUs, compatibility with Mamba models, and comparisons to other frameworks like Ollama and llama.cpp.
All AI Reddit Recap
r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity
AI Research and Applications
- Google DeepMind's Diffusion Augmented Agents: A new paper from Google DeepMind introduces Diffusion Augmented Agents, potentially advancing AI capabilities in complex environments. (r/singularity)
- AI outperforms doctors in prostate cancer detection: A study finds AI detects prostate cancer 17% more accurately than doctors, showcasing the potential of AI in medical diagnostics. (r/singularity)
AI Products and User Experiences
- ChatGPT Advanced Voice Mode: A video demonstration shows ChatGPT's voice mode mimicking an airline pilot before abruptly stopping due to content guidelines. (r/singularity)
- OpenAI's improved conversational AI: A user reports better conversational flow and educational capabilities in OpenAI's latest update, used during a 1.5-hour commute to learn about GitHub repositories. (r/OpenAI)
- Criticism of AI wearable device: A post criticizes a new AI wearable device, comparing it to previous failed attempts like the Humane Pin and Rabbit R1. Users discuss potential issues with the device's functionality and business model. (r/singularity)
AI and Data Rights
- Reddit CEO demands payment for AI data access: Reddit's CEO states that Microsoft should pay to search the site, sparking discussions about data rights and compensation for user-generated content. (r/OpenAI)
AI Discord Recap
A summary of Summaries of Summaries
Claude 3.5 Sonnet
1. New AI Models and Capabilities
- Llama 3.1 Launch Sparks Debate: Meta released Llama 3.1, including a new 405 billion parameter model trained on 15.6 trillion tokens, with Together AI's blog post sparking debate about implementation differences affecting model quality across providers.
- The AI community engaged in discussions about potential cherry-picking of results and the importance of rigorous, transparent evaluation methodologies. Dmytro Dzhulgakov pointed out discrepancies in Together AI's showcase examples, emphasizing the need for consistent quality testing.
- Flux Shakes Up Text-to-Image Generation: Black Forest Labs, formed by original Stable Diffusion team members, launched FLUX.1, a new suite of state-of-the-art text-to-image models including a 12B parameter version available under non-commercial and open licenses.
- The FLUX.1 model gained attention for its impressive capabilities, with users noting its strengths in rendering body extremities like hands and fingers. A pro version of FLUX.1 is already available for testing on Replicate, showcasing the rapid development in the text-to-image space.
2. AI Infrastructure and Efficiency Gains
- MoMa Architecture Boosts Efficiency: Meta introduced MoMa, a new sparse early-fusion architecture for mixed-modal language modeling that significantly improves pre-training efficiency, as detailed in their recent paper.
- According to Victoria Lin, MoMa achieves approximately 3x efficiency gains in text training and 5x in image training. The architecture employs a mixture-of-experts (MoE) framework with modality-specific expert groups for handling interleaved mixed-modal token sequences.
- GitHub Integrates AI Models: GitHub announced GitHub Models, a new feature that brings industry-leading AI tools directly to developers on their platform, aiming to bridge the gap between coding and AI engineering.
- This integration is designed to make AI more accessible to GitHub's massive developer base, potentially transforming how coding and AI interact at scale. The community speculated whether this move is an attempt to compete with platforms like Hugging Face by integrating AI capabilities into developers' existing workflows.
3. AI Ethics and Policy Developments
- NTIA Advocates for Open AI Models: The National Telecommunications and Information Administration (NTIA) issued a report supporting the openness of AI models while recommending risk monitoring to guide policymakers in the US.
- Community members noted the NTIA's direct reporting line to the White House, giving significant weight to its policy recommendations on AI model openness. This report could potentially influence future AI regulations and policy directions in the United States.
- Watermarking Debate in AI Trust: A debate emerged around the effectiveness of watermarking in solving trust issues in AI, with some arguing it only works in institutional settings and cannot prevent misuse entirely.
- The discussion suggested that better cultural norms and trust mechanisms, rather than watermarking alone, are needed to address the spread of deepfakes and misrepresented content. This highlights ongoing challenges in establishing trust and authenticity in AI-generated content.
PART 1: High level Discord summaries
HuggingFace Discord
- Fresh Web Simulators for Neural Networks: A new Neural network simulation tool invites AI enthusiasts to fiddle with different neural network configurations online.
- The simulator aims at demystifying neural network behaviors, featuring an interactive experience for users to modify and understand neural dynamics.
- Blueprints for Transferable AI Wisdom: IBM offers a detailed breakdown of Knowledge Distillation, elucidating the process of imbuing compact 'student' models with insights from bulkier 'teacher' models.
- Knowledge distillation stands out as a method for model compression and efficient knowledge transfer, pivotal for AI scalability.
- Interactive Heatmap Chronicles Model Milestones: An innovative heatmap space charts AI model releases, gaining community interest for its potential integration into Hugging Face profiles.
- This tool presents an insightful visual aggregation of model development trends, aiming to bolster visibility and understanding of AI evolution tempo.
- Crafting Semantic Parsers for Solr: A member seeks advice on teaching a Large Language Model (LLM) to interpret queries for Apache Solr, aiming to generate JSON responses with product information.
- With no training dataset at hand, the challenge lies in methodically guiding the LLM to enhance search functionality and user experience.
Nous Research AI Discord
- Chameleon Architecture Leaps Ahead: A new multi-modal architecture pioneered by the creators of Chameleon boasts substantial efficiency gains, with details available in an academic paper.
- Victoria Lin provided insights on Twitter, noting gains of approximately 3x in text training and 5x in image training, making MoMa 1.4B a standout performer (source).
- Decoding the Speculative Decoding: Speculative decoding mechanisms were a hot topic, with claims that smaller draft models can impact output distribution unless corrected by techniques like rejection sampling.
- A YouTube resource further explains speculative decoding, hinting at the balance between speed and fidelity in the process.
- Bitnet Boasts Blazing Speed: Bitnet's finetuning approach is drawing attention, achieving an impressive 198 tokens per second on a singular CPU core as reported on Reddit.
- A compact 74MB model emerged from this finetuning method, with an open-source release expected, triggering anticipation for its use in future projects (Twitter source).
- LangChain: A Key or a Kink?: Debates arose around the necessity of LangChain when using Mixtral API in the OpenAI API format.
- Some members question the requirement for LangChain, suggesting direct API interactions might suffice, sparking a discussion on tool dependencies and API conventions.
- Project Participation without the Price Tag: Members of the community inquired about ways to assist with a no-cost AI project, with steps laid out in an anticipated PR.
- The discussion affirmed the project's cost-free nature, highlighting the actionable tasks to be disclosed in a forthcoming PR, easing onboarding for new contributors.
Unsloth AI (Daniel Han) Discord
- Multi-GPU Meltdown to Victory: Discussions shed light on multi-GPU training issues, praising fixes but highlighting initial setup headaches and environmental tweaks.
- A swap to
llamafacs env
was the key to success for some, contrasting with the more hands-on approach of a manual transformers upgrade for others.
- A swap to
- Unsloth Crypto Runner Unveiled: Details on Unsloth Crypto Runner's AES/PKI-based design were reconciled, elucidating its cryptographic communication from client to server.
- The community buzzed when
MrDragonFox
underscored the imperative of GPU usage, and Skunkworks AI's intent to open-source was revealed.
- The community buzzed when
- Continuous Qwen Refinement Realized: Qwen2-1.5B-Instruct's Continuous Fine-tuning Without Loss ushered in a blend of code FIM and instruct capabilities, marking a technical milestone.
- Community spirit was buoyed as a call for a tutorial to demystify documentation challenges echoed amongst users.
- LoRA's Binding Predicament: Merging LoRA adapters was brought to the fore, with a focus on the risks of melding leading to deceptive 16-bit representations from 4-bit models.
- Concerns bubbled up about the propagation of these faux 16-bit models within the community, prompting vigilance.
Perplexity AI Discord
- Perplexity's Prodigy Perk with Uber One: Uber One members now have access to Perplexity Pro subscription for free, valid until October 31, 2024, providing an enhanced answer engine worth $200.
- To avail this benefit, users in the US and Canada need to maintain their Uber One subscription and set up a new Perplexity Pro account. More details are at Perplexity Uber One.
- Perplexity Tops AI Search Engine Benchmarks: In a comparative assessment, Perplexity Pro outranked rivals like Felo.ai and Chatlabs, excelling in UI/UX and query responses.
- Members rated search engines on their capabilities with Pro Search appearing as a favorite, highlighted on platforms such as ChatLabs.
- Perplexity API Prompts Puzzlement: Discussions revealed user dissatisfaction regarding suboptimal outputs from Perplexity's API, feeling the result quality has declined.
- Speculation about problem prompts rose, with individuals requesting advice on improving outcomes and expressing curiosity about Perplexity References Beta access.
- Perplexity's Refined Flask Authentication: A discussion on Flask highlighted the need for secure user authentication, recommending packages such as
Flask-Login
, and a secure setup guide.- Users were pointed to resources outlining model creation, user authentication routes, and encryption practices.
- OpenAI Voices Future with GPT-4o: OpenAI impressed with its launch of Advanced Voice Mode for ChatGPT, granting Plus subscribers realistic voice interactions as of July 30, 2024.
- The update allows for enhanced vocal features, like emotional tone variation and interruption handling, documented on OpenAI's update page.
OpenAI Discord
- Vivid Visionaries: GPT-4o Sparks Image Innovation: Enthusiastic debate surged on GPT-4o's image output capabilities with users comparing it to DALL-E 3, sharing examples that sparked interest over its lifelike and realistic imagery.
- Despite acclaims for GPT-4o's impressive outputs, criticisms arose on its moderation endpoint, echoing similar concerns faced by DALL-E 3.
- Versatile Vocals: GPT-4o's Vocal Prowess Under the Microscope: AI aficionados tested GPT-4o's voice model abilities, highlighting its adaptability with accents and emotional range, and its capacity to meld background tunes and effects.
- Findings were a mix of admiration for its potential and pointers to its inconsistent performance, igniting discussions on the model's limitations and future improvements.
- Platform Conundrums: The Search for Prompt Precision: AI Engineering mavericks swapped insights on preferred platforms for prompt engineering, elevating Claude 3, Sonnet, and Artifacts + Projects as prime candidates.
- Heuristic tools for prompt evaluations grabbed the spotlight, with the Anthropic Evaluation Tool mentioned for its heuristic approach, while a collaborative Google Sheet with scripts was tabled as a sharable and efficient alternative.
- Strategic Subscription Shift: Pondering Plus's Influence: Community chatter revolved around the impact of cancelling Plus subscriptions, revealing that doing so would render custom GPTs inaccessible.
- The contemplation extended to the prerequisites for GPT monetization, spotlighting the need for substantial usage metrics and localization within the USA as criteria for revenue generation opportunities.
- The Diagram Dilemma: Charting Courses Through AI Assistance: In the world of AI diagrams, participants probed for complimentary tools adept at crafting visual aides, with a nod to ChatGPT – though its diagram-drawing talents remain up for debate.
- The dialogue also touched on the challenge LLMs face in text truncation, suggesting that seeking qualitative descriptors might be more effective than exact character or word counts.
CUDA MODE Discord
- FSDP Discord Sparks Flare: A member's critique of FSDP as 'kind of ass' sparked debate on its scalability, countered by the claim that it excels in ease of use.
- The conversation pivoted toward FSDP's situational suitability, indicating it's not a one-size-fits-all solution despite its user-friendly nature.
- Sharded LLaMA Woes and vLLM Hopes: Challenges in sharding LLaMA 405B on multiple nodes surfaced during discussions, with possible workarounds involving vLLM enhancement for larger context windows.
- Participants recommended approaches like quantization, with some avoiding vLLM, directing users to enhancement details and support for LLaMA 3.1.
- Megatron's Scholarly Appeal: The Megatron paper provoked interest among members discussing distributed training's relevance, backed by resources like the Usenix paper and explanatory MIT lecture video.
- Discourse on Megatron extended to practical insights on distributed training with references to both academically acclaimed and YouTube disseminated materials.
- Triton Tutorial's Tiled Matmul Matrix: Queries regarding the
GROUP_SIZE_M
argument in the Triton tutorial surfaced, addressing its role in optimizing caching.- The debate included how setting
GROUP_SIZE_M
too high could lead to inefficiencies, exploring the delicate equilibrium of hardware design choices.
- The debate included how setting
- Llama 3.1: Turmoil and TorchChat Guideposts: Users voiced the need for a 10-line Python snippet to simplify Llama 3.1 model usage, with existing inference scripts deemed complex.
- In response, PyTorch unveiled TorchChat as a guide, providing the sorely needed reference implementation to run Llama 3.1.
Stability.ai (Stable Diffusion) Discord
- Stable Fast 3D's Lightning Launch: Stability AI announced Stable Fast 3D, a new model capable of converting a single image to a detailed 3D asset in just 0.5 seconds, pushing the boundaries of 3D reconstruction technology. The model's implications for gaming and VR are substantial, with a focus on speed and quality. Discover the technical details.
- 'Stable Fast 3D's incredible processing time pioneers rapid prototyping efforts in 3D frameworks.' Users benefit from additional features like optional remeshing, adding minimal time increase for broad industry applicability.
- SD3 in the Spotlight: Community discussions revolved around the utilization of Stable Diffusion 3 (SD3) Medium, tackling loading errors and exploring the model's capabilities. Shared solutions include obtaining all components and utilizing tools like ComfyUI workflows for smoother operation.
- Challenges such as 'AttributeError' were navigated through community support and adapting to various available UIs, ensuring more seamless creative experiences with SD3.
- Solving the VAE Conundrum: A common issue within the community was addressed: images turning red during rendering due to VAE settings. Collaborative efforts led to troubleshooting methods that mitigate the problem.
- Applying the '--no-half-vae' command emerged as a peer-recommended fix, easing workflows for artists crafting images with accuracy while navigating hardware-specific solutions.
- Clearing Creative Upscaler Fog: A collective effort was made to disentangle the confusion surrounding the mention of a 'Creative Upscaler' with clarification that it is not a Stability AI project. Members exchanged alternative upscaling recommendations.
- The favored techniques included ERSGAN application and adopting transformer technology, with advice pooling from various community-contributed resources for prompted challenges.
- Flux: The Next Generation in Imagery: Anticipation surrounded Black Forest Labs' release of the Flux model, with the community buzzing about enhancements in image rendition and efficient parameter usage. The announcement teased potential for the text-to-image field.
- Discourse on the model's GPU efficiency highlighted the Nvidia 4090 for optimal performance, with a special nod to the model's prowess in rendering body extremities like hands and fingers.
LM Studio Discord
- Exit Codes Expose Compatibility Clashes: LM Studio users report exit codes like 6 and 0, sparking conversations on system compatibility and the debugging labyrinth.
- This dilemma has escalated to discussions around system-specific quirks and the potential need for updated LM Studio versions.
- Gemma 2 Glitches Generate GPU Grief: Challenges in running Gemma 2 2B models emerged, particularly on dated hardware, compelling users to advocate for a new release of LM Studio.
- The community's response included both commiseration and shared strategies for circumventing the hardware hurdle.
- LLaMA: The Embedding Enigma: Enthusiasts explore embedding capabilities with projects like LLM2Vec, amidst queries on LLaMA's integration within LM Studio.
- This culminated in curated conversations on future-forward solutions for text encoders and the excitement around embedding evolution.
- Diving into LM Studio's Depths: Members unraveled bugs in LM Studio, from GPU offloading oddities to nettlesome network errors potentially tied to VPN/DNS configurations.
- Peers pitched in to pinpoint problems and proposed possible patches, promoting a collaborative climate for tackling tech troubles.
- Vision for Vivid LM Studio Features: The discourse delved into dreams of future LM Studio features, with users yearning for additions like TTS voices and RAG-supported document interactions.
- Hugging Face and approaches to Visual Question Answering (VQA) at Papers with Code garnered attention amidst these aspirations.
Eleuther Discord
- Watermark Woes: AI's Authentication Angst: Members debated watermarking's role in AI trust issues, pointing out its limited effectiveness and suggesting that establishing cultural norms** is crucial.
- The concern is that watermarking may not thwart misuse and misrepresented content without broader trust mechanisms in place.
- NTIA's Open AI Advocacy: Policy Influence Peaks**: The NTIA report promotes the openness of AI models and recommends diligent risk monitoring to guide policymakers.
- Observers note the weight of NTIA's policy recommendations owing to its direct reporting line to the White House, flagging potential shifts in AI regulation.
- GitHub's Model Mashup: Integrating AI with Code**: GitHub's introduction of GitHub Models facilitates direct access to AI models within developer workflows.
- Debate ensued on whether this is a strategy to challenge competitors like Hugging Face or a natural evolution of GitHub's service offerings.
- Relaying the Double Descent: Scaling Laws Under Scrutiny: AI researchers discussed anomalies in validation log-likelihood in scaling law experiments, particularly when models with 1e6 sequences underperformed**.
- This prompted references to the BNSL paper, shedding light on similar patterns and sparking curiosity about dataset size impacts.
- Prompt Overproducing Mystery: lm-eval's Unexpected Multiples: lm-eval's behavior of using more prompts than benchmarks specify, as observed in benchmarks like gpqa_main**, incited technical inquiry and debugging efforts.
- Clarification emerged that the progress bar in lm-eval accounts for
num_choices * num_docs
, reconciling perceived discrepancies and aiding in understanding tool behavior.
- Clarification emerged that the progress bar in lm-eval accounts for
Interconnects (Nathan Lambert) Discord
- Grok's Growth: xAI Unlikely to Capture Character AI: Rumors of xAI acquiring Character AI to enhance its Grok models have been circulating, but Elon Musk denied these claims, calling the information inaccurate.
- The community pondered the truth behind Musk's statements, referencing prior instances where official denials preceded confirmed acquisitions.
- Black Forest Labs Emerges from Stable Diffusion's Roots: The founding team of Stable Diffusion sparked excitement with the launch of Black Forest Labs, specializing in advanced generative models.
- Black Forest Labs' Flux demonstrates creative prowess, and early testers can try it out on fal, signaling potential disruptions in the generative landscape.
- GitHub Models Meshes Devs with AI Prowess: GitHub makes a splash in AI by introducing GitHub Models, offering powerful AI tools to its massive developer base.
- This new suite aims to democratize AI usage for developers, potentially transforming how coding and AI interact on a grand scale.
- Apple Intelligence Puts a Twist in Tech's Future: Apple's latest AI advancements promise to weave apps together more seamlessly, enhancing daily tech interactions.
- Skeptics in AI labs question the groundbreaking status of Apple Intelligence, while others see it as a significant multiplier for tech utility.
- Rejection Sampling Finds Home in Open Instruct: Open Instruct embraces rejection sampling, a method set to fine-tune training by avoiding common pitfalls.
- The move could signal improved efficiencies in model training and a step forward for methodologies within the AI training spectrum.
Latent Space Discord
- Llama 3.1 Touches Nerve in Quality Debate: Together AI blog spurred debate on Llama 3.1 by spotlighting variances in performance due to different implementation practices by inference providers, raising concern for model consistency.
- Dmytro Dzhulgakov drew the community’s attention to potential result cherry-picking and emphasized the cruciality of clear methodologies in model evaluation, igniting extensive discussion on this thread.
- Sybill Secures Millions for AI-Enhanced Selling: Sybill has secured a potent $11M Series A to refine their personal assistant AI for sales reps, with prominent backers like Greystone Ventures (announcement details).
- The AI sales tool spectrum is seeing a spark of innovation with Sybill's solution, cloning sales reps' voices to engineer more relevant follow-ups.
- Black Forest Labs Breaks Ground with FLUX.1: Black Forest Labs, featuring ex-Stable Diffusion wizards, debut their groundbreaking text-to-image model FLUX.1, inclusive of a robust 12B parameter version (see announcement).
- The pro iteration of FLUX.1 is currently live on Replicate for trials, displaying an edge over others in the space.
- LangGraph Studio Unveils New Horizons for Agentic Apps: LangChain propels IDE innovation with the launch of LangGraph Studio, built to streamline the creation and debugging of agentic applications (announcement tweet).
- The agent-focused IDE marries LangSmith, boosting efficiency and teamwork for developers in the realm of large language models.
- Meta MoMa Transforms Mixed-Modal Modeling: Meta's novel MoMa architecture accelerates the pre-training phase for mixed-modal language models, employing a mixture-of-experts approach (accompanying paper).
- The architecture is tailored to juggle and make sense of mixed-modal sequences effectively, marking a step forward in the domain.
LlamaIndex Discord
- Async Advances Accelerate BedrockConverse: New asynchronous methods for BedrockConverse have been integrated, resolving outstanding issues as seen in pull request #14326, notably #10714 and #14004.
- The community expressed appreciation, highlighting the contribution's significant impact on enhancing user experience with BedrockConverse.
- Insights from the LongRAG Paper: The LongRAG paper, authored by Ernestzyj, introduced techniques for indexing larger document chunks to harness the potential of long-context LLMs.
- Opening new possibilities, this method simplifies the retrieval-augmented generation process, garnering interest from the community.
- Workflows Work Wonders in LlamaIndex: Newly introduced workflows in llama_index empower the creation of event-driven multi-agent applications.
- The community applauded this innovation for its readable, Pythonic approach to complex orchestration.
- Stabilizing the Codebase Conundrum: Conversation revolved around determining the stable version of LlamaIndex, clarified by directing users to installations via pip as the safeguard for stability.
- The term 'stable' emerged as a focal point, associating stability with the most recent releases available on PyPI, sparking further debate.
- Prompt Playing with DSPy and LlamaIndex: Members evaluated DSPy's prompt optimization against LlamaIndex's rewriting features.
- Enthusiasm was noted for the comparative exploration between these two tools, considering their application in improving prompt performance.
Cohere Discord
- Embed with Zest: Content Structures Clarified: In a technical discussion, Nils Reimers clarified that embedding models automatically remove new lines and special symbols, reinforcing that preprocessing text is not essential.
- This revelation indicates the models’ robustness in handling noisy data, allowing AI engineers to focus on model application rather than extensive text preprocessing.
- Citations Boost Speed; Decay Dilemmas: A perceptive user linked slower responses with high citation_quality settings in Ukrainian/Russian language on Cohere Cloud, noting that shifting from fast to accurate resolved character issues.
- While the stable output was attained, the trade-off in response speed has become a topic for potential optimization conversation among engineers.
- Arabic Dialects in LLMs: A Linguistic Leap: Surprise was expressed when LLM Aya generated accurate text in various Arabic dialects, prompting questions about dialect training in an English-based prompt environment.
- The community's experience with LLMs in dialect handling reinforces the notion of advanced contextual understanding, stoking curiosity about the training mechanisms.
- Devcontainer Dilemma: Pydantic Ponders: AI engineers faced a bottleneck when pydantic validation errors aborted setup of a Cohere toolkit repository, highlighting issues in the
Settings
class with missing fields like auth.enabled_auth.- A swift response from the team promised an imminent fix, demonstrating agility and commitment to toolkit maintenance and usability.
- "Code and Convene": AI Hackathon Series: Enthusiasm bubbled as community members discussed participation in the AI Hackathon Series Tour at Google, spanning 3 days of AI innovation and competition.
- The tour aims to highlight AI advancements and entrepreneurial ventures, culminating in PAI Palooza, a showcase of emerging AI startups and projects.
LangChain AI Discord
- Pydantic Puzzles in LangChain Programming: Confusion arose with a ValidationError due to a version mismatch of Pydantic, causing type inconsistencies when working with LangChain.
- The conflict was highlighted by input mismatches and validations that led to execution failures, spotlighting the necessity for api_version harmony.
- API Access Angst for LangSmith Users: A user experienced a
403 Forbidden
error when attempting to deploy an LLM using LangSmith, suggesting potential API key misconfiguration.- Community discussion circled around the proper setup for the key and seeking assistance through various LangChain channels.
- Streaming Solutions for FastAPI Fabulousness: Proposing a pattern for asynchronous streaming with FastAPI in LangChain applications, a user advocated using Redis for smooth message brokering.
- This would maintain current synchronous operations while empowering LangChain agents to share outcomes in real-time.
- Jump-Start Resources for LangChain Learners**: The discourse delved into available resources for mastering LangChain, highlighting alternatives and repositories for effective learning.
- Members exchanged GitHub examples and various API docs to advantageously navigate common deployment and integration puzzles.
- LangGraph's Blueprints Unveiled: An innovative LangGraph design pattern was shared, aimed at user-friendly integration into apps like web-chats and messenger bots, with a GitHub example showcasing the integration process.
- Additionally, an invitation was extended for beta testing Rubik's AI new features, inclusive of top-tier models like GPT-4o and Claude 3 Opus, through a special promotional offer.
OpenRouter (Alex Atallah) Discord
- Digital Detox Diet: Moye's Method: Moye Launcher's minimalistic design promotes digital wellbeing by intentionally making apps less accessible, championing behavioral shifts towards less screen time.
- The developer targets three contributors to excess usage, such as auto-clicks and a lack of accountability, aiming to forge habits for focused app engagement through design and user feedback.
- BEAMing Personalities: Big-agi's Big Play: Big-agi's 'persona creator' lets users spin up character profiles from YouTube inputs and the BEAM feature merges outputs of multiple models, increasing response diversity.
- Still, Big-agi feels the pinch of absent server save and sync functions, hindering an otherwise smooth model interaction experience.
- Msty Merges Memory and Web Mastery: Msty's integration with Obsidian and website connectivity garners user praise for its ease of use but faces criticism for its forgetful parameter persistence.
- Some users look to swap to Msty despite its need for a polish, thanks to its sleek interfacing capabilities.
- Llama 405B Walks FP16 Tightrope: OpenRouter lacks a FP16 avenue for Llama 405B, while Meta-recommended FP8 quantization proves more efficient.
- Although SambaNova Systems offers similar services, they're hemmed in by a max 4k context limit and cost-intensive bf16 hosting.
- OpenRouter's Beta Guarantees Gateway to APIs: OpenRouter teases an API integration beta, welcoming support emails for rate limit fine-tuning and threading OpenAI and Claude APIs into user endeavours.
- While its website sometimes stumbles with regional troubles, the OpenRouter status page acts as a beacon, guiding users through operational tempests.
OpenInterpreter Discord
- Open Interpreter Stuck in the Slow Lane: Concern is mounting over Ben Steinher's delayed response from Open Interpreter, who missed his mid-July response deadline.
- Despite the delay, the community lauded a new PR for Groq profile contribution as an impactful way to support Open Interpreter, highlighting a GitHub PR by MikeBirdTech.
- Techies Tune in for Accessibility Talk: An Accessibility Roundtable is set for August 22nd to stir discussion and engagement, with an open invite for the community to share insights.
- Anticipation is high for the upcoming House Party event, after sorting initial time-zone tangles, with participants directed to the event link.
- Model Selection Muddles Minds: Discussion arose about the necessity of an OpenAI API key and the right model string when using '01 --local', evidencing a need for clearer guidelines.
- Inquisitive threads continue, probing whether OpenInterpreter can save and schedule workflows, with answers still pending in the community.
- iKKO Earbuds Amplifying AI Possibilities: Buzz is building about integrating OpenInterpreter on iKKO ActiveBuds, merging high-resolution audio with AI, as detailed on iKKO's website.
- Shipment updates for 01 spark urgency within the community, with an unanswered call for updated information as August ticks by.
- Earbuds with a Vision: Camera Talk: A novel idea emerged for earbuds equipped with cameras, bolstering interaction by capturing visual context during conversations with LLMs.
- Community members pondered the integration, contemplating a tap feature to activate the camera for an enhanced HCI experience.
Modular (Mojo 🔥) Discord
- Mojo Misses the Thread: In a conversation about Mojo's capabilities, a member clarified that Mojo does not currently expose thread support directly to users.
- It was mentioned that utilizing fork() is a workaround for achieving threading within the compiled environments.
- MAX & Mojo's Packing Proclamation: Upcoming changes to MAX and Mojo packaging have been revealed, starting with version 0.9 of the
modular
CLI, dropping the need for authentication to download MAX and Mojo.- Mojo will be merged with MAX nightly builds, with the announcement suggesting a shift to the new
magic
CLI for seamless Conda integration.
- Mojo will be merged with MAX nightly builds, with the announcement suggesting a shift to the new
- Charting a Tier of Confusion: Members expressed bewilderment over a tier chart, debating its accurate representation and criticizing it for not reflecting the intended 'level of abstraction'.
- Some advocated for simplifying the visual with a fire emoji, indicating the expectation of a clear and effective communication tool.
- Unicode Unleashed in CrazyString: The CrazyString gist was updated, introducing Unicode-based indexing and boasting full UTF-8 compatibility.
- The conversation touched upon Mojo string's small string optimization and the increased usability due to the updates.
- Max Installation Maze on M1 Max: Challenges arose for a member attempting to install max on their Mac M1 Max device, with the community stepping in to provide potential fixes.
- A shared resource suggested a specific Python installation workaround could help to navigate the installation issue.
OpenAccess AI Collective (axolotl) Discord
- Axolotl's Ascent with Auto-Stopping Algorithms: Axolotl introduced an early stopping feature in response to queries about halting training when loss plateaus or validation loss surges.
- Community members engaged in a brief exchange regarding the abilities to manually terminate runs while saving the current LoRA adapter state.
- Masked Learning Leap for SharedGPT: A member put forward an "output mask" field for each turn of SharedGPT, aimed at targeted training through selective output masking.
- This innovation sparked discussion about its potential to refine learning through processed output errors.
- Chat Templates Call for Clarity: Issues with deciphering new chat templates prompted members to call for better documentation to aid in understanding and customization.
- A member volunteered to share personal notes on the topic, suggesting a community-driven update to the official documents.
- Pacing Pad Token Problems: Training troubles talked about the frequent occurrence of
<pad>
token repetition, hinting at inefficiencies in sampling methods.- The conversation contributed a tip: ensure pad tokens are cloaked from labels to prevent recurring redundancies.
- Gemma2's Eager Edge Over Flash: An endorsed tip for Gemma2 model training surfaced, suggesting 'eager' over 'flash_attention_2' to solidify stability and performance.
- Practical guidance was given, with code provided to demonstrate setting
eager
attention inAutoModelForCausalLM
.
- Practical guidance was given, with code provided to demonstrate setting
DSPy Discord
- Discussions Ignite around DSPy and Symbolic Learning: Members buzz with anticipation over integrating DSPy with symbolic learners, speculating on the groundbreaking potential.
- Optimism sparks as participants expect substantial advancements from such a combination in AI capabilities.
- Self-Adapting Agents Step into the Spotlight: The Microsoft Research blog brought self-adapting AI agents to the fore, showcasing an article with promising workplace applications.
- Insights link the games industry as a catalyst to AI advancement, now materializing in tools like ChatGPT and Microsoft Copilots.
- Enter Agent Zero: A Foray into User-Tested AI: Agent Zero makes its debut as the first user-tested production version, showing off its AI prowess.
- Feedback insinuates a shift towards AI occupying more diverse roles in professional settings.
- LLMs Self-Improve with Meta-Rewarding: A new Meta-Rewarding technique enhances LLMs' self-judgment, revealed in an arXiv paper, improving their performance.
- Significant win rate increases are reported on AlpacaEval 2, indicating that models like Llama-3-8B-Instruct also benefit.
- MindSearch Paper Explores LLM-Based Multi-Agent Frameworks: A paper published on arXiv presents MindSearch, emulating human cognitive processes in web searches using LLM-driven agents.
- The study tackles information seeking challenges and aims to refine modern search-assisted models.
tinygrad (George Hotz) Discord
- NVIDIA Grabs Taxpayer Dough: A message showed enthusiasm for NVIDIA receiving public funds, detailing the value for the taxpayer's investment.
- This topic stirred conversation on investment priorities and implications for tech development.
- George Hits Hotz Button on Discord Decorum: George Hotz issued a reminder about the server's rules, funneling focus towards tinygrad development.
- Hotz's nudge was a call to maintain a professional and on-topic dialogue within the community.
- Argmax Chokes GPT-2 Speed: A deep dive into GPT-2 performance found that embedding combined with
argmax
significantly throttles execution speed, as observed in Issue #1612.- The inefficiency traced back to an O(n^2) complexity issue, sparking discussions on more efficient algorithmic solutions.
- Embedding Bounty: Qazalin's Got a Quest: Talks of a bounty for enhancing embeddings in tinygrad surfaced, exclusively directed towards a user named Qazalin.
- The bounty generated buzz and motivated other contributors to seek different optimization opportunities within tinygrad.
- Cumsum Conundrum: Challenges with the
cumsum
function's O(n) complexity were tackled in Issue #2433, inciting innovative thought among developers.- George Hotz rallied the troops, advocating for practical experiments to discover possible optimization strategies.
LAION Discord
- Polyglot ChatGPT's Vocal Feats: A member showcased ChatGPT Advanced Voice Mode adeptly reciting poetry in Urdu and storytelling in several languages including Hebrew, Norwegian, and Georgian.
- This display included narratives in lesser-known dialects like Moroccan Darija, Amharic, Hungarian, Klingon, wowing the engineering community.
- Spectacular Reveal of Black Forest Labs: Enthusiasm erupted over the launch of Black Forest Labs, with a mission focused on innovative generative models for media.
- The initiative took off with FLUX.1, a model that promises to enhance creativity, efficiency, and diversity in generating visuals.
- FLUX.1 Model Debuts Impressively: The community turned their attention to FLUX.1, a new model whose debut on Hugging Face was met with acclaim.
- Discussions emerged on how this model could potentially shift the landscape of generative learning, with features termed as refreshing and super good.
- Innovative Activation Function Twists: AI enthusiasts delved into experiments with varied normalization and activation functions on complex-valued activations, tagging the exercises as 'kinda fun!'.
- This practical exploration led to sharing of insights and potential applications in complex domains.
- The Overhyped Regularization Riddle: A user pointed out, using a Medium article, that extensive methods like data augmentation and dropout fail to curb overfitting significantly.
- Probing the effectiveness of various regularization techniques, the community pondered on methods beyond traditional tricks to advance machine learning models.
Torchtune Discord
- Topping the Charts with Top_p: A member discovered that setting top_p=50 met their performance standards with substantial results.
- They compared the 0.8 online model against their own, noting the online variant's superior outcome.
- Debugging Delight with Generate Recipe: Clarification was brought that generate recipe is geared for debugging purposes, targeting an accurate portrayal of the model.
- Any discrepancies with benchmarks should prompt the submission of an issue, with evaluations affirming the recipe's efficacy.
- FSDP2's New Feature Fusion: A member shared that FSDP2 now handles both quantization for NF4 tensor and QAT, boosting its versatility.
- While QAT recipes seem compatible, compiling with FSDP2 may present challenges, marking an area for potential refinement.
- Merging PRs with Precision: The upcoming merger of a PR has been flagged as dependent on a prior one, with PR #1234 under review, thereby paving the way for sequential improvements.
- This anticipates enhanced fine-tuning datasets, with a focus on grammar and samsum, advancing Torchtune's methodical evolution.
MLOps @Chipro Discord
- Data Phoenix Ascends with AI Webinar: The Data Phoenix team announced a webinar titled 'Enhancing Recommendation Systems with LLMs and Generative AI,' featuring Andrei Lopatenko set for August 8 at 10 a.m. PDT.
- This webinar aims to unveil how LLMs and Generative AI are transforming personalization engines, with a webinar registration made available.
- dlt Elevates ELT Know-how with Workshop: A 4-hour workshop on ELT with dlt is slated to school data enthusiasts on constructing robust ELT pipelines, resulting in a 'dltHub ELT Engineer' certification.
- Scheduled online for 15.08.2024 at 16:00 GMT+2, the session starts with dlt basics and registrations can be made here.
- Conferences Showcase NLP & GenAI Dominance: Two ML conferences placed a heavy accent on NLP and genAI, overshadowing presentations on models like Gaussian Processes and Isolation Forest.
- The trend underscores a strong community tilt towards NLP and genAI technologies, leaving some niche model discussions in the shadows.
- ROI from genAI Under Community Microscope: A lively debate questioned whether the ROI for genAI will live up to the lofty expectations set by some in the field.
- The conversation pointed out the gap between expectations and realities, stressing the need for grounded anticipation of returns.
LLM Finetuning (Hamel + Dan) Discord
- LangSmith Credits Conundrum: Digitalbeacon reported an issue accessing LangSmith credits after adding a payment method, using a different email address from his organization ID 93216a1e-a4cb-4b39-8790-3ed9f7b7fa95.
- Danbecker recommended contacting support for credit-related troubles, implying a need for direct resolution with customer service.
- Payment Method Mayhem for LangSmith: Digitalbeacon inquired about a zero credit balance in LangSmith post payment method update, even after timely form submission.
- The situation suggests a system glitch or user misstep, necessitating further investigation or support intervention.
The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
HuggingFace ▷ #announcements (1 messages):
Neural network simulation
Video clustering
Synthetic dataset
Knowledge distillation
Gradio demo
- Simulate Neural Networks Online: A member shared a Neural network simulation that's now available online.
- Explore different neural network configurations and their behaviors in an interactive website.
- Master Video Clustering Techniques: A new YouTube video explains how to use image descriptors like Local Binary Pattern (LBP) and Histogram of Oriented Gradients (HOG) for video clustering.
- Learn clustering for better video data organization and processing.
- Explore Massive Synthetic Dataset: A huge synthetic dataset was released by a community member.
- Perfect for experimenting with tabular data models.
- Trendy Knowledge Distillation Techniques: An insightful article discusses the latest knowledge distillation trends and their implications.
- Stay updated on efficient model training methods.
- Finance and Medical Models Launch: New models for finance and medical purposes, Palmyra-Med-70b and Palmyra-Fin-70b, have been introduced.
- Palmyra-Med-70b excels in medical tasks with an MMLU performance of ~86%, while Palmyra-Fin-70b is the first model to pass the CFA Level III exam with 73%.
- 🎥 Master Video Clustering with Image Descriptors: LBP & HOG Explained! 🌟: 🔍 Discover the power of video clustering in this detailed guide! Learn how to use image descriptors like Local Binary Pattern (LBP) and Histogram of Oriente...
- Unity ML-Agents | Live Agent training from Scratch: a quick little experiment withing ml agents and cuda
- Tweet from Sam Julien (@samjulien): 🔥 @Get_Writer just dropped Palmyra-Med-70b and Palmyra-Fin-70b! Palmyra-Med-70b 🔢 Available in 8k and 32k versions 🚀 MMLU perf ~86%, outperforming top models 👨⚕️ For diagnosing, planning treatme...
HuggingFace ▷ #general (852 messages🔥🔥🔥):
GPTs agents
Keras introduction
OpenAI sidebars changes
Autoencoders for Minecraft
Fine-tuning models with quantization
- GPTs Agents misunderstood: Members discussed that GPTs agents do not learn from additional information after their initial training.
- Clarification was provided that uploaded files are saved as 'knowledge' files for reference but do not modify the base knowledge.
- Introducing Keras for Deep Learning: Members provided an explanation of Keras as a multi-backend deep learning framework with support for JAX, TensorFlow, and PyTorch.
- Keras is praised for accelerating model development and offering state-of-the-art performance with easy-to-debug runtimes.
- OpenAI platform sidebar changes: Members discussed the disappearance of two icons from the sidebars of platform.openai.com.
- It was noted that icons for threads and messages disappeared from the sidebar, prompting further discussion.
- Autoencoders for Minecraft video generation: Members worked on training autoencoders to compress Minecraft images and videos with aims of generating Minecraft video sequences.
- Challenges in Fine-tuning Models with Quantization: Members addressed issues related to fine-tuning the Llama 3-8b model using quantization to manage GPU memory efficiently.
- no title found: no description found
- Announcing Flux by Black Forest Labs: The Next Leap in Text-to-Image Models: Flux, the largest SOTA open source text-to-image model to date, developed by Black Forest Labs—the original team behind Stable Diffusion is now available on fal. Flux pushes the boundaries of creativi...
- Tweet from nisten (@nisten): hacked bitnet for finetuning, ended up with a 74mb file. It talks fine at 198 tokens per second on just 1 cpu core. Basically witchcraft. opensourcing later via @skunkworks_ai base here: https://huggi...
- imgur.com: Discover the magic of the internet at Imgur, a community powered entertainment destination. Lift your spirits with funny jokes, trending memes, entertaining gifs, inspiring stories, viral videos, and ...
- Google Colab: no description found
- Maintainer «nroggendorff»: A Curated List of the Large and Small Language Models (Open-Source LLMs and SLMs). Maintainer «nroggendorff» with Dynamic Sorting and Filtering.
- glides (Glide): no description found
- Handling big models for inference: no description found
- PhotoMaker - a Hugging Face Space by TencentARC: no description found
- Tuh Buh GIF - Tuh Buh Guh - Discover & Share GIFs: Click to view the GIF
- keras: Multi-backend Keras.
- Enhancing Recommendation Systems with LLMs and Generative AI · Luma: The Data Phoenix team invites you to our upcoming webinar, which will take place on August 8 at 10 a.m. PDT. Topic: Enhancing Recommendation Systems with LLMs…
- no title found: no description found
- BioMistral/BioMistral-7B · Hugging Face: no description found
- Fine-tuning Mistral on Your Dataset: no description found
- LLM Compiler - a facebook Collection: no description found
- The Unreasonable Effectiveness of JPEG: A Signal Processing Approach: Visit https://brilliant.org/Reducible/ to get started learning STEM for free, and the first 200 people will get 20% off their annual premium subscription.Ch...
- Computer Generates Human Faces: 5:51 To skip to the results.Try It Online: http://codeparade.net/faces/Download App (Windows 64-bit): https://github.com/HackerPoet/FaceEditor/raw/master/Fac...
- lm-format-enforcer/README.md at main · noamgat/lm-format-enforcer: Enforce the output format (JSON Schema, Regex etc) of a language model - noamgat/lm-format-enforcer
- Creating my own customized celebrities with AI: Check out Brilliant.org for fun STEMmy courses online! First 200 people to sign up here get 20% off their annual premium subscription cost: https://brilliant...
- Why images are compressible: The Vastness of Image Space: We explore why images are compressible, which is related to the (larger than) astronomical space of all possible images. This is one of my favorites. Follow...
- palm of my hands: Song · John Summit, venbee · 2024
- GitHub - jxnl/instructor: structured outputs for llms: structured outputs for llms . Contribute to jxnl/instructor development by creating an account on GitHub.
- Chicken - Esolang: no description found
- Flux pipeline by sayakpaul · Pull Request #9043 · huggingface/diffusers: We are working on uploading the diffusers weights to the respective FLUX repositories. Will be done very soon.
HuggingFace ▷ #cool-finds (4 messages):
finegrain Object Eraser model
Evolution of AI bots
Knowledge distillation
- Finegrain unveils Object Eraser model: A member shared news of a new Object Eraser model available on a Hugging Face space, demonstrating the model's capabilities.
- This model was developed by @finegrain_ai and is aimed at showcasing new applications publicly for everyone to try.
- Evolution of AI bots article on Medium: A member posted an article on Medium about the Evolution of AI bots, detailing various AI tools like LLMs and RAG pipelines. Read the full article.
- The article is designed for newcomers and delves into high-level patterns, pipelines, and architectural designs used in 2024.
- Understanding Knowledge Distillation: A member found knowledge distillation to be an interesting topic, sharing a detailed page from IBM on Knowledge Distillation.
- The article explains that knowledge distillation transfers learnings from a large pre-trained 'teacher model' to a smaller 'student model' for compression and knowledge transfer purposes.
- Tweet from Pierre Chapuis (@pchapuis): Made a @huggingface space to demonstrate one of the models we trained at @finegrain_ai: the Object Eraser. Pretty happy everyone can try it publicly at last. :) https://huggingface.co/spaces/finegrai...
- What is Knowledge distillation? | IBM : Knowledge distillation is a machine learning technique used to transfer the learning of a large pre-trained “teacher model” to a smaller “student model.”
- Evolution of the AI Bots: Harnessing the Power of Agents, RAG, and LLM Models: Structuring knowledge about tools for AI bot development, also high-level overview of approaches, architectures and designs.
HuggingFace ▷ #i-made-this (16 messages🔥):
model release heatmap
grounding-sam2-demo
TinyML bird detection project
Infinite Sands project
2D parallelism in deep learning
- Model release heatmap space gains attention: A member created a space for a heatmap of model releases among top AI labs.
- Others expressed interest in integrating such a heatmap into future Hugging Face profile pages for better visibility.
- Grounding-Sam2 demo showcases paired models: A member shared a GitHub project demonstrating a Gradio interface for grounding dino and segment anything v2 models.
- The demo highlights upgraded usage of these models in a simple and interactive format.
- TinyML detects birds with Seeed and Blues: A project on Hackster reports bird species using TinyML hardware and a Blues Notecard.
- The setup involves Seeed's Grove Vision AI Module V2 and compresses EfficientNetLite for efficient bird detection.
- Infinite Sands brings sandbox to life with AI: Infinite Sands uses generative AI to create stories from sandbox shapes.
- The project applies ControlNet depth and Whisper for command handling, making it a playful and interactive exploration.
- AI + i podcast launches focused on AI models: A new podcast series, Ai + i, has been launched to discuss leading foundation and open-source models.
- The host seeks topic suggestions from the community for future podcast episodes.
- Model Release Heatmap - a Hugging Face Space by cfahlgren1: no description found
- Malaysia-AI blog 2D Parallelism using Ray PyTorch: Malaysia-AI blog 2D Parallelism using Ray PyTorch
- cfahlgren1/model-release-heatmap · Discussions: no description found
- tasksource/deberta-base-long-nli · Hugging Face: no description found
- grounding-sam2-demo/interface.py at main · CoffeeVampir3/grounding-sam2-demo: A simple demo for utilizing grounding dino and segment anything v2 models together - CoffeeVampir3/grounding-sam2-demo
- Bird Detection with TinyML and a Blues Notecard: I built a project to identify birds at a bird feeder using Machine Learning (TinyML) and transmit data to the cloud with a Blues Notecard By Timothy Lovett and Kerin Lovett.
- Infinite Sands: ROCM powered sandbox to shape your reality with the help of standard diffusion and controlnet. By Timothy Lovett and Kerin Lovett.
HuggingFace ▷ #reading-group (8 messages🔥):
Deep Learning Study Group
LLM Model Suggestions
New Learners Collaboration
- Deep Learning Enthusiasts Unite: A new member expressed interest in forming a group of motivated individuals to learn deep learning and machine learning together.
- LLM Model for PDF Table and Checkbox Detection: A member requested suggestions for LLM models capable of performing table and checkbox detection and extraction from PDF inputs.
HuggingFace ▷ #core-announcements (1 messages):
sayakpaul: Will be merged in a few https://github.com/huggingface/diffusers/pull/9043
HuggingFace ▷ #NLP (2 messages):
Training LLM for Solr
AI System for Aphasic Patients
- Training LLM to interpret search queries for Solr: A member asked for advice on training a Large Language Model (LLM) to receive search queries and output JSON with product facets and categories for use in Apache Solr.
- They mentioned not having an instruction dataset and sought guidance on how to approach the task.
- Building AI for Communication with Aphasic Patients: A member intends to build an AI system combining microexpression recognition, speech recognition, and image recognition to help facilitate communication with aphasic patients.
- They requested help as they have no idea how to start the project and mentioned that anything would be extremely helpful.
HuggingFace ▷ #diffusion-discussions (2 messages):
Amazing Results
Trolling Allegations
- Welltoobado Praises Results: A member expressed satisfaction, noting 'Yeah pretty amazing results, good job!' in response to something.
- Pseudoterminalx Questions Trolling: Another member, uncertain about the sincerity, responded, 'hard to tell if you're trolling anymore lol'.
Nous Research AI ▷ #off-topic (1 messages):
pradeep1148: https://www.youtube.com/watch?v=DLb7Lrzw8wo
Nous Research AI ▷ #interesting-links (9 messages🔥):
New SOTA efficiency gains in Multi-modal architecture
- New SOTA efficiency gains in Multi-modal architecture: The authors who introduced Chameleon achieved significant efficiency gains in a new multi-modal architecture, incorporating a mixture of experts and modal-specific expert routing techniques.
- Efficiency gains were approximately 3x in text training and 5x in image training, with MoMa 1.4B significantly outperforming its dense counterpart and other MoE models according to Victoria Lin.
- Discussion on New SOTA efficiency gains in Multi-modal architecture: Members expressed excitement about the new architecture, noting its significant FLOPs savings and improved performance.
- The gains in image training were particularly noted, highlighting the new architecture's impressive 5.2x efficiency improvement.
Link mentioned: Tweet from Victoria X Lin (@VictoriaLinML): 4/n Under a 1T token training budget, MoMa 1.4B (4 text experts+4 image experts) achieves FLOPs savings of 3.7x (text: 2.6x, image: 5.2x) compared to its dense counterpart (measured in pre-training lo...
Nous Research AI ▷ #general (441 messages🔥🔥🔥):
Heptagon Riddle
GPT Benchmarks vs Human Heuristics
Speculative Decoding Mechanics
Dynamic Memory Systems
Bitnet for Finetuning
- Heptagon Riddle Solved: A riddle about a denizen of flatland involves determining a regular polygon type. After discussion, heptagon was the correct answer.
- One user noted some models occasionally get lucky answers, but overall, the LLMs struggle with symbolic logic riddles.
- Speculative Decoding Insights: Participants discussed speculative decoding techniques, explaining that using smaller draft models to speed up decoding isn't always lossless.
- While some initial claims stated that output distribution can diverge if not done correctly, others clarified that rejection sampling ensures lossless output by aligning draft and base models.
- Dynamic Memory System Applications: Dynamic persona memories were discussed as a current gap in the ragdata set, with participants suggesting collaboration opportunities.
- Participants compared techniques to parallelize token generation and noted issues with accurate context handling by LLMs in dynamic systems.
- Bitnet's Finetuning Brings Speed: A Reddit post about Bitnet's finetuning method received attention due to its impressive speed, running at 198 tokens per second on just one CPU core.
- Experimenters achieved a 74MB file size using Bitnet and claimed it operates efficiently, sparking interest in its potential for future projects.
- Tweet from nisten (@nisten): hacked bitnet for finetuning, ended up with a 74mb file. It talks fine at 198 tokens per second on just 1 cpu core. Basically witchcraft. opensourcing later via @skunkworks_ai base here: https://huggi...
- Tweet from ryunuck (p≈np) (@ryunuck): What Ilya saw CRISPR-Q runs on Sonnet 3.5 and enables the model to rewrite the context window through targeted operations of its own self-memeplex. The incomprehensibly alien generative heuristic tha...
- openai-community/gpt2-xl · Hugging Face: no description found
- Speculative Decoding Explained: One Click Templates Repo (free): https://github.com/TrelisResearch/one-click-llmsAdvanced Inference Repo (Paid Lifetime Membership): https://trelis.com/enter...
- Reddit - Dive into anything: no description found
- GitHub - holo-q/OpenQ: The open-source implementation of Q*, achieved in context as a zero-shot reprogramming of the attention mechanism. (synthetic data): The open-source implementation of Q*, achieved in context as a zero-shot reprogramming of the attention mechanism. (synthetic data) - holo-q/OpenQ
- GitHub - carsonpo/octoquadmul: Contribute to carsonpo/octoquadmul development by creating an account on GitHub.
- GitHub - carsonpo/octomul: Reasonably fast (compared to cublas) and relatively simple int8 tensor core gemm: Reasonably fast (compared to cublas) and relatively simple int8 tensor core gemm - carsonpo/octomul
Nous Research AI ▷ #ask-about-llms (2 messages):
LangChain usage
Mixtral API retrieval
OpenAI API format
- Using LangChain with Mixtral API in OpenAI format: A member discussed a code snippet using LangChain with environment variables like mixtral_api_base for retrieving the Mixtral LLM from the OpenAI API.
- There was a debate on whether this approach makes sense without LangChain, since LangChain uses the OpenAI API format.
- Debate on LangChain necessity: Another discussion ensued regarding whether the use of LangChain is necessary for interacting with the Mixtral LLM from OpenAI API.
- Members expressed differing views on the dependency on LangChain for such operations.
Nous Research AI ▷ #reasoning-tasks-master-list (3 messages):
Assisting with project setup
Cost considerations for project
- Project Setup Assistance: A member asked what they could do to help get the project going and if it costs much.
- Another member confirmed that it doesn't cost anything and instructed them to follow the steps mentioned in a pending PR.
- Cost-Free Project Initiative: A participant mentioned that the project does not incur any costs.
- The next steps involve following the instructions provided once a new PR is made.
Unsloth AI (Daniel Han) ▷ #general (205 messages🔥🔥):
Multi-GPU Support
Unsloth Finetuning
Qwen Model Merging
AI Performance
Bitnet Code Hacking
- *Multi-GPU Training* works but needs improvement: Users confirmed multi-GPU training works after fixes, but noted earlier installation problems required creating a new environment and troubleshooting various setups.
- An example stated: 'installing it into llamafacs env worked first try,' while another mentioned needing to manually upgrade transformers.
- Unsloth Crypto Runner Clarifications: Clarifications were provided on the Unsloth Crypto Runner, stating it involves AES/PKI-based cryptography between client and license server.
- 'MrDragonFox' emphasized, 'what you need to care about is the right side as you see my both GPU's utilized.'
- Finetuning Qwen with Continuous Fine-tuning: Using Continuous Fine-tuning Without Loss on Qwen2-1.5B-Instruct was successful, incorporating both code FIM and instruct capabilities.
- Members were excited about the method, with one suggesting 'writing up a tutorial' for those facing confusion over the documentation.
- Issues with Merging Adapters: Users discussed merging LoRA adapters and 4-bit models, noting that improperly merging could lead to models only appearing as 16-bit but actually being 4-bit quality.
- A concern was raised about 4-bit models being upscaled to 16-bit, potentially leading fake 16-bit models to propagate in the community.
- Hack on Bitnet for Finetuning: User Nisten mentioned hacking Bitnet for finetuning, resulting in a 74MB model that runs at 198 tokens per second on 1 CPU core.
- This hack was described as 'basically witchcraft' and will be open-sourced via Skunkworks AI.
- Tweet from nisten (@nisten): hacked bitnet for finetuning, ended up with a 74mb file. It talks fine at 198 tokens per second on just 1 cpu core. Basically witchcraft. opensourcing later via @skunkworks_ai base here: https://huggi...
- johnpaulbin/qwen1.5b-e2-1-lora · Hugging Face: no description found
- Google Colab: no description found
- rombodawg/gemma-2-9b-reuploaded · Hugging Face: no description found
- Tweet from sankalp (@dejavucoder): wake up babe, daniel han video finally dropped
- Dancing Dj Ravine GIF - Dancing Dj Ravine Groovy - Discover & Share GIFs: Click to view the GIF
- Finetune Llama 3 with Unsloth: Fine-tune Meta's new model Llama 3 easily with 6x longer context lengths via Unsloth!
- AI Unplugged 16: Llama 3, AIMO winners, Segment Anything Model 2, LazyLLM: Insights over Information
- Low Level Technicals of LLMs: Daniel Han: This workshop will be split into 3x one hour blocks:How to analyze & fix LLMs - how to find and fix bugs in Gemma, Phi-3, Llama & tokenizersFinetuning with U...
- Llama 3.1 Fine Tune - Mervin Praison: https://huggingface.co/mervinpraison/Llama-3.1-8B-bnb-4bit-python Train Model with Custom Data Convert to GGUF Ollama Modelfile Ollama Create Custom Model
Unsloth AI (Daniel Han) ▷ #off-topic (4 messages):
Google new model
OpenAI vs Google
- Google's new model beats OpenAI: Finally Google beat OpenAI with a new model.
- A user shared a link to Reddit highlighting the new model from Google that claims to surpass OpenAI.
- Users react skeptically: I can't believe it... was the initial reaction to the purported news from Google.
- Another user responded with skepticism saying, ummm, casting doubt on the credibility of the information.
Link mentioned: Reddit - Dive into anything: no description found
Unsloth AI (Daniel Han) ▷ #help (130 messages🔥🔥):
Python versions for Unsloth installation
Installing Unsloth with Conda
LoRA fine-tuning issues
Inference problems with GGUF quantization
Custom dataset training errors on Llama 3.1
- *Python versions spark debate*: Members were confused about Unsloth's compatibility with Python versions 3.10 and 3.11, as different results appeared when following the installation guide.
- Felicitiy00637 shared issues with installation on Compute Canada's Narval cluster, noting success only after bypassing xforms in 'pyproject.toml'.
- *Conda environment clarifies setup*: Fjefo stressed the importance of following the guide precisely for Conda environments, noting that deviations could complicate debugging.
- Despite felicity00637's assurance of following the guide, confusion persisted until confirmation that Conda wasn't used.
- *LoRA parameters under discussion*: Felicitiy00637 sought clarification on LoRA parameters like 'r' and 'lora_alpha', asking for their definitions and recommended values.
- The community explained that LoRA scaling parameters should ideally be set to twice the rank (r), linking to the LoRA parameter encyclopedia for deeper insights.
- *GGUF quantization wreaks havoc*: Akshatiscool reported models outputting gibberish post-GGUF quantization, despite correct outputs during Collab inference.
- Theyruinedelise suggested checking chat templates, acknowledging recent issues fixed in GGUF quantization.
- *Llama 3.1 training stumbles*: Bigboypikachu encountered 'Expected all tensors to be on the same device' errors when training custom long-context datasets on Llama 3.1-8b-instruct.
- The same kernel successfully trained on a predefined dataset, but failed with custom datasets, hinting at context length issues.
- Google Colab: no description found
- Continued Pretraining | Unsloth Documentation: AKA as Continued Finetuning. Unsloth allows you to continually pretrain so a model can learn a new language.
- Narval - Alliance Doc: no description found
- Saving Models | Unsloth Documentation: Learn how to save your finetuned model so you can run it in your favorite inference engine.
- Fixing bugs in Gemma, Llama, & Phi 3: Daniel Han: The story behind our 8 bug fixes for Gemma, multiple tokenization fixes for Llama 3, a sliding window bug fix and Mistral-fying Phi-3, and learn about how we...
- FastLanguageModel has a problem with PromptTemplate and other complicate things · Issue #839 · unslothai/unsloth: I am trying to specify the prompt to apply RAG in Unsloth environment but Unfortunately, current Unsloth environment has some complicated problems. First, I will provide slow but well-worked code. ...
- Unsloth Documentation: no description found
- `KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'` · Issue #27985 · huggingface/transformers: System Info transformers version: 4.36.0 Platform: Linux-5.15.0-70-generic-x86_64-with-glibc2.35 Python version: 3.11.4 Huggingface_hub version: 0.19.4 Safetensors version: 0.3.3 Accelerate version...
Unsloth AI (Daniel Han) ▷ #research (5 messages):
AI interoperability with Groq
Black Forest Labs launch
FLUX.1 text-to-image model
OpenAI models
Generative AI
- Groq AI limited to inference post-finetuning: Members discussed whether AI models can work on both Google AI and Groq AI.
- It was clarified that with Groq, models can most likely only do inference after being finetuned using another service.
- Black Forest Labs steps into the scene: Announcing Black Forest Labs, a new venture focused on advancing generative deep learning models for media.
- Their initial release, the FLUX.1 suite of models, aims to push the frontiers of text-to-image synthesis. Open weights make it accessible for further development.
Link mentioned: Announcing Black Forest Labs: Today, we are excited to announce the launch of Black Forest Labs. Deeply rooted in the generative AI research community, our mission is to develop and advance state-of-the-art generati...
Perplexity AI ▷ #announcements (1 messages):
Perplexity Pro free for Uber One members
- Uber One offers Perplexity Pro for free: Uber One members across the US and Canada can now enjoy a free year of Perplexity Pro. This offer, available until October 31, allows members to unlock the full potential of Perplexity’s answer engine, normally valued at $200.
- Enhance info discovery with Perplexity Pro: From quick facts during Uber rides to detailed research at home, Perplexity Pro enhances every information discovery moment for Uber One members.
- Learn more about this perk and the terms at Perplexity Uber One.
Link mentioned: Eligible Uber One members can now unlock a complimentary full year of Perplexity Pro : Uber One members can now save even more time with perks like Pro Search
Perplexity AI ▷ #general (293 messages🔥🔥):
Uber One Perplexity Pro deal
Rating AI search engines
Perplexity functionality comparisons
Technical issues and bugs
Legal use cases for AI
- Uber One members get Perplexity Pro for free: Perplexity announced that eligible Uber One members in the US and Canada can redeem a complimentary year of Perplexity Pro from now through October 31, 2024. Members discussed details and eligibility, noting the promotion requires signing up with a new Perplexity Pro account and maintaining an active Uber One membership throughout.
- Comparing different AI search engines: Users shared their experiences comparing various AI search engines like Perplexity, Felo.ai, and Chatlabs, focusing on aspects like UI, UX, speed, and response quality. Perplexity Pro was generally rated highest, followed by SearchGPT, Uncovr free, and others.
- Perplexity app functionality issues and gaps: Members highlighted several issues with Perplexity's app, especially on mobile, such as the inability to delete uploaded files and generate images, poor Android performance, and significant missing features compared to OpenAI and Microsoft Copilot. One user expressed their frustration with mobile bugs and inconsistencies that lead to lost text.
- Troubleshooting exporting and uploading issues: Users encountered issues with exporting text and sources from pages, with one noting: 'Truly IMPOSSIBLE. Impossible. Never going to happen.' Another member reported token count errors when trying to upload large PDFs in AIStudio.
- Using AI for legal document search and analysis: A member shared their positive experience using Perplexity for searching and analyzing legal documents, finding it particularly useful for locating relevant cases. They inquired about applying Retrieval-Augmented Generation (RAG) to search through a large collection of discovery documents.
- ChatLabs: ChatLabs is a platform for LLM and AI tinkerers. Experience more than 30 AI models in one place.
- Eligible Uber One members can now unlock a complimentary full year of Perplexity Pro : Uber One members can now save even more time with perks like Pro Search
- monnef / AIlin · GitLab: AIlin is a tool that connects AI services, such as Perplexity.ai, with your local computer.
- Complexity: Perplexity's New Extension: The Complexity extension for Perplexity AI introduces a range of powerful features designed to enhance the user experience and streamline interactions with...
Perplexity AI ▷ #sharing (10 messages🔥):
Perplexity AI skills and features
Flask secure user authentication
Checking Pro account status
Impacts of drinking coffee on dental health
Next iPhone release details
- Perplexity AI combines search and text generation: Perplexity AI is a powerful tool that integrates search capabilities with large-scale language models to provide precise and comprehensive answers.
- Its notable features include effective market research and competitive analysis, helping users to synthesize data from multiple reports and understand competitive landscapes.
- Flask secure user authentication setup: To implement secure user authentication in Flask, install necessary packages like
Flask-Login
,Flask-SQLAlchemy
, andFlask-Bcrypt
, and follow step-by-step guidelines.- This involves creating an application factory, defining a
User
model, and setting up routes for registration, login, and logout as demonstrated here.
- This involves creating an application factory, defining a
- Check Pro account status with steps: To check if an account is subscribed to Pro, navigate to account settings or billing information on the platform.
- Alternatively, verify through payment history, or contact customer support for assistance, as detailed here.
- OpenAI rolls out hyper-realistic voice mode: OpenAI launched its Advanced Voice Mode for ChatGPT, giving Plus subscribers access to hyper-realistic audio interactions powered by the GPT-4o model on July 30, 2024.
- This feature introduces real-time, natural conversations with capabilities like mid-sentence interruptions and emotional intonation detection.
- Folksable app enhances habit tracking with social features: Folksable is a habit tracking app that encourages users to share updates and progress with friends through photos and social contracts to maintain accountability.
- Available on Android and iOS, the app ensures data privacy with encryption and allows users to create custom rituals and manage visibility through privacy controls.
- does drinking coffee have any negative impact on dental health?: Drinking coffee can have several negative impacts on dental health. Coffee contains acidity, which can erode tooth enamel and cause sensitivity and brittle...
- Folksable app: Folksable is a photo habit tracking app that encourages users to share updates on their habits with friends for accountability. Users can create private or...
- I'm curious about the shooting posture of Olympic shooters and tell me why...: Olympic shooters adopt specific postures and stances to maximize their accuracy, stability, and performance. Here's an overview of the shooting postures used...
- Synchron's Brain Implant Advances, OpenAI's Voice Mode, Friend's AI Necklace, and HIV Nanobody Br...: Discover the latest breakthroughs in brain-computer interface technology as Synchron enables mind control of Apple's Vision Pro headset. This groundbreaking ...
- 나 por인데 왜 기본으로 되지?: 안녕하세요. 제가 이해한 바로는 귀하께서 한국어로 대화를 시작하셨는데, 시스템이 기본적으로 영어(POR)로 설정되어 있는 것 같습니다. 이런 상황이 발생한 이유에 대해 몇 가지 가능성을 설명해 드리겠습니다: 1. 시스템 설정: 사용 중인 애플리케이션이나 웹사이트의 기본 언어 설정이...
- When will the next iPhone be released?: The next iPhone, expected to be the iPhone 16, is anticipated to be released in September 2024. This follows Apple's typical release pattern for new iPhone...
- OpenAI Begins Hyper-realistic Voice Rollout: OpenAI has begun rolling out its highly anticipated Advanced Voice Mode for ChatGPT, offering select Plus subscribers access to hyper-realistic audio...
- please provide an example of secure user authentication in Flask: To implement secure user authentication in a Flask application, you can follow these steps, which include setting up the necessary packages, creating a user...
- What is best skills in PerplexitAI ?: Perplexity AI é uma ferramenta poderosa que combina capacidades de busca e geração de texto, utilizando modelos de linguagem de grande escala (LLMs) para...
Perplexity AI ▷ #pplx-api (4 messages):
Subpar Prompt Results
Perplexity References Beta
Perplexity API on make.com
- Users call out subpar prompt results: Users expressed concerns over recent prompt results, indicating they feel like the results are going backwards.
- One user asked for suggestions on specific prompts that might be causing the issue.
- Inquire about Perplexity References Beta access: A user inquired about the status of the Perplexity references beta, wondering if it’s still possible to gain access.
- 'Hey there, I've applied for the perplexity references beta and was wondering if those are still being given out or if there is a way for me to get there? 🙂'.
- Integrating Perplexity API on make.com: A user inquired about connecting to Perplexity API on make.com, specifying the use of Sonnet 3.5 model to generate summaries.
- The user outlined a requirement to generate a page with a model on Perplexity API and then post the link on Discord.
OpenAI ▷ #ai-discussions (255 messages🔥🔥):
GPT-4o Image Output
Multimodal Training Models
Voice Model Testing
DALL-E and Imagen 3 Comparisons
Alpha Testing Experience
- GPT-4o Image Output Debated: Discussion centered around GPT-4o's image output capabilities with examples, comparing it to other models like DALL-E 3.
- Users noted that GPT-4o's output seemed more realistic but faced criticisms over its moderation endpoint similar to DALL-E 3.
- Future of Multimodal Training Models: A user proposed the future relevance of multimodal models that learn indirectly from video data to label emotions, suggesting they might outperform single-modality models for tasks like text to speech.
- Voice Model Testing and Capabilities: Users experimented with the voice capabilities of GPT-4o, sharing various scenarios including accent changes and emotional expressions.
- Findings highlighted the model's ability to add background music and sound effects, though it was inconsistent.
- Comparing DALL-E and Imagen 3: Requests and comparisons were made between DALL-E and Imagen 3, with offers to run prompts to see which produced better imagery.
- Initial feedback suggested that while both had strong capabilities, Imagen 3 might have a moderation endpoint issue.
- Experiences and Limitations of Alpha Testing: Alpha testers shared mixed experiences, noting issues like high latency and occasional connectivity problems while enjoying new features.
- Debate over region-based access in Europe suggested varying availability, with some users contemplating refunds.
Link mentioned: Tweet from Greg Brockman (@gdb): A GPT-4o generated image — so much to explore with GPT-4o's image generation capabilities alone. Team is working hard to bring those to the world.
OpenAI ▷ #gpt-4-discussions (24 messages🔥):
Alpha testing eligibility
Custom GPTs issues
Free AI diagram tools
Plus subscription impacts
Monetizing GPTs
- Alpha testing eligibility relies on luck: When asked about how to become an alpha tester, a user simply replied that it requires luck.
- Custom GPTs stuck during configuration: A user having trouble uploading PNG screenshots to their custom GPTs received an error stating 'Hmm...something seems to have gone wrong' repeatedly without resolution.
- Custom GPTs disabled upon cancelling Plus subscription: It was confirmed that cancelling a Plus subscription will disable and hide any custom GPTs created by the user.
- Monetizing GPTs requires significant usage numbers: A discussion revealed that high usage numbers and being located in the USA are prerequisites for being invited to monetize GPTs.
- Despite initial announcements about GPT Store monetization, users are disappointed due to lack of progress and rollouts of promised features.
OpenAI ▷ #prompt-engineering (12 messages🔥):
Prompt engineering platforms
Evaluation tools
Text reduction strategies
- Best platform for prompt engineering: A member asked for the best platform for prompt engineering, to which another replied, Claude 3.5 Sonnet.
- Artifacts and Projects were praised for their strengths in this regard.
- Tools for heuristic prompt evaluations: A member expressed interest in prompt evaluations and steerability, preferring heuristic and prototyping tools over full automation.
- The Anthropic Evaluation Tool was mentioned positively, but there was interest in alternatives that work with other LLMs.
- Google Sheet for evaluation: For collaborative prompt evaluation, a member suggested that a Google Sheet with scripts might be the best approach.
- This method could facilitate sharing and collaboration better than other tools.
- Free AI tools for drawing diagrams: A member inquired about free AI tools that can draw diagrams.
- Another member simply replied, ChatGPT.
- Challenges in text length reduction: A member asked about reducing text to a specific character or word count.
- Another clarified that LLMs struggle with exact counts, suggesting qualitative language for more consistent lengths.
OpenAI ▷ #api-discussions (12 messages🔥):
Prompt Engineering Platforms
Human Evaluation Tools
AI for Drawing Diagrams
Reducing Text Length
- Best Platforms for Prompt Engineering: A member asked about the best platforms for prompt engineering and another suggested Claude 3 and Sonnet.
- They also mentioned that Artifacts + Projects are strong contenders in the field.
- Anthropic Evaluation Tool for Steerability: A discussion focused on Anthropic Evaluation Tool for prompt evaluations and steerability for heuristics and prototyping.
- A member suggested that a Google Sheet with scripts might be the most collaborative and easy-to-share alternative.
- Free AI Tools for Drawing Diagrams: A member inquired about free AI tools that can draw diagrams.
- Another member recommended ChatGPT, although its suitability for drawing diagrams was disputed.
- Reducing Text to Specific Lengths: A member asked about reducing text to specific character or word counts.
- Another member explained that due to the nature of LLMs, they can't ensure exact counts and suggested using qualitative language terms like short or long instead.
CUDA MODE ▷ #general (55 messages🔥🔥):
FSDP Criticism
Sharding LLaMA 405B
vLLM and LLaMA 3.1 Support
Megatron Paper Discussions
Torchrun and GPU Memory Issues
- FSDP Criticism Sparks Debate: A member criticized FSDP, calling it 'kind of ass', which led to a discussion about its applications and scalability.
- Another member pointed out that while FSDP is not ideal for all scenarios, 'there's no beating it as far as ease of use is concerned'.
- Struggling with Sharding LLaMA 405B Across Nodes: Members discussed issues with sharding LLaMA 405B across 2 nodes with 8 x H100s, primarily facing problems during inference.
- Suggestions were made to use vLLM and explore quantization methods, though the original member preferred to avoid VLLM.
- vLLM Extends Support for LLaMA 3.1: A member highlighted that vLLM now supports the LLaMA 3.1 model series with enhancements for larger context windows and pipeline parallelism.
- They shared a blog post detailing these new features including FP8 quantization.
- Megatron Paper Sparks Interest: Members showed interest in the Megatron paper from 2021, discussing its relevance and sharing links to the paper and related resources.
- A YouTube video was also shared for further understanding of distributed training concepts.
- Issues with Torchrun and GPU Memory: A member reported issues with torchrun, where GPU memory isn't freed when manually stopping the script.
- Suggestions included using @record to handle errors and ensure GPU memory is cleared.
- Error Propagation — PyTorch 2.4 documentation: no description found
- Announcing Llama 3.1 Support in vLLM: Today, the vLLM team is excited to partner with Meta to announce the support for the Llama 3.1 model series. Llama 3.1 comes with exciting new features with longer context length (up to 128K tokens), ...
- Index of /~matei/papers/2021: no description found
- Demystifying the Nvidia Ampere Architecture through Microbenchmarking and Instruction-level Analysis: Graphics processing units (GPUs) are now considered the leading hardware to accelerate general-purpose workloads such as AI, data analytics, and HPC. Over the last decade, researchers have focused on ...
- EfficientML.ai Lecture 17: Distributed Training (Part I) (MIT 6.5940, Fall 2023, Zoom): EfficientML.ai Lecture 17: Distributed Training (Part I) (MIT 6.5940, Fall 2023, Zoom)Instructor: Prof. Song HanSlides: https://efficientml.ai
- EfficientML.ai Lecture 17: Distributed Training (Part I) (MIT 6.5940, Fall 2023, Zoom): EfficientML.ai Lecture 17: Distributed Training (Part I) (MIT 6.5940, Fall 2023, Zoom)Instructor: Prof. Song HanSlides: https://efficientml.ai
CUDA MODE ▷ #triton (9 messages🔥):
Triton tiled matmul tutorial
GROUP_SIZE_M argument
Block and group tiling
L2 cache optimization
- Clarification on GROUP_SIZE_M in Triton Tiled Matmul Tutorial: A user inquired about the role of the
GROUP_SIZE_M
argument in the Triton tiled matmul tutorial, questioning its purpose and advantage.- Another user explained that
GROUP_SIZE_M
controls how many blocks of rows are processed before changing columns, enhancing L2 cache hit rate, and is one level of cache tiling above block tiling and below warp/thread tiling.
- Another user explained that
- GROUP_SIZE_M vs. MAX Value Usage: The discussion continued with a user asking why
GROUP_SIZE_M
should not always be set to the maximum possible value.- The response highlighted that similar logic applies to block tiling in shared memory and that setting it to the max could lead to inefficiencies explained in the tutorial, comparing it to not using the full length of dimensions for block sizes.
Link mentioned: Matrix Multiplication — Triton documentation: no description found
CUDA MODE ▷ #torch (3 messages):
Running video predictor example notebook
Google Colab example for sam2
GitHub issue for segment-anything-2
- Running video predictor example notebook fails: A member was unable to run the video predictor example notebook from sam2.
- Despite trying various changes on their end, they could not get it to work and sought community advice.
- Alternative Google Colab notebook found for sam2: The same member found a Google Colab notebook that works with their configuration.
- They thanked the contributor on the relevant GitHub issue for providing a solution.
- Google Colab: no description found
- Google Colab example · Issue #40 · facebookresearch/segment-anything-2: Not an issue. If someone needs it, I build a working Colab with the model - https://colab.research.google.com/drive/1Un09HITLLM-ljkG1Ehn9cJjdwk8FVI_1?usp=sharing Working end-to-end.
CUDA MODE ▷ #algorithms (1 messages):
Llama 3 Herd of Models
AIMO: Findings from the winners
SAM 2: Segment Anything Model 2
LazyLLM
- Meta reveals Llama 3.1: Herd of Models: Meta released Llama 3.1 which includes a new model with 405 billion parameters, trained on 15.6 trillion tokens on a cluster of 16,000 H100 GPUs.
- They utilized models like Roberta to filter out and create a high-quality dataset for training.
- AIMO winners' findings dissected: This week's analysis includes a detailed review of the winners' findings from the AIMO competition.
- SAM 2: The successor to Segment Anything Model: Discussion covered SAM 2, the next iteration of the Segment Anything Model.
- LazyLLM boosts LLM inference performance: A segment focused on LazyLLM, which aims at improving the performance of LLMs during inference.
Link mentioned: AI Unplugged 16: Llama 3, AIMO winners, Segment Anything Model 2, LazyLLM: Insights over Information
CUDA MODE ▷ #cool-links (3 messages):
Digital Video Eavesdropping
NVIDIA Titan Series Graphics Cards
Segment Anything Video (SA-V) Dataset
- Revolutionizing Digital Video Eavesdropping Techniques: A recent arXiv paper discusses a novel approach to eavesdrop on digital video displays by analyzing electromagnetic waves from HDMI cables, termed TEMPEST.
- The authors propose using a deep learning module to map observed electromagnetic signals back to the displayed image, overcoming the challenges posed by the high bandwidth and non-linear mapping of digital signals.
- NVIDIA's Next-Gen Titan GPUs Unveiled: According to a Wccftech article, NVIDIA's new Titan-class graphics card based on the Blackwell GPU architecture exists, but its launch remains doubtful.
- Previous Titan releases include the Titan RTX from 2018, and there is speculation whether new
- Meta Releases Vast SA-V Dataset for AI Research: Meta introduced the Segment Anything Video (SA-V) dataset, containing 51K videos and 643K spatio-temporal segmentation masks.
- The dataset supports computer vision research and consists of manually annotated and automatically generated masklets, with an average video resolution of 1401×1037 pixels.
- NVIDIA's Next-Gen Titan Graphics Card Does Exist & Based on Flagship Blackwell GPU: NVIDIA reportedly already has a Titan-class graphics card based on its next-gen Blackwell GPU architecture but its launch is doubtful.
- Deep-TEMPEST: Using Deep Learning to Eavesdrop on HDMI from its Unintended Electromagnetic Emanations: In this work, we address the problem of eavesdropping on digital video displays by analyzing the electromagnetic waves that unintentionally emanate from the cables and connectors, particularly HDMI. T...
- no title found: no description found
CUDA MODE ▷ #pmpp-book (2 messages):
Ampere A100 SM organization
Warp distribution in processing blocks
Hardware design choices
Hopper architecture
- Ampere A100 SM split into smaller processing blocks: A user queried why the Ampere A100 SM, with 64 cores, is organized into four processing blocks with 16 cores each rather than 32 cores to match the warp size.
- Another user speculated that Nvidia likely made this choice to maintain a balance that keeps the hardware busy, given kernel needs, space on silicon, bandwidth, and latency parameters.
- Speculations on Hardware Design Choices: One user mentioned that hardware design involves balancing space on silicon with utilization, where more units take more space.
- They suggested it might be a delicate balance act to ensure that additional units are worth their cost in terms of bandwidth and latency.
CUDA MODE ▷ #torchao (11 messages🔥):
.py vs .ipynb
Quantization-Aware Training (QAT)
Conversion of .ipynb to .py
GitHub Repositories for Jupyter and PyTorch
Performance Comparison of QAT and PTQ
- .py vs .ipynb Usability Debate: Discussion centered around whether .py files can be easily runnable and modifiable in comparison to .ipynb files, with some members suggesting various tools and methods for conversion.
- One member mentioned using LibCST for conversions, while another noted the availability of export options in Colab and Jupyter UI.
- Quantization-Aware Training improves PyTorch Model Accuracy: A blog post on PyTorch discusses an end-to-end Quantization-Aware Training (QAT) flow which can recover up to 96% of the accuracy degradation on hellaswag and 68% of the perplexity degradation on wikitext for Llama3 compared to post-training quantization.
- QAT vs. PTQ in Practical Application: One member explained the crucial difference between Quantization-Aware Training and Quantized Training, emphasizing QAT's substantial performance improvements.
- Another participant highlighted the excitement about combining low-rank adaptation with QAT for enhanced performance.
- Overfitting Concerns with QAT: A user questioned if overfitting was checked during the QAT process, suggesting that MMLU could be a good metric for verification.
- This sparked a further mention for verification by another user, indicating the community's interest in the thorough evaluation of QAT.
- Quantization-Aware Training for Large Language Models with PyTorch: In this blog, we present an end-to-end Quantization-Aware Training (QAT) flow for large language models in PyTorch. We demonstrate how QAT in PyTorch can recover up to 96% of the accuracy degradation ...
- notebook/docs/source/examples/Notebook/Running Code.ipynb at main · jupyter/notebook: Jupyter Interactive Notebook. Contribute to jupyter/notebook development by creating an account on GitHub.
- GitHub - Instagram/LibCST: A concrete syntax tree parser and serializer library for Python that preserves many aspects of Python's abstract syntax tree: A concrete syntax tree parser and serializer library for Python that preserves many aspects of Python's abstract syntax tree - Instagram/LibCST
CUDA MODE ▷ #llmdotc (177 messages🔥🔥):
GELU changes
Llama 3.1 reference implementation
Reference implementation issues
TorchChat
RoPE scaling
- GELU optimization PR for LLMC: A new PR was submitted to move faster GELU changes from the FP8 branch to master, which improves validation loss slightly.
- Surprisingly, it actually helps val loss a tiny bit, but again might be noise.
- Llama 3.1 implementation issues: Members discussed the lack of documentation for running the Llama 3.1 model after downloading it from Meta's repo and shared code snippets to attempt loading and running it.
- It's suspected that a 10-line Python snippet is missing for a straightforward run, with inference scripts highlighted as overly complicated.
- TorchChat as a Llama 3.1 reference: A reference implementation for Llama 3.1 was shared in the form of a new TorchChat repository released by PyTorch.
- This implementation serves as a detailed guide for local and server-based running of Llama 3.1 models.
- RoPE scaling and specialized features: The conversation included detailed discussions on how RoPE scaling differs in Llama 3.1 and the necessity to update reference implementations accordingly.
- Members shared insights on integrating this in CUDA code for better fine-tuning operations.
- Fine-tuning techniques on Llama 3.1: Discussion pivoted towards fine-tuning, weighing full finetuning vs. LoRA approaches, with insights into LoRA being efficient on smaller datasets.
- It was suggested that sometimes training on just completions can yield better results, and a snippet to implement this was shared from the unsloth repo.
- llama3/llama/generation.py at main · meta-llama/llama3: The official Meta Llama 3 GitHub site. Contribute to meta-llama/llama3 development by creating an account on GitHub.
- llama3/example_text_completion.py at main · meta-llama/llama3: The official Meta Llama 3 GitHub site. Contribute to meta-llama/llama3 development by creating an account on GitHub.
- llama-recipes/recipes/quickstart/inference/local_inference/README.md at main · meta-llama/llama-recipes: Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q...
- GitHub - karpathy/nano-llama31: nanoGPT style version of Llama 3.1: nanoGPT style version of Llama 3.1. Contribute to karpathy/nano-llama31 development by creating an account on GitHub.
- GitHub: Let’s build from here: GitHub is where over 100 million developers shape the future of software, together. Contribute to the open source community, manage your Git repositories, review code like a pro, track bugs and fea...
- GitHub - karpathy/nano-llama31: nanoGPT style version of Llama 3.1: nanoGPT style version of Llama 3.1. Contribute to karpathy/nano-llama31 development by creating an account on GitHub.
- Faster GELU forward & backward using MUFU.TANH for SM7.5+ by ademeure · Pull Request #721 · karpathy/llm.c: These are faster GELU kernels by using the HW instruction NVIDIA introduced for this in Turing (SM7.5) but never exposed outside of PTX as far as I can tell, possibly because it's slightly less ac...
- GitHub - meta-llama/llama-models: Utilities intended for use with Llama models.: Utilities intended for use with Llama models. Contribute to meta-llama/llama-models development by creating an account on GitHub.
- Do not modify global random state · Issue #39716 · pytorch/pytorch: 🚀 Feature Currently, the recommended approach to achieve reproducibility is setting global random seeds. I would like to propose that instead all functions which need a random source accept a local.....
- unsloth/unsloth/chat_templates.py at main · unslothai/unsloth: Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
- llama-models/models/llama3_1/api/model.py at main · meta-llama/llama-models: Utilities intended for use with Llama models. Contribute to meta-llama/llama-models development by creating an account on GitHub.
- HydraHarp 400 - Multichannel Picosecond Event Timer & TCSPC Module | PicoQuant: no description found
- unsloth/unsloth/models/llama.py at main · unslothai/unsloth: Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
- llm.c/llmc/encoder.cuh at 7e0c497936540a44338e214bc230a1f041090fcb · karpathy/llm.c: LLM training in simple, raw C/CUDA. Contribute to karpathy/llm.c development by creating an account on GitHub.
- llama2.c/runq.c at 8a0ad84b9ee94fad175e5687fb8774503efbd23b · trholding/llama2.c: Llama 2 Everywhere (L2E). Contribute to trholding/llama2.c development by creating an account on GitHub.
- GitHub - pytorch/torchchat: Run PyTorch LLMs locally on servers, desktop and mobile: Run PyTorch LLMs locally on servers, desktop and mobile - pytorch/torchchat
- torchchat/generate.py at main · pytorch/torchchat: Run PyTorch LLMs locally on servers, desktop and mobile - pytorch/torchchat
- torchchat/build/model.py at main · pytorch/torchchat: Run PyTorch LLMs locally on servers, desktop and mobile - pytorch/torchchat
CUDA MODE ▷ #lecture-qa (1 messages):
L2 latency as hyperparameter
latency bound algorithm
- Question on using L2 latency as a hyperparameter: A member asked how to use L2 latency as a hyperparameter in the options for the 2 billion options.
- The same member also inquired about the definition and application of a latency bound algorithm.
- Understanding latency bound algorithm: A user sought clarification on what is meant by latency bound algorithm.
- This followed a previous question on the role of L2 latency in hyperparameter tuning.
CUDA MODE ▷ #cudamode-irl (4 messages):
Gradient involvement
Seq Parallel
Triton Kernels
Hackathon
Event Criteria
- Gradient's Michael explores Seq Parallel and Triton Kernels: Michael from Gradient announced his work on either Seq Parallel or Triton Kernels for some unique architectures and invited others to join him in SF.
- Hackathon-style learning interest from a newbie: Pacomann expressed interest in joining the event, emphasizing a desire to learn a lot in a hackathon-style format.
- Question on event approval criteria: Evil666man asked whether there was a criterion for approval or if it was first come, first serve.
- Kashimoo responded, implying the event would have been full if it were first come, first serve.
Stability.ai (Stable Diffusion) ▷ #announcements (1 messages):
Stable Fast 3D Launch
Technical Report
3D Asset Generation Technology
Speed and Quality of 3D Reconstruction
Applications in Gaming and VR
- Stable Fast 3D Launch 🚀: Stability AI has introduced Stable Fast 3D, a model that transforms a single input image into a detailed 3D asset in just 0.5 seconds, setting a new standard for speed and quality in 3D reconstruction. Learn more and access the report.
- 'Stable Fast 3D's unprecedented speed and quality make it an invaluable tool for rapid prototyping in 3D work.'
- How Stable Fast 3D Works: Users can upload a single image of an object, and Stable Fast 3D rapidly generates a complete 3D asset, including UV unwrapped mesh, material parameters, and albedo colors with reduced illumination bake-in. Watch the video for detailed model improvements.
- Optional quad or triangle remeshing adds only 100-200ms to the processing time, increasing its utility across various industries.
Link mentioned: Introducing Stable Fast 3D: Rapid 3D Asset Generation From Single Images — Stability AI: We are excited to introduce Stable Fast 3D, Stability AI’s latest breakthrough in 3D asset generation technology. This innovative model transforms a single input image into a detailed 3D asset, settin...
Stability.ai (Stable Diffusion) ▷ #general-chat (212 messages🔥🔥):
Training Loras for TV Characters
SD3 Model Usage
Handling VAE Issues
Creative Upscaler Confusion
Flux Model Release
- Training Loras for TV characters in SD3: Members discussed how to train 2 Loras of TV characters and have both of them in the same image, recommending the use of SD3 for its unique understanding capabilities.
- Suggestions included starting with prompting, using regional prompter extension in auto1111, and validating through community testing.
- SD3 Medium model issues and usage: Users faced errors loading SD3 Medium from Huggingface such as 'AttributeError: NoneType object has no attribute lowvram'.
- Resolutions discussed included downloading all model components, using ComfyUI workflows, and exploring other compatible UIs like Auto1111.
- Managing VAE settings to prevent red images: Community members addressed issues where rendered images turn red at 95%, attributing it mostly to VAE settings.
- Solutions included using '--no-half-vae' setting and sharing troubleshooting tips for different graphics cards and VAE combinations.
- Clarifying Stability AI's Creative Upscaler: Confusion around the 'Creative Upscaler' mentioned in NightCafe led to clarifications that it's not a real Stability AI product.
- Members recommended alternative upscaling techniques using ERSGAN, transformers, and multi-stage workflows shared on community forums.
- Flux model release by Black Forest Labs: The community welcomed the release of the Flux model, which offers significant improvements in image quality and parameter count.
- Users discussed the model’s performance on different GPUs, with the 4090 being highly recommended, and noted exceptional results in rendering hands and fingers.
- Announcing Flux by Black Forest Labs: The Next Leap in Text-to-Image Models: Flux, the largest SOTA open source text-to-image model to date, developed by Black Forest Labs—the original team behind Stable Diffusion is now available on fal. Flux pushes the boundaries of creativi...
- stabilityai/stable-diffusion-3-medium at main: no description found
- Reddit - Dive into anything: no description found
- stabilityai/stable-diffusion-3-medium · Hugging Face: no description found
- GitHub - Stability-AI/generative-models: Generative Models by Stability AI: Generative Models by Stability AI. Contribute to Stability-AI/generative-models development by creating an account on GitHub.
- Comfy Workflows: Share, discover, & run thousands of ComfyUI workflows.
LM Studio ▷ #general (121 messages🔥🔥):
Exit codes in LM Studio
Gemma 2 models
Model embedding and LLaMA capabilities
Bugs and troubleshooting in LM Studio
Future LM Studio features and user requests
- Members report various Exit Codes: Users encountered different exit codes such as 6 and 0 on various systems, leading to discussions on system compatibility and debugging.
- Gemma 2 Models: Compatibility and Errors: Community members faced issues running Gemma 2 2B models, especially on older or specific hardware, with some requiring new LM Studio versions.
- Embedding with LLaMA and Future Prospects: Queries arose about using LLaMA for embedding within LM Studio, highlighting projects like LLM2Vec for potential solutions.
- Bugs and Troubleshooting in LM Studio: Various bugs were highlighted by users, including issues with GPU offload and network errors linked to VPN/DNS settings.
- User Requests for Future LM Studio Features: Users expressed a desire for features like TTS voices, internet access for models, and RAG for document interaction within LM Studio.
- meta-llama/Meta-Llama-3.1-405B · Hugging Face: no description found
- Papers with Code - Visual Question Answering (VQA): **Visual Question Answering (VQA)** is a task in computer vision that involves answering questions about an image. The goal of VQA is to teach machines to understand the content of an image and answer...
- GitHub - McGill-NLP/llm2vec: Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders': Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders' - McGill-NLP/llm2vec
LM Studio ▷ #hardware-discussion (24 messages🔥):
GPU offload in LM Studio
Stable Diffusion model compatibility
Amuse AI for image generation
Proxmox learning
- Enable iGPU for better VRAM availability: A member tried to enable their iGPU to free up VRAM on their RTX3090 for loading models in LM Studio but still sees 0.5/24.0 GB VRAM usage when idle.
- Another member clarified that iGPUs are unsupported without the OpenCL addon pack; a new beta version with Vulkan support might help.
- Stable Diffusion not supported in LM Studio: A user reported an error when trying to load a stable-diffusion model, revealing that LM Studio does not support image generation models such as Stable Diffusion.
- Suggestions were given to use Stability Matrix, Automatic1111, or Amuse AI for these tasks.
- Amuse AI now available for Radeon users: A member announced that Amuse AI is available for Radeon users, allowing stable diffusion image generation on GPUs with new EZ mode.
- It offers features such as AI filters and sketch-to-image generation without login or cost prerequisites.
- Proxmox learning tips for beginners: A participant asked for tips on drivers in Proxmox and was advised to practice Proxmox inside VirtualBox under Windows first.
- A thorough learning plan was shared, covering topics from installation to GPU passthrough and LLM utilization.
- Spongebob Patrick Star GIF - Spongebob Patrick Star Shocked - Discover & Share GIFs: Click to view the GIF
- Amuse: Stable Diffusion Image and Video Generation
Eleuther ▷ #general (88 messages🔥🔥):
Watermarking in AI
NTIA Report on AI Openness
GitHub Models Launch
Legal Challenges in Deepfakes
GPT-2 Model Improvements
- Watermarking tech trust issues spark debate: Members debated the effectiveness of watermarking in solving trust issues in AI, with some arguing it only works in institutional settings and cannot prevent misuse entirely.
- The discussion suggested that better cultural norms and trust mechanisms, rather than watermarking, are needed to address the spread of deepfakes and misrepresented content.
- NTIA supports open models in latest report: The NTIA issued a report advocating for the openness of AI models while recommending risk monitoring, influencing policy considerations in the US.
- Participants noted that the NTIA functions within the Department of Commerce and reports directly to the White House, giving weight to its policy recommendations on AI model openness.
- GitHub introduces integrated AI models: GitHub announced GitHub Models, allowing developers to access and experiment with top AI models directly on their platform.
- Community members speculated that this move might be an attempt to compete with platforms like Hugging Face by integrating AI capabilities into developers' existing workflows.
- Challenges of regulating deepfakes: Members discussed the regulatory complexities around deepfakes, particularly libel and defamation issues, and the difficulties of enforcing laws on a global scale.
- The discussion highlighted concerns over the feasibility of prosecuting deepfake creators and the potential for such content to be used in blackmail schemes.
- Optimizing GPT-2 with new papers and techniques: A participant working on a GPT-2 model sought advice on incorporating advanced techniques, having already implemented Rotary Positional Embeddings and Grouped Query Attention.
- Community members suggested looking at recent papers and evaluation metrics like human eval to further improve the model and measure its performance effectively.
- Tweet from Thomas Dohmke (@ashtom): Build AI applications right where you manage your code. With GitHub Models, now more than 100 million developers can access and experiment with top AI models where their workflow is – directly on GitH...
- NTIA Supports Open Models to Promote AI Innovation | National Telecommunications and Information Administration: no description found
- Tweet from Imperishable Knight ⛩️ (RJ) (@impershblknight): Tip for Plus users hoping to get #ChatGPT Advanced Voice alpha access: Have you tried enabling these settings? I didn't get the AV invite initially but I enabled them then hours later as the next...
- United States Department of Commerce - Wikipedia: no description found
- Federal Register :: Request Access: no description found
- UK’s AI bill to focus on ChatGPT-style models: no description found
Eleuther ▷ #research (7 messages):
system prompt style model training
MLCommons AlgoPerf results
synthetic data generation
system prompt generalization
- System Prompt Style Models Training Query: A member questioned the existence of papers on how system prompt style models were trained, finding them synthetic as they don't exist in the wild.
- Another member suggested they can be generated automatically or with minimal human effort once a system prompt-tuned model is available.
- MLCommons AlgoPerf Results Announced: MLCommons AlgoPerf results are in, highlighting a $50K prize competition where non-diagonal preconditioning outperformed Nesterov Adam by 28%, setting a new SOTA in hyperparameter-free algorithms.
- This achievement was celebrated as distributed shampoo emerged victorious in the competition.
- Synthetic Data for System Prompts: Discussion on using synthetic data generation and GPT-4 distillation to generate system prompts for chat/instruct models.
- A member expressed the need for more research to back up claims about the effectiveness of system prompt generation in ensuring model guardrails.
Link mentioned: Tweet from MLCommons (@MLCommons): @MLCommons #AlgoPerf results are in! 🏁 $50K prize competition yielded 28% faster neural net training with non-diagonal preconditioning beating Nesterov Adam. New SOTA for hyperparameter-free algorith...
Eleuther ▷ #scaling-laws (15 messages🔥):
Scaling law experiments
Validation log-likelihood anomalies
Double descent phenomenon
Broken Neural Scaling Law (BNSL) paper
Task-specific scaling behavior
- Scaling law experiments reveal anomalies: Experiments comparing the validation log-likelihood of models trained on different-sized subsets show that the model trained on 1e6 sequences significantly underperforms those trained on fewer or more sequences.
- Speculations and explanations for validation dip: Members initially suspected a bug in the data processing pipeline but couldn't find any, prompting discussions on the double descent phenomenon.
- Another user mentioned the BNSL paper showing similar double descent behavior regarding dataset size, leading to confusion about this occurring depending on the task.
- Double descent debated: Double descent is mentioned as a potential cause, though traditionally linked to increasing parameters rather than dataset size.
- A user clarified that double descent can occur for both parameters and dataset size, noting that the issue might be task-specific.
Link mentioned: Broken Neural Scaling Laws: We present a smoothly broken power law functional form (that we refer to as a Broken Neural Scaling Law (BNSL)) that accurately models & extrapolates the scaling behaviors of deep neural networks ...
Eleuther ▷ #interpretability-general (5 messages):
Gemma Scope
ICML Mech Int Workshop Recording
- Recording for the ICML Mech Int Workshop: A member inquired about the recording for the ICML Mech Int Workshop and was informed by another member that it will be available after a month due to ICML rules.
- It was mentioned that these rules are likely to incentivize people to pay for a virtual pass. Another suggestion was made to obtain the link from a conference attendee.
- Great Work on Gemma Scope: A member complimented the excellent progress on Gemma Scope in a brief interaction.
- The query about the ICML Mech Int Workshop recording followed the praise for Gemma Scope.
Eleuther ▷ #lm-thunderdome (11 messages🔥):
lm-eval prompt counts
GPQA benchmarks
lm_eval harness behavior
Issue tracking for lm_eval
Interpreting progress bars in lm_eval
- lm-eval uses more prompts than present in benchmark: A user noticed that running lm-eval even with zeroshot uses 4x the prompts present in certain benchmarks like gpqa_main, processing 1792 prompts instead of 448.
- GPQA benchmark explained: Another user explained that GPQA has four options and is likely running each option separately.
- Another user clarified that varying sizes between options shouldn't result in exactly 4x prompts and indicated this happens across other benchmarks like MMLU.
- Issue within GPQA eval harness: A user shared their launch script and a specific case where the lm_eval harness processes more prompts than expected, providing detailed settings and asking for issue references.
- Progress bars track choices: A user clarified that the progress bar in lm-eval shows
num_choices * num_docs
for consistency, even if settings allow single-token responses without multiple LM calls.
Interconnects (Nathan Lambert) ▷ #news (61 messages🔥🔥):
xAI Acquisition Rumors
Black Forest Labs Announcement
Gemini 1.5 Pro Release
GitHub Introduces AI Models
- xAI rumored acquisition of Character AI refuted by Elon Musk: Rumors spread that xAI might acquire Character AI to test and improve its Grok models, but Elon Musk denied these claims, dismissing the reports as misinformation.
- Users speculated about the credibility of these rumors, citing similar instances where Musk previously denied reports before they were later confirmed.
- Black Forest Labs formed by original Stable Diffusion team: The original Stable Diffusion team announced the formation of Black Forest Labs to develop advanced generative deep learning models for media.
- They aim to push the boundaries of creativity and efficiency, with their latest model Flux available for testing on fal.
- Google launches Gemini 1.5 Pro: Google's latest model, Gemini 1.5 Pro, was released on Google AI Studio and quickly became the top model on LMSYS with an ELO of 1300.
- This model is praised as the strongest and most intelligent Gemini model to date, showcasing significant advancements.
- GitHub introduces AI Models: GitHub announced the launch of GitHub Models to empower developers with industry-leading AI tools directly on their platform.
- This initiative is designed to make AI more accessible to the developer community, bridging the gap between coder and AI engineer.
- Announcing Flux by Black Forest Labs: The Next Leap in Text-to-Image Models: Flux, the largest SOTA open source text-to-image model to date, developed by Black Forest Labs—the original team behind Stable Diffusion is now available on fal. Flux pushes the boundaries of creativi...
- Tweet from Elon Musk (@elonmusk): @nmasc_ @KalleyHuang @steph_palazzolo The [Mis]Information strikes again. xAI is not considering an acquisition of Character AI.
- Tweet from Simon (@tokumin): We've just pushed the latest Gemini 1.5 Pro to http://aistudio.google.com. It's a REALLY good model, and coming in as the #1 model on LMSYS with an ELO of 1300. Amazing work from the whole G...
- Tweet from natasha mascarenhas (@nmasc_): I'm hearing that xAI is looking at a number of consumer AI companies as potential acquisition targets, in addition to Character AI. Also hearing on a daily basis that there are more Inflection/Ad...
- Tweet from natasha mascarenhas (@nmasc_): SCOOP: xAI is weighing an acquisition of Character AI, as it looks to test and improve its Grok models and beef up its talent ranks https://www.theinformation.com/articles/musks-xai-considers-buying-...
- Introducing GitHub Models: A new generation of AI engineers building on GitHub: We are enabling the rise of the AI engineer with GitHub Models – bringing the power of industry leading large and small language models to our more than 100 million users directly on GitHub.
- Tweet from Black Forest Labs (@bfl_ml): We are excited to announce the launch of Black Forest Labs. Our mission is to develop and advance state-of-the-art generative deep learning models for media and to push the boundaries of creativity, e...
- Tweet from Black Forest Labs (@bfl_ml): We are excited to announce the launch of Black Forest Labs. Our mission is to develop and advance state-of-the-art generative deep learning models for media and to push the boundaries of creativity, e...
- Tweet from Elon Musk (@elonmusk): xAI is not raising capital and I have had no conversations with anyone in this regard Quoting X Daily News (@xDaily) NEWS: The Financial Times has reported that @xAI is seeking investments up to $6...
Interconnects (Nathan Lambert) ▷ #ml-drama (32 messages🔥):
Together AI's Critique
Suno vs Music Labels
AI2 Rebrand
OpenAI vs. Non-Profit Perceptions
- Together AI Critique Calls Out Cherry-Picked Errors: An AI researcher criticized Together AI for cherry-picking results and presented points on the need for scientific rigor in LLM evaluations, pointing out that non-smooth outputs and biased benchmarks skew real-world performance.
- He shared detailed tweets and external resources to emphasize quantization techniques and transparent methodologies in LLM evaluation.
- Suno Clashes with Music Labels Over Copyright: Suno's response to RIAA highlights their mission amid a lawsuit from music labels who allege Suno trained on copyrighted output.
- The discussion reflects on Suno admitting to using copyrighted materials and the contentious talks leading up to the lawsuit.
- AI2's Rebrand Sparks Mixed Reactions: Allen AI unveiled its new brand and website, but not all responses were favorable, with some highlighting the use of sparkles emoji as a familiar tactic in AI branding.
- The change stirred conversations about how even non-profits face scrutiny and mixed reactions during rebranding efforts.
- OpenAI's Non-Profit Status Questioned: In a casual exchange, members humorously noted that OpenAI claims to be a non-profit, leading to skepticism about the legitimacy of such status in practice.
- This reflected broader sentiments that even non-profits do not escape negative press and accountability.
- Tweet from Rachel Metz (@rachelmetz): looks like @allen_ai is taking a page from the sparkles emoji playbook with its redesign! see my recent piece on the AI industry's embrace of ✨ to learn more about the humble sparkles' jump in...
- Tweet from Yangqing Jia (@jiayq): As an AI researcher and engineer, I fully respect together's achievement but would like to also point out the many cherrypicked errors. I am sure they are unintentional, but evaluation of LLMs is ...
- Tweet from Mikey (@MikeyShulman): We're filing our response to the members of the RIAA today. It's important to understand additional context around our mission and what is at stake. You can read more about it on the suno blog...
- Tweet from Ai2 (@allen_ai): After months of behind-the-scenes research, interviews, and labors of love, we’re delighted to debut Ai2’s new brand and website today. Explore the evolution 🧵
Interconnects (Nathan Lambert) ▷ #random (4 messages):
Anime Profile Picture Feed
Article Timing
Llama 3.1 Scores
- Anime Profile Picture Feed Features Article: A member mentioned that their anime PFP feed started posting an article, calling it a 'banger' with impeccable timing.
- Perfect Timing on Article Release Awaiting Llama 3.1 Scores: Natolambert mentioned getting lucky with the article's timing and revealed they were waiting for Llama 3.1 scores before releasing it.
Interconnects (Nathan Lambert) ▷ #posts (28 messages🔥):
Interviewing Sebastian Raschka
Knowledge distillation definitions
Apple AI advancements
Rejection sampling in RLHF
Open Instruct updates
- Sebastian Raschka discusses open LLMs and Llama 3.1: Sebastian Raschka's interview covers the state of open LLMs, Llama 3.1, and AI education.
- During the interview, concerns about distillation verbiage similar to Alpaca and Self-Instruct papers were discussed, highlighting a naming conflict in the field.
- Confusion over knowledge distillation terms: Members debated the terms for distillation used during training with synthetic data versus soft-target and hard-target distillation.
- The issue is magnified with terms like rejection sampling being un-googleable outside specific AI contexts.
- Apple AI integration makes waves: A discussion on Apple's new AI features suggests their integration can connect apps more seamlessly, making daily tasks easier.
- Apple's multi-model AI system, Apple Intelligence, is seen as a force multiplier in everyday tech, though AI labs remain skeptical of its transformative potential.
- Implementing rejection sampling in Open Instruct: Rejection sampling is being implemented in Open Instruct, aiming to streamline training processes.
- This method might reduce issues found in other training approaches, improving the overall efficiency of model training.
- On-policy preference data collection challenges: The community discussed the costs and challenges of collecting on-policy preference data for single-policy alignment datasets.
- It was noted in the An update on DPO vs PPO for LLM alignment video that having diverse model generations can make Ultrafeedback easier to use, but single-policy focus might be necessary for consistent alignment.
- AI for the rest of us: Apple Intelligence makes a lot of sense when you get out of the AI bubble. Plus, the cool technical details Apple shared about their language models "thinking different."
- Interviewing Sebastian Raschka on the state of open LLMs, Llama 3.1, and AI education: This week, I had the pleasure of chatting with Sebastian Raschka. Sebastian is doing a ton of work on the open language model ecosystem and AI research broad...
- Add rejection sampling script by vwxyzjn · Pull Request #205 · allenai/open-instruct: no description found
Latent Space ▷ #ai-general-chat (56 messages🔥🔥):
Llama 3.1 evaluation and controversies
AI SDR fundraising
New player in text-to-image space: Black Forest Labs
LangGraph Studio announcement
Mixed-modal language modeling with Meta MoMa
- Llama 3.1 under scrutiny: Llama 3.1 has taken the world by storm but faces criticism for differences in quality when different inference providers use different implementations (Together AI blog).
- Notable figures in the AI community have pointed out inaccuracies and potential hallucinations in Together AI's evaluations and claim cherry-picked results, emphasizing the importance of transparent methodology and rigorous data-based testing (discussion thread).
- Sybill raises $11M for AI SDR: Sybill announced raising $11M in Series A funding to build a personal assistant for every sales rep, led by Greystone Ventures and other notable VCs (read more).
- The market for AI-powered sales tools is heating up, and Sybill’s feature of cloning the seller's voice to draft relevant follow-ups was highlighted as particularly on-point.
- Black Forest Labs emerges in text-to-image space: Black Forest Labs launched with a new suite of SOTA text-to-image models called FLUX.1, which includes a 12B param model available under non-commercial and open licenses on Huggingface (announcement and model weights).
- The team consists of former Stable Diffusion members, and their pro model is already available for testing on Replicate.
- LangGraph Studio: New Agent IDE: LangChain announced LangGraph Studio, a specialized IDE for agentic applications, enabling better visualization, interaction, and debugging of LLM workflows (announcement).
- The tool integrates with LangSmith for collaboration and aims to make developing LLM applications more efficient and accessible.
- Meta introduces MoMa for mixed-modal language modeling: Meta announced MoMa, a new sparse early-fusion architecture for mixed-modal language modeling, improving pre-training efficiency (paper and announcement).
- MoMa employs a mixture-of-expert (MoE) framework with modality-specific expert groups, handling interleaved mixed-modal token sequences efficiently.
- Tweet from Ai2 (@allen_ai): After months of behind-the-scenes research, interviews, and labors of love, we’re delighted to debut Ai2’s new brand and website today. Explore the evolution 🧵
- Tweet from nisten (@nisten): hacked bitnet for finetuning, ended up with a 74mb file. It talks fine at 198 tokens per second on just 1 cpu core. Basically witchcraft. opensourcing later via @skunkworks_ai base here: https://huggi...
- Tweet from Elon Musk (@elonmusk): @nmasc_ @KalleyHuang @steph_palazzolo The [Mis]Information strikes again. xAI is not considering an acquisition of Character AI.
- Tweet from Noah Hein (@TheNoahHein): trying out the @bfl_ml flux-dev model on @replicate! Here's a list of it's outputs, with the prompt, and a side-by-side comparison of the same prompt into MJ! Flux is on the left, MJ on the ...
- Tweet from Tim Dettmers (@Tim_Dettmers): After 7 months on the job market, I am happy to announce: - I joined @allen_ai - Professor at @CarnegieMellon from Fall 2025 - New bitsandbytes maintainer @Titus_vK My main focus will be to strengthe...
- Tweet from LlamaIndex 🦙 (@llama_index): Today we’re excited to introduce @llama_index workflows - a new event-driven way of building multi-agent applications. Model each agent as a component that subscribes to events and emits events; you c...
- Tweet from LangChain (@LangChainAI): 🚀Announcing LangGraph Studio: The first agent IDE LangGraph Studio offers a new way to develop LLM applications by providing a specialized agent IDE that enables visualization, interaction, and debu...
- Tweet from undefined: no description found
- Tweet from Yangqing Jia (@jiayq): As an AI researcher and engineer, I fully respect together's achievement but would like to also point out the many cherrypicked errors. I am sure they are unintentional, but evaluation of LLMs is ...
- Tweet from Dmytro Dzhulgakov (@dzhulgakov): Example: AI researcher question “What is group query attention?” Claim: Factually correct, and detailed answer Reality: The answer implies that GQA is some form of sequence-sparse attention. However...
- Tweet from Baseten (@basetenco): We're excited to introduce our new Engine Builder for TensorRT-LLM! 🎉 Same great @nvidia TensorRT-LLM performance—90% less effort. Check out our launch post to learn more: https://www.baseten.c...
- Tweet from Dmytro Dzhulgakov (@dzhulgakov): This you? We ran your show-case example 3 times on Together playground, and it infinitely looped or answered incorrectly every time. Curious how that slipped through all 5 steps of your quality testin...
- Tweet from Together AI (@togethercompute): Recently there has been considerable discussion on differences in quality when different inference providers use different implementations of Meta's Llama 3.1 models. In the blog post below, we ...
- Tweet from Contextual AI (@ContextualAI): We’re excited to share today that we’ve raised $80M in Series A funding to accelerate our mission to change the way the world works through AI. Read more at our blogpost: https://contextual.ai/news/an...
- Tweet from Romain Huet (@romainhuet): @triviatroy @OpenAI The dollar price per image is the same for GPT-4o and GPT-4o mini. To maintain this, GPT-4o mini uses more tokens per image. Thank you for your observation!
- Tweet from Nishit Asnani (@asnani04): 🚀 Big news! Sybill raised $11M in Series A funding, led by @greycroftvc , with participation from @neotribevc, Powerhouse VC, and Uncorrelated VC. We're building a personal assistant for every ...
- Tweet from Victoria X Lin (@VictoriaLinML): 1/n Introducing MoMa 🖼, our new sparse early-fusion architecture for mixed-modal language modeling that significantly boosts pre-training efficiency 🚀 (https://arxiv.org/pdf/2407.21770). MoMa employ...
- Tweet from Stability AI (@StabilityAI): We are excited to introduce Stable Fast 3D, Stability AI’s latest breakthrough in 3D asset generation technology. This innovative model transforms a single input image into a detailed 3D asset in just...
- Tweet from Character.AI (@character_ai): Thrilled to share that we're open sourcing our innovative approach to prompt design! Discover how Prompt Poet is revolutionizing the way we build AI interactions in our latest blog post: https://r...
- Tweet from Robin Rombach (@robrombach): 🔥 I am so damn excited to announce the launch of Black Forest Labs. We set ourselves on a mission to advance state-of-the-art, high-quality generative deep learning models for images and video, and m...
- Tweet from lmsys.org (@lmsysorg): Exciting News from Chatbot Arena! @GoogleDeepMind's new Gemini 1.5 Pro (Experimental 0801) has been tested in Arena for the past week, gathering over 12K community votes. For the first time, Goo...
- Tweet from Tanishq Mathew Abraham, Ph.D. (@iScienceLuvr): Black Forest Labs announces new suite of SOTA text-to-image models called FLUX.1 Best model FLUX.1[pro] behind API FLUX.1[dev] is 12B param model under non-commercial license FLUX.1[dev] is 12B pa...
- Introducing GitHub Models: A new generation of AI engineers building on GitHub: We are enabling the rise of the AI engineer with GitHub Models – bringing the power of industry leading large and small language models to our more than 100 million users directly on GitHub.
- Llama 3.1: Same model, different results. The impact of a percentage point.: no description found
- Tweet from Griffin Adams (@GriffinAdams92): Announcing Cold Compress 1.0 with @answerdotai A hackable toolkit for using and creating KV cache compression methods. Built on top of @cHHillee and Team’s GPT-Fast for torch.compilable, light-weigh...
- Self-directed Synthetic Dialogues (and other recent synth data): A talk covering a recent synthetic data project we launched. Find the details below.https://arxiv.org/abs/2407.18421Slides: https://docs.google.com/presentat...
- Reddit - Dive into anything: no description found
- black-forest-labs/flux-pro – Run with an API on Replicate: no description found
LlamaIndex ▷ #blog (3 messages):
Async functionality for BedrockConverse
LongRAG paper by @Ernestzyj
@llama_index workflows
- Async functionality now in BedrockConverse: Async methods for BedrockConverse LLM have been implemented, resolving issues #10714 and #14004.
- This contribution was greatly appreciated by the team for enhancing user experience.
- LongRAG paper simplifies long-context LLMs: The LongRAG paper by @Ernestzyj proposes indexing and retrieving larger document chunks to better utilize long-context LLMs.
- This approach aims to ease the retriever’s tasks, enhancing the retrieval-augmented generation (RAG) process.
- @llama_index introduces workflows: @llama_index workflows enable event-driven multi-agent applications, allowing agents to subscribe to and emit events.
- This new approach offers a readable and Pythonic way to build complex orchestration.
Link mentioned: feat: ✨ Implement async functionality in BedrockConverse
by AndreCNF · Pull Request #14326 · run-llama/llama_index: Description Implement async methods for the BedrockConverse LLM. Fixes #10714 Fixes #14004 New Package? Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.m...
LlamaIndex ▷ #general (47 messages🔥):
Alternatives to RagApp
Generating Images with LlamaParse
Stable Versions of LlamaIndex
Handling Agent Errors in ReAct
Configuration in LlamaIndex
- Searching Alternatives to RagApp: A user inquired about alternatives to RagApp and discussed the usefulness of
create-llama
despite some install issues with Poetry. - Generating Images with LlamaParse: Users discussed methods for generating images with LlamaParse, referencing GitHub examples and additional resources.
- Identifying Stable Versions of LlamaIndex: A user questioned how to identify the 'stable' version of LlamaIndex, and it was clarified that installing via pip ensures the latest stable version.
- Further comments emphasized that the 'stable' version typically refers to the latest release on PyPI.
- Handling Errors in ReAct Agent: A user explored making ReAct agents function without invoking tools and discussed alternative approaches like
SimpleChatEngine
or handling agent errors more gracefully.- Suggestions included using
llm.chat(chat_messages)
for a simpler setup and exploring the function calling agent for better tool handling.
- Suggestions included using
- Configuring Parameters in LlamaIndex: There was a discussion on setting parameters like
max_input_size
and chunk overlap in LlamaIndex v10.x after the removal of thePromptHelper
.- Alternatives like passing configurations directly to node parsers or using response synthesizers were suggested.
- llama_parse/examples/multimodal/multimodal_rag_slide_deck.ipynb at main · run-llama/llama_parse: Parse files for optimal RAG. Contribute to run-llama/llama_parse development by creating an account on GitHub.
- llama_parse/examples/demo_json.ipynb at main · run-llama/llama_parse: Parse files for optimal RAG. Contribute to run-llama/llama_parse development by creating an account on GitHub.
- Cassandra Vector Store - LlamaIndex: no description found
LlamaIndex ▷ #ai-discussion (1 messages):
DSPy
Prompt Optimizing
Prompt Rewriting
LlamaIndex
- Comparing DSPy prompt optimization with LlamaIndex: A member inquired about others' experiences with DSPy and requested opinions on its prompt optimizing versus prompt rewriting capabilities in LlamaIndex.
- DSPy Prompt Optimization versus LlamaIndex: Interest was expressed in comparing prompt optimization and prompt rewriting between DSPy and LlamaIndex.
Cohere ▷ #discussions (16 messages🔥):
Embedding content structure
Table and checkbox detection in PDFs
AI Hackathon Series Tour
Ivan as a Gamer
Cows as Pets
- Discussion on leveraging content structure for embeddings: Queries about the impact of new lines, page-breaks, and special symbols on embedding performance was discussed, with Nils Reimers confirming these elements are removed automatically in English and multilingual models.
- No need to preprocess the text extensively for embedding models was the key takeaway, with models being robust enough to handle noisy data.
- Detect and extract table and checkbox data from PDFs: A member sought recommendations for models to detect tables and checkboxes from non-readable PDFs to extract into text or docx formats.
- The suggestion highlighted the effectiveness of using unstructured.io for converting PDF data into JSON format, evidenced by a similar ongoing project within the community.
- Join the AI Hackathon Series Tour at Google: The AI Hackathon Series Tour invites registrations for an event at Google, encompassing innovative AI projects and a competition over 3 days.
- The event provides a creative and competitive platform, concluding with the PAI Palooza, showcasing top AI startups and projects from the host city.
- Ivan's gaming background revealed: A LinkedIn article shared revealed Ivan's past as a gamer, surprising some community members.
- Karthik_99_ expressed amazement on discovering Ivan's transition from gaming to AI co-founder.
- Taking care of cows: A lighthearted comment on owning cows led to the observation that they are a lot of work, addressing a member's jealousy.
Link mentioned: Techstars StartUp Weekend - PAI Palooza & GDG Build with AI—Mountain View · Luma: This AI Hackathon Series Tour is a groundbreaking, multi-city event that spans the United States, bringing together the brightest minds in artificial…
Cohere ▷ #questions (17 messages🔥):
Training LLMs for Arabic Dialects
Joining the Cohere Research Community
Training LLMs for JSON Output
- Training LLMs for Arabic Dialects: A member queried how models like Aya can generate fluent responses in different Arabic dialects without explicit dialect information in the training prompts.
- They expressed surprise that a prompt in English asking for an Egyptian dialect would correctly generate text in that form.
- Joining the Cohere Research Community: A member reported issues joining the Cohere research community and being signed up for newsletters instead.
- Responses mentioned the manual review process and apologized for delays, asking the member to DM their email for a status update.
- Training LLMs for JSON Output: A member asked about training an LLM to convert free-form search queries into structured JSON for Apache Solr input.
- It was suggested they could manually label data, find labeled data, or generate data synthetically, and to check out Cohere's documentation for producing structured outputs.
Link mentioned: Structured Generations (JSON): no description found
Cohere ▷ #api-discussions (15 messages🔥):
August OH event
Ukrainian/Russian language support degradation
Citation_quality settings
Speed optimization for Cohere Cloud
- Invitation to August OH Event: A member invited others to join the August OH event for a meetup.
- They encouraged participation by suggesting the event would be a fun hangout.
- Degradation in Ukrainian/Russian Language Support: A user reported experiencing degradation in Ukrainian/Russian language support on Cohere Cloud, resulting in broken characters.
- The issue was linked to the citation_quality setting, and switching from fast to accurate resolved it, although this affected response speed.
Cohere ▷ #cohere-toolkit (3 messages):
devcontainer issue
pydantic validation error
repository update
team response
- Validation errors block repository setup: A member reported issues running the latest version of the repository in a devcontainer, encountering various pydantic validation errors related to the
Settings
class.- Six validation errors were noted, specifically missing fields like auth.enabled_auth and auth.google_oauth, which caused
make setup
to fail.
- Six validation errors were noted, specifically missing fields like auth.enabled_auth and auth.google_oauth, which caused
- Team swiftly addresses devcontainer issues: The issue was acknowledged quickly by another member, promising that the team would look into and resolve the errors.
- An update followed shortly, confirming that the team is already working on a fix.
Link mentioned: Redirecting...: no description found
LangChain AI ▷ #general (45 messages🔥):
Pydantic type error in LangChain
Executing tools in LangChain
LangSmith API key issue
LangChain and deployment
LangChain documentation and resources
- Pydantic version conflicts cause errors: A member encountered a
pydantic.v1.error_wrappers.ValidationError
despite having installed Pydantic v2, leading to a mismatch in expected types and validation errors during execution in LangChain. - Tool Execution Issues in LangChain: LangChain tools encounter issues when executing
execute_tools
node, causing failures due to input type mismatches and validation errors, despite correct Pydantic validation of inputs beforehand. - LangSmith API key setup troubles: A user struggled with a
403 Client Error: Forbidden
when trying to deploy an LLM with LangSmith, suspecting it was an issue related to the API key configuration. - LangChain resource suggestions and alternatives: Members discussed different sources for learning about LangChain and alternative LLM inference services, recommending OpenAI and TogetherAI for free or affordable usage with LangChain's prompt classes.
- LangChain documentation and error handling: Users were directed to example resources on LangChain's GitHub to troubleshoot various issues and avoid common errors with tool use and API integrations.
- Stable Fast 3D - a Hugging Face Space by stabilityai: no description found
- Build a Simple LLM Application with LCEL | 🦜️🔗 Langchain: In this quickstart we’ll show you how to build a simple LLM application
- langgraph/examples/plan-and-execute/plan-and-execute.ipynb at main · langchain-ai/langgraph: Build resilient language agents as graphs. Contribute to langchain-ai/langgraph development by creating an account on GitHub.
- langgraph/examples/tool-calling.ipynb at main · langchain-ai/langgraph: Build resilient language agents as graphs. Contribute to langchain-ai/langgraph development by creating an account on GitHub.
- LangChain.js - v0.2.12: no description found
- langgraph/examples/tool-calling-errors.ipynb at main · langchain-ai/langgraph: Build resilient language agents as graphs. Contribute to langchain-ai/langgraph development by creating an account on GitHub.
LangChain AI ▷ #langserve (2 messages):
Streaming Support in FastAPI LangChain Application
Using /stream_events endpoint in langserve v2
- Adding Streaming Support to FastAPI LangChain Application: A user proposed a design to add asynchronous streaming support to a FastAPI application with LangChain, focusing on using Redis as a message broker for real-time token generation.
- The design includes keeping existing synchronous endpoints, adding new streaming endpoints, and updating LangChain agents to publish chunks and full responses to Redis.
- Using /stream_events endpoint in langserve v2: A user asked for guidance on how to use the
/stream_events
endpoint in langserve version v2, mentioning that they couldn't find any documentation.- They expressed difficulty in finding information and sought help from the community.
LangChain AI ▷ #share-your-work (2 messages):
LangGraph design pattern
Advanced research assistant and search engine
GPT-4o
Claude 3 Opus
Llama 3.1
- LangGraph design pattern for user apps: A member shared a LangGraph design pattern for easy integration into user-facing apps like web-chats or Telegram/Whatsapp bots, with a detailed example available on GitHub.
- “Here's a LangGraph design pattern that can be easily integrated into your user-facing apps with streaming.”
- Rubik's AI Pro offers beta testing with premium models: A member invited others to beta test an advanced research assistant and search engine, offering 2 months of free premium that includes Claude 3 Opus, GPT-4o, Gemini 1.5 Pro, and other models via Rubik's AI.
- “Use the promo code
RUBIX
to get 2-months of free premium to test new features and expert models.”
- “Use the promo code
- Rubik's AI - AI research assistant & Search Engine: no description found
- ai-champ-design-patterns/ai-agents/LangGraph-multi-agent-user-facing.ipynb at main · TonySimonovsky/ai-champ-design-patterns: Contribute to TonySimonovsky/ai-champ-design-patterns development by creating an account on GitHub.
OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):
Moye Launcher
Digital detox tools
- Moye Launcher Promotes Digital Detox: Moye Launcher is a minimalist Android launcher with built-in AI-powered digital detox tools, aiming to reduce excessive screen time. It eliminates the app drawer to make apps less accessible, encouraging less impulsive app use.
- The launcher aims to address the top three reasons for unproductive screen time, such as auto-clicking due to boredom and lack of accountability, by removing easily accessible app icons and providing usage feedback.
- Digital Detox Tools Explained: Moye Launcher uses AI tools to help users stay accountable and avoid unnecessary app usage, providing reminders and tracking usage.
- These features target the main reasons for unproductive screen time: auto-clicking of apps, lack of a 'watchman,' and forgetting why an app was opened initially.
Link mentioned: Moye Launcher: Digital Detox - Apps on Google Play: no description found
OpenRouter (Alex Atallah) ▷ #general (39 messages🔥):
Lobe interface
Librechat capabilities
Big-agi features
Msty tool integrations with Obsidian
Llama 405B Instruct providers
- Big-agi expands model capabilities with BEAM: Big-agi introduces a 'persona creator' that allows users to generate prompts from YouTube videos or text and the BEAM feature to call 2/4/8 models simultaneously and merge their responses.
- However, it lacks server saving and easy syncing capabilities.
- Msty integrates Obsidian and websites: Msty offers slick integrations with Obsidian and website access, though its parameter settings are reportedly easily forgotten.
- Despite minor polish issues, many users find it appealing and are considering switching to it.
- Llama 405B Instruct providers and quantization: There are no FP16 providers for Llama 405B on OpenRouter, and FP8 quantization, recommended by Meta, runs more efficiently than FP16.
- SambaNova Systems runs in bf16 but is limited to 4k context length, and hosting in bf16 is computationally expensive.
- API Integration with OpenRouter under Beta: Users seeking API integration to handle rate limits and integrate OpenAI and Claude API are advised to email support to join the Beta waitlist.
- Detailed requests can be directed to support@openrouter.ai for assistance.
- OpenRouter website faces occasional regional issues: The OpenRouter website experiences occasional regional connection issues but generally remains operational.
- Users can check status updates for real-time operational information via the OpenRouter status page.
- SambaNova Systems | Revolutionize AI Workloads: Unlock the power of AI for your business with SambaNova's enterprise-grade generative AI platform. Discover how to achieve 10x lower costs & unmatched security.
- OpenRouter: LLM router and marketplace
- OpenRouter Status: OpenRouter Incident History
- DRY: A modern repetition penalty that reliably prevents looping by p-e-w · Pull Request #5677 · oobabooga/text-generation-webui: Looping is an undesirable behavior where the model repeats phrases verbatim that have previously occurred in the input. It affects most models, and is exacerbated by the use of truncation samplers....
OpenInterpreter ▷ #general (23 messages🔥):
Open Interpreter Response Delays
Groq Profile Contribution
Accessibility Roundtable Announcement
House Party Event
Community Building Focus
- Open Interpreter Response Delays: Members are concerned about a delayed response from Ben Steinher of Open Interpreter; he was expected to respond 'early next week' on the 11th of July.
- Groq Profile Contribution Celebrated: A member announced a new PR for a Groq profile, describing it as a great way to contribute to the Open Interpreter project.
- Heyyy we love Groq around these ends 😁
- Accessibility Roundtable on August 22nd: Accessibility Roundtable announced for August 22nd at noon PST, inviting members to participate in a discussion about accessibility.
- Excitement for House Party Event: Members reminded others about the House Party event happening in 4 hours, providing a link to the event.
- There appeared to be some confusion about the event's start time, but the issue was resolved and participants joined the correct voice channel.
- Community Building AI Focus: A member shared their AI project's focus on community-building, specifically fostering backyard barbecue neighborhood friendships.
- "This is so important!! And community block parties without an HOA lol"
- Friend Reveal Trailer: not imaginary. preorder now at friend.com.
- Added Groq profile and flag by MikeBirdTech · Pull Request #1376 · OpenInterpreter/open-interpreter: Added Open Interpreter groq profile support via default groq.py file, updated parser for CLI shortcut in start_terminal_interface.py to accept --groq flag to apply the profile Describe the changes ...
OpenInterpreter ▷ #O1 (8 messages🔥):
Model Selection Questions
01 Workflows and Scheduling
iKKO ActiveBuds
01 Shipping Status
Earbuds with Camera
- Confusion Around Model Selection and API Key Use: A member expressed confusion about selecting the model string and why an OpenAI API key is needed when running '01 --local.'
- They cited their lack of knowledge about these basic concepts.
- 01 Workflows and Scheduling Capabilities?: A member inquired if OpenInterpreter (OI) can save workflows and set up task schedules.
- The question remains unanswered within the given messages.
- 01 on iKKO ActiveBuds Would Be Dope: Members discussed the potential integration of 01 on the iKKO ActiveBuds, which boasts features like an AI-Smart System, AMOLED Touchscreen, and High-Resolution Sound.
- The idea was endorsed as feasible and exciting for improved Human-Computer Interaction (HCI).
- Immediate Need for 01 Shipping Information: A member asked about the shipping status of 01 since it is already August.
- Response linked without further details provided in the conversation.
- Desire for Earbuds with Camera: Members expressed a desire for earbuds featuring a camera that can capture context while conversing with an LLM.
- The idea includes a push/tap feature to activate the camera, enhancing Human-Computer Interaction capabilities.
Link mentioned: ActiveBuds: AI-Smart Earphones with ViVid Touchscreen | iKKO Audio: AI Voice Assistant by ChatGPT-4o. High-bitrate Bluetooth pairing for high-resolution wireless audio among earphones, speakers, smartphones. 45 languages translations. Portable memos for ChatGPT and tr...
Modular (Mojo 🔥) ▷ #general (18 messages🔥):
Mojo Threads
Max and Mojo Packaging
Tier Chart Discussion
Existential Quantifiers
- Mojo lacks explicit thread support: A member asked if Mojo supports threads and another member confirmed Mojo does not currently expose thread support to users.
- However, calling fork() and getting threads that way is tolerated in the compiled version.
- MAX and Mojo packaging changes announced: Announcements were made about changes to MAX and Mojo packaging starting with version 0.9 of the
modular
CLI, making authentication unnecessary to download MAX and Mojo.- Further changes include merging Mojo nightly packages with MAX and transitioning to a new
magic
CLI for easier integration into the Conda ecosystem.
- Further changes include merging Mojo nightly packages with MAX and transitioning to a new
- Tier chart discussion causes confusion: A discussion ensued about a tier chart, with members questioning its representation and noting that it did not reflect a 'level of abstraction'.
- Suggestions were made to replace the entire iceberg with a fire emoji for simplicity.
Link mentioned: MAX FAQ | Modular Docs: Answers to questions we expect about MAX Engine.
Modular (Mojo 🔥) ▷ #mojo (4 messages):
CrazyString gist update
Unicode based indexing
- *CrazyString Gist Adds Unicode Support*: CrazyString gist now includes support for Unicode-based indexing, along with small string optimization and full UTF-8 compatibility.
- Mojo String with small string optimisation and potential full UTF-8 support described in the update.
- Math and Computation as Universal Languages: A member remarked that 'Math is the universal language and Computation is the universal action'.
Link mentioned: Mojo String with small string optimisation and potential full UTF-8 support: Mojo String with small string optimisation and potential full UTF-8 support - crazy_string.mojo
Modular (Mojo 🔥) ▷ #max (5 messages):
Installing max on Mac M1 Max
Mojo compatibility with Python
- Issue with Installing max on Mac M1 Max: A member reported facing issues while trying to install max on a Mac M1 Max device.
- Another member suggested following this fix for Python installation to potentially resolve the problem.
- Mojo aims to be a superset of Python: Mojo is designed to be compatible with existing Python programs, allowing programmers to use it immediately while leveraging the vast ecosystem of Python packages.
- Mojo is in early development and many Python features are not yet implemented, but it allows importing Python modules, calling Python functions, and interacting with Python objects.
Link mentioned: Python integration | Modular Docs: Using Python and Mojo together.
OpenAccess AI Collective (axolotl) ▷ #general (8 messages🔥):
Automated Training Run Termination
Early Stopping in Axolotl
Manual Run Termination
Output Mask Field Proposal
- Axolotl Implements Early Stopping: A member inquired if Axolotl has features to automatically terminate training runs when loss converges asymptotically or validation loss increases.
- Another member confirmed that Axolotl supports early stopping for this purpose.
- Manually Terminate and Save Current LoRA Adapter: A member asked if they could manually terminate a run while saving the most recently trained LoRA adapter instead of canceling the whole run.
- There was no follow-up from the community on this request.
- Output Mask Field in SharedGPT: A member proposed adding an "output mask" field in every turn of the SharedGPT to allow selective training on outputs.
- They explained that this would let the AI make and subsequently learn from mistakes in the masked fields.
OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (5 messages):
Chat templates documentation
Preprocessing step issue
- Documentation for new chat templates needed: A member mentioned the need for documentation for new chat templates, stating that it was challenging to understand how they work and how to extract specific parts of a message.
- Another member noted that they had already written some documentation for themselves and would try to add it to the official docs.
- Bug in preprocessing step with older version: A member requested an example to run just the preprocess step on an older version of the main branch to identify a bug causing improper tokenization.
- They indicated that the bug needs to be fixed as it only triggers in some cases.
OpenAccess AI Collective (axolotl) ▷ #general-help (6 messages):
Pad Token Repetition in Model Training
Dataset Viewers for Conversation Cleaning
Training and Finetuning Llama3
- Issues with Pad Token Repetition in Model Training: A member discussed the occurrence of
<pad>
repetition likely due to not using sample packing and possibly related to enabling eager attention instead of flash.- Caseus mentioned that the pad tokens should be masked out from the label to prevent this issue.
- Need for Better Dataset Viewers: A member sought recommendations for a dataset viewer that allows both viewing and editing conversations beyond simple jsonl format.
- Argilla was suggested, highlighting its collaboration tool capabilities for AI engineers and integration with Hugging Face, but this didn't meet the member's needs.
- Finetuning Llama3 for Translation: A member asked for advice on the best dataset for finetuning Llama3 as a translation model, citing their current limit of 8 billion parameters and showcasing their dataset on Hugging Face.
- Diabolic6045 shared a Sanskrit text dataset on Hugging Face used for translation, including both the Sanskrit source and English translation.
- The tool where experts improve AI models: Argilla is a collaboration tool for AI engineers and domain experts that strive for data quality, ownership, and efficiency.
- 🔥 Argilla 2.0: the data-centric tool for AI makers 🤗 : no description found
- diabolic6045/Sanskrit-llama · Datasets at Hugging Face: no description found
OpenAccess AI Collective (axolotl) ▷ #replicate-help (1 messages):
Serverless GPUs
AI Infrastructure
Inferless report
Cold starts
Autoscaling tests
- Inferless Publishes New Serverless GPUs Report: Inferless published a follow-up report on the state of Serverless GPUs, highlighting significant changes and improvements since their previous report six months ago.
- The report gained traction on Hacker News and includes insights from hundreds of engineers deploying machine learning models in production.
- Cold Starts and Autoscaling Tests in New Report: The new Inferless report discusses cold starts and autoscaling tests across different serverless GPU providers.
- These insights help developers make informed decisions when choosing their serverless provider.
Link mentioned: Serverless GPU Part 2 Benchmarking: A Comprehensive Comparison of Performance & Pricing: Dive into an in-depth review of Serverless GPU platforms. Explore cold-start times, integration challenges, pricing comparison and auto-scaling capabilities. Make informed choices with our detailed an...
OpenAccess AI Collective (axolotl) ▷ #axolotl-help-bot (4 messages):
Gemma2 models training
Eager attention implementation
flash_attention_2
AutoModelForCausalLM
- Training Gemma2 Models: Use Eager Attention: It is strongly recommended to train Gemma2 models with the
eager
attention implementation instead offlash_attention_2
by usingAutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')
. - Eager Attention Over Flash_Attention_2 for Gemma2: The
eager
attention implementation should be used overflash_attention_2
for training Gemma2 models to ensure optimal performance.- A detailed example code demonstrates how to set this in the
AutoModelForCausalLM
.
- A detailed example code demonstrates how to set this in the
Link mentioned: OpenAccess-AI-Collective/axolotl | Phorm AI Code Search: Understand code, faster.
DSPy ▷ #general (10 messages🔥):
Saving/Loading OptimizerResult
Improving JSON Parsing
Parallel Execution in DSPy Module
LiteLLM Proxy Issues with Non-OpenAI Models
DSPy with BIG-Bench via Weights & Biases
- Saving/Loading OptimizerResult for Typed Optimizers: A user inquired whether there is a method to save/load OptimizerResult for typed optimizers similar to untyped optimizers.
- Schema-Aligned Parsing to Reduce JSON Errors: A user proposed moving to Schema-Aligned Parsing to reduce unnecessary retries due to bad JSON output, noting it would also consume fewer tokens.
- They lamented that their TypedPredictor ends up with a large JSON schema and this method could be more efficient.
- Parallel Execution in DSPy Module: A user asked if it's possible to run
dspy.Predict
in parallel within a module, showing an example where they wish to parallelize thefor c in criteria
loop. - LiteLLM Proxy Issues with Non-OpenAI Models: A user reported encountering errors when using LiteLLM proxy with non-OpenAI models such as Claude, mistral, and llama models, despite it working well for OpenAI models.
- They shared the code used:
dspy.OpenAI(model = 'gpt-3.5-turbo', api_base = BASE_API, max_tokens = 1024)
.
- They shared the code used:
- DSPy Integration with BIG-Bench and Weights & Biases: A user found an example on Twitter on how to use DSPy for causal reasoning tasks from BIG-Bench Hard and evaluate via Weights & Biases Weave.
- However, they encountered an
OpCallError
due to an unexpected keyword argument 'system_prompt' while executing the related Colab notebook.
- However, they encountered an
- Tweet from GeekyRakshit (e/mad) (@soumikRakshit96): 🍀 DSPy is a framework that pushes modular "programming" models for prompting and lets us optimize our prompting strategies automatically using a teleprompter. 🧑💻 I created an example demo...
- Google Colab: no description found
- Prompting vs JSON Mode vs Function Calling vs Constrained Generation vs SAP: no description found
DSPy ▷ #random (1 messages):
Effortless AI article
Chatmangpt features
- Effortless AI with Chatmangpt: A LinkedIn article discusses the simplicity and power of Chatmangpt for harnessing AI capabilities effortlessly.
- Chatmangpt features overview: The article emphasizes how Chatmangpt's features integrate seamlessly into existing workflows, maximizing efficiency and productivity.
DSPy ▷ #papers (8 messages🔥):
Integration of DSPy with symbolic learner
True Agentic Behavior
Self-Adapting AI Agents
Agent Zero
Novel Meta-Rewarding in Self-Improvement of LLMs
- DSPy integrates with Symbolic Learner: Members are excited about the potential of integrating DSPy with a symbolic learner, anticipating significant advancements.
- One comment expressed excitement about the development, suggesting this could be a major leap forward.
- Microsoft's Self-Adapting AI Agents Break New Ground: A shared Microsoft Research blog post highlights advancements in self-adapting AI agents, suggesting profound implications for the workplace.
- The blog emphasizes that the games industry has historically driven AI innovation, culminating in modern applications like ChatGPT and Microsoft Copilots.
- Agent Zero Debuts: Agent Zero has been mentioned as the first production version tested by users, showcasing significant potential.
- Opinions suggest that agents like Agent Zero are paving the way for AI to take on more roles in the workplace.
- Meta-Rewarding Improves Self-Judgment in LLMs: New research on arXiv introduces a Meta-Rewarding step enhancing the judgment capabilities of LLMs during the self-improvement process.
- This method led to substantial win rate improvements on benchmarks like AlpacaEval 2, demonstrated by models such as Llama-3-8B-Instruct.
- MindSearch: LLM-Based Multi-Agent Framework: A recent paper on arXiv introduces MindSearch, which mimics human cognitive processes in web information seeking and integration using LLM-based multi-agent frameworks.
- The study addresses challenges in information retrieval, noise management, and context handling, aiming to enhance the capabilities of modern search-assisted models.
- MindSearch: Mimicking Human Minds Elicits Deep AI Searcher: Information seeking and integration is a complex cognitive task that consumes enormous time and effort. Inspired by the remarkable progress of Large Language Models, recent works attempt to solve this...
- Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge: Large Language Models (LLMs) are rapidly surpassing human knowledge in many domains. While improving these models traditionally relies on costly human data, recent self-rewarding mechanisms (Yuan et a...
- Discover Trace, a new framework for AI optimization from language models to robot control: Introducing Trace, Microsoft and Stanford University's novel AI optimization framework, now available as a Python library. Trace adapts dynamically and optimizes a wide range of applications from...
DSPy ▷ #jobs (2 messages):
Official Job Board Setup
Bounties for Tutorial Blog Posts
- Official Job Board Setup Announced: An official job board is being set up, and members are invited to list their jobs for free by sending a DM.
- Bounties for Tutorial Blog Posts: A call was made for members interested in claiming bounties for writing tutorial blog posts.
DSPy ▷ #colbert (1 messages):
amey_86281: Has anyone used Colbert Embeddings and store the embeddings in Pinecone ?
tinygrad (George Hotz) ▷ #general (2 messages):
NVIDIA's impact on taxpayer money
Discord rules reminder by George Hotz
- *NVIDIA Taxpayer Money Love: A user expressed affection for taxpayer money being directed toward NVIDIA*.
- *George Hotz Reminds of Discord Rules: George Hotz reminded users of the discord rules emphasizing that the chat is for tinygrad development and usage discussions*.
tinygrad (George Hotz) ▷ #learn-tinygrad (11 messages🔥):
GPT-2 Slowdown
Embedding/Argmax Inefficiency
Setup Environment for Tinygrad
Bounty for Embeddings
Cumsum O(n) Complexity
- GPT-2 Slowed by Embedding/Argmax Bottleneck: A user identified that the use of
Tensor.arange
in GPT-2 implementation results in inefficiencies, slowing down the model (Issue #1612).- The problem stems from the O(n^2) complexity due to looping over embeddings with masking, instead of direct fetching.
- Bounty for Embeddings Addressed to Specific User: There is a bounty for improving embeddings, but it is currently exclusive to a user named Qazalin.
- Thus, new contributors are encouraged to explore other issues in the codebase.
- Exploring Embedding Code in Tinygrad: Discussion detailed the functioning of the
Embedding
feature within tinygrad, including an example kernel code clarifying its execution.- A member initially misunderstood the purpose of summing across the input embeddings matrix and later acknowledged the correct implementation.
- Cumsum Complexity Discussion: A user questioned the impossibility of making
cumsum
O(n) in the context of tinygrad (Issue #2433).- George Hotz encouraged experimentation to explore potential optimizations.
- Embedding/argmax are O(n^2) · Issue #1612 · tinygrad/tinygrad: This is making GPT-2 slow
- Embeddings are slow and shouldn't be · Issue #2433 · tinygrad/tinygrad: While it's not possible to make cumsum O(n), it should be possible to make Embeddings O(n). It's beyond ARANGE, but points the way to fast selection for dataloader.
- tinygrad/tinygrad/nn/__init__.py at c6a8395f1b726c00c47a65ba0252e7d142b7738a · tinygrad/tinygrad: You like pytorch? You like micrograd? You love tinygrad! ❤️ - tinygrad/tinygrad
LAION ▷ #general (4 messages):
ChatGPT Advanced Voice Mode
Black Forest Labs Launch
FLUX.1 Model
- ChatGPT Multilingual Voice Stunt: A user shared ChatGPT Advanced Voice Mode performing a linguistic stunt by reciting a couplet in Urdu and telling stories in multiple languages including Hebrew, Norwegian, Moroccan Darija, Amharic, Hungarian, Georgian, and Klingon.
- Black Forest Labs Lights Up: A user expressed excitement about the launch of Black Forest Labs aimed at advancing state-of-the-art generative deep learning models for images and video, underlined by their new release, FLUX.1.
- Black Forest Labs is committed to pushing the boundaries of creativity, efficiency, and diversity in media with their new mission and model.
- FLUX.1 Debuts on Hugging Face: A user shared a link to the FLUX.1 model, highlighting its impressive capabilities.
- Refreshing and super good were comments made about the performance of FLUX.1.
- Tweet from Robin Rombach (@robrombach): 🔥 I am so damn excited to announce the launch of Black Forest Labs. We set ourselves on a mission to advance state-of-the-art, high-quality generative deep learning models for images and video, and m...
- FLUX.1 [Schnell] - a Hugging Face Space by black-forest-labs: no description found
- Tweet from Cristiano Giardina (@CrisGiardina): ChatGPT Advanced Voice Mode recites a couplet in Urdu → tells a story in Hebrew → Norwegian → Moroccan Darija → Amharic → Hungarian → Georgian → finally attempts some Klingon
LAION ▷ #research (6 messages):
Normalization and activation functions
Regularization techniques
Common code errors
- Experimenting with activation functions on complex-valued activations: A user mentioned experimenting with different normalization and activation functions on complex-valued activations and noted it was 'kinda fun!'
- Data augmentation and regularization techniques discussed: A link on data augmentation was shared, but a member noted that techniques like data augmentation, dropout, and weight decay merely delay overfitting and do not significantly reduce final validation error.
- 'They delay overfitting but don't generally reduce the final val error much.'
- Code typo discovered after 50+ experiments: A user found a stupid typo in their code which had been obstructing the architecture's performance in the past 50+ experiments.
Link mentioned: Data Augmentation Techniques in CNN using Tensorflow: Recently, I have started learning about Artificial Intelligence as it is creating a lot of buzz in industry. Within these diverse fields of…
Torchtune ▷ #general (5 messages):
model performance
generate recipe debugging
llama3 model
top_p settings
- Online model outperforms user's own model: A member noted that testing 0.8 online yielded much better results than their own model.
- Top_p=50 considered acceptable: The member reported that top_p=50 seemed perfectly fine for their needs.
- Generate recipe meant for debugging, not optimal quality: Another member clarified that the generate recipe is intended for debugging, not to showcase optimal performance, but aims for a high-quality, accurate sampling of the trained model.
- Evaluation tests using the same generation utils showed similar numbers to reported benchmarks, and any quality issues should be submitted as an issue.
- Rechecking performance of original llama3 model: A member planned to create a new server instance, download the llama3-8B-instruct model again, and test it on standard settings to check if the generation quality still differs from the online benchmarks.
Torchtune ▷ #dev (4 messages):
PR Merge
FSDP2
Quantization APIs
QAT and FSDP2 Compatibility
- Merged fine-tuning datasets discussed in PR #1234: A member mentioned that they will put up a separate PR after PR #1234 gets reviewed and landed since it depends on some elements from this PR.
- FSDP2 supports both quantization and NF4 tensor: A member noted that FSDP2 should support both quantization for NF4 tensor and possibly QAT, although they have not tried many other quantization APIs.
- They also mentioned that for their current QAT recipe, compile won't work with FSDP2.
Link mentioned: [1/n] Merged fine-tuning dataset: grammar + samsum by RdoubleA · Pull Request #1234 · pytorch/torchtune: Context What is the purpose of this PR? Is it to add a new feature fix a bug update tests and/or documentation other (please add here) As discussed in the RFC in #1186, we will merged instruc...
MLOps @Chipro ▷ #events (2 messages):
Data Phoenix Webinar
ELT Workshop with dlt
- Data Phoenix Hosts Webinar on Enhancing Recommendation Systems: The Data Phoenix team is hosting a free webinar on August 8 at 10 a.m. PDT, titled 'Enhancing Recommendation Systems with LLMs and Generative AI,' featuring Andrei Lopatenko, VP AI & Engineering.
- The talk will discuss how LLMs and Generative AI can revolutionize recommendation systems and personalization engines. Register here.
- 4-hour Comprehensive ELT Workshop with dlt: A 4-hour workshop on robust and easy ELT with dlt is being held to teach data enthusiasts and engineers how to build ELT pipelines, with a registration link here.
- Completion includes a 'dltHub ELT Engineer' certification. The first part covers dlt fundamentals and takes place online on 15.08.2024 at 16:00 GMT+2.
- dltHub events: Come meet the dltHub team at these events.
- Enhancing Recommendation Systems with LLMs and Generative AI · Luma: The Data Phoenix team invites you to our upcoming webinar, which will take place on August 8 at 10 a.m. PDT. Topic: Enhancing Recommendation Systems with LLMs…
MLOps @Chipro ▷ #general-ml (5 messages):
Computer Vision
Conferences on Machine Learning
Gaussian Processes
Isolation Forest
GenAI ROI
- Machine Learning Conferences Emphasize NLP & GenAI: A member shared their experience attending two machine learning conferences in the past year where their presentations on Gaussian Processes and Isolation Forest models were overshadowed by the focus on NLP and genAI.
- They noted that many attendees had no idea about their work, highlighting the prevalent interest in NLP and genAI technologies.
- Skepticism Surrounds GenAI ROI Expectations: Discussion revolved around skepticism that the ROI from genAI might not meet high expectations.
- One member commented that a return on investment first requires a return of investment, emphasizing the need for realistic expectations.
LLM Finetuning (Hamel + Dan) ▷ #general (3 messages):
LangSmith credit access
Payment method issues
- LangSmith Credits Inaccessible Without Payment Method: Digitalbeacon raised a concern about being unable to access credits in LangSmith despite adding a payment method. His organization ID is 93216a1e-a4cb-4b39-8790-3ed9f7b7fa95 and he used a different email ID in the form than in the course.
- Danbecker advised contacting support for any credit-related issues.
- Payment Method Issues for LangSmith Credits: Digitalbeacon mentioned adding a payment method but still seeing zero credits in LangSmith. They asked for assistance because they had filled out the form on time.