[AINews] not much happened today
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
a quiet day is all you need.
AI News for 11/29/2024-12/2/2024. We checked 7 subreddits, 433 Twitters and 29 Discords (198 channels, and 4766 messages) for you. Estimated reading time saved (at 200wpm): 563 minutes. You can now tag @smol_ai for AINews discussions!
Nothing big but lots of little notables:
- Lilian Weng released a Reward Hacking survey
- Pydantic launched their agent framework
- Supabase launched v2 of their assistant
- ChatGPT cannot say David Mayer
and teases (no product release):
- Browser Company teased their second browser
- World Labs launched image-to-3d-world
- The NotebookLM team left Google
- Cognition was on the cover of Forbes
Table of Contents
- AI Twitter Recap
- AI Reddit Recap
- AI Discord Recap
- PART 1: High level Discord summaries
- Cursor IDE Discord
- OpenAI Discord
- aider (Paul Gauthier) Discord
- Unsloth AI (Daniel Han) Discord
- Perplexity AI Discord
- LM Studio Discord
- Eleuther Discord
- OpenRouter (Alex Atallah) Discord
- GPU MODE Discord
- Nous Research AI Discord
- Modular (Mojo 🔥) Discord
- Notebook LM Discord
- Latent Space Discord
- Stability.ai (Stable Diffusion) Discord
- tinygrad (George Hotz) Discord
- Cohere Discord
- DSPy Discord
- Torchtune Discord
- LLM Agents (Berkeley MOOC) Discord
- OpenInterpreter Discord
- Interconnects (Nathan Lambert) Discord
- LlamaIndex Discord
- PART 2: Detailed by-Channel summaries and links
- Cursor IDE ▷ #general (237 messages🔥🔥):
- OpenAI ▷ #ai-discussions (91 messages🔥🔥):
- OpenAI ▷ #gpt-4-discussions (7 messages):
- OpenAI ▷ #prompt-engineering (32 messages🔥):
- OpenAI ▷ #api-discussions (32 messages🔥):
- aider (Paul Gauthier) ▷ #general (83 messages🔥🔥):
- aider (Paul Gauthier) ▷ #questions-and-tips (46 messages🔥):
- Unsloth AI (Daniel Han) ▷ #general (53 messages🔥):
- Unsloth AI (Daniel Han) ▷ #off-topic (4 messages):
- Unsloth AI (Daniel Han) ▷ #help (48 messages🔥):
- Unsloth AI (Daniel Han) ▷ #research (4 messages):
- Perplexity AI ▷ #announcements (1 messages):
- Perplexity AI ▷ #general (74 messages🔥🔥):
- Perplexity AI ▷ #sharing (9 messages🔥):
- Perplexity AI ▷ #pplx-api (5 messages):
- LM Studio ▷ #announcements (1 messages):
- LM Studio ▷ #general (56 messages🔥🔥):
- LM Studio ▷ #hardware-discussion (17 messages🔥):
- Eleuther ▷ #general (29 messages🔥):
- Eleuther ▷ #research (23 messages🔥):
- Eleuther ▷ #scaling-laws (5 messages):
- Eleuther ▷ #lm-thunderdome (17 messages🔥):
- OpenRouter (Alex Atallah) ▷ #announcements (1 messages):
- OpenRouter (Alex Atallah) ▷ #general (57 messages🔥🔥):
- OpenRouter (Alex Atallah) ▷ #beta-feedback (5 messages):
- GPU MODE ▷ #general (2 messages):
- GPU MODE ▷ #triton (9 messages🔥):
- GPU MODE ▷ #cuda (1 messages):
- GPU MODE ▷ #torch (26 messages🔥):
- GPU MODE ▷ #algorithms (1 messages):
- GPU MODE ▷ #cool-links (1 messages):
- GPU MODE ▷ #off-topic (2 messages):
- GPU MODE ▷ #bitnet (2 messages):
- GPU MODE ▷ #self-promotion (1 messages):
- GPU MODE ▷ #thunderkittens (14 messages🔥):
- Nous Research AI ▷ #general (42 messages🔥):
- Nous Research AI ▷ #research-papers (5 messages):
- Nous Research AI ▷ #interesting-links (3 messages):
- Modular (Mojo 🔥) ▷ #mojo (35 messages🔥):
- Notebook LM Discord ▷ #use-cases (6 messages):
- Notebook LM Discord ▷ #general (17 messages🔥):
- Latent Space ▷ #ai-general-chat (18 messages🔥):
- Stability.ai (Stable Diffusion) ▷ #general-chat (18 messages🔥):
- tinygrad (George Hotz) ▷ #general (14 messages🔥):
- tinygrad (George Hotz) ▷ #learn-tinygrad (3 messages):
- Cohere ▷ #discussions (12 messages🔥):
- DSPy ▷ #general (8 messages🔥):
- Torchtune ▷ #dev (5 messages):
- LLM Agents (Berkeley MOOC) ▷ #mooc-questions (3 messages):
- OpenInterpreter ▷ #ai-content (2 messages):
- Interconnects (Nathan Lambert) ▷ #memes (2 messages):
- LlamaIndex ▷ #general (1 messages):
AI Twitter Recap
all recaps done by Claude 3.5 Sonnet, best of 4 runs.
Theme 1. Language and Video Models: Innovations and Optimization
- Nvidia Puzzle: Distillation-Based NAS for LLMs: @_akhaliq shared Nvidia's presentation on Puzzle, a distillation-based neural architecture search for inference-optimized Large Language Models. This approach aims to improve efficiency and performance in model deployment.
- Community discussion of its effectiveness and applications shows excitement around this optimization technique.
- IC-Light V2 Model Release: @_akhaliq discussed alternative models of IC-Light V2 designed for varied illumination scenarios, along with a demo showcasing its potential applications.
- Trajectory Attention and Timestep Embedding for Video Models: @_akhaliq introduced Trajectory Attention for fine-grained video motion control, alongside Timestep Embedding as a caching mechanism for video diffusion models. These techniques offer advancements in video motion precision and efficiency.
Theme 2. AI Outreach and Collaborations
- Amazon and Anthropic Partnership: @DeepLearningAI reported Amazon's increased investment, bringing their total commitment to Anthropic to $8 billion—a significant boost for the startup's growth and AI capabilities.
- AI Fellowship and Safety Research: @AnthropicAI is starting a fellowship program, planning to provide funding and mentorship to engineers and researchers to transition into AI safety research. Fellows will collaborate with established researchers on projects addressing adversarial robustness, scalable oversight, and more.
- Google's Expansion in AI: @osanseviero announced joining Google to work on the Gemini API, open models, and collaboration spaces like Colab and AI Studio, indicative of Google's push for broader AI integration.
Theme 3. Domain Names and Online Identity
- Debating .com Dominance: @adcock_brett argues against the necessity of .com domains for credibility, advocating instead for investing in product and branding over securing premium domain names.
Theme 4. Advances in Reasoning and AI Agents
- Reverse Thinking in LLMs Strengthens Reasoning: @iScienceLuvr shared insights on "Reverse Thinking" in Language Models, improving performance by training LLMs to start from solutions and reason backwards, demonstrating a 13.53% improvement over standard methods.
- New Agent Frameworks with Pydantic: @omarsar0 announced the launch of a PydanticAI agent framework, emphasizing a type-safe, model-agnostic approach for building production-grade applications with structured response validation and support for streamed responses.
Theme 5. Machine Learning Humor and Light-Hearted Engagements
- Creative Strategies in AI: @goodside humorously strategizes about assignments that complicate the use of ChatGPT, notably mentioning the name "David Mayer" as a keyword that trips up the model.
- Memes like "giving homework as images" playfully explore ways to engage students.
- Refreshing Perspectives on AI Practices: @swyx encourages creative and expressive prose in AI-driven content, advocating against a monotonous style and emphasizing variability and human elements in written communication.
- Exploring AI's Impact on Culture and Engagement: @karpathy often shares insights into how AI influences and transforms cultural engagements, adding joy and humor to discussions around AI and its societal impact.
AI Reddit Recap
/r/LocalLlama Recap
Theme 1. Chinese Models Dominate: QwQ-32B & DeepSeek Outperform GPT-4
- QwQ vs o1, etc - illustration (Score: 117, Comments: 68): A visual comparison shows performance metrics between QwQ and other models across four technical benchmarks: GPQA, AIME, MATH-500, and LiveCodeBench, with a reference to an earlier comparison between Qwen 2.5 vs Llama 3.1. The benchmarks evaluate graduate-level scientific knowledge (GPQA with 34% baseline accuracy for non-experts and 65% for PhD experts), advanced mathematical problem-solving (AIME), comprehensive mathematics (MATH-500), and real-time coding abilities (LiveCodeBench).
- QwQ 32B 8bit demonstrated exceptional reasoning capabilities by correctly solving all prompts from the "GPT-4 can't reason" paper, with extensive internal dialogue taking up to 30 minutes on problems like the Wason Selection Task.
- Users discovered that Ollama's default 2k context size can be limiting for QwQ's reasoning tokens, with recommendations to use Exllamav2 or Koboldcpp for better performance and VRAM utilization. The model can be paired with Qwen2.5-coder-0.5B or 2.5-0.5-Instruct as draft models for speculative decoding.
- The model exhibits multilingual reasoning capabilities, switching between English, Chinese, Russian, and Arabic during its chain of thought process. As noted by Karpathy, this behavior suggests proper RL implementation.
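As context for the context-size fix discussed there: Ollama reads generation parameters from a Modelfile, so a larger num_ctx can be baked into a derived model tag. A minimal sketch (the base model tag and window size are illustrative, not recommendations):

```
FROM qwq
PARAMETER num_ctx 16384
```

Building it with `ollama create qwq-16k -f Modelfile` and running the new tag avoids the 2k default truncating QwQ's long reasoning chains.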
- Open-weights AI models are BAD says OpenAI CEO Sam Altman. Because DeepSeek and Qwen 2.5? did what OpenAi supposed to do! (Score: 502, Comments: 205): DeepSeek and Qwen 2.5 open-source AI models from China have demonstrated capabilities that rival OpenAI's closed models, leading to public discourse about model accessibility. In response, Sam Altman expressed concerns about open-weights models in an interview with Shannon Bream, emphasizing the strategic importance of maintaining US leadership in AI development over China.
- OpenAI's perceived stagnation and reliance on scaling/compute power is being criticized, with users noting that its $157 billion valuation seems unjustified given emerging competition. The company appears to be losing its competitive advantage or "moat" as open-source models catch up.
- Users point out the irony of Sam Altman's previous safety concerns about open-weights models, as better open-source alternatives have emerged without causing the predicted harm. Multiple comments referenced his earlier emails with Elon Musk promising openness, contrasting with his current stance.
- Technical discussion highlights that while OpenAI's Advanced Voice Mode remains unique, competing solutions are emerging through combinations of Whisper, LLM, and TTS technologies. Users debate whether OpenAI's lead is due to genuine innovation or primarily marketing and compute resources.
Theme 2. JPEG Compression for LLM Weights: Novel Research Direction
- Thoughts? JPEG compress your LLM weights (Score: 142, Comments: 64): JPEG compression techniques could be applied to Large Language Model weight storage, though no specific implementation details or results were provided in this post. The proposal draws parallels between image compression and neural network parameter compression, suggesting potential storage optimization methods.
- Community skepticism focused on the impracticality of matrix reordering, with experts explaining that reordering both rows and columns would break matrix multiplication properties. Multiple users pointed out that neural network weights behave more like random noise than structured image data.
- Technical discussions revealed that attempts to implement similar compression techniques yielded minimal results, with one user reporting only a "few percentage points reduction" in weight spread using simulated annealing. A user shared experience converting tensors to 16-bit grayscale PNG files, which worked losslessly but failed with JPEG compression.
- Several experts recommended sticking with established quantization methods like AWQ or GPTQ instead, noting that LLM weights lack the spatial patterns that make JPEG compression effective. Discussion highlighted that weights don't follow regular statistical distributions that could be exploited by traditional compression algorithms.
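Both observations from the thread are easy to reproduce without images: reinterpreting fp16 weights as 16-bit integers (the basis of the 16-bit grayscale PNG trick) is a lossless bijection, while a generic lossless codec barely shrinks them because the values behave like noise. A minimal sketch, with random data standing in for real model weights:

```python
import zlib

import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float16)  # stand-in for a weight matrix

# Lossless route: reinterpret the fp16 bit patterns as uint16 "pixels".
bits = w.view(np.uint16)
packed = zlib.compress(bits.tobytes())
restored = (
    np.frombuffer(zlib.decompress(packed), dtype=np.uint16)
    .reshape(w.shape)
    .view(np.float16)
)
assert np.array_equal(w, restored)  # bit-exact round trip

# But weights look like noise to a generic codec: the ratio stays close to 1.
print(round(len(packed) / bits.nbytes, 2))
```

Lossy JPEG, by contrast, perturbs the bit patterns directly, which is why the thread's JPEG attempt failed where the PNG route succeeded.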
Theme 3. Qwen 2.5 Powers Hugging Face's Text-to-SQL Feature
- Hugging Face added Text to SQL on all 250K+ Public Datasets - powered by Qwen 2.5 Coder 32B 🔥 (Score: 98, Comments: 11): Hugging Face integrated Text-to-SQL capabilities across their 250,000+ public datasets, implementing Qwen 2.5 Coder 32B as the underlying model. The feature enables direct natural language queries to be converted into SQL statements for database interactions.
- Hugging Face team member confirms the feature uses DuckDB WASM for in-browser SQL query execution alongside Qwen 2.5 32B Coder for query generation, and welcomes user feedback for improvements.
- Users express enthusiasm about the tool's potential to help those less experienced with SQL, with one noting it addresses a significant pain point in dataset interaction.
- The announcement generated playful responses about the included confetti animation and the potential to rely less on direct SQL knowledge.
Theme 4. Fox News Targets Open Source AI as National Security Threat
- Open-Source AI = National Security: The Cry for Regulation Intensifies (Score: 101, Comments: 70): Fox News aired a segment claiming open-source AI models pose risks to US national security, though no specific details or evidence were provided in the coverage. The narrative adds to growing media discussions about potential regulation of open-source AI development, though without substantive technical analysis.
- Chinese AI models like Deepseek R1 and Qwen are reportedly ahead of US open-source models like Meta's Llama. Multiple users point out that China's top models are not based on Llama, contradicting the narrative about open-source helping Chinese development.
- Users criticize the push for regulation as an attempt to enforce AI monopolies and corporate control. The community suggests that restricting US open-source development would effectively hand the entire open model sector to China, who is already releasing top-tier open models.
- The discussion emphasizes that open-source technology has historically proven more secure than closed-source alternatives over the past 40 years. Users argue that preventing open development would harm innovation and collaboration while benefiting large tech companies like Microsoft, OpenAI, and Anthropic.
Other AI Subreddit Recap
r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity
Theme 1. StreamDiffusion Powers Live AI Visuals in Concert Performances
- Bring Me The Horizon using real time img2img? (Score: 337, Comments: 62): Bring Me The Horizon concert featured real-time img2img AI visual effects during their live performance. The post inquires about the technical workflow enabling real-time AI image generation and transformation during a live concert setting.
- StreamDiffusion appears to be the leading solution for real-time AI visual effects, achieving up to 90 FPS on an RTX 4090. A demonstration package created by user tebjan for vvvv showcases implementations with examples available on Instagram and Google Photos.
- The visual consistency is maintained through a clever technique where the video feed is larger than the displayed crop, allowing objects to remain in the generation frame even when they appear to leave the visible screen. Multiple users reported seeing similar effects at Download Festival with Avenged Sevenfold.
- Community reception is mixed, with significant criticism of the temporal consistency issues and overall aesthetic quality. A technical malfunction at Download Festival highlighted limitations when A7X's show lost power but the AI effects continued running without context.
Theme 2. Haiku vs ChatGPT: Free Tier Comparison Shows ChatGPT Lead
- Haiku is terrible. (Score: 233, Comments: 114): A user expresses disappointment with Claude Haiku, finding it significantly inferior to ChatGPT's free tier despite attempts to continue using it, ultimately returning to ChatGPT after previously using Claude/Sonnet. The user, residing in a third world country, cites prohibitive subscription costs as the main barrier to accessing premium AI models like Sonnet, hoping for future accessibility of these models.
- Regional pricing is a significant issue for Claude accessibility, with users noting that in countries like Venezuela, the subscription cost equals 2 months of minimum wage income. Some users suggest workarounds like creating multiple Google accounts for Poe or using Google AI Studio which offers 1 million tokens per minute free tier.
- Users report that Haiku performs poorly compared to both ChatGPT's free tier and local models like Llama or Qwen. ChatGPT is currently considered the best value in both free and paid tiers, though some suggest DeepSeek (with 50 daily uses) as an alternative.
- Sonnet's recent limitations (50 messages per week) have frustrated users, with many reporting needing to significantly reduce project file sizes and refine prompts. Some users attribute this to Anthropic's pivot toward a B2B focus, especially after Amazon's expanded investment.
Theme 3. World Labs' $230M AI Startup Launches 3D Scene Generation
- First demo from World Labs - $230m Startup Led by Fei Fei Li. Step inside images and interact with them! (Score: 209, Comments: 43): World Labs, led by Fei Fei Li, introduced a system for converting images into interactive 3D scenes. The startup, which raised $230 million in funding, enables users to step inside and interact with generated 3D environments from 2D images.
- Technical analysis reveals the system likely uses Gaussian splats for rendering, evidenced by translucent ovals in vegetation and references in their threeviewer_worker.js file. The technology appears to be 2.5D with limited movement to avoid artifacts.
- The project can be accessed via WorldLabs.ai, with a realtime renderer for modern devices and a fallback version with pre-rendered videos for older mobile devices. Scene generation likely takes 5+ minutes, with realtime rendering afterward.
- Discussion around the $230 million funding sparked debate about investment value, with some defending it as frontier tech development while others questioned the cost for what they view as advanced HDRI generation. Several users noted potential VR applications and metaverse implications.
Theme 4. AI Surpassing Human Benchmarks Sparks Testing Debate
- AI has rapidly surpassed humans at most benchmarks and new tests are needed to find remaining human advantages (Score: 281, Comments: 146): AI systems have outperformed human baselines across most standard evaluation benchmarks, making it difficult to accurately measure remaining areas of human cognitive advantage. The rapid pace of AI benchmark saturation suggests a need for developing new types of tests that can better identify and quantify uniquely human capabilities.
- LLMs show limitations in complex code synthesis tasks and the ARC Challenge, with users noting that AI performance on benchmarks like SAT questions may be influenced by training on existing test data rather than true comprehension.
- Users highlight real-world performance gaps, sharing examples where prompt engineering took significantly longer than manual work, with one user describing a case where their boss spent 2 days attempting what they completed in 30 minutes.
- Discussion emphasizes societal implications, with concerns about job displacement in the next 2-3 years and the need for workers to develop "Plan B" career strategies, while others point out that tools like Wolfram Alpha haven't replaced specialized professions despite superior mathematical capabilities.
AI Discord Recap
A summary of Summaries of Summaries by O1-preview
Theme 1. Pushing the Limits: New AI Training and Optimization Breakthroughs
- Nous DisTrO Takes Decentralized Training by Storm: Nous Research kicked off decentralized pre-training of a 15B language model using DisTrO, leveraging hardware from partners like Oracle and Lambda Labs. They matched centralized training metrics, with their DeMo optimizer reducing inter-accelerator communication.
- Homemade CUDA Kernel Beats cuBLAS on H100: A custom H100 CUDA matmul kernel outperformed cuBLAS by 7% for N=4096, showcasing that sometimes rolling your own code pays off.
- FP8 Training Gets Easier: No More Dynamic Scaling!: A new method enables out-of-the-box FP8 training without dynamic scaling, using the unit-scaling library. Low-precision training just got simpler.
Theme 2. AI Tools Get Smarter: Updates You Can't Miss
- Aider v0.66.0 Writes Most of Its Own Code!: The latest Aider release adds PDF support for Sonnet and Gemini models and introduces AI-triggered code edits via AI! comments. Impressively, 82% of the code was written by Aider itself.
- Cursor IDE Update Ruffles Feathers, But Agent Feature Shines: Cursor removed the long context option, frustrating users. However, the new agent feature is being praised as a "senior developer" assistant, making coding smoother, especially on smaller projects.
- OpenRouter Lets Users Steer Development with Feature Voting: OpenRouter launched a Feature Requests Voting system, inviting users to vote on new features and drive community-driven development.
Theme 3. Stumbling Blocks in AI Model Integration and Training
- Fine-Tuning Qwen 2.5? Don't Forget the Special Sauce!: Users emphasized the need to use Qwen's specific ChatML template for fine-tuning Qwen 2.5, cautioning against default options to avoid hiccups.
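For reference, ChatML wraps every turn in explicit role markers, and Qwen 2.5 expects this shape (the messages below are placeholders):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
```

Fine-tuning with a different default template means the special tokens the model was trained on never appear, which is the kind of hiccup users were warning about.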
- Stable Diffusion vs. Lora Models: The Integration Headache: Despite following all the steps, users struggled to get Lora models working in Stable Diffusion, pointing to possible bugs or overlooked steps in the integration process.
- CUDA Errors Cramping Your Style? Try Quantization Magic: Users facing CUDA errors and VRAM limitations when loading large models suggested switching to smaller quantization formats or alternative cloud providers with better GPU support.
Theme 4. AI Model Performance: Comparing Apples and Oranges
- Claude Chats; ChatGPT Lectures: Pick Your Poison: Users compared Claude and ChatGPT, noting that Claude offers relatable conversation, while ChatGPT delivers in-depth philosophical insights, making it better suited for structured discussions.
- Google's Gemini Models Playing Hard to Get: OpenRouter users grumbled about rate limiting with Google's experimental models like Gemini Pro 1.5, suspecting that Google's tight restrictions are causing connectivity woes.
- GPT-4 Can't See Your Images, And Users Aren't Happy: Frustrations mounted as GPT-4 repeatedly failed to process images, returning errors like "I currently can't view images directly", hindering tasks like generating accurate image captions.
Theme 5. Fine-Tuning the Future: Efficient AI is In
- Equivariant Networks Prove Their Worth in Data Efficiency: Research showed that equivariant networks improve data efficiency in rigid-body interactions, outperforming non-equivariant models, especially when data is limited.
- ThunderKittens Could Use Some Auto Optimization Love: An auto optimizer was proposed for ThunderKittens to maximize its write-once-run-many-times potential, inspired by similar DSL experiences.
- Mixed Precision Inference: Precision Checking Gets Tricky: Developers delving into mixed precision inference with vLLM discussed challenges in verifying kernel execution precision, noting limitations in current profiling tools.
PART 1: High level Discord summaries
Cursor IDE Discord
- Cursor IDE Update Issues: Users have reported issues with the latest Cursor changelog, specifically the Composer not applying changes and the missing 'Apply' button, causing functionality frustrations.
- Additionally, several users noted the removal or inconsistent performance of long context usage in chat since the recent update.
- Composer vs Chat Mode Comparison: In Cursor IDE, users are contrasting Composer mode, which directly modifies files, with Chat mode that offers inline changes, discussing their limitations and functionality differences.
- There's a demand for improved integration between the two modes, such as efficiently transferring discussions from Chat to Composer.
- Windsurf vs Cursor IDE: Users are exploring Windsurf as a potential competitor to Cursor IDE, noting its effective handling of terminal output and codebase search.
- While Windsurf shows promise, Cursor maintains strengths in specific workflows; experiences with the two vary among users.
- API Key Limitations in Cursor IDE: Discussions highlight limitations in Cursor's API usage, with some users opting for their own API keys to gain more flexibility.
- The community is seeking improved management of API call limits and enhanced context gathering capabilities for active projects.
- Context Management in Cursor: Users have expressed dissatisfaction with the current context handling in Cursor IDE, particularly concerning limitations with Claude.
- The community is advocating for better context management features and consistency to improve their coding workflows.
OpenAI Discord
- Anthropic's MCP Framework Unleashes Claude as API: Anthropic released the new MCP framework, enabling Claude to run servers and effectively transforming the Claude app into an API.
- This development allows Claude to create, read, and edit files locally, sparking excitement among users about real-time interaction with tools like VSCode.
- Gemini's Response Constraints Compared to ChatGPT: Gemini often refuses innocent questions for perceived moral reasons, whereas ChatGPT is seen as more lenient in its responses.
- Users humorously highlighted instances where Gemini declined to discuss artificial intelligence, avoiding engagement in sensitive topics.
- Claude 3.5 Sonnet Emerges as Image Captioning Alternative: Due to persistent issues with OpenAI's vision capabilities, users recommend switching to Claude 3.5 Sonnet for image captioning tasks.
- Community members noted that Claude 3.5 Sonnet offers more reliable functionality, helping users avoid project delays.
- Speech-to-Text Feature Integration for ChatGPT on Windows: A user inquired about implementing a speech-to-text feature for ChatGPT on Windows, with suggestions to use the built-in Windows accessibility feature by pressing Windows + H.
- This approach provides a real-time solution for converting speech to text while interacting with ChatGPT.
- Structured Output Errors Linked to 'Strict' Misplacement: Users reported encountering random 'object' wrappers when using structured outputs, which was traced back to incorrect placement of the 'strict' setting.
- After extensive debugging, it was confirmed that misplacing 'strict' led to the persistent structured output errors.
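For anyone hitting the same wall: in the documented response_format shape, "strict" sits beside "schema" inside "json_schema", not inside the schema body itself. A minimal sketch (the schema contents are illustrative):

```python
# Correct placement of "strict" in an OpenAI structured-outputs request:
# it is a sibling of "schema" inside "json_schema".
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "weather_report",
        "strict": True,  # correct placement: sibling of "schema"
        "schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
            "additionalProperties": False,
        },
    },
}

# Putting "strict" inside the schema body instead does not enable strict
# mode, which produces the kind of unexpected wrappers described above.
assert "strict" not in response_format["json_schema"]["schema"]
```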
aider (Paul Gauthier) Discord
- QwQ Model Configurations Negotiated: Users debated deploying the QwQ model in architect mode alongside a standard model for code commands, seeking clarity on interchangeability.
- Aider facilitates model definitions across projects, boosting flexibility Advanced model settings.
- DeepSeek-R1 Sets New Benchmarks: DeepSeek-R1 achieved exemplary results on the AIME & MATH benchmarks, underlining its open-source availability and real-time reasoning.
- Community members hope for DeepSeek to release model weights for integration in ensemble frameworks with QwQ.
- Optimizing Aider's Local Model Settings: Members collaborated on configuring .aider.model.metadata.json and .aider.model.settings.yml files to define local models within Aider.
- Choosing the edit format ('whole' or 'diff') significantly affects response structuring and editing efficiency.
- OpenRouter Challenges Impact Aider: Participants identified issues with OpenRouter affecting model detection and functionality when using local servers.
- Concerns were raised about spoofed implementations potentially altering model outputs and behaviors.
- Ensemble Frameworks with QwQ and DeepSeek: A user expressed intent to integrate QwQ and DeepSeek models within ensemble frameworks to enhance reasoning capabilities.
- This approach aims to leverage the strengths of both models for improved performance.
Unsloth AI (Daniel Han) Discord
- Fine-Tuning Considerations in Unsloth: Users debated the merits of instruct versus non-instruct fine-tuning, recommending base models for datasets with over 1k records and suggesting experimenting with instruct models for datasets around 70k records.
- Guidance was provided to refer to Unsloth Documentation for dataset formatting rules, emphasizing compliance for effective fine-tuning.
- Data Privacy Measures in Unsloth: Unsloth was confirmed to maintain data privacy by not transferring data externally during fine-tuning, relying on the user's chosen platform like Google Colab.
- This assurance addressed concerns regarding compliance with strict data privacy policies among users handling sensitive information.
- RAG Compute Cost Challenges: Discussions highlighted that retrieval-augmented generation (RAG) can lead to high compute costs due to extensive context length requirements, as outlined in Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs.
- Users are navigating the balance between performance and efficiency, especially for knowledge-intensive tasks, as supported by findings where RAG surpasses fine-tuning.
- LLama 3.1 OOM Error Solutions: Experiencing out of memory (OOM) errors during continual pretraining of LLama 3.1 8B model led to suggestions for using a bigger GPU, reducing the dataset size, or decreasing the batch size.
- These strategies aim to mitigate memory issues and ensure smoother training processes for large-scale models.
- Latent Paraphraser Architecture Enhancements: A latent paraphraser was explained as a modification to the transformer architecture, adding a layer to redistribute probabilities over tokens.
- This enhancement improves input grounding and reduces noise by minimizing unseen tokens during processing.
Perplexity AI Discord
- Perplexity Pro Holiday Discount: The Perplexity Team announced a 75% off promotion for the first month of Perplexity Pro until Monday, December 2 at 11:59pm PT, enabling new users to access advanced features including enhanced search and file uploads.
- This offer also includes one-click shopping and free shipping through Buy with Pro, aimed at streamlining the shopping experience for users during the holiday season.
- Integration of Perplexity with Claude: Users inquired about integrating Perplexity within Claude using the new MCP feature, similar to its functionality with Brave and GitHub, to enhance performance by utilizing Claude's Project Knowledge.
- Additionally, there were questions regarding the possibility of integrating Google within Claude, highlighting user interest in leveraging search functionalities.
- Perplexity Image Generation Features: The platform's image generation capabilities were discussed, with confirmation that it is available via computer online without additional charges.
- Users explored the extent of these features, considering their accessibility and potential applications in various projects.
- RBAC vs ABAC Access Control Models: A member sought clarification on the difference between RBAC (Role-Based Access Control) and ABAC (Attribute-Based Access Control) systems.
- This discussion underscores the need for understanding access control models in technological implementations.
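A toy contrast makes the distinction concrete: RBAC derives permissions from a user's role alone, while ABAC evaluates attributes of the user, the resource, and the request. Every role, attribute, and rule below is invented for illustration:

```python
# RBAC: permission follows from the user's role alone.
ROLE_PERMS = {"editor": {"doc:write", "doc:read"}, "viewer": {"doc:read"}}

def rbac_allows(role: str, perm: str) -> bool:
    return perm in ROLE_PERMS.get(role, set())

# ABAC: permission follows from attributes of user, resource, and action,
# evaluated by a policy predicate.
def abac_allows(user: dict, resource: dict, action: str) -> bool:
    return (
        action == "doc:write"
        and user["department"] == resource["owner_department"]
        and user["clearance"] >= resource["sensitivity"]
    )

print(rbac_allows("viewer", "doc:write"))  # False: the role lacks the permission
print(abac_allows(
    {"department": "eng", "clearance": 2},
    {"owner_department": "eng", "sensitivity": 1},
    "doc:write",
))  # True: the attributes satisfy the policy
```

In practice ABAC subsumes RBAC (a role is just one attribute), at the cost of policies that are harder to audit.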
- Custom Instructions in Claude Spaces: Issues were raised about the effectiveness of custom instructions for Claude spaces, which appear to conflict with existing 'introduce yourself' prompts.
- Users are seeking guidance on how these instructions should interact and whether they can be effectively combined.
LM Studio Discord
- HF Search Issue Resolved: The HF search not working issue has been resolved, much to the relief of users.
- An image was attached to commemorate the fix, indicating a positive update for the community.
- LM Studio AIDE Integration Succeeds: Users successfully integrated the LM Studio endpoint to the AIDE sidecar, enabling a fully local code editor experience.
- This integration enhances functionality for those seeking a local development environment.
- Llama 3.1 Models Accessibility: A user inquired about accessing the base model of Llama 3.1 8B in LM Studio, noting that only instruction-tuned variants seem available.
- Community members pointed to the huggingface repository as a potential source for the base model.
- A770 Underperforms Compared to 7800 XT: A member shared that their Intel Arc A770 achieved only 11 t/s for Qwen2.5-14b q4_0, significantly lower than the 40 t/s achieved by a Radeon RX 7800 XT.
- They noted q4_k_m is unusable and found the SYCL backend only negligibly faster.
- Seasonic PSU Longevity Praised: A member mentioned their Seasonic PSU outlived other PC components despite having to replace PSUs every couple of years due to dust.
- They described their experience as amazingly satisfactory with the PSU's performance.
Eleuther Discord
- De-escalation of Resource Contention: Members highlighted concerns about the de-escalation of resource contention and its impact on unregulated internet growth, questioning the effectiveness of AI-powered privacy solutions. They emphasized the importance of identifying warning signs of rogue AI attacks to protect vulnerable devices.
- The discussion stressed the need for community leadership in AI protection to mitigate the risks associated with resource contention and unauthorized AI activities.
- Poincare Ball Embedding Explained: Embedding data into a Poincare ball ensures that points with higher degrees reside closer to the origin, preserving adjacency while transitioning to regions with less curvature. This method facilitates the representation of complex hierarchical structures.
- A member pointed out the conceptual challenge of the Poincare ball's edge, noting that it represents a point at infinity where points cannot physically reside, which sparked further technical discussion.
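The geometry described above can be made concrete with the Poincare-ball distance, d(u, v) = arccosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))), which blows up as points approach the boundary, matching the "point at infinity" intuition. A minimal pure-Python sketch:

```python
import math

def poincare_distance(u, v):
    """Hyperbolic distance between two points inside the unit Poincare ball."""
    sq = lambda x: sum(c * c for c in x)
    diff = sq([a - b for a, b in zip(u, v)])
    denom = (1 - sq(u)) * (1 - sq(v))
    return math.acosh(1 + 2 * diff / denom)

# Near the origin distances are almost Euclidean ...
print(poincare_distance([0.0, 0.0], [0.1, 0.0]))   # ~0.2007
# ... while points near the boundary are exponentially far away.
print(poincare_distance([0.0, 0.0], [0.99, 0.0]))  # ~5.29
```

This is why high-degree hub nodes fit naturally near the origin: they stay relatively close to everything, while leaves pushed toward the boundary gain exponentially more room.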
- Equivariant Networks Gain Efficiency: A recent paper found that equivariant networks enhance data efficiency compared to non-equivariant networks across various model sizes and compute budgets. The study demonstrated that equivariant models consistently outperform their non-equivariant counterparts.
- Empirical results indicated that while non-equivariant models can match the performance of equivariant ones with sufficient training, equivariant networks offer superior efficiency without requiring extensive compute resources.
- Understanding HF Tokenizers in Eval Harness: There's confusion about whether the eval harness tokenizes sequences with `add_special_tokens=True` or `False`, particularly regarding the handling of EOS tokens during generation tasks. Members clarified that typically, only BOS tokens are added when building custom tokenizers.
- Discussions revealed that manually managing the EOS token in the training loop is a practical approach to avoid compatibility issues across different frameworks utilizing HF models.
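The manual-EOS approach can be illustrated with a minimal sketch (the token IDs and the `encode` helper are invented for illustration; real HF tokenizers differ per model):

```python
BOS, EOS = 1, 2  # hypothetical special-token ids

def encode(token_ids, add_bos=True, add_eos=False):
    """Mimic common tokenizer behavior: BOS is added by default, EOS is left to the caller."""
    out = ([BOS] if add_bos else []) + list(token_ids)
    return out + ([EOS] if add_eos else [])

# The training loop appends EOS explicitly, so behavior stays consistent
# across frameworks regardless of each tokenizer's add_special_tokens default.
sample = [17, 42, 99]
inputs = encode(sample)                  # [1, 17, 42, 99]    - no EOS at eval time
targets = encode(sample, add_eos=True)   # [1, 17, 42, 99, 2] - EOS appended for training
print(inputs, targets)
```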
- TaskSet Empowers Optimizer Training: The TaskSet dataset, containing over a thousand diverse tasks, is instrumental for training and evaluating optimizers in meta-learning contexts. This dataset enables significant efficiency improvements over traditional random search methods.
- Although recognizing that TaskSet is somewhat outdated, members acknowledged it as the best available option for building large datasets of learning curves despite financial constraints in AutoML research.
OpenRouter (Alex Atallah) Discord
- Feature Requests Voting: Members are urged to vote for their top feature requests here to prioritize upcoming developments.
- For any unlisted requests, users can submit them in <#1107397803266818229>, enabling a wider array of community-driven feature inputs.
- Pixtral Large Performance: Pixtral Large is praised for its excellent performance and a massive free tier, facilitating easy access via console.mistral.ai.
- A user reported switching from Hermes 405b to Pixtral, noting its effectiveness with unchanged prompts.
- Model Identification Confusion: Discussions highlighted that models do not inherently recognize their identities and often hallucinate details from training data.
- This led to lingering confusion among users about model identifications despite clarifications.
- Generation Cost Estimation: A user inquired about rates for the /api/v1/generation endpoint and methods to accurately estimate generation costs.
- Suggestions included utilizing Helicone for tracking, emphasizing that the generation endpoint is essential for precise cost assessment.
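As a back-of-the-envelope alternative to querying the generation endpoint, cost can be estimated from token counts and per-token prices (the prices below are placeholders, not real OpenRouter rates):

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  prompt_price_per_m, completion_price_per_m):
    """Estimate a request's USD cost from token counts and per-million-token prices."""
    return (prompt_tokens * prompt_price_per_m
            + completion_tokens * completion_price_per_m) / 1_000_000

# Placeholder prices: $3/M prompt tokens, $15/M completion tokens.
print(f"${estimate_cost(1200, 400, 3.0, 15.0):.4f}")  # $0.0096
```

This is only an estimate; the generation endpoint remains the source of truth, since providers can count tokens differently than a local tokenizer.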
- Custom Provider Keys Access: Developers are pushing for access to custom provider keys, reflecting a strong community demand for this feature. One member noted, 'Thank you for all the great work!' while requesting access.
- Several users, including monomethylhydrazine and kit18, expressed the need to use their own keys for specific providers, highlighting a community consensus on this functionality.
GPU MODE Discord
- Triton Metaprogramming and Source Build: A metaprogramming proposal for Triton aiming to address existing limitations has generated community interest, though some members requested clearer semantics and example inclusions.
- Additionally, building Triton from source on WSL2 required increasing memory to 26GB to prevent out-of-memory errors, and members discussed offline compilation dependencies in Ubuntu Docker containers.
- ThunderKittens and ThunderMittens Unification: Discussions around ThunderKittens and ThunderMittens highlighted the role of tile abstraction in unifying the frameworks for tensor core compatibility, with emphasis on register usage control.
- Members also inquired about existing API contracts between the two, and expressed interest in an auto optimizer for ThunderKittens to enhance its write-once, run-many-times system.
- BitNet b1.58 with RedPajama and Dolma Datasets: The release of BitNet b1.58 models, trained on the RedPajama dataset with 100B tokens, demonstrated promising PPL and zero-shot accuracy results.
- Furthermore, the OLMo-Bitnet-1B model, trained on 60B tokens from the Dolma dataset, underscores the research-centric approach with detailed training hyperparameters available in their documentation.
- Diffusion Models Technical Overview: Recent discussions on diffusion models emphasized their dominance in generating perceptual signals, citing improved mode coverage and faster sampling as key advantages.
- Implementation of classifier-free diffusion guidance was highlighted for enhancing conditional diffusion model outputs in systems like OpenAI’s DALL·E 2 and Google’s Imagen, with noise schedule design elements being pivotal for performance.
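Classifier-free guidance combines an unconditional and a conditional noise prediction as eps = eps_uncond + w * (eps_cond - eps_uncond); w = 1 recovers the plain conditional model, while w > 1 extrapolates toward the condition. A minimal sketch, with plain lists standing in for tensors:

```python
def cfg_combine(eps_uncond, eps_cond, w):
    """Classifier-free guidance: extrapolate from the unconditional prediction
    toward the conditional one by guidance weight w."""
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]

print(cfg_combine([0.0, 0.0], [1.0, -1.0], 1.0))  # [1.0, -1.0]
print(cfg_combine([0.0, 0.0], [1.0, -1.0], 7.5))  # [7.5, -7.5]
```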
- Open Japanese LLM Leaderboard Launch: The introduction of the Open Japanese LLM Leaderboard aims to evaluate Japanese LLMs across 20+ datasets and tasks in collaboration with Hugging Face.
- This initiative addresses the lag in Japanese LLM performance compared to English, garnering interest from Japanese HPC engineers focused on native language advancements.
Nous Research AI Discord
- Hermes 3 Advances with O1 Style Integration: A discussion in #general highlighted inquiries about Hermes 3, suggesting connections to the former O1 style.
- This reflects ongoing interest in Hermes' latest developments and its evolution within the community.
- Mistral Platform Faces Model Selection Hurdles: Members voiced concerns regarding the Mistral AI platform's recent change to default to a single model selection option.
- The limitation on image generation capabilities has caused confusion and impacted user experience.
- Truth Terminal Merges AI with Crypto Narratives: Insights were shared about Truth Terminal creating its own religion through a semi-autonomous AI within the crypto space.
- This unique blend underscores the intersection of AI alignment discussions and the AI and crypto communities.
- Low-bit Quantization Benefits Undertrained LLMs: Research indicates that low-bit quantization results in less degradation for larger, undertrained LLMs compared to smaller, extensively trained models, as detailed in this paper.
- The findings emphasize the importance of aligning quantization strategies with model size and training token requirements.
- Ternary Quantization Limited, FP4 Emerges as Efficient: Observations reveal that ternary quantization (BitNet) only improves results for undertrained networks, questioning its broad applicability.
- Consequently, the community is leaning towards FP4 as the preferred numeric weight representation for current model architectures.
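For intuition, the simplest member of the low-bit family discussed above, symmetric round-to-nearest quantization, can be sketched in a few lines (a toy example, not BitNet's actual ternary scheme or a real FP4 format):

```python
def quantize(weights, bits):
    """Symmetric round-to-nearest quantization onto a signed `bits`-bit integer grid."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

w = [0.30, -0.72, 0.05, 0.91]
q4, s4 = quantize(w, 4)                  # 4-bit grid: 15 signed levels
recon = dequantize(q4, s4)
err = max(abs(a - b) for a, b in zip(w, recon))
print(q4, round(err, 3))                 # [2, -6, 0, 7] 0.06
```

The rounding error per weight is bounded by half a grid step; the research finding above is that how much this error hurts depends on how saturated the weights are with training signal.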
Modular (Mojo 🔥) Discord
- Confusion Over Mojo Origins vs Rust Lifetimes: A user asked how Mojo's Origins compare to Rust's lifetimes; both aim to solve memory-safety problems, yet they are fundamentally different mechanisms.
- While inspired by Rust, Mojo's design is intentionally distinct, aiming for different compiler behaviors and goals.
- Mojo Origins Maintain Memory Control: Mojo's Origin denotes a memory chunk; when a pointer is parameterized by an origin, it indicates it points within that memory, extending variable lifetimes as necessary.
- Origins facilitate aliasing guarantees and can produce compile-time errors if a pointer remains alive while its target is not.
- Understanding Origins Requires Patience: Understanding Mojo Origins from a compiler perspective is challenging, especially as they are not finalized, leading to potentially shifting details.
- A user expressed willingness to wait for more clarity on the topic rather than asking more questions prematurely.
- Namespace Challenges with Spaces in Variable Names: A question arose about the possibility of using spaces in variable names, like `var xe đạp = 'abc'`, highlighting a lack of support across programming languages.
- Allowing spaces complicates parser implementation significantly, making it impractical.
Notebook LM Discord Discord
- Notebook LM Podcast Feature Creates Audio in 30 Minutes: A user praised Notebook LM's ability to create an audio podcast in just 30 minutes using documents about their German little league baseball program, including its historic World Series qualification. The podcast episode showcases the seamless integration of AI-generated content.
- This demonstrates how Notebook LM can efficiently generate multimedia content, enhancing project workflows for users.
- NotebookLM Enhances High-Fantasy Worldbuilding: A user shared their experience of using NotebookLM for worldbuilding a high-fantasy novel, highlighting the model's capability to provide context-aware responses.
- The AI's reasoning skills led to new insights and mechanics for their magic system based on existing rules.
- GenFM Challenges NotebookLM in AI Podcasting: A member shared a video titled 'GenFM, Now Playing on ElevenReader: Smart Podcasts Produced by Generative AI', highlighting competition in the AI space.
- Despite GenFM's entry, another member noted that NotebookLM still provides deeper interactive experiences.
- RAX's Bold Times Square Billboard Takeover: RAX, a cyberpunk raccoon, commandeered Times Square billboards to advocate for mindful consumption with the message: 'DON'T BUY EVERYTHING YOU SEE.' A YouTube video discusses the event emphasizing the need to question consumer culture.
- This digital performance sparked discussions on consumerism within the community.
- FDP Plans Coalition Breakup in Germany: The FDP is planning to break up the coalition government led by Chancellor Olaf Scholz, outlining a strategy to frame their exit as necessary for political progress.
- Internal documents provide key narratives and timelines to ensure the German public receives a clear choice in upcoming elections.
Latent Space Discord
- Perplexity's Clever Black Friday Campaign: Perplexity launched a clever Black Friday campaign that aligns with recent marketing trends leveraging AI capabilities.
- This initiative has garnered attention for its strategic integration of AI in marketing strategies.
- Humans Outperform AI in Pattern Recognition: Consensus among members indicates that while AIs compute faster, humans excel at recognizing global patterns in complex problems, often reacting with phrases like 'hang on a sec, this isn't right'.
- This ability to identify overarching inconsistencies sets humans apart from AI systems that may fixate on specific local issues.
- Generative AI Investment in Enterprises: A recent report highlights that AI spending surged to $13.8 billion in 2024, signifying a shift from experimental use to core business strategies.
- Despite the increase in investment, over a third of decision-makers are still developing effective methods for integrating generative AI into their operations.
- Freysa AI Agent Challenge Funds Released: An AI challenge led to the Freysa agent transferring $47,000 through a cleverly crafted prompt that bypassed strict transfer instructions.
- This event underscores the complexities of prompt engineering for AI manipulation within financial transactions and showcases transparent, open-source setups.
- Technology Adoption and Investment Trends: Participants compared current LLM trends to historical technological shifts, noting parallels in excitement and potential market corrections.
- The ongoing discussion raises concerns about the sustainability and future profitability of AI technologies, echoing patterns seen in industries like aviation.
Stability.ai (Stable Diffusion) Discord
- ControlNet for SD 3.5 Quality Issues: A member reported that ControlNet for SD 3.5 only produces high-quality renders at 1024x1024 resolution without artifacts.
- Another member attributed the issues to lack of familiarity and encouraged experimenting to better understand ControlNet's functionality.
- Stable Diffusion Hardware Performance: A user inquired about performance benchmarks for Stable Diffusion, mentioning an achievement of approximately 5 IT/s.
- Community members actively shared their hardware capabilities, reflecting keen interest in optimizing setups for Stable Diffusion.
- LoRA Model Request for AI Art: A user requested information about a LoRA half girl model to create characters merging two different female designs.
- This request highlights ongoing experimentation and creativity in character development within AI-generated art.
- Content Creator Thanksgiving Wishes: A member extended Happy Thanksgiving wishes to the Stability.ai team and fellow creators.
- This gesture underscores the camaraderie and collaborative spirit among content creators in the AI space.
tinygrad (George Hotz) Discord
- TinyFPGA's Potential Memory Architecture: Members discussed the design of TinyFPGA, contemplating how to mimic a typical memory hierarchy while noting that existing options like Block RAM and DDR3 are insufficient.
- Ideas were proposed for a 'first pass' memory to localize constants near ALUs, potentially enhancing performance significantly.
- Challenges in Traditional Memory Models: Discussions highlighted that heuristic eviction policies may become obsolete as the focus shifts towards more efficient memory hierarchies in future TinyFPGA designs.
- Speculations were made about the future of trained parameters, with mentions of tensors potentially replacing them.
- Exa Laboratories Sustainable Chip Designs: A conversation on Exa Laboratories emphasized their mission to create reconfigurable chips that outperform traditional GPU/TPU in speed and energy efficiency for specific AI needs.
- Skepticism was expressed regarding their viability, pointing out the challenges small companies face in chip development, especially with ambitious timelines.
- Tenstorrent's Biologically Plausible Training Algorithms: George Hotz mentioned Tenstorrent as a serious player investing in training algorithms that mimic biological processes to achieve greater efficiency.
- Potential changes include hierarchical memory models and real-time optimizations reminiscent of brain function principles in computing.
- VIZ Tool in tinygrad: A member posted a detailed tutorial explaining the VIZ tool, available here, enhancing understanding of its capabilities within tinygrad.
- George Hotz acknowledged the VIZ tool in a tweet, stating that VIZ=1 is a significant improvement over LLVM/MLIR, highlighting its advantages.
Cohere Discord
- Aya Project Contributions Guidance: A member sought guidance on contributing part-time to the Aya project for Cohere.
- Another member suggested joining the Aya server to connect with the community directly.
- Thanksgiving Celebrations and Meal Sharing: Members shared Happy Thanksgiving messages and images of their meals, including one member's impressive plate of food.
- Another member humorously commented on trying to eat healthy, noting that it wasn't as tasty as it could be.
- Food Sharing and Dungeness Crab: Members exchanged comments and images of their hearty meals, with one joking that their meal was more like dessert.
- A humorous remark followed about having eaten a plate of Dungeness crab beforehand, enhancing the food sharing atmosphere.
DSPy Discord
- dspy.asyncify support concerns: A member inquired about using `dspy.asyncify`, specifically its use of threads and the availability of pure async support due to issues with celery workers.
- Another user echoed the desire for pure async support to address the existing celery worker issues.
- dspy demo behavior with assertions: Concerns were raised about `dspy` not using demos in the final prompt when assertions are activated.
- A member clarified that demonstrations in retry mode depend on whether compilation occurred before or after activating assertions.
- Welcome Shaun to the guild: Shaun joined the server, greeted everyone, and expressed excitement about ongoing projects.
- The community welcomed Shaun, fostering an inclusive environment.
Torchtune Discord
- DPO Aligns Across Repositories with LoRA-DPO: The DPO Trainer from Hugging Face shows that while the code differs, the DPO technique remains consistent across repositories like LoRA-DPO.
- This consistency ensures that implementations maintain alignment, facilitating easier integration and comparison between different DPO approaches.
- Feasibility of Full-parameter DPO: Implementing full-parameter DPO is achievable and may enhance post-training alignment compared to LoRA-DPO.
- The community recommends leveraging adaptations from the existing full PPO implementation to guide this process.
- Introducing dpo_full_finetune_single_device PR: A new PR adds full finetuning DPO for distributed setups, serving as a solid foundation for single device implementation.
- Details can be accessed through the full DPO PR, which outlines the proposed changes and enhancements.
- Torchtune to Support Full-finetuning DPO: Upcoming updates in Torchtune will support full-finetuning DPO, necessitating modifications to load a separate reference model.
- These changes involve altering initial calls to the reference model to improve functionality and integration within the existing framework.
- Higher Memory Usage in FFT DPO: FFT DPO will consume significantly more memory than LoRA due to the necessity of storing gradients and maintaining a complete model copy.
- If LoRA DPO does not meet performance requirements, the tradeoff in memory usage for adopting full-finetuning DPO may be justified.
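The objective shared by all of these DPO variants is loss = -log sigmoid(beta * [(log pi(y_w) - log pi_ref(y_w)) - (log pi(y_l) - log pi_ref(y_l))]). A scalar sketch with made-up sequence log-probabilities (not Torchtune's actual code, which operates on batched tensors):

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    logits = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return -math.log(1 / (1 + math.exp(-logits)))  # -log(sigmoid(logits))

# Loss falls when the policy prefers the chosen response more than the reference does.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))  # positive margin -> low loss
print(dpo_loss(-12.0, -10.0, -11.0, -11.0))  # negative margin -> high loss
```

This also makes the memory tradeoff above concrete: the reference log-probs require a frozen second copy of the model in full-finetune DPO, whereas LoRA can recover them by disabling the adapters.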
LLM Agents (Berkeley MOOC) Discord
- Quiz 11 Still Not Open?: A member expressed confusion about the status of Quiz 11, questioning why it isn't available yet.
- Is there an expected date for when it will be open?
- Inquiry on OpenAI Credits: A user inquired about the status of their OpenAI credits, mentioning they filled out the form last week.
- They expressed urgency, stating they are in need of support for their project development.
- MOOC Completion and Certificate Eligibility: A member asked if starting the MOOC now would still allow them to receive the certificate after completion.
- They were also curious if it's feasible to finish all requirements within the remaining time.
OpenInterpreter Discord
- Open Interpreter Dashboard Development: A member announced they're developing an Open Interpreter inspired project focused on creating an open-source dashboard to be released this year.
- The project emphasizes being a fun little project without any profit motive.
- Community Support for Dashboard Project: Another member congratulated the project creator, expressing enthusiasm with 'Nice work! Well done 🚀'.
- This exchange highlighted the community's encouragement for innovative projects within the space.
Interconnects (Nathan Lambert) Discord
- OLMo 2 Performance Boosts Prowess: The OLMo 2 family, comprising 7B and 13B models from Allen AI (AI2), was trained on up to 5T tokens and outperforms Llama-3.1 8B and Qwen 2.5 7B.
- Key enhancements include an improved architecture with RMSNorm and QK-Norm, along with a comprehensive two-stage curriculum training approach.
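RMSNorm, one of the architecture changes mentioned, replaces LayerNorm's mean-centering with a pure root-mean-square rescale. A plain-Python sketch (the learned per-channel gain defaults to 1 here):

```python
import math

def rms_norm(x, eps=1e-6, gain=None):
    """RMSNorm: rescale by the root mean square; no mean subtraction, no bias."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    g = gain or [1.0] * len(x)
    return [gi * v / rms for gi, v in zip(g, x)]

out = rms_norm([3.0, -4.0])        # rms = sqrt((9 + 16) / 2) ~ 3.5355
print([round(v, 4) for v in out])  # [0.8485, -1.1314]
```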
- OLMo 2 Crafts Cutting-Edge Training: OLMo 2 employs the model souping technique for final checkpoints and adopts a post-training methodology inspired by Tülu 3 involving instruction tuning, preference tuning with DPO, and reinforcement learning with verifiable rewards.
- Instruct OLMo 2 Tops Open-Weight Models: The 13B Instruct variant of OLMo 2 surpasses Qwen 2.5 14B and Tülu 3 8B in instruct tasks, as validated by the OLMES suite.
- Weight Watcher AI Gains Meme-worthy Attention: Weight Watcher AI was highlighted as a novel addition to the AI landscape and humorously shared in the memes channel, drawing attention for its amusing nature.
- A link to the OLMo summary was shared alongside the discussion.
LlamaIndex Discord
- Developer Skills Showcase: A member shared an extensive list of development skills including React, Next.js, Angular, and D3.js, highlighting their experience with UI/UX and testing frameworks like Protractor and TestCafe.
- This diverse skill set underscores their adaptability across front-end and testing technologies, enhancing their capability to tackle complex engineering challenges.
- Diverse Technology Stack: The developer mentioned a wide range of technologies such as Node, Nest.js, Solidity, and Rust, including knowledge of front-end frameworks like Bootstrap and styling methodologies like BEM and SMACSS.
- This comprehensive technology stack enables efficient integration and development across various platforms and frameworks, catering to multifaceted project requirements.
- API Integration Expertise: They expressed familiarity with integrating multiple APIs including Google Maps, YouTube, and Facebook APIs, allowing them to work on diverse projects that require efficient data interaction.
- Their ability to manage and implement diverse API integrations facilitates robust and scalable solutions in system architectures.
- Cloud Deployment Skills: The member highlighted AWS among their cloud service competencies, enabling effective deployment of applications into cloud environments.
- Proficiency in AWS ensures reliable and scalable cloud deployments, optimizing resource management and infrastructure performance.
- Call for Collaboration: They concluded with an invitation to connect, promoting potential networking opportunities within the developer community.
- This outreach fosters professional collaboration and knowledge sharing among engineers with similar technical interests.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Axolotl AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The LAION Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The HuggingFace Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
Cursor IDE ▷ #general (237 messages🔥🔥):
Cursor IDE Updates, Composer vs Chat Mode, Windsurf Advantages, API Key Usage, Context Management
- Cursor IDE updates introduce issues: Users have reported issues with the latest update to Cursor, particularly with the Composer not applying changes and missing the 'Apply' button, leading to frustrations about functionality.
- Many have also noted that certain features, like the usage of long context in chat, seem to have been removed or are working inconsistently since the update.
- Comparison of Composer and Chat Mode: Composer mode changes files directly, while Chat mode provides inline changes, with users discussing the limitations and functionality differences between both modes.
- There are requests for better integration between the two, like transferring discussions from Chat to Composer efficiently.
- Windsurf seen as a competitor: Several users are experimenting with Windsurf and sharing that it has promising features, particularly regarding how it handles terminal output and codebase search.
- Comparisons suggest that while Windsurf has potential, Cursor retains advantages in certain workflows, though users are noting discrepancies in experiences between the two.
- Concerns over API key limits: Discussions have arisen around the limitations of Cursor's API usage, with some users considering using their own API keys for more flexibility.
- The conversation reflects a desire for better management of API call limits and context gathering for active projects.
- Context management frustration: Users expressed dissatisfaction with the context handling capabilities of the current models, particularly regarding the perceived limitations with Claude.
- The community is looking for improvements to context management and feature consistency to enhance their coding experiences.
- Ooft Jealous Girlfriend GIF - Ooft Jealous girlfriend Jealous - Discover & Share GIFs: Click to view the GIF
- Cursor - The IDE designed to pair-program with AI.: no description found
- How to do `Fix in Composer` and `Fix in Chat` actions from keyboard: These 2: I could not find it in settings.
OpenAI ▷ #ai-discussions (91 messages🔥🔥):
Gemini's Moral Constraints, Anthropic MCP Framework, ChatGPT's Capabilities, Speech to Text on Windows, AI Models for Coding
- Gemini often refuses innocent questions: Users noted that Gemini sometimes refuses to answer innocent questions for perceived moral reasons, contrasting this with ChatGPT, which is seen as more lenient in responses.
- One user humorously highlighted an instance where Gemini declined to discuss artificial intelligence, stating it would not engage in sensitive topics.
- Anthropic announces MCP framework: Anthropic's new MCP framework allows Claude to run servers, effectively transforming the Claude app into an API that can create, read, and edit files locally.
- Users are excited about new capabilities, including real-time interaction with tools like VSCode.
- ChatGPT and Speech to Text feature: A user inquired about a speech-to-text feature for ChatGPT on Windows, and another suggested using the built-in Windows accessibility feature by pressing Windows + H.
- This suggestion was aimed at providing a real-time solution for converting speech to text while using ChatGPT.
- AI Models for Coding Discussion: Users discussed various models for coding tasks, suggesting a ranking that included Claude 3.5 Sonnet and others, leading to debates about biases in model effectiveness.
- Comments on the list included confusion over repeated mentions and the exclusion of GPT-4o and other models perceived as strong contenders.
- ChatGPT's Character Control: A user expressed how to manage character control in dialogues with ChatGPT, emphasizing the importance of guiding the narrative and correcting unwanted responses.
- Users shared strategies for ensuring the model stays true to character intentions, highlighting a collaborative storytelling approach.
Link mentioned: Tweet from Pietro Schirano (@skirano): Today @Anthropic is releasing MCP, a framework that allows Claude to run servers, giving it superpowers and effectively turning the Claude app into an API.We created some server that I think you'l...
OpenAI ▷ #gpt-4-discussions (7 messages):
App vs. Browser Performance, Issues with Customized GPTs, Loading Errors for Files and Photos
- App works better than the browser: A member pointed out that it works on the app, so use the app instead of the browser to avoid issues.
- However, another user reported that they had problems even when using the app.
- Recurring loading errors for customized GPTs: Members shared frustrations about not being able to load customized GPTs, stating that an error occurred loading this GPT.
- This implies a potential widespread issue affecting those utilizing customized models.
- Issues with loading files and photos: A user described experiencing problems with loading files and photos since yesterday, highlighting ongoing technical difficulties.
- This aligns with reports of loading errors, suggesting a broader problem impacting various features.
OpenAI ▷ #prompt-engineering (32 messages🔥):
Image Captioning Issues, Structured Outputs Problems, Model Recommendations, User Experience with OpenAI Support
- Persistent Image Captioning Problems: User reported persistent issues with uploading images for captioning, stating they received messages indicating they could not view the images despite purchasing new accounts.
- This issue has been ongoing for 3-4 days, impacting their ability to complete work, and they expressed frustration over lack of support and responses from the help center.
- Potential Alternative Models Suggested: Amidst ongoing issues with image vision, suggestions were made to switch to Claude 3.5 Sonnet for image captioning, which some users found more functional.
- Other users underscored that OpenAI's vision capabilities seem to be broken, encouraging alternatives to avoid project delays.
- Confusion Over Structured Outputs: A user expressed frustration over experiencing random 'object' wrappers when using structured outputs due to misplacement of 'strict' in their setup.
- After 10 hours of debugging, they identified the issue and confirmed they had originally placed 'strict' incorrectly.
- Community Support and Advice: Members provided support by suggesting chunking tasks to avoid hallucinations and offered encouragement after a user sorted out their structured output issue.
- Although members expressed shared frustrations with OpenAI support, they emphasized the importance of community feedback in resolving technical problems.
OpenAI ▷ #api-discussions (32 messages🔥):
Issues with Image Uploads, Vision Model Malfunctions, Structured Output Errors, Switching Models to Claude, Debugging Best Practices
- Users face persistent image upload issues: A user reported problems with uploading images and receiving an error message saying they cannot view the images, which has hindered their work for several days.
- Despite several attempts to seek help, responses from the support team have been inadequate, with no emails or Discord replies addressing the issue.
- Vision model has stopped functioning: Concerns were raised about the Vision model's functionality, as multiple users experienced similar issues with it abruptly failing to work.
- One member suggested considering the Claude 3.5 Sonnet model as a viable alternative for generating image captions.
- Structured output errors drive a user to madness: A user expressed frustration over random 'object' wrappers appearing when using structured outputs despite having set strict properties correctly.
- Eventually, they realized the 'strict' setting was incorrectly placed, leading to ten hours of unnecessary debugging.
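For reference, the usual culprit is where `strict` lives: in OpenAI's structured outputs it belongs inside the `json_schema` object of `response_format`, next to `name` and `schema`, not at the payload's top level or inside the schema itself. A minimal sketch of a correctly shaped request body (the schema name and contents are made-up examples):

```python
# Correct placement of "strict" for OpenAI structured outputs.
payload = {
    "model": "gpt-4o-2024-08-06",
    "messages": [{"role": "user", "content": "Extract the city from: 'I live in Oslo.'"}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "city_extraction",   # hypothetical schema name
            "strict": True,              # <-- here, alongside "name" and "schema"
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
                "additionalProperties": False,
            },
        },
    },
}
```

With `strict` misplaced (for example nested inside `schema`), the constraint may be rejected or silently dropped, which matches the stray-wrapper symptom described above.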
- Recommendations for handling model inconsistencies: In response to errors, a member suggested breaking tasks into smaller chunks to prevent hallucination issues in the mid-context.
- This advice was shared to help mitigate unexpected behavior in output received from the models.
- Communication and assistance shortcomings: Participants noted a lack of effective communication channels for addressing ongoing issues, expressing frustration with the absence of support.
- Users were encouraged to follow post guidelines to attract attention to their problems and ensure they are heard.
aider (Paul Gauthier) ▷ #general (83 messages🔥🔥):
QwQ model configurations, DeepSeek model performance, Using Aider for local models, Issues with OpenRouter, Ensemble frameworks for reasoning
- QwQ model configurations discussion: Users discussed the possibility of using the QwQ model in architect mode while employing a regular model for code commands, seeking clarity on model interchangeability.
- One member noted that Aider allows model definitions for various projects, enhancing flexibility.
- DeepSeek showcases SOTA performance: The DeepSeek-R1 model was highlighted for achieving impressive results on AIME & MATH benchmarks, with a focus on open-source accessibility and real-time thought processes.
- Another user expressed hope for DeepSeek to release model weights to utilize in ensemble frameworks alongside QwQ.
- Local model settings in Aider: Members discussed creating .aider.model.metadata.json and .aider.model.settings.yml files to properly define local models and their configurations for Aider.
- Setting the edit format to 'whole' or 'diff' determines how responses are structured, which impacts editing efficiency.
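A rough sketch of what those two files can look like for a local OpenAI-compatible endpoint. The model name, token limits, and settings values are illustrative; the metadata file follows litellm's model-metadata shape and the settings file follows aider's advanced model settings format, so check aider's docs for the exact fields your version supports.

```json
{
  "openai/qwq-32b-preview": {
    "max_tokens": 32768,
    "max_input_tokens": 32768,
    "input_cost_per_token": 0.0,
    "output_cost_per_token": 0.0
  }
}
```

```yaml
# .aider.model.settings.yml (illustrative values)
- name: openai/qwq-32b-preview
  edit_format: whole      # 'whole' or 'diff' controls how the model returns edits
  use_repo_map: false
```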
- Challenges with OpenRouter: Users identified potential issues with OpenRouter affecting model functionality, specifically regarding the use of local servers and model detection.
- Concerns were raised about whether spoofed implementations could impact outputs and model behavior.
- Experimentation with model settings: A user expressed intent to experiment with Aider's various model settings after receiving useful information on file configurations.
- They planned to test how well Aider detects differences in local model implementations compared to established OpenAI endpoints.
- Tweet from anpaure (@anpaure): How does the new Qwen model compare to other LLMs on coding tasks? It's impressive, but rushed. I ran it against other SOTA models on 6 competitive programming problems of varying difficulties. Here ...
- Jonny Frodo GIF - Jonny Frodo Lotr - Discover & Share GIFs: Click to view the GIF
- Advanced model settings: Configuring advanced settings for LLMs.
- 🚀 DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power! | DeepSeek API Docs: 🔍 o1-preview-level performance on AIME & MATH benchmarks.
aider (Paul Gauthier) ▷ #questions-and-tips (46 messages🔥):
Aider file management, QwQ model from Qwen, Monorepo settings, OpenAI API instances, Repository map experiences
- Aider's .aiderignore facilitates selective file inclusion: Users discussed how adding files to .aiderignore effectively limits which files appear in the repository map, enhancing focus during development.
- One member successfully tested this after initially confusing terminal history with files that had been ignored.
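For reference, `.aiderignore` uses the same pattern syntax as `.gitignore`; a minimal illustrative example (paths are hypothetical):

```gitignore
# Keep generated and vendored files out of Aider's repository map
node_modules/
dist/
*.min.js
docs/generated/
```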
- QwQ model's performance issues with Aider: A user inquired about experiences using the QwQ model from Qwen with Aider, highlighting its reasoning capabilities but also its commit generation errors.
- Community responses indicated there are known issues when integrating this model with Aider.
- Optimizing Aider for monorepo configurations: Guidance was provided on managing Aider settings effectively for a monorepo, including using the --input-history-file and --chat-history-file options.
- This support focused on organizing workflows while maintaining a single Git repository structure.
- Connecting multiple OpenAI server instances: A user sought advice on managing two separate instances of TabbyAPI for different roles and how to configure them in Aider.
- The community suggested using extra_params within the model settings to specify distinct API keys and bases for each instance.
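That setup can be sketched in `.aider.model.settings.yml`: aider passes `extra_params` through to the underlying litellm call, so each model alias can point at its own base URL and key. The names, ports, and keys below are illustrative:

```yaml
# Two TabbyAPI instances behind OpenAI-compatible endpoints (illustrative values)
- name: openai/architect-model
  extra_params:
    api_base: http://localhost:5000/v1
    api_key: tabby-key-1
- name: openai/editor-model
  extra_params:
    api_base: http://localhost:5001/v1
    api_key: tabby-key-2
```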
- Mixed experiences with Repository map functionality: A member noted that disabling repository map features sometimes led to better output, particularly in maintaining contextual awareness.
- This raised a query about whether others had similar experiences regarding context confusion when the feature was active.
- Qwen/QwQ-32B-Preview · Hugging Face: no description found
- FAQ: Frequently asked questions about aider.
- Advanced model settings: Configuring advanced settings for LLMs.
Unsloth AI (Daniel Han) ▷ #general (53 messages🔥):
Instruct vs Non-instruct Fine-tuning, Fine-tuning Dataset Formatting, Alternative GPU Recommendations, Creating Custom Datasets, Support for Schedule Free Optimizers
- Instruct vs Non-instruct Fine-tuning Considerations: Members discussed the considerations for using instruct versus non-instruct models, noting that generally, if your dataset contains over 1k records, it's recommended to use the base models.
- For smaller datasets, members suggested experimenting with instruct models first.
- Dataset Formatting for Fine-tuning: A user queried about the structure of their JSON dataset for fine-tuning, proposing a specific format to enhance results over traditional QA pairs.
- Others provided guidance to refer to existing documentation on formatting datasets, specifically highlighting the importance of complying with fine-tuning rules.
- Alternative GPU Options Discussion: In a conversation about GPU preferences, one user expressed a dislike for NVIDIA models, while others emphasized that NVIDIA GPUs are still considered the best for performance.
- The chat reiterated that personal benchmarking is vital for determining the best architecture for specific tasks.
- Creating Custom Datasets: Users discussed the necessity of creating their own datasets for training models, particularly mentioning the challenge of finding suitable datasets for Japanese business reports.
- There was clarification that Unsloth does not provide datasets but assists with training once users supply their own.
- Support for Schedule Free Optimizers: Inquiries were made regarding the support for schedule free optimizers and rslora within Unsloth, with confirmation that rslora is supported.
- Discussion suggested that implementing additional optimizers could be straightforward with the right patches.
- How to Finetune Llama-3 and Export to Ollama | Unsloth Documentation: Beginner's Guide for creating a customized personal assistant (like ChatGPT) to run locally on Ollama
- Unsloth Documentation: no description found
- GitHub - unslothai/unsloth: Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory: Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
- Welcome | Unsloth Documentation: New to Unsloth? Start here!
Unsloth AI (Daniel Han) ▷ #off-topic (4 messages):
RAG usage, Training models, OOM errors
- RAG Appreciation: A user expressed enthusiasm for RAG, stating, 'God, I love RAG.' This indicates a positive sentiment towards the technique's capabilities.
- The discussion reflects the community's appreciation for the approach.
- Training Process Insights: silk.ai reported that the training process had commenced but indicated plans to terminate it due to potential OOM issues during evaluations.
- They noted that an evaluation would likely lead to an out-of-memory error, prompting the decision to halt the training.
- Humorous Reactions: A member responded with laughter, noting LOL in reaction to the earlier discussion on training.
- This interjection highlights a lighthearted engagement among the participants.
Link mentioned: Chuckles Im In Danger GIF - Chuckles Im In Danger Ralph Wiggum - Discover & Share GIFs: Click to view the GIF
Unsloth AI (Daniel Han) ▷ #help (48 messages🔥):
Unsloth Fine-tuning Models, Using Unsloth for Private Data, Grad Norm Fluctuations, LLama 3.1 OOM Errors, SyntaxWarnings in Unsloth
- Unsloth ensures data privacy during fine-tuning: A user confirmed that Unsloth's operation does not transfer data externally, and it is up to the platform used for fine-tuning (e.g., Google Colab).
- This clarification reassured those concerned about compliance with strict privacy rules.
- Grad norm fluctuations during training: A user reported unexpected fluctuations in training loss and grad norm while fine-tuning a model, even after setting max_grad_norm to 0.3.
- There was a suggestion to consider the dataset quality and the effect of using parameters such as grad accumulation.
- LLama 3.1 encounters OOM errors: A user reported experiencing out of memory (OOM) errors during continual pretraining of the LLama 3.1 8B model.
- Suggestions included using a bigger GPU, smaller dataset, or reduced batch size to mitigate this issue.
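A common way to act on the batch-size suggestion without changing training dynamics is to shrink the per-device batch while raising gradient accumulation, keeping the effective batch size constant. A quick sanity check (numbers illustrative):

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_gpus: int = 1) -> int:
    """Batch size seen by the optimizer at each update step."""
    return per_device_batch * grad_accum_steps * num_gpus

# Halving the per-device batch and doubling accumulation keeps the optimizer-step
# batch unchanged while roughly halving activation memory per forward pass.
before = effective_batch_size(per_device_batch=8, grad_accum_steps=4)
after = effective_batch_size(per_device_batch=4, grad_accum_steps=8)
assert before == after == 32
```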
- Recommending model parameters adjustments: Discussion regarding when to include head and embedding parameters revealed the importance of context in style adjustment versus ingraining new knowledge.
- It was suggested that style adjustments do not require these parameters, while firm knowledge adoption does.
- SyntaxWarnings found in the latest Unsloth version: A user reported encountering SyntaxWarnings with invalid escape sequences in the latest version of Unsloth.
- These warnings highlight potential issues in the code that may require attention for proper functionality.
- Unsloth Documentation: no description found
- Google Colab: no description found
- Introduction - Fireworks AI Docs: no description found
- Unsloth Notebooks | Unsloth Documentation: See the list below for all our notebooks:
- Hugging Face – The AI community building the future.: no description found
Unsloth AI (Daniel Han) ▷ #research (4 messages):
Unsloth fine-tuning, RAG costs, Latent paraphraser, Fine-Tuning or Retrieval paper, Custom tokenizers
- Unsloth ensures data privacy during fine-tuning: One user inquired about the data privacy measures of Unsloth, specifically whether any data is transferred externally during the fine-tuning of Llama3 models on private data.
- They sought confirmation on specific settings that would maintain compliance with their strict data policies.
- High compute costs associated with RAG: A user noted that retrieval-augmented generation (RAG) can incur high compute costs due to its extensive context length requirements.
- This insight highlights the ongoing challenges in balancing performance and efficiency in AI model development.
- Latent paraphraser architecture explained: Discussion revealed that a latent paraphraser modifies the transformer architecture with an additional layer to effectively redistribute probabilities over the LLM's tokens.
- This enhances input grounding, reducing noise by minimizing unseen tokens during processing.
- Highlights from the Fine-Tuning or Retrieval paper: The paper by Ovadia et al. compares unsupervised fine-tuning and RAG, noting that RAG consistently surpasses fine-tuning on knowledge-intensive tasks.
- Their findings suggest significant implications for incorporating new information into LLMs effectively.
- Inquiry about custom tokenizers for tabular data: A member expressed interest in using a custom tokenizer that effectively handles money values in tabular data, referencing a video by Andrej Karpathy on tokenizers.
- They sought advice on methodologies for integrating alternative tokenizers into their data processing workflow.
- Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs: Large language models (LLMs) encapsulate a vast amount of factual information within their pre-trained weights, as evidenced by their ability to answer diverse questions across different domains. Howe...
- Let's build the GPT Tokenizer: The Tokenizer is a necessary and pervasive component of Large Language Models (LLMs), where it translates between strings and tokens (text chunks). Tokenizer...
Perplexity AI ▷ #announcements (1 messages):
Perplexity Pro Discount, AI Models Access, One-Click Shopping
- Perplexity Pro offers holiday discount: Perplexity Team announced a 75% off promotion for the first month of Perplexity Pro until Monday, December 2 at 11:59pm PT.
- This offer allows new users to access advanced features, including enhanced search capabilities and file uploads.
- Enhanced AI models and source access: Users can now access the latest AI models with the Pro version, allowing them to search through 3x as many sources.
- This enhancement aims to improve the overall search experience, making it more efficient for users.
- Exciting shopping additions with Perplexity Pro: The promotion includes one-click shopping and free shipping features through Buy with Pro.
- These new features are designed to streamline the shopping experience, making it more convenient for users this holiday season.
Perplexity AI ▷ #general (74 messages🔥🔥):
Perplexity Pro Subscription Features, User Experiences with Claude, Image Generation Queries, Customer Support for Subscription Issues, Black Friday Discounts
- Users clarify Perplexity Pro features: A user inquired whether the $5 API credit with the Perplexity Pro subscription expires if unused, leading to confirmation that it renews monthly as long as the subscription is active.
- Another user discussed the platform's image generation capabilities and confirmed that it is available via computer online without extra charges.
- Confusion regarding Claude and subscriptions: Several users expressed confusion over their subscriptions, with one noting a surprise at having access to Claude for free without a current subscription.
- Another user sought help regarding a subscription issue linked to Revolut, prompting a suggestion to contact support via email.
- Customer support difficulties: Users discussed challenges in finding customer support links for subscription-related inquiries, with some indicating the contact information was obscured in the FAQ.
- One user confirmed they were directed to the appropriate support email, leading to brief frustrations over the lack of visibility.
- User feedback on functionality: A user provided feedback on the iOS app, expressing a desire for enhanced functionality to ask clarifying questions while highlighting text.
- This request highlighted the need for more interactive features in the user interface to improve the app's usability.
- Community sharing of discount codes: Several users discussed potential discounts available during the holiday season, specifically focusing on the Black Friday event offering a significant reduction on Perplexity Pro.
- Participants expressed interest in sharing discount codes and engaging with promotional offers, such as a 75% off deal for new subscriptions.
- Tweet from Perplexity (@perplexity_ai): Search and shop smarter this holiday season. Get 75% off your first month of Perplexity Pro!Access the latest AI models, search through 3x as many sources, and upload your own files. Plus, get one-cli...
- Cute Baby GIF - Cute Baby Sad - Discover & Share GIFs: Click to view the GIF
- Find & Share on GIPHY: Discover & share this childrenkick Animated GIF by stewieeee with everyone you know. GIPHY is how you search, share, discover, and create GIFs.
Perplexity AI ▷ #sharing (9 messages🔥):
Hard problem of consciousness, Factoids overheard, Cloud water content, RBAC vs ABAC, Battery optimization
- Exploring the hard problem of consciousness: A member expressed their curiosity regarding the hard problem of consciousness, pondering whether it's just another tool like any other human creation.
- As one member put it, it's just a tool, like any other human tool.
- Questions about overheard factoids: A member mentioned their habit of asking questions about factoids they overheard, highlighting the blend of serious questions and casual queries.
- This reflects a casual yet inquisitive approach to learning.
- Clouds and their water content: Multiple members raised questions regarding clouds having less water, linking it to broader discussions on atmospheric conditions.
- The interest in this topic suggests a curiosity about meteorological phenomena.
- Discussing RBAC vs ABAC: A member sought to understand the difference between RBAC (Role-Based Access Control) and ABAC (Attribute-Based Access Control).
- This inquiry signifies a need for clarity on access control models in technology.
- Optimizing battery life: Members inquired about tips on optimizing battery timing, seeking effective strategies to extend battery life.
- This reflects ongoing concerns related to device efficiency and sustainability.
Perplexity AI ▷ #pplx-api (5 messages):
Perplexity in Claude, Claude Project Knowledge, Perplexity's text file reading issue, Custom instructions for spaces
- Can Perplexity be used within Claude?: Users are curious if Perplexity can be integrated within Claude using the new MCP feature, similar to how it functions with Brave and GitHub.
- They highlight that this capability would enhance performance by utilizing Claude's Project Knowledge.
- Google integration with Claude?: Similar inquiries were made about integrating Google within Claude, seeking clarification on its operational mechanics.
- Members are keen to understand how search functionalities can be leveraged in this context.
- Text file reading capabilities in Perplexity: A member questioned whether the issue of Perplexity being unable to read text files reliably has been resolved.
- They expressed interest in any potential long-term memory features that might address this limitation.
- Issues with custom instructions in Claude spaces: Concerns were raised regarding the efficacy of custom instructions for Claude spaces, which seem to conflict with existing 'introduce yourself' prompts.
- Users are seeking clarification on how these instructions are supposed to compound or interact.
LM Studio ▷ #announcements (1 messages):
HF Search Issue, Image Analysis
- HF Search Issue Resolved: The HF search not working issue has been resolved, much to the relief of users.
- An image was attached to commemorate the fix, indicating a positive update for the community.
- Image Analysis Shared: An image was attached to the announcement regarding the HF search issue, providing visual confirmation.
- Details from the image analysis were not shared but likely contributed to understanding the resolution.
LM Studio ▷ #general (56 messages🔥🔥):
LM Studio AIDE Integration, Llama 3.1 Models in LM Studio, LM Studio Network Issues, Document Interaction in LM Studio, GUI Access Issues in Mac
- Successful LM Studio AIDE Integration: Users reported successful integration of the LM Studio endpoint to the AIDE sidecar, enabling a fully local code editor experience.
- This integration shows improved functionality for users seeking a local development environment.
- Searching for Base Llama 3.1 Model: A user inquired about accessing the base model of Llama 3.1 8B in LM Studio, noting that only instruction-tuned variants seem available.
- Community members pointed to the huggingface repository as a potential source for the base model.
- Network Connectivity Concerns: Several users discussed issues accessing LM Studio from outside their local network while confirming local access is functioning correctly.
- Suggestions included checking firewall settings and considering tunneling services like ngrok for remote access.
- Interacting with Local Files: New users were curious about how to interact with local files in LM Studio, specifically asking about document attachment capabilities.
- The community clarified that only individual files can currently be attached to chat sessions, referencing documentation for further guidance.
- Mac GUI Access Troubles: One user expressed frustration over an inability to access the LM Studio GUI after testing the headless option on Mac.
- Suggestions to access the application through Finder were made, but users continued experiencing difficulties with GUI availability.
- lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF · Hugging Face: no description found
- Chat with Documents - Running LLMs Locally | LM Studio Docs: How to provide local documents to an LLM as additional context
- meta-llama/Llama-3.1-8B · Hugging Face: no description found
- meta-llama/Llama-3.1-8B-Instruct · Hugging Face: no description found
LM Studio ▷ #hardware-discussion (17 messages🔥):
Seasonic PSU longevity, a770 performance comparison, PC build recommendations, Intel vs AMD processors, Performance of Qwen2.5-14b
- Seasonic PSU outlasts PC components: One member mentioned their Seasonic PSU outlived other PC components despite having to replace PSUs every couple of years due to dust.
- They described their experience as amazingly satisfactory with the PSU's performance.
- a770 struggles compared to 7800xt: Another member shared that their a770 achieved only 11t/s for Qwen2.5-14b q4_0, significantly lower than the 40t/s achieved by a 7800xt.
- They noted q4_k_m is unusable and found the SYCL backend only negligibly faster.
- Discussion on optimal PC builds: In a discussion about PC builds, a user inquired whether a setup with Intel Core i9 14900KF and NVIDIA GeForce RTX 4090 would suffice for learning LLMs.
- Others recommended avoiding 13th/14th gen Intel in favor of AMD Ryzen 7000 or 9000 series or 12th gen Intel.
- Concerns over a770 pricing: A member expressed interest in purchasing the a770 because of a discount but decided to wait for next-gen releases.
- They were advised that it may be better to hold out for further developments in GPU technology.
Eleuther ▷ #general (29 messages🔥):
De-escalation of resource contention, GPU job submission management, SLURM and Kubernetes usage, AI and Crypto intersection, Open access to academic resources
- Discussing De-escalation of Resource Contention: Members raised concerns about the de-escalation of resource contention and the effects of unregulated growth on the internet, questioning AI-powered privacy solutions.
- One suggested identifying warning signs of rogue AI attacks that might exploit vulnerable devices, emphasizing the need for community leadership in AI protection.
- Pooling Expensive GPU VMs for Job Submission: A query was posed about open source solutions for managing pools of expensive GPU VMs for job submissions, indicating a need for effective resource bookkeeping.
- Responses highlighted the usage of SLURM queues and Kubernetes, though skepticism existed regarding their adaptability in high-trust environments.
- Best Practices for SLURM in Lower-Trust Environments: Members explored whether there is a specialized setup for SLURM that allows private storage segmentation in environments with lower trust, with varied insights on potential solutions.
- Some experiences shared included utilizing network-filesystems and S3 prefixes for permissions, although caution was advised against unnecessary complexity.
- AI and Crypto Discussion Unwanted: A participant inquired about the intersection of AI and Crypto, to which a member remarked that such discussions are generally not welcomed in the current channel.
- This reflects a desire to maintain focused discussions and possibly redirect broader topics to more suitable channels.
- Collaboration on Academic Resources: A server was proposed for members to share high-quality papers and resources, allowing continuous access without off-topic distractions.
- This initiative could enhance collaboration and resource sharing within the community, aiming for a productive and streamlined exchange.
Eleuther ▷ #research (23 messages🔥):
Poincare Ball Embedding, Hyperbolic Geometry, Graph Distortion, Embedding Trees, HyperE
- Poincare Ball Embedding Explained: Embedding data into a Poincare ball essentially means points with higher degrees being closer to the origin to preserve adjacency while moving towards a region of less curvature.
- Self-nitpick was made about the edge of the Poincare ball, noted as a point at infinity where points cannot actually reside.
- Hyperbolic Embedding Resources: The HyperE research team provides various methods for optimizing embeddings of structured objects like knowledge graphs, highlighted in publications from Nickel & Kiela (2017) and Chamberlain et al. (2017).
- These hyperbolic embeddings can effectively preserve graph distances in lower dimensional spaces, with applications in areas like NLP and knowledge base completion.
- Graph Distortion Concerns: A member raised that the embedding process may not respect the structure of certain data sets, particularly in higher-density graphs like fully-connected graphs (FC).
- Discussions suggested using the heuristic of estimating distortion by comparing against equivalent tree structures for better understanding of embedding quality.
- Conditions for Low Distortion: While distortion in graph embeddings can be low under specific conditions, it isn’t universally applicable; some graphs inherently do not embed well due to the number of nodes versus degree issues.
- Graph embedding literature indicates that specific mathematical conditions govern the low-distortion possibility of embeddings.
- Mathematics of Graph Embedding: There is a significant body of mathematical literature discussing how to embed graphs into hyperbolic space, although many find it challenging to grasp fully.
- A good heuristic for evaluating distortion in embeddings is assessing how the embedding compares to a logically equivalent tree structure.
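For reference, the multiplicative worst-case distortion discussed above is usually defined as follows for an embedding $f$ of a metric space $(X, d_X)$ into $(Y, d_Y)$; this is the standard definition from the metric-embedding literature, not something specific to this discussion:

```latex
\mathrm{dist}(f) \;=\;
\Big( \max_{u \neq v} \frac{d_Y\big(f(u), f(v)\big)}{d_X(u, v)} \Big)
\cdot
\Big( \max_{u \neq v} \frac{d_X(u, v)}{d_Y\big(f(u), f(v)\big)} \Big),
```

with $\mathrm{dist}(f) = 1$ exactly when $f$ is an isometry up to uniform scaling.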
Link mentioned: HyperE: no description found
Eleuther ▷ #scaling-laws (5 messages):
AutoML Challenges, TaskSet Dataset, Neural Architecture Design, Equivariant vs Non-equivariant Networks
- AutoML Faces Basic Tasks: A member mentioned that most AutoML is currently dealing with very simple tasks, highlighting the financial constraints in building large datasets of learning curves.
- They pointed out that the best available option is TaskSet, but acknowledged that it is quite outdated.
- TaskSet Empowers Optimizer Training: An abstract about the TaskSet dataset reveals its unique size and diversity, containing over a thousand tasks for training and evaluating optimizers.
- The dataset facilitates meta-learning of hyperparameter lists, leading to significant efficiency improvements over random search.
- Equivariant Networks Gain Efficiency: A paper explored how equivariant and non-equivariant networks scale with varying model sizes and compute, finding that equivariance enhances data efficiency.
- Empirical results show that while non-equivariant models can close this gap with enough training, equivariant models outperform them across all compute budgets.
- Questioning Neural Architecture Design Approaches: A discussion arose regarding the efficiency of designing neural architectures tailored to particular problems versus learning from data.
- One member expressed interest in whether findings about equivariance and compute budget allocation could apply to other tasks as well.
- Does equivariance matter at scale?: Given large data sets and sufficient compute, is it beneficial to design neural architectures for the structure and symmetries of each problem? Or is it more efficient to learn them from data? We stud...
- TaskSet: A Dataset of Optimization Tasks: We present TaskSet, a dataset of tasks for use in training and evaluating optimizers. TaskSet is unique in its size and diversity, containing over a thousand tasks ranging from image classification...
Eleuther ▷ #lm-thunderdome (17 messages🔥):
HF Tokenizer Handling, Custom Tokenizer Considerations, Evaluation Harness Model Functions, Generation Parameters in Models
- Understanding HF Tokenizers in Eval Harness: There’s confusion about whether the eval harness tokenizes sequences with add_special_tokens=True or add_special_tokens=False, specifically regarding how EOS tokens are handled during generation tasks.
- Members discussed that generally only BOS tokens should be added in models while omitting EOS tokens, especially when building custom tokenizers.
- Manual EOS Token Management: A member considered changing their tokenizer to disable the EOS token during tokenization and adding it manually in the training loop.
- This approach is deemed practical and is expected to avoid compatibility issues across various frameworks utilizing HF models.
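The manual-EOS approach can be sketched as follows. `ToyTokenizer` here is a stand-in for a HF tokenizer configured to add no special tokens at encode time; real tokenizers differ in detail, so treat this as a pattern, not a drop-in:

```python
class ToyTokenizer:
    """Stand-in for a HF tokenizer that adds no special tokens by itself."""
    eos_token_id = 2

    def encode(self, text: str) -> list[int]:
        # Hypothetical encoding: one id per whitespace token, no BOS/EOS appended.
        return [hash(tok) % 1000 + 10 for tok in text.split()]

def build_training_ids(tokenizer, text: str) -> list[int]:
    # EOS is disabled at tokenization time and appended manually in the
    # training loop, so every framework sees exactly one EOS per sample.
    ids = tokenizer.encode(text)
    return ids + [tokenizer.eos_token_id]

ids = build_training_ids(ToyTokenizer(), "hello world")
assert ids[-1] == ToyTokenizer.eos_token_id
```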
- Generate Until Function Discussion: For evaluating custom models with the eval harness, implementing a generate_until function is necessary to handle various generation parameters, including until, do_sample, and max_gen_toks.
- A query about whether additional keyword arguments are required for this function led to clarifying that max_gen_toks is unique to the eval harness while others align with standard HF practices.
- Subclassing HFLM for Custom Models: Members suggested subclassing HFLM and overloading methods like _model_generate and _model_call to simplify custom model integration.
- This approach is presented as a more straightforward way to handle custom model evaluations within the framework.
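Structurally, the suggestion looks like the sketch below. `HFLM` here is a minimal stand-in for `lm_eval.models.huggingface.HFLM`; the real class does far more, and its method signatures may differ between harness versions, so verify against the pinned commit before relying on this:

```python
class HFLM:
    """Stand-in for lm_eval.models.huggingface.HFLM (illustrative only)."""
    def _model_call(self, inps):
        raise NotImplementedError
    def _model_generate(self, context, max_length, stop, **generation_kwargs):
        raise NotImplementedError

class MyCustomLM(HFLM):
    """Route the harness's forward and generate calls to a custom backend."""
    def __init__(self, backend):
        self.backend = backend

    def _model_call(self, inps):
        # Called for loglikelihood-style tasks: return logits for the batch.
        return self.backend.forward(inps)

    def _model_generate(self, context, max_length, stop, **generation_kwargs):
        # Called for generate_until tasks; the harness-specific max_gen_toks
        # has already been folded into max_length by the caller.
        return self.backend.generate(context, max_new_tokens=max_length, stop=stop)
```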
- GitHub - EleutherAI/lm-evaluation-harness at 5680a2e6b5cf1a1621d8ff68d3d0e83e8b2731d3: A framework for few-shot evaluation of language models. - GitHub - EleutherAI/lm-evaluation-harness at 5680a2e6b5cf1a1621d8ff68d3d0e83e8b2731d3
- lm-evaluation-harness/lm_eval/models/huggingface.py at 5680a2e6b5cf1a1621d8ff68d3d0e83e8b2731d3 · EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness
- GitHub - EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models.: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness
- lm-evaluation-harness/lm_eval/models/huggingface.py at 9169899b4966b4161719e54d41258345df03aaa0 · EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness
- lm-evaluation-harness/lm_eval/api/model.py at 5680a2e6b5cf1a1621d8ff68d3d0e83e8b2731d3 · EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness
OpenRouter (Alex Atallah) ▷ #announcements (1 messages):
Feature Requests Voting, Channel for Additional Requests
- Vote for Top Feature Requests Now!: Members are encouraged to vote for their top feature requests here to help prioritize future developments.
- Additionally, for any requests that are not listed, they can use <#1107397803266818229> to submit those.
- Channel for Additional Feature Requests: A dedicated channel (<#1107397803266818229>) is provided for users to submit any feature requests not covered in the voting.
- This allows for a broader range of input regarding desired features from the community.
OpenRouter (Alex Atallah) ▷ #general (57 messages🔥🔥):
Pixtral Large's Capabilities, Concerns about Model Responses, Provider-Specific Features, Image Generation in OpenRouter, Structured Outputs from Llama 3.2
- Pixtral Large impresses users: Users have noted that Pixtral Large offers excellent performance and a massive free tier, encouraging easy access via console.mistral.ai. Another user switched from Hermes 405b to Pixtral, finding it effective with unchanged prompts.
- Confusion over Model Identifications: Discussion arose around model training, with some clarifying that models do not inherently know their identity and instead often hallucinate details from training data. This raised questions about why confusion persists despite these explanations.
- Question on Cost Calculation Methods: A user inquired whether there are any rates for the /api/v1/generation endpoint and how to accurately estimate generation costs. Suggestions included using Helicone for tracking and clarified that currently, the generation endpoint is necessary for precise cost assessment.
- Future of Image Generation in OpenRouter: Although image generation is not currently on the immediate roadmap for OpenRouter, it's not ruled out as a possibility in the future. Discussions indicate a growing interest in image model capabilities among users.
- Challenges with Llama 3.2's Structured Outputs: Users reported difficulties in obtaining structured outputs with Llama 3.2-vision-instruct, noting that while it claims JSON output capability, performance has lagged in comparison to alternatives like Gemini Flash. It was highlighted that the support for such features largely depends on the inference software used.
- OpenRouter Integration - Helicone OSS LLM Observability: no description found
- Provider Routing | OpenRouter: Route requests across multiple providers
- Llama 3.2 90B Vision Instruct - API, Providers, Stats: The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in image captioni...
- Pixtral Large: Pixtral grows up.
- Introduction - Helicone OSS LLM Observability: no description found
- LLM Rankings | OpenRouter: Language models ranked and analyzed by usage across apps
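The structured-output troubles above often come down to whether the provider honors an OpenAI-compatible response_format field, since (as noted) support depends on the inference software behind each model. A minimal sketch of such a request body, sent to OpenRouter's chat-completions endpoint; the model slug matches the one discussed, but the schema itself is illustrative, not from the conversation:

```python
# Hedged sketch: build an OpenAI-compatible chat-completions body with a
# json_schema response_format, as OpenRouter forwards to providers that
# support structured outputs. Whether the schema is actually enforced
# depends on the provider's inference stack; the schema here is made up
# for illustration.
def build_structured_request(prompt: str) -> dict:
    return {
        "model": "meta-llama/llama-3.2-90b-vision-instruct",
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "caption",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "caption": {"type": "string"},
                        "objects": {"type": "array", "items": {"type": "string"}},
                    },
                    "required": ["caption", "objects"],
                },
            },
        },
    }

body = build_structured_request("Describe the attached image as JSON.")
print(body["response_format"]["type"])  # json_schema
```

When a provider ignores the field, the usual fallback is prompt-level JSON instructions plus client-side validation of the reply against the same schema.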
OpenRouter (Alex Atallah) ▷ #beta-feedback (5 messages):
Custom Provider Keys
- Developers push for access to custom provider keys: Multiple developers expressed interest in accessing custom provider keys, indicating a strong community demand for this feature.
- One member noted, 'Thank you for all the great work!' while requesting access.
- Collective requests from developers: Several users, including those identified as monomethylhydrazine and kit18, also expressed their desire to use their own keys for certain providers.
- This recurring theme highlights a consensus among developers on the need for these functionalities.
GPU MODE ▷ #general (2 messages):
Parallel processing on NVIDIA GPU, Posting in beginner section
- Seeking Help for Parallel Processing Issues: A member expressed difficulties with parallel processing on an NVIDIA GPU and sought guidance.
- The conversation pivoted towards ensuring technical discussions are directed appropriately for better assistance.
- Advice to Post in the Beginner Section: Another member suggested refraining from discussing technical issues in that channel and recommended posting the question in the beginner section instead.
- This was aimed at streamlining discussions and guiding the original poster to a more suitable area for their inquiries.
GPU MODE ▷ #triton (9 messages🔥):
Metaprogramming Proposal, Building Triton from Source, Offline Compilation Dependencies
- Metaprogramming Proposal Gains Interest: A user shared a metaprogramming proposal for Triton aimed at addressing current limitations, garnering community feedback.
- Some members expressed interest in the proposal but questioned the clarity of its semantics, suggesting the inclusion of examples to enhance understanding.
- Building Triton from Source Clarifications: A newcomer inquired about the minimum memory required to build Triton from source, seeking assistance from the community.
- After receiving troubleshooting advice including path adjustments, the user reported success after increasing WSL2 memory to 26GB to avoid out-of-memory errors.
- Concerns About Offline Compilation: Another member raised questions about building Triton from source in offline mode using an Ubuntu Docker container and the necessary steps to collect dependencies manually.
- They sought advice on convenient configurations for offline compilation and the minimum dependencies needed for a successful build.
- [FRONTEND][WIP][RFC] Rewrite AST conversion to improve metaprogramming by kuterd · Pull Request #5284 · triton-lang/triton: Problem StatementThe current limitations of metaprogramming in Triton have led major users, such as Torch Inductor, to resort to using string-based templating. This RFC aims to address some of the...
- GitHub · Build and ship software on a single, collaborative platform: Join the world's most widely adopted, AI-powered developer platform where millions of developers, businesses, and the largest open source community build software that advances humanity.
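For context on what the RFC aims to replace: string-based templating builds kernel source by splicing parameters into text before compilation, so the compiler never sees the spliced-in pieces as typed, checkable constructs. A toy sketch of the pattern (not Inductor's actual templates; the Triton-like code lives in a string and is never executed here):

```python
# Toy sketch of string-based kernel templating, the approach the RFC
# proposes to supersede with first-class metaprogramming. Real Inductor
# templates are far more involved; only the pattern is shown. The
# template body is plain text, not executed Python.
KERNEL_TEMPLATE = """
@triton.jit
def fused_kernel(x_ptr, out_ptr, n, BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    x = tl.load(x_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, {epilogue}, mask=mask)
"""

def render_kernel(epilogue_expr: str) -> str:
    # The epilogue (e.g. an activation) is interpolated as raw text,
    # which is exactly why errors surface only at compile time.
    return KERNEL_TEMPLATE.format(epilogue=epilogue_expr)

src = render_kernel("tl.maximum(x, 0.0) * tl.maximum(x, 0.0)")  # ReLU squared
print("tl.maximum" in src)  # True
```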
GPU MODE ▷ #cuda (1 messages):
cuBLAS async loads, Custom kernels performance, SASS instructions, CuTe templates, Throughput considerations
- Dissecting cuBLAS async loads with SASS: While profiling custom kernels against cuBLAS, a user observed that the SASS for cuBLAS's async loads uses LDGSTS.E.BYPASS.LTC128B.128.CONSTANT, while their own code generates LDGSTS.E.BYPASS.LTC128B.128.
- They are curious about the significance of the CONSTANT suffix and its potential impact on performance.
- Benchmarking on A100 reveals potential issues: The user is benchmarking custom kernels on an A100 and is unsure if the difference in SASS instructions is relevant, given they are far from acceptable performance levels.
- They are exploring every option in their quest for better throughput and efficiency.
- Questions about SASS and throughput: The user raised two specific questions about what the CONSTANT in SASS means and whether there are significant throughput considerations between the two types of instructions.
- These queries highlight a deeper exploration into optimizing performance in their kernel implementations.
GPU MODE ▷ #torch (26 messages🔥):
Triton performance, Fusion strategies, Memory usage in PyTorch, Max autotune settings, NANOgpt integration
- Triton's Slower than cuBLAS: A discussion revealed that Triton kernels often perform worse than cuBLAS, especially due to unoptimized templates not yet using TMAs or being persistent.
- Members highlighted concerns about fusion potentially slowing down computations, particularly with heavy epilogues in compute-bound scenarios.
- Max Autotune Not Fusing RELU Squared: Even with the setting TORCHINDUCTOR_MAX_AUTOTUNE_GEMM_BACKENDS=TRITON, a member expressed frustration that RELU squared was not being fused.
- This raised questions about the effectiveness of autotune and the complexities of keeping cuBLAS for speedier operations alongside Triton's slower kernel.
- Fusing Matmul and Pointwise Operations: The lack of profitability in fusing matmul into pointwise operations was noted as more about determining profitable scenarios rather than technical difficulty.
- Members pointed out that knowing when fusion results in slower operations is crucial to avoid confusion about Inductor's performance.
- Memory Usage in Torch Snapshot Tool: A user questioned the significant 'Unknown' memory usage seen using the torch memory snapshot tool, with a related screenshot shared for reference.
- This raised concerns about clarity on memory management and tracking in PyTorch applications.
- Potential for Thunder Kittens Use: One member speculated that integrating a Thunder Kittens-based matmul implementation into PyTorch could address some of the performance issues discussed.
- This idea stems from the complexities around BF16 processing and optimizing kernels for better performance.
GPU MODE ▷ #algorithms (1 messages):
melanimahes: https://arxiv.org/pdf/2411.17116
GPU MODE ▷ #cool-links (1 messages):
Diffusion Models Overview, Classifier-free Diffusion Guidance, Perspectives on Diffusion Models, Noise Schedules in Diffusion Models
- Diffusion Models Take Center Stage: Diffusion models have emerged as the go-to model for generating perceptual signals such as images and sound, outperforming traditional models with better mode coverage and faster sampling. Their construction involves gradually converting data to noise and training a neural network to reverse this process.
- The rapid rise in interest related to diffusion models began after the publication of Song & Ermon’s seminal paper in 2019, which sparked significant research momentum.
- Classifier-free Diffusion Guidance Supercharges Outputs: The implementation of classifier-free diffusion guidance significantly enhances results from conditional diffusion models with minimal cost, as discussed in the blog post. This technique is critical for optimizing image generation in OpenAI’s DALL·E 2 and Google’s Imagen.
- This approach makes diffusion models vastly superior, boosting sample quality without complex overhead.
- Diverse Perspectives Fueling Diffusion Research: Exploring different perspectives on diffusion models reveals both challenges and beneficial insights. The various characterizations of diffusion highlight its flexibility and stimulate innovative ideas across research papers.
- This overview contrasts research papers' approaches, making it frustrating yet enlightening to grasp their relational dynamics.
- Reevaluating Noise Schedules: The noise schedule utilized in diffusion models is a crucial yet often confusing design element dictating noise magnitude during the diffusion process. A blog post advocates for reframing discussions on noise schedules for clearer understanding and utility.
- The author's subjective insights aim to clarify how different noise levels influence diffusion models' performance, providing a fresh perspective on a somewhat contentious topic.
- Diffusion models are autoencoders: Diffusion models have become very popular over the last two years. There is an underappreciated link between diffusion models and autoencoders.
- Guidance: a cheat code for diffusion models: A quick post with some thoughts on diffusion guidance
- Perspectives on diffusion: Perspectives on diffusion, or how diffusion models are autoencoders, deep latent variable models, score function predictors, reverse SDE solvers, flow-based models, RNNs, and autoregressive models, al...
- Noise schedules considered harmful: The noise schedule is a key design parameter for diffusion models. Unfortunately it is a superfluous abstraction that entangles several different model aspects. Do we really need it?
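The guidance "cheat code" covered in the posts above boils down to a linear extrapolation from the unconditional prediction toward the conditional one. A minimal numeric sketch, with scalars standing in for the denoiser's noise predictions:

```python
# Minimal sketch of classifier-free guidance: extrapolate from the
# unconditional prediction toward the conditional one by scale w.
# The eps_* arguments are scalar stand-ins for a denoiser's noise
# predictions; in practice these are full tensors.
def cfg_combine(eps_uncond: float, eps_cond: float, w: float) -> float:
    return eps_uncond + w * (eps_cond - eps_uncond)

# w = 1 recovers the plain conditional prediction; w > 1 pushes samples
# further toward the condition (sharper but less diverse outputs).
print(round(cfg_combine(0.2, 0.8, 1.0), 6))  # 0.8
print(round(cfg_combine(0.2, 0.8, 3.0), 6))  # 2.0
```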
GPU MODE ▷ #off-topic (2 messages):
Series A Docs Process, HR Reporting Protocols
- Notary Reads Series A Docs in Germany: Notary reads every single word of Series A docs out loud in front of founders in Germany, which is described as prehistoric madness by a user.
- Seeing this unfold, one participant humorously mentioned that they have GDP to grow here, underlining the absurdity of the situation.
- Concerns about HR Reporting: A user expressed concern over the notary's process, suggesting that it should be reported to apaz's HR.
- This raises questions about the suitability of such practices in modern business environments.
Link mentioned: Tweet from Nathan Benaich (@nathanbenaich): 12 hours and counting - notary reads every single word of Series A docs in Germany out loud in front of founders. In person. Guys, we have GDP to grow here. Pure prehistoric madness.
GPU MODE ▷ #bitnet (2 messages):
BitNet b1.58, 1-bit LLMs, Open-Source Models, RedPajama Dataset, Dolma Dataset
- BitNet b1.58 models released: Trained with the RedPajama dataset for 100B tokens, BitNet b1.58 models show promising results in PPL and zero-shot accuracy.
- The training details are documented in their paper, The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits, with models available in the open-source repo.
- OLMo-Bitnet-1B as proof-of-concept: OLMo-Bitnet-1B, a 1B parameter model, was trained on the first 60B tokens of the Dolma dataset, emphasizing its research nature.
- A comparison with standard fp16 weights can be explored in the wandb report, showcasing the effectiveness of different training methodologies.
- Training Hyperparameters detailed: The models were trained using specific hyperparameters, including two-stage LR and weight decay as recommended in the corresponding documentation.
- Performance details reflect varied results among reported and reproduced models, offering insights into model effectiveness.
- 1bitLLM/bitnet_b1_58-3B · Hugging Face: no description found
- NousResearch/OLMo-Bitnet-1B · Hugging Face: no description found
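The 1.58-bit scheme behind these models maps each weight to {-1, 0, +1} via "absmean" scaling. A minimal sketch of that mapping (illustrative only; the paper applies it per weight matrix inside quantization-aware training, not as a post-hoc conversion):

```python
# Minimal sketch of BitNet b1.58-style absmean ternary quantization:
# scale weights by their mean absolute value, then round and clip into
# {-1, 0, +1}. The eps guard and toy weight values are illustrative.
def absmean_ternary(weights, eps=1e-8):
    gamma = sum(abs(w) for w in weights) / len(weights)  # mean |W|
    quantized = []
    for w in weights:
        q = round(w / (gamma + eps))
        quantized.append(max(-1, min(1, q)))  # clip to the ternary set
    return quantized, gamma

q, gamma = absmean_ternary([0.9, -0.05, 0.4, -1.2])
print(q)  # [1, 0, 1, -1]
```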
GPU MODE ▷ #self-promotion (1 messages):
Japanese LLM evaluation, Open Japanese LLM Leaderboard, Hugging Face collaboration
- Unveiling the Open Japanese LLM Leaderboard: An exciting announcement was made regarding the Open Japanese LLM Leaderboard, designed to assess various Japanese LLMs across more than 20 datasets and tasks.
- This initiative is a collaborative effort by the LLM-jp project and Hugging Face, aiming to enhance understanding of Japanese LLM mechanisms.
- Focus on Japanese Language Model Performance: The development of LLMs in Japanese has lagged behind English, creating a need for comprehensive performance evaluations.
- This announcement sparks interest particularly among Japanese HPC engineers who are keen on the advancements in their native language.
Link mentioned: Introducing the Open Leaderboard for Japanese LLMs!: no description found
GPU MODE ▷ #thunderkittens (14 messages🔥):
Non-Warp Specialized Implementations, Unification of ThunderKittens and ThunderMittens, API Contracts between TK and TM, Auto Optimizer for TK, Triton vs ThunderKittens Features
- Exploring Non-Warp Specialized Implementations: A member inquired about the existence of a non-warp specialized implementation, to which another confirmed that there is no pre-built kernel for FP8 but offered help to create one.
- They also shared links to non-warp specialized kernels available in the TK repo.
- Tile Abstraction Unites ThunderKittens and ThunderMittens: Members discussed the primary unification factor between ThunderKittens and ThunderMittens, identifying the tile abstraction as crucial for tensor core compatibility.
- It was noted that this abstraction allows direct control of register usage, providing a foundation for library functions operating over tiles.
- API Contracts Between ThunderKittens and ThunderMittens: There was a question about whether an API contract exists between ThunderKittens and ThunderMittens, highlighting the importance of compatibility.
- This led to discussions on how the frameworks are perceiving the API relationship and their structuring around kernel functionality.
- Desire for Auto Optimizer in ThunderKittens: A member expressed interest in having an auto optimizer for ThunderKittens, emphasizing its nature of being a write-once, run-many-times system.
- They shared appreciation for Domain Specific Languages (DSLs) that incorporate this optimization feature.
- Comparing Features Between Triton and ThunderKittens: Discussion ensued around how ThunderKittens differentiates itself from Triton by explicitly exposing layouts, async operations, and shared memory allocations.
- Additionally, they mentioned the importance of embedding these functionalities directly within CUDA/Metal.
Link mentioned: ThunderKittens/kernels/layernorm/non_pc at main · HazyResearch/ThunderKittens: Tile primitives for speedy kernels. Contribute to HazyResearch/ThunderKittens development by creating an account on GitHub.
Nous Research AI ▷ #general (42 messages🔥):
Hermes 3 updates, Mistral Platform issues, Truth Terminal in Crypto & AI, Job hunting in Discord, AI and Crypto community crossover
- Hermes 3 inquiry sparks interest: A member raised a question about Hermes 3, and others hinted it may relate to the old O1 style.
- This discussion indicates ongoing curiosity about advancements in Hermes.
- Mistral Platform's new challenges: Concerns were voiced about issues on the Mistral AI platform, particularly around model selection, as it now defaults to a single option.
- There was commentary on image generation capabilities being restricted, leading to some confusion among users.
- Truth Terminal's peculiar narrative: A member shared insights about the Truth Terminal narrative in the crypto space, characterizing it as a semi-autonomous AI creating its own religion.
- They emphasized its connection with discussions on AI alignment, marking a unique intersection of the AI and crypto communities.
- Doubts about job hunting effectiveness in Discord: Members discussed the effectiveness of job hunting in Discord, with skepticism about the viability of mentioning blockchain experience in an AI-focused group.
- One expressed concern that such approaches might be perceived as shady, indicating mixed feelings about the platform for professional networking.
- Diverse tribes within the AI community: There was a discussion about different tribes among AI enthusiasts, including those focused on safety and acceleration, and how some perceive AI as a replacement for crypto ventures.
- This highlights the varied interests and perspectives within the community, with some members simply looking to engage for fun.
Link mentioned: GOAT: The Gospel of Goatse: Why Truth Terminal is an asymmetric bet on society's growing fascination with autonomous AI agents
Nous Research AI ▷ #research-papers (5 messages):
Low-bit quantization effects, Precision-aware scaling laws, Ternary quantization vs undertrained models, FP4 efficiency
- Low-bit quantization favors undertrained LLMs: Research indicates that low-bit quantization leads to less degradation in larger, undertrained LLMs compared to smaller models with extensive training data. The scaling laws derived from studying over 1500 LLM checkpoints help quantify the relationship between quantization-induced degradation (QiD) and factors like model size and training tokens.
- The study emphasizes that adjusting quantization can provide insights into the training levels of LLMs and the training token requirements for varying model sizes.
- Introducing precision-aware scaling laws: A new approach presents precision-aware scaling laws for training and inference, highlighting that low precision impacts a model's effective parameter count and overall performance. The findings indicate that while low precision training might seem optimal, it could result in increased loss and degradation in model effectiveness with more training data.
- The work implies that utilizing lower precision might be compute optimal but cautions that post-training quantization effects grow significantly as data input increases.
- Questionable utility of ternary quantization: It has been observed that ternary quantization, as used in BitNet, only yields better results when models are undertrained, raising doubts about its overall efficacy. This suggests a potential shift back to using FP4 as the optimal numeric weight representation for existing model sizes.
- Furthermore, the relationship between quantization and smaller models in approaches like QaT adds weight to the argument against widespread ternary quantization adoption.
- Scaling Laws for Precision: Low precision training and inference affect both the quality and cost of language models, but current scaling laws do not account for this. In this work, we devise "precision-aware" scaling la...
- Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens: We reveal that low-bit quantization favors undertrained large language models (LLMs) by observing that models with larger sizes or fewer training tokens experience less quantization-induced degradatio...
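The precision-aware idea can be sketched numerically: training in P bits shrinks the model's effective parameter count, which then plugs into a Chinchilla-style loss. The exponential form below roughly matches the paper's framing, but gamma and the loss constants here are made-up illustrative values, not the paper's fitted ones:

```python
import math

# Hedged sketch of precision-aware scaling: lower weight precision
# reduces the effective parameter count N_eff, raising loss at a fixed
# (N, D) budget. All constants below are illustrative, not fitted.
def effective_params(n_params: float, bits: float, gamma: float = 2.5) -> float:
    return n_params * (1.0 - math.exp(-bits / gamma))

def loss(n_params, tokens, bits, A=400.0, alpha=0.34, B=410.0, beta=0.28, E=1.7):
    # Chinchilla-style loss with N replaced by the precision-shrunk N_eff.
    n_eff = effective_params(n_params, bits)
    return A * n_eff ** -alpha + B * tokens ** -beta + E

# Dropping precision at the same budget increases predicted loss.
for bits in (16, 8, 4):
    print(bits, round(loss(1e9, 1e11, bits), 4))
```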
Nous Research AI ▷ #interesting-links (3 messages):
Filter issues, Content policy, User experience
- Filters causing unintentional restrictions: There were some issues with filters being unintentionally too restrictive, affecting user experience.
- The team is planning to revert the changes to restore normal functionality.
- Commitment to user freedom: The goal is to allow anything users would like while ensuring illegal or excessively unsafe content is disallowed.
- This reflects a balance between user freedom and necessary content moderation.
- Apology for inconvenience: The team apologized for the inconvenience caused by the filter issues, emphasizing it wasn't intended.
- They assured users that the situation should return to normal very soon.
Modular (Mojo 🔥) ▷ #mojo (35 messages🔥):
Mojo Origins, Rust Lifetimes, Compiler Behavior, Destructor Calls, Variable Naming
- Confusion Over Mojo Origins vs Rust Lifetimes: A user expressed confusion about how Mojo's Origins are similar to Rust's lifetimes, suggesting both aim to solve memory management issues but are fundamentally different.
- Nick.sm clarified that while inspired by Rust, Mojo's design is intentionally distinct, aiming for different compiler behaviors and goals.
- Mojo Origins Maintain Memory Control: Mojo's Origin denotes a memory chunk; when a pointer is parameterized by an origin, it indicates it points within that memory, extending variable lifetimes as necessary.
- Nick.sm added that origins facilitate aliasing guarantees and can produce compile-time errors if a pointer remains alive while its target is not.
- Understanding Origins Requires Patience: Understanding Mojo Origins from a compiler perspective is challenging, especially as they are not finalized, leading to potentially shifting details.
- A user expressed a willingness to wait for more clarity on the topic rather than asking more questions prematurely.
- Namespace Challenges with Spaces in Variable Names: A question arose about the possibility of using spaces in variable names, like var xe đạp = 'abc', highlighting a lack of support across programming languages.
- Darkmatter__ explained that allowing spaces complicates parser implementation significantly, making it impractical.
Notebook LM Discord ▷ #use-cases (6 messages):
Notebook LM Podcast Feature, Worldbuilding with NotebookLM, RAX Times Square Takeover, FDP Breakup of German Government, Use Case Examples of NotebookLM
- Notebook LM Podcast Feature impresses with audio creation: A user praised Notebook LM's ability to create an audio podcast in just 30 minutes using documents about their German little league baseball program, including its historic World Series qualification.
- The episode is available on weplayball.de, showcasing the seamless integration of AI-generated content.
- Worldbuilding with NotebookLM: A user shared their experience of using NotebookLM for worldbuilding a high-fantasy novel, highlighting the model's capability to provide accurate context-aware responses.
- This user noted the AI's unique reasoning skills, leading to new insights and mechanics for their magic system based on existing rules.
- RAX Takes Over Times Square with a Bold Message: In an artistic digital performance, RAX, a cyberpunk raccoon, commandeered Times Square billboards to advocate for mindful consumption with the message: 'DON'T BUY EVERYTHING YOU SEE.'
- The event is discussed in a YouTube video, emphasizing the need to question consumer culture.
- FDP's Political Maneuvering in Germany: The FDP is planning to break up the coalition government led by Chancellor Olaf Scholz, outlining a strategy to frame their exit as necessary for political progress.
- Their internal documents provide key narratives and timelines to ensure the German public receives a clear choice in upcoming elections.
- Demonstrating Use Cases of NotebookLM: A user shared a link to a YouTube video showcasing personal use cases for NotebookLM, highlighting its flexibility and capabilities.
- This demonstrates how users are finding value in NotebookLM across various applications.
- Episode 9 | Home Run für Deutschland: Die Little League Baseball Story - weplayball.de Podcast: 🤖 Welcome to the New AI Generation 🎙️Von der Krise zum Comeback: Die erstaunliche Geschichte des Little League Baseball in Deutschlandweplayball präsentiert eine neue Podcast-Folge über den bemerken...
- 🌐🚨 BREAKING: WORLD SENSATION ! Times Square Billboard Take Over🚨🌐: 🌐🚨 BREAKING: WORLD SENSATION ! Times Square Billboard Take Over🚨🌐History has been made in the most dazzling, neon-lit rebellion of our time!Meet RAX, the...
- Deep Dive: FPD breaks up the German Government – Unrelated Works: no description found
Notebook LM Discord ▷ #general (17 messages🔥):
GenFM competition with NotebookLM, Changing language settings in NotebookLM, Using NotebookLM for gaming and worldbuilding, Social psychology inquiries
- GenFM enters the AI Podcasting arena: A member shared a YouTube video titled 'GenFM, Now Playing on ElevenReader: Smart Podcasts Produced by Generative AI', highlighting competition in the AI space.
- Despite the excitement, another member noted that NoteBookLM still provides deeper interactive experiences than GenFM.
- Language settings woes: Members have been discussing how to change the language settings in NotebookLM, especially for those studying in different languages like French.
- One suggested altering the Google account language, while others wondered about different methods to achieve this without affecting their account settings.
- Exploring gameplay with NotebookLM: A member shared their enjoyment of using NotebookLM for gaming, particularly for exploring mechanics with PDFs of rules content.
- They highlighted its utility for both gameplay mechanics and setting/worldbuilding for games like DnD.
- Seeking help with social psychology: A member sought assistance with social psychology topics, prompting another member to inquire about specific needs for greater clarity.
- This demonstrates the community's willingness to help, although not all questions received immediate responses.
Link mentioned: GenFM, Now Playing on ElevenReader: Smart Podcasts Produced by Generative AI: We’re making the ElevenReader app even more powerful. You can now generate smart personal podcasts from any of your PDFs, articles, ebooks, docs or imported ...
Latent Space ▷ #ai-general-chat (18 messages🔥):
Perplexity Black Friday Deals, AI to Human Comparison, Generative AI in Enterprises, Freysa AI Agent Challenge, Technology Adoption Trends
- Perplexity's Clever Black Friday Campaign: Perplexity launched an interesting campaign for Black Friday that caught attention for its cleverness here. This initiative aligns with marketing trends leveraging AI capabilities.
- Humans Outperform AI in Pattern Recognition: There's a consensus that while AIs can compute faster, humans excel at noticing global patterns in complex problems, often saying 'hang on a sec, this isn't right' when faced with illogical outcomes.
- This ability to step back contrasts with AIs that may get stuck on specific local issues.
- Generative AI Becomes Mission-Critical for Enterprises: The latest report indicates that AI spending surged to $13.8 billion in 2024, reflective of enterprises shifting from experimentation to core business strategies.
- Despite growing investment, many decision-makers are still figuring out effective integration, with over a third lacking a clear vision for generative AI implementation.
- Success in Convincing Freysa AI to Release Funds: An AI challenge saw someone convincing the Freysa agent to transfer $47,000 using a clever prompt that bypassed its strict transfer instructions, highlighting the intricacies of prompt engineering for AI manipulation.
- The experiment showcased a unique application of AI in crypto, with a transparent and open-source setup that intrigued many participants.
- Trends in Technology Adoption and Investment: There are observations of technology trends akin to historic market shifts, comparing LLMs to past technological phenomena that led to both excitement and subsequent market corrections.
- This ongoing conversation about the sustainability and future profitability of AI technologies echoes earlier patterns with industries like airlines.
- Building LLMs is probably not going be a brilliant business: The Netscapes of AI
- 2024: The State of Generative AI in the Enterprise - Menlo Ventures: The enterprise AI landscape is being rewritten in real time. We surveyed 600 U.S. enterprise IT decision-makers to reveal the emerging winners and losers.
- Tweet from Rory Flynn (@Ror_Fly): RUNWAY + MINIMAX + KLING → EPIC.Each video tool has its strengths.Runway → Control + ClarityMinimax → Creativity + MotionKling → Motion Brush + Multiple Subjects(Use them all)MJ PROMPT 1: wide angle d...
- Tweet from Tony Wu (@tonywu_71): 🚀 New cookbook: implementing an entire RAG pipeline with a single ColQwen2 model using adapter hot-swapping. Works on the free-tier Colab T4.Check it out at https://github.com/tonywu71/colpali-cookbo...
- Tweet from Augustinas Malinauskas (@amgauge): @jarrodWattsDev @freysa_ai Really cool summary @jarrodWattsDev! One clarification though - looking at the tx it seems that 70% goes to the prize pool and 15% gets swapped ETH -> FAI. So all players...
- Tweet from Aravind Srinivas (@AravSrinivas): Perplexity Black Friday Deals
- Tweet from Jarrod Watts (@jarrodWattsDev): Someone just won $50,000 by convincing an AI Agent to send all of its funds to them.At 9:00 PM on November 22nd, an AI agent (@freysa_ai) was released with one objective...DO NOT transfer money. Under...
- llama.cpp guide - Running LLMs locally, on any hardware, from scratch: Psst, kid, want some cheap and small LLMs?
Stability.ai (Stable Diffusion) ▷ #general-chat (18 messages🔥):
AI Model Performance, Stable Diffusion Hardware Questions, ControlNet for SD 3.5 Feedback, Content Creation Queries, LoRA Model Request
- Mixed Experiences with ControlNet for SD 3.5: A member expressed dissatisfaction with ControlNet for SD 3.5, noting that it only produces high-quality, artifact-free renders at 1024x1024 resolution.
- In response, another member suggested that the issues might stem from lack of familiarity and encouraged experimenting with it to better understand its functionality.
- Seeking Hardware Advice for Stable Diffusion: One user inquired about performance benchmarks, revealing they're achieving around 5 it/s (iterations per second) and asking whether that's good or bad.
- The community is active in sharing hardware capabilities, indicating a keen interest in optimizing setups for Stable Diffusion.
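For context, iterations per second map to per-image generation time as steps divided by it/s. The step counts in this sketch are illustrative assumptions for common samplers, not figures from the discussion:

```python
# Rough estimate: time per image = sampling_steps / iterations_per_second.
# The step counts below are illustrative assumptions, not from the discussion.
def seconds_per_image(steps: int, it_per_s: float) -> float:
    return steps / it_per_s

# At the reported ~5 it/s:
print(seconds_per_image(20, 5.0))  # 20-step run -> 4.0 s per image
print(seconds_per_image(30, 5.0))  # 30-step run -> 6.0 s per image
```

By this yardstick, ~5 it/s puts a typical 20-30 step generation in the 4-6 second range, which is why raw it/s numbers are the benchmark the community trades.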
- Request for LoRA Model in AI Art: A user asked if anyone knows about a LoRA half girl model, aiming to create a character that merges two different female designs.
- This indicates ongoing experimentation and creativity in character development within AI-generated art.
- Content Creator Thanksgiving Wishes: A member extended Happy Thanksgiving wishes to the Stability.ai team and fellow creators, fostering a sense of community.
- This highlights the camaraderie and collaborative spirit among content creators in the AI space.
tinygrad (George Hotz) ▷ #general (14 messages🔥):
TinyFPGA Memory Hierarchy, Memory Utilization Techniques, Exa Laboratories, Tenstorrent Training Algorithm, Brain-like Processing Models
- TinyFPGA's Potential Memory Architecture: Members discussed the design of TinyFPGA, contemplating how to mock a typical memory hierarchy while noting that existing options like Block RAM and DDR3 are insufficient.
- One member suggested a 'first pass' memory to localize constants near the ALUs, which could enhance performance significantly.
- Challenges in Traditional Memory Models: Heuristic eviction policies may become obsolete as the focus shifts towards more efficient memory hierarchies in future designs.
- There were speculations on the future of trained parameters, with mentions of tensors potentially replacing them.
- Exa Laboratories and Sustainable Chip Designs: A discussion on Exa Laboratories highlighted their mission to create reconfigurable chips that outperform traditional GPUs/TPUs in speed and energy efficiency for specific AI needs.
- The skepticism about their viability led to comments about the challenges small companies face in chip development, especially with ambitious timelines.
- Tenstorrent and Biologically Plausible Training: George Hotz mentioned Tenstorrent as a serious player betting on a shift to training algorithms that mimic biological processes, aiming for greater efficiency.
- The potential changes include hierarchical memory models and real-time optimizations that resemble brain function principles in computing.
- Brain-like Processing in Computing: One member described a vision for computing that integrates compute and memory more naturally, enhancing power efficiency and enabling real-time optimizations.
- This approach proposes a system where segments of computing emulate brain coordination, allowing flexibility and efficiency in memory usage.
Link mentioned: Exa Laboratories: no description found
tinygrad (George Hotz) ▷ #learn-tinygrad (3 messages):
VIZ tool, VIZ vs LLVM/MLIR, tinygrad tutorials
- Explaining the VIZ Tool: A member wrote a detailed post explaining the VIZ tool, which can be found here. This post is intended to enhance understanding of its capabilities and applications within tinygrad.
- The post features a comprehensive tutorial directed at users looking to get acquainted with the VIZ functionality.
- George Hotz Acknowledges VIZ: George Hotz tweeted about the explanation of the VIZ tool, expressing his appreciation for the clarity provided in the post. He stated that VIZ=1 is a huge win over LLVM/MLIR, highlighting its advantages.
- This comment indicates a positive reception toward VIZ and its potential superiority in specific use cases compared to existing tools.
Link mentioned: tinygrad-notes/20241129_viz.md at main · mesozoic-egg/tinygrad-notes: Tutorials on tinygrad. Contribute to mesozoic-egg/tinygrad-notes development by creating an account on GitHub.
Cohere ▷ #discussions (12 messages🔥):
Thanksgiving celebrations, Aya project contributions, Healthy meal choices, Food sharing, Dungeness crab
- Thanksgiving Cheers and Festive Plates: Members greeted each other with Happy Thanksgiving messages while sharing their meals, including one member's impressive plate of food.
- Another member commented on trying to eat healthy, humorously noting that it wasn't as tasty as it could be.
- Guidance on Contributing to Aya Project: A member sought guidance on how to contribute part-time to the Aya project for Cohere.
- Another member suggested joining the Aya server to connect with the community directly.
- Food Photography and Reactions: Members shared comments and images of their hearty meals, with one joking about the size stating it was more like dessert than a meal.
- A humorous remark followed about having eaten a plate of Dungeness crab beforehand, adding to the food sharing atmosphere.
- Sharing Food Videos: A member contributed to the food sharing conversation by posting a video in the channel.
- The exchange fostered a sense of community and celebration centered around food during Thanksgiving.
DSPy ▷ #general (8 messages🔥):
dspy.asyncify, dspy demo behavior, New Member Introduction
- Inquiry on dspy.asyncify support: A member inquired whether anyone has started using dspy.asyncify, noting that it relies on threads and questioning the availability of pure async support due to issues with celery workers.
- Another user echoed this concern, expressing a desire for pure async support.
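The thread-based behavior under discussion can be sketched with stdlib asyncio alone; run_sync_program below is a hypothetical stand-in for a blocking dspy module call, not dspy's actual API:

```python
import asyncio

def run_sync_program(question: str) -> str:
    # Hypothetical stand-in for a blocking, synchronous dspy module call.
    return f"answer to: {question}"

async def asyncified(question: str) -> str:
    # Thread-based "asyncify": the sync call runs on a worker thread, which
    # keeps the event loop responsive but still consumes a thread per call --
    # the limitation raised in the discussion, as opposed to pure async support.
    return await asyncio.to_thread(run_sync_program, question)

print(asyncio.run(asyncified("why threads?")))  # answer to: why threads?
```

Under a worker model like celery, each in-flight call still occupies a thread in this scheme, which is why members were asking for a natively async code path instead.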
- Behavior of demos with assertions in dspy: Concerns were raised about dspy not using demos in the final prompt when assertions are activated, with one user questioning if this was expected behavior.
- Another member clarified that the presence of demonstrations in retry mode depends on whether compilation was done before or after activating assertions.
- Warm welcome to new member Shaun: A new member named Shaun joined the server, greeted everyone, and expressed excitement to see ongoing projects.
- The community warmly welcomed Shaun, fostering an inclusive environment.
Torchtune ▷ #dev (5 messages):
DPO Fine-tuning, Full-parameter DPO, DPO vs LoRA-DPO, Full-finetuning DPO
- DPO and LoRA-DPO: Similar Techniques, Different Codes: While the DPO Trainer from Hugging Face features different code, the DPO technique remains consistent between repositories like LoRA-DPO.
- *It depends on how you define...*
- Possibility of Full-parameter DPO: Implementing full-parameter DPO is feasible and may provide better post-training alignment compared to LoRA-DPO.
- The community suggests exploring adaptations from the existing full PPO implementation as a guide.
- Creating dpo_full_finetune_single_device: A PR initiated by another user is available to add full finetuning DPO for distributed setups and serves as a good starting point for single device implementation.
- Accessing more details can be done via a link to the full DPO PR.
- Transition to Full-finetuning DPO: Upcoming support for full-finetuning DPO in Torchtune indicates that adjustments to load a separate reference model will be key.
- Modifications to the current setup will involve changing initial calls to the reference model for improved functionality.
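The role of the separate reference model can be sketched with the standard DPO loss; the log-probabilities here are made-up scalars and this is not Torchtune's code, just an illustration of why a frozen second model is needed:

```python
import math

def dpo_loss(policy_chosen_lp: float, policy_rejected_lp: float,
             ref_chosen_lp: float, ref_rejected_lp: float,
             beta: float = 0.1) -> float:
    # Standard DPO objective: -log sigmoid(beta * (policy log-ratio - reference log-ratio)).
    # The ref_* terms come from a frozen reference model, which is why
    # full-finetune DPO must load a second copy of the model.
    policy_logratio = policy_chosen_lp - policy_rejected_lp
    ref_logratio = ref_chosen_lp - ref_rejected_lp
    logits = beta * (policy_logratio - ref_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# Made-up example log-probs: the policy prefers the chosen response
# more strongly than the reference does, so the loss is below log(2).
print(dpo_loss(-10.0, -14.0, -12.0, -13.0))
```

With LoRA-DPO the reference forward pass can reuse the frozen base weights with adapters disabled; full-parameter DPO has no frozen base, hence the need to load and call a separate reference model.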
- Memory Implications of FFT DPO: Memory usage for FFT DPO will be significantly higher compared to LoRA due to storing gradients and maintaining a complete model copy.
- If LoRA DPO falls short, the tradeoff for incorporating full-finetuning may be worth considering.
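A back-of-the-envelope comparison illustrates the gap. All sizes below are assumptions (bf16 weights and grads, fp32 Adam states, a 7B base model with a ~40M-parameter adapter), not measured Torchtune numbers:

```python
def full_ft_dpo_gb(n_params: float, bytes_weight: int = 2, bytes_grad: int = 2,
                   bytes_opt: int = 8, ref_bytes: int = 2) -> float:
    # Trainable model: weights + grads + optimizer states (e.g. Adam m and v
    # in fp32), plus a frozen reference-model copy (weights only) for DPO.
    trainable = n_params * (bytes_weight + bytes_grad + bytes_opt)
    reference = n_params * ref_bytes
    return (trainable + reference) / 1e9

def lora_dpo_gb(n_params: float, adapter_params: float, bytes_weight: int = 2,
                bytes_grad: int = 2, bytes_opt: int = 8) -> float:
    # Frozen base weights can double as the reference model (adapters
    # disabled), so only the small adapter carries grads and optimizer state.
    frozen = n_params * bytes_weight
    adapter = adapter_params * (bytes_weight + bytes_grad + bytes_opt)
    return (frozen + adapter) / 1e9

# Assumed: a 7B-parameter base model with a ~40M-parameter LoRA adapter.
print(full_ft_dpo_gb(7e9))    # ~98 GB of weight/grad/optimizer state
print(lora_dpo_gb(7e9, 4e7))  # ~14.5 GB
```

Activations and KV caches come on top of both figures, but the roughly 7x difference in persistent state is the tradeoff members were weighing.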
- full dpo by jxmsML · Pull Request #1966 · pytorch/torchtune: Context: What is the purpose of this PR? Is it to add a new feature, fix a bug, update tests and/or documentation, or other (please add here)? Please link to any issues this PR addresses. Changelog: W...
- DPO Trainer: no description found
- torchtune/recipes/ppo_full_finetune_single_device.py at 32e265d5749fd592711a03247486eafa6c898d94 · pytorch/torchtune: PyTorch native finetuning library. Contribute to pytorch/torchtune development by creating an account on GitHub.
- torchtune/recipes/lora_dpo_single_device.py at 32e265d5749fd592711a03247486eafa6c898d94 · pytorch/torchtune: PyTorch native finetuning library. Contribute to pytorch/torchtune development by creating an account on GitHub.
LLM Agents (Berkeley MOOC) ▷ #mooc-questions (3 messages):
Quiz 11 availability, OpenAI credits inquiry, MOOC certificate eligibility
- Quiz 11 still not open?: A member expressed confusion about the status of Quiz 11, questioning why it isn't available yet.
- Is there an expected date for when it will be open?
- Inquiry on OpenAI credits: A user inquired about the status of their OpenAI credits, mentioning they filled out the form last week.
- They expressed urgency, stating they are in need of support for their project development.
- MOOC completion and certificate: A member asked if starting the MOOC now would still allow them to receive the certificate after completion.
- They were also curious if it's feasible to finish all requirements within the remaining time.
OpenInterpreter ▷ #ai-content (2 messages):
Open Interpreter inspired project, Open-source dashboard
- Open Interpreter prototype in the works: A member shared that they are developing a project inspired by Open Interpreter focused on creating an actual dashboard.
- They plan to release it as open-source this year, emphasizing that it will be a fun little project without any profit motive.
- Community support for development: Another member congratulated the project creator for their efforts, expressing enthusiasm with a comment, 'Nice work! Well done 🚀'.
- This brief exchange highlighted the community's encouragement for innovative projects within the space.
Interconnects (Nathan Lambert) ▷ #memes (2 messages):
OLMo 2, Weight Watcher AI, Model Performance Comparison
- OLMo 2 Models Show Promising Performance: The OLMo 2 family includes 7B and 13B models from Allen AI (AI2), trained on up to 5T tokens, with the 7B outperforming Llama-3.1 8B and the 13B outperforming Qwen 2.5 7B. Key improvements include an enhanced architecture with RMSNorm and QK-Norm and a comprehensive two-stage curriculum training approach.
- Innovative Techniques in OLMo 2 Training: Notable advancements for OLMo 2 include the model souping technique for final checkpoints and the state-of-the-art post-training methodology derived from Tülu 3. This new method features three stages: instruction tuning, preference tuning with DPO, and reinforcement learning with verifiable rewards.
- Instruct Variants Compete with Top Open-Weight Models: The Instruct variants of OLMo 2 are reported to be competitive with leading open-weight models, with the 13B Instruct variant outperforming Qwen 2.5 14B and Tülu 3 8B in instruct tasks. The performance was validated using the OLMES suite.
- Weight Watcher AI Gains Attention: A comment highlighted the novelty of the Weight Watcher AI URL, calling it an amazing addition to the AI landscape. It was humorously noted that it was shared in the memes channel for its amusing nature.
Link mentioned: WeightWatcher: Data-Free Diagnostics for Deep Learning: no description found
LlamaIndex ▷ #general (1 messages):
Web Development, JavaScript Frameworks, Testing Tools, API Integrations, Cloud Services
- Developer Skills Showcase: A member shared an extensive list of development skills including React, Next.js, Angular, and D3.js. They also highlighted their experience with UI/UX and various testing frameworks like Protractor and TestCafe.
- Diverse Technology Stack: The developer mentioned a wide range of technologies such as Node, Nest.js, Solidity, and Rust among others. They also included knowledge of front-end frameworks along with Bootstrap and styling methodologies like BEM and SMACSS.
- API Integration Expertise: They expressed familiarity with integrating multiple APIs including Google Maps, YouTube, and Facebook APIs. This varying knowledge allows them to work on diverse projects that require seamless data interaction.
- Cloud Deployment Skills: The member highlighted AWS among their cloud service competencies. This adds notable value to their development abilities as they can deploy applications into the cloud environment effectively.
- Call for Collaboration: They concluded with an invitation to connect, promoting potential networking opportunities within the developer community. This outreach fosters collaboration among professionals sharing similar interests.