[AINews] Qdrant's BM42: "Please don't trust us"

Cardboard NPU

                July 6, 2024

            [AINews] Qdrant's BM42: "Please don't trust us"

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜

            Peer review is all you need.

AI News for 7/4/2024-7/5/2024. We checked 7 subreddits, 384 Twitters and 30 Discords (418 channels, and 3772 messages) for you. Estimated reading time saved (at 200wpm): 429 minutes. You can now tag @smol_ai for AINews discussions + try Smol Talk!

Qdrant is widely known as OpenAI's vector database of choice, and over the July 4 holiday they kicked off some big claims to replace the venerable BM25 (and even the more modern SPLADE), attempting to coin "BM42":

to solve the problem of semantic + keyword search by combining transformer attention for word importance scoring with collection-wide statistics like IDF, claiming advantages over every use case:

Only one problem... the results. Jo Bergum from Vespa (a competitor), pointed out the odd choice of Quora (a "find similar duplicate" questions dataset, not a Q&A retrieval dataset) as dataset and obviously incorrect evals if you know that dataset:

Specifically, the Quora dataset only has ~1.6 datapoints per query so their precision@10 number was obviously wrong claiming to have >4 per 10.
Nils Reimers of Cohere took BM42 and reran on better datasets for finance, biomedical, and Wikipedia domains, and sadly BM42 came up short on all accounts:

For their part, Qdrant has responded to and acknowledged the corrections, and published corrections... except still oddly running a BM25 implementation that scores worse than everyone else expects and conveniently worse than BM42.
Unfortunate for Qdrant, but the rest of us just got a lightning lesson in knowing your data, and sanity checking evals. Lastly, as always in PR and especially in AI, Extraordinary claims require extraordinary evidence.

Meta note: If you have always wanted to customize your own version of AI News, we have now previewed a janky early version of Smol Talk, which you can access here: https://smol.fly.dev

Table of Contents

AI Twitter Recap
AI Reddit Recap
AI Discord Recap
PART 1: High level Discord summaries
HuggingFace Discord
Stability.ai (Stable Diffusion) Discord
Unsloth AI (Daniel Han) Discord
Latent Space Discord
LM Studio Discord
CUDA MODE Discord
Perplexity AI Discord
LAION Discord
OpenAI Discord
Nous Research AI Discord
OpenRouter (Alex Atallah) Discord
Eleuther Discord
LangChain AI Discord
LlamaIndex Discord
Cohere Discord
OpenInterpreter Discord
Modular (Mojo 🔥) Discord
LLM Finetuning (Hamel + Dan) Discord
Interconnects (Nathan Lambert) Discord
OpenAccess AI Collective (axolotl) Discord
tinygrad (George Hotz) Discord
Torchtune Discord
AI Stack Devs (Yoko Li) Discord
DiscoResearch Discord
MLOps @Chipro Discord
Datasette - LLM (@SimonW) Discord

PART 2: Detailed by-Channel summaries and links
HuggingFace ▷ #announcements (1 messages):
HuggingFace ▷ #general (495 messages🔥🔥🔥):
HuggingFace ▷ #today-im-learning (2 messages):
HuggingFace ▷ #cool-finds (6 messages):
HuggingFace ▷ #i-made-this (32 messages🔥):
HuggingFace ▷ #reading-group (7 messages):
HuggingFace ▷ #computer-vision (4 messages):
HuggingFace ▷ #NLP (17 messages🔥):
HuggingFace ▷ #diffusion-discussions (3 messages):
Stability.ai (Stable Diffusion) ▷ #announcements (1 messages):
Stability.ai (Stable Diffusion) ▷ #general-chat (528 messages🔥🔥🔥):
Unsloth AI (Daniel Han) ▷ #general (267 messages🔥🔥):
Unsloth AI (Daniel Han) ▷ #announcements (1 messages):
Unsloth AI (Daniel Han) ▷ #off-topic (7 messages):
Unsloth AI (Daniel Han) ▷ #help (121 messages🔥🔥):
Unsloth AI (Daniel Han) ▷ #showcase (3 messages):
Unsloth AI (Daniel Han) ▷ #community-collaboration (10 messages🔥):
Latent Space ▷ #ai-general-chat (94 messages🔥🔥):
Latent Space ▷ #ai-announcements (5 messages):
Latent Space ▷ #llm-paper-club-west (34 messages🔥):
Latent Space ▷ #ai-in-action-club (243 messages🔥🔥):
LM Studio ▷ #💬-general (157 messages🔥🔥):
LM Studio ▷ #🤖-models-discussion-chat (130 messages🔥🔥):
LM Studio ▷ #🧠-feedback (3 messages):
LM Studio ▷ #⚙-configs-discussion (5 messages):
LM Studio ▷ #🎛-hardware-discussion (61 messages🔥🔥):
LM Studio ▷ #🧪-beta-releases-chat (2 messages):
LM Studio ▷ #amd-rocm-tech-preview (2 messages):
CUDA MODE ▷ #general (10 messages🔥):
CUDA MODE ▷ #triton (2 messages):
CUDA MODE ▷ #torch (7 messages):
CUDA MODE ▷ #algorithms (5 messages):
CUDA MODE ▷ #cool-links (1 messages):
CUDA MODE ▷ #beginner (17 messages🔥):
CUDA MODE ▷ #pmpp-book (4 messages):
CUDA MODE ▷ #jax (4 messages):
CUDA MODE ▷ #torchao (11 messages🔥):
CUDA MODE ▷ #off-topic (3 messages):
CUDA MODE ▷ #llmdotc (134 messages🔥🔥):
CUDA MODE ▷ #bitnet (3 messages):
Perplexity AI ▷ #general (165 messages🔥🔥):
Perplexity AI ▷ #sharing (13 messages🔥):
Perplexity AI ▷ #pplx-api (15 messages🔥):
LAION ▷ #general (185 messages🔥🔥):
LAION ▷ #research (2 messages):
LAION ▷ #resources (1 messages):
LAION ▷ #learning-ml (1 messages):
LAION ▷ #paper-discussion (1 messages):
OpenAI ▷ #ai-discussions (116 messages🔥🔥):
OpenAI ▷ #gpt-4-discussions (26 messages🔥):
OpenAI ▷ #prompt-engineering (16 messages🔥):
Nous Research AI ▷ #research-papers (1 messages):
Nous Research AI ▷ #datasets (1 messages):
Nous Research AI ▷ #off-topic (4 messages):
Nous Research AI ▷ #interesting-links (5 messages):
Nous Research AI ▷ #general (110 messages🔥🔥):
Nous Research AI ▷ #ask-about-llms (1 messages):
Nous Research AI ▷ #rag-dataset (8 messages🔥):
Nous Research AI ▷ #world-sim (10 messages🔥):
OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):
OpenRouter (Alex Atallah) ▷ #general (107 messages🔥🔥):
Eleuther ▷ #general (42 messages🔥):
Eleuther ▷ #research (28 messages🔥):
Eleuther ▷ #scaling-laws (5 messages):
Eleuther ▷ #interpretability-general (3 messages):
Eleuther ▷ #lm-thunderdome (18 messages🔥):
Eleuther ▷ #multimodal-general (1 messages):
LangChain AI ▷ #general (75 messages🔥🔥):
LangChain AI ▷ #share-your-work (3 messages):
LangChain AI ▷ #tutorials (1 messages):
LlamaIndex ▷ #announcements (1 messages):
LlamaIndex ▷ #blog (4 messages):
LlamaIndex ▷ #general (71 messages🔥🔥):
Cohere ▷ #general (45 messages🔥):
Cohere ▷ #project-sharing (17 messages🔥):
OpenInterpreter ▷ #general (57 messages🔥🔥):
OpenInterpreter ▷ #O1 (2 messages):
Modular (Mojo 🔥) ▷ #general (5 messages):
Modular (Mojo 🔥) ▷ #🔥mojo (22 messages🔥):
Modular (Mojo 🔥) ▷ #nightly (10 messages🔥):
Modular (Mojo 🔥) ▷ #mojo-marathons (17 messages🔥):
LLM Finetuning (Hamel + Dan) ▷ #general (12 messages🔥):
LLM Finetuning (Hamel + Dan) ▷ #🟩-modal (7 messages):
LLM Finetuning (Hamel + Dan) ▷ #jarvis-labs (1 messages):
LLM Finetuning (Hamel + Dan) ▷ #ankurgoyal_textsql_llmevals (2 messages):
LLM Finetuning (Hamel + Dan) ▷ #workshop-2 (1 messages):
LLM Finetuning (Hamel + Dan) ▷ #jeremy_python_llms (2 messages):
LLM Finetuning (Hamel + Dan) ▷ #axolotl (3 messages):
LLM Finetuning (Hamel + Dan) ▷ #credits-questions (1 messages):
LLM Finetuning (Hamel + Dan) ▷ #predibase (1 messages):
LLM Finetuning (Hamel + Dan) ▷ #openai (1 messages):
Interconnects (Nathan Lambert) ▷ #news (5 messages):
Interconnects (Nathan Lambert) ▷ #other-papers (8 messages🔥):
Interconnects (Nathan Lambert) ▷ #random (9 messages🔥):
Interconnects (Nathan Lambert) ▷ #posts (4 messages):
OpenAccess AI Collective (axolotl) ▷ #general (20 messages🔥):
OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (1 messages):
OpenAccess AI Collective (axolotl) ▷ #axolotl-help-bot (1 messages):
tinygrad (George Hotz) ▷ #general (3 messages):
tinygrad (George Hotz) ▷ #learn-tinygrad (12 messages🔥):
Torchtune ▷ #general (8 messages🔥):
AI Stack Devs (Yoko Li) ▷ #ai-town-discuss (5 messages):
AI Stack Devs (Yoko Li) ▷ #assets (1 messages):
DiscoResearch ▷ #general (1 messages):
DiscoResearch ▷ #discolm_german (2 messages):
MLOps @Chipro ▷ #events (2 messages):
Datasette - LLM (@SimonW) ▷ #llm (1 messages):

AI Twitter Recap

all recaps done by Claude 3.5 Sonnet.

Stripe Issues and Alternatives

Stripe account issues: @HamelHusain noted Stripe is "holding all my money hostage" with "endless wall of red tape" despite no refund requests. @jeremyphoward called it "disgraceful" that Stripe canceled an account due to an "AI/ML model failure".
Appealing Stripe decisions: @HamelHusain appealed a Stripe rejection but got denied within 5 minutes, with Stripe "holding thousands of dollars hostage". 
Alternatives to Stripe: @HamelHusain noted needing a "backup plan" as "Getting caught by AI/ML false positives sucks." @virattt expressed caution about using Stripe after seeing many posts about issues.

AI and LLM Developments

Anthropic Constitutional AI: @Anthropic noted Claude 3.5 Sonnet suppresses parts of answers with "antThinking" tags that are removed on the backend, which some disagree with being hidden.
Gemma 2 model optimizations: @rohanpaul_ai shared Gemma 2 can be finetuned 2x faster with 63% less memory using the UnslothAI library, allowing 3-5x longer contexts than HF+FA2. It can go up to 34B on a single consumer GPU.
nanoLLaVA-1.5 vision model: @stablequan_ai announced nanoLLaVA-1.5, a compact 1B parameter vision model with significantly improved performance over v1.0. Model and spaces were linked.
Reflection as a Service for LLMs: @llama_index introduced using reflection as a standalone service for agentic LLM applications to validate outputs and self-correct for reliability. Relevant papers were cited.

AI Art and Perception 

AI vs human art perception poll: @bindureddy posted a poll with 3 AI generated images and 1 human artwork, challenging people to identify the human one, as a "quick experiment" on art perception.
AI art as non-plagiarism: @bindureddy argued AI art is not plagiarism as it does the "same thing humans do" in studying work, getting inspired, and creating something new. Exact replicas are plagiarism, but not brand new creations.

Memes and Humor

Zuckerberg meme video: @GoogleDeepMind shared a meme video of Mark Zuckerberg reacting. @BrivaelLp joked about Zuckerberg's "masterclass" in transforming into a "badass tech guy".
Caninecyte definition: @c_valenzuelab jokingly defined a "caninecyte" as a "type of cell characterized by its resemblance to a dog" in a mock dictionary entry.
Funny family photos: @NerdyRodent humorously asked "Why is it that when I go through old family pictures, someone always has to stick their tongue out?" with an accompanying pixelated artwork.

AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity. Comment crawling works now but has lots to improve!

AI Progress and Implications

Rapid pace of AI breakthroughs: In /r/singularity, a post highlights how compressed recent AI advances are in the grand scheme of human history, with modern deep learning emerging in just the last "second" if human existence was a 24-hour day. However, an article questions the economic impact of AI so far despite the hype.
AI humor abilities: Studies show AI-generated humor being rated as funnier than humans and on par with The Onion, though some /r/singularity commenters are skeptical the AI jokes are that original. 
OpenAI security breach: The New York Times reports that in early 2023, a hacker breached OpenAI's communication systems and stole info on AI development, raising concerns they aren't doing enough to prevent IP theft by foreign entities.
Anti-aging progress: In a YouTube interview, the CSO of Altos Labs discusses seeing major anti-aging effects in mice from cellular reprogramming, with old mice looking young again. Human trials are next.

AI Models and Capabilities

New open source models discussed include Kyutai's Moshi audio model, the internlm 2.5 xcomposer vision model, and T5/FLAN-T5 being merged into llama.cpp.
An evaluation of 180+ LLMs on code generation found DeepSeek Coder 2 beat LLama 3 on cost-effectiveness, with Claude 3.5 Sonnet equally capable. Only 57% of responses compiled as-is.

AI Safety and Security

/r/LocalLLaMA discusses ways to secure LLM apps, including fine-tuning to reject unsafe requests, prompt engineering, safety models, regex filtering, and not rewriting user prompts.
An example is shared of Google's Gemini AI repeating debunked information, showing current AI can't be blindly trusted as factual.

AI Art and Media

Workflows are shared for generating AI singers using Stable Diffusion, MimicMotion, and Suno AI, and using ComfyUI to generate images from a single reference.
/r/StableDiffusion discusses a new open source method for transferring facial expressions between images/video, and the emerging role of AI Technical Artists to build AI art pipelines for game studios.
/r/singularity predicts a resurgence in demand for live entertainment as AI displaces online media.

Robotics and Embodied AI

Videos are shared of Menteebot navigating an environment, a robot roughly picking tomatoes, and Japan developing a giant humanoid robot to maintain railways.
A tweet calls for development of open source mechs.

Miscellaneous

/r/StableDiffusion expresses concern about the sudden disappearance of the Auto-Photoshop-StableDiffusion plugin developer.
An extreme horror-themed 16.5B parameter LLaMA model is shared on Hugging Face.
/r/singularity discusses a "Singularity Paradox" thought experiment about when to buy a computer if progress doubles daily, with comments noting flaws in the premise.

AI Discord Recap

A summary of Summaries of Summaries

1. LLM Performance and Optimization

New models like Llama 3, DeepSeek-V2, and Granite-8B-Code-Instruct are showing strong performance on various benchmarks. For example, Llama 3 has risen to the top of leaderboards like ChatbotArena, outperforming models like GPT-4-Turbo and Claude 3 Opus.

Optimization techniques are advancing rapidly:

ZeRO++ promises 4x reduction in communication overhead for large model training.
The vAttention system aims to dynamically manage KV-cache memory for efficient LLM inference.
QServe introduced W4A8KV4 quantization to boost cloud-based LLM serving performance.

2. Open Source AI Ecosystem

Tools like Axolotl are supporting diverse dataset formats for LLM training.
LlamaIndex launched a course on building agentic RAG systems.
Open-source models like RefuelLLM-2 are being released, focusing on specific use cases.

3. Multimodal AI and Generative Models

New multimodal models are enhancing various capabilities:

Idefics2 8B Chatty focuses on improved chat interactions.
CodeGemma 1.1 7B refines coding abilities.
Phi 3 brings AI chatbots to browsers via WebGPU.
Combinations of models (e.g., Pixart Sigma + SDXL + PAG) are aiming to achieve DALLE-3 level outputs.

4. Stability AI Licensing

Stability AI revised the license for SD3 Medium after community feedback, aiming to provide more clarity for individual creators and small businesses.
Discussions about AI model licensing terms and their impact on open source development are ongoing across multiple communities.
Stability AI's launch of Stable Artisan, a Discord bot integrating various Stable Diffusion models for media generation and editing, was a hot topic (Stable Artisan Announcement). Users discussed the implications of the bot, including questions about SD3's open-source status and the introduction of Artisan as a paid API service.

5. Community Tools and Platforms

Stability AI launched Stable Artisan, a Discord bot integrating models like Stable Diffusion 3 and Stable Video Diffusion for media generation within Discord.
Nomic AI announced GPT4All 3.0, an open-source local LLM desktop app, emphasizing privacy and supporting multiple models and operating systems.

6. New LLM Releases and Benchmarking Discussions:

Several AI communities discussed the release of new language models, such as Meta's Llama 3, IBM's Granite-8B-Code-Instruct, and DeepSeek-V2, with a focus on their performance on various benchmarks and leaderboards. (Llama 3 Blog Post, Granite-8B-Code-Instruct on Hugging Face, DeepSeek-V2 on Hugging Face)
Some users expressed skepticism about the validity of certain benchmarks, calling for more credible sources to set realistic standards for LLM assessment.

7. Optimizing LLM Training and Inference:

Across multiple Discords, users shared techniques and frameworks for optimizing LLM training and inference, such as Microsoft's ZeRO++ for reducing communication overhead (ZeRO++ Tutorial), vAttention for dynamic KV-cache memory management (vAttention Paper), and QServe for quantization-based performance improvements (QServe Paper).
Other optimization approaches like Consistency LLMs for parallel token decoding were also discussed (Consistency LLMs Blog Post).

8. Advancements in Open-Source AI Frameworks and Datasets:

Open-source AI frameworks and datasets were a common topic across the Discords. Projects like Axolotl (Axolotl Dataset Formats), LlamaIndex (Building Agentic RAG with LlamaIndex Course), and RefuelLLM-2 (RefuelLLM-2 on Hugging Face) were highlighted for their contributions to the AI community.
The Modular framework was also discussed for its potential in Python integration and AI extensions (Modular Blog Post).

9. Multimodal AI and Generative Models:

Conversations surrounding multimodal AI and generative models were prevalent, with mentions of models like Idefics2 8B Chatty (Idefics2 8B Chatty Tweet), CodeGemma 1.1 7B (CodeGemma 1.1 7B Tweet), and Phi 3 (Phi 3 Reddit Post) for various applications such as chat interactions, coding, and browser-based AI.
Generative modeling techniques like combining Pixart Sigma, SDXL, and PAG for high-quality outputs and the open-source IC-Light project for image relighting were also discussed (IC-Light GitHub Repo).

10. New Model Releases and Training Tips in Unsloth AI Community:

The Unsloth AI community was abuzz with discussions about new model releases like IBM's Granite-8B-Code-Instruct (Granite-8B-Code-Instruct on Hugging Face) and RefuelAI's RefuelLLM-2 (RefuelLLM-2 on Hugging Face). Users shared their experiences with these models, including challenges with Windows compatibility and skepticism over certain performance benchmarks. The community also exchanged valuable tips and insights on model training and fine-tuning.

PART 1: High level Discord summaries
HuggingFace Discord

Vietnamese Linguistics Voiced: Vi-VLM's Vision**: The Vi-VLM team announced a Vision-Language model tailored for Vietnamese, integrating Vistral and LLaVA frameworks to focus on image descriptions. Viewers can find the demo and supporting code in the linked repository.
Dataset Availability: Vi-VLM released a dataset specific for VLM training in Vietnamese, which is accessible for enhancing local language model applications. The dataset adds to the linguistic resources available for Southeast Asian languages.

Grappling with Graphics: WHAM's Alternative Search**: An enthusiast sought alternatives to WHAM for human pose estimation in complex videos, pointing out the ungainly Python and CV dependencies. The community exchange hints at a need for tools that accommodate non-technical users in sophisticated AI applications.
Learning resources for ViT and U-Net implementations were shared, including a guide from Zero to Mastery and courses by Andrew Ng, indicating community interest in mastering these vision transformer models.

Tuning In: Audio-Language Model Discourse**: Moshi's linguistic fluidity: Yann LeCun shared a tweet spotlighting Kyutai.org's digital pirate that can comprehend English spoken with a French accent, showcasing the model's diverse auditory processing capabilities.
Interest in the Flora paper and audio-language models remains strong, reflecting the AI community's focus on cross-modal faculties. Upcoming paper reading sessions on these topics are anticipated with enthusiasm.

Frozen in Thought: The Mistral Model Stalemate**: Users reported a crawling halt in the Mistral model's inference process, notably at iteration 1800 out of 3000, suggesting possible caching complications. This reflects on the pragmatic challenge of managing resources during extensive computational tasks.
Conversations surfaced around making effective API calls without downloading models locally, highlighting the need for streamlined remote inference protocols. The API-centric dialogue underscores a trend towards more flexible, cloud-based ML operations.

Diffusion Discussion: RealVisXL and ISR**: RealVisXL V4.0, optimized for rendering photorealistic visuals, is now in training, with an official page and sponsorship on Boosty, spotlighting community support for model development.
The existing IDM-VTON's 'no file named diffusion_pytorch_model.bin' error in Google Colab exemplifies the troubleshooting dialogs that emerge within the diffusion model space, emphasizing the practical sides of AI deployment.

Stability.ai (Stable Diffusion) Discord

Clearing the Confusion: Stability AI’s Community License: Stability AI has revised their SD3 Medium release license after feedback, offering a new Stability AI Community License that clarifies usage for individual creators and small businesses. Details are available in their recent announcement, with the company striking a balance between commercial rights and community support.
Users can now freely use Stability AI's models for non-commercial purposes under the new license, providing an open source boon to the community and prompting discussions about how these changes could impact model development and accessibility.

Anime AI Model's Metamorphosis: Animagine XL 3.1: The Animagine XL 3.1 model by Cagliostro Research Lab and SeaArt.ai is driving conversations with its enhancements over predecessor models, bringing higher quality and broader range of anime imagery to the forefront.
The AAM Anime Mix XL has also captured attention, sparking a flurry of comparisons with Animagine XL 3.1, as enthusiasts discuss their experiences and preferences between the different anime-focused generation models.

Debating the GPU Arms Race: Multi-GPU Configurations: The technical community is actively discussing the optimization of multiple GPU setups to boost Stable Diffusion's performance, with emphasis on tools like SwarmUI that cater to these complex configurations.
The debates converge on the challenges of efficiently managing resources and achieving high-quality outputs, highlighting the combination of technical prowess and creativity required to navigate the evolving landscape of AI model training.

CivitAI's SD3 Stance Spurring Controversy: CivitAI's move to ban SD3 models has divided opinion within the community, as some view it as a potential roadblock for the development of the Stable Diffusion 3 framework.
The discussions deepen with insights into licensing intricacies, commercial implications, and the overall trajectory of how this decision could shape future collaborations and model evolutions.

License and Limits: Stable Diffusion Under Scrutiny: The latest conversations scrutinize the license for Stable Diffusion 3 and its compatibility with both individual and enterprise usage, considering the community's need for clarity and freedom in AI model experimentation.
Community sentiment is split, as discussions pivot around whether the perceived license restrictions unfairly penalize smaller projects or whether they're an inherent part of maturing technologies in the field of AI.

Unsloth AI (Daniel Han) Discord

Gemma's Quantum Leap: The new Gemma 2 has hit the tech scene, boasting 2x faster finetuning and a lean VRAM footprint, requiring 63% less VRAM (Gemma 2 Blog). Support for hefty 9.7K token contexts on the 27B model was a particular highlight among Unsloth users.
The marred launch with notebook issues such as mislabeling was glossed over by a community member's remark on the rushed blogpost, but those issues have been swiftly tackled by developers (Notebook Fixes).

Datasets Galore at Replete-AI: Replete-AI has introduced two extensive datasets, Everything_Instruct and its multilingual cousin, each packing 11-12GB of instruct data (Replete AI Datasets). Over 6 million rows are at AI developers' disposal, promising to fuel the next wave of language model training.
The community's enthusiasm was tempered with quality checks, probing the datasets for deduplication and content balance, a nod to the seasoned eye for meticulous dataset crafting.

Notebooks Nailed to the Mast: Requests in collaboration channels have led to a commitment for pinning versatile notebooks, assisting members to swiftly home in on valuable resources.
Continued efforts were seen with correcting notebook links and the promise to integrate them into the Unsloth GitHub page, showcasing a dynamic community-driven documentation process (GitHub Unsloth).

Patch and Progress with Unsloth 2024.7: Unsloth's patch 2024.7 got mixed reception due to checkpoint-related errors, yet it marks an important stride by integrating Gemma 2 support into Unsloth's ever-growing toolkit (2024.7 Update).
Devoted users and Unsloth's responsive devs are on top of the fine-tuning foibles and error resolutions, evidencing a robust feedback loop essential for fine-grained model optimization.

Facebook's Controversial Token Tactics: Facebook's multi-token prediction model stirred debate over access barriers, stirring a whirlwind of opinions among Unsloth's tight-knit community.
Critical views on data privacy were par for the course, specifically relating to the need for sharing contact data to utilize Facebook's model, fueling an ongoing conversation on ethical AI usage (Facebook's Multi-Token Model).

Latent Space Discord

Sprinting on Funding Runway: Following a link to rakis, community members discussed the whopping $85M seed investment intersecting AI with blockchain, sparking conversations on the current venture capital trends in technology.
The developers of BM42 faced heat for potentially skewed benchmarks, leading to a vigilant community advocating for rigorous evaluation practices; this prompted a revised approach to their metrics and datasets.

Collision Course: Coding Tools: Users compared git merge tool experiences, singling out lazygit and Sublime Merge, driving the conversation towards the need for more nuanced tools for code conflict resolution.
Claude 3.5 and other AI-based tools grabbed the spotlight in discussions for their prowess in coding assistance, emphasizing efficiency in code completion and capabilities like handling complex multi-file refactors.

Tuning into Technical Talk: On the Latent Space Podcast, Yi Tay from Reka illuminated the process of developing a training stack for frontier models while drawing size and strategy parallels with teams from OpenAI and Google Gemini.
Listeners were invited to engage on Hacker News with the live discussion, bridging the gap between the podcast and broader AI research community dialogues.

Navigating AV Troubles: OpenAI's AV experienced disruptions during the AIEWF demo, with voices for a switch to Zoom ensuing, followed by a swift action resulting in sharing a Zoom meeting link for better AV stability.
Compatibility issues between Discord and Linux persisted as a recurrent technical headache, prompting users to explore more Linux-friendly communication alternatives.

Deconstructing Model Merger Mania: Debates on model merging tactics took center stage with participants mulling the differing objectives and potential integrative strategies for tools like LlamaFile and Ollama.
The conversation dived into the possibilities of wearable technology integration with AI for enhancing event experiences, paired with a deep consideration for privacy and consent.

LM Studio Discord

Snapdragon's Surprising Speed: The Surface Laptop with Snapdragon Elite showcased heft, hitting 1.5 seconds to first token and 10 tokens per second on LLaMA3 8b with 8bit precision, whilst only using 10% GPU. No NPU activity yet, but the laptop's speed stirred speculation on eventual NPU boosts to LLaMA models.
Tech enthusiasts compared Snapdragon's CPU prowess to older Intel counterparts, finding the former's velocity vivacious. Amidst laughter, the tech tribe teases about a makeshift Cardboard NPU, projecting potential performance peaks pending proper NPU programming.

Quantization Quirks and Code Quests: Quantization quandaries arose with Gemma-2-27b, where model benchmarks behaved bizarrely across different quantized versions. Meanwhile, tailored system prompts polished performance for Gemma 2 27B, prompting PEP 8-adhering and efficient algorithm emission.
Suggestions surfaced that Qwen2 models trot best with ChatML and a flash attention setting, while users with non-CUDA contraptions cautioned against the chaos of IQ quantization, noting notably nicer behavior on alternative architectures.

LM Studio's ARM Standoff: A vexed user voiced frustration when LM Studio's AppImage defied a dance with aarch64 CPUs. The error light shone, signaling a syntax struggle, and a lamenting line confirmed, "No ARM CPU support on Linux."
Dialogues dashed hopes for immediate ARM CPU inclusions, leaving Linux loyalists longing. A shared sibling sentiment suggested an architecture adjustment for LM Studio belongs on the horizon but hasn't hit home base just yet.

RTX's Rocky Road: RTX 4060 8GB VRAM owners opined their predicament with 20B quantized models; a tenacious tussle with tokens terminated in total system freezes. Fellow forum members felt for them, flashing back to their own fragmentary RTX 4060 experiences.
Guild guidance gave GPU grievances a glimmer of hope, heralding less loaded models like Mistral 7B and Open Hermes 2.5 for mid-tier machine mates. A commendatory chorus rose for smaller souls, steering clear of titanic token-takers.

ROCm's Rescue Role: Seeking solace from stifled sessions, users with 7800XT aired their afflictions as models muddled up, missing the mark on GPU offload. A script signalled success, soothing overtaxed systems seeking ROCm solace.
The cerebral collective converged on solutions, corroborating the effectiveness of the ROCm installation script. Joyous jingles jived in the forum, as confirmation came that GPGPU gurus had gathered a workaround worthy of the wired world.

CUDA MODE Discord

CUDA Conundrums & Mixed-precision MatMul: Discussions in the CUDA MODE guild veered into optimizing matrix multiplication using CUDA, highlighting a blog post on techniques for column loading in GPU matrix multiplication; another thread featured the release of customized gemv kernels for int2*int8 and the BitBLAS library for mixed-precision operations.
Users explored TorchDynamo's role in PyTorch performance, and compared ergonomics of Python vs C++ for CUDA kernel development, with Python favored for its agility in initial phases. Some faced challenges adapting to Python 3.12 bytecode changes with torch.compile, addressed in a recent discussion.

GPTs Crafting Executive Summaries & Model Training Trials: A blog post detailing the use of GPTs for executive summary drafting sparked interest, while LLM training trials with FP8 gradients were flagged for increasing losses, prompting a switch to BF16 for certain operations.
Schedule-Free Optimizers boasted smoother loss curves, with empirical evidence of convergence benefits shared by users, meanwhile, a backend SDE's transition to CUDA inference optimization was deliberated with suggestions spanning online resources, course recommendations, and community involvement.

AI Podcasts & Keynotes Spark Engaging Discussions: Lightning AI's Thunder Sessions podcast with Luca Antiga and Thomas Viehmann caught the attention of community members, whereas Andrej Karpathy's keynote at UC Berkeley was a highlighter of innovation and student talent.
Casual conversations and channel engagement painted a picture of an interactive forum, with members sharing brief notes of excitement or appreciation, yet holding back on deeper technical exchanges in channels tagged as less focused.

Deep Learning Frameworks & Triton Kernel Fixes: The quest to build a deep learning framework from scratch in C++, akin to tinygrad, uncovered the complexity hurdle, kindling a debate on the affordances of C++ vs Python in this context, while Triton kernel's tl.load issues in parallel CUDA graph instances required ingenuity to circumnavigate latency concerns.
Further intricacies surfaced when discussing the functioning of the .to method in torch.ao, where current limitations restrict dtype and memory format changes, prompting temporary function amendments as discussed in issue trackers and commit logs.

Perplexity AI Discord

Llamas Looping Lines: Repetition Glitch in AI**: Users experienced Perplexity AI outputting repetitive responses across models such as Llama 3 and Claude, and were reassured that the issue was being addressed with an imminent fix.
Alex confirmed the issue's recognition and the ongoing efforts to rectify it, marking a pressing concern within the Perplexity AI's performance benchmark.

Real-Time Reality Check Fails: Live Access Hiccups**: A gap in expectations has emerged as Perplexity AI users face live internet data retrieval issues, receiving obsolete rather than up-to-date information.
Despite attempts to resolve the inaccuracies by restarting the application, the users indicated the problem persistence in the feedback channel.

Math Model Missteps: Perplexity Pro's Calculation Challenges**: Perplexity Pro's computations, such as CAPM beta, were highlighted for inaccuracies despite its GPT-4o origins, casting shadows on its reliable academic application.
The community expressed its dissatisfaction and concerns regarding the model's utility in fields requiring exact mathematical problem solving.

Stock Market Success Stories: Perplexity’s Profitable Predictions**: Anecdotes of financial victories like making $8,000 surfaced among users who harnessed Perplexity AI for stock market decisions, triggering conversations on its varied benefits.
Such user stories serve as testimonials to the diverse capabilities of the Pro version of Perplexity AI in real-world use cases.

Subscription Scrutiny: Decoding Perplexity AI Plans**: Questions and comparisons flourished as users delved into the differences between Pro and Enterprise Pro plans, particularly concerning model allocations like Sonnet and Opus.
Enquiries were directed at understanding not just availability but also the specificity of models included in Perplexity’s varied subscription offerings.

LAION Discord

BUD-E Board Expansion: BUD-E now reads clipboard text, a new feature shown in a YouTube video with details on GitHub. The feature demo, presented in low quality, sparked light-hearted comments.
The community discussed AI model training challenges due to recurrent usage of overlapping datasets, with FAL.AI's dataset access hurdles highlighting the issue. Contrastingly, breakthroughs like Chameleon are linked to a variety of integrated data.

Clipdrop Censorship Confusion: Clipdrop's NSFW detection misfired, mislabeling a benign image as inappropriate, much to the amusement of the community.
Stability AI revises license for SD3 Medium, now under the Stability AI Community License, allowing increased access for individual creators and small businesses after community feedback.

T-FREE Trend Setter: The new T-FREE tokenizer, detailed in a recently released paper, promises sparse activations over character triplets, negating the need for large reference corpora and potentially reducing embedding layer parameters by over 85%.
The approach is praised for enhancing performance on less common languages and slimming embedding timers, adding a compact edge to LLMs.

Alert: Scammer in the Guild: A scammer was flagged in the #[research] channel, putting the community on high alert.
A string of identical phishing links offering a '$50 gift card' was posted across multiple channels by a user, raising concerns.

OpenAI Discord

Voices in the Void: The unveiling of a new Moshi AI demo sparked a mix of excitement for its real-time voice interaction and disappointment over issues with interruptions and looped responses.
Hume AI's playground was scrutinized for its lack of long-term memory, frustrating users who seek persistent AI conversations.

Memory Banks in Question: GPT's memory prowess came under fire as it saves user preferences but still fabricates responses, with members suggesting enhanced customization to mitigate this.
A heated GPT-2 versus modern models debate surfaced, comparing the cost-efficiency of older models with the performance leaps in current iterations like GPT-3.5 Turbo.

ChatGPT: Free vs. Plus Plans: Advantages of the paid ChatGPT Plus plan were clarified, detailing perks such as a higher usage cap, DALL·E access, and an expanded context window.
GPT-4 usage concerns were addressed, with cooldown periods in place after limit hits, specifically allowing Plus members up to 40 messages every 3 hours.

AI Toolbox Expansion: Community members explored tools for testing multiple AI responses to prompts, suggesting a custom-built tool and existing options for efficient assessment.
Conversation turned to API integrations, looking at Rigorous Aggregate Generators (RAG) for linking AI models to diverse datasets and utilizing existing Assistant API endpoints.

Contest with Context: In #prompt-engineering, strategies for contesting traffic tickets were delineated, advising structured approaches and legal argumentation techniques.
Discussions blossomed over creating an employee recognition program to heighten workplace morale, focusing on goals and recognition criteria for notable contributions.

Nous Research AI Discord

Datasets Deluge by Replete-AI: Replete-AI dropped two gargantuan datasets, titled Everything_Instruct and Everything_Instruct_Multilingual, boasting 11-12GB and over 6 million data stripes. Intent is to amalgamate variegated instruct data to advance AI training.
The Everything_Instruct targets English, while Everything_Instruct_Multilingual brings in a linguistic mix to broaden language handling of AI. Both sets echo past successes like bagel datasets and take a cue from EveryoneLLM AI models. Dive in at Hugging Face.

Nomic AI Drops GPT4All 3.0: The latest by Nomic AI, GPT4All 3.0, hits the scene as an open-source, local LLM desktop app catering to a plethora of models and prioritizing privacy. The app is noted for its redesigned user interface and is licensed under MIT. Explore its features.
Touting more than a quarter-million monthly active users, GPT4All 3.0 facilitates private, local interactions with LLMs, cutting internet dependencies. Uptake has been robust, signaling a shift towards localized and private AI tool usage.

InternLM-XComposer-2.5 Raises the Bar: InternLM introduced InternLM-XComposer-2.5, a juggernaut in large-vision language models that brilliantly juggles 24K interleaved image-text contexts and scales up to 96K via RoPE extrapolation.
This model is a frontrunner with top-tier results on 16 benchmarks, closing in on behemoths like GPT-4V and Gemini Pro. Brewed with a sprinkle of innovation and a dash of competitive spirit, this InternLM concoction awaits.

Claude 3.5's Conundrum and Lockdown: Attempts to bypass the ethical constraints in Claude 3.5 Sonnet led to frustration among users, with strategies around specific pre-prompts making little to no dent.
Despite the resilience of Claude's restrictions, suggestions to experiment with Anthropic's workbench were shared. Yet, users were cautioned about the risks of account restrictions following such endeavors. Peer into the conversation.

Apollo's Artistic AI Ascent: Achyut Benz bestowed the Apollo project upon the world, an AI that crafts visuals akin to the admired 3Blue1Brown animations. Built atop Next.js, it taps into GroqInc and interweaves both AnthropicAI 3.5 Sonnet & GPT-4.
Apollo is all about augmenting the learning experience with AI-generated content, much to the enjoyment of the technophile educator. Watch Apollo's reveal.

OpenRouter (Alex Atallah) Discord

Quantum Leap in LLM Deployment: OpenRouter's deployment strategy for LLM models specifies FP16/BF16 as the default quantization standard, with exceptions noted by an associated quantization icon.
The adaptation of this quantization approach has sparked detailed discussions on the technical implications and efficiency gains.

API Apocalypse Averted by OpenRouter: A sudden change in Microsoft's API could have spelled disaster for OpenRouter users, but a swift patch brought things back in line, earning applause from the community.
The fix restored harmony, reflecting OpenRouter’s readiness for quick turnarounds in the face of technical disruptions.

Infermatic Instills Privacy Confidence: In an affirmative update, Infermatic declared its commitment to real-time data processing with its new privacy policy, explicitly stating it won’t retain input prompts or model outputs.
This update brought clarity and a sense of security to users, distancing the platform from previous data retention concerns.

DeepSeek Decodes Equation Enigma: Users troubleshooting issues with DeepSeek Coder found a workaround for equations not rendering by ingeniously using regex to tweak output strings.
Persistent problems with TypingMind's frontend not correctly processing prompts were flagged for a fix, demonstrating proactive community engagement.

Pricey API Piques Peers: Debate heated up around Mistral's Codestral API pricing strategy, with the 22B model considered overpriced by some community members.
Users steered each other towards more budget-friendly alternatives like DeepSeek Coder, which offers competitive coding capabilities without breaking the bank.

Eleuther Discord

Fingerprints of the Digital Minds: The community explored Topological Data Analysis (TDA) for unique model fingerprinting and debated the utility of checksum-equivalent metrics for model validation, such as for the LlamaForCausalLM using tools like lm-evaluation-harness.
Discussions also touched on Topological Data Analysis to profile model weights by their invariants, referencing resources like TorchTDA and considering bit-level innovations from papers like 1.58-bit LLMs for efficiency.

Tales of Scaling and Optimization: Attention was given to the efficientcube.ipynb notebook for scaling laws, while AOT compilation capabilities in JAX were highlighted as a step forward in pre-execution code optimization.
FLOPs estimation methods for JIT-ed functions in Flax were shared, and critical batch sizes were reinvestigated, challenging the assumption that performance is unaffected below a certain threshold.

Sparse Encoders and Residual Revelations: The deployment of Sparse Autoencoders (SAEs) trained on Llama 3 8B's residual stream discussed utilities for integrating with LLMs for better processing, furnished with details on the model's implementation.
Looking into residual stream processing, the strategy organized SAEs by layer for optimizing their synergy with Llama 3 8B, as expanded upon in the associated model card.

Harnessing the Horsepower of Parallel Evaluation: Enthusiast surfaced questions on the viability of caching preprocessed inputs and resolving Proof-Pile Config Errors, noting that changing to lambada_openai circumvented the issue.
Notables included model name length issues, prompting OSError(36, 'File name too long'), and guidance was sought on setting up parallel model evaluation, with warnings about single-process evaluation assumptions.

LangChain AI Discord

LangChain Lamentations: LangChain users reported performance issues when running on CPU, with long response times and convoluted processing steps being a significant pain point.
The debate is ongoing whether the sluggishness is due to inefficient model reasoning or the absence of GPU acceleration, while some suggest it's bogged down by unnecessary complexity, as discussed here.

AI Model Showdown: OpenAI vs ChatOpenAI: Discussions ensued over the advantages of using OpenAI over ChatOpenAI as the former might be phased out, sparking a comparison of their implementation efficiencies.
Members shared mixed experiences around task-specific requirements, while some preferred OpenAI for its familiar interface and tooling.

Juicebox.ai: The People Search Prodigy: Juicebox.ai's PeopleGPT was praised for its Boolean-free natural language search capabilities to swiftly identify qualified talent, enhancing the talent search with ease-of-use features.
The technical community lauded its combination of filtering and natural language search, elevating the overall experience for users; details are available here.

RAG Chatbot Calendar Conundrum: A LangChain RAG-based chatbot developer sought guidance for integrating a demo scheduling function, highlighting the complexities found in the implementation process.
Community response was geared towards assisting with this integration, indicating a cooperative effort to enhance the chatbot's capabilities despite the absence of explicit links to resources.

Visual Vectored Virtuosity: A blogpost outlined creating an E2E Image Retrieval app using Lightly SSL and FAISS, complete with a vision transformer model.
The post, accompanied by Colab Notebook and Gradio app, was shared to encourage peer learning and application.

LlamaIndex Discord

LlamaIndex RAG-tastic Webinar Whirl: LlamaIndex partnered with Weights & Biases for a webinar demystifying the complexities involved in RAG experimentation and evaluation. The session promises insights into accurate LLM Judge alignment, with a spotlight on Weights and Biases collaboration.
Anticipation builds as the RAG pipeline serves as a focal point for the upcoming webinar, highlighting challenges in the space. A hint of skepticism over RAG's nuanced evaluation underscores community buzz around the event.

Rockstars of AI Edging Forward: Rising star @ravithejads shares his ascent in becoming a rockstar AI engineer and educator, fueling aspirations within the LlamaIndex community.
LlamaIndex illuminates @ravithejads's contribution to OSS and consistent engagement with AI trends, igniting discussions about pathways for professional development in AI.

Reflecting on 'Reflection as a Service': 'Reflection as a Service' enters the limelight at LlamaIndex, proposing an introspective mechanism to boost LLM reliability by adding a self-corrective layer.
This innovative approach captivated the community, sparking dialogue on its potential to enhance agentic applications through intelligent self-correction.

Cloud Function Challenges Versus Collaborative Fixes: Discussions surfaced on the Google Cloud Function regarding hardships with multiple model loading, sparking a collective search for more efficient methods among AI enthusiasts.
Community wisdom circulates as members share their strategies for reducing load times and optimizing model use, showcasing a collaborative spirit in problem-solving.

CRAG – Corrective Measures on Stage: Yan et al. introduce Corrective RAG (CRAG), an innovative LlamaIndex service designed to dynamically validate and correct irrelevant context during retrieval, stirring interest among AI practitioners.
Connections are drawn between CRAG and possibilities for advancing retrieval-augmented generation systems, fueling forward-thinking conversations on refinement and accuracy.

Cohere Discord

Open Invites to AI Soirees: Community members clarified that no special qualifications are necessary to attend the London AI event; simply filling out a form will suffice. The inclusive policy ensures that events are accessible to all, fostering a diverse exchange of ideas.
Discussion around event attendance highlighted the importance of community engagement and open access in AI gatherings, as these policies promote broader participation and knowledge sharing across fields and expertise levels.

API Woes in Production Mode: A TypeError issue was raised by a member deploying an app using Cohere's rerank API in production, sparking a troubleshooting thread in contrast with its smooth local operation.
The community’s collaborative effort in addressing the rerank API problems showcased the value of shared knowledge and immediate peer support in overcoming technical challenges in a production environment.

Fresh Faces in AI Development: Newly joined members of diverse backgrounds, including a Computer Science graduate and an AI developer focused on teaching, introduced themselves, expressing eagerness to contribute to the guild's collective expertise.
The warm welcome extended to newcomers underlines the guild's commitment to nurturing a vibrant community of AI enthusiasts poised for collaborative growth and learning.

Command R+ Steals the Limelight: Cohere announced their most potent model in the Command R family, Command R+, now ready for use, creating quite the buzz among the tech-savvy audience.
The release of Command R+ is seen as a significant step in advancing the capabilities and applications of AI models, indicating a continuous drive towards innovation in the field.

Saving Scripts with Rhea.run: The introduction of a 'Save to Project' feature in Rhea.run was met with enthusiasm as it allows users to create and preserve interactive applications through conversational HTML scripting.
This new feature emphasizes Rhea.run’s dedication to simplifying the app creation process, thereby empowering developers to build and experiment with ease.

OpenInterpreter Discord

MacOS Copilot Sneaks into Focus: The Invisibility MacOS Copilot featuring GPT-4, Gemini 1.5 Pro, and Claude-3 Opus was highlighted for its context absorption capabilities and is currently available for free.
Community members showed interest in potentially open-sourcing grav.ai to incorporate similar functionalities into the Open Interpreter (OI) ecosystem.

'wtf' Command Adds Debugging Charm to OI: The 'wtf' command allows Open Interpreter to intelligently switch VSC themes and provide terminal debugging suggestions, sparking community excitement.
Amazement over the command's ability to execute actions intuitively was voiced, with plans to share further updates on security roundtables and the upcoming OI House Party event.

Shipping Woes for O1 Light Enthusiasts: Anticipation and frustration were the tones within the community regarding the 01 Light shipments, as discussions revolved around delays.
Echoed sentiments of waiting reinforced the collective desire for clear communication on shipment timelines.

Modular (Mojo 🔥) Discord

Mojo Objects Go Haywire!: Members discussed a casting bug affecting Mojo objects compared to Python objects, potentially linked to GitHub Issue #328.
A debate ensued on whether the casting bug might be correlated with differences in object handling, as outlined in issues #3065 and #3167.

MLIR's Unsigned Integer Drama: The community discovered that MLIR interpreted unsigned integers as signed, sparking discussion and leading to the creation of GitHub Issue #3065.
Concern surged around how this unsigned integer casting issue could impact various users, pivoting the conversation to this emerging bug.

Compiler Nightly News: Segfaults and Solutions: Recent segfaults in the nightly build led to the submission of a bug report and sharing the problematic file, seen here.
Added to this, new compiler releases were announced, with improvements including an exclusive parameter and new methods in version 2024.7.505, linked in the changelog.

Marathon March: Mojo's Matrix Multiplication: Benny impressed by sharing a matrix multiplication technique and recommended tailoring block sizes, advising peers to consult UT Austin papers for insights.
In a separate discussion thread, speed bumps occurred with increased compilation times and segfaults in the latest test suite, with participants directing each other to resources such as a Google Spreadsheet for papers.

LLM Finetuning (Hamel + Dan) Discord

Solo Smithing Without Chains: Discussion confirmed LangSmith can operate independently of LangChain as demonstrated in examples on Colab and GitHub. LangSmith allows for instrumentation of LLMs, offering insights into application behaviors.
Community members assuaged concerns about GPU credits during an AI course, emphasizing proper communication of terms and directing to clear info on the course platform.

Credit Clarity & Monthly Challenges: A hot topic revolves around the $1000 monthly credit and its perishability, with consensus on no rollover but still appreciating the offer.
A user's doubt about a mysteriously increased balance of $1030 post-Mistral finetuning led to speculation on a possible $30 default credit per month.

Training Tweaks: Toiling with Tokens: A thread on the Meta-Llama-3-8B-Instruct setup using type: input_output sparked some confusion, with users examining special tokens and model configurations, referencing GitHub.
Trainers experienced better outcomes favoring L3 70B Instruct over L3 8B, serendipitously found when a configuration defaulted to the instruct model, highlighting model choice implications.

Credit Confusion & Course Catch-up: Uncertainty loomed about credit eligibility for services, with one member seeking clarification on terms post-enrollment since June 14th.
Another user echoed concerns about compute credit expiration, requesting an extension for the remaining credit which slipped through the calendar cracks.

Interconnects (Nathan Lambert) Discord

Debunking the Demo Dilemma: Community member challenged the legitimacy of an AI demo, calling into question the realism of its responses and highlighting significant response time problems. The thread included a link to the contentious demonstration.
In an apologetic pivot, Stability AI made revisions to the Stable Diffusion 3 Medium in response to community feedback, along with clarifications on their license, earmarking a path for future high-quality Generative AI endeavors.

Search Smackdown: BM42 vs. BM25: The Qdrant Engine touted its BM42 as a breakthrough in search technology, promising superior RAG integration over the long-established BM25, as seen in their announcement.
Critics, including Jo Bergum, questioned the integrity of BM42's reported success, suggesting the improbability of the claims and sparking debate on the validity of the findings presented on the dataset from Quora.

VAEs Vexation and AI Investment Acumen: A humorous account of the difficulties in grasping Variational Autoencoders surfaced, juxtaposed against a claim of exceptional AI investment strategy within the community.
A serious projection deduced that for AI to bolster GDP growth effectively, it must range between 11-15%, while the community continues to grapple with Anthropic Claude 3.5 Sonnet's opaque operations.

Google's Grinder in the Global AI Gauntlet: Users discussed Google's sluggish start in the Generative AI segment, expressing concerns over the company's messaging clarity and direction regarding its products like Gemini web app.
Discourse evolved around the pricing model and effectiveness of Google’s Gemini 1.5, with comparisons to other AI offerings and software like Vertex AI, amid reflections on the First Amendment's application to AI.

OpenAccess AI Collective (axolotl) Discord

API Queue System Quirks: Reports of issues with the build.nvidia API led to discovery of a new queue system to manage requests, signaling a potentially overloaded service.
A member encountered script issues with build.nvidia API, observing restored functionality after temporary downtime hinting at service intermittency.

YAML Yields Pipeline Progress: A member shared their pipeline's integration of YAML examples for few-shot learning conversation models, sparking interest for its application with textbook data.
Further clarifications were provided on how the YAML-based structure contributed to efficient few-shot learning processes within the pipeline.

Gemma2 Garners Stability: Gemma2 update brought solutions to past bugs. A reinforcement of version control with a pinned version of transformers ensures smoother future updates.
Continuous Integration (CI) tools were lauded for their role in preemptively catching issues, promoting a robust environment against development woes.

A Call for More VRAM: A succinct but telling message from 'le_mess' underlined the perennial need within the group: a request for more VRAM.
The single-line plea reflects the ongoing demand for higher performance computing resources among practitioners, without further elaboration in the conversation.

tinygrad (George Hotz) Discord

Tensor Trouble in Tinygrad: Discussions arise about Tensor.randn and Tensor.randint creating contiguous Tensors, while Tensor.full leads to non-contiguous structures, prompting an examination of methods that differ from PyTorch's expectations.
A community member queried about placement for a bug test in tinygrad, debating between test_nn or test_ops modules, with the final decision leaning towards an efficient and well-named test within test_ops.

Training Pains and Gains: Tinygrad users signal concerns about the framework's large-scale training efficiency, calling it sluggish and economically impractical, while considering employing BEAM search despite its complexity and time demands.
A conversation sparks around the use of pre-trained PyTorch models in Tinygrad, directing users towards tinygrad.nn.state.torch_load for effective model inference operations.

Matmul Masterclass: A blog post showcasing a guide to high-performance matrix multiplication achieves over 1 TFLOPS on CPU, shared within the community, detailing the practical implementation approach and source code.
The share included a link to the blog post that breaks down matrix multiplication into an accessible 150 line C program, inviting discussion on performance optimization in Tinygrad.

Torchtune Discord

Torchtune's Tuning Talk: Community members exchanged insights on setting evaluation parameters for Torchtune, with mentions of a potential 'validation dataset' parameter to tune performance.
Others raised concerns about missing wandb logging metrics, specifically for evaluation loss and grad norm statistics, highlighting a need for more robust metric tracking.

Wandb Woes and Wins: A topic of discussion was wandb's visualization capabilities, where a grad norm graph miss sparked questions about its availability compared to tools like aoxotl.
Suggestions included adjusting the initial learning rate to affect the loss curve, but despite optimizations, one member noted no significant loss improvements, emphasizing the challenges of parameter fine-tuning.

AI Stack Devs (Yoko Li) Discord

Code Clash: Python meets TypeScript: A challenging encounter was shared regarding the integration of Python with TypeScript while setting up the Convex platform. Issues surfaced when Convex experienced launch bugs stemming from a lack of pre-installed Python.
Furthermore, discussion revolved around the difficulties faced in automating the installation of the Convex local backend within a Docker environment, emphasizing the complication arose from the specific configuration of container folders as volumes.

Pixel Hunt: In Search of the Perfect Sprite: A member explored the domain of sprite sheets, expressing their goal to find visuals resonant with the Cloudpunk game's style, but found their assortment from itch.io lacking the desired cyberpunk nuance.
They are on the lookout for sprite resources that align better with Cloudpunk's distinctive aesthetic, as previous acquisitions fell short in mirroring the game's signature atmosphere.

DiscoResearch Discord

Summarizing with a GPT Trio: Three GPTs Walk into a Bar and Write an Exec Summary blog post showcases a dynamic trio of Custom GPTs designed to extract insights, draft, and revise executive summaries swiftly.
This toolkit enables the producing of succinct and relevant executive summaries under tight deadlines, streamlining the process for delivering condensed yet impactful briefs.

Magpie's Maiden Flight on HuggingFace: The Magpie model makes its debut on HuggingFace Spaces, offering a tool for generating preference data, albeit with a duplication from davanstrien/magpie.
User experiences reveal room for improvement, as feedback indicates that the model’s performance isn't fully satisfactory, yet the community remains optimistic about its potential applications.

MLOps @Chipro Discord

Build With Claude Campaigns Ahead: Engineering enthusiasts are called to action for the Claude hackathon, a creative coding sprint winding down next week.
Participants aim to craft innovative solutions, employing Claude's capabilities for a chance to shine in the closing contest.

Kafka's Cost-Cutting Conclave: A webinar set for July 18th at 4 PM IST promises insights into optimizing Kafka for better performance and reduced expenses.
Yaniv Ben Hemo and Viktor Somogyi-Vass will steer discussions, focusing on scaling strategies and efficiency in Kafka setups.

Datasette - LLM (@SimonW) Discord

Jovial Jest at Job Jargon: Conversations have sprouted around the growing potential uses for embeddings in the field, sparking some playful banter about job titles.
One participant quipped about renaming themselves an Embeddings AyEngineer*, lending a humorous twist to the evolving nomenclature in AI.

Title Tattle Turns Trendy: The rise in embedding-specific roles leads to a light-hearted suggestion of the title Embeddings Engineer.
This humorous proposition underscores the significance of embeddings in current engineering work and the community's creative spirit.

The LLM Perf Enthusiasts AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The YAIG (a16z Infra) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

PART 2: Detailed by-Channel summaries and links

HuggingFace ▷ #announcements (1 messages):

VLM training dataset in Vietnamese
Highlights parser
See 2 sound demo
text2cypher model
Guide to Designing New Functional Proteins

Vietnamese VLM Dataset Released: VLM training dataset in Vietnamese released by user. The dataset is now available for community use.
Highlights Parser Tool: Highlights parser tool created by user is now available. It helps users parse community highlights effectively.
See 2 Sound Demo: Check out the See 2 sound demo based on the newly released paper available on this space. It provides an innovative way to experience sound.
Text2Cypher Model Outperforms GPT-4: The new text2cypher model by user outperforms GPT-4. This model represents a significant advancement in text-to-cypher translation.
Guide to Designing Functional Proteins: Guide to Designing New Functional Proteins and improving them with Generative AI now available here. This guide covers protein function, stability, and diversity.

HuggingFace ▷ #general (495 messages🔥🔥🔥):

Use of Deepeval with HuggingFace Transformers
Proficiency certifications in ML
Uploading image on HuggingFace projects using Gradio API
GPU recommendations for ML beginners
Issues with renting A100 vs. 4090 GPUs for inference

Proficiency certifications in ML: Members discussed various certifications to validate ML skills, preferring free options from platforms like Harvard and Coursera.
GPU recommendations for ML beginners: Users debated between recommending RTX 3060 or 4060, considering VRAM and performance, with suggestions leaning towards 3060 for its 12GB VRAM.
Issues with renting A100 vs. 4090 GPUs for inference: A discussion revolved around renting GPU configurations for efficient ML model inference, with suggestions pointing towards H100 over multiple 4090s for better performance.
Creating video with AI models: The chat explored text-to-video generation AI models like the ipivs-morph-img2vid-animatediff-lcm-hyper-sd, noting that processing on standard devices is slow but feasible.
Stable Diffusion model licensing update: Stability AI revised the license for SD3 Medium to better support the open-source community, addressing previous issues with commercial use restrictions.

Links mentioned:
Startup Weekend Tokyo: no description found
Serverless GPU Endpoints for AI Inference: Run machine learning inference at scale with RunPod Serverless GPU endpoints.
How I train a LoRA: m3lt style training overview: no description found
Luma Dream Machine: Dream Machine is an AI model that makes high quality, realistic videos fast from text and images from Luma AI
Happyfourthofjuly July4th GIF - Happyfourthofjuly July4th - Discover & Share GIFs: Click to view the GIF
ERLAX on Instagram: "…#techno #dreamcore #rave #digitalart #aiart #stablediffusion": 2,738 likes, 151 comments - erlax.case on June 24, 2024: "… #techno #dreamcore #rave #digitalart #aiart #stablediffusion".
artificialguybr/doodle-redmond-doodle-hand-drawing-style-lora-for-sd-xl · Hugging Face: no description found
Reddit - Dive into anything: no description found
Google Colab: no description found
InstantStyle - a Hugging Face Space by InstantX: no description found
Tweet from lilbotomy☆ (@p4ino): this is how i tell stories
diffusion/zelda.ipynb at main · nroggendorff/diffusion: Contribute to nroggendorff/diffusion development by creating an account on GitHub.
internlm/internlm2_5-7b-chat-1m · Hugging Face: no description found
Tweet from undefined: no description found
I Just Work Here Idk GIF - I just work here Idk Idk about that - Discover & Share GIFs: Click to view the GIF
Community License — Stability AI: Our new Community License is now free for research, non-commercial, and commercial use. You only need a paid Enterprise license if your yearly revenues exceed USD$1M and you use Stability AI models in...
aheedsajid/Edge-TTS · 🚩 Report: Spam: no description found
Happy Tree Friends Htf GIF - Happy tree friends Htf Cuddles - Discover & Share GIFs: Click to view the GIF
Nick088/Stable_Diffusion_Finetuned_Minecraft_Skin_Generator · 🚩 Report: Spam: no description found
Nick088/SDXL-Flash · Need a btter version: no description found
Reddit - Dive into anything: no description found
Reddit - Dive into anything: no description found
Three GPTs Walk into a Bar and Write an Exec Summary – D-Squared: no description found

HuggingFace ▷ #today-im-learning (2 messages):

Building a TikTok videos dataset for harmful content classification
Troubleshooting LDM implementation with RGB images

TikTok Dataset to Classify Harmful Content: A user shared a TikTok videos dataset, 30 GB with around 3,000 videos, to build a video classification model for classifying harmful content for children. They also provided a notebook for fine-tuning a Hugging Face model on this dataset.
LDM Model Troubleshooting: A user is learning to create LDMs from scratch with Flax library, succeeding with the MNIST dataset but facing issues with RGB images from imagenette/160px-v2. They requested tips for troubleshooting as their model only generates color blocks for RGB images.

Links mentioned:
TikHarm Dataset: A dataset of TikTok videos for training models to classify harmful content.
How to Use Hugging Face for Fine-Tuning on the Tik: Explore and run machine learning code with Kaggle Notebooks | Using data from TikHarm Dataset

HuggingFace ▷ #cool-finds (6 messages):

Kyutai.org's digital pirate understands English with a French accent
Small demo of Moshi, an audio language model
Graph Structure Learning (GSL) with GraphEdit and large language models
Claude's ease in building Deep Learning Visualizer dashboards
nanoLLaVA - cool VLM under 1B

Kyutai's digital pirate gets language savvy: A tweet from Yann LeCun reveals that Kyutai.org's digital pirate can understand English with a French accent. This was demonstrated in a small demo by Neil Zegh from the Moshi project.
GraphEdit pushes the boundaries of GSL: The paper GraphEdit proposes a new approach to Graph Structure Learning using Large Language Models (LLMs) for enhanced reliability by instruction-tuning over graph structures.
nanoLLaVA attains attention: The Hugging Face space nanoLLaVA is highlighted as a cool Visual Language Model (VLM) under 1 billion parameters. It has been noted for its impressive visualization capabilities.

Links mentioned:
Tweet from Yann LeCun (@ylecun): Where we learn that http://Kyutai.org's digital pirate understands English with a French accent Quoting Guillaume Grallet (@guillaumgrallet) A small demo by ⁦@neilzegh⁩ from #moshi, an audio la...
nanoLLaVA-1.5 - a Hugging Face Space by qnguyen3: no description found
GraphEdit: Large Language Models for Graph Structure Learning: Graph Structure Learning (GSL) focuses on capturing intrinsic dependencies and interactions among nodes in graph-structured data by generating novel graph structures. Graph Neural Networks (GNNs) have...

HuggingFace ▷ #i-made-this (32 messages🔥):

Introduction of Vision-Language model for Vietnamese by Vi-VLM team
Vi-VLM releasing a dataset for VLM training in Vietnamese
Simple translation tool for converting messages to pt-br
CyclicFormer architecture enhancement for transformers
UVR5's UI completion for audio separation

Vi-VLM introduces Vision-Language model for Vietnamese: The Vi-VLM team introduced a Vision-Language model for Vietnamese, built on LLaVA and Vistral, with an image description focus; demo and code available here.
CyclicFormer enhances transformer performance: The CyclicFormer architecture introduces a cyclic loop between decoder layers to enhance transformer performance, GitHub link here.
E2E Image Retrieval app using Lightly SSL: An image retrieval app was built using an arbitrary image dataset from the Hub, leveraging FAISS for vector indexing and Lightly SSL for self-supervised learning, detailed in a blogpost.
Check out the Gradio app for a practical demonstration.

UVR5 UI for audio separation completed: UVR5's UI is now complete, allowing easy separation of vocals and instrumental tracks; it uses advanced audio separation models available via Gradio.
Perfect separation of voice and melody in various tests, including popular songs like 'Faroeste Caboclo' from 1987.

Simple translation tool for pt-br: A tool was created to translate community highlights into pt-br, useful for faster importing of messages; see the tool here.

Links mentioned:
rishitdagli/see-2-sound · Hugging Face: no description found
Highs Parser - a Hugging Face Space by rrg92: no description found
UVR5 UI - a Hugging Face Space by TheStinger: no description found
GitHub - LegallyCoder/CyclicFormer: CyclicFormer is a new architecture designed to enhance the performance of the transformer architecture. It introduces a new perspective for decoder layers, forming a cyclic loop between all the layers.: CyclicFormer is a new architecture designed to enhance the performance of the transformer architecture. It introduces a new perspective for decoder layers, forming a cyclic loop between all the lay...
Vi-VLM/Vista · Datasets at Hugging Face: no description found
Vi-VLM/Vistral-V-7B · Hugging Face: no description found
GitHub - hllj/Vistral-V: Vistral-V: Visual Instruction Tuning for Vistral - Vietnamese Large Vision-Language Model.: Vistral-V: Visual Instruction Tuning for Vistral - Vietnamese Large Vision-Language Model. - hllj/Vistral-V
Vector Indexes and Image Retrieval using lightly: Use a pre-trained Vision Transformer provided by Lightly to create a vector index on an arbitrary dataset for Image Retrieval using faiss
Food101 Image Retrieval - a Hugging Face Space by lightly-ai: no description found
Tweet from Saurav Maheshkar ☕️ (@MaheshkarSaurav): 🚀 Latest work at @LightlyAI. Learn how you can create an Image Retrieval app using FAISS (@AIatMeta) as an vector index 🗃️, model implementations from the Lightly SSL package and @weights_biases for...
Google Colab: no description found

HuggingFace ▷ #reading-group (7 messages):

triton paper reading
upcoming paper reading schedule
interest in audio-language models
flora paper discussion

Upcoming Paper Reading on Triton: A member apologized for delaying a planned paper reading on Triton due to being busy and invited others to present if interested. Participants were encouraged to contact another member for more information.
Flora Paper Gains Interest: A member expressed interest in the Flora paper, calling it cool. This paper seems to be gaining attention for an upcoming discussion.

HuggingFace ▷ #computer-vision (4 messages):

WHAM alternatives for human pose estimation in monocular, in-the-wild videos
Learning ViT and U-Net implementations
Using visual-semantic information to boost fine-grained image classification performance
Discussing zero/few shot multi-modal models at CVPR

Searching for WHAM alternatives for wrestling animations: A non-coder is looking for a machine learning method for human pose estimation in monocular, in-the-wild videos of complex human interactions like Brazilian jiu-jitsu. They struggled with WHAM due to its complex Python and CV dependencies and seek a more user-friendly alternative.
Learning ViT and U-Net from online resources: A member shared a link to learn ViT and U-Net implementations from the DL Specialization by Andrew Ng and CNN Course Week 3.
Boosting image classification using visual-semantic info: Another user inquired about leveraging visual-semantic information from captions/metadata to enhance fine-grained image classification performance beyond zero/few shot learning. Florence 2 was suggested as a potential model for this specific supervised fine-tuning.

Link mentioned: 08. PyTorch Paper Replicating - Zero to Mastery Learn PyTorch for Deep Learning: Learn important machine learning concepts hands-on by writing PyTorch code.

HuggingFace ▷ #NLP (17 messages🔥):

Meta-LLaMA download issues
API calls to models without local download
Inference freeze in Mistral model
Static KV cache documentation
Troubleshooting errors related to memory

*Meta-LLaMA download struggles: A user expressed frustration over Meta-LLaMA* taking forever to download and worried about their hard drive filling up due to potential temp files.
*API call confusion*: There was confusion on whether one could build an API call to a model without a local download, questioning the feasibility of this approach.
*Mistral model freezes at iteration 1800: Mistral* froze at iteration 1800 during inference of 3000 runs, whereas it worked fine for 100 inferences, leading to suspicion of some kind of caching problem.
*Static KV cache causes confusion: A user highlighted that the static KV cache* is on by default since version 4.41, suggesting checking the relevant release for more details.
*TypedStorage deprecation concern: Concerns were raised about TypedStorage* being deprecated, with a suggestion to wait for a stable solution before making any code changes.

Link mentioned: Release v4.38: Gemma, Depth Anything, Stable LM; Static Cache, HF Quantizer, AQLM · huggingface/transformers: New model additions 💎 Gemma 💎 Gemma is a new opensource Language Model series from Google AI that comes with a 2B and 7B variant. The release comes with the pre-trained and instruction fine-tuned v....

HuggingFace ▷ #diffusion-discussions (3 messages):

Running RealVisXL_V4.0_Lightning using diffusers
Error with yisol/IDM-VTON in Google Colab
Improving resume analyzer to assess project intensity

*RealVisXL V4.0 Lightning model release*: RealVisXL V4.0 Lightning is in training and supports photorealistic images in both sfw and nsfw categories. Users can support the creator on Boosty and find the CivitAI page here.
*Diffusers don't match A1111 quality*: A user reported that the RealVisXL V4.0 model works well with A1111 but produces poorer quality images with diffusers despite using the same parameters.
*Error with IDM-VTON in Google Colab*: A user is encountering a 'no file named diffusion_pytorch_model.bin' error while using yisol/IDM-VTON on Google Colab.
*Enhancing Resume Analyzer Beyond Keywords*: A user is seeking advice on creating a resume analyzer that evaluates project intensity rather than just matching keywords. They aim to differentiate between less complex tasks and more significant projects.

Link mentioned: SG161222/RealVisXL_V4.0_Lightning · Hugging Face: no description found

Stability.ai (Stable Diffusion) ▷ #announcements (1 messages):

License concerns with SD3 Medium release
Stability AI Community License update
Issues with commercial licensing in previous release
Improvement and support for open source community

Stability AI updates license for broader use: Stability AI acknowledged that their SD3 Medium release didn't meet community expectations and the associated commercial license caused confusion. They have revised the license for individual creators and small businesses, covered under the new Stability AI Community License, read the full update here.
Non-commercial use is free under new Stability AI License: Under the new Stability AI Community License, non-commercial use remains free. This change supports the open source community by giving broader access to recent releases, including SD3 Medium.

Link mentioned: Community License — Stability AI: Our new Community License is now free for research, non-commercial, and commercial use. You only need a paid Enterprise license if your yearly revenues exceed USD$1M and you use Stability AI models in...

Stability.ai (Stable Diffusion) ▷ #general-chat (528 messages🔥🔥🔥):

Hyper vs turbo
AAM Anime Mix XL
Animagine XL 3.1
Stable Diffusion GPU usage
CivitAI and SD3 discussions

Hyper is the new Turbo: Animagine XL 3.1 Updates: Users discussed the merits of the anime-themed model Animagine XL 3.1. This model improves on Animagine XL 3.0 with higher quality images and a broadened character range from well-known anime series, developed by Cagliostro Research Lab and SeaArt.ai.
AAM Anime Mix XL Gains Attention: A user shared their enthusiasm for AAM Anime Mix XL, another popular anime image generation model. This sparked comparisons and recommendations for related models like Animagine XL 3.1.
Struggles with Multiple GPU Configurations: Users discussed the challenges and potential solutions for using multiple GPU setups to improve Stable Diffusion's speed and output quality. Specific tools like SwarmUI were highlighted for their capabilities of handling multi-GPU operations.
CivitAI's SD3 Ban Sparks Debate: The community reacted to CivitAI's ban on SD3 models with mixed opinions. Many expressed that this move could hinder the development of SD3, while others discussed the technical and licensing issues surrounding the model.
Stable Diffusion Licensing and Model Updates: The conversation included concerns about the license for Stable Diffusion 3 and its new models. There were debates over whether the licensing terms were too restrictive, affecting both small and large business users.

Links mentioned:
Using A1111? Why not SwarmUI? - A transition guide | Civitai: I’ve recently transitioned from Forge to SwarmUI (previously known as StableSwarmUI), and I’m really glad I did! I had experimented with it before,...
SegMoE - The Stable Diffusion Mixture of Experts for Image Generation!: Mixture of experts. Seems hot for AI text generation... but what if you had a mixture of experts for IMAGE generation? Oh. Segmind just did that. Welcome to ...
How to Make Concept Art with AI (Free and Easy) - Stable Diffusion Tutorial 2022: ATTENTION! Lots has changed for the better since I made this video! Here’s my guide how to install and use Stable Diffusion in June 2023: https://youtu.be/nB...
cagliostrolab/animagine-xl-3.1 · Hugging Face: no description found
Reddit - Dive into anything: no description found
GitHub - vladmandic/automatic: SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models: SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models - vladmandic/automatic
GitHub - ltdrdata/ComfyUI-Manager: ComfyUI-Manager is an extension designed to enhance the usability of ComfyUI. It offers management functions to install, remove, disable, and enable various custom nodes of ComfyUI. Furthermore, this extension provides a hub feature and convenience functions to access a wide range of information within ComfyUI.: ComfyUI-Manager is an extension designed to enhance the usability of ComfyUI. It offers management functions to install, remove, disable, and enable various custom nodes of ComfyUI. Furthermore, th...
Models - Hugging Face: no description found
PhdExpert-CDvr4 - Poe: INPUT YOUR DESIRED LANGUAGE. [TOP NOTCH RESPONSE EXPECTED]
GitHub - kijai/ComfyUI-LivePortraitKJ: ComfyUI nodes for LivePortrait: ComfyUI nodes for LivePortrait. Contribute to kijai/ComfyUI-LivePortraitKJ development by creating an account on GitHub.
ptx0 (PseudoTerminal X): no description found
Three GPTs Walk into a Bar and Write an Exec Summary – D-Squared: no description found

Unsloth AI (Daniel Han) ▷ #general (267 messages🔥🔥):

Gemma 2 Release and its features
Issues with the Gemma 2 notebooks and user feedback
Methods for dataset preparation and handling long-context examples
Performance and optimization techniques for various LLMs
Recent advancements and announcements in AI models and tools

*Gemma 2 Release brings speed and VRAM improvements: The Gemma 2 Release is now available, claiming 2x faster finetuning and using 63% less VRAM compared to Flash Attention 2 (Gemma 2 Blog). Key details include support for up to 9.7K context lengths* with Unsloth.
"Blogpost was super rushed honestly I already found some mistakes," noted by a community member highlighting the fast-paced release.

*Unsloth notebooks and model directory issues: Users reported issues with the Gemma 2 notebooks, particularly errors related to model directory naming and missing configurations (e.g., unsloth/gemma instead of unsloth_gemma). Collaboration and quick fixes* were made by the developers to address these problems.
*Training on long-context examples and dataset preparation techniques: Members discussed techniques for handling long-context datasets, with some examples reaching up to 78,451 tokens. Suggestions included setting appropriate context lengths and using specific functions to find max tokens* in a dataset.
Sharing functions and discussing prompt engineering methods were common themes. Practical advice like, "you can choose the tone in the instruction part," were shared to help users better format their data for model training.

*Gemma 2 performance and limitations in the absence of Flash Attention support: Without Flash Attention support, Gemma 2 models are reported to be notably slow and almost unusable for intensive tasks*. This highlights the significant impact of optimized attention mechanisms on model performance.
Community members suggested that gradacc (gradient accumulation) might be a more efficient approach than traditional batching, with one noting, "If anything, gradacc was faster."

*New AI models and tools announcements: Nomic AI announced GPT4ALL 3.0*, a new open-source local LLM desktop app, emphasizing privacy and local data processing (GPT4ALL 3.0 Announcement). It's praised for supporting thousands of models and major operating systems.
InternLM-XComposer-2.5 was also mentioned, highlighting its capabilities to support long-context input and output, achieving GPT-4V level performance with just a 7B LLM backend (InternLM-XComposer-2.5).

Links mentioned:
Finetune Gemma 2 with Unsloth: Fine-tune Google's new Gemma 2 model 2x faster with 63% less memory VRAM via Unsloth! 9B and 27B parameters.
Emotions in AI: Fine-Tuning, Classifying, and Reinforcement Learning: In this video we are exploring the creation of fine-tuning dataset for LLM's using Unsloth and Ollama to train a specialized model for emotions detection.You...
mlx-community/Phi-3-mini-4k-instruct-8bit · Hugging Face: no description found
Tweet from Daniel Han (@danielhanchen): Gemma 2 finetuning is now 2x faster and uses 63% less VRAM with @UnslothAI! 1. We fixed 2 issues in the official Gemma repo 2. 27b Softcapping must be done on attn & logits, or losses will diverge. 9...
unsloth/unsloth/models/llama.py at 9b4cc934efec66abd0a77df011779b393a99c026 · unslothai/unsloth: Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
GitHub - b4rtaz/distributed-llama: Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.: Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage. - b4rtaz/distributed-llama
Tweet from Nomic AI (@nomic_ai): Launching GPT4All 3.0: The Open-Source Local LLM Desktop App - Completely Private Experience - Supports 1000’s of models and all major operating systems - Major UI/UX Improvements - Local File Chat -...
GPT4All: Run Large Language Models Locally: privacy-first and no internet required
Fix downcasting and upcasting by danielhanchen · Pull Request #67 · google/gemma_pytorch: Fixes RMS Layernorm downcasting prematurely. We move it to the very end. Fixes embedding matrix scaling / normalizer upcasting to float32. Instead we must use float16 or bfloat16 for the normali...
Baby Face Palm GIF - Baby Face Palm Really - Discover & Share GIFs: Click to view the GIF
GitHub - unslothai/unsloth: Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory: Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
Tweet from AK (@_akhaliq): InternLM-XComposer-2.5 A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large-vision language model that s...
internlm/internlm-xcomposer2d5-7b · Hugging Face: no description found
MInference: Million-Tokens Prompt Inference for LLMs: no description found
Forecasting Model Search: no description found

Unsloth AI (Daniel Han) ▷ #announcements (1 messages):

Gemma 2 Release
Training speed and VRAM reduction
Context length improvements
4-bit model support updates
Experimentation with models

*Gemma 2 speeds up finetuning: Unsloth now supports Gemma 2 with 2x faster training and 63% less memory usage*. Check out the Gemma 2 Blog for more details.
*Context lengths boosted significantly: You can now finetune Gemma 2 (27B) with 9.7K context lengths on a 40GB GPU using Unsloth, compared to 3K with HF+FA2. The 9B model achieves 11K context lengths* on a 24GB card, versus 2.6K with HF+FA2.
*New Free Notebooks available: Access the Gemma 2 (9B) Colab notebook to get started with the latest model. Gemma 2 (27B)* notebook support has also been added.
*4-bit models now supported: Explore the new 4-bit models: Gemma 2 (9B) Base, Gemma 2 (9B) Instruct, Gemma 2 (27B) Base, and Gemma 2 (27B) Instruct. The Phi 3 mini* update is also available on HF.
*Call for community experimentation: Unsloth encourages users to share, test, and discuss their models and results* in their community channels. Join the discussion and experiment with the latest updates.

Links mentioned:
Finetune Gemma 2 with Unsloth: Fine-tune Google's new Gemma 2 model 2x faster with 63% less memory VRAM via Unsloth! 9B and 27B parameters.
Google Colab: no description found

Unsloth AI (Daniel Han) ▷ #off-topic (7 messages):

Release of Replete-AI datasets
Discussion on Facebook multi-token prediction
Fireworks.ai yi-large issues

*Replete-AI Drops Massive Datasets*: Replete-AI announced the release of two new datasets each around 11-12GB and containing over 6 million rows of data. The datasets include an English-only version and a multilingual version aimed at training versatile AI models.
*Is Facebook's Multi-Token Prediction Worth it?*: Discussion sparked about the worthiness of Facebook's multi-token prediction model that requires sharing contact information to access. One member expressed skepticism, while another deemed it worthwhile despite Facebook's involvement.
*Fireworks.ai yi-large Disappoints Users*: Users reported frustrations with the yi-large model on Fireworks.ai. One user admitted to being 'jebaited' by the model, indicating it did not meet their expectations.

Links mentioned:
facebook/multi-token-prediction · Hugging Face: no description found
Replete-AI/Everything_Instruct · Datasets at Hugging Face: no description found
Replete-AI/Everything_Instruct_Multilingual · Datasets at Hugging Face: no description found

Unsloth AI (Daniel Han) ▷ #help (121 messages🔥🔥):

Issues with Unsloth patch 2024.7 and checkpoints
Gemma 2 support in Unsloth
Fine-tuning models using Unsloth
Errors during fine-tuning and evaluation processes
Updating Unsloth and GGUF issues

*Gemma 2 support announced in Unsloth!: Unsloth has added support for Gemma 2*; you can now update and try the new features with the latest patch 2024.7.
*Checkpoint training errors in Unsloth patch 2024.7: Users reported errors like RuntimeError: Expected all tensors to be on the same device when resuming training from a checkpoint in Unsloth patch 2024.7*. Some suggested returning to older versions, but issues persist and require investigation.
*Unsloth fine-tuning pitfalls: Some users experienced issues fine-tuning Gemma 1.1 and Phi-3 mini* models without LoRA; it works for Phi-3 but raises errors when attempted with full fine-tuning on Gemma 1.1.
*Errors with specific models and configurations: Various errors were encountered, such as RuntimeError: The size of tensor a (4096) must match the size of tensor b (4608), when dealing with large models like Gemma-2-27B-bnb-4bit* and potential VRAM issues noted during evaluation with specific metrics.
*Updating Unsloth and handling GGUF issues*: Users were guided to update Unsloth via the wiki; some faced errors pushing fine-tuned models to Hugging Face due to GGUF quantization issues, which have since been fixed according to dev updates.

Links mentioned:
mlx-community/Phi-3-mini-4k-instruct-8bit · Hugging Face: no description found
tokenizer_config.json · microsoft/Phi-3-mini-128k-instruct at main: no description found
Home: Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
Adding accuracy, precision, recall and f1 score metrics during training: hi, you can define your computing metric function and pass it into the trainer. Here is an example of computing metrics. define accuracy metrics function from sklearn.metrics import accuracy_score, ...
GitHub - ggerganov/llama.cpp: LLM inference in C/C++: LLM inference in C/C++. Contribute to ggerganov/llama.cpp development by creating an account on GitHub.
GitHub - unslothai/unsloth: Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory: Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
Google Colab: no description found
Home: Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
Ollama by danielhanchen · Pull Request #671 · unslothai/unsloth: no description found

Unsloth AI (Daniel Han) ▷ #showcase (3 messages):

Release of two new massive datasets by Replete-AI
Details and translations of Everything_Instruct_Multilingual
Questions about dataset deduplication and content balance

Replete-AI unveils massive instruct datasets: Replete-AI released two new datasets, Everything_Instruct and Everything_Instruct_Multilingual, each sizing 11-12GB with over 6 million rows of data. These datasets combine multiple types of instruct data to train advanced AI models in English and multilingual versions.
Translations for Everything_Instruct_Multilingual demo: A message demonstrated the Everything_Instruct_Multilingual dataset by providing translations in 10 different languages including Arabic, German, Spanish, and French for simple English commands.
Translations such as 'wake me up at nine am on friday' were shown in each language, like German: 'weck mich am freitag um neun uhr auf'.

Community queries dataset quality: Community members raised questions about the new datasets' quality, asking if they are deduped and decontaminated. Another member expressed concerns regarding the dataset's balance, noting that nearly 50% is code-related.

Links mentioned:
Replete-AI/Everything_Instruct_Multilingual · Datasets at Hugging Face: no description found
Replete-AI/Everything_Instruct · Datasets at Hugging Face: no description found

Unsloth AI (Daniel Han) ▷ #community-collaboration (10 messages🔥):

Pinning notebooks
Adding notebooks to the GitHub page
Correcting notebook links in the channels

*Pinning notebooks request confirmed*: A member requested that certain notebooks be pinned, and another member confirmed that they would do so, asking for some time.
*Notebook links corrected in channels*: A correction was made to the notebooks linked in the channels, clarifying that there were two notebooks: one about using multiple datasets and another about text classification.
*Notebooks to be added to GitHub page*: It was mentioned that the notebooks will be added to the GitHub page, but more time is needed for checking and editing.

Latent Space ▷ #ai-general-chat (94 messages🔥🔥):

AI + Blockchain funding discussions
Git merge tool alternatives and conflict resolutions
Learning AI curriculum and recommendations
Claude and other AI tools for coding assistance
Evaluations and criticisms of new search algorithms like BM42

*AI + Blockchain grabs $85M seed*: "AI + Blockchain = $85M seed ☠️ vcs are cooked," one member stated, joking about the massive funding while sharing a link to a free project: rakis.
*Git Merge Tools Showdown*: Members discussed various tools for resolving git merge conflicts, including interactive rebase tools like lazygit and Sublime Merge, emphasizing the tediousness of manual conflict resolution.
*Learning AI Curriculum for Beginners*: A user looking for AI learning resources received suggestions such as Replit's 100 Days of Code and the Deep Learning Specialization by Andrew Ng, and preferred interactive courses over books like Machine Learning Specialization.
*Claude 3.5 and Other AI Tools for Coding*: Users shared their experiences with coding tools like Claude 3.5 and aider, with favorable mentions for Cursor in terms of code completion and the ability to handle complex multi-file refactors.
*Controversy Over BM42 Search Algorithm*: The introduction of BM42 by Qdrant faced criticism for presenting potentially misleading benchmarks, prompting the developers to revise their evaluation metrics and datasets, as seen in their follow-up post.

Links mentioned:
Tweet from Qdrant (@qdrant_engine): For 40 years, BM25 has been the standard for search engines. However, it falls short for modern RAG applications. Say hello to BM42: The combination of semantic and keyword search
glazed/pkg/doc/topics/13-layers-and-parsed-layers.md at e180e5d59031f20009c461466a2995ff28ee25a7 · wesen/glazed: a library to make it easy to output structured data in your command line tools. add the icing on top of your data - wesen/glazed
Tweet from Jo Kristian Bergum (@jobergum): Okay, gloves off. What @qdrant_engine did with the BM42 post is unacceptable. They are misguiding the RAG community in a big way. 1) Presenting Quora as a relevant RAG question-answering dataset. I...
Tweet from Qdrant (@qdrant_engine): Hey all! We actually did find a discrepancy with our previous benchmarks of bm42. Please don't trust us and always check performance on your own data. Our best effort to correct it is here: http...
Build a Large Language Model (From Scratch): Learn how to create, train, and tweak large language models (LLMs) by building one from the ground up!</b>In Build a Large Language Model (from Scratch)</i>, you’ll discover how LLMs w...

Latent Space ▷ #ai-announcements (5 messages):

New podcast episode with Yi Tay of Reka
Discussion on the qualities of successful AI researchers
Comparisons of OpenAI, Google Gemini, and Reka teams
Technical topics covered in the podcast

Yi Tay on YOLO Researcher Metagame: New podcast episode with Yi Tay of Reka discusses his team’s journey in building a new training stack from scratch and training frontier models purely based on gut feeling. Yi Tay draws comparisons to OpenAI and Google Gemini team sizes and reflects on the research culture at Reka.
"@sama once speculated on the qualities of '10,000x AI researchers', and more recently @_jasonwei described the 'Yolo run' researcher." Detailed topics include LLM trends, RAG, and Open Source vs Closed Models.

Now on Hacker News: Latent Space Podcast episode with Yi Tay is now featured on Hacker News. Engage with the discussion and vote for visibility.

Links mentioned:
New Links | Hacker News: no description found
Tweet from Latent Space Podcast (@latentspacepod): 🆕 pod: The Yolo Researcher Metagame with @YiTayML! https://latent.space/p/yitay OpenAI (ca. GPT4): ~600 people Google Gemini: ~950 coauthors @RekaAILabs: 20 people @sama once speculated on the qua...

Latent Space ▷ #llm-paper-club-west (34 messages🔥):

Issues with Discord AV
Migration to Zoom for better AV
Known compatibility issues between Discord and Linux

*Discord AV struggles during AIEWF demo: OpenAI AV faced significant issues during the AIEWF demo, with multiple users unable to see the screen and experiencing cut-outs. Eugene and others suggested switching to Zoom for a more stable experience.*
swyxio added:

*Switch to Zoom for Paper Club: The group decided to switch from Discord to Zoom* due to continuous AV issues. The Zoom link was shared, and members began migrating.
*Discord-Linux compatibility problems discussed: Several participants highlighted known compatibility problems between Discord and Linux. Eugene added that Discord does not play well with Linux* and suggested looking into alternatives.

Link mentioned: Join our Cloud HD Video Meeting: Zoom is the leader in modern enterprise video communications, with an easy, reliable cloud platform for video and audio conferencing, chat, and webinars across mobile, desktop, and room systems. Zoom ...

Latent Space ▷ #ai-in-action-club (243 messages🔥🔥):

User technical difficulties and skill humor
Personal compliments to workshop hosts
Discussion on model merging tactics
LlamaFile vs Ollama comparison
Event planning and feedback

*Users Battle Technical Issues and Share Laughs: A user struggled to hear during a call, prompting jokes and the now-popular phrase, 'skill issue tbh'*. Eventually, the user realized they were not in the call and reconnected with a humorous resolution.
*LlamaFile vs Ollama: Divergent Aims: Community members compared LlamaFile and Ollama, noting LlamaFile's strength in portability and optimization versus Ollama's broad compatibility with numerous models*.
*Model Merging Tactics: A discussion highlighted the difference in product goals between LlamaFile and Ollama* while raising ideas of potential model merging tactics and respective improvements needed on both sides.
*AI-Generated Notes and Wearable Tech*: Discussion on the use of wearables highlighted their potential privacy concerns and the importance of consent in recording. Participants mentioned ambitions to integrate wearables with AI-generated notes for easier event navigation.
*Upcoming Event Plans and Feedback*: Participants brainstormed potential improvements for future events, considering additional days for workshops and community events and noting the success of current methods for organizing and executing productive AI conferences.

Links mentioned:
AI Engineering World Fair: no description found
AI Engineers World Fair Recaps - Powered by Compass: Experience the biggest technical AI conference with live transcriptions and AI-generated summaries.
AI Engineer World Fair in SF: Week 26 of Coding with Intelligence
Tweet from Rick Lamers (@RickLamers): Model merging is nuts, check out this family tree :0
Tweet from Philip Kiely (@philip_kiely): Here are 3 themes I picked up in 3 incredibly high-energy days at @aiDotEngineer World's Fair: 1. Open source is closing the gap 2. Inference everywhere 3. Evals are everything Details:
AI Engineering Worlds Fair: AI Engineering Worlds Fair Thomas Dohmke Human centric approach - “co-pilot” Copilot helps devs be in the flow of software Democratizes access to information - onboarding Agent - ai dishwasher (side...
AI in action - 2024-07-05: AI in action AI Engineers World Fair recap 2024-07-05
Tweet from Bryan Young (@intertwineai): @aiDotEngineer Day 3 Recap and Wrap! 1/12: Day 3 of #AIEWF 2024 is over and it's clear we're just scratching the surface of AI's potential and defining what an @aiDotEngineer is. Here...
Tweet from Bryan Young (@intertwineai): @aiDotEngineer 2nd Day Recap! 1/14. The second day started with a timely session on AI-generated music by @YoungPhlo_. We all made some sick beats together. Although the fresh @RIAA lawsuits agains...
Tweet from Bryan Young (@intertwineai): 1/5: Day 1 of @aiDotEngineer was just as exciting as I thought it would be! #AIEWF Quick recap of the day:

LM Studio ▷ #💬-general (157 messages🔥🔥):

Waiting to upgrade hardware for LM Studio
Comparison of Llama3 and Mistral models
Usage of API keys from OpenAI or Anthropic in LM Studio
Text embeddings and local server setup in LM Studio
Challenges in running large models like Llama3 70b on limited hardware

*Waiting to upgrade hardware for LM Studio: A user mentioned planning to wait for 2 years to buy a new laptop for LM Studio, preferring to use their current setup with 64GB DDR4 RAM, Ryzen 5900 CPU, and 3060 6GB GPU* in the meantime.
*Comparison of Llama3 and Mistral models: Members discussed preferences, with some favoring Llama3 8b over Mistral 7b Instruct 0.3, and others highlighting successful experiences with OpenHermes 2.5 finetuned from Mistral*.
*Usage of API keys from OpenAI or Anthropic in LM Studio: A user inquired whether LM Studio allows using API keys from OpenAI or Anthropic for loading their models. They were informed that LM Studio* supports only local text models.
*Challenges in running large models like Llama3 70b on limited hardware: A user reported issues running Llama3 70b on a RTX 3090 Ti* due to memory constraints, receiving advice to lower GPU offload and context length or switch to smaller models.

Links mentioned:
LLM Model VRAM Calculator - a Hugging Face Space by NyxKrage: no description found
Text Embeddings | LM Studio: Text embeddings are a way to represent text as a vector of numbers.
Llama 3 Chat Meta AI - Llama 3 Chat Online 8B and 70B: Llama 3 is the latest language model from Meta.Llama 3 comes in two sizes: 8B and 70B.Quickly try out Llama 3 Online with this Llama chatbot.
Reddit - Dive into anything: no description found
30B model now needs only 5.8GB of RAM? How? · ggerganov/llama.cpp · Discussion #638: (Edit: apologies, I should have clarified initially I'm running on Linux OS. I didn't realize it might not be obvious from the screenshot alone for a non-Linux users.All tests are done on Ubun...

LM Studio ▷ #🤖-models-discussion-chat (130 messages🔥🔥):

Discussion on model behavior mismatch between different quantized versions of Gemma-2-27b
Using system prompts to improve coding model behaviors
Comparing different quantization techniques and their performance
Qwen2 model preset and ChatML format discussion
Issues and experiences with different large language models like Gemma, InternLM, and Dolphin

*Gemma 2 models underperform in benchmarks: Users reported that Gemma-2-27b models performed poorly and erratically in benchmarks, with significant inconsistencies across different quantization methods (Q5_K_M or Q6_K). A specific test showed vast discrepancies in performance between 27b and 9b* models.
*System prompts improve coding responses: Crafting tailored system prompts for coding guidance improved response quality in models like Gemma 2 27B. A specific method, focusing on PEP 8 guidelines and efficient algorithms, enhanced code generation consistency and completeness*.
*Understanding ChatML format for Qwen2: New users struggled with using Qwen2 models due to the lack of clear instructions on ChatML presets. A detailed explanation on the importance of ChatML format* helped clarify preset configurations.
*Issues with different quantization techniques: Users discussed the instability of IQ quants on non-CUDA hardware, reporting slower token speeds and random behavior like infinite loops and inconsistent responses. It's advised to avoid IQ quants on Apple devices* and consider other quantization methods for better performance.
*Experiences with various LLMs in game development and other tasks: Members shared mixed results from using different large models like Gemma, InternLM, and Dolphin for tasks like game development and VFX pipelines. Models showed uneven performances in retaining context and following instructions, leading to concerns over practical application and stability*.

Links mentioned:
How to work with the Chat Markup Language (preview) - Azure OpenAI: Learn how to work with Chat Markup Language (preview)
facebook/multi-token-prediction · Hugging Face: no description found
mradermacher/koboldai-erebus-extended-32k-7B-GGUF · Hugging Face: no description found
bartowski/Qwen2-7B-Instruct-GGUF · Hugging Face: no description found
deepseek-ai/ESFT-vanilla-lite · Hugging Face: no description found
Downtown-Case/internlm2_5-7b-chat-1m-llamafied-Q6K-GGUF at main: no description found
KoboldAI/Mistral-7B-Erebus-v3 · Hugging Face: no description found
mradermacher/Mistral-7B-Erebus-v3-i1-GGUF · Hugging Face: no description found
Reddit - Dive into anything: no description found

LM Studio ▷ #🧠-feedback (3 messages):

Issue with model downloads in LM on MacBook Pro M2
Solution for pausing/stopping downloads in LM

Models Get Stuck Downloading on MacBook Pro M2: msouga experienced an issue with some models in LM getting stuck downloading indefinitely on their MacBook Pro with an M2 chip, unable to stop these downloads or estimate their completion time.
How to Pause/Stop Downloads in LM: a_dev_called_dj_65326 suggested checking under the downloads section (bottom bar) to pause or stop the downloads. msouga confirmed this solution worked perfectly.

LM Studio ▷ #⚙-configs-discussion (5 messages):

Nxcode 7B JSON request
CodeQwen 1.5 7B ChatML compatibility
RTX 4060 8GB VRAM and 16 GB DDR5 RAM performance issues
Suggested models for mid-range GPU setups

Nxcode 7B JSON request: @49206c696b652063757465 asked for a JSON for Nxcode 7B or CodeQwen 1.5 7B.
CodeQwen 1.5 7B ChatML compatibility: @heyitsyorkie mentioned that both Nxcode 7B and CodeQwen 1.5 7B use ChatML, and CodeQwen requires flash attention enabled.
RTX 4060 8GB VRAM struggles with 20B models: @falconandeagle123 shared that their laptop with RTX 4060 8GB VRAM and 16 GB DDR5 RAM struggled to run q4 quant 20B models, causing the laptop to freeze.
Suggested models for mid-range GPU setups: @niga256_512_1024_2048 suggested using simpler models like Mistral 7B, Open Hermes 2.5, Wizard code, and Phi 3 mini for mid-range GPU setups.
They pointed out that these models are more suitable for systems similar to a laptop with RTX 4060.

LM Studio ▷ #🎛-hardware-discussion (61 messages🔥🔥):

Surface Laptop with Snapdragon Elite performance details
NPU and GPU utilization in Snapdragon devices
Comparison of CPU performance on Snapdragon and Intel devices
Future support for NPU in Llama.cpp
General discussion on hardware used with LM Studio

*Snapdragon Elite CPU holds its own: Member discusses performance details of Surface Laptop with Snapdragon Elite, including first token speed and tokens per second (t/s) on LLaMA3 models. Other members compare this with their Intel quad-core laptops and find Snapdragon's CPU performance impressive.*: A member reports 1.5 seconds to first token and 10 t/s on LLaMA3 8b with 8bit precision on a Surface Laptop with Snapdragon Elite and 32 GB of RAM. They note 10% GPU usage** and no NPU activity, sparking curiosity about potential future NPU utilization.
Comparisons reveal Snapdragon Elite CPU's performance to be significantly faster than older Intel quad-core laptops, even rivaling typical cloud AI speeds. Members speculate about future NPU support possibly leading to further speed improvements.

**Future NPU support for Llama.cpp?**: *Discussion on when NPU support might land for Llama models in LM Studio.*: Members discuss that NPU support is not yet available in Llama.cpp, leading to CPU-only performance for LLaMA models in LM Studio. Speculation arises about when support might be implemented, with hopes set for late 2024 or early 2025.
Conversations reveal that Qualcomm has a GitHub repository showing LLaMA2 operating on NPU, though it's currently rough. Community shows enthusiasm for future enhancements*, especially with Qualcomm and Microsoft pushing for NPU utilization.

*NPU implementation faces delays*: *Members express hopes and struggles with current hardware performance.: Efforts to implement NPU* in existing systems have been slow, with members sharing links to discussions and repositories investigating the subject (GitHub repo).
Members appear optimistic about eventual improvements, even sharing humorous suggestions like a Cardboard NPU as a placeholder solution.

*Surface Laptop shows promise on Snapdragon Elite*: *Users share their positive experience regarding the new Surface Laptop's build quality and performance.: A member praises the build quality, keyboard, and trackpad of their Surface Laptop with Snapdragon Elite. They highlight the ability to perform video editing and play games* as standout features.
Overall, the Surface Laptop with Snapdragon Elite is well-received, especially as a daily driver for personal use, despite needing separate work laptops with IT restrictions.

Links mentioned:
Office Michael GIF - Office Michael Scott - Discover & Share GIFs: Click to view the GIF
Not Funny Haha Not Funny GIF - Not funny Haha not funny Hahaha - Discover & Share GIFs: Click to view the GIF

LM Studio ▷ #🧪-beta-releases-chat (2 messages):

AppImage not compatible with aarch64 CPUs
No ARM CPU support on Linux for LM Studio

*AppImage not compatible with aarch64 CPUs: A user encountered an Exec format error while trying to execute LM_Studio-0.2.27.AppImage on an aarch64 system, indicating architecture incompatibility. The lscpu command output confirmed the CPU architecture as aarch64*.
*No ARM CPU support on Linux: Discussion highlighted the lack of ARM CPU support for LM Studio on Linux. A member confirmed, "No arm cpu support on linux"*.

LM Studio ▷ #amd-rocm-tech-preview (2 messages):

7800XT user confirms GPU works
Problems loading models with GPU offload
Successful ROCm installation script

*7800XT user confirms GPU works: User reports that their 7800XT works successfully and is not sure if pinging is needed.*
*Problems loading models with GPU offload: Loading models failed* unless GPU offload is disabled. Users discussed installation scripts to address this issue.
*Successful ROCm installation script: A user suggested a script* to install ROCm that helped solve loading issues with GPU offload. Another user confirmed it works well.

CUDA MODE ▷ #general (10 messages🔥):

Matrix multiplication in CUDA
Efficient remote development with pay-for-use compute
New blog post on executive summary process using GPTs
Paid CUDA/ML system certifications
Upcoming in-person CUDA mode event in October

Matrix Multiplication in CUDA: Why Column Instead of Row?: A user questioned why a 64-element column is loaded on the purple tile instead of a row during a GPU matrix multiplication, and another shared a detailed blog post for optimizing this process using CUDA.
Streamline Executive Summaries with GPTs: A new blog post details a process involving three Custom GPTs to expedite the writing of executive summaries, showing how they can extract insights, draft, and revise the summaries quickly.
Tips for Efficient Remote Development: Members discussed solutions for remote development that allows for pay-per-use compute while retaining files, mentioning services like Lightning AI and AWS S3 as potential options.
Recommendations for CUDA/ML Certifications: A user sought recommendations for paid CUDA/ML certifications under $500, leading to suggestions of NVIDIA courses and a possible community-organized workshop.
In-Person CUDA Mode Event Announced: CUDA Mode is planning an in-person event for October, as revealed by a community member, promising more details soon.

Links mentioned:
How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog: In this post, I’ll iteratively optimize an implementation of matrix multiplication written in CUDA.My goal is not to build a cuBLAS replacement, but to deepl...
Three GPTs Walk into a Bar and Write an Exec Summary – D-Squared: no description found

CUDA MODE ▷ #triton (2 messages):

Triton kernels with multiple CUDA graphs create latency issues
SRAM contention affecting performance

*Triton Kernels under Parallel CUDA Execution: Multiple CUDA graph instances running in parallel with Triton kernels show worse latencies* compared to local benchmarks.
It's suggested that SRAM contention might be a cause if multiple instances are doing tl.load.

*Comparison with Torch Performance: Despite potential SRAM contention, this issue doesn't seem present in Torch* under similar conditions.
This discrepancy raises questions about how SRAM evictions are handled differently between Triton and Torch.

CUDA MODE ▷ #torch (7 messages):

torch.compile not supported on Python 3.12
Python bytecode compatibility issues
TorchDynamo and Python frame evaluation API
TorchDynamo's role in PyTorch performance

*Torch 2.3 .compile Unsupported on Python 3.12: For torch 2.3, the .compile function is not supported on Python 3.12* due to changes in Python's internals, especially in how it handles bytecode. A detailed explanation can be found here.
*Python Bytecode Changes Cause Lag in Support: Python bytecode changes every Python version, requiring time for frameworks like torch.compile* to adjust and support these new changes. More information on the bytecode adjustments can be read in this documentation.
*TorchDynamo Enhances PyTorch Performance: TorchDynamo is a Python-level JIT compiler that hooks into CPython's frame evaluation to modify Python bytecode and compile PyTorch operations into an FX Graph*. Using torch._dynamo.optimize() wrapped by torch.compile(), it boosts PyTorch code performance seamlessly.

Links mentioned:
TorchDynamo Deep Dive — PyTorch 2.3 documentation: no description found
Torch.compile support for Python 3.12 completed: Signal boosting that Python 3.12 support has been added to torch.compile and has been present in the nightly builds for a while. We anticipate that this feature will be included in the PyTorch 2.4 rel...

CUDA MODE ▷ #algorithms (5 messages):

New method for training language models to predict multiple future tokens
Self speculative decoding in language models
Comparison between multi-token prediction and lookahead decoding baselines
Effectiveness of n-gram generation in multi-token prediction models

New approach boosts language model efficiency: Latest research paper suggests training language models to predict multiple future tokens at once, resulting in higher sample efficiency and improved downstream capabilities with no additional training time. 13B parameter model shows substantial gains, solving 12% more problems on HumanEval and 17% more on MBPP.
Self Speculative Decoding gets a thumbs up: A member mentioned the cool aspect of the model's ability to perform self speculative decoding.
Questioning lookahead decoding baselines: Members wondered how this new multi-token prediction compares to lookahead decoding baselines.
Dissecting n-gram effectiveness: A discussion emerged on the effectiveness of generating n-grams in multi-token prediction models and their alignment with traditional next-token prediction outputs.

Link mentioned: Better & Faster Large Language Models via Multi-token Prediction: Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results in h...

CUDA MODE ▷ #cool-links (1 messages):
iron_bound: https://oimo.io/works/life/

CUDA MODE ▷ #beginner (17 messages🔥):

Learning path for backend SDEs interested in CUDA and inference optimization
Challenges of finding jobs with open source contributions
Recommendation of CUDA Mode GitHub for beginners
Building a deep learning framework from scratch in C++ using CUDA
Using Python for CUDA kernel development vs C++

*Finding Path to CUDA Mastery: A backend SDE seeks advice on transitioning to a job related to CUDA and inference optimization. Recommendations included watching specific channels and reading relevant resources, contributing to GitHub, and joining working groups*.
*Open Source Contributions Not Always a Job Ticket: Concerns were raised about individuals making significant open source contributions* yet failing to secure jobs. The community acknowledged the challenge and discussed the high bar for entry.
*CUDA Mode GitHub: A Beginner's Treasure Trove: For beginners looking to dive into CUDA, CUDA Mode GitHub* was recommended as a fruitful starting point. It's suggested as a platform to build engaging projects and learn efficiently.
*Building Deep Learning Frameworks in C++ with CUDA: A member expressed interest in building a deep learning framework similar to tinygrad using CUDA and C++ for parallelism but encountered difficulties with C++ complexity. They considered using Python* instead for better manageability and potential for faster completion.
*Python vs C++ for CUDA Kernel Development: Debate ensued over whether to use Python or C++ for CUDA kernel development. The consensus leaned towards using Python for initial endeavors and transitioning to C++ for deep system-level work, citing repositories like llama.c* for learning.

CUDA MODE ▷ #pmpp-book (4 messages):

Fourth edition released in 2022
Differences between third and fourth editions

*Fourth edition released in 2022*: The fourth edition was released in 2022, whereas the previous edition was released in 2012.
*Differences between third and fourth editions*: A member mentioned not having read the third edition, expressing curiosity about the differences. Another member referred to the back of their copy for details.

CUDA MODE ▷ #jax (4 messages):

casual conversation
channel engagement

Casual Engagement in Channel: A member expressed their excitement with a simple "that's so cool!", indicating casual engagement and appreciation.
Another member replied with "thanks", showing a friendly and appreciative interaction in the channel.

Friendly Interactions: Members engaged in a casual and friendly manner with short messages like "yo" and "you".
These interactions reflect a positive and welcoming community environment.

CUDA MODE ▷ #torchao (11 messages🔥):

Handling a.to method recognition and functionality
Removing unnecessary args in PyTorch/ao
Current limitations and workarounds for a.to method
Adding support for device and dtype handling in subclasses
Future functionality and testing in Torchbench models

*Fixing a.to issues in PyTorch*: The a.to(torch.int32) method is recognized as a.to(device=torch.int32) causing unexpected behavior, and needing removal of unnecessary device and memory_format arguments in affine_quantized_tensor.py to fix this issue.
*Challenges with a.to(dtype=torch.int32): A discussion highlighted that a.to(dtype=torch.int32) currently only changes the device and not other keywords like dtype or layout, indicating that dtype and memory format changes* are unsupported for now.
*Temporary Function Adjustments in AQT*: A suggestion was made to modify the affine_quantized_tensor.py file to temporarily drop device, dtype, and memory_format arguments to handle the limitations in the current implementation.
*Subclass a.to Method Limitations*: Discussion around subclass functionality in torchbench revealed that handling a.to method for differing dtypes was not intended as changing external representations' dtype poses complex challenges.
*Testing Functionality in Torchbench: Concerns were raised about whether the current setup supports .to method across various models in torchbench, especially regarding subclass handling* and required functionality testing in AQT implementations.

Links mentioned:
ao/torchao/dtypes/affine_quantized_tensor.py at a8956992191853b13f82ceb3e6929bed7691a3fa · pytorch/ao: Create and integrate custom data types, layouts and kernels with up to 2x speedups and 65% less VRAM for inference and training - pytorch/ao
ao/torchao/dtypes/affine_quantized_tensor.py at a8956992191853b13f82ceb3e6929bed7691a3fa · pytorch/ao: Create and integrate custom data types, layouts and kernels with up to 2x speedups and 65% less VRAM for inference and training - pytorch/ao

CUDA MODE ▷ #off-topic (3 messages):

Thunder Sessions podcast by Lightning AI
Andrej Karpathy's keynote at UC Berkeley AI Hackathon 2024

*Thunder Sessions podcast ignites excitement: Lightning AI announced the Thunder Sessions podcast hosted by Luca Antiga and Thomas Viehmann to cover compilers and performance optimization, airing Friday, July 5 @ 11am EST*.
*Andrej Karpathy steals the show at UC Berkeley Hackathon: The YouTube video of the 2024 UC Berkeley AI Hackathon Awards Ceremony features Andrej Karpathy* delivering an inspiring keynote, highlighting groundbreaking pitches from the participants.

Links mentioned:
Tweet from Lightning AI ⚡️ (@LightningAI): We’re excited to introduce 🌩️ Thunder Sessions 🌩️, a new podcast from the team at Lightning AI covering the world of compilers and performance optimization. Join us this Friday, July 5 @ 11am EST w...
Andrej Karpathy's Keynote & Winner Pitches at UC Berkeley AI Hackathon 2024 Awards Ceremony: At the 2024 UC Berkeley AI Hackathon's Awards Ceremony, the atmosphere was electric as Andrej Karpathy, founding member of OpenAI, delivered an inspiring key...

CUDA MODE ▷ #llmdotc (134 messages🔥🔥):

CUDA MODE Discord chatbot messages
FP8 Gradient Issues in GPT-2 Training
Schedule-Free Optimizer Paper
GPT-2 Training Performance
Training Length Estimations for GPT-2

*Issues with Schedule-Free Optimizers*: A member noted that using Schedule-Free Optimizers produced surprisingly smooth loss curves, which seemed improbable on noisy datasets like ImageNet. Despite initial skepticism, the optimizer showed significant convergence advantages even without custom optimizations.
*FP8 Gradient Activations Impact GPT-2 Training*: A member found that converting gradient activations to FP8 significantly increased loss during GPT-2 test runs. They noted that this error propagated through the model, and attempts to mitigate it with stochastic rounding had limited success, suggesting keeping some operations in BF16 for stability.
*Technical Woes with Compile Times on Lambda Servers*: A user reported much longer compile times on Lambda servers compared to local machines, likely due to disabled CPU Turbo on virtualized instances. Investigations revealed the CPU staying at a base clock of 2GHz, unable to utilize its full potential of 3.8GHz Turbo clock speeds.
*Sweeps on Hyperparameters and Model Scaling*: Several discussions focused on sweeping different hyperparameters like LR, attn_mult, and out_mult across different model widths and depths. Preliminary results indicated that cosine schedulers and an attn_mult of 1 were optimal, but further tests were ongoing.
*Austin Tech Scene Tidbits*: Casual talk revealed that members attended July 4th parties with notable figures from the tech industry, like Lex Fridman. They also noted Austin's importance in semiconductor engineering but highlighted its lack of intersection with the broader tech scene.

Links mentioned:
Tweet from Lucas Nestler (@_clashluke): Schedule-free optimizers (https://x.com/aaron_defazio/status/1776320004465582331) are surreal. I've read the paper, looked into the math, and tried to understand what's happening. It all seem...
llm.c/scripts/run_gpt2_1558M.sh at master · karpathy/llm.c: LLM training in simple, raw C/CUDA. Contribute to karpathy/llm.c development by creating an account on GitHub.

CUDA MODE ▷ #bitnet (3 messages):

Optimized kernels in CUDA for int2*int8 gemm
Release of a custom gemv for int2*int8
BitBLAS library for mixed-precision matrix multiplications

*Newcomer asks about optimized kernels for int2int8 gemm: A new member asked if there are optimized kernels in CUDA for *int2int8 gemm** operations.
*Custom gemv kernel release announced: A member announced that they have made a custom gemv kernel* for int2*int8, which will be released in a few days.
They also suggested checking out BitBLAS as another option.

Link mentioned: GitHub - microsoft/BitBLAS: BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.: BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment. - microsoft/BitBLAS

Perplexity AI ▷ #general (165 messages🔥🔥):

Perplexity AI Repetition Issue
Live Internet Access Problems
Math Accuracy in Perplexity Pro
Experience with Perplexity in Stock Market
Subscription Plans and Model Usage

*Perplexity AI Repetition Issue*: Users reported Perplexity AI giving repetitive answers with the same prompt, particularly with models like Llama 3 and Claude. One user mentioned that Alex responded they are aware of the issue and working on a fix.
*Live Internet Access Problems*: One user described issues with Perplexity AI accessing live internet for real-time data, providing inaccurate and outdated information instead. Despite closing and reopening the app, the problem persisted and the user noted it in the feedback channel.
*Math Accuracy in Perplexity Pro*: Users expressed frustration with Perplexity Pro's inaccuracies in handling math problems like CAPM beta calculations. Despite the model being GPT-4o, results were significantly off, raising doubts about the model's efficacy in academic calculations.
*Experience with Perplexity in Stock Market*: One user shared that they made $8k in the stock market using Perplexity, praising its abilities. This sparked a brief discussion about the various benefits users have experienced with the pro version.
*Subscription Plans and Model Usage*: Users discussed the differences between Pro and Enterprise Pro plans, with specifics on model usage like Sonnet and Opus. Questions emerged about the availability and specificity of models in different subscription plans.

Links mentioned:
Chat Completions: no description found
Hyperthyroidism - Symptoms and causes: no description found
Supported Models: no description found
Three GPTs Walk into a Bar and Write an Exec Summary – D-Squared: no description found

Perplexity AI ▷ #sharing (13 messages🔥):

Threads' Milestone
Ancient Aboriginal Rituals
Nuclear-Powered Data Centers
Mars Moss
Eating Contests

*Threads Hit Milestone: A YouTube video titled Discover today: Threads' Milestone, Ancient Aboriginal Rituals, and Nuclear-Powered Data Centers* discusses the recent achievement by Threads.
*Mars Moss and Other Wonders: Another YouTube video titled Discover today: Mars Moss, Eating Contests, Tech Titans, and Toxic Green* explores the existence of moss on Mars and various unusual topics.

Links mentioned:
YouTube: no description found
YouTube: no description found

Perplexity AI ▷ #pplx-api (15 messages🔥):

Difference between pplx-70b-online and llama-3-sonar-large-32k-online
Google Dorks usage with the API
Temporal awareness in LLMs
Effectiveness of query commands in LLMs
Perplexity AI models and model cards

*Google Dorks and API Mastery*: A user suggested that leveraging Google Dorks can enhance the utility of the API, as it simplifies filtering source domains effectively on web products.
*LLMs Lack Temporal Awareness: Users discussed the inability of models like llama3 and haiku* to intuitively understand 'latest' or 'most recent' without explicit cues, influencing their responses.
*Query Commands in LLMs: Not Official: It was highlighted that while Google Dork operators* are often suggested to constrain results, they are not officially integrated into the backend of Perplexity's LLMs.
*Perplexity Model Clarification: A user sought clarification on the difference between pplx-70b-online and llama-3-sonar-large-32k-online* models, referencing both Perplexity's blog and API documentation.
*Model Alias and Obsolescence*: There was confusion over model aliases and potential obsolescence; one user suggested some models might be aliases, while another noted that certain models might now throw errors.

Links mentioned:
Supported Models: no description found
Chat Completions: no description found

LAION ▷ #general (185 messages🔥🔥):

BUD-E update on new features
Issues with Clipdrop NSFW detection
Discussion on dataset availability and usage
Performance of various AI models and training techniques
Commercial licensing of Stability AI models

*BUD-E updates with Clipboard Access: A recent YouTube video showcases BUD-E's new feature* of reading text from the screen and clipboard, detailed in the project description on GitHub. The demo was presented in 240p resolution, which drew some humorous criticism.
*Clipdrop's NSFW Detection Failure: A member shared a humorous incident where Clipdrop incorrectly labeled an image as NSFW content*.
*Struggles with Dataset Availability: Members discussed the difficulties faced by FAL.AI in acquiring new datasets, with comments highlighting the extensive reliance on the same datasets for multiple models. One user emphasized that interesting breakthroughs, like Chameleon*, come from diverse and integrated modalities.
*Stability AI's License Fix: Stability AI revised the commercial license for SD3 Medium* to the Stability AI Community License, allowing broader free use for individual creators and small businesses. This change was made in response to community feedback regarding the original commercial license.

Links mentioned:
BUD-E Update: Seeing images & reading text from screen & clipboard: https://github.com/christophschuhmann/Desktop_BUD-E/tree/main
LPDoctor/Glyph-SDXL-v2 at main: no description found
Dog In Space Dog GIF - Dog In Space Dog I Have No Idea - Discover & Share GIFs: Click to view the GIF
Letter level tokenization: Letter level tokenization. GitHub Gist: instantly share code, notes, and snippets.
Community License — Stability AI: Our new Community License is now free for research, non-commercial, and commercial use. You only need a paid Enterprise license if your yearly revenues exceed USD$1M and you use Stability AI models in...

LAION ▷ #research (2 messages):

scammer alert
new tokenizer proposal for LLMs
T-FREE tokenizer paper

*User flags a scammer: A user alerted the community to the presence of a scammer* in the chat.
*T-FREE Tokenizer Proposal Shakes Up LLMs: A new paper proposes T-FREE, a tokenizer that embeds words through sparse activation patterns over character triplets, eliminating the need for a reference corpus and achieving a parameter reduction of more than 85%* in embedding layers. You can view the paper here.
The paper outlines T-FREE's advantages, including improved performance for underrepresented languages and significant compression of embedding layers.

Link mentioned: T-FREE: Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings: Tokenizers are crucial for encoding information in Large Language Models, but their development has recently stagnated, and they contain inherent weaknesses. Major limitations include computational ov...

LAION ▷ #resources (1 messages):
khazn: $50 gift card steamcommunity.com/gift/sd271azjxn2h

LAION ▷ #learning-ml (1 messages):
khazn: $50 gift card steamcommunity.com/gift/sd271azjxn2h

LAION ▷ #paper-discussion (1 messages):
khazn: $50 gift card steamcommunity.com/gift/sd271azjxn2h

OpenAI ▷ #ai-discussions (116 messages🔥🔥):

Moshi AI demo
Issues with GPT-2
Voice modality in OpenAI models
Bangla language support in chatGPT
API usage for AI integration

*Moshi AI Demo Excites and Frustrates: A new Moshi AI demo was released, featuring real-time voice interaction and promises of open-source flexibility. However, users experienced issues like conversational interruptions and looped responses, highlighting the current model's limitations.*: A new Moshi AI demo was released, featuring real-time voice interaction and promises of open-source flexibility. However, users experienced issues like conversational interruptions and looped responses, highlighting the current model's limitations.
*Lack of Long-Term Memory in AI: Hume AI's playground offers interruptable voice AI but lacks long-term memory functionality, resetting after each session. This limitation frustrates users who desire continuous learning from their AI interactions.*: Hume AI's playground offers interruptable voice AI but lacks long-term memory functionality, resetting after each session. This limitation frustrates users who desire continuous learning from their AI interactions.
*Call for Enhanced Bangla Language Support: A user highlighted ongoing issues with chatGPT handling the Bangla language, urging improvements for better accessibility. The request was posted with a thread ID for specific reference and emphasizes the need for broader language support.*: A user highlighted ongoing issues with chatGPT handling the Bangla language, urging improvements for better accessibility. The request was posted with a thread ID for specific reference and emphasizes the need for broader language support.
*GPT-2 vs Modern Models Debate: There was a discussion on whether to use the older GPT-2 model for text generation or upgrade to more current options like GPT-3.5 Turbo. While some argued for the cost-efficiency of GPT-2, others pointed out the drastically better performance of newer models.*: There was a discussion on whether to use the older GPT-2 model for text generation or upgrade to more current options like GPT-3.5 Turbo. While some argued for the cost-efficiency of GPT-2, others pointed out the drastically better performance of newer models.
*Navigating AI Integration via API: Users discussed various methods for integrating AI models using APIs, particularly focusing on RAG via Assistant API endpoints. The conversation highlighted how crucial coding knowledge is for maximizing AI's utility and customization.*: Users discussed various methods for integrating AI models using APIs, particularly focusing on RAG via Assistant API endpoints. The conversation highlighted how crucial coding knowledge is for maximizing AI's utility and customization.

Links mentioned:
Voice-to-Voice Demo • Hume AI: Speak to the first empathic AI voice.
moshi.chat: no description found

OpenAI ▷ #gpt-4-discussions (26 messages🔥):

Differences between free and paid ChatGPT plans
Handling images and PDFs in GPT knowledge base
Effectiveness of GPT memory
Accessing other GPT models within a GPT
External file linking and vector databases for GPT knowledge base

Paid ChatGPT Plan Benefits Explained: A member asked about the benefits of a paid ChatGPT plan, and it was explained that Plus offers a higher usage cap, access to DALL·E, and a larger context window. Additional details can be found here.
Images and PDFs in GPT Knowledge Base: Members discussed whether GPT uses vision to read images and handle PDFs uploaded to the knowledge section. The conclusion was that GPT does not use vision and relies on OCR for text extraction from images and PDFs.
GPT Memory Effectiveness Questioned: A member criticized GPT's memory feature, noting it saves preferences but still makes things up. Another member clarified that these memories function as suggestions, not hard rules, and recommended using customization options to improve behavior.
Linking GPTs and Document Services: A complex discussion unfolded around linking GPT knowledge bases to Google Drive and other similar services. It was noted that external files cannot match the optimization of vector databases without a custom backend, with some services offering live link support for similar features.
GPT-4 Usage Cooldown Confirmed: Concerns about GPT-4 availability were addressed, explaining that users face a cooldown period before using GPT-4 again after hitting their limit. Plus users can send up to 40 messages every 3 hours on GPT-4 and 80 on GPT-4o, with potential reductions during peak hours.

OpenAI ▷ #prompt-engineering (16 messages🔥):

Employee Recognition Program
Content Generation script for training courses
Tool to test multiple AI responses
Tabletop RPG prompts
Traffic ticket challenge guidance

Employee Recognition Program Boosts Morale: Users discussed developing an employee recognition program to boost morale and motivation. The program includes goals, recognition methods, criteria, an implementation plan, and feedback mechanisms.
Effective Content Generation Script: One user is seeking advice on developing a content generation script to create training course structures based on inputs like location, length, topic, and audience. They are considering prompt engineering, RAG, and web search integration as potential techniques.
Tool for Testing Multiple AI Responses: A user inquired about tools to test and visualize multiple AI responses from the same prompt, seeking features like supporting file uploads and displaying response variations. Suggestions included a custom-built tool or existing options like Autogen.
Tabletop RPG Battle Maps Prompting: A user asked for prompt ideas for generating tabletop RPG battle maps. Specific tools or techniques were not discussed.
Guidance on Challenging Traffic Tickets: The channel discussed a structured approach to challenging a traffic ticket in court. The guidance included steps for contesting the ticket effectively and strategies for presenting a case.

Nous Research AI ▷ #research-papers (1 messages):
teknium: https://x.com/kerstingaiml/status/1809152764649574541?s=46

Nous Research AI ▷ #datasets (1 messages):

Replete-AI releases two massive datasets
Everything_Instruct and Everything_Instruct_Multilingual datasets
Sizes and features of new datasets
Influence of bagel datasets and EveryoneLLM AI models

Replete-AI Unveils Massive Datasets: <@716121022025302076> released two new datasets, Everything_Instruct and Everything_Instruct_Multilingual, each sized 11-12GB and containing over 6 million rows of data. These are formatted in Alpaca Instruct style with a focus on creating a comprehensive instruct dataset to train AI models.
Dual Dataset for Ultimate AI Model Training: Everything_Instruct is designed for English-only data, while Everything_Instruct_Multilingual includes multilingual translations to enhance models' language capabilities. Both datasets are inspired by the bagel datasets and previous EveryoneLLM AI models.
The goal is to combine all conceivable types of instruct data into one massive dataset to train top-notch AI models. Enjoy the datasets on Hugging Face.

Links mentioned:
Replete-AI/Everything_Instruct · Datasets at Hugging Face: no description found
Replete-AI/Everything_Instruct_Multilingual · Datasets at Hugging Face: no description found

Nous Research AI ▷ #off-topic (4 messages):

Upcoming Nous physical magazine contribution
Open-source / decentralized technology in StudioMilitary magazine

Call for Contributions in Nous Physical Magazine: John0galt invited everyone to contribute to the upcoming Nous physical magazine by offering good writing, interesting content, or ideas. Reach out to John0galt if interested.
StudioMilitary Magazine Seeking Contributions: StudioMilitary has begun work on their first magazine edition focusing on open-source and decentralized technology. They are looking for contributions in writing, articles, pictures, and infographics, and have encouraged interested parties to reach out.

Link mentioned: Tweet from John Galt (@StudioMilitary): I'm beginning work on the first edition of our magazine. General theme is open-source / decentralized technology. Highlighting the optimistic forces in our world. If you're interested in cont...

Nous Research AI ▷ #interesting-links (5 messages):

Apollo project by Achyut Benz
flask-socketio-llm-completions GitHub repo
foxhop's demo chatroom app
LLM integration with flask-socketio

*Apollo project visualizes topics AI-generated in 3Blue1Brown style: Achyut Benz introduced Apollo, which visualizes topics in 3Blue1Brown style videos, all AI-generated. It uses the Next.js framework, GroqInc inference, and supports AnthropicAI 3.5 Sonnet & OpenAI GPT-4 integrated with LangChainAI*.
Inspired by Chris Abey, the project aims to enhance learning through AI-generated educational videos.

*Chatroom app sends messages to multiple LLMs via flask-socketio: foxhop shared a GitHub repo for flask-socketio-llm-completions, a chatroom app that sends messages to GPT, Claude, Mistral, Together, and Groq AI*, streaming to the frontend.
"This app is maintained to work seamlessly with various LLMs and demonstrates real-time communication capabilities."

*Foxhop showcases demo for LLM-integrated chatroom app: foxhop provided a demo link to showcase the chatroom app integrated with LLMs. The demo exemplifies how messages interact with vLLM, Hermes, and Llama3* models.
The application serves as a practical tool for interacting and experimenting with LLM capabilities in a chatroom environment.

Links mentioned:
Tweet from ach (@achyut_benz): introducing apollo, a new project i've been working on that visualizes topics or concepts in @3blue1brown style videos, all ai-generated. @nextjs framework @GroqInc inference supports @Anthropic...
GitHub - russellballestrini/flask-socketio-llm-completions: Chatroom app where messages are sent to GPT, Claude, Mistral, Together, Groq AI and streamed to the frontend.: Chatroom app where messages are sent to GPT, Claude, Mistral, Together, Groq AI and streamed to the frontend. - russellballestrini/flask-socketio-llm-completions

Nous Research AI ▷ #general (110 messages🔥🔥):

New datasets released by Replete-AI
Nomic AI launches GPT4ALL 3.0
InternLM-XComposer-2.5 model release
Challenges with jailbreaks for Claude 3.5 Sonnet
Discussion on visual latent space for LLMs

Replete-AI Unveils Massive New Datasets: Replete-AI releases two new massive datasets, Everything_Instruct and Everything_Instruct_Multilingual, each sizing 11-12GB with over 6 million rows of data, aiming to combine various instruct data to train AI models to new heights. Details here.
These datasets, inspired by bagel datasets and Replete-AI's EveryoneLLM models, include one set for English and another with multilingual translations to enhance models' multilingual capabilities.

Nomic AI Launches GPT4ALL 3.0: Nomic AI announces the release of GPT4All 3.0, an open-source local LLM desktop app supporting thousands of models across major operating systems with significant UI/UX improvements and MIT license. Check it out, boasting 250,000+ monthly active users and privacy-first features with local file chat.
InternLM-XComposer-2.5 Sets New Benchmarks: InternLM releases InternLM-XComposer-2.5, a versatile large-vision language model supporting long-contextual input and output, trained with 24K interleaved image-text contexts and capable of extending to 96K long contexts via RoPE extrapolation. Announcement here, it surpasses existing open-source models on 16 benchmarks and competes closely with GPT-4V and Gemini Pro.
Frustrations with Jailbreaking Claude 3.5 Sonnet: Users share challenges in jailbreaking Claude 3.5 Sonnet, discussing attempts with specific pre-prompts and roles, but the AI remains persistent on ethical constraints. Some suggest using Anthropic's workbench for potentially higher success but warn of possible account bans.
Exploring LLMs' Visual Latent Space Capabilities: Discussions arise about letting LLMs draw or represent their visual latent space, considering if trained on enough visual data, they could repeat visual elements like chemical structures or 3D spaces. Some examples include a model generating a 3D city using HTML and CSS, suggesting potential but noting the need for datasets involving visual data.

Links mentioned:
Tweet from Nomic AI (@nomic_ai): Launching GPT4All 3.0: The Open-Source Local LLM Desktop App - Completely Private Experience - Supports 1000’s of models and all major operating systems - Major UI/UX Improvements - Local File Chat -...
GPT4All: Run Large Language Models Locally: privacy-first and no internet required
Anthropic Console: no description found
Tweet from LocalAI (@LocalAI_API): 🚀 New model alert! Check out #internlm2, a 7B parameter chat model with outstanding reasoning capabilities & 1M context window. Install it in LocalAI with `local-ai run internlm2_5-7b-chat-1m` #AI #N...
Tweet from AK (@_akhaliq): InternLM-XComposer-2.5 A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large-vision language model that s...
no title found: no description found
Tweet from Philipp Schmid (@_philschmid): I wasn't aware of that, but it looks like Anthropic Claude 3.5 Sonnet on (claude ai) is suppressing parts of his answer from the user, which are not sent to the client. You can test that with, fro...
Karan4D's WorldSim System Prompt Open Source - Pastebin.com: Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
Keyword search across all chatrooms to find across conversation history by russellballestrini · Pull Request #1 · russellballestrini/flask-socketio-llm-completions: Summary by CodeRabbit New Features Added a search functionality to find rooms and messages. Introduced a search results page to display search outcomes. Refactor Streamlined chat interface b...
Tweet from RednBlackSalamander (@9mmballpoint): Art tools
internlm/internlm-xcomposer2d5-7b · Hugging Face: no description found
Support Open Models that allow OpenAI API-style tool use & "auto" tool choice by K-Mistele · Pull Request #5649 · vllm-project/vllm: DRAFT: OpenAI Tool Use Checklist This (Draft) PR will add support for OpenAI-style tool calling in a way that is minimally opinionated about tool use formats & prompt formatting. The following fea...
Three GPTs Walk into a Bar and Write an Exec Summary – D-Squared: no description found
Replete-AI/Everything_Instruct_Multilingual · Datasets at Hugging Face: no description found
Replete-AI/Everything_Instruct · Datasets at Hugging Face: no description found

Nous Research AI ▷ #ask-about-llms (1 messages):

using visual-semantic information to boost image classification performance
zero/few shot multi-modal models discussed at CVPR
applying Florence 2 for supervised fine-tuning

Boosting Image Classification with Visual-Semantic Info: A user inquires about using the interaction between visual-semantic information to enhance fine-grained image classification performance, specifically through supervised fine-tuning. They mention a potential application of Florence 2 for this purpose.
CVPR Highlights Zero/Few Shot Multi-modal Models: At CVPR, numerous papers focused on zero/few shot multi-modal models, demonstrating interest in leveraging both visual and textual data. A user working in computer vision seeks advice on employing this research in practical, supervised settings.

Nous Research AI ▷ #rag-dataset (8 messages🔥):

crossover with pipelines, flows, and agents
rag dataset as 0 shot context ingestion
context and metadata for llm
HF tool processing corpus queries against hf datasets
keyword matching for relevance score and filtering

*Crossover with pipelines, flows, and agents: Pipelines, flows, and agents are merging, and the idea is to make the RAG dataset primarily for 0 shot context ingestion*, focusing on agent-based processing later.
interstellarninja mentioned it's beneficial to incorporate cross-overs into agentic flows, even during RAG development.

*HF Tool Processing and Keyword Matching: A HF tool was described that can process a corpus of queries against HF datasets, converting them into schemas with metadata as .jsonl files, utilizing an inverted index for keyword matching*.
@everyoneisgross mentioned the interface allows for editing generations with Gradio, keyword search functions well for toy prompting.

Nous Research AI ▷ #world-sim (10 messages🔥):

Users discussing lack of credits to use WorldSIM
Issues with using GPT-3.5 on WorldSIM
Prompt engineering for different models on WorldSIM
Positive feedback about WorldSIM
Buddhist world simulation on WorldSIM

WorldSIM Users Run Out of Credits: A user recommended explaining the credit limitations on WorldSIM, suggesting a heading like "Not enough credits to use" or using red text to indicate "NO CREDITS". This would help avoid confusion for new users.
Frustration with GPT-3.5 on WorldSIM: Several members expressed frustration with using GPT-3.5 on WorldSIM, mentioning that it often returns one-line answers before eventually working. One user complained about wasting credits on multiple messages to get started.
New Prompt Engineering for WorldSIM Models: A discussion revealed that WorldSIM is working on new prompt engineering for different models. A member mentioned that separating the prompts between different models is a work in progress (WIP).
Members Praise WorldSIM: A member stated that WorldSIM is "bonkers" and congratulated the team on an awesome job. Another member shared their experience of using up all their credits during a lunch hour to create a world rooted in Buddhist principles.

OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):

Simple Telegram bot to interface with different AI models
First 1000 responses free on the bot

*Try Mysticella Bot for AI Model Interfacing: Created a simple Telegram bot to interface with different AI models. First 1000 responses* are free.
*Telegram Bot First 1000 Responses Free: Check out the new Telegram bot Mysticella for free AI model interfacing. The first 1000 responses* are completely free.

OpenRouter (Alex Atallah) ▷ #general (107 messages🔥🔥):

Quantisation of deployed LLM models in OpenRouter
Microsoft's API changes affecting OpenRouter
Infermatic's privacy policy update
Issues with DeepSeek Coder equations rendering
Mistral Codestral API pricing and performance

LLM models quantization confusion clarified: OpenRouter LLM models are deployed in FP16/BF16 unless a provider specifies otherwise, as explained by a user. Another user clarified the presence of a quantization icon indicating model quantization status.
Microsoft API change impacts OpenRouter: Microsoft introduced a breaking change to their API used by OpenRouter, but a patch was quickly deployed. User feedback praised the rapid response and fix.
Infermatic clarifies privacy policy: Infermatic does not log any input prompts or model outputs, processing data in real-time only, as clarified in their revised privacy policy. Users found this reassuring compared to older policies indicating potential data retention.
DeepSeek Coder equation issue resolved: Users experienced issues with equations not rendering correctly in DeepSeek Coder, although one user found solutions by manipulating output strings with regex. Another user reported the system prompts not being processed correctly on TypingMind's frontend, raising the issue for review.
Mistral Codestral API pricing criticized: Users expressed dissatisfaction with Mistral's Codestral API pricing, considering it overpriced for a 22B model. Alternative options like DeepSeek Coder were recommended for better cost efficiency and comparable coding performance.

Links mentioned:
UGI Leaderboard - a Hugging Face Space by DontPlanToEnd: no description found
Llama 3 Euryale 70B v2.1 by sao10k: Euryale 70B v2.1 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). - Better prompt adherence. - Better anatomy / spatial awareness. - Adapts much better to unique and c...
Limits | OpenRouter: Set limits on model usage
Code generation | Mistral AI Large Language Models: Codestral is a cutting-edge generative model that has been specifically designed and optimized for code generation tasks, including fill-in-the-middle and code completion. Codestral was trained on 80+...
A guide to LLM inference and performance: Learn if LLM inference is compute or memory bound to fully utilize GPU power. Get insights on better GPU resource utilization.
SillyTavern/src/prompt-converters.js at release · SillyTavern/SillyTavern: LLM Frontend for Power Users. Contribute to SillyTavern/SillyTavern development by creating an account on GitHub.
Privacy Policy - Infermatic: no description found
no title found: no description found
Privacy Policy - Infermatic: no description found
lluminous: no description found

Eleuther ▷ #general (42 messages🔥):

Failed jobs on the leaderboard
Checksum for generative models
Topological Data Analysis for model fingerprinting
1.58 bit LLM paper and its implementation
VQ-VAE immunity to posterior collapse

*Leaderboard Job Issues Surface*: A member inquired about failed jobs on the Hugging Face leaderboard and whether they can be re-added.
*Debate on Checksums for Generative Models*: Discussion arose on whether there is a checksum-like metric for generative models like LlamaForCausalLM using lm-evaluation-harness, with discrepancies noted between benchmarks and checksums.
*Exploring TDA for Model Fingerprinting*: Members delved into the use of Topological Data Analysis (TDA) to fingerprint models by measuring topological invariants, referencing tools like TorchTDA.
’Have you ever looked into Topological Data Analysis? You could potentially accomplish this by using TDA to profile the weights by their inherent topological invariants.’

*Implementing 1.58-bit LLM Innovations*: A member sought guidance on adopting techniques from the 1.58-bit LLM paper to quantize weights and activations for higher cost-efficiency.
They planned to replace the linear layers with a 'BitLinear' layer in a pre-trained model like Pythia to test quantized weight training.

*Struggles with PDF Markup Tools*: A member expressed frustration over the lack of PDF markup tools with a 'Search -> Markup All' function, mentioning expensive options like Bluebeam and PDF Studio.

Links mentioned:
Overview — giotto-tda 0.5.1 documentation: no description found
cognitivecomputations/dolphin-2.9.2-qwen2-72b_eval_request_False_bfloat16_Original.json · open-llm-leaderboard/requests at 8c010a41f0b5f726199183bbad05f1649a362adf: no description found
On the Origin of Llamas: Model Tree Heritage Recovery: The rapid growth of neural network models shared on the internet has made model weights an important data modality. However, this information is underutilized as the weights are uninterpretable, and p...
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits: Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single param...
GitHub - EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models.: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness

Eleuther ▷ #research (28 messages🔥):

Diffusion forcing for planning
Comparison with Nathan Frey's walk_jump method
Discussing new research strategies
Continual pre-training for LLMs
Function approximation with different homotopy classes

Diffusion Forcing Shows Promise in Planning: A member shared a video demonstrating diffusion forcing for planning, generating a lot of interest and positive feedback, 'really cool result tbh'.
Diffusion Forcing vs Walk-Jump Method: Discussion on whether diffusion forcing would outperform Nathan Frey's walk_jump method concluded that they may be orthogonal techniques with different mechanisms.
Effective Paper Consumption Strategy: A member inquired about the strategy for keeping up with new research, receiving advice that skimming ArXiv papers on release and systematic effort in filtering important ones is key.
Continual Pre-training for Large Language Models: Recent research on continual pre-training observed a 'stability gap' in the performance of LLMs when adapting to new domains, and proposed three strategies to mitigate it.
Homotopy Classes in Function Approximation: A member queried the benefit of having each basis function's image belong to different homotopy classes during function approximation, particularly in modeling rotation trajectories.

Links mentioned:
Protein Discovery with Discrete Walk-Jump Sampling | Nathan Frey: Portal is the home of the AI for drug discovery community. Join for more details on this talk and to connect with the speakers: https://portal.valencelabs.co...
Protein Discovery with Discrete Walk-Jump Sampling: We resolve difficulties in training and sampling from a discrete generative model by learning a smoothed energy function, sampling from the smoothed data manifold with Langevin Markov chain Monte Carl...
Efficient Continual Pre-training by Mitigating the Stability Gap: Continual pre-training has increasingly become the predominant approach for adapting Large Language Models (LLMs) to new domains. This process involves updating the pre-trained LLM with a corpus from ...

Eleuther ▷ #scaling-laws (5 messages):

efficientcube.ipynb in chinchilla repository
XLA capabilities in JAX
FLOPs estimation for JIT-ed functions in Flax
Critical batch size and performance degradation

*EfficientCube Notebook in Chinchilla: A toolkit for scaling law research, named efficientcube.ipynb*, has been added to the Chinchilla repository. The notebook includes utilities relevant for scaling research activities.
*JAX adds AOT Compilation Capabilities*: JAX now supports ahead-of-time (AOT) compilation in addition to JIT compilation. This allows users to compile code prior to execution, giving more control over the compilation process.
*Flax FLOPs Estimation Method Shared: A code snippet for estimating FLOPs of JIT-ed functions* in Flax was shared in a discussion on GitHub. This method leverages XLA’s capabilities within JAX for precise performance measurements.
*Reevaluation of Critical Batch Size Theory: Recent findings suggest that below a certain optimal batch size, performance degrades, contradicting the conventional wisdom* that any batch size below a critical value is good. This is noted as being interesting in theory but not significant at large scales.

Links mentioned:
Ahead-of-time lowering and compilation — JAX documentation: no description found
How do you access XLA's flop estimate for a jitted program? · google/flax · Discussion #1854: For now, here is how you do it: In [1]: import jax, jax.numpy as jnp In [2]: m = jax.xla_computation(lambda x, y: x @ y)(jnp.ones((1000, 1000)), jnp.ones((1000,1000))).as_hlo_module() In [3]: clien...
chinchilla/examples/efficientcube.ipynb at master · kyo-takano/chinchilla: A toolkit for scaling law research ⚖. Contribute to kyo-takano/chinchilla development by creating an account on GitHub.

Eleuther ▷ #interpretability-general (3 messages):

SAEs on Llama 3 8B
Sparse autoencoders
Residual stream processing

*SAEs on Llama 3 8B trained: Sparse autoencoders (SAEs) trained on the residual stream of Llama 3 8B are now available for use. These SAEs employ the RedPajama corpus and can be loaded using the EleutherAI sae library*.
Downloads are not currently tracked for this model.

*Residual stream processing using SAEs: This project organizes SAEs by layer* and integrates them with the Llama 3 8B model to process residual streams more effectively. For more details, consult the model card.

Link mentioned: EleutherAI/sae-llama-3-8b-32x · Hugging Face: no description found

Eleuther ▷ #lm-thunderdome (18 messages🔥):

Preprocessing Function Optimization
Proof-Pile Config Error
Metric Inconsistencies in Config
Long Model Names Issue
Evaluating Model in Parallel

Preprocessing Caching Alternative: A user inquired if preprocessed questions/arguments could be saved before feeding them into the model, to avoid rerunning preprocessing functions every time.
Proof-Pile Config Error Resolution: A user faced an error with the proof-pile task using a specific config file. Switching to lambada_openai worked, indicating a potential issue with the dataset itself.
Metric Mismatch Identified in Config: There was confusion over using loglikelihood_rolling in the config while loglikelihood got called, likely due to metric inconsistencies. loglikelihood metrics: perplexity vs word_perplexity, byte_perplexity, bits_per_byte.
Long Model Names Cause Saving Issues: A user experienced issues with saving due to long model names causing files and directories to not be written correctly. Errors returned OSError(36, 'File name too long').
Parallel Evaluation Setup Inquiry: A user asked how to evaluate the model in a parallelized way while passing it via the pretrained parameter. Warning received: 'assuming single-process call to evaluate() or custom distributed integration'.

Links mentioned:
Exploring model generations - a Hugging Face Space by open-llm-leaderboard: no description found
Added MedConceptsQA Benchmark by Ofir408 · Pull Request #2010 · EleutherAI/lm-evaluation-harness: Hi, I haved added our new benchmark called MedConceptsQA. MedConceptsQA is a dedicated open source benchmark for medical concepts question answering. The benchmark comprises of questions of various...

Eleuther ▷ #multimodal-general (1 messages):
wendlerc: Does anyone have a good SDXL latent downscaler? I’d like to go from 128x128x4 to 64x64x4.

LangChain AI ▷ #general (75 messages🔥🔥):

Difficulty using LangChain
Preference between OpenAI or ChatOpenAI
PeopleGPT and Juicebox.ai functionality
RAG Architecture for scheduling demos
LangChain performance issues and improvements

*Whys and Whys Nots of LangChain: A member expressed difficulty using LangChain* and questioned its utility, citing long response times and unnecessary steps in processing, especially while running locally on CPU.
Another user pointed out it might be the model's reasoning performance or simply the fact it's running without a GPU, leading to inefficiencies like excessive irrelevant searches.

*OpenAI vs. ChatOpenAI: OpenAI and ChatOpenAI were compared for task executions, with a user inquiring the pros and cons and noting that OpenAI might be deprecated in favor of ChatOpenAI*.
Several members clarified that diverse experiences exist, depending on the exact requirements and implementation contexts.

*PeopleGPT in Juicebox.ai Shines: A member discussed Juicebox.ai powered by PeopleGPT*, a natural language-based search engine for finding qualified talent without using Booleans, providing easy clickable examples here.
The discussion focused on the technical functionality, highlighting it combines filters with search to enhance user experience.

*Issues with LangChain and CSV Files: A user sought updated methods for dealing with multiple CSV files in LangChain*, noting previous limitations in handling more than two files post-update.
The member reminisced about the effectiveness of previous modules and queried modern alternatives for optimal performance and integration.

*Challenges with LangChain for Scheduling Demos: A member struggled with incorporating demo scheduling features in a chatbot using LangChain and RAG architecture, mentioning tools like SlackScheduleMessage*.
Detailed steps provided from LangChain's community were discussed for possible solutions, emphasizing the need for further community input.

Links mentioned:
Conceptual guide | 🦜️🔗 LangChain: This section contains introductions to key parts of LangChain.
langchain_core.messages.human — 🦜🔗 LangChain 0.2.6: no description found
Agents and GraphQL- 401 Client Error: Unauthorized for url: https://streaming.bitquery.io/eap · Issue #23881 · langchain-ai/langchain: Checked other resources I added a very descriptive title to this issue. I searched the LangChain documentation with the integrated search. I used the GitHub search to find a similar question and di...
langchain.agents.tool_calling_agent.base.create_tool_calling_agent — 🦜🔗 LangChain 0.2.6: no description found
Juicebox (PeopleGPT) - The leader in AI-powered people search.: Discover PeopleGPT, the search engine that know who you're looking for. Search through 800+ million profiles in real-time using natural language. Get contact details and set up outreach campaigns...
Tweet from @levelsio (@levelsio): I recommend everyone against using LangChain and this article explains well It uses abstractions on top of abstractions and actually makes your code needlessly complicated Just write API calls and a...
Why we no longer use LangChain for building our AI agents: When abstractions do more harm than good - lessons learned using LangChain in production and what we should’ve done instead
Three GPTs Walk into a Bar and Write an Exec Summary – D-Squared: no description found

LangChain AI ▷ #share-your-work (3 messages):

Adding demo scheduling feature to chatbot using the RAG architecture and LangChain framework
Blogpost on creating an E2E Image Retrieval app using Lightly SSL and FAISS
Beta testing for advanced research assistant and search engine with premium model access

RAG Chatbot Needs Demo Scheduling Feature: A member asked for community help to add a demo scheduling feature to their chatbot built using the RAG architecture and the LangChain framework.
*Lightly SSL* and FAISS power Image Retrieval App: A blogpost was shared on creating an E2E Image Retrieval app using Lightly SSL and FAISS, including implementing a vision transformer model and creating vector embeddings. The detailed blogpost includes a Colab Notebook and a Gradio app.
*Rubik's AI* offers Free Beta Testing: An invitation was extended for beta testing an advanced research assistant and search engine, offering 2 months of free premium access to models like Claude 3 Opus, GPT-4o, and more.
Users were prompted to sign up using the promo code 'RUBIX' for the free trial.

Links mentioned:
Rubik's AI - AI research assistant & Search Engine: no description found
Vector Indexes and Image Retrieval using lightly: Use a pre-trained Vision Transformer provided by Lightly to create a vector index on an arbitrary dataset for Image Retrieval using faiss
Food101 Image Retrieval - a Hugging Face Space by lightly-ai: no description found
Tweet from Saurav Maheshkar ☕️ (@MaheshkarSaurav): 🚀 Latest work at @LightlyAI. Learn how you can create an Image Retrieval app using FAISS (@AIatMeta) as an vector index 🗃️, model implementations from the Lightly SSL package and @weights_biases for...
Google Colab: no description found

LangChain AI ▷ #tutorials (1 messages):
dievas_: https://www.youtube.com/watch?v=yF9kGESAi3M try this one

LlamaIndex ▷ #announcements (1 messages):

Next webinar on RAG experimentation/evaluation with LlamaIndex and Weights and Biases
Announcements about the timing and focus of the upcoming webinar
Complex challenge of aligning LLM Judge for accurate evaluation

*� Next Webinar on Aligning Your LLM Judge: Join the next webinar on a principled approach to RAG experimentation/evaluation with LlamaIndex and Weights and Biases next Wednesday at 9am PT. Reserve your spot by registering through the provided link.
Complex Challenge of Aligning Your LLM Judge: This webinar will explore various evaluation strategies focused on aligning your LLM Judge using a RAG pipeline as a case study. It will also demonstrate how to leverage Weights and Biases Weave for systematic assessment.

Link mentioned: LlamaIndex Webinar: Aligning Your LLM Judge with LlamaIndex and W&B Weave · Zoom · Luma: While creating a RAG pipeline is now straightforward, aligning your LLM Judge for accurate evaluation remains a complex challenge. In this webinar, we’ll delve…

LlamaIndex ▷ #blog (4 messages):

New Webinar: A Principled Approach to RAG Experimentation + Evaluation
Reflection as a Service
Becoming a Rockstar AI Engineer and Educator
Corrective RAG as a Service

Webinar: Partnering with Weights & Biases on RAG: LlamaIndex announced a webinar with Weights & Biases to showcase building, evaluating, and iterating on RAG pipelines. This follows 1+ years of RAG development but notes that proper evaluation remains challenging.
Ensuring Reliability with Reflection as a Service: LlamaIndex discussed the concept of 'Reflection as a Service,' addressing reliability issues in agentic applications by implementing a reflection step to self-correct outputs if incorrect. This solution aims to prevent problematic outputs from LLMs.
Rockstar AI Engineer: @ravithejads's Journey: LlamaIndex highlighted the journey of community member @ravithejads, who became a developer advocate through passion, OSS contributions, and staying updated with the latest AI trends. His story is shared to inspire others to excel in AI engineering and education.
Releasing Corrective RAG as a Service: LlamaIndex announced the release of Corrective RAG (CRAG) by Yan et al., which dynamically validates retrieved context and corrects it if irrelevant, using web search before the generation step.

LlamaIndex ▷ #general (71 messages🔥🔥):

Google Cloud Function inference pipeline with multiple model loading
Performance comparison of Cohere's command r+
Implementing conversational memory in LlamaIndex with RAG
Using hybrid retrievers without storing/loading from filesystem
Few-shot example technique for 'Poor man's RLHF'

*Multiple Model Loading in Google Cloud Function Inference Pipeline*: A user expressed issues with loading the Alibaba NLP embedding model and Llama3 LLM for inferences on a Google Cloud Function, facing repetitive loading times. They asked for alternatives to load embeddings directly from Vertex AI and received suggestions but no concrete solution.
*Handling Conversational Memory in LlamaIndex*: A user sought ways to avoid overuse of conversation memory in LlamaIndex and received advice on improving prompt engineering to mitigate the issue. They agreed that modifying the system prompt might help.
*Hybrid Retriever Usage Without Filesystem Storage*: A user inquired about implementing a hybrid retriever without filesystem storage, and suggestions included writing the BM25 algorithm for sparse vectors and storing them in a vector store. Discussion also mentioned future explorations with bm42 and minor tweaks needed for LlamaIndex support.
*Handling Large Models and Quantization*: A user discussed challenges with using the 'gte-Qwen2-7B-instruct' and 'BAAI/bge-large-en-v1.5' embedding models due to GPU limitations. They planned to test quantized embedding models and learned both models can be used if dimensions match.
*Local LLMs, GPT4All, and Outdated Documentation*: Concerns were raised about outdated examples and links in the documentation. Latest information on using local LLMs was shared, and it was noted that contributions to update the documentation are welcome.

Links mentioned:
no title found: no description found
BM42: New Baseline for Hybrid Search - Qdrant: Introducing BM42 - a new sparse embedding approach, which combines the benefits of exact keyword search with the intelligence of transformers.
Tweet from Jay Hack (@mathemagic1an): "Poor man's RLHF" 1) Have user indicate when model is correct 2) Store associated (input, output) in embedding index 3) At inference time, retrieve nearest K previous inputs 4) Put the...
Semantic double merging chunking - LlamaIndex: no description found
GitHub - microsoft/graphrag: A modular graph-based Retrieval-Augmented Generation (RAG) system: A modular graph-based Retrieval-Augmented Generation (RAG) system - microsoft/graphrag
Starter Tutorial (Local Models) - LlamaIndex: no description found
Google Colab: no description found
Index - LlamaIndex: no description found
Chat Engine - Context Mode - LlamaIndex: no description found
Recursive Retriever + Node References - LlamaIndex: no description found

Cohere ▷ #general (45 messages🔥):

Discussion about qualifications for attending the London event
Issue with deploying an app using Cohere's rerank API in production
Introduction of new members
Teaching AI and advanced development
Working on AI-Plans, a peer review platform for red teaming alignment plans

*No Qualification Needed for London Event: A member asked if certain qualifications were necessary to attend the London event, and others clarified that no prerequisite requirements were needed and anyone could attend by filling out a form. No PhD needed to attend community events* was a key message.
*Rerank API Error in Production: A member raised a TypeError when deploying an app using the rerank API* in production, contrasting its local functionality. Another member noted that the issue seems unrelated to Cohere and asked for the Streamlit script for further diagnosis.
*New Members Introduce Themselves: Several new members, including a recent Computer Science graduate* and an AI developer interested in teaching, introduced themselves and expressed excitement about joining the community. They highlighted their backgrounds and what they hope to achieve within the Discord.
*Teaching AI and Advanced Development: A member expressed keen interest in teaching AI and advanced development*, inviting others to reach out for collaboration. This was well-received, with another member openly offering to seek his expertise soon.
*AI-Plans Platform: A member revealed working on AI-Plans, a peer review platform for red teaming alignment plans*. This sparked interest and welcomed them to further discuss their project.

Cohere ▷ #project-sharing (17 messages🔥):

Featuring a tutorial on Cohere blog
Introducing Command R+, the new powerful model
Using Rhea.run to create toy apps
New 'Save to Project' feature in Rhea.run

*Feature Tutorial on Cohere Blog*: A member expressed interest in featuring a tutorial on the Cohere blog and shared an old blog post and starter code for a Slack bot on GitHub. Another member confirmed they will follow up directly.
*Using Rhea.run for Toy Apps*: Members discussed using Rhea.run to create toy apps, noting its capability to generate interactive applications by asking it to design HTML scripts.
*Introducing Command R+*: Cohere announced the release of Command R+, their most powerful model in the Command R family, now available for use.
*New Feature in Rhea.run*: A new 'Save to Project' feature was introduced in Rhea.run, which allows users to create interactive applications by designing HTML scripts through conversations.

Links mentioned:
Rhea | Byte Breeze Studios: no description found
Build a smart Slack bot with language models: Have you ever wanted to build an intelligent Slack bot? There are many ways to inject intelligence into a Slack or Discord bot. Starter Code: https://github.com/cohere-samples/cohere-slack-starter-ap...
GitHub - cohere-samples/cohere-slack-starter-app: Co:here-powered Slack App Starter Project: Co:here-powered Slack App Starter Project. Contribute to cohere-samples/cohere-slack-starter-app development by creating an account on GitHub.

OpenInterpreter ▷ #general (57 messages🔥🔥):

Technical question about interpreter output
Discussion on new MacOS Copilot, Invisibility
Acknowledgment and progress on Open Interpreter (OI) security
Open Interpreter's new debugging feature
Monthly House Party events

*Invisibility: MacOS Copilot Gains Traction*: Members discussed the new Invisibility MacOS Copilot that uses GPT-4, Gemini 1.5 Pro, and Claude-3 Opus, highlighting its free availability and features like seamless context absorption. Development of voice, long term memory, and iOS is ongoing.
Interest was expressed about integrating similar tools into the OI ecosystem, with one member suggesting the possibility of open-sourcing grav.ai, a preceding project.

*Open Interpreter Implements Debug Command*: One user excitedly reported that Open Interpreter can now change the VSC theme from light mode to dark mode automatically, showcasing its ability to perform certain actions without explicit programming. This feature, referred to as the 'wtf' command, allows for debugging errors in the terminal and suggesting fixes.
This newly implemented functionality caused quite a buzz, with members sharing their amazement and support for ongoing improvements.

*Acknowledgment of OI Security Measures*: A member praised the OI team for their dedication to security, mentioning a meeting where various ideas and suggestions were discussed to improve the system's security model. The team's commitment to making security a priority was highly appreciated.
Plans for future security roundtables were mentioned, with a promise to update the community on dates and ongoing efforts.

*Monthly House Party Recap*: The community celebrated the success of OI’s 4th of July House Party which showcased new demos, faces, and previews of upcoming updates. The next event is scheduled for August 1st.
Members expressed their joy and gratitude for the event, highlighting its role in fostering engagement and collaboration within the community.

Links mentioned:
Tweet from SKG (ceo @ piedpiper) (@sulaimanghori): So we've been cooking the last few weeks. Excited to finally unveil Invisibility: the dedicated MacOS Copilot. Powered by GPT4o, Gemini 1.5 Pro and Claude-3 Opus, now available for free -> @inv...
Gravity: Your personal AI.

OpenInterpreter ▷ #O1 (2 messages):

01 Light shipments update
Delays in 01 Light shipments

*01 Light Shipments Update: Members expressed anticipation about the 01 Light shipments with one hoping for an update soon. Another member shared their frustration, stating they've been waiting forever*.
*Frustration Over Shipment Delays: A member conveyed their dissatisfaction over the prolonged wait for the 01 Light*. The sentiment was echoed by another member, indicating collective frustration.

Modular (Mojo 🔥) ▷ #general (5 messages):

Discussion on casting bugs in Mojo
Comparison between Mojo and Python objects
Proposal for a Mojo Fundamentals course at EDx
Resources for learning Mojo

*Casting Bug in Mojo*: A member highlighted the casting bug with references to relevant GitHub issues #3065 and #3167.
*Mojo vs Python Objects Discussion: There is speculation that the casting bug might be related to differences between Mojo objects and Python objects*, referencing issue #328.
*Mojo Fundamentals Course Proposal*: A user proposed creating a "Mojo Fundamentals" course for EDx, but another member suggested it would become outdated quickly. They recommended using Mojo by example and mojo-learning as up-to-date resources instead.

Links mentioned:
Issues · modularml/mojo: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.
Issues · modularml/mojo: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.
Issues · modularml/mojo: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.

Modular (Mojo 🔥) ▷ #🔥mojo (22 messages🔥):

Casting file pointer to struct in Mojo
Calling external programs in Mojo using system or popen
Handling bitcast issues in Mojo with byte array manipulation
Pass a List as an argument to a function in Mojo
MLIR issue with unsigned integer casting in Mojo

Casting file pointer to struct in Mojo: A user successfully bitcasted a List's UnsafePointer to a struct in Mojo using an example shared by another user, with specific reference to bitcast.
MLIR unsigned integer casting bug reported: MLIR Issue #3065 was discussed where casting to unsigned integers behaves like casting to signed integers, creating inconsistencies. This issue has been affecting multiple users and the discussion moved from Discord to GitHub Issue #3065.
External programs in Mojo: Running external programs in Mojo can be achieved using external_call with references given to example here for implementations like system and popen. A Python example for popen was shared, detailing how to run
Handling bitcast issues in Mojo with byte array manipulation: A user encountered inconsistencies when bitcasting objects from a file pointer in Mojo, with behaviors changing based on byte array lookup. The issue was suspected to be due to the bytes getting freed, suggesting keeping the bytes around or using Reference to avoid undefined behavior.
Pass a List as an argument to a function in Mojo: A user resolved an issue passing List as an argument by specifying the type in the function signature as inout inList:List[String]. They initially faced type errors but successfully appended items to the list following the fix.

Links mentioned:
UnsafePointer | Modular Docs: This is a pointer type that can point to any generic value that is movable.
[BUG] Unsigned integer casting overflowing as if signed when using `int()` or `UInt32()` · Issue #3065 · modularml/mojo: Bug description Migrating this here after a bit of discussion in Discord. It seems like casting to unsigned integers actually just casts to signed integers, but has different behaviour in different...
max/examples at main · modularml/max: A collection of sample programs, notebooks, and tools which highlight the power of the MAX Platform - modularml/max

Modular (Mojo 🔥) ▷ #nightly (10 messages🔥):

segfault issues with nightly build
bug report submission
os.path.expanduser bug
new nightly Mojo compiler releases
changelog updates

*Nightly Build Segfaults on Compilation*: A member experienced a segfault while compiling a source file with the nightly build and shared the problematic file. This prompted them to submit a bug report.
*os.path.expanduser Bug Causes Nightly Build Failures*: A bug introduced by using os.path.expanduser caused nightly builds to fail because the HOME environment variable was not set during tests. A member admitted the mistake, apologizing for the inconvenience.
*New Nightly Mojo Compiler Released*: A new Mojo compiler version 2024.7.416 has been released, featuring updates like an exclusive parameter to pointer types and the implementation of collections.Counter. See the changelog and raw diff for detailed changes.
*Subsequent Nightly Mojo Compiler Release*: Another nightly compiler version 2024.7.505 was released, deprecating time.now in favor of time.perf_counter methods. Detailed changes are available in the changelog and raw diff.

Modular (Mojo 🔥) ▷ #mojo-marathons (17 messages🔥):

Feedback from Modular staff on best answers
Interest in x86 and SVE rounds
PR for a better timer needing MLIR knowledge
Benny's solution for matrix multiplication
Compilation times and segfaults in test suite

Modular staff to provide feedback on best answers: Modular staff will give feedback on the best answer at the end of the challenge, as well as offer suggestions for improvement.
Interest in x86 and SVE benchmarks: A discussion emerged about conducting x86 (with and without AMX) and SVE rounds since Graviton 4 is expected to go GA soon, and it features SVE.
Benny shares matrix multiplication solution and hints: Benny shared his best solution for matrix multiplication and hinted at tuning the block size for improved performance. He mentioned using CPU cache sizes as parameters and suggested checking UT Austin papers for more details.
Compilation time and segfault issues in test suite: Users reported long compilation times and internal segfault issues when running the latest test suite with provided solutions.
Relevant papers for parameter tuning: Benny referenced several UT Austin papers for parameter tuning related to cache sizes and matrix multiplication performance improvements. He provided a Google Spreadsheet link listing those papers.

Link mentioned: Matrix Multiplication: Sheet1 Contstraints,Parameters / Tuning Vectorization,Contiguous Access,Nelts, Unrollable Parallelization,Unrollable Unrolling,Contiguous Operations Tiling Square Optimized,Amorized Increase,Recursiv...

LLM Finetuning (Hamel + Dan) ▷ #general (12 messages🔥):

Usage of LangSmith without LangChain
Accusation of lack of GPU credits during AI course
3rd place solution in AI Mathematical Olympiad
Benefits of in-context learning vs. fine-tuning

*LangSmith Operates Independently from LangChain: A user inquired if LangSmith can be used without LangChain, to which others confirmed that it's possible and provided a Colab example and GitHub link. LangSmith allows instrumentation of any LLM application*, useful for debugging and monitoring.
*Accusations About Missing GPU Credits: A heated debate ensued over claims that participants of a course did not receive GPU credits*, with multiple members pointing out that the terms were clearly stated and visible on the course platform. Some speculated that the complaints might be unfounded or driven by ulterior motives.
*Top 3rd Place AI Mathematical Olympiad Solution’s Lack of Fine-Tuning: A user highlighted that the 3rd place solution in the AI Mathematical Olympiad, which won $32k*, did not involve any fine-tuning. The leaderboard can be reviewed for more details here.
*In-Context Learning vs Fine-Tuning Discussion: An interesting discussion was sparked by a LinkedIn post comparing in-context learning with fine-tuning* for LLMs. The detailed insights can be found here.

Links mentioned:
AI Mathematical Olympiad - Progress Prize 1 | Kaggle: no description found
Tracing without LangChain | 🦜️🛠️ LangSmith: Open In Collab Open In GitHub

LLM Finetuning (Hamel + Dan) ▷ #🟩-modal (7 messages):

Discussion on monthly credits and expiration
Distributed finetuning issue solutions
Clarifying the usage and remaining balance of credits

*Clarifying monthly credits and expiration: Members discussed the $1000 monthly credit and potential loopholes, clarifying that unused credits* may not carry over, but still finding it generous.
*Issues with distributed finetuning: A member shared a link to a thread detailing steps to resolve issues encountered during distributed finetuning*.
*Understanding credit usage and balance: Discussion centered on members noticing their remaining balance, with one reporting $1030 after finetuning Mistral, and questioning if it is due to a default $30* per month allocation.

LLM Finetuning (Hamel + Dan) ▷ #jarvis-labs (1 messages):
goktrenks: when is the expiration date for the credits? (thanks btw!)

LLM Finetuning (Hamel + Dan) ▷ #ankurgoyal_textsql_llmevals (2 messages):

Text2SQL use case discussion and appreciation for iterative eval dataset building

*Iterative Building of Eval Dataset Impresses: A member expressed appreciation for the session on Text2SQL, highlighting its value due to the iterative building of the eval dataset*.
The iterative process was particularly appreciated and seen as beneficial for an upcoming use case.

*Thanks to the Community: Members expressed gratitude towards the community, particularly towards an individual for their guidance in building the eval dataset* for Text2SQL.
Such sessions and discussions are found incredibly valuable by the members.

LLM Finetuning (Hamel + Dan) ▷ #workshop-2 (1 messages):

Applying eval framework to unstructured applications
Challenges of using unit tests/Level 1 evals without structured output

Challenges in Eval Framework for Unstructured Output: A user questioned the applicability of the eval framework to outputs that lack strict syntax rules, like a query language. They expressed confusion over implementing unit tests/Level 1 evals without a structured output.
Missing Methodology in Unstructured Eval Applications: The user asked if they were missing something when considering how to apply the eval framework to less structured applications, indicating a gap in understanding or practice.

LLM Finetuning (Hamel + Dan) ▷ #jeremy_python_llms (2 messages):

Pushing models to HF_HUB for inference endpoints
Training models on HF_HUB as endpoints

Push Models to HF_HUB for Inference: Inference endpoints on HF_HUB might be facilitated by pushing a model to the hub and then using the credits for an endpoint. This suggestion revolves around utilizing existing resources for creating efficient inference pipelines.
Training Not Feasible as Endpoints: The idea that training will work as an endpoint on HF_HUB is questionable. It's discussed that training may not be practical for endpoints, possibly due to resource or infrastructure limitations.

LLM Finetuning (Hamel + Dan) ▷ #axolotl (3 messages):

Using type: input_output with Meta-Llama-3-8B-Instruct
Special tokens configuration in Axolotl
Training outcomes with L3 8B base vs L3 70B Instruct
Template usage for prompt formatting
Special tokens setup discrepancies between models

Struggling with Meta-Llama-3-8B-Instruct setup: A user shared challenges with using type: input_output and configuring special_tokens for the Meta-Llama-3-8B-Instruct model, citing confusion over correct setup in their jsonl and yml files. They referenced a GitHub example and a blog post for additional context.
Disparities in special tokens setup: Discussion included the need to add special tokens from Meta's special_tokens_map.json, comparing it to the special tokens setup for Mistral 7B base. They suggested following similar configurations as used in other training setups to avoid issues.
Training results favoring L3 70B Instruct: A user noted better subjective outcomes training on L3 70B Instruct base compared to L3 8B base, discovering the improved results only after checking the model configuration post-training. They mentioned an accidental but preferable result when a training setup defaulted to the 70B instruct model.

Link mentioned: Hamel’s Blog - Template-free axolotl: Template-free prompt construction in axolotl with the new input_output format.

LLM Finetuning (Hamel + Dan) ▷ #credits-questions (1 messages):

Eligibility for credits on all services
Enrollment date and course catch-up

*Seeking Eligibility for Credits: A member inquired about their eligibility for credits* on all services and expressed gratitude for any applicable credits.
*Course Enrollment Date: The same member mentioned they enrolled in the course on June 14th* and have been catching up recently.

LLM Finetuning (Hamel + Dan) ▷ #predibase (1 messages):

Expired compute credits
Extension request for compute credits

*Compute Credits Expire Too Soon: A member realized that their compute credits have expired after only one month, leaving them with around $70* still unused.
*Extension Request for Compute Credits: The same member politely asked if it is possible to get an extension for the remaining compute credits*.

LLM Finetuning (Hamel + Dan) ▷ #openai (1 messages):

Credit grant request
Enrollment details

*Credit grant request: A user requested credit grants for their updated form with organization ID org-SxGZTlTAAYP5xAswIojG7KI5*.
*Enrollment details: The user mentioned they enrolled on June 14th* and are catching up on the course lately.

Interconnects (Nathan Lambert) ▷ #news (5 messages):

Unimpressed reaction to AI demo
Stability AI's apology and license update

*AI Demo Criticism Raises Authenticity Questions: @benhylak expressed disappointment with an AI demo on X, questioning its authenticity by stating 'it's really, really bad... leaves me wondering if the demo was fake?'. Response time* issues were particularly noted.
*Stability AI Apologizes and Updates License: Stability AI acknowledged that Stable Diffusion 3 Medium didn't meet community expectations and clarified updates on its commercial license, aiming to address confusion and concerns. They committed to releasing high-quality Generative AI* models moving forward.

Links mentioned:
Tweet from Stability AI (@StabilityAI): At Stability AI, we’re committed to releasing high-quality Generative AI models and sharing them generously with our community of innovators and media creators.  We acknowledge that our latest releas...
Tweet from ben (@benhylak): just tried it and... it's really, really bad. leaves me wondering if the demo was fake? Quoting ben (@benhylak) the world is about to change very fast.

Interconnects (Nathan Lambert) ▷ #other-papers (8 messages🔥):

BM42 vs BM25 in search engines
Contextual AI's focus on RAG
Jo Bergum's critique of Qdrant's BM42 claims

BM42 challenges BM25 in search tech: Qdrant Engine claims that the new search model, BM42, surpasses the traditional BM25 in modern RAG applications, offering a mix of semantic and keyword search as mentioned in their tweet.
Jo Bergum- BM42 results are fake: Jo Bergum criticized Qdrant Engine for falsifying results about BM42 on a Quora dataset, stating that Precision@10 reported was impossibly high, and calling the results **

Links mentioned:
Tweet from Qdrant (@qdrant_engine): For 40 years, BM25 has been the standard for search engines. However, it falls short for modern RAG applications. Say hello to BM42: The combination of semantic and keyword search
Tweet from Jo Kristian Bergum (@jobergum): Okay, gloves off. What @qdrant_engine did with the BM42 post is unacceptable. They are misguiding the RAG community in a big way. 1) Presenting Quora as a relevant RAG question-answering dataset. I...

Interconnects (Nathan Lambert) ▷ #random (9 messages🔥):

Understanding VAEs
Interconnects' investment genius
GDP growth rate from AI for timelines
Anthropic Claude 3.5 Sonnet suppressing answers

Understanding VAEs leads to nosebleeds: VAEs (Variational Autoencoders) have caused confusion, with one user humorously noting they got a nosebleed trying to understand them.
Investment Genius in Interconnects: In a recent post, it was revealed that interconnects showcased his prowess as an "absolute investment genius."
AI-driven GDP growth requires significant rates: GDP growth from AI needs to be between 11-15% to meet Stuart's timelines, depending on initial conditions. This metric was checked for feasibility and deemed reasonable.
Anthropic Claude 3.5 Sonnet suppressing answers: Anthropic Claude 3.5 Sonnet is reportedly suppressing parts of its answers from users. The usage of hidden tags like §§antThinking§§ has raised concerns about transparency in these AI systems.

Link mentioned: Tweet from Philipp Schmid (@_philschmid): I wasn't aware of that, but it looks like Anthropic Claude 3.5 Sonnet on (claude ai) is suppressing parts of his answer from the user, which are not sent to the client. You can test that with, fro...

Interconnects (Nathan Lambert) ▷ #posts (4 messages):

Gemini web app
Google AI Studio
Vertex AI
Google's AI race
First Amendment and weights

Google's AI race lags behind: Google is behind other companies in the AI race, needing to clean up clarity issues that caused user confusion, according to a detailed discussion in the chat.
One participant expressed that Google is slow and messy booting up in the AI race, but acknowledged that they are improving.

Understanding Gemini and its offerings: The Gemini web app costs $20/mo and competes with ChatGPT, previously named Bard and powered by PaLM 2 before now using Gemini 1.5. Google AI Studio provides an API key for developers to use Gemini 1.5 with 2M context, while Vertex AI offers the same for enterprises.
One user expressed confusion about whether the paid version of Gemini always uses Gemini 1.5 due to unclear FAQs.

First Amendment and weights: A user discussed the application of the First Amendment to AI model weights, suggesting it could be a logical but optimistic view.
The idea is that weights should be protected as something published, thereby covered by the First Amendment.

OpenAccess AI Collective (axolotl) ▷ #general (20 messages🔥):

Issues with build.nvidia API
Queue system for build.nvidia API
Script issues and resolutions
Pipeline using YAML examples

*build.nvidia API has hiccups: A member noted trouble with the build.nvidia API. Another pointed out the emergence of a queue system* for handling requests.
In attempts to resolve the script issues, a member realized it worked again after a brief pause, suggesting intermittent reliability of the API.

*Pipeline accepts YAML inputs: In a discussion on handling inputs, a member mentioned their pipeline employs YAML examples* of conversations for few-shot learning. They clarified this when questioned about incorporating textbook data.

OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (1 messages):

Gemma2 update fixing issues
Pinned version of transformers
CI catching problems

Gemma2 Fixes Issues in Updates: The update for Gemma2 addressed previously encountered problems. Using the pinned version of transformers ensures these issues are avoided, thanks to our CI system detecting such problems.
CI Ensures Stability with Transformers: Pinned version of transformers should sidestep issues, as continuous integration (CI) will catch potential problems. This guarantees a more stable development environment.

OpenAccess AI Collective (axolotl) ▷ #axolotl-help-bot (1 messages):
le_mess: Need more VRAM 🙂

tinygrad (George Hotz) ▷ #general (3 messages):

test for bug placement
issue reproduction
focused test case
PR management

*Test Bug Placement Decision: A user inquired about the best location for a bug test—either in test_nn or test_ops*—and asked for advice on naming it.
The user confirmed understanding and delegated the task to someone else, indicating that they will handle it.

*Issue Reproduction and PR Management*: Another user suggested leaving the PR open, treating it as an issue with a reproduction step, and ensuring the fix includes a more focused test case.
Final confirmation from the original user implied they would handle the specifics.

tinygrad (George Hotz) ▷ #learn-tinygrad (12 messages🔥):

Contiguous Tensors in Tinygrad
Tinygrad Training Efficiency Concerns
Matrix Multiplication Blog Post
Using Pre-trained PyTorch Models with Tinygrad

Tinygrad Contiguous Tensors Confuse Users: There's a discussion about Tensor.randn/randint creating contiguous Tensors whereas Tensor.full and similar methods create non-contiguous ones, which contrasts with PyTorch behavior.
Optimize Tinygrad for Large Scale Training: Members discuss the inefficiencies in Tinygrad for large-scale training, mentioning it as slow and not cost-effective. A suggestion to use BEAM search was made but it takes time.
Learn Matmul with an Informative Blog Post: An engaging blog post about high-performance matrix multiplication on CPU is shared, demonstrating over 1 TFLOPS performance with easy-to-understand code.
Run Inference on Tinygrad with PyTorch Models: Inquiry about the best way to run inference with a pre-trained PyTorch model using Tinygrad. The answer provided points to the usage of tinygrad.nn.state.torch_load.

Links mentioned:
TinyJit vis WIP: TinyJit vis WIP. GitHub Gist: instantly share code, notes, and snippets.
Beating NumPy’s matrix multiplication in 150 lines of C code: TL;DR The code from the tutorial is available at matmul.c. This blog post is the result of my attempt to implement high-performance matrix multiplication on CPU while keeping the code simple, portable...

Torchtune ▷ #general (8 messages🔥):

Setting evaluation parameters for Torchtune
Grad norm graph on wandb
Loss curve optimization in wandb
Learning rate adjustment impacts
Missing wandb logging metrics

*Setting evaluation parameters for Torchtune: A user inquired about how to set evaluation parameters in Torchtune*, and another mentioned there should be a parameter for 'validation dataset' or something similar.
*Missing grad norm graph in wandb: A user sought assistance on obtaining a grad norm graph* in wandb, as it is a default graph in other tools like aoxotl.
*Loss curve optimization in wandb: A user was advised to observe the shape of the loss curve* for a downward trend and was provided an example with a link. They noted insufficient optimisation in their loss curve and the suggestion to increase the initial learning rate.
*Learning rate adjustment impacts: After receiving feedback, a user increased the initial learning rate* and altered several parameters to optimize their model but reported no significant improvement in the loss.
*Missing wandb logging metrics: A user questioned the absence of wandb logging* for evaluation loss and grad norm, indicating an issue with metric logging.

Link mentioned: salman-mohammadi.): Weights & Biases, developer tools for machine learning

AI Stack Devs (Yoko Li) ▷ #ai-town-discuss (5 messages):

Investigating system robustness with Python and TypeScript
Challenges with automatic Docker installation of Convex local backend

*Python & TypeScript face integration issues: A member shared issues with integrating Python and TypeScript, specifically encountering bugs when launching Convex* if Python wasn't pre-installed.
*Docker's Convex backend installation is tricky: Another member discussed challenges in making the Convex local backend installation automated within Docker*, mainly due to how the container folder was set up as a volume for ease of updates and access.

AI Stack Devs (Yoko Li) ▷ #assets (1 messages):

Collection of sprite sheets
Aesthetics and style matching with Cloudpunk
Largest tilemaps on itch.io

Searching for sprite sheets to match Cloudpunk's aesthetics: A member inquired about the source of a specific collection of sprite sheets, mentioning their purchases of several large tilemaps on itch.io that didn't quite match the dark, futuristic, cyberpunk aesthetics of Cloudpunk.
Matching aesthetics of purchased tilemaps: The member is curious about where to obtain spritesheets that go well with the Cloudpunk aesthetic, as their current collections and purchases from itch.io fall short.

DiscoResearch ▷ #general (1 messages):

Three GPTs Walk into a Bar and Write an Exec Summary blog post by dsquared70
Utilizing Custom GPTs for creating executive summaries
Processes for high-frequency, short turnaround executive summaries

*Three GPTs Revolutionize Executive Summaries: Three GPTs Walk into a Bar and Write an Exec Summary blog post introduces a simple process for rapid executive summary creation. Three Custom GPTs* work together: one extracts insights, one crafts summaries, and a third revises the content.
*High-Frequency Executive Summary Tactics: The blog details how these Custom GPTs* address high-frequency and short turnaround needs when summarizing events, technology, or trends. Often tasked with tight deadlines, this process ensures quick yet meaningful summaries.

Link mentioned: Three GPTs Walk into a Bar and Write an Exec Summary – D-Squared: no description found

DiscoResearch ▷ #discolm_german (2 messages):

Magpie model available on HuggingFace Spaces
Generating preference data via HuggingFace Spaces
Duplicated model from davanstrien/magpie
User feedback on Magpie model performance

Magpie model available on HuggingFace Spaces: A Magpie model is now accessible on HuggingFace Spaces, which has been duplicated from davanstrien/magpie.
Doesn't work that well yet, but the concept of generating preference data via HuggingFace Spaces is well-liked.

User feedback on Magpie model performance: A user shared that the Magpie model doesn’t function effectively but appreciates the concept.

Link mentioned: Magpie - a Hugging Face Space by sroecker: no description found

MLOps @Chipro ▷ #events (2 messages):

Claude hackathon collaboration
Kafka optimization webinar

*Claude Hackathon Collaboration*: A member invited others to collaborate and build something cool for the Claude hackathon ending next week.
*Optimize Kafka and Save Costs!: Join a webinar on July 18th at 4 PM IST* to learn best practices for optimizing Kafka, including scaling strategies and cost-saving techniques.
*Expert Speakers at Kafka Webinar: The event will feature Yaniv Ben Hemo from Superstream and Viktor Somogyi-Vass* from Cloudera, who will share their expertise on building scalable, cost-efficient Kafka environments.

Links mentioned:
no title found: no description found
Optimizing Kafka for Cost-Efficiency: Best Practices and Strategies, Thu, Jul 18, 2024, 4:00 PM | Meetup: **Event Title:** **Optimizing Kafka for Cost-Efficiency: Best Practices and Strategies** **Event Details:** Date: July 18th 2024 Time: 4:00 PM IST (Virtual Event) Join us

Datasette - LLM (@SimonW) ▷ #llm (1 messages):

Potential uses for embeddings
New job title 'Embeddings Engineer'

*Embeddings Engineer finds more uses: An individual stated they are discovering more potential uses for embeddings and joked about adopting the title Embeddings Engineer*.
*New job title humor: Embeddings Engineer* was suggested humorously as a new job title due to the increasing number of uses for embeddings.
I think I'll call myself Embeddings Engineer from now on 😄

Don't miss what's next. Subscribe to AI News (MOVED TO news.smol.ai!):