[AINews] Ten Commandments for Deploying Fine-Tuned Models

Ten Commandments for Deploying Fine-Tuned Models

                May 24, 2024

            [AINews] Ten Commandments for Deploying Fine-Tuned Models

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜

            Gemini-in-Google-Slides is all we needed.

AI News for 5/23/2024-5/24/2024.
We checked 7 subreddits, 384 Twitters and 29 Discords (380 channels, and 4467 messages) for you. 
Estimated reading time saved (at 200wpm): 495 minutes.

Followups: Jason Wei published a nice "201" supplement to yesterday's topic on Evals, somewhat on the metagame of making a successful eval, but with some side digressions and anecdotes about specific notable evals like MATH and LMSYS. It's also the last day to use the AINEWS code for the AI Engineer World's Fair.

It's a quiet news day so we went diving for interesting content from the community. Today's winner is Kyle Corbitt's talk on Deploying Finetuned Models in Prod:

In brief the commandments are:
Thou Shalt Not Fine-Tune: Just use prompting! And optionally few-shot examples/RAG. Fine-tuning is expensive, slow, and complex. Only do it if your use case really requires it.
Thou Shalt Write a Freaking Prompt: Create a baseline and prove the task is possible with prompting.
Thou Shalt Review Thy Freaking Data: If you must fine-tune, make sure you understand your data thoroughly.
Thou Shalt Use Thy Actual Freaking Data: Your model will only be as good as the data it's trained on. Make sure your training data is as close as possible to the data your model will see in production.
Thou Shalt Reserve a Test Set: Always reserve a portion of your data for testing to evaluate your model's performance.
Thou Shalt Choose an Appropriate Model: The more parameters a model has, the more expensive and slower it is to train. Choose a model that is appropriate for your task and your budget.
Thou Shalt Write Fast Evals: Write evaluation metrics that are fast to compute so you can quickly iterate on your model.
Also, Thou Shalt Write Slow Evals: Write evaluation metrics that are more comprehensive and take longer to compute, to give you a deeper understanding of your model's performance.
Thou Shalt Not Fire and Forget: Don't just deploy your model and forget about it. Monitor its performance and be prepared to retrain or update it as needed.
Thou Shalt Not Take the Commandments Too Seriously: These commandments are meant to be helpful guidelines, not hard and fast rules. Use your best judgment and adapt them to your specific needs.
Fun fact, we used Gemini to do this summary of the deck. Give it a try.

Table of Contents

AI Twitter Recap
AI Reddit Recap
AI Discord Recap
PART 1: High level Discord summaries
LLM Finetuning (Hamel + Dan) Discord
HuggingFace Discord
Perplexity AI Discord
Stability.ai (Stable Diffusion) Discord
Unsloth AI (Daniel Han) Discord
Nous Research AI Discord
Eleuther Discord
LM Studio Discord
CUDA MODE Discord
Modular (Mojo 🔥) Discord
OpenAI Discord
LangChain AI Discord
LAION Discord
LlamaIndex Discord
OpenRouter (Alex Atallah) Discord
Latent Space Discord
Interconnects (Nathan Lambert) Discord
OpenAccess AI Collective (axolotl) Discord
OpenInterpreter Discord
Mozilla AI Discord
Cohere Discord
AI Stack Devs (Yoko Li) Discord
MLOps @Chipro Discord
DiscoResearch Discord

PART 2: Detailed by-Channel summaries and links
LLM Finetuning (Hamel + Dan) ▷ #general (74 messages🔥🔥):
LLM Finetuning (Hamel + Dan) ▷ #workshop-1 (23 messages🔥):
LLM Finetuning (Hamel + Dan) ▷ #asia-tz (8 messages🔥):
LLM Finetuning (Hamel + Dan) ▷ #🟩-modal (18 messages🔥):
LLM Finetuning (Hamel + Dan) ▷ #jarvis-labs (16 messages🔥):
LLM Finetuning (Hamel + Dan) ▷ #hugging-face (9 messages🔥):
LLM Finetuning (Hamel + Dan) ▷ #replicate (1 messages):
LLM Finetuning (Hamel + Dan) ▷ #kylecorbitt_prompt_to_model (164 messages🔥🔥):
LLM Finetuning (Hamel + Dan) ▷ #workshop-2 (117 messages🔥🔥):
LLM Finetuning (Hamel + Dan) ▷ #workshop-3 (3 messages):
LLM Finetuning (Hamel + Dan) ▷ #axolotl (32 messages🔥):
LLM Finetuning (Hamel + Dan) ▷ #zach-accelerate (118 messages🔥🔥):
LLM Finetuning (Hamel + Dan) ▷ #wing-axolotl (192 messages🔥🔥):
HuggingFace ▷ #announcements (1 messages):
HuggingFace ▷ #general (490 messages🔥🔥🔥):
HuggingFace ▷ #today-im-learning (8 messages🔥):
HuggingFace ▷ #cool-finds (3 messages):
HuggingFace ▷ #i-made-this (22 messages🔥):
HuggingFace ▷ #computer-vision (4 messages):
HuggingFace ▷ #NLP (8 messages🔥):
HuggingFace ▷ #diffusion-discussions (6 messages):
Perplexity AI ▷ #general (493 messages🔥🔥🔥):
Perplexity AI ▷ #sharing (7 messages):
Perplexity AI ▷ #pplx-api (1 messages):
Stability.ai (Stable Diffusion) ▷ #general-chat (427 messages🔥🔥🔥):
Unsloth AI (Daniel Han) ▷ #general (275 messages🔥🔥):
Unsloth AI (Daniel Han) ▷ #announcements (1 messages):
Unsloth AI (Daniel Han) ▷ #random (4 messages):
Unsloth AI (Daniel Han) ▷ #help (103 messages🔥🔥):
Unsloth AI (Daniel Han) ▷ #community-collaboration (2 messages):
Nous Research AI ▷ #off-topic (12 messages🔥):
Nous Research AI ▷ #interesting-links (6 messages):
Nous Research AI ▷ #general (280 messages🔥🔥):
Nous Research AI ▷ #ask-about-llms (8 messages🔥):
Nous Research AI ▷ #project-obsidian (6 messages):
Nous Research AI ▷ #rag-dataset (36 messages🔥):
Nous Research AI ▷ #world-sim (21 messages🔥):
Eleuther ▷ #general (53 messages🔥):
Eleuther ▷ #research (249 messages🔥🔥):
Eleuther ▷ #interpretability-general (3 messages):
Eleuther ▷ #lm-thunderdome (10 messages🔥):
LM Studio ▷ #💬-general (142 messages🔥🔥):
LM Studio ▷ #🤖-models-discussion-chat (70 messages🔥🔥):
LM Studio ▷ #📝-prompts-discussion-chat (23 messages🔥):
LM Studio ▷ #⚙-configs-discussion (6 messages):
LM Studio ▷ #🎛-hardware-discussion (5 messages):
LM Studio ▷ #amd-rocm-tech-preview (4 messages):
LM Studio ▷ #model-announcements (1 messages):
CUDA MODE ▷ #general (23 messages🔥):
CUDA MODE ▷ #triton (4 messages):
CUDA MODE ▷ #torch (1 messages):
CUDA MODE ▷ #announcements (1 messages):
CUDA MODE ▷ #pmpp-book (4 messages):
CUDA MODE ▷ #torchao (5 messages):
CUDA MODE ▷ #llmdotc (115 messages🔥🔥):
CUDA MODE ▷ #rocm (2 messages):
CUDA MODE ▷ #bitnet (1 messages):
Modular (Mojo 🔥) ▷ #general (90 messages🔥🔥):
Modular (Mojo 🔥) ▷ #💬︱twitter (1 messages):
Modular (Mojo 🔥) ▷ #ai (12 messages🔥):
Modular (Mojo 🔥) ▷ #🔥mojo (31 messages🔥):
Modular (Mojo 🔥) ▷ #performance-and-benchmarks (2 messages):
Modular (Mojo 🔥) ▷ #📰︱newsletter (1 messages):
Modular (Mojo 🔥) ▷ #nightly (34 messages🔥):
OpenAI ▷ #ai-discussions (116 messages🔥🔥):
OpenAI ▷ #gpt-4-discussions (11 messages🔥):
OpenAI ▷ #prompt-engineering (8 messages🔥):
OpenAI ▷ #api-discussions (8 messages🔥):
LangChain AI ▷ #general (83 messages🔥🔥):
LangChain AI ▷ #share-your-work (4 messages):
LangChain AI ▷ #tutorials (1 messages):
LAION ▷ #general (65 messages🔥🔥):
LAION ▷ #research (11 messages🔥):
LlamaIndex ▷ #blog (3 messages):
LlamaIndex ▷ #general (60 messages🔥🔥):
LlamaIndex ▷ #ai-discussion (4 messages):
OpenRouter (Alex Atallah) ▷ #announcements (1 messages):
OpenRouter (Alex Atallah) ▷ #general (41 messages🔥):
Latent Space ▷ #ai-general-chat (36 messages🔥):
Latent Space ▷ #ai-announcements (1 messages):
Interconnects (Nathan Lambert) ▷ #random (27 messages🔥):
Interconnects (Nathan Lambert) ▷ #lectures-and-projects (2 messages):
OpenAccess AI Collective (axolotl) ▷ #general (17 messages🔥):
OpenAccess AI Collective (axolotl) ▷ #community-showcase (3 messages):
OpenInterpreter ▷ #general (8 messages🔥):
OpenInterpreter ▷ #O1 (5 messages):
Mozilla AI ▷ #llamafile (9 messages🔥):
Cohere ▷ #general (8 messages🔥):
AI Stack Devs (Yoko Li) ▷ #late-night-lounge (6 messages):
MLOps @Chipro ▷ #events (1 messages):
MLOps @Chipro ▷ #general-ml (1 messages):
DiscoResearch ▷ #general (1 messages):

Part 2
LLM Finetuning (Hamel + Dan) Discord
HuggingFace Discord
Perplexity AI Discord
Stability.ai (Stable Diffusion) Discord
Unsloth AI (Daniel Han) Discord
Nous Research AI Discord
Eleuther Discord
LM Studio Discord
CUDA MODE Discord
Modular (Mojo 🔥) Discord
OpenAI Discord
LangChain AI Discord
LAION Discord
LlamaIndex Discord
OpenRouter (Alex Atallah) Discord
Latent Space Discord
Interconnects (Nathan Lambert) Discord
OpenAccess AI Collective (axolotl) Discord
OpenInterpreter Discord
Mozilla AI Discord
Cohere Discord
AI Stack Devs (Yoko Li) Discord
MLOps @Chipro Discord
DiscoResearch Discord

AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

Anthropic's Claude AI and Interpretability Research

Feature alteration in Claude AI: @AnthropicAI demonstrated how altering internal "features" in their AI, Claude, could change its behavior, such as making it intensely focus on the Golden Gate Bridge. They released a limited-time "Golden Gate Claude" to showcase this capability.
Understanding how large language models work: @AnthropicAI expressed increased confidence in beginning to understand how large language models really work, based on their ability to find and alter features within Claude.
Honesty about Claude's knowledge and limitations: @alexalbert__ stated that Anthropic is honest with Claude about what they know and don't know, rather than purposefully making decisions about its ability to speculate on tricky philosophical questions.

Open-Source AI Models and Advancements

Open-source models catching up to closed-source: @bindureddy highlighted that on the MMLU benchmark, open-source models like GPT-4o are nearing the performance of closed-source models like GPT-4 for simple consumer use-cases. However, more advanced models are still needed for complex AI agent and automation tasks.
New open-source model releases: @osanseviero shared several new open-source model releases this week, including multilingual models (Aya 23), long context models (Yi 1.5, M2-BERT-V2), vision models (Phi 3 small/medium, Falcon VLM), and others (Mistral 7B 0.3).
Phi-3 small outperforms GPT-3.5T with fewer parameters: @rohanpaul_ai pointed out that Microsoft's Phi-3-small model, with only 7B parameters, outperforms GPT-3.5T across language, reasoning, coding, and math benchmarks, demonstrating rapid progress in compressing model capabilities.

AI Agents, Retrieval-Augmented Generation (RAG), and Structured Outputs

Shift from RAG for QA to report generation: @jxnlco forecasted that in the next 6-8 months, RAG systems will transition from question-answering to report generation, leveraging well-designed templates and SOPs to unlock business value by targeting people with money.
ServiceNow uses RAG to reduce hallucination: @rohanpaul_ai shared a ServiceNow paper showing how RAG can ensure generated JSON objects are plausible and executable for workflow automation by retrieving relevant steps and table names to include in the LLM prompt.
RAG adds business value by connecting LLMs with real-world data: @cohere outlined how RAG systems address challenges like hallucinations and rising costs by connecting LLMs with real-world data, highlighting the top 5 reasons enterprises are adopting RAG for their LLM solutions.

AI Benchmarks, Evaluation, and Cultural Inclusivity

Standard AI benchmarks may not guide true global cultural understanding: @giffmana suggested that typical "western" AI benchmarks like ImageNet and COCO may not be indicative of genuine "multicultural understanding". Training models on global data instead of just English can greatly improve performance in non-western cultures.
Difficulties in evaluating large language models: @clefourrier and @omarsar0 shared a report discussing the challenges in robustly evaluating LLMs, such as differences between initial benchmark design and actual use, and the need for more discriminative benchmarks as models become more capable.
Aya 23 multilingual models expand who technology serves: @sarahookr introduced Cohere's Aya 23 models, a powerful multilingual family aiming to serve nearly half the world's population, as part of their mission to change who is seen by technology.

Memes and Humor

Nvidia stock and the "permanent underclass": @nearcyan joked about a spouse regretting not buying Nvidia stock and being part of the "permanent underclass forever".
Satire of Anthropic's Golden Gate Bridge AI: @jeremyphoward satirized Anthropic's interpretability demo, humorously claiming that "OpenAI has already caught up with the latest feature in Claude, and also has an advanced Golden Gate Bridge mode based on sophisticated mechanistic interpretability research."
Poking fun at Google's AI mistakes: @mark_riedl shared a humorous anecdote about jokingly claiming Google's AI incorrectly thought he won a DARPA award, leading people to actually believe he didn't receive the honor.

AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity. Comment crawling works now but has lots to improve!

AI Progress and Capabilities

Impressive transcription and location identification by GPT-4: In /r/OpenAI, GPT4-o demonstrates remarkable abilities to transcribe text from images and identify locations, even without EXIF data, as shown in this video and discussed further.
Yi-Large catching up to state-of-the-art models: A comparison posted in /r/singularity shows Yi-Large approaching GPT-4 performance and surpassing Claude 3 Opus and Gemini 1.5 pro on several benchmarks.

AI Ethics and Safety Concerns

OpenAI employees leaving over ethical concerns: In /r/singularity, it's reported that OpenAI employees are departing not just due to "decel" fears but over issues like partnering with News Corp, lobbying against open source, and aggressive tactics against ex-employees. 
Concerns over OpenAI's News Corp partnership: An /r/OpenAI post criticizes OpenAI's partnership with News Corp, a right-wing propaganda company, worried it could lead to ChatGPT legitimizing extreme viewpoints.
California AI bill requires safeguards but criticized: A new California AI bill, discussed in /r/singularity, mandates models over 10^26 flops have weapons creation prevention, shutdown buttons, and government reporting. However, the requirements are criticized as not making technical sense.
Yann LeCun pushes back on AI doomerism: In a video shared on /r/singularity, AI pioneer Yann LeCun argues the biggest AI dangers are censorship, monitoring, and centralized power, not the doomer scenarios often portrayed.

AI Interpretability and Control

Anthropic's "Golden Gate Claude" maps AI features: Anthropic's research, detailed in /r/singularity, shows their "Golden Gate Claude" can map and manipulate an AI's internal features, a potentially major advance in understanding and controlling AI behavior.
Anthropic demonstrates feature alteration to shape AI behavior: Another Anthropic paper, shared on /r/singularity, shows interpretable features learned by a sparse autoencoder can represent complex concepts and be altered to control an AI, such as inducing an obsession.

AI Commercialization and Access

Meta considers paid version of AI assistant: The Information reports, in a post on /r/singularity, that Meta is working on a premium paid version of its AI assistant.
Macron positions Mistral as EU's top AI company: A CNBC article, shared on /r/singularity, describes French President Macron promoting Mistral as the leading EU AI company, drawing criticism of favoring a French firm over other European contenders.
Google Colab offers free GPUs for AI development: An /r/singularity post highlights that Google Colab is providing free GPU access, including A100s, to enable AI development.

Memes and Humor

Meme on boomers not letting go: A meme on /r/singularity jokes about boomers refusing to let younger generations take over. 
Satirical video on Microsoft training GPT5: An /r/singularity video satirizes Microsoft training GPT5 by feeding it data like a whale consuming krill.
Meme about Windows Recall AI and privacy: A meme on /r/singularity pokes fun at a hypothetical Windows Recall AI feature and the privacy concerns it would raise.

AI Discord Recap

A summary of Summaries of Summaries

LLM Fine-Tuning Techniques and Best Practices:
- Ten Commandments for Fine-Tuning: In Kyle Corbitt's talk, members emphasized meticulous prompt design and template configurations, using ### delimiters and "end of text" tokens for efficient model fine-tuning.
- Hamel’s Latency Optimization Blog: Discussions on reducing overfitting and the effective use of retrieval-augmented generation (RAG) strategies highlighted practical guidance from ongoing fine-tuning experiments on platforms like Axolotl.

Innovations in Quantization and Performance Optimization:
- Tim Dettmers' Research on LLM.int8(): His work, highlighted by this blog, demonstrates how advanced quantization methods maintain transformer performance without degradation, revealing insights into emergent features and their implications.
- CUDA's Gradient Norm Bug Fixing: Solved issues like exploding gradients and batch size problems significantly improved training stability, as detailed in this PR.
- Optimized Memory Architecture in Axolotl: Sample packing efficiency improvements showed a 3-4% resource management gain during distributed training.

Open-Source Frameworks and Community Efforts:
- Axolotl's Latest Updates: The community discussed integrating observability into LLM applications and resolving cache and configuration issues to streamline workflows in fine-tuning models.
- PostgresML Integration with LlamaIndex: Andy Singal highlighted the synergy between PostgresML and LlamaIndex in efficiently leveraging AI for database management tasks.

Multimodal AI and New Model Developments:
- Phi-3 Model Excitement: Unsloth's Phi-3 models, touted for their longer context lengths and medium support, captured community interest with announcements of rapid optimization and integration.
- Mobius Model Anticipations: DataPlusEngine's upcoming release promises efficient base model creation, sparking debates on the implications for foundational diffusion models and their training methodologies.

Challenges in AI Ethics, Governance, and User Experience:
- SB-1047 Regulatory Concerns: Community outrage over the centralization of AI governance and comparisons to regulatory captures in other industries prompted heated discussions on the bill's impact on small developers.
- Ethical Use of AI in Communication Tools: Deployments of GPT-4 and Claude for workplace communication monitoring raised philosophical questions about embedding ethics into AI and their potential for reducing legal vulnerabilities, as highlighted in discussions regarding API integration and usage limits.

PART 1: High level Discord summaries
LLM Finetuning (Hamel + Dan) Discord
Fine-Tuning Facts: Discussion on fine-tuning in the general channel revealed a concern about semantic similarity overfitting due to biased data categories. A user struggled with understanding fine-tuning vis-à-vis user inputs and initial model training. Changes in the OpenAI platform's sidebars were also noted with the disappearance of two icons (threads and messages).
Templates Take the Spotlight: In workshop-1, the importance of configuring templates correctly during fine-tuning was highlighted. In particular, the delimiter ### aids in parsing different input sections, and "end of text" tokens indicate when to stop token generation.
Maven Mingles with Moderation: In asia-tz, a light-hearted exchange between members referenced a reunion. A request for a conference talk recording was met, with the video being available on Maven.
Modal Mobilization: Modal users in 🟩-modal shared excitement over received credits, training experiences, and provided specific links to Modal documentation and examples for new users. A plan to use Modal for a Kaggle competition was also shared, including setup and execution details.
Jarvis Jots Down Jupyter Jumble: In the jarvis-labs channel, members discussed storing a VSCode repo on Jarvis with a suggestion to use GitHub for saving work. There was a notice of spot instance removal due to instability. The cost and duration of fine-tuning the open-lama-3b model were shared, and a user resolved an Ampere series error by adjusting model parameters.
Hugging Face Huddles on Credits & Spanish Models: The hugging-face channel saw discussions about pending HF credits and models suitable for Spanish text generation—with Mistral 7B and Llama 3 models being recommended.
Credit Countdown Carries On in replicate, where an upcoming announcement related to credit management and distribution was teased.
Corbitt's Commandments Claim Clout: Enthusiastic attendees in the kylecorbitt_prompt_to_model channel discussed fine-tuning methods and techniques presented in Kyle Corbitt's talk, including Ten Commandments for Deploying Fine-Tuned Models.
Axolotl Answers the Call in workshop-2, where users discussed datasets, model training, and troubleshooting in Axolotl. A blog post on TinyLLama Fine-Tuning was shared, and there was a push for integrating observability into LLM applications.
Zoom Out, Discord In: Users from workshop-3 migrated their discussions to Discord after the Zoom chat was disabled.
Axolotl's Cache Conundrum Causes Confusion: Issues with cache in Axolotl frustrating users and confusion with missing files were resolved in axolotl. Discussions on sample packing and a guide on tokenizer gotchas addressed concerns around efficiency and tokenization.
Accelerate to Victory: zach-accelerate saw users work through confusion over float comparisons, resolve Jarvislab training command errors, and exchange resources for learning model acceleration with a focus on fine-tuning best practices.
Winging It with Axolotl: The wing-axolotl channel collaborated on dataset templates, pre-processing issues, Axolotl configurations, and provided a PR merge for the latest Axolotl updates. They delved into debugging tools and the significance of precise templates for training success.

HuggingFace Discord
Protein Data Visuals Reach New Heights: A new protein visualization project now sports 3D rendering and includes examples for human hemoglobin and ribosomal proteins, with the project details found on GitHub.
Enter the TranscriptZone with OpenAI's Whisper: A new transcription app that leverages OpenAI's Whisper to transcribe YouTube videos and more is available at Hugging Face Spaces.
Decentralizing the Web - More than a Dream?: A project building infrastructure for a decentralized internet sought community feedback through a survey, raising discussions about the ethics of data collection.
A Vision Transformers Query in Depth: A member sought resources on applying Vision Transformers (ViT) for monocular depth estimation, indicating an intent to develop a model using ViT, but no specific resources were provided in the discussion.
Quantisation Quandary for Mistral Model: The use of bitsandbytes for 8-bit quantisation on Mistral v0.3 Instruct led to slower performance compared to 4-bit and fp16, a baffling outcome that contradicts expected efficiency gains from reduced-bit computation.

Perplexity AI Discord

Perplexity Climbs Over ChatGPT in CSV Showdown: Engineers discussed that Perplexity AI outshines ChatGPT in CSV file processing by allowing direct CSV uploads. Also, Julius AI was recommended for data analysis, leveraging Python and integration with LLMs like Claude 3 or GPT-4.

Users Snub Claude 3 Opus: Claude 3 Opus is getting the cold shoulder due to increased content restrictions and perceived diminished utility, with GPT-4 posed as a preferable option despite limitations.

Querying Pro Search's True Upgrade: Upgrades to Pro Search raised eyebrows as users discussed whether new multi-step reasoning features and API specs were genuine backend improvements or merely surface-level UI enhancements.

API Integration Articulated: Dialogue around API integration for external tools with Claude generated interest along with sharing of custom function calls, serverless backends, and documentation like Tool Use with Claude.

Ethics in AI: More Than a Thought Experiment: Discourse on infusing GPTs with ethical monitoring capabilities sparked, casting light on potential applications in workplace communication and legal defensibility, albeit with philosophical wrinkles yet to be ironed out.

Stability.ai (Stable Diffusion) Discord

Speculation Peaks on RTX 5090's VRAM: There's buzzing debate over whether the rumored RTX 5090 with 32GB VRAM makes practical sense. References were made to potential specs and images on PC Games Hardware, but some members remained skeptical about its authenticity.

Stable Diffusion and the AMD Challenge: Users offered guidance on installing Stable Diffusion on an AMD 5700XT GPU, suggesting that starting with web services like Craiyon may circumvent potential compatibility issues.

Stable Diffusion 3: Trial Before Commitment: The community contrasted Stable Diffusion 3 with competitor Midjourney, highlighting that while a free trial is available for SD3, ongoing access would require a Stability membership.

Anticipation Builds Around Mobius Model: An announcement concerning DataPlusEngine’s novel Mobius model has garnered significant interest for its claim to create efficient base models. The model, teased on Twitter, is neither a straightforward base model nor a tuned version of something pre-existing.

32GB VRAM: Game Changer or Overkill?: The mention of a 32GB VRAM GPU led to conversations about the potential shift in Nvidia's approach to data center GPU sales, considering how products with substantial memory could impact the market demand for the H100/A100 series.

Unsloth AI (Daniel Han) Discord

PEFT Config Snag Solved: An issue where config.json was missing during PEFT training was resolved by copying it from the base model's configuration, with the user confirming success.

Llama Levitates Above Bugs: The Llama 3 model's base weights were described as "buggy," but Unsloth has implemented fixes. To improve training, the use of reserved tokens and updates to the tokenizer and lm_head are recommended.

System Prompt Boosts Llama 3: Incorporating a system prompt, even a blank one, was observed to enhance Llama3 finetuning outcomes.

Phi 3 Proliferation: Excitement bubbled as Phi 3 models debuted, sporting medium support. Community chatter pointed engineers toward extensive details in blog posts and release notes.

Stable Diffusion's Sinister Side Show: Creepy artifacts and uncanny voice cloning outputs from Stable Diffusion startled users, with discussions and experiences shared via YouTube videos and a Reddit thread.

VSCode Copilot Climbing Onboard: Recommendations for a local VSCode "copilot" were sought and met with suggestions and positive responses in the random channel.

Inference Inertia with Phi-3: Slower inference times using Unsloth Phi-3 puzzled one user, who provided a Colab notebook to investigate the lag, with community efforts yet to find a fix.

Quantization Quandary Unraveled: A member faced challenges quantizing a custom model, hitting walls with llama.cpp and Docker compatibility, sparking a discussion on solutions.

VRAM Verdict for Model Might: VRAM requirements were laid out: 12GB for Phi 3 mini is okay, but 16GB is a must for Phi 3 medium. For hefty tasks, considering outside computing resources was proposed.

Data Diligence for Training Consistency: The importance of using consistent datasets for training and evaluation was echoed, highlighting Unslothai's public datasets like the Blackhole Collection.

Platform Possibilities and Cautions: Queries regarding Unsloth support for older Macs were addressed, confirming a focus on CUDA and GPU usage, with suggestions for those on CPU-only rigs.

Enterprise Expertise Extension: A community member stepped forward to offer enterprise expertise to Unsloth, hailing the joining of accelerators at Build Club and Github, hinting at synergistic potential for Unsloth's endeavors.

Nous Research AI Discord
Intellectual Debate Ignites Over AI Understanding: In-depth discussions were had about the true understanding of concepts by LLMs, with interpretability research considered important empirical evidence. Skeptics argued that current efforts are lacking, with references to work by Anthropic on mapping large language model minds.
The Creature from the Llama Lagoon: A technical foray into enhancing Llama models centered around crafting a script that could manage function calls, with Hermes Pro 2's approach serving as inspiration. Another inquiry circled the implementation of Llama3 LoRA techniques on a 3080 GPU.
Reality Quest in Digital Dimensions: Spearheading a conversation on Nous and WorldSim, members explored the possible applications of NightCafe and multi-dimensional AR spaces in mapping complex AI worlds. Dream-like explorations in audio-visualizers and whimsical ASCII art representations highlighted creative uses for AI-driven simulations.
Sifting Through RAG Data: Advocation for models to integrate internal knowledge with Retrieval-Augmented Generation (RAG) was a hot topic, with questions raised about how to handle contradictions and resolve conflicts. Emphasizing user evaluations was seen as essential, particularly for complex query cases.
Precision over Pixie Dust in Fine-Tuning AI: The community's discourse featured a celebration of the Mobius model for its prowess in image generation, with anticipation for an open-sourced version and elucidating publications. Additionally, Hugging Face was mentioned for their PyTorchModelHubMixin enabling easier model sharing, though limited by a 50GB size constraint without sharding.

Eleuther Discord

JAX vs. PyTorch/XLA: The TPU Showdown: The performance comparison of JAX and PyTorch/XLA on TPUs spurred debate over benchmarking nuances such as warmup times and blocking factors. The dramatic decline in GPT-3 training costs from $4.5M to an estimated $125K-$1M by 2024 was highlighted, considering TFLOP rates and GPU-hour pricing from various contributors, linking to a Databricks Blog Post.

Scaling and Teaching LLMs: In the research forum, the Chameleon model was noted for its strong performance in multimodal tasks, while Bitune promised improvements in zero-shot performance for LLMs (Bitune Paper). Discussions questioned the scalability of the JEPA model for AGI and critiqued RoPE's context length limitations, referencing a relevant paper.

Emergent Features Puzzle LLM Enthusiasts: Tim Dettmers' research on advanced quantization methods maintaining performance in transformer inference was linked, including his concept of emergent outliers, and its integration with Hugging Face via the bitsandbytes library. Discourse on emergent features coalescing around ideas of them being the "DNA" of a model, driving discussions on its implications for phase transitions.

A Brief on Technical Tweaks & LM Evaluation: Within the lm-thunderdome, engineers covered practical tips for setting seeds in vllm models, retrieving the list of tasks with lm_eval --tasks list, and handling changes in BigBench task names that affect harnesses like Accelerate with memory issues. It was suggested to locate tasks by perusing the lm-eval/tasks folder for better organization.

A Call for Collaboration: An appeal was made for expanding the Open Empathic project, with a YouTube guide for contributing movie scenes and a link to the project shared. Further collaboration was encouraged, underlining the need for community efforts in enhancement.

LM Studio Discord
GPU Adventures: Engineers discussed challenges when loading small models onto GPUs, with some favoring models like llama3, mistral instruct, and cmdrib. Meanwhile, using lower quantizations, such as llamas q4, reportedly yielded better results than higher ones like q8 for certain applications, refuting the notion that "bigger is always better."
Next-Gen Models Incoming: An update in the model realm informed about the release of a 35B model, with testing to ensure LM Studio compatibility. Optimizations for different scales of models were a topic too, with a focus on Phi-3 small GGUFs and their efficiency.
Servers and Setups: Hardware discussions included leveraging distributed inference with llama.cpp and its recent RPC update, although quantized models aren't supported yet. Experimental builds using clustered cheap PCs with RTX 4060 Ti 16GB for distributed model setups and possible network constraints were also explored.
Multilingual Cohesion Achieved: Cohere models now extend their prowess to 23 languages, as advertised with aya-23 quants available for download, but ROCm users must await an update to dive in. 
Stable Diffusion Left Out: LM Studio clarified that it exclusively handles language models, excluding image generators like Stable Diffusion, alongside dealing with CUDA issues on older GPUs and promoting services like Julius AI to ease user experience woes.

CUDA MODE Discord

Gradient Norm Nuisance: Altering the batch size from 32 leads to a sudden spike in gradient norm, disrupting training. A pull request resolved this issue by preventing indexing overflow in the fused classifier.

Int4 and Uint4 Types Need Some TLC: A member flagged that many functions lack implementations for int4 and uint4 data types in PyTorch, with a discussion thread indicating limitations on type promotion and tensor operations.

Live Code Alert – Scan Algorithm in Spotlight: Izzat El Hajj will lead a live coding session on the Scan algorithm, vital for ML algorithms like Mamba, scheduled for <t:1716663600:F>, promising to be a technical deep dive for enthusiasts.

CUB Library Queries and CUDA Nuances: Members tapped into discussions ranging from the functioning of CUDA CUB library code to triggering tensor cores without cuBLAS or cuDNN, highlighting resources like NVIDIA's CUTLASS GitHub repository and the NVIDIA PTX manual.

FineWeb Dataset Conundrum: Processing the FineWeb dataset can be a storage hog, hitting 70 GB on disk and gobbling up to 64 GB of RAM, hinting at a need for better optimization or more robust hardware configurations for data processing tasks.

Modular (Mojo 🔥) Discord
Python Libraries Cling to C Over Mojo: There's a lively conversation about the feasibility and preparedness of porting Python libraries to Mojo, with concerns about pushing maintainers too hard given Mojo's evolving API. Members discussed whether targeting C libraries might be a more immediate and practical endeavor.
Rust's Security Appeal Doesn't Rust Mojo's Potential: Mojo is not slated to replace C, but the security benefits of Rust are influencing how engineers think about Mojo's application in different scenarios. Ongoing discussions address concepts from Rust that could benefit Mojo developments.
Blazing Ahead With Nightly Mojo: BlazeSeq performance on MacOS using Night versions of Mojo shows promising similarity to Rust's Needletail, fueling cross-platform efficiency discussions. Rapid nightly updates, noted in changelog, keep the community engaged with the evolving language.
Curiosity Sparks Over Modular Bot's Machinery: Queries were raised about the underlying tech of "ModularBot", and although no specific model was referenced, the bot shared a colorful reply. Separately, the potential for ML model training and inference within Mojo was discussed, with mention of Max Engine as a numpy alternative, though no full-fledged training framework is on the horizon.
Compile-Time Confusion and Alignment Woes: Problems from aligning boolean values in memory to compile-time function issues are causing a stir among users, with workarounds and official bug reports highlighting the importance of community-driven troubleshooting.

OpenAI Discord

LaTeX Loyalist LLM: In the realm of formatting, users noted frustration with GPT's strong inclination to default to LaTeX despite requests for Typst code, revealing preferences in coding syntax that the LLM seems to adhere to.

Microsoft Copilot+ vs. Leonardo Rivalry: Conversations in the community centered on the value of Microsoft Copilot+ PCs for creative tasks like "sketch to image," while some members encouraged checking out Leonardo.ai for analogous capabilities.

A Thirst for Efficiency in AI: Concern was voiced over the environmental toll of AI, citing a Gizmodo article on the substantial water usage during the training of AI models, prompting discussions on the need for more eco-friendly AI practices.

Iteration Over Innovation: There was active dialogue on enhancing the performance of LLMs through iterative refinement, with references to projects like AutoGPT addressing iterations, despite the associated higher costs.

Intelligence Infusion Offer Overstated?: The guild pondered the plausibility and potential of embedding legal knowledge within ChatGPT, enough to consider a valuation at $650 million, though detailed perspectives on this bold assertion were limited.

LangChain AI Discord
LangChain CSV Agent Deep Dive: Engineers explored LangChain's CSV agent within a SequentialChain and discussed how to customize output keys like csv_response. Challenges with SQL agents handling multi-table queries were mentioned, pointing towards token limits and LLM compatibility issues, with direction to GitHub for issues.
AI Showcases Gather Buzz: OranAITech tweeted their latest AI tech, while everything-ai v2.0.0 announced features including audio and video processing capabilities with a repository and documentation available.
Demystifying VisualAgents: Demonstrations of Visual Agents platform were shared via YouTube, revealing its potential to streamline SQL agent creation and building simple retrieval systems without coding, utilizing LangChain's capabilities. Two specific videos showcased their workflows: SQL Agent and Simple Retrieval.
EDA GPT Impressions On Display: A demonstration of EDA GPT, including a five-minute overview video showcasing its various functions, was linked to via LOVO AI. The demo highlights the AI tool's versatility.
Tutorial Teaser: A message in the tutorials channel provided a YouTube link to business24.ai's content, although the context of its relevance was not disclosed.

LAION Discord

Piracy's Not the Panacea: Despite a humorous suggestion that The Pirate Bay could become a haven for sharing AI model weights, skepticism among members arises, highlighting the potential for friendlier AI policy landscapes in other nations to prevail instead.

Japan Takes the AI High Road: Participants noted Japan's encouraging position on AI development, referencing a paper shared via a tweet about creating new base diffusion models without the need for extensive pretraining, showcasing a strategy involving temporary disruption of model associations. 

Poisoned Recovery Protocols Probed: A collaborative study, involving a poisoned model recovery method conducted by fal.ai, was mentioned, with findings expected to empirically substantiate the recovery approach. Reservations were expressed regarding the aesthetics of AI-generated imagery, specifically the "high contrast look" and artifacts presented by models like Mobius versus predecessors such as MJv6.

Claude Mappings Crack the Code: Anthropic's research paper details the dissection of Claude 3 Sonnet's neural landscape, which illustrates the manipulation of conceptual activations and can be read at their research page. Debates sparked over the potential commercialization of such activations, with a juxtaposed fear of the commercial implications driving AI practitioners to frustration.

A Nostalgic Look at AI's Visual Visions: A member reminisced about the evolution from early AI visual models like Inception v1 to today's sophisticated systems, recognizing DeepDream’s role in understanding neural functionality. Furthermore, the benefits of sparsity in neural networks were discussed, describing the use of L1 norm for sparsity and a typical 300 non-zero dimensions in high-dimensional layers.

LlamaIndex Discord

Meetup Alert: Limited Seats Available: Few spots remain for the upcoming LlamaIndex meetup scheduled for Tuesday, with enthusiasts encouraged to claim their spots quickly due to limited availability.

MultiOn Meets LlamaIndex for Task Automation: LlamaIndex has been coupled with MultiOn, an AI agents platform, facilitating task automation through a Chrome web browser acting on behalf of users; view the demo here.

RAGApp Launches for Code-Free RAG Chatbot Setup: The newly introduced RAGApp simplifies the deployment of RAG chatbots via a docker container, making it easily deployable on any cloud infrastructure, and it's open-source; configure your model provider here.

Solving PDF Parsing Puzzles: The community endorses LlamaParse as a viable API for extracting data from PDFs, especially from tables and fields, leveraging the GPT-4o model for enhanced performance; challenges with Knowledge Graph Indexing were also a topic, highlighting the need for both manual and automated (through VectorStoreIndex) strategies.

PostgresML Joins Forces with LlamaIndex: Andy Singal shared insights on integrating PostgresML with LlamaIndex, detailing the collaboration in a Medium article, "Unleashing the Power of PostgresML with LlamaIndex Integration", receiving positive remarks from the community.

OpenRouter (Alex Atallah) Discord

Phi-3 Medium 128k Instruct Drops: OpenRouter unveiled Phi-3 Medium 128k Instruct, a powerful 14-billion parameter model, and invited users to review both the standard and free variants, and to participate in discussions on its effectiveness.

Wizard Model Gets a Magic Boost: The Wizard model has shown improvements, exhibiting more prompt and imaginative responses, yet attention is required to avoid repeated paragraphs.

Eyes on Phi-3 Vision and CogVLM2: Enthusiasm surges around Phi-3 Vision, with sharing of testing links like Phi-3 Vision, and suggestions to use CogVLM2 for vision-centric tasks found at CogVLM-CogAgent.

Automatic Llama 3 Prompt Transformation: It was clarified that prompts to Llama 3 models are automatically transformed through OpenRouter's API, streamlining the process, but manual prompting remains as an alternative approach.

Gemini API Annoyances: Users reported issues with Gemini FLASH API, such as empty outputs and token drain, recognized as a model-centric problem. The emergence of Google's daily API usage limits has piqued interest in how this might affect OpenRouter's Gemini integration.

Latent Space Discord

Indexify Ignites Interest: The launch of Indexify, an open-source real-time data framework by Tensorlake, sparked discussions focusing on its "streaming ETL" capabilities and the challenges in creating sustainable open-source models. Concerns were raised about the adequacy of the extractors provided and their potential paths to monetization.

LLM Evaluation under the Microscope: A Hugging Face blog post about Large Language Model (LLM) evaluation practices, the importance of leaderboards, and meticulous non-regression testing caught the attention of members, emphasizing the critical role of such evaluations in AI developments.

AI's Answer to Search Engine Manipulations: An incident involving website poisoning affecting Google's AI-gathered overviews triggered discussions around security and data integrity, including workarounds through custom search engine browser bypasses as reported in a tweet by Mark Riedl.

AI Democratizing Development or Raising Reliability Questions?: GitHub CEO Thomas Dohmke's TED Talk on AI's role in simplifying coding provoked debates over its reliability despite AI-driven UX improvements that expedite problem-solving in the coding process.

Diversity Scholarships to Bridge Gaps: Engineers from diverse backgrounds who face financial barriers to attending the upcoming AI Engineer World's Fair received a boost with the announcement of diversity scholarships. Interested applicants should furnish concise responses to the essay questions provided in the application form.

Interconnects (Nathan Lambert) Discord

Tax Tales Without Plastic: Nathan Lambert deciphered an invoice kerfuffle, realizing the rational behind tax billing sans credit card due to resale certificates.

Golden Gate AI Gets Attention: Experimentation by Anthropic AI led to "Golden Gate Claude," an AI single-mindedly trained on the Golden Gate Bridge, creating buzz for its public interactivity at claude.ai.

Google's AI Missteps: Google's failure to harness feedback and premature deployment of AI models spurred discussion about the tech giant's public relations challenges and product development woes.

Battling Dataset Misconceptions: Google's AI team countered claims about using the LAION-5B dataset by putting forth that they utilize superior in-house datasets, as referenced in a recent tweet.

Nathan Shares Knowledge Nuggets: For AI aficionados, Nathan Lambert uploaded advanced CS224N lecture slides. Additionally, attendees were tipped off about an upcoming session recording, sans release date details.

OpenAccess AI Collective (axolotl) Discord

GQA Gains Traction in CMDR Models: Discussions revealed that Grouped Query Attention (GQA) is present in the "cmdr+" models but not in the basic "cmdr" models, indicating an important distinction in their specifications.
VRAM Efficiency with Smart Attention: Engineers noted that while GQA doesn't offer linear scaling, it represents an improved scaling method compared to exponential, affecting VRAM usage favorably.
Sample Packing Gets a Boost: A new GitHub pull request showcases a 3-4% efficiency improvement in sample packing, promising better resource management for distributed contexts, linked here.
Academic Achievement Acknowledged: A member's co-authored journal article has been published in the Journal of the American Medical Informatics Association, highlighting the impact of high-quality, mixed-domain data on medical language models, with the article available here.
Community Cheers Scholarly Success: The community showed support for the peer's published work through personal congratulatory messages, fostering a culture of recognition for academic contributions within the AI field.

OpenInterpreter Discord
SB-1047 Sparks Technical Turmoil: Engineers express deep concerns about the implications of SB-1047, dubbing it as detrimental to smaller AI players and likening the situation to regulatory capture observed in other industries.
Perplexity and Arc, Tools of the Trade Showcased: The community spotlighted tools aiding their workflows, sharing a Perplexity AI search on SB-1047 and the new “Call Arc” feature of Arc Browser, which simplifies finding relevant answers online, with an informational link.
Install Issues Incite Inquiry: Users face issues with Typer library installation via pip, raising questions about whether steps in the setup process, such as poetry install before poetry run, were followed or if a virtual environment is being used.

Mozilla AI Discord
Twinny Takes Off as Virtual Co-Pilot: Developers are integrating Twinny with LM Studio to serve as a robust local AI code completion tool, with support for multiple llamafiles running on different ports.
Embedding Endpoint Enlightenment: The /v1/embeddings endpoint was clarified not to support image_data; instead, the /embedding endpoint should be used for images, as per pull request #4681.
Mac M2 Meets Its Match in continue.dev: A performance observation noted that continue.dev runs slower on a Mac M2 compared to an older Nvidia GPU when executed with llamafile.
Hugging Your Own LLMs: For those looking to build and train custom LLMs, the community recommended the use of HuggingFace Transformers for training, with the reminder that llamafile is designed for inference, not training.

Cohere Discord

Gratitude Echoes in the Server: A user expressed heartfelt thanks to the team, showcasing user appreciation for support or development work done by the team.
Curiosity About Upscaled Models: There's buzz around whether a 104B version of a model will join the family tree, but no clear answers have been outlined yet.
Langchain Links Missing: Questions arose regarding the integration of Langchain with Cohere, with users seeking guidance on its current usability and implementation status.
Model Size Mysteries: Users are probing for clarity on whether the Aya model in the playground pertains to the 8B or 35B version, indicating importance in understanding model scales for application.
Error Troubleshooting Corner: Issues like a ValidationError with ContextualCompressionRetriever and a 403 Forbidden error signal active debugging and technical problem-solving among the engineers, serving as reminders of common challenges in AI development.

AI Stack Devs (Yoko Li) Discord
AI Comedy Night Hits the Right Notes: An AI-generated standup comedy piece shared by a user was met with positive surprise, indicating advancements in AI's capability to mimic humor and perform entertainment.
Exploratory Queries on AI Applications: Curiosity about the extent of Ud.io's functions was evident from a user's query whether its capabilities go beyond generating comedy.
Sound Transformations Showcased: A user displayed the flexible audio alteration features of Suno by sharing an altered, demonic version of an original sound piece.
Eagerness for Audio Engineering Know-How: Interest was expressed in acquiring the skills to craft audio modifications like the ones demonstrated, a skill set valuable for an AI engineer with an interest in sound manipulation.
Concise Communication Preferred: A one-word reply "No" to a question highlighted a preference for succinct responses, perhaps reflecting an engineer's desire for direct, no-nonsense communication.

MLOps @Chipro Discord

In Search of a Unified Event Tracker: A member has highlighted a pressing need for an event calendar compatible with Google Calendar to ensure no community events are overlooked. The absence of such a system is a noted concern within the community.

DiscoResearch Discord

New Dataset Announcement: A new dataset has been referenced by user datarevised, with a link to further details: DataPlusEngine Tweet.

The tinygrad (George Hotz) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The LLM Perf Enthusiasts AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Datasette - LLM (@SimonW) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The YAIG (a16z Infra) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

PART 2: Detailed by-Channel summaries and links
LLM Finetuning (Hamel + Dan) ▷ #general (74 messages🔥🔥):

Semantic similarity overfitting concern: A member pondered if over-represented response categories in data, despite no particular response being over-represented, could lead to bias. They referenced their prior experience in Research Psychology checking for such issues.
Fine-tuning model confusion: A user struggled with understanding how much fine-tuning incorporates specific user inputs into a model compared to pre-training. They seek clarity on differences between pre-training, curriculum training, and fine-tuning.
OpenAI platform sidebars change: Some participants discussed changes in the OpenAI platform's sidebars, mentioning that two icons disappeared (one for threads and another for messages).
Rasa and conversational complexity: A participant shared insights into Rasa's approach to conversational AI, emphasizing the difficulty of creating intent classifiers due to complex conversations. They mentioned that treating intents as entities may reduce complexity.
Kyle Corbitt's conference talk recording available: The recording of Kyle Corbitt's conference talk is now available on the Maven portal, with specific links shared within the discussion.

Links mentioned:

Quantization Overview: Tests How does quantisation affect model output? - 15 basic tests on different quant levels A detailed comparison between GPTQ, AWQ, EXL2, q4_K_M, q4_K_S, and load_in_4bit: perplexity, VRAM, speed, mo...
Hamel’s Blog - Optimizing latency: An exploration of ways to optimize on latency.
no title found: no description found
Food Good GIF - Food Good Hungry - Discover & Share GIFs: Click to view the GIF
Rasa Algorithm Whiteboard - TED in Practice: In this video we'll explore how TED works in practice. We'll build a digital assistant that needs to count down and we'll see that the hyperparameters really...
Rasa Algorithm Whiteboard - TED Policy: When you're making a digital assistant you'll need more than just algorithms to deal with text. You'll also need algorithms that deal with sequences of dialo...
Issues · ggerganov/llama.cpp: LLM inference in C/C++. Contribute to ggerganov/llama.cpp development by creating an account on GitHub.
Reddit - Dive into anything: no description found
Video Conferencing, Web Conferencing, Webinars, Screen Sharing: Zoom is the leader in modern enterprise video communications, with an easy, reliable cloud platform for video and audio conferencing, chat, and webinars across mobile, desktop, and room systems. Zoom ...

LLM Finetuning (Hamel + Dan) ▷ #workshop-1 (23 messages🔥):

LLM Finetuning and ### usage clarifications: Discussed the use of ### in fine-tuning LLMs for sequence generation, noting that it helps the model understand different parts of the input during inference. Appropriately configuring templates during fine-tuning is necessary, including other structures like ChatML.

Template requirements explained: Emphasized that inputs during inference need to match the template used during fine-tuning, not necessarily ### but whatever was set (e.g., Llama 2 chat template). Model hosting services typically manage this templating and structure.

Model behavior with and without delimiters: Delimiters can help a model understand distinct sections of input like changing POVs in Reddit; otherwise unnecessary for general stylistic adaptations. Terminating delimiters or tokens ensure models correctly parse and end responses.

End of text token usage: The concept of an "end of text" token was briefly mentioned as a mechanism for instructing the model to stop generating tokens, indicating efficient input and output management for LLMs.

Homework assignments on use cases for LLMs: Members shared and discussed homework projects applying LLMs to tasks like generating recipes and learning apps. Projects emphasized prompt engineering and retrieval-augmented generation (RAG) techniques among others. Links to resources and shared homework details here.

Links mentioned:

no title found: no description found
Llama 2 Prompt Template: What’s the prompt template best practice for prompting the Llama 2 chat models?

LLM Finetuning (Hamel + Dan) ▷ #asia-tz (8 messages🔥):

Reka.ai Jokes About Reunion: A member humorously commented on seeing another member after a long time, joking, "You're being kind! I was starting to think I'd never see the light of day again after fast.ai." They inquired about how they have been and what they're currently building.
Conference Recording Request Fulfilled: A member asked for a recording of the "Conference Talk: From prompt to model," which occurred at 4:30 AM IST. The request was answered affirmatively as the recording is now available on Maven.

LLM Finetuning (Hamel + Dan) ▷ #🟩-modal (18 messages🔥):

Modal Credits Received with Enthusiasm: Multiple users confirmed receiving credits from Modal and expressed eagerness to start fine-tuning models. One user said, "Time to hack something.".
Curiosity about Using Modal for Pure PyTorch Code: A user asked about utilizing Modal for fine-tuning LLMs with pure PyTorch code, comparing it to using Jarvis Labs. Another user confirmed it's possible, sharing their experience training SentenceTransformer models with Modal.
Dataset Management in Modal: Discussion included how to upload datasets and use them within Modal, with detailed code examples and steps provided. Steven Merrill walked through setting up a Parquet file, building volumes, and annotating functions with GPU metadata.
Modal Documentation and Examples: Users shared useful links to Modal documentation and examples, including volumes documentation and a TensorFlow tutorial, which could be adapted for PyTorch.
Using Modal for Kaggle Competitions: One user planned to leverage Modal for a Kaggle competition, involving downloading data, library installations, fine-tuning, and saving models/logs. Another mentioned running Jupyter servers on Modal for up to 24 hours, sharing a link to the Jupyter inside Modal example.

Links mentioned:

modal-examples/06_gpu_and_ml/tensorflow/tensorflow_tutorial.py at main · modal-labs/modal-examples: Examples of programs built using Modal. Contribute to modal-labs/modal-examples development by creating an account on GitHub.
Volumes: The modal.Volume is a mutable volume built for high-performance file serving. Like the modal.NetworkFileSystem, these volumes can be simultaneously attached to multiple Modal functions, supporting con...
modal-examples/11_notebooks/jupyter_inside_modal.py at 0ca5778741d23a8c0b81ae78c9fb8cb6e9f9ac9e · modal-labs/modal-examples: Examples of programs built using Modal. Contribute to modal-labs/modal-examples development by creating an account on GitHub.

LLM Finetuning (Hamel + Dan) ▷ #jarvis-labs (16 messages🔥):

Saving VSCode repo on Jarvis: A member inquired about saving their repo on the VSCode instance on Jarvis without pausing it to save credits. Another suggested publishing the code to GitHub and cloning it back as needed, while paused instances only charge for storage, which is minimal.
Removal of spot instances: The platform temporarily removed spot instances due to instability and low utilization issues.
Fine-tuning open-lama-3b cost and duration: Fine-tuning the open-lama-3b on gpt4-LLM-cleaned data took 3 hours 44 minutes on an RTX6000Ada, costing roughly $4. A discussion followed about the small size of LORA weights likely explaining the apparent instant upload to Huggingface.
Ampere series error with Axolotl: A user encountered an error with preprocessing on an A6000, which was resolved by changing bf16 to false and fp16 to true.
Course signup credits issue: A user reported not receiving credits after signing up for a course and joining Jarvis; the admin responded that new lists are processed, and credits will be added once the user's information is received.

LLM Finetuning (Hamel + Dan) ▷ #hugging-face (9 messages🔥):

HF credits to be distributed soon: Members inquired about the process for obtaining HF credits. Details will be announced soon by email, and credits will be granted to attendees who fill out a form being sent over the weekend.
Best model for Spanish text generation: A member asked for recommendations on models for fine-tuning specifically for Spanish text generation tasks. Mistral 7B was suggested as a fluent option, and Llama 3 was mentioned as another model yielding solid results despite not being officially multilingual.

LLM Finetuning (Hamel + Dan) ▷ #replicate (1 messages):

Upcoming Announcement on Credits: An announcement regarding the management and distribution of credits will be made soon. "<@739531318571958272> is going to be running these credits but we are making an announcement soon about them".

LLM Finetuning (Hamel + Dan) ▷ #kylecorbitt_prompt_to_model (164 messages🔥🔥):

High Expectations for the Talk: Members expressed excitement about the talk despite time zone challenges, with a call for recording it. *"I really want to see this but can't make it 😦 will it be recorded?"*
Link Overflow: Multiple links were shared including Hamel's [LLM inference notes](https://hamel.dev/notes/llm/inference/03_inference.html), [Argilla](https://argilla.io/), and the [MTEB Benchmark](https://huggingface.co/spaces/mteb/leaderboard). A significant number of resources were gathered from the talk.
Interactive and Humorous Session: Members appreciated the interactive vibe with humorous exchanges about fine-tuning and sleep schedules. *"Fine-tuning is not only expensive in GPU compute terms, but also affecting our sleep schedules!"*
Discussing Efficient Fine-Tuning Techniques: Various fine-tuning methods such as DoRA, MoRA, and LoRA were discussed, with linked articles like [Answer.AI's efficient fine-tuning](https://www.answer.ai/posts/2024-04-26-fsdp-qdora-llama3.html). Exploration of context extension techniques like RoPE for models was also mentioned.
Commandments for Fine-Tuning: The "Ten Commandments" for deploying fine-tuned models were discussed with a link to the [slides](https://docs.google.com/presentation/d/1IIRrTED0w716OsU_-PL5bONL0Pq_7E8alewvcJO1BCE/edit#slide=id.g2721fb6713e_0_67). Members found the content very practical and beneficial for their work.

Links mentioned:

The platform where experts improve AI models: Argilla is a collaboration platform for AI engineers and domain experts that strive for quality, ownership, and efficiency.
Hamel’s Blog - Optimizing latency: An exploration of ways to optimize on latency.
MTEB Leaderboard - a Hugging Face Space by mteb: no description found
Answer.AI - Efficient finetuning of Llama 3 with FSDP QDoRA: We’re releasing FSDP QDoRA, a scalable and memory-efficient method to close the gap between parameter efficient finetuning and full finetuning.
Tweet from undefined: no description found
nomic-ai/nomic-bert-2048 · Hugging Face: no description found
🏷 Labelling: When labelling, we generally differentiate between manual labelling and co-operative or programmatic labelling. During co-operative labelling, we use external input like rules and inference predict...
Ten Commandments to Deploy Fine-Tuned Models in Prod: Ten Commandments To deploy fine-tuned models in prod Kyle Corbitt | @corbtt
OpenPipe: Fine-Tuning for Developers: Convert expensive LLM prompts into fast, cheap fine-tuned models.
no title found: no description found

LLM Finetuning (Hamel + Dan) ▷ #workshop-2 (117 messages🔥🔥):

Sharing the Jarvis Repo Link: A link to nisargvp's Jarvis repository on Hugging Face was shared along with a config file for setting up the model in Axolotl.
Guide for Running Models on Modal: Users discussed running model training smoothly on Modal, pointing out a quickstart guide from Modal Labs and mentioned seamless operations after initial fixes.
TinyLLama Fine-Tuning Blog Post: The blog post documenting the fine-tuning process of TinyLLama on the alpaca_2k_test dataset using Axolotl and Jarvis, which can be found here, was shared and appreciated by the community.
Observability in LLM Applications: Discussions revolved around incorporating observability into LLM applications to collect user feedback and LLM input/output pairs, highlighting the need for better tracking methods.
Modal Training Error Support: Users encountered and resolved issues during Mistral model training using the Modal Labs repo, with community members offering troubleshooting advice and sharing specific error details to diagnose configuration problems.

Links mentioned:

Axolotl - Instruction Tuning: no description found
Lucas van Walstijn - LLM fine-tuning 101: no description found
venetispall: Weights & Biases, developer tools for machine learning
🤗 PEFT welcomes new merging methods: no description found
LMSYS - Chatbot Arena Human Preference Predictions | Kaggle: no description found
GitHub - modal-labs/llm-finetuning: Guide for fine-tuning Llama/Mistral/CodeLlama models and more: Guide for fine-tuning Llama/Mistral/CodeLlama models and more - modal-labs/llm-finetuning
Things I’m Learning While Training SuperHOT: pages
Axolotl - Dataset Formats: no description found
nisargvp/hc-mistral-alpaca · Hugging Face: no description found
no title found: no description found
Tweet from Daniel Han (@danielhanchen): @TheZachMueller @Prince_Canuma @UnslothAI If you're not using the untrained tokens, it should be OK :) Just sometimes people use the llama-3 template + llama-3 base model, and bad results come abo...
Lawrence Wu - Finetuning LLMs with Axolotl: no description found
GitHub - modal-labs/llm-finetuning: Guide for fine-tuning Llama/Mistral/CodeLlama models and more: Guide for fine-tuning Llama/Mistral/CodeLlama models and more - modal-labs/llm-finetuning
GitHub - modal-labs/llm-finetuning: Guide for fine-tuning Llama/Mistral/CodeLlama models and more: Guide for fine-tuning Llama/Mistral/CodeLlama models and more - modal-labs/llm-finetuning
llm-finetuning/data/modal_docs.jsonl at main · modal-labs/llm-finetuning: Guide for fine-tuning Llama/Mistral/CodeLlama models and more - modal-labs/llm-finetuning

LLM Finetuning (Hamel + Dan) ▷ #workshop-3 (3 messages):

Zoom chat confusion leads to Discord: Members were unsure where to continue their conversation after the Zoom chat was disabled. One member suggested moving their discussion to a specific Discord channel, which made sense to others.

LLM Finetuning (Hamel + Dan) ▷ #axolotl (32 messages🔥):

Cache Issue in Axolotl Frustrates User: A member noted that when re-running experiments in Axolotl, an unexpected cache used old data samples, which is documented here. Renaming the dataset file resolved this, prompting another user to suggest running the pre-process step explicitly.

Confusion with Missing Files: Users encountered issues like missing simple.yml or qlora.yml files while running training commands on Jarvislabs and Google Colab, leading to unsuccessful executions. A member shared that their qlora run took around 6 hours on 2x4090s GPUs, confirming the significance of using the correct files and configurations.

Inquiries About Sample Packing: One member asked if sample packing in Axolotl concatenates multiple dataset rows to fill the max sequence length. Another member confirmed this, explaining that although they are concatenated, the attention is set so that rows don't attend to one another.

RuntimeError with BFloat16 in Google Colab: A RuntimeError related to BFloat16 not implemented for BFloat16 on T4 GPU led a user to switch from Google Colab to Jarvis-labs. They were advised to check PyTorch and CUDA versions, with a switch to the example configuration solving the issue.

Guide on Tokenizer Gotchas Shared: A user shared a link to Hamel's notes on tokenizer gotchas, addressing intricacies in prompt construction and behavioral differences between training and inference due to tokenization handling.

Links mentioned:

Hamel’s Blog - Tokenization Gotchas: Footguns with tokenizers and inferencing LLMs
axolotl/docs/dataset_preprocessing.qmd at main · OpenAccess-AI-Collective/axolotl: Go ahead and axolotl questions. Contribute to OpenAccess-AI-Collective/axolotl development by creating an account on GitHub.
axolotl/examples/tiny-llama/qlora.yml at main · OpenAccess-AI-Collective/axolotl: Go ahead and axolotl questions. Contribute to OpenAccess-AI-Collective/axolotl development by creating an account on GitHub.
Google Colab: no description found
axolotl/examples/colab-notebooks/colab-axolotl-example.ipynb at main · OpenAccess-AI-Collective/axolotl: Go ahead and axolotl questions. Contribute to OpenAccess-AI-Collective/axolotl development by creating an account on GitHub.
Lawrence Wu - Finetuning LLMs with Axolotl: no description found

LLM Finetuning (Hamel + Dan) ▷ #zach-accelerate (118 messages🔥🔥):

User confusion over float16 and float32: There was a question about why float16 numbers appear higher than float32 in a displayed table. A link to a past discussion on the topic was provided to clarify the confusion. 
Configuration issues with Jarvislab resolved: User encountered an error with the Jarvislab training command regarding a missing configuration file. Another user advised changing the command to use accelerate launch -m axolotl.cli.train hc.yml, which resolved the issue.

Optimizing Axolotl runs on different GPUs: A member requested advice on adjusting accelerate configs for optimized axolotl runs on varied GPUs. It was suggested to map configs back to the axolotl yaml, avoiding direct acceleration config settings.

Resources for learning model Accelerate: Users discussed how to get started with Accelerate for finetuning tasks, with advice to stick with higher-level abstractions like axolotl for simplicity and learning depth.

Hyperparameters and Inference precision: Inquiry on optimal learning rates for extended vs. undertrained models and issues with BF16 precision in T4 GPUs. Suggestions included asking in Zoom QA for hardware-compatible solutions or transforming weights for supported datatypes.

Links mentioned:

Tweet from kache (@yacineMTB): three mac studios specc'd to the teeth, 7.5k cad each with 192gb unified memory 192 * 3 -> 576gb of "vram" plenty of cpu to go around to power regular server stuff. two could pretty muc...
no title found: no description found
Extended Guide: Instruction-tune Llama 2: This blog post is an extended guide on instruction-tuning Llama 2 from Meta AI
S-LoRA: Serving Thousands of Concurrent LoRA Adapters: The "pretrain-then-finetune" paradigm is commonly adopted in the deployment of large language models. Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is often employed to...
Templates for Chat Models: no description found
Tweet from David Golchinfar (@DavidGFar): Hi, everyone!  @FernandoNetoAi , I, @LucasAtkins7, and @erhartford have another surprise for you following the official Kraken release.  We're excited to introduce Kraken-LoRA, sponsored by @Hyper...
The Best GPUs for Deep Learning in 2023 — An In-depth Analysis: Here, I provide an in-depth analysis of GPUs for deep learning/machine learning and explain what is the best GPU for your use-case and budget.
Quicktour: no description found
GitHub - SkunkworksAI/hydra-moe: Contribute to SkunkworksAI/hydra-moe development by creating an account on GitHub.
Tweet from undefined: no description found

LLM Finetuning (Hamel + Dan) ▷ #wing-axolotl (192 messages🔥🔥):

PR for latest axolotl and llama 3 demo merged: The Modal LLM fine-tuning repository now includes the latest axolotl updates and a llama 3 fine-tuning demo.
Seeking dataset templates and pre-processing issues: Members inquire about chatml.intel dataset templates and encounter issues during pre-processing, particularly with decoding due to dataset structure lacking numeric IDs. Reference: Axolotl Docs.
Clarifications on Axolotl configurations: Discussions reveal that default config values like load_in_8bit and load_in_4bit are set to False if not specified, with recommendations to inspect code directly for clarification.
Template-free prompt construction confusion: A member found the documentation on template-free prompt construction confusing, while others clarify the importance of template correctness.
Office Hours Q&A highlights debugging and stack insights: Members express the importance of debugging tools for understanding inputs and samples during training, advocate for rigorous template validation, and suggest callback functions for logging model predictions, referencing Axolotl Callbacks.

Links mentioned:

axolotl/src/axolotl/utils/callbacks/__init__.py at main · OpenAccess-AI-Collective/axolotl: Go ahead and axolotl questions. Contribute to OpenAccess-AI-Collective/axolotl development by creating an account on GitHub.
GitHub - h2oai/h2o-llmstudio: H2O LLM Studio - a framework and no-code GUI for fine-tuning LLMs. Documentation: https://h2oai.github.io/h2o-llmstudio/: H2O LLM Studio - a framework and no-code GUI for fine-tuning LLMs. Documentation: https://h2oai.github.io/h2o-llmstudio/ - h2oai/h2o-llmstudio
Extended Guide: Instruction-tune Llama 2: This blog post is an extended guide on instruction-tuning Llama 2 from Meta AI
GAIR/lima · Datasets at Hugging Face: no description found
gist:e1591b83e3b290fb176e780e7ce7d383: GitHub Gist: instantly share code, notes, and snippets.
Thread of Questions for Wing - Office Hours: OH Questions for Wing (Axolotl)   Ben Eyal       9:59 AM I was wondering about Template-free prompt construction, I really didn't understand how it works. The config only needs an output, and the ...
Thread of Questions for Wing - Office Hours: OH Questions for Wing (Axolotl)   Ben Eyal       9:59 AM I was wondering about Template-free prompt construction, I really didn't understand how it works. The config only needs an output, and the ...
axolotl/docs/rlhf.qmd at main · OpenAccess-AI-Collective/axolotl: Go ahead and axolotl questions. Contribute to OpenAccess-AI-Collective/axolotl development by creating an account on GitHub.
GitHub - grgalex/nvshare: Practical GPU Sharing Without Memory Size Constraints: Practical GPU Sharing Without Memory Size Constraints - grgalex/nvshare
axolotl/src/axolotl/utils/data/sft.py at main · OpenAccess-AI-Collective/axolotl: Go ahead and axolotl questions. Contribute to OpenAccess-AI-Collective/axolotl development by creating an account on GitHub.
no title found: no description found
no title found: no description found
H2O.ai: Fast Scalable Machine Learning For Smarter Applications - H2O.ai
Trust No One Crazy Chris GIF - Trust No One Crazy Chris Henry Thomas - Discover & Share GIFs: Click to view the GIF

HuggingFace ▷ #announcements (1 messages):

Visualize Proteins with Proteinviz: Check out Proteinviz for creating custom visuals of proteins. This tool is made by a dedicated community member.

Speedy SDXL Results: The SDXL flash space delivers impressive results fast. Credit goes to the creator for this efficient build.

Custom Tokenizers Inspired by Karpathy: A community member shared their custom tokenizer, which is inspired by Karpathy’s work. This highlights ongoing innovations within the community.

Mistral-7B v0.3 Demo: Experience rapid performance with the Mistral-7B v0.3 chat demo. It's another example of cutting-edge developments by active contributors.

Create Transparent Images with Diffusers: Generate transparent images using Diffusers, a project facilitated by another community member. This feature allows for creative visual outputs using advanced diffusing techniques.

Links mentioned:

Agentic AI Solutions / Adaptive AI Solutions - Episode 1:  CrewAI With Preston McCauley: In Episode 1, we explore a brief introduction to #AdaptiveAI and #Agentic AI approaches.https://www.linkedin.com/in/preston-mccauley-immersive-ux/Join Presto...
What is an Instruction Tuned Model?: What is Instruction Tuning?  What are Instruction Tuned models? What is a Pretrained Model? How can I make my Large Language Model follow Instructions?These ...

HuggingFace ▷ #general (490 messages🔥🔥🔥):

AutoTrain Data Formatting Questions: Members discussed how to format data for finetuning in AutoTrain, with suggestions to reference the AutoTrain documentation. Example CSV formats and nuances of input data types were shared, enhancing clarity on setup.
Advanced LLM Fine-Tuning: The difference between DPO and RHLF methods for fine-tuning LLMs was highlighted, suggesting SFT followed by RHLF for teaching text-completion models conversational norms. Links to specific datasets and finer model adjustments were also shared.
Pandora Model Excitement: Details about the Pandora model, a new open-source text-to-video model, were shared along with a preview link. Discussions on its smartness and potential applications created significant excitement among members.
Mobius Model Controversy: The upcoming Mobius diffusion model faced scrutiny with comments about controlled quality and composition training. Resulting discussions emphasized its potential to significantly reduce the cost and complexity of developing new diffusion models.
Learning and Development Resources: Several members including @temeretam discussed educational and professional paths for advancing in AI, while others sought advice on specific coding and data handling problems, referencing both GitHub and Hugging Face documentation links for technical support.

Links mentioned:

maitrix-org/Pandora · Hugging Face: no description found
Download files from the Hub: no description found
Babuin GIF - Babuin - Discover & Share GIFs: Click to view the GIF
Que GIF - Que - Discover & Share GIFs: Click to view the GIF
imgsys.org | an image model arena by fal.ai: A generative AI arena where you can test different prompts and pick the results you like the most. Check-out the model rankings and try it yourself!
Templates for Chat Models: no description found
Tweet from DataVoid e/acc (@DataPlusEngine): Our upcoming paper outlines and enables making entirely new base diffusion models without the need to extensively pretrain a new model from scratch. We can in a controlled way, break all the quality a...
nroggendorff/mayo · Datasets at Hugging Face: no description found
Rabbit Gaslit Me, So I Dug Deeper: Is the LAM a Scam? Down the rabbit hole we go.Support Investigative Journalism: ► Patreon: https://patreon.com/coffeezillaPeople who helped this investigatio...
Tweet from DataVoid e/acc (@DataPlusEngine): We gave the @FAL  early access to the upcoming Mobius model and its only been up on http://imgsys.org for 3 hours. its already the best stable diffusion based image model in the world based on human p...
mistralai/Mixtral-8x7B-v0.1 at main: no description found
Frank Castle Wait GIF - Frank Castle Wait Please Stop - Discover & Share GIFs: Click to view the GIF
Noa Roggendorff on Instagram: "epic

#ai": 2 likes, 1 comments - noaroggendorff on May 23, 2024: "epic  #ai". 
Process: no description found
Kurt Kurt Angle GIF - Kurt Kurt angle 100 yard stare - Discover & Share GIFs: Click to view the GIF
Blobs Emojis for Discord & Slack - Discord Emoji: Find Blobs emojis to use on Discord or Slack - Emoji.gg, The largest directory of free custom emojis on the internet.
What is AutoTrain Advanced?: no description found
GitHub - hpcaitech/Open-Sora: Open-Sora: Democratizing Efficient Video Production for All: Open-Sora: Democratizing Efficient Video Production for All - hpcaitech/Open-Sora
GitHub - PKU-YuanGroup/Open-Sora-Plan: This project aim to reproduce Sora (Open AI T2V model), we wish the open source community contribute to this project.: This project aim to reproduce Sora (Open AI T2V model), we wish the open source community contribute to this project. - PKU-YuanGroup/Open-Sora-Plan

Blob Cats emojis on Slack
: no description found

HuggingFace ▷ #today-im-learning (8 messages🔥):

Deep RL for Embodied AI sparks interest: A member shared their enthusiasm about learning Deep Reinforcement Learning specifically for Embodied AI applications and invited detailed updates on progress.

Fast.ai courses recommended for AI beginners: Suggested Fast.ai’s part 1 & 2 courses which cover practical deep learning tasks using HuggingFace libraries and offer a strong foundation for beginners in deep learning. Course details can be found here.

Coursera course on Generative AI with LLMs: Recommended Generative AI with Large Language Models course on Coursera for those interested in gaining foundational knowledge in AI. The course is designed to be completed in 3 weeks, details available here.

PixART Diffusion Model Call Event: Announced a call event for an in-depth review of the PixART diffusion model for text-to-image synthesis, scheduled for Friday at 10:00 AM Pacific time. Additional information and community interaction can be found here.

Links mentioned:

Practical Deep Learning for Coders - Practical Deep Learning: A free course designed for people with some coding experience, who want to learn how to apply deep learning and machine learning to practical problems.
Generative AI with Large Language Models: In Generative AI with Large Language Models (LLMs), you’ll learn the fundamentals of how generative AI works, and how to deploy it in ... Enroll for free.
Arxiv Dives with Oxen.AI - Fine Tuning Diffusion Transformers (DiT) · Zoom · Luma: Hey Nerd, join the Herd!... for a little book/paper review. WHAT TO EXPECT Each week we pick a topic to cover in depth and have open Q/A and discussion.…

HuggingFace ▷ #cool-finds (3 messages):

Exciting ChatGPT Applications in Drug Discovery: A link to a study was shared discussing the potential use of ChatGPT and other LLMs in next-generation drug discovery. The article, published in the International Journal of Surgery, highlights contributions from various institutions across India and Bangladesh Read more.

PostgresML and LlamaIndex Make Waves: An integration of PostgresML with LlamaIndex was highlighted in a recent Medium post. This integration promises to unlock new potentials in AI advancements, with detailed insights available in the article.

Link mentioned: ChatGPT or LLM in next-generation drug discovery and... : International Journal of Surgery: An abstract is unavailable.

HuggingFace ▷ #i-made-this (22 messages🔥):

Protein Dataset Gets Major Updates: A member shared updates on their protein visualization project, adding examples for human hemoglobin, mouse GTPase, and human ribosomal protein. They also implemented support for 3D rendering and created an in-depth example table on GitHub.

Transcription App with OpenAI's Whisper Rocks!: A member introduced their transcription app for YouTube videos, audio files, and video files, utilizing OpenAI's Whisper. Check it out on Hugging Face Spaces.

Call for Feedback on Decentralized Internet Infra: One member requested feedback and participation in a survey for their project building infrastructure for a decentralized and agent-centric internet: survey link. This sparked a debate about spamming channels and the ethics of data collection through surveys.

3D Model Visualization in Browser Challenges: Despite challenges with 3D model rendering of protein structures in the Gradio browser, there is ongoing effort to find a solution. Helpful resources include a blog post on Hugging Face.

SimpleTuner Bug Fixes Improve Training: A member highlighted that fixing some minor bugs in SimpleTuner significantly enhanced its training performance. Now it trains better than ever.

Links mentioned:

Vidtext - a Hugging Face Space by tensorkelechi: no description found
Visualize proteins on Hugging Face Spaces: no description found

HuggingFace ▷ #computer-vision (4 messages):

Monthly Computer Vision Hangout Announced: An upcoming monthly Computer Vision Hangout was introduced, aimed at discussing projects, ideas, and problems in CV-related fields. More details and event participation can be found here.

Seeking Invoice Processing Solution: A member inquired about an open-source neural network or paid API for extracting structured line-by-line information from scanned invoices. They requested the output to be formatted as JSON, specifying fields like product_id, description, quantity, unit_price, and total_price.

Looking for Deep Learning Study Partner: A user expressed interest in finding a deep learning study partner who shares a passion for AI and data science. They emphasized a mutual drive to explore neural networks, complex algorithms, and innovative projects.

Request for ViT Resources in Depth Estimation: Another member asked for resources on utilizing Vision Transformers (ViT) for monocular depth estimation. They indicated an interest in building their own model using ViT and are seeking guidance.

Link mentioned: Join the Hugging Face Discord Server!: We're working to democratize good machine learning 🤗Verify to link your Hub and Discord accounts! | 79727 members

HuggingFace ▷ #NLP (8 messages🔥):

Quantisation Anomalies in Mistral v0.3 Instruct: A member reported unexpected performance issues when comparing Mistral v0.3 Instruct using bitsandbytes 8-bit, 4-bit, and fp16 quantisation levels. They found that while fp16 and 4-bit took around 100 seconds, 8-bit took 500 seconds, despite expectations of 8-bit being faster than 4-bit.
Switching from Pipelines to Generate Without Impact: The same user noted that switching from pipelines to the generate() method, per the documentation for text generation with 8-bit models, did not improve the performance as expected.
Bitsandbytes Version and Optimization Tips: In response to the performance issue, another member inquired about the version of bitsandbytes being used and suggested trying int8_threshold=0 for potential performance gains. The original user mentioned they are using a batch size of 1 and contexts ranging from 500 to 2000 tokens.

HuggingFace ▷ #diffusion-discussions (6 messages):

Seeking NLG Learning Resources: A member asked for recommendations for learning Natural Language Generation (NLG). Responses to this query were not provided in the message history.

Query about Training Stable Diffusion on Custom Dataset: Another member asked for official documentation on training Stable Diffusion (SD) to generate images from a custom dataset such as MNIST. They mentioned finding documentation on the site, but it seemed to focus on unconditional generation.

Looking for Deep Learning Study Partner: A different member expressed interest in finding a partner to learn deep learning with. They emphasized a desire for someone equally passionate about AI and data science, keen to explore neural networks, complex algorithms, and innovative projects.

Help Needed for Converting pth+index File to Hugging Face Link: A member requested assistance in converting a pth+index file into a Hugging Face link RVC model. This technical query did not receive an immediately visible response.

Perplexity AI ▷ #general (493 messages🔥🔥🔥):

Perplexity vs. ChatGPT for Data Processing: Discussion emerged on the capabilities of Perplexity and ChatGPT in processing CSV files, with mentions that Perplexity already supports CSV uploads. Julius AI, an alternative for data analysis, was highlighted for running on Python and leveraging LLMs like Claude 3 or GPT-4.

Disappointment with Claude 3 Opus: Users expressed dissatisfaction with Claude 3 Opus due to increased restrictions and lower utility, particularly in handling copyrighted material. Some suggested alternatives like GPT-4o but acknowledged that Claude 3's usefulness has diminished.

Pro Search Features and Enhancements: Users noted new features in Pro Search, with enhancements including multi-step reasoning and updated API specs fetching. However, some users observed that such updates might be part of A/B testing and only involve UI changes rather than backend improvements.

Tool Integrations and Custom Function Calls: There were discussions on Claude’s capacity for external tool integration via APIs, and attempts to replicate ChatGPT’s data analysis tool through custom function calls and serverless backend solutions. Links to relevant documentation like Tool Use with Claude were shared.

Ethical AI and Communication Analysis Projects: Talks included the creation of GPTs for communication analysis and ethical behavior monitoring, with suggestions that such tools could help improve workplace communication and reduce wrongful termination suits. Users debated the feasibility and philosophical implications of encoding ethics into algorithms.

Links mentioned:

Mistral's new 7B Model with Native Function Calling: Colab Code - https://drp.li/K98Z7🕵️ Interested in building LLM Agents? Fill out the form belowBuilding LLM Agents Form: https://drp.li/dIMes👨‍💻Github:http...
v0 by Vercel: Generate UI with simple text prompts. Copy, paste, ship.
Google Chrome Pacman GIF - Google Chrome Pacman Eating - Discover & Share GIFs: Click to view the GIF
Aladdin Disney GIF - Aladdin Disney Cartoons - Discover & Share GIFs: Click to view the GIF
Reddit - Dive into anything: no description found
Tool use (function calling) - Anthropic: no description found
Build Generative AI Applications with Foundation Models - Amazon Bedrock - AWS: no description found
Brave Search: Search the web privately…
Build Generative AI Applications with Foundation Models - Amazon Bedrock Pricing - AWS: no description found

Perplexity AI ▷ #sharing (7 messages):

Peran Kepala Sekolah shared: A brief link is shared to Peran Kepala Sekolah without additional context or discussion.
What is PB55 explained: A link provided to what is the PB55 for further reading.
Origin of 'makura' explored: A user shares a link to explore the etymology of the Japanese word "枕（まくら / makura）" here, which means pillow.
Ensure thread shareability: A reminder is given with an attachment to ensure threads are shareable with a link to Discord thread.
Stuart Hall’s theory discussed: Stuart Hall’s encoding/decoding model is shared.
Opus 50 limit queried: A user inquires about the Opus 50 limit.

Perplexity AI ▷ #pplx-api (1 messages):

References feature still in beta limbo: A user questioned the status of references being in beta and expressed frustration over not receiving a response after applying three times. They asked if anyone knew when this feature would be released in the API.

Stability.ai (Stable Diffusion) ▷ #general-chat (427 messages🔥🔥🔥):

Rumors of RTX 5090 Specifications Stir Debate: Discussions center around new rumors that the RTX 5090 may feature 32GB VRAM, igniting skepticism about the feasibility and utility. One member shared a link to purported images, but others criticized these as misleading.

Stable Diffusion Installation Guidance: A member seeks advice on installing Stable Diffusion with an AMD 5700XT GPU. Recommendations included trying web services like Craiyon initially, due to potential complications with AMD hardware.

Pricing and Access of Stable Diffusion 3: Users debated the merits of Stable Diffusion 3 vs. Midjourney, with some noting that SD3 is available for a free trial. However, it appears that a Stability membership is required for continued access.

Introduction of Mobius Model Generates Interest: DataPlusEngine announced the upcoming Mobius model on Twitter, claiming it to be the best stable diffusion-based image model. The model is described as "neither a base model nor a fine tune" and touted for its ability to create new base models efficiently.

Curiosity Over GPU Performance and Costs: New GPU models, particularly the 5090, sparked discussions about memory and training speeds. Members noted that higher VRAM like 32GB could detract from sales of high-end data center GPUs like the H100/A100, hinting this could influence Nvidia's strategy.

Links mentioned:

Tweet from DataVoid e/acc (@DataPlusEngine): We gave the @FAL  early access to the upcoming Mobius model and its only been up on http://imgsys.org for 3 hours. its already the best stable diffusion based image model in the world based on human p...
Never Finn GIF - Never Finn Adventure Time - Discover & Share GIFs: Click to view the GIF
WOW   Every Owen Wilson Wow  ever said, just WOW: Owen Wilson is just one of my favorite actors, his "wow" 's are just legendary - so here is a curated collection of all of them in one place
A Moebius-metró | teljes film magyarul: argentin misztikus/sci-fi/thriller, 1996 - teljes filmA világ egyik legzsúfoltabb metrórendszerében nyom nélkül eltűnik egy utasokkal teli metrószerelvény, c...
Geforce RTX 5090 soll mit 32 GiB GDDR7 und gleich drei PCBs an den Start gehen [Gerücht]: Bilder zu Artikel: Geforce RTX 5090 soll mit 32 GiB GDDR7 und gleich drei PCBs an den Start gehen [Gerücht] - Geforce RTX 5090
News zu Grafikkarten: Sie finden hier immer die besten News zu Grafikkarten

Unsloth AI (Daniel Han) ▷ #general (275 messages🔥🔥):

PEFT Training Question Resolved: A user faced an issue with the config.json not being created during PEFT training and was advised to copy from the base model's configuration. The user confirmed it worked and thanked the community for the help.

Llama 3's Bugs Noted: Some users discussed that "Some of Llama 3's base (not instruct) weights are 'buggy'" but Unsloth auto-fixes these. It was advised to use reserved tokens during training and ensure the tokenizer and lm_head are trained.

System Prompt Improves Llama3: Users mentioned that adding a system prompt improves Llama3 finetuning performance. One user confirmed that even a blank system prompt can positively impact results.

Phi 3 Model Support Announced: It was announced that Phi 3 models, including medium support, are now available. The community showed excitement and shared links to relevant blog posts for more details.

Creepy Imprint with Stable Diffusion: Users shared eerie experiences with voice cloning and creepy artifacts generated by Stable Diffusion. They posted links to related YouTube video and a Reddit discussion.

Links mentioned:

Finetune Phi-3 with Unsloth: Fine-tune Microsoft's new model Phi 3 medium, small & mini easily with 6x longer context lengths via Unsloth!
CohereForAI/aya-23-8B · Hugging Face: no description found
Tweet from Unsloth AI (@UnslothAI): We have resolved issues with training Llama 3, so finetuning is much better now!  Unsloth now supports the new Phi-3 models, Mistral v3, Qwen and more!  Read our blog: http://unsloth.ai/blog/phi3
can i get a chicken tendie combo please: no description found
GitHub - babycommando/machinascript-for-robots: Build LLM-powered robots in your garage with MachinaScript For Robots!: Build LLM-powered robots in your garage with MachinaScript For Robots! - babycommando/machinascript-for-robots
Issues · ggerganov/llama.cpp: LLM inference in C/C++. Contribute to ggerganov/llama.cpp development by creating an account on GitHub.
Reddit - Dive into anything: no description found
Llama3 GGUF conversion with merged LORA Adapter seems to lose training data randomly · Issue #7062 · ggerganov/llama.cpp: I'm running Unsloth to fine tune LORA the Instruct model on llama3-8b . 1: I merge the model with the LORA adapter into safetensors 2: Running inference in python both with the merged model direct...

Unsloth AI (Daniel Han) ▷ #announcements (1 messages):

Phi-3 and Mistral v3 now live: Unsloth now supports Phi-3, Mistral v3, and many other new models. Check out the release details.

Llama 3 issues resolved: We've fixed all Llama 3 issues so finetuning is much better now. For a deeper dive, refer to this Reddit thread.

Explore free Colab notebooks: Access our Phi-3 medium notebook, Mistral v3 notebook, and more.

New model support and GitHub Accelerator: See our latest model additions on Hugging Face and learn about our participation in the GitHub 2024 Accelerator.

Celebration of AI innovation: We're excited to join 10 other projects in GitHub's 2024 Accelerator, highlighting the global impact and rapid advancement of AI innovation.

Links mentioned:

Finetune Phi-3 with Unsloth: Fine-tune Microsoft's new model Phi 3 medium, small & mini easily with 6x longer context lengths via Unsloth!
Google Colab: no description found
Google Colab: no description found
Google Colab: no description found
2024 GitHub Accelerator: Meet the 11 projects shaping open source AI: Announcing the second cohort, delivering value to projects, and driving a new frontier.

Unsloth AI (Daniel Han) ▷ #random (4 messages):

Seek Local VSCode Copilot Recommendations: One user asked, "Does anyone use local vscode 'copilot'? I would like to try some. Looking for recommendation :)". Another responded with, "try continue", followed by the initial user expressing thanks, "Thanks, will try:)".

Unsloth AI (Daniel Han) ▷ #help (103 messages🔥🔥):

Sloth Phi-3 Inference Poses Performance Issue: A user reported slower inference times when using the Unsloth Phi-3 model compared to the original. They shared a Colab notebook to diagnose the issue, but even after suggested modifications, the problem persisted.

Custom Model Quantization Issue: One member experienced issues quantizing a custom model derived from an Unsloth notebook. They received errors related to unsupported architecture with llama.cpp and Docker.

Resource Requirements for Different Models: Queries about VRAM requirements indicated that 12GB is sufficient for Phi 3 mini, while 16GB is needed for Phi 3 medium. It was also noted that for larger tasks like summarization with a bigger context window, renting computing resources might be necessary.

Evaluation DataSet Criteria: A discussion highlighted the importance of using consistent datasets for training and evaluation. Specifically, unslothai's public datasets on Hugging Face, such as those listed in the Blackhole Collection, were recommended for high quality.

Compatibility and Custom Model Support: Several users inquired about the compatibility of Unsloth with older Macs and using GPU-less systems, confirmed that Unsloth is optimized for CUDA and GPU usage. Several workarounds and tips were suggested for CPU-only systems and custom model support.

Links mentioned:

Blackhole - a lamhieu Collection: no description found
Google Colab: no description found
Issues · unslothai/unsloth: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory - Issues · unslothai/unsloth
Google Colab: no description found
unsloth/unsloth/models/_utils.py at main · unslothai/unsloth: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth

Unsloth AI (Daniel Han) ▷ #community-collaboration (2 messages):

Engineer offers enterprise experience to Unsloth: A member, higginsconsultingptyltd_39617, congratulated others on joining the accelerators at Build Club and Github and proposed leveraging their enterprise experience to assist Unsloth. Another member responded positively, expressing eagerness to discuss further, "Absolutely we'd love to!"

Nous Research AI ▷ #off-topic (12 messages🔥):

Master of Plain-Speak Talks PixART Diffusion Model: Interested members can "hear a Master of Plain-Speak describe how he fine-tuned the PixART diffusion model" during a call today at 10:00 AM Pacific Time. Join the event and link to Discord for further discussion or view past topics on their blog and YouTube videos.

Excitement Over Intel Libraries: A member expressed excitement to "tinker with the Intel libraries" while discussing IPEX and BigDL separation. Potential collaboration and exploration of Intel's improvements were mentioned.

Stable Functionality of IPEX-LLM: Although one member hasn't used IPEX-LLM, they've found that it has "rock-solid stable" support where it exists. Discussions included improvements in IPEX-LLM's setup.

Tinygrad OpenCL Setup Insights: If performance is not the main concern, "tinygrad OpenCL is trivial to set up and get running", suggested one member. Another member humorously criticized geohot's lack of interest due to memory bandwidth limitations.

Experimental Stint with drm/xe Driver: Currently, a member is running the experimental drm/xe driver without major issues, apart from the known constraints. They expressed hope that Battlemage will perform better.

Link mentioned: Arxiv Dives with Oxen.AI - Fine Tuning Diffusion Transformers (DiT) · Zoom · Luma: Hey Nerd, join the Herd!... for a little book/paper review. WHAT TO EXPECT Each week we pick a topic to cover in depth and have open Q/A and discussion.…

Nous Research AI ▷ #interesting-links (6 messages):

TAS Mario Sunshine sparks AI speedrun debate: A member shared a YouTube video showcasing a tool-assisted speedrun of "Super Mario Sunshine" and discussed the potential of AI mastering such techniques. They pondered the intriguing developments AI might bring to speedrunning and game engine manipulation by imposing specific limitations.

Pannenkoek2012's Mario 64 praised: Another YouTube video was shared featuring a zero A-press speedrun of "Super Mario 64" by Pannenkoek2012. The member appreciated the content, noting its insights into evolving AI and consciousness through rapid thought processes.

Prophetic AI's Halo and Morpheus-1 impress: A link to Prophetic AI was shared, highlighting the Halo, a non-invasive neural device for lucid dreaming, and Morpheus-1, an ultrasonic transformer generating holograms for neurostimulation. The member emphasized the extreme potential of these technologies for exploring the subconscious mind and consciousness enhancement.

Links mentioned:

Prophetic: Prophetic is a megaproject to expand, explore, and understand the true nature of consciousness. We are a neuromodulation company that brings together state-of-the-art neural "reading" and &q...
[TAS] GC Super Mario Sunshine by zelpikukirby & Goldfire in 1:08:32.58: This is a tool-assisted speedrun. For more information, see https://tasvideos.org/3731MTAS originally published on 2018-06-18In the highly anticipated sequel...
Super Mario 64 70 stars in 0 a presses by Pannenkoek2012: This video is made as a thank you to pannenkoek for such great content like this. All footage is made and owned by pannenkoek ( https://www.youtube.com/user/...

Nous Research AI ▷ #general (280 messages🔥🔥):

New Paper on Transformer Circuits: A user shared a link to the new paper, Scaling Monosemanticity, suggesting the community check it out.
PyTorchModelHubMixin Class by HF: A member highlighted a class called PyTorchModelHubMixin created by Hugging Face, which allows seamless integration of AI models with the HUB using save_pretrained, push_to_hub, and from_pretrained methods. However, AI models need to stay under 50GB as sharding is not supported yet.
Mobius Model Impresses Community: Discussion on the Mobius model showcased its high performance in image generation, particularly in Pixar-style renderings and multi-word text generation. It also generated excitement for potential open-sourcing and further papers explaining its training method.
Lively Debate on LLM Understanding: A heated discussion unfolded around whether LLMs truly understand concepts, with one user pointing to interpretability research as a major source of empirical evidence, while another argued that current interpretability efforts are insufficient. They referenced recent research including a paper from Anthropic and debates around the significance of interpretability in AI.
Technical Repo for RLHF Models Shared: A GitHub repository, Online RLHF, was shared, detailing a workflow for training reward models for Reinforcement Learning from Human Feedback (RLHF), which aims to surpass results from offline learning methods.

Links mentioned:

Tweet from DataVoid e/acc (@DataPlusEngine): We gave the @FAL  early access to the upcoming Mobius model and its only been up on http://imgsys.org for 3 hours. its already the best stable diffusion based image model in the world based on human p...
Tweet from DataVoid e/acc (@DataPlusEngine): Our upcoming paper outlines and enables making entirely new base diffusion models without the need to extensively pretrain a new model from scratch. We can in a controlled way, break all the quality a...

        Representation Engineering Mistral-7B an Acid Trip

  : no description found
Mapping the Mind of a Large Language Model: We have identified how millions of concepts are represented inside Claude Sonnet, one of our deployed large language models. This is the first ever detailed look inside a modern, production-grade larg...
GitHub - RLHFlow/Online-RLHF: A recipe to train reward models for RLHF.: A recipe to train reward models for RLHF. Contribute to RLHFlow/Online-RLHF development by creating an account on GitHub.
RLHFlow (RLHFlow): no description found
RLHFlow/LLaMA3-iterative-DPO-final · Hugging Face: no description found
RLHFlow/LLaMA3-SFT · Hugging Face: no description found

Nous Research AI ▷ #ask-about-llms (8 messages🔥):

Llama.cpp script handles function calls: A member shared an update about creating a script using llama.cpp that manages function calls and returns answers from the model based on tool responses. They mentioned being inspired by the Hermes Pro 2 GitHub repo and offered to create a pull request to add a notebook.
Hermes model praised: The same member described the Hermes model as "a beast."
Looking for LoRA resources on a 3080: A member asked for resources to perform Llama3 LoRA on a 3080 GPU with 10GB. The response recommended checking out unsloth or axolotl.
New developer introduction: A new member, a developer from torchtune, introduced themselves and mentioned their interest in tool-calling with Mistral v0.3. They sought advice on fine-tuning models for tool-calling and queried experiences with zero-shot new tools.

Nous Research AI ▷ #project-obsidian (6 messages):

Kquant criticizes kquant's reputation: Members expressed skepticism about kquant, with one stating, “I’ve heard it’s not very great.” Another concurred, sharing similar opinions from colleagues.

Concerns on LLM Capabilities: There was agreement that kquant's capabilities, especially on the LLM side, are dubious, though its vision capabilities were not discussed. 

Disappointment over product removal: A member mentioned the removal of "Sky" in a playful manner, which caused amusement and mirrored shared sentiments of disappointment. Another member humorously expressed that they "stol't our waifus."

Nous Research AI ▷ #rag-dataset (36 messages🔥):

Models should contextually integrate internal and RAG knowledge: Members discussed the idea of training models to "add context from its own knowledge" or to override RAG data if it contradicts internal knowledge, emphasizing the shortcomings of depending solely on RAG.

Concerns about internal vs. RAG knowledge: A debate emerged over whether internal model knowledge, which could avoid obvious errors, should outweigh RAG, which can sometimes include bad data, highlighting a "damned if you do damned if you don't situation."

Finetuning can resolve conflicts: A member noted that finetuning with models like GPT-4 or Gemini might prevent illogical outcomes from incorrect RAG data.("I think any LLM of gemini or gpt4 size can reason that its not safe to put glue stick into your pizza.").

Function calling as a form of RAG: A query was posed about whether function calling is a type of RAG, indicating not all nuances of RAG integration are universally understood yet. 

Benchmarking RAG performance: Discussing RAG performance benchmarks, members agreed user evaluation is crucial, especially for complex, multi-hop questions, despite being easier for single-hop queries.

Links mentioned:

Tweet from PixelButts (@PixelButts): Google is dead beyond comparison
Tweet from Kurt Opsahl @kurt@mstdn.social (@kurtopsahl): Seems the origin of the Google AI’s conclusion was an 11 year old Reddit post by the eminent scholar, fucksmith.  Quoting PixelButts (@PixelButts)   Google is dead beyond comparison

Nous Research AI ▷ #world-sim (21 messages🔥):

Jam Session Video Hits A Snag: Teknium reported that the jam session video has been recorded but there are issues with getting it onto YouTube. They promised to inform the group as soon as it's uploaded.

NightCafe Connection to Nous/WorldSim: Rezonaut introduced NightCafe noting its potential key role for solutions in the Nous and worldsim contexts. They suggested it could enhance the interface by integrating multi-dimensional and multi-sensory communications.

Creative Brainstorming for AI Worlds: Rezonaut shared intricate ideas for using AR spaces and visual elements to map out and explore interconnected worlds and dimensions in a manner inspired by biological brain functions and mindmaps. This includes the visualization of knowledge and designed immersive spaces connected like neural networks.

Vorpal_strikes' New Visualizer Fascination: Vorpal_strikes shared a link to an immersive audio-visualizer that caught their interest. The visualizer offers a highly dynamic and immersive environment, potentially useful for creative and AI-based applications.

Golden Gate Claude Streams Consciousness in ASCII: Teknium shared a whimsical representation of an AI called "Golden Gate Claude" monologuing in ASCII art about consciousness, simulation theory, and classic AI banter, accompanied by an ASCII depiction. This showcases both playful creativity and deep thematic explorations in AI projects.

Links mentioned:

worldsim: no description found
Tweet from Kiri (@Kyrannio): Is this terrifying, or amazing? You decide.  Golden Gate Claude inner monologuing to itself as a merged Omega Claude, complete with ASCII representations.  "Haha, an ASCII art representation of my...

Eleuther ▷ #general (53 messages🔥):

JAX vs PyTorch/XLA on TPU Performance: A member raised a query on the performance comparison of PyTorch/XLA and JAX on TPUs, but the discussion quickly shifted to benchmarking concerns such as warmup and blocking factors.

Improving LLM Reasoning Through Fine-Tuning: An inquiry made about fine-tuning strategies that improve LLM reasoning pointed toward a search for scholarly papers detailing specific parts of model training that enhance reasoning capabilities. There were no specific papers referenced in this discussion.

Compute Cost of Training GPT-3 Over Time: The conversation covered the substantial drop in compute costs for training GPT-3 from around $4.5M in 2020 to an estimate of $125k-$1M in 2024. These costs varied based on assumptions such as TFLOP rates and GPU-hour pricing, with various users contributing different figures and sources, including a Databricks Blog Post.

Validating GPU Costs for Training Models: A critical examination revealed that more realistic estimates for well-connected H100 GPUs are between $2.5-$3/hr, suggesting a $1.25-$1.5M range for substantial models like GPT-3 trained on 1.4T tokens. This underscores the variability and complexity in exact cost approximations for large-scale model training.

RAG versus Finetuning for Custom Library Extraction: A user asked whether RAG (Retrieval-Augmented Generation) was the best method for enabling LLMs to extract information from a custom library for specific questions, hinting they were considering both finetuning and RAG for their experimentation needs.

Link mentioned: Turbocharged Training: Optimizing the Databricks Mosaic AI Stack With FP8: At Databricks, we be

Eleuther ▷ #research (249 messages🔥🔥):

JEPA vs LLMs Spark Debate: A lengthy discussion unfolded about JEPA and its potential to lead to AGI as proposed in "A Path Towards Autonomous Machine Intelligence". Members criticized the model for being similar to existing models like GPT and DINO but in different domains, with skepticism about its scalability and context handling: "I don't see how the JEPA/Lecun path scales even 1/1000 in amount of economically important tasks solved compared to LLM." 
ROPE's Influence on Long-Term Context: Members discussed a new approach to RoPE, suggesting it has limitations regarding context length capabilities in LLMs. A recently published paper revisits existing theories and proposes a novel understanding of RoPE's long-term decay properties: View PDF.
Modula: A New Training Strategy: An interesting project called Modula was shared, which introduces scalable neural network training through automatic normalization using the modular norm. Skeptical members found the abstract intriguing but uncertain about its practicality: "It is very, very, very strangely worded if it is legitimate."
Chameleon Model Insights: The Chameleon model, capable of multimodal tasks such as text and image generation, was highlighted. This model is noted for its state-of-the-art performance in multiple domains, suggesting potential competition for established models: View PDF.
Bitune Enhances LLM Instruction-Tuning: Bitune, a novel approach for improving instruction-tuning in LLMs through both causal and bidirectional attention, was discussed. This method claims significant improvements in zero-shot performance across several types of reasoning tasks: View PDF.

Links mentioned:

Improved Distribution Matching Distillation for Fast Image Synthesis: Recent approaches have shown promises distilling diffusion models into efficient one-step generators. Among them, Distribution Matching Distillation (DMD) produces one-step generators that match their...
Bitune: Bidirectional Instruction-Tuning: We introduce Bitune, a method that improves instruction-tuning of pretrained decoder-only large language models, leading to consistent gains on downstream tasks. Bitune applies both causal and bidirec...
Lessons from the Trenches on Reproducible Evaluation of Language Models: Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, difficulty of prop...
Small-scale proxies for large-scale Transformer training instabilities: Teams that have trained large Transformer-based models have reported training instabilities at large scale that did not appear when training with the same hyperparameters at smaller scales. Although t...
Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras: In this paper, we present a low-budget and high-authenticity bidirectional telepresence system, Tele-Aloha, targeting peer-to-peer communication scenarios. Compared to previous systems, Tele-Aloha uti...
Chameleon: Mixed-Modal Early-Fusion Foundation Models: We present Chameleon, a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. We outline a stable training approach f...
Base of RoPE Bounds Context Length: Position embedding is a core component of current Large Language Models (LLMs). Rotary position embedding (RoPE), a technique that encodes the position information with a rotation matrix, has been the...
Tweet from Sang Choe (@sangkeun_choe): 🚨 Preprint Alert 🚨  LLM is nothing without its training data 💛 But…how (much) does each data contribute to LLM outputs? In our paper, we develop algorithms, theory, and software for LLM-scale data ...
Tweet from Leshem Choshen @LREC 🤖🤗 (@LChoshen): At last, a curriculum learning that works, one for pretraining and another for instruction tuning @l__ranaldi @Giuli12P2 @andrenfreitas @znz8 https://aclanthology.org/2024.lrec-main.464.pdf https://ac...
A Formulation of Quantum Fluid Mechanics and Trajectories: A formalism of classical mechanics is given for time-dependent many-body states of quantum mechanics, describing both fluid flow and point mass trajectories. The familiar equations of energy, motion, ...
GitHub - jxbz/modula: Scalable neural net training via automatic normalization in the modular norm.: Scalable neural net training via automatic normalization in the modular norm. - jxbz/modula
Scalable Optimization in the Modular Norm: To improve performance in contemporary deep learning, one is interested in scaling up the neural network in terms of both the number and the size of the layers. When ramping up the width of a single l...

Eleuther ▷ #interpretability-general (3 messages):

Tim Dettmers' quantization research: a mixed reaction: A post highlights Tim Dettmers' quantization methods described in his paper and blog, explaining no performance degradation transformer inference with advanced quantization methods. It also mentions the intriguing concept of emergent outliers in transformers as "sinks of entropy/information", integrated with Hugging Face via bitsandbytes library.
Emergent features as “DNA” of the model: The concept of emergent features being invariant across layers and behaving like "sinks of entropy" was discussed, with a comparison to "DNA" from which the rest of the model's functionality could be reconstructed. The conversation probes into phase transitions around 7B parameter models and possible parallels to phase transitions in 3SAT or spin glass models.
Exploring transfer learning and fine-tuning applications: A member speculated about the potential for using ablation of vectors separating in-distribution and out-of-distribution samples to improve out-of-distribution generalization by minimizing shortcut features. However, this approach is acknowledged as being closer to transfer learning than true out-of-distribution generalization.

Link mentioned: LLM.int8() and Emergent Features — Tim Dettmers: When I attended NAACL, I wanted to do a little test. I had two pitches for my LLM.int8() paper. One pitch is about how I use advanced quantization methods to achieve no performance degradation transfo...

Eleuther ▷ #lm-thunderdome (10 messages🔥):

Set a seed in vllm models: Members discuss setting a seed in model_args for vllm models, noting that while it defaults to seed=1234, it might not be the issue. vllm also allows a per-sample seed in gen_kwargs, typically set to 0 during greedy decoding.

List all possible tasks using lm_eval: One member asked how to see the list of all possible tasks to test. Another specified that using lm_eval --tasks list gives a list of all task names, highlighting the need for better documentation.

BigBench task names have changed: A member is looking for updated BigBench task names as their 8-month-old eval harness no longer aligns. They are frustrated because the old harness isn't properly utilizing Accelerate, causing memory issues by overloading GPUs.

Organize tasks in lm-eval folder: To find tasks, it's suggested to look in the lm-eval/tasks folder. It's mentioned that tasks are "pretty nicely organized" there.

LM Studio ▷ #💬-general (142 messages🔥🔥):

Challenges with Small Model Loading on GPU: Members discussed issues related to loading small models on GPUs. One noted, "only load the biggest small models," while others suggested trying models like llama3, mistral instruct, cmdr.

Better Results with Lower Quantizations: A member shared, “I got better results with llamas q4 than I did q8 for my application," noting "Bigger not always better."

Finding Uncensored and Specialized Models: The discussion highlighted the challenge of finding appropriate models, with suggestions to try "deepseek coder, wizardlm, llama3," and a link to Hermes 2 Pro for JSON and function calling.

Vector Search and Context Management in Queries: Topics included using embeddings and vector search to handle full-article context for better responses. Specific prompts were shared, with one noting it “works much better with full articles,” providing more detailed answers.

Disk Utilization and Performance: Conversations touched on how disk utilization might affect performance, with one noting, “running models partially offloaded to swap has worked for me,” though “tok/sec becomes sec/tok.”

Links mentioned:

NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF · Hugging Face: no description found
GitHub - XiongjieDai/GPU-Benchmarks-on-LLM-Inference: Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference?: Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference? - XiongjieDai/GPU-Benchmarks-on-LLM-Inference

LM Studio ▷ #🤖-models-discussion-chat (70 messages🔥🔥):

Model Updates Announced: A member announced that the 35B model is incoming, followed by a release announcement. They are actively testing to ensure compatibility with the latest LM Studio version.

Compatibility Issues and Fixes: Discussion around compatibility issues with ROCm build and new model versions were highlighted. Confirmed issues were related to outdated versions which will be resolved as ROCm version gets updated in the coming days.

Recommendations for Conversational Models: Members discussed decent conversational models, with one recommending Wavecoder Ultra as an excellent choice for coding and learning. Another suggestion was to try Mistral-Evolved-11b-v0.1 for uncensored use.

Loading Issues with Specific Hardware: A user reported indefinite loading times using a model on their system with a 5800x3d, 32GB DDR4, 4080 16GB VRAM. They later clarified it worked properly without using web search agents.

Potential Issues and Future Releases: Some members expressed anticipation for Phi-3 small GGUFs and discussed optimization differences between medium and small models, noting that phi small models provide better optimization.

Links mentioned:

failspy/Meta-Llama-3-8B-Instruct-abliterated-v3 · Hugging Face: no description found
bartowski/wavecoder-ultra-6.7b-GGUF · Hugging Face: no description found
Add Support for IBM Granite · Issue #7116 · ggerganov/llama.cpp: Prerequisites Please answer the following questions for yourself before submitting an issue. [ ✅] I am running the latest code. Development is very rapid so there are no tagged versions as of now. ...

LM Studio ▷ #📝-prompts-discussion-chat (23 messages🔥):

LLMs struggle with precise character prompts: A user noted that Local Language Models (LLMs) often fail to adhere to precise character limits in prompts. They emphasized the difficulty of avoiding unnecessary additions like opinions or comments.

Capitalization and model behavior vary: Discussions highlighted that different models respond variably to capitalized instructions. One user pointed out, "Generally, LLM's don’t follow capitalized words on order of importance."

Specialized model recommended for multilingual tasks: A recommendation was made for using a specialized multilingual model for tasks like grammar and punctuation correction. The suggested model was Aya 23 8B by Cohere For AI.

Temperature adjustment considered for output quality: A user contemplated tweaking the temperature setting in Llama 3 to potentially improve its performance, as they observed, “Llama 3 has a much more... Creative way of doing it.”

GPU vs. CPU processing time discrepancy: One user mistakenly ran a grammar check task on their CPU, which extended the duration from 35 minutes to an estimated 15 hours. They later corrected this by running the task on GPU, significantly reducing the time required.

Link mentioned: lmstudio-community/aya-23-8B-GGUF · Hugging Face: no description found

LM Studio ▷ #⚙-configs-discussion (6 messages):

Tried disabling VPN routing for specific traffic types: A suggestion was made to disable VPN routing for specific traffic types and directly download models from Huggingface, possibly injecting them into the Models directory manually. The strategy is commonly recommended, especially when facing regular concerns about VPN-related issues.

CUDA versions on older GPUs may be problematic: It was pointed out that CUDA versions on the GTX 950m might be too outdated to function correctly. This could be a limiting factor in running certain models.

Recommendation for using Julius AI: Julius.ai was recommended, offering 10 free chats as a promotional feature. This is presented as a useful resource or tool for users encountering issues.

Persistent NVIDIA CUDA issues despite driver updates: Attempts to update NVIDIA drivers and configure different CUDA and CuDNN versions (12.4, 12.1, 11.8) on a system with a GTX 950m GPU have not resolved issues. The user continues to run on AMDOpenCL, leaving the potential CUDA capability of their NVIDIA card unused without clear reasons or solutions.

Links mentioned:

Julius AI | Your AI Data Analyst: Julius is a powerful AI data analyst that helps you analyze and visualize your data. Chat with your data, create graphs, build forecasting models, and more.
Julius AI | Your AI Data Analyst: Julius is a powerful AI data analyst that helps you analyze and visualize your data. Chat with your data, create graphs, build forecasting models, and more.

LM Studio ▷ #🎛-hardware-discussion (5 messages):

Llama.cpp supports distributed inference: Reddit discussion link revealed that llama.cpp now supports distributed inference with recent RPC code updates. Although it doesn't support quantized models yet, it can still run models across multiple machines by adjusting certain lines in the code.

Exploring PC builds for distributed models: Discussion considered the feasibility of clustering cheap used PCs with RTX 4060 Ti 16GB cards for optimal builds. There was curiosity about the network bandwidth requirements and possible constraints when linking these machines.

Using rented online PCs for inference: One suggestion was to use services like Maximum Settings or ShadowPC for renting multiple PCs to run larger models. However, concerns about high costs and specific limitations such as ShadowPC's inactivity timer and limited 6GB system RAM were raised.

Considerations for power consumption and networking: It was noted that RTX 4060 Ti cards draw 160W peak power, implying significant power considerations for host machines. Networking expenses and performance benchmarks are also crucial factors in a distributed architecture setup.

Link mentioned: Reddit - Dive into anything: no description found

LM Studio ▷ #amd-rocm-tech-preview (4 messages):

7900 XTX available?: One member inquired, "7900 xtx here, where can I get it?" indicating interest in acquiring a specific GPU model.
7900m works on Windows, not sure about Stable Diffusion: Another member shared that the 7900m works on Windows but they haven't figured out Stable Diffusion on LM Studio. They also mentioned not yet trying it on NixOS with a 6800xt.
LM Studio doesn't support Stable Diffusion: A member clarified that Stable Diffusion is not supported in LM Studio, which is dedicated solely to language models, not image generation models.
ROCm praised as a game changer: One participant expressed enthusiasm about ROCm, noting, "damn ROCm really is a game changer huh."

LM Studio ▷ #model-announcements (1 messages):

Cohere models go multilingual: Cohere models are now available in 23 different languages including Arabic, Chinese, French, and more. Check out the download links for aya-23 quants on the lmstudio-community page.
Update on deployment requirements: To use the aya-23 models, you'll need version 0.2.23 or newer. ROCm users will have to wait for an upcoming update.

CUDA MODE ▷ #general (23 messages🔥):

Clarification on Sparsity and Pruning: A member asked if sparsity is just pruning, but the discussion did not elaborate further.
Quantization of Neural Networks Questioned: There was a query about whether neural net quantization is only scaling down the precision or if it involves non-uniform quantization like remapping weights to quantiles.
Workshop Excitement: One member mentioned that the workshop was rad and expressed excitement to be there.
Question Posting Guidance: A user asked where to post questions and was directed to a specific Discord channel by another user here.
Announcement Channel Adjustment: A member requested an announcement channel for webhooks, and it was promptly adjusted into an announcement channel by another user, who also commented, "LOL done".

CUDA MODE ▷ #triton (4 messages):

Minimum Dimension Requirement for Dot Product: A member questioned why the dot product computation in CUDA requires matrices to have at least a dimension of 16. Another user suggested it might be due to tensor cores requirements.

Optimizing Matrix-Vector Multiplication: To optimize matrix-vector multiplication K v, a member asked if padding the vector to a shape of n by 16 would be advisable. They also pondered whether running sum(K * v.T, axis=-1) would be cheaper performance-wise.

Symmetric Matrix Computation: Discussion on whether performance can be improved by not recomputing already computed parts of a symmetric matrix. The member inquired if there is a special order of computation that could be considered to boost performance.

CUDA MODE ▷ #torch (1 messages):
davidgonmar_: Might be inplace operators?

CUDA MODE ▷ #announcements (1 messages):

Exciting live coding session with Izzat El Hajj: A speaker event featuring Izzat El Hajj, co-author of the PMPP book, is scheduled for tomorrow at <t:1716663600:F>. The highlight of the event will be actual live coding of the Scan algorithm, which is crucial for modern ML algorithms like Mamba, promising an engaging session for attendees.

CUDA MODE ▷ #pmpp-book (4 messages):

Excitement builds over book purchase: A member announced, "I bought the book," sparking curiosity from another member who asked how they liked it. The buyer responded that they had just bought it and would see how it is.

Upcoming PMPP author events: A member informed the channel about opportunities to meet and discuss with PMPP authors in the upcoming weeks. They mentioned that Prof Izzat El Hajj will present SCAN topics tomorrow and next week, and Prof Wen-mei Hwu will present later this summer. Check out the events calendar for more details.

CUDA MODE ▷ #torchao (5 messages):

int4 dtype functions lack implementations: A member noticed a lot of functions aren't implemented for the int4 dtype, even mentioning that the test script contains a few TODOs. They questioned if this gap is worth addressing ("Is this worth working on?").

uint4 extensions and limitations discussed: References were made to uint4 extensions, highlighting specific limitations such as type promotion constrained to uint8 and tensor shape operations like unbind and slice having restrictions. Another member stated that sub-byte dtypes are typically utilized in custom kernels rather than standard eager/compile functions.

uint4 needs improvement: A member straightforwardly pointed out that "uint4 indeed does need some love", indicating a recognized need for enhancement in this area.

Questioning the value of the task: Another member posed the question of what defines whether the task is "worth working on," hinting at a need for clarity on the potential benefits versus the required effort.

Link mentioned: Supporting new dtypes in PyTorch: tldr; This post explains what adding a new dtype to PyTorch core means, the criteria of adding a new dtype to PyTorch core and the official recommendation of how to support new “secondary dtypes” use ...

CUDA MODE ▷ #llmdotc (115 messages🔥🔥):

Gradient Norm Issues with Batch Size: A bug was identified where changing the batch size from 32 caused the gradient norm to spike significantly, causing failures in the training process. As one member phrased it, "the gradient norm is suddenly really really large and training fails".
Exponential Notation Parsing Issue: Members discussed a problem with passing floats in exponential notation to C, noting that -l 3e-4 doesn't get parsed by atof. It was noted that using 3.0e-4 might work, but this will need to be tested later.
Deterministic Kernels for Multi-GPU Runs: Members discussed the importance of getting deterministic kernels before any larger run, pointing out that a 124M model is still relatively small but more extensive runs would need determinism.
FineWeb Dataset Storage and RAM Usage: The FineWeb dataset is large, with intermediate disk usage reaching 70 GB and RAM usage up to 64 GB during processing. This has led to performance issues across systems with different configurations.
Exploding Gradients Fix: A fix for the exploding gradients issue, especially with large batch sizes, was implemented and tested successfully. This fix prevents indexing overflow in the fused classifier as mentioned in this PR.

Links mentioned:

PyTorch vs. llm.c cross-checks · karpathy/llm.c · Discussion #454: llm.c is starting to get to the point where we can start doing nice and serious "production" pretraining runs. That means: start training from scratch (random initialization) train on a nice...
fix for large batch sizes by ngc92 · Pull Request #456 · karpathy/llm.c: prevent indexing overflow in fused classifier, and added one more model configuration that makes testing easier on smaller systems
add checkpoint function write to file by karpathy · Pull Request #457 · karpathy/llm.c: no description found

CUDA MODE ▷ #rocm (2 messages):

Dreams of MI300 Gaming Card: One member speculated, "maybe after the mi300 does well they will ship a gaming card that works XD." Another humorously replied, "A person can dream at least."

CUDA MODE ▷ #bitnet (1 messages):
mobicham: https://arxiv.org/pdf/2405.14854

Modular (Mojo 🔥) ▷ #general (90 messages🔥🔥):

Funding Python Libraries' Port to Mojo: A user questioned the availability of a budget to incentivize developers of major Python libraries like psycopg3 to port their work to Mojo. It was discussed that the fast-evolving API and lack of stable FFI story could potentially burn out maintainers if pursued prematurely.
Debate on Porting Libraries: Some members argued against the practicality of asking existing Python libraries to port to Mojo, pointing out the challenges and potential unwelcome response. Others highlighted that C libraries, specifically those with no dependencies, might be more suited for early porting efforts.
Comparison with Rust and Future Prospects: Security benefits of moving to Rust were mentioned favorably, although it was noted that Mojo aims to suit different use cases without fully replacing C. Discussions touched on Rust’s commitment to portability and the potential of Mojo leveraging similar concepts.
BlazeSeq on MacOS: A user faced issues running BlazeSeq on MacOS, which was resolved by using the nightly version of Mojo. Feedback on performance was shared, showing similar efficiency between BlazeSeq and Rust's Needletail, indicating promising results on Mac's Ventura pro-max M2 arm64.
Prospects of HVM for Various Languages: There was a discussion about the HVM being used for running various programming languages like Python and Haskell, similar to JVM. Attention was drawn to an explanation by Victor Taelin about HVM's potential despite its current performance limitations.

Links mentioned:

CIRCT: no description found
True GIF - True - Discover & Share GIFs: Click to view the GIF
GitHub - rust-lang/rustc_codegen_gcc: libgccjit AOT codegen for rustc: libgccjit AOT codegen for rustc. Contribute to rust-lang/rustc_codegen_gcc development by creating an account on GitHub.

Modular (Mojo 🔥) ▷ #💬︱twitter (1 messages):
ModularBot: From Modular:
https://twitter.com/Modular/status/1793797622572220431

Modular (Mojo 🔥) ▷ #ai (12 messages🔥):

Training ML models and inference in Mojo?: One member inquired about the future of training ML models and running inference natively in Mojo, and if Modular has plans to introduce a PyTorch-alternative written in Mojo. "They have Max Engine, which can be used in place of numpy for inference" but no plans for a training framework.
Level-Up Celebration with ModularBot: ModularBot congratulated a member for reaching level 16 with a whimsical comparison to a knight's journey. The bot continued with playful banter about taco preferences but clarified it cannot send funds.
Curious about ModularBot's model: A member asked about the model ModularBot is based on, and the bot responded with a fanciful narrative, stating it is "forged from the fires of ancient forges" and adept at dispensing knowledge, not funds.

Modular (Mojo 🔥) ▷ #🔥mojo (31 messages🔥):

Low-bit-depth networks spark debate: Discussions on the utility of low-bit-depth networks for embedded AI systems emphasized the importance of potentially incorporating dedicated support in programming languages. "Having an easy, language-supported means to specify that you wanted limited bit depth would be a big step to making small embedded AI systems."

FFT in Mojo: Scipy vs FFTW: One member sought advice on performing FFTs in Mojo, weighing the use of Scipy's FFT functions against wrapping FFTW. Another member suggested referring to a discussion on Tensor to NumPy array conversion for more insights.

Function-only structs without initialization: A proposal for a decorator to create function-only structs without initialization sparked a discussion on using @staticmethod to achieve similar functionality. "I guess what I want is to be able to call a variation of that once for an entire struct."

Mojo function argument handling update: A user highlighted a recent update on how Mojo processes function arguments, shifting from making copies by default to using borrowed conventions unless mutations occur. The update aims to "improve consistency, performance, and ease of use," as outlined on GitHub changelog.

Compile-time metaprogramming confusion: A user encountered issues with a function designed to build tables at compile time, facing a "range check issue" with list indexing. Another member proposed setting the list size explicitly using table.size, table.resize(256*n, 0), or table.append to resolve the issue.

Links mentioned:

GitHub - modularml/mojo: The Mojo Programming Language: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.
How can I convert Tensor from/to numpy array? · modularml/mojo · Discussion #1048: I created a Tensor object, and applied some operations. but now I don't know how can I view the tensor? or if possible can I convert it to numpy array so that I can apply some python function?

Modular (Mojo 🔥) ▷ #performance-and-benchmarks (2 messages):

Benchmarking in Jupyter vs Compiling questioned: A member asked about the reliability of benchmarking in a Jupyter notebook versus compiling. Another responded that one should benchmark in an environment similar to production and provided detailed tips to enhance precision, emphasizing compiled benchmarks and CPU isolation techniques.

Link mentioned: CPU Isolation – Introduction – by SUSE Labs (part 1...: This blog post is the first in a technical series by SUSE Labs...

Modular (Mojo 🔥) ▷ #📰︱newsletter (1 messages):
Zapier: Modverse Weekly - Issue 35
https://www.modular.com/newsletters/modverse-weekly-35

Modular (Mojo 🔥) ▷ #nightly (34 messages🔥):

Mojo 24+ introduces breaking changes: A user experienced a runtime error with mojo parser.mojo Diffusion.bwpreset after updating to Mojo 24+. The culprit was identified as a type mismatch in a method, solved by ensuring read_bytes returns List[SIMD[uint8, 1]] (repo link).

Traits to support f-strings proposed: There was a discussion about contributing to f-string support with a Formatable trait in Mojo. One member suggested starting with something akin to Python's __format__ method handling format_spec.

Documenting bug in DTypePointer[bool]: A member discovered inconsistent behavior in DTypePointer[bool] when storing/loading with different widths and filed a bug report. The issue possibly involves bitpacking and alignment, providing code examples to reproduce the behavior.

Mojo nightlies released frequently: Users discuss the rapid deployment of nightly builds, now updated to 2024.5.2414. Links were shared to changelogs and community meetings for updates (roadmap, community meeting).

Alignment issues with bitpacking: Another alignment-related bug affected storing bool values in memory. Workarounds and multiple implications were discussed, leading to further exploration and bug documentation for community visibility.

Links mentioned:

FileHandle | Modular Docs: File handle to an opened file.
[BUG] `DTypePointer[bool]` packs bits inconsistently · Issue #2813 · modularml/mojo: Bug description When using DTypePointer[bool] store()/load() with different widths, you get inconsistent results. Steps to reproduce var ptr = DTypePointer[DType.bool].alloc(4) ptr.store(0, True) p...
FileHandle | Modular Docs: File handle to an opened file.
mojo/stdlib/src/builtin/file.mojo at 011bf40a304078b4471fe9ca18f4101b19943aa6 · modularml/mojo: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.
Mojo Community Meeting #1: Mojo Community Meeting Public Agenda: https://modul.ar/community-meeting-doc

OpenAI ▷ #ai-discussions (116 messages🔥🔥):

Run an LLM with Nvidia A40: Participants discussed whether it is possible to run Large Language Models (LLMs) using an Nvidia A40 GPU, indicating interest in hardware requirements for AI tasks.
Microsoft Copilot+ PC features: There was a detailed discussion on Microsoft Copilot+ PCs, which include features like "sketch to image" in Microsoft Paint. Users debated the capabilities and recommended checking out alternatives like Leonardo.ai for similar functionalities.
Water consumption by AI models: Concerns were raised about the water usage of training AI models, with gizmodo article shared to highlight the environmental impact of AI technologies. Participants expressed the need for making AI more energy-efficient.
AI empowerment and iterative work: There was a conversation about empowering AI with iterative work to refine outputs. Some users pointed to projects like AutoGPT that attempt to address iterative improvements but acknowledged the cost issues associated with such tasks.
GPT-4's capabilities vs. GPT-3.5: The participants compared GPT-4's improved ability to handle specific tasks like word counting when compared to GPT-3.5. An example was shared showing GPT-4 completing a word count task correctly by following a detailed process.

Link mentioned: Training ChatGPT Required Enough Water to Fill a Nuclear Cooling Tower: An average user’s conversational exchange with ChatGPT amounts to dumping a large bottle of fresh water out on the ground, new research says.

OpenAI ▷ #gpt-4-discussions (11 messages🔥):

GPT refuses to output Typst code: A user complains that GPT defaults to writing LaTeX instead of Typst code, despite explicit requests. They are frustrated with GPT's persistent behavior.

Inquiry about GPTs running on 4o: A user asked if GPTs are running on GPT-4o. It's confirmed indirectly that GPT-4 capabilities might include building further advanced models.

Clarification on Vision capabilities: Mixed responses on whether Vision is out. One user confirms GPT-4 and GPT-4o can analyze images, while another negates it.

Addressing Invalid Request errors: A user reaches out to see if a peer resolved their Invalid Request error from a year ago. They mention currently experiencing the same issue and seek assistance.

Discussion on monetizing legal knowledge ChatGPT: A user asks for opinions on selling a company embedding ChatGPT with legal knowledge for $650 million dollars. This remains a provocative inquiry but receives no elaborate response.

OpenAI ▷ #prompt-engineering (8 messages🔥):

Improving Prompt Engineering for Name Selection: A member asked for advice on structuring a prompt to either provide a name if a code is given or vice versa. Another member suggested a solid prompt but did not offer further details.
AI Should Verbalize Problem-Solving Steps: One member observed that clarifying the need for the AI to "verbally work out a problem step-by-step" often resolves issues. There was no further elaboration on specific steps or examples.
Fun Custom Instruction for Assistant Persona: A member shared a custom instruction called "PONDER," which directs the AI to engage in a soliloquy-like, self-reflective exploration on a topic, preferably seeking creative insights. This setup involves an autoprompting loop initiated by a user input of "." and showcases innovative patterns through a dynamic ideational network.

OpenAI ▷ #api-discussions (8 messages🔥):

Improving prompt engineering for name selection: A member seeks advice on how to configure a prompt to return a code when a name is expected and vice versa. They received a positive response indicating the prompt was solid.

Citation needed: A member asks for a "citation?" in the middle of a discussion, but no specific context is provided.

Clarify AI problem-solving with verbal steps: Noted that prompting the AI to verbally work through a problem step-by-step can enhance its problem-solving capabilities.

Fun and useful custom "ponder" instructions: Shared a detailed custom instruction for making the AI "ponder" and enter an autoprompting loop using the cue of '.' from the user. This method is described as both fun and a tool for exploring connections and generating insights creatively.

LangChain AI ▷ #general (83 messages🔥🔥):

Using CSV Agent in LangChain: Members discussed how to use a CSV agent as part of an LLM chain in LangChain. Documentation links were shared for further details.

Sequential Chains with CSV Agent: Instructions were provided on integrating a CSV agent into a SequentialChain along with other chains like wiki_chain and verifier_chain. Specific parameters like output_variables were highlighted for configuring the chain's behavior.

CSV Agent Custom Output Key: Guidance was given on customizing the create_csv_agent to set the output key as csv_response. This involves modifying the output_key parameter in the LLMChain of the agent.

Memory in Sequential Chain: There was a request for adding memory to a Sequential Chain, with examples provided on using ConversationBufferMemory and implementing the memory within an agent setup.

SQL Agent Issues: Concerns were raised about SQL agents struggling with multi-table queries despite using few-shot prompts, suggesting potential issues with token usage, LLM compatibility, or prompt templates. Specific GitHub issues were mentioned for further context.

Links mentioned:

no title found: no description found
Chat models | 🦜️🔗 LangChain: Advanced features
CSV | 🦜️🔗 LangChain: LLMs are great for building question-answering systems over various types of data sources. In this section we'll go over how to build Q&A systems over data stored in a CSV file(s). Like worki...
Github | 🦜️🔗 LangChain: The Github toolkit contains tools that enable an LLM agent to interact with a github repository.
Issues · langchain-ai/langchain: 🦜🔗 Build context-aware reasoning applications. Contribute to langchain-ai/langchain development by creating an account on GitHub.
Issues · langchain-ai/langchain: 🦜🔗 Build context-aware reasoning applications. Contribute to langchain-ai/langchain development by creating an account on GitHub.
Issues · langchain-ai/langchain: 🦜🔗 Build context-aware reasoning applications. Contribute to langchain-ai/langchain development by creating an account on GitHub.
Quickstart | 🦜️🔗 Langchain: In this guide, we will go over the basic ways to create Chains and Agents that call Tools. Tools can be just about anything — APIs, functions, databases, etc. Tools allow us to extend the capabilities...
Chains | 🦜️🔗 LangChain: Chains refer to sequences of calls - whether to an LLM, a tool, or a data preprocessing step. The primary supported way to do this is with LCEL.
Issues · langchain-ai/langchain: 🦜🔗 Build context-aware reasoning applications. Contribute to langchain-ai/langchain development by creating an account on GitHub.
EDA GPT DEMO | LOVO AI: EDA GPT DEMO
">no title found: no description found
Issues · langchain-ai/langchain: 🦜🔗 Build context-aware reasoning applications. Contribute to langchain-ai/langchain development by creating an account on GitHub.
Issues · langchain-ai/langchain: 🦜🔗 Build context-aware reasoning applications. Contribute to langchain-ai/langchain development by creating an account on GitHub.
Issues · langchain-ai/langchain: 🦜🔗 Build context-aware reasoning applications. Contribute to langchain-ai/langchain development by creating an account on GitHub.
Issues · langchain-ai/langchain: 🦜🔗 Build context-aware reasoning applications. Contribute to langchain-ai/langchain development by creating an account on GitHub.
SequentialChainInput | LangChain.js - v0.2.2: no description found
BaseChain | LangChain.js - v0.2.2: no description found
langchainjs/langchain/src/chains/base.ts at a269f531692c815acee094aeef01b259d1fd2674 · langchain-ai/langchainjs: 🦜🔗 Build context-aware reasoning applications 🦜🔗. Contribute to langchain-ai/langchainjs development by creating an account on GitHub.

LangChain AI ▷ #share-your-work (4 messages):

OranAITech Showcases on Twitter: A member shared a Twitter link showcasing their latest advancements in AI technology. No additional context was provided.

Everything-AI v2.0.0 Launches with New Features: A member announced the release of everything-ai v2.0.0, highlighting its ability to handle tasks such as audio processing, video generation, and 3D protein structure prediction. The project can be accessed on GitHub and comes with detailed documentation.

VisualAgents Flow Engineering Demos: Two YouTube videos were shared, showcasing the Visual Agents flow engineering platform built on LangChain: Building a SQL Agent and Building a Simple Retrieval. The platform enables flow creation in a fully browser-based PWA without coding.

EDA GPT DEMO by Sounak Roy: A demo for EDA GPT was shared via this link, offering a 5-minute overview of its capabilities.

Links mentioned:

GitHub - AstraBert/everything-ai: Your fully proficient, AI-powered and local chatbot assistant🤖: Your fully proficient, AI-powered and local chatbot assistant🤖 - AstraBert/everything-ai
everything-ai: Introducing everything-ai, your multi-task, AI-powered and local assistant! 🤖
EDA GPT DEMO | LOVO AI: EDA GPT DEMO
Build a SQL Agent Using VisualAgents & LangChain: In this short demo, we build a SQL Agent flow and use it to ask a question about a SQL database we loaded online (the Chinook customer database). This is don...
Building a Simple Retrieval using VisualAgents & LangChain: Using examples from the KangChain quickstart guide, watch me create the entire flow in VisualAgents without writing any code!Learn more: https://visualagents.ai

LangChain AI ▷ #tutorials (1 messages):
business24.ai: https://youtu.be/gflsu_6R_8g

LAION ▷ #general (65 messages🔥🔥):

Pirate Bay won't save AI: A member speculated that "the pirate bay might eventually end up with a weights category and be the saviour of AI," but another disagreed, stating it won't happen due to more AI-friendly policies in other countries.

Japan supports AI training: A discussion highlighted Japan's protective stance on AI training and inference, linking to a tweet discussing a paper on making new base diffusion models without extensive pretraining.

Controversy over model technique descriptions: Disputes arose regarding the communication and understanding of methods for creating new base diffusion models. The technique involves "nighshading and other tech" to disrupt model associations before restoring them, which one user defended against accusations and misunderstandings.

Human preference study with Ella-SDXL: A project involving a poisoned model recovery method is under a human preference study in collaboration with fal.ai. The results are forthcoming, and the approach seeks to demonstrate the validity of the method through empirical results. 

Artifacts in AI-generated images: Critique of the "high contrast look" and artifacts in Mobius and other models were discussed, with comparisons to previous AI models like MJv6 and earlier iterations. Members noted issues with latent noise and the visual characteristics of different models.

Links mentioned:

Tweet from DataVoid e/acc (@DataPlusEngine): Our upcoming paper outlines and enables making entirely new base diffusion models without the need to extensively pretrain a new model from scratch. We can in a controlled way, break all the quality a...
GitHub - rohitgandikota/erasing: Erasing Concepts from Diffusion Models: Erasing Concepts from Diffusion Models . Contribute to rohitgandikota/erasing development by creating an account on GitHub.

LAION ▷ #research (11 messages🔥):

Anthropic releases research paper on Claude: A member shared a major new research paper from Anthropic about interpreting large language models, where they mapped out the inner workings of Claude 3 Sonnet. The paper highlights the ability to identify and tune specific concept activations, such as the Golden Gate Bridge.
Debate on AI as an ad product: A member questioned the potential for companies to leverage AI concept activations as an ad product, sparking a humorous response and a linked example on X. Another member lamented the inevitability of such developments driving them mad.
Reflections on AI model progress: A member reminisced about early AI vision work on the Inception v1 model and its evolution to today's sophisticated models. They commented on the historical importance of hallucinogenic DeepDream for learning about neurons and circuit manipulation.
Discussion on sparsity in neural networks: A member explained the architecture and training methodology of a sparse autoencoder, emphasizing the use of L1 norm enforcement to maintain sparsity. They noted that a high-dimensional middle layer typically has only around 300 non-zero dimensions on average.

Links mentioned:

Thermodynamic Natural Gradient Descent: Second-order training methods have better convergence properties than gradient descent but are rarely used in practice for large-scale training due to their computational overhead. This can be viewed ...
Tweet from Philip Kung (@PhilipKung5): thank you golden gate claude 😂😂😂
Golden Gate Claude: When we turn up the strength of the “Golden Gate Bridge” feature, Claude’s responses begin to focus on the Golden Gate Bridge. For a short time, we’re making this model available for everyone to inter...

LlamaIndex ▷ #blog (3 messages):

Few spots left for LlamaIndex meetup: "There's only a few spots left for Tuesday's meetup, so grab them while you can!" Stay updated here.
Automate tasks using LlamaIndex and MultiOn: "MultiOn is an AI agents platform that works with the web to get real things done by connecting to the Internet through your Chrome web browser and acting on your behalf." Check out the demo here.
Introducing RAGApp - A no-code interface for RAG chatbot: "A docker container that’s easily deployable in any cloud infrastructure and is fully open-source." Configure your LLM model provider easily here.

LlamaIndex ▷ #general (60 messages🔥🔥):

LlamaParse Emerges as PDF Extraction Solution: Users recommended LlamaParse for extracting data from PDFs with tables and fields, suggesting it's a suitable out-of-the-box API for the task. LlamaParse supports extraction via GPT-4o.

Knowledge Graph Indexing Advice: Discussions addressed challenges with indexing knowledge bases containing links to other pages, suggesting manual triplet creation for KnowledgeGraphIndex while considering VectorStoreIndex for efficiency. 

LlamaIndex Integration Clarifications: Participants shared confusion over installing LlamaIndex locally with all necessary packages, specifically the LLM OpenAI component, advising to clear cache and ensure proper directory structure.

Pydantic Parsing Issues in LLM: User struggled with pydantic model errors during response parsing, with suggestions to add better descriptions to fields and improved input parsing for GPT-4o. The issue pointed to the LLM's inability to correctly interpret the output class.

Better Models for Invoice Processing: Recommendations were made to check HuggingFace MTEB leaderboard for superior embedding models, with specific mentions of BGE, Nomic, and GTE models for tasks like chatting with invoices and PDFs.

Links mentioned:

Redirecting...: no description found
GitHub - run-llama/llama_index: LlamaIndex is a data framework for your LLM applications: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
Query Engine - LlamaIndex: no description found

LlamaIndex ▷ #ai-discussion (4 messages):

Andy Singal unveils PostgresML power with LlamaIndex: A Medium article titled "Unleashing the Power of PostgresML with LlamaIndex Integration" by Andy Singal was shared. jerryjliu0 found the article nice and praised it, to which Andy Singal expressed gratitude.

OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

New AI Model Alert: Phi-3 Medium 128k Instruct: OpenRouter announced the release of Phi-3 Medium 128k Instruct model. Users can check out the standard variant and the free variant, and join the discussion here to share their feedback on its performance and applicability.

Links mentioned:

Phi-3 Medium Instruct by microsoft | OpenRouter: Phi-3 Medium is a powerful 14-billion parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjust...
Phi-3 Medium Instruct by microsoft | OpenRouter: Phi-3 Medium is a powerful 14-billion parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjust...

OpenRouter (Alex Atallah) ▷ #general (41 messages🔥):

Wizard Model Shows Improved Performance: Members noticed that wizard model responses have become significantly better, with reduced wait times and more creative answers. “You still need to babysit it to avoid paragraph repetition, but otherwise, it was quite good,” highlighted one user. 
Phi-3 Vision Gains Interest: Discussions led to the hype around Phi-3 Vision's capabilities, with users sharing test links like Phi-3 Vision and mentioning its potential when combined with other models. Another model, CogVLM2, was recommended for vision tasks at CogVLM-CogAgent on Hugging Face.
Llama 3 Model Prompt Formatting Clarified: Members clarified that prompts for Llama 3 models get automatically transformed by OpenRouter's API, eliminating the need for manual formatting. Manual prompt submission is an option, using the prompt parameters and the completions endpoint instead of chat/completions.
Llama 3 Parameter Update: Optimal parameters for Llama 3 models are being updated soon due to a recently fixed bug. This update will be pushed within approximately 48 hours, according to a team response.
Google's Gemini API Issues and Limits: Users expressed frustration over Gemini FLASH returning blank outputs despite high token usage. It's confirmed as a model-side issue, and the discussion highlighted Google's new daily API usage limits, sparking curiosity about increased OpenRouter Gemini usage.

Links mentioned:

Azure AI Studio: no description found
CogVLM - a Hugging Face Space by THUDM: no description found

Latent Space ▷ #ai-general-chat (36 messages🔥):

Tensorlake launches Indexify: Members discussed the new open-source product by Tensorlake, called Indexify, which provides a real-time data framework for LLMs. "It's like a 'streaming ETL' layer," said one member, while another pondered the challenge of sustainability with open source products.

Indexify dissected: The design choices behind Indexify sparked interest, partly attributed to its creator's background with Nomad. There were questions about the sufficiency and monetization of the extractors provided.

Hugging Face Leaderboard blogpost shared: A post by Clementine, running the HF OSS Leaderboard, was shared. It delves into LLM evaluation practices and the significance of leaderboards and non-regression testing (Hugging Face blog).

Website poisoning works on Google's AI overviews: A link to a revelation by Mark Riedl about a website poisoning attack that affects Google's AI overviews (X post). This led to further discussion on using custom search engine browser bypasses to avoid such issues.

Thomas Dohmke's TED Talk on AI in coding: Members discussed Thomas Dohmke's TED Talk on how AI is lowering the barriers to coding. There were mixed feelings about its current reliability, but acknowledgment that UX improvements allow quicker workarounds for issues.

Links mentioned:

Tweet from Tensorlake (@tensorlake): We are super excited to finally announce @tensorlake's open-source, real-time data framework, Indexify.  It fits into any LLM stack and provides a foundational building block for bringing your dat...
Let's talk about LLM evaluation: no description found
Tweet from Bartłomiej Cupiał (@CupiaBart): So here's a story of, by far, the weirdest bug I've encountered in my CS career.  Along with @maciejwolczyk we've been training a neural network that learns how to play NetHack, an old rog...
Tweet from jason liu (@jxnlco): There is my prediction on where RAG is headed. In this video i talk about   - Shift from RAG as question-answering systems to report generation tools - Importance of well-designed templates and SOPs i...
Show HN: Open-source real time data framework for LLM applications | Hacker News: no description found
Tweet from Mark Riedl (@mark_riedl): Yes! My website poisoning attack works on Google's new LLM-powered AI overviews!
With AI, Anyone Can Be a Coder Now | Thomas Dohmke | TED: What if you could code just by talking out loud? GitHub CEO Thomas Dohmke shows how, thanks to AI, the barrier to entry to coding is rapidly disappearing — a...
Tweet from Mark Riedl (@mark_riedl): Yes! My website poisoning attack works on Google's new LLM-powered AI overviews!
Tweet from Nathan Lands — Lore.com (@NathanLands): 11)

Latent Space ▷ #ai-announcements (1 messages):

World's Fair Diversity Scholarships Available: Those struggling to afford tickets to the AI Engineer World's Fair can apply for diversity scholarships, which offer either free or discounted tickets for the event from June 25-27 in San Francisco. Applications should include "concise but specific responses to essay questions" and can be applied for here.

Link mentioned: Diversity Program - AI Engineer World's Fair June 2024: AI Engineer World's Fair is committed to assisting underrepresented minorities who want to attend our event. We steadfastly believe in the value of having a wide variety of people attend. We know ...

Interconnects (Nathan Lambert) ▷ #random (27 messages🔥):

Tax Invoicing without a Credit Card: Nathan Lambert mentioned an odd situation where a platform sent him an invoice for taxes despite not having a credit card on file. He found the process logical after learning the details about resale certificates.

Golden Gate Bridge-Focused AI: The group was intrigued by Anthropic AI's experiment, which demonstrated altering an AI's internal features to make it focus on the Golden Gate Bridge. This led to the creation of "Golden Gate Claude," available for public interaction at claude.ai.

Google's PR Fiasco: Members discussed how Google's product pipeline issues seem to lead to repeated public failures, such as poorly received AI releases. The conversation highlighted concerns about internal feedback not being heeded and oversights in rolling out substandard models.

Response to AI Dataset Claims: A link shared by Philpax refuted claims about Google's AI datasets, specifically denying reliance on LAION-5B. Google's AI team emphasized they have superior internal datasets for their research.

Links mentioned:

Tweet from Anthropic (@AnthropicAI): This week, we showed how altering internal "features" in our AI, Claude, could change its behavior.  We found a feature that can make Claude focus intensely on the Golden Gate Bridge.  Now, fo...
Tweet from Lucas Beyer (bl16) (@giffmana): Just in case it’s not obvious: the answer is a ridiculous hallucination. Maybe because “Google’s ai dataset” isn’t even a thing.  We’re not touching laion5b, not even for research. We don’t need to, w...

Interconnects (Nathan Lambert) ▷ #lectures-and-projects (2 messages):

Advanced CS Lecture Slides Available: Nathan Lambert shared a link to a more advanced version of his CS25N lecture, based on material from CS224N. The slides can be accessed here.

Future Recording Announcement: Nathan Lambert mentioned that a recording of the session would be available eventually. No specific dates were provided for the release.

Link mentioned: [21 May 2024]  Life after DPO (for alignment): Life after DPO Nathan Lambert || Allen Institute for AI || @natolambert Stanford CS224N: Natural Language Processing with Deep Learning 21 May 2024

OpenAccess AI Collective (axolotl) ▷ #general (17 messages🔥):

GQA confusion with cmdr models: Members were clarifying whether "cmdr" and "cmdr+" models have Grouped Query Attention (GQA). One member confirmed, "cmdr+ has gqa. not + doesnt," showing different specs for each version.
VRAM scaling discussion: There was a discussion on how the presence or absence of GQA affects VRAM usage. One user mentioned, "gqa is better than exponential but not linear yeah... it just scales better."
Sample packing efficiency improvement: Members highlighted a new PR on GitHub, noting a "3-4% efficiency improvement with sample packing". This was linked to a PR by Dave Sescleifer.

Link mentioned: Switch to parallel FFD bin packing algorithm. by winglian · Pull Request #1619 · OpenAccess-AI-Collective/axolotl: Add support for packing in a distributed context. Add packing efficiency estimate back. See #1516 by @dsesclei. Attempting to rebase the original PR onto the latest main wasn't terribly clean. I a...

OpenAccess AI Collective (axolotl) ▷ #community-showcase (3 messages):

Journal Article Published: A member shared a journal article they co-authored, now published in the Journal of the American Medical Informatics Association. They mentioned their affiliation with Université catholique de Louvain and other contributors to the paper.

Congratulations Pour In: Another member congratulated the author on the publication, adding a friendly "congrats 🙂" note. This shows community support and celebration for the author's achievement.

Link mentioned: Impact of high-quality, mixed-domain data on the performance of medical language models: AbstractObjective. To optimize the training strategy of large language models for medical applications, focusing on creating clinically relevant systems th

OpenInterpreter ▷ #general (8 messages🔥):

SB-1047 sparks outrage: Members discussed concerns about SB-1047, which they see as an attempt to centralize AI governance among big players like OpenAI. One member called it a “whimsical, flaming pile of garbage” and drew parallels with regulatory capture in Big Pharma and the Energy Sector, arguing it disadvantages smaller developers on tight budgets. 
Perplexity AI search link shared: A member shared a link to Perplexity AI search regarding SB-1047. No further details or context was provided in the chat about the specifics of the search.
Arc Browser's Call Arc praised: The new “Call Arc” feature of Arc Browser was highlighted for its simplicity and usefulness. The member praised it for allowing users to “ask your browser to find and collect relevant answers for you” effortlessly, sharing a link for more details.

Links mentioned:

1.44.1 Release:  
‎Gemini - SB 1047: Stifling Open-Source AI Innovation?: Created with Gemini Advanced

OpenInterpreter ▷ #O1 (5 messages):

User faces issue with Typer installation: A user stated "queuelabs: pip install typer does not resolve" indicating they are having trouble installing the Typer library using pip.
Poetry setup problem troubles users: Another user asked "Did you run poetry install before poetry run 01? Are you running in a virtual environment," pointing out potential steps missed in the setup process.

Mozilla AI ▷ #llamafile (9 messages🔥):

Twinny + LM Studio blow minds as local co-pilot: A user shared their positive experience using Twinny with LM Studio as a local co-pilot replacement. They asked about running this setup via llamafiles and received confirmation that running two llamafiles at the same time is possible by assigning different ports.

Embedding images with llama.cpp endpoint confusion solved: A member asked if the llamafile/llama.cpp server supports images in llava embeddings and shared a command that did not work as expected. They later clarified that the /v1/embeddings endpoint does not accept image_data but using the /embedding endpoint works as expected.

Running continue.dev with llamafile performance issues: Another user reported running continue.dev with llamafile, noting it was slow on a Mac M2 but somewhat faster on an older Nvidia GPU.

Inquiries on building and training custom LLMs: A member sought advice on building and training a custom LLM using company documentation for internal use. They received a recommendation to use HuggingFace Transformers for training, noting that llamafile only supports inference.

Links mentioned:

🤗 Transformers: no description found
GitHub - rjmacarthy/twinny: The most no-nonsense, locally or API-hosted AI code completion plugin for Visual Studio Code - like GitHub Copilot but completely free and 100% private.: The most no-nonsense, locally or API-hosted AI code completion plugin for Visual Studio Code - like GitHub Copilot but completely free and 100% private. - rjmacarthy/twinny
Allow server to generate multimodal embeddings via the `/embedding` endpoint by kseth · Pull Request #4681 · ggerganov/llama.cpp: The server already exposes multimodal support in /completion and other places, but not in /embedding. The change for this is relatively straightforward, if a user submits in image_data to the /embe...

Cohere ▷ #general (8 messages🔥):

User Thanks Team: "THANK YOU!" expressed in response to a previous interaction.

Inquiry About 104B Model: A user asked if the team is planning to publish a 104B version of their model family.

Langchain Integration Question: A member inquired about the current status and recommendation for using Langchain integration with Cohere.

Aya Model Size Clarification: A user asked whether the Aya model on the playground is for the 8B or 35B version.

Validation Error with Compressor: An issue was shared regarding a ValidationError with ContextualCompressionRetriever due to an abstract method.

"56 Bananas Equal to 1 Apple" Calculation: A calculation problem was explored with CMR+: "1 apple = 2 pears, 3 pears = 4 oranges, 6 oranges = 7 bananas", concluding "56 bananas are equal to 1 apple."

403 Forbidden Error Troubleshoot: A user reported a 403 Forbidden error despite using the correct production key.

AI Stack Devs (Yoko Li) ▷ #late-night-lounge (6 messages):

AI Generated Standup comedy is surprisingly good: A user shared a link expressing surprise at the quality of AI-generated standup comedy. They seemed impressed with its performance.

Exploring the Ud.io App: Another user asked if the app mentioned, Ud.io, only does comedy. This inquiry suggests curiosity about the app's full capabilities.

Transforming audio on Suno: A member shared a more "demonic" version of the original audio using Suno. This highlights the versatility of the platform in modifying sound.

Interest in Learning Audio Manipulation: One user expressed interest in learning how to create audio modifications similar to the ones shared. This indicates a desire to acquire skills in audio engineering or AI-driven sound manipulation. 

Dismissive Response: Briefly, a user responded with a curt "No" to a query, indicating either disinterest or negation of a previous statement.

Links mentioned:

csimpkins - Standup Comedy on AI Generated Music | Udio: Listen to Standup Comedy on AI Generated Music by csimpkins on Udio. Discover, create, and share music with the world. Use the latest technology to create AI music in seconds.
AI Standup Comedy on AI Generated Musicby by @unwaveringplugin464 | Suno: Standup comedian performing at a comedy show song. Listen and make your own with Suno.

MLOps @Chipro ▷ #events (1 messages):

Member seeks Google Calendar integration for event tracking: A member inquired about the availability of an event calendar that could be imported into Google Calendar to avoid missing events. They expressed their concern with a sad emoji, indicating a need for a streamlined way to keep track of scheduled activities.

MLOps @Chipro ▷ #general-ml (1 messages):
evelynciara: yess I'm glad this channel exists 😅

DiscoResearch ▷ #general (1 messages):
datarevised: https://x.com/DataPlusEngine/status/1793803117642854732

Don't miss what's next. Subscribe to AI News (MOVED TO news.smol.ai!):