[AINews] Gemini Nano: 50-90% of Gemini Pro, <100ms inference, on device, in Chrome Canary
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
window.ai.createTextSession() is all you need
AI News for 6/21/2024-6/24/2024. We checked 7 subreddits, 384 Twitters and 30 Discords (415 channels, and 5896 messages) for you. Estimated reading time saved (at 200wpm): 660 minutes. You can now tag @smol_ai for AINews discussions!
The latest Chrome Canary now has Gemini Nano in a feature flag:
- Prompt API for Gemini Nano chrome://flags/#prompt-api-for-gemini-nano
- Optimization guide on device chrome://flags/#optimization-guide-on-device-model
- Navigate to chrome://components/ and look for Optimization Guide On Device Model; Check for update to start the download
You'll now have access to the model via the console: window.ai.createTextSession()
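A minimal console sketch, assuming the early Prompt API surface (a session object with a `prompt()` method) as exposed in current Canary builds; this is experimental and names may change between builds:

```javascript
// Query Gemini Nano from the DevTools console (Chrome Canary with the
// flags above enabled). Wrapped in a function so it fails with a clear
// message where window.ai is unavailable.
async function askNano(question) {
  if (typeof window === "undefined" || !window.ai) {
    throw new Error("window.ai not available -- enable the Chrome flags first");
  }
  const session = await window.ai.createTextSession();
  return session.prompt(question); // resolves with the model's text reply
}

// In the console:
// askNano("Summarize this page in one sentence.").then(console.log);
```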
Nano 1 and Nano 2, 4-bit quantized at 1.8B and 3.25B parameters respectively, show decent performance relative to Gemini Pro:
and this live demo shows how fast it runs.
Lastly, the base model and instruct-tuned model weights have already been extracted and posted to HuggingFace.
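Those parameter counts also explain why the models fit on-device: at 4-bit quantization each parameter costs half a byte. A back-of-the-envelope sketch (weights only; KV cache and runtime overhead are extra):

```javascript
// Rough footprint of the quantized weights alone.
function weightFootprintGB(params, bitsPerParam = 4) {
  return (params * bitsPerParam) / 8 / 1e9; // total bits -> bytes -> GB (decimal)
}

console.log(weightFootprintGB(1.8e9).toFixed(2));  // Nano 1: "0.90" GB
console.log(weightFootprintGB(3.25e9).toFixed(2)); // Nano 2: "1.63" GB
```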
Table of Contents
- AI Twitter Recap
- AI Reddit Recap
- AI Discord Recap
- PART 1: High level Discord summaries
- HuggingFace Discord
- Unsloth AI (Daniel Han) Discord
- Stability.ai (Stable Diffusion) Discord
- CUDA MODE Discord
- LM Studio Discord
- OpenAI Discord
- Perplexity AI Discord
- Nous Research AI Discord
- Boost in Dataset Deduplication: Rensa outperforms datasketch with a 2.5-3x speed boost, leveraging Rust's FxHash, LSH index, and on-the-fly permutations for dataset deduplication.
- Model Jailbreak Exposed: A Financial Times article highlights hackers "jailbreaking" AI models to reveal flaws, while contributors on GitHub share a "smol q* implementation" and innovative projects like llama.ttf, an LLM inference engine disguised as a font file.
- Lively Debate on Model Parameters: In the ask-about-llms, discussions ranged from the surprisingly capable story generation of TinyStories-656K to assertions that general-purpose performance soars with 70B+ parameter models.
- Dataset Synthesis and Classification Enhanced: Members share a Google Sheet for collaborative dataset tracking, explore improvements using the Hermes RAG format, and delve into datasets like SciRIFF and ft-instruction-synthesizer-collection for scientific and instructional purposes.
- AI Safety Models Scrutiny and Coursework: #general sees a mix, from Gemini and OpenAI's redaction-capable safety models to the launch of Karpathy's LLM101n course, encouraging engineers to build a storytelling LLM.
- Eleuther Discord
- Latent Space Discord
- Modular (Mojo 🔥) Discord
- LAION Discord
- Cohere Discord
- LangChain AI Discord
- OpenRouter (Alex Atallah) Discord
- OpenInterpreter Discord
- LLM Finetuning (Hamel + Dan) Discord
- LlamaIndex Discord
- Interconnects (Nathan Lambert) Discord
- OpenAccess AI Collective (axolotl) Discord
- Mozilla AI Discord
- Torchtune Discord
- tinygrad (George Hotz) Discord
- LLM Perf Enthusiasts AI Discord
- MLOps @Chipro Discord
- PART 2: Detailed by-Channel summaries and links
- HuggingFace ▷ #general (715 messages🔥🔥🔥):
- HuggingFace ▷ #today-im-learning (3 messages):
- HuggingFace ▷ #cool-finds (5 messages):
- HuggingFace ▷ #i-made-this (12 messages🔥):
- HuggingFace ▷ #reading-group (5 messages):
- HuggingFace ▷ #computer-vision (9 messages🔥):
- HuggingFace ▷ #NLP (1 messages):
- HuggingFace ▷ #diffusion-discussions (2 messages):
- Unsloth AI (Daniel Han) ▷ #general (376 messages🔥🔥):
- Unsloth AI (Daniel Han) ▷ #random (108 messages🔥🔥):
- Unsloth AI (Daniel Han) ▷ #help (228 messages🔥🔥):
- Unsloth AI (Daniel Han) ▷ #showcase (1 messages):
- Stability.ai (Stable Diffusion) ▷ #general-chat (583 messages🔥🔥🔥):
- CUDA MODE ▷ #general (17 messages🔥):
- CUDA MODE ▷ #torch (4 messages):
- CUDA MODE ▷ #cool-links (1 messages):
- CUDA MODE ▷ #jobs (1 messages):
- CUDA MODE ▷ #beginner (3 messages):
- CUDA MODE ▷ #torchao (28 messages🔥):
- CUDA MODE ▷ #off-topic (18 messages🔥):
- CUDA MODE ▷ #hqq (2 messages):
- CUDA MODE ▷ #llmdotc (465 messages🔥🔥🔥):
- CUDA MODE ▷ #rocm (2 messages):
- CUDA MODE ▷ #bitnet (25 messages🔥):
- LM Studio ▷ #💬-general (312 messages🔥🔥):
- LM Studio ▷ #🤖-models-discussion-chat (116 messages🔥🔥):
- LM Studio ▷ #🧠-feedback (4 messages):
- LM Studio ▷ #⚙-configs-discussion (9 messages🔥):
- LM Studio ▷ #🎛-hardware-discussion (18 messages🔥):
- LM Studio ▷ #🧪-beta-releases-chat (3 messages):
- LM Studio ▷ #avx-beta (1 messages):
- LM Studio ▷ #model-announcements (1 messages):
- LM Studio ▷ #🛠-dev-chat (12 messages🔥):
- OpenAI ▷ #ai-discussions (276 messages🔥🔥):
- OpenAI ▷ #gpt-4-discussions (29 messages🔥):
- OpenAI ▷ #prompt-engineering (53 messages🔥):
- OpenAI ▷ #api-discussions (53 messages🔥):
- Perplexity AI ▷ #general (381 messages🔥🔥):
- Perplexity AI ▷ #sharing (12 messages🔥):
- Perplexity AI ▷ #pplx-api (12 messages🔥):
- Nous Research AI ▷ #off-topic (20 messages🔥):
- Nous Research AI ▷ #interesting-links (9 messages🔥):
- Nous Research AI ▷ #general (278 messages🔥🔥):
- Nous Research AI ▷ #ask-about-llms (15 messages🔥):
- Nous Research AI ▷ #rag-dataset (12 messages🔥):
- Nous Research AI ▷ #world-sim (1 messages):
- Eleuther ▷ #general (114 messages🔥🔥):
- Eleuther ▷ #research (155 messages🔥🔥):
- Eleuther ▷ #scaling-laws (10 messages🔥):
- Eleuther ▷ #interpretability-general (3 messages):
- Eleuther ▷ #lm-thunderdome (6 messages):
- Eleuther ▷ #multimodal-general (2 messages):
- Eleuther ▷ #gpt-neox-dev (3 messages):
- Latent Space ▷ #ai-general-chat (133 messages🔥🔥):
- Latent Space ▷ #ai-announcements (3 messages):
- Latent Space ▷ #ai-in-action-club (72 messages🔥🔥):
- Modular (Mojo 🔥) ▷ #general (62 messages🔥🔥):
- Modular (Mojo 🔥) ▷ #📺︱youtube (1 messages):
- Modular (Mojo 🔥) ▷ #ai (5 messages):
- Modular (Mojo 🔥) ▷ #🔥mojo (51 messages🔥):
- Modular (Mojo 🔥) ▷ #performance-and-benchmarks (58 messages🔥🔥):
- Modular (Mojo 🔥) ▷ #nightly (21 messages🔥):
- LAION ▷ #general (102 messages🔥🔥):
- LAION ▷ #research (27 messages🔥):
- Cohere ▷ #general (117 messages🔥🔥):
- Cohere ▷ #project-sharing (10 messages🔥):
- Cohere ▷ #announcements (1 messages):
- LangChain AI ▷ #general (100 messages🔥🔥):
- LangChain AI ▷ #langchain-templates (21 messages🔥):
- LangChain AI ▷ #share-your-work (5 messages):
- LangChain AI ▷ #tutorials (1 messages):
- OpenRouter (Alex Atallah) ▷ #announcements (1 messages):
- OpenRouter (Alex Atallah) ▷ #app-showcase (7 messages):
- OpenRouter (Alex Atallah) ▷ #general (106 messages🔥🔥):
- OpenInterpreter ▷ #general (85 messages🔥🔥):
- OpenInterpreter ▷ #O1 (17 messages🔥):
- OpenInterpreter ▷ #ai-content (5 messages):
- LLM Finetuning (Hamel + Dan) ▷ #general (33 messages🔥):
- LLM Finetuning (Hamel + Dan) ▷ #learning-resources (1 messages):
- LLM Finetuning (Hamel + Dan) ▷ #hugging-face (6 messages):
- LLM Finetuning (Hamel + Dan) ▷ #replicate (3 messages):
- LLM Finetuning (Hamel + Dan) ▷ #langsmith (1 messages):
- LLM Finetuning (Hamel + Dan) ▷ #jason_improving_rag (1 messages):
- LLM Finetuning (Hamel + Dan) ▷ #axolotl (3 messages):
- LLM Finetuning (Hamel + Dan) ▷ #wing-axolotl (1 messages):
- LLM Finetuning (Hamel + Dan) ▷ #simon_cli_llms (1 messages):
- LLM Finetuning (Hamel + Dan) ▷ #credits-questions (3 messages):
- LLM Finetuning (Hamel + Dan) ▷ #fireworks (2 messages):
- LLM Finetuning (Hamel + Dan) ▷ #braintrust (25 messages🔥):
- LLM Finetuning (Hamel + Dan) ▷ #predibase (13 messages🔥):
- LlamaIndex ▷ #blog (5 messages):
- LlamaIndex ▷ #general (70 messages🔥🔥):
- LlamaIndex ▷ #ai-discussion (1 messages):
- Interconnects (Nathan Lambert) ▷ #news (17 messages🔥):
- Interconnects (Nathan Lambert) ▷ #ml-questions (20 messages🔥):
- Interconnects (Nathan Lambert) ▷ #ml-drama (13 messages🔥):
- Interconnects (Nathan Lambert) ▷ #random (9 messages🔥):
- Interconnects (Nathan Lambert) ▷ #memes (3 messages):
- Interconnects (Nathan Lambert) ▷ #reads (4 messages):
- OpenAccess AI Collective (axolotl) ▷ #general (33 messages🔥):
- OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (1 messages):
- OpenAccess AI Collective (axolotl) ▷ #general-help (4 messages):
- OpenAccess AI Collective (axolotl) ▷ #datasets (5 messages):
- OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (5 messages):
- Mozilla AI ▷ #announcements (1 messages):
- Mozilla AI ▷ #llamafile (31 messages🔥):
- Torchtune ▷ #general (24 messages🔥):
- tinygrad (George Hotz) ▷ #general (8 messages🔥):
- tinygrad (George Hotz) ▷ #learn-tinygrad (2 messages):
- LLM Perf Enthusiasts AI ▷ #claude (1 messages):
- MLOps @Chipro ▷ #events (1 messages):
AI Twitter Recap
all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.
AI Model Releases and Benchmarks
- Anthropic Claude 3.5 Sonnet: @adcock_brett noted Anthropic launched Claude 3.5 Sonnet, an upgraded model that beats GPT-4o across some benchmarks. For devs, it's 2x the speed of Opus at 1/5 the cost of Anthropic's previous top model; for consumers, it's completely free to try. @lmsysorg reported Claude 3.5 Sonnet has climbed to #4 in Coding Arena, nearing GPT-4-Turbo levels; it also ranks #11 in Hard Prompts and #20 Overall.
- DeepSeek-Coder-V2: @dair_ai noted DeepSeek-Coder-V2 competes with closed-sourced models on code and math generation tasks. It achieves 90.2% on HumanEval and 75.7% on MATH, higher than GPT-4-Turbo-0409 performance according to their report. Includes a 16B and 236B parameter model with 128K context length.
- GLM-0520: @lmsysorg reported GLM-0520 from Zhipu AI/Tsinghua impresses at #9 in Coding and #11 Overall. Chinese LLMs are getting more competitive than ever!
- Nemotron 340B: @dl_weekly reported NVIDIA announced Nemotron-4 340B, a family of open models that developers can use to generate synthetic data for training large language models.
AI Research Papers
- TextGrad: @dair_ai noted TextGrad is a new framework for automatic "differentiation" via textual feedback: an LLM provides natural-language feedback that is backpropagated to improve the individual components of a computation graph.
- PlanRAG: @dair_ai reported PlanRAG enhances decision making with a new RAG technique called iterative plan-then-RAG. It involves two steps: 1) an LLM generates the plan for decision making by examining data schema and questions and 2) the retriever generates the queries for data analysis. The final step checks if a new plan for further analysis is needed and iterates on previous steps or makes a decision on the data.
- Mitigating Memorization in LLMs: @dair_ai noted this paper presents a modification of the next-token prediction objective called goldfish loss to help mitigate the verbatim generation of memorized training data.
- Tree Search for Language Model Agents: @dair_ai reported this paper proposes an inference-time tree search algorithm for LM agents to perform exploration and enable multi-step reasoning. It's tested on interactive web environments and applied to GPT-4o to significantly improve performance.
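The PlanRAG loop described above can be sketched as follows; `plan`, `retrieve`, `needsReplan`, and `decide` are hypothetical stand-in callbacks, not the paper's implementation:

```javascript
// Iterative plan-then-RAG, per the two steps plus the re-plan check:
// 1) an LLM drafts a plan, 2) the retriever runs queries for analysis,
// then either a new plan is generated or a decision is returned.
function planRAG({ plan, retrieve, needsReplan, decide, maxIters = 3 }) {
  let currentPlan = plan();                 // step 1: initial plan from schema + question
  for (let i = 0; i < maxIters; i++) {
    const evidence = retrieve(currentPlan); // step 2: queries for data analysis
    if (!needsReplan(evidence)) {
      return decide(evidence);              // enough evidence: decide on the data
    }
    currentPlan = plan(evidence);           // otherwise iterate with a refined plan
  }
  return decide(retrieve(currentPlan));     // decide once the iteration budget is spent
}
```

With stub callbacks the loop runs until `needsReplan` is satisfied, then returns the decision.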
AI Applications and Demos
- Wayve PRISM-1: @adcock_brett reported Wayve AI introduced PRISM-1, a scene reconstruction model of 4D scenes (3D in space + time) from video data. Breakthroughs like this will be crucial in the development of autonomous driving.
- Runway Gen-3 Alpha: @adcock_brett noted Runway demoed Gen-3 Alpha, a new AI model that can generate 10-second videos from text prompts and images. These human characters are 100% AI-generated.
- Hedra Character-1: @adcock_brett reported Hedra launched Character-1, a new foundation model that can turn images into singing portrait videos. The public preview web app can generate up to 30 seconds of expressive talking, singing, or rapping characters.
- ElevenLabs Text/Video-to-Sound: @adcock_brett noted ElevenLabs launched a new open-source text and video-to-sound effects app and API. Devs can now build apps that generate sound effects based on text prompts or add sound to silent videos.
Memes and Humor
- Gilded Frogs: @c_valenzuelab defined "Gilded Frogs" as frogs that have amassed great wealth and adorn themselves with luxurious jewelry, including gold chains, gem-encrusted bracelets, and rings, covering their skins with diamonds, rubies, and sapphires.
- Llama.ttf: @osanseviero noted Llama.ttf is a font which is also an LLM. TinyStories (15M) as a font 🤯 The font engine runs inference of the LLM. Local LLMs taken to an extreme.
- VCs Funding GPT Wrapper Startups: @abacaj posted a meme image joking about VCs funding GPT wrapper startups.
- Philosophers vs ML Researchers: @AmandaAskell posted a meme image comparing the number of papers published by philosophers vs ML researchers.
AI Reddit Recap
Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity. Comment crawling works now but has lots to improve!
Stable Diffusion / AI Image Generation
- Pony Diffusion model impresses users: In /r/StableDiffusion, users are discovering the capabilities and creative potential of the Pony Diffusion model, finding it fun and refreshing to use. Some admit to underestimating Pony's responsibility and prompt adherence. There are requests for in-depth Pony tutorials to help produce desired family-friendly anime/manga style images while avoiding unintended NSFW generations.
- New techniques and model updates: Users are sharing background replacement, re-lighting and compositing workflows in ComfyUI and demonstrating the use of the [SEP] token for multiple prompts in adetailer models. The SD.Next release announcement highlights 10+ improvements like quantized T5 encoder support, PixArt-Sigma variants, HunyuanDiT 1.1, and efficiency upgrades for low VRAM GPUs. sd-scripts now supports training Stable Diffusion 3 models.
- Creative applications and model comparisons: An exhibition at the Nikola Tesla Museum features 118 AI-assisted artworks created with Stable Diffusion, highlighting adoption outside the AI community. New LoRA models like Aether Illustration for Nordic-style portraits and a black-and-white illustration style for SDXL are being released. A comparison of various models on a "woman lying on grass" prompt sparks discussion on their relative performance.
- Licensing discussions: Users discovered the initial Stable Cascade weights were released under an MIT license for about 4 days before changing to a more restrictive one, suggesting potential for commercial use of the MIT-licensed version. This has led to people downloading that specific version.
ChatGPT / AI Assistants
- AI-generated games impress users: In /r/ChatGPT, Claude, an AI assistant, created a playable 3D first-person shooter game within the chat interface. The game, which involves shooting happy emojis at sad monsters, was Claude's own idea. This is seen as a groundbreaking moment, with AI now competing with beginner human game developers. Users appreciate Claude's cute and hopeful approach.
- Model performance and benchmarks: Claude 3.5 Sonnet outperforms other models on various benchmarks like MMLU-Pro, according to recently released results.
- Improving chatbots with knowledge integration: In /r/singularity, a user is surprised large AI companies haven't connected their chatbots to knowledge bases like Wikipedia or tools like WolframAlpha for improved accuracy on facts, math, physics, etc. They believe the underlying technology exists but needs integration, though language models may still face fundamental limitations.
AI Discord Recap
SPECIAL NOTE: As we do for fresh frontier model launches, we are displaying the differences in output from running the same prompts across Claude 3 Sonnet, Claude 3.5 Sonnet, Claude 3 Opus, GPT-4T, and GPT-4o.
Claude 3 Sonnet
1. LLM Performance Benchmarking and Advancements
- Meta's Llama 3 model has rapidly risen to the top of leaderboards like ChatbotArena, outperforming models like GPT-4-Turbo and Claude 3 Opus as mentioned in this discussion.
- New models like IBM's Granite-8B-Code-Instruct and DeepSeek's DeepSeek-V2 236B model were discussed, with the latter outperforming GPT-4 in some benchmarks.
- However, there was skepticism around certain benchmarks and calls for credible sources to set realistic evaluation standards.
2. Techniques for Efficient LLM Training and Inference
- DeepSpeed's ZeRO++ was mentioned as promising 4x reduced communication overhead for large model training on GPUs.
- The vAttention system was discussed for dynamically managing KV-cache for efficient inference without PagedAttention.
- QServe's W4A8KV4 quantization was highlighted as a technique to boost cloud LLM serving performance on GPUs.
- Techniques like Consistency LLMs were mentioned for exploring parallel token decoding to reduce inference latency.
3. Open-Source AI Frameworks and Community Efforts
- The Axolotl project was discussed for supporting diverse dataset formats for instruction tuning and LLM pre-training.
- LlamaIndex was mentioned as powering a new course on building agentic RAG systems with Andrew Ng.
- The open-source RefuelLLM-2 model was introduced, claiming to be the best for "unsexy data tasks".
- Discussions around Modular's Mojo highlighted its potential for Python integration and AI extensions like bfloat16 as mentioned here.
4. Multimodal AI and Generative Modeling
- The Idefics2 8B Chatty and CodeGemma 1.1 7B models were mentioned for their focus on chat interactions and coding abilities respectively, as discussed here.
- The Phi 3 model was highlighted for bringing powerful AI chatbots to browsers via WebGPU, according to this Reddit link.
- Combining Pixart Sigma, SDXL and PAG was mentioned as aiming to achieve DALLE-3 level outputs, with potential for further refinement through fine-tuning, as per this discussion.
- The open-source IC-Light project focused on improving image relighting techniques was also brought up in this conversation.
Claude 3.5 Sonnet
- LLM Performance and Benchmarking Debates:
- New models like DeepSeek-V2 and Hermes 2 Theta Llama-3 70B are generating buzz for their performance. However, there's growing skepticism across communities about AI benchmarks and leaderboards, with calls for more credible evaluation methods.
- Discussions in multiple discords touched on the challenges of comparing models like GPT-4, Claude, and newer open-source alternatives, highlighting the complexity of real-world performance versus benchmark scores.
- Optimizing LLM Training and Inference:
- Communities are sharing techniques for improving LLM efficiency, such as quantization methods and optimizing for specific hardware like AMD GPUs.
- There's significant interest in reducing computational costs, with discussions ranging from VRAM optimization to novel architectures for more efficient inference.
- Open-Source AI Development and Collaboration:
- Projects like Axolotl and LlamaIndex are fostering community-driven development of AI tools and frameworks.
- There's a trend towards open-sourcing models and tools, as seen with RefuelLLM-2 and Llamafile, encouraging wider participation in AI development.
- Multimodal AI and Creative Applications:
- Discussions across discords highlight the growing interest in multimodal models that can handle text, image, and potentially video, with projects like Stable Artisan bringing these capabilities to wider audiences.
- There's ongoing experimentation with combining different models and techniques to achieve DALL-E 3-level outputs, showing a community-driven approach to advancing generative AI capabilities.
- AI Integration and Practical Applications:
- Multiple communities are exploring ways to integrate AI into everyday tools, from browser-based models to Discord bots for media creation.
- There's a growing focus on making AI more accessible and useful for specific tasks, as seen in discussions about code generation, data analysis, and creative applications across various discord channels.
Claude 3 Opus
- LLM Performance and Benchmarking:
- Discussions on the performance of models like Llama 3, DeepSeek-V2, and Claude 3.5 Sonnet across various benchmarks and leaderboards.
- Skepticism surrounding certain benchmarks, such as AlpacaEval, with calls for more credible assessment standards.
- Optimizing LLM Training and Inference:
- Techniques for efficient training, such as ZeRO++ and Consistency LLMs, and optimized inference with vAttention and QServe.
- Discussions on quantization methods, like W4A8KV4, and their impact on model performance and resource requirements.
- Open-Source AI Frameworks and Collaborations:
- Updates and collaborations involving open-source frameworks like Axolotl, LlamaIndex, and Modular.
- Introducing new open-source models, such as RefuelLLM-2 and Granite-8B-Code-Instruct.
- Multimodal AI and Generative Models:
- Advancements in multimodal AI with models like Idefics2 8B Chatty and CodeGemma 1.1 7B.
- Innovations in generative modeling, such as Phi 3 for browser-based chatbots and combining techniques to achieve DALLE-3-level outputs.
- Open-source efforts in image relighting with projects like IC-Light.
- AI Ethics, Legality, and Accountability:
- Discussions on the ethical implications of AI-generated content, as seen with Perplexity AI's alleged plagiarism.
- Concerns about the legal risks associated with AI models making inaccurate or defamatory statements, as highlighted in the Perplexity AI case.
- Debates on the accountability of tech companies using open datasets and the practice of "AI data laundering".
GPT4T (gpt-4-turbo-2024-04-09)
**1. AI Hardware Evolves but Costs Spiral:**
- VRAM requirements for AI models like Command R (34b) Q4_K_S lead to discussions about switching to EXL2, a more VRAM-efficient format. The NVIDIA DGX GH200 remains out of reach due to high costs.
**2. Optimization Takes Center Stage in AI Tools:**
- Quantization techniques are leveraged to optimize model performance, with ROCm's versions of xformers and flash-attention mentioned for efficiency. Implementation of PyTorch enhancements in the Llama-2 model results in significant performance boosts.
**3. AI Breaks New Ground in Multifaceted Applications:**
- AI-powered reading aid Pebble, developed with OpenRouter, illustrates novel uses of AI in enhancing reading comprehension. Cohere's APIs enable multi-step tool uses, significantly broadening functional reach, aided by detailed documentation on multi-step tool use.
**4. AI-centric Education and Jobs Expanding:**
- MJCET launches the first AWS Cloud Club in Telangana to educate students on AWS technologies, celebrating with an event featuring AWS Community Hero Mr. Faizal Khan; RSVP through the event link. Also, Karpathy's LLM101n course offers AI education, teaching LLM building from scratch.
**5. Miscellaneous AI Developments Reflect Wide Impact:**
- Intel retracts from AWS, puzzling the AI community on resource allocations. Claude Sonnet 3.5's prowess in coding tasks garners praise, showcasing AI's advancement in technical applications. In bizarre turns, Mozilla's Llamafile ventures into Android territory promising innovative mobile solutions.
GPT4O (gpt-4o-2024-05-13)
Model Optimization and LLM Innovations
- DeepSeek and Sonnet 3.5 Dominate Benchmarks: The DeepSeek model impressed the community with its quick performance and coding abilities, surpassing GPT-4 in some cases (DeepSeek announcement). Similarly, Claude 3.5 Sonnet outperformed GPT-4o in coding tasks, validated through LMSYS leaderboard positions and hands-on usage (Claude thread).
- ZeRO++ and PyTorch Accelerate LLM Training: ZeRO++ reduces communication overhead in large model training by 4x, while new PyTorch techniques accelerate Llama-2 inference by 10x, encapsulated in the GPTFast package, optimizing its use on A100 or H100 GPUs (ZeRO++ tutorial).
Open-Source Developments and Community Efforts
- Axolotl and Modular Encourage Community Contributions: Axolotl announced the integration of ROCm fork versions of xformers for AMD GPU support, and Modular users discussed contributing to learning materials for LLVM and CUTLASS (related guide).
- Featherless.ai and LlamaIndex Expand Capabilities: Featherless.ai, a new platform to run public models serverlessly, was launched to wide curiosity (Featherless). LlamaIndex now supports image generation via StabilityAI, enhancing its toolkit for AI developers (LlamaIndex-StabilityAI).
AI in Production and Real-World Applications
- MJCET's AWS Cloud Club Takes Off: The inauguration of the AWS Cloud Club at MJCET promoted hands-on AWS training and career-building initiatives (AWS event).
- Use of OpenRouter in Practical Applications: JojoAI was highlighted for its proactive assistant capabilities, using integrations like DigiCord to outshine competitive models like ChatGPT and Claude (JojoAI site).
Operational Challenges and Support Queries
- Installation and Compatibility Issues Plague Users: Difficulties in setting up libraries like xformers on Windows raised compatibility discussions, with suggestions converging on Linux for more stable operations (Unsloth troubleshooting).
- Credit and Support Issues: Numerous members of the Hugging Face and Predibase communities faced issues with missing service credits and billing inquiries, showcasing the need for improved customer support systems (Predibase).
Upcoming Technologies and Future Directions
- Announcing New AI Models and Clusters: AI21's Jamba-Instruct with a 256K context window and NVIDIA's Nemotron 4 highlighted breakthroughs in handling large-scale enterprise documents (Jamba-Instruct, Nemotron-4).
- Multi Fusion and Quantization Techniques: Discussions on the merits of early versus later fusion in multimodal models and advancements in quantization highlighted ongoing research in reducing AI model inference cost and boosting efficiency (Multi Fusion).
PART 1: High level Discord summaries
HuggingFace Discord
Juggernaut or SD3 Turbo for Virtual Realities?: While Juggernaut Lightning is favored for its realism in non-coding creative scenarios, SD3 Turbo wasn't discussed as favorably, suggesting that choices between models are influenced by specific context and goals.
Quantum Leap for PyTorch Users: Investments in libraries like PyTorch and HuggingFace are recommended over dated ones like sklearn, and use of bitsandbytes and precision modifications such as 4-bit quantization can assist with model loading on constrained hardware.
Meta-Model Mergers and Empathic Evolutions: The Open Empathic project is expanding with contributed movie scene categories via YouTube, while merging tactics for UltraChat and Mistral-Yarn elicited debate, with references to mergekit and frankenMoE finetuning as noteworthy techniques for improving AI models.
Souped-Up Software and Services: A suite of contributions surfaced, including Mistroll 7B v2.2's release, simple finetuning utilities for Stable Diffusion, a media-to-text conversion GUI using PyQt and Whisper, and the new AI platform Featherless.ai for serverless model usage.
In Pursuit of AI Reasoning Revelations: Plans to unravel recent works on reasoning with LLMs are brewing, with Understanding the Current State of Reasoning with LLMs (arXiv link) and repositories like Awesome-LLM-Reasoning and its namesake alternative repository link earmarked for examination.
Unsloth AI (Daniel Han) Discord
- Unsloth AI Previews Generate Buzz: A member's anticipation for Unsloth AI's release led to the sharing of a temporary recording, as they waited for early access after a video filming announcement. Thumbnail updates, such as changing "csv -> unsloth + ollama" to "csv -> unsloth -> ollama", were suggested for clarity, alongside adding explainer text for newcomers.
- Big VRAM Brings Bigger Conversations: A YouTube video showcased the PCIe-NVMe card by Phison as an astonishing 1TB VRAM solution, sparking discussions about its impact on performance. Meanwhile, Fimbulvntr's success in extending Llama-3-70b to a 64k context and the debate on VRAM expansion highlighted the ongoing exploration of large model capacities.
- Upgrades and Emotions in LLMs: Monday or Tuesday earmarked the Ollama update, promising CSV file support, while Sebastien's emotional llama model, fostering a better understanding of emotions in AI, became available on Ollama and YouTube.
- Solving Setups & Compatibility: From struggles to install xformers on Windows with Unsloth via conda to ensuring correct execution of initial setup cells in Google Colab notebooks, members swapped tips for overcoming software challenges. GPU Cloud (NGC) container setup discussions, as well as CUDA and PyTorch version constraints, featured solutions like using different containers and sharing Dockerfile configurations.
- Pondering on Partnerships & AI Integration: A blog titled "Apple and Meta Partnership: The Future of Generative AI in iPhones" stirred the guild's interest, with discussions focused on the strategic implications and potential integration challenges of generative AI in mobile devices.
Stability.ai (Stable Diffusion) Discord
- Bot Beware: A Discord bot was shared for integrating Gemini and StabilityAI services, but members raised safety and context concerns regarding the link.
- Civitai Pulls SD3 Amidst License Concerns: The removal of SD3 resources by Civitai sparked intense discussions, suggesting the step was taken to preempt legal issues.
- Running Stable with Less: Techniques for operating Stable Diffusion on lower specification GPUs, like utilizing automatic1111, were debated, weighing the efficiency of older GPUs against newer models like the RTX 4080.
- Training Troubles and Tips: Community members sought advice for training models and overcoming errors such as VRAM limits and problematic metadata, with some suggesting specialized tools like ComfyUI and OneTrainer for enhanced management.
- Model Compatibility Confusion: Discussions highlighted the necessity for alignment between models like SD 1.5 and SDXL with add-ons such as ControlNet; mismatched types can lead to performance degradation and errors.
CUDA MODE Discord
- CUTLASS and CUDA Collaboration Call: Users expressed interest in forming a CUTLASS working group, encouraged by a shared YouTube talk on Tensor Cores. Additionally, insights on the CPU cache were amplified with a shared primer on cache functionality, highlighting its significance for programmers.
- Floating Points and Precision Perils: Precision loss in FP8 conversion drew attention, prompting a shared resource for understanding rounding per IEEE convention and the use of tensor scaling to counteract loss. For those exploring quantization, a compilation of papers and educational content was recommended, including Quantization explained and Advanced Quantization.
- Enthusiasts of INT4 and QLoRA Weigh In: In a discussion contrasting INT4 LoRA fine-tuning versus QLoRA, it was noted that QLoRA's inclusion of a CUDA dequant kernel (axis=0) sustains both quality and pace, especially compared to solutions using tinygemm for large sequences.
- Networks Need Nurturing: The integration of Bitnet tensors with AffineQuantizedTensor sparked debate, considering special layouts for specifying packed dimensions. For assistance with debugging Bitnet tensor issues, CoffeeVampire3's GitHub and the PyTorch ao library tutorials were spotlighted as go-to resources.
- Strategies to Scale System Stability: Strategies for multi-node setup optimizations and integrating FP8 matmuls were at the forefront of conversations, addressing performance challenges and training stability, especially on H100 GPUs which showed issues compared to A100. Upcoming large language model training on a Lambda cluster was also prepped for, with an eye on efficiency and stability.
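The FP8 precision-loss point above can be illustrated without GPU hardware: keeping only the 3 mantissa bits of the FP8 E4M3 format shows how little resolution survives. This is a simplified pure-Python sketch (it truncates rather than applying IEEE round-to-nearest, and it ignores FP8's narrow exponent range):

```python
import struct

def truncate_mantissa(x: float, mantissa_bits: int) -> float:
    """Keep only the top `mantissa_bits` of a float64 mantissa.

    Mimics the precision loss of narrow formats (FP8 E4M3 keeps 3
    mantissa bits); real hardware rounds to nearest, this truncates.
    """
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    drop = 52 - mantissa_bits          # float64 has a 52-bit mantissa
    bits &= ~((1 << drop) - 1)         # zero out the dropped low bits
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

print(truncate_mantissa(0.1, 3))   # 0.09375 — roughly 6% relative error
print(truncate_mantissa(0.1, 10))  # far closer to 0.1 at bf16-like precision
```

Tensor scaling helps because FP8's exponent range is narrow: rescaling values toward the representable range avoids overflow and underflow, though the mantissa resolution shown above stays fixed.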
LM Studio Discord
VRAM Crunch and Hefty Price Tags: Engineers highlighted the VRAM bottleneck when handling colossal models like Command R (34b) Q4_K_S, suggesting EXL2 as a more VRAM-efficient format. For heavy-duty AI work, the NVIDIA DGX GH200, touted for its mammoth memory, remains out of reach financially for most, hinting at thousands of dollars in investment.
Quantum Leaps in LLM Reasoning: Users were impressed with the Hermes 2 Theta Llama-3 70B model, known for its significant token context limit and creative strengths. Conversations about LLMs' lack of temporal awareness spurred mention of the Hathor Fractionate-L3-8B for its performance when output tensors and embeddings remain unquantized.
Cool Rigs and Hot Chips: On the hardware battlefield, using P40 GPUs with Codestral demonstrated a surge in power utilization to 12 tokens/second. Meanwhile, the iPad Pro’s 16GB RAM was debated for its ability to handle AI models, and the dream of using DX or Vulkan for multi-GPU support in AI was floated in response to the absence of NVlink in 4000 series GPUs.
Patchwork and Plugins: The LLaMa library vexed users with errors stemming from a model's expected tensor count mismatch, whereas deepseekV2 faced loading woes, potentially fixable by updating to V0.2.25. Enthusiasm bubbled for a hypothetical all-in-one model runner that could handle a gamut of Huggingface models including text-to-speech and text-to-image.
Model Engineering and Enigmas: The quaintly named Llama 3 CursedStock V1.8-8B model piqued curiosity for its unique performance, especially in creative content generation. There was chatter about a Multi-model sequence map allowing data flow among several models, and the latest quantized Qwen2 500M model made waves for its ability to operate on less capable rigs, even a Raspberry Pi.
OpenAI Discord
- Siri and ChatGPT's Odd Couple: There's confusion among users about Siri's integration with ChatGPT, with the consensus being that ChatGPT acts as an enhancement to Siri rather than a core integration. Elon Musk's critical comments fueled further discussion on the topic.
- Claude's Coding Coup Over GPT-4o: Claude 3.5 Sonnet is praised for its superior performance in coding tasks compared to GPT-4o, with users highlighting Claude's success in areas where GPT-4o stumbled. Effectiveness is gauged by practical usage and position on the LMSYS leaderboard rather than benchmark scores alone.
- Persistent LLM Personal Assistant Dreaming: Enthusiasm is noted regarding the possibility of tailoring and maintaining language models, like Sonnet 3.5 or Gemini 1.5 Pro, to serve as personalized work-bots trained on an individual's documents, prompting discussions about long-term and specialized applications of LLMs.
- GPT-4o’s Context Window Woes: Users struggle with limitations in GPT-4o's ability to adhere to complex prompt instructions and handle lengthy documents. Alternatives such as Gemini and Claude are suggested for better performance with larger token windows.
- DALL-E Vs. Midjourney Artistic Showdown: A debate is unfolding on the server over DALL-E 3 and Midjourney’s capacities for generating AI images, particularly in the realm of paint-like artworks, with some showing a preference for the former's distinct artistic styles.
Perplexity AI Discord
- Perplexity AI Caught in Plagiarism Uproar: Wired reported Perplexity AI's alleged policy violations by scraping websites, with its chatbot misattributing a crime to a police officer and a debate emerging on the legal implications of inaccurate AI summaries.
- Mixed Reactions to Claude 3.5 Sonnet: The release of Claude 3.5 Sonnet was met with both applause for its capabilities and frustration for seeming overcautious, as reported by Forbes, while users experienced inconsistencies with Pro search results leading to dissatisfaction with Perplexity's service.
- Exclusives on Apple and Boeing's Struggles: Apple's AI faced limitations in Europe while Boeing's Starliner confronted significant challenges, information disseminated on Perplexity with direct links to articles on these issues (Apple Intelligence Isn't, Boeing’s Starliner Stuck).
- Perplexity API Quandaries: The Perplexity API community discussed issues like potential moderation triggers or technical errors with LLama-3-70B when handling long token sequences, and queries about restricting link summarization and time filtration in citations via the API were raised as documented in the API reference.
- Community Convergence for Better Engagement: An OpenAI community message highlighted the need for shareable threads to foster greater collaboration, while a Perplexity AI-authored YouTube video previews diverse topics like Starliner dilemmas and OpenAI's latest moves for educational consumption.
Nous Research AI Discord
Boost in Dataset Deduplication: Rensa outperforms datasketch with a 2.5-3x speed boost, leveraging Rust's FxHash, LSH index, and on-the-fly permutations for dataset deduplication.
Model Jailbreak Exposed: A Financial Times article highlights hackers "jailbreaking" AI models to reveal flaws, while contributors on GitHub share a "smol q* implementation" and innovative projects like llama.ttf, an LLM inference engine disguised as a font file.
Lively Debate on Model Parameters: In the ask-about-llms, discussions ranged from the surprisingly capable story generation of TinyStories-656K to assertions that general-purpose performance soars with 70B+ parameter models.
Dataset Synthesis and Classification Enhanced: Members share a Google Sheet for collaborative dataset tracking, explore improvements using the Hermes RAG format, and delve into datasets like SciRIFF and ft-instruction-synthesizer-collection for scientific and instructional purposes.
AI Safety Models Scrutiny and Coursework: #general sees a mix, from Gemini and OpenAI's redaction-capable safety models to the launch of Karpathy's LLM101n course, encouraging engineers to build a storytelling LLM.
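For readers unfamiliar with the MinHash technique that Rensa accelerates, here is a minimal pure-Python sketch. It is illustrative only: Rensa itself is written in Rust with FxHash, an LSH index, and on-the-fly permutations, whereas this toy simulates permutations by salting an MD5 hash.

```python
import hashlib

def minhash_signature(tokens, num_perm=64):
    """One min-hash per 'permutation', simulated by salting the hash input."""
    return [
        min(int(hashlib.md5(f"{i}:{t}".encode()).hexdigest(), 16) for t in tokens)
        for i in range(num_perm)
    ]

def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

doc_a = set("the quick brown fox jumps over the lazy dog".split())
doc_b = set("the quick brown fox leaps over a lazy dog".split())
sa, sb = minhash_signature(doc_a), minhash_signature(doc_b)
est = estimated_jaccard(sa, sb)
true_j = len(doc_a & doc_b) / len(doc_a | doc_b)  # exact Jaccard: 0.7 here
```

The estimate converges to the true Jaccard similarity as the number of permutations grows, which is why dedup pipelines can compare fixed-size signatures instead of full documents.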
Eleuther Discord
- SLURM Hiccups with Jupyter: Engineers are facing issues connecting to SLURM-managed nodes via Jupyter Notebook, citing errors potentially due to SLURM restrictions. One user saw a 'kill' message on the console before training began, even with correct GPU specifications.
- PyTorch Boosts Llama-2 Performance: PyTorch's team has implemented techniques to accelerate the Llama-2 inference speed by up to a factor of ten; the enhancements are encapsulated in the GPTFast package, which requires A100 or H100 GPUs.
- Ethics and Sharing of AI Models: A serious conversation about the ethical and practical considerations of distributing proprietary AI models such as Mistral outside official sources highlighted concerns for legalities and the importance of transparency.
- Understanding AI Model Variants: Users debate methods to determine if an AI model is GPT-4 or a different variant, including examining knowledge cutoffs, latency disparities, and network traffic analysis.
- LingOly Challenge Introduced: The new LingOly benchmark addresses the evaluation of LLMs on advanced reasoning over linguistic puzzles. With over a thousand problems presented, top models achieve below 50% accuracy, indicating a robust challenge for current architectures.
- Text-to-Speech Innovation with ARDiT: A podcast episode explores the usage of SAEs for model editing, inspired by the approach detailed in the MEMIT paper and its source code, suggesting wide applications for this technology.
- Pondering the Optimality of Multimodal Architectures: Dialogue surfaced about whether an early fusion model, like Chameleon, stands superior to later fusion approaches for multimodal tasks. The trade-off between generalizability and visual acuity loss in the image tokenization process of early fusion was a focus.
- Intel Retreats from AWS Instance: Intel is discontinuing their AWS instance leveraged by the gpt-neox development team, prompting discussions on cost-effective or alternative manual solutions for computational resources.
- Execution Error: NCCL Backend: Engineers report persistent NCCL backend challenges while attempting to train models with gpt-neox on A100 GPUs, a problem consistent across various NCCL and CUDA versions, with Docker use or without.
Latent Space Discord
- Character.AI Cracks Inference at Scale: Noam Shazeer of Character.AI illuminates the pursuit of AGI through optimization of inference processes, emphasizing their capability to handle upwards of 20,000 inference queries every second.
- Acquisition News: OpenAI Welcomes Rockset: OpenAI has acquired Rockset, a company skilled in hybrid search architecture with solutions like vector (FAISS) and keyword search, strengthening OpenAI's RAG suite.
- AI Education Boost by Karpathy: Andrej Karpathy plants the seeds of an ambitious new course, "LLM101n," which will dive deep into constructing ChatGPT-like models from the ground up, following the legacy of the legendary CS231n.
- LangChain Clears the Air on Funds: Harrison Chase addresses scrutiny regarding LangChain's expenditure of venture capital on product development instead of promotions, with a response detailed in a tweet.
- Murati Teases GPT's Next Leap: Mira Murati of OpenAI teases enthusiasts with a timeline hinting at a possible release of the next GPT model in about 1.5 years, while discussing the sweeping changes AI is bringing into creative and productive industries, available in a YouTube video.
- Latent Space Scholarship on Hiring AI Pros: A new "Latent Space Podcast" episode breaks down the art and science of hiring AI engineers, guiding listeners through hiring processes and defensive AI engineering strategies, with insights from @james_elicit and @adamwiggins available on this page and gathering buzz on Hacker News.
- Embarking on New YAML Frontiers: Conversations illustrate developing a YAML-based DSL for Twitter management to enhance post analytics, with a nod to Zoho Social's comprehensive features; for similar ventures, Anthropic suggests employing XML tags, and a GitHub repo showcases the successful design of a YAML templating language with LLMs in Go.
Modular (Mojo 🔥) Discord
- LLVM's Price Tag: An article estimating the cost of the LLVM project was shared, detailing that 1.2k developers produced a codebase of 6.9M lines with an estimated cost of $530 million. Cloning and checking out LLVM is part of understanding its development costs.
- Installation Troubles and Request for Help: Issues with Mojo installation on Ubuntu 22.04 were highlighted, citing failures in all devrel-extras tests; a problematic situation that led to a pause for troubleshooting. Separately, frustration over segmentation faults during Mojo development prompted a user to offer a $10 OpenAI API key for help with their critical issue.
- Discussions on Caching and Prefetching Performance: Deep dives into caching and prefetching, with emphasis on correct application and pitfalls, were a significant conversation topic. Insights shared included the potential for adverse effects on performance if prefetching is incorrectly utilized, and recommendations to utilize profiling tools such as `vtune` for Intel caches, even though Mojo does not support compile-time cache size retrieval.
- Improvement Proposals and Nightly Mojo Builds: Suggested improvements for Mojo's documentation and a proposal for controlled implicit conversion in Mojo were noted. Updates on new nightly Mojo compiler releases as well as MAX repo updates sparked discussions on developmental workflow and productivity.
- Data Labeling and Integration Insights: A new data labeling platform initiative received feedback about common pain points and successes in automation with tools like Haystack. The potential for ERP integration (prompted by manual data entry challenges and PDF processing) was also a focal point, indicating a push towards streamlining workflows in data management.
LAION Discord
- New Gates Open at Weta & Stability AI: A wave of discussions followed news of leadership changes at Weta Digital and Stability AI, focusing on the implications of these shake-ups and questioning the motives behind the appointments. Some pointed to Sean Parker and shared a Reuters article on Stability AI.
- Llama 3 on the Prowl: There was palpable excitement about the Llama 3 hardware specifications suggesting impressive performance, potentially outclassing rival models like GPT-4o and Claude 3. Participants shared projected throughputs of "1 to 2 tokens per second" on advanced setups.
- The Protection Paradox with Glaze & Nightshade: A sobering conversation unfolded over the limited ability of programs like Glaze and Nightshade to protect artists' rights. Skeptics noted that second movers often find ways around such protections, thus providing artists with potentially false hope.
- Multimodal Models – A Repetitive Breakthrough?: The guild examined a new paper on multimodal models, raising the question of whether the purported advancements were meaningful. The paper promotes training on a variety of modalities to enhance versatility, yet participants critiqued the repeated 'breakthrough' narrative with little substantial novelty.
- Testing Limits: Promises and Limitations of Diffusion Models: A deeper dive into diffusion models was encapsulated in a GitHub repository shared by lucidrains, discussing the EMA (Exponential Moving Average) model updates (Diffusion Models on GitHub) and their use in image restoration, despite evidence pointing to the consistent bypassing of protections like Glaze.
Cohere Discord
- Welcome Wagon for Newcomers: New members joined the Cohere-focused Discord, guided by shared insights and tool use documentation that helps connect Cohere models to external applications.
- Skepticism Surrounding BitNet Practicality: Amidst debates on BitNet's future, it's noted to require training from scratch and is not optimized for existing hardware, leading Mr. Dragonfox to express concerns about its commercial impracticality.
- Cohere Capacities and Contributions: Following the integration of a Cohere client in Microsoft's AutoGen framework, there was a call within the community for further support from the Cohere team in the project's advancement.
- AI Enthusiasts Eager for Multilingual Expansions: Cohere's model's ability to understand and respond in multiple languages, including Chinese, was confirmed, directing interested parties to documentation and a GitHub notebook example to learn more.
- Developer Office Hours and Multi-Step Innovations: Cohere announced upcoming developer office hours emphasizing the Command R family's tool use capabilities, providing resources on multi-step tool use for leveraging models to execute complex sequences of tasks.
LangChain AI Discord
- Confusion Over Context and Tokens: Users reported confusion regarding the integration of max tokens and context windows in agents, specifically with LangChain not adhering to Pydantic models' validations. It was noted that context window or max token counts should include both the input and generated tokens.
- LangChain Learning and Implementation Queries: There was a spirited discussion about the learning curve with LangChain, with members sharing resources like Grecil's personal journey that includes tutorials and documentation. Meanwhile, debate about ChatOpenAI versus Huggingface models highlighted performance differences and adaptation in various scenarios.
- Enhancing PDF Interrogation with LangChain: A detailed guide was shared for generating Q&A pairs from PDFs using LangChain, referring to issues like #17008 on GitHub for further guidance. Adjustments for using Llama2 as the LLM were also discussed, emphasizing customizing the `QAGenerationChain`.
- From Zero to RAG Hero: Members showcased their experience building no-code RAG workflows for financial documents; an article detailing the process was shared. Discussion also centered on a custom Corrective RAG app and Edimate, an AI-driven video-creation tool demoed here, which signals a future for e-learning.
- AI Framework Evaluation Video: For engineers evaluating AI frameworks for app integration including models like GPT-4o, a YouTube video was shared, urging developers to consider critical questions regarding the necessity and choice of the AI framework for specific applications.
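The point about max tokens in the first LangChain bullet above — that the context window must cover both the input and the generated tokens — reduces to simple budget arithmetic. A hedged sketch (the function name and numbers are illustrative, not a LangChain API):

```python
def generation_budget(context_window: int, prompt_tokens: int,
                      requested_max_tokens: int) -> int:
    """Tokens actually available for generation.

    The context window holds the prompt AND the completion, so the
    usable budget is whatever the prompt leaves over (never negative).
    """
    leftover = max(0, context_window - prompt_tokens)
    return min(requested_max_tokens, leftover)

print(generation_budget(8192, 6000, 4000))  # 2192 — the prompt eats the rest
print(generation_budget(8192, 9000, 100))   # 0 — the prompt already overflows
```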
OpenRouter (Alex Atallah) Discord
- Jamba Instruct Boasts Big Context Window: AI21's Jamba-Instruct model has been introduced, showcasing a gigantic 256K context window, ideal for handling extensive documents in enterprise settings.
- Nemotron 4 Makes Waves with Synthetic Data Generation: NVIDIA's release of Nemotron-4-340B-Instruct focuses on synthetic data generation for English-language applications with its new chat model.
- JojoAI Levels Up to Proactive Assistant: JojoAI differentiates itself by becoming a proactive assistant that can set reminders, employing DigiCord integrations, positioning it apart from competitors like ChatGPT or Claude. Experience it on the JojoAI site.
- Pebble's Pioneering Reading Aid Tool: The unveiling of the Pebble tool, powered by OpenRouter with Mistral 8x7b and Gemini, provides a resource for enhancing reading comprehension and retention for web content. Kudos to the OpenRouter team for their support as acknowledged at Pebble.
- Tech Community Tackles Environmental and Technical Issues: Discussions pointed to concerns about the environmental footprint of using models like Nemotron 340b, with smaller models being recommended for efficiency and eco-friendliness. The community also dealt with practical affairs, such as resolving the disappearance of Claude self-moderated endpoints, praising Sonnet 3.5 for coding capabilities, addressing OpenRouter rate limits, and advising on best practices for handling exposed API keys.
OpenInterpreter Discord
- Local LLMs Enter OS Mode: The OpenInterpreter community has been discussing the use of local LLMs in OS mode with the command `interpreter --local --os`, but there are concerns regarding their performance levels.
- Desktop Delights and GitHub Glory: The OpenInterpreter team is promoting a forthcoming desktop app with a unique experience compared to the GitHub version, encouraging users to join the waitlist. Meanwhile, the project has celebrated 50,000 GitHub stars, hinting at a major upcoming announcement.
- Model Benchmarking Banter: The Codestral and Deepseek models have sparked attention, with Codestral surpassing internal benchmarks and Deepseek impressing users with its quick performance. There's buzz about a future optimized `interpreter --deepseek` command.
- Cross-Platform Poetry Performance: The use of Poetry for dependency management over `requirements.txt` has been a contentious topic, with some engineers pointing to its shortcomings on various operating systems and advocating for alternatives like conda.
- Community Kudos and Concerns: While there's enthusiasm and appreciation for the community's support, particularly for beginners, there's also frustration regarding shipping delays for the 01 device, highlighting the balance between community sentiment and product delivery expectations.
LLM Finetuning (Hamel + Dan) Discord
Instruction Synthesizing for the Win: A newly shared Hugging Face repository highlights the potential of Instruction Pre-Training, providing 200M synthesized pairs across 40+ tasks, likely offering a robust approach to multi-task learning for AI practitioners looking to push the envelope in supervised multitask pre-training.
Bringing DeBERTa and Flash Together?: Curiosity is brewing over the possibility of combining DeBERTa with Flash Attention 2, posing the question of potential implementations that leverage both technologies to AI engineers interested in novel model architecture synergies.
Fixes and Workarounds: From a Maven course platform blank page issue solved using mobile devices to the resolution of permission errors after a kernel restart within braintrust, practical troubleshooting remains a staple of community discourse.
Credits Saga Continues: Persistent reports of missing service credits on platforms like Huggingface and Predibase sparked member-to-member support and referrals to respective billing supports. This included a tip that Predibase credits expire after 30 days, suggesting that engineers keep a keen eye on expiry dates to maximize credit use.
Training Errors and Overfitting Queries: Errors in running Axolotl's training command (Modal FTJ) and concerns about LORA overfitting ('significantly lower training loss compared to validation loss') were significant pain points, showcasing the need for vigilant model monitoring practices among AI engineers.
LlamaIndex Discord
- LightningAI and LlamaIndex Join Forces: LightningAI's RAG template offers an easy setup for multi-document agentic RAGs, promoting efficiency in AI development. Additionally, LlamaIndex's integration with StabilityAI now allows for image generation, broadening AI developer capabilities.
- Customizing Complexity with LlamaIndex: Those developing with LlamaIndex can customize text-to-SQL pipelines using Directed Acyclic Graphs (DAGs), as explained in this feature overview. Meanwhile, for better financial analysis, the CRAG technique can be leveraged using Hanane Dupouy's tutorial slides for improved retrieval quality.
- Fine-Tuning RAGs with Mlflow: To enhance answer accuracy in RAGs, integrating LlamaIndex with Mlflow provides a systematic way to manage critical parameters and evaluation methods.
- In-Depth Query Formatting and Parallel Execution in LlamaIndex: Members discussed LlamaIndex's query response modes like Refine and Accumulate, and the utilization of OLLAMA_NUM_PARALLEL for concurrent model execution; document parsing and embedding mismatches were also topics of technical advice.
- Streamlining ML Workflows with MLflow and LLMs: A Medium article by Ankush K Singal highlights the practical integration of MLflow and LLMs through LlamaIndex to streamline ML workflows.
Interconnects (Nathan Lambert) Discord
- Gemini vs. LLAMA Parameter Showdown: A source from Meta indicated that Gemini 1.5 Pro has fewer parameters than LLAMA 3 70B, inciting discussions about the impact of MoE architectures on parameter count during inference.
- GPT-4's Secret Sauce or Distilled Power: The community debated whether GPT-4T/o are early fusion models or distilled versions of larger predecessors, showing divergence in understanding of their fundamental architectures.
- Multimodal Training Dilemmas: Members highlighted the difficulties in post-training multimodal models, citing the challenges of transferring knowledge across different data modalities. The struggles suggest a general consensus on the complexity of enhancing native multimodal systems.
- Nosing Into Nous and Sony's Stir: A tongue-in-cheek enquiry by a Nous Research member to @sonymusic sparked a blend of confusion and interest, touching upon AI's role in legal and innovation spaces.
- Sketchy Metrics on AI Leaderboards: The legitimacy of the AlpacaEval leaderboard came under fire with engineers questioning biased metrics after a model claimed to have beaten GPT-4 while being more cost-effective. This led to discussions on the reliability of performance leaderboards in the field.
OpenAccess AI Collective (axolotl) Discord
- ROCm Forks Entering the Fray: To utilize certain functionalities, engineers are advised to use the ROCm fork versions of xformers and flash-attention, with a note on hardware support specifically for MI200 & MI300 GPUs and requirement of ROCm 5.4+ and PyTorch 1.12.1+.
- Reward Models Dubbed Subpar for Data Gen: The consensus is that the reward model isn't efficient for generating data, as it is designed mainly for classifying the quality of data, not producing it.
- Synthesizing Standardized Test Questions: An idea was shared to improve AGI evaluations for smaller models by synthesizing SAT, GRE, and MCAT questions, with an additional proposal to include LSAT questions.
- Enigmatic Epoch Saving Quirks: Training epochs are saving at seemingly random intervals, a behavior recognized as unusual but familiar to the community. This may be linked to the steps counter during the training process.
- Dataset Formatting 101 and MinHash Acceleration: A member sought advice on dataset formatting for llama2-13b, while another discussed formatting for the Alpaca dataset using JSONL. Moreover, a fast MinHash implementation named Rensa is shared for dataset deduplication, boasting a 2.5-3x speed increase over similar libraries, with its GitHub repository available for community inputs (Rensa on GitHub).
- Prompt Structures Dissected and Mirrored: Clarification on `prompt_style` in the Axolotl codebase unveiled different prompt formatting strategies, with INSTRUCT, CHAT, and CHATML highlighted for contrasting interactive uses. The use of `ReflectAlpacaPrompter` to automate prompt structuring using the designated style was exemplified (More on Phorm AI Code Search).
Mozilla AI Discord
- Llamafile Leveled Up: Llamafile v0.8.7 has been released, boasting faster quant operations and bug fixes, with whispers of an upcoming Android adaptation.
- Globetrotting AI Events on the Horizon: SF gears up for the World's Fair of AI and the AI Quality Conference with community leaders in attendance, while the Mozilla Nightly Blog hints at potential llamafile integration offering AI services.
- Mozilla Nightly Blog Talks Llamafile: The Nightly blog details experimentation with local AI chat services powered by llamafile, signaling potential for wider adoption and user accessibility.
- Llamafile Execution on Colab Achieved: Successful execution of a llamafile on Google Colab demonstrated, providing a template for others to follow.
- Memory Manager Facelift Connects Cosmos with Android: A significant GitHub commit for the Cosmopolitan project revamps the memory manager, enabling support for Android and stirring interest in running llamafile through Termux.
Torchtune Discord
- ORPO's Missing Piece: Torchtune does not support ORPO training, though DPO has a documented training recipe, as noted by guild members citing a mixed dataset for ORPO/DPO.
- Epochs Stuck on Single Setting: Training on multiple datasets with Torchtune does not currently allow for different epoch settings for each—users should utilize ConcatDataset for combining datasets, but the same number of epochs applies to all.
- To ChatML or Not to ChatML: Engineers debated the efficacy of utilizing ChatML templates with the Llama3 model, contrasting approaches using instruct tokenizer and special tokens against base models without these elements, referencing models like Mahou-1.2-llama3-8B and Olethros-8B.
- Tuning Phi-3 Takes Tweaks: The task of fine-tuning Phi-3 models (like Phi-3-Medium-4K-Instruct) was addressed, with suggestions to modify the tokenizer and add a custom build function within Torchtune to enable compatibility.
- System Prompts: Hack It With Phi-3: Despite Phi-3 not being optimized for system prompts, users can work around this by prepending system prompts to user messages and adjusting the tokenizer configuration with a specific flag discussed to facilitate fine-tuning.
tinygrad (George Hotz) Discord
- Conditional Coding Conundrum: In discussions about tinygrad, using a conditional operation like `condition * a + !condition * b` as a simplification of the WHERE function was met with caution due to potential issues with NaNs.
- Intel Adventures in Tinygrad: Queries about Intel support in tinygrad revealed that while OpenCL is an available option, the framework has not integrated XMX support to date.
- Monday Meeting Must-Knows: The 0.9.1 release of tinygrad is on the agenda for the upcoming Monday meeting, focusing on tinybox updates, a new profiler, runtime improvements, `Tensor._tri`, llama cast speedup, and bounties for uop matcher speed and unet3d improvements.
- Buffer View Toggle Added to Tinygrad: A commit in tinygrad introduced a new flag to toggle the buffer view, a change that was substantiated with a GitHub Actions run.
- Lazy.py Logic in the Limelight: An engineer seeks clarification after their edits to `lazy.py` within tinygrad resulted in a mix of positive and negative process replay outcomes, suggesting a need for further investigation or peer review.
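The NaN caution from the tinygrad conditional discussion above is easy to demonstrate: the arithmetic select evaluates both branches, and `0 * NaN` is NaN, so a NaN in the unselected branch still poisons the result. A plain-Python sketch using scalars as a stand-in for the tensor op:

```python
import math

def select_arith(cond: int, a: float, b: float) -> float:
    # Arithmetic "select": cond * a + (1 - cond) * b evaluates BOTH branches.
    return cond * a + (1 - cond) * b

def select_where(cond: int, a: float, b: float) -> float:
    # True branching select, as a WHERE op behaves: untaken branch untouched.
    return a if cond else b

a, b = 1.0, float("nan")
print(select_arith(1, a, b))  # nan — 0 * nan is nan, poisoning the sum
print(select_where(1, a, b))  # 1.0 — the NaN branch is never evaluated
```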
LLM Perf Enthusiasts AI Discord
- Claude Sonnet 3.5 Stuns with Performance: An engineer shared their experience using Claude Sonnet 3.5 in Websim, praising its speed, creativity, and intelligence. They were particularly taken with the "generate in new tab" feature and experimented with sensory engagement by toying with color schemes from iconic fashion brands, as shown in a shared tweet.
MLOps @Chipro Discord
- AWS Cloud Club Lifts Off at MJCET: MJCET has launched the first AWS Cloud Club in Telangana, a community aimed at providing students with resources and experience in Amazon Web Services to prepare for tech industry careers.
- Cloud Mastery Event with an AWS Expert: An inaugural event will celebrate the AWS Cloud Club's launch on June 28th, 2024, featuring AWS Community Hero Mr. Faizal Khan. Interested parties can RSVP via an event link.
The AI Stack Devs (Yoko Li) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Datasette - LLM (@SimonW) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The YAIG (a16z Infra) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
HuggingFace ▷ #general (715 messages🔥🔥🔥):
- Juggernaut Lightning vs SD3 Turbo: A member recommended using Juggernaut Lightning as it is "way more realistic" compared to SD3 Turbo due to it being a base model. Another member mentioned Juggernaut being more suited for role-playing and creativity rather than coding and intelligence.
- Help for Beginners: An ML beginner sought advice on which libraries to use for their project and received suggestions to use PyTorch for its extensive neural network support and HuggingFace for loading pre-trained models. Another member recommended avoiding outdated libraries like sklearn.
- Model Loading Issues: A member faced challenges loading large AI models on limited hardware and received guidance on using quantization techniques to improve performance. Recommendations included installing the bitsandbytes library and instructions for modifying model load configurations to utilize 4-bit precision.
- AI Content Creation Tools: There was a discussion on the complexities of generating AI-generated videos similar to Vidalgo, indicating that while generating text and audio is straightforward, creating small moving videos is challenging. Tools like RunwayML and Capcut were suggested for video edits and stock images.
- Collaborative Projects and Model Updates: Members shared their experiences and projects related to various AI models, including a model trained to play games using Xbox controller inputs and a toolkit for preprocessing large image datasets. Additionally, ongoing work and upcoming updates on several models and their potential applications were discussed.
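The 4-bit loading advice above works because quantization maps float weights onto a small set of integer levels. As a toy illustration of symmetric linear 4-bit quantization — NOT the bitsandbytes NF4 scheme, which uses non-uniform levels and per-block scales:

```python
def quantize_4bit(values):
    """Symmetric linear quantization onto the int4 range [-8, 7]."""
    scale = max(abs(v) for v in values) / 7
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [level * scale for level in q]

weights = [0.12, -0.58, 0.33, 0.91, -0.27, 0.05]
q, scale = quantize_4bit(weights)
restored = dequantize(q, scale)
# Reconstruction error is bounded by half a quantization step (scale / 2).
max_err = max(abs(w - r) for w, r in zip(weights, restored))
```

The memory arithmetic is the same in either scheme: 4 bits per weight plus a small overhead for the scale factors, which is why 4-bit loading lets large models fit on limited hardware.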
Links mentioned:
- 🧑🎓 How to use Continue | Continue: Using LLMs as you code with Continue
- Chess notation - Wikipedia: no description found
- mm ref: no description found
- Datasets: no description found
- Anthropic's SHOCKING New Model BREAKS the Software Industry! Claude 3.5 Sonnet Insane Coding Ability: Learn AI With Me:https://www.skool.com/natural20/aboutJoin my community and classroom to learn AI and get ready for the new world.#ai #openai #llm
- SWE-bench: no description found
- briaai/RMBG-1.4 · Hugging Face: no description found
- alignment-handbook/recipes/zephyr-141b-A35b at main · huggingface/alignment-handbook: Robust recipes to align language models with human and AI preferences - huggingface/alignment-handbook
- Apple M1 - Wikipedia: no description found
- alignment-handbook/recipes/zephyr-7b-beta at main · huggingface/alignment-handbook: Robust recipes to align language models with human and AI preferences - huggingface/alignment-handbook
- Paper page - Zephyr: Direct Distillation of LM Alignment: no description found
- HuggingChat: Making the community's best AI chat models available to everyone.
- GitHub - abi/screenshot-to-code: Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue): Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue) - abi/screenshot-to-code
- GitHub - simpler-env/SimplerEnv: Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge): Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) - simpler-env/SimplerEnv
- Hugging Face – Blog: no description found
- @nroggendorff on Hugging Face: "@osanseviero your move": no description found
- Playing a Neural Network's version of GTA V: GAN Theft Auto: GAN Theft Auto is a Generative Adversarial Network that recreates the Grand Theft Auto 5 environment. It is created using a GameGAN fork based on NVIDIA's Ga...
- Huh Cat GIF - Huh Cat Cat huh - Discover & Share GIFs: Click to view the GIF
- Hand Gesture Drawing App Demo | Python OpenCV & Mediapipe: In this video, I demonstrate my Hand Gesture Drawing App using Python with OpenCV and Mediapipe. This app allows you to draw on screen using hand gestures de...
- stabilityai/stable-video-diffusion-img2vid-xt-1-1 · Hugging Face: no description found
- microsoft/Phi-3-mini-4k-instruct-gguf at main: no description found
- RAG chatbot using llama3: no description found
- Azazelle/L3-RP_io at main: no description found
- Vidalgo - One-Click Vertical Video Creation: Experience effortless video creation with Vidalgo! Our platform empowers you to produce stunning vertical videos for TikTok, YouTube Shorts, and Instagram Reels in just one click. Start creating today...
- stabilityai/stablelm-zephyr-3b · Hugging Face: no description found
- Hugging Face: The AI community building the future. Hugging Face has 227 repositories available. Follow their code on GitHub.
- Toy Story Woody GIF - Toy Story Woody Buzz Lightyear - Discover & Share GIFs: Click to view the GIF
- azaz (Z): no description found
- GitHub - huggingface/alignment-handbook: Robust recipes to align language models with human and AI preferences: Robust recipes to align language models with human and AI preferences - huggingface/alignment-handbook
- Agents & Tools: no description found
- GitHub - beowolx/rensa: High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datasets: High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datasets - beowolx/rensa
- How to write code to autocomplete words and sentences?: I'd like to write code that does autocompletion in the Linux terminal. The code should work as follows. It has a list of strings (e.g. "hello", "hi", "how a...
- GitHub - minimaxir/textgenrnn: Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.: Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code. - minimaxir/textgenrnn
- GitHub - huggingface/datatrove: Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.: Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. - huggingface/datatrove
- GitHub - not-lain/loadimg: a python package for loading images: a python package for loading images. Contribute to not-lain/loadimg development by creating an account on GitHub.
- Vaas Far Cry3 GIF - Vaas Far Cry3 That Is Crazy - Discover & Share GIFs: Click to view the GIF
- Tweet from vik (@vikhyatk): asked claude to make me a cool new vaporwave style home page... should i switch to it?
- Tweet from vik (@vikhyatk): "make it better"
- sonnet_shooter.zip: 1 file sent via WeTransfer, the simplest way to send your files around the world
- huggingface_hub/src/huggingface_hub/hub_mixin.py at main · huggingface/huggingface_hub: The official Python client for the Huggingface Hub. - huggingface/huggingface_hub
- Reddit - Dive into anything: no description found
- Can Apple’s M1 Help You Train Models Faster & Cheaper Than NVIDIA’s V100?: In this article, we analyze the runtime, energy usage, and performance of Tensorflow training on an M1 Mac Mini and Nvidia V100. .
- GitHub - maxmelichov/Text-To-speech: Roboshaul: Roboshaul. Contribute to maxmelichov/Text-To-speech development by creating an account on GitHub.
- Robo-Shaul project: The Robo-Shaul Competition was a 2023 competition to clone the voice of Shaul Amsterdamski. The results are all here.
- Introducing Accelerated PyTorch Training on Mac: In collaboration with the Metal engineering team at Apple, we are excited to announce support for GPU-accelerated PyTorch training on Mac. Until now, PyTorch training on Mac only leveraged the CPU, bu...
- Alignment of brain embeddings and artificial contextual embeddings in natural language points to common geometric patterns - Nature Communications: Here, using neural activity patterns in the inferior frontal gyrus and large language modeling embeddings, the authors provide evidence for a common neural code for language processing.
HuggingFace ▷ #today-im-learning (3 messages):
- Coding Self-Attention and Multi-Head Attention: A member shared a link to their blog post detailing the implementation of self-attention and multi-head attention from scratch. The blog post explains the importance of attention in Transformer architecture for understanding word relationships in a sentence to make accurate predictions. Read the full post here.
- Interest in Blog Post: Another member expressed interest in the blog post on attention mechanisms. They affirmed their engagement with a simple "Yes I am interested."
- Tree-Sitter S-expression Challenges: A member mentioned the challenges they are facing with Tree-Sitter S-expressions, referring to them as "a pain." This suggests difficulties in parsing or handling these expressions in their current work.
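The mechanics the attention blog post covers can be condensed into a few lines; here is a generic single-head sketch in NumPy (names and shapes are illustrative, not taken from the post):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """x: (seq_len, d_model); returns one attended vector per token."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # scaled dot-product
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V                       # mix values by attention weight

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                  # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Multi-head attention simply runs several such heads with smaller projection matrices and concatenates their outputs.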
Link mentioned: Ashvanth.S Blog - Wrapping your head around Self-Attention, Multi-head Attention: no description found
HuggingFace ▷ #cool-finds (5 messages):
- Implementing RMSNorm Layer in SD3: A member mentioned implementing an optional RMSNorm layer for the Q and K inputs, referencing the SD3 paper. No further details were provided on this implementation.
- LLMs and Refusal Mechanisms: A blog post was shared about LLM refusal/safety highlighting that refusal is mediated by a single direction in the residual stream. The full explanation and more insights can be found in the paper now available on arXiv.
- Florence-2 Vision Foundation Model: The abstract for Florence-2, a vision foundation model, was posted on arXiv. Florence-2 uses a unified prompt-based representation across various computer vision and vision-language tasks, leveraging a large dataset with 5.4 billion annotations.
- Facebook AI Twitter Link: A Twitter link related to Facebook AI was shared without any additional context. Twitter link
- wLLama Test Page: A link was shared to a wLLama basic example page demonstrating model completions and embeddings. Users can test models, input local files, and calculate cosine distances between text embeddings wLLama Basic Example.
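The RMSNorm mentioned for the Q and K inputs is a small operation: divide by the root-mean-square of the feature axis, then apply a learnable gain. A hypothetical NumPy version (the SD3 paper's exact placement and epsilon may differ):

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    """Normalize by the root-mean-square over the last axis, then scale."""
    rms = np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)
    return x / rms * gain

q = np.array([[3.0, 4.0]])   # rms = sqrt((9 + 16) / 2) ≈ 3.5355
g = np.ones(2)               # learnable gain, initialized to 1
q_normed = rms_norm(q, g)
print(q_normed)              # ≈ [[0.8485, 1.1314]]
```

Unlike LayerNorm, RMSNorm skips mean-centering, which keeps Q/K magnitudes bounded at lower cost.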
Links mentioned:
- wllama.cpp demo: no description found
- Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks: We introduce Florence-2, a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks. While existing large vision models exce...
- Refusal in LLMs is mediated by a single direction — AI Alignment Forum: This work was produced as part of Neel Nanda's stream in the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, with co-supervision from…
- Refusal in Language Models Is Mediated by a Single Direction: Conversational large language models are fine-tuned for both instruction-following and safety, resulting in models that obey benign requests but refuse harmful ones. While this refusal behavior is wid...
HuggingFace ▷ #i-made-this (12 messages🔥):
- Mistroll 7B Version 2.2 Released: A member shared the Mistroll-7B-v2.2 model trained 2x faster with Unsloth and Huggingface's TRL library. This experiment aims to fix incorrect behaviors in models and refine training pipelines focusing on data engineering and evaluation performance.
- Stable Diffusion Trainer Code Shared: A simple Stable Diffusion 1.5 Finetuner for experimentation was shared on GitHub. This "very janky" code uses Diffusers, aimed at helping users explore finetuning.
- Media to Text Conversion Software Release: Developed by a member, this software converts media files into text using PyQt for GUI and OpenAI Whisper for STT, supporting local and YouTube video transcriptions. Available on GitHub.
- Enhancements to SimpleTuner: Refactored and enhanced EMA support for SimpleTuner was shared, now compatible with SD3 and PixArt training, supporting CPU offload and step-skipping. The changes can be reviewed on GitHub.
- Featherless.ai - New AI Platform: A member introduced Featherless.ai, a platform to run public models from Huggingface serverlessly, instantly. They are onboarding 100+ models weekly and aim to cover all HF public models, inviting users to try the service and provide feedback.
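The EMA bookkeeping that the SimpleTuner PR refactors is, at its core, a one-line update per training step; a minimal sketch (the PR's CPU offload and step-skipping are not shown, and the decay value here is illustrative):

```python
import numpy as np

def ema_update(ema_w, w, decay=0.99):
    """Exponential moving average of model weights."""
    return decay * ema_w + (1.0 - decay) * w

w = np.zeros(3)
ema = w.copy()
for step in range(1000):
    w = w + 0.01                 # stand-in for an optimizer step
    ema = ema_update(ema, w)     # shadow weights trail the live ones
print(w, ema)                    # EMA lags the live weights by ~decay/(1-decay) steps
```

Sampling from the EMA weights rather than the live ones smooths out step-to-step noise in diffusion training.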
Links mentioned:
- BarraHome/Mistroll-7B-v2.2 · Hugging Face: no description found
- Linear Regression From Scratch In Python: Learn the implementation of linear regression from scratch in pure Python. Cost function, gradient descent algorithm, training the model…
- GitHub - CodeExplode/MyTrainer: A simple Stable Diffusion 1.5 Finetuner for experimentation: A simple Stable Diffusion 1.5 Finetuner for experimentation - CodeExplode/MyTrainer
- GitHub - yjg30737/pyqt-assistant-v2-example: OpenAI Assistant V2 Manager created with PyQt (focused on File Search functionality): OpenAI Assistant V2 Manager created with PyQt (focused on File Search functionality) - yjg30737/pyqt-assistant-v2-example
- GitHub - yjg30737/whisper_transcribe_youtube_video_example_gui: GUI Showcase of using Whisper to transcribe and analyze Youtube video: GUI Showcase of using Whisper to transcribe and analyze Youtube video - yjg30737/whisper_transcribe_youtube_video_example_gui
- EMA: refactor to support CPU offload, step-skipping, and DiT models | pixart: reduce max grad norm by default, forcibly by bghira · Pull Request #521 · bghira/SimpleTuner: no description found
- CaptionEmporium/coyo-hd-11m-llavanext · Datasets at Hugging Face: no description found
- Featherless - Serverless LLM: Featherless - The latest LLM models, serverless and ready to use at your request.
- Featherless AI - Run every 🦙 AI model & more from 🤗 huggingface | Product Hunt: Featherless is a platform to use the very latest open source AI models from Hugging Face. With hundreds of new models daily, you need dedicated tools to keep with the hype. No matter your use-case, fi...
HuggingFace ▷ #reading-group (5 messages):
- Chad plans reasoning with LLMs discussion: A member announced plans to discuss "reasoning with LLMs" next Saturday and received enthusiastic support. He felt most confident about this topic and chose it over Triton.
- Readying for “Understanding the Current State of Reasoning with LLMs”: Chad stated he would start with the paper Understanding the Current State of Reasoning with LLMs arXiv link and referenced an accompanying Medium article article link.
- Exploring Awesome-LLM-Reasoning repositories: He mentioned diving into repositories like Awesome-LLM-Reasoning and another repository with the same name alternative repository link to explore the current state of LLMs for logic.
- Survey Paper Mentioned: Chad plans to go through the beginning of Natural Language Reasoning, A Survey survey PDF and reference papers published post-GPT-4 launch GPT-4 research link.
- Seeking long-term planning papers: He expressed interest in learning about good long-term planning papers for LLMs, particularly those focused on pentesting.
Links mentioned:
- Emergent Abilities of Large Language Models: Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we...
- Understanding the Current State of Reasoning with LLMs: The goal of this article is to go through the repos of Awesome-LLM-Reasoning and Awesome-LLM-reasoning for an understanding of the current…
HuggingFace ▷ #computer-vision (9 messages🔥):
- Pricing Performance for OCR Models: Members are seeking recommendations for a good price-to-performance model for OCR that outputs in JSON. This highlights ongoing quests for cost-effective AI solutions.
- Stable Faces, Changing Hairstyles Video: A video showing a model where "faces almost remained constant but the hairstyle kept changing" sparked curiosity about which model achieved this. The video can be found here.
- Unsupported Image Type RuntimeError: A user encountered a "RuntimeError: Unsupported image type, must be 8bit gray or RGB image." This occurred during the encoding process of images for face recognition, with code provided for debugging.
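That RuntimeError usually means the array passed to the face-recognition library is float-valued, grayscale, or carries an alpha channel; a hedged NumPy sketch of the typical fix (the user's actual loading code is not shown in the summary):

```python
import numpy as np

def to_8bit_rgb(img):
    """Coerce an image array into the uint8 RGB layout face libraries expect."""
    if img.dtype != np.uint8:
        # assume floats in [0, 1]; rescale and clip into the 8-bit range
        img = np.clip(img * 255.0, 0, 255).astype(np.uint8)
    if img.ndim == 2:
        # grayscale -> RGB by stacking the single channel three times
        img = np.stack([img] * 3, axis=-1)
    elif img.shape[-1] == 4:
        img = img[..., :3]       # drop an alpha channel if present
    return img

gray_float = np.random.default_rng(1).random((32, 32))  # float64 grayscale
fixed = to_8bit_rgb(gray_float)
print(fixed.dtype, fixed.shape)  # uint8 (32, 32, 3)
```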
Link mentioned: Tweet from Science girl (@gunsnrosesgirl3): The evolution of fashion using AI
HuggingFace ▷ #NLP (1 messages):
capetownbali: Let us all know how your fine tuning on LLama goes!
HuggingFace ▷ #diffusion-discussions (2 messages):
- Redirect to diffusion-discussions channel: A user advised, "Your best bet is to ask here" for further discussions on the related topic.
- Inquiry about audio conversion models: A member inquired about the availability of models for audio-to-audio conversion, specifically from Urdu/Hindi to English, indicating a need for multilingual processing capabilities.
Unsloth AI (Daniel Han) ▷ #general (376 messages🔥🔥):
- Cossale eagerly awaits Unsloth's release: They requested early access and were informed by theyruinedelise that the video would be filmed the next day. They can watch a temporary recording in the meantime.
- Feedback on Thumbnails and Flowcharts: Cossale suggested changes to the thumbnail for clarity, prompting theyruinedelise to update it from "csv -> unsloth + ollama" to "csv -> unsloth -> ollama". They also advised adding descriptive text below logos for beginner users.
- Gigantic VRAM discussions impress: Members discussed Phison's impressive PCIe-NVMe card that presents itself as 1TB of VRAM, and its performance implications. Fimbulvntr shared a YouTube video explaining the tech.
- Excitement around extended LLMs: Fimbulvntr succeeded in extending Llama-3-70b’s context to 64k, and iron_bound debated performance implications of VRAM expansion. The conversation touched on various large model updates and their potential impacts.
- Upcoming releases and resources in the community: Theyruinedelise announced the Ollama update set for Monday or Tuesday including CSV file support. Additionally, Sebastien's fine-tuned emotional llama model and its supportive resources are now available on Ollama and YouTube.
Links mentioned:
- Introducing Lamini Memory Tuning: 95% LLM Accuracy, 10x Fewer Hallucinations | Lamini - Enterprise LLM Platform: no description found
- Get DOLPHIN on Uniswap: no description found
- Tweet from Unsloth AI (@UnslothAI): Tomorrow we will be handing out our new stickers for the @aiDotEngineer World's Fair! 🦥 Join us at 9AM, June 25 where we will be doing workshops on LLM analysis + technicals, @Ollama support & m...
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning: Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models. In this paper, we analyze the impact of low-rank updating, as implemented in LoRA. Our findings sugge...
- Tweet from Kearm (@Nottlespike): http://x.com/i/article/1805030133478350848
- Tweet from RomboDawg (@dudeman6790): Announcing Replete-Coder-Qwen2-1.5b An uncensored, 1.5b model with good coding performance across over 100 coding languages, open source data, weights, training code, and fully usable on mobile platfo...
- Emotions in AI: Fine-Tuning, Classifying, and Reinforcement Learning: In this video we are exploring the creation of fine-tuning dataset for LLM's using Unsloth and Ollama to train a specialized model for emotions detection.You...
- Tell 'im 'e's dreamin': Some clips from the movie The Castle.
- AI and Unified Memory Architecture: Is it in the Hopper? Is it Long on Promise, Short on Delivery?: Sit back, relax and enjoy the soothing sounds of Wendell's rambleing. This episode focuses on the MI 300a/x and Nvidia Grace Hopper. Enjoy!******************...
- LlamaCloud: no description found
- Noice Nice GIF - Noice Nice Click - Discover & Share GIFs: Click to view the GIF
- SebLLama-Notebooks/Emotions at main · sebdg/SebLLama-Notebooks: Contribute to sebdg/SebLLama-Notebooks development by creating an account on GitHub.
- GitHub - Unstructured-IO/unstructured: Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.: Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines. - GitHub - Unstructured-IO/unstructured: Open source librar...
- GitHub - datamllab/LongLM: [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning: [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning - datamllab/LongLM
- Google Colab: no description found
- sebdg/emotional_llama: Introducing Emotional Llama, the model fine-tuned as an exercise for the live event on Ollama discord channer. Designed to understand and respond to a wide range of emotions.
- Replete-AI/Replete-Coder-Qwen2-1.5b · Hugging Face: no description found
- Replete-AI/Adapter_For_Replete-Coder-Qwen2-1.5b · Hugging Face: no description found
Unsloth AI (Daniel Han) ▷ #random (108 messages🔥🔥):
- Logitech mouse and ChatGPT wrapper: A member discussed using a Logitech mouse with a “cool” ChatGPT wrapper that can be programmed with basic queries such as summarizing and rewriting text. They shared a link to show the UI of this setup.
Links mentioned:
- Hallucination is Inevitable: An Innate Limitation of Large Language Models: Hallucination has been widely recognized to be a significant drawback for large language models (LLMs). There have been many works that attempt to reduce the extent of hallucination. These efforts hav...
- Reddit - Dive into anything: no description found
- GitHub - PygmalionAI/aphrodite-engine: PygmalionAI's large-scale inference engine: PygmalionAI's large-scale inference engine. Contribute to PygmalionAI/aphrodite-engine development by creating an account on GitHub.
- ChatGPT is bullshit - Ethics and Information Technology: Recently, there has been considerable interest in large language models: machine learning systems which produce human-like text and dialogue. Applications of these systems have been plagued by persist...
Unsloth AI (Daniel Han) ▷ #help (228 messages🔥🔥):
- Installation Woes with Xformers on Windows: One user struggled to install xformers on Windows when setting up Unsloth via conda, encountering a "PackagesNotFoundError." Another suggested that the challenges may be due to platform compatibility, prompting discussions about whether Unsloth works better on Linux.
- Trouble Importing FastLanguageModel in Colab: Users reported issues with importing `FastLanguageModel` in Unsloth’s Google Colab notebooks. A workaround suggested was ensuring all initial cells, particularly those installing Unsloth, are executed properly.
- Results Varying Based on Token Expiration: One user solved their issues by changing their Google account, identifying that an expired token in Colab secrets was causing problems, particularly around accessing datasets and downloading models.
- Using Huggingface Tokens: A user discovered that adding a Huggingface token fixed access issues, prompting confusion as models were meant to be public. The general sentiment was that inconsistencies in Huggingface access could be at play.
- Running Unsloth with Docker and Jupyter: There was a discussion about setting up Unsloth on NVIDIA GPU Cloud (NGC) containers with compatibility issues noted for specific CUDA and PyTorch versions. A solution involved trying different containers and careful installation of dependencies like xformers and bitsandbytes, with users sharing their Dockerfile configurations.
Links mentioned:
- PyTorch Release 24.05 - NVIDIA Docs: no description found
- Home: Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
- unsloth (Unsloth AI): no description found
- GitHub - srush/Triton-Puzzles: Puzzles for learning Triton: Puzzles for learning Triton. Contribute to srush/Triton-Puzzles development by creating an account on GitHub.
- Sao10K/Claude-3-Opus-Instruct-15K · Datasets at Hugging Face: no description found
- I got unsloth running in native windows. · Issue #210 · unslothai/unsloth: I got unsloth running in native windows, (no wsl). You need visual studio 2022 c++ compiler, triton, and deepspeed. I have a full tutorial on installing it, I would write it all here but I’m on mob...
- Google Colab breaks · Issue #243 · unslothai/unsloth: I am getting the below error while trying to import the FastLangugeModel from unsloth while using an A100 GPU on colab. Failed to import transformers.integrations.peft because of the following erro...
- Google Colab: no description found
- Google Colab: no description found
- Google Colab: no description found
- Google Colab: no description found
- CUDA_VISIBILE_DEVICES not functioning · Issue #660 · unslothai/unsloth: I saw error message when I am trying to do supervised fine tuning with 4xA100 GPUs. So the free version cannot be used on multiple GPUs? RuntimeError: Error: More than 1 GPUs have a lot of VRAM usa...
- unsloth/unsloth/models/llama.py at main · unslothai/unsloth: Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
- Package repository for pytorch :: Anaconda.org: no description found
- Package repository for nvidia :: Anaconda.org: no description found
- Package repository for xformers :: Anaconda.org: no description found
Unsloth AI (Daniel Han) ▷ #showcase (1 messages):
- Blog on Apple and Meta partnership stirs conversation: An AI enthusiast shared a blog post titled Apple and Meta Partnership: The Future of Generative AI in iPhones. The article discusses the implications, benefits, and challenges of integrating generative AI models into Apple's AI system, generating interest in the potential impact on the tech landscape.
Link mentioned: Apple and Meta Partnership: The Future of Generative AI in iPhones: Recent discussions between Apple and AI companies like Meta regarding partnerships to integrate generative AI models into Apple's AI system for iPhones have generated significant interest. This articl...
Stability.ai (Stable Diffusion) ▷ #general-chat (583 messages🔥🔥🔥):
- Discord Bot Advertisement Gone Wrong: A member shared a bot link, claiming it integrates with Gemini for chat assistance and StabilityAI for text-to-image generation. Others criticized the link's lack of context and its potential safety issues.
- Civitai and SD3 Licensing Drama: There was a heated debate over Civitai removing SD3 resources due to licensing concerns. One member argued this was done in response to potential legal issues, while others found the justification dubious.
- Stable Diffusion on Low-End GPUs: Multiple members discussed the challenges of running Stable Diffusion on low-spec machines. Suggestions included using automatic1111 and adjusting settings such as step count and resolution; members also debated the effectiveness of older GPUs versus newer ones like the RTX 4080.
- Training and Technical Discussions: Members asked for advice on training models and handling errors, including issues with metadata and VRAM allocation. Recommendations were given to join specific training servers or use tools like ComfyUI and OneTrainer for better management.
- Misunderstood Model Integrations: Users discussed compatibility issues between different model architectures, particularly between SD 1.5, SDXL, and ControlNet modules. The significance of matching model types with their appropriate extensions was highlighted to avoid errors and improve performance.
Links mentioned:
- no title found: no description found
- Discord - Group Chat That’s All Fun & Games: Discord is great for playing games and chilling with friends, or even building a worldwide community. Customize your own space to talk, play, and hang out.
- Green Code: 01001000 01101001 00100001 00100000 01001001 00100000 01101101 01100001 01101011 01100101 00100000 01110110 01101001 01100100 01100101 01101111 01110011 00100000 01100001 01100010 01101111 01110101 01...
- Stable Diffusion 3 Medium - a Hugging Face Space by stabilityai: no description found
- Alfredo Canziani: Music, math, and deep learning from scratch
- FIFO Diffusion tests: As the title says rendered in FIFO Diffusion. 4 1/2h render time for both clips total on a 4090. Kinda underwhelming . Will give it an other chance.
- Advanced Style transfer with the Mad Scientist node: We are talking about advanced style transfer, the Mad Scientist node and Img2Img with CosXL-edit. Upgrade the IPAdapter extension to be able to use all the n...
- Hot Sweating GIF - Hot Sweating Melting - Discover & Share GIFs: Click to view the GIF
- Well, This Is Shit: Thomas Benjamin Wild Esq · Song · 2021
- lllyasviel/sd-controlnet-canny at main: no description found
- Download the latest official NVIDIA drivers: Download the latest official NVIDIA drivers
- PyTorch: no description found
- List of Aesthetics: If you need assistance with identifying your aesthetic or creating a moodboard, feel free to ask questions in the Discussion Tab (in the pull-down bar of the "Explore" tab at the top of the ...
- lllyasviel/sd_control_collection at main: no description found
- TypeError: list indices must be integers or slices, not str: I've got two lists that I want to merge into a single array and finally put it in a csv file. How I can avoid this error : def fill_csv(self, array_urls, array_dates, csv_file_path): ...
- stable-diffusion-webui/requirements_versions.txt at master · AUTOMATIC1111/stable-diffusion-webui: Stable Diffusion web UI. Contribute to AUTOMATIC1111/stable-diffusion-webui development by creating an account on GitHub.
- 0002 - Pony - v3.1alt | Stable Diffusion Checkpoint | Civitai: differences between 0001: Higher saturation Brighter More dynamic uses 2 loras trained by me: - QEW: https://civitai.com/models/470285/qew-quasarca...
- GitHub - Nerogar/OneTrainer: OneTrainer is a one-stop solution for all your stable diffusion training needs.: OneTrainer is a one-stop solution for all your stable diffusion training needs. - Nerogar/OneTrainer
- Feature request: Option to run CodeFormer and/or GFPGAN automatically again after upscale · Issue #1151 · AUTOMATIC1111/stable-diffusion-webui: Is your feature request related to a problem? Please describe. I've noticed that it seems GFPGAN and CodeFormer run before the upscaling happens, which results in a bit of a blurred resolution in ...
- [Feature Request]: Offline Mode · Issue #11518 · AUTOMATIC1111/stable-diffusion-webui: Is there an existing issue for this? I have searched the existing issues and checked the recent builds/commits What would your feature do ? Have an option to download all files that could be reques...
- GitHub - lucidrains/mmdit: Implementation of a single layer of the MMDiT, proposed in Stable Diffusion 3, in Pytorch: Implementation of a single layer of the MMDiT, proposed in Stable Diffusion 3, in Pytorch - lucidrains/mmdit
- ABS Aquilon Aqua Gaming PC - Windows 11 Home - Intel Core i7 14th Gen 14700KF - GeForce RTX 4060 Ti 16GB - DLSS 3 - AI-Powered Performance - 32GB DDR5 6000MHz - 1TB M.2 NVMe SSD - AQA14700KF4060TI16G - Newegg.com: Buy ABS Aquilon Aqua Gaming PC - Windows 11 Home - Intel Core i7 14th Gen 14700KF - GeForce RTX 4060 Ti 16GB - DLSS 3 - AI-Powered Performance - 32GB DDR5 6000MHz - 1TB M.2 NVMe SSD - AQA14700KF4060TI...
- Don't ask to ask, just ask: no description found
- Civitai Link | One-click install Stable Diffusion models: Directly download any models from Civitai to your Stable Diffusion instance.
- Update on SD3 on Civitai | Civitai: Standard disclaimer; This post does not constitute legal advice. How you interact with SAI and their product is up to you. You should seek your own...
- Stable Diffusion 3: no description found
- stabilityai/stable-diffusion-3-medium · Hugging Face: no description found
- SD3 IS HERE!! ComfyUI Workflow.: SD3 is finally here for ComfyUI!Topaz Labs: https://topazlabs.com/ref/2377/HOW TO SUPPORT MY CHANNEL-Support me by joining my Patreon: https://www.patreon.co...
- Deep Learning Fundamentals - Lightning AI: Deep Learning Fundamentals is a free course on learning deep learning using a modern open-source stack.
- Introduction - Hugging Face NLP Course: no description found
- HuggingFace: HuggingFace is on a mission to solve Natural Language Processing (NLP) one commit at a time by open-source and open-science. Our youtube channel features tutorials and videos about Machine Learning, ...
- Whose art is this, really? Inside Canadian artists’ fight against AI: Visual artists’ work is being gathered online and used as fodder for computer imitations. When Toronto’s Sam Yang complained to an AI platform, he got an email he says was meant to taunt h...
CUDA MODE ▷ #general (17 messages🔥):
- Beginners questioning working group contributions: A new member asked how to contribute to working groups, wondering if monitoring GitHub repositories is sufficient or if a more formal method exists.
- Register usage in complex kernels: A member shared debugging strategies for a kernel using too many registers per thread, suggesting either commenting out code parts or examining SASS in Nsight Compute.
- Announcing CUTLASS working group: A member proposed forming a working group to create learning materials for CUTLASS, inviting others to express interest and prepare by reviewing a YouTube talk on Tensor Cores.
- CPU cache insights: A member shared a CPU-centric guide on computer cache, emphasizing the importance of understanding cache for programmers.
Links mentioned:
- Lecture 23: Tensor Cores): Slides: https://drive.google.com/file/d/18sthk6IUOKbdtFphpm_jZNXoJenbWR8m/view?usp=drive_link
- Exploring How Cache Memory Really Works: Even though we often hear terms like L1, L2, cache block size, etc., most programmers have a limited understanding of what cache really is. This is a beginner-friendly primer on how cache works.
CUDA MODE ▷ #torch (4 messages):
- INT4 LoRA fine-tuning vs QLoRA: A user inquired about the differences between INT4 LoRA fine-tuning and QLoRA in terms of accuracy and speed. Another member explained that QLoRA with HQQ involves frozen quantized weights, does not use tinygemm, and instead dequantizes and calls torch.matmul, since tinygemm is inefficient for large sequences.
- Performance and Speed in QLoRA: It's mentioned that QLoRA maintains good quality and fast performance, especially when a CUDA dequant kernel (axis=0) is implemented. A separate contribution was noted where a user created a fused GEMM for int4, which is effective for training with fixed sequence lengths, providing the fastest solution.
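The frozen-quantized-weights pattern described above (dequantize, then matmul) can be sketched in plain Python. This is a minimal illustration under simplified assumptions: the 4-bit affine scheme and helper names here are made up for the example and are not the HQQ or tinygemm implementation.

```python
# Minimal sketch of the QLoRA-style "frozen quantized weights" idea:
# weights are stored quantized (here: 4-bit affine), and each forward
# pass dequantizes them before an ordinary matmul. Plain-Python
# illustration only -- not the HQQ/tinygemm code discussed above.

def quantize_int4(w):
    """Affine-quantize a list of floats to 4-bit integers (0..15)."""
    lo, hi = min(w), max(w)
    scale = (hi - lo) / 15 if hi != lo else 1.0
    q = [round((x - lo) / scale) for x in w]
    return q, scale, lo  # quantized values + metadata needed to invert

def dequantize(q, scale, zero):
    return [v * scale + zero for v in q]

def matmul_row(w_deq, x):
    """Dot product standing in for torch.matmul on one weight row."""
    return sum(a * b for a, b in zip(w_deq, x))

w = [0.1, -0.5, 0.3, 0.8]
q, scale, zero = quantize_int4(w)       # done once; weights stay frozen
w_deq = dequantize(q, scale, zero)      # done per forward pass
y = matmul_row(w_deq, [1.0, 2.0, 3.0, 4.0])  # close to the fp result 3.2
```

The dequantization error per weight is bounded by the scale, which is why the accuracy gap to full-precision LoRA stays small.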
CUDA MODE ▷ #cool-links (1 messages):
- Measure Bandwidth, Throughput, Latency with NVIDIA tools: A member shared a detailed GitHub guide on how to measure bandwidth, throughput, and latency using NVIDIA tools. The guide provides step-by-step instructions for performance analysis and optimization.
Link mentioned: Guide-NVIDIA-Tools/Chapter09 at main · CisMine/Guide-NVIDIA-Tools: Contribute to CisMine/Guide-NVIDIA-Tools development by creating an account on GitHub.
CUDA MODE ▷ #jobs (1 messages):
- Internship Seeker with AI and CUDA Skills: A member from Vietnam seeks a remote internship in AI and CV focusing on CUDA optimization. They shared their experience and two GitHub repositories: Parallel-Computing-Cuda-C and Guide-NVIDIA-Tools.
Link mentioned: GitHub - CisMine/Parallel-Computing-Cuda-C: Contribute to CisMine/Parallel-Computing-Cuda-C development by creating an account on GitHub.
CUDA MODE ▷ #beginner (3 messages):
- Seeking AI/ML Fundamentals: A member asked for recommendations on good courses for learning fundamentals in AI/ML on platforms like Coursera. Another member inquired about their background in programming, computer science, or math to suggest appropriate resources.
CUDA MODE ▷ #torchao (28 messages🔥):
- Precision Loss in FP8 Conversion Discussed: Members discussed how PyTorch follows the IEEE convention for rounding in FP8 conversions, addressing precision loss and suggesting that scaling tensors could minimize this loss. One member mentioned that scaling ensures more effective use of the GPU's range (link).
- Floating-Point Precision Explained: Floating-point precision issues were a hot topic, and a member shared the floating-point-gui.de as a resource for understanding unexpected precision errors in numerical outputs.
- Scaling for FP8 Precision: Several members debated how to determine scaling factors for tensor conversion to FP8, with some suggesting to base it on min/max values or other metrics to avoid overflow and underflow (link).
- Quantization Learning Resources Shared: For those looking to understand quantization better, members recommended various resources including a GitHub list of papers and educational YouTube videos (Quantization explained and Advanced Quantization).
- FP8 Scaling Updates: One member mentioned recent updates to PyTorch, now supporting row-wise scaling for FP8 conversion and hinted at upcoming posts for community discussion.
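The amax-based scaling discussed above can be made concrete with a small sketch. It assumes e4m3's maximum finite value of 448; the helper names are illustrative, not taken from float8_experimental.

```python
# Sketch of per-tensor scaling before FP8 (e4m3) conversion: scale the
# tensor so its largest magnitude lands at the format's max value, so
# more of the representable range is used, and keep the scale around to
# invert the mapping afterwards. Illustrative only.

E4M3_MAX = 448.0  # largest finite value in float8 e4m3

def compute_scale(tensor, fp8_max=E4M3_MAX):
    amax = max(abs(v) for v in tensor)
    return fp8_max / amax if amax > 0 else 1.0

def scale_for_fp8(tensor):
    s = compute_scale(tensor)
    scaled = [v * s for v in tensor]  # now within [-fp8_max, fp8_max]
    return scaled, s  # store s to recover original magnitudes later

t = [0.002, -1.5, 0.75]
scaled, s = scale_for_fp8(t)
# max magnitude now sits at (or within float error of) the e4m3 limit
assert abs(max(abs(v) for v in scaled) - E4M3_MAX) < 1e-6
```

Basing the scale on the running or delayed amax rather than the instantaneous one is the kind of trade-off the row-wise-scaling discussion above is about.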
Links mentioned:
- Scaled_FP8.md: GitHub Gist: instantly share code, notes, and snippets.
- Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training: In this video I will introduce and explain quantization: we will first start with a little introduction on numerical representation of integers and floating-...
- Lecture 7 Advanced Quantization: Slides: https://www.dropbox.com/scl/fi/hzfx1l267m8gwyhcjvfk4/Quantization-Cuda-vs-Triton.pdf?rlkey=s4j64ivi2kpp2l0uq8xjdwbab&dl=0
- GitHub - cuda-mode/awesomeMLSys: An ML Systems Onboarding list: An ML Systems Onboarding list. Contribute to cuda-mode/awesomeMLSys development by creating an account on GitHub.
- pytorch/c10/util/Float8_e4m3fn.h at f42d5b6dca75ee020355fc75532347ca2734b117 · pytorch/pytorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration - pytorch/pytorch
- Float stored in 8 bits - ONNX 1.17.0 documentation: no description found
- The Floating-Point Guide - What Every Programmer Should Know About Floating-Point Arithmetic: no description found
- Visualising ML number formats: A visualisation of number formats for machine learning --- I couldn’t find any good visualisations of machine learning number formats online, so I decided to make one. It’s interactive, and hopefully ...
- float8_experimental/float8_experimental/float8_utils.py at d4ade877dff327ea7f51e91f7cc218ae956e8cfd · pytorch-labs/float8_experimental: This repository contains the experimental PyTorch native float8 training UX - pytorch-labs/float8_experimental
- float8_experimental/test/test_base.py at d4ade877dff327ea7f51e91f7cc218ae956e8cfd · pytorch-labs/float8_experimental: This repository contains the experimental PyTorch native float8 training UX - pytorch-labs/float8_experimental
CUDA MODE ▷ #off-topic (18 messages🔥):
- Valorant account locked for associating with a cheater: A user's friend got her Valorant account locked for 180 days because she queued with someone who was cheating. "I told her to go through support but she's getting desperate so I figured it was worth mentioning."
- Anxiety over account lock: The friend was anxious and only waited an hour for support before seeking further help. "I told her to wait for now."
- Region and details provided: The user mentioned that the affected friend is located in California and plays Valorant. "She's in California, she just told me."
- Response from support query: A respondent mentioned the possibility of looking into the issue but noted that there might not be much they can do. "I think the answer is 'nothing really' LOL"
- Replay review and appropriate bans: Assurance was given that replays would be watched to make sure bans are appropriate. "They'll watch the replay and do the bans appropriately though!"
CUDA MODE ▷ #hqq (2 messages):
- Running torchao_int4_demo.py produces nonsense output: One member reported getting meaningless output like "Unterscheidung Hinweis Unterscheidung Einzeln Unterscheidung Unterscheidung ..." when trying to run `torchao_int4_demo.py`. They mentioned the only change was setting `compile=None` and sought help from another member, who asked whether the issue occurs with all models and suggested trying `axis=0`.
CUDA MODE ▷ #llmdotc (465 messages🔥🔥🔥):
- Plan for NCCL Initialization: A member proposed a plan to use MPI to initialize NCCL and fallback to the file system or TCP sockets if MPI is unavailable. They aimed to keep GPU computations in CUDA to ensure stability and performance.
- H100 vs A100 Training Stability: Members discussed the instability in the training on H100 GPUs compared to A100 GPUs, with H100 experiencing "exploding" gradients around 28K steps. One suggested copying computations to GPU to avoid this issue.
- CUDA and Multi-node Setup: Significant efforts were made to test multi-node setups using different methods such as MPI, slurm, and TCP sockets. The discussions included refinements necessary to ensure all nodes work well together without significant overhead.
- Integrating FP8 Matmuls: A member described integrating FP8 matmuls and observed marginal performance increases. They shared detailed challenges and strategies related to FP8 tensor cores and optimizing rescaling and transposing operations.
- Preparation for Cluster Training: Plans were discussed to try training large language models on a new Lambda cluster, aiming to complete significant training milestones faster. This included ensuring cost efficiency and verifying the stability of the training runs on different hardware setups.
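The MPI-first fallback chain proposed above can be sketched as a simple selection routine. This is purely illustrative pseudologic under assumed availability flags; the real implementation in llm.c PR #632 differs.

```python
# Sketch of the NCCL-init fallback discussed above: prefer MPI to
# broadcast the NCCL unique ID, fall back to a shared file system,
# then to TCP sockets. Illustrative selection logic only; not the
# actual llm.c code.

def pick_nccl_init(mpi_available, shared_fs_available):
    if mpi_available:
        return "mpi"         # broadcast the unique ID over MPI
    if shared_fs_available:
        return "filesystem"  # rank 0 writes the ID to a shared file
    return "tcp"             # fall back to a TCP socket handshake

assert pick_nccl_init(True, True) == "mpi"
assert pick_nccl_init(False, True) == "filesystem"
assert pick_nccl_init(False, False) == "tcp"
```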
Links mentioned:
- Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling: Large language models are trained on massive scrapes of the web, which are often unstructured, noisy, and poorly phrased. Current scaling laws show that learning from such data requires an abundance o...
- WIP Distribution Visualisation to help with FP8 work & beyond by ademeure · Pull Request #618 · karpathy/llm.c: Not ready for integration at all / still very hacky, bunch of unsolved issues I am not sure where code should go etc.: need to find a way to make it pollute the code less with all of those generat...
- Socket server/client interface by chinthysl · Pull Request #633 · karpathy/llm.c: Dummy PR to make use of the distributed interface in PR #632
- FlexNet 11.19.5 build on Visual Studio 2015: Hi all, I am trying to build my app with FlexNet 11.19.5. I am facing some compiler issues (Visual Studio 2015):c:\program files (x86)\windows kits\8.1\include\shared\ws2def.h(100): warning C4005: '...
- MPI/TCP/FS for NCCL-init by gordicaleksa · Pull Request #632 · karpathy/llm.c: Instead of mixing NCCL & Open MPI during training: let's transition to using only NCCL. To the best of my knowledge there are no downsides here, they're equivalent and speedwise i couldn'...
CUDA MODE ▷ #rocm (2 messages):
- PCIe limitations discussed: Members discussed how PCIe has power, weight, and pin limits when it comes to communication. One member noted that the main reason for not creating lower-spec products is focus on selling high-end servers which are more profitable.
- Big players targeted: Another member speculated that the company is primarily targeting big players like cloud GPU providers. This aligns with their current product strategy which maximizes revenue.
CUDA MODE ▷ #bitnet (25 messages🔥):
- Debugging Bitnet Tensor Issue: Members faced an issue with Bitnet tensors while running a trainable network, encountering an error due to a dimension not divisible by 4. An error traceback was shared indicating an `AssertionError` caused by Bitnet dispatch attempting an unsupported `aten.to.dtype_layout` operation.
- Updated Test Script and Repo Link: An updated test script was linked to CoffeeVampir3's GitHub to use the new library paths. CoffeeVampir3 also shared the main repository link here.
- Affine Quantization Discussion: Vayuda and Jerry discussed the potential integration of Bitnet tensors into AffineQuantizedTensor, considering creating a new layout for packed tensors which would indicate the currently packed dimension. Jerry emphasized that bit (uint1) tensors should remain separate but compatible with affine quantized tensors.
- Seeking Assistance and Minimal Repro Request: Marksaroufim requested a minimal reproducible example to debug the dtype conversion issue in Bitnet tensors. CoffeeVampir3 provided the link to the test script to facilitate this debugging process.
- New Tutorials and Tensor Subclassing Ideas: Marksaroufim suggested new tutorials on the PyTorch ao library, highlighting the library's potential to handle quantized optimizers and kv caches. Gau.nernst and Vayuda discussed the absence of progress on fp5 and the potential interest in integrating 8-bit Adam with tensor subclasses.
Link mentioned: The next tutorials · Issue #426 · pytorch/ao: From our README.md torchao is a library to create and integrate high-performance custom data types layouts into your PyTorch workflows And so far we've done a good job building out the primitive d...
LM Studio ▷ #💬-general (312 messages🔥🔥):
- GPU VRAM limits model capabilities: Discussions highlighted limitations in loading large models like Command R (34b) Q4_K_S on GPUs with limited VRAM, resulting in reduced token context windows and hindered usability. Various members recommended looking into alternative formats like EXL2 which are more VRAM-efficient for models.
- Interest in server setup and headless operation: Users expressed interest in running LM Studio on remote servers and headless setups for better hardware utilization. Suggestions included exploring llama.cpp for server setups and noting that LM Studio does not support direct remote or headless operations.
- Text-to-text dominant focus and model customization: Members discussed the limited capabilities of LM Studio to only handle text-to-text interactions, with no support for image generation or text-to-speech features. Some users mentioned alternative frontends like SillyTavern but acknowledged its RP/character focus, highlighting the need for more versatile options.
- Optimizing cooling for P40 GPUs: There were troubleshooting tips shared on GPU cooling, especially around P40 GPUs. Users noted the importance of adequate cooling solutions and shared experiences like crafting custom air ducts to manage GPU temperatures more effectively.
- Exploring various language models for coding: Discussions involved finding the best language models for coding tasks, with mentions of models like Codestral 22B. Members highlighted the importance of model size and quantization, recommending Q5 or Q6 quants for optimal performance given specific hardware constraints.
Links mentioned:
- README.md · artificialguybr/ColoringBookRedmond-V2 at main: no description found
- Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B (https://arxiv.org/abs/2406.07394): This paper introduces the MCT Self-Refine (MCTSr) algorithm, an innovative integration of Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS), designed to enhance performance in complex m...
- GitHub: Let’s build from here: GitHub is where over 100 million developers shape the future of software, together. Contribute to the open source community, manage your Git repositories, review code like a pro, track bugs and fea...
- bartowski/Codestral-22B-v0.1-GGUF · Hugging Face: no description found
- Confused Computing GIF - Confused Computing Counting - Discover & Share GIFs: Click to view the GIF
- configs/Extension-Pack-Instructions.md at main · lmstudio-ai/configs: LM Studio JSON configuration file format and a collection of example config files. - lmstudio-ai/configs
- GitHub - theroyallab/tabbyAPI: An OAI compatible exllamav2 API that's both lightweight and fast: An OAI compatible exllamav2 API that's both lightweight and fast - theroyallab/tabbyAPI
LM Studio ▷ #🤖-models-discussion-chat (116 messages🔥🔥):
- Hermes 2 Theta Llama-3 amazed users: Members praised the Hermes 2 Theta Llama-3 70B model for its ability to remember context up to 19k tokens and effectively follow instructions. One member shared that it might be their top model now due to its deep reasoning and creative capabilities in role-play scenarios. Hermes 2 Theta Llama-3.
- DeepSeek Coder V2 gains popularity: Users discussed the performance and prompt issues of the DeepSeek Coder V2 model, recommending using specific prompt presets to avoid unexpected output in Chinese. One user highlighted how this model outperformed GPT4o for tasks related to C# coding. DeepSeek Coder V2.
- Llama 3 CursedStock models intrigue: Members expressed curiosity and amusement at the unusual naming and performance of Llama 3 CursedStock V1.8-8B, sharing that it fits its quirky name by merging uncensored models. There were also discussions about how well it performs in niche roles such as specific story-writing and generating creative content. Llama-3 CursedStock V1.8-8B.
- Concerns over Temporal Awareness in LLMs: There was a debate about LLMs' inability to handle tasks that require temporal awareness and cause-and-effect reasoning. Users acknowledged the limitations of current AI, emphasizing the need for specialized hardware to achieve genuine general intelligence.
- Experimenting with Quantized Models: Users shared experiences with different quantized models like Q6_K_L and Q8, noting issues with certain builds in handling large context sizes. They also discussed the potential benefits of keeping output tensors and embeddings unquantized for better performance, particularly with the Hathor Fractionate-L3-8B model. Hathor Fractionate-L3-8B.
Links mentioned:
- DeepSeek: Chat with DeepSeek AI.
- PrunaAI/cognitivecomputations-Dolphin-2.9.1-Phi-3-Kensho-4.5B-GGUF-smashed · Hugging Face: no description found
- cognitivecomputations/Dolphin-2.9.1-Phi-3-Kensho-4.5B · Hugging Face: no description found
- mradermacher/New-Dawn-Llama-3-70B-32K-v1.0-GGUF · Hugging Face: no description found
- meta-llama/Meta-Llama-3-8B · Hugging Face: no description found
- PJMixers/LLaMa-3-CursedStock-v1.8-8B · Hugging Face: no description found
- Flash Thumbs Up GIF - Flash Thumbs Up Way To Go - Discover & Share GIFs: Click to view the GIF
- GitHub - RJ-Flash/AGI-Project: The AGI Project aims to develop an Artificial General Intelligence (AGI) system capable of understanding, learning, and applying knowledge across a wide range of tasks at a level comparable to human intelligence. Our goal is to create a system that can perform any intellectual task that a human being can do, with the ability to learn and adapt.: The AGI Project aims to develop an Artificial General Intelligence (AGI) system capable of understanding, learning, and applying knowledge across a wide range of tasks at a level comparable to huma...
- mradermacher/Hermes-2-Theta-Llama-3-70B-32k-i1-GGUF at main: no description found
- Nitral-AI/Hathor_Fractionate-L3-8B-v.05 · Hugging Face: no description found
- bartowski/Hathor_Stable-L3-8B-v0.5-GGUF · Hugging Face: no description found
- TheDrummer (Drummer): no description found
- mradermacher/Halu-8B-Llama3-Blackroot-GGUF · Hugging Face: no description found
- mradermacher/Mistral-7B-Erebus-v3-i1-GGUF · Hugging Face: no description found
- OpenPipe/Hermes-2-Theta-Llama-3-70B-32k · Hugging Face: no description found
LM Studio ▷ #🧠-feedback (4 messages):
- DeepseekV2 Chat Loading Issues: One user mentioned that deepseekV2 cannot be loaded for chat. Another noted that V0.2.25 is required and "auto update currently broken".
- Multi-Model Sequence Proposal: A member proposed a feature for Multi-model setups to "build a sequence map for models" allowing one model to feed information into two parallel models, which then feed into a final model.
- Ubuntu LM Studio Network Error: LM Studio on Ubuntu 22.04 gets a "network error" when trying to search models on Hugging Face. However, the member noted it still works on Mac M1 and the issue appeared after commenting out the ser2net config file for port 3001, used by AnythingLLM web server.
LM Studio ▷ #⚙-configs-discussion (9 messages🔥):
- Estimating the AI setup cost stumps users: A member asked about the budget to set up a machine with the performance of GPT or Bard. Responses indicated that the cost is extremely high, potentially thousands of dollars, depending on the configuration, and not feasible for a typical user.
- NVIDIA DGX GH200 is highlighted: A link to the NVIDIA DGX GH200 was shared, noting that it is used by OpenAI and features large memory capacities designed to handle terabyte-class models. Another member humorously remarked that such setups are out of reach for most people's budgets.
Link mentioned: NVIDIA DGX GH200: Massive memory supercomputing for emerging AI
LM Studio ▷ #🎛-hardware-discussion (18 messages🔥):
- NVlink's absence limits 4000 series GPUs: A member questioned whether the absence of NVlink in 4000 series GPUs would hinder using multiple GPUs for AI purposes. They also queried the potential use of DX or Vulkan multi-GPU features as alternatives.
- Performance on Nvidia P40s in Proxmox setup: A user discussed their new setup with two Nvidia P40s in a server running Proxmox and Debian. They noted power utilization spiked significantly when using Codestral for full GPU offload, achieving 12 tokens/second.
- ROCm 6.1.3 supports multi-GPU: It was shared that AMD released ROCm 6.1.3, which now supports multi-GPU for high-end RDNA3 cards.
- Debate on 16GB RAM for iPad Pro: There was a debate on whether the 16GB RAM version of the iPad Pro is necessary for running large AI models. One member highlighted that quantized models can fit into 16GB on their RTX 4070 Ti Super, but was unsure if this would apply to Apple's hardware.
- Corsair PSU and storage purchase query: A user inquired if purchasing a Corsair AX1600i for €266 and 4 Exos Enterprise 18TB drives for €668 was worth it, receiving no specific feedback.
LM Studio ▷ #🧪-beta-releases-chat (3 messages):
- Llama.cpp model loading error: One member reported a "wrong number of tensors" issue with the error message `done_getting_tensors: wrong number of tensors; expected 356, got 291` while loading the Blombert 3B f16 gguf model. Another suggested the error is due to llama.cpp version incompatibility with LM Studio.
- Context length troubleshooting advice: A common issue with large models such as Blombert 3B was discussed, attributing errors to mismatched context lengths. "Keep ratcheting the context length down until it doesn't lose its mind" was advised as a possible solution.
LM Studio ▷ #avx-beta (1 messages):
cdrivex4: Yes ok.. Sounds like fun
LM Studio ▷ #model-announcements (1 messages):
- Qwen2 500M Model Quantization Update: The latest quantized versions of the Qwen2 500M model have been published. These models are optimized for speedy generation and can even be deployed on lightweight compute machines like a Raspberry Pi. Explore the models here.
LM Studio ▷ #🛠-dev-chat (12 messages🔥):
- Model loading issues frustrate user: One user struggled with loading their model using LMS with a batch script but eventually succeeded. They asked for feedback on their batch script to check for mistakes or streamlining opportunities.
- LMStudio is not open source: A user inquired whether LMStudio is open source and if it could be extended. Another member clarified that it is not open source, leading the user to consider developing their own tools to achieve desired functionalities.
- Dreams of an all-in-one model runner: A discussion touched on the desire for a program capable of running various models from Huggingface, including text to speech, text to image, and more. No existing solution was known, but there was interest in such a project.
OpenAI ▷ #ai-discussions (276 messages🔥🔥):
- GPT-5 Anticipation Builds: Users expressed frustration at OpenAI's delayed feature rollouts, with voice mode and GPT-4 Vision being repeatedly mentioned as overdue. A member stated, "at this point i don't even care when it comes it comes, and ill use it but meh thats just me ofcourse."
- Siri and ChatGPT Integration Debate: Confusion arose over whether ChatGPT is integrated into Siri, with one member clarifying, "no its just like a bonus its not exactly integrated where its reliant on it". Elon Musk's criticism of the integration also sparked conversation.
- Claude vs ChatGPT Performance: Many users discussed the superiority of Claude 3.5 Sonnet over GPT-4o, especially in coding, with one saying, "same things i tried in 4o and where it failed, claude 3.5 did it successfully and more". Benchmarks and specific features like Claude's "artifacts" were frequently mentioned as evidence.
- AI Model Economics and Token Limits: Discussions highlighted comparative aspects of various AI models, including Claude’s 200k tokens versus ChatGPT’s 128k for GPT-4 and 32k for Plus users. One user noted, "Claude 3.5 Sonnet is on the LMSYS leaderboard," emphasizing practical performance over pure benchmarks.
- Persistent Use-Cases for LLMs: A user inquired about how to create a persistent LLM trained on personal documents, asking, "Is there a way to essentially hyper focus one of these LLMs like sonnet 3.5, or gemini 1.5 pro, etc and use personally as my own work-bot?" This sparked significant interest around the potential for customized, long-term AI applications.
Links mentioned:
- Wired: AI startup Perplexity is 'BS machine': Katie Drummond, Wired’s global editorial director, joins 'Squawk Box' to discuss the magazine's investigation into AI search startup Perplexity.
- Computer Stick Man GIF - Computer Stick Man Table Flip - Discover & Share GIFs: Click to view the GIF
OpenAI ▷ #gpt-4-discussions (29 messages🔥):
- GPT-4o connectivity issues resolved: Multiple users reported encountering an error message on GPT-4o stating, "An error occurred connecting to the worker," but it was resolved after a short period. One user confirmed, "seems for me its back working now."
- Screen sharing feature has no ETA: A user inquired about the availability of a screen-sharing feature, to which another user responded that there is no estimated time of arrival (ETA) yet.
- GPT-4o prompt adherence problems: Users discussed issues with GPT-4o where it fails to stick to specified prompt formats and instructions consistently. For instance, it often outputs in markdown despite clear instructions for HTML, and it misinterpreted structured review instructions by reviewing entire documents at once.
- ChatGPT's slow performance and crashes: Users experienced slow performance and frequent crashes while using ChatGPT. One remarked, "yeah, its crashing frequently here too."
- Document length and GPT context window limitations: A user with 1200-page documents faced issues with GPT accurately processing content. Another user explained that ChatGPT’s context window is not sufficient for such large documents and recommended tools like Gemini and Claude for larger token windows.
OpenAI ▷ #prompt-engineering (53 messages🔥):
- Members discuss background removal limitations: A member mentioned that DALL-E only edits its own generations and that ChatGPT offers some image editing capabilities like generating Python scripts for tasks, but struggles with background removal. Another member suggested trying online services for background removal.
- Eager anticipation for Sora launch: A user expressed excitement about Sora's launch, asking for updates. Another member shared that there is no timeline yet but linked to a Sora video generated on the server.
- Creation of fantasy movie plots with AI: A member excitedly shared their fantasy movie ideas being developed with ChatGPT, including a reimagining of The Wizard of Oz. They discussed the use of DALLE to visualize their ideas.
- Troubleshooting ChatGPT's capabilities: Users were troubleshooting ChatGPT's image background removal skills, noting that while it attempts with basic coding, it runs into memory allocation issues with more complex tasks like using the "Deeplab model". The discussion included insights on modifying behavior by adjusting custom instructions.
- Interactive prompts and optimizing responses: A member shared a detailed interactive prompt for building a PC on a budget, and another sought advice on prompts related to cryptocurrency. Additionally, there was interest in improving MyGPT prompts for better response accuracy and reliability, especially in extracting topics and processing uploaded files.
OpenAI ▷ #api-discussions (53 messages🔥):
- Background removal: Dream or reality?: Members discussed attempts to get ChatGPT to perform background removal on images. Despite ChatGPT generating scripts to try this, results were inconsistent due to memory allocation issues when using advanced machine learning tools.
- Sora launch anticipation grows: New users expressed excitement and impatience for the launch of Sora. A member shared a link to a video of a Sora event that generated some buzz on the server.
- DALL-E vs. Midjourney for artworks: Members debated the effectiveness of DALL-E 3 compared to Midjourney for creating AI images, especially for paint-like images. Personal preferences leaned towards DALL-E 3 for its specific artistic styles.
- Fantasy movies and prompt crafting: A user shared their experience using ChatGPT to create movie ideas, specifically a reimagination of "The Wizard of Oz". They sought advice on refining prompts for more accurate and vivid image generation.
- Interactive PC building prompts: A member showcased a creative interactive prompt designed to help users build PCs within a specified budget, incorporating web searches for affordable components and tracking the project's progress using Python.
Perplexity AI ▷ #general (381 messages🔥🔥):
- Wired slams Perplexity for plagiarism: A Wired article accused Perplexity AI of "surreptitiously scraping" websites, violating its own policies. Users discussed it, with some finding the backlash excessive considering AI's common practices with data summarization (source).
- Legal perspectives on AI summarization: Redditors discussed the legal risks of AI summarizing articles inaccurately and potentially making defamatory statements. A Wired observation highlighted Perplexity’s chatbot falsely attributing a crime to a police officer despite linking to the source (archive link).
- Claude 3.5 Sonnet rollout: Perplexity Pro members noted the recent addition of the Claude 3.5 Sonnet model. Initial reactions praised its capabilities but some users criticized it for being overly cautious and limiting (Forbes Article).
- User frustrations and platform reliability: Several users reported issues with Perplexity, including inconsistencies in Pro search results and login problems on the mobile app. One user expressed significant dissatisfaction with the functionality and restriction levels of Claude 3.5 Sonnet.
- Pro search and model usage insights: Discussions revealed frustrations with changes in Pro search's effectiveness and source limits, with users suggesting Perplexity prioritizes partnerships over core improvements. A user noted that Claude's API subscription provides more value compared to competitors (related video).
Links mentioned(https://discord.com/channels/1104757954588196865/1104757955204743201/1253827044463083582)** (33 messages🔥): - **Use ROCm Fork Versions**: Members discussed needing to use the ROCm fork versions of [xformers](https://github.com/ROCm/xformers) and [flash-attention](https://github.com/ROCm/flash-attention) for certain functionalities. One user confirmed that flash-attention support requires ROCm 5.4+, PyTorch 1.12.1+, and MI200 & MI300 GPUs. - **Reward Model Not Effective for Data Generation**: A brief exchange concluded that the reward model isn't worthwhile for generating data, as it primarily classifies data quality. - **Boosting AGI Eval**: One user mentioned plans to synthesize SAT, GRE, and MCAT questions to potentially boost AGI evaluations for smaller models, with suggestions to include LSAT questions as well. - **Epoch Saving Issues**: A user reported issues with epoch saving during training, where it saves at seemingly inconsistent points like 1.05 epochs and then returns to 0.99 epochs. This was recognized as a known but peculiar behavior, possibly related to the steps counter. - **Finetuning on AMD**: Questions were raised about finetuning on AMD hardware, with a response indicating that Eric has experience with this, though it wasn't confirmed if it is a straightforward process. **Links mentioned**: - [GitHub - ROCm/flash-attention: Fast and memory-efficient exact attention](https://github.com/ROCm/flash-attention): Fast and memory-efficient exact attention. Contribute to ROCm/flash-attention development by creating an account on GitHub. - [GitHub - ROCm/xformers: Hackable and optimized Transformers building blocks, supporting a composable construction.](https://github.com/ROCm/xformers): Hackable and optimized Transformers building blocks, supporting a composable construction. 
- ROCm/xformers --- ### **OpenAccess AI Collective (axolotl) ▷ #[axolotl-dev](https://discord.com/channels/1104757954588196865/1104758010959634503/)** (1 messages): lore0012: I am no longer hitting the issue. --- ### **OpenAccess AI Collective (axolotl) ▷ #[general-help](https://discord.com/channels/1104757954588196865/1110594519226925137/1253830860449382578)** (4 messages): - **HeaderTooLarge error in fine-tuning Qwen2 7b**: A member encountered a `safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge` while running `CUDA_VISIBLE_DEVICES="" python -m axolotl.cli.preprocess axolotl/ben_configs/qwen2_first.yaml`. This error occurs when attempting to load checkpoint shards. - **Local directory issues with Qwen2 7b model**: The fine-tuning configuration works when setting `base_model` to a Hugging Face repository but fails when pointing to a local directory (`/large_models/base_models/llm/Qwen2-7B`). The failure persists even though the folder is a mounted NFS. - **Frustration with NVIDIA Megatron-LM bugs**: A user expressed frustration after spending a week trying to get megatron-lm to work, encountering numerous errors. An example of the issues faced can be seen in [GitHub Issue #866](https://github.com/NVIDIA/Megatron-LM/issues/866), which discusses a problem with a parser argument in the `convert.py` script. **Link mentioned**: [[BUG] the argument of parser.add_argument is wrong in tools/checkpoint/convert.py · Issue #866 · NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM/issues/866): Describe the bug [https://github.com/NVIDIA/Megatron-LM/blob/main/tools/checkpoint/convert.py#L115](https://github.com/NVIDIA/Megatron-LM/blob/main/tools/checkpoint/convert.py#L115) It must be 'choices=['GPT', 'BERT'],' not 'choice=['GPT', 'BER... 
--- ### **OpenAccess AI Collective (axolotl) ▷ #[datasets](https://discord.com/channels/1104757954588196865/1112023441386778704/1254518443789648024)** (5 messages): - **Newbie asks about dataset suitability**: A new member experimenting with fine-tuning **llama2-13b** using **axolotl** inquired about dataset formatting and content. They asked, "Would this be an appropriate place to ask about dataset formatting and content?" - **Formatting example for 'Alpaca' dataset**: Another member shared a dataset case using **JSONL** for fine-tuning **Alpaca**. They provided detailed examples, including instructions, input patterns, and expected outputs, and questioned if the LLM could generalize commands like "move to the left" and "move a little to the left." - **Introducing Rensa for high-performance MinHash**: A member excitedly introduced their side project, **Rensa**, a high-performance MinHash implementation in Rust with Python bindings. They claimed it is 2.5-3x faster than existing libraries like `datasketch` for tasks like dataset deduplication and shared its [GitHub link](https://github.com/beowolx/rensa) for community feedback and contributions. **Link mentioned**: [GitHub - beowolx/rensa: High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datasets](https://github.com/beowolx/rensa): High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datasets - beowolx/rensa --- ### **OpenAccess AI Collective (axolotl) ▷ #[axolotl-phorm-bot](https://discord.com/channels/1104757954588196865/1225558824501510164/1254711001174245438)** (5 messages): - **Prompt Style Explained in Axolotl Codebase**: The inquiry about `prompt_style` led to an explanation that it specifies how prompts are formatted for interacting with language models, impacting the performance and relevance of responses. 
Examples such as `INSTRUCT`, `CHAT`, and `CHATML` were detailed to illustrate different prompt structuring strategies for various interaction types. - **Example of ReflectAlpacaPrompter Usage**: The `ReflectAlpacaPrompter` class example highlights how different `prompt_style` values like "instruct" and "chat" dictate the structure of generated prompts. The `match_prompt_style` method is used to set up the prompt template according to the selected style. **Link mentioned**: [OpenAccess-AI-Collective/axolotl | Phorm AI Code Search](https://phorm.ai/query?projectId=1e8ce0ca-5f45-4b83-a0f4-9da45ce8e78b&threadId=4809da1a-b260-413e-bdbe-8b82397846e6): Understand code, faster. --- ### **Mozilla AI ▷ #[announcements](https://discord.com/channels/1089876418936180786/1089876419926032396/1254906057256468573)** (1 messages): - **Llamafile v0.8.7 releases with upgrades**: [Llamafile v0.8.7](https://discord.com/channels/1089876418936180786/1182689832057716778/1254823644320763987) released with **faster quant operations** and **bug fixes**. An Android version was also hinted at. - **San Francisco hosts major AI events**: **World's Fair of AI** and **AI Quality Conference** will feature prominent community members. Links to [World's Fair of AI](https://www.ai.engineer/worldsfair) and [AI Quality Conference](https://www.aiqualityconference.com/) are provided. - **Firefox Nightly AI services experiment**: Firefox Nightly users can access optional AI services through an ongoing experiment. Details can be explored in the [Nightly blog](https://discord.com/channels/1089876418936180786/1254858795998384239). - **Latest ML Paper Picks available**: The [latest ML Paper Picks](https://discord.com/channels/1089876418936180786/1253145681338830888) have been shared by a community member. 
- **RSVP for upcoming July AI events**: Events include [Jan AI](https://discord.com/events/1089876418936180786/1251002752239407134), [AI Foundry Podcast Roadshow](https://discord.com/events/1089876418936180786/1253834248574468249), and [AutoFix by Sentry.io](https://discord.com/events/1089876418936180786/1245836053458190438). --- ### **Mozilla AI ▷ #[llamafile](https://discord.com/channels/1089876418936180786/1182689832057716778/1253796478535860266)** (31 messages🔥): - **Llamafile Help Command Issue**: A user reported that running `llamafile.exe --help` returns empty output and inquired if this is a known issue. There was no further discussion or solutions provided in the chat. - **Running Llamafile on Google Colab**: A user, after some initial confusion, successfully ran a llamafile on Google Colab and shared a [link to their example](https://colab.research.google.com/drive/1jWKKwVCQneCTB5VNQNWO0Wxqg1vG_E1T#scrollTo=13ISLtY9_v7g). - **Llamafile Repackaging Concerns**: A user expressed concerns about the disk space requirements when repackaging llamafiles, suggesting the ability to specify different locations for extraction and repackaging. This sparked a discussion on the potential need for specified locations via environment variables or flags due to large llamafile sizes. - **New Memory Manager for Cosmopolitan**: A [commit on GitHub](https://github.com/jart/cosmopolitan/commit/6ffed14b9cc68b79d530b23876f522f906173cca) discussing a rewrite of the memory manager to support Android was shared and sparked interest in potentially running llamafile on Android via Termux. - **Mozilla Nightly Blog Mentions Llamafile**: The [Nightly blog](https://blog.nightly.mozilla.org/2024/06/24/experimenting-with-ai-services-in-nightly/) mentioned llamafile, offering guidance on toggling Firefox configurations to enable local AI chat. This excited the community, with suggestions to provide clearer instructions for new users. 
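On the repackaging question above: a llamafile is a polyglot executable with a ZIP archive appended, which is why the embedded weights can be listed without unpacking the whole file. A rough stdlib-only illustration of that ZIP-listing idea (the filename is hypothetical, and this is a sketch rather than llamafile's own tooling):

```python
import zipfile

def list_embedded_files(path: str) -> list[str]:
    """List archive members inside a llamafile-style executable.

    zipfile locates the end-of-central-directory record by scanning from
    the end of the file, so the executable bytes preceding the archive
    are simply ignored when reading.
    """
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()

# Example (hypothetical file):
# list_embedded_files("llava-v1.5-7b-q4.llamafile")
```

This is read-only inspection; actually repackaging still needs the weights written out somewhere, which is the disk-space concern raised in the discussion.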
**Links mentioned**: - [no title found](http://localhost:8080): no description found - [Tweet from Dylan Freedman (@dylfreed)](https://x.com/dylfreed/status/1803502158672761113): New open source OCR model just dropped! This one by Microsoft features the best text recognition I've seen in any open model and performs admirably on handwriting. It also handles a diverse range... - [Mozilla Builders](https://future.mozilla.org/builders/): no description found - [Release llamafile v0.8.7 · Mozilla-Ocho/llamafile](https://github.com/Mozilla-Ocho/llamafile/releases/tag/0.8.7): This release includes important performance enhancements for quants. 293a528 Performance improvements on Arm for legacy and k-quants (#453) c38feb4 Optimized matrix multiplications for i-quants on... - [Rewrite memory manager · jart/cosmopolitan@6ffed14](https://github.com/jart/cosmopolitan/commit/6ffed14b9cc68b79d530b23876f522f906173cca): Actually Portable Executable now supports Android. Cosmo's old mmap code required a 47 bit address space. The new implementation is very agnostic and supports both smaller address spaces (e.g.... - [ggerganov - Overview](https://github.com/ggerganov/): I like big .vimrc and I cannot lie. ggerganov has 71 repositories available. Follow their code on GitHub. - [Google Colab](https://colab.research.google.com/drive/1jWKKwVCQneCTB5VNQNWO0Wxqg1vG_E1T#scrollTo=13ISLtY9_v7g): no description found - [Feature Request: Support for Florence-2 Vision Models · Issue #8012 · ggerganov/llama.cpp](https://github.com/ggerganov/llama.cpp/issues/8012): Feature Description Support for Florence-2 Family of Vision Models needed Motivation A 400M model beating a 15-16B parameter model in benchmarks? 
Possible Implementation No response --- ### **Torchtune ▷ #[general](https://discord.com/channels/1216353675241590815/1216353675744641096/1253791496432517293)** (24 messages🔥): - **DPO Training Options Available; ORPO Not Yet Supported**: When asked about the options for DPO and ORPO training with Torchtune, a member shared a [dataset for ORPO/DPO](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k) and mentioned that ORPO is not yet supported while DPO has a [recipe available](https://github.com/pytorch/torchtune/blob/f200da58c8f5007b61266504204c61a171f6b3dd/recipes/configs/llama2/7B_lora_dpo.yaml#L9). This was confirmed by another member who added that ORPO would need to be implemented separately from supervised fine-tuning. - **Training on Multiple Datasets and Epochs Limitation**: A member inquired about training on multiple datasets and setting different epochs per dataset, and was directed to use *ConcatDataset*. It was highlighted that setting different epochs per dataset is not supported. - **Debate on ChatML Template Use with Llama3**: There was an ongoing discussion about the use of ChatML templates with Llama3, featuring [Mahou-1.2-llama3-8B](https://huggingface.co/flammenai/Mahou-1.2-llama3-8B) and [Olethros-8B](https://huggingface.co/lodrick-the-lafted/Olethros-8B). Participants debated whether using an instruct tokenizer and the base model without special tokens versus with ChatML was appropriate. - **Phi-3 Model Fine-Tuning Feasibility**: Queries about the feasibility of fine-tuning the Phi-3-Medium-4K-Instruct model using torchtune were addressed. It was suggested to update the tokenizer and add a custom build function in torchtune for compatibility, and include system prompts by prepending them to user messages if desired. 
- **Instruction on Using System Prompts with Phi-3**: It was noted that Phi-3 models might not have been optimized for system prompts, but users can still prepend system prompts to user messages for fine-tuning on Phi-3 as usual. A specific flag in the tokenizer configuration [was mentioned](https://github.com/pytorch/torchtune/blob/main/torchtune/models/phi3/_sentencepiece.py#L128) for allowing system prompt usage. **Links mentioned**: - [lodrick-the-lafted/Olethros-8B · Hugging Face](https://huggingface.co/lodrick-the-lafted/Olethros-8B): no description found - [flammenai/Mahou-1.2-llama3-8B · Hugging Face](https://huggingface.co/flammenai/Mahou-1.2-llama3-8B): no description found - [microsoft/Phi-3-mini-4k-instruct · Hugging Face](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct): no description found - [torchtune/torchtune/models/phi3/_sentencepiece.py at main · pytorch/torchtune](https://github.com/pytorch/torchtune/blob/main/torchtune/models/phi3/_sentencepiece.py#L128): A Native-PyTorch Library for LLM Fine-tuning. Contribute to pytorch/torchtune development by creating an account on GitHub. - [mlabonne/orpo-dpo-mix-40k · Datasets at Hugging Face](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k): no description found - [torchtune/recipes/configs/llama2/7B_lora_dpo.yaml at f200da58c8f5007b61266504204c61a171f6b3dd · pytorch/torchtune](https://github.com/pytorch/torchtune/blob/f200da58c8f5007b61266504204c61a171f6b3dd/recipes/configs/llama2/7B_lora_dpo.yaml#L9): A Native-PyTorch Library for LLM Fine-tuning. Contribute to pytorch/torchtune development by creating an account on GitHub. 
- [Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone](https://arxiv.org/html/2404.14219v1#S2): no description found - [microsoft/Phi-3-mini-4k-instruct · System prompts ignored in chat completions](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/discussions/51#665f24e07a329f831b1e3e4e): no description found - [microsoft/Phi-3-medium-4k-instruct · Hugging Face](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct): no description found - [config.json · microsoft/Phi-3-medium-4k-instruct at main](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct/blob/main/config.json): no description found --- ### **tinygrad (George Hotz) ▷ #[general](https://discord.com/channels/1068976834382925865/1068976834928193609/1253788818042126418)** (8 messages🔥): - **WHERE Function Clarification**: A member asked if the WHERE function could be simplified with conditional operations like `condition * a + !condition * b`, and it was pointed out that *NaNs* could be an issue. - **Intel Support Inquiry**: Someone inquired about **Intel support** in tinygrad. Another member responded that **opencl** can be used, but there is no XMX support yet. - **Monday Meeting Overview**: Key topics for the upcoming Monday meeting at 9:40 a.m. PT include updates on *tinybox*, new profiler, runtime enhancements, and plans for the **0.9.1 release**. Specific agenda items cover enhancements like `Tensor._tri`, llama cast speedup, and mentions of bounties such as improvements in *uop matcher speed* and *unet3d*. - **Future of Linear Algebra Functions**: A user asked about plans for implementing general linear algebra functions like determinant calculations or matrix decompositions in tinygrad. 
*No specific response was given in the extracted messages.* --- ### **tinygrad (George Hotz) ▷ #[learn-tinygrad](https://discord.com/channels/1068976834382925865/1070745817025106080/1254621018971050006)** (2 messages): - **Buffer view option flagged in tinygrad**: A commit was shared that introduces a flag to make the buffer view optional in tinygrad. The commit message reads, *"make buffer view optional with a flag"* and the associated [GitHub Actions run](https://github.com/tinygrad/tinygrad/actions/runs/9638260193/job/26578693946?pr=5120) was provided. - **Change in lazy.py raises concerns**: A member questioned if they were doing something wrong as their changes to `lazy.py` resulted in positive (good) and negative (bad) process replay outputs. They were seeking clarity on this unexpected behavior, implying potential issues with their modifications. **Link mentioned**: [make buffer view optional with a flag · tinygrad/tinygrad@bdda002](https://github.com/tinygrad/tinygrad/actions/runs/9638260193/job/26578693946?pr=5120): You like pytorch? You like micrograd? You love tinygrad! ❤️ - make buffer view optional with a flag · tinygrad/tinygrad@bdda002 --- ### **LLM Perf Enthusiasts AI ▷ #[claude](https://discord.com/channels/1168579740391710851/1168582222194933860/1254510317266796731)** (1 messages): - **Claude Sonnet 3.5 impresses in Websim**: A member was testing **Claude Sonnet 3.5** in Websim and was highly impressed by the model's *"speed, creativity, and intelligence"*. They highlighted features such as "generate in new tab" and shared their experience of trying to *"hypnotize" themselves with the color schemes of different iconic fashion brands*. [Twitter link](https://fxtwitter.com/RobertHaisfield/status/1804945938936668413). **Link mentioned**: [Tweet from Rob Haisfield (robhaisfield.com) (@RobertHaisfield)](https://fxtwitter.com/RobertHaisfield/status/1804945938936668413): I was "testing" Sonnet 3.5 @websim_ai + new features (mainly "generate in new tab"). 
I'm FLOORED by this model's speed, creativity, intelligence 🫨😂 Highlights from the lab t... --- ### **MLOps @Chipro ▷ #[events](https://discord.com/channels/814557108065534033/869270934773727272/1254828730174406738)** (1 messages): - **MJCET launches AWS Cloud Club**: We are delighted to share that MJCET has launched the FIRST **AWS Cloud Club** in Telangana! This vibrant community provides resources, training, and hands-on experience with Amazon Web Services (AWS), equipping members with essential skills for a tech industry career. - **Exclusive inaugural event with AWS Hero**: Join the grand inauguration of AWS Cloud Club MJCET on June 28th, 2024, from 10am to 12pm at Block 4 Seminar Hall, featuring **Mr. Faizal Khan**, AWS Community Hero. RSVP via this [meetup link](https://meetu.ps/e/NgmgX/14DgQ2/i) to confirm your attendance. **Link mentioned**: [Inauguration of AWS Cloud Clubs MJCET, Fri, Jun 28, 2024, 10:00 AM | Meetup](https://meetu.ps/e/NgmgX/14DgQ2/i): **Join Us for the Grand Inauguration of AWS Cloud Club MJCET!** We are delighted to announce the launching event of our AWS Cloud Club at MJCET! Come and explore the world ---