AI News (MOVED TO news.smol.ai!)

Archives
June 27, 2024

[AINews] Mozilla's AI Second Act

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜


Superfast CPU inference is all you need.

AI News for 6/25/2024-6/26/2024. We checked 7 subreddits, 384 Twitters and 30 Discords (416 channels, and 3358 messages) for you. Estimated reading time saved (at 200wpm): 327 minutes. You can now tag @smol_ai for AINews discussions!

The slow decline of Mozilla's Firefox market share is well known, and after multiple rounds of layoffs its future was very uncertain. However, at the opening keynote of the AI Engineer World's Fair today, they came back swinging:


They gave very detailed live demos of llamafile, with technical explanation from Justine Tunney herself, and Stephen Hood announced a very welcome second project, sqlite-vec, which, you guessed it, adds vector search to SQLite.
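For the curious, here is a rough sketch of what vector search inside SQLite can look like with sqlite-vec, going off the project's vec0 virtual-table interface as we understand it; the table name, dimensions, and embeddings below are made up for illustration:

```python
# Minimal sqlite-vec sketch (assumed API per the project README; toy 4-dim embeddings).
import json
import sqlite3

import sqlite_vec  # pip install sqlite-vec

db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vec.load(db)  # registers the vec0 virtual-table module
db.enable_load_extension(False)

# One row per document, each with a fixed-dimension embedding column.
db.execute("CREATE VIRTUAL TABLE docs USING vec0(embedding float[4])")
for rowid, vec in enumerate([[0.1, 0.2, 0.3, 0.4], [0.9, 0.8, 0.7, 0.6]], start=1):
    db.execute("INSERT INTO docs(rowid, embedding) VALUES (?, ?)", (rowid, json.dumps(vec)))

# Nearest-neighbor query: MATCH against a query vector, ordered by distance.
query = json.dumps([0.12, 0.21, 0.33, 0.41])
hits = db.execute(
    "SELECT rowid, distance FROM docs WHERE embedding MATCH ? ORDER BY distance LIMIT 2",
    (query,),
).fetchall()
print(hits)  # rowid 1 should come back with the smaller distance
```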

You can watch the entire talk on the livestream (53mins in):

LlamaIndex also closed the day with a notable launch of llama-agents


Some mea culpas: yesterday we missed calling out Etched's big launch (which drew some questioning) and Claude Projects, which made a splash. The PyTorch documentary also launched, to crickets (weird?).


The Table of Contents and Channel Summaries have been moved to the web version of this email: !


AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

Anthropic Claude Updates

  • New UI features: @alexalbert__ noted new features in the Claude UI, including a sidebar for starring chats, shareable projects with 200K context windows for documents and files, and custom instructions to tailor responses.
  • Anthropic announces Projects: @AnthropicAI introduced Projects, which allow organizing chats into shareable knowledge bases with a 200K context window for relevant documents, code, and files. Available for Claude Pro and Team users.

Hardware and Performance Benchmarks

  • Etched AI specialized inference chip: @cHHillee shared thoughts on Etched's new inference chip, noting potentially misleading marketing claims around silicon efficiency and performance. The benchmarks claim 500k tokens/sec (aggregated across multiple users) and that one 8x Sohu server replaces 160 H100s, but they may not be normalized for key details; more information on the benchmark methodology is needed.
  • Sohu chip enables 15 agent trajectories/sec: @mathemagic1an highlighted that 500k tokens/sec on Sohu translates to 15 full 30k token agent trajectories per second, emphasizing the importance of building with this compute assumption to avoid being scooped.
  • Theoretical GPU inference limits: @Tim_Dettmers shared a model estimating theoretical max of ~300k tokens/sec for 8xB200 NVLink 8-bit inference on 70B Llama, assuming perfect implementations like OpenAI/Anthropic. Suggests Etched benchmarks seem low.
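To put the last two bullets in context, here is a back-of-envelope version of the same arithmetic. The B200 FP8 figure and the 2-FLOPs-per-parameter-per-token rule are rough assumptions for illustration, not vendor specs:

```python
# Rough compute-bound decode throughput for a 70B dense model at 8-bit precision.
# All hardware numbers below are assumptions for illustration only.
PARAMS = 70e9                  # Llama-70B-class parameter count
FLOPS_PER_TOKEN = 2 * PARAMS   # ~2 FLOPs per parameter per generated token
B200_FP8_FLOPS = 4.5e15        # assumed dense FP8 peak per GPU
GPUS = 8

ideal_tokens_per_sec = GPUS * B200_FP8_FLOPS / FLOPS_PER_TOKEN
print(f"~{ideal_tokens_per_sec:,.0f} tokens/sec at 100% utilization")  # ~257,000

# Sohu's claimed 500k tokens/sec, expressed as 30k-token agent trajectories:
print(f"{500_000 / 30_000:.1f} trajectories/sec")  # ~16.7, i.e. the quoted ~15/sec
```

The point of the exercise is just to show where the order of magnitude comes from; the real comparison hinges on details like batch size, memory bandwidth, and how the GPU baseline was measured.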

Open Source Models

  • Deepseek Coder v2 beats Gemini: @bindureddy claimed an open-source model beats the latest Gemini on reasoning and code, with more details on open-source progress coming soon. A follow-up provided specifics - Deepseek Coder v2 excels at coding and reasoning, beating GPT-4 variants on math and putting open-source in 3rd behind Anthropic and OpenAI on real-world production use cases.
  • Sonnet overpowers GPT-4: @bindureddy shared that Anthropic's Sonnet model continues to overpower GPT-4 variants in testing across workloads, giving a flavor of impressive upcoming models.

Biological AI Breakthroughs

  • ESM3 simulates evolution to generate proteins: @ylecun shared news of Evolutionary Scale AI, a startup using a 98B parameter LLM called ESM3 to "program biology". ESM3 simulated 500M years of evolution to generate a novel fluorescent protein. The blog post has more details. ESM3 was developed by former Meta AI researchers.

Emerging AI Trends and Takes

  • Data abundance is key to AI progress: @alexandr_wang emphasized that breaking through the "data wall" will require innovation in data abundance. AI models compress their training data, so continuing current progress will depend on new data, not just algorithms.
  • Returns on human intelligence post-AGI: @RichardMCNgo predicted that the premium on human genius will increase rather than decrease after AGI, as only the smartest humans will understand what AGIs are doing.
  • Terminology for multimodal AI: @RichardMCNgo noted it's becoming weird to call multimodal AI "LLMs" and solicited suggestions for replacement terminology as models expand beyond language.

Memes and Humor

  • @Teknium1 joked about OpenAI having trouble removing "waifu features" in GPT-4 voice model updates.
  • @arankomatsuzaki made a joke announcement that Noam Shazeer won the Turing Award for pioneering work on AI girlfriends.
  • @willdepue joked that "AGI is solved" now that you can search past conversations in chatbots.

AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity. Comment crawling works now but has lots to improve!

AI Progress

  • AI website generation: A new AI system can generate full webpages from just a URL or description input, demonstrating progress in AI content creation capabilities. Video demo.
  • OpenAI Voice Mode delay: OpenAI announced a one month delay for the advanced Voice Mode alpha release to improve safety and user experience. Plans for all Plus users to have access in the fall.
  • Singularity book release: Ray Kurzweil released a sequel to his 2005 book The Singularity is Near, sparking excitement and discussion about the future of AI.
  • AI agents speculation: OpenAI's acquisition of a remote desktop control startup led to speculation about integration with ChatGPT desktop for AI agents.
  • AI-generated ads: Toys R Us used the SORA AI system to generate a promotional video/ad, showcasing AI in marketing.

AI Research

  • New optimizer outperforms AdamW: A research paper introduced Adam-mini, a new optimizer that achieves 50% higher throughput than the popular AdamW.
  • Matrix multiplication eliminated in LLMs: Researchers demonstrated LLMs that eliminate matrix multiplication, enabling much more efficient models with major implications for running large models on consumer hardware (a toy sketch of the idea follows this list).
  • Simulating evolution with AI: EvolutionaryScale announced ESM3, a generative language model that can simulate 500 million years of evolution to generate new functional proteins.
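On the matmul-free point above: one route taken by recent work in this direction (as we understand it) is ternary weights, where every weight is -1, 0, or +1, so a matrix-vector product collapses into signed additions. A toy sketch of that equivalence, not the paper's actual architecture:

```python
# Toy illustration: with ternary weights {-1, 0, +1}, a "matmul" is just
# signed addition of activations. Not the paper's implementation.
import torch

x = torch.randn(4)                        # activations
W = torch.tensor([[1., -1., 0., 1.],      # ternary weight matrix
                  [0.,  1., 1., -1.]])

y_matmul = W @ x                          # ordinary matrix-vector product
y_adds = torch.stack([
    x[row == 1].sum() - x[row == -1].sum()   # add where +1, subtract where -1
    for row in W
])
assert torch.allclose(y_matmul, y_adds)
print(y_adds)
```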

AI Products & Services

  • Deepseek Coder V2 math capabilities: Users praised the math capabilities of the Deepseek Coder V2 model, a free model from China that outperforms GPT-4 and Claude.
  • AI audiobook narration: An AI-narrated audiobook was well-received, implying audiobook narration is now a solved problem with AI.
  • New AI apps and features: Several new AI applications and features were announced, including Tcurtsni, a "reverse-instruct" chat app, Synthesia 2.0, a synthetic media platform, and Projects in Claude for organizing chats and documents.

AI Safety & Ethics

  • Rabbit data breach: A security disclosure revealed a data breach in Rabbit where all responses from their R1 model could be downloaded, raising concerns about AI company negligence.
  • Hallucination concerns: An opinion post argued the "AI hallucinates" talking point is dangerous as it masks the real risks of rapidly improving AI flooding job markets.

AI Hardware

  • AMD MI300X benchmarks: Benchmarks of AMD's new MI300X AI accelerator chip were released and analyzed.
  • Sohu AI chip claims: A new Sohu AI chip was announced claiming 500K tokens/sec on a 70B model, with 8 chips equivalent to 160 NVIDIA H100 GPUs.
  • MI300X vs H100 comparison: A comparison showed AMD's MI300X is ~5% slower but 46% cheaper with 2.5X the memory of NVIDIA's H100 on the LLaMA-2 70B model.

AI Art

  • A8R8 v0.7.0 release: A new version of the A8R8 Stable Diffusion UI was released with Comfy integration for regional prompting and other updates.
  • ComfyUI new features: A detailed post reviewed new features in the ComfyUI Stable Diffusion environment like samplers, schedulers, and CFG implementations.
  • Magnific AI relighting tool: Results from Magnific AI's new relighting tool were compared to a user's workflow, finding it lacking in quality.
  • SD model comparisons: Different Stable Diffusion model sizes were compared on generating specified body positions, with performance noted as "not good."

Other Notable News

  • Stability AI leadership changes: Stability AI announced a new CEO, board members, funding round, and commitment to open source while expanding enterprise tools.
  • AI traffic analysis: A post proposed ways to quantify bandwidth usage of major AI systems, estimating AI is still a small part of overall internet traffic.
  • Politician shares false ChatGPT stats: A news article reported a Canadian politician shared inaccurate statistics generated by ChatGPT, highlighting risks of using unverified AI outputs.
  • Open-source AI agent for on-call: Merlinn, an open-source AI Slack bot to assist on-call engineers, was announced.
  • Living skin robots: BBC reported on research into covering robots with living human skin to make them more lifelike.
  • Gene therapy progress: A tweet discussed gene therapies progressing from rare to common diseases.
  • Google AI event: News that Google will reveal new AI tech and Pixel phones at an August event.
  • Tempering AI release expectations: A post advised taking AI product release dates with a grain of salt due to R&D uncertainty.
  • AI ending amateurism: An opinion piece argued generative AI will allow everyone to produce professional-quality work.

AI Discord Recap

A summary of Summaries of Summaries

Claude 3 Sonnet

1. 🔥 LLM Advancements and Benchmarking

  • Llama 3 from Meta tops leaderboards, outperforming GPT-4-Turbo and Claude 3 Opus per ChatbotArena.
  • New models: Granite-8B-Code-Instruct for coding, DeepSeek-V2 with 236B parameters.
  • Skepticism around certain benchmarks, calls for credible sources to set realistic evaluation standards.

2. 🤖 Optimizing LLM Inference and Training

  • ZeRO++ promises 4x reduced communication overhead on GPUs.
  • vAttention dynamically manages KV-cache memory for efficient inference.
  • QServe uses W4A8KV4 quantization to boost cloud serving on GPUs.
  • Consistency LLMs explore parallel token decoding for lower latency.

3. 🌐 Open-Source AI Frameworks and Community Efforts

  • Axolotl supports diverse formats for instruction tuning and pre-training.
  • LlamaIndex powers a course on building agentic RAG systems.
  • RefuelLLM-2 claims best for "unsexy data tasks".
  • Modular teases Mojo's Python integration and AI extensions.

4. 🖼 Multimodal AI and Generative Modeling Innovations

  • Idefics2 8B Chatty for elevated chat interactions.
  • CodeGemma 1.1 7B refines coding abilities.
  • Phi 3 brings powerful chatbots to browsers via WebGPU.
  • Combining Pixart Sigma + SDXL + PAG aims for DALLE-3-level outputs with potential fine-tuning.
  • Open-source IC-Light for image relighting techniques.

5. Stable Artisan for AI Media Creation in Discord

  • Stability AI launched Stable Artisan, a Discord bot integrating Stable Diffusion 3, Stable Video Diffusion, and Stable Image Core for media generation within Discord.
  • Sparked discussions around SD3's open-source status and Artisan's introduction as a paid API service.

Claude 3.5 Sonnet

  1. LLMs Level Up in Performance and Efficiency:

    • New models like IBM's Granite-8B-Code-Instruct and RefuelLLM-2 are pushing boundaries in code instruction and data tasks. Communities across Discord channels are discussing these advancements and their implications.
    • Optimization techniques such as Adam-mini are gaining traction, promising 45-50% memory reduction compared to AdamW while maintaining performance. This has sparked discussions in the OpenAccess AI Collective and CUDA MODE Discords.
    • The vAttention system for efficient KV-cache memory management is being explored as an alternative to PagedAttention, highlighting the ongoing focus on inference optimization across AI communities.
  2. Open-Source AI Flourishes with Community-Driven Tools:

    • Axolotl is gaining popularity for its support of diverse dataset formats in LLM training, discussed in both the OpenAccess AI Collective and HuggingFace Discords.
    • The LlamaIndex framework is powering new courses on building agentic RAG systems, generating excitement in the LlamaIndex and general AI development communities.
    • Mojo's potential for Python integration and AI extensions is a hot topic in the Modular Discord, with discussions on its implications for AI development workflows.
  3. Multimodal AI Pushes Creative Boundaries:

    • The combination of Pixart Sigma, SDXL, and PAG is being explored to achieve DALLE-3 level outputs, as discussed in the Stability.ai and general AI communities.
    • Stable Artisan, a new Discord bot from Stability AI, is integrating models like Stable Diffusion 3 and Stable Video Diffusion, sparking conversations about AI-powered media creation across multiple Discord channels.
    • The open-source IC-Light project for image relighting is gaining attention in computer vision circles, showcasing the ongoing innovation in image manipulation techniques.
  4. AI Hardware Race Heats Up:

    • AMD's Radeon Instinct MI300X is challenging Nvidia's dominance in the GPU compute market, despite software ecosystem challenges. This has been a topic of discussion in the CUDA MODE and hardware-focused Discord channels.
    • The announcement of Etched's Sohu AI chip has sparked debates across AI hardware communities about its potential to outperform GPUs in running transformer models, with claims of replacing multiple H100 GPUs.
    • Discussions about specialized AI chips versus general-purpose GPUs are ongoing, with community members in various Discord servers debating the future direction of AI hardware acceleration.

Claude 3 Opus

1. LLM Performance and Benchmarking:

  • Discussions about the performance of various LLMs, such as Llama 3 from Meta outperforming models like GPT-4-Turbo and Claude 3 Opus on leaderboards like ChatbotArena.
  • New models like IBM's Granite-8B-Code-Instruct and DeepSeek-V2 showcasing advancements in instruction following and parameter count.
  • Concerns about the credibility of certain benchmarks and the need for realistic LLM assessment standards from reputable sources.

2. Hardware Advancements and Optimization Techniques:

  • Techniques like ZeRO++ and vAttention being explored to optimize GPU memory usage and reduce communication overhead during LLM training and inference.
  • Advancements in quantization, such as QServe introducing W4A8KV4 quantization for improved GPU performance in cloud-based LLM serving.
  • Discussions about the potential of specialized AI chips like Etched's Sohu and comparisons with GPU performance for running transformer models.

3. Open-Source Frameworks and Community Efforts:

  • Open-source frameworks like Axolotl and LlamaIndex supporting diverse dataset formats and enabling the development of agentic RAG systems.
  • The release of open-source models like RefuelLLM-2, claiming to be the best LLM for "unsexy data tasks."
  • Community efforts to integrate AI capabilities into platforms like Discord, with bots such as Stable Artisan from Stability AI for media generation and editing.

4. Multimodal AI and Generative Models:

  • New models focusing on specific tasks, such as Idefics2 8B Chatty for elevated chat interactions and CodeGemma 1.1 7B for coding abilities.
  • Advancements in browser-based AI chatbots, like the Phi 3 model utilizing WebGPU for powerful interactions.
  • Efforts to combine techniques like Pixart Sigma, SDXL, and PAG to achieve DALLE-3-level outputs in generative models.
  • Open-source projects like IC-Light focusing on specific tasks such as image relighting.

GPT4O (gpt-4o-2024-05-13)

  1. Model Performance and Benchmarks:

    • Llama3 70B Models Show Promise: New open LLM leaderboards hosted on 300 H100 GPUs have Qwen 72B leading, though bigger models don't always equate to better performance. Analyses highlighted differences in scope between training vs. inference benchmarks.
    • Solving Grade School Arithmetic highlights skepticism where data leakage in large LLMs results in misleadingly high benchmarks despite incomplete learning. Calls for credible assessments were noted.
  2. Training, Optimization and Implementation Issues:

    • Push for Better Optimizers: The Adam-mini optimizer offers performance equivalent to AdamW while reducing memory use by 45-50%. It simplifies optimizer state by storing far fewer learning rates, roughly one per parameter block rather than one per parameter.
    • Memory Management in High-Context Models: Efforts to load large models, such as Llama3 70B or Hermes, on consumer-grade GPUs are hindered by significant OOM errors, driving discussions on effective GPU VRAM utilization.
  3. AI Ethics and Community Debates:

    • Ethics of AI Data Use: Debates in LAION Discord stressed the controversial inclusion of NSFW content in datasets, balancing ethical concerns with the motivation for unrestricted data access.
    • Model Poisoning Concerns: Discussions in LAION focused on ethical implications and potential model poisoning, where controversial techniques in training and dataset usage are encouraged without broader consideration of long-term impacts.
  4. Specialized AI Hardware Trends:

    • Etched's Sohu Chips Boast 10x Performance: Etched’s new transformer ASIC chips claim to outperform Nvidia GPUs significantly, with considerable financial backing. However, practical adaptability and inflexibility concerns were discussed within CUDA MODE.
    • AMD's MI300X Challenges Nvidia: AMD's MI300X seeks to dethrone Nvidia in GPU compute markets, despite lagging behind Nvidia's CUDA ecosystem.
  5. AI Application Integration:

    • Custom GPT Apps on Hugging Face Flourish: Growing interest in custom GPT-based applications, citing niche tasks like Japanese sentence explanations, remains strong. Collaborative efforts in the community have driven the creation of resources and toolkits for ease of implementation.
    • AI-Assisted Tools Expand Academic Reach: The new GPA Saver platform leverages AI for academic assistance, indicating growing integration of AI in streamlined educational tools. Community discussions about improving AI-driven functionalities highlighted potential and current constraints.

PART 1: High level Discord summaries

OpenAI Discord

Quick Access with a Shortcut: The ChatGPT desktop app for macOS is now available, featuring a quick-access Option + Space shortcut for seamless integration with emails and images.

Voice Mode Hiccup: The anticipated advanced Voice Mode for ChatGPT has been postponed by a month to ensure quality before alpha testing; expect more capabilities like emotion detection and non-verbal cues in the fall.

OpenAI vs Anthropic's Heavyweights: Discussions are heating up with regard to GPT agents' inability to learn post-training, and Anthropic's Claude gaining an edge over ChatGPT thanks to technical feats such as larger token context windows and a rumored MoE setup.

Customization Craze in AI: Enthusiasts are creating custom GPT applications using resources like Hugging Face, with a particular interest in niche tasks like explaining Japanese sentences, as well as concerns about current limitations in OpenAI's model updates and feature rollout.

GPT-4 Desktop App and Performance Threads: Users noted the limitation of the new macOS desktop app to Apple Silicon chips and shared mixed reviews on GPT-4's performance, expressing desire for Windows app support and improvements in response times.


HuggingFace Discord

  • RAG Under the Microscope: A discussion centered on Retrieval-Augmented Generation (RAG) techniques highlighted considerations for managing document length with SSMs like Mamba and using BM25 for keyword-oriented retrieval (a minimal BM25 sketch follows this list). A GitHub resource related to BM25 can be found here.
  • Interactive Hand Gestures: Two separate contexts highlighted a Python-based "Hand Gesture Media Player Controller," shared via a YouTube demonstration, indicating burgeoning interest in applied computer vision to control interfaces.
  • PAG Boosts 'Diffusers' Library: An integration of Perturbed Attention Guidance (PAG) into the diffusers library promises enhanced image generation, as announced in HuggingFace's core announcements, thanks to a community contribution.
  • Cracking Knowledge Distillation for Specific Languages: Queries around knowledge distillation were prominent, with one member proposing a distilled multilingual model for a single language and another recommending SpeechBrain for tackling the task.
  • LLMs and Dataset Quality in Focus: Alongside advances such as the Phi-3-Mini-128K-Instruct model by Microsoft, the community spotlighted the importance of dataset quality. Concurrently, concerns related to data leakage in LLMs were addressed through papers citing the issue here and here.
  • Clamor for AI-driven Tools: From a request for a seamless AI API development platform, referenced through a feedback survey, to the challenge of identifying data in handwritten tables, there's a clear demand for AI-powered solutions that streamline tasks and inject efficiency into workflows.
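On the BM25 point in the first bullet: here is a minimal keyword-retrieval sketch using the rank-bm25 package (our choice for illustration; not necessarily the repository linked above):

```python
# Keyword-oriented retrieval with BM25 as a complement to dense/RAG retrieval.
# pip install rank-bm25
from rank_bm25 import BM25Okapi

corpus = [
    "Mamba is a state space model suited to long documents",
    "BM25 ranks documents by keyword overlap with the query",
    "Retrieval-augmented generation grounds answers in retrieved context",
]
tokenized_corpus = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)

query = "keyword retrieval for rag".lower().split()
print(bm25.get_scores(query))              # raw BM25 score per document
print(bm25.get_top_n(query, corpus, n=2))  # top-2 documents for the query
```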

LAION Discord

  • AI Ethics Take Center Stage: Conversations arose about ethics in AI training, with one member expressing concern about the active encouragement of model poisoning. Another member argued that the offered solution to the AIW+ problem is incorrect because it overlooks certain familial relationships, suggesting both ambiguity and ethical considerations.
  • Music Generation with AI Hits a High Note: Discussions involved using RateYourMusic ID to generate songs and lyrics, with an individual confirming its success and describing the outcomes as "hilarious."
  • The Great NSFW Content Debate: A debate surged regarding whether NSFW content should be included in datasets, highlighting the dichotomy between moral concerns and the argument against excessively cautious model safety measures.
  • GPU Showdown and Practicality: Members exchanged insights on the trade-offs between A6000s, 3090s, and P40 GPUs, noting differences in VRAM, cooling requirements, and model efficiency when applied to AI training.
  • ASIC Chips Enter the Transformer Arena: An emerging topic was Etched's Sohu, a specialized chip for transformer models. Its touted advantages sparked discussions on its practicality and adaptability to various AI models, contrasting with skepticism concerning its potential inflexibility.

Eleuther Discord

  • ICML 2024 Papers on the Spotlight: EleutherAI researchers gear up for ICML 2024 with papers addressing classifier-free guidance and open foundation model impacts. Another study delves into memorization in language models, examining issues like privacy and generalization.
  • Multimodal Marvels and Gatherings Galore: Huggingface's leaderboard emerges as a handy tool for those seeking top-notch multimodal models; meanwhile, ICML's Vienna meet-up attracts a cluster of enthusiastic plans. The hybrid model Goldfinch also became a part of the exchange, merging Llama with Finch B2 layers for enhanced performance.
  • Papers Prompting Peers: Discussion in the #research channel flared around papers from comparative evaluations of Synquid to the application of Hopfield Networks in transformers. Members dissected topics ranging from multimodal learning efficiencies to experimental approaches in generalization and grokking.
  • Return of the Hopfields: Members offered insights on self-attention in neural networks by corralling it within the framework of (hetero)associative memory, bolstered by references to continuous modern Hopfield Networks and their implementation as single-step attention.
  • Sparse and Smart: Sparse Autoencoders (SAEs) take the stage for their aptitude in unearthing linear features from overcomplete bases, as touted in LessWrong posts. Additionally, a noteworthy mention was a paper on multilingual LLM safety, demonstrating cross-lingual detoxification from direct preference optimization (DPO).

CUDA MODE Discord

AMD's Radeon MI300X Takes on Nvidia:

The new AMD Radeon Instinct MI300X is positioned to challenge Nvidia's dominant status in the GPU compute market despite AMD's software ecosystem ROCm lagging behind Nvidia's CUDA, as detailed in an article on Chips and Cheese.

ASIC Chip Ambitions:

Etched announced Transformer ASIC chips that aim to outpace GPUs at running AI models more efficiently, backed by significant investment including a $120 million Series A funding round supported by Bryan Johnson, raising discussions about the future role of specialized AI chips.

Optimization Tweaks and Triton Queries:

Engineering conversations revolve around a proposed Adam-mini optimizer that operates with 45-50% less memory, with code available on GitHub, and community assistance sought for a pow function addition in python.triton.language.core as shown in this Triton issue.

PyTorch Celebrates with Documentary:

The premiere of the "PyTorch Documentary Virtual Premiere: Live Stream" has garnered attention, covering PyTorch's evolution and its community; users expressed their excitement with goat emojis, and the stream is watchable here.

Intel Pursues PyTorch Integration for GPUs:

Building momentum for Intel GPU (XPU) support in stock PyTorch continues with an Intel PyTorch team's RFC on GitHub, signaling Intel’s commitment to becoming an active participant in the deep learning hardware space.

Discussions of AI Infrastructure and Practices:

Community dialogue featured topics like learning rate scaling, update clipping with insights from an AdamW paper, infrastructural choices between AMD and Nvidia builds, and the intrigue around the Sohu ASIC chip's promises, impacting the efficacy of large transformer models.


Perplexity AI Discord

Perplexed by Perplexity API: Engineers discussed intermittent 5xx errors with the Perplexity AI's API, highlighting the need for better transparency via a status page. There were also debates on API filters and undocumented features, with some users probing the existence of a search domain filter and citation date filters.

In Search of Better Search: The Perplexity Pro focus search faced criticism for limitations, while comparisons to ChatGPT noted Perplexity's new agentic search capabilities but criticized its tendency to hallucinate in summarizations.

Claude Leverages Context: The guild buzzed about Claude 3.5's 32k token context window for Perplexity Pro users, with Android support confirmed. Users showed a clear preference for the full 200k token window offered by Claude Pro.

Innovation Insight with Denis Yarats: The CTO of Perplexity AI dissected AI's innovation in a YouTube video, discussing how it revolutionizes search quality. In a related conversation, researchers presented a new method that could change the game by removing matrix multiplication from language model computations.

Hot Topics and Searches in Sharing Space: The community shared numerous Perplexity AI searches and pages including evidence of Titan's missing waves, China's lunar endeavors, and a study on how gravity affects perception, encouraging others to explore these curated searches on their platform.


Latent Space Discord

  • AI World's Fair Watch Party Launch: Enthusiasm stirred up for hosting a watch party for AI Engineer World’s Fair, livestreamed here, spotlighting cutting-edge keynotes and code tracks.
  • Premiere Night for PyTorch Fans: Anticipation builds around the PyTorch Documentary Virtual Premiere, highlighting the evolution and impact of the project with commentary from its founders and key contributors.
  • ChatGPT's Voice Update Muted: A delayed release of ChatGPT's Voice Mode, due to technical difficulties with voice features, causes a stir following a tweet by Teknium.
  • Bee Computer Buzzes with Intelligence: Attendees at an AI Engineer event buzz over new AI wearable tech from Bee Computer, touted for its in-depth personal data understanding and proactive task lists.
  • Neural Visuals Exceed Expectations: A breakthrough in neuroscience captures community interest with the reconstruction of visual experiences from mouse cortex activity, demonstrating incredible neuroimaging strides.

LM Studio Discord

  • Tech Troubles and Tips in LM Studio: Engineers reported errors with LM Studio (0.2.25), including an Exit code: -1073740791 when loading models. For Hermes 2 Theta Llama-3 70B, users with RTX 3060ti faced "Out of Memory" issues and considered alternatives like NousResearch's 8b. Issues were also noted when running Llama 3 70B on Apple's M Chip due to different quant types and settings.
  • RAG Gets the Spotlight: A detailed discussion on retrieval-augmented generation (RAG) took place, highlighting NVIDIA's blog post on RAG's capability to enhance information generation accuracy with external data.
  • Scam Warnings and Security Tips: Users noted the presence of scam links to a Russian site impersonating Steam and reported these for moderator action. There's awareness in the community regarding phishing attacks and the importance of securing personal and project data.
  • Hardware Conversations Heat Up: A completed build using 8x P40 GPUs was mentioned, sparking further discussions on server power management involving a 200 amp circuit and VRAM reporting accuracy in LM Studio for multi-GPU setups. The noise produced by home server setups was also humorously likened to a jet engine.
  • Innovative Ideas and SDK Expo: Members shared ideas ranging from using an LLM as a game master in a sci-fi role-playing game to solving poor performance with larger context windows in token prediction. There's a guide to building Discord bots with the SDK here and questions regarding extracting data from the LM Studio server using Python.
  • Uploading Blocks in Open Interpreter: There's frustration over the inability to upload documents or images directly into the open interpreter terminal, limiting users in interfacing with AI models and use cases.

Modular (Mojo 🔥) Discord

  • Plotting a Path with Mojo Data Types: Engineers are experimenting with Mojo data types for direct plotting without conversion to Numpy, utilizing libraries like Mojo-SDL for SDL2 bindings. The community is mulling over the desired features for a Mojo charting library, with focus areas ranging from high-level interfaces to interactive charts and integration with data formats like Arrow.
  • Vega IR for Versatile Visualization: The need for interactivity in data visualization was underscored, with the Vega specification being proposed as an Intermediate Representation (IR) to bridge web and native rendering. The conversation touched on the unique approaches of libraries like UW's Mosaic and mainstream ones like D3, Altair, and Plotly.
  • WSL as a Windows Gateway to Mojo: Mojo has been confirmed to work on Windows via the Windows Subsystem for Linux (WSL), with native support anticipated by year's end. Ease of use with Visual Studio Code and Linux directories was a highlight.
  • IQ vs. Intelligence Debate Heats Up: The community engaged in a lively debate about the nature of intelligence, with the ARC test questioned for its human-centric pattern recognition tasks. Some users view AI excelling at IQ tests as not indicative of true intelligence, while the concept of consciousness versus recall sparked further philosophical discussion.
  • Compile-Time Quirks and Nightly Builds: Multiple issues with the Mojo compiler were aired, ranging from reported bugs in type checking and handling of boolean expressions to the handling of List and Tensor at compile time. Encouragement to report issues, even if resolved in nightly builds, was echoed across the threads. Specific commits, nightly build updates, and suggestions for referencing immutable static lifetime variables were also discussed, rallying the community around collaborative debugging and improvement.

Interconnects (Nathan Lambert) Discord

  • LLM Leaderboard Bragging Rights Questioned: Clement Delangue's announcement of a new open LLM leaderboard boasted the use of 300 H100 GPUs to rerun MMLU-Pro evaluations, prompting sarcasm and criticism about the necessity of such computing power and the effectiveness of larger models.
  • API Security Gone Awry in Rabbit's Code: Rabbitude's discovery of hardcoded API keys, including ones for ElevenLabs and others, has left services like Azure and Google Maps vulnerable, causing concerns over unauthorized data access and speculation about the misuse of ElevenLabs credits.
  • Delay in ChatGPT's Advanced Voice Mode: OpenAI has postponed the release of ChatGPT’s advanced Voice Mode for Plus subscribers till fall, aiming to enhance content detection and the user experience, as shared via OpenAI's Twitter.
  • Murmurs of Imbue's Sudden Success: Imbue's sudden $200M fundraise drew skepticism among members, who explored the company's unclear history and compared its trajectory with the strategies of Scale AI and its subsidiaries for data annotation and PhD recruitment for remote AI projects.
  • Music Industry’s AI Transformation: Udio's statement on AI's potential to revolutionize the music industry clashed with the RIAA's concerns, asserting AI will become essential for music creation despite industry pushback.

Stability.ai (Stable Diffusion) Discord

  • Challenging Stability AI to Step Up: Discussions point to growing concerns about Stability AI’s approach with Stable Diffusion 3 (SD3), stressing the need for uncensored models and updated licenses to retain long-term viability. A more practical real-world application beyond novelty creations is requested by the community.
  • Cost-Effective GPU Strategies Discussed: The comparison of GPU rental costs reveals Vast as a more economical option for running a 3090 compared to Runpod, with prices cited as low as 30 cents an hour.
  • Debate: Community Drive vs. Corporate Backup: There's an active debate on the balance between open-source initiatives and corporate influence, with some members arguing for community support as crucial and others citing Linux's success with enterprise backing as a valid path.
  • Optimizing Builds for Machine Learning: Members are sharing hardware recommendations for effective Stable Diffusion setups, with a consensus forming around the Nvidia 4090 for its performance benefit, potentially favoring dual 4090s over higher VRAM single GPUs for cost savings.
  • Nostalgia Over ICQ and SDXL Hurdles: The shutdown of the legacy messaging service ICQ triggered nostalgic exchanges, while the community also reported challenges in running SDXL, particularly for those experiencing "cuda out of memory" errors due to insufficient VRAM, seeking advice on command-line solutions.

Nous Research AI Discord

  • Introducing the Prompt Engineering Toolkit: An open-source Prompt Engineering Toolkit was shared for use with Sonnet 3.5, designed to assist with creating better prompts for AI applications.
  • Skepticism Breeds Amidst Model Performance: A demonstration of Microsoft's new raw text data augmentation model on Genstruct prompted doubts about its efficacy, showing results that seemed off-topic.
  • AI Chip Performance Heats Up Debate: The new "Sohu" AI chip sparked discussions about its potential for high-performance inference tasks, linking to Gergely Orosz's post which suggests OpenAI doesn't believe AGI is imminent despite advancing hardware.
  • 70B Model Toolkit Launched by Imbue AI: Imbue AI released a toolkit for a 70B model with resources including 11 NLP benchmarks, a code-focused reasoning benchmark, and a hyperparameter optimizer, found at Imbue's introductory page.
  • Embracing the Whimsical AI: A post from a user featured AI-generated content in meme format by Claude from Anthropic, reflecting on Claude's explanation of complex topics and its humorous take on not experiencing weather or existential crises.

LangChain AI Discord

  • Streamlining AI Conversations: Engineers highlighted the .stream() method from langchain_community.chat_models for iterating through LangChain responses (a short sketch follows this list), while others discussed integrating Zep for long-term memory in AI and contemplated direct BytesIO PDF handling in LangChain without temp files.
  • Visualization Quest in LangChain: Discussion around live visualizing agents' thoughts in Streamlit touched on using StreamlitCallback but also identified a gap in managing streaming responses without callbacks.
  • Troubleshooting the Unseen: Inquiries were made about LangSmith's failure to trace execution despite proper environmental setup, with a suggestion to check trace quotas.
  • Extending Containerized Testing: A community member contributed Ollama support to testcontainers-python, facilitating LLM endpoint testing, as indicated in their GitHub issue and pull request.
  • Cognitive Crafts and Publications: A Medium article on few-shot prompting with tool calling in Langchain was shared, alongside a YouTube video exploring the ARC AGI challenges titled "Claude 3.5 struggle too?! The $Million dollar challenge".
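For the .stream() point in the first bullet, a minimal sketch; ChatOllama and the model name are arbitrary stand-ins, and any LangChain chat model exposing .stream() should behave similarly:

```python
# Iterating a chat model's response incrementally via .stream().
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="llama3")  # any locally served model name works here

for chunk in llm.stream("Explain retrieval-augmented generation in one sentence."):
    # Each chunk is a message chunk; .content holds the newly generated text.
    print(chunk.content, end="", flush=True)
print()
```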

LlamaIndex Discord

  • Chatbots Seeking Contextual Clarity: An engineer inquired about how to effectively retrieve context directly from a chat response within the LlamaIndex chatbot framework, sharing implementation details and the challenges encountered.
  • Pull Request Review on the Horizon: A member shared a GitHub PR for review, aimed at adding query filtering functionalities to the Neo4J Database in LlamaIndex, and another member acknowledged the need to address the backlog.
  • Silencing Superfluous Notifications: There was a discussion on how to suppress unnecessary notifications about missing machine learning libraries in the Openailike class, with the clarification that such messages are not errors.
  • Tuning SQL Queries with LLMs: Dialogue among users highlighted the benefits of fine-tuning language models for enhanced precision in SQL queries when using a RAG SQL layer, suggesting better performance with quality training data.
  • Balancing Hybrid Searches: Questions about hybrid search implementations in LlamaIndex have been addressed, focusing on adjusting the alpha parameter to balance metadata and text relevance in search results (see the sketch after this list).
  • Boosting RAG with LlamaIndex: An article was shared highlighting ways to build optimized Retrieval-Augmented Generation systems with LlamaIndex and DSPy, providing insights and practical steps for AI engineers.
  • Open Source Contribution Perks: A call was made for feedback on an open-source project, Emerging-AI/ENOVA, for enhancing AI deployment, monitoring, and auto-scaling, with an incentive of a $50 gift card.
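On the hybrid search bullet above, a sketch of how the alpha knob is typically exposed, assuming a vector store backend with hybrid support (e.g. Weaviate) and an already-built VectorStoreIndex bound to it as `index`:

```python
# Sketch of tuning the hybrid-search balance in LlamaIndex (assumed setup:
# a hybrid-capable vector store behind an existing VectorStoreIndex).
from llama_index.core import VectorStoreIndex  # import shown for context


def make_hybrid_engine(index: VectorStoreIndex, alpha: float):
    # alpha near 0.0 leans on keyword/sparse matching,
    # alpha near 1.0 leans on dense vector similarity over the text.
    return index.as_query_engine(
        vector_store_query_mode="hybrid",
        alpha=alpha,
        similarity_top_k=5,
    )


# Example: weight sparse and dense signals equally.
# engine = make_hybrid_engine(index, alpha=0.5)
# response = engine.query("What changed in the latest release?")
```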

OpenInterpreter Discord

  • Claude-3.5-Sonnet Steps into the Spotlight: The latest Anthropic model is officially named claude-3-5-sonnet-20240620, putting an end to name confusion among members.
  • MoonDream's Vision Limitation Acknowledged: While there's interest in a MoonDream-based vision model for OpenInterpreter (OI), current conversation confirms it's not compatible with OI.
  • Multiline Input Quirks and Vision Command Errors: Technical issues arose with -ml for multiline inputs and the interpreter --os --vision command, with one user verifying their API key but facing errors, and another member reported a ban from attempting to directly drop files into the terminal.
  • 01: OI's Voice Interface, Not for Sale Everywhere: 01, as the voice interface for OI, can't be bought in Spain; enthusiasts are redirected to an open-source dev kit on GitHub for DIY alternatives.
  • Constructing Your Own 01: Tutorials for DIY assembly of 01 from the open-source kit will be proliferating, including one planned for July, hinting at the community's commitment to ensuring wider access beyond commercial sale limitations.

Cohere Discord

Curiosity About Cohere's Scholars Program: One member inquired about the status of the scholars program for the current year, but no additional information or discussion followed on this topic.

Billable Preamble Tokens in the Spotlight: A user highlighted an experiment involving preamble tokens for API calls, bringing up a cost-cutting loophole that could avoid charges by exploiting non-billable preamble usage.

Designing with Rust for LLMs: An announcement was made about the release of Rig, a Rust library for creating LLM-driven applications, with an invitation to developers to engage in an incentivized feedback program to explore and review the library.

Ethical Considerations Surface in AI Usage: Concerns were raised regarding SpicyChat AI, an NSFW bot hosting service, potentially violating Cohere's CC-BY-NC license through profit-generating use, coupled with the claim of circumventing this via OpenRouter.

Learning Event on 1Bit LLMs by Hongyu Wang: An online talk titled The Era of 1Bit LLMs hosted by Hongyu Wang was announced with an invitation extended to attend through a provided Google Meet link.


OpenAccess AI Collective (axolotl) Discord

  • Adam Optimizer Slims Down: Engineers discussed an arXiv paper introducing Adam-mini, highlighting a memory footprint reduced by 45% to 50% compared to AdamW. It achieves this by using far fewer learning rates, assigning them per parameter block in a partition inspired by the Hessian structure of Transformers; a simplified sketch of the block-wise idea follows this list.
  • Training Pitfalls and CUDA Quandaries: One engineer sought advice on implementing output text masking during training, akin to train_on_input, while another raised an issue with CUDA errors, suggesting enabling CUDA_LAUNCH_BLOCKING=1 for identifying illegal memory access during model training.
  • Gradient Accumulation—Friend or Foe?: The impact of increasing gradient accumulation was hotly debated; some believe it may shortcut training by running the optimizer less often, others worry it could lead to slower steps and more training time.
  • Cosine Schedules and QDora Quests: Questions arose about creating a cosine learning rate scheduler with a non-zero minimum on the Hugging Face platform, and excitement was evident over a pull request enabling QDora in PEFT.
  • Narrative Engines and Mistral Mysteries: The introduction of Storiagl, a platform for building stories with custom LLMs, was showcased, while another engineer reported a repetitive text generation issue with Mistral7B, despite high temperature settings and seeking solutions.
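On the Adam-mini bullet above, a heavily simplified sketch of the block-wise idea: keep Adam's per-parameter first moment, but collapse the second moment to one scalar per parameter block (here, crudely, per tensor). This is our toy illustration, not the paper's code, which partitions blocks more carefully:

```python
import torch


class BlockwiseAdamSketch(torch.optim.Optimizer):
    """Toy Adam-mini-style optimizer: one second-moment scalar per parameter
    tensor instead of one per parameter. Illustration only, not the paper's code."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        super().__init__(params, dict(lr=lr, betas=betas, eps=eps))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if not state:
                    state["step"] = 0
                    state["m"] = torch.zeros_like(p)               # per-parameter momentum, as in Adam
                    state["v"] = torch.zeros((), device=p.device)  # ONE scalar per block -> the memory saving
                state["step"] += 1
                m, v = state["m"], state["v"]
                m.mul_(beta1).add_(p.grad, alpha=1 - beta1)
                v.mul_(beta2).add_(p.grad.pow(2).mean(), alpha=1 - beta2)  # block-wise mean of g^2
                m_hat = m / (1 - beta1 ** state["step"])
                v_hat = v / (1 - beta2 ** state["step"])
                p.add_(m_hat / (v_hat.sqrt() + group["eps"]), alpha=-group["lr"])


# Usage: opt = BlockwiseAdamSketch(model.parameters(), lr=1e-3)
```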

LLM Finetuning (Hamel + Dan) Discord

Prompting Takes the Cake in Language Learning: Researchers, including Eline Visser, have shown that prompting a large language model (LLM) outperforms fine-tuning when learning the Kalamang language from a single grammar book. The findings, indicating that 'prompting wins', are detailed in a tweet by Jack Morris and further elaborated in an academic paper.

Catch the AI Engineer World’s Fair Online: The AI Engineer World's Fair 2024 is being streamed live, focusing on keynotes and the CodeGen Track, with access available on YouTube; more specifics are provided on Twitter.

Claude Contest Calls for Creatives: The June 2024 Build with Claude contest has been announced, inviting engineers to demonstrate their expertise with Claude, as outlined in the official guidelines.

Credit Where Credit is Due: An individual offered assistance with a credit form issue, asking to be directly messaged with the related email address to resolve the matter efficiently.

Model Offloading Techniques Debated: The community has observed that DeepSpeed (DS) seems to have more effective fine-grained offloading strategies compared to FairScale's Fully Sharded Data Parallel (FSDP). Additionally, the utility of these offloading strategies with LLama 70B is under consideration by members seeking to optimize settings.


Mozilla AI Discord

  • Mozilla's Builders Program Ticks Clock: Members are reminded to submit their applications for the Mozilla Builders Program before the July 8th early application deadline. For support and additional information, check the Mozilla Builders Program page.
  • '90s Nostalgia via Firefox and llamafile: Firefox has integrated llamafile as an HTTP proxy, allowing users to venture through LLM weights in a retro web experience; a demonstration video is available on YouTube.
  • Create Your Own Chat Universe: Users can create immersive chat scenarios, fusing llamafile with Haystack and Character Codex, through a shared notebook which is accessible here.
  • Cleansing CUDA Clutter in Notebooks: To keep Jupyter notebooks pristine, it's suggested to address CUDA warnings by using the utility from Haystack.
  • NVIDIA's Stock Sent on a Rollercoaster: Following a talk at AIEWF, NVIDIA's market cap fell dramatically, triggering various analyses from outlets like MarketWatch and Barrons over the catalyst of the company's financial performance.

tinygrad (George Hotz) Discord

  • Tinygrad Explores FPGA Acceleration: There's chatter about tinygrad leveraging FPGAs as a backend, with George Hotz hinting at a potential accelerator design for implementation.
  • Groq Alumni Launch Positron for High-Efficiency AI: Ex-Groq engineers introduced Positron, targeting the AI hardware market with devices like Atlas Transformer Inference Server, boasting a 10x performance boost per dollar over competitors like DGX-H100.
  • FPGA's Role in Tailored AI with HDL: Discussion centered on the future of FPGAs equipped with DSP blocks and HBM, which could allow for the creation of model-specific HDL, although it was noted that Positron's approach is generic and not tied to a specific FPGA brand.
  • PyTorch's Impact on AI Celebrated in Documentary: A documentary on YouTube highlighting PyTorch's development and its influence on AI research and tooling has been shared with the community.

AI Stack Devs (Yoko Li) Discord

  • Angry.penguin Ascends to Mod Throne: User angry.penguin was promoted to moderator to tackle the guild's spam problem, volunteering with a proactive approach and immediately cleaning up the existing spam. Yoko Li entrusted angry.penguin with these new responsibilities and spam control measures.
  • Spam No More: Newly-minted moderator angry.penguin announced the successful implementation of anti-spam measures, ensuring the guild's channels are now fortified against disruptive spam attacks. Members may notice a cleaner and more focused discussion environment moving forward.

DiscoResearch Discord

  • German Encoders Go Live on Hugging Face: AI engineers might be enticed by the newly released German Semantic V3 and V3b encoders, available on Hugging Face. V3 targets knowledge-based applications, while V3b emphasizes high performance with innovative features including Matryoshka Embeddings and 8k token context capability.
  • Finetuning Steps for German Encoders Without GGUF: Despite inquiries, the German V3b encoder does not currently have a gguf format; however, for those interested in finetuning, it is recommended to use UKPLab's sentence-transformers finetuning scripts.
  • Possibility of GGUF for Encoders: Amid some confusion, a member clarified by comparison with Ollama that encoders like German Semantic V3 can indeed be adapted to gguf formats, which may involve using dual embedders for enhanced performance.

OpenRouter (Alex Atallah) Discord

  • New AI Player in Town: OpenRouter has introduced the 01-ai/yi-large model, a new language model specialized in knowledge search, data classification, human-like chatbots, and customer service; the model supports multilingual capabilities.
  • Parameter Snafu Resolved: The Recommended Parameters tab for the model pages on OpenRouter had data display issues, which have been fixed, ensuring engineers now see accurate configuration options.
  • AI Meets Academia: The newly launched GPA Saver leverages AI to offer academic assistance and includes tools like a chat assistant, rapid quiz solver, and more; early adopters get a discount using the code BETA.
  • Easing the Integration Experience: Thanks were expressed to OpenRouter for streamlining the process of AI model integration, which was instrumental in the creation of the GPA Saver platform.

The LLM Perf Enthusiasts AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Datasette - LLM (@SimonW) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Torchtune Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The YAIG (a16z Infra) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: !

If you enjoyed AInews, please share with a friend! Thanks in advance!

Don't miss what's next. Subscribe to AI News (MOVED TO news.smol.ai!):