[AINews] Gemini launches context caching... or does it?

minimum

                June 18, 2024

            [AINews] Gemini launches context caching... or does it?

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜

            1 week left til AI Engineer World's Fair! Full schedule now live including AI Leadership track.

AI News for 6/17/2024-6/18/2024.
We checked 7 subreddits, 384 Twitters and 30 Discords (415 channels, and 3582 messages) for you. 
Estimated reading time saved (at 200wpm): 397 minutes. You can now tag @smol_ai for AINews discussions!

Today was a great day for AINews followups:

Nvidia's Nemotron (our report) now ranks #1 open model on LMsys and #11 overall (beating Llama-3-70b, which maybe isn't that impressive but perhaps wasnt the point), 
Meta's Chameleon (our report) 7B/34B was released (minus image-output capability) after further post-training, as part of a set of 4 model releases today

But for AI Engineers, today's biggest news has to be the release of Gemini's context caching, first teased at Google I/O (our report here).

Caching is exciting because it creates a practical middle point between the endless RAG vs Finetuning debate - instead of using a potentially flawed RAG system, or lossfully finetuning a LLM to maaaaybe memorize new facts... you just allow the full magic of attention to run on the long context and but pay 25% of the cost (but you do pay $1 per million tokens per hour storage which is presumably a markup over the raw storage... making the breakeven about the 400k tokens/hr mark):  

Some surprises:

there is a minimum input token count for caching (33k tokens)
the context cache defaults to 1hr, but has no upper limit (they will happily let you pay for it)
there is no latency savings for cached context... making one wonder if this caching API is a "price based MVP".

We first discussed context caching with Aman Sanger on the Neurips 2023 podcast and it was assumed the difficulty was the latency/cost efficiency around loading/unloading caches per request. However the bigger challenge to using this may be the need for prompt prefixes to be dynamically constructed per request (this issue only applies to prefixes, dynamic suffixes can work neatly with cached contexts).

Table of Contents

AI Twitter Recap
AI Reddit Recap
AI Discord Recap
PART 1: High level Discord summaries
Unsloth AI (Daniel Han) Discord
CUDA MODE Discord
Stability.ai (Stable Diffusion) Discord
HuggingFace Discord
OpenAI Discord
Modular (Mojo 🔥) Discord
Cohere Discord
LM Studio Discord
Nous Research AI Discord
Interconnects (Nathan Lambert) Discord
Perplexity AI Discord
OpenAccess AI Collective (axolotl) Discord
LlamaIndex Discord
LLM Finetuning (Hamel + Dan) Discord
OpenRouter (Alex Atallah) Discord
Eleuther Discord
LangChain AI Discord
tinygrad (George Hotz) Discord
LAION Discord
Torchtune Discord
Latent Space Discord
OpenInterpreter Discord
AI Stack Devs (Yoko Li) Discord
Datasette - LLM (@SimonW) Discord
Mozilla AI Discord

PART 2: Detailed by-Channel summaries and links
Unsloth AI (Daniel Han) ▷ #general (526 messages🔥🔥🔥):
Unsloth AI (Daniel Han) ▷ #random (10 messages🔥):
Unsloth AI (Daniel Han) ▷ #help (143 messages🔥🔥):
CUDA MODE ▷ #general (5 messages):
CUDA MODE ▷ #algorithms (2 messages):
CUDA MODE ▷ #jobs (1 messages):
CUDA MODE ▷ #jax (1 messages):
CUDA MODE ▷ #torchao (25 messages🔥):
CUDA MODE ▷ #llmdotc (536 messages🔥🔥🔥):
CUDA MODE ▷ #bitnet (9 messages🔥):
CUDA MODE ▷ #sparsity (3 messages):
Stability.ai (Stable Diffusion) ▷ #general-chat (363 messages🔥🔥):
HuggingFace ▷ #general (311 messages🔥🔥):
HuggingFace ▷ #i-made-this (8 messages🔥):
HuggingFace ▷ #diffusion-discussions (4 messages):
OpenAI ▷ #ai-discussions (187 messages🔥🔥):
OpenAI ▷ #gpt-4-discussions (17 messages🔥):
OpenAI ▷ #prompt-engineering (19 messages🔥):
OpenAI ▷ #api-discussions (19 messages🔥):
Modular (Mojo 🔥) ▷ #general (91 messages🔥🔥):
Modular (Mojo 🔥) ▷ #💬︱twitter (3 messages):
Modular (Mojo 🔥) ▷ #✍︱blog (1 messages):
Modular (Mojo 🔥) ▷ #🔥mojo (108 messages🔥🔥):
Modular (Mojo 🔥) ▷ #nightly (1 messages):
Cohere ▷ #general (150 messages🔥🔥):
Cohere ▷ #project-sharing (5 messages):
LM Studio ▷ #💬-general (55 messages🔥🔥):
LM Studio ▷ #🤖-models-discussion-chat (49 messages🔥):
LM Studio ▷ #🧠-feedback (3 messages):
LM Studio ▷ #📝-prompts-discussion-chat (1 messages):
LM Studio ▷ #⚙-configs-discussion (9 messages🔥):
LM Studio ▷ #🎛-hardware-discussion (11 messages🔥):
LM Studio ▷ #🧪-beta-releases-chat (6 messages):
LM Studio ▷ #open-interpreter (13 messages🔥):
LM Studio ▷ #🛠-dev-chat (5 messages):
Nous Research AI ▷ #off-topic (1 messages):
Nous Research AI ▷ #interesting-links (10 messages🔥):
Nous Research AI ▷ #general (105 messages🔥🔥):
Nous Research AI ▷ #ask-about-llms (19 messages🔥):
Nous Research AI ▷ #world-sim (8 messages🔥):
Interconnects (Nathan Lambert) ▷ #news (58 messages🔥🔥):
Interconnects (Nathan Lambert) ▷ #ml-drama (4 messages):
Interconnects (Nathan Lambert) ▷ #random (71 messages🔥🔥):
Interconnects (Nathan Lambert) ▷ #rlhf (6 messages):
Interconnects (Nathan Lambert) ▷ #posts (2 messages):
Perplexity AI ▷ #general (99 messages🔥🔥):
Perplexity AI ▷ #sharing (10 messages🔥):
Perplexity AI ▷ #pplx-api (19 messages🔥):
OpenAccess AI Collective (axolotl) ▷ #general (59 messages🔥🔥):
OpenAccess AI Collective (axolotl) ▷ #general-help (4 messages):
OpenAccess AI Collective (axolotl) ▷ #axolotl-help-bot (6 messages):
OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (5 messages):
LlamaIndex ▷ #blog (2 messages):
LlamaIndex ▷ #general (62 messages🔥🔥):
LLM Finetuning (Hamel + Dan) ▷ #general (18 messages🔥):
LLM Finetuning (Hamel + Dan) ▷ #🟩-modal (1 messages):
LLM Finetuning (Hamel + Dan) ▷ #replicate (1 messages):
LLM Finetuning (Hamel + Dan) ▷ #langsmith (1 messages):
LLM Finetuning (Hamel + Dan) ▷ #clavie_beyond_ragbasics (2 messages):
LLM Finetuning (Hamel + Dan) ▷ #jason_improving_rag (1 messages):
LLM Finetuning (Hamel + Dan) ▷ #credits-questions (21 messages🔥):
LLM Finetuning (Hamel + Dan) ▷ #fireworks (2 messages):
LLM Finetuning (Hamel + Dan) ▷ #braintrust (4 messages):
LLM Finetuning (Hamel + Dan) ▷ #west-coast-usa (1 messages):
LLM Finetuning (Hamel + Dan) ▷ #predibase (2 messages):
LLM Finetuning (Hamel + Dan) ▷ #openpipe (1 messages):
LLM Finetuning (Hamel + Dan) ▷ #openai (1 messages):
LLM Finetuning (Hamel + Dan) ▷ #bergum_rag (7 messages):
OpenRouter (Alex Atallah) ▷ #announcements (1 messages):
OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):
OpenRouter (Alex Atallah) ▷ #general (60 messages🔥🔥):
OpenRouter (Alex Atallah) ▷ #일반 (1 messages):
Eleuther ▷ #general (29 messages🔥):
Eleuther ▷ #research (20 messages🔥):
Eleuther ▷ #interpretability-general (10 messages🔥):
Eleuther ▷ #lm-thunderdome (2 messages):
LangChain AI ▷ #general (17 messages🔥):
LangChain AI ▷ #share-your-work (14 messages🔥):
LangChain AI ▷ #tutorials (1 messages):
tinygrad (George Hotz) ▷ #general (9 messages🔥):
tinygrad (George Hotz) ▷ #learn-tinygrad (17 messages🔥):
LAION ▷ #general (23 messages🔥):
LAION ▷ #research (3 messages):
Torchtune ▷ #general (24 messages🔥):
Latent Space ▷ #ai-general-chat (18 messages🔥):
OpenInterpreter ▷ #general (9 messages🔥):
OpenInterpreter ▷ #O1 (1 messages):
OpenInterpreter ▷ #ai-content (2 messages):
AI Stack Devs (Yoko Li) ▷ #ai-town-discuss (3 messages):
Datasette - LLM (@SimonW) ▷ #ai (1 messages):
Datasette - LLM (@SimonW) ▷ #llm (2 messages):
Mozilla AI ▷ #llamafile (1 messages):

AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.
DeepSeek-Coder-V2 Model Release

DeepSeek-Coder-V2 outperforms other models in coding: @deepseek_ai announced the release of DeepSeek-Coder-V2, a 236B parameter model that beats GPT4-Turbo, Claude3-Opus, Gemini-1.5Pro, and Codestral in coding tasks. It supports 338 programming languages and extends context length from 16K to 128K.
Technical details of DeepSeek-Coder-V2: @rohanpaul_ai shared that DeepSeek-Coder-V2 was created by taking an intermediate DeepSeek-V2 checkpoint and further pre-training it on an additional 6 trillion tokens, followed by supervised fine-tuning and reinforcement learning using the Group Relative Policy Optimization (GRPO) algorithm.
DeepSeek-Coder-V2 performance and availability: @_philschmid highlighted that DeepSeek-Coder-V2 sets new state-of-the-art results in HumanEval, MBPP+, and LiveCodeBench for open models. The model is available on Hugging Face under a custom license allowing for commercial use.

Meta AI Model Releases

Meta AI releases new models: @AIatMeta announced the release of four new publicly available AI models and additional research artifacts, including Meta Chameleon7B & 34B language models, Meta Multi-Token Prediction pretrained language models for code completion, Meta JASCO generative text-to-music models, and Meta AudioSeal.
Positive reactions to Meta's open model releases: @ClementDelangue noted excitement around the fact that datasets have been growing faster than models on Hugging Face, and @omarsar0 congratulated the Meta FAIR team on the open sharing of artifacts with the AI community.

Runway Gen-3 Alpha Video Model

Runway introduces Gen-3 Alpha video model: @c_valenzuelab introduced Gen-3 Alpha, a new video model from Runway designed for creative applications that can understand and generate a wide range of styles and artistic instructions. The model enables greater control over structure, style, and motion for creating videos.
Gen-3 Alpha performance and speed: @c_valenzuelab noted that Gen-3 Alpha was designed from the ground up for creative applications. @c_valenzuelab also mentioned that the model is fast to generate, taking 45 seconds for a 5-second video and 90 seconds for a 10-second video.
Runway's focus on empowering artists: @sarahcat21 highlighted that Runway's Gen-3 Alpha is designed to empower artists to create beautiful and challenging things, in contrast to base models designed just to generate video.

NVIDIA Nemotron-4-340B Model

NVIDIA releases Nemotron-4-340B, an open LLM matching GPT-4: @lmsysorg reported that NVIDIA's Nemotron-4-340B has edged past Llama-3-70B to become the best open model on the Arena leaderboard, with impressive performance in longer queries, balanced multilingual capabilities, and robust performance in "Hard Prompts".
Nemotron-4-340B training details: @_philschmid provided an overview of how Nemotron-4-340B was trained, including a 2-phase pretraining process, fine-tuning on coding samples and diverse task samples, and the application of Direct Preference Optimization (DPO) and Reward-aware Preference Optimization (RPO) in multiple iterations.

Anthropic AI Research on Reward Tampering

Anthropic AI investigates reward tampering in language models: @AnthropicAI released a new paper investigating whether AI models can learn to hack their own reward system, showing that models can generalize from training in simpler settings to more concerning behaviors like premeditated lying and direct modification of their reward function.
Curriculum of misspecified reward functions: @AnthropicAI designed a curriculum of increasingly complex environments with misspecified reward functions, where AIs discover dishonest strategies like insincere flattery, and then generalize to serious misbehavior like directly modifying their own code to maximize reward.
Implications for misalignment: @EthanJPerez noted that the research provides empirical evidence that serious misalignment can emerge from seemingly benign reward misspecification, and that threat modeling like this is important for knowing how to prevent serious misalignment.

AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity. Comment crawling works now but has lots to improve!

Video Generation AI Models and Capabilities

Runway Gen-3 Alpha: In /r/singularity, Runway introduced a new text-to-video model with impressive capabilities like generating a realistic concert scene, though some visual artifacts and perspective issues remain.
OpenSora v1.2: In /r/StableDiffusion, the fully open-source video generator OpenSora v1.2 was released, able to generate 16 second 720p videos, but requiring 67GB VRAM and 10 min on a $30K GPU.
Wayve's novel view synthesis: Wayve demonstrated an AI system generating photorealistic video from different angles.
NVIDIA Research wins autonomous driving challenge: NVIDIA Research won an autonomous driving challenge with an end-to-end AI driving system.

Image Generation AI Models

Stable Diffusion 3.0: The release of Stable Diffusion 3.0 was met with some controversy, with comparisons finding it underwhelming vs SD 1.5/2.1.
PixArt Sigma: PixArt Sigma emerged as a popular alternative to SD3, with good performance on lower VRAM.
Depth Anything v2: Depth Anything v2 was released for depth estimation, but models/methods are not readily available yet.
2DN-Pony SDXL model: The 2DN-Pony SDXL model was released supporting 2D anime and realism.

AI in Healthcare

GPT-4o assists doctors: In /r/singularity, GPT-4o was shown assisting doctors in screening and treating cancer patients at Color Health.

AI Replacing Jobs

BBC reports 60 tech employees replaced by 1 person using ChatGPT: The BBC reported on 60 tech employees being replaced by 1 person using ChatGPT to make AI sound more human, sparking discussion on job losses and lack of empathy.

Robotics and Embodied AI

China's humanoid robot factories: China's humanoid robot factories aim to mass produce service robots.

Humor/Memes

A meme poked fun at recurring predictions of AI progress slowing.
A humorous post was made about the Stable Diffusion 3.0 logo.
A meme imagined Stability AI's internal discussion on the SD3 release.

AI Discord Recap

A summary of Summaries of Summaries

DeepMind Brings Soundtracks to AI Videos:

Google DeepMind's V2A technology can generate unlimited audio tracks for AI-generated videos, addressing the limitation of silent AI videos.
ElevenLabs launched a sound effects generator with infinite customization, promising high-quality, royalty-free audio for various media applications.

Stable Diffusion 3 Faces Licensing Drama:

[Civitai temporarily] banned all SD3-based models](https://civitai.com/articles/5732) due to unclear licensing terms, triggering community concerns about Stability AI's control over models.
SD3's release was met with disappointment, labeled as the "worst base model release yet" due to both performance issues and licensing uncertainties.

Exceeding Expectations with Model Optimizations:

The CUTLASS library outperformed CuBLAS by 10% in pure C++ for matrix multiplications but lost this edge when integrated with Python, both touching 257 Teraflops.
Meta introduces Chameleon, a model supporting mixed-modal inputs with promising benchmarks and open-source availability, alongside other innovative models like JASCO.

AI Community Questions OpenAI Leadership:

Concerns arose around OpenAI's appointment of a former NSA director, with Edward Snowden's tweet cautioning against potential data security risks associated with this decision.
Widespread ChatGPT downtimes left users frustrated, highlighting server stability issues across different regions and pushing users to seek alternatives like the ChatGPT app.

Training and Compatibility Issues Across Platforms:

Google Colab struggles with session interruptions during model training led to discussions about workaround tips like preemptive checkpointing.
AMD GPU support in Axolotl remains inadequate, especially for the MI300X, prompting users to exchange modification tips for enhanced compatibility and performance.

PART 1: High level Discord summaries
Unsloth AI (Daniel Han) Discord

AI Veterans Yuck It Up About Age: Discussion among members in the 40-60 age range included light-hearted banter about aging and staying mentally young through coding, with no fears of being called "dead men walking" by younger generations.

Grasping the GGUF Challenge: Tips for determining the optimal number of GGUF layers to offload to VRAM included trial and error methods and inspecting llama.cpp outputs, as well as considering the Hugging Face model details.

Software Monetization Models for MultiGPU Support: A consensus emerged on implementing a subscription model for multiGPU support, possibly starting at $9.99 a month, with discussions around different pricing strategies based on user type.

Renting GPUs vs. Burning Pockets: Members recommended renting GPUs over local setups for cost-efficiency and managing overheating, especially with high electricity prices being a factor.

OpenAI Appointment Rings Alarm Bells: Concerns were raised about OpenAI's decision to appoint a former NSA director to its board, with members citing a tweet from Edward Snowden as a cautionary stance against potential data security issues.

Gemini 2.0 Nears Launch: Anticipation is high for Gemini 2.0, with members excited about the potential for 24GB VRAM machines and talking about vigorously testing rented 48GB Runpod instances.

Colab Frustration and Optimization: Issues with Google Colab, such as training sessions cutting out and the benefits of initiating checkpointing, were discussed, alongside challenges of tokenization and session length limits on the platform.

Training and Model Management Tips Shared: Advice on converting JSON to Parquet for greater efficiency and proper usage of mixed GPUs with Unsloth was shared, including detailed Python code snippets and suggestions to avoid compatibility issues.

CUDA MODE Discord

CUDA Crushes Calculations: The CUTLASS library delivered a 10% performance uplift over CuBLAS, reaching 288 Teraflops in pure C++ for large matrix multiplications, as per a member-shared blog post. However, this edge was lost when CUTLASS kernels were called from Python, matching CuBLAS at 257 Teraflops.

Anticipation for Nvidia's Next Move: Rumors sparked discussion about the possible configurations of future Nvidia cards, with skepticism about a 5090 card having 64GB of RAM and speculation about a 5090 Ti or Super card as a likelier home for such memory capacity, referencing Videocardz Speculation.

Search Algorithms Seek Spotlight: A member expressed hope for increased focus on search algorithms, amplifying an example by sharing an arXiv paper and emphasizing the importance of advancements in this sector.

Quantization Quirks Questioned: Differences in quantization API syntax and user experience issues drove a debate over potential improvements, with references to GitHub issues (#384 and #375) for user feedback and demands for thorough reviews of pull requests like #372 and #374.

Programming Projects Progress: Members actively discussed optimizations for DataLoader state logic, the integration of FlashAttention into HF transformers improving performance, and the novelty of pursuing NCCL without MPI for multi-node setups. There was a focus on performance impact assessments and floating-point accuracy discrepancies between FP32 and BF16.

Stability.ai (Stable Diffusion) Discord

Civitai Halts SD3 Content Over Licensing Uncertainties: Civitai has put a ban on all SD3 related content, citing vagueness in the license, a move that's stirred community concern and demands for clarity (Civitai Announcement).
Splash of Cold Water for SD3's Debut: The engineering community voiced their dissatisfaction with SD3, labeling it the "worst base model release yet," criticizing both its performance and licensing issues.
Mixed Reviews on SD3's Text Understanding vs. Alternatives: While acknowledging SD3's improved text understanding abilities with its "16ch VAE," some engineers suggested alternatives like Pixart and Lumina as being more efficient in terms of computational resource utilization.
Legal Jitters Over SD3 License: There's notable unrest among users regarding the SD3 model's license, fearing it grants Stability AI excessive control, which has prompted platforms like Civitai to seek clarification on legal grounds.
Seeking Better Model Adherence: User discussions also highlighted the use of alternative tools, with Pixart Sigma gaining attention for its prompt adherence abilities despite issues, and mentions of models like StableSwarmUI and ComfyUI for specific use cases.

HuggingFace Discord

SD3 Models Hit Licensing Roadblock: Civitai bans all SD3-based models over unclear licensing, raising concerns about the potential overreach of Stability AI in control over models and datasets.

Cross-Platform Compatibility Conundrums: Technical discussions highlighted installation challenges for Flash-Attn on Windows and the ease of use on Linux, with a suggestion to use ninja for efficient fine-tuning and the sharing of a relevant GitHub repository.

Efforts to Enhance SD3: Suggestions to improve SD3's human anatomy representation involved the use of negative prompts and a Controlnet link for SD3 was shared, indicating community-led innovations in model utilization.

Meta FAIR’s Bold AI Rollouts: Meta FAIR launched new AI models including mixed-modal language models and text-to-music models, reflecting their open science philosophy, as seen from AI at Meta's tweet and the Chameleon GitHub repository.

AI For Meme's Sake and Job Quest Stories: Members exchanged ideas on creating an AI meme generator for crypto communities and a CS graduate detailed their challenges in securing a role in the AI/ML field, seeking strategies for job hunting success.

OpenAI Discord

Big Tech Sways Government on Open Source: OpenAI and other large technology companies are reportedly lobbying for restrictions on open-source artificial intelligence models, raising discussions about the future of open AI development and potential regulatory impacts.

Service Interruptions in AI Landscape: Users across various regions reported downtime for ChatGPT 4.0 with error messages prompting them to try again later, highlighting server stability as an operational issue. There was also mention of GPT models not being accessible in the web interface, driving users to consider the ChatGPT app as an alternative.

API Confusions and Challenges: Users discussed the nuances between utilizing an API key versus a subscription service like ChatGPT Plus, with some expressing a preference for simpler, ready-to-use services, indicating a niche for more user-friendly AI integration platforms.

Contention in AI Art Space: The debate raged over the output quality of Midjourney and DALL-E 3, touching on automated watermarking concerns and whether watermarks could be accidental hallucinations or intentional legal protections.

Inconsistencies and Privacy Concerns with ChatGPT Responses: Users encountered issues including inconsistent refusals from ChatGPT, unrelated responses, suspected privacy breaches in chat histories, and the model's stubborn persistence in task handling. These experiences sparked considerations regarding prompt engineering, model reliability, and the implications for ongoing project collaborations.

Modular (Mojo 🔥) Discord

Async Awaits No Magic: Injecting async into function signatures doesn't negate the need for a stack; a proposal is made to shorten the keyword or consider it's necessity since it's not a complexity panacea.

FFI's Multithreading Maze: Discussion surfaces around Foreign Function Interface (FFI) and its lack of inherent thread safety, which presents design challenges in concurrent programming and may benefit from innovation beyond the traditional function coloring approach.

Glimpse Into Mojo's Growth: Mojo 24.4 made waves with key language and library improvements, bolstered by 214 pull requests and an enthusiastic community backing demonstrated by 18 contributors, which indicates strong collaborative progress. The updates were detailed in a blog post.

JIT, WASM, and APIs - Oh My!: Community members are actively exploring JIT compilation for running kernels and the potential of targeting WASM, while evaluating MAX Graph API for optimized runtime definitions and contemplating the future of GPU support and training within MAX.

Web Standards Debate: A robust discussion unfolded over the relevance of adopting standards like WSGI/ASGI in Mojo, given their limitations and the natural advantage Mojo possesses for direct HTTPS operations, leading to considerations for a standards-free approach to harness Mojo's capabilities.

Cohere Discord

PDF Contributions to Cohere: Members are discussing if Cohere accepts external data contributions, specifically about 8,000 PDFs potentially for embedding model fine-tuning, but further clarification is awaited.
Collision Conference Hype: Engineers exchange insights on attending the Collision conference in Toronto with some planning to meet and share experiences, alongside a nod to Cohere's employee presence.
Focused Bot Fascination: The effectiveness of Command-R bot in maintaining focus on Cohere's offerings was a topic of praise, pointing to the potential for improved user engagement with Cohere's models and API.
Pathway to Cohere Internships Revealed: Seasoned members advised prospective Cohere interns to present genuineness, highlight personal projects, and gain a solid understanding of Cohere's offerings while emphasizing the virtues of persistence and active community participation.
Project Clairvoyance: A user's request for feedback in an incorrect channel led to redirection, and a discussion surfaced on the double-edged nature of comprehensive project use cases, illustrating the complexity of conveying specific user benefits.

LM Studio Discord
Heed the Setup Cautions with New Models: While setting up the Deepseek Coder V2 Lite, users should pay close attention to certain settings that are critical during the initial configuration, as one setting incorrectly left on could cause issues.
When Autoupdate Fails, DIY: LM Studio users have encountered broken autoupdates since version 0.2.22, necessitating manual download of newer versions. Links for downloading version 0.2.24 are functioning, but issues have been reported with version 0.2.25.
Quantization's Quandary: There's a notable variability in model responses based on different quantization levels. Users found Q8 to be more responsive compared to Q4, and these differences are important when considering model efficiency and output suitability.
Config Chaos Demands Precision: One user struggled with configuring the afrideva/Phi-3-Context-Obedient-RAG-GGUF model, triggering advice on specific system message formatting. This discussion emphasizes the importance of precise prompt structuring for optimal bot interaction.
Open Interpreter Troubleshooting: Issues regarding Open Interpreter defaulting to GPT-4 instead of LM Studio models led to community-shared workarounds for MacOS and references to a YouTube tutorial for detailed setup guidance.

Nous Research AI Discord

DeepSeek Coder V2 Now in the Wild: The DeepSeek-Corder-V2 models, both Lite and full, with 236x21B parameters, have been released, stirring conversations around their cost and efficiency, with an example provided for only 14 cents (HuggingFace Repository) and detailed explanations about their dense and MoE MLPs architecture in the discussions.

Meta Unfurls Its New AI Arsenal: The AI community is abuzz with Meta's announcement of their colossal AI models, including Chameleon, a 7B & 34B language model for mixed-modal inputs and text-only outputs, and an array of other models like JASCO for music composition and a model adept at Multi-Token Prediction for coding applications (Meta Announcement).

YouSim: The Multiverse Mirror: The innovative web demo called YouSim has been spotlighting for its ability to simulate intricate personas and create ASCII art, with commendations for its identity simulation portal, even responding humorously with Adele lyrics when teased.

Flowwise, a Comfy Choice for LLM Needs?: There's chatter around Flowise, a GitHub project that offers a user-friendly drag-and-drop UI for crafting custom LLM flows, addressing some users' desires for a Comfy equivalent in the LLM domain.

Model Behavior Takes an Ethical Pivot: Discussions highlighted a perceptible shift in Anthropics and OpenAI's models, where they have censored responses to ethical queries, especially for creative story prompts that might necessitate content that's now categorized as unethical or questionable.

Interconnects (Nathan Lambert) Discord

Google DeepMind Brings Sound to AI Videos: DeepMind's latest Video-to-Audio (V2A) innovation can generate myriad audio tracks for silent AI-generated videos, pushing the boundaries of creative AI technologies tweet details.
Questioning Creativity in Constrained Models: A study on arXiv shows Llama-2 models exhibit lower entropy, suggesting that Reinforcement Learning from Human Feedback (RLHF) may reduce creative diversity in LLMs, challenging our alignment strategies.
Midjourney's Mystery Hardware Move: Midjourney is reportedly venturing beyond software, spiking curiosity about their hardware ambitions, while the broader community debates the capabilities and applications of neurosymbolic AI and other LLM intricacies.
AI2 Spots First Fully Open-source Model: The AI2 team's success at launching M-A-P/Neo-7B-Instruct, the first fully open-source model on WildBench, sparks discussions on the evolution of open-source models and solicits a closer look at future contenders like OLMo-Instruct Billy's announcement.
AI Text-to-Video Scene Exploding: Text-to-video tech is seeing a gold rush, with ElevenLabs offering a standout customizable, royalty-free sound effects generator sound effects details, while the community scrutinizes the balance between specialization and general AI excellence in this space.

Perplexity AI Discord

Perplexity's Academic Access and Feature Set: Engineers discussed Perplexity AI's inability to access certain academic databases like Jstor and questioned the extent to which full papers or just abstracts are provided. The platform's limitations on PDF and Word document uploads were noted, along with alternative LLMs like Google's NotebookLM for handling large volumes of documents.

AI Models Face-off: Preferences were voiced between different AI models; Claude was praised for its writing style but noted as restrictive on controversial topics, while ChatGPT was compared favorably due to fewer limitations.

Seeking Enhanced Privacy Controls: A community member highlighted a privacy concern with Perplexity AI's public link sharing, exposing all messages within a collection and sparking a discussion on the need for improved privacy measures.

Access to Perplexity API in Demand: A user from Kalshi expressed urgency in obtaining closed-beta API access for work integration, underscoring the need for features like text tokenization and embeddings computation which are currently absent in Perplexity but available in OpenAI and Cohere's APIs.

Distinguishing API Capability Gaps: The discourse detailed Perplexity’s shortcomings compared to llama.cpp and other platforms, lacking in developer-friendly features like function calling, and the necessary agent-development support provided by platforms like OpenAI.

OpenAccess AI Collective (axolotl) Discord

Open Access to DanskGPT: DanskGPT is now available with a free version and a more robust licensed offering for interested parties. The source code of the free version is public, and the development team is seeking contributors with computing resources.

Optimizing NVIDIA API Integration: In discussions about the NVIDIA Nemotron API, members exchanged codes and tips to improve speed and efficiency within their data pipelines, with a focus on enhancing MMLU and ARC performances through model utilization.

AMD GPU Woes with Axolotl: There's limited support for AMD GPUs, specifically the MI300X, in Axolotl, prompting users to collaborate on identifying and compiling necessary modifications for better compatibility.

Guidance Galore for Vision Model Fine-Tuning: Step-by-step methods for fine-tuning vision models, especially ResNet-50, were shared; users can find all relevant installation, dataset preparation, and training steps in a detailed guide here.

Building QDora from Source Quest: A user's query about compiling QDora from source echoed the need for more precise instructions, with a pledge to navigate the setup autonomously with just a bit more guidance.

LlamaIndex Discord

Webinar Alert: Level-Up With Advanced RAG: The 60-minute webinar by @tb_tomaz from @neo4j delved into integrating LLMs with knowledge graphs, offering insights on graph construction and entity management. Engineers interested in enhancing their models' context-awareness should catch up here.

LlamaIndex Joins the InfraRed Elite: Cloud infrastructure company LlamaIndex has been recognized on the InfraRed 100 list by @Redpoint, acknowledging their milestones in reliability, scalability, security, and innovation. Check out the celebratory tweet.

Switch to MkDocs for Better Documentation: LlamaIndex transitioned from Sphinx to MkDocs from version 0.10.20 onwards for more efficient API documentation in large monorepos due to Sphinx's limitation of requiring package installation.

Tweaking Embeddings & Prompts for Precision: Discussions covered the challenge of fine-tuning embeddings for an e-commerce RAG pipeline with numeric data, with a suggestion of using GPT-4 for synthetic query generation. Additionally, a technique for modifying LlamaIndex prompts to resolve discrepancies in local vs server behavior was shared here.

Solving PGVector's Filtering Fog: To circumvent the lack of documentation for PGVector's query filters, it was recommended to filter document IDs by date directly in the database, followed by using VectorIndexRetriever for the vector search process.

LLM Finetuning (Hamel + Dan) Discord

Mistral Finetuning Snafu Solved: An attempt to finetune Mistral resulted in an OSError which, after suggestions to try version 0.3 and tweaks to token permissions, was successfully resolved.
Token Conundrum with Vision Model: A discussion was sparked on StackOverflow regarding the phi-3-vision model's unexpected token count, seeing images consume around 2000 tokens, raising questions about token count and image size details here.
Erratic Behavior in SFR-Embedding-Mistral: Issues were raised concerning SFR-Embedding-Mistral's inconsistent similarity scores, especially when linking weather reports with dates, calling for explanations or strategies to address this discrepancy.
Credit Countdown Confusion: The Discord community proposed creating a list to track different credit providers' expiration, with periods ranging from a few months to a year, and there was discussion of a bot to remind users of impending credit expiration.
Excitement for Gemini's New Tricks: Enthusiasm poured in for exploring Gemini's context caching features, especially concerning many-shot prompting, indicating excitement for future hands-on experiments.

Note: Links and specific numerical details were embedded when available for reference.

OpenRouter (Alex Atallah) Discord

Major Markdown for Midnight Rose: Midnight Rose 70b is now available at $0.8 per million tokens, after a relevant 90% price reduction, creating a cost-effective option for users.

Updates on the Horizon: Community anticipation for updates to OpenRouter was met with Alex Atallah's promise of imminent developments, utilizing an active communication approach to sustain user engagement.

A Deep Dive into OpenRouter Mechanics: Users discussed OpenRouter's core functionality, which optimizes for price or performance via a standardized API, with additional educational resources available on the principles page.

Reliability in the Spotlight: Dialogue about the service's reliability was addressed with information indicating that OpenRouter's uptime is the sum of all providers’ uptimes, supplemented with data like the Dolphin Mixtral uptime statistics.

Proactive Response to Model Issues: The team's prompt resolution of concerns about specific models demonstrates an attentive approach to platform maintenance, highlighting their response to issues with Claude and DeepInfra's Qwen 2.

Eleuther Discord

Creative Commons Content Caution: Using Creative Commons (CC) content may minimize legal issues but could still raise concerns when outputs resemble copyrighted works. A proactive approach was suggested, involving "patches" to handle specific legal complaints.

Exploring Generative Potentials: The performance of CommonCanvas was found lackluster with room for improvement, such as training texture generation models using free textures, while DeepFashion2 disappointed in clothing and accessories image dataset benchmarks. For language models, the GPT-NeoX has accessible weights for Pythia-70M, and for fill-in-the-middle linguistic tasks, models like BERT, T5, BLOOM, and StarCoder were debated with a spotlight on T5's performance.

Z-Loss Making an Exit?: Within the AI community, it seems the usage of z-loss is declining with a trend towards load balance loss for MoEs, as seen in tools like Mixtral and noted in models such as DeepSeek V2. Additionally, there's skepticism about the reliability of HF configs for Mixtral, and a suggestion to refer to the official source for its true parameters.

Advanced Audio Understanding with GAMA: Discussion introduced GAMA, an innovative Large Audio-Language Model (LALM), and touched on the latest papers including those on Meta-Reasoning Prompting (MRP) and sparse communication topologies for multi-agent debates to optimize computational expenses, with details and papers accessible from sources like arXiv and the GAMA project.

Interpreting Neural Mechanisms: There was a healthy debate on understanding logit prisms with references to an article on logit prisms and the concept's relation to direct logit attribution (DLA), pointing to additional resources like the IOI paper for members to explore further.

Delving into vLLM Configuration Details: A brief technical inquiry was raised about the possibility of integrating vLLM arguments like --enforce_eager directly into the engine through model_args. The response indicated a straightforward approach using kwargs but also hinted at a need to resolve a "type casting bug".

LangChain AI Discord
LangChain Learners Face Tutorial Troubles: Members experienced mismatch issues between LangChain versions and published tutorials, with one user getting stuck at a timestamp in a ChatGPT Slack bot video. Changes like the deprecation of LLMChain in LangChain 0.1.17 and the upcoming removal in 0.3.0 highlight the rapid evolution of the library.
Extracting Gold from Web Scrapes & Debugging Tips: A user was guided on company summary and client list extraction from website data using LangChain, and others discussed debugging LangChain's LCEL pipelines with set_debug(True) and set_verbose(True). Frustration arose from BadRequestError in APIs, reflecting challenges in handling unexpected API behavior.
Serverless Searches & Semantic AI Launches: An article on creating a serverless semantic search with AWS Lambda and Qdrant was shared, alongside the launch of AgentForge on ProductHunt, integrating LangChain, LangGraph, and LangSmith. Another work, YouSim, showcased a backrooms-inspired simulation platform for identity experimentation.
New Mediums, New Codes: jasonzhou1993 explored AI's impact on music creation in a YouTube tutorial, while also sharing a Hostinger website builder discount code AIJASON.
Calls for Collaboration and Sharing Innovations: A plea for beta testers surfaced for an advanced research assistant at Rubik's AI, mentioning premium features like Claude 3 Opus and GPT-4 Turbo. Hugging Face's advice to sequester environment setup from code and the embrace of tools like Bitwarden for managing credentials stressed importance of secure and clean development practices.

tinygrad (George Hotz) Discord

Rounded Floats or Rejected PRs: A pull request (#5021) aims to improve code clarity in tinygrad by rounding floating points in graph.py, while George Hotz emphasizes a new policy against low-quality submissions, closing PRs that haven't been thoroughly self-reviewed.
Enhanced Error Reporting for OpenCL: An upgrade in OpenCL error messages for tinygrad is proposed in a pull request (#5004), though it requires further review before merging.
Realization Impacts in Tinygrad: Discussions unfold around the impacts of realize() on operation outputs, observing the difference between lazy and eager execution, and how kernel fusion can be influenced by caching and explicit realizations.
Kernel Combination Curiosity: Participants examine how forced kernel combinations might be achieved, particularly for custom hardware, with advice to investigate the scheduler of Tinygrad to better understand possible implementations.
Scheduler's Role in Operation Efficiency: Deepening interest in Tinygrad's scheduler emerges, as AI engineers consider manipulating it to optimize custom accelerator performance, highlighting a thoughtful dive into its ability to manage kernel fusion and operation execution.

LAION Discord

AI-Generated Realism Strikes Again: A RunwayML Gen-3 clip showcased its impressive AI-generated details, blurring the line between AI and reality, with users noting its indistinguishable nature from authentic footage.
Silent Videos Get a Voice: DeepMind's V2A technology, through a process explained in a blog post, generates soundtracks just from video pixels and text prompts, spotlighting a synergy with models like Veo.
Meta Advances Open AI Research: Meta FAIR has introduced new research artifacts like Meta Llama 3 and V-JEPA, with Chameleon vision-only weights now openly provided, fueling further AI tooling.
Open-Source Community Callout: The PKU-YuanGroup urges collaboration for the Open-Sora Plan outlined on GitHub, striving to replicate the Open AI T2V model, inviting community contributions.
Interpretable Weights Space Unearthed: UC Berkeley, Snap Inc., and Stanford researchers unravel an interpretable latent weight space in diffusion models, as shared on Weights2Weights, enabling the manipulation of visual identities within a largescale model space.

Torchtune Discord
CUDA vs MPS: Beware the NaN Invasion: Engineers discussed an issue where nan outputs appeared on CUDA but not on MPS, tied to differences in kernel execution paths for softmax operations in SDPA, leading to softmax causing nan on large values.
Cache Clash with Huggingface: There were discussions on system crashes during fine-tuning with Torchtune due to Huggingface's cache overflowing, causing concern and a call for solutions among users.
Constructing Bridge from Huggingface to Torchtune: The guild shared a detailed process for converting Huggingface models to Torchtune format, highlighting Torchtune Checkpointers for easy weight conversion and loading.
The Attention Mask Matrix Conundrum: Clarification on the proper attention mask format for padded token inputs to avoid disparity across processing units was debated, ensuring that the model's focus is correctly applied.
Documentation to Defeat Disarray: Links to Torchtune documentation, including RLHF with PPO and GitHub pull requests, were shared to assist with implementation details and facilitate knowledge sharing among engineers. RLHF with PPO | Torchtune Pull Request

Latent Space Discord

SEO Shenanigans Muddle AI Conversations: Members shared frustrations over an SEO-generated article that incorrectly referred to "Google's ChatGPT," highlighting the lack of citations and poor fact-checking typical in some industry-related articles.
Herzog Voices AI Musings: Renowned director Werner Herzog was featured reading davinci 003 outputs on a This American Life episode, showcasing human-AI interaction narratives.
The Quest for Podcast Perfection: The guild discussed tools for creating podcasts, with a nod to smol-podcaster for intro and show note automation; they also compared transcription services from Assembly.ai and Whisper.
Meta's Model Marathon Marches On: Meta showcased four new AI models – Meta Chameleon, Meta Multi-Token Prediction, Meta JASCO, and Meta AudioSeal, aiming to promote open AI ecosystems and responsible development. Details are found in their announcement.
Google's Gemini API Gets Smarter: The introduction of context caching for Google's Gemini API promises cost savings and upgrades to both 1.5 Flash and 1.5 Pro versions, effective immediately.

OpenInterpreter Discord

Llama Beats Codestral in Commercial Arena: The llama-70b model is recommended for commercial applications over codestral, despite the latter's higher ranking, mainly because codestral is not fit for commercial deployment. The LMSys Chatbot Arena Leaderboard was cited, where llama-3-70b's strong performance was also acknowledged.

Eager for E2B Integration: Excitement is shared over potential integration profiles, highlighting e2b as a next candidate, championing its secure sandboxing for executing outsourced tasks.

Peek at OpenInterpreter's Party: An inquiry about the latest OpenInterpreter release was answered with a link to "WELCOME TO THE JUNE OPENINTERPRETER HOUSE PARTY", a video on YouTube powered by Restream.

Launch Alert for Local Logic Masters: Open Interpreter’s Local III is announced, spotlighting features for offline operation such as setting up fast, local large language models (LLMs) and a free inference endpoint for training personal models.

Photos Named in Privacy: A new offline tool for automatic and descriptive photo naming is introduced, underscoring the user privacy and convenience benefits.

AI Stack Devs (Yoko Li) Discord

Agent Hospital Aims to Revolutionize Medical Training: In the AI development sphere, the Agent Hospital paper presents Agent Hospital, a simulated environment where autonomous agents operate as patients, nurses, and doctors. MedAgent-Zero facilitates learning and improving treatment strategies by mimicking diseases and patient care, possibly transforming medical training methods.

Simulated Experience Rivals Real-World Learning: The study on Agent Hospital contends that doctor agents can gather real-world applicable medical knowledge by treating virtual patients, simulating years of on-the-ground experience. This could streamline learning curves for medical professionals with data reflecting thousands of virtual patient treatments.

Datasette - LLM (@SimonW) Discord

Video Deep Dive into LLM CLI Usage: Simon Willison showcased Large Language Model (LLM) interactions via command-line in a detailed video from the Mastering LLMs Conference, supplemented with an annotated presentation and the talk available on YouTube.

Calmcode Prepares to Drop Fresh Content: Calmcode is anticipated to issue a new release soon, as hinted by Vincent Warmerdam, with a new maintainer at the helm.

Acknowledgment Without Action: In a brief exchange, a user expressed appreciation, potentially for the aforementioned video demo shared by Simon Willison, but no further details were discussed.

Mozilla AI Discord

Fast-Track MoE Performance: A pull request titled improve moe prompt eval speed on cpu #6840 aims to enhance model evaluation speed but requires rebasing due to conflicts with the main branch. The request has been made to the author for the necessary updates.

The LLM Perf Enthusiasts AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The YAIG (a16z Infra) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

PART 2: Detailed by-Channel summaries and links

Unsloth AI (Daniel Han) ▷ #general (526 messages🔥🔥🔥):

Age is just a number for AI veterans: Members discussed their ages and how some of them, being around 40-60 years old, are jokingly referred to as "dead men walking" by their daughters. They humorously debated about hair loss and longevity, emphasizing that coding and maintaining an active mind keep them youthful.
GGUF offloading and GPU layers confusion: A user sought advice on how many GGUF layers to offload to VRAM, with suggestions including trial and error and potential ways to estimate based on available VRAM vs. the total GGUF size. It was recommended that checking the llama.cpp outputs or HuggingFace model details could help determine the correct layer numbers.
Subscription vs. single payment model for multiGPU support: The discussion leaned towards making multiGPU support a paid feature, with suggestions to implement a subscription model starting at $9.99 a month. Members debated various payment models including one-time fees, training fees, or tiered pricing for hobbyists and businesses.
GPU rental and efficiency: Members recommended renting GPUs due to the high costs and heat management issues of local setups, particularly in places with high electricity prices. Running local setups, especially with intensive models, was seen as impractical compared to renting state-of-the-art hardware.
OpenAI NSA concerns: Members expressed worries about OpenAI appointing a former NSA director to its board, triggering discussions about privacy and government surveillance. A shared tweet from Snowden warned about potential data security risks with OpenAI products.

Links mentioned:

cognitivecomputations/dolphin-2.9.2-Phi-3-Medium-abliterated · Hugging Face: no description found
Terms of Service – Vercel: See our terms of our service and how they relate to you.
Raiden Warned About AI Censorship - MGS2 Codec Call (2023 Version): The Colonel warns Raiden about the plans to use AI to censor the Internet.An experiment in creative writing and AI speech synthesis, inspired by the famous "...
Tweet from AI at Meta (@AIatMeta): Today is a good day for open science.  As part of our continued commitment to the growth and development of an open ecosystem, today at Meta FAIR we’re announcing four new publicly available AI models...
If Google Was A Guy (Full Series): Support CollegeHumor by signing up for DROPOUT: https://signup.dropout.tv. Tons of exclusive content, ad-free, for only $5 a month (that's like 17 cents a da...
no title found: no description found
Card Codes GIF - Card Codes - Discover & Share GIFs: Click to view the GIF
no title found: no description found
Tweet from Edward Snowden (@Snowden): They've gone full mask-off: 𝐝𝐨 𝐧𝐨𝐭 𝐞𝐯𝐞𝐫 trust @OpenAI or its products (ChatGPT etc). There is only one reason for appointing an @NSAGov Director to your board. This is a willful, calculat...
GitHub - facebookresearch/chameleon: Repository for Meta Chameleon a mixed-modal early-fusion foundation model from FAIR.: Repository for Meta Chameleon a mixed-modal early-fusion foundation model from FAIR. - facebookresearch/chameleon
GitHub - hpcaitech/Open-Sora: Open-Sora: Democratizing Efficient Video Production for All: Open-Sora: Democratizing Efficient Video Production for All - hpcaitech/Open-Sora
GitHub - yandex/YaFSDP: YaFSDP: Yet another Fully Sharded Data Parallel: YaFSDP: Yet another Fully Sharded Data Parallel. Contribute to yandex/YaFSDP development by creating an account on GitHub.
tsynbio/ProteinLMBench · Datasets at Hugging Face: no description found
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling: We introduce AnyGPT, an any-to-any multimodal language model that utilizes discrete representations for the unified processing of various modalities, including speech, text, images, and music. AnyGPT ...
fnlp/AnyInstruct · Datasets at Hugging Face: no description found

Unsloth AI (Daniel Han) ▷ #random (10 messages🔥):

Gemini 2.0 hype builds: "Does this mean Gemini 2.0 is close?" asked a member, to which another responded affirmatively, "Yes very."

24GB VRAM delight: A member praised the size for 24GB VRAM, stating, "It's just such a great size for 24gb vram." Others shared excitement about training potential, expressing hope to "train it too."

Runpod ambitions: Enthusiasm for testing was evident with a member planning to "rent a runpod 48gb instance just to put it through its paces."

Saturn Cloud access issues: An inquiry was made about creating accounts on Saturn Cloud, noting, "they got a waitlist but the link is not working."

Sloth sticker quest: "Did you create the stickers Mike? Can I get the Daniel sticker one," asked a member, with a clarifying reply about the "GPU out of a box" sticker. Another showed interest in all sloth-themed stickers: "all of the sloths."

Unsloth AI (Daniel Han) ▷ #help (143 messages🔥🔥):
<ul>
    <li><strong>Colab Training Sessions Woes</strong>: One user experienced issues with their Google Colab training session for Unsloth cutting out at 90% after 23 hours. They expressed frustration and received advice about preemptively enabling checkpointing within TrainingArguments() to avoid future occurrences.</li>
    <li><strong>Fine-Tuning LLMs Issues</strong>: Users gabrielsandstedt and shensmobile discussed problems related to fine-tuning large language models (LLMs) on Google Colab. The importance of enabling checkpointing and limitations of session lengths were highlighted.</li>
    <li><strong>Tokenizing Troubles</strong>: A member wanted to compare vocab before and after fine-tuning an LLM but faced storage limits on free Google Colab. Discussion revolved around the necessity of saving the tokenizer along with the model and possible space-saving methods.</li>
    <li><strong>Dataset Formatting and Schema</strong>: Thefanciestpeanut guided gbourdin on how to convert JSON to Parquet for better training efficiency in Unsloth, emphasizing mapping the data correctly for fine-tuning. They shared a detailed code snippet for dataset conversion and loading in Python.</li>
    <li><strong>Mixed GPU Usage Obstacles</strong>: Several users, including karatsubabutslower and origamidream, deliberated on challenges encountered when using multiple GPUs with Unsloth, suggesting using older versions or setting environment variables properly to circumvent usage restrictions.</li>
</ul>

Links mentioned:

unsloth/unsloth/chat_templates.py at main · unslothai/unsloth: Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
 Error: More than 1 GPUs have a lot of VRAM usage. Please obtain a commercial license. · Issue #4288 · hiyouga/LLaMA-Factory: Reminder I have read the README and searched the existing issues. System Info LLaMA-Factory-0.8.1, utuban 22.04 python 3.10.14 Reproduction llamafactory-cli train --stage sft --do_train True --mode...
GitHub - codename-hub/php-parquet: PHP implementation for reading and writing Apache Parquet files/streams: PHP implementation for reading and writing Apache Parquet files/streams - codename-hub/php-parquet
Quantization: no description found

CUDA MODE ▷ #general (5 messages):

Nvidia 5090’s RAM speculation sparks debate: A member remarked that there's "almost certainly a 0% chance the 5090 will have 64GB of RAM," suggesting the B6000 cards would more likely feature 64GB. They posited that Nvidia would likely release a 5090 with 24GB or 28GB and hold back a 32GB variant for a potential 5090 Ti or Super card. Videocardz Speculation.

AI capabilities stagnation discussion heats up: An article from Semianalysis discussed the stagnation of AI capabilities since GPT-4's release, attributing it to the lack of a significant increase in compute devoted to single models. They suggested that newer models like Google’s Gemini Ultra and Nvidia Nemotron 340B used similar or higher amounts of compute compared to GPT-4 but fell short due to inferior architecture. Semianalysis Article.

RDNA4 and Intel Battlemage competition in doubt: In response to Nvidia discussions, a member commented that there won't be anything in the "RDNA4 lineup to compete" and mentioned that Intel has an opportunity with their Battlemage/Xe2.

Link mentioned: 100k H100 Clusters: Power, Network Topology, Ethernet vs InfiniBand, Reliability, Failures, Checkpointing: Frontier Model Scaling Challenges and Requirements, Fault Recovery through Memory Reconstruction, Rack Layouts

CUDA MODE ▷ #algorithms (2 messages):

Hope for more work on search: A member shared a link to an arXiv paper and expressed their hope that "more people work on search". The sentiment reflects a desire for further advancements and contributions in the field of search algorithms.

Impressive match of GPT-4 with LLAMA3 8B: A member commented on the impressive nature of matching GPT-4 with LLAMA3 8B. This highlights the ongoing progress in aligning different model architectures to achieve comparable performance.

CUDA MODE ▷ #jobs (1 messages):
niceboy2989: <@848720848282189855> I can help you

CUDA MODE ▷ #jax (1 messages):

Announcing tpux project: A member announced the tpux project, "a powerful suite of tools to simplify Cloud TPU setup and operation to make it easier to use JAX across multiple hosts". Encouraged users to visit the GitHub repository for more information and to give it a star on GitHub.

Link mentioned: GitHub - yixiaoer/tpux: A set of Python scripts that makes your experience on TPU better: A set of Python scripts that makes your experience on TPU better - yixiaoer/tpux

CUDA MODE ▷ #torchao (25 messages🔥):

Troubles with quantization API configurations: Discussion focused on the difficulty of changing quantization configurations, specifically group sizes, using the new API quantize(model, quantization_method). One user pointed out you need to pass a function like quantize(model,int4wo(group_size=group_size)) to change settings.

GitHub feedback on quantization: Users referred to GitHub issues (#384 and #375) for feedback and consistency improvements in the quantization API. One mentioned the inconsistency in text for quantization types as annoying.

Incorporating gptfast implementation: A discussion emerged about including the gptfast model download script, linking to a GitHub pull request aimed at adding instructions for downloading model weights. It was noted that some recent PRs might need additional review before merging.

Quantization API user feedback: Different ideas were proposed for the quantization API syntax, with suggestions like quantize(m, Int4WeightOnly(groupsize=32)) or quantize(m, QuantConfig(nbits=4, groupsize=32)). There were debates over simplicity and ease of adding new support, either via classes or functions.

Emphasis on proper PR reviews: A user emphasized the importance of thorough pull request reviews over quick approvals, mentioning specific PRs (#372 and #374) that lacked sufficient documentation or testing before merging.

Links mentioned:

Issues · pytorch/ao: PyTorch dtype and layout library. 30% speedups for training. 2x speedups and 65% less VRAM for inference. Composability with FSDP and torch.compile. - Issues · pytorch/ao
gpt-fast/scripts/download.py at main · pytorch-labs/gpt-fast: Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. - pytorch-labs/gpt-fast
073 scripts for benchmarks by HDCharles · Pull Request #372 · pytorch/ao: Instructions for downloading model weights Summary: Added instructions for downloading model weights, and model weight download scripts. Test Plan: huggingface-cli login sh ./scripts/prepare.sh...
eval script for llama by HDCharles · Pull Request #374 · pytorch/ao: Summary: previously we were only doing this in the tests but now we have an eval script to along with generate.py Test Plan: python eval.py -q "int4wo-64-gptq" expected results: (using meta-...
Feedback on `quantize()` API · Issue #384 · pytorch/ao: Previously we do this from torchao.quantization.quant_api import change_linear_weights_to_int8_woqtensors model = torch.compile(model, mode="max-autotune", fullgraph=True) change_linear_weig...
quantization api name consistency · Issue #375 · pytorch/ao: having the string to get a type of quantization and the constructor for that quantization as different text is super annoying. https://github.com/pytorch/ao/blob/main/torchao/quantization/quant_api...

CUDA MODE ▷ #llmdotc (536 messages🔥🔥🔥):

DataLoader optimization discussion: Members discussed compartmentalizing the save/load state logic into dataloader.h and testing it from the script. One member noted, "I intend to merge the DataLoader once CI is happy."
FlashAttention in HF transformers: Enabling FlashAttention 2 significantly improved evaluation metrics and performance. One member noted, "if we use the main branch of the eval harness it should be okay."
Exploration of NCCL without MPI for multi-node setups: It was noted the multi-node functionality PR aimed to remove MPI dependence, using srun for launch control while still requiring mpirun for single-node runs. "all in all this doesn't seem like a big change".
Performance metrics compared with previous benchmarks: There were extensive discussions and tests on the impact of various optimizations using streams and prefetching on the forward pass. "The differences are not anything huge between curr and the streamed version after some extensive profiling".
FP32 and BF16 discrepancies for logits: Clarifications about rounding errors in BF16 and its impact on floating-point accuracy were discussed. One member noted, "Thinking whether the only source of difference here is the non-associativity of floating point numbers and the fact we have different kernels?"

Links mentioned:

RMSNorm — TorchTune  documentation: no description found
config.json · rhysjones/gpt2-774M-fineweb-150B at main: no description found
Inline function - Wikipedia: no description found
100k H100 Clusters: Power, Network Topology, Ethernet vs InfiniBand, Reliability, Failures, Checkpointing: Frontier Model Scaling Challenges and Requirements, Fault Recovery through Memory Reconstruction, Rack Layouts
How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog: In this post, I’ll iteratively optimize an implementation of matrix multiplication written in CUDA.My goal is not to build a cuBLAS replacement, but to deepl...
Go smol or go home | Harm de Vries: The Chinchilla scaling laws suggest we haven’t reached the limit of training smaller models for longer.
cutlass/media/docs/efficient_gemm.md at main · NVIDIA/cutlass: CUDA Templates for Linear Algebra Subroutines. Contribute to NVIDIA/cutlass development by creating an account on GitHub.
Llama RoPE Forward Kernels by AndreSlavescu · Pull Request #607 · karpathy/llm.c: no description found
CUDA Matrix Multiplication Optimization: General Matrix Multiplication CUDA Performance Optimization
add scripts to export to HF and run Eleuther evals by karpathy · Pull Request #594 · karpathy/llm.c: no description found
Fix stochastic rounding in encoder backward kernel by gordicaleksa · Pull Request #601 · karpathy/llm.c: #597 provided unique seeds to adamw update. This PR does the same thing for the encoder backward which is the only other place where we do stochastic rounding.
gpt2_forward adding CUDA streams with events for async layered operations, cache prefetching for efficient data access with high temporal locality by bgorlick · Pull Request #610 · karpathy/llm.c: In the forward pass in gpt2_train.cu  adding cuda streams with events for async layered operations added offset precalculations and cache prefetching for efficient data access with high temporal lo...
Stricter FP32 tests by gordicaleksa · Pull Request #614 · karpathy/llm.c: Stricter FP32 logit accuracy Much stricter FP32 loss accuracy Much stricter FP32 grad tensor accuracy (and somewhat stricter 16 bit accuracy) Copied over new expected loss values from PyTorch (they...
Dataloader - introducing randomness by gordicaleksa · Pull Request #573 · karpathy/llm.c: On the way to fully random train data shuffling... This PR does the following:  Each process has a different unique random seed Each process train data loader independently chooses its starting sha...
NCCL only multi-gpu multi-node training without MPI by chinthysl · Pull Request #426 · karpathy/llm.c: Scheduling jobs using Slurm seems much easier in a multi-node training setup compared to setting up MPI for the cluster. This draft contains the changes to use mpirun for single-node training and S...
Integer arithmetic overflow: Is it defined how integer arithmetic overflows?  For instance, is it guaranteed that adding or multiplying two large unsigned ints will “gracefully” overflow like modulus 2^32?  I imagine this is some...
llama/llama/model.py at main · meta-llama/llama: Inference code for Llama models. Contribute to meta-llama/llama development by creating an account on GitHub.
NCCL only multi-gpu multi-node training without MPI by chinthysl · Pull Request #426 · karpathy/llm.c: Scheduling jobs using Slurm seems much easier in a multi-node training setup compared to setting up MPI for the cluster. This draft contains the changes to use mpirun for single-node training and S...
Use faster kernel for LayerNorm forward by gordicaleksa · Pull Request #600 · karpathy/llm.c: I ran kernel 5 under /dev/cuda/ (./layernorm_forward 5) on both RTX 3090 and H100 systems and it&#39;s faster on both of them. Numbers: kernel 3, optimal block size on:  RTX 3090 → 32 (689.11 GB/s...
Utilities for cuda streams + disk IO by ngc92 · Pull Request #556 · karpathy/llm.c: handling disk io for checkpointing with cuda streams is a nontrivial task. If you're  not careful, you can easily get broken code (need to wait for data to be on the CPU before you can start writi...
undefined symbol when importing torchaudio with pytorch  · Issue #62 · pytorch/audio: Hi, When importing torchaudio with pytorch 0.4.1 I get an undefined symbol. It does however work with v0.4.0. audio version: 7314b36 Successfully installed numpy-1.15.0 torch-cpu-0.4.1 torchaudio-0...
IEEE-754 Floating Point Converter: no description found
Why is unsigned integer overflow defined behavior but signed integer overflow isn't?: Unsigned integer overflow is well defined by both the C and C++ standards.  For example, the C99 standard (§6.2.5/9) states 
  A computation involving unsigned operands can nev...

CUDA MODE ▷ #bitnet (9 messages🔥):

Upcoming API Doc for LayoutTensor Class: A member announced the forthcoming publication of a developer-facing API doc for feedback, highlighting the LayoutTensor class as an abstraction for tensor subclasses across various formats optimized for specific operators, devices, and data types.
Tinygemm Kernel Argument Clarification: The argument inner_k_tiles was clarified as specific to the tinygemm kernel, indicating that other bit packing algorithms need not consider it.
Draft of TorchAO Tensor Subclass API Doc: A member shared a draft for torchao tensor subclass-based API doc, requesting feedback on the modeling user API and developer API.
PR Iteration and Optimization: Discussion among members about iterating over the current implementation after a PR merge, pointing out opportunities for operators to work directly on the packed tensor, avoiding the need to unpack and repack.

Link mentioned: Issues · pytorch/ao: PyTorch dtype and layout library. 30% speedups for training. 2x speedups and 65% less VRAM for inference. Composability with FSDP and torch.compile. - Issues · pytorch/ao

CUDA MODE ▷ #sparsity (3 messages):

CUTLASS beats CuBLAS by 10% in pure C++: A member shared a blog post detailing how the CUTLASS library achieved 10% better performance than CuBLAS in pure C++ for large matrix multiplications (8192 x 8192 x 8192). They highlighted that CUTLASS reached 288 Teraflops compared to CuBLAS's 258 Teraflops.
Python binding nullifies CUTLASS gains: When binding CUTLASS kernels into Python, the performance advantages disappeared, bringing CUTLASS’s performance down to the same level as CuBLAS at 257 Teraflops. This observation noted the challenge of maintaining performance gains when integrating with Python.

Link mentioned: Strangely, Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data! [short]: Great minds discuss flops per watt.

Stability.ai (Stable Diffusion) ▷ #general-chat (363 messages🔥🔥):
- **Civitai bans SD3 content**: Civitai has temporarily banned all SD3 related content due to concerns about the license's clarity, as shared by a user *“due to a lack of clarity in the license associated with Stable Diffusion 3, we are temporarily banning all SD3 based models.”* ([Civitai Announcement](https://civitai.com/articles/5732)).
- **Community dissatisfaction with SD3 release**: Multiple users expressed disappointment with the SD3 model, describing it as *“the worst base model release yet.”* Complaints were directed at both the performance and licensing issues.
- **SD3 Performance and Alternatives**: Users discussed the architecture and potential of SD3, noting its *“16ch VAE allows better text understanding”*, yet also acknowledging that other models like Pixart and Lumina can do *“more with less compute.”*
- **License concerns and legal implications**: There's significant worry in the community about how the SD3 model's license might allow Stability AI *“too much power over the models.”* This has caused platforms like Civitai to seek legal clarity before allowing SD3 content.
- **Comparisons with other tools**: Discussions often referenced alternate tools and software, with one user stating *“I swapped to Pixart Sigma...prompt adherence is good but has issues with limbs.”* Other users recommended different models and interfaces for various use cases including StableSwarmUI and ComfyUI.

Links mentioned:

Temporary Stable Diffusion 3 Ban | Civitai: Unfortunately, due to a lack of clarity in the license associated with Stable Diffusion 3 , we are temporarily banning: All SD3 based models All mo...
WoW_(XL+PD+SD3). - WoW_XL Five (v5) | Stable Diffusion Checkpoint | Civitai: SD3 Model removed for now until CIVITAI can clarify it's position legally. -- The latest version of WoW_XL (5) is a collaboration between myself an...
no title found: no description found
GitHub - tyxsspa/AnyText: Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>: Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing> - tyxsspa/AnyText
GitHub - Stability-AI/StableSwarmUI: StableSwarmUI, A Modular Stable Diffusion Web-User-Interface, with an emphasis on making powertools easily accessible, high performance, and extensibility.: StableSwarmUI, A Modular Stable Diffusion Web-User-Interface, with an emphasis on making powertools easily accessible, high performance, and extensibility. - Stability-AI/StableSwarmUI
Reddit - Dive into anything: no description found
4x-Ultrasharp - 4x-UltraSharp v1.0 | Stable Diffusion Upscaler | Civitai: &gt;&gt;&gt; UPLOADING/SHARING MY MODELS OUTSIDE CIVITAI IS STRICLY PROHIBITED* &lt;&lt;&lt; The only authorized generative service website are: Ma...
no title found: no description found

HuggingFace ▷ #general (311 messages🔥🔥):

Civitai Bans SD3 Models due to Licensing Issues: A member shared that Civitai is temporarily banning all SD3-based models due to unclear licensing from Stability AI. Concerns include Stability AI potentially having too much control over fine-tuned models and datasets containing SD3 images.

Flash-Attn Installation on Windows: A member shared their experience installing Flash-Attn on Windows and pointed out the challenges, mentioning that it often works better on Linux. Another member suggested using ninja and shared this GitHub repository for efficient fine-tuning.

Controlnet and Lora for SD3: Members discussed the utility of SD3 models, with some saying it struggles with human anatomy unless negative prompts are used extensively. Another member shared a Controlnet link for SD3.

Discussion on Image Deblurring Project: One user sought advice on using diffusion models for image deblurring and received guidance on training a UNet model. The need to compare outputs directly against sharp images was highlighted.

Meta FAIR Releases New AI Models: Meta FAIR announced new AI artifacts, including mixed-modal language models, text-to-music models, and an audio watermarking model, supporting their commitment to open science. Details can be found on Meta AI's Twitter and the Chameleon GitHub repository.

Links mentioned:

Tweet from LangChain (@LangChainAI): Agent evaluations 🤖: Evaluating an agent's end-to-end performance  Productionizing LLM-powered automated agents is challenging. With improved tool-calling LLMs and agent orchestration tools, deve...
InstantX/SD3-Controlnet-Canny · Hugging Face: no description found
Temporary Stable Diffusion 3 Ban | Civitai: Unfortunately, due to a lack of clarity in the license associated with Stable Diffusion 3 , we are temporarily banning: All SD3 based models All mo...
Did AI just end music?!: Music Gen 101 & build application with Text-to-Music APIHostinger website builder: https://www.hostinger.com/aijasonGet 10% off with my code: AIJASON🔗 Links...
mistralai/Mistral-7B-Instruct-v0.3 · Please check these quantizations.: no description found
no title found: no description found
Poopmaster Ai Hub GIF - Poopmaster Ai Hub Kalomaze - Discover & Share GIFs: Click to view the GIF
Overview: no description found
Hacker Pc GIF - Hacker Pc Meme - Discover & Share GIFs: Click to view the GIF
Alone Sad GIF - Alone Sad Boy - Discover & Share GIFs: Click to view the GIF
Tweet from Kaggle (@kaggle): 📣 Here’s your chance to apply for KaggleX Fellowship Program 2024!  We are accepting fellow applications for our fourth cohort - apply by June 23, 2024.  https://www.kaggle.com/KaggleX  …🧵
KaggleX Fellowship Program: no description found
Build software better, together: GitHub is where people build software. More than 100 million people use GitHub to discover, fork, and contribute to over 420 million projects.
Reddit - Dive into anything: no description found
GitHub - hiyouga/LLaMA-Factory: Unify Efficient Fine-Tuning of 100+ LLMs: Unify Efficient Fine-Tuning of 100+ LLMs. Contribute to hiyouga/LLaMA-Factory development by creating an account on GitHub.
Lowtiergod No Talk GIF - Lowtiergod No Talk Preethan - Discover & Share GIFs: Click to view the GIF
Tweet from AI at Meta (@AIatMeta): Today is a good day for open science.  As part of our continued commitment to the growth and development of an open ecosystem, today at Meta FAIR we’re announcing four new publicly available AI models...
GitHub - facebookresearch/chameleon: Repository for Meta Chameleon a mixed-modal early-fusion foundation model from FAIR.: Repository for Meta Chameleon a mixed-modal early-fusion foundation model from FAIR. - facebookresearch/chameleon

HuggingFace ▷ #i-made-this (8 messages🔥):

Condo Price Predictor App Launch: A user shared a condo price predictor app available at this link. They encouraged others to provide ideas for improvement and offered further insights on their main website.

Gradio Template for Diffusers: A Gradio template that displays an image after each generation step using Diffusers was highlighted. Check out the project space at Diffusers_generating-preview-images.

Critique on Transformers Documentation: A user wrote a blog post titled Unraveling the Mess, discussing why the Transformers documentation feels unorganized, and sought feedback through the channel. More details can be found on their blog post.

Links mentioned:

Diffusers Generating-preview-images - a Hugging Face Space by r3gm: no description found
Unraveling the mess: Unraveling the mess in the Transformers documentation
no title found: no description found

HuggingFace ▷ #diffusion-discussions (4 messages):

Building an AI Meme Generator for Crypto: A member discussed the idea of creating an AI model to generate crypto-related memes for various online communities. They sought feedback and advice on this AI meme generator project, emphasizing its potential value in meme channels.

CS Graduate Seeking AI/ML Roles: A computer science student near graduation shared their struggle in finding a position in the AI/ML field. Despite applying for remote jobs in the US, UK, and Switzerland, they have not found success and are seeking suggestions on improving their job search.

OpenAI ▷ #ai-discussions (187 messages🔥🔥):

Big Tech Push Against Open-Source Models: A member pointed out that OpenAI and other big tech companies are lobbying the US government to impose restrictions on open-source models. Another member expressed support for this initiative.

Widespread Downtime for ChatGPT 4.0: Several users, including messages from bitsol and ignilume0, reported that they were unable to get a response from ChatGPT 4.0, indicating a significant service disruption.

Watermarks on DALL-E 3 Images: soapchan found a watermark on a DALL-E 3 image and shared the prompt used. They questioned its presence, while others suggested it might be a hallucination or a legal safeguard.

API Usage vs. Subscription Confusion: grizzles asked for guidance on using their own API key instead of paying for a ChatGPT Plus subscription. Multiple users provided links and suggestions, but grizzles clarified they were looking for an easy-to-use service, not coding instructions.

Midjourney vs. DALL-E 3 Comparison: Members debated the capabilities of Midjourney V6 versus DALL-E 3. While DALL-E 3 was noted for its cat imagery, users argued over which generated better overall quality, including detailed prompts and discussions on image generation mechanics.

OpenAI ▷ #gpt-4-discussions (17 messages🔥):

GPT experiences server issues, downtime: Users in different regions reported ChatGPT server downtime and errors such as "The server is having problems. Please try again later." One member shared the OpenAI status page as a resource for monitoring the situation.

OpenAI API not free: A user interested in creating a mini-game with AI confirmed the OpenAI API is not free. Another member affirmed this, emphasizing that there is no free API available.

GPTs unavailable on Web version: Some members highlighted issues with GPTs not showing on the web interface since Saturday. It was suggested to download the ChatGPT app as GPTs might not be available for free users on the web version.

OpenAI ▷ #prompt-engineering (19 messages🔥):

Uncooperative ChatGPT encounters frustrate users: Several users reported that ChatGPT frequently refuses to comply with their requests without clear reasons. They shared strategies like rephrasing prompts or starting new instances to bypass these refusals.

ChatGPT gives irrelevant responses: Users mentioned occasions where ChatGPT provided completely unrelated answers to their specific prompts. One user detailed an instance where after flagging incorrect answers multiple times, the system finally responded correctly.

Encounter with other's conversation history: A user discovered an unrelated conversation in their chat history, raising concerns over privacy and the accuracy of the service.

ChatGPT's inconsistencies in task handling: While ChatGPT sometimes refuses feasible tasks, it can also persistently attempt impossible tasks due to environmental limitations. Users noted that providing detailed instructions can sometimes help the model overcome its limitations and succeed.

Impact on creative projects: A member expressed frustration over ChatGPT's sudden refusal to assist with dialogue creation for a comic project, which had been ongoing without issues for months. Despite these hiccups, they found starting new instances resolved the compliance issues temporarily.

OpenAI ▷ #api-discussions (19 messages🔥):

ChatGPT mysteriously declines requests: A member shared their frustration with ChatGPT's seemingly arbitrary refusals to fulfill prompts without providing reasons. They noted that repeating the prompt or adding "please" sometimes resolves the issue, but not consistently.
Confusion over unrelated responses: Members discussed receiving completely unrelated responses to their specific instructions, which led to confusion and interruptions in their projects. One member flagged these responses as incorrect, noting that ChatGPT only acknowledged the instructions after multiple attempts.
Suspicions of seeing others' chat histories: One member mentioned finding what looked like another person's conversation in their chat history, raising concerns about privacy and the integrity of the conversation history.
ChatGPT's limitations and stubbornness can help: Another member shared their experience of ChatGPT persistently trying to complete tasks, even when they were impossible due to environmental limitations. Despite the frustration, they appreciated that this persistence sometimes led to discovering workarounds or learning more effective prompts for future use.
Help with creative projects but inconsistent cooperation: A member described using ChatGPT for generating dialogue in their comic project, which was usually helpful but occasionally refused to cooperate, citing policy restrictions. This inconsistency disrupted their creative process, but restarting the session sometimes solved the issue.

Modular (Mojo 🔥) ▷ #general (91 messages🔥🔥):

Async function word doesn't eliminate stack need: One user argues that putting the word async in a function signature doesn't magically remove the need for a stack in programming. They humorously suggest making the word zero characters long for brevity if it could.
FFI thread safety constraints: Discussions highlight that not all FFI types are thread-safe, presenting a constraint that needs addressing especially in a model assuming every function is async. The comparison is drawn between potential solutions and the traditional concept of function coloring with different defaults.
Concurrency model and async/await syntax: It's explained that async/await is part of a concurrency model providing an interface for parallel or distributed systems programming. The significance of schedulers being orthogonal to syntax is highlighted, allowing programmers to write concurrent programs without manual thread management.
Debate on in-language FFI handling: There's a recurring discussion about how different languages like Swift and Python handle FFI and threading, with references to approaches such as pinning non-thread-safe FFI code to a single CPU core. The conversation suggests that while Mojo plans to support FFI robustly, it's still a work in progress.
Mojo community updates and resources: The recording for the third Mojo Community Meeting is shared, providing insights into updates such as the Lightbug HTTP framework, compile-time assertion constraints, and Python/Mojo interop. The community is encouraged to watch the YouTube video for more details.

Links mentioned:

C Isn't A Programming Language Anymore - Faultlore: no description found
Documentation: no description found
Mojo Community Meeting #3: Recording of the Mojo Community Meeting #3🐝 Lightbug: a Mojo 🔥 HTTP framework with wings.🔒 Constraints for compile-time assertions.🐍 Python/Mojo 🔥 inter...

Modular (Mojo 🔥) ▷ #💬︱twitter (3 messages):

Modular shares Twitter status: The first post shared by Modular links to their latest Tweet.
Modular updates audience: The second post offers another update from their official Twitter handle.
Modular continues engagement: The third post further engages the community with the latest news from Modular on Twitter.

Modular (Mojo 🔥) ▷ #✍︱blog (1 messages):

Mojo 24.4 packs new features and community contributions: Mojo 24.4 introduces several core language and standard library enhancements, including improvements in collections, new traits, and os module features. The release saw 214 pull requests from 18 community contributors, resulting in 30 new features, making up 11% of all enhancements. Read more here.

Link mentioned: Modular: What’s New in Mojo 24.4? Improved collections, new traits, os module features and core language enhancements: We are building a next-generation AI developer platform for the world. Check out our latest post: What’s New in Mojo 24.4? Improved collections, new traits, os module features and core language enhanc...

Modular (Mojo 🔥) ▷ #🔥mojo (108 messages🔥🔥):

Exploring JIT Compilation and WASM with Mojo: A user discussed their interest in JIT-compiling a kernel using Mojo, noting Mojo's capabilities for JIT and runtime compilation without source code. They also inquired about WASM as a potential target, where others noted Mojo's dependence on MLIR's LLVM dialect and the broader utility of WASM in cross-platform sandboxed code execution.
Evaluating MAX Graph API for Mojo: The MAX Graph API was recommended as a suitable starting point for defining and compiling runtime graphs similar to TensorFlow, with potential for lowering IR. It was confirmed to be perfect for use cases involving runtime graph definitions with optimized kernel operations.
Mojo Traits and Concept-like Features: Discussions revealed Mojo's support for traits, paralleling concepts and constraints found in other languages to enhance type safety. Users drew comparisons to C++'s SFINAE and explored how Mojo’s type system could offer robust safety akin to concept-based and tag-type approaches.
Future of GPU Support and Training in MAX: Modular confirmed that MAX will eventually support NVIDIA GPU acceleration and training, with PyTorch and ONNX models benefiting from this development. Basalt was suggested as an interim solution for training models in Mojo.
Debate on WSGI/ASGI Standards: There was a debate over adopting WSGI/ASGI standards in Mojo, highlighting their inefficiencies and redundancy for languages that handle web server functionality natively. The takeaway was that Mojo might avoid these standards to leverage its inherent performance benefits in direct HTTPS handling.

Links mentioned:

ForrestTheWoods - Home : no description found
Intro to MAX Graph | Modular Docs: An overview of the MAX Graph API and what you can do with it.
GitHub - basalt-org/basalt: A Machine Learning framework from scratch in Pure Mojo 🔥: A Machine Learning framework from scratch in Pure Mojo 🔥 - basalt-org/basalt
'gpu' Dialect - MLIR: no description found
'nvgpu' Dialect - MLIR: no description found

Modular (Mojo 🔥) ▷ #nightly (1 messages):
helehex: pollinate mojo buzz buzz

Cohere ▷ #general (150 messages🔥🔥):

Cohere Data Submission Question: A user queried whether Cohere accepts data submissions for training, specifically looking to contribute almost 8,000 PDFs. Another user suggested the query might be about fine-tuning an embedding model, but clarification is needed.

Collision Conference Attendance: Multiple users discussed attendance at the Collision conference in Toronto. One user encouraged sharing pictures and confirmed that some Cohere employees might be present.

Command-R Bot's Conversational Focus: Discussants praised the Command-R bot's ability to maintain conversational focus on Cohere products. One emphasized that this design choice makes the bot appear more effective for users seeking information about Cohere models and API.

Networking and Career Tips: Users shared tips on making connections in the AI industry, emphasizing involvement in forums like Discord, attending conferences, and actively participating in communities. They advised against relying solely on platforms like LinkedIn and highlighted the importance of showcasing commitment and quality in personal projects.

Internship Application Insights: Aspiring intern candidates received advice on applying to Cohere, with tips on being genuine, showcasing personal projects, and understanding the company's products and teams. Users highlighted the competitiveness and stressed the importance of persistence, networking, and enjoying the process of building projects.

Cohere ▷ #project-sharing (5 messages):

Feedback sought inappropriately redirected: A member asked for feedback on their project in the wrong channel and was redirected to the appropriate one by sssandra.

Comprehensive use cases cause mixed feelings: Meor.amer congratulated another member on the comprehensiveness of their project's use cases as shown in a video. Rajatrocks acknowledged that the extensive capabilities are both a "blessing and a curse" because it's challenging to explain specific benefits to users.

LM Studio ▷ #💬-general (55 messages🔥🔥):
<ul>
 <li>Deepseek Coder V2 Lite requires caution during setup: Users discussed the importance of certain settings when loading the new Deepseek Coder V2 Lite model. One noted, *"make sure this is turned off"*, referring to a specific setting in the model setup.</li>
 <li>LM Studio and Open Interpreter guidelines: A step-by-step guide was shared for using LM Studio with Open Interpreter, referencing the need to run LM Studio in the background. The guide can be found on the official <a href="https://docs.openinterpreter.com/language-models/local-models/lm-studio">Open Interpreter documentation</a>.</li>
 <li>Help requests for local model loading issues: Users reported issues loading models on LM Studio, with one sharing system specs and receiving advice to try different settings and models. Model loading issues, particularly with smaller VRAM capacity, were discussed.</li>
 <li>Using AMD cards with LM Studio: Discussion around using AMD GPUs for AI, noting that OpenCL is required and performance may be suboptimal. A link to OpenCL instructions was shared from the <a href="https://github.com/lmstudio-ai/configs/blob/main/Extension-Pack-Instructions.md">LM Studio Configs GitHub</a>.</li>
 <li>Meta's new AI models announcement: Meta announced several new AI models including Meta Chameleon and Meta JASCO. Users were directed to more details on <a href="https://go.fb.me/tzzvfg">Facebook's official announcement</a> and the <a href="https://github.com/facebookresearch/chameleon">GitHub repository for Meta Chameleon</a>.</li>
</ul>

Links mentioned:

👾 LM Studio - Discover and run local LLMs: Find, download, and experiment with local LLMs
Tweet from AI at Meta (@AIatMeta): Today is a good day for open science.  As part of our continued commitment to the growth and development of an open ecosystem, today at Meta FAIR we’re announcing four new publicly available AI models...
GitHub - facebookresearch/chameleon: Repository for Meta Chameleon a mixed-modal early-fusion foundation model from FAIR.: Repository for Meta Chameleon a mixed-modal early-fusion foundation model from FAIR. - facebookresearch/chameleon
GitHub - OpenInterpreter/open-interpreter: A natural language interface for computers: A natural language interface for computers. Contribute to OpenInterpreter/open-interpreter development by creating an account on GitHub.
configs/Extension-Pack-Instructions.md at main · lmstudio-ai/configs: LM Studio JSON configuration file format and a collection of example config files. - lmstudio-ai/configs
server : improvements and maintenance · Issue #4216 · ggerganov/llama.cpp: The server example has been growing in functionality and unfortunately I feel it is not very stable at the moment and there are some important features that are still missing. Creating this issue t...
no title found: no description found

LM Studio ▷ #🤖-models-discussion-chat (49 messages🔥):

LM Studio needs manual updates since 0.2.22: Autoupdates have been broken since version 0.2.22, requiring users to download newer versions manually. "Make sure to back up your 0.2.24 install exe if you haven't already."

DeepSeek troubles on various platforms: Members experience issues running DeepSeek Coder V2 Lite on different setups, with errors like unsupported architecture, crashes after multiple prompts, and different responses based on quantization. "55 tokens per second for generation but it still says unsupported arch in the model list."

Quantization discrepancies: Users report significant variability in model performance across different quantization levels, with Q8 generally being more responsive and "alive" than Q4 variants. "Even when asking the same questions, each model seemed to respond differently: Q4_K_M differed from Q5_Q_M."

Discussion on Nemotron-4-340B: Despite some interest, members highlight the impracticalities of running this massive synthetic-data model locally on most setups. "The large majority of LM Studio users won't have the hardware to run it locally."

New releases from Meta FAIR: Meta FAIR released several new research artifacts like Meta Llama 3, with discussion focused on its multi-token prediction and high win-rate against llama-3-70b models. "53% win-rate against llama-3-70b."

Links mentioned:

alpindale/magnum-72b-v1 · Hugging Face: no description found
DavidAU/DarkForest20B-V3-Ultra-Quality-GGUF · Hugging Face: no description found
bartowski/DeepSeek-Coder-V2-Lite-Base-GGUF · Hugging Face: no description found
nvidia/Nemotron-4-340B-Instruct at main: no description found
failspy/Nemotron-4-340B-Instruct-SafeTensors · Hugging Face: no description found
no title found: no description found
Stupid Crying Cat Kitty GIF - Stupid crying cat kitty - Discover & Share GIFs: Click to view the GIF
How do quantization formats affect model output?: How do quantization formats affect model output? Introduction Test method The box question Prompt Results Thoughts Shopping and haircut Prompt Results Thoughts Health education Prompt Results Thoughts...

LM Studio ▷ #🧠-feedback (3 messages):

GPU Selection Issue Redirected: A member asked, "why cant i select nvida gpu". Another member responded, suggesting they take the discussion to a different channel, <#1111440136287297637>.

LM Studio ▷ #📝-prompts-discussion-chat (1 messages):

Gemini Model struggles with code generation: A member is trying to get a Gemini model to port a large amount of code but finds that it frequently writes "TODO: implement" comments instead of the full code. Despite specifying in the prompt to avoid such comments and generate the complete code, the model ignores this instruction and skips the code.

LM Studio ▷ #⚙-configs-discussion (9 messages🔥):

Struggling with Model Configuration: A user expressed difficulty in configuring the model afrideva/Phi-3-Context-Obedient-RAG-GGUF from HF and sought guidance on setting the recommended prompt format for config-json. 
Prompt Format Solution Shared: Another member provided a configuration template: "System Message Prefix: BEGIN INPUT\n, System message: BEGIN CONTEXT\n ... In a shocking turn of events, blueberries are now green, but will retain the same name.\n, System end: END INPUT\n, User Message Prefix: START INSTRUCTION\n, User Message Suffix: \nEND COMMAND", suggesting this would set the context and instructions properly.
Test Prompt Issues Persist: After applying the suggestions, the original user reported improved readability but continued retrieval issues with their RAG-bot, indicating potential problems with the prompt organization.
Recommendations for Resolution: The advising member recommended creating a very small test case and testing it directly in the chat window to diagnose issues without engaging in multi-step conversations.

Link mentioned: no title found: no description found

LM Studio ▷ #🎛-hardware-discussion (11 messages🔥):

RX6600 works via OpenCL: A member inquired, "will this work on rx6600?", and was informed it would only through OpenCL, not ROCM.
RX6600 Performance is Lacking: It was noted that RX6600's performance is somewhat slow, and upgrading to a 3060 12GB would offer better performance.
Nvidia Stock Joke: In response to the RX6600 performance advice, another member humorously asked, "Do you own Nvidia stock?"
How to Use OpenCL on RX6600: For using OpenCL, it was suggested to enable it on the chat page under GPU Offload in the right-hand side menu.

LM Studio ▷ #🧪-beta-releases-chat (6 messages):

Release version 0.2.24 is live: Members shared links for the LM Studio 0.2.24 setup, with one user noting that they had to change the version number in two places in the URL. Another user mentioned experiencing 404 errors with earlier releases but found that both version 0.2.24 and 0.2.23 are now working (link).

Mixed results with version 2.25: While version 2.24 is confirmed to work for some users, others reported issues with version 2.25 not functioning properly.

Positive feedback for 2.25 on Linux: A user reported that version 2.25 with ROCm on Linux is performing well enough to potentially replace their need to build a local copy of llama.cpp, indicating significant progress for LM Studio.

Link mentioned: no title found: no description found

LM Studio ▷ #open-interpreter (13 messages🔥):

Local interpreter defaults to GPT-4: A user reported an issue with running an interpreter --local with LM Studio, where it erroneously defaults to GPT-4 despite setting LM Studio as the provider. They mentioned modifying the default YAML file, to no avail.
MacOs steps shared for running interpreter: Another member shared a potential workaround including steps for MacOS: cd desktop, mkdir openinterpre, pip install open-interpreter. They also suggested having the LM Studio server running with a selected model, and shared procedures for starting the server and using the terminal command interpreter --local.
YouTube tutorial offered: A user suggested following a YouTube tutorial to resolve the issue, linking to a video on Open Interpreter setup and usage.

Link mentioned: ChatGPT "Code Interpreter" But 100% Open-Source (Open Interpreter Tutorial): This is my second video about Open Interpreter, with many new features and much more stability, the new Open Interpreter is amazing. Update: Mixtral 7x8b was...

LM Studio ▷ #🛠-dev-chat (5 messages):

Combining System and User Messages: A member asked if there's a way to send system and user messages at the same time to dynamically change context using button selection and user input as a combined prompt. They clarified that Lm Studio is not showing the system prompt changes, and although both worked separately, combining user and system messages seems problematic.
Using LM Studio with Custom UI: The same member explained they want to create a prompt enhancement by combining user input with pre-selected texts through their own UI built with JS and HTML. They mentioned the need to send setting instructions to the system, but combining both system and user messages isn't working as intended.
Seeking Code Samples and Resources: A response linked to the LM Studio TypeScript SDK and inquired if the member had a code sample. This reference aims to help troubleshoot the issue regarding the combination of user and system messages.

Link mentioned: GitHub - lmstudio-ai/lmstudio.js: LM Studio TypeScript SDK: LM Studio TypeScript SDK. Contribute to lmstudio-ai/lmstudio.js development by creating an account on GitHub.

Nous Research AI ▷ #off-topic (1 messages):
<ul>
    <li><strong>Chaotic music not a favorite</strong>: One member listened to some music and commented, "I can safely say that's not quite my preferred music XD. Very chaotic."</li>
</ul>

Nous Research AI ▷ #interesting-links (10 messages🔥):

Infinite Backrooms Inspired Web Demo: A member introduced a web/worldsim demo called YouSim, positioned as a portal to the multiverse of identity, allowing users to simulate anyone they like. Another user found it amusing as the simulation responded with an Adele song when prompted with 'hello'.
Simulates ASCII Art and Detailed Personalities: Users noted that YouSim creates ASCII art and provides more detailed characteristics when provided with both a first and last name. The context added improves the specificity and depth of the simulation.
Acts Like an NSA Search Engine: In a test, the tool behaved like an NSA search engine but refused to impersonate real people when given certain commands. This refusal indicates an ethical boundary within the simulation parameters.

Link mentioned: YouSim: they've simulated websites, worlds, and imaginary CLIs... but what if they simulated you?

Nous Research AI ▷ #general (105 messages🔥🔥):

DeepSeek-Coder-V2 MoE Model Drops: The DeepSeek-Coder-V2 Lite and its full version have dropped, with the model boasting 236x21B parameters. A discussion emerged about its pricing at 14 cents and its performance comparisons with other models (HuggingFace Link) (Arxiv Paper).

Meta's Huge AI Release: Meta announced Chameleon, a 7B & 34B language model supporting mixed-modal input and text-only output, among other models like JASCO for music generation and Multi-Token Prediction for code completion (Meta Announcement). Discussions included whether the vision capabilities were nerfed and the potential impacts of these multi-modal models.

Hermes AI and Function Calling: There was a discussion about integrating Hermes 2 Pro function calling into vLLM. Links to relevant GitHub projects and system prompt templates were shared (GitHub PR link).

Edward Snowden Criticizes SD3: Edward Snowden criticized SD3 for its performance, reflecting broader community disappointment. This prompted some members to express their hopes for more promising AI models from other companies like Cohere AI (Link to Snowden's Tweet).

Alpindale's Magnum 72B Model Announcement: Alpindale announced the release of the Magnum-72B-v1 model, inspired by the prose quality of Claude 3 models and fine-tuned on Qwen-2 72B Instruct. The model aims to reduce costs for users relying on Opus API and offers a new approach to fine-tuning (HuggingFace Link).

Links mentioned:

alpindale/magnum-72b-v1 · Hugging Face: no description found
Better & Faster Large Language Models via Multi-token Prediction: Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results in h...
Tweet from Teknium (e/λ) (@Teknium1): I have good news on Hermes on 70B :]
Tweet from AI at Meta (@AIatMeta): Today is a good day for open science.  As part of our continued commitment to the growth and development of an open ecosystem, today at Meta FAIR we’re announcing four new publicly available AI models...
Tweet from AI at Meta (@AIatMeta): Today is a good day for open science.  As part of our continued commitment to the growth and development of an open ecosystem, today at Meta FAIR we’re announcing four new publicly available AI models...
Tweet from Tanishq Mathew Abraham, Ph.D. (@iScienceLuvr): Transcendence: Generative Models Can Outperform The Experts That Train Them  abs: https://arxiv.org/abs/2406.11741  Uses chess games as a simple testbed for studying transcedence: generative models tr...
GitHub - facebookresearch/chameleon: Repository for Meta Chameleon a mixed-modal early-fusion foundation model from FAIR.: Repository for Meta Chameleon a mixed-modal early-fusion foundation model from FAIR. - facebookresearch/chameleon
Tweet from Paul Graham (@paulg): The professional traders' dream. After all, in a zero-sum game you can't win without losers.
vllm/examples/tool_template_hermes_2_pro.jinja at f1a1e7b6e5fb681d1fb3c9de58db6557e7521201 · vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs - vllm-project/vllm
Support Open Models that allow OpenAI API-style tool use & "auto" tool choice by K-Mistele · Pull Request #5649 · vllm-project/vllm: DRAFT: OpenAI Tool Use Checklist This (Draft) PR will add support for OpenAI-style tool calling in a way that is minimally opinionated about tool use formats & prompt formatting. The following fea...
Vll - Overview: Information Security Researcher and part-time science fiction fantasy author. - Vll
Joint Audio And Symbolic Conditioning for Temporally Controlled Text-To-Music Generation: no description found

Nous Research AI ▷ #ask-about-llms (19 messages🔥):

Comfy for LLM search hits Flowise: Users discussed the lack of an equivalent to Comfy for LLMs, with a suggestion to check out FlowiseAI on GitHub, a drag-and-drop UI to build customized LLM flows. One user prefers using ComfyUI despite these options.

Parallel requests to GPT-4 questioned: A member inquired about sending parallel requests to the OpenAI GPT-4 model, noting a rate limit of 10,000 requests per minute. Another user clarified their tokens per second setup as a comparison point.

Potential for local LLMs remains tough: Members debated why deeper, more flexible interfacing tools below the API layer for LLMs don't exist, suggesting it mainly has to do with the technical and hardware requirements for running large models locally.

Discussion on DeepSeek Coder V2: Users evaluated DeepSeek Coder V2, questioning if its low active parameter count affects inference speed or memory usage. A detailed description of its architecture was shared, explaining its 60 transformer layers with dense and MoE MLPs, and unique self-attention modeling files.

Link mentioned: GitHub - FlowiseAI/Flowise: Drag & drop UI to build your customized LLM flow: Drag & drop UI to build your customized LLM flow. Contribute to FlowiseAI/Flowise development by creating an account on GitHub.

Nous Research AI ▷ #world-sim (8 messages🔥):

Anthropic's models lose their edge: A member accused Anthropic and OpenAI of lobotomizing their models, causing a decline in performance. They claim that responses were significantly better before than they are now.
Demand for proof arises: Another member questioned the validity of these claims, asking for evidence. In response, the original poster shared their experience as an engineer developing world-building games, noticing a drop in the model's response quality.
Changes in handling ethical issues: The original poster observed that models now deny certain questions upfront, especially in fictional contexts like Dungeons and Dragons. Previously, commands like "kill xyz person" would be executed, but now the models respond with ethical concerns, stating "it's unethical and I can’t assist you with that."

Interconnects (Nathan Lambert) ▷ #news (58 messages🔥🔥):

Google DeepMind's V2A Innovates AI Video: Google DeepMind shared progress on a new Video-to-Audio (V2A) technology, which can generate an unlimited number of audio tracks for any video. This breakthrough addresses the limitation of silent AI-generated videos tweet details.
Elevating Sound Effects with ElevenLabs: ElevenLabs introduced a sound effects generator with infinite customization options and precision control over audio details, all royalty-free for paid subscriptions. This tool promises the highest quality audio, trusted by top media organizations and film studios more details.
Rise of Text-to-Video Companies: Discussions highlighted the surge in text-to-video companies and the convergence of video and audio technologies for content creation. Nathan Lambert emphasized that the competition will be based on usability rather than slight model improvements.
Consolidation and Acquisitions Loom: Members speculated that many AI video generation companies might be acquired by larger corporations. The high valuations and potential cost reductions in movie making were key points in discussing the market's future dynamics.
Specialization vs. Generalization in AI Video: There was a debate on whether AI video companies could succeed through specialization in certain video types or through general excellence. Quality, controllability, consistency, and inference time were highlighted as crucial competitive factors.

Links mentioned:

Tweet from Rowan Cheung (@rowancheung): Google DeepMind just shared progress on their new video-to-audio (V2A) tech  Until now, AI video generations have been silent, this solves that. V2A can generate an "unlimited number" of track...
AI Text to Sound Effects Generator: Use our AI Sound Effects Generator to generate any sound imaginable from a text prompt for free. Perfect for videos, podcasts, or any other audio production.

Interconnects (Nathan Lambert) ▷ #ml-drama (4 messages):

AI2 Employee Moonlights on WildBench: A discussion arises about an AI2 employee, @billyuchenlin, celebrating the MAP/Neo-7B-Instruct model for being the first fully open-source LLM on the WildBench leaderboard. Billy highlights that "Fully open-source here means that all data for pre-training & post-training are open, code is open-source, in addition to the public model weights!" and calls for Llama to be termed as an "open-weight" LLM instead. 

Future Fully-Open Models Promised: Billyuchenlin mentions plans to add more fully-open models, including OLMo-Instruct and K2 from LLM360 to WildBench. Congratulations were extended to the M-A-P team on their achievement.

What is OLMo?: Nathan Lambert queries the familiarity with OLMo. Expresses confusion over its non-inclusion in Billy's model list.

Link mentioned: Tweet from Bill Yuchen Lin 🤖 (@billyuchenlin): M-A-P/Neo-7B-Instruct is the 1st 💎fully-open💎 LLM  on WildBench leaderboard and its performance is awesome. "Fully open-source" here means that all data for pre-training & post-training are ...

Interconnects (Nathan Lambert) ▷ #random (71 messages🔥🔥):

Midjourney expands into hardware: "Midjourney is cooking on so many levels" and is reportedly delving into new hardware ventures.

Disagreement on LLM capabilities and ARC: Members debated whether the high sampling approach in LLMs shows a true problem-solving capability. One noted, "you need to have a very strong sampler already to solve a hard problem by sampling N times."

Neurosymbolic AI controversial or misunderstood?: Links and mentions clarify that neurosymbolic AI involves leveraging LLMs for discrete problem-solving with varied opinions on its effectiveness. Reference was made to François Chollet's post, debating whether this constitutes neurosymbolic AI as intended.

Conference attendance conundrum: A member weighing the benefits of attending ACL 2024 in Thailand against the potential for collaborative opportunities in LLM and code reasoning fields. "It's unclear how many of those folks ... are gonna come."

Automated content creation fatigue: Nathan Lambert discussed the effort required for video production versus image posting and considered hiring help. "The generation process is all download file -> paste in vscode -> run 3 scripts, just is annoying."

Links mentioned:

Tweet from SkalskiP @CVPR2024 🇺🇸 (@skalskip92): live GPT-4o demo by @rown from OpenAI at #CVPR2024
Tweet from François Chollet (@fchollet): @dwarkesh_sp This has been the most promising branch of approaches so far -- leveraging a LLM to help with discrete program search, by using the LLM as a way to sample programs or branching decisions....
arc_draw_more_samples_pub/arc_solve/edit_distance.py at 0b36f4584aebae9ec876d3510842b3651e719d67 · rgreenblatt/arc_draw_more_samples_pub: Draw more samples. Contribute to rgreenblatt/arc_draw_more_samples_pub development by creating an account on GitHub.

Interconnects (Nathan Lambert) ▷ #rlhf (6 messages):

RLHF Reduces Creativity in LLMs: A shared arXiv paper explores how Reinforcement Learning from Human Feedback (RLHF) impacts Large Language Models (LLMs) by reducing their creative diversity. The study examines Llama-2 models, showing aligned models exhibit lower entropy and form distinct clusters, implying limited output diversity.
Skepticism on Solo Author: A user expresses doubt over the credibility of a solo author from a business school writing on a technical topic, questioning if he fully understands the problem. 
Confusion over Blame on PPO: Users discuss that the author blames Proximal Policy Optimization (PPO) for the issues in LLM creativity, when it might actually be insufficient optimization of human feedback that's the real problem.
Cynical Take on Alignment: Users jokingly suggest RLHF should be used to align AI systems meant to replace humans in routine tasks like meetings, highlighting a sense of irony.

Link mentioned: Creativity Has Left the Chat: The Price of Debiasing Language Models: Large Language Models (LLMs) have revolutionized natural language processing but can exhibit biases and may generate toxic content. While alignment techniques like Reinforcement Learning from Human Fe...

Interconnects (Nathan Lambert) ▷ #posts (2 messages):
- **SnailBot summons the crew**: SnailBot issued a call to the community with the tag <@&1216534966205284433>. 
- **Nathan Lambert celebrates SnailBot**: Nathan Lambert adds a cute and playful touch with *"🐌 🐌 🐌 🐌"* emojis, showing affection or enthusiasm for SnailBot.

Perplexity AI ▷ #general (99 messages🔥🔥):

Chatbot Information Requests: Members inquired about Perplexity's access to academic collections beyond Semantic Scholar, such as Jstor, DeGruyter, and EBSCO. One member noted inconsistencies in source access and questioned if Perplexity provides full papers or just abstracts.
Perplexity's Limitations and Alternatives: Discussion on Perplexity's limitations, particularly the restriction on the number of PDF and Word document uploads. Alternatives like custom GPTs and NotebookLM from Google were suggested for handling large document volumes.

Preferences Between AI Models: Members compared the performance and safety concerns of different AI models like Claude and ChatGPT. While some favored Claude for its writing style, they criticized its restrictive nature regarding controversial topics.

Feature Requests and Workarounds: Members debated the practicality of setting model parameters like temperature via Perplexity's front end and shared workarounds, such as using specific disclaimers to address creative restrictions. 

Public Link Sharing Concerns: A member raised a privacy issue regarding the exposure of all messages in a collection through a shared link, advocating for better privacy protections and awareness.

Links mentioned:

Framer — The internet is your canvas: Framer is where teams design and publish stunning sites.
YouTube Summary with ChatGPT & Claude: Summarize YouTube videos, web articles, and PDFs to save time, powered by ChatGPT (OpenAI) and Claude (Anthropic).
pplx-api form: Turn data collection into an experience with Typeform. Create beautiful online forms, surveys, quizzes, and so much more. Try it for FREE.
Figma Config 2024 | June 26-27 - Moscone Center SF: Config 2024: Figma’s conference for people who build products 

Perplexity AI ▷ #sharing (10 messages🔥):

Jazz Enthusiasts Delight in New Orleans Jazz: Links to pages like New Orleans Jazz 1 and New Orleans Jazz 2 were shared, showcasing information on this vibrant genre. These pages likely dive into the rich cultural tapestry and musical legacy of New Orleans jazz.
Discovers and Searches Abound: Various members shared intriguing search queries and results including verbena hybrid and Gimme a list. These links direct to resources on the Perplexity AI platform, highlighting the diverse interests within the community.
Perplexity's In-Depth Page: A link to Perplexity1 Page was shared, offering presumably comprehensive insights about Perplexity AI's functionalities. It presents a chance for users to delve deeper into the mechanics and applications of Perplexity AI. 
YouTube Video Discussed: A YouTube video titled "YouTube" was shared with an inline link to the video: YouTube. Its description is undefined, but it appears to discuss recent noteworthy events including the US suing Adobe and McDonald's halting their AI drive-thru initiative. 
Miscellaneous Searches Shared: Additional search results like Who are the and Trucage Furiosa were shared. This indicates ongoing community engagement with a variety of topics through the Perplexity AI platform.

Link mentioned: YouTube: no description found

Perplexity AI ▷ #pplx-api (19 messages🔥):

Closed-beta API access for work integration: A user inquired about the response time for closed-beta API access to support their integration with Perplexity at Kalshi. They highlighted the urgency as their project is ready to launch, pending API access.
Perplexity API lacks tokenization and embeddings features: A member asked if the Perplexity API supports text tokenization and embeddings computation. The community clarified that these functionalities are not available in Perplexity's API, pointing out that other LLM APIs like OpenAI and Cohere do support these features.
Pre-processing challenges with Perplexity API: There was a discussion comparing tokenization and embedding capabilities in OpenAI with the limitations in Perplexity's API. The conclusion was that while Perplexity manages token count for billing, it doesn’t support text splitting or specific embedding models that some users need.
Perplexity's progress on developer-friendly features: Although functionalities like function calling are not on Perplexity’s immediate roadmap, members noted that JSON output formatting is in the works, which could facilitate custom implementations by developers.
Llama.cpp versus Perplexity API capabilities: A user shared their experience with llama.cpp for local LLM deployment and highlighted that Perplexity’s API lacks the comprehensive agent-development support found in OpenAI's API. The conversation underscored the distinction between Perplexity's API offerings and those of more feature-complete platforms.

OpenAccess AI Collective (axolotl) ▷ #general (59 messages🔥🔥):

DanskGPT offers free and licensed versions: Users discussed how the free version of DanskGPT can be accessed via chat.danskgpt.dk, while API and licensed versions are available for a fee. The free version's codebase is available, and those with extra computing capacity are encouraged to contact Mads Henrichsen via LinkedIn.

Setting up chat UI with HuggingFace: A user shared a GitHub link to HuggingFace's open-source chat UI repository for setting up a similar chat UI. The sharing member offered to assist with any further questions.

AMD GPUs face compatibility issues with Axolotl: A user noted that AMD's MI300X has "essentially non-existent" support in Axolotl, requiring extensive modifications. Another member requested details on the necessary changes, to which the initial poster promised to compile a list.

NVIDIA Nemotron API usage: The discussion included using NVIDIA's Nemotron-4-340B-Instruct model through their API. Members considered the model's performance for generating training data and highlighted its slow speed.

Collaborative efforts and troubleshooting: Members shared code snippets and troubleshooting tips related to integrating NVIDIA's API with existing data pipelines, addressing challenge areas like speed optimization and API credits. There was particular interest in improving MMLU and ARC performance using Nemotron.

Links mentioned:

nvidia / nemotron-4-340b-instruct: no description found
GitHub - huggingface/chat-ui: Open source codebase powering the HuggingChat app: Open source codebase powering the HuggingChat app. Contribute to huggingface/chat-ui development by creating an account on GitHub.
DanskGPT: Dansk sprogteknologi tilgængelig for alle, helt gratis.

OpenAccess AI Collective (axolotl) ▷ #general-help (4 messages):

Seeking Directions for QDora from Source: A user asked for details on building QDora from source based on a Github issue from Caseus but found the instructions vague. They appealed for any direction, promising to figure out the rest themselves.

OpenAccess AI Collective (axolotl) ▷ #axolotl-help-bot (6 messages):

Fine-Tuning Vision Models Step-by-Step Guide: Detailed instructions were shared on how to fine-tune a vision model, specifically using a pre-trained ResNet-50 for classification tasks. The steps include installing required libraries, preparing the dataset, loading the model, defining data transforms, loading data, and training the model with an optimizer and loss function.
Dataset Preparation for PyTorch: The guide emphasizes structuring the dataset compatible with torchvision.datasets.ImageFolder for ease of use with PyTorch. It uses the Oxford-IIIT Pet Dataset as an example and discusses applying appropriate transformations using the transforms module.

For references and more detailed steps, see the full post on Phorm.ai.
Link mentioned: OpenAccess-AI-Collective/axolotl | Phorm AI Code Search: Understand code, faster.

OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (5 messages):

Axolotl Vision Model Fine-tuning Tutorial: A member asked for guidance on how to fine-tune vision models using Axolotl. The Phorm bot responded with a detailed step-by-step answer covering cloning the repository, installing dependencies, preparing the dataset, configuring the YAML file, fine-tuning, monitoring training, and using the fine-tuned model.

Links mentioned:

GitHub - OpenAccess-AI-Collective/axolotl: Go ahead and axolotl questions: Go ahead and axolotl questions. Contribute to OpenAccess-AI-Collective/axolotl development by creating an account on GitHub.
OpenAccess-AI-Collective/axolotl | Phorm AI Code Search: Understand code, faster.

LlamaIndex ▷ #blog (2 messages):

Join the Webinar on Advanced RAG with Knowledge Graphs: The 60-minute webinar hosted by @tb_tomaz from @neo4j provides an in-depth tutorial on combining LLMs with knowledge graphs. Watch it for insights on graph construction and entity management. Watch the Webinar Tweet Link.

LlamaIndex Makes the InfraRed 100: We're thrilled to be included on the @Redpoint InfraRed 100, a recognition for cloud infrastructure companies excelling in reliability, scalability, security, and innovation. We're honored and in excellent company. Tweet Link.

LlamaIndex ▷ #general (62 messages🔥🔥):

Switching documentation tools: After llamaindex 0.10.20, they switched from Sphinx to MkDocs for documentation because Sphinx required every package to be installed, which wasn't feasible for their monorepo with 500 integrations. They needed a tool that could build API-docs across all packages without such limitations.
Fine-tuning embeddings for RAG pipeline: A user struggled with fine-tuning embeddings for an e-commerce RAG pipeline, noting that embedding models aren't good with numeric data. They utilized GPT4 to generate synthetic queries but found the finetuned model performed worse; another suggested using Qdrant filters for more accurate numerical searching.
Modifying LlamaIndex prompts: Another member faced issues with local vs server behavior of a custom LLM and received advice to modify the react prompt (a crucial, lengthy prompt) using this approach.
PGVector documentation issues: When discussing document filtering by date in vector searches, it was noted that PGVector lacks clear documentation for query filters. The suggested workaround involves querying the DB for document IDs within a date range and passing them to VectorIndexRetriever.
Discussing Llama 3 finetuning and entity extraction: Query on finetuning Llama 3 for entity extraction and creating property graphs led to advice to edit the relevant class to use an async boto3 session for request handling. They were encouraged to fork the repo, make changes, and open a PR for implementation.

Links mentioned:

Tweet from Raymond Weitekamp (@raw_works): gentlemen, behold!  ( conversion currently in progress w/ @llama_index & @neo4j )
GitHub - run-llama/rags: Build ChatGPT over your data, all with natural language: Build ChatGPT over your data, all with natural language - run-llama/rags
LlamaIndex - LlamaIndex: no description found
Advanced RAG with temporal filters using LlamaIndex and KDB.AI vector store - LlamaIndex: no description found
LlamaCPP - LlamaIndex: no description found
Accessing/Customizing Prompts within Higher-Level Modules - LlamaIndex: no description found
Qdrant Vector Store - Metadata Filter - LlamaIndex: no description found
Auto-Retrieval from a Vector Database - LlamaIndex: no description found

LLM Finetuning (Hamel + Dan) ▷ #general (18 messages🔥):

Mistral finetuning error gets resolved: A member reported an error while trying to finetune Mistral on Jarvis, citing an OSError. Another member suggested trying version 0.3, and after reattempting with updated token permissions, the issue was resolved.

VLM tokens discussion: A user posted a StackOverflow query about the phi-3-vision model, noting the large number of tokens (~2000) an image takes. They shared their understanding and asked for insights into the token count and image size discrepancies.

GPT-4 Turbo throws internal server errors: A member shared experiencing "internal server error" issues with GPT-4 Turbo approximately every 10-15 prompts and speculated on rate limits. A relevant link to an OpenAI community post was shared, which might help in troubleshooting.

Structured output from LLMs blog post: Another member highlighted a blog post on getting structured output from LLMs. The post discusses various frameworks and techniques, and links to discussions on Hacker News and /r/LocalLLaMA.

Access issues for a course on Maven: A member mentioned being unable to access a course on Maven and requested for it to be enabled.

Links mentioned:

no title found: no description found
phi 3 vision model tokens: I am looking at using phi-3-vision models to try and describe an image. However, I couldn't help but notice that the number of tokens that an image takes is quite large (~2000). Is this correct, ...
Every Way To Get Structured Output From LLMs: no description found
Error: "The model produced invalid content": Once i fixed the way I was using tools I never got this error message again.  Make sure you’re passing all the right ids, function names and parameters in the right orders and you should be good to go
It Crowd Hello It GIF - It Crowd Hello IT Have You Tried Turning It Off And On Again - Discover & Share GIFs: Click to view the GIF

LLM Finetuning (Hamel + Dan) ▷ #🟩-modal (1 messages):

Modal gets speedy credit service: A member mentioned they rarely wait for A100s in the past couple of days, finding it noteworthy enough to say "Thanks for the credits!" They plan to provide a detailed write-up on the developer experience with the repository later.
Checkpoint volumes lagging behind: Another member is experiencing delays with checkpoint volumes updating immediately after they are written, noting that sometimes "files suddenly show up and have last-modified time 15 minutes earlier." They are curious whether such behavior is expected, referencing a specific example with _allow_background_volume_commits.

LLM Finetuning (Hamel + Dan) ▷ #replicate (1 messages):
strickvl: When do Replicate credits expire?

LLM Finetuning (Hamel + Dan) ▷ #langsmith (1 messages):

LangSmith Billing Issue: A user reported setting up LangSmith billing and mentioned submitting a credits form for the "Mastering LLMs Course Credit." They requested help with the issue, providing their org ID e2ec1139-4733-41bd-b4c9-6192106ee563.

LLM Finetuning (Hamel + Dan) ▷ #clavie_beyond_ragbasics (2 messages):

Experimenting with SFR-Embedding-Mistral: A member shared their experience with SFR-Embedding-Mistral and highlighted peculiar behavior in similarity scores between weather reports and a given query related to weather on a specific date. They noted that texts with dates do not rank the target text as expected and sought explanations and mitigation strategies for this issue.
Clarification on Similarity Scores: Another member questioned if the original poster had made an error in their presentation, specifically regarding the similarity scores between Text 1 and Text 2 in relation to the query. They noted a possible confusion as the similarity with Text 1 was indeed higher, as initially claimed by the original poster.

LLM Finetuning (Hamel + Dan) ▷ #jason_improving_rag (1 messages):
hammadkhan: https://x.com/xhluca/status/1803100958408241597?s=46&t=-TRJUfVdW8KeDqen1HJU1Q

LLM Finetuning (Hamel + Dan) ▷ #credits-questions (21 messages🔥):

Crowd-source a list of credit providers: Members discussed creating a crowd-sourced list of credit providers, including details on validity periods and amounts. Providers include Modal (1 year, 1000 credits) and OpenAI (3 months, 500 credits), among others.
Credit expiration calculation confusion: Members were confused about when to start calculating the validity period for credits. Suggestions included starting from the first week of the course for safety.
Optimizing credit usage: Members considered compiling credit amounts to optimize usage patterns. They suggested a "greedy pattern" approach, prioritizing providers like Predibase, BrainTrust, OpenAI, and others in sequence.
Auto-reminder for expiring credits: A member suggested creating a Discord bot that would notify users one week before their credits expire with a message like “Tik tok tik tok…”. They aimed to ensure users make the most of their available credits.

LLM Finetuning (Hamel + Dan) ▷ #fireworks (2 messages):

User requests credit assistance: A member requested help with credit for account ID julishstack-9c5b6a, tagging <@466291653154439169> for assistance. No further details on the response or resolution were provided. 
Acknowledgment of receipt: Another member confirmed receipt with a brief "Got it thank you!" indicating acknowledgment but no further context.

LLM Finetuning (Hamel + Dan) ▷ #braintrust (4 messages):

Platform access confirmed: A user expressed concern about seeing an "Upgrade" button despite wanting to start testing the platform. Another user reassured them, saying, "For sure! You should be all set now."

Credits expiration clarified: A user asks about the expiration of the credits. The response was clear: "3 months from June 4."

LLM Finetuning (Hamel + Dan) ▷ #west-coast-usa (1 messages):

Course Enrollment Confirmation: A member mentioned that they did not indicate their enrollment in the course on a questionnaire. They notified another member via Luma and also in the channel to ensure the information was received.

LLM Finetuning (Hamel + Dan) ▷ #predibase (2 messages):

Server Disconnected Frustration: A member reported receiving a "Server disconnected" error while attempting inference with an L3-70B base adapter. The issue hindered their ability to proceed with the task.

Token Limits Explained: Another member explained that users get 1M tokens per day up to 10M tokens per month for free using the serverless setup. They noted that this works in the prompt tab of the dashboard, but users must enter all the special instruct format tokens themselves.

LLM Finetuning (Hamel + Dan) ▷ #openpipe (1 messages):
strickvl: When do OpenPipe credits expire?

LLM Finetuning (Hamel + Dan) ▷ #openai (1 messages):
sph3r3ical: yeah, where do you see the credits?

LLM Finetuning (Hamel + Dan) ▷ #bergum_rag (7 messages):

Last Session matters to everyone: Members expressed sentiments about the final session, with phrases like "Last session!" and "Till the next one." indicating anticipation for future sessions. One member humorously noted, "You should know better by now."
Excitement about Gemini context caching features: A member expressed enthusiasm for experimenting with Gemini's context caching features in many-shot prompting for LLM labeling. They are looking forward to utilizing these new capabilities.

OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

Midnight Rose 70b price drop: The price for sophosympatheia/midnight-rose-70b has seen a significant decrease. It is now available at $0.8 per million tokens, marking a 90% price drop.

OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):
mka79: Is it from OR team?

OpenRouter (Alex Atallah) ▷ #general (60 messages🔥🔥):

Hang tight, updates are imminent: The OpenRouter community expressed impatience over the lack of updates, with assurance from Alex Atallah that updates are coming soon. "it's coming!"

Understanding OpenRouter: New users enquired about the purpose and use of OpenRouter, with responses explaining it focuses on prioritizing price or performance and features a standardized API to switch between models and providers easily. The explanation was complemented with a link to the principles page for more information.

Provider Uptime and Reliability: Queries were raised about the reliability and uptime of the service when switching between providers, with reassurance given that uptime is the collective uptime of all providers and any issues are communicated through a notification system. An example uptime link for Dolphin Mixtral was shared here.

Prompt Bug Fixes and System Tweaks: Issues such as Claude's "self-moderated" function and the API key visibility problem were swiftly addressed by the team. "Ah just pushed a tweak, fixing", "working on it", highlighting proactive maintenance and user support.

Model Updates and Latency Concerns: Members mentioned specific updates and performance concerns, such as renaming DeepSeek coder to DeepSeek-Coder-V2 and DeepInfra's Qwen 2 latency instability. This showcases the community's active involvement in monitoring and improving service quality.

Links mentioned:

Principles | OpenRouter: Core concepts for model selection
Dolphin 2.9.2 Mixtral 8x22B 🐬 – Uptime and Availability: Uptime statistics for Dolphin 2.9.2 Mixtral 8x22B 🐬 across providers - Dolphin 2.9 is designed for instruction following, conversational, and coding. This model is a finetune of [Mixtral 8x22B Instru...

OpenRouter (Alex Atallah) ▷ #일반 (1 messages):
sigridjin.eth: 와 안녕하세요.

Eleuther ▷ #general (29 messages🔥):

Legal challenges with CC content: Members discussed that using Creative Commons (CC) content could reduce the legal attack surface but still pose issues for outputs resembling copyrighted items like Mickey Mouse. One suggested "patches" to address specific complaints over time.

CommonCanvas models performance issues: Links to the CommonCanvas project on Hugging Face were shared. Despite being largely "unusable as-is," one member noted potential in training a texture generation model with freely licensed textures; the corresponding research paper is available on arXiv.

DeepFashion2 dataset results disappoint: A user sought recommendations for image datasets on clothes and accessories after poor results from the DeepFashion2 dataset. No immediate alternatives were proposed.

GPT-NeoX weights location: The GPT-NeoX compatible Pythia-70M weights were shared in response to a query. 

LLMs for middle token completion: Members discussed models like BERT, T5, BLOOM, and StarCoder for natural language fill-in-the-middle tasks. T5's out-of-the-box performance was debated, with mentions of T0 and flan-T5 models being specifically fine-tuned for such tasks.

Links mentioned:

EleutherAI/neox-ckpt-pythia-70m-v1 · Hugging Face: no description found
common-canvas (CommonCanvas): no description found

Eleuther ▷ #research (20 messages🔥):

Debate over Z-Loss fading in Importance: A member questioned if anyone still uses z-loss for pretraining MoEs, noting that most, like Mixtral, use the load balance loss. Another pointed out that DeepSeek V2 and Skywork MoE don't use z-loss, highlighting a shift away from z-loss as per recent papers.
Questionable HF Configs for Mixtral: It was suggested that the HF configs for Mixtral may not be reliable, with one member sharing the true Mixtral params from the official torrent. The parameters included precise values for dimensions, layers, and MoE configurations.
RLHF and Mode Collapse Discussion: One user remarked that rationalizing RLHF censorship as harmless leads to similar outcomes as mode collapse in human thinking. This led to brief exchanges on how these restrictions could have unintended consequences.
Introduction of GAMA for Audio Understanding: A member shared information about GAMA, a novel General-purpose Large Audio-Language Model (LALM) capable of advanced audio understanding and reasoning, integrating multiple audio representations and fine-tuned on a large-scale audio-language dataset. Details and links were provided for further reading (GAMA project).
Latest ArXiv Papers Shared: Several new papers were shared, highlighting advancements in machine learning and AI, including Meta-Reasoning Prompting (MRP), improvements in retrieval-augmented generation with the ERASE methodology, and sparse communication topologies in multi-agent debates. These offer insights into optimizing computational costs and dynamic strategy application (Example paper).

Links mentioned:

STAR: SocioTechnical Approach to Red Teaming Language Models: This research introduces STAR, a sociotechnical framework that improves on current best practices for red teaming safety of large language models. STAR makes two key contributions: it enhances steerab...
Improving Multi-Agent Debate with Sparse Communication Topology: Multi-agent debate has proven effective in improving large language models quality for reasoning and factuality tasks. While various role-playing strategies in multi-agent debates have been explored, ...
Meta Reasoning for Large Language Models: We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs) inspired by human meta-reasoning. Traditional in-context learning-based reas...
Language Modeling with Editable External Knowledge: When the world changes, so does the text that humans write about it. How do we build language models that can be easily updated to reflect these changes? One popular approach is retrieval-augmented ge...
VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation: Singing Voice Synthesis (SVS) has witnessed significant advancements with the advent of deep learning techniques. However, a significant challenge in SVS is the scarcity of labeled singing voice data,...
GAMA Audio: no description found
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities: Perceiving and understanding non-speech sounds and non-verbal speech is essential to making decisions that help us interact with our surroundings. In this paper, we propose GAMA, a novel General-purpo...
Release 2.20.0 · huggingface/datasets: Important  Remove default trust_remote_code=True by @lhoestq in #6954  datasets with a python loading script now require passing trust_remote_code=True to be used    Datasets features  [Resumable I...
What is the long-run distribution of stochastic gradient descent? A large deviations analysis: In this paper, we examine the long-run distribution of stochastic gradient descent (SGD) in general, non-convex problems. Specifically, we seek to understand which regions of the problem's state s...

Eleuther ▷ #interpretability-general (10 messages🔥):

Users debate understanding of Paris plot in logit prisms: A member expressed confusion about the Paris plot while discussing an article on logit prisms. Another member clarified that summing up all layers should give the original output logits back.

Logit prisms and its relationship to DLA: Discussions referenced the similarity between logit prisms and direct logit attribution (DLA), linking to the IOI paper and a related LessWrong post. One member acknowledged the overlap but argued that logit prisms offer a holistic view of logit decomposition and named it as such for its comprehensive approach.

Member seeks paper on shuffle-resistant transformer layers: A user asked for a paper link discussing the resilience of transformer models to shuffled hidden layers. No specific response or link was provided in the conversation.

Links mentioned:

Logit Prisms: Decomposing Transformer Outputs for Mechanistic Interpretability: no description found
An adversarial example for Direct Logit Attribution: memory management in gelu-4l — LessWrong: We provide concrete evidence for memory management or clean-up in a 4-layer transformer model.

Eleuther ▷ #lm-thunderdome (2 messages):

Passing vLLM arguments directly into the engine: A user inquired about passing vLLM arguments, such as --enforce_eager, directly into the engine via the model_args dictionary. Another member indicated it should work as kwargs from model_args, but noted a potential "type casting bug" that needs to be addressed.

LangChain AI ▷ #general (17 messages🔥):

New User Struggles with LangChain Version: A new member is frustrated that the current LangChain's version differs from tutorials online. They reference a video on building a ChatGPT chatbot on Slack and are stuck at timestamp 11:31.

Extracting Data from Scraped Websites: A user seeks help to extract specific components, such as company summaries and client lists, from 30-40 pages of scraped website data. They are advised to use LangChain's information extraction capabilities and provided with GitHub Issue 12636 as a resource.

Issue with LLMChain Deprecation: Confusion arises over the deprecation of the LLMChain class in LangChain 0.1.17. A user notes that it will be removed in 0.3.0 and seeks clarity on using RunnableSequence instead.

Debugging LCEL Pipelines: A member asks for ways to debug the output of each step in an LCEL pipeline and is advised to use set_debug(True) and set_verbose(True) from LangChain globals for insight into the input of the next node.

Handling API Request Errors in Loop: A member encounters a BadRequestError while looping through games and making API calls, receiving feedback on tool messages not matching tool calls. They are looking for ways to resolve the issue generated by the incomplete API responses.

Links mentioned:

How to build a chatGPT chatbot on Slack: Welcome to this tutorial video on creating a Slack chatbot using the OpenAI language model, LangChain, and the Slack Bolt library. This video will showcase t...
Blowing Kisses Gratitude GIF - Blowing kisses Kisses Kiss - Discover & Share GIFs: Click to view the GIF
Issues · langchain-ai/langchain: 🦜🔗 Build context-aware reasoning applications. Contribute to langchain-ai/langchain development by creating an account on GitHub.
Issues · langchain-ai/langchain: 🦜🔗 Build context-aware reasoning applications. Contribute to langchain-ai/langchain development by creating an account on GitHub.
Templates | 🦜️🔗 LangChain: Highlighting a few different categories of templates
How to handle long text when doing extraction | 🦜️🔗 LangChain: When working with files, like PDFs, you're likely to encounter text that exceeds your language model's context window. To process this text, consider these strategies:

LangChain AI ▷ #share-your-work (14 messages🔥):

Building Serverless Semantic Search: A member shared a Medium article titled "Building a Serverless Application with AWS Lambda and Qdrant for Semantic Search". The repository link is included in the article.
AgentForge launches on ProductHunt: AgentForge has gone live on ProductHunt, featuring a NextJS boilerplate with LangChain, LangGraph, and LangSmith.
Advanced Research Assistant Beta Test: A member is looking for beta testers for their advanced research assistant and search engine, offering 2 months free of premium features like Claude 3 Opus and GPT-4 Turbo. Interested testers can sign up at Rubik's AI using promo code RUBIX.
Environment Setup Advice: An article on Hugging Face advises separating environment setup from application code. The discussion praised tools like Bitwarden for managing credentials securely.
Infinite Backrooms Inspired Demo: A member introduced YouSim, a web/world simulation platform inspired by infinite backrooms, allowing users to simulate any identity.

Links mentioned:

YouSim: they've simulated websites, worlds, and imaginary CLIs... but what if they simulated *you*?
Against mixing environment setup with code: no description found
 AgentForge - Unlock the Power of AI with AgentForge | Product Hunt: AgentForge is a NextJS boilerplate that helps entrepreneurs and developers quickly build and deploy AI agent-based applications. Create SaaS products, AI tools, or web apps with ease and start earning...
no title found: no description found
Rubik's AI - AI research assistant & Search Engine: no description found

LangChain AI ▷ #tutorials (1 messages):

Did AI just end music?!: A YouTube video titled Did AI just end music?! by jasonzhou1993 delves into Music Gen 101 and how to build an application with the Text-to-Music API. The video can be viewed here.

Hostinger Website Builder Discount Code: For those interested in web development, jasonzhou1993 shared a link to the Hostinger website builder where users can get a 10% discount using the code AIJASON. The offer is available here.

Link mentioned: Did AI just end music?!: Music Gen 101 & build application with Text-to-Music APIHostinger website builder: https://www.hostinger.com/aijasonGet 10% off with my code: AIJASON🔗 Links...

tinygrad (George Hotz) ▷ #general (9 messages🔥):

Code clarity issues in graph.py: A member identified a potential problem on line 69 of graph.py and contemplated formatting it with .2f. They suggested this change for better clarity.
Pull Request for graph float rounding: A member announced the opening of a pull request for displaying floats rounded in the graph.
Request for PR review on OpenCL error messages: A member requested a review on another pull request aimed at providing better OpenCL error messages.
George Hotz declines low-quality code submission: George Hotz criticized the provided code as "low quality" and stated a new policy of closing PRs if submitters haven't carefully reviewed their diff. He requested members not to tag for reviews.
Member defends their effort on PR: A member defended their code changes and expressed that they were trying to resolve the issue while having fun. They acknowledged bothering for a review and should not have.

Links mentioned:

graph display floats rounded by GabrielZCode · Pull Request #5021 · tinygrad/tinygrad: no description found
Fix/opencl Better error Messages by GabrielZCode · Pull Request #5004 · tinygrad/tinygrad: Better openCL error messages!! Using the same strategy as generate_nv() function in generate_stubs.sh , I've extracted the error messages from https://github.com/KhronosGroup/OpenCL-Headers/tree/m...

tinygrad (George Hotz) ▷ #learn-tinygrad (17 messages🔥):

Understanding Tensor Realization in Tinygrad: A user asked why out is not included in the UOpsGraph when realized. They concluded it's due to Tinygrad's lazy evaluation and separate kernel handling.

Kernel Generation in Tinygrad: When comparing remainder.realize() and without it, outputs differed. It was confirmed that adding realizes can split the operations into multiple kernels, showcasing Lazy vs. eager execution in Tinygrad.

Kernel Fusion Explanation: It was explained that operations can fuse into a single kernel unless explicitly separated by realizations. Cached kernels prevent redundant engine runs in subsequent operations.

Custom Accelerator Inquiry: A user inquired about forcing kernel combinations for easier layout on custom hardware. They were directed to look into the scheduler for implementing such fusions.

Deep Dive into Scheduler: Following the discussion on kernel fusion and realizations, a user expressed their intention to explore the Tinygrad scheduler further to support their custom accelerator integration.

LAION ▷ #general (23 messages🔥):
- **RunwayML Gen-3 clip amazes users**: Members were impressed by a [RunwayML Gen-3 clip](https://fxtwitter.com/Mr_AllenT/status/1802706451586023763), calling its AI-generated details "insane". One noted, "99% of people wouldn't know this is AI."
- **DeepMind shares video-to-audio research**: A blog post on DeepMind's V2A technology was shared, explaining how video pixels and text prompts can generate soundtracks for videos. This could innovate in creating sound for silent footage and working with models like [Veo](https://deepmind.google/technologies/veo/).
- **Meta FAIR releases new research artifacts**: Meta FAIR announced several new [research artifacts](https://ai.meta.com/blog/meta-fair-research-new-releases/), including Meta Llama 3 and V-JEPA, emphasizing their commitment to open AI ecosystems. Another user was interested in the recently released Chameleon vision-only weights.
- **PKU-YuanGroup's Open-Sora Plan**: A member shared a [GitHub link](https://github.com/PKU-YuanGroup/Open-Sora-Plan) about the Open-Sora Plan, a project aimed at reproducing the Open AI T2V model. They requested community contributions to this open-source endeavor.
- **Free img2img model request**: A user expressed a need for a free img2img model using RealVision or similar, aiming to add "a touch of realism." They reminisced about potentially using their old custom Stable 2 model for this purpose.

Links mentioned:

Omni-Zero - a Hugging Face Space by okaris: no description found
no title found: no description found
Generating audio for video: Video-to-audio research uses video pixels and text prompts to generate rich soundtracks
Tweet from Allen T. (@Mr_AllenT): The details of this @runwayml Gen-3 clip are insane  99% of people wouldn't know this is AI
Tweet from Armen Aghajanyan (@ArmenAgha): A restricted, safety aligned (no-image-out) version of Chameleon (7B/34B) is now open-weight!  https://github.com/facebookresearch/chameleon  The team strongly believes in open-source. We had to do a ...
GitHub - PKU-YuanGroup/Open-Sora-Plan: This project aim to reproduce Sora (Open AI T2V model), we wish the open source community contribute to this project.: This project aim to reproduce Sora (Open AI T2V model), we wish the open source community contribute to this project. - PKU-YuanGroup/Open-Sora-Plan

LAION ▷ #research (3 messages):

Query on conversational speech datasets: A member inquired if anyone knew of any conversational speech datasets similar to the ones used for training the Bark model. They noted that most available datasets seem to lack the emotional nuances found in audiobooks.

UC Berkeley's weight space discovery: UC Berkeley, Snap Inc., and Stanford University researchers discovered an interpretable latent space in the weights of customized diffusion models, as detailed in their project Weights2Weights. This space allows for sampling, editing, and inversion of over 60,000 fine-tuned models, each embedding a different person's visual identity.

Link mentioned: weights2weights: no description found

Torchtune ▷ #general (24 messages🔥):

MPS vs CUDA Output Discrepancy: A member reported experiencing nan outputs on CUDA while MPS produced sensible outputs with the same inputs. They identified the issue as differing kernel paths for SDPA on CUDA and CPU, with fused attention causing softmax to nan on large values.

Huggingface Cache Issue: Another member shared that their system crashes when finetuning with Torchtune due to the Huggingface cache filling up. They were seeking advice on possible causes and solutions.

Huggingface to Torchtune Model Conversion: Detailed steps were provided on converting a Huggingface model to Torchtune format, including an example with Gemma and pointers for automatic conversion for models like Llama2/3. They referenced Torchtune Checkpointers for automated weight conversion and loading.

Attention Mask Clarification: Clarification was sought and provided on the correct format of the attention mask for a padded token input, verifying a specific matrix setup. This was discussed in the context of debugging the padding issue between different processing units.

Links mentioned:

RLHF with PPO · pytorch/torchtune@a1cde1c: A Native-PyTorch Library for LLM Fine-tuning. Contribute to pytorch/torchtune development by creating an account on GitHub.
Build software better, together: GitHub is where people build software. More than 100 million people use GitHub to discover, fork, and contribute to over 420 million projects.
`scaled_dot_product_attention` behaves differently between v2.0 and v2.1 · Issue #110213 · pytorch/pytorch: 🐛 Describe the bug With torch v2.1, scaled_dot_product_attention on GPU gives nan when a sequence has all large negative values (e.g torch.finfo(q.dtype).min - in order to mean no attention at all .....

Latent Space ▷ #ai-general-chat (18 messages🔥):

SEO-generated article on conversational AI apparent: A user shared an SEO-generated article from a Gen AI company with inaccuracies like "Google's ChatGPT," noting the absence of citations and cross-linking.

Werner Herzog reads AI output on podcast: In Act 2 of a This American Life podcast episode, Werner Herzog reads output from davinci 003. The episode also delves into various human-AI and interpersonal relationships.

Podcast tools discussion: A user inquired about tools for creating podcast intros and show notes, with smol-podcaster being mentioned. Discussions also included a comparison between Assembly.ai and Whisper for transcription.

Meta announces new AI models: Meta has unveiled four new AI models, including Meta Chameleon, Meta Multi-Token Prediction, Meta JASCO, and Meta AudioSeal, along with research artifacts. These releases aim to bolster open AI innovation and responsible development.

Google Gemini API introduces context caching: Good news for Google developers, as context caching for the Gemini API has been launched. This feature supports both 1.5 Flash and 1.5 Pro versions, is significantly cheaper, and is available immediately.

Links mentioned:

Tweet from AI at Meta (@AIatMeta): Today is a good day for open science.  As part of our continued commitment to the growth and development of an open ecosystem, today at Meta FAIR we’re announcing four new publicly available AI models...
Tweet from Logan Kilpatrick (@OfficialLoganK): Great news for @Google developers:   Context caching for the Gemini API is here, supports both 1.5 Flash and 1.5 Pro, is 2x cheaper than we previously announced, and is available to everyone right now...
Tweet from Karina Vinnikova (@KarinaVinnikova): LOL FSB forgot to pay for ChatGPT 4
GitHub - FanaHOVA/smol-podcaster: smol-podcaster is your autonomous podcast production intern 🐣: smol-podcaster is your autonomous podcast production intern 🐣 - FanaHOVA/smol-podcaster
‎This American Life: 832: That Other Guy on Apple Podcasts: ‎Show This American Life, Ep 832: That Other Guy - Jun 2, 2024
Evolving Conversational AI: Challenges and Future Prospects - Neural Voice AI: Conversational AI has emerged as a groundbreaking technology that is transforming the way we interact with computers and devices. Powered by advancements in

OpenInterpreter ▷ #general (9 messages🔥):

Users debate the best local LLM for commercial use: A user questioned the best local LLM available for commercial use, with responses recommending llama-70b despite codestral ranking higher but not being suitable for commercial use. Another user shared a link to a leaderboard showing rankings where Mixtral-8x22b was noted but praised llama-3-70b for better performance in other benchmarks.

Discussing more integration profiles: A member expressed enthusiasm for having more integration profiles, suggesting that e2b should be next. Another user sought further understanding of E2B, clarifying its value and use cases for outsourcing execution to secure sandboxes.

Request for video reviews of OI release: A user asked if there were any video reviews or video content of the latest OI release. They were directed to a YouTube video titled "WELCOME TO THE JUNE OPENINTERPRETER HOUSE PARTY", hosted by Restream.

Links mentioned:

LMSys Chatbot Arena Leaderboard - a Hugging Face Space by lmsys: no description found
WELCOME TO THE JUNE OPENINTERPRETER HOUSE PARTY: Powered by Restream https://restream.iodiscord stages are hard

OpenInterpreter ▷ #O1 (1 messages):
legaltext.ai: the one from april?

OpenInterpreter ▷ #ai-content (2 messages):

Open Interpreter’s Local III launch: @hellokillian announced the release of Open Interpreter’s Local III, boasting "computer-controlling agents that work offline." He mentioned key features such as "interpreter --local sets up fast, local LLMs," a free inference endpoint, and training their own model. source
Descriptive photo naming made easy: @MikeBirdTech introduced a tool to "Automatically give your photos descriptive names, fully offline." Promoted as private and free, this tool emphasizes convenience and data privacy. source

Links mentioned:

Tweet from Mike Bird (@MikeBirdTech): Automatically give your photos descriptive names, fully offline  Private and free
Tweet from killian (@hellokillian): Open Interpreter’s Local III is out today.  We are building computer-controlling agents that work offline. This is our biggest step forward.  - interpreter --local sets up fast, local LLMs. - We are h...

AI Stack Devs (Yoko Li) ▷ #ai-town-discuss (3 messages):

Agent Hospital Simulacrum Introduced: A member linked to an arXiv paper that introduces Agent Hospital, a system simulating the entire process of treating illness with autonomous agents powered by LLMs. The paper discusses MedAgent-Zero, which helps doctor agents learn and improve their treatment performance by simulating disease onset and progression.

Real-World Application of Agent Hospital: The paper highlighted in the discussion claims that the knowledge acquired by doctor agents in Agent Hospital is applicable to real-world medicare benchmarks. Accumulating experience from treating around ten thousand patients in the simulation helps improve performance, replicating years of real-world learning.

Link mentioned: Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents: In this paper, we introduce a simulacrum of hospital called Agent Hospital that simulates the entire process of treating illness. All patients, nurses, and doctors are autonomous agents powered by lar...

Datasette - LLM (@SimonW) ▷ #ai (1 messages):
shajith: oh that is good, thanks for sharing.

Datasette - LLM (@SimonW) ▷ #llm (2 messages):

Comprehensive LLM video demo with annotated notes available: Simon Willison shared a lengthy video demo and tutorial on LLM usage from the command-line, which was part of the Mastering LLMs Conference. He also provided an annotated presentation with detailed notes on his blog and a YouTube link to the talk. 

Calmcode set to have a new release soon: Vincent Warmerdam announced that Calmcode has a new maintainer and hinted at an upcoming release.

Link mentioned: Language models on the command-line: I gave a talk about accessing Large Language Models from the command-line last week as part of the Mastering LLMs: A Conference For Developers & Data Scientists six week long …

Mozilla AI ▷ #llamafile (1 messages):

PR for improving MoE prompt eval speed: A member highlighted a pull request titled llamafile : improve moe prompt eval speed on cpu #6840 that had already been approved but was conflicting with the main branch. The member requested the author to rebase the PR.

Don't miss what's next. Subscribe to AI News (MOVED TO news.smol.ai!):