[AINews] Claude Crushes Code - 92% HumanEval and Claude.ai Artifacts
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
Claude 3.5 Sonnet is all you need?
AI News for 6/19/2024-6/20/2024. We checked 7 subreddits, 384 Twitters and 30 Discords (415 channels, and 3577 messages) for you. Estimated reading time saved (at 200wpm): 392 minutes. You can now tag @smol_ai for AINews discussions!
The news of the day is nominally Claude 3.5 Sonnet - ostensibly Anthropic's answer to GPT-4o:
Including claiming SOTA on GPQA, MMLU, and HumanEval:
as well as "surpassing Claude 3 Opus across all standard vision benchmarks".
The model card demonstrates the Opus-level context utilization now extending to Sonnet:
We don't have a ton of technical detail on what drives the changes, but Anthropic is selling this as a Pareto improvement over 3 Sonnet and 3 Opus:
Claude 3.5 Sonnet operates at twice the speed of Claude 3 Opus. This performance boost, combined with cost-effective pricing, makes Claude 3.5 Sonnet ideal for complex tasks such as context-sensitive customer support and orchestrating multi-step workflows.
However, the bigger focus of the messaging, beyond general capability and efficiency improvements, is Claude Sonnet's coding ability:
"Claude is starting to get really good at coding and autonomously fixing pull requests. It's becoming clear that in a year's time, a large percentage of code will be written by LLMs." - Alex Albert
This seems to be backed up by Claude.ai's release of "Artifacts":
a new feature that expands how users can interact with Claude. When a user asks Claude to generate content like code snippets, text documents, or website designs, these Artifacts appear in a dedicated window alongside their conversation. This creates a dynamic workspace where they can see, edit, and build upon Claude’s creations in real-time, seamlessly integrating AI-generated content into their projects and workflows.
This would seem like Anthropic's answer to OpenAI's Code Interpreter or Cognition Labs' Devin.
The Table of Contents and Channel Summaries have been moved to the web version of this email.
AI Twitter Recap
all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.
Claude 3.5 Sonnet Release by Anthropic
- Performance: @alexalbert__ noted Claude 3.5 Sonnet outperforms competitor models on key evaluations, at twice the speed of Claude 3 Opus and one-fifth the cost. It shows marked improvement in grasping nuance, humor, and complex instructions. @AnthropicAI highlighted it now outperforms GPT-4o on several benchmarks like GPQA, MMLU, and HumanEval.
- Artifacts Feature: @AnthropicAI introduced Artifacts, allowing users to generate docs, code, diagrams, graphics, or games that appear next to the chat for real-time iteration. @alexalbert__ noted he's stopped using most simple chart, diagram, and visualization software due to this.
- Coding Capabilities: In Anthropic's internal pull request eval, @alexalbert__ shared Claude 3.5 Sonnet passed 64% of test cases vs 38% for Claude 3 Opus. @alexalbert__ quoted an engineer saying it fixed a bug in an open source library they were using.
- Availability: @AnthropicAI noted it's available for free on claude.ai and the Claude iOS app. Claude Pro and Team subscribers get higher rate limits. Also available via Anthropic API, Amazon Bedrock, Google Cloud's Vertex AI.
Ilya Sutskever's New Company: Safe Super Intelligence (SSI)
- Goal: @ilyasut stated they will pursue safe superintelligence in a straight shot, with one focus, one goal, and one product through revolutionary breakthroughs by a small cracked team.
- Reactions: Some like @bindureddy praised the focus on AGI without obsessing about money. Others like @DavidSHolz compared it to the Yahoo/AOL/pets dot com era of AI. @teortaxesTex speculated this destroys the possibility of a binding USA-China AGI/ASI treaty.
- Funding: @ethanCaballero questioned how SSI will raise $10B in one year or they're "dead on arrival".
AI Benchmarks and Evaluations
- Mixture of Agents (MoA): @corbtt introduced MoA model+FT pipeline that beats GPT-4 but is 25x cheaper. Humans prefer MoA outputs vs GPT-4 59% of the time. New SOTA on Arena-Hard (84.8) and Alpaca Eval (LC 68.4).
- Infinity Instruct: @_philschmid shared this 3M sample deduplicated Instruction dataset. 10M sample version planned for end of June. SFT experiments for Mistral 7B achieve 7.9 on MT Bench, boost MMLU by 6% and HumanEval to 50%.
- τ-bench: @ShunyuYao12 introduced τ-bench at Sierra Platform to evaluate critical agent capabilities omitted by current benchmarks: robustness, complex rule following, human interaction skills.
Memes and Humor
- Meme about Logi AI Prompt Builder on an AI mouse: @nearcyan
- Meme about Yahoo/AOL/pets dot com era of AI: @DavidSHolz
- Encrypted Shakespeare sonnet about Claude 3.5: @AnthropicAI
- Meme about SSI raising $10B in funding: @bindureddy
AI Reddit Recap
Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity. Comment crawling works now but has lots to improve!
AI Companies and Developments
- Dell partnering with NVIDIA on "AI factory": In a tweet, Michael Dell announced Dell is building an "AI factory" with NVIDIA to power "grok for xAI", hinting at a major AI infrastructure initiative between the two tech giants.
- Anthropic's Claude AI demonstrates strong legal reasoning: According to an analysis, Anthropic's Claude AI matched Supreme Court findings in 27 out of 37 cases, showcasing its ability to comprehend and reason about complex legal issues.
- Meta's Chameleon language model training datasets revealed: Model files for Meta's Chameleon AI show it was trained on diverse datasets spanning legal content, code, safety/moderation data and more, providing insight into the knowledge domains Meta prioritized.
AI Capabilities and Benchmarks
- Microsoft open-sources Florence-2 vision models: Microsoft released its Florence-2 vision foundation models under an open-source license, with the models demonstrating strong performance across tasks like visual question answering, object detection, and image captioning.
- LI-DiT-10B claims to outperform DALLE-3 and Stable Diffusion 3: A comparison image suggests the LI-DiT-10B model surpasses DALLE-3 and Stable Diffusion 3 in image-text alignment and generation quality, with a public API planned after further optimization.
- 70B parameter Llama-based story writing model released: DreamGen Opus v1.4, a 70B parameter language model based on Llama 3 and focused on story generation, was released along with a detailed usage guide and example prompts showcasing its creative writing capabilities.
Discussions and Opinions
- Concerns over Stability AI's business prospects: An opinion piece raised questions about the sustainability of Stability AI's business model and future outlook in light of issues with the Stable Diffusion 3 release and other factors.
Memes and Humor
- AI memes touched on the rapid growth of AI startups, poked fun at OpenAI's closed model despite its name, and satirized Stability AI's handling of the Stable Diffusion 3 problems.
- One meme imagined Doc Brown's stunned reaction to AI progress by 2045, nodding to the rapid pace of advancement.
AI Discord Recap
A summary of Summaries of Summaries
1. Model Performance Optimization and Benchmarking
- [Quantization] techniques like AQLM and QuaRot aim to run large language models (LLMs) on individual GPUs while maintaining performance. Example: AQLM project with Llama-3-70b running on RTX3090.
- Efforts to boost transformer efficiency through methods like Dynamic Memory Compression (DMC), potentially improving throughput by up to 370% on H100 GPUs. Example: DMC paper by @p_nawrot.
- Discussions on optimizing CUDA operations like fusing element-wise operations, using the Thrust library's transform for near-bandwidth-saturating performance. Example: Thrust documentation.
- Comparisons of model performance across benchmarks like AlignBench and MT-Bench, with DeepSeek-V2 surpassing GPT-4 in some areas. Example: DeepSeek-V2 announcement.
2. Fine-tuning Challenges and Prompt Engineering Strategies
- Difficulties in retaining fine-tuned data when converting Llama3 models to GGUF format, with a confirmed bug discussed.
- Importance of prompt design and usage of correct templates, including end-of-text tokens, for influencing model performance during fine-tuning and evaluation. Example: Axolotl prompters.py.
- Strategies for prompt engineering like splitting complex tasks into multiple prompts, investigating logit bias for more control. Example: OpenAI logit bias guide.
- Teaching LLMs to use a <RET> token for information retrieval when uncertain, improving performance on infrequent queries. Example: ArXiv paper.
3. Open-Source AI Developments and Collaborations
- Launch of StoryDiffusion, an open-source alternative to Sora with MIT license, though weights not released yet. Example: GitHub repo.
- Release of OpenDevin, an open-source autonomous AI engineer based on Devin by Cognition, with webinar and growing interest on GitHub.
- Calls for collaboration on open-source machine learning paper predicting IPO success, hosted at RicercaMente.
- Community efforts around LlamaIndex integration, with issues faced in Supabase Vectorstore and package imports after updates. Example: llama-hub documentation.
4. Multimodal AI and Generative Modeling Innovations
- Idefics2 8B Chatty focuses on elevated chat interactions, while CodeGemma 1.1 7B refines coding abilities.
- The Phi 3 model brings powerful AI chatbots to browsers via WebGPU.
- Combining Pixart Sigma + SDXL + PAG aims to achieve DALLE-3-level outputs, with potential for further refinement through fine-tuning.
- The open-source IC-Light project focuses on improving image relighting techniques.
5. Misc
- Stable Artisan Brings AI Media Creation to Discord: Stability AI launched Stable Artisan, a Discord bot integrating models like Stable Diffusion 3, Stable Video Diffusion, and Stable Image Core for media generation and editing directly within Discord. The bot sparked discussions about SD3's open-source status and the introduction of Artisan as a paid API service.
PART 1: High level Discord summaries
Unsloth AI (Daniel Han) Discord
Ollama Gets Unslothed: Engineers are keen on the new support for Ollama by Unsloth AI, providing a Colab link for tests and requesting bug reports for early adopters.
Distillation of Distributed Training: Deep dives into distributed data parallelism (DDP) focused on scaling models across multiple GPUs, highlighting the importance of model accuracy and of token and context handling in training.
Anthropic Innovates with Claude 3.5 Sonnet: Anthropic's announcement of Claude 3.5 Sonnet has captured engineers' attention for setting new industry model standards.
CausalLM Confusion Cleared: A slew of messages addressed confusion around causalLM loss calculation during training, comparing it to loss calculation in traditional Masked LM tasks and noting that it is aggregated over the next-word predictions in a sequence.
Deployment Blues and Pretraining Queries: AI engineers discussed practical challenges and solutions in deploying models, such as resolving llama3 library version compatibility using Conda, and strategies for continued pretraining and fine-tuning instruct models, with a helpful discussion found here.
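To make the causalLM versus Masked LM comparison concrete, here is a minimal pure-Python sketch of the two loss calculations. It assumes the textbook definitions (mean cross-entropy over shifted next-token targets for causal LM, mean cross-entropy over masked positions only for Masked LM); the function names are illustrative and not taken from Unsloth or any other library.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one vector of scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def causal_lm_loss(logits, token_ids):
    """Mean next-token cross-entropy.

    logits[t] is the score vector produced after reading token_ids[:t + 1],
    so it is scored against token_ids[t + 1] (the usual one-position shift).
    """
    losses = []
    for t in range(len(token_ids) - 1):
        target = token_ids[t + 1]
        losses.append(-log_softmax(logits[t])[target])
    return sum(losses) / len(losses)

def masked_lm_loss(logits, token_ids, masked_positions):
    """Mean cross-entropy over the masked positions only (no shift)."""
    losses = [-log_softmax(logits[t])[token_ids[t]] for t in masked_positions]
    return sum(losses) / len(losses)
```

The practical difference is which positions contribute: every position except the last in the causal case, and only the masked subset in the Masked LM case.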
OpenAI Discord
- GPT-4o Sparks Engineering Curiosity: Engineers debated GPT-4o's reasoning capabilities, noting its advancement over other models and anticipating its implementation in larger models like the hypothetical GPT-5. Concerns centered around AI's theoretical limits and practical applications, with a specific focus on OpenAI's offerings vis-à-vis competitors like Claude 3.5 and Google’s Gemini.
- Pushing the Boundaries of ASI: Discussions on Artificial Superintelligence (ASI) raised questions about achieving "God-like intelligence" and its ethical implications. Debates oscillated between the concerns over limitations of ASI and the enthusiasm for its unprecedented technological progression.
- Practical Prompt Engineering Woes: Engineers shared frustrations over token usage in OpenAI assistants, with unexpectedly high token counts for simple commands. On the creative side, limitations of DALL-E in generating asymmetrical images prompted suggestions for more diverse descriptive phrases, though members acknowledged limited success.
- Voice of the Engineers: Call for Updates and Alternatives: Users expressed dissatisfaction with stalled updates from OpenAI, such as a voice release from Sam Altman, and discussed chat experiences with Google’s AI Studio, noting Gemini’s superior performance in handling large context windows.
- AI's Practical Limitations in Long Outputs and System Instructions: ChatGPT was highlighted for difficulties in generating reliable long outputs due to its token limitations. Furthermore, reports on GPT-3.5-turbo-0125 sometimes overlooking system instructions led to advice for clearer and simplified directives to ensure compliance.
Stability.ai (Stable Diffusion) Discord
- Stability AI CEO Under the Spotlight: Shan Shan Wong has been confirmed as the CEO of Stability AI. Some members teased about possibly sharing exclusive updates in the future without providing specifics.
- Licensing Woes for AI Artisans: AI-generated images by stabilityai/stable-diffusion-xl-base-1.0 model raised questions on licensing, with members exploring the use of various Creative Commons licenses. The model in question operates under the CreativeML Open RAIL++-M License.
- Art Community Channels Axed: Deletion of the Cascade and other art-related community channels due to inactivity and bot spamming led to a stir among members. A moderator noted that these channels could be restored should the community express a renewed interest.
- Turbo Versus Finetuned Model Showdown: Turbo models were valued for their speed and flexibility among some members, while others advocated for the use of finetuned models, like Juggernaut and Pony, for tasks needing specific detail or conceptual accuracy.
- Introducing Mobius, The Anti-Bias Model: The Mobius model was highlighted as a leader in debiased diffusion models, utilizing a domain-agnostic approach to reduce bias. Questions were raised about its size and requirements, such as clip skip 3, and its LoRA compatibility was discussed.
Links: Hatsune Miku Gif, Mobius on Civitai, ComfyUI_TensorRT GitHub, Google Colab notebook.
Perplexity AI Discord
- Perplexity's CEO Chats with Lex Fridman: In a riveting podcast session, Perplexity's CEO discussed the powerful impact of AI on search and the internet, invoking inspiration from Larry Page with the mantra, "the user is never wrong." The video is available on YouTube.
- Technical Troubles and Triumphs: Users have encountered issues with Pro Search's inability to find sources when toggled on, an inconsistency compared to the iPhone app's performance, prompting a community escalation. Meanwhile, there's anticipation for the upgrade to Claude 3.5 Sonnet, notably for its potential in creative writing, although how it will be integrated remains a point of curiosity.
- AI Ethics in the Spotlight: A Wired article sparked debate on Perplexity's adherence to robots.txt with some users defending AI's role in information retrieval for user requests, while others urge closer scrutiny.
- Prospects and Psychedelics: Conversations took a turn from high-paying career paths for English literature majors to the financial speculations around Lululemon earnings, juxtaposed starkly with discussions on how psychedelic experiences can pivot personal belief systems.
- API Adaptability Aches: The Perplexity API showcases solid performance, particularly notable for running large LLMs, but is critiqued for its constrained customization and lack of certain features like Pages via API. However, resetting API keys is simplified via the Perplexity API settings page.
CUDA MODE Discord
Character.AI Pushes Efficient INT8 Training: Character.AI is working towards AGI with INT8 optimization, serving inference queries at a rate of about 20% of Google Search's volume. Inquiry into their use of Adaptive Quantization (AQT) remains open. Read more.
Kernel Profiling and Triton Tackles: Nsight Compute is the go-to for profiling CUDA kernels to squash performance bugs in the codebase, while Triton 3.0.0 is hailed as a fix for numerous issues, with detailed upgrade instructions available. GitHub profiling script and YouTube resource for kernel profiling.
Emerging AI Breakthroughs: Advancements in Qwen2, DiscoPOP, and Mixture of Agents are shaping the future of AI with the potential to boost LLM performance. Unfolding research projects like Open Empathic and Advisory Board GPT offer creative angles on model utilization. AI Unplugged Coverage.
Optimizing with Quantization & Introducing FPx: While fine-tuning the details, the community evaluates tinygemm compatibility, embraces challenges with FP8 quantization, and ponders XLA integration with quantized models. The uint2 quantization and performance comparisons against FP16 showcase promising speedups. Quantization code reference.
Leveraging Newer Tech for Voltage Speed: H100 box experimentation with a 1558M model demonstrates a 2.5x speedup over A100, providing tangible efficiency gains from the cutting edge of hardware advancements. Speed optimizations continue to be a focal point, with a 20% enhancement through torch compile max autotune mentioned.
Nous Research AI Discord
- Hermes 2 Theta Surpasses GPT-4: Hermes 2 Theta 70B has scored 9.04 on the MT-Bench, a leap ahead of GPT-4-0314's 8.94 score, flaunting increased creativity and capabilities. It's a product of the collaboration between Nous Research, Charles Goddard, and Arcee AI, and download links for both FP16 and GGUF versions are available on Hugging Face.
- General Chat Buzzes with Claude 3.5 Sonnet: The community resonated with the release of Claude 3.5 Sonnet for its speed and problem-solving abilities, branding it a step forward in AI capabilities. Meanwhile, the debate on model parsing emphasized the importance of converting model-specific tool calls into a standard format, suggesting the incorporation of reverse templates into tokenizer_config.json.
- Teasing New Resources: Members hinted at an upcoming resource in the #ask-about-llms channel, sparking intrigue and anticipation among peers.
- Model Integration Techniques Under Scrutiny: A suggestion in the #general channel described a direct method of merging tools into model prompts, potentially facilitating fluent use of multiple AI tools.
- Music Video Diversifies Conversation: In a lighter exchange, a YouTube music video was shared by a member on the #world-sim channel, offering a diversion from the technical discussions.
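The tool-call parsing point above can be sketched in a few lines. This is an illustration only: it assumes a model that wraps each call in <tool_call>...</tool_call> tags around a JSON object with "name" and "arguments" keys, and it normalizes to a dict shape resembling OpenAI-style function calls. The tag format, the id scheme, and the function name are assumptions, not the reverse-template mechanism discussed in the channel.

```python
import json
import re

# Assumed model-specific wrapper: <tool_call>{...json...}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def parse_tool_calls(text):
    """Extract tagged tool calls from raw model output and normalize them."""
    calls = []
    for i, match in enumerate(TOOL_CALL_RE.finditer(text)):
        payload = json.loads(match.group(1).strip())
        calls.append({
            "id": f"call_{i}",  # hypothetical id scheme for illustration
            "type": "function",
            "function": {
                "name": payload["name"],
                # The normalized shape carries arguments as a JSON string.
                "arguments": json.dumps(payload.get("arguments", {})),
            },
        })
    return calls
```

A reverse template would, in effect, declare this parsing step alongside the chat template so that downstream code does not need a hand-written parser per model.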
Torchtune Discord
- Direct Data Streaming On the Horizon: Users highlighted current limitations with Torchtune, as in-memory datasets are still downloaded to local disk from Hugging Face (HF) locations. They are moving towards streaming datasets to bypass saving on disk.
- Configuring HF Datasets: A Piece of Cake: The community agreed on configuring HF datasets in torchtune.dataset.chat_dataset using conversation_style: openai, which should integrate effortlessly with Torchtune.
- Sequence Length Debate Settles at 8k: There was a debate on llama3 maximum sequence length, resulting in a consensus of up to 8192 tokens, though concerns were raised about VRAM capacity limitations.
- Crash Course in Memory Management: Amidst RAM-related crashes during model training, particularly with qlora and lora, it was suggested to offload layers to CPU and sort out ROCm setup quirks for smooth operation.
- Navigating the ROCm Maze: Discussions on setting up ROCm for AMD GPUs unearthed several issues, but community-shared resources, including a Reddit thread about successful ROCm operation on a 6900 XT, proved to be valuable. Building from source was the recommended route for simplicity and effectiveness.
HuggingFace Discord
AI integrations prove handy in scripting: Users discussed integrating Stable Diffusion within VSCode and were advised to run commands via the terminal within the editor. There was also a mention of using a stable-diffusion-3-medium-diffusers model as a workaround for a missing model index in Stable Diffusion 3.
LLMs debate over drug names and finetuning issues: NLP models showed a preference for generic drug names (acetaminophen) over brands (Tylenol), suggesting possible data contamination as discussed in this study and demonstrated on a leaderboard. Meanwhile, a member encountered problems while fine-tuning Llama 3 using TRL with QLoRA and linked to their code and potential solutions.
Challenging assumptions with multi-table data synthesis: A member scrutinized the challenge of generating synthetic multi-table databases, particularly those containing date columns, and an article compared three data synthesis vendors. Additionally, ToolkenGPT was proposed in a paper as a method for LLMs to use external tools via tokenization, aiming to bypass restrictions of fine-tuning and in-context learning.
Protein predictions get a parallel processing power-up: Users celebrated an update to BulkProteinviz, an open-source protein structure prediction tool that now enables simultaneous multiple predictions. This could significantly accelerate research in computational biology.
Llama 3:70B seeks a size upgrade: One engineer asked for tips to grow their training data for Llama 3:70B managed through Ollama, attempting to increase from 40GB to 200GB for more robust local training.
Modular (Mojo 🔥) Discord
MLIR's Kgen Dialect Causes Consternation: Community members are baffled by the kgen dialect in MLIR as it lacks public documentation, with one user describing the code as messy. Suggested workarounds for implementing 256-bit integers in MLIR include using SIMD[DType.int64, 4] or defining an i256 type, supported by a GitHub reference.
Mojo Rides the Open Source Wave: Members are informed that Mojo language is partially open source with its compiler to be progressively open-sourced, as detailed in a blog post. Discussions revealed current practical limitations in Mojo for production environments and advice was shared against using Mojo in complex automation work until it matures.
Evolving Mojo's Ecosystem with Package Managers and Livestreams: The development of a package manager for Mojo is underway with community suggestions like Hammad-hab's pkm. Additionally, the community was invited to a Modular Community Livestream to discuss MAX Engine and Mojo developments, available on YouTube.
Blueprints for Burning Questions in Modular's 'engine' Room: A detailed clarification about the execute function in the MAX Engine was provided, specifying that it can take a variadic NamedTensor or Tuple[StringLiteral, EngineNumpyView], as stated in the Model documentation.
Nightly, Handle Mojo with Care: The release of the latest Mojo compiler version 2024.6.2005 was announced, and users can view the changelog for details. Additionally, a new tool titled "mojo_dev_helper" for standard library contributors was introduced, with more details available on GitHub.
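For readers wondering what the SIMD[DType.int64, 4] workaround for 256-bit integers implies, the sketch below models the idea in Python rather than Mojo: a 256-bit value stored as four 64-bit limbs. Lanes of a SIMD vector add independently, so carries between limbs have to be propagated explicitly. It assumes unsigned limbs in least-significant-first order, and the helper names are illustrative rather than any Mojo or MLIR API.

```python
MASK64 = (1 << 64) - 1

def to_limbs(value):
    """Split an integer in [0, 2**256) into four 64-bit limbs, low limb first."""
    if not 0 <= value < (1 << 256):
        raise ValueError("value does not fit in 256 bits")
    return [(value >> (64 * i)) & MASK64 for i in range(4)]

def from_limbs(limbs):
    """Reassemble the integer from its four limbs."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

def add_u256(a, b):
    """Add two limb vectors modulo 2**256, carrying between limbs."""
    out, carry = [], 0
    for x, y in zip(a, b):
        total = x + y + carry
        out.append(total & MASK64)   # keep the low 64 bits in this limb
        carry = total >> 64          # pass the overflow to the next limb
    return out                       # a final carry is dropped (wraparound)
```

A native i256 type would let the compiler handle the carry chain, which is the appeal of the second suggestion.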
AI Stack Devs (Yoko Li) Discord
- Spam Storm Strikes Discord: Multiple channels within the Discord guild were plagued by spam bots promoting "18+ Free Content" including OnlyFans leaks, with a link to an illicit Discord server. The same invite link was shared across all instances.
- Community Acts Against Spam: Following the flood of inappropriate content, actions were taken by members to report and block the origins of the spam. There is confirmation that steps were taken against a reported user, indicating vigilance within the community.
- Nitro Boost Giveaway Scam Warning: In addition to adult content spam, there was mention of an alleged Nitro Boost giveaway, likely a part of phishing attempts or scams associated with the same spammed Discord link.
- Repeated Targeted Channels: The spam was not isolated but instead appeared across various channels, from #committers to #ai-explained-cartoons, indicating a widespread issue.
- Members' Concern and Prompt Response: Amidst the spam, there was an expressed concern from members for the need to take swift actions, and there were affirmative responses indicating that the community is responsive and proactive in handling such disruptions.
LM Studio Discord
New Horizons for LM Studio 0.2.23: LM Studio's version 0.2.23 is hailed for its speed boost, greatly improving efficiency. Users report headaches with Deepseek Coder v2 due to "unsupported architecture" errors, but note that disabling flash attention and employing version 0.2.25's deepseek coder preset can mitigate the problem.
Hardware Conundrums and GPU Debates: Discussions revolve around the heavy VRAM demands of large language models (LLMs), suggesting 38GB+ of VRAM for seamless performance on 34GB models and debating the merits of Nvidia's 3090 vs 4090 in cost-effectiveness and VRAM capacity. AMD 7900XT's suitability for LLMs is questioned amid issues with ROCm support and general detection hitches on some systems.
Seeking Frontend Flexibility: Engineers are exploring frontend options for local LLM server deployment on various devices, with every-chatgpt-gui and awesome-chatgpt repositories being common starting points. Some express frustrations over automated moderation in llama-related subreddits, which seem overly aggressive.
Technical Quirks in Model Discussions: Nvidia's new storytelling model garners interest for its balance in reinforcement content. The extent of Opus's context capacity sparks debates, with hopes pinned on extended limits. DeepSeek Coder V2 Lite has a peculiar inclination towards Chinese responses unless an older template is used. A preference emerges for a new model over Midnight Miqu's offerings following some hands-on tests.
Bottlenecks in Beta and Tech Previews: Latest beta testing of LM Studio reveals detection problems with Nvidia's 4070 GPU on Linux Mint and hiccups with DeepseekV2 models. M1 Mac users face inconsistencies when leveraging GPU acceleration, and AMD users are directed towards installing ROCm packages to ensure GPU compatibility.
OpenRouter (Alex Atallah) Discord
- A Quicker, Cheaper, Better Claude: The new Claude 3.5 Sonnet from Anthropic has been launched, touting better performance than its predecessor Opus, while being 5x cheaper and 2.5x faster; it offers self-moderated versions alongside standard ones, with prices detailed in a tweet.
- Stripe's Glitch in the Credits: Stripe payment issues that caused credits to queue incorrectly have been resolved, with affected transactions from the last half-hour processed successfully.
- Nemotron's Hosting Challenges: Nemotron is not favored for hosting among providers, primarily due to its large size at 340 billion parameters and lack of compatibility with popular inference engines.
- Dolphin Mixtral's Open Licensing Advantage: Praise was shared for Dolphin Mixtral 1x22b model, which is available on HuggingFace and recognized for its potential to replace Codestral while avoiding licensing restrictions.
- Clarifying DeepSeek-Coder V2's Limits: Confusion over the context length for DeepSeek-Coder V2 was addressed; despite its model card claiming 128K, further clarification revealed a 32K cap due to the OpenRouter hosting limitations.
Eleuther Discord
- 1B Internet Argument Solver? Cost vs. Practicality: There's lively debate on the feasibility of training a 1B model specifically to resolve internet arguments, with concerns about high costs versus the model's training time, which can be under two days on an H100 node.
- Tech Woes: Selectolax, Lexbor, and NumPy Miseries: Engineers face technical issues with Selectolax and Lexbor causing segmentation faults, and struggle with NumPy 2.0 compatibility in the lm-eval-overview.ipynb notebook, even after downgrading.
- Warc and the Speed Demons: Discussion on CC Warc file processing has members sharing various optimizations, with reports of one Warc taking 60 seconds to process using 100 processes, and another approach leveraging parallel processing across 32 processes.
- Data Hub Bonanza: Epoch AI's Data Hub now catalogs over 800 models, aiming to benefit researchers, policymakers, and stakeholders and pointing to potential computational explosion in frontier AI by the 2030s, as discussed in a CNAS report.
- Research Riches: From Token Datasets to Slot SSMs: Discussion in the research channel spans diverse topics including the performance effects of the 4T token dataset from DCLM-Baseline, the introduction of SlotSSMs for better sequence modeling in a paper, models struggling with drug brand names in medical applications, post-training enhancement techniques like LAyer-SElective Rank reduction (LASER), and domain conditional PMI to tackle surface form competition in LLMs.
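As a rough picture of the multi-process WARC approaches mentioned above, the sketch below shards a list of file paths across worker processes using the standard library. It is not the members' actual pipeline: process_shard is a placeholder that only counts paths where real code would open and parse each WARC, and every name here is hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor

def shard(paths, n_shards):
    """Round-robin split of paths into n_shards lists (some may be empty)."""
    return [paths[i::n_shards] for i in range(n_shards)]

def process_shard(paths):
    """Placeholder worker: real code would parse each WARC file here."""
    return len(paths)

def run_parallel(paths, processes=32):
    """Fan the shards out to worker processes and sum the per-shard results."""
    shards = [s for s in shard(paths, processes) if s]
    with ProcessPoolExecutor(max_workers=processes) as pool:
        return sum(pool.map(process_shard, shards))
```

When run as a script, call run_parallel from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.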
Interconnects (Nathan Lambert) Discord
- Claude 3.5 Sonnet Takes the Lead: Anthropic introduced Claude 3.5 Sonnet, boasting faster speeds and improved cost-efficiency, along with a promise of future Claude 3.5 Haiku and Opus models. Meanwhile, Character.AI focuses on optimizing inference for their AGI, capable of handling 20,000 queries per second, roughly 20% of Google Search's volume.
- Youth Driven AI Engagement: Character.AI is experiencing notable session times, particularly among younger users, which surpass the engagement seen with ChatGPT. Additionally, Claude 3.5 Sonnet tops aider’s code editing leaderboard, especially excelling in "whole" and "diff" editing formats.
- Sour Grapes in AI Safety?: Members expressed skepticism about the trust and implementation of AI safety, with ironic "Trust me bro" sentiments and references to Eliezer Yudkowsky's challenge to AI alignment plans. Scott Aaronson's account of Ilya Sutskever's quest for a theoretically robust alignment stance also surfaced.
- Kling Outshines Sora: Kuaishou has launched Kling, a text-to-video generative AI model available to the public, which raises the bar with two-minute videos at 1080p and 30fps, unlike OpenAI’s Sora. Furthermore, there's curiosity about Meta's use of 5000 V100s for generating synthetic data, a topic being revisited by Nathan Lambert.
LlamaIndex Discord
- CrewAI Teams Up with LlamaIndex: CrewAI announced an enhancement to multi-agent systems by integrating with LlamaIndex, providing a way to define a "crew" of agents that leverage LlamaIndex capabilities for tasks. Details of this integration can be found in their latest blog post.
- AI Fair's Future Speaker: The founder of LlamaIndex is scheduled to present at the AI Engineer's World's Fair, discussing the Future of Knowledge Assistants on June 26th with some major announcements, and another session on June 27th. For more information, enthusiasts can learn more here.
- Vector Store Customization Queries: Engineers are exploring the flexibility of LlamaIndex's VectorStoreIndex with questions about adding sequential identifiers, custom similarity scores, and asynchronous node retrieval, though some features might require custom implementation due to current limitations.
- Knowledge Generation from Documents: Discussion around generating questions from PDFs using LlamaIndex's DatasetGenerator was shared, including an example utilizing OpenAI's model for the task.
- Persisting Indexes Made Easy: The conversation highlighted using storage_context.persist() to store a DocumentSummaryIndex in LlamaIndex, accompanied by practical code illustrations.
OpenAccess AI Collective (axolotl) Discord
- Speed Boost in Nemotrons API: Members reported Nemotrons API improvements, highlighting significant speed increases and a newly released reward model.
- Turbcat or Turbca?: The naming confusion was cleared up: Turbcat is the model, and Turbca is the individual behind it. Issues with dataset configuration and tokenization methods prompted discussion and concern.
- Tokenization Troubles and Solutions: A robust debate emerged regarding tokenization and how to handle end of text (EOT) tokens, with a member presenting the Multipack with Flash Attention documentation to showcase the best practices.
- Qwen's Biases Unraveled: The community expressed concern over the Qwen model's biases and the need for adjustments, pointing to Chinese LLM censorship analysis for insights into the model's potential propagandistic inclinations.
- Layer-Pruning and QLoRA Hit the Spot: The intersection of layer-pruning and QLoRA was brought up, with a member citing its successful application in improving model performance (MMLU scores by up to 10 points) and a Hugging Face model card for practical application details.
LangChain AI Discord
- Single Quotes Save the System: A user discovered that substituting backticks with single quotes fixes a data injection issue in a SystemMessage.
- Chunk and Conquer Large Text: Strategies for handling large text data from web scraping were discussed, including token limits and how to effectively combine chunked responses, with links to LangChain documentation.
- PDF Puzzles Vector Databases: Retrieving data from vector databases using PDF documents has proved challenging for a user, who encountered non-informative "I don't know" answers from the system.
- Manage Event Streaming Like a Pro: Techniques for event filtering in astream_events were shared, with pointers to specific sections in the LangChain documentation guiding users on the process.
- Launching Foodie AI Assistants and Chatbots: TVFoodMaps introduced an AI-powered feature to help users find restaurants featured on TV, requiring a premium membership, while a guide to create SQL agents using OpenAI & LangChain was shared, inviting feedback. A new concept named Conversational Time Machine was introduced in an article on Medium, exploring the development and uses of a LangGraph Support Chatbot.
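For the "Chunk and Conquer" item above, a minimal sketch of the split step might look like this. It is plain Python rather than LangChain's own text splitters, it treats whitespace-separated words as a stand-in for model tokens, and the names chunk_words, max_tokens, and overlap are hypothetical.

```python
def chunk_words(text: str, max_tokens: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping chunks of at most max_tokens words.

    Words are a crude proxy for model tokens (an assumption); a real
    pipeline would count tokens with the model's own tokenizer.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    words = text.split()
    step = max_tokens - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Each chunk would then be sent to the model separately and the per-chunk responses combined in a final pass; the overlap keeps some context across chunk boundaries.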
tinygrad (George Hotz) Discord
- Bounty Hunters for Approximation: In the pursuit of a bounty to implement Taylor approximations for LOG2, EXP2, and SIN in function.py, issues about adding bitwise operations to ops.py arose, with community concern about operation count inflation. Practicality trumps purity as the need for new operations competes with the aim for minimalism.
- Multi-GPU Quest Continues: Clarifications around multi-GPU support with NVLink led to learning that GPUs connect via PCI-E, and a GitHub resource was shared, evidencing NVIDIA's Linux open GPU kernel modules with P2P support.
- High Bar for Diffusion Models: A community member's port of a diffusion model from PyTorch to tinygrad sparked a debate on code quality, with George Hotz setting the bar high for inclusion into the project. Contributors are encouraged to submit a PR for scrutiny.
- Clip, Clip, Hooray? Or Mayday?: An intense technical dissection took place regarding clip_grad_norm_ implementation in TinyGrad, where Metal's limitations forced a discussion on tensor chunking as a workaround. This signifies the ongoing struggles with optimization within hardware confines.
- Tying Weights, Loosing Bugs: A suspected bug involving weight tying in TinyGrad was spotlighted, revealing that two ostensibly linked tensors were being optimized independently. The community is on the case, suggesting library corrections for consistent weight optimization.
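To make the bounty item above concrete, here is a scalar sketch of Taylor-series approximations for SIN and EXP2. It is not tinygrad's implementation: tinygrad operates on tensors and its function.py has its own structure, so the function names, term counts, and range-reduction choices below are all assumptions for illustration.

```python
import math


def taylor_sin(x: float, terms: int = 10) -> float:
    """Approximate sin(x) with a truncated Taylor series around 0."""
    # Range reduction: map x into [-pi, pi] so few terms are needed.
    x = math.remainder(x, 2.0 * math.pi)
    term = x   # first series term, x^1 / 1!
    total = x
    for n in range(1, terms):
        # Each term is the previous one times -x^2 / ((2n)(2n+1)).
        term *= -x * x / ((2 * n) * (2 * n + 1))
        total += term
    return total


def taylor_exp2(x: float, terms: int = 12) -> float:
    """Approximate 2**x by splitting x into integer and fractional parts."""
    k = math.floor(x)
    f = x - k               # fractional part in [0, 1)
    y = f * math.log(2.0)   # 2**f == e**(f * ln 2)
    term = 1.0
    total = 1.0
    for n in range(1, terms):
        term *= y / n       # y^n / n!
        total += term
    return math.ldexp(total, k)  # scale the result by 2**k
```

The range reduction is where an approach like this tends to need extra machinery: splitting a float into integer and fractional parts on tensors is the kind of step that could motivate the bitwise operations debated in the channel.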
LLM Finetuning (Hamel + Dan) Discord
- Persistence of Discord Community Debated: Members discussed the continued activity of the Discord server post-course, suspecting it would depend on member and moderator engagement, without concrete plans outlined.
- Expert LLM Livestream Incoming: A livestream with Eugene Yan from Amazon and Bryan Bischof from Hex discussing real-world LLM applications was announced, promising insights geared toward prompt engineering, evaluation, and workflow optimization. Interested members can register here and explore their learnings detailed in an O'Reilly report.
- Finetuning Insights and Requests: Regarding custom LLM workloads, discussions included needing fine-tuning for specific roles such as fraud detection, while general tasks like language translation may not. In a different vein, there was a buzz around Jarvis Lab's upcoming Docker feature and Modal's user experience enhancement for finetuning.
- Credits and Access Issues Centre Stage: Multiple members sought assistance regarding credits and account access across various platforms like LangSmith and OpenAI, often providing IDs or emails in a plea for help, indicating a level of confusion or technical problems.
- Technical Glitches and Triumphs: Amidst praise for a well-designed eval framework, users reported various technical issues from CORS errors at Predibase to credit visibility on OpenAI, showing a mix of user experience in the practical aspects of applying LLMs to projects.
OpenInterpreter Discord
- Riches Beyond Just Wealth in AI Discussion: Members joked about whether OpenInterpreter (OI) could make someone financially richer, leading to playful banter about achieving 100% richness instead of just 5%. In another thread, discussions around Claude 3.5 Sonnet revealed a preference for its dialogue style over GPT-4.
- AI Models Face Off for Top Honors: Debates surfaced concerning the best uncensored models, with "2.8 dolphin" and "mistral 3/31/24" mentioned as contenders. Opinions diverged, reflecting varying user experiences with each model, and no definitive best model emerged.
- Memory Lane with Open Interpreter: Queries regarding potential long-term memory capabilities in OpenInterpreter prompted discussion but yielded no conclusive solutions. Members are actively looking into how to equip OI with persistent memory.
- OpenInterpreter's Tentative Manufacturing Milestone: An update in #O1 indicated the expected shipping of the first 1,000 OpenInterpreter units between October 31st and November 30th, as per an announcement from Ben. Curiosity arose about order statuses and positioning within the first shipment batch.
- Practical AI Magic with Local, Task-Oriented Controllers: A demonstration showed a fully local, computer-controlling AI successfully connecting to WiFi by reading a password from a sticky note, illustrating how AI can carry out everyday tasks and simplify daily interactions with technology.
LAION Discord
- Graph-Based Captions Make a Leap: The GBC10M dataset, a graph-based recaptioned version of CC12M, is now available on Hugging Face. Efforts are underway to secure a less restrictive license and transition the dataset to the Apple organization on Hugging Face, with plans to publish the accompanying paper on arXiv and release the code once it's refined.
- Adversarial Robustness Debates Heat Up: A scuffle erupts in academic circles as experts like Carlini and Papernot challenge the Glaze authors on adversarial robustness issues, particularly regarding a withheld codebase for perturbation budgets.
- VAEs Channel Increase Sparks Debate: Raising the channel count in VAE latent spaces from 4 to 16 sparked a technical debate, juxtaposing the complexity in latent space against computational costs, and noting the quadratic scaling of global attention with pixel count.
- The Mystery of Overfitting Solved by Claude-3.5?: An engineer's manual experiment suggests that Claude-3.5-Sonnet shows a promising ability to reason through problems without overfitting on recognizable patterns, unlike other models.
- Chameleon Model Training Hits a Wall: Engineers face an unexpected challenge with the Chameleon model as extreme gradient norms cause NaN values, with no remedy from standard fixes like reducing learning rates or switching to higher precision.
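The scaling argument in the VAE item can be checked with a little arithmetic. The sketch below assumes an 8x spatial downsampling factor in the VAE and one attention position per latent pixel; neither number comes from the discussion, and all function names are hypothetical.

```python
def latent_positions(height: int, width: int, downsample: int = 8) -> int:
    """Spatial positions in the latent grid (assumed 8x downsampling)."""
    return (height // downsample) * (width // downsample)


def attention_pairs(positions: int) -> int:
    """Pairwise interactions in global self-attention: positions squared."""
    return positions * positions


def latent_values(positions: int, channels: int) -> int:
    """Total numbers stored in the latent: positions times channels."""
    return positions * channels


for side in (512, 1024):
    n = latent_positions(side, side)
    print(side, n, attention_pairs(n), latent_values(n, 4), latent_values(n, 16))
```

Under these assumptions, going from 512x512 to 1024x1024 quadruples the latent positions (4,096 to 16,384) but multiplies attention pairs by 16 (16,777,216 to 268,435,456), while moving from 4 to 16 channels only quadruples the latent values per image.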
Cohere Discord
- Multi-Language Chatbots with Cohere: AI enthusiasts are employing the Cohere API for developing chatbots in various languages, and a discussion highlighted its compatibility with OpenAI's API, allowing integration into any environment through RESTful APIs or sockets.
- Purple Praise: Cohere's interface, notably its use of the color purple, received commendations for its stylish design in the community, sparking inspiration for future design endeavors among members.
- Problem-Solving in Project Development: A community member shared their experiences dealing with chat hang-ups potentially linked to API issues, with a commitment to addressing the issue through UI adjustments and ongoing troubleshooting.
- Community Camaraderie: Excitement was evident among participants who welcomed new members and shared their positive impressions of Cohere's unique and intelligent approach.
- Platform Adaptability Discussions: Dialogues emerged around utilizing Cohere's capabilities on different platforms, with a specific mention of creating chatbots in .NET on a Mac.
Latent Space Discord
- Toucan TTS Breaks Language Barriers: The open-source Toucan TTS model is distinguished by its capability to support TTS in 7000 languages, featuring a text frontend for language-agnostic articulatory features and leveraging meta-learning for languages lacking data.
- Claude 3.5 Sonnet Takes Efficiency to New Heights: The new Claude 3.5 Sonnet impresses the community by outperforming competitors, providing higher speeds and reduced costs. Members also celebrate the launch of the Artifacts feature, a Code Interpreter successor, enabling real-time doc, code, and diagram generation.
- Consultancy Collaboration Creates AI Synergy: Market buzz as Jason Liu's Parlance Labs merges with Hamel Husain's and Jeremy Lewi's teams, uniting to enhance AI product support and development, focusing on infrastructure, fine-tuning, and evaluations as noted in their announcement.
- Groq Steps Up with Whisper Support, but Concerns Linger: Groq's new Whisper model support, which achieves speeds at 166x real-time, opens doors for faster AI processing; yet, the community raises questions about its current rate limits and the model's broader applicability.
Mozilla AI Discord
- Llamafile Aims for Model Diversity: In discussions, it was proposed to harness YOLOv10 PyTorch and OCR Safe Tensors within a Llamafile structure. A solution offered entails converting these models to gguf format via llama.cpp Python scripts.
MLOps @Chipro Discord
- Infer Conference Ignites AI/ML Discussions: Hudson Buzby and Russ Wilcox will spearhead conversations on real-life recommender systems and AI/ML challenges at Infer: Summer '24, with a focus on optimizing AI pipelines and content accuracy, featuring experts from companies like Lightricks.
- Network and Learn at RecSys Learners Virtual Meetup: RecSys Learners Virtual Meetup, hosted by Rohan Singh S Rajput on 06/29/2024, provides a platform for professionals of all levels to connect and enhance their knowledge in recommendation systems.
Datasette - LLM (@SimonW) Discord
- Florence 2 Takes Handwriting OCR Up a Notch: Florence 2 by Microsoft has received praise for its superior performance in handwriting recognition and OCR, especially useful for journalism. Microsoft's model stands out in processing public records.
- Test Drive Florence 2 on Hugging Face: The Florence 2 model is available for hands-on experimentation at Florence-2 on Hugging Face, showcasing its range of capabilities in vision-related tasks, which are crucial for AI development and research.
- Inside Florence 2’s Visual Prowess: The model uses a prompt-based methodology for various vision and vision-language tasks, trained on the massive FLD-5B dataset containing 5.4 billion annotations, demonstrating mastery in multi-task learning and adaptability in both zero-shot and fine-tuned scenarios.
YAIG (a16z Infra) Discord
- "Don't Mention AI or Get Piledrived": An entertaining blog post, "I Will Fucking Piledrive You If You Mention AI Again", mocks the AI hype cycle, cautioning against the overzealous and impractical adoption of AI technology with a warning that it's a "cookbook for someone looking to prepare a twelve course fucking catastrophe." Engineers interested in cultural critiques of the industry might find it a different but relevant read, available here.
The LLM Perf Enthusiasts AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
The full channel by channel breakdowns have been truncated for email.
If you want the full breakdown, please visit the web version of this email.
If you enjoyed AInews, please share with a friend! Thanks in advance!