AI News (MOVED TO news.smol.ai!)

Archives
August 27, 2024

[AINews] not much happened this weekend

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜


we're running out of subtitles.

AI News for 8/23/2024-8/26/2024. We checked 7 subreddits, 384 Twitters and 30 Discords (214 channels, and 5673 messages) for you. Estimated reading time saved (at 200wpm): 639 minutes. You can now tag @smol_ai for AINews discussions!

A few news items:

  • Distributed AI: Nous Research announced DisTrO, their new optimizers that "reduces the inter-GPU communication requirements by 1000x to 10,000x without relying on amortized analysis, and matches AdamW+All-Reduce in convergence rates. This enables low-latency training of large neural networks on slow internet bandwidths with heterogeneous networking hardware." - a nice alternative to GDM's DiLoCo.
  • a snowball of Cursor AI love following a viral video of an 8-year-old using it and their fundraise announcement. Their first podcast interview was exactly a year ago, and Aman returned to co-host in June.
  • George Hotz's tinybox is live for sale!

Since the newsflow is light, why not give Box feedback on Box AI's new beta?


[Sponsored by Box] You are building things with AI. So is Box. Imagine if you built your things using Box’s things. Actually, don’t imagine it, try it yourself in the Box AI Developer Zone.

Swyx's comment: thanks to Box (via Freeman & Forrest) for supporting AI News this August (1, 2, 3)!


The Table of Contents and Channel Summaries have been moved to the web version of this email!


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI and Robotics Developments

  • Humanoid Robots: @adcock_brett reported that China-based robotics startup AGIBOT revealed 5 new humanoid robots with open-source plans, each designed for different tasks from household chores to industrial operations. Additionally, @adcock_brett mentioned that Unitree, another Chinese robot manufacturer, showcased its new G1 humanoid robot, reportedly nearing "mass production" at a price of $16,000.
  • AI-Generated Motion: @adcock_brett noted that ETH Zurich and Disney developed an AI system capable of generating physics-based movements for robots from text or image inputs, using a two-stage approach that learns latent representations of motion from large datasets.
  • Teleoperation System: @adcock_brett highlighted UC San Diego's release of ACE, a low-cost, cross-platform teleoperation system allowing researchers to control multiple robots with precision simultaneously. The system is fully open-sourced.

AI Models and Tools

  • Jamba 1.5: @adcock_brett reported that AI21 Labs unveiled Jamba 1.5, a new multilingual AI model family with a 256,000-token context length, long-context inference 2.5x faster than other models in its size class, and permissive licensing for smaller organizations. The models ship with full open weights.
  • Dream Machine 1.5: @adcock_brett mentioned Luma Labs' release of Dream Machine 1.5, an upgrade to their AI video generation model, allowing for higher-quality text-to-video, more intelligent prompt understanding, and improved image-to-video capabilities.
  • Ideogram v2: @adcock_brett noted that Ideogram released v2 of its text-to-image AI model, distinguishing itself with the ability to generate near-perfect text, opening up new use cases for image generation like thumbnails, posters, and memes.
  • Mistral-NeMo-Minitron 8B: @adcock_brett reported that Nvidia and Mistral released Mistral-NeMo-Minitron 8B, a small model that can run on laptops and PCs, outperforming Mistral 7B and Meta Llama 3.1 8B on the Open LLM Leaderboard.

AI Applications and Research

  • Autonomous Sales Agents: @adcock_brett mentioned Salesforce's introduction of two fully autonomous, AI-powered sales agents, Einstein SDR Agent and Einstein Sales Coach Agent, capable of engaging with inbound leads and coaching salespeople in real-time.
  • Amazon's AI Assistant: @adcock_brett shared an update from Andy Jassy on Q, Amazon's AI assistant for software development, estimating it has saved the equivalent of 4,500 developer-years of work.
  • Neuralink Progress: @adcock_brett reported on Neuralink's progress with their second human patient, Alex, who demonstrated impressive control in playing Counter-Strike 2 using just the brain-computer interface and broke the previous world record for BCI cursor control on day one.

AI Development and Tools

  • Git Commit Message Generator: @karpathy shared a utility that auto-generates git commit messages based on the git diff of staged changes, using the llm CLI utility from @simonw.
  • Speculative Decoding for Code Edits: @rohanpaul_ai highlighted Cursor.ai's blog post on modifying the diff format and using speculative edits with a fine-tuned Llama 70B, achieving a 4-5x speedup over GPT-4o and pushing the Pareto frontier on the accuracy/latency curve.
  • VoiceCraft: @rohanpaul_ai mentioned an impressive tool for zero-shot speech editing and text-to-speech in the wild, capable of cloning unseen voices with only a few seconds of reference.
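The commit-message trick above is easy to reproduce. Below is a minimal Python sketch of the idea: grab the staged diff and wrap it in a prompt to pipe into an LLM CLI. The prompt wording and the `llm` invocation shown in the comments are our assumptions, not Karpathy's actual script.

```python
import subprocess

PROMPT = (
    "Below is a diff of all staged changes. "
    "Write a concise, one-line git commit message for them:\n\n"
)

def staged_diff() -> str:
    """Return the diff of staged changes (what `git diff --cached` prints)."""
    return subprocess.run(
        ["git", "diff", "--cached"], capture_output=True, text=True, check=True
    ).stdout

def commit_message_prompt(diff: str) -> str:
    """Build the prompt that would be piped into an LLM CLI,
    roughly: git diff --cached | llm "<prompt>"."""
    return PROMPT + diff

# Example (requires a git repo with staged changes):
# print(commit_message_prompt(staged_diff()))
```

The same idea works as a one-line shell alias wrapping @simonw's `llm` tool.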

AI Research and Frameworks

  • GraphRAG: @rohanpaul_ai discussed a survey paper on GraphRAG techniques, bridging graph-structured data and language models to capture complex relational knowledge more effectively than text-based methods.
  • iLoRA: @rohanpaul_ai highlighted a paper proposing Instance-wise LoRA (iLoRA), which personalizes LLM recommendations by integrating LoRA with Mixture of Experts for improved accuracy in sequential recommendation systems.
  • RAGLAB: @rohanpaul_ai mentioned RAGLAB, an open-source library for standardizing RAG research, featuring a modular design for fair comparisons between algorithms.
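For context on the iLoRA item above: standard LoRA adds a low-rank delta, (alpha/r)·B·A, to a frozen weight matrix, and an instance-wise variant gates several such deltas per input. The sketch below illustrates both pieces in plain Python with toy-sized matrices; the gating network itself is omitted, so treat this as a reading aid rather than the paper's method.

```python
def matmul(A, B):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def lora_delta(B, A, alpha, r):
    """Low-rank LoRA update scaled by alpha/r: delta_W = (alpha/r) * B @ A."""
    BA = matmul(B, A)
    s = alpha / r
    return [[s * x for x in row] for row in BA]

def instance_wise_delta(experts, gates):
    """iLoRA-style mixing (sketch): per-instance gate weights combine
    several LoRA expert deltas into one effective update."""
    rows, cols = len(experts[0]), len(experts[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for g, delta in zip(gates, experts):
        for i in range(rows):
            for j in range(cols):
                out[i][j] += g * delta[i][j]
    return out

# rank-1 example: B is 2x1, A is 1x2
B = [[1.0], [0.0]]
A = [[2.0, 0.0]]
delta = lora_delta(B, A, alpha=2, r=1)  # [[4.0, 0.0], [0.0, 0.0]]
```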

AI Ethics and Regulation

  • California SB 1047: @labenz commented on the SB 1047 bill, noting that few models would be covered (only those costing $100M+) and that developers are already voluntarily conducting extensive safety testing.

Memes and Humor

  • @AravSrinivas shared a humorous t-shirt caption related to AI.
  • @vikhyatk jokingly suggested turning off syntax highlighting to become a better developer.
  • @abacaj humorously commented on the prevalence of Cursor-related content in their feed.

This summary captures the key developments, research, and discussions in AI and robotics from the provided tweets, focusing on aspects relevant to AI engineers and developers.


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Hardware Optimization for Local LLM Inference

  • Is $2-3000 enough to build a local coding AI system? (Score: 55, Comments: 102): A user inquires about building a local coding AI system with a budget of $2,000 to $3,000, aiming to replicate the performance of commercial coding assistants like Cursor and Anthropic's Claude. They prioritize speed over accuracy, suggesting that accuracy can be improved through better prompting or retries, and specifically ask if a Mac Studio would be sufficient for this purpose.
  • Consider not using a Mac... (Score: 178, Comments: 149): The post compares LLM inference performance between an M2 Mac Studio and an AMD build with a 2080ti GPU. The Nvidia setup significantly outperforms the Mac, processing 32k context in 25 seconds compared to the Mac's 260 seconds, while using less VRAM (10GB vs 30GB) and supporting 64k context with flash attention and quant k,v. Additionally, the Nvidia rig demonstrates more stable performance with context shifting and reply generation.

Theme 2. Advancements in Long-Context LLM Generation

  • LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs (Score: 74, Comments: 15): LongWriter is a technique that enables long-context Large Language Models (LLMs) to generate coherent texts exceeding 10,000 words. The method involves breaking down the generation process into manageable chunks, using context windows of up to 32,000 tokens, and employing strategies like recursive summarization and dynamic prompting to maintain consistency across sections. This approach allows for the creation of extended narratives, comprehensive reports, and other long-form content while preserving thematic coherence and logical flow throughout the generated text.

Theme 3. Anthropic's Controversial Stance on AI Regulation

  • Do you think Anthropic is worse than OAI with fighting open source? To me it seems like the case. This letter appears to imply they actually suggested the bill to Sen Wienner... I really like my OSS LLMs.... (Score: 226, Comments: 111): Anthropic appears to be taking a more aggressive stance against open-source LLMs compared to OpenAI, potentially having suggested the legislation to Senator Wiener. The post author expresses concern about this perceived stance, indicating a preference for open-source language models. The debate highlights the tension between AI safety regulation and innovation in LLM development, particularly in the open-source domain.
    • The proposed California bill SB1047 requires safety testing and a built-in "kill switch" for large AI models. Critics argue this could stifle innovation and open-source development, potentially driving AI progress out of the US.
    • Users expressed concerns about regulatory capture, suggesting Anthropic may be pushing for legislation to maintain their market position. Some compared it to past attempts to regulate new technologies like cars, planes, and video games.
    • Discussion highlighted the challenges of implementing a "kill switch" in mathematical models and the potential for innovation to move elsewhere, particularly to countries like China that may be less inclined to regulate AI development.

Theme 4. Emerging Chinese LLMs Challenging Western Models

  • Impressed by GLM-9B (they say little about the model) (Score: 54, Comments: 12): The post author expresses surprise at the performance of the GLM4-9B model, claiming it far exceeds Gemma 2 9B and Llama 3.1 8B in terms of answer quality. They share a link to the model on Hugging Face and ask for others' opinions and experiences with the model, noting that there seems to be little discussion about it.

All AI Reddit Recap

r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

Robotics and AI Hardware

  • Disney Research's facial mimicry robot: A robot developed by Disney Research can imitate human facial movements, specifically blinking and subtle head movements.
  • Beijing World Robotics Conference 2024: The conference showcased various robotic technologies, highlighting advancements in the field.

Biotechnology and Food Tech

  • Lab-grown meat cost parity: A study suggests that lab-grown meat can cost the same as USDA organic chicken, indicating progress in making cultured meat economically viable.

AI Model Development

  • Model size and intelligence trade-offs: A discussion on the compromise between model size and intelligence suggests that recent models are significantly distilled compared to earlier versions like GPT-4, potentially affecting their capabilities.
  • Perceived slowdown in AI progress: Users in r/OpenAI are discussing a perceived slowdown in AI advancements, noting that recent developments haven't been as impressive as those from a year ago.

AI Discord Recap

A summary of Summaries of Summaries by Claude 3.5 Sonnet

1. LLM Advancements and Benchmarking

  • Grok-2 Climbs the LMSYS Leaderboard: xAI's Grok-2 has made a significant impact on the LMSYS Leaderboard, surpassing GPT-4o (May) and tying with the latest Gemini for the #2 spot with over 6,000 community votes.
    • Grok-mini, also from xAI, secured the #5 position, excelling particularly in Math (#1) and ranking #2 across Hard Prompts, Coding, and Instruction-following categories.
  • 1.5-Pints LLM: Quality Over Quantity: A new compact LLM called "1.5-Pints" was pretrained in just 9 days using a curated dataset of 57 billion tokens, outperforming both Apple's OpenELM and Microsoft's Phi on the MT-Bench benchmark.
    • The model utilizes a modified Mistral tokenizer and Llama-2 architecture, prioritizing "textbook-like" content for enhanced reasoning and logical deduction capabilities.

2. LLM Optimization Techniques

  • DisTrO: Revolutionary Distributed Optimization: Nous Research released a preliminary report on DisTrO, a family of distributed optimizers that reduces inter-GPU communication requirements by 1000x to 10,000x without relying on amortized analysis.
    • DisTrO matches AdamW+All-Reduce in convergence speed, potentially revolutionizing large-scale LLM training. The full report is available on GitHub.
  • LIGER Kernel Boosts LLM Training Efficiency: The new LIGER kernel for LLM training has achieved impressive results, offering 25% VRAM savings and 33% training time reduction compared to traditional methods.
    • While primarily designed for multi-GPU setups, LIGER is expected to provide improvements even for single GPU training scenarios, sparking excitement in the AI community.
  • Sparse-Marlin Accelerates Matrix Multiplication: Sparse-Marlin, a new GPU-optimized kernel, has been integrated into the vllm_project, achieving 5.3x speedups on NVIDIA GPUs (Ampere/Ada) for matrix multiplication with 4-bit quantized weights.
    • This advancement maintains efficiency with batch sizes up to 32 and leverages 2:4 sparsity, potentially revolutionizing inference speed for large language models.
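The 2:4 sparsity pattern Sparse-Marlin relies on is simple to state: in every contiguous group of four weights, at most two are nonzero, which the GPU can exploit with a fixed-layout kernel. A minimal magnitude-based pruning sketch in plain Python (real pipelines prune during or after training and store the pattern in hardware-friendly metadata):

```python
def prune_2_4(weights):
    """Apply 2:4 structured sparsity: in every contiguous group of four
    weights, zero out the two with the smallest magnitude."""
    assert len(weights) % 4 == 0, "2:4 sparsity works on groups of four"
    out = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # indices of the two largest-magnitude entries in this group
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        out.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return out

pruned = prune_2_4([0.9, -0.1, 0.05, -0.8, 0.2, 0.3, -0.4, 0.1])
# exactly half the entries are zeroed, preserving the largest weights
```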

3. Open Source AI Developments

  • Zed AI: The Open Source Coding Companion: Zed AI has launched as an open-source AI-powered code editor, offering a powerful interface for AI-assisted programming with support for models like Claude-3.5 and integration with Ollama.
    • The editor features a new Anthropic API designed for fast text transformations, available free for the first month, positioning itself as a strong alternative to proprietary options like Cursor.
  • Apple's ML-Superposition Prompting Goes Open Source: Apple has released their ML-Superposition Prompting project as open source, now available on GitHub, aiming to advance prompting techniques in machine learning.
    • This release has generated excitement in the AI community, potentially offering new tools and methodologies for researchers and developers working on language models and prompt engineering.
  • Tinybox: Open Hardware for AI Enthusiasts: The Tinybox, an open hardware project associated with the tinygrad framework, has launched sales to the public through the tiny shop.
    • With a current production capacity of about 4 units per day and a backlog of 60 units, the Tinybox represents a growing interest in accessible, open-source hardware for AI development and research.

4. AI Industry and Community Updates

  • AI Engineer London Meetup Announced: The first AI Engineer London Meetup is scheduled for September 12th, featuring speakers Maxime LaBonne, Rovio Sc, Martins Bruveris, and Chris Bull, as announced by @dctanner.
    • This event is inspired by @swyx's AI Engineer World's Fair, aiming to bring together AI enthusiasts and professionals in London for knowledge sharing and networking.
  • Together AI Adjusts Pricing Structure: Together AI announced price increases for its Serverless Reference endpoints, with Llama-3 8B rising from $0.20 to $0.40 per million tokens, and Llama-3 70B from $0.90 to $1.80 per million tokens, effective September 1, 2024.
    • While these changes affect the Serverless Reference endpoints, Together AI's Turbo and Lite pricing remains unchanged, as reflected on their pricing page.
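At per-million-token pricing, the impact of the change is straightforward to compute. A small sketch using the prices quoted above (the dictionary keys are our own labels, not Together AI's model identifiers):

```python
# Per-million-token serverless prices from the announcement (USD),
# effective September 1, 2024.
OLD = {"llama-3-8b": 0.20, "llama-3-70b": 0.90}
NEW = {"llama-3-8b": 0.40, "llama-3-70b": 1.80}

def cost(model: str, tokens: int, prices: dict) -> float:
    """Cost in USD for a given token count at per-1M-token pricing."""
    return tokens / 1_000_000 * prices[model]

# A 10M-token workload on Llama-3 70B doubles from $9 to $18.
before = cost("llama-3-70b", 10_000_000, OLD)
after = cost("llama-3-70b", 10_000_000, NEW)
```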

PART 1: High level Discord summaries

Nous Research AI Discord

  • DisTrO's Distributed Optimization Breakthrough: Nous Research released a preliminary report on DisTrO, demonstrating a 1000x to 10,000x reduction in inter-GPU communication without amortized analysis and matching AdamW+All-Reduce in convergence speed. The full report is available on GitHub.
    • This advancement in distributed optimizers marks a significant progress in LLM training, with the team expressing excitement for upcoming code and algorithm releases.
  • Hermes 2.5 Beats Hermes 2 in Performance: After integrating code instruction examples, Hermes 2.5 demonstrated superior performance over Hermes 2, achieving a score of 52.3 on the MMLU benchmark compared to Hermes 2's 34.5.
    • This substantial improvement sets a new standard for LLM performance evaluations among engineers.
  • 1.5-Pints LLM Achieves Quick Training Success: The new 1.5-Pints model was pretrained in just 9 days, surpassing both Apple's OpenELM and Microsoft's Phi on MT-Bench, which emulates human judgments. This was done using a curated dataset of 57 billion tokens focusing on logical deduction.
    • Utilizing a modified Mistral tokenizer and the Llama-2 architecture, this model exemplifies efficient training methodologies in the LLM domain.
  • Sparse-Marlin Accelerates Matrix Multiplication: The introduction of Sparse-Marlin into vllm_project improves matrix multiplication speeds by achieving 5.3x speedups on NVIDIA GPUs using 4-bit quantized weights.
    • This GPU-optimized kernel is likely to enhance performance significantly for users working with large models.
  • Exploring Whisper Diarization Implementation: A user inquired about implementing Whisper diarization and shared a script utilizing Whisper v3, seeking a method to identify speaker changes.
    • Current efforts involve amalgamating diarization capabilities to streamline audio processing and improve output fidelity.


Unsloth AI (Daniel Han) Discord

  • Unsloth Accuses LinkedIn of Code Theft: Members of the Unsloth channel assert that LinkedIn has copied code from their project, particularly in their Triton kernel implementation. They cited LinkedIn's Liger-Kernel repository and a post on Ollama as evidence.
    • The claims point out that LinkedIn benchmarks its kernels against Unsloth's work, implying a lack of fair contribution back to the original project.
  • Performance Comparison: Unsloth vs. Hugging Face: Discussions highlighted that Unsloth outperforms platforms like Hugging Face in speed and memory efficiency, despite lacking support for 8-bit models. This places Unsloth in a competitive position, yet with notable limitations.
    • Members expressed that while Unsloth demonstrates impressive training and inference times, full model support remains essential for broader adoption.
  • Liger Kernel Speeds Up LLM Training: A member revealed that the new Liger Kernel could enhance LLM training speeds by 20% while cutting memory usage by 60%, as discussed in a Reddit post.
    • Utilizing Triton, this kernel shows promise for optimizing training times, attracting attention for its potential applications.
  • Challenges in Fine-Tuning Multilingual Models: Members shared insights on training models in languages like Arabic and Persian, stressing the importance of specialized datasets and pretraining. One suggestion included leveraging Persian Wikipedia for better model results.
    • Concerns were raised regarding proper support for these languages in Llama-3, indicating a gap that may hinder advancement in multilingual capabilities.
  • Replete-LLM V2 Arrives with Enhanced Features: Replete-LLM-V2-Llama-3.1-8b is launched, emphasizing improvements in reasoning and coding performance, trained on the Replete-AI/The_Living_AI_Dataset to embed concepts of Love and Empathy.
    • The effectiveness of this model heavily relies on its system prompts, crucial for optimizing its information processing capabilities.


Stability.ai (Stable Diffusion) Discord

  • Clarifying Stable Diffusion Online's Status: Members questioned whether Stable Diffusion Online is an official site or if it operates independently from Stability AI.
    • This inquiry reveals ongoing confusion within the community regarding the credibility and linkage of various platforms related to Stable Diffusion.
  • ComfyUI vs. ForgeUI - Choose Your Tool!: A suggestion arose that those not utilizing the full capabilities of ComfyUI should consider switching to ForgeUI for a streamlined experience.
    • This debate highlights the ongoing conversation about optimizing workflows for image diffusion setups.
  • Diving into SD Image Upscaling Approaches: Members discussed various techniques for image upscaling, including Ultimate SD Upscale and Tiled Diffusion, particularly noting the '4x-NomosWebPhoto-atd' model combined with SUPIR.
    • These discussions emphasize the community's efforts to enhance image quality through advanced methods.
  • Noise Injection: The Secret Sauce for Image Quality: A member elaborated on 'Noise Injection' in A1111/Forge, explaining its role in improving image upscaling efforts.
    • This technique garners attention as a potential enhancement tactic, leading to higher quality outputs.
  • Struggles with Flux - Overfitting Issues: Discussion focused on Flux's challenges with overfitting, particularly in fantasy-related outputs leading to less diversity in generated images.
    • This exploration raised concerns about how Flux needs adjustments to balance creativity with variability.


HuggingFace Discord

  • Hermes 2.5 outperforms Hermes 2: After adding code instruction examples, Hermes 2.5 appears to perform better than Hermes 2 in various benchmarks.
    • Hermes 2 scored a 34.5 on the MMLU benchmark whereas Hermes 2.5 scored 52.3.
  • Mistral struggles expanding beyond 8k: Members stated that Mistral cannot be extended beyond 8k without continued pretraining and this is a known issue.
    • They pointed to further work on mergekit and frankenMoE finetuning for the next frontiers in performance.
  • Discussion on Model Merging Tactics: A member suggested applying the difference between UltraChat and base Mistral to Mistral-Yarn as a potential merging tactic.
    • Others expressed skepticism, but this member remained optimistic, citing successful past attempts at what they termed 'cursed model merging'.
  • Model Quantization and Distillation Essentials: The importance of Model Quantization and Model Distillation for productionizing machine learning models was highlighted.
    • Members agreed these techniques are fundamental for effective deployment beyond local training.
  • TinyLlama's Quick Success: TinyLlama, a model similar to Tau LLM, was successfully trained in just 9 days and has outperformed both Apple’s OpenELM and Microsoft’s Phi on MTBench.
    • Training code and model weights were made publicly available on GitHub and HuggingFace.


OpenAI Discord

  • Model Scaling Hits Diminishing Returns: Discussions highlight diminishing returns on model scaling, especially with Llama 3.1 and Claude 3.5 Sonnet, where performance improvements lag behind increased compute power.
    • Participants stress the necessity of innovative breakthroughs to scale AI beyond mere data and computational increases.
  • Debating AI Consciousness: Philosophical discussions revolve around whether current LLMs like GPT can be considered conscious, considering they lack organic experience and may follow different laws than human consciousness.
    • Participants also examined implications on free will, suggesting AI systems exhibit decision-making based on internal logic rather than true volition.
  • Sharing GPTs Effectively: Members expressed interest in better tracking shared GPTs and their utility within the community, questioning how to assess their effectiveness.
    • The conversation included usability concerns regarding shared output features and possible improvements for tracking use-cases.
  • Create Custom GPTs with Brand Identity: A suggestion arose to leverage the custom GPT builder to craft GPTs that align with specific brand identities for content creation, using the GPT store for system prompts.
    • The emphasis was on enhancing brand consistency through custom prompts in API integrations.
  • Subscription Models for OpenAI API: Users explored how platforms manage subscription models for OpenAI's API, like monthly plans that utilize token-based pricing.
    • Chatbase was cited as an example under discussion, indicating a pressing need for clarity on implementation strategies.


Perplexity AI Discord

  • Perplexity Creator Community Launch: Perplexity AI partnered with Kale to launch the Perplexity Creator Community, allowing creators to earn cash for engaging video content.
    • This initiative encourages users to post on their own schedule while generating income based on their videos' reach.
  • API Rate Limits Cause Frustration: Maged Helmy from Newcode.ai urgently requested increased API rate limits for their integration after waiting six months without a response from the Perplexity team.
    • With over 3,500 users, Newcode.ai's operation depends on these enhanced limits to maintain performance.
  • GPT-4o Dominates Coding, Claude 3.5 Sonnet for Knowledge: Discussions highlighted GPT-4o as superior for STEM tasks while Claude 3.5 Sonnet excels in knowledge retrieval, particularly for coding-related queries.
    • Users noted Claude struggles with poetry and narratives, making GPT-4o a go-to option for a broader array of tasks.
  • Image Generation Troubles in Perplexity: Users reported significant challenges with image generation, particularly with DALL-E 3, where attempts led to thread failures.
    • Feedback indicated that the image generation process might need refinement, as some results did not meet user expectations.
  • Perplexity Pro's LinkedIn Subscription Offer: Perplexity AI is providing a free year of Perplexity Pro to LinkedIn Premium subscribers, though some users in the EU faced issues with availability.
    • The Pro version grants unlimited searches and access to advanced AI models like GPT-4 Omni and Claude 3.5 Sonnet.


OpenRouter (Alex Atallah) Discord

  • Grok-2 and Grok-mini hit Leaderboard!: xAI's Grok-2 and Grok-mini have surged onto the LMSYS Leaderboard with over 6000 community votes! Notably, Grok-2 ties with Gemini for #2, while Grok-mini excels in Math (#1) and ranks #2 across Hard Prompts, Coding, and Instruction-following.
    • Members cheered as Grok-2 bested GPT-4o (May), indicating a potential shift in the leaderboard dynamics and user preferences.
  • Database Outage Resolved: A recent database change led to an approximately two-minute outage, but the issue has since been resolved and service is back to normal.
    • The team apologized for the inconvenience caused, emphasizing the need for reliable uptime.
  • Mistral can't scale beyond 8k: Concerns arose around Mistral, as it reportedly cannot extend past 8k without continued pretraining, highlighted as a known issue.
    • Suggestions included exploring mergekit and frankenMoE finetuning techniques for improved performance.
  • Claude 3.5 Sonnet goes dark again: Users reported that Claude 3.5 Sonnet is facing intermittent outages, affecting its availability significantly.
    • While Haiku is functional, issues persist across other models like Hermes 3.5, hinting at broader system instabilities.
  • OpenRouter API Key query: Users are discussing how to integrate their own API keys with OpenRouter, and whether the displayed token pricing reflects total costs including the OpenRouter fee.
    • Clarifications indicated that token prices are listed in OpenRouter credits, and the applicable fees are calculated upon adding credits.
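For readers estimating spend: listed per-million-token prices apply separately to prompt and completion tokens, so a single request's cost in credits can be sketched as below. The prices used here are hypothetical, and any fee applied when purchasing credits is not modeled.

```python
def request_cost(prompt_tokens: int, completion_tokens: int,
                 prompt_price: float, completion_price: float) -> float:
    """Cost of one request in credits, given per-million-token prices
    (prompt and completion priced separately, as on a model's page)."""
    return (prompt_tokens * prompt_price
            + completion_tokens * completion_price) / 1_000_000

# hypothetical model: $3 / 1M prompt tokens, $15 / 1M completion tokens
c = request_cost(2_000, 500, 3.0, 15.0)  # 0.0135 credits
```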


Eleuther Discord

  • OMI Model Competence Discussion: Members discussed whether OMI participants are able to create AI models from scratch, but stopped short of sharing concrete assessments.
    • No solid conclusions were reached, leaving participants pondering the competencies at play.
  • LLM Repetition Failure Mode: A common failure mode in LLMs where they repeat phrases was discussed, possibly linked to model over-quantization and minimizing loss.
    • Participants hypothesized that certain conditions might be triggering this looping behavior, highlighting a need for further investigation.
  • Anthropic's Interpretability Cost Challenge: Questions arose about the cost of replicating Anthropic's interpretability work for models like Llama 8B or Mistral, which are data-hungry and compute-intensive.
    • Members noted the high costs without providing specific figures, emphasizing the importance of resource allocation in these projects.
  • Sparse MoE's GPU Utilization Benefits: A member raised how Sparse MoE utilizes GPU sparsity for efficient distributed training, allowing experts to be spread across multiple processes.
    • This strategy could enhance performance in distributed inference contexts, highlighting scalability approaches.
  • GNNs and Evolutionary Learning Approaches: One member compared the evolution of GNNs to positional embeddings, suggesting future advancements may involve inferring embeddings from latent representations.
    • This perspective hints at new pathways toward improving representation learning in graph structures.


Latent Space Discord

  • Hermes 2.5 outperforms Hermes 2: After adding code instruction examples, Hermes 2.5 shows superior performance over Hermes 2 in benchmarks, scoring 52.3 versus Hermes 2’s 34.5 on the MMLU.
    • This improvement highlights the effectiveness of recent optimizations in newer model iterations.
  • Mistral struggles with 8k limitations: Mistral cannot extend beyond an 8k context length without ongoing pretraining, recognized as a significant limitation in its current setup, and this is a known issue.
    • There's ongoing dialogue about exploring solutions like mergekit and frankenMoE finetuning to push these boundaries.
  • Unpacking BERTopic Utility: Discussion surfaced about BERTopic, a robust tool for topic modeling, with members sharing their project on visualizing data.
    • The conversation reaffirmed its end-to-end capabilities for generating interpretable topics, stimulating curiosity about its clustering efficacy.
  • Call for Collaboration on Open Empathic Project: A plea for broadening the categories for the Open Empathic project was made, emphasizing the need for contributions from the community.
    • Members were pointed to a YouTube tutorial for guidance on how to add their favorite scenes, alongside a link to the OpenEmpathic project.
  • AI Engineer Meetup Launch in London: Newly announced AI Engineer Meetup set for September 12th in London, inspired by the AI Engineer World's Fair with four notable speakers confirmed.
    • Interested attendees are encouraged to register here for what promises to be a highly engaging gathering.


tinygrad (George Hotz) Discord

  • Tinybox Sales Launch!: The Tinybox factory is now at full power, with sales set to open shortly to the public. Interested buyers can check out the tiny shop for purchase options.
    • The Tinybox is currently sold out with a production capacity of about 4 units per day, creating a backlog of 60 more units.
  • Concerns About E-graph Performance: Members expressed that e-graph rewrites lag behind current SAT solvers when tackling large search spaces, highlighting potential performance bottlenecks.
    • Continuous improvement is suggested to match the efficiency seen in established SAT solving techniques.
  • Exploring Tinygrad and AMD GPUs: Discussion emerged about using AMD GPUs with Tinybox, noting AMD's recent acquisition of Silo AI and their advancements in training LLMs on AMD hardware.
    • Community members weighed in, contemplating the feasibility and advantages of integrating AMD's capabilities effectively.
  • Tinygrad vs Torch in BERT Pre-Training: A user showed interest in collaborating with Tinygrad to pre-train a large BERT model, offering computing resources for the task.
    • This collaboration could pave the way for exploring the performance differences between Tinygrad and PyTorch for large model training.
  • Improving Training Speed: A user reported a 25% increase in training speed (GFLOPS) after tweaking preprocessing by removing the .cast(dtypes.default_float) call in the beautiful_cifar example.
    • With this adjustment, they noted that the model now processes data as dtype.float, enhancing efficiency.


Cohere Discord

  • Command-R Model Update Lacks Announcement: A new Command-R model has been released, but there’s been no official communication about its features, including pricing and context window.
    • Users are demanding clarity, as many are eager to learn about fine-tuning options and to have open questions addressed.
  • Durov's Bold Citizenship Move: Pavel Durov, the Telegram founder, recently secured French citizenship and is now facing a trial in France, stirring debate.
    • Some speculate he aims for strategic prison time to gain international media attention amid tensions with NATO.
  • Cohere Offers Free Trial for Chatbots: A user explored using Cohere's free trial for building a Rasa chatbot, hoping for a cost-free alternative to OpenAI’s services.
    • The response indicated interest in affordable options as users navigate the costs associated with AI deployments.
  • Cohere API Rate Limits Tightened: New reports show users hitting 'too many requests' errors even at the documented rate, as limits have shifted to 1,000 calls per minute across all API keys.
    • Cohere clarified this means a holistic 1,000/minute limit per user organization, impacting those using multiple keys concurrently.
  • Clarification on Rerank 3 Pricing: Users inquired about Rerank 3 pricing, specifically whether $2 for 1,000 searches covers true API calls.
    • Cohere confirmed that each search processes up to 100 documents, which works out to 409,600,000 tokens across 1,000 searches at the documented limits.
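The quoted token total can be sanity-checked with a little arithmetic. A minimal sketch, assuming each document counts as 4,096 tokens (an assumption chosen because it reproduces the quoted figure, not confirmed by Cohere):

```python
# Sanity-check the quoted Rerank 3 figures. The tokens-per-document
# value is an assumption that reproduces the quoted total.
SEARCHES = 1_000
DOCS_PER_SEARCH = 100       # "each search processes up to 100 documents"
TOKENS_PER_DOC = 4_096      # assumed per-document token cap

total_tokens = SEARCHES * DOCS_PER_SEARCH * TOKENS_PER_DOC
print(total_tokens)         # 409600000, matching the quoted figure

price_per_search = 2.00 / 1_000   # $2 per 1,000 searches
print(price_per_search)           # 0.002 dollars per search
```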


LlamaIndex Discord

  • Create Llama introduces extraction template: The Create Llama tool now features a structured extraction template, enhancing user experience.
    • This addition aims to streamline data extraction processes while maintaining accuracy and efficiency.
  • GraphRAG tutorial series kicks off: A new step-by-step tutorial series on building GraphRAG has begun, focusing on core component implementation.
    • The first video emphasizes how to extract entities and relationships with LLMs using an in-memory implementation.
  • Data silos hinder enterprise LLM development: Challenges persist with data silos in enterprise LLM development, underscoring the need for seamless authentication management.
    • LlamaIndex is investigating viable solutions to consolidate scattered knowledge across teams.
  • LLMs automate newsletter creation: The LlamaIndex newsletter has transitioned to using LLMs for automating content creation, previously a manual, time-intensive task.
    • This shift exemplifies the capability of LLMs in enhancing efficiency for regular content summarization.
  • RAG-a-thon hackathon on the horizon: The second RAG-a-thon hackathon, in partnership with Pinecone, is set for October 11-13 in Palo Alto, offering over $7k in cash prizes.
    • It will be hosted at the 500 Global VC offices, welcoming participants to showcase innovative solutions.


Torchtune Discord

  • Compiled Function Outputs Differ from Eager Mode: A member raised a question about why a compiled function might produce different outputs compared to its non-compiled version, with the same seed. This is attributed to differing RNG usage: Triton's RNG in compiled code versus PyTorch's in eager mode, potentially influenced by in-place operation behavior.
    • In-place operations, like scatter_, may yield unexpected results in compiled code, leading to higher memory consumption and varying output.
  • Cudagraphs Might Consume More Memory: The utilization of cudagraphs for debugging was discussed, indicating their potential to pre-allocate buffers. However, they can also lead to increased memory usage, which may not be desirable.
    • This signifies a trade-off in using cudagraphs, as the benefits need to be weighed against their memory overhead.
  • FP16 as a Memory-Saving Strategy: Switching to FP16 for inference instead of FP32 was suggested to lower memory usage, especially on hardware that doesn't support BF16. This altered approach reportedly alleviated out-of-memory issues.
    • Despite these improvements, discrepancies between compiled and non-compiled outputs remained a concern.
  • Exploring Numerical Differences in Compiled Kernels: The remaining output variance might arise from numerical differences inherent in the compiled kernels, even with optimized memory usage. This points to potential computational variations despite identical inputs.
    • Participants expressed concern over these numerical discrepancies, highlighting an area for further consideration in compiled code evaluation.
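The RNG mismatch described above can be illustrated without any GPU: two different generator algorithms seeded with the same value still produce different streams, which is exactly why a compiled path (Triton's RNG) and an eager path (PyTorch's RNG) can diverge under one seed. A stdlib-only sketch, where the LCG merely stands in for "a different algorithm":

```python
import random

def lcg_stream(seed, n):
    """Minimal linear congruential generator (glibc-style constants),
    standing in here for 'a different RNG algorithm' such as Triton's."""
    state = seed
    out = []
    for _ in range(n):
        state = (1103515245 * state + 12345) % 2**31
        out.append(state / 2**31)
    return out

seed = 42
eager_rng = random.Random(seed)                  # eager-mode stand-in
eager = [eager_rng.random() for _ in range(3)]   # Mersenne Twister stream
compiled = lcg_stream(seed, 3)                   # compiled-mode stand-in

# Same seed, different algorithm -> different numbers: "same seed"
# does not imply identical outputs across the two code paths.
assert eager != compiled
```

Each stream is individually reproducible from its seed; they just never match each other, which mirrors the compiled-vs-eager discrepancy reported in the channel.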


LangChain AI Discord

  • LangChain Document Loading: Image Extraction Simplified: The extract_images=True parameter in PyPDFLoader from LangChain's community package allows seamless image extraction from PDF documents, enriching text context for LLM processing.
    • This is particularly useful for applications requiring image analysis in conjunction with text data, expanding the functional capabilities of LangChain.
  • LLMChain vs LCEL: Flexibility vs Optimization: LLMChain provides a straightforward approach to chaining models and prompts, whereas LCEL offers greater customization and flexibility for complex tasks.
    • While LLMChain remains the simpler choice for straightforward scenarios, enthusiasts of modular design may prefer the finer control that LCEL introduces.
  • Troubleshooting PostgresSaver Errors: Users are encountering a TypeError related to tuple indexing while leveraging PostgresSaver with LangGraph, indicating potential issues in data type handling.
    • Further investigation is required to clarify tuple access methods and resolve this ongoing challenge experienced by developers.
  • GenAI's Growing Role in Data Science: A discussion highlighted the emerging role of Generative AI in the data science landscape, particularly in automating code generation and data pipeline setup.
    • Despite skepticism regarding its limits, participants acknowledged the critical integration between data science and GenAI advancements.
  • RAG Collaboration: Seeking Partners: A member shared their intent to develop a Retrieval-Augmented Generation (RAG) chatbot using LangChain, hoping to find collaborators for the project.
    • Challenges with scraping and RAG components were noted, underscoring the collaborative opportunities in this technical space.


OpenAccess AI Collective (axolotl) Discord

  • GPT-4 Fine-tuning vs Mistral: Mixed Reviews: A user claimed that fine-tuning GPT-4 is 'kind of shit' in comparison to Mistral, even though they utilized less data for training.
    • This sparked a discussion about the relative performance of both models in practical applications.
  • lm-eval-harness: Benchmarking Made Easy: Members discussed the lm-eval-harness framework, suggesting it simplifies the creation of benchmarks by offering easy task integration.
    • One user emphasized their research on generating benchmark questions, shared in their recent paper on MCQs for LLM evaluation.
  • LIGER Shows Impressive Training Efficiency: LIGER kernel promises 25% VRAM and 33% training time savings for LLM training, exciting users who are eager to test its capabilities.
    • However, there are doubts about its effectiveness for single GPU training, as noted by one user.
  • Curious About Phi-3-medium-128k-instruct Training Config: A user sought the training configuration for the Phi-3-medium-128k-instruct model, emphasizing the need for shared setups.
    • Another user questioned the token training in a specific config setup (modules_to_save) and referenced an external message for clarity.
  • Exploring Data Curation Techniques: A user probed into data curation, inquiring if it involves models providing ratings like the LLM-Judge system.
    • The conversation indicated an interest in methods employing model assessments for curating data, akin to existing systems.


Modular (Mojo 🔥) Discord

  • Mojo's Jitting Behavior Explained: When running mojo main.mojo in script mode, jitting occurs, which is why global variables behave differently than in the mojo build main.mojo compile mode.
    • This clarification helps users understand the complications of memory management when switching modes.
  • Community Ponders Development Pace: Concerns arose over a perceived slowdown in blog posts and updates for both Max and Mojo, possibly due to summer vacations or accumulating issues.
    • Members seek clarification on whether this impacts future releases and projects.
  • GPU Support Takes Center Stage: There's a strong push for GPU support in Mojo with expectations that future releases could address this, potentially moving Magic out of alpha.
    • Members are eagerly awaiting the next major release, aligning their community discussions with progress on these capabilities.
  • Modverse 42 Release Schedule Clarified: Members questioned the absence of a Modverse 42 release last week, learning that releases occur every 1-3 weeks, depending on project volume.
    • The ongoing weekly tag might be adjusted as content flow stabilizes.
  • Mojo's Struct Parameters and UnsafePointer Details: Issues arose with struct usage in Mojo causing errors due to variadic parameters not being parameterized correctly outside their defining structure.
    • A discussion on using UnsafePointer highlighted how ownership needs to be explicitly managed, underscoring the complexities of reference management in Mojo.


OpenInterpreter Discord

  • Custom Paths for OpenInterpreter Profiles?: A member queried about setting a custom path for OpenInterpreter profiles, but the developer stated this functionality isn't available yet, though it may come in the future.
    • This feature could enhance user flexibility once implemented.
  • OpenInterpreter --vision Flag Functionality on Windows: Inquiries about the --vision flag on Windows concluded that it should function correctly, with an encouragement to report any issues in a dedicated channel.
    • Further testing might yield vital insights into its compatibility across setups.
  • Prebuilt OpenInterpreter Demand Surges: The developer shared that preorders for prebuilt OpenInterpreter units are closed due to high demand, indicating strong interest.
    • Users will need to wait until sales resume, highlighting the technical community's engagement with the product.
  • Brand Guidelines Still Missing: A request for a brand guideline document surfaced, but members confirmed that no such document is available yet.
    • The inquiry tied into discussions around project accessibility and design considerations.
  • Zed AI: The Open Source Coding Companion: Zed AI offers a cool interface for coding with AI assistance, supporting models like Claude-3.5 and Ollama, enhanced by a new Anthropic API free for the first month.
    • It's gaining attention as a strong alternative to proprietary options like Cursor, fostering greater open-source development.


DSPy Discord

  • Apple's Superposition Prompting Project Launches: Members expressed excitement over Apple's new ML-Superposition Prompting project, now live on GitHub, aimed at refining prompting techniques in ML.
    • Currently, the community discussion centers around the initial reception of the project without further technical insights.
  • OpenAI Introduces Typed Outputs: Discussion sparked about OpenAI's new feature for typed outputs, focusing on validation for structured outputs in JSON format with references to projects like Outlines, Guardrails, and others.
    • Members linked to relevant GitHub repositories, indicating various libraries available for managing structured output formats.
  • Handling DSPy Output Errors: A member reported a ValueError in DSPy regarding 'Too many retries trying to get the correct output format' while using typed predictors, attributed to output filler text.
    • Another user provided insight and linked to an existing GitHub issue to clarify this common problem with JSON output parsing.
  • Exploring ColBERT Training for German: A user seeks guidance on structuring training data for a ColBERT model in German, proposing a 32-way triplets format like that of ColBERTv2.
    • Their suggested format for data structuring includes raw_query = [(query, (positive_passage, positive_score), [(negative_passage1, negative_score1), ...])], and they are looking for validation on its suitability.
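The proposed layout is easy to pin down as a concrete data structure. A minimal sketch with made-up German passages and scores (all values illustrative, not from any real dataset):

```python
# One training example in the proposed n-way triplet format:
# (query, (positive_passage, positive_score), [(neg_passage, neg_score), ...])
raw_query = (
    "Wie hoch ist die Zugspitze?",
    ("Die Zugspitze ist mit 2962 m Deutschlands hoechster Berg.", 0.95),
    [
        ("Der Rhein ist rund 1233 km lang.", 0.12),
        ("Berlin ist die Hauptstadt Deutschlands.", 0.05),
    ],
)

query, (positive, pos_score), negatives = raw_query

# ColBERTv2's 32-way setting would pair one positive with 31 scored
# negatives; the positive should outscore every negative.
assert len(negatives) <= 31
assert all(pos_score > s for _, s in negatives)
```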


Gorilla LLM (Berkeley Function Calling) Discord

  • Hugging Face Leaderboard syncs with Website: The Hugging Face Leaderboard now mirrors the website leaderboard due to a recent pull request, prompting a request for feedback from team members.
    • Anyone concerned about this change is encouraged to share suggestions.
  • BFCL V2-Live Dataset Accuracy in Focus: There’s an ongoing discussion about how to calculate the overall accuracy for the BFCL V2-Live dataset, noting it contains 2,251 question-function-answer pairs.
    • The dataset includes 258 simple, 7 multiple, 16 chained, and 14 multi-stage function calls, raising questions about accurate assessment methods.
  • Inquiries about Adding Models to BFCL: A new member expressed interest in adding a model to BFCL, asking about the process for non-open-source uploads and model evaluations with multiple components.
    • Details on maintaining model integrity while integrating with BFCL are sought.
  • Gorilla Leaderboard Explained: A query arose regarding the phrase "prepare the executable test pairs" in the Gorilla Leaderboard documentation.
    • The documentation clarifies that users are encouraged to contribute executable test pairs to the leaderboard, fostering collaborative improvement of evaluation methods.
  • Training LLMs for Function Calls: The Gorilla Leaderboard serves to train and evaluate LLMs for function calls using a standardized benchmark.
    • This framework allows for comparison across various models, enhancing performance evaluations.
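One straightforward reading of "overall accuracy" for a mixed dataset like BFCL V2-Live is a count-weighted average over categories. A minimal sketch using the category counts quoted above; the per-category accuracy values are hypothetical placeholders, not real benchmark results:

```python
# Category sizes from the discussion; the accuracies are hypothetical.
categories = {
    "simple":      (258, 0.90),
    "multiple":    (7,   0.80),
    "chained":     (16,  0.70),
    "multi_stage": (14,  0.60),
}

total = sum(n for n, _ in categories.values())
correct = sum(n * acc for n, acc in categories.values())
overall = correct / total

# Weighted average over the 295 listed examples (a subset of the
# 2,251 total pairs in the dataset).
print(f"{overall:.4f}")
```

Because "simple" dominates the counts, the weighted average sits close to the simple-call accuracy, which is one reason the channel was debating how best to aggregate.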


LAION Discord

  • Anthropic's Mechanistic Interpretability Costs: A user questioned the expenses related to running Anthropic's mechanistic interpretability for models like Llama 8b and Mistral, noting the absence of open-source alternatives.
    • They highlighted concerns regarding whether the limitations are due to being data-intensive or compute-heavy, alongside seeking clarity on other contributing factors.
  • Upcoming AI Engineer London Meetup: Mark your calendars for the AI Engineer London Meetup on September 12th, showcasing insights from figures like Maxime Labonne and Rovio Sc.
    • Details shared in a tweet by Damien C. Tanner reveal that this event aims to bring part of Swyx's AI Engineer World's Fair to the UK.


Interconnects (Nathan Lambert) Discord

  • Romain Huet Takes Over OpenAI DevRel: The new head of developer relations at OpenAI is Romain Huet, who confirmed his role on Twitter after joining in July 2023.
    • Huet's appointment comes after previous lead Logan's departure, suggesting a focused leadership transition in OpenAI's developer outreach.
  • Logan's Smooth Transition: Logan left OpenAI in July 2023, with confirmation from his successor, Romain Huet.
    • Huet noted that the transition was smooth, indicating established protocols for leadership changes within the organization.


Alignment Lab AI Discord

  • AI Engineer London Meetup Kicks Off: The inaugural AI Engineer London Meetup is set for the evening of 12 September, featuring four speakers: Maxime Labonne, Roviosc, Martins Bruveris, and Chris Bull. Registration details can be found here.
    • This event aims to be a segment of the AI Engineer World's Fair, hosted by Damien C. Tanner, highlighting vibrant discussions among AI engineers.
  • Highlight on AI Engineer World's Fair Influence: This London Meetup draws inspiration from the AI Engineer World's Fair, with the goal of creating a collaborative atmosphere for AI discussions. The event brings together an exciting lineup of speakers to share insights and experiences.
    • Hosted by Damien C. Tanner, the meetup serves as a communal space for AI enthusiasts to network and engage with cutting-edge topics in the field.


LLM Finetuning (Hamel + Dan) Discord

  • Hamel's Attendance Question: A user inquired if Hamel was available during a discussion on LLM Finetuning, indicating interest in his expertise.
    • This interaction highlights the community's anticipation for insights from known contributors in LLM optimization.
  • Hamel is not present: Unfortunately, Hamel was not available at the time of the inquiry, suggesting a missed opportunity for discussion.
    • Community members expressed hope that he would engage in future sessions to share his insights.


MLOps @Chipro Discord

  • CUDA Hackathon Hits San Francisco: Get ready for the CUDA Hackathon in San Francisco on September 21st, where you can hack alongside NVIDIA engineers and tackle real-world CUDA challenges.
    • This is a golden opportunity to engage with experts and work on innovative accelerated computing projects.
  • Deep Dive into Accelerated Computing: The event will explore accelerated computing, using NVIDIA's parallel computing platform to optimize GPU applications.
    • Participants will have hands-on access to NVIDIA resources and engineers for guidance in building and refining CUDA applications.


DiscoResearch Discord

  • Together AI Hits Users with Price Increase: Effective September 1, 2024, the pricing for Together API's Serverless Reference endpoints will rise for the Llama-3 8B and 70B models, with the 8B model increasing from $0.20 to $0.40 per million tokens.
    • The 70B model will see a jump from $0.90 to $1.80 per million tokens, reflecting a significant upward adjustment.
  • Turbo and Lite Pricing Stays Steady: While serverless endpoints are increasing, Together API's Turbo and Lite pricing remains intact, as confirmed on the Together Pricing Page, last updated July 18, 2024.
    • This keeps users from facing price hikes on these endpoints amid overall pricing changes.
  • OpenAI Drops Prices, Leaves Together AI Looking Weird: In contrast to Together AI's upcoming price increases, a member noted that OpenAI has recently dropped the price for GPT-4O-Mini, stirring discussions about pricing strategies.
    • This shift raises eyebrows about Together AI's decision to hike prices while competitors decrease theirs.
  • Funding Woes Spark Price Increases Speculation: Speculations arose that Together AI may be doubling their prices due to funding issues, as members discussed the sustainability of current pricing strategies.
    • They mentioned that pricing for 4-bit and 8-bit models should remain unchanged for now, but potential changes lurk in the future.
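The announced changes work out to a clean 2x on both serverless models. A quick sketch of what that means for a monthly bill (the token volume is illustrative):

```python
# $ per million tokens, before and after September 1, 2024.
OLD = {"llama3-8b": 0.20, "llama3-70b": 0.90}
NEW = {"llama3-8b": 0.40, "llama3-70b": 1.80}

for model in OLD:
    print(model, NEW[model] / OLD[model])  # both are a 2.0x increase

monthly_tokens = 50_000_000  # illustrative workload
for model in OLD:
    old_cost = monthly_tokens / 1e6 * OLD[model]
    new_cost = monthly_tokens / 1e6 * NEW[model]
    print(f"{model}: ${old_cost:.2f} -> ${new_cost:.2f}")
```

At that volume the 8B bill goes from $10 to $20 and the 70B bill from $45 to $90, which is why the comparison with OpenAI's concurrent price cut drew comment.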


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email!

If you enjoyed AInews, please share with a friend! Thanks in advance!
