AI News (MOVED TO news.smol.ai!)

Archives
August 31, 2024

[AINews] not much happened today

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜


3 day weekends are all you need.

AI News for 8/29/2024-8/30/2024. We checked 7 subreddits, 433 Twitters and 30 Discords (213 channels, and 3131 messages) for you. Estimated reading time saved (at 200wpm): 340 minutes. You can now tag @smol_ai for AINews discussions!

A smattering of things we considered:

  • Highlighting this week's Google Gemini and Cohere Command R model updates (blogpost, but no leaderboard updates yet)
  • LMSys responding to criticism by introducing style control leaderboards, though ChatGPT-4o-latest still destroys everyone else
  • Meta announcing 400m MAU, 185m WAU, and 40m DAU for its AI assistant.

But nothing seemed must-know.


The Table of Contents and Channel Summaries have been moved to the web version of this email!


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Model Developments and Benchmarks

  • LLaMA 3.1 Adoption: Meta announced significant adoption of LLaMA models, with nearly 350 million downloads on Hugging Face and widespread use across industries. @AIatMeta highlighted the importance of open source AI in extending benefits to everyone.
  • Long Context Models: Magic AI Labs introduced LTM-2-Mini, a model with a 100 million token context window. @magicailabs claimed this is equivalent to 10 million lines of code or 750 novels. They also introduced HashHop, a new evaluation method for long-context models.
  • Style Control in AI Evaluations: LMSys introduced style control in their regression model for Chatbot Arena, aiming to separate the impact of style from substance in rankings. @lmsysorg reported that models like Claude 3.5 Sonnet and Llama-3.1-405B rose significantly when style was controlled.
  • Qwen2-VL Release: Alibaba released Qwen2-VL, a new multimodal LLM available in 2B and 7B sizes under Apache 2.0 license. @_philschmid noted its competitive performance with GPT-4o mini on various benchmarks.

AI Safety and Regulation

  • US AI Safety Institute Testing: OpenAI CEO @sama announced an agreement with the US AI Safety Institute for pre-release testing of future models, emphasizing the importance of national-level testing.
  • Concerns About AI Takeover: @ajeya_cotra discussed the need for preventative measures against potential AI takeover, questioning how to build consensus and willingness to act before catastrophic harm occurs.

AI Applications and Tools

  • Web Crawling Tool: @rohanpaul_ai shared firecrawl, an open-source tool that crawls entire websites and converts them into LLM-ready markdown or structured data.
  • PDF Processing Challenges: @svpino highlighted the difficulties current AI models have with raw PDF documents and suggested preprocessing them into plain text for better results (a minimal sketch follows).
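
To illustrate the preprocessing point, here is a minimal sketch that flattens a PDF into plain text before it ever reaches a model; it assumes the pypdf package and an illustrative file path, neither of which comes from the original posts:

```python
from pypdf import PdfReader  # assumes the pypdf package is installed

def pdf_to_text(path: str) -> str:
    """Flatten a PDF into plain text so the LLM never sees raw PDF bytes."""
    reader = PdfReader(path)
    return "\n\n".join(page.extract_text() or "" for page in reader.pages)

# The extracted text can then be chunked, embedded, or pasted into a prompt.
text = pdf_to_text("report.pdf")
```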

AI Industry and Market Trends

  • AI Hype Cycles: @fchollet observed that peak AI hype in the tech community was in Q1-Q2 2023, while peak AI greed in public markets was in Q1-Q2 2024, noting that progress in AI research and applications continues regardless.
  • Call Center Industry Disruption: A viral Reddit post discussed the potential impact of AI on the call center industry, suggesting that AI agents could replace human workers within two years. @rohanpaul_ai shared this, noting the implications for customer service and employment.

AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Advancements in Long Context AI Inference

  • Local 1M Context Inference at 15 tokens/s and ~100% "Needle In a Haystack": InternLM2.5-1M on KTransformers, Using Only 24GB VRAM and 130GB DRAM. Windows/Pip/Multi-GPU Support and More. (Score: 114, Comments: 28): The KTransformers project has introduced local 1M-token context inference for InternLM2.5-1M, reaching 15 tokens/s using only 24GB VRAM and 130GB DRAM. The project implements an efficient sparse attention operator for CPUs, building on research like H2O, InfLLM, Quest, and SnapKV, yielding a 6x speed increase, a 92.88% success rate on the 1M-token "Needle In a Haystack" challenge, and 100% accuracy on the 128K test (a toy construction of this kind of test is sketched below).
    • The RULER benchmark suggests InternLM2.5 has an "effective" context length of only 4K tokens, after which it performs worse than Llama2-7b. The project developer noted they will test RULER later, emphasizing that their demo showcases the sparse attention operator's effectiveness.
    • Users expressed interest in adding support for Mistral Large 2 to the project's model list, which already includes Mixtral-8x22B. The project's progress has been described as "exciting to track" by some commenters.
    • Some users reported installation issues, with one encountering 404 errors from pip during the cmake process. This suggests potential technical challenges in setting up the project for some users.
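
For readers unfamiliar with the setup, a "Needle In a Haystack" test buries one retrievable fact at a random depth in long filler text and asks the model to recall it. A toy construction (the filler sentences and needle are illustrative; the actual KTransformers evaluation is more elaborate):

```python
import random

def make_haystack(needle: str, filler: list[str], approx_words: int) -> str:
    # Bury one retrievable fact at a random depth inside long filler text.
    words = " ".join(random.choices(filler, k=approx_words // 8)).split()
    pos = random.randrange(len(words) + 1)
    return " ".join(words[:pos] + [needle] + words[pos:])

prompt = make_haystack(
    needle="The secret passphrase is 'blue-falcon-7421'.",
    filler=["The harbor was quiet that evening.", "Rain fell on the old tin roof."],
    approx_words=200_000,
)
question = "What is the secret passphrase?"
# Accuracy is the fraction of (depth, length) settings answered correctly.
```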

Theme 2. California's SB 1047: Implications for AI Development

  • SB 1047 got passed. Do you think this will affect LLAMA? (Score: 52, Comments: 68): SB 1047, California's frontier AI safety bill, has been passed. The legislation imposes safety requirements on developers of large AI models, which could potentially impact LLAMA and other AI language models. While the specific effects on LLAMA are uncertain, the bill's passage may necessitate changes in how large models are trained, released, and used, particularly in commercial applications.
    • The bill's $100 million training cost threshold sparked debate about its impact on open-source AI. Some argue it won't affect local models, while others believe it could impact larger models like LLAMA 405B and its distillations.
    • Critics expressed concerns about the bill's potential to stifle innovation and favor large corporations. Some users called Governor Newsom's office to oppose SB 1047, citing worries about unnecessary regulations and increased costs for AI companies.
    • The legislation requires safety measures for large AI models, including shutdown capabilities, third-party audits, and whistleblower protections. Some view these as reasonable precautions, while others see them as potential threats to open-source development and free speech.
  • California assembly passed SB 1047 (Score: 165, Comments: 73): The California assembly passed SB 1047, a bill that could significantly impact open-source AI models. The legislation reportedly includes provisions requiring model authors to have the ability to shut down their models, potentially making it impractical for state-of-the-art AI models to be open source and potentially concentrating AI development among a limited number of corporations.
    • Meta may face significant challenges due to the bill, as they are headquartered in California. Users speculate the company might move to Seattle or spin off a subsidiary to circumvent the law, while others suggest they may simply stop releasing open-source models.
    • The bill's $100 million training cost threshold for covered models was reportedly determined arbitrarily by Eric Schmidt and colleagues, according to a YouTube video at 20:15. Some users argue this legislation could drive innovation out of California and benefit Chinese AI development.
    • Legal scholars suggest companies doing business in California would need to comply with the bill regardless of location, due to the state's economic importance. Some users view this as California handicapping the entire industry, while others see it as big tech corporations wanting regulation to limit competition.

Other AI Subreddit Recap

r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

AI Video Generation and Visual Effects

  • AI-generated monster movie clips: A video showcasing AI-generated sea monster scenes sparked discussion about the current state of AI video generation. While impressive, many commenters noted it still falls short of Hollywood quality, citing issues with physics, geometry, and human reactions.
  • AI movies on the horizon: A post about upcoming AI-generated movies received significant attention, indicating growing interest in AI's potential impact on the film industry.

AI Model Advancements

  • Magic's 100 million token context window: Magic has trained a model with a 100 million token context window, equivalent to 10 million lines of code or 750 novels, representing a significant advancement in model context capacity.

AI Safety and Regulation

  • Anthropic's agreement with US AI Safety Institute: Anthropic has reached an agreement with the US AI Safety Institute for pre-release testing of their future models, indicating a step towards more regulated AI development.

AI in Gaming and Interactive Environments

  • AI playing Minecraft: A video demonstrating an AI playing Minecraft like a human showcases advancements in AI's ability to interact in complex, open-world gaming environments.

AI Discord Recap

A summary of Summaries of Summaries by Claude 3.5 Sonnet

1. LLM Advancements and Benchmarking

  • Llama 3 Tops Leaderboards: Llama 3 from Meta has rapidly risen to the top of leaderboards like ChatbotArena, outperforming models like GPT-4-Turbo and Claude 3 Opus in over 50,000 matchups.
    • The community expressed excitement over Llama 3's performance, with discussions on its potential impact on the AI landscape and how it compares to proprietary models.
  • Grok 2 Impresses in Code Generation: Discussion highlighted performance comparisons between Grok 2, Gemini, and ChatGPT, with Grok 2 noted as particularly strong in code generation tasks.
    • Users speculated on upcoming models such as Grok 3, raising questions about potential performance edges backed by robust hardware setups.
  • Word Game Bench Challenges LLMs: The newly released Word Game Bench evaluates language models on word puzzle games like Wordle, with no model currently achieving over a 50% win rate.
    • The benchmark focuses on model interaction and reasoning, emphasizing the challenges LLMs face in dynamic, game-like environments (standard Wordle scoring is sketched below).
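
Part of why word games suit this kind of benchmark is that the environment's feedback rule is tiny and fully specified. A sketch of standard Wordle scoring (the game's public rule, not Word Game Bench's internal code):

```python
from collections import Counter

def wordle_feedback(guess: str, answer: str) -> list[str]:
    # Pass 1: exact-position matches are green; count unmatched answer letters.
    feedback = ["gray"] * len(guess)
    remaining = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            feedback[i] = "green"
        else:
            remaining[a] += 1
    # Pass 2: wrong-position letters are yellow, capped by remaining counts.
    for i, g in enumerate(guess):
        if feedback[i] == "gray" and remaining[g] > 0:
            feedback[i] = "yellow"
            remaining[g] -= 1
    return feedback

print(wordle_feedback("crane", "cacao"))
# ['green', 'gray', 'yellow', 'gray', 'gray']
```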

2. Open Source AI Developments

  • Re-LAION-5B Dataset Launch: The launch of Re-LAION-5B, a cleaned version of the LAION-5B dataset, was celebrated by the community for addressing previous safety concerns.
    • This updated dataset, created in partnership with key organizations, marks a significant milestone in ensuring safety and compliance in large-scale AI training data.
  • RunwayML Deletes Stable Diffusion Repos: RunwayML deleted all their Stable Diffusion 1.5 repositories on HuggingFace and GitHub, causing frustration among users and breaking functionalities in Diffusers 1.5.
    • The community speculated about potential legal issues behind the deletions, highlighting the impact of such actions on the open-source AI ecosystem.
  • GameNGen: Neural Game Engine Breakthrough: GameNGen, the first game engine powered entirely by a neural model, can simulate DOOM at over 20 frames per second on a single TPU, achieving a PSNR of 29.4.
    • This breakthrough demonstrates the potential of neural models in real-time game simulation, with human raters struggling to distinguish between real gameplay and simulations.

3. Model Optimization Techniques

  • Dynamic Expert Routing Enhances Adaptability: The concept of allowing models to define their own experts during training, instead of using a fixed configuration, was discussed as a way to improve adaptability.
    • This idea is linked to ongoing research like the methods proposed in the LayerSkip paper, aiming to enhance model performance and efficiency.
  • Quantization Techniques for Large Models: Discussions highlighted quantization techniques like AQLM and QuaRot aimed at running large language models (LLMs) on individual GPUs while maintaining performance.
    • Members shared implementation details and benchmarks, such as running Llama-3-70b on RTX3090, showcasing the potential of these optimization methods.
  • Finite Scalar Quantization (FSQ) as VQ-VAE Alternative: Finite scalar quantization (FSQ) was discussed as a potentially effective and simpler alternative to the vector quantization used in VQ-VAEs.
    • The FSQ method promises improved performance across various tasks, as noted in a linked paper, with implications for token utilization in language models; a rough sketch of the mechanism follows.
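
As a rough illustration of the FSQ idea (a sketch of the published method, not the paper's reference code): each latent dimension is squashed to a bounded range, scaled to a small fixed number of levels, and rounded, with a straight-through estimator so gradients still flow.

```python
import torch

def fsq(z: torch.Tensor, levels=(8, 5, 5, 5)) -> torch.Tensor:
    # z: (..., len(levels)). Each dim is bounded, scaled to its level count,
    # and rounded to the nearest integer grid point.
    half = (torch.tensor(levels, dtype=z.dtype, device=z.device) - 1) / 2
    bounded = torch.tanh(z) * half
    rounded = torch.round(bounded)
    # Straight-through estimator: forward uses rounded values, backward
    # treats the rounding as identity.
    return bounded + (rounded - bounded).detach()

codes = fsq(torch.randn(4, 4))  # implicit codebook of 8*5*5*5 = 1000 entries
```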

4. AI Deployment and Infrastructure

  • Tinygrad Launches Affordable Cloud Service: Tinygrad announced a new cloud service offering a 4090 GPU and 500 GB of storage for just $60/month, making it 3x cheaper than competitors like Vast AI.
    • The service introduces a 'CLOUD=1' feature, allowing users to run Tinygrad locally while leveraging cloud speed for performance enhancements with 10-step processing.
  • OpenRouter Stealth Launch Goes Live: OpenRouter successfully launched, serving Llama 3.1-405B-instruct with 128k context and function calling support at a competitive price of $2.5/mil tokens.
    • The team emphasized building reliable infrastructure over referral-based compensation, highlighting their focus on service quality and accessibility.
  • Cohere's Command R Series Update: Cohere announced refreshed Command R and R+ models with improvements in performance for reasoning, coding, and multilingual RAG, now available under new aliases.
    • The updated models feature lower per-token pricing, with Command R notably cheaper at $0.15 per million input tokens, showcasing advancements in both performance and cost-efficiency.

PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord

  • Debate on Fine-Tuning vs RAG: Discussion revealed that while RAG might reduce hallucinations, controlled overfitting is crucial in fine-tuning processes. The effectiveness largely hinges on dataset size and hyperparameters like rank and alpha.
    • Participants emphasized that neither method clearly outranks the other and both strategies must be tailored based on specific project requirements.
  • Diverse Use Cases for LLMs: LLMs are currently employed across various industries, with companies like AT&T using them for customer support and others for proprietary research applications. Instruction-based models akin to GPT dominate the deployment landscape.
    • The versatility shown in these applications indicates a strong trend towards integrating LLMs into practical daily operations.
  • OpenRouter Launch Hits the Ground Running: OpenRouter went live with Llama 3.1-405B-instruct, featuring 128k context and function calling at an inviting price of $2.5/mil tokens.
    • Clarifications highlighted that the developer's compensation is unaffected by referral link usage, focusing instead on building reliable infrastructure.
  • Upcoming Models and New Pricing Trends: Speculation around Meta's soon-to-be-announced Llama models has generated buzz, though specifics about Llama 4 remain unclear. Concurrently, OpenAI revealed reduced pricing for its GPT-4o model, now $4 per 1M tokens.
    • The adjustments give developers a pathway to optimize costs while accessing newer features such as structured outputs that conform strictly to JSON Schemas (a request sketch follows this list).
  • Community Collaboration on Finetuning Goals: A community member expressed eagerness to finetune an LLM without a specific objective, just for the fun of it. This openness highlights the exploratory spirit within the community.
    • Such a mindset may inspire other developers to experiment and innovate outside of fixed project frameworks.
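
On the structured-outputs note above: a minimal sketch of how strict JSON-Schema output is requested via the OpenAI API at the time of writing (the schema and prompt are illustrative; check the current API docs for exact parameters):

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Extract: Ada Lovelace, born 1815."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,  # strict mode requires all fields + no extras
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "born": {"type": "integer"},
                },
                "required": ["name", "born"],
                "additionalProperties": False,
            },
        },
    },
)
print(resp.choices[0].message.content)  # guaranteed to match the schema
```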


aider (Paul Gauthier) Discord

  • Gemini Model Generates Mixed Reactions: The new Gemini model is causing a stir with claims of enhanced performance, but users maintain a cautious stance regarding its effectiveness compared to existing models like Sonnet.
    • Skepticism focuses on the model's practical utility in Aider scenarios, leading to user experiences being shared for validation.
  • Sonnet Keeps Delivering: Recent benchmarks confirm that Sonnet remains consistent in performance, countering previous speculations of decline.
    • Users express continued interest in the model's capabilities and reliability based on its stable benchmark scores.
  • Investment Talks Heat Up for Aider: Community buzz surrounds potential investments in Aider, especially the need for a refined GUI to broaden its usability.
    • Suggestions include enhancing the leaderboard feature with user-generated data to better reflect performance metrics.
  • Long Context Models Gaining Traction: Discussions around models that can manage 100 million tokens could significantly impact coding workflows, with tools like Magic dev mentioned as game-changers.
    • User curiosity about the practical applications of these models in AI-assisted development continues to grow.
  • Swift Support Lacking in Aider: The current lack of Swift support in Aider, due to the tree-sitter package's limitations, is causing frustration among developers.
    • Users acknowledge that adding backend support for Swift may require additional custom development efforts.


OpenAI Discord

  • Personalization of LLMs Gains Traction: Members expressed strong interest in personalization of language models, advocating for customizable personalities and long-term memory to enhance user interactions.
    • Concerns over high implementation costs and maintenance complexities emerged, with ideas like RAG (Retrieval-Augmented Generation) considered as potential solutions.
  • Crafting Chatbots with OpenAI API: The community discussed leveraging the OpenAI API for custom chatbot development, addressing the requirement for programming skills and suited use cases.
    • While suggestions for no-code solutions like Zendesk emerged, limitations in automation and integration with systems like Jira were acknowledged.
  • Grok 2 Stands Out in Performance Testing: Discussion highlighted performance comparisons between Grok 2, Gemini, and ChatGPT, marking Grok 2 as notably strong in code generation tasks.
    • Speculation on upcoming models such as Grok 3 stirred excitement, raising questions about their potential performance edge backed by robust hardware.
  • AGI Development Fuels Global Concerns: Participants voiced apprehension regarding which nation might first achieve AGI and the ensuing power shift implications.
    • Emphasis was placed on the necessity for the US to maintain technological superiority to mitigate risks to global stability.
  • Challenges in CV Matching Scores: A user reported difficulties in scoring CVs against job descriptions via API prompts, noting a perplexing score of 65 for an unrelated commercial director position.
    • Adjusting scoring parameters showed no improvement, with significant misalignment issues persisting across different engineering roles.


HuggingFace Discord

  • Inference Endpoints Are Down: Members reported issues with inference endpoints likely due to a bug related to payment methods, creating urgency for fixes as production websites rely on them.
    • A pull request was opened, and the team indicated that the problem is being addressed.
  • Discussion on Training Models and Performance: Users explored the nuances of training dialogue data with various models, discussing the effectiveness of incorporating system prompts vs learning from context.
    • Concerns arose regarding VRAM limitations for local models, leading to suggestions of using Colab for more robust resources.
  • Human Feedback crucial for Model Evaluation: A paper emphasized that human feedback is essential for training Large Language Models, albeit influenced by biases.
    • The researchers highlighted that while preference scores assist evaluation, they often don't represent crucial aspects like factuality (View PDF).
  • Efficient Layer Pruning in LLMs: A study reviewed a layer-pruning strategy for LLMs finding minimal performance degradation until up to half the layers were removed.
    • This technique involves parameter-efficient finetuning (PEFT) and quantization to recover model performance post-pruning.
  • FLUX LoRA Training Simplified: A guide titled FLUX LoRA Training Simplified instructs users on utilizing Kohya SS GUI for training with an 8GB GPU.
    • The tutorial enables novices to start their training journey smoothly.


CUDA MODE Discord

  • Flash Attention Faces Memory Challenges: Users are struggling with shared memory sizes in their flash attention kernel, particularly a demand reaching 131,072 bytes for the Q tile, raising concerns about efficiency on non-Hopper GPUs (the arithmetic is sketched after this list).
    • When testing on an NVIDIA GeForce RTX 3090, users encountered an OutOfMemoryError with the Hugging Face example, indicating memory-management challenges in the current package version.
  • LayerNorm Kernel Updates Enhance Performance: The integration of LayerNorm custom kernels was confirmed with the merge of PR #169 in the Liger Kernel repository, tested for correctness on RTX 3090.
    • Further discussions centered on dynamic dispatch for atomic operations to optimize performance in multi-GPU setups.
  • Returning to FP8 for Development: A member is reverting to FP8 code development to solidify their understanding and push forward on the ongoing project, feeling good about their earlier progress.
    • This suggests a focus on enhancing performance and compatibility in the current environment where further optimization is anticipated.
  • L2 Side Aware Optimization Sees Speed Boost: The L2 Side Aware code achieved a consistent speed of 1823GB/s for GELU forward, marking a 2% increase over earlier performance with x128 configurations.
    • Despite this improvement, discussions indicated a need for further simplifications to sustain optimization and reduce power consumption.
  • Community Questions Quantization Techniques: In discussions of quantizing attention layers, members raised concerns over accuracy in QKV projections, suggesting a need for refining strategies to maintain latency in system performance.
    • Notably, issues were identified with the AWQ performance degrading when using floating point integers, prompting inquiries into optimal implementation for high performance.
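
On the shared-memory point above, the arithmetic is easy to sanity-check: a 128 x 256 fp32 tile (or a 256 x 256 fp16 tile) is exactly 131,072 bytes, which already exceeds the roughly 99 KB a single block can opt into on an RTX 3090 (compute capability 8.6). The tile shapes here are assumptions for illustration:

```python
def tile_smem_bytes(rows: int, cols: int, dtype_bytes: int) -> int:
    """Shared-memory footprint of one on-chip tile."""
    return rows * cols * dtype_bytes

q_tile = tile_smem_bytes(128, 256, 4)  # 131,072 bytes, matching the reported demand
ampere_optin_limit = 99 * 1024         # ~99 KB max per block on CC 8.6 (RTX 3090)
print(q_tile, ampere_optin_limit, q_tile <= ampere_optin_limit)
# 131072 101376 False -> the tile must be split or stored in a smaller dtype
```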


Stability.ai (Stable Diffusion) Discord

  • IP Adapter for Flux sparks mixed reactions: Members discussed the recent introduction of an IP adapter for Flux, noting mixed performance results among users.
    • Despite varying opinions on its effectiveness, many are still excited about this addition to their toolkit.
  • Training Models with Limited VRAM presents challenges: Experiences were shared about training on an RTX 3060 with limited VRAM, revealing that higher resolutions (like 1024) consume huge amounts of memory.
    • It was suggested that lowering the resolution can help, since 12GB of VRAM may not be enough for complex tasks.
  • Segmentation in Image Processing raises questions: Discussion emphasized the concept of SEG (Segmentation) in image processing workflows, particularly its role in systems like ComfyUI.
    • Members voiced confusion over its implementation, questioning its necessity compared to simpler alternatives.
  • RunwayML SD 1.5 repos vanish from platforms: RunwayML has deleted all Stable Diffusion 1.5 repos on HuggingFace and GitHub, stirring conversation on the implications of this move.
    • Users speculated if this marks a departure from 1.5 models, which seem to have dropped in utilization.
  • SDXL vs SD 1.5 creates debate: One user considered transitioning from SD 1.5 to SDXL, balancing concerns over generation times and storage needs for their GPU.
    • Advice focused on optimizing performance using command line arguments to suit weaker GPU capabilities.


Nous Research AI Discord

  • Amnesia Mode Reveals Professionalism in Hermes 3: Users reported that the 'amnesia mode' in Hermes 3 favors professionalism over casual language, limiting its conversational flexibility.
    • One user showed frustration, stating that the model maintains a 'family-friendly' demeanor, prompting speculations about its predefined behavior.
  • Training Techniques Yield Better AI Output: Discussions highlighted that training models on outputs alone leads to better benchmarks compared to incorporating user inputs during instruction tuning.
    • Members agreed that this specific training method enhances coherence and reduces unwanted 'AI-y' responses.
  • Gradient Strategies Could Cut Communication Costs: A user proposed leveraging low-rank approximations for gradient synchronization in distributed training to minimize communication overhead (sketched after this list).
    • This sparked discussions on effectively combining various optimization techniques to enhance model training performance.
  • Introducing the Word Game Bench for AI Assessment: The new 'Word Game Bench' benchmark captures language model performance via word puzzle games like Wordle, allowing unique interaction based on previous actions.
    • Community members displayed curiosity about its engaging methodology and potential for evaluating model behavior.
  • GameNGen Transforms Game Development Landscape: GameNGen, the first neural model game engine, enables real-time DOOM simulations without conventional tools, achieving over 20 fps.
    • Human raters struggled to differentiate between simulated and actual footage, showcasing its advanced realism potential.
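
Returning to the low-rank gradient idea above, a minimal sketch in the spirit of PowerSGD-style compression (the rank and shapes are illustrative): factor each gradient matrix before synchronizing, so workers exchange roughly rank*(m+n) values instead of m*n.

```python
import torch

def compress(grad: torch.Tensor, rank: int = 8):
    # Truncated SVD keeps the top-`rank` components of an (m, n) gradient.
    U, S, Vh = torch.linalg.svd(grad, full_matrices=False)
    return U[:, :rank] * S[:rank], Vh[:rank, :]  # (m, r) and (r, n) factors

def decompress(P: torch.Tensor, Q: torch.Tensor) -> torch.Tensor:
    return P @ Q

g = torch.randn(2048, 2048)
approx = decompress(*compress(g))
# Communication drops from 2048*2048 to 8*(2048+2048) values per matrix.
```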


LM Studio Discord

  • API Inference Speed Cap Discussion: A user raised questions about capping inference speed on the API; another member noted that multiple requests using different models is viable.
    • The user prefers sticking to the same model for VRAM conservation but recognizes the limitations.
  • User Feedback on LM Studio Version 0.3: Concerns emerged regarding the latest LM Studio update leading to reduced AI responsiveness and unusual repeated output.
    • Members suggested this might be tied to prompt settings or template parsing, advising tweaks for improvement.
  • M2 Ultra Mac ready for development: One member set up their M2 Ultra Mac with 192 GB Unified Memory for exploring LLMs, with a 2 TB drive for storage.
    • They are also using a separate PC as a server to augment their development environment.
  • Exploring LLM performance on RTX 4090s: Discussions highlighted running the 405b model on 6 RTX 4090s, yielding around 1 token per second, influenced by offload settings.
    • One member experimented with various GPU configurations, finding memory linking can enhance speeds when models are well-distributed.
  • Impact of PCIe lane settings on performance: Members discussed running RTX 4090s on gen4 x8 vs. x16 settings, examining potential impacts on speed for multi-GPU environments.
    • While gen4 x8 might not matter for single GPUs, it could hinder performance in setups with denser models.


OpenRouter (Alex Atallah) Discord

  • Gemini Flash models now free!: The Gemini Flash 8B (EXP) model is now available for use at this link, with the Gemini Flash Experiment also confirmed free until pricing is finalized for AI Studio.
    • Users celebrated the availability of Gemini Experimental models, marking a significant step towards broader access.
  • Cheers to Daun.ai's launch!: Community members expressed excitement over the Daun.ai launch, marking it as a noteworthy addition to AI tools.
    • The enthusiasm reflects an increasing demand for innovative AI solutions in the developer community.
  • Cohere model updates spark interest: Recent updates to Cohere's Command R models introduced new features and pricing changes, igniting a buzz among users eager to explore the enhancements.
    • Concerns about the handling of safety modes in OpenRouter were raised, highlighting the community's attention to secure implementations.
  • Experimental models hit rate limits: Users reported rate limit errors while trying out experimental models, indicating challenges in accessing new features during peak use.
    • Consequential discussions arose on managing safety settings through the API, pointing to a need for clearer documentation.
  • Concerns over infrastructure stability: A spate of recent downtime issues attributed to database capacity has prompted concerns in the community, with ongoing upgrades proposed as a solution.
    • Developers acknowledged the ongoing effects of these outages, ensuring plans are in place to enhance stability moving forward.


Eleuther Discord

  • Embedding Weights Hit NaN Early: A user reported that embedding weights became NaN just a few steps into training, likely due to a loss function denominator rounding to zero, exacerbated by a data-dependent decay term.
    • Members tracked gradients to better understand the complexity of this situation, providing insights into loss function optimization.
  • Seeking Insights on Compression Techniques: Jeremy Vonderfecht is requesting feedback on his research involving compressing images with diffusion models like Stable Diffusion, recognizing the need for collaboration.
    • Members suggested using specific channels for ongoing discussions to foster constructive dialogue.
  • Dynamic Expert Routing Boosts Adaptability: The discussion highlighted the potential of dynamic expert routing, allowing models to define their own experts during training for enhanced adaptability (a sketch of conventional fixed routing, for contrast, follows this list).
    • This is linked to ongoing research such as the methods in the LayerSkip paper.
  • Launching Word Game Bench to Challenge Models: Word Game Bench is a new benchmark for evaluating language models on word games like Wordle, with no model surpassing a 50% win rate; it focuses on dynamic interactions.
    • More information can be found at Word Game Bench and a tweet announcement.
  • Addressing Tokenization Challenges: Participants discussed the significant limitations of tokenization, especially for non-Latin languages, and its influence on model training efficiency.
    • Concerns were raised about how tokenization can obscure crucial data features, making optimization slower.
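
For contrast with the dynamic-routing proposal above, a conventional mixture-of-experts router with a fixed expert set looks like the sketch below; the discussed idea would let the expert definitions themselves evolve during training, which this sketch does not implement:

```python
import torch
import torch.nn.functional as F

class TopKRouter(torch.nn.Module):
    # A standard learned router: each token picks its top-k experts by score.
    def __init__(self, dim: int, n_experts: int, k: int = 2):
        super().__init__()
        self.gate = torch.nn.Linear(dim, n_experts, bias=False)
        self.k = k

    def forward(self, x: torch.Tensor):
        scores = self.gate(x)                       # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # each token's top-k experts
        return F.softmax(weights, dim=-1), idx      # mixing weights + expert ids

router = TopKRouter(dim=64, n_experts=8)
w, idx = router(torch.randn(10, 64))
```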


Perplexity AI Discord

  • Discord server celebrates 100K members!: The Discord server has officially reached 100K members, marking a significant community milestone, with heartfelt thanks to all members for their support.
    • The team expressed excitement for continued growth, underscoring the contributions from every member that enrich the group's atmosphere.
  • Pro API credits missing for users: Users reported not receiving their $5 PPLX API credits after purchasing Pro, leading to calls for urgent support to resolve the issues.
    • Members are sharing account details for quicker resolution, emphasizing concern over the usage and accessibility of API credits.
  • Concerns over Pro Searches functionality: There was uncertainty regarding the functionality of Pro Searches through the API, especially for users running llama-3.1-sonar-huge-128k-online.
    • The absence of Pro in the API left users questioning when this feature would become available.
  • Users experience API Rate Limit errors: Several users reported a 429 Client Error: Too Many Requests when accessing the API, pointing to potential usage caps (a generic backoff sketch follows this list).
    • This situation signals underlying issues that may affect overall API functionality for engineers relying on consistent performance.
  • Feedback on AI Model behavior and performance: Users scrutinized their AI models, noticing inconsistent outputs even after switching models, which indicated possible bugs impacting user experience.
    • Queries on model behavior sparked discussions around recent updates, suggesting a need for clarity on outputs and model identification.
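
A common client-side mitigation for 429 errors like those above is exponential backoff with jitter; a generic sketch (the endpoint and payload are placeholders, not Perplexity-specific):

```python
import random
import time

import requests

def post_with_backoff(url, payload, headers, max_retries=5):
    # Retry on 429 with exponentially growing, jittered sleeps.
    for attempt in range(max_retries):
        resp = requests.post(url, json=payload, headers=headers)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        time.sleep(min(2 ** attempt + random.random(), 30.0))
    raise RuntimeError("still rate-limited after retries")
```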


Cohere Discord

  • MMLU's Lack of Practical Correlation: Members noted that MMLU does not correlate well with practical utility in building LLMs, highlighting outdated examples like Freud's theories, and remarked on recent model refreshes improving data relevance from the internet.
    • This sparked a discussion on the future of benchmark metrics in evaluating LLM applicability in real-world scenarios.
  • Command R+ Impresses with Updates: Cohere announced significant performance improvements for the refreshed Command R and R+ models, featuring better multilingual RAG and cost-efficient pricing of $0.15 per million input tokens for Command R.
    • Members confirmed the updates are available on Hugging Face and noted the need for quantization before deployment on other platforms.
  • Cohere Chat Interface Remains Unchanged: Users raised concerns about the Cohere chat interface, questioning if updates align with new model features, notably the absence of a dark mode option.
    • The call for enhancements in user interface options indicates a growing desire for improved user experience in model interactions.
  • API Trial Key Limitations Cause Frustration: A user faced a rate limit error (429) using a trial API key, lamenting the 1,000 API calls/month limit, with peers confirming the necessity for a production key.
    • The discussion emphasized the importance of optimizing API usage for enhanced performance and broader experimentation.
  • Launch of Maya LLaVA-Pretrain Dataset: The newly available Maya LLaVA-Pretrain dataset contains 4,404,776 entries across 8 languages, developed for pretraining large models, and expanded via machine translation.
    • Members expressed appreciation for addressing queries around batch processing and API capabilities related to this dataset.


Latent Space Discord

  • Codeium bags $150M in Series C: Codeium successfully raised $150 million led by General Catalyst, now valued at $1.25 billion with total funding reaching $243 million since its inception. Co-founder Varun Mohan mentioned they still have not tapped into their $65 million Series B funds.
    • This strategic reserve may signal a cautious approach as they navigate market demands.
  • Meta AI Assistant hits 400M MAU: Meta's AI Assistant soared to 400 million Monthly Active Users (MAU) and 40 million Daily Active Users (DAU), showcasing its expanding user base and engagement. Discussion highlighted the potential necessity for licensing as user numbers continue to rise.
    • Such metrics reflect a significant adoption rate, spurring discussions about future scaling needs.
  • Google DeepMind rolls out customizable Gems: Google DeepMind introduced customizable Gems, specialized iterations of their Gemini model tailored for specific domains like Learning Coach and Coding Partner. The initiative aims to enhance user experience through targeted functionality.
    • Feedback focused on the effectiveness of these Gems and their usability in real-world scenarios.
  • Tome pivots to focus on enterprise AI: Tome announced a shift toward becoming an AI assistant designed to help users penetrate new enterprise accounts, marking a significant change in its business focus. The news was confirmed by a company representative outlining the strategic journey.
    • Members expressed interest in how this pivot might redefine Tome's market positioning and goals.
  • New Podcast with Nicholas Carlini: The latest episode of the Latent Space podcast showcases insights from Nicholas Carlini of Google DeepMind on LLM benchmarks and extraction methodologies of training data. Key highlights involved critical perspectives on the cessation of OpenAI logprobs.
    • Carlini’s reflections prompted community dialogue about benchmarking practices in AI.


Modular (Mojo 🔥) Discord

  • Mojo's Potential in Blockchain Protocols: Discussions are ongoing about using Mojo for blockchain protocols, with developers noting its current immaturity compared to Go, Rust, and C++.
    • One developer remarked that Mojo and Go are the most competent languages, but Go's 20% performance loss could be crucial for some projects.
  • Questions on Mojo's Open Source Future: Inquiries arose about the availability of the Mojo compiler's source code, which remains closed source for now.
    • The Modular team indicated they may not know when or if it will be open-sourced while balancing development speed with community engagement.
  • Performance Comparison Insights: Members debated the performance of Go versus C, highlighting Go's limitations in various tasks.
    • Darkmatter pointed out that Go's performance may significantly drop, citing 30 requests per second capacity compared to C's 100.
  • Architect's Role in Memory Management: A member argued that if a programmer is unsure about memory management, it signifies a flaw in the system's design.
    • They emphasized the need for solid architectural design to minimize concerns for application programmers.
  • Exciting Export Ideas for Fastai: A proposed enhancement involves overriding Learner.export in fastai to export Mojo code along with the PyTorch model.
    • This tactic could improve integration between the input pipeline and the model for streamlined production use.


LangChain AI Discord

  • LangChain Embraces Function Calling & Streaming: A member struggled with using LangChain v0.2 for function calling and streaming, citing documentation gaps. Another clarified that function calling is supported, but streaming outputs need careful setup in JavaScript.
    • Exploring resources like the AgentExecutor documentation might help clarify configurations.
  • Docker Tales: Ollama Connection Woes: One user hit a connection-refused error from their LangChain app in Docker while calling the Ollama API, and resolved it by pointing the base URL directly at the Ollama host.
    • The issue highlights the importance of proper URL settings in containerized environments (a sketch of the fix appears after this list).
  • Custom GPT for HR Sparks Ideas: A user expressed a desire to create a specialized GPT for their HR team, targeting hallucination reduction and feedback mechanisms. The discussion turned toward enhancing LLM interactions with fine-tuning and RAG techniques.
    • Implementing feedback loops could significantly improve performance, especially when adapting existing manual content.
  • Challenges with LangChain Streaming Outputs: A user reported that LangChain agent executors buffer outputs until the final response instead of streaming in real time. Suggestions emerged to use the streamRunnable option for real-time delivery.
    • Leveraging this feature could cut response latency, enhancing user experience in real-time applications.
  • GraphRAG vs Traditional RAG: A Preference Battle: Discussion emerged around the effectiveness of hybrid RAG methods, with a member favoring traditional RAG techniques for their process. They pointed out that exploring new methods like self-query and large context RAG might prove worthwhile.
    • This conversation potentially opens the door to more advanced exploration in RAG methodologies for response enhancement.
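
On the Docker connection issue above: inside a container, localhost resolves to the container itself, so the Ollama base URL must point at the host instead. A sketch assuming the langchain-ollama integration (package and parameter names per its docs; verify against your installed version):

```python
from langchain_ollama import ChatOllama

# host.docker.internal reaches the Docker host from inside a container
# (on Linux, add `--add-host=host.docker.internal:host-gateway` to docker run).
llm = ChatOllama(
    model="llama3.1",
    base_url="http://host.docker.internal:11434",  # not http://localhost:11434
)
print(llm.invoke("Say hello in one word.").content)
```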


LlamaIndex Discord

  • GymNation partners with LlamaIndex for success: GymNation partnered with LlamaIndex, resulting in a 20% increase in digital lead to sales conversion and an 87% conversation rate with digital leads. For more details, check their full success story.
    • Remarkable outcomes showcase how LlamaIndex enhances user engagement effectively.
  • LLMs in Production insights shared: An upcoming discussion on September 9th will feature insights on deploying LLMs effectively. Details are available here.
    • Attendees can expect practical tips on real-world LLM applications.
  • MLFlow Podcast features LlamaIndex: Co-founder discussed the MLFlow integration with LlamaIndex on their podcast, focusing on streamlined logging and evaluating applications. Watch the demo and insights here.
    • Powerful enhancements in managing AI applications were showcased during the session.
  • LLM x Law Hackathon announced: An LLM x Law Hackathon on September 8th invites participants to explore AI in legal practices. More information can be found here.
    • This event will feature multiple tracks, emphasizing innovation in AI-legal integrations.
  • Financial Data Analysis with MoW: Innovative financial data analysis employing Mixture of Workflows (MoW) and Corrective RAG was discussed, utilizing models like Phi-3, Qwen-2, and others. Further details can be found here.
    • This method provides context-aware analysis of financial statements, promising better insights.


OpenInterpreter Discord

  • House Party Next Week: Join us for a House Party next week at an earlier time to boost community engagement! Join the Discord Event.
    • This event aims to create a fun atmosphere and encourage discussions about potential applications.
  • Seeking Terminal App Suggestions: A member is looking for alternatives to the Konsole terminal app on KDE due to screen bleeding issues. Users reported similar problems while using GPT-4o-mini in standard terminal setups.
    • This highlights ongoing concerns about terminal performance in high-demand environments.
  • Obsidian OI Plugin Installation Help Needed: A user praised resources on the Obsidian OI plugin but is struggling with global installation issues. They were advised to share their installation details in a designated channel for further support.
    • This reflects a collaborative effort within the community to resolve technical challenges.
  • GameNGen: A Leap in Game Simulation: GameNGen now simulates DOOM at over 20 frames per second using a neural model, showcasing exceptional performance on a single TPU, with a PSNR of 29.4.
    • The experience left human raters hard-pressed to tell apart real gameplay from its simulations, marking a significant advancement in game technology.
  • Excitement for AgentOps Developments: Members are buzzing with enthusiasm for upcoming initiatives from Adam and the AgentOps team. This excitement underlines the community's interest in next-gen agent tech breakthroughs.
    • This anticipation signals a healthy dialogue about the future prospects in smart agent systems.


LAION Discord

  • Google's GPU Acquisition Sparks Curiosity: Members questioned why Google is purchasing GPUs from NVIDIA despite their own TPUs, suggesting a potential gap or interest in NVIDIA technologies.
    • Is the TPU not enough? One member mused about Google's strategic choices in hardware.
  • RunwayML Deletes All Stable Diffusion Repos: Discussion erupted over RunwayML deleting all their Stable Diffusion 1.5 repositories on HuggingFace and GitHub, leaving many users frustrated.
    • One member noted that this action broke many functionalities in Diffusers 1.5, particularly impacting single file loading.
  • Disruption from Repo Deletions: Members expressed annoyance about the seemingly thoughtless nature of RunwayML's deletions, with one stating it felt like they wanted to cause disruption.
    • Speculation arose around potential legal issues, but no specific reasons were confirmed for the deletions.
  • Generating Realistic Images for Book Covers: A member sought advice on generating comic book-style or cartoonish images for their novel covers, struggling with overly realistic outputs from DALL·E.
    • Despite attempts, they found DALL·E not catering to the specific style they desired.
  • Launch of Re-LAION-5B: Members celebrated the launch of Re-LAION-5B, a cleaned version of the LAION-5B dataset, which addresses previous concerns following a safety revision procedure.
    • The dataset was updated in partnership with key organizations to ensure safety and compliance, marking a significant milestone.


Interconnects (Nathan Lambert) Discord

  • Tech Giants Eye OpenAI: Nvidia, Apple, and Microsoft are in discussions to invest in OpenAI as part of a funding round that would value the company above $100 billion source. This move indicates strong interest in driving AI funding and innovation from major players.
    • Chatbot wars are heating up as these companies jockey for pivotal stakes in AI development.
  • Chatbot Wars Heat Up: ChatGPT has surpassed 200 million weekly users, posing a challenge for rivals like Meta AI, which is also increasing its market traction source. This competitive landscape raises questions about user engagement and effectiveness of different platforms.
    • Concerns exist regarding the real utilization of Meta AI, as only 40 million DAUs could suggest accidental engagement with its offerings.
  • Tinygrad Launches Affordable Cloud Solution: Tinygrad introduced a new cloud service featuring a 4090 GPU and 500 GB of storage for only $60/month, significantly undercutting competitors like Vast AI source. This new model promises a cost-effective solution for developers looking to leverage advanced hardware.
    • Coming soon: CLOUD=1 enables users to operate Tinygrad locally while taking advantage of cloud processing speed for efficient handling.
  • Inquiry on System Prompts Impact: Members are probing into the impact of system prompts on evaluation scores, sparking interest in whether different prompting techniques can significantly adjust results. There’s a call for research papers to support this exploration.
    • This inquiry highlights the ongoing desire to refine AI performance metrics through thoughtful prompt design.


Torchtune Discord

  • QLoRA Faces Memory Puzzles: Concerns arose as a member questioned whether memory was sufficient for QLoRA after hitting a CUDA illegal-memory-access error on four 48GB GPU cards.
    • This highlights potential pitfalls in hardware setup that need careful consideration when configuring memory resources.
  • A6000 GPUs Get Confused: Clarifications confirmed that A6000 GPUs carry 48GB of VRAM each, so four such cards should meet the required capacity.
    • Members suggested CPU offloading and sequence-length adjustments could additionally affect memory distribution during training.
  • Training Sequence Lengths Under Scrutiny: A member experimented with different training sequence lengths (8K and 4K), indicating how these variations may affect vRAM usage.
    • Probing into these specifics showcases the essential balancing act between sequence configuration and memory demands.
  • Interest in Multi-GPU Evaluation: Inquiries about the existence of multi-GPU evaluation support in TorchTune suggest a keen interest in optimizing performance.
    • This reflects a broader trend where AI engineers seek scalability and efficiency in handling demanding training setups.
  • Debugging CUDA Errors for Data Integrity: A member received debugging tips such as setting CUDA_LAUNCH_BLOCKING=1 to localize illegal memory access errors during training (a minimal sketch follows this list).
    • This points to the ongoing complexities of executing distributed training with PyTorch while managing memory constraints effectively.
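
For context on the CUDA_LAUNCH_BLOCKING tip above: the variable must be set before CUDA initializes, which is why it is usually exported in the shell or set at the very top of the training script. A minimal sketch:

```python
import os

# Force synchronous kernel launches so an illegal-access error is raised
# at the offending call site instead of at a later, unrelated sync point.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # must come after the env var is set

x = torch.randn(8, device="cuda")
print(x.sum())
```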


DSPy Discord

  • Confusion Over Repo Connections: A member expressed confusion about the connection between their statement and the GitHub repository, clarifying that the repo was separate but showcased to inspire community involvement.
    • The repo is getting over 2k likes each day, indicating significant interest in the LinkedIn Auto Jobs Applier tool.
  • Concerns on LinkedIn Tool Performance: Another member raised concerns regarding the performance of the LinkedIn Auto Jobs Applier, pointing to GitHub issues that reveal room for improvement.
    • This highlights ongoing feedback suggesting there’s still much to enhance in the tool's capabilities.
  • Workshop for Reliable AI Agents: A member shared a link to the YouTube video for a workshop focusing on Useful and Reliable AI Agents, which tackles accuracy, reliability, and cost-effectiveness.
    • The workshop addresses the active research on AI agents and their effective utilization in real-world applications.
  • AgentOps Tools for AI Development: AgentOps offers resources for building agents, featuring tools that streamline the development process by eliminating guesswork in prompting.
    • This transparency aims to enhance how developers approach AI solutions.
  • DSPy Seminar at Bay Area AI Meetup: The upcoming Bay Area AI meetup will feature Michael Ryan discussing DSPy: Prompt Optimization for LM Programs, showcasing his work on the MIPROv2 algorithm.
    • The meetup is sponsored by Neo4j and promises to deliver valuable insights.


OpenAccess AI Collective (axolotl) Discord

  • Axolotl GitHub Docs Needs Dark Mode: A member requested the Axolotl GitHub documentation to offer a dark mode, citing discomfort with the current light mode during frequent visits.
    • They emphasized challenges with checking configuration parameters in the current theme.
  • Hardware for Training LLaMA 70B: Discussion revolved around the hardware requirements for training the LLaMA 70B model, with speculations that only a few NVIDIA A6000 GPUs might be needed.
    • A member confirmed that 3x A6000 GPUs should be sufficient for training the full model, highlighting potential advancements in GPU capabilities.
  • Llama 3.1 Still Struggles with Special Tokens: Concerns were raised about Llama 3.1 base still experiencing issues with uninitialized special tokens and out-of-distribution embeddings.
    • Members expressed ongoing challenges with managing special tokens, which could impact model performance.
  • Potential Fix for Untrained Tokens: A new option, fix_untrained_tokens: true, was introduced to address uninitialized special tokens in Llama 3.1, signaling a step towards improvement (a config fragment is sketched after this list).
    • This fix reflects a continued effort to refine model interactions and performance.
  • New Assistant Prefill Feature Launch: The recent Pull Request #33198 at Hugging Face adds a long-requested assistant prefill feature that automatically initiates model responses.
    • This update aims to enhance user experience in the TextGenerationPipeline, employing a creative approach to response generation.
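
For reference, the flag above is a YAML config option; a hypothetical minimal fragment showing where it would sit in an axolotl config (only fix_untrained_tokens comes from the discussion, the rest is illustrative):

```yaml
base_model: meta-llama/Meta-Llama-3.1-8B
# Initialize any special tokens whose embeddings were never trained,
# avoiding out-of-distribution embeddings at fine-tuning time.
fix_untrained_tokens: true
```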


Gorilla LLM (Berkeley Function Calling) Discord

  • Groq Waits for Leaderboard PRs: Groq has not yet been added to the leaderboard as the team is still waiting for their PRs, expected around next week.
    • This delay sparked discussions about their integration and anticipated performance implications.
  • Model Steps Documentation is Essential: A member confirmed that documenting model steps is crucial for reproducibility, enhancing model understandability.
    • Proper documentation ensures usability and minimizes confusion during model implementation.
  • Java Test Case Reveals GIS Issues: A user reported performance issues in a Java test case related to GIS geometry initialization.
    • They concluded that simpler direct examples might serve better than complex function calls, given user queries.
  • Queries on Evaluation Temperature Settings: Members questioned if evaluations are conducted with a greedy decode and temperature of 0 for fair metrics.
    • Discussions referenced recent GitHub links on leaderboard evaluation criteria, contemplating randomness in output.
  • OSSHandler Default Parameters Discussed: The default temperature for OSSHandler is set at 0.001, and adjustments were briefly considered but ultimately rejected.
    • This choice aligns with maintaining consistent function outputs and overall model performance optimization.


tinygrad (George Hotz) Discord

  • Exploring tinygrad's limitations: codeman3786 asked whether tinygrad, while effective for statically scheduled operations, struggles with semi-structured sparsity. George Hotz invited specific examples of tinygrad's shortcomings, highlighting community curiosity about its operational limits.
    • The ensuing discussion signals a shared interest in dissecting the real-world applicability of tinygrad, especially in the context of complex data handling.
  • Tensor.cat's trouble with sharded tensors: A user hit an error that padding is not supported when using Tensor.cat with sharded tensors. They devised a workaround using unsqueeze (sketched below), but additional reshaping errors kept cropping up.
    • This indicates a need for clarity on whether the limitation stems from core functionality or is merely unsupported behavior, as the user considers adapting the code for batch-dimension support.
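
A sketch of the unsqueeze workaround described above, shown on ordinary (non-sharded) tensors for clarity; with sharded tensors the user reported further reshape errors, so treat this as the intended shape manipulation rather than a verified fix:

```python
from tinygrad import Tensor

a, b = Tensor.rand(4, 8), Tensor.rand(4, 8)  # two (B, D) activations

# Instead of a.cat(b, dim=0), which reportedly hits "padding not supported"
# on sharded tensors, add a leading axis, concatenate there, then flatten it.
stacked = a.unsqueeze(0).cat(b.unsqueeze(0), dim=0)  # (2, B, D)
merged = stacked.reshape(8, 8)                       # (2B, D)
print(merged.shape)
```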


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Finetuning (Hamel + Dan) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email!

If you enjoyed AInews, please share with a friend! Thanks in advance!

Don't miss what's next. Subscribe to AI News (MOVED TO news.smol.ai!):
Powered by Buttondown, the easiest way to start and grow your newsletter.