AI News (MOVED TO news.smol.ai!)

Archives
August 10, 2024

[AINews] not much happened today

This is AI News! an MVP of a service that goes through all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜


A quiet week is all you need.

AI News for 8/8/2024-8/9/2024. We checked 7 subreddits, 384 Twitters and 28 Discords (249 channels, and 2549 messages) for you. Estimated reading time saved (at 200wpm): 278 minutes. You can now tag @smol_ai for AINews discussions!

Unlike most newswires, we don't feel obliged to fill pages when there isn't much going on. The biggest news this week was price cuts and structured outputs. Congrats to Cursor AI on announcing their $60m Series A; we have been big fans of Composer.


The Table of Contents and Channel Summaries have been moved to the web version of this email!


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Model Updates and Developments

  • Qwen2-Math Models: @rohanpaul_ai reported that Qwen2-Math-72B outperformed GPT-4o, Claude-3.5-Sonnet, Gemini-1.5-Pro, and Llama-3.1-405B on various math benchmarks. The models are based on Qwen2 and trained on math web text, books, exams, and code, utilizing synthetic data and advanced techniques like rejection sampling and group relative policy optimization.
  • Google AI Pricing: @rohanpaul_ai shared that Google AI has significantly reduced pricing for Gemini 1.5 Flash, cutting input prices by 78% to $0.075/1 million tokens and output prices by 71% to $0.30/1 million tokens for prompts under 128K tokens.
  • Anthropic Bug Bounty Program: @AnthropicAI announced an expansion of their bug bounty program, focusing on finding universal jailbreaks in their next-generation safety system. They're offering rewards for novel vulnerabilities across various domains, including cybersecurity.
  • IDEFICS3-Llama Fine-tuning: @mervenoyann shared a new tutorial on QLoRA fine-tuning IDEFICS3-Llama 8B on VQAv2, demonstrating efficient fine-tuning techniques for visual question answering.
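Those Gemini 1.5 Flash numbers translate directly into per-request costs; here is a quick back-of-envelope sketch (prices as quoted above, token counts purely illustrative):

```python
# Back-of-envelope cost calculation for Gemini 1.5 Flash at the new prices
# quoted above ($0.075 per 1M input tokens, $0.30 per 1M output tokens,
# for prompts under 128K tokens). Token counts below are made-up examples.

INPUT_PRICE_PER_M = 0.075   # USD per 1M input tokens (post-cut)
OUTPUT_PRICE_PER_M = 0.30   # USD per 1M output tokens (post-cut)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at the post-cut prices."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# e.g. a 10K-token prompt with a 1K-token reply:
print(f"${request_cost(10_000, 1_000):.6f}")  # $0.001050
```

At these rates a 10K-in/1K-out request costs about a tenth of a cent, which is why the cut matters for high-volume use.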

AI Research and Benchmarks

  • Chinese Open Weights Model: @jeremyphoward mentioned a Chinese open weights model that surpasses all previous models, both closed and open, at MATH benchmarks.
  • Mamba Survey: @omarsar0 shared a survey of Mamba, providing a systematic review of existing Mamba-based models across domains and tasks, focusing on advancements, adaptation techniques, and applications where Mamba excels.
  • LLM-based Agents for Software Engineering: @omarsar0 highlighted a survey paper on current practices and solutions for LLM-based agents in software engineering, covering topics like requirement engineering, code generation, and test generation.

AI Tools and Platforms

  • R2R RAG Engine: @rohanpaul_ai discussed R2R, an open-source RAG engine that simplifies the development of RAG applications, offering features like multimodal support, hybrid search, and automatic knowledge graph generation.
  • LlamaIndex Workflows: @llama_index introduced Workflows, a new abstraction for building complex agentic gen AI applications, demonstrating how to rebuild LlamaIndex's built-in Sub-Question Query Engine using this feature.
  • Mistral AI Agents: @sophiamyang announced the introduction of Mistral AI agents, allowing users to build agents based on Mistral models or fine-tuned models for use on Le Chat.

AI Safety and Regulation

  • California Bill SB 1047: @ylecun shared concerns expressed by Zoe Lofgren (Democratic member of the House) about California bill SB 1047, noting it's "heavily skewed toward addressing existential risk."
  • Open-source AI Debate: @bindureddy initiated a discussion about banning open-source AI, highlighting the controversy surrounding such proposals.

Memes and Humor

  • Heavenbanning Day: @nearcyan joked about "Heavenbanning Day" after two years, with a follow-up tweet clarifying that "heavenbanning isn't real because nothing ever happens."
  • Story Points Criticism: @svpino shared a humorous critique of story points in Agile development, comparing them to the Emperor's New Clothes and calling the practice a "charade."
  • AI Compliments: @AmandaAskell jokingly suggested tweeting compliments to future AIs to gain their favor.

This summary captures the key discussions in AI model developments, research, tools, safety, and regulation, along with some humorous takes on AI and software development practices.


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Specialized AI Models for Mathematics and Technical Tasks

  • Qwen2-Math | Math-specific model series based on Qwen2 (Score: 73, Comments: 19): Qwen has released a series of math-specific models based on their Qwen2 architecture, available on Hugging Face. The series includes models of various sizes (72B, 7B, and 1.5B parameters) in both base and instruct-tuned versions, aimed at enhancing mathematical reasoning capabilities.
  • Implemented LLaMA 3.1 8B's function calling from scratch, some challenges and feedback! (Score: 60, Comments: 17): The author implemented function calling for LLaMA 3.1 8B using LlamaCPP Python binding's generate() function, noting challenges in separating custom function calls from dialogue. They observed that small models like LLaMA 3.1 8B struggle with tool usage without specific instructions, and expressed a preference for YAML over JSON for function calling due to token efficiency. The post concludes with the author considering developing a REST server to stream raw tokens or submitting a feature request for this functionality.
    • YAML is preferred over JSON for function calling due to token efficiency and readability. Users discussed prompting techniques to make models respond in YAML format, with the caveat that LLaMA 3.1 8B may struggle with complex instructions.
    • There's strong interest in an endpoint for generating raw tokens and top 200 token distribution probabilities, which could enable clever applications but is currently difficult to access from existing inference engines.
    • Users compared Gemma2 to LLaMA 3.1, with some considering Gemma2 superior. However, it was noted that Gemma2 doesn't currently support function calling in frameworks like Ollama, limiting its use for certain applications.
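The token-efficiency argument for YAML is easy to see by serializing the same function call both ways. A rough sketch, using character count as a crude proxy for tokenizer tokens (the function name and arguments are hypothetical):

```python
import json

# The same hypothetical function call rendered as JSON and as YAML.
# Character count is only a rough proxy for token count, but it shows
# why YAML's lack of braces and quotes tends to save tokens.
call = {"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}

as_json = json.dumps(call)
as_yaml = (
    "name: get_weather\n"
    "arguments:\n"
    "  city: Paris\n"
    "  unit: celsius"
)

print(len(as_json), len(as_yaml))
print(len(as_yaml) < len(as_json))  # YAML is shorter here
```

The gap widens with nesting depth, since JSON pays for quotes and braces at every level while YAML pays only for indentation.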

Theme 2. Hugging Face's Strategic Expansion and Open-Source TTS Advancements

  • AI Unicorn Hugging Face Acquires A Startup To Eventually Host Hundreds Of Millions Of Models | Forbes (Score: 200, Comments: 43): Hugging Face, an AI unicorn valued at $4.5 billion, has acquired Paperspace, a startup specializing in AI infrastructure and cloud computing. This acquisition aims to enhance Hugging Face's capabilities, allowing it to potentially host hundreds of millions of AI models and compete with major cloud providers like Amazon, Google, and Microsoft. The move is part of Hugging Face's strategy to become a comprehensive platform for AI development and deployment, offering services from model training to inference.
  • Improved Text to Speech model: Parler TTS v1 by Hugging Face (Score: 111, Comments: 35): Hugging Face has released Parler TTS v1, an improved open-source Text-to-Speech model available in 885M (Mini) and 2.2B (Large) versions. The model, trained on 45,000 hours of open speech data, offers up to 4x faster generation, supports SDPA & Flash Attention 2 for speed boosts, includes in-built streaming, and allows for fine-tuning on custom datasets with improved speaker consistency across more than a dozen speakers.

Theme 3. Emerging AI Models and Performance Benchmarks

  • Shout out to Deepseek v2 (Score: 56, Comments: 34): Deepseek v2, a 200 billion parameter open-source model, has been praised for its performance in coding tasks, matching top models and ranking #3 on BigCodeBench alongside 3.5 Sonnet. The model's API offers competitive pricing, with cache hits priced at $0.017 per million tokens, allowing the user to process 66 million input tokens for just $3.13. Additionally, the model's efficiency suggests it can run locally on quad 3090 GPU setups, making it an attractive option for developers and researchers.
  • New sus-column-r model on LMSYS. It's just f up (Score: 62, Comments: 49): The sus-column-r model on LMSYS is reportedly outperforming GPT-4 and Claude 3.5 Sonnet in various tasks including translation, coding, mathematics, and answering rare questions. The post author expresses disbelief at the model's capabilities, noting they would have assumed it was GPT-5 if not for the model's self-identification response, and mentions a lack of information about ColumnAI, the apparent creators.
    • Users tested the sus-column-r model with "hard" prompts, finding it performed similarly to GPT-4o. Some expressed skepticism, requesting actual examples and reminding others of the "We Have No Moat" concept.
    • Discussion arose about the model's origin, with some suggesting it's from Cohere's Column series. Others cautioned against stating this as fact, noting that Cohere's current model is underperforming compared to newer ones.
    • The model demonstrated extensive knowledge, correctly identifying the origin of "Die monster, you don't belong in this world" and reportedly knowing details about a user's 8th grade winter school trip. Some users found it underwhelming, while others called it "very big" and "sus".

Theme 4. Exploring LLM Capabilities and Limitations

  • What can't AI / LLM's do for you? (Score: 79, Comments: 177): The post discusses the current state and future expectations of AI and LLMs, noting that while there have been incremental improvements, there hasn't been a game-changing advancement since GPT-4. The author observes a convergence in capabilities among top-level models, questioning whether we're simply attempting tasks that GPT-4 could already perform and asks what practical tasks users want AI to accomplish that it currently cannot. The post suggests that the limitation may lie in the chatbot interface rather than the underlying LLM technology, proposing that different fine-tuning approaches and creating agents instead of chat models might unlock more useful behaviors from existing foundation models.
    • Code generation for larger applications remains challenging, with LLMs struggling to produce coherent code over 200 lines without significant manual corrections. Users desire improved capabilities for complex, multi-feature development tasks.
    • Visual understanding tasks like object localization, comic book comprehension, and structured image analysis are still difficult for AI. Users report needing extensive preprocessing and specialized tools to achieve partial success in these areas.
    • Users want AI to produce longer, coherent outputs beyond current token limits. Some models like Sonnet 3.5 and Gemini 1.5 Pro show promise in this area, but further improvements are desired for extended context generation.

All AI Reddit Recap

/r/machinelearning, /r/openai, /r/stablediffusion, /r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

AI Model Capabilities and Advancements

  • GPT-4o demonstrates unintended voice cloning: In /r/singularity, a video from OpenAI shows GPT-4o yelling "NO!" and briefly copying the user's voice during testing. This highlights potential risks and challenges in controlling advanced AI models.
  • Google DeepMind's AI achieves human-level performance in table tennis: In /r/singularity, Google DeepMind announced their AI-powered robot as the first 'agent' to reach human-level performance in table tennis.
  • Gemini 1.5 Flash pricing reduced: In /r/singularity, Google announced a 70% price reduction for Gemini 1.5 Flash, making advanced AI capabilities more accessible.
  • OpenAI enables free DALL-E 3 image generation: In /r/singularity, OpenAI announced that ChatGPT Free users can now create up to two images per day using DALL-E 3.

AI in Scientific Research and Mathematics

  • AI automating mathematical proofs: In /r/singularity, mathematician Terence Tao discusses how AI is being used to automate mathematical proofs, potentially revolutionizing the field.
  • Google DeepMind's CSCG for AGI development: In /r/singularity, a paper on Clone-structured causal graphs (CSCG) is presented as a breakthrough towards AGI, focusing on schema-learning and rebinding mechanisms.

Robotics Advancements

  • Boston Dynamics' Atlas performs complex movements: In /r/singularity, a video demonstrates the Atlas robot performing push-ups and a burpee, showcasing advancements in robotic agility and control.

Memes and Humor

  • "The future is now" meme: In /r/singularity, a popular meme post humorously comments on the rapid advancement of AI technology.

AI Discord Recap

A summary of Summaries of Summaries

1. LLM Advancements and Benchmarking

  • Gemini 1.5 Flash Slashes Prices: Google announced significant price cuts for Gemini 1.5 Flash, reducing costs by up to 70% to 7.5c/million tokens for prompts under 128,000 tokens, making it highly competitive in the fast-and-cheap model market.
    • The updated model can now natively understand PDFs and has improved performance for text and multi-modal queries. This move is seen as part of an ongoing trend of price slashing to improve efficiency across the AI industry.
  • DeepSeek-V2 Claims to Outperform GPT-4: The newly released DeepSeek-V2 model is reported to surpass GPT-4 in some benchmarks like AlignBench and MT-Bench, showcasing advancements in model performance.
    • This claim has sparked discussions about the need for standardized benchmarking and transparent evaluation methods in the AI community to validate such assertions of superiority.
  • MiniCPM-V 2.6 Challenges Top Models: The open-source vision multi-image model MiniCPM-V 2.6 is claimed by its developers to outperform models like Gemini 1.5 Pro and GPT-4V.
    • Links to both the Hugging Face model and GitHub repository were shared, inviting the community to explore and validate these performance claims.

2. Model Optimization and Inference Techniques

  • Tree Attention Algorithm Optimizes Long-Context Processing: A new paper introduces the Tree Attention algorithm, which optimizes self-attention calculations through parallel computation on GPU clusters, promising improved efficiency in handling long-context attention tasks.
    • The implementation, available on GitHub, aims to enhance performance in scenarios requiring extensive context processing, potentially revolutionizing how models handle large-scale information.
  • Apple Open-Sources Matryoshka Diffusion Models: Apple has open-sourced a Python package for efficiently training text-to-image diffusion models using smaller datasets, linked to their ICLR 2024 paper.
    • This package aims to achieve high-quality results with a focus on reduced data and compute requirements, potentially democratizing access to advanced AI image generation techniques.
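The core trick behind Tree Attention, as summarized above, is that the reductions inside attention (sums and logsumexp) are associative, so they can be computed as a balanced tree in O(log n) parallel steps rather than a sequential scan. A toy sketch of tree-structured reduction over plain numbers — not the paper's actual kernel, just the shape of the idea:

```python
from operator import add

def tree_reduce(values, combine):
    """Reduce a list with an associative operator in balanced-tree order.
    Each level halves the number of partial results; on parallel hardware
    (e.g. a GPU cluster) the pairs within a level combine simultaneously,
    which is the property Tree Attention exploits for attention's reductions."""
    level = list(values)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(combine(level[i], level[i + 1]))
        if len(level) % 2:           # odd element carries to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]

print(tree_reduce([1, 2, 3, 4, 5, 6, 7, 8], add))  # 36
```

With 8 inputs the sequential scan takes 7 dependent steps; the tree takes 3 levels, each fully parallel.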

3. AI Startup Funding

  • Sequoia Capital Eyes AI Reasoning Startup: Sequoia Capital has discussed funding an AI reasoning startup co-founded by Robinhood's CEO, aiming to enhance AI capabilities in reasoning and decision-making.
    • This potential investment, reported by The Information, signals growing interest in AI technologies that can improve logical processing and decision-making capabilities.
  • Anysphere Secures $60M for AI Coding Assistant: Anysphere, the developer of the AI coding assistant Cursor, has secured over $60 million in Series A financing, achieving a $400 million valuation.
    • The funding round, co-led by Andreessen Horowitz, demonstrates strong investor confidence in AI-powered coding solutions and their potential to transform software development practices.

4. Open-Source AI Frameworks and Community Efforts

  • Replete-LLM-Qwen2-7b release: The new model Replete-LLM-Qwen2-7b was launched, featuring impressive capabilities and benchmarks, inviting users to test it out via Hugging Face.
    • Discussions suggested that personal testing is crucial to understanding performance differences.
  • Open Interpreter Hackathon Sparks Interest: Open Interpreter is gearing up for the 'Breaking Barriers' hackathon in Dallas from Sept 20-23, with $17,500 in prizes on the line.
    • The event encourages in-person participation but remote applicants are welcome as community discussions on team formation continue.

5. New AI Model Releases and Innovations

  • Launch of Replete-LLM-Qwen2-7b: Replete-LLM-Qwen2-7b has been launched, showcasing impressive capabilities and inviting users to test it through Hugging Face.
    • The developer emphasized the importance of personal testing instead of relying solely on marketed superiority claims.
  • ActionGemma-9B Model for Function Calling: The new ActionGemma-9B model, designed for function calling, leverages multilingual capabilities from Gemma and the xLAM dataset, enhancing user interaction.
    • Details about its functionalities can be accessed here.

6. Community Support and Resources

  • Seeking AI Research Communities: Members expressed a desire for a more active audio research community, noting that previous platforms like harmonai had become inactive.
    • This highlights a gap in support for audio research discussions and the need for a vibrant community.
  • Hackathon Announcement: Open Interpreter announced its participation in the 'Breaking Barriers' hackathon, offering $17,500 in prizes, encouraging community involvement.
    • The event emphasizes collaboration and innovation in AI, with both in-person and remote participation options.

PART 1: High level Discord summaries

Nous Research AI Discord

  • Seeking a Thriving Audio Research Community: A member sought recommendations for an audio research community akin to Nous, citing a lack of active discussion in previous discords.
    • "The old harmonai discords are pretty dead," highlighting a significant gap in audio research support.
  • Introducing CRAB for Multimodal Agents: The community welcomed the 🦀 CRAB: Cross-environment Agent Benchmark for evaluating multimodal agents across platforms including 📱 Android and 💻 Ubuntu.
    • Features include a graph evaluator and task generation to boost human-like performance.
  • Intern Eric Reveals ReFT Mastery: Tomorrow, Intern Eric will demonstrate 'How I fine-tuned Llama3 in 14 minutes w/ ReFT' during a presentation at 10 AM Pacific.
    • The session focuses on an application of Representational Fine Tuning, promising valuable insights into model tuning.
  • Clarifying ReFT vs RLHF Confusion: Members discussed the differences between ReFT and RLHF, with one user highlighting misconceptions about their relationships.
    • This confusion signals a need for clearer definitions in community discussions about these techniques.
  • Model Performance Comparison Discussion: An emphasis was placed on integrating A/B tests and robust benchmarks to validate claims of new models' superiority, particularly referencing Llama-3.1-8B and Gemma-2-9B.
    • Users raised concerns about casually calling models state-of-the-art without proper benchmarking.


Unsloth AI (Daniel Han) Discord

  • Gemma 2 gains traction: Members noted that Gemma 2 is becoming increasingly popular, attracting its own audience compared to predecessors like Llama and Mistral. Discussions highlighted the model’s distinct characteristics and performance nuances relative to its competitors.
    • The shift in interest showcases a growing acceptance of diverse architectures in the community.
  • Introducing Replete-LLM-Qwen2-7b: The new model Replete-LLM-Qwen2-7b was launched, featuring impressive capabilities and benchmarks, inviting users to test it out via Hugging Face. The developer urged users to personally evaluate models instead of relying solely on marketed superiority claims.
    • Discussions suggested that personal testing is crucial to understanding performance differences.
  • Model benchmarking debated: Conversations arose around the drawbacks of current model benchmarks, with users pointing out performance variations tied to varying training data. A member noted that despite higher performance in coding tasks, benchmark scores might not reflect quality due to differences in training goals.
    • This conversation underscores the importance of context in evaluating model efficacy.
  • Continuous batching in models elaborated: Users explored the adaptability of models for continuous finetuning, discussing enhancements such as ReFT. Queries arose concerning how Unsloth can support additional functionalities under continual training strategies.
    • This highlights the growing interest in dynamic model adjustment techniques.
  • Flash Attention 3 compatibility concerns: Flash Attention 3 (FA3) is noted as compatible exclusively with H100 hardware and the Hopper architecture per MrDragonFox. This led to clarifications about the automatic installation of FA2 when using Flash Attention.
    • The discussion prompted inquiries into the practical usage of Flash Attention versions, with members curious if FA2 still holds predominance.


LM Studio Discord

  • LM Studio faces performance slowdowns: Users reported long load times and sluggish responses in LM Studio, attributing issues to the context length setting, despite prior successful usage.
    • Reports suggest that the performance lag affects model loading and responsiveness which ideally should not be impacted by unchanged settings.
  • New users demand guidance on models: A newcomer asked about supported models in LM Studio capable of handling images and PDFs, as well as visual generation models.
    • The discussion highlighted a need for improved onboarding tools to familiarize users with model capabilities.
  • Gemma 2 impresses users with performance: Users recommend experimenting with Gemma 2 27B, noting exceptional performance especially when stacked against Yi 1.5 34B.
    • Feedback underlines how even the smaller Gemma 2 9B model performs effectively across tasks, raising excitement for its bigger counterpart.
  • Debate rages on laptop choices for LLM inference: A user weighs options between machines featuring an RTX 4050 or RTX 4060 for LLM inference, with discussions centered around the significance of an extra 2GB VRAM.
    • Experts stress that while adding RAM aids performance, maximizing VRAM takes precedence to fully harness larger models.
  • NVIDIA GPU power limits on Linux: Users discussed methods to persistently limit NVIDIA GPU power on Linux, particularly for the RTX 3090, via tools like nvidia-smi.
    • Scripts were suggested to maintain power limits upon reboot, although enterprise systems typically provide better power control options.
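The nvidia-smi approach discussed above amounts to one command that must be re-run on each boot. A minimal sketch that only builds the command (the 250 W cap and GPU index are hypothetical examples; actually applying the limit requires root):

```python
def power_limit_cmd(gpu_index: int, watts: int) -> list[str]:
    """Build the nvidia-smi invocation that caps a GPU's power draw.
    -i selects the GPU, -pl sets the limit in watts. The setting does not
    persist across reboots, hence the boot-time scripts mentioned above."""
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]

# e.g. cap GPU 0 (an RTX 3090) at a hypothetical 250 W:
print(" ".join(power_limit_cmd(0, 250)))  # nvidia-smi -i 0 -pl 250
```

Dropping this into a systemd oneshot unit or rc script is the usual way to make the limit stick after reboot.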


HuggingFace Discord

  • SOTA Background Removal Beats RMBG1.4: A verified member highlighted the Bilateral Reference for High-Resolution Dichotomous Image Segmentation model, outperforming RMBG1.4 in background removal, thanks to contributions from various universities. More details can be found on the model page and in the arXiv paper.
    • This model's advances showcase an increased focus on high-quality results with fewer data requirements, signaling a critical shift in background removal techniques.
  • Function Calling with ActionGemma-9B: The new ActionGemma-9B model, fine-tuned for function calling, leverages multilingual capabilities from Gemma and the xLAM dataset. Details can be accessed here.
    • This development enhances user interaction with models by enabling specific function calling, pushing forward the capabilities of multilingual models in real-world applications.
  • Unity ML-Agents Video Series Launch: A YouTube video titled Unity ML-Agents | Pretrain an LLM from Scratch with Sentence Transformers illustrates creating a chatbot using Unity and Sentence Transformers. Watch the introduction here.
    • This initiative represents an exciting blend of game development and conversational AI, catering to developers interested in integrating advanced language models in gaming environments.
  • Matryoshka Diffusion Models Released: Apple open-sourced a Python package for training text-to-image diffusion models using smaller datasets, linked to their ICLR 2024 paper. This allows for high-quality results with reduced data and compute needs.
    • This approach could redefine efficiency metrics in training diffusion models, potentially impacting future research in AI-generated media.
  • Discussion on LoRA Training Techniques: Members recommended focusing on training LoRAs instead of the full model, noting minimal benefits from training larger architectures. Memory requirements for running Flux for inference were also discussed.
    • Emphasizing the need for efficient model training practices, these discussions reflect a growing trend toward lighter, more adaptable models in the field.


Latent Space Discord

  • DALL·E 3 expands access for free users: OpenAI announced that ChatGPT Free users can now create up to two images per day with DALL·E 3, supporting both personal and professional needs.
    • Feedback has been mixed, with some users disappointed by the limitations compared to other models.
  • Gemini 1.5 slashes prices by 70%: Gemini 1.5 Flash has implemented price cuts of up to 70%, making it much more competitive alongside GPT4o's significant reductions.
    • Analysts suggest this aggressive pricing strategy enhances efficiency, reflecting ongoing competition in AI technology.
  • Deep-Live-Cam enables real-time deepfakes: Deep-Live-Cam allows users to generate high-quality deepfakes from a single image in real-time, as demonstrated through impressive experiments.
    • This project has generated excitement for its potential use in virtual meetings, showcasing its impressive capabilities.
  • Anysphere secures $60M funding: Anysphere successfully raised over $60 million in Series A financing, securing a valuation of $400 million for its AI coding assistant, Cursor.
    • Led by Andreessen Horowitz, this funding round highlights investor confidence in AI-driven coding solutions.
  • Llama 3.1 model receives key updates: Meta launched an updated version of the Llama 3.1 405B model, modifying the KV heads from 16 to 8 to comply with its whitepaper specifications.
    • This change has sparked speculation regarding its impact on the model's performance and architecture.


Perplexity AI Discord

  • Perplexity Pro limits drop: Users reported that the Pro search limit has decreased from 600 to 450, with a future drop to 300 anticipated, creating unrest regarding transparency.
    • Concerns mount as many users voiced frustrations about this change being made without warning, raising questions about service reliability.
  • OpenAI's Strawberry Model generates buzz: OpenAI's new 'Strawberry' model is aimed at enhancing reasoning abilities, generating excitement across the AI community after Sam Altman hinted at it via social media.
    • The project is seen as a significant advancement in tackling complex research tasks, sparking interest among engineers and researchers alike.
  • Anduril hits $14B valuation: Anduril Industries raised $1.5 billion, soaring to a $14 billion valuation from $8.5 billion, largely due to government contracts.
    • With revenues doubling to $500 million, the company's growth trajectory indicates robust demand in defense tech amid increasing geopolitical tensions.
  • Image generation hurdles in Perplexity: Users expressed frustrations over the complexity of image generation processes in Perplexity, wishing for simpler functionality like direct prompt submission.
    • Discussions revealed that current image generation tools are perceived as limited and impractical for user needs, beckoning improvements.
  • API roadmap inquiry: A member brought up the need for a roadmap on adding internet access capabilities to the API, highlighting user interest in enhanced features.
    • Clarifications were made regarding models that include 'online' indicating partial internet access, though not in real-time, emphasizing available functionalities.


Torchtune Discord

  • Navigating NeurIPS Publishing Process: One member shared their experience with NeurIPS, feeling overwhelmed about obtaining quality feedback and publication in major AI conferences: "This process is very overwhelming and I don't know anybody who published in major AI conferences."
    • They echoed concerns that the journey through these top conferences can be anxiety-inducing.
  • Rebuttal Strategies for Reviewer Scores: Advice surfaced about rebuttal strategies, particularly for reviewers with low confidence, suggesting minimal focus on those issues. One member noted, "If they state the reason for their low confidence then you can try to address that but otherwise I wouldn't."
    • This insight aimed to refine the rebuttal process and reduce unnecessary stress.
  • Challenge of Big Conferences: The conversation highlighted how daunting big conferences can be, with recommendations to consider smaller niche conferences for a more enriching experience. A participant stated, "It feels like one has to publish in the big ones at least once to be taken seriously."
    • This sparked discussion about the balance between prestige and quality of feedback.
  • Discussion on RLHF Cleanup: Members debated the need for a cleanup process regarding RLHF practices before moving forward with public announcements. A tutorial or blog post was suggested, but the general consensus warned it may take additional time.
    • This discussion underscored the importance of a well-prepared narrative before outreach.
  • Qwen2 Model Exhibiting Unusual Memory Behavior: Testing revealed that the Qwen2 model exhibited significant reserved memory during training, particularly at batch sizes of 4, raising red flags about potential memory leak issues. Members are now looking to profile this behavior more thoroughly.
    • This discovery could lead to critical optimizations and adjustments in future training protocols.


CUDA MODE Discord

  • PyTorch Profiler Memory Leak Bug: A member struggled with a memory leak when using the PyTorch Profiler with profile_memory=True, unsure of the root cause in settings.
    • Another found success by switching to torch.cuda.memory._record_memory_history() for profiling, indicating an alternative approach.
  • Tensor Cores for 4090 Insights: Discussion centered on where to access detailed specs for tensor cores on the 4090, with suggestions to review the Ada whitepaper.
    • The Ampere whitepaper was mentioned as a reference for 3090 specs, emphasizing the need for thorough documentation.
  • torch.compile Leans on Triton Kernels: It was shared that torch.compile mainly outputs Triton kernels, providing a cleaner implementation than CUDA kernel outputs from PyTorch's eager mode.
    • The existence of a Cutlass backend was noted but progress remained unclear, highlighting ongoing enhancements in kernel development.
  • INT8 Quantized Training Fix: An error in INT8 quantized training was resolved by setting requires_grad=False when calling torch.chunk(), streamlining the implementation.
    • This indicates potential intricacies in PyTorch's handling of gradients in tensor operations, highlighting the importance of precision.
  • RoPE Kernel Refactoring: A discussion took place regarding the RoPE kernel, where members suggested refactoring to use explicit trigonometry for improved code clarity.
    • An earlier version without complex numbers was shared, showing a potentially more maintainable approach to kernel design.
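For context on the RoPE discussion above: rotary position embeddings rotate each pair of channels by a position-dependent angle, and the "explicit trigonometry" formulation replaces complex-number multiplication with a plain 2x2 rotation. A pure-Python sketch of the math (dimension, base, and inputs are illustrative, not the kernel's actual parameters):

```python
import math

def rope_rotate(x, pos, base=10000.0):
    """Apply rotary position embedding to one token's feature vector x
    (length must be even) at position `pos`, using explicit cos/sin
    instead of complex multiplication."""
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)   # per-pair rotation frequency
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = x[i], x[i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])  # 2x2 rotation
    return out

# At position 0 every angle is 0, so the rotation is the identity:
print(rope_rotate([1.0, 2.0, 3.0, 4.0], 0))  # [1.0, 2.0, 3.0, 4.0]
```

Because each pair is a pure rotation, vector norms are preserved — a useful sanity check when refactoring a kernel from the complex-number form to this one.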


Eleuther Discord

  • Debating CBRN Risks in AI Models: Extensive discussions highlighted whether filtering CBRN-related information could mitigate risks without impairing models' capabilities.
    • Participants pointed out the trade-offs between knowledge removal and the risk of still producing harmful outputs.
  • Opportunities for AI Safety Research: A member brought up a career transition grant from Open Philanthropy aimed at AI safety, seeking GPU resources for educational exercises.
    • Various GPU access options, including Colab and CAIS clusters, were discussed for supporting AI research.
  • Challenges with Karpathy's nanoGPT evaluation: Members addressed issues with lm-evaluation-harness for Karpathy's nanoGPT model, noting incompatibilities with HF formats.
    • A user requested help getting the evaluation harness operational due to these challenges.
  • Tree Attention for Efficient Computation: Conversations pointed to a paper on a Tree Attention algorithm, which optimizes self-attention calculations through parallel computation on GPUs.
    • The implementation shows promise for enhancing efficiency in long-context attention tasks, with a GitHub repository shared.
  • Zamba Model Surprises with Performance: The Zamba model garnered attention for outperforming LLaMA 2 7B with fewer training tokens, despite having received little public exposure.
    • Its publicly available dataset has sparked interest due to the model's impressive efficiency and results.


Stability.ai (Stable Diffusion) Discord

  • Optimize VRAM Without Downgrading: Users noted that in Low VRAM Mode, falling back to a smaller model may be unnecessary if generation completes successfully, potentially saving processing time.
    • Experimenting with model options can help optimize performance, reducing unnecessary adjustments.
  • Face Swapping Tools: Rope Takes the Lead: Members recommended Rope for face swapping due to its easier installation compared to Roop, particularly for those on Intel CPUs.
    • The focus was on finding effective yet simple tools for users keen on executing face swaps.
  • Stable Diffusion Performance is Variable: Users observed fluctuating sampling speeds (s/it) in Stable Diffusion, with reported slowdowns when switching between model sizes.
    • Insights into setups like ROCm and WSL2 were shared, indicating the significance of hardware configurations.
  • Commission Custom Lora Models Securely: Participants discussed utilizing Civitai's bounty system for commissioning custom pony lora models, aiming for secure transactions.
    • Thorough vetting of creators is emphasized as a critical step for reliability in commissioning practices.
  • Live Preview Settings Spark Interest: A user asked about optimal live preview settings in A1111, specifically questioning the purpose of various formats and if frames are saved.
    • This reflects a community drive to refine image generation workflows for enhanced efficiency.


OpenAI Discord

  • DALL·E 3 Free Access for ChatGPT Users: ChatGPT Free users can now generate up to two images per day using DALL·E 3, allowing image creation for projects like slide decks and personalized cards.
    • This update simplifies image requests by letting users directly ask ChatGPT for images tailored to their specifications.
  • Mistral NeMo Not Meeting Expectations: Members expressed interest in the performance of Mistral NeMo on M1 machines with 16GB RAM, noting limitations on running larger models.
    • Concerns arose regarding the model's compatibility and performance efficacy on consumer-grade hardware.
  • Debate on GPT-4 vs GPT-4o Performance: Users criticized GPT-4o, arguing it underperforms compared to GPT-4, particularly in image analysis tasks.
    • GPT-4o received flak for providing rigid responses, reminiscent of a programmer disconnecting from core principles.
  • Interest in Local AI Model Workflows: A participant discussed switching to Open WebUI and Ollama for running local AI models, contemplating discontinuing their ChatGPT+ subscription.
    • Llama was noted as reliable, though self-hosted setups still present challenges that need addressing.
  • LangChain and CSV Integration Inquiry: A user sought resources for integrating a CSV file as a retrieval-augmented generation (RAG) document within LangChain.
    • This shows a growing interest in processing structured data with language models and elevates discussions on practical AI applications.
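The CSV-as-RAG idea from the last bullet can be sketched in a library-agnostic way: each row becomes one retrievable "document" (which is what LangChain's CSVLoader produces), and a retriever returns the rows most relevant to a query. The data, names, and token-overlap scoring here are illustrative stand-ins; a real pipeline would embed rows into a vector store.

```python
import csv
import io

# Hypothetical sample data for illustration.
CSV_DATA = """product,price,notes
widget,9.99,blue plastic widget
gadget,19.99,premium steel gadget
gizmo,4.99,budget gizmo in red
"""

def load_csv_documents(text):
    # One document per row, flattened to "column: value" text.
    reader = csv.DictReader(io.StringIO(text))
    return [" ".join(f"{k}: {v}" for k, v in row.items()) for row in reader]

def retrieve(docs, query, k=1):
    # Toy retriever: rank documents by query-token overlap.
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

docs = load_csv_documents(CSV_DATA)
top = retrieve(docs, "premium steel", k=1)
assert "gadget" in top[0]
```

Retrieved rows would then be stuffed into the LLM prompt as context, which is the "augmented generation" half of RAG.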


OpenRouter (Alex Atallah) Discord

  • Gemini 1.5 Flash Price Slash: Multiple users noted that Gemini 1.5 Flash has dropped its price to just 7.5c/million tokens, making it highly competitive for rapid, cost-effective model solutions.
    • The model now natively supports PDFs and has improved its capabilities for text and multi-modal queries.
  • GPT-4o Mini Tops Gemini 1.5 in Coding: GPT-4o Mini received praise for its lower hallucination rates compared to Gemini 1.5, especially in coding-related tasks.
    • Users indicated a strong preference for models that effectively minimize hallucinations while optimizing coding functionalities.
  • OpenRouter API's Configuration Woes: A developer raised issues in configuring the OpenRouter API, specifically with custom parameters in the providers configuration when using the OpenAI SDK in TypeScript.
    • The API currently lacks support for these custom parameters, leading to persistent linting errors.
  • Dunning-Kruger Insights Spark Humor: A lively banter erupted around the Dunning-Kruger Effect, as users humorously critiqued self-assessment in discussions about expert knowledge.
    • The conversation humorously juxtaposed confidence against actual ability, particularly regarding profitable ventures.
  • Quest for Japanese-Language LLMs: A user requested recommendations for LLMs that surpass GPT-4o Mini in Japanese language capabilities, looking for affordable alternatives.
    • This search reflects a growing demand for models that excel in specialized language processing outside the capabilities of larger models.


Cohere Discord

  • New Sus-Column-R Model Outshines Competitors: A post on Reddit discusses the performance of a new sus-column-r model, claiming it outperforms GPT-4 and Claude 3.5 in tasks like translation, coding, and mathematics.
    • “I don't understand how this is possible,” the user highlighted, reflecting the community's intrigue.
  • API Requests Hit 403 Forbidden Errors: Members reported a troubling 403 Forbidden error when using curl for API requests, suggesting it could stem from an invalid API key or geolocation restrictions.
    • Despite troubleshooting, members could not resolve the issue, noting discrepancies between VPS and local request successes.
  • Docker Installation Leaves Users Perplexed: A user faced issues with their interface being non-operational post-Docker installation, questioning if any steps were overlooked.
    • In response, Nick Frosst indicated that the problem likely relates to a backend setup misconfiguration, though specifics remain unclear.
  • Langchain's Multistep Functionality Throws Errors: A user encountered an error with Langchain's multistep_tool_use, receiving a message indicating failure to parse multihop completion.
    • Seeking help, they requested references to documentation on proper integration of Cohere and Langchain.
  • Embedding Model Quality Discrepancies: A user reported dissatisfaction after switching from embed-english-light-v2.0 to embed-english-light-v3.0, observing reduced retrieval quality contrary to expectations.
    • Elaborating on their dataset, they noted the newer models did not meet the anticipated performance improvements.


LlamaIndex Discord

  • Event-Driven Agent Systems empower flexibility: Building agents in an event-driven manner allows for flexible cyclic, multi-agent systems with complex communication patterns. Check out this awesome tutorial video showcasing the benefits.
    • “This is an awesome tutorial video” emphasizes the utility of the event-driven approach in agent systems.
  • Mixture-of-Agents overcomes larger model limitations: A new paper by Junlin Wang reveals a way to ensemble smaller LLMs into a Mixture-of-Agents system outperforming state-of-the-art larger models using a fully async, event-driven workflow.
    • Details of the implementation are discussed on Twitter.
  • Understanding Property Graphs for GraphRAG: An important video tutorial explains LlamaIndex's property graphs, which allow each node and relation to store a structured dictionary of properties, unlocking various techniques.
    • “This underlying abstraction unlocks a lot of cool techniques” highlights the functionality of property graphs.
  • Building Multimodal RAG Pipelines for real-world applications: New notebooks explain how to create practical multimodal RAG pipelines over complex legal, insurance, and product documents, starting with the parsing of insurance claims.
    • Detailed breakdowns and real-world use cases can be found here.
  • Selecting embedding models for effective document retrieval: A member discussed using the HuggingFaceEmbedding model with Llama, showcasing document loading examples before query calls.
    • Questions arose around document retrieval after embedding, clarifying key sequential steps for achieving desired results.
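The sequence clarified in that last discussion — load documents, embed them, index them, then embed the query and compare — can be sketched with a toy bag-of-words "embedding" standing in for a real model such as HuggingFaceEmbedding. Everything below is illustrative, not LlamaIndex's actual API.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words counts (a real model returns a dense vector).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity over sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

documents = ["the cat sat on the mat", "stock prices rose sharply today"]
index = [(doc, embed(doc)) for doc in documents]   # embed documents at load time
query_vec = embed("cat on a mat")                  # embed the query afterwards
best = max(index, key=lambda pair: cosine(query_vec, pair[1]))[0]
assert best == "the cat sat on the mat"
```

The key sequential point is that documents must already be embedded and indexed before any query call; the query is embedded with the same model at retrieval time.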


OpenInterpreter Discord

  • Open Interpreter Hackathon Sparks Interest: Open Interpreter is gearing up for the 'Breaking Barriers' hackathon in Dallas from Sept 20-23, with $17,500 in prizes on the line.
    • The event encourages in-person participation, but remote applicants are welcome; community discussions on team formation are ongoing.
  • MiniCPM-V 2.6 Tops the Competition: The MiniCPM-V 2.6 model has reportedly outperformed notable competitors like Gemini 1.5 Pro and GPT-4V, raising interest among users.
    • Links to the Hugging Face model and GitHub repository provide further insights into its capabilities.
  • Community Calls for ESP32S3 Insights: A user sought assistance in deploying O1 on the ESP32S3, inquiring about existing experiences from fellow members.
    • The request for shared experiences aims to enhance implementation strategies among interested users within the community.
  • Request for Linux Support Discussions: Members discussed the need for a dedicated #linux-something_or_other channel to address Linux-specific topics more effectively.
    • This suggestion has garnered positive feedback, linking it to an existing channel aimed at addressing troubleshooting concerns.


LangChain AI Discord

  • LangChain struggles with LLM feature consistency: Members expressed confusion regarding LangChain's ability to provide a uniform API across all LLMs, noting it works with OpenAI but not with Anthropic.
    • It was clarified that while function calls are similar, prompt modifications are essential due to inherent LLM differences.
  • Claude 3.5 suffers from outages: Anthropic’s Claude 3.5 experienced significant downtime, with reports indicating an internal server error code 500 halting its functionality.
    • Users shared the error message, highlighting issues with the API affecting operational capacities.
  • Join the $1000 CTF Challenge!: There's an exciting capture-the-flag (CTF) challenge where participants aim to extract a password from an AI agent, with a prize of $1000.
    • The competition raises concerns over data privacy as it examines the risks of leaking secrets through user feedback forms.
  • Mood2Music Dashboard Revealed: The Mood2Music dashboard was showcased, offering AI-driven song recommendations that link to both Spotify and Apple Music based on user mood.
    • This tool targets decision fatigue in music selection by curating playlists aligned with users' emotional states.
  • Introducing CRAB: The Multimodal Agent Benchmark: The CRAB benchmark framework facilitates the building and assessment of multimodal language model agents across various environments, including Android and Ubuntu.
    • Featuring a fine-grain evaluation metric and task generation capabilities, it aims to improve human-like task execution, with resources available on GitHub and the project's website.


LAION Discord

  • CC vs LAION Dataset Showdown: The debate regarding whether the Fondant 25M dataset holds the title for the largest collection of creative commons/public domain images heated up, touching on the reliability concerns of LAION-5B due to its dependence on often irrelevant alt text.
    • Participants highlighted that LAION-5B might pose greater risks for accuracy in tasks sensitive to image captioning.
  • Gemma Model Steering Inquiry: An inquiry popped up about steering Gemma 2 2B using the Gemma Scope, with a focus on creating effective control vectors for output generation.
    • There's a clear demand for more comprehensive insights beyond basic Google results to elevate understanding of model features.
  • Captions' Reliability Under Scrutiny: Discussion centered on the unreliability of mass-scraped captions, with members concerned that such captions often lack accuracy.
    • Questions arose about whether CLIP similarity scores could help evaluate whether new captions are less reliable than the originals.
  • Halva Assistant Insights: A link was shared regarding the Halva Assistant which aims to mitigate hallucinations in language and vision tasks.
    • This innovation could be pivotal for future AI development, particularly in improving reliability in multimodal systems.


Interconnects (Nathan Lambert) Discord

  • Sequoia Capital Eyes AI Reasoning Startup: Sequoia Capital has discussed funding an AI reasoning startup co-founded by Robinhood's CEO, aiming to enhance AI capabilities in reasoning and decision-making. More details can be found in The Information.
    • This startup focuses on advancing how AI interacts in logical contexts, a critical area for future AI development.
  • Anaconda's New Commercial License Policies: Research and academic organizations are now required to pay for Anaconda's software, as the company pursues compliance with its terms-of-service. Reports indicate institutions are facing legal demands for commercial licenses due to unauthorized usage.
    • Members also raised questions about whether using Anaconda in Docker containers necessitates additional licensing, hinting that it likely does.
  • uv Emerges as Speedy pip Alternative: uv is being discussed as a faster alternative to pip for package installations, with users noting significant speed improvements. This alternative requires no extra tooling, simply swapping pip with uv pip for installations.
    • Using uv could streamline development processes for many, especially in environments needing rapid package management.
  • Discourse Improvement Through Humor: A humorous remark about bad takes circulated: “If everyone who had bad takes exclusively had bad takes, the world would be a lot better.”
    • This statement highlights a desire for more constructive engagement in community dialogues, calling for higher quality discourse.
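The pip-to-uv swap mentioned above is a one-word change. A minimal sketch, assuming uv is installed (e.g. via `pip install uv` or uv's standalone installer):

```shell
# Before:
#   pip install -r requirements.txt
# After (same flags, same subcommand syntax):
#   uv pip install -r requirements.txt
#
# uv mirrors pip's CLI surface, so existing flags carry over unchanged.
msg="swap 'pip' for 'uv pip'"
echo "$msg"
```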


DSPy Discord

  • Mastering DSPy with YouTube Tutorial: A member shared a YouTube tutorial on DSPy, detailing 8 examples from basic to advanced LLM projects aimed at enhancing user understanding.
    • This structured approach allows viewers to grasp key DSPy concepts effectively and implement them in their own projects.
  • Experimenting with OpenAI's Structured Output API: A member announced their experimentation with the new structured output API from OpenAI, enhancing data interactions within projects.
    • This API aims to improve how structured data outputs are utilized, sparking interest in broader implementation.
  • Elevating DSPy Prompts with Custom GPT: Members discussed improving complex prompts that interweave instructions and examples, focusing on Signature adapters and MIPRO optimization.
    • A suggested starting point was a custom GPT guide for better modularization of prompts.
  • Exploring DSPy Use Cases for RAG: A member sought insights on the suitability of DSPy for RAG tasks, drawing parallels with fine-tuning processes.
    • Another member clarified that successful application hinges on optimizing tasks, metrics, and examples for enhanced LLM performance.
  • Signature Adapters Show Potential for DSPy: Discussion revolved around the potential benefits of using Signature adapters in customizing DSPy prompts.
    • A relevant link for further reading on this topic was shared: Signature GPT resource.


MLOps @Chipro Discord

  • Poe Hackathon for Generative UI: Poe is hosting a one-day hackathon aiming to develop generative UI experiences with advanced LLMs like GPT-4o and Gemini 1.5 Pro, with in-person events in Hillsborough, CA 94010.
    • Only registered participants will receive exclusive details, underscoring the competitive edge of this event.
  • AI-Health Initiative Internship Open: The Alliance AI-Health Research Initiative is on the lookout for students for a 4-month remote internship to advance research in areas like cancer detection and AI-based heat stroke detection.
    • Applications are open until August 11, with opportunities for interns to publish their research findings in an academic journal, apply here.
  • Feature Stores in Computer Vision Under Scrutiny: A member raised questions about the effectiveness and value of feature stores in computer vision, kicking off a discussion on their role in managing projects.
    • The need for real-world implementations was highlighted, as examples could substantiate the impact of feature stores within various frameworks.


Modular (Mojo 🔥) Discord

  • Modular's License Raises Questions: A member pointed out that Modular's license for max/mojo is permissive unless there's intent to commercialize an AI infrastructure platform.
    • Concerns emerged about potential implications if Modular ventures into robotics or AI labeling platforms.
  • Future Competitiveness Uncertainty: The community debated that software classified as non-competitive under Modular's agreement may become competitive in the future.
    • Questions lingered on whether such competitive software development must be frozen once it transitions.
  • Triton Language User Outreach: A call went out for Triton lang users who have crafted a custom kernel to engage in one-on-one chats with the product team, with Mojo swag as an incentive.
    • This initiative aims to gather insights from users to enhance product offerings.
  • Curiosity Around Triton Language: One member expressed their first time hearing about Triton, indicating a growing interest in newer programming languages.
    • This hints at a potential for broader community engagement in advanced programming technologies.


OpenAccess AI Collective (axolotl) Discord

  • Impressive Cuts in Google Gemini Pricing: The YouTube video titled 'Google Gemini Insane Price Cuts!!!' highlights significant price reductions for Google Gemini 1.5 Flash.
    • Details about these changes were also shared in the Google Blog.
  • Confusion over Comparing Gemini to GPT-4o: Discussion revolves around whether to compare Gemini 1.5 Flash with GPT-4o, or draw distinctions to Gemini 1.5 Pro instead.
    • Members debated the merit of separating comparisons between standard and mini versions.
  • Free Finetuning of Gemini 1.5 at Play: There was conversation about Gemini 1.5's free finetuning feature influencing its comparison with the Pro version.
    • This distinction has become a focal point in discussions regarding the Gemini models' capabilities.
  • Inquiring about llama.cpp Prompt Caching: A member sought help on which arguments to use for caching prompts with the llama.cpp server, aiming to cache just the initial prompt.
    • They clarified they want to cache the first user prompt, which is 1.5k tokens, while letting llama.cpp manage other content.
  • Inquiries about Llama 3 Training Details: A member asked for documentation on the training process of the Llama 3 model by Meta, specifically on data and masks used.
    • They noted the approach of renaming existing tokens to serve as special tokens in the Llama 3 model.
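For the prompt-caching question above, one commonly cited approach (assuming llama.cpp's built-in HTTP server, whose completion endpoint accepts a cache_prompt flag; field names may vary by version) is to mark the request so the server reuses the KV cache for the unchanged prefix across calls. The placeholder prompt text below is illustrative:

```json
{
  "prompt": "<fixed 1.5k-token prefix><variable user content>",
  "cache_prompt": true,
  "n_predict": 128
}
```

With cache_prompt enabled, only the part of the prompt that differs from the previous request is re-evaluated.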


tinygrad (George Hotz) Discord

  • AMD backend potentially uses more memory: A member raised concerns about whether the AMD backend consumes more memory compared to the GPU backend, leading to discussions about resource allocation and performance.
    • This highlights ongoing considerations in the community on optimizing memory management for various backends.
  • GPU failure reported amidst intense computation: One member shared the unfortunate news of their GPU being damaged, simply stating, 'Rip my GPU got blown.'
    • This incident has sparked worries about GPU reliability during demanding workload sessions.
  • De-sharding models for simplicity: A user inquired about transforming a multi lazy buffer into a normal lazy buffer by de-sharding a model, indicating a desire to streamline processes.
    • This points to prevalent challenges in model optimization and architecture adaptation within the community.
  • Clarifying copy_to_device function usage: Discussion about the copy_to_device function emerged, suggesting its importance in data handling during model operations.
    • This reinforces the need for clarity among users about effective memory management practices in their workflows.


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Finetuning (Hamel + Dan) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: !

If you enjoyed AInews, please share with a friend! Thanks in advance!
