AI News (MOVED TO news.smol.ai!)

Archives
October 15, 2024

[AINews] not much happened today

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜


Vertical SaaS agents are all you need.

AI News for 10/14/2024-10/15/2024. We checked 7 subreddits, 433 Twitters and 31 Discords (228 channels, and 1569 messages) for you. Estimated reading time saved (at 200wpm): 197 minutes. You can now tag @smol_ai for AINews discussions!

Another quiet day in technical news. But the agents funding landscape is afire, with Decagon announcing $100m in funding not long after Sierra's monster $4b round. It is remarkable how rapidly the consensus has converged that vertical AI agents are the way to go.


The Table of Contents and Channel Summaries have been moved to the web version of this email!


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Industry Developments and Discussions

  • OpenAI Alumni Ventures: @bindureddy reported that Mira Murati, ex-CTO of OpenAI, is raising VC funds and poaching talent from OpenAI for a new venture. This highlights the growing competition in the AI market, with over 10 ex-OpenAI startups expected to emerge.
  • Nobel Prize for AI Achievements: @demishassabis shared his thoughts on winning the Nobel Prize for the AlphaFold2 project, which solved the 50-year grand challenge of protein structure prediction. He emphasized the importance of AI in scientific discovery and its potential for developing new therapies.
  • AI Model Developments:
    • @ClementDelangue noted that the gap between open-source and closed-source LLMs is now insignificant.
    • @johnowhitaker highlighted some interesting techniques used in a new model, including LoRA projectors for weight sharing and annealing on high-quality data.
  • AI Research and Applications:
    • @ylecun discussed the importance of high-bandwidth sensory inputs for self-supervised learning, arguing that language alone is insufficient for learning common sense.
    • @fchollet commented on a project combining LLMs with the Lean theorem prover, describing it as "intuition-guided reasoning" and a good example of deep-learning guided discrete program search.
  • AI Infrastructure: @nearcyan shared an image showing the datacenter size needed for frontier models, illustrating the massive computational requirements of cutting-edge AI research.
  • AI Tools and Frameworks:
    • @rasbt shared a Jupyter notebook with tips for reducing memory usage when loading larger models like LLMs in PyTorch (a minimal sketch of this kind of trick follows this list).
    • @jerryjliu0 described a multi-agent workflow for report generation and form filling, utilizing tools like LlamaParse and long-context LLMs.
  • AI Ethics and Challenges: @ajeya_cotra expressed interest in research investigating how easy it is to get AI agents to perform harmful tasks they're supposed to refuse, and how competent they are at those tasks.
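
For readers curious what such memory-reduction tricks typically look like, here is a minimal sketch (not rasbt's notebook) built on standard PyTorch features: instantiating the model on the meta device, memory-mapping the checkpoint, loading weights without extra copies, and casting to bfloat16. The model and checkpoint names are placeholders.

```python
import torch
from torch import nn

# Toy stand-in for a large model; in practice this would be an LLM class.
def build_model():
    return nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))

# 1) Build the model on the meta device: no memory is allocated for weights yet.
with torch.device("meta"):
    model = build_model()

# 2) Memory-map the checkpoint so tensors are paged in lazily instead of
#    materializing the whole state dict in RAM at once.
state_dict = torch.load("checkpoint.pt", map_location="cpu", mmap=True)  # placeholder path

# 3) assign=True reuses the loaded tensors directly instead of copying them
#    into freshly allocated parameters.
model.load_state_dict(state_dict, assign=True)

# 4) Optionally cast to bfloat16 to halve the footprint before moving to GPU.
model = model.to(dtype=torch.bfloat16)
```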

AI Model Performance and Benchmarks

  • Model Evaluation: @rohanpaul_ai shared information about a paper demonstrating how even a "null model" that always outputs a constant response can cheat automatic benchmarks and achieve top-ranked win rates.
  • Linearizing LLMs: @togethercompute announced LoLCATs, a new method for converting existing Transformers like Llama and Mistral into state-of-the-art subquadratic variants, potentially reducing computational costs (a generic linear-attention sketch follows this list).
  • AI Optimization: @rohanpaul_ai discussed LPZero, a framework for automatic Zero-cost proxy design in Neural Architecture Search, which could enhance efficiency in evaluating language model architectures.
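
For context on what "subquadratic" means here, this is a generic linear-attention sketch (the broad family LoLCATs targets, not the paper's own method): replacing softmax(QKᵀ)V with a kernel feature map φ lets φ(Q)(φ(K)ᵀV) be computed in O(N) rather than O(N²) in sequence length.

```python
import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    # Standard attention: the (N x N) score matrix makes this O(N^2) in sequence length.
    scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, eps=1e-6):
    # Generic linear attention with an elu+1 feature map (Katharopoulos et al. style);
    # associativity lets us form (K^T V) first, which is O(N) in sequence length.
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = k.transpose(-1, -2) @ v                                   # (d x d), independent of N
    z = q @ k.sum(dim=-2, keepdim=True).transpose(-1, -2) + eps    # per-query normalizer
    return (q @ kv) / z

q = k = v = torch.randn(1, 1024, 64)
print(softmax_attention(q, k, v).shape, linear_attention(q, k, v).shape)
```

LoLCATs additionally trains the feature maps so the linearized layers mimic the original softmax attention; the sketch above only shows the structural change that yields the subquadratic cost.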

AI Industry Trends and Opinions

  • Competition in AI: @adcock_brett criticized the notion of a large market with many winners in AI, emphasizing the importance of competitiveness.
  • Open-Source vs. Closed-Source: @ClementDelangue stated that the gap between open-source and closed-source LLMs is now insignificant, suggesting a leveling of the playing field in AI development.
  • AI Research Culture: @finbarrtimbers commented on the culture of empiricism in modern deep learning, noting both positive and negative aspects of this approach.

Memes and Humor

  • @ID_AA_Carmack shared an anecdote about winning a bet regarding the adoption of a distributed metaverse vs. Roblox, highlighting the unpredictability of technology adoption.
  • @DavidSHolz poetically described SpaceX's rocket catch as not just an engineering victory, but a cultural-spiritual one that stirs a deep yearning for science and objective truth.

AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Advancements in Small Language Models: Llama 3.2 1B Performance

  • Llama3.2:1B (Score: 116, Comments: 40): The post compares Llama3.2:1B to larger models, noting its effectiveness for code generation and one-time requests on systems with CPU and 8GB RAM. While it performs well for these tasks, the model's performance degrades in long conversations, with the 3B version handling extended chat histories more effectively despite being slower.
    • The rapid progress in AI since ChatGPT's launch is highlighted, with 1B models now providing comparable quality answers. Some users express excitement about "AI for the masses," while others report issues with smaller models, such as increased hallucinations and unrelated responses.
    • The post's UI received significant praise, with multiple comments describing it as "cool" and "crazy." The creator mentioned it's part of an AI device project and is considering adding code execution capabilities.
    • A user proposed the idea of hardware that "crystalizes" an LLM, suggesting dedicated hardware for performance gains in local LLM applications. The post creator responded, indicating plans for a future version with a dedicated model and board designed for lightweight use.

Theme 2. AI-Generated Game Environments: Current Limitations and Future Potential

  • Playing AI-Generated CS:GO on a Single RTX 3090 in real time (Score: 116, Comments: 49): A team of researchers developed an AI-generated version of Counter-Strike: Global Offensive (CS:GO) that runs in real-time on a single RTX 3090 GPU. The system uses a vision-language model to interpret game state and generate appropriate actions, achieving a frame rate of 4 FPS and demonstrating the potential for AI to create and play complex video games autonomously.
    • Users discussed potential improvements, suggesting a modular game with AI-generated textures and 3D objects, maintaining control over game mechanics while allowing for persistent states and shared player contributions.
    • Some compared the technology to AI-generated Doom gameplay and speculated about future applications, such as real-life driving simulations using dashcam footage with inputs for acceleration and steering.
    • Debate arose about the project's practicality, with some praising it as an "unreal experience" while others argued it's "light years away" from being useful, predicting significant advancements in 2-3 years.

Theme 3. Hardware Requirements for Running Large Language Models Locally

  • Hardware costs to run 90B llama at home? (Score: 55, Comments: 80): The post inquires about the hardware costs to run a 90B parameter version of the Llama language model at home for offline text generation. The user specifies that speed is not a critical factor and that additional features like vision or fine-tuning are not required, acknowledging that the setup might be unaffordable but expressing interest in exploring the possibility.
    • Llama 3.1 70B and Llama 3.2 90B have the same text model, with the 90B version including vision capabilities. Users can run the 70B model on various setups, including 64GB RAM for CPU inference, dual P40 GPUs for 6-7 tokens/s, or dual 3090/4090 GPUs for faster processing.
    • Hardware options range from budget to high-end: a single 3090 GPU setup (~$2,000) can run 70B models adequately; dual 3090 GPUs (~$3,000) can handle both 70B and 90B models; dual 5090 GPUs (~$6,000) offer comfortable performance for both. Apple Mac Studio M2 Max with 64GB RAM runs 70B models at ~7 tokens/s.
    • Alternative options include using AMD EPYC 7002 servers with 8-channel DDR4 memory, capable of running Llama 70B Q8 at 2 tokens/s or even Llama 405B Q8 at 0.6 tokens/s with dual CPUs and 512GB RAM. Some users also suggest AMD MI60 GPUs.

Theme 4. Recreating GPT-like Thinking Processes in Open-Source Models

  • Recreating GPT o1 CoT Thinking (Thinking and Outputting) (Score: 34, Comments: 13): The post discusses the creation of a Thinking and Outputting tag function for OpenWebUI, attempting to replicate the behavior of GPT-O1. The author achieved this by fine-tuning instructions within the model file, requiring the model to support the ## Thinking tag and exit "Thinking" mode with "***", demonstrating the function with a video and providing a download link for others to try.
    • cddelgado hypothesizes that GPT-O1 uses a complex reasoning system involving chain of thought, tree of thought, and adversarial agents for planning and critique. They suggest implementing this with smaller LLMs using multiple conversations, with one as the main worker and another as an adversary.
    • kristaller486 clarifies that the post's implementation is not GPT-O1 but rather Chain of Thought (CoT), stating that O1 is an RL-based reasoning system, not just a prompt/agent/fine-tuned model. They provide a link for further information.
    • asankhs recommends trying the cot_reflection approach from the OptILLM GitHub repository to generate thinking and reflection tokens in responses, offering an alternative method to achieve similar functionality.

Other AI Subreddit Recap

r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

AI Research and Techniques

  • Google Deepmind advances multimodal learning: A paper from Google Deepmind demonstrates how data curation via joint example selection can further accelerate multimodal learning. (/r/MachineLearning)
  • Microsoft's MInference speeds up long-context task inference: Microsoft's MInference technique enables inference of up to millions of tokens for long-context tasks while maintaining accuracy, dramatically speeding up supported models. (/r/MachineLearning)
  • Scaling synthetic data creation using 1 billion web-curated personas: A paper on scaling synthetic data creation leverages diverse perspectives within a large language model to generate data from 1 billion personas curated from web data. (/r/MachineLearning)

AI Model Releases and Improvements

  • Salesforce's "tiny giant" xLAM-1b model surpasses GPT 3.5 in function calling: Salesforce released xLAM-1b, a 1 billion parameter model that achieves 70% accuracy in function calling, surpassing GPT 3.5. (/r/LocalLLaMA)
  • Phi-3 Mini (June) with function calling: Rubra AI released an updated Phi-3 Mini model in June with function calling capabilities. It is competitive with Mistral-7b v3 and outperforms the base Phi-3 Mini. (/r/LocalLLaMA)

AI Applications and Demonstrations

  • AI-enhanced image upscaling reveals historical anomalies: A post demonstrates how advanced image upscaling techniques can reveal hidden details in historical images, potentially challenging established narratives. (/r/StableDiffusion)
  • AI-generated space port view: An image showcasing a futuristic hotel room view of a space port demonstrates the creative potential of AI image generation. (/r/StableDiffusion)
  • Adobe Firefly Video: Adobe introduced Firefly Video, described as "the first commercially safe video generation model", supporting text-to-video and image-to-video generation with a focus on prompt coherence. (/r/singularity)

AI in Warfare and Defense

  • AI improves Ukrainian drone effectiveness: A report claims that AI has raised Ukrainian drone kill rates to 80%, highlighting the increasing role of AI in modern warfare. (/r/singularity)

Philosophical and Societal Implications of AI

  • Questioning human reasoning abilities: A post asks if anyone has written a paper on "Can humans actually reason or are they just stochastic parrots?", suggesting that humans might fail reasoning tests in ways similar to LLMs. (/r/singularity)
  • Predictions of rapid AI-driven societal changes: Multiple posts discuss the potential for rapid, transformative changes due to AI advancements, with some predicting significant societal upheaval and others offering more speculative timelines for AI development. (/r/singularity)

AI-Generated Art and Media

  • Blending real-world and anime aesthetics: A post showcases AI-generated images that seamlessly blend realistic and anime-style elements, demonstrating advanced style transfer capabilities. (/r/StableDiffusion)

AI Discord Recap

A summary of Summaries of Summaries by O1-preview

Theme 1: Gradient Accumulation Bug Fix Rocks AI Training

  • Eleuther Fixes Gradient Accumulation Bug, Stabilizes Training: Eleuther released a fix for a bug causing divergent training losses with large gradient accumulation sizes, linked to cross entropy loss normalization (see the sketch after this list). Users are urged to update their libraries to benefit from this improvement.
  • Unsloth AI Boosts Accuracy by 10x with Gradient Fix: Unsloth AI announced a fix for a gradient accumulation bug, making training-loss calculations over 10x more accurate. New notebooks demonstrate the fix's impact, and users are encouraged to update Unsloth.
  • Nous Research Celebrates Gradient Fix from Unsloth AI: The Nous Research community discussed the gradient accumulation fix, highlighting its significance in improving training consistency across setups and enhancing model reliability.
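
To make the underlying issue concrete, here is a minimal sketch of the bug class (an illustration, not the exact Eleuther/Unsloth patch): averaging cross entropy per micro-batch and then averaging those averages weights micro-batches with different token counts unequally, so the accumulated loss no longer matches what one large batch would give. Normalizing once by the total token count across all accumulation steps restores the equivalence.

```python
import torch
import torch.nn.functional as F

def accumulate_naive(logits_list, labels_list):
    # Buggy pattern: mean loss per micro-batch, then a mean of means.
    # Micro-batches with fewer (non-padded) tokens get disproportionate weight.
    losses = [F.cross_entropy(lg, lb, ignore_index=-100) for lg, lb in zip(logits_list, labels_list)]
    return sum(losses) / len(losses)

def accumulate_fixed(logits_list, labels_list):
    # Fixed pattern: sum the un-reduced losses, then divide once by the total
    # number of non-ignored tokens across every accumulation step.
    total_loss, total_tokens = 0.0, 0
    for lg, lb in zip(logits_list, labels_list):
        total_loss = total_loss + F.cross_entropy(lg, lb, ignore_index=-100, reduction="sum")
        total_tokens = total_tokens + (lb != -100).sum()
    return total_loss / total_tokens

# Two micro-batches with different numbers of valid tokens diverge under the naive scheme.
logits_list = [torch.randn(8, 100), torch.randn(3, 100)]
labels_list = [torch.randint(0, 100, (8,)), torch.randint(0, 100, (3,))]
print(accumulate_naive(logits_list, labels_list), accumulate_fixed(logits_list, labels_list))
```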

Theme 2: SageAttention Speeds Up Inference, Engineers Excited

  • SageAttention Promises 2.7x Faster Model Inference: The paper SageAttention introduces a quantization method boosting operations per second over FlashAttention2 and xformers by 2.1x to 2.7x while maintaining accuracy (a toy sketch of the idea follows this list). Researchers are excited about the potential efficiency gains in transformer models.
  • Training with SageAttention Hits a Snag: Attempts to use SageAttention for training led to divergence issues, underscoring that it's currently designed for inference acceleration. Discussions reveal challenges in adapting it beyond its intended purpose.
  • LM Studio Eyes SageAttention for Performance Leap: Community members highlight that integrating SageAttention into tools like llama.cpp and MLX could potentially double token processing speed. If implemented, it would mark a significant performance leap for transformer models.
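
For intuition about the mechanism, here is a toy sketch of the paper's stated idea (not its CUDA kernels, and with coarser per-tensor scales than the per-block scheme it describes): smooth K by subtracting its mean across tokens (which leaves the softmax unchanged), quantize Q and K to INT8 for the QKᵀ matmul, then dequantize and keep the softmax and P·V steps in higher precision.

```python
import torch

def int8_quantize(x):
    # Per-tensor symmetric quantization; the paper uses finer-grained, per-block scales.
    scale = x.abs().amax() / 127.0
    return (x / scale).round().clamp(-127, 127).to(torch.int8), scale

def sage_like_attention(q, k, v):
    # Subtracting the mean key shifts every score for a given query by the same
    # constant, so the softmax output is unchanged, but K becomes easier to quantize.
    k = k - k.mean(dim=-2, keepdim=True)
    q_i8, q_s = int8_quantize(q)
    k_i8, k_s = int8_quantize(k)
    # The real kernel runs this matmul on INT8 tensor cores; emulated in float here.
    scores = (q_i8.float() @ k_i8.float().transpose(-1, -2)) * (q_s * k_s)
    probs = torch.softmax(scores / q.shape[-1] ** 0.5, dim=-1)
    return probs @ v  # softmax and P·V stay in higher precision

q, k, v = (torch.randn(1, 512, 64) for _ in range(3))
print(sage_like_attention(q, k, v).shape)
```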

Theme 3: AI Model Components Under Fire—QKNorm and ReLU² Scrutinized

  • QKNorm Gets the Cold Shoulder in Larger Models: Testing showed QKNorm underperformed under tight baselines, leading to "weak attention" in larger models and skepticism about its design merits.
  • ReLU²'s Meager 4% Gain Leaves Engineers Unimpressed: ReLU² offered only a 4% improvement over functions like GELU, casting doubt on its practicality for scaling large models and igniting debate over activation function efficacy.
  • Researchers Call Out Misleading Performance Claims: Participants noted that some claimed performance improvements might mask instability issues rather than represent genuine advancements, urging critical evaluation of such assertions.

Theme 4: AI Industry Shaken by Talent Moves and Controversies

  • Microsoft AI Star Sebastien Bubeck Joins OpenAI: Sebastien Bubeck's move from Microsoft to OpenAI is causing ripples in the AI community. Discussions focus on talent dynamics and the potential impact on AI research directions.
  • Controversy Erupts Over Bubeck's 'Sparks of AGI' Paper: Community members express mixed feelings about Bubeck's Sparks of AGI paper, with critiques targeting its hyperbolic positioning and questioning its implications for defining AGI.

Theme 5: LLMs' Reasoning Abilities Under Question

  • Apple Study Exposes Cracks in LLMs' Logical Reasoning: An Apple research study reveals that LLMs rely on probabilistic pattern matching, leading to logical reasoning errors when benchmarks change. Engineers discuss the necessity of human comparison baselines and precise definitions of "reasoning."
  • OpenAI Community Debates LLMs' Reasoning Limitations: Members highlight that LLMs struggle with genuine logical inference, causing "catastrophic" failures in tasks requiring true reasoning. The study prompts a reevaluation of how reasoning is defined and assessed in AI models.

PART 1: High level Discord summaries

Eleuther Discord

  • Gradient Accumulation Bug Fix Unveiled: A fix is now live for a bug that caused divergent training losses with large gradient accumulation sizes, directly tied to cross entropy loss normalization. Users are encouraged to read more in this blog post and update their libraries.
    • This issue was raised by multiple members, highlighting the importance of aligning the normalization strategy to ensure stable training loss curves.
  • QKNorm's Effectiveness Questioned: Testing revealed that QKNorm underperformed under tight baselines, leading to 'weak attention' in larger models, creating skepticism around its design. Interestingly, its use in the OLMoE project suggests mixed views on its potential.
    • Participants noted the need for further investigation into its implications for larger architectures, especially as attention mechanisms become crucial.
  • ReLU²'s Gains in Question: ReLU² only yielded a modest 4% improvement compared to competitors like GELU, raising doubts about its real-world utility in scaling. This nuanced performance analysis sparks a broader discussion about the activation functions used in large models.
    • The contrast in performance urges engineers to consider both minor enhancements and computational efficiency before adopting new activation methods.
  • Fine-Tuning Libraries under Review: Concerns arose about the limitations of existing fine-tuning libraries, like the absence of a non-chat-template structure in torchtune, as members seek improved evaluation methods. The community is eager for libraries that simplify the fine-tuning process without convoluted templates.
    • Discussion emphasized the usability of QuestionAnswerTemplate as a viable alternative for model evaluations, ensuring clearer metrics.
  • Misleading Performance Improvements Scrutinized: Participants noticed that claims of improved performance can often mask instability issues rather than reflect genuine advancements; A/B testing has been cited as a common pitfall. Papers lacking solid baselines are typically deemed less valuable unless they reveal significant performance shifts.
    • Such practices dilute the quality of research findings, making it vital for researchers to critically assess the conditions under which performance improvements are reported.


Unsloth AI (Daniel Han) Discord

  • Gradient Accumulation Fix Improves Training: Unsloth fixed a bug causing diverging training losses in gradient accumulation, boosting accuracy by over 10x. Users should update Unsloth and check new notebooks demonstrating the impact.
    • The fix was highlighted in a tweet from Unsloth AI mentioning significant improvements in training metrics.
  • Launch of INTELLECT-1 Decentralized Model: Prime Intellect introduced INTELLECT-1, a 10-billion-parameter model for collaborative decentralized training. This initiative aims to promote open-source AGI by allowing community contributions.
    • More details are available in their blog post discussing how this model can benefit distributed AI training.
  • SageAttention Promises Faster Model Inference: The paper SageAttention reveals a quantization method that improves operations per second over FlashAttention2 and xformers by 2.1x and 2.7x respectively. The method maintains accuracy across various models.
    • However, efforts to use SageAttention for training showed divergence issues, underscoring its inferential focus rather than being viable for training.
  • Exploring LLM Fine-tuning Processes: Discussions revolved around workflows for fine-tuning LLMs like Llama, highlighting the impact of data formatting on output quality. Emphasis was placed on exploring diverse LLM outputs.
    • Participants considered how effective formatting and efficient data management would enhance model performance.
  • Comparative Analysis of Model Performance: A lively debate emerged surrounding the performance of models like Qwen and Llama, focusing on their applicability to fine-tuning and dataset utilization. Quality over quantity was a common theme.
    • Engagement centered around how specific datasets could yield better fine-tuning results while discussing integration with tools like Deepspeed for improved capabilities.


Perplexity AI Discord

  • Perplexity's Reasoning Feature Faces Inconsistency: Users noted that triggering the new reasoning feature in ProSearch appears random and varies with question complexity, causing inconsistencies in analyses.
    • They observed that the previous reasoning model was more reliable, while the new one has increased instances of hallucinations during information generation.
  • ProSearch App's Frustrating Delays: Many users expressed annoyance over the delay of the ProSearch Mac app, which was initially expected at an earlier date.
    • Additional complaints included issues with missing threads and overall sluggish performance in the application.
  • Adobe's AI Video Model Enhancements: Perplexity AI highlighted Adobe's AI Video Model as a transformative development in video editing, promising advanced features that improve workflows.
    • This innovation is anticipated to significantly enhance content creation speed and accessibility.
  • NASA Successfully Launches Europa Clipper: The NASA Europa Clipper mission has successfully launched, aiming to investigate potential signs of life on Jupiter's moon, Europa.
    • Experts eagerly await findings that may reveal new insights into the moon's subsurface ocean.
  • Chinese Researchers Break RSA Encryption: Recent reports reveal that Chinese researchers have successfully broken RSA encryption, creating major concern within the cybersecurity community.
    • This advancement prompts significant discussions about vulnerabilities in current encryption practices for sensitive data.


aider (Paul Gauthier) Discord

  • Aider's LLM Party with Multiple Instances: Users discussed the feasibility of running multiple Aider instances for larger and smaller tasks, suggesting it should work as long as they don't tamper with the same files.
    • One user humorously branded it as an 'LLM party', highlighting the fun potential of simultaneous LLM operations.
  • API Key Validation Woes: Members reported API validation errors when attempting to configure Aider with the Gemini model, particularly after setting the key in their .env file.
    • One user confirmed that the key worked via the command line, indicating that the issue is likely tied to their scripting setup.
  • Scripting Strategies for Efficient Command Use: There were discussions on scripting Aider commands effectively using Python and command line, emphasizing the need for correct environment loading.
    • A user recounted modifying an example script to implement the Gemini model, but encountered environment variable-related errors.
  • Comparison of Models: Aider vs Sonnet-3.5: Users noted that Sonnet-3.5 outperformed other models like Gemini for non-web development tasks, making it a preferred choice.
    • One user emphasized consistent superior results from Sonnet-3.5, while testing various models for coding tasks.
  • Gemini Integration and Configuration Challenges: There were inquiries regarding proper configuration of the Gemini-1.5 Pro model within Aider, focusing on API key setups.
    • Documentation references were made, yet users continued to face API errors stemming from environmental misconfiguration.


HuggingFace Discord

  • HuggingFace account recovery urgency: A user urgently sought help recovering their hacked and deleted HuggingFace account, advised to email website@huggingface.co for support.
    • Recovery time might take a few days, but members encouraged patience while awaiting responses.
  • AI automation raises job security concerns: Members discussed anxieties over AI's potential to automate jobs in Data Science and ML, emphasizing a hopeful shift towards more creative roles.
    • Comparisons were made to past technological advances that also transformed job structures.
  • Llama 3.2 model's inference speed debate: Running inference on a large dataset with the Llama 3.2 1B model on an A100 GPU took over 14 hours, prompting discussions on efficiency improvements.
    • Members shared their model loading and inference strategies to optimize performance (see the batched-generation sketch after this list).
  • Exciting Flutter development collaborations: A member announced their availability as a Flutter developer for collaboration on AI applications, inviting others to join forces.
    • This call emphasizes a growing need for partnerships in developing AI-focused projects.
  • Gradio 5 makes a splash on Product Hunt: The launch of Gradio 5 was announced on Product Hunt, with a request for community support.
    • Team members encouraged users to engage with the new features and provide feedback to boost visibility.
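
As a rough illustration of the batching strategy that usually dominates such speedups, here is a sketch assuming the Hugging Face transformers library; the Llama 3.2 1B checkpoint name, batch size, and prompts are placeholder assumptions, not details from the discussion.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # assumed checkpoint name
tok = AutoTokenizer.from_pretrained(model_id, padding_side="left")
if tok.pad_token is None:
    tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")
model.eval()

prompts = ["Summarize: ...", "Classify: ..."]  # rows of the dataset (placeholders)
outputs, batch_size = [], 32
for i in range(0, len(prompts), batch_size):   # one generate() call per batch, not per row
    batch = tok(prompts[i:i + batch_size], return_tensors="pt", padding=True).to("cuda")
    with torch.inference_mode():
        gen = model.generate(**batch, max_new_tokens=128, do_sample=False)
    # Left padding aligns prompts at the end, so slicing off the prompt tokens is easy.
    outputs.extend(tok.batch_decode(gen[:, batch["input_ids"].shape[1]:], skip_special_tokens=True))
```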


OpenRouter (Alex Atallah) Discord

  • Hermes 3 Llama 3.1 405B becomes a subscription model: The Hermes 3 Llama 3.1 405B Instruct model is now available for $1.79/month, with a free version accessible at OpenRouter.
    • Don't miss out on this updated pricing structure for powerful AI functionality!
  • Nous Hermes Yi 34B is deprecated: The Nous Hermes Yi 34B model has been deprecated by all service providers, making it no longer available for use.
    • Users are encouraged to transition to alternative models in light of this deprecation.
  • Highlighting the Rankings of AI Models: Users discussed the performance of various AI models, with Llama-3-8b-Instruct and GPT-4o gaining attention for following instructions effectively.
    • Grok 2 mini and Gemini 1.5 Pro were also noted as decent alternatives, while Opus faced some critique for its quirks.
  • Innovative Chatbot Design Techniques: A user proposed creating a hidden AI chatbot that avoids generic refusal messages to insults, suggesting the use of another LLM for filtering.
    • Participants highlighted models like Llama Guard for extra support in managing responses.
  • Issues Reported with Infermatic Provider: A user reported problems with the Infermatic provider as their chats began yielding irrelevant responses unexpectedly.
    • This alerted the community to potential service disruptions that have emerged recently.


Nous Research AI Discord

  • Nous Research Community's Origins: The Nous Research community started on Discord and evolved into a funded tech company focused on AI research and collaboration.
    • Members actively share ideas and work on various AI models and techniques, enhancing engagement and project outcomes.
  • Gradient Accumulation Bug Fix Released: The UnslothAI team resolved a significant bug in gradient accumulation that caused divergent training losses, improving overall consistency.
    • This fix is now available to users, streamlining training processes and enhancing model reliability.
  • Zamba2-7B Model Performance Explored: Zyphra announced the launch of the Zamba2-7B model, claiming it surpasses Llama3 and Mistral in performance and quality for consumer GPUs.
    • Details on capabilities are outlined in a recent blog post, which provides insights into its deployment.
  • Model Collapse due to Synthetic Data: Research shows that even 1% synthetic data in training sets can lead to significant model collapse, impacting performance of large models.
    • This underlines the risks involved in training large models like ChatGPT, suggesting current practices may require reevaluation.
  • Efficiency of SageAttention Method: SageAttention introduces a quantization method that boosts efficiency in attention mechanisms, outperforming FlashAttention2 by 2.1 to 2.7 times.
    • This method ensures high accuracy while significantly reducing computational complexity, making it vital for inference acceleration.


GPU MODE Discord

  • Lux-AI Challenge Invites Collaboration: Members are encouraged to contribute to the Lux-AI Challenge's GitHub repository to foster team collaboration.
    • There is a call for interested individuals to team up for the Lux-AI project, showcasing community engagement in contributing to the challenge.
  • Triton Struggles with Jetson Builds: Users reported issues building triton-lang on the Jetson Orin AGX 64GB, where CUDA mistook Unified Memory for AMD GPU. A rebuild is underway, with hopes that LLVM support is to blame.
    • Discussions revealed that users should check LLVM support for ARM on related issues.
  • Learn PyTorch for Deep Learning Now Available: A new course, Learn PyTorch for Deep Learning: Zero to Mastery, has been shared as a top resource for mastering PyTorch fundamentals.
    • The course format blends video insights with an accessible online book, offering a structured approach to learning.
  • Ollama Performance Hits Raspberry Pi: Ollama runs llama3.2 at 5.32 tokens/s on the Raspberry Pi 5, while llama3.1 struggles at 1.5 tokens/s.
    • Discussion touched on the integration of an eGPU with a 2080, indicating a feasible upgrade path for the Raspberry Pi systems.
  • WebGPU Lacks CUDA Interaction: Clarifications were made that WebGPU does not interact with CUDA, meaning developers must rely on other APIs moving forward.
    • Moreover, WebGPU's functioning depends on specific graphics APIs defined by the operating system, such as Vulkan and DirectX.


Latent Space Discord

  • Real-Time STT Engines Set New Standards: Gladia's new Real-Time STT engine boasts < 300 ms latency, supporting over 100 languages and code-switching, backed by a $16M Series A funding. Another competitor's engine claims a 90ms inference time with multi-language support, escalating the competition in transcription tech.
    • As members discuss, this improvement positions these engines as viable choices for a range of applications in real-time communication.
  • Linear Attention Models Promise Efficiency Gains: The implementation of linear attention models within the Llama 3.1 family shows potential for significant efficiency, making it less resource-intensive. Conversations revealed challenges when attempting to transform >50% of transformer attention layers into linear versions.
    • Participants seem hopeful about this shift, emphasizing that it aligns with current resource optimization trends in machine learning.
  • AI as the New Building Material: A blog post compares AI's integration into industries to historical shifts caused by plastics, positing AI as a revolutionary material for modern design. The discussion centered around how previous material ages redefined production and architecture.
    • Participants expressed excitement for AI's growing role, echoing thoughts on how software is now more pivotal than physical materials.
  • Funding Announcements Ignite Curiosity: $65M Series B funding for DecagonAI stirred interest regarding trends in AI startup investments, especially in application layers instead of core models. Prominent investors included Bain Capital Ventures and Accel, highlighting a robust market for AI solutions.
    • Members noted that such fundraising endeavors reflect a shift in focus towards practical AI implementations, shedding light on current market dynamics.
  • Debate on Outsourcing Documentation: There's a vibrant discussion about the possibilities of outsourcing documentation for AI and open-source projects, weighing pros and cons of using LLMs vs. human writers. Community members reflect on how this could impact quality and accessibility.
    • The conversation raises questions on the balance between cost-effectiveness and thorough documentation, indicating a vital consideration in project management.


LlamaIndex Discord

  • Llama 3.1-70B Integration Faces Truncation Trouble: An integration of Llama 3.1-70B is returning truncated responses, consistently providing only 5 skills when a list of 20 software engineering skills is requested, due to hitting the max_tokens limit.
    • One user noted, 'Responses end with finish_reason: max_tokens' despite parameter adjustments.
  • Qdrant Node Addition Triggers Errors: A member encountered an error when adding new nodes to the Qdrant index, without prior reports of such issues, indicating a potential setup conflict.
    • Another user suggested that their own successful additions imply possible misconfigurations in the first user's setup.
  • Build a Financial Agent with Claude 3.5: You can create a Financial Agent powered by Claude 3.5 Sonnet using APIs for stock prices and company data shared by @financial_mod.
    • According to Hanane Dupouy, this agent provides diverse insights, including income statements and comprehensive company information.
  • PineconeVectorStore Failing in ComposableMemory: Members expressed frustration with PineconeVectorStore in SimpleComposableMemory, receiving a 'Namespace not found' error message.
    • Another user speculated set-up issues might be causing these persistent errors.
  • Performance Lag in Neo4jPropertyGraphStore Initialization: A significant delay in initializing the Neo4jPropertyGraphStore has been reported, with schema generation taking excessively long on larger graphs.
    • This issue may be exacerbated by not using async operations, corroborated by a related GitHub issue.


OpenAI Discord

  • LLMs Show Cracks in Reasoning: A recent Apple study reveals that LLMs utilize probabilistic pattern matching in mathematical reasoning, leading to errors when benchmarks shift.
    • Members expressed the necessity for baseline human comparisons and highlighted the ambiguous definitions of reasoning according to the study.
  • Swarm Library Needs Better Testing: Users examining the Swarm library identified difficulties in distinguishing whether tasks are executed by agents or the base LLM, underlining the need for robust tests.
    • Concerns about Swarm's non-production-ready status arose, along with mentions of alternatives like Swarm.js.
  • Confusion Over GPT Voice Features: Discussions emerged regarding the rollout of the advanced GPT voice feature, with no definitive announcements from OpenAI yet on its functionality.
    • Skepticism grew about potential updates due to past versions being unsupported.
  • Issues with Custom GPT Updates: A member's custom GPT, built from 300 pages of materials, remained in 'Update Pendings' for over a week after splitting the PDFs into six smaller files.
    • Despite the PDFs being acknowledged, the bot often redirected queries back to code, rather than answering directly from the documents.
  • Troubles with PDF Processing: Another member encountered performance issues when testing 1 PDF in GPT-4, indicating deeper problems with PDF content processing affecting responsiveness.
    • This suggests that there may be systemic challenges in how GPT interacts with PDF inputs.


LM Studio Discord

  • LM Studio configuration options need clarity: A member proposed that configuration details be shared in formats other than screenshots, noting that future blog posts will incorporate these changes.
    • This suggestion aims to enhance usability, making it easier for users to comprehend settings and optimizations.
  • M2 Studio excels with large models: Users are praising the M2 Studio equipped with 192 GB RAM for its impressive performance with Mistral's large 128K context model, proving ideal for specific applications.
    • One user's remark that it's 'such a good model for my use case' underscores its value, possibly attracting more users to high-RAM setups.
  • Tweaking GPUs for performance boosts: One user recommended Under-volting (UV) GPUs using Afterburner, stating that even a 100mV adjustment can notably enhance performance.
    • They urged peers to check YouTube for targeted tutorials, facilitating better performance tuning across setups.
  • Stellar TPS performance from Llama 8B: Some users reported achieving 30 TPS with Llama 8B on various GPUs, with expectations for 150+ TPS driving discussions on necessary upgrades.
    • Factors like model size and quantization significantly influence performance, especially when comparing setups equipped with advanced tensor cores versus older GPUs.
  • SageAttention promises efficiency gains: The recent paper on SageAttention highlights outstanding efficiency improvements in attention mechanisms, with significant implications for tools like llama.cpp and MLX.
    • If implemented, it could potentially double token processing speed, marking a leap in performance for Transformer models.


Cohere Discord

  • Cohere Connector misunderstands inputs: Users reported that the Cohere Connector triggers a search even upon a simple 'hi', prompting inquiries about control features to limit unnecessary interactions.
    • Is there a way to refine its functionality? The community is actively seeking solutions to optimize this.
  • API Token Limits raise concerns: A discrepancy was raised regarding the Cohere API token limits, noting a 10k monthly cap versus 5 million tokens mentioned in chat, leading to questions about potential overage costs.
    • Will exceeding the 10k cap result in billing? Clarity is sought by members on this critical point.
  • Google Connector not performing: Multiple users are facing issues with the Google Connector, which is failing to operate correctly, sparking a troubleshooting session among users.
    • Share any breakthroughs! The community is encouraged to support one another in resolving this connectivity issue.
  • Command Model pricing clarified: Discussion clarified that there are no fees for the web-search connector, but charges apply to results sent to the Command input context, potentially impacting users' budget.
    • This distinction highlights the intricacies of API usage costs and encourages careful monitoring.
  • OrionChat aggregates AI models: A member launched OrionChat, a web interface enabling users to interact with various AI models from Cohere, OpenAI, and others seamlessly in one place, available at this link.
    • The initiative aims to consolidate conversations and facilitate comparisons across models, fostering user feedback for further refinement.


Stability.ai (Stable Diffusion) Discord

  • WordPress Plugin Development Seeks Feedback: A member is developing multiple WordPress plugins for text generation and txt2img servers, eagerly seeking community feedback and testing.
    • With nobody responding so far, the member voiced significant frustration with community engagement in AI Discord servers.
  • CORS Issues Frustrate Stable Diffusion Setup: Users discussed persistent CORS errors faced while using SSL with Stable Diffusion servers on a reverse proxy setup.
    • A tech-savvy member emphasized the need for the webserver and Stable Diffusion server to run on the same machine for full functionality.
  • Searching for Active AI Communities on Discord: A member expressed disappointment in their AI Discord server's lack of activity, seeking suggestions for more vibrant communities related to comfyUI and A1111.
    • Unanswered inquiries about plugins point to a broader need for better engagement within the community.
  • Exploring Base Models for Text Generation: A user inquired about base models that enhance text generation during style transfer, specifically mentioning i2i and SD1.5.
    • Another member recommended trying flux or SD3, while cautioning that SD3 struggles with human representation.
  • Techniques for Creating Stylized Photos: Discussion centered around methods for producing stylized photos, with several members suggesting the use of ControlNets.
    • Creative approaches were shared, including techniques outlined here for various artistic styles, such as pin-up.


tinygrad (George Hotz) Discord

  • Tinygrad's .dot accuracy compared against NumPy: A detailed comparison showed that Tinygrad's .dot operations exhibit accuracy drops for larger matrices, reaching ±0.001 differences against NumPy for dimensions like M=16384, N=8192, K=1280.
    • Conversely, smaller matrices (M=10, N=4, K=5) only had minimal deviations, not exceeding ±0.000001.
  • VIZ UI improvements take center stage: A discussion revolved around Issue #7067, highlighting sought-after enhancements to the VIZ UI, especially related to autoscrolling features.
    • Proposals included resizing and collapsible sidebars, aiming to improve user experience.
  • George Hotz vows to rival PyTorch's performance: George posited that beating PyTorch's performance on NVIDIA GPUs would be monumental for Tinygrad, marking a turning point for the project.
    • 'All we have to do is beat PyTorch in perf and we win,' he stated, underscoring the stakes involved.
  • Unpacking TD-MPC implementation in Tinygrad: One user shared the exciting news about successfully implementing TD-MPC learning in Tinygrad and plans to test it on hardware.
    • Links to the GitHub repository were shared, detailing necessary hardware requirements.
  • Methods for disabling gradient calculations: Users debated effective ways to disable gradients, advocating for Tensor.no_grad while suggesting alternatives like with Tensor.test(): as a modern practice.
    • The conversation aimed to refine gradient control methods within the community.


Modular (Mojo 🔥) Discord

  • Resolving Library Installation Issues: A user found that missing libraries could be installed using sudo apt-get install libtinfo-dev, assisting others with similar installation issues.
    • This finding emphasizes the role of community knowledge sharing to tackle common problems effectively.
  • Addressing Custom stdlib Challenges: Users faced challenges running a modified version of stdlib, where original implementations persisted despite following build instructions.
    • A workaround involving adjustments to the build process was proposed to address these ongoing issues.
  • Seeking New Image Hashing Algorithms: Questions arose regarding the relevance of older image hashing algorithms such as pHash, with calls for recommendations on advanced alternatives.
    • The community's exploration showcases an eagerness to adopt cutting-edge techniques as technology evolves.
  • Discussing Memory Management Strategies: A premature destruction of a struct instance during an assertion call raised concerns about memory management in Mojo.
    • Suggestions included creating a getter method to safely access struct members, reducing risks of early destruction.
  • Collaborative Bug Reporting Success: A user reported a string interpolation issue that was confirmed to be fixed in the latest version of Mojo.
    • This instance highlights the effectiveness of community collaboration in identifying and resolving bugs swiftly.


Interconnects (Nathan Lambert) Discord

  • Sebastien Bubeck joins OpenAI: Microsoft's star AI researcher, Sebastien Bubeck, is making waves by moving to OpenAI, prompting discussions on talent dynamics in AI.
    • This move was first reported in an article from The Information.
  • o1-turbo-mini impresses in benchmarks: Buzz surrounds the performance of o1-turbo-mini, showcasing suspiciously strong results that have led to a mix of skepticism and humor among engineers.
    • Community members noted the amusing potential to poke fun at an overly online crowd reacting to this news.
  • Doomsday Clock for AGI stirs controversy: A Doomsday Clock launched by a Saudi-backed Swiss business school claims to warn against 'uncontrolled general intelligence', a framing community members criticized as outdated.
    • Creator Michael Wade argues it's absurd to liken software like Excel to the threats posed by AGI, reflecting historical fears rather than contemporary relevance.
  • AI2 seeks Research Interns for OLMo: AI2 announced openings for Research Interns in the OLMo project, aimed at enhancing natural language processing and machine learning.
    • This 12-week internship in Seattle offers competitive compensation between $86,520 and $123,600, focusing on impactful research initiatives.
  • OpenAI's impact on the legal field: Discussion highlights OpenAI's role in creating favorable conditions for lawyers, linking AI advancements to evolving legal jobs.
    • This underscores the growing interplay between AI technology and practical applications in the legal domain.


LangChain AI Discord

  • Framework Selection is a Nightmare!: Members expressed frustration about the constant shifting among frameworks like Langchain, Langflow, and Langgraph, making finalizing a production choice difficult.
    • One noted that their entire codebase has transitioned to Langchain LCEL, highlighting the chaos surrounding these frameworks.
  • Langgraph Deployment on Private Cloud: A member inquired about deploying a Langgraph application on their cloud outside of the US or EU, seeking community insights.
    • While there was no direct response, this inquiry sparked interest in regional application hosting.
  • Debate on dspy vs. Langchain: Interest arose around whether dspy would dominate over Langchain and other frameworks or if they would maintain relevance.
    • This reflects uncertainty in the community about the future landscape of AI frameworks.
  • Acknowledgment of Langsmith's Utility: One member suggested Langsmith is useful for tracing, emphasizing its importance among shifting frameworks.
    • This led to recommendations for the Langchain Academy course on Langgraph to sharpen related skills.
  • Clarification on Langflow's Affiliation: A user clarified that LangFlow is not an offering of LangChain, addressing confusion among members about related tools.
    • This distinction may help align understanding within the community regarding the various discussed frameworks.


LLM Agents (Berkeley MOOC) Discord

  • LLM Agents MOOC provides all course details online: All the details on labs and assignments can be found on the course website, and participants are encouraged to check it for updates.
    • To join, prospective students should fill in this form and engage with the community via the LLM Agents Discord for real-time support.
  • Test-time compute scaling law observed: Members discussed the broader impact of the 'test-time compute' scaling law, linking it to earlier laws affecting the GPT family, supported by this paper.
    • Another document relevant to this discussion was also shared, found here.
  • AI-Powered Search book emerges as essential: A member recommends this book as a critical resource for the next few years in AI-powered search technologies, likely impacting practitioners and researchers.
    • They expect its insights to be foundational for AI studies across various industries.
  • Lecture video quality concerns raised: One member noted the necessity for improved video quality in lecture uploads, stating that 720p is the highest available for lecture 6, making it hard to read code.
    • This concern indicates a demand for more accessible learning materials within the course.
  • Exploring reasoning and planning in LLMs: A member sought insights on how LLMs and agents work with reasoning, planning, and identifying tools rather than just generating text.
    • They expressed interest in further lecture coverage on planning and tool use to deepen understanding of LLM applications.


OpenInterpreter Discord

  • Open Interpreter hits π release: A member announced a new version update of Open Interpreter, accessible via pip install --upgrade open-interpreter, marking this as a significant π release with notable enhancements.
    • This tweet by Mike Bird shared the improvements and generated buzz around its capabilities.
  • Hume AI impresses, Oi takes the stage: A user recounted how the Hume AI model exceeded expectations, stating it works 'almost too well', prompting closer scrutiny of its performance.
    • The conversation shifted focus to the Oi model, suggesting active experimentation with various AI frameworks.
  • Play 3.0 mini boosts Text-To-Speech: Play.ht unveiled Play 3.0 mini, a Text-To-Speech model that offers improved speed and accuracy across multiple languages while being cost-effective.
    • They invited users to test it out on the playground and share feedback on the enhancements.
  • Think-on-Graph calls for collaborators: The Think-on-Graph GitHub repository is now live, inviting researchers interested in collaborating in Shenzhen to check it out here.
    • The project includes an open invitation for contact via email for those wanting to contribute and be part of the research team.
  • Watch video on AI advancements: A user shared a YouTube video that touches on recent advancements formed around AI technologies.
    • Details were scant, so viewers were encouraged to watch it directly to glean insights from the content presented.


DSPy Discord

  • Curious about Loom Video Insights: A member shared a Loom video, likely containing insights relevant to ongoing discussions, though details were sparse.
    • The video piqued members' interest, prompting them to explore its content for valuable information.
  • Contextual Embeddings Resources Roll In: A member shared a Google Colab and a YouTube video titled 'Contextual Retrieval with Any LLM,' focused on implementing contextual embeddings.
    • The video aims to streamline the implementation of contextual retrieval strategies from Anthropic for various LLMs.
  • RAG Mechanics: Clarifying Chunking Process: Members discussed the challenges of adding whole documents to prompts without exceeding token limits, highlighting the chunking process integral to RAG (Retrieval-Augmented Generation).
    • It was clarified that RAG utilizes similarity search to include only the most relevant chunks, ensuring compliance with token limits (a minimal retrieval sketch follows this list).
  • DSPy Integration into GPT-O1+ Status Check: One member inquired about the progress of integrating DSPy into the GPT-O1+ system, anticipating updates on the development.
    • However, the details of this integration remain unaddressed in the discussions.
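
To make the chunking-and-retrieval flow concrete, here is a minimal sketch. It assumes the sentence-transformers library and an arbitrary fixed chunk size; the embedding model, file name, and query are placeholders, and this is not DSPy- or Anthropic-specific.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

def chunk(text, size=500, overlap=50):
    # Naive fixed-size character chunking; production systems often split on structure.
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
document = open("doc.txt").read()                   # placeholder document
chunks = chunk(document)
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def retrieve(query, k=3):
    # Cosine similarity reduces to a dot product on normalized embeddings.
    q = embedder.encode([query], normalize_embeddings=True)[0]
    top = np.argsort(chunk_vecs @ q)[::-1][:k]
    return [chunks[i] for i in top]

# Only the top-k chunks go into the prompt, keeping it under the token limit.
context = "\n\n".join(retrieve("What does the report say about latency?"))
```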


Torchtune Discord

  • ICLR Reviews are finally out!: The long-awaited review papers for ICLR have been released, prompting excitement among members eager to dive in.
    • One member noted it will take time to process their assigned review.
  • Study on Continuous Pre-training and Instruction Fine-tuning: A recent paper investigates the relationship between continuous pre-training and instruction fine-tuning for Large Language Models, emphasizing the need for models to stay updated with the latest data.
    • It raises the question of which model should undergo this pre-training for maintaining instruction-following abilities.
  • Model Merging Approach Critique: A member questioned the novelty of the approach in the paper, suggesting it resembles long-established methods of model merging.
    • This sparked a discussion about the relevance and originality of the proposed techniques.


LAION Discord

  • Inquiry on LAION-2B Dataset and MSCOCO Overlap: A member inquired about whether the LAION-2B dataset contains images from MSCOCO (COCO2014 or COCO2017), questioning the potential data overlap.
    • The inquiry highlighted the mention in the paper regarding data overlap, with a request for further details on the techniques employed to verify this issue.
  • Good Morning and General Greetings: Members exchanged general greetings with one member stating, 'Good morning everyone.' fostering a friendly environment in the chat.
    • Another member casually acknowledged the greeting with 'gm', contributing to a light atmosphere.


Gorilla LLM (Berkeley Function Calling) Discord

  • Decoding Inference Pipeline Mechanics: The inference pipeline in Gorilla LLM executes functions by outputting valid function calls that decod_exec can interpret, signaling turn completion when it outputs nothing or an un-decodable response.
    • This automatic signaling indicates when the model has finished its task, enhancing interaction efficiency (a minimal sketch of the loop follows this list).
  • Model's Output Stop Signals: A member underscored the importance of the model determining when to cease function calls, suggesting it can signal turn end by outputting nothing.
    • This flexibility becomes crucial for maintaining fluid user interaction in various scenarios.
  • Weather Inquiry Demonstrates Function Calls: An illustrative example showed the model handling a weather query using function calls like get_coordinate and get_weather, showcasing its data retrieval process.
    • The session concluded when the model's post-data output couldn't be decoded, effectively ending that turn.
  • Function Call Output Variability Explored: The model's approach to function call outputs allows it to stop or extend interactions creatively, including opting not to output anything at all.
    • This variability highlights the diverse techniques prompting models utilize to adapt to user queries.
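
For readers unfamiliar with this kind of multi-turn executor, here is a minimal sketch of the loop described above. All helper names (model_step, decode_calls, and the stub tools) are hypothetical stand-ins, not Gorilla or BFCL APIs.

```python
def run_turn(model_step, decode_calls, tools, user_query, max_steps=8):
    """Drive one user turn: execute decoded function calls until the model
    signals completion by emitting nothing or an un-decodable response."""
    history = [{"role": "user", "content": user_query}]
    for _ in range(max_steps):
        output = model_step(history)          # raw model text for this step (hypothetical)
        calls = decode_calls(output)          # -> list of (name, kwargs), or None if undecodable
        if not calls:                         # empty or undecodable output ends the turn
            return output
        for name, kwargs in calls:
            result = tools[name](**kwargs)    # e.g. get_coordinate(...), then get_weather(...)
            history.append({"role": "tool", "name": name, "content": str(result)})
    return None  # safety cap on runaway loops

# Usage with stub tools mirroring the weather example above:
tools = {"get_coordinate": lambda city: (48.1, 11.6), "get_weather": lambda lat, lon: "12°C, clear"}
```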


LLM Finetuning (Hamel + Dan) Discord

  • Appreciation for LLM Finetuning Help: A user expressed gratitude towards another member for their assistance in LLM Finetuning efforts.
    • This gesture highlights the collaborative environment within the community, showcasing the shared knowledge and support for technical challenges.
  • Contribution Acknowledgment: Member cyberg0285 thanked another community member by tag for their contributions, indicating a supportive atmosphere.
    • Such acknowledgments foster a sense of community and collaboration among engineers working on complex LLM projects.


OpenAccess AI Collective (axolotl) Discord

  • Discussion on Forwarding Protocols: A member shared an important link regarding forwarding protocols, highlighting their relevance in recent discussions.
    • Here is the forwarded message for reference.
  • Importance of Information Sharing: Another member stressed the need for proper information sharing practices to boost community engagement and streamline communication.
    • They noted that forwarding messages can facilitate quicker responses and clearer communication.


Mozilla AI Discord

  • Launch of AI Stewardship Practice Program: The AI Stewardship Practice Program by MaRS Discovery District offers free slots for a pilot course aimed at positively influencing AI development. More details can be found on the Tech Stewardship website.
    • This microcredential program is designed for researchers, educators, and policymakers, providing an opportunity to engage in AI stewardship practices.
  • Become a Tech Steward: Participants can engage with offerings promoting the goal to bend the arc of technology towards good through this Tech Stewardship initiative. Interested individuals should reply in thread here to join the pilot course valued at 500 CAD.
    • The program aims to cultivate a community of tech stewards dedicated to responsible AI practices and ethical technology use.


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email!

If you enjoyed AInews, please share with a friend! Thanks in advance!
