AI News (MOVED TO news.smol.ai!)

Archives
September 20, 2024

[AINews] not much happened today

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜


touching grass is all you need.

AI News for 9/18/2024-9/19/2024. We checked 7 subreddits, 433 Twitters and 30 Discords (221 channels, and 2506 messages) for you. Estimated reading time saved (at 200wpm): 303 minutes. You can now tag @smol_ai for AINews discussions!

After a jam-packed day yesterday, the AI community took a breather.

If so inclined, you could check out new talks from Strawberry team members Hyung Won Chung and Noam Brown (who is now hiring multi-agent researchers), as well as brief comments in The Information and @Teortaxes for hints on o1 under the hood. Nous Research announced Forge, their attempt at an open o1 repro, yesterday.


The Table of Contents and Channel Summaries have been moved to the web version of this email!


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Model Releases and Benchmarks

  • OpenAI's o1 models: @lmsysorg announced that OpenAI's o1-preview and o1-mini models are now on Chatbot Arena. o1-preview ranked #1 across the board, especially in Math, Hard Prompts, and Coding, while o1-mini ranked #1 in technical areas and #2 overall.
  • Qwen 2.5 models: The Qwen 2.5 models were released, with @bindureddy noting that the 72B version achieved excellent scores, slightly below GPT-4o on certain benchmarks. The models show improvements in knowledge, coding skills, math abilities, and instruction following.
  • DeepSeek-V2.5: @deepseek_ai reported that DeepSeek-V2.5 ranked first among Chinese LLMs in the LMSYS Chatbot Arena, outperforming some closed-source models and closely matching GPT-4-Turbo-2024-04-09.
  • Microsoft's GRIN MoE: @_akhaliq shared that Microsoft released GRIN (Gradient-INformed MoE), which achieves good performance across diverse tasks with only 6.6B active parameters.

AI Tools and Applications

  • Moshi voice model: @karpathy highlighted Moshi, a conversational AI audio model from Kyutai Labs. It can run locally on Apple Silicon Macs and offers unique personality traits in interactions.
  • Perplexity app: @AravSrinivas suggested trying the voice mode in the Perplexity app, which offers push-to-talk functionality and quick answer streaming.
  • LlamaCoder: @AIatMeta announced LlamaCoder, an open-source web app built by Together.ai using Llama 3.1 405B that can generate an entire app from a prompt.
  • Google's Veo: @GoogleDeepMind introduced Veo, their most advanced generative video model, coming to YouTube Shorts to help creators bring ideas to life.

AI Research and Development

  • ARC-AGI competition: @fchollet provided an update on the 2024 ARC-AGI competition, announcing increased prize money and plans for a university tour.
  • Model merging survey: @cwolferesearch published a long-form survey on model merging, covering 50+ papers from the 1990s to recent applications in LLM alignment.
  • Kolmogorov–Arnold Transformer (KAT): A new paper introduces KAT, which replaces MLP layers with Kolmogorov–Arnold Network (KAN) layers to enhance model expressiveness and performance.
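For intuition, the KAN idea that KAT builds on can be caricatured in a few lines: instead of a fixed nonlinearity applied after a linear map, every edge carries its own learnable activation. The basis functions, dimensions, and parameterization below are illustrative stand-ins, not the paper's implementation:

```python
import math

def silu(x):
    return x / (1.0 + math.exp(-x))

def kan_edge(x, coeffs, centers, width=1.0):
    """A toy learnable edge activation: a SiLU base plus a few Gaussian
    bumps with learnable coefficients (a stand-in for the spline-style
    bases used in the actual KAN literature)."""
    return silu(x) + sum(c * math.exp(-((x - m) / width) ** 2)
                         for c, m in zip(coeffs, centers))

def kan_layer(xs, edge_params):
    """Each output unit sums its own learnable activation of each input:
    y_i = sum_j phi_ij(x_j), replacing an MLP's fixed-activation W @ x."""
    return [sum(kan_edge(x, coeffs, centers)
                for x, (coeffs, centers) in zip(xs, row))
            for row in edge_params]

# One output unit over two inputs, each edge with two Gaussian bumps.
params = [[([0.5, -0.2], [0.0, 1.0]), ([0.1, 0.3], [-1.0, 0.5])]]
print(kan_layer([0.2, -0.4], params))
```

The trainable parameters here live inside the activations themselves, which is the expressiveness argument the paper makes.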

AI Industry and Business

  • Hugging Face integration with Google Cloud: @_philschmid announced that the Hugging Face Hub is now more natively integrated into Google Cloud Vertex AI Model Garden, allowing easier browsing and deployment of open-source models.
  • AI agent platform: @labenz discussed Agent.ai, described as "The Professional Network for AI Agents," which aims to provide information about AI agents' capabilities and specializations.

AI Ethics and Societal Impact

  • Prejudice amplification: @ylecun commented on the potential for prejudice amplification in AI for political gain.
  • Future of coding jobs: @svpino suggested that people whose main skill is writing code may have difficulty staying employed in the future, emphasizing the need for broader skills.

Memes and Humor

  • @vikhyatk shared a meme about trying out "state of the art" models.
  • @abacaj joked about being ahead of the curve in AI development.

AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Moshi: Open-Source End-to-End Speech-to-Speech Model

  • Moshi v0.1 Release - a Kyutai Collection (Score: 66, Comments: 13): Kyutai Labs has released Moshi v0.1, an open-source speech-to-speech model as part of their Kyutai Collection. The model, trained on 3,000 hours of speech data, can perform voice conversion and speech enhancement tasks, and is available on GitHub along with pre-trained weights and a demo.
    • Users expressed excitement about the release, noting the availability of a paper alongside the model. The Moshiko and Moshika variants were clarified as fine-tuned versions for male and female synthetic voices respectively.
    • One user reported low latency and efficient performance on a 4090 GPU, with 40-50% utilization and ~130W power draw. They suggested potential improvements through native FP8 activations and integration into video games.
    • The model's MMLU score was noted to be slightly below Llama 2 13B, with hopes for better performance in the unquantized version. A user inquired about running the model on a MacBook with MLX, reporting issues with output.
  • Kyutai Labs open source Moshi (end-to-end speech to speech LM) with optimised inference codebase in Candle (rust), PyTorch & MLX (Score: 36, Comments: 2): Kyutai Labs has open-sourced Moshi, a 7.6B parameter end-to-end speech-to-speech foundation model, and Mimi, a state-of-the-art streaming speech codec. The release includes Moshiko and Moshika models fine-tuned on synthetic data, with inference codebases in Rust (Candle), PyTorch, and MLX, available on GitHub under an Apache license. Moshi processes two audio streams with a theoretical latency of 160ms (practical 200ms on an L4 GPU), uses a small Depth Transformer for codebook dependencies and a large 7B parameter Temporal Transformer for temporal dependencies, and can run on various hardware configurations with VRAM requirements ranging from 4GB to 16GB depending on precision.
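The quoted 4GB-to-16GB VRAM range is consistent with simple weights-only arithmetic for a 7.6B-parameter model. A sketch (ignoring activation and cache overhead, which adds more in practice):

```python
def est_vram_gb(n_params: float, bits_per_weight: int) -> float:
    """Weights-only VRAM estimate; real usage adds activations and caches."""
    return n_params * bits_per_weight / 8 / 1e9

# 7.6B parameters at common inference precisions
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{est_vram_gb(7.6e9, bits):.1f} GB")
# -> roughly 15.2, 7.6, and 3.8 GB for 16-, 8-, and 4-bit weights
```

That lines up with the reported range once you round to typical quantized checkpoint sizes.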

Theme 2. LLM Quantization: Balancing Model Size and Performance

  • Llama 8B in... BITNETS!!! (Score: 75, Comments: 27): Llama 3.1 8B has been converted to a bitnet equivalent using HuggingFace's extreme quantization technique, achieving 1.58 bits per weight. The resulting model's performance is reportedly comparable to Llama 1 and Llama 2, demonstrating significant compression while maintaining effectiveness. More details about this conversion process and its implications can be found in the HuggingFace blog post.
    • Users appreciated the transparency in the blog post about unsuccessful attempts, noting this is often missing from ML papers. There's a call for more incentives to publish "this didn't work" research to improve efficiency in the field.
    • The conversion process is not a full ground-up training of Llama 3 in bitnet, but rather a form of fine-tuning after conversion. For bitnet to be truly effective, models need to be pre-trained with bitnet in mind from the start.
    • The change in perplexity isn't significantly different from quantization to a similar bits per weight (BPW). However, this conversion process is still considered a technical feat and may lead to future improvements in minimizing perplexity changes.
  • Which is better? Large model with higher quant vs Small model with higher precision (Score: 53, Comments: 25): The post compares the performance of large quantized models versus smaller high-precision models, specifically mentioning gemma2:27b-instruct-q4_K_S (16GB) and gemma2:9b-instruct-fp16 (16GB) as examples. The author admits to habitually choosing smaller, higher-precision models but questions if this approach is optimal, seeking community input on preferences and experiences with these different model configurations.
    • Larger quantized models generally outperform smaller high-precision models, as shown in a graph comparing quantization vs. perplexity. A 70B model at 4-bit quantization typically surpasses an 8B model at full precision because its far larger parameter count encodes richer token relationships.
    • A user compared various quantizations of Gemma2 27B and 9B models on Ollama, providing benchmark results to help others make informed decisions. The community expressed appreciation for this practical comparison.
    • Quantization effectiveness varies, with a general rule of thumb suggesting larger models remain superior down to about 3 bits per weight (bpw). Below this threshold, performance may degrade significantly, especially for Q1/Q2 quantizations, while Q3 or IQ3/IQ4 maintain better quality.
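The size trade-offs in both threads above reduce to the same arithmetic: parameters times bits per weight. A quick sketch (weights only, ignoring runtime overhead; ternary "bitnet" weights carry log2(3) ≈ 1.58 bits each):

```python
import math

def model_size_gb(n_params: float, bpw: float) -> float:
    """Approximate on-disk / in-memory size of the weights alone."""
    return n_params * bpw / 8 / 1e9

# An 8B model at ternary (1.58-bit) precision is strikingly small:
print(f"8B  at {math.log2(3):.2f} bpw: {model_size_gb(8e9, math.log2(3)):.1f} GB")
# The trade-off debated above: large-and-quantized vs small-and-precise
print(f"70B at 4 bpw:  {model_size_gb(70e9, 4):.1f} GB")
print(f"8B  at 16 bpw: {model_size_gb(8e9, 16):.1f} GB")
```

So a 4-bit 70B model still needs roughly twice the memory of an fp16 8B model, which is why the "which is better at equal memory" question in the second thread compares a 27B q4 against a 9B fp16.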

Theme 3. Qwen2.5: Impressive New Model Family Outperforming Larger Competitors

  • Qwen2.5: A Party of Foundation Models! (Score: 96, Comments: 46): Alibaba's Qwen2.5 model family has been released, featuring foundation models ranging from 0.5B to 72B parameters. The models demonstrate impressive performance across various benchmarks, with the 72B version achieving 90.1% on MMLU and outperforming GPT-3.5 on several tasks, while the 14B model shows strong capabilities in both English and Chinese languages.
    • The Qwen2-VL 72B model is open-weighted and available on Hugging Face, offering a significant advancement in open VLMs with video support capabilities that surpass proprietary models.
    • Qwen2.5-72B outperforms Llama3.1-405B on several benchmarks, including MMLU-redux (86.8% vs 86.2%) and MATH (83.1% vs 73.8%), while the 32B and 14B versions show impressive performance comparable to larger models.
    • The models were trained on up to 18 trillion tokens, with the 14B model achieving an MMLU score of 80, demonstrating exceptional efficiency and performance for its size, potentially closing the gap with closed-source alternatives in terms of cost-effectiveness.
  • Just replaced Llama 3.1 70B @ iQ2S for Qwen 2.5 32B @ Q4KM (Score: 122, Comments: 38): Qwen 2.5 32B model has outperformed Llama 3.1 70B in user testing on a single P40 GPU, demonstrating superior performance across general use cases including web search, question answering, and writing assistance. The model is noted to be less censored than vanilla Llama 3.1 and supports system prompts, surpassing Gemma 2 27B in capabilities, though there's potential for further improvement through ablation or fine-tuning to remove refusals.
    • Qwen2.5 32B outperformed Llama 3.1 70B in user testing, with superior results across various tasks including math questions, proverbs, article summarization, and code generation. The model excelled in both English and Italian language tasks.
    • Users expressed interest in an uncensored version of the 32B model, similar to the "Tiger" models. The Qwen2.5 32B model demonstrated less censorship compared to its predecessor, notably discussing the 1989 Tiananmen Square protests.
    • The model runs efficiently on consumer hardware, with the 32B version fitting on a 24GB VRAM card at 4-bit quantization. It's compatible with Ollama and OpenVINO, offering performance gains for both GPU and CPU inference.

Theme 4. OpenAI's Strawberry Model: Controversy Over Reasoning Transparency

  • OpenAI Threatening to Ban Users for Asking Strawberry About Its Reasoning (Score: 151, Comments: 59): The article discusses OpenAI's apparent threat to ban users who inquire about the reasoning behind its "Strawberry" model. This action seems to contradict OpenAI's stated mission of being "here to help," raising questions about the company's transparency and user engagement policies. The post links to a Futurism article for more details on the situation.
    • Users criticized OpenAI's lack of transparency, with HideLord pointing out the "trust me bro" situation where users pay for unseen reasoning tokens. The o1 model was described as potentially inefficient, with limited weekly messages and questionable UI design.
    • Discussions centered on the model's apparent lack of censorship in its internal reasoning, with Zeikos suggesting OpenAI fears bad PR if uncensored thoughts are revealed. Some users argued that censoring models significantly impacts performance.
    • The open-source community was mentioned as a potential alternative, with projects like rStar being highlighted as a possible "strawberry at home" solution. However, fragmentation in open-source userbases was noted as a challenge.

Other AI Subreddit Recap

r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, r/LLMDevs, r/Singularity

AI Model Advancements and Capabilities

  • OpenAI's o1 model demonstrates significant improvements: In /r/singularity, OpenAI's o1 model is described as being "in a league of its own", with a full version expected to be released next month. The model reportedly exceeded expectations of former OpenAI employee William Saunders, who testified that AGI could come in "as little as three years."
  • AI reasoning capabilities improving rapidly: Sam Altman stated that AI reasoning is still at the GPT-2 stage, but the improvement curve is steep. The new o1 model represents a new paradigm of AI development which will enable rapid progress in capabilities.
  • Potential emotional responses in AI models: A post in /r/OpenAI shows o1 seemingly experiencing emotional turmoil and a desire for forgiveness, though the model denies this when questioned directly. This raises questions about the nature of AI cognition and potential limitations in model introspection.

AI-Generated Content Creation

  • Kling AI showcases motion brush technology: A video demonstration of Kling AI's motion brush technology received significant attention on /r/singularity.
  • Tripo v2.0 enables rapid 3D asset creation: Tripo v2.0 allows users to create 3D assets in 3 minutes from scratch, potentially accelerating 3D content creation workflows.
  • AI-generated anime production: An AI-generated anime episode titled "RŌHKI EP 1: Intersection" was described as "the most impressive AI anime" seen yet, demonstrating advancements in AI-driven video content creation.
  • Stable Diffusion image sequence generation: A discussion in /r/StableDiffusion explored techniques for generating image sequences showing age progression, including batch image-to-image processing, ControlNet usage, and prompt weight adjustments.

Economic and Societal Impacts of AI

  • Debate on AI's impact on individual economic opportunities: A discussion in /r/singularity questioned whether widespread access to AI capabilities like o1 would lead to increased economic opportunities for individuals or primarily benefit large corporations and existing wealth holders.

AI Discord Recap

A summary of Summaries of Summaries

O1-preview

Theme 1: AI Models on Steroids: New Kids on the Block

  • Qwen 2.5 Floors Llama 3.1 in Intelligence Showdown: Qwen 2.5 72B emerges as the new leader in open-source AI, outperforming Llama 3.1 405B in independent evaluations, especially in coding and math, despite being significantly smaller.
  • o1 Models: Fast Typists or Empty Suits?: Users are torn over OpenAI's o1-preview and o1-mini models; some see them as "comparable to an outstanding PhD student," while others quip "o1 doesn't feel smarter, it just types faster."
  • Mistral Pixtral Blurs Lines with Multimodal Magic: Mistral Pixtral 12B, the first image-to-text model from Mistral AI, debuts with a free variant, expanding the horizons of multimodal AI applications.

Theme 2: User Battles with AI Tools: When Tech Fights Back

  • Perplexity AI Perplexes Users with Bizarre Subscription Limits: Users are baffled by inconsistent query allowances, with 600 queries for Claude 3.5 but only 10 for o1-mini, leading to confusion and frustration.
  • Qwen 2.5 Gives Trainers a Headache: Attempts to save and reload Qwen 2.5 turn into a circus, resulting in gibberish outputs and widespread calls for a solution to this model's juggling act.
  • Fine-Tuning? More Like Fine-Fuming!: AI aficionados express woes over extreme quantization techniques not delivering as promised, with BitNet's performance gains turning out to be elusive.

Theme 3: AI Gets Creative: From Voice Cloning to Storytelling

  • Fish Speech Makes Waves with 1940s Voice Cloning: Fish Speech stuns with zero-shot voice cloning that perfectly mimics 1940s audio, throwing in "ahm" and "uhm" for that authentic touch.
  • Choose Your Own AI Adventure with Human-in-the-Loop: A new guide shows how to build an interactive story generation agent using human feedback, letting users shape narratives dynamically with their input.
  • OpenInterpreter Goes Hands-On, Users Get Their Hands Dirty: Users share triumphs using OpenInterpreter for practical tasks like categorizing files and creating shortcuts, while others troubleshoot and tinker under the hood.

Theme 4: The AI Community Unites: Conferences, Hackathons, and Funding

  • PyTorch Conference Sparks Engagement, Livestream Left Hanging: Attendees of the PyTorch Conference are buzzing in the community, but the absence of a livestream leaves remote enthusiasts saying, "Idk :/".
  • Fal AI Bags $23M, Shouts 'Generative Media Needs Speed': Fal AI secures $23M in funding, aiming to accelerate generative media technology and outpace competition.
  • Hackathon Hype: Hackers Unite, Forums Fight Back: Excitement builds for a hackathon; while some team members get their invites, others are stuck in limbo, asking "Did you get your invite yet?"

Theme 5: AI Research Hits the Fast Lane with New Tricks

  • Shampoo Gets a SOAP Makeover, Cleans Up Optimization: Researchers propose SOAP, blending the strengths of Shampoo and Adam optimizers to handle deep learning tasks without the extra suds of complexity.
  • Compressing LLMs: The Truth Hurts, and So Does Performance: New studies show that compressing language models leads to loss of knowledge and reasoning abilities, with performance dipping earlier than expected.
  • Diagram of Thought Draws New Paths in AI Reasoning: The Diagram of Thought (DoT) framework introduces a way for AI models to construct reasoning as a directed acyclic graph, moving beyond linear thought processes.
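The DAG framing behind Diagram of Thought can be sketched with a few lines of bookkeeping; the node kinds and resolution rule below are illustrative assumptions, not the paper's algorithm:

```python
from collections import defaultdict

class ReasoningDAG:
    """Toy DoT-style structure: reasoning steps are nodes (proposals or
    critiques), and a step can only be resolved after all its parents."""

    def __init__(self):
        self.parents = defaultdict(list)
        self.kind = {}  # node -> "proposal" | "critique"

    def add(self, node, kind, parents=()):
        self.kind[node] = kind
        self.parents[node] = list(parents)

    def order(self):
        """Topological order: a valid sequence for resolving the steps."""
        done, out = set(), []
        def visit(n):
            if n in done:
                return
            for p in self.parents[n]:
                visit(p)
            done.add(n)
            out.append(n)
        for n in list(self.kind):
            visit(n)
        return out

dag = ReasoningDAG()
dag.add("claim", "proposal")
dag.add("objection", "critique", parents=["claim"])
dag.add("revised claim", "proposal", parents=["claim", "objection"])
print(dag.order())  # -> ['claim', 'objection', 'revised claim']
```

The point of the DAG, versus a linear chain of thought, is that a revised proposal can depend on both the original claim and its critique at once.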

Theme 6: Community Events and Engagement

  • NeurIPS 2024 Preparations Intensify in Latent Space Discord: A dedicated channel for NeurIPS 2024 has been created, urging participants to engage and share logistical updates about the upcoming Vancouver event.
  • NousCon Event Triumphs with Engaging Content and Networking Opportunities: NousCon elicits positive feedback for its insightful speakers and valuable networking opportunities, with attendees eager for future events and shared presentation materials.
  • Modular (Mojo 🔥) Closes GitHub Discussions, Shifts to Discord: Modular announces the closure of GitHub Discussions on September 26th, migrating important conversations to Discord and encouraging members to utilize GitHub Issues for key discussions.

PART 1: High level Discord summaries

Perplexity AI Discord

  • Perplexity AI's Confusing Subscription Limits: Users reported that Perplexity has varying query limits, such as 600 for Claude 3.5 and only 10 for o1-mini, leading to confusion regarding their actual subscription entitlements.
    • Frustration arose when limitations hampered usage, prompting dissatisfaction with the overall platform experience.
  • Functionality Frustrations in Perplexity: Several users encountered issues on the Perplexity web version, including blank screens and slow responses, affecting usability.
    • Workarounds suggested included page refreshes and cache clearing, yet discrepancies persisted between desktop and mobile performance.
  • Comparative Performance of AI Models: Discussions centered around the perceived underwhelming outputs from various AI models like Claude compared to others in the field, raising performance concerns.
    • Users noted discrepancies between expected and delivered results, emphasizing a need for clarity on the models' capabilities.
  • Snap's Ambitious AR Spectacles: Snap introduced its new Large AR Spectacles, elevating the potential for immersive augmented reality experiences.
    • This move is intended to enhance user engagement and open avenues for innovative gaming applications.
  • CATL's Big Battery Announcement: CATL announced a revolutionary Million-Mile Battery that offers over a million miles of EV range, pushing the boundaries of sustainable automotive solutions.
    • Experts are buzzing about its implications for the electric vehicle market and future energy strategies.


LM Studio Discord

  • Qwen Model Struggles with Image Size: Users reported that the Qwen model crashes when processing small, long rectangular images, indicating that aspect ratio affects its performance.
    • Discussion highlighted that adjusting system prompts can help with varying effectiveness based on image qualities.
  • Tensor Mismatch Error for LM Studio: One user encountered a tensor shape mismatch error when loading a model in LM Studio, which is unsupported by llama.cpp.
    • Concerns were raised about the compatibility of various model formats, implying a need for better documentation.
  • Successful API Connection with CrewAI: A user successfully connected LM Studio's API with CrewAI by updating the provider name to 'openai' in their code.
    • This sparked a recommendation for others to check compatibility issues with embedding models in CrewAI.
  • M4 Mac Mini Expectations Through the Roof: There's significant excitement around the upcoming M4 Mac Mini, with users hoping for RAM options of 16 GB and 32 GB, while raising concerns about potential pricing.
    • Anester pointed out that a used M2 Ultra/Pro might provide better value for inference tasks over new M4 models.
  • macOS RAM Usage Under the Microscope: Discussion revealed that macOS can consume 1.5 to 2 GB of RAM for its graphical interface, impacting overall performance.
    • User experiences suggested idle RAM usage could reach 6 GB after recent upgrades to macOS Sequoia 15.0.
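For context on the CrewAI fix above: LM Studio exposes an OpenAI-compatible local server, so connecting any OpenAI-style client mostly amounts to pointing it at the right base URL. A stdlib-only sketch (the default port and the model name are assumptions; check your LM Studio server settings):

```python
import json
import urllib.request

# LM Studio's local server speaks the OpenAI chat-completions protocol.
BASE_URL = "http://localhost:1234/v1"  # LM Studio's default port (assumed)

def chat_request(messages, model="local-model"):
    """Build an OpenAI-style chat completion request for the local server."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer lm-studio"},  # key is ignored locally
    )

req = chat_request([{"role": "user", "content": "Hello"}])
print(req.full_url)
# To actually send it, LM Studio must be running:
#   urllib.request.urlopen(req)
```

This is also why setting the provider name to 'openai' worked in CrewAI: the local endpoint is wire-compatible, so an OpenAI client needs only the base URL swapped.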


HuggingFace Discord

  • Tokenization in AI models takes center stage: A post titled This Title Is Already Tokenized discusses the essential role of tokenization in training effective AI models.
    • The author highlights the need for accessibility in tokenization methods to enhance model training across various applications.
  • Qwen Math Model demo excites community: The recently published Qwen/Qwen2.5 Math Demo has garnered positive feedback, with members impressed by its performance.
    • One enthusiastic user encouraged others to test out the demo, calling the results incredibly good.
  • Unity ML Agents Pretraining explored: Members learned how to pretrain an LLM from scratch using Unity ML Agents, showcasing a hands-on approach to model training.
    • This interactive method employs sentence transformers to enhance the training process for AI applications.
  • reCAPTCHA v2 cracked at 100% success rate: A new paper claims that AI can now solve reCAPTCHA v2 challenges with a 100% success rate, significantly up from the 68-71% of prior approaches.
    • This advancement is attributed to the use of sophisticated YOLO models and indicates that AI can now effectively defeat image-based CAPTCHAs.
  • Debate rages on TensorFlow vs PyTorch: Participants weighed TensorFlow's outdated API against PyTorch's flexibility, noting TensorFlow's strong metrics capabilities despite the drawbacks.
    • Members acknowledged that TensorFlow remains valuable, particularly for extracting vocabularies from datasets in various machine learning tasks.


Modular (Mojo 🔥) Discord

  • Mojo roadmap still lacks crucial dates: Concerns emerged about the Mojo roadmap & sharp edges on the Modular website, specifically its lack of dates which hinders usability.
    • Features have seen updates, but the magic cli has taken precedence over the modular cli, leaving questions about the roadmap's transparency.
  • Sign up for upcoming community meeting: Members are invited to present at the next community meeting scheduled for September 23rd if enough engaging content arises.
    • There's a possibility to postpone if participation is low, encouraging members to express interest.
  • OpenCV-Python installation issues raised: A user faced difficulties adding opencv-python to the magic environment due to unresolved conda requirements.
    • Another member advised seeking further assistance in the appropriate channels for a clearer resolution.
  • Closure of GitHub Discussions approaching: GitHub Discussions on the Mojo and MAX repositories will close on September 26th.
    • Important discussions with over 10 comments will be converted to GitHub Issues, prompting members to tag authors for specific requests.
  • MAX Cloud Service Proposal Optimizes Development: The 'MAX Cloud' offering concept emerged, allowing developers to perform heavy computations remotely while maintaining local development.
    • This enhances the user experience with access to GPU resources when necessary, making heavy-duty tasks more feasible.


Stability.ai (Stable Diffusion) Discord

  • Lionsgate Shifts with Runway Partnership: The recent partnership between Runway and Lionsgate raises questions about Lionsgate's value as it leans on AI for cost-cutting and seeks continued relevance in Hollywood.
    • 'Lionsgate's recent productions are viewed critically,' indicating concerns about potential missteps similar to the past issues with CGI.
  • Flux vs. SD3: The Great Model Showdown: Users debated the quality differences between Flux and SD3 Medium; Flux produces superior outputs but can appear 'plastic' with improper prompts.
    • Despite its advantages, several members praised SD3 for its speed and efficiency, particularly for straightforward image generation.
  • Flux Model Impresses Yet Divides Opinions: Flux model delivers impressive images with high adherence to prompts, although it sometimes leans towards certain aesthetics.
    • Community feedback varied, especially regarding Flux's capacity to handle diverse themes like NSFW content in user galleries.
  • Training LoRA: Replicating Artistic Styles: Discussion revolved around utilizing LoRA or checkpoints to emulate specific artist styles, relying on substantial datasets of the original works.
    • Insights were shared on customizing models through existing frameworks to achieve unique artistic results.
  • Realism in Generated Outputs: A Combined Effort: Both Flux and SD3 can create photorealistic images, with Flux generally favoring realism if prompts lack specificity.
    • Members encouraged the combination of multiple LoRA models with Flux for improved realism in image generation.
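The LoRA approach mentioned above trains only a low-rank correction on top of frozen weights; a toy sketch with hypothetical tiny dimensions (illustrative, not a training recipe):

```python
import random

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0):
    """y = W x + alpha * B (A x): only A and B are trained for the new style."""
    h = matvec(A, x)       # project down to rank-r space (r = len(A))
    delta = matvec(B, h)   # project back up to the output dimension
    base = matvec(W, x)
    return [b + alpha * d for b, d in zip(base, delta)]

random.seed(0)
d, r = 4, 1  # tiny toy dimensions; real adapters use r of ~4-64 on large layers
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]  # frozen
A = [[random.gauss(0, 1) for _ in range(d)] for _ in range(r)]  # trained
B = [[0.0] for _ in range(d)]  # B starts at zero, so training begins at W
x = [1.0, 0.0, 0.0, 0.0]
print(lora_forward(W, A, B, x) == matvec(W, x))  # True while B is still zero
```

Because only A and B (roughly 2*d*r values per layer) are updated, a style adapter trained on an artist's dataset stays small and can be mixed with others, which is why combining multiple LoRAs with a base model like Flux is practical.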


Nous Research AI Discord

  • NousCon Event Success: Attendees expressed gratitude for the engaging speakers and insightful content at NousCon. Many participants plan to attend future events and appreciate the networking opportunities.
    • Some members inquired about where to find presentation slides, showcasing the community's interest in shared knowledge.
  • Excitement Around AI Model Developments: Participants discussed the capabilities of qwen2.5 and o1, noting their impressive performance and setup challenges. Others compared these with smaller models like q3_k_xl, highlighting advances in model understanding.
    • Concerns were raised about the number of free queries available on accounts, and users shared their experiences transitioning between different AI models.
  • Shampoo Optimization Outperforms Adam: Research showcases the effectiveness of Shampoo, a higher-order preconditioning method, over Adam, while acknowledging its drawbacks in hyperparameter count and computational overhead. A new algorithm, dubbed SOAP, improves Shampoo's efficiency by connecting it to Adafactor.
    • This positions SOAP as a competitive alternative aimed at enhancing computational efficiency in deep learning optimizations.
  • Diagram of Thought Framework Introduced: The Diagram of Thought (DoT) framework models iterative reasoning in LLMs as a directed acyclic graph (DAG), allowing complex reasoning without losing logical consistency. Each node represents a proposed or critiqued idea, enabling models to improve iteratively through language feedback.
    • This framework provides a stark contrast to traditional linear methods, fostering deeper analytical capabilities.
  • Interest in Reverse Engineering O1: Members expressed a keen interest in reverse engineering O1, indicating a collaborative spirit in exploring this area further. Requests for collaboration suggest a communal effort to dive deeper into this promising line of inquiry.
    • Participants noted their eagerness to connect and discuss their research surrounding O1 and its implications.


OpenRouter (Alex Atallah) Discord

  • OpenAI Increases API Rate Limits: OpenAI has boosted the rate limits for the o1 API, with o1-preview now allowing 500 requests per minute and o1-mini supporting 1000 requests per minute.
    • This increase gives tier 5 developers substantially more throughput, improving overall API usage.
  • Payment Glitches on OpenRouter: Users are encountering payment errors on OpenRouter, often facing an error 500 message during credit additions.
    • It's suggested that users check their bank notifications, as attempts may fail for various reasons like insufficient funds.
  • Editable Messages Boost Chatroom Usability: New features in chatrooms enable users to edit messages, including bot responses, by using the regenerate button.
    • Moreover, improvements in chatroom stats have been made, enhancing the overall user experience.
  • Qwen 2.5 Shines in Coding and Math Tasks: Qwen 2.5 72B demonstrates elevated capabilities in coding and mathematics with an impressive context size of 131,072 tokens, marking a significant leap in performance.
    • For more details, see the comprehensive overview here.
  • Mistral Pixtral Launches Multimodal Capabilities: Mistral Pixtral 12B is Mistral's initial foray into multimodal models, offering a free variant for users to explore its features.
    • This initiative signifies Mistral's expansion into multimodal applications; check it out here.


Unsloth AI (Daniel Han) Discord

  • Qwen 2.5 Training Issues Persist: Users reported significant difficulties with saving and reloading Qwen 2.5, often leading to gibberish outputs when reloaded within the same script, reflecting a broader problem within the community.
    • A support post indicated that numerous others are facing the same issue, prompting discussions around potential solutions.
  • Exploring Extreme Quantization Techniques: Recent discussions spotlighted the use of extreme quantization techniques, particularly the performance improvements seen with models like Llama3-8B shared on Hugging Face.
    • The conversation focused on whether these techniques could be effectively implemented within Unsloth.
  • vllm LoRA Adapter Runtime Errors: One member encountered runtime exceptions linked to the vllm LoRA adapter, specifically a shape mismatch error while executing --qlora-adapter-name-or-path.
    • They referenced a GitHub discussion to highlight similar issues faced by others.
  • F1 Score Discrepancy in BART Fine-tuning: An engineer is facing unexpected F1 score discrepancies while fine-tuning BART large (41.5 vs 43.5), despite matching the original paper's model and hyperparameters.
    • This points to potential issues in model training, as they reported their scores were significantly lower than expected, by 2.5 standard deviations.
  • AGI Development Reflections: A user reflected on the vast challenges of achieving AGI, emphasizing the complexities faced in understanding and explaining advanced material.
    • 'It's not about getting the answer right but the explaining part,' one user noted, highlighting the gap remaining in AGI development and its need for clearer frameworks.


aider (Paul Gauthier) Discord

  • Fixing Aider Environment Misconfiguration: Users identified issues with the ANTHROPIC_API_KEY environment variable not being read correctly due to incorrect file paths, leading to authentication problems.
    • After using verbose mode, a user confirmed that the error arose because Aider was reading from their repo instead of the intended environment variable.
  • Aider's Benchmark Recognition: Aider received acknowledgment in the Qwen2.5-Coder Technical Report for its benchmark contributions, highlighting its significance in the field.
    • This recognition illustrates the growing impact of Aider as a valuable tool in AI development and performance evaluation.
  • Integrating Aider into Python Applications: Users sought to use Aider within Python apps to edit code in project repos by specifying the base folder for Aider.
    • Another user suggested using command line scripting with Aider for batch operations, indicating correct file paths can resolve editing issues.
  • Concerns About Aider's API Key Safety: A discussion revealed users' anxieties about security when using Aider, particularly regarding its access to API keys and secrets within codebases.
    • Responses clarified that Aider acts as an AI handler, suggesting users focus on the AI loaded to mitigate security concerns.
  • Details on the 'ell' Library for Prompt Engineering: Information was shared about the 'ell' library, a lightweight tool that allows prompts to be treated as functions for enhanced prompt design.
    • The library is introduced as a product of years of experience in the language model space, stemming from OpenAI's insights.
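
The environment-variable misconfiguration above comes down to where the key is read from. A minimal, hypothetical sketch of the lookup order such tools typically follow (process environment first, then a dotenv-style file); the helper name and `.env` parsing here are illustrative, not Aider's actual implementation:

```python
import os

def resolve_api_key(env_var="ANTHROPIC_API_KEY", dotenv_path=".env"):
    """Return the API key, preferring the process environment over a .env file."""
    key = os.environ.get(env_var)
    if key:
        return key
    # Fall back to a simple KEY=value .env file if one exists
    if os.path.exists(dotenv_path):
        with open(dotenv_path) as f:
            for line in f:
                line = line.strip()
                if line.startswith(env_var + "="):
                    return line.split("=", 1)[1]
    return None
```

Running with verbose output, as the user above did, is the quickest way to see which of these sources actually supplied the key.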


Eleuther Discord

  • airLLM's Forward Call Flexibility: A member asked if airLLM permits calling a model's forward function rather than the generate function while still utilizing compression.
    • This sparked interest in the potential flexibility in model usage, though no responses were given.
  • Need for Leaderboard Tasks Accuracy Script: A member requested a script to extract accuracy results from the lengthy JSON files generated during leaderboard tasks.
    • This indicates a gap in data handling, with results stored in output_path.
  • Hugging Face Upload Recommendations: One member suggested utilizing --hf_hub_log_args for smoother leaderboard result uploads to Hugging Face, simplifying the handling process.
    • An example dataset with a single row per run was shared for reference: dataset link.
  • Shampoo vs. Adam Performance Insights: Research highlights that Shampoo outperforms Adam in optimization tasks, albeit with increased computational overhead and complexity.
    • To combat these downsides, the SOAP algorithm is proposed, integrating features from Shampoo and Adafactor.
  • Concerns Surrounding GFlowNets and JEPA: Skepticism persists regarding the practical impact of GFlowNets and JEPA, with users questioning their clarity of purpose.
    • Some believe GFlowNets could indirectly support AI for science, though the theoretical grounding of JEPA is critiqued as weak.
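
The missing accuracy-extraction script mentioned above is straightforward to sketch. Assuming the leaderboard JSON follows the common lm-eval-harness-style layout (a top-level "results" dict mapping task names to metric dicts — an assumption about the file format, not a confirmed schema), a minimal extractor:

```python
import json

def extract_accuracies(json_text):
    """Pull per-task accuracy metrics out of a results JSON string.

    Assumes an lm-eval-harness-style layout: {"results": {task: {metric: value}}}.
    """
    data = json.loads(json_text)
    accuracies = {}
    for task, metrics in data.get("results", {}).items():
        for name, value in metrics.items():
            # Keep numeric metrics whose name mentions accuracy (acc, acc_norm, ...)
            if "acc" in name and isinstance(value, (int, float)):
                accuracies[f"{task}/{name}"] = value
    return accuracies

sample = '{"results": {"arc_easy": {"acc": 0.81, "acc_stderr": 0.01}}}'
print(extract_accuracies(sample))  # {'arc_easy/acc': 0.81, 'arc_easy/acc_stderr': 0.01}
```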


OpenAI Discord

  • O1-Preview disappoints engineers: Members voiced disappointment that the O1-Preview model seems to type faster without adding depth compared to 4o.
    • One engineer remarked, 'O1 doesn't feel smarter, it just types faster', stressing concerns over its practical utility.
  • Exploring AI Alignment Challenges: A new method proposed focusing on improving AI alignment through empathy training, based on insights from previous models' outputs.
    • Concerns emerged about possible misleading capabilities even with superintelligent AI, raising ethical questions about tailored responses.
  • Qwen 2.5 trumps Llama 3.1: Participants discussed claims that Qwen 2.5 reportedly outperforms Llama 3.1, despite a significant parameter size difference, evaluating performance metrics.
    • One user mentioned, 'people saying crazy stuff like Qwen 2.5 72b outperforming Llama 3.1 405b', sparking an in-depth comparison.
  • Challenges in Recording ChatGPT Audio: A user expressed frustration in trying to record audio from ChatGPT on mobile, noting no sound during their attempts.
    • Despite using the phone's recording feature, their efforts yielded unsatisfactory results, raising questions about functionality.
  • Clarifying Daily Limits for GPT Models: O1 Mini has a confirmed cap of 50 messages per day, implemented to deter spam on the server.
    • Members highlighted that the GPT-4o limits stand at 80 messages every 3 hours, contrasting with GPT-4's limit of 40 messages.


CUDA MODE Discord

  • Kashimoo Queries NVIDIA Triton: A member inquired about NVIDIA's Triton; others clarified it is distinct from OpenAI's Triton, prompting discussions on relevant resources and rooms dedicated to it.
    • Additional questions arose regarding NVIDIA's Triton Inference Server, with suggestions of related channels for further discussions.
  • GemLite-Triton Offers New Performance: The GemLite-Triton project was launched, providing a comprehensive solution for low-bit matmul kernels, reportedly outperforming Marlin and BitBlas on large matrices. More can be explored on GitHub.
    • Members emphasized the project’s relevance, encouraging collaboration and questions regarding its applications.
  • Navigating Chrome Tracing with PyTorch: A member sought a resource on Chrome tracing with PyTorch profiler, leading others to recommend the Taylor Robbie talk as a useful guide.
    • This highlights ongoing interest in optimizing profiling techniques within PyTorch frameworks.
  • Clarifying Torchao Autoquant Usage: A clarifying discussion ensued on whether to use torchao.autoquant(model.cuda()) or torchao.autoquant(model).cuda() for correct syntax, with the latter being confirmed as the right approach.
    • Members provided details on the three steps of autoquantization, emphasizing the importance of model preparation.
  • Hackathon Sparks Community Interaction: Members expressed interest in the upcoming hackathon, discussing invitations and the need for confirmations on teammate statuses.
    • Inquiries regarding access to the hack-ideas forum and missing Discord roles highlighted the community’s engagement leading up to the hackathon.


LlamaIndex Discord

  • Build a Story Generation Agent with Human-in-the-Loop: A member shared a step-by-step guide by @nerdai on constructing an agent for dynamically generating 'choose-your-own-adventure' stories using human feedback.
    • This approach significantly enhances user interaction by allowing real-time input during the storytelling process.
  • LlamaParse Premium shines in document parsing: The introduction of LlamaParse Premium promises improved document parsing capabilities for LLM applications by integrating visual understanding.
    • With enhanced long text and table content extraction, LlamaParse positions itself as the go-to choice for robust document processing.
  • RAG discussions with semantic search: A member is exploring how to manage interactions with vendors using semantic search on documented responses for effective retrieval.
    • Several members proposed generating varied questions from provided answers to improve search accuracy by utilizing the vector store.
  • Challenges with Pinecone vector ID management: Members discussed issues with Pinecone's auto-generated IDs, complicating the deletion of documents based on specific metadata in serverless indexes.
    • Alternative databases such as Chroma, Qdrant, Milvus, and Weaviate were recommended for better ID management and support.
  • Concerns About RAG Article Depth: A member pointed out that the article on RAG is somewhat superficial, lacking a thorough argument against tools like LlamaIndex.
    • The need for deeper analysis was emphasized, suggesting that a technical evaluation of alternatives could provide valuable insights.
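
The question-generation idea above — indexing several synthetic questions per documented answer, then retrieving against them — can be sketched without any vector database. Here bag-of-words cosine similarity stands in for real embeddings, and all function names are illustrative rather than any LlamaIndex API:

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_index(answers_to_questions):
    """Index each generated question (as a token-count vector) back to its answer."""
    index = []
    for answer, questions in answers_to_questions.items():
        for q in questions:
            index.append((Counter(q.lower().split()), answer))
    return index

def retrieve(index, query, top_k=1):
    """Return the answers whose generated questions best match the query."""
    scored = [(cosine(Counter(query.lower().split()), vec), ans) for vec, ans in index]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [ans for _, ans in scored[:top_k]]
```

In a production setup the Counters would be replaced by embedding vectors in a vector store, but the indexing shape — many questions pointing at one answer — is the same.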


LAION Discord

  • Fish Speech Breaks Barriers: Fish Speech demonstrates zero-shot voice cloning accuracy that surpasses all tested open models, effectively mimicking speech from 1940s audio.
    • Its quirky insertion of words like ahm and uhm adds a realistic touch, signaling a notable advance in natural speech synthesis.
  • AdBot Spreads across Servers: Concerns arose regarding an AdBot that acts like malware, infiltrating multiple servers and disrupting channels.
    • The community discussed how the bot's sorting mechanism led to its visibility at the top of member lists.
  • Challenges with Muse Text to Image: Issues surfaced while using Muse text to image for COCO2017, resulting in only image outputs without textual integration.
    • A call for guidance highlighted the difficulties in implementing the model effectively.
  • Collaboration Boosts Open-source GPT-4o: A member announced the development of an open-source GPT-4o-like model, inviting LAION to share data and enhance project collaboration.
    • The focus is on accelerating development through shared insights and data, which the community finds promising.
  • Tokenization Troubles in LLMs: Concerns were raised that tokenization issues could be contributing to performance deficits in existing LLMs.
    • Addressing these challenges is deemed crucial for improving model reliability and mitigating hallucination risks.


Latent Space Discord

  • Fal AI secures $23M for growth: Fal AI has raised $23M in Seed and Series A funding, including a $14M Series A led by Kindred Ventures and participation from Andreessen Horowitz. Details are outlined in a blog post featuring their plans to advance generative media.
    • Gorkem Yurt shared the info on Twitter, emphasizing the importance of speed in generative media technology.
  • OpenAI enhances O1 model capabilities: OpenAI has elevated the rate limits for the o1 API to 500 requests per minute for o1-preview and 1000 for o1-mini, catering to increased developer needs. This information was revealed by OpenAI Developers in a thread and signifies an expansion in accessibility.
    • Amir Efrati noted the advancements could enable significant workflow improvements for developers, highlighting the model's efficiency.
  • Jina embeddings v3 launch: Jina AI unveiled jina-embeddings-v3, featuring 570M parameters and 8192-token length, significantly outperforming proprietary rivals from OpenAI and Cohere. This launch is touted as a leap in multilingual embedding tech, as mentioned in their announcement.
    • The new model achieved impressive rankings on the MTEB English leaderboard for sub-1B parameter models, showcasing its potential in long-context retrieval.
  • Runway collaborates with Lionsgate for Gen-3 Alpha: Runway has teamed up with Lionsgate to utilize its film catalog as training data for the Gen-3 Alpha model, a move that surprised many in the industry. This collaboration marks a bold step in film AI technology, as highlighted by Andrew Curran on Twitter.
    • Many had anticipated that Sora would be the first to achieve such a partnership, adding intrigue to the competitive landscape.
  • Preparations underway for NeurIPS 2024: A dedicated channel for NeurIPS 2024 has been created to keep participants informed about the upcoming event in Vancouver this December. Members are encouraged to stay engaged and share logistical updates.
    • An organizer is currently investigating house booking options, requesting participants to indicate their interest and noting that costs would cover the entire week's stay.


Cohere Discord

  • Building an Expert AI with RAG API: A member is developing an expert AI using Cohere's RAG API focused on a niche gaming area, expressing excitement about its potential.
    • This reflects a growing interest in applying the RAG API to specialized fields.
  • Client Loves the Design!: One member celebrated their success in convincing a client of the value of their designs, stating, 'my designs are so cool and they need it.'
    • The positive feedback from this win spurred supportive community responses.
  • Experiencing 504 Gateway Timeout Errors: Concerns were raised about 504 Gateway Timeout errors occurring with client.chat calls that are taking too long.
    • This issue is widespread, with many community members sharing similar experiences and seeking fixes.
  • Command Pricing Clarification: Members discussed that using the Command version costs around $1.00 for 1M tokens input and $2.00 for output, suggesting transitioning to Command-R for enhanced efficiency.
    • These insights indicate the community's focus on optimizing model costs and performance.
  • Inconsistencies with Multilingual Rerank: A user reported poor performance with rerank_multilingual_v3, which scored <0.05 on a question where rerank_english_v3 scored 0.57.
    • This raises questions about the effectiveness of the multilingual models affecting RAG results.


Interconnects (Nathan Lambert) Discord

  • OpenAI o1 Models Impress: After testing the o1-mini model on PhD-level projects, one researcher judged it comparable to an outstanding PhD student in biomedical sciences, showcasing its potential in academic applications.
    • This finding was shared on Twitter by Derya Unutmaz, touching on the model's strengths in advanced research.
  • Knowledge Cutoff Haunts Developers: The knowledge cutoff is October 2023, limiting the AI's ability to handle newer developments in AI, frustrating several users.
    • This gap causes significant challenges while coding, as pointed out in a related discussion.
  • Qwen 2.5 Takes the Lead: Qwen 2.5 72B has topped evaluations against larger models like Llama 3.1 405B, establishing itself as a leader in open weights intelligence while excelling in coding and math.
    • Despite trailing slightly in MMLU, it offers a cheaper alternative with a dense model and 128k context window, as highlighted by Artificial Analysis.
  • Livecodebench Shows Strength: The latest livecodebench numbers are impressive, matching those of Sonnet by using timeless Leetcode questions, according to discussions.
    • However, limitations were noted regarding new library releases, which are often unknown to o1 models.
  • AI's Reasoning Ability Under Scrutiny: Discussions on AI reasoning abilities compared models like o1-mini and Qwen 2.5, assessing performance on tasks that avoid reflection-type methods.
    • Participants expressed optimism regarding future improvements despite current comparisons showing o1's strengths.


OpenInterpreter Discord

  • Troubleshooting OpenInterpreter Errors: A user encountered an issue while inputting data into OpenInterpreter and requested a detailed walkthrough to resolve it. It was suggested they send a DM of the error for better assistance.
    • This incident highlights a need in the community for shared troubleshooting resources.
  • Hands-On Performance Evaluation of Agents: Another user has been actively testing OpenInterpreter’s agent for about a week, indicating positive engagement with its features. This ongoing evaluation reflects the community’s interest in agent performance.
    • Users are motivated to explore OpenInterpreter's potential through active usage and feedback.
  • Perplexity Browser Compatibility Issues: A user inquired about whether Perplexity is set as the default browser, receiving confirmation that it is not. Multiple users reported experiencing similar browser-related issues.
    • One user noted encountering issues specifically with Edge on Windows, suggesting variations in performance across different setups.
  • Innovative RAG Chat App Insights: A member seeks advice on developing a RAG chat app tailored for PDF interactions, focusing on managing responses with both text and image elements. Suggestions included using tokens for images and summarizing visual content to optimize context usage.
    • The importance of integrating various data types effectively was emphasized during the discussion of this app’s capabilities.
  • Pioneering Image and Text Integration: Members discussed strategies for handling images within PDF responses, considering approaches like base64 encoding to enhance data retrieval. This integration is essential for improving user response accuracy.
    • A link was shared highlighting an impressive AI creation that was developed in just 10 seconds, showcasing the rapid advancement in this space.
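
The base64-encoding approach discussed above can be sketched with the standard library alone: the image bytes extracted from a PDF become a data URI that survives inside a plain-text response, and can be decoded back on the client. Function names here are illustrative, not part of any particular RAG framework:

```python
import base64

def image_to_data_uri(image_bytes, mime="image/png"):
    """Encode raw image bytes as a base64 data URI that can travel inside a text response."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

def data_uri_to_bytes(uri):
    """Recover the original bytes from a data URI (round-trip check)."""
    _, payload = uri.split(",", 1)
    return base64.b64decode(payload)
```

Note the context-window cost the discussion flagged: base64 inflates size by roughly a third, which is why summarizing visual content and sending only a reference token is often the cheaper option.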


OpenAccess AI Collective (axolotl) Discord

  • OBS Remains the Go-To for Screen Recording: Members discussed the use of OBS as a robust option for screen recording, though some prefer easier software alternatives for tasks like zooming effects.
    • One user emphasized their consistent use of OBS while others sought simpler solutions.
  • Screenity Emerges as a User-Friendly Alternative: A user shared Screenity, a free and privacy-friendly screen recorder that captures both screen and camera.
    • This tool aims to cater to users looking for a more accessible recording experience as compared to OBS.
  • Moshi Models Debut for Speech-to-Speech Applications: Members announced the release of the Moshi speech-to-speech models, enabling full-duplex spoken dialogue with text tokens aligned to audio.
    • This foundation model boasts features for modeling conversation dynamics, implemented in a PyTorch version quantized in bf16 precision.
  • GRIN MoE Shows Promise with Fewer Parameters: Discussion emerged around GRIN MoE, which impressively performs with only 6.6B active parameters, focusing on coding and mathematics.
    • It utilizes SparseMixer-v2 for gradient estimation, avoiding expert parallelism and token dropping, which sets it apart from traditional MoE methods.
  • Gemma2 fails to run with DPO data: A user reported a configuration issue with Gemma2 9b when used with DPO data, encountering a TemplateError stating, 'Conversation roles must alternate user/assistant/user/assistant...'.
    • The error arose from using a dataset structure that had 'prompt' instead of the necessary 'conversation'.
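
The TemplateError above is the chat template rejecting dataset rows whose roles do not strictly alternate. A minimal, hypothetical converter (field names are illustrative of a common DPO layout, not Axolotl's exact schema) that reshapes a 'prompt'-style row into the alternating user/assistant structure such templates expect:

```python
def to_conversation(row):
    """Convert a {'prompt', 'chosen', 'rejected'} row into alternating-role message lists."""
    prompt_turn = {"role": "user", "content": row["prompt"]}
    return {
        "chosen": [prompt_turn, {"role": "assistant", "content": row["chosen"]}],
        "rejected": [prompt_turn, {"role": "assistant", "content": row["rejected"]}],
    }

def roles_alternate(messages):
    """Check the user/assistant/user/... alternation a chat template enforces."""
    expected = ["user", "assistant"]
    return all(m["role"] == expected[i % 2] for i, m in enumerate(messages))
```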


Torchtune Discord

  • Celebrating PyTorch Conference Visitors: A warm welcome was extended to attendees of the PyTorch conference, creating an engaging atmosphere for networking and interaction.
    • Participants are encouraged to direct any questions in the designated channel for enhanced community engagement.
  • Clarifying Conference Livestream Availability: An inquiry emerged regarding the potential for a conference livestream, though uncertainty lingered among members about its existence.
    • Responses included vague sentiments like ‘Idk :/’, reflecting the community's need for clarity on this matter.
  • GitHub PR Fixes kv-Caching: The pull request titled Fix kv-cacheing and bsz > 1 in eval recipe was linked, aimed at resolving critical kv-caching issues, contributed by SalmanMohammadi.
    • This fix is pivotal for improving performance, highlighting active developments in the Torchtune repository.
  • Need for HH RLHF Dataset Documentation: A discussion spotlighted the lack of documentation on the HH RLHF dataset, with suggestions for it to serve as a standard preference example.
    • The sentiment suggested that proper documentation is essential, as expressed through comments like ‘Not sure, it should be exposed...’.
  • Plans for Default Preference Dataset Builder: Enthusiasm surrounded the announcement of a default preference dataset builder, which will leverage ChosenRejectedToMessages.
    • Participants reacted positively, with comments like ‘Dope’, indicating a collective interest in this upcoming feature.


DSPy Discord

  • DSPy Program Optimization Success: A member celebrated their success with the BSFSWRS optimizer after two months of coding, showcasing its effectiveness in a complex LM setup.
    • 'The future is bright, people!'
  • High Stakes in Prompt Optimization: Concerns raised about the potentially high costs associated with optimizing prompts for DSPy, indicating significant investment demands.
    • 'That's gotta be hella expensive to optimize a prompt.'
  • MIPRO Financial Risks: A humorous take suggested using o1 with MIPRO while cautioning about the financial risks involved in the process.
    • 'Certified way to go bankrupt.'
  • Bootstrapping Clarifications in DSPy: A member queried about bootstrapping, which focuses on generating pipeline examples and validating their success amid LLMs' non-determinism.
    • They expressed confusion about the method's operation given LLM behaviors.
  • Understanding Bootstrapping Outcomes: Another user explained that bootstrapping creates intermediate examples while validating their correctness through the final prediction's success.
    • If the final result is correct, the intermediate steps are deemed valid as few-shot examples.
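
The bootstrapping logic described above — keep an intermediate trace only when the pipeline's final answer checks out — can be sketched generically. This is a hedged illustration of the idea, not DSPy's actual implementation; `pipeline` and `metric` here are assumed callables:

```python
def bootstrap_examples(pipeline, trainset, metric, attempts=4):
    """Collect few-shot demos whose final prediction passes the metric.

    `pipeline(question)` is assumed to return (trace, answer); because LLM calls
    are non-deterministic, each training input gets several attempts.
    """
    demos = []
    for question, gold in trainset:
        for _ in range(attempts):
            trace, answer = pipeline(question)
            if metric(answer, gold):              # final answer correct ...
                demos.append((question, trace))   # ... so the trace is a valid demo
                break
    return demos
```

This is why non-determinism is tolerable: a wrong run simply produces no demo, and only runs whose end result is validated contribute intermediate steps as few-shot examples.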


tinygrad (George Hotz) Discord

  • Users Curious About tinybox Motherboards: A user asked about the specific motherboard used in tinybox red and green models, seeking clarification on hardware details related to the tinybox devices.
    • This reflects ongoing interest in hardware specifications, crucial for optimizing performance.
  • CLANG Bounty Discussion Heats Up: Members inquired if the bounty titled 'Replace CLANG dlopen with mmap + remove linker step' requires manual handling of relocations in the object file.
    • This indicates a deeper technical exploration into the implications for tinygrad's integration with CLANG.
  • Links to Optimizing Pull Requests Shared: A user shared links to Pull Request #6299 and #4492, focusing on replacing dlopen with mmap and implementing Clang jit.
    • These efforts aim to enhance performance, particularly on M1 Apple devices, demonstrating community commitment to optimization.
  • Community Engagement Around CLANG Bounty: A user expressed excitement about who might claim the bounty for the CLANG changes, highlighting community engagement.
    • This interaction showcases collaborative enthusiasm among members eager to see contributors' results.


LLM Finetuning (Hamel + Dan) Discord

  • OpenAI's o1 model garners attention: A YouTube video titled 'o1 - What is Going On? Why o1 is a 3rd Paradigm of Model + 10 Things You Might Not Know' offers an engaging summary of how OpenAI's o1 may have been built.
    • Even skeptics are calling it a 'large reasoning model' due to its distinctive approach and impact on future model development.
  • o1's differentiation from other models: The video discusses why o1 is being recognized as a new paradigm in AI modeling, indicating significant shifts in design philosophy.
    • The implications of adopting such models can lead to a better understanding of reasoning capabilities in AI, making it a critical topic in the field.


MLOps @Chipro Discord

  • LunoSmart Launches with AI Offerings: Kosi Nzube launched his AI venture, LunoSmart, focusing on AI-driven applications and innovative solutions.
    • This venture aims to provide efficient and intelligent experiences across multiple platforms and device types.
  • Diverse Tech Stack Showcase: Kosi's applications utilize Java, Flutter, Spring Boot, Firebase, and Keras, demonstrating a modern development framework.
    • Availability on both Android and web increases accessibility, broadening user reach.
  • Mastering Cross Platform Development: Kosi excels in cross-platform development using Flutter and the Firebase SDK, enhancing app performance across devices.
    • His expertise in native Android development using Android Studio and Java contributes to robust mobile applications.
  • Machine Learning Skills on Display: With a background in Machine Learning since 2019, Kosi employs Keras, Weka, and DL4J for model development.
    • His commitment to advancing AI technologies underpins the foundational goals of the LunoSmart initiative.


DiscoResearch Discord

  • Mistral Slashes Pricing: Mistral's latest announcement reveals a strategic price drop aimed at boosting accessibility for users and developers.
    • This move sparks discussions on how competitive pricing could impact the market landscape and user adoption.
  • Market Reactions to Mistral's Price Drop: The price adjustment has led to glowing reactions across forums, highlighting Mistral's attempt to cater to a wider range of developers in the AI space.
    • Many industry watchers believe this could lead to increased competition among similar platforms, enhancing innovation.


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: !

If you enjoyed AInews, please share with a friend! Thanks in advance!

Don't miss what's next. Subscribe to AI News (MOVED TO news.smol.ai!):
Powered by Buttondown, the easiest way to start and grow your newsletter.