AI News (MOVED TO news.smol.ai!)

Archives
March 22, 2025

[AINews] lots of little things happened this week

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜


Incremental updates are all you need.

AI News for 3/20/2025-3/21/2025. We checked 7 subreddits, 433 Twitters and 29 Discords (227 channels, and 3009 messages) for you. Estimated reading time saved (at 200wpm): 318 minutes. You can now tag @smol_ai for AINews discussions!

  • Claude Code (which we mentioned last month) had a mini launch week
  • Mindmaps in NotebookLM
  • Roboflow launched their YOLO competitor
  • Anthropic made a lot of noise about a think tool
  • Gemini launched a bunch of things
  • Kyutai Moshi added vision
  • Topaz announced a fast upscaler
  • Percy Liang relaunched HELM

All this and more in the Twitter/Reddit/Discord recaps. We hope to ship the weekly AINews this weekend.


The Table of Contents and Channel Summaries have been moved to the web version of this email.


AI Twitter Recap

Models and Benchmarks

  • New research from @AnthropicAI reveals that a simple 'think' tool dramatically improves instruction adherence and multi-step problem solving for agents: @alexalbert__ documented these findings in a blog post (a minimal sketch of such a tool definition follows this list). @skirano also noted that they made an MCP for this, which can be downloaded from the official Anthropic MCP server repo. @_philschmid observed that @AnthropicAI appears to be the first to release combined reasoning and tool use, with Claude reasoning, generating a function call, executing it, and then continuing to reason with the output.
  • NVIDIA's Llama-3.3-Nemotron-Super-49B-v1 ranks at #14 on LMArena: According to @lmarena_ai, this model is a powerful open reasoning model, excelling in math with an openly released 15M post-training dataset. The ranking overview of this model, previously tested under the codename "march-chatbot" on LMArena, can be found here.
  • Sakana AI is using Sudoku Puzzles to superpower AI reasoning: @SakanaAILabs announced the release of a new reasoning benchmark based on the modern variant of Sudoku to challenge the AI community, believing these puzzles are perfect for measuring progress in AI reasoning capabilities. The new benchmark and training data are available here. @hardmaru simply stated that as a species, we can improve our collective reasoning and problem-solving ability by playing Sudoku.
  • The HELM benchmark has a new leaderboard: HELM Capabilities v1.0: @percyliang noted that they curated 5 challenging datasets (MMLU-Pro, GPQA, IFEval, WildBench, Omni-MATH) and evaluated 22 top language models.
  • Meta AI released SWEET-RL, a novel RL algorithm for long-horizon & multi-turn tasks that performs better credit assignment: @AIatMeta reported that experiments demonstrate that SWEET-RL achieves a 6% absolute improvement in success & win rates on CollaborativeAgentBench compared to other state-of-the-art multi-turn RL algorithms, enabling Llama-3.1-8B to match or exceed the performance of GPT-4o in realistic collaborative content creation. More details on both of these releases can be found in the full paper published on arXiv.
  • Meta AI also released a new agents benchmark: CollaborativeAgentBench, the first benchmark studying collaborative LLM agents that work with humans across multi-turn collaboration on realistic tasks in backend programming & frontend design: Details at @AIatMeta.
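
The 'think' tool described above is essentially a no-op tool whose only effect is to give the model a designated scratchpad mid-trajectory. A minimal sketch of such a tool definition with the Anthropic Python SDK might look like this (the description text is paraphrased and the model name is an illustrative choice, not code from the blog post):

```python
# Minimal sketch of a no-op "think" tool for the Anthropic Messages API.
# The tool executes nothing server-side; it just gives the model a place
# to reason between tool calls. Description text is paraphrased.
import anthropic

think_tool = {
    "name": "think",
    "description": (
        "Use the tool to think about something. It will not obtain new "
        "information or change anything; it just records the thought."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "thought": {"type": "string", "description": "A thought to think about."}
        },
        "required": ["thought"],
    },
}

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # illustrative model choice
    max_tokens=1024,
    tools=[think_tool],
    messages=[{"role": "user", "content": "Process this refund step by step."}],
)
```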

Language Model Development and Releases

  • Gallabytes joined Cursor to work on coding agents: After an incredible 3 years leading model development at Midjourney, @gallabytes announced their move to Cursor.
  • Kyutai Labs released MoshiVis, an end-to-end low-latency Vision Speech Model: @reach_vb noted the model only adds 206M parameters and uses a learnable gating mechanism, adding only ~7ms per inference step on a Mac Mini with an M4 Pro chip, while maintaining real-time performance.
  • NVIDIA built GR00T N1, a powerful open-source AI model designed for humanoid robots: According to @TheTuringPost, it's a Vision-Language-Action (VLA) model based on Eagle-2 with SmolLM-1.7B, and a Diffusion Transformer. It generates 16 actions in ~64 milliseconds on an NVIDIA L40 GPU.
  • ByteDance just announced InfiniteYou available on Hugging Face: According to @_akhaliq, this is for Flexible Photo Recrafting While Preserving Your Identity.
  • Roblox just casually dropped an app for Cube 3D on Hugging Face: @_akhaliq noted that it generates 3D models directly from text.
  • Claude gets real-time web search: According to @TheRundownAI, Claude can now search the web in real time, and OpenAI's voice AI got a personality boost. @_philschmid believes that @AnthropicAI are the first to release combined reasoning + tool use.

AI Applications and Tools

  • The Deep Research x AI Builder Thesis: @swyx theorizes the collision path between the prompt-to-app AI builder and the deep research agent, suggesting building a deep research app on demand to split out UI generation and data generation into separate agents.
  • Dair.AI promotes the use of LLM-as-a-Judge, a technique for automating the assessment of LLM outputs by using a specialized LLM as a “Judge” (a toy sketch follows this list): @dair_ai believes this enables rapid development of LLM applications and AI agents.
  • LangChain released MCP Adapters: @LangChainAI announced their new TypeScript library that connects Anthropic's MCP tools with LangChain.js & LangGraph.js, featuring multi-server support and seamless agent integration.
  • LlamaIndex announced LlamaExtract is now in public beta: This leading, genAI-native agent for structured document extraction adapts the latest models to structure even the most complex documents: @jerryjliu0.
  • Perplexity is working on an updated version of Deep Research: @AravSrinivas states that the new version will throw even more compute, think longer, present more detailed answers, use code execution, and render in-line charts.
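
As a toy illustration of the LLM-as-a-Judge pattern above, here is a minimal sketch using the OpenAI Python SDK; the rubric, model name, and score format are illustrative assumptions, not dair.ai's setup:

```python
# Toy LLM-as-a-Judge sketch: a second model grades another model's answer
# against a rubric. Model name and rubric are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def judge(question: str, answer: str) -> str:
    prompt = (
        "You are an impartial judge. Score the answer from 1-5 for factual "
        "accuracy and relevance to the question. Reply as 'score: N - reason'.\n\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic grading
    )
    return resp.choices[0].message.content

print(judge("What is the capital of France?", "Paris is the capital of France."))
```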

AI Community and Events

  • Andrew Ng shared his observations from the AI Dev 25 conference: @AndrewYNg noted that agentic AI continues to be a strong theme, developers are fine-tuning smaller models on specific data, and many speakers spoke about the importance of being pragmatic about what problems we are solving, as opposed to buying into the AGI hype.

Optimization and Training

  • Cloneofsimo shared findings from exploring extreme beta values in training: @cloneofsimo notes that a large beta2 seems crucial until beta1 also becomes small, and that a small beta1 permits a small beta2 (see the optimizer sketch after this list).
  • Hamel Husain provided an update on training tools: @HamelHusain let his audience know that he'd be online in ~15 minutes (recorded for those who sign up).
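
For readers unfamiliar with the knobs being discussed: in Adam-family optimizers, beta1 is the EMA decay rate of the gradient estimate and beta2 that of the squared-gradient estimate. A minimal PyTorch sketch of sweeping them (the values are illustrative, not cloneofsimo's):

```python
# beta1/beta2 in Adam control the EMA decay of the gradient and
# squared-gradient estimates. These sweep values are illustrative,
# not the ones from the thread.
import torch

model = torch.nn.Linear(16, 1)
for beta1, beta2 in [(0.9, 0.999), (0.9, 0.95), (0.5, 0.99), (0.5, 0.9)]:
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3, betas=(beta1, beta2))
    # ... train and compare loss curves across (beta1, beta2) settings ...
```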

Humor

  • Neel Nanda jokingly asked if 21% don't think someone is a billionaire: @NeelNanda5.
  • Vikhyatk joked about moving to SF and finding a room for only $6000/mo: @vikhyatk.
  • Swyx updated a meme: @swyx.

AI Reddit Recap

/r/LocalLlama Recap

Theme 1. SpatialLM: LLM for 3D Scene Understanding

  • SpatialLM: A large language model designed for spatial understanding (Score: 1033, Comments: 94): SpatialLM is a large language model specifically designed to enhance 3D scene understanding using Llama 1B. The model focuses on improving spatial comprehension, potentially offering advancements in applications that require detailed environmental awareness.
    • SpatialLM Capabilities: SpatialLM processes 3D point cloud data to generate structured scene understanding, identifying architectural elements like walls and doors and classifying objects with semantic categories. It works with various data sources, including monocular videos, RGBD images, and LiDAR sensors, making it versatile for applications in robotics and navigation.
    • Technical Queries and Clarifications: Discussions raised questions about the classification of SpatialLM as a language model, given its processing of non-human readable data. It was clarified that it outputs structured 3D object graphs, which is a specific form of language, and is based on Llama 1B and Qwen 0.5B.
    • Model Performance and Applications: Users expressed amazement at the model's capabilities with only 1.25 billion parameters and discussed potential applications, such as integration with text-to-speech for the visually impaired and use in robot vacuum cleaners. The model's ability to estimate object heights and its potential for integration into reasoning models were also highlighted.

Theme 2. Qwen 3: Modular AI Model Developments

  • Qwen 3 is coming soon! (Score: 402, Comments: 97): Qwen 3 is anticipated to be released soon, as indicated by a pull request on the Hugging Face Transformers GitHub repository. The link to the pull request is here.
    • Discussion highlights the Qwen 3 MoE model's architecture, particularly its use of 128 experts with 8 activated per token, and the 15B MoE model size, which makes it suitable for CPU inference. Users express hope for larger models, like a potential 30-40B MoE or even a 100-120B MoE, to compete with modern models.
    • Several comments delve into the technical details and performance metrics of Qwen 3, with comparisons to other models like Deepseek v3. Active parameters are noted to be 2B, and there's a discussion on the model's potential performance, with references to benchmarks and model equivalence calculations.
    • The community is excited about Qwen 3's potential, especially its CPU compatibility and small active parameter size, which reduces computational resource requirements. There's interest in its embedding capabilities and curiosity about its performance in coding tasks, with some users noting the vocab size of 152k and max positional embeddings of 32k.

Theme 3. Docker's Competitive Leap: LLM in Containers

  • Docker's response to Ollama (Score: 240, Comments: 136): Docker is introducing a new feature that enables Mac GPU access, allowing users to run models like mistral/mistral-small on their machines. This update excites users as it enhances Docker Desktop's capability by allowing containers to utilize the Mac's GPU, as detailed in their official announcement and further discussed in a YouTube video.
    • The discussion highlights the use of wrappers like Ollama and llama-swap for managing and running models, with some users criticizing these as unnecessary abstractions over llama.cpp. However, others argue that these tools simplify deployment, especially for those not deeply familiar with technical setups, and offer modularity and ease of use in distributing and hosting models.
    • Docker's new feature enabling Mac GPU access is seen as a significant advancement, allowing Mac users to run applications in isolated environments with GPU acceleration. This update is particularly important for those using Apple silicon and is compared to the impact of GitHub Container Registry on Docker Hub, though some users express dissatisfaction with Docker's command-line interface.
    • There is a debate over the open-source community's approach, with some users expressing concern about projects like Ollama branding themselves instead of contributing to existing projects like llama.cpp. Others defend the modular approach, emphasizing the importance of simplicity in development and deployment, particularly in the context of AI model hosting and managing dependencies.

Theme 4. Gemma 3, Mistral 24B, and QwQ 32B: Performance Comparison

  • Gemma 3 27b vs. Mistral 24b vs. QwQ 32b: I tested on personal benchmark, here's what I found out (Score: 231, Comments: 74): QwQ 32b excels in local LLM coding and reasoning, outperforming Deepseek r1 in some instances and significantly surpassing Gemma 3 27b and Mistral 24b. In mathematics, both Gemma and QwQ handle simple tasks well, with Gemma being faster but having a more restrictive license. Mistral 24b underperforms compared to the others, though it, along with Gemma, offers image support. For further details, refer to the blog post.
    • QwQ 32b's Performance and VRAM Requirements: Users confirm that QwQ 32b excels in coding and reasoning tasks, outperforming some cloud models, but note its significant VRAM requirements. This makes it challenging to run on a single GPU even with quantization, limiting its context window size.
    • Model Comparisons and Quantization Concerns: There's a need for clarity on Gemma's model type used in comparisons, as well as concerns about quantization settings, particularly for Mistral, which may affect performance. RekaAI_reka-flash-3 and ExaOne Deep are suggested as alternatives for users with limited hardware resources.
    • Benchmarking and Use Cases: Suggestions include running models like Gemma, Mistral, and QwQ in an IDE for more practical benchmarks, and testing ExaOne Deep and DeepHermes for comparison. Users also highlight QwQ 32b's strong performance in transcript summarization, occasionally surpassing GPT-4/4.5.

Theme 5. ByteDance's InfiniteYou: Identity-Preserving Image Model

  • ByteDance released on HuggingFace an open image model that generates Photo While Preserving Your Identity (Score: 128, Comments: 36): ByteDance has launched InfiniteYou, an image generation model available on HuggingFace that allows for flexible photo recrafting while preserving individual identity. The project features a diverse array of portraits, showcasing individuals in various settings, emphasizing a blend of realism and artistic interpretation. Key resources include the project page, code repository, and the model on HuggingFace.
    • Commenters critique the image quality of InfiniteYou, describing it as "rough" and "plastic-y," indicating skepticism about the model's ability to generate realistic images.
    • macumazana points out that similar work has been done previously with older models, suggesting that InfiniteYou doesn't offer significant novelty or advancement in the field.
    • moofunk suggests a strategic approach by focusing on model strengths and proposing the idea of chaining models to enhance photo generation quality, rather than relying on single-model outputs.

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

Theme 1. 5 Second Flux Innovation: Nunchaku, InfiniteYou, and Step-Video-TI2V

  • 5 Second Flux images - Nunchaku Flux - RTX 3090 (Score: 263, Comments: 66): MIT-Han-Lab has released ComfyUI-nunchaku, a tool that generates Flux images in about 5 seconds. The announcement also mentions the RTX 3090, although no specific details about its role in the project are provided.
    • Users expressed skepticism about ComfyUI-nunchaku's output quality, noting that images appear "plastic" and similar to those generated by models like SDXL. Concerns were specifically raised about the artificial appearance of human faces, often featuring cleft chins.
    • Nunchaku SVDQuant offers significant performance improvements, reducing model size by 3.6×, memory usage by 3.5×, and achieving 10.1× speedup on NVIDIA RTX 4090 laptops by eliminating CPU offloading. The tool supports lora conversion similar to TensorRT, with detailed setup instructions provided via GitHub and Hugging Face.
    • A user shared their experience using the deepcompressor repo for quantizing flux finetunes, encountering challenges with cuda/transformers dependencies and VRAM limitations, suggesting a 24GB VRAM is insufficient. They provided a workaround by renting an A40 GPU and shared steps for potential dependency fixes.
  • InfiniteYou from ByteDance new SOTA 0-shot identity perseveration based on FLUX - models and code published (Score: 193, Comments: 59): ByteDance has introduced InfiniteYou, a new state-of-the-art zero-shot identity preservation model based on FLUX. The model, alongside its code, has been published, showcasing its ability to enhance identity characteristics in images, as demonstrated in a comparison grid featuring ID Image, PuLID-FLUX, and InfU (Our model), with InfU showing advanced rendering and fidelity in identity preservation.
    • Discussion around Flux's identity preservation reveals mixed opinions: while some users note that the model effectively adheres to prompts and maintains facial details, others criticize it for not accurately replicating input features like eye color and hair, as well as the "Flux chin" issue. ByteDance's InfiniteYou is viewed as a significant step forward, though its realism is questioned by some users.
    • Hugging Face is a focal point for the model's availability, with users eager to see its integration into ComfyUI workflows. There is a demand for better handling of features like freckles, scars, and tattoos, which are seen as essential for high-quality facial replicas.
    • Users express impatience with the current Flux model's aesthetic and predict a shift once a new open model becomes available. ByteDance's approach focuses on research and methodology rather than aesthetics, which some users find lacking in terms of practical, photorealistic application.
  • Step-Video-TI2V - a 30B parameter (!) text-guided image-to-video model, released (Score: 119, Comments: 61): Step-Video-TI2V is a newly launched 30 billion parameter model that facilitates text-guided image-to-video conversion. This release marks a significant advancement in AI-driven video generation.
    • Model Size and Performance: The Step-Video-TI2V model, with its 30 billion parameters and 59GB weights, is seen as a significant advancement, though its local usage is challenged by high VRAM requirements (up to 70GB for 720p videos). Users discuss the impracticality of its current resource demands, jokingly suggesting the need for a kidney to run it locally.
    • Chinese AI Development: There is a perception that China is advancing rapidly in the AI sector, with multiple video models emerging consecutively, while the US and EU lag behind. Some users note that although China is producing these models, they do not always provide the best quality outputs, as seen with Yuewen's implementation.
    • Quality and Compression Concerns: Users express concerns about the compression techniques used in the model, which result in a loss of detail despite the model's large size. The model's reliance on 16x spatial and 8x temporal compression is criticized for hindering its ability to generate fine details, leading to glitches and subpar results in video outputs.

Theme 2. Text-to-Video AI Advancements: From Open-Source Initiatives

  • Remade is open sourcing all their Wan LoRAs on Hugging Face under the Apache 2.0 license (Score: 171, Comments: 21): Remade is open sourcing all their Wan LoRAs on Hugging Face under the Apache 2.0 license, allowing for broader access and use within the AI community.
    • Some users, like Weird_With_A_Beard and Mrwhatever79, are enthusiastic about the Wan LoRAs, expressing gratitude and enjoyment in using them for video generation. However, others are skeptical about the claim of open-sourcing, highlighting that LoRAs generally don't have licenses and questioning the authenticity of the open-source claim due to premium services offered via a Discord server.
    • LindaSawzRH and hurrdurrimanaccount criticize the open-source claim, arguing that the LoRAs are not truly open-source if the training data and processes are not provided, and access is behind a paywall. They express concerns about the precedent this sets for the community, with hurrdurrimanaccount questioning whether datasets are being shared.
    • Ballz0fSteel shows interest in a tutorial for training Wan LoRAs, but LindaSawzRH suggests that access to such information might require payment, further fueling the discussion about the transparency and accessibility of the resources.
  • Wan I2V - start-end frame experimental support (Score: 160, Comments: 21): Wan I2V introduces experimental support for start-end frames, enhancing its capabilities in video processing. This update is likely to improve the precision and efficiency of video frame analysis.
    • WanVideoWrapper Update: The WanVideoWrapper by Kijai received an update for experimental start-end frame support, previously available in raindrop313's repository. This improvement allows the introduction of new objects in scenes which were difficult to prompt before, although some issues like missing elements and color shifts persist, which can be mitigated by adjusting parameters such as caching and resolution.
    • Community Excitement and Testing: Users expressed enthusiasm about the update, with some already testing it with Kijai nodes and reporting positive results. The feature is seen as a potential game-changer for scripted storytelling, offering more reliability than previous versions.
    • Open Source and Collaboration: The community appreciates the open-source nature of the project, highlighting contributions from various developers like raindrop313 and expressing gratitude for the collaborative efforts that led to these advancements.

Theme 3. Critique of LLM Evaluation Methods: Simplification & Blame

  • Shots Fired (Score: 1372, Comments: 284): Critics argue that LLM intelligence tests are often unevaluative, implying that they fail to accurately measure or reflect the true capabilities and intelligence of large language models. This criticism suggests a need for more rigorous and meaningful evaluation methods to assess AI performance.
    • Yann LeCun's Perspective: Yann LeCun is discussed extensively, with many agreeing that LLMs alone won't lead to AGI. LeCun emphasizes the need for new AI architectures beyond LLMs, as presented in his speech at the NVDA conference, and is recognized for his significant contributions to AI, particularly in deep learning and CNNs.
    • Limitations of LLMs: Several commenters argue that LLMs are limited in achieving AGI due to their architecture, which lacks the ability to learn and adapt like human intelligence. Virtual Intelligence (VI) is suggested as a more appropriate term for current AI capabilities, emphasizing utility over consciousness or self-awareness.
    • Current AI Utility and Misconceptions: There is a consensus that while LLMs are not useless, they are tools that require proper use and understanding. Some express skepticism about the AI hype, noting that tools like Claude have improved and can enhance productivity, but they do not replace human jobs or achieve independent reasoning.
  • After giving me a puzzle I couldn’t solve I asked for one simpler (Score: 529, Comments: 113): The post discusses a ChatGPT interaction where the user requested a simpler puzzle after being unable to solve "The Three Chests" puzzle. The AI responded without sarcasm, implying it genuinely believed the user needed an easier challenge, highlighting potential limitations in understanding user intent or context.
    • Logical Reasoning and Puzzle Analysis: Claude's analysis of "The Three Chests" puzzle demonstrates a classic logical reasoning approach, questioning the accuracy of labels and considering potential twists like incorrect labels. The discussion highlights the need to consider whether all labels are incorrect, which would lead to choosing the chest labeled "Gold" after testing the "Silver" chest first.
    • Humor and Sarcasm: Several commenters, like EuphoricDissonance and Careless_General5380, use humor to engage with the topic, joking about the treasure being "love" or "friends made along the way." This reflects the light-hearted nature of the discussion around the puzzle's simplicity and the AI's response.
    • Puzzle Constraints and Solutions: Toeffli points out a missing element in the puzzle regarding truth-telling and lying notes, which affects determining the treasure's location. Professional_Text_11 and others note the absence of a rule against opening all three chests, suggesting a straightforward solution that bypasses the intended puzzle logic.

Theme 4. AI-Generated Satire and Historical Reconstructions

  • Doge The Builder – Can He Break It? (Score: 183, Comments: 24): Doge The Builder satirizes Elon Musk and Dogecoin by comparing them to "Bob the Builder," highlighting themes of greed, economic chaos, and unchecked capitalism. The post humorously references a fictional licensing by the Department of Automated Truth (DOAT) and suggests a YouTube link for viewing in the comments.
    • AI's Role: Commenters express admiration for the capabilities of AI in creating content like "Doge The Builder," highlighting its impressive nature in the current age.
    • Cultural Impact: Discussions touch on the influence of individuals like Elon Musk on society's zeitgeist, questioning the morality of amassing wealth and its implications on civilization.
    • Creation Curiosity: There is curiosity about the process of creating satirical content, with inquiries on how such pieces are made.
  • Made this in 5 minutes. We're going to need some good AI detection soon... (Score: 13355, Comments: 552): The post highlights the urgent need for improved AI detection technologies, specifically in the context of rapidly produced AI-generated videos. The author underscores the ease with which such content can be created, implying potential challenges in distinguishing authentic videos from AI-generated ones.
    • Concerns about the authenticity of AI-generated videos are prevalent, with users like YoshiTheDog420 expressing skepticism about ever having reliable AI detection tools. They fear that visual evidence could become unreliable, with any footage potentially dismissed as AI-generated, undermining trust in media.
    • The discussion highlights the ease with which people can be fooled by AI-generated content, as Rude_Adeptness_8772 suggests a significant portion of elderly individuals might perceive such videos as genuine. Visarar_01 shares an anecdote about a family member being deceived by an AI video, illustrating the potential for misinformation.
    • Some commenters, like ProfessionalCreme119, propose solutions such as integrating AI detection tools into devices to identify AI-generated videos, suggesting a need for widespread implementation of detection mechanisms. Others, like Soft-Community-8627, warn about the potential misuse of AI to fabricate events, which could be leveraged by governments to manipulate public perception.

Theme 5. AI Art and Workflow Transparency Debates

  • Can we start banning people showcasing their work without any workflow details/tools used? (Score: 265, Comments: 56): The post suggests banning art posts that do not include workflow details or tools used, arguing that without such information, these posts function merely as advertisements. The author calls for a change to ensure contributions are informative and beneficial to the community.
    • Many users, including Altruistic-Mix-7277 and GravitationalGrapple, argue against banning posts without workflow details, suggesting that the subreddit serves both as a gallery and a resource for learning. They emphasize the importance of open-ended discussion and the ability to ask questions directly in comments for additional details.
    • Lishtenbird highlights the ongoing issue of "no workflow" posts, noting the disparity in engagement between detailed guides and flashy, low-effort content. They suggest implementing an auto-mod comment system to ensure that at least some information, like prompts, is shared, although this would require additional resources to implement.
    • ByWillAlone and wonderflex discuss the subreddit’s dual nature as both an art showcase and a learning platform. They propose the idea of creating a separate space, like r/aiartschool, dedicated to in-depth tutorials and high-effort content, while maintaining the voting system to naturally filter content quality.
  • This guy released a massive ComfyUI workflow for morphing AI textures... it's really impressive (TextureFlow) (Score: 105, Comments: 11): ComfyUI released a significant workflow called TextureFlow for generating and morphing AI textures. The release is notable for its impressive capabilities in AI texture manipulation.
    • TextureFlow is available via a direct link to the workflow JSON on GitHub. Users are exploring its capabilities for AI texture manipulation and generation.
    • Users like Parulanihon are experimenting with TextureFlow for logo creation, recommending a denoising level of 0.3 or 0.4 max. However, challenges include achieving transparent backgrounds and aligning with outdated YouTube tutorials, necessitating a mix-and-match approach to achieve desired results.
    • No-Mistake8127 is using TextureFlow to create animated artwork for a custom Raspberry Pi driven digital frame, highlighting its ability to handle inputs such as video, text prompts, photos, movement, and controlnets.

AI Discord Recap

A summary of Summaries of Summaries by o1-2024-12-17

Theme 1. Pricing Showdowns and Censorship Woes

  • Cursor Burns Wallets: Users rage over charges for connection errors and lost premium requests when downgrading plans. One member quipped "Normal agent is a shit show without max" and opted out due to cost inefficiencies.
  • OpenAI’s o1 Pro Overheats: Developers call o1 Pro a "monumentally overpriced" model, preferring Claude or cheaper alternatives like DeepSeek. Some joked that o1 Pro costs $30 per full send, making it a luxury few can afford.
  • Pear vs. Cursor Price War: Some note that Pear is cheaper but “can’t code worth a damn” and relies on roo code for file changes. Others warn that if Cursor’s pricing and context limits don’t improve, they might jump ship.

Theme 2. Model Upgrades and Debates

  • Claude 3.7 Provokes Passion: Some swear 3.7 is "better for going the extra mile," while others say 3.5 is more accurate. The community agrees "no single hammer is better for every job," reflecting a divide over performance quirks.
  • Qwen 3 Draws Crowds: People excitedly track news that Qwen 3 is imminent, following the recent Qwen 2.5 Omni release. Leaked hints suggest it might challenge top-tier models like GPT-4.5.
  • Sora Falls Short: Despite big teasers, this public release underwhelmed users who found it inferior to Keling AI and Hailuo AI. Critics suspect "turbo version" hype overshadowed real performance limitations.

Theme 3. Fine-Tuning Adventures and VRAM Tussles

  • Gemma 3 Keeps Breaking: Missing dependencies and --no-deps bugs stumped users trying older Colab notebooks. One dev lamented "Why does Llama fail here, but works fine in my other environment?"
  • QLoRA Slays Memory Woes: Turning on QLoRA instantly cut VRAM usage, letting Gemma 3 run on smaller hardware. Loading in 4-bit mode helped avoid out-of-memory crashes.
  • DeepHermes 24B Overflows VRAM: People face OOM errors running 24B on multi-GPU rigs, even with minimal context. Suggestions include 8-bit versions or fine-tuning multi-GPU setups with flags like --tensor-split.

Theme 4. New Tools, Agents, and RAG

  • Oblix Orchestrates Edge vs. Cloud: A slick demo shows agents juggling local and remote LLMs for cost/performance trade-offs. The system decides whether to run queries on hardware like Ollama or farm them out to OpenAI.
  • Local RAG App Wows Coders: A fully local retrieval-augmented generation tool chats with code using GitIngest for parsing and Streamlit for UI. It runs Meta’s Llama 3.2 locally through Ollama, delighting developers seeking offline solutions.
  • Semantic Workbench Rides In: Microsoft’s new VS Code extension prototypes multi-agentic systems in one place. Users wonder if it doubles as an MCP framework or stays primarily a dev tool.

Theme 5. Tokenizer Tricks, Synthetic Data, and Hardware Upgrades

  • SuperBPE Shrinks Sequences: A newly minted superword tokenizer cuts sequence lengths by 33% at a fixed 200k vocab. Tests show an 8% MMLU boost and 27% faster inference compared to standard BPE.
  • Synthetic Data Reigns: Researchers highlight filtering, augmentation, and generation as a way to “reject data we already predict well.” Open-source labs like Bespoke promise fresh synthetic pipelines for targeted fine-tuning.
  • Nvidia’s Blackwell Sparks Skepticism: Next-gen RTX Pro cards tout up to 96GB of VRAM but threaten to worsen GPU supply shortages. Enthusiasts doubt Nvidia’s claim that “we’ll fix availability by May/June.”

PART 1: High level Discord summaries

Cursor Community Discord

  • Cursor's Pricing Proves Punitive: Users express frustration with Cursor's pricing model, citing charges for connection errors, resumed requests, and 'tool charges for no responses', with some reporting lost premium requests after downgrading plans.
    • Some users say the 'Normal agent is a shit show without max' but find it 'quicker than spending real $ on max', opting out of premium due to perceived cost inefficiencies.
  • Claude 3.7 Causing Consternation: Members report issues with Claude 3.7's performance in Cursor, claiming false assumptions and decreased reliability compared to Claude 3.5, with some having the opposite experience.
    • Opinions vary, with one user stating '3.7 is better for going the extra mile. 3.5 is better for accuracy', while another notes 'There’s no single hammer that’s better for every job'.
  • Pear's Potential Prompts Pricey Problems: Users compare Pear AI to Cursor, noting Pear’s cheaper pricing but also concerns about its reliance on roo code and per-file change acceptance workflow, whereas others cite that Pear can’t code worth a damn.
    • Some Cursor users, like one who said 'I don't like pear AI that much, mainly cause they use roo code and roo code is not that stable', are considering switching if Cursor doesn't improve its context window or pricing.
  • React Racketeering Raises Rivalries: The channel debates the merits of React versus Svelte for a SaaS app, with some preferring React for its large community and compatibility with Cloudflare Pages, while others find it slow and messy, advocating for Svelte.
    • The user base seems pretty split, with arguments ranging from 'react is slow af' to 'svelte also doesn't need workarounds'.
  • Vibe Visions Vary Wildly: Members debated the usefulness of vibe coding, with some calling it a marketing ploy and a crock, while others argued that it is a real thing requiring technical expertise, like a basic knowledge of Git.
    • Despite varying definitions, a consensus emerged that successful 'vibing' requires critical thinking, debugging skills, and the ability to steer AI tools effectively.


Unsloth AI (Daniel Han) Discord

  • Gemma 3 gets Dependency Glitch: Gemma 3 has a bug with --no-deps causing missing dependencies in older notebooks, and a Google Colab with a 2018 GPU might be too outdated for some tasks, according to this discussion.
    • A user encountered issues with Llama failing in a Gemma-specific environment; the same notebook also failed on Google Colab due to missing dependencies, according to this notebook.
  • Vision Fine-Tuning still on Unsloth's backburner: Despite Gemma 3 supporting images, vision fine-tuning is not yet supported on Unsloth, according to this issue.
    • A user attempted to fine-tune Gemma 3 using Llama code, which failed, but they still wanted to know if the model would run images after fine-tuning text only.
  • QLoRA to the Rescue for Gemma 3: Users encountered memory errors when running the Gemma 3 model, but enabling QLoRA resolved the issue, likely due to reduced VRAM usage as mentioned here.
    • Turning on QLoRA automatically sets load_in_4bit = True, which helps to reduce VRAM usage (a rough sketch follows this list).
  • Community Seeks Synthetic Data Nirvana: Members discussed tools for synthetic data generation, with one user recommending Bespoke Labs due to its extensive features, and confirmed it's open source with a dedicated Discord server.
    • One user inquired about the availability of example notebooks or Colabs demonstrating the implementation of GRPO with vision models; such an example is currently lacking but is planned for the future.
  • DPO Trainer gets an Upgrade: A user shared their experience upgrading to the latest DPO Trainer with the latest Unsloth and Unsloth Zoo, providing a link to their small diff for others facing similar challenges.
    • The user also found the Zephyr (7B)-DPO notebook confusing and suggested updating it via a pull request to the Unsloth notebooks repository.
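
As a rough sketch of the QLoRA fix discussed above, 4-bit loading with Unsloth looks roughly like the following; the model name and LoRA hyperparameters are assumptions, not the user's exact configuration:

```python
# Rough sketch of QLoRA-style 4-bit loading with Unsloth; model name and
# LoRA hyperparameters are illustrative assumptions.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it",
    max_seq_length=2048,
    load_in_4bit=True,  # the QLoRA setting that cut VRAM usage in the thread
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                      # LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```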


OpenAI Discord

  • OpenAI's o1 Pro Pricing Sparks Outrage: Users are unhappy with OpenAI's API pricing for the o1 Pro model, calling it severely overpriced, and preferring Claude.
    • Some joked about OpenAI's pricing strategy, and observed that DeepSeek offers comparable performance at a fraction of the cost, according to shared charts.
  • Debate Surrounds o1 Architecture: Discord users are debating if OpenAI's o1 model is based on GPT-4o, with conflicting claims about its architecture.
    • Arguments focus on knowledge cutoff dates; some think o1 is just gpt4o with reasoning.
  • Perplexity Desktop App Boosts Loyalty: Perplexity is rewarding desktop app users with a free month of Pro after 7 days of use.
    • The reward is limited to the Windows app, excluding macOS, iOS, Android, and Web users.
  • GPT Pro Subscription Woes Plague Users: Users reported paying for GPT Pro but not getting subscription access, and expressed frustration over OpenAI support's unresponsiveness.
    • Affected users were directed to help.openai.com for support, with assurances that the channel cannot assist with billing matters.
  • Structured Output Hinders AI Reasoning: Members tested if the phrase "No other keys or commentary are allowed" reduces reasoning capabilities in structured output, and discovered an adverse effect, along with increased token usage.
    • Results suggest that models overthink ethical implications under these conditions.


LM Studio Discord

  • LM Studio API Courts RAG Integration: Users are eyeing the potential of RAG (Retrieval-Augmented Generation) integration with the LM Studio server API, similar to Ollama and Qdrant.
    • While the GUI fetches only the top 3 vectors, the API could enable customized implementations with embeddings and vector databases, according to one user (see the retrieval sketch after this list).
  • ZeroGPU Pro Users Bump into Quota Walls: A ZeroGPU Pro user hit their GPU quota despite upgrading, possibly because they were using a FastAPI backend instead of a Gradio UI.
    • They are seeking advice on resolving the quota issue when calling the ZeroGPU Pro API from their own application.
  • LM Studio Inspires Browser Extension Ideas: Potential browser extensions for LM Studio are being discussed, including webpage translation using Gemma 3 27b and YouTube video summarization.
    • One member suggested extensions to summarize YouTube videos by extracting and summarizing subtitles, while feasibility of real-time webpage translation was debated due to speed constraints.
  • Audio Model Alchemists Brew with PyTorch: A member is experimenting with pretraining an audio model from scratch using PyTorch and a transformer architecture, aiming to generate proper audio from tokens.
    • Another member shared their model's song outputs based on names (e.g., abba.mp3, mj.mp3) and suggested fine-tuning or uploading the model to Hugging Face for broader experimentation.
  • RX 9070 owners report slow speeds: Several users with the new RX 9070 cards are reporting slower inference speeds compared to older cards, with one user reporting their speeds dropped from 5-7 tok/s to around 3 tok/s with a Granite 3.1 8B Q8_0 model.
    • The performance issues are suspected to stem from bugs in AMD's Vulkan drivers.
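
As a toy sketch of the RAG idea above: LM Studio's local server speaks the OpenAI-compatible API, so embeddings can be pulled and ranked by hand. The port and embedding model name below are assumptions about a local setup:

```python
# Toy retrieval sketch against LM Studio's OpenAI-compatible local server.
# Endpoint, API key, and embedding model name are assumptions about a
# local setup; any embedding model loaded in LM Studio would do.
import numpy as np
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

def embed(texts):
    resp = client.embeddings.create(
        model="text-embedding-nomic-embed-text-v1.5", input=texts
    )
    return np.array([d.embedding for d in resp.data])

docs = ["LM Studio serves local models.", "Qdrant stores vectors.", "RAG retrieves context."]
doc_vecs = embed(docs)
q = embed(["How do I store vectors?"])[0]
# cosine similarity, then return the best-matching document
scores = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
print(docs[int(np.argmax(scores))])
```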


aider (Paul Gauthier) Discord

  • Claude Code Copies Aider Web Search: A user observed that Claude code is implementing web search in a similar fashion to Aider, which was demonstrated in a post on X.
    • It was clarified that the new Claude web search feature is currently exclusive to Claude Desktop.
  • Aider's Commit Flag Triggers Hook Headaches: Aider adds the --no-verify flag during commits, bypassing system hooks, according to aider/repo.py code.
    • The maintainer explained that this is because commit hooks could cause arbitrarily strange things to happen, suggesting the use of lint and test hooks as a workaround.
  • o1-pro API Costs Price Users Out: Users trying o1-pro via the API reported exorbitant costs of $30 per full send, rendering it prohibitive.
    • The high cost spurred discussions on caching mechanisms, with speculation on whether OpenAI's automatic prompt caching could help mitigate expenses.
  • Pipx Package Installation Woes on Ubuntu: A user encountered difficulties installing Aider for all users on Ubuntu, despite advice to use sudo pipx install --global aider-chat.
    • They eventually succeeded by installing with uv at /usr/local/bin after overcoming pip and version conflict issues.
  • Aider's Auto-Fixing needs manual prompting: A user reported that Aider needs manual prompts such as "fix the tests" after each failure, despite having enabled the --auto-test parameter, referencing the documentation here.
    • Aider should automatically fix test failures if configured with the "--auto-test" setting.


Perplexity AI Discord

  • Deep Research Limits Debated Fiercely: Users are debating Deep Research usage limits, referencing the Perplexity blog stating unlimited access for Pro, while others cite a 500 queries per day limit.
    • A member pointed to a tweet by Aravind Srinivas indicating Paid users only need to pay $20/mo to access an expert level researcher on any topic for 500 daily queries.
  • GPT 4.5's Disappearance Creates Confusion: Users report GPT 4.5 is missing from Perplexity Pro, with some suggesting the model was removed after gaining new subscribers.
    • Some users lauded 4.5 as SOTA for writing text while others deemed it slow and uninsightful, creating uncertainty among the user base.
  • Perplexity Users Frustrated by Auto Model Switching Glitch: Users are experiencing a glitch where Perplexity reverts to the Auto model, even after selecting a specific model like Claude.
    • This issue requires users to manually reselect their preferred model, leading to frustration, especially among those who favor Claude over R1.
  • API Key Spend Tracking Feature Requested: A feature request was submitted to GitHub to allow users to name API keys for better spend tracking.
    • Currently, users can track spend by API key, but lack the ability to assign names, hindering efficient management of API usage costs.
  • R1-1776 Finetuning Faces Censorship Scrutiny: An independent researcher found canned CCP answers and censored content in R1-1776-671B and the distilled R1-1776-70B when prompted on topics like Tiananmen Square, documented in this blog post.
    • The researchers raised concerns regarding political bias and content filtering in the open-source weights of the model.


Interconnects (Nathan Lambert) Discord

  • Claude Unleashes Web Search: Web search is now available in claude.ai, enabling Claude to finally search the internet and deliver true positives for research queries, confirmed in this tweet.
    • It was later confirmed the search engine being used by Claude is Brave.
  • Midjourney Lead Swaps Beauty for Code: After 3 years leading model development at Midjourney, a key member joined Cursor to work on coding agents, marking a shift from a focus on beauty and creativity to code, as noted in this tweet.
    • The move signals a growing emphasis on practical AI applications in coding environments.
  • InternVL's Training Code Opens Up: Members expressed surprise that InternVL has open source training code, making it one of the few notable models with open training pipelines, with InternVL's packing implementation provided as an example of the dataloading approach.
    • The open-source nature of InternVL allows the community to inspect the data loading process and dataset iteration.
  • SuperBPE Tokenizer boosts efficiency: SuperBPE, a new superword tokenizer that includes tokens spanning multiple words, created a model that consistently outperforms the BPE baseline on 30 downstream tasks (+8% MMLU), while being 27% more efficient at inference time, described in this tweet.
    • At a fixed vocab size of 200k, SuperBPE reduces sequence length by 33% on average.
  • Smaller Models Benefit from Synthetic Augmentation: Members discussed whether smaller datasets are a new trend, with larger models like GPT-4.5 potentially needing more data, especially during various post-training stages; the conversation also touched on the use of synthetic data to augment smaller datasets for training smaller models.
    • The conversation suggested a trade-off between data size, model size, and the use of synthetically generated data, implying a strategy where smaller models might rely more on enhanced datasets, while larger models can effectively utilize larger volumes of raw data.


LMArena Discord

  • Claude Gets Overrated, Grok3 Still King?: Community members suggest Claude is overrated in coding due to limited evaluations beyond SWE-bench, hinting it doesn't match Grok3 on livecodebench.
    • The ratings may be skewed by non-developers, leading to inaccurate assessments of its true capabilities.
  • Gemma Gets Glowing Review: Members were amazed by Gemma3's 1340 score and its relatively small 27B parameter size.
    • One member described Gemma's responses as autistic, giving very brief answers, often when a much longer one is warranted.
  • Deepseek R1 Hogging VRAM: Deepseek R1 requires around 1000GB of VRAM, with one user deploying it on 8xH200s.
    • Despite high VRAM usage, there are claims that Deepseek R1 exhibits baked-in PRO CHINA biases, raising concerns about its use, with one user saying tldr deepseek is #&&@% don't recommend using it.
  • Qwen 3 Coming Soon, Qwen 2.5 Omni Announced: Reports indicate that Qwen 3 is coming soon, confirmed by a post on the Hugging Face Transformers repository.
    • This news follows the announcement of Qwen 2.5 Omni, sparking interest and anticipation within the community, as noted in a Tweet from Lincoln 🇿🇦.
  • Sora's Turbo Version Struggles, Hype not Matching Reality: Users found Sora's public release underwhelming compared to its promotional materials, and maybe inferior to competitors like Keling AI and Hailuo AI.
    • It's suspected that OpenAI used huge amounts of compute over many hours to generate the promotional clips, and that the released Sora is the turbo version.


Notebook LM Discord

  • NLM's Podcast Feature Gets Mixed Reactions: Users are reporting positive experiences with NotebookLM's Podcast feature, though some find that the AI cuts them short during discussions.
    • One user likened the experience to being part of a radio show where I can talk to hosts, but felt like a third wheel because the AI would revert to its own script.
  • Gemini 1.5 Pro Powers NotebookLM: Users discuss the underlying model of NotebookLM, with speculation pointing towards Gemini 1.5 Pro, while others suggest Gemini 2.0.
    • The discussion underscores the importance of NotebookLM staying grounded in its sources, a key differentiator from Gemini.
  • Users Seek Streamlined PDF Processing: A user is seeking a more efficient workflow for scanning physical papers into private online storage and making them searchable via natural language queries; they ask whether taking photos with an iPhone and sending them to NLM for automatic naming and OCR would be more efficient.
    • The current manual process involves scanning to PDF, sending to Gmail, manually naming each file, OCR processing, and importing into NotebookLM.
  • AI Avatar Lip Sync Services Face Off: Members compared lip syncing services for AI avatars, noting that Hedra is great but pricey.
    • RunwayLM garnered less favorable feedback.
  • Mind Map Feature Slowly Unveiled: The Mind Map feature rollout is proceeding slowly, with many users, including Plus subscribers, not yet seeing it in their accounts.
    • Staff confirmed it will take a few days for all users to get it.


Nous Research AI Discord

  • Nvidia Blackwell RTX Pro Sparks Supply Chain Concerns: Nvidia launched the Blackwell RTX Pro series for various platforms, potentially squeezing the already tight Blackwell GPU supply.
    • While Nvidia anticipates improved GPU availability by May/June, skepticism persists among community members.
  • Dataset Evaluation & Augmentation Proves Paramount: Discussions highlighted dataset evaluation, augmentation, sorting, and categorization as effective methods for using GPU hours, with a suggestion to filter data using a small model.
    • A member noted the potential of using a small model to reject data, describing the area as "underexplored in public" and citing Predictive Data Selection and Programming Every Example (a loss-based filtering sketch follows this list).
  • DeepHermes 24B Stumbles on Multi-GPU Setup: A user encountered Out-of-Memory (OOM) errors running DeepHermes 24B on a 5x 3090 setup using llama.cpp, even with minimal context settings.
    • Suggested solutions involved using the 8-bit version, and verifying multi-GPU configurations with --device, --split-mode, and --tensor-split flags.
  • Hermes 3 Powers Up with Llama 3.2: Nous Research released Hermes 3 3B, a new addition to the Hermes LLM series, detailed in the Hermes 3 Technical Report.
    • This model features advanced agentic capabilities, improved roleplaying, reasoning, multi-turn conversation, and long context coherence over Hermes 2.
  • C# Developer Champions Anthropic LLMs: A developer offered their C# expertise and professional LLM experience to the community, highlighting their work on documentation and examples for Anthropic.
    • They cited examples such as a Titanfall 2-based generator and the Bladewolf example from Metal Gear Rising, accessible on the Anthropic GitHub.
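
A minimal sketch of the loss-based filtering idea above: score each example with a small proxy model and keep only what it predicts poorly. The model choice and threshold are illustrative assumptions:

```python
# Sketch of "reject data we already predict well": score each example's
# next-token loss under a small proxy model and keep only high-loss examples.
# Proxy model ("gpt2") and threshold (3.0) are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def loss_of(text: str) -> float:
    ids = tok(text, return_tensors="pt", truncation=True, max_length=512).input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return out.loss.item()  # mean next-token cross-entropy

corpus = ["the cat sat on the mat", "q7# zlp vortex manifold gauge"]
kept = [t for t in corpus if loss_of(t) > 3.0]  # keep poorly-predicted data
```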


HuggingFace Discord

  • Hugging Face APIs Suffer 404 Meltdown: Multiple Hugging Face API models experienced widespread 404 errors, causing significant downtime for dependent applications.
    • Users reported the outage lasted almost a whole day without official acknowledgement, urging the HF dev team for immediate attention.
  • Roblox's Voice Safety Classifier Speaks Up: Roblox released a large classification model trained on 2,374 hours of real-world voice chat to detect toxicity.
    • The model outputs a tensor with labels like Profanity, DatingAndSexting, Racist, Bullying, Other, NoViolation, and uses a synthetic data pipeline detailed in this blog post.
  • Fuse GPU VRAM via Tensor Tricks: Users explored techniques to combine VRAM from multiple GPUs, like running Gemma3-12B on an A2000 12GB and a 1060 6GB using tensor parallelism.
    • References were made to Ollama issues on GitHub and llama.cpp discussions for more on multi-GPU support.
  • Oblix Platform Juggle AI on Cloud and Device: The Oblix.ai platform intelligently routes AI tasks to cloud or edge based on complexity, latency requirements, and cost considerations, using autonomous agents for optimal performance.
    • A YouTube video demonstrates how Oblix dynamically decides whether to process each AI request locally or in the cloud.
  • Gradio Upgrade Unwraps Dataframe Feature: A user reported that upgrading to Gradio 5.22 caused the gr.Dataframe(wrap=True) feature to stop working; this wrapping feature was only functioning in Gradio 5.20.
    • No further information about this issue was provided.


MCP (Glama) Discord

  • Microsoft Intros Semantic Workbench: Microsoft launched the Semantic Workbench, a VS Code extension, which is a tool to prototype intelligent assistants, agents, and multi-agentic systems, prompting questions about its role as an MCP.
    • A member specifically inquired if the tool functions as an MCP.
  • MySQL Server Bombs Out: A user is encountering issues connecting mcp-mysql-server to Docker MySQL, reporting connection failures despite it working outside of MCP.
    • The error occurs with every connection attempt, creating a significant development hurdle.
  • Glama API 500 Error: A user reported receiving a 500 error from the Glama API, but another member stated that there have been no outages in the last 24 hours, and shared a code sample.
    • The code to reproduce is curl -X 'GET' 'https://glama.ai/api/mcp/v1/servers?first=10&query=github' -H 'accept: application/json'.
  • DaVinci Resolve MCP Seeks Speedy Server Claim: A user is seeking to resubmit their DaVinci Resolve MCP project with a license and updates, and was told that claiming the server might speed up the update process.
    • The project's repo hosts the relevant code.
  • Calendar Scheduling Gets Automated: A blog post detailed the use of Asana MCP and Google Calendar MCP with Goose to automate task scheduling.
    • Tasks are pulled from Asana, analyzed, and scheduled in Google Calendar with a single prompt.


OpenRouter (Alex Atallah) Discord

  • OpenRouter Eyes TTS, Image Gen Rollout: Members expressed interest in OpenRouter offering TTS and image generation, with some voicing concerns about potentially high pricing.
    • Pricing details and release dates for the new features are still under wraps.
  • Groq Hits Speed Bump, Not Sambanova: A member reported that Sambanova was down, but quickly corrected the statement, clarifying that it was Groq that was experiencing issues.
    • Service status updates for Groq were not immediately available.
  • GPT-4o Lands on OpenRouter: GPT-4o-64k-output-alpha is now available on OpenRouter, supporting both text and image inputs with text outputs.
    • The pricing is set at $6/M input tokens and $18/M output tokens.
  • Fireworks Heats Up Pricing War: Fireworks slashed pricing for R1 and V3, with V3 allegedly matching existing performance, pegged at .9/.9.
    • The move intensifies competition in the generative AI service market; more information can be found on the Fireworks pricing page.


GPU MODE Discord

  • Nvidia Talk Eyes Pythonic CUTLASS: Attendees at GTC will hear about the Pythonic future of CUTLASS in its next major 4.0 release, including its integration into Python.
    • Previously, a member announced their GTC presentation titled Performance-Optimized CUDA Kernels for Inference With Small Transformer Models [S73168] happening today at 4pm, focused on Hopper architecture.
  • BFloat16 Atomic Addition Sucks: A member reported that using tl.atomic_cas with a lock for atomic addition with bfloat16 actually works, but it sucks.
    • The member is seeking improvements to the implementation, and offered a code snippet using tl.atomic_cas with a lock, inviting the community to enhance its performance (a sketch of the lock pattern follows this list).
  • Triton's Simplicity Entices GPU Newbies: A member highlighted that Triton's key strength lies not in peak performance, but in its accessibility, enabling individuals with limited GPU experience to create complex kernels, and pointed to lucidrains/native-sparse-attention-pytorch as an example.
    • They noted that achieving peak performance on predefined workloads is relatively straightforward, but Triton's robustness is what sets it apart.
  • FlashMLA's SmemLayoutP Unveiled: A member inquired about the dimensions of SmemLayoutP in the FlashMLA code, specifically its shape ((2,2), kNThreadsS, 1, kBlockN/8) and the role of kNThreadsS in synchronizing P between warpgroups.
    • The member speculated whether other dimensions might be related to wgmma, awaiting clarification from other experts.
  • Grayscale Leaderboard Smokes the Competition: Multiple leaderboard submissions to the grayscale leaderboard were successful on GPUs: L4, T4, A100, and H100 using Modal runners with IDs 2351, 2429, 2430, 2431, 2459, and 2460.
    • Benchmark submission with id 2363 to leaderboard vectoradd on GPUs: T4, L4, A100, H100 using Modal runners also succeeded, indicating progress in the vectoradd benchmark across various GPU architectures.
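
For context on the bfloat16 workaround above: Triton lacks a native bf16 atomic add, so the usual pattern guards a plain load-add-store with a spinlock held in global memory, similar to the lock idiom in Triton's own tutorials. A rough sketch follows (not the member's snippet; serializing every program on one lock is exactly why "it sucks"):

```python
# Spinlock-guarded bf16 accumulation in Triton: acquire a global int32 lock
# with atomic_cas, do a plain read-modify-write, then release. All programs
# serialize on the single lock, which is the performance problem.
import torch
import triton
import triton.language as tl

@triton.jit
def locked_bf16_add(x_ptr, v_ptr, lock_ptr, n, BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    v = tl.load(v_ptr + offs, mask=mask, other=0.0)
    # spin until we swap the lock from 0 to 1 (acquire)
    while tl.atomic_cas(lock_ptr, 0, 1) == 1:
        pass
    # critical section: plain read-modify-write, safe only while holding the lock
    x = tl.load(x_ptr + offs, mask=mask, other=0.0)
    tl.store(x_ptr + offs, x + v, mask=mask)
    tl.atomic_xchg(lock_ptr, 0)  # release

x = torch.zeros(1024, dtype=torch.bfloat16, device="cuda")
v = torch.randn(1024, dtype=torch.bfloat16, device="cuda")
lock = torch.zeros(1, dtype=torch.int32, device="cuda")
locked_bf16_add[(4,)](x, v, lock, x.numel(), BLOCK=256)
```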


Nomic.ai (GPT4All) Discord

  • Oblix Orchestrates Local vs Cloud LLMs: A member shared a demo video (https://youtu.be/j0dOVWWzBrE) of Oblix, which seamlessly switches between local vs cloud, using agents to monitor system resources and make decisions dynamically.
    • The platform orchestrates between Ollama and OpenAI for optimal performance and cost-efficiency, as detailed on Oblix.ai.
  • AI Engineers Compare LLM Leaderboards: Members shared links to Artificial Analysis and LM Arena to find reliable LLM leaderboards for specific purposes.
    • Concerns were raised about filtering relevant models from these lists, particularly avoiding outdated options like Grok-3.
  • Members Design Medical Data Processing PC: A member requested assistance with building a new PC to process medical data using AI, emphasizing the need for secure, offline operation.
    • Another member suggested starting with an Intel i9, 128GB RAM, and an Nvidia 4090 RTX.
  • GPT4All Struggles with Audio Transcription: A member inquired about using GPT4All for local audio file transcription, specifically uploading .wav files, but found that it wasn't working.
    • Another member clarified that GPT4All is primarily designed for docs/PDFs, recommending XTTS webui for wav-to-text conversion, but cautioned that the installation process is complex (see the note after this list).
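A note on that last item: XTTS is a text-to-speech system, so for the speech-to-text direction a common fully local option is OpenAI's open-source Whisper. A minimal sketch, assuming pip install openai-whisper and ffmpeg on the PATH:

```python
import whisper

model = whisper.load_model("base")           # small, CPU-friendly checkpoint
result = model.transcribe("recording.wav")   # runs fully offline after the model download
print(result["text"])
```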


Yannick Kilcher Discord

  • W-GANs Sidestep Gradient Saturation: W-GANs mitigate gradient saturation because the Wasserstein loss is linear in the critic's output, avoiding the BCE issues of traditional GANs, as shown in Figure 2 of the W-GAN paper.
    • However, instability can still arise if the generator or discriminator becomes overly dominant, leading to saturation on both sides; the loss contrast is sketched after this list.
  • Transformers Get Soft with Slots: Members shared an image analyzing soft slot methods, showing how soft slots dynamically bind to input tokens or retrieved content in Transformers.
    • Equations for Attention and Soft Slots (S') were shown, with learnable slots using softmax and scaled dot-product attention.
  • OpenAI.fm's UX/UI: Fast but Flawed?: Members joked about the simple and rushed UX/UI of OpenAI.fm.
    • One member pointed out that a more structured protocol is easily disrupted by less structured protocols that can evolve according to user needs, and that clients consume more of what they like and less of what they don't.
  • G-Retriever Enables Chatting with Graphs: The G-Retriever paper details the semantic extraction of information from knowledge graphs, enabling chatting with your graph, graph QnA and Graph RAG.
    • The paper introduces a Graph Question Answering (GraphQA) benchmark with data from scene understanding, common sense reasoning, and knowledge graph reasoning.
  • Moore's Law Accelerates AI?: Members are discussing METR_Evals' research suggesting "Moore’s Law for AI agents", claiming the length of tasks AIs can do is doubling about every 7 months.
    • Some members pushed back on the claim, arguing that certain tasks are not interesting targets for probabilistic models.
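On the W-GAN item above: a minimal sketch contrasting the saturating BCE objective with the linear Wasserstein critic objective. The critic and data below are stand-ins, not from the discussion.

```python
import torch
import torch.nn.functional as F

critic = torch.nn.Linear(64, 1)          # stand-in critic; no sigmoid on output
x_real, x_fake = torch.randn(8, 64), torch.randn(8, 64)
d_real, d_fake = critic(x_real), critic(x_fake)

# Standard GAN: BCE on sigmoid(d) saturates as the discriminator grows
# confident, so gradients to the generator vanish.
bce_d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

# WGAN: the critic loss is linear in the critic scores, so its gradient does
# not saturate (a Lipschitz constraint, e.g. weight clipping or a gradient
# penalty, is still required).
w_critic_loss = d_fake.mean() - d_real.mean()
w_gen_loss = -d_fake.mean()
```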


LlamaIndex Discord

  • Local RAG App Deployed for Code Chat: A fully local, fully open-source RAG app has been built that can chat with your code and was announced in this tweet.
    • The app uses GitIngest to parse the code into summaries and markdown, Streamlit for the UI, and Meta's Llama 3.2 running locally via Ollama; a minimal sketch of the pattern follows this list.
  • TypeScript Bundler Config Fixed Import Bug: A member using LlamaIndex TS had an issue importing agent, which was resolved by updating the tsconfig bundler configuration.
    • The user confirmed that modifying the TS config resolved the import error, and thanked the community for the suggestion.
  • Parallel executions limited in Agent Workflows: A member asked about limiting parallel executions in Agent Workflows, specifically for a tool with a human-in-the-loop event due to the agent calling the tool multiple times in parallel.
    • The question was answered on GitHub, since the user wanted to ensure the tool was called only once at a time.
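A minimal sketch of the local RAG pattern from the first item above, assuming the llama-index-llms-ollama and llama-index-embeddings-huggingface integrations are installed and GitIngest's markdown output sits in ./summaries (paths and model names are illustrative):

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# Everything runs locally: Ollama serves the LLM, a small HF model embeds.
Settings.llm = Ollama(model="llama3.2", request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

docs = SimpleDirectoryReader("./summaries").load_data()
index = VectorStoreIndex.from_documents(docs)
print(index.as_query_engine().query("How does the auth module work?"))
```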


Cohere Discord

  • Account Limit Trumps Trial Key Limit: Users clarified that the monthly limit of 1k requests for trial keys is per account, not per key.
    • They cautioned that creating multiple accounts to bypass this limit will result in removal of all accounts.
  • Cohere APIs Throw Errors: Users encountered various Cohere API error messages, including invalid requests, rate limiting, and token-limit errors, caused by empty documents, short prompts, exceeding token limits, and incorrect model specifications.
    • Rate-limiting errors carry a 429 status code, as detailed in the Cohere API documentation; a retry sketch follows this list.
  • Cohere User Seeks Rate Limit Checker: A user inquired about an API to check their remaining rate limit usage.
    • Currently, there doesn't appear to be a direct API solution available.
  • Hospitality Expert Pioneers Low-Code Tech: Gaby, a professional in the hospitality industry, introduced herself as a low-code tech enthusiast, proficient with platforms like Make and Adalo.
    • Her expertise showcases the growing importance of low-code tools in various industries.
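On the rate-limit item above: a hedged sketch of exponential backoff on 429 responses using the Cohere Python SDK. The ApiError import and chat signature follow the v5 client and should be treated as assumptions.

```python
import time
import cohere
from cohere.core import ApiError

co = cohere.Client("YOUR_API_KEY")  # trial keys: 1k requests/month per account

for attempt in range(5):
    try:
        resp = co.chat(model="command-r", message="Hello!")
        print(resp.text)
        break
    except ApiError as err:
        if err.status_code == 429:   # rate limited: back off and retry
            time.sleep(2 ** attempt)
        else:
            raise                    # other errors (bad request, etc.): surface
```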


Modular (Mojo 🔥) Discord

  • Mojo's Duration Module Displays Weirdness: A developer working on a duration module proposal for Mojo ran into unexpected behavior with type casting between Ratio and Duration structs, sharing a code snippet to demonstrate the issue.
    • The specifics of the bug involve unexpected results when converting between the two time formats.
  • Mojo and PyTorch Team Up?: A member wondered whether using PyTorch in Mojo could speed up training with MAX.
    • The inquiry did not receive a response, leaving the potential benefits unconfirmed.
  • Mojo Community Debates Nanosecond Precision: The community debated using nanosecond precision as the base unit for time representation in Mojo; one member noted that a UInt64 of nanoseconds can cover over 500 years.
    • Another member countered that C++'s chrono guarantees its default nanosecond-resolution duration covers a range of at least ±292 years, while emphasizing that the second is the SI base unit of time; the arithmetic behind both figures is sketched below.
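The arithmetic behind both figures, as a quick check:

```python
# How many years fit in a 64-bit count of nanoseconds?
SECONDS_PER_YEAR = 365.25 * 24 * 3600            # Julian year

uint64_years = 2**64 / 1e9 / SECONDS_PER_YEAR    # unsigned, as proposed for Mojo
int64_years = 2**63 / 1e9 / SECONDS_PER_YEAR     # signed, what C++ chrono requires

print(f"UInt64 nanoseconds span ~{uint64_years:.0f} years")   # ~585
print(f"Int64 nanoseconds span ~±{int64_years:.0f} years")    # ~±292
```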


DSPy Discord

  • MIPRO v2 Judges LLMs: A member reported using MIPRO v2 with LLM-as-a-judge as their evaluation metric and shared a link to a math reasoning tutorial showcasing its use.
    • The tutorial demonstrates MIPROv2 optimizing a program against an LLM-judged metric.
  • DSPy Shares LLM-as-a-Judge Documentation: Documentation on utilizing LLM-as-a-judge was shared from DSPy's learning resources.
    • The documentation details the use of AI feedback for metric evaluations.
  • Automatic Metrics Optimize DSPy: It was emphasized that automatic metrics are critical for evaluation and optimization within DSPy.
    • DSPy employs metrics to monitor progress and enhance program effectiveness.
  • Metrics Evaluate Task Performance: A metric is a function that scores system outputs against data examples; simple tasks may use basic metrics like accuracy or exact match.
    • Complex tasks benefit from metrics that assess multiple output properties via AI feedback; a minimal sketch of both styles follows this list.
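A minimal sketch of both metric styles under the convention above. Metric functions take (example, prediction, trace) per DSPy's docs; the judge signature here is an illustrative assumption, and an LM must already be configured via dspy.configure.

```python
import dspy

# Simple task: exact match on the answer field.
def exact_match(example, pred, trace=None):
    return example.answer.strip().lower() == pred.answer.strip().lower()

# Complex task: LLM-as-a-judge via a small DSPy program (signature assumed).
class Assess(dspy.Signature):
    """Assess an answer along a given quality dimension."""
    question = dspy.InputField()
    answer = dspy.InputField()
    criterion = dspy.InputField()
    verdict = dspy.OutputField(desc="Yes or No")

def judged_correct(example, pred, trace=None):
    out = dspy.Predict(Assess)(
        question=example.question,
        answer=pred.answer,
        criterion="Is the answer factually correct and complete?",
    )
    return out.verdict.lower().startswith("yes")
```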


tinygrad (George Hotz) Discord

  • Member Questions Unet3d's Dimensions: A member asked whether the example unet3d model is actually 3D, proposing it might be better described as 2.5D because it applies 2D convolutions and 2D transposes to 3D input.
    • They drew attention to the difference from a real 3D Unet architecture.
  • 2D Convolutions Mimic 3D: The conversation clarified that using 2D convolutions on 3D input creates a 2.5D effect, in contrast to true 3D Unet architectures which use genuine 3D operations.
    • The original poster requested clarification on the dimensionality of the implementation (the distinction is sketched below).
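The distinction, sketched in PyTorch for familiarity (shapes are arbitrary; the same logic applies to a tinygrad implementation):

```python
import torch
import torch.nn as nn

vol = torch.randn(1, 1, 16, 64, 64)        # (batch, channels, depth, H, W)

# True 3D conv: the kernel also spans the depth axis.
conv3d = nn.Conv3d(1, 8, kernel_size=3, padding=1)
print(conv3d(vol).shape)                   # torch.Size([1, 8, 16, 64, 64])

# "2.5D": a 2D conv applied slice-by-slice; no information mixes across
# neighboring depth slices.
conv2d = nn.Conv2d(1, 8, kernel_size=3, padding=1)
out = torch.stack([conv2d(vol[:, :, d]) for d in range(vol.shape[2])], dim=2)
print(out.shape)                           # torch.Size([1, 8, 16, 64, 64])
```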


Torchtune Discord

  • Paper Shared on Torchtune: krammnic shared a paper on the Torchtune channel.
    • No discussion occurred about this paper.
  • Paper's Relevance Unassessed: The paper's title and abstract suggest potential relevance to ongoing discussions in the Torchtune community.
    • With no follow-up in the channel, its specific contributions and applicability to current projects remain to be assessed.


The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Codeium (Windsurf) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: !

If you enjoyed AINews, please share with a friend! Thanks in advance!

Don't miss what's next. Subscribe to AI News (MOVED TO news.smol.ai!).
Powered by Buttondown, the easiest way to start and grow your newsletter.