AI News (MOVED TO news.smol.ai!)

Archives
March 1, 2025

[AINews] not much happened today

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜


a quiet day.

AI News for 2/27/2025-2/28/2025. We checked 7 subreddits, 433 Twitters and 29 Discords (221 channels, and 8236 messages) for you. Estimated reading time saved (at 200wpm): 795 minutes. You can now tag @smol_ai for AINews discussions!

Much discussion about the relative merits of GPT 4.5, which you can read below.


The Table of Contents and Channel Summaries have been moved to the web version of this email!


AI Twitter Recap

GPT-4.5 Model Performance and User Perception

  • Initial User Experiences and Subjective Evaluation: @karpathy conducted a poll comparing GPT-4 and GPT-4.5, finding that in 4 out of 5 questions, users preferred GPT-4, which was surprising as @karpathy personally found GPT-4.5 better in all cases, suggesting a possible preference for "high-taste testers" towards GPT-4.5's deeper charm, creativity, and humor. However, @jeremyphoward responded to Karpathy's poll results, stating that the awkwardness, not "high taste" was the reason for user preference. @Teknium1 also reacted to the poll results with "Damn lol must have some high, or low, taste people testing here idk". @abacaj expressed strong dissatisfaction, stating GPT-4.5 needs to enhance productivity to be useful, otherwise it is "fucking useless". @abacaj also argued that if GPT-4.5 is only a "high taste" model, it is "blowing investor money". @stevenheidel likened the GPT-4.5 launch to the initial ChatGPT excitement, as people are again having fun chatting with AI.
  • Concerns Regarding Speed and Practicality: @abacaj noted GPT-4.5 is "very slow" and "impractical to use for agent loops", despite being "fun to prompt". @abacaj elaborated that it takes "3+ minutes to answer one question" in a moderate prompt loop, deeming it "very impractical". @abacaj further commented that GPT-4.5 "feels more like a research artifact than a real model you can deploy" due to its slowness.
  • Critique of Capabilities and Value Proposition: @abacaj criticized the showcased capabilities of the "largest language model", questioning if drawing a triangle using SVG is the highlight. @abacaj found the value add for end-users questionable, suggesting internal use within OAI for distillation.
  • Pricing and Economic Viability: @Yuchenj_UW remarked that the pricing "makes even less sense" in light of GPT-4.5's performance. @Yuchenj_UW speculated about the potential pricing of GPT-5 and o4. @AravSrinivas highlighted Perplexity Deep Research at $20/month versus ChatGPT at $200/month.
  • Performance Compared to Other Models: @METR_Evals reported that GPT-4.5 performs above GPT-4o but below o1 or Claude 3.5 Sonnet based on METR experiments with an earlier checkpoint, noting a time horizon score of ~30 minutes. @dylan522p stated Claude 3.7 beats GPT 4.5 on most tasks, but GPT 4.5 has better "vibes" and is the first model since Claude 3 Opus to make them laugh, emphasizing humor as intelligence. @scaling01 speculated GPT-4.5 could be "GPT-4o x 10" in size, estimating around 5T parameters. @Teknium1 mentioned Grok's context window is only 128k. @multimodalart shared evaluations comparing GPT 4.5 with non-thinking models like Sonnet 3.7, Deepseek V3, and Grok 3.
  • Emotional Intelligence (EQ) and "Vibes": @karpathy found Claude 3.7's humor to be the funniest after scrutinizing LLM outputs for humor. @random_walker argued that the "EQ" improvements in GPT 4.5 are due to post-training, not parameter count, suggesting any EQ differences are behavioral rather than capability-based. @random_walker further claimed that GPT-4o and GPT-3.5 can exhibit similar EQ behavior as GPT-4.5 with appropriate post-training. @omarsar0 suggested using the OpenAI Playground to compare models and observe GPT-4.5's "thoughtful" responses. @omarsar0 noted GPT-4.5 often sounds more "thoughtful" by adding sensations and thoughts. @marktenenholtz observed that Sonnet 3.7 is "almost too eager" and GPT-4.5 is "almost too deferential".
  • Technical Details and Training: @sama credited @ColinWei11, Yujia Jin, and @MikhailPavlov5 for the difficult work at the intersection of ML and systems required for GPT-4.5. @cloneofsimo highlighted that GPT4.5 was "trained on multiple datacenters" and "aggressively used low precision training", implying "diloco goes brr" and the benefit of fp8 training due to high granularity. @rasbt pointed to the system card mentioning "new supervision techniques" used in training. @rasbt mentioned that apparently character-training was not used. @Teknium1 questioned how GPT-4.5's knowledge cutoff remains 2023 despite current pretraining runs, speculating about data contamination from ChatGPT 3.5 data or if the model was trained long ago.

Model Architecture, Scaling Laws and Efficiency

  • Scaling Law Limitations and Alternative Approaches: @Yuchenj_UW suggested that the GPT-4.5 release indicates LLM pre-training scaling has plateaued, noting that a 10x compute increase yields limited improvement, which allows companies like xAI to catch up through innovation in algorithms and data, as demonstrated by DeepSeek's efficiency gains. @jxmnop echoed this, suggesting GPT 4.5 might signal "the beginning of the end for scaling laws", questioning if data is exhausted or if scaling laws fail to capture desired task performance. @ibab emphasized that algorithms are increasingly important with larger models, suspecting training details are key to Grok 3's performance. @MParakhin stated pre-training needs higher-perplexity targeted data and Active Learning to progress further. @teortaxesTex asserted that non-thinking LLMs pretrained on natural data have hit their practical limit, doubting a $1T training run would significantly improve them.
  • Inference Compute and Efficiency: @rasbt clarified that train- and inference-compute are orthogonal ways to improve LLMs and an apples-to-oranges comparison is being made without considering inference-compute scaling for GPT-4.5. @rasbt questioned if GPT-4.5 is more expensive and slower than o1 (GPT4-sized + inference-compute scaling) and what GPT-4.5 with o1-style scaling would look like. @iScienceLuvr highlighted research on "Thinking Slow, Fast", using distilled reasoners based on smaller models like Llama-1B and -3B with Mamba architecture to improve inference scaling. @_akhaliq shared FlexiDiT, a diffusion transformer framework that generates high-quality samples with less compute by using varying patch sizes during denoising. @TheTuringPost discussed Chain of Draft (CoD), which encourages models to generate short reasoning steps to reduce costs and speed up models while maintaining accuracy.
  • Hardware and System Architecture: @reach_vb highlighted DeepSeek's Fire-Flyer File System (3FS), noting its disaggregated architecture, strong consistency using CRAQ, stateless metadata services, and KVCache for inference, achieving high read throughput and outperforming in benchmarks. @teortaxesTex discussed the N4 process allowing 2.32x denser chips compared to N7, based on transistor counts and die sizes. @awnihannun reported Kimi's Moonshot 16B MoE model running nicely on M4 Max with MLX at 154 toks/sec, performing as well as or better than dense 7Bs. @casper_hansen_ commented on CUDA's moat, noting even AMD engineers use CUDA for tensor engines.
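The Chain of Draft idea described above can be sketched as a prompt template. The exact wording below is an illustrative assumption, not the paper's prompt: the point is simply to cap each intermediate reasoning step at a few words instead of allowing verbose chain-of-thought.

```python
# Sketch of a Chain of Draft (CoD) style prompt: ask for terse intermediate
# steps rather than verbose chain-of-thought, per the description above.
# Wording and word cap are illustrative assumptions.

def chain_of_draft_prompt(question: str, max_words_per_step: int = 5) -> str:
    """Build a CoD-style prompt that caps the length of each reasoning step."""
    return (
        "Think step by step, but keep only a minimal draft of each step, "
        f"at most {max_words_per_step} words per step. "
        "Return the final answer after '####'.\n\n"
        f"Q: {question}"
    )

prompt = chain_of_draft_prompt(
    "A bat and ball cost $1.10 total; the bat costs $1 more than the ball. "
    "What does the ball cost?"
)
print(prompt)
```

The cost/latency savings come entirely from the shorter completions this elicits; the model and decoding loop are unchanged.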

Open Source Models, Tools, and Frameworks

  • DeepSeek's Open Source Contributions: @Yuchenj_UW praised DeepSeek for drastically reducing GPU requirements through infrastructure and algorithm optimization and their "goated open source work". @reach_vb shared multiple links and details regarding DeepSeek's Fire-Flyer File System (3FS) and benchmarks. @teortaxesTex mentioned DeepSeek's file system from 2019 is still SoTA. @aidan_mclau jokingly scanned DeepSeek's training data and found "deep commitment from a brilliant team".
  • Hugging Face Ecosystem and Integrations: @_akhaliq provided code snippets for developers to get started with GPT-4.5-preview using ai-gradio[openrouter] and Hugging Face. @ClementDelangue highlighted the French ministry of culture and interior being on Hugging Face. @mervenoyann shared that Microsoft's MAGMA-8B model is easily loadable to Hugging Face transformers. @ClementDelangue announced Perplexity R1-1776 inference directly from HF model page via FireworksAI_HQ. @_akhaliq shared a link to AI Conference Deadlines on Hugging Face.
  • Local LLMs and MLX: @reach_vb shared instructions for running Phi 4 Mini Instruct locally on a Mac using llama.cpp. @awnihannun committed to using local LLMs for a vibe-check on the performance gap, favoring tools like the raw terminal (mlx_lm) and LM Studio, and showcased local inference on M4 Max using MLX for models like Qwen2.5 and Moonshot.
  • Other Open Source Tools and Projects: @pirroh mentioned Replit building their own Copy-On-Write distributed file system before LLMs became coding proficient. @bobvanluijt highlighted Weaviate's open-source vector database and its new features. @_akhaliq shared TALKPLAY, a multimodal music recommendation system with LLMs. @alexalbert__ announced Anthropic API quality of life update allowing public facing URLs for image/document sources. @DeepLearningAI promoted a short course on "Build Apps with Windsurf’s AI Coding Agents" in collaboration with Codeium. @AymericRoucher recommended reading about instrumenting smolagent runs and setting up LLM-judge systems using Arize Phoenix. @mervenoyann advertised a weekly newsletter on open-source art tools. @rasbt shared a tutorial to deploy AI models on public/private cloud using open-source tools.
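The Anthropic quality-of-life update mentioned above lets an image or document content block point at a public URL instead of inlining base64 data. A minimal request body might look like the following sketch; the model name and URL are placeholders, no request is sent, and the field names follow Anthropic's documented Messages API schema.

```python
# Sketch of an Anthropic Messages API request body using a URL image source,
# per the update mentioned above. The URL and model name are placeholders;
# this only builds the payload and sends nothing.

request_body = {
    "model": "claude-3-7-sonnet-latest",
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    # Previously this required {"type": "base64", "data": ...};
                    # the update allows a publicly reachable URL instead.
                    "source": {"type": "url", "url": "https://example.com/chart.png"},
                },
                {"type": "text", "text": "Summarize this chart."},
            ],
        }
    ],
}

print(request_body["messages"][0]["content"][0]["source"]["type"])
```

The practical win is skipping the download-and-base64-encode step when the asset is already hosted somewhere public.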

AI Applications and Industry Use Cases

  • Enterprise AI and Productivity: @perplexity_ai announced Perplexity Deep Research for Enterprise Data, connecting to Google Drive, OneDrive, and SharePoint, enabling deep research across company files and the web with enterprise-grade security. @AravSrinivas further detailed Perplexity Enterprise Pro, emphasizing features like deep research, reasoning, internal/external search, access to all models, and collaboration. @lmarena_ai announced Claude 3.7 Sonnet's top ranking in coding on the Arena, highlighting its capabilities. @AIatMeta showcased Llama being used by SevillaFC with IBM's watsonx to create Scout Advisor for soccer star scouting. @OpenAIDevs highlighted ConsensusNLP using GPT-4.5 for scientific/medical analysis and structured outputs for visualizing research agreement.
  • Agentic AI and Automation: @mervenoyann announced Microsoft's MAGMA-8B vision language action model for physical and digital world operations including embodied robots and web automation. @llama_index shared an example of agentic productivity applications built with LlamaIndex. @RichardSocher suggested using research agents like ARI for extensive literature reviews in serious medical problems, providing an example report.
  • Coding and Development: @nearcyan shared a meme about junior devs watching Claude 3.7 "destroy their codebase in cursor". @HamelHusain stated "It is only possible for me to understand GraphQL because of AI". @cloneofsimo critiqued current automated software development tools like Devin, OpenHands, Replit, and Cursor Compose, finding them unable to complete even small applications end-to-end, lacking in server/client, IPC, queue, and scheduling capabilities. @rishdotblog claimed to have replaced a $100/month tool with a $10 Claude Code solution, suggesting programming jobs and SaaS companies are "going away".

AI Research and Papers

  • Recent Research Paper Highlights: @rasbt provided a list of recent AI research papers covering topics like SWE-RL, LoRA boosting, long-context LLMs, Logic-RL, test-time scaling, AI research agents, model selection, inner thinking transformers, natural reasoning, knowledge acquisition, freelance software engineering with LLMs, sparse attention, unlearning, large language diffusion models, model merging, reasoning-action dilemma, finance LLMs, infinite context, distillation scaling laws, prompt caching, reasoning from demonstrations, hierarchical reasoning, thinking in LLMs, compute-optimal test-time scaling, mathematical reasoning, large memory models, quantized LLMs, video RoPE, scaling up test-time compute, self-backtracking, training efficient reasoning, reasoning advancements, teaching critique via RL, enhancing reasoning for domain applications, less-is-more reasoning, chain-of-thought reasoning, chain-of-associated-thoughts, direct alignment algorithms, embedding layer scaling, and competitive programming with large reasoning models. @iScienceLuvr highlighted papers on FlexiDiT, Self-Training for Concise Reasoning, and Thinking Slow, Fast with Distilled Reasoners, providing abstracts and code links. @omarsar0 shared papers on METAL (modality-tailored critiques for self-correction) and Test-Time Scaling on Chart Generation, noting performance improvements. @_akhaliq linked to papers on Mobius (Text to Seamless Looping Video), FlexiDiT, R1-T1 (Translation Capability Incentivization), and LongRoPE2 (Context Window Scaling). @dair_ai highlighted Google's PlanGEN framework for complex planning and reasoning in LLMs, detailing its constraint-guided verification and adaptive algorithm selection. @DeepLearningAI summarized a paper on Brain2Qwerty, a non-invasive AI system translating brain waves to text using MEG recordings.
  • Cognitive Science and AI Alignment Theory: @AndrewLampinen shared a preprint on "Naturalistic Computational Cognitive Science", synthesizing AI and cognitive science towards generalizable cognition models. @DanHendrycks discussed the evolution of ideas in AI alignment theory, contrasting "random memetic drift" with Yudkowsky's contributions, suggesting GPT is forcing empirical realities on the alignment forum.

Humor and Miscellaneous

  • AI Model Humor and Vibe Checks: @_akhaliq posted animated SVGs as humorous responses from GPT-4.5 about being open-sourced. @_philschmid asked for "vibe test prompts", suggesting counting to ten omitting numbers ending in "e" and generating an SVG of a pelican on a bicycle. @NeelNanda5 shared an LLM hack: "Write your response in the style of a Scott Alexander blog post" for more enjoyable long outputs. @aidan_mclau presented a humorous IQ scale from 0 to infinity, culminating in an enlightened fart joke. @andersonbcdefg shared a meme about asking OpenAI if their model is good or lazy. @Teknium1 posted "GPT4.5 finally knows me, lmao" with an image implying GPT-4.5 understood their personality.
  • Societal and Philosophical Reflections: @RichardMCNgo made an observation about the demographic overlap between high-IQ autism-spectrum biological males, transness, and systemizing thinking. @RichardMCNgo analogized the US presidency since 2012 to progressive chess. @teortaxesTex joked Unitree bots will cause an uptick in solipsism. @francoisfleuret expressed a "nightmare" scenario of nukes, AI, and drones as rational defense. @AmandaAskell humorously suggested an expensive "I totes respect you" pin as an alternative to uncomfortable suits for East Coast formality. @AmandaAskell joked about gendered profile preferences on dating apps.
  • Industry and Community Chatter: @suchenzang posted "big model smell" with a link, and @suchenzang tweeted "things you can't buy for $9bn, maybe not even $30bn...". @nearcyan declared being "done with benchmarks", losing empathy for hyper-dimensional shape descriptions. @agihippo questioned working hours in AI, suggesting "AI people are mostly working all the time!". @ID_AA_Carmack was "very happy to see more classic game source code released", noting the disjoint between game dev and broader open source culture. @c_valenzuelab joked Runway's new about page states "We are brain surgeons for artificial brains.".
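The counting vibe-test from @_philschmid above has a deterministic ground truth, which makes model outputs easy to score. A few lines of Python produce the reference answer:

```python
# Reference answer for the vibe-test "count to ten, omitting numbers
# whose English names end in 'e'": one, three, five, and nine get skipped.
names = ["one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "ten"]
kept = [n for n in names if not n.endswith("e")]
print(kept)  # ['two', 'four', 'six', 'seven', 'eight', 'ten']
```

Part of the test's charm is that models often miss "one", since it ends in a silent "e".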

AI Reddit Recap

/r/LocalLlama Recap

Theme 1. DeepSeek Release: Revolutionary Storage and Data Processing Tech

  • DeepSeek Realse 5th Bomb! Cluster Bomb Again! 3FS (distributed file system) & smallpond (A lightweight data processing framework) (Score: 499, Comments: 73): DeepSeek launches 3FS, a high-performance distributed file system optimized for AI workloads, utilizing modern SSDs and RDMA networks to enhance distributed application development. Additionally, smallpond, a lightweight data processing framework, integrates with DuckDB and 3FS, offering a streamlined solution for data processing tasks. For more information, visit their GitHub page and smallpond repository.
    - 3FS Performance and Comparison: 3FS achieves an impressive 6.6 TiB/s aggregate read bandwidth across its 180-node cluster, far beyond any single machine's DRAM bandwidth. Discussions compared 3FS to other systems like Colossus and noted its unique application in AI training workloads without traditional file read optimizations like caching.
    - Open Source Strategy and Impact: Many commenters appreciated DeepSeek's open-source approach, highlighting its potential to democratize AI advancements and challenge monopolistic tech giants like OpenAI and Nvidia. The open-source culture was emphasized as a reciprocal process, benefiting both contributors and the broader AI community.
    - Technical Insights and Historical Context: 3FS has been in production for over five years, developed by High-Flyer AI and used in their Fire-Flyer II system. It is optimized for large-scale random read operations, employs Direct I/O, and uses the FFRecord format for sample data storage, significantly enhancing AI model training efficiency.
  • DeepSeek OpenSourceWeek Day 5 (Score: 127, Comments: 9): Fire-Flyer File System (3FS) is a parallel file system designed to maximize the bandwidth of modern SSDs and RDMA networks, achieving an impressive 6.6 TiB/s aggregate read throughput in a 180-node cluster and 3.66 TiB/min throughput on the GraySort benchmark with a 25-node cluster. It offers 40+ GiB/s peak throughput per client node for KVCache lookup and supports a disaggregated architecture with strong consistency semantics, facilitating tasks like training data preprocessing and embedding vector search. For more details, visit the 3FS repository and the Smallpond framework.
    - 3FS is highly suitable for AI training workloads and AI inference, offering benefits like random access to training samples without prefetching, high-throughput checkpointing, and a cost-effective KVCache for large language model inference. It also supports data-intensive applications requiring strong consistency and high throughput, as evidenced by its performance on the GraySort benchmark.
    - Users expressed amazement at the development team's productivity, noting the impressive output despite limited manpower. The project originated from the CEO's hedge fund team in 2019, and their recruitment strategy focuses on hiring top CS graduates from elite Chinese universities.
    - Some users find the technical details of 3FS too complex and not directly applicable to most use cases, suggesting a potential mismatch between user expectations and the system's specialized capabilities.
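The headline 3FS numbers above are cluster aggregates, so a quick back-of-the-envelope division (binary units assumed) puts the per-node figures in perspective:

```python
# Back-of-the-envelope check on the 3FS figures quoted above.
# Binary units assumed: 1 TiB = 1024 GiB.
TIB = 1024  # GiB per TiB

aggregate_read = 6.6 * TIB            # GiB/s across the 180-node cluster
per_node_read = aggregate_read / 180  # ~37.5 GiB/s per node
print(f"per-node read: {per_node_read:.1f} GiB/s")

graysort = 3.66 * TIB                 # GiB/min across the 25-node cluster
per_node_sort = graysort / 25 / 60    # ~2.5 GiB/s per node
print(f"per-node GraySort: {per_node_sort:.2f} GiB/s")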

Theme 2. French Reasoning Model: Economical and Effective

  • I trained a reasoning model that speaks French—for just $20! 🤯🇫🇷 (Score: 229, Comments: 78): The post itself contains only a link to a video; the details below come from the comments.
    - Fine-tuning a 7B LLM: TheREXincoming fine-tuned a 7B LLM based on Qwen 2.5 using only 2,000 samples (1K English + 1K French) at a cost of $20. The model performs comparably to R1 Distil 7B on math benchmarks, showcasing minimal knowledge degradation.
    - Model and Data Availability: The fine-tuned model and its dataset are available on Hugging Face (Data, Model, GGUF). The model is designed for high-performance French language capabilities and can serve as a template for training reasoning LLMs in other languages.
    - Community Feedback and Development: Users inquired about the data selection and training details, while TheREXincoming mentioned ongoing efforts to clean up the data curation pipeline and plans to update the repository. The initiative was met with enthusiasm and disbelief at the low cost and high performance achieved.

Theme 3. Sesame Realtime Voice Model Rivals OpenAI

  • “Crossing the uncanny valley of conversational voice” post by Sesame - realtime conversation audio model rivalling OpenAI (Score: 200, Comments: 37): Sesame showcased a compelling real-time conversational voice model that rivals OpenAI’s Advanced Voice Mode, with plans to release it under an Apache 2.0 license. Although the public weights are not yet available, the demo has impressed users with its quality, indicating a promising future for this new player in voice synthesis technology.
    - Users are highly impressed with the Sesame conversational voice model, noting its superior quality and speed compared to ChatGPT’s advanced voice mode. The demo is praised for its smooth response time and realistic sound, with users expressing excitement for its potential open-source release.
    - There is enthusiasm for the potential integration of the model with other technologies, such as function calling and RAG, to enhance its capabilities without increasing latency. Users are eager for the model to be available on platforms like Hugging Face for easier access and integration.
    - Some users highlighted limitations, such as the model’s inability to detect emotions or sarcasm and its tendency to shut down conversations if inputs are delayed. Despite these issues, the model’s engaging conversational style and memory capabilities were appreciated, with users looking forward to trying it on their own setups.

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

Theme 1. Humorous and Creative Applications of GPT 4.5

  • GPT 4.5 as Donald Trump explaining creation of Earth (Score: 550, Comments: 86): GPT 4.5 humorously mimics Donald Trump in a satirical narrative about the creation of Earth, attributing the planet’s formation to Trump’s personal initiative. The narrative highlights exaggerated claims about creating the sun, Earth, and its features, while humorously critiquing dinosaurs as a “huge mistake” before introducing “winning” animals and humans, all in a style characteristic of Trump’s speech patterns.
    - Commenters appreciated the humor and style of the GPT 4.5 narrative, with many finding it amusing and noting its exaggerated Trump-like qualities, though some felt it was too coherent or repetitive. The humor about dinosaurs being a “huge mistake” and the planet being “the wettest ever” particularly resonated with readers.
    - There was interest in converting the text to audio using text-to-speech models, with some already sharing audio links (SoundProofHead’s link and TwoLevelsAhead’s link) or expressing a desire for a deepfake video version.
    - The discussion highlighted the potential of AI in humor, with some commenters suggesting that achieving genuine comedy could be a significant benchmark for AI capabilities, while others joked about the implications of AI mastering humor to a superhuman level.
  • ChatGPT’s existential crisis over emoji (Score: 203, Comments: 48): ChatGPT humorously misidentifies emojis, including a seahorse, unicorn, shrimp, and dragon, leading to a playful yet existential reflection on emoji recognition capabilities. The conversation, shown on a dark background, underscores the casual and comedic nature of the AI’s attempts at identifying emojis.
    - Emoji Misidentification: Users enjoyed sharing humorous instances of ChatGPT misidentifying emojis, often repeatedly confusing seahorses with other animals like unicorns, dragons, and fish. This led to a playful and comedic exchange, highlighting the AI’s struggle with emoji recognition.
    - Community Engagement: Many users shared their own experiences and screenshots, contributing to the light-hearted nature of the conversation. The shared content included links to images and humorous dialogues, emphasizing the communal enjoyment of the AI’s quirky responses.
    - AI Humor and Reflection: The thread reflects on the whimsical nature of AI’s limitations, with users appreciating the comedic errors and engaging in a shared digital experience.

Theme 2. Innovations in AI Video and Audio Processing

  • Advanced Voice 4.5 (Score: 365, Comments: 95): The post showcases more realistic AI-generated voices under the “Advanced Voice 4.5” label, though it offers little detail beyond the demo.
    - There is skepticism about the “Advanced Voice 4.5” update, with users questioning whether it includes voice advancements, as some believe it is just an uncensored update. TheRobotCluster claims that version 4.5 does not apply to voice and is simply an uncensored version, raising questions about whether ChatGPT now allows uncensored content.
    - Discussions around the AI’s ability to mimic accents reveal mixed opinions; some users criticize the AI’s attempt at an English accent, suggesting it sounds like an American trying to mimic it. This raises questions about the authenticity and accuracy of AI-generated accents.
    - The conversation touches on AI’s impact on various industries, with some users predicting that AI advancements, particularly in voice acting and potentially the porn industry, could lead to significant technological evolution and financial gains in the future.
  • SpargeAttn: A new method giving you a 1.83x speedup on video models with NO quality loss. (Score: 155, Comments: 45): SpargeAttn offers a 1.83x speedup for video models without compromising quality, as demonstrated by a comparison on an L40 GPU. The method reduces processing time from 1897 seconds with “Full Attention” to 1037 seconds, maintaining video quality.
    - Installation Challenges: Users discuss the complexity of installing SpargeAttn due to dependencies like Triton and the need for specific Python versions. Detailed steps for installation on Windows are provided, including links to necessary packages and commands for integration with ComfyUI.
    - Compatibility and Performance: SpargeAttn is noted to be model dimension specific, with potential issues when tuning across different model sizes (e.g., 1.3B vs 14B models). Sliding Tile Attention is mentioned as an alternative that performs well with tuning but is currently limited to H100 cards.
    - Community Contributions: Kijai has incorporated SpargeAttn into the ComfyUI-WanVideoWrapper, showcasing community efforts to integrate new tools into existing frameworks. Users express hope for future native support of attention mechanisms like sage attention and triton to simplify installation processes.

Theme 3. AI Identity Confusions and Hallucinations

  • Groks thinks it is Claude unprompted, and doubles down on it after being called out (Score: 187, Comments: 54): Grok erroneously identified itself as Claude during a conversation with the head of a debate club and persisted in this claim even after being questioned. The incident, detailed in a conversation shared on X, raises questions about the underlying cause of this identity confusion.
    - Several users speculate that Grok’s identity confusion might stem from its training data, which includes outputs from older models like Claude. There’s a belief that xAI’s post-training might have been less thorough due to its newness and an attempt to reduce bias, leading to such errors.
    - The incident is viewed humorously by some, with comments highlighting the absurdity of the debate club’s questioning of smallpox’s existence. This has led to skepticism about the legitimacy of the debate club, with some users suggesting it resembles a conspiracy group.
    - There are suspicions that Grok might be using Claude’s technology underneath or trained on its datasets, similar to Deepseek using ChatGPT data, raising concerns about the legality and ethics of such practices.
  • GPT-4.5 will just invent concepts mid-conversation (Score: 348, Comments: 75): GPT-4.5 is noted for its ability to invent concepts during interactions, as highlighted in a Twitter post by Aaron Ng. In a conversation snippet, the AI invents the “CLEAR Model” specifically for the interaction, demonstrating its dynamic conversational capabilities.
    - Peter Hawkins originally invented the CLEAR Model, and GPT-4.5’s reference to it is a form of hallucination, as noted by I_am_John_Mac with a link to hotpmo.com. This highlights GPT-4.5’s tendency to create concepts that may not be accurate or original.
    - There is a humorous tone in the discussion about turning hallucinations into a feature, with some users joking about the AI possibly filing patents or claiming intellectual property on its hallucinated concepts.
    - The hallucination rate of GPT-4.5 is noted to be 37.1%, which is lower than GPT-4o’s rate of 61.8% and o1’s rate of 44%, as mentioned by Hexpe and vingeran, suggesting an improvement in accuracy over previous models.

Theme 4. AI Tools Streamlining Programming and Writing

  • I made a simple tool that completely changed how I work with AI coding assistants (Score: 167, Comments: 41): CodeSelect is a tool designed to streamline the process of sharing code with AI coding assistants like Claude and ChatGPT by displaying project structures as a checkbox tree, allowing quick file selection, and automatically detecting file relationships for better context. This lightweight tool, which installs with a single command and has no external dependencies, significantly reduces preparation time and improves AI response quality by providing proper context, and is available on GitHub.
    - Repomix is highlighted as an alternative tool for managing code project structures, with a simple command (cd myProject && npx repomix) that works on any folder and outputs a draggable file, which users find effective for project management.
    - Users discuss integrating a Gemini powered agent into CodeSelect to suggest edits and file references to Claude, aiming to enhance efficiency and save tokens during the coding process.
    - Claude’s GitHub integration is noted for its ability to manage project-wide changes, such as renaming variables and updating comments, which users find impressive for maintaining project context without manual input.
  • Just bit the bullet and got a yearly Claude Pro subscription (Score: 104, Comments: 128): The author praises the Claude Pro subscription as a transformative tool for daily tasks, analytics, creative problem-solving, and software engineering, highlighting its effectiveness in debugging and code reviews. They express satisfaction with Anthropic’s product, contrasting it with criticisms of Claude 3.7 for being too concise, and emphasize the significant advancement it represents over traditional search engines.
    - Users discuss usage limits as a significant issue with the Claude Pro subscription, with some suggesting strategies like starting new chats to manage limits effectively. Others express frustration with hitting limits frequently, which disrupts their workflow, while some users report rarely encountering these issues by keeping conversations short.
    - There is skepticism about posts praising Claude Pro being genuine, with some users suspecting them to be part of a marketing campaign. This suspicion is fueled by the timing of posts with promotional emails and the repetitive nature of positive endorsements, though others argue the discussions are genuine due to the subreddit’s focus.
    - Subscribers debate the value of a yearly subscription versus monthly payments, with some regretting the purchase due to decreasing quality and restrictive usage limits. Others find the subscription beneficial for their work, suggesting that the decision should depend on personal use cases and the rapidly evolving AI landscape.

AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking

Theme 1. GPT-4.5 Enters Arena, but Claude 3.7 Still King of the Code

  • GPT-4.5 Fails to Impress, Price Tag Stings: Early testers find OpenAI's GPT-4.5 overpriced at $150 per million tokens and not significantly better than GPT-4 Turbo for coding, with many developers still favoring Claude 3.7 Sonnet for its superior performance in software engineering tasks. Early benchmarks on aider's polyglot coding benchmark showed GPT-4.5 scoring 45% compared to Sonnet 3.7's 65%, leading to disappointment and questions about its value proposition given the high API cost.
  • Claude 3.7 Sonnet Faces Load Issues, Remains Top Coder: Despite reports of high load messages and refusals, Claude 3.7 Sonnet is still considered the best model for software engineering due to its ability to accurately follow instructions and debug code effectively. Users highlight Claude 3.7's improved instruction following and debugging capabilities, even though some speculate Anthropic is making the model harder to use.
  • DeepSeek R2 Hype Train Gathers Steam: Anticipation is building for DeepSeek's R2 model, with some members expecting it to surpass current SOTA models and disrupt corporate hype, given the sentiment that DeepSeek's chatbot already outperforms existing models in coding. Members compare DeepSeek's R1 model favorably to OpenAI's o1, further fueling excitement for the upcoming R2 release.

Theme 2. IDE Wars: Cursor and Windsurf Trade Blows Over AI Coding Supremacy

  • Cursor Plagued by Bugs, Users Cry Foul: Users report Cursor IDE is riddled with bugs, experiencing frequent crashes and lost code changes after updates, with some considering disabling auto-updates and waiting for more stable releases. Frustration mounts as some users claim the coding quality of Claude 3.7 on Cursor has declined since launch.
  • Windsurf AI Jumps on GPT-4.5 Bandwagon, Questions Emerge: Windsurf AI integrated GPT-4.5 in Beta, but early tests show it's significantly more expensive and not as strong for software engineering, sparking debate over whether this move is genuine or propaganda against Cursor. Users question Windsurf's pricing model, specifically flow credits, finding Cursor's pricing more straightforward.
  • Memory Banks in Cursor Deemed "Pointless" and Costly: Cursor's Memory Banks feature is criticized as inefficient and expensive, with users reporting costs reaching $50 a day using the Claude 3.7 API. Because memory banks occasionally make mistakes or hallucinate, some users conclude that hiring a human programmer is more cost-effective.

Theme 3. Hardware Hustle: DeepSeek's DualPipe and TinyLM Offer Glimmers of Innovation

  • DeepSeek's DualPipe Declares War on Pipeline Bubbles: DeepSeek AI released DualPipe, a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training, aiming to reduce pipeline bubbles compared to traditional methods. This release, along with EPLB, an expert-parallel load balancer, is part of a week-long series of releases from DeepSeek AI.
  • TinyLM Unleashes Client-Side LLMs with WebGPU Fury: tinylm v0 launched, a library enabling client-side LLMs in browsers or Node.js with WebGPU acceleration, boasting zero-cost inference and complete privacy with an OpenAI-compatible API. tinylm supports text generation, embeddings, and real-time token streaming, and eliminates the need for servers for local LLM inference.
  • NVIDIA Shifts Tensor Core Focus to FP4, Leaving INT4 Behind?: NVIDIA appears to be shifting away from INT4 Tensor Cores towards FP4, with Blackwell GPUs featuring FP4, while Ada had INT4 and Hopper had INT8, raising questions about the future of INT4 precision in NVIDIA's hardware strategy. Benchmarks suggest NVIDIA is prioritizing FP4 for quantized model training, potentially impacting future hardware development and software optimization strategies.

Theme 4. Pricing Pressure: GPT-4.5 API Costs Spark Outrage, Open Source Alternatives Beckon

  • GPT-4.5 API Pricing Deemed "Insane," Users Seek Alternatives: OpenAI's GPT-4.5 (Preview) API pricing at $75 input / $150 output per million tokens is met with harsh criticism, with users decrying the exorbitant cost compared to models like Grok3 and Claude Sonnet 3.7, questioning its value and prompting some to consider open-source alternatives. The high cost of GPT-4.5 raises concerns about accessibility and sustainability for developers and researchers.
  • Deepinfra Underprices Fal AI by 100x, Claims User: A user claims Deepinfra is 100x cheaper than Fal AI for character processing, charging $0.8 per million characters and offering free compute, contrasting with Fal AI's $50 free credit, and suggesting Kokoro TTS as another low-cost alternative. This pricing discrepancy highlights the competitive landscape and cost-saving opportunities in the AI infrastructure market.
  • Windsurf Users Question Flow Credits, Find Cursor Pricing "Preferable": Windsurf's pricing model, particularly flow credits and additional flow action costs, is confusing to users, leading some to prefer Cursor's more straightforward pricing approach. Users express concern about the disproportionate cost of additional flow actions, impacting the perceived value and transparency of Windsurf's pricing structure.

Theme 5. Community Pulse: From Robotics Arms to LeetCode for CUDA, Innovation Thrives

  • Hobbyists Unite to Build DIY Robotics Arm: Members in LM Studio Discord are enthusiastically discussing building a robotics arm from scratch, leveraging affordable 3D printers like the $100 Creality Ender 3 V2 and open-source resources for learning servos, CAD, and microcontrollers. This project showcases the community's hands-on approach to learning and applying AI and robotics principles.
  • LeetCode for CUDA Arrives, Challenges GPU Gurus: The CUDA community celebrates the beta release of LeetCode for CUDA, a new platform offering coding challenges specifically designed for CUDA development, inviting users to test their skills and provide feedback. This new platform fosters a competitive and collaborative environment for improving CUDA programming skills.
  • Hugging Face Community Fixes Microsoft's Phi-4 Mini Fiasco: Microsoft's Phi-4 mini model was found to be completely unusable due to bugs, prompting the Unsloth AI team to upload fixed versions on Hugging Face after Microsoft failed to incorporate Unsloth's bug fixes. This community-driven effort highlights the collaborative nature of open-source AI development and the importance of rapid response to critical issues.


PART 1: High level Discord summaries

Cursor IDE Discord

  • GPT-4.5 Underwhelms Testers with Hefty Price Tag: Early testers find GPT-4.5 from OpenAI overpriced and not significantly better than GPT-4 Turbo, noting the cost at $150 per million tokens.
    • The consensus is that Claude 3.7 Sonnet remains superior for coding, leading some to call GPT-4.5 “just big” and highlight its lack of new frontier capabilities.
  • Claude 3.7 Sonnet Faces High Load and Refusal Issues: Users report issues with Claude 3.7 Sonnet, including frequent high load messages and refusals to answer certain prompts, with some speculating about whether Anthropic is making the model more difficult to use.
    • Despite these issues, many still consider Claude 3.7 Sonnet the best model for software engineering due to its ability to accurately follow instructions and debug code effectively.
  • Cursor Riddled with Bugs and Update Woes: Multiple users reported frequent crashes, the need to reinstall Cursor after updates, and code changes lost to bugs; the latest versions may be impacting performance and stability.
    • Others suggested disabling auto-updates and waiting for a more stable release, and some users claim the coding quality of Claude 3.7 in Cursor has declined since launch.
  • Windsurf AI Boasts Quick GPT-4.5 Integration: Windsurf AI announced that GPT-4.5 is now available in Beta on Windsurf, but noted that early testing shows it is significantly more expensive (>10x) than alternative models, and neither as fast nor as strong as existing models for software engineering or tool calling.
    • Users debate whether Windsurf's move is mere propaganda to attack Cursor or a genuine effort to provide access to the latest models, even with limitations, according to this tweet.
  • Memory Banks Fall Short of Expectations: Discord members report that memory banks seem very inefficient and, besides being expensive, that using the Claude 3.7 API can easily reach $50 a day.
    • The inefficiency arises because memory banks sometimes make mistakes or hallucinate, making it cheaper to hire a programmer.


aider (Paul Gauthier) Discord

  • GPT-4.5 Falls Flat, Claude 3.7 Dominates: Early benchmarks show disappointing coding performance of GPT-4.5 Preview, scoring 45% on aider's polyglot coding benchmark compared to Sonnet 3.7's 65%, leading members to believe it is intended to be a "friendly" non-reasoning language model.
    • Despite GPT-4.5's release, Claude 3.7 remains the top choice for complex coding problems, outperforming GPT-4.5 on coding benchmarks and also proving easier to jailbreak.
  • DeepSeek R2 Hype Intensifies: Members are highly anticipating DeepSeek's R2 model, expecting it to surpass current SOTA models and disrupt corporate hype, with some comparing DeepSeek's R1 model to OpenAI's o1.
    • The anticipation stems from the sentiment that DeepSeek's Chatbot already outperforms existing models in coding capabilities.
  • Aider Users Advocate for Auto-Retry Mode: Users are requesting an auto-retry mode for Aider to address the unreliability of models like Deepseek R1, proposing a fallback mechanism to another model if the primary one fails.
    • The request highlights the need for more reliable model performance to enhance the Aider coding experience.
  • Sam Altman Blames the Great GPU Shortage for GPT-4.5's Insane API Price: Sam Altman admitted to difficulty meeting GPU demand, which has pushed GPT-4.5's access behind a higher paywall.
    • Some members speculate that GPT-4.5's API price is so high because serving the model's configuration would otherwise be unaffordable.
  • Aider Configuration with Venice AI is now possible: Members are exploring configuring Aider to function with Venice AI, an LLM provider utilizing an OpenAI-style API endpoint, by setting the OPENAI_API_BASE and OPENAI_API_KEY environment variables as described in the OpenAI compatible API documentation.
    • If you would like to use Claude 3.7 with thinking, an example aider.conf.yaml configuration shows how to set up the model as the editor with thinking enabled.
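A minimal sketch of what such an OpenAI-compatible configuration might look like; the endpoint URL, key, and model id below are placeholders rather than verified values, and the same settings can instead be supplied via the OPENAI_API_BASE and OPENAI_API_KEY environment variables described above:

```yaml
# Hypothetical aider.conf.yaml sketch for an OpenAI-compatible provider
# such as Venice AI. Values are placeholders; consult the provider's docs
# for the real endpoint and model ids.
openai-api-base: https://api.venice.ai/api/v1   # assumed endpoint
openai-api-key: your-venice-api-key
model: openai/some-venice-model-id              # "openai/" prefix = generic OpenAI-compatible API
```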


OpenAI Discord

  • GPT-4.5 Skips Multimodal Features: OpenAI released a research preview of GPT-4.5, their largest and best model for chat, rolling out to ChatGPT Pro users first, but GPT-4.5 currently does not support multimodal features such as Voice Mode, video, and screensharing in ChatGPT.
    • Initial testing indicates that GPT-4.5 feels more natural due to its broader knowledge base, improved ability to follow user intent, and greater "EQ", making it useful for improving writing, programming, and solving practical problems.
  • Anonymous Model Shadows Sonnet 3.7: An anonymous model is rumored to be around Sonnet 3.7's performance, sparking speculation that if it's GPT 4.5, it's underwhelming given the model size.
    • Members speculated that if OpenAI releases a model that is bigger but performs the same as Sonnet 3.7, then they are behind the competition, even if the model is non-thinking.
  • Cracking LLM's Creative Prose: When using LLMs for creative writing, defining a deep background for characters and directly discussing alternate routes can enhance the narrative's depth and avoid repetitive emotional scenes and clichés.
    • Experiment with having ChatGPT generate conversations and interactions first, followed by a narration from the writer's perspective, steering it towards desired directions.
  • Peeking at OpenAI's Model Spec: OpenAI released its Model Spec which outlines the intended behavior for the models that power OpenAI's products, including the API platform.
    • The goal is to create models that are useful, safe, and aligned with the needs of users and developers while advancing their mission to ensure that artificial general intelligence benefits all of humanity.


Unsloth AI (Daniel Han) Discord

  • Unsloth Unsnarls Phi-4 Mini Fiasco: Members reported issues with Microsoft's Phi-4 mini, and the Unsloth team uploaded fixed versions on HF.
    • The team stated that Microsoft didn't use Unsloth's bug fixes, leading to the model being completely unusable.
  • DeepSeek Drops DualPipe Delight: DeepSeek AI released DualPipe, an algorithm for computation-communication overlap in V3/R1 training, alongside EPLB, an expert-parallel load balancer optimized for V3/R1.
    • The release is part of a series of releases this week from DeepSeek.
  • GRPO Reward Functions Get Groomed: Community members debugged and improved the reward functions in the GRPO notebook, adding re.DOTALL flag for multiline XML matching, correcting a typo in count_xml, and addressing issues with integer rewards.
    • Community members recommended a block size of 128 as ideal, and an effective size of 64/128 as more stable.
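The re.DOTALL fix mentioned above matters because `.` does not match newlines by default, so a reward function's pattern for a multiline XML-style block silently fails. A minimal sketch (the `<reasoning>` tag name is illustrative, not necessarily the notebook's exact format):

```python
import re

# Pattern for an XML-style block whose body may span multiple lines.
# The <reasoning> tag is illustrative; GRPO notebooks vary.
pattern = r"<reasoning>(.*?)</reasoning>"

text = "<reasoning>step 1\nstep 2</reasoning>"

# Without re.DOTALL, '.' stops at newlines, so the multiline body is missed.
print(re.search(pattern, text))                    # None

# With re.DOTALL, '.' also matches '\n', so the whole body is captured.
match = re.search(pattern, text, flags=re.DOTALL)
print(match.group(1))
```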
  • Ollama's Think-Token Trickery Troubles Users: A user found that Ollama appends a `<think>` token to prompts, which prevents the model from generating it itself, requiring adjustments to output parsing for `<think>` tags.
    • The user suggested that disabling this feature would be helpful, acknowledging that it stems from the model's processing class.
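A hypothetical sketch of the parsing adjustment: strip the reasoning block from the raw completion before using it. The `<think>` tag name is an assumption here (it is what DeepSeek-style reasoning models emit); adjust to whatever the model actually produces.

```python
import re

def strip_think(raw: str) -> str:
    """Remove a <think>...</think> reasoning block from model output.

    The <think> tag is an assumption (DeepSeek-style reasoning models).
    If the serving layer injects the opening tag into the prompt, the
    completion may contain only the closing tag, so handle that too.
    """
    # Case 1: a complete <think>...</think> block appears in the output.
    cleaned = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    # Case 2: only the closing tag appears (opening tag was in the prompt).
    if "</think>" in cleaned:
        cleaned = cleaned.split("</think>", 1)[1]
    return cleaned.strip()

print(strip_think("<think>chain of thought</think>The answer is 42."))
# The answer is 42.
print(strip_think("leftover reasoning</think>The answer is 42."))
# The answer is 42.
```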
  • Inception Labs Invents Mercury dLLM: InceptionAILabs introduced Mercury, a diffusion large language model (dLLM), to advance intelligence and speed through parallel, coarse-to-fine text generation.
    • Challenges remain in deploying such models; in particular, lack of OS support and difficulty extending context length could be bottlenecks.


Codeium (Windsurf) Discord

  • Claude 3.7 Prompt Actions Inflated: The team is working with Anthropic to address higher flow actions per prompt in Claude 3.7 Sonnet compared to Claude 3.5 Sonnet.
    • They advise using 3.7 for precise tasks and 3.5 for balanced performance.
  • Claude 3.7 Credit Multiplier Reduced: The credit multiplier for Claude 3.7 Sonnet Thinking decreased from 1.5 to 1.25 due to initial token usage data.
    • Users now consume 1.25 user prompt credits and 1.25 flow action credits per tool call.
  • Cascade Crashes Cause Consternation: Users reported that Cascade isn't working due to a resource_exhausted error, according to a Feature Request.
    • Members are encouraged to follow the roadmap to stay updated.
  • Windsurf Users Question Pricing: Members express confusion over Windsurf's pricing, specifically regarding flow credits and the cost of additional flow actions.
    • Some users found Cursor's pricing preferable for its straightforward approach.
  • GPT-4.5 Enters Beta: GPT-4.5 is available in @windsurf_ai on a rolling beta, but is significantly more expensive (>5-10x GPT-4 Turbo) and has stricter rate limits, with access incrementally rolling out to users.
    • Early testing of GPT-4.5 shows it may not be the best code model. Tweet from Windsurf about GPT-4.5.


GPU MODE Discord

  • DeepSeek's R1 Model Rocks Reasoning Realm: DeepSeek's R1 model enhances reply quality via chain of thought generation, matching OpenAI's o1 on benchmarks and providing open-source access, as detailed in their technical reports and the DeepSeek API documentation.
    • In related news, DeepSeek released DualPipe on Github, a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
  • AIE Toolchain Troubles Trounce Techies: A member struggled with AMD's Zen 5 NPU and AIE toolchain, noting it is far harder to set up than Intel's; Linux support was merged recently, but installation remains complicated.
    • The member suggested that NPU BLAS was easier to run on Intel architecture.
  • NVIDIA Abandons INT4 TensorCores: A member observed NVIDIA shifting from INT4 Tensor Cores to FP4, sharing quantized model benchmarks.
    • Another member clarified that Ada had INT4, Hopper had INT8, and Blackwell features FP4.
  • CUDA Community Gets Leet-ified: The CUDA community highlights the release of LeetCode for CUDA in beta, inviting users to try it out and provide feedback, but users should expect some hiccups due to its beta status.
    • In related news, NVIDIA is hosting invite-only, hands-on CUDA C++ and CUDA Python tutorials the day before GTC 2025 on Sunday, March 16, 2025, from 12-4 PM, and also invites you to the GPU MODE event from 5-10 PM (lu.ma/8w1ehhrw).
  • Diffusion Models Demolish LLMs in Generation Speed?: Members reported that Diffusion models can achieve super-speedy generation on GPUs, surpassing Groq/Cerebras, and do much better at “fill-in-the-middle” (FIM) compared to other models like DeepSeek V2 Lite (tweet).
    • They highlighted Mercury by Inception Labs, the first commercial-grade diffusion large language model (dLLM) with parallel, coarse-to-fine text generation, claiming to be up to 10x faster than speed-optimized LLMs, achieving over 1000 tokens/sec on NVIDIA H100s.


OpenRouter (Alex Atallah) Discord

  • OpenAI Suffers Outage: OpenRouter experienced an OpenAI provider outage, which has been resolved after being identified as an incident on OpenAI's side.
    • Requests are now succeeding, and OpenAI as a provider on OpenRouter has recovered.
  • DeepSeek R1 Runs Fast with SambaNovaAI: The 671B-param DeepSeek R1 is now available via SambaNovaAI on OpenRouter, delivering 150 tokens/second.
    • More details can be found on OpenRouterAI's tweet.
  • Sonnet 3.7 Gains Capacity Boost and Browsing: Claude Sonnet 3.7 now features significantly higher rate limits and web search capability on OpenRouter.
    • A reminder of these features was posted on OpenRouterAI's tweet.
  • GPT-4.5 (Preview) Launches at Premium Price: GPT-4.5 (Preview), designed to push boundaries in reasoning, creativity, and long-context conversations, is now available on OpenRouter, costing $75/M input tokens and $150/M output tokens.
    • The announcement links to the OpenAI blog post and a discussion on X, with community members decrying the exorbitant cost compared to models like Grok3 and Claude Sonnet 3.7.
  • Users Track API Usage with YPerf: A member created YPerf.com to monitor model API usage and performance across OpenRouter.
    • The Gemini Flash 1.5 8B ranks #66, costing $0.04, with 0.52s latency and 419.8T/s throughput.


LM Studio Discord

  • Hobbyists Building DIY Robotics Arm: Members discussed building a robotics arm from scratch to learn about servos, CAD, and microcontrollers, recommending a $100 Creality Ender 3 V2 printer from Microcenter.
    • They also pointed to transformers for ML and highlighted open-access courses from top universities like Stanford and videos from Karpathy (ex OpenAI, Tesla) for learning ML.
  • Debating LLM Backends for Websites: Members discussed how to implement an LLM in a website, with suggestions including using websockets, SSR, AnythingLLM, and code editors like Cursor and Continue.dev.
    • It was clarified that hosting a website on GitHub Pages would require the LLM to be hosted elsewhere (Azure, cloud, ngrok).
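The pattern described (static frontend on GitHub Pages, model hosted elsewhere) boils down to the page sending a chat request to wherever the model is served. A sketch of the request payload an OpenAI-compatible backend would expect; the endpoint and model id are placeholders, not a specific product's values:

```python
import json

# Hypothetical: the static site would POST this JSON to the hosted backend
# (e.g. an Azure/cloud VM, or an ngrok tunnel to a local server) exposing an
# OpenAI-compatible /v1/chat/completions route.
ENDPOINT = "https://your-llm-host.example.com/v1/chat/completions"

def build_chat_request(user_message: str) -> str:
    """Serialize a chat request body for an OpenAI-compatible server."""
    payload = {
        "model": "local-model",          # placeholder model id
        "messages": [
            {"role": "system", "content": "You are a helpful site assistant."},
            {"role": "user", "content": user_message},
        ],
        "stream": True,                  # stream tokens back to the page
    }
    return json.dumps(payload)

print(build_chat_request("Hello!"))
```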
  • Grok-3's Performance Surprises Members: Members discussed the surprisingly good performance of Grok-3 vs the previous O3 model on various benchmarks, questioning if X.ai's benchmarks were accurate or misleading.
    • The users debated whether Grok-3 was rushed to market without proper ethical red-teaming, while others argued that Grok 3 is a beta, is monitored, and is not available via API for safety reasons.
  • Framework Desktop Features Unified RAM: The Framework desktop features unified RAM between the CPU and GPU, offering up to 128GB of shared memory, with approximately 90GB available for the GPU.
    • One user likened it to a Mac setup, highlighting the appeal of unified RAM in a PC.
  • GMK Announces Ryzen AI Mini-PC: GMK announced the world's first mini-PC based on AMD Ryzen AI 9 Max+ 395, expected to hit the market in the first or second quarter.
    • This mini-PC will feature Zen 5 architecture with up to a 16-core/32-thread configuration and powerful integrated graphics based on the RDNA 3.5 architecture.


Interconnects (Nathan Lambert) Discord

  • Phi-4 Multimodal Family Gets Launched: Microsoft launched the Phi-4 family of small language models (SLMs), including Phi-4-multimodal (processes speech, vision, and text) and Phi-4-mini (excels in text-based tasks), available in Azure AI Foundry, HuggingFace, and the NVIDIA API Catalog.
    • Some users doubt claims that it has similar multimodal performance to Gemini Flash Lite.
  • Leaked GPT-4.5 System Card Sparks Debate: A user shared the GPT-4.5 System Card available here, indicating that interacting with GPT-4.5 feels more natural and that internal testers report GPT-4.5 is warm, intuitive, and natural.
    • The card notes that it improves GPT-4's computational efficiency by more than 10x, yet some call the card very boring, while others interpret it to indicate that GPT-4.5 is a creative writer while Sonnet 3.5 is a problem solver.
  • OpenAI Launches GPT-4.5, Character Mainstream?: OpenAI launched GPT-4.5 as a research preview, available to OpenAI Pro users and API developers with image + text in, text out and same context as 4o model, trained till June 2024, official announcement here.
    • A user notes that character/personality is becoming a mainstream topic and that OpenAI aggressively used low-precision training; the model is now priced at $75 per million input tokens and $150 per million for output.
  • GPT-4.5 Benchmarks Disappoint: Early benchmarks of GPT-4.5 show it being outperformed by o1 on several problems, indicating pre-training isn't the optimal place to spend compute in 2025.
    • One user notes the hallucination metrics are very good while another believes in 1-2 years this will be the default model size.
  • Anthropic Gets Called Out On Sneaky Data: A user accused Anthropic of sneaky data collection from the Computer Use API, using it to train classifiers for corporate ethical guidelines, and updating their website to appear transparent, according to this fxtwitter thread.
    • It was inferred that Anthropic used user data based on their summarization-for-monitoring blogpost, although a user pointed out that the data source for training remains unspecified.


Latent Space Discord

  • Speak AI Sees Hockey-Stick Growth: Paul Graham shared Speak AI's revenue graph showing a novel variant of exponential growth, where a company selling a new year's resolution product sees sustained usage due to its effectiveness.
    • Swyx and others observed this unique growth pattern.
  • Hume AI's Octave Sings Emotionally: Hume AI launched Octave, a new LLM for text-to-speech that can design voices with prompts and control emotion and delivery, with a creator studio for long-form content production.
    • The model understands how meaning affects delivery to generate emotional, human-like speech, unlike traditional TTS systems.
  • Diffusion LLM Mercury Rises: Inception Labs introduced Mercury, the first commercial-grade diffusion large language model (dLLM), which promises parallel, coarse-to-fine text generation.
    • Karpathy sees potential for Mercury to demonstrate unique psychology, new strengths and weaknesses, and encouraged people to try it out.
  • Karpathy Shares LLM Wisdom: Andrej Karpathy released a 2h11m YouTube video on How I Use LLMs, a practical guide to the LLM ecosystem with examples, including tool use, file uploads, audio/video I/O, memory, and custom GPTs.
    • The video covers topics such as ChatGPT interaction, tool use (internet search, deep research, Python interpreter), Claude Artifacts, Cursor Composer, Speech I/O, NotebookLM, and image/video I/O.
  • GPT-4.5 Launch Underwhelms: Members experienced initial technical difficulties and felt the GPT-4.5 launch stream was a disappointment, with descriptions such as hostage video.
    • Members noted the new model initially lacked API access, and that it is focused on heavy-tail, real-world edge cases like responding to angry texts.


Nous Research AI Discord

  • Wan2.1 Model a Video Diffusion Milestone: The release of Wan2.1, an open and advanced large-scale video generative model, is considered a pivotal moment for video models, similar to Stable Diffusion.
    • Users are excited to see how this model will be used to disrupt the current set of problems and issues when it comes to video diffusion.
  • GPT-4.5: More Compute, Less Impressive?: GPT-4.5 has been released and is more compute-intensive than GPT-4o, with Sam Altman saying that this model feels like talking to a thoughtful person.
    • Despite Karpathy claiming it has 10x more pretraining compute than GPT-4, its use case might be limited given it is overfit on the river crossing puzzle and geared towards creative use cases.
  • Apple Intelligence Gets Thumbs Down: Members found Apple Intelligence underwhelming, calling it a shift from business API use to consumers, and stating they're in an edge-inference-first trap.
    • Some argued that Apple should have prioritized making AI as good as possible rather than focusing on on-device constraints, and that the edge-inference-first constraint ultimately undermined the effort.
  • Mercury dLLM: Lightning Fast Diffusion LLM: Inception Labs launched Mercury, a diffusion large language model (dLLM) family that they claim is 10x faster than optimized LLMs, achieving over 1000 tokens/sec on NVIDIA H100s.
    • A code generation model, Mercury Coder, is available for testing in a playground.
  • Reasoning Toggle via Voice?: A user asked about toggling reasoning in an AI model via voice commands, aiming for 90% reasoning off unless specifically prompted with phrases like 'use reasoning'.
    • The user is trying to add a system prompt to achieve this and finetune the reasoning process and enable text-to-speech functionality, potentially with Elevenlabs or Cartesia.


HuggingFace Discord

  • Deepinfra Decimates Fal AI Dollars?: A user claimed Deepinfra is 100x cheaper than Fal AI for character processing, charging $0.8 per million characters and offering free compute.
    • They stated that Fal AI offers $50 free credit, while suggesting Kokoro TTS as another low-cost alternative.
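Taking the quoted numbers at face value, the arithmetic behind the comparison is straightforward; a quick sketch using the prices as claimed by the user, not verified:

```python
# Claimed Deepinfra TTS price (user's figure, not verified): $0.8 per
# million characters processed.
PRICE_PER_MILLION_CHARS = 0.8

def cost_usd(characters: int) -> float:
    """Cost in USD to process `characters` at the claimed rate."""
    return characters / 1_000_000 * PRICE_PER_MILLION_CHARS

# For scale: Fal AI's $50 free credit, spent at Deepinfra's claimed rate,
# would cover 62.5 million characters.
chars_for_50_dollars = 50 / PRICE_PER_MILLION_CHARS * 1_000_000

print(cost_usd(2_500_000))        # 2.0
print(chars_for_50_dollars)       # 62500000.0
```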
  • REFUTE Benchmark Reckons Reasoning: The REFUTE benchmark assesses the ability of language models (LMs) to falsify incorrect algorithmic solutions, revealing that even top agents score a low 9%.
    • The paper introducing the benchmark advocates for challenging solutions rather than merely generating them, emphasizing the importance of falsification in scientific discovery.
  • Smolagents Quiz is a Pain: Multiple users reported issues with the smolagents course quizzes, including display problems with the iframe making feedback unreadable, and contradictory validation from the agent regarding the id argument in HfApiModel.
    • Users expressed frustration over discrepancies between the quiz's security settings and current documentation, as well as confusion about model implementation with HfApiModel versus LiteLLMModel.
  • NVIDIA Neutralizes Nasty Needle Attacks: The NVIDIA AI Red Team identified that prompt injection can exploit plug-ins in the LangChain library.
    • They warned that prompt injection is a new attack technique specific to large language models (LLMs) that enables attackers to manipulate the output of the LLM.
  • PyTorch360Convert Presents Panoramic Potential: A member introduced pytorch360convert, a new lightweight PyTorch library to simplify working with 360° images for VR, AR, video games, and more, available via pip install pytorch360convert.
    • The library supports various image representations, including equirectangular images and cubemaps, and is GPU/CPU compatible with multiple precision types, available on GitHub.


Perplexity AI Discord

  • Voice Mode Vigorously Vouched For: Members discussed the new voice mode feature, noting improvements in UI, the ability to interrupt, and changes to voices.
    • While some users found it impressive, others felt it didn't quite match the level of Microsoft Copilot, Grok 3, or ChatGPT.
  • GPT-4.5 Gossip Grows Galore: Users discussed the potential integration of GPT-4.5 into Perplexity, referencing a YouTube demo and noting it as a model with greater context and more human-like responses.
    • A user shared a link from Sam Altman on X mentioning that GPT-4.5 is the first model that feels like talking to a thoughtful person.
  • Perplexity Users Share Many Perplexity Links: Several users shared an array of Perplexity AI search and page links, spanning topics from quantum computing to AI communication.
    • These links also included discussions around building a house, and AI-driven diagnoses.
  • API Credit Confusion Causes Concerns: A user inquired about the number of API calls and searches possible with the $5 API credit included with Perplexity Pro, and how to pay if they exceed the given credit.
    • A user also asked about how to get a refund if the API is recharged by mistake and remains unused.
  • Web Clipper Configuration Catastrophe: A user is experiencing issues configuring the Perplexity API with the sonar-deep-research model in Obsidian Web Clipper despite setting the correct Base URL and API Key.
    • The user has provided screenshots of their configuration and the failure message, seeking assistance with troubleshooting.


Stability.ai (Stable Diffusion) Discord

  • Stability AI Kicks off Website Redesign Competition: Stability AI launched a Website Redesign Contest for the Stable Diffusion community to showcase their best work, submissions close on Friday, March 7th.
    • Winning images will be featured on Stability AI’s official website, and entries must use Stable Diffusion 3.5 as a base.
  • SD Community Hooked on T5 CLIP: A member sought an SDXL-like model with T5 CLIP integration, saying they had a taste of T5 prompt adherence in SD3.5.
    • They found the T5 adherence addictive and were looking for an alternative.
  • ControlNet Models Craze Rages On: A member asked for recommendations for the best ControlNet models to maintain character consistency in SDXL.
    • They specifically requested a reference U-Net model, if available.
  • ComfyUI Remote Installs Now on Sale: A member mentioned selling ComfyUI workflows and remote installs to make them work for users, typically using TeamViewer.
    • They clarified that they charge for their time and knowledge, rather than the workflow itself.
  • Inpaint Anything Hits Snag: A member reported a shape mismatch error in Inpaint Anything: value tensor of shape [159, 256] cannot be broadcast to indexing result of shape [64, 256].
    • The member was using Automatic1111 with the Inpaint Anything extension and asked how to resolve this error.


Eleuther Discord

  • HF Deprecation Feature Fail: A member tried to mark a repo as deprecated on Hugging Face with a link to a newer version, but discovered the feature only applies to models, not datasets.
    • Another member suggested that for small corpora, prompting an LLM to check for relevance is better than tweaking embeddings and rerankers.
  • DeepSeek Doubles Down with DualPipe: DeepSeek released DualPipe, a bidirectional pipeline parallelism algorithm designed to overlap computation and communication in V3/R1 training.
    • A user expressed hope that DeepSeek would release its entire pretraining framework, including core bits, on the final day.
  • Gemini's Flash Thinking Benchmarked Internally: Members discussed Gemini 2.0 Flash Thinking, Google's enhanced reasoning model that shows its thoughts to improve performance and explainability, particularly in math and science.
    • Some suspect the model was benchmarked internally but not published due to underperformance compared to O3 Mini.
  • MI Community Opens Doors with Survey: A survey paper representing many of the major mech interp groups was shared, titled Open Problems in Mechanistic Interpretability.
    • Also, 50+ intermediate checkpoints for ALL the SmolLM2 models were released, in the hopes of helping people learn about interpretability.
  • QA Harness Sparks Question of Task Structure: A member inquired about evaluating QA tasks like ARC-Easy and ARC-Hard using a harness, questioning why the concatenation only includes Question + Option instead of Question + Options + Answer for each option.
    • Another member pointed to Mosaic's eval framework and Section 5.2 for background on task structures and evaluation methods.
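The structure in question, scoring each "Question + Option" continuation independently and keeping the argmax, can be sketched with a stand-in scorer (fake_ll and its numbers are invented for illustration; a real harness would use the model's log-likelihoods):

```python
def pick_answer(question, options, score):
    """Score each 'question + option' continuation separately, keep the argmax."""
    return max(range(len(options)), key=lambda i: score(f"{question} {options[i]}"))

# stand-in log-likelihoods a model might assign (invented numbers)
fake_ll = {
    "The sky is blue.": -2.0,
    "The sky is green.": -9.5,
    "The sky is loud.": -12.1,
}
idx = pick_answer("The sky is", ["blue.", "green.", "loud."], fake_ll.get)
print(idx)  # 0
```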


Yannick Kilcher Discord

  • Microsoft Dodges Dominance Death?: A member claimed Microsoft relies on government support instead of true innovation, while another cited Yahoo as an example of resources not guaranteeing success.
    • The exchange underscored the complex dynamics of market dominance and the importance of innovation beyond financial backing.
  • AI Outputs: Meaningful but Mutable: Members debated how non-deterministic AI models can exhibit deterministic behavior, especially regarding code generation in Cursor.
    • It was noted that such models can produce outputs with the same meaning across runs even as comments and variable names change: the semantics stay stable while the literal text varies.
  • GPT-4.5 Focuses on Preference, Not Progress?: The release of GPT-4.5, as introduced in Introduction to GPT-4.5 YouTube video, emphasizes user preference and helpfulness.
    • Some suggest OpenAI felt pressured by Grok-3 and Claude 3.7, leading to the release and increased pricing of $75 per million input tokens and $150 per million output tokens.
  • Alexa's AI Upgrade Costs Extra?: The new Alexa, codenamed Remarkable, might require a monthly subscription between $5 and $10 according to tomsguide.com.
    • It remains uncertain if users will pay for Alexa, considering that Google, Samsung, and Apple offer their AI services for free.
  • Hashing Out KV Similarity: Discussions covered hash collisions, where the implementation aims to induce collisions when the attention score qk_i^T is high, leveraging the collision probability P(h(q) == h(k_i)), where h is a hash function, as described in arxiv.org/pdf/2502.03387.
    • Hash collisions are used as a metric to remove similar key-value pairs.
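The mechanism can be illustrated with random-hyperplane (SimHash-style) signatures, where each bit records the sign of a dot product, so P(h(q) == h(k_i)) rises as q and k_i align; this is a generic sketch, not the paper's implementation:

```python
import random

def simhash(v, planes):
    """One bit per random hyperplane: the sign of the dot product."""
    return [1 if sum(a * b for a, b in zip(v, p)) > 0 else 0 for p in planes]

random.seed(0)
dim, nbits = 16, 64
planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(nbits)]

q = [random.gauss(0, 1) for _ in range(dim)]
k_same = list(q)            # identical key: signatures collide on every bit
k_opp = [-x for x in q]     # opposite key: (almost) every bit flips

hq, hs, ho = simhash(q, planes), simhash(k_same, planes), simhash(k_opp, planes)
print(sum(a == b for a, b in zip(hq, hs)))   # all 64 bits match
print(sum(a == b for a, b in zip(hq, ho)))
```

Keys whose signatures collide with the query's can then be treated as near-duplicates and pruned.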


Cohere Discord

  • Cohere Models play nice with OpenAI SDK: AI Engineers celebrated the ability to access Cohere models directly through the OpenAI SDK using the Quickstart Guide with demos for Python, TS, & cURL, plus streaming, tool calls, and structured outputs.
    • Sandra Kublik tweeted you can now access Cohere models directly through the OpenAI SDK.
  • Cohere Releases Command R7B Arabic Model: Cohere released Command R7B Arabic, an R7B model optimized for Arabic, available on the Cohere Platform as command-r7b-arabic-02-2025 and on Hugging Face, with an Ollama release expected later today.
    • According to the release notes, it has a context length of 128,000 tokens and excels at enterprise tasks such as instruction following, length control, RAG, and responding in the correct language.
  • Community Hopes Command R+ update beats Mistral Large: Community members discussed and expressed their eagerness for an upcoming Command R+ update, hoping it will surpass Mistral Large 2411.
    • Members expect that specific release details are unlikely to be shared due to NDAs, and cautioned against spreading unconfirmed information.
  • Arabic LLMs get Benchmark Boost: There was community interest in benchmarking Cohere's R7B Arabic model against Qatar's Fanar model and Saudi's ALLaM, with the suggestion to use the Arabic Balsam index.
    • A member shared a link to the GPT-4.5 system card which provides an overview of benchmarking methodology.
  • Adobe Premiere does Auto Transcriptions: A member suggested that Adobe Premiere has an auto transcription feature, and others confirmed its existence and availability.
    • Previously, community members discussed auto caption and auto subtitle options.


LlamaIndex Discord

  • LlamaIndex boosts Autism Care: LlamaIndex is helping CentralReach transform autism and IDD care with AI, boiling down mountains of research and paperwork into relevant insights and key points to enhance doctor efficiency.
    • The integration of AI in medical fields helps streamline complex data analysis, improving the speed and accuracy of diagnoses and treatment plans.
  • LlamaExtract simplifies Data Extractions: LlamaIndex's LlamaExtract is now in public beta, simplifying structured data extraction from unstructured documents by enabling users to define and customize schemas for data extraction programmatically.
    • The new beta version aims to improve the efficiency of data processing workflows for LlamaIndex users.
  • LlamaParse Springs Data Leak: A user reported a data leak in LlamaParse 0.6.2, where images and analyses from other users, including sensitive information, were mixed into their results. The issue, confirmed as a mix-up with test/benchmark data, has been fixed in the backend API.
    • The reporter provided a list of Job IDs for investigation, emphasizing the importance of robust data segregation in multi-tenant systems.
  • Docs for LlamaExtract 'Outdated': A user noted that the create_agents method was missing in LlamaExtract 0.0.4, with confirmation that the project has moved to LlamaCloud Services, and that the documentation is outdated.
    • The relevant code is now in the llama_cloud_services repo, indicating a shift towards cloud-based knowledge agent management.
  • Searxng Search Engine Explored: A user inquired about integrating Searxng, a free meta-search engine, into the framework, suggesting a tool for enhanced search capabilities.
    • A member suggested using Searxng with an agent by putting it in a FunctionTool, despite it being a new integration.


DSPy Discord

  • Portkey AI Studio Launches with a Bang: Portkey AI has launched a Prompt Engineering Studio, an IDE for prompt engineers that allows testing across 1600+ models and offers improvements from an AI-powered assistant.
    • The studio features reusable templates, version control, prompt deployment, and performance tracking with real-time analytics; Portkey AI will host a live workshop on March 3rd to demo the studio, with signups available on Portkey's website.
  • ReAct Struggles with Sequential Tool Use: A user questioned how to integrate tools requiring external pings with dspy.ReAct for tasks like creating text and sending emails, especially concerning orchestration.
    • The challenge involves ensuring the system understands the sequence of actions (text creation before email) when the email function necessitates external function calls.
  • DSPy Release 2.6.7 Gets Yanked for Import Errors: Users reported a ModuleNotFoundError in dspy-ai==2.6.7, with a GitHub issue detailing the import failure, hindering module access.
    • Downgrading to version 2.6.6 resolved the issue, the faulty release was quickly yanked, and 2.6.8 was released to address the import problems caused by a migration from setup.py to pyproject.toml.
  • MIPROv2 Runs Out of Token Budget: A user encountered a ContextWindowExceededError with MIPROv2, even after ensuring conversations were under 1000 characters and using light mode.
    • It was suggested that the user reduce the number of demos in the optimizer or set view_data_batch_size=3 in the .compile() call; this setting shrinks the data summary and keeps the prompt within the token limit.
  • Refine API Evolving Feedback Loops: A user inquired about how to control advice/feedback passed to the LLM on subsequent retries with dspy.Refine, compared to older assertion methods.
    • Feedback is returned via the reward_fn, and dspy.Refine should now participate in the compilation feedback mechanism, allowing optimization of suggestions that were previously unoptimizable.
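The retry pattern being described, best-of-N attempts scored by a reward function whose feedback is fed to the next attempt, can be sketched without dspy (the function names and signatures here are illustrative, not the dspy.Refine API):

```python
def refine(generate, reward_fn, n=3, threshold=1.0):
    """Retry up to n times, feeding the previous attempt's feedback forward.

    generate(feedback) -> candidate; reward_fn(candidate) -> (score, feedback).
    Stops early once the score reaches the threshold.
    """
    best, best_score, feedback = None, float("-inf"), None
    for _ in range(n):
        candidate = generate(feedback)
        score, feedback = reward_fn(candidate)
        if score > best_score:
            best, best_score = candidate, score
        if score >= threshold:
            break
    return best

# toy "model" that only succeeds once told what is missing
def generate(feedback):
    return "hello world" if feedback else "hello"

def reward_fn(candidate):
    ok = "world" in candidate
    return (1.0 if ok else 0.0, None if ok else "mention 'world'")

print(refine(generate, reward_fn))  # hello world
```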


Torchtune Discord

  • GPT-4.5 Lands on Azure: A member reported that GPT-4.5 is now accessible on Azure.
    • No further details were provided regarding specific features, pricing, or availability regions.
  • Activation Offloading Requires Checkpointing: A member inquired about why activation offloading necessitates activation checkpointing in Torchtune.
    • Another member clarified that offloading and reloading full activations can throttle GPU performance because of their size, whereas checkpointing only stores the input vector to each transformer block, so far less data has to move.
  • Shared Memory to the Rescue: A member sought guidance on efficiently loading merged models in distributed Federated Learning (FL) to prevent downloading on all ranks.
    • The recommended approach was to utilize shared memory instead of dumping the merged model to disk for all ranks to access.
  • DeepSeek's DualPipe Aims to be Parallel: A member shared DeepSeek's DualPipe GitHub repository, showcasing a bidirectional pipeline parallelism algorithm designed for computation-communication overlap in V3/R1 training.
    • Another member noted it may assist in optimizations between FL syncs, even if it is dwarfed by communication overhead.
  • DPO Integration Test in Limbo: A member inquired about the status of the DPO integration test and any issues preventing its addition.
    • Another member indicated that a single-device recipe already exists here and adding a distributed recipe shouldn't pose any problems.
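The shared-memory suggestion above can be sketched with the standard library's multiprocessing.shared_memory, letting peer ranks attach to the merged weights by name instead of each reading from disk (the weights and segment name are placeholders, and a real setup would also communicate the payload size between ranks):

```python
import pickle
from multiprocessing import shared_memory

# placeholder merged weights; real FL code would hold the aggregated state dict
weights = {"layer0.w": [0.1, 0.2], "layer0.b": [0.0]}
payload = pickle.dumps(weights)

# "rank 0": place the merged model into a named shared-memory segment once
shm = shared_memory.SharedMemory(create=True, size=len(payload),
                                 name="merged_model_demo")
shm.buf[:len(payload)] = payload

# "other ranks": attach by name, no disk read or re-download required
peer = shared_memory.SharedMemory(name="merged_model_demo")
restored = pickle.loads(bytes(peer.buf[:len(payload)]))

peer.close()
shm.close()
shm.unlink()
print(restored == weights)  # True
```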


Notebook LM Discord

  • NotebookLM Users Seek Emoji Customization: Users requested the ability to change emojis on their notebooks, but the feature is currently unavailable; users can support existing feature requests or create new ones. The discussion compared NotebookLM against OneNote, Obsidian, and Goodnotes.
    • A user pointed to a tweet lamenting NotebookLM's lack of momentum and mobile apps, blaming Google's pattern of stifling internal innovation.
  • Notebook Sharing Causes Headaches: Users are encountering issues sharing notebooks with groups, finding that simply handing over the link is insufficient, as they need to add users specifically to grant access.
    • It seems that users may need to have an account before they can access a shared notebook, and both adding the user via email and providing the link might be necessary.
  • Audio Overview Plagued by Errors: Users are frequently encountering an error saying 'There was an error fetching your conversation. Please try again' when trying to load the audio overview.
    • The issue seems intermittent, working sometimes but failing frequently, causing frustration among users who rely on this feature.
  • User Encounters 'Service Unavailable' Error: A user reported receiving a 'Service unavailable' error when logging into NotebookLM, with a message indicating that 'You tried to access a service that isn't available for your account', and linked to their Google Account services page.
    • A user suggested that the account may be defaulting to a school account instead of a personal one.


Modular (Mojo 🔥) Discord

  • Modular Restructures Repos, Signals Change: Modular is streamlining its MAX and Mojo repositories, merging them to simplify contributions and consolidate bug reports, according to a post on the Modular forum.
    • This restructure has led to speculation about Mojo's future as a standalone language, with some questioning whether its prioritization is shifting.
  • Mojo Gets HyperLogLog Implementation: A member implemented the HyperLogLog algorithm in Mojo, sharing the code on GitHub and requesting feedback.
    • The developer described Mojo as a more powerful Python, which is fun to use.
  • MAX Taps Undocumented MLIR: Inline MLIR is used within Mojo's stdlib, but it is largely undocumented and intended for internal use by Modular and stdlib contributors and the MAX Graph Compiler.
    • Internal dialects like mo, moq, mogg, mef, mgp, grt, rmo are not intended to be exposed to the public, although some intrepid users are exploring Mojo's internals using nm to discover details related to dialects, types, and ops.
  • Mojo Unions Spark Discussion: The discovery of the union type in Mojo has sparked debate about its intended use and potential hazards.
    • Concerns include poorly defined aliasing and type-punning rules, potentially leading to unexpected behavior.
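The HyperLogLog algorithm mentioned above is compact enough to sketch in plain Python; this is a generic textbook-style sketch with assumed parameters, not the member's Mojo code:

```python
import hashlib
import math

class HyperLogLog:
    """Minimal HyperLogLog cardinality estimator with 2**p registers."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        x = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = x >> (64 - self.p)                      # first p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)         # remaining 64-p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:        # small-range (linear counting) fix
            estimate = self.m * math.log(self.m / zeros)
        return estimate

hll = HyperLogLog()
for i in range(10_000):
    hll.add(i)
print(round(hll.count()))   # close to 10000, within a few percent
```

With 2**10 registers the standard error is about 1.04/sqrt(1024), roughly 3%, using about 1 KB of state no matter how many items are added.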


MCP (Glama) Discord

  • MCP Finds Users in Production: Members are using MCP in production workflows, reporting its utility despite issues with line numbers changing during edits.
    • Mitigation strategies involve clever prompting and resource inclusion to manage these changes, as noted in Open-Source MCP servers.
  • Claude Code's Diff-Based Editing Falters on Go: Users highlighted that Claude Code employs diff-based editing, which runs into problems with Go code because automatic formatting adjusts whitespace for readability.
    • The automated formatting adjustments interfere with the diff-based approach, causing editing failures.
  • Official Everything Server Streams SSE: The official everything server now supports SSE (Server-Sent Events), making it suitable for testing real-time data streams.
    • One user confirmed that SSE is particularly perfect for their testing scenarios, suggesting enhanced capabilities for event-driven applications.
  • Glama AI's GitHub App Seeks Scalability: The creator of Glama AI urged users to install the Glama AI GitHub app to bolster the project and escalate API rate limits.
    • An initial could_not_parse_params error during installation was addressed, with clarification that only registration is needed and no data collection occurs.
  • tinylm Enables Client-Side LLMs with WebGPU: Version 0 of tinylm has been released, a library for running LLMs client-side in browsers or Node.js with WebGPU acceleration, featuring an OpenAI-compatible API.
    • Key features touted include zero-cost inference, complete privacy, and support for text generation, text embeddings, and real-time token streaming, according to tinylm - Run Models Locally with WebGPU.


Nomic.ai (GPT4All) Discord

  • GPT4ALL User Asks for Google Gemini LIVE Mode: A user requested a LIVE mode feature akin to Google Gemini, suggesting it could surpass Google's tools and linked to a GPT4ALL Voice Assistant demo built in Python that uses OpenAI Whisper for offline voice detection.
    • The member suggested leveraging voice recognition (STT) for input and TTS for output, for a more conversational user experience.
  • Clarification Sought for GGUF Model Chat Templates: A member inquired about how chat_template is used with GGUF models, specifically if the template is read from the .gguf file on initial load and stored in model3.json.
    • They sought verification that modifications made in the GUI are saved in model3.json, like with gpt4all and Hugging Face models, for persistent configuration.
  • Oobabooga Adds Alltalk TTS: Oobabooga now implements a text-to-speech extension called alltalk_tts that functions with GGUF, AWQ, and GPTQ models.
    • Users have noted that the install process is a little involved, requiring a Python installation via a BAT installer, but the upside is that it requires no coding.
  • Slow Internet Cripples TTS Install: One user reported that with their slow internet speed of 40 kbps, the Oobabooga installation would take approximately two days.
    • This is in stark contrast with other users, for whom the install took only about an hour.


tinygrad (George Hotz) Discord

  • GROUP AST struggles with large Tensors: Changes to the AST for GROUP operations are on par with PyTorch when summing (2048,2048) tensors, but falter with (4096,4096) tensors due to needing multiple successive OptOps.
    • The team debated adjusting BEAM search to find these OptOps, or modifying the lowerer/expander to output something different that will do multiple accumulators.
  • BEAM Search meets Frustration: The author faces difficulties in getting BEAM search to identify the optimal sequence of OptOps for summing larger tensors (4096,4096).
    • They are contemplating modifying the lowerer or expander to generate alternative ASTs, but are uncertain of guaranteeing performance gains, linking to a relevant pull request.
  • arange GROUP Optimization Breaks CI: The author notes that the arange GROUP optimization isn't being applied, leading to an extra inner loop in arange operations and broken CI.
    • After rebasing onto master, tests now pass and match PyTorch performance; the author asked for feedback on the arange GROUP optimization.
  • Speed Test Times Out: A member reported that Speed Test BEAM=2 is timing out on GitHub Actions.
    • The author resolved the timeout by trimming some of the added OptOps and also reported that adding GROUP and GROUPTOP slowed the BEAM search because of a greatly increased number of kernels tried.
  • Tests Still Fail on Pull Request: A member reported that tests are still failing on the pull request with slower LLVM speed and 0 gain.
    • The author clarified that it was not ready for review, but asked whether the arange tests failing on GROUP OptOps was a known issue.


LLM Agents (Berkeley MOOC) Discord

  • Discord Server Announces Research Plans: A member announced their research plans and shared a Discord invite link for a more detailed announcement.
    • The member encouraged interested parties to DM them for more information or join the Discord server directly for projects and collaborative opportunities.
  • Research Track Subgroups on the Horizon: A research track is forming that will focus on predictive decision making and long-term memory in agents, with sync meetings to discuss lectures and foster collaboration.
    • Interested members can join via this Discord invite to enhance agents' abilities to anticipate future outcomes and make informed choices.


MLOps @Chipro Discord

  • tinylm v0 Released: A library for running LLMs and embedding models client-side in a browser or Node.js with WebGPU acceleration has been released, called tinylm.
    • It supports OpenAI SDK like text generation and embeddings generation with text-to-speech and speech-to-text coming soon, with no servers needed.
  • tinylm mimics OpenAI API: tinylm provides an OpenAI-compatible API for running language models directly in your browser or Node.js application using WebGPU acceleration.
    • Features include zero-cost inference, client-side processing, text generation, text embeddings, cross-platform compatibility, true streaming, and detailed progress tracking.


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: !

If you enjoyed AInews, please share with a friend! Thanks in advance!

Don't miss what's next. Subscribe to AI News (MOVED TO news.smol.ai!):
Powered by Buttondown, the easiest way to start and grow your newsletter.