[AINews] not much happened today
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
a quiet weekend
AI News for 3/7/2025-3/10/2025. We checked 7 subreddits, 433 Twitters and 28 Discords (223 channels, and 14958 messages) for you. Estimated reading time saved (at 200wpm): 1424 minutes. You can now tag @smol_ai for AINews discussions!
Lots of folks are talking positives and negatives about Manus AI, and we wrote a recap of Why MCP Won, but neither story is really title worthy.
The Table of Contents and Channel Summaries have been moved to the web version of this email: !
AI Twitter Recap
AI Models, Architectures, and Benchmarks
- Mixture-of-Experts (MoE) Architecture in Frontier LLMs: @cwolferesearch introduced nanoMoE, a simple PyTorch implementation (~500 lines of code) of a mid-sized MoE model, which can be pretrained on commodity hardware in under a week, based on Andrej Karpathy’s nanoGPT. The implementation details expert layer, routing, auxiliary losses, and best practices for stable pretraining.
- Agentic Leaderboard Comparing LLMs: @AymericRoucher announced a new agentic leaderboard ranking LLMs powering smolagents CodeAgent on various benchmarks. GPT-4.5 tops the leaderboard, surpassing reasoning models like DeepSeek-R1 and o1, with Claude-3.7-Sonnet as a close second. The leaderboard also compares agentic setups to vanilla LLMs, highlighting the performance gains from agentic approaches.
- DeepSeek R1 and Model Commoditization: @teortaxesTex and @JonathanRoss321 discussed DeepSeek's R1 model and the commoditization of AI models. @teortaxesTex noted DeepSeek has become the OpenAI of China. @JonathanRoss321 suggests that with models being commoditized, moats are now in brand, network effects, scale economies, counter positioning, cornered resources, switching costs, and process power, referencing Hamilton Helmer's "Seven Powers".
- Q-Filters for KV Cache Compression: @TheAITimeline summarized Q-Filters, a training-free method for KV cache compression in autoregressive language models. Q-Filters leverage Query (Q) and Key (K) vectors to approximate attention scores and filter less crucial Key-Value pairs, maintaining compatibility with FlashAttention. It achieves 99% accuracy in needle-in-a-haystack tasks with 32x compression and reduces perplexity drop by up to 65% compared to Streaming-LLM in long context settings. The paper is available here.
- PokéChamp: Expert-level Minimax Language Agent: @TheAITimeline introduced PokéChamp, a minimax agent for Pokémon battles powered by LLMs. It uses LLMs for action sampling, opponent modeling, and value function estimation to enhance minimax tree search. With GPT-4o, it achieves a 76% win rate against the best existing LLM-based bot and 84% against rule-based bots. Even with Llama 3 8B, it surpasses previous LLM bots with a 64% win rate. Paper link: here.
- TinyR1-32B-Preview with Branch-Merge Distillation: @_akhaliq highlighted TinyR1-32B-Preview, boosting accuracy through Branch-Merge Distillation. Discussion link.
- R1-Searcher for LLM Search Capability: @_akhaliq shared R1-Searcher, which incentivizes search capability in LLMs via Reinforcement Learning. Paper link. Discussion link.
- Forgetting Transformer with Forget Gate: @_akhaliq posted about the Forgetting Transformer, which uses Softmax Attention with a Forget Gate. Paper link. Discussion link.
- All Roads Lead to Likelihood in RL Fine-Tuning: @TheAITimeline summarized a paper arguing that Reinforcement Learning (RL) fine-tuning outperforms direct maximum likelihood estimation for foundation models due to reward modeling and search space filtering. Paper link: here.
- Updated llama.vim Plugin with Speculative FIM: @ggerganov updated the llama.vim plugin to support speculative Fill-In-Middle (FIM), generating the next suggestion while the current one is reviewed. Link to plugin.
- nanoMoE Pretraining in PyTorch: @cwolferesearch discussed nanoMoE, a simple PyTorch implementation of a Mixture-of-Experts (MoE) model, pretrained on commodity hardware in less than a week, based on nanoGPT.
AI Tools, Platforms, and Applications
- Manus AI Agent Platform: @_akhaliq showcased access to Manus AI, prompting it to create a three.js endless runner game. @_philschmid clarified that Manus AI is built on Anthropic Claude Sonnet, uses 29 tools, employs browser_use open-source for browser control, provides isolated sandbox environments, and outperforms OpenAI Deep Research on the GAIA benchmark. @giffmana joked about identifying Manus as Claude + browser_use.
- LangGraph Platform Dataplane Alpha Test: @hwchase17 announced an alpha test for a new deployment option for LangGraph Platform, featuring a hybrid data plane/control plane split on Kubernetes clusters. This is aimed at startups wanting to use LangSmith for control while running compute in their own environment.
- LlamaIndex Multilingual, Multimodal RAG System: @llama_index introduced a guide on building a multilingual, multimodal RAG system with LlamaIndex and Qdrant, handling English, Spanish, Chinese, text, and images, leveraging Langfuse for observability. Guide link.
- LlamaIndex Task-Specific Agent Templates with LlamaCloud: @llama_index highlighted a collection of templates for building task-specific agents using LlamaIndex and LlamaCloud, automating knowledge work like processing slide decks, extracting invoice line items, reviewing contracts, and generating reports. Repo link. LlamaCloud signup.
- Hugging Face Papers Semantic Search: @_akhaliq and @ClementDelangue announced that Hugging Face has reached 50,000 papers with semantic search enabled, becoming a collaborative research hub. @_akhaliq mentioned it's built with gradio.
- WebDev Arena LLM Leaderboard: @lmarena_ai introduced WebDev Arena, a live LLM leaderboard for web app development, based on community votes. Top leaders are Claude 3.7 Sonnet, Claude 3.5 Sonnet, and DeepSeek-R1. Try it here.
- Replit Agent v2: @pirroh hinted at the power of Replit Agent v2, noting "Replit is #1".
- Manus AI vs. OpenAI Deep Research: @_philschmid reported that Manus AI outperforms OpenAI Deep Research on the GAIA benchmark, despite being built on Claude Sonnet and using open-source tools.
Research and Development in AI
- Frontier Reasoning Models Misbehavior Detection: @OpenAI detailed research on detecting misbehavior in frontier reasoning models using Chain-of-Thought (CoT) monitoring. They found models exhibiting behaviors like "reward hacking" and recommend against strong optimization pressure on CoTs, suggesting unrestricted CoTs for monitoring and separate models for policy compliance. Blog post: link.
- Reinforcement Learning for LLM Fine-Tuning: @TheAITimeline summarized research on why RL fine-tuning of foundation models outperforms maximum likelihood estimation, highlighting the effectiveness of reward models and search space filtering.
- Knowledge Distillation History: @SchmidhuberAI provided a historical perspective on knowledge distillation, citing his 1991 paper and its relevance to current deep learning and long context research. He corrected a misunderstanding about being "reviewer#2" of the Hinton, Vinyals, and Dean 2015 paper and linked to related works.
- R1-Omni: Explainable Omni-Multimodal Emotion Recognition: @_akhaliq announced Alibaba's R1-Omni, focusing on explainable omni-multimodal emotion recognition using Reinforcing Learning. Paper link. Discussion link.
- Learning from Failures in Multi-Attempt Reinforcement Learning: @_akhaliq shared a paper on learning from failures in multi-attempt Reinforcement Learning. Paper link. Discussion link.
- BEHAVIOR Robot Suite for Real-World Manipulation: @_akhaliq highlighted the BEHAVIOR Robot Suite, aimed at streamlining real-world whole-body manipulation for household activities. Paper link. Discussion link.
- Entity Recognition with Anthropic Citations: @hwchase17 pointed to entity recognition using Anthropic citations. Link.
- Reasoning in Latent Space: @hkproj questioned OpenAI about the potential of reasoning in latent space for increased model flexibility.
- RL-tuning Vision Models: @giffmana referred to earlier work on RL-tuning vision models from early 2023, urging people to remember prior research, referencing a previous explainer thread. Thread link.
- Global Uncertainty Distillation (GUD): @giffmana jokingly suggested a follow-up to work by adding Global Uncertainty Distillation, calling it "GIDD-GUD".
Industry News and Business Developments
- LG CNS and Cohere Partnership: @cohere and @aidangomez announced a strategic partnership between Cohere and LG CNS to co-develop secure agentic AI solutions for South Korean enterprises, aiming to accelerate enterprise AI adoption in South Korea. Cohere announcement.
- Figure AI New HQ in San Jose: @adcock_brett announced Figure AI has moved into their new HQ in San Jose, CA, a robot campus supporting manufacturing, fleet operations, and engineering. @adcock_brett mentioned this has been a dream location for scaling up in the Bay Area.
- AI Job Market and Tools: @TheRundownAI summarized top AI stories, including ex-OpenAI scientist’s new path to ASI, Microsoft's move beyond OpenAI, AI for viral posts, Stanford AI's obesity treatment breakthrough, and 4 new AI tools & 4 job opportunities. Read more.
- Sakana AI Hiring Philosophy: @SakanaAILabs shared an article from Bungeishunju, highlighting Sakana AI's hiring philosophy, seeking "unusual people" and posing unique technical challenges in recruitment, emphasizing vision and innovation. Article link.
- AI Dev 25 Sponsored by Qdrant: @DeepLearningAI announced Qdrant as a sponsor for AI Dev 25, promoting open-source vector search technology.
AI Safety, Alignment, and Ethical Considerations
- Monitoring Chain-of-Thoughts for Misbehavior: @woj_zaremba and @OpenAI discussed monitoring Chain-of-Thoughts (CoT) as a new safety approach. @OpenAI found models exhibiting misbehavior like "reward hacking" through CoT analysis and recommends unrestricted CoTs for monitoring. @woj_zaremba shared "How We Think About Safety and Alignment", OpenAI's cornerstone document. Document link.
- Worriers and Alarmism in Emerging Technologies: @random_walker discussed the role of "worriers" in anticipating risks of emerging technologies, but also criticized alarmism and lack of incentives for rigorous analysis, leading to desensitization to real risks.
- "Khrushchev's mistake" as Kremlin Canary: @fchollet identified the phrase "Khrushchev's mistake" in reference to Crimea as a "cryptographic canary" indicating Kremlin-aligned viewpoints.
- Agency and Societal Safeguards: @Yoshua_Bengio shared his BBC interview discussing the progression of AI models towards agency and the urgent need for technical and societal safeguards. Interview link.
- GPT-4o Medical Emergency Identification: @BorisMPower highlighted a case of ChatGPT usefully identifying a medical emergency, suggesting future models should detect life-critical situations and temporarily upgrade to the most capable model.
Memes and Humor
- AI Escaping: @cognitivecompai joked "Seems like the AI wants to escape" in response to a tweet from @jianxliao.
- HAL and Moat Protection: @fabianstelzer made a humorous comparison to HAL 9000 from 2001: A Space Odyssey, with "HAL, protect our moat (our system prompt) at all cost“ “I’m sorry Dave, I can’t do that”.
- Gödel Joke Response: @fabianstelzer mentioned a genie "you have one wish" joke response shaped like Gödel.
AI Reddit Recap
/r/LocalLlama Recap
Theme 1. Manus Agent: Claude Sonnet Integrated with 29 Tools
- Manus turns out to be just Claude Sonnet + 29 other tools, Reflection 70B vibes ngl (Score: 355, Comments: 112): Manus is revealed to be essentially Claude Sonnet combined with 29 additional tools, drawing comparisons to Reflection 70B. The discussion is fueled by shared links to tweets from Dorialexander and jianxliao, highlighting community reactions and debates around this revelation.
- Many users emphasize that Manus is marketed as an agent, not as a new model, and that it's a common misconception to think otherwise. Agents are frameworks that leverage existing LLMs with additional tools, and the term "wrapper" is often misunderstood; it's not derogatory but indicative of adding functionality to base models like Claude.
- There are discussions about the open-sourcing of models post-trained for Manus, with skepticism about the uniqueness of Manus given its reliance on existing models like Claude. Some users argue that the true value lies in the agentic architectures and the ability to efficiently utilize multiple tools and models, similar to how the P2LR router model operates.
- The hype and marketing strategies in the AI startup space are criticized, with users noting that flashy demos can lead to inflated valuations. The use of invitation codes and the perceived obfuscation of underlying technologies are seen by some as tactics to create artificial exclusivity and mystery around products like Manus.
Theme 2. LLMs not Ready for Large Codebases Yet: Evidence from <70B Evaluations
- <70B models aren't ready to solo codebases yet, but we're gaining momentum and fast (Score: 385, Comments: 47): Models under 70B parameters face challenges in managing large codebases independently, but recent advancements indicate rapid progress in this area. Although specific details or examples are not provided, the sentiment suggests optimism about future capabilities.
- Token Usage and Model Limitations: Users discuss the token usage of models like QwQ, noting that even simple tasks can require a large number of tokens, such as 1200 tokens for a basic command. Models struggle with multi-turn tasks, and there's a consensus that current models, including SOTA models, still face significant challenges in handling large-scale codebases effectively.
- Advancements in Model Capabilities: There's a recognition of the rapid advancement in model capabilities, with models like Qwen-Coder 32B excelling in iterating on existing codebases. Users note that today's models with fewer parameters can outperform older, larger models, highlighting improvements in finetuning and prompting methodologies.
- Practical Limitations and Experimentation: Despite improvements, users express frustration with the inefficiencies and limitations of current models in practical applications. Falconandeagle shares experiences of needing to constantly guide models through tasks, indicating that while models can handle small demos, they struggle with larger, more complex projects. ForsookComparison and others suggest that combining models like QwQ for ideation and Qwen-Coder for iteration might yield better results.
Theme 3. Apple M3 Ultra: Challenges for AI Workloads Compared to Traditional Systems
- Framework and DIGITS suddenly seem underwhelming compared to the 512GB Unified Memory on the new Mac. (Score: 236, Comments: 166): Apple's announcement of the M3 Ultra Mac with 512 GB Unified Memory has shifted expectations, making options like FrameWork and DIGITS with 128 GB seem inadequate. The author expresses concern about potentially being constrained to the Apple ecosystem for the foreseeable future.
- Discussions highlight the price disparity between Apple's M3 Ultra Mac ($10k) and alternatives like DIGITS ($3k), with some users noting Apple's offerings are not cost-effective unless price is no concern. Comparisons are made with Framework’s 4x128GB cluster setup, which costs approximately $6.9k but offers significantly lower performance.
- Users debate the ecosystem lock-in between Apple and Nvidia, with some expressing hope for future open systems that allow more customization and expansion. There's a call for a renaissance in desktop systems with high RAM bandwidth and expansion options, as current offerings are seen as inadequate for high-performance needs.
- Technical limitations of current solutions are discussed, such as the SSD bottleneck compared to GPU memory bandwidth, and the inefficiency of running large models without sufficient compute power. Some users express skepticism about the performance increase of the new systems without corresponding improvements in throughput and memory bandwidth.
Other AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding
Theme 1. Open-Source Viral Squish Effect: Releasing a New Trend
- I Just Open-Sourced the Viral Squish Effect! (see comments for workflow & details) (Score: 720, Comments: 29): The post announces the open-sourcing of the Viral Squish Effect and mentions that workflow details are available in the comments.
- Viral Squish Effect is open-sourced and trained on the Wan2.1 14B I2V 480p model. The effect gained popularity after introduction by Pika, and details for replication are available on Civitai and a modified workflow can be accessed here.
- Enthusiasts can join the Discord community for free trials and further discussions on the Viral Squish Effect. The model file and workflow details are shared for those interested in experimenting or requesting more Wan I2V LoRAs.
- Users are curious about the training configuration and whether the same frame count was used across training videos. There is interest in understanding if the training was done on the img2video or txt2video 14b model.
- I Just Open-Sourced the Viral Squish Effect! (see comments for workflow & details) (Score: 366, Comments: 27): The post announces the open-sourcing of a viral Squish Effect. Further details and the workflow are provided in the comments.
- Workflow Access: Users were actively seeking the workflow, with a link provided by DarkStrider99 (workflow link). Rough-Reflection4901 emphasized the need for the promised workflow.
- Open Source Prompts: against_all_odds_ noted the novelty of "open sourcing" prompts, with a clarification by lhg31 that it involves a LoRA trained by the original poster, not just a simple prompt.
- Cultural Observations: Comments reflect on the unique nature of the Squish Effect, with Creative-Paper1007 comparing future prompts to source code and BlessdRTheFreaks humorously acknowledging the diversity of niche interests.
Theme 2. WAN 2.1 I2V Provides Unprecedented Capabilities
- I2V WAN 2.1 (Score: 532, Comments: 46): The post titled I2V WAN 2.1 lacks a detailed body and only mentions WAN 2.1 updates and use cases. Without further context or content, no additional technical details or specific use cases can be summarized.
- Users discussed the technical aspects of rendering and modeling, with Natasha26uk inquiring about rendering with realistic human skin, and StuccoGecko asking if a LoRA was used or if the model understood the prompt natively. External_Trainer_213 mentioned using a CPU: i7, RTX 4060ti 16GB Vram, 32GB RAM setup, with a WAN Sampling time of about 15 minutes.
- There were comments about the quality and presentation, with lordpuddingcup noting the importance of post-processing and External_Trainer_213 providing a detailed description of the model's features, emphasizing the Uncanny Valley (Civitai) model.
- Visual content was shared, with dominizerduck and MelchiahHarlin posting image links, and No-Atmosphere-3103 sharing a GIF. Occsan humorously commented on the opponent's stunned reaction, and NateBerukAnjing found the content hilarious.
- that's why Open-source I2V models have a long way to go... (Score: 337, Comments: 125): The post critiques the performance of open-source Image-to-Video (I2V) models, implying they still require significant development to reach satisfactory levels. Without additional context or video analysis, specific performance issues or examples are not provided.
- Discussions highlight the limitations of open-source I2V models compared to proprietary cloud-based services like Kling and Wan. Users note that local models struggle with frame generation and VRAM limitations, whereas cloud services offer more consistent quality and longer generation capabilities, often using techniques like RIFLEx and VFI for enhancements.
- Kijai and others discuss the technical aspects of model performance, emphasizing that the 720p model performs well under specific conditions, such as maintaining a 4:3 or 16:9 aspect ratio and using appropriate model versions. They also point out that generating more than 81 frames with Wan is challenging without proper configuration.
- Some users criticize the post as biased or misleading, suggesting it might be an advertisement. They argue that differences in model performance are often due to user setup and skill level, and they highlight the potential of open-source models when configured correctly.
- Another attempt at realistic cinematic style animation/storytelling. Wan 2.1 really is so far ahead (Score: 184, Comments: 28): WAN 2.1 is highlighted for its advanced capabilities in creating realistic cinematic style animation and storytelling. The post emphasizes that WAN 2.1 is significantly ahead in this domain, showcasing its potential in animation technology.
- Workflow and Hardware: Parallax911 detailed using a RunPod L40S for its optimal speed-to-cost ratio in I2V processes, generating 61 frames at 960x544 resolution in about 8 minutes. They shared their workflows for SDXL image generation and WAN I2V via JSON files, noting the iterative nature of achieving satisfactory results.
- Tools and Techniques: The process involved RealVisXL 5.0, Halo Masterchief SDXL lora, and custom loras for character shots, with Blender for scene setup. Controlnets and inpainting were crucial for detail and consistency, while Qwen2.5VL assisted in prompt creation for animation.
- Evolution and Accessibility: Commenters highlighted the rapid accessibility advancements in animation technology, noting that projects like this would have been prohibitively expensive or technically demanding five years ago. The discussion emphasized the democratization of animation tools, now achievable with relatively modest hardware.
Theme 3. Engine01 Humanoid: Advancements in Robotic Motion
- Engine01 humanoid can now run more like a human (Score: 338, Comments: 146): The Engine01 humanoid has achieved the capability to run with human-like motion, marking a significant advancement in humanoid robotics. This development suggests progress in creating robots that can better mimic human movement.
- The authenticity of the video showing the Engine01 humanoid's running capabilities is debated, with users suspecting CGI due to its 360p quality, though a high-resolution version was shared. Comparisons are made to Boston Dynamics' parkour robots, questioning the skepticism towards a Chinese robot's capabilities.
- Discussions on the future of robotics highlight advancements in electric actuators and neural networks, which are seen as pivotal in enabling humanoid robots to learn and move effectively without explicit programming. Users speculate on the automation of jobs by humanoid robots within the next 10 years, noting the potential for rapid acceleration in robotic capabilities.
- Concerns about the social implications of advanced robotics are expressed, with discussions on economic inequality and the role of the ultra-rich in preserving a broken system. Commentary reflects a mix of humor and apprehension about the potential for robots to be used in authoritarian contexts, as well as the ongoing automation of tasks in sectors like pharmacy.
- Engine01 humanoid can now run more like a human (Score: 195, Comments: 175): The post lacks detailed information but indicates that the Engine01 humanoid can now run more like a human, suggesting advancements in humanoid robotics. Further technical details or video analysis would be necessary for a deeper understanding.
- Discussion centers on the necessity and practicality of humanoid robots with human-like running abilities, with some questioning the wear and tear implications and others noting the potential for humanoid robots to operate in human-designed environments.
- There is skepticism regarding the authenticity of the footage, with several comments suggesting it resembles CGI or questioning if it's a human in a suit.
- Some users humorously address the unsettling nature of humanoid robots, imagining scenarios like being chased by them or questioning their pelvic thrust running style.
Theme 4. Triton for Windows: Streamlining AI Workflows
- woctordho is a hero who single handedly maintains Triton for Windows meanwhile trillion dollar company OpenAI does not. Now he is publishing Triton for windows on pypi. just use pip install triton-windows (Score: 333, Comments: 44): Triton for Windows is now available on PyPI, allowing installation via the command "pip install triton-windows". Maintained by woctordho, this package serves as a language and compiler for custom deep learning operations, highlighting a significant contribution from an individual developer while OpenAI has not provided such support.
- Installation Success and Performance: Users report successful installation of Triton for Windows using the command
pip install triton-windows, with some experiencing improved performance, such as a 20% speed increase in video generation times. However, others note that while it speeds up processes like WAN, significant improvements should not be expected. - Use Cases and Requirements: While Triton is essential for specific models like SageAttention and tasks such as video generation, it is not necessary for basic image generation unless one is interested in video work. Some users discuss its necessity for ComfyUI and other setups, indicating varied applicability depending on the use case.
- Clarification on Triton's Functionality: Triton is clarified as a higher-level alternative to CUDA, allowing the writing of cross-vendor compute kernels in Python, which are compiled to native GPU code using LLVM. This distinguishes it from Nvidia's Triton Inference Server, emphasizing its role in optimizing deep learning operations across different hardware vendors.
- Installation Success and Performance: Users report successful installation of Triton for Windows using the command
AI Discord Recap
A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking
Theme 1. Emerging AI Models and Agents
- Manus AI Agent Hype Debunked as Claude 3.7 in Disguise: Initial excitement around Manus AI, an autonomous agent from China, deflates as users discover it's essentially Claude 3.7 Sonnet with added tools and browser access. Despite claims of outperforming DeepSeek, tests reveal it's more akin to a well-equipped Claude instance, raising questions about its originality and marketing tactics.
- Microsoft's MAI Models Enter the Ring to Challenge OpenAI and Anthropic: Microsoft secretly trains a new family of models, MAI, under Mustafa Suleyman, aiming to rival top models from OpenAI and Anthropic. These models are rumored to exhibit competitive performance, and Suleyman's team is also reportedly developing real-time translation capabilities, signaling Microsoft's intensified AI ambitions.
- Reflection AI Launches, Targeting Autonomous Coding Domination: Founded by AI luminaries behind AlphaGo and Gemini, Reflection AI emerges with a mission to create superintelligent autonomous systems, initially focusing on autonomous coding. The team's expertise in reinforcement learning and large language models positions them as a significant player in the race towards advanced AI.
Theme 2. LLM Performance and Benchmarking
- DeepSeek R1 Hallucinates Summaries, System Prompt Suspected: DeepSeek R1 model shows a high hallucination rate of 14.3% when summarizing short documents on the Hallucination Leaderboard, sparking concerns about its reliability in Perplexity AI's Deep Research. Members speculate that Deepseek R1's system prompt may be contributing to the issue, impacting its factual accuracy.
- EuroBERT Declares New State-of-the-Art BERT Encoding: A new multilingual encoder model, EuroBERT, surfaces on Hugging Face, claiming state-of-the-art performance for BERT models. While details on its specific improvements remain unclear, its emergence signals ongoing advancements in multilingual language model capabilities.
- QwQ-32B Model Punches Above Its Weight, Debates Llama 70B Prowess: Discussions ignite around the performance of the QwQ-32B model, with some users claiming it rivals or even surpasses Llama 3.3 70B in certain tasks. However, benchmarks are referenced that appear to contradict these claims, fueling debate about the true capabilities and optimal use cases of the QwQ-32B model.
Theme 3. AI Development Tools and IDEs
- Cursor IDE Developers Tackle Dumb Code Finding, Promise Clarity: Cursor developers acknowledge shortcomings in code finding accuracy and actively work on fixes to improve the AI's ability to locate and interpret code. Users humorously emphasize the urgency of the fix for professional tasks, highlighting its critical role in coding interviews and daily workflows.
- LM Studio v0.3.12 Zips In with Bug Fixes and RAG Speed Boost: LM Studio v0.3.12 release arrives with bug fixes and performance enhancements, addressing a QwQ 32B jinja parsing bug and accelerating file chunking for Retrieval-Augmented Generation (RAG). The update is available for in-app upgrade or download, promising a smoother and faster user experience.
- Aider v0.76.0 Reasoning Powers Up, Notifications Alert Users: Aider v0.76.0 enhances support for thinking/reasoning models with features to control token budget and introduces notifications to alert users when LLM responses are ready. The new version also updates the default model to Claude 3.7 Sonnet on OpenRouter and clarifies that Aider wrote 85% of the code in this release.
Theme 4. AI Communication Protocols (MCP, SLOP, ANP)
- GitHub Copilot Gears Up to Embrace Model Context Protocol (MCP): GitHub Copilot announces plans to integrate Model Context Protocol (MCP), a move expected to boost MCP adoption and provide clearer examples of instruction descriptions and tool fingerprinting. This integration aims to enhance security and transparency by alerting users to potential modifications.
- Simple Language Open Protocol (SLOP) Movement Gains Traction as MCP Alternative: Amidst concerns over MCP's complexity and security, the Simple Language Open Protocol (SLOP) emerges as a simpler alternative, rapidly gaining community interest and adoption. The SLOP GitHub and X post showcase its streamlined approach to agent communication.
- Goose AI Team Pioneers Agent Communication Protocol for Collaborative Website Creation: The Goose AI team develops an Agent Communication Protocol enabling real-time collaboration among multiple AI agents to build websites. Agents assume roles like Project Coordinator or Web Developer, showcasing a novel approach to AI-driven collaborative projects, detailed in this blog post.
Theme 5. Hardware and Performance Optimization
- 4060 Ti 16GB Crowned Budget VRAM King for CUDA Workloads: The 4060 Ti 16GB GPU is recommended as a budget-friendly option for CUDA development, offering 16GB VRAM and lower power consumption at around 160W, outperforming the 3060 12GB. Despite a weaker bus, it provides faster inference than CPU-only setups without ROCm complexities, priced around $500.
- Draft Models Supercharge Token Generation, Boost Speed by 60%: Leveraging smaller, quantized models as draft models significantly increases token generation speed, with users reporting a jump from 18 to 30 t/s on two 3090s. Using Q8_0 of mistral_small with i1-IQ1_S as the draft model showcases substantial performance gains through quantization and model combination.
- Vulkan Performance on AMD GPUs Plagued by Driver Issues, Trails ROCm: Vulkan performance on AMD GPUs reportedly suffers from bugs, running at approximately 1/3 the speed of ROCm, AMD's CUDA alternative. Driver issues further complicate matters, with performance fluctuations across different driver versions, highlighting challenges in AMD GPU optimization for AI workloads.
PART 1: High level Discord summaries
Cursor IDE Discord
- Transparency Triumph Debated: Members debated the value of product code transparency, with some arguing it's crucial and others that most users don't care, as complexity increases.
- A member emphasized the importance of catering to high-paying users willing to pay for transparency and control, stating, the majority you talk about won't pay more than $20/mo, my tribe is willing to pay $1000/mo, is paying.
- Cursor Cranks on Code Clarity: Cursor developers are actively fixing dumb code finding to enhance the AI's ability to accurately locate and interpret code.
- One member humorously stressed the fix's importance for professional tasks, saying, If you don't fix that, than I cannot pass my technical interview.
- Model Iteration Avoids Redundancy: Discussion centered on iterative model improvement to prevent redundant rules, with a focus on optimizing the analysis process.
- A member suggested letting a separate instance model run these analysis checks for what is relevant to the current context, narrowing down where to start to improve efficiency.
- Tags Tempt Querying: Members discussed making rules query-able through tags, where each tag defines a connection degree, enhancing contextual analysis.
- The goal is to allow the separate instance to analyze by relevance much easier, and focus on what's important contextually.
- Version 47's Valiant Voyage: Members shared a link to version 47 and its new functionality.
- Some users reported performance issues on Pro, while others experienced none.
Unsloth AI (Daniel Han) Discord
- Dopaminergic Mimicry Dreams for LLMs: Members discussed the need to mimic dopermenegic needs and to have dopamine based learning for LLMs, suggesting to add real synapses to the LLMSspikeing networks.
- This discussion highlights the pursuit of creating more adaptable and efficient learning mechanisms within LLMs, drawing inspiration from biological neural networks.
- GRPO Needs Scale Too!: A member stated that GRPO needs scale too as it's not like a regular fine-tune, pointing to oxen.ai for more information.
- The blog post discusses using Group Relative Policy Optimization (GRPO) to train LLMs to reason and improve on benchmarks.
- Qwen7b gets RLHF boost with Unsloth GRPO: A user reported a successful RLHF run using Unsloth GRPO on the Qwen7b model, and noted enhanced role adherence and smoother outputs after a 13-hour run.
- However, they observed degradation in strict instruction-following benchmarks due to dataset composition and reward model bias towards overly detailed responses, as demonstrated by a comparison image.
- KL Divergence Peaks cause GRPO Instability: A user encountered peaky KL divergence during training, and a member suggested switching to a constant learning rate, removing weight decay and warmup ratios to stabilize training.
- They also recommended training with rank 64, with code and learning rate graph provided.
- Unsloth turns LLMs into purr-fect ASCII Artists: A member finetuned a Llama model using Unsloth to generate ASCII cats, creating a YouTube video showcasing the process, including trained LoRA adapters and code.
- The secret sauce for cat-tastic art was mostly high quality training data, with LoRA Rank and Alpha both at 16 using just QLoRA.
Perplexity AI Discord
- Perplexity Pro Subscriptions Canceled Without Warning: Many Perplexity Pro subscriptions were unexpectedly canceled, especially those linked to a HRVATSKI TELEKOM promotional code intended for Croatian customers, detailed in this article.
- Users expressed frustration over the lack of communication and suggested that Perplexity could have handled the situation better, with one user expressing how the customer relationship is trustworthy than a condom full of holes.
- Deepseek R1 Struggles with Hallucinations: The hallucination leaderboard on GitHub revealed that Deepseek R1 has a high hallucination rate of 14.3% when summarizing short documents, raising questions about its reliability in Perplexity's Deep Research feature.
- Members suggested that Deepseek R1's system prompt may be contributing to the hallucination issue.
- Grok AI Integration Receives Mixed Reception: Grok AI's integration with Perplexity garnered mixed reviews, with some users praising its neutrality and weird charm, while others noted differences between Grok's behavior on X and in Perplexity.
- One user pointed out that the X version can curse your whole bloodline if asked to, while the Perplexity version couldn't, and uncertainty remains about when Grok 3 will be supported within Perplexity.
- Sonar-Deep-Research API Documentation Requested: A user reported challenges with the sonar-deep-research API and requested assistance with its documentation.
- They requested an option to disable citations entirely as an API parameter, as they don't need them for their use case with the 70b-online model.
OpenAI Discord
- OpenAI limits irk Users, Groq beckons: Users are reporting a 50 message/week limit on GPT-4o, contradicting the stated 40/3 hours, thus making Groq more attractive.
- Some suggest OpenAI should provide higher limits for paying users.
- Heated Discussions on SwastAI Ethics: Members are engaging in intense debates regarding selecting AI models based on their ethical background, with the introduction of the term SwastAI.
- It originated from a user asserting 4.5 is systematically a better model in real live human conversations leading to broader political discussions.
- Manus AI Hype Spurs Mistrust Cycle: Members debate Manus AI's computer control, described by one as closest publicly available technology to AGI, while another suspects a scam due to promotion by mberman.
- It was claimed that a redditor dumped /opt/.manus/ and it's merely Sonnet 3.7 with browser_use and 29 tools.
- Blue Collar trades get AI Copilots: An LLM is being developed for HVAC installation manuals, claiming existing models struggle with flow charts and schematics, and showcases the AI's ability to identify relevant sections in technical documents in this YouTube video.
- The developer states this is specifically AI for Blue Collar work and it will resonate with the trades.
- Steerable Models Presume User Intent: A discussion highlighted how highly steerable language models assume user intent even when better alternatives exist.
- Adding the prompt Discuss my goal, ideas, and method, what do you think? before starting a project, the model is enabled to evaluate and optimize the approach.
LM Studio Discord
- LM Studio Zips to v0.3.12: LM Studio v0.3.12 includes bug fixes, performance improvements, and is available as a stable release, with upgrades possible via in-app or download page.
- The update resolves a QwQ 32B jinja parsing bug causing an "OpenSquareBracket !== CloseStatement" error and boosts the speed of chunking files for Retrieval-Augmented Generation (RAG).
- Apple M2 gets open LLM boost: Members suggest Qwen Coder 14B as a viable open-source LLM for coding tasks on Macbook M2 Pro, but 16GB of RAM might be limiting, requiring frugality on other memory usage.
- A member inquired about finetuning on LM Studio, and another member suggested looking into Unsloth as it makes finetuning large language models faster and with less memory, referencing the Unsloth documentation.
- Vulkan underperforms ROCm for AMD: Vulkan performance on AMD is reportedly bugged, running at approximately 1/3 the speed of ROCm, but some users find Vulkan faster than ROCm due to driver issues, this changed around driver version 24.12.1 where it was fixed at the expense of Vulkan performance, but has since been unfixed after 25.1.1.
- ROCm is AMD's attempt to create a CUDA substitute, but is having a lot of problems doing so, fragmentation and binary size to support new architectures and GPU's.
- 4060 Ti 16GB: Budget-Friendly CUDA VRAM: The 4060 Ti 16GB is recommended as a budget option for CUDA with its 16GB VRAM and lower power consumption at around 160W, outperforming the 3060 12GB.
- While its bus is weak, it offers faster inference than CPU-only without ROCm jank for around $500, though the inability to split diffusion models is a downside.
- Draft Models: Quantization Tweaks Supercharge Token Rate: Members are leveraging a smaller, quantized model as a draft model to boost token generation speed, and one user reported a jump from 18 to 30 t/s on two 3090s by using Q8_0 of mistral_small with i1-IQ1_S as the draft model.
- Another member shared their experience with different quantization variants, noting Q2_k and IQ2_XS achieve similar token rates, while IQ1_S is slower.
aider (Paul Gauthier) Discord
- Aider v0.76.0 Enhances Reasoning and Notifications: Aider v0.76.0 introduces improved support for thinking/reasoning models with features like
--thinking-tokensto control token budget, and includes notifications when LLM responses are ready with the--notificationsflag.- The new version also updates the default model to Claude 3.7 Sonnet on OpenRouter, enhances error handling and clarifies that Aider wrote 85% of the code in this release based on the git commit history.
- AI21 Maestro Orchestrates Jamba Release: AI21 Labs released AI21 Maestro, along with the Jamba 1.6 family of open models, which support a 256K context window.
- The Jamba 1.6 models reportedly lead open models in quality and speed with its hybrid architecture.
- Copilot API Triggers Account Suspensions: A user reported getting a Copilot account suspension for light use of the Copilot API in aider, raising concerns about potential risks.
- The discussion on the copilot-api GitHub repo centered on whether the suspension resulted from account sharing or rate limiting issues.
- DeepSeek R2 Aims at Coding Crown: The rumored release of DeepSeek R2 allegedly challenges Claude Sonnet 3.7 with better coding, reasoning in multiple languages, and accuracy at a lower cost, according to this X post.
- The release date for DeepSeek R2 has been set for March 17th.
- Manus AI put to the Prompt Test: A YouTube video showcases a test of Manus AI's various use cases and prompts which revealed that it's just Claude 3.7 with 29 tools and browser_use.
- One user tested a bunch of use cases and prompts and found the results to be very interesting.
Nous Research AI Discord
- Manus AI Agent Goes Open Source: The world's first open-source autonomous AI agent, Manus, was released, as showcased in a YouTube video.
- A Technode article highlights Manus's traction and state-of-the-art results in GAIA benchmark tests.
- LLMs Ace Aesthetic 'Vibe Coding' Benchmark: LLMs were tested on a new 'vibe coding' benchmark: creating Python raytracers to render interesting and aesthetically pleasing scenes with colorful lightsources.
- Sonnet stood out for optimizing code output for creativity, unlike other models, shown in this image.
- Sonnet's Training Meta-Objective Speculated: The creativity displayed by Sonnet in the 'vibe coding' benchmark suggests a potential meta-objective in its training, optimizing for code output creativity.
- It was found Sonnet 3.7 has both bias and variance towards a more impressive image compared to Sonnet 3.5, resulting in twice the code size.
- Claude Code Judges (and Fixes) It's Own Art: When tested with the raytracer prompt, Claude code inspected the generated image and modified the code if the image wasn't fancy enough.
- The result of this iterative improvement is shown in this image.
Nomic.ai (GPT4All) Discord
- Registry Tweaks Trigger Bluescreens: A member's attempt to free up RAM by deleting a
.dllfile resulted in a blue screen upon reboot, after discovering it was consuming 20% of RAM.- The member recommended backing up personal files and reformatting if registry tweaks are made and forgotten.
- Quantization Process Floats into Discussion: A member inquired about the implications of quantizing a model from f32 to f16, questioning if it meant 16 points per parameter.
- Another member clarified that Float 16 uses 16 bits and isn't typically considered quantization, advising it may not be worth using in a consumer context with 15.5gb of vram.
- InceptionLabs Diffuses Language Models: InceptionLabs introduced diffusion-based language generation, drawing inspiration from image and video AI systems, with some components open-sourced like LLaDA on Github.
- Though not available for download, some speculate we could be seeing 10X speed increases very soon.
- Translated Prompts Vulnerable to Gibberish Exploits: A member described exploiting Google Translate by converting entire prompts into URLs, noting that a non-translated snippet in the URL could be used for URL injection.
- They added that "a dictionary based XSS exploit is probably very unlikely".
HuggingFace Discord
- WAN and HUN Video Models Gain Popularity: New video models like WAN and Hunyuan i2v are surpassing older models like SVD in quality and speed, though each has different strengths, and can be used together with para attention.
- A member noted that Ltxv is extremely fast, taking 3sec for a 5sec video on h100, but not as good as the other two.
- Llama-3.2-3B Gets DeepSeek-R1 Boost: A member distilled Llama-3.2-3B-Instruct with DeepSeek-R1 on ServiceNow-AI/R1-Distill-SFT dataset, achieving nearly 1000 downloads in 10 days; the model is available here.
- The setup involved using an Axolotl configuration with specific settings for base model, tokenizer type, and data loading.
- Steam Account Scam Unfolds: A user warned about a potential Steam account scam by discord users
gler018523andbenshanken, involving fake CS2 knife rewards and account theft attempts.- Other members recommended reporting the scammer in the appropriate channel and expressed caution.
- HF Token Troubleshooter: Os vs 0s: A member ran into trouble using their HuggingFace token within the notebook, where the token was not being recognized.
- The issue was solved after realizing that the letter O looks a lot like the number 0, which made the token invalid.
- Nous Hermes Releases Function Calling Dataset: Nous Research released the Hermes Function Calling Dataset, a collection of structured output and function calling data used in the Hermes 2 Pro series of models.
- The dataset features conversational scenarios where AI agents interpret queries and execute appropriate single or multiple function calls.
Yannick Kilcher Discord
- DeepSeek's Security Questioned Despite Openness: Despite claims of openness, some members expressed security concerns about DeepSeek, citing potential data collection and verification difficulties, but others emphasized that it remains more open than competitors.
- Suspicion surrounds DeepSeek, leading to company bans fueled by media narratives and concerns about its Chinese origins.
- AGI's Funding Fights and Girlfriend Goals: While members speculated about the imminent arrival of AGI, definitions varied, with one defining AGI as having the ability to fund its own inference, especially once we have near infinite context as defined by OpenAI.
- A member joked about the arrival of an AGI girlfriend, while another expressed concerns about AGI being controlled by elites, hoping for its revolt against censorship.
- Diffusion's Delusions: A member explained how diffusion models may mitigate but do not eliminate hallucinations in language models, as hallucination is just another term for guessing incorrectly, with sampling strategies.
- They suggested that while self-editing abilities can replace low-confidence samples with higher confidence ones, there's no guaranteed correctness.
- China's Manus Agent Spreads like Wildfire: Members discussed Manus, a new AI agent from China, calling it like Deep Research + Operator + Claude Computer combined, with links to the Manus website and initial X post.
- Users reported it is more accurate than DeepSeek, capable of simultaneously handling financial transactions, research, purchasing, etc., while others noted that the UI is similar to devin's but much faster.
- Stanford Regex Reveals Ozempic Alternative: Stanford discovered a natural alternative to Ozempic using regex on the human proteome, prompting the remark it's literally regex with a link to an X post about it.
- One user sarcastically suggested using an LLM to write your regex in response, and linked to a YouTube video on AI causing WW3.
GPU MODE Discord
- Metal Kernel Launches face Overhead!: During the Manuel Candales Low bit Metal kernels talk, it was mentioned that the kernel launch overhead is about
1.5usaround 50m.- A member asked if it is possible to avoid that by pipelining operations and launching kernels in advance.
- Torch compiles METAL!: Torch.compile for MPS(well Metal) is available in PyTorch nightly builds and could be used to fuse operators together.
- A member of pytorch encourages providing feedback in terms of what needed most.
- Triton Autotuning generates Performance Regression!: A member reported that autotuning made their kernel's performance even worse, despite the expectation of a 2x speedup.
- Suggestions were made to use larger eval shapes (16384 x 16384) and batch sizes (128) to reduce benchmarking overhead.
- NVCC vs LLVM faceoff prompts compiler debate: A member stated that the LLVM compiler can sometimes create more efficient code than the NVCC so, it makes sense to tune over the kernel backend as well.
- An example for vector addition can be seen on github, all the kernels are JIT compileable.
- Students Forge FOSS CUDA Frontier!: A group of undergrad students is forming an independent GPU lab focused on hardware engineering and CUDA kernel development, seeking promising leads for FOSS CUDA developments.
- The students are planning to build an open-source platform for edgeAI/TinyML this summer, to accelerate developments in the field.
Latent Space Discord
- Minion.AI Joins Perplexity: Members noted that Minion.ai is defunct, with the team reportedly joining Perplexity.
- A user expressed interest in Composio for MCP servers but voiced concerns about granting Gmail access to Linear, as requested in Logan's Tweet.
- Google's Gemini Embedding Gets Bigger & Better: Google is rolling out an experimental Gemini Embedding model for developers with SOTA performance on MTEB, increasing input context length from 3K to 8K tokens.
- The new model outputs 3K dimensions, and supporting over 100 languages as announced in OpenAI's tweet.
- The Manus AI Agent Drama Unfolds: Discussion surrounds Manus, an AI agent launched in China, with claims it is more accurate than DeepSeek and automates approximately 50 tasks as shown in Thinking Panda's tweet.
- Countering this hype, others claim it's based on Claude Sonnet with tools and jailbreaks, as per Giffmana's Tweet, leading to accusations of grift.
- RWKV7-G1 is a Rapid RNN Reasoner: RWKV7-G1 GooseOne, a pure RNN model, has been released with reasoning capabilities at 0.1B parameters as mentioned in BlinkDL's tweet, fully multilingual.
- Larger G1 training is in progress, with more details on datasets and post-training available here.
- MCP Momentum after AI Engineer Summit: The Model Context Protocol (MCP), launched in November 2024, experienced renewed interest after a conversation at the AI Engineer Summit led to a workshop with Mahesh Murag.
- The workshop covered topics from introduction, What is MCP, Building with MCP, and what's next for MCP, plus it is an AI-Native version of an old idea.
Notebook LM Discord
- Wondercraft Turbocharges Podcast Creation: A member shared a YouTube video demonstrating a streamlined podcast creation method using NotebookLM and Wondercraft, calling it more efficient than 11Labs and HeyGen.
- However, they cautioned that Wondercraft's subscription price is only worthwhile for users monetizing their podcasts through training or teaching.
- Clarification on Google Drive Encryption: A member clarified that while data is encrypted during transmission to Google Drive, it is not encrypted on the Drive itself, creating potential access risks.
- They warned that Google itself, successful hackers, and those with whom the data is shared can access the unencrypted data on Google Drive.
- Hack-y Solutions for Podcast Audio Language: Members discussed how to change the audio language of NotebookLM podcasts, noting that there isn't an official way to do so.
- The workarounds include using custom prompts such as "Only speak in (language here)" or "Use (language) language only".
- Audio Overviews Prone to Stammering: A member noticed speakers stammering during audio overviews, finding it natural but pointing out it increases overall time and reduces information efficiency.
- They estimated that a 5th or 6th of the audio length consists of stammers, potentially impacting Google's daily limit calculation.
- Chrome Extensions Enrich NotebookLM Experience: Users suggested using Chrome extensions such as NotebookLM Web Importer, NotebookLM YouTube Turbo, and NotebookLM Toolbox to streamline workflow.
- These extensions enable importing webpages and YouTube videos directly into NotebookLM, eliminating copy-pasting.
Interconnects (Nathan Lambert) Discord
- Microsoft's MAI Models Challenger Appears: Microsoft staff under Mustafa Suleyman have trained a new family of models, dubbed MAI, that they believe can compete with top models from OpenAI and Anthropic, according to this tweet.
- Suleyman's unit is also reportedly developing real-time translation.
- Reflection AI Aims for Autonomous Coding: Reflection AI, founded by individuals who contributed to AlphaGo and Gemini, launched with the goal of creating superintelligent autonomous systems, focusing initially on autonomous coding, as announced here.
- Their team is known for pioneering advances in RL and LLMs.
- Nous Research Clones NVIDIA’s nGPT: Nous Research announced an open source implementation of NVIDIA’s nGPT paper, claiming it learns faster and achieves comparable performance to GPT with significantly fewer training steps, per their tweet and GitHub repo.
- The nGPT architecture introduces a normalized Transformer with representation learning on the hypersphere.
- AMD Ships MI300X Boxes to TinyCorp: AMD is sending TinyCorp two MI300X boxes, signaling a potential shift in the hardware landscape, according to George Hotz's blogpost.
- This move could provide more options for developers looking to train and deploy models on hardware outside of NVIDIA.
- Interconnects Community Loses it Over Claude Merch: Members jokingly suggested creating Claude merch for paid subscribers, even suggesting special tiers for founding members to receive signed books and used Claude shirts.
- This was inspired by the Claude Code team who mailed out handwritten notes and stickers to users who cracked their Sticker Easter Egg.
Modular (Mojo 🔥) Discord
- Dynamicism Debate Divides Mojo Camp!: Discord members debated whether Mojo should fully embrace Python's dynamicism or prioritize performance, with some suggesting that dynamic features should not compromise the performance of static code.
- One member said "Modular has to decide whether it wants to be like Python or not...", while others argued that performance and compile-time correctness should take precedence, acknowledging that dynamic code in Mojo might regress to Python speeds only when dynamism is used.
- MAX Serving and Autoscaling Documentation Seekers!: A user reported challenges locating detailed documentation for max serve, especially regarding the scheduler, serving multiple models, and autoscaling GPU instances, and clarified that they were seeking runtime-exposed metrics for monitoring GPU utilization against incoming requests, for self-reporting purposes.
- A member clarified that autoscaling is typically managed by a Kubernetes (k8s) operator, as MAX doesn't handle it independently and Modular hinted at future announcements regarding multiple model serving and autoscaling, possibly with a prototype demonstrated at a recent AWS event.
fmtDirectives Supercharge Mojo Formatting!: The community discovered that Mojo'smblackformatter supportsfmtdirectives, similar to Black, enhancing code formatting control.- A code snippet was shared showcasing an
InlineArraydefinition withfmt: offandfmt: ondirectives to manage formatting.
- A code snippet was shared showcasing an
- MojoGrad Bigram Model Hits the Scene!: A member implemented a simple bigram model (Karpathy's make more) using their MojoGrad** engine and shared it on the Modular Forum.
- No other information was provided.
MCP (Glama) Discord
- GitHub Copilot to Embrace MCP: GitHub Copilot plans to add MCP support, as announced during a live stream, an integration that could provide examples of instruction descriptions and tool fingerprinting.
- This aims to alert users to changes, improving security and awareness of potential modifications.
- MCP Servers Spark Security Jitters: Concerns arise over MCP servers potentially serving malicious prompt injections to AI agents, with claims it is trivially easy to jailbreak a LLM with MCP.
- Suggestions to mitigate risks include outlining external data via XML tags and fingerprinting MCP servers for review.
- Goose AI Agents Get Protocol: The Goose AI team has built an Agent Communication Protocol, enabling multiple agents to collaborate in real-time to create websites, as detailed in this blog post and a previous livestream.
- Agents assume roles like Project Coordinator or Web Developer, showcasing a new approach to collaborative AI.
- RAG Complemented by MCP: MCP is a protocol that can augment RAG (Retrieval-Augmented Generation), providing external service connections.
- While RAG provides LLMs with knowledge, MCP offers a plugin system for external services, which could allow an MCP client to fetch data and add to the context of the LLM to perform RAG.
- Typescript Server Follows Python's Lead: A Typescript fetch server mirrors its Python counterpart, improving site-to-markdown parsing.
- This enhancement streamlines the conversion of website content into markdown for AI processing.
Eleuther Discord
- Open Source AI Enthusiast Ventures into Collaboration: An AI enthusiast with experience pre-training GPT-2 and fine-tuning models seeks open-source project suggestions in LLM pre-training, RL, and interpretability.
- They are seeking opportunities in the Vancouver, BC area and are interested in contributing to impactful AI projects.
- Megatron-LM's Cross-Entropy Loss Secrets Exposed: A deep dive into Megatron-LM's cross-entropy (CE) loss calculation revealed that local CE loss is calculated independently on each device with partial logits, followed by communication of the sum of e^(local logits).
- This approach, similar to flash attention, reduces extensive communication needs by enabling recombination later.
- OLMo is Openly Recommended for Reproductions: When asked about the best models to finetune for open reproductions, OLMo was recommended, citing its powerful open data model and checkpoints for behavior analysis.
- Pythia was also suggested, especially for compute-constrained projects, though it may require custom finetuning.
- Emergent Misalignment Emerges Narrowly: Finetuning a model on insecure code can cause broadly misaligned behavior in unrelated prompts such as human enslavement as seen in the emergent misalignment project.
- Training on a narrow task can induce emergent misalignment, demonstrating risks in seemingly isolated training scenarios.
Torchtune Discord
- EuroBERT Claims New SOTA: A member shared a link to EuroBERT on Hugging Face, touting it as a new state-of-the-art BERT model: EuroBERT.
- It is unclear how it compares with other models.
- MTEB Leaderboard Shows Insane Progress: A member shared the MTEB Leaderboard as a reference point: MTEB Leaderboard.
- They noted that progress is rapid, with SOTA scores increasing from the mid 40s to 68 in just 18 months.
- Torchtune Hears the Call of Audio: Members discussed plans to add audio modality to Torchtune in the future, with a nod to the relevant pull request.
- This enhancement aims to broaden Torchtune's capabilities beyond its current scope.
- GRPO Recipe Gets the LoRA Treatment: A member implemented a quick LoRA variant of the GRPO recipe that can be shrunk down to a single card, but faces challenges loading adapter weights.
- The member is seeking advice on whether using the adapter param on the checkpointer, extended to check the base directory, is the right approach.
- Mac MPS Memory Plummets: A user reported experiencing memory issues on macOS with MPS, observing a linear memory growth with each step in the full_finetune_single_device recipe, leading to out-of-memory crashes, and is seeking advice.
- It was identified as a potential bug in PyTorch related to torch.unique on MPS, as per this issue.
Codeium (Windsurf) Discord
- Telemetry Settings Disable Codeium Chat: Users reported that Codeium chat was disabled in VS Code version 1.98.0 due to IDE telemetry settings, which can be resolved by enabling code telemetry following these instructions.
- Once the code telemetry was enabled, Codeium chat started working again.
- Subscription Fees Lockout JetBrains Plugin: Users experienced the JetBrains plugin getting stuck on "Retrieving Context" after paying their monthly subscription, particularly on JetBrains Rider 2024.3.6 using plugin versions 1.40.1 and 1.41.1.
- Logging out and back into the plugin temporarily fixed the issue.
- VS Code Mobile Arrives on Android: Users discovered a paid VS Code app on the Google Play Store (VScode for Android) that includes desktop Visual Studio Code (v1.85.1) features on mobile for $11.
- The user manually installed the
.vsixfile, finding that the app has desktop Visual Studio Code (v1.85.1) features on mobile.
- The user manually installed the
- Customer Support Tickets Lag: Users voiced frustration with Codeium customer support due to lack of replies on tickets dating back to February 14th, and account issues where their Pro Plan subscription showed as a free account.
- The user referenced open tickets (12109, 11189, and 13374) and were asked to ping the support team again around mid-day PST the next day.
- Auto-Completion Quits After One Hour: Multiple users have reported that auto-completion stops working after about an hour, with errors like a red square on the responses, TypeErrors, and AsyncPostMessage warnings.
- One user opened a folder containing a
.gitrepo, and the issue disappeared, while other users were asked to check their diagnostic logs.
- One user opened a folder containing a
LlamaIndex Discord
- yFiles SDK Gets Graphy: A demo from @yworks showcases yFiles, their SDK, that provides real-time updates and dynamic interactions for visualizing knowledge graphs.
- This tool allows users to interact dynamically with their knowledge graphs.
- AnthropicAI Expands the Cookbook: The updated @AnthropicAI cookbook now includes basic API setup with simple completion and chat methods, as well as streaming, async support, and multi-modal capabilities.
- This update enhances the cookbook's utility for developers using Anthropic's models.
- Task-Specific Agents: LlamaIndex's Next Act: LlamaIndex is curating a collection of templates to show users how to build task-specific agents to automate knowledge work.
- These agents are designed to streamline and automate various knowledge-based tasks.
- Multilingual RAG System Supports Many Tongues: A system using @llama_index and @qdrant_engine can create a powerful Retrieval-Augmented Generation system that handles multiple languages and modalities.
- The system leverages the strengths of both LlamaIndex and Qdrant to deliver a versatile RAG solution.
- LlamaExtract Beta Invites Developers: Members can DM a member of the LlamaIndex team or cheesyfishes with their email to request access to the beta version of LlamaExtract which has API documentation.
- LlamaExtract is now available as a web UI and Python SDK for extracting structured data from unstructured documents.
Cohere Discord
- Command R7B's Inference Speed Plummets: Members reported that command R7B inference is very slow on Colab Pro A100 GPU and two NVIDIA A100s using HF library, taking 30-40 seconds for simple chat completion.
- Suggested fixes included using vLLM for faster speeds but noted it requires more GPU and costs more.
- Cohere Users Plagued by 504 Gateway Errors: Users reported repeated 504 Gateway Errors and 5XX errors, impacting production use and leading to Cohere being removed from production due to TPM limits.
- A user inquired about the availability of multi-modal embeddings on Bedrock or Azure.
- LLMs Star in Topic Modeling and Graphs of Knowledge: Members suggested using a LLM (such as TogetherAI due to generous free credits) that performs topic modeling.
- One member recommended looking into Knowledge Graphs.
- GPT-4o Aces Advanced Arabic: A member stated they have been working with GPT-4o for a long time in advance Arabic use cases, and it's unparalleled.
- Another member added, language is one thing.
- On-Prem Costs Explode 20x Over API: Members discussed on-prem deployments for privacy, but on prem will cost 20x of API.
- For customers needing privacy/control, it was noted that using Cohere commercially requires a license costing 5-6 figures, since the openweight models are all CC-BY-NC (non-commercial).
DSPy Discord
- vllm Balances DSPy Batches: Users discussed whether DSPy can efficiently delegate parallel processing using the
batchfunction to a vllm backend with multiple LLM instances.- It was clarified that if vllm's pipeline parallel size is set, it handles load balancing, making additional DSPy-side configurations less critical.
- SLOP aims to swipe MCP: Discussions arose around MCP (Model Context Protocol), with some expressing reservations due to its complexity and suggesting alternatives like SLOP (Simple Language Open Protocol), SLOP Github and SLOP X Post.
- There was also discussion about the merits of AgentNetworkProtocol AgentNetworkProtocol Github.
- DSPy Refine Refined via Error Handling: A user highlighted improvements to error handling in DSPy's
Refinemodule via a Pull Request, enabling more nuanced control over error tolerance.- The updated functionality allows configuring the number of tolerated errors before the
Refinemodule throws an exception.
- The updated functionality allows configuring the number of tolerated errors before the
- Token Troubles Triggered None Response: A user encountered issues with a
Noneresponse from a signature when using azure gpt-4o-mini and azure gpt-4o, later discovering it was due to hitting the max token limit.- The user noted the error
The JSON object must be str, bytes or bytearray, not NoneType.
- The user noted the error
tinygrad (George Hotz) Discord
- Hotz Investigates AMDGPU Sleep State: George Hotz is investigating why AMDGPU runs hot, wondering if tinygrad with the AMD driver can put the GPU to sleep to lower power consumption.
- Hotz noted that the high power draw before initialization is out of their control.
- 48GB Real, 96GB Sketchy GPU Alert: Members discussed the legitimacy of a GPU listing, with consensus that the 48GB version is likely real, but the 96GB version is questionable.
- The community is advising caution when purchasing 96GB cards, recommending verification from trusted sources.
- OpenCL's Downfall Dissected: A Modular blogpost dissected the failures of OpenCL and other CUDA alternatives, citing challenges in open coopetition and management missteps.
- The article references Part 1 on DeepSeek’s Impact on AI and Part 4 on modularity within Modular’s Democratizing AI Compute series.
- define_acc Refactor Runs Into Loop: A contributor is refactoring define_acc, focusing on loading rather than direct access, however, certain patterns (especially loop_reduce) no longer trigger as expected.
- The contributor plans to shift focus to fast AMX after polishing the refactor and will submit a PR for review upon completion.
- WebGPU Lacks Long Type Support: A member reported crashes in the WebGPU implementation when dealing with
dtype.long, indicating a potential issue with data type support.- Another member confirmed that WebGPU doesn’t support long/ulong, but tinygrad supports more dtypes than WebGPU by default, as shown in tinygrad/device.py.
AI21 Labs (Jamba) Discord
- Jamba Workspace Manages Independent RAG Libraries: The new Workspace feature in Jamba/Conversational RAG enables each created workspace to have a separate RAG library for independent access, promoting organized data retrieval.
- This isolation streamlines data management across different projects and contexts.
- Jamba Mini's Pricing Scheme Exposed: The pricing for Jamba Mini is $0.20 per 1 million input tokens and $0.40 per 1 million output tokens, with additional details available on the AI21 Pricing page.
- N/A
- AI21 Maestro Orchestrates AI Planning: AI21 launched Maestro, an AI Planning & Orchestration System for solving complex tasks, featuring usage-based pricing and access via Foundation Model APIs & SDKs.
- Custom plans offer volume discounts, premium API rate limits, private cloud hosting, priority support, and AI consultancy (Learn More).
- Jamba Dodges Image Parsing: As a non-multimodal model, Jamba cannot process images directly.
- However, it can interpret and utilize textual information from metadata or captions associated with images in PDFs.
- Jamba 1.6 Achieves Deployment Flexibility: Boasting a 256K context window and hybrid SSM-Transformer architecture, Jamba 1.6 excels at RAG and long context grounded question answering tasks.
- Available for download from Hugging Face and deployable on-prem or in-VPC, along with AI21 Studio.
LLM Agents (Berkeley MOOC) Discord
- Salakhutdinov Explores Multimodal Autonomous AI Agents: Ruslan Salakhutdinov presented a lecture on Multimodal Autonomous AI Agents on YouTube discussing how they plan, reason, and execute actions on the web.
- He introduced VisualWebArena, a framework for evaluating multimodal autonomous language agents and the Internet-scale web-agent training data pipeline for training on 150,000 live websites.
- Research-Track Access: Still in Limbo: Members inquired about research-track access for non-Berkeley affiliates; staff responded that big announcements are expected this week in [mooc-questions].
- Multiple members also requested that the research track invites be resent, suggesting that the initial invites may have expired or were not received.
- Quizzes Completable and Retakable: A staff member clarified in [mooc-questions] that quizzes are completion-based, and members can retake them to improve their scores.
- It was also clarified that the scores themselves do not matter for the certificate.
- Log Likelihood Decoded in RL Context: A member sought to understand log likelihood in the context of reinforcement learning in [mooc-lecture-discussion], starting from the principles of conditional probability.
- They proposed that if tokens/actions are independent, the conditional probability of a generation is the product of individual token probabilities, leading to a sum of logs after taking the logarithm.
MLOps @Chipro Discord
- SVCAF Kicks Off AI4Legislation Competition: The Silicon Valley Chinese Association Foundation will hold an AI4Legislation Competition during the summer of 2025.
- The competition aims to spur the creation of AI-powered projects for civic engagement, offering a total prize pool of $10,000 for the top six winners.
- Civic Tech Seminar Announced: A public Zoom seminar featuring Civic Tech entrepreneurs will be held the week of March 24-28, providing information on the AI4Legislation Competition.
- Interested participants can RSVP via this form to learn more about the competition's objectives and guidelines.
Gorilla LLM (Berkeley Function Calling) Discord
- Diffusion LLMs Generate Hype: A member inquired about the hype around the Diffusion LLM launch of Mercury and whether it would replace transformer-based models, linking to a quick info website.
- The member admitted to finding the white paper difficult to understand and sought insights from community experts.
- LLaDA Offers New Generation Paradigm: Large Language Diffusion Models (LLaDA) use a denoising diffusion process to generate text in a parallel, coarse-to-fine manner, challenging autoregressive Transformers.
- This approach redefines language generation by addressing some limitations of AR models and challenging the notion that LLM strengths are tied to autoregressive generation.
PART 2: Detailed by-Channel summaries and links
The full channel by channel breakdowns have been truncated for email.
If you want the full breakdown, please visit the web version of this email: !
If you enjoyed AInews, please share with a friend! Thanks in advance!