AI News (MOVED TO news.smol.ai!)

April 15, 2025

[AINews] GPT 4.1: The New OpenAI Workhorse

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜


GPT 4.1 is all you need from OpenAI?

AI News for 4/11/2025-4/14/2025. We checked 7 subreddits, 433 Twitters and 29 Discords (211 channels, and 16961 messages) for you. Estimated reading time saved (at 200wpm): 1382 minutes. You can now tag @smol_ai for AINews discussions!

GPT 4.1 links:

  • https://openai.com/index/gpt-4-1/
  • New benchmarks: MRCR and GraphWalks
  • New prompting guide and cookbook

and a new interview published on Latent Space.


The Table of Contents and Channel Summaries have been moved to the web version of this email!


AI Twitter Recap

GPT-4.1 Release and Performance

  • Availability and Features: @sama announced that GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano are now available in the API, emphasizing their strengths in coding, instruction following, and handling long contexts (up to 1 million tokens). @kevinweil notes that GPT-4.1 achieves a 54 score on SWE-bench verified.
  • Instruction Following: @OpenAIDevs points out that GPT-4.1 follows instructions more reliably than GPT-4o, particularly in format adherence, complying with negative instructions, and ordering.
  • Pricing and Cost: @stevenheidel states GPT-4.1-nano is the cheapest and fastest model released, costing $0.10/1M input ($0.03 cached) and $0.40/1M output (a minimal call-and-cost sketch follows this list).
  • Coding Performance: @omarsar0 highlights that, according to Windsurf AI, GPT-4.1 shows a 60% improvement over GPT-4o on internal benchmarks like SWE-bench, reduces the need to read unnecessary files by 40%, and modifies unnecessary files 70% less. @OpenAIDevs states it is significantly more skilled at frontend coding and has reliable tool use. @polynoamial mentions GPT-4.1 achieves 55% on SWE-Bench Verified without being a reasoning model.
  • Integration and Support: @llama_index mentions Llama Index now has day 0 support for GPT-4.1.
  • Initial Impressions: @aidan_mclau notes that startup engineers were amazed by GPT-4.1 mini/nano, finding it comparable to GPT-4o but much cheaper. @aidan_mclau describes it as a Pareto optimal, Swiss Army knife API model, and an upgrade over the new Sonnet for agent stacks.
  • Limited Availability on ChatGPT: @DanHendrycks suggests that the free GPT-4.1 mini might be intentionally limited on ChatGPT to incentivize college students to subscribe to ChatGPT Plus.
  • Naming Conventions: @polynoamial joked about naming models. @iScienceLuvr notes that, read as a version number, GPT-4.1 is effectively GPT-4.10 and so comes after GPT-4.5, while @kevinweil joked that they would not get better at naming this week.
  • Deprecation of GPT-4.5: @OpenAIDevs announced that GPT-4.5 Preview in the API will be deprecated starting today and fully turned off on July 14, as GPT-4.1 offers improved or similar performance.
  • Negative Reviews: @scaling01 advises against using GPT-4.1-nano, describing it as a terrible model. @scaling01 reports the GPT-4.1 API version is worse than Optimus Alpha.
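
For readers who want to poke at the new models directly, here is a minimal sketch of calling the API and estimating cost at the nano rates quoted above, assuming the standard openai Python client; the prompt and token counts are illustrative.

```python
from openai import OpenAI

# Quoted GPT-4.1-nano rates (per 1M tokens): $0.10 input ($0.03 cached), $0.40 output.
INPUT_PER_M, CACHED_INPUT_PER_M, OUTPUT_PER_M = 0.10, 0.03, 0.40

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4.1-nano",
    messages=[{"role": "user", "content": "Summarize the GPT-4.1 launch in one sentence."}],
)
print(resp.choices[0].message.content)

# Rough cost estimate from the usage block returned by the API (ignores cached input).
usage = resp.usage
cost = (usage.prompt_tokens / 1e6) * INPUT_PER_M + (usage.completion_tokens / 1e6) * OUTPUT_PER_M
print(f"~${cost:.6f} for {usage.prompt_tokens} input + {usage.completion_tokens} output tokens")
```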

Model Benchmarks and Comparisons

  • Search Arena Leaderboard: @AravSrinivas reports that Perplexity's Sonar API is tied with Gemini-2.5 Pro for the #1 spot in the LM Search Arena leaderboard. @lmarena_ai reports that Gemini-2.5-Pro-Grounding and Perplexity-Sonar-Reasoning-Pro top the leaderboard.
  • Llama 4 ELO Drop: @casper_hansen_ reports that Llama 4 quietly dropped from 1417 to 1273 ELO, on par with DeepSeek v2.5.
  • Google Gemini 2.5 Pro: @abacaj said that Google has finally made the best model with Gemini 2.5 pro. @omarsar0 is surprised at how good Gemini 2.5 Pro is at debugging and refactoring, and that it's one of the best models at understanding larger codebases.
  • Gemini 2.0 Flash: @_philschmid reports Gemini 2.0 Flash is $0.1/$0.4 (input/output per 1M tokens) with strong scores on GPQA Diamond, Multilingual MMLU, and MMMU.
  • Mistral Models: @casper_hansen_ stated that Long Mistral models are great and their latest 24B model is very competitive.
  • Nvidia Llama Nemotron-Ultra: @adcock_brett notes Nvidia released Llama Nemotron-Ultra, a 253B parameter reasoning AI that beats DeepSeek R1, Llama 4 Behemoth and Maverick, and is fully open-source.
  • Meta Llama 4: @adcock_brett details that Meta released the Llama 4 family of natively multimodal, open-source models with context windows up to 10M tokens, including the 109B param Scout, 400B param Maverick, and a third, 2T param Behemoth. @DeepLearningAI notes Llama 4 Scout features an unprecedented 10 million-token context window, Maverick beats GPT-4o’s reported benchmarks, and Behemoth claims to outperform GPT-4.5 and Claude 3.7 Sonnet.
  • Kimina-Prover vs. other models: @_lewtun notes that Kimina-Prover, a new theorem-proving model for the Lean language, beats Gemini 2.5 Pro and o3-mini on Olympiad-level math with just 7B parameters!
  • GPT-4.1 vs DeepSeek-V3: @scaling01 states that GPT-4.1 underperforms DeepSeek-V3-0324 by over 10% on AIME and is 8x more expensive and also underperforms on GPQA.
  • GPT-4.1 vs. GPT-4.5: @scaling01 states that GPT-4.1 outperforms GPT-4.5 in AIME and MMLU.

Robotics and Embodied AI

  • Hugging Face Acquisition: @ben_burtenshaw reports that Hugging Face acquired Pollen Robotics, an open source robot manufacturer.
  • Fourier's Open-Source Humanoid: @adcock_brett notes Fourier’s fully open-source humanoid robot.
  • Samsung & Google Partnership: @adcock_brett notes Samsung announced a partnership with Google to power its Ballie home robot with Google's Gemini and its own multimodal AI models.

AI Research and Papers

  • Reflection in Pre-Training: @omarsar0 summarizes a paper arguing that reflection emerges during pre-training and introduces adversarial reasoning tasks to show that self-reflection and correction capabilities improve with compute, even without supervised post-training.
  • Reinforcement Learning and Reasoning: @rasbt summarizes a paper showing that reinforcement learning (RL) can lead to longer responses in reasoning models, not because they are needed for accuracy, but because RL training favors longer responses.
  • Multimodal Models Scaling Laws: @TheAITimeline summarizes a scaling laws analysis involving 457 native multimodal models (NMMs), revealing that early-fusion architectures outperform late-fusion ones and that Mixture of Experts (MoEs) significantly boosts performance.
  • Paper List: @TheAITimeline posted a list of top AI/ML research papers, and @dair_ai similarly shared their top AI papers.
  • Visual Tokenizers: @iScienceLuvr notes that GigaTok improves image reconstruction, generation, and representation learning when scaling visual tokenizers.

Other Model and AI Tool Releases

  • Deep Cogito Models: @adcock_brett notes that Deep Cogito emerged from stealth with Cogito v1 Preview, a new family of open-source models.
  • Runway Gen 4 Turbo: @adcock_brett shares that Runway released Gen 4 Turbo, a faster version of its video model, available to all users, including those on the free tier.
  • Midjourney V7: @adcock_brett reports that Midjourney released V7, with improved quality, enhanced prompt adherence, and a voice-capable Draft Mode.
  • Microsoft Copilot Updates: @adcock_brett mentions that Microsoft upgraded its Copilot app with new memory capabilities, web browsing actions, and vision features.
  • Amazon AI: @adcock_brett says that Amazon released a speech-to-speech AI called "Nova Sonic" and launched Reel 1.1 AI for extended 2-min video generations.
  • Nvidia Cartoon AI: @adcock_brett shares that Nvidia and Stanford researchers unveiled an AI technique to generate consistent, minute-long cartoons.
  • DolphinGemma: @GoogleDeepMind introduced DolphinGemma, an audio-to-audio model helping us dive deeper into the world of dolphin communication 🐬.

AI Infrastructure and Tooling

  • OpenAI Infrastructure Scale: @sama mentioned that the scale of computing systems at OpenAI is insane and they need help.
  • ElevenLabs MCP Integration: @adcock_brett reports ElevenLabs launched its MCP server integration, enabling platforms like Claude and Cursor to access AI voice capabilities.
  • Qdrant + n8n: @qdrant_engine notes that Qdrant and n8n are automating processes beyond similarity search.
  • LangChain Tools: @LangChainAI promotes an open-source library connecting any LLM to MCP tools for custom agents, featuring integration with LangChain and support for web browsing, Airbnb search, and 3D modeling.
  • Hamel Husain Chrome Extension: @HamelHusain created a Chrome extension that allows you to save an entire Gemini chat (via aistudio) into a gist or copy it as markdown, and also has one for Claude.

AI Strategy and Discussion

  • Open Source Robotics: @ClementDelangue advocates for making AI robotics open-source.
  • Prioritizing Medical Diagnostics: @iScienceLuvr notes that better diagnostics + care delivery are more impactful than finding a new chemotherapy drug for curing cancer.
  • LLMs and Search Engines: @rasbt doesn’t think LLMs will replace search engines.
  • Conciseness via RL: @TheAITimeline summarizes research uncovering a correlation between conciseness and reasoning accuracy and a method for achieving more concise reasoning in LLMs via a secondary RL phase.
  • Developer Experience: @sedielem highlights importance of developer experience.
  • Value of Expertise in RAG: @HamelHusain emphasizes the value of talking to people who have spent lots of time optimizing retrieval & search to get better at RAG.
  • Future of AI: @scaling01 shares that the base case for LLMs is that over the next few years they’ll evolve into hyper-specialized autistic superintelligences that excel in domains where verification is straightforward.

Humor and Miscellaneous

  • Flat Organizations: @typedfemale made a joke about flat organizations.
  • Hot Sauce: @vikhyatk joked not to try "murder hornet" hot sauce 5 mins before bedtime.
  • Overhyped Valuations: @andrew_n_carr talks about SSI valuation.
  • Personal Anecdotes: @DavidSHolz accidentally asked a friend how they were enjoying "jew york" due to autocorrect. @sjwhitmore stated that they’ll put their baby to sleep and 30 min later catch themselves looking at photos of him. @willdepue mentioned openai hunting cap is a must for the next podcast and @sama bought a lot of silly baby things that they haven't needed, but recommends a cradlewise crib and a lot more burp rags than you think you could possibly need.

AI Reddit Recap

/r/LocalLlama Recap

Theme 1. "Exciting Advancements in GLM-4 Reinforcement Learning Models"

  • glm-4 0414 is out. 9b, 32b, with and without reasoning and rumination (Score: 190, Comments: 64): GLM-4 0414 has been released, introducing six new models of sizes 9B and 32B, with and without reasoning and rumination capabilities. The models include GLM-Z1-32B-0414, a reasoning model with deep thinking capabilities developed based on GLM-4-32B-0414 through cold start, extended reinforcement learning, and further training on tasks like mathematics, code, and logic. GLM-Z1-Rumination-32B-0414 is a deep reasoning model with rumination capabilities, capable of deeper and longer thinking to solve more open-ended and complex problems. GLM-Z1-9B-0414 is a 9B parameter model employing all the aforementioned techniques, exhibiting excellent capabilities in mathematical reasoning and general tasks, achieving top-ranked performance among open-source models of the same size. GLM-Z1-9B-0414 is considered a surprise, achieving an excellent balance between efficiency and effectiveness, making it a powerful option for users seeking lightweight deployment. The models demonstrate significant improvements in mathematical abilities, research-style writing, and the capability to solve complex tasks.

    • A commenter notes that the new 32B models have only 2 key-value (KV) heads, so the KV cache takes up about a quarter of the space it does on Qwen 2.5 32B, and wonders if this might cause issues with handling long context (a back-of-the-envelope cache-size sketch follows this theme).
    • Another commenter is impressed with the benchmarks, mentioning that GLM models have been around since LLama 1 days and have always been very good, but feels they need better marketing in the West as they seem to go under the radar.
    • A commenter appreciates that the models included the SuperGPQA benchmark results, making the models more comparable with many others.
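
For context on the KV-cache comment above, here is a back-of-the-envelope sketch of the standard KV-cache size formula; the layer, head, and dimension numbers below are placeholders, not GLM-4's or Qwen 2.5's actual configurations.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Size of the KV cache: keys + values for every layer, KV head, and position."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Placeholder configs (NOT the real GLM-4 / Qwen 2.5 numbers): with the same layer count
# and head_dim, a model with 2 KV heads needs 1/4 the cache of one with 8 KV heads.
few_heads = kv_cache_bytes(n_layers=64, n_kv_heads=2, head_dim=128, seq_len=32_768)
many_heads = kv_cache_bytes(n_layers=64, n_kv_heads=8, head_dim=128, seq_len=32_768)
print(f"2 KV heads: {few_heads / 2**30:.1f} GiB, 8 KV heads: {many_heads / 2**30:.1f} GiB")
```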

Theme 2. "DeepSeek's Open-Source Contributions to AI Inference"

  • DeepSeek is about to open-source their inference engine (Score: 1312, Comments: 92): DeepSeek is about to open-source their inference engine, which is a modified version based on vLLM. They are preparing to contribute these modifications back to the community. An article titled 'The Path to Open-Sourcing the DeepSeek Inference Engine' outlines their motivations and steps, including challenges like codebase divergence, infrastructure dependencies, and limited maintenance bandwidth. They express gratitude towards the open-source ecosystem and plan to collaborate with existing projects to modularize features and share optimizations, aiming to enhance artificial general intelligence (AGI) for the benefit of humanity. More details can be found in their GitHub repository. The original poster expresses enthusiasm about DeepSeek's commitment to the community, particularly appreciating their stated goal of 'enabling the community to achieve state-of-the-art (SOTA) support from Day-0.' There is excitement about the potential positive impact of DeepSeek's contributions on the open-source AI community.

    • One user points out that DeepSeek may not be directly open-sourcing their inference engine but will contribute their improvements to vLLM and sglang, as their fork is too outdated.
    • Another commenter expresses deep appreciation for DeepSeek, comparing their love for the company to their love for Wikipedia.
    • A user feels that the release of DeepSeek's R1 was a pivotal moment in the AI race, noting that while it wasn't the smartest or cheapest model, it signaled alternatives to OpenAI like Claude, Gemini, and DeepSeek, and appreciates their ongoing innovation in the open-source field.
  • DeepSeek will open-source parts of its inference engine — sharing standalone features and optimizations instead of the full stack (Score: 252, Comments: 9): DeepSeek will open-source parts of its inference engine by sharing standalone features and optimizations instead of releasing the full stack. They are working on porting their optimizations to popular open-source inference engines like vLLM, llama.cpp, and kobold. Some believe the title is misleading, implying DeepSeek is withholding parts of their stack. However, others feel that by porting their optimizations to popular open-source inference engines, DeepSeek is contributing more effectively to the community. Users are optimistic about improved inference performance from these contributions.

    • Commenters note that DeepSeek is enhancing popular open-source inference engines like vLLM, llama.cpp, and kobold by porting their optimizations.
    • Some users are excited about the potential for better inference performance as a result of DeepSeek's contributions.
    • Users are asking if there is anything available now from DeepSeek for personal projects.

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

Theme 1. "Revolutionizing Science: OpenAI's New Reasoning Models"

  • Scientific breakthroughs are on the way (Score: 724, Comments: 207): OpenAI is about to release new reasoning models called o3 and o4-mini that are able to independently develop new scientific ideas for the first time [1]. These AI models can process knowledge from different specialist areas simultaneously and propose innovative experiments—an ability previously considered a human domain. Early versions have shown promising results: Scientists at Argonne National Laboratory were able to design complex experiments in hours instead of days using early versions of these models. OpenAI plans to charge up to $20,000 a month for these advanced services, which would be 1000 times the price of a standard ChatGPT subscription. The technology could dramatically accelerate the scientific discovery process, especially when combined with AI agents capable of controlling simulators or robots to directly test and verify generated hypotheses. This represents a potential revolution in the field, shifting abilities previously thought to be exclusive to humans to AI.

    • Some users are skeptical about OpenAI charging $20,000 a month for these AI models, questioning why the company doesn't use them to solve major problems themselves.
    • Others believe the information is credible due to the source's accuracy regarding OpenAI news, suggesting possible intentional leaks from the company.
    • There's confusion and speculation about the high subscription fee, with users recalling previous instances where rumored prices were higher than the actual release prices.

Theme 2. "Exciting AI Model Innovations and Competitive Updates"

  • GPT 4.1 with 1 million token context. 2$/million input and 8$/million token output. Smarter than 4o. (Score: 313, Comments: 140): GPT-4.1 is announced as the flagship model for complex tasks, featuring a 1 million token context window and a maximum output capacity of 32,768 tokens. Pricing is set at $2 per million tokens for input and $8 per million tokens for output, with additional information about cached input costs. The model claims enhanced intelligence compared to previous versions. The original poster emphasizes that GPT-4.1 is smarter than 4o, highlighting its advanced capabilities and suggesting it as a significant improvement over previous models.

    • Users compare GPT-4.1 to Google's Gemini models, discussing pricing and performance differences, and some express a wish for lower costs.
    • There is skepticism about how effectively GPT-4.1 utilizes its 1 million token context window, with mentions that models like Gemini 2.5 can handle about 100k tokens flawlessly.
    • Some speculate that GPT-4.1 may lead to the discontinuation of GPT-4.5, and express hope that upcoming models like o4-mini will be state-of-the-art.
  • OpenAI announces GPT 4.1 models and pricing (Score: 245, Comments: 119): OpenAI has announced the release of GPT 4.1 models along with their pricing details. The announcement has generated mixed reactions, with some users expressing frustration over the proliferation of models and others discussing the availability and improvements of GPT‑4.1.

    • One user expresses frustration over the multitude of models, stating they're so sick of this mess of random models.
    • Another points out that GPT‑4.1 will only be available via the API, noting that improvements have been gradually incorporated into the latest version of GPT‑4o in ChatGPT.
    • Some users joke about the knowledge cutoff being June 2024, humorously wishing they were as gullible as GPT 4.1 😂.
  • Kling 2.0 will be unveiled tomorrow. (Score: 281, Comments: 29): Kling 2.0 will be unveiled tomorrow, April 15, 2025, at 6:00 AM GMT. The announcement includes an image with a dynamic green background and the slogan 'From Vision to Screen', emphasizing innovation and technology. More details can be found at https://x.com/Kling_ai/status/1911702934183882986 and https://xcancel.com/Kling_ai/status/1911702934183882986. The promotional image conveys excitement and anticipation for Kling 2.0, capturing attention with its dynamic design. The slogan suggests a significant advancement from previous versions, building enthusiasm among potential users.

    • Users are amazed at the rapid release of Kling 2.0, with one noting that 'version 1.6 is still number 1'.
    • Discussion highlights how this last week has been 'WILD', with numerous AI advancements like Midjourney v.7, OpenAI GPT-4.1, and Google Agentspace Boxing.
    • There is anticipation for new features in Kling 2.0, such as longer video generation, as users are 'stuck at 5-10 sec' currently.


AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking

Theme 1. GPT-4.1 Models: Release, Performance, and Availability

  • OpenAI Unleashes GPT-4.1, Benchmarks Beat 4o: OpenAI's blog post announced GPT-4.1, touted for long-context reasoning, with benchmarks showing ~10% improvement over GPT-4o. Windsurf AI immediately integrated it, offering free unlimited access for a week, while OpenRouter launched GPT-4.1, Mini, and Nano versions, revealing Optimus Alpha and Quasar Alpha as early test versions of GPT-4.1.
  • Windsurf Waves Free GPT-4.1 for Users: Windsurf AI made GPT-4.1 its new default model, offering free unlimited usage for one week on all plans, then at a discounted rate of 0.25 credits per use. Cursor Community members anticipate GPT-4.1 becoming the new standard, with 4.5 being deprecated as users migrate to 4.1.
  • Aider v0.82.0 Embraces GPT-4.1 Patch Format: Aider v0.82.0 now supports GPT-4.1, including OpenAI's new patch edit format, and members reported performance similar to Quasar/Optimus but at $4.76 per run. LlamaIndex also announced day 0 support for GPT-4.1 API via llama-index-llms-openai, noting a ~2% improvement on agentic approaches.

Theme 2. Gemini 2.5 Pro: Performance Swings and Pricing Shifts

  • Google Nerfs Gemini 2.5 Pro Tool Calling: LMArena Discord members reported Google nerfed Gemini 2.5 Pro's tool calling function, possibly due to cost, rendering it unable to execute tool calls. OpenRouter also began charging normal prices for long Gemini prompts, ending a 50% discount for prompts over 200k tokens for Gemini 2.5 and 128k for Gemini 1.5.
  • Gemini 2.5 Pro Still UI Design Champ: Despite tool calling issues, Cursor Community members praised Gemini 2.5 Pro for its "insane" UI design capabilities, highlighting unique output and context retention. However, Aider users found Gemini 2.5 Pro struggling with longer contexts and code completion compared to Claude 3.7.
  • Gemini 2.5 Pro Eats Data, Steals Perplexity Subs: Manus.im Discord users lauded Gemini 2.5 Pro's data processing prowess, with one user canceling their Perplexity subscription due to Gemini 2.5 Pro's superiority and lower credit consumption per task. Perplexity AI's Sonar models, however, tied with Gemini-2.5-Pro-Grounding in LM Arena's Search Arena, citing 2-3x more search sources for Sonar's outperformance.

Theme 3. Open Source Models and Tools Gain Momentum

  • OpenRouter Opens Floodgates to Free Models: OpenRouter added six new free models, including NVIDIA's Llama-3 variants (Nano-8B, Super-49B, Ultra-253B) optimized for reasoning and RAG, and the roleplay-tuned QwQ-32B-ArliAI-RpR-v1 (a sketch of calling one of these via OpenRouter's OpenAI-compatible API follows this list). Hugging Face also welcomed Meta's Llama 4 Maverick and Scout for testing.
  • DeepSeek Opens Inference Engine, DeepCoder Delivers Coding Power: DeepSeek open-sourced its Inference Engine, sparking discussions on inference performance for smaller providers. Nous Research AI highlighted DeepCoder, a 14B parameter open model achieving top coding performance with enhanced GRPO and 64K context generalization.
  • Aider and Ollama Embrace Open Source Ecosystem: Aider v0.82.0 added support for Fireworks AI's deepseek-v3-0324 model and improved architect mode with Gemini 2.5 Pro. Hugging Face users are increasingly using Ollama to run models locally as a substitute for API-limited models, and LlamaIndex suggests using larger open-source models like Llama3 or Mistral with Ollama for agent workflows.
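
Since OpenRouter exposes an OpenAI-compatible endpoint, trying one of the free models above could look roughly like the sketch below; the model slug is illustrative, so check OpenRouter's model list for the exact ID.

```python
from openai import OpenAI

# OpenRouter speaks the OpenAI chat-completions protocol at its own base URL.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",  # placeholder
)

resp = client.chat.completions.create(
    # Illustrative slug for one of the free NVIDIA Llama-3 variants; verify on openrouter.ai.
    model="nvidia/llama-3.1-nemotron-nano-8b-v1:free",
    messages=[{"role": "user", "content": "Give a one-line summary of what RAG is."}],
)
print(resp.choices[0].message.content)
```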

Theme 4. Hardware Optimization and CUDA Deep Dives

  • GPU Mode Explores Hilbert Curves for GEMM Performance: GPU Mode Discord members discussed Hilbert curves for GEMM implementation, with benchmarks showing effectiveness against cuBLAS as matrix size increases, though Morton ordering is considered a more practical trade-off. NVIDIA also released its Video Codec SDK, prompting caution against AI-generated PR submissions.
  • CUDA Synchronization and memcpy_async Caveats: GPU Mode members exchanged CUDA synchronization guidance, suggesting custom ops and load inline, and investigated performance slowdowns with cuda::memcpy_async, noting it's a cooperative API requiring all threads to pass the same pointers, and alignment issues could hinder coalesced memory access.
  • Threadripper vs Xeon and DDR5 RAM Bandwidth Bottleneck: LM Studio's hardware discussion debated Threadripper vs Xeon CPUs for cost-effective token generation, and considered DDR5 RAM bandwidth a bottleneck, theorizing that it limits overall hardware utilization and that first-word latency caps max tokens/s.

Theme 5. Agent Development and Tooling Ecosystem Evolves

  • MCP Server Workshop and Growing Adoption: MLOps@Chipro announced an AWS workshop for building production-grade MCP servers on April 17th, highlighting MCP as an emerging standard to improve ML context management. Wildcard paused maintenance of agents.json due to MCP adoption, and AutoMCP launched as a platform to deploy agent projects as MCP servers with a Vercel/Heroku-like experience.
  • LlamaIndex LlamaParse Excels in Document Parsing: LlamaIndex highlighted LlamaParse's enhanced document parsing quality for documents with images, tables, and charts, surpassing basic readers like SimpleDirectoryReader in parsing quality, and offered a guide on Visual Citations with LlamaParse Layout Agent Mode.
  • Brave Search API Gains Traction for Agent Pipelines: Yannick Kilcher Discord members suggested Brave Search API as a good alternative for agent pipelines, even on the free tier, noting its AI summarizer is cheaper than OpenAI's web search API. Hugging Face sought early testers for a new Deep Search Agent using smolagents, and Nomic.ai members explored Nomic embeddings for automatic website linking to create interconnected document networks.

PART 1: High level Discord summaries

Perplexity AI Discord

  • Perplexity Launches Six New Features!: Perplexity AI announced six new features, including Android Draw to Search, Champions League integration, Voice Search, Box and Dropbox Connectors, Perplexity Finance Time Comparison, and a Perplexity Telegram Bot, as documented in their changelog.
    • The update aims to enhance search and automation capabilities for users across various platforms.
  • Sonar Models Beat Gemini in Search Arena: Perplexity AI's Sonar-Reasoning-Pro-High model tied for first place with Gemini-2.5-Pro-Grounding in LM Arena's Search Arena, scoring 1136 and 1142 respectively.
    • According to Perplexity's blog, Sonar models outperformed Gemini models due to substantially higher search depth, citing 2-3x more sources.
  • Perplexity Eyes Livestream Recordings, API Toggles, and ComfyUI Integration: The team confirmed that recordings from the Perplexity livestream will be made available online after a user inquired about it, as seen on X.com.
    • Additionally, a member hinted at a Perplexity ComfyUI integration and questioned if API toggles, similar to the "Social" toggle, are on their way.
  • Users Triggered By Fake Play Button: Members in the general channel admitted to being tricked by a fake play button.
    • One member stated "that fake play button got me" and another replied "lowkey tapped instantly".


LMArena Discord

  • Google Nerfs Gemini 2.5 Pro Tool Calling: Members reported that Google nerfed 2.5 Pro's tool calling function and that 2.5 Pro now can't execute tool calls reliably, describing the results as buggy messes.
    • Members suggest the nerfing may be related to cost.
  • GPT 4.1 Surfs on Windsurf AI: GPT 4.1 is free in Windsurf for the next 7 days, prompting users to try it out.
    • Some users expressed surprise that OpenAI partnered with Windsurf rather than Cursor for the release.
  • RooCode Emerges as Top-Tier Coding IDE: After some nudging, some members tried RooCode, calling it absolutely superior to Cline, and most likely the best coding IDE right now.
    • Downsides include that GitHub Copilot integration into RooCode is rate limited and buggy.
  • GPT-4.1 Trumps GPT-4o Mini: Members believe that Quasar/Optimus are test versions of the recently released GPT-4.1 and GPT-4.1 Mini models and that these models are not groundbreaking or as impressive as initially hoped.
    • The GPT-4.5 model has been deprecated, and the improvements have been rolled into the 4.1 model.
  • GPT 4.1 Dissolves into GPT-4o: Members are reporting that GPT 4.1 is available only via the API, and that improvements in instruction following, coding, and intelligence are gradually being incorporated into the latest version of GPT-4o.
    • Some members confirmed that the GPT 4.1 improvements have been rolled into the GPT 4o model and can be accessed on the OpenAI website.


aider (Paul Gauthier) Discord

  • Aider's latest update with GPT-4.1 support: Aider v0.82.0 gets support for GPT 4.1, architect mode with Gemini 2.5 Pro, and the Fireworks AI model deepseek-v3-0324, as well as patch, editor-diff, editor-whole, and editor-diff-fenced edit formats.
    • The release includes support for xai/grok-3-beta, openrouter/openrouter/optimus-alpha, and aliases like grok3 and optimus to replace OpenRouter's now-retired free alpha endpoints for Optimus and Quasar.
  • Discord users debate off-topic channels for Aider: Members are split on the necessity of an off-topic channel in the Aider Discord server, discussing the balance between 'having fun' and keeping the main channel focused, and requesting a change of heart from Paul G.
    • Members can't agree whether to focus on Aider or have a place to discuss fart jokes.
  • Claude 3.7 wins over Gemini 2.5: Members report that Gemini 2.5 Pro struggles with longer contexts and code block completion, but can be improved with a 'swear oath', whereas Claude 3.7 performs better for natural writing and specific tasks.
    • Community members praise Claude 3.7 for its natural language capabilities, and others found the models great in getting rid of overcommenting behaviors.
  • Users seek replication of Cline's memory bank workflow in Aider: A member inquired about replicating something like Cline's memory bank workflow in Aider, by adding plan.md to the chat and then alternating between saying do the next step and mark that step done.
    • This aims to help create a task list so that Aider can go through each task one at a time together.
  • Members share Prompt Engineering Resources: A member posted a link to Kaggle's whitepaper on prompt engineering, while other members shared a prompting guide for GPT-4.1.
    • The prompting guide is designed to help users optimize interactions with the GPT-4.1 model.


OpenRouter (Alex Atallah) Discord

  • Gemini Prices Get Real: OpenRouter began charging normal prices for long Gemini prompts, affecting prompts over 200k for Gemini 2.5 and 128k for Gemini 1.5, aligning with Vertex/AI Studio rates.
    • The change was due to skyrocketing Gemini 2.5 usage, ending a 50% discount for long context prompts.
  • Free Models Flood OpenRouter!: Six new free models were added to OpenRouter, including roleplay-tuned QwQ-32B-ArliAI-RpR-v1, long-context code generation DeepCoder-14B-Preview, and Mixture-of-Experts VLM Kimi-VL-A3B-Thinking.
    • These models offer diverse capabilities, from role-playing to code generation, expanding the options available on the platform.
  • NVIDIA Llama-3 Variants go Free!: Three Llama-3 variants from NVIDIA (Nano-8B, Super-49B, Ultra-253B) were added, optimized for reasoning, tool use, and RAG tasks with extended context windows up to 128K tokens.
    • Users have begun testing the relative performance of these models.
  • GPT-4.1 Models: The Next Iteration: GPT-4.1, GPT-4.1-mini, and GPT-4.1-nano models launched on OpenRouter, with the full model optimized for long-context reasoning.
    • Users noted that GPT-4.1 and 4.1 mini somehow seem to perform on par, at least on the spaceship prompt, while others were running more thorough tests to measure performance.
  • Skywork-OR1 Series Unleashes Reasoning Power: The Skywork-OR1 model series was introduced, featuring Skywork-OR1-Math-7B, which excels at mathematical reasoning, and Skywork-OR1-32B-Preview, rivaling Deepseek-R1's performance on math and coding tasks.
    • Both models are trained on top of DeepSeek-R1-Distill-Qwen-7B and DeepSeek-R1-Distill-Qwen-32B.


Manus.im Discord Discord

  • PDF to Website Conversion is Hot: A member noted the ease of turning PDFs into websites.
    • This was considered a great use case.
  • DeepSeek V3 Waits in the Wings: A member inquired about Manus's project-creation capabilities, but it was concluded that Manus currently offers only DeepSeek R1, with a future upgrade to their top-tier model anticipated in a few months.
    • Another member derided Qwen's recent coding abilities.
  • Cybersecurity Career Combos Considered: A member considered a career switch but decided to remain in cybersecurity, given their coding proficiency.
    • The potential impact of quantum on cybersecurity was also discussed.
  • Agency Chooses GCP Over Firebase: An agency chose GCP for its infrastructure, citing its cost-effectiveness, with another user presenting a 40-page analysis supporting a switch from Microsoft to GCP.
    • Google received a rating of 4.7 out of 5, whereas Microsoft scored 4.4.
  • Gemini 2.5 Pro Eats Data: A user praised Gemini 2.5 Pro for its data processing prowess, superiority over ChatGPT, and it prompted them to cancel their Perplexity subscription.
    • Users observed that Gemini 2.5 Pro requires fewer credits per task and is improving alongside the release of Claude max pro and decreasing costs.


Unsloth AI (Daniel Han) Discord

  • Gemma GRPO Grind: Members debated using Gemma 4B versus Gemma 1B for GRPO, clarifying that while GRPO can be done on both, the 4B version won't fit on Colab.
    • Concerns arose about setting appropriate training steps for a 15k-row dataset, with suggestions to check how batching, epochs, and gradient accumulation work together.
  • AMD GPU Anaconda: Users are wrestling with getting Unsloth to work on AMD GPUs, running into NotImplementedError given Unsloth's initial NVIDIA focus.
    • The core issue centers on bitsandbytes (BNB) failing to build correctly, even though torch.cuda.is_available() returns True on AMD (ROCm) builds of torch.
  • LM2 Memory: Gemma Gains: Experiments involving integrating LM2's memory units directly into Gemma 3 were undertaken to promote contextual awareness between prompts.
    • Monkey patching model layers to hook memory leads to challenges in quantization to reduce hardware requirements, with one member hooking every 6th layer in gma6 [https://github.com/jagoff2/gma6].
  • DeepSeek's Inferencing Insights: The DeepSeek Inference Engine has stirred discussion regarding the inference performance expectations for smaller providers.
    • Concerns were raised about providers potentially running vllm serve with suboptimal configurations, affecting model performance when serving DeepSeek R1.
  • Apple's Cross Entropy Eviscerated: An insightful article explaining Apple's cut cross entropy was shared, framing transformers as a sequential ML classification task on a for loop (zhuanlan.zhihu.com).
    • An alternative GitHub repo was provided due to accessibility issues with the original link.
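
This is not Apple's actual CCE kernel, but a minimal PyTorch sketch of that "classification on a for loop" framing: the loss is computed over small chunks of positions so the full [sequence x vocab] logits matrix is never materialized at once; all shapes and the chunk size are illustrative.

```python
import torch
import torch.nn.functional as F

def chunked_lm_loss(hidden: torch.Tensor, lm_head: torch.Tensor,
                    targets: torch.Tensor, chunk: int = 256) -> torch.Tensor:
    """Cross-entropy over next-token targets, computed chunk-by-chunk over positions
    so only a [chunk, vocab] logits slice exists at any time."""
    total, count = hidden.new_zeros(()), 0
    for start in range(0, hidden.shape[0], chunk):
        h = hidden[start:start + chunk]          # [chunk, d_model]
        logits = h @ lm_head.T                   # [chunk, vocab] -- the only large buffer
        total = total + F.cross_entropy(logits, targets[start:start + chunk], reduction="sum")
        count += h.shape[0]
    return total / count

# Illustrative shapes: 4k positions, 1k hidden dim, 32k vocab.
hidden = torch.randn(4096, 1024)
lm_head = torch.randn(32_000, 1024)
targets = torch.randint(0, 32_000, (4096,))
print(chunked_lm_loss(hidden, lm_head, targets).item())
```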


OpenAI Discord

  • OpenAI Streams Soon!: OpenAI announced a livestream scheduled for 10am PT, and community members are speculating on the potential release of GPT-4.1 in the API.
    • The announcement specifically tagged the GPT roles, suggesting a possible focus on GPT models or related updates.
  • Veo 2 vs Sora in the Video Ring: Members compared Google's Veo 2 to OpenAI's Sora for video generation, with some preferring Veo 2's more natural 24 fps video.
    • One member noted that overly smooth frame rates register in their brain as instant AI-generated content and another member was able to jailbreak the model to animate The Lion King.
  • Memory Controls Get Detailed!: Details on the OpenAI Memory FAQ show controls for ChatGPT's memory with a dual-tier architecture of saved memories and chat history references.
    • The update lets users control and edit preferences by enabling or disabling memory and chat history.
  • User Battles Prompt Defaults!: A user reported that their ChatGPT agent, built two months ago, is now rigorously ignoring prompt defaults, such as table format or column specifications, despite no changes to the extensive prompt.
    • The user requested insights or solutions to this problem of models ignoring past established parameters.
  • Images Get Clearer with Prompting Tweaks!: A user inquired about removing the smudged look from image generations, to which another user suggested it depends on the prompt, sharing prompting techniques to guide the model.
    • Additionally, a user successfully generated specific fonts in images by providing a screenshot of the desired font to ChatGPT.


Cursor Community Discord

  • OpenAI drops Model and China reacts: OpenAI dropped a new model, sparking comparisons to DeepSeek, Claude, GPT, and Gemini.
    • A member observed that China is not doing too hot in this arena, while another remarked that the USA underestimates everything, like always.
  • Claude 3.7 Wins Gold for Cursor: Members are finding Claude 3.7 Sonnet to be the top choice in Cursor, outperforming Gemini and Google models due to stability, one-shot capabilities, and code quality.
    • With one adding that Claude models are improving, to me the older the smarter.
  • Gemini 2.5 Gets Insane at UI: Gemini 2.5 Pro is getting recognized for its insane UI design capabilities, with members sharing examples of its unique output and its context retention.
    • One user commented that Gemini’s UI modifications are absolutely insane.
  • Windsurf Sinks, Users Prefer Cursor: Users are reporting reliability issues with Windsurf, saying it overpromises, leading some to recommend Cursor when utilized properly.
    • One user quipped, welcome to shit surf.
  • Community Awaits GPT-4.1: The community is discussing the imminent release of GPT-4.1 and how to start using it, mentioning the expected deprecation of 4.5.
    • Members anticipate that Everyone will start merging to 4.1; 2.5 pool will clear, Claude 3.5 3.7 will clear a bit until 4.1 gets quota exceeded and repeat the same process with a newer model.


LM Studio Discord

  • LM Studio Nixes Multi-Model Magic: Users lament the loss of the multi-model prompting feature in LM Studio version 0.3, a feature previously available in version 0.2, with one user commenting it was "the best thing in the world" to compare models using LM Studio.
    • They are seeking alternatives for model comparisons.
  • Offline LM Studio Runtime Wrangling Required: To run LM Studio on an offline PC, users must manually transfer the LM runtimes located in C:\Users\jedd\.cache\lm-studio\extensions\backends.
    • Documentation for importing models via localhost can be found here.
  • Python Purgatory: Examples Pulled from LM Studio Server Docs: Users noticed that the Python examples are missing from the server section of the LM Studio docs and are requesting Python examples.
    • An alternative was shared: lmstudioservercodeexamples; a minimal OpenAI-compatible client sketch also follows this list.
  • Threadripper Thrashes Xeon for Tokens: A member stated that for purely cost considerations, a Threadripper or Epyc chip would provide better dollars per token than dual Intel Xeon w7-3565X CPUs.
    • It was noted that on Threadripper 7xxx there's almost no performance difference once llama.cpp uses more than 20 threads, and performance slows when going beyond 64 threads on one CPU in order to utilize the other.
  • ROCm Rough Patch: RX 6700 XT Recs Reconsidered: A member asked about buying an AMD Radeon RX 6700 XT to run Gemma, and whether ROCm is as strong as CUDA.
    • The reply was that there is no rocm support on 6700XT, and to run Gemma 12b at least 16GB of VRAM is needed, so it's recommended to save for a 7900XT with 24GB of VRAM if an AMD card is a must.
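
Because LM Studio's local server speaks the OpenAI-compatible protocol, a minimal Python example along the lines users were asking for might look like this; localhost:1234 is the usual default port, and the model identifier is whatever your local server reports.

```python
from openai import OpenAI

# Point the OpenAI client at the local LM Studio server instead of api.openai.com.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is ignored locally

resp = client.chat.completions.create(
    model="local-model",  # use the model identifier shown in LM Studio's server tab
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(resp.choices[0].message.content)
```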


Yannick Kilcher Discord

  • LLMs Compared to Probabilistic FSAs: LLMs are argued to be approximately probabilistic finite-state automata (FSAs), implying scaling obstacles and weaknesses in math, though one member rebutted that the analogy is not very meaningful.
    • Members added that the comparison is similar to saying humans are "approximately a monkey", undermining the comparison's weight.
  • AlphaProof is Silver Medalist: Members watched a video about using AI for assisted proofing and summarized that AlphaProof achieved silver-medalist performance without using a single bit of human knowledge.
    • Another member pointed out that this information is based on the company's claims, stating "AlphaProof is silver medalists without using a single bit of human knowledge (as far as they say)"
  • Brave Search API Gaining Traction: Members suggests the Brave Search API as a good alternative for agent pipelines, highlighting positive experiences even on the free tier.
    • It was mentioned that the AI summarizer is cheaper than OpenAI's web search API.
  • Gen AI Use Case Data Skewed?: Members are discussing The 2025 Top-100 Gen AI Use Case Report, suggesting the data might be skewed because Reddit was the only data source.
    • Members also pointed out that Character.AI has 28 million users but receives little attention in ML circles.


HuggingFace Discord

  • Hugging Face tests Llama 4 Maverick & Scout: Hugging Face welcomed Llama 4 Maverick and Llama 4 Scout, and tests showed their performance on the DABStep benchmark.
    • It was reported that Claude 3.7 Sonnet, Gemini 2.5 Pro, Llama 4 Maverick, and Llama 4 Scout were all tested and compared in the process.
  • HF Models 404 Errors Plague Users: Users reported widespread 404 errors when trying to access Hugging Face models, bringing their apps down, as seen in this link.
    • A member tagged a specific HF employee, mentioning this 404 error had persisted most of the day already.
  • Users are Obsessed with Ollama: Members discussed using Ollama to run models locally, sharing commands to download and run specific models like qwen2.5-coder:32b as a substitute for models behind API limits.
    • One member provided a code snippet demonstrating how to specify the Ollama provider when initializing a CodeAgent with a locally hosted model like bartowski/Qwen2.5-Coder-32B-Instruct-GGUF (a similar sketch appears after this list).
  • New Deep Search Agent Seeks Early Testers: A new agent focused on Deep Search using smolagents has been built, and early testers are being sought at agent.galadriel.com.
    • Feedback is welcome, with a request to reach out with questions and ideas to the product team.
  • Agent Fixated with Pope's Age: One user reported their agent was inexplicably obsessed with finding the Pope's age and raising it to the power of 0.36 when running locally with models like llama3, deepseekr1:8b, and qwen2.5-coder:latest.
    • The issue was suspected to originate from a hardcoded sample within the smolagent default agent tool prompts, as it didn't occur when using HfApiModel.
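
A rough sketch of that locally hosted setup, assuming smolagents' LiteLLMModel routed to an Ollama server; the model name and endpoint are illustrative and should match whatever you have pulled locally.

```python
from smolagents import CodeAgent, LiteLLMModel

# Route the agent's LLM calls through LiteLLM to a local Ollama server.
# Model name and endpoint are illustrative; use whatever `ollama pull` fetched locally.
model = LiteLLMModel(
    model_id="ollama_chat/qwen2.5-coder:32b",
    api_base="http://localhost:11434",  # Ollama's default port
    api_key="ollama",                   # placeholder, Ollama ignores it
)

agent = CodeAgent(tools=[], model=model)
print(agent.run("How many seconds are there in a leap year?"))
```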


Eleuther Discord

  • Models Bear Striking Resemblance: A member noticed striking similarities in post-MLP hidden-state cosine similarity between sequences across different models, using this script (a rough stand-in sketch appears at the end of this list).
    • Small models group by type more than color, while larger models rank by color more consistently.
  • No Batch Repetition!: A member advised against repeating data within a minibatch, citing potential for major issues.
    • They also described work on investigative information analytics spanning cognitive science and ML/AI, facilitating insights across disciplines and communicating them to different audiences.
  • Multiple Token Prediction Papers: A member sought after papers on multiple token prediction with LLMs during inference, and another user suggested DeepSeek v3.
    • Another user pointed to this paper and recalled seeing one from Meta years ago.
  • AI "Research" Under Scrutiny: Members voiced concerns about the rise of AI-generated content presented as research, which is often characterized by made-up terminology and lack of alignment with legitimate research ideas.
    • Suggestions included a ban for bad-faith users hiding AI usage and a long-term mute for good-faith users exhibiting inexperience.
  • Length Extrapolation Discrepancies: Members discussed challenges in length extrapolation, noting that models often fail to consistently decrease token loss beyond their training sequence length, as shown in this plot.
    • Techniques like NoPE + SWA and ssmax (Scalable-Softmax) were mentioned as potential solutions.
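
As a rough stand-in for the member's script (not the actual one), here is a minimal sketch that pulls final-block hidden states from two Hugging Face causal LMs and compares their adjacent-position cosine-similarity profiles; the model names are placeholders.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

def hidden_state_similarity_profile(model_name: str, text: str) -> torch.Tensor:
    """Cosine similarity between hidden states at adjacent positions after the final block."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
    with torch.no_grad():
        out = model(**tok(text, return_tensors="pt"))
    h = out.hidden_states[-1].squeeze(0)          # [seq_len, d_model]
    return F.cosine_similarity(h[:-1], h[1:], dim=-1)

text = "The quick brown fox jumps over the lazy dog."
for name in ("gpt2", "distilgpt2"):               # placeholder model names
    profile = hidden_state_similarity_profile(name, text)
    print(name, profile.mean().item())
```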


Latent Space Discord

  • Karpathy Tries Embarrassing ChatGPT: ChatGPT got put on the spot by a user who shared a prompt asking What's the most embarrassing thing you know about me?.
    • The user wanted to see if ChatGPT could give honest and direct answers through multiple rounds of questioning.
  • Thinking Machines Seed Hits $2B: Thinking Machines is apparently doing a $2B seed round, advised by Alec Radford, according to a Fortune article.
    • A user posted a good chart from Epoch AI illustrating the raise.
  • DeepSeek Opens Up Inference Engine: DeepSeek has open-sourced its inference engine, with the GitHub repo available for review.
    • Members wondered who wants to chat about DeepSeek's open sourcing.
  • Quasar Launch Watch Party Happening: Latent Space is hosting another watch party for the Quasar launch, at this discord event.
    • During an OpenAI Quasar launch watch party, members discussed the features of GPT-4.1, including its competitive pricing compared to Claude and flat pricing on long input contexts, referencing the pricing documentation.
  • Agent Definitions Vibe Checked: Members debated the definition of an agent, with one suggesting today's definition: an LLM calls a tool while another presented a Figma board on self-improving agents.
    • One suggested: the agent you vibe code while bored in a meeting.


Notebook LM Discord

  • NotebookLM's Latent Space Creates Non-Determinism: A member explained that variability in the latent space means NotebookLM cannot generate the same output every time; each run produces a different generation from the same input, because NotebookLM is not designed to be a deterministic system.
    • They cautioned against expecting NotebookLM to perform like a more expensive, specialized system.
  • NotebookLM Transforms Education Experience: A member is using NotebookLM in their classroom to upload slide decks and materials, create notes, study guides with quiz questions, a glossary of terms, mind maps, and an audio overview, then shares it with students to help them prepare for exams.
    • They also reported having students create their own NotebookLMs in groups.
  • Users clamoring for Gemini Education Workspace: A member asked if others are using Gemini through an Education Workspace, expressing interest in districts and departments successfully using Gemini within their Workspaces.
    • They noted that in NSW, Australia, they cannot yet use Gemini.
  • Cat Owners Want Chatbots for Furry Friends: A member who runs a large support group for owners of diabetic cats wants to provide their members with a conversational interface to their documentation, including video content, and in French.
    • They would like members to ask questions and get answers based on documentation with links to relevant docs to read.
  • NotebookLM "Discover" Feature Sparks Excitement: A user expressed great satisfaction with the new "Discover sources" feature in NotebookLM, stating "It's everything I could have wanted".
    • The same user looks forward to more audio overview flavors and praised Grace's podcasts.


Nous Research AI Discord

  • Llama 4 Burns GPU Hours?: Members noted that Meta's Llama 4 Maverick used 2.38M GPU hours, while Llama 4 Scout used 5.0M GPU hours, the same as training Deepseek V3.
    • Some questioned the fairness of comparing against models tuned for human preferences, while others suggested LeCun's involvement may explain it.
  • DeepCoder Delivers Top Coding Performance: A member shared a VentureBeat article about DeepCoder, highlighting its efficient 14B parameter open model and enhanced GRPO algorithm.
    • The model incorporates offline difficulty filtering, no entropy loss, no KL loss, and overlong filtering from DAPO, generalizing to 64K context despite training with 32K.
  • Nvidia UltraLong Models Swallow Context: Nvidia's UltraLong-8B models, featured in this Hugging Face collection, are designed to process sequences up to 4M tokens built on Llama-3.1.
    • These models combine continued pretraining with instruction tuning, trained for 150 iterations with a 4M sequence length and a global batch size of 2.
  • GPT-4.1 Benchmarks Better, Pricing Confuses: Members discussed pricing and benchmarks for GPT-4.1, noting that benchmarks are better than past releases, but the pricing and model versioning are confusing, especially with the new model's availability in GitHub Copilot.
    • Speculation arose about 4.1-nano rivaling good 14B models, and the possibility of it being open sourced.
  • H100 training of Llama 4 Scout shows Loss Increase!: A member observed an increasing loss from 1.9011 to 2.3407 between epochs 1 and 2 when training Llama 4 Scout on an H100 setup.
    • The user expressed concern because the loss did not decrease as expected, even when using two H100 GPUs; another member suggested that the minimum you should work with is 10M parameters, no matter what the task is.


MCP (Glama) Discord

  • Graphlit Crafts MCP Server for Content: Graphlit is building an MCP server for Reddit and Quora, and offered to add Quora ingestion if needed.
    • Currently a few exist for Reddit, such as this repo.
  • Agent Development Kit rivals MCP: Members discussed Google's ADK (Agent Development Kit) and A2A and their similarity to MCP, as well as their potential centrality to the internet of agents.
    • A member shared that there is no official consensus on non-MCP tech talk, but if it's at least somewhat relevant to AI/ML/MCP then there should be no issues.
  • Function-less Models get Block Tweaks: Block is experimenting with models that lack function calling abilities to see if they can tweak their output to work with agents, and this blog post explores doing that without a secondary model via XML output.
    • The team is weighing the latency costs versus the benefits of using a secondary model for parsing, with concerns about longer sessions and the model's ability to stick to the XML format; they may use a local model, with concerns about added overhead (a minimal XML tool-call parsing sketch follows this list).
  • Copilot Client debugging aided by MCP Tools: synf and mcptee help members spot and fix bugs while testing with Copilot client, which can struggle with longer contexts and more tools.
    • One member is building with fast hardware in mind, since multiple API calls will always be slower than making one.
  • Paprika Recipe App gets Savory MCP Server: An MCP server was created for anyone who uses the Paprika recipe app, so that Claude can automatically save recipes into Paprika via this GitHub repo.
    • No further information was given.
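
A minimal sketch of the XML-output idea for models without native function calling: instruct the model to wrap tool invocations in a fixed tag and parse them with the standard library. The tag names and tool schema here are hypothetical, not Block's actual format.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical model output: the model was instructed to emit tool calls in <tool_call> tags.
model_output = """
I will look that up.
<tool_call>
  <name>get_weather</name>
  <arguments>{"city": "Berlin"}</arguments>
</tool_call>
"""

def extract_tool_calls(text: str):
    """Pull <tool_call> blocks out of free-form model text and parse name + JSON arguments."""
    calls = []
    start = 0
    while (open_idx := text.find("<tool_call>", start)) != -1:
        close_idx = text.find("</tool_call>", open_idx)
        if close_idx == -1:
            break  # model failed to close the tag; skip rather than crash
        node = ET.fromstring(text[open_idx:close_idx + len("</tool_call>")])
        calls.append((node.findtext("name"), json.loads(node.findtext("arguments"))))
        start = close_idx + len("</tool_call>")
    return calls

print(extract_tool_calls(model_output))  # [('get_weather', {'city': 'Berlin'})]
```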


GPU MODE Discord

  • CUDA Synchronization Guidance Crystallizes: A member asked for CUDA references within Python/PyTorch models, and another member shared their recent GTC talk about it, also found on nvidia.com.
    • The talk suggests that custom ops and load inline should address most problems, along with ongoing work to cut compilation times; a member found Stephen Jones' videos, referenced in the talk, and said that vacation is over and talks start again.
  • Hilbert Curves Heat Up GEMM Performance: A member shared a GitHub repo showcasing GEMM implementation with Hilbert curves, along with benchmarks against cuBLAS.
    • The benchmarks indicate that Hilbert curves become more effective as the matrix size increases, with further discussion revealing that Hilbert Curves, while optimal, are not hardware-efficient, suggesting Morton ordering is a better practical trade-off and pointing to a blog post comparing the two.
  • memcpy_async Alignment Accelerates Performance: After switching to cuda::memcpy_async, a user reported a performance slowdown, and it was suggested that this is a cooperative API, meaning all threads must pass the same pointer(s) and a size corresponding to the entire block of memory, referencing the official CUDA documentation.
    • It was also suggested that potential problems with memcpy_async include the alignment of the shared memory address and conditionals around the instruction, which can hinder coalesced memory access referencing a forum post.
  • Memory Profiling Distributed Systems Baffles Beginners: An engineer seeks advice on memory profiling a model trained with distributed training on a SLURM cluster with 8 nodes, each having 8 GPUs (a per-rank peak-memory logging sketch follows this list).
    • Furthermore, an engineer inquired about the implementation pointed to by a specific line in ATen's attention.cu (link to GitHub) aiming to understand how torch/CUDA handles individual user operands [dHead x K-cache-length] in a batch.
  • Metal Memory Mystery Mastered: A member found that a global memory coalesced matrix multiplication implementation in Metal uses half the memory of a naive version, testing with this CUDA MMM implementation as a reference.
    • One explanation posited that the OS pulls data as pages, and non-coalesced access leads to inefficient page usage where only a small portion of the pulled data is actually utilized; others noted that M-series chips have unified memory, which should negate paging between CPU and GPU.
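
For the distributed memory-profiling question, one common starting point (not a full profiler, and assuming a standard torch.distributed setup on the SLURM cluster) is logging per-rank peak allocations around each training step:

```python
import os
import torch
import torch.distributed as dist

def log_peak_memory(tag: str) -> None:
    """Print this rank's peak allocated/reserved CUDA memory since the last reset."""
    rank = dist.get_rank() if dist.is_initialized() else int(os.environ.get("RANK", 0))
    alloc = torch.cuda.max_memory_allocated() / 2**30
    reserved = torch.cuda.max_memory_reserved() / 2**30
    print(f"[rank {rank}] {tag}: peak allocated {alloc:.2f} GiB, reserved {reserved:.2f} GiB")

# Inside the training loop (sketch):
#   torch.cuda.reset_peak_memory_stats()
#   loss = model(batch).loss; loss.backward(); optimizer.step()
#   log_peak_memory("step")
# For a fuller trace, recent PyTorch versions can dump a snapshot for the memory visualizer
# via torch.cuda.memory._dump_snapshot("rank_snapshot.pickle").
```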


Nomic.ai (GPT4All) Discord

  • Nomic Embeddings Weave Websites: A member reports success using Nomic embeddings to automatically link website pages, drastically cutting manual work, detailed in the semantical-website-links blogpost.
    • They're exploring methods to automatically identify and link key terms to embeddings, creating an interconnected, self-updating network of documents, as discussed in this YouTube video.
  • GPT4All's Token Tussle: A user trying to generate a lengthy play using GPT4All models encountered a response length cap, despite attempts to use models within GPT4All.
    • Suggestions included upping the Max Tokens setting and breaking the story down, but the user is still on the hunt for models that can handle longer outputs.
  • HuggingFace Story Models: Models tagged with 'story' on HuggingFace are proving successful for generating longer responses, much to the delight of a member.
    • However, caution was advised, as many of these models may be proprietary, potentially limiting their use as free software.
  • Deciphering Chat Template Locations: A member sought the whereabouts of chat templates for models like Llama3.2, Llama3.1, Aya-23, and KafkaLM-8x7b-German-V0.1.
    • They were advised to check the model authors' releases on their website, GitHub, or Hugging Face, with a specific focus on the tokenizer_config.json file for the chat_template entry (a quick lookup sketch follows this list).
  • Context Length Curbs Creativity: Models typically train on context lengths between 2048 and 8192 tokens, and while RoPE and Yarn can stretch this, response quality tends to nosedive beyond the original range.
    • While dependent on the training dataset and finetuning, response length can be tweaked with prompting, like explicitly asking the model to make it VERY VERY LONG.
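
A quick sketch of the tokenizer_config.json check mentioned above, using the Hugging Face Hub client to fetch the file and print its chat_template entry; the repo ID is a placeholder.

```python
import json
from huggingface_hub import hf_hub_download

# Placeholder repo ID; substitute the model you are looking up.
path = hf_hub_download("some-org/some-model", "tokenizer_config.json")

with open(path, encoding="utf-8") as f:
    config = json.load(f)

template = config.get("chat_template")
print(template if template else "No chat_template entry; check the author's release notes.")
```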


Modular (Mojo 🔥) Discord

  • Origins morphs into Lifetimes: The term Origin in Mojo was renamed to Lifetime, potentially easing understanding for those familiar with Rust's lifetime concepts, per the docs.
    • Mojo extends the lifetime of values to match any reference holding onto them; instead, the origin of every reference must be tracked to determine value extensions and freedom, contrasting Rust's scope-based lifetime tracking.
  • VSCode loses Mojmelo: Users reported that the Mojo VSCode extension fails to detect the mojmelo module despite manual installation, due to the extension's use of its own Mojo installation.
    • The workaround involves manually configuring the extension to use local module repositories for intellisense.
  • Mojo PEPs are in the works: Inspired by Python's PEPs, a member suggested a similar system for Mojo to track changes, and another member pointed to Mojo's existing proposal system.
    • The discussion shows the community's interest in a structured way to manage and communicate language evolution.
  • Negative Bounds are now in season: Negative bounds are a way to invert a named set, often used with marker traits to define the inverse of a set of types, such as !Send representing a thread-local variable.
    • For example, a !Send marker indicates that a value is not safe to move between threads.


LlamaIndex Discord

  • GPT-4.1 API Gets Day 0 Support: OpenAI launched GPT-4.1 in the API, with immediate support via pip install -U llama-index-llms-openai, detailed here.
    • Benchmarks indicate that GPT-4.1 shows a ~10% improvement against 4o and a ~2% improvement on existing agentic approaches.
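A minimal usage sketch, assuming an OPENAI_API_KEY in the environment and the updated llama-index-llms-openai package:

```python
# Minimal sketch: use GPT-4.1 through LlamaIndex after `pip install -U llama-index-llms-openai`.
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4.1")   # assumes OPENAI_API_KEY is set
print(llm.complete("In one sentence, what changed between GPT-4o and GPT-4.1?"))
```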
  • LlamaParse Excels in Document Parsing: LlamaParse delivers enhanced parsing quality for documents with images, tables, and charts, surpassing basic readers like SimpleDirectoryReader.
    • One member emphasized that it's the quality of the parsed documents that differentiates LlamaParse from SimpleDirectoryReader.
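A hedged sketch of swapping LlamaParse in as the file extractor; the directory, file type, and result format are placeholders, and a LLAMA_CLOUD_API_KEY is assumed.

```python
# Hedged sketch: route PDFs through LlamaParse instead of the default reader.
from llama_parse import LlamaParse              # pip install llama-parse
from llama_index.core import SimpleDirectoryReader

parser = LlamaParse(result_type="markdown")     # better handling of tables, charts, images
docs = SimpleDirectoryReader(
    input_dir="./reports",                      # placeholder directory
    file_extractor={".pdf": parser},
).load_data()
print(len(docs), "parsed documents")
```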
  • Open Source LLMs Battle Agentic Tasks: While smaller open-source LLMs struggle with agent workflows, larger models such as Llama 3, Llama 3.1, Llama 3.2 3B, or Mistral are proving more effective, especially when used with Ollama.
    • A member mentioned successful use of llama3.2:3b for their agentic needs.
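A small sketch of that pattern under stated assumptions: a local Ollama server with the llama3.2:3b tag pulled, and a toy tool for the agent to call.

```python
# Hedged sketch: a LlamaIndex ReAct agent backed by a local Ollama model.
# Assumes `ollama pull llama3.2:3b` has been run and the Ollama server is running.
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.ollama import Ollama

def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b

llm = Ollama(model="llama3.2:3b", request_timeout=120.0)
agent = ReActAgent.from_tools([FunctionTool.from_defaults(fn=multiply)], llm=llm, verbose=True)
print(agent.chat("What is 7 times 6?"))
```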
  • No History for .query Chats: It was clarified that .query calls are stateless and do not retain any chat history, and therefore do not store the chat log.
    • Members looking for memory persistence are advised to consider using an agent.
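To illustrate the difference, a sketch assuming a ./docs folder and default OpenAI-backed settings; passing the memory buffer explicitly is shown for clarity, not as the only option.

```python
# Sketch: .query() is stateless, while a chat engine (or agent) keeps history in memory.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.memory import ChatMemoryBuffer

index = VectorStoreIndex.from_documents(SimpleDirectoryReader("./docs").load_data())

# Stateless: each call is independent; nothing is remembered between queries.
print(index.as_query_engine().query("What does the doc say about pricing?"))

# Stateful: the chat engine threads history through a memory buffer.
chat = index.as_chat_engine(memory=ChatMemoryBuffer.from_defaults(token_limit=2000))
chat.chat("What does the doc say about pricing?")
print(chat.chat("And how does that compare to last year?"))  # follow-up relies on history
```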
  • AI Evaluation Models Evaluated: A research paper, Benchmarking AI evaluation models, assessed models like LLM-as-a-judge, HHEM, and Prometheus across 6 RAG applications.
    • The study found that these evaluation models perform surprisingly well in real-world scenarios.
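For readers unfamiliar with the LLM-as-a-judge setup the paper evaluates, here is a toy sketch of the pattern (not the paper's actual protocol); the judge model and rubric are placeholders.

```python
# Toy LLM-as-a-judge sketch: score whether a RAG answer is supported by its context.
# Model name and rubric are placeholders; this is not the benchmark's protocol.
from openai import OpenAI

client = OpenAI()   # assumes OPENAI_API_KEY is set

def judge_faithfulness(context: str, answer: str) -> str:
    prompt = (
        "You are an evaluator. Reply with a single integer from 1 (unsupported) "
        "to 5 (fully supported by the context).\n\n"
        f"Context:\n{context}\n\nAnswer:\n{answer}\n\nScore:"
    )
    resp = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

print(judge_faithfulness("The cafe opens at 9am.", "It opens at 9 in the morning."))
```
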


tinygrad (George Hotz) Discord

  • NVIDIA Drops New Video Codec SDK: NVIDIA released the Video Codec SDK along with samples on GitHub; one user cautioned against AI-generated PRs.
    • The user threatened to close submissions and ban repeat offenders, emphasizing the importance of understanding the content.
  • TinyGrad Meeting #66 Topics: Meeting #66 is scheduled for Monday covering company updates, chip!, fast python, bert, mlperf, scheduler, driver, webgpu, retinanet, torch frontend multi gpu, cloud scale uuuvn stuff, and bounties.
    • A member indicated they understood the requirements for the Index Validation PR after seeing a comment and expect to have it ready by the next day.
  • Clang Flags Silence Debug Output: A member suggested using the -fno-ident clang flag to prevent extra sections (.comment and .note.GNU-stack) from being added to images and polluting DEBUG=7 output.
    • This change helps in keeping the debug output cleaner and more manageable.
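A quick way to see the flag's effect, as a Linux-only sketch that assumes clang and readelf are on PATH; file paths are placeholders.

```python
# Sketch: show that -fno-ident drops the .comment section from the object file.
import subprocess

with open("/tmp/add.c", "w") as f:
    f.write("int add(int a, int b) { return a + b; }\n")

for flags in ([], ["-fno-ident"]):
    subprocess.run(["clang", "-c", "/tmp/add.c", "-o", "/tmp/add.o", *flags], check=True)
    sections = subprocess.run(["readelf", "-S", "/tmp/add.o"],
                              capture_output=True, text=True, check=True).stdout
    print(flags or ["(default)"], "-> .comment present:", ".comment" in sections)
```
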
  • New TinyGrad Project Seeks Assistance: A new member introduced themselves seeking a first project to get hands-on experience with tinygrad and was recommended to work on a small bounty.
    • Helpful resources, including tinygrad-notes and mesozoic-egg's tinygrad-notes, were also shared to aid in their learning.
  • Debugging NaN Issues in Softmax: A member reported debugging NaNs within a model, suspecting a softmax() issue and noted that printing mid-__call__ was causing optimizer issues.
    • George Hotz responded that printing shouldn't break things and suggested posting an issue for further investigation.
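As a general illustration of where softmax NaNs come from (a sketch with toy logits, not the member's model): a naive exp-then-normalize overflows on large logits, while subtracting the row max first stays finite, which built-in softmax implementations typically do for you.

```python
# Sketch with toy logits: naive softmax overflows; the max-subtracted version stays finite.
import numpy as np
from tinygrad import Tensor

logits = Tensor([[1000.0, 2.0, -1000.0]])   # extreme values overflow exp() in float32

naive = logits.exp() / logits.exp().sum(axis=-1, keepdim=True)
shifted = (logits - logits.max(axis=-1, keepdim=True)).exp()
stable = shifted / shifted.sum(axis=-1, keepdim=True)

print("naive finite:", np.isfinite(naive.numpy()).all())    # False -> NaNs appear
print("stable finite:", np.isfinite(stable.numpy()).all())  # True
```
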


Torchtune Discord

  • TorchTune Models Integrate with vLLM: Members discussed integrating custom TorchTune models with vLLM, recommending that TorchTune finetuned models be served for inference the same way as HF models, with a tutorial provided.
    • For custom architectures not defined on HF, the model must be defined in vLLM, as detailed in the vLLM documentation; alternatively, Torchtune's generate script can be used.
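A minimal sketch of the HF-style route, assuming the torchtune run exported an HF-format checkpoint directory; the checkpoint path is a placeholder.

```python
# Sketch: serve a torchtune-finetuned, HF-format checkpoint with vLLM offline inference.
from vllm import LLM, SamplingParams

llm = LLM(model="/path/to/torchtune_output/hf_checkpoint")   # placeholder path
params = SamplingParams(temperature=0.7, max_tokens=256)
out = llm.generate(["Explain LoRA in one paragraph."], params)
print(out[0].outputs[0].text)
```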
  • Bitsandbytes Bites Mac Users: pip install -e '.[dev]' fails on macOS because bitsandbytes>=0.43.0 does not ship binaries for platforms other than Linux, but loosening the pin to bitsandbytes>=0.42.0 (letting pip fall back to 0.42 on macOS) can help.
    • Releases up to 0.42 were incorrectly tagged as platform-independent wheels, but that at least makes them installable on macOS, according to bitsandbytes issue 1378.
  • QLoRA Digs Deeper with Sub-4-Bit Quantization: Members have been seeking literature on QLoRA-style training using quantization below 4 bits.
    • The inquiry specifically targeted methods and findings related to sub-4-bit quantization techniques in the context of QLoRA.
  • Reward Functions Get Shaped: The team plans to support different reward functions, with implementation details under discussion, and there were questions about where the reward computation currently lives in the code.
    • There was a follow up about collecting a list of important ones, so stay tuned!
  • Loss Functions Proliferate, Experimentation Thrives: The team experiments with different loss functions, aiming to avoid excessive recipe proliferation by potentially adopting a protocol similar to DPO losses.
    • The objective is to balance supporting essential losses against preventing over-generalization during this experimental phase, and it was acknowledged that some test parameters are currently hardcoded for runs on A100s.
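Purely as a hypothetical illustration of the "protocol" idea (none of these names are torchtune APIs): losses would share one call signature so recipes can swap implementations without multiplying recipe files.

```python
# Hypothetical sketch only -- these names are NOT torchtune APIs. The point is a
# shared call signature so recipes can swap loss implementations freely.
from typing import Protocol
import torch

class PreferenceLoss(Protocol):
    def __call__(self, policy_logps: torch.Tensor, ref_logps: torch.Tensor) -> torch.Tensor: ...

class LogSigmoidLoss:
    """A DPO-flavored stand-in: reward the policy for out-scoring the reference."""
    def __init__(self, beta: float = 0.1):
        self.beta = beta
    def __call__(self, policy_logps, ref_logps):
        return -torch.nn.functional.logsigmoid(self.beta * (policy_logps - ref_logps)).mean()

loss_fn: PreferenceLoss = LogSigmoidLoss()
print(loss_fn(torch.randn(8), torch.randn(8)))
```
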


Cohere Discord

  • Coral Chat Extends Reach into Firefox: Coral Chat is now a chatbot in the Firefox sidebar, configurable by setting browser.ml.chat.provider to https://coral.cohere.com/.
    • A user demonstrated the integration in an Imgur link showcasing its functionality.
  • Next-Token Generation Troubles Surface: A YouTube video highlights the potential issues LLMs face when generating the next token in a given context.
    • Discussion suggests that the problem is widespread across various LLMs.
  • Cohere Chat API Gets Java Demo: A member shared a Java example showcasing the Cohere Chat API, particularly the runInteractiveDemo() method interacting with the command-a-03-2025 model.
    • The demo allows users to interact with Cohere AI, logging prompts and API interactions for debugging and optimization.
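The shared demo is in Java; for readers on Python, a rough equivalent with the Cohere SDK looks like this, assuming a CO_API_KEY in the environment (the model name comes from the discussion).

```python
# Rough Python equivalent of the Java demo's chat call, using the Cohere v2 client.
import cohere

co = cohere.ClientV2()   # reads CO_API_KEY from the environment
resp = co.chat(
    model="command-a-03-2025",
    messages=[{"role": "user", "content": "Give me one tip for debugging prompts."}],
)
print(resp.message.content[0].text)
```
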
  • Diofanti.org Exposes Greek Government Spendings: Diofanti.org is an open-data platform monitoring government spending in Greece, providing tools for transparency and accountability.
    • The Aya model is the go-to model for the platform's chatbot, supporting transparency and accountability initiatives.
  • LUWA App Set to Launch in April 2025: The LUWA.app, a search directory for AI-powered apps, will go live on April 25, 2025.
    • The creator is exploring Cohere and its LLM models to reduce costs and enhance app performance.


LLM Agents (Berkeley MOOC) Discord

  • Lambda Gives Serverless API Credits: Lambda is offering $100 of serverless API credits for Inference to every individual participant, application here.
    • Sponsors Lambda, HuggingFace, Groq, and Mistral AI are also offering API/compute credits to select teams, with more details here and application here.
  • Google Provides Access to Gemini API: Google is granting access to Gemini API and Google AI Studio free of charge to ALL participants.
    • This provides a valuable opportunity for participants to explore and utilize Google's AI capabilities during the hackathon.
  • Sean Welleck Teaches AI-Powered Math: Sean Welleck, an Assistant Professor at Carnegie Mellon University, presented a lecture on Bridging Informal and Formal Mathematical Reasoning, covering AI-powered tools that support proof development, watch the livestream here.
    • Welleck leads the Machine Learning, Language, and Logic (L3) Lab at Carnegie Mellon University and has won a NeurIPS 2021 Outstanding Paper Award and two NVIDIA AI Pioneering Research Awards.
  • Email Notifications Briefly Delayed: Members noted that the usual email notification for today's lecture was delayed.
    • A member confirmed that there was a lecture and the email was sent a little late.


DSPy Discord

  • AI Agent Developer available for hire: An experienced AI Agent Developer announced their availability for new projects or full-time opportunities.
    • They specialize in building autonomous agents powered by GPT-4, LangChain, AutoGen, CrewAI, and other cutting-edge tools.
  • DSPy Module Metric?: A member inquired about a new metric to evaluate DSPy modules.
    • They referenced this paper as possible inspiration.
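For context, DSPy metrics today are plain functions over (example, prediction) that Evaluate averages over a devset; a minimal sketch with toy data (the dataset and program are stand-ins, and an LM is assumed to have been configured).

```python
# Minimal sketch of a DSPy metric; assumes dspy.configure(lm=...) was called beforehand.
import dspy

def exact_match(example, pred, trace=None):
    # a metric is just a callable over (example, prediction)
    return example.answer.strip().lower() == pred.answer.strip().lower()

devset = [dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question")]
program = dspy.Predict("question -> answer")
score = dspy.Evaluate(devset=devset, metric=exact_match)(program)
print(score)
```
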


MLOps @Chipro Discord

  • MCP Server Deploys on AWS: A workshop on April 17th at 8 AM PT will cover building and deploying a production-grade Model Context Protocol (MCP) server on AWS.
    • Sign up is available at https://buff.ly/R7czfKK for the workshop.
  • MCP Standard Improves ML Contexts: MCP is highlighted as an emerging standard to improve how machine learning contexts are defined, shared, and managed across projects and teams.
    • The workshop will provide practical insights into MCP’s capabilities, benefiting Data Engineers, Data Scientists, Machine Learning Engineers, and AI/ML Enthusiasts.
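For a sense of what a minimal MCP server looks like before any AWS plumbing, a sketch using the reference Python SDK; the server and tool names are made up.

```python
# Minimal MCP server sketch with the reference Python SDK (pip install mcp).
# Server/tool names are made up; AWS deployment details are out of scope here.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    mcp.run()   # serves over stdio by default
```
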


Codeium (Windsurf) Discord

  • Windsurf Launches GPT-4.1: GPT-4.1 is now available on Windsurf, as announced across Twitter/X, Bluesky, and Threads.
    • Windsurf also made a promotional video and a TikTok post (latest vid too).
  • Windsurf Offers Free Unlimited GPT-4.1: Windsurf is offering free unlimited GPT-4.1 usage on all plans for one week only (April 14-21).
    • After April 21, GPT-4.1 will be available at a special discounted rate of just 0.25 credits per use.
  • GPT-4.1 Becomes New Default Model: New users will get GPT-4.1 as their default model, and existing users can easily switch through the model selector.
    • Windsurfers are saying, "Don't miss this limited-time opportunity!"


Gorilla LLM (Berkeley Function Calling) Discord

  • Gorilla LLM Loses a Column: The multi-turn composite column was removed from the dataset, though the reason remains unstated.
    • Despite its removal, the column is still mentioned in the "Newly Introduced Categories" section of the BlogPost and carries a weight of 200 points out of 1000 for multi-turn tasks.
  • Gorilla LLM Has Dataset Glitch: A discrepancy affects the dataset composition, as the multi-turn composite column is absent from the table/diagram illustrating the dataset's structure.
    • It remains unclear whether the column's removal is temporary or if the blog post should also be updated to reflect this change.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: !

If you enjoyed AInews, please share with a friend! Thanks in advance!
