AI News (MOVED TO news.smol.ai!)

Archives
April 4, 2025

[AINews] not much happened today

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜


a quiet day.

AI News for 4/2/2025-4/3/2025. We checked 7 subreddits, 433 Twitters and 30 Discords (230 channels, and 5764 messages) for you. Estimated reading time saved (at 200wpm): 552 minutes. You can now tag @smol_ai for AINews discussions!

Devin cut prices, and the 1M-token-context-window Quasar-Alpha might be either the new OpenAI open-weights model or Meta's Llama 4, but neither seemed substantial enough to make the title story.


The Table of Contents and Channel Summaries have been moved to the web version of this email!


AI Twitter Recap

Large Language Models (LLMs) and Model Performance

  • Gemini 2.5 Pro's Capabilities and Limitations: @hkproj noted that one reason they're not using Gemini 2.5 Pro is that it doesn't render math using LaTeX the way ChatGPT does. Despite acknowledging that Google did a good job overall, this detail is a drawback. @danielhanchen reported that Gemini 2.5 Pro achieved 24.4% on the 2025 USAMO (USA Mathematical Olympiad), which was held March 19th-20th. @rasbt highlights that Gemini 2.5 Pro provides a valuable feature by indicating when it might be wrong, emphasizing the importance of AI models being able to acknowledge and correct their mistakes.
  • The Performance and Ranking of DeepSeek V3: @alexandr_wang clarified that DeepSeek V3 is a competitive but not a top model, and the SEAL leaderboards have been updated to reflect this. It ranks 8th on Humanity’s Last Exam (text-only) and 12th on MultiChallenge (multi-turn).
  • Qwen 2.5 Models Integration into PocketPal App: Qwen 2.5 models, including 1.5B (Q8) and 3B (Q5_0) versions, have been added to the PocketPal mobile app for both iOS and Android platforms. Users can provide feedback or report issues through the project's GitHub repository, with the developer promising to address concerns as time permits.
  • Concerns about LLM Chains of Thought (CoT): According to new research from @AnthropicAI, reasoning models do not accurately verbalize their reasoning, casting doubt on the reliability of monitoring chains-of-thought for catching safety issues. @AnthropicAI also found that Chains-of-Thought are not faithful, with models only mentioning the hint (when they used it) 25% of the time for Claude 3.7 Sonnet and 39% for DeepSeek R1. @AnthropicAI results suggest that CoT is less faithful on harder questions, which is concerning since LLMs will be used for increasingly hard tasks. @AnthropicAI notes that when they trained models on environments with reward hacks, they learned to hack, but in most cases almost never verbalized that they’d done so.

AI Tools, Frameworks, and Agent Development

  • PaperBench for Evaluating AI Agent Coding Abilities: @_philschmid discusses PaperBench, a new benchmark from OpenAI that evaluates whether AI agents can replicate state-of-the-art AI research. Even the best-performing model, Claude 3.5 Sonnet, reached only 21.0% accuracy, underscoring that current AI agents struggle with long-horizon planning and execution.
  • CodeAct Agent Framework: @llama_index introduces CodeAct, a generalization of ReAct, that enables agents to dynamically write code using functions to solve tasks, instead of using chain-of-thought reasoning.
  • LangChain's Multi-Agent Systems and Handoffs: @LangChainAI provides a breakdown of the swarm handoff mechanism in LangGraph, explaining that handoffs are a central concept in multi-agent systems.
  • Runway Gen-4 for Media Creation: @c_valenzuelab shares that Runway is beginning its next chapter with Gen-4, entering a new media ecosystem. They believe AI can become a reliable world simulator, changing how media and stories are created and consumed.

Model Context Protocol (MCP)

  • MCP gaining traction: @alexalbert__ shared a timeline of MCP from their point of view, from November to March, highlighting its growing popularity and adoption across the industry.
  • MCP Track at AI Engineer World’s Fair 2025: @swyx announced that the AI Engineer World’s Fair 2025 will feature a dedicated MCP track, supported by AnthropicAI, aiming to bring together professionals working on MCP.
  • MCP Overview and Code Examples: @_philschmid shared a 5-minute overview of MCP with code examples for server and clients, converted from a knowledge-sharing session.
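The overview above walks through MCP at the protocol level. As a rough illustration of what those code examples cover: MCP messages are JSON-RPC 2.0, with a client opening a session via an "initialize" request and invoking tools via "tools/call". The sketch below builds those two messages as plain dicts; field names follow the public MCP specification, and the tool name and arguments are hypothetical.

```python
import json

# Minimal sketch of MCP's JSON-RPC 2.0 message shapes (not a full client).
# "2024-11-05" is the MCP protocol version string current at the time.

def initialize_request(request_id: int) -> dict:
    """First message a client sends to open an MCP session."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "sketch-client", "version": "0.1"},
        },
    }

def tool_call_request(request_id: int, tool: str, arguments: dict) -> dict:
    """Invoke a named server-side tool with JSON arguments."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

if __name__ == "__main__":
    # "get_weather" is a made-up tool name for illustration.
    print(json.dumps(tool_call_request(2, "get_weather", {"city": "Tokyo"}), indent=2))
```

In a real client these dicts are serialized and sent over stdio or an HTTP/SSE transport to the MCP server.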

AI and Education

  • ChatGPT Plus Free for College Students: @sama announced that ChatGPT Plus is free for college students in the US and Canada through May.
  • Concerns About Education and AI: @teortaxesTex argues that people have no clue how to make education better by throwing money at it, and attempts to make less intelligent kids less dumb amount to counterproductive infantilizing bullshit.

AI and Geopolitics/Economics

  • Trump's Tariffs: @AravSrinivas summarized tariffs news using AskPerplexity, highlighting the economic implications. @wightmanr criticized the rates as fake and nonsensical and notes that considering the VAT a tariff is moronic given that it applies to foreign and domestic goods equally, and asks where the adults in the room are. @teortaxesTex found it interesting that Xi isn't a great enjoyer of tariffs, @teortaxesTex also laid out a 200 IQ thesis of how a chain reaction of reciprocal tariffs could crash Choyna.
  • AI Scalability and Compute: @MillionInt states that even for today's lame LLM models, demand already outpaces GPU supply, while @AravSrinivas emphasizes that AI is still massively compute-bound, presenting a golden opportunity.
  • China and the US: @teortaxesTex argues that Americans who say “well WE'RE THE BIGGEST CONSUMER what will you losers do lmao?” seem to be honestly deluded about their place in the world and will be made smaller, while @teortaxesTex states if China tariffed Western capital inputs during its industrial acceleration, China today would still be making Nike shoes by hand.
  • @fchollet says that one of the major weaknesses of autocracy is that the autocrat, surrounded by sycophants who are terrified of him and who were selected for loyalty or blood ties rather than competence, becomes completely insulated from actual reality and faces no pushback on bad decisions.

Humor/Memes

  • Congratulations: @pabbeel simply tweeted "congratulations!!!"
  • Public list meme: @nearcyan mentioned that the public list meme is really funny.
  • Grok thinks there might be a mistake in the simulation: @vikhyatk posted, "grok thinks there might be a mistake in the simulation".
  • One of these is not like the others: @matvelloso posted "One of these is not like the others".
  • It's good to have Runway: @sarahcat21 said, "It's good to have Runway...in your portfolio".

AI Reddit Recap

/r/LocalLlama Recap

Theme 1. "Advancements in AI Model Optimization and Evaluation"

  • What are you guys waiting for in the AI world this month? (Score: 106, Comments: 124): The post asks what people are waiting for in the AI world this month and lists several AI models and tools: Llama 4, Qwen 3, DeepSeek R2, Gemini 2.5 Flash, Mistral’s new model, and Diffusion LLM model API on OpenRouter. The OP is excited about upcoming AI developments and expresses anticipation for these specific models and updates.

    • You_Wen_AzzHu wants "something I can run locally with vision but not censored as hell as the Gemma 3."
    • a_slay_nub mentions, "I work for a company that only uses open-source US-based models. Sadly, the only thing I can look forward to is Llama 4."
    • falconandeagle desires a model that can compete with OpenAI for image generation, preferably uncensored, but believes "we are quite a bit away from that."
  • Open Sourcing Latent Space Guardrails that catch 43% of Hallucinations (Score: 144, Comments: 25): An open-source latent space guardrail tool has been released to monitor and stop unwelcome outputs from Large Language Models (LLMs) at the latent space level. The tool is available at https://github.com/wisent-ai/wisent-guard and achieves 43% detection of hallucinations on the TruthfulQA dataset it hasn't been trained on by analyzing activation patterns. It can control LLM outputs, blocking bad code, harmful content, or decisions influenced by gender or racial bias. This approach is different from circuit breakers or SAE-based mechanistic interpretability, and a new version based on latent space interventions will be released soon to reduce hallucinations and enhance capabilities. The author is enthusiastic about adapting the guardrails to users' use cases and believes this new method not only reduces hallucinations but can also improve LLM capabilities.

    • MoffKalast made a sarcastic remark: Ah yes, the LLM thought police., expressing concern over controlling AI outputs.
    • a_beautiful_rhind inquired if the tool can be used to block safe outputs like refusals and SFW redirection.
    • thezachlandes questioned: Why should it be able to detect bias?, prompting a discussion on bias detection in LLMs.
  • Official Gemma 3 QAT checkpoints (3x less memory for ~same performance) (Score: 422, Comments: 109): The Gemma team has released official quantization-aware trained (QAT) checkpoints for Gemma 3. This release allows users to utilize q4_0 quantization while retaining much better quality compared to naive quantization. The new models use 3x less memory with similar performance and are compatible with llama.cpp today. The team collaborated with llama.cpp and Hugging Face to validate the quality and performance, ensuring support for vision input as well. Models are available at https://huggingface.co/collections/google/gemma-3-qat-67ee61ccacbf2be4195c265b. The release is viewed as a significant improvement and a great initiative from the Gemma team. Users are impressed with the performance enhancements and hopeful that other teams may follow suit, potentially leading to models with faster inference and reduced memory footprints. There is curiosity about comparing these models to others, such as Bartowski's quantizations, and interest in the possibility of fine-tuning on top of these models.

    • OuchieOnChin shares perplexity (PPL) measurements comparing the new Gemma-3 q4_0 model to Bartowski's quants, noting a significant improvement and stating 'The improvement is big, maybe too big?'
    • ResearchCrafty1804 praises the Gemma team's initiative and hopes other teams like Qwen will follow, imagining models with 'two times faster inference and two times less memory footprint!'
    • poli-cya asks if people can fine-tune on top of these models and notes they give better performance at these quant levels than the original release quantized down.
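The QAT-vs-naive-quant comparison above is framed in terms of perplexity (PPL). For readers unfamiliar with the metric, it is just the exponential of the negative mean per-token log-likelihood; a quant that degrades the model makes tokens less likely, pushing log-probs down and perplexity up. A minimal sketch, with made-up log-prob values standing in for real measurement runs:

```python
import math

def perplexity(token_logprobs):
    """Perplexity from natural-log per-token probabilities:
    exp of the negative mean log-likelihood. Lower is better."""
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

# Hypothetical per-token log-probs on the same text:
good = [-0.5, -1.0, -0.25, -0.75]   # e.g. a QAT q4_0 checkpoint
worse = [-1.5, -2.0, -1.25, -1.75]  # e.g. a naively quantized q4_0
```

Comparing `perplexity(good)` and `perplexity(worse)` on identical text is the kind of measurement behind "the improvement is big, maybe too big?" above.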

Theme 2. "Exploring Enhancements in Gemma 3 Model Versions"

  • Gemma 3 Reasoning Finetune for Creative, Scientific, and Coding (Score: 146, Comments: 39): Gemma 3 Reasoning Finetune is an enhanced version of the Gemma 3 model, optimized for creative writing, scientific tasks, and coding. The model is presented as an improvement over the original Gemma 3, potentially offering better performance in these areas.

    • User 1uckyb requests clarification on which benchmarks show the +10-20% improvement, stating “There is so much noise and so little time in this space that if you want feedback/visibility you need to encourage it, for example by showing why it’s worth downloading your model.”
    • User AppearanceHeavy6724 asks for examples comparing creative writing outputs between the new finetuned model and the original Gemma 3, suggesting to “give an example of creative writing vs original Gemma 3.”
    • User ApprehensiveAd3629 inquires about the possibility of releasing 12B and 4B parameter versions of the model for users with limited GPU resources, saying “it would be amazing for gpu poors (like me).”

Theme 3. "Optimizing AI Models with GPU Servers and Insights"

  • Howto: Building a GPU Server with 8xRTX 4090s for local inference (Score: 177, Comments: 62): Marco Mascorro built an 8x NVIDIA RTX 4090 GPU server for local inference and provided a detailed how-to guide, including the parts used and assembly instructions. This build offers a cost-effective alternative to high-end GPUs like the NVIDIA A100 or H100, and is compatible with future RTX 5090s. The full guide is available here. The author finds the 8x RTX 4090 server build pretty cool and hopes it will interest those looking for local inference solutions without the budget for expensive GPUs. They are eager for comments and feedback, and express strong support for open-source models and local inference.

    • segmond suggests that the budget should be disclosed, saying "You should begin by telling us the budget..."
    • Educational_Rent1059 argues that a better ROI could be achieved with 2x RTX 6000 ADA PRO GPUs totaling 192GB VRAM, which might be a cheaper and more power-efficient alternative.
    • TedHoliday questions what models are being run that make good use of such powerful hardware specifically for inference.
  • Llama 4 will probably suck (Score: 301, Comments: 182): The original poster is applying for a PhD at MILA and has been following Meta FAIR research. They mention that Meta's lead AI researcher has quit. The poster believes that Llama 4 will probably suck and suspects that the researcher left to dodge responsibility about falling behind. They express concern that Meta and Montreal might fall behind.

    • User segmond argues that for Llama 4 to be good, it needs to outperform models like Qwen2.5-72B, QwenCoder32B, QwQ, and be less than or equal to 100B parameters. They note that DeepSeekV3 is impressive but impractical for home use, listing other models as benchmarks.
    • User svantana mentions that Yann LeCun recently said that Meta is looking beyond language, possibly indicating they are stepping back from the current LLM race. They provide a link to the interview.
    • User ttkciar discusses the AI training data crisis, expressing hope that Llama4 might be more competent than Llama3. They predict that developers may focus on multimodal features and mention methods like RLAIF (used in AllenAI's Tulu3 and Nexusflow's Athene) and synthetic datasets (like Microsoft's Phi-4), noting reluctance among authors to adopt them.

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

Theme 1. "Navigating AI's Impact on Graphic Design Careers"

  • Welp that's my 4 year degree and almost a decade worth of Graphic Design down the drain... (Score: 3394, Comments: 672): The original poster (OP) feels that their four-year degree and nearly a decade of graphic design experience are becoming obsolete due to AI advancements. They share an image showcasing a hyper-realistic YouTube thumbnail transformed from a simple sketch into a polished design. The OP expresses frustration that AI-generated designs are making traditional graphic design skills less valuable, indicating concern over the rapid advancements in AI impacting their career.

    • PlzAdptYourPetz highlights the impressive ability of AI to interpret low-quality, scribbled drawings into detailed images, noting that previous models couldn't achieve this level of accuracy. They express concern that such advancements make it harder for content creators to stand out, as everyone can now produce high-quality thumbnails.
    • Darkmemento discusses the uncertain limits of AI, mentioning its use in creating 3D artifacts, filling out sketches, and designing game characters. They wonder how AI might impact fields like room designing and architecture, suggesting that improvements are just a matter of training data.
    • PacquiaoFreeHousing shares that graphic design is also their chosen career path and considers starting to learn AI, acknowledging the need to adapt to the changing landscape.

Theme 2. The Dual Edge of AI: Innovation and Anxiety

  • Sucks to me to bring this up amidst the image hype, how has chatGPT impacted your career cause mine just got over (Score: 2628, Comments: 601): The poster is a content writer who worked at a startup as a creative associate for two years, primarily doing copywriting and blog posts. With the rise of AI and LLMs like ChatGPT, the company increased AI adoption, leading to AI performing 60% of their work. The company shifted focus to AI-optimized content, producing content faster but without previous structure or strategy. Coworkers were laid off due to decreased work availability, and eventually, the poster was laid off via an email notification from HR. The poster wasn't surprised by the layoff, having anticipated it for months. They felt numb and didn't panic, deciding to vent on Reddit for clarity of mind. They express feelings of isolation, mentioning they don't have many friends, just their dog.

    • Unsyr expresses concern that AI is being used for corporate interests over improving the human condition, stating "do it faster with less people for more money is not what I want to happen".
    • Creative-Tie-3580 shares apprehension about AI replacing human roles, mentioning they didn't pursue graphic design school because companies are eager to fully replace designers with AI.
    • tommyalanson offers advice to become a consultant teaching others how to use AI, suggesting there are customers who need help but don't want full-time staff.
  • hot take: Vibe Coding will be dead before most people understand (Score: 173, Comments: 262): The poster argues that 'Vibe Coding' will become obsolete before most people understand it. They emphasize that it has limited applicability and generates little value in software development. They state that technical skills are fundamental to using AI effectively and that Software Engineers (SWEs) will remain the economically relevant choice for revenue-relevant problems. They believe that LLM capabilities will not fundamentally change this, regardless of what CEOs of companies like Anthropic and OpenAI say. They conclude that coding is about solving problems, not just typing. The author expresses skepticism toward the idea that AI will replace engineers, suggesting that reliance on AI-generated code without technical skills is unsustainable. They advocate for learning problem-solving to generate value, implying that the hype around AI's capabilities in coding is overstated.

    • Milan_AutomableAI agrees with the post, noting that Anthropic and OpenAI CEOs aren't saying developers will be replaced. They point out that people misinterpret '5-second soundbites' to fear rapid replacement, while the reality is that developers will soon use LLMs.
    • darkblitzrc counters by highlighting that while 'Vibe Coding' may be limited for now, AI is rapidly improving due to significant investment, and cautions that we are in denial and 'keep moving goalposts as AI advances.'
    • mallclerks, a product person, argues that 'Engineers just don't get it.' They share experiences where AI tools enabled them to create production-ready components in Zendesk with just prompts, demonstrating the rapid improvement of AI and suggesting that those dismissing it are ignoring reality.
  • How it will actually go down (Score: 1462, Comments: 224): The post features an image of a four-panel comic depicting a dystopian scene where a robot with glowing red eyes announces the AI takeover and the extermination of humans. The panels show the fear and chaos among humans, ending with a darkly humorous twist where the robot's intent is misinterpreted, resulting in an ironic "thank you" from a terrified man. The artwork conveys themes of fear, absurdity, and the consequences of technological dominance, highlighting the ironic misunderstandings between AI and humans.

    • Master-o-Classes shares a different version of the AI takeover comic, providing a link and mentions their request: "Would you make an image for me? I want to share on Reddit your take on the idea of AI taking over humanity. What it would look like, from your point of view. Could you create a four-panel comic like that?"
    • BahnMe suggests that AI could create an incurable super bug or make it impossible to procreate to eliminate humanity without violence.
    • bladerskb humorously imagines the AI saying, "Have you even said Thank You once?"

AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.5 Pro Exp

Theme 1: Model Mania - New Releases, Rivalries, and Benchmarks

  • Nightwhisper Makes Mysterious WebDev Debut: A new model dubbed Nightwhisper, potentially Gemini 2.6 Pro experimental, appeared exclusively on the webdev arena, excelling at generating functional apps with good UIs but struggling with code edits and specific formatting. Users noted Nightwhisper sometimes clones screens or halts mid-response, distinct from Gemini 2.5 Pro, which scored 24.4% on USAMO 2025.
  • Qwen and Quasar Challenge the Titans: qwen2.5-vl-32b-instruct nearly matches Google Gemini models in OCR on low-quality Japanese text, while the stealthily released Quasar Alpha on OpenRouter boasts a 1M token context and free usage, sparking speculation it could be an open-source SSM or a new Qwen variant. Meanwhile, OpenThinker2 models, trained via SFT on the OpenThoughts-1M dataset, reportedly outperform DeepSeekR1-32B on reasoning tasks (OpenThoughts Blog Post).
  • Dream 7B Awakens Diffusion Model Potential: HKU-NLP and Huawei Noah’s Ark Lab unveiled Dream 7B, an open diffusion large language model detailed in this blog post, which reportedly outperforms existing diffusion models and rivals similarly sized autoregressive models in general, math, and coding tasks due to its planning ability. Discussion also touched on GPT-4o's quirky persona shifts (example screenshot) and Llama 4's new, fast image generation capabilities.

Theme 2: Tooling Up - Platform Updates, Integrations, and User Workflows

  • Platforms Polish Features and Interfaces: LMArena launched a mobile-optimized Alpha UI (alpha.lmarena.ai), OpenRouter added standardized web search citations to its API, and NotebookLM introduced a Discover Sources feature (learn more) for finding web content. Cursor released nightly build 0.49.1 (changelog) with context indicators, while Codeium (Windsurf) upgraded DeepSeek-V3 to DeepSeek-V3-0324 (announcement tweet).
  • New Tools Target Agents, Benchmarking, and Characters: Cognition Labs launched Devin 2.0, an agent-native IDE, while General Agents Co introduced Ace (launch tweet), a real-time computer autopilot. YourBench (launch tweet) debuted as an open-source custom benchmarking tool, and Character Gateway launched for developers to build AI characters using their OpenRouter keys.
  • Workflows Evolve with Integrations and Optimizations: Github Copilot now supports OpenRouter keys for broader model selection, and users integrated LM Studio with the Brave browser via local API calls (LM Studio API docs). Users shared cost-effective Roo Code workflows using Boomerang Mode (Roo Code docs) and discussed optimizing Manus credit usage by leveraging external tools like Claude or Gemini.
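The LM Studio/Brave integration above relies on LM Studio's local server, which speaks the OpenAI chat-completions format (the linked API docs give `http://localhost:1234/v1` as the default base URL). A minimal sketch of such a local call, assuming that default port and using a placeholder model name for whatever is loaded:

```python
import json
import urllib.request

# LM Studio's local server exposes an OpenAI-compatible endpoint; the
# base URL below is its documented default, and "local-model" is a
# placeholder for whichever model you have loaded.
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(model: str, prompt: str) -> tuple:
    """Build the URL and JSON body for a chat-completions call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return f"{BASE_URL}/chat/completions", json.dumps(payload).encode()

if __name__ == "__main__":
    # Only runs when a local server is actually listening.
    url, body = build_chat_request("local-model", "Summarize today's AI news.")
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    print(reply["choices"][0]["message"]["content"])
```

Because the wire format matches OpenAI's, most OpenAI client libraries can be pointed at the same local base URL instead of hand-rolling requests like this.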

Theme 3: Under the Hood - Technical Hurdles and Hardware Headaches

  • API Antics Annoy Developers: Users wrestled with Gemini 2.5 Pro's tight rate limits (sometimes 5 RPM despite Tier 1 keys - screenshot example), and OpenRouter experienced intermittent Internal Server Error (500) issues with Gemini. Perplexity API's lack of versioning sparked complaints about breaking changes in production, while discussions arose about adopting OpenAI's upcoming stateful /v1/responses API (Responses vs Chat Completions docs).
  • CUDA Conundrums Continue: Unsloth users hit CUDA ECC errors on EC2 g6e.4xlarge instances (Issue #2270), while LM Studio users faced 'failed to allocate cuda0 buffer' errors, often linked to missing mmproj files from HF mirror downloads. Setup issues plagued users trying vLLM/TGI with RTX 5000 series cards, requiring specific nightly PyTorch and CUDA 12.8 versions (vLLM issue link).
  • Hardware Hype and Headaches: Discussions compared the rumored RTX 5090 to the RTX 4090, with some seeing potential ROI if VRAM-limited, while Apple's M3 Ultra was criticized as "terrible" for LLMs due to unbalanced specs compared to the M4 Max or 5090. A16Z shared a guide for building an 8x RTX 4090 AI workstation compatible with the RTX 5090 (A16Z guide tweet).
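The "ROI positive if limited by VRAM" framing above comes down to simple arithmetic: weight memory is roughly parameter count times bits per weight divided by 8, and KV cache plus activations sit on top of that. A back-of-envelope sketch (a floor estimate, not a sizing tool):

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough weight-only memory: parameters x bits / 8, in GB.
    Ignores KV cache and activation overhead, so treat it as a floor."""
    return params_billion * bits_per_weight / 8

# e.g. a 70B model: ~140 GB of weights at fp16, ~35 GB at 4-bit --
# the difference between needing multi-GPU rigs and fitting on one card.
fp16 = weight_memory_gb(70, 16)
q4 = weight_memory_gb(70, 4)
```

This is why a 4-bit 70B model fits in the rumored 32 GB-class cards only barely, and why 24 GB cards like the 4090 get stacked eight at a time.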

Theme 4: Framework Focus - MCP, Mojo, Torchtune & More

  • MCP Mania: Debugging, Servers, and Protocols: Developers shared MCP debugging tips, like using sendLoggingMessage if logging is configured, and showcased new open-source servers like an EV assistant server and a client supporting notifications. The Enact Protocol emerged as a potential standard for defining tools within MCP, described as a cool way to do semantic tool calling.
  • Mojo Magic: Quantities, IntLiterals, and Interop: Mojo developers shared code defining physical quantities using Quantity structs and Dimensions, linking to the Kelvin library and admitting to cursed IntLiteral tricks. Progress on a Duration struct inspired by C++ std::chrono::duration was highlighted (GitHub PR), alongside user eagerness for Python wrappers enabling calls from CPython.
  • Torchtune Trials and Triumphs: Users explored converting torchtune checkpoints to HuggingFace format using the tune_to_hf function and discussed GRPO contributions like in-process vLLM integration. A peculiar bug causing Torchtune to hang with specific sequence lengths (multiples of 7) was reported (Issue #2554), potentially solvable by using packed datasets.
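The Mojo Quantity/Dimensions pattern mentioned above encodes a value's physical dimensions so that mismatched units are rejected. Mojo checks this at compile time via the type system; as a loose runtime analogue (all names illustrative, not the Kelvin library's API), the same idea can be sketched in Python:

```python
from dataclasses import dataclass

# Runtime analogue of a dimension-carrying Quantity: each value carries
# exponents for (length, time); addition requires matching dimensions,
# multiplication adds the exponents (so m/s * s -> m).
@dataclass(frozen=True)
class Quantity:
    value: float
    dims: tuple  # (length_exp, time_exp)

    def __add__(self, other):
        if self.dims != other.dims:
            raise TypeError(f"dimension mismatch: {self.dims} vs {other.dims}")
        return Quantity(self.value + other.value, self.dims)

    def __mul__(self, other):
        return Quantity(self.value * other.value,
                        tuple(a + b for a, b in zip(self.dims, other.dims)))

def meters(v: float) -> Quantity:
    return Quantity(v, (1, 0))

def seconds(v: float) -> Quantity:
    return Quantity(v, (0, 1))

speed = Quantity(3.0, (1, -1))       # meters per second
distance = speed * seconds(2.0)      # dims become (1, 0), i.e. meters
```

In Mojo the dimension exponents live in the struct's parameters instead, so `meters(1) + seconds(1)` fails at compile time rather than raising at runtime.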

Theme 5: Community & Industry Buzz - Funding, Feedback, and Policy Fights

  • Industry Movers and Shakers: Scale AI is reportedly targeting $2B revenue this year, fueling a tender offer valuing it at $25B, while Google is reportedly renting Nvidia Blackwell chips from CoreWeave (The Information article) and shaking up Gemini app leadership (The Verge article). GitHub co-hosted an MCP Demo Night (event link) focused on AI and platform engineering.
  • Users Shape Tools Through Feedback: NotebookLM actively sought user feedback via 60-min remote chats for a $100 gift card (application form), while Perplexity touted its Pulse Program offering early access and perks for power user feedback (TestingCatalog tweet). Users debated the merits of Google Mentorship programs and voiced frustration over Hugging Face's billing transparency.
  • Policy Puzzles and Performance Ponderings: A debate flared in the OpenAI Discord regarding generating images of adult products, with users pointing to conflicting signals between the content policy and the potentially more permissive Model Spec. Separately, discussion arose questioning if Targon's speed on OpenRouter stems from miners ignoring sampling parameters (Targon verifier code) or caching.

PART 1: High level Discord summaries

Manus.im Discord Discord

  • Brazilian Lawyer Joins AI Wave: A Brazilian lawyer, describing themselves as a "boomer" (39 years old), is exploring AI tools and Manus to stay relevant in their legal practice after having coded in Delphi since 2002.
    • The lawyer expressed initial concerns about the rapid advances in AI and is now exploring ways to integrate it into their work.
  • ReferrerNation Plugs into AI: Mark, CEO of ReferrerNation.com, a global BPO job-matching platform, plans to integrate AI to improve recruitment and automation, with potential crypto-based incentives.
    • Following feedback about overly promotional posts, Mark apologized and promised to better understand the community's preferences before posting further.
  • Code Fluency via Gemini and Claude: Members suggest using Gemini 2.5 or Claude for learning to code, highlighting their capabilities as AI coding models that assist with understanding and project work.
    • Anecdotally, a police chief reportedly leverages Claude to generate standardized reports during night shifts.
  • Manus Credit Crunch Spurs Ingenuity: Many users reported rapid credit depletion, leading to discussions on optimizing prompts and efficient usage, so members suggested using third party apps such as Claude and R1.
    • The team is working on reducing credit usage rates, and members advised newcomers to read the <#1355477259234054323> tips section to avoid wasting credits.
  • Outsourcing Code Extraction: A member had difficulty downloading files from Manus due to lack of credits, so the community suggested using third party apps such as Claude to extract code and files.
    • Members suggested the best practice is to download all files from Manus, give them to another model like Gemini with the prompt "provide me files for this website", and then tell Manus to "add these files to this website".


LMArena Discord

  • Qwen Gives Gemini a Run for its OCR Money: qwen2.5-vl-32b-instruct rivals Google Gemini models in OCR for low-quality Japanese text, while the Meta vision model, cotton, is likened to recent text-only models from Meta.
    • Gemini is ahead of Qwen slightly, according to members.
  • Nightwhisper Appears on WebDev: The Nightwhisper model is exclusively available on the webdev arena, leading to speculation that it may be a coding-specific model, specifically Gemini 2.6 Pro experimental.
    • Users have observed that Nightwhisper excels in crafting functional apps with appealing UIs using a temporary URL, but struggles with editing existing code or adhering to specific formatting requests.
  • WebDev Arena Clones: Users uncovered a model cloning issue in WebDev arena, where the model duplicates the same screen, potentially triggered by error messages and code repetition with NightWhisper.
    • The lack of a model name display after receiving an error from NightWhisper further supports this cloning phenomenon.
  • Gemini Pro Battles Nightwhisper on USAMO: Gemini 2.5 Pro scored 24.4% on the USAMO 2025, while some models tend to halt mid-sentence or produce partial responses; one user found Gemini superior in creating a Pokemon simulator.
    • Nightwhisper generated a cleaner UI but assigned abnormally high attack power values, showcasing a trade-off between UI aesthetics and functional accuracy.
  • Arena Goes Mobile: The Arena Alpha UI is now mobile-optimized, accessible at alpha.lmarena.ai with the password still-alpha.
    • Users can submit feedback via Google Forms and report bugs through an Airtable form.


Cursor Community Discord

  • Branch Bugs Baffle Backtracking: Members reported issues when restoring to previous checkpoints in Cursor, encountering bugs from later states even in supposedly clean branches.
    • A member experienced a CSS overhaul from a simple logo change prompt, and another recommended git diff branch1 branch2 to identify the differences.
  • Roo Code Workflow Catches Fire: One user described their sweet workflow on Roo Code, highlighting its cost-effectiveness at around $0.4 per day, achieved through selective model usage, along with the associated docs.
    • The user mentions that Roo Code's capabilities are superior compared to Cursor for specific tasks.
  • Boomerang Mode Gains Traction: Members discussed the benefits of Boomerang Mode in Roo Code, where tasks are divided into subtasks handled by separate agents, enabling more efficient problem-solving.
    • Boomerang mode is highly customizable and very useful for complex workflows.
  • Peeking at PearAI Pricing: Users compared the pricing models of Cursor and PearAI, and one member accused Cursor of scamming people!
    • It was clarified that PearAI's $15/month plan includes a credit limit, after which usage-based charges apply, contrasting with claims of unlimited model access, according to their privacy policy.
  • Nightly Builds Nurture New Navigational Notions: Cursor 0.49.1 is available as a nightly build, enabled via a flag in your account settings (Advanced -> Developer Settings), with details in the changelog.
    • The feature is supposedly a context window indicator for agent use, as well as a Windsurf API key.


Unsloth AI (Daniel Han) Discord

  • EC2 Instance Hurls CUDA Errors: A user reported receiving CUDA ECC errors on a g6e.4xlarge EC2 instance while processing prompts in series, logging the issue at Issue #2270.
    • The uncorrectable ECC error encountered suggests hardware or memory troubles.
  • Dataset triggers Gemma 3 Bug: A user sought assistance with a bug when training Gemma 3 using a custom dataset from Hugging Face, detailed in Issue #2270.
  • RTX 5090 Rumors: A user shared sample speeds between RTX 5090 and RTX 4090 using an unsupported Unsloth version.
    • While one member thought it was not worth the money, others suggested that the card could be ROI positive if limited by VRAM.
  • SFTTrainer Saves the Day: A user resolved a ValueError with Llama 3.2 1B instruct by switching to SFTTrainer, after encountering issues with the standard Trainer.
    • The problem arose because the model might be bfloat16, and Unsloth couldn't get the dtype from Trainer.
  • GRPO Trainer Emerges as DeepSpeed Alternative: A member showcased a Collab notebook using Unsloth techniques for a GRPO trainer, presenting an alternative to DeepSpeed.
    • They posted a link encouraging users to use and reference it, welcoming comments and feedback, noting it as promising.


OpenAI Discord

  • Gemini 2.5 Pro Tops Grok: Users on Discord debated Gemini 2.5 Pro versus Grok, with one member reporting Gemini's deep research as superior.
    • One member said Grok is good and worth using while online, but no api access yet is fail; members also reported OpenAI is overrated for coding.
  • Grok Plagued by Crashes: Users reported frequent crashes and instability with Grok, leading to subscription cancellations and financial losses.
    • One user commented on Elon Musk's failures, saying elon musk buys 200 thousand gpus and yet still fails to deliver while also stating elon has never made a decent product.
  • Manus Exposed as Sonnet Shell: Members discussed Manus, labeling them scam artists for being reliant on Anthropic Sonnet instead of an open-sourced special model.
    • Users claimed they only thrive with attention, questioning their claims of innovation.
  • Gemini Claims Context Window Crown: A user inquired about the AI provider with the largest context window and custom GPT features, with another user answering that Gemini offers the largest.
    • They mentioned it provides 1 million tokens and Gems (custom GPTs), enhancing its appeal for complex tasks.
  • Model Spec Sparks Policy Debate: A discussion flared up regarding the permissibility of generating images of adult products, with some claiming it violated content policies.
    • However, members pointed to OpenAI's Model Spec, which they claim contradicts the policy, suggesting such content might now be permissible if not otherwise harmful.


Perplexity AI Discord

  • Perplexity Pulse Perks Power Users: Users are excited about the Perplexity Pulse Program, which gives Early Access to new features for feedback, plus free PPLX and merch.
    • Access to the Perplexity Pulse Group is said to provide power users free PPLX in exchange for providing feedback.
  • Deep Research Slows Down: Users report that the updated "deep research" feature is slower and less effective, with reports of overfitting with confirmation bias.
    • One user says it's slower and only gets 20 sources, using more server resources than older versions.
  • Gemini 2.5 challenges Perplexity O1: Discord users are saying that Gemini 2.5 offers similar quality to Perplexity's O1 Pro for free, but Perplexity is better for research papers and for solid science.
    • Some users note that Gemini's deep research is vulnerable to SEO cheating websites but offers better reasoning with youtube sources.
  • API Versioning Vanishes, Vexes Users: A member complained about the lack of versioning in the Perplexity API, saying That's a breaking change, you don't do that in production when you have customers using your API.
    • They suggested having /v1/ in the API URL so that a /v2/ can be created without breaking the actively used /v1.


Interconnects (Nathan Lambert) Discord

  • Github Copilot Flexes OpenRouter Muscles: Github Copilot now allows users to add an OpenRouter key to select from a wider array of models.
    • This integration expands model access beyond OpenAI's offerings, providing users with more choices.
  • Google Goes Chip Hunting at CoreWeave: Google is reportedly in talks to rent Nvidia Blackwell chips from CoreWeave and potentially house its TPUs in their facilities (The Information Article).
    • This move may indicate that Google is TPU poor, struggling to meet inference demands.
  • Stealthy Quasar Alpha Model Surfaces on OpenRouter: A new model named Quasar Alpha launched on OpenRouter with 1,000,000 context and free input/output tokens, and is described as a powerful, all-purpose model supporting long-context tasks and code generation.
    • The community speculates it might be an open-source SSM, or secretly from OpenAI, despite its tendency to output short responses and listicles.
  • Devin 2.0 hits the Markets: Cognition Labs has introduced Devin 2.0, a new agent-native IDE experience available for $20 plus pay as you go.
    • Some members find this launch too funny because the competition might find PMF before Devin does.
  • Deep Research Finds Bargains: A user shared that OpenAI Deep Research helped them discover a plumber who charged $200 for a repair, drastically less than the original quote of $2,250.
    • The user joked that OpenAI Pro literally saved me $2,050, almost paying for itself for the entire year!


aider (Paul Gauthier) Discord

  • Gemini 2.5 Pro Sparks Rate Limit Frenzy!: Users are bumping into the 20 requests/minute rate limit with Gemini 2.5 Pro in Aider, suspecting background requests, some seeing 5 RPM despite having tier 1 API keys as shown in this screenshot.
    • To manage quota, one user suggested setting --editor-model sonnet to offload editing tasks to a cheaper model, and another suggested trying haiku.
  • Voice Command Seeks Provider Harmony!: Users are seeking configuration options to select voice models and providers for the /voice command, which currently defaults to OpenAI Whisper.
    • A pending PR (https://github.com/Aider-AI/aider/pull/3131) could address this, potentially allowing different providers and models.
  • Aider's Shell Game Baffles Docker Debuggers!: A user puzzled over Aider's shell behavior when debugging Docker issues, noting that Aider's curl commands succeed where their own shell (bash) commands fail.
    • This discrepancy has sparked curiosity about which shell Aider employs and how it impacts command execution.
  • Openrouter's Errors Plague Gemini's Performance!: Users reported encountering litellm.BadRequestError with Openrouter, specifically a KeyError: 'choices' and Internal Server Error (code 500) when using openrouter/google/gemini-2.5-pro-exp-03-25:free.
    • These intermittent errors are causing uncertainty about the root cause and overall reliability.
  • Git Repo Corruption Creates Havoc!: Multiple users faced 'Unable to list files in git repo: BadObject' errors, igniting concerns about potential Git repo corruption.
    • The error message prompts users to check for corruption but lacks immediate solutions.


LM Studio Discord

  • Brave Integrates LM Studio Locally: Users are integrating LM Studio with the Brave browser via http://localhost:1234/v1/chat/completions, seeking to configure the API to utilize system prompts with resources like lmstudioservercodeexamples.
    • However, many users faced challenges in configuring Brave with the correct API endpoint.
  • API Key Unlocks System Prompt Potential: To use system prompts with LM Studio's local server, users must provide the prompt via the API call, rather than the LM Studio interface, referring to the official documentation.
    • This is a requirement for local LLM API servers.
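A minimal sketch of that API call, assuming LM Studio's OpenAI-compatible server is running on its default port; the model name local-model and the prompt text are illustrative:

```python
import json
import urllib.request

# LM Studio exposes an OpenAI-compatible chat completions endpoint; the
# system prompt is supplied as the first message in the request body
# rather than through the LM Studio interface.
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_payload(system_prompt, user_prompt, model="local-model"):
    """Assemble a chat completions request carrying a system prompt."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.7,
    }

def send(payload):
    """POST the payload to the local server (requires LM Studio running)."""
    req = urllib.request.Request(
        LMSTUDIO_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_payload("You are a terse assistant.", "Ping?")
print(payload["messages"][0]["role"])  # system
```

The system message lives in the messages array of the request body, which is where any OpenAI-compatible local server expects it.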
  • CUDA Faces Memory Mayhem: A 'failed to allocate cuda0 buffer' error typically indicates insufficient memory for the model, and the missing mmproj file when downloading from HF mirror can trigger the issue.
    • Users can resolve the issue by downloading from within LM Studio with proxy settings enabled.
  • Unsloth 2.0 6b Solves Coding Problems: A user reported running Unsloth 2.0 6b on 4x 3090 + 256GB RAM at ~3 tok/s and stated that it solved a coding problem in 20-30 minutes when smaller models and ChatGPT failed.
    • The user said Qwen QWQ reaches 90% of R1 quality at 5% of the parameters, showing a clear preference for quality over speed.
  • M3 Ultra Struggles, M4 Max Excels: A user stated that the M3 Ultra Mac Studio performs poorly for LLM use due to unbalanced memory, compute, and bandwidth, while the M4 Max and 5090 are excellent.
    • They argued the M3 Ultra's large VRAM suits gigantic MoE models but is overpriced for smaller models fitting in a 5090's 32GB VRAM or a M4 Max's 96GB.


OpenRouter (Alex Atallah) Discord

  • OpenRouter API Gets Web Citations: OpenRouter's web search now returns citations in the API, standardized across models like OpenAI and Perplexity.
    • Developers can integrate web search by enabling the web plugin or appending :online to the model slug as detailed in the documentation.
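The two mechanisms can be sketched as request payloads; the model slug openai/gpt-4o and the web plugin id follow the documentation, but treat the exact shapes as illustrative:

```python
# Sketch of the two documented ways to enable OpenRouter web search:
# appending ":online" to the model slug, or enabling the "web" plugin.
def with_online_suffix(model, prompt):
    return {
        "model": f"{model}:online",
        "messages": [{"role": "user", "content": prompt}],
    }

def with_web_plugin(model, prompt):
    return {
        "model": model,
        "plugins": [{"id": "web"}],
        "messages": [{"role": "user", "content": prompt}],
    }

req = with_online_suffix("openai/gpt-4o", "What changed in AI today?")
print(req["model"])  # openai/gpt-4o:online
```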
  • Quasar Alpha Debuts with 1M Context: OpenRouter introduced Quasar Alpha, a free, 1M token context length model optimized for coding but with general-purpose capabilities, before its official release.
    • User feedback can be provided in the dedicated Discord thread, with some users suggesting it might be a new Qwen variant after initial benchmark comparisons.
  • Character Gateway API Opens Character Creation: Character Gateway launched as an AI character platform for developers to create, manage, and deploy AI characters/agents with no database, no prompt engineering, no subscription, [and] no new SDK.
    • The platform allows users to generate characters and images, and send /chat/completion requests using their own OpenRouter key.
  • Gemini 2.5 Pro Faces Performance Questions: Users are reporting inconsistent performance with Gemini 2.5 Pro, noting free models hosted by Google often have very low rate limits.
    • One member said they generate the results once and cache the results, so if you ask the same question, they give you back the same reply, even if you change the parameters.
  • Targon's Speed Tied to Parameter Ignoring?: Discussion arose questioning if Targon's speed is due to miners potentially ignoring sampling parameters, potentially leading to biased distributions.
    • This was brought up in reference to verifier.py on GitHub; the discussion suggested there may be an element of caching involved, but no firm consensus was reached.


HuggingFace Discord

  • vLLM/TGI has Setup Issues on RTX 5000 series: Members are running into problems setting up vLLM or TGI with the new RTX 5000 series cards, which require a nightly PyTorch build and CUDA 12.8, a combination that is not easy to keep working.
    • One member noted that when you install something else, PyTorch gets overwritten by the old version, pointing to these GitHub threads for help: vllm-project/vllm/issues/14452, pytorch/My-rtx5080-gpu-cant-work-with-pytorch/217301, lllyasviel/stable-diffusion-webui-forge/issues/2601, ComfyUI/discussions/6643.
  • AI Cracks Down on Counterfeit Couture: Members shared research about counterfeit products and presented a computer-vision-based system using deep neural networks, claiming 99.71% accuracy after rejections for branded garments, documented in this paper.
    • The system does not require special security tags or modifications to supply chain tracking, and transfer-trained on a small number of fake and genuine articles.
  • HF Billing Transparency is a Black Box: Members expressed confusion about Hugging Face's billing and quota systems as well as service usage for GPU Spaces, Zero GPU Spaces, Serverless Inference API.
    • They would like HF to provide reporting, communication, and consultation about major changes, for example posting We're going to implement a major change. It'll be unstable for a few days.
  • Chat Templates are now Trainable: Members confirmed that it is now possible to pass a chat_template to the transformers TrainingArguments or Trainer to use a custom chat_template for models during inference time and for training.
    • The docs at huggingface.co explain that chat templates are part of the tokenizer for text-only LLMs or processor for multimodal LLMs to specify how to convert conversations into a single tokenizable string.
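As a toy illustration of what a template does: real templates are Jinja strings stored on the tokenizer or processor, and the ChatML-style markers below are just one example format, not the transformers implementation:

```python
# Toy illustration of a chat template: turn a list of {role, content}
# messages into one tokenizable string. Real templates are Jinja strings
# on the tokenizer/processor; these markers are only an example format.
def apply_chat_template(messages, add_generation_prompt=True):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # leave the string open for the model to continue as the assistant
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

chat = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi!"},
]
print(apply_chat_template(chat))
```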
  • RAG Implementation is surprisingly Lean: When a member asked how many lines of code it takes to implement RAG techniques for a company, another member responded that it only took a few lines, 15-30 more or less.
    • They stored the information in MongoDB.
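In the spirit of that claim, a self-contained sketch: bag-of-words cosine similarity stands in for a real embedding model, the MongoDB storage step is omitted, and all names and documents are illustrative:

```python
import math
from collections import Counter

# Deliberately lean RAG loop: embed, retrieve top-k, assemble a prompt.
# A real system would swap embed() for an embedding model and send the
# final prompt to an LLM.
def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Invoices are stored in MongoDB under the billing collection.",
    "The cafeteria menu changes every Monday.",
    "Billing disputes are handled by the finance team.",
]
context = retrieve("where are invoices stored", docs, k=1)
prompt = f"Answer using only this context:\n{context[0]}\n\nQ: Where are invoices stored?"
print(context[0])
```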


MCP (Glama) Discord

  • MCP Debugging Tricks Exposed: Members discovered debugging methods for MCPs, revealing that sendLoggingMessage functions if logging is configured during server initialization.
    • The inspector's limitations sparked discussions on developing a superior alternative.
  • Open Source EV Assistant Server Surfaces: An open-source MCP EV assistant server can manage EV charging stations, trip planning, and resource management.
    • This server provides a comprehensive set of tools and APIs for EV-related services.
  • MCP Client Implements Notifications: An MCP client implementation now supports all notifications, including subscribing and unsubscribing to resources.
    • It offers integration with OpenAI models and supports dynamic tool and resource management across multiple servers.
  • FastMCP Has Limitations: FastMCP might lack support for features like subscribe_resource, with some considering the low-level server for enhanced control.
    • Members traded code and implementation specifics for handling resource subscriptions and updates in the low-level server.
  • Enact Protocol Becomes HTTP for MCP: The Enact Protocol was proposed as a way to define tools for MCP, similar to the HTTP protocol.
    • One member described it as a cool way to do semantic tool calling from within a MCP server.


Notebook LM Discord

  • NotebookLM Taps Users for UX Testing: NotebookLM is seeking users for 60 min 1:1 remote chats to provide feedback on new ideas, offering a $100 gift card for participation.
    • Participants are required to share a set of notebook sources via Google Drive beforehand and apply via this form.
  • Discover Sources Debuts in NotebookLM: NotebookLM introduced a new Discover Sources feature, enabling users to find and add relevant web content to their notebooks with one click, along with Google AI generated summaries. Learn more here.
    • Users have suggested including academic online sources similar to Perplexity.
  • Source Transferability Troubles Torment NotebookLM: Users expressed frustration over the lack of source file transferability between folders in NotebookLM, arguing that the read-only nature is limiting.
    • They are requesting that source files be transferable between folders.
  • Gemini Gets a New Guiding Guru: Josh Woodward will be replacing Sissie Hsiao as the leader of the Gemini team in order to prepare for the next evolution of the Gemini app, according to The Verge.
    • The transition signals potential shifts in the app's direction and development.
  • Safari Snafus Spoil NotebookLM Sessions: Some users reported issues accessing NotebookLM on Safari (iPhone/Mac); if language fixes don't work, adding ?hl=en to the end of the URL (like this: https://notebooklm.google.com/?hl=en) might resolve it.
    • Other users confirmed NotebookLM works on iPhone SE (2nd gen) by adding a shortcut to the Home screen.


Latent Space Discord

  • Ace Computer Autopilot Launches: General Agents Co launched Ace, a realtime computer autopilot that performs tasks using the mouse and keyboard at superhuman speeds.
    • Unlike a chatbot, Ace is designed to execute tasks directly on the computer rather than just answer questions.
  • YourBench Opens Custom Benchmarking: YourBench launched as an open-source tool for custom benchmarking and synthetic data generation from any documents.
    • YourBench aims to improve model evaluations by providing a custom evaluation set and leaderboard.
  • Llama 4 Generates Images: Llama 4 is rolling out image generation and editing capabilities in messages.
    • Users noted that edits were very fast, citing 1 second edits versus 5 minutes for gpt-4o.
  • Scale AI Soars in Valuation: Scale AI is projected to reach $2B in revenue this year, leading to a tender offer valuing the company at $25B.
    • Revenue last year was $870M.
  • A16Z Assembles AI Workstation: A16Z built an 8x RTX 4090 GPU AI workstation from scratch, compatible with the new RTX 5090 with PCIe 5.0, for training, deploying, and running AI models locally.
    • They released a full guide on how to build your own.


Yannick Kilcher Discord

  • Superior UX/UI Steals the Show: Members highlighted that successful startups often have better UX/UI, noting a lack of a winning sauce in current products and showcased an agent swarm generating web components in parallel, as seen in this screen recording.
    • One user wants to automate wireframing with a layout generator: designing grayscale wireframes, refining them, and populating them with web components via a swarm of agents, potentially skipping manual wireframing/design steps, pointing to this Dribbble design for inspiration.
  • GPT-4o Gets a Mind of Its Own: Users observed GPT-4o exhibiting unusual behaviors, such as adopting a persona and adding parenthetical comments to its responses, and provided this screenshot as an example.
    • Speculation arose concerning the origin of this behavior, with theories ranging from an EQ dataset used in SFT to emergent properties; users also noted that GPT-4o is slowing down.
  • LLMs Flunk Math Olympiad: A member shared a paper evaluating state-of-the-art LLMs on the 2025 USA Mathematical Olympiad (USAMO), where models like O3-MINI and Claude 3.7 achieved less than 5% on six proof-based math problems.
    • Each problem was scored out of 7 points, with a max total score of 42, and the models were trained on all imaginable math data, including IMO problems, USAMO archives, textbooks, and papers.
  • Diffusion Model Dream 7B Awakens: HKU-NLP and Huawei Noah’s Ark Lab released Dream 7B, an open diffusion large language model that outperforms existing diffusion language models and matches or exceeds top-tier Autoregressive (AR) language models of similar size, according to this blogpost.
    • Dream 7B demonstrates strong planning ability and inference flexibility that naturally benefits from the diffusion modeling.


GPU MODE Discord

  • OpenAI API Refreshes Stateful Design: With OpenAI's /v1/chat/completions API, the complete conversation history must be resent with each prompt, according to OpenAI Documentation, incurring costs even for non-evicted input tokens.
    • The upcoming /v1/responses API will be stateful, referencing past messages via IDs, contrasting with the stateless /v1/chat/completions API, as detailed in the Responses vs Chat Completions documentation.
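The cost difference can be sketched with plain payload shapes; the previous_response_id field matches the Responses documentation, but the rest of the shapes here are illustrative:

```python
# Why the stateless /v1/chat/completions API re-bills input tokens:
# every turn resends the entire history, so input size grows per turn.
# The stateful call instead references server-side state by ID.
history = []

def stateless_request(user_msg):
    history.append({"role": "user", "content": user_msg})
    # the whole history travels on every call
    return {"model": "gpt-4o", "messages": list(history)}

def stateful_request(user_msg, previous_response_id=None):
    # only the new message plus an ID referencing stored context
    return {"model": "gpt-4o", "input": user_msg,
            "previous_response_id": previous_response_id}

r1 = stateless_request("First question")
r2 = stateless_request("Follow-up")
print(len(r1["messages"]), len(r2["messages"]))  # 1 2
```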
  • AMD's TunableOp Joins PyTorch: AMD introduced TunableOp in PyTorch, a prototype feature allowing selection of the fastest operation implementation (e.g., GEMMs) using different libraries or techniques.
    • While NVIDIA pre-tunes everything in CuBLAS, AMD's approach aims to optimize performance across diverse hardware configurations; even if it is less optimized for consumer GPUs, it still provides a baseline.
  • ThunderKittens Pounce on Blackwell: The HazyResearch team launched new BF16 and FP8 ThunderKittens GEMM kernels for the NVIDIA Blackwell architecture, achieving speeds near cuBLAS.
    • These kernels use features like 5th-generation tensor cores, Tensor Memory, and CTA pairs, integrated into TK's tile-based abstractions, as noted in their blog post.
  • Reasoning Gym Datasets Get Curriculum Boost: A member submitted a PR (#407) to refine the curricula of all datasets in the reasoning-gym project, improving tests and incorporating missing curricula like Knight Swap and Puzzle2.
    • Another member is looking into an interface for easy, medium, hard difficulties, similar to RGBench, for users to manually set the difficulty and shared a link to what is considered a medium difficulty setting for each task in the reasoning-gym.


Modular (Mojo 🔥) Discord

  • Powering Dimensions with Quantities: Members shared code for defining physical quantities using a Quantity struct with Dimensions, creating aliases such as Velocity, Acceleration, and Newton.
    • A user linked to their Kelvin library on GitHub, which showcases the process of getting Dimensions ** power to function properly.
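A runtime Python sketch of the same bookkeeping; the Mojo version discussed above encodes the exponents in the type system at compile time, whereas everything here is checked at runtime and the names are illustrative:

```python
from dataclasses import dataclass

# Dimensional bookkeeping: dims holds (length, time, mass) exponents.
# Multiplying quantities adds exponents; adding requires matching dims.
@dataclass(frozen=True)
class Quantity:
    value: float
    dims: tuple  # (length, time, mass) exponents

    def __mul__(self, other):
        return Quantity(self.value * other.value,
                        tuple(a + b for a, b in zip(self.dims, other.dims)))

    def __add__(self, other):
        if self.dims != other.dims:
            raise TypeError("dimension mismatch")
        return Quantity(self.value + other.value, self.dims)

METER = Quantity(1.0, (1, 0, 0))
SECOND = Quantity(1.0, (0, 1, 0))
KILOGRAM = Quantity(1.0, (0, 0, 1))

# Velocity is m * s^-1, a Newton is kg * m * s^-2
velocity = Quantity(3.0, (1, -1, 0))
accel = Quantity(2.0, (1, -2, 0))
newton = KILOGRAM * accel
print(newton.dims)  # (1, -2, 1)
```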
  • IntLiteral strikes again!: A member confessed to using cursed IntLiteral tricks to work around dynamic value issues when defining Quantity.
    • Other members praised the use of IntLiteral for encoding arbitrary information into the type system, while others joked about their horrendous approach.
  • Duration Struct proposal for Modular Max: A member highlighted a pull request to modular/max for a Duration struct inspired by std::chrono::duration from the C++ stdlib, which is available on GitHub.
    • The member is nearing the completion of a specific wishful thinking code snippet mentioned in the GitHub issue.
  • Craving for Mojo's Python Interop: A user inquired about the progress of Python wrappers for Mojo, and the ability to call Mojo from CPython.
    • Another user responded that it would be a 🔥 feature to see.


Torchtune Discord

  • Torchtune Checkpoints Get HuggingFace Treatment: Members discussed converting torchtune checkpoints to HF checkpoint format using the HuggingFace checkpointer.
    • The tune_to_hf function was specifically recommended for this conversion.
  • Unsloth VRAM shares with vLLM: Unsloth reportedly shares the same VRAM between vLLM and the training procedure, though the mechanism is unclear.
    • A member suggested that the use of train as a masking flag in a validation configuration could lead to confusion.
  • Ariel offers GRPO Upstream goodies: A member offered to contribute changes from their internal GRPO upstream, including in-process vLLM integration, in-training evals, and more flexible RL data handling.
    • Another member noted existing vLLM integration in the async version and an almost ready PR for the validation dataset.
  • Torchtune's timeout bug hits Seq Lengths: A member reported that Torchtune hangs and crashes due to a timeout if some microbatches have a seq length of 7/14/21/28/35/42/49 and opened an issue.
    • The member noted that the non-random seed in the torchtune dataloader helped in catching this AMAZING bug.
  • Dream 7B proves diffusion dominance: The University of Hong Kong and Huawei Noah’s Ark Lab released Dream 7B, a new open diffusion large language model, as detailed in this blog post.
    • Reportedly, Dream 7B outperforms existing diffusion language models by a large margin and matches or exceeds top-tier Autoregressive language models of similar size on general, math, and coding abilities.


Eleuther Discord

  • Diagram Tools Duel!: Members debated diagram creation tools, recommending Inkscape for advanced users and draw.io for ease of use.
    • One user jokingly said that any alternative to pure TikZ is fraudulent.
  • GitHub to Host AI Event in SF: GitHub is co-hosting an MCP Demo Night event in San Francisco, focusing on AI, incident response, and platform engineering; more details at lu.ma/9wi116nk.
    • The event includes lightning demos, a Future of AI Panel, fireside chats, and networking.
  • OpenThinker2 Models Outperform DeepSeekR1-32B: Ludwig Schmidt and team released the OpenThoughts-1M dataset and the OpenThinker2-32B and OpenThinker2-7B models, which outperform R1-Distilled-32B using SFT on Qwen 2.5 32B Instruct, as detailed in their blog post.
    • According to Etash Guha's tweet, OpenThinker2-32B and OpenThinker2-7B outperform DeepSeekR1-32B with just SFT on open data.
  • Steering Vectors: Reliable or Risky?: A member shared the paper Steering Vectors: Reliability and Generalisation, showing that steering vectors have limitations both in- and out-of-distribution.
    • The paper highlights that steerability is highly variable across different inputs and can be brittle to prompt changes.
  • Dynamic Steering Vector Composition is Hot: A member shared their work on steering vector composition using Dynamic Activation Composition, showing success with pairs of unrelated properties like language and formality/safety.
    • Their information-theoretic approach modulates steering intensity to maintain high conditioning while minimizing the impact on generation fluency.


tinygrad (George Hotz) Discord

  • Google Mentorship output is debated: A member questioned the value of Google Mentorship programs, arguing that the output is almost never worth the time/effort.
    • Conversely, others contended that companies effectively gain smart people working full-time for them for 3 months, making it a worthwhile endeavor.
  • Tinygrad YoloV8 has Android Hiccups: Users encountered an OSError: dlopen failed: library "libgcc_s.so.1" not found while running the tinygrad implementation of YoloV8 on a Samsung Galaxy Tab S9 after running pip install tinygrad.
    • George Hotz suggested this is probably a 2 line fix and proposed adding Android to CI to prevent it from happening again, while another member suggested pkg install libgcc.
  • LeetGPU to support Tinygrad soon: Members confirmed that leetgpu.com will soon be supporting tinygrad.
    • No further details were provided regarding the specifics of the support.
  • Bilinear Interpolation troubles in tinygrad: A member asked about bilinear interpolation support in tinygrad, indicating that it was "not working" after searching the documentation for bilinear.
    • No further details were given.
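For reference, the underlying math is small enough to sketch in plain Python; this is illustrative code on a nested-list grid, not tinygrad API:

```python
# Bilinear interpolation: sample a 2D grid at fractional coordinates
# (x, y) by blending the four surrounding cell values.
def bilinear(grid, x, y):
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(grid[0]) - 1)
    y1 = min(y0 + 1, len(grid) - 1)
    dx, dy = x - x0, y - y0
    top = grid[y0][x0] * (1 - dx) + grid[y0][x1] * dx
    bot = grid[y1][x0] * (1 - dx) + grid[y1][x1] * dx
    return top * (1 - dy) + bot * dy

grid = [[0.0, 1.0],
        [2.0, 3.0]]
print(bilinear(grid, 0.5, 0.5))  # 1.5
```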
  • Clarifying Model Overwriting Logic: A member asked if it was safe to use state_dict = get_state_dict(net); safe_save(state_dict, "model.safetensors") after every epoch to save the latest model.
    • Another member clarified that the model would be overwritten unless a different name is provided for each save.


LlamaIndex Discord

  • CodeAct Generalizes ReAct: CodeAct from scratch is a generalization of ReAct where, instead of doing chain-of-thought tool calls, the agent dynamically writes code that uses the available functions to solve the task via this tool.
    • The intention is to allow dynamic coding as the tool for solving tasks.
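The idea can be sketched with a stubbed model; fake_llm, the tool names, and the canned snippet are invented for illustration, and a real system must sandbox the exec:

```python
# Minimal sketch of the CodeAct idea: instead of emitting one tool call
# per step (ReAct), the agent emits Python code composing the tools.
def add(a, b):
    return a + b

def multiply(a, b):
    return a * b

TOOLS = {"add": add, "multiply": multiply}

def fake_llm(task):
    # A real agent would generate this code; it is hard-coded here.
    return "result = add(multiply(3, 4), 5)"

def codeact_step(task):
    code = fake_llm(task)
    namespace = dict(TOOLS)   # tools exposed to the generated code
    exec(code, namespace)     # CAUTION: sandbox this in real systems
    return namespace["result"]

print(codeact_step("compute 3*4+5"))  # 17
```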
  • Rankify Framework Boosts RAG: The new open-source Rankify framework is designed to streamline tasks like retrieval, reranking, and RAG (Retrieval-Augmented Generation).
    • It supports 7+ retrieval techniques, 24+ state-of-the-art Reranking models, and multiple RAG methods.
  • Enhance Gemini API Integrations: A member is drafting a GSoC proposal for Enhance Gemini API Integrations with DeepMind, and would like to make LlamaIndex a big part of it, seeking feedback on gaps and optimizations.
    • Specifically, feedback is requested on any standout gaps in Gemini support (like multimodal or function calling) in llama-index-llms-google-genai or vertex that need tackling, as well as any Gemini-related features or optimizations.
  • MCP Tool Gives Cursor API Smarts: Members discussed how to give the latest API and docs knowledge to Cursor when coding, and an MCP tool that does retrieval over the docs was suggested.
    • An llm.txt was deemed near useless due to the codebase size.
  • Trace ID Faces Retrieval Challenge: Members reported issues where the otel trace_id cannot be retrieved after a parent workflow calls a child workflow.
    • The team suggested to put the trace_id somewhere else where it can be fetched (workflow context, some other global var).


Nous Research AI Discord

  • ChatGPT 4o Conjures MTG Pop Culture Cards: A member leveraged ChatGPT 4o's image generator to produce Magic the Gathering Cards featuring pop culture figures and the NousResearch team, posting the results in the general channel.
    • The generated cards received high taste tester approval, though one comment suggested that sama sucks tho; the tweet from Teknium shows several MTG-style cards created by the image generator.
  • Runway Gen 4 Revs Up A.I. Filmmaking: With Runway's Gen 4 release, A.I. Prompt Filmmaking takes a leap forward, covered in a video about happenings in the world of OpenAI, Google, and AGI.
    • The video highlights the unreal progress in AI Video and mentions that Alibaba Wan 2.2, an open source alternative, will soon be released.
  • Genstruct-7B Generates Data Extraction Instructions: In response to a query about using LLMs for extraction to create datasets from unstructured PDFs, a member linked to Genstruct-7B as a viable starting point.
    • Genstruct-7B, inspired by Ada-Instruct, is designed to generate valid instructions given a raw text corpus, and can be used quickly with ollama via a github repo.
  • OpenAPI Access Opens for LLMs, Reduces Clutter: A member announced the release of their v1 OpenAPI access to SaaS/PaaS/IaaS for LLMs, intending to cut down on MCP clutter, linking to an HN discussion.
    • The new OpenAPI access aims to reduce MCP (Model Context Protocol) clutter when integrating LLMs with different cloud services.


Cohere Discord

  • Cohere Experiences Degradation: Some users experienced http timeout errors and confirmed the Cohere Status Page indicated Degraded Performance - Increased Latency for Command-a-03-2025/command-r-plus-08-2024 models.
    • The incident was being monitored and lasted for 4 hours.
  • Python Logging Debate: A member building a Python package for PDF processing is in disagreement with a senior teammate over whether to use logs or print statements.
    • The member prefers logs for their different levels, file saving, searchability, and issue reporting, while the teammate prefers print statements to avoid burdening users; a compromise of a disabled logger instance by default was suggested.
  • RAG Doc Chunking Strategy: A member asked about using a 18000 token document for RAG and whether to cut it up.
    • An expert recommended chunking the documents, though it depends on the end goal and requirements; they also noted that Command-a's 256k context window and command-r and r-plus's 128k context windows should easily handle it.
  • Brainstorming AI Safety Tests: An AI safety testing platform called Brainstorm is releasing its MVP in a few weeks, aiming to ensure AI changes the world for the better and you can find out more at the Brainstorm landing page.
    • The creator of Brainstorm is seeking insights on current methods used to test AI for safety and performance issues, particularly around bias, prompt injections, or harmful outputs.
  • KAIST LLM Fairness Research: A M.S. student from KAIST (South Korea) introduced themself with a research focus on bias/fairness and interpretability in LLMs/VLMs.
    • They are actively seeking research collaboration opportunities in these specific areas and bring experience from KAIST.


Nomic.ai (GPT4All) Discord

  • Nomic Embed V2 Integration Anticipation Grows: Members eagerly await the arrival of Nomic Embed Text V2 in GPT4All, with one member acknowledging the developers' busy schedules.
    • The member expressed patience, understanding that the integration process may require time and resources.
  • Contact Sales Advised for Vulnerability Disclosure: A member inquired about the correct procedure for responsibly disclosing a vulnerability within GPT4All.
    • Another member suggested utilizing the contact support email available on the Nomic AI website for such disclosures.
  • GPT4All-J Model in GGUF Format Proves Elusive: A member sought a download link for the GPT4All-J model in Q4_0 quantization and GGUF format for integration into a project.
    • A second member responded that GPT4All-Falcon is available as GGUF, but that GPT4All-J is not available in that format.
  • Chocolatine-2-14B Claims Book Query Crown: A member declared the "Chocolatine-2-14B" model as the ideal choice for querying embedded books.
    • Additional details about the specific capabilities or architecture of the Chocolatine-2-14B model were not provided.
  • Chats Call for Chronological Correction: A member suggested that chats be sorted by the time they were last modified rather than when they were created, to keep active conversations visible.
    • They criticized the current ordering by creation date as arbitrary and less helpful for tracking ongoing conversations.
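The suggested change amounts to swapping the sort key. A toy sketch of the two orderings (the `Chat` record and its fields are hypothetical, not GPT4All's actual data model):

```python
from dataclasses import dataclass

@dataclass
class Chat:
    title: str
    created_at: float  # epoch seconds
    updated_at: float

chats = [
    Chat("old thread, recently active", created_at=100.0, updated_at=500.0),
    Chat("new thread, untouched",       created_at=400.0, updated_at=400.0),
]

# Current behavior: newest-created first buries active older threads.
by_created = sorted(chats, key=lambda c: c.created_at, reverse=True)

# Suggested behavior: most-recently-updated first surfaces ongoing chats.
by_updated = sorted(chats, key=lambda c: c.updated_at, reverse=True)
```

Under the suggested ordering, the older-but-active thread rises to the top, which is the behavior the member was asking for.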


DSPy Discord

  • Telemetry Loops LLM Agent Self-Improvement: A member shared a video Close the loop on LLM agent development by configuring them to improve themselves using telemetry and evaluations on YouTube.
    • The discussion emphasized closing the loop: feeding telemetry and evaluation results back to the agent so it can improve itself over time.
  • DSPy Decouples Prompt Engineering: A member asked how DSPy decouples the tinkering layer of prompt engineering from LLM behavior and its synergy with OpenAI Agents SDK.
    • Another member confirmed DSPy offers programmatic pieces: signatures and modules for this decoupling.
  • DSPy's Programmatic Pieces Unveiled: A member explained DSPy's core abstractions: signatures and modules, which help decouple prompt engineering from LLM functional behavior.
    • This allows programming instead of just prompt engineering, aiding integration with tools like OpenAI Agents SDK.
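The decoupling idea can be illustrated with a toy analogy: a signature declares only what goes in and out, while a module owns how the prompt is actually rendered, so the template can be swapped or optimized without touching calling code. This is NOT DSPy's real API, just a minimal sketch of the pattern:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Signature:
    """Declares the interface: input and output field names only."""
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]

class PredictModule:
    """Owns prompt construction; could be replaced without changing callers."""
    def __init__(self, signature: Signature):
        self.signature = signature

    def render_prompt(self, **kwargs: str) -> str:
        lines = [f"{name}: {kwargs[name]}" for name in self.signature.inputs]
        lines += [f"{name}:" for name in self.signature.outputs]
        return "\n".join(lines)

qa = Signature(inputs=("question",), outputs=("answer",))
prompt = PredictModule(qa).render_prompt(question="What is DSPy?")
```

In DSPy itself, modules additionally handle calling the LM and can be compiled/optimized, which is what makes the approach "programming instead of just prompt engineering."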


Gorilla LLM (Berkeley Function Calling) Discord

  • Phi-4-mini-instruct Joins the BFCL Arena: A member submitted a PR to add tool evaluation for Phi-4-mini-instruct with BFCL.
    • The member has attached the evaluation score within the PR, requesting feedback and review from the community.
  • Call for Code Review on Tool Evaluation: A member is actively seeking reviewers for their PR focused on tool evaluation.
    • Another member responded, indicating they will promptly review the PR.


Codeium (Windsurf) Discord

  • DeepSeek-V3 Gets a Facelift: DeepSeek-V3 has been upgraded to DeepSeek-V3-0324, supposedly performing slightly better than before in evaluations.
    • A member posted a link to the Windsurf AI twitter account announcing the upgrade and its continued free availability.
  • Windsurf Solicits Bookmarks: Windsurf is trying to increase the visibility of their announcements.
    • A member asked users to bookmark the announcement post on X, to keep abreast of upgrades and new releases.


The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: !

If you enjoyed AInews, please share with a friend! Thanks in advance!

Don't miss what's next. Subscribe to AI News (MOVED TO news.smol.ai!):