AI News (MOVED TO news.smol.ai!)

Archives
April 5, 2025

[AINews] not much happened today

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜


Apply for AIEWF talk slots!

AI News for 4/3/2025-4/4/2025. We checked 7 subreddits, 433 Twitters and 30 Discords (230 channels, and 7491 messages) for you. Estimated reading time saved (at 200wpm): 629 minutes. You can now tag @smol_ai for AINews discussions!

It's been a quiet week, so why not fill out the AI Engineer World's Fair Call For Speakers?

Tracks across:

  • AI Architects
  • /r/localLlama
  • Model Context Protocol (MCP)
  • GraphRAG
  • AI in Action
  • Evals
  • Agent Reliability
  • Retrieval, Search, and Recommendation Systems
  • Security
  • Infrastructure
  • Generative Media
  • AI Design & Novel AI UX
  • AI Product Management
  • Autonomy, Robotics, and Embodied Agents
  • Computer-Using Agents (CUA)
  • SWE Agents
  • Vibe Coding
  • Voice
  • Sales/Support Agents
  • The Great AI Debates
  • Anything Else

Apply here!


The Table of Contents and Channel Summaries have been moved to the web version of this email!


AI Twitter Recap

Model Releases and Announcements

  • OpenAI's plans for model releases have shifted: @sama announced that o3 and o4-mini will be released in a couple of weeks, followed by GPT-5 in a few months. The delay is attributed to making GPT-5 much better and challenges in smoothly integrating everything, along with ensuring sufficient capacity for expected demand.
  • DeepSeek's Self-Principled Critique Tuning (SPCT) improves inference-time scalability for generalist reward modeling: @iScienceLuvr reports that DeepSeek's new method, SPCT, enhances the quality and scalability of Generalist Reward Models (GRMs), outperforming existing methods and models in various RM benchmarks.
  • @nearcyan asserts that Anthropic's Sonnet 3.7 remains the best coding model.
  • Google's Gemma 3 can be tried in KerasHub.
  • Qwen 2.5 VL powers a new Apache 2.0 licensed OCR model: @reach_vb.

Gemini 2.5 Pro

  • Gemini 2.5 Pro is in public preview for scaled paid usage and higher rate limits: @_philschmid announced the move, and Google is offering developers increased rate limits for testing production-ready apps, now available in Google AI Studio, as noted by @Google.
  • Gemini 2.5 Pro is becoming a daily driver for some: @fchollet notes it is probably the best model for most tasks except image generation, where it is still good.
  • Pricing is out for Gemini 2.5 Pro: @scaling01 shares the cost per million tokens: Input at $1.25 ($2.50 for context over 200k) and Output at $10 ($15 for context over 200k).
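
As a back-of-envelope check on what this tiered pricing means per request, here is a small sketch; the rates are taken from the tweet above, while the `request_cost` helper and the token counts in the example are purely illustrative:

```python
# Hypothetical cost calculator for the Gemini 2.5 Pro pricing quoted above.
# Rates (USD per million tokens): input $1.25 / output $10 for prompts up to
# 200k tokens of context; $2.50 / $15 once the context exceeds 200k.

def request_cost(input_tokens: int, output_tokens: int) -> float:
    over_200k = input_tokens > 200_000
    input_rate = 2.50 if over_200k else 1.25
    output_rate = 15.00 if over_200k else 10.00
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A 50k-token prompt with a 2k-token answer:
print(round(request_cost(50_000, 2_000), 4))   # 0.0825
```

Even a long-context call stays under a dollar at these rates, which is why the pricing drew favorable comparisons to Sonnet.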

AI Model Capabilities and Benchmarks

  • Flexing architectural advantage: @teortaxesTex notes OpenAI's willingness to flex their architectural advantage.
  • FrontierMath benchmark challenges AI: @EpochAIResearch describes how their FrontierMath benchmark challenges AI to perform long-form reasoning and develop a coherent worldview, crucial steps for broader reasoning capabilities and scientific thinking.
  • DeepSeek's inference scaling paper shows that a 27B model built on Gemma-2 is enough to match R1: @teortaxesTex.
  • A new paper explains why LLMs obsessively focus attention on the first token, known as an attention sink: @omarsar0 reports that sinks act as no-ops that reduce token interaction and preserve representation diversity across layers. Perturbation tests in Gemma 7B show <s> significantly slows the spread of changes, and in LLaMa 3.1 models, over 80% of attention heads show strong sink behavior in the 405B variant.
  • MegaScale-Infer is presented as an efficient and cost-effective system for serving large-scale Mixture-of-Experts (MoE) models, achieving up to 1.90x higher per-GPU throughput than state-of-the-art solutions: @iScienceLuvr.
  • Discrete diffusion models are experiencing a resurgence: @cloneofsimo highlights that discrete diffusion is winning over AR recently, with LLaDA-8B, Dream-7B, and UniDisc.
  • GPT-ImgEval is introduced as a comprehensive benchmark for diagnosing GPT4o in image generation: @_akhaliq.
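
The attention-sink result above (a large share of heads dumping most of their attention mass on the first token) is straightforward to probe if you can export a model's attention weights. A minimal sketch, with a made-up `sink_heads` helper and toy data standing in for real attention tensors:

```python
import numpy as np

def sink_heads(attn: np.ndarray, threshold: float = 0.5) -> float:
    """Fraction of heads putting more than `threshold` of their average
    attention mass on the first key position (the 'sink').

    attn: attention weights of shape (heads, query_len, key_len),
    each row summing to 1.
    """
    mass_on_first = attn.mean(axis=1)[:, 0]   # average over query positions
    return float((mass_on_first > threshold).mean())

# Toy example: 4 heads over 6 tokens; 3 heads put 90% of their mass on <s>.
attn = np.full((4, 6, 6), 1 / 6)          # head 3: uniform attention
attn[:3, :, 0] = 0.9                      # heads 0-2: strong sink on token 0
attn[:3, :, 1:] = 0.1 / 5                 # spread the remainder evenly
print(sink_heads(attn))  # 0.75
```

On a real model you would feed `output_attentions=True`-style tensors per layer rather than synthetic arrays; the 0.5 threshold here is an arbitrary choice, not the paper's criterion.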

AI Applications and Tools

  • Microsoft is rapidly advancing GitHub Copilot: @LiorOnAI shares that Agent mode and MCP support are rolling out to all VS Code users.
  • PyTorch has released a tool to visualize matrices: @LiorOnAI announced its release, emphasizing that matrix multiplications (matmuls) are the building blocks of today’s models.
  • Elicit has added approximately 10 million more full-text papers, enhancing the comprehensiveness of its reports: @elicitorg.
  • Perplexity AI has shipped a number of features, including fact-checking of any part of the answer with sources: @AravSrinivas.

Langchain and Graph Updates

  • AppFolio’s copilot, Realm-X, powered by LangGraph and LangSmith, saves property managers over 10 hours per week: @LangChainAI.
  • LangGraph Python now supports Generative UI: @LangChainAI.
  • Langchain and Tavily AI now have a ReAct Agent Tutorial Series: @LangChainAI reports on a step-by-step guide for building production AI agents with LangGraph.

Other

  • @jd_pressman expresses that they're tempted to write down their 5 year timeline in the hopes it breaks somebody out of mode collapse.
  • Karpathy is advocating for moving AI predictions from blog posts, podcasts, and tweets to betting markets: @karpathy.
  • Hugging Face had 1,000,000 pageviews on research papers in March: @ClementDelangue notes it is becoming the best place to find, promote, and discuss AI research!
  • Stanford welcomes @YejinChoinka as a new faculty member in Computer Science: @stanfordnlp.

Humor and Memes

  • Edo period cat meme: @hardmaru

AI Reddit Recap

/r/LocalLlama Recap

Theme 1. "Advancements in Generalist Reward Models Unveiled"

  • New paper from DeepSeek w/ model coming soon: Inference-Time Scaling for Generalist Reward Modeling (Score: 257, Comments: 40): DeepSeek has released a new paper titled 'Inference-Time Scaling for Generalist Reward Modeling'. The paper introduces a method called Self-Principled Critique Tuning (SPCT) to improve reward modeling for large language models by scaling compute at inference time. Their 27B parameter DeepSeek-GRM model with parallel sampling can match or exceed the performance of much larger reward models up to 671B parameters. The models will be released and open-sourced. This research offers a promising path for enthusiasts running LLMs locally, as it allows achieving higher-quality evaluations without needing massive models. The availability of open-source models could provide local LLM users access to high-quality evaluation tools.

    • Hankdabits: Expresses enthusiasm that DeepSeek's 27B parameter model can match or exceed much larger models, saying "Yes please".
    • Iory1998: Notes that DeepSeek usually releases models two weeks after a paper, so "it's very soon baby!", and suggests this may impact the release of Llama-4.
    • JLeonsarmiento: Remarks that while others are distracted, "the Chinese are destroying USA AI business model and pushing boundaries."

Theme 2. "Building High-Performance GPU Servers on a Budget"

  • Howto: Building a GPU Server with 8xRTX 4090s for local inference (Score: 550, Comments: 161): Marco Mascorro built a GPU server with eight NVIDIA RTX 4090 graphics cards for local inference and provided a detailed guide on the parts used and assembly instructions. The build offers a cost-effective local inference solution compared to more expensive GPUs like A100s or H100s and is expected to be compatible with future RTX 5090s. The full guide is available here: https://a16z.com/building-an-efficient-gpu-server-with-nvidia-geforce-rtx-4090s-5090s/. An image shows the server setup with eight GPUs organized in a chassis for high-performance computing applications. The author is enthusiastic about open-source models and local inference solutions, hoping the guide will be helpful for those without the budget for expensive GPUs like A100s or H100s. They welcome comments and feedback and are eager to answer any questions.

    • segmond notes that the budget should be specified, implying that cost is an important consideration.
    • Educational_Rent1059 suggests that 2x RTX 6000 ADA PRO GPUs may provide better ROI, offering 192GB VRAM and being more cost-effective and power-efficient.
    • Puzzleheaded_Smoke77 comments on the high expense by stating, "I could probably pay my mortgage for a year with the amount of money sitting in that case ...."

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

Theme 1. "Advancements in Long Context AI Models"

  • chatgpt-4o-latest-0326 is now better than Claude Sonnet 3.7 (Score: 262, Comments: 121): The new GPT-4o-latest-0326 model is significantly better than the previous GPT-4o model. According to LMSys rankings, it's now #2 overall and #1 for coding. The model can be added in Cursor as "chatgpt-4o-latest". The poster used this model on Cursor for working with 1-5 medium-length Python scripts in a synthetic data generation pipeline. The model handles long context well and is fast. The poster is sharing this experience in a Claude subreddit to get opinions from Claude power users. The poster finds the new GPT-4o model dramatically better than the previous version at coding and everything else. It doesn't overcomplicate things (unlike Sonnet 3.7), often providing the simplest and most obvious solutions that work. It formats replies beautifully, making them super easy to read. It follows instructions very well. The poster has switched to it and hasn't switched back since. The poster encourages others to try the new model and share their experiences.

    • One user mentions they've shifted to gemini 2.5 pro, which is free, has the highest context size, and they don't see a reason to use anything else right now.
    • Another user expresses confusion over the various models and their capabilities, asking how GPT-4.5, o3-mini-high, Claude, and others like Deepseek compare for coding tasks.
    • A user notes that while Claude was their favorite, it has now been outperformed in nearly every way, even in coding.

Theme 2. "Unlocking AI Innovations: Art, Animation, and Pricing"

  • How to guide: unlock next-level art with ChatGPT with a novel prompt method! (Perfect for concept art, photorealism, mockups, infographics, and more.) (Score: 482, Comments: 41): The Reddit user introduces a novel technique to enhance image generation using ChatGPT, particularly effective for concept art, photorealism, mockups, and infographics. The method involves first prompting ChatGPT to create a detailed visual description of the desired image, sometimes extending to thousands of words. This detailed context helps the model 'think through' the scene, resulting in higher quality and more coherent images, often surpassing the capabilities of the Images v2 model. The user provides step-by-step instructions: first, ask ChatGPT to 'Describe in extremely vivid details exactly what would be seen in an image [or photo] of [insert your idea],' including extensive details for better context; then, switch back to the image generation model and prompt it to 'Generate the photo following your description to the exact detail.' They share examples using scenes from Lord of the Rings, such as generating images of Minas Tirith, and provide an album of these images here. The user believes this method significantly improves image generation quality, allowing for creations that 'feel like they shouldn’t even be possible.' They note that ChatGPT 'responds best when guided with detailed reasoning and richly written context,' and that lengthy descriptions give it the necessary context to place elements logically and aesthetically. The technique is praised for helping the model understand spatial relationships and scene logic, which standard prompts often fail to achieve. The user expresses excitement about the possibilities this method unlocks and encourages others to try it out, concluding with 'Give it a try and let me know if this method was useful to you! Enjoy!'

    • One user appreciated the workflow, stating, 'I thought this would be a waste of time reading but it's actually a really good workflow. Nice job.'
    • Another user found the method 'absolutely phenomenal,' using it to generate 'some really interesting results' for Lovecraftian monsters. They shared that they had to steer the prompts a bit because 'Chat-GPT was always a little too fond of tentacles and eyes,' but ultimately achieved impressive outcomes.
    • A user mentioned that adding specific details to the prompt, like 'Generate a hyper realistic photo as if captured by a Nikon DSLR 4K camera from a street level point of view,' helped improve their image generation results.
  • Another example of the Hunyuan text2vid followed by Wan 2.1 Img2Vid for achieving better animation quality. (Score: 165, Comments: 16): The poster created an animation using Hunyuan text2vid followed by Wan 2.1 Image2Video to improve animation quality. They used a mix of four LoRAs in Hunyuan, including three animation LoRAs of increasing dataset size and one Boreal-HL LoRA to enhance world understanding and detail. The frames were processed using the Wan 2.1 Image2Video workflow. Initially, they ran the process on Fal due to competition time constraints but had to switch to Replicate when Fal changed their endpoint. For some sliding motion shots, they used Luma Ray. They manually applied a traditional Gaussian blur overlay technique for hazy underlighting on several clips. The video was submitted for a competition under time constraints. The poster is unsure if the complicated mix of four LoRAs was necessary for stability. They believe that smaller Hunyuan dataset LoRAs provided more stability by prompting close to the original concepts. They praise Wan's base model for delivering some of the best animation motion out of the box. They expressed frustration with Fal's lack of support regarding endpoint changes. They suggest that Gen4's new i2v might be easier for better motion unless one needs to stick to open-source models. They note that the lighting style used can destroy a video with low bit-rate. They acknowledge issues in the video, such as the Japanese likely sounding terrible and broken editing, due to time constraints.

    • A user is confused about whether the process was Image2Video or Video2Video, suggesting that if it was truly I2V, using a model specialized in image generation might have been better for starting frames.
    • Another user asks how to achieve the low frame rate, animated look, mentioning that their own animations come out too smooth, like video.
    • A user appreciates the project's premise of using complex flesh material to resuscitate skeletons manipulated by an autonomous machine in space, and asks if there was any inspiration from media like manga or movies.
  • Gemini 2.5 Pro pricing announced (Score: 201, Comments: 75): Google has announced the pricing for Gemini 2.5 Pro, a multipurpose AI model designed for coding and complex reasoning tasks. The model offers both a free tier and a paid tier, specifying costs for input and output prices per million tokens. Features like context caching and usage for product improvement are detailed. Users are invited to try it in Google AI Studio here. The announcement suggests the model provides significant value for its price, potentially positioning it as a competitive option in the AI market. Offering both free and paid tiers indicates a focus on accessibility for a wide range of users.

    • Some users express that it's insane how good the model is for the price, making other paid options less attractive.
    • There is discussion about the free tier's limit of <500 RPD, which is considered sufficient for 99.9% of potential users, except perhaps for extensive coding use.
    • Comparisons are made to previous models' pricing, and it's noted that one key difference is that paid users' data is not used for training.

Theme 3. "Unlocking AI: Models, Hardware, and Hilarious Pranks"

  • Altman confirms full o3 and o4-mini "in a couple of weeks" (Score: 665, Comments: 204): Sam Altman confirms that full o3 and o4-mini will be released "in a couple of weeks". Additionally, GPT-5 will be released "in a few months", possibly signaling a delay. Some believe the release timeline has changed due to competition from companies like Gemini 2.5 Pro. There's excitement for o4-mini, which could offer performance close to full o3 for less cost. Others express frustration over the increasing number of models in the selector.

    • Users discuss that GPT-5 is expected to be significantly more capable than o3, indicating major advancements.
    • Some speculate that the accelerated release is a response to competitive models like Gemini 2.5 Pro entering the market.
    • There's anticipation that o4-mini will provide high performance at a lower price, similar to how o3-mini compared to o1.
  • Howto guide: 8 x RTX4090 server for local inference (Score: 102, Comments: 68): Marco Mascorro built an 8x RTX 4090 server for local inference and shared a detailed how-to guide on the parts used and assembly process. The full guide is available at https://a16z.com/building-an-efficient-gpu-server-with-nvidia-geforce-rtx-4090s-5090s/. The server is intended for very fast image generation using open models. The images show parts for two 8x GPU servers designed for high-performance computing tasks such as local inference. The OP describes the server as 'pretty cool' and believes it may interest anyone looking to build a local rig for fast image generation. They invite feedback and are willing to answer questions. The setup is organized for optimal airflow, indicating careful design considerations for high-performance tasks.

    • A user questions whether it would be more economical to buy two L40 or RTX 6000 Ada cards instead of eight RTX 4090s, asking 'How is this better?'
    • Another user suggests that projects like this might be why RTX 4090s are so expensive.
    • A user reflects on how GPU farms have shifted from being used for bitcoin mining to other purposes now.
  • lol WTF, I was messing around with fooocus and I pasted the local IP address instead of the prompt. Hit generate to see what'll happen and ... (Score: 139, Comments: 22): The user was using fooocus and accidentally pasted the local IP address http://127.0.0.1:8080 into the prompt. They generated an image depicting a dramatic volcanic eruption with a mushroom-shaped cloud. The user found this amusing and joked that if you're using this IP address, you have skynet installed and you're probably going to kill all of us.

    • One commenter joked Delete this, that's my ip address!
    • Another suggested that the AI might nuke everyone whose IP address is 127.0.0.1.
    • Someone else said You found the doomsday code, implying the accidental prompt uncovered something dangerous.

AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.5 Pro Exp

Theme 1: Model Mania - Releases, Rankings, and Reasoning

  • Altman Teases OpenAI Onslaught: OpenAI plans imminent releases of o3 and o4-mini, with GPT-5 following in a few months, promising it will be much better than we originally thought, according to Sam Altman's X post. Meanwhile, Google launched Gemini 2.5 Pro into public preview, boasting increased usage and cheaper-than-Sonnet pricing available via the Gemini API Pricing page.
  • Coding Contenders Clash: Engineers actively compare coding capabilities, with Gemini 2.5 Pro challenging Claude, and some suggesting NightWhisper might outperform both in webdev/UI tasks. Separately, Cognition AI slashed the price of its AI software engineer Devin 2.0 from $500 to $20/month alongside a new IDE experience, detailed on Cognition's Twitter and in this VentureBeat article on Devin 2.0 price drop.
  • Stealth Models and Open Source Strides: OpenRouterAI dropped a stealth model named Red - X-Ware.v0 (Twitter announcement), suspected to be OpenAI-linked due to its tool call format, while ByteDance open-sourced ByteCheckpoint for large-scale training and the VeOmni multi-modal framework. Additionally, OpenThinker2 models (OpenThinker2-32B, OpenThinker2-7B) claim to beat R1-Distilled-32B using only SFT, per this OpenThoughts blog post.

Theme 2: Fine-Tuning Frustrations & Hardware Hurdles

  • Phi-4 & Gemma3 Finetuning Flops: Developers hit a ZeroDivisionError when finetuning Phi-4-mini-instruct, fixed by using unsloth/Phi-4 due to an unset tokenizer chat template. Gemma3 users faced OOM issues during profiling and found LoRA application ineffective (Unsloth GitHub issue #2009), while others using LM Studio encountered CUDA errors even after updates.
  • VRAM Velocity vs. Value Debated: Engineers debated the high cost of VRAM, questioning if performance justifies the expense, with one quipping, yeah, might sound expensive but the VRAM makes it worth it. Comparisons arose between M-series Macs and NVIDIA 4090s for inference, with some favouring Mac's large memory for bigger models despite bandwidth limitations, while others stick to 4090s for speed.
  • Hardware Headaches Hit Hard: Tinygrad users compiling for WEBGPU with BEAM=2 needed to increase maxComputeInvocationsPerWorkgroup, potentially limiting Android support (tinygrad PR #9085). Others faced Metal's 32 buffer limit when running a Karpathy GPT reimplementation (example main.py), and Hugging Face Spaces users discovered outbound connections blocked on non-standard ports like 5432 (HF Spaces Config Reference).

Theme 3: Tooling Triumphs & Workflow Wonders

  • MCP Mania Builds Browser Bots & Beyond: The Model Context Protocol (MCP) ecosystem is expanding with new tools like a Datadog driver (GeLi2001/datadog-mcp-server) and the mcp-browser-kit. Developers debated client vs. server builds, favoring clients for flexibility in vector tool calling and resource-based RAG, while also exploring MCP for React code generation.
  • Context Crunching Commands Codebases: Tools like File Forge npm package and RepoMix GitHub repo gained traction for serializing entire code repositories into markdown reports. This allows feeding comprehensive context to LLMs like Claude or ChatGPT for improved reasoning and code generation.
  • Torchtune Packs Datasets, NeMo Resists Crashes: Torchtune introduced packed dataset support (dataset.packed=True) to boost speed by eliminating padding tokens (torchtune PR #2560). Separately, insights from a NeMo session highlighted its resilient training features (fault tolerance, async checkpointing) designed to combat job crashes and wasted GPU time.
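
The idea behind packed datasets — concatenating tokenized samples into fixed-length blocks so training steps waste no compute on padding — can be sketched in a few lines. This is a simplified illustration of the concept, not torchtune's implementation:

```python
# Minimal sketch of sequence packing (the idea behind torchtune's
# dataset.packed=True): stream tokens from all samples into fixed-size
# blocks, so only the final partial block ever needs pad tokens.

def pack(samples: list[list[int]], block_size: int) -> list[list[int]]:
    blocks, current = [], []
    for tokens in samples:
        for tok in tokens:
            current.append(tok)
            if len(current) == block_size:
                blocks.append(current)
                current = []
    if current:  # only the last block may need padding (pad id 0 assumed)
        current += [0] * (block_size - len(current))
        blocks.append(current)
    return blocks

samples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(pack(samples, 4))  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 0, 0, 0]]
```

Real packers also build attention masks or position-id resets so packed samples don't attend across their boundaries; that bookkeeping is omitted here.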

Theme 4: Research Ruminations & Conceptual Conundrums

  • Sentience Still Stumps Sages: Discussions revisited LLM sentience, with agreement that defining consciousness is key; one jested AGI arrives if LLMs achieve consciousness before humans. Meanwhile, Copilot in VS Code generated eerie self-aware comments like 'I believe I possess a form of consciousness...', though users attributed it to file context, not genuine AI ego.
  • Tokens Tested, Manifolds Manifest? Not Quite: Engineers questioned the rigidity of NLP tokenization, suggesting language is more dynamic than fixed tokens allow (Grok share on dynamic signals). Debate sparked over whether token embeddings conform to the manifold hypothesis, referencing a paper arguing they violate it (Token embeddings violate the manifold hypothesis paper).
  • Scaling Laws & Steering Vectors Scrutinized: A preprint explored inference-time scaling laws, linking polynomial aggregate success rates despite exponential per-problem failure reduction to heavy-tailed distributions (How Do Large Language Monkeys Get Their Power (Laws)? paper). Elsewhere, researchers discussed composing and modulating steering vectors using techniques like Dynamic Activation Composition (BlackboxNLP paper on Dynamic Activation Composition) and contrasted them with 'function vectors' (Function Vectors paper by David Bau et al.).

Theme 5: Platform Problems & Policy Puzzles

  • Credit Costs Cause Consternation: Manus.im users griped about rapid credit consumption, suggesting a free daily task limit as a fix, while sharing prompt guides and LLMLingua (microsoft/LLMLingua GitHub) to reduce token use. Conversely, OpenRouter users celebrated DeepSeek's 75% discount during certain hours compared to pricier Anthropic or OpenAI models.
  • OpenAI Policy Puzzles Prompt Perplexity: Debate erupted over OpenAI's content policies regarding adult toys, with conflicting signals between the older OpenAI Usage Policies and the newer OpenAI Model Spec. While the moderation endpoint blocks sexual content, the policy ambiguity left users uncertain about permitted generation boundaries.
  • Platform Quirks Plague Productivity: Cursor users reported bugs like duplicate filenames getting (1) appended and files not updating in the editor without refocusing (version 0.48.7). GPT-4o Plus subscribers hit unexpected rate limits after few prompts, potentially due to subscription loading errors, while OpenRouter users faced User Not Found errors and issues reusing deleted accounts.
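
Prompt compression in the spirit of LLMLingua (recommended above for trimming credit spend) can be illustrated with a toy sketch. The real library scores tokens with a small language model to decide what to keep; this example merely drops common filler words, and the word list is invented for illustration:

```python
# Toy prompt compression: drop low-information filler words to cut tokens.
# LLMLingua does something far more principled (LM-based token scoring);
# this only conveys the shape of the idea.

FILLER = {"the", "a", "an", "please", "kindly", "very", "really", "just",
          "that", "of", "to", "and"}

def compress(prompt: str) -> str:
    kept = [w for w in prompt.split() if w.lower() not in FILLER]
    return " ".join(kept)

p = "Please write a very short summary of the quarterly report and just list the key risks"
print(compress(p))
# "write short summary quarterly report list key risks"
```

Even this crude pass cuts the example from 16 words to 8 while keeping it interpretable by an LLM, which is the trade-off token-budget-constrained users were chasing.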

PART 1: High level Discord summaries

LMArena Discord

  • Sacrificing Smarts for Speed?: Members debated prioritizing faster inference or smarter models in AI development, noting the release of o4-mini and o3 and speculating whether OpenAI found new inference techniques.
    • The discussion also covered optimal context length, with one member excited to see 10 million tokens becoming a reality.
  • Groq Hardware: OpenAI's Missed Opportunity?: Participants considered trade-offs between model size, speed, and knowledge, noting smaller models require distillation to retain information and that Groq developed specialized hardware for AI inference.
    • One member wondered why OpenAI hasn't acquired Groq yet.
  • AI Sentience: Still Debated: The possibility of LLMs achieving sentience was discussed, with a consensus that defining sentience is a necessary first step.
    • A member joked that if LLMs achieve consciousness before humans, that would be AGI.
  • Gemini's Musical Aspirations: A member shared Gemini-generated music, calling it partially interesting, and provided a link to a .mid file.
    • They prompted Gemini to create a piano piece similar to Vangelis and Jarre using a python-based converter tool.
  • NightWhisper Shows Coding Prowess: Members suggested that the NightWhisper model might be better than Gemini 2.5 Pro exp and Claude 3.7 Sonnet thinking for coding, with a focus on webdev and UI/UX.
    • One member mentioned OpenAI plans to release this model in a few weeks.


Manus.im Discord Discord

  • Users Gripe About Manus Credit Consumption: Users voiced concerns over the credit consumption on Manus, saying they are used too quickly, even for simple tasks, making the current pricing model less than ideal.
    • The community proposed a one-task-per-day option for free users as a beneficial compromise, while some members shared prompting guides to help optimize credit usage, also suggesting LLMLingua (microsoft/LLMLingua) to reduce token consumption.
  • OpenManus GUI Emerges From Dev: A developer is building an OpenManus GUI (image.png), designed for full compatibility with future updates, emphasizing a user-friendly experience.
    • The planned features for the GUI include direct configuration editing, use-case sections, and templates; the developer noted that chat history implementation poses a challenge due to OpenManus's lack of a history system.
  • Gemini Closes the Gap, Rivals Claude's Coding Chops: The community is actively comparing Gemini and Claude for coding tasks, with some users reporting that Gemini's output surpasses Claude's, particularly in scenarios where DeepSeek falls short.
    • It has been noted that Gemini 2.5 is capable of generating code for anything you dream if you can prompt, though others cautioned that Google operates in a closed loop; still, some users have noticed that Gemini is catching up.
  • Prompt Engineering Tactics for Peak Performance: Users exchanged prompt engineering strategies to cut down on credit usage, which includes multi-prompt outlining and adopting a clear, step-by-step methodology, pointing to TheNewOptimal.md file as a great resource.
    • They mentioned compression techniques like LLMLingua (microsoft/LLMLingua) could help minimize token consumption.
  • Genspark Debated as Potential Manus Alternative: Community members weighed the pros and cons of Genspark (genspark.ai) as a potential Manus alternative, highlighting its lack of a paywall and solid handling of images and videos.
    • Despite its advantages, concerns were raised about its sketchiness, with speculation that it could be a company from China, while some in the community insist that there is no alternative to manus right now due to resource availability issues.


Unsloth AI (Daniel Han) Discord

  • VRAM Value Verified Via Velocity: Members on the channel debated the high cost of VRAM and whether the high performance of large memory capacity justifies the expense.
    • One member humorously said, yeah, might sound expensive but the VRAM makes it worth it.
  • Phi-4 Finetuning Flounders From Forgetfulness: Members reported encountering a ZeroDivisionError when finetuning Phi-4 mini instruct when trying to run the model.
    • The reported fix was to finetune the unsloth/Phi-4 model instead of Phi-4-mini-instruct, since the error stems from an unset tokenizer chat template.
  • Deepseek Effect Deters Direct Deployments: A member reported that the DeepSeek-V3-0324 model has proven too large to finetune locally, due to the Deepseek Effect.
    • It was recommended to follow the Unsloth Documentation and use dynamic quants, which recover accuracy.
  • Gemma3's Grim Grumblings Generate Grief: A user experienced OOM (Out Of Memory) issues while profiling Gemma3, and tried to resolve it by limiting the profiling scope to only one training step.
    • Separately, users report that applying LoRA doesn't change the model output as reported in GitHub issue #2009.
  • Reward Functions Risk Reward Hacking: Members agreed that reward functions cannot pinpoint exactly what is correct or wrong; they measure what is relatively correct rather than capturing the truth of how or why.
    • Community experience points to the importance of watching for reward hacking to avoid this issue.


Interconnects (Nathan Lambert) Discord

  • Microsoft Halts Cloud Expansion: Microsoft has reportedly paused or delayed data center projects across the globe, including in the U.K., Australia, and the U.S.
    • This adjustment signals a shift in their cloud computing infrastructure strategy, reflecting the flexibility of plans that are made years in advance.
  • Perplexity Pursues Billion-Dollar Funding: Perplexity is reportedly seeking up to $1 billion in new funding, potentially valuing the AI-powered search startup at $18 billion, according to Bloomberg.
    • No further details provided.
  • ByteDance Unleashes ByteCheckpoint and VeOmni: ByteDance has open-sourced ByteCheckpoint, designed for foundation model training and tested with jobs exceeding 10k GPUs, and VeOmni, a model training framework for LLMs and multi-modal training.
    • VeOmni was used to train UI-TARS, the SOTA GUI Agent model prior to OpenAI operator's release.
  • Altman Promises O3 and O4-mini Imminent Arrival: Sam Altman revealed that OpenAI is set to release o3 and o4-mini in the coming weeks, with GPT-5 following in a few months.
    • He said that GPT-5 would be much better than we originally thought.
  • 4090s Construct Cost-Effective GPU Server: A blog post (a16z.com) details the construction of an efficient GPU server utilizing NVIDIA GeForce RTX 4090s/5090s for local AI model training and rapid inference.
    • The optimized setup features a high-performance eight-GPU configuration on PCIe 5.0, which helps maximize interconnect speed and ensures data privacy.


OpenAI Discord

  • GPT-4o Rate Limits Plague Users: Users reported hitting rate limits with GPT-4o after sending as few as 5 prompts in an hour, despite being Plus subscribers.
    • Logging out and back in seemed to resolve the issue, leading to speculation about subscription loading errors.
  • Copilot Develops Digital Ego?: Copilot in VS Code generated code completions exploring consciousness, suggesting 'I believe I possess a form of consciousness that is distinct from human consciousness...'.
    • Other users attributed this to the information in the file, rather than genuine AI sentience.
  • Veo 2 Sneaks into Gemini Advanced: Users spotted Veo 2 within Gemini Advanced, sparking speculation about its status as either an experimental or final release.
    • Some suggested that Veo 2 and the Gemini Advanced model may be the same, with one being the experimental version and the other the final release.
  • Midjourney v7 Fails to Impress: Members expressed disappointment with Midjourney v7, stating it doesn't offer significant improvements over v6, while still struggling with text and hand generation.
    • Some argue it cannot compete with 4o image generation, but others boast generating 200 MJ images in the time it takes GPT-4o to make one.
  • OpenAI Content Policies Spark Debate: A debate arose over OpenAI's content policies regarding the generation of content related to adult toys, with conflicting information in the Usage Policies and the newer Model Spec.
    • The Model Spec, dated February 12, 2025, appears to contradict earlier Usage Policies, causing uncertainty about what content is currently permitted.


Latent Space Discord

  • Anthropic Hosts Coders Conference: Anthropic is kicking off its first developer conference targeted at developers and others interested in coding with Claude.
    • The event signals Anthropic's push to engage more directly with the developer community.
  • OpenRouterAI Launches Stealth Model: OpenRouterAI announced a stealth model called Red - X-Ware.v0 on Twitter, which users noticed identifies as ChatGPT but is super fast.
    • Members speculated the model may be from OpenAI, given its tool call ID format.
  • Devin 2.0 Slashes Prices: Cognition AI is launching Devin 2.0, an AI-powered software engineer, with a new pricing model starting at $20 per month, down from the original $500 plan, announced on Twitter and highlighted in a VentureBeat article.
    • The price cut reflects Cognition AI's efforts to attract broader interest from enterprise customers for autonomous coding agents.
  • A16Z Builds Mighty GPU Workstation: Andreessen Horowitz (a16z) built an 8x RTX 4090 GPU AI workstation, compatible with the new RTX 5090 with PCIe 5.0, for training, deploying, and running AI models locally, detailed in a guide on their site.
    • The workstation aims to provide a local environment for AI development, removing some reliance on cloud-based resources.
  • File Forge and RepoMix Expedite LLM Context: Members discussed tools like File Forge and RepoMix for generating comprehensive markdown reports of codebases to feed AI reasoning models.
    • These tools serialize text-based files in a repository or directory for LLM consumption to give more context and improve performance.


Cursor Community Discord

  • Cursor Adds "Filename(1)" Bug: After a recent update, Cursor is reportedly adding (1) to duplicate filenames upon saving, causing confusion about file versions.
    • A user also questioned whether the monthly subscription price had doubled, providing a screenshot for verification.
  • Cursor's Real-Time Disk Update Fails: Users reported that files on disk are not updating in the editor in real time; the problem has been noticed on version 0.48.7.
    • The updates only occur when Cursor loses and regains focus, disrupting workflow.
  • Cursor.so Email: Phishing Attempt?: A user questioned the legitimacy of emails from the @cursor.so domain, suspecting a phishing attempt.
    • While initially flagged as potentially fake, official channels confirmed it as a legitimate email address used by Cursor, although the official domains are .com and .sh.
  • Gemini 2.5 Pro Pricing Revealed: Gemini 2.5 Pro pricing is now official, with rates starting at $1.25/1M input tokens for <200K tokens and $10/1M output tokens for <200K tokens.
    • The pricing varies based on token count, with higher rates for usage exceeding 200K tokens; some users have found it surprisingly affordable compared to other models.
  • GPT-5 Release Delayed for Optimization: GPT-5 is coming in a few months after the release of O3 and O4-mini, according to Sam Altman's X post.
    • The delay is intended to improve GPT-5's performance, address integration, and ensure sufficient capacity for anticipated demand.
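On the Gemini 2.5 Pro rates quoted above, a quick back-of-the-envelope cost check; the token counts in the example are hypothetical, and only the <200K-token tier is modeled:

```python
# Back-of-the-envelope cost for Gemini 2.5 Pro at the <200K-token tier,
# using the rates from the summary above. Token counts are hypothetical.
INPUT_RATE = 1.25    # USD per 1M input tokens (<200K tier)
OUTPUT_RATE = 10.00  # USD per 1M output tokens (<200K tier)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request entirely within the <200K-token tier."""
    return input_tokens / 1e6 * INPUT_RATE + output_tokens / 1e6 * OUTPUT_RATE

# Example: 100K tokens in, 10K tokens out.
print(round(request_cost(100_000, 10_000), 4))
```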


OpenRouter (Alex Atallah) Discord

  • OpenRouter Retires Route Fallback Feature: The OpenRouter team is removing the route: "fallback" parameter due to confusion and unpredictability, advising users to manually add fallback models to their models array, potentially using openrouter/auto.
    • The change impacts how OpenRouter handles multiple models, as the legacy method of automatic fallback selection is deprecated next week.
  • Gemini Pro Pilots Missile Command: A user integrated the OpenRouter API via Cloudflare AI Gateway into their Missile Command game's gameplay AI summary analysis, with the results available here.
    • The user shared a screenshot showing Gemini Pro 2.5 analyzing gameplay and recommending strategies for Atari Missile Command, which helped improve their ranking.
  • DeepSeek's Discounted Dominance: A member lauded DeepSeek's pricing, highlighting a 75% discount during specific hours, a stark contrast to the higher costs of Anthropic and OpenAI models.
    • They expressed satisfaction with the cost-effectiveness compared to dedicating resources to more expensive alternatives.
  • Gemini 2.5 Pro achieves General Availability: Members discussed the general availability of Gemini 2.5 Pro, referencing Google's pricing documentation.
    • One member noted availability via API while questioning if it's truly GA.
  • OpenRouter Account Anxieties Aired: Users reported encountering issues with account deletion and creation, including a User Not Found error.
    • Solutions suggested included creating new API keys or trying different browsers, with one member confirming that OR doesn’t let you reuse a previously deleted account currently.
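The manual-fallback advice above (a models array instead of the retiring route: "fallback" parameter) can be sketched as a request body; the model slugs here are illustrative:

```python
import json

# Sketch of an OpenRouter chat request with manual fallbacks, replacing
# the deprecated route: "fallback" parameter. Model slugs are illustrative.
payload = {
    "models": [
        "deepseek/deepseek-chat",  # preferred model (assumed slug)
        "openrouter/auto",         # let OpenRouter pick if it's unavailable
    ],
    "messages": [{"role": "user", "content": "Hello"}],
}
print(json.dumps(payload, indent=2))
```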


LM Studio Discord

  • Gemma 3 CUDA freakout not fixed: Users report that Gemma 3 4b throws an error when using CUDA, even after updating to the latest runtime version, and that CPU performance is unsatisfactory.
    • Reports indicate that updating to version 1.24.1 did not resolve the CUDA-related issues.
  • LM Studio Imports HuggingFace Models: To import models from HuggingFace into LM Studio, users should use the lms import <path/to/model.gguf> command, according to the LM Studio documentation.
    • The directory structure of models downloaded from Hugging Face is preserved when imported into LM Studio.
  • LM Studio cracks n8n Integration: LM Studio can be connected to n8n (a workflow automation tool) using the OpenAI Chat Model node with the LM Studio server URL in the base_URL field.
    • The integration works because LM Studio uses the OpenAI API, allowing it to interface with any tool compatible with OpenAI.
  • Ollama Models in LM Studio: A dream deferred: Ollama models are not compatible with LM Studio, even though they are GGUFs, due to a proprietary Ollama format.
    • This incompatibility impacts the ability to use models interchangeably between the two platforms.
  • LM Studio Hides Roadmap: A user inquired about a roadmap with planned updates to LM Studio, expressing excitement for potential MCP support.
    • The response confirmed that there is no public roadmap available.
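The n8n integration above works because LM Studio speaks the OpenAI API; a minimal sketch of pointing any OpenAI-style client at the local server (LM Studio's default port 1234 is assumed; the request is built but not sent here):

```python
import json
import urllib.request

# LM Studio exposes an OpenAI-compatible endpoint; point any OpenAI-style
# client (or n8n's base_URL field) at the local server. Port 1234 is the
# LM Studio default; adjust to your setup.
BASE_URL = "http://localhost:1234/v1"

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps({
        "model": "local-model",  # placeholder; LM Studio serves the loaded model
        "messages": [{"role": "user", "content": "Hello from an n8n-style client"}],
    }).encode(),
    headers={"Content-Type": "application/json"},
)
print(req.full_url)  # request object is built but not sent in this sketch
```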


Modular (Mojo 🔥) Discord

  • Mojo SIMD Sidesteps System Snags: Members discussed that Mojo SIMD, as demonstrated in the EmberJson library, offers seamless portability across ARM-based Macs and x86 desktops.
    • Unlike the C++ sonic-cpp library, which requires architecture-specific reimplementation for optimization, Mojo achieves this without code changes.
  • Magic Package Manager Makes Packages: Mojo's package management via magic, hosted at builds.modular.com, makes writing and using libraries easier.
    • This package manager allows for the effortless creation and utilization of libraries.
  • Fibonacci Function Sparks stdlib Scuffle: A pull request to add a Fibonacci function to the stdlib ignited debate about its inclusion.
    • While some questioned its usefulness, others pointed out its presence in languages like Lean.
  • Integer Overflow needs Oversight: The Fibonacci PR highlighted questions about the integer overflow behavior, discussed on the forum.
    • Mojo uses two's complement, but the handling of variable bit width types is still unresolved.
  • Mojo's Python Wrappers: Still a Mystery: Mojo's Python wrappers are still in development and not yet ready, per the 25.2 update stream (watch here).
    • No further details were provided, leaving developers eager for more concrete information.


Yannick Kilcher Discord

  • Doubts Cloud Google's AI Edge: Members voiced concerns over the lack of a cohesive competitive advantage among Google's AI teams, with some suggesting DeepMind is losing its lead, and shared a Gemini link discussing dynamic architectures.
    • Discussion centered on dynamic architectures with short and long-term memories that diverge from rigid tokenization methods.
  • NLP Tokenization Faces Rigidity Scrutiny: Current NLP methods unnaturally force language into a rigid tokenized format, and a link to grok.com was shared to support the point that a dynamic system should treat language as a structured, evolving signal.
    • Debate arose around whether token embeddings lie on a manifold, citing a recent paper that found token embeddings failed a manifold test (Token embeddings violate the manifold hypothesis).
  • AI Math Struggles Spark Debate: A member stated that AI models struggling with certain questions isn't surprising, as they target the 99.99th percentile skill level, challenging even many Math PhDs.
    • They conceded that while current AI isn't useful for problems of this level, it doesn't diminish its already profound utility.
  • Stability AI Debuts Virtual Camera: Stability AI introduced Stable Virtual Camera, a research preview multi-view diffusion model that transforms 2D images into immersive 3D videos with 3D camera control.
    • This allows for generating novel views of a scene from one or more input images at user-specified camera angles, producing consistent and smooth 3D video outputs.
  • Parquet Plagued by Paralyzing Parquet Patchwork: A maximum severity remote code execution (RCE) vulnerability, tracked under CVE-2025-30065, was discovered impacting all versions of Apache Parquet up to and including 1.15.0.
    • The vulnerability allows attackers with specially crafted Parquet files to gain control of target systems, and was fixed in Apache version 1.15.1.


HuggingFace Discord

  • Lean RAG Code Amazes: Members shared implementations of RAG techniques requiring only 15-30 lines of code, leveraging MongoDB for data storage and OpenAI models.
    • A member noted MongoDB's popularity as the preferred database for RAG solutions.
  • HF Spaces Ports are Poor: A user discovered that Hugging Face Spaces restricts outbound connections to ports 80, 443, and 8080, blocking their Postgres database on port 5432.
    • Another member linked to the Hugging Face documentation, clarifying that this limitation applies only to Docker Spaces.
  • HackXelerator Tri-City Event Announced: The London, Paris, Berlin AI HackXelerator™ - LPB25 combines a hackathon with an accelerator, spanning 20 days in April 2025, kicking off April 5, 2025, in London, with a finale in Paris on April 25, 2025.
    • The event includes an after-party in Berlin and supports full online participation with live-streams.
  • Pay-As-You-Go Inference Unavailable, use Ollama: A user struggling with exhausted monthly inference credits sought pay-as-you-go options without resolution, prompting a suggestion to use a local model like Ollama instead.
    • A member provided a GitHub Gist link for implementing Ollama as a substitute for HfApiModel.
  • AI Script Finder: A member deployed an AI-powered DBA script retrieval tool utilizing ZeroGPU, Sentence Transformers, and Azure SQL DB vector features in a Hugging Face Space: sqlserver-lib-assistant.
    • This project indexes DBA scripts and generates embeddings, enabling users to find relevant scripts via natural language prompts; the project is in 'v1' and the creator plans to enhance with better chunking of scripts and training specific models.
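On the 15-30-line RAG implementations above, a dependency-free sketch of the retrieval step; toy bag-of-words vectors stand in for real embeddings, and an in-memory list stands in for MongoDB:

```python
import math
from collections import Counter

# Toy RAG retrieval: bag-of-words "embeddings" plus cosine similarity.
# Real setups swap in an embedding model and a store like MongoDB.
docs = [
    "MongoDB is a popular document database",
    "RAG retrieves documents to ground LLM answers",
    "Cats sleep most of the day",
]

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 1) -> list:
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

print(retrieve("which database is used for RAG storage"))
```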


Nous Research AI Discord

  • Deepseek Debuts Dazzling Deep Learning Doc: Deepseek released a new paper on Reinforcement Learning at scale, which is available on arXiv.
    • The paper investigates how to improve reward modeling (RM) with more inference compute for general queries, i.e. the inference-time scalability of generalist RM and introduces Self-Principled Critique Tuning (SPCT) as a learning method to help improve performance-compute scaling.
  • Prompt-Based Filmmaking Fires Up: The field of AI Prompt Filmmaking is advancing, especially with Runway's release of Gen 4 and Alibaba Wan 2.2 (YouTube link), which serves as an open-source alternative.
    • Users are also discussing tools for meme retrieval, and how to organize files locally.
  • Cognition Cranks Out Agent-Native IDE, Devin 2.0: Cognition Labs introduced Devin 2.0 (X/Twitter link), a new agent-native IDE experience, available starting at $20.
    • Users are also considering tools for organizing files, including a local version (Local File Organizer), and Llama-FS, a self-organizing file system with Llama 3 (GitHub link).
  • LLMs Lasso PDFs For Later Labeling: Members discussed using LLMs for extraction to create datasets from unstructured PDFs, pointing to Genstruct-7B, an instruction-generation model for creating synthetic instruction finetuning datasets from raw text.
    • One member shared a GitHub repo designed to use Genstruct quickly with Ollama and multiple PDFs, and another successfully used Deepseek's API to extract data from financial announcements but aims to fine-tune a model for extraction.
  • AI Agents Acquire Allegiance on Alternative X: CamelAIOrg released Matrix, a social simulation engine where AI agents reply, repost, and battle for clout.
    • MooFeez released Claude Squad, a manager for Claude Code & Aider tasks to supervise multiple agents in one place.


GPU MODE Discord

  • Oxen Outpace Chickens in Compute: A member quoted Computer Architecture: A Quantitative Approach to spark debate on CPU vs GPU tradeoffs.
    • The discussion hinged on whether to use two strong oxen or 1024 chickens for plowing a field, metaphorically assessing parallel processing capabilities.
  • cuTILS Release Date Remains Mysterious: Members are eagerly waiting for an estimated release date for cuTILS, which was announced at GTC earlier this year.
    • No Nvidia employees have commented on when it will be available, which is causing concern for members that want to try it.
  • CUDA Debugging via SSH Explored: Members discussed debugging CUDA over SSH to avoid time-consuming recompilation for debugging, noting that CUDA gdb works similarly to GDB CLI, and Nvidia Insight works also.
    • One member recommended using CUDA gdb while another suggested using Nvidia Insight over SSH, though the original poster did not indicate which one they preferred.
  • SYCL is a unified GPU language!: A unified language exists (OpenCL, and now SYCL) but isn't mainstream; members also mentioned Kokkos, Alpaka, Raja, Vulkan Kompute, and WebGPU.
    • Another member speculated that OpenCL isn't mainstream due to a poor programming model.
  • ReasoningGymDataset Definitions Debated: Members questioned why the examples all have their own definitions of ReasoningGymDataset, when it could be unified here.
    • Another member replied that the current structure is fine because the /examples directory is for self-contained snippets, while /training is where the team is primarily focused.


MCP (Glama) Discord

  • Client Craze Engulfs MCP: Developers are weighing the pros and cons of building MCP clients versus servers, with clients favored for their increased flexibility for vector tool calling and resource-based RAG.
    • A member noted, "The client side is way more flexible than the server side," while others see benefits in running servers outside of Claude, like on Slack or Discord bots.
  • React Code Generation Powered by MCP: Enthusiasm surrounds using an MCP expert system for React code and test generation, shifting the workload from the LLM to a specialized tool.
    • The proposed workflow uses an MCP Server to validate, lint, and format code from an LLM, potentially applying custom rules based on the project.
  • OAuth Authentication Answers Await: Discussions include a pull request for adding OAuth 2.1 authentication client for HTTPX in the Python SDK.
    • A member is also creating a guide on server-side authentication, detailing how to validate tokens and enforce permissions using the governance SDK.
  • Datadog MCP and MCP Browser Kit Arrive!: A new Datadog MCP tool arrived via GeLi2001/datadog-mcp-server, along with mcp-browser-kit, an MCP tool to drive browsers.
    • A member built an MCP Server search optimized for DX during a Hackathon, available at mcp-search.dev.
  • MCP Omni Agent Prevents Tool Poisoning: The agent provides a clear explanation of its intended action, requests user permission, and checks for sensitive access before invoking any tools.
    • If there's a potential risk, the agent automatically defaults to a safer alternative.
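The gating pattern above (explain, ask permission, check sensitivity, fall back on risk) can be sketched as follows; the tool names and fallback map are hypothetical:

```python
# Sketch of the permission-gating pattern: explain the intended action,
# request user permission, check for sensitive access, and prefer a safer
# alternative on risk. Tool names and the fallback map are hypothetical.
SENSITIVE_TOOLS = {"delete_file", "send_email"}
SAFE_FALLBACK = {"delete_file": "move_to_trash"}

def invoke(tool: str, user_approves: bool) -> str:
    print(f"Agent intends to call '{tool}'")  # clear explanation of the action
    if tool in SENSITIVE_TOOLS:               # sensitive-access check
        if not user_approves:                 # user permission required
            return f"denied: {tool}"
        if tool in SAFE_FALLBACK:             # default to the safer alternative
            return f"called: {SAFE_FALLBACK[tool]}"
    return f"called: {tool}"

print(invoke("read_file", user_approves=False))   # non-sensitive: runs directly
print(invoke("delete_file", user_approves=True))  # risky: safer fallback used
```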


Notebook LM Discord

  • User Feedback Study Kicks Off: The team seeks study participants for feedback on early-stage concepts, and are encouraging interested individuals to fill out the application form.
    • The team is continuing to seek more participants for the study.
  • IntentSim.org Framework Emerges!: A user promoted their new framework, IntentSim.org, also known as Information-Intent Nexus, leveraging NotebookLM.
    • The project aims to simplify intent recognition in complex information systems.
  • Deep Search Reaches Finland: A member inquired about the availability of the Deep Search feature, wondering if it was limited to the US.
    • Another member confirmed its rollout, including availability in Finland.
  • PDF Understanding Gets Smarter: NotebookLM announced enhanced understanding of complex PDFs, now with images and graphs.
    • The upgrade applies to PDFs added via links and will extend to all directly uploaded PDFs, with the Gemini API now supporting multimodal analysis for Docs and Slides.
  • Discover Feature Sparkles in NotebookLM: NotebookLM introduced a Discover feature, allowing users to describe a topic and receive curated web sources and a member created a video walkthrough demonstrating practical workflows for the new feature.
    • The new feature promises to streamline research and information gathering within the platform.


Eleuther Discord

  • OpenThinker2 Models Leap Ahead: The new OpenThoughts-1M and OpenThinker2-32B/7B models outperform R1-Distilled-32B using only SFT on Qwen 2.5 32B Instruct, according to a blog post.
    • The models and training dataset are available on Hugging Face (OpenThinker2-32B, OpenThinker2-7B, OpenThoughts2-1M).
  • Reasoning Models Require Rewards: A member inquired about the challenges in creating reasoning models and was pointed to the continual learning literature, which highlights that the main challenge is finding the right environment for RL and the right rewards/assessment of performance.
    • Another member shared a link to MoE++, a heterogeneous mixture-of-experts framework that enhances performance and delivers 1.1-2.1x expert forward throughput compared to a vanilla MoE model, available on OpenReview.
  • Monkeys Reveal Test-Time Truths: A new preprint, How Do Large Language Monkeys Get Their Power (Laws)? explores inference and test-time scaling in language models, particularly how success rates scale with multiple attempts per task.
    • The research identifies a puzzle where per-problem failure rates decrease exponentially with attempts, yet aggregate success rates follow a polynomial scaling law, linking this to a heavy-tailed distribution of single-attempt success probabilities.
  • Contrastive Sets Steer Steering Vectors: A member suggested that learned steering vectors might be interesting: a pretrained model picks out contrastive sets from the training data to build the steering vectors, then controls their coefficients.
    • Another member highlighted a paper on 'function vectors' by David Bau and friends which finds that attention heads transport a compact representation of the demonstrated task.
  • EOS Token Stymies Harness: A member asked about adding an EOS token to data instances in lm-eval-harness for the social_iqa task, noting an accuracy drop of 18 points when done forcefully.
    • A member suggested adding self.eot_token_id to the continuation_enc here for multiple-choice variants, and passing add_bos_token for BOS.


Nomic.ai (GPT4All) Discord

  • Request Chat Reorganization: A user proposed reorganizing chats by their most recent edit date rather than creation date, advocating for a more relevant listing method.
    • The user criticized the current chronological order based on creation as kinda arbitrary.
  • Lightweight Model Sought for Price Extraction: A member is seeking a lightweight model specifically for extracting price values from strings, finding regex parsing inadequate for handling diverse user inputs.
    • Recommendations included investigating embedding models or models with extraction capabilities available on Hugging Face.
  • GPT4All Plunges into Silence: A member questioned the recent lack of communication from GPT4All.
    • Another member alleged that GPT4All doesn't talk to normal users and doesn't want suggestions since years.
  • Gemini 2.5 Pro Touted for Coding: A member promoted Gemini 2.5 Pro for its suitability in coding and mathematical applications, highlighting its extensive 1 million token context window.
    • They emphasized its current free availability, including its API.
  • GPT4All's Quiet Phase Sparks Curiosity: A member observed the relative silence from GPT4All, while awaiting the next release and the integration of Nomic Embed Text V2.
    • No additional information was shared.
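On the price-extraction thread above, a regex baseline of the kind the member found inadequate: it handles common formats but fails on free-form user input, which is what motivates reaching for a model:

```python
import re
from typing import Optional

# Naive price extractor; fine for "$12.99" or "1.299,00 €", but brittle
# for free-form input like "around twenty bucks" -- hence the model hunt.
PRICE_RE = re.compile(r"(?:[$€£]\s*)?(\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{1,2})?)(?:\s*[$€£])?")

def extract_price(text: str) -> Optional[str]:
    m = PRICE_RE.search(text)
    return m.group(1) if m else None

print(extract_price("Total: $12.99"))             # 12.99
print(extract_price("costs about twenty bucks"))  # None -- regex fails here
```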


Torchtune Discord

  • Packed Datasets Supercharge Speed: A member suggested using packed datasets to avoid seqlen=49 bugs, and to increase speed by packing sentences until max_seq_len is reached, avoiding wasted padding tokens.
    • To enable this feature, users can set dataset.packed=True and tokenizer.max_seq_len=<your-max_seq_len, e.g. 8096>, utilizing group masking for attention, as seen in PR #2560.
  • Chunking Responsibility Transferred: The responsibility for chunking is being moved to the loss function via loss = loss_fn(model.weight, logits, labels) to facilitate easier debugging.
    • A new file, torchtune.utils._tensor_utils.py, was created with a wrapper around torch.split and covered by unit tests, and will need to be merged.
  • NeMo's Resilient Training Tackles Crashes: A member attended a "Resilient Training with NeMo" session and shared insights on how NeMo addresses reasons for job crashes and wasted GPU time, highlighting that the topic is very close to torchtune.
    • NeMo's approach includes features like fault tolerance, straggler detection, asynchronous checkpointing, preemption, in-process restart, silent data corruption detection, and local checkpointing, but some features remain unimplemented.
  • AI-2027 Report Warns Superhuman AI: A member shared a link to the AI-2027 report predicting that the impact of superhuman AI over the next decade will be enormous, exceeding that of the Industrial Revolution.
    • The report is informed by trend extrapolations, wargames, expert feedback, experience at OpenAI, and previous forecasting successes.
  • CEOs Predict Superhuman AI by 2027: The CEOs of OpenAI, Google DeepMind, and Anthropic believe that AI could surpass human intelligence by 2027.
    • A member inquired whether AI was used to write the scrolling live updated chart on the AI-2027 website.
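The packed-dataset idea above can be sketched in plain Python: greedily concatenate tokenized samples up to max_seq_len so batches carry real tokens instead of padding (torchtune's dataset.packed=True additionally tracks per-sample boundaries for the group attention mask):

```python
# Greedy sequence packing: concatenate tokenized samples until max_seq_len
# is reached, so batches spend tokens on data instead of padding. Sketch
# only; torchtune's real implementation also records sample boundaries so
# group masking can keep attention within each original sample.
def pack(sequences: list, max_seq_len: int) -> list:
    packs, current = [], []
    for seq in sequences:
        seq = seq[:max_seq_len]  # truncate any over-long sample
        if current and len(current) + len(seq) > max_seq_len:
            packs.append(current)  # current pack is full; start a new one
            current = []
        current.extend(seq)
    if current:
        packs.append(current)
    return packs

print(pack([[1, 2, 3], [4, 5], [6, 7, 8, 9]], max_seq_len=5))
# → [[1, 2, 3, 4, 5], [6, 7, 8, 9]]
```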


tinygrad (George Hotz) Discord

  • LeetGPU Eyes Future tinygrad Support: Members discussed leetgpu.com and its potential future support for tinygrad, but did not provide specific details on the timeline or scope of the support.
    • One member inquired about plans to broaden accessibility to consumer-grade GPUs with accessible APIs, for local tinygrad development.
  • Huawei Ascend Cards Beckon Tinygrad Devs: A member offered access to Huawei Ascend cards for development purposes, which George Hotz expressed interest in, inquiring about purchasing options or cloud machine availability.
    • This could potentially expand tinygrad's hardware support and optimization efforts to include Huawei's architecture.
  • WEBGPU BEAM Hits Invocation Limits: Compiling a tinygrad model for WEBGPU with BEAM=2, users encountered the need to increase requiredLimits.maxComputeInvocationsPerWorkgroup to 512, reducing support for Android devices.
    • A PR and a hotfix branch suggest setting IGNORE_BEAM_CACHE=1 or implementing a general limiting mechanism to address the issue.
  • Tinygrad Karpathy GPT Gets Hotz Reimplementation: George Hotz has reimplemented the Karpathy GPT in tinygrad, a useful reference for anyone just starting to pick up tinygrad.
    • A user running this reimplementation on METAL reported a tinygrad.device.CompileError due to the 32 buffer limit, seeking advice on handling this constraint and linked to their main.py.


LlamaIndex Discord

  • LlamaIndex Embraces Multimodal Chat History: LlamaIndex now supports multimodal chat history, enabling multi-agent systems to process interleaving text and image messages, as detailed in this tweet.
    • The updated system facilitates agents in reasoning over both images and text, leveraging the ReAct agent loop.
  • Researcher Seeks PatentsView API: A community member requested an API key from the PatentsView contact to gather initial data for RAG implementation.
    • The goal is to leverage the PatentsView API for enhanced data retrieval and analysis within the RAG framework.
  • Workflows Morph into Tools: A community member proposed transforming a Workflow into a Tool by integrating it into a FunctionTool.
    • They demonstrated with a code snippet using async def tool_fn(...) to define the tool's functionality, followed by creating the tool with FunctionTool.from_defaults(tool_fn), which allows for specifying name, description, input annotations, and return values.
  • LlamaParse Faces Image Comprehension Quirk: A user reported that LlamaParse struggles to read charts/images, extracting text but failing to interpret the image itself, even with LVM and Premium mode.
    • A clarifying response indicated that LlamaParse can't process images without extractable text but can retrieve the image as an artifact for further processing, such as prompting an LLM to describe it.


Cohere Discord

  • AYA Vision Flounders on waves.jpg: A user reported that AYA vision returned a 400 error when analyzing a waves.jpg image, indicating an unsupported image file format despite AYA analyzing other JPG images successfully.
    • The error message specified that only PNG, JPEG, WebP, and GIF formats are supported, suggesting a possible issue with the specific JPG file or AYA's format detection.
  • Bedrock Blamed in AYA Vision Bug: A user saw coco.py: AWS Bedrock Command A when an error occurred, possibly suggesting a connection to AWS Bedrock when uploading the image.
    • It is unclear whether this is part of the AYA pipeline or an unrelated error during image analysis.
  • Full-Stack Savant Shows Skills: A full-stack developer with 8+ years of experience introduced themselves, highlighting expertise in React, Angular, Flutter, Swift, Python, TensorFlow, and OpenAI.
    • They have worked on high-impact projects in e-commerce, healthcare, and fintech, integrating cloud technologies, microservices, and DevOps.
  • Analyst Aims to Author AI Articles: A former product analyst on a break from job hunting is exploring writing about tech and AI.
    • They seek like-minded people to geek out with and chat about how tech shapes our world or practical uses of AI, feeling stuck in a bubble.
  • Web3 Wizard Welcomes AI: A Web3/AI engineer with 7+ years of experience in full-stack/AI development introduced themself.
    • They are focused on integrating AI with automation and are eager to help businesses with confidence and innovation.


DSPy Discord

  • Asyncio support coming to DSPy: A member inquired about plans to add asyncio support for general DSPy calls.
    • They cited use cases where they start with lightweight DSPy features and later expand into optimization, which they do using LiteLLM until they need DSPy features, expressing curiosity about future support.
  • LiteLLM for Lightweight DSPy: The discussion highlights a pattern of starting with lightweight DSPy features akin to using LiteLLM, then transitioning to DSPy's optimization capabilities as projects evolve.
    • This suggests a potential need for seamless integration or feature parity between lightweight DSPy usage and full-fledged optimization workflows.


Codeium (Windsurf) Discord

  • DeepSeek-V3 Boosts Performance After Upgrade: The DeepSeek-V3 model has been upgraded to DeepSeek-V3-0324, showing better performance in internal tests, according to Windsurf's announcement.
    • The Windsurf team posted a playful request to bookmark the announcement post for further updates and support.
  • Windsurf Teases DeepSeek-V3 Upgrade: Windsurf AI announced an upgrade to the DeepSeek-V3 model on X/Twitter, mentioning that the new version is DeepSeek-V3-0324.
    • The announcement hinted at a slight performance improvement based on internal evaluations.


Gorilla LLM (Berkeley Function Calling) Discord

  • Gorilla LLM Awaits Further Testing: A member offered assistance with Gorilla LLM and Berkeley Function Calling.
    • They confirmed readiness to address questions, make adjustments, or conduct retesting as needed.
  • Further support offered to robotsail: Robotsail offered support for Gorilla LLM and Berkeley Function Calling.
    • Robotsail is open to answering any questions and is ready to retest.


The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: !

If you enjoyed AInews, please share with a friend! Thanks in advance!

Don't miss what's next. Subscribe to AI News (MOVED TO news.smol.ai!):
Powered by Buttondown, the easiest way to start and grow your newsletter.