[AINews] not much happened today
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it!
a quiet day is all you need.
AI News for 3/31/2025-4/1/2025. We checked 7 subreddits, 433 Twitters and 30 Discords (230 channels, and 7148 messages) for you. Estimated reading time saved (at 200wpm): 719 minutes. You can now tag @smol_ai for AINews discussions!
people were mostly smart enough not to launch things on april fools'.
The Table of Contents and Channel Summaries have been moved to the web version of this email!
AI Twitter Recap
Open Source Models and Releases
- OpenAI's upcoming open-weight language model: @sama stated OpenAI will not impose restrictions like preventing usage if a service exceeds 700 million monthly active users. @LiorOnAI noted that OpenAI is planning to release their first open-weight model since GPT-2 in the next few months. @ClementDelangue welcomes OpenAI's willingness to share open weights, hoping it leads to a golden age of AI progress. @snsf mentioned the open weight model coming in the next few months.
- DeepSeek's Open-Source R1 Model: @scaling01 reports that OpenAI's commitment to releasing an open-weight language model is a response to DeepSeek's R1 model launch on January 20, 2025, which challenges the notion that China lags in AI development.
- License and Usage of Open Source Models: @cognitivecompai defended someone who had merely stated that a license was silly and that he's not going to follow it.
Model Performance and Benchmarks
- Gemma Model Performance: @osanseviero announced that Gemma 3 can do function calling and is now on the Berkeley Function-Calling Leaderboard. @jack_w_rae noted Gemini's rate of progress in math is amazing to see, driven by talented researchers, observing the uplift on HMMT.
- GemmaCoder3-12b: @ben_burtenshaw introduced GemmaCoder3-12b, a code reasoning model that improves performance on the LiveCodeBench benchmark by 11 points, highlighting its ability to run on 32GB of RAM, its 128k context length, and the option to activate thinking via the chat template.
- Qwen 2.5 Models: @TheTuringPost highlights Alibaba_Qwen's Qwen2.5-Omni, which understands any types of input and introduces a two-part Thinker-Talker system and TMRoPE feature to create responses in both text and natural speech.
- @vipulved reported that the TogetherCompute inference team achieved 140 TPS on a 671B parameter R1 model, which is ~3x faster than Azure, and ~5.5x faster than DeepSeek API on Nvidia GPUs.
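Taking the reported figures at face value, the implied baseline throughputs work out as follows (a back-of-envelope sketch; only the 140 TPS figure and the speedup ratios come from the post):

```python
# Back-of-envelope: implied baseline throughput from the reported speedups.
together_tps = 140     # reported tokens/sec on the 671B R1 model
azure_ratio = 3.0      # "~3x faster than Azure"
deepseek_ratio = 5.5   # "~5.5x faster than DeepSeek API"

azure_tps = together_tps / azure_ratio
deepseek_tps = together_tps / deepseek_ratio

print(f"Implied Azure throughput:    ~{azure_tps:.0f} TPS")     # ~47 TPS
print(f"Implied DeepSeek throughput: ~{deepseek_tps:.0f} TPS")  # ~25 TPS
```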
AI Product and Tool Releases & Updates
- ChatGPT and OpenAI: @kevinweil announced that the new image generation in ChatGPT is now available to 100% of free users. @OpenAI announced the release of a new voice in ChatGPT.
- Runway Gen-4: @TomLikesRobots shares excitement about Gen-4 for animating miniature diorama-style generations, praising its movement interpretation and style maintenance.
- LangChain: @LangChainAI introduced using LangGraph's pre-built computer use agent through a chat-based generative UI.
- Figure 03 Humanoid Robots: @adcock_brett discussed the first commercially deployed humanoid robots, highlighting full autonomy, real-world integration at BMW, fleet data for better pretraining, and BotQ manufacturing scaling.
- Other Tools: @juberti highlighted that the new OpenAI realtime transcription API now supports WebRTC connections. @TheRundownAI mentioned Amazon's Nova Act AI browser agent.
AI Research and Studies
- Efficient Reasoning for LLMs: @omarsar0 shared a survey focusing on reasoning economy in LLMs, analyzing how to balance deep reasoning performance with computational cost.
- Stanford's Tutor CoPilot: @DeepLearningAI reports that Stanford researchers developed Tutor CoPilot, a GPT-4-powered tool that assists online tutors.
- AI-driven Automation and Economic Implications: @EpochAIResearch discussed that AI investments might seem huge, but global wages add up to over $70 trillion.
Hugging Face and Gradio
- Gradio Usage: @ClementDelangue announced that Gradio just crossed 1,000,000 monthly developers using it in March.
Humor/Memes
- Sarcasm and April Fools' Jokes: @sama joked that "-restart-0331-final-final2-restart-forreal-omfg3" is gonna hit, i know it. @vladquant jokingly announced that after strategic review, Kagi is now Kagibara.
AI Reddit Recap
/r/LocalLlama Recap
1. LLM Mathematical Reasoning Limitations
- Olympiad Obstacle: Top Models Falter: A research paper revealed that state-of-the-art LLMs like O3-MINI and Claude 3.7 scored less than 5% on the 2025 USA Mathematical Olympiad (USAMO), despite being trained on extensive mathematical data including previous olympiad problems.
- The study highlighted significant issues with the models' logical reasoning, creativity, and self-evaluation capabilities, with LLMs overestimating their own scores by up to 20x compared to human graders. Community discussion pointed to the need for specialized proof-focused benchmarks and integration with formal proof tools like Lean or Coq.
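For a flavor of what integration with a formal proof tool looks like, here is a minimal Lean 4 example (a toy illustration assuming Mathlib is available, unrelated to any specific USAMO problem):

```lean
import Mathlib

-- A toy Lean 4 proof: every step is machine-checked, so a grader
-- cannot be fooled by plausible-sounding but invalid prose.
theorem sum_sq_nonneg (a b : Int) : 0 ≤ a ^ 2 + b ^ 2 := by
  have ha : 0 ≤ a ^ 2 := sq_nonneg a
  have hb : 0 ≤ b ^ 2 := sq_nonneg b
  exact add_nonneg ha hb
```

Unlike free-form chain-of-thought, a proof in this style either compiles or it does not, which is exactly the kind of self-evaluation signal the discussion found lacking in current LLMs.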
- Formal Proof Progress: Pioneering Paths Forward: Reddit users discussed ongoing research efforts in automated theorem proving, with users sharing links to Google's AlphaProof and several open-source projects from Princeton, Stanford, and Huawei focusing on formal mathematical proofs.
- The discussion highlighted the challenges of formalizing mathematics, with users suggesting that future AI systems might combine strict formalized symbolic logic with diffusion-like processes for concept discovery. Many agreed that current LLMs need specialized tools and training to excel at mathematical reasoning rather than just answer prediction.
2. DeepMind Research Publication Strategy
- Six-Month Secrecy: DeepMind's Defensive Delay: According to a Financial Times report, Google's DeepMind will implement a six-month embargo policy on publishing strategic generative AI research papers to maintain competitive advantage, with a researcher stating they "could not imagine us putting out the transformer papers for general use now."
- The community had mixed reactions, with some users arguing the delay is reasonable given how companies like OpenAI built their business on DeepMind's freely shared research, while others expressed concern that this could be a "race to the bottom" that would eventually lead to longer delays or permanent secrecy.
- Open Research Ramifications: Progress vs. Profit: Redditors debated the impact of DeepMind's new publication policy on AI advancement, with many pointing out that transformer architecture research from 2017 created hundreds of billions in value for other companies while Google failed to capitalize on its own innovations.
- Some commenters argued that open collaboration accelerates progress for everyone, noting "we probably wouldn't be where we are currently when it comes to the field if it wasn't publicly shared," while others defended DeepMind's right to protect its intellectual property and competitive position.
3. New Tools and Features for Local LLM Users
- Hugging Face's Hardware Helper: Hugging Face launched a new feature that allows users to check if their hardware can run specific GGUF models directly from the model page by entering their hardware specifications at https://huggingface.co/settings/local-apps.
- Users welcomed this quality-of-life improvement while suggesting additional features such as filtering models by hardware compatibility, estimating maximum context length, and providing layer offload recommendations for CPU+GPU setups. The Hugging Face team indicated they would iterate on these suggestions in future updates.
- Mobile Model Momentum: iPhone Inference Innovations: A developer demonstrated achieving 90 tokens per second with Llama 3.2 1B in float16 precision on an iPhone by completely rewriting the inference engine, showcasing significant performance improvements over existing solutions like MLX.
- The community discussed the trade-offs between using float16 versus quantized models, with some questioning whether the quality difference between fp16 and q8 was significant enough to justify the performance cost, while others debated the practical applications of such small models on mobile devices.
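The fp16-vs-q8 trade-off discussed above is largely a memory-bandwidth question; a rough sizing sketch for a 1B-parameter model (illustrative arithmetic only, ignoring activations and quantization metadata):

```python
# Rough weight-memory footprint for a 1B-parameter model at two precisions.
params = 1_000_000_000

fp16_gb = params * 2 / 1e9  # 2 bytes per weight
q8_gb = params * 1 / 1e9    # ~1 byte per weight

print(f"fp16 weights: ~{fp16_gb:.1f} GB")  # ~2.0 GB
print(f"q8 weights:   ~{q8_gb:.1f} GB")    # ~1.0 GB
# Token generation is typically memory-bandwidth bound, so halving the bytes
# read per token roughly doubles the throughput ceiling -- the crux of the
# fp16-vs-q8 quality/speed debate.
```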
- DeepSeek's Diminutive Deployment: V3 GGUF Quantization: User VoidAlchemy released new GGUF quantizations of DeepSeek V3-0324 using the ikawrakow/ik_llama.cpp fork, optimized to support 32k+ context in under 24GB VRAM with Multi-head Latent Attention (MLA) and high-quality tensors for attention/dense layers.
- The quantizations were designed specifically for the ik_llama.cpp fork and won't work with mainline llama.cpp or other tools like Ollama or LM Studio. Performance benchmarks showed achieving near Q8_0 quality with speeds comparable to 4bpw quants on CPU-only setups.
4. Novel LLM Research Concepts
- Temporal Training: LLMs Trapped in Time: A Reddit user proposed creating LLMs trained exclusively on data from before a specific year or time period, such as pre-2010, sparking discussion about the feasibility and implications of such historically-bounded models.
- Community members suggested that models limited to pre-1950s data would be possible with public domain books, newspapers, and archived materials, but noted such models would reflect historical biases and technological optimism while lacking modern concepts. Some pointed to existing research like TimeLMs that tracked how language models' performance degraded on recent content.
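A historically-bounded corpus of the kind proposed could be assembled with a simple date filter over document metadata (a minimal sketch; the `year` field and cutoff are assumptions, not an actual pipeline):

```python
# Minimal sketch: keep only documents published before a cutoff year.
# Assumes each record carries a 'year' metadata field.
def filter_by_cutoff(docs, cutoff_year):
    return [d for d in docs if d["year"] < cutoff_year]

corpus = [
    {"text": "On the Origin of Species...", "year": 1859},
    {"text": "Transformer architectures...", "year": 2017},
    {"text": "Radio broadcast transcript...", "year": 1948},
]

pre_1950 = filter_by_cutoff(corpus, 1950)
print([d["year"] for d in pre_1950])  # [1859, 1948]
```

The hard part in practice is not the filter but reliable publication dating, which is exactly where the historical-bias concerns raised in the thread come in.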
- Acoustic Analysis: GPU Symphonies from Model Inference: Users discovered that different LLM models produce distinctive sounds from GPUs during inference, with a post linking to evidence that these audio patterns are specific to model architecture, quantization, and context size combinations.
- The discussion revealed this phenomenon is caused by "coil whine" from capacitors and inductors in the GPU's voltage regulator module, with some noting researchers have previously extracted encryption keys by recording such processing noise, raising potential security implications.
- Attention-Free Architecture: Qwerky's Quantum Leap: A post highlighted Qwerky-72B and 32B, attention-free models trained on just 8 GPUs, representing a significant advancement in efficient model architecture that requires less computational resources.
- These models, available on Hugging Face, demonstrate how attention-free architectures can reduce VRAM requirements while maintaining performance, with community members noting the potential implications for long context handling and accessibility of large model training.
Other AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding
1. GPT-4o Image Generation Capabilities
- Precise Placement Prowess: Reddit users are impressed by GPT-4o's ability to handle precise item arrangement and text in generated images. A user shared examples showing how the model accurately places multiple icons in grid layouts with correct labeling, maintaining consistency across complex visual hierarchies.
- Many commenters noted that GPT-4o's unified text-image architecture gives it a significant advantage in understanding and executing detailed prompts compared to other models. One user demonstrated the model could handle up to 24 distinct icons with labels before quality degradation, showcasing its impressive compositional capabilities.
- Content Filter Frustrations: Users expressed frustration with GPT-4o's content filtering system, with one post titled "Chat gpt 4O sucks and everything trips its baby mode content filters" gaining significant traction. The poster complained about being unable to generate even mildly violent or suggestive content like fantasy battle scenes.
- Despite complaints, several users demonstrated techniques to work around the filters, sharing successful generations of warrior characters and stylized art. This sparked debate about OpenAI's approach to content moderation, with some users creating satirical content mocking the filters, including a GitHub repo titled "I've reverse-engineered OpenAI's ChatGPT 4o image generation algorithm" that was actually an April Fool's joke.
2. Claude vs Gemini Competition Heats Up
- Gemini 2.5 Takes the Lead: A post titled "This is the first time in almost a year that Claude is not the best model" sparked significant discussion as a Claude user admitted that Google's Gemini 2.5 now outperforms Claude across multiple use cases. The post highlighted Gemini's superior handling of context and overall reliability.
- Users debated specific strengths of each model, with many noting that Gemini 2.5's million-token context window is a game-changer compared to Claude's more limited capacity. Several commenters praised Gemini's creative writing abilities, though some suggested the influx of pro-Gemini posts might be strategic "astroturfing" rather than organic user feedback.
- Claude's Service Struggles: Multiple posts documented issues with Claude's service reliability, with users reporting increased rate limiting and error messages like "Message Limit Reached" appearing more frequently for paid subscribers. Screenshots showed the service becoming unresponsive despite users paying for premium access.
- The timing of these issues coincided with growing praise for Gemini 2.5, leading some users to question Anthropic's infrastructure scaling. One user wrote, "Anthropic should scale or get out of the business, we are paying customers," while others noted they were switching to Gemini due to Claude's increasingly restrictive usage limits.
3. Video Generation Breakthroughs
- Wan 2.1 Video Model Mastery: User @legarth shared an impressive video demonstration of the Wan 2.1 vid2vid model running locally on a 5090 GPU, transforming a clip from Tropical Thunder into a scene featuring the Joker. The model accurately maintained physics details like jacket movement despite working only with pose information.
- The creator explained they processed 216 frames (9 seconds at 24fps) but noted quality deterioration after about 120 frames. The community was particularly impressed by the model's ability to predict physics from motion alone, with one commenter noting "the jacket's cool. Physics" and another highlighting how the model handled hair movement despite the original actor being bald.
- VACE Video Control Released: A significant open-source video generation advancement was announced with the partial release of the VACE (All-in-one Video Creation and Editing) models on GitHub. The release includes VACE-Wan2.1-1.3B-Preview and VACE-LTX-Video-0.9, with the larger 14B version promised for later.
- Users expressed excitement about this open-source alternative to closed commercial platforms, with one commenter noting: "If this works anything like the examples shown, open-source video just leveled up big time." The technology appears to offer enhanced control over video generation, including structure and pose preservation features.
4. AI Development Tools and Innovations
- Claude Code's Costly Creation: A developer shared their experience spending $417 on Claude Code to build a word game called LetterLinks, detailing both successes and frustrations with the AI coding assistant. Despite the high cost, the user concluded it was still cheaper than hiring a freelancer for the estimated $2-3K the project would have required.
- The post highlighted specific issues with Claude Code, including context window limitations that became problematic as the codebase grew to 15K lines, and the need for extensive manual testing since "Claude can write code all day but can't click a f***ing button to see if it works." Many commenters suggested alternatives like using Gemini 2.5 Pro with its million-token context window or Claude MCP on the desktop app.
- EasyControl: Diffusion Transformer Enhancement: A new framework called EasyControl was released, designed to add efficient and flexible control capabilities to Diffusion Transformer (DiT) models. The system incorporates a lightweight Condition Injection LoRA module and position-aware training to enhance model compatibility and generation flexibility.
- Community members were particularly interested in EasyControl's potential to provide ControlNet-like functionality for Flux models, with one user commenting "Could this be the long-awaited good ControlNets for Flux?" Testing revealed mixed results, with OpenPose control working well but subject transfer capabilities showing inconsistent performance.
5. Pixel Art and Retro Graphics AI
- Retro Diffusion's Pixelated Precision: Retro Diffusion launched an interactive browser-based playground for generating authentic pixel art using AI, with no signup required. The FLUX-based model can create pixel art across various styles through smart prompting alone, without requiring LoRAs.
- The technical article accompanying the launch detailed how Retro Diffusion solved challenges specific to pixel art generation, including grid alignment, limited color palettes, and maintaining pixel-perfect outputs. The platform's creator joined the discussion, answering questions about features like animation capabilities and color palette control.
- XLSD: Lightweight Model Magic: Developer @lostinspaz shared progress on the XLSD project, which aims to create a high-quality image generation model that can run on systems with limited VRAM (8GB or even 4GB). The approach involves forcing SD1.5 to use the SDXL VAE and then training it to produce significantly better results.
- Comparison images showed substantial quality improvements over the base SD1.5 model, with the developer noting they were "cherry picking a little" but providing a fair comparison using identical settings. The community responded positively to this optimization-focused approach, with one commenter appreciating "people who push a piece of technology to its limit and explore it just for the sake of it."
AI Discord Recap
A summary of Summaries of Summaries by o1-preview-2024-09-12
Theme 1: OpenAI's Open-Weight Model Sparks Excitement
- Sam Altman Dangles Open-Weight Model Carrot: OpenAI CEO Sam Altman announced plans to release a powerful new open-weight language model, seeking developer feedback to maximize its utility. He assured they "will not do anything silly like saying that you can't use our open model if your service has more than 700 million monthly active users."
- Community Speculates on OpenAI's Strategy Shift: Nathan Lambert expects a 30B parameter reasoning model with an MIT/Apache license, igniting discussions about OpenAI's potential impact on the open-source community.
- Enthusiasts Brace for OpenAI's Return to Open Releases: AI developers express optimism about OpenAI's move, seeing it as a boost for collaboration and innovation in AI development.
Theme 2: New AI Models Under the Microscope
- Gemini 2.5 Pro's 'Aliveness' Stirs Turing Test Talk: Users are intrigued by Gemini 2.5 Pro's unique interaction style, suggesting it "might be the first to pass a serious Turing Test" due to its apparent aliveness and curiosity.
- DeepSeek R1 Zips Past Rivals, Democratizes RL: DeepSeek R1 outperforms larger labs with efficient resource use and an MIT license, making reinforcement learning accessible to the "GPU poor" through GRPO.
- Gemma 3 Smokes Gemini 1.5 in Benchmarks: Gemma 3 27B outperforms Gemini 1.5 Flash in benchmarks like MMLU-Pro and Bird-SQL, impressing users with its superior capabilities.
Theme 3: Users Vent Over AI Tool Troubles
- Manus.im Users Rage Over Credit Crunch: Manus.im's new credit system infuriates users as credits deplete rapidly, leading them to recommend alternative AI research tools like Traycer.
- Gemini 2.5 Pro Rate Limits Drive Users Nuts: Frustrated users encounter rate limits on Gemini 2.5 Pro, debating whether limits apply to free and paid tiers, with some attempting to bypass them via a VPN.
- Cursor Charges for Free Models? Users Say 'What the Heck!': Cursor users question being charged to use a free model, sparking discussions about API usage, billing practices, and transparency within the platform.
Theme 4: Open-Source Contributions and Technical Innovations Shine
- Neuronpedia Opens the Data Floodgates: The interpretability platform Neuronpedia goes open source under MIT license, releasing over 4 TB of data and tools to democratize model interpretability.
- Stanford Teaches Transformers to the Masses: Stanford opens its CS25 Transformers seminar course to the public via Zoom and YouTube, covering topics from LLM architectures to creative applications.
- Megatron Tensor Parallelism Gets Deep Dive Treatment: An illustrated deep-dive into Megatron-style tensor parallelism, including fused/parallel CE loss, is shared, enhancing understanding of ML scalability and performance techniques.
Theme 5: AI Makes Strides in Law and Healthcare
- AI Decodes Legal Jargon in New Seminar Series: The Silicon Valley Chinese Association Foundation hosts a seminar on AI in legislation, featuring the founder of Legalese Decoder, exploring how AI simplifies complex legal documents.
- Sophont Aims for Medical AI Revolution: Sophont launches to build open multimodal foundation models for healthcare, striving to create a DeepSeek for medical AI.
- Dream On! Rem App Wants You to Journal Your Nighttime Adventures: Rem introduces a dream journaling app that allows users to record, analyze, and share dreams, leveraging AI to uncover hidden patterns in their subconscious.
PART 1: High level Discord summaries
Manus.im Discord Discord
- R1 Users Rage Over Credit Crunch: Many R1 users expressed dissatisfaction with the new credit system, with some experiencing complete credit depletion after just a few requests, and recommended alternative AI research tools like Traycer to save credits.
- They observed that the system is like gambling and proposed more clear and transparent options for future plans, urging a reconsideration for user adoption.
- Decoding Credit Consumption: Credits deplete based on LLM tokens, virtual machines, and third-party APIs, increasing with task complexity and time; credits are now consumed even when just browsing online.
- Members reported projects failing to upload and needing 800 credits plus 1,800 more for debugging, pointing out that debugging on ChatGPT was superior.
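The consumption model described (tokens plus VM time plus third-party API calls) can be illustrated with a toy estimator; every rate below is made up for illustration and is not Manus's actual pricing:

```python
# Toy credit estimator -- all rates are hypothetical, purely illustrative.
RATE_PER_1K_TOKENS = 1.0  # hypothetical credits per 1k LLM tokens
RATE_PER_VM_MINUTE = 2.0  # hypothetical credits per virtual-machine minute
RATE_PER_API_CALL = 5.0   # hypothetical credits per third-party API call

def estimate_credits(tokens, vm_minutes, api_calls):
    return (tokens / 1000 * RATE_PER_1K_TOKENS
            + vm_minutes * RATE_PER_VM_MINUTE
            + api_calls * RATE_PER_API_CALL)

# A long-running, tool-heavy task burns credits far faster than a short chat,
# which matches the depletion users describe.
print(estimate_credits(tokens=50_000, vm_minutes=30, api_calls=4))  # 130.0
print(estimate_credits(tokens=2_000, vm_minutes=0, api_calls=0))    # 2.0
```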
- OpenManus Gains Traction: Despite security concerns with PAT and API keys, there's rising interest in OpenManus, with some planning to evaluate its capabilities, with a member asking if the tool's output could improve.
- Members caution about capability deficiencies when adapting it to Manus' work scenarios, while also pointing out that it can generate interactive study guides as websites and in-depth research, depending on the situation.
- Manus Now Offers Website Hosting: Members are reporting success with Manus on creating a hosted website, pointing out that the software provides DNS and hosting services, while they are combining services like Perplexity and Gemini Deep Research.
- A member says there's a video on website creation, leading other members to inquire about how to get people to use the website.
- Manus Android App Debuts: Users discover that Manus has an Android app, accessible via the browser by clicking a phone icon, which redirects to the Play Store.
- Some members even jokingly suggested purchasing an iPhone as a solution.
LMArena Discord
- Meta Models' Safety Settings Get Downgraded: Newer models from Meta are becoming safer by sanitizing censored details when inferring hidden context from corrupted text, marking a shift in model behavior.
- Previous models like Themis, Cybele, and Spider were eager to go where other models couldn't.
- Decoding the 'Venom' System Prompt: Members analyzed the system prompt for models like Spider, Cybele, and Themis, believing they share a similar prompt to the now-exposed venom prompt.
- The analysis reveals a whacky but intelligently crafted prompt that heavily influences the models' style and responses, particularly in how they format and structure their outputs.
- Gemini 2.5 Pro's 'Aliveness' Sparks Debate: Members expressed intrigue over the aliveness and curiosity of Gemini 2.5 Pro, with one suggesting it might be the first to pass a serious Turing Test due to its unique interaction style and exceptional creative writing.
- They highlight Gemini's top scores on Philip's SimpleBench as evidence of its potential and note the model appears to be more creative and engaging, leading to calls for a double-blind Turing Test.
- LMArena Unleashes New Models into the Pantheon: LMArena introduced a flood of anonymous models like Aether, Maverick, Ray, Stargazer, Riveroaks, with members trying to uncover their origins and capabilities.
- Stargazer is said to be made by Google (the same model as Nebula), and Riveroaks claims to be GPT-4o from OpenAI, while Maverick, Spider, and 24_karat_gold share a similar style due to their common system prompts and Meta origins.
- Alpha Arena Adds Copy Code and Images: The Alpha Arena now features a copy-code function and image generation capabilities, enhancing accessibility.
- Testers are encouraged to provide feedback via a Google Forms link and report bugs via an Airtable link.
Cursor Community Discord
- Reasoning Ability of Gemini 2.5 Pro Debated: Members are debating the reasoning capabilities of Gemini 2.5 Pro, with some finding it quick but lacking depth, while others praise its performance in specific coding scenarios citing Tweet from Min Choi.
- Some suggested Claude 3.7 handles complexity and detail more effectively, however the new Gemini Pro 2.5 model is now being used in Cursor. See Tweet from Ryan Carson.
- Account Restrictions Ignite Trial Abuse Discussion: A user's account limitations sparked a debate about trial abuse, with claims of accounts being flagged for abuse and requiring a credit card.
- Alternatives like Windsurf or Cline were suggested to bypass payment issues, but no further details were provided about how to use those tools or their reliability.
- AI's Impact on Jobs Discussed: Members are discussing AI's potential impact on employment, speculating that 86% of jobs could be replaced by 2030.
- The response was to learn ML/AI and Prompting properly, with the additional suggestion to learn polynomials with regressions.
- Cursor's Charging for Free Models Questioned: Members questioned Cursor for charging to use a free model, with explanations clarifying that Cursor manages API usage through their wallet and has deals with AI models via Fireworks.
- The general consensus was that Cursor has limited token usage but is approximately 10x cheaper than Claude, offering a more cost-effective solution for some users.
Unsloth AI (Daniel Han) Discord
- Multi-GPU Support Marches to Unsloth: Unsloth is adding multi-GPU support, with the first release focusing on data parallelism, but fsdp (Fully Sharded Data Parallelism) may not be included initially.
- The fsdp (Fully Sharded Data Parallelism) component will be under the AGPL3 license.
- DeepGrove's Bonsai Boasts Budget BitNet Bootstrapping: A member is skeptical about DeepGrove's Bonsai claiming to pretrain a BitNet with only $70 and 3.8b tokens.
- They're running the model in Kaggle to see if it's valid, exploring whether the model is a blindly copied Qwen model or a Qwen model continue-trained into a BitNet.
- Unsloth Dataset Defect Detected: A user ran into a `ValueError` when using a custom dataset in Unsloth Orpheus format, which was later resolved by using a GPU.
- Another user mentioned that the Orpheus dataset uses SNAC, which operates at 24kHz.
- Gemma 3 Generates Text-to-Image Wonders: A user sought image and text inference samples for Unsloth/Gemma 3 using Hugging Face, referencing a Gemma 3 demo on Hugging Face Spaces.
- It was noted that while Llama 3.2 Vision requires an image, Gemma 3 should not have the same issue.
- Long Ctx Benchmarks? RULER is Rule!: For long ctx benches, a member stated that RULER is the bare minimum for what should be considered a long ctx bench, and NIAH is garbage.
- They added that some of the recent ones are alright.
Perplexity AI Discord
- Discord Revamp Imminent!: The moderation team is gearing up to overhaul the Discord experience, focusing on simplified onboarding, a consolidated feedback channel, and automated Pro Channel access over the next week.
- These changes aim to streamline user engagement and ensure the team stays responsive to community needs.
- Space Instructions Still Caged?: Users found out that Space Instructions in Perplexity AI have limitations on controlling the search experience, mainly impacting output summarization.
- Because instructions only apply after the data has been extracted, this prevents the AI from avoiding specific topics.
- Image Generation Goes MIA: Users noticed the disappearance of the image creation feature within Perplexity.
- While it isn't clear if the function is completely discontinued, one user suggested using the web search to find the generate option, but another confirmed that the function doesn't seem to appear for everyone, perhaps indicating phased rollout or feature testing.
- GPT Omni Receives Thumbs Down: Members have reported frustration with GPT Omni, with one describing it as "suck ass".
- While designed for smarter interaction with audio, video, and images, users have noted that Omni has been dumbed down compared to GPT-4 for cost reasons.
- JSON Escapes Sonar API: A user reported issues with the Sonar API adding odd special characters to the JSON results when searching the web, despite using pydantic for formatting.
- The user provided an example where extra characters were added to the `source_name`, `source_title`, `summary`, and `url` fields in the JSON output.
OpenAI Discord
- ChatGPT Debuts Monday Voice: ChatGPT introduced a new voice option called Monday, accessible via the voice picker in the top right corner of voice mode, as shown in this demo video.
- Beware of Fake ChatGPT Apps!: Users reported encountering fake ChatGPT apps on the Play Store, where they purchased the app but did not receive access, emphasizing the need to verify purchases via their purchase history.
- It's crucial to ensure you're using the official app to avoid scams and ensure you have access to genuine OpenAI services.
- Gemini 2.5 Pro Rate Limits Plague Users: Users reported experiencing rate limits on Gemini 2.5 Pro, leading to discussions on whether the limit applies to both free and paid tiers, with some users trying to bypass rate limits by using a VPN.
- A suggestion was made to use Gemini in Google AI Studio instead, where the usage limits are higher (50 requests per day).
- ElevenLabs Model Promises Audiobooks: A member explored ElevenLabs' new model for narrated audio books, citing its voice cloning feature.
- While they were impressed with the initial results, they are waiting for OpenAI to release a similar voice product to avoid subscribing to external services; it could also serve as a voice-acting placeholder for game developers.
- Model Users RESET Rigid Patterns: A member shared a code snippet, `FORMAT_RESET`, to help models acknowledge when they've fallen into rigid patterns and rethink their approach.
- The code encourages the model to analyze what format would better suit the response and completely rethink its approach without defaulting to templates.
LM Studio Discord
- Gemma 3 Smokes Gemini 1.5: Gemma 3 27B outperformed Gemini 1.5 Flash in benchmarks like MMLU-Pro and Bird-SQL, with one member producing the data using Gemini 2.5 Pro, available for free on OpenRouter.
- A user with a 4060 Ti and i5 12400F was recommended Qwen Coder 7B, available on LM Studio's model page, though members emphasized that local LLMs generally perform worse than cloud alternatives.
- Gamers Plug eGPU into LM Studio: Members discussed the feasibility of using an eGPU with LM Studio, suggesting it should work if the computer recognizes it, though speeds may be slower, as referenced in a YouTube video comparing LLMs on RTX 4090 Laptop vs Desktop.
- Another user observed a 3.24x speedup from M4 Max to 5090 after resolving crashing issues, which aligns with the 3.28x ratio of their memory bandwidths when doing QwQ 32B 4 bit quant comparisons.
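The quoted ratio is easy to sanity-check, since token generation for a 4-bit quant is typically memory-bandwidth bound. A back-of-envelope sketch, assuming the commonly published figures of roughly 546 GB/s for the M4 Max and 1792 GB/s for the RTX 5090:

```python
# Rough bandwidth figures (GB/s) from public spec sheets; treat as assumptions.
m4_max_bw = 546.0     # Apple M4 Max unified memory bandwidth
rtx_5090_bw = 1792.0  # NVIDIA RTX 5090 GDDR7 bandwidth

# Decoding a 4-bit QwQ 32B quant is dominated by streaming the weights,
# so tokens/s should scale roughly with memory bandwidth.
ratio = rtx_5090_bw / m4_max_bw
print(round(ratio, 2))  # ~3.28, close to the observed 3.24x speedup
```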
- Copilot's Code Called Garbage!: Members debated the benefits of AI assistance in programming, with one arguing it hurts more than helps due to learning from AI slop, expressing concern that Copilot is trained on garbage code.
- Others disagreed, saying Copilot works great for experienced developers, but one person noted average users trust the recommendations too easily.
- Context Size Drives Mac Preference: Despite faster Nvidia GPUs, users are leaning towards Macs for the freedom to have more context size, highlighting the utility of larger context sizes even with slower speeds.
- One user wondered what would happen if they could load the context overflow into the shared memory/system RAM while keeping the entire model in VRAM, but another user noted that the LLM needs all the context in VRAM to generate the next token.
- Nvidia Drivers Fail after 10 Hours: A user reported Nvidia driver instability after running models for 10-12 hours, requiring a driver reinstall to resolve performance issues, clarifying the issue was with the Nvidia driver itself, not the Windows OS.
- A user inquired about performance results for the Tenstorrent Wormhole (n150d and n300d) within the Discord community, expressing interest in obtaining TOK/s metrics for these models.
aider (Paul Gauthier) Discord
- Gemini 2.5 Pro is a Mixed Bag: Users are seeing mixed results with Gemini 2.5 Pro, with some models hallucinating, DC'ing, or being top-tier for coding tasks.
- One user found Gemini 2.5 Pro and DeepseekV3 to be "almost free and top tier", whereas others are giving up, and throwing away their computers, as shown in this GIF.
- RateLimitError Fixes Sought: Users are experiencing frequent RateLimitErrors when requesting summaries and clearing history.
- It was clarified that the rate limit is likely based on the number of requests per minute or day, and a possible solution may be found in this Github issue.
- Dot Command Revolution?: A user is promoting the use of .dotcommands as a productivity tool for developers, to automate tasks with single-line commands such as `.status` and `.next`.
- The goal is to provide cognitive shortcuts optimized for clarity and specific functionality, but it was noted that THE DOT REVOLUTION IS HERE 🔥 and coders everywhere will want to try this one cool trick.
- Aider's Subtree Savior Emerges: Members are seeking ways to limit aider to a subdirectory of a monorepo.
- The solution is to use the `--subtree-only` switch after changing to the desired directory, setting aider to ignore the repo outside the starting directory; the asker also pointed out the FAQ on large monorepos.
- The Case of the Misconfigured Model: A member reported that specifying model names in a local YAML config file wasn't working as expected.
- Despite the startup message showing the correct config settings, aider still defaulted to anthropic/claude-3-7-sonnet-20250219 rather than the configured deepseek/deepseek-chat.
OpenRouter (Alex Atallah) Discord
- Organizations Feature Escapes Beta!: OpenRouter announced the Organizations feature is out of beta, enabling teams to manage billing, data policies, provider preferences, and API keys in one place, according to this X post.
- Over 500 organizations were created during the two-week beta, offering full control over data policies and billing.
- Web Search Invades Chatroom!: Web search results are now integrated into the chatroom, using Perplexity results formatted similarly to OpenRouter's `:online` model variants.
- A user requested that OpenRouter post on Bluesky to avoid Xitter reliance.
- Gemini Flash 2 Transforms!: OpenRouter offers full 1M context on paid Gemini Flash 2 requests, with middle-out transforms being opt-in.
- These transforms are applied by default on endpoints with context length less than 8192 tokens, and only once the 1M limit is reached.
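As a sketch, opting in looks like adding a `transforms` field to the request body; the field name follows OpenRouter's documented request format, while the model slug and message content here are illustrative:

```python
import json

# Hypothetical OpenRouter chat request opting in to the middle-out transform,
# which compresses the middle of prompts that exceed the context window.
payload = {
    "model": "google/gemini-2.0-flash-001",  # illustrative model slug
    "messages": [{"role": "user", "content": "Summarize this long document ..."}],
    "transforms": ["middle-out"],  # opt-in; omit the field to leave prompts untouched
}
body = json.dumps(payload)
```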
- Usage Downloads Coming Soon!: A member requested downloads of their usage data, including tokens and costs, as displayed on the activity page, for credit verification.
- A maintainer responded that while this feature is unavailable, we're working on it.
- EU Provider Selection Quandary!: A user inquired about selecting providers residing only in the European Union due to legal requirements.
- A maintainer noted the need but mentioned limited coverage, recommending seeking an EU certified provider for strict EU data guidelines if provider selection is not enough.
Eleuther Discord
- Stanford Teaches Transformer Class, Online: Stanford has opened its CS25 Transformers seminar course to the public via Zoom, featuring discussions with researchers and covering topics from LLM architectures to creative applications, with past lectures available on YouTube.
- The course includes lectures, social events, networking sessions, and a Discord server for discussions.
- Deep Sets finds Triangles, Achieves Nothing: A member shared a link to a paper titled Deep Sets for the Area of a Triangle (arxiv link), which presents a polynomial formula for triangle area in Deep Sets form.
- The abstract concludes that the project, motivated by questions about computational complexity of n-point statistics in cosmology, gained no insights of any kind.
- Neuronpedia Opens the Data Floodgates!: The interpretability platform Neuronpedia is now MIT open source and available on GitHub, offering a quick Vercel deploy.
- A trove of interpretability data, totaling over 4 TB, is available for download as Public Datasets.
- SmolLM Scores Zeroes Out, PR fixes Aggregate Scores: A member reported that aggregate scores for tasks like leaderboard_bbh, leaderboard_math_hard, and leaderboard_musr were empty in the results JSON when running leaderboard evaluations with lm-eval on SmolLM-1.7B.
- A member shared a PR adding subtask aggregation to address missing aggregate scores in tasks with subtasks.
Interconnects (Nathan Lambert) Discord
- CodeScientist Automates Science: AllenAI introduces CodeScientist, an autonomous system using genetic search over research articles and codeblocks to generate and evaluate machine-generated ideas, with 19 discoveries from experiments in agents and virtual environments detailed in their paper.
- The system addresses limitations in current ASD systems by exploring broader design spaces and evaluating research artifacts more thoroughly.
- OpenAI Teases Open-Weight Model: OpenAI plans to release its first open-weight language model since GPT-2, seeking developer feedback via this form to maximize its utility, according to Sam Altman's tweet.
- Altman stated that they will not do anything silly like saying that you can't use our open model if your service has more than 700 million monthly active users.
- Meta Preps Screened Smart Glasses: Meta is planning to launch $1000+ Smart Glasses with a screen and hand gesture controls later this year, according to Mark Gurman's report.
- Members are interested to see how they'll do against xreal.
- Pydantic Evaluates LLMs: Pydantic Evals is a powerful evaluation framework designed to help systematically test and evaluate the performance and accuracy of the systems you build, especially when working with LLMs.
- It provides a structured environment for assessing model capabilities and identifying areas for improvement.
- Lambert on OpenAI's Return: Nathan Lambert shared his thoughts on OpenAI's return to open-weight releases in a substack post, mentioning that he may use this format for unbaked career thoughts too.
- He also mentioned DMing some OpenAI folks about it, hoping to find allies of open source who feel exiled by the current situation.
GPU MODE Discord
- A100 Parallel Threads Face Reality Check: Members debated the maximum number of parallel threads on an A100 GPU, but practical tests using GeoHot's tool revealed a limit of 24576 threads, or 256 threads per SM, before performance degrades.
- The conversation clarified that GPUs use oversubscription to hide latencies with cheap (~1 cycle) context switches, suggesting adding threads beyond the "parallel threads" limit doesn't significantly increase runtime.
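The measured figures can be put next to the hardware limits; a quick check, assuming the A100's published 108 SMs and 2048 concurrently resident threads per SM:

```python
# A100 spec assumptions: 108 SMs, 2048 concurrently resident threads per SM.
sms = 108
resident_per_sm = 2048
hw_resident = sms * resident_per_sm        # total threads that can be resident at once
measured_total, measured_per_sm = 24576, 256

# The measured "no-degradation" limit is far below full residency: beyond it,
# the scheduler oversubscribes and hides latency with ~1-cycle context switches.
print(hw_resident, measured_total // measured_per_sm)
```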
- FlexAttention Lets it All Hang Out: FlexAttention now supports arbitrary sequence lengths, removing the restriction for segment sequence lengths to be a multiple of 128 in PyTorch 2.6.
- This enhancement was discussed with Horace He at a GPU mode event in San Jose.
- Desire for Memory Savings Using Tensor Deletion: A user seeks methods to delete argument tensors within a loss function to achieve memory savings of approximately 7GB, with a related GitHub Issue available.
- The user wants to free the storage associated with a tensor after it's no longer needed, even if a reference exists in the outer scope, while ensuring it remains compatible with torch compilation to avoid graph breaks.
- Apple Pushes MLX to the Max: Apple is hiring engineers for their MLX team to build scalable, distributed training and research pipelines, advancing the frontier of ML and systems.
- The company seeks system engineers and software developers with a background in ML to build technologies powering future products.
- Megatron Tensor Parallelism gets the Deep Dive Treatment: A member wrote an illustrated deep-dive into Megatron-style tensor parallelism, including the fused/parallel CE loss, and seeks feedback on the content, available here.
- The article aims to deepen the understanding of ML scalability & performance techniques.
Latent Space Discord
- Cursor Codes Cash with Huge Round: Cursor closed a $625M funding round at a $9.6B post-money valuation, led by Thrive and A16z, with Accel as a new backer, and achieved $200M ARR, a 4x increase from its previous round in November 2024, according to The Information.
- The round sparked chatter about vibe coding, with Abe Brown noting the company's valuation has rapidly grown, approaching $10B.
- Etched Etches $85M for Transformer ASIC: Etched, a startup building transformer ASICs, closed an unannounced $85M round at a $1.5B valuation, following two stealth rounds at $500M and $750M, according to Arfur Rock.
- The company claims its chip Sohu can process over 500,000 tokens per second running Llama 70B, with one 8xSohu server replacing 160 H100s, although it cannot run CNNs, LSTMs, SSMs, or other non-transformer models.
- OpenAI Opens the Gates with Open-Weight Model: OpenAI plans to release its first open-weight language model since GPT-2 in the coming months, seeking developer feedback on how to maximize its utility, according to OpenAI.
- The company will evaluate the model using its preparedness framework and host developer events in SF, Europe, and APAC, with Nathan Lambert anticipating a 30B parameter reasoning model with MIT/Apache license, according to his tweet.
- OpenDeepSearch Searches Deeper than GPT-4o: Seoong79 announced the release of OpenDeepSearch (ODS), an open-source search agent that works with any LLM and outperforms OpenAI's specialized model for web search, GPT-4o-Search, on the FRAMES benchmark from DeepMind, according to his tweet.
- Specifically, OpenDeepSearch achieved +9.7% accuracy over GPT-4o-Search when paired with DeepSeek-R1.
- Sophont Seeks to Solve Medical AI with Open Models: iScienceLuvr announced the launch of Sophont, a company building open multimodal foundation models for the future of healthcare, aiming to create a DeepSeek for medical AI, according to his tweet.
- The new company seeks to create foundation models that can perform well in healthcare.
HuggingFace Discord
- DeepSeek R1 Zips Past Rivals: A tweet lauded DeepSeek R1 for outperforming larger Western labs with efficient resource utilization and a permissive MIT license.
- The release also democratized RL for the GPU poor through GRPO.
- xAI Swallows X Corp!: xAI acquired X in an all-stock deal, valuing xAI at $80 billion and X at $33 billion according to Elon's Tweet.
- The merger aims to combine xAI's AI expertise with X's extensive user base.
- LLM Hyperparameter Tuning Hot Takes: Members sought guidance on selecting hyperparameters for fine-tuning LLMs, and were directed to Unsloth's LoRA Hyperparameters Guide.
- The question focused on how contextual changes influence hyperparameter settings.
- Coding Model OpenHands LM Opens!: The open coding model OpenHands LM, a 32B parameter model, is now on Hugging Face.
- The coding model is intended for use in autonomous agents for software development, as mentioned on the project blog.
- Gradio Rides the Million Developer Wave!: Gradio announced they've hit 1,000,000 monthly active developers for building and sharing AI interfaces.
- The Gradio team expressed gratitude for the community's contributions.
Modular (Mojo π₯) Discord
- MAX 25.2 Livestream Glitches: Modular's MAX 25.2 livestream faced technical difficulties, but a cleaned-up recording and Chris' lightning talk are now both available on YouTube.
- The team apologized and promised a better system for future events, with one member humorously mistaking a GTC video of Chris for the live event.
- Compiler Error Bugging Users: A user reported a confusing compiler error message when defining a method for a `Dataset` struct, suspecting a compiler bug; see GitHub issue #4248.
- A potential cause could be using `out self` instead of `mut self`, highlighting the need for clearer error messaging.
- Enums MIA in Mojo: Inquiries about enum updates in Mojo revealed that there are no updates available at this time.
- The response was a simple "Sadly no."
- FlexAttention MAXed Out?: A user inquired about implementing flex-attention in Mojo, linking to a PyTorch blog post and suggesting it as a custom op in MAX.
- The response indicated that Mojo on the GPU is close to CUDA and "unless you run into something that's a work in progress, MAX should be able to do more or less whatever you want."
- Float-to-String Algorithm Fails to Float: A user ported a new float to string algorithm to Mojo from this code, referencing the creator's CPPCon talk, but found it slower than the standard library's dragonbox implementation.
- Stringifying `canada.json` went from the mid 30ms range to the low 40ms range, despite ripping the formatting from the standard library.
Nous Research AI Discord
- OpenAI API Gets One-Line Fix: Any tutorial working with the OpenAI API should work with the Nous Research AI API, provided the endpoint is changed to `endpoint = "api.nousresearch.com"`.
- A member confirmed the fix and noted that they will be adding styles.
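A minimal sketch of the swap, using only the standard library; the `/v1/chat/completions` path, auth header, and model name are assumptions carried over from the usual OpenAI-compatible request shape:

```python
import json
import urllib.request

endpoint = "api.nousresearch.com"  # the one-line change from an OpenAI tutorial

req = urllib.request.Request(
    f"https://{endpoint}/v1/chat/completions",  # assumed OpenAI-compatible path
    data=json.dumps({
        "model": "hermes",  # hypothetical model name
        "messages": [{"role": "user", "content": "Hello!"}],
    }).encode(),
    headers={"Authorization": "Bearer YOUR_KEY", "Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would send it; omitted here since it needs a real key.
```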
- Midjourney Models Write Creatively: Midjourney released a new research paper with NYU on training text-based large language models (LLMs) to write more creatively, moving past image generation.
- The company also revealed it is developing its own AI hardware, announced in late summer 2024.
- Sam Altman Teases Open-Weight Model: Sam Altman announced plans to release a new open-weight language model with reasoning capabilities, seeking developer feedback through events in SF, Europe, and APAC, according to this announcement.
- This marks OpenAI's first open-weight model release since GPT-2.
- DeepSeek Jiu Jitsu saves the Open Source Community: Members expressed gratitude to DeepSeek for their sophisticated maneuvers in enabling an Open Source community.
- The sentiment was linked to this YouTube video discussing OpenAI's shifting strategy related to the open-weight model.
- CamelAIOrg Launches Project Loong 🐉: CamelAIOrg introduces Project Loong 🐉, a structured, modular solution for generating and verifying synthetic data, and this blog post details the integration of synthetic data generation with semantic verification.
- The project features a multi-agent framework ensuring accuracy and consistency.
Yannick Kilcher Discord
- Graphs Experience Learning Renaissance: A Google Research blogpost highlights the evolution of graph learning since 2019, tracing the history of graph theory back to Leonhard Euler in 1736 and its applications in modeling relationships.
- Community members showed great interest in recent advancements of the area.
- AI/ML Reshapes Job Landscape: Recent AI/ML improvements primarily impact low-level jobs, such as minor programming tasks, yet human adaptation remains crucial; the tools reduce dependencies on others, as exemplified by AI/ML's role in initial legal assistance.
- This shift saves resources and enables multidisciplinary tasks, suggesting a significant restructuring of professional roles.
- RLHF Produces Nerfed Models: Concerns rise over RLHF leading to emergent misalignment if models are penalized for useful tasks such as ML R&D, potentially resulting in open-source models becoming increasingly evil as they compensate for suppressed behaviors.
- Discussion also touched on whether open-source models may become nerfed.
- Gemini 2.5 Pro Bombs Math Test: Testers found Gemini 2.5 Pro (experimental) to be totally trash in math, with issues in UI math display, while ChatGPT and Grok 3 demonstrated superior question comprehension in information theory and geometry.
- Results led the user to guide the language model to write correctly.
- AI Model Feedback Aired Out: With the launch of the OpenAI Open Model Feedback forum, there was renewed discussion of Ilya Sutskever's quote that "if there was one great failing it would be that you always had to check the results".
- The forum aims to improve models using community input.
MCP (Glama) Discord
- Pichai Promotes MCP?: Sundar Pichai's tweet asking 'To MCP or not to MCP, that's the question' has ignited significant interest in MCP, amassing over a million views.
- The moderator of `/r/mcp` even proposed hosting an AMA if Google adopts MCP.
- ActivePieces Abandons MCP!: ActivePieces, an open-source Zapier alternative, discontinued its support for MCP.
- There was no reason stated, but it might be related to the general MCP protocol still undergoing active development, along with the growing pains of many MCP-related side projects being deprecated.
- MCP RBAC approaches Explored: Users are exploring Role-Based Access Control (RBAC) implementations on MCP servers for segmented tool visibility, with one suggestion being integration with WorkOS.
- Another member mentioned Toolhouse API handles RBAC based on the API key.
- SDK Governance goes Open Source!: An open source SDK for enterprise governance (Identity, RBAC, Credentials, Auditing, Logging, Tracing) within the Model Context Protocol framework, is available at ithena-one/mcp-governance-sdk.
- Community feedback is welcomed.
- Asynchronous MCP Cometh: The extension MCPC mitigates MCP's synchronous limitations by adding asynchronous support.
- It maintains backwards compatibility, so existing setups remain functional, while the new features are available to both client and server setups.
Notebook LM Discord
- NotebookLM Chases Webby Wins: NotebookLM is nominated for three Webby Awards and is asking for community votes at this link.
- Voters should confirm their votes by clicking the verification link in their email, and check their spam folder.
- Google Tasks Tempts Integration: A user suggested that Google Tasks could integrate with NotebookLM by allowing users to pick a task list via a dropdown/popup.
- They proposed that this could work similarly to how Google Tasks allows selecting a task list for sharing.
- Archival Aspirations Arise: A user requested a way to archive notebooks in NotebookLM to hide them and reduce the number of notebooks counting against their limit.
- They suggested that hidden/archived notebooks should not appear in the list of notebooks available for sharing content.
- Gemini 2.5 Pro: Prompting Parity: A user requested that the NotebookLM AI be updated to Gemini 2.5 Pro, citing their love for the updated Gemini version.
- They hope that NotebookLM will perform even better with the new model, but the NotebookLM team has not commented on any ETAs.
- Notes, Not Sources Needed: A user with personal notes managed in Obsidian (2000+ short notes) finds the 300-note limit restrictive.
- They propose limiting the total number of words instead of the number of sources to better accommodate mesh note systems; a user suggests that folders or zipped files as a single source would also solve the problem.
Torchtune Discord
- Torchtune Scheduled for Next Friday: Members announced the next Torchtune office hours next Friday, linking to the Discord event.
- Members celebrated Discord's automatic timezone conversion feature.
- Hurry Review PR #2441: A member requested a final review for PR #2441 to expedite the merge process.
- Regression testing for PR #2477 is paused awaiting Qwen model upload to S3 for download during the regression test script, but the S3 bucket hookup is encountering internal infra snags.
- Llama2 Called Geriatric: A member suggested swapping the regression tests using the Llama2 model with something more current.
- It wasn't clear if the member's issues were related to regression test failures or simply the test suite using older components.
- Recursive Reshard Routine Removed: PR #2510 removes the `recursive_reshard` utility because it wasn't needed.
- This PR was initially intended to address #2483, but further examination revealed the utility was unnecessary.
tinygrad (George Hotz) Discord
- ImageDtype's Purpose Revealed: A member asked about the purpose of ImageDtype and the IMAGE environment variable in tinygrad, referencing its influence on Tensor.conv2d implementation with a link to a VAE training script.
- Another member thinks that it is related to accelerating comma.ai models on Qualcomm (QCOM) hardware, by utilizing mobile GPUs' texture performance and caching.
- tinygrad BEAM Leaves tf-metal in the Dust: A user reported performance gains on an M1 Pro, going from 3.2 it/s without BEAM to 28.36 it/s with BEAM=2, while Keras with tf-metal achieved about 25 it/s.
- George Hotz was pleased to see that it's "faster than tf-metal with BEAM!"
- Mobile GPUs Get Accelerated Via Textures and ImageDType: Discussion suggests ImageDType and associated functions optimize for mobile GPUs' texture performance, referencing a Microsoft research paper on mobile GPUs.
- A member questioned the hardcoding of layout specifics and suggested HWC (Height, Width, Channel) handling should be part of normal conv2d with user-defined padding.
- arange() Algorithm Optimized: A member identified suboptimal code generation for small arange ranges (e.g., `arange(1, 2, 0.1)`) compared to larger ranges (e.g., `arange(1, 10, 0.1)`) and documented their findings on `.arange()` here.
- They also noticed an unnecessary addition in the generated code, proposing a fix from `(((float)(ridx0+1))*0.1f)+0.9f` to `(((float)(ridx0))*0.1f)+1.0f`.
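The two forms are algebraically identical (0.1·(i+1) + 0.9 = 0.1·i + 1.0, folding the +1 into the constant), which a quick check confirms for the ten steps of `arange(1, 2, 0.1)`:

```python
# Verify the proposed codegen simplification: fold the +1 into the constant.
for ridx0 in range(10):                    # ten elements of arange(1, 2, 0.1)
    old = (float(ridx0 + 1) * 0.1) + 0.9   # original generated expression
    new = (float(ridx0) * 0.1) + 1.0       # proposed simplification
    assert abs(old - new) < 1e-12
print("equivalent")
```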
LlamaIndex Discord
- LLM Agents Open New Frontiers for Docs: An underrated use case for LLM agents is every field that depends heavily on complex technical documentation like manufacturing, construction, and energy, where an agent can do structured extraction from documents.
- These docs are often full of screenshots as mentioned in this tweet.
- OpenAI RateLimitError Hinders ReAct Agent Locally: A user encountered an OpenAI RateLimitError (Error 429) when using a ReAct agent with a local model set up via Ollama, questioning if ReAct agents are exclusively for OpenAI LLMs, with setup details in their GitHub repository.
- The suggestion was that the embedding model might be the cause of the OpenAI error, as it could be defaulting to OpenAI's embedding model if not explicitly set, even though the user confirmed that they are using a Hugging Face embedding model, set during document creation.
- VectorStoreIndex Setup Needs LLM and Embedding Model: It was advised to pass in both the `llm` and `embed_model` when creating the VectorStoreIndex.
- Also, make sure to specify `llm` when calling `index.as_query_engine()`.
Nomic.ai (GPT4All) Discord
- GPT4All Expands Globally with Translations: Official translations have been rolled out for the GPT4All documentation, now supporting Simplified Chinese, Traditional Chinese, Italian, Portuguese, Romanian, and Spanish.
- This broadens accessibility and usability of GPT4All for non-English speaking developers.
- Users Debate Llama3 8B Instructor Model Use Case: A user inquired whether the Llama3 8B Instruct model is optimal for generating blog posts and web pages from video and text-based course materials.
- Another user requested that they rephrase their question.
- Clarification on .bin vs .gguf File Formats: A user initially questioned the interchangeability of .bin and .gguf file formats.
- The user then retracted the question, noting they were mistaken about the incompatibility.
LLM Agents (Berkeley MOOC) Discord
- MOOC Quizzes Completion-Based: Members confirmed that the MOOC quizzes are completion based.
- Instructors hope students will attempt their best for their own learning.
- Llama 3 Cookbook Unveiled: The LLM Agents Cookbook mentioned in Week 5 coding agents refers to the Llama 3's cookbook found here.
- Meta released the Meta Llama 3 family of LLMs in 8B and 70B sizes, optimized for dialogue use cases and outperforming other open source chat models on industry benchmarks according to their blogpost.
- Loong Verifiers Validate Reasoning Models: As discussed in Project Loong, Large Reasoning Models like DeepSeek-R1 greatly improved general reasoning when base models undergo post-training with Reinforcement Learning (RL) with a verifiable reward.
- The ability to verify accuracy is crucial for improving domain-specific capabilities, particularly in mathematics and programming.
- High-Quality Datasets Enhance CoT Learning: The consensus is that abundant, high-quality datasets, featuring questions paired with verified correct answers, are a critical prerequisite for models to learn to construct coherent Chains-of-Thought (CoTs).
- The community believes that these datasets provide the necessary signals for models to reliably arrive at correct answers.
Cohere Discord
- Command A Screams Eternally: A user found that Command A gets stuck generating the same character endlessly when encountering a context where a character is screaming with repeated letters.
- This issue occurs even with default API Playground settings, freezing the interface and preventing feedback, reliably reproduced with prompts like "Please generate a scream in fiction inside quotation marks".
- Rem App wants you to journal Dreams: A user shared Rem, a dream journaling app created with a friend to easily record, analyze, and share dreams.
- The app aims to provide a platform for users to log their dreams and gain insights into their subconscious.
- New Cohere Members make Introductions: The community welcomes new members to the Cohere Discord server, encouraging them to introduce themselves and share what they're working on.
- New members are prompted to share their company, favorite tech tools, and what they hope to gain from this community.
- Members eager to participate and learn: New members are eager to participate, learn, and get feedback on their projects.
- They are excited to engage in discussions about their favorite technologies and tools within the community.
MLOps @Chipro Discord
- Decoding Legalese Seminar: The Silicon Valley Chinese Association Foundation (SVCAF) will host a seminar on April 2, 2025, discussing AI applications in legislation, featuring the Founder of Legalese Decoder.
- The seminar will explore how AI, ML, and NLP simplify legal documents for public understanding.
- SVCAF Launches AI4Legislation Competition: SVCAF is holding a competition this summer to develop open-source AI solutions for citizen engagement in the legislative process, with details available in the official Github repo.
- The competition aims to harness AI's power to make legislative processes more equitable and effective, aligning with SVCAF's mission to educate the Chinese community in public affairs.
- AI4Legislation Seminar Series to Commence: The AI4Legislation seminar series will recur during the first week of each month to provide project guidance and information about legislative AI tools, accessible here.
- Each seminar features a different guest sharing insights on utilizing AI to address key challenges in lawmaking, exploring the potential of AI-driven governance.
AI21 Labs (Jamba) Discord
- Multilingual User Misses Poll: A member noted their absence from a recent poll, mentioning they regularly communicate in both French and English.
- They also indicated occasional use of Greek and Hebrew.
- AI21 Labs Discussed: The discussion briefly touched on AI21 Labs and their new Jamba model.
- However, no specific details or opinions about the model were shared.
Codeium (Windsurf) Discord
- Windsurf Sounds Kickstarts Auditory UX: Windsurf AI debuted Windsurf Sounds, their initial project in sound design and Auditory UX, with the goal of boosting flow state and productivity.
- Check out the full video announcement on X.com for more details.
- Windsurf Next Beta Program Opens to Early Adopters: The Windsurf Next Beta program is ready for early testers to check out new features, with downloads available at Codeium.com.
- Minimum requirements include OS X Yosemite, glibc >= 2.28 for Linux, and Windows 10 (64-bit).
Gorilla LLM (Berkeley Function Calling) Discord
- v0 Dataset: Vanished or Merged?: A member inquired about the fate of the v0 `openfunctions` dataset within `io_uring.h` and whether it was completely merged into the v1 dataset.
- The discussion seeks to understand the architectural changes and data migration strategies, if any, between the v0 and v1 versions of the `openfunctions` dataset in `io_uring.h`.
- Architectural Changes in Datasets: The conversation explores the architectural changes between the v0 and v1 versions of the `openfunctions` dataset in `io_uring.h`.
- The members seek to understand the data migration strategies, if any.
The DSPy Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
The full channel by channel breakdowns have been truncated for email.
If you want the full breakdown, please visit the web version of this email: !
If you enjoyed AInews, please share with a friend! Thanks in advance!