[AINews] The new OpenAI Agents Platform
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
OpenAI may be all you need.
AI News for 3/11/2025-3/12/2025. We checked 7 subreddits, 433 Twitters and 28 Discords (224 channels, and 2851 messages) for you. Estimated reading time saved (at 200wpm): 258 minutes. You can now tag @smol_ai for AINews discussions!
In a livestream today, OpenAI dropped a sweeping set of changes to prepare for the Year of Agents:
- Responses API
- Web Search Tool
- Computer Use Tool
- File Search Tool
- A new open source Agents SDK with integrated Observability Tools
Atty Eletti told the full story of the design decisions, and sama called it "one of the most well-designed and useful APIs ever".
You can find more code samples and highlights on the exclusive Latent Space interview for today's launch:
The Table of Contents and Channel Summaries have been moved to the web version of this email.
AI Twitter Recap
1. AI Models and Performance: Model releases, benchmarks, performance comparisons of specific models
- Reka Flash 3, a new 21B parameter reasoning model from Reka AI, has been open-sourced @RekaAILabs, achieving competitive performance. @reach_vb highlighted that Reka Flash 3 is Apache 2.0 licensed and beats o1-mini, questioning why it's not trending. Reka AI further detailed that Reka Flash 3 powers Nexus, their new enterprise intelligence platform, and was fine-tuned on synthetic and public datasets, followed by RLOO with model-based and rule-based rewards. Weights are available on Hugging Face.
- OlympicCoder, a series of open reasoning models, outperforms Claude 3.7 Sonnet and models over 100x larger according to @_lewtun. The release includes CodeForces-CoTs dataset and IOI'2024 benchmark for competitive coding problems.
- DeepSeek has built a 32K GPU cluster capable of training V3-level models in under a week, according to @teortaxesTex. @SchmidhuberAI noted that DeepSeek is now discussing AI distillation, a concept he published in 1991, linking it to his earlier work. @cis_female reported getting 30 tokens/s running R1 on 3x abacus + two sticks with int0 quantization.
- Hugging Face Inference now supports Cerebras, as announced by @_akhaliq. Cerebras Inference is reported to run models like Llama 3.3 70B at over 2,000 tokens/s, 70x faster than leading GPUs.
- R1 is reportedly achieving 18t/s on a new M3 Ultra for around $9K, according to @reach_vb, suggesting increasing accessibility of high performance inference.
- Reka's Sonic-2 voice AI model is now available through Together API, delivering 40ms latency and high-fidelity voice synthesis, announced by @togethercompute.
- Qwen Chat has been enhanced with a unified multimodal interface, supporting text, images, and videos, and enhanced video understanding up to 500MB, a redesigned mobile experience with voice-to-text, guest mode, and expanded file upload capacity, according to @Alibaba_Qwen.
2. AI Agents and Developer Tools: Focus on tools for building and using AI agents, SDKs, APIs, and agentic workflows.
- OpenAI launched new tools for building AI agents, including a Responses API, Web search tool, File search tool, Computer use tool, and Agents SDK, as announced in a thread by @OpenAIDevs and summarized by @omarsar0 and @scaling01. The Responses API unifies Chat Completions and tool usage, enabling multi-turn agents in a single request. Built-in tools include Web Search (powered by GPT-4o, achieving 90% on SimpleQA), File Search (with metadata filtering), and Computer Use (automating browser and OS tasks, achieving SOTA benchmarks). The Agents SDK (open-source, improving upon Swarm) facilitates orchestration of single- and multi-agent workflows with guardrails and observability. @sama called the new API one of the "most well-designed and useful APIs ever". @swyx mentioned a Latent Space Podcast episode with OpenAI discussing these features.
- LangChain announced Agent Chat UI, an OSS web app for interacting with LangGraph apps via chat, and LangGraph-Reflection, a prebuilt graph for agents to self-critique and improve output, reported by @LangChainAI and @LangChainAI. They also highlighted how C.H. Robinson saves 600+ hours/day automating orders with LangGraph and LangSmith, according to @LangChainAI.
- Weaviate launched a Transformation Agent that allows users to not only query but also create and update data in the database, as announced by @bobvanluijt.
- Contextual AI released Contextual Reranker, an instruction-following SOTA reranker designed to improve precision in RAG pipelines and allow for granular control over ranking priorities, detailed by @apsdehal. @douwekiela introduced a similar instruction-following reranker, emphasizing its ability to prioritize based on user instructions.
- Perplexity AI launched a Windows App, providing access to voice dictation, keyboard shortcuts, and the latest models, as announced by @perplexity_ai.
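The single-request agent shape described in the Responses API launch above can be sketched as a plain request body. This is an illustrative sketch only: field names follow OpenAI's launch materials but should be treated as assumptions rather than authoritative documentation, and the vector store id is hypothetical.

```python
# Illustrative request shape for the new Responses API. Field names follow
# OpenAI's launch materials but are assumptions here, not authoritative docs;
# "vs_example_123" is a hypothetical vector store id.

def build_agent_request(question: str) -> dict:
    """Assemble one request that combines a model call with built-in tools."""
    return {
        "model": "gpt-4o",
        "input": question,
        "tools": [
            {"type": "web_search_preview"},   # built-in web search tool
            {"type": "file_search",           # built-in file search tool
             "vector_store_ids": ["vs_example_123"]},
        ],
    }

# In real usage a body like this would be sent via the SDK (roughly,
# client.responses.create(...)); here we just inspect the structure.
req = build_agent_request("Summarize today's Agents SDK launch.")
print([tool["type"] for tool in req["tools"]])
```

The point of the unified shape is that search, retrieval, and computer use ride along as `tools` entries in the same call, rather than requiring separate orchestration around Chat Completions.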
3. AI Applications and Industry Impact: Real-world applications, industry use cases, and company news.
- Figure AI is preparing to ship thousands of humanoid robots, powered by Helix neural networks, as shown by @adcock_brett. He argues that Figure will be the ultimate deployment vector for AGI and that in the future, every moving object will be an AI agent. They are hiring interns and full-time roles and @DrJimFan expressed excitement about their humanoid home.
- Manus, a Chinese high-performing AI agent, was mentioned in AI/ML news by @TheTuringPost. Anthropic's models are reportedly powering Manus, which is described as the latest AI sensation, according to a report by @steph_palazzolo.
- Zoom is leveraging AssemblyAI's Speech-to-Text models to advance their AI research and development for AI Companion, according to @AssemblyAI.
- Cartesia announced Series A funding, as reported by @_albertgu. @saranormous praised their talent density and creativity, noting increased GPU resources.
- Perplexity AI is expanding beyond the web, mentioned in AI/ML news by @TheTuringPost.
- Embra is introduced as a full AI OS, managing email, meetings, relationships, writing emails, scheduling, and automating research, according to @zachtratar.
4. China and AI Competition: Focus on China's AI advancements and competition with the US.
- @teortaxesTex believes China will graduate hundreds of people of caliber comparable to AI greats and that the quality of Chinese ML grads and projects is increasing exponentially, suggesting that the US's hiring pool is insufficient to compete. He also suggests China is secretly steered by technocratic Isekai regression novel nerds.
- @dylan522p highlights China's rise in robotics, covering hardware basics and historical robotics firms in a series.
- @teortaxesTex suggests China may surpass the US in space due to America's inability to build specialized roads while China focuses on scale, engineering, and logistics in space. He predicts another "hockey stick event" in PRC mass to orbit within 5 years, noting they are demonstrably faster. @teortaxesTex contrasts US's "Stargate" approach with China's building "1000 2K GPU sheds", questioning if China's tech market is more competitive than perceived "communist centralization".
- @teortaxesTex argues the West is disserving itself by focusing on "Communism" instead of the "Industrial Party of China", suggesting they are taking on the "White Man's Burden" the West gave up on.
- @teortaxesTex questions the myth of "overcapacity", arguing that in key domains like housing, energy, chips, raw materials, and cars, "More Stuff Better," potentially contrasting with Western economic views.
- @teortaxesTex comments on China commoditizing EVs and humanoids, contrasting Elon Musk's vision with China's market actions.
5. AI Research & Techniques: Core AI research concepts and techniques being discussed.
- New research on optimizing test-time compute via Meta Reinforcement Fine-Tuning (MRT) was highlighted by @rsalakhu and @iScienceLuvr. MRT is presented as a new fine-tuning method achieving 2-3x performance gain and 1.5x token efficiency for math reasoning compared to outcome-reward RL, outperforming outcome-reward RL and achieving SOTA results at the 1.5B parameter scale.
- Inductive Moment Matching (IMM), a new class of generative models from Luma AI for one- or few-step sampling, surpasses diffusion models on ImageNet-256x256 with 1.99 FID using 8 inference steps, as noted by @iScienceLuvr.
- Effective and Efficient Masked Image Generation Models (eMIGM), a unified framework integrating masked image modeling and masked diffusion models, outperforms VAR and achieves comparable performance to state-of-the-art continuous diffusion models with less NFE, according to @iScienceLuvr.
- Medical Hallucinations in Foundation Models are benchmarked in a new study, finding GPT-4o has the highest hallucination propensity in tasks requiring factual and temporal accuracy, but Chain-of-Thought (CoT) and Search Augmented Generation can reduce hallucination rates, as reported by @iScienceLuvr.
- @finbarrtimbers highlighted research using RLOO (REINFORCE Leave-One-Out) for training, noting the excitement around labs exploring algorithms beyond PPO.
- @iScienceLuvr mentions Diffusion language models that can arbitrarily reshuffle token positions as potentially the most powerful way to scale test time compute for bounded sequence lengths.
- @shaneguML describes Chain-of-thoughts as "dark knowledge" of LLMs, allowing for deeper understanding of models through prompting methods.
- @SchmidhuberAI discusses AI distillation, referencing his 1991 work and connecting it to DeepSeek's discussions.
- @jerryjliu0 raises concerns about versioning and regression testing in MCP (Model Context Protocol) agent systems, highlighting potential issues with dynamic behavior changes and API updates causing outages.
- @rasbt released a "Coding Attention Mechanisms" tutorial, explaining self-attention, parameterized self-attention, causal self-attention, and multi-head self-attention.
- @TimDarcet notes that Gaussian Mixture Models (GMM) fit MNIST quickly and well using Expectation-Maximization (EM), questioning if EM GMM might be sufficient.
6. Memes and Humor
- @aidan_mclau made a humorous observation about how many people, even at the Formula One level, misunderstand the function of brakes. This tweet garnered significant attention. @hkproj jokingly replied that brakes are "clearly used to let the driver stretch their foot".
- @nearcyan recommended @TrungTPhan as "honestly among the top posters on the entire site lately", praising his content and suggesting a strong follow.
- @scottastevenson announced "Vibecoding, but for legal docs. Coming soon."
AI Reddit Recap
/r/LocalLlama Recap
Theme 1. Gemma 3 Anticipation and Potential Impact
- New Gemma models on 12th of March (Score: 387, Comments: 70): Gemma 3 is set to be released on March 12, 2025, during the "Gemma Developer Day" event in Paris. The announcement features a sleek, modern design with a geometric star icon, highlighting the professional and high-tech nature of the event.
- Gemma 3 Expectations: The community is anticipating the release of Gemma 3 during the "Gemma Developer Day" event, with some users expressing skepticism about a confirmed release. Discussions highlighted the event's high-profile speaker panel and the expectation of significant announcements, although some caution against assuming a release given the event's closed-door nature.
- Technical Compatibility and Improvements: There's a strong interest in ensuring Gemma 3 works seamlessly with llama.cpp, with users recalling issues from Gemma 2's launch and hoping for better integration this time. Some users mention Google's internal use of a llama.cpp fork, suggesting potential for improved compatibility and contributions to the open-source community.
- Model Variants and Performance: Users are keen on seeing more mid-sized models like Gemma 27B, with suggestions for larger models like 32B, 40B, and 70B to enhance performance. There's also interest in smaller models like 9B and 12B for specific tasks, emphasizing the need for diverse model sizes to cater to different use cases.
Theme 2. M3 Ultra 512GB Review with Deepseek R1 671B Q4
- M3 Ultra 512GB does 18T/s with Deepseek R1 671B Q4 (DAVE2D REVIEW) (Score: 384, Comments: 215): The M3 Ultra 512GB achieves a performance of 18T/s when paired with Deepseek R1 671B Q4, as highlighted in the DAVE2D review.
- Discussions highlight issues with RAG systems and memory bandwidth, noting inefficiencies in the R1/MoE architecture and possible areas for optimization. Users discuss that smaller models are typically faster, but the 70B model is slower than expected, and there are potential scheduling/threading issues causing pipeline stalls.
- Commenters debate the cost and efficiency of the M3 Ultra versus other systems, comparing it to setups involving Nvidia 5090s and H200s, emphasizing the energy efficiency and availability of the M3 Ultra. Users mention that while the M3 Ultra has lower power consumption at under 200W, alternative systems might offer higher performance but at greater cost and power usage.
- There are detailed technical discussions about quantization methods like Q4_K_M and memory interleaving, with references to GGML_TYPE_Q6_K and super-blocks for quantization. Users also discuss the memory bandwidth and its implications on performance, particularly when running inference on systems with large RAM capacities.
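The super-block quantization formats discussed above can be illustrated with a much-simplified toy: one scale per block plus 4-bit integer codes. This is a sketch of the general blockwise idea only, not the actual GGML Q4_K_M layout (which packs scales and mins hierarchically into super-blocks).

```python
# Toy blockwise 4-bit quantization: one float scale per block plus signed
# 4-bit codes. Illustrative of the general idea behind llama.cpp's k-quants,
# not the real Q4_K_M bit layout.

def quantize_block(block):
    """Map floats to signed 4-bit codes in [-8, 7] with one scale per block."""
    scale = max(abs(v) for v in block) / 7 or 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in block]
    return scale, codes

def dequantize_block(scale, codes):
    return [c * scale for c in codes]

weights = [0.12, -0.53, 0.98, -0.07, 0.44, -0.91, 0.30, 0.66]
scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(round(max_err, 3))  # rounding error is bounded by about scale / 2
```

Storing one 4-bit code per weight plus a shared scale is what shrinks a model to roughly a quarter of its fp16 size, at the cost of the bounded rounding error shown here; the K-quant super-block schemes spend a few extra bits on finer-grained scales to tighten that bound.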
Theme 3. NVLINK's Impact on RTX 3090 Performance
- NVLINK improves dual RTX 3090 inference performance by nearly 50% (Score: 144, Comments: 41): NVLINK reportedly boosts the inference performance of dual RTX 3090 GPUs by nearly 50%. This suggests a significant improvement in computational efficiency for tasks leveraging these GPUs in tandem.
- Motherboard and GPU Configuration: Users discussed the motherboard's PCIe lane configuration, noting that using x8 risers might limit performance. hp1337 explained their setup with 6 GPUs using x8 lanes, suggesting future tests with x16 lanes for potential performance insights.
- NVLink Availability and Alternatives: FullOf_Bad_Ideas questioned the availability and cost of NVLink bridges for RTX 3090s, with a_beautiful_rhind suggesting an alternative using open-gpu-kernel-modules. However, Pedalnomica noted this only enables P2P, not matching NVLink's performance.
- Quantization and FP8 Calculations: JockY and others discussed the use of FP8 quantization on RTX 3090s, highlighting that vLLM uses the FP8 Marlin kernel for performance without native FP8 hardware, as confirmed by Competitive_Buy6402 and bihungba1101 with references to vLLM's GitHub.
Theme 4. Alibaba's R1-Omni for Emotion Recognition
- Alibaba just dropped R1-Omni! (Score: 244, Comments: 76): Alibaba has launched R1-Omni, which focuses on enhancing emotional intelligence through Omni-Multimodal Emotion Recognition and Reinforcement Learning.
- Ethical Concerns: Several commenters express concerns about the ethical implications of emotion detection technology, highlighting issues such as invasiveness and potential discrimination against neurodivergent individuals. There are worries that automating such subjective tasks could lead to misuse and harm, particularly if used without consent or transparency.
- AI in Therapy: The discussion around AI therapists is polarized, with some seeing potential benefits like accessibility and consistency, while others warn of risks such as reinforcing anxieties or lacking human oversight. The debate touches on the balance between cost, effectiveness, and the potential for misuse by corporations.
- Technical and Community Aspects: There is a mention of the R1-Omni model being available on GitHub, with questions about its relation to Alibaba and internal competition. Users also critique the naming conventions of models and request demonstrations of the technology.
Theme 5. Reka Flash 3: New Open-Source 21B Model
- Reka Flash 3, New Open Source 21B Model (Score: 220, Comments: 50): Reka Flash 3 is a new open-source model featuring 21 billion parameters. It is available on HuggingFace and more details can be found in the Reka AI blog.
- The Reka Flash 3 model, despite its smaller size of 21 billion parameters, is being compared to larger models like QwQ-32B and has shown promising performance benchmarks. Some users noted its potential for use in scenarios where speed is prioritized over size, while others expressed skepticism about its coding capabilities, particularly when compared to models like Mistral Nemo.
- Discussions highlighted the model's Apache license, which allows for broad usage, and its suitability for 24GB cards due to its size. There is interest in its potential multimodal capabilities, though it is currently not confirmed.
- There is a strong interest in the model's reasoning capabilities, with users impressed by its ability to solve complex problems like the "tiger riddle." This demonstrates the model's potential in handling intricate reasoning tasks, which were previously thought to require much larger models.
Other AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding
Theme 1. Claude 3.7: Enhancing Developer Skills through Debugging
- Claude 3.7 made me a better developer. (Score: 234, Comments: 64): The author criticizes Claude 3.7 for producing overly complex and inefficient code, describing it as "absolute garbage" and "over-engineered nonsense." Despite the frustration, the author acknowledges that the process of fixing such code has improved their development skills, suggesting that resolving AI-generated code issues can be an effective learning experience.
- Commenters emphasize the importance of proper Git practices, such as creating branches for new features and committing frequently to facilitate easy reversion of AI-generated code. They suggest using rebase to merge commits into a single one before merging back to the main branch, highlighting that frequent commits are professional and beneficial.
- Some users discuss their experiences with Claude 3.7 and 3.5, noting that 3.7 often produces overly complex code, while 3.5 was simpler and more reliable. However, there are mixed opinions on 3.5's current performance, suggesting it may have degraded over time.
- A few commenters share strategies for working with AI-generated code, including using test-driven development to guide code quality and having the AI explain concepts rather than directly generating code. They caution against relying on AI for high-level architectural decisions, as it often results in reimplementation of existing functionalities and excessive complexity.
- Dario Amodei: AI Will Write Nearly All Code in 12 Months!! Are Developers Ready? (Score: 181, Comments: 183): Dario Amodei predicts that AI will write nearly all code within 12 months, posing a significant shift for developers. The video content is not analyzed, but the title suggests a discussion on the readiness of developers for this rapid advancement in AI-driven coding.
- Many users express skepticism about Dario Amodei's prediction, comparing it to past over-optimistic claims like Elon Musk's robotaxi timeline and Hinton's radiologist replacement forecast. They argue that AI-generated code still requires significant human oversight due to issues like hallucinations and logical errors, which are not easily resolved by current AI models.
- Several commenters argue that while AI can assist in coding, it cannot yet replace developers due to its inability to autonomously manage complex tasks, ensuring code quality, and understanding design and architecture. They highlight that AI tools can generate code but still need human verification and guidance, making them more akin to advanced compilers rather than independent coders.
- There is a consensus that the current hype around AI's capabilities is largely driven by marketing and fundraising efforts. Commenters emphasize that genuine breakthroughs in AI coding would likely leak due to their market impact, and that claims of rapid advancements often serve more to attract investor interest than reflect immediate technological reality.
- This is why I use ChatGPT instead of Grok (Score: 191, Comments: 14): The post criticizes Claude-generated coding output and expresses a preference for ChatGPT over Grok. An accompanying image humorously contrasts using Reddit on a PC as a more intellectual activity compared to "doomscrolling" on a phone, suggesting that the former is akin to "curating knowledge" or engaging in “Reddit Discourse Analysis.”
- ChatGPT vs. Grok: Grok is criticized for being less versatile and overly complicated compared to ChatGPT, which, despite being labeled as a "liar," is preferred for tasks like grammar correction. Users express frustration with ChatGPT's tendency to delete content it deems unnecessary without acknowledgment.
- Doomscrolling on Devices: The discussion highlights that doomscrolling is similar across devices, with the difference being that a PC setup might appear more controlled but still involves the same mental strain. The distinction is more about the optics and control rather than the device itself.
- User Experience with AI Models: There is interest in comparing responses between different AI models like Grok 3 and GPT-4.5, but limitations such as the 50 message per week cap on GPT Plus hinder such explorations.
Theme 2. Nvidia's Gen3C: Advancements in Image to 3D Conversion
- Gen3C - Nvidia's new AI model that turned an image into 3D (Score: 259, Comments: 25): Nvidia's Gen3C is a new AI model that converts 2D images into 3D representations, showcasing advancements in image processing technology.
- Memory Concerns: Users express concerns about Gen3C potentially being memory-intensive, questioning its feasibility on consumer-grade GPUs. TheSixthFloor suggests that it might require at least 16GB VRAM similar to other advanced AI models.
- Technical Clarifications: Silonom3724 clarifies that Gen3C uses Image to point cloud to NeRF rather than direct 3D polygon representation, while grae_n notes the inclusion of reflective materials suggesting a gaussian/NeRF approach.
- Availability and Access: The Gen3C code is anticipated soon, with links provided to the GitHub repository and Nvidia's research page. Users are eager for updates on its release and local run capabilities.
Theme 3. Dario Amodei: AI Code Generation Predictions and Skepticism
- Dario Amodei: AI Will Write Nearly All Code in 12 Months!! (Score: 139, Comments: 130): Dario Amodei predicts that AI will write nearly all code within 12 months, sparking skepticism among the engineering community. The absence of detailed arguments in the post limits further analysis of the prediction's feasibility.
- Critics argue that AI lacks the capability to write all code within 12 months due to limitations in context window size, which affects its ability to maintain awareness across large codebases. AI struggles to handle complex systems like the Linux kernel or critical system control code effectively, as evidenced by the failure to convert the kernel to Rust.
- Skepticism arises over the practicality of AI writing code without human oversight, with engineers emphasizing the necessity of human review and clear specifications, which AI currently cannot independently manage. Middle management is criticized for lacking the technical expertise to guide AI in this task.
- Some commentators view Dario Amodei's prediction as a strategic move to attract funding, rather than a realistic forecast. The current limitations of tools like Copilot highlight the challenges AI faces in efficiently handling large projects.
AI Discord Recap
A summary of Summaries of Summaries by Gemini 2.0 Flash Exp
Theme 1: Open AI's Agent Development Ecosystem Evolves
- OpenAI Unveils Responses API and SDK for Agent Creation: OpenAI launched a new Responses API and Agents SDK to simplify agent development, emphasizing improved integration, streamlined workflows, and production readiness. The new SDK offers features such as tracing, guardrails, and lifecycle events, but also signals a sunset for the Assistants API by mid-2026.
- Community Debates Merits of New Agent Tools: The community is actively debating the value and functionality of the new tools, with some questioning the trustworthiness and consistency of code generated by GPT-4.5 and seeking clarity on differences between the Responses API and existing chat completions API. While the new Web Search Tool aims to improve search result reliability, users have observed that it lacks source selection capabilities similar to other platforms.
- Agents SDK's Observability Tools Trigger Tracing Questions: OpenAI's claim about Braintrust data tracing integration in its new Agents SDK is creating buzz, with users wondering whether OpenAI supports integrations with Langfuse or other agent observability tools. It does; details on using the SDK's tracing or sending traces to your own backend can be found in this GitHub repo.
Theme 2: Navigating the Frontier of AI Model Capabilities and Limitations
- Reka Flash 3 Enters the Ring: Reka Labs released Reka Flash 3, a 21B reasoning model trained from scratch, showcasing competitive performance and multimodal capabilities, challenging existing models like QwQ-32B and o1-mini. Despite its open-source nature, questions linger about its architecture and training data, shifting its purpose from on-device use to powering Reka's AI worker platform Nexus.
- Anthropic's Claude 3.7 Faces Output Constraints on Perplexity: Users discovered that Claude 3.7 has an output limit of 5000 tokens on Perplexity, contrasting with Anthropic's official documentation stating it can output up to 128K. This discrepancy raises questions about the model's practical utility and highlights the importance of understanding platform-specific limitations, particularly in commercial applications.
- GPT-4.5 Code: Inconsistently Hilarious: Users report that GPT-4.5 generates inconsistent code, such as calling a non-existent `startApp()` function after defining a `start()` function. Concerns are raised about the need for constant oversight of GPT-4.5's output and the trustworthiness of AI-generated code in general, describing a need to babysit this 'intelligence'.
Theme 3: Community-Driven Tools and Techniques for AI Development
- AI Code Fusion Tool Debuts for Optimizing LLM Contexts: A community member introduced AI Code Fusion, a tool designed to optimize code for LLM contexts by packing files, counting tokens, and filtering content, showcasing the community's proactive approach to addressing challenges in AI development. The tool's creator is actively seeking feedback from the community to refine its functionality.
- Aider's Watch-Files Live Mode Enables Interactive Coding: Aider's new `--watch-files` flag enables live mode, allowing developers to code interactively with AI by adding comments like `AI!` (triggering Aider to make changes) and `AI?` (triggering it to answer questions), signaling a shift toward more collaborative and interactive coding workflows.
- Leveraging Browserless.io to Bypass Bot Detection: Nous Research AI members recommended using Browserless.io for bypassing bot detection and CAPTCHAs in web scraping, highlighting its ability to avoid leaving subtle fingerprints and bypass many web protections. It supports browser automation using Puppeteer or Playwright and features a scraping IDE for testing and debugging.
Theme 4: Hardware and Infrastructure Considerations for AI Workloads
- Local LLM vs Cloud GPU: The Great Debate Rages On: Users debated the cost-effectiveness of running LLMs locally on high-end hardware, such as an M3 Ultra Mac Studio with 512GB of RAM, versus utilizing cloud-based GPUs, balancing performance with long-term affordability. AMD users reported that Vulkan and ROCm performance was broken in drivers 24.12.1, with performance dropping by 35%, though ROCm was fixed in v1.1.13+.
- Speculative Decoding Stalls on Some Setups: Speculative decoding may perform worse than standard inference when limited by RAM bandwidth, or when comparing 0.5b to 14b models. With the advent of 100Gbit NVMe drives, 400Gbit networks, and CXL memories, swap is becoming useful again, as highlighted in Dave2D's M3 Ultra Mac Studio Review.
- SemiAnalysis Hosts Nvidia Blackwell GPU Hackathon: SemiAnalysis is hosting an Nvidia Blackwell GPU Hackathon on March 16th, featuring hands-on exploration of Blackwell & PTX infrastructure and speakers from OpenAI, TogetherAI, and Thinking Machines. The hackathon was mentioned across multiple Discords, highlighting its industry significance and attracting developers with the promise of early access to cutting-edge GPU technology.
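The speculative-decoding tradeoff mentioned above comes from a draft-propose / target-verify loop: a cheap draft model proposes tokens and the expensive target model accepts or resamples them. Below is a toy simulation of the standard accept/reject rule over made-up token distributions; real systems (e.g. a 0.5B draft against a 14B target) operate on model logits, and the distributions here are illustrative assumptions.

```python
import random
random.seed(1)

# Toy speculative decoding over a 3-token vocabulary. "draft" is the cheap
# proposal distribution q, "target" is the expensive ground-truth p.
VOCAB = ["the", "cat", "sat"]
draft  = {"the": 0.5, "cat": 0.3, "sat": 0.2}   # proposal distribution q
target = {"the": 0.6, "cat": 0.2, "sat": 0.2}   # target distribution p

def speculative_step():
    """Propose a token from q; accept it with probability min(1, p/q)."""
    tok = random.choices(VOCAB, weights=[draft[t] for t in VOCAB])[0]
    if random.random() < min(1.0, target[tok] / draft[tok]):
        return tok, True                          # draft token accepted
    # On rejection, resample from the residual distribution max(0, p - q)
    residual = {t: max(0.0, target[t] - draft[t]) for t in VOCAB}
    z = sum(residual.values())
    tok = random.choices(VOCAB, weights=[residual[t] / z for t in VOCAB])[0]
    return tok, False

accepted = sum(1 for _ in range(1000) if speculative_step()[1])
print(f"draft acceptance rate ~ {accepted / 1000:.0%}")
```

The scheme only pays off when accepted draft tokens are much cheaper than target forward passes; if memory bandwidth already bottlenecks both models, the extra verification work can make it slower than plain decoding, which is the failure mode the bullet describes.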
Theme 5: Ethical Concerns and Usage Policies in AI Development
- Discussions around OpenAI Terms of Service and Jailbreaking: In light of OpenAI's Terms of Service, members had cautionary discussions; server rules also prohibit discussing how to bypass these restrictions, while suggesting a focus on allowed content within ethical boundaries. Exploration through text, such as fantasy writing involving violence, image generation, or roleplaying games, is not forbidden by those general policies.
- Prompting Techniques and Creative Use-Cases Discussed: Members on OpenAI are using prompting techniques to elicit more candid responses from models without violating safety policies. Proposed prompts included having the model teach programming the way a user's grandma used to.
- Users Want Grok Vibes for ChatGPT: Discussions focused on the desired vibes in relation to content filtering; users shared memes such as this one and expressed the desire for a ChatGPT that does not filter or restrict content in the same way. Deep Research price comparisons were also made, citing OpenAI's deep research as the best choice while acknowledging that "the limits SUCK right now lol".
PART 1: High level Discord summaries
Cursor IDE Discord
- Cursor Nightly Bites the Dust: The latest nightly update for Cursor introduced critical bugs, breaking the AI Chat and Cursor settings, rendering the GUI unusable.
- Users reported that reinstalling the app did not resolve the issues, indicating a problem with the latest nightly update itself.
- Claude 3.7 Pricing Sparks Outrage: Users are upset with the new pricing for Claude 3.7 Thinking, which now costs 2 requests instead of 1, prompting some to consider alternatives.
- Discussions highlighted that using Claude 3.7 Thinking with large context could potentially cost up to 16 cents per request.
- Manus AI: Revolutionary Agent or Overhyped Tool?: A user shared Manus AI, calling it the craziest AI agent, showcasing its ability to perform tasks such as cloning the Apple website (Tweet from el.cine).
- Skeptics suggested it might be Sonnet 3.7 with some tools to use your PC, while others speculated a future with AI agents running their companies.
- Cursor's Stability Faces Scrutiny: Multiple users reported Cursor is barely working, often stuck or unresponsive, with Claude Max not functioning for some.
- Some users found that rolling back to version 0.46.11 fixed the problems, leading to speculation that version 0.47 might be restricted to a limited user base.
- Local LLM vs Cloud GPU: The Great Debate: A user suggested buying an M3 Ultra Mac Studio with 512GB of RAM to run full DeepSeek R1 locally, triggering discussions on the cost-effectiveness of the setup.
- While some favored local LLMs, others argued that cloud-based GPUs offer faster inference and are more economical in the long run.
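The cost debate above can be made concrete with back-of-the-envelope arithmetic. All figures below are illustrative assumptions (hardware price, cloud rate, and workload), not quotes, so treat the break-even point as a sketch of the reasoning rather than a recommendation.

```python
# Back-of-the-envelope break-even: buy an M3 Ultra Mac Studio vs rent cloud
# GPUs. Every number here is an illustrative assumption, not a real quote.

LOCAL_COST = 9_000          # assumed one-time hardware cost (USD)
CLOUD_RATE = 2.50           # assumed effective cloud GPU rate (USD/hour)
HOURS_PER_DAY = 4           # assumed daily inference workload (hours)

break_even_hours = LOCAL_COST / CLOUD_RATE
break_even_days = break_even_hours / HOURS_PER_DAY
print(f"{break_even_hours:.0f} GPU-hours, about {break_even_days:.0f} days "
      f"at {HOURS_PER_DAY} h/day")
```

Under these assumptions the hardware pays for itself only after thousands of GPU-hours, which is why light or bursty workloads tend to favor cloud inference while sustained heavy use can justify local hardware.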
Perplexity AI Discord
- Perplexity Releases Desktop App: Perplexity AI released a native desktop app for PC (perplexity.ai/platforms), enabling voice dictation, keyboard shortcuts, and access to the latest models.
- However, users note that the app is essentially a wrapper for the web version, lacking desktop advantages and browser extensions like Complexity; "it's just a nerfed web browser".
- Revolut Promo Codes Give Users a Headache: Revolut users are experiencing issues redeeming Perplexity Pro promo codes, with some being told they need to create a new account or contact Revolut.
- As one user mentioned, "I contacted Revolut and they said I need to register new account with Perplexity. Its a bummer, but hey, still worth it I guess."
- Claude 3.7 Capped at 5K Tokens: Users discovered that Claude 3.7 has a hard output limit of 5000 tokens on Perplexity.
- This contrasts with Anthropic's official documentation stating it can output up to 128K.
- Universities Explore Perplexity Enterprise: A user is evaluating integrating Perplexity Enterprise into a university system, emphasizing its ability to connect to internal knowledge bases for policies and procedures, see Perplexity Enterprise FAQ.
- The platform offers features for internal data search and customized workspaces.
- API chat completions experience truncation: A member reported intermittent content truncation when calling the chat.completions API with the sonar-reasoning-pro model; see Perplexity AI Playground.
- Increasing the `max_tokens` allowance did not resolve the issue; the member noted that the Perplexity AI Playground consistently outputs full responses, suggesting the issue is specific to the API.
Unsloth AI (Daniel Han) Discord
- Reka Flash 3 Sparks Interest: Reka Flash 3, a 21B reasoning model under the Apache 2.0 license, comparable to QwQ-32B and o1-mini, has been released.
- The Reka team consists of ex-DeepMind employees, and Reka's website states that the flash model is multimodal.
- Multi-GPU Training Recommendations Given: When asked about finetuning a large model across multiple nodes and multiple GPUs using Unsloth, a member recommended using axolotl or llama factory.
- Currently, Unsloth does not (officially) support multi-GPU, although support may be arriving in the coming weeks.
- AI Code Fusion Tool Makes Debut: A member introduced AI Code Fusion, a tool designed to optimize code for LLM contexts by packing files, counting tokens, and filtering content, available on GitHub.
- The creator of AI Code Fusion is seeking feedback from the community on this tool.
- Regex Beats LLMs in Date Extraction: A user who aimed to train a model to extract the right opening times from queries was advised that a regex system might be more suitable than using AI for this task.
- A member linked to a relevant xkcd comic about over-engineering simple tasks with complex solutions.
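A minimal sketch of the regex approach for this kind of task: the time formats and pattern below are assumptions for illustration, not taken from the original discussion.

```python
import re

# Hypothetical sketch: extracting opening times like "9am - 5pm" or "10:00 to 14:00"
# with a regex instead of an LLM. The supported formats are an assumption.
TIME = r"(\d{1,2}(?::\d{2})?\s*(?:am|pm)?)"
OPENING_HOURS = re.compile(TIME + r"\s*(?:-|to)\s*" + TIME, re.IGNORECASE)

def extract_opening_times(text: str) -> list[tuple[str, str]]:
    """Return (open, close) pairs found in free text."""
    return [(m.group(1).strip(), m.group(2).strip())
            for m in OPENING_HOURS.finditer(text)]

print(extract_opening_times("We are open 9am - 5pm on weekdays and 10:00 to 14:00 on Saturday."))
```

For a fixed, well-specified format like this, the regex is deterministic, free, and trivially testable, which is the point being made against reaching for a model.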
- GRPO Batch Size Affects Training: The GRPO batch size must equal the number of generations, and `num_generations` for the GRPO RL algorithm must be tuned well.
- It was recommended that `num_generations` range from 4 to 8, and that increasing the batch-size multiple reduces training time but drastically increases GPU memory requirements.
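The constraint above can be sketched as a small check: each prompt is expanded into `num_generations` completions, so the batch size must be a multiple of that count. This is a toy illustration of the arithmetic, not Unsloth or TRL code; the function name is invented.

```python
# Toy illustration (not Unsloth/TRL code): in GRPO, each prompt is expanded into
# `num_generations` completions, so the batch must be divisible by that count.
def grpo_effective_batch(batch_size: int, num_generations: int) -> int:
    """Return the number of distinct prompts per step, or raise if sizes don't line up."""
    if num_generations < 2:
        raise ValueError("GRPO needs >= 2 generations per prompt to compute relative advantages")
    if batch_size % num_generations != 0:
        raise ValueError(
            f"batch_size {batch_size} must be a multiple of num_generations {num_generations}"
        )
    return batch_size // num_generations

print(grpo_effective_batch(16, 8))  # 2 prompts, 8 completions each
```

Doubling the batch-size multiple doubles the completions held in memory at once, which is the GPU-memory trade-off mentioned above.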
Nous Research AI Discord
- Deep Hermes Shows Early Reasoning: The new Deep Hermes model was released with early reasoning capabilities, distilled from R1, as shown on Hugging Face.
- Members are excited to test it, but expressed concern about exceeding context length.
- Scrape Without Detection via Browserless.io: A member recommended Browserless.io for bypassing bot detection and CAPTCHAs in web scraping, highlighting its ability to avoid leaving subtle fingerprints.
- It supports browser automation using Puppeteer or Playwright and features a scraping IDE for testing and debugging.
- SemiAnalysis Hosts Blackwell GPU Hackathon: SemiAnalysis is hosting a Nvidia Blackwell GPU Hackathon on March 16th, featuring hands-on exploration of Blackwell & PTX infrastructure and speakers from OpenAI, TogetherAI, and Thinking Machines.
- The event is sponsored by companies like Together, Lambda, Google Cloud, Nvidia, and OpenAI.
- Optimize UTs with Forward Gradients: Members discussed using forward gradients for optimizing Universal Transformer (UT) training, as they may be more efficient due to the shared layers in UTs.
- This approach may be interesting when combined with N-GPT.
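The forward-gradient idea referenced above can be sketched on a toy problem: perturb along a random direction v, compute the directional derivative in a single forward pass, and use (∇f·v)·v as an unbiased gradient estimate. This is a simplification under stated assumptions: the loss is a quadratic and the directional derivative is computed analytically, whereas real UT training would use forward-mode autodiff (a JVP).

```python
import random

# Toy sketch of forward-gradient descent on f(w) = sum(w_i^2).
# Real training would obtain the directional derivative via a JVP.
def loss(w):
    return sum(x * x for x in w)

def grad_dot(w, v):
    # Exact directional derivative of f at w along v (grad is 2w for this toy loss).
    return sum(2 * x * vi for x, vi in zip(w, v))

def forward_gradient_step(w, lr=0.02, rng=random.Random(0)):
    v = [rng.gauss(0, 1) for _ in w]                 # random tangent direction
    d = grad_dot(w, v)                               # one scalar per forward pass
    return [x - lr * d * vi for x, vi in zip(w, v)]  # unbiased estimate: d * v

w = [1.0, -2.0, 0.5]
for _ in range(200):
    w = forward_gradient_step(w)
print(loss(w))
```

The appeal for UTs is that the shared layers make the forward passes cheap relative to full backpropagation.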
- ByteDance Launches Trae IDE: ByteDance has released Trae, a free, AI-based IDE similar to Cursor, featuring Claude Sonnet 3.7 for free use, and is available for Mac and Windows.
- A Linux version is planned, and the IDE targets beginners in AI coding.
Eleuther Discord
- Loglikelihood Evaluation Liberates LLMs: Members recommended using loglikelihood-based evaluation for Multiple Choice Question Answering (MCQA) tasks, bypassing the need for strict output formatting.
- This explains why instruct models get some answers correct, but their chat alternatives usually score 0.
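The scoring mechanics can be illustrated with a toy: each answer option is scored by the summed log-probability of its tokens under the model, and the highest-scoring option wins, so no output formatting is ever parsed. The per-token log-probs below are invented numbers; a real harness would query the model for them.

```python
# Toy sketch of loglikelihood-based MCQA scoring. The token log-probs here are
# invented; a real harness asks the model to score each answer continuation.
def score_option(token_logprobs: list[float], normalize_by_length: bool = False) -> float:
    total = sum(token_logprobs)
    return total / len(token_logprobs) if normalize_by_length else total

def pick_answer(options: dict[str, list[float]]) -> str:
    # No formatting required: the model never has to emit "Answer: B".
    return max(options, key=lambda k: score_option(options[k]))

options = {
    "A": [-2.1, -0.7, -1.3],
    "B": [-0.4, -0.2, -0.9],  # highest total log-likelihood
    "C": [-3.0, -1.1, -0.6],
}
print(pick_answer(options))  # "B"
```

Length normalization (shown as an option) is a common variant when answer options differ in token count.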
- Diffusion Models Perform Spectral Autoregression: A blog post (Spectral Autoregression) reveals that diffusion models of images perform approximate autoregression in the frequency domain.
- The author notes this theory feels intuitive but has little predictive power in practice, especially when using colored noise matching the RAPSD of the target distribution.
- Neural Flow Diffusion Models Enhance Gaussian Noise: Neural Flow Diffusion Models (NFDM) enhance diffusion models by supporting a broader range of forward processes beyond the standard Gaussian noise, with an end-to-end, simulation-free optimization objective.
- Experiments demonstrate NFDM's strong performance and state-of-the-art likelihood estimation, according to the paper.
- Guidance From Badness Averts Mode Dropping: A paper suggests guiding away from badness rather than unconditional-ness to avoid the mode dropping of CFG (classifier-free guidance).
- The approach leads to disentangled control over image quality without compromising the amount of variation, yielding record FIDs of 1.01 for 64x64 and 1.25 for 512x512 on ImageNet.
- Tokenizer Troubles Threaten Patching Evaluation: A member seeks advice on choosing the right metrics to evaluate patching results when analyzing important circuits for Math CoT answers.
- The core issue is that the tokenizer splits numbers like 10 and 15 into two tokens each, disrupting the straightforward application of the evaluation equation.
Latent Space Discord
- Avoma Competes with Gong: Avoma, an all-in-one AI platform for automating note-taking, scheduling, coaching, and forecasting, was suggested as a competitor to Gong.
- The suggestion was made in #ai-general-chat.
- Factorio Learning Environment Tests LLMs: The Factorio Learning Environment (FLE), available on GitHub, is designed to test agents in long-term planning, program synthesis, and resource optimization using the game Factorio.
- A member expressed excitement and humorously requested a job at the Anthropic Factorio lab immediately, highlighting that the environment is currently text-only but could benefit from multimodal data input like Qwen 2.5 VLM.
- Contextual AI Unveils Instruction-Following Reranker: Contextual AI introduced a new reranker that follows custom instructions to rank retrievals based on requirements like recency, document type, or source.
- The announcement was made in #ai-general-chat.
- OpenAI Launches Agent Tools: OpenAI launched new tools for building agents, including a Responses API, Web Search Tool, Computer Use Tool, and File Search Tool.
- They also released a new open source Agents SDK with integrated Observability Tools with tracing, guardrails and lifecycle events, advertising that the SDK is production ready.
- Luma Labs Introduces Inductive Moment Matching: Luma Labs released Inductive Moment Matching (IMM), a new pre-training technique that claims to deliver superior sample quality with 10x more efficiency compared to diffusion models.
- The discussion was centralized in #ai-general-chat.
OpenRouter (Alex Atallah) Discord
- OpenRouter launches FAQ Page: OpenRouter launched a FAQ page to address common questions and provide more clarity for users.
- Alongside the new FAQ, a small quality of life update was released to improve user experience.
- Gemini 2.0 Image Generation Leaks: Gemini 2.0 Flash Experimental image generation is out, capped at 32k context, but lacks code execution, search grounding, and function calling, with users finding the image save code under `gemini-2.0-flash-exp`.
- This was found in this Reddit post.
- OpenAI Teases Dev-Facing Reveal: Members speculated about an OpenAI reveal based on this post mentioning the Responses API.
- The reveal was expected at 10 AM PT.
- Cohere's AYA Vision Questioned: Members inquired about OpenRouter's support for AYA vision by Cohere and other Cohere models, with pricing for AYA Expanse models (8B and 32B) potentially at $0.50/1M Tokens for Input and $1.50/1M Tokens for Output.
- Users are still trying to confirm these rates, as seen in this screenshot.
- Parameter Calculation Gets Axed: OpenRouter removed the parameter calculation due to inaccuracy, deeming it potentially misleading.
- The team plans to revamp it with manual curation later, acknowledging the difficulty in tuning parameters, humorously calling them ancient runes.
OpenAI Discord
- Agent Tools Unveiled at OpenAI Dev Livestream: OpenAI debuted Agent Tools for Developers during a livestream, followed by an AMA (Ask Me Anything) session offering direct interaction with the development team, with more information and questions on the OpenAIDevs X post.
- The AMA was scheduled from 10:30–11:30 AM PT, allowing developers to directly engage with the team behind the new features.
- Users Crave Grok's Vibe in ChatGPT: Members expressed a desire to bring Grok's unique characteristics to ChatGPT, as seen in an Elon Musk GIF referencing Elon Musk.
- Discussion revolved around the nature of the desired vibes, specifically in relation to filtering content.
- GPT-4.5 Code Generates Inconsistencies: Users reported that GPT-4.5 generated inconsistent code, such as calling non-existent functions or misnaming existing ones, leading to questions about its reliability compared to GPT-4o.
- Concerns were raised about the need for constant oversight of GPT-4.5's output and the trustworthiness of AI-generated code in general, describing a need to babysit this 'intelligence'.
- New Responses API Mirrors Assistant API?: A member inquired about the differences between the new responses API and the existing chat completions API, sparking a discussion on the API's functionalities.
- Clarification emerged suggesting that the new responses API is basically the Assistants API but better.
- Jailbreaking Endangers ToS: Members discussed simulating scenarios to get AI models to bypass restrictions, or improve accuracy, which is considered 'jailbreaking', but may violate OpenAI's Terms of Service.
- Users were cautioned against violating ToS to protect account access, with the server prohibiting discussions on bypassing restrictions; discussion of violence involving fantasy or roleplaying is not considered forbidden.
aider (Paul Gauthier) Discord
- Aider 'Watch Files' Now Live: Paul Gauthier announced that running aider with the `--watch-files` flag now enables live mode, watching all files in the repo for coding instructions via `AI`, `AI!`, or `AI?` comments, as shown in the Aider browser UI demo video.
- The exclamation point `AI!` triggers aider to make changes, while the question mark `AI?` triggers it to answer questions.
- Aider Daily Budget Varies Wildly: Members discussed Aider's daily budget, with one reporting roughly 2x the leaderboard cost for Sonnet 3.7 with 7-12 hours of AI coding per week.
- They cautioned that a 40-hour week could easily result in 8-10x the leaderboard cost, while other users manage cost by defaulting to cheaper models like o3 or R1.
- DMCA Takedown Shuts Down Claude Code: A user reported receiving a DMCA takedown notice for forking the Claude code leak repo, with the original leaker and all forks affected.
- Another user speculated about the possibility of o1 pro / o3 mini pro releases in the API soon.
- Aider's Edit Format Defined: The correct edit format in the Aider leaderboard refers to the format Aider expects from the LLM for editing files, with Aider documentation on edit formats detailing the whole and diff editing formats.
- Different models perform better with different formats.
- Code-Act Repo Potentially Interesting: A member shared a link to the code-act repo.
- They noted that it might be related to the discussion.
LM Studio Discord
- Unity Hooks Up with LM Studio: A member showcased a YouTube video connecting Unity and LM Studio using a JSON file for data, but was unsure where to post it in the Discord.
- Users are requesting a dedicated Unity channel for better organization.
- DIY Internal LLM Chat Advice Sought: A member is seeking advice on setting up an internal LLM Chat with user accounts, integrated with their company's Google Docs knowledge base, potentially using an inference API.
- They're considering LlamaIndex for vector DB, AnythingLLM or OpenWebUI for chat interfaces, and exploring options within LM Studio.
- Python SDK Lacks Vision, TypeScript SDK Ahead: A member using Python SDK 1.0.1 noticed the Typescript SDK can send images to vision models, but this feature hasn't been ported to Python yet.
- The community is awaiting the arrival of vision model support in the Python SDK.
- Copy4AI: Extension Snaps Code Context: A member inquired about the `ext install` command for the Copy4AI extension, designed to copy code snippets for AI assistants.
- The extension, now named `leonkohli.snapsource`, can be accessed via the extension sidebar in VS Code.
- AMD Driver Disaster: Vulkan and ROCm Crippled, Some Recovered: An AMD user reported Vulkan and ROCm performance tanked by 35% in drivers 24.12.1, but ROCm was fixed in v1.1.13+.
- Vulkan performance remained at 50% in 25.1.1, improving incrementally in 25.2.1, with a bug report submitted to AMD.
HuggingFace Discord
- X Gets Hit By Cyberstorm: Dark Storm claimed responsibility for a DDoS attack on X, causing widespread outages on the platform.
- Experts dismissed Elon Musk's suggestion of Ukrainian involvement, with Ciaran Martin calling it "wholly unconvincing" in a BBC article.
- LanguageBind Beats ImageBind: Members discussed using a single solution to process image, audio, video, and pdf modalities, and a member recommended LanguageBind, noting it supports all modalities and beats ImageBind.
- Reka Space Gets Even Smaller: Reka Flash 3, a 21B general-purpose reasoning model, can no longer be called an on-device model but is used to power Nexus, Reka's platform for creating and managing AI workers with native deep research capabilities (Reka Space, getnexus.reka.ai).
- The model, trained from scratch on synthetic and public datasets, performs competitively with proprietary models like OpenAI o1-mini.
- RAGcoon Launches to Help Startups: A new agentic RAG project named RAGcoon has launched, designed to assist in building startups by navigating various resources and suggestions via hybrid search, query expansion, and multi-step query decomposition.
- Built on LlamaIndex, it uses Qdrant for vector database services, Groq for LLM inference (QwQ-32B by Qwen), Hugging Face for embedding models, FastAPI for the backend API, and Mesop by Google for the frontend, and boasts impressive reliability of retrieved context.
- Ollama Takes Over HfApiModel: Members showed how to replace `HfApiModel` from Hugging Face with Ollama for use with smolagents by creating a custom `OllamaModel` class that interacts with Ollama's API for prompt generation, allowing local LLMs to be used with smolagents.
- They also shared snippets for using Gemini, OpenAI, and DeepSeek models with smolagents, providing examples for setting up the `LiteLLMModel` and `OpenAIServerModel` with appropriate API keys, and a link to Google AI Studio was provided to obtain a free API key for Gemini.
GPU MODE Discord
- CUDA Noob Goes to San Jose: Despite lacking CUDA experience, a member expressed interest in attending the GPU mode meeting on March 16th in San Jose.
- The discussion sparked questions about whether specialized knowledge is necessary for participation.
- Triton's `tl.full` Beats Casting Conundrums: A user successfully employed `tl.full` in Triton to craft a 0-dim tensor with a defined value and data type (`tl.full((), 5, tl.int8)`) to bypass overflow quandaries when accumulating to a tensor.
- The triumphant solution involved `tmp_5 = tl.full((1,), value=5, dtype=tl.int8); out = a.to(tl.int8) + tmp_5`.
- Triton Triumphs Softmax Kernel Speed: A user's pipeline softmax kernel in Triton surprisingly outperformed PyTorch, proving to be faster on a float16 T4 colab, as demonstrated in this image.
- Results show how Triton enables new high-throughput designs.
- Padding Prevents SMEM Bank Conflicts: Addresses for stmatrix need padding to circumvent targeting the same starting SMEM bank, which would otherwise trigger an 8x conflict, mirroring solutions previously implemented in fast.cu and deepgemm codes.
- Given no hardware solution exists, memory layout management is critical when tiled layouts are impractical.
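The bank-conflict arithmetic behind this fix can be modeled in a few lines. The 32-bank, 4-byte-wide mapping below is the standard NVIDIA shared-memory layout, but this is an illustration of why padding works, not the actual fast.cu or deepgemm code.

```python
# Toy model: NVIDIA shared memory has 32 banks, each 4 bytes wide,
# so bank = (byte_address // 4) % 32. With a 128-byte row stride, every
# row's first element lands in bank 0 (an 8-way conflict across 8 rows);
# padding the stride by 16 bytes spreads the rows across distinct banks.
def bank(addr: int) -> int:
    return (addr // 4) % 32

def row_start_banks(stride_bytes: int, rows: int = 8) -> list[int]:
    return [bank(r * stride_bytes) for r in range(rows)]

print(row_start_banks(128))       # every row starts in the same bank -> conflict
print(row_start_banks(128 + 16))  # padded stride -> distinct banks
```

The padding wastes a little shared memory per row, which is why it is only the fallback when a conflict-free tiled layout is impractical.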
- HuggingFace Libraries Migrate to TS/JS with WebNN/WebGPU: A member is actively porting the entire HuggingFace libraries to TS/JS using WebNN/WebGPU to create a frontend implementation.
- Separately, the initial structure for IPFS Accelerate JS was implemented with placeholder modules and TypeScript conversion via this commit.
Interconnects (Nathan Lambert) Discord
- SemiAnalysis Hosts Blackwell GPU Hackathon: SemiAnalysis is hosting an Nvidia Blackwell GPU Hackathon on Sunday, March 16th, featuring speakers from OpenAI, TogetherAI, and Thinking Machines.
- The hackathon explores Blackwell & PTX infrastructure while collaborating on open-source projects, and is sponsored by Together, Lambda, Google Cloud, Nvidia, GPU Mode, Thinking Machines, OpenAI, PyTorch, Coreweave, and Nebius. More details are available at the SemiAnalysis Hackathon page.
- Reka Labs Releases Reka Flash 3: Reka Labs has open-sourced Reka Flash 3, a new reasoning model trained from scratch with only 21B parameters achieving competitive performance.
- The model was finetuned on synthetic and public datasets, followed by RLOO with model-based and rule-based rewards, forcing the model to output </reasoning> to control quality vs. thinking time, as described in their blog post.
- Anthropic ARR Soars, Powers Manus AI: According to The Information, Anthropic grew from $1B ARR to $1.4B ARR in the first two months of 2025.
- Its models power Manus, described as the latest AI sensation.
- OpenAI Unveils New APIs and Agents SDK: OpenAI launched new APIs and tools for easier development of agent applications, including the Responses API, Web search tool, File search, Computer use tool, and an open-source Agents SDK.
- The existing Assistants API will be phased out by mid-2026, and the changelog mentions new models o3-mini-pro and o1-pro in the API.
- Dario Predicts AI Coding Domination: Anthropic CEO, Dario Amodei, predicts that AI will write 90% of the code in the next 3 to 6 months, and nearly all code within 12 months, according to a tweet.
- This bold prediction has sparked discussion among developers regarding the future of coding and AI's role in it.
MCP (Glama) Discord
- MCP Servers Struggle in Cursor Integration: Users reported issues integrating MCP servers like Brave Search with Cursor, despite successful integration with Claude, with errors like no tools available at glama.ai/mcp/servers/gwrql5ibq2.
- One member acknowledged this as a known limitation with plans to address it.
- Phoenix Framework Powers MCP Implementations: A member showcased MCPheonix on Github, a simplified implementation of the Model Context Protocol (MCP) server using Elixir's Phoenix Framework.
- This implementation simplifies MCP server creation and management.
- MCP Powers Android Debugging: A member introduced DroidMind, an MCP server that manages Android devices over ADB.
- This project facilitates debugging on-device issues and analyzing logs, controlled via AI.
- MCP Servers Spawn Other MCP Servers: A member unveiled mcp-create, an MCP server designed to build other MCP servers, with TypeScript support.
- The project includes an explanatory article detailing its capabilities and direct execution of generated MCP servers.
- Handoff Includes Full Context: A member shared a github.com search result noting that, by default, the handoff in OpenAI's SDK includes the entire conversation history.
- This encompasses all system, user, and assistant messages.
Notebook LM Discord
- NotebookLM Excels at Exam Prep: A user reported very good results using NotebookLM to quiz themselves on study guide topics, splitting PDFs by bookmarks into separate notebooks.
- The user turned the quiz results into flashcards in other apps for further study.
- NotebookLM Generates Medical Documents: A user in the medical field found NotebookLM useful for parsing guidelines and websites to create patient discharge information.
- Specifically, they created a concise one-page document for patients regarding work-related injury claims.
- Streamlining NotebookLM Ingestion: A user is automating the optimization of information for upload to NotebookLM, focusing on smaller files for easier robot ingestion.
- This streamlines their workflow for document processing in NotebookLM.
- Gemini Generates Discontent: A user expressed dissatisfaction with Gemini, despite its integration into the Google ecosystem.
- The user did not mention details about their negative experiences.
- NotebookLM Handles Massive Knowledge Bases: A user with a 10 million word knowledge base (1500 books, 6000 videos in text) inquired about NotebookLM's limits.
- A member of the NLM team clarified that NotebookLM supports 10 million words, within the 300 source and 500,000 words/source limits, leveraging RAG.
Codeium (Windsurf) Discord
- Windsurf Challenges Users to Refer Friends: The Windsurf Referral Challenge incentivizes users to refer friends, offering 500 flex credits per friend's Pro subscription and a shot at custom Airpods Pro Max by March 31st via windsurf.ai/refer.
- The most referrals wins, but all are winners, with credits granted for each friend's subscription.
- Codeium Extension Can't Read Files: The Codeium VS Code extension chat (Claude 3.7 Sonnet) cannot directly read script files from the folder and requires users to paste the file content into the chat.
- Users are encouraged to report it at codeium.com/support, since the feature should technically work.
- Claude 3.7 Sonnet Doesn't Work in VS Code Extension: The Claude 3.7 Sonnet Thinking model is not available in the VS Code extension, unlike in Windsurf.
- Users were told that Claude 3.7 Sonnet Thinking is not available in the extension at the moment.
- Codeium's Error Aborts Pending Requests: Users reported a persistent error preventing Codeium from working, with the message "Codeium: The server aborted pending request" and mentioning a download URL from releases.codeiumdata.com.
- The issue persisted across IDE restarts and different versions, with users suggested to contact vscode@codeium.com.
- Windsurf Patches MCP and Sonnet: Windsurf released patch fixes in v1.4.6 addressing MCP reliability, 3.7 Sonnet web search, and proxy settings, detailed in the changelog.
- Windsurf Previews (Beta) also now allows users to preview locally run websites directly within Cascade.
Yannick Kilcher Discord
- Compilers Hack Math Operations?: Members debated whether compilers optimize calculations in deep learning frameworks like PyTorch and NumPy, specifically concerning the order of operations in complex equations like (1/n) (a(c + d) + b) versus a(c/n + d/n) + b/n.
- One engineer suggested adding extra brackets to ensure the system performs operations in the desired order, while another pondered the trade-offs between minimal code and explicit code.
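The concern is easy to demonstrate: the two algebraically identical forms from the discussion are evaluated in different orders and so may round differently in floating point, which is exactly why the extra brackets matter.

```python
# Two algebraically identical expressions evaluated in different orders;
# floating-point rounding can make their results differ in the last bits.
def form1(a, b, c, d, n):
    return (1 / n) * (a * (c + d) + b)

def form2(a, b, c, d, n):
    return a * (c / n + d / n) + b / n

x1 = form1(0.1, 0.2, 0.3, 0.4, 3)
x2 = form2(0.1, 0.2, 0.3, 0.4, 3)
print(x1, x2, x1 == x2)
```

Whether a compiler or framework is allowed to rewrite one form into the other is precisely the trade-off (speed vs. bit-exact reproducibility) being debated.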
- Matplotlib Drawn Amazingly by Claude 3.7: Engineers showed excitement for the Matplotlib graphs generated by Claude 3.7, emphasizing that benchmark and svgmaxing are functioning as expected.
- No specific links were given in this exchange.
- Adaptive Meta-Learning: Framework or Fad?: An engineer inquired whether the term Adaptive Meta-Learning (AML) is already established, describing it as a potential combination of Online HyperParameter Optimization (HPO) and meta-learning.
- Another engineer shared a Semantic Scholar search concluding that while the keywords are used together, they do not constitute a well-defined framework.
- Virtual Reality Solves Prison Crisis??: A California women's facility is finding success using VR headsets in solitary confinement, achieving a 97% reduction in infractions according to this article.
- The VR program involves participants viewing scenes of daily life and travel adventures, processing their emotions through art.
LlamaIndex Discord
- Llama Extract Access Granted: A member requested access to Llama Extract and was offered addition to the closed beta, pending email confirmation to rasmus-persson@outlook.com.
- No further details were provided about the specifics of the closed beta.
- Premium Plan Upgrade Made Easy: A user inquired about upgrading to the Premium plan and received instructions to log in, click the profile icon, and select the upgrade/manage button.
- No further discussion or details were provided regarding the features or benefits of the premium plan.
- MP3 Parsing Puzzle for APIs: A user reported an error when uploading an .mp3 file for parsing through the API, noting that the upload works fine via the UI/webapp.
- They provided a screenshot of the error.
- Function Calling Face-Off: A member asked about alternative models besides those from OpenAI that are good for function calling, seeking a less expensive option.
- No specific alternative models were recommended in the provided context.
DSPy Discord
- Judge LLM Follows ChainPoll Pattern: Members are building a Judge LLM which follows the ChainPoll pattern, and returns the average response chain.
- A member suggested using `module.batch()` or `dspy.Parallel` to speed up the process.
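The ChainPoll pattern itself is simple to sketch: poll the judge several times and average the binary verdicts. The judge below is a deterministic stub standing in for an LLM call; in DSPy the repeated calls are what one would parallelize.

```python
# Sketch of the ChainPoll judging pattern: poll the judge N times and
# average the binary verdicts. The judge here is a deterministic stub;
# in practice each call would be an independent LLM judgment.
def chain_poll(judge, answer: str, n: int = 5) -> float:
    votes = [1.0 if judge(answer, i) else 0.0 for i in range(n)]
    return sum(votes) / n

def stub_judge(answer: str, poll_index: int) -> bool:
    # Stand-in for an LLM call: deems the answer correct on 4 of 5 polls.
    return poll_index != 3

print(chain_poll(stub_judge, "Paris is the capital of France."))  # 0.8
```

Averaging over multiple independent judgments is what gives ChainPoll a graded confidence score instead of a single brittle verdict.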
- Best of N Documentation Quest: A member was having trouble finding docs on Best of N.
- The same member noted that ensemble is listed as a teleprompter and asked if it optimizes or aggregates input programs into an optimal single program.
Torchtune Discord
- OpenPipe Masters Deductive-Reasoning: A member shared OpenPipe's deductive-reasoning project, highlighting its use of Torchtune for SOTA deductive reasoning model training.
- The project showcases Torchtune's capabilities in practical, cutting-edge AI applications, particularly in enhancing model training efficiency and effectiveness.
- FP8 Fine-Tuning Faces Friction: Members explored the difficulties of serving models in FP8, considering fine-tuning in FP8 to mitigate quantization error but noted that FP8 poses stability issues during training.
- They suggested gradually increasing weight decay to keep weights in the optimal range during FP8 fine-tuning.
- Torchtune's QAT Quest for FP8: A member inquired about Torchtune's QAT (Quantization Aware Training) support, specifically for FP8, with the goal of fine-tuning and reducing quantization error.
- A promising recipe was identified as a potential solution for FP8 QAT within Torchtune.
- Regression Rendezvous Reveals Review Required: The addition of regression tests prompted discussion about finalizing model size and evaluation methods.
- Members questioned the sufficiency of evaluation alone, hinting at a deeper conversation around more comprehensive measurement strategies.
- Evaluation Efficacy Examined Extensively: The discussion pivoted to the need for measurement strategies beyond simple evaluation, with members debating the value of various evaluation metrics.
- This deliberation is expected to influence decisions on model size and testing methodologies, promoting a shift towards more robust assessment practices.
Cohere Discord
- Cohere Launches Expedition Aya 2024!: Cohere For AI is launching Expedition Aya 2024, a 6-week open-build challenge focused on multilingual, multimodal, and efficient AI.
- Participants gain access to Cohere API credits, with prizes including limited edition Expedition swag and recognition for top projects, with the kick-off meeting in March 2025.
- SemiAnalysis Hosts Blackwell GPU Hackathon!: SemiAnalysis hosts an Nvidia Blackwell GPU Hackathon on Sunday March 16th, offering hands-on exploration of Blackwell & PTX infrastructure.
- Speakers include Philippe Tillet from OpenAI and Tri Dao from TogetherAI, with sponsorship from Together, Lambda, Google Cloud, Nvidia, GPU Mode, Thinking Machines, OpenAI, PyTorch, Coreweave, Nebius.
- Researcher Connects with Multilingual Communities: A researcher inquired about multilingual and multicultural activities within the Cohere Discord community, highlighting their appreciation for Cohere's work.
- New members are encouraged to introduce themselves, specifying affiliation, current projects, preferred tech/tools, and community goals, with community expectations.
tinygrad (George Hotz) Discord
- SemiAnalysis Hosts Nvidia Blackwell GPU Hackathon: SemiAnalysis is hosting an Nvidia Blackwell GPU Hackathon on Sunday, March 16th, offering hands-on exploration of Blackwell & PTX infrastructure while collaborating on open-source projects.
- Speakers include Philippe Tillet of OpenAI, Tri Dao of TogetherAI, and Horace He of Thinking Machines.
- GTC Kickoff Features Blackwell GPUs: SemiAnalysis is kicking off GTC with the Blackwell GPU Hackathon, which includes engaging morning keynotes, all-day hacking with powerful Blackwell GPUs like GB200s, and insightful afternoon talks.
- The event is sponsored by Together, Lambda, Google Cloud, Nvidia, GPU Mode, Thinking Machines, OpenAI, PyTorch, Coreweave, and Nebius.
Modular (Mojo 🔥) Discord
- CUDA Blogposts Anticipated: A user is waiting for the new blog posts on CUDA to be released.
- No additional information was provided.
- Anticipation builds for CUDA update: Enthusiasm is growing as users eagerly await the latest updates and blog posts regarding CUDA.
- The community is keen to explore new features and improvements in CUDA, although specific details remain under wraps.
AI21 Labs (Jamba) Discord
- Qdrant Kicked out of ConvRAG: The development teams considered Qdrant as the vector DB for their ConvRAG but decided to use a different one for unspecified reasons.
- The selected DB offered greater flexibility for VPC deployment.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Nomic.ai (GPT4All) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
The full channel by channel breakdowns have been truncated for email.
If you want the full breakdown, please visit the web version of this email: !
If you enjoyed AInews, please share with a friend! Thanks in advance!